Multitask multimodal 2

The document presents M3T-LM, a multi-modal multi-task learning model designed to predict inpatient length of stay (LoS) and mortality simultaneously by integrating diverse clinical data sources. The model employs various sub-models tailored for different data types, including a novel attention-embedded CNN for numerical data, LSTM networks for clinical notes, and a specialized CNN for chest X-ray images. Experimental results on the MIMIC-IV dataset demonstrate that M3T-LM outperforms existing methods in both regression and classification tasks, achieving a test MAE of 5.54 for LoS and an F1 score of 0.876 for mortality prediction.

Uploaded by Yanwei Jin


Computers in Biology and Medicine 183 (2024) 109237

Contents lists available at ScienceDirect

Computers in Biology and Medicine

journal homepage: www.elsevier.com/locate/compbiomed

M3T-LM: A multi-modal multi-task learning model for jointly predicting patient length of stay and mortality

Junde Chen (a), Qing Li (b), Feng Liu (c), Yuxin Wen (a, ∗)

(a) Dale E. and Sarah Ann Fowler School of Engineering, Chapman University, Orange, CA 92866, USA
(b) Department of Industrial and Manufacturing Systems Engineering, Iowa State University, Ames, IA 50011, USA
(c) School of Systems and Enterprises, Stevens Institute of Technology, Hoboken, NJ 07030, USA

ARTICLE INFO

Keywords: Multi-task learning; Data-fusion model; Length of stay prediction; Deep learning

ABSTRACT

Accurate prediction of inpatient length of stay (LoS) and mortality is essential for improving hospital service efficiency, particularly given the constraints of limited healthcare resources. Integrative analysis of heterogeneous clinical records from different sources holds great promise for improving prognostic and diagnostic accuracy for LoS and mortality. Currently, most existing studies focus solely on a single data modality or adopt single-task learning, i.e., they train the LoS and mortality tasks separately. This limits the use of available multi-modal data and prevents the sharing of feature representations that could capture correlations between tasks, ultimately hindering model performance. To address this challenge, this study proposes a novel Multi-Modal Multi-Task learning model, termed M3T-LM, that integrates clinical records to predict inpatients’ LoS and mortality simultaneously. The M3T-LM framework incorporates multiple data modalities by constructing a sub-model tailored to each modality. Specifically, a novel attention-embedded one-dimensional (1D) convolutional neural network (CNN) is designed to handle numerical data. Clinical notes are converted into sequence data, and two long short-term memory (LSTM) networks then model the textual sequences. A two-dimensional (2D) CNN architecture, denoted CXRMDL, is designed to extract high-level features from chest X-ray (CXR) images. The sub-models are then integrated to form M3T-LM, which captures the correlations between the patient LoS and mortality prediction tasks. The effectiveness of the proposed method is validated on the MIMIC-IV dataset, where it attained a test MAE of 5.54 for LoS prediction and a test F1 score of 0.876 for mortality prediction. The experimental results demonstrate that our approach outperforms state-of-the-art (SOTA) methods on this mixed regression and classification problem.

∗ Corresponding author.
E-mail addresses: [email protected] (J. Chen), [email protected] (Q. Li), [email protected] (F. Liu), [email protected] (Y. Wen).

https://doi.org/10.1016/j.compbiomed.2024.109237
Received 25 May 2024; Received in revised form 29 September 2024; Accepted 30 September 2024
Available online 7 October 2024
0010-4825/© 2024 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.

1. Introduction

Healthcare systems continue to face the significant challenge of providing timely patient care while optimizing resource utilization, especially in the wake of the COVID-19 pandemic [1]. Inpatients’ length of stay (LoS) and mortality are two crucial metrics that hospitals use to assess clinical quality and optimize resource allocation [2]. Prolonged LoS raises the likelihood of adverse events, such as poor nutritional status, hospital-acquired infections, adverse drug events, and other complications. Furthermore, prolonged LoS increases the odds of inpatient mortality [3]. This has prompted hospitals to invest intensive effort in resource allocation. Real-time demand capacity (RTDC) management [4] and multidisciplinary discharge rounds (MDRs) [5] have shown great promise as best practices for addressing these challenges, but their effectiveness relies on accurate prediction of inpatients’ LoS and mortality. With the rising prevalence of electronic health record (EHR) systems, patients’ clinical records are now accessible. These records include laboratory test results, vital signs, demographic information, clinical notes, and other details. Leveraging this abundant information, sophisticated data-driven algorithms can make precise predictions of inpatients’ LoS and mortality.

This research aims to improve the service efficiency and management capabilities of hospitals by simultaneously predicting inpatient LoS and mortality. As mentioned previously, patients’ LoS and mortality are crucial indicators for assessing the quality of care and the effective allocation of healthcare resources. In this study, predicting inpatient LoS (number of days) is treated as a regression problem, and predicting mortality as a binary classification problem. The recent literature shows that machine learning (ML) offers unprecedented opportunities
to improve patient and clinical outcomes, owing to its great potential for learning essential features and extracting meaningful insights from data [6]. Notable works include, but are not limited to, the following:

• Gaussian Process Regression for clinical e-health modeling [7].
• Prediction of Intensive Care Unit (ICU) LoS with four ML methods, namely Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and XGBoost [8].
• A Hierarchical Attention Network (HAN) for LoS and mortality prediction [9].
• Ensemble learning for improving predictive performance [10].
• U-Net-based models for medical image segmentation [11].
• CNNs for medical image classification [12].

However, the majority of existing ML models in healthcare either rely exclusively on a single data modality or address only a single task [13]. Multi-modal data is increasingly available and accessible. Multi-modal deep learning (DL) models [14] integrate data of different distributions, sources, and formats into a unified space, where both intra-modality and cross-modality aspects can be captured uniformly. Such models have been successful in a wide range of domains. Examples include autonomous driving and video classification, which combine visual features from cameras with data from Light Detection and Ranging (LiDAR) sensors [15]. Others are emotion recognition, which fuses audiovisual content with users’ textual comments [16], and process monitoring in manufacturing using multimodal sensor data [17]. The main challenge of multi-modal data fusion is that data from different sources and file formats are heterogeneous, high-dimensional, and seldom uniform. This is especially true of clinical data. The complex nature of clinical data makes it difficult to build joint representations of heterogeneous modalities efficiently, in a way that enables their seamless integration. Consequently, despite its significance, the prediction of inpatient LoS and mortality using multi-modal data has received little attention in the literature [2].

Another noticeable trend is that most clinical machine learning systems focus on single clinical prediction tasks. However, in real-world clinical environments, multiple tasks are often interdependent. For instance, heart disease risk and the likelihood of developing diabetes are distinct medical conditions. Even so, they share underlying physiological factors such as blood pressure, cholesterol levels, and family medical history [18]. Multi-task learning (MTL), a subfield of machine learning, fosters the exchange of insights among interconnected tasks by training multiple related tasks simultaneously with a single model. By sharing information between related tasks, MTL improves the generalization and performance of the model. Le et al. [19] proposed a convolutional neural network (CNN)-based multi-task classification and segmentation architecture for cancer diagnosis in mammography. Yu et al. [20] used a multi-task recurrent neural network with an attention mechanism to predict patient mortality in hospitals. Despite these achievements in medical prediction using MTL, little effort has been made to combine multi-modal clinical data with multi-task learning to enhance prediction performance. Tan et al. [21] proposed a multi-modal and multi-task DL framework called MultiCoFusion that combines the power of different modalities and tasks for cancer prognosis prediction. Their experimental results indicate that joint learning of multiple tasks can exploit the intrinsic association between features (i.e., genes), and thus further improve learning performance. However, they manually extracted features from histopathological images and mRNA expression data, and LoS prediction was not within their scope. Harerimana et al. [9] developed a hierarchical deep attention model to forecast LoS and in-hospital mortality from ICD codes and demographic data. However, LoS was predicted as a classification problem, and the LoS and mortality tasks were trained separately.

To address the aforementioned challenges, we propose a novel Multi-Modal Multi-Task learning model, termed M3T-LM, that performs the LoS regression and mortality classification tasks simultaneously. Multiple data modalities, including demographic data, clinical notes, laboratory test results, and medical images, are integrated in our scheme. Sub-models (basic models) are constructed for each data modality using the relevant data types. Concretely, a novel attention-embedded one-dimensional (1D) CNN is designed to handle numerical data. After the texts are converted to sequence data, two long short-term memory (LSTM) networks are used to model the clinical notes. A two-dimensional (2D) CNN architecture, named CXRMDL, is designed to extract high-level features from chest X-ray (CXR) images. The sub-models are then integrated to form M3T-LM, which captures the correlations between the patient LoS and mortality prediction tasks. Notably, predicting inpatients’ LoS and mortality is a challenging mixed-task scenario that encompasses both regression and classification tasks. A novel predictive framework is proposed to address this challenge. The key contributions of this study are summarized as follows:

• A joint classification-regression scheme that handles mixed task types using heterogeneous data modalities is proposed to predict inpatients’ LoS and mortality simultaneously.
• An enhanced squeeze-and-excitation (SE)-block is incorporated into the network for adaptive feature calibration. In this block, the 2D pooling layer is replaced by a 1D one, and the two non-linear fully-connected layers are replaced by a 1 × 1 convolution layer, so that the block handles numerical data and has fewer parameters.
• CXR images are an integral component of our scheme, for which we have developed an innovative CNN model referred to as CXRMDL. This model uses InceptionResNet V2, known for its powerful feature extraction capabilities, as its backbone network. We further enhance this architecture by embedding three convolution blocks with 32, 16, and 8 filters of size 3 × 3, a max pooling (MAP) layer, a flatten layer, and a fully connected (FC) layer. These additions are designed to capture and leverage the most salient features of CXR images, thereby improving the accuracy and robustness of our predictions.
• A unified model that incorporates the losses of both the regression and classification tasks is developed. An adaptive loss weight assignment scheme is proposed to determine the optimal weights for these tasks automatically, enhancing the model’s overall performance.

The rest of this paper is structured as follows. Section 2 briefly introduces related work and identifies the research gaps. Section 3 discusses the proposed methodology in detail. Section 4 presents the experimental results and a comparative analysis. Finally, Section 5 concludes the paper and outlines future work.

2. Related work

We first review the literature on inpatient LoS and mortality prediction. We then review multi-modal multi-task learning in the healthcare field. Finally, the current research gaps are discussed.

2.1. Length of stay prediction

Accurate prediction of LoS can increase patient satisfaction by reducing unnecessary wait times, and it can save hospital costs. Existing ML models for LoS prediction can be broadly grouped into two categories: classification models and regression models [22]. Classification models aim to group LoS into multiple classes, e.g., short, medium, and long stay, based on the number of days that the patient stays in the hospital. Morton et al. [23] categorized the LoS of diabetic inpatients into long-term and short-term
stays. Multiple classification models, such as SVM, RF, and LASSO-based multi-task learning, were compared in their experiments. Although SVM and RF achieved the desired results, the authors considered multi-task learning promising for LoS prediction. Thompson et al. [24] classified the LoS of newborns as prolonged or not. They used well-known ML algorithms such as LR, Decision Tree (DT), SVM, RF, and neural networks (NN), and RF, DT, and NN achieved impressive performance. Nevertheless, recent studies have pointed out that LoS distributions are markedly right-skewed [9,25]. This means the dataset is imbalanced, with only a limited number of instances showing long LoS. The imbalance causes classification models to treat long-LoS classes as anomalies, which lowers classification accuracy. It is therefore more appropriate to formulate the LoS task as a regression problem. Yet modeling LoS prediction as regression has received less attention in the literature. Tsai et al. [26] applied an NN method to predict LoS for cardiology patients and concluded that the NN model is robust for predicting prolonged LoS. However, there is still room to improve the accuracy of their model. Using a cluster-boosted regression method, Rouzbahman et al. [27] predicted mortality and LoS for ICU inpatients. Their findings indicated enhanced regression accuracy for both mortality and LoS. However, determining the optimal number of clusters remains challenging and involves a degree of subjectivity. Muhlestein et al. [10] trained an ML ensemble model to predict inpatient LoS after brain tumor surgery. Their experimental results showed that the ensemble model performed well for LoS prediction. However, the ensemble integrated multiple sub-models, which increases computational complexity.

2.2. Inpatient mortality prediction

Accurate prediction of inpatient mortality plays a vital role in evaluating disease severity and interventions, assessing the efficacy of novel therapies, and guiding healthcare initiatives. Over the past few decades, great efforts have been invested in the prediction of inpatient mortality. Ruzicka et al. [28] applied XGBoost to predict patients’ mortality in hospitals and compared it with a traditional unregularized LR model. In their experiments, XGBoost outperformed LR but was not competitive with existing methods. Ganapathy et al. [29] compared several models for inpatient mortality prediction, such as LR, Binary Discriminant Analysis (BDA), Bayesian Linear Regression (BLR), NN, and RF. The BLR model achieved the best precision in their experiments. They concluded that the ML classifiers had better predictive ability than statistical models. Caicedo-Torres et al. [30] designed a deep learning model called ISeeU2 to predict mortality in the ICU. Their model outperformed the compared baselines, highlighting the valuable insights that can be extracted from raw nursing notes. Similarly, Zeng et al. [31] proposed a recurrent neural network (RNN)-based DL architecture to predict mortality for all ICU admissions. Their approach outperforms classical machine learning methods such as LR, RF, and XGBoost. However, the imbalance between positive and negative samples remains unaddressed. Using image-transformed electrocardiogram (ECG) waveforms, Kondo et al. [32] predicted short-term mortality for cardiac care unit patients, and their method reached the desired prediction accuracy. Nevertheless, the model relies solely on image data, which limits the overall accuracy of their approach.

2.3. Multi-modal multi-task learning

Compared with single-modality models, multi-modal models can produce more reliable results because they perceive different aspects of the data, which enhances model accuracy and reliability. Multi-task learning (MTL) solves multiple learning tasks at the same time while exploiting commonalities and differences across tasks. It has proven more reliable in identifying related characteristics, less sensitive to data noise, and less prone to overfitting [33]. Zhang et al. [34] proposed a three-dimensional multi-task DL model based on the MLP-Mixer architecture for Alzheimer’s disease diagnosis. The model simultaneously performs FDG/AV45-PET SUVR prediction and AD status prediction. Their experimental results show that MTL can share feature representations, which benefits both tasks. Shao et al. [35] proposed a multi-task multi-modal learning method for the joint prognosis and diagnosis of cancer patients. Their scheme used two types of data, histopathological images and genomic data, to address both tasks. They concluded that MTL captured the correlation between the tasks and performed better than single-task learning. Using different data modalities, such as histopathological images and mRNA expression data, Tan et al. [21] built a multi-modal multi-task learning model. It performs survival analysis and grade classification for cancer prognosis and diagnosis. They concluded that using multi-modal data performs better than using single-modal data alone. In another study, Liu et al. [36] proposed jointly identifying brain diseases and predicting clinical scores using both magnetic resonance imaging (MRI) and patient demographic information. Their experimental findings demonstrate that MTL outperforms single-task learning.

Although some joint learning models have been proposed, certain models incorporate only a single data modality. Moreover, most of them first extract hand-crafted features from images and pre-process the data separately. These separate processes may lack effective coordination, resulting in suboptimal learning performance. In addition, most studies address tasks of the same type rather than mixed task types. To address these research gaps, we establish a multi-modal multi-task learning model that simultaneously performs mixed-type regression and classification tasks using multiple data modalities. Specifically, we aim to simultaneously predict inpatients’ length of stay and mortality. These outcomes have proven to be closely related for inpatients after ICU admission, and they share the common feature representations needed to train regression and classification models. The proposed multi-modal multi-task model (M3T-LM) uses heterogeneous medical data modalities to predict inpatients’ LoS and mortality automatically. These include, but are not limited to, static numerical data (demographics, healthcare examinations), unstructured texts (clinical notes, long procedure texts), and chest X-ray images.

3. Methodology

The proposed M3T-LM method has the following key distinctions:

(1) To make full use of the available data, our scheme fuses diverse data modalities, including patient demographic information, diagnoses, free-text clinical notes, laboratory test results, and medical images.
(2) A multi-task learning model with a shared network layer is proposed to capture the correlations between the inpatient LoS and mortality prediction tasks, since these two tasks are intrinsically associated.
(3) Mixed task types are learned in our scheme. Most existing research handles tasks of a single type. Our scheme instead performs regression and classification tasks simultaneously for inpatient LoS and mortality prediction.

The proposed approach leverages the interconnections among the diverse data and tasks. By modeling nonlinear within-modality and cross-modality relationships, it can improve model efficiency and reduce overfitting risk. Fig. 1 provides the flow diagram of the proposed procedure. In the following, we first present the architecture of M3T-LM and then discuss the optimization process in detail.
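
Point (3) above, learning a regression task and a classification task jointly, can be made concrete with a small numerical sketch. The pure-Python code below is an illustration under stated assumptions, not the authors’ implementation. It applies a Huber-type loss with threshold δ = 2 to LoS, which is the robust regression loss and δ value given in Section 3.2. It applies binary cross-entropy to mortality, then combines the two losses as a weighted sum. The function names, the toy batch, and the fixed weights of 0.5 are hypothetical. In M3T-LM the task weights are instead updated adaptively during training.

```python
import math
from typing import Sequence


def huber_loss(y_true: Sequence[float], y_pred: Sequence[float], delta: float = 2.0) -> float:
    """Mean Huber-type loss: quadratic for |error| <= delta, linear beyond it."""
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        err = abs(y - y_hat)
        if err <= delta:
            total += 0.5 * err ** 2
        else:
            total += delta * err - 0.5 * delta ** 2
    return total / len(y_true)


def bce_loss(labels: Sequence[int], probs: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy; probs are predicted P(label == 1)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)


def joint_loss(task_losses: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of per-task losses (one weight per task)."""
    return sum(w * loss for w, loss in zip(weights, task_losses))


# Hypothetical toy batch: LoS in days and in-hospital mortality labels.
los_true, los_pred = [3.0, 10.0], [4.0, 4.0]
mort_true, mort_prob = [0, 1], [0.2, 0.7]

l_reg = huber_loss(los_true, los_pred)   # (0.5 + 10.0) / 2 = 5.25
l_cls = bce_loss(mort_true, mort_prob)
l_total = joint_loss([l_reg, l_cls], weights=[0.5, 0.5])  # hypothetical fixed weights
print(f"L_reg={l_reg:.4f}  L_class={l_cls:.4f}  L_total={l_total:.4f}")
```

With these numbers, the 6-day error in the second LoS sample contributes 10.0 to the regression sum. A halved squared error would give 18.0 instead. This linear growth beyond δ is what limits the influence of long-stay outliers.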


Fig. 1. The flowchart of the proposed procedure.

3.1. Architecture of the M3T-LM

Define the input data as $D$, which is composed of diverse subsets $D = \{D_1, D_2, \ldots, D_M\}$, where $M$ is the total number of modalities. As shown in Table 2, three data modalities are used in our scheme: numerical, text, and image data. Denote the length of stay of each subject as $y_i$, $i = 1, 2, \ldots, N$, and the labels of $C$ categories as $z_c = \{z_{cn}\}_{n=1}^{N}$, where $N$ is the total number of samples. First, sub-models are built for the different data modalities to learn abstractions directly from the raw data. Specifically, for the numerical data, a novel attention-embedded 1D CNN, denoted Att-1DCNN, is developed to extract meaningful information. Two cascaded 1D convolution layers, each with 32 kernels of size 3, followed by a 1D max pooling layer, are used to extract discriminative features. Then, a modified squeeze-and-excitation (SE)-block is incorporated into the network for adaptive feature calibration [37]. In this block, the 2D pooling layer is replaced by a 1D one, and the two non-linear fully-connected (FC) layers are replaced by a 1 × 1 convolution layer, so that the block handles 1D numerical data with fewer parameters. Next, a spatial attention (SA) mechanism is embedded after the SE-block to infer the importance of spatial points [38]. It helps the CNN extract global features by mining the inter-spatial relationships between features. Finally, the features obtained by the attention mechanism and the bottom convolution block are fused to generate the output of the Att-1DCNN. This output comprehensively captures the useful information in the numerical data for the model’s prediction tasks.

For unstructured texts, the long short-term memory (LSTM) network is an effective end-to-end DL method for text processing. In addition, word embedding is a popular technique for mapping words or phrases from a vocabulary to vectors of continuous values. However, directly modeling sequential notes with word embeddings and DL can be time-consuming, and it may be impractical because document lengths vary. Therefore, a tokenizer is first employed to segment long texts into words. Then, a Text2Seq function [39] is used to transform the text data into sequence variables. Two LSTM networks are then designed separately to capture the dependencies along these sequence variables. They take the Text2Seq outputs for the d_icd_diagnoses and d_icd_procedures long titles, respectively, and infer sequence-dependent feature representations. Here, the hyper-parameter specifying the number of perceptrons (units) is set to 2, and L2 regularization is employed to suppress overfitting.

The integration of clinical data and chest X-ray images has shown a favorable impact on predictive performance in prognostication tasks. It has also improved performance in in-hospital mortality prediction and phenotype classification [12,14,40]. In particular, Hayat et al. [40] illustrated the effect of CXR images. They observed a significant improvement in accuracy when CXR images were used along with clinical data to build a multi-modal fusion model. Therefore, we further integrate CXR images into our modeling scheme. For the CXR image data, we devise a convolutional neural network architecture named CXRMDL, in which InceptionResNet V2 [41] is used as the backbone network. Transfer learning is applied in our modeling scheme. InceptionResNet V2 is employed with its top layers truncated, followed by a global average pooling (GAP) layer, a flatten layer, and a batch normalization (BN) layer to extract deep-level features of CXR images.

Subsequently, the basic sub-models, namely Att-1DCNN, LSTM, and CXRMDL, are integrated to form a novel Multi-Modal Multi-Task learning model. Two densely-connected (DC) layers with 64 and 32 neurons are embedded in this model to change the vector dimensions. Each DC layer is followed by a dropout layer with a dropout rate of 0.1 to suppress overfitting. After that, an FC ReLU layer and an FC Softmax layer are used for the final LoS prediction and mortality classification tasks, respectively. In brief, the crucial steps are summarized below.

(1) The raw dataset is pre-processed and augmented to $D$, which is divided into diverse subsets $D = \{D_1, D_2, \ldots, D_M\}$. Here $D_k = \{(x_{k,1}, y_{k,1}, z_{k,1}), (x_{k,2}, y_{k,2}, z_{k,2}), \ldots, (x_{k,M}, y_{k,M}, z_{k,M})\}$, $x_{k,m}$ denotes the extracted feature data, $(y_{k,m}, z_{k,m})$ denotes the corresponding target variables (e.g., LoS and mortality), and $k, m \in \{1, 2, \ldots, M\}$.
(2) Construct the backbone network models $H = \{H_1, H_2, \ldots, H_M\}$, which handle the data of the different modalities, such as numerical data, text data, and CXR image data. Each backbone model is fed its corresponding data subset of $D = \{D_1, D_2, \ldots, D_M\}$. These subsets are extracted in the pre-processing stage, where data transformation and data cleaning are performed.
(3) The outputs of the basic models are concatenated and used as the input to the subsequent (secondary) predictor $F$. Here, $F$ consists of 2 DC layers with 64 and 32 neurons, 2 dropout layers, an FC ReLU layer, and an FC Softmax layer. The last two layers produce the final LoS regression and mortality classification outputs, respectively.

Fig. 2 portrays the overall architecture of the proposed M3T-LM, and the major parameters are summarized in Table 1.

3.2. Optimization of the M3T-LM

In the proposed framework, each branch of the joint model learns a different task, so a loss function must be specified for each task. In this research, the inpatient LoS and mortality predictions are modeled as regression and classification tasks, respectively. The regression and classification loss functions of the proposed M3T-LM are therefore defined separately below.

(1) Regression loss function. The mean squared error (MSE) is the most widely used loss function in deep learning models for regression problems, but it has drawbacks, such as sensitivity to outliers. Therefore, to reduce the impact of outliers, we introduce a custom regression loss function [42] in the network for the LoS prediction task. The regression loss function is defined by:

$$L_{reg} = \frac{1}{N}\sum_{i=1}^{N} \begin{cases} \frac{1}{2}(y_i - \hat{y}_i)^2, & \text{for } |y_i - \hat{y}_i| \le \delta, \\ \delta\,|y_i - \hat{y}_i| - \frac{1}{2}\delta^2, & \text{otherwise}, \end{cases} \tag{1}$$

where $y_i$ and $\hat{y}_i$ denote the actual and predicted values, respectively, and $\delta$ is a threshold hyper-parameter, set to 2 according to the hyper-parameter tuning results.

(2) Classification loss function. In-hospital mortality prediction is a binary classification problem, and the Binary Cross Entropy (BCE) loss function is used in our network for this task. The BCE loss function is expressed as:

$$L_{class} = -\frac{1}{N}\sum_{i=1}^{N} \left[ y_i \cdot \log(p(y_i)) + (1 - y_i) \cdot \log(1 - p(y_i)) \right] \tag{2}$$

where $y_i$ represents the actual class and $p(y_i)$ denotes the predicted probability of that class. As a consequence, the two different loss


Fig. 2. The overall architecture of the proposed M3T-LM.

Table 1
The major parameters of the proposed model.
Layer (module) Input shape Filter no. Kernel size Output shape Params Repeat
num (InputLayer) (None, 26, 1) – – (None, 26, 1) – 1
conv1d (Conv1D) (None,26,1) 32 3 (None, 24, 32) 128 1
conv1d-1 (Conv1D) (None,24,32) (None, 22, 32) 3,104 1
MaxPooling1D (None,22,32) – – (None,11,32) – 1
GlobalAveragePooling1D (None,11,32) – – (None,32) – 1
conv1d-2 (Conv1D) (None,32,1) 32 3 (None,32,1) 3 1
conv1d-3 (Conv1D) (None,1,11,2) 32 3 (None,1,11,1) 3 1
(None,1,1,32)
sigmoid-1 (None,1,11,1) – – (None,1,11,1) – 1
(None,1,11,32)
ad (InputLayer) (None,23) – – (None,23) – 1
ap (InputLayer) (None,18) – – (None,18) – 1
multiply_1 (Multiply) (None,1,11,32) – – (None,1,11,32) – 1
(None,1,11,1)
𝑒th (InputLayer) (None,1) – – (None,1) – 1
Flatten (None,1,11,32) – – (None,352) (None,27) – 2
(None,27,1)
embedding (None,1) (None,18) – – (None,1,32) 192+16,960+ 17,120 3
(None,23)
cxg (InputLayer) (None,128,128,1) – – [(None,128,128,1)] – 1
LSTM (None,23,32) (None,2) 280 2
(None,18,32)
ohe (InputLayer) (None,28) – – (None,28) – 1
flatten (Flatten) (None,1,32) – – (None, 32) – 1
sequential (Sequential) (None,128,128,1) – – (None, 1536) 54,336,160 1
Concatenate layer (None,378) (None,28) – – (None, 1978) – 2
(None, 32) (None, 1536)
(None, 2)
Dense (drop=0.1) (None,1978) – – (None, 32) 126,656+2,080 2
Regression/Class (None,32) – – (None,1) (None,2) 33+33 1
Total – – – – 54,503,032 (33 layers) –

functions are established for multi-modal multi-task learning. However, backpropagation operates on a single scalar objective, so a joint loss function must be defined to integrate the two different loss functions. The weighted sum is the most commonly used scheme for this. The weighted total loss function is formulated as:

$$L_{total}^{(t)} = \sum_{i=1} w_i^{(t)} L_i^{(t)} \tag{3}$$

where $w_i$ denotes the weight of the $i$th loss function $L_i$, and $t$ denotes the $t$th training epoch. The performance of the system relies heavily on the weights assigned to each task’s loss, but tuning these weights by hand is challenging and expensive. Therefore, we include the loss weights in the definition of the loss function itself. We also develop an adaptive way to update the loss weights through callbacks, which manage the changes internally. The loss weight update can be defined as:

$$w_i^{(t+1)} \leftarrow w_i^{(t)} - \lambda \nabla_{w_i} L_{grad} \tag{4}$$

where $\lambda$ is a constant hyper-parameter, and $L_{grad}$ denotes the gradient loss, which is introduced specifically to capture the loss caused by the loss


Fig. 3. The distribution of LoS and mortality.

Table 2
The characteristics of the data.
Characteristic | Data type | No. of variables | Name of variables
Demographic variables | Numerical & Categorical | 6 | Subject_id, gender, anchor_age, anchor_year, anchor_year_group, dod
Chart Event variables | Numerical & Categorical | 9 | Hadm_id, stay_id, charttime, storetime, itemid, value, valuenum, valueuom, warning
Laboratory Event variables | Numerical & Categorical | 23 | Labevent_id, hadm_id, specimen_id, itemid, charttime, storetime, value, valuenum, valueuom, ref_range_lower, ref_range_upper, flag, priority; WBC count, neutrophils count, monocytes count, lymphocytes count, platelets count, hemoglobin, glucose, chloride, creatinine, BUN
Procedure Event variables | Numerical & Categorical | 10 | Patient weight, total amount, total amount uom, isopenbag, continue in next dept, cancel reason, status description, comments_date, original amount, original rate
Text Note variables | Text | 2 | AdmitDiagnosis, AdmitProcedure
Chest X-ray variables | Image | 1 | Chest X-ray image

weight $w$. The gradient loss $L_{grad}$ is written as:

$$L_{grad}\left(t, w_i^{(t)}\right) = \sum_{i} \left| G_i^{(t)} - \bar{G}^{(t)} \times \left[r_i^{(t)}\right]^{\alpha} \right|_1 \qquad (5)$$

$$G_i^{(t)} = \left\| \nabla_{\theta}\, w_i^{(t)} L_i \right\|_2, \qquad r_i^{(t)} = \frac{L_i^{(t)} / L_i^{(0)}}{E_{task}\left[L_i^{(t)} / L_i^{(0)}\right]} \qquad (6)$$

In Eq. (5), $G_i^{(t)}$ is the gradient norm of the $i$th task in the $t$th epoch of training. It is calculated as the L2 norm of the weighted loss gradient. $\bar{G}^{(t)}$ represents the mean gradient norm over all tasks in the $t$th epoch of training. $r_i^{(t)}$ denotes the relative training speed of the $i$th task in the $t$th epoch of training, calculated as the ratio of the training speed of the $i$th task to the average training speed of all tasks. Overall, the loss weights are treated as optimization parameters in this solution, and $L_{grad}$ with respect to the loss weight $w$ is computed in each epoch of the update. The initial weights of the regression and classification loss functions are both set to 0.5, and the gradient update is performed once per training epoch.

4. Experiments

In this section, we present the empirical performance evaluation of the proposed approach for the LoS and mortality prediction tasks. We first evaluate the accuracy of the proposed M3T-LM against state-of-the-art (SOTA) methods. Next, we optimize and fine-tune a set of essential hyperparameters via a grid search to maximize the prediction performance of the model. Finally, we assess the efficacy of the fused data modalities and the newly added modules through an ablation study.

4.1. Dataset description and preprocessing

MIMIC, short for the Medical Information Mart for Intensive Care, is a large database of clinical records for patients admitted to the Beth Israel Deaconess Medical Center (BIDMC). MIMIC-IV consists of comprehensive clinical information on patients' hospital stays. It contains de-identified records of 50,048 individual patients admitted to the ICU or emergency department (ED) at the BIDMC in Boston, MA, USA, between 2008 and 2019. The most recent version of MIMIC-IV (v1.0) [43], released on June 22, 2022, improves on MIMIC-III [44] by providing public access to EHR data based on the BIDMC's MetaVision clinical information system. MIMIC-CXR-JPG v2.0.0 [45] is a large image dataset comprising 227,827 CXR images sourced from the BIDMC between 2011 and 2016. This dataset is freely available to facilitate and encourage broad research in the field of medical computer vision.
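The adaptive loss weighting of Eqs. (3)-(6) has the same form as the GradNorm algorithm. The plain-Python sketch below performs one update of Eq. (4) from per-task quantities that a training loop would supply. It is not the authors' callback. The function names, the default values of $\alpha$ and $\lambda$, the clamping that keeps weights positive, and the renormalization that preserves the weights' sum are all assumptions. Following GradNorm, it also treats the target term $\bar{G}^{(t)}[r_i^{(t)}]^{\alpha}$ as a constant when differentiating. For positive $w_i$, this gives $\partial L_{grad}/\partial w_i = \mathrm{sign}(G_i^{(t)} - \text{target}_i)\,\|\nabla_\theta L_i\|_2$.

```python
from typing import List, Sequence, Tuple


def joint_loss(weights: Sequence[float], losses: Sequence[float]) -> float:
    """Eq. (3): weighted sum of the per-task losses."""
    return sum(w * loss for w, loss in zip(weights, losses))


def update_loss_weights(
    weights: Sequence[float],
    grad_norms: Sequence[float],      # ||grad_theta L_i||_2 per task, unweighted (assumed input)
    losses: Sequence[float],          # L_i^(t)
    initial_losses: Sequence[float],  # L_i^(0)
    alpha: float = 1.5,               # illustrative; alpha's value is not given in the text
    lam: float = 0.025,               # illustrative value of lambda in Eq. (4)
) -> Tuple[List[float], float]:
    """One GradNorm-style update of the loss weights; returns (new_weights, L_grad)."""
    n = len(weights)
    # Eq. (6): G_i = ||grad(w_i * L_i)||_2 = w_i * ||grad L_i||_2 for w_i > 0.
    g = [w * gn for w, gn in zip(weights, grad_norms)]
    g_bar = sum(g) / n
    # Eq. (6): r_i = (L_i^t / L_i^0) / mean over tasks of (L_j^t / L_j^0).
    ratios = [lt / l0 for lt, l0 in zip(losses, initial_losses)]
    mean_ratio = sum(ratios) / n
    r = [q / mean_ratio for q in ratios]
    # Target norms, treated as constants when differentiating (GradNorm convention).
    targets = [g_bar * ri ** alpha for ri in r]
    # Eq. (5): sum of absolute gaps between actual and target gradient norms.
    l_grad = sum(abs(gi - ti) for gi, ti in zip(g, targets))
    # Eq. (4): d|G_i - target_i|/dw_i = sign(G_i - target_i) * ||grad L_i||_2.
    stepped = []
    for w, gn, gi, ti in zip(weights, grad_norms, g, targets):
        sign = (gi > ti) - (gi < ti)
        stepped.append(max(w - lam * sign * gn, 1e-6))  # assumption: keep weights positive
    # Assumption: rescale so the weights keep their previous sum (1.0 for 0.5 + 0.5).
    scale = sum(weights) / sum(stepped)
    return [w * scale for w in stepped], l_grad


if __name__ == "__main__":
    new_w, l_grad = update_loss_weights([0.5, 0.5], grad_norms=[2.0, 1.0],
                                        losses=[1.0, 1.0], initial_losses=[2.0, 2.0])
    # The task with the larger gradient norm (index 0) loses weight.
    print([round(x, 4) for x in new_w], l_grad)
```

In a Keras setup like the one described, the weights could be held in backend variables and updated once per epoch from a callback. That wiring, and the computation of the per-task gradient norms, is omitted here.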


Fig. 4. LoS prediction results of the proposed approach.

To predict inpatients' LoS and mortality, we use the MIMIC-IV v1.0 dataset combined with the MIMIC-CXR-JPG v2.0.0 dataset in this study. All the data are de-identified: patient identifiers are removed according to the Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor provision. The MIMIC-IV v1.0 database includes a wide range of patient records, such as demographic information, laboratory test results, procedures and diagnoses, free-text notes authored by clinicians, and medication orders. The tables used from the MIMIC-IV v1.0 dataset mainly include ADMISSIONS, DIAGNOSES_ICD, D_ICD_DIAGNOSES, PATIENTS, ICUSTAYS, PROCEDURES_ICD, and D_ICD_PROCEDURES.

The ADMISSIONS table provides a record for each hospitalization, including each patient's admission and discharge times and the source of the admission. The DIAGNOSES_ICD table gives the diagnosis category information. The PATIENTS table provides timing information and demographics for each patient, and the ICUSTAYS table provides the ICU data for each hospital admission. The PROCEDURES_ICD table presents the procedure codes for inpatients, and the corresponding procedure names are included in the D_ICD_PROCEDURES table. The patients' chest X-ray images are stored in the MIMIC-CXR-JPG database in JPG format with structured labels. The characteristics of the data we used are summarized in Table 2.

In this research, LoS is defined as the time from hospital admission to discharge, measured in days. Mortality is given by the hospital_expire_flag field in the ADMISSIONS table, where 1 indicates in-hospital death and 0 indicates survival. Data preprocessing is applied to the original tabular data, including data cleaning, data transformation, revision of outliers, and interpolation of missing data [46,47]. As a result, a total of 51 variables, covering blood, circulatory, digestive, endocrine, injury, and nervous categories, are extracted from the MIMIC-IV v1.0 dataset. For the CXR image data, only images with a ViewPosition of "PA (Posterior-Anterior)" or "AP (Anterior-Posterior)" are chosen in our experiments, since these are frontal views. As such, 4,144 CXR image samples are used for the LoS and mortality prediction experiments.

Fig. 3 portrays the distributions of LoS and mortality. Most LoS values are under 20 days, and the mortality distribution is class-imbalanced. The survival category (class 0) comprises the majority of samples, while the mortality category (class 1) contains only a small number of samples. To cope with this extreme imbalance, the Synthetic Minority Oversampling Technique (SMOTE) [48] is used to augment the minority class in the training set. SMOTE generates new synthetic samples so that the number of samples in the mortality category is very close to that in the survival category.

4.2. Experiment setup and performance metrics

The experiments are conducted using a Python 3.6 deep learning framework. Commonly used libraries, including Keras, Scikit-learn, TensorFlow, and Matplotlib, are used with the aid of a graphics processing unit (GPU). The hardware environment used to implement the proposed M3T-LM comprises an AMD EPYC 7502P 32-Core Processor, 32 GB of memory, and an RTX A6000 GPU.

To evaluate LoS prediction, we use the standard metrics of mean absolute error (MAE), root mean square error (RMSE), coefficient of determination (R-squared or R²), and explained variance (EVAR) [6,8,14]. They are calculated as:

$$MAE = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right| \qquad (7)$$

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2} \qquad (8)$$

$$R^2 = 1 - \sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2 \Big/ \sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2 \qquad (9)$$

$$EVAR = 1 - var(\mathbf{y} - \hat{\mathbf{y}}) / var(\mathbf{y}) \qquad (10)$$

where $\hat{y}_i$, $y_i$, and $\bar{y}$ denote the predicted value, the actual value, and the mean of the actual values, respectively. $N$ is the total number of samples, and $var(\cdot)$ is the variance function. MAE is the mean absolute error between the predicted and actual values, and RMSE is the square root of their mean squared error. R² measures the proportion of the variance in the dependent variable that is explained by the independent variables, and EVAR reflects the explanatory power of the model. For both EVAR and R², the ideal value is 1. For MAE and RMSE, larger values indicate worse performance.

Mortality prediction is evaluated with the widely used metrics Accuracy (Acc), Precision (Pre), Recall (Rec), and F1-Score (F1) [32]. They are calculated as:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \qquad (11)$$

$$Precision = \frac{TP}{TP + FP} \qquad (12)$$

$$Recall = \frac{TP}{TP + FN} \qquad (13)$$

$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall} \qquad (14)$$

where TP, FP, TN, and FN denote true positives, false positives, true negatives, and false negatives, respectively.
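SMOTE creates each synthetic sample by interpolating between a minority sample and one of its nearest minority-class neighbours. The plain-Python sketch below illustrates that idea only. The function names, the neighbour count k = 5, and the seeded random generator are illustrative assumptions, and the text does not state which implementation the authors used. A common choice in Python is the `SMOTE` class of the imbalanced-learn package, whose `fit_resample` method returns resampled features and labels. As in the text, oversampling should be applied to the training split only.

```python
import random
from typing import List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_oversample(minority: Sequence[Sequence[float]], n_synthetic: int,
                     k: int = 5, seed: int = 0) -> List[Vector]:
    """Create n_synthetic points on segments between minority samples and their neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of `base` (excluding itself).
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: squared_distance(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    deaths = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy minority class
    n_survivors = 10                                          # toy majority count
    new_points = smote_oversample(deaths, n_synthetic=n_survivors - len(deaths), k=2)
    print(len(deaths) + len(new_points))                      # classes now balanced
```

Each synthetic point lies on a segment between two real minority samples, which is why SMOTE tends to produce less exact duplication than simple random oversampling.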


Fig. 5. The RMSE and EVAR comparison of different methods.

Table 3
LoS prediction of different methods.
No. | Method | Training set: MAE | RMSE | R² | EVAR | Validation set: MAE | RMSE | R² | EVAR | Test set: MAE | RMSE | R² | EVAR | Time (h:mm:ss)
1 | MLP | 5.33 | 10.28 | 0.45 | 0.45 | 4.63 | 6.54 | 0.09 | 0.12 | 5.65 | 7.64 | 0.58 | 0.61 | 0:00:40
2 | XGBoost | 3.43 | 5.23 | 0.85 | 0.85 | 5.04 | 6.40 | 0.13 | 0.27 | 5.01 | 7.21 | 0.62 | 0.66 | 0:00:31
3 | RF | 4.55 | 8.05 | 0.66 | 0.66 | 0.66 | 7.04 | −0.05 | 0.09 | 6.28 | 9.22 | 0.38 | 0.43 | 0:00:31
4 | VGG-style CNN | 7.52 | 15.16 | −0.18 | 0.07 | 5.10 | 7.87 | −0.30 | 0.09 | 10.62 | 15.41 | −0.71 | 0.08 | 0:02:46
5 | 1D-CNN | 6.76 | 14.18 | −0.03 | 0.14 | 4.28 | 6.97 | −0.02 | 0.16 | 9.40 | 14.02 | −0.41 | 0.12 | 0:03:04
6 | M3T-LM | 3.68 | 10.44 | 0.44 | 0.44 | 3.85 | 5.30 | 0.41 | 0.48 | 5.54 | 7.18 | 0.62 | 0.63 | 0:34:22
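The LoS columns of Table 3 are based on Eqs. (7)-(10), which can be computed directly as in the dependency-free sketch below. The function names are illustrative. It uses the population variance for var(·); the text does not specify the normalization, but it cancels in the EVAR ratio. Scikit-learn, listed among the libraries used, offers equivalents in `sklearn.metrics`: `mean_absolute_error`, `mean_squared_error`, `r2_score`, and `explained_variance_score`.

```python
import math
from typing import Dict, Sequence


def _variance(values: Sequence[float]) -> float:
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    y_bar = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return {
        "MAE": sum(abs(e) for e in errors) / n,               # Eq. (7)
        "RMSE": math.sqrt(ss_res / n),                        # Eq. (8)
        "R2": 1.0 - ss_res / ss_tot,                          # Eq. (9)
        "EVAR": 1.0 - _variance(errors) / _variance(y_true),  # Eq. (10)
    }


if __name__ == "__main__":
    los_true = [2.0, 4.0, 6.0, 8.0]  # toy LoS values in days
    los_pred = [3.0, 5.0, 7.0, 9.0]  # every prediction is one day too long
    print(regression_metrics(los_true, los_pred))
```

The toy case shows why R² and EVAR can differ, as they do for several rows of Table 3. A constant one-day bias leaves the error variance at zero, so EVAR stays at 1, while R² drops to 0.8 because it penalizes the bias itself.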

Fig. 6. The ROC curve of mortality prediction.

4.3. Results and discussion

To demonstrate the effectiveness of the proposed method, we select several widely used methods for comparative analysis: multilayer perceptron (MLP), RF, and extreme gradient boosting (XGBoost), along with a VGG-style CNN and a one-dimensional CNN (1D-CNN). Unlike the proposed approach, which fits multi-modal data, these compared methods can only take a single data modality, such as numerical data. Therefore, they are run on the tabular data of the MIMIC-IV v1.0 dataset. To ensure a fair comparison, the core hyperparameters of the compared models are set to the same values as those of the proposed approach. Specifically, the mini-batch size is set to 64, with a learning rate of 1 × 10⁻³, 30 training epochs, and the RMSprop [49] optimizer. The dataset is randomly divided into training, validation, and test sets in a 7:2:1 ratio. We employ the leave-one-out cross-validation approach for performance evaluation, where 90% of the samples are used for training and validation and the remaining 10% for testing.

Fig. 4 illustrates the LoS prediction performance of the proposed approach on randomly selected samples from both the validation and test datasets. In Fig. 4, the orange curve represents the actual LoS values and the blue curve denotes the predicted LoS. The predicted values are very close to the actual values for most samples, indicating the efficacy of the proposed approach.

Table 3 presents the overall LoS prediction performance of different methods. Fig. 5 visualizes the RMSE and EVAR comparison of different


Fig. 7. The test confusion matrices of different methods.

Table 4
Mortality prediction of different methods (metrics in %).
No. | Method | Training set: Acc | Pre | Rec | F1 | Validation set: Acc | Pre | Rec | F1 | Test set: Acc | Pre | Rec | F1 | Time (h:mm:ss)
1 | MLP | 91.28 | 85.74 | 98.42 | 91.64 | 91.44 | 85.25 | 99.46 | 91.81 | 73.01 | 33.87 | 22.82 | 27.27 | 0:00:36
2 | XGBoost | 88.60 | 83.98 | 94.56 | 88.96 | 88.96 | 83.72 | 95.74 | 89.33 | 70.84 | 28.98 | 21.73 | 24.84 | 0:00:37
3 | RF | 89.11 | 82.10 | 99.21 | 89.85 | 90.76 | 84.13 | 99.64 | 91.23 | 70.12 | 27.77 | 21.73 | 24.39 | 0:00:36
4 | VGG-style CNN | 98.32 | 98.31 | 98.26 | 98.29 | 95.59 | 95.51 | 95.51 | 95.51 | 77.83 | 22.72 | 6.25 | 9.80 | 0:01:58
5 | 1D-CNN | 98.24 | 98.77 | 97.62 | 98.19 | 94.91 | 96.76 | 92.75 | 94.71 | 78.31 | 27.27 | 7.50 | 11.76 | 0:03:08
6 | M3T-LM | 98.91 | 97.84 | 100.00 | 98.90 | 98.72 | 97.80 | 99.65 | 98.71 | 95.42 | 91.78 | 83.75 | 87.58 | 0:27:23
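As a worked check of Eqs. (11)-(14), the M3T-LM test row of Table 4 can be reproduced from the confusion-matrix counts given for Fig. 7(f). With mortality as the positive class, these are 67 true positives, 13 false negatives, 6 false positives, and 329 true negatives. This is a plain-Python sketch, and the function name is illustrative.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Eqs. (11)-(14), returned as percentages rounded to two decimals."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)          # Eq. (11)
    precision = tp / (tp + fp)                          # Eq. (12)
    recall = tp / (tp + fn)                             # Eq. (13)
    f1 = 2 * precision * recall / (precision + recall)  # Eq. (14)
    scores = {"Acc": accuracy, "Pre": precision, "Rec": recall, "F1": f1}
    return {name: round(100 * value, 2) for name, value in scores.items()}


if __name__ == "__main__":
    # Test-set counts for M3T-LM, as described for Fig. 7(f);
    # expected to match Table 4's test row (95.42, 91.78, 83.75, 87.58).
    print(classification_metrics(tp=67, fp=6, tn=329, fn=13))
```

The counts also imply a test set of 80 mortality and 335 survival samples, so recall (67/80) is the weakest of the four metrics.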

methods. Table 3 shows that the proposed approach achieves R² values of 0.62 and 0.41 and RMSE values of 7.18 and 5.30 on the test and validation sets, respectively. These are the best among the compared methods, except that XGBoost matches the test R² of 0.62 and obtains a lower test MAE (5.01). Notably, the ensemble learning algorithms XGBoost and RF perform better than the proposed M3T-LM on the training set. However, both show a significant decline in validation and test performance, which is also visible in Fig. 5. The proposed M3T-LM takes about 34 min for 30 epochs of training, as reported in Table 3. Because of the large number of parameters in the proposed deep learning framework and the concurrent execution of two tasks, the proposed model requires considerably more time than the benchmark methods, each of which takes at most about 3 min. Although its time consumption is higher, it remains manageable and could be further reduced by various optimization techniques.

Next, we evaluate the performance of the proposed approach for inpatient mortality prediction. Fig. 6 depicts the receiver operating characteristic (ROC) curves of the proposed approach, and Fig. 7 portrays the test confusion matrices of different methods. Table 4 presents the overall mortality prediction performance of different methods. As depicted in Fig. 6, the proposed approach exhibits superior operating characteristics: the ROC curves of all categories lie close to the top-left corner of the figure. This indicates the effectiveness of the proposed approach for mortality prediction. In addition, the confusion matrix in Fig. 7(f) shows that M3T-LM correctly identifies most samples. Of the 80 mortality samples, 67 are correctly recognized and 13 are misclassified as survival. Of the 335 survival samples, 329 are correctly identified and 6 are misclassified as mortality. As a result, the proposed approach achieves a test Accuracy of 95.42%, and its test Precision, Recall, and F1-Score reach 91.78%, 83.75%, and 87.58%, respectively, as presented in Table 4.

Moreover, Table 5 compares the proposed method with results reported in the recent literature on LoS and mortality prediction. The proposed approach achieves the highest mortality F1-score (0.876) among the listed methods and the second-highest LoS R² (0.62, after 0.64 for TDSC-CAFF). However, its LoS RMSE (7.18) is higher than the values reported for the other methods. In summary, the outcomes of the comparative


Table 5
Comparison with state-of-the-art methods.
ID | Reference | Year | Description | RMSE (LoS) | R² (LoS) | F1 (mortality)
1 | Vaswani et al. [50] | 2017 | Transformer | 6.18 | 0.27 | 0.738
2 | Harutyunyan et al. [51] | 2019 | LSTM | 6.61 | 0.28 | 0.745
3 | Ma et al. [52] | 2020 | ConCare | – | – | 0.778
4 | Rocheteau et al. [53] | 2021 | Temporal Pointwise Convolution (TPC) | 5.20 | 0.59 | 0.784
5 | Al-Dailami et al. [2] | 2022 | Temporal Dilated Separable Convolution with Context-Aware Feature Fusion (TDSC-CAFF) | 4.30 | 0.64 | 0.821
6 | Shu et al. [54] | 2023 | ML-based scoring models | – | – | 0.613
7 | This study | 2024 | M3T-LM | 7.18 | 0.62 | 0.876

Table 6
LoS prediction results with hyperparameter optimization.
Mini-batch size | lr = 0.001: MAE | RMSE | R² | EVAR | lr = 0.002: MAE | RMSE | R² | EVAR | lr = 0.005: MAE | RMSE | R² | EVAR | Time (h:mm:ss)
32 | 4.23 | 6.31 | 0.38 | 0.40 | 4.61 | 6.47 | 0.35 | 0.35 | 4.81 | 6.97 | 0.25 | 0.25 | 0:52:02
64 | 5.54 | 7.18 | 0.62 | 0.63 | 4.55 | 6.30 | 0.38 | 0.39 | 4.59 | 6.88 | 0.26 | 0.30 | 0:27:23
128 | 4.62 | 6.87 | 0.27 | 0.37 | 4.72 | 6.40 | 0.36 | 0.38 | 5.19 | 7.37 | 0.15 | 0.16 | 0:14:21
256 | 4.41 | 6.38 | 0.37 | 0.38 | 4.64 | 7.05 | 0.23 | 0.38 | 10.22 | 12.99 | −1.62 | 0.02 | 0:08:28
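A grid search like the one in Section 4.4 enumerates every combination of the two hyperparameter sets and keeps the best one under a chosen metric. In the sketch below, `evaluate` is a hypothetical callback that stands in for training and validating M3T-LM with a given setting. The usage example plugs in R² and MAE values copied from Table 6 instead of training. It shows that the selected setting depends on the selection metric.

```python
from itertools import product
from typing import Callable, Dict, Tuple

BATCH_SIZES = (32, 64, 128, 256)
LEARNING_RATES = (0.001, 0.002, 0.005)

Config = Tuple[int, float]


def grid_search(evaluate: Callable[[int, float], Dict[str, float]],
                metric: str = "R2",
                higher_is_better: bool = True) -> Tuple[Config, Dict[Config, Dict[str, float]]]:
    """Evaluate every (batch size, learning rate) pair and return the best one."""
    results = {(bs, lr): evaluate(bs, lr) for bs, lr in product(BATCH_SIZES, LEARNING_RATES)}
    choose = max if higher_is_better else min
    best = choose(results, key=lambda cfg: results[cfg][metric])
    return best, results


# R2 and MAE values copied from Table 6, used in place of real training runs.
TABLE6 = {
    (32, 0.001): {"R2": 0.38, "MAE": 4.23}, (32, 0.002): {"R2": 0.35, "MAE": 4.61},
    (32, 0.005): {"R2": 0.25, "MAE": 4.81}, (64, 0.001): {"R2": 0.62, "MAE": 5.54},
    (64, 0.002): {"R2": 0.38, "MAE": 4.55}, (64, 0.005): {"R2": 0.26, "MAE": 4.59},
    (128, 0.001): {"R2": 0.27, "MAE": 4.62}, (128, 0.002): {"R2": 0.36, "MAE": 4.72},
    (128, 0.005): {"R2": 0.15, "MAE": 5.19}, (256, 0.001): {"R2": 0.37, "MAE": 4.41},
    (256, 0.002): {"R2": 0.23, "MAE": 4.64}, (256, 0.005): {"R2": -1.62, "MAE": 10.22},
}

if __name__ == "__main__":
    lookup = lambda bs, lr: TABLE6[(bs, lr)]
    print(grid_search(lookup, "R2")[0])                           # best by R2
    print(grid_search(lookup, "MAE", higher_is_better=False)[0])  # best by MAE
```

Selecting by R² returns a mini-batch size of 64 with lr = 0.001, the setting reported in Section 4.4. Selecting by MAE instead returns a mini-batch size of 32 with the same learning rate.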

Table 7
The results of ablation experiments.
Ablation approach | LoS prediction (test): MAE | RMSE | R² | EVAR | Mortality prediction (test, %): Acc | Pre | Rec | F1 | Time (h:mm:ss)
Delete images (CXRMDL) | 6.33 | 10.14 | 0.39 | 0.44 | 89.39 | 100.00 | 45.00 | 62.06 | 0:04:56
Delete long texts (LSTM) | 6.94 | 10.56 | 0.36 | 0.39 | 60.96 | 33.05 | 100.00 | 49.69 | 0:24:12
Delete attention module | 5.84 | 9.37 | 0.37 | 0.42 | 92.53 | 72.47 | 98.75 | 83.59 | 0:26:52
This study | 5.54 | 7.18 | 0.62 | 0.63 | 95.42 | 91.78 | 83.75 | 87.58 | 0:27:23
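The changes quoted in the ablation discussion are simple differences from the full model's row in Table 7. For the mortality metrics, which are percentages, these differences are percentage points. The short sketch below recomputes them from the Table 7 values. The dictionary and function names are illustrative.

```python
from typing import Dict

FULL_MODEL = {"MAE": 5.54, "RMSE": 7.18, "R2": 0.62, "EVAR": 0.63,
              "Acc": 95.42, "Pre": 91.78, "Rec": 83.75, "F1": 87.58}

ABLATIONS = {
    "Delete images (CXRMDL)": {"MAE": 6.33, "RMSE": 10.14, "R2": 0.39, "EVAR": 0.44,
                               "Acc": 89.39, "Pre": 100.00, "Rec": 45.00, "F1": 62.06},
    "Delete long texts (LSTM)": {"MAE": 6.94, "RMSE": 10.56, "R2": 0.36, "EVAR": 0.39,
                                 "Acc": 60.96, "Pre": 33.05, "Rec": 100.00, "F1": 49.69},
    "Delete attention module": {"MAE": 5.84, "RMSE": 9.37, "R2": 0.37, "EVAR": 0.42,
                                "Acc": 92.53, "Pre": 72.47, "Rec": 98.75, "F1": 83.59},
}


def deltas(ablated: Dict[str, float], full: Dict[str, float] = FULL_MODEL) -> Dict[str, float]:
    """Ablated minus full model; mortality metrics are in percentage points."""
    return {k: round(ablated[k] - full[k], 2) for k in full}


if __name__ == "__main__":
    for name, row in ABLATIONS.items():
        print(name, deltas(row))
```

Positive MAE and RMSE deltas and negative R², EVAR, and classification deltas all indicate a worse ablated model.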

analysis support the effectiveness of the proposed method in predicting both LoS and mortality.

4.4. Hyperparameter optimization

In this section, we perform a grid search for inpatient LoS prediction over two essential hyperparameters: the mini-batch size ($bs$) and the learning rate ($lr$). The mini-batch size is searched over $|B| \in \{32, 64, 128, 256\}$ and the learning rate over $lr \in \{0.001, 0.002, 0.005\}$. We train our model with each combination for 30 epochs on the publicly available MIMIC-IV v1.0 dataset, using the same splits as described in Section 4.3. In terms of R² and EVAR, the best hyperparameter set for LoS prediction is a mini-batch size of 64 with $lr = 0.001$. Several other settings yield lower MAE and RMSE, for example a mini-batch size of 32 with the same learning rate. Table 6 presents the prediction performance of the proposed method under the different hyperparameter settings.

4.5. Ablation study

To better understand how the sub-models and different modalities contribute to the system's performance, we perform an ablation study on our model.

Table 7 summarizes the ablation results. In the first ablation experiment, we remove the CXR image data modality and delete the CXRMDL module from our model. Test performance decreases markedly. In LoS prediction, the MAE and RMSE rise to 6.33 and 10.14 (increases of 0.79 and 2.96), and the R² and EVAR drop to 0.39 and 0.44 (decreases of 0.23 and 0.19). In mortality prediction, the Accuracy, Recall, and F1-Score drop to 89.39%, 45.00%, and 62.06% (decreases of 6.03, 38.75, and 25.52 percentage points), respectively. On the other hand, the training time of this ablation model falls sharply, from 27 min 23 s to 4 min 56 s (a reduction of over 22 min). These results demonstrate that removing the CXR image data modality substantially degrades performance compared with the multi-modal data aggregation model.

Subsequently, we remove the long text data modality handled by the LSTM model in our network. Performance drops significantly for this ablation model. In LoS prediction, the test MAE and RMSE rise to 6.94 and 10.56 (increases of 1.40 and 3.38), and the test R² and EVAR drop to 0.36 and 0.39 (decreases of 0.26 and 0.24), respectively. Likewise, in mortality prediction, the test Accuracy, Precision, and F1-Score drop to 60.96%, 33.05%, and 49.69% (decreases of 34.46, 58.73, and 37.89 percentage points), respectively. This ablated model still attains a higher mortality F1-score than the baselines in Table 4. However, it suffers a notable decline compared with the multi-modal data aggregation model proposed in this study.

In the third ablation experiment, we remove the newly added attention module from the network to investigate its contribution. The overall performance drop is smaller than in the other two ablations. In LoS prediction, the test MAE and RMSE of the ablated model rise to 5.84 and 9.37 (increases of 0.30 and 2.19). In mortality prediction, the test Accuracy, Precision, and F1-Score of the ablated model drop to 92.53%, 72.47%, and 83.59% (decreases of 2.89, 19.31, and 3.99 percentage points), respectively. This experiment shows that the model with the attention mechanism performs slightly better than the model without it, and that removing the attention module has a minor negative influence on model accuracy.

5. Conclusion

Accurately estimating inpatient LoS and mortality is a challenging daily task in health care. This study proposes a novel Multi-Modal Multi-Task learning model, called M3T-LM, to predict patient outcomes, specifically remaining LoS and inpatient mortality. Leveraging mixed regression and classification tasks, M3T-LM simultaneously predicts inpatient LoS and mortality from multi-modal data. Because the distribution of LoS is skewed, the proposed M3T-LM treats LoS prediction as a regression task, which is more informative because it estimates the actual number of days rather than assigning classes. At the same time, M3T-LM integrates mortality prediction,


recognizing its close association with inpatient LoS. The two tasks share common feature representations for mixed-task model training. The main advantage of the proposed method is its ability to exploit the inherent correlation between the task types to guide feature selection, which can further improve learning performance. In addition, the proposed method uses multiple data modalities effectively in a unified model. This can lead to more effective resource allocation, higher prognostic accuracy, and better-informed clinical decision-making. Experimental findings show that M3T-LM outperforms the baseline methods evaluated in this study on most metrics for both tasks. It also achieves the highest mortality F1-score among the compared state-of-the-art methods.

While the proposed approach yields satisfactory results, it has limitations related to computational complexity. In the future, we plan to incorporate model pruning algorithms to simplify the model and enhance its efficiency. Another direction responds to escalating concerns regarding data privacy and security. We plan to incorporate privacy-preserving techniques into our model to safeguard sensitive information while maintaining effective data fusion without harming predictive performance.

CRediT authorship contribution statement

Junde Chen: Writing – review & editing, Writing – original draft, Validation, Software, Methodology, Investigation, Data curation, Conceptualization. Qing Li: Writing – review & editing, Visualization, Formal analysis. Feng Liu: Writing – review & editing, Validation. Yuxin Wen: Writing – review & editing, Supervision, Methodology, Funding acquisition, Conceptualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This research is partially funded by the National Science Foundation, USA under Grant No. 2246158.

References

[1] J. Sheng, J. Amankwah-Amoah, Z. Khan, X. Wang, COVID-19 pandemic in the new era of big data analytics: Methodological innovations and future research directions, Br. J. Manage. 32 (4) (2021) 1164–1183.
[2] A. Al-Dailami, H. Kuang, J. Wang, Predicting length of stay in ICU and mortality with temporal dilated separable convolution and context-aware feature fusion, Comput. Biol. Med. 151 (2022) 106278.
[3] American Hospital Association, AHA Hospital Statistics: Fast Facts on US Hospitals, American Hospital Association, 2017, available at: www.aha.org (Accessed 31 May 2017).
[4] R. Resar, K. Nolan, D. Kaczynski, K. Jensen, Using real-time demand capacity management to improve hospitalwide patient flow, Jt. Comm. J. Qual. Patient Saf. 37 (5) (2011) 217–AP3.
[5] N. Meo, E. Paul, C. Wilson, J. Powers, M. Magbual, K.M. Miles, Introducing an electronic tracking tool into daily multidisciplinary discharge rounds on a medicine service: a quality improvement project to reduce length of stay, BMJ Open Qual. 7 (3) (2018) e000174.
[6] I.T. Peres, S. Hamacher, F.L.C. Oliveira, F.A. Bozza, J.I.F. Salluh, Data-driven methodology to predict the ICU length of stay: A multicentre study of 99,492 admissions in 109 Brazilian units, Anaesth. Crit. Care Pain Med. 41 (6) (2022) 101142.
[7] L. Clifton, D.A. Clifton, M.A. Pimentel, P.J. Watkinson, L. Tarassenko, Gaussian processes for personalized e-health monitoring with wearable sensors, IEEE Trans. Biomed. Eng. 60 (1) (2012) 193–197.
[8] L. Hempel, S. Sadeghi, T. Kirsten, Prediction of intensive care unit length of stay in the MIMIC-IV dataset, Appl. Sci. 13 (12) (2023) 6930.
[9] G. Harerimana, J.W. Kim, B. Jang, A deep attention model to forecast the length of stay and the in-hospital mortality right on admission from ICD codes and demographic data, J. Biomed. Inform. 118 (2021) 103778.
[10] W.E. Muhlestein, D.S. Akagi, J.M. Davies, L.B. Chambless, Predicting inpatient length of stay after brain tumor surgery: developing machine learning ensembles to improve predictive performance, Neurosurgery 85 (3) (2019) 384.
[11] R. Yousef, S. Khan, G. Gupta, T. Siddiqui, B.M. Albahlal, S.A. Alajlan, M.A. Haq, U-net-based models towards optimal MR brain image segmentation, Diagnostics 13 (9) (2023) 1624.
[12] A.W. Salehi, S. Khan, G. Gupta, B.I. Alabduallah, A. Almjally, H. Alsolai, T. Siddiqui, A. Mellit, A study of CNN and transfer learning in medical imaging: Advantages, challenges, future scope, Sustainability 15 (7) (2023) 5930.
[13] S.-C. Huang, A. Pareek, S. Seyyedi, I. Banerjee, M.P. Lungren, Fusion of medical imaging and electronic health records using deep learning: a systematic review and implementation guidelines, NPJ Dig. Med. 3 (1) (2020) 136.
[14] J. Chen, Y. Wen, M. Pokojovy, T.-L.B. Tseng, P. McCaffrey, A. Vo, E. Walser, S. Moen, Multi-modal learning for inpatient length of stay prediction, Comput. Biol. Med. 171 (2024) 108121.
[15] D. Feng, C. Haase-Schütz, L. Rosenbaum, H. Hertlein, C. Glaeser, F. Timm, W. Wiesbeck, K. Dietmayer, Deep multi-modal object detection and semantic segmentation for autonomous driving: Datasets, methods, and challenges, IEEE Trans. Intell. Transp. Syst. 22 (3) (2020) 1341–1360.
[16] S. Nemati, R. Rohani, M.E. Basiri, M. Abdar, N.Y. Yen, V. Makarenkov, A hybrid latent space data fusion method for multimodal emotion recognition, IEEE Access 7 (2019) 172948–172964.
[17] J. Petrich, Z. Snow, D. Corbin, E.W. Reutzel, Multi-modal sensor fusion with machine learning for data-driven process monitoring for additive manufacturing, Addit. Manuf. 48 (2021) 102364.
[18] L. Men, N. Ilk, X. Tang, Y. Liu, Multi-disease prediction using LSTM recurrent neural networks, Expert Syst. Appl. 177 (2021) 114905.
[19] T.-L.-T. Le, N. Thome, S. Bernard, V. Bismuth, F. Patoureaux, Multitask classification and segmentation for cancer diagnosis in mammography, 2019, arXiv preprint arXiv:1909.05397.
[20] R. Yu, Y. Zheng, R. Zhang, Y. Jiang, C.C. Poon, Using a multi-task recurrent neural network with attention mechanisms to predict hospital mortality of patients, IEEE J. Biomed. Health Inform. 24 (2) (2019) 486–492.
[21] K. Tan, W. Huang, X. Liu, J. Hu, S. Dong, A multi-modal fusion framework based on multi-task correlation learning for cancer prognosis prediction, Artif. Intell. Med. 126 (2022) 102260.
[22] Z. Lu, W. Chang, S. Meng, M. Xue, J. Xie, J. Xu, H. Qiu, Y. Yang, F. Guo, The effect of high-flow nasal oxygen therapy on postoperative pulmonary complications and hospital length of stay in postoperative patients: a systematic review and meta-analysis, J. Intensiv. Care Med. 35 (10) (2020) 1129–1140.
[23] A. Morton, E. Marzban, G. Giannoulis, A. Patel, R. Aparasu, I.A. Kakadiaris, A comparison of supervised machine learning techniques for predicting short-term in-hospital length of stay among diabetic patients, in: 2014 13th International Conference on Machine Learning and Applications, IEEE, 2014, pp. 428–431.
[24] B. Thompson, K.O. Elish, R. Steele, Machine learning-based prediction of prolonged length of stay in newborns, in: 2018 17th IEEE International Conference on Machine Learning and Applications, ICMLA, IEEE, 2018, pp. 1454–1459.
[25] J. Chen, T. Di Qi, J. Vu, Y. Wen, A deep learning approach for inpatient length of stay and mortality prediction, J. Biomed. Inform. 147 (2023) 104526.
[26] P.-F.J. Tsai, P.-C. Chen, Y.-Y. Chen, H.-Y. Song, H.-M. Lin, F.-M. Lin, Q.-P. Huang, et al., Length of hospital stay prediction at the admission stage for cardiology patients using artificial neural network, J. Healthc. Eng. 2016 (2016).
[27] M. Rouzbahman, A. Jovicic, M. Chignell, Can cluster-boosted regression improve prediction of death and length of stay in the ICU? IEEE J. Biomed. Health Inform. 21 (3) (2016) 851–858.
[28] D. Ruzicka, T. Kondo, G. Fujimoto, A.P. Craig, S.-W. Kim, H. Mikamo, Development of a clinical prediction model for recurrence and mortality outcomes after Clostridioides difficile infection using a machine learning approach, Anaerobe 77 (2022) 102628.
[29] S. Ganapathy, K. Harichandrakumar, P. Penumadu, K. Tamilarasu, N.S. Nair, Comparison of Bayesian, frequentist and machine learning models for predicting the two-year mortality of patients diagnosed with squamous cell carcinoma of the oral cavity, Clin. Epidemiology Glob. Health 17 (2022) 101145.
[30] W. Caicedo-Torres, J. Gutierrez, ISeeU2: Visually interpretable mortality prediction inside the ICU using deep learning and free-text medical notes, Expert Syst. Appl. 202 (2022) 117190.
[31] G. Zeng, J. Zhuang, H. Huang, M. Tian, Y. Gao, Y. Liu, X. Yu, Use of deep learning for continuous prediction of mortality for all admissions in intensive care units, Tsinghua Sci. Technol. 28 (4) (2023) 639–648.
[32] T. Kondo, A. Teramoto, E. Watanabe, Y. Sobue, H. Izawa, K. Saito, H. Fujita, Prediction of short-term mortality of cardiac care unit patients using image-transformed ECG waveforms, IEEE J. Transl. Eng. Health Med. 11 (2023) 191–198.
[33] D.D. Solomon, S. Khan, S. Garg, G. Gupta, A. Almjally, B.I. Alabduallah, H.S. Alsagri, M.M. Ibrahim, A.M.A. Abdallah, Hybrid majority voting: Prediction and classification model for obesity, Diagnostics 13 (15) (2023) 2610.
[34] Z.-C. Zhang, X. Zhao, G. Dong, X.-M. Zhao, Improving Alzheimer's disease diagnosis with multi-modal PET embedding features by a 3D multi-task MLP-mixer neural network, IEEE J. Biomed. Health Inform. (2023).


[35] W. Shao, T. Wang, L. Sun, T. Dong, Z. Han, Z. Huang, J. Zhang, D. Zhang, K. Huang, Multi-task multi-modal learning for joint diagnosis and prognosis of human cancers, Med. Image Anal. 65 (2020) 101795.
[36] M. Liu, J. Zhang, E. Adeli, D. Shen, Joint classification and regression via deep multi-task multi-channel learning for Alzheimer's disease diagnosis, IEEE Trans. Biomed. Eng. 66 (5) (2018) 1195–1206.
[37] J. Hu, L. Shen, G. Sun, Squeeze-and-excitation networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7132–7141.
[38] C. Chen, T. Wang, Y. Liu, L. Cheng, J. Qin, Spatial attention-based convolutional transformer for bearing remaining useful life prediction, Meas. Sci. Technol. 33 (11) (2022) 114001.
[39] J. Brownlee, How to prepare text data for deep learning with Keras, Machine Learning Mastery, 2019. Available at: https://machinelearningmastery.com/prepare-text-data-deep-learning-keras/ (accessed 15 Nov. 2020).
[40] N. Hayat, K.J. Geras, F.E. Shamout, MedFuse: Multi-modal fusion with clinical time-series data and chest X-ray images, in: Machine Learning for Healthcare Conference, PMLR, 2022, pp. 479–503.
[41] C. Szegedy, S. Ioffe, V. Vanhoucke, A. Alemi, Inception-v4, Inception-ResNet and the impact of residual connections on learning, in: Proceedings of the AAAI Conference on Artificial Intelligence, 31 (1), 2017.
[42] Q. Wang, Y. Ma, K. Zhao, Y. Tian, A comprehensive survey of loss functions in machine learning, Ann. Data Sci. (2020) 1–26.
[43] A.E. Johnson, L. Bulgarelli, L. Shen, A. Gayles, A. Shammout, S. Horng, T.J. Pollard, S. Hao, B. Moody, B. Gow, et al., MIMIC-IV, a freely accessible electronic health record dataset, Sci. Data 10 (1) (2023) 1.
[44] A.E. Johnson, T.J. Pollard, L. Shen, L.-w.H. Lehman, M. Feng, M. Ghassemi, B. Moody, P. Szolovits, L. Anthony Celi, R.G. Mark, MIMIC-III, a freely accessible critical care database, Sci. Data 3 (1) (2016) 1–9.
[45] A.E. Johnson, T.J. Pollard, N.R. Greenbaum, M.P. Lungren, C.-y. Deng, Y. Peng, Z. Lu, R.G. Mark, S.J. Berkowitz, S. Horng, MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs, 2019, arXiv preprint arXiv:1901.07042.
[46] C. Fan, M. Chen, X. Wang, J. Wang, B. Huang, A review on data preprocessing techniques toward efficient and reliable knowledge discovery from building operational data, Frontiers in Energy Research 9 (2021) 652801.
[47] A. Kiourtis, A. Mavrogiorgou, G. Manias, D. Kyriazis, Ontology-driven data cleaning towards lossless data compression, in: Challenges of Trustable AI and Added-Value on Health, IOS Press, 2022, pp. 421–422.
[48] N.V. Chawla, K.W. Bowyer, L.O. Hall, W.P. Kegelmeyer, SMOTE: synthetic minority over-sampling technique, J. Artif. Intell. Res. 16 (2002) 321–357.
[49] T. Tieleman, G. Hinton, RMSprop: Divide the gradient by a running average of its recent magnitude, Coursera: Neural Networks for Machine Learning, COURSERA Neural Netw. Mach. Learn. 17 (2012).
[50] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, Adv. Neural Inf. Process. Syst. 30 (2017).
[51] H. Harutyunyan, H. Khachatrian, D.C. Kale, G. Ver Steeg, A. Galstyan, Multitask learning and benchmarking with clinical time series data, Sci. Data 6 (1) (2019) 96.
[52] L. Ma, C. Zhang, Y. Wang, W. Ruan, J. Wang, W. Tang, X. Ma, X. Gao, J. Gao, ConCare: Personalized clinical feature embedding via capturing the healthcare context, in: Proceedings of the AAAI Conference on Artificial Intelligence, 34 (01), 2020, pp. 833–840.
[53] E. Rocheteau, P. Liò, S. Hyland, Temporal pointwise convolutional networks for length of stay prediction in the intensive care unit, in: Proceedings of the Conference on Health, Inference, and Learning, 2021, pp. 58–68.
[54] T. Shu, J. Huang, J. Deng, H. Chen, Y. Zhang, M. Duan, Y. Wang, X. Hu, X. Liu, Development and assessment of scoring model for ICU stay and mortality prediction after emergency admissions in ischemic heart disease: a retrospective study of MIMIC-IV databases, Intern. Emerg. Med. 18 (2) (2023) 487–497.



