Multitask multimodal 2
Keywords: Multi-task learning; Data-fusion model; Length of stay prediction; Deep learning
Ensuring accurate predictions of inpatient length of stay (LoS) and mortality is essential for enhancing hospital service efficiency, particularly in light of the constraints posed by limited healthcare resources. Integrative analysis of heterogeneous clinical record data from different sources holds great promise for improving the prognosis and diagnosis of LoS and mortality. Currently, most existing studies focus solely on a single data modality or adopt single-task learning, i.e., they train the LoS and mortality tasks separately. This limits the utilization of available multi-modal data and prevents the sharing of feature representations that could capture correlations between different tasks, ultimately hindering the model's performance. To address this challenge, this study proposes a novel Multi-Modal Multi-Task learning model, termed M3T-LM, that integrates clinical records to predict inpatients' LoS and mortality simultaneously. The M3T-LM framework incorporates multiple data modalities by constructing sub-models tailored to each modality. Specifically, a novel attention-embedded one-dimensional (1D) convolutional neural network (CNN) is designed to handle numerical data. Clinical notes are converted into sequence data, and two long short-term memory (LSTM) networks are then used to model these textual sequences. A two-dimensional (2D) CNN architecture, denoted CRXMDL, is designed to extract high-level features from chest X-ray (CXR) images. Subsequently, the sub-models are integrated to form the M3T-LM, which captures the correlations between the patient LoS and mortality prediction tasks. The effectiveness of the proposed method is validated on the MIMIC-IV dataset. The proposed method attained a test MAE of 5.54 for LoS prediction and a test F1 of 0.876 for mortality prediction. The experimental results demonstrate that our approach outperforms state-of-the-art (SOTA) methods in tackling mixed regression and classification tasks.
∗ Corresponding author.
E-mail addresses: [email protected] (J. Chen), [email protected] (Q. Li), [email protected] (F. Liu), [email protected] (Y. Wen).
https://doi.org/10.1016/j.compbiomed.2024.109237
Received 25 May 2024; Received in revised form 29 September 2024; Accepted 30 September 2024
Available online 7 October 2024
0010-4825/© 2024 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
J. Chen et al. Computers in Biology and Medicine 183 (2024) 109237
1. Introduction
Healthcare systems continue to face the significant challenge of providing timely patient care while optimizing resource utilization, especially in the wake of the COVID-19 pandemic [1]. Inpatients' length of stay (LoS) and mortality are two crucial metrics that hospitals use to assess clinical quality and optimize resource allocation [2]. Prolonged LoS raises the likelihood of adverse events, such as poor nutritional status, hospital-acquired infections, adverse drug events, and various other complications. Furthermore, prolonged LoS increases the odds of inpatient mortality [3]. This has prompted hospitals to invest intensive effort in resource allocation. Real-time demand capacity (RTDC) management [4] and multidisciplinary discharge rounds (MDRs) [5] have shown great promise as best practices in addressing these challenges, but their effectiveness relies on the accurate prediction of inpatients' LoS and mortality. With the rising prevalence of electronic health record (EHR) systems, patients' clinical records, such as laboratory test results, vital signs, demographic information, clinical notes, and other details, are now accessible. Leveraging this abundant information, sophisticated data-driven algorithms enable precise predictions of inpatients' LoS and mortality.
This research focuses on improving the service efficiency and management capabilities of hospitals by simultaneously predicting inpatient LoS and mortality. As mentioned previously, patients' LoS and mortality in a hospital are crucial indicators for assessing the quality of care and the effective allocation of healthcare resources. In this study, predicting inpatient LoS (number of days) is therefore treated as a regression problem, and predicting mortality as a binary classification problem. From the recent literature, it is evident that machine learning (ML) offers unprecedented opportunities
to improve patient and clinical outcomes, owing to its great potential for learning essential features and extracting meaningful insights from data [6]. Notable works include, but are not limited to, Gaussian Process Regression for clinical e-health modeling [7], prediction of Intensive Care Unit (ICU) LoS based on four ML methods, namely Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and XGBoost [8], a Hierarchical Attention Network (HAN) for LoS and mortality prediction [9], ensemble learning for improving predictive performance [10], U-Net-based models for medical image segmentation [11], and CNNs for medical image classification [12]. However, the majority of existing ML models in healthcare either rely exclusively on a single data modality or address only a single task [13]. With the increasing availability and accessibility of multi-modal data, multi-modal deep learning (DL) models [14], which aim to integrate data of different distributions, sources, and formats into a unified space where both intra-modality and cross-modality aspects can be uniformly captured, have been successful in a wide range of domains, such as autonomous driving and video classification, by combining visual features from cameras with data from Light Detection and Ranging (LiDAR) sensors [15]; emotion recognition, through the fusion of audiovisual content with users' textual comments [16]; and process monitoring in manufacturing using multimodal sensor data [17]. The main challenge of multi-modal data fusion is that data from different sources and file formats are heterogeneous, high-dimensional, and seldom uniform, and this is especially the case with clinical data. The complex nature of clinical data makes it difficult to build joint representations of heterogeneous modalities efficiently and in a way that enables their seamless integration. Consequently, despite their significant importance, predictions of inpatient LoS and mortality using multi-modal data have received less attention in the literature [2].
Another noticeable trend is that most clinical machine learning systems focus on single clinical prediction tasks. Nonetheless, in real-world clinical environments, multiple tasks are often interdependent. For instance, while the risk of heart disease and the likelihood of developing diabetes represent distinct medical conditions, they share underlying physiological factors such as blood pressure, cholesterol levels, and family medical history [18]. Multi-task learning (MTL), a subfield of machine learning, fosters the exchange of insights among interconnected tasks by training multiple related tasks simultaneously with a single model. By sharing information between related tasks, MTL improves the generalization and performance of the model. Le et al. [19] proposed a convolutional neural network (CNN)-based multi-task classification and segmentation architecture for cancer diagnosis in mammography. Yu et al. [20] used a multi-task recurrent neural network with an attention mechanism to predict patient mortality in hospitals. Despite these achievements in medical prediction using MTL, little effort has been made to combine multi-modal clinical data with multi-task learning to enhance prediction performance. Tan et al. [21] proposed a multi-modal and multi-task DL framework called MultiCoFusion that combines the power of different modalities and tasks for cancer prognosis prediction. Their experimental results indicate that the joint learning of multiple tasks can exploit the intrinsic association between features (i.e., genes) and thus further improve learning performance. However, they manually extracted features from histopathological images and mRNA expression data, and LoS prediction was not their research topic. Harerimana et al. [9] developed a hierarchical deep attention model to forecast LoS and in-hospital mortality from ICD codes and demographic data. Unfortunately, LoS was predicted as a classification task, and the LoS and mortality tasks were trained separately.
To address the aforementioned challenges, in this study we propose a novel Multi-Modal Multi-Task learning model, termed M3T-LM, to perform the LoS regression and mortality classification tasks simultaneously. Multiple data modalities, including demographic data, clinical notes, laboratory test results, and medical images, are integrated in our scheme. A basic model (sub-model) is constructed for each data modality using the relevant data types. Concretely, a novel attention-embedded one-dimensional (1D) CNN is designed to handle numerical data. After the texts are converted into sequence data, two long short-term memory (LSTM) networks are used to model the clinical notes. A two-dimensional (2D) CNN architecture, named CRXMDL, is designed to extract high-level features from chest X-ray (CXR) images. Subsequently, the sub-models are integrated to form the M3T-LM, which captures the correlations between the patient LoS and mortality prediction tasks. It is important to note that predicting inpatients' LoS and mortality is a challenging mixed-task scenario that encompasses both regression and classification tasks. A novel predictive framework is proposed to address this challenge. Overall, the key contributions of this study can be summarized as follows:
• A joint classification-regression scheme that implements mixed task types using heterogeneous data modalities is proposed to predict inpatients' LoS and mortality simultaneously.
• An enhanced squeeze-and-excitation (SE) block is incorporated into the network for adaptive feature calibration. In this block, the 2D pooling layer is replaced by a 1D one to handle numerical data, and the two non-linear fully connected layers are replaced by a 1 × 1 convolution layer to reduce the number of parameters.
• CXR images are an integral component of our scheme, for which we have developed an innovative CNN model referred to as CRXMDL. This model uses InceptionResNet V2, known for its powerful feature extraction capabilities, as its backbone network. We further enhance this architecture by embedding three convolution blocks with 32, 16, and 8 filters of size 3 × 3, a max pooling (MAP) layer, a flatten layer, and a fully connected (FC) layer. These additions are designed to capture and leverage the most salient features of CXR images, thereby improving the accuracy and robustness of our predictions.
• A unified model that incorporates the losses of both the regression and classification tasks is developed. An adaptive loss-weight assignment solution is proposed to determine the optimal weights for these tasks automatically, enhancing the model's overall performance.
The rest of this paper is structured as follows. Section 2 briefly introduces related work and identifies the research gaps. Section 3 discusses the proposed methodology in detail. Section 4 presents the experimental results as well as a comparative analysis. Finally, Section 5 concludes the paper and points out future work.
2. Related work
First, we review the literature on inpatient LoS and mortality prediction. Then, multi-modal multi-task learning in the healthcare field is reviewed. Finally, the current research gaps are discussed.
2.1. Length of stay prediction
Accurate prediction of LoS can increase patient satisfaction by reducing unnecessary wait times and can save hospital costs. Existing ML models for LoS prediction can be broadly grouped into two categories: classification models and regression models [22]. Classification models group LoS into multiple classes, e.g., short, medium, and long stays, based on the number of days the patient stays in the hospital. Morton et al. [23] categorized the LoS of diabetic inpatients into long-term and short-term
stays. Multiple classification models such as SVM, RF, and LASSO-based multi-task learning were used in their comparison experiments. Although SVM and RF achieved the desired results, the authors considered multi-task learning promising for LoS prediction. Thompson et al. [24] classified the LoS of newborns as prolonged or not, using well-known ML algorithms such as LR, Decision Tree (DT), SVM, RF, and neural networks (NN). RF, DT, and NN achieved impressive performance. Nevertheless, recent studies have pointed out that LoS distributions are strongly right-skewed [9,25], which means that the dataset is imbalanced, with only a limited number of instances having a long LoS. Consequently, this imbalance causes the model to treat classes with long LoS as anomalies, leading to decreased classification accuracy. Therefore, it is more appropriate to formulate the LoS task as a regression problem. Modeling LoS prediction as a regression problem has received less attention in the literature. Tsai et al. [26] applied an NN method to predict LoS for cardiology patients. They concluded that the NN model is robust for predicting prolonged LoS. However, there is still room for improvement in the accuracy of their model. Using a cluster-boosted regression method, Rouzbahman et al. [27] conducted mortality and LoS predictions for ICU inpatients. Their findings indicated enhanced regression accuracy for both mortality and LoS. However, determining an optimal number of clusters remains challenging and involves a degree of subjectivity. Muhlestein et al. [10] trained an ML ensemble model to predict inpatient LoS after brain tumor surgery. Their experimental results demonstrated good performance of the ensemble model for LoS prediction. However, the ML ensemble model integrated multiple sub-models, increasing the computational complexity.
2.2. Inpatient mortality prediction
Accurate prediction of inpatient mortality plays a vital role in evaluating disease severity and interventions, assessing the efficacy of novel therapies, and guiding healthcare initiatives. Over the past few decades, great efforts have been invested in the prediction of inpatient mortality. Ruzicka et al. [28] applied XGBoost to predict patients' in-hospital mortality and compared it with a traditional unregularized LR model. In their experiments, XGBoost outperformed LR but was not competitive with existing methods. Ganapathy et al. [29] compared several models, such as LR, Binary Discriminant Analysis (BDA), Bayesian Linear Regression (BLR), NN, and RF, for inpatient mortality prediction, and the BLR model achieved the best precision in their experiments. They concluded that the ML classifiers had the best predictive ability compared with statistical models. Caicedo-Torres et al. [30] designed a deep learning model called ISeeU2 to predict mortality in the ICU. Their proposed model outperformed the compared baselines, highlighting the valuable insights that can be extracted from raw nursing notes. Similarly, Zeng et al. [31] proposed a recurrent neural network (RNN)-based DL architecture to predict mortality for all ICU admissions. Although their approach outperforms classical machine learning methods such as LR, RF, and XGBoost, the imbalance between positive and negative samples remains unaddressed. Using image-transformed electrocardiogram (ECG) waveforms, Kondo et al. [32] conducted short-term mortality prediction for cardiac care unit patients, and their method reached the desired prediction accuracy. Nevertheless, the model relies solely on image data, which limits the overall accuracy of their approach.
2.3. Multi-modal multi-task learning
Compared to single-modality models, multi-modal models can produce more reliable results owing to their ability to perceive different aspects of the data, leading to enhanced model accuracy and reliability. Multi-task learning (MTL), which solves multiple learning tasks at the same time while exploiting commonalities and differences across tasks, has proven to be more reliable in identifying related characteristics, less sensitive to data noise, and less prone to overfitting [33]. Zhang et al. [34] proposed a 3-dimensional multi-task DL model based on the MLP-Mixer architecture to simultaneously implement FDG/AV45-PET SUVR and AD status prediction tasks for Alzheimer's disease diagnosis. Their experimental results show that MTL can share feature representations, which benefits both tasks. Shao et al. [35] proposed a multi-task multi-modal learning method for the joint prognosis and diagnosis of cancer patients. Two types of data, histopathological images and genomic data, were used in their scheme to address both the prognosis and diagnosis tasks. They concluded that MTL captured the correlation between different tasks and performed better than single-task learning. Using different data modalities such as histopathological images and mRNA expression data, Tan et al. [21] built a multi-modal multi-task learning model to perform survival analysis and grade classification for cancer prognosis and diagnosis. They concluded that using multi-modal data performs better than using single-modal data only. In another study, Liu et al. [36] proposed jointly identifying brain diseases and predicting clinical scores using both magnetic resonance imaging (MRI) and patient demographic information. Their experimental findings demonstrate that MTL outperforms single-task learning.
Although some joint learning models have been proposed, certain models incorporate only a single data modality. Moreover, most of them first extract hand-crafted features from images and pre-process the data separately, and these separate processes may lack effective coordination, resulting in suboptimal learning performance. Besides, most research implemented tasks of the same type rather than mixed task types. To address these research gaps, in this paper we establish a multi-modal multi-task learning model to simultaneously implement mixed-type regression and classification tasks using multiple data modalities. Specifically, we aim to simultaneously predict inpatients' length of stay and mortality, as these have proven to be closely related for inpatients after ICU admission and share common feature representations needed to train regression and classification models. Heterogeneous medical data modalities, including but not limited to static numerical data (demographics, healthcare examinations), unstructured texts (clinical notes, long procedure texts), and chest X-ray images, are used by the proposed multi-modal multi-task model (M3T-LM) to automatically predict inpatients' LoS and mortality.
3. Methodology
The proposed M3T-LM method has the following key distinctions: (1) To maximize the utilization of available data, a multi-modal data fusion that combines diverse data modalities, including patient demographic information, diagnoses, free-text clinical notes, laboratory test results, and medical images, is implemented in our scheme. (2) A multi-task learning model with a shared network layer is proposed to capture the correlations between the inpatient LoS and mortality prediction tasks, since these two tasks are intrinsically associated with each other. (3) Mixed task types are learned in our scheme. Unlike most existing research, which implements tasks of the same type, our scheme performs the regression and classification tasks simultaneously for inpatient LoS and mortality prediction. The proposed approach leverages the interconnections among the diverse data and tasks, which potentially improves model efficiency and reduces overfitting risk by modeling nonlinear within-modality and cross-modality relationships. Fig. 1 provides the flow diagram of the proposed procedure. In the following, we first present the architecture of the M3T-LM and then discuss the optimization process in detail.
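The enhanced SE block listed among the contributions can be illustrated with a short NumPy sketch, assuming it behaves like the attention rows of Table 1 (GlobalAveragePooling1D, conv1d-2, sigmoid, multiply). The feature map is squeezed to one descriptor per channel, a single small 1D convolution across those descriptors stands in for the two fully connected layers of a standard SE block, and a sigmoid gate rescales each channel. The function names, the "same" zero padding, and the absence of a bias are assumptions; the bias-free kernel of size 3 is chosen only because Table 1 lists conv1d-2 with kernel size 3 and 3 parameters, whereas the text describes a 1 × 1 convolution. This is an illustration, not the authors' implementation. A helper also reproduces the Conv1D parameter counts reported in Table 1.

```python
import numpy as np


def channel_attention_1d(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Hypothetical SE-style channel recalibration for a 1D feature map.

    x      -- feature map of shape (length, channels), e.g. (11, 32) after MaxPooling1D in Table 1
    kernel -- odd-sized, bias-free 1D convolution weights applied across the channel axis
    """
    # Squeeze: 1D global average pooling over the length axis -> one value per channel.
    s = x.mean(axis=0)
    # Excitation: one small convolution over the channel descriptors ('same' zero padding),
    # standing in for the two fully connected layers of a standard SE block.
    k = kernel.shape[0]
    padded = np.pad(s, (k // 2, k // 2))
    z = np.array([padded[c:c + k] @ kernel for c in range(s.shape[0])])
    # Gate: the sigmoid maps each channel score into (0, 1).
    gate = 1.0 / (1.0 + np.exp(-z))
    # Scale: reweight every channel of the input feature map.
    return x * gate[np.newaxis, :]


def conv1d_param_count(kernel_size: int, in_channels: int, filters: int, use_bias: bool = True) -> int:
    """Weights (kernel_size * in_channels * filters) plus one bias per filter if a bias is used."""
    return kernel_size * in_channels * filters + (filters if use_bias else 0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.normal(size=(11, 32))  # shape of the MaxPooling1D output in Table 1
    out = channel_attention_1d(features, rng.normal(size=3))
    print(out.shape)  # (11, 32)
    # Parameter counts of the conv1d, conv1d-1 and conv1d-2 rows in Table 1.
    print(conv1d_param_count(3, 1, 32), conv1d_param_count(3, 32, 32),
          conv1d_param_count(3, 1, 1, use_bias=False))  # 128 3104 3
```

Because the gate lies strictly between 0 and 1, the block can only attenuate channels relative to one another, and the single shared kernel keeps the attention cost at a handful of parameters regardless of the number of channels.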
Table 1
The major parameters of the proposed model.
Layer (module) | Input shape | Filter no. | Kernel size | Output shape | Params | Repeat
num (InputLayer) | (None, 26, 1) | – | – | (None, 26, 1) | – | 1
conv1d (Conv1D) | (None, 26, 1) | 32 | 3 | (None, 24, 32) | 128 | 1
conv1d-1 (Conv1D) | (None, 24, 32) | 32 | 3 | (None, 22, 32) | 3,104 | 1
MaxPooling1D | (None, 22, 32) | – | – | (None, 11, 32) | – | 1
GlobalAveragePooling1D | (None, 11, 32) | – | – | (None, 32) | – | 1
conv1d-2 (Conv1D) | (None, 32, 1) | 32 | 3 | (None, 32, 1) | 3 | 1
conv1d-3 (Conv1D) | (None, 1, 11, 2); (None, 1, 1, 32) | 32 | 3 | (None, 1, 11, 1) | 3 | 1
sigmoid-1 | (None, 1, 11, 1); (None, 1, 11, 32) | – | – | (None, 1, 11, 1) | – | 1
ad (InputLayer) | (None, 23) | – | – | (None, 23) | – | 1
ap (InputLayer) | (None, 18) | – | – | (None, 18) | – | 1
multiply_1 (Multiply) | (None, 1, 11, 32); (None, 1, 11, 1) | – | – | (None, 1, 11, 32) | – | 1
eth (InputLayer) | (None, 1) | – | – | (None, 1) | – | 1
Flatten | (None, 1, 11, 32); (None, 27, 1) | – | – | (None, 352); (None, 27) | – | 2
embedding | (None, 1); (None, 18); (None, 23) | – | – | (None, 1, 32) | 192 + 16,960 + 17,120 | 3
cxg (InputLayer) | (None, 128, 128, 1) | – | – | (None, 128, 128, 1) | – | 1
LSTM | (None, 23, 32); (None, 18, 32) | – | – | (None, 2) | 280 | 2
ohe (InputLayer) | (None, 28) | – | – | (None, 28) | – | 1
flatten (Flatten) | (None, 1, 32) | – | – | (None, 32) | – | 1
sequential (Sequential) | (None, 128, 128, 1) | – | – | (None, 1536) | 54,336,160 | 1
Concatenate layer | (None, 378); (None, 28); (None, 32); (None, 1536); (None, 2) | – | – | (None, 1978) | – | 2
Dense (drop=0.1) | (None, 1978) | – | – | (None, 32) | 126,656 + 2,080 | 2
Regression/Class | (None, 32) | – | – | (None, 1); (None, 2) | 33 + 33 | 1
Total | – | – | – | – | 54,503,032 (33 layers) | –
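The weighted total loss of Eq. (3) and the adaptive weight update of Eqs. (4)-(6), given in the text that follows, have the form of the GradNorm gradient-normalization scheme. One update step can be sketched in NumPy as below. The sketch assumes the per-task gradient norms ‖∇θ L_i‖2 have already been computed by the training framework, and it treats the target term Ḡ × r^α as a constant when differentiating, as GradNorm does. As an extra assumption not stated in the text, it clips the weights to stay positive and renormalizes them to keep their sum fixed (1.0 with the 0.5/0.5 initialization used here). The function names and the values of alpha and lr are hypothetical.

```python
import numpy as np


def total_loss(weights, losses) -> float:
    """Eq. (3): weighted sum of the task losses."""
    return float(np.dot(weights, losses))


def gradnorm_step(weights, losses, initial_losses, grad_norms, alpha=1.5, lr=0.025):
    """One hypothetical update of the loss weights following the form of Eqs. (4)-(6).

    weights        -- current loss weights w_i^(t)
    losses         -- current task losses L_i^(t)
    initial_losses -- task losses at the first epoch, L_i^(0)
    grad_norms     -- ||grad_theta L_i^(t)||_2 for each task (assumed precomputed)
    alpha, lr      -- illustrative values, not taken from the paper
    """
    w = np.asarray(weights, dtype=float)
    norms = np.asarray(grad_norms, dtype=float)
    # Eq. (6), left: G_i = ||grad_theta (w_i L_i)||_2 = w_i * ||grad_theta L_i||_2 for w_i > 0.
    g = w * norms
    g_bar = g.mean()
    # Eq. (6), right: each task's loss ratio relative to the average loss ratio.
    ratio = np.asarray(losses, dtype=float) / np.asarray(initial_losses, dtype=float)
    r = ratio / ratio.mean()
    # Target gradient norm, held constant when differentiating L_grad.
    target = g_bar * r ** alpha
    # Eq. (5): L1 distance between actual and target gradient norms.
    l_grad = float(np.abs(g - target).sum())
    # dL_grad/dw_i = sign(G_i - target_i) * ||grad_theta L_i||_2
    grad_w = np.sign(g - target) * norms
    # Eq. (4): gradient step with step size lr (lambda in the text).
    new_w = w - lr * grad_w
    # Assumption: keep the weights positive and renormalize them to the previous total.
    new_w = np.clip(new_w, 1e-6, None)
    new_w *= w.sum() / new_w.sum()
    return new_w, l_grad


if __name__ == "__main__":
    # Hypothetical epoch: the regression loss has fallen less (5.0 -> 4.0)
    # than the classification loss (0.7 -> 0.2).
    w, lg = gradnorm_step([0.5, 0.5], losses=[4.0, 0.2],
                          initial_losses=[5.0, 0.7], grad_norms=[1.0, 1.0])
    print(w)  # approximately [0.525 0.475]
    print(total_loss(w, [4.0, 0.2]))
```

In this toy step the slower-training regression task receives the larger weight, which is the balancing behaviour Eqs. (5) and (6) are designed to produce; when all tasks train at the same relative rate and have equal gradient norms, the gradient loss is zero and the weights stay unchanged.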
functions are established for multi-modal multi-task learning. However, only the result of one loss function can be updated during backpropagation, so a joint loss function must be defined to integrate the two different loss functions; the weighted sum is the most commonly used scheme. The weighted total loss function is formulated as:
$$ L_{total}^{(t)} = \sum_{i=1}^{2} w_i^{(t)} L_i^{(t)} \tag{3} $$
where $w_i$ denotes the weight of the $i$th loss function $L_i$, and $t$ denotes the $t$th epoch of training. The performance of the system relies heavily on the weights assigned to each task's loss, but tuning these weights by hand is challenging and expensive. Therefore, we include the loss weights in the definition of the loss function itself and update them adaptively through callbacks, which manage the changes internally. The loss weight update is defined as:
$$ w_i^{(t+1)} \leftarrow w_i^{(t)} - \lambda \nabla_{w_i} L_{grad} \tag{4} $$
where $\lambda$ is a constant hyperparameter, and $L_{grad}$ denotes the gradient loss, which is introduced specifically to capture the loss caused by the loss weight $w$. The gradient loss $L_{grad}$ is given by:
$$ L_{grad}\left(t, w_i^{(t)}\right) = \sum_{i} \left| G_i^{(t)} - \bar{G}^{(t)} \times \left[ r_i^{(t)} \right]^{\alpha} \right|_1 \tag{5} $$
$$ G_i^{(t)} = \left\| \nabla_{\theta}\, w_i^{(t)} L_i \right\|_2, \qquad r_i^{(t)} = \frac{L_i^{(t)} / L_i^{(0)}}{E_{task}\left[ L_i^{(t)} / L_i^{(0)} \right]} \tag{6} $$
In Eqs. (5) and (6), $G_i^{(t)}$ is the gradient norm of the $i$th task in the $t$th epoch of training, calculated as the L2 norm of the gradient of the weighted loss with respect to the parameters $\theta$. $\bar{G}^{(t)}$ represents the mean gradient norm over all tasks in the $t$th epoch. $r_i^{(t)}$ denotes the relative training rate of the $i$th task in the $t$th epoch, calculated as the ratio of that task's loss ratio $L_i^{(t)}/L_i^{(0)}$ to the average loss ratio over all tasks; because a smaller loss ratio means faster training, a larger $r_i^{(t)}$ indicates a task that is training more slowly than average. $\alpha$ is a hyperparameter that controls the strength of this balancing. Overall, the loss weight is treated as an optimization parameter in this solution, and $L_{grad}$ with respect to the loss weight $w$ is computed in each epoch. The initial weights of the regression and classification loss functions are both set to 0.5, and the gradient update is performed at every training epoch.
4. Experiments
In this section, we present an empirical performance evaluation of the proposed approach on the LoS and mortality prediction tasks. We first evaluate the accuracy of the proposed M3T-LM against state-of-the-art (SOTA) methods. Next, we perform hyperparameter optimization and fine-tuning via random search over sets of essential hyperparameters to maximize the prediction performance of the model. Finally, we assess the contribution of the fused data modalities and the newly added modules of the proposed approach through an ablation study.
4.1. Dataset description and preprocessing
MIMIC, short for the Medical Information Mart for Intensive Care, is a large database of clinical records for patients admitted to the Beth Israel Deaconess Medical Center (BIDMC). MIMIC-IV, which consists of comprehensive clinical information on patients' hospital stays, contains de-identified records of 50,048 individual patients admitted to the ICU or emergency department (ED) at the BIDMC in Boston, MA, USA, between 2008 and 2019. The most recent version of MIMIC-IV (v1.0) [43], released on June 22, 2022, improves on MIMIC-III [44] to provide public access to EHR data based on the BIDMC's MetaVision clinical information system. In addition, MIMIC-CXR-JPG v2.0.0 [45] is a large image dataset comprising 227,827 CXR images sourced from the BIDMC between 2011 and 2016. This dataset is freely available to facilitate and encourage broad research in medical computer vision.
Table 2
The characteristics of the data.
To predict inpatients' LoS and mortality efficiently, we use the MIMIC-IV v1.0 dataset combined with the MIMIC-CXR-JPG v2.0.0 dataset in this study. All the data are de-identified: patient identifiers are removed according to the Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor provision. The MIMIC-IV v1.0 database includes a wide range of patient records, such as patients' demographic information, laboratory test results, procedures and diagnoses, free-text notes authored by clinicians, and medication orders. The tables used from the MIMIC-IV v1.0 dataset are mainly ADMISSIONS, DIAGNOSIS_ICD, D_DIAGNOSIS_ICD, PATIENTS, ICUSTAYS, PROCEDURES_ICD, and D_PROCEDURES_ICD. Among them, the ADMISSIONS table provides a record for each hospitalization, including each patient's admission and discharge times and the source of the admission. The DIAGNOSIS_ICD table gives the diagnosis category information. The PATIENTS table provides timing information and demographics for each patient, and the ICUSTAYS table provides the ICU data for each hospital admission. The PROCEDURES_ICD table presents the procedure codes for inpatients, and the corresponding procedure names are included in the D_PROCEDURES_ICD table. Additionally, the patients' chest X-ray images are stored in the MIMIC-CXR-JPG database in JPG format with structured labels. The characteristics of the data we used are summarized in Table 2. In this research, LoS is defined as the time between hospital admission and discharge, measured in days. Mortality is given by the hospital_expire_flag field in the ADMISSIONS table, where 1 indicates death and 0 indicates survival of the patient in hospital. Data preprocessing, including data cleaning, data transformation, revision of outliers, and interpolation of missing data [46,47], is applied to the original tabular data. As a result, a total of 51 variables, covering categories such as blood, circulatory, digestive, endocrine, injury, and nervous, are extracted from the MIMIC-IV v1.0 dataset. For the CXR image data, only images with a ViewPosition of “PA (Posterior-Anterior)” or “AP (Anterior-Posterior)” are chosen in our experiments, since these are frontal views. As such, 4,144 CXR image samples are used in the LoS and mortality prediction experiments. Fig. 3 portrays the distributions of LoS and mortality. Fig. 3 shows that most LoS values are under 20 days and that the mortality distribution is class-imbalanced: the survival category (class 0) comprises the majority of samples, while the mortality category (class 1) consists of only a small number of samples. The distribution is extremely unbalanced. To cope with this challenge, the Synthetic Minority Oversampling Technique (SMOTE) [48] is used to augment the minority-class samples so that positive and negative samples are balanced in the training set. Using SMOTE, new synthetic samples are generated until the number of samples in the mortality category is very close to that in the survival category.
4.2. Experiment setup and performance metrics
The experiments are conducted in Python 3.6 with commonly used libraries, including Keras, Scikit-learn, TensorFlow, and Matplotlib, with the aid of a graphics processing unit (GPU). The hardware environment used to run the Python DL framework implementing the proposed M3T-LM comprises an AMD EPYC 7502P 32-Core Processor, 32 GB of memory, and an RTX A6000 GPU.
To evaluate the performance of LoS prediction, standard metrics such as the mean absolute error (MAE), root mean square error (RMSE), coefficient of determination (R-Square or R²), and explained variance (EVAR) [6,8,14] are used, which are calculated by
$$ MAE = \frac{1}{N}\sum_{i=1}^{N} \left| y_i - \hat{y}_i \right| \tag{7} $$
$$ RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2} \tag{8} $$
$$ R^2 = 1 - \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2 \Big/ \sum_{i=1}^{N} \left( y_i - \bar{y} \right)^2 \tag{9} $$
$$ EVAR = 1 - var(\boldsymbol{y} - \hat{\boldsymbol{y}}) / var(\boldsymbol{y}) \tag{10} $$
where $\hat{y}_i$, $y_i$, and $\bar{y}$ denote the predicted value, the actual value, and the mean of the actual values, respectively. $N$ is the total number of samples, and $var(\cdot)$ is the variance function. MAE and RMSE are the mean absolute error and the square root of the mean squared error between the predicted and actual values, respectively. R² measures the proportion of the variance in the dependent variable that can be explained by the independent variables, and EVAR reflects the explanatory power of the model. For both EVAR and R², the ideal value is 1, whereas for MAE and RMSE, larger values are worse. Moreover, widely used metrics including Accuracy (Acc), Precision (Pre), Recall (Rec), and F1-Score (F1) [32] are used to evaluate mortality prediction; they are calculated by the following equations:
$$ Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \tag{11} $$
$$ Precision = \frac{TP}{TP + FP} \tag{12} $$
$$ Recall = \frac{TP}{TP + FN} \tag{13} $$
$$ F1 = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{14} $$
where $TP$ is true positive, $FP$ is false positive, $TN$ is true negative, and $FN$ is false negative.
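Eqs. (7)-(14) translate directly into a few lines of NumPy. The sketch below is a minimal illustration with hypothetical function names and toy inputs; Eqs. (9) and (10) match the definitions behind scikit-learn's r2_score and explained_variance_score, but this is not the evaluation code used in this study. Returning 0.0 when a precision, recall, or F1 denominator is zero is an assumption for degenerate cases.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict:
    """LoS metrics of Eqs. (7)-(10)."""
    y = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_pred, dtype=float)
    err = y - y_hat
    return {
        "MAE": float(np.mean(np.abs(err))),                                 # Eq. (7)
        "RMSE": float(np.sqrt(np.mean(err ** 2))),                          # Eq. (8)
        "R2": float(1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)),  # Eq. (9)
        "EVAR": float(1.0 - np.var(err) / np.var(y)),                       # Eq. (10)
    }


def classification_metrics(y_true, y_pred) -> dict:
    """Mortality metrics of Eqs. (11)-(14) for binary labels (1 = death, 0 = survival)."""
    y = np.asarray(y_true, dtype=int)
    p = np.asarray(y_pred, dtype=int)
    tp = int(np.sum((p == 1) & (y == 1)))
    tn = int(np.sum((p == 0) & (y == 0)))
    fp = int(np.sum((p == 1) & (y == 0)))
    fn = int(np.sum((p == 0) & (y == 1)))
    precision = tp / (tp + fp) if tp + fp else 0.0                                   # Eq. (12)
    recall = tp / (tp + fn) if tp + fn else 0.0                                      # Eq. (13)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0  # Eq. (14)
    return {
        "Acc": (tp + tn) / (tp + tn + fp + fn),                                      # Eq. (11)
        "Pre": precision,
        "Rec": recall,
        "F1": f1,
    }


if __name__ == "__main__":
    # Toy LoS values in days (hypothetical).
    print(regression_metrics([3, 5, 2, 10], [4, 5, 1, 8]))
    # Toy mortality labels (hypothetical).
    print(classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
```

For the toy LoS example, MAE = 1.0 and R² = 1 − 6/38 ≈ 0.842, while EVAR = 1 − 1.25/9.5 ≈ 0.868 is slightly higher because EVAR ignores the mean (systematic bias) of the errors; the two metrics coincide only when the errors average to zero.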
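Table 3
LoS prediction of different methods.

| No. | Method | Train MAE | Train RMSE | Train R² | Train EVAR | Val. MAE | Val. RMSE | Val. R² | Val. EVAR | Test MAE | Test RMSE | Test R² | Test EVAR | Time (h:mm:ss) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | MLP | 5.33 | 10.28 | 0.45 | 0.45 | 4.63 | 6.54 | 0.09 | 0.12 | 5.65 | 7.64 | 0.58 | 0.61 | 0:00:40 |
| 2 | XGBoost | 3.43 | 5.23 | 0.85 | 0.85 | 5.04 | 6.40 | 0.13 | 0.27 | 5.01 | 7.21 | 0.62 | 0.66 | 0:00:31 |
| 3 | RF | 4.55 | 8.05 | 0.66 | 0.66 | 0.66 | 7.04 | −0.05 | 0.09 | 6.28 | 9.22 | 0.38 | 0.43 | 0:00:31 |
| 4 | VGG-style CNN | 7.52 | 15.16 | −0.18 | 0.07 | 5.10 | 7.87 | −0.30 | 0.09 | 10.62 | 15.41 | −0.71 | 0.08 | 0:02:46 |
| 5 | 1D-CNN | 6.76 | 14.18 | −0.03 | 0.14 | 4.28 | 6.97 | −0.02 | 0.16 | 9.40 | 14.02 | −0.41 | 0.12 | 0:03:04 |
| 6 | M3T-LM | 3.68 | 10.44 | 0.44 | 0.44 | 3.85 | 5.30 | 0.41 | 0.48 | 5.54 | 7.18 | 0.62 | 0.63 | 0:34:22 |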
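The four LoS metrics in Table 3 can be computed with a few lines of NumPy. The sketch below uses the standard definitions of MAE, RMSE, R², and explained variance (EVAR), which we assume match the definitions given earlier in the paper; the toy LoS values are illustrative, not MIMIC-IV data, and `regression_metrics` is a hypothetical helper name. scikit-learn offers the same quantities as `mean_absolute_error`, `mean_squared_error`, `r2_score`, and `explained_variance_score` in `sklearn.metrics`.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """MAE, RMSE, R^2 and explained variance (EVAR) for LoS regression."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))                       # lower is better
    rmse = np.sqrt(np.mean(err ** 2))                # lower is better
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot                       # best value is 1
    evar = 1.0 - np.var(err) / np.var(y_true)        # best value is 1; ignores a constant bias
    return mae, rmse, r2, evar


# Toy LoS values in days (illustrative only).
mae, rmse, r2, evar = regression_metrics([3, 5, 2, 10], [4, 6, 3, 9])
print(f"MAE={mae:.2f} RMSE={rmse:.2f} R2={r2:.3f} EVAR={evar:.3f}")
# MAE=1.00 RMSE=1.00 R2=0.895 EVAR=0.921
```

Because EVAR is computed from the variance of the residuals, a systematic bias in the predictions (here, a mean error of −0.5 days) lowers R² but leaves EVAR unaffected, so EVAR is never below R²; this is consistent with every row of Table 3.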
4.3. Results and discussion

To demonstrate the effectiveness of the proposed method, the most commonly used ML methods, namely multilayer perceptron (MLP), random forest (RF), and extreme gradient boosting (XGBoost), along with a VGG-style CNN and a one-dimensional CNN (1D-CNN), are selected for comparative analysis. Unlike the proposed approach, which fits multi-modal data, these comparison methods can take only a single data modality, such as numerical data. Therefore, they are run on the tabular data of the MIMIC-IV v1.0 dataset. To ensure a fair comparison, the core hyperparameters of the compared models are set to match those of the proposed approach. Specifically, the mini-batch size is set to 64, with a learning rate of 1 × 10⁻³, 30 training epochs, and the RMSprop [49] optimizer. The dataset is randomly divided into training, validation, and test sets in a 7:2:1 ratio. We employ the leave-one-out cross-validation approach for performance evaluation, where 90% of the samples are used for training and validation, and the remaining 10% for testing. Fig. 4 illustrates the LoS prediction performance of the proposed approach on randomly selected samples from both the validation and test datasets. In Fig. 4, the orange curve represents the actual LoS of inpatients and the blue curve denotes the predicted LoS. Fig. 4 shows that the predicted values are very close to the actual values for most samples, indicating the efficacy of the proposed approach.

Table 3 presents the overall LoS prediction performance of different methods. Fig. 5 visualizes the RMSE and EVAR comparison of different
Table 4
Mortality prediction of different methods (metrics in %).

| No. | Method | Train Acc | Train Pre | Train Rec | Train F1 | Val. Acc | Val. Pre | Val. Rec | Val. F1 | Test Acc | Test Pre | Test Rec | Test F1 | Time (h:mm:ss) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | MLP | 91.28 | 85.74 | 98.42 | 91.64 | 91.44 | 85.25 | 99.46 | 91.81 | 73.01 | 33.87 | 22.82 | 27.27 | 0:00:36 |
| 2 | XGBoost | 88.60 | 83.98 | 94.56 | 88.96 | 88.96 | 83.72 | 95.74 | 89.33 | 70.84 | 28.98 | 21.73 | 24.84 | 0:00:37 |
| 3 | RF | 89.11 | 82.10 | 99.21 | 89.85 | 90.76 | 84.13 | 99.64 | 91.23 | 70.12 | 27.77 | 21.73 | 24.39 | 0:00:36 |
| 4 | VGG-style CNN | 98.32 | 98.31 | 98.26 | 98.29 | 95.59 | 95.51 | 95.51 | 95.51 | 77.83 | 22.72 | 6.25 | 9.80 | 0:01:58 |
| 5 | 1D-CNN | 98.24 | 98.77 | 97.62 | 98.19 | 94.91 | 96.76 | 92.75 | 94.71 | 78.31 | 27.27 | 7.50 | 11.76 | 0:03:08 |
| 6 | M3T-LM | 98.91 | 97.84 | 100.00 | 98.90 | 98.72 | 97.80 | 99.65 | 98.71 | 95.42 | 91.78 | 83.75 | 87.58 | 0:27:23 |
methods. It can be seen from Table 3 that the proposed approach achieves an R² of 0.62 and 0.41 and an RMSE of 7.18 and 5.30 on the test and validation sets, respectively. Its RMSE is the lowest among all compared methods on both sets, and its validation R² is well above that of every other method, although XGBoost matches its test R² of 0.62 and attains a lower test MAE (5.01 versus 5.54). Notably, although ensemble learning algorithms such as XGBoost and RF fit the training set better than the proposed M3T-LM, both show a significant decline in validation and test performance, which is also visible in Fig. 5. The proposed M3T-LM takes 34 min for 30 epochs of training, as reported in Table 3. Because of the large number of parameters in the proposed deep learning framework and the concurrent execution of two tasks, the proposed model requires more time than the benchmark methods. Although its time consumption is higher than that of the other methods, it remains manageable and can be further reduced by various optimization techniques.

Next, we evaluate the performance of the proposed approach for inpatient mortality prediction. Fig. 6 depicts the receiver operating characteristic (ROC) curves of the proposed approach, and the test confusion matrices of different methods are portrayed in Fig. 7. Table 4 presents the overall mortality prediction performance of different methods. As depicted in Fig. 6, the proposed approach exhibits superior operating characteristics, with the ROC curves of all categories positioned close to the top-left corner of the figure, which indicates the effectiveness of the proposed approach for mortality prediction. In addition, the confusion matrix in Fig. 7(f) shows that M3T-LM correctly identifies most of the samples. Of the 80 mortality samples, 67 are correctly recognized and 13 are misclassified; likewise, 329 of the 335 survival samples are correctly identified, with only 6 misclassified into the mortality category. As a consequence, the proposed approach achieves a test Accuracy of 95.42%, with a test Precision, Recall, and F1-Score of 91.78%, 83.75%, and 87.58%, respectively, as presented in Table 4.

Moreover, we compare the proposed method with the results reported in recent literature on LoS and mortality prediction, as shown in Table 5. Table 5 shows that the proposed approach delivers competitive results: it achieves the highest mortality F1 (0.876) and the second-highest LoS R² (0.62), although its LoS RMSE (7.18) is higher than the values reported for the other methods. In summary, the outcomes of the comparative
Table 5
Comparison with state-of-the-art methods.

| ID | Reference | Year | Description | RMSE (LoS) | R² (LoS) | F1 (mortality) |
|---|---|---|---|---|---|---|
| 1 | Vaswani et al. [50] | 2017 | Transformer | 6.18 | 0.27 | 0.738 |
| 2 | Harutyunyan et al. [51] | 2019 | LSTM | 6.61 | 0.28 | 0.745 |
| 3 | Ma et al. [52] | 2020 | ConCare | – | – | 0.778 |
| 4 | Rocheteau et al. [53] | 2021 | Temporal Pointwise Convolution (TPC) | 5.20 | 0.59 | 0.784 |
| 5 | Al-Dailami et al. [2] | 2022 | Temporal Dilated Separable Convolution with Context-Aware Feature Fusion (TDSC-CAFF) | 4.30 | 0.64 | 0.821 |
| 6 | Shu et al. [54] | 2023 | ML-based scoring models | – | – | 0.613 |
| 7 | This study | 2024 | M3T-LM | 7.18 | 0.62 | 0.876 |
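Table 6
LoS prediction results with hyperparameter optimization.

| Mini-batch size | lr=0.001 MAE | lr=0.001 RMSE | lr=0.001 R² | lr=0.001 EVAR | lr=0.002 MAE | lr=0.002 RMSE | lr=0.002 R² | lr=0.002 EVAR | lr=0.005 MAE | lr=0.005 RMSE | lr=0.005 R² | lr=0.005 EVAR | Time (h:mm:ss) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 32 | 4.23 | 6.31 | 0.38 | 0.40 | 4.61 | 6.47 | 0.35 | 0.35 | 4.81 | 6.97 | 0.25 | 0.25 | 0:52:02 |
| 64 | 5.54 | 7.18 | 0.62 | 0.63 | 4.55 | 6.30 | 0.38 | 0.39 | 4.59 | 6.88 | 0.26 | 0.30 | 0:27:23 |
| 128 | 4.62 | 6.87 | 0.27 | 0.37 | 4.72 | 6.40 | 0.36 | 0.38 | 5.19 | 7.37 | 0.15 | 0.16 | 0:14:21 |
| 256 | 4.41 | 6.38 | 0.37 | 0.38 | 4.64 | 7.05 | 0.23 | 0.38 | 10.22 | 12.99 | −1.62 | 0.02 | 0:08:28 |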
Table 7
The results of ablation experiments (test set).

| Ablation approach | LoS MAE | LoS RMSE | LoS R² | LoS EVAR | Mortality Acc (%) | Mortality Pre (%) | Mortality Rec (%) | Mortality F1 (%) | Time (h:mm:ss) |
|---|---|---|---|---|---|---|---|---|---|
| Delete images (CXRMDL) | 6.33 | 10.14 | 0.39 | 0.44 | 89.39 | 100.00 | 45.00 | 62.06 | 0:04:56 |
| Delete long texts (LSTM) | 6.94 | 10.56 | 0.36 | 0.39 | 60.96 | 33.05 | 100.00 | 49.69 | 0:24:12 |
| Delete attention module | 5.84 | 9.37 | 0.37 | 0.42 | 92.53 | 72.47 | 98.75 | 83.59 | 0:26:52 |
| This study | 5.54 | 7.18 | 0.62 | 0.63 | 95.42 | 91.78 | 83.75 | 87.58 | 0:27:23 |
analysis affirm the strength of the proposed method in predicting both LoS and mortality.

4.4. Hyperparameter optimization

In this section, we perform a grid search over two essential hyperparameters, the mini-batch size (bs) and the learning rate (lr), for inpatient LoS prediction. The search ranges are bs ∈ {32, 64, 128, 256} and lr ∈ {0.001, 0.002, 0.005}. For each combination, we train our model for 30 epochs on the publicly available MIMIC-IV v1.0 dataset with the same splits as described in Section 4.3. The best hyperparameter set for LoS prediction is a mini-batch size of 64 with lr = 0.001, which yields the highest R² (0.62) and EVAR (0.63) in Table 6. Table 6 presents the prediction performance of the proposed method under the different hyperparameter settings.

4.5. Ablation study

To better understand how the sub-models and the different data modalities contribute to the system's performance, we perform an ablation study on our model.

Table 7 summarizes the ablation results. In the first ablation experiment, we remove the CXR image data modality and delete the CXRMDL module from our model. We observe a major decrease in test performance: the MAE and RMSE in LoS prediction rise to 6.33 and 10.14 (increases of 0.79 and 2.96), and the R² and EVAR drop to 0.39 and 0.44 (decreases of 0.23 and 0.19). The Accuracy, Recall, and F1-Score in mortality prediction drop to 89.39%, 45.00%, and 62.06% (decreases of 6.03, 38.75, and 25.52 percentage points), respectively. On the other hand, the time consumption of this ablated model decreases markedly, from 27 min 23 s to 4 min 56 s (a reduction of over 22 min). These results demonstrate that removing the CXR image data modality has a large impact on performance compared with the full multi-modal data aggregation model. Subsequently, in the second ablation experiment, we remove the long-text data modality, which is handled by the LSTM model in our networks. We observe a significant drop in performance in this ablated model. The test MAE and RMSE in LoS prediction rise to 6.94 and 10.56 (increases of 1.40 and 3.38), and the test R² and EVAR drop to 0.36 and 0.39 (decreases of 0.26 and 0.24), respectively. Likewise, the test Accuracy, Precision, and F1-Score in mortality prediction drop to 60.96%, 33.05%, and 49.69% (decreases of 34.46, 58.73, and 37.89 percentage points), respectively. Consequently, although this ablated model still achieves a higher test F1-Score than the compared baselines in Table 4, it suffers a notable decline compared with the multi-modal data aggregation model proposed in this study. In the third ablation experiment, we remove the newly added attention module from the networks to investigate its contribution. We observe a minor drop in performance: the test MAE and RMSE of the ablated model rise to 5.84 and 9.37 (increases of 0.30 and 2.19) in LoS prediction. The test Accuracy, Precision, and F1-Score of the ablated model in mortality prediction also drop to 92.53%, 72.47%, and 83.59% (decreases of 2.89, 19.31, and 3.99 percentage points), respectively. This experiment shows that the model with the attention mechanism performs slightly better than the model without it, i.e., removing the attention module has a minor negative influence on model accuracy.

5. Conclusion

Accurately estimating inpatient LoS and mortality is a challenging daily task in health care. This study proposes a novel Multi-Modal Multi-Task learning model, called M3T-LM, to predict patient outcomes, specifically remaining LoS and inpatient mortality. By combining a regression task and a classification task, M3T-LM simultaneously predicts inpatient LoS and mortality from multi-modal data. Given the skewed distribution of LoS, M3T-LM treats LoS prediction as a regression task, delivering more informative results by estimating the actual number of days rather than assigning classes. At the same time, M3T-LM integrates mortality prediction,
recognizing its close association with inpatient LoS. The two tasks share the common feature representations needed to train the mixed-task model. The main advantage of the proposed method is its ability to exploit the inherent correlation between the task types to guide the feature selection process, which can further improve learning performance. In addition, the proposed method effectively uses multiple data modalities in a unified model, which can lead to more effective resource allocation, higher prognostic accuracy, and better-informed clinical decision-making. Experimental findings demonstrate that the proposed M3T-LM outperforms the other SOTA baseline methods on both tasks.

While the proposed approach yields satisfactory results, it has limitations related to computational complexity. In the future, we plan to incorporate model pruning algorithms to simplify the model and improve its efficiency. Another interesting direction, in response to growing concerns about data privacy and security, is to incorporate privacy-preserving techniques into our model so that sensitive information is safeguarded while effective data fusion is maintained without harming predictive performance.

CRediT authorship contribution statement

Junde Chen: Writing – review & editing, Writing – original draft, Validation, Software, Methodology, Investigation, Data curation, Conceptualization. Qing Li: Writing – review & editing, Visualization, Formal analysis. Feng Liu: Writing – review & editing, Validation. Yuxin Wen: Writing – review & editing, Supervision, Methodology, Funding acquisition, Conceptualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This research is partially funded by the National Science Foundation, USA under Grant No. 2246158.

References

[1] J. Sheng, J. Amankwah-Amoah, Z. Khan, X. Wang, COVID-19 pandemic in the new era of big data analytics: Methodological innovations and future research directions, Br. J. Manage. 32 (4) (2021) 1164–1183.
[2] A. Al-Dailami, H. Kuang, J. Wang, Predicting length of stay in ICU and mortality with temporal dilated separable convolution and context-aware feature fusion, Comput. Biol. Med. 151 (2022) 106278.
[3] American Hospital Association, AHA Hospital Statistics: Fast Facts on US Hospitals, American Hospital Association, 2017, available at: www.aha.org (accessed 31 May 2017).
[4] R. Resar, K. Nolan, D. Kaczynski, K. Jensen, Using real-time demand capacity management to improve hospitalwide patient flow, Jt. Comm. J. Qual. Patient Saf. 37 (5) (2011) 217–AP3.
[5] N. Meo, E. Paul, C. Wilson, J. Powers, M. Magbual, K.M. Miles, Introducing an electronic tracking tool into daily multidisciplinary discharge rounds on a medicine service: a quality improvement project to reduce length of stay, BMJ Open Qual. 7 (3) (2018) e000174.
[6] I.T. Peres, S. Hamacher, F.L.C. Oliveira, F.A. Bozza, J.I.F. Salluh, Data-driven methodology to predict the ICU length of stay: A multicentre study of 99,492 admissions in 109 Brazilian units, Anaesth. Crit. Care Pain Med. 41 (6) (2022) 101142.
[7] L. Clifton, D.A. Clifton, M.A. Pimentel, P.J. Watkinson, L. Tarassenko, Gaussian processes for personalized e-health monitoring with wearable sensors, IEEE Trans. Biomed. Eng. 60 (1) (2012) 193–197.
[8] L. Hempel, S. Sadeghi, T. Kirsten, Prediction of intensive care unit length of stay in the MIMIC-IV dataset, Appl. Sci. 13 (12) (2023) 6930.
[9] G. Harerimana, J.W. Kim, B. Jang, A deep attention model to forecast the length of stay and the in-hospital mortality right on admission from ICD codes and demographic data, J. Biomed. Inform. 118 (2021) 103778.
[10] W.E. Muhlestein, D.S. Akagi, J.M. Davies, L.B. Chambless, Predicting inpatient length of stay after brain tumor surgery: developing machine learning ensembles to improve predictive performance, Neurosurgery 85 (3) (2019) 384.
[11] R. Yousef, S. Khan, G. Gupta, T. Siddiqui, B.M. Albahlal, S.A. Alajlan, M.A. Haq, U-net-based models towards optimal MR brain image segmentation, Diagnostics 13 (9) (2023) 1624.
[12] A.W. Salehi, S. Khan, G. Gupta, B.I. Alabduallah, A. Almjally, H. Alsolai, T. Siddiqui, A. Mellit, A study of CNN and transfer learning in medical imaging: Advantages, challenges, future scope, Sustainability 15 (7) (2023) 5930.
[13] S.-C. Huang, A. Pareek, S. Seyyedi, I. Banerjee, M.P. Lungren, Fusion of medical imaging and electronic health records using deep learning: a systematic review and implementation guidelines, NPJ Dig. Med. 3 (1) (2020) 136.
[14] J. Chen, Y. Wen, M. Pokojovy, T.-L.B. Tseng, P. McCaffrey, A. Vo, E. Walser, S. Moen, Multi-modal learning for inpatient length of stay prediction, Comput. Biol. Med. 171 (2024) 108121.
[15] D. Feng, C. Haase-Schütz, L. Rosenbaum, H. Hertlein, C. Glaeser, F. Timm, W. Wiesbeck, K. Dietmayer, Deep multi-modal object detection and semantic segmentation for autonomous driving: Datasets, methods, and challenges, IEEE Trans. Intell. Transp. Syst. 22 (3) (2020) 1341–1360.
[16] S. Nemati, R. Rohani, M.E. Basiri, M. Abdar, N.Y. Yen, V. Makarenkov, A hybrid latent space data fusion method for multimodal emotion recognition, IEEE Access 7 (2019) 172948–172964.
[17] J. Petrich, Z. Snow, D. Corbin, E.W. Reutzel, Multi-modal sensor fusion with machine learning for data-driven process monitoring for additive manufacturing, Addit. Manuf. 48 (2021) 102364.
[18] L. Men, N. Ilk, X. Tang, Y. Liu, Multi-disease prediction using LSTM recurrent neural networks, Expert Syst. Appl. 177 (2021) 114905.
[19] T.-L.-T. Le, N. Thome, S. Bernard, V. Bismuth, F. Patoureaux, Multitask classification and segmentation for cancer diagnosis in mammography, 2019, arXiv preprint arXiv:1909.05397.
[20] R. Yu, Y. Zheng, R. Zhang, Y. Jiang, C.C. Poon, Using a multi-task recurrent neural network with attention mechanisms to predict hospital mortality of patients, IEEE J. Biomed. Health Inform. 24 (2) (2019) 486–492.
[21] K. Tan, W. Huang, X. Liu, J. Hu, S. Dong, A multi-modal fusion framework based on multi-task correlation learning for cancer prognosis prediction, Artif. Intell. Med. 126 (2022) 102260.
[22] Z. Lu, W. Chang, S. Meng, M. Xue, J. Xie, J. Xu, H. Qiu, Y. Yang, F. Guo, The effect of high-flow nasal oxygen therapy on postoperative pulmonary complications and hospital length of stay in postoperative patients: a systematic review and meta-analysis, J. Intensiv. Care Med. 35 (10) (2020) 1129–1140.
[23] A. Morton, E. Marzban, G. Giannoulis, A. Patel, R. Aparasu, I.A. Kakadiaris, A comparison of supervised machine learning techniques for predicting short-term in-hospital length of stay among diabetic patients, in: 2014 13th International Conference on Machine Learning and Applications, IEEE, 2014, pp. 428–431.
[24] B. Thompson, K.O. Elish, R. Steele, Machine learning-based prediction of prolonged length of stay in newborns, in: 2018 17th IEEE International Conference on Machine Learning and Applications, ICMLA, IEEE, 2018, pp. 1454–1459.
[25] J. Chen, T. Di Qi, J. Vu, Y. Wen, A deep learning approach for inpatient length of stay and mortality prediction, J. Biomed. Inform. 147 (2023) 104526.
[26] P.-F.J. Tsai, P.-C. Chen, Y.-Y. Chen, H.-Y. Song, H.-M. Lin, F.-M. Lin, Q.-P. Huang, et al., Length of hospital stay prediction at the admission stage for cardiology patients using artificial neural network, J. Healthc. Eng. 2016 (2016).
[27] M. Rouzbahman, A. Jovicic, M. Chignell, Can cluster-boosted regression improve prediction of death and length of stay in the ICU? IEEE J. Biomed. Health Inform. 21 (3) (2016) 851–858.
[28] D. Ruzicka, T. Kondo, G. Fujimoto, A.P. Craig, S.-W. Kim, H. Mikamo, Development of a clinical prediction model for recurrence and mortality outcomes after Clostridioides difficile infection using a machine learning approach, Anaerobe 77 (2022) 102628.
[29] S. Ganapathy, K. Harichandrakumar, P. Penumadu, K. Tamilarasu, N.S. Nair, Comparison of Bayesian, frequentist and machine learning models for predicting the two-year mortality of patients diagnosed with squamous cell carcinoma of the oral cavity, Clin. Epidemiology Glob. Health 17 (2022) 101145.
[30] W. Caicedo-Torres, J. Gutierrez, ISeeU2: Visually interpretable mortality prediction inside the ICU using deep learning and free-text medical notes, Expert Syst. Appl. 202 (2022) 117190.
[31] G. Zeng, J. Zhuang, H. Huang, M. Tian, Y. Gao, Y. Liu, X. Yu, Use of deep learning for continuous prediction of mortality for all admissions in intensive care units, Tsinghua Sci. Technol. 28 (4) (2023) 639–648.
[32] T. Kondo, A. Teramoto, E. Watanabe, Y. Sobue, H. Izawa, K. Saito, H. Fujita, Prediction of short-term mortality of cardiac care unit patients using image-transformed ECG waveforms, IEEE J. Transl. Eng. Health Med. 11 (2023) 191–198.
[33] D.D. Solomon, S. Khan, S. Garg, G. Gupta, A. Almjally, B.I. Alabduallah, H.S. Alsagri, M.M. Ibrahim, A.M.A. Abdallah, Hybrid majority voting: Prediction and classification model for obesity, Diagnostics 13 (15) (2023) 2610.
[34] Z.-C. Zhang, X. Zhao, G. Dong, X.-M. Zhao, Improving Alzheimer's disease diagnosis with multi-modal PET embedding features by a 3D multi-task MLP-mixer neural network, IEEE J. Biomed. Health Inf. (2023).
[35] W. Shao, T. Wang, L. Sun, T. Dong, Z. Han, Z. Huang, J. Zhang, D. Zhang, K. Huang, Multi-task multi-modal learning for joint diagnosis and prognosis of human cancers, Med. Image Anal. 65 (2020) 101795.
[36] M. Liu, J. Zhang, E. Adeli, D. Shen, Joint classification and regression via deep multi-task multi-channel learning for Alzheimer's disease diagnosis, IEEE Trans. Biomed. Eng. 66 (5) (2018) 1195–1206.
[37] J. Hu, L. Shen, G. Sun, Squeeze-and-excitation networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7132–7141.
[38] C. Chen, T. Wang, Y. Liu, L. Cheng, J. Qin, Spatial attention-based convolutional transformer for bearing remaining useful life prediction, Meas. Sci. Technol. 33 (11) (2022) 114001.
[39] J. Brownlee, How to prepare text data for deep learning with Keras, Machine Learning Mastery, 2019, available at: https://machinelearningmastery.com/prepare-text-data-deep-learning-keras/ (accessed 15 Nov. 2020).
[40] N. Hayat, K.J. Geras, F.E. Shamout, MedFuse: Multi-modal fusion with clinical time-series data and chest X-ray images, in: Machine Learning for Healthcare Conference, PMLR, 2022, pp. 479–503.
[41] C. Szegedy, S. Ioffe, V. Vanhoucke, A. Alemi, Inception-v4, Inception-ResNet and the impact of residual connections on learning, in: Proceedings of the AAAI Conference on Artificial Intelligence, 31 (1) 2017.
[42] Q. Wang, Y. Ma, K. Zhao, Y. Tian, A comprehensive survey of loss functions in machine learning, Ann. Data Sci. (2020) 1–26.
[43] A.E. Johnson, L. Bulgarelli, L. Shen, A. Gayles, A. Shammout, S. Horng, T.J. Pollard, S. Hao, B. Moody, B. Gow, et al., MIMIC-IV, a freely accessible electronic health record dataset, Sci. Data 10 (1) (2023) 1.
[44] A.E. Johnson, T.J. Pollard, L. Shen, L.-w.H. Lehman, M. Feng, M. Ghassemi, B. Moody, P. Szolovits, L. Anthony Celi, R.G. Mark, MIMIC-III, a freely accessible critical care database, Sci. Data 3 (1) (2016) 1–9.
[45] A.E. Johnson, T.J. Pollard, N.R. Greenbaum, M.P. Lungren, C.-y. Deng, Y. Peng, Z. Lu, R.G. Mark, S.J. Berkowitz, S. Horng, MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs, 2019, arXiv preprint arXiv:1901.07042.
[46] C. Fan, M. Chen, X. Wang, J. Wang, B. Huang, A review on data preprocessing techniques toward efficient and reliable knowledge discovery from building operational data, Frontiers in Energy Research 9 (2021) 652801.
[47] A. Kiourtis, A. Mavrogiorgou, G. Manias, D. Kyriazis, Ontology-driven data cleaning towards lossless data compression, in: Challenges of Trustable AI and Added-Value on Health, IOS Press, 2022, pp. 421–422.
[48] N.V. Chawla, K.W. Bowyer, L.O. Hall, W.P. Kegelmeyer, SMOTE: synthetic minority over-sampling technique, J. Artif. Intell. Res. 16 (2002) 321–357.
[49] T. Tieleman, G. Hinton, RMSprop: Divide the gradient by a running average of its recent magnitude, COURSERA: Neural Networks for Machine Learning 17 (2012).
[50] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, Adv. Neural Inf. Process. Syst. 30 (2017).
[51] H. Harutyunyan, H. Khachatrian, D.C. Kale, G. Ver Steeg, A. Galstyan, Multitask learning and benchmarking with clinical time series data, Sci. Data 6 (1) (2019) 96.
[52] L. Ma, C. Zhang, Y. Wang, W. Ruan, J. Wang, W. Tang, X. Ma, X. Gao, J. Gao, ConCare: Personalized clinical feature embedding via capturing the healthcare context, in: Proceedings of the AAAI Conference on Artificial Intelligence, 34 (01) 2020, pp. 833–840.
[53] E. Rocheteau, P. Liò, S. Hyland, Temporal pointwise convolutional networks for length of stay prediction in the intensive care unit, in: Proceedings of the Conference on Health, Inference, and Learning, 2021, pp. 58–68.
[54] T. Shu, J. Huang, J. Deng, H. Chen, Y. Zhang, M. Duan, Y. Wang, X. Hu, X. Liu, Development and assessment of scoring model for ICU stay and mortality prediction after emergency admissions in ischemic heart disease: a retrospective study of MIMIC-IV databases, Intern. Emer. Med. 18 (2) (2023) 487–497.