CPAE: Contrastive Predictive Autoencoder for Unsu… (2023)
S. Zhu, W. Zheng and H. Pang
Computer Methods and Programs in Biomedicine 234 (2023) 107484
Article history: Received 12 December 2022; Revised 20 February 2023; Accepted 12 March 2023

Keywords: Unsupervised learning; Pre-training; Electronic health records

Abstract

Background and objective: Fully-supervised learning approaches have shown promising results in some health status prediction tasks using Electronic Health Records (EHRs). These traditional approaches rely on sufficient labeled data to learn from. However, in practice, acquiring large-scale labeled medical data for various prediction tasks is often not feasible. Thus, it is of great interest to utilize contrastive pre-training to leverage the unlabeled information.

Methods: In this work, we propose a novel data-efficient framework, the contrastive predictive autoencoder (CPAE), which first learns without labels from the EHR data in a pre-training process and is then fine-tuned on downstream tasks. Our framework comprises two parts: (i) a contrastive learning process, inherited from contrastive predictive coding (CPC), which aims to extract global slow-varying features, and (ii) a reconstruction process, which forces the encoder to capture local features. We also introduce an attention mechanism in one variant of our framework to balance the two processes.

Results: Experiments on a real-world EHR dataset verify the effectiveness of our proposed framework on two downstream tasks (i.e., in-hospital mortality prediction and length-of-stay prediction), compared to their supervised counterparts, the CPC model, and other baseline models.

Conclusions: By comprising both contrastive learning components and reconstruction components, CPAE aims to extract both global slow-varying information and local transient information. The best results on both downstream tasks are achieved by CPAE. The variant AtCPAE is particularly superior when fine-tuned on very small training data. Further work may incorporate multi-task learning techniques to optimize the pre-training process of CPAEs. Moreover, this work is based on the benchmark MIMIC-III dataset, which includes only 17 variables; future work may extend to a larger number of variables.

© 2023 Elsevier B.V. All rights reserved.
https://doi.org/10.1016/j.cmpb.2023.107484
extract meaningful features from EHR data while relying on the supervision of a large amount of task-specific labels. However, the limitations of fully-supervised methods are twofold. First, supervised methods learn in a task-specific way, which may not fully explore the intrinsic nature of the data itself. Second, building large labeled datasets for all medical prediction tasks is not practically feasible. In many scenarios, the number of task-specific labels is far smaller than the size of the available data, and learning merely from task-specific labels does not fully utilize the information provided by the whole available population. Therefore, it is meaningful to design a learning approach that can successfully leverage the unlabeled data and quickly learn to predict downstream tasks using only a small number of labeled examples.

Contrastive unsupervised learning, which has drawn massive attention in computer vision and natural language processing, provides an "unsupervised pre-train then fine-tune" paradigm to mitigate the above issues of learning from labels. It treats the prediction task as a two-step problem. First, it pre-trains the model by minimizing the dissimilarity of augmented data derived from similar (or identical) samples and maximizing the dissimilarity of augmented data derived from different samples. Second, the pre-trained model is fine-tuned with the labels of the downstream tasks. This self-supervision paradigm enables contrastive learning to fully exploit the abundant information in the data itself, leading to higher data-efficiency and generalization ability, which motivates us to employ self-supervised contrastive learning in EHR research. Several works have applied unsupervised contrastive learning to EHR research. Cai et al. [3] and Wang et al. [29] proposed graph-based contrastive learning frameworks: Cai et al. [3] designed a framework to learn a patient-code graph, a patient graph and a medical code graph in a contrastive way, and Wang et al. [29] proposed a graph sampling contrastive learning method for the EHR coding problem. Both works aimed to utilize International Classification of Diseases (ICD) codes to construct contrastive pairs, which makes them not applicable to scenarios where only time-series data is available. For medical time-series data, there is little prior work using contrastive unsupervised learning. Yèche et al. [33] proposed a neighborhood contrastive learning framework; their data augmentation methods to define positive and negative samples were based on channel dropout, Gaussian noise and a momentum encoder. In our work, we propose a different way to augment the data, with the potential to better exploit the intrinsic predictive features of the time-series.

Our work is inspired by the intrinsic design of the contrastive learning paradigm contrastive predictive coding (CPC) [24], which uncovers the predictive, high-level features of time-series data. One of the main research problems in clinical data is to predict the tendency of patients' future vitals and, further, the future outcome. CPC aims to extract the intrinsic predictive features of a time-series that can predict the future given the past, which matches our goal well. Therefore, we propose an unsupervised framework, termed Contrastive Predictive Autoencoder (CPAE), to learn high-level information from EHR data and adapt this state-of-the-art contrastive learning paradigm to EHRs. As CPC is designed to extract predictive, slow-varying features over time [24], it may disregard transient local features. However, these local features can be very important in clinical scenarios, because patients' time-series can be affected by various transient events (e.g., treatments and social/environmental events) whose impacts should be taken into consideration. Therefore, we propose CPAE to comprise: (i) a contrastive learning process inherited from CPC, which aims to extract slow-varying and predictive features, and (ii) a reconstruction process, which forces the encoder to capture local features. In addition to the basic version of CPAE (termed BaseCPAE), we propose a variant of CPAE that incorporates an attention mechanism (termed AtCPAE). Below we highlight our major contributions:

• We propose two novel architectures of unsupervised contrastive learning, BaseCPAE and AtCPAE, to capture both global slow-varying features and local transient features for EHR research. Compared to other unsupervised models on 0.1%, 0.5%, 1% and 5% labeled data on two downstream classification tasks, the best results are all achieved by our models.
• Our models outperform their supervised counterparts in both low and high label rate scenarios with few exceptions, demonstrating that contrastive pre-training helps improve performance on downstream prediction tasks regardless of the label fraction. This shows the potential of CPAEs as an effective pre-training paradigm for EHR research.

2. Methodology

In this section, we first introduce contrastive predictive coding (CPC) proposed by Oord et al. [24]. Then we describe the motivation and design of our proposed learning paradigm, contrastive predictive autoencoder (CPAE), and two versions of CPAE: the basic version (BaseCPAE) and CPAE with an attention mechanism (AtCPAE). Figure 1 illustrates the architecture of CPC; Figure 2 illustrates the architectures of BaseCPAE and AtCPAE.

2.1. Contrastive predictive coding

To capture higher-level features that broadly affect the shared context [24], CPC is designed to maximize the mutual information between the "past" context vector and the "future" latent vectors. There are five major components of CPC:

• a data sampling and partition module that randomly chooses a time frame $t_0$ to segment the patient time series into two parts, "past" and "future";
• an encoder function $f_{enc}(\cdot)$, which embeds each time frame into a latent vector;
• a regressor $f_{reg}(\cdot)$, which learns the context information $c(t_0)$ of the "past" time series by sequentially feeding the latent vectors $z(1), z(2), \ldots, z(t_0)$ encoded by $f_{enc}$ into $f_{reg}$;
• a prediction function $f_{pred}(\cdot)$, which predicts the "future" latent vectors over $K$ time steps: $\hat{z}(t_0+1), \hat{z}(t_0+2), \ldots, \hat{z}(t_0+K)$;
• a contrastive loss function $L$, which measures the "discriminability" of the model.

As shown in Fig. 1, in the CPC framework the original data for one individual $i$ is denoted as $X^i = [x^i(1), x^i(2), \ldots, x^i(n)]$, where $x^i(1), x^i(2), \ldots, x^i(n)$ are the feature vectors for individual $i$ at time points $1, 2, \ldots, n$. The series $X^i$ ($i = 1, \ldots, N$; $N$ is the number of individuals) are first randomly split into "past" and "future", and encoded by $f_{enc}$, yielding the latent vectors $z^i(1), z^i(2), \ldots, z^i(n)$. The latent vectors in the "past" are then fed into $f_{reg}$ to obtain the context information $c^i(t_0)$ over time. $f_{pred}$ subsequently takes $c^i(t_0)$ as input to predict the future latent vectors $\hat{z}^i(t_0+1), \hat{z}^i(t_0+2), \ldots, \hat{z}^i(t_0+K)$ over $K$ time steps. The predicted future latent vectors and the embedded future latent vectors across samples will be used to form contrastive pairs.
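To make the data flow concrete, the following is a minimal PyTorch sketch of the five components. It is an illustration rather than our released implementation (see the source code link in Section 3.3): for brevity the encoder is a single linear layer and the regressor a GRU, whereas our experiments use LSTM and CNN backbones, and all dimensions are placeholders.

```python
import torch
import torch.nn as nn

class CPCSketch(nn.Module):
    """Illustrative CPC module with f_enc, f_reg and f_pred (Section 2.1)."""
    def __init__(self, in_dim=76, z_dim=64, c_dim=64, K=4):
        super().__init__()
        self.K = K
        self.f_enc = nn.Linear(in_dim, z_dim)             # placeholder encoder
        self.f_reg = nn.GRU(z_dim, c_dim, batch_first=True)
        self.f_pred = nn.ModuleList(
            [nn.Linear(c_dim, z_dim) for _ in range(K)])  # one head per step k

    def forward(self, x, t0):
        # x: (batch, n, in_dim); t0 splits the series into "past" and "future"
        z = self.f_enc(x)                                 # latent z(1..n)
        ctx, _ = self.f_reg(z[:, :t0])                    # run over the "past"
        c_t0 = ctx[:, -1]                                 # context vector c(t0)
        z_hat = torch.stack([h(c_t0) for h in self.f_pred], dim=1)
        return z_hat, z[:, t0:t0 + self.K]                # predicted, embedded
```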
Fig. 1. The framework of CPC: the whole time-series is divided into "past" and "future". CPC aims to maximize the mutual information between the "past" context vector, i.e., $c^i(t)$, and the "future" embedded latent vectors, i.e., $z(t_0+1), z(t_0+2), \ldots, z(t_0+K)$, by learning to minimize the contrastive loss formulated with the predicted future latent vectors, i.e., $\hat{z}(t_0+1), \hat{z}(t_0+2), \ldots, \hat{z}(t_0+K)$, and the embedded future latent vectors, i.e., $z(t_0+1), z(t_0+2), \ldots, z(t_0+K)$.

Fig. 2. Frameworks of CPAE. (a) BaseCPAE: in addition to the components of CPC, BaseCPAE adds a decoder function (i.e., $f_{dec}$) to reconstruct the time-series; (b) AtCPAE: an attention mechanism is added on top of the latent vectors.
The contrastive loss function $L_{NCE}$ is formulated from the similarity among positive pairs and the similarity among negative pairs, without labels. Let $(\hat{z}^i(t), z^i(t))$ denote the predicted and embedded latent vectors of individual $i$ at time frame $t$, and let $B$ be a randomly sampled batch containing $N$ samples. As contrastive learning is a discriminative approach which aims to maximize the similarity between positive pairs and the difference between negative pairs, we define positive pairs as $(\hat{z}^i(t), z^i(t))$ ($t_0 < t \le t_0+K$, $i \in B$), i.e., the predicted and embedded latent vectors from the same individual, and negative pairs as $(\hat{z}^l(t), z^j(t))$ ($t_0 < t \le t_0+K$; $l, j \in B$; $l \ne j$), i.e., the predicted and embedded latent vectors from different individuals. $\mathrm{sim}(u, v)$ is a similarity measurement function for vectors $u$ and $v$; here we simply use the inner product. The contrastive loss for batch $B$, the Noise Contrastive Estimation (NCE) loss [10,23,24], can then be formulated as:

$$L^B_{NCE} = -\sum_{i \in B} \sum_{k=1}^{K} \log \frac{\exp\big(\mathrm{sim}(\hat{z}^i(t_0+k), z^i(t_0+k))\big)}{\sum_{l,j \in B} \exp\big(\mathrm{sim}(\hat{z}^l(t_0+k), z^j(t_0+k))\big)} \tag{1}$$

The more discriminating the latent vectors across samples are, the lower $L^B_{NCE}$ is. Therefore, CPC models are trained by minimizing $\sum_B L^B_{NCE}$, thereby updating the encoder.
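The batch-wise computation of Eq. (1) can be sketched directly in PyTorch. Note that, as written, the denominator of Eq. (1) sums over all pairs $(l, j)$ in the batch; the sketch below follows that form, using the inner product as $\mathrm{sim}(\cdot, \cdot)$:

```python
import torch

def nce_loss(z_hat, z_fut):
    """Sketch of Eq. (1); z_hat, z_fut: (B, K, z_dim) tensors holding the
    predicted and embedded future latent vectors for one batch."""
    B, K, _ = z_hat.shape
    loss = z_hat.new_zeros(())
    for k in range(K):
        scores = z_hat[:, k] @ z_fut[:, k].T   # sim(z_hat_l, z_j), shape (B, B)
        log_denom = torch.logsumexp(scores.reshape(-1), 0)  # sum over all (l, j)
        loss = loss - (torch.diagonal(scores) - log_denom).sum()  # positives l = j
    return loss
```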
2.2. BaseCPAE

CPC has shown promising results in problems such as speaker identification and image classification, demonstrating its capability in extracting high-level, global and slow-varying information [24]. However, merely extracting slow-varying information may be insufficient for clinical outcome prediction. Transient medical events, which may lead to sudden local fluctuations in the time-series data, are important for outcome prediction. To jointly capture global slow-varying information and local transient information, we propose CPAE to incorporate two processes: a contrastive learning process as in CPC and a reconstruction process as in an autoencoder.

As shown in Fig. 2(a), BaseCPAE inherits the components of CPC (as described in Section 2.1) to capture high-level, global slow-varying information. It additionally applies a decoder function ($f_{dec}$) to the latent vectors $(z(1), z(2), \ldots, z(n))$ to reconstruct $X$, so that the encoder also captures local information.

However, evaluating the reconstruction of such sparse data requires an elaborate design. Since missing data can make up as much as 80% of the data points in EHR data, imputation (following the benchmark work [12]) of these "not missing at random" data points (the missingness is non-random and relates to the missing variable) introduces a large number of repeated values, leading to biased inference [1]. To mitigate the effect of missing data, the uncertainty of the imputed values should be passed to the network, and the reconstruction loss should weight observed data more heavily. Thus, we 1) stack a Boolean indicator matrix (masking matrix) $I$ onto $X$, representing the missingness of the corresponding data points in $X$ (1: missing, 0: observed); and 2) calculate the reconstruction error not only for the whole data matrix, but also explicitly for the data points which are not missing. In addition, the missingness of these "not missing at random" data can itself be informative in clinical scenarios. For instance, blood oxygen saturation is usually a diagnostic test for patients who have symptoms of lung disease (such as chest distress and shallow breathing); patients showing no respiratory symptoms may have fewer records of blood oxygen saturation tests. To take the information contained in missingness into consideration, we additionally calculate the reconstruction error of the masking matrix $I$. We therefore formulate the reconstruction loss as follows.
Let $[\hat{X}^p | \hat{I}^p]$ denote the reconstructed data matrix and reconstructed masking matrix for individual $p$, and let $\mathrm{MSE}(\cdot, \cdot)$ denote the mean squared error. Let $C^p = \{(i, j) \mid I^p_{i,j} = 0\}$ be the set of positions where the data are observed (excluding imputed data) for individual $p$; that is, $X^p_{i,j}$ with $(i, j) \in C^p$ means the $j$-th feature at time $i$ for individual $p$ is an observed value, rather than an imputed value that was originally missing. We calculate the reconstruction loss for individual $p$, $L^p_{dec}$, as

$$L^p_{dec} = \lambda_1 L_1^p + \lambda_2 L_2^p + \lambda_3 L_3^p \tag{2}$$

where

$$L_1^p = \mathrm{MSE}\big([X^p | I^p], [\hat{X}^p | \hat{I}^p]\big), \quad L_2^p = \frac{\sum_{(i,j) \in C^p} \big(X^p_{ij} - \hat{X}^p_{ij}\big)^2}{|C^p|}, \quad L_3^p = \mathrm{MSE}\big(I^p, \hat{I}^p\big)$$
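The three terms of Eq. (2) translate into a few lines of PyTorch. The sketch below assumes $X$, $I$ and the decoder outputs are float tensors of identical shape; it is illustrative rather than our released implementation:

```python
import torch
import torch.nn.functional as F

def recon_loss(X, I, X_hat, I_hat, lam=(1.0, 1.0, 1.0)):
    """Sketch of Eq. (2). X: imputed data; I: masking matrix
    (1 = missing/imputed, 0 = observed); X_hat, I_hat: decoder outputs."""
    # L1: MSE over the stacked data-and-mask matrix [X | I]
    L1 = F.mse_loss(torch.cat([X_hat, I_hat], dim=-1),
                    torch.cat([X, I], dim=-1))
    # L2: MSE restricted to the observed entries C = {(i, j) | I_ij = 0}
    observed = I == 0
    L2 = ((X - X_hat)[observed] ** 2).mean()
    # L3: reconstruct the (clinically informative) missingness pattern itself
    L3 = F.mse_loss(I_hat, I)
    return lam[0] * L1 + lam[1] * L2 + lam[2] * L3
```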
Then we define a multitask loss over a batch $B$ to balance the reconstruction loss $L^p_{dec}$ and the NCE loss $L^B_{NCE}$ of CPAE:

$$L^B = \frac{\lambda_{NCE}}{s} L^B_{NCE} + \frac{\lambda_{dec}}{s} \sum_{p \in B} L^p_{dec} \tag{3}$$

$$\phantom{L^B} = \frac{\lambda_{NCE}}{s} L^B_{NCE} + \sum_{p \in B} \Big( \frac{\lambda_1}{s} L_1^p + \frac{\lambda_2}{s} L_2^p + \frac{\lambda_3}{s} L_3^p \Big), \tag{4}$$

where $s = \lambda_{NCE} + \lambda_1 + \lambda_2 + \lambda_3$. BaseCPAE can be trained by iterating through batches to minimize the loss function defined by Eq. (4). The weights $\lambda_{NCE}, \lambda_1, \lambda_2, \lambda_3$ are a set of hyper-parameters which need to be tuned. Further study may consider using multi-task learning techniques to optimize the choice of these weights.
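Since $s$ normalizes the weights, Eq. (4) is a convex combination of the four loss terms. A one-function sketch (with batch-summed reconstruction terms; the function name is ours):

```python
def cpae_loss(L_nce, L1, L2, L3, lam_nce=1.0, lam1=1.0, lam2=1.0, lam3=1.0):
    # Eq. (4): the weights are normalized by s = lam_nce + lam1 + lam2 + lam3,
    # so scaling all lambdas by a constant leaves the loss unchanged.
    s = lam_nce + lam1 + lam2 + lam3
    return (lam_nce * L_nce + lam1 * L1 + lam2 * L2 + lam3 * L3) / s
```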
2.3. CPAE with attention (AtCPAE)

Forcing the features of the latent vectors to serve both the reconstruction and prediction processes at the same time may limit the learning capacity of these features. We thus introduce an attention mechanism to allow a flexible selection of features in a self-adaptive way. As shown in Fig. 3, for each individual, two linear feature-wise attention modules are introduced on top of the latent vectors $z^p(t)$, one attending to the decoding task and the other to the predictive task. As defined above, $z^p(t)$ denotes the latent vector at time point $t$ for individual $p$. Let $W_{pred}$ and $W_{dec}$ denote the linear attention modules attending to the predictive task and the decoding task, respectively; $W_{pred}$ and $W_{dec}$ are learnable matrices. We then have:

$$z^p_{pred}(t) = s^p_{pred}(t) \cdot z^p(t) = W_{pred}\, z^p(t) \cdot z^p(t)$$
$$z^p_{dec}(t) = s^p_{dec}(t) \cdot z^p(t) = W_{dec}\, z^p(t) \cdot z^p(t) \tag{5}$$

Other than the attention modules, AtCPAE shares the same architecture as BaseCPAE: $z^p_{pred}(t)$ is fed to $f_{reg}$ and the subsequent prediction procedures, and $z^p_{dec}(t)$ is fed to $f_{dec}$ for reconstruction. The loss function of AtCPAE is defined in the same way as in Eq. (4).
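Reading the products in Eq. (5) as element-wise gating of $z^p(t)$ by the attention scores $s^p(t) = W z^p(t)$, the two modules can be sketched as follows (an illustrative reading, not our released implementation):

```python
import torch.nn as nn

class FeatureWiseAttention(nn.Module):
    """Sketch of Eq. (5): linear feature-wise gates that route latent
    features to the predictive and decoding branches."""
    def __init__(self, z_dim=64):
        super().__init__()
        self.W_pred = nn.Linear(z_dim, z_dim, bias=False)  # learnable W_pred
        self.W_dec = nn.Linear(z_dim, z_dim, bias=False)   # learnable W_dec

    def forward(self, z):
        # z: (batch, n, z_dim); the scores s(t) = W z(t) gate z(t) element-wise
        z_pred = self.W_pred(z) * z   # fed to f_reg and the contrastive branch
        z_dec = self.W_dec(z) * z     # fed to f_dec for reconstruction
        return z_pred, z_dec
```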
3. Experiments

3.1. Data

The development and evaluation of both the proposed models and the baseline models are conducted on the MIMIC-III database [17]. We pre-train our models on the training set without labels, then fine-tune them on the same training set for two downstream tasks, i.e., in-hospital mortality and length-of-stay prediction. All data selection, training-test splitting and preprocessing are conducted using the code provided by the benchmark work [12]. As a result, the first 48 h of 17 time-series variables of 21,138 patients are included in this work. The training, validation, and test sets contain 14,681, 3221, and 3236 patients, respectively. In the pre-training phase, the 48 h time-series is divided into "past" and "future" as illustrated in Section 2. For the downstream prediction phase, the context vector $c(t)$ (as described in Fig. 2(a) and (b)) of the 48 h is fed into the linear classifier, as it represents the information contained in the first 48 h after ICU admission. Since the recording time points of medical records are extremely uneven, the data was "discretized" (see Harutyunyan et al. [12]) so that each hour has exactly four time points. Of the 17 variables, twelve are continuous and five are categorical. The categorical variables are one-hot encoded, which results in 76 dimensions overall.

For in-hospital mortality prediction, we follow the standard experimental settings commonly adopted by previous research [12]. The goal of this task is to use the data of the first 48 h of a hospital visit to predict whether this individual would eventually die in the hospital before being discharged.

For length-of-stay prediction, we modify the commonly adopted settings to improve clinical utility. The length of stay of a patient is defined as the time duration from the patient being admitted to the hospital to the patient being discharged. Previous work recoded the length of stay into 10 categories: less than one day, 1-2 days, 2-3 days, 3-4 days, 4-5 days, 5-6 days, 6-7 days, 1-2 weeks and more than 2 weeks. However, this definition neglects the outcome of patients: it treats patients discharged after a short duration the same as patients who died after a short duration, which is not appropriate. Instead, we redefine the prediction target into three categories: death, short stay and long stay. A short stay is defined as a length of stay shorter than 35.5 h (the median length of stay); a long stay is a length of stay longer than 35.5 h. We do not recode the stay duration into ten categories, since we aim to conduct experiments on the training set with a small percentage of labels, which would lead to too few labels for a ten-class classification task.

3.2. Baselines

To demonstrate the effectiveness of our two proposed architectures, BaseCPAE and AtCPAE, we compare our models with the following four baselines of two types, which also serve as ablation studies:

3.2.1. Supervised models trained from scratch (SUP)
We train fully-supervised models (SUP), as supervised counterparts of BaseCPAE and AtCPAE, from scratch, directly on the downstream task without pre-training. More specifically, the whole time-series is encoded by an encoder, then fed into a regression function to obtain the context vector, which is finally connected to a fully connected layer for prediction.

3.2.2. Pre-trained models
Contrastive predictive coding (CPC). We pre-train the encoder and regressor of CPC, then connect the output of the regressor to the downstream classifier.
Contrastive autoencoder (CAE). We design CAE as an autoencoder whose latent vector and reconstructed vector form contrastive pairs for pre-training. On the downstream task, the latent vectors are flattened and fed into the downstream classifier.
Autoencoder (AE). We also conduct pre-training using an AE and then feed the flattened latent vector into the downstream classifier.
All implementations of the above four baselines share the same hyperparameters with our proposed models, so that the comparisons can serve as model ablation studies.

3.3. Setting-up

For all models except SUP, the whole training process contains two steps: 1) pre-training on the full training data without labels, and 2) fine-tuning on a proportion of the training data with labels. We evaluate the performance of all models on two backbones, long short-term memory (LSTM) and convolutional neural network (CNN), separately. Subscript $l$ denotes the LSTM backbone (e.g., CPAE_l) and subscript $c$ denotes the CNN backbone (e.g., CPAE_c).

To compare the performance of the above models fairly, we implement the models such that the shared architecture has identical layers and activation functions in each set of experiments. Please see the source code (available at https://github.com/anonymousparticipant/CPAE) for the hyper-parameters of the architecture.

The pre-training process uses Adam as the optimizer, with weight decay 0.0004 and eps 1e-9. The learning rate is automatically updated during the process. For the fine-tuning process, we sample 0.1%, 0.5%, 1%, 5% of the labeled training data in a class-balanced way for in-hospital mortality prediction, and 0.5%, 1%, 5% for length-of-stay prediction. Note that 0.1% of the length-of-stay data is too small for three-class classification, so we do not conduct experiments at that rate. The sampling process is conducted ten times. We then fine-tune the pre-trained models and the downstream classifier (a linear classifier) together on the sampled subsets. Average performances and standard deviations are reported in Tables 1 and 2.

The evaluation metrics for in-hospital mortality prediction and length-of-stay prediction are area under the curve (AUC) and accuracy (top-1 accuracy), respectively.
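The class-balanced subsampling can be sketched as follows; the equal-per-class split and the helper name are illustrative assumptions rather than details taken from the released code:

```python
import numpy as np

def class_balanced_sample(labels, frac, seed=0):
    """Draw a `frac` fraction of the labeled training set with equal counts
    per class (assumes each class has enough samples to draw from)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    per_class = max(1, round(len(labels) * frac / len(classes)))
    picked = [rng.choice(np.where(labels == c)[0], per_class, replace=False)
              for c in classes]
    return np.concatenate(picked)   # indices of the fine-tuning subset
```

For example, class_balanced_sample(y_train, 0.001) would return the indices of a 0.1% subset, and repeating with ten seeds reproduces the ten sampling rounds.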
3.4. Comparison results

In this subsection, we analyse the comparison results on the two downstream tasks between our proposed models and the baseline models, as shown in Tables 1 and 2.

3.4.1. LSTM-based models
First, compared to SUP_l, AtCPAE_l achieves better performance across different proportions of labels, which demonstrates the strength of pre-training in AtCPAE_l. The pre-training is particularly helpful when there are very few labels (0.1% and 0.5%). In contrast, each of the other pre-trained models (AE_l, CAE_l, CPC_l, BaseCPAE_l) fails to outperform SUP_l in at least one case. In particular, the AUCs of AE_l, CAE_l, CPC_l and BaseCPAE_l on length-of-stay prediction are lower than SUP_l with only one exception. This suggests that the pre-training effect of AE_l, CAE_l, CPC_l and BaseCPAE_l with the LSTM backbone on the MIMIC-III dataset may not be stable, whereas the pre-training in AtCPAE_l stably contributes to prediction on the downstream tasks.

Second, we observe that AtCPAE_l achieves the best performance of all models across both backbones (marked with (∗) in the tables) on 0.1%, 0.5% and 1% of the in-hospital mortality data, and on 0.5%, 1% and 5% of the length-of-stay data. BaseCPAE_l achieves the best performance on 5% of the in-hospital mortality data. None of the best results on the downstream tasks is achieved by a baseline model.

3.4.2. CNN-based models
First, we observe that for the CNN backbone, CPAEs consistently outperform SUP_c in both tasks. In contrast, the other contrastive pre-trained models (CAE_c, CPC_c) do not outperform SUP_c on some proportions of labeled data. For instance, on 1% labeled length-of-stay prediction data, the pre-training in CAE_c and CPC_c leads to decreases of 0.036 and 0.018 in performance, respectively, compared to SUP_c.

In addition, BaseCPAE_c achieves the best performance among CNN-based models for in-hospital mortality prediction. AtCPAE_c achieves the best performance among CNN-based models for length-of-stay prediction when the label rate is 1% or 5%. BaseCPAE_c ranks first on average among all CNN-based models in these prediction tasks.

3.5. Effect of label rates

We further investigate the performance of BaseCPAE and AtCPAE when the label rate is larger. Results are shown in Fig. 4. We observe that 1) BaseCPAE_l is more advantageous when the label rate is larger than 10%; 2) AtCPAE_l significantly surpasses its supervised counterpart when the label rate is smaller than 10%; and 3) the pre-training in BaseCPAE and AtCPAE improves performance in most cases, with few exceptions.

3.6. Case study: prediction among the elderly

We conduct a case study to investigate the prediction performance of our proposed models and the baseline models on a subgroup consisting of the elderly (age > 75). The training, validation and test data are of sizes 3750, 824 and 843, respectively. Since LSTM-based models achieve better performance on both tasks, we conduct these experiments focusing on the LSTM backbone. The results are shown in Table 3.

We observe that AtCPAE_l achieves the best performance on in-hospital mortality prediction, and BaseCPAE_l achieves the best performance on length-of-stay prediction.
Table 1
AUC of models fine-tuned with few labels on in-hospital mortality prediction. Models with subscript $l$ are LSTM-based and models with subscript $c$ are CNN-based. Results close to the best (difference ≤ 0.001) are shown in bold. The number in brackets is the standard deviation over the ten predictions.

Fig. 4. Performance of BaseCPAE, AtCPAE and SUP on the two prediction tasks when the label rate ranges from 0.1% to 100%. Models in the left two panels are LSTM-based; models in the right two panels are CNN-based.
[20,30,36]. Based on the labels, samples of the same class are treated as pairs to formulate a contrastive training signal [36]. To mitigate the impact of high intra-class variation and class imbalance, Wanyan et al. [30] proposed two strategies to construct the k-nearest neighbors sample graph and then draw positive pairs. Li and Gao [20] constructed a contrastive loss between samples and learned cluster anchors, in addition to the supervised contrastive loss. However, these successes rely on a large number of labels, which are not always available in many clinical scenarios.

Unsupervised contrastive pre-training. Unsupervised contrastive learning has been successfully applied in the fields of computer vision [4,14,31] and natural language processing [9,15,26], achieving results comparable to supervised learning while using only a few labels. There are also efforts in biomedical fields. You et al. [34,35] propose a graph contrastive learning model and extend contrastive learning to biochemical applications. In medical image processing, several recent works introduce novel pretext tasks tailored to domain-specific downstream tasks [5,37]. Dong et al. [5] proposed a multi-task framework to learn from sequential medical images, and Dong and Voiculescu [6] extended contrastive learning for medical images to a federated setup. Considering multiple modalities, other works proposed multi-modal contrastive learning methods on medical data to utilize both texts and images [13]. There also exist some works which applied unsupervised contrastive learning to non-imaging medical research. Cai et al. [3] and Wang et al. [29] proposed graph-based contrastive learning frameworks: Cai et al. [3] designed a framework to learn a patient-code graph, a patient graph and a medical code graph in a contrastive way, and Wang et al. [29] proposed a graph sampling contrastive learning method for the EHR coding problem. Both works aimed to utilize ICD codes to build contrastive pairs, which makes them not applicable to scenarios where only time-series data is available. There is little prior work which designed contrastive unsupervised learning for medical time-series data. Yèche et al. [33] proposed a neighborhood contrastive learning framework; their augmentation methods to define positive and negative samples were based on channel dropout, Gaussian noise and a momentum encoder. The afore-

Declaration of Competing Interest

The authors declare that they have no conflict of interest.

Acknowledgments

HP reports personal fees from Genentech outside the submitted work. The other authors state no conflict of interest. This work was exempt from Institutional Review Board review as it is an analysis of publicly downloadable data.

References

[1] B.K. Beaulieu-Jones, D.R. Lavage, J.W. Snyder, J.H. Moore, S.A. Pendergrass, C.R. Bauer, Characterizing and managing missing structured data in electronic health records: data analysis, JMIR Med. Inform. 6 (1) (2018) e11, doi:10.2196/medinform.8960.
[2] D. Bera, M.M. Nayak, Mortality risk assessment for ICU patients using logistic regression, in: 2012 Computing in Cardiology, IEEE, 2012, pp. 493–496.
[3] D. Cai, C. Sun, M. Song, B. Zhang, S. Hong, H. Li, Hypergraph contrastive learning for electronic health records, in: Proceedings of the 2022 SIAM International Conference on Data Mining (SDM), SIAM, 2022, pp. 127–135.
[4] T. Chen, S. Kornblith, M. Norouzi, G. Hinton, A simple framework for contrastive learning of visual representations, arXiv preprint arXiv:2002.05709 (2020).
[5] N. Dong, M. Kampffmeyer, I. Voiculescu, Self-supervised multi-task representation learning for sequential medical images, in: Machine Learning and Knowledge Discovery in Databases. Research Track: European Conference, ECML PKDD 2021, Bilbao, Spain, September 13–17, 2021, Proceedings, Part III 21, Springer, 2021, pp. 779–794.
[6] N. Dong, I. Voiculescu, Federated contrastive learning for decentralized unlabeled medical images, in: Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part III 24, Springer, 2021, pp. 378–387.
[7] J. Egger, C. Gsaxner, A. Pepe, K.L. Pomykala, F. Jonske, M. Kurz, J. Li, J. Kleesiek, Medical deep learning–a systematic meta-review, Comput. Methods Programs Biomed. (2022) 106874.
[8] A. Fabregat, M. Magret, J.A. Ferré, A. Vernet, N. Guasch, A. Rodríguez, J. Gómez, M. Bodí, A machine learning decision-making tool for extubation in intensive care unit patients, Comput. Methods Programs Biomed. 200 (2021) 105869.
[9] H. Fang, S. Wang, M. Zhou, J. Ding, P. Xie, CERT: contrastive self-supervised learning for language understanding, arXiv preprint arXiv:2005.12766 (2020).
[10] M. Gutmann, A. Hyvärinen, Noise-contrastive estimation: a new estimation principle for unnormalized statistical models, in: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010, pp. 297–304.
[11] S.L. Hamilton, J.R. Hamilton, Predicting in-hospital-death and mortality percentage using logistic regression, in: 2012 Computing in Cardiology, IEEE, 2012, pp. 489–492.
[12] H. Harutyunyan, H. Khachatrian, D.C. Kale, G. Ver Steeg, A. Galstyan, Multitask learning and benchmarking with clinical time series data, Sci. Data 6 (1) (2019) 96, doi:10.1038/s41597-019-0103-9.
[13] L. Heiliger, A. Sekuboyina, B. Menze, J. Egger, J. Kleesiek, Beyond medical imaging–a review of multimodal deep learning in radiology (2022).
[14] O. Henaff, Data-efficient image recognition with contrastive predictive coding, in: International Conference on Machine Learning, PMLR, 2020, pp. 4182–4192.
[15] D. Iter, K. Guu, L. Lansing, D. Jurafsky, Pretraining with contrastive sentence objectives improves discourse performance of language models, arXiv preprint arXiv:2005.10389 (2020).
[16] A.E. Johnson, N. Dunkley, L. Mayaud, A. Tsanas, A.A. Kramer, G.D. Clifford, Patient specific predictions in the intensive care unit using a Bayesian ensemble, in: 2012 Computing in Cardiology, IEEE, 2012, pp. 249–252.
[17] A.E. Johnson, T.J. Pollard, L. Shen, H.L. Li-Wei, M. Feng, M. Ghassemi, B. Moody, P. Szolovits, L.A. Celi, R.G. Mark, MIMIC-III, a freely accessible critical care database, Sci. Data 3 (1) (2016) 1–9.
[18] W.A. Knaus, D.P. Wagner, E.A. Draper, J.E. Zimmerman, M. Bergner, P.G. Bastos, C.A. Sirio, D.J. Murphy, T. Lotring, A. Damiano, et al., The APACHE III prognostic system: risk prediction of hospital mortality for critically ill hospitalized adults, Chest 100 (6) (1991) 1619–1636.
[19] J.-R. Le Gall, S. Lemeshow, F. Saulnier, A new simplified acute physiology score (SAPS II) based on a European/North American multicenter study, JAMA 270 (24) (1993) 2957–2963.
[20] R. Li, J. Gao, Multi-modal contrastive learning for healthcare data analytics, in: 2022 IEEE 10th International Conference on Healthcare Informatics (ICHI), IEEE, 2022, pp. 120–127.
[21] H.W. Loh, C.P. Ooi, S. Seoni, P.D. Barua, F. Molinari, U.R. Acharya, Application of explainable artificial intelligence for healthcare: a systematic review of the last decade (2011–2022), Comput. Methods Programs Biomed. 226 (2022) 107161.
[22] O. Martinez, C. Martinez, C.A. Parra, S. Rugeles, D.R. Suarez, Machine learning for surgical time prediction, Comput. Methods Programs Biomed. 208 (2021) 106220.
[23] A. Mnih, Y.W. Teh, A fast and simple algorithm for training neural probabilistic language models, arXiv preprint arXiv:1206.6426 (2012).
[24] A. van den Oord, Y. Li, O. Vinyals, Representation learning with contrastive predictive coding, arXiv preprint arXiv:1807.03748 (2018).
[25] T.J. Pollard, L. Harra, D. Williams, S. Harris, D. Martinez, K. Fong, 2012 PhysioNet challenge: an artificial neural network to predict mortality in ICU patients and application of solar physics analysis methods, in: 2012 Computing in Cardiology, IEEE, 2012, pp. 485–488.
[26] A. Radford, J.W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, et al., Learning transferable visual models from natural language supervision, arXiv preprint arXiv:2103.00020 (2021).
[27] M. Scherpf, F. Gräßer, H. Malberg, S. Zaunseder, Predicting sepsis with a recurrent neural network using the MIMIC III database, Comput. Biol. Med. 113 (2019) 103395.
[28] S. Vairavan, L. Eshelman, S. Haider, A. Flower, A. Seiver, Prediction of mortality in an intensive care unit using logistic regression and a hidden Markov model, in: 2012 Computing in Cardiology, IEEE, 2012, pp. 393–396.
[29] S. Wang, P. Ren, Z. Chen, Z. Ren, H. Liang, Q. Yan, E. Kanoulas, M. de Rijke, Few-shot electronic health record coding through graph contrastive learning, arXiv preprint arXiv:2106.15467 (2021).
[30] T. Wanyan, J. Zhang, Y. Ding, A. Azad, Z. Wang, B.S. Glicksberg, Bootstrapping your own positive sample: contrastive learning with electronic health record data, arXiv preprint arXiv:2104.02932 (2021).
[31] Z. Wu, Y. Xiong, S.X. Yu, D. Lin, Unsupervised feature learning via non-parametric instance discrimination, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3733–3742.
[32] H. Xia, B.J. Daley, A. Petrie, X. Zhao, A neural network model for mortality prediction in ICU, in: 2012 Computing in Cardiology, IEEE, 2012, pp. 261–264.
[33] H. Yèche, G. Dresdner, F. Locatello, M. Hüser, G. Rätsch, Neighborhood contrastive learning applied to online patient monitoring, in: International Conference on Machine Learning, PMLR, 2021, pp. 11964–11974.
[34] Y. You, T. Chen, Y. Shen, Z. Wang, Graph contrastive learning automated, 2021.
[35] Y. You, T. Chen, Y. Sui, T. Chen, Z. Wang, Y. Shen, Graph contrastive learning with augmentations, arXiv preprint arXiv:2010.13902 (2020).
[36] C. Zang, F. Wang, SCEHR: supervised contrastive learning for clinical risk prediction using electronic health records, arXiv preprint arXiv:2110.04943 (2021).
[37] Y. Zhang, H. Jiang, Y. Miura, C.D. Manning, C.P. Langlotz, Contrastive learning of medical visual representations from paired images and text, arXiv preprint arXiv:2010.00747 (2020).