
SPECIAL SECTION ON DATA-ENABLED INTELLIGENCE FOR DIGITAL HEALTH

Received May 4, 2019, accepted May 16, 2019, date of publication May 27, 2019, date of current version July 3, 2019.
Digital Object Identifier 10.1109/ACCESS.2019.2919385

Predicting Alzheimer’s Disease Using LSTM


XIN HONG 1,2, RONGJIE LIN 1, CHENHUI YANG 1, NIANYIN ZENG 3, CHUNTING CAI 1, JIN GOU 2, AND JANE YANG 4


1 Computer Science Department, School of Information Science and Engineering, Xiamen University, Xiamen 361005, China
2 College of Computer Science and Technology, Huaqiao University, Xiamen 361021, China
3 Department of Instrumental and Electrical Engineering, Xiamen University, Xiamen 361005, China
4 Cognitive Science Department, Sixth College, University of California, San Diego CA92092, USA

Corresponding authors: Xin Hong ([email protected], [email protected]), Chenhui Yang ([email protected]), Nianyin Zeng
([email protected])
This work was supported by the International Science and Technology Cooperation Project of Fujian Province of China under Grant
2019I0003, the Science and Technology Planning Project of Quanzhou under Grant 2017G01, the Online Course Supporting Project of
Fujian under Grant 612-52418005 and 612-50117024, and the Fundamental Research Funds for the Central Universities under Grant
20720190009.

ABSTRACT Alzheimer’s Disease (AD) is a chronic neurodegenerative disease. Early diagnosis can
considerably decrease the risk of further deterioration. Unfortunately, current studies mainly focus on
classifying the current state of the disease instead of predicting its possible development.
Long short-term memory (LSTM) is a special kind of recurrent neural network, which might be
able to connect previous information to the present task. Noticing that the temporal data of a patient are
potentially meaningful for predicting the development of the disease, we propose a prediction model based
on LSTM. An LSTM network, with fully connected and activation layers, is built to encode
the temporal relation between features and the next stage of Alzheimer’s Disease. Experiments show
that our model outperforms most of the existing models.

INDEX TERMS Alzheimer’s Disease, Prediction, LSTM, Time Sequence, Magnetic Resonance Imaging

The associate editor coordinating the review of this manuscript and approving it for publication was Wenbing Zhao.
2169-3536 2019 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
VOLUME 7, 2019

I. INTRODUCTION

Alzheimer’s Disease (AD) [1] is a progressive disease characterized by memory loss and other cognitive impairments, and it accounts for sixty to eighty percent of all dementia cases. AD was first described in 1906, and 70 years later it was recognized as a major cause of death [2]. In 2015, official death certificates attributed 110,561 deaths to AD, making it the sixth leading cause of death in the United States. According to Wikipedia, AD is one of the most costly diseases.1 In 2018, an estimated 47 billion dollars were projected for AD or other dementias. Early diagnosis may slow down the progression of the disease and therefore reduce the substantial cost of health care. Research showed that early treatment that decreases the rate of functional decline in Alzheimer’s or other dementias would reduce the average per-person lifetime cost by 4,122 dollars in 2017 [1].

1 https://en.wikipedia.org/wiki/Alzheimer’s_disease

In general, AD has several stages. Normal Controls (NC) are healthy subjects, whereas AD denotes patients with a degenerative disease of the brain. Mild Cognitive Impairment (MCI) is the prodromal stage of AD [3]. A patient’s symptoms may develop toward the intermediate stage, namely progressive Mild Cognitive Impairment (pMCI), or remain stable, that is, stable Mild Cognitive Impairment (sMCI). In practice, doctors care more about the time point at which the disease transforms from one stage to another, while current research mainly focuses on predicting the probability that the disease transforms into another stage. However, it is difficult to forecast the progress of the disease by simply classifying sMCI versus pMCI. Our model can predict the stage transition over a period of time. Different from previous work, a model to predict the development of AD with LSTM [4]–[6] is proposed.

On the one hand, studies indicate that brain changes associated with AD may begin more than 20 years before symptoms appear. On the other hand, LSTM, with its chain of repeating neural network modules, is capable of learning long-term dependencies. In this paper, considering that time series data may affect the prediction, we use time step data obtained by a data preprocessing pipeline. Based on these data, we build an LSTM time sequence model to predict AD.
Then, we compare the efficiency of our algorithm with recent state-of-the-art algorithms. Furthermore, we test our algorithm’s stability on different data sizes and evaluate its sensitivity to different features.

II. RELATED WORK

Magnetic Resonance Imaging (MRI) [7], Positron Emission Tomography (PET) [8], and Diffusion Tensor Imaging (DTI) [9] are the neuroimaging tools used for Alzheimer’s research and, in some cases, as clinical diagnosis aids. Recently, a number of studies have used imaging data to facilitate the development of treatments that target underlying brain changes at each stage [10]. The AD Neuroimaging Initiative (ADNI)2 is a data resource providing datasets for researchers, including MRI, DTI, and PET images. Among these image data types, MRI is a technique used to image the anatomy of the brain. With MRI, atrophy can be measured by the volume of gray matter (GM) and white matter (WM) of the brain.

2 http://adni.loni.usc.edu/

In the literature, there are in general three different methods [11], [12] to identify AD. Conventionally, the prediction is made with a traditional classifier on human-engineered features [13]–[15]. In recent years, classification approaches built on deep neural networks have appeared [16]. Combinations of deep neural networks and traditional classification approaches can also be seen in recent literature. In these hybrid approaches, the neural network is either treated as the feature selection tool or used as the classifier once human-engineered features are ready [17]–[20].

In the first category of methods, Gray Matter (GM) tissue volumes of the Regions Of Interest (ROIs) are selected by using the Discriminative Self-representation Sparse Regression. The features are thereafter fed into an SVM classifier to make the prediction [13]. In the second type of approaches, Random Forest (RF) is adopted to select the features and a Deep Neural Network (DNN) is adopted to do the classification [20]. For the third category of approaches, a Convolutional Neural Network (CNN) along with PCA-LASSO is designed to select features. Similar to the first type of approaches, the SVM is adopted to classify the disease [19]. In the first and second types of approaches, it is hard for traditional classifiers to make use of time-related features, that is, the temporal status of each individual case in the data. Although the classifiers themselves are powerful, their performance is still limited by the lack of temporal information in the traditional features. In contrast, the third type of methods is able to capture time-sensitive features, which can be conveniently used to predict the future stage of the disease. Owing to this advantage, it is adopted in our design. First, data are pre-processed by reclassification, data interpolation, data normalization, and data serialization. Then, an LSTM model is built to perform the prediction. As shown in the experiments of Table 2, our approach demonstrates better performance than the existing approaches on time sequence data.

III. DATA PREPROCESS

The images are extracted from the AD Neuroimaging Initiative (ADNI) database. In the datasets, one thousand series of cases showing AD progression symptoms were collected. The datasets released by ADNI are adopted in our experiments. They have been pre-processed by skull stripping, registration, segmentation, normalization, and smoothing [21]. The image processing follows these procedures: first, the skulls are stripped from all images; then the images are registered with each other and the main brain structures are segmented; after that, all images are normalized and smoothed into a standard size.

To advance the prediction of dementia, the five most quantitative biomarkers were recently included in the revised diagnostic criteria for AD and MCI due to AD [22]. These five biomarkers belong to three categories of MRI measures: volumes, cortical thicknesses, and surface areas. They are cortical Thickness Standard deviation (TS), cortical Thickness Average (TA), Volume of WM Parcellation (SV), Surface Area (SA), and the Volume of Cortical Parcellation (CV). The ROI features, such as volume, cortical thickness, and surface area, are extracted from MRI, PET, and DTI by using FreeSurfer. The image preprocessing pipeline is shown in Figure 1.

FIGURE 1. The Images Preprocess Pipeline.

The data from ADNI are original medical examination data with missing records, which include missing examinations in the time series and missing image types. A missing examination means that not all subjects take the examination regularly. An example of a missing image type is a subject who has MRI examination data but no PET. All in all, the data have to be preprocessed before model training.

In order to identify the time series data of a subject, we preprocess the data by collecting the time series data under the same subject. In addition, because prediction differs from classification, the original classification is reorganized into converted states. In other words, classification identifies the current state, while prediction identifies the future state. For example, when we feed a series of previous states into the model, it outputs the next state.

The longitudinal MRI time sequence data acquired from ADNI cover 1105 subjects. The records total 1272 NC, 1741 MCI, and 965 AD, among which 32 NC converted to MCI, 147 MCI converted to AD, 1 NC converted to AD, 25 MCI converted to NC, and 6 AD converted to MCI. In our experiments, we randomly select 900 NC, 900 MCI, and 900 AD records from this data set.

We pre-process the data by reclassification, data interpolation, data normalization, data serialization, and time step preprocessing. The data processing pipeline is shown in Figure 2.

FIGURE 2. Data Preprocess Pipeline.

Originally, there are eight categories, which treat the converted classes as independent classes. Usually, the last stage of a converted class is one of the following three stages: AD, NC, and MCI. In our experiment, following this convention, the eight categories are reclassified into three categories according to the stages of the disease (AD, NC, and MCI). Besides, linear interpolation is adopted to fill in the missing feature data. The pre-processed data are further re-scaled into the range of [0, 1], and the data are serialized after the object
following the time step. Although some of the objects have been followed for more than ten years, most of the objects have only one or two records (one record every six months). Considering the data distribution, we reformat the data series into two time steps.

In Figure 2, the data interpolation panel shows one example of a subject, for whom all the data are in the table. Each row corresponds to a feature, and each column corresponds to a time interval. Columns are designated by “M” for the month; e.g., “M00” refers to the first month and “M06” refers to the sixth month. Features are denoted by “F”, where “F1” refers to the first feature, “F2” refers to the second feature, etc. Each subject is scanned multiple times, up to a maximum of 120 months.

In data interpolation, for a particular subject, not all the time intervals have values, so the missing values must be filled in to obtain a full record. (1) In the case of row F1, where there is only one value for all the procedures, we assign this feature value to all the time intervals. (2) Linear interpolation is applied to incomplete data, such as M36 in row F2. (3) In the case of row F4, where all the values are missing, we fill in the average feature value of the same classification.

As AD is a smooth progression of brain atrophy over time, the missing interval data in case (2) are interpolated linearly. We define the feature as $\{(x_i, t_i);\ i = 1, \cdots, n\}$, where $x_i$ refers to the i-th feature value and $t_i$ refers to the i-th time step. The feature sequence is formatted as $\{(x_0, t_0), (x_i, t_i), \cdots, (x_n, t_n)\}$. Suppose $(x_m, t_m)$ is the missing value to be interpolated, and $(x_i, t_i)$ and $(x_j, t_j)$ are the feature values at time steps $i$ and $j$.

When $i < m < j$, we interpolate $x_m$ as follows:

$$x_m = x_i + \frac{(x_j - x_i)(t_m - t_i)}{t_j - t_i} \tag{1}$$

When $x_i$ is the first non-null feature value, at time step $i$, and $0 < m < i$, we interpolate $x_m$ as follows:

$$x_m = x_i - \frac{(x_{i+1} - x_i)(t_i - t_m)}{t_{i+1} - t_i} \tag{2}$$

When $x_j$ is the last non-null feature value, at time step $j$, and $j < m < n$, we interpolate $x_m$ as follows:

$$x_m = x_j + \frac{(x_j - x_{j-1})(t_m - t_j)}{t_j - t_{j-1}} \tag{3}$$

IV. AD PREDICTION MODEL AND EVALUATION

Studies have demonstrated that brain changes associated with AD may begin more than twenty years before symptoms appear [23]–[26]. MRI is a technique used to image the anatomy of the brain. Between two scans, atrophy can be measured by the loss of volume in a particular brain region. It stands to reason that different brain Regions Of Interest (ROIs) have different disease prediction abilities under the sequence-dependent temporal context. LSTM is a special kind of recurrent neural network, which might be able to connect previous information to the present task. Therefore, an LSTM [27] network, with fully connected and activation layers, is built to encode the temporal relation between features and the next stage of AD.

LSTM, with its chain of repeating neural network modules, is designed to avoid the long-term dependency problem. There are four interacting layers in an LSTM neural network: the “forget gate”, “input gate”, “update gate”, and “output gate”. The decision whether information is thrown away from the cell state is made by the “forget gate”, shown in Eq. 4. The “input gate”, with a sigmoid layer and a tanh layer, decides which values will be updated, as shown in Eq. 5 and Eq. 6. The “update gate” in Eq. 7 updates the old cell state with the value from the “input gate”. Finally, the “output gate” in Eq. 8 and Eq. 9 decides which value is to be output from the layer.

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \tag{4}$$

where $W_f$ is the weight matrix; $b_f$ is the bias vector; and $f_t$ is a number between 0 and 1, where 0 represents forgetting and 1 represents keeping.

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \tag{5}$$

$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \tag{6}$$

where $W_i$ and $W_C$ are the weight matrices; $b_i$ and $b_C$ are the bias vectors; and $i_t$ and $\tilde{C}_t$ are the outputs of these two equations.

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \tag{7}$$

where $f_t$ decides which information is to be forgotten, and $i_t * \tilde{C}_t$ chooses how many values are to be updated for the cell.

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \tag{8}$$

$$h_t = o_t * \tanh(C_t) \tag{9}$$

where the value of $o_t$ in Equation 8 decides which part of the cell state will be output. The new cell state $C_t$ is passed through the tanh function and multiplied by $o_t$ to obtain $h_t$ in Equation 9, which is the output of the parts selected by $o_t$.

As Long Short-Term Memory (LSTM) is capable of learning long-term dependencies, a model based on LSTM is built to express the progression of AD. In this paper, we create a model that processes the time series data to make a six-month prediction of AD. The proposed model, shown in Figure 3, involves three layers: the Pre-Fully Connected layer, the Cells layer, and the Post-Fully Connected layer. The Pre-Fully Connected layer consists of one fully connected layer and a ReLU function; the Cells layer consists of one LSTM layer and a Dropout Wrapper; the Post-Fully Connected layer consists of one fully connected layer and a softmax layer.

During model training, preprocessed sequential data with time steps are fed to the model, and the state six months later is predicted by the model. For example, in Figure 3, the first two time steps of the sequential data of subject O1, from the 6th month to the 12th month, are fed to the model.
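The preprocessing of Section III that produces such two-step inputs can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names (`fill_missing`, `min_max_scale`, `two_step_samples`) and the toy values are hypothetical, the extrapolation uses the two nearest observed values when the adjacent time step is itself missing (an assumption, since Eqs. 2 and 3 refer to steps i+1 and j-1), and the scaling is done per feature.

```python
from typing import List, Optional, Sequence, Tuple


def fill_missing(values: Sequence[Optional[float]],
                 times: Sequence[float],
                 class_mean: float) -> List[float]:
    """Fill the None entries of one feature row (cases (1)-(3), Eqs. 1-3)."""
    known = [k for k, v in enumerate(values) if v is not None]
    if not known:            # case (3): no observation, use the class average
        return [class_mean] * len(values)
    if len(known) == 1:      # case (1): the single value is copied to every step
        return [float(values[known[0]])] * len(values)

    filled = list(values)
    for m, v in enumerate(values):
        if v is not None:
            continue
        if m < known[0]:     # Eq. 2: extend the first observed segment backwards
            i, j = known[0], known[1]
        elif m > known[-1]:  # Eq. 3: extend the last observed segment forwards
            i, j = known[-2], known[-1]
        else:                # Eq. 1: closest observed steps on either side of m
            i = max(k for k in known if k < m)
            j = min(k for k in known if k > m)
        slope = (values[j] - values[i]) / (times[j] - times[i])
        filled[m] = values[i] + slope * (times[m] - times[i])
    return filled


def min_max_scale(column: Sequence[float]) -> List[float]:
    """Re-scale one feature into the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:             # constant feature: map everything to 0
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]


def two_step_samples(visits: Sequence[float],
                     states: Sequence[str]) -> List[Tuple[Tuple[float, float], str]]:
    """Pair visits (k, k+1) with the state observed at visit k+2."""
    return [((visits[k], visits[k + 1]), states[k + 2])
            for k in range(len(visits) - 2)]


# Hypothetical subject with one feature measured every six months (M00 to M24).
months = [0, 6, 12, 18, 24]
row = [None, 4.0, None, 10.0, None]
filled = fill_missing(row, months, class_mean=0.0)
scaled = min_max_scale(filled)
states = ["MCI", "MCI", "MCI", "AD", "AD"]
samples = two_step_samples(scaled, states)
```

With these toy values the filled row lies on a straight line through the two observed points, and each sample couples two consecutive visits with the state observed six months after the second one.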

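The gate equations (Eqs. 4 to 9) amount to a single update of the cell state and the hidden state per time step. The sketch below writes that update in plain Python so each line can be matched to one equation. It is only an illustration under stated assumptions: the helper names (`sigmoid`, `affine`, `lstm_step`), the parameter dictionary layout, and the all-zero toy weights are hypothetical, and a real model would use a deep learning library rather than nested lists.

```python
import math
from typing import Dict, List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W: Sequence[Sequence[float]], v: Sequence[float],
           b: Sequence[float]) -> List[float]:
    """Return W . v + b for a row-major weight matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t: Sequence[float], h_prev: Sequence[float],
              c_prev: Sequence[float], p: Dict[str, list]
              ) -> Tuple[List[float], List[float]]:
    """One LSTM time step following Eqs. 4-9."""
    z = list(h_prev) + list(x_t)                                   # [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(p["W_f"], z, p["b_f"])]      # Eq. 4
    i_t = [sigmoid(a) for a in affine(p["W_i"], z, p["b_i"])]      # Eq. 5
    c_hat = [math.tanh(a) for a in affine(p["W_C"], z, p["b_C"])]  # Eq. 6
    c_t = [f * c + i * g
           for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]         # Eq. 7
    o_t = [sigmoid(a) for a in affine(p["W_o"], z, p["b_o"])]      # Eq. 8
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]             # Eq. 9
    return h_t, c_t


# Toy setting: hidden size 1, two input features, all parameters zero.
params = {name: [[0.0, 0.0, 0.0]] for name in ("W_f", "W_i", "W_C", "W_o")}
params.update({name: [0.0] for name in ("b_f", "b_i", "b_C", "b_o")})

h, c = [0.0], [1.0]
for x in ([0.2, 0.4], [0.3, 0.5]):   # two time steps, as in the paper's input
    h, c = lstm_step(x, h, c, params)
```

With zero parameters every gate evaluates to 0.5 and the candidate state to 0, so the cell state is halved at each step. This makes the roles of the forget gate (Eq. 7) and the output gate (Eq. 9) easy to check by hand.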

FIGURE 3. The prediction model with LSTM.

In the example of Figure 3, the state at the 18th month, six months after the 12th month, is predicted by the model as “AD”. During model testing, when the features of the 18th and 24th months of the subject are fed to the model, the output is the prediction of the subject’s state at the 30th month.

As one of the most essential evaluation metrics for checking a classification model’s performance, the Area Under the Curve (AUC) is defined as the area under the Receiver Operating Characteristic (ROC) curve. It measures how the true positive rate (recall) and the false positive rate trade off. The multi-class Area Under the receiver operating Curve (mAUC), which is independent of the group sizes, gives an overall measure of classification ability across the classes. Our system is evaluated by using AUC/mAUC, where the binary-class tasks are assessed by AUC and the multi-class task is assessed by mAUC [28]. These measures are defined as follows.

AUC is the area under the ROC curve, and $AUC(c_i, c_j)$ is the AUC of a class $c_i$ against another class $c_j$, defined in Equation 10 as follows:

$$AUC(c_i, c_j) = \frac{R_i - n_i(n_i + 1)/2}{n_i n_j} \tag{10}$$

where $n_i$ and $n_j$ are the total numbers of points belonging to classes $i$ and $j$, respectively, and $R_i$ is the sum of the ranks of the class $i$ points when the points of classes $i$ and $j$ are ranked by their likelihood of belonging to class $i$.

$\widehat{AUC}(c_i, c_j)$ [28] is the average AUC for classes $i$ and $j$, defined as $\widehat{AUC}(c_i, c_j) = 0.5 * (AUC(c_i, c_j) + AUC(c_j, c_i))$. The mAUC [28] is the multi-class Area Under the ROC Curve, defined by Equation 11 as follows:

$$mAUC = \frac{2}{N(N - 1)} \sum_{i=2}^{N} \sum_{j=1}^{i-1} \widehat{AUC}(c_i, c_j) \tag{11}$$

where $N$ is the number of classes, and $\widehat{AUC}(c_i, c_j)$ is the average AUC for classes $i$ and $j$.

We evaluate our method by applying it to the MRI data. The goal is to select a compact set of time-sensitive and disease-relevant features while maintaining high predictive power. Specifically, when the brain region features are fed to the model as time series, the most relevant combinations of features are extracted by the Pre-Fully Connected layer. Thereafter, time-sensitive features are selected by the Cells layer. When the features are fed to the Post-Fully Connected layer, the combinations of the time-sensitive features are employed to predict AD.

With 5-fold cross-validation, the following parameters are fixed in the experiments: batch_size is 256, learning_rate is 1e-4, the cell type is LSTM Cell, the number of pre-fully connected cells is 512, the number of post-fully connected cells is 3, and keep_prob is 0.8. Furthermore, to find the best parameters, we tune our model over the following parameters: the number of fully connected cells, the number of LSTM cells, and the number of LSTM layers. Experiments on the three-class prediction of AD vs MCI vs NC show that the model achieves the best mAUC when the number of fully connected cells is 512 and the number of LSTM layers is 2. The results in Table 1 show that our model achieves the best mAUC when the number of LSTM cells is 128.

TABLE 1. Parameter fitting of the model.

V. EXPERIMENT

Studies have demonstrated that brain changes associated with AD may begin more than twenty years before symptoms appear [23]–[26]. Multiple symptoms reflect the degree of damage to neurons in different parts of the brain, and the pace at which symptoms advance from mild to moderate varies from person to person [1]. In this work, we consider the predicted state of AD after six months. As a result, the subject’s state six months later is used as the prediction label. The performance of the model is evaluated on binary prediction tasks (AD vs NC, MCI vs NC, and AD vs MCI) and a multi-class prediction task (AD vs NC vs MCI).

TABLE 2. Different algorithms based on different prediction.

FIGURE 4. Varying data size in different prediction (TA).

FIGURE 5. Varying data sizes based on CV feature.

FIGURE 6. Varying data sizes based on SA feature.

First, we preprocessed the data with time steps and used five-fold cross-validation for all methods. Specifically, we randomly partitioned the data into five subsets; then we selected one subset for testing and the remaining four subsets for training.
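The evaluation protocol, that is, the five-fold partition described above together with the rank-based AUC of Equation 10 and the mAUC of Equation 11, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' evaluation code: the function names (`five_fold_splits`, `pairwise_auc`, `mauc`), the seed, and the toy probabilities are hypothetical, tied scores share their mean rank, and the mAUC sum runs over distinct class pairs.

```python
import random
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def five_fold_splits(n_samples: int, n_folds: int = 5,
                     seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Randomly partition sample indices; each fold is the test set once."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    folds = [order[k::n_folds] for k in range(n_folds)]
    splits = []
    for k, test in enumerate(folds):
        train = [s for f, fold in enumerate(folds) if f != k for s in fold]
        splits.append((train, test))
    return splits


def pairwise_auc(scores_i: Sequence[float], scores_j: Sequence[float]) -> float:
    """Eq. 10. Both inputs hold the predicted likelihood of class i:
    scores_i for true class-i points, scores_j for true class-j points."""
    pooled = [(s, True) for s in scores_i] + [(s, False) for s in scores_j]
    pooled.sort(key=lambda item: item[0])
    rank_sum, k = 0.0, 0
    while k < len(pooled):
        end = k
        while end + 1 < len(pooled) and pooled[end + 1][0] == pooled[k][0]:
            end += 1
        mean_rank = (k + end) / 2 + 1        # 1-based ranks, ties share the mean
        rank_sum += mean_rank * sum(1 for item in pooled[k:end + 1] if item[1])
        k = end + 1
    n_i, n_j = len(scores_i), len(scores_j)
    return (rank_sum - n_i * (n_i + 1) / 2) / (n_i * n_j)


def mauc(labels: Sequence[str], probs: Sequence[Dict[str, float]],
         classes: Sequence[str]) -> float:
    """Eq. 11: average of the symmetrised pairwise AUCs over class pairs."""
    def auc(ci: str, cj: str) -> float:
        s_i = [p[ci] for y, p in zip(labels, probs) if y == ci]
        s_j = [p[ci] for y, p in zip(labels, probs) if y == cj]
        return pairwise_auc(s_i, s_j)

    total = sum(0.5 * (auc(ci, cj) + auc(cj, ci))
                for ci, cj in combinations(classes, 2))
    n = len(classes)
    return 2.0 * total / (n * (n - 1))


# Hypothetical predictions for four test subjects.
labels = ["AD", "MCI", "NC", "AD"]
probs = [
    {"AD": 0.7, "MCI": 0.2, "NC": 0.1},
    {"AD": 0.3, "MCI": 0.5, "NC": 0.2},
    {"AD": 0.1, "MCI": 0.2, "NC": 0.7},
    {"AD": 0.6, "MCI": 0.3, "NC": 0.1},
]
score = mauc(labels, probs, ["AD", "MCI", "NC"])
splits = five_fold_splits(10)
```

In this toy case every subject receives the highest likelihood for its true class, so all pairwise AUCs and the mAUC equal 1. In an experiment, `mauc` would be computed on the test subset of each split and averaged over the five folds.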
To compare our algorithm with other state-of-the-art methods, we selected Zhu [13] from the first type of methods, and Amoroso [20] and Lin [19] from the second and third types. Zhu [13] uses the Discriminative Self-representation Sparse Regression to select features and an SVM classifier to make the classification. In Amoroso [20], a Random Forest (RF) is used to select features, and a Deep Neural Network (DNN) is adopted to classify the disease. In Lin [19], a Convolutional Neural Network (CNN) along with PCA-LASSO is designed to select features, and the SVM is adopted to classify the disease.

All the experiments are run in the TensorFlow software environment on hardware with one Nvidia 1060i GPU card. We use five-fold cross-validation for all the methods. The results show that our model achieves a significant improvement over the compared approaches in most prediction tasks.

Based on all the features of the longitudinal MRI data, including 900 AD, 900 MCI, and 900 NC records, the comparison with other methods in Table 2 shows that our algorithm achieves the best AUC/mAUC in most of the predictions. Specifically, the AUC is 0.935 for AD vs NC and 0.798 for AD vs MCI, and the mAUC is 0.777 for AD vs NC vs MCI. Our method ranks last in the prediction for MCI vs NC, with an AUC of 0.697, because the NC vs MCI data set does not distinguish MCI converting to AD from MCI staying stable.

FIGURE 7. Varying data sizes based on SV feature.

The proposed method aims to identify time-related biomarkers associated with disease status. Five biomarkers identified by medical science are listed as follows: cortical
Thickness Standard deviation (TS), cortical Thickness Average (TA), WM Parcellation Volume (SV), Surface Area (SA), and Cortical parcellation Volume (CV). Among those five features, TA identifies gross atrophy in the cerebral cortex and certain subcortical regions. According to neuropathology, AD is characterized by loss of neurons and synapses in the cerebral cortex and certain subcortical regions, resulting in gross atrophy of the affected regions. By comparing these five different features extracted from the MRI images, the experimental results of our model (see Table 3) show that the TA feature outperforms the other features in most of the classification tasks, which means that TA is a distinctive feature for predicting the progression of AD.

TABLE 3. The AUC/mAUC of different features.

To test the stability of our algorithm, we compare the AUC/mAUC for data sizes from 400 to 900. The results based on TA features in Figure 4 show that our algorithm is stable with varying data sizes for all the predictions.

The AUC/mAUC comparison across feature types for data sizes from 400 to 900 is given in Figure 5, Figure 6, Figure 7, Figure 8, Figure 9, and Figure 10. The results show that our algorithm is stable for most of the features, and that the TA feature in particular achieves the best AUC/mAUC among those features. According to neuropathology, AD is accompanied by gross atrophy of the cerebral cortex and certain subcortical regions, which is identified by the TA feature.

FIGURE 8. Varying data sizes based on TA feature.

FIGURE 9. Varying data sizes based on TS feature.

FIGURE 10. Different feature based on AD vs MCI vs NC.

VI. CONCLUSION

In this paper, a deep learning model is introduced to predict the development of AD. Noticing that the disease is inherently progressive, the temporal information collected from the cases is considered in the model. Compared with existing approaches, our model can predict the future state of the disease, rather than classify the state at the current diagnosis. Experiments show that the performance of our model is much better than that of most of the existing approaches. Besides, our method is stable for different data sizes. At the same time, the results also show that the Cortical Thickness Average (TA) feature is a significant feature for predicting the progression of AD.

ACKNOWLEDGMENT

This work is supported by the International Science and Technology Cooperation Project of Fujian Province of China under Grant 2019I0003, the Science and Technology Planning Project of Quanzhou under Grant 2017G01, the Online Course Supporting Project of Fujian under Grant 612-52418005 and 612-50117024, and the Fundamental Research Funds for the Central Universities under Grant 20720190009.

REFERENCES

[1] Alzheimer’s Association, “2018 Alzheimer’s disease facts and figures,” Alzheimer’s Dementia, vol. 14, no. 3, pp. 367–429, 2018.
[2] R. Katzman, “The prevalence and malignancy of Alzheimer disease: A major killer,” Arch. Neurol., vol. 33, no. 4, pp. 217–218, 1976.
[3] E. M. Reiman, J. B. Langbaum, and P. N. Tariot, “Alzheimer’s prevention initiative: A proposal to evaluate presymptomatic treatments as quickly as possible,” Biomarkers Med., vol. 4, no. 1, pp. 3–14, 2010.
[4] W. Zaremba, I. Sutskever, and O. Vinyals, “Recurrent neural network regularization,” 2014, arXiv:1409.2329. [Online]. Available: https://arxiv.org/abs/1409.2329
[5] K. Greff, R. K. Srivastava, J. Koutnik, B. R. Steunebrink, and J. Schmidhuber, “LSTM: A search space odyssey,” IEEE Trans. Neural Netw. Learn. Syst., vol. 28, no. 10, pp. 2222–2232, Oct. 2017.
[6] F. A. Gers, “Learning to forget: Continual prediction with LSTM,” in Proc. 9th Int. Conf. Artif. Neural Netw. (ICANN), 1999, pp. 850–855.


[7] J. W. Belliveau, D. N. Kennedy, R. C. Mckinstry, B. R. Buchbinder, R. M. Weisskoff, M. S. Cohen, J. M. Vevea, T. J. Brady, and B. R. Rosen, “Functional mapping of the human visual cortex by magnetic resonance imaging,” Science, vol. 254, no. 5032, pp. 716–719, Nov. 1991.
[8] V. Camus, P. Payoux, L. Barré, B. Desgranges, T. Voisin, C. Tauber, R. La Joie, M. Tafani, C. Hommet, G. Chételat, K. Mondon, V. de La Sayette, J. P. Cottier, E. Beaufils, M. J. Ribeiro, V. Gissot, E. Vierron, J. Vercouillie, B. Vellas, F. Eustache, and D. Guilloteau, “Using PET with 18F-AV-45 (florbetapir) to quantify brain amyloid load in a clinical environment,” Eur. J. Nucl. Med. Mol. Imag., vol. 39, no. 4, pp. 621–631, Apr. 2012.
[9] D. Le Bihan, J.-F. Mangin, C. Poupon, C. A. Clark, S. Pappata, N. Molko, and H. Chabriat, “Diffusion tensor imaging: Concepts and applications,” J. Magn. Reson. Imag., vol. 13, no. 4, pp. 534–546, 2001.
[10] N. Zeng, Z. Wang, B. Zineddin, Y. Li, M. Du, L. Xiao, X. Liu, and T. Young, “Image-based quantitative analysis of gold immunochromatographic strip via cellular neural network approach,” IEEE Trans. Med. Imag., vol. 33, no. 5, pp. 1129–1136, May 2014.
[11] N. Zeng, Z. Wang, Y. Li, M. Du, and X. Liu, “A hybrid EKF and switching PSO algorithm for joint state and parameter estimation of lateral flow immunoassay models,” IEEE/ACM Trans. Comput. Biol. Bioinf., vol. 9, no. 2, pp. 321–329, Mar./Apr. 2012.
[12] N. Zeng, Z. Wang, and H. Zhang, “Inferring nonlinear lateral flow immunoassay state-space models via an unscented Kalman filter,” Sci. China Inf. Sci., vol. 59, no. 11, pp. 112204:1–112204:10, 2016.
[13] X. Zhu, H.-I. Suk, S.-W. Lee, and D. Shen, “Discriminative self-representation sparse regression for neuroimaging-based Alzheimer’s disease diagnosis,” Brain Imag. Behav., vol. 13, no. 1, pp. 27–40, 2017.
[14] N. Zeng, H. Qiu, Z. Wang, W. Liu, H. Zhang, and Y. Li, “A new switching-delayed-PSO-based optimized SVM algorithm for diagnosis of Alzheimer’s disease,” Neurocomputing, vol. 320, pp. 195–202, Dec. 2018.
[15] S. H. Nozadi, S. Kadoury, and Alzheimer’s Disease Neuroimaging Initiative, “Classification of Alzheimer’s and MCI patients from semantically parcelled PET Images: A comparison between AV45 and FDG-PET,” Int. J. Biomed. Imag., vol. 2018, Mar. 2018, Art. no. 1247430.
[16] H. Choi, K. H. Jin, and Alzheimer’s Disease Neuroimaging Initiative, “Predicting cognitive decline with deep learning of brain metabolism and amyloid imaging,” Behav. Brain Res., vol. 344, pp. 103–109,
[24] E. M. Reiman, Y. T. Quiroz, A. S. Fleisher, K. Chen, C. Velez-Pardo, M. Jimenez-Del-Rio, A. M. Fagan, A. R. Shah, S. Alvarez, A. Arbelaez, and M. Giraldo, “Brain imaging and fluid biomarker analysis in young adults at genetic risk for autosomal dominant Alzheimer’s disease in the presenilin 1 E280A kindred: A case-control study,” Lancet Neurol., vol. 11, no. 12, pp. 1048–1056, 2012.
[25] C. R. Jack, Jr., V. J. Lowe, S. D. Weigand, H. J. Wiste, M. L. Senjem, D. S. Knopman, M. M. Shiung, J. L. Gunter, B. F. Boeve, B. J. Kemp, and M. Weiner, “Serial PIB and MRI in normal, mild cognitive impairment and Alzheimer’s disease: Implications for sequence of pathological events in Alzheimer’s disease,” Brain, vol. 132, no. 5, pp. 1355–1365, 2009.
[26] R. J. Bateman, C. Xiong, et al., “Clinical and biomarker changes in dominantly inherited Alzheimer’s disease,” New England J. Med., vol. 367, pp. 795–804, Sep. 2012.
[27] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[28] R. V. Marinescu, N. P. Oxtoby, A. L. Young, E. E. Bron, A. W. Toga, M. W. Weiner, F. Barkhof, N. C. Fox, S. Klein, D. C. Alexander, and the EuroPOND Consortium, for the Alzheimer’s Disease Neuroimaging Initiative, “TADPOLE challenge: Prediction of longitudinal evolution in Alzheimer’s disease,” 2018, arXiv:1805.03909. [Online]. Available: https://arxiv.org/abs/1805.03909

XIN HONG received the B.S. and M.S. degrees in computer science from Hua Qiao University, China, in 2001 and 2004, respectively. She is currently pursuing the Ph.D. degree in computer science with Xiamen University, China. From 2004 to 2006, she was a Teaching Assistant with the College of Computer Science and Technology, Hua Qiao University. From 2006 to 2012, she was a Lecturer with the College of Computer Science and Technology, Hua Qiao University. Since 2012, she has been an Associate Professor with the College of Computer Science and Technology, Hua Qiao University. From 2009 to 2010, she was a Visiting Researcher with CAD and CG National Research Lab, Zhe Jian University, China. From 2012 to 2013, she was a Visiting Researcher with the Faculty of
May 2018. Science and Technology, University of Stavanger, Norway. She is the author
[17] H.-I. Suk and D. Shen, ‘‘Deep learning-based feature representation for of 5 books, 2 inventions and more than 15 articles. Her research interests
AD/MCI classification,’’ in Proc. Int. Conf. Med. Image Comput. Comput.- include medical image, data mining, and pattern recognition.
Assist. Intervent., vol. 16, 2013, pp. 583–590.
[18] F. Li, L. Tran, K. H. Thung, S. Ji, D. Shen, and J. Li, ‘‘A robust deep model
for improved classification of AD/MCI patients,’’ IEEE J. Biomed. Health
Inform., vol. 19, no. 5, pp. 1610–1616, Sep. 2015. RONGJIE LIN was born in Fujian, China, in 1995.
[19] W. Lin, T. Tong, Q. Gao, D. Guo, X. Du, Y. Yang, G. Guo, M. Xiao, M. Du, He received the bachelor’s degree in software engi-
X. Qu, and Alzheimer’s Disease Neuroimaging Initiative, ‘‘Convolutional
neering from Huaqiao University, Xiamen, China,
neural networks-based MRI image analysis for the Alzheimer’s disease
and he is pursuing the master’s degree at Xiamen
prediction from mild cognitive impairment,’’ Frontiers Neurosci., vol. 12,
p. 777, Nov. 2018. University, Xiamen, China, since 2018. His current
[20] N. Amoroso, D. Diacono, A. Fanizzi, M. La Rocca, A. Monaco, research interests include computer vision and pat-
A. Lombardi, C. Guaragnella, R. Bellotti, S. Tangaro, and Alzheimer’s tern recognition.
Disease Neuroimaging Initiative, ‘‘Deep learning reveals Alzheimer’s dis-
ease onset in MCI subjects: Results from an international challenge,’’
J. Neurosci. Methods, vol. 302, pp. 3–9, May 2018.
[21] K. K. Leung, J. Barnes, M. Modat, G. R. Ridgway, J. W. Bartlett,
N. C. Fox, S. Ourselin, and Alzheimer’s Disease Neuroimaging Initiative,
‘‘Brain MAPS: An automated, accurate and robust brain extraction tech- CHENHUI YANG received the B.S. and M.S.
nique using a template library,’’ NeuroImage, vol. 55, no. 3, pp. 1091–1108, degrees in automatic control department from
2011. the National University of Defense Technol-
[22] M. S. Albert, S. T. DeKosky, D. Dickson, B. Dubois, H. H. Feldman, N. ogy, in 1989 and 1992, respectively, where he
C. Fox, A. Gamst, D. M. Holtzman, W. J. Jagust, R. C. Petersen, P. J. started to conduct research on driverless vehicle.
Snyder, M. C. Carrillo, B. Thies, and C. H. Phelps, ‘‘The diagnosis of
He received the Ph.D. degree in mechanical engi-
mild cognitive impairment due to Alzheimer’s disease: Recommendations
neering from Zhejiang University, in 1995. He had
from the National Institute on Aging-Alzheimer’s Association workgroups
on diagnostic guidelines for Alzheimer’s disease,’’ Alzheimer’s Dementia, been being a Faculty Member in computer science
vol. 7, no. 3, pp. 270–279, 2011. department in Xiamen University, and becoming a
[23] V. L. Villemagne, S. Burnham, P. Bourgeat, B. Brown, K. A. Ellis, Full Professor, in 2005. He was a Visiting Scholar
O. Salvado, C. Szoeke, S. L. Macaulay, R. Martins, P. Maruff, and in Argonne National Laboratory, from 1990 to 2000 and in USC, from
D. Ames, ‘‘Amyloid β deposition, neurodegeneration, and cognitive 2014 to 2015. His research interests include computer vision, graphics and
decline in sporadic Alzheimer’s disease: A prospective cohort study,’’ machine learning, with strong desires to design new products in intelligent
Lancet Neurol., vol. 12, pp. 357–367, Apr. 2013. transportation, medicine and industry.

80900 VOLUME 7, 2019



NIANYIN ZENG received the B.Eng. degree in electrical engineering and automation, in 2008, and the Ph.D. degree in electrical engineering, in 2013, both from Fuzhou University. From October 2012 to March 2013, he was a Research Associate with the Department of Electrical and Electronic Engineering, The University of Hong Kong. From September 2017 to August 2018, he was an ISEF Fellow funded by the Korea Foundation for Advanced Studies and also a Visiting Professor at the Korea Advanced Institute of Science and Technology. Currently, he is an Associate Professor with the Department of Instrumental and Electrical Engineering, Xiamen University. His current research interests include intelligent data analysis, computational intelligence, time-series modeling, and applications. He is the author or coauthor of several technical papers, including 6 ESI Highly Cited Papers according to the most recent Clarivate Analytics ESI report, and is also a very active Reviewer for many international journals and conferences. He is currently serving as an Associate Editor for Neurocomputing, an Editorial Board Member for Computers in Biology and Medicine and Biomedical Engineering Online, and a Guest Editor for Frontiers in Neuroscience. He also served as a Technical Program Committee Member for ICBEB 2014 and an Invited Session Chair of ICCSE 2017.

CHUNTING CAI was born in Sanming, China, in 1991. He received the M.Sc. degree in communication engineering from Huaqiao University, Xiamen, China, and has been pursuing the Ph.D. degree with Xiamen University, Xiamen, China, since 2018. His current research interests include video watermarking and information hiding.

JIN GOU received the B.E. and M.E. degrees in computer science and technology from Fuzhou University, China, in 1999 and 2002, respectively. He received the Ph.D. degree in computer science and technology from Zhejiang University, China, in 2006. Currently, he is a Professor with Huaqiao University, China. His research interests include knowledge fusion and artificial intelligence.

JANE YANG received the high school diploma from South Pasadena High School. She is currently pursuing the bachelor’s degree in cognitive science with a specialization in machine learning and neural computation with the University of California at San Diego (UCSD), San Diego, CA, USA. Her current research interests include brain science and artificial intelligence. She has participated in several CS-oriented competitions during her time at UCSD, including winning in Entrepreneurship at HackSC and taking first place in the IEEE Quarterly Project of the UCSD branch.
