
TYPE Review
PUBLISHED 06 February 2023
DOI 10.3389/fncom.2023.1038636

Conventional machine learning and deep learning in Alzheimer's disease diagnosis using neuroimaging: A review

Zhen Zhao1, Joon Huang Chuah1*, Khin Wee Lai2*, Chee-Onn Chow1, Munkhjargal Gochoo3, Samiappan Dhanalakshmi4, Na Wang5, Wei Bao6* and Xiang Wu7

1 Department of Electrical Engineering, Faculty of Engineering, Universiti Malaya, Kuala Lumpur, Malaysia; 2 Department of Biomedical Engineering, Faculty of Engineering, Universiti Malaya, Kuala Lumpur, Malaysia; 3 Department of Computer Science and Software Engineering, United Arab Emirates University, Al Ain, United Arab Emirates; 4 Department of Electronics and Communication Engineering, SRM Institute of Science and Technology, Chennai, India; 5 School of Automation, Guangdong Polytechnic Normal University, Guangzhou, China; 6 China Electronics Standardization Institute, Beijing, China; 7 School of Medical Information Engineering, Xuzhou Medical University, Xuzhou, China

OPEN ACCESS

EDITED BY
Junhai Xu, Tianjin University, China

REVIEWED BY
Shuqiang Wang, Shenzhen Institutes of Advanced Technology (CAS), China
Dalin Yang, Washington University in St. Louis, United States

*CORRESPONDENCE
Joon Huang Chuah [email protected]
Khin Wee Lai [email protected]
Wei Bao [email protected]

RECEIVED 07 September 2022
ACCEPTED 13 January 2023
PUBLISHED 06 February 2023

CITATION
Zhao Z, Chuah JH, Lai KW, Chow C-O, Gochoo M, Dhanalakshmi S, Wang N, Bao W and Wu X (2023) Conventional machine learning and deep learning in Alzheimer's disease diagnosis using neuroimaging: A review. Front. Comput. Neurosci. 17:1038636. doi: 10.3389/fncom.2023.1038636

COPYRIGHT
© 2023 Zhao, Chuah, Lai, Chow, Gochoo, Dhanalakshmi, Wang, Bao and Wu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Alzheimer's disease (AD) is a neurodegenerative disorder that causes memory degradation and cognitive function impairment in elderly people. The irreversible and devastating cognitive decline places a large burden on patients and society. So far, there is no effective treatment that can cure AD, but the progression of early-stage AD can be slowed, so early and accurate detection is critical for treatment. In recent years, deep-learning-based approaches have achieved great success in Alzheimer's disease diagnosis. The main objective of this paper is to review popular conventional machine learning and deep learning methods used for the classification and prediction of AD using Magnetic Resonance Imaging (MRI). The methods reviewed in this paper include support vector machine (SVM), random forest (RF), convolutional neural network (CNN), autoencoder, and transformer. This paper also reviews pervasively used feature extractors and different input forms of convolutional neural networks. Finally, this review discusses challenges such as class imbalance and data leakage, along with trade-offs and suggestions regarding pre-processing techniques, deep learning, conventional machine learning methods, new techniques, and input type selection.

KEYWORDS
Alzheimer's disease, machine learning, deep learning, convolutional neural network, transformer, classification, neuroimaging, Magnetic Resonance Imaging

1. Introduction

Alzheimer's disease (AD) is a neurodegenerative disease with insidious onset and progressive development. Clinically, AD is characterized by memory disorder, aphasia, apraxia, agnosia, impairment of visuospatial skills, and general dementia with personality and behavior changes. However, the cause of the disease remains unknown, and there is currently no accurate diagnosis or validated disease-modifying treatment. In addition, since AD symptoms include sudden and severe memory loss, the cost of caring for patients is high. The rapid growth in public health needs demands an enormous budget, and the socio-economic costs of AD are far greater than expected. As a result, AD places a massive burden on patients' families and society. According to a recent report by Nichols et al. (2022), the number of patients with dementia globally was 57.4 million in 2019, and it may increase to around 152.8 million by 2050. Accurate diagnosis of AD is therefore critical for patients and society.

In general, AD has three stages: normal control (NC), mild cognitive impairment (MCI), and Alzheimer's disease (AD). In particular, MCI is the early stage of AD, defined as the intermediate state between normal control and AD. The hallmark of MCI is memory loss and poor memory. While some MCI patients progress to AD, others remain at MCI. Early diagnosis is crucial for effective clinical intervention and alleviating disease progression (Livingston et al., 2020). AD/MCI diagnosis is one of the most significant and challenging tasks in AD assessment, and the accurate classification of AD/MCI determines the follow-up treatment. What is more, proper treatment during MCI can reduce or slow down the progression to AD, so predicting the conversion from MCI to AD is even more valuable than classifying NC, AD, and MCI patients. However, traditional AD diagnosis methods rely considerably on clinical experts' experience and human effort. With the development of computer-aided diagnosis, computer software can provide automatic classification and prediction of AD. For the reasons mentioned above, computer-aided diagnosis of AD is necessary and significant.

Artificial intelligence has been thriving in recent years, and researchers and engineers have conducted extensive research in AD-related areas. According to the methods utilized, this research falls into two categories: conventional machine learning and deep learning. Conventional machine learning methods include support vector machine (SVM), random forest, linear regression, naïve Bayes, artificial neural networks, etc. Deep learning methods include convolutional neural networks, recursive neural networks, etc.

Many biomarkers, such as genetic, biological, and neuroimaging techniques, including Magnetic Resonance Imaging (MRI), fluorodeoxyglucose positron emission tomography (FDG-PET) imaging, amyloid PET, and diffusion tensor imaging (DTI), are used for AD diagnosis. The MRI image is one of the most widely used for the early detection and classification of AD. Since MRI provides high-resolution images of brain anatomical structures, researchers can retrieve rich information from MRI images. MRI shows the shrinkage of brain tissue, particularly the hippocampus, which confirms structural change in the brain. Moreover, MRI can be used to predict whether a patient with MCI will eventually develop Alzheimer's disease, since MRI can detect brain abnormalities associated with MCI. In recent years, public open-access databases have supplied MRI images of AD biomarkers, and the datasets have been maintained by updating and adding new data. Considerable numbers of researchers have analyzed AD employing MRI-based biomarkers. In this article, we mainly focus on MRI-based applications; some researchers used MRI together with PET, so we also introduce PET. MRI and PET, as 3D images that reveal structural brain atrophy, are two of the most frequently used modalities in deep learning areas. MRI uses magnetic resonance phenomena to extract electromagnetic signals from the human body and reconstruct a 3D representation of human tissue. MRI can be done without injecting radioactive isotopes, which makes it safer. PET uses short-lived radionuclides to generate images of the target; the PET scanner detects areas of high radionuclide concentration within the body. Both MRI and PET are non-invasive neuroimaging modalities. The other two most widely used medical tests that evaluate AD levels are the Mini-Mental State Examination (MMSE) and the Clinical Dementia Rating (CDR). Taking the results of MMSE and CDR as ground truth labels may be incorrect. Still, the results of MMSE and CDR remain valuable references due to the limited biomarkers available.

Multi-modality studies utilize more than one modality of each subject, while single-modality studies use only one. The rationale for using multiple modalities is that features extracted from different modalities could contain complementary information. MRI, FDG-PET, Cerebrospinal Fluid (CSF), MMSE, and the Alzheimer's Disease Assessment Scale-Cognitive Subscale (ADAS-Cog) are often-used modalities.

Detecting AD remains a challenging task in computer vision for a few reasons. The image datasets are not large compared with other image classification datasets like ImageNet. The medical images acquired are usually of low quality, with relatively coarse noise and segmentation results. Compared with images in other areas, the complexity of medical images is high. Images are acquired from different devices with various field strengths, leading to more effort spent in pre-processing. Finally, the distinctions between NC and MCI, and between MCI and AD, are not apparent in computer vision.

Public open-access databases have extensively helped AD-related research in artificial intelligence (A.I.). In recent years, this field has attracted the attention of a considerable number of researchers, and the number of related papers published each year is also growing rapidly. Therefore, there is a need to analyze and summarize related documents so that researchers can more easily understand the development status of associated fields. We aim to help relevant researchers quickly understand the research status and future trends in related fields. The objectives of this study are to explore the associated datasets, pre-processing techniques, popular conventional machine learning methods, including SVM and RF, and deep learning methods, including CNN, autoencoders, transformer, and transfer learning. We examined recent works, compared their trade-offs, summarized the current trend, and provide a future guide on computer-aided AD diagnosis using MRI images in the A.I. area. This review mainly focuses on highly cited studies that adopted the most widely used techniques.

As shown in Figure 1, we organize our paper as follows: In the Introduction Section, we give a brief introduction to the background knowledge. In the Materials Section, we introduce the public datasets that are often used in related areas. In the Methods Section, we explore our search strategy, the pre-processing techniques, conventional machine learning methods like support-vector machine (SVM) and random forest (RF), convolutional neural network (CNN), autoencoders, transformer, and transfer learning methods. In the Challenges and discussion Section, we discuss current challenges like class imbalance and data leakage, and trade-offs when designing a proper model, with our recommendations.

FIGURE 1
Mind map of this paper.

2. Materials

2.1. Datasets

In recent years, many research centers have accumulated plentiful medical and image data and published the data to the public. Public data plays a significant role for researchers in research and in developing AI on AD. The online datasets provide biomarker information like neuroimaging modalities, genetic and blood information, and clinical and cognitive assessments. The most pervasively used datasets include the Alzheimer's Disease Neuroimaging Initiative (ADNI) (Jack et al., 2008), the Australian Imaging, Biomarker, & Lifestyle Flagship Study of Aging (AIBL) (Ellis et al., 2009), the Open Access Series of Imaging Studies (OASIS) (Marcus et al., 2007, 2010; LaMontagne et al., 2019), and Minimal Interval Resonance Imaging in Alzheimer's Disease (MIRIAD) (Malone et al., 2013).

ADNI is notable for being a longitudinal and multicenter study, and it is the most commonly used dataset. The objective of ADNI is to investigate whether the combination of MRI, PET, other biological markers, and clinical and neuropsychological assessment can measure the progression of MCI and early AD. ADNI has four phases: ADNI-1, ADNI-GO, ADNI-2, and ADNI-3, each collection supplementing and improving on the previous ones. From patients, ADNI researchers collect several data types, including clinical, genetic, MRI and PET images, and biospecimens. ADNI-1 contains 200 NC, 400 MCI, and 200 AD. ADNI-GO adds 200 MCI to ADNI-1. ADNI-2 extends ADNI-1 and ADNI-GO with 150 NC, 100 early MCI, 150 late MCI, and 150 AD. ADNI-3 expands the existing ADNI-1, ADNI-GO, and ADNI-2, adding 133 NC, 151 MCI, and 87 AD.

AIBL collects imaging and medical data from 211 individuals with AD, 133 individuals with MCI, and 768 healthy individuals without cognitive impairment.

OASIS aims to share neuroimaging brain datasets with researchers in related areas. OASIS has three releases: OASIS-1 contains 434 MRI scans from 416 subjects. OASIS-2 contains 373 MRI scans from 150 subjects. OASIS-3 contains 2,168 MRIs and 1,608 PET scans from 1,098 subjects.

The MIRIAD dataset contains 708 MRI scans from 46 AD patients and 23 NC volunteers.

Moreover, some studies use the datasets above along with their own datasets. For instance, Basaia et al. (2019) collected 3D T1-weighted images from 124 patients with probable AD, 50 patients with MCI, and 55 healthy controls; they named this the "Milan" dataset. Suk et al. (2016b) used images from ADNI-2 together with their in-house dataset of 37 participants (12 MCI subjects and 25 NC subjects).

3. Methods

This section reviews a few classical conventional machine learning and deep learning methods. First, we introduce the search strategy for our review. Second, we examine two traditional machine learning methods: support-vector machine and random forest. Third, we review the convolutional neural network, including popular CNN backbones and the different input types of CNN. Fourth, we discuss autoencoders in AD detection. Fifth, we talk about the transformer. At last, we briefly introduce the application of transfer learning.


3.1. Search strategy

This review was conducted following the PRISMA 2020 guidelines (Page et al., 2021).

3.1.1. Databases and keywords of search

We searched Scopus, one of the largest abstract and citation databases of peer-reviewed literature: scientific journals, books, and conference proceedings. We selected research papers regarding Alzheimer's disease diagnosis using AI techniques from 2013 to 2022. Scopus searches within the article title, abstract, and keywords; papers were not selected if the search terms appear only in the text or figure captions.

3.1.1.1. Inclusion keyword groups

Inclusion keyword group 1: "Alzheimer's disease" OR "AD" OR "dementia" OR "mild cognitive impairment" OR "MCI."
Inclusion keyword group 2: "Artificial intelligence" OR "AI" OR "machine learning" OR "deep learning" OR "computer-assisted diagnosis" OR "computer assisted diagnosis" OR "CAD" OR "Neural network" OR "convolutional neural network" OR "CNN" OR "recurrent neural network" OR "RNN" OR "random forest" OR "support vector machine" OR "SVM."
Inclusion keyword group 3: "Magnetic Resonance Imaging" OR "MRI" OR "Structural Magnetic Resonance Imaging" OR "sMRI" OR "Functional Magnetic Resonance Imaging" OR "fMRI."

3.1.1.2. Exclusion keyword groups

Exclusion keyword group 1: "Schizophrenia" OR "depression" OR "major depressive disorder."
Exclusion keyword group 2: "Computed tomography" OR "CT" OR "Positron Emission Tomography" OR "PET" OR "amyloid-β."
Exclusion keyword group 3: "REVIEW."

Initially, the search result contained 2,561 documents in total. We then filtered the document type to article (1,712 documents) and the source type to journal (1,705 documents). Finally, we dropped documents written in languages other than English (1,678 documents).

3.1.1.3. Exclusion criteria

The initial search result was further filtered according to the following exclusion criteria:

1. Studies that only focus on pre-processing, brain extraction, or other similar feature selection.
2. Studies using only biomarkers other than MRI images (e.g., CT, PET, amyloid-β, genetic, etc.).
3. Studies that focus on brain aging or other types of brain disease.
4. Conference papers, conference reviews, book chapters, editorials, reviews, notes, letters, or data papers.
5. Conference proceedings, book series, books, or trade journals.
6. Articles written in languages other than English.

The criteria above generated a collection of 31 articles in total for in-depth reviewing, as shown in Figure 2.

FIGURE 2
Paper search flowchart.

3.2. Pre-processing

The size of the training set highly impacts classification performance. In all the datasets introduced above, the numbers of image scans retrieved from AD and MCI subjects are limited. In most studies, pre-processing must be done before manipulating the data. Pre-processing is a set of image processing tasks performed on the acquired image scans. Some MRI software packages like FreeSurfer (Fischl, 2012), the Computational Anatomy Toolbox (CAT12), the FMRIB Software Library (FSL) (Jenkinson et al., 2012), Statistical Parametric Mapping (SPM), ANTS (Avants et al., 2009), etc., provide well-encapsulated pre-processing algorithms. Pervasively used pre-processing techniques include registration, normalization, smoothing, segmentation, skull-stripping, noise removal, temporal filtering, covariates removal, etc. This review introduces intensity normalization, registration, skull-stripping, tissue segmentation, and class balancing.

3.2.1. Intensity normalization

Intensity normalization, also known as field correction or intensity inhomogeneity correction, refers to rescaling the intensity of each pixel to a normalized range. In the process of MR image acquisition, various scanners or parameters scan distinct subjects, or the same subject at different times, which may cause significant intensity changes. Large intensity changes will significantly affect the performance of subsequent pre-processing steps like registration and segmentation. A minimal code sketch of this step follows the registration subsection below.

3.2.2. Registration

Registration is a method to spatially align image scans to ensure the correspondence of anatomy across modalities, individuals, and studies. Registration is also used in multi-modality tasks for co-registration. The most commonly used templates are MNI305, Colin27, and MNI152. Liu et al. (2016) reported higher performance adopting multiple templates over a single template. They utilized multiple templates for feature extraction, selected the most representative features of each template, trained multiple SVM classifiers, and ensembled the results of all classifiers to generate the final result. However, multiple templates lead to high computational costs, especially in image registration.
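To make the intensity normalization step concrete, the following is a minimal sketch of z-score rescaling within a brain mask, written in Python with NumPy. It illustrates the idea only; published pipelines use the dedicated bias-field and normalization tools in FreeSurfer, FSL, SPM, or ANTS mentioned above, and the array shapes here are stand-ins.

```python
import numpy as np

def zscore_normalize(volume, mask=None):
    """Rescale voxel intensities to zero mean and unit variance.

    A brain mask (e.g., from skull-stripping) restricts the statistics to
    brain tissue so that background voxels do not skew them.
    """
    voxels = volume[mask > 0] if mask is not None else volume
    return (volume - voxels.mean()) / (voxels.std() + 1e-8)

# Toy stand-in for a loaded MRI volume with a scanner-dependent intensity range.
scan = np.random.rand(121, 145, 121) * 4000.0
normalized = zscore_normalize(scan)
print(round(normalized.mean(), 3), round(normalized.std(), 3))  # ~0.0, ~1.0
```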


3.2.3. Skull-stripping

Skull-stripping, or brain extraction, means removing non-brain tissues like the skull, fat, eyes, etc., and retaining the gray matter (GM), white matter (WM), cerebrospinal fluid (CSF), etc., in the brain scan.

3.2.4. Tissue segmentation

Tissue segmentation means partitioning the image scan into segments corresponding to various tissues. The volume of tissues is a measurement often used after tissue segmentation, and GM probability maps are a popular input form in classification tasks. Usually, pre-processing techniques like intensity normalization and registration need to be done first.

3.2.5. Data augmentation

Data augmentation is a way to address the limited number of subjects in a dataset. It is a technique to enlarge the dataset without collecting new data by generating new samples from the existing data. Augmentation techniques that have been used include cropping, reflection, random translation, gamma correction, scaling, random rotation, elastic transformation, vertical flip, horizontal flip, and different types of blurring. Moreover, synthesis techniques like autoencoders and generative adversarial networks are also used in data augmentation. However, synthesis techniques need more proof of the effectiveness of the generated images in AD-related classification and prediction tasks.
3.3. Conventional machine learning

Support-vector machines (SVMs) are supervised learning methods in conventional machine learning and are often used to solve classification and regression problems. SVMs map the input to points in a multidimensional space so as to maximize the margin between the hyperplanes separating the different classes. A kernel function, for example a Gaussian or polynomial function, maps the current multidimensional space into a higher-dimensional space. SVMs can be used alone or can work with other methods, both conventional machine learning and deep learning. Since SVMs achieve relatively good performance and their working principles are clear and understandable, SVMs are extensively applied in industrial and scientific areas. Suk et al. (2016a) used a linear SVM classifier, and Suk and Shen (2013) and Suk et al. (2015) used multi-kernel SVMs to classify integrated features from multi-modal inputs. Shi et al. (2018) proposed a model that takes stacked deep polynomial networks (DPN) as the feature extractor and a linear kernel SVM as the classifier. Suk et al. (2014) used a linear SVM for the hierarchical classifiers to work with feature representations found by a Deep Boltzmann Machine (DBM).

Multi-kernel SVMs provide more flexibility than single-kernel SVMs. Although multi-kernel SVMs have shown excellent performance in many tasks, efficiency is the most significant bottleneck in their development: their computational complexity and difficulty are much greater than those of single-kernel SVMs. In terms of space, multi-kernel SVM algorithms need to calculate the kernel combination coefficients corresponding to each kernel matrix, so multiple kernel matrices must participate in the operation; in other words, the kernel matrices need to be stored in memory simultaneously. If the number of samples is too large, the dimension of each kernel matrix will be huge, and if the number of kernels is also large, it will occupy colossal memory space. In terms of time, training a multi-kernel SVM is time-consuming. The high time and space complexity is one of the main reasons that multi-kernel SVM algorithms cannot be widely used. Suk and Shen (2013) and Suk et al. (2015) used multi-kernel SVM classifiers in their models to deal with the feature vectors extracted from stacked AEs. Khedher et al. (2015) reported an accuracy of 88.49%, specificity of 91.27%, and sensitivity of 85.11% using partial least squares and PCA as feature extractors and linear and RBF kernel SVMs as classifiers.

Random forest (RF) is an ensemble algorithm. Each decision tree is a classifier, and multiple decision tree classifiers form the random forest. Individual decision trees are trained in parallel; the random forest integrates all classification voting results and assigns the category with the most votes as the final output. Random forest is a flexible and practical method: it works well on large datasets, it can handle thousands of input variables without dimension reduction, and it estimates the significance of different variables in a task. However, calculating many trees and integrating their outputs can consume substantial computing resources. Moradi et al. (2015) proposed a novel biomarker-based diagnosis for classifying different stages of MCI utilizing a low-density separation classifier and a random forest classifier. Lebedev et al. (2014) tested random forests on the ADNI and AddNeuroMed datasets using MRI images and a combination of morphometric measurements with ApoE genotype and demographics (age, sex, and education). Bi et al. (2020) aimed to overcome the small-sample issue and proposed a clustering evolutionary random forest architecture to deal with multimodal data from ADNI to detect abnormalities in the brain and pathogenic genes.
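To ground the two classifiers in code, the following is a minimal scikit-learn sketch of a single-kernel SVM and a random forest applied to pre-extracted feature vectors. The random features and labels are placeholders for, e.g., regional volumes or autoencoder codes; multi-kernel SVMs are not part of scikit-learn and would require a dedicated multiple-kernel-learning implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Placeholder feature vectors extracted from MRI; labels: 0 = NC, 1 = AD.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))
y = rng.integers(0, 2, size=200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

svm = SVC(kernel="rbf", C=1.0)                 # single-kernel SVM classifier
rf = RandomForestClassifier(n_estimators=500)  # ensemble of voting decision trees
for model in (svm, rf):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, model.score(X_te, y_te))
```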


3.4. Convolutional neural network

Deep learning is a subset of machine learning techniques in which the learning process is performed through a hierarchical and deep structure. Deep learning techniques have received significant attention in the last few years and have been used widely in different brain studies. One of the most successful deep learning methods is the convolutional neural network.

Convolutional Neural Networks (CNNs) are artificial neural networks that use convolution operations to filter the input data and extract useful features. Research on CNNs has emerged and thrived swiftly. CNNs have attracted widespread attention from researchers and achieved state-of-the-art results on various detection, classification, and segmentation problems in different domains, including medical imaging, natural language processing, etc. The tremendous success CNNs achieved in the classification and segmentation of natural images has promoted their development and application in the medical area, and in recent years CNNs have performed well in organ segmentation and disease detection tasks. The classic CNN structure consists of a series of convolutional layers, pooling layers, activation layers, and fully connected layers. A softmax function is applied to classify the input image with probabilistic values between zero and one.

The convolutional layer involves the concepts of local receptive fields, shared weights, filters, stride, and padding. A filter contains unknown parameters that are learned during training. Convolution is the process by which a filter slides across the whole image from top-left to bottom-right and convolves with the input image to calculate a weighted sum. The stride refers to the step size that a filter moves per slide. However, the edge pixels will never be in the center of a filter, since a filter cannot extend beyond the edge region; after each convolution between the input and the filter, only part of the pixels at the edge is detected, and information at the image boundary is lost. Padding is designed to overcome this issue: it means filling in some values, usually zeros, along the input boundaries to increase the input size. Padding is needed when it is necessary to keep the dimensions constant before and after convolution to avoid information loss. The size of the filters determines the receptive field of the convolutional layers. Convolutional layers are excellent feature extractors for images, since images contain massive spatial redundancy and convolutional layers exploit this characteristic through shared weights. After reducing the spatial redundancy, the feature vector output by the convolutional layers stands for the image's content.

The pooling layer is a dimension-reduction operation on the feature maps. It helps reduce the number of parameters to train and accelerates the training process. The most widely used pooling layers are max pooling, average pooling, and global pooling. Max pooling outputs the maximum value within the region of the feature map covered by the filter. Average pooling calculates the average value of the elements within the feature map region covered by the filter. Global pooling reduces each channel of the input to a single value.

The activation layer provides a non-linear mapping of the output of the convolutional layer. The calculations in a convolutional layer are linear; the non-linearity provided by activation layers enhances the reasoning ability of the network. The most pervasively used activation functions include ReLU, Sigmoid, Tanh, etc.

The fully connected layer takes the features produced by the feature extractor and predicts the correct label with probabilities.
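The following is a minimal PyTorch sketch of the classic structure just described: convolutional layers with padding, ReLU activations, pooling for dimension reduction, and a fully connected layer whose outputs are turned into class probabilities by a softmax. The layer sizes are arbitrary illustrations, not a published AD architecture.

```python
import torch
import torch.nn as nn

class Tiny3DCNN(nn.Module):
    """Conv -> ReLU -> pool blocks, then a fully connected classifier."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, padding=1),  # padding keeps spatial size
            nn.ReLU(),
            nn.MaxPool3d(2),                            # max pooling halves each dim
            nn.Conv3d(8, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),                    # global average pooling
        )
        self.classifier = nn.Linear(16, n_classes)      # fully connected layer

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))  # class logits

volume = torch.rand(1, 1, 64, 64, 64)              # (batch, channel, D, H, W)
probs = torch.softmax(Tiny3DCNN()(volume), dim=1)  # probabilities in [0, 1]
print(probs)
```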
A CNN can be used as both feature extractor and classifier, or as a feature extractor only: some researchers use a CNN to extract features and adopt a conventional machine learning method for classification. Suk et al. (2017) utilized a CNN to take the target-level representations generated from sparse regression for clinical decision making. Feng et al. (2020) applied a 3D CNN to execute AD classification using MRI images; they replaced the softmax with an SVM as the classifier, and this 3D-CNN-SVM model achieved better classification performance than 2D-CNN and 3D-CNN. With the thriving of CNN in computer vision, researchers have contributed several CNN backbones that achieve state-of-the-art performance in many tasks.

When comparing conventional machine learning and deep learning methods in AD-related areas, we can conclude that, in general, deep learning methods achieve better performance than conventional machine learning methods. The proper size of the training set should be no fewer than 1,000 samples, and a dataset containing over five thousand samples can be considered sufficient to train a deep learning model that achieves high accuracy (Zhao et al., 2021).

3.4.1. CNN backbones

CNN backbones refer to the feature-extracting networks, or feature extractors. In this section, we introduce classic CNN backbones that are pervasively used in AD diagnosis tasks.

3.4.1.1. LeNet

LeCun et al. (1998) proposed LeNet, the first work that used a CNN in a character recognition task. The basic concepts of convolutional, pooling, and fully connected layers are introduced in one architecture, along with the idea of local receptive fields within a CNN. These concepts are the fundamentals of later deep learning models. Yang and Liu (2020) propose a model with LeNet-5 for classification and prediction. They take PET images of 350 MCI subjects from ADNI, and the model achieves a sensitivity and specificity of 91.02 and 77.63% in MCI conversion prediction.

3.4.1.2. AlexNet

A significant architecture after LeNet is AlexNet, proposed by Krizhevsky et al. (2017). A Rectified Linear Unit (ReLU) was used as the activation function. Besides, the authors introduced a way to train the networks using multiple GPUs.

3.4.1.3. VGG

Simonyan and Zisserman (2015) proposed VGG. A stack of 3 × 3 convolution filters was used to replace large convolution filters like 5 × 5, 7 × 7, 9 × 9, or 11 × 11. A stack of small convolution filters for a given receptive field is better than one large convolution filter: small filters result in fewer parameters and deeper networks, which helps train a more complex model in a shorter time. Jain et al. (2019) utilized a transfer learning approach to build an AD classification model. The feature extractor in this work was VGG16, pre-trained on ImageNet. They converted 3D MRI images to 2D slices, selected the 32 most informative slices in pre-processing, and then fed the slices into VGG16 followed by fully connected layers. Although their dataset had MRI images of only 150 subjects from ADNI, the model achieved accuracies of 99.14, 99.30, and 99.22% for AD vs. CN, AD vs. MCI, and MCI vs. CN classifications. Even though the classification accuracy was high for all binary tasks, the generality of the proposed model was highly doubtful since the dataset was too small. Lim et al. (2022) tested a CNN, VGG-16, and ResNet-50 as feature extractors to distinguish NC, AD, and MCI using MRI images. They trained the CNN from scratch and pre-trained VGG-16 and ResNet-50 on the ImageNet database. VGG achieved the best performance, with an accuracy of 83.90%, precision of 82.49%, recall of 83.90%, and F1-score of 83.19%.

3.4.1.4. GoogLeNet

Szegedy et al. (2015), Szegedy et al. (2016), and Szegedy et al. (2017) contributed several versions of the Inception structure and introduced a series of new ideas, including the Inception module and batch normalization. Instead of manually choosing whether to use 3 × 3, 5 × 5, or 7 × 7 filters, the Inception structure makes the network automatically learn a proper structure. Batch normalization, introduced in Inception v2, reduces the internal covariate shift that is generated after convolution operations.


The consistency of the statistical characteristics of the data is thus maintained during training. Inception v3 further replaces large convolution kernels with small ones: a convolution kernel of n × n is cracked into a stacked or parallel form of 1 × n and n × 1 convolution kernels. A general network design principle suggested in Inception v3 is to slowly reduce the dimension of the information to the desired extent. Ding et al. (2019) used Inception v3 pre-trained on ImageNet as their deep learning backbone; they collected 2,109 PET images of 1,002 patients from ADNI as their dataset.

3.4.1.5. ResNet

He et al. (2016) proposed deep residual neural networks (ResNet) to deal with the problems of vanishing and exploding gradients. Before ResNet came into being, a network could not be designed very deep, since the gradient vanishes quickly as the network goes deeper. The network can extract more complex feature patterns as the number of layers increases, so, theoretically, when a model becomes deeper, better results should be obtained. However, in practice the network accuracy becomes saturated or even decreases as the network depth increases. ResNet solves this issue by adding shortcut connections that skip one or more layers. The accumulation layer only performs the identity mapping when the residual is zero, so at worst the network performance will not decline. In practice the residual will not be zero, enabling the accumulation layer to learn new features based on the input features, and since the residual is usually relatively small, the model is easy to train. Abrol et al. (2020) applied a 3D ResNet in their network for classification and prediction. They took 3D gray matter images as input to train the model for MCI detection first, then utilized transfer learning to transfer the trained model to the domain of NC and AD classification. Korolev et al. (2017) adopted a 3D ResNet and a CNN similar to VGG to extract the features necessary for 3D image classification using brain MRIs; both networks worked well to classify AD and NC but failed to separate MCI from AD and NC. Islam and Zhang (2018) tested an architecture that ensembled Inception v4 and ResNet to identify different stages of AD and achieved an accuracy of 93.18% on OASIS.
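A shortcut connection is easy to see in code. The sketch below is a generic residual block, not the exact block of He et al. (2016): the output is the learned residual plus the unchanged input, so a zero residual reduces the block to the identity mapping.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """output = F(x) + x, where F is the learned residual function."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        # If self.body(x) learns to be zero, the block is the identity mapping,
        # so stacking more blocks cannot make the network worse in principle.
        return self.relu(self.body(x) + x)

x = torch.rand(1, 16, 32, 32)
print(ResidualBlock(16)(x).shape)  # torch.Size([1, 16, 32, 32])
```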
3.4.1.6. DenseNet

Huang et al. (2017) proposed DenseNet to make full use of the features from all layers. The two main approaches to improving neural networks are going deeper and becoming wider; DenseNet, by contrast, connects all layers directly. In other words, the input of each layer is derived from the outputs of all previous layers. By doing so, DenseNet mitigates the vanishing gradient and makes the best use of features to improve the effect, while the number of parameters is reduced to some extent. Wang et al. (2019) proposed a model in which every classifier takes a 3D DenseNet as the backbone, followed by fully connected layers and a softmax function. Each 3D DenseNet is initialized and trained separately, and a voting system is adopted to integrate the probabilistic scores generated by the independent classifiers. The model was trained on images of 833 subjects in the ADNI dataset. Liu et al. (2020) integrated multi-task deep CNN and DenseNet models for hippocampal segmentation and AD classification. In detail, the multi-task deep CNN extracted the features for segmentation and classification, a 3D DenseNet learned the features for disease classification, and at last the model integrated the features learned from the multi-task CNN and DenseNet models to make the classification. Wang S. et al. (2018) ensembled 3D-DenseNets for AD and MCI diagnosis. They adopted DenseNet due to the issue of limited data and trained several 3D-DenseNets with varying hyperparameters; the final result is generated from the weighted sum of each base 3D-DenseNet, and the model achieved an accuracy of 97.19%. Zhang et al. (2021) also proposed a network using 3D DenseNet. Usually, training a deep learning model like DenseNet with such a small dataset results in a high risk of overfitting, and the voting strategy helps compensate for this fault. However, training multiple deep learning models from scratch is time-consuming and inefficient; transfer learning may be a good choice.

3.4.2. Input types management

CNN is a powerful tool that can process features of different sizes and dimensions. Based on four different input types, four main categories of methods are pervasively used in CNNs: 2D slice-based, 3D patch-based, 3D region-of-interest-based (ROI-based), and 3D subject-level. Table 1 presents comparisons among recent works.

3.4.2.1. 2D slice

2D slice-based approaches extract 2D slices from a 3D image to reduce the number of hyper-parameters. The hypothesis here is that features useful for classification or prediction tasks can be extracted from 2D slices. A common way to extract 2D slices from a 3D image is to project the whole brain scan onto the sagittal, coronal, and axial planes, sometimes also called the median, frontal, and horizontal planes. The center part of the brain is usually more informative than the parts on the edges: the information entropy of the images in the center part is larger than the rest. As a result, not all slices are used during training. Slices of the sagittal, coronal, and axial views contain complementary information, and some studies integrate features extracted from all three views. It is easy to obtain large numbers of samples when using 2D slices, and a deep learning model with a 2D CNN usually contains fewer parameters and needs a shorter training time than a 3D model. The disadvantage of slice-based approaches is that the 2D slices of a brain image lose the spatial information between each other, since each 2D slice is processed independently. Sarraf et al. (2017), Wang S. H. et al. (2018), and Jain et al. (2019) adopted 2D MRI slices as the input type in their proposed models. Sarraf et al. (2017) used LeNet-5 as the CNN backbone and reported an accuracy of 96.86% for the classification of AD and NC. Wang S. H. et al. (2018) trained their own 2D CNN from scratch. Jain et al. (2019) used 2D MRI slices as the input type in the model they presented, adopting VGG-16 pre-trained on ImageNet as the feature extractor. Lin et al. (2018) investigated using a CNN with PCA and Lasso to predict MCI-to-AD conversion. They trained the CNN as the feature extractor with 2.5D patches as input, adopted PCA and Lasso to reduce the dimensions and select the most informative features, and finally fed the features to an extreme learning machine to make the classification. Furthermore, they tested the features generated from FreeSurfer together with the CNN-based features, and it turned out that using both features generates better performance than using solely CNN-based or FreeSurfer-based features.
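To illustrate slice-based input preparation, the sketch below extracts axial slices from a 3D volume and keeps the most informative ones by histogram entropy, in the spirit of the entropy-based selection used by Khan et al. (2019). The volume, bin count, and the choice of 32 slices are illustrative assumptions.

```python
import numpy as np

def slice_entropy(img, bins=32):
    """Shannon entropy of a slice's intensity histogram."""
    hist, _ = np.histogram(img, bins=bins)
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

volume = np.random.rand(121, 145, 121)          # stand-in 3D brain scan
axial = [volume[:, :, k] for k in range(volume.shape[2])]

# Keep the 32 highest-entropy (most informative) axial slices for training.
scores = np.array([slice_entropy(s) for s in axial])
selected = np.argsort(scores)[-32:]
print(len(selected), "slices selected out of", len(axial))
```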
3.4.2.2. 3D patch

3D patch-based approaches are like 2D slice-based methods, but instead of sampling the projections onto particular planes, they cut the 3D brain scan into a set of 3D patches, with the stride as a hyperparameter.

TABLE 1 Comparison among papers with high citations in AD diagnosis.

| References | Scan type | Dataset | Subjects | Participants | Accuracy | Technical details | Pre-processing |
|---|---|---|---|---|---|---|---|
| Suk and Shen (2013) | MRI + PET | ADNI | 202 | HC: 52, AD: 51, MCI: 99 | 98.8% | Stacked AEs + a multi-kernel SVM | Anterior commissure (AC)-posterior commissure (PC) correction, skull-stripping, cerebellum removal, and tissue segmentation |
| Liu et al. (2014) | MRI + PET | ADNI | 311 | HC: 77, AD: 65, pMCI: 67, sMCI: 102 | 91.4% | Stacked sparse AEs + a softmax layer | Non-linear registration and tissue segmentation |
| Lebedev et al. (2014) | MRI | ADNI, AddNeuroMed | 896 | ADNI: HC: 225, MCI: 165, AD: 185; AddNeuroMed: HC: 100, AD: 107, MCI: 114 | Overall accuracy, ADNI: 86.6%; AddNeuroMed: 86.25% | RF | FreeSurfer segmentation and cortical reconstruction |
| Suk et al. (2014) | MRI + PET | ADNI | 398 | HC: 101, AD: 93, MCI: 204 | 95.35% | DBM + a linear kernel SVM | AC-PC correction, skull-stripping, cerebellum removal, and tissue segmentation |
| Payan and Montana (2015) | MRI | ADNI | 2,264 | HC: 755, AD: 755, MCI: 755 | 95.39% | Sparse AEs and 3D CNN + FC | Normalization |
| Suk et al. (2015) | MRI + PET | ADNI | 202 | HC: 52, AD: 51, pMCI: 43, sMCI: 56 | 89.13% | Stacked AEs + a multi-kernel SVM | AC-PC correction, skull-stripping, and cerebellum removal |
| Moradi et al. (2015) | MRI | ADNI | 825 | HC: 231, MCI: 394, AD: 200 | 75% | LDS + RF | Intensity correction, spatial normalization, and tissue segmentation |
| Khedher et al. (2015) | MRI | ADNI | 818 | HC: 229, AD: 188, MCI: 401 | 88.49% | Partial least squares + PCA + SVM | Spatial normalization and segmentation (GM, WM, CSF) |
| Li et al. (2015) | MRI + PET + CSF + MMSE + ADAS-Cog | ADNI | 202 | HC: 52, AD: 51, MCI: 99 | 91.4% | PCA features, stacked RBMs + a linear kernel SVM | AC-PC correction, skull stripping, cerebellum removal, and spatial normalization |
| Suk et al. (2016a) | MRI | ADNI-2, in-house dataset | 100 | ADNI-2: HC: 31, MCI: 31; in-house: HC: 25, MCI: 13 | 72.58% (ADNI-2); 81.08% (in-house) | Deep auto-encoder | Realignment and normalization |
| Hosseini-Asl et al. (2016a) | MRI | ADNI, CADDementia | 210 + 30 = 240 | HC: 70, AD: 70, MCI: 70; 30 CADDementia subjects | AD vs. MCI vs. NC: 94.6%; AD + MCI vs. NC: 95.7%; AD vs. NC: 99.3%; AD vs. MCI: 100%; MCI vs. NC: 94.2% | A 3D CNN pre-trained with stacked 3D convolutional AEs | Normalizing, skull stripping, and intensity normalization |
| Hosseini-Asl et al. (2016b) | MRI | ADNI, CADDementia | 210 + 30 = 240 | HC: 70, AD: 70, MCI: 70; 30 CADDementia subjects | AD vs. MCI vs. NC: 89.1%; AD + MCI vs. NC: 90.3%; AD vs. NC: 97.6%; AD vs. MCI: 95%; MCI vs. NC: 90.8% | A 3D CNN pre-trained with stacked 3D convolutional AEs | Normalizing, skull stripping, and intensity normalization |
| Liu et al. (2016) | MRI | ADNI | 459 | HC: 128, AD: 97, sMCI: 117, pMCI: 117 | 93.06% | Ensemble SVMs | Non-parametric non-uniform bias correction, skull stripping, cerebellum removal, tissue segmentation, and affine alignment |
| Suk et al. (2017) | MRI | ADNI | 805 | HC: 226, AD: 186, pMCI: 167, sMCI: 226 | AD vs. NC: 90.28%; MCI vs. NC: 74.20%; pMCI vs. sMCI: 73.28% | Sparse regression + CNN | AC-PC correction, skull-stripping, and cerebellum removal |
| Korolev et al. (2017) | MRI | ADNI | 231 | HC: 61, AD: 50, sMCI: 77, pMCI: 43 | 88% | 3D CNN based on ResNet and VGGNet | Alignment and skull stripping |
| Sarraf et al. (2017) | MRI | ADNI | 144 + 302 = 446 | HC: 92 + 91, AD: 52 + 211 | 100% | GoogLeNet and LeNet-5 | Skull stripping, tissue segmentation, registration, and smoothing |
| Islam and Zhang (2018) | MRI | OASIS | 416 | 416 | 93.18% | 2 CNNs, Inception v4, ResNet | Data augmentation |
| Lin et al. (2018) | MRI | ADNI | 818 | HC: 229, AD: 188, MCI: 401 | 79.90% | PCA + Lasso + CNN | Skull-stripping, deformation registration, and intensity normalization |
| Wang S. H. et al. (2018) | MRI | OASIS, local | 196 | HC: 98, AD: 28 + 70 | 97.65% | A 2D CNN | Brain extraction, spatial normalization, normalization, smoothing, and histogram stretching |
| Shi et al. (2018) | MRI + PET | ADNI | 202 | HC: 52, AD: 51, sMCI: 56, pMCI: 43 | 97.13% | Stacked DPN + a linear kernel SVM | AC-PC correction, intensity inhomogeneity correction, skull-stripping, cerebellum removal, tissue segmentation, and registration |
| Basaia et al. (2019) | MRI | ADNI-1, 2, GO + Milan dataset | 1,385 | In total: HC: 407, AD: 418, cMCI: 280, sMCI: 280 (ADNI: HC: 352, AD: 294, MCI: 763; Milan: HC: 55, AD: 124, MCI: 50) | AD vs. HC: 99% on ADNI, 98% on ADNI + Milan; cMCI vs. sMCI: 75% on both datasets | CNN | Spatial normalization and tissue segmentation |
| Wang et al. (2019) | MRI | ADNI | 833 | HC: 315, AD: 221, MCI: 297 | 97.52% | Ensemble 3D-CNN | Grad-warping, intensity correction, skull stripping, and alignment |
| Khan et al. (2019) | MRI | ADNI | 150 | HC: 50, AD: 50, MCI: 50 | 99.20% | VGG | Image entropy to select the most informative slices |
| Jain et al. (2019) | MRI | ADNI | 150 | HC: 50, AD: 50, MCI: 50 | 95.73% | VGG-16 pre-trained on ImageNet + 2D CNN + FC | Motion correction, non-uniform intensity normalization, Talairach transform computation, intensity normalization, and skull stripping |
| Liu et al. (2020) | MRI | ADNI | 449 | HC: 119, AD: 97, MCI: 233 | AD vs. NC: 88.9%; MCI vs. NC: 76.2% | 3D DenseNet | Hippocampus segmentation, affine registration, tissue segmentation, and non-linear registration |
| Lian et al. (2020) | MRI | ADNI-1, ADNI-2 | 951 | ADNI-1: HC: 229, AD: 199, sMCI: 226, pMCI: 167; ADNI-2: HC: 200, AD: 159, sMCI: 239, pMCI: 38 | AD vs. NC: 90.3%; pMCI vs. sMCI: 80.9% | FCN | AC-PC correction, intensity correction, skull stripping, cerebellum removal, and affine registration |
| Abrol et al. (2020) | MRI | ADNI | 828 | HC: 237, AD: 157, sMCI: 245, pMCI: 189 | 83.01% | CNN + 3D ResNet | Tissue segmentation, normalization, and smoothing |
| Feng et al. (2020) | MRI | ADNI | 489 | HC: 179, AD: 153, MCI: 157 | NC: 93.71%; MCI: 96.82%; AD: 96.73% | 3D CNN + SVM | Spatial normalization, skull stripping, tissue segmentation, affine transition, and registration |
| Bi et al. (2020) | MRI | ADNI | 72 | HC: 35, AD: 37 | 90% | RF | Normalization and smoothing |
| Liu et al. (2021) | MRI | OASIS | 430 | HC: 332, MCI: 68, AD: 30 | 78.02% | Deep separable CNN | Data enhancement processing (clipping, flipping, increasing contrast, rotating, etc.) |
| Odusami et al. (2021) | MRI | ADNI2 | 113 | HC: 25, AD: 25, MCI: 63 | MCI vs. EMCI: 99.99%; EMCI vs. AD: 99.95%; LMCI vs. AD: 99.95% | ResNet18 | Random resize, cropping, random rotation, random horizontal flip, center cropping, and normalization |
The sample size is larger after the cutting. 3D patch-based methods compensate for the loss of spatial information in 2D slice-based methods, but the patches are often used independently during training. 3D patch-based methods need little memory when a model uses the same network for each patch; if an independent network is trained for each patch separately and an ensemble architecture then integrates the results of the independent networks, the complexity of the whole network will be high. The challenges in 3D patch-based methods are to choose informative patches from the brain scan and to select the most discriminative features. Qiu et al. (2020) and Zhang et al. (2021) adopted 3D patches as the input type (a minimal extraction sketch follows the 3D subject subsection below).

3.4.2.3. 3D ROI

3D ROI-based methods pay attention to specific regions that have been clinically proved to be related to AD. An ROI image represents the 3D image of a segmented brain region. The selected regions, for example gray matter volume, hippocampal volume, cortical thickness, etc., are usually informative. Using an ROI-based method does not easily lead to overfitting, and the model interpretability is excellent, since a human can see the contribution of each region in the model. The shortcoming of ROI-based methods is the prerequisite knowledge of which regions to select in AD. Liu et al. (2014) took 3D ROI-based input and extracted features with stacked sparse AEs. Li et al. (2015) adopted 3D ROI-based input in their model and used an SVM classifier.

3.4.2.4. 3D subject

3D subject-based methods take a 3D brain scan as a whole, so complete spatial information is preserved. Since a patient only provides one sample at a time, the number of samples is small compared with the number of subjects in popular datasets. Consequently, the risk of overfitting is high when using 3D subject-based methods. Moreover, MRI scans are globally similar, and minor changes are not easily recognized in whole MRIs.
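As referenced in the 3D patch subsection above, the following is a minimal sketch of cutting a scan into overlapping cubic patches, with patch size and stride as hyperparameters; the sizes are illustrative.

```python
import numpy as np

def extract_patches(volume, size=32, stride=16):
    """Cut a 3D scan into overlapping cubic patches; stride is a hyperparameter."""
    D, H, W = volume.shape
    patches = [
        volume[z:z + size, y:y + size, x:x + size]
        for z in range(0, D - size + 1, stride)
        for y in range(0, H - size + 1, stride)
        for x in range(0, W - size + 1, stride)
    ]
    return np.stack(patches)

volume = np.random.rand(96, 96, 96)   # stand-in brain scan
patches = extract_patches(volume)
print(patches.shape)                  # (125, 32, 32, 32): many samples per scan
```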

3.5. Autoencoder

An autoencoder (AE) is an artificial neural network in which the input and the learning objective are almost the same. Autoencoders aim to learn hidden representations of the input in an unsupervised manner. An autoencoder consists of an encoder and a decoder. Given an input space and a feature space, an autoencoder solves the mapping between input and output that minimizes the reconstruction error of the input features. In other words, the latent-layer feature, the encoded feature generated by the encoder, can be regarded as a representation of the input data.

The representational ability of a single AE is limited. Stacked AEs are a combination of a series of AEs stacked together: the output of the hidden units of one AE is used as the input of another AE in a deeper layer. As the stacked AEs become deeper, the representational power increases. Stacked AEs can also be used in transfer learning. As self-supervised learning, stacked AEs can effectively extract the latent representation of input data, so they can be used as feature extractors: train the AE with the training set, then replace the decoder with a classifier for classification purposes. The latent representation extracted by the AE can also be used in pre-training. In tasks lacking data, like AD classification and prediction, stacked AEs are pervasively used. Suk and Shen (2013), Suk et al. (2015), and Suk et al. (2016a) proposed networks that used stacked AEs as feature extractors, with an SVM as the classifier processing the features to make the classification. Hosseini-Asl et al. (2016a,b) used a 3D CNN pre-trained with stacked 3D convolutional AEs in their work. Payan and Montana (2015) adopted sparse AEs and CNNs and compared the classification accuracy of 2D and 3D approaches; the 3D approach provided a boost in performance compared to the 2D method.
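The encoder-decoder split and its reuse as a feature extractor can be sketched in a few lines of PyTorch. This is a toy fully connected AE on flattened patches, with made-up dimensions, not any of the reviewed architectures:

```python
import torch
import torch.nn as nn

# Encoder compresses the input to a latent code; decoder reconstructs the input.
encoder = nn.Sequential(nn.Linear(4096, 256), nn.ReLU(), nn.Linear(256, 64))
decoder = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 4096))

x = torch.rand(8, 4096)                                # flattened image patches
loss = nn.functional.mse_loss(decoder(encoder(x)), x)  # reconstruction error
loss.backward()                                        # unsupervised training step

# After pre-training, drop the decoder and attach a classifier to the codes.
classifier = nn.Sequential(encoder, nn.Linear(64, 2))
print(classifier(x).shape)                             # torch.Size([8, 2])
```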
3.6. Transformer

The utilization of state-of-the-art models from other computer vision tasks significantly improves the performance of AD classification and prediction, and integrating the latest models into AD-related studies is always a good idea. The next candidate to improve AD performance may be the attention mechanism. The attention mechanism proposed by Vaswani et al. (2017) was initially designed to solve Natural Language Processing (NLP) problems. Although the core of the transformer is nothing but a weighted sum, the performance of the transformer is remarkable in a wide range of areas.

The vision transformer (ViT) proposed by Dosovitskiy et al. (2020) ditches the CNN structure and utilizes a pure transformer. As a new type of feature extractor, ViT focuses on patch-level attention instead of pixel-level attention, and it achieves better performance than CNNs in various computer vision tasks. If ViT were successfully used in AD diagnosis, the interpretability of the model would increase, since ViT depicts the importance of each area. The shortcoming of ViT is that the dimension of the input features is very large, as most AD-related tasks use 3D images; using ViT to handle input of such a large dimension is unrealistic. Since 3D images contain much more spatial redundancy than 2D images and text, it is necessary to reduce the duplication before processing.

With the great success of masked language models like Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2019) for pre-training in NLP, a new transfer learning method may also help improve performance. The Masked Autoencoder (MAE), proposed by He et al. (2021), exploits the natural difference between language and vision: language is concrete and has high semantic information density, while vision is a continuous signal that contains spatial duplication. Masked parts are therefore more likely to be recovered in a vision task, and an original image can be reconstructed from the given partial observation.
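The "nothing but a weighted sum" remark is literal: scaled dot-product attention computes softmax similarity weights between tokens and uses them to average the value vectors. A minimal sketch over ViT-style patch tokens follows; the token count and dimension are arbitrary.

```python
import torch

def attention(q, k, v):
    """Weighted sum of values v, with weights from query-key similarity."""
    d = q.size(-1)
    weights = torch.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
    return weights @ v

tokens = torch.rand(1, 196, 64)          # 196 image-patch tokens of dimension 64
out = attention(tokens, tokens, tokens)  # self-attention across patches
print(out.shape)                         # torch.Size([1, 196, 64])
```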
3.7. Transfer learning

Humans can utilize existing knowledge of one area to accelerate solving problems in another area. In many studies, researchers train their deep learning models from scratch. However, this is often inefficient, since the training process is time-consuming and a dataset of adequate size, up to millions of images, is required. Because of the high cost of learning directly from scratch, researchers expect to use existing knowledge to assist in learning new knowledge faster and better. Transfer learning means transferring knowledge learned from one domain to another. The source domain is defined as the domain that contains existing knowledge, while the target domain is the one to which the current knowledge is transferred. Since the most pervasively used backbone networks, like LeNet, AlexNet, VGGNet, ResNet, DenseNet, and GoogLeNet, are all trained on ImageNet, ImageNet has become the most common source dataset for transfer learning (Ardalan and Subbian, 2022). Researchers use transfer learning to pre-train their deep learning algorithms to solve the problem of scarcity of data samples.

Fine-tuning means applying a pre-trained model and using its weights to initialize the new model that will be trained. Fine-tuning saves a lot of training time, since the model does not need to train from scratch. Researchers can choose to freeze, fine-tune, or randomly initialize parts of the pre-trained model; according to Ardalan and Subbian (2022), most researchers prefer to fine-tune the convolutional and fully connected layers.

The prediction of MCI conversion is more challenging than the classification between AD and HC, because the structural brain changes of MCI may be very subtle. However, since the classification task between AD and HC is highly correlated with the task of MCI prediction, researchers often transfer the weights learned from AD classification to initialize the parameters of the network for MCI classification. Khan et al. (2019) attempted to solve the need for a large dataset with transfer learning; the strategy they deployed was to fine-tune with layer-wise tuning, meaning only a predefined group of layers was trained while the other layers stayed frozen. Liu et al. (2021) adopted AlexNet and GoogLeNet as the base for transfer learning, with accuracies of 91.4 and 93.02%, respectively; GoogLeNet achieved slightly higher performance since it contains deeper layers and more convolutions than AlexNet. Odusami et al. (2021) utilized a transfer learning method for Alzheimer's detection, taking a pre-trained ResNet18 network as the source model and unfreezing all layers to update the parameters of the network. Basaia et al. (2019) implemented transfer learning by transferring the weights of the CNN used to classify ADNI AD vs. HC to the other CNNs as pre-trained initial weights. Lian et al. (2020) transferred the weights learned from the AD vs. HC classification task to the MCI classification task. Hosseini-Asl et al. (2016a) pre-trained a 3D convolutional autoencoder on the source domain (CADDementia) and fine-tuned it on the target domain (ADNI). Li et al. (2015) pre-trained with RBMs in an unsupervised manner. Similarly, Payan and Montana (2015) pre-trained convolutional layers with a sparse autoencoder and used those layers to initialize a CNN.
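A minimal PyTorch sketch of the freeze-versus-unfreeze choice described above, using the torchvision ResNet18 with ImageNet weights as the source model; the two-class head stands in for an AD vs. HC task:

```python
import torch.nn as nn
from torchvision import models

# Source domain: ImageNet weights; target domain: a two-class AD vs. HC task.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Option A: freeze the pre-trained backbone and train only a new head.
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 2)  # newly initialized, trainable

# Option B (the strategy reported for Odusami et al., 2021) would instead leave
# all layers unfrozen, so that every parameter is updated on the target data.
```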
of adequate size up to millions of images is required. Because of type. Neuroimaging of other forms, genetic, biological, voice-based,
the high cost of learning directly from scratch, researchers expect text-based, etc., may be reviewed in separate papers. The multi-
to use existing knowledge to assist in learning new knowledge faster modality models that can fuse information from different modalities
and better. Transfer learning means transferring knowledge learned usually outperform the models with only one modality since various
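For illustration, here is a minimal PyTorch sketch of the layer-wise strategy described above: start from an ImageNet-pretrained ResNet18 (the backbone used by Odusami et al., 2021), freeze the backbone, and unfreeze only a predefined group of layers plus a new three-way AD/MCI/NC head. The choice of unfrozen layers, the learning rate, and the recent torchvision API are our assumptions, not settings from the cited papers.

```python
import torch
import torch.nn as nn
from torchvision import models

# Source-domain knowledge: an ImageNet-pretrained ResNet18.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Layer-wise tuning: freeze every layer first...
for p in model.parameters():
    p.requires_grad = False

# ...then unfreeze only a predefined group of layers (here, the last stage).
for p in model.layer4.parameters():
    p.requires_grad = True

# Replace the 1000-class ImageNet head with a trainable AD/MCI/NC classifier.
model.fc = nn.Linear(model.fc.in_features, 3)

# Only the unfrozen parameters are updated during fine-tuning.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
```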
4. Challenges and discussion

This article still has some limitations. The papers we reviewed are mostly those with high citations per year, which is unfair to newly published ones. The document and source types were strictly limited to "article" and "journal." Furthermore, we only reviewed articles written in English, and we mainly reviewed papers on Alzheimer's disease diagnosis using MRI as the data type; neuroimaging of other forms and genetic, biological, voice-based, and text-based data may be reviewed in separate papers. Multi-modality models that fuse information from different modalities usually outperform single-modality models, since different modalities may contain complementary information.


Artificial intelligence, especially conventional machine learning and deep learning methods, is thriving in AD-related tasks. However, there are still some challenges. Datasets in the AD area remain small compared with those in computer vision tasks because of the privacy of medical data, and given the complexity of AD-related tasks, a large-scale dataset is a must for developing more effective and powerful models. Currently, researchers focus more on AD, MCI, and NC classification than on prediction, and the early detection of AD remains a challenging issue. Performance across proposed models is also hard to compare because of differences in sample sizes, modalities, pre-processing techniques, feature extractors, classifiers, etc.


4.1. Class imbalance

Class imbalance is a common issue in datasets: images in some classes may be far more numerous than those in others. Increasing the number of images in the under-represented classes or reducing the number of images in the over-represented classes are two ways to address imbalanced data. The Synthetic Minority Oversampling Technique (SMOTE) addresses class imbalance by generating synthetic minority-class samples through interpolation between neighboring minority samples, which avoids the overfitting caused by simple random duplication (Chawla et al., 2002). Murugan et al. (2021) adopted SMOTE to overcome the class imbalance issue in their work and reported training and validation accuracies of 99 and 94%, compared with 96 and 78% without SMOTE. Data augmentation is another way to handle imbalanced data by enlarging the number of samples in the rare class, whereas reducing the number of images in the over-sampled class makes the dataset smaller. Afzal et al. (2019) adopted data augmentation to address the class imbalance concern in AD detection using 3D MRI images from OASIS and achieved high performance for Alzheimer's disease diagnosis. A balanced dataset is preferable, since balancing can improve performance even if the dataset becomes smaller as a result (Farooq et al., 2017). Another way of solving class imbalance is reconstructing medical images: Hu et al. (2020) proposed a Generative Adversarial Network (GAN) to reconstruct neuroimages and used the reconstructed images to augment the imbalanced dataset. They trained two 3D densely connected convolutional networks, one on the raw dataset and one on the freshly balanced dataset, and compared their performance; the neuroimages generated by the GAN helped improve classification accuracy from 67 to 74%.
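As a minimal illustration, the sketch below applies SMOTE from the imbalanced-learn package to toy feature vectors; the sample counts and feature dimension are assumptions. To avoid the data leakage discussed in the next subsection, such resampling should be applied to the training set only, after the subject-level split.

```python
import numpy as np
from collections import Counter
from imblearn.over_sampling import SMOTE

X = np.random.rand(330, 128)           # toy features: 300 NC and 30 AD samples
y = np.array([0] * 300 + [1] * 30)     # heavily imbalanced labels

X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)

print(Counter(y))       # Counter({0: 300, 1: 30})
print(Counter(y_res))   # Counter({0: 300, 1: 300}): minority class synthesized up
```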
4.2. Data leakage

Data leakage refers to the use of testing data during training (Wen et al., 2020). Four main causes lead to data leakage: incorrect data split, late split, improper transfer learning, and no independent test set. A late split occurs when data augmentation is applied before the dataset is divided into training, validation, and test sets; as a result, images generated from the same source can fall into different sets, leading to a biased evaluation. An incorrect data split means that images of one subject acquired at multiple time points are split across the training, validation, and test sets; it may also occur when 2D slices or 3D patches are used as deep learning input, so the proper split should happen at the subject level. Improper transfer learning happens if the source and target domains of transfer learning overlap; using disjoint source and target datasets is an excellent way to avoid it. Finally, no independent validation set exists when the dataset is split into only a training and a test set; the test set should be used solely for evaluation and never for hyperparameter optimization, which should instead rely on a separate validation set that does not overlap with the test set.
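A minimal scikit-learn sketch of such a subject-level split follows; the sample counts, feature dimension, and labels are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

X = np.random.rand(1000, 256)                  # e.g., 1,000 slices, 256 features
subject_ids = np.repeat(np.arange(100), 10)    # 100 subjects, 10 slices each
y = np.random.randint(0, 2, size=1000)         # toy AD vs. NC labels

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=subject_ids))

# All slices of a subject land in the same partition, so no subject overlaps:
assert set(subject_ids[train_idx]).isdisjoint(subject_ids[test_idx])
```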

4.3. Trade-off discussion

In the articles reviewed in this paper, most authors utilized pre-processing techniques. Even though deep learning requires less pre-processing of the data (for instance, Islam and Zhang (2018) and Khan et al. (2019) used no pre-processing techniques in their CNN networks), we still suggest pre-processing the raw data according to a standard pipeline, especially when adopting conventional machine learning methods. A recommended standard pre-processing pipeline includes intensity correction, skull-stripping, registration, normalization, and tissue segmentation.
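As a minimal sketch of two of these steps, the snippet below runs N4 bias-field (intensity) correction and z-score intensity normalization with SimpleITK and NumPy; the file name is a placeholder, and skull-stripping, registration, and tissue segmentation are usually delegated to dedicated tools such as FreeSurfer (Fischl, 2012), FSL (Jenkinson et al., 2012), or ANTs (Avants et al., 2009).

```python
import numpy as np
import SimpleITK as sitk

img = sitk.ReadImage("subject_T1.nii.gz", sitk.sitkFloat32)   # placeholder path

# 1) Intensity (bias-field) correction with N4, using an Otsu mask of the head.
head_mask = sitk.OtsuThreshold(img, 0, 1, 200)
corrected = sitk.N4BiasFieldCorrection(img, head_mask)

# 2) Z-score intensity normalization restricted to voxels inside the head mask.
vol = sitk.GetArrayFromImage(corrected)
msk = sitk.GetArrayFromImage(head_mask).astype(bool)
vol[msk] = (vol[msk] - vol[msk].mean()) / (vol[msk].std() + 1e-8)
```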


In the reviewed papers, SVM is the most pervasively utilized method, but the trend in recent years is that CNNs are surging in popularity. Deep learning approaches have achieved better performance in diagnostic tasks than conventional methods. A significant drawback of deep learning is its lack of interpretability and transparency: deep learning models are in a black-box state, and their internal operating mechanism is challenging to comprehend. Moreover, compared with conventional machine learning, deep learning techniques usually require higher-performance graphics processing units, an enormous amount of storage, and more time to train.

Most research is conducted on a single dataset, but some researchers use more than one dataset for specific purposes. For instance, Liu et al. (2018) and Poloni and Ferrari (2022) used multiple datasets to enlarge the number of subjects. A few researchers use multiple datasets for different stages: Qiu et al. (2020) proposed a network that took ADNI as the training dataset and AIBL, FHS, and NACC as the testing datasets. Basaia et al. (2019) tested a CNN on two datasets, ADNI and ADNI + Milan, achieving accuracies of 99% on ADNI and 98% on ADNI + Milan for AD vs. HC classification, and an accuracy of 75% for detecting cMCI vs. sMCI on both datasets. Lian et al. (2020) automated the identification of discriminative local patches and regions, then fused the learned features for classification with a hierarchical fully convolutional network on ADNI-1 and ADNI-2, achieving accuracies of 90.3% for AD vs. NC and 89.9% for pMCI vs. sMCI.

Cutting a 3D image from various perspectives generates 2D slices. The 2D slice-based input form is cheap, since a 2D image is much easier to process than a 3D one; in addition, slicing enlarges the sample size of the dataset. Usually, the central 2D slices with larger entropy are selected, so the input dimension is further reduced. However, when the slices of one 3D image are used independently, the inter-slice relationship information may be lost through slicing. We recommend that researchers without strong hardware support concentrate on designing a small architecture that takes 2D slice-based data as the input form.
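A minimal NumPy sketch of such entropy-based slice selection is shown below; the volume shape, histogram binning, and slice count are illustrative assumptions.

```python
import numpy as np

def slice_entropy(sl, bins=64):
    """Shannon entropy of a slice's intensity histogram."""
    hist, _ = np.histogram(sl, bins=bins)
    p = hist[hist > 0] / hist.sum()
    return -np.sum(p * np.log2(p))

def select_slices(volume, k=32, axis=0):
    slices = np.moveaxis(volume, axis, 0)      # iterate along the chosen axis
    scores = [slice_entropy(s) for s in slices]
    keep = np.argsort(scores)[-k:]             # indices of the k highest-entropy slices
    return slices[np.sort(keep)]               # keep anatomical order

volume = np.random.rand(160, 192, 160)         # stand-in for a 3D MRI volume
top_slices = select_slices(volume, k=32)       # shape (32, 192, 160)
```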

Like the 2D slice-based methods, 3D patch-based methods provide a large dataset. The 3D patch is a compromise between the 2D slice and the 3D subject image. However, the network has to train a classifier for each patch, so there may be too many classifiers to train, and extracting discriminative features and selecting the most informative ones from all the 3D patches is tough. Although the ROIs are usually informative, only one or a few regions are considered in a model, whereas AD often covers multiple brain regions. For researchers who understand how to define and use Regions of Interest, the 3D ROI-based method may be a suitable solution with adequate interpretability. Subject-level methods contain only one sample per patient, so they usually provide too few samples for a complicated task like AD detection.
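For illustration, here is a minimal NumPy sketch of cutting a volume into non-overlapping 3D patches, the input form discussed above; the patch size and volume shape are assumptions.

```python
import numpy as np

def extract_patches(volume, size=32):
    """Cut a 3D volume into non-overlapping cubic patches of the given size."""
    nx, ny, nz = (d // size for d in volume.shape)
    patches = [volume[i*size:(i+1)*size, j*size:(j+1)*size, k*size:(k+1)*size]
               for i in range(nx) for j in range(ny) for k in range(nz)]
    return np.stack(patches)

vol = np.random.rand(160, 192, 160)     # stand-in for a subject-level 3D image
patches = extract_patches(vol)          # shape (150, 32, 32, 32): 5 * 6 * 5 patches
```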
There is no fixed answer for determining a suitable backbone or input form. In general, larger and more complex models have a greater chance of yielding higher performance. According to Elharrouss et al. (2022), the complexities of DenseNet-121 and ResNet-101 are 0.525 and 7.6 Giga Floating Point Operations (GFLOPs), so ResNet-101 is over ten times as complex as DenseNet-121. However, their top-1 error rates are 25.02 and 19.87%, which means roughly fourteen times the complexity in exchange for a 5.15% reduction in the top-1 error rate.
Compared with CNNs, one of the most significant advantages of autoencoders is that they are an unsupervised learning method, whereas a CNN must utilize labeled data to work. However, autoencoders learn to capture as much information as possible, and the captured information may not be relevant to the specific task; if the information most pertinent to an issue makes up only a tiny part of the input, the autoencoder may lose much of it.
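A minimal PyTorch sketch of this contrast follows: the autoencoder below is trained purely on reconstruction, with no labels involved, which is exactly why task-relevant detail can be lost when it occupies only a small part of the input. Layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, in_dim=4096, code_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(),
                                     nn.Linear(512, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 512), nn.ReLU(),
                                     nn.Linear(512, in_dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
x = torch.rand(8, 4096)                      # e.g., flattened image patches
loss = nn.functional.mse_loss(model(x), x)   # unsupervised: the target is the input
loss.backward()
```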
Vision transformers outperform CNNs in some image classification tasks, but they need costly pre-training on large datasets. Researchers must choose the most suitable model based on their hardware conditions and specific application requirements, balancing performance and complexity.
Author contributions

ZZ conceived the original idea for the review, performed the selection of papers and data extraction, prepared the table and figure, wrote the first draft of the manuscript, and contributed to the subsequent reviews and final version. KL, SD, and WB supervised the review process, revised the manuscript, and contributed to writing the final version. JC and C-OC helped conceptualize and supervise this study. MG, NW, and XW contributed to the literature review, result evaluation, and presentation. All authors participated in editing the first draft of the manuscript and read and approved the final manuscript.

Funding

This work was supported in part by Universiti Malaya, under project no. IIRG001B-2021IISS and UPAR Grant No. #12T031.

Acknowledgments

We want to acknowledge all the study participants participating in the research on public access datasets. We are also grateful to all the researchers who reported their work in this review.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References
Abrol, A., Bhattarai, M., Fedorov, A., Du, Y., Plis, S., and Calhoun, V. (2020). Deep residual learning for neuroimaging: an application to predict progression to Alzheimer's disease. J. Neurosci. Methods 339, 108701. doi: 10.1016/j.jneumeth.2020.108701

Afzal, S., Maqsood, M., Nazir, F., Khan, U., Aadil, F., Awan, K. M., et al. (2019). A data augmentation-based framework to handle class imbalance problem for Alzheimer's stage detection. IEEE Access 7, 115528–115539. doi: 10.1109/ACCESS.2019.2932786

Ardalan, Z., and Subbian, V. (2022). Transfer learning approaches for neuroimaging analysis: a scoping review. Front. Artif. Intell. 5, 780405. doi: 10.3389/frai.2022.780405

Avants, B. B., Tustison, N., and Song, G. (2009). Advanced normalization tools (ANTs). Insight J. 2, 1–35. doi: 10.54294/uvnhin

Basaia, S., Agosta, F., Wagner, L., Canu, E., Magnani, G., Santangelo, R., et al. (2019). Automated classification of Alzheimer's disease and mild cognitive impairment using a single MRI and deep neural networks. NeuroImage Clin. 21, 101645. doi: 10.1016/j.nicl.2018.101645

Bi, X. A., Hu, X., Wu, H., and Wang, Y. (2020). Multimodal data analysis of Alzheimer's disease based on clustering evolutionary random forest. IEEE J. Biomed. Health Inform. 24, 2973–2983. doi: 10.1109/JBHI.2020.2973324

Chawla, N. V., Bowyer, K. W., Hall, L. O., and Kegelmeyer, W. P. (2002). SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357. doi: 10.1613/jair.953

Devlin, J., Chang, M. W., Lee, K., and Toutanova, K. (2019). "BERT: pre-training of deep bidirectional transformers for language understanding," in NAACL HLT 2019–2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies–Proceedings of the Conference, Vol. 1 (Minneapolis, MN), 4171–4186.

Ding, Y., Sohn, J. H., Kawczynski, M. G., Trivedi, H., Harnish, R., Jenkins, N. W., et al. (2019). A deep learning model to predict a diagnosis of Alzheimer disease by using 18F-FDG PET of the brain. Radiology 290, 456–464. doi: 10.1148/radiol.2018180958

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., et al. (2020). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. Vienna.

Elharrouss, O., Akbari, Y., Almaadeed, N., and Al-Maadeed, S. (2022). Backbones-review: feature extraction networks for deep learning and deep reinforcement learning approaches. arXiv preprint arXiv:2206.08016.


Ellis, K. A., Bush, A. I., Darby, D., Fazio, D. D., Foster, J., Hudson, P., et al. (2009). The Australian imaging, biomarkers and lifestyle (AIBL) study of aging: methodology and baseline characteristics of 1112 individuals recruited for a longitudinal study of Alzheimer's disease. Int. Psychogeriatr. 21, 672–687. doi: 10.1017/S1041610209009405

Farooq, A., Anwar, S., Awais, M., and Alnowami, M. (2017). "Artificial intelligence based smart diagnosis of Alzheimer's disease and mild cognitive impairment," in 2017 International Smart Cities Conference, ISC2 2017 (Institute of Electrical and Electronics Engineers Inc.). doi: 10.1109/ISC2.2017.8090871

Feng, W., Halm-Lutterodt, N. V., Tang, H., Mecum, A., Mesregah, M. K., Ma, Y., et al. (2020). Automated MRI-based deep learning model for detection of Alzheimer's disease process. Int. J. Neural Syst. 30, 2050032. doi: 10.1142/S012906572050032X

Fischl, B. (2012). FreeSurfer. NeuroImage 62, 774–781. doi: 10.1016/j.neuroimage.2012.01.021

He, K., Chen, X., Xie, S., Li, Y., Dollár, P., and Girshick, R. (2021). Masked Autoencoders are Scalable Vision Learners. New Orleans, LA: IEEE. doi: 10.1109/CVPR52688.2022.01553

He, K., Zhang, X., Ren, S., and Sun, J. (2016). "Deep residual learning for image recognition," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Las Vegas, NV: IEEE), 770–778. doi: 10.1109/CVPR.2016.90

Hosseini-Asl, E., Gimel'farb, G., and El-Baz, A. (2016a). Alzheimer's Disease Diagnostics by a Deeply Supervised Adaptable 3D Convolutional Network. Frontiers in Bioscience-Landmark.

Hosseini-Asl, E., Keynton, R., and El-Baz, A. (2016b). "Alzheimer's disease diagnostics by adaptation of 3D convolutional network," in Proceedings–International Conference on Image Processing, ICIP (Phoenix, AZ), 126–130. doi: 10.1109/ICIP.2016.7532332

Hu, S., Yu, W., Chen, Z., and Wang, S. (2020). "Medical image reconstruction using generative adversarial network for Alzheimer disease assessment with class-imbalance problem," in 2020 IEEE 6th International Conference on Computer and Communications (ICCC) (Chengdu: IEEE), 1323–1327. doi: 10.1109/ICCC51575.2020.9344912

Huang, G., Liu, Z., Maaten, L. V. D., and Weinberger, K. Q. (2017). "Densely connected convolutional networks," in Proceedings–30th IEEE Conference on Computer Vision and Pattern Recognition (Honolulu, HI), 2261–2269. doi: 10.1109/CVPR.2017.243

Islam, J., and Zhang, Y. (2018). Brain MRI analysis for Alzheimer's disease diagnosis using an ensemble system of deep convolutional neural networks. Brain Inform. 5, 359–369. doi: 10.1007/978-3-030-05587-5_34

Jack, C. R., Bernstein, M. A., Fox, N. C., Thompson, P., Alexander, G., Harvey, D., et al. (2008). The Alzheimer's disease neuroimaging initiative (ADNI): MRI methods. J. Magn. Reson. Imaging 27, 685–691. doi: 10.1002/jmri.21049

Jain, R., Jain, N., Aggarwal, A., and Hemanth, D. J. (2019). Convolutional neural network based Alzheimer's disease classification from magnetic resonance brain images. Cogn. Syst. Res. 57, 147–159. doi: 10.1016/j.cogsys.2018.12.015

Jenkinson, M., Beckmann, C. F., Behrens, T. E., Woolrich, M. W., and Smith, S. M. (2012). FSL. NeuroImage 62, 782–790. doi: 10.1016/j.neuroimage.2011.09.015

Khan, N. M., Abraham, N., and Hon, M. (2019). Transfer learning with intelligent training data selection for prediction of Alzheimer's disease. IEEE Access 7, 72726–72735. doi: 10.1109/ACCESS.2019.2920448

Khedher, L., Ramírez, J., Górriz, J. M., Brahim, A., and Segovia, F. (2015). Early diagnosis of Alzheimer's disease based on partial least squares, principal component analysis and support vector machine using segmented MRI images. Neurocomputing 151, 139–150. doi: 10.1016/j.neucom.2014.09.072

Korolev, S., Safiullin, A., Belyaev, M., and Dodonova, Y. (2017). "Residual and plain convolutional neural networks for 3D brain MRI classification," in Proceedings–International Symposium on Biomedical Imaging (Melbourne, VIC: IEEE), 835–838. doi: 10.1109/ISBI.2017.7950647

Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. Commun. ACM 60, 84–90. doi: 10.1145/3065386

LaMontagne, P. J., Benzinger, T. L. S., Morris, J. C., Keefe, S., Hornbeck, R., Xiong, C., et al. (2019). OASIS-3: longitudinal neuroimaging, clinical, and cognitive dataset for normal aging and Alzheimer disease. medRxiv. doi: 10.1101/2019.12.13.19014902

Lebedev, A. V., Westman, E., Westen, G. J. V., Kramberger, M. G., Lundervold, A., Aarsland, D., et al. (2014). Random forest ensembles for detection and prediction of Alzheimer's disease with a good between-cohort robustness. NeuroImage Clin. 6, 115–125. doi: 10.1016/j.nicl.2014.08.023

LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2323. doi: 10.1109/5.726791

Li, F., Tran, L., Thung, K. H., Ji, S., Shen, D., and Li, J. (2015). A robust deep model for improved classification of AD/MCI patients. IEEE J. Biomed. Health Inform. 19, 1610–1616. doi: 10.1109/JBHI.2015.2429556

Lian, C., Liu, M., Zhang, J., and Shen, D. (2020). Hierarchical fully convolutional network for joint atrophy localization and Alzheimer's disease diagnosis using structural MRI. IEEE Trans. Pattern Anal. Mach. Intell. 42, 880–893. doi: 10.1109/TPAMI.2018.2889096

Lim, B. Y., Lai, K. W., Haiskin, K., Kulathilake, K. A. S. H., Ong, Z. C., Hum, Y. C., et al. (2022). Deep learning model for prediction of progressive mild cognitive impairment to Alzheimer's disease using structural MRI. Front. Aging Neurosci. 14, 876202. doi: 10.3389/fnagi.2022.876202

Lin, W., Tong, T., Gao, Q., Guo, D., Du, X., Yang, Y., et al. (2018). Convolutional neural networks-based MRI image analysis for the Alzheimer's disease prediction from mild cognitive impairment. Front. Neurosci. 12, 777. doi: 10.3389/fnins.2018.00777

Liu, J., Li, M., Luo, Y., Yang, S., Li, W., and Bi, Y. (2021). Alzheimer's disease detection using depthwise separable convolutional neural networks. Comput. Methods Prog. Biomed. 203, 106417. doi: 10.1016/j.cmpb.2021.106417

Liu, M., Li, F., Yan, H., Wang, K., Ma, Y., Shen, L., et al. (2020). A multi-model deep convolutional neural network for automatic hippocampus segmentation and classification in Alzheimer's disease. NeuroImage 208, 116459. doi: 10.1016/j.neuroimage.2019.116459

Liu, M., Zhang, D., and Shen, D. (2016). Relationship induced multi-template learning for diagnosis of Alzheimer's disease and mild cognitive impairment. IEEE Trans. Med. Imaging 35, 1463–1474. doi: 10.1109/TMI.2016.2515021

Liu, M., Zhang, J., Adeli, E., and Shen, D. (2018). Landmark-based deep multi-instance learning for brain disease diagnosis. Med. Image Anal. 43, 157–168. doi: 10.1016/j.media.2017.10.005

Liu, S., Liu, S., Cai, W., Pujol, S., Kikinis, R., and Feng, D. (2014). "Early diagnosis of Alzheimer's disease with deep learning," in 2014 IEEE 11th International Symposium on Biomedical Imaging (Beijing: IEEE), 1015–1018. doi: 10.1109/ISBI.2014.6868045

Livingston, G., Huntley, J., Sommerlad, A., Ames, D., Ballard, C., Banerjee, S., et al. (2020). Dementia prevention, intervention, and care: 2020 report of the Lancet commission. Lancet 396, 413–446. doi: 10.1016/S0140-6736(20)30367-6

Malone, I. B., Cash, D., Ridgway, G. R., MacManus, D. G., Ourselin, S., Fox, N. C., et al. (2013). MIRIAD–public release of a multiple time point Alzheimer's MR imaging dataset. NeuroImage 70, 33–36. doi: 10.1016/j.neuroimage.2012.12.044

Marcus, D. S., Fotenos, A. F., Csernansky, J. G., Morris, J. C., and Buckner, R. L. (2010). Open access series of imaging studies: longitudinal MRI data in nondemented and demented older adults. J. Cogn. Neurosci. 22, 2677–2684. doi: 10.1162/jocn.2009.21407

Marcus, D. S., Wang, T. H., Parker, J., Csernansky, J. G., Morris, J. C., and Buckner, R. L. (2007). Open access series of imaging studies (OASIS): cross-sectional MRI data in young, middle aged, nondemented, and demented older adults. J. Cogn. Neurosci. 19, 1498–1507. doi: 10.1162/jocn.2007.19.9.1498

Moradi, E., Pepe, A., Gaser, C., Huttunen, H., and Tohka, J. (2015). Machine learning framework for early MRI-based Alzheimer's conversion prediction in MCI subjects. NeuroImage 104, 398–412. doi: 10.1016/j.neuroimage.2014.10.002

Murugan, S., Venkatesan, C., Sumithra, M. G., Gao, X. Z., Elakkiya, B., Akila, M., et al. (2021). DEMNET: a deep learning model for early diagnosis of Alzheimer diseases and dementia from MR images. IEEE Access 9, 90319–90329. doi: 10.1109/ACCESS.2021.3090474

Nichols, E., Steinmetz, J. D., Vollset, S. E., Fukutaki, K., Chalek, J., Abd-Allah, F., et al. (2022). Estimation of the global prevalence of dementia in 2019 and forecasted prevalence in 2050: an analysis for the global burden of disease study 2019. Lancet Public Health 7, e105–e125. doi: 10.1016/S2468-2667(21)00249-8

Odusami, M., Maskeliūnas, R., Damaševičius, R., and Krilavičius, T. (2021). Analysis of features of Alzheimer's disease: detection of early stage from functional brain changes in magnetic resonance images using a finetuned ResNet18 network. Diagnostics 11, 1071. doi: 10.3390/diagnostics11061071

Page, M. J., McKenzie, J. E., Bossuyt, P. M., Boutron, I., Hoffmann, T. C., Mulrow, C. D., et al. (2021). The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 372:105906. doi: 10.31222/osf.io/v7gm2

Payan, A., and Montana, G. (2015). "Predicting Alzheimer's disease: a neuroimaging study with 3D convolutional neural networks," in ICPRAM 2015–4th International Conference on Pattern Recognition Applications and Methods, Vol. 2 (Lisbon), 355–362.

Poloni, K. M., and Ferrari, R. J. (2022). Automated detection, selection and classification of hippocampal landmark points for the diagnosis of Alzheimer's disease. Comput. Methods Prog. Biomed. 214, 106581. doi: 10.1016/j.cmpb.2021.106581

Qiu, S., Joshi, P. S., Miller, M. I., Xue, C., Zhou, X., Karjadi, C., et al. (2020). Development and validation of an interpretable deep learning framework for Alzheimer's disease classification. Brain 143, 1920–1933. doi: 10.1093/brain/awaa137

Sarraf, S., DeSouza, D. D., Anderson, J., Tofighi, G., and the Alzheimer's Disease Neuroimaging Initiative (2017). DeepAD: Alzheimer's disease classification via deep convolutional neural networks using MRI and fMRI. bioRxiv 70441. doi: 10.1101/070441

Shi, J., Zheng, X., Li, Y., Zhang, Q., and Ying, S. (2018). Multimodal neuroimaging feature learning with multimodal stacked deep polynomial networks for diagnosis of Alzheimer's disease. IEEE J. Biomed. Health Inform. 22, 173–183. doi: 10.1109/JBHI.2017.2655720

Simonyan, K., and Zisserman, A. (2015). "Very deep convolutional networks for large-scale image recognition," in 3rd International Conference on Learning Representations, ICLR 2015–Conference Track Proceedings (San Diego, CA), 1–14.

Suk, H.-I., Lee, S.-W., and Shen, D. (2015). Latent feature representation with stacked auto-encoder for AD/MCI diagnosis. Brain Struct. Funct. 220, 841–859. doi: 10.1007/s00429-013-0687-3

Suk, H. I., Lee, S.-W., Shen, D., and the Alzheimer's Disease Neuroimaging Initiative (2016a). Deep sparse multi-task learning for feature selection in Alzheimer's disease diagnosis. Brain Struct. Funct. 221, 2569–2587. doi: 10.1007/s00429-015-1059-y


Suk, H. I., Lee, S. W., and Shen, D. (2014). Hierarchical feature representation and multimodal fusion with deep learning for AD/MCI diagnosis. NeuroImage 101, 569–582. doi: 10.1016/j.neuroimage.2014.06.077

Suk, H. I., Lee, S. W., and Shen, D. (2017). Deep ensemble learning of sparse regression models for brain disease diagnosis. Med. Image Anal. 37, 101–113. doi: 10.1016/j.media.2017.01.008

Suk, H. I., and Shen, D. (2013). Deep learning-based feature representation for AD/MCI classification. Med. Image Comput. Comput. Assist. Interv. 16 (Pt. 2), 583–590. doi: 10.1007/978-3-642-40763-5_72

Suk, H. I., Wee, C. Y., Lee, S. W., and Shen, D. (2016b). State-space model with deep learning for functional dynamics estimation in resting-state fMRI. NeuroImage 129, 292–307. doi: 10.1016/j.neuroimage.2016.01.005

Szegedy, C., Ioffe, S., Vanhoucke, V., and Alemi, A. A. (2017). "Inception-v4, inception-ResNet and the impact of residual connections on learning," in 31st AAAI Conference on Artificial Intelligence, AAAI (San Francisco, CA), 4278–4284. doi: 10.1609/aaai.v31i1.11231

Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., et al. (2015). "Going deeper with convolutions," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Boston, MA: IEEE), 1–9. doi: 10.1109/CVPR.2015.7298594

Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. (2016). "Rethinking the inception architecture for computer vision," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Las Vegas, NV: IEEE), 2818–2826. doi: 10.1109/CVPR.2016.308

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., et al. (2017). "Attention is all you need," in Advances in Neural Information Processing Systems, eds I. Guyon, R. Fergus, H. Wallach, S. V. N. Vishwananthan, U. von Luxburg, R. Garnett, and S. Bengio (Long Beach, CA: Neural Information Processing Systems Foundation).

Wang, H., Shen, Y., Wang, S., Xiao, T., Deng, L., Wang, X., et al. (2019). Ensemble of 3D densely connected convolutional network for diagnosis of mild cognitive impairment and Alzheimer's disease. Neurocomputing 333, 145–156. doi: 10.1016/j.neucom.2018.12.018

Wang, S., Wang, H., Shen, Y., and Wang, X. (2018). "Automatic recognition of mild cognitive impairment and Alzheimers disease using ensemble based 3D densely connected convolutional networks," in 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA) (Orlando, FL), 517–523. doi: 10.1109/ICMLA.2018.00083

Wang, S. H., Phillips, P., Sui, Y., Liu, B., Yang, M., and Cheng, H. (2018). Classification of Alzheimer's disease based on eight-layer convolutional neural network with leaky rectified linear unit and max pooling. J. Med. Syst. 42, 85. doi: 10.1007/s10916-018-0932-7

Wen, J., Thibeau-Sutre, E., and Diaz-Melo, M. (2020). Convolutional neural networks for classification of Alzheimer's disease: overview and reproducible evaluation. Med. Image Anal. 63, 101694. doi: 10.1016/j.media.2020.101694

Yang, Z., and Liu, Z. (2020). The risk prediction of Alzheimer's disease based on the deep learning model of brain 18F-FDG positron emission tomography. Saudi J. Biol. Sci. 27, 659–665. doi: 10.1016/j.sjbs.2019.12.004

Zhang, J., Zheng, B., Gao, A., Feng, X., Liang, D., and Long, X. (2021). A 3D densely connected convolution neural network with connection-wise attention mechanism for Alzheimer's disease classification. Magn. Reson. Imaging 78, 119–126. doi: 10.1016/j.mri.2021.02.001

Zhao, X., Ang, C. K. E., Acharya, U. R., and Cheong, K. H. (2021). Application of artificial intelligence techniques for the detection of Alzheimer's disease using structural MRI images. Biocybern. Biomed. Eng. 41, 456–473. doi: 10.1016/j.bbe.2021.02.006
