Seminar Report Vivek R
A SEMINAR REPORT
submitted by
VIVEK R
CEM20CS041
to
Bachelor of Technology
in
Computer Science and Engineering
Thiruvananthapuram
DECEMBER 2023
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
VISION
MISSION
Provide learning ambience through strong theoretical and practical background with
an emphasis on software development.
Establish industry interaction programmes to enhance the technical-knowhow at par
with the current trends in the industry and to promote the entrepreneurship skills.
Promote research-based projects in the emerging areas of Computer Science and
Engineering.
Empowering the youth in rural communities with computer education.
Inculcating professional behaviour, leadership qualities, team spirit, skills on problem
solving, critical thinking, and ethical responsibilities.
CERTIFICATE
This is to certify that the report entitled “Using Verb Fluency, Natural Language Processing,
and Machine Learning to Detect Alzheimer’s Disease” submitted by Vivek R to the APJ
Abdul Kalam Technological University in partial fulfillment of the requirements for the
award of the Degree of Bachelor of Technology in Computer Science and Engineering is a
bonafide record of the seminar work carried out by him/her under my guidance and
supervision. This report in any form has not been submitted to any other University or
Institute for any purpose.
Mrs. Bindhu J S
Assistant Professor
Dept of CSE
ACKNOWLEDGEMENT
I take this opportunity to express my deep sense of gratitude and sincere thanks to all who
helped me to complete the seminar successfully.
Firstly, I would like to express my sincere gratitude to my guide Mrs. Surya S R for the
guidance and valuable comments.
I am also extremely thankful to my seminar coordinator Mrs. Devi Dath for giving her moral
support, positive criticism and cooperation.
I also express my gratitude to Mrs. Bindhu J S, Head of Computer Science and Engineering
Department for her support and cooperation.
Finally, I thank all other staff members, my parents and friends for their help and motivation.
Vivek R
ABSTRACT
Alzheimer’s disease (AD) causes significant impairments in memory and other cognitive
domains. As there is no cure for the disease yet, early detection and delay of disease
progression are critical for management of AD. Verbal fluency is one of the most common
and sensitive neuropsychological methods used for detection and evaluation of cognitive
decline in AD, in which a subject is required to name as many items as possible in 30 or 60
seconds that belong to a certain category. In this study, we develop an approach to detect AD
using a verb fluency (VF) task, a specific subset of verbal fluency analyzing the subjects’
listing of verbs in a given time period. We use machine learning techniques including random
forest (RF), neural network (NN), recurrent NN (RNN), and natural language processing
(NLP) to detect the risk of AD. The results show that the developed models can stratify
subjects into the corresponding AD and control groups with up to 76% accuracy using RF,
but at a cost of having to preprocess the data. This accuracy is slightly lower, but not
significantly, at 67% using RNN and NLP, which involves almost no manual preprocessing
of the data. This study opens up a powerful approach of using simple VF tasks for early
detection of AD.
CONTENTS
ACKNOWLEDGEMENT
ABSTRACT
LIST OF FIGURES
1. INTRODUCTION
1.1 OBJECTIVE
1.2 MOTIVATION
2. LITERATURE SURVEY
2.1 Detecting Japanese Patients with Alzheimer’s Disease based on Word Category Frequencies
2.2 Deep learning to detect Alzheimer's disease from neuroimaging: A systematic literature review
2.3 Early-Stage Alzheimer's Disease Prediction Using Machine Learning Models
3. PROPOSED METHOD
3.2 Models
REFERENCES
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1
INTRODUCTION
Alzheimer’s disease (AD) is the leading cause of dementia, accounting for 60-80 percent of
cases. Dementia generally refers to a patient’s decline in memory and cognitive skills such as
their ability to reason, think, or speak clearly. AD is a degenerative brain disease that
originates from damage to brain cells. While no cure for AD currently exists, earlier detection
of the disease means earlier intervention and more effective care. Despite the growing
number of cases of AD, only about a quarter of patients are typically diagnosed.
Worse yet, the mortality rate of AD in the United States increased significantly between
2000 and 2018, from 17.6 to 37.3 deaths per 100,000 population.
A large body of research is dedicated to studying the use of language tasks in improving
early detection of AD. This research has shown promise as AD results in cognitive
impairment and typically has negative implications on how patients produce or use language.
In general, past studies have covered recording a patient’s speech over a period of time and
analyzing the number and types of words they produce to detect AD. Such an approach to
AD detection is promising because there is generally no need for expensive equipment or
invasive procedures, and the data collection and analysis can be done even remotely.
However, existing works on detecting AD from recorded speech data generally use time-intensive
tasks, such as open-ended interviews with clinicians.
We analyze data from a verb fluency (VF) task to detect AD, relying simply on the way
patients list verbs in bursts of 30 seconds. Although verbal fluency, e.g., semantic fluency
and phonemic fluency, has been commonly used to detect AD, analyzing the listing of verbs
is much less explored. This task of listing verbs has the potential to simplify the evaluation
process and can be more readily transferable and generalizable to a large array of languages.
Therefore, in this study, we aim to leverage machine learning (ML) and natural language
processing (NLP) along with VF for early AD detection. ML is a branch of artificial
intelligence that allows for eliciting patterns from the data. It can draw associations between a
set of input variables (e.g., the choice of verbs, the pattern with which they are produced, etc.)
and output (response) variables (e.g., at risk for AD or not). NLP is a field at the intersection
of artificial intelligence and linguistics that is concerned with the interactions between human
(natural) language and computers. Both ML and NLP, either separately or jointly, have been
used in health care applications to much success, e.g., to detect or predict various outcomes
or risks for patients using electronic medical records.
In this study, we develop an approach to detect AD using the data from a 30-second VF task.
First, we develop ML models that detect AD using psycholinguistic features of the input
verbs, extracted by experts from the VF task data. We specifically develop random forest
(RF) and neural networks (NN) models. Next, we leverage NLP and ML jointly to develop an
end-to-end ML pipeline. That is, we use NLP on the concatenated text string of verbs from
subjects to elicit information. We then use this elicited information along with the (raw)
sequence of verbs produced in a recurrent neural network (RNN) model to detect AD.
1.1 OBJECTIVE
Find individuals at risk of Alzheimer's disease before it worsens, allowing timely
support.
Evaluate memory and cognitive issues using verbal fluency tasks.
Employ advanced technology like machine learning and NLP to develop accurate AD
prediction models.
Classify individuals into AD or control groups based on their performance in VF
tasks.
Strive for high detection accuracy by comparing various machine learning models and
data preprocessing methods.
Explore the feasibility of using straightforward VF tasks for affordable and accessible
early AD detection.
1.2 MOTIVATION
Research on Alzheimer's disease detection is strongly motivated. At its
core, it is about improving the lives of individuals and families affected by this devastating
disease. Early detection, which this work seeks to advance, can offer more time
and better care, with a real impact on the quality of life of those at risk. This
gives the research a sense of purpose and urgency. The hope that research instills
is another powerful motivator. Each step forward, every breakthrough, and every experiment
conducted brings us closer to potential treatments or interventions that can change the course
of Alzheimer's disease. The possibility of contributing to this effort is itself a source of
purpose and optimism.
Several key motivators drive research in this area:
Improving Lives: The fundamental goal of Alzheimer's disease research is to
improve the lives of those afflicted by this condition. Early detection, which this work
seeks to advance, offers more time and better care. This can make a
significant difference in the quality of life for individuals at risk of AD and provide
support for their families.
Sense of Purpose: The potential to positively influence the trajectory of the disease
gives this research a strong sense of purpose. Work in this area can contribute to
early interventions, improved treatments, and enhanced patient care.
Hope and Optimism: Each step forward, every breakthrough, and every experiment
conducted brings us closer to potential treatments or interventions that can change
the course of Alzheimer's disease.
1.3 PROBLEM STATEMENT
Alzheimer's disease (AD) represents a growing global health crisis with no cure. Early
detection is essential for effective management, but current methods often involve complex
and costly procedures. The need for a non-invasive, accessible, and accurate detection tool is
evident. This research addresses the pressing problem of late diagnosis of Alzheimer's
disease by exploring the potential of verb fluency tasks combined with machine learning and natural
language processing. The current diagnostic process is often resource-intensive and time-consuming,
and may not yield timely results, hindering early intervention and support for
affected individuals.
Alzheimer's disease (AD) is a global health crisis characterized by its growing prevalence,
lack of a cure, and profound impact on individuals and society. One of the most pressing
challenges in AD management is early detection. The earlier AD is identified, the more
effective interventions and support can be provided to patients. However, the current
diagnostic methods for AD often involve complex and costly procedures, making them less
accessible and efficient for timely detection.
Lack of Timely Results: The results of current diagnostic procedures may not be
available in a timely manner. Delays in obtaining diagnosis can lead to missed
opportunities for early intervention and support.
CHAPTER 2
LITERATURE SURVEY
2.1 Detecting Japanese Patients with Alzheimer’s Disease based on Word Category Frequencies
This study focuses on the early detection of Alzheimer's disease (AD) through the analysis of
spoken language using Natural Language Processing (NLP). While previous research has
explored aspects like vocabulary size, grammatical complexity, and fluency, the content
analysis of narratives of AD patients remains a challenge in NLP. To investigate this, the
researchers recruited 18 participants aged between 53 and 90, with an average age of 76.89.
These participants were divided into two groups based on their Mini-Mental State
Examination (MMSE) scores, a commonly used test to assess cognitive function. The AD
group consisted of 9 participants with MMSE scores of 21 or lower, indicating cognitive
impairment, while the healthy control group included 9 participants with scores of 22 or
higher, indicating normal cognitive function. The researchers used Linguistic Inquiry and
Word Count (LIWC), a text analysis software, to categorize the words used by the
participants. LIWC is designed to identify linguistic patterns and psychological processes in
written or spoken text. They also measured the word frequency through observation. The key
finding of this study is the significant difference observed in the usage of impersonal
pronouns in the AD group compared to the control group. Impersonal pronouns, such as "it,"
"they," or "them," are often used to refer to objects or people in a non-specific or indirect
way. This suggests that individuals with AD may exhibit distinct linguistic patterns, such as a
higher usage of impersonal pronouns, in their spoken language.
This research is centered on the early detection of Alzheimer's Disease (AD) by employing
Natural Language Processing (NLP) to analyze spoken language. Previous investigations in
this area have explored linguistic aspects like vocabulary size, grammatical complexity, and
fluency, but the analysis of narrative content in AD patients remains a challenging endeavor
for NLP.
Participant Recruitment: For the study, 18 participants were recruited, ranging in age from 53
to 90, with an average age of 76.89. These participants were divided into two groups based
on their Mini-Mental State Examination (MMSE) scores, a widely used test for assessing
cognitive function. The AD group comprised 9 participants with MMSE scores of 21 or
lower, indicating cognitive impairment associated with AD. In contrast, the healthy control
group included 9 participants with MMSE scores of 22 or higher, indicating normal cognitive
function.
Key Finding: The study's most significant discovery is the notable difference in the usage of
impersonal pronouns between the AD group and the control group. Impersonal pronouns,
such as "it," "they," or "them," are used to refer to objects or people in a non-specific or
indirect manner. The increased usage of impersonal pronouns in the AD group suggests that
individuals with AD exhibit distinctive linguistic patterns in their spoken language. This
specific finding underscores the potential of NLP in uncovering linguistic markers or cues
that may aid in the early detection of AD, providing valuable insights into the cognitive
changes associated with the disease.
2.2 Deep learning to detect Alzheimer's disease from neuroimaging: A systematic literature review
Alzheimer's Disease (AD) is one of the leading causes of death in developed countries. From
a research point of view, impressive results have been reported using computer-aided
algorithms, but clinically no practical diagnostic method is available. In recent years, deep
models have become popular, especially in dealing with images. Since 2013, deep learning
has begun to gain considerable attention in AD detection research, with the number of
published papers in this area increasing drastically since 2017. Deep models have been
reported to be more accurate for AD detection compared to general machine learning
techniques. Nevertheless, AD detection is still challenging, and for classification, it requires a
highly discriminative feature representation to separate similar brain patterns.
This paper reviews the current state of AD detection using deep learning. Through a
systematic literature review of over 100 articles, the authors set out the most recent findings and
trends. Specifically, they review useful biomarkers and features (personal information, genetic
data, and brain scans), the necessary pre-processing steps, and different ways of dealing with
neuroimaging data originating from single-modality and multi-modality studies. Deep models
and their performance are described in detail. Although deep learning has achieved notable
performance in detecting AD, there are several limitations, especially regarding the
availability of datasets and training procedures.
Emergence of Deep Learning: In recent years, deep learning, a subset of machine learning,
has gained prominence in various domains, particularly for tasks involving images. The
adoption of deep learning in AD detection research began around 2013, with a substantial
increase in related research papers since 2017. Deep learning models have shown promise in
achieving higher accuracy compared to traditional machine learning techniques.
Review Objectives: This paper aims to provide a comprehensive review of the current state of
AD detection using deep learning. The review is based on a systematic analysis of over 100
research articles. The paper focuses on the most recent findings and emerging trends in the
field. Specifically, it covers the following key aspects:
Biomarkers and Features: The review examines the biomarkers and features used for AD
detection. These include personal information, genetic data, and brain scans, all of which can
provide valuable insights into an individual's risk of developing AD.
Data Preprocessing: The paper discusses the preprocessing steps necessary to prepare data for
deep learning models. This includes data cleaning, normalization, and other procedures that
enhance the quality and consistency of the data.
Neuroimaging Data: The review explores the use of neuroimaging data in AD detection,
considering data from both single-modality and multi-modality studies. Neuroimaging data,
such as MRI scans, offer critical information about brain structure and function that can aid in
AD diagnosis.
Deep Models: The paper delves into deep learning models in detail, providing insights into
the specific models used in AD detection and their performance.
2.3 Early-Stage Alzheimer's Disease Prediction Using Machine Learning Models
Alzheimer's disease (AD) is the leading cause of dementia in older adults. There is currently
considerable interest in applying machine learning to detect diseases such as Alzheimer's
and diabetes, which affect large populations around the world. Their incidence rates
are increasing at an alarming rate every year. In Alzheimer's disease, the brain is affected by
neurodegenerative changes. As our aging population increases, more and more individuals,
their families, and healthcare will experience diseases that affect memory and functioning.
These effects will be profound on the social, financial, and economic fronts. In its early
stages, Alzheimer's disease is hard to predict. Treatment given at an early stage of AD is
more effective and causes less damage than treatment given at a later stage.
Several techniques such as Decision Tree, Random Forest, Support Vector Machine, Gradient
Boosting, and Voting classifiers have been employed to identify the best parameters for
Alzheimer's disease prediction. Predictions of Alzheimer's disease are based on Open Access
Series of Imaging Studies (OASIS) data, and performance is measured with parameters like
Precision, Recall, Accuracy, and F1-score for ML models. The proposed classification
scheme can be used by clinicians to support diagnosis of these diseases. Early diagnosis with
these ML algorithms may help lower the annual mortality rate of Alzheimer's disease.
The proposed work reports a best average validation accuracy
of 83% on the test data of AD. This test accuracy score is significantly higher in comparison
with existing works.
Alzheimer's disease (AD) is a prevalent neurodegenerative condition and the leading cause of
dementia among older adults. AD has garnered significant attention in the field of machine
learning due to its increasing incidence rates and profound impact on individuals, families,
and healthcare systems. As the aging population continues to grow, the societal, financial,
and economic burden of AD is escalating. Early diagnosis of AD is crucial because treatment
interventions are more effective and less damaging when administered in the disease's early
stages. Therefore, machine learning techniques are being harnessed to improve prediction
accuracy for AD.
Machine Learning Approaches: The study employs a range of machine learning techniques,
including Decision Trees, Random Forest, Support Vector Machine, Gradient Boosting, and
Voting classifiers. These algorithms are utilized to identify the most effective parameters for
predicting Alzheimer's disease. The research leverages the Open Access Series of Imaging
Studies (OASIS) dataset, which contains valuable imaging data. Evaluation metrics such as
Precision, Recall, Accuracy, and F1-score are used to measure the performance of these
machine learning models.
Outcomes: The study reports promising results, with the best validation average accuracy
reaching 83% on the AD test data. This is a significant achievement compared to existing
works, suggesting that the machine learning models applied in this research can make more
accurate predictions of Alzheimer's disease. Higher accuracy rates in AD prediction are
crucial for early intervention and improved patient care.
In summary, this research emphasizes the importance of using machine learning techniques
to predict Alzheimer's disease. The study's results showcase the potential for more accurate
and early diagnoses of AD, which can have substantial benefits for both patients and
healthcare providers, potentially reducing the impact of this devastating disease on
individuals and society as a whole.
In this research, the primary objective is to develop a novel approach for identifying
Alzheimer's Disease (AD) stages in patients using mobility data recorded
via smartphones. A cohort of 35 AD patients at a daycare center wore smartphones, which
collected data over the span of a week. The data was labeled to reflect
the stage of the disease, categorizing each patient as being in the early, middle, or late
stage of AD. The researchers used a Convolutional Neural Network (CNN)
model to process these time series datasets.
The outcomes of this study are highly promising, with the CNN-based method achieving an
impressive accuracy rate of 90.91% and an F1-score of 0.897. These results, in comparison to
the performance of conventional feature-based classifiers, underscore the remarkable
potential of deep learning techniques, especially CNNs, in the realm of Alzheimer's Disease
diagnostics. This innovative method reveals the substantial value of mobility data in both the
treatment and comprehension of the disease, offering not only advanced disease monitoring
capabilities but also insights into the evolution of AD.
Promising Outcomes: The study has yielded highly promising results, with the CNN-based
method achieving an impressive accuracy rate of 90.91% and an F1-score of 0.897. These
results, when compared to the performance of conventional feature-based classifiers,
underscore the remarkable potential of deep learning techniques, particularly CNNs, in the
field of AD diagnostics. The CNN-based approach not only demonstrates the feasibility of
using mobility data for advanced disease monitoring but also offers valuable insights into the
progression of AD.
Clinical Implications: The CNN-based approach introduced in this research has significant
implications for clinical practice and the management of AD. It provides healthcare
professionals with a more precise and efficient tool for assessing AD patients and delivering
tailored care. By accurately identifying the different stages of AD, this method enables timely
interventions and personalized treatments, ultimately contributing to enhanced patient
outcomes and improved healthcare practices in the context of Alzheimer's Disease.
Enhanced Quality of Life: The potential impact of this research extends beyond the clinical
setting. It holds the promise of improving the quality of life for individuals affected by AD.
Through early and accurate diagnosis, it opens doors to interventions and treatments that can
significantly enhance the well-being of AD patients. Furthermore, it contributes to our
understanding of the disease's progression and evolution.
In summary, this research introduces an innovative approach for identifying different stages
of Alzheimer's Disease using mobility data and deep learning techniques. The impressive
results and potential applications emphasize its importance in advancing AD diagnostics,
healthcare practices, and the overall quality of life for individuals living with this challenging
condition.
CHAPTER 3
PROPOSED SYSTEM
The subject cohort includes a total of 20 AD patients (mean age = 77.85 years) and 25 age-
matched controls (mean age = 72.68 years). Each subject is asked to say as many verbs as
possible in a 30-second block. The responses are recorded verbatim. The study protocol was
reviewed and approved by The University of Tennessee Health Science Center. The data are
analyzed by subject matter experts to extract psycholinguistic properties. The analysis is
performed to elicit properties pertaining to VF responses of individuals with amnestic AD [1]
and cognitively healthy older adults. Specifically, the English Lexicon Project, a multi-university
effort to provide a standardized behavioral and descriptive data set for 40,481
words and 40,481 non-words, is used for the psycholinguistic analysis. To extract
psycholinguistic properties, the root forms of the verbs are used.
The properties extracted include: total number of correctly produced words, length of the
word, the number of phonological neighbors that a word has, the number of orthographic
neighbors that a word has, how pleasant a word is, the extent to which the word denotes
something that is weak or strong, number of phonemes in the pronunciation, word frequency,
and the age of acquisition of the word.
In this study, the research population comprises a total of 20 patients diagnosed with
Alzheimer's disease (AD)[1], with a mean age of 77.85 years, and 25 age-matched control
subjects, whose average age is 72.68 years. The participants are tasked with a specific verbal
fluency (VF) task, where they are required to spontaneously generate as many verbs as
possible within a 30-second timeframe. Their responses are meticulously recorded verbatim
during this task.
To ensure the ethical and regulatory standards of research, the study protocol has been
thoroughly reviewed and officially approved by The University of Tennessee Health Science
Center. Subsequently, the collected data undergoes a rigorous analysis process, guided by
subject matter experts, aimed at extracting various psycholinguistic properties associated with
the verbal fluency responses of individuals with amnestic Alzheimer's Disease and
cognitively healthy older adults.
For the psycholinguistic analysis, the researchers draw upon the English Lexicon Project,
a collaborative effort across multiple universities that provides a standardized dataset
of 40,481 words and 40,481 non-words, each documented with various linguistic and
behavioral properties. The extracted properties are:
Total Number of Correctly Produced Words: This metric quantifies the total count of
accurately generated verbs within the given 30-second time frame.
Word Length: It captures the number of letters or characters in each word, reflecting
word complexity.
Phonological Neighbors: This property counts the number of words that sound similar or
share phonetic characteristics with the generated word.
Orthographic Neighbors: It quantifies the number of words that share similar spelling
with the generated word.
Word Pleasantness: This indicates the emotional connotation of the word, reflecting
whether it is considered pleasant or not by individuals.
Word Strength: It gauges the extent to which a word denotes something that is perceived
as weak or strong.
Number of Phonemes: This represents the count of individual sounds or phonemes in the
pronunciation of a word.
Word Frequency: This metric reflects how often the word is typically used in the
language.
Age of Acquisition: This reflects the age at which the word is typically learned.
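To make the neighbor counts in the list above concrete, the following is a minimal Python sketch of an orthographic-neighbor count. It assumes the common definition in which a neighbor is a word of the same length that differs by exactly one substituted letter; the report does not state the exact definition used. The function name `orthographic_neighbors` and the toy lexicon are hypothetical stand-ins for the English Lexicon Project data.

```python
import string


def orthographic_neighbors(word, lexicon):
    """Return lexicon words formed by substituting exactly one letter of `word`."""
    word = word.lower()
    neighbors = set()
    for i in range(len(word)):
        for letter in string.ascii_lowercase:
            if letter == word[i]:
                continue  # same letter, so not a substitution
            candidate = word[:i] + letter + word[i + 1:]
            if candidate in lexicon:
                neighbors.add(candidate)
    return sorted(neighbors)


# Toy lexicon for illustration only (an assumption, not the study's word list).
TOY_LEXICON = {"walk", "talk", "wall", "balk", "jog", "log", "jot", "run"}

walk_neighbors = orthographic_neighbors("walk", TOY_LEXICON)
jog_neighbors = orthographic_neighbors("jog", TOY_LEXICON)
print(len(walk_neighbors), walk_neighbors)
print(len(jog_neighbors), jog_neighbors)
```

In the study such values are presumably looked up in the lexicon resource rather than recomputed; the sketch only illustrates what the count measures.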
Two types of ML models were developed in this study. The first type relied on features
extracted from the psycholinguistic properties. Specifically, we calculated the average,
standard deviation, and range of each of the psycholinguistic properties reported for any
given subject. This resulted in 60 initial features. We then used these features to develop two
ML models, namely RF and NN. RF is an ensemble classifier that uses a large number of
decision trees, each fitted on a randomly selected subset of the data, for classification. RF is
generally highly robust against overfitting. For the RF model, based on preliminary results
using out-of-bag (OOB) error, 100 trees were included in the model.
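The feature construction described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the property names and values are hypothetical, and the use of the sample standard deviation is an assumption, since the report does not say which estimator was used.

```python
import statistics


def summarize_property(values):
    """Return (mean, standard deviation, range) for one property of one subject."""
    mean = statistics.fmean(values)
    # Sample SD needs at least two values; fall back to 0.0 for a single verb.
    sd = statistics.stdev(values) if len(values) > 1 else 0.0
    value_range = max(values) - min(values)
    return mean, sd, value_range


def subject_features(property_values):
    """Flatten per-property summaries into one feature dict for a subject."""
    features = {}
    for name, values in property_values.items():
        mean, sd, value_range = summarize_property(values)
        features[f"{name}_mean"] = mean
        features[f"{name}_sd"] = sd
        features[f"{name}_range"] = value_range
    return features


# Hypothetical values for three verbs produced by one subject.
example = {
    "length": [3, 4, 5],      # letters per verb
    "n_phonemes": [3, 3, 4],  # phonemes per verb
}
features = subject_features(example)
for key, value in features.items():
    print(key, round(value, 3))
```

Each property contributes three features per subject, so the resulting table has one row per subject and three columns per property, which is the form a random forest or neural network classifier expects.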
In addition, we used NNs, another non-linear learning model for classification. NNs
pass information from an input layer through a hidden layer to an output layer.
For the NN model, one hidden layer with 16 hidden nodes was used. The activation function
was set to rectified linear unit (ReLU). Also, the learning rate was set to 0.001. Adam
optimizer was used for model training. Lastly, features were normalized before feeding them
into the model.
Figure 3.1 Example 2D word embedding space, where similar words are closer together
Two types of machine learning (ML) models were developed to analyze the data. The first
type of model utilized features extracted from psycholinguistic properties, which are
linguistic and cognitive characteristics of the subjects. Specifically, the researchers calculated
the average, standard deviation, and range for each of these psycholinguistic properties for
each subject. This process resulted in the creation of 60 initial features that describe the
linguistic and cognitive profiles of the study participants.
The first ML model developed using these features is known as a Random Forest (RF). RF is
an ensemble classifier that operates by employing a large number of decision trees, each of
which is trained on a randomly selected subset of the dataset. This ensemble approach helps
to enhance the model's robustness and reduce the risk of overfitting, a common problem in
machine learning. The researchers determined that, based on preliminary results and the out-
of-bag (OOB) error, it was most effective to use 100 trees in the RF model.
The second ML model created in the study is a Neural Network (NN). Neural networks are
non-linear models loosely inspired by the structure of the brain. They pass
information from an input layer, through one or more hidden layers, and then produce an
output. In this case, the researchers employed a NN with one hidden layer consisting of 16
hidden nodes. The activation function for this model was set to the rectified linear unit
(ReLU). They used an Adam optimizer for training the model, and the learning rate was set at
0.001. Additionally, before being input into the model, the features were normalized to
ensure that they had a consistent scale.
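The normalization step and the forward pass of such a network can be sketched in plain Python as below. This is an illustration under stated assumptions, not the study's Keras model: the weights are random and untrained, the single sigmoid output unit is an assumption (the report does not name the output activation), z-score scaling is assumed as the normalization method, and the input rows are made-up numbers.

```python
import math
import random


def zscore_columns(rows):
    """Standardize each feature column to zero mean and unit variance."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    # A constant column has zero spread; use 1.0 to avoid division by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in col) / n) or 1.0
            for col, m in zip(columns, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(features, w_hidden, b_hidden, w_out, b_out):
    """One hidden ReLU layer followed by a single sigmoid output unit."""
    hidden = [relu(sum(w * x for w, x in zip(weights, features)) + b)
              for weights, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


N_FEATURES, N_HIDDEN = 15, 16  # 15 selected features (see report), 16 hidden nodes
rng = random.Random(0)
# Random, untrained weights; in the report they are learned with Adam (lr = 0.001).
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(N_FEATURES)]
            for _ in range(N_HIDDEN)]
b_hidden = [0.0] * N_HIDDEN
w_out = [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)]
b_out = 0.0

# Four hypothetical subjects with made-up feature values.
raw = [[rng.gauss(10, 3) for _ in range(N_FEATURES)] for _ in range(4)]
normalized = zscore_columns(raw)
probabilities = [forward(row, w_hidden, b_hidden, w_out, b_out)
                 for row in normalized]
print([round(p, 3) for p in probabilities])
```

Normalizing first keeps properties measured on very different scales, such as word frequency and word length, from dominating the weighted sums in the hidden layer.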
The second type of ML model did not rely on features extracted from the psycholinguistic
properties. Specifically, we developed an RNN directly from the verbatim responses. This
involved using the concatenated string of verbal responses for any given subject, plus the
corresponding word embeddings obtained from NLP. In particular, we used word
embeddings to convert the words into vectors, allowing the RNN to form a relationship
between different verbs produced by subjects. Figure 3.1 provides an example of how this
relationship is established in a two-dimensional word embedding space. As seen in the figure, the
word ‘walk’, for example, is closer to the word ‘jog’ than to the word ‘laugh.’ Therefore, words
that are closer in meaning (or are related in some way) have more similar vector
representations. The RNN includes one hidden layer with 50 hidden nodes. The activation
function was set to ‘sigmoid’. Adam optimizer was used for model training. The learning rate
was again set to 0.001. All models are developed in Python. For ML models, we use Keras
with the TensorFlow backend. In addition, we use pre-trained, 300-dimension word
embeddings from the spaCy package, which are trained on a corpus of web page data.
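The notion that related verbs have similar vectors can be made concrete with a small sketch. The two-dimensional vectors below are invented for illustration only (the study uses 300-dimensional pre-trained spaCy embeddings). Cosine similarity is assumed as the closeness measure, since the text does not name one.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 2-D embeddings, chosen so that the two motion verbs sit close together.
toy_embeddings = {
    "walk": (0.9, 0.2),
    "jog": (0.8, 0.3),
    "laugh": (0.1, 0.9),
}

sim_walk_jog = cosine_similarity(toy_embeddings["walk"], toy_embeddings["jog"])
sim_walk_laugh = cosine_similarity(toy_embeddings["walk"], toy_embeddings["laugh"])
print(f"walk vs jog:   {sim_walk_jog:.3f}")
print(f"walk vs laugh: {sim_walk_laugh:.3f}")
```

With these toy vectors, 'walk' scores higher against 'jog' than against 'laugh', mirroring the relationship described for Figure 3.1.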
Table 3.1 Top 15 Most Important Features-RF And NN Models In The Order Of Importance
RF allows for ranking feature importance per the total decrease in the Gini measure of node
impurities. We use this feature ranking to perform feature pruning. Specifically, the 60 initial
features are first ranked based on their importance using RF. The 15 most important features
are then selected to be used in both RF and NN models.
D. Input Data Tuning for the RNN and NLP Model
Recall that the concatenated string of verbal responses for any given subject was
used in the RNN and NLP model.
To improve model performance, various text string combinations were explored. This
included using concatenated strings with and without stumbling (such as “um” and “uh”), and
with and without repeated verbs, if they occurred.
E. Model Evaluation and Metrics
For all models, we employ five-fold cross validation. In each fold, training is done using balanced
sets, i.e., equal numbers of AD patients and healthy controls. This helps to avoid favoring
the more heavily represented group. We report means and standard deviations across
the five folds. Evaluation metrics include accuracy, F1 score, and area under the receiver
operating characteristic curve (AUC). Accuracy is the ratio of correct predictions over total
predictions. F1 is the harmonic mean of precision and recall, where precision is the
proportion of positive predictions that are correct and recall is the proportion of actual
positive cases that are correctly classified. AUC is the value that reflects the overall ranking
performance of a classifier.
For the Recurrent Neural Network (RNN) and Natural Language Processing (NLP) models,
the input data consists of concatenated strings of verbal responses from each subject. To
enhance the performance of these models, the study explores different combinations of text
strings. This includes variations with and without filler words (such as "um" and "uh") and
with and without repeated verbs if they are present in the responses. This fine-tuning of input
data helps to optimize the RNN and NLP models by considering different text string
combinations.
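The four input variants described above (with or without filler words, with or without repeated verbs) can be generated with a short sketch. The filler list contains only the two examples named in the text. The sample transcript and the function name `build_variant` are hypothetical.

```python
FILLERS = {"um", "uh"}  # only the fillers named in the text; a real list may be longer

def build_variant(tokens, keep_fillers, keep_repeats):
    """Concatenate a subject's responses, optionally dropping fillers and repeated verbs."""
    out = []
    seen = set()
    for tok in tokens:
        word = tok.lower()
        if word in FILLERS:
            if keep_fillers:
                out.append(word)
            continue
        if not keep_repeats and word in seen:
            continue  # skip a verb the subject has already produced
        seen.add(word)
        out.append(word)
    return " ".join(out)

# Hypothetical verb fluency transcript for one subject.
response = ["run", "um", "walk", "jog", "uh", "walk", "eat"]

variants = {
    (keep_fillers, keep_repeats): build_variant(response, keep_fillers, keep_repeats)
    for keep_fillers in (True, False)
    for keep_repeats in (True, False)
}
for key, text in variants.items():
    print(key, "->", text)
```

The variant keyed `(False, True)` corresponds to strings without fillers and with repeated verbs kept.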
To assess the performance of all the developed models, a five-fold cross-validation approach
is employed. In each fold, the training is conducted using balanced sets, ensuring an equal
number of AD patients and healthy control subjects. This balanced training approach
prevents any bias toward the more heavily represented group.
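One way to build such balanced training sets is sketched below. The text only states that each training set has equal numbers of AD patients and controls. The stratified fold assignment and the random downsampling of the larger class are assumptions, and the class sizes are hypothetical.

```python
import random

def balanced_folds(labels, n_folds=5, seed=0):
    """Return (train_idx, test_idx) pairs whose training sets are class-balanced."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    # Stratified assignment: deal each class's shuffled indices round-robin into folds.
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)

    splits = []
    for k in range(n_folds):
        test_idx = sorted(folds[k])
        held_out = set(test_idx)
        train_by_class = {y: [i for i in idxs if i not in held_out]
                          for y, idxs in by_class.items()}
        # Downsample every class to the size of the smallest one (assumed strategy).
        n_min = min(len(v) for v in train_by_class.values())
        train_idx = []
        for idxs in train_by_class.values():
            train_idx.extend(rng.sample(idxs, n_min))
        splits.append((sorted(train_idx), test_idx))
    return splits

# Hypothetical cohort: 30 AD patients and 50 healthy controls (HC).
labels = ["AD"] * 30 + ["HC"] * 50
splits = balanced_folds(labels)
for train_idx, test_idx in splits:
    n_ad = sum(1 for i in train_idx if labels[i] == "AD")
    n_hc = sum(1 for i in train_idx if labels[i] == "HC")
    print(f"train AD={n_ad} HC={n_hc} test={len(test_idx)}")
```

Only the training portion is balanced in this sketch. The held-out fold keeps the original class proportions.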
The study utilizes several key evaluation metrics to assess model performance:
Accuracy: The ratio of correct predictions to the total predictions. It indicates how many predictions were right out of all predictions made by the model.
F1 score: A metric that combines precision and recall. Precision is the proportion of positive predictions that are actually correct, and recall is the proportion of actual positives that are correctly predicted. The F1 score provides a balanced measure of a model's accuracy, particularly when dealing with imbalanced datasets.
AUC: A measure of the overall ranking performance of a classifier. It reflects the model's ability to distinguish between the two classes, in this case, AD patients and healthy controls. A higher AUC indicates better discrimination capabilities.
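The three metrics can be computed directly from their definitions, as in the following sketch. The labels and scores are invented for illustration (1 = AD, 0 = healthy control). The 0.5 decision threshold and the pairwise formulation of AUC are assumptions rather than details taken from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def precision_recall_f1(y_true, y_pred):
    """Precision, recall and their harmonic mean for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and model scores for eight subjects.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

acc = accuracy(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
area = auc(y_true, scores)
print(acc, precision, recall, f1, area)
```

Accuracy and F1 depend on the chosen threshold, whereas AUC depends only on how the scores rank the subjects.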
CHAPTER 4
Feature Importance and Descriptions: In Figure 4.1, the visual representation illustrates the
importance of the top 15 features selected for inclusion in both the Random Forest (RF) and
Neural Network (NN) models. These features are central to the models' ability to make
accurate classifications. Table 3.1, on the other hand, provides detailed descriptions of these
features, shedding light on their nature and characteristics.
Figure 4.1 Feature Importance
Figure 4.1 displays the importance of the top 15 features that are included in the RF and NN
models. Table 3.1 presents the descriptions of these features. As seen in the figure and table,
the features are generally drawn from psycholinguistic properties relating to age of
acquisition, number of phonemes, how pleasant the word is, and the number of phonological
neighbours that a word has, among others. In addition, per the preliminary results, the text strings without
stumbling and with repeated verbs resulted in best performance. We used this approach for
the rest of the study.
Feature Origins: These top 15 features are derived from various psycholinguistic properties.
The descriptions in Table 3.1 reveal that these properties encompass aspects related to
language and cognition, such as the age of word acquisition, the number of phonemes
(distinct speech sounds) in a word, the emotional connotation or pleasantness of a word, and
the presence of phonological neighbors (words that sound similar) for a given word. These
properties collectively capture linguistic and cognitive dimensions of the verbal responses
used in the study.
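The ranking behind the top 15 features relies on the decrease in Gini impurity produced by splits on each feature. A minimal sketch of that quantity, and of keeping the k highest-ranked features, is given below. The class counts, feature names, and importance values are all hypothetical, and a real RF sums such decreases over every split in every tree.

```python
def gini_impurity(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def gini_decrease(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of its two children."""
    n = len(parent)
    children = (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - children

def top_k_features(importances, k):
    """Names of the k features with the largest importance, most important first."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]

# Hypothetical node with 5 AD and 5 healthy control (HC) subjects, split by some feature.
parent = ["AD"] * 5 + ["HC"] * 5
left = ["AD"] * 4 + ["HC"] * 1
right = ["AD"] * 1 + ["HC"] * 4
decrease = gini_decrease(parent, left, right)
print(f"Gini decrease for this split: {decrease:.2f}")

# Hypothetical accumulated importances; the study keeps the top 15 of 60 features.
importances = {"feature_a": 0.05, "feature_b": 0.21, "feature_c": 0.12, "feature_d": 0.02}
selected = top_k_features(importances, k=2)
print(selected)
```

A feature whose splits separate the two groups more cleanly accumulates a larger total decrease and therefore ranks higher.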
Optimal Text String Combinations: As suggested by preliminary results, the study found that
using text strings without "stumbling" (e.g., filler words like "um" and "uh") and including
repeated verbs yielded the best model performance. This implies that these specific
combinations of verbal responses, which are presumably more fluent and coherent, are most
effective in discriminating between Alzheimer's Disease (AD) patients and healthy control
subjects. Based on this finding, the researchers decided to continue using these specific text
string combinations in the subsequent phases of the study.
Table 4.1 Averages (and standard deviations) of metrics of the three ML models
Table 4.1 presents the averages and standard deviations of the evaluation metrics for RF, NN,
and RNN models. As seen in the table, RF slightly outperforms NN and RNN models. This
model is able to detect AD participants with an accuracy of 76%. Note that the RF model
relies on features extracted from psycholinguistic properties that require considerable
preprocessing of the data by subject matter experts. However, even with minimal
preprocessing of data, the RNN model is able to correctly detect AD with an accuracy of 67%.
The table presents the averages and standard deviations of evaluation metrics for the Random
Forest (RF), Neural Network (NN), and Recurrent Neural Network (RNN) models, all of
which were developed in the study to detect Alzheimer's Disease (AD) in participants.
RF's Advantages: The RF model's success can be attributed to its use of features
extracted from psycholinguistic properties, which provide valuable linguistic and
cognitive insights. However, it's worth noting that this approach involves a considerable
amount of data preprocessing conducted by subject matter experts to ensure the
relevance and accuracy of these features. This meticulous data preparation contributes to
the RF model's higher accuracy.
RNN's Simplicity: In contrast, the RNN model, while achieving a slightly lower accuracy
of 67%, offers an important advantage. It requires minimal preprocessing of the data.
This means that, compared to the RF model, the RNN model is less reliant on extensive
manual data preparation. This simplicity can be particularly valuable in terms of time and
resource efficiency.
Implications: The study's findings suggest that the RF model, with its higher accuracy, is
a robust choice for AD detection, especially when extensive data preprocessing can be
employed. However, the RNN model, despite its slightly lower accuracy, offers a more
streamlined approach that requires less data preparation effort. This is significant because
it might make AD detection more accessible and cost-effective, particularly when
resources for extensive preprocessing are limited.
In essence, the choice between the RF and RNN models hinges on a balance between the
desire for high accuracy and the practical constraints related to data preprocessing. Each
model has its advantages, and this study provides valuable insights for researchers and
healthcare professionals seeking efficient methods for AD detection.
Further, we performed paired t-tests to compare the results of the three models. Table III lists
the p-values of these tests. As seen in the table, the differences between the RF model and the
other models are not statistically significant. This indicates that the results from the RF model
are not significantly better than those of the other two models.
The paired t-tests conducted in the study aimed to assess whether the differences in
performance between the Random Forest (RF) model and the other two models (Neural
Network, NN, and Recurrent Neural Network, RNN) are statistically significant. The
results of these t-tests are presented in Table III, which lists the p-values associated with
the comparisons.
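A paired t-test on per-fold scores can be sketched as follows. The per-fold accuracies are made up for illustration and are not the study's data. Pairing by cross-validation fold is an assumption. Instead of computing a p-value, the sketch compares the statistic with the standard two-sided 5% critical value of Student's t for 4 degrees of freedom (2.776).

```python
import math

def paired_t_statistic(a, b):
    """t statistic of the paired differences a[i] - b[i] (needs non-identical differences)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean_d / math.sqrt(var_d / n)

# Two-sided 5% critical value of Student's t with 4 degrees of freedom (5 folds - 1).
T_CRITICAL_DF4 = 2.776

# Hypothetical per-fold accuracies for two models, NOT taken from the study.
rf_acc = [0.80, 0.70, 0.78, 0.72, 0.80]
rnn_acc = [0.70, 0.72, 0.60, 0.74, 0.59]

t_stat = paired_t_statistic(rf_acc, rnn_acc)
significant = abs(t_stat) > T_CRITICAL_DF4
print(f"t = {t_stat:.2f}, significant at the 5% level: {significant}")
```

With only five folds, the fold-to-fold variation of the differences is large relative to their mean, so even a sizeable gap in average accuracy can fail to reach significance.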
Findings from the Table: As indicated in the table, the p-values for the comparisons
between the RF model and the other models are not statistically significant. In other
words, the differences in performance between the RF model and the NN and RNN
models are not substantial enough to be considered statistically meaningful.
Implications: This finding has important implications. It suggests that, based on the data
and the evaluation metrics used in the study, there is no compelling statistical evidence to
conclude that the RF model significantly outperforms the NN and RNN models or vice
versa. The models appear to have comparable performance in terms of accuracy, F1
score, and AUC, as indicated by the non-significant p-values.
In practical terms, this means that researchers and practitioners have flexibility in
choosing between these models for Alzheimer's Disease (AD) detection. The choice can
be influenced by factors such as the level of data preprocessing needed, resource
availability, and the specific goals of the study. The findings emphasize the importance
of considering not only model performance but also the practical aspects of model
implementation when making decisions in AD detection research.
CHAPTER 5
CONCLUSION
Our results demonstrate that we can correctly detect AD with above-chance accuracy
using NLP, even when using an RNN that requires almost no preprocessing of subjects’ VF
data. Our accuracy scores fall within the reported accuracy ranges of several clinical AD
detection methods, such as EEG and brain scans, that are considerably more costly and time-
consuming than VF tasks. Our results thus show promise for detecting AD using data-driven
methods without resorting to cost prohibitive, invasive or time-consuming clinical
procedures. As indicated by our results, RF performs as the slightly better method for
detecting AD when compared with NN and RNN. However, the differences are not
significant. It is worth noting that the RF requires considerable data preprocessing. While the
RF requires analysis and computation of psycholinguistic properties, the RNN simply
requires the concatenation of the subjects’ verbs. The latter methodology provides a much
more efficient, time- and cost-saving means to detect AD with 67% accuracy, and can easily
be conducted remotely. Given these benefits and the insights derived here regarding its
potential effectiveness, further exploration into using an RNN with an NLP after collecting
subjects’ verb listings stands out as a worthy venture. More specifically, future work may
include further refining and tuning the RNN and NLP while studying the patient-specific
covariates including age and comorbidities more comprehensively. It is also worth
investigating whether analyzing different categories of verbal fluency tasks (e.g., semantic,
phonemic, and verb fluency) simultaneously adds value to the detection of AD, given the
distinctive psycholinguistic processes demanded by each task and each task’s sensitivity to
different aspects of cognitive decline in AD. Lastly, we acknowledge that one of the
limitations of this study is the small sample size. Hence, further studies using larger data sets
are needed to reproduce the current findings and build upon them.
Results are quite promising, demonstrating the potential for accurate Alzheimer's Disease
(AD) detection using Natural Language Processing (NLP), particularly with the utilization of
a Recurrent Neural Network (RNN). Notably, even the RNN, which requires minimal
preprocessing of subjects' Verb Fluency (VF) data, achieves an accuracy rate of 67%. These
findings are significant because they indicate the feasibility of using data-driven methods for
AD detection without resorting to invasive, costly, or time-consuming clinical procedures.
Comparison with Clinical Methods: The study's results are particularly encouraging when
compared to established clinical AD detection methods such as EEG and brain scans. These
clinical methods are not only costly and time-consuming but also often invasive. The
accuracy scores achieved in this study using NLP and the RNN model are competitive with
these clinical methods. Therefore, NLP-based AD detection, while non-invasive and more
cost-effective, shows potential for delivering accurate results.
The results also reveal that the Random Forest (RF) model performs slightly better in AD
detection when compared with the Neural Network (NN) and RNN models. However, the
differences in performance among these models are not statistically significant. An important
consideration here is that RF involves extensive data preprocessing, whereas the RNN model
requires minimal preparation, making it more efficient, cost-effective, and suitable for remote
applications.
Future Directions and Exploration: Several promising avenues exist for future research. First,
further refinement and tuning of the RNN and NLP models can enhance their performance.
Additionally, it is advisable to study patient-specific covariates, including age and
comorbidities, more comprehensively to improve the models' accuracy. Exploring the
simultaneous analysis of different categories of verbal fluency tasks, such as semantic,
phonemic, and verb fluency, may add value to AD detection, as each task can tap into distinct
psycholinguistic processes and cognitive declines in AD.
One limitation of the study is its relatively small sample size. To strengthen the robustness
and generalizability of the findings, further research with larger data sets is recommended.
This will not only help reproduce the current results but also allow for building upon them
and refining the models for more accurate and widespread AD detection.
The study demonstrates that Alzheimer's Disease (AD) can be detected with above-chance
accuracy using Natural Language Processing (NLP), particularly with a Recurrent Neural
Network (RNN). This method requires minimal preprocessing of subjects' Verb Fluency (VF)
data, making it a promising non-invasive approach.
The accuracy scores obtained in this research are comparable to reported accuracies of more
costly and time-consuming clinical AD detection methods, such as EEG and brain scans. This
highlights the potential of data-driven approaches, like NLP and RNN, to provide a cost-
effective alternative.
While Random Forest (RF) performs slightly better in AD detection compared to Neural
Network (NN) and RNN, the differences are not statistically significant. Notably, RF requires
substantial data preprocessing, whereas RNN simply involves concatenating subjects' verbs.
This makes the RNN method more efficient, cost-effective, and suitable for remote use.
The study suggests further exploration of using RNN with NLP after collecting subjects' verb
listings, indicating potential for refining and enhancing the method. Additionally, future work
could involve a more comprehensive study of patient-specific factors, including age and
comorbidities, to improve AD detection accuracy.
Investigating various categories of verbal fluency tasks (e.g., semantic, phonemic, and verb
fluency) concurrently could provide valuable insights into their combined effectiveness in
AD detection. Understanding the distinct psycholinguistic processes involved in each task
may refine the diagnostic approach.
The study recognizes the limitation of a relatively small sample size. Therefore, larger
datasets are recommended for future studies to validate and expand upon the current findings.
REFERENCES
[1] Aradhana Soni, Benjamin Amrhein, Matthew Baucum, Eun Jin Paek, “Using Verb
Fluency, Natural Language Processing, and Machine Learning to Detect Alzheimer’s
Disease” 2021 43rd Annual International Conference of the IEEE Engineering in
Medicine & Biology Society (EMBC) Oct 31 - Nov 4, 2021.
[2] J. Elflein, “Death rate due to Alzheimer’s disease in the U.S. 2000-2019,”
Alzheimer’s Association, 2021.