Seminar Report Vivek R


USING VERB FLUENCY, NATURAL LANGUAGE PROCESSING, AND
MACHINE LEARNING TO DETECT ALZHEIMER’S DISEASE

A SEMINAR REPORT

submitted by

VIVEK R
CEM20CS041

to the APJ Abdul Kalam Technological University

in partial fulfillment of the requirements for the award of the Degree
of

Bachelor of Technology
in
Computer Science and Engineering

Department of Computer Science and Engineering

College of Engineering Muttathara

Thiruvananthapuram

DECEMBER 2023
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

COLLEGE OF ENGINEERING MUTTATHARA

VISION

To generate competent professionals in Computer Science and Engineering by imparting
quality education and thereby facilitating cutting-edge research and development to serve as
valuable resources for the industry and society.

MISSION

• Provide learning ambience through strong theoretical and practical background with
an emphasis on software development.
• Establish industry interaction programmes to enhance technical know-how on par
with the current trends in the industry and to promote entrepreneurship skills.
• Promote research-based projects in the emerging areas of Computer Science and
Engineering.
• Empower the youth in rural communities with computer education.
• Inculcate professional behaviour, leadership qualities, team spirit, problem-solving
skills, critical thinking, and ethical responsibilities.

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

COLLEGE OF ENGINEERING MUTTATHARA

CERTIFICATE

This is to certify that the report entitled “Using Verb Fluency, Natural Language Processing,
and Machine Learning to Detect Alzheimer’s Disease” submitted by Vivek R to the APJ
Abdul Kalam Technological University in partial fulfillment of the requirements for the
award of the Degree of Bachelor of Technology in Computer Science and Engineering is a
bonafide record of the seminar work carried out by him/her under my guidance and
supervision. This report in any form has not been submitted to any other University or
Institute for any purpose.

Internal Supervisor:
Mrs. Surya S R
Assistant Professor
Dept of CSE

Seminar Coordinator:
Mrs. Devi Dath
Assistant Professor
Dept of CSE

HEAD OF THE DEPARTMENT:

Mrs. Bindhu J S
Assistant Professor
Dept of CSE

ACKNOWLEDGEMENT

I take this opportunity to express my deep sense of gratitude and sincere thanks to all who
helped me to complete the seminar successfully.

Firstly, I would like to express my sincere gratitude to my guide Mrs. Surya S R for the
guidance and valuable comments.

I am also extremely thankful to my seminar coordinator Mrs. Devi Dath for giving her moral
support, positive criticism and cooperation.

I also express my gratitude to Mrs. Bindhu J S, Head of Computer Science and Engineering
Department for her support and cooperation.

Finally, I thank all other staff members, my parents and friends for their help and motivation.

Vivek R

ABSTRACT

Alzheimer’s disease (AD) causes significant impairments in memory and other cognitive
domains. As there is no cure for the disease yet, early detection and delay of disease
progression are critical for the management of AD. Verbal fluency is one of the most common
and sensitive neuropsychological methods used to detect and evaluate the cognitive
decline in AD: a subject is asked to name as many items as possible that belong to a
certain category in 30 or 60 seconds. In this study, we develop an approach to detect AD
using a verb fluency (VF) task, a specific subset of verbal fluency that analyzes the subjects’
listing of verbs in a given time period. We use machine learning techniques including random
forest (RF), neural network (NN), and recurrent NN (RNN) models, together with natural
language processing (NLP), to detect the risk of AD. The results show that the developed
models can stratify subjects into the corresponding AD and control groups with up to 76%
accuracy using RF, but at the cost of having to preprocess the data. Accuracy is lower,
though not significantly so, at 67% using RNN and NLP, which involves almost no manual
preprocessing of the data. This study opens up a powerful approach of using simple VF tasks
for early detection of AD.

CONTENTS

ACKNOWLEDGEMENT

ABSTRACT

LIST OF FIGURES

LIST OF TABLES

1. INTRODUCTION
1.1 OBJECTIVE
1.2 MOTIVATION
1.3 PROBLEM STATEMENT

2. LITERATURE SURVEY
2.1 Detecting Japanese Patients with Alzheimer’s Disease based on Word Category Frequencies
2.2 Deep learning to detect Alzheimer's disease from neuroimaging: A systematic literature review
2.3 Early-Stage Alzheimer's Disease Prediction Using Machine Learning Models
2.4 Alzheimer’s Disease stage identification using deep learning models

3. PROPOSED METHOD
3.1 Data Collection and Preprocessing
3.2 Models
3.3 Feature Selection for RF and NN

4. RESULT AND DISCUSSION

5. CONCLUSION

REFERENCES

LIST OF FIGURES

No. Title

3.1 Example 2D word embedding space, where similar words are closer together

4.1 Feature importance. Table 3.1 presents the descriptions of these features.

LIST OF TABLES

No. Title

3.1 Top 15 most important features of the RF and NN models, in order of importance

4.1 Averages (and standard deviations) of metrics of the three ML models

4.2 t-Test comparison of the three models

CHAPTER 1

INTRODUCTION

Alzheimer’s disease (AD) is the leading cause of dementia, accounting for 60-80 percent of
cases. Dementia generally refers to a patient’s decline in memory and cognitive skills such as
their ability to reason, think, or speak clearly. AD is a degenerative brain disease that
originates from damage to brain cells. While no cure for AD currently exists, earlier detection
of the disease means earlier intervention and more effective care. Despite the growing
number of cases of AD, only about a quarter of patients are typically diagnosed.
Worse yet, the mortality rate of AD in the United States increased significantly between
2000 and 2018, from 17.6 to 37.3 deaths per 100,000 population.

A large body of research is dedicated to studying the use of language tasks to improve
early detection of AD. This research has shown promise because AD results in cognitive
impairment and typically has negative implications for how patients produce or use language.
In general, past studies have recorded a patient’s speech over a period of time and
analyzed the number and types of words produced to detect AD. Such an approach to
AD detection is promising because there is generally no need for expensive equipment or
invasive procedures, and the data collection and analysis can even be done remotely.
However, existing works on detecting AD from recorded speech data generally use
time-intensive tasks, such as open-ended interviews with clinicians.

We detect AD by analyzing data from a verb fluency (VF) task, which relies simply on the way
patients list verbs in bursts of 30 seconds. Although verbal fluency, e.g., semantic fluency
and phonemic fluency, has been commonly used to detect AD, analyzing the listing of verbs
is much less explored. This task of listing verbs has the potential to simplify the evaluation
process and can be more readily transferable and generalizable to a large array of languages.

Therefore, in this study, we aim to leverage machine learning (ML) and natural language
processing (NLP) along with VF for early AD detection. ML is a branch of artificial
intelligence that allows for eliciting patterns from data. It can draw associations between a
set of input variables (e.g., the choice of verbs, the pattern with which they are produced, etc.)
and output (response) variables (e.g., at risk for AD or not). NLP is a field at the intersection
of artificial intelligence and linguistics that is concerned with the interactions between human
(natural) language and computers. Both ML and NLP, either separately or jointly, have been
used in health care applications to much success, e.g., to detect or predict various outcomes
or risks for patients using electronic medical records.

In this study, we develop an approach to detect AD using the data from a 30-second VF task.
First, we develop ML models that detect AD using psycholinguistic features of the input
verbs, extracted by experts from the VF task data. We specifically develop random forest
(RF) and neural network (NN) models. Next, we leverage NLP and ML jointly to develop an
end-to-end ML pipeline. That is, we use NLP on the concatenated text string of verbs from
subjects to elicit information. We then use this elicited information, along with the (raw)
sequence of verbs produced, in a recurrent neural network (RNN) model to detect AD.
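
To make the end-to-end idea concrete, the following is a minimal sketch of how one subject's verb list could be turned into the two inputs mentioned above: a concatenated text string and an integer sequence suitable for a recurrent model. The function names, the padding and unknown-word tokens, and the example responses are all assumptions made for illustration; they are not taken from the study.

```python
from collections import Counter

def build_vocab(responses, min_count=1):
    """Map each verb to an integer id; 0 is reserved for padding, 1 for unknown verbs."""
    counts = Counter(verb for verbs in responses for verb in verbs)
    vocab = {"<pad>": 0, "<unk>": 1}
    for verb, count in sorted(counts.items()):
        if count >= min_count:
            vocab[verb] = len(vocab)
    return vocab

def encode(verbs, vocab, max_len):
    """Turn one subject's verb list into a fixed-length sequence of ids."""
    ids = [vocab.get(v, vocab["<unk>"]) for v in verbs[:max_len]]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))

# Hypothetical responses from two subjects (not data from the study).
responses = [["run", "jump", "eat", "run"], ["eat", "sleep"]]
vocab = build_vocab(responses)
text = " ".join(responses[0])                       # concatenated string for NLP
sequence = encode(responses[1], vocab, max_len=4)   # padded id sequence for an RNN
```

Padding to a fixed length is one common way to batch sequences of different lengths; a real pipeline might instead mask the padded positions or feed an embedding layer.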

1.1 OBJECTIVE

• Find individuals at risk of Alzheimer's disease before it worsens, allowing timely
support.
• Evaluate memory and cognitive issues using verbal fluency tasks.
• Employ advanced technology like machine learning and NLP to develop accurate AD
prediction models.
• Classify individuals into AD or control groups based on their performance in VF
tasks.
• Strive for high detection accuracy by comparing various machine learning models and
data preprocessing methods.
• Explore the feasibility of using straightforward VF tasks for affordable and accessible
early AD detection.

1.2 MOTIVATION

Research on Alzheimer's disease detection is motivated, at its core, by the prospect of
improving the lives of individuals and families affected by this devastating disease. Early
detection, which this work seeks to advance, can offer more time and better care, and so
improve the quality of life of those at risk. Each step forward in research also brings us
closer to potential treatments or interventions that can change the course of Alzheimer's
disease.

The key motivators behind this work are:

• Improving Lives: The fundamental goal of Alzheimer's disease research is to improve the
lives of those afflicted by the condition. Early detection offers more time and better care,
which can make a significant difference in the quality of life of individuals at risk of AD
and provide support for their families.

• Sense of Purpose: The potential to positively influence the trajectory of the disease
gives this research a strong sense of purpose. The work can contribute to early
interventions, improved treatments, and enhanced patient care.

• Hope and Optimism: Each step forward, every breakthrough, and every experiment
conducted brings us closer to potential treatments or interventions that can change the
course of Alzheimer's disease.

1.3 PROBLEM STATEMENT

Alzheimer's disease (AD) represents a growing global health crisis with no cure. Early
detection is essential for effective management, but current methods often involve complex
and costly procedures. The need for a non-invasive, accessible, and accurate detection tool is
evident. This research addresses the pressing problem of late diagnosis of Alzheimer's
disease by exploring the potential of verb fluency tasks combined with machine learning and
natural language processing. The current diagnostic process is often resource-intensive and
time-consuming, and may not yield timely results, hindering early intervention and support
for affected individuals.

Alzheimer's disease (AD) is a global health crisis characterized by its growing prevalence,
lack of a cure, and profound impact on individuals and society. One of the most pressing
challenges in AD management is early detection. The earlier AD is identified, the more
effective interventions and support can be provided to patients. However, the current
diagnostic methods for AD often involve complex and costly procedures, making them less
accessible and efficient for timely detection.

Challenges with Current Diagnostic Methods:

• Resource-Intensive: Many of the existing diagnostic methods for AD are
resource-intensive. They may require specialized equipment, extensive testing, and expert
medical personnel, making them costly and difficult to implement widely.

• Time-Consuming: The diagnostic process can be time-consuming, involving multiple
stages of evaluation, including clinical assessments and neuroimaging studies. These
delays hinder early intervention, which is critical in managing AD effectively.

• Lack of Timely Results: The results of current diagnostic procedures may not be
available in a timely manner. Delays in obtaining a diagnosis can lead to missed
opportunities for early intervention and support.

CHAPTER 2

LITERATURE SURVEY

2.1 Detecting Japanese Patients with Alzheimer’s Disease based on Word
Category Frequencies [2]

This study focuses on the early detection of Alzheimer's disease (AD) through the analysis of
spoken language using Natural Language Processing (NLP). While previous research has
explored aspects like vocabulary size, grammatical complexity, and fluency, the content
analysis of narratives of AD patients remains a challenge in NLP. To investigate this, the
researchers recruited 18 participants aged between 53 and 90, with an average age of 76.89.
These participants were divided into two groups based on their Mini-Mental State
Examination (MMSE) scores, a commonly used test to assess cognitive function. The AD
group consisted of 9 participants with MMSE scores of 21 or lower, indicating cognitive
impairment, while the healthy control group included 9 participants with scores of 22 or
higher, indicating normal cognitive function. The researchers used Linguistic Inquiry and
Word Count (LIWC), a text analysis software, to categorize the words used by the
participants. LIWC is designed to identify linguistic patterns and psychological processes in
written or spoken text. They also measured the word frequency through observation. The key
finding of this study is the significant difference observed in the usage of impersonal
pronouns in the AD group compared to the control group. Impersonal pronouns, such as "it,"
"they," or "them," are often used to refer to objects or people in a non-specific or indirect
way. This suggests that individuals with AD may exhibit distinct linguistic patterns, such as a
higher usage of impersonal pronouns, in their spoken language.

This research is centered on the early detection of Alzheimer's Disease (AD) by employing
Natural Language Processing (NLP) to analyze spoken language. Previous investigations in
this area have explored linguistic aspects like vocabulary size, grammatical complexity, and
fluency, but the analysis of narrative content in AD patients remains a challenging endeavor
for NLP.

Participant Recruitment: For the study, 18 participants were recruited, ranging in age from 53
to 90, with an average age of 76.89. These participants were divided into two groups based
on their Mini-Mental State Examination (MMSE) scores, a widely used test for assessing
cognitive function. The AD group comprised 9 participants with MMSE scores of 21 or
lower, indicating cognitive impairment associated with AD. In contrast, the healthy control
group included 9 participants with MMSE scores of 22 or higher, indicating normal cognitive
function.

Methodology and Analysis: To investigate linguistic patterns, the researchers employed
Linguistic Inquiry and Word Count (LIWC), a text analysis software designed to categorize
words used in written or spoken text. LIWC is a powerful tool for identifying linguistic
patterns and psychological processes in text data. Additionally, the study measured word
frequency through observation.

Key Finding: The study's most significant discovery is the notable difference in the usage of
impersonal pronouns between the AD group and the control group. Impersonal pronouns,
such as "it," "they," or "them," are used to refer to objects or people in a non-specific or
indirect manner. The increased usage of impersonal pronouns in the AD group suggests that
individuals with AD exhibit distinctive linguistic patterns in their spoken language. This
specific finding underscores the potential of NLP in uncovering linguistic markers or cues
that may aid in the early detection of AD, providing valuable insights into the cognitive
changes associated with the disease.
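
As a rough illustration of a word-category frequency, the sketch below computes the share of tokens in a transcript that fall into one category. It is not LIWC: the word set contains only the three pronouns named above, the tokenizer is a simple English regular expression (the reviewed study analyzed Japanese speech), and the function name `category_rate` is hypothetical.

```python
import re

# Illustrative word set only; LIWC's own dictionaries are proprietary and far larger.
IMPERSONAL_PRONOUNS = {"it", "they", "them"}

def category_rate(transcript, category_words):
    """Return the fraction of tokens in the transcript that belong to the category."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in category_words)
    return hits / len(tokens)

# Hypothetical transcript, not data from the study.
rate = category_rate("It was there and they took it", IMPERSONAL_PRONOUNS)
```

Comparing such rates between an AD group and a control group, with an appropriate statistical test, is the kind of analysis the finding above describes.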

In summary, this research contributes to our understanding of how linguistic analysis,
specifically focusing on the usage of impersonal pronouns, can play a role in the early
detection of Alzheimer's Disease. It highlights the significance of NLP as a tool for
examining spoken language and offers insights that could inform the development of early
diagnostic tools and interventions for AD.

2.2 Deep learning to detect Alzheimer's disease from neuroimaging: A
systematic literature review [3]

Alzheimer's Disease (AD) is one of the leading causes of death in developed countries. From
a research point of view, impressive results have been reported using computer-aided
algorithms, but clinically no practical diagnostic method is available. In recent years, deep
models have become popular, especially in dealing with images. Since 2013, deep learning
has begun to gain considerable attention in AD detection research, with the number of
published papers in this area increasing drastically since 2017. Deep models have been
reported to be more accurate for AD detection compared to general machine learning
techniques. Nevertheless, AD detection is still challenging, and for classification, it requires a
highly discriminative feature representation to separate similar brain patterns.

The paper reviews the current state of AD detection using deep learning. Through a
systematic literature review of over 100 articles, the authors set out the most recent
findings and trends. Specifically, they review useful biomarkers and features (personal
information, genetic data, and brain scans), the necessary pre-processing steps, and different
ways of dealing with neuroimaging data originating from single-modality and multi-modality
studies. Deep models and their performance are described in detail. Although deep learning
has achieved notable performance in detecting AD, there are several limitations, especially
regarding the availability of datasets and training procedures.

Context and Importance of AD Detection: Alzheimer's Disease (AD) is a significant public
health concern, particularly in developed countries, where it ranks among the leading causes
of death. From a research perspective, there have been remarkable advancements in using
computer-aided algorithms for AD detection. However, from a clinical standpoint, practical
diagnostic methods are still limited and not widely available.

Emergence of Deep Learning: In recent years, deep learning, a subset of machine learning,
has gained prominence in various domains, particularly for tasks involving images. The
adoption of deep learning in AD detection research began around 2013, with a substantial
increase in related research papers since 2017. Deep learning models have shown promise in
achieving higher accuracy compared to traditional machine learning techniques.

Challenges in AD Detection: Despite the potential of deep learning, AD detection remains a
challenging task. Effective classification requires highly discriminative feature
representations that can distinguish subtle differences in brain patterns between individuals
with AD and those without.

Review Objectives: This paper aims to provide a comprehensive review of the current state of
AD detection using deep learning. The review is based on a systematic analysis of over 100
research articles. The paper focuses on the most recent findings and emerging trends in the
field. Specifically, it covers the following key aspects:

Biomarkers and Features: The review examines the biomarkers and features used for AD
detection. These include personal information, genetic data, and brain scans, all of which can
provide valuable insights into an individual's risk of developing AD.

Data Preprocessing: The paper discusses the preprocessing steps necessary to prepare data for
deep learning models. This includes data cleaning, normalization, and other procedures that
enhance the quality and consistency of the data.
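
Normalization, one of the preprocessing steps mentioned above, can be sketched as follows. This is a generic illustration on a flat list of intensity values, not the procedure of any reviewed paper; real neuroimaging pipelines operate on 3D volumes and usually rely on dedicated libraries. The function names are assumptions.

```python
from statistics import mean, pstdev

def z_score(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant input carries no contrast
    return [(v - mu) / sigma for v in values]

def min_max(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical intensity values.
scaled = min_max([2, 4, 6])
standardized = z_score([1, 2, 3, 4])
```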

Neuroimaging Data: The review explores the use of neuroimaging data in AD detection,
considering data from both single-modality and multi-modality studies. Neuroimaging data,
such as MRI scans, offer critical information about brain structure and function that can aid in
AD diagnosis.

Deep Models: The paper delves into deep learning models in detail, providing insights into
the specific models used in AD detection and their performance.

Limitations of Deep Learning in AD Detection: While deep learning has demonstrated
notable success in AD detection, several limitations persist. These include challenges related
to the availability of suitable datasets for training deep models and the design of effective
training procedures. These limitations underscore the need for further research and innovation
in the field of AD detection.

2.3 Early-Stage Alzheimer's Disease Prediction Using Machine Learning
Models [4]

Alzheimer's disease (AD) is the leading cause of dementia in older adults. There is currently
a lot of interest in applying machine learning to detect diseases such as Alzheimer's and
diabetes, which affect a large number of people around the world and whose incidence rates
are increasing at an alarming rate every year. In Alzheimer's disease, the brain is affected by
neurodegenerative changes. As the population ages, more and more individuals, families, and
healthcare systems will experience diseases that affect memory and functioning.
These effects will be profound on the social, financial, and economic fronts. In its early
stages, Alzheimer's disease is hard to predict. A treatment given at an early stage of AD is
more effective and causes less damage than a treatment given at a later stage.
Several techniques, such as Decision Tree, Random Forest, Support Vector Machine, Gradient
Boosting, and Voting classifiers, have been employed to identify the best parameters for
Alzheimer's disease prediction. Predictions are based on Open Access Series of Imaging
Studies (OASIS) data, and performance is measured with Precision, Recall, Accuracy, and
F1-score. The proposed classification scheme can be used by clinicians to support diagnosis,
and early diagnosis with these ML algorithms could help lower the annual mortality rate of
Alzheimer's disease. The proposed work reports a best average validation accuracy of 83% on
the AD test data, which the authors describe as significantly higher than that of existing
works.

Alzheimer's disease (AD) is a prevalent neurodegenerative condition and the leading cause of
dementia among older adults. AD has garnered significant attention in the field of machine
learning due to its increasing incidence rates and profound impact on individuals, families,
and healthcare systems. As the aging population continues to grow, the societal, financial,
and economic burden of AD is escalating. Early diagnosis of AD is crucial because treatment
interventions are more effective and less damaging when administered in the disease's early
stages. Therefore, machine learning techniques are being harnessed to improve prediction
accuracy for AD.

Machine Learning Approaches: The study employs a range of machine learning techniques,
including Decision Trees, Random Forest, Support Vector Machine, Gradient Boosting, and
Voting classifiers. These algorithms are utilized to identify the most effective parameters for
predicting Alzheimer's disease. The research leverages the Open Access Series of Imaging
Studies (OASIS) dataset, which contains valuable imaging data. Evaluation metrics such as
Precision, Recall, Accuracy, and F1-score are used to measure the performance of these
machine learning models.
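
The four evaluation metrics named above follow directly from the counts of true and false positives and negatives. The sketch below shows the standard definitions for a binary task; the function name and the example labels are assumptions for illustration and are not results from the reviewed study.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical labels: 1 = AD, 0 = control.
metrics = classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Reporting precision and recall alongside accuracy matters when the classes are imbalanced, since a model can reach high accuracy by favoring the majority class.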

Clinical Applicability: The proposed classification scheme is designed to have practical
implications for clinicians in making more accurate diagnoses of Alzheimer's disease. Early
diagnosis is essential for improving patient outcomes and lowering the annual mortality rates
associated with AD. Machine learning algorithms offer the potential to enhance the
diagnostic process, aiding healthcare professionals in identifying individuals at risk of AD.

Outcomes: The study reports promising results, with the best validation average accuracy
reaching 83% on the AD test data. This is a significant achievement compared to existing
works, suggesting that the machine learning models applied in this research can make more
accurate predictions of Alzheimer's disease. Higher accuracy rates in AD prediction are
crucial for early intervention and improved patient care.

In summary, this research emphasizes the importance of using machine learning techniques
to predict Alzheimer's disease. The study's results showcase the potential for more accurate
and early diagnoses of AD, which can have substantial benefits for both patients and
healthcare providers, potentially reducing the impact of this devastating disease on
individuals and society as a whole.

2.4 Alzheimer’s Disease stage identification using deep learning models [5]

In this research, the primary objective is to develop a novel approach for identifying
Alzheimer's Disease (AD) stages in patients using mobility data recorded via smartphones.
A cohort of 35 AD patients at a daycare center wore smartphones, which collected data over
the span of a week. The data was labeled to reflect the stage of the disease, categorizing
each patient as being in the early, middle, or late stage of AD. The researchers used a
Convolutional Neural Network (CNN) model to process these time series datasets.

The outcomes of this study are highly promising, with the CNN-based method achieving an
impressive accuracy rate of 90.91% and an F1-score of 0.897. These results, in comparison to
the performance of conventional feature-based classifiers, underscore the remarkable
potential of deep learning techniques, especially CNNs, in the realm of Alzheimer's Disease
diagnostics. This innovative method reveals the substantial value of mobility data in both the
treatment and comprehension of the disease, offering not only advanced disease monitoring
capabilities but also insights into the evolution of AD.

Ultimately, the CNN-based approach introduced by this research holds significant
implications for clinical practice and AD management, as it can provide healthcare
professionals with a more precise and efficient tool for assessing AD patients and delivering
tailored care. The potential impact of this research extends to improving the quality of life for
individuals affected by AD, as it opens doors to timely interventions and personalized
treatments, ultimately contributing to enhanced patient outcomes and healthcare practices in
the context of Alzheimer's Disease.

The primary goal of this research is to develop an innovative approach for identifying
different stages of Alzheimer's Disease (AD) in patients by analyzing mobility data collected
through smartphones. A cohort of 35 AD patients wore smartphones that recorded data over a
week. Importantly, this data was meticulously labeled to reflect the stages of the disease,
categorizing patients as being in the early, middle, or late stages of AD. The research team
utilized a Convolutional Neural Network (CNN) model to process these complex time series
datasets.
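
The core operation that a CNN applies to a time series can be sketched in a few lines. The example below is only an illustration of a single one-dimensional convolution filter followed by a ReLU activation and max pooling; it is not the architecture of the reviewed study, whose layers and parameters are not described here, and the kernel values are hand-picked rather than learned.

```python
def conv1d(signal, kernel):
    """Valid-mode 1D cross-correlation, the basic operation of a CNN layer."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

def relu(values):
    """Keep positive activations and zero out the rest."""
    return [max(0.0, v) for v in values]

def max_pool(values, size):
    """Downsample by taking the maximum of each non-overlapping window."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]

# Hypothetical mobility signal; the kernel [-1, 1] responds to increases over time.
signal = [0, 1, 3, 6, 6, 2]
features = max_pool(relu(conv1d(signal, [-1, 1])), size=2)
```

In a trained network, many such kernels are learned from the labeled data, and their pooled outputs feed a classifier that predicts the disease stage.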

Promising Outcomes: The study has yielded highly promising results, with the CNN-based
method achieving an impressive accuracy rate of 90.91% and an F1-score of 0.897. These
results, when compared to the performance of conventional feature-based classifiers,
underscore the remarkable potential of deep learning techniques, particularly CNNs, in the
field of AD diagnostics. The CNN-based approach not only demonstrates the feasibility of
using mobility data for advanced disease monitoring but also offers valuable insights into the
progression of AD.

Clinical Implications: The CNN-based approach introduced in this research has significant
implications for clinical practice and the management of AD. It provides healthcare
professionals with a more precise and efficient tool for assessing AD patients and delivering
tailored care. By accurately identifying the different stages of AD, this method enables timely
interventions and personalized treatments, ultimately contributing to enhanced patient
outcomes and improved healthcare practices in the context of Alzheimer's Disease.

Enhanced Quality of Life: The potential impact of this research extends beyond the clinical
setting. It holds the promise of improving the quality of life for individuals affected by AD.
Through early and accurate diagnosis, it opens doors to interventions and treatments that can
significantly enhance the well-being of AD patients. Furthermore, it contributes to our
understanding of the disease's progression and evolution.

In summary, this research introduces an innovative approach for identifying different stages
of Alzheimer's Disease using mobility data and deep learning techniques. The impressive
results and potential applications emphasize its importance in advancing AD diagnostics,
healthcare practices, and the overall quality of life for individuals living with this challenging
condition.

CHAPTER 3

PROPOSED METHOD

3.1 Data Collection and Preprocessing

The subject cohort includes a total of 20 AD patients (mean age = 77.85 years) and 25
age-matched controls (mean age = 72.68 years). Each subject is asked to say as many verbs as
possible in a 30-second block. The responses are recorded verbatim. The study protocol was
reviewed and approved by The University of Tennessee Health Science Center. The data are
analyzed by subject matter experts to extract psycholinguistic properties of the VF
responses of individuals with amnestic AD [1] and cognitively healthy older adults.
Specifically, the English Lexicon Project, a multi-university effort to provide a standardized
behavioral and descriptive data set for 40,481 words and 40,481 non-words, is used for the
psycholinguistic analysis. To extract psycholinguistic properties, the root forms of the
verbs are used.

The properties extracted include: Total number of correctly produced words, length of the
word, the number of phonological neighbors that a word has, the number of orthographic
neighbors that a word has, how pleasant a word is, the extent to which the word denotes
something that is weak or strong, number of phonemes in the pronunciation, word frequency,
and the age of acquisition of the word.

In this study, the research population comprises a total of 20 patients diagnosed with
Alzheimer's disease (AD) [1], with a mean age of 77.85 years, and 25 age-matched control
subjects, whose average age is 72.68 years. The participants are tasked with a specific verbal
fluency (VF) task, where they are required to spontaneously generate as many verbs as
possible within a 30-second timeframe. Their responses are meticulously recorded verbatim
during this task.

To ensure the ethical and regulatory standards of research, the study protocol has been
thoroughly reviewed and officially approved by The University of Tennessee Health Science
Center. Subsequently, the collected data undergoes a rigorous analysis process, guided by
subject matter experts, aimed at extracting various psycholinguistic properties associated with
the verbal fluency responses of individuals with amnestic Alzheimer's Disease and
cognitively healthy older adults.

For the psycholinguistic analysis, the researchers draw upon a resource known as
the English Lexicon Project. This project represents a collaborative effort across multiple
universities, providing a standardized dataset encompassing 40,481 words and 40,481 non-
words, each meticulously documented with various linguistic and behavioral properties.

The extracted psycholinguistic properties include:

 Total Number of Correctly Produced Words: This metric quantifies the total count of
accurately generated verbs within the given 30-second time frame.

 Word Length: It captures the number of letters or characters in each word, reflecting
word complexity.

 Phonological Neighbors: This property counts the number of words that sound similar or
share phonetic characteristics with the generated word.

 Orthographic Neighbors: It quantifies the number of words that share similar spelling
with the generated word.

 Word Pleasantness: This indicates the emotional connotation of the word, reflecting
whether it is considered pleasant or not by individuals.

 Word Strength: It gauges the extent to which a word denotes something that is perceived
as weak or strong.

 Number of Phonemes: This represents the count of individual sounds or phonemes in the
pronunciation of a word.

 Word Frequency: This metric reflects how often the word is typically used in the
language.

 Age of Acquisition: It indicates when in a person's life a word is typically learned or
acquired.
3.2 Models

Two types of ML models were developed in this study. The first type relied on features
extracted from the psycholinguistic properties. Specifically, we calculated the average,
standard deviation, and range of each of the psycholinguistic properties reported for any
given subject. This resulted in 60 initial features. We then used these features to develop two
ML models, namely RF and NN. RF is an ensemble classifier that uses a large number of
decision trees, each fitted on a randomly selected subset of the data, for classification. RF is
generally highly robust against overfitting. For the RF model, based on preliminary results
using out-of-bag (OOB) error, 100 trees were included in the model.
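The per-subject summary features described above (average, standard deviation, and range of each property) can be sketched as follows. This is a minimal illustration, not the study's code: the property names and values are made up, and the use of the population standard deviation is an assumption, since the text does not say which variant was used.

```python
from statistics import mean, pstdev


def summarize_subject(words):
    """Collapse per-word property values into per-subject summary features.

    `words` is a list of dicts, one per produced verb, mapping a property
    name to its numeric value. The property names are hypothetical.
    """
    features = {}
    for prop in words[0]:
        values = [w[prop] for w in words]
        features[f"{prop}_mean"] = mean(values)
        # Population standard deviation (assumed; the source does not specify).
        features[f"{prop}_std"] = pstdev(values)
        features[f"{prop}_range"] = max(values) - min(values)
    return features


# Made-up property values for three verbs produced by one subject.
subject_words = [
    {"length": 3, "n_phonemes": 3, "age_of_acquisition": 3.5},
    {"length": 4, "n_phonemes": 3, "age_of_acquisition": 4.0},
    {"length": 5, "n_phonemes": 4, "age_of_acquisition": 6.0},
]
subject_features = summarize_subject(subject_words)
```

Each property contributes three features, so the three toy properties here yield nine features for the subject.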

In addition, we used NNs, another non-linear learning model for classification. NNs
pass information from an input layer through a hidden layer and finally output the results.
For the NN model, one hidden layer with 16 hidden nodes was used. The activation function
was set to rectified linear unit (ReLU). Also, the learning rate was set to 0.001. Adam
optimizer was used for model training. Lastly, features were normalized before feeding them
into the model.

Figure 3.1 Example 2D word embedding space, where similar words are closer together

Two types of machine learning (ML) models were developed to analyze the data. The first
type of model utilized features extracted from psycholinguistic properties, which are
linguistic characteristics of the words each subject produced. Specifically, the researchers calculated
the average, standard deviation, and range for each of these psycholinguistic properties for
each subject. This process resulted in the creation of 60 initial features that describe the
linguistic and cognitive profiles of the study participants.

The first ML model developed using these features is known as a Random Forest (RF). RF is
an ensemble classifier that operates by employing a large number of decision trees, each of
which is trained on a randomly selected subset of the dataset. This ensemble approach helps
to enhance the model's robustness and reduce the risk of overfitting, a common problem in
machine learning. The researchers determined that, based on preliminary results and the out-
of-bag (OOB) error, it was most effective to use 100 trees in the RF model.
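To make the out-of-bag idea concrete, the sketch below draws one random sample per tree and records which subjects were left out; those left-out subjects are the ones a tree can be evaluated on without a separate validation set. It assumes the standard bootstrap scheme (sampling with replacement), which the text does not spell out, and it fits no actual trees.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw a bootstrap sample and return (in_bag, out_of_bag) index lists."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


rng = random.Random(0)  # fixed seed, an arbitrary choice for repeatability
n_subjects = 45         # 20 AD patients + 25 controls, as in the study
n_trees = 100           # number of trees reported for the RF model

oob_fractions = []
for _ in range(n_trees):
    in_bag, out_of_bag = bootstrap_split(n_subjects, rng)
    oob_fractions.append(len(out_of_bag) / n_subjects)

# On average, roughly a third of the subjects are out-of-bag for any one tree.
mean_oob_fraction = sum(oob_fractions) / n_trees
```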

The second ML model created in the study is a Neural Network (NN). Neural networks are
non-linear models loosely inspired by the way biological neurons process signals. They transfer
information from an input layer, through one or more hidden layers, and then produce an
output. In this case, the researchers employed a NN with one hidden layer consisting of 16
hidden nodes. The activation function for this model was set to the rectified linear unit
(ReLU). They used an Adam optimizer for training the model, and the learning rate was set at
0.001. Additionally, before being input into the model, the features were normalized to
ensure that they had a consistent scale.
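The structure of this network can be illustrated with a plain-Python forward pass: features are normalized, passed through 16 ReLU hidden nodes, and mapped to one output. This is only a structural sketch. The study used Keras, the weights below are random rather than trained with Adam, the sigmoid output unit is an assumption, and the feature values are made up.

```python
import math
import random


def zscore_columns(rows):
    """Normalize each feature column to zero mean and unit variance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid division by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One hidden ReLU layer followed by a single sigmoid output unit (assumed)."""
    hidden = [relu(sum(wi * xi for wi, xi in zip(w, x)) + b)
              for w, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(wo * h for wo, h in zip(w_out, hidden)) + b_out)


rng = random.Random(0)
n_features, n_hidden = 15, 16  # 15 selected features, 16 hidden nodes

# Random, untrained weights: placeholders for values that training would learn.
w_hidden = [[rng.gauss(0, 0.1) for _ in range(n_features)] for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [rng.gauss(0, 0.1) for _ in range(n_hidden)]
b_out = 0.0

# Made-up feature rows for four subjects.
raw = [[rng.uniform(0, 10) for _ in range(n_features)] for _ in range(4)]
normalized = zscore_columns(raw)
probabilities = [forward(x, w_hidden, b_hidden, w_out, b_out) for x in normalized]
```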

The second type of ML model did not rely on features extracted from the psycholinguistic
properties. Specifically, we developed an RNN directly using the recorded verbatim responses. This
involved using the concatenated string of verbal responses for any given subject, plus the
corresponding word embeddings obtained from NLP. In particular, we used word
embeddings to convert the words into vectors, allowing the RNN to form a relationship
between different verbs produced by subjects. Figure 3.1 provides an example of how this
relationship is established in a two-dimensional word-embedding space. As seen in the figure, the
word ‘walk’, for example, is closer to the word ‘jog’ than the word ‘laugh.’ Therefore, words
that are closer in meaning (or are related in some way) have more similar vector
representations. The RNN includes one hidden layer with 50 hidden nodes. The activation
function was set to ‘sigmoid’. Adam optimizer was used for model training. The learning rate
was again set to 0.001. All models are developed in Python. For ML models, we use Keras
with the TensorFlow backend. In addition, we use pre-trained, 300-dimension word
embeddings from the spaCy package, which are trained on a corpus of web page data.
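The notion that related words have similar vectors can be illustrated with a toy example. The two-dimensional vectors below are hand-picked for illustration and are not spaCy's 300-dimensional embeddings; cosine similarity is used here as one common way to compare vectors, which is an assumption rather than something the text specifies.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hand-picked 2-D vectors for illustration only.
toy_embeddings = {
    "walk": [0.9, 0.2],
    "jog": [0.8, 0.3],
    "laugh": [0.1, 0.9],
}

sim_walk_jog = cosine_similarity(toy_embeddings["walk"], toy_embeddings["jog"])
sim_walk_laugh = cosine_similarity(toy_embeddings["walk"], toy_embeddings["laugh"])
```

With these toy vectors, 'walk' is far more similar to 'jog' than to 'laugh', mirroring the relationship shown in Figure 3.1.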
Table 3.1 Top 15 Most Important Features-RF And NN Models In The Order Of Importance

3.3 Feature Selection for RF and NN

RF allows for ranking feature importance per the total decrease in the Gini measure of node
impurities. We use this feature ranking to perform feature pruning. Specifically, the 60 initial
features are first ranked based on their importance using RF. The 15 most important features
are then selected to be used in both RF and NN models.

3.4 Input Data Tuning for the RNN and NLP Model

Recall that the concatenated string of verbal responses for any given subject was
used in the RNN and NLP model.

To improve model performance, various text string combinations were explored. This
included using concatenated strings with and without stumbling (such as “um” and “uh”), and
with and without repeated verbs, if they occurred.

3.5 Model Evaluation and Metrics

For all models, we employ five-fold cross validation. In each fold, training is done using balanced
sets, i.e., equal numbers of AD patients and healthy controls. This helps to avoid favoring
the larger group. We report means and standard deviations across
the five folds. Evaluation metrics include accuracy, F1 score, and area under the receiver
operating characteristic curve (AUC). Accuracy is the ratio of correct predictions over total
predictions. F1 is the harmonic mean of precision and recall, where precision is the
proportion of positive predictions that are correct and recall is the proportion of actual
positives that are correctly identified. AUC is the value that reflects the overall ranking
performance of a classifier.

For the Recurrent Neural Network (RNN) and Natural Language Processing (NLP) models,
the input data consists of concatenated strings of verbal responses from each subject. To
enhance the performance of these models, the study explores different combinations of text
strings. This includes variations with and without filler words (such as "um" and "uh") and
with and without repeated verbs if they are present in the responses. This fine-tuning of input
data helps to optimize the RNN and NLP models by considering different text string
combinations.
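The four input variants described above can be generated with a small helper. This is a sketch under stated assumptions: the filler list contains only the two examples named in the text, the function name and the response sequence are hypothetical, and repeats are detected by exact match on the lower-cased word.

```python
FILLERS = {"um", "uh"}  # the two fillers named in the text; a real list may be longer


def build_input_string(tokens, keep_fillers, keep_repeats):
    """Concatenate one subject's responses under one of the explored variants."""
    result = []
    seen = set()
    for token in tokens:
        word = token.lower()
        if word in FILLERS:
            # Fillers are handled separately so they never count as repeated verbs.
            if keep_fillers:
                result.append(word)
            continue
        if not keep_repeats and word in seen:
            continue
        seen.add(word)
        result.append(word)
    return " ".join(result)


# Made-up response sequence for one subject.
response = ["run", "um", "walk", "run", "uh", "jump"]

variants = {
    (keep_fillers, keep_repeats): build_input_string(response, keep_fillers, keep_repeats)
    for keep_fillers in (True, False)
    for keep_repeats in (True, False)
}
```

The variant keyed by `(False, True)` drops fillers but keeps repeated verbs.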

To assess the performance of all the developed models, a five-fold cross-validation approach
is employed. In each fold, the training is conducted using balanced sets, ensuring an equal
number of AD patients and healthy control subjects. This balanced training approach
reduces bias toward the larger group.
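One way to obtain balanced training sets within five-fold cross-validation is sketched below. The text does not say how balancing was done, so the strategy here (stratified folds, then random downsampling of the larger class in each training set) is an assumption, as are the function name and the label coding.

```python
import random


def balanced_five_fold(labels, rng, n_folds=5):
    """Yield (train_idx, test_idx) pairs with class-balanced training sets.

    Folds are stratified by label. Within each training set the larger class
    is randomly downsampled to the size of the smaller one (assumed strategy).
    """
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    for indices in by_class.values():
        rng.shuffle(indices)

    for fold in range(n_folds):
        test_idx, train_by_class = [], {}
        for label, indices in by_class.items():
            held_out = indices[fold::n_folds]
            test_idx.extend(held_out)
            held = set(held_out)
            train_by_class[label] = [i for i in indices if i not in held]
        n_per_class = min(len(v) for v in train_by_class.values())
        train_idx = []
        for indices in train_by_class.values():
            train_idx.extend(rng.sample(indices, n_per_class))
        yield train_idx, test_idx


labels = [1] * 20 + [0] * 25  # 1 = AD, 0 = control, matching the cohort sizes
folds = list(balanced_five_fold(labels, random.Random(0)))
```

With 20 AD patients and 25 controls, each training set under this scheme holds 16 subjects per class and each test set holds 9 subjects.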

The study utilizes several key evaluation metrics to assess model performance:

Accuracy measures the ratio of correct predictions to the total predictions. It indicates how
many predictions were right out of all predictions made by the model. The F1 score is a
metric that combines precision and recall. Precision is the proportion of positive predictions
that are actually correct, and recall is the proportion of actual positives that are correctly
predicted. The F1 score provides a balanced measure of a model's accuracy, particularly
when dealing with imbalanced datasets. AUC measures the overall ranking performance of a
classifier. It reflects the model's ability to distinguish between the two classes, in this case,
AD patients and healthy controls. A higher AUC indicates better discrimination capabilities.
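The metric definitions above can be written out directly. The labels and scores below are made up for illustration and are not the study's data; the 0.5 decision threshold and the pairwise formulation of AUC (the fraction of positive/negative pairs ranked correctly, with ties counted as half) are choices made for this sketch.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = AD, 0 = control) and model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
metrics["auc"] = auc(y_true, scores)
```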
CHAPTER 4

RESULT AND DISCUSSION

Feature Importance and Descriptions: In Figure 4.1, the visual representation illustrates the
importance of the top 15 features selected for inclusion in both the Random Forest (RF) and
Neural Network (NN) models. These features are central to the models' ability to make
accurate classifications. Table 3.1, on the other hand, provides detailed descriptions of these
features, shedding light on their nature and characteristics.
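As a reminder of how such importances arise, RF ranks a feature by the total decrease in Gini impurity produced by the splits that use it. The sketch below computes that decrease for a single hypothetical split and then selects the top-ranked features from a made-up importance table; the feature names, labels, and values are all illustrative.

```python
def gini_impurity(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p_pos = sum(labels) / n
    return 1.0 - p_pos ** 2 - (1.0 - p_pos) ** 2


def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted_child = ((len(left) / n) * gini_impurity(left)
                      + (len(right) / n) * gini_impurity(right))
    return gini_impurity(parent) - weighted_child


def top_k_features(importances, k):
    """Names of the k features with the largest importance values."""
    return sorted(importances, key=importances.get, reverse=True)[:k]


# One hypothetical split of eight subjects (1 = AD, 0 = control).
parent = [1, 1, 1, 1, 0, 0, 0, 0]
decrease = impurity_decrease(parent, left=[1, 1, 1, 0], right=[1, 0, 0, 0])

# Made-up importance totals; the study kept the top 15 of 60 features.
toy_importances = {"feature_a": 0.30, "feature_b": 0.05,
                   "feature_c": 0.45, "feature_d": 0.20}
selected = top_k_features(toy_importances, k=2)
```

In a full forest, these per-split decreases are accumulated over every node and tree that splits on a given feature.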

Figure 4.1 Feature Importance

Figure 4.1 displays the importance of the top 15 features that are included in the RF and NN
models. Table 3.1 presents the descriptions of these features. As seen in the figure and table,
the features are generally drawn from psycholinguistic properties relating to age of
acquisition, number of phonemes, how pleasant the word is, and the number of phonological neighbors that a
word has, among others. In addition, per the preliminary results, the text strings without
stumbling and with repeated verbs resulted in the best performance. We used this approach for
the rest of the study.

Feature Origins: These top 15 features are derived from various psycholinguistic properties.
The descriptions in Table 3.1 reveal that these properties encompass aspects related to
language and cognition, such as the age of word acquisition, the number of phonemes
(distinct speech sounds) in a word, the emotional connotation or pleasantness of a word, and
the presence of phonological neighbors (words that sound similar) for a given word. These
properties collectively capture linguistic and cognitive dimensions of the verbal responses
used in the study.

Optimal Text String Combinations: As suggested by preliminary results, the study found that
using text strings without "stumbling" (e.g., filler words like "um" and "uh") and including
repeated verbs yielded the best model performance. This implies that these specific
combinations of verbal responses, which are presumably more fluent and coherent, are most
effective in discriminating between Alzheimer's Disease (AD) patients and healthy control
subjects. Based on this finding, the researchers decided to continue using these specific text
string combinations in the subsequent phases of the study.

Table 4.1 Averages (and standard deviations) of metrics of the three ML models

Table 4.1 presents the averages and standard deviations of the evaluation metrics for RF, NN,
and RNN models. As seen in the table, RF slightly outperforms NN and RNN models. This
model is able to detect AD participants with an accuracy of 76%. Note that the RF model
relies on features extracted from psycholinguistic properties that require considerable
preprocessing of the data by subject matter experts. However, even with minimal
preprocessing of data, the RNN model is able to correctly detect AD with an accuracy of 67%.

The table presents the averages and standard deviations of evaluation metrics for the Random
Forest (RF), Neural Network (NN), and Recurrent Neural Network (RNN) models, all of
which were developed in the study to detect Alzheimer's Disease (AD) in participants.

 Model Performance Comparison: As observed in the table, the RF model slightly
outperforms the NN and RNN models. Specifically, RF achieves an accuracy of
76%, which means it assigns the correct label (AD or control) to 76% of the participants
it is evaluated on. This is a notable achievement in the context of AD detection.

 RF's Advantages: The RF model's success can be attributed to its use of features
extracted from psycholinguistic properties, which provide valuable linguistic and
cognitive insights. However, it's worth noting that this approach involves a considerable
amount of data preprocessing conducted by subject matter experts to ensure the
relevance and accuracy of these features. This meticulous data preparation contributes to
the RF model's higher accuracy.

 RNN's Simplicity: In contrast, the RNN model, while achieving a slightly lower accuracy
of 67%, offers an important advantage. It requires minimal preprocessing of the data.
This means that, compared to the RF model, the RNN model is less reliant on extensive
manual data preparation. This simplicity can be particularly valuable in terms of time and
resource efficiency.

 Implications: The study's findings suggest that the RF model, with its higher accuracy, is
a robust choice for AD detection, especially when extensive data preprocessing can be
employed. However, the RNN model, despite its slightly lower accuracy, offers a more
streamlined approach that requires less data preparation effort. This is significant because
it might make AD detection more accessible and cost-effective, particularly when
resources for extensive preprocessing are limited.

In essence, the choice between the RF and RNN models hinges on a balance between the
desire for high accuracy and the practical constraints related to data preprocessing. Each
model has its advantages, and this study provides valuable insights for researchers and
healthcare professionals seeking efficient methods for AD detection.

Figure 4.2 t-Test Comparison Of Three Models

Further, we performed paired t-tests to compare the results of the three models. Figure 4.2 lists
the p-values of these tests. As seen in the figure, the differences between the RF model and the
other models are not statistically significant. We therefore conclude that the results from the RF model
are not significantly better than those of the other two models.
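The mechanics of a paired t-test can be sketched as follows. This assumes the test is applied to per-fold scores of two models and uses a two-sided 5% significance level, neither of which the text states. The fold accuracies are made up and are not the study's results.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Return (t statistic, degrees of freedom) for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # Mean difference divided by its standard error (sample standard deviation).
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Made-up per-fold accuracies for two models; NOT the study's results.
model_a = [0.80, 0.70, 0.75, 0.85, 0.70]
model_b = [0.70, 0.75, 0.65, 0.70, 0.60]

t_stat, dof = paired_t_statistic(model_a, model_b)

# Two-sided 5% critical value of Student's t with 4 degrees of freedom.
T_CRITICAL_DF4 = 2.776
significant = abs(t_stat) > T_CRITICAL_DF4
```

In this toy case model A is ahead by 8 percentage points on average, yet the statistic stays below the critical value, so with only five folds the difference would not count as statistically significant.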

 The paired t-tests conducted in the study aimed to assess whether the differences in
performance between the Random Forest (RF) model and the other two models (Neural
Network, NN, and Recurrent Neural Network, RNN) are statistically significant. The
results of these t-tests are presented in Figure 4.2, which lists the p-values associated with
the comparisons.

 Interpreting the P-values: In statistical hypothesis testing, the p-value is a crucial
measure. It quantifies the strength of evidence against the null hypothesis. A smaller p-
value suggests stronger evidence against the null hypothesis, while a larger p-value
indicates weaker evidence.

 Findings from the Table: As indicated in the table, none of the comparisons
between the RF model and the other models reaches statistical significance. In other
words, the differences in performance between the RF model and the NN and RNN
models are small enough that they could plausibly be due to chance.

 Implications: This finding has important implications. It suggests that, based on the data
and the evaluation metrics used in the study, there is no compelling statistical evidence to
conclude that the RF model significantly outperforms the NN and RNN models or vice
versa. The models appear to have comparable performance in terms of accuracy, F1
score, and AUC, as indicated by the non-significant p-values.

 In practical terms, this means that researchers and practitioners have flexibility in
choosing between these models for Alzheimer's Disease (AD) detection. The choice can
be influenced by factors such as the level of data preprocessing needed, resource
availability, and the specific goals of the study. The findings emphasize the importance
of considering not only model performance but also the practical aspects of model
implementation when making decisions in AD detection research.
CHAPTER 5

CONCLUSION

Our results demonstrate that we can correctly detect AD with above-chance accuracy
using NLP, even when using an RNN that requires almost no preprocessing of subjects’ VF
data. Our accuracy scores fall within the reported accuracy ranges of several clinical AD
detection methods, such as EEG and brain scans, that are considerably more costly and time-
consuming than VF tasks. Our results thus show promise for detecting AD using data-driven
methods without resorting to cost-prohibitive, invasive, or time-consuming clinical
procedures. As indicated by our results, RF performs slightly better than NN and RNN at
detecting AD. However, the differences are not statistically
significant. It is worth noting that the RF requires considerable data preprocessing. While the
RF requires analysis and computation of psycholinguistic properties, the RNN simply
requires the concatenation of the subjects’ verbs. The latter methodology provides a much
more efficient, time- and cost-saving means to detect AD with 67% accuracy, and can easily
be conducted remotely. Given these benefits and the insights derived here regarding its
potential effectiveness, further exploration into using an RNN with NLP after collecting
subjects’ verb listings stands out as a worthy venture. More specifically, future work may
include further refining and tuning the RNN and NLP while studying patient-specific
covariates, including age and comorbidities, more comprehensively. It is also worth
investigating whether analyzing different categories of verbal fluency tasks (e.g., semantic,
phonemic, and verb fluency) simultaneously adds value to the detection of AD, given the
distinctive psycholinguistic processes demanded by each task and each task’s sensitivity to
different aspects of cognitive decline in AD. Lastly, we acknowledge that one of the
limitations of this study is the small sample size. Hence, further studies using larger data sets
are needed to reproduce the current findings and build upon them.

Results are quite promising, demonstrating the potential for accurate Alzheimer's Disease
(AD) detection using Natural Language Processing (NLP), particularly with the utilization of
a Recurrent Neural Network (RNN). Notably, even the RNN, which requires minimal
preprocessing of subjects' Verb Fluency (VF) data, achieves an accuracy rate of 67%. These
findings are significant because they indicate the feasibility of using data-driven methods for
AD detection without resorting to invasive, costly, or time-consuming clinical procedures.

Comparison with Clinical Methods: The study's results are particularly encouraging when
compared to established clinical AD detection methods such as EEG and brain scans. These
clinical methods are considerably more costly and time-consuming than VF tasks. The
accuracy scores achieved in this study using NLP and the RNN model fall within the accuracy
ranges reported for these clinical methods. Therefore, NLP-based AD detection, while non-invasive and more
cost-effective, shows potential for delivering accurate results.

The results also reveal that the Random Forest (RF) model performs slightly better in AD
detection when compared with the Neural Network (NN) and RNN models. However, the
differences in performance among these models are not statistically significant. An important
consideration here is that RF involves extensive data preprocessing, whereas the RNN model
requires minimal preparation, making it more efficient, cost-effective, and suitable for remote
applications.

Future Directions and Exploration: The study points to several promising avenues for future research. First,
further refinement and tuning of the RNN and NLP models can enhance their performance.
Additionally, it is advisable to study patient-specific covariates, including age and
comorbidities, more comprehensively to improve the models' accuracy. Exploring the
simultaneous analysis of different categories of verbal fluency tasks, such as semantic,
phonemic, and verb fluency, may add value to AD detection, as each task can tap into distinct
psycholinguistic processes and cognitive declines in AD.

The study acknowledges one of its limitations, which is the relatively small sample size. To strengthen the robustness
and generalizability of the findings, further research with larger data sets is recommended.
This will not only help reproduce the current results but also allow for building upon them
and refining the models for more accurate and widespread AD detection.

The study demonstrates that Alzheimer's Disease (AD) can be detected with above-chance
accuracy using Natural Language Processing (NLP), particularly with a Recurrent Neural
Network (RNN). This method requires minimal preprocessing of subjects' Verb Fluency (VF)
data, making it a promising non-invasive approach.

The accuracy scores obtained in this research are comparable to reported accuracies of more
costly and time-consuming clinical AD detection methods, such as EEG and brain scans. This
highlights the potential of data-driven approaches, like NLP and RNN, to provide a cost-
effective alternative.

While Random Forest (RF) performs slightly better in AD detection compared to Neural
Network (NN) and RNN, the differences are not statistically significant. Notably, RF requires
substantial data preprocessing, whereas RNN simply involves concatenating subjects' verbs.
This makes the RNN method more efficient, cost-effective, and suitable for remote use.

The study suggests further exploration of using RNN with NLP after collecting subjects' verb
listings, indicating potential for refining and enhancing the method. Additionally, future work
could involve a more comprehensive study of patient-specific factors, including age and
comorbidities, to improve AD detection accuracy.

Investigating various categories of verbal fluency tasks (e.g., semantic, phonemic, and verb
fluency) concurrently could provide valuable insights into their combined effectiveness in
AD detection. Understanding the distinct psycholinguistic processes involved in each task
may refine the diagnostic approach.

The study recognizes the limitation of a relatively small
sample size. Therefore, larger datasets are recommended for future studies to validate and
expand upon the current findings.
REFERENCES

[1] A. Soni, B. Amrhein, M. Baucum, and E. J. Paek, “Using verb fluency, natural
language processing, and machine learning to detect Alzheimer’s disease,” in 2021 43rd
Annual International Conference of the IEEE Engineering in Medicine & Biology Society
(EMBC), Oct. 31 - Nov. 4, 2021.

[2] J. Elflein, “Death rate due to Alzheimer’s disease in the U.S. 2000-2019,”
Alzheimer’s Association, 2021.

[3] W. Jarrold, B. Peintner, D. Wilkins, D. Vergyri, C. Richey, M. L. Gorno-Tempini,
and J. Ogar, “Aided diagnosis of dementia type through computer-based analysis of
spontaneous speech,” in Proceedings of the Workshop on Computational Linguistics and
Clinical Psychology: From Linguistic Signal to Clinical Reality. Baltimore, Maryland,
USA: Association for Computational Linguistics, Jun. 2014, pp. 27–37.

[4] D. Shibata, S. Wakamiya, A. Kinoshita, and E. Aramaki, “Detecting Japanese
patients with Alzheimer’s disease based on word category frequencies,” in Proceedings
of the Clinical Natural Language Processing Workshop (ClinicalNLP). Osaka, Japan:
The COLING 2016 Organizing Committee, Dec. 2016, pp. 78–85.

[5] K. Palmer, L. Bäckman, B. Winblad, and L. Fratiglioni, “Detection of Alzheimer’s
disease and dementia in the preclinical phase: population based cohort study,” BMJ, vol.
326, no. 7383, p. 245, 2003.

Assignment 1-ML
No ratings yet
Assignment 1-ML
4 pages
Guidelines for Preparing the Presentation Slides for Intro to Data Science and AI Project
No ratings yet
Guidelines for Preparing the Presentation Slides for Intro to Data Science and AI Project
2 pages
Mobile Phone price classification and Prediction - Final Project
No ratings yet
Mobile Phone price classification and Prediction - Final Project
7 pages
Software Defect Prediction: A Survey With Machine Learning Approach
No ratings yet
Software Defect Prediction: A Survey With Machine Learning Approach
6 pages
Ground Water Level Prediction: Srigurulekha K. & Dhivya S
No ratings yet
Ground Water Level Prediction: Srigurulekha K. & Dhivya S
11 pages
11 - Vietnamese Text Classification and Sentiment Based
No ratings yet
11 - Vietnamese Text Classification and Sentiment Based
3 pages
Bank Fraud Prediction
No ratings yet
Bank Fraud Prediction
16 pages
AIML Syllabus
No ratings yet
AIML Syllabus
3 pages
Modeling Intrusion Detection System Using Hybrid Intelligent Systems
No ratings yet
Modeling Intrusion Detection System Using Hybrid Intelligent Systems
21 pages
Elgendy GDLFCV MEAP V01 ch1
No ratings yet
Elgendy GDLFCV MEAP V01 ch1
48 pages
Paper 6
No ratings yet
Paper 6
8 pages
Soft Computing Techniques (ECE - 425)
No ratings yet
Soft Computing Techniques (ECE - 425)
2 pages
Data Mining Week 1 2
No ratings yet
Data Mining Week 1 2
117 pages
Lec1 Intoduction
No ratings yet
Lec1 Intoduction
34 pages
Computer Vision and Simulation
No ratings yet
Computer Vision and Simulation
191 pages
Prediction of Heart Disease Using Machine Learning Algorithms: A Survey
No ratings yet
Prediction of Heart Disease Using Machine Learning Algorithms: A Survey
6 pages
CNN Course V1.3
No ratings yet
CNN Course V1.3
19 pages
Baltruschat Et Al. - 2019
No ratings yet
Baltruschat Et Al. - 2019
10 pages
Chapter 3 - Logistic Regression
No ratings yet
Chapter 3 - Logistic Regression
33 pages
Types of Kernels in Support Vector Machines
No ratings yet
Types of Kernels in Support Vector Machines
14 pages
(IJETA-V11I3P52) :dr. Sunil Kumar Nandal, Vikash
No ratings yet
(IJETA-V11I3P52) :dr. Sunil Kumar Nandal, Vikash
7 pages
Unit 26 Machine Learning - Assignment 02
No ratings yet
Unit 26 Machine Learning - Assignment 02
58 pages





