Paper 26-Early Detection of Autism Spectrum Disorder

The document discusses using machine learning models to detect autism spectrum disorder (ASD) early by analyzing parents' dialog. Traditional machine learning algorithms (SVM, logistic regression, KNN, and random forest) were applied to sentiment analysis of sentences about children's symptoms, achieving accuracies of 71%, 71%, 62%, and 69%, respectively. A cosine similarity model was also used to detect specific ASD problems from categorized sentences.

(IJACSA) International Journal of Advanced Computer Science and Applications,

Vol. 14, No. 6, 2023

Early Detection of Autism Spectrum Disorder (ASD) using Traditional Machine Learning Models
Prasenjit Mukherjee1, Sourav Sadhukhan2, Manish Godse3, Baisakhi Chakraborty4
Dept. of Technology, Vodafone Intelligent Solutions, Pune, India 1
Dept. of Computer Science, Manipur International University, Manipur, India 1
Dept. of Finance, Pune Institute of Business Management, Pune, India 2
Dept. of IT, BizAmica Software, Pune, India 3
Dept. of Computer Science and Engg, National Institute of Technology, Durgapur, India 4

Abstract—Autism Spectrum Disorder (ASD) is a mental disorder among children that is difficult to diagnose at an early age. People with ASD have difficulty functioning in areas such as communication, social interaction, motor skills, and emotional regulation. They may also have difficulty processing sensory information and understanding language, which can lead to further difficulty in socializing. Early detection can help with learning coping skills, communication strategies, and other interventions that can make it easier for them to interact with the world. This kind of disorder is not curable, but it is possible to reduce the symptoms of ASD. Early detection of ASD helps to start several therapies corresponding to the ASD symptoms. The detection of ASD symptoms at an early age is our main problem: traditional machine learning algorithms, namely Support Vector Machine, Logistic Regression, K-nearest neighbor, and Random Forest classifiers, have been applied to parents' dialog to understand the sentiment of each statement about their child. After these models complete their predictions, each positive ASD symptom-related sentence is used in the cosine similarity model for the detection of ASD problems. Samples of parents' dialogs have been collected from social networks and special child training institutes. Data has been prepared according to the model for sentiment analysis. The accuracies of the proposed classifiers are 71%, 71%, 62%, and 69%, respectively, on the prepared data. Another dataset has been prepared in which each sentence refers to a particular categorical ASD problem, and it has been used in the cosine similarity calculation for ASD problem detection.

Keywords—Support vector; logistic regression; cosine similarity; K-nearest neighbor; random forest

I. INTRODUCTION

People with ASD [1] often have difficulty in understanding the social cues and expectations that are necessary for meaningful conversations and relationships with others. This can lead to isolation, difficulty in forming relationships, and, in some cases, difficulty in gaining recognition in society as in [2]. Early detection can help identify the illness sooner, allowing for personalized treatments or preventive measures to be put in place that can help reduce the severity of the illness and improve the chances of recovery as in [3]. ASD is caused by a combination of genetic and environmental factors that affect the development of the brain. It is characterized by difficulty in social interaction, communication, and repetitive behaviors. Research has been done to identify the causes of this syndrome, which include genetic predisposition, environmental factors, and lifestyle choices. Although the exact cause is still unknown, the available evidence shows that it is a multi-faceted condition. In addition, the lack of trained professionals and resources to diagnose and treat ASD [1] has created a huge gap in access to care. Furthermore, due to the complexity of the disorder, it can be difficult to diagnose and properly classify it, leading to misdiagnosis or delayed diagnosis. This is because autism is a complex disorder, and it can manifest itself differently in each affected individual [4]. As such, it is difficult to create a single biomarker that can accurately detect the disorder. Additionally, research into developing tools and applications, data analysis, and pattern recognition [5][6] to help identify children with autism is challenging, as it requires creating a comprehensive program that can detect subtle signs of autism across a range of contexts as in [7]. People with autism may struggle with understanding social cues, interpreting and responding to others' emotions, and forming relationships. They may also have difficulty with processing sensory information or have strong interests in certain topics or activities. Diagnosis is based on observed behavior, and the process can involve interviews and questionnaires, cognitive assessments, physical examinations, and genetic and neurological tests. All of these evaluations can take time and money, and the cost can be prohibitive for some families. These tests are designed to identify patterns of behavior and symptoms associated with autism by asking parents and professionals to observe the individual. They then analyze the responses and compare them to a set of criteria established to identify autism or other developmental disorders. For example, if a person is using a metal detector, they must have an understanding of the type of metal they are looking for and the size of the object they are searching for. The quality of the metal detector will also have an impact on the accuracy and efficiency of the screening method. Such systems can use algorithms to analyze large amounts of data and detect patterns with high accuracy, potentially leading to earlier and more accurate diagnoses. Additionally, such systems can help to automate certain labor-intensive tasks and reduce the amount of time needed to complete diagnostic tests. This is because machine learning algorithms can analyze large amounts of data and identify patterns and correlations that would be difficult or impossible for humans to find. The algorithms can then be used to develop predictive models that can accurately identify potential diagnoses and suggest

www.ijacsa.thesai.org

therapies as in [8]. Some researchers have worked on ASD diagnosis using machine learning. The aim of this research is to reduce the classification time of the ASD diagnosis process after detecting the most influential ASD diagnosis items as in [9][10][11][12]. Machine learning (ML) is a powerful tool that can be used to analyze vast amounts of data and identify patterns that can be used to detect mental health issues. ML can also be used to develop personalized treatments based on individual patient characteristics. This could potentially lead to more targeted and effective treatments for mental health issues. Through the use of data-driven techniques, ML enables the analysis of large amounts of data to uncover previously unknown patterns, trends, and correlations. ML can be used to develop predictive models or to recommend interventions that may be tailored to individual needs. The challenges include the need to ensure responsible data collection and storage, to develop equitable access to ML-enabled solutions, to ensure ethical and responsible use of ML and AI, and to ensure that privacy and confidentiality are maintained as in [13].

The proposed work is based on the detection of ASD symptoms from parents' dialogue. Parents of autistic children have the best experience with their autistic children's symptoms. The data has been collected from many social sites and organizations for special children. The data consists of parents' dialogue in text form, and a dataset has been prepared from these text inputs. Traditional machine learning models, namely SVM, Logistic Regression, K-nearest neighbor (KNN), and Random Forest, have been used to detect the symptoms from the parents' text. Sentiment analysis has been used to detect sentences from the parents' text. After the proposed machine learning models complete their predictions, the positive sentences are used as input to the cosine similarity model. This model calculates the cosine similarity between the input sentences and ASD symptom sentences to detect ASD problems. Machine learning-based applications related to mental disorders are discussed in Section II. The proposed dataset, the detailed architecture of the proposed system, and the machine learning models are discussed in Section III. The results of the proposed system are discussed in Section IV. The limitations are given in Section V, the conclusion in Section VI, and future work in Section VII.

II. RELATED WORKS

Today, Autism Spectrum Disorder (ASD) is a highly prevalent disorder among children. It is now one of the main topics in the healthcare domain, and much research has been done using Artificial Intelligence (AI). A few important AI-based research works on mental health related issues are included in this section.

These NLP software tools use a combination of natural language processing (NLP) algorithms and domain-specific ontologies to identify and extract biomedical concepts from unstructured texts. The ontologies provide an organized representation of biomedical concepts, and the NLP algorithms enable the software to accurately identify the concepts in the text. This is due to the fact that the existing literature on these disorders is often written in a complex, highly technical language that is difficult to parse and interpret with natural language processing tools. Additionally, many of the diseases are multi-faceted and involve a variety of clinical terms that need to be identified by the NLP tools in order to accurately extract relevant information. The authors evaluated the predictive performance using precision, recall, and F1 score. They also ran a manual evaluation to compare the manual annotation of ASD-related terms with the tools' extracted terms, and found that CLAMP outperformed the other two tools in terms of precision, recall, and F1 score on both the abstracts and full-text articles. The F1 score combines the precision and recall of a system, so it takes into account both the accuracy and completeness of the system. In this case, CLAMP had the highest F1 score, meaning it had both a higher precision and a higher recall than the other two systems. This type of analysis protocol allows researchers to better identify, classify, and quantify the symptoms of a disorder, even when there is not a well-defined terminology set to describe it. This makes it easier to compare the presentation of the disorder across different populations and can help to identify potential biomarkers for the disorder as in [14].

People with ASD had more difficulty in expressing emotions and abstract concepts than typically developing individuals, as well as difficulty in using language to describe events and convey information. This suggests that impairments in the use of pragmatic language are an important aspect of ASD and should be addressed in interventions. It also suggests that the differences in narrative production between ASD and control groups are related to difficulties in understanding and expressing emotions, as well as in producing more abstract language. The individuals with typical development had a more varied range of vocabulary, which included more words with both positive and negative sentiments, while the participants with ASD displayed a limited vocabulary, resulting in a greater tendency to use negative words. The lower level of language abstraction in the ASD narratives could be due to the limitation of their vocabulary and the difficulty of expressing abstract concepts. This suggests that language abstraction and emotional polarity can be used to measure the narrative abilities of individuals with ASD without relying on age or IQ scores. The strong positive correlation between linguistic abstraction and emotional polarity indicates that the more abstract the language used, the more likely it is to contain emotional content. The difference in emotional polarity between the two groups could be due to the fact that individuals with ASD may have difficulty recognizing and expressing emotions. In addition, they may have difficulty understanding abstract language concepts, which could explain why they used fewer abstract words in their narratives as in [15].

One of the most promising areas for developing assistive tools is the use of artificial intelligence (AI) and machine learning (ML) algorithms. These algorithms can be used to analyze data from various sources and can provide insights that may help diagnose ASD earlier and more accurately. The proposed approach is expected to find the underlying patterns in the eye-tracking records, which can be used to accurately diagnose the disorders. The results of this study could provide clinicians with a powerful tool that could potentially improve the


accuracy and speed of diagnosis. By applying NLP methods to the raw eye-tracking data, the study was able to extract meaningful features from the data that could be used to train classification models. The experiment showed that using these features could yield better results than using the raw data alone. The authors [16] used a customized loss function to adjust the weights of the model, which allowed them to achieve a high level of accuracy. Additionally, the authors [16] utilized transfer learning to fine-tune the model, allowing them to further improve its accuracy. Their approach could realize a promising classification accuracy (ROC-AUC up to 0.8) as in [16].

Social behavior issues are often the most noticeable in children with autism, and they may include difficulty forming relationships, lack of eye contact, and difficulty understanding nonverbal communication. Clinical tests can also be used to look for developmental delays, such as difficulty with speech and language, as well as repetitive behaviours like hand flapping or rocking. The assessment process is designed to identify key characteristics of autism in individuals, such as difficulty in communication and social interaction, and to determine the severity of the condition. By using semi-structured data posted on Twitter, the team of doctors can gain insight into the individual's behavior, which can then be used to develop a more accurate and effective assessment. Analyzing the tweets allows researchers to detect the sentiment of people's opinions on autism, the topics that are most commonly discussed, and the language used to discuss autism. This helps researchers gain a better understanding of how people think and talk about autism, and can help inform policy decisions. NLP and topic modeling allow for more efficient processing of data by automatically recognizing patterns and keywords, saving time and effort. Furthermore, the results of the analysis are highly accurate, making them an ideal choice for studying topics such as genetic analysis, the effect of vaccination, and behavior analysis. The 10k tweets dataset is enough to provide in-depth analysis and insight into these topics. The analytical results are used to learn the genetic impact on ASD and the vaccination effect on ASD, and also to learn the behavior changes and population of autistic children as in [17].

ADHD is characterized by a persistent pattern of inattention and/or hyperactivity-impulsivity that interferes with functioning or development. It is often accompanied by other mental health disorders, such as anxiety and depression, which can further impair functioning and quality of life. The authors applied the CNN model to the EEG data in order to distinguish between ADHD patients and healthy controls. The CNN was able to accurately classify the EEG data with an accuracy of 90.3%, significantly outperforming other methods, particularly of event-related potentials (ERP) from ADHD patients (n = 20) and healthy controls (n = 20) collected during the Flanker Task, with 2800 samples for each group. By exploiting invariances, deep networks are able to classify data even when there are variations in the data, such as changes in lighting or orientation of an image. Compositional features are combinations of basic elements that form a more complex representation of the data, such as edges and shapes in an image. Deep networks are able to identify these features, which enables them to accurately classify data. This was achieved by using a Convolutional Neural Network (CNN) that was trained on EEG data from patients with Parkinson's Disease in order to classify them as either having the disease or not. The CNN was able to extract relevant features from the data without any manual input, resulting in a higher accuracy than other machine learning approaches. This is because CNNs can learn more complex patterns from the data and have the ability to generalize to new data. Event-related spectrograms capture more information about the events of interest, which can be used to extract more accurate features than resting state EEG spectrograms. This suggests that these techniques can be used to identify and visualize the underlying physiological differences between neurological disorders and healthy brains, potentially leading to a better understanding of their underlying pathophysiology. Deep networks are useful because they can extract meaningful patterns from EEG signals and are capable of handling large amounts of data. These results suggest that deep networks can also be used to analyze EEG dynamics from smaller datasets, which could be used to develop biomarkers for clinical use as in [18].

EEG can provide valuable information to help diagnose ADHD in children because it can measure electrical activity in the brain and detect any abnormal electrical activity that may be indicative of ADHD. Additionally, EEG can help to differentiate ADHD from other mental disorders that may be present in the child. Symptoms of ADHD include difficulty paying attention, impulsivity, and hyperactivity. These symptoms can interfere with a child's ability to learn, manage emotions, and interact with peers. Video long-range EEG monitoring can provide more accurate and detailed information about the brain activity of children with ADHD compared to ambulatory EEG monitoring, as it allows for more frequent data collection and better visualization of the EEG data. It also helps to identify abnormal brain electrical activities which may be associated with ADHD, thus aiding in the diagnosis of the condition. By doing this, they were able to accurately identify children with ADHD and study their behavioral patterns in order to better understand and treat the disorder. This allowed for a more precise and detailed analysis than traditional methods of observation. Comparing the results of various models can help to identify which model is best suited for recognizing signs of ADHD in EEG data. By selecting the most accurate and appropriate model, researchers can then use it to build a recognition method that can diagnose children with ADHD more accurately. This is because long-term video EEG can detect the abnormal EEG patterns associated with ADHD, such as slow wave activity, and can also detect the degree of attention fluctuation in children with ADHD as in [19].

With the recent advances in artificial intelligence, computers can now analyze EEG data and provide results much faster than a neurologist. This has enabled the field of neurology to become much more efficient and provide more accurate results in a fraction of the time. This is made possible because AI is able to quickly analyze and process large amounts of data. It can quickly identify patterns and draw conclusions from the data that would take humans hours or even days when detecting ADHD. Additionally, AI can look for indicators of diseases or abnormalities that would be difficult for humans to find on their own. This is because it can automate the process of analyzing EEG signals, thus allowing neurologists to quickly and accurately identify


patterns associated with different neurological diseases. Furthermore, this technology can also help neurologists to identify subtle changes in EEG signals that could potentially signal the onset of a neurological disorder. The ML model can process the EEG signals quickly and accurately to detect patterns that may indicate ADHD. By making use of the data generated from the EEG signals, the ML model can diagnose ADHD more accurately and quickly than traditional methods. By analyzing the EEG signals, the ML model can identify patterns that are indicative of ADHD. Additionally, the ML model can be trained to recognize these patterns more quickly and accurately than traditional methods. With the right pre-processing techniques and machine learning algorithms, the ML model can provide a more accurate diagnosis of ADHD than traditional methods as in [20].

This allows individuals to stay connected with their friends and family and to keep up with what is going on in the world. Additionally, it makes it easier to stay in touch with people who are not in the same physical location, making it a great way to stay connected during this time. The pandemic has had a negative impact on the mental health of many people, and it has become harder for them to access in-person support. As a result, online tools and resources have become more important than ever for those struggling with mental health issues, allowing them to get the help they need even when they are unable to leave their homes. Mental health conditions can have a significant impact on an individual's overall well-being, affecting their ability to work and their relationships with others. Additionally, research has found that mental illnesses can increase an individual's risk of developing chronic physical health conditions, such as heart disease and diabetes. AI methods can help mental health providers to detect patterns in patient data that might otherwise go unnoticed, as well as to generate insights into the patient's current state. This can lead to more accurate diagnoses and better treatment plans, leading to better overall outcomes for the patient. AI can help to analyze patient data quickly and accurately, identify patterns and correlations, and make predictions about the best course of action for a patient's diagnosis and treatment. AI can also help reduce the time and resources required for manual data analysis and provide more efficient and cost-effective care. The models were tested on a labeled dataset of Reddit posts from users with self-reported mental illnesses and compared against a baseline model. The results showed that the machine learning, deep learning, and transfer learning models outperformed the baseline model in correctly classifying the different mental illnesses. This will help to reduce the amount of time it takes to identify and respond to medical emergencies, which will ultimately lead to more lives being saved. Additionally, it will also help to reduce the burden on healthcare workers, which will make the public health system more efficient and cost-effective as in [21].

A variety of factors can contribute to depression, such as genetics, brain chemistry, environmental influences, traumatic experiences, and other medical conditions. Additionally, depression can be caused by a combination of these factors, making it difficult to pinpoint a single cause. Genetics and brain chemistry can predispose someone to depression, while environmental factors and traumatic experiences can trigger its onset. Other medical conditions such as chronic illnesses can also be associated with depression. Recognizing the early signs of depression can help to identify and address the issue before it becomes a more serious problem. The CNN is used to extract high-level features from speech signals, while the SVM classifier is used to classify the extracted features. The hybrid model is trained on a dataset of Arabic speech from people with depression and those without, to produce a model that is capable of distinguishing between the two. The hybrid model uses a combination of convolutional neural networks (CNNs) and support vector machines (SVMs) to analyze the data and make predictions, while 30% of the data were used to test the proposed model. The hybrid model (CNN + SVM) attained accuracy rates of 90.0% and 91.60% in predicting depression from the data. This combination of techniques allows the model to process the data quickly and accurately, resulting in the high accuracy rates it achieved. This is likely because the hybrid model combines the strengths of both models. The RNN can accurately make predictions based on the context of the data, while the CNN can detect the most important features in the data. By combining both models, the predictive power of the hybrid model is enhanced. The RNN achieved accuracy rates of 80.70% and 81.60%. This indicates that the combined model was more effective in classifying depression than either of the individual models alone. The results suggest that incorporating multiple models into one prediction system can increase the accuracy of the diagnosis. The achieved findings can be used to identify key indicators of depression in spoken Arabic, such as speech patterns, intonation, and pauses. These indicators can then be used to identify individuals who may be suffering from depression and help physicians, psychiatrists, and psychologists provide more effective treatment as in [22].

Mental health issues, such as depression and anxiety, are becoming more common, and people are recognizing the need to prioritize their mental health as well as their physical health. Additionally, with the development of telehealth services, it has become easier for people to access mental health services regardless of their location. This means that most people who suffer from mental health issues are unable to get access to the right diagnosis and treatment, resulting in an overall decrease in the mental health of the population. The model will be trained on a dataset of speech samples from people with and without depression. Exploring the acoustic features and patterns in the speech samples of people with depression will help to identify the differences between those with and without depression. By doing so, it will be possible to detect signs of depression in an individual and provide an initial diagnosis of mental health problems. This model uses Natural Language Processing (NLP) techniques to analyze the text and determine the sentiment of the posts. The sentiment of the posts is then used to assess an individual's mental health status as in [23].

A comparative analysis has been done between the proposed systems, which are equipped with machine learning models, and similar types of systems that are also based on machine learning models. Table I contains 'Models' as the first attribute, where each model name is defined. The 'Description' attribute contains details about the models. The third attribute is 'Dataset', which refers to the dataset details, and the fourth attribute is 'Accuracy', where each model's accuracy is given. The last attribute is 'Remarks' about each model. Fig. 1


shows the accuracy graph of similar machine learning models and proposed machine learning models.
TABLE I. COMPARATIVE STUDY OF PROPOSED MODELS WITH SIMILAR TYPE MODELS IN MENTAL DISORDERS

Similar Type Machine Learning Models in Mental Disorders

| Sl.No. | Models | Description | Dataset | Accuracy | Remarks |
|---|---|---|---|---|---|
| 1 | CNN, RNN, SNN [18] | Deep learning CNN, Recurrent Neural Network (RNN), and SNN are used for classification and comparison to detect Attention deficit hyperactivity disorder (ADHD). | EEG data has been used. | 88%, 86%, and 78% | EEG is a medical test that measures electrical activity in the brain. This data is very high volume and is time and cost-effective. |
| 2 | Fully connected neural network model [19] | Neural Network-based Deep Learning Model to detect disorders like ADHD. | Deep learning long-range EEG big data. | 97.7% | The data is long-range EEG big data, which is very high volume data for analysis. |
| 3 | KNN, SVM, and RF [20] | KNN, SVM, and RF models are trained with the EEG signals data to detect ADHD. | EEG signals data of ADHD | 69%, 72%, and 74% | Much time has to be given for preprocessing to improve the quality of EEG signals. |
| 4 | Linear Support Vector Classifier, LR, NB, and RF [21] | Depression, anxiety, bipolar disorder, ADHD, and PTSD detection from unstructured data. | Unstructured user data on the Reddit platform has been used. | 79%, 79%, 74%, and 75% | Reddit's post-dataset cleaning process is related to removing personal information, punctuation marks, and URLs. |
| 5 | CNN + SVM [22] | Intelligent system to detect depressive symptoms using speech analysis | Basic Arabic Vocal Emotions Dataset (BAVED) | 90 and 91.60 | The dataset has been prepared from the audio format for sentiment analysis. |
| 6 | RNN + CNN [22] | Intelligent system to detect depressive symptoms using speech analysis | Basic Arabic Vocal Emotions Dataset (BAVED) | 88.50 and 86.60 | The dataset has been prepared from the audio format for sentiment analysis. |

Proposed Models in Mental Disorder (Autism Spectrum Disorder)

| Sl.No. | Models | Description | Dataset | Accuracy | Remarks |
|---|---|---|---|---|---|
| 7 | Proposed SVM | SVM model to predict positive ASD symptoms from parents' dialogue. | Parents' Dialogues of Autistic Children in text format from SAHAS-Durgapur, India, and Social Sites. | 71% | The data has been collected in text form. The parents' dialogues about their autistic children are very useful because they shared their experiences and thoughts about their autistic children. A parent of an autistic child is the best source to understand the ASD symptoms patterns. |
| 8 | Proposed Logistic Regression | Logistic Regression model to predict positive ASD symptoms from parents' dialogue. | Parents' Dialogues of Autistic Children in text format from SAHAS-Durgapur, India, and Social Sites. | 71% | Same as row 7. |
| 9 | Proposed K Nearest Neighbor (KNN) | KNN model to predict positive ASD symptoms from parents' dialogue. | Parents' Dialogues of Autistic Children in text format from SAHAS-Durgapur, India, and Social Sites. | 62% | Same as row 7. |
| 10 | Proposed Random Forest | Random Forest model to predict positive ASD symptoms from parents' dialogue. | Parents' Dialogues of Autistic Children in text format from SAHAS-Durgapur, India, and Social Sites. | 69% | Same as row 7. |


[Figure 1 image: bar chart with ACCURACY on the Y axis (0 to 100) and MACHINE LEARNING MODELS on the X axis, covering the models of [18]-[22] (SNN, CNN, RNN, NN, KNN, SVM, RF, LSVC, LR, NB) and the proposed SVM, LR, KNN, and RF models.]

Fig. 1. Accuracy graph of similar ML models and proposed ML models.

III. ARCHITECTURE OF PROPOSED MODELS

A few traditional machine learning classifiers have been used to identify ASD symptoms from parents' dialogues. SVM is the first classifier used to identify the symptoms from the parents' dialogue. Logistic regression is the second classifier, and KNN and Random Forest are the last two; all of them identify ASD symptoms from the same dataset.

A. Dataset of Proposed System

The dataset has been prepared from parents' dialogues in which parents describe their thoughts and experiences about their own autistic child. These data have been collected from several social networks and from organizations where special children take therapies for communication, speech, and behavior. A few example dialogues are given in Table II. Parents' dialogues are very important data from which all possible symptoms of ASD can be identified. The given dialogues are used to build the dataset for training and testing the proposed machine learning models.

TABLE II. EXAMPLE OF PARENTS' DIALOGUES

| Sl. No. | Parents' Dialogues |
|---|---|
| 1. | My second son is 4 and also autistic; he's on the move always and always into something and he's also a big momma's boy, loves hugging and cuddling me. I'm nervous about bringing baby home. Idk how he'll handle it. Any advice? |
| 2. | Hi. Please I need some advice. My son is 10 and from a few years is very hard to make him do some activities (writing and staff like that) At school he refuse. They are not able to make him do anything. At school just play and if say no to him he just scream. He doesn't want to do anything; (in terms of studying or activities). I really don't know what to do. |
| 3. | I'm currently having problems with washing my (almost 2 year old) daughter's hair. Whenever i try, she basically goes ballistic and throws a fit. She's scared and I'm trying to figure out how to support her and make her feel safe because she does have to get hair washed. Any suggestions and things that have worked for you? |
| 4. | My youngest with autism, learning disabilities and is non-verbal, will be 4. She has to be in a pushchair whilst out and about for safety as has zero sense of danger. I'm struggling to find a double pushchair suitable for a newborn and my will be 4 year old. If anyone can send any links or pictures that would be great. |
| 5. | From few days my son eye movements strangely like keeping head down n seeing up and moving eye balls to the corners of the eyes. Can anyone suggest why he is doing so? Please... thanks! |

The dataset has been prepared from the text in Table II. Each sentence is considered individually to decide whether it describes a symptom of ASD or not. There is no fixed set of symptoms for identifying ASD. Adding more dialogues from parents of autistic children would help to identify more symptoms and to train the machine learning models for better accuracy. A few examples of data from the proposed dataset are given in Table III.

TABLE III. EXAMPLE DATA IN THE PROPOSED DATASET

| Sl. No. | Comments | Sentiment |
|---|---|---|
| 1. | because all they do there is play with toys with him every time | 1 |
| 2. | I'm confused guys help my son is 3years old now | 0 |
| 3. | My little girl is 3 and a half and still non verbal | 1 |
| 4. | he does is mumbles only no proper words | 1 |
| 5. | I was really surprised when he came home with iep papers | 0 |

The dataset structure used in the proposed research is shown in Table III, where the first column is Serial Number, the second column is Comments, and the third column is Sentiment. Paragraph text from parents' dialogues has been taken to prepare the dataset. Each sentence is taken from the paragraph text and examined to decide whether it is a symptom of ASD or not. If it is a symptom of ASD then it is labeled as 1 (true), otherwise 0 (false). According to Table III, sentences in the Comments column with serial numbers 1, 3,


and 4 are true symptoms of ASD whereas serial numbers 2 and 5 are not symptoms. This ASD symptom-based dataset has been prepared to train traditional machine learning models: SVM, Logistic Regression, KNN, and Random Forest.

TABLE IV. LIST OF LABELS WITH ASD PROBLEMS

| Sl. No. | Label | ASD Problems |
|---|---|---|
| 1. | 1 | Speech Problem |
| 2. | 2 | Sensory Problem |
| 3. | 3 | Behaviour Problem |
| 4. | 4 | Special Education |
| 5. | 5 | Social Interaction |
| 6. | 6 | Eye Contact |
| 7. | 7 | Cognitive Behaviour |
| 8. | 8 | Hyper Active Problem |
| 9. | 9 | Child Psychological Problem |
| 10. | 10 | Attention Problem |

Table IV shows that each ASD problem is associated with a label. Label 1 denotes the "Speech Problem" whereas Labels 2 and 3 denote the "Sensory" and "Behaviour" problems. The other problems are likewise listed with their labels in Table IV. This table is used after the sentiment of a sentence has been predicted according to the ASD symptoms. If the sentence is positive (1) then the proposed system will use this positive sentence as input to the Spacy cosine similarity model. Table V shows a dataset that contains a number of positive sentences with labels. Each label indicates an ASD problem according to Table IV. Each sentence will be used for a similarity check with predicted positive sentences in the cosine similarity model, as discussed in the Proposed System Flow section.

TABLE V. DATASET FOR COSINE SIMILARITY CHECK

| Sl. No. | Positive Sentences | Label |
|---|---|---|
| 1. | I can't show him how to potty during the day while his dad is at work | 7 |
| 2. | he does is mumbles only no proper words | 1 |
| 3. | when I call him he doesn't come to me | 10 |
| 4. | He needs to visualize what I'm saying if | 6 |
| 5. | She gets so frustrated it breaks my heart I guess I'm looking for success stories | 9 |

Each model is described with its proposed algorithm in the next sections, where this dataset is used to train the models; the result of each model is discussed in the Result and Discussion section.

B. Support Vector Machine (SVM)

Support vector machine (SVM) is the first approach to identify the symptoms of ASD. SVM is a supervised machine learning algorithm that can be used for classification or regression problems. SVM is a good choice for a small dataset, and it generates good predictive results for the problem. SVM is based on finding the best hyperplane that divides data points into two classes or multiple classes. The proposed approach is binary classification, where each data point is either true (1) or false (0).

Fig. 2. Support Vector Machine (SVM).

It can be observed from Fig. 2 that this is a two-feature classification problem. The optimized hyperplane has been drawn to linearly separate the support vectors, which can be seen as red and green circles in Fig. 2. It is a binary classification problem where the SVM algorithm draws many candidate lines to separate the vectors into true and false. After optimization, the SVM algorithm returns the best-fitted line for classifying the support vectors.

According to the equation of the hyperplane:

$\vec{w} \cdot \vec{X} + b = 0$, where $\vec{X}$ is a vector, $\vec{w}$ is a vector normal to the hyperplane, and $b$ is an offset value.

The decision rules have been applied to classify the positive and negative values.

$$\vec{w} \cdot \vec{X} > c$$

putting $-c$ as $b$, we get

$$\vec{w} \cdot \vec{X} + b > 0$$

Hence,

$$y = \begin{cases} 1 & \text{if } \vec{w} \cdot \vec{X} + b > 0 \\ 0 & \text{if } \vec{w} \cdot \vec{X} + b \le 0 \end{cases}$$

According to the above equation as in [24], if the value of $\vec{w} \cdot \vec{X} + b$ is greater than 0 then it is detected as a positive value (1), otherwise as a negative value (0). The proposed algorithm uses the ASD symptoms dataset to train the SVM model. The proposed algorithm to train the SVM model for the prediction of ASD symptoms is given below.

Proposed SVM Algorithm:
Pseudo Code:
Step 1: Read data from csv file.


Step 2. X=data from csv
    x1=[a1,a2, a3,a4,a5,………an] is a user text column inside the dataset.
    x2=[r1,r2, r3,r4,r5,………rn] is a label data column inside the dataset.

Step 3. Split the dataset as train data and test data.
    train, test = train_test_split(X, test_size=0.2, random_state=1)
    X_train = train['text'].values
    X_test = test['text'].values
    y_train = train['label']
    y_test = test['label']

Step 4. Define NLP functions to pre-process text from X_train and X_test.
    // Text tokenization
    tokenize_text=tokenizer(text)
    // Stop Words removal from text
    fresh_text = stopwords.words(text)
    // text to vector conversion using vectorization method
    vectorizer = CountVectorizer(
        analyzer = 'word',
        tokenizer = tokenize_text,
        lowercase = True,
        ngram_range=(1, 1),
        stop_words = fresh_text)

Step 5. Call method to train SVM model.
    // kfolds has been used to send data as a bunch into the SVM model.
    kfolds = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
    // Make the pipeline to send data inside the SVM model.
    pipeline_svm = make_pipeline(vectorizer, SVC(probability=True, kernel="linear", class_weight="balanced"))
    // SVM model initialization with parameters
    grid_svm = GridSearchCV(pipeline_svm,
        param_grid = {'svc__C': [0.01, 0.1, 1]},
        cv = kfolds,
        scoring="roc_auc",
        verbose=1,
        n_jobs=-1)
    // fit data inside the model to train
    grid_svm.fit(X_train, y_train)

Step 6. Predict the result using SVM model.
    model= grid_svm.best_estimator_
    prediction = model.predict(X_test)

The result of this proposed algorithm is discussed in the Result and Discussion section.

C. Logistic Regression

The next approach is logistic regression, which is able to identify ASD symptoms from user text. This is another machine-learning algorithm for binary classification problems. The logistic regression model produces a value that is bounded between 0 and 1. Logistic regression does not require a linear relationship between input and output variables because of the nonlinear transformation applied to the odds ratio. Logistic regression can be defined as:

$$\log\left(\frac{p(M)}{1-p(M)}\right) = \beta_0 + \beta_1 X$$

where $p(\cdot)$ refers to the probability function and $M$ refers to the input.

The ratio $p(M)/(1-p(M))$ on the left side is termed the odds, and the whole left side is called the logit. The odds are the ratio of the chance of success to the chance of failure. In logistic regression, the linear combination of inputs is mapped to log(odds).

The inverse of the above function is:

$$p(M) = \frac{e^{\beta_0+\beta_1 X}}{1+e^{\beta_0+\beta_1 X}}$$

This function is a sigmoid function that produces an S-shaped curve and returns a value between 0 and 1. The main work of the sigmoid function is to generate a probability value from the expected value, and this value is always bounded between 0 and 1. The mathematical representation of the sigmoid function is $f(m) = \frac{1}{1+e^{-m}}$.

Fig. 3 shows the S-shaped curve of the function $f(m) = \frac{1}{1+e^{-m}}$.

Fig. 3. Sigmoid function according to the equation.

The proposed algorithm, which is based on logistic regression, is given below.

Proposed Logistic Regression Algorithm:
Pseudo Code:
Step 1: Read data from CSV file.
Step 2: X=data from csv
    x1=[a1,a2, a3,a4,a5,………an] is a user text column inside the dataset.
    x2=[r1,r2, r3,r4,r5,………rn] is a label data column inside the dataset
Step 3: Features generation using Vectorizer function.
    // Vectorizer function converts the string value to number values.
    vectorizer = CountVectorizer(
        analyzer = 'word',
        lowercase = False)
    // Feature creation using vectorizer.fit_transform function
    features = vectorizer.fit_transform(x1)
    // Feature array creation
    features_nd = features.toarray()
Step 4: Model creation and training
    // Logistic model creation
    log_model = LogisticRegression()
    // Logistic model train
    log_model = log_model.fit(X=X_train, y=y_train)


Step 5: Prediction using Logistic Regression model
    y_pred = log_model.predict(X_test)

The output of this proposed algorithm is discussed in the Result and Discussion section.

D. K-Nearest Neighbor (KNN)

KNN is the third approach to identifying ASD symptoms from user text. KNN is a supervised algorithm that can be used in classification problems. This algorithm uses feature similarity to predict the value for a new data point that comes as input. KNN measures the similarity between the new data point and the available categorized data points and assigns the new data point to the category of the most similar points. KNN is very popular in binary classification. Fig. 4 shows the new data point, before KNN prediction, plotted on a graph where two categories of data points, Category A and Category B, are present. According to Fig. 5, after applying the KNN algorithm, the new data point has been assigned to Category B because its nearest neighbors belong to Category B.

Fig. 4. Before the KNN algorithm is applied on a new data point.

Fig. 5. After the KNN algorithm is applied on a new data point.

K is a parameter in KNN that sets the number of nearest neighbors that take part in the majority vote. The first step of KNN is to transform data points into vectors so that the KNN algorithm can calculate the distance between these vectors. KNN computes the distance from the new data point to each data point of the training data and then estimates the probability that the new data point is similar to the training data. Euclidean, Minkowski, or Hamming distance functions can be used here to calculate the distance. The proposed algorithm is based on KNN and uses the proposed dataset.

Proposed KNN Algorithm:
Pseudocode:
Step 1: Read data from CSV file.
Step 2: X=data from csv
    x1=[a1,a2, a3,a4,a5,………an] is a user text column inside the dataset.
    y1=[r1,r2, r3,r4,r5,………rn] is a label data column inside the dataset
    // Split the data in train and test format
    x_train,x_test,y_train,y_test=train_test_split(x1, y1, stratify=y1, test_size=0.33)
Step 3: String value to Vectorizer transformation.
    // Vector function declaration
    vectorizer=CountVectorizer()
    // Vector transformation of x_train
    x_train_bow=vectorizer.fit_transform(x_train)
    // Vector transformation of x_test
    x_test_bow=vectorizer.transform(x_test)
Step 4: KNN Model creation
    grid_params = { 'n_neighbors' : [40,50,60,70,80,90], 'metric' : ['manhattan']}
    knn=KNeighborsClassifier()
Step 5: KNN model training using prepared dataset
    clf = RandomizedSearchCV(knn, grid_params, random_state=0, n_jobs=-1, verbose=1)
    clf.fit(x_train_bow,y_train)
Step 6: Prediction using KNN model
    Prediction=clf.predict_proba(x_test_bow)

The result of this proposed KNN-based algorithm is discussed in the Result and Discussion section.

E. Random Forest

The last approach is the Random Forest machine learning algorithm to identify the ASD symptoms from user text. It is one of the important machine learning algorithms and is constructed from decision trees. The Random Forest algorithm is used to solve regression and classification problems. It is trained through bagging, which is an ensemble technique used to improve the accuracy of machine learning algorithms. The outcome of the random forest is based on the predictions of its decision trees: the mean of the predictions of the various decision trees is used as the prediction value. Decision trees in the random forest algorithm use a tree structure to generate a prediction from a series of feature-based splits, starting from a root node and ending in a leaf node with a decision. Feature selection and the splitting process depend on the impurity, where each result is either 'yes' or 'no'. To measure the impurity of the dataset, the Gini index [25] is a good option, and it can be written mathematically as:

$$\text{Gini Index} = 1 - \sum_i (P_i)^2 = 1 - \left[(P_+)^2 + (P_-)^2\right]$$

where $P_+$ denotes the probability of the positive class and $P_-$ denotes the probability of the negative class. The Gini index is evaluated for all possible splits, and the split with the lowest impurity, meaning the lowest Gini index, is chosen as the root node.
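The Gini index formula can be illustrated with a minimal standard-library sketch. The function names `gini_index` and `weighted_gini` and the toy label lists are hypothetical and are not taken from the paper; the size-weighted average over the two sides of a split is the usual way candidate splits are compared, stated here as an assumption about how the split search works.

```python
def gini_index(class_probabilities):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    return 1.0 - sum(p * p for p in class_probabilities)


def weighted_gini(left_labels, right_labels):
    """Size-weighted Gini index of a binary split (lower means purer)."""
    total = len(left_labels) + len(right_labels)
    score = 0.0
    for labels in (left_labels, right_labels):
        if not labels:
            continue  # an empty side contributes nothing
        p_pos = sum(1 for y in labels if y == 1) / len(labels)
        score += (len(labels) / total) * gini_index([p_pos, 1.0 - p_pos])
    return score


# A perfectly mixed node has the highest binary impurity.
print(gini_index([0.5, 0.5]))          # 0.5
# A split that separates the classes completely has zero impurity.
print(weighted_gini([1, 1], [0, 0]))   # 0.0
```

Under this sketch, the candidate split with the smallest `weighted_gini` value would be the one preferred for the root node.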


The proposed random forest-based algorithm, which uses the proposed dataset to train the model for the prediction of ASD symptoms from user text, is given below.

Proposed Random Forest algorithm:
Pseudo code:
Step 1: Read data from CSV file.
Step 2: X=data from csv
    x1=[a1,a2, a3,a4,a5,………an] is a user text column inside the dataset.
    x2=[r1,r2, r3,r4,r5,………rn] is a label data column inside the dataset
Step 3: Features generation using TFIDF Vectorizer function.
    // Split data and assign for training and testing purpose
    X_train, X_test, y_train, y_test = train_test_split(x1, x2, test_size = 0.90, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X_train, y_train, test_size = 0.5, random_state=42)
    X_val, X_test, y_val, y_test = train_test_split(X_test, y_test, test_size = 0.5, random_state=42)
    // TFIDF Vectorizer Function declaration
    def vectorize(data,tfidf_vect_fit):
        X_tfidf = tfidf_vect_fit.transform(data)
        words = tfidf_vect_fit.get_feature_names()
        X_tfidf_df = pd.DataFrame(X_tfidf.toarray())
        X_tfidf_df.columns = words
        return(X_tfidf_df)
    tfidf_vect = TfidfVectorizer(analyzer=clean)
    tfidf_vect_fit=tfidf_vect.fit(X_train['text'])
    X_train=vectorize(X_train['text'],tfidf_vect_fit)
Step 4: Random Forest model initialization
    model=RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
        criterion='gini', max_depth=20, max_features='auto',
        max_leaf_nodes=None, max_samples=None,
        min_impurity_decrease=0.0,
        min_samples_leaf=1, min_samples_split=2,
        min_weight_fraction_leaf=0.0, n_estimators=100,
        n_jobs=None, oob_score=False, random_state=None,
        verbose=0, warm_start=False)
    X_val=vectorize(X_val['text'],tfidf_vect_fit)
    rf1 = RandomForestClassifier(n_estimators=100,max_depth=20)
    rf1.fit(X_train, y_train.values.ravel())
Step 5: Prediction of Random Forest Model.
    // rf1 is the fitted classifier, so it is the one used for prediction
    Prediction = rf1.predict(X_val)

The result of this algorithm is discussed in the Result and Discussion section.
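Step 3 of the Random Forest algorithm relies on TF-IDF weighting. As a rough illustration of what such a vectorizer computes, the sketch below uses only the Python standard library and the textbook formula tf × log(N / df). This is an assumption made for illustration: scikit-learn's `TfidfVectorizer` applies a smoothed IDF and L2 normalization by default, so its numbers will differ. The function name `tfidf_weights` and the sample sentences are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = ["he only mumbles", "he came home", "still non verbal"]
for doc, doc_weights in zip(docs, tfidf_weights(docs)):
    print(doc, "->", doc_weights)
```

In this toy corpus the word "he" appears in two of three sentences and therefore receives a lower weight than "mumbles", which appears in only one; a term that occurs in every document gets weight zero.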

Fig. 6. Flow diagram of Proposed System Architecture

F. Proposed System Flow

Fig. 6 shows the overall architectural diagram of the proposed system to identify ASD symptoms from user text. The proposed system reads data from the ASD symptoms dataset in the first step. Each sentence is passed through some NLP tasks: tokenization, stop word removal, and text-to-vector transformation. Sentences are tokenized by the tokenization process of NLP, and stop words, meaning unwanted words (tokens) like 'am', 'is', 'a', 'an', etc., are removed from the sentence. The final task is to transform each token into vectors. These vectors, together with the labeled data, are the main input to each machine-learning model. After vector transformation, the data are separated into two parts, training and testing data. According to Fig. 6, the SVM, Logistic Regression, KNN, and Random Forest models are trained with the training data, and the prediction results are tested with the test data. After completion of the model training and testing, the proposed system is ready to accept new paragraph text from the user and to identify the sentences in the given text that denote ASD symptoms. Each sentence is predicted as either positive (1) or negative (0). The proposed system selects only the positive sentences, and the negative sentences are discarded in the next step. The selected positive sentences are the input to the Spacy Cosine Similarity Model. This model reads each positive sentence from the ASD symptoms dataset (Table V) and calculates its cosine similarity with the input sentence. The Spacy cosine similarity model finds the sentence that has the highest cosine similarity score with the input sentence, and the system selects the Label of this sentence. The Label indicates the ASD problem according to Table IV. Each input sentence will be handled by


this cosine similarity model to identify ASD problems. The algorithm is given below.

Proposed Cosine Similarity algorithm:
Pseudo code:
Step 1: // Declare Python and Spacy packages
    import spacy
    import pandas as pd
    nlp = spacy.load('en_core_web_lg')
    // Initialize positive ASD symptoms data in a Dataframe
Step 2: df = pd.read_csv("ASD_Smptoms.csv")
    // Three list variables are declared to store each cosine similarity value with sentence and label
    comments=[]
    sentiment=[]
    cosine_value=[]
Step 3: Define Cosine Similarity Calculation Method
    def Spacy_Cosine(strs):
        for ind in df.index:
            sen1 = nlp(df['Comments'][ind])
            sen2 = nlp(strs)

            sen1_no_stop_words = nlp(' '.join([str(t) for t in sen1 if not t.is_stop]))
            sen2_no_stop_words = nlp(' '.join([str(t) for t in sen2 if not t.is_stop]))

            comments.append(df['Comments'][ind])
            sentiment.append(df['Sentiment'][ind])

            score=sen2_no_stop_words.similarity(sen1_no_stop_words)
            # score=sen2.similarity(sen1)
            cosine_value.append(score)

        dfc=pd.DataFrame(
            {'Comments': comments,
             'Sentiment': sentiment,
             'Cosine_Scores': cosine_value
            })

        dfc.to_csv(r'ASD_Cosine_Data.csv')
        dfc['Cosine_Scores']=dfc['Cosine_Scores'].astype('float64')
        i = dfc['Cosine_Scores'].idxmax()

        return dfc['Sentiment'][i]
Step 4: // Select only predicted positive (1) sentences as input
    strs = List of predicted positive sentences
    for st in strs['Comments']:
        result=Spacy_Cosine(st)
        print(st,"=",result)

The output of this proposed algorithm is given and discussed in the Result and Discussion section.

IV. RESULT AND DISCUSSION

The proposed system uses multiple traditional machine learning models, which are SVM, Logistic Regression, KNN, and Random Forest. The proposed dataset has been utilized to train and test these models. The result of each model on the dataset is discussed here one by one.

A. Result and Discussion of SVM Model

Table VI shows the SVM model metrics after training and testing.

TABLE VI. SVM MODEL METRICS

| Sl. No. | Metrics | Value |
|---|---|---|
| 1. | AUC | 0.77 |
| 2. | F1 | 0.74 |
| 3. | Accuracy | 0.71 |
| 4. | Precision | 0.71 |
| 5. | Recall | 0.77 |

The SVM model has multiple metrics for understanding the model's performance and scalability. According to Table VI, the AUC score is 77%, which is a good score for a trained SVM model. The AUC refers to the area under the ROC curve, which is a popular metric for SVM. If AUC = 1, then the model can distinguish correctly between positive and negative. If 0.5 < AUC < 1, then there is a high chance that the model distinguishes between positive and negative. The F1 score of this proposed SVM model is 74%, which combines the precision and recall scores of 71% and 77% respectively. The overall accuracy of this proposed SVM model is 71%, which is a reasonable score. On the ROC curve, a higher Y-axis value denotes a higher number of true positives than false negatives, and a higher X-axis value denotes a higher number of false positives than true negatives. According to Fig. 7, the ROC curve of this proposed SVM model shows a higher true positive rate than false positive rate. This signifies that the proposed model is able to generate good prediction results.

Fig. 7. ROC curve of proposed SVM model.

According to Fig. 8, the training scores line on the graph is between 0.99 and 0.94 (approx.) and the cross-validation scores line is between 0.70 and 0.79 (approx.). The gap between the two score lines is not very high. According to Fig. 8, this proposed model is able to generate good prediction results.
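The AUC used in this section can be read as the fraction of (positive, negative) sentence pairs that the model ranks in the right order. The following is a minimal standard-library sketch of that pairwise definition; the function name `roc_auc` and the toy labels and scores are hypothetical, and the quadratic loop is chosen for clarity, not as the way any particular library computes the metric.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    A pair counts 1 when the positive example has the higher score,
    0.5 when the two scores tie, and 0 otherwise.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Three of the four positive/negative pairs are ordered correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.3, 0.6, 0.1]))  # 0.75
```

With this reading, AUC = 1 means every positive sentence is scored above every negative one, and AUC = 0.5 corresponds to random ordering.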


Fig. 8. Training scores and cross-validation scores of SVM.

A few sentences have been sent to the proposed SVM model for prediction. According to Fig. 9, the proposed model shows the output as 1 or 0, which is attached to each sentence. One (1) refers to a positive sentence regarding ASD detection whereas zero (0) refers to a negative sentence.

Fig. 9. Prediction result of SVM model as output.

B. Result and Discussion of Logistic Regression

Fig. 10 shows a confusion matrix that relates the actual 0s and 1s to the predicted 0s and 1s. According to the test data, 75 sentences are actual 0s and predicted as 0s by the proposed logistic regression model. Fifteen (15) sentences are actual 0s but predicted as 1s, whereas 31 sentences are actual 1s but predicted as 0s. Forty (40) sentences are actual 1s and predicted as 1s. The metrics for model evaluation are given in Table VII.

Fig. 10. Confusion matrix of logistic regression model.

TABLE VII. LOGISTIC REGRESSION MODEL METRICS

| Sl. No. | Metrics | Value |
|---|---|---|
| 1. | AUC | 0.69 |
| 2. | F1 | 0.63 |
| 3. | Accuracy | 0.71 |
| 4. | Precision | 0.72 |
| 5. | Recall | 0.56 |

Fig. 11. ROC curves of logistic regression.
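The four confusion-matrix counts reported for the logistic regression model (75, 15, 31, and 40) are enough to recompute accuracy, precision, recall, and F1. The sketch below does this with the standard definitions; the function name `classification_metrics` is hypothetical, and treating label 1 as the positive class is an assumption that follows the labeling used in the dataset.

```python
def classification_metrics(tn, fp, fn, tp):
    """Accuracy, precision, recall, and F1 from binary confusion-matrix counts."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)   # share of predicted 1s that are actual 1s
    recall = tp / (tp + fn)      # share of actual 1s that are predicted as 1
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Counts as described for the logistic regression confusion matrix (Fig. 10).
metrics = classification_metrics(tn=75, fp=15, fn=31, tp=40)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The results (accuracy about 0.714, precision about 0.727, recall about 0.563, F1 about 0.635) are close to the values listed in Table VII, which appear to be truncated to two decimals.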
The AUC value is 0.69 (69%), which is the area under the ROC curve. The F1 score is 0.63 (63%), which combines the precision and recall values. The precision value is 0.72 (72%) and the recall value is 0.56 (56%). The overall accuracy of the proposed logistic regression model is 0.71 (71%). On the ROC curve in Fig. 11, a higher Y-axis value denotes a higher number of true positives than false negatives, and a higher X-axis value denotes a higher number of false positives than true negatives. According to Fig. 11, the training accuracy is 0.97 (97%) whereas the testing accuracy is 0.70 (70%) on the proposed dataset.

C. Result and Discussion of KNN model

Fig. 12 shows two ROC curves that indicate the accuracy of the proposed KNN model.


Fig. 12. ROC curves of KNN training and testing.

The AUC value of the proposed KNN model is 0.67 (67%), which indicates a high chance of distinguishing positive and negative sentences. The AUC refers to the area under the ROC curves. The two ROC curves move in almost the same manner from 0 to 1. The training accuracy is 0.80 (80%) and the testing accuracy is 0.65 (65%) on the proposed dataset. This is another metric that shows the predictive ability of the proposed KNN model. Accuracy is an important metric for judging a machine learning model on a given task. The popular metrics given in Table VIII are useful for evaluating the machine learning model. The AUC value is, as already stated, 0.67 (67%). The F1 score is 0.65 (65%), which combines the precision and recall values. The precision value is 0.65 (65%) and the recall value is 0.66 (66%). The overall accuracy of the proposed KNN model is 0.62 (62%).

TABLE VIII. KNN MODEL METRICS

| Sl. No. | Metrics | Value |
|---|---|---|
| 1. | AUC | 0.67 |
| 2. | F1 | 0.65 |
| 3. | Accuracy | 0.62 |
| 4. | Precision | 0.65 |
| 5. | Recall | 0.66 |

D. Result and Discussion of Random Forest model

The last proposed model is Random Forest, which is a good classifier. The proposed dataset has been applied to this model to predict the sentiment of the sentences from the parents' dialogues. This model is trained with the features extracted from the sentences.

Fig. 13. ROC curves of random forest.

Fig. 13 shows two lines that plot the accuracy of the proposed Random Forest model. The blue line shows the accuracy on the training data and the yellow line shows the accuracy on the testing data. According to Fig. 13, the accuracy score of this model is 0.98 (98%) on the training data and 0.68 (68%) on the testing data. According to Table IX, the F1 score is 0.73 (73%), which combines two metric values, Precision and Recall. Precision measures how many of the model's positive predictions are correct, whereas recall measures how many of the actual positive cases are correctly predicted by the model. The Precision value is 0.70 (70%) and the Recall value is 0.76 (76%). These two values are listed in Table IX. The overall accuracy value of the proposed model is 0.69 (69%).

TABLE IX. RANDOM FOREST METRICS

| Sl. No. | Metrics | Value |
|---|---|---|
| 1. | AUC | 0.68 |
| 2. | F1 | 0.73 |
| 3. | Accuracy | 0.69 |
| 4. | Precision | 0.70 |
| 5. | Recall | 0.76 |

Fig. 14 shows the top 20 important features from all sentences of the prepared dataset that are also used in the model training. The frequency of each feature can be seen in Fig. 14. "poo", "autism", and "toilet" are the three words with the highest frequencies. Other words are also given in Fig. 14.
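Word frequencies of the kind plotted for the most frequent features can be obtained with a simple token count. The sketch below is only an illustration under stated assumptions: the function name `top_terms`, the whitespace tokenization, the small stop-word set, and the sample sentences are all hypothetical, and the paper does not state how the frequencies in its feature plot were computed.

```python
from collections import Counter


def top_terms(sentences, stop_words, n=20):
    """Return the n most frequent non-stop-word tokens with their counts."""
    counts = Counter()
    for sentence in sentences:
        for token in sentence.lower().split():
            token = token.strip(".,!?\"'")  # drop surrounding punctuation
            if token and token not in stop_words:
                counts[token] += 1
    return counts.most_common(n)


sample = ["Toilet training is hard.", "He avoids the toilet", "autism and toilet"]
print(top_terms(sample, stop_words={"is", "he", "the", "and"}, n=3))
```

Sorting the resulting (word, count) pairs gives the kind of ranking that a bar chart of feature frequencies displays.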


Fig. 14. Important features of proposed dataset.

E. Result and Discussion of spaCy Cosine Similarity Model
The proposed model returns the ASD problem according to the positive sentences that contain ASD symptoms.

Fig. 15. Output of spaCy cosine similarity model.

The output can be seen in Fig. 15, where the sentence "she can't understand where to pee" is labeled with 7. The sentences "She is very aggressive and throwing objects to others" and "Eyes are scrolling and hand flapping" are labeled with 8 and 6, respectively. According to Table IV, 7 denotes Cognitive Behaviour problems, 8 refers to Hyper Active problems, and 6 refers to the Eye contact problem. Once the ASD problems have been detected, therapies can be started according to the detected problem, which will help to reduce the ASD symptoms.

V. LIMITATION OF THE PROPOSED SYSTEM
The proposed system has been equipped with traditional machine-learning models. A probabilistic model such as Naïve Bayes or an ensemble model such as XGBoost can be applied to this dataset for better accuracy. More data can be collected to improve the training and testing scores. More accurate ASD-related parent dialogues are needed to train the model. If the dataset is large, the traditional machine learning models will not perform as well, and that will degrade the proposed system. If one part of this system does not respond, the cosine similarity part will not work properly.

VI. CONCLUSION
The proposed system accepts natural language text from the parents' dialogues. It classifies each sentence as positive or negative using sentiment analysis: a sentence that contains ASD symptoms is labeled 1 and a sentence that does not contain any ASD symptoms is labeled 0. The sentiment analysis has been done using SVM, Logistic Regression, KNN, and Random Forest models, which are trained with the proposed dataset. After prediction, the proposed system selects all positive sentences as input for the cosine similarity model. An ASD symptoms dataset has been proposed in which each sentence is labeled with a value that indicates a particular ASD symptom. The proposed system calculates the cosine similarity between the input sentence and each sentence of the ASD symptoms dataset. It then selects the label of the ASD symptoms sentence that has the highest cosine similarity with the input sentence, and this label indicates the ASD problem. The system is based on text and does not need MRI or image data to predict ASD at an early age. It may be used in health centers in rural areas, where many people are not aware of ASD and many cannot afford MRI or other ASD diagnosis processes.

VII. FUTURE WORK
The proposed dataset can be utilized to train Naïve Bayes and XGBoost models for better output and accuracy. The cosine similarity model of this system depends on the prediction result of the traditional machine learning models. These models are good for small datasets but will not deliver their best performance when the dataset is large. XGBoost is a powerful ensemble model for prediction, and Naïve Bayes is a probabilistic model based on Bayes' theorem. Implementing these two models on the proposed dataset for ASD detection is the future development of this proposed system.

ACKNOWLEDGMENT
The authors extend their appreciation to the Manipur International University, Imphal, India for supporting this Post-Doctoral (D.Sc.) research work on Autism.

REFERENCES
[1] Suman Raj, Sarfaraz Masood, "Analysis and Detection of Autism Spectrum Disorder Using Machine Learning Techniques", Procedia Computer Science, vol. 167, pp. 994-1004, 2020.
[2] A.S. Mohanty, K.C. Patra, P. Parida, "Toddler ASD classification using machine learning techniques", Int. J. Online Biomed. Eng., vol. 17, 2021.
[3] Ashima Sindhu Mohanty, Priyadarsan Parida, Krishna Chandra Patra, "ASD classification for children using deep neural network", Global Transitions Proceedings, pp. 461–466, 2021.
[4] K. K. Hyde, M. N. Novack, N. LaHaye, C. Parlett-Pelleriti, R. Anden, D.R. Dixon, and E. Linstead, "Applications of supervised machine learning in autism spectrum disorder research: a review", Review Journal of Autism and Developmental Disorders, vol. 6(2), pp. 128-146, 2019.
[5] L. Xu, X. Geng, X. He, J. Li and J. Yu, "Prediction in Autism by Deep Learning Short-Time Spontaneous Hemodynamic Fluctuations", Frontiers in Neuroscience, vol. 13, 2019.


[6] A.L. Georgescu, J.C. Koehler, J. Weiske, K. Vogeley, N. Koutsouleris, C. Falter-Wagner, "Machine Learning to Study Social Interaction Difficulties in ASD", Computational Approaches for Human-Human and Human-Robot Social Interactions, 2019.
[7] Shomona Gracia Jacob, Majdi Mohammed Bait Ali Sulaiman, Bensujin Bennet, "Algorithmic Approaches to Classify Autism Spectrum Disorders: A Research Perspective", Procedia Computer Science, vol. 201, pp. 470–477, 2022.
[8] Fadi Thabtah, David Peebles, "A new machine learning model based on induction of rules for autism detection",
[9] D. P. Wall, R. Dally, R. Luyster, et al., "Use of artificial intelligence to shorten the behavioral diagnosis of autism", PLoS ONE, 2012.
[10] M. Duda, R. Ma, N. Haber, et al., "Use of machine learning for behavioral distinction of autism and ADHD", Transl Psychiat, vol. 9(6), 2016.
[11] A. Pratap, C.S. Kanimozhiselvi, R. Vijayakumar, et al., "Predictive assessment of autism using unsupervised machine learning models", Int J Adv Intell Paradig, vol. 6(2), pp. 113–121, 2014.
[12] M. Al-Diabat, "Fuzzy data mining for autism classification of children", Int J Adv Comput Sci Appl, vol. 9(7), pp. 11–17, 2018.
[13] Anja Thieme, Danielle Belgrave, Gavin Doherty, "Machine Learning in Mental Health: A Systematic Review of the HCI Literature to Support the Development of Effective and Implementable ML Systems", Trans. Comput.-Hum. Interact, vol. 27(5), Article 34, 2020.
[14] Jacqueline Peng, Mengge Zhao, James Havrilla, Cong Liu, Chunhua Weng, Whitney Guthrie, Robert Schultz, Kai Wang, Yunyun Zhou, "Natural language processing (NLP) tools in extracting biomedical concepts from research articles: a case study on autism spectrum disorder", BMC Med Inform Decis Mak, pp. 1-9, 2020.
[15] Izabela Chojnicka, Aleksander Wawer, "Social language in autism spectrum disorder: A computational analysis of sentiment and linguistic abstraction", PLOS ONE, pp. 1-16, 2020.
[16] Mahmoud Elbattah, Jean-Luc Guérin, Romuald Carette, Federica Cilia, Gilles Dequen, "NLP-Based Approach to Detect Autism Spectrum Disorder in Saccadic Eye Movement", IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1581-1587, 2020.
[17] T. Lakshmi Praveena, N. V. Muthu Lakshmi, "Sentiment Analysis on Autism Spectrum Disorder using Twitter Data", International Journal of Recent Technology and Engineering (IJRTE), vol. 7(4), pp. 204-208, 2018.
[18] Laura Dubreuil-Vall, Giulio Ruffini, Joan A. Camprodon, "Deep Learning Convolutional Neural Networks Discriminate Adult ADHD From Healthy Individuals on the Basis of Event-Related Spectral EEG", Front. Neurosci, vol. 14, pp. 1-12, 2020.
[19] Dingfu Zhou, Zhihang Liao, Rong Chen, "Deep Learning Enabled Diagnosis of Children's ADHD Based on the Big Data of Video Screen Long-Range EEG", Journal of Healthcare Engineering, pp. 1-9, 2022.
[20] Shubham Dhuri, Nitin Ahire, Deepak Kamat, Sunil Nayak, Bhavesh Maurya, "ADHD EEG signal analysis using Machine Learning", International Research Journal of Engineering and Technology (IRJET), vol. 8(5), pp. 2572-2575, 2021.
[21] Iqra Ameer, Muhammad Arif, Grigori Sidorov, Helena Gomez-Adorno, Alexander Gelbukh, "Mental Illness Classification on Social Media Texts using Deep Learning and Transfer Learning", arXiv:2207.01012, pp. 1-12, 2022.
[22] Tanzila Saba, Amjad Rehman Khan, Ibrahim Abunadi, Saeed Ali Bahaj, Haider Ali, Maryam Alruwaythi, "Arabic Speech Analysis for Classification and Prediction of Mental Illness due to Depression Using Deep Learning", Computational Intelligence and Neuroscience, vol. 2022, pp. 1-9, 2022.
[23] Amanda Sun, Zhe Wu, "Early detection of mental disorder via social media posts using deep learning models", Proceedings of Asia Pacific Computer Systems Conference, pp. 149-158, 2021.
[24] Anshul Saini, "Support Vector Machine(SVM): A Complete guide for beginners", https://www.analyticsvidhya.com/blog/2021/10/support-vector-machinessvm-a-complete-guide-for-beginners/, 2023.
[25] Himanshi Singh, "How to select Best Split in Decision trees using Gini Impurity", https://www.analyticsvidhya.com/blog/2021/03/how-to-select-best-split-in-decision-trees-gini-impurity/, 2021.

AUTHORS' PROFILE
Prasenjit Mukherjee has 14 years of experience in academics and industry. He completed his Ph.D. in Computer Science and Engineering in the area of Natural Language Processing from the National Institute of Technology (NIT), Durgapur, India under the Visvesvaraya PhD Scheme from 2015 to 2020. Presently, he is working as a Data Scientist at Vodafone Intelligent Solutions, Pune, Maharashtra, India, and doing his Post Doctoral (D.Sc.) in Computer Science from Manipur International University, Imphal, Manipur, India.

Sourav Sadhukhan has more than 5 years of experience in Law and Management. He completed his Graduation in LLB from Calcutta University, Kolkata, India, and a Post Graduate Diploma in Management from Pune Institute of Business Management, Pune, India. Presently he is a student of Executive Post Graduation in Data Science and Analytics at the Indian Institute of Management, Amritsar, India.

Dr. Manish Godse has 27 years of experience in academics and industry. He holds a Ph.D. from the Indian Institute of Technology, Bombay (IITB). He is currently working as an IT Consultant at BizAmica Software, Pune in the area of Artificial Intelligence and Analytics. His research areas of interest include automation, machine learning, natural language processing and business analytics. He has multiple research papers indexed at IEEE, ELSEVIER, etc.

Dr. Baisakhi Chakraborty received the Ph.D. degree in 2011 from the National Institute of Technology, Durgapur, India in Computer Science and Engineering. Her research interests include knowledge systems, knowledge engineering and management, database systems, data mining, natural language processing, and software engineering. She has several research scholars under her guidance. She has more than 60 international publications. She has a decade of industrial and 22 years of academic experience.
