Final Project

The document discusses developing a machine learning system to reduce the time taken to diagnose autism in children. Currently, diagnosing autism can take up to 6 months. The proposed system would use algorithms like Random Forest, SVM, Decision Tree and AdaBoost to classify individuals as autistic based on their characteristics. It would analyze medical and health data to make more accurate and earlier predictions. The system aims to help doctors diagnose autism at an earlier stage. It will also identify the most relevant questions from autism diagnostic questionnaires to enhance predictive models.


1. INTRODUCTION

1.1 About the project

Health care is one of the fields that would benefit most from reduced processing time. The speed and efficiency of diagnosing human health issues is significant, and the current diagnosis time is a huge challenge in many health conditions, especially Autism. It can take up to six months to firmly diagnose a child with autism because of the long process involved: a child must see many different specialists, including developmental pediatricians, neurologists, psychiatrists and psychologists.

The time consumed to finalize an Autism diagnosis is relatively long in the current traditional approach, so Machine Learning methods can make a significant difference in accelerating the process. It is well known that early intervention is key to improving outcomes for autistic children, which makes speeding up the diagnosis even more crucial in Autism cases. Big data and machine learning technologies can make enormous progress in predicting and speeding up the complex and time-consuming processes of diagnosis and treatment.

A machine learning system can be developed to utilize the massive amount of health and medical data available for predictive modeling and predictive analysis. In this project, several machine learning techniques and models are tested, analyzed and compared. Data are pre-processed so that test subjects can be classified as Autistic or not based on different categories. Many existing classification algorithms can be applied, and every classifier differs in its way of accumulating data, filtering data, extracting features and employing these processes to feed the model for learning.

A second stage uses machine learning algorithms to identify the most relevant diagnostic questions of a traditional Autism diagnostic questionnaire (AQ), using the extended version of the test containing fifty questions. In addition, we analyze the results so they can be used to further enhance Autism predictive models. This second stage is very important for future research in the Autism diagnosis field, as the model designed to check question relevancy can be used in many different ways. Once more data is collected, this model can help determine a child's position on the Autism severity scale. Further research with a larger dataset would be very useful: updating the Autism test questions based on the outcome of this model, after it is fed a larger amount of data, would lead to improvement in the Machine Learning process.

Autism Spectrum Disorder (ASD) is a complex, heterogeneous neurodevelopmental disability that may cause behavioral, social, or communication challenges. The term "spectrum" indicates that the severity of the symptoms varies between individuals, and symptoms also differ due to the heterogeneity of the conditions.

The Centers for Disease Control and Prevention (CDC) reports that the prevalence of autism has been increasing during the past two decades. According to this study, in 2000 one in every 150 children in the US was diagnosed with ASD. In 2014 it was one in every 59 children, and in 2016 one in every 54 children, per the prevalence reported by the CDC from 2000 to 2016. For the past two decades, researchers have been using various methods to help diagnose people with ASD and to understand what may cause this condition in individuals.

The proposed system makes use of the Random Forest (RF), Support Vector Machine (SVM), Decision Tree and AdaBoost algorithms to predict autism spectrum disorder in an individual, evaluated in terms of accuracy, specificity, sensitivity, precision and F1-score. The result is measured in terms of specificity, sensitivity and accuracy by using the confusion matrix and classification report.

In this research we use machine learning to determine a set of conditions that together prove to be predictive of Autism Spectrum Disorder. This will be of great use to physicians, helping them detect Autism Spectrum Disorder at a much earlier stage.

1.2. System specification

1.2.1. Hardware specification

Processor : Pentium IV 2.4 GHz

Hard Disk : 500 GB

RAM : 8GB RAM

Video : 800 × 600 resolution, 256 colors

1.2.2. Software specification

Operating system : Windows 7

Internet explorer : IE 6.0

Front end : Python

Tool : Anaconda (Jupyter)

1.3 Software description

1.3.1 Introduction to front end

Python

Python is a high-level, interpreted and general-purpose dynamic programming language that focuses on code readability. Python's syntax helps programmers code in fewer steps compared to Java or C. Python is widely used in larger organizations because of its support for multiple programming paradigms, usually involving imperative, object-oriented and functional programming. It has a comprehensive and large standard library, automatic memory management and dynamic features.

Software development companies prefer the Python language because of its versatile features and shorter programs. Nearly 14% of programmers use it on operating systems such as UNIX, Linux, Windows and Mac OS.

Advantages or Benefits of Python

The Python language has diverse applications in software development companies, such as gaming, web frameworks and applications, language development, prototyping and graphic design applications. This gives the language an edge over other programming languages used in the industry. Some of its advantages are:

• Extensive Support Libraries

Python provides large standard libraries covering areas such as string operations, the Internet, web service tools, operating system interfaces and protocols. Many commonly used programming tasks are already scripted into the standard library, which reduces the amount of code to be written in Python.

• Presence of Third Party Modules

The Python Package Index (PyPI) contains numerous third-party modules that make Python capable of interacting with most other languages and platforms.

• Integration Feature

Python supports Enterprise Application Integration, which makes it easy to develop web services by invoking COM or CORBA components. It has powerful control capabilities, as it can call directly into C, C++ or Java code. Python also processes XML and other markup languages, and it can run on all modern operating systems through the same byte code.

• Improved Programmer’s Productivity

The language's extensive support libraries and clean object-oriented design can increase programmer productivity two- to tenfold compared with languages such as Java, VB, Perl, C, C++ and C#.

• Productivity

Python's strong process integration features, unit testing framework and enhanced control capabilities contribute to increased speed and productivity for most applications. It is a great option for building scalable multi-protocol network applications.

• Open Source and Community Development

The Python language is developed under an OSI-approved open source license, which makes it free to use and distribute, including for commercial purposes. Further, its development is driven by the community, which collaborates through hosting conferences and mailing lists and provides numerous modules.

• Learning Ease and Support Available

Python offers excellent readability and uncluttered simple-to-learn syntax which helps
beginners to utilize this programming language. The code style guidelines, PEP 8, provide a
set of rules to facilitate the formatting of code. Additionally, the wide base of users and active
developers has resulted in a rich internet resource bank to encourage development and the
continued adoption of the language.

• User-friendly Data Structures

Python has built-in list and dictionary data structures which can be used to construct
fast runtime data structures. Further, Python also provides the option of dynamic high-level
data typing which reduces the length of support code that is needed.
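As a brief illustration of the built-in list and dictionary structures mentioned above (the data values here are invented for demonstration only):

```python
# Built-in list: an ordered, mutable sequence.
scores = [1, 0, 1, 1, 0]

# Built-in dict: a fast key-to-value lookup table.
subject = {"age": 7, "gender": "m", "jaundice": False}

# Lists support aggregation directly.
total = sum(scores)          # total score across questions

# Dicts support constant-time lookup with a safe default.
age = subject.get("age", 0)
```

Dynamic typing means neither `scores` nor `subject` needed a type declaration; the types were inferred from the assigned values.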

• Simple and easy to Learn

• The syntax of the Python language is very simple; anybody can remember the syntax, rules and regulations easily.
• The elegant syntax of Python makes the language easy to learn.
• Python can be learned directly, without knowledge of any other programming language.
• The simple and powerful syntax of Python lets programmers express their business logic in fewer lines of code.

• Platform Independent

• Like Java programs, Python programs are platform independent.
• Once we write a Python program, it can run on any platform without being rewritten.
• Python uses the PVM (Python Virtual Machine) to convert Python code into machine-understandable code.

• High Level Language

• Python is a high-level language.
• While developing Python applications, developers need not worry about memory management.

• Dynamically Typed Language

• Python is a dynamically typed language: there is no need to declare the type of a variable.
• Whenever we assign a value to a variable, the type is allocated automatically based on the value.

• Python supports POP & OOP

• The Python language supports both Procedural Oriented Programming and Object Oriented Programming features.
• We can implement OOP features such as encapsulation, polymorphism, inheritance and abstraction in Python.

• Python is interpreted Language

• Like PHP, Python is an interpreted language.
• Python applications do not require explicit compilation, so no separate compiler is required for Python software.
• The Python interpreter is responsible for executing Python applications.
• Whenever we run a Python application, the interpreter checks for syntax errors. If no syntax errors are found in the code, the interpreter converts it into intermediate code in a low-level format and executes it.
• The intermediate code of a Python application is known as byte code.
• The extension of a byte code file is .pyc (compiled Python code).

• Python is Embeddable

• We can embed Python code into other languages such as C, C++ and Java.
• Python code can be used within other languages to provide them with scripting capabilities.

• Develop GUI & Web Application

• We can develop GUI-based applications using the Python language.
• We can also develop web applications using the Python language.

• Applications of Python
• GUI based desktop applications
• Image processing and graphic design applications
• Scientific and computational applications
• Games
• Web frameworks and web applications
• Enterprise and business applications
• Operating systems
• Language development
• Prototyping

2. SYSTEM STUDY

2.1. Existing system

Autism Spectrum Disorder is a group of developmental disorders. It includes a wide range of symptoms because it is a "spectrum", meaning different levels of disability, skills and characteristics. Children with Autism often show some common signs such as social problems, including difficulty communicating and interacting with peers and others. Another obvious characteristic of autism is repetitive behaviors, in addition to limited interests or activities. Some children are slightly impaired by their symptoms while other children are severely disabled. Early intervention and services can improve the symptoms of children with Autism and their ability to function. Parents and caregivers are advised to seek immediate Autism assessment if they notice these symptoms.

According to the Centers for Disease Control and Prevention (CDC), around one in 68 children has been identified with some form of ASD. The main problem statement of this research is to expedite Autism diagnosis by providing a machine learning system that uses different machine learning algorithms to build an Autism predictive model with the highest possible accuracy.

The proposed solution is a high-accuracy predictive model that can predict whether a child has Autism using the Autism Quotient questionnaire (AQ) test. The aim is to take a traditional Autism diagnosis method and transform it into a machine learning model that can utilize the massive amount of data collected to make predictions and observations, leading to better solutions for discovering Autism at the earliest age possible. Ideally, more observations and data analysis in the field will suggest new methods of improving early diagnosis.

2.1.1. Drawbacks of existing system

• Accuracy is much lower than in the proposed system.
• Only a small amount of data is analyzed compared to the proposed system.
• Cost is higher.
• Analysis of clinical data is not accurate.

2.2. Proposed system

In the proposed system, three machine learning algorithms are used to predict autism spectrum disorder: Logistic Regression, KNN and Random Forest. We conclude that these algorithms can achieve high accuracy in predicting Autism spectrum disorder. Using machine learning, the app can analyze and predict whether a child is at risk for developmental delay or autism. It can help identify red flags in development and also help with analyzing progress against normal development.

2.2.1. Features of proposed system

• Machine learning algorithms can analyze a large amount of data to assist medical professionals in making more informed decisions cost-effectively.
• Machine learning algorithms allowed us to analyze clinical data, draw relationships between diagnostic variables, design the predictive model, and test it against new cases. The predictive model achieved an accuracy of 89.4 percent using the Random Forest Classifier's default settings.
• The model we built to predict autism spectrum disorder can save enormous medical bills, improve diagnostic capability on a large scale, and most importantly save lives.
• By mapping this activity over time in the brain's many regions, the algorithm generates neural activity "fingerprints."
• Although unique for each individual, just like real fingerprints, the brain fingerprints nevertheless share similar features, allowing them to be sorted and classified.
• With the help of machine learning, researchers are hoping brain data can help identify mental health issues.
• By applying specially designed algorithms to brain scans, labs could identify distinctive features that determine a patient's optimal treatment.

3. SYSTEM DESIGN AND DEVELOPMENT

3.1. Input design

Input design is the process of entering data into the system. The goal of input design is to enter data into the computer as accurately as possible. Inputs are designed effectively so that errors made by operators are minimized. The inputs to the system have been designed so that manual forms and the inputs are coordinated, with data elements common to the source document and to the input. The input is acceptable and understandable to the users who are using it.

Input design is the process of converting user-originated inputs to a computer-based format. Input data are collected and organized into groups of similar data; once identified, appropriate input media are selected for processing. Input design also enables the user to interact efficiently with the system. It is a part of overall system design that requires special attention because it is a common source of data processing errors. The goal of designing input data is to make entry easy, logical and free from errors.

Errors in the input data are controlled by input design. This application is being developed in a user-friendly manner: the forms are designed so that during processing the cursor is placed in the position where data must be entered, and an option of selecting an appropriate input from a set of validated values is provided for each item of data entered.

Help messages are also provided whenever the user enters a new field, so that he or she can understand what is to be entered. Whenever the user enters erroneous data, an error message is displayed, and the user can move to the next field only after entering the correct data.

Data pre-processing removes outliers and noise from the raw data and makes it available for training the model. Simply put, data pre-processing is the major step in the project for obtaining the best accuracy.

3.2. Output design

Output design is the process of converting computer data into hard copy that can be understood by all. The various outputs have been designed to follow the same format that the office and management are used to.

Computer output is the most important and direct source of information for the user. Efficient, intelligible output design improves the system's relationship with the user and helps in decision making. A major form of output is hard copy from the printer.

Output requirements are designed during system analysis. A good starting point for output design is the Data Flow Diagram (DFD). Human-factors issues in output design involve addressing internal controls to ensure readability.

Output from the system is produced either on screen or as hard copy. Output design aims at communicating the results of processing to the users. The reports are generated to suit the needs of the users and with appropriate levels of detail.

3.3. System development

3.3.1. Description of modules

• Dataset Collection

The dataset for this study was gathered from the UCI Repository, which is open to the public. Dataset name: ASD Screening Data for Adult; attribute types: continuous and binary; number of instances: 704. There are sixteen characteristics in the dataset, a combination of categorical and numerical data, including: Questions 1-10, age, gender, whether the person was born with jaundice, whether any family member has ASD, who is completing the test, and class. The AQ-10 screening questions cover a variety of domains, including attention switching, imagination, communication and social interaction. The questions are graded using a one-point scoring system for each of the ten questions: on each question, the user can earn 0 or 1 point depending on their response.
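The one-point-per-question scoring scheme described above can be sketched as follows. The response values here are hypothetical example data: 1 means the answer earned a point, 0 means it did not.

```python
def aq10_score(responses):
    """Sum the binary scores of the ten AQ-10 questions."""
    if len(responses) != 10:
        raise ValueError("AQ-10 expects exactly ten responses")
    return sum(responses)

# Hypothetical answers for one test subject.
responses = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
score = aq10_score(responses)  # a total out of a possible 10
```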

• Preprocessing of Data

Data pre-processing removes outliers and noise from the raw data and makes it available for training the model. Simply put, data pre-processing is the major step in the project for obtaining the best accuracy. Raw data is thereby converted into something usable and understandable. Real-world data is frequently incomplete and inaccurate because it contains many errors and outliers. Several methods exist to handle such data, including handling incomplete data, outlier analysis, data reduction and discretization. The missing values in these datasets were resolved using the imputation method.
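A simple sketch of the imputation idea mentioned above: missing values (represented here as None) are replaced with the column mean. The data values are invented for illustration, and a real pipeline would typically use a library imputer instead.

```python
def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

# Hypothetical "age" column with two missing entries.
ages = [4, None, 6, 8, None]
filled = impute_mean(ages)  # missing entries become the mean, 6.0
```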

• Transformation of Data

Transforming data into appropriate forms for data mining is the third step. The transformation produces the right format for each feature involved in the data mining; the format of each feature is decided based on the techniques selected for mining.

• Data mining model building

Choosing a data mining algorithm appropriate for extracting the patterns in the data is done first. In this step, the data mining model is built: the model is trained on the training dataset and tested on the testing set. Data mining models include clustering, classification and prediction. Using an 80:20 ratio, the complete dataset is divided into a training set and a testing set respectively. Several types of supervised learning systems are then implemented, such as Random Forest (RF), KNN and Logistic Regression.
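The 80:20 split described above can be sketched in plain Python (a library helper would normally be used); the records and the seed value are placeholders:

```python
import random

def split_80_20(records, seed=42):
    """Shuffle records and return (training_set, testing_set)."""
    shuffled = records[:]                 # copy so the input is untouched
    random.Random(seed).shuffle(shuffled) # fixed seed for reproducibility
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]

data = list(range(100))          # 100 placeholder records
train, test = split_80_20(data)  # 80 for training, 20 for testing
```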

• Evaluation and Accuracy Prediction

In this step, the models developed for data mining are evaluated for performance and accuracy of results. The evaluation methods and metrics differ for each data mining technique. In addition to assessing accuracy, sensitivity, specificity and precision, the proposed model was also tested using the leave-one-out strategy on the AQ-10 dataset. As part of the validation process, field observations were conducted at various places using forms, collecting over 189 ASD cases and 515 cases without ASD from a special education institute for people with special needs.

Here we will evaluate the accuracy of the three algorithms: Random Forest, Logistic Regression and KNN.
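The metrics named above all derive from the counts of a binary confusion matrix. A minimal sketch, with invented counts:

```python
def metrics(tp, tn, fp, fn):
    """Compute the standard binary classification metrics."""
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # recall / true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision   = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, sensitivity, specificity, precision, f1

# Invented confusion-matrix counts for illustration.
acc, sens, spec, prec, f1 = metrics(tp=40, tn=50, fp=5, fn=5)
```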

Methods

Here we will use several classifiers and compare their accuracy scores; we will then select and fine-tune the best classifier after comparing effectiveness.

Logistic regression

Logistic regression estimates the probability of an event occurring, such as voted or didn't vote, based on a given dataset of independent variables. Since the outcome is a probability, the dependent variable is bounded between 0 and 1.

K-Nearest Neighbors

The K-NN algorithm stores all the available data and classifies a new data point based on its similarity to that data.
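The bounding of the logistic regression output between 0 and 1 comes from the logistic (sigmoid) function. A minimal sketch, where the weights and feature values are invented for illustration:

```python
import math

def sigmoid(z):
    """Map any real-valued score into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, features):
    """Probability estimate for one sample from a linear score."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Invented weights and features for a single sample.
p = predict_proba([0.8, -0.4], bias=0.1, features=[1.0, 2.0])
```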

Random Forest

An RF is an ensemble learning method that utilizes many individual classification and regression trees. A classification decision is made based on a majority vote of the trees' predictions. A beneficial feature of using an RF is the built-in bootstrapping, which leads to training and validation of the algorithm with less intrinsic bias in the analysis.
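The majority-vote step of a Random Forest can be sketched as follows. The per-tree predictions below are invented stand-ins for the outputs of real decision trees:

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the class predicted by the most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

# Hypothetical class votes from five trees in the forest.
votes = ["ASD", "ASD", "no-ASD", "ASD", "no-ASD"]
decision = majority_vote(votes)  # the 3-to-2 majority wins
```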

GUI Prediction

In the end, a Tkinter GUI application has been developed specifically for the general public. The user answers closed-ended questions to receive a result indicating autism or not. The input data are collected from the graphical screen, compared against the algorithm model, and the prediction result is displayed on the screen. The accuracy is calculated for the three algorithm models: Logistic Regression, Random Forest and KNN.

4. TESTING AND IMPLEMENTATION

4.1 System Testing

System testing is the process of exercising software with the intent of finding and ultimately correcting errors. This fundamental philosophy does not change for web applications. Because Web-based systems and applications reside on a network and interoperate with many different operating systems, browsers, hardware platforms and communication protocols, the search for errors represents a significant challenge for web applications.

The distributed nature of client/server environments, the performance issues associated with transaction processing, the potential presence of a number of different hardware platforms, the complexities of network communication, the need to serve multiple clients from a centralized database, and the requirements imposed on the server all combine to make testing of client/server architectures challenging.

System testing is actually a series of different tests whose primary purpose is to fully exercise the computer-based system. System testing is the stage of implementation aimed at assuring that the system works accurately and efficiently. Testing is vital to the success of the system. System testing makes the logical assumption that if all the parts of the system are correct, the goal will be successfully achieved.

The Objectives Of Testing Are As Follows

• Testing is the process of executing a program with the intent of finding an error.
• A successful test is one that uncovers an as-yet-undiscovered error.

Testing Issues

• Client GUI considerations
• Target environment and platform diversity considerations
• Distributed database considerations
• Distributed processing considerations

Testing Methodologies

System testing is the stage of implementation aimed at ensuring that the system works accurately and efficiently as expected before live operation commences. It certifies that the whole set of programs hangs together. System testing requires a test plan that consists of several key activities and steps for program, string, system and user acceptance testing. The implementation of the newly designed package is important in adopting a successful new system.

Testing is an important stage in software development. The system test in the implementation stage should confirm that all is correct and provide an opportunity to show the users that the system works as expected. It accounts for the largest percentage of technical effort in the software development process.

The testing phase in the development cycle validates the code against the functional specification. Testing is vital to the achievement of the system goals, and its objective is to discover errors. To fulfill this objective, a series of test steps (unit, integration, validation and system tests) were planned and executed.

4.2 Types of Testing

The test steps are,

• Unit testing
• Integration testing
• Functional testing
• System testing
• White box testing
• Black box testing

4.2.1 Unit testing

Unit testing involves the design of test cases that validate that the internal program logic is functioning properly and that program inputs produce valid outputs. All decision branches and internal code flow should be validated. It is the testing of individual software units of the application, done after the completion of each individual unit and before integration. This is structural testing that relies on knowledge of the unit's construction and is invasive.

Unit tests perform basic tests at component level and test a specific business process,
application, and/or system configuration. Unit tests ensure that each unique path of a business
process performs accurately to the documented specifications and contains clearly defined
inputs and expected results.
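A minimal sketch of unit testing a single component in isolation, as described above, using Python's built-in unittest framework. The function under test is a hypothetical example, not part of the actual project code:

```python
import unittest

def classify(score, threshold=6):
    """Flag a screening score at or above the threshold."""
    return score >= threshold

class TestClassify(unittest.TestCase):
    def test_above_threshold(self):
        self.assertTrue(classify(7))

    def test_below_threshold(self):
        self.assertFalse(classify(3))

# Run the tests programmatically rather than via unittest.main().
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestClassify)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test exercises one clearly defined input against one expected result, which is exactly the "documented specification" discipline the section describes.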

4.2.2 Integration testing

Integration tests are designed to test integrated software components to determine whether they actually run as one program. Testing is event driven and is more concerned with the basic outcome of screens or fields. Integration tests demonstrate that although the components were individually satisfactory, as shown by successful unit testing, the combination of components is correct and consistent. Integration testing is specifically aimed at exposing the problems that arise from the combination of components.

4.2.3 Functional testing

Functional tests provide systematic demonstrations that functions tested are available
as specified by the business and technical requirements, system documentation, and user
manuals.
Functional testing is centered on the following items:

Valid Input : identified classes of valid input must be accepted.
Invalid Input : identified classes of invalid input must be rejected.
Functions : identified functions must be exercised.
Output : identified classes of application outputs must be exercised.
Systems/Procedures : interfacing systems or procedures must be invoked.

Organization and preparation of functional tests focuses on requirements, key functions and special test cases. In addition, systematic coverage of identified business process flows, data fields, predefined processes and successive processes must be considered for testing. Before functional testing is complete, additional tests are identified and the effective value of current tests is determined.

4.2.4 System Testing

System testing ensures that the entire integrated software system meets requirements.
It tests a configuration to ensure known and predictable results. An example of system testing
is the configuration oriented system integration test. System testing is based on process
descriptions and flows, emphasizing pre-driven process links and integration points.

4.2.5 White Box Testing

White Box Testing is testing in which the software tester has knowledge of the inner workings, structure and language of the software, or at least its purpose. It is used to test areas that cannot be reached from a black box level.

4.2.6 Black Box Testing

Black Box Testing is testing the software without any knowledge of the inner workings, structure or language of the module being tested. Black box tests, like most other kinds of tests, must be written from a definitive source document, such as a specification or requirements document. It is testing in which the software under test is treated as a black box: you cannot "see" into it. The test provides inputs and responds to outputs without considering how the software works.

Quality Assurance

Quality assurance consists of the auditing and reporting functions of management. The goal of quality assurance is to provide management with the data necessary to be informed about product quality, thereby gaining insight and confidence that product quality is meeting its goal.

Quality Assurance Goals

• Correctness

The extent to which the program meets system specifications and user objectives.

• Reliability

The degree to which the system performs its intended functions over time.

• Efficiency

The amount of computer resources required by a program to perform a function.

• Usability

The effort required to learn and operate a system.

• Maintainability

The ease with which program errors are located and corrected.

• Testability

The effort required to test a program to ensure its correct performance.

• Portability

The ease of transporting a program from one hardware configuration to another.

• Accuracy

The required precision in input editing, computations and output.

Genetic Risk

Genetic risks are estimated to contribute 40 to 80 percent of ASD risk. The risk from gene variants combined with environmental risk factors, such as parental age, birth complications, and others that have not been identified, determines an individual's risk of developing this complex condition. Risk factors may include having a sibling with autism, older parents, and certain genetic conditions such as Down, fragile X, and Rett syndromes.

Security Technologies and Policies

Any system developed should be secured and protected against possible hazards. Security measures are provided to prevent unauthorized access to the database at various levels. An uninterrupted power supply should be provided so that power failures or voltage fluctuations will not erase the data in the files.

Password protection and simple procedures to change the unauthorized access are
provided to the users. The system allows the user to enter the system for product management
and order status entry only through login utility. The user will have to enter the user name
and password.
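One common way to implement such password protection, sketched below with Python's standard library (an illustration, not the project's actual login code), is to store a salted hash of the password rather than the plain text, and to compare digests in constant time at login.

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    # Store a salted PBKDF2 hash instead of the plain-text password.
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac('sha256', password.encode(), salt, 100_000)
    return salt, digest

def verify_password(password, salt, stored_digest):
    # Recompute the hash with the stored salt and compare in constant time.
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored_digest)

salt, stored = hash_password('s3cret')
print(verify_password('s3cret', salt, stored))   # correct password accepted
print(verify_password('wrong', salt, stored))    # wrong password rejected
```

Even if the credential store is leaked, the salted hashes cannot be reversed into the original passwords, which is the point of this design.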

A multi-layered security architecture comprising firewalls, filtering routers, encryption and digital certification must be ensured in this project so that order and payment details are protected from unauthorized access in real time. The customer can access the order status only by using his customer code and order number.

4.4 System Implementation

Implementation is the stage of the project where the theoretical design is turned into a working system. The most crucial part of this stage is achieving a successful new system and giving the user confidence that the new system will work efficiently and effectively.

The Stage Consists Of

• Testing the developed program with sample data.
• Detection and correction of errors.
• Checking whether the system meets user requirements.
• Making necessary changes as desired by the user.
• Training user personnel.

Implementation Procedures

The implementation phase is less creative than system design. A system project may be dropped at any time prior to implementation, although this becomes more difficult once it reaches the design phase.

The final report of the implementation phase includes procedural flowcharts, record layouts, report layouts, and a workable plan for turning the candidate system design into an operational one. Conversion is one aspect of implementation. Several procedures and documents are unique to the conversion phase. They include the following:

First, frame selection is performed: frames with a sufficient number of blocks are selected. Next, only some predetermined low-frequency DCT coefficients are permitted to hide data. Then the average energy of the block is expected to be greater than a predetermined threshold. In the final stage, the energy of each coefficient is compared against another threshold.

The unselected blocks are labeled as erasures and are not processed. For each selected block there exists a variable number of coefficients. These coefficients are used to embed and decode a single message bit by employing the multi-dimensional form of FZDH, which uses a cubic lattice as its base quantizer.

User Manual

User Training

User training is designed to prepare the user for testing and accepting the system. It includes:

• User Manual.
• Help Screens.
• Training Demonstration.

1) User Manual

The summary of important functions about the system and software can be provided
as a document to the user.

2) Help Screens

This feature is now available in every software package, especially when it is used with a menu. The user selects the “Help” option from the menu, and the system displays the necessary description or information for reference.

3) Training Demonstration

Another user training element is a training demonstration. Live demonstrations with personal contact are extremely effective for training users.

4.5 System Maintenance

Maintenance is actually the implementation of the review plan. As important as it is, many programmers and analysts are reluctant to perform maintenance or identify themselves with the maintenance effort, for psychological, personality and professional reasons. Analysts and programmers spend far more time maintaining programs than they do writing them: maintenance accounts for 50-80 percent of total system development effort.

Maintenance is expensive. One way to reduce maintenance costs is through maintenance management and software modification audits.

• Maintenance is not as rewarding or exciting as developing systems. It is perceived as requiring neither skill nor experience.
• Users are not fully cognizant of the maintenance problem or its high cost.
• Few tools and techniques are available for maintenance.
• A good test plan is lacking.
• Standards, procedures, and guidelines are poorly defined and enforced.
• Programs are often maintained without care for structure and documentation.
• There are minimal standards for maintenance.
• Programmers expect that they will no longer be in their current role by the time their programs enter the maintenance cycle.

Corrective Maintenance

It means repairing processing or performance failures, or making changes because of previously uncovered problems or false assumptions.

Perfective Maintenance

It means changes made to a system to add new features or to improve performance.

Preventive Maintenance

Changes made to a system to avoid possible future problems.

5. CONCLUSION AND FUTURE ENHANCEMENT

5.1. Conclusion

This research provides a prediction model developed to predict autism traits. Using the AQ-10 dataset, the proposed model can predict autism with 92.89%, 96.20%, 100.00% and 79.14% accuracy for the Decision Tree, Random Forest, AdaBoost and SVM algorithms respectively. Comparing all four supervised machine learning algorithms, AdaBoost and Random Forest are the most efficient algorithms for prediction of ASD. This result showed better performance compared to other existing approaches to screening autism. Moreover, the proposed model can predict autism traits for age groups below 3 years, a feature that many other existing approaches miss.
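The four-way comparison reported above can be reproduced in outline as follows. This is a sketch on synthetic stand-in data, so the accuracies will not match the figures obtained on the real AQ-10 dataset.

```python
# Sketch of the four-classifier comparison; the data here is synthetic,
# generated to imitate ten binary AQ-style answers, not the real AQ-10 set.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(300, 10))   # ten binary answers per respondent
y = (X.sum(axis=1) >= 6).astype(int)     # toy screening label from the total score

models = {
    'Decision tree': DecisionTreeClassifier(random_state=0),
    'Random forest': RandomForestClassifier(random_state=0),
    'AdaBoost': AdaBoostClassifier(random_state=0),
    'SVM': SVC(),
}
# 5-fold cross-validated accuracy for each classifier
for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5, scoring='accuracy').mean()
    print(f'{name}: {score:.4f}')
```

On the real dataset, the same loop would simply take `X` and `y` from the encoded AQ-10 features and the `Class/ASD` label.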

A user-friendly web application has been developed for end users based on the proposed prediction model, so that individuals can use the application to predict autism traits easily.

5.2. Future Enhancement

The world of computers is not static; it is always subject to change, and today's technology may become outdated the very next day. To keep abreast of technological improvements the system needs refinement, so it will be improved with further enhancements whenever the user needs an additional feature.

The outcome of this research provides an effective and efficient approach to detecting autism traits for age groups 3 years and below. Since diagnosing autism traits is quite a costly and lengthy process, it is often delayed because of the difficulty of detecting autism in toddlers. With the help of the autism screening application, an individual can be guided at an early stage, which will prevent the situation from getting worse and reduce the costs associated with delayed diagnosis.

BIBLIOGRAPHY

Reference Books

1. Lutz, M. (2013). Learning Python (5th ed.). Beijing: O’Reilly Media.

2. Tibbits, S., van der Harten, A., & Baer, S. (2011). Rhino Python Primer (3rd ed.).

3. Downey, A. B. (2015). Think Python: How to Think Like a Computer Scientist (2nd ed.). Sebastopol, CA: O’Reilly Media.

4. Wilson, G. Data Crunching: Solve Everyday Problems Using Java, Python and More (The Pragmatic Programmers). Raleigh: Pragmatic Bookshelf.

5. van Rossum, G., & Drake, F. L., Jr. The Python Tutorial: An Introduction to Python. Bristol: Network Theory Ltd.

6. Dawson, M. (2003). Python Programming for the Absolute Beginner. Boston, MA: Premier Press.

7. Deitel, H. M., Deitel, P., Liperi, J., & Wiedermann, B. Python How to Program. Englewood Cliffs: Prentice-Hall.

Reference Websites

• www.w3schools.com
• www.udemy.com
• www.learnpython.com
• www.guru99.com
• www.towardsdatascience.com
• www.kaggle.com

APPENDICES

A) System Flow Diagram

Start → Input Data (Autism Spectrum Data Set) → Pre-Processing → Data Partition (split data into training data and test data) → Model Building (Logistic Regression, Random Forest, KNN) → Accuracy Prediction Result
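The flow above can be sketched end to end with scikit-learn. This is a minimal illustration on synthetic stand-in data; the file names and the toy label rule are placeholders, not the project's actual artifacts.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Input data: synthetic stand-in for the autism spectrum data set
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(200, 10)).astype(float)
y = (X.sum(axis=1) >= 6).astype(int)

# Data partition: split into training data and test data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Pre-processing and model building combined in one pipeline
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# Accuracy prediction result
print('accuracy:', accuracy_score(y_test, model.predict(X_test)))
```

Swapping `LogisticRegression()` for `RandomForestClassifier()` or `KNeighborsClassifier()` covers the other two model branches of the diagram.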

B) Sample Coding

from sklearn.model_selection import train_test_split


from sklearn.metrics import confusion_matrix
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder
from tkinter import *
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import roc_curve, auc
from sklearn import metrics

data = pd.read_csv(r'csv_result-Autism_Data.csv')
n_records = len(data.index)
n_asd_yes = len(data[data['Class/ASD'] == 'YES'])
n_asd_no = len(data[data['Class/ASD'] == 'NO'])
yes_percent = float(n_asd_yes) / n_records * 100

data.replace("?", np.nan, inplace=True)


total_missing_data = data.isnull().sum().sort_values(ascending=False)
percent_of_missing_data = (
    data.isnull().sum() / data.isnull().count() * 100).sort_values(ascending=False)
missing_data = pd.concat([total_missing_data, percent_of_missing_data],
                         axis=1, keys=['Total', 'Percent'])

data.dropna(inplace=True)
gender_n = {"m": 1, "f": 0}
jundice_n = {"yes": 1, "no": 0}
austim_n = {"yes": 1, "no": 0}
used_app_before_n = {"yes": 1, "no": 0}
result_n = {"YES": 1, "No": 0}
# Encode all columns into numeric values
for column in data.columns:
    le = LabelEncoder()
    data[column] = le.fit_transform(data[column])
features = ['A1_Score', 'A2_Score', 'A3_Score', 'A4_Score', 'A5_Score', 'A6_Score',
'A7_Score', 'A8_Score', 'A9_Score', 'A10_Score','ethnicity','contry_of_res','relation']
predicted = ['Class/ASD']
X = data[features].values
y = data[predicted].values
split_test_size = 0.20
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=split_test_size, random_state=0)
# Setup a knn classifier with k neighbors
knn = KNeighborsClassifier()
# Fit the model
knn.fit(X_train, y_train.ravel())
from PIL import Image, ImageTk
top = Tk()

# You can set the geometry attribute to change the root windows size
top.geometry("700x700") # You want the size of the app to be 500x500
top.resizable(0, 0) # Don't allow resizing in the x or y direction
top.title('Autism Spectrum Disorder Classification Tool')
top.option_add("*Button.Background", "grey")
top.option_add("*Button.Foreground", "black")
from PIL import ImageTk, Image
photo = ImageTk.PhotoImage(Image.open(r'ribbon.png'))
logo = Label(top, image=photo)
logo.pack()
label_pos_x = 30
label_pos_y = 50

Label(top, text="A1_Score").place(x=label_pos_x, y=label_pos_y)


label_pos_y += 20

Label(top, text="A2_Score").place(x=label_pos_x, y=label_pos_y)


label_pos_y += 20

Label(top, text="A3_Score").place(x=label_pos_x, y=label_pos_y)


label_pos_y += 20

Label(top, text="A4_Score").place(x=label_pos_x, y=label_pos_y)


label_pos_y += 20

Label(top, text="A5_Score").place(x=label_pos_x, y=label_pos_y)


label_pos_y += 20

Label(top, text="A6_Score").place(x=label_pos_x, y=label_pos_y)


label_pos_y += 20

Label(top, text="A7_Score").place(x=label_pos_x, y=label_pos_y)


label_pos_y += 20

Label(top, text="A8_Score").place(x=label_pos_x, y=label_pos_y)


label_pos_y += 20

Label(top, text="A9_Score").place(x=label_pos_x, y=label_pos_y)


label_pos_y += 20

Label(top, text="A10_Score").place(x=label_pos_x, y=label_pos_y)


label_pos_y += 20
Label(top, text="Ethnicity").place(x=label_pos_x, y=label_pos_y)
label_pos_y += 20

Label(top, text="Contry_of_res").place(x=label_pos_x, y=label_pos_y)


label_pos_y += 20

Label(top, text="Relation").place(x=label_pos_x, y=label_pos_y)


label_pos_y += 20

entry_pos_x = 200
entry_pos_y = 50

e1 = Entry(top)
e1.place(x=entry_pos_x, y=entry_pos_y)
entry_pos_y += 20

e2 = Entry(top)
e2.place(x=entry_pos_x, y=entry_pos_y)
entry_pos_y += 20

e3 = Entry(top)
e3.place(x=entry_pos_x, y=entry_pos_y)
entry_pos_y += 20

e4 = Entry(top)
e4.place(x=entry_pos_x, y=entry_pos_y)
entry_pos_y += 20

e5 = Entry(top)
e5.place(x=entry_pos_x, y=entry_pos_y)
entry_pos_y += 20

e6 = Entry(top)
e6.place(x=entry_pos_x, y=entry_pos_y)
entry_pos_y += 20

e7 = Entry(top)
e7.place(x=entry_pos_x, y=entry_pos_y)
entry_pos_y += 20

e8 = Entry(top)
e8.place(x=entry_pos_x, y=entry_pos_y)
entry_pos_y += 20

e9 = Entry(top)
e9.place(x=entry_pos_x, y=entry_pos_y)
entry_pos_y += 20

e10 = Entry(top)
e10.place(x=entry_pos_x, y=entry_pos_y)
entry_pos_y += 20

e11 = Entry(top)
e11.place(x=entry_pos_x, y=entry_pos_y)
entry_pos_y += 20

e12 = Entry(top)
e12.place(x=entry_pos_x, y=entry_pos_y)
entry_pos_y += 20

e13 = Entry(top)
e13.place(x=entry_pos_x, y=entry_pos_y)
entry_pos_y += 20

entryText = StringVar()
prediction_entry = Entry(top, textvariable=entryText, width=60)
prediction_entry.place(x=200, y=460)
entry_pos_y += 20

entryAccuracy = StringVar()
accuracy_entry = Entry(top, textvariable=entryAccuracy, width=60)
accuracy_entry.place(x=200, y=460+20)
entry_pos_y += 20

Label(top, text="List of features:").place(x=label_pos_x, y=label_pos_y)


label_pos_y += 20

label_pos_x = 400
label_pos_y = 20

Label(top, text="Classifier Performance:").place(x=label_pos_x, y=label_pos_y)


label_pos_y += 20

label_pos_x = 100
label_pos_y = 460

Label(top, text="Classification:").place(x=label_pos_x, y=label_pos_y)


label_pos_y += 20

Label(top, text="Accuracy:").place(x=label_pos_x, y=label_pos_y)


label_pos_y += 20

acc = []
acc1 = []
total_accuracy = {}

def accuracy(model):
    pred = model.predict(X_test)
    pred = (pred > 0.5)
    accu = metrics.accuracy_score(y_test, pred)
    errors = abs(pred - y_test)
    print('Model Performance')
    print("\nAccuracy Of the Model: ", accu)
    entryAccuracy.set('Accuracy Of the Model: ' + str(accu))
    print("\nAverage Error: {:0.2f} degrees.".format(np.mean(errors)))
    total_accuracy[str(str(model).split('(')[0])] = accu

    model_test = model.predict(X_test)
    # Confusion matrix: true negatives, false positives, etc.
    cm = confusion_matrix(y_test, model_test)
    total1 = sum(sum(cm))

    # From the confusion matrix, calculate specificity and sensitivity
    specificity1 = cm[0, 0] / (cm[0, 0] + cm[0, 1])
    print('Specificity Of the Model: ', specificity1, '\n')

    sensitivity1 = cm[1, 1] / (cm[1, 0] + cm[1, 1])
    print('Sensitivity Of the Model: ', sensitivity1, '\n')

    acc.append([accu, sensitivity1, specificity1])

def classify_using_knn():
    A1_Score = float(e1.get())
    A2_Score = float(e2.get())
    A3_Score = float(e3.get())
    A4_Score = float(e4.get())
    A5_Score = float(e5.get())
    A6_Score = float(e6.get())
    A7_Score = float(e7.get())
    A8_Score = float(e8.get())
    A9_Score = float(e9.get())
    A10_Score = float(e10.get())
    A11_Score = float(e11.get())
    A12_Score = float(e12.get())
    A13_Score = float(e13.get())

    # Tuned KNN classifier (Hamming distance suits the binary/categorical inputs)
    best_grid = KNeighborsClassifier(n_neighbors=14, metric='hamming')
    best_grid.fit(X_train, y_train.ravel())

    prediction = best_grid.predict([[A1_Score, A2_Score, A3_Score, A4_Score,
                                     A5_Score, A6_Score, A7_Score, A8_Score,
                                     A9_Score, A10_Score, A11_Score, A12_Score,
                                     A13_Score]])
    accuracy(best_grid)

    if prediction[0] == 0:
        entryText.set('Not pre-diagnosed with ASD')
    else:
        entryText.set('Pre-diagnosed with ASD, seek clinician for further assistance')

def classify_using_rf():
    A1_Score = float(e1.get())
    A2_Score = float(e2.get())
    A3_Score = float(e3.get())
    A4_Score = float(e4.get())
    A5_Score = float(e5.get())
    A6_Score = float(e6.get())
    A7_Score = float(e7.get())
    A8_Score = float(e8.get())
    A9_Score = float(e9.get())
    A10_Score = float(e10.get())
    A11_Score = float(e11.get())
    A12_Score = float(e12.get())
    A13_Score = float(e13.get())

    # Tuned random forest classifier (RandomForestClassifier imported at top)
    RF3 = RandomForestClassifier(n_estimators=64, min_samples_split=12,
                                 min_samples_leaf=2, max_features=13,
                                 max_depth=11, bootstrap=True)
    RF3.fit(X_train, y_train.ravel())

    prediction = RF3.predict([[A1_Score, A2_Score, A3_Score, A4_Score,
                               A5_Score, A6_Score, A7_Score, A8_Score,
                               A9_Score, A10_Score, A11_Score, A12_Score,
                               A13_Score]])
    accuracy(RF3)

    if prediction[0] == 0:
        entryText.set('Not pre-diagnosed with ASD')
    else:
        entryText.set('Pre-diagnosed with ASD, seek clinician for further assistance')

def classify_using_lr():
    A1_Score = float(e1.get())
    A2_Score = float(e2.get())
    A3_Score = float(e3.get())
    A4_Score = float(e4.get())
    A5_Score = float(e5.get())
    A6_Score = float(e6.get())
    A7_Score = float(e7.get())
    A8_Score = float(e8.get())
    A9_Score = float(e9.get())
    A10_Score = float(e10.get())
    A11_Score = float(e11.get())
    A12_Score = float(e12.get())
    A13_Score = float(e13.get())

    from sklearn.model_selection import KFold, cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Create standardizer
    standardizer = StandardScaler()

    # Search over regularization strengths for logistic regression
    best_score = 0
    for c in [0.00001, 0.0001, 0.001, 0.1, 1, 10]:
        logRegModel = LogisticRegression(C=c)

        # Create a pipeline that standardizes, then runs logistic regression
        pipeline = make_pipeline(standardizer, logRegModel)

        # Create k-fold cross-validation and evaluate the pipeline
        kf = KFold(n_splits=5, shuffle=True, random_state=42)
        scores = cross_val_score(pipeline, X_train, y_train.ravel(),
                                 cv=kf, scoring='accuracy', n_jobs=-1)

        # Compute mean cross-validation accuracy
        score = np.mean(scores)

        # Track the best parameter and score
        if score > best_score:
            best_score = score
            best_parameters = c

    LogRegModel = LogisticRegression().fit(X_train, y_train.ravel())

    prediction = LogRegModel.predict([[A1_Score, A2_Score, A3_Score, A4_Score,
                                       A5_Score, A6_Score, A7_Score, A8_Score,
                                       A9_Score, A10_Score, A11_Score, A12_Score,
                                       A13_Score]])
    accuracy(LogRegModel)

    if prediction[0] == 0:
        entryText.set('Not pre-diagnosed with ASD')
    else:
        entryText.set('Pre-diagnosed with ASD, seek clinician for further assistance')

Button(top, text="Classify using K-Nearest Neighbors", command=classify_using_knn,
       activebackground="pink", activeforeground="blue").place(x=200, y=420)
Button(top, text="Classify using Random Forest", command=classify_using_rf,
       activebackground="pink", activeforeground="blue").place(x=200, y=385)
Button(top, text="Classify using Logistic Regression", command=classify_using_lr,
       activebackground="pink", activeforeground="blue").place(x=200, y=350)
top.mainloop()

C) Sample Input Forms

Sample Screen

Login page

D) Sample Output Forms

Classifier performance

Model performance
