
Heart Disease Prediction Final Report

The document is a major project report submitted by Harshit More and Nikhil Kute for their Bachelor of Technology degree. It discusses the development of a machine learning model to predict heart diseases. The model will be developed under the supervision of Prof. Deepak Rathore. The report includes an introduction to heart diseases, machine learning, and data mining techniques. It also discusses the motivation, objectives, and organization of the project report.


Prediction of Heart Diseases using Machine Learning
A Major Project Report
Submitted in Partial fulfillment for the award of
Bachelor of Technology in Computer Science & Engineering

Submitted to
RAJIV GANDHI PROUDYOGIKI VISHWAVIDYALAYA
BHOPAL (M.P)

MAJOR PROJECT REPORT


Submitted by
Harshit More [0103CS193D05]
Nikhil Kute [0103CS193D11]

Under the supervision of


Prof. Deepak Rathore
Assistant Professor

Department of Computer Science & Engineering


Lakshmi Narain College of Technology, Bhopal (M.P.)
Session 2021-22
LAKSHMI NARAIN COLLEGE OF TECHNOLOGY,

BHOPAL

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

CERTIFICATE

This is to certify that the work embodied in this project work entitled
“Prediction of Heart Diseases using Machine Learning” has been
satisfactorily completed by Harshit More (0103CS193D05) and Nikhil
Kute (0103CS193D11). It is a bona fide piece of work, carried out under
guidance in the Department of Computer Science & Engineering,
Lakshmi Narain College of Technology, Bhopal, in partial
fulfillment of the Bachelor of Technology during the academic year 2021-22.

Prof. Deepak Rathore


Assistant Professor

Approved By
Dr. Sadhna K. Mishra
Prof. & Head
Department of Computer Science & Engineering
LAKSHMI NARAIN COLLEGE OF TECHNOLOGY,
BHOPAL

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

ACKNOWLEDGEMENT

We express our deep sense of gratitude to Prof. Deepak Rathore (Guide),
Department of Computer Science & Engineering, L.N.C.T., Bhopal, whose
kindness, valuable guidance and timely help encouraged us to complete this
project.

Special thanks go to Dr. Sadhna K. Mishra (HOD), who helped us complete
this project work. She shared interesting ideas and
thoughts that made this project work successful.

We would also like to thank our institution and all the faculty members, without
whom this project work would have remained a distant reality.

Harshit More [0103CS193D05]
Nikhil Kute [0103CS193D11]
Table of Contents

Sr. No.  Topic

1. Abstract
2. Introduction
3. Data Mining and Its Applications
4. Literature Review
5. Problem Definition & Proposed Method
6. Experiment Setup
7. Conclusion & Future Work


ABSTRACT

Heart-related diseases, or cardiovascular diseases (CVDs), have been the main
reason for a huge number of deaths in the world over the last few
decades and have emerged as the most life-threatening diseases, not only in
India but in the whole world. There is therefore a need for a reliable,
accurate and feasible system to diagnose such diseases in time for proper
treatment. Machine learning algorithms and techniques have
been applied to various medical datasets to automate the analysis of large
and complex data. In recent times, many researchers have
used several machine learning techniques to help the healthcare
industry and professionals diagnose heart-related
diseases. This paper presents a survey of various models based on such
algorithms and techniques and analyzes their performance. Models
based on supervised learning algorithms such as Support Vector Machines
(SVM), K-Nearest Neighbor (KNN), Naïve Bayes, Decision
Trees (DT), Random Forest (RF) and ensemble models are found to be very
popular among researchers.
CHAPTER 1
INTRODUCTION

1.1 Introduction
We are living in a very fast and hectic world. Few people pay attention to their health, because
everybody is concentrating on their career. Because of this neglect, heart disease is one of the
fastest-growing health problems. Suppose you have just been diagnosed with heart disease. It is
very important to start understanding this illness: by living a healthier life day by day, you can
stop it from progressing.

There are many causes of heart failure, but the condition is generally broken down into two
types:

Heart failure with reduced left ventricular function (HF-rEF)

The lower left chamber of the heart (left ventricle) gets bigger (enlarges) and cannot squeeze
(contract) hard enough to pump the right amount of oxygen-rich blood to the rest of the body.

Heart failure with preserved left ventricular function (HF-pEF)

The heart contracts and pumps normally, but the bottom chambers of the heart (ventricles) are
thicker and stiffer than normal. Because of this, the ventricles can't relax properly and fill up all
the way. Because there's less blood in the ventricles, less blood is pumped out to the rest of the
body when the heart contracts.
1.2 Machine Learning
This field is not new: going through multiple papers and publications, we found that machine
learning emerged around 1960. Over the last few years, machine learning has spread to many
domains because of the growing volume of data in daily life. Many online web applications generate
huge amounts of logs and data that can be processed by such algorithms.

Predictive analysis is an analytics process that analyses historical and new data to
forecast behaviour and trends for future prediction goals. It is closely related to
advanced analytics.

Figure 1.2: Types of Prediction (machine learning techniques, data mining techniques, nature-inspired algorithms)

1.3 Introduction to the Disease

The disease is emerging as a serious chronic condition and has now become an epidemic in a high
percentage of urban areas. In India, its occurrence has risen significantly, from approximately 12 to
19 percent in urban areas, and to 6.5 percent in rural areas. It is important
to note that this growth rate is 49 to 79 percent higher than that of China.

Figure 1.3: Diseases % Infected

Figure 1.3 depicts the overall estimated numbers of pre-disease cases, disease cases and
total cases in India, which clearly indicate the seriousness of the issue.
1.4 Diagnosis
Figure 1.4: Types of Diagnosis

A1C: The A1C test measures your average blood sugar over the past two to three months. The
advantage of being diagnosed this way is that you don't have to fast or drink anything.

Fasting Plasma Glucose (FPG): This test checks your fasting blood sugar levels. Fasting
means after not having anything to eat or drink (except water) for at least 8 hours before the test.
This test is usually done first thing in the morning, before breakfast.

Oral Glucose Tolerance Test (OGTT): The OGTT is a two-hour test that checks your blood
sugar levels before and two hours after you drink a special sweet drink. It tells the doctor how
your body processes sugar.

1.5 Research Areas in Heart Diseases

Research in this field is carried out by the American Heart Diseases Society, with grants
provided by different sources and industries. Many innovations in the field are promising. Each
scientific group works on its own specific projects, and everyone contributes to improving
people's lives throughout society. The distribution of funds across different areas is given below:

Figure 1.5: Research areas according to the American Heart Diseases Society [8]
Type 1 research:
Type 1 diabetes is caused by an autoimmune attack on beta cells, which eliminates the body's
ability to produce insulin. In 1921, research led to the discovery of insulin, changing type 1
diabetes from a life-threatening condition into a manageable one.
Type 2 research:
Type 2 diabetes develops from both genetic and environmental factors and affects the body's
ability to make or use insulin. Research over the past three decades has aimed to improve
patients' glucose control.
Type 1 and type 2:
Type 1 and type 2 diabetes have different underlying causes, but both result in high blood
glucose and lead to similar complications.
Obesity:
Obesity significantly increases the risk of type 2 diabetes and complicates the management of
type 1 diabetes.
1.6 Motivation
The motivation of our dissertation is that in today's hectic life nobody has time to monitor
their lifestyle, and once a disease sets in, we are all under threat. We therefore concluded that
detecting our behaviour through some mechanism could give us prior information about
diseases.
1.7 What Does the Science Say About Diseases?
“What can I eat?” is one of the top questions asked by people when they are first diagnosed
with a disease. Everybody has to follow a diet plan once diagnosed. A sample food
plan is given below:

Figure 1.6: Food Plan [8]


1.8 Pre-disease Risk Factors
A pre-disease condition can be recognized by the following factors, which we need to watch. By
managing them, we can reduce the risk of heart disease:

➢ Are you 45 or older?
➢ Do you have high blood pressure?
➢ Does a family member have diabetes?
➢ Do you have low HDL cholesterol?
➢ Are you overweight?
➢ Did you have diabetes during pregnancy?
➢ Are you physically inactive?
➢ Do you have polycystic ovary syndrome?

Figure 1.7: Pre-disease Risk Factors

CHAPTER 2
DATA MINING AND ITS APPLICATION

2.1 Data Mining (DM)

“Knowledge shows the way to Power and Success”

Data mining technology arose to meet people's needs. DM is sometimes also called
Knowledge Discovery from Databases (KDD). A tremendous amount of data and information is
being collected with the help of computing devices and the latest technologies. Data is now
everywhere: in business transactions, government, healthcare, websites, scientific data and
more. Retrieval alone is not enough for decision-making, so DM comes into the picture to
summarize data into valuable information, i.e. knowledge discovery and the discovery of
patterns in raw data [9].
In the beginning, we simply stored all data. Unfortunately, these gigantic collections of data,
accumulated in dissimilar data structures, very rapidly became overwhelming. DM can extract
implicit but potentially useful information and knowledge, which people do not know in
advance, from large amounts of noisy, incomplete, random and fuzzy data in practical applications.
DM is a thriving field and a powerful means of extracting useful knowledge from massive amounts
of data, bridging the gap between data and knowledge.

Another definition of DM is the investigation and analysis of huge quantities of data in order to
discover valid, novel, potentially useful and ultimately understandable patterns in data: a
process of analysing large databases with intelligent algorithms to find patterns that
are:

✓ Valid: the patterns hold true in general.
✓ Novel: the patterns were not known beforehand.
✓ Valuable: the patterns can be turned into actions.
✓ Understandable: the patterns can be interpreted and explained.

DM and KDD form a new interdisciplinary field, merging ideas from statistics, machine learning,
databases and parallel computing.
Researchers have defined the term ‘data mining’ in many ways.
A few definitions of DM or KDD available in the literature are given below.

2.2 KDD (Knowledge Discovery in Databases)

The KDD process is a data mining methodology used to extract hidden knowledge
from a large database by applying pre-processing and data transformation steps.

1. Identification of goal: definition of the application goal; known prior problem
2. Target data set: data set selection; data set creation
3. Data pre-processing: removing noisy data; handling missing data
4. Data transformation: finding useful features; finding weighted values
5. Data mining: choosing the DM function; search for presentation
6. Presentation: visualization; replacing redundant patterns

Figure 2.1: KDD Process
This research predicts diseases using the Knowledge Discovery in Databases (KDD)
methodology. KDD is the process of extracting knowledge from large databases and emphasizes the
“high-level” application of particular data mining methods. The KDD process consists of nine steps,
which are iterative and interactive in nature [9]. Because the process is iterative at each step,
one may have to move back to a previous step. The process starts with determining
the KDD goals and ends with the implementation of the discovered knowledge.
KDD Steps:

1. Developing an understanding of:
➢ The relevant prior knowledge.
➢ The aim of the end user.
2. Creating a target data set, or selecting a data set, on which discovery is to be performed.
3. Data cleaning and pre-processing:
➢ Removal of noise from the dataset.
➢ A plan of action for handling missing data.
4. Data reduction:
➢ Finding useful features to represent the data, depending on the aim of the task.
➢ Using dimensionality reduction methods to reduce the number of variables
used to represent the data.
5. Choosing the data mining task:
➢ Deciding whether the aim of the KDD process is classification, regression, clustering or
some other task.
6. Choosing the data mining algorithm(s):
➢ Selecting the methods to be used to search for patterns in the data.
➢ Deciding which models and parameters may be appropriate.
7. Data mining:
➢ Searching for patterns in a particular representation, such as classification rules or trees,
regression or clustering.
8. Interpreting the mined patterns.
9. Consolidating the discovered knowledge.
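Steps 3 and 4 can be illustrated with a short plain-Python sketch. It assumes records are lists of numbers in which None marks a missing value. Mean imputation, dropping constant columns and min-max scaling are common choices picked for this illustration, not methods the KDD process prescribes, and the three records are made up.

```python
from statistics import mean


def impute_missing(rows):
    """KDD step 3: replace each None with the mean of its column."""
    n_cols = len(rows[0])
    col_means = []
    for c in range(n_cols):
        present = [r[c] for r in rows if r[c] is not None]
        col_means.append(mean(present) if present else 0.0)
    return [[col_means[c] if r[c] is None else r[c] for c in range(n_cols)]
            for r in rows]


def drop_constant_columns(rows):
    """KDD step 4: a very simple reduction that drops columns holding a single value."""
    n_cols = len(rows[0])
    keep = [c for c in range(n_cols) if len({r[c] for r in rows}) > 1]
    return [[r[c] for c in keep] for r in rows], keep


def min_max_scale(rows):
    """Transformation: rescale each column to [0, 1] (a constant column becomes 0)."""
    n_cols = len(rows[0])
    lows = [min(r[c] for r in rows) for c in range(n_cols)]
    highs = [max(r[c] for r in rows) for c in range(n_cols)]
    return [[(r[c] - lows[c]) / (highs[c] - lows[c]) if highs[c] > lows[c] else 0.0
             for c in range(n_cols)] for r in rows]


# Hypothetical records: three features, one missing value, one constant column.
raw = [[1.0, None, 5.0],
       [3.0, 4.0, 5.0],
       [5.0, 8.0, 5.0]]
cleaned = impute_missing(raw)                   # None -> 6.0, the mean of 4.0 and 8.0
reduced, kept = drop_constant_columns(cleaned)  # the constant third column is removed
scaled = min_max_scale(reduced)
print(kept, scaled)
```

Real projects would usually rely on a library for these steps; the point here is only to show what each KDD step does to the data.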

2.3 Data mining process


Data mining is the process of extracting hidden, previously unknown patterns from a huge
database or data warehouse. Data mining is also known as knowledge discovery from data
(KDD). Data mining plays an important role in various areas such as banking, education, health care
and medicine. Many organizations use data mining techniques to analyse large datasets, to support
the decision-making process and to get better results for their long-term needs.

Figure 2.1: Data Mining Process Steps (data selection → data processing → data transformation → data mining → data evaluation)

Health organizations use data mining techniques to identify hidden patterns in disease and
drug datasets. These patterns are used to predict and detect different diseases, and they also
support decision making in clinical diagnosis. Some of the data mining techniques used for the
prediction and detection of different diseases are listed below [24].

2.3.1 Data Mining Techniques


Classification is the process of finding a model that describes and distinguishes data classes
or concepts based on a class label. Classification algorithms include
Artificial Neural Networks (ANN), decision trees, Bayesian networks and naïve Bayes.
Clustering is the process of analysing data objects without consulting a class label. It groups
objects into new classes by maximizing intra-class similarity and minimizing
inter-class similarity. A widely used clustering algorithm is k-means clustering; k-nearest
neighbours, often mentioned alongside it, is a supervised classification algorithm rather than a
clustering algorithm.
Association rule learning is a machine learning method used for finding frequent patterns.
Association algorithms include the Apriori algorithm, the Eclat algorithm and the FP-growth
algorithm.

2.3.2 Applications of Data mining


Figure 2.2: Area where DM Used (applications: traffic prediction, video surveillance, search engine result refining, online fraud detection, product recommendations, future healthcare, manufacturing engineering)
Traffic Prediction: Google uses DM algorithms for traffic prediction. We all use GPS
navigation systems, which save vehicle locations in a central database and keep them
updated. The underlying problem is that only a limited number of
cars are equipped with GPS. Machine learning in such scenarios helps to estimate the
regions where congestion can be found on the basis of daily experience [7].
Video Surveillance: Imagine a single person monitoring multiple video cameras: a difficult
job, and a boring one as well. This is why the idea of training computers to do this job makes
sense.
Video surveillance systems nowadays are powered by AI that makes it possible to detect
crimes before they happen. They track unusual behaviour of people, such as standing
motionless for a long time, stumbling or napping on benches.
Search Engine Result Refining: Google and other search engines use DM to improve the
search results for you. Every time you execute a search, the algorithms at the back end keep
watch over how you respond to the results. If you open the top results and stay on the web page for
a long time, the search engine assumes that the results it displayed matched the query.
Similarly, if you reach the second or third page of the search results but do not open any of the
results, the search engine infers that the results served did not match your requirement. In this way,
the algorithms working at the back end improve the search results [7].
Online Fraud Detection: DM is proving its potential to make cyberspace a secure place, and
tracking online monetary fraud is one example. For example, PayPal is using ML for
protection against money laundering.
Product Recommendations: DM algorithms are used in product recommendations: a user sees on
their social media account the same product they viewed on an e-commerce website.
Future Healthcare: Data mining improves health systems. It uses data and analytics to identify
best practices that improve care and reduce costs. Researchers use data mining
approaches such as multi-dimensional databases, machine learning, soft computing, data
visualization and statistics. Mining can be used to predict the volume of patients in every
category. Methods are developed to ensure that patients get appropriate care at the right place
and at the right time.
Market Basket Analysis: Market basket analysis is a modelling technique based on the theory that
if you buy a certain group of items, you are more likely to buy another group of items. This
method allows the shopkeeper to learn the purchase behaviour of a customer. This
information can help the shopkeeper understand the customer's requirements and change the
shop's layout accordingly.
Education: A new emerging field, known as Educational Data Mining (EDM), is concerned with
developing techniques that discover knowledge from data obtained from educational
environments. The objectives of EDM are identified as predicting students' future learning
behaviour, understanding the effects of educational support, and improving scientific knowledge
about learning. An institution can use data mining to take correct decisions and to
predict students' progress. With the results, the institution can focus on how
to teach and what to teach [7].

CRM: Customer Relationship Management is about acquiring and retaining customers, advancing
customer loyalty and developing customer-focused strategies in order to maintain a proper
relationship with the customer.


2.3.3 Data Mining Challenges:

• Developing a Unifying Theory of Data Mining.

• Scaling Up for High Dimensional Data/High Speed Streams.

• Mining Sequence Data and Time Series Data.


2.4 Introduction to Machine Learning

Machine learning works on a very simple concept: learning from experience. Machine
learning teaches computers to do what comes naturally to humans and animals, namely to learn
from experience. Machine learning algorithms learn from past data and predict future data: we
train a computer with an algorithm on some data and then predict future results. The algorithms
adaptively improve their performance as the number of samples available for learning increases.

2.4.1 Types of Techniques of Machine Learning

Figure 2.3: Types of Machine Learning (supervised, unsupervised, semi-supervised, reinforcement, multitask, ensemble, neural network and instance-based learning)

Supervised Learning: In supervised learning, we train the model with labelled examples (prior
knowledge) so that it can behave like an intelligent program. Once trained, the program can be
used to make further predictions.
Unsupervised Learning: In unsupervised learning, the model learns without any prior knowledge
(labels), which makes it harder to get the program to behave intelligently.
Reinforcement Learning: In this type of learning, a program learns its steps on the basis of its
experience. It lies between supervised and unsupervised learning. Here a component called the
agent plays a central role: the agent takes actions and learns decisions on the basis of the
outcomes of its previous actions.
Multitask Learning: Multitask Learning (MTL) is an inductive transfer mechanism whose main
goal is to improve generalization performance. MTL improves generalization by leveraging the
domain-specific information contained in the training signals of related tasks.
Decision Tree Model: The decision tree model is one of the most common data mining models. It
is popular because the resulting model is easy to understand. Decision tree algorithms use a
recursive partitioning approach. A decision tree is a type of supervised learning algorithm that
is mostly used in classification problems.
Decision trees are classified by the type of target variable; there are two types:

Figure 2.4: Types of Decision Tree (categorical variable decision tree, continuous variable decision tree)
Categorical Variable Decision Tree: A decision tree with a categorical target variable is
called a categorical variable decision tree.
Example: a tree whose target variable is “Will it rain today?”, with the values YES or NO.
Continuous Variable Decision Tree: A decision tree with a continuous target variable is
called a continuous variable decision tree. Example: the salary of a person.
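Recursive partitioning chooses, at every node, the split that best separates the classes. The sketch below performs one such step for a single numeric feature. It uses Gini impurity, the criterion of the CART algorithm; the report does not name a splitting criterion, so this choice and the humidity values are assumptions made for illustration. A full tree would call `best_split` again on each side until the nodes are pure.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        # Impurity of the two children, weighted by how many samples each receives.
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical "Will it rain today?" data: humidity (%) and the YES/NO target.
humidity = [30, 45, 50, 80, 85, 90]
rain = ["NO", "NO", "NO", "YES", "YES", "YES"]
print(best_split(humidity, rain))
```

Here the split `humidity <= 65.0` leaves both children pure, so its weighted impurity is 0.0 and the recursion would stop on both sides.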
Support Vector Machine Model: A Support Vector Machine (SVM) searches for so-called
support vectors, which are data points that lie at the edge of a region in space that
forms the boundary between one class of points and another. In SVM terminology, the
space between regions containing data points of different classes is called the margin between
those classes. The support vectors are used to identify a hyperplane (when the data has
many dimensions, or a line if the data is only two-dimensional) that
separates the classes [6].
Figure 2.5: Model of Support Vector Machine (X-axis and Y-axis)
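A linear SVM can be sketched by minimising the regularised hinge loss with full-batch subgradient descent. This is a minimal illustration, not how production SVM solvers work (they typically solve the dual problem); the learning rate, the regularisation strength `lam`, the epoch count and the six two-dimensional points are all assumptions. After training, the points whose value of y·(w·x + b) is close to 1 are the support vectors that define the margin.

```python
def train_linear_svm(points, labels, lr=0.01, lam=0.01, epochs=10000):
    """Minimise lam/2 * ||w||^2 + mean hinge loss by full-batch subgradient descent.

    labels must be +1 or -1.
    """
    n, d = len(points), len(points[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularisation term
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge loss is active
                for j in range(d):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def decision(w, b, x):
    """Signed score w.x + b; its sign is the predicted side of the hyperplane."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def predict(w, b, x):
    return 1 if decision(w, b, x) >= 0 else -1


# Two made-up, linearly separable classes in the X-Y plane.
points = [(1, 1), (2, 1), (1, 2), (4, 4), (5, 4), (4, 5)]
labels = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(points, labels)
print(w, b, [predict(w, b, p) for p in points])
```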
Artificial neural network
An artificial neural network is a prediction algorithm that uses a learning rate and momentum
to classify data accurately. An ANN predicts the output by adjusting weights. It consists of three
layers: an input layer, a hidden layer and an output layer.

Figure 2.6: Layers of Artificial Neural Network (input layer, hidden layer, output layer)

Backpropagation is an artificial neural network training algorithm in which each neuron
learns by adjusting the weights associated with it in order to correct or reduce the error. It
is a supervised learning algorithm that uses the gradient descent optimization algorithm to
adjust the weights of the neurons by computing the gradient of the loss function [6].
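The sketch below trains a network with the three layers of Figure 2.6 (2 inputs, 3 hidden sigmoid units, 1 sigmoid output) using backpropagation and gradient descent with momentum, the two settings named above. The layer sizes, learning rate, momentum value, cross-entropy loss and the toy task (logical OR, chosen because it trains reliably) are illustrative assumptions, not the configuration used in this project.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def hidden_layer(W1, b1, x):
    """Activations of the hidden layer for one input vector x."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj) for row, bj in zip(W1, b1)]


def forward(params, x):
    """Input layer -> hidden layer -> single sigmoid output unit."""
    W1, b1, W2, b2 = params
    h = hidden_layer(W1, b1, x)
    return sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)


def train(X, y, n_hidden=3, lr=0.5, momentum=0.5, epochs=5000, seed=0):
    rng = random.Random(seed)
    n_in, n = len(X[0]), len(X)
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = 0.0
    # Velocities for the momentum update, one per parameter.
    vW1 = [[0.0] * n_in for _ in range(n_hidden)]
    vb1, vW2, vb2 = [0.0] * n_hidden, [0.0] * n_hidden, 0.0
    for _ in range(epochs):
        gW1 = [[0.0] * n_in for _ in range(n_hidden)]
        gb1, gW2, gb2 = [0.0] * n_hidden, [0.0] * n_hidden, 0.0
        for x, t in zip(X, y):
            h = hidden_layer(W1, b1, x)
            out = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
            d_out = out - t  # dLoss/dz at the output for cross-entropy loss
            gb2 += d_out / n
            for j in range(n_hidden):
                gW2[j] += d_out * h[j] / n
                # Propagate the error back through W2 and the hidden sigmoid.
                d_h = d_out * W2[j] * h[j] * (1.0 - h[j])
                gb1[j] += d_h / n
                for i in range(n_in):
                    gW1[j][i] += d_h * x[i] / n
        # Gradient descent with momentum: v = momentum * v - lr * grad; param += v.
        for j in range(n_hidden):
            for i in range(n_in):
                vW1[j][i] = momentum * vW1[j][i] - lr * gW1[j][i]
                W1[j][i] += vW1[j][i]
            vb1[j] = momentum * vb1[j] - lr * gb1[j]
            b1[j] += vb1[j]
            vW2[j] = momentum * vW2[j] - lr * gW2[j]
            W2[j] += vW2[j]
        vb2 = momentum * vb2 - lr * gb2
        b2 += vb2
    return W1, b1, W2, b2


# Toy data: the logical OR function.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 1]
params = train(X, y)
predictions = [1 if forward(params, x) >= 0.5 else 0 for x in X]
print(predictions)
```

The momentum term reuses part of the previous update, which smooths the descent; setting `momentum=0.0` reduces the update to plain gradient descent.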
Advantages of Artificial Neural Networks
This study chooses the ANN algorithm because of the following advantages:
1) Ability to classify nonlinear data and complex relationships.

2) High tolerance to noisy data and missing values.

3) Ability to classify patterns on which it has not been trained.

Clustering: Clustering is the process of grouping physical or abstract objects into classes
of similar objects, i.e. partitioning a set of data (or objects) into a set
of meaningful sub-classes, called clusters. It is an unsupervised learning method: there are no
predefined classes. A good clustering technique generates high-quality clusters in which intra-class
similarity is high and inter-class similarity is low. The quality of a clustering result
depends on both the similarity measure used by the technique and its implementation. The
quality of a clustering technique is also measured by its ability to discover some or all of the
hidden patterns.
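A minimal k-means sketch shows how clusters with high intra-class and low inter-class similarity can be formed: points are repeatedly assigned to the nearest centroid, and each centroid is moved to the mean of its members. Initialising with the first k points is a simplification for reproducibility (random or k-means++ initialisation is more common), and the six points are made up.

```python
import math


def nearest_centroid(p, centroids):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = [list(p) for p in points[:k]]  # simple deterministic initialisation
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_assignment = [nearest_centroid(p, centroids) for p in points]
        if new_assignment == assignment:
            break  # no point changed cluster, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return assignment, centroids


points = [(1, 1), (1.5, 2), (8, 8), (9, 8.5), (1, 0.5), (8.5, 9)]
labels, centres = kmeans(points, k=2)
print(labels, centres)
```

Although both initial centroids start inside the lower-left group, the second centroid drifts to the upper-right group after the first update, and the assignment stabilises on the third pass.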
Boosting: Boosting is a very important recent development in classification methods. It works
by sequentially applying a classification algorithm to reweighted versions of the training
dataset and then taking a weighted majority vote of the sequence of classifiers thus produced.
This simple strategy results in dramatic improvements in performance for many classification
algorithms. This phenomenon can be understood in terms of well-known statistical principles,
namely additive modelling on the logistic scale using maximum Bernoulli likelihood as the criterion.
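The reweight-then-vote procedure described above can be sketched as discrete AdaBoost with one-feature decision stumps as the weak classifier. AdaBoost is one specific boosting algorithm, picked here as an illustration; the stump learner and the toy data are assumptions. Three stumps are combined so that their weighted vote fits a pattern that no single stump can.

```python
import math


def stump(x, threshold, polarity):
    """Decision stump: predict `polarity` above the threshold, `-polarity` otherwise."""
    return polarity if x > threshold else -polarity


def adaboost(xs, ys, n_rounds):
    """Discrete AdaBoost with decision stumps on one numeric feature; ys are +1/-1."""
    n = len(xs)
    weights = [1.0 / n] * n
    values = sorted(set(xs))
    thresholds = ([values[0] - 0.5]
                  + [(a + b) / 2 for a, b in zip(values, values[1:])]
                  + [values[-1] + 0.5])
    model = []
    for _ in range(n_rounds):
        # Fit the weak learner: the stump with the lowest *weighted* error.
        best = None
        for t in thresholds:
            for p in (1, -1):
                err = sum(w for x, y, w in zip(xs, ys, weights) if stump(x, t, p) != y)
                if best is None or err < best[0]:
                    best = (err, t, p)
        err, t, p = best
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        model.append((alpha, t, p))
        # Reweight: misclassified points gain weight, correct ones lose weight.
        weights = [w * math.exp(-alpha * y * stump(x, t, p))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def boosted_predict(model, x):
    """Weighted majority vote of all stumps."""
    score = sum(alpha * stump(x, t, p) for alpha, t, p in model)
    return 1 if score >= 0 else -1


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]  # two sign changes: no single stump can fit this pattern
model = adaboost(xs, ys, n_rounds=3)
print([boosted_predict(model, x) for x in xs])
```

Each round raises the weight of the points the previous stump got wrong, so the next stump concentrates on them; each stump's vote is weighted by alpha = 0.5 ln((1 - err) / err).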
Association Rule Mining: Association rule analysis is a technique to uncover how items are
associated with each other. Association rule mining finds frequent patterns, associations,
correlations or causal structures among sets of items in transaction databases, for example what
customers buy together, by finding associations and correlations between the different items that
customers place in their baskets.
Applications of association rule mining:
1) Basket data analysis.

2) Cross-marketing.

3) Catalog design.

4) Loss-leader analysis.
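The Apriori algorithm named in Section 2.3.1 finds the frequent itemsets used in basket data analysis. The sketch below is a compact, unoptimised version (real implementations count supports far more efficiently); the four transactions and the 0.5 support threshold are made up for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of `itemset`.
        return sum(1 for b in baskets if itemset <= b) / n

    candidates = [frozenset([item]) for item in sorted({i for b in baskets for i in b})]
    frequent = {}
    k = 1
    while candidates:
        level = {c: support(c) for c in candidates}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must itself be frequent.
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


def confidence(frequent, antecedent, consequent):
    """confidence(A -> B) = support(A and B) / support(A)."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


transactions = [{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk"}]
frequent = apriori(transactions, min_support=0.5)
for itemset, sup in frequent.items():
    print(sorted(itemset), sup)
```

For instance, `confidence(frequent, {"butter"}, {"bread"})` is 1.0 here, because every basket containing butter also contains bread, while {milk, butter} appears in only one basket and is pruned.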

2.5 Importance of Boosting Method


Boosting is a machine learning meta-algorithm for reducing bias and variance in supervised learning,
which converts weak learners into a strong learner. It answers a question posed by Kearns and
Valiant: “Can a set of weak learners create a single strong learner?” A weak learner is defined as
a classifier that is only slightly correlated with the true classification (it can label examples
better than random guessing). In contrast, a strong learner is a classifier that is arbitrarily
well correlated with the true classification.
2.6 Types of Classification Algorithms

Figure 2.7: Types of Classification Algorithms (Naïve Bayes, Support Vector Machine, Logistic Regression, Decision Tree, Random Forest, K-Means, Neural Network, Fuzzy k-NN, Genetic Algorithm)
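k-nearest neighbours (k-NN), which appears in the abstract and throughout the literature review, classifies a new record by a majority vote of the k closest training records. The sketch below is the plain (crisp) variant rather than the fuzzy k-NN listed in the figure, which instead assigns graded class memberships; Euclidean distance, k = 3 and the toy points are assumptions.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by a majority vote of its k nearest training points."""
    by_distance = sorted(zip(train_X, train_y),
                         key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Made-up two-feature records with binary labels.
train_X = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, (2, 2)), knn_predict(train_X, train_y, (6.5, 6.5)))
```

An odd k avoids tied votes in a two-class problem; there is no training phase, so all the cost is paid at prediction time.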


CHAPTER 3
LITERATURE REVIEW

3.1 Introduction
We all know that health is a very important concern for all of us nowadays. Many countries,
such as India, Bangladesh and Pakistan, are struggling with large numbers of patients, and
people in America are also struggling with this problem. Many researchers have therefore started
contributing their efforts to this field. In the section below, we study a number of research
papers and summarize them for our research work.
According to one paper, data mining is a sub-branch of computer science. It is the way in
which we find information in a given huge dataset. New technologies such as artificial
intelligence, DBMS, ML and DL come into existence every day. The work of data mining is to find
structured patterns that provide meaningful information from a given huge dataset. The
authors proposed applying algorithms such as Bayesian classifiers and KNN to patient data to
predict heart diseases based on given features [10].
Finally, the authors conclude that they used a large dataset to ensure better prediction
results. They also give patients recommendations on how to control the disease at a
young age. The authors built a system that anticipates heart disease in patients, in which a
knowledge base plays a vital role. They took a dataset of 2000 records that indicates the levels
of heart disease in patients. Prediction is performed with Naive Bayes and k-nearest neighbors,
which are then compared on the basis of some performance parameters. The developed system may
be very useful for the healthcare industry in finding pre-disease patients [11].
These authors explain that several machine learning techniques are used for better
prediction over a big dataset. Because of its complexity, prediction in the health sector is a
challenging job for all data scientists, but it is very important for the healthcare sector. The
paper discusses six machine learning algorithms used for the prediction system and compares
their performance and accuracy on a dataset using different comparison parameters, trying to
establish which one gives the best accuracy. The aim of this research is to help doctors and
practitioners make early predictions of diseases with ML techniques.
The authors conclude that predictive analysis in a healthcare system may change the mindset of
doctors by extracting insightful information from given data using machine learning. They used
different algorithms, namely SVM, KNN, RF, NB, DT and LR, on the Pima Indian dataset. They report
that SVM and KNN give the highest accuracy, 77% each. To obtain better accuracy, a large amount
of real data is needed to create the model [12].

In this paper, the authors explain that diabetes mellitus is a very common disease in many
people, caused by a disorder of metabolic function, and that it damages many organs, including
blood vessels and nerves. If we can predict it early, it may be possible to stop it before it
reaches a very dangerous stage. Machine learning techniques provide efficient results in
extracting knowledge by creating predictive models from diagnostic medical datasets collected
from real patients. Much insightful information can be extracted from such data using machine
learning. In this work, the authors applied popular ML models, namely SVM, NB, K-Nearest
Neighbors and the C4.5 decision tree, of which the decision tree gives the best results in terms
of accuracy and other performance parameters [13].
The authors conclude that early prediction with machine learning techniques can reduce the risk
factors. They extracted insightful information from the given dataset, applied multiple ML
models, and found that the C4.5 decision tree gives the best results in terms of accuracy.
In this work, the authors focus on the fact that in the 21st century a major cause of death is
this disease/syndrome. If the trend continues, millions of people could die from this disease by
2030. The health sector is collecting real data from different hospitals and test centres for
research. Here, machine learning gives very good support in terms of finding
CHAPTER 4
PROBLEM DEFINITION & PROPOSED METHOD

4.1 Diseases Prediction Methods


To achieve our goal, our methodology contains a number of stages, explained
below:
A. Datasets & Properties
B. Data Preprocessing
C. Apply Different Machine Learning Techniques
D. Finding Performance Measures
For better understanding, the process is shown as a process flow diagram
below:

Real Time Problem → Relevant Data / Health Data → Collection / Storage → Data Preprocessing → Training Dataset / Testing Dataset → Apply ML Model → Performance Result

Figure 4.1: Proposed Process Flow

Dataset & Properties


Table 4.1 Properties Description
S.No. | Property | Remark
1 | Pregnancies | Number of times pregnant
2 | Glucose | Plasma glucose concentration 2 hours into an oral glucose tolerance test
3 | Blood Pressure | Diastolic blood pressure (mm Hg)
4 | Skin Thickness | Triceps skin fold thickness (mm)
5 | Insulin | 2-hour serum insulin (mu U/ml)
6 | BMI | Body mass index (weight in kg / (height in m)^2)
7 | Diabetes Pedigree Function | Diabetes pedigree function
8 | Age | Age (years)
9 | Outcome | Class variable (0 or 1); 268 of 768 are 1, the rest are 0

Data Preprocessing
Data can broadly be of two types: numerical and nominal. Each has its own uses, and sometimes one form must be converted into the other. Here we convert numerical data into nominal data: the patient's age is classified into three categories, as shown in Table 4.2.
Table 4.2 Data Conversion
S.No. | Classification | Numerical Value
1 | Young | 10-25 years
2 | Adult | 26-50 years
3 | Old | Above 50 years
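The age conversion in Table 4.2 can be sketched with pandas' `pd.cut`. This is a minimal illustration, assuming the age column is named `Age` as in Table 4.1 and that the new column is called `AgeGroup` (a hypothetical name); ages below 10 fall outside the table's ranges and come out as missing (NaN).

```python
import pandas as pd

# Bin edges follow Table 4.2: (9, 25] -> Young, (25, 50] -> Adult, (50, inf] -> Old.
# pd.cut uses right-closed intervals by default, so 25 is Young and 26 is Adult.
AGE_BINS = [9, 25, 50, float("inf")]
AGE_LABELS = ["Young", "Adult", "Old"]


def categorize_age(ages: pd.Series) -> pd.Series:
    """Convert numeric ages into the nominal categories of Table 4.2."""
    return pd.cut(ages, bins=AGE_BINS, labels=AGE_LABELS)


# Illustrative ages chosen to hit each boundary of the table.
df = pd.DataFrame({"Age": [21, 25, 26, 50, 51, 81]})
df["AgeGroup"] = categorize_age(df["Age"])  # hypothetical column name
print(df)
```

The result is a pandas categorical column, which keeps the three labels in the order Young, Adult, Old.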

Apply Machine Learning


Once the data is ready, ML techniques can use it to create a model. Here we apply a number of machine learning algorithms to find the best results.
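The training/testing split of Figure 4.1 followed by a single model can be sketched with scikit-learn as below. This is a minimal sketch under stated assumptions: `make_classification` generates a synthetic stand-in with the same 768 × 8 shape as the feature matrix of Table 4.1 (the real data is not loaded), logistic regression is just one example algorithm, and the 80/20 split ratio is an assumption, since the report does not state one.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 768 rows x 8 features, binary target (assumption, not the real data).
X, y = make_classification(n_samples=768, n_features=8, random_state=42)

# Hold out 20% as the testing dataset; stratify keeps the class ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)     # learn only from the training dataset
y_pred = model.predict(X_test)  # predict on the unseen testing dataset
print(f"Test accuracy: {accuracy_score(y_test, y_pred):.3f}")
```

Fitting only on the training part and scoring on the held-out part is what keeps the reported performance an estimate for new patients rather than a measure of memorisation.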

Apply Performance Measure


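By using the following equations, we can compute several evaluation parameters, where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives:

Precision: $P = \frac{TP}{TP + FP}$
Recall: $R = \frac{TP}{TP + FN}$

F-measure: $F = \frac{2 \cdot P \cdot R}{P + R}$

Accuracy: $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$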
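The four measures above can be computed directly from confusion-matrix counts. The sketch below is plain Python; the counts in the example are hypothetical and are not results from this report. Returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
def evaluation_measures(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Precision, recall, F-measure and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # P = TP / (TP + FP)
    recall = tp / (tp + fn) if tp + fn else 0.0     # R = TP / (TP + FN)
    # F = 2PR / (P + R), the harmonic mean of precision and recall
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)      # share of correct predictions
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Hypothetical counts for a 154-patient test set (illustrative only).
m = evaluation_measures(tp=40, tn=85, fp=15, fn=14)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

The same values can be cross-checked with scikit-learn's `precision_score`, `recall_score`, `f1_score` and `accuracy_score` when true and predicted labels are available.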

4.2 Algorithms

Step 01: Store Data from the Kaggle Repository

Step 02: Import Required Libraries

Step 03: Import the Required Dataset

Step 04: Apply Feature Extraction

a) Data Conversion

b) Apply Encoding Techniques

Step 05: Visualize Data for Better Understanding

Step 06: Apply Machine Learning Algorithms

Step 07: Apply Different Models

Step 08: Repeat Step 07 Several Times with Different Algorithms

Step 09: Finally, Compare Results Using Performance Parameters such as Accuracy
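Steps 06 to 09 amount to a loop that evaluates several algorithms in the same way and compares their accuracy. The sketch below uses scikit-learn versions of the algorithms named in the literature review (LR, KNN, SVM, NB, DT, RF) on synthetic data (an assumption; the prepared dataset is not loaded). It swaps in 5-fold cross-validation instead of a single train/test split so that the comparison depends less on one random split; this is a choice of the sketch, not the report's stated method.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared dataset (assumption, not the real data).
X, y = make_classification(n_samples=768, n_features=8, random_state=0)

# Steps 07/08: the same evaluation is repeated for each algorithm.
# Scale-sensitive models (LR, KNN, SVM) get a StandardScaler in front of them.
models = {
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "NB": GaussianNB(),
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(random_state=0),
}

# Step 09: compare the mean 5-fold cross-validated accuracy of every model.
results = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}
for name, acc in sorted(results.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

Putting the scaler inside a pipeline means it is re-fitted on each training fold, so no information from the validation fold leaks into the scaling.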


CHAPTER 5
EXPERIMENT SETUP

5.1 Experimental Framework


Python is a prominent environment used by researchers to develop and deploy systems. It has a vast set of libraries, modules and packages that help programmers complete their work efficiently.

Figure 5.1: GUI Anaconda

Anaconda is a completely free, open-source environment. Python and its libraries are used very efficiently in data science and data analysis, and they are also widely used for creating extensible machine learning algorithms. Python can apply various machine learning techniques such as classification, regression, recommendation and clustering.

Python offers researchers a ready-to-implement environment for performing data mining tasks effectively on huge volumes and varieties of data in less time.
Figure 5.2: Libraries of Python (Python Utility: Pandas, SciKit-Learn, SciPy, Matplotlib)

5.2 Dataset & Features


Machine learning data is usually represented as a matrix called a dataset. Each row of the matrix corresponds to an observation (example) and each column represents a feature (also called a variable or attribute) that describes the data. Data values can take many representations: numerical (integer or real numbers) or nominal, where values are distinguished by name. Nominal data is a type of categorical data; as its name indicates, it can only take values from a fixed set of names (categories).
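The matrix view described above maps directly onto a pandas DataFrame: rows are observations and columns are features. The sketch below uses illustrative values, column names from Table 4.1, and a hypothetical `AgeGroup` column holding the nominal categories of Table 4.2. It separates the feature matrix `X` from the class vector `y` and splits the features by data type.

```python
import pandas as pd

# Illustrative three-patient sample (values are for demonstration only).
df = pd.DataFrame({
    "Glucose": [148, 85, 183],
    "BMI": [33.6, 26.6, 23.3],
    "Age": [50, 31, 32],
    "AgeGroup": ["Adult", "Adult", "Adult"],  # hypothetical nominal column
    "Outcome": [1, 0, 1],
})

# Each row is one observation (patient); each column is one feature.
X = df.drop(columns="Outcome")  # feature matrix
y = df["Outcome"]               # class variable

# Numerical features hold numbers; nominal features hold names (categories).
numerical = X.select_dtypes(include="number").columns.tolist()
nominal = X.select_dtypes(exclude="number").columns.tolist()
print(X.shape, numerical, nominal)
```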
5.3 Implementation
The model employs filters for faster evaluation and less overall time. The pre-processing methods and filters strongly affect the final evaluation results of the classifiers (ML-based models). Feature extraction, conversion of nominal attributes to binary ones and data cleaning are a few of these filters.
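A minimal sketch of the nominal-to-binary conversion mentioned above, using pandas' `pd.get_dummies` (one-hot encoding). The `AgeGroup` column name is hypothetical and stands for the age categories of Table 4.2; the report does not say which encoding technique it used, so one-hot encoding is an assumption.

```python
import pandas as pd

# Illustrative rows with one numeric and one nominal column.
df = pd.DataFrame({
    "Age": [22, 35, 61],
    "AgeGroup": ["Young", "Adult", "Old"],  # hypothetical nominal column
})

# One binary (0/1) column per category; categories appear in sorted order.
# dtype=int gives 0/1 integers instead of booleans.
encoded = pd.get_dummies(df, columns=["AgeGroup"], dtype=int)
print(encoded)
```

Columns that are not listed in `columns` (here `Age`) are kept unchanged and placed before the new binary columns.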

5.4 Different Process Stages

Figure 5.3: Calling Libraries


Explanation: In Figure 5.3 we import the libraries that provide all the required functionality.

Figure 5.4: Major Columns

Explanation: Figure 5.4 shows the major columns available in our dataset; the dataset has 9 columns.

Figure 5.5: Major Columns

Explanation: Figure 5.5 shows all attributes available in our dataset; the dataset has 9 columns.
Figure 5.6: Histogram of Age in terms of Diseases

Explanation: Figure 5.6 shows how age varies with respect to disease outcome.
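A plot like Figure 5.6 could be produced along the lines of the sketch below, assuming columns named `Age` and `Outcome` as in Table 4.1. The figure itself is not reproduced in this report, so the number of bins, transparency and axis labels are assumptions, and the sample values are illustrative only.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_age_by_outcome(df: pd.DataFrame, bins: int = 10):
    """Overlay one age histogram per Outcome class (0 and 1)."""
    fig, ax = plt.subplots()
    for outcome in (0, 1):
        ages = df.loc[df["Outcome"] == outcome, "Age"]
        ax.hist(ages, bins=bins, alpha=0.6, label=f"Outcome = {outcome}")
    ax.set_xlabel("Age (years)")
    ax.set_ylabel("Number of patients")
    ax.legend()
    return fig, ax


# Illustrative values only, not taken from the report's dataset.
sample = pd.DataFrame({
    "Age": [21, 25, 31, 33, 45, 50, 52, 60, 29, 41],
    "Outcome": [0, 0, 0, 1, 1, 1, 1, 0, 0, 1],
})
fig, ax = plot_age_by_outcome(sample)
```

Calling `fig.savefig("age_by_outcome.png")` would write the plot to a file; overlapping semi-transparent bars make it easy to see which age ranges are dominated by each outcome.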

Figure 5.7: All Parameter Dependencies

Explanation: Figure 5.7 shows the impact of all parameters on disease outcome.
CHAPTER - 6
CONCLUSION AND FUTURE WORK

6.1 Conclusion
We implemented a number of machine learning algorithms to find the best results in terms of performance. We then proposed tuning the models to improve their performance: applying GradientBoostingClassifier and LGBMClassifier with changes to the random state and some other parameters improved the performance values. The values obtained are RF: 0.897368, XGB: 0.901316, LightGBM: 0.896053.
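The report does not list the exact parameters it changed, so the grid below is hypothetical. The sketch tunes scikit-learn's `GradientBoostingClassifier` with `GridSearchCV` on synthetic data (an assumption; the real dataset is not loaded) and fixes `random_state` so that results are reproducible. `LGBMClassifier` from the separate lightgbm package follows the scikit-learn estimator interface and could be searched the same way, but it is left out to keep the sketch dependent on scikit-learn only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption, not the report's dataset).
X, y = make_classification(n_samples=768, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1
)

# Hypothetical search grid: 2 x 2 x 2 = 8 parameter combinations.
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}

# 3-fold cross-validation on the training data picks the best combination,
# then the best model is refitted on the whole training set.
search = GridSearchCV(
    GradientBoostingClassifier(random_state=1),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)
print(search.best_params_)
print(f"Held-out accuracy: {search.score(X_test, y_test):.3f}")
```

Keeping the test split out of the search means the held-out accuracy is not inflated by the tuning itself.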

6.2 Future Work


Future work will focus on applying other techniques to improve the performance of these methods as far as possible. Another possibility is to use deep learning in place of classical machine learning, since deep learning techniques are among the most effective in use today and are becoming increasingly popular for classification. We can therefore also implement deep learning in future work.
