
International Conference on Electronics, Communication and Aerospace Technology

ICECA 2017

Heart Disease Diagnosis Using Data Mining Technique
Sarath Babu, Vivek EM, Famina KP, Fida K, Aswathi P, Shanid M, Hena M
Dept. of Computer Science and Engineering
Eranad Knowledge City Technical Campus
Manjeri, Malappuram Dist. Kerala
[email protected]

Abstract: Data mining is the process of discovering actionable information in large data sets: it analyzes large volumes of data and extracts patterns that can be converted into useful knowledge. Medical data mining has great potential for exploring the hidden patterns in medical data sets, and these patterns can be used for clinical diagnosis. The data must be collected in a standardized form. Fourteen attributes, such as age, sex, blood pressure and blood sugar, are extracted from the medical profiles and can be used to predict the likelihood of a patient developing heart disease. These attributes are fed into the K-means algorithm, the MAFIA algorithm and decision tree classification for heart disease prediction. Applying data mining techniques to heart disease treatment can provide performance as reliable as that achieved in diagnosing heart disease. With this, the medical industry could offer better diagnosis and treatment of patients and attain a good quality of service. The main advantages of this work are early detection of heart disease, correct and timely diagnosis, and treatment at an affordable cost.

Keywords: Heart disease; Data mining; Decision tree; K-means clustering algorithm.

I. INTRODUCTION

In the modern lifestyle, health problems are increasing tremendously. Our lifestyle has a great impact on our health, causing heart disease and other health problems. A survey of the present population suggests that about sixty percent suffer from heart disease.

Early detection of heart disease can reduce the death rate, but people often fail to detect it early for lack of awareness. The health care industry aims to diagnose the disease at an early stage, yet in most cases it is noticed only at the final stages of the disease or after death. Treatment for heart disease is very expensive and not affordable for everyone, so people are reluctant to seek proper treatment at the early stages of the disease. The aim of our project is to diagnose the disease at an early stage and at an affordable cost. Using data mining techniques, the disease can be detected at an early stage and, with proper diagnosis, treated effectively. The health care industry collects a huge amount of data that is not mined to discover hidden information. Data mining, the process of analyzing large data sets and summarizing them into useful information, is a remedy for this problem. Data mining techniques are:
1. Association
2. Classification
3. Clustering

Association rule mining is a method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using some measures. The Apriori algorithm and the MAFIA algorithm are used for generating association rules. The MAFIA algorithm generates the maximal frequent itemsets before finding all frequent itemsets; once the maximal frequent itemsets are found, all frequent itemsets can be generated in a single scan. This is the main advantage of the MAFIA algorithm over the Apriori algorithm.

A database contains a huge amount of data with hidden information that can be used for making decisions. Classification is used to extract a model describing important classes. Classification techniques include the decision tree algorithm and the Naïve Bayes algorithm [4]. Decision trees are very flexible, easy to understand and easy to debug. They handle issues such as missing values and outliers, and they identify significant dimensions. Naïve Bayes is a supervised algorithm that assumes an underlying probabilistic model. Because this is only an assumption, accuracy can be lost, and if an attribute value never occurs together with a class label, the probability estimate is zero. For these reasons the decision tree is preferred over Naïve Bayes.

Clustering is the process of grouping data with the same characteristics into classes or clusters. The K-means clustering algorithm is used for clustering. K-means is faster than other clustering algorithms and works well if the clusters are spherical. It is also a good solution for pre-clustering, reducing the space into disjoint smaller sub-spaces where other clustering algorithms can be applied.

II. RELATED WORKS

Researchers have been investigating the use of data mining techniques to detect heart disease. Factors associated with heart disease, such as age, sex, chest pain, blood pressure, cholesterol and blood sugar, are used to diagnose heart disease in patients.

Jyoti Soni et al. [1] provide a survey of current techniques of data extraction from databases using data mining techniques that are used in heart disease prediction. The techniques covered are Naive Bayes, Decision List and KNN. In their survey, classification based on clustering does not perform well.

978-1-5090-5686-6/17/$31.00 ©2017 IEEE 750



P.K. Anooj et al. [2] presented a weighted fuzzy rule-based system for the diagnosis of heart disease that automatically retrieves knowledge from the patient’s data. The proposed system for the prediction of heart disease consists of two phases: (1) an automated approach for the generation of weighted fuzzy rules and (2) the development of a fuzzy rule-based decision support system. The weighted fuzzy rules were used to build the system using the Mamdani fuzzy inference system.

Nidhi Bhatla et al. [3] analysed various data mining techniques used in heart disease prediction. Their observations reveal that neural networks with 15 attributes outperformed all other data mining techniques. Another conclusion from the analysis is that decision trees also showed good accuracy with the help of a genetic algorithm and feature subset selection [6].

Aditya Methaila et al. [4] used data mining classification modeling techniques, such as Decision Trees, Naïve Bayes and Neural Networks, in addition to the weighted association Apriori algorithm and the MAFIA algorithm, in heart disease prediction.

Shimpy Goyal et al. [5] discussed data mining techniques to predict heart disease based on K-means and the Apriori algorithm. The researchers also presented the challenges in detecting and diagnosing the disease and analyzed the results of the research.

M. Akhil Jabbar et al. [7] presented an efficient associative classification approach based on a genetic algorithm for heart disease prediction. The main reason for using a genetic algorithm to predict disease from a large dataset is to reduce the actual size of the data and obtain the best attribute set. There are certain limitations in the prediction of heart disease using data mining approaches; reducing the set of attributes makes the prediction less complex and better.

R. Sethukkarasi et al. [8] provide novel neuro-fuzzy techniques with pre-processing by a genetic algorithm. Four-layered fuzzy neural networks are used; a radial basis function neural network is constructed with five inputs, training and normalisation in the hidden layer, and an output layer with one node.

Mohammed Abdul Khaleel et al. [9] presented a method to find diseases with the Apriori data mining technique. A graphical representation is also used to visualize the techniques. A prototype was produced to demonstrate the efficiency of the method, and the results indicate that the prototype can be useful in the real world.

Boshra Bahrami et al. [10] examined different classification techniques for diagnosing heart disease. Classifiers such as Decision Tree, KNN and Naive Bayes are used to divide the dataset. After classification and performance evaluation, the decision tree is considered the best classifier for heart disease diagnosis from the dataset.

Deepika N. et al. [11] used the Pruning Classification Association Rule (PCAR), which is derived from the Apriori algorithm. The proposed method deletes minimum-frequency items together with minimum-frequency itemsets, deletes infrequent items from the itemsets, and then discovers the frequent itemsets.

III. PROPOSED SYSTEM

In the proposed system, early diagnosis of heart disease is carried out using data mining techniques. A huge amount of healthcare data exists which, unfortunately, is not mined to discover hidden information for effective decision making.

Fig 1: Block diagram of the system

This research provides a prototype, “Heart Disease Diagnosis Using Data Mining Technique”, built on techniques such as
a. Genetic algorithm
b. K-means algorithm


c. MAFIA algorithm
d. Decision tree classification

Genetic algorithm

A genetic algorithm (GA) is a search heuristic that imitates the process of natural evolution. This heuristic is routinely used to generate useful solutions to optimization and search problems. In our system the genetic algorithm is used to extract attributes from a huge attribute set.

The extracted attributes are as follows:
1. age: age in years
2. sex: sex (1 = male; 0 = female)
3. cp: chest pain type
   Value 1: typical angina
   Value 2: atypical angina
   Value 3: non-anginal pain
   Value 4: asymptomatic
4. trestbps: resting blood pressure (in mm Hg on admission to the hospital)
5. chol: serum cholesterol in mg/dl
6. fbs: fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
7. restecg: resting electrocardiographic results
   Value 0: normal
   Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
   Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
8. thalach: maximum heart rate achieved
9. exang: exercise induced angina (1 = yes; 0 = no)
10. oldpeak: ST depression induced by exercise relative to rest
11. slope: the slope of the peak exercise ST segment
   Value 1: upsloping
   Value 2: flat
   Value 3: downsloping
12. ca: number of major vessels (0-3) colored by fluoroscopy
13. thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
14. num: diagnosis of heart disease (angiographic disease status)
   Value 0: < 50% diameter narrowing
   Value 1: > 50% diameter narrowing

K-means algorithm

K-means clustering is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem. The method follows a simple way to cluster a given data set through a certain number of clusters k fixed a priori. The main idea is to define k centers, one for each cluster. These centers should be placed carefully, because different locations cause different clustering results. The better choice is therefore to place them as far away from each other as possible. The next step is to take each point belonging to the given data set and associate it with the nearest center. When no point is pending, the first step is completed and an early grouping is done. At this point we need to re-calculate k new centroids as barycenters of the clusters resulting from the previous step. After we have these k new centroids, a new binding has to be done between the same data set points and the nearest new center. A loop has thus been generated. As a result of this loop, the k centers change their location step by step until no more changes occur, in other words until the centers do not move any more. Finally, this algorithm aims at minimizing an objective function known as the squared error function, given by:

$$J(V) = \sum_{i=1}^{c} \sum_{j=1}^{c_i} \left( \lVert x_i - v_j \rVert \right)^2 \qquad (1)$$

where $\lVert x_i - v_j \rVert$ is the Euclidean distance between $x_i$ and $v_j$, $c_i$ is the number of data points in the $i$-th cluster, and $c$ is the number of cluster centers.

Algorithm 1: K-means clustering

Input: The number of clusters k, and a database containing n objects.
Output: A set of k clusters which minimizes the squared-error criterion.
Method:
1) arbitrarily choose k objects as the initial cluster centers;
2) repeat
3) assign each object to the cluster to which the object is the most similar, based on the mean value of the objects in the cluster;
4) update the cluster means, i.e., calculate the mean value of the objects for each cluster;
5) until no change;

Fig 2: K-mean clustering for heart disease patient

MAFIA algorithm

The MAFIA algorithm is used for mining maximal frequent itemsets from a database. This algorithm is notably efficient when the itemsets in the database are very large. The search procedure of the algorithm incorporates a depth-first traversal of the itemset lattice with effective pruning mechanisms.
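To make the notion of a maximal frequent itemset concrete, the following Python sketch enumerates frequent itemsets by brute force on a tiny, made-up transaction set and then keeps only those with no frequent superset. It is not an implementation of MAFIA (there is no depth-first traversal or pruning by PEP or HUT), and the item names, the transactions and the `min_support` value are assumptions chosen purely for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of all itemsets whose support count
    is at least min_support. Returns {frozenset: support count}."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            cand = frozenset(candidate)
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                frequent[cand] = support
                found = True
        if not found:
            # No frequent itemset of this size, so none of a larger size
            # can exist (every subset of a frequent itemset is frequent).
            break
    return frequent


def maximal_itemsets(frequent):
    """Keep only the frequent itemsets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < other for other in frequent)]


def subsets_of(maximal):
    """All non-empty subsets of the maximal itemsets."""
    result = set()
    for m in maximal:
        for size in range(1, len(m) + 1):
            for sub in combinations(sorted(m), size):
                result.add(frozenset(sub))
    return result


# Hypothetical patient records encoded as sets of risk indicators.
transactions = [
    {"high_bp", "high_chol", "chest_pain"},
    {"high_bp", "high_chol"},
    {"high_bp", "chest_pain"},
    {"high_chol", "chest_pain"},
    {"high_bp", "high_chol", "chest_pain"},
]

frequent = frequent_itemsets(transactions, min_support=3)
maximal = maximal_itemsets(frequent)

for itemset in maximal:
    print(sorted(itemset), frequent[itemset])
```

In this toy example every frequent itemset is a subset of some maximal frequent itemset, which is why the maximal sets alone are enough to regenerate the full collection; only the support counts of the smaller itemsets still have to be read from the data.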
Pseudo code for MAFIA:


MAFIA(C, MFI, Boolean IsHUT)
{
    HUT = C.head ∪ C.tail;
    if HUT is in MFI
        stop generation of children and return
    Count all children, use PEP to trim the tail, and reorder by increasing support
    For each item i in C.trimmed_tail
    {
        IsHUT = whether i is the first item in the tail
        newNode = C ∪ i
        MAFIA(newNode, MFI, IsHUT)
    }
    if (IsHUT and all extensions are frequent)
        stop search and go back up subtree
    if (C is a leaf and C.head is not in MFI)
        add C.head to MFI
}

Decision tree algorithm

A decision tree is a flow-chart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and leaf nodes represent classes or class distributions. The topmost node in a tree is the root node. Internal nodes are represented by rectangles, and leaf nodes are denoted by ovals. For the classification of an unknown sample, the sample attribute values are tested against the decision tree. A path is traced from the root to a leaf node, which holds the class prediction for that sample, so decision trees can easily be converted to classification rules.

IV. CONCLUSION

In this paper the focus is on using different data mining algorithms and a combination of several attributes for effective heart disease prediction and diagnosis. The decision tree works efficiently using fourteen attributes, after a genetic algorithm has been applied to reduce the actual data size and obtain the optimal subset of attributes for heart disease prediction.

REFERENCES

[1] Jyoti Soni, “Predictive Data Mining for Medical Diagnosis: An Overview of Heart Disease Prediction”, International Journal of Computer Applications (0975 – 8887), Volume 17, No. 8, March 2011.
[2] P.K. Anooj, “Clinical decision support system: Risk level prediction of heart disease using weighted fuzzy rules”, Journal of King Saud University – Computer and Information Sciences (2012) 24, 27–40.
[3] Nidhi Bhatla and Kiran Jyoti, “An Analysis of Heart Disease Prediction using Different Data Mining Techniques”, International Journal of Engineering Research & Technology (IJERT), Vol. 1 Issue 8, October 2012, ISSN: 2278-0181.
[4] Aditya Methaila, Prince Kansal, Himanshu Arya, Pankaj Kumar, “Early Heart Disease Prediction Using Data Mining Techniques”, Sundarapandian et al. (Eds): CCSEIT, DMDB, ICBB, MoWiN, AIAP – 2014, pp. 53–59, 2014. © CS & IT-CSCP 2014, DOI: 10.5121/csit.2014.4807.
[5] Shimpy Goyal and Dr. Rajender Singh Chhillar, “A Literature Survey on Applications of Data Mining Techniques to Predict Heart Diseases”, International Journal of Engineering Sciences Paradigms and Researches (IJESPR), Vol. 20, Issue 01, May 2015.
[6] M. Anbarasi, E. Anupriya, N.Ch.S.N. Iyengar, “Enhanced Prediction of Heart Disease with Feature Subset Selection using Genetic Algorithm”, International Journal of Engineering Science and Technology, Vol. 2(10), 2010, 5370-5376.
[7] M. Akhil Jabbar, Dr. Priti Chandra, Dr. B.L. Deekshatulu, “Heart Disease Prediction System using Associative Classification and Genetic Algorithm”, ICECIT, 2012.
[8] R. Sethukkarasi and Kannan, “An Intelligent System for Mining Temporal rules in Clinical database using Fuzzy neural network”, European Journal of Scientific Research, ISSN 1450-216, Vol 70(3), pp 386-395, 2012.
[9] Mohammed Abdul Khaleel, Sateesh Kumar Pradhan, “Finding Locally Frequent Diseases Using Modified Apriori Algorithm”, International Journal of Advanced Research in Computer and Communication Engineering, Vol. 2, Issue 10, October 2013.
[10] Boshra Bahrami, Mirsaeid Hosseini Shirvani, “Prediction and Diagnosis of Heart Disease by Data Mining Techniques”, Journal of Multidisciplinary Engineering Science and Technology (JMEST), ISSN: 3159-0040, Vol. 2 Issue 2, February 2015.
[11] Deepika N., “Association Rules for Classification of Heart Attack Patients”, IJAEST, Vol 11(2), pp 253-257, 2011.
[12] Cleveland database: http://archive.ics.uci.edu/ml/datasets/Heart+Disease.
[13] Statlog database: http://archive.ics.uci.edu/ml/machine-learning-databases/statlog/heart.
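
As a supplementary illustration of Algorithm 1 (K-means clustering) in Section III, the steps can be sketched in plain Python as shown here. This is a minimal sketch under stated assumptions, not the implementation used in the paper: the function names, the fixed random seed, and the six sample points (hypothetical age and resting blood pressure pairs) are all invented for the example, and the squared-error value is computed as the sum of squared Euclidean distances from each point to its assigned center.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Step 1: arbitrarily choose k objects as the initial cluster centers.
    centers = rng.sample(points, k)
    labels = None
    # Step 2: repeat ...
    for _ in range(max_iter):
        # Step 3: assign each object to the nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Step 5: ... until no assignment changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: update each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


def squared_error(points, centers, labels):
    """Sum of squared distances from each point to its assigned center."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


# Hypothetical (age, resting blood pressure) pairs, for illustration only.
points = [(40, 120), (42, 118), (45, 125), (60, 160), (62, 158), (65, 165)]

centers, labels = kmeans(points, k=2)
error = squared_error(points, centers, labels)
print(labels)
print(centers)
print(round(error, 3))
```

Because the result of K-means depends on the initial centers, a practical system would typically run the procedure from several starting configurations and keep the clustering with the lowest squared error.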
