Heart Disease Diagnosis Using Data Mining Technique
ICECA 2017
P.K. Anooj et al. [2] presented a weighted fuzzy rule-based system for the
diagnosis of heart disease; the system automatically derives knowledge from
the patient's data. The proposed system for the prediction of heart disease
consists of two phases: (1) an automated approach for the generation of
weighted fuzzy rules and (2) the development of a fuzzy rule-based decision
support system. The weighted fuzzy rules were used to build the system using
a Mamdani fuzzy inference system.

Nidhi Bhatla et al. [3] analysed various data mining techniques used in heart
disease prediction. The observations reveal that neural networks with 15
attributes outperformed all the other data mining techniques. Another
conclusion from the analysis is that decision trees also showed good accuracy
with the help of a genetic algorithm and feature subset selection [6].

Boshra Bahrami et al. [10] examined different classification techniques for
heart disease diagnosis. Classifiers such as Decision Tree, KNN, and Naive
Bayes were used to classify the dataset. After the classification and
performance evaluation, the Decision Tree was considered the best classifier
for heart disease diagnosis on that dataset.

Deepika N. et al. [11] used the Pruning Classification Association Rule
(PCAR) method, which is derived from the Apriori algorithm. The proposed
method deletes the minimum-frequency items and itemsets, removes infrequent
items from the itemsets, and then discovers the frequent itemsets.

III. PROPOSED SYSTEM

In the proposed system, early diagnosis of heart disease is carried out using
data mining techniques. A huge amount of healthcare data is available, but
unfortunately it is not mined to discover hidden information for effective
decision making.

This research provides a prototype, "Heart Disease Diagnosis Using Data
Mining Technique", which uses techniques such as:
a. Genetic algorithm
b. K-means algorithm
c. MAFIA algorithm
d. Decision tree classification

Genetic algorithm

A genetic algorithm (GA) is a search heuristic that imitates the process of
natural evolution. This heuristic is routinely used to generate useful
solutions to optimization and search problems. In our system the genetic
algorithm is used to extract attributes from a large attribute set.

The extracted attributes are as follows:
1. age: age in years
2. sex: sex (1 = male; 0 = female)
3. cp: chest pain type
   Value 1: typical angina
   Value 2: atypical angina
   Value 3: non-anginal pain
   Value 4: asymptomatic
4. trestbps: resting blood pressure (in mm Hg on admission to the hospital)
5. chol: serum cholesterol in mg/dl
6. fbs: fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
7. restecg: resting electrocardiographic results
   Value 0: normal
   Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
   Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
8. thalach: maximum heart rate achieved
9. exang: exercise induced angina (1 = yes; 0 = no)
10. oldpeak: ST depression induced by exercise relative to rest
11. slope: the slope of the peak exercise ST segment
   Value 1: upsloping
   Value 2: flat
   Value 3: downsloping
12. ca: number of major vessels (0-3) colored by fluoroscopy
13. thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
14. num: diagnosis of heart disease (angiographic disease status)
   Value 0: < 50% diameter narrowing
   Value 1: > 50% diameter narrowing

K-means algorithm

The k-means clustering algorithm is one of the simplest unsupervised learning
algorithms that solve the well-known clustering problem. The method follows a
simple way to partition a given data set into a certain number of clusters
fixed a priori. The main idea is to define k centers, one for each cluster.
These centers should be placed carefully, because different locations lead to
different clustering results. So, the better choice is to place them as far
away from each other as possible. The next step is to take each point
belonging to the given data set and associate it with the nearest center.
When no point is pending, the first step is completed and an early grouping
is done. At this point we need to re-calculate k new centroids as the
barycenters of the clusters resulting from the previous step. After we have
these k new centroids, a new binding has to be done between the same data set
points and the nearest new center. A loop has thus been generated. As a
result of this loop, the k centers change their location step by step until
no more changes occur, in other words until the centers do not move any more.
Finally, this algorithm aims at minimizing an objective function known as the
squared error function, given by:

$$J(V) = \sum_{i=1}^{c} \sum_{j=1}^{c_i} \left( \lVert x_i - v_j \rVert \right)^2 \qquad (1)$$

Where ||x_i - v_j|| is the Euclidean distance between x_i and v_j,
c_i is the number of data points in the ith cluster, and
c is the number of cluster centers.

Algorithm 1: K-means clustering

Input: The number of clusters k, and a database containing n objects.
Output: A set of k clusters which minimizes the squared-error criterion.
Method:
1) arbitrarily choose k objects as the initial cluster centers;
2) repeat
3) assign each object to the cluster to which the object is the most similar,
   based on the mean value of the objects in the cluster;
4) update the cluster means, i.e., calculate the mean value of the objects
   for each cluster;
5) until no change;

Fig 2: K-means clustering for heart disease patients

MAFIA algorithm

The MAFIA algorithm is used for mining maximal frequent itemsets from a
database. This algorithm is notably efficient when the itemsets in the
database are very large. The search procedure of the algorithm incorporates a
depth-first traversal of the itemset lattice with effective pruning
mechanisms.
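As an illustration of what mining maximal frequent itemsets means, the
following minimal Python sketch enumerates them by a plain depth-first search
over item combinations. It is a sketch under stated assumptions, not the MAFIA
implementation: MAFIA-specific pruning such as PEP is left out, and the
function name maximal_frequent_itemsets, the min_support threshold, and the
toy transactions (attribute=value strings chosen only for illustration) are
hypothetical and do not come from the system described here.

```python
from typing import FrozenSet, Iterable, List, Set


def maximal_frequent_itemsets(
    transactions: Iterable[Set[str]], min_support: int
) -> List[FrozenSet[str]]:
    """Return all maximal frequent itemsets using a plain depth-first search.

    Illustrative only: no PEP or other MAFIA-specific pruning is applied.
    """
    rows = [frozenset(t) for t in transactions]
    items = sorted({item for row in rows for item in row})
    mfi: List[FrozenSet[str]] = []

    def support(itemset: FrozenSet[str]) -> int:
        # Number of transactions that contain every item of the itemset.
        return sum(1 for row in rows if itemset <= row)

    def search(head: FrozenSet[str], tail: List[str]) -> None:
        extended = False
        for pos, item in enumerate(tail):
            child = head | {item}
            if support(child) >= min_support:
                extended = True
                # Children only use items that come after `item`,
                # so every itemset is generated at most once.
                search(child, tail[pos + 1:])
        # A node without a frequent extension is maximal unless a
        # previously found maximal itemset already contains it.
        if not extended and head and not any(head <= m for m in mfi):
            mfi.append(head)

    search(frozenset(), items)
    return mfi


# Hypothetical toy transactions (assumed values, not taken from the dataset).
transactions = [
    {"cp=4", "exang=1", "fbs=1"},
    {"cp=4", "exang=1"},
    {"cp=4", "exang=1", "slope=2"},
    {"fbs=1", "slope=2"},
    {"cp=4", "slope=2"},
]
for itemset in maximal_frequent_itemsets(transactions, min_support=2):
    print(sorted(itemset))
```

With a minimum support of 2, the frequent itemsets of this toy data are the
four single items plus the pairs {cp=4, exang=1} and {cp=4, slope=2}; the
maximal ones are those two pairs and {fbs=1}, since no frequent itemset
contains them. The superset check against the list of itemsets found so far
plays the role that the test against MFI plays in MAFIA, but without its
pruning the sketch visits every frequent itemset and is only suitable for
small examples.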
Pseudo code for MAFIA:

MAFIA(C, MFI, Boolean IsHUT)
{
    name HUT = C.head ∪ C.tail;
    if HUT is in MFI
        stop generation of children and return
    Count all children, use PEP to trim the tail, and reorder by
    increasing support
    For each item i in C.trimmed_tail
    {
        IsHUT = whether i is the first item in the tail
        newNode = C ∪ i
        MAFIA(newNode, MFI, IsHUT)
    }
    if (IsHUT and all extensions are frequent)
        Stop search and go back up subtree
    if (C is a leaf and C.head is not in MFI)
        Add C.head to MFI
}

Decision tree algorithm

A decision tree is a flow-chart-like tree structure, where each internal node
denotes a test on an attribute, each branch represents an outcome of the
test, and leaf nodes represent classes or class distributions. The topmost
node in a tree is the root node. Internal nodes are represented by
rectangles, and leaf nodes are denoted by ovals. For the classification of an
unknown sample, the sample attribute values are tested against the decision
tree. A path is traced from the root to a leaf node, which holds the class
prediction for that sample, so decision trees can easily be converted to
classification rules.

IV. CONCLUSION

In this paper the focus is on using different data mining algorithms and a
set of several attributes for effective heart disease prediction and
diagnosis. The decision tree works efficiently with fourteen attributes,
after the genetic algorithm has been applied to reduce the actual data size
and obtain the optimal subset of attributes suitable for heart disease
prediction.

REFERENCES
[1] Jyoti Soni, "Predictive Data Mining for Medical Diagnosis: An Overview of Heart Disease Prediction", International Journal of Computer Applications (0975 – 8887), Volume 17, No. 8, March 2011.
[2] P.K. Anooj, "Clinical decision support system: Risk level prediction of heart disease using weighted fuzzy rules", Journal of King Saud University – Computer and Information Sciences (2012) 24, 27–40.
[3] Nidhi Bhatla and Kiran Jyoti, "An Analysis of Heart Disease Prediction using Different Data Mining Techniques", International Journal of Engineering Research & Technology (IJERT), Vol. 1, Issue 8, October 2012, ISSN: 2278-0181.
[4] Aditya Methaila, Prince Kansal, Himanshu Arya, Pankaj Kumar, "Early Heart Disease Prediction Using Data Mining Techniques", Sundarapandian et al. (Eds): CCSEIT, DMDB, ICBB, MoWiN, AIAP – 2014, pp. 53–59, 2014. © CS & IT-CSCP 2014, DOI: 10.5121/csit.2014.4807.
[5] Shimpy Goyal and Dr. Rajender Singh Chhillar, "A Literature Survey on Applications of Data Mining Techniques to Predict Heart Diseases", International Journal of Engineering Sciences Paradigms and Researches (IJESPR), Vol. 20, Issue 01, May 2015.
[6] M. Anbarasi, E. Anupriya, N.Ch.S.N. Iyengar, "Enhanced Prediction of Heart Disease with Feature Subset Selection using Genetic Algorithm", International Journal of Engineering Science and Technology, Vol. 2(10), 2010, 5370-5376.
[7] M. Akhil Jabbar, Dr. Priti Chandra, Dr. B.L. Deekshatulu, "Heart Disease Prediction System using Associative Classification and Genetic Algorithm", ICECIT, 2012.
[8] R. Sethukkarasi and Kannan, "An Intelligent System for Mining Temporal rules in Clinical database using Fuzzy neural network", European Journal of Scientific Research, ISSN 1450-216, Vol. 70(3), pp. 386-395, 2012.
[9] Mohammed Abdul Khaleel, Sateesh Kumar Pradhan, "Finding Locally Frequent Diseases Using Modified Apriori Algorithm", International Journal of Advanced Research in Computer and Communication Engineering, Vol. 2, Issue 10, October 2013.
[10] Boshra Bahrami, Mirsaeid Hosseini Shirvani, "Prediction and Diagnosis of Heart Disease by Data Mining Techniques", Journal of Multidisciplinary Engineering Science and Technology (JMEST), ISSN: 3159-0040, Vol. 2, Issue 2, February 2015.
[11] Deepika N., "Association Rules for Classification of Heart Attack Patients", IJAEST, Vol. 11(2), pp. 253-257, 2011.
[12] Cleveland database: http://archive.ics.uci.edu/ml/datasets/Heart+Disease.
[13] Statlog database: http://archive.ics.uci.edu/ml/machine-learning-databases/statlog/heart.