Data Mining Apriori Algorithm for Heart Disease Prediction

Article · January 2017

3 authors:

Mirpouya Mirmozaffari, University of Texas at Arlington
Alireza Alinezhad, Qazvin Islamic Azad University
Azadeh Gilanpour, University of Oklahoma


Int'l Journal of Computing, Communications & Instrumentation Engg. (IJCCIE) Vol. 4, Issue 1 (2017) ISSN 2349-1469 EISSN 2349-1477

Data Mining Apriori Algorithm for Heart Disease Prediction

Mirpouya Mirmozaffari1, Alireza Alinezhad2, and Azadeh Gilanpour3

Mirpouya Mirmozaffari1, MSc student, Faculty of Industrial and Mechanical Engineering, Qazvin Branch, Islamic Azad University, Qazvin, Iran.
Alireza Alinezhad2, Associate Professor, Faculty of Industrial and Mechanical Engineering, Qazvin Branch, Islamic Azad University, Qazvin, Iran.
Azadeh Gilanpour3, Islamic Azad University (IAU).

Abstract—Heart disease is a major cause of morbidity and mortality in modern society. Almost 60% of the world population falls victim to heart disease. Although significant progress has been made in the diagnosis and treatment of coronary heart disease, further investigation is still needed. In this research, data mining is applied to a clinical database to extract hidden patterns. The database consists of 209 instances and 8 attributes. The system was implemented in WEKA and MATLAB, and the prediction accuracy of the Apriori algorithm in the two packages is compared in three steps. MATLAB is identified as the better-performing software.

Keywords—Data mining, Apriori, MATLAB, WEKA.

I. INTRODUCTION

CARDIOVASCULAR diseases, such as coronary heart disease and arrhythmia, are among the diseases that endanger human life [1]. Medical practitioners conduct surveys on heart disease and gather information about heart patients, their symptoms, and the progression of their disease. Reports of patients with common diseases who present typical symptoms are increasing.

Data mining is the process of extracting hidden knowledge from large volumes of raw data [2]. It has been defined as “the nontrivial extraction of previously unknown, implicit and potentially useful information from data.” Data mining is the science of extracting useful information from large databases.

To find unknown trends in heart disease, the Apriori association rule algorithm is applied to a unique dataset, and its accuracy is compared across two software packages. A dataset of 209 instances and 8 attributes (7 inputs and 1 output) is used to test and justify the algorithm. To further enhance accuracy and obtain more reliable variables, the dataset is cleaned with an unsupervised Discretization filter. Finally, the software that runs the Apriori algorithm with better accuracy is identified.

II. BACKGROUND AND LITERATURE REVIEW

The growing number of heart patients worldwide has motivated researchers to carry out comprehensive research to reveal hidden patterns in clinical datasets. This section gives an overview of previous computational studies on pattern recognition in heart disease. Different techniques are addressed, and various heart disease datasets are covered so that the comparison is fair. Finally, the gap in the existing literature, which was the main motivation for this study, is identified. Some of the key studies are as follows:

• Das et al. introduced a neural network classifier for diagnosing valvular heart disease. Ensemble-based methods create new models by combining the posterior probabilities or the predicted values from multiple predecessor models. An effective model was created and experimentally tested. A classification accuracy of 97.4% was achieved in an experiment on a dataset containing 215 samples [3].

• Pandey et al. studied the performance of clustering algorithms on a heart disease dataset. They evaluated the performance and prediction accuracy of several clustering algorithms, measuring cluster performance with the classes-to-clusters evaluation mode. They proposed Make Density Based Cluster, with a prediction accuracy of 85.8086%, as the most versatile algorithm for heart disease diagnosis [4].

• Karaolis et al. developed a data mining system that uses association analysis based on the Apriori algorithm to assess heart-related risk factors with WEKA tools. A total of 369 cases were collected from the Paphos CHD Survey, most of them with more than one event. Selected rules were evaluated according to the importance of each rule. Each extracted rule was further evaluated by inspecting the number of matching cases in the database [5].

Pattern recognition in heart disease can therefore be addressed through different computational techniques. With regard to association rule algorithms, other respected works that focus on diverse aspects of heart disease on different datasets can be mentioned: Danapana et al., 2011 [6]; Ordonez 2006 [7]; Han et al., 2011 [8]; Deekshatulu 2012 [9]; Deepika 2011 [10]; Lakshmi et al., 2013 [11]. Different computational techniques for other health care issues have also been reported in the literature [12-13].

Various associators are frequently used in different studies to predict heart disease. A comprehensive comparison of association rule algorithms therefore gives practical insight into associator performance.

https://doi.org/10.15242/IJCCIE.DIR1116010

This comparison is of great importance to medical practitioners who want to predict heart failure at the proper stage of its progression. Furthermore, except for Ref. [14], which evaluated 4 classification techniques, there is no other study on the current dataset. Finally, custom MATLAB code is applied for the Apriori algorithm, which results in better performance compared with the WEKA software covered in this study.

III. DATASET DESCRIPTION

The standard dataset compiled in this study contains 209 records collected from a hospital in Iran under the supervision of the National Health Ministry. The data is gathered from a single source, so no integration operations are needed. Eight attributes are used; 7 of them are inputs that predict the future state of the attribute “Diagnosis”. All the attributes, along with their values and data types, are listed in Table I.

TABLE I
THE ARRANGEMENT OF CHANNELS

| Attributes | Descriptions | Encoding\Values | Feature |
|---|---|---|---|
| Age | Age in years | 28-66 | Numeric |
| Chest Pain Type | It signals heart attack and has four different conditions: Asymptomatic, Atypical Angina, Typical Angina, and without Angina. | Asymptomatic = 1, Atypical Angina = 2, Typical Angina = 3, Non-Angina = 4 | Nominal |
| Rest Blood Pressure | Patient's resting blood pressure in mm Hg at the time of admission to the hospital | 94-200 | Numeric |
| Blood Sugar | Below 120 mg/dL: Normal; above 120 mg/dL: High | High = 1, Normal = 0 | Nominal, Binary |
| Rest Electrocardiographic | Normal, Left Ventricular Hypertrophy (LVH), ST_T wave abnormality | Normal = 1, Left Vent Hyper = 2, ST_T wave abnormality = 3 | Nominal |
| Maximum Heart Rate | Maximum heart rate attained in sport test | 82-188 | Numeric |
| Exercise Angina | It includes two conditions of positive and negative | Positive = 1, Negative = 0 | Nominal, Binary |
| Diagnosis | It includes two conditions of positive and negative | Positive = 1, Negative = 0 | Nominal, Binary |

IV. RESEARCH METHODOLOGY

The objective of this study is to effectively predict possible heart attacks from the patient dataset. Using a prediction methodology, a model was developed to determine the characteristics of heart disease in terms of some attributes.

Data mining in this research is used to build models that predict the class from selected attributes. The Waikato Environment for Knowledge Analysis (WEKA) has been used for prediction because of its proficiency in discovering, analyzing, and predicting patterns [15]. In addition, the system was implemented using MATLAB R2013a. MATLAB is a high-level language and interactive environment for numerical computation, visualization, and programming. The language, tools, and built-in math functions make it possible to explore multiple approaches and reach a solution faster than with spreadsheets or traditional programming languages such as C/C++ or Java. The whole process can be split into two steps as follows:

A. Filtering Preprocess

Real-world data is highly susceptible to noise, missing values, and inconsistency, so preprocessing is very important. We apply a filter to the dataset to remove dirty and redundant data. For association rules, Discretization is applied in WEKA 2016 (version 3.9.0) and MATLAB R2013a to change numeric data into nominal data.

B. Evaluation in Association Rules

Figure 1 shows the proposed model and its steps. We apply the proposed model in WEKA and MATLAB. Finally, to choose the better software, the accuracy of three strong rules is compared across the two packages.

[Figure 1: flowchart. Imbalanced dataset → Discretization (unsupervised attribute filter) → strong rules with min support 0.7 (Step A), 0.6 (Step B), and 0.5 (Step C), each with min confidence 0.9 → output of each step.]

Fig 1: Implementation of Apriori Algorithm for accuracy analysis
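The filtering step turns numeric attributes into nominal intervals before rule mining. The Python sketch below shows one common way to do this, equal-width binning. The function names, the number of bins, and the label format are illustrative assumptions; the paper does not state the filter settings used, so this is not the authors' WEKA or MATLAB configuration.

```python
from typing import List, Sequence


def equal_width_cut_points(values: Sequence[float], bins: int) -> List[float]:
    """Return the bins-1 interior cut points that split [min, max] evenly."""
    if bins < 1:
        raise ValueError("bins must be at least 1")
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + width * i for i in range(1, bins)]


def interval_label(value: float, cuts: Sequence[float]) -> str:
    """Map a numeric value to a nominal label such as '(-inf-0.5]'."""
    lower = "-inf"
    for cut in cuts:
        # Six decimals, then drop trailing zeros (assumed label style).
        upper = f"{cut:.6f}".rstrip("0").rstrip(".")
        if value <= cut:
            return f"({lower}-{upper}]"
        lower = upper
    return f"({lower}-inf)"


def discretize(values: Sequence[float], bins: int) -> List[str]:
    """Replace every numeric value by the label of the interval it falls in."""
    cuts = equal_width_cut_points(values, bins)
    return [interval_label(v, cuts) for v in values]


# Hypothetical columns: a 0/1 attribute split into 2 bins, a 1..3 code into 3.
binary_labels = discretize([0, 1, 0], bins=2)
coded_labels = discretize([1, 2, 3], bins=3)
```

Under these assumptions a 0/1 attribute yields the label `(-inf-0.5]` for the value 0, which has the same shape as the interval labels that appear in the rules of Section V.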

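The evaluation step keeps only rules that clear a minimum support and a minimum confidence. As a rough illustration of how Apriori finds such rules, the Python sketch below mines frequent itemsets level by level and then derives strong rules from them. The toy transactions, item names, and function names are hypothetical; this is neither the MATLAB code used in the study nor WEKA's implementation.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset, float, float, float]


def frequent_itemsets(transactions: List[Itemset], min_support: float) -> Dict[Itemset, float]:
    """Level-wise Apriori search: candidates grow only from frequent itemsets."""
    n = len(transactions)

    def support(itemset: Itemset) -> float:
        return sum(itemset <= t for t in transactions) / n

    current = {frozenset([item]) for t in transactions for item in t}
    frequent: Dict[Itemset, float] = {}
    k = 1
    while current:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def strong_rules(frequent: Dict[Itemset, float], min_confidence: float) -> List[Rule]:
    """Return (antecedent, consequent, support, confidence, lift) tuples."""
    rules: List[Rule] = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                rhs = itemset - lhs
                conf = sup / frequent[lhs]
                if conf >= min_confidence:
                    rules.append((lhs, rhs, sup, conf, conf / frequent[rhs]))
    return rules


# Hypothetical, already discretized records (not taken from the paper's dataset).
toy = [
    frozenset({"ecg=low", "sugar=normal"}),
    frozenset({"ecg=low", "sugar=normal", "angina=no"}),
    frozenset({"ecg=low", "sugar=normal", "angina=no"}),
    frozenset({"ecg=high", "sugar=high"}),
    frozenset({"ecg=low", "sugar=normal", "angina=no"}),
]
freq = frequent_itemsets(toy, min_support=0.6)
rules = strong_rules(freq, min_confidence=0.9)
```

Lowering `min_support` from 0.7 to 0.6 to 0.5 while holding `min_confidence` at 0.9 mirrors the Step A, B, and C thresholds in Fig. 1, and each lower threshold can only admit additional rules.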

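The rules in Section V are reported with support, confidence, and lift. Under the standard definitions (support = count of records matching both sides / total records, confidence = that count / count matching the antecedent, lift = confidence / share of records matching the consequent), these values follow directly from the counts printed with each rule. The short Python check below uses the counts reported for the first two rules (174 and 161, 137 and 130, out of 209 records). The figure of 193 records with Blood sugar = '(-inf-0.5]' is not stated in the paper; it is an assumption inferred here because it reproduces the reported lift values.

```python
from typing import Tuple


def rule_metrics(n_total: int, n_antecedent: int, n_both: int,
                 n_consequent: int) -> Tuple[float, float, float]:
    """Return (support, confidence, lift) of a rule X ==> Y from raw counts."""
    support = n_both / n_total
    confidence = n_both / n_antecedent
    lift = confidence / (n_consequent / n_total)
    return support, confidence, lift


N_RECORDS = 209        # dataset size given in the paper
N_SUGAR_NORMAL = 193   # ASSUMPTION: inferred from the reported lifts, not stated

# Rule 1: ECG at rest ==> Blood sugar (174 antecedent records, 161 match both).
rule1 = rule_metrics(N_RECORDS, 174, 161, N_SUGAR_NORMAL)
# Rule 2: Exercise induced Angina ==> Blood sugar (137 and 130).
rule2 = rule_metrics(N_RECORDS, 137, 130, N_SUGAR_NORMAL)

for name, (sup, conf, lift) in (("rule 1", rule1), ("rule 2", rule2)):
    print(f"{name}: Sup={sup:.4f} Conf={conf:.4f} lift={lift:.4f}")
```

A lift above 1 means the consequent is more likely when the antecedent holds than it is overall; a lift below 1 means it is less likely.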
V. RESULT AND DISCUSSION

The higher the support and confidence of a rule, the more it represents a regular pattern in the dataset. If these measures are relatively low, then any inconsistency would be less strong than it would be for rules with high confidence and high support. In Steps A, B, and C, the accuracy of three strong rules is compared across the two software packages.

A. Step A (strong rules with "min support": 0.7, and "min confidence": 0.9)

In this step only one rule is evaluated, because only one rule has a support higher than the "min support" of 0.7. It has the highest support (0.7703) among all rules. The rule is produced in the two packages as follows:

• The First WEKA Apriori rule in Step A:

1. ECG at rest = '(-inf-1.666667]' 174 ==> Blood sugar = '(-inf-0.5]' 161 (Conf: 0.93, lift: 1)

• The First MATLAB Apriori rule in Step A:

1. ECG at rest = '(-inf-1.666667]' 174 ==> Blood sugar = '(-inf-0.5]' 161 (Conf: 0.9253, lift: 1.0020, Sup: 0.7703)

B. Step B (strong rules with "min support": 0.6, and "min confidence": 0.9)

In Step B only two rules satisfy the "min support" of 0.6, because only one rule has a support between 0.6 and 0.7. The first rule is the same as in Step A.

• The Second WEKA Apriori rule in Step B:

2. Exercise induced Angina = '(-inf-0.5]' 137 ==> Blood sugar = '(-inf-0.5]' 130 (Conf: 0.95, lift: 1.03)

• The Second MATLAB Apriori rule in Step B:

2. Exercise induced Angina = '(-inf-0.5]' 137 ==> Blood sugar = '(-inf-0.5]' 130 (Conf: 0.9489, lift: 1.0276, Sup: 0.6220)

C. Step C (strong rules with "min support": 0.5, and "min confidence": 0.9)

In this step, four rules with "min support" (0.5) in WEKA and three rules with "min support" (0.5) in MATLAB are evaluated. The first and second rules are the same as in Step A and Step B.

• The Third WEKA Apriori rule in Step C:

3. ECG at rest = '(-inf-0.666667]' Exercise induced Angina = '(-inf-0.5]' 119 ==> Blood sugar = '(-inf-0.5]' 113 (Conf: 0.95, lift: 1.03)

• The Fourth WEKA Apriori rule in Step C:

4. Resting blood pressure = '(128-164]' 123 ==> Blood sugar = '(-inf-0.5]' 111 (Conf: 0.9, lift: 0.98)

• The Third MATLAB Apriori rule in Step C:

3. ECG at rest = '(-inf-0.666667]' Exercise induced Angina = '(-inf-0.5]' 119 ==> Blood sugar = '(-inf-0.5]' 113 (Conf: 0.9496, lift: 1.0283, Sup: 0.5407)

In Steps A, B, and C, the MATLAB software shows the more appropriate performance. Some advantages of MATLAB are discussed in more detail below:

• Numbers are reported with better precision. For example, in the first WEKA Apriori rule in Step A the lift is reported as 1, which would indicate no correlation between ECG (X) and Blood sugar (Y). In the first MATLAB Apriori rule in Step A, however, the lift is 1.002, which shows that ECG and Blood sugar have a weak positive correlation.

• In addition to the "min support" threshold, the exact support of each rule is reported. For instance, for the second WEKA Apriori rule only the "min support" (0.6) is given, whereas for the second MATLAB Apriori rule the exact support (0.622) is reported alongside the "min support" (0.6).

• Only strong rules with high support, high confidence, and a positive-correlation lift (greater than one) are considered. For example, in the fourth WEKA Apriori rule in Step C the lift is 0.98, which means that Resting blood pressure and Blood sugar have a negative correlation. MATLAB, on the other hand, does not consider weak rules with negative correlation.

VI. CONCLUSION

Strong rules produced by the Apriori algorithm in two data mining packages were compared for predicting heart disease. A unique model consisting of one filter and several evaluation methods was developed. Three strong rules, together with different evaluation methods, were applied to find the superior software. The Apriori rules were compared with regard to the exact support reported, the precision of the reported values, and the restriction to strong rules, and the better-performing software was identified. The experiment can serve as a practical tool for physicians to effectively predict uncertain cases and advise accordingly.

REFERENCES

[1] F. Jin, J. Liu, and W. Hou, “The application of pattern recognition technology in the diagnosis and analysis on the heart disease: Current status and future,” In 24th Chinese Control and Decision Conference (CCDC), pp. 1304-1307, 2012.
[2] E. Kolce, and N. Frashery, “A literature review on data mining techniques used in Healthcare data bases,” ICT innovations web proceedings 2012.
[3] R. Das, I. Turkoglu, and A. Sengur, “Diagnosis of valvular heart disease through neural networks ensembles,” Elsevier, 2009.
[4] A. K. Pandey, P. Pandey, K. L. Jaiswal, and A. K. Sen, “Data Mining Clustering Techniques in the Prediction of Heart Disease using Attribute Selection Method,” International Journal of Science, Engineering and Technology Research (IJSETR), ISSN: 2277798, Vol 2, Issue10, October 2013.
[5] M. Karaolis, J. A. Moutiris, and C. S. Pattichis, “Association rule analysis for the assessment of the risk of coronary heart events,” Proceedings of the 31st Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2009. https://doi.org/10.1109/iembs.2009.5334656


[6] H. Danapana, and M. S. Roy, “Effective data mining association rules for heart disease prediction system,” IJCST, Vol. 2, Issue 4, Oct-Dec. 2011.
[7] C. Ordonez, “Association rule discovery with train and test approach for heart disease prediction,” IEEE transactions on information technology in biomedicine, Vol 10, No. 2, pp 334-343, April 2006. https://doi.org/10.1109/TITB.2006.864475
[8] J. Han, M. Kamber, and J. Pei, “Data mining concepts and techniques,” Elsevier 2011.
[9] B. L. Deekshatulu, M. A. Jabbar and P. Chantra, “Knowledge discovery
from mining association rules for heart disease prediction,” Journal of
theoretical and applied information technology, Vol.41, No. 2, 2012.
[10] N. Deepika, “Association rules for classification of heart attack
patients,” IJAEST, Vol.11, pp. 253-257, 2011.
[11] K. R. Lakshmi, M. V. Krishna and S. P. Kumar, “Performance
Comparison of Data Mining Techniques for Predicting of Heart Disease
Survivability,” International Journal of Scientific and Research
Publications, ISSN 2250-3153, Vol.3, Issue.6, June 2013.
[12] A. Goodini, M. Torabi, M. Goodarzi, R. Safdari, M. Darayi, M.
Tavassoli, and M. Shabani, “The simulation model of teleradiology in
telemedicine project,” The Health Care Manager, Vol. 34- Issue 1, p 69-
78, January/March 2015.
[13] R. Isola, R. Carvalho, and A. Kumar, “Knowledge discovery in medical systems using differential diagnosis, LAMSTAR and K-NN,” conference of IEEE transactions on information technology in biomedicine, 2011.
[14] B. Bahrami, and M. H. Shirvani, “Prediction and Diagnosis of Heart
Disease by Data Mining Techniques,” Journal of Multidisciplinary
Engineering Science and Technology (JMEST), ISSN: 3159-0040, Vol.
2, Issue 2, February 2015.
[15] I. H. Witten and E. Frank, “Data Mining Practical Machine Learning Tools and Techniques,” Morgan Kaufmann Publishers, 2005.
