Data Mining Apriori Algorithm for Heart Disease Prediction
3 authors:
Azadeh Gilanpour
University of Oklahoma
Abstract—Heart disease is a major cause of morbidity and mortality in modern society. Almost 60% of the world population falls victim to heart disease. Although significant progress has been made in the diagnosis and treatment of coronary heart disease, further investigation is still needed. Data mining, as a way to extract hidden patterns from clinical datasets, is applied to a database in this research. The database consists of 209 instances and 8 attributes. The system was implemented in WEKA and MATLAB, and the prediction accuracy of the Apriori algorithm in the two tools is compared in 3 steps. MATLAB is identified as the better-performing software.

Keywords—Data mining, Apriori, MATLAB, WEKA.

I. INTRODUCTION

II. BACKGROUND AND LITERATURE REVIEW
The growing number of heart patients worldwide has motivated researchers to carry out comprehensive research to reveal hidden patterns in clinical datasets. This section provides an overview of previous computational studies on pattern recognition in heart disease. Not only are different techniques addressed, but various heart disease datasets are also covered to allow a fair comparison. Finally, the gap in the existing literature, which was the main motivation of this study, is also described. Some of the key studies are as follows:
Das et al. introduced a neural network classifier for diagnosing valvular heart disease. The ensemble-based methods create new models by combining the posterior probabilities or the predicted values from multiple predecessor
https://doi.org/10.15242/IJCCIE.DIR1116010 20
Int'l Journal of Computing, Communications & Instrumentation Engg. (IJCCIE) Vol. 4, Issue 1 (2017) ISSN 2349-1469 EISSN 2349-1477
This comparison is of great importance to medical practitioners who wish to predict heart failure at an appropriate stage of its progression. Furthermore, except for Ref. [14], which evaluated 4 classification techniques, there is no other study on the current dataset. Finally, custom MATLAB code is applied to the Apriori algorithm, which eventually results in better performance compared with WEKA, as covered in this study.

III. DATASET DESCRIPTION
The standard dataset compiled in this study contains 209 records, collected from a hospital in Iran under the supervision of the National Health Ministry. The data is gathered from a single source, so no integration operations are required. Eight attributes are used; 7 of them are considered inputs, which predict the future state of the attribute “Diagnosis”. All the attributes, along with their values and data types, are listed in Table I.

TABLE I
THE ARRANGEMENT OF CHANNELS
Attributes Descriptions Encoding\Values Feature

Data mining in this research is used to build models for prediction of the class based on selected attributes. The Waikato Environment for Knowledge Analysis (WEKA) has been used for prediction due to its proficiency in discovering, analyzing, and predicting patterns [15]. In addition, the system was implemented using MATLAB R2013a. MATLAB is a high-level language and interactive environment for numerical computation, visualization, and programming. The language, tools, and built-in math functions enable us to explore multiple approaches and reach a solution faster than with spreadsheets or traditional programming languages, such as C/C++ or Java. Generally, the whole process can be split into two steps as follows:

A. Filtering preprocess
Real-world data is highly susceptible to noise, missing values, and inconsistency. Therefore, preprocessing of data is very important. We apply a filter to the dataset to remove the dirty and redundant data it contains. For association rule mining, discretization should be applied in WEKA 2016 (version 3.9.0) and MATLAB R2013a to change numeric data into nominal data. This process is implemented.
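The discretization step, which turns numeric attributes into nominal ones before association rules are mined, can be illustrated with a minimal Python sketch. It is not the WEKA filter or the authors' MATLAB code. The function names, the cut point 0.5, and the sample values are assumptions made for illustration; the label format only imitates the interval style that appears in the WEKA rules, such as '(-inf-0.5]'.

```python
from bisect import bisect_left


def interval_labels(cuts):
    """Build one nominal label per bin for a sorted list of cut points."""
    edges = ["-inf"] + [f"{c:g}" for c in cuts] + ["inf"]
    labels = []
    for i in range(len(edges) - 1):
        # Every bin is closed on the right except the last, unbounded one.
        closing = ")" if i == len(edges) - 2 else "]"
        labels.append(f"({edges[i]}-{edges[i + 1]}{closing}")
    return labels


def discretize(value, cuts):
    """Map a numeric value to the label of the bin (lo, hi] that contains it."""
    # bisect_left sends a value equal to a cut point into the lower bin,
    # which matches the right-closed interval notation.
    return interval_labels(cuts)[bisect_left(cuts, value)]


# Hypothetical binary attribute coded 0/1, split at an assumed cut point of 0.5.
sample_values = [0, 1, 0, 0]
nominal = [discretize(v, [0.5]) for v in sample_values]
print(nominal)
```

With a single cut point, a 0/1 attribute maps to two nominal values, so each record can be treated as a set of attribute=label items by an association rule miner.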
B. Step B (strong rules with "min support": 0.7, and "min confidence": 0.9)
In step B, only two rules with "min support" (0.6) are evaluated, because there is only one support value between 0.6 and 0.7. The first rule is the same as the one in step A.
The second WEKA Apriori rule in Step B:

2. Exercise induced Angina = '(-inf-0.5]' 137 ==> Blood sugar = '(-inf-0.5]' 130 (Conf: 0.95, lift: 1.03)

The second MATLAB Apriori rule in Step B:

VI. CONCLUSION
Strong rules generated by the Apriori algorithm were compared for predicting heart disease. A unique model consisting of one filter and several evaluation methods was developed. Three strong rules, as well as different evaluation methods, were applied to find the superior software. The Apriori rules were compared with respect to their exact support counts, accuracy, and strength. The higher-performing software was identified. The experiment can serve as a practical tool for physicians to effectively predict uncertain cases and advise accordingly.
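The figures in the second WEKA rule of Step B can be re-derived from the counts it prints. The sketch below assumes the usual reading of WEKA's Apriori output: 137 records match the antecedent and 130 match both antecedent and consequent, out of the 209 records in the dataset. The helper names are hypothetical. Lift is not recomputed because it needs the count of records matching the consequent alone, which the rule does not show.

```python
def rule_metrics(antecedent_count, rule_count, total_records):
    """Return (support, confidence) for an association rule A ==> B.

    antecedent_count: records matching A
    rule_count:       records matching both A and B
    total_records:    number of records in the dataset
    """
    support = rule_count / total_records
    confidence = rule_count / antecedent_count
    return support, confidence


def is_strong(support, confidence, min_support, min_confidence):
    """A rule is strong when it meets both minimum thresholds."""
    return support >= min_support and confidence >= min_confidence


# Counts taken from the rule text; 209 is the dataset size given in the paper.
support, confidence = rule_metrics(137, 130, 209)
print(f"support={support:.3f} confidence={confidence:.3f}")
# prints: support=0.622 confidence=0.949
```

The confidence rounds to the reported 0.95, and the support of about 0.62 lies between 0.6 and 0.7. Under these assumptions the rule passes a minimum support of 0.6 with a minimum confidence of 0.9, but not a minimum support of 0.7.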