Hybrid Machine Learning Algorithm For Arrhythmia Classification Using Stacking Ensemble, Random Forest and J.48 Algorithm
ISSN No:-2456-2165
Abstract:- Arrhythmia, also known as dysrhythmia, is a heart ailment that arises when the electrical signals that coordinate the heartbeat do not work properly. Arrhythmias are often precursors to a number of heart diseases which may be terminal, and early detection and adequate treatment can save lives. In this paper we propose a classification technique that blends two well-performing machine learning algorithms to enhance the accuracy of detecting arrhythmia using electrocardiogram (ECG) data and the Weka machine learning tool. These algorithms are J.48 and Random Forest, combined with an ensemble algorithm called Stacking. For this experiment, the MIT-BIH ECG dataset from Kaggle.com was used to train, test and validate the hybrid algorithm. The dataset classifies ECG data into the five superclasses of arrhythmia approved by the Association for the Advancement of Medical Instrumentation (AAMI) as detectable by equipment and methods: normal sinus (N), fusion beat (F), supraventricular ectopic beat (SVEB), ventricular ectopic beat (VEB) and unknown beat (Q). The hybrid algorithm (stacked Random Forest and J.48) outperformed the individual algorithms. Its performance metrics include 97.63% accuracy, sensitivity (recall) and positive predictivity (precision) of approximately 0.98, a weighted precision-recall curve area of 0.97, a receiver operating characteristic area of 0.96, a test time of 1.66 seconds and a model size of 38.2 MB, which is suitable for building applications for mobile devices.

Keywords:- Machine Learning, Arrhythmia Classification, ECG, Random Forest, J.48, Stacking Ensemble.

I. INTRODUCTION

Arrhythmia is an ailment that ensues when the electrical impulses that control how the heart beats do not work as required. This makes the heart beat too fast, too slow, flutter, fibrillate, or beat early (known as premature contraction). Arrhythmias are sometimes precursors to cardiac arrest, which can be fatal. The past two decades have seen considerable advancements in the diagnosis and management of supraventricular and ventricular arrhythmias [1]. With digital devices becoming more available, this paper proposes a classification model for more accurate detection of arrhythmia using a hybrid machine learning algorithm.

The term hybrid machine learning algorithm is employed when an ensemble involves a heterogeneous collection of learners, in contrast to other ensemble models, such as bagging or boosting, where a homogeneous collection of learners is mostly used.

Ensemble learning is a machine learning approach in which two or more learners (machine learning algorithms) are trained on datasets to solve the same task, and their several predictions are merged into a single composite prediction [2]. An ensemble algorithm coalesces the decisions of the separate classifiers that compose it in order to improve the final prediction. According to [3], it is the procedure of running two or more related but different models and then fusing their outcomes into a single score or spread, with the aim of improving the accuracy of predictive analytics and data mining applications.

A. Electrocardiogram (ECG)
The electrical activity of the heart (typically consisting of depolarization and repolarization) is captured by the electrocardiogram. It facilitates the detection and diagnosis of heart anomalies by quantifying electrical potentials on the surface of the human body, generating a record of the electrical currents associated with the activity of the heart muscle.

The propagation of electrical signals in the heart follows a pattern, which results in electrical currents on the surface of the body and an electrical potential on the skin surface. This potential is picked up and quantified with the aid of electrodes or sensors. The potential difference between the points where the electrodes are placed on the skin is normally amplified using an operational amplifier with optical isolation. The signal is then fed to a high-pass filter, after which it is passed through an anti-aliasing low-pass filter. Finally, the processed signal is passed to an analog-to-digital converter. The graphical illustration of this process (a plot of voltage (mV) against time) is called an electrocardiogram (ECG). The ECG was first demonstrated on humans by Augustus Désiré Waller in 1887 [4]. The heart's electrical activity has been recorded ever since; however, diagnosing normal cardiac rhythm and arrhythmias became a routine medical check-up only from the 1960s.
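The stacking idea described in this section (heterogeneous base learners whose predictions are merged by a further learner) can be illustrated with a minimal, self-contained Python sketch. Everything in it is an assumption made for illustration: the two toy base learners are stand-ins and not Random Forest or J.48, the class and function names are hypothetical, the data is made up, and a single holdout split replaces the cross-validation that stacking implementations commonly use. It is not the Weka configuration used in this paper.

```python
from collections import Counter, defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroid:
    """Toy base learner (stand-in): predicts the class with the nearest mean."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, X):
        return [
            min(self.centroids, key=lambda c: sq_dist(row, self.centroids[c]))
            for row in X
        ]


class OneNearestNeighbour:
    """Toy base learner (stand-in): copies the label of the closest training row."""

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        return [
            self.y[min(range(len(self.X)), key=lambda i: sq_dist(row, self.X[i]))]
            for row in X
        ]


class LookupMeta:
    """Toy meta-learner: maps each tuple of base predictions to the most
    frequent true label seen for that tuple; unseen tuples fall back to the
    first base learner's prediction."""

    def fit(self, level1, y):
        votes = defaultdict(Counter)
        for preds, label in zip(level1, y):
            votes[tuple(preds)][label] += 1
        self.table = {k: c.most_common(1)[0][0] for k, c in votes.items()}
        return self

    def predict(self, level1):
        return [self.table.get(tuple(p), p[0]) for p in level1]


def level1_features(base_learners, X):
    """One tuple of base-learner predictions per input row."""
    return list(zip(*(learner.predict(X) for learner in base_learners)))


def fit_stack(base_learners, meta, X, y, split):
    """Fit base learners on X[:split], then the meta-learner on their
    predictions for the held-out rows X[split:]."""
    for learner in base_learners:
        learner.fit(X[:split], y[:split])
    meta.fit(level1_features(base_learners, X[split:]), y[split:])


def predict_stack(base_learners, meta, X):
    return meta.predict(level1_features(base_learners, X))


# Made-up two-feature data with two of the beat labels used in the paper.
X = [[0.0, 0.1], [0.2, 0.0], [1.0, 1.1], [0.9, 1.0],
     [0.1, 0.2], [1.1, 0.9], [0.0, 0.0], [1.0, 1.0]]
y = ["N", "N", "V", "V", "N", "V", "N", "V"]

base = [NearestCentroid(), OneNearestNeighbour()]
meta = LookupMeta()
fit_stack(base, meta, X, y, split=4)
predictions = predict_stack(base, meta, [[0.05, 0.05], [1.05, 0.95]])
```

The point of the sketch is the data flow: the meta-learner never sees the raw features, only the base learners' predictions on rows they were not trained on.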
Random Forest
Random Forest is an ensemble algorithm composed of decision trees (hence the name "forest") that combines their outputs by voting, as shown in Figure 3. To classify a new object based on its attributes, each tree provides a classification, and we say the tree "votes" for that class. The forest chooses the classification having the most votes (over all the trees in the forest).

J.48
The J.48 machine learning algorithm is the open-source Java implementation, in the WEKA data mining tool, of Quinlan's C4.5 algorithm for building pruned or unpruned decision trees; C4.5 is an extension of Quinlan's earlier ID3 algorithm. "C4.5 was previously ranked number one data mining algorithm in 2008" according to [15]. Given a set of labeled training data, J.48 builds decision trees using the concept of information entropy. It relies on the fact that every attribute of the data can be used to make a decision by splitting the data into smaller subsets. The decision trees generated by J.48 can then be used to classify new, unseen data.
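To make the role of information entropy in attribute selection concrete, the following minimal Python sketch computes the entropy of a label set and the information gain of splitting on one attribute. It is only an illustration under stated assumptions: the function names and the tiny data set are hypothetical, and C4.5 additionally normalizes this quantity into a gain ratio, which is not shown here.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    remainder = sum(
        len(subset) / len(labels) * entropy(subset)
        for subset in subsets.values()
    )
    return entropy(labels) - remainder


# Made-up beats: "wide_qrs" separates the classes perfectly, "noisy" does not.
rows = [
    {"wide_qrs": False, "noisy": True},
    {"wide_qrs": False, "noisy": False},
    {"wide_qrs": True, "noisy": True},
    {"wide_qrs": True, "noisy": False},
]
labels = ["N", "N", "V", "V"]

gain_wide = information_gain(rows, labels, "wide_qrs")
gain_noisy = information_gain(rows, labels, "noisy")
```

An entropy-based tree builder would prefer the attribute with the larger gain as the split at the current node, then repeat the procedure on each resulting subset.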
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
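As a small worked instance of the accuracy formula, the sketch below evaluates it for one set of counts. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the experiment.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    return (tp + tn) / total


# Hypothetical counts: 90 of 100 predictions are correct.
example = accuracy(tp=50, tn=40, fp=5, fn=5)
```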
| Algorithm | Accuracy (%) | Kappa statistic | Build Time (Sec) | Test Time (Sec) | Model Size (Kb) |
|---|---|---|---|---|---|
| Random Forest | 97.23 | 0.90 | 175.50 | 1.95 | 38,171 |
| J.48 | 95.98 | 0.87 | 194.72 | 0.09 | 527 |
| Stacked Random Forest & J.48 | 97.63 | 0.92 | 1789.64 | 1.66 | 38,150 |
Table 2: Accuracy, Kappa Statistic, Build Time, Test Time and Model Size
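The Kappa statistic in Table 2 is not defined in the text. Assuming it is Cohen's kappa, $\kappa = (p_o - p_e) / (1 - p_e)$, where $p_o$ is the observed agreement and $p_e$ the agreement expected by chance, it can be computed from a confusion matrix as in the following sketch. The function name and the matrix are hypothetical illustration data, not results from the experiment.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square confusion matrix
    (rows: actual class, columns: predicted class)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(n)) / total
    # Chance agreement: product of row and column marginals per class.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(n)
    ) / (total * total)
    return (observed - expected) / (1 - expected)


# Hypothetical 2-class matrix: observed agreement 0.7, chance agreement 0.5.
kappa = cohen_kappa([[20, 5], [10, 15]])
```

Unlike accuracy, kappa discounts the agreement a classifier would reach by chance, which matters when one class (such as normal beats) dominates the data.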
Fig 5: Model Accuracy (bar chart of accuracy (%) for RF, J.48 and Stacked RF & J.48)

| Class | RF | J48 | Stacked RF & J.48 |
|---|---|---|---|
| 0 (N) | 0.973 | 0.122 | 0.98 |
| 1 (S) | 0.766 | 0.121 | 0.887 |
| 2 (V) | 0.886 | 0.198 | 0.954 |
| 3 (F) | 0.726 | 0.903 | 0.865 |
| 4 (Q) | 0.949 | 0.98 | 0.987 |
| Weighted average | 0.958 | 0.195 | 0.976 |
Table 3: Positive predictivity
Fig 6: Algorithm model sizes (bar chart of model size (Kb) for RF, J.48 and Stacked RF & J.48)

The results show that the stacked Random Forest and J.48 has the best accuracy of 97.63% and a model size of 38.15 MB, which is an improved performance. The model size is worth noting if one plans to develop a mobile application (e.g. Android or

Fig 7: Weighted Positive predictivity (bar chart for RF, J.48 and SRJ)
| Class | RF | J48 | Stacked RF & J.48 |
|---|---|---|---|
| 0 (N) | 0.13 | 0.147 | 0.096 |
| 1 (S) | 0.005 | 0 | 0.003 |
| 2 (V) | 0.008 | 0.001 | 0.003 |
| 3 (F) | 0.002 | 0 | 0.001 |
| 4 (Q) | 0.004 | 0 | 0.001 |

expected and returning a majority of all positive results.

$$Se = \frac{TP}{TP + FN}$$

Table 4 shows the recall of the algorithms tested. Stacked Random Forest and J48 performed best, with a weighted average recall of 0.976.
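The per-class and weighted-average figures reported for sensitivity and positive predictivity can be derived from a multi-class confusion matrix. The following is a minimal sketch of that calculation, assuming the weighted average weights each class by its number of actual samples; the function names and the 2-class matrix are hypothetical and not taken from the experiment.

```python
def per_class_metrics(matrix):
    """Return (recall, precision, support) per class for a square confusion
    matrix (rows: actual class, columns: predicted class)."""
    n = len(matrix)
    metrics = []
    for k in range(n):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                        # actual k, predicted other
        fp = sum(matrix[r][k] for r in range(n)) - tp   # predicted k, actual other
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        metrics.append((recall, precision, tp + fn))
    return metrics


def weighted_average(values, supports):
    """Average of per-class values weighted by class support."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)


# Hypothetical 2-class confusion matrix with 10 actual samples per class.
metrics = per_class_metrics([[8, 2], [1, 9]])
recalls = [m[0] for m in metrics]
precisions = [m[1] for m in metrics]
supports = [m[2] for m in metrics]
weighted_recall = weighted_average(recalls, supports)
weighted_precision = weighted_average(precisions, supports)
```

Because the weights are the class supports, a frequent class such as normal beats dominates the weighted average, so the per-class rows remain worth inspecting for the rarer classes.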
[Chart: Precision Recall Curve Area for RF, J48 and Stacked RF & J.48]
Fig 10: Weighted Average ROC of the Algorithms (bar chart of ROC area for RF, J.48 and SRJ)

G. Precision Recall Curve Area
A high area under the curve represents both high recall and high precision, as shown in Table 7.

| Class | RF | J48 | Stacked RF & J.48 |
|---|---|---|---|
| 0 (N) | 0.972 | 0.999 | 0.986 |
| 1 (S) | 0.53 | 0.863 | 0.764 |
| 2 (V) | 0.786 | 0.98 | 0.927 |
| 3 (F) | 0.462 | 0.825 | 0.723 |
| 4 (Q) | 0.885 | 0.994 | 0.968 |
| Weighted average | 0.938 | 0.992 | 0.973 |
Table 7: Precision Recall Curve Area

V. CONCLUSION
The experimental results reveal that the hybrid algorithm, stacked Random Forest and J.48, performed better than the individual algorithms on the MIT-BIH arrhythmia dataset, with an accuracy of 97.63%, recall and precision of approximately 0.98, a PRC area of 0.97, an ROC area of 0.96 and a test time of 1.66 seconds. The hybrid algorithm is therefore a better choice for automatic arrhythmia application design. Although the model size of 38.2 MB is somewhat large, it is still suitable for machine learning applications on mobile devices, given that the benchmark size for mobile machine learning models is 50 MB.

REFERENCES
[1]. Aro, A. L., & Chugh, S. S. (2018). Epidemiology and global burden of arrhythmias (Vol. 1). https://doi.org/10.1093/med/9780198784906.003.0064
[2]. Onwuka, U. (2019). Ensemble Learning. Retrieved July [...] learning
[3]. Burn, E. (2015). What is ensemble modeling? - TechTarget. Retrieved July 23, 2019, from https://searchbusinessanalytics.techtarget.com/definition/Ensemble-modeling
[4]. Oxford D (2004). Waller, Augustus Désiré (1856–1922), physiologist. https://doi.org/10.1093/ref:odnb/38099
[5]. Dietterich, T. G., & Bakiri, G. (1991). Error-correcting output codes: A general method for improving multiclass inductive learning programs. AAAI Press. AAAI. 572–577.
[6]. Waske, B., & Benediktsson, J. A. (2007). Fusion of Support Vector Machines for classification of multisensor data. IEEE Trans. Geosci. Remote Sens. 3858–3866.
[7]. Duin, R., & Tax, D. (2000). Experiments with Classifier Combining Rules. 16-29. 10.1007/3-540-45014-9_2.