Explainable Deep Learning-Based Approach For Multilabel Classification of Electrocardiogram
Abstract—Recently, computer-aided diagnosis methods have been widely adopted to help doctors make their diagnostic decisions more reliable and less error-prone. The electrocardiogram (ECG) is the most commonly used noninvasive diagnostic tool for investigating various cardiovascular diseases. In real life, patients often suffer from more than one heart disease at a time, so any practical automated heart disease diagnosis system should identify multiple heart diseases present in a single ECG signal. In this article, we propose a novel deep learning-based method for the multilabel classification of ECG signals. The proposed method can accurately identify up to two labels of an ECG signal, drawn from eight rhythm or morphological abnormalities of the heart and the normal heart condition. Moreover, the black-box nature of deep learning models prevents them from being applied to high-risk decisions such as automated heart disease diagnosis. Therefore, we also establish an explainable artificial intelligence (XAI) framework for ECG classification using class activation maps obtained from the Grad-CAM technique. In the proposed method, we train a convolutional neural network (CNN) with constructed ECG matrices. Through our experiments, we establish that training the CNN with only one label for each ECG recording is enough for the network to learn the features of an ECG recording that carries multilabel information (multiple heart diseases at the same time). During classification, we apply thresholding to the output probabilities of the softmax layer of our CNN to obtain the multilabel classification of ECG signals. We trained the model with 6311 ECG records and tested it with 280 ECG records. During testing, the model achieved a subset accuracy of 96.2%, a Hamming loss of 0.037, a precision of 0.986, a recall of 0.949, and an F1-score of 0.967. Given that the model performed very well on all the multilabel classification metrics, it can be directly used as a practical tool for automated heart disease diagnosis.

Index Terms—Deep learning, electrocardiogram (ECG), explainability, explainable AI, multilabel classification.

I. INTRODUCTION

CARDIOVASCULAR disease (CVD), one of the major causes of premature death throughout the world, is a class of diseases that affects the heart and its blood vessels. It is usually caused by the build-up of fatty deposits inside the arteries [1]. The number of deaths globally due to CVD increased from 12.3 million (25.8%) in 1990 to 17.9 million (32.1%) in 2015 [2], [3]. An automated heart disease diagnosis method is therefore needed to help doctors diagnose various CVDs accurately. The electrocardiogram (ECG) is one of the most commonly used diagnostic tools for identifying cardiovascular diseases, and many methods have been proposed in the literature for automated heart disease diagnosis using ECG signals [4]–[6]. However, patients often suffer from more than one heart disease at the same time. For an automated heart disease diagnostic method to be practical, it therefore has to identify multiple heart diseases present in a single ECG signal.

In this article, we propose a fully automated method for the multilabel classification of ECG into eight cardiovascular diseases: 1) atrial fibrillation (AF), 2) first-degree atrioventricular block (I-AVB), 3) left bundle branch block (LBBB), 4) right bundle branch block (RBBB), 5) premature atrial contraction (PAC), 6) premature ventricular contraction (PVC), 7) ST-segment depression, and 8) ST-segment elevation, as well as the normal heart condition. In the proposed method, we train a CNN using ECG matrices. Each ECG matrix is constructed by placing, in each row, beats taken from a different lead of the patient's ECG signal. During classification, we show that for a multilabel test point, the softmax output probabilities corresponding to the correct labels are of the same order of magnitude. By then applying simple thresholding to the output probabilities of the softmax layer of our CNN, we accurately classify multilabel ECG recordings with up to two labels.

The black-box nature of deep learning models also creates a need for explainable artificial intelligence (XAI), so that users and domain experts can analyze the features that the neural network learns for its classification task. In this article, we also establish an XAI framework for the ECG classification task using class activation maps, thereby making sure that the neural network has learned the right features of the diseases considered and not some local noise present in the dataset. If the network erroneously learned local noise in the dataset, it would catastrophically misidentify heart diseases when tested on ECG signals from outside the dataset used. The proposed XAI framework therefore validates that the neural network has learned the right features, so that its predictions can be trusted with high confidence. The contributions of this article are as follows.

Manuscript received 21 May 2021; revised 30 July 2021; accepted 10 August 2021. Date of publication 14 September 2021; date of current version 16 June 2023. Review of this manuscript was arranged by Department Editor N. Gerdsri. (Corresponding author: Ganeshkumar M.)

Ganeshkumar M., Sowmya V, Gopalakrishnan E.A, and Soman K.P are with the Center for Computational Engineering and Networking (CEN), Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore 601103, India (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).

Vinayakumar Ravi is with the Center for Artificial Intelligence, Prince Mohammad Bin Fahd University, 34754 Khobar, Saudi Arabia (e-mail: [email protected]).

Color versions of one or more figures in this article are available at https://doi.org/10.1109/TEM.2021.3104751.

Digital Object Identifier 10.1109/TEM.2021.3104751
0018-9391 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Bar Ilan University. Downloaded on March 12,2024 at 10:40:49 UTC from IEEE Xplore. Restrictions apply.
2788 IEEE TRANSACTIONS ON ENGINEERING MANAGEMENT, VOL. 70, NO. 8, AUGUST 2023
M. et al.: EXPLAINABLE DEEP LEARNING-BASED APPROACH FOR MULTILABEL CLASSIFICATION OF ECG 2789
TABLE I
REVIEW OF EXISTING METHODS FOR THE MULTILABEL CLASSIFICATION OF ECG SIGNALS
Shui Hua Wang et al. [14] proposed a novel neural network architecture called the graph rank-based pooling neural network (GRAPNN) for the diagnosis of pulmonary tuberculosis using CT images. The authors first extracted image-level features from the CT slices using a rank-based pooling neural network (RAPNN) and then adopted a graph convolutional neural network (GCN) to learn the relative features among the batch of CT slices. The RAPNN is a VGG-16 architecture that uses a novel rank-based pooling layer instead of the traditional max-pooling layer. The Grad-CAM technique is applied to the GRAPNN to make sure that it attends to the lesion areas (CT manifestations of pulmonary tuberculosis), thereby creating an XAI framework.
The authors in [15] derived a CNN architecture from the VGG-16 model, added convolutional block attention modules to it, and used it to detect COVID-19, pneumonia, and tuberculosis from CT images. They used the same Grad-CAM technique to make sure that the features learned by the CNN are appropriate manifestations of those diseases in the CT images.

The authors in [16] proposed a novel patch shuffle stochastic pooling neural network (PSSPNN) to detect COVID-19, pneumonia, and tuberculosis from CT images. A stochastic pooling neural network (SPNN) is a traditional neural network in which the max and average pooling layers are replaced by a randomized stochastic pooling layer. PSSPNN adds patch shuffling to SPNN: each minibatch of images and feature maps is partitioned into nonoverlapping patches, which are shuffled randomly to create local variations that can help prevent overfitting. The Grad-CAM technique was used to create an XAI model.

Shui Hua Wang et al. [17] proposed a novel transfer learning algorithm for pretrained deep learning models and used it to detect COVID-19, pneumonia, and tuberculosis from CT images. The authors also used heat maps generated by the Grad-CAM technique to create an explainable model.

III. PROPOSED METHOD

The flow diagram of the proposed method is shown in Fig. 2. It broadly consists of four steps: 1) preprocessing the ECG signals, 2) ECG matrix formation, 3) training the CNN, and 4) applying thresholding on the softmax probabilities, leading to the multilabel classification of ECG. We also use our trained CNN to establish the XAI framework for the ECG classification task.

A. Preprocessing

The preprocessing stage aims to remove the common noises that get added to ECG signals during recording. We adopted the same preprocessing steps as the article [18], in which the authors identified arrhythmias from ECG signals using CNNs. The steps are: 1) baseline-wander removal and 2) powerline interference removal. These noises (baseline wander and powerline interference) are undesirable because the CNN might mistake them for features to be learned, even though they are completely irrelevant to the identification of any heart disease. Removing them helps the CNN learn the appropriate distinguishing features of the various heart diseases.

1) Baseline Wander Removal: Baseline wander is the drift of an ECG recording away from its isoelectric level (the level at which there is neither positive nor negative electrical activity). Baseline wander is usually a low-frequency noise [19]. During the preprocessing stage, a Butterworth high-pass filter with a cutoff frequency of 0.5 Hz is used to remove baseline wander from the ECG signals. To obtain a zero-phase shift, the filter is applied in both the forward and backward directions. The Butterworth high-pass filter is designed using the SciPy package.¹ Fig. 3 shows a sample ECG signal from our dataset before and after baseline wander removal.

¹ [Online]. Available: https://www.scipy.org/
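The baseline wander filter described above can be sketched with SciPy as follows. The text fixes only the 0.5 Hz cutoff, the Butterworth design, and the forward-backward application; the filter order of 4 and the use of second-order sections with `sosfiltfilt` (instead of `filtfilt` on transfer-function coefficients) are assumptions made here, chosen because a 0.5 Hz cutoff at a 500 Hz sampling rate is a very low normalized frequency, where the second-order-section form is numerically safer.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def remove_baseline_wander(ecg, fs=500.0, cutoff_hz=0.5, order=4):
    """Zero-phase Butterworth high-pass filter for baseline wander removal.

    ecg: array whose last axis is time, e.g. shape (n_samples,) or (n_leads, n_samples).
    order: filter order; an assumption, since the text does not state it.
    """
    # Design the high-pass filter as second-order sections for numerical stability
    sos = butter(order, cutoff_hz, btype="highpass", fs=fs, output="sos")
    # Run the filter forward and then backward, so the net phase shift is zero
    return sosfiltfilt(sos, np.asarray(ecg, dtype=float), axis=-1)
```

Because filtering runs along the last axis, a 12-lead recording stored as an array of shape (12, n_samples) can be filtered in a single call.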
Fig. 3. Sample ECG signal before and after Baseline wander removal.
Fig. 4. Sample ECG signal before and after power line interference removal.
Fig. 6. Architecture of the CNN used for the multilabel classification of ECG signals.
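The beat segmentation and period normalization steps of the ECG matrix formation stage (several R-peak detectors tried in turn, beats cut from 170 ms before one R-peak to 170 ms before the next, the R-peak aligned at 200 ms, and zero-padding to a 400-ms window) can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions: the detectors are passed in as hypothetical callables rather than specific library functions, the "appropriate number of R-peaks" test (an implied heart rate between 40 and 200 beats/min) is our own choice, and beats longer than the 400-ms window are simply truncated, because the text does not say how such beats are handled.

```python
import numpy as np

FS = 500  # Hz; sampling rate of the CPSC 2018 recordings


def plausible_peak_count(n_peaks, n_samples, fs=FS, min_bpm=40, max_bpm=200):
    """Hypothetical check: does the peak count imply a heart rate in [min_bpm, max_bpm]?"""
    minutes = n_samples / fs / 60.0
    return min_bpm * minutes <= n_peaks <= max_bpm * minutes


def detect_r_peaks(signal, detectors, fs=FS):
    """Try each R-peak detector in turn and return the first plausible result.

    detectors: iterable of hypothetical callables mapping a 1-D signal to R-peak indices.
    """
    for detect in detectors:
        peaks = np.asarray(detect(signal), dtype=int)
        if plausible_peak_count(len(peaks), len(signal), fs):
            return peaks
    raise ValueError("no detector found a plausible number of R-peaks")


def segment_beats(signal, r_peaks, fs=FS, pre_ms=170, align_ms=200, window_ms=400):
    """Cut beats from pre_ms before each R-peak to pre_ms before the next R-peak,
    shift them so the R-peak sits at align_ms, and zero-pad them to window_ms."""
    pre = round(pre_ms * fs / 1000)        # samples kept before the R-peak
    align = round(align_ms * fs / 1000)    # target R-peak index in the window
    window = round(window_ms * fs / 1000)  # fixed beat length after normalization
    beats = []
    for r_cur, r_next in zip(r_peaks[:-1], r_peaks[1:]):
        start, stop = r_cur - pre, r_next - pre
        if start < 0:
            continue  # not enough signal before this R-peak
        # Left-pad so the R-peak (at index `pre` of the cut) lands at index `align`
        beat = np.concatenate([np.zeros(align - pre), signal[start:stop]])
        beat = beat[:window]  # truncating long beats is an assumption
        beats.append(np.pad(beat, (0, window - len(beat))))  # right zero-padding
    return np.stack(beats) if beats else np.empty((0, window))
```

At the 500 Hz sampling rate of the dataset, the 170, 200, and 400 ms parameters correspond to 85, 100, and 200 samples, respectively.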
segmenting our dataset. We adopted the following procedure for beat segmentation of the ECG signals. After detecting the R-peaks, we took a beat to be the signal spanning from 170 ms before the current R-peak to 170 ms before the next R-peak, that is, the signal that starts in the PR interval of the current beat and extends to the beginning of the PR interval of the next beat.

For R-peak detection, we used multiple methods available in the literature: the Pan and Tompkins method [23], the stationary wavelet transform method [24], the Christov method [25], the Hamilton method [26], the Engelse and Zeelenberg method [27], and the two moving average method [28]. The accuracy of R-peak detection varied across these methods. During our experiments, we found that a particular R-peak detection method worked well for a particular ECG data point, while the other methods produced some false positives or false negatives. Therefore, all the abovementioned methods were applied one by one until an appropriate number of R-peaks was detected with respect to the length of that particular ECG signal. To period normalize the ECG beats, we adopted a procedure similar to the one used in [29] and [30], where the authors processed ECG signals by constructing ECG matrices and tensors, respectively. The R-peaks of all the beats were aligned at 200 ms, and zeros were padded appropriately at the start and end to period normalize the beats to 400 ms. This alignment of R-peaks at 200 ms is an important normalization step. CNNs are discriminative models: they learn features by discriminating between data points belonging to different classes. Misaligned R-peaks can therefore act as noise, since the CNN may mistake the misalignment for a characteristic feature of some heart disease, which makes it difficult for the CNN to learn the actual features of the diseases.

C. Training the CNN

After constructing the ECG matrices for all the ECG recordings, the CNN is trained with those matrices. Since the constructed ECG matrices are 2-D, we were able to process them with traditional 2-D CNNs. Fig. 6 describes the CNN architecture used. Our CNN is adapted from the standard VGG-16 architecture. One convolution + max-pooling block was removed from the standard VGG-16 architecture to make it compatible with the dimensions of our input and to reduce the number of trainable parameters, thereby making the training of the CNN faster. We experimentally found that removing the second convolutional block from VGG-16 gave better performance when evaluating with the test set. Also, the number of neurons in the fully connected layers was reduced compared with VGG-16, and dropout with a rate of 0.5 was introduced between the fully connected layers to avoid overfitting.

All the convolutional layers of the CNN have 3×3 kernels. The number of filters in each convolutional layer is also given in Fig. 6 (e.g., 64). "Pool/2" in Fig. 6 indicates a max-pooling operation with a 2×2 stride. The fully connected layers are denoted by "fc" along with the number of neurons in those layers. We tried tuning various hyperparameters of the CNN, such as the number of layers, the filter size, and the number of filters in each convolutional layer. However, deviating from the hyperparameter configuration of the standard VGG-16 architecture brought no improvement in performance. We trained our CNN using the Adam optimizer with a learning rate of 0.0001 and a batch size of 32.

The CNN was trained by taking only one label for each ECG recording. While experimenting with the test ECG data points, we observed that the CNN automatically captured the features of multilabel ECG points; this is further validated in the Experiments section.

D. Thresholding on Softmax Layer Probabilities and Multilabel Classification of ECG

After training the proposed CNN with the constructed ECG matrices, we applied thresholding on the softmax layer probabilities during testing to pick the right labels for our test ECG points, leading to the multilabel classification of ECG signals. This thresholding technique was formulated with the help of the experiments we conducted; it is described in the Experiments section with supporting results.

E. XAI Using Class Activation Maps

The XAI framework for the ECG classification task is established using class activation maps obtained from a technique called Grad-CAM [31]. Grad-CAM generates a map of weights indicating the regions of the input that the CNN relies on to predict its class label. To produce such class activation maps, Grad-CAM uses the gradients flowing into the final convolutional layer. The detailed working of Grad-CAM can be found in [31]. We picked sample ECG matrices that are correctly classified by our CNN and obtained their class activation maps using Grad-CAM. After obtaining the class activation maps for the considered ECG matrices (Leads × Time), we compute the mean of the activations across the leads to obtain the activation along the beat space of those ECG matrices. Fig. 7 describes this process. The resulting average activation along the beat space indicates the segments of the ECG signal that the CNN used to successfully predict that particular heart disease.

Further, we analyze the average activation along the segments of the ECG beat and correlate it with the ECG manifestations of the corresponding heart disease. This makes sure that the CNN has learned the right features for classifying ECG signals according to the heart diseases they belong to, thus establishing an XAI framework for the ECG classification task.

Fig. 7. Computing the mean of activations across the beat space of the ECG matrix.

IV. EXPERIMENTS

The CNN and Grad-CAM models were implemented using the TensorFlow² and Keras³ packages, and we ran our experiments on a 12 GB NVIDIA Tesla K80 GPU.

A. Dataset Used

The dataset we used was taken from "The China Physiological Signal Challenge 2018: Automatic identification of the rhythm/morphology abnormalities in 12-lead ECGs." A detailed description of the dataset is published in an article by Feifei Liu et al. [32]. The dataset contains 12-lead ECG recordings lasting from 6 to 60 s, sampled at 500 Hz. We had to drop some recordings from the training set and the validation set of our dataset because, owing to their short length, we could not extract enough beats from them. Our dataset also had six three-label points, which we removed, as our proposed method was not able to capture the features of ECG recordings with three labels. The CNN was trained with 6311 ECG recordings. We used the validation set given in our dataset to test our model. Table II gives the number of ECG recordings in the different classes of our test set; multilabel points are counted in every class they belong to.

TABLE II
CLASS DISTRIBUTION OF OUR TEST SET

² [Online]. Available: https://www.tensorflow.org/
³ [Online]. Available: https://keras.io/

B. Softmax Probability Thresholding for Multilabel Classification of ECG Signals

After training the proposed CNN with one label for each ECG matrix, we observed during testing that the CNN was automatically able to capture the features of the multilabel ECG points. A CNN captures the features of a particular class with some of its neurons. Those neurons are activated whenever the CNN detects the corresponding features in the input, which in turn increases the softmax probability of that class. When the CNN detects the features of multiple classes in the input, the neurons that capture the features of those classes are all activated, increasing the softmax probabilities of all those classes. During our experiments, we found that for a multilabel ECG data point, the output probabilities obtained from the softmax layer corresponding to its right class labels
TABLE III
PERFORMANCE OF THE PROPOSED METHOD IN VARIOUS EVALUATION METRICS
FOR THE MULTILABEL ECG CLASSIFICATION TASK
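The exact thresholding rule of Section IV-B is not stated in the text above. Based only on the stated observations (the softmax probabilities of a multilabel point's correct labels are of the same order of magnitude, and at most two labels are predicted), one possible rule can be sketched as follows. The relative ratio of 0.1 (one order of magnitude), the function name `threshold_softmax`, and the assumption of one softmax output per class (nine in total: eight diseases plus normal) are illustrative choices, not the authors' published values.

```python
import numpy as np


def threshold_softmax(probs, ratio=0.1, max_labels=2):
    """Pick up to `max_labels` classes per softmax output vector.

    A class is kept if its probability is at least `ratio` times the top
    probability (within one order of magnitude for ratio=0.1, an assumed
    value). The top class is always kept.
    """
    probs = np.atleast_2d(np.asarray(probs, dtype=float))
    ranked = np.argsort(probs, axis=1)[:, ::-1]  # class indices, most probable first
    labels = []
    for row, order in zip(probs, ranked):
        top = row[order[0]]
        labels.append([int(c) for c in order[:max_labels] if row[c] >= ratio * top])
    return labels
```

A relative rule of this kind is one way to encode "same order of magnitude" without fixing an absolute probability cutoff; an absolute threshold on each class probability would be an equally plausible reading of "simple thresholding".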
E. F1-Score
F1-score is the harmonic mean of precision and recall:

$$F_1 = \frac{2\,\mathrm{precision}(h)\,\mathrm{recall}(h)}{\mathrm{precision}(h) + \mathrm{recall}(h)} \tag{5}$$

Fig. 10. Average activations across the beat space of an ECG sample with AF.
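The F1-score in (5) combines the precision and recall of the classifier h. The definitions of subset accuracy, Hamming loss, precision, and recall that precede (5) are not included here, so the sketch below assumes the common example-based definitions for multilabel classification over binary label-indicator matrices; the function names are hypothetical. As an arithmetic check of (5), the precision of 0.986 and recall of 0.949 reported in the abstract give 2(0.986)(0.949)/(0.986 + 0.949) ≈ 0.967, which matches the reported F1-score.

```python
import numpy as np


def f1_from(precision, recall):
    """Eq. (5): harmonic mean of precision(h) and recall(h)."""
    return 2 * precision * recall / (precision + recall)


def multilabel_metrics(y_true, y_pred):
    """Example-based multilabel metrics (assumed definitions).

    y_true, y_pred: binary indicator arrays of shape (n_samples, n_classes);
    every sample is assumed to have at least one true and one predicted label.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    hits = (y_true & y_pred).sum(axis=1)            # size of true/predicted label overlap
    precision = np.mean(hits / y_pred.sum(axis=1))  # share of predicted labels that are correct
    recall = np.mean(hits / y_true.sum(axis=1))     # share of true labels that were found
    return {
        "subset_accuracy": np.mean(np.all(y_true == y_pred, axis=1)),  # exact label-set match
        "hamming_loss": np.mean(y_true != y_pred),                     # fraction of wrong label slots
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }
```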
Fig. 11. Average activations across the beat space of an ECG sample with
I-AVB.
Fig. 13. Average activations across the beat space of an ECG sample with
PAC.
Fig. 12. Average activations across the beat space of an ECG sample with
LBBB.
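The average activation profiles in Figs. 10–18 are obtained by averaging each Grad-CAM map over the lead axis of the ECG matrix (Leads × Time), as described in Section III-E and Fig. 7. A minimal NumPy sketch of that averaging step follows. It assumes the Grad-CAM map has already been computed and resized to the ECG matrix resolution; the min-max normalization and the helper `peak_time_ms` are illustrative additions, not part of the described method.

```python
import numpy as np


def beat_space_activation(cam, normalize=True):
    """Average a Grad-CAM map of shape (n_leads, n_samples) over the leads.

    Assumes `cam` has already been resized to the ECG matrix resolution.
    Returns one activation value per time sample (the beat space).
    """
    cam = np.asarray(cam, dtype=float)
    profile = cam.mean(axis=0)  # mean across leads, as in Fig. 7
    if normalize:  # optional min-max scaling for plotting (illustrative addition)
        span = profile.max() - profile.min()
        profile = (profile - profile.min()) / span if span > 0 else np.zeros_like(profile)
    return profile


def peak_time_ms(profile, fs=500):
    """Hypothetical helper: time (ms) of the strongest average activation."""
    return 1000.0 * int(np.argmax(profile)) / fs
```

If each row of the matrix holds one period-normalized beat sampled at 500 Hz (an assumption), a profile peak near 200 ms falls on the aligned R-peak, that is, in the QRS region.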
Fig. 15. Average activations across the beat space of an ECG sample with ST-segment depression.
Fig. 16. Average activations across the beat space of an ECG sample with ST-segment elevation.

beat space of an ECG sample with ST-segment elevation. From Fig. 16, we can observe that our CNN is precisely activated in the regions of the normal QRS complex, the elevated ST-segment, and the normal T wave. This confirms that our CNN has learned the right features for classifying an ECG signal with ST-segment elevation, in line with its characteristic features.

8) Right Bundle Branch Block (RBBB): A wide, slurred S wave in leads I, V5, and V6 is the characteristic feature of RBBB [35]. V5 and V6 are two of the three leads our CNN is trained with. Fig. 17 shows the average activations across the beat space of an ECG sample with RBBB; the signal shown in Fig. 17 is from lead V5. From Fig. 17, we can see that our CNN's average activation across the beat space is relatively high in the region of the wide, slurred S wave, in correspondence with the characteristic features of RBBB.

9) Premature Ventricular Contraction (PVC): An abnormally broad QRS complex is the characteristic feature of ECGs with PVC [39]. Fig. 18 shows the average activations across the beat space of an ECG sample with PVC. From Fig. 18, we can see that our CNN is properly activated in the broad QRS complex region of the ECG.

Fig. 17. Average activations across the beat space of an ECG sample with RBBB.
Fig. 18. Average activations across the beat space of an ECG sample with PVC.

C. Visualization

To visualize our test set, we extracted the features from the last convolutional layer of our trained CNN for all the test points, vectorized them, and reduced them to two dimensions using the t-distributed stochastic neighbor embedding (t-SNE) technique [40]. Fig. 19 visualizes our test set in 2-D. The legend in the figure maps colors to the diseases the test points belong to (considering only the first label of each test point). From the figure, we can see that the data is highly nonlinearly separable.

VII. DISCUSSION AND FUTURE WORK

Our proposed method achieved state-of-the-art performance in the multilabel ECG classification task while also providing an XAI framework for its classifications. This section provides a
TABLE IV
COMPARISON OF THE PERFORMANCE OF THE PROPOSED METHOD WITH
EXISTING METHODS

the disadvantages of all the other methods by including the identification of ST-segment depression and ST-segment elevation, thereby aiding doctors in identifying critical heart diseases such as myocardial infarction.