An Interpretable Approach With Explainable AI for the Detection of Cardiovascular Disease
2024 International Conference on Integrated Intelligence and Communication Systems (ICIICS) | 979-8-3315-0496-0/24/$31.00 ©2024 IEEE | DOI: 10.1109/ICIICS63763.2024.10860265

Abstract—Cardiovascular disease (CVD) is one of the prominent contributors to global mortality. Early detection and precise diagnosis of the disease are critically required to reduce its impact. This paper proposes an interpretable approach using Explainable Artificial Intelligence (XAI) to detect cardiovascular disease. By combining different machine learning (ML) algorithms with XAI techniques, we aim to enhance the models’ predictability and transparency. Further, we use SHAP (SHapley Additive exPlanations) to provide a human-understandable explanation of model predictions. The proposed approach emphasizes both the accuracy and the interpretability of the model, which enhances the model’s performance and also addresses the black-box problem associated with AI models. Our results demonstrate that the proposed models achieve competitive performance metrics, with ANN achieving the highest accuracy of 91%. This work highlights the importance of XAI in bridging the gap between ML models and clinical decision-making, fostering trust in AI-driven healthcare solutions for the early detection of cardiovascular disease.

Keywords—random forest, cardiovascular disease, support vector machine, LIME, SHAP.

I. INTRODUCTION

Cardiovascular disease (CVD) is considered one of the major health issues worldwide, responsible for approximately 17.9 million deaths annually [1]. The rise in CVD cases is attributed to several factors like obesity, hypertension, smoking, diabetes, and sedentary lifestyles. Therefore, early detection and treatment are crucial for improving patients’ health. The intricacy and multivariate nature of CVD present obstacles to established diagnostic approaches, which often call for expert interpretation and may not be accurate enough to detect individuals at risk at the initial stage [2].

In the past decade, AI has shown great potential in almost all domains [3], especially in the healthcare sector for improving disease diagnosis and prediction [4]. Various predictive models have been developed for identifying CVD risk, and these models are trained on vast amounts of patient data to enhance their performance. Despite that, the "black box" character of many machine learning algorithms is a significant obstacle to the use of AI in healthcare. These models, though powerful, often give clinical experts limited insight for making informed decisions, leaving clinicians and healthcare providers uncertain about the reliability and interpretability of AI-generated predictions [5].

Explainable AI (XAI) has emerged as a viable method to deal with this issue. The goal of XAI is to increase the transparency of AI models so that medical professionals can comprehend the logic underlying a model's predictions [6]. SHAP and LIME are two of the most commonly implemented techniques for XAI.

SHAP is based on cooperative game theory: it assigns each feature of a model an importance value that represents its contribution to a particular prediction. This method ensures consistency and adds transparency by explaining how individual features (such as blood pressure or cholesterol level) affect the model’s output. SHAP provides global and local interpretability, offering a clear understanding of how the model behaves across all predictions as well as individual ones [7].

In contrast, LIME provides local explanations of complex ML models by approximating these models with simpler, interpretable models. LIME introduces slight changes to the input data and observes the impact on the model’s output. This helps to understand why the model made a specific decision for a particular instance and thus provides transparency for individual cases. This technique is particularly useful when trying to understand the decision boundaries of complex models.

Both these XAI methods offer valuable insights into machine learning models, making them indispensable tools in sectors like healthcare, where understanding "why" a prediction was made is just as important as the prediction itself [8].

This paper presents an interpretable approach using Explainable AI techniques to detect cardiovascular disease. By leveraging SHAP, we aim to provide insights into model predictions, allowing clinicians to assess both the accuracy of and the rationale behind the risk assessments. The main aim of the proposed work is to bridge the gap between AI-based CVD detection models and clinical decision-making, enhancing both the precision and interpretability of diagnostic tools.

The remainder of the paper is structured as follows: Section 2 presents related work on machine learning and XAI applications in healthcare, specifically for CVD detection. Section 3 outlines the data and methodology used in this study, including model development and explainability techniques. Section 4 presents the results and analysis, followed by model interpretability and discussion in Section 5 and Section 6.

II. LITERATURE REVIEW

Over the last decade, researchers have extensively explored different ML techniques for the successful identification and prediction of various diseases. Further, to increase the performance of these ML models, various optimization algorithms have been devised. This section provides an overview of related studies in this area.

Mahim et al. proposed an integrated framework for the successful detection of Alzheimer’s disease with a Vision Transformer for the identification of significant features and a Gated Recurrent Unit to establish a correlation between
Authorized licensed use limited to: SRM Institute of Science and Technology Kattankulathur. Downloaded on June 05,2025 at 21:54:59 UTC from IEEE Xplore. Restrictions apply.
these features. The authors used a magnetic resonance imaging (MRI) dataset from Kaggle for the detection of the disease. Their model showed 99.53% accuracy in the case of 4 classes, and for binary classification the model achieved a slightly higher accuracy of 99.69% [9].

Another study, presented by Majhi et al., used two different tree classifiers, random forest and XGBoost, for the successful identification of heart disease based on patients’ ECG signals. They implemented the classifiers on three different ECG datasets (MIT_BIH, Physionet challenge 2016, and Pascal challenge competition). The authors further implemented the SHAP method for the identification of significant features. The results showed that XGBoost performs well on all three datasets [10].

An XAI-based deep learning model was proposed by Sharma et al. for intrusion detection, especially in IOT networks, to categorize different attacks. Feature abstraction was done using a filter-based method. The authors used two publicly available datasets, NSL-KDD and UNSW-NB 15. The reduced dataset was used with a deep neural network (DNN) and a convolutional neural network (CNN) model. The DNN model performed better than the CNN. Further, to address the opaqueness of the DNN, the authors implemented the LIME and SHAP methods [11].

Talaat et al. proposed a seven-stage CardioRiskNet model with XAI integration for the identification of cardiovascular diseases. They implemented their model on two different datasets with around 300 records and 40 relevant features. The model achieved an accuracy of 98.7% [12].

Another research work was carried out by Munshi et al. for the detection of breast cancer. The authors proposed a novel framework using images and laboratory data of the patients, integrated with XAI for clear interpretation of the model. For image-based prediction, the authors implemented a U-Net transfer model. Additionally, they proposed ensemble methods using CNN with random forest and support vector machine. The performance obtained with the original features was compared with that obtained with the convoluted features, and the proposed method achieved an accuracy of 99.9% [13].

Similarly, Nimbhorkar et al. suggested an approach using deep learning to diagnose COVID-19 based on chest X-ray images publicly available on Kaggle. The dataset contained 6400 images, which were divided into training (70%), testing (15%), and validation (15%) data. The custom CNN was compared with the pre-trained VGG-19, ResNet50, Inception V3, and AlexNet models. The proposed CNN model performed better than the selected pre-trained models, with 94.25% accuracy [14].

Hassan et al. suggested a classification algorithm for the detection of prostate cancer. They implemented their algorithm on ultrasound and MRI images. The proposed method achieved an accuracy of 97% for ultrasound images and 80% for MRI images [15].

III. MATERIALS AND METHODS

For experimental purposes, the publicly available Z-Alizadeh Sani dataset for CAD diagnosis from the UCI data repository has been used.

The dataset contains 303 records and 54 attributes. These features are divided into four categories: demographic features, symptoms and examination features, ECG (electrocardiogram) features, and laboratory and ECHO features. Not all the features are equally significant or contribute equally to the result. Feature selection reduces the number of features by removing those that are least significant and selecting the most suitable ones that have a significant impact on the output.

The dataset is reduced by using a correlation-based feature subset selection method. This method identifies the most significant features, namely those that are highly correlated with the class and least correlated with the other attributes in the dataset. Selecting the most significant features helps improve the performance and the efficiency of the model. In this method, the correlation threshold is typically chosen by experimenting with different threshold values and analyzing the resulting model performance.

Table 1 shows the selected significant features.
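The correlation-based selection described above can be illustrated with a minimal sketch. It assumes one common formulation of correlation-based feature subset selection, in which a subset of k features is scored by the merit k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation. The paper does not state which variant or threshold was used, so the greedy search, the function names, and the toy feature values below are all hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / sqrt(var_a * var_b)


def cfs_merit(subset, features, target):
    """Merit of a subset: high class correlation, low inter-feature correlation."""
    k = len(subset)
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = (
        sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
        if pairs else 0.0
    )
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(features, target, min_gain=1e-9):
    """Forward selection: add the best feature while the merit still improves."""
    selected, best = [], 0.0
    remaining = list(features)
    while remaining:
        scored = [(cfs_merit(selected + [f], features, target), f) for f in remaining]
        merit, name = max(scored, key=lambda item: item[0])
        if merit <= best + min_gain:
            break
        selected.append(name)
        remaining.remove(name)
        best = merit
    return selected, best


# Hypothetical toy data: one informative feature, a redundant copy, and noise.
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "feat_a": [1, 2, 1, 2, 5, 6, 5, 6],
    "feat_a_scaled": [2, 4, 2, 4, 10, 12, 10, 12],  # redundant with feat_a
    "feat_noise": [1, 2, 3, 4, 1, 2, 3, 4],         # uncorrelated with the class
}
selected, merit = greedy_cfs(features, target)
print(selected, round(merit, 3))
```

On this toy data the redundant copy and the noise column are left out, because neither raises the merit of the subset.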
A. Proposed Model

Figure 1 shows the complete framework of the proposed model. The dataset was obtained from the UCI repository, and preprocessing was then carried out to check for any missing values or noise. The dataset contains 56 attributes, so feature selection was carried out using a correlation-based feature subset method to identify the significant features. Further, the resulting dataset was divided into training (70%) and testing (30%) subsets. After creating the training and testing subsets, different machine learning models were constructed, namely RF, DT, and ANN. The performance of all these models was evaluated using standard performance parameters, i.e., accuracy, error rate, precision, recall, etc.
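As a rough illustration of the 70%/30% division into training and testing subsets, the sketch below shuffles record indices and splits them. The helper name `split_records`, the seed, and the use of plain indices in place of patient records are assumptions made for this example; the paper does not describe how the split was drawn or whether it was stratified.

```python
import random


def split_records(records, test_fraction=0.30, seed=42):
    """Shuffle a copy of the records and split it into (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # local RNG keeps the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# 303 placeholder records, matching the size of the dataset described above.
records = list(range(303))
train, test = split_records(records)
print(len(train), len(test))
```

With 303 records this yields 212 training and 91 testing records, and every record lands in exactly one subset.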
B. Model Creation

1) Random Forest: It is a popular supervised ensemble machine learning technique used for both classification and regression tasks. Being an ensemble method, it pools multiple classifiers to solve a complex task and performs well with imbalanced datasets. It builds several decision trees and trains each tree on a random sample of the dataset, thus requiring more computational power and memory. Selecting features randomly for each tree provides diversity and helps in reducing the problem. The final output is generated by combining the outputs of all the trees: voting is carried out for classification, and the average is taken for regression problems [16].

2) Decision Tree: It is used for both classification and regression problems. It is a tree-like structure in which features are represented by internal nodes, each branch represents a rule for that feature, and the prediction is given by a leaf node. It uses the Gini index or information gain to split the data into subsets based on the most significant feature. Decision trees are widely used in different domains as they are easy to understand and visualize [17] and can be used for both numerical and categorical data, but generally the accuracy achieved by DT is lower than that of ensemble methods such as RF.

The Gini impurity for a given dataset (DS) can be computed as in (1)

\[ \mathrm{Gini}(DS) = 1 - \sum_{i=1}^{cls} P_i^{2} \tag{1} \]

where DS is the dataset, cls denotes the number of classes, and $P_i$ denotes the probability of class $i$.

The entropy for dataset DS can be computed as in (2)

\[ \mathrm{Entropy}(DS) = -\sum_{i=1}^{cls} P_i \log_2 P_i \tag{2} \]

Information Gain is the decrease in entropy after splitting the dataset based on a particular feature. It can be calculated as in (3)

\[ \mathrm{Gain}(DS, A) = \mathrm{Entropy}(DS) - \sum_{v \in \mathrm{Values}(A)} \frac{|DS_v|}{|DS|}\, \mathrm{Entropy}(DS_v) \tag{3} \]

where $A$ is the feature used for the split and $DS_v$ is the subset of DS in which $A$ takes the value $v$.
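The split criteria above can be checked on a small example. The sketch below computes the Gini impurity, the entropy, and the information gain for a categorical feature using the standard definitions; the function names and the four-record toy labels are illustrative and are not taken from the paper's dataset.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits over the class distribution."""
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


def information_gain(labels, feature_values):
    """Entropy of the parent minus the size-weighted entropy of each split subset."""
    n = len(labels)
    groups = {}
    for label, value in zip(labels, feature_values):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(group) / n * entropy(group) for group in groups.values())
    return entropy(labels) - remainder


# Toy example: two balanced classes and two candidate features.
labels = [0, 0, 1, 1]
perfect_feature = ["a", "a", "b", "b"]   # separates the classes completely
useless_feature = ["a", "b", "a", "b"]   # leaves both subsets fully mixed

print(gini(labels), entropy(labels))
print(information_gain(labels, perfect_feature))
print(information_gain(labels, useless_feature))
```

A balanced two-class set has the maximum impurity for two classes, and a feature that separates the classes perfectly recovers all of that entropy as gain, while an uninformative feature recovers none.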
3) Artificial Neural Network: This model is inspired by the working of the human brain and consists of several interconnected nodes called neurons. In the forward pass, every neuron creates a linear combination of its inputs as in (4)

\[ z_j = \sum_{i} w_{ji}\, x_i + b_j \tag{4} \]

where $w_{ji}$ are the weights, $x_i$ are the inputs, and $b_j$ is the bias term.

The linear combination is followed by an activation function, which introduces non-linearity into the model because real-world data is non-linear. Further, a loss function measures the difference between predicted and actual values, and training minimizes this loss. The method can also be used with unstructured data such as images, audio, and text.

IV. RESULTS

For early identification of cardiovascular disease, the Artificial Neural Network outperformed all the other methods, with an accuracy of 91% and the lowest error rate of 9%. The decision tree performed worst, with 81% accuracy and an error rate of 19%, while Random Forest achieved a classification accuracy of 83% and an error rate of 17%. The hyperparameters used for the different machine learning models are as follows. For RF: max depth 2, min sample split 80:20, number of estimators 100. For DT: max depth 3, min sample split 80:20, criterion Gini index. For ANN: number of layers 4, batch size 10, activation function ReLU, epochs 150, optimizer Adam. The results are shown in Table 2, and Figure 2 shows the bar graph of accuracy and error rate achieved by the different ML methods on the CVD dataset.

TABLE II. ACCURACY AND ERROR RATE OF THE CLASSIFIERS
Model            Accuracy   Error Rate
Random Forest    0.83       0.17
Decision Tree    0.81       0.19
ANN              0.91       0.09

Precision, recall, and F1 score are also important parameters for evaluating the performance of ML models. Table 3 shows the values of these parameters obtained by the different ML models. Figure 3 shows the bar graph of these parameters for the CVD dataset. Figure 4, Figure 5, and Figure 6 show the confusion matrices for RF, DT, and ANN, respectively.

TABLE III. VALUE OF F1 SCORE, RECALL AND PRECISION OBTAINED
Classifiers      F1 Score   Recall   Precision
Random Forest    0.77       0.75     0.83
Decision Tree    0.77       0.75     0.79
ANN              0.89       0.88     0.89
ANN obtains the highest precision, recall, and F1 score: 89%, 88%, and 89%, respectively. DT achieves the lowest precision, 79%, while random forest reaches 83%. The recall and F1 score values are the same for DT and RF: 75% and 77%, respectively.
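For reference, the reported metrics all derive from the four cells of a binary confusion matrix. The sketch below uses the standard definitions. The counts are hypothetical, chosen only so that they sum to a 91-record test set, and are not the values behind Figures 4 to 6, and the paper does not say whether its scores are per-class or averaged.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall) else 0.0
    )
    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, fp=5, fn=6, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two, which gives a quick consistency check on any row of a results table computed for a single class.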
V. MODEL INTERPRETATION
SHAP (SHapley Additive exPlanations) is a model interpretability method that is based on Shapley regression values. It assigns an importance value to each feature in the dataset, which shows the contribution of that feature to a specific prediction. This importance value is also called the SHAP value. It supports both local and global interpretations, which makes it easier for users to understand the importance of features in an individual prediction or across the entire dataset.

Here, we have computed feature importance across all the models using SHAP summary plots. Figure 7, Figure 9, and Figure 11 show the summary plots of feature importance for the implemented models. Figure 8 and Figure 10 show the direction of impact of each feature on the model’s prediction. In these plots, red points indicate higher feature values and blue points indicate lower feature values, while the horizontal position of a point shows whether that value pushes the prediction higher (right) or lower (left). The features are ranked based on their impact on the model’s prediction.

Fig. 4. Confusion matrix for RF
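To make the idea of a per-feature contribution concrete, the sketch below computes exact Shapley values by enumerating every feature coalition for a tiny hand-written model. It is not the SHAP library and not the authors' pipeline: the toy model, the all-zero baseline, and the choice to "remove" a feature by resetting it to its baseline value are assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of each feature of x relative to a baseline input."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take their real value, the rest the baseline.
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                phi[i] += weight * (value(members | {i}) - value(members))
    return phi


def toy_model(f):
    """Hypothetical risk score with one interaction term (f[0] * f[2])."""
    return 2.0 * f[0] + 3.0 * f[1] + f[0] * f[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
print(sum(phi), toy_model(x) - toy_model(baseline))
```

The contributions sum to the difference between the prediction for x and the prediction for the baseline, and the interaction term is shared equally between the two features involved. Exhaustive enumeration grows exponentially with the number of features, which is why practical tools rely on approximations or model-specific algorithms.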
Fig. 7. SHAP summary plot for Random Forest
Fig. 10. SHAP value impact for Decision Tree
REFERENCES
[1] “Global effect of modifiable risk factors on cardiovascular disease and
mortality,” New England Journal of Medicine, vol. 389, no. 14, pp.
1273–1285, Oct. 2023. doi:10.1056/nejmoa2206916
[2] V. Sapra and M. L. Saini, “Deep learning network for identification of
ischemia using clinical data,” International Journal of Engineering and
Advanced Technology, vol. 8, no. 5, pp. 2357–2363, Jun. 2019.
[3] R. C. Oliveira and R. D. Silva, “Artificial Intelligence in agriculture:
Benefits, challenges, and Trends,” Applied Sciences, vol. 13, no. 13, p.
7405, Jun. 2023. doi:10.3390/app13137405
[4] V. Sapra and L. Sapra, “Early detection of type 2 diabetes mellitus
using deep neural network–based model,” Advanced Healthcare
Systems, pp. 305–317, Jan. 2022. doi:10.1002/9781119769293.ch15
[5] V. Hassija et al., “Interpreting black-box models: A review on
Explainable Artificial Intelligence,” Cognitive Computation, vol. 16,
no. 1, pp. 45–74, Aug. 2023. doi:10.1007/s12559-023-10179-8
[6] T. Hulsen, “Explainable artificial intelligence (XAI): Concepts and
challenges in Healthcare,” AI, vol. 4, no. 3, pp. 652–666, Aug. 2023.
doi:10.3390/ai4030034
[7] J. Y. Kim, U. H. Shin, and K. Kim, “Predicting biomass composition
and operating conditions in fluidized bed biomass gasifiers: An
automated machine learning approach combined with Cooperative
Game Theory,” Energy, vol. 280, p. 128138, Oct. 2023.
doi:10.1016/j.energy.2023.128138
[8] A. Dhurandhar et al., “Locally invariant explanations: Towards stable
and unidirectional explanations through local invariant learning,”
Advances in Neural Information Processing Systems, vol. 36, Feb. 2024.
[9] S. M. Mahim et al., “Unlocking the potential of XAI for improved
alzheimer’s disease detection and classification using a VIT-GRU
model,” IEEE Access, vol. 12, pp. 8390–8412, 2024.
doi:10.1109/access.2024.3351809
[10] B. Majhi and A. Kashyap, “Explainable AI-driven machine learning
for heart disease detection using ECG Signal,” Applied Soft
Computing, vol. 167, p. 112225, Dec. 2024.
doi:10.1016/j.asoc.2024.112225
[11] B. Sharma, L. Sharma, C. Lal, and S. Roy, “Explainable artificial
intelligence for intrusion detection in IOT Networks: A deep learning
based approach,” Expert Systems with Applications, vol. 238, p.
121751, Mar. 2024. doi:10.1016/j.eswa.2023.121751
[12] F. M. Talaat, A. R. Elnaggar, W. M. Shaban, M. Shehata, and M.
Elhosseini, “CardioRiskNet: A hybrid AI-based model for explainable
risk prediction and prognosis in cardiovascular disease,”
Bioengineering, vol. 11, no. 8, p. 822, Aug. 2024.
doi:10.3390/bioengineering11080822
[13] R. M. Munshi et al., “A novel approach for Breast Cancer Detection
using Optimized Ensemble Learning Framework and xai,” Image and
Vision Computing, vol. 142, p. 104910, Feb. 2024.
doi:10.1016/j.imavis.2024.104910
[14] J. S. Nimbhorkar, K. S. Aravind, K. Jeevesh, and S. Palaniswamy,
“Detection of pneumonia and COVID-19 from chest X-ray images
using neural networks and deep learning,” Lecture Notes in Networks
and Systems, pp. 61–71, Oct. 2022. doi:10.1007/978-981-19-4863-3_6
[15] Md. R. Hassan et al., “Prostate cancer classification from ultrasound
and MRI images using deep learning based explainable artificial
intelligence,” Future Generation Computer Systems, vol. 127, pp. 462–
472, Feb. 2022. doi:10.1016/j.future.2021.09.030
[16] V. Jackins, S. Vimal, M. Kaliappan, and M. Y. Lee, “AI-based smart
prediction of clinical disease using random forest classifier and naive
bayes,” The Journal of Supercomputing, vol. 77, no. 5, pp. 5198–5219,
Nov. 2020. doi:10.1007/s11227-020-03481-x
[17] B. Charbuty and A. Abdulazeez, “Classification based on Decision
Tree Algorithm for Machine Learning,” Journal of Applied Science
and Technology Trends, vol. 2, no. 01, pp. 20–28, Mar. 2021.
doi:10.38094/jastt20165.