
This article has been accepted for publication in IEEE Access.

This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3494839

Using Machine Learning for Detection and Prediction of Chronic Diseases
Nacim Yanes¹,², Leila Jamel³, Bayan Alabdullah³, Mohamed Ezz⁴ (Member, IEEE), Ayman Mohamed Mostafa⁴ (Member, IEEE), Housameldeen Shabana⁵
¹RIADI Laboratory, La Manouba University, Tunisia
²Higher Institute of Management of Gabes, Gabes University, Tunisia
³Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
⁴College of Computer and Information Sciences, Jouf University, Sakaka 72388, Saudi Arabia
⁵College of Medicine, Shaqra University, Shaqra, KSA
Corresponding Author: Leila Jamel

This research project was funded by the Deanship of Scientific Research, Princess Nourah bint Abdulrahman University, through the Program of Research Project Funding After Publication, Grant No. 44-PRFA-P-113.

ABSTRACT Heart attacks are a leading cause of mortality worldwide, which makes accurate predictive models essential for better early detection and intervention strategies. This study addresses the significant problem of class imbalance in medical datasets and focuses on heart attack prediction using the Behavioral Risk Factor Surveillance System (BRFSS) dataset. To tackle this challenge, we apply advanced machine learning (ML) methods to a refined dataset of 399,875 instances, retaining 47 significant features after rigorous data cleaning and preparation. Balanced accuracy and macro-recall were chosen as the primary metrics to ensure fair performance evaluation across classes in the imbalanced dataset. Our proposed system entails a detailed evaluation of various algorithms known for their effectiveness in managing class imbalance. The LGBM Classifier, XGB Classifier, and Logistic Regression (LR) are optimized using recursive feature elimination and hyperparameter tuning with Optuna. The results of this study are encapsulated in an ensemble model that significantly enhances predictive accuracy. The final model achieved 80.75% balanced accuracy and 79.97% recall for critical heart attack cases (class 1), along with an AUC score of 88.9%, indicating a strong ability to distinguish between the classes. Additionally, SHAP (SHapley Additive exPlanations) analysis provided valuable insights into the contribution of each feature to heart attack likelihood, improving model transparency. By integrating complex ML techniques with interpretability analyses such as SHAP, this study marks a substantial advance in early detection and intervention strategies in healthcare. It demonstrates the potential of sophisticated ML approaches for early heart attack detection and prevention, highlighting their value in improving outcomes for patients with chronic diseases. These findings suggest promising pathways for employing advanced analytical tools in healthcare to enhance patient care.

INDEX TERMS Heart attack prediction, Ensemble model, Chronic Diseases, Class imbalance, ML classifiers, Model
transparency

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4

I. INTRODUCTION
Cardiovascular disease, the leading cause of death worldwide, underscores the urgent need for advanced early prediction and detection systems. ML and large-scale data analysis have expanded the applicability of predictive diagnostics in the healthcare sector. These techniques promise to reduce the burden of cardiovascular diseases and avoid unnecessary complications through timely and accurate predictions [1].
This paper examines the prevalence of chronic diseases in our country and the association of smoking with hypertension and diabetes in both young and elderly populations. In particular, it addresses the complications caused by imbalanced (biased) input data. In health data of this kind, positive cases (heart attacks) are far outnumbered by negative ones (no incidents). This makes it difficult to develop a prediction model that is accurate across all cases while remaining highly sensitive to heart attacks [2]. The reason why balanced accuracy and macro-recall are helpful to
evaluate the models is that these metrics give a more accurate picture of a model's performance on both the majority and the minority class. This matters because in medical domains, false negatives are usually the most undesirable outcomes. ML applications in healthcare are spreading widely. However, several issues remain, mainly class imbalance, feature selection, and model tuning to boost predictive performance.
This paper presents a meticulous data processing pipeline whose preprocessing handles missing values and reduces dimensionality. We then apply ML algorithms with a focus on addressing class imbalance through a series of methodically designed experiments. Using recursive feature elimination and Optuna, a hyperparameter optimization framework, we enhance the accuracy of the top-performing models. The models' predictive strengths are then integrated using an ensemble technique. The ensemble is further improved by selecting an appropriate classification threshold with Youden's J index. The resulting ensemble is our best classification model and excels in balanced accuracy and in recall for heart attack cases.
Our study contributes to ongoing efforts in predictive healthcare analytics. It presents a replicable model for addressing similar challenges in other domains where class imbalance and predictive accuracy are of paramount concern.
Question 1: What role do feature selection and hyperparameter tuning play in enhancing the predictive performance of ML models for heart attack prediction?
Objective Addressed: The study directly addresses the impact of feature selection and hyperparameter tuning on predictive performance, with the aim of enhancing the overall accuracy and reliability of heart attack predictions. As explained in Section V, feature selection can enhance ML performance.
Question 2: Can ensemble-modeling techniques improve the prediction of heart attack incidents over individual models, and how can the optimal combination of models be determined?
Objective Addressed: The study explicitly investigates the practical application of ensemble modeling techniques to improve heart attack prediction over individual models. It seeks to determine the optimal combination of ML models and identifies best practices for constructing ensemble models. The improvement of heart attack prediction using ensemble modeling is explored in Section V.
Question 3: What insights can be derived from model explanations, mainly using SHAP (SHapley Additive exPlanations), to understand different features' contribution to predicting heart attacks?
Objective Addressed: The study addresses the importance of model interpretability and aims to uncover insights from model explanations, particularly using SHAP analysis (Section VII). It seeks to understand the contribution of different features to predicting heart attacks and therefore emphasizes the transparency and interpretability of the developed models.
By addressing these questions, our research aims to enhance the predictive accuracy of heart attack incidents using ML. It also aims to contribute to the broader understanding of how advanced analytical techniques apply to healthcare datasets. Our findings provide a foundation for future studies and for practical applications in early heart attack detection and prevention strategies.

II. RELATED WORK
Cardiovascular diseases account for a significant portion of deaths worldwide. Areas with limited cardiological care, where misdiagnoses are common, are the most extensively studied. In their study [1], Ali and colleagues aim to predict heart disease early and accurately. Applying ML to digital patient records, they evaluate various supervised learning algorithms and their feature importance. The random forest (RF) algorithm achieves excellent results, including perfect accuracy. This makes it a promising diagnostic tool for increasing diagnostic accuracy and efficiency in limited-resource healthcare settings. To predict which heart disease patients require emergency care, the authors of [2] proposed a novel stacking ensemble learner that leverages behavior-based features and a private MIT dataset. It outperformed existing methods with 88% accuracy in predicting emergency readmission, which holds promise for early intervention and improved clinical outcomes. El-Hasnony and colleagues [3] recently applied ML to heart attack prediction. By optimizing models and evaluating them in real-world scenarios, they found that most methods excelled at accurate early detection and enabled proactive preventive care. This suggests a cost-effective and promising approach for catching heart disease early and preventing it more effectively. To improve heart disease outcome prediction and overcome the limitations of traditional models, Liu and colleagues [4] explore using AI on data collected through IoT sensors. They aim to address issues such as data bias and low accuracy, ultimately seeking a more accurate and effective AI-powered prediction system for this critical medical challenge. Furthermore, Singh and coworkers [5] validate various ML models and demonstrate the link between gait parameters and cardiovascular health in order to predict and understand heart health. Their aim is to determine each person's cardiovascular risk level. Their gait system investigates characteristics such as step length, stride length, cadence, and velocity, using gait data collected experimentally with retro-reflective markers.
Applications achieving successful results in the early diagnosis of heart disease are increasing, and many studies on heart disease detection and classification methods have been carried out. Chen and colleagues [6] used the INDANA database, adopting the LR, CART, and MLP algorithms to predict cardiac disease. The experiment proved that MLP is the best
of all, attaining 76% accuracy, followed by CART at 69.1% and LR at 65.9%. A cardiac disease prediction system using data mining techniques and the Naive Bayes (NB) algorithm was presented by K. Vembadasamy et al. [7]. The authors classified the dataset using the NB technique, which they found to have a low computation time and an accuracy of 86.4198%. The clinical data in this study were gathered from one of Chennai's top diabetes research institutes and covered roughly 500 individuals. Ankur Gupta et al. [8] employed Factor Data Factor Analysis (FDFA) on the UCI Cleveland Heart Disease Data Set. They verified the MIFH framework using a holdout validation technique and reported a sensitivity of 89.28%, an accuracy of 93.44%, and a specificity of 96.96%. However, these results were achieved on a smaller, balanced dataset. Such a dataset may not fully capture the complexities and challenges of larger, more imbalanced datasets such as BRFSS. Additionally, the use of a simple holdout validation technique, rather than more robust cross-validation methods, may limit the generalizability of the results across different data subsets.
A system featuring a multi-agent shell model (MASM) with a depth-wise binary convolutional neural network was proposed for the early diagnosis of heart diseases [9]. The system's effectiveness was assessed on the Cleveland Highway descriptive database. The hybrid model achieved the highest accuracy, 90.1%, and high values of 88.9% and 98.4% were recorded for recall. By comparison, traditional CNN and other ML models performed worse on average (between 72.3% and 83.8%). Obasi and Shafiq [10] introduced three ML models using Logistic Regression, Random Forest, and Naive Bayes classifiers. The authors used existing patients' medical histories to create test data and evaluated the models' performance. The RF model performed best among the selected models, with accuracy rates of 92.44%, 59.7%, and 61.96% for the RF, LR, and NB classifiers, respectively. Nagavelli et al. [11] compared the performance of four ML models for diagnosing cardiac disease. The models included a duality optimization SVM (DO-SVM), an improved SVM with a weighted system for prediction, two XGBoost SVM prediction models, and XGBoost-only prediction models. The authors assessed the models on four main criteria: precision, accuracy, recall, and F1 score. They successfully detected heart disease using the XGBoost algorithm, with excellent prediction quality and high sensitivity, specificity, precision, and F1 scores.
The NB model with a weighted approach had poorer accuracy than the other models, whereas the DO-SVM model had worse precision, recall, and F1 scores. Yadav et al. [12] compared computational approaches such as NB, KNN, LR, and hybrid methods for discovering cardiac disease. The systems were trained and evaluated in Python, with the developer rapidly setting system parameters and browsing visualizations of classifier efficiency curves. The authors found that the neural network and fuzzy KNN techniques performed better than other approaches such as K-means clustering, K-nearest neighbors, and logistic regression. Singh et al. [13] describe a deep learning model for the early detection and prediction of chronic kidney disease (CKD). The project constructs a deep neural network and assesses its effectiveness against other cutting-edge machine learning techniques. Missing values in the database were filled with the average of the corresponding features. After defining the parameters and carrying out multiple trials, the authors determined the neural network's ideal parameters. Recursive Feature Elimination (RFE) was used to choose the most important features, which included hemoglobin, specific gravity, serum creatinine, red blood cell count, albumin, packed cell volume, and hypertension. The selected features were then passed to the machine learning models for classification, and the deep neural model performed better than the others.
To forecast the chance of a patient developing heart disease, Rairikar et al. [14] examined heart disease prediction systems that employ a larger number of input attributes. These systems use 13 medical attributes, such as gender, blood pressure, and cholesterol. The authors proposed an effective genetic algorithm combined with backpropagation to predict cardiac disease. Abbas et al. [15] studied ML and DL methods for analyzing noisy sound signals to identify cardiac problems. The investigation used two subsets of the PASCAL CHALLENGE datasets containing authentic cardiac audio. Mel-frequency Cepstral Coefficients (MFCCs) and spectrograms were employed to represent the signals graphically. Data augmentation, which added artificial noise to the heart sound signals, enhanced the model's performance.
Research leveraging the Behavioral Risk Factor Surveillance System (BRFSS) dataset to predict heart attack risk using ML has attracted significant interest, given the dataset's comprehensive coverage of health-related behaviors and conditions across the U.S. population. The literature includes several works that explicitly combine the BRFSS dataset with ML methodologies to predict heart attack incidents. These works highlight the evolution of analytical techniques and key findings in this research domain.
The authors of [16] applied eight different machine learning methods to data from the 2020 Behavioral Risk Factor Surveillance System (BRFSS) survey provided by the Centers for Disease Control and Prevention (CDC). The selection includes AdaBoost, Multilayer Perceptron (MLP), DT, KNN, LR, SVM, NB, and XGB. Heart disease, the nominal target variable, is imbalanced in the data, so the authors balanced the classes using the SMOTE-Tomek Link
method, which stabilizes the dependent variable, before applying the classification methods. The data were split into outliers and non-outliers, and the outlier analysis was performed using a traditional statistical technique to obtain unbiased and accurate estimates. The classification process was validated using 10-fold cross-validation to produce the most unbiased results and allow a better comparison of the methods. As a result, the XGB model achieved 89% accuracy on the non-outliers and 84.6% on the outliers in detecting heart disease at an early stage. It outperformed the k-NN algorithm, which achieved 85.6% on the non-outlier data and 81% on the outlier data, and it also completed these tasks quickly. Accordingly, XGB and k-NN can be used to identify signatures of impending disease and patterns of heart disease diagnosis in this context.
In a second study, a novel approach using the BRFSS-2015 dataset was proposed. Neeraj et al. [17] introduced a hybrid deep neural network learning model for coronary heart disease (CHD) prediction. The model uses a correlation score to select the optimal feature subset and a cluster-abundant data class approach to balance the dataset classes. It represents a significant advancement in heart disease prediction. Its hyperparameters are optimized through Randomized Search Cross-Validation Optimization (RSCV) of the Gated Recurrent Unit (GRU) and Bidirectional Long Short-Term Memory (BiLSTM). The proposed model outperforms existing models, achieving a classification accuracy of 98.28% compared with GRU, LSTM, and BiLSTM-GRU. Another study, by Das et al. [18], used survey data from 400k US citizens to develop and assess six ML models for heart disease prediction: XGB, Bagging, RF, DT, KNN, and NB. The authors evaluated the accuracy, sensitivity, F1-score, and AUC of the six algorithms. The XGB model performed best, with an accuracy of 91.30%.
As presented in [19], Mehta and coworkers addressed the worldwide problem of heart disease by designing a model that forecasts patient survival based on symptoms. It uses ML and data analysis to take advantage of the large volume of patient data. The model tunes its parameters to bring out the relationship between the severity of heart symptoms and the fatality of heart disease outcomes. The project aims to improve the accuracy of survival predictions and reaches around 88% average accuracy across various prediction methods. Selvakumar and coworkers [20] employed ML algorithms to predict the occurrence of heart attacks in their catchment area and to lower the mortality rate. Because heart disease affects a large number of people worldwide, this research set out to identify related factors such as age, gender, and cholesterol levels. The predictive model helps provide a diagnosis, allowing early interventions that reduce or stop the development of the disease. As shown in [21], Manoj and colleagues proposed using ML to design an early warning system for chronic heart attack to improve disease detection. The system applies different algorithms, such as LR and RF, to identify those that achieve the highest accuracy and efficiency. The ultimate aim of the model is to forecast heart attacks. It could be continuously enhanced as techniques and data representations evolve.

III. DATA PROCESSING
Our research builds on the BRFSS dataset [22], a rich dataset containing health-related information and a range of factors that affect the general health and wellness of the American population. The dataset's large size, complexity, and depth called for a sophisticated data processing stage. Its main goal was to refine the data to fit the needs of the ML models used for heart attack prediction. The data were processed carefully to ensure that the inputs to our predictive models were correct and relevant. As presented in Table 1, the dataset comprised 401,958 unique respondents with 279 diverse features, reflecting the comprehensive nature of the BRFSS survey. Given our focus on heart attack prediction, it was essential to sift through this extensive feature set to isolate the variables most pertinent to our research objective. This process involved a thorough review of the dataset documentation provided by the CDC, which guided our cleaning and replacement strategies for each feature.
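The following is a rough sketch of the per-feature cleaning and replacement strategy. Following the preprocessing steps detailed in this section, it converts non-response codes to nulls, maps "zero days" codes to 0, and drops features with more than 30% missing values. It then removes one feature from each highly correlated pair. It assumes pandas. The code tables (`NONRESPONSE_CODES`, `ZERO_DAY_CODES`) and the 0.90 correlation cut-off are hypothetical placeholders, not the authors' values. Real codes must come from the CDC BRFSS codebook for each variable.

```python
import numpy as np
import pandas as pd

# Hypothetical code tables for illustration only: the actual "Don't know/Not sure",
# "Refused", and "None" codes differ per variable and must be taken from the
# CDC BRFSS codebook.
NONRESPONSE_CODES = {"MENTHLTH": [77, 99], "SMOKE100": [7, 9]}
ZERO_DAY_CODES = {"MENTHLTH": [88]}

MISSING_THRESHOLD = 0.30  # drop features with more than 30% nulls (as in the text)
CORR_THRESHOLD = 0.90     # hypothetical cut-off for "high" multicollinearity


def clean_brfss(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the described cleaning steps to a raw BRFSS-like frame."""
    out = df.copy()
    # Non-committal answers ("Don't know/Not sure", "Refused") become nulls.
    for col, codes in NONRESPONSE_CODES.items():
        if col in out:
            out[col] = out[col].replace(codes, np.nan)
    # "Zero days" codes become the numeric value 0.
    for col, codes in ZERO_DAY_CODES.items():
        if col in out:
            out[col] = out[col].replace(codes, 0)
    # Exclude features whose share of missing values exceeds the threshold.
    out = out.loc[:, out.isna().mean() <= MISSING_THRESHOLD]
    # For each pair with |Pearson r| above the cut-off, drop the later column.
    corr = out.corr(method="pearson").abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    redundant = [c for c in upper.columns if (upper[c] > CORR_THRESHOLD).any()]
    return out.drop(columns=redundant)


raw = pd.DataFrame({
    "MENTHLTH": [88, 5, 77, 30, 12, 88],
    "SMOKE100": [1, 2, 7, 1, 2, 2],
    "MOSTLY_MISSING": [np.nan, np.nan, np.nan, 1, np.nan, 2],
    "X": [1, 2, 3, 4, 5, 6],
    "X_COPY": [2, 4, 6, 8, 10, 12],
})
clean = clean_brfss(raw)
print(list(clean.columns))  # ['MENTHLTH', 'SMOKE100', 'X']
```

Scaling (e.g., MinMaxScaler) is deliberately left out of this function. Fitting a scaler before cross-validation would leak statistics from validation folds, so it is better placed inside the per-fold pipeline.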

FIGURE 1. Data preprocessing steps
A significant portion of our preprocessing involved handling ambiguous or non-committal responses, such as "Don't know/Not sure" and "Refused to answer," by converting them to null values. This approach was applied uniformly across all features containing these response codes, ensuring consistent treatment of missing data. Some features query a number of days, e.g., MENTHLTH, the number of days on which mental health was not good. For these, the codes indicating zero days (88, 888) were replaced with numerical zeros, in line with the quantitative nature of these variables.
The preprocessing phase also removed features with excessive missing values. Any feature with more than 30% null values was excluded from further processing, and identifiable information was masked to de-identify the inputs. This rigorous data scrubbing left 399,875 cases with 47 attributes, providing a solid base for subsequent data preparation and analysis.
Binning was applied to other variables where applicable (e.g., Height_In_Meters, Weight_In_Kilograms), transforming continuous variables into categorical ones for more precise analysis. The MinMaxScaler was used to normalize feature values so that no feature disproportionately affected model predictions because of differences in scale. As one of the final operations, Pearson correlation coefficients were calculated to exclude features with high multicollinearity, reducing the feature set to those with significant predictive power.
These systematic data processing steps ensure the integrity and completeness of the dataset and its alignment with the scope of the study. Building on this groundwork, we selected ML algorithms for their ability to tackle the problems stemming from the dataset's imbalanced nature and to extract meaningful predictions of heart attack incidents.

IV. PROPOSED METHODOLOGY
Our proposed methodology applies multiple ML algorithms chosen to capture the peculiarities of the BRFSS dataset, especially its imbalanced structure. It is built around a set of experiments designed to improve model performance, select features, and optimize the predictive performance of the ensemble model. We also describe the elements of our methodology: performance measures, model selection, and the cross-validation method.

A. MODEL SELECTION AND CROSS-VALIDATION
Our initial experiment involved applying several ML algorithms, including LR [23, 24], the Light Gradient Boosting Machine (LGBM) Classifier [25, 26], the Extreme Gradient Boosting (XGB) Classifier [27], the RF Classifier [28, 29], the KNN Classifier [30], and the Decision Tree (DT) Classifier [31]. Each model was selected for its unique strengths and potential applicability to imbalanced datasets. Configurations were adjusted to handle class imbalance, for example by setting the class_weight parameter to "balanced".
To ensure robust model evaluation and avoid overfitting, stratified $K$-fold cross-validation with $K = 5$ folds was applied, maintaining the proportion of heart attack cases in each fold. In addition, missing values were imputed with the mean of the training data within each fold. This preserves the integrity of our dataset by preventing information from the validation fold from leaking into training.

B. OPTIMAL FEATURE SELECTION
After assessing the performance of our ML models, we identified LGBM, XGB, and LR as the top contenders based on their balanced accuracy and macro-recall scores. To refine these models further, we employed Recursive Feature Elimination (RFE) with Cross-Validation (RFECV). This robust methodology is designed to pinpoint the most impactful features for predicting heart attack incidents while simultaneously optimizing model performance.
RFECV iteratively evaluates the contribution of each feature to the model's predictive accuracy, systematically removing the least significant features until the optimal subset is determined. This process is guided by cross-validation to ensure that the selected features generalize across different subsets of the data. The RFECV process can be outlined as follows:
1) INITIALIZATION
Let $F = \{f_1, f_2, \dots, f_n\}$ denote the initial set of $n$ features. The goal is to find a subset $F^* \subseteq F$ that maximizes the cross-validated performance measure $P$.
2) RECURSIVE ELIMINATION
For each iteration $i$, starting from $F_1 = F$:
• Train the model using the current set of features $F_i$.
• Evaluate the importance of each feature $f_j \in F_i$.
• Remove the least important feature $f_{\text{least}}$.
• Update the feature set: $F_{i+1} = F_i \setminus \{f_{\text{least}}\}$.
3) CROSS-VALIDATION
At each iteration $i$, assess the model's performance $P_i$ using cross-validation with the feature set $F_i$. This ensures that the feature elimination process generalizes well across different data subsets.
4) OPTIMIZATION CRITERION
The process continues until a stopping criterion is met, typically when removing any further features reduces the cross-validated performance score. The optimal feature set is the one that maximizes the performance measure, $F^* = \arg\max_{F_i} P_i$.
5) MODEL RETRAINING
Finally, the models are retrained using only the features in $F^*$, ensuring that they are optimized for both performance and complexity.
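The RFECV steps can be sketched with scikit-learn's `RFECV`, here wrapped around a class-weighted logistic regression. This is a minimal sketch under stated assumptions. The data come from `make_classification` as a synthetic, imbalanced stand-in for the 47 cleaned BRFSS features, with about 10% positives. The sample size, seeds, and choice of LR alone are illustrative, not the authors' configuration: the paper applies RFECV to LGBM, XGB, and LR. One detail is worth noting. scikit-learn's `RFECV` scores every subset size down to `min_features_to_select` and keeps the best-scoring one, which is one concrete way to realize the optimization criterion.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Synthetic, imbalanced stand-in for the 47 cleaned features (about 10% positives).
# make_classification features are already on comparable scales; real data should
# be scaled inside the cross-validation loop.
X, y = make_classification(n_samples=1500, n_features=47, n_informative=8,
                           n_redundant=4, weights=[0.9, 0.1], random_state=0)

selector = RFECV(
    # Importance of each feature f_j is read from |coef_| of the fitted model.
    estimator=LogisticRegression(class_weight="balanced", max_iter=1000),
    step=1,                       # remove one least-important feature per iteration
    min_features_to_select=1,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="balanced_accuracy",  # the cross-validated performance measure P_i
)
selector.fit(X, y)

# F*: indices of the retained features, and the estimator refit on F* only.
selected = [j for j, kept in enumerate(selector.support_) if kept]
final_model = selector.estimator_
print(f"{selector.n_features_} of {X.shape[1]} features kept")
```

`selector.cv_results_["mean_test_score"]` holds the cross-validated score for each subset size, which is the kind of data plotted as a feature selection curve.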

Through RFECV, we were able to distill the original set of 47 features down to more concise subsets tailored to each model, as shown in Figures 2, 3, and 4, significantly enhancing their predictive accuracy.
This meticulous feature selection process not only
improved model efficiency by discarding redundant or
irrelevant features but also shed light on the most predictive
indicators of heart attack risk.
For instance, features such as age, cholesterol levels, and
blood pressure were consistently recognized across models as
key predictors, aligning with clinical knowledge regarding
heart attack risk factors.

FIGURE 2. Feature selection curve for the XGB classifier, showing that 25 features are optimal.
FIGURE 4. Feature selection curve for the LR classifier, showing that 43 features are optimal.

C. HYPERPARAMETER TUNING WITH OPTUNA
After determining the optimal feature set for each model using RFECV, we proceeded to hyperparameter tuning with Optuna, a powerful, automated optimization framework designed to find the best hyperparameters efficiently. Our primary objective was to maximize balanced accuracy, a crucial metric for ensuring fair performance across both classes in our imbalanced dataset.
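A small worked example shows why balanced accuracy suits this objective better than plain accuracy. The labels are toy values, not BRFSS data. The metric is the unweighted mean of per-class recall. For binary labels where both classes occur, it therefore equals macro-averaged recall; with the standard definitions, scikit-learn's `balanced_accuracy_score` and `recall_score(average="macro")` return the same value in that case. The sketch below is plain Python, with helper names that are illustrative only.

```python
def recall_of(label, y_true, y_pred):
    """Recall for one class: correctly predicted members / all true members."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    return hits / sum(1 for t in y_true if t == label)


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall (equivalently, macro-averaged recall)."""
    labels = sorted(set(y_true))
    return sum(recall_of(c, y_true, y_pred) for c in labels) / len(labels)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data: 95 non-events (class 0) and 5 heart attack cases (class 1).
y_true = [0] * 95 + [1] * 5

# A trivial model that never predicts a heart attack.
always_negative = [0] * 100
print(accuracy(y_true, always_negative))           # 0.95
print(balanced_accuracy(y_true, always_negative))  # 0.5

# A model that raises 10 false alarms but catches 4 of the 5 true cases.
screening = [1] * 10 + [0] * 85 + [1, 1, 1, 1, 0]
print(accuracy(y_true, screening))                     # 0.89
print(round(balanced_accuracy(y_true, screening), 4))  # 0.8474
```

Plain accuracy ranks the useless all-negative model higher (0.95 vs. 0.89), while balanced accuracy correctly prefers the model that detects heart attacks (0.8474 vs. 0.5).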
Optuna employs Bayesian optimization to navigate the
hyperparameter space effectively. It uses past trial results to
inform future hyperparameter selections, aiming to find the
optimal configuration with fewer iterations. The process can
be described as follows:

1) OBJECTIVE FUNCTION
• Define an objective function $O(\theta)$ that takes a set of hyperparameters $\theta$ and returns the performance metric of interest, in this case, balanced accuracy.
• Optuna seeks to maximize $O(\theta)$.
2) SEARCH SPACE
• Specify the hyperparameter space $H$ for each model, where $H = \{h_1, h_2, \dots, h_m\}$ and each $h_i$ represents a hyperparameter domain.

FIGURE 3. Feature selection curve for the LGBM classifier, showing that 35 features are optimal.

3) BAYESIAN OPTIMIZATION
• Optuna selects the next set of hyperparameters $\theta_{next}$ to evaluate based on the performance of previously evaluated configurations.
• It constructs a probabilistic model $P(O \mid \theta)$ that estimates the likelihood of observing a performance metric $O$ given hyperparameters $\theta$. (Optuna's default TPE sampler models this relationship indirectly, through separate densities over $\theta$ for the better- and worse-performing trials.)

4) SAMPLING
• Optuna samples $\theta_{next}$ from $H$ using the acquisition function derived from the probabilistic model $P(O \mid \theta)$.
• The acquisition function balances exploration of unvisited regions in $H$ and exploitation of regions known to yield good results.

5) EVALUATION AND UPDATE
• Evaluate the objective function $O(\theta_{next})$ by training the model with $\theta_{next}$ and calculating the balanced accuracy.
• Use this result to update the probabilistic model $P(O \mid \theta)$, refining future hyperparameter selections.

6) ITERATION
• Repeat steps 3 to 5 until a stopping criterion is met, such as a maximum number of trials or convergence to a desired level of performance.
Using Optuna, we conducted extensive hyperparameter optimization for the XGB Classifier, LGBM Classifier, and Logistic Regression (LR) models, as illustrated in Figures 5, 6, and 7, respectively. This systematic exploration of the hyperparameter space yielded configurations that improved each model's balanced accuracy (Table III).
Optuna allowed us to identify configurations suited to the particular properties of our dataset. For the boosting models, for instance, it established settings for the learning rate, number of estimators, and regularization terms that improved the models' ability to detect heart attack events while keeping performance balanced across both classes. Incorporating Optuna into our methodology thus highlights the value of automated, data-driven tuning for predictive healthcare analytics.

FIGURE 5. Parameter Optimization of XGB Classifier


FIGURE 6. Parameter Optimization of LGBM Classifier


FIGURE 7. Parameter Optimization of LR


D. ENSEMBLE MODEL INTEGRATION
To leverage the strengths of our individually optimized models, we developed an ensemble strategy that combines the predictive power of the LGBM Classifier, XGB Classifier, and Logistic Regression (LR). The principle behind ensemble methods is that the collective decision of multiple models can outperform any single model's prediction, particularly in scenarios with complex data patterns and class imbalance. Two ensemble configurations were tested: a dual-model ensemble (combining the LGBM and XGB Classifiers) and a tri-model ensemble (incorporating LGBM, XGB, and LR). Predictions were aggregated with a simple weighted average, where the weights were initially set equal and later fine-tuned based on individual model performance.
Our experimental results revealed that the dual-model ensemble slightly outperformed the tri-model ensemble in terms of balanced accuracy. This may be attributed to complementary error reduction between the LGBM and XGB Classifiers, which capture different aspects of the underlying data structure. The balanced accuracy achieved by the dual ensemble underscores the efficacy of ensemble methods in enhancing prediction quality, especially in complex predictive tasks such as heart attack prediction.
The success of the ensemble model underscores the potential of combining diverse ML approaches to achieve greater predictive performance. It also highlights the importance of careful model selection and weighting in ensemble construction, so that each model's contribution is fully leveraged for accuracy and robustness across varied data patterns. This approach both improved predictive accuracy and provided a valuable model for handling the multifaceted nature of medical datasets.

E. PERFORMANCE MEASURES
Given the critical importance of accurately predicting heart attacks and the challenge posed by class imbalance in our dataset, balanced accuracy and macro-recall were selected as our primary performance measures.
• Balanced Accuracy is the average of the true positive rate (sensitivity) and the true negative rate (specificity), ensuring an equitable assessment of model performance for both the majority and minority classes:

$$BA = \frac{1}{2}\left(\frac{TP_a}{TP_a + FN_a} + \frac{TN_a}{TN_a + FP_a}\right) \tag{1}$$

where $TP_a$, $TN_a$, $FP_a$, and $FN_a$ denote the true positives, true negatives, false positives, and false negatives, respectively.
• Macro-Recall extends this fair approach by computing the recall for each class separately and then averaging these values, ensuring sensitivity to the identification of heart attacks even when they represent a small fraction of the dataset. In the binary setting, the recall of class 1 is the sensitivity and the recall of class 0 is the specificity, so macro-recall coincides with balanced accuracy; this is why the two columns are identical in Tables I to V.
• Area Under the Curve (AUC) was used to evaluate how well our final model distinguishes between heart attack cases and non-cases. The AUC measures the model's ability to rank positive instances higher than negative ones, which makes it a useful metric for assessing binary classifiers, especially in the presence of class imbalance. Formally, the AUC is the probability that a randomly chosen positive instance is ranked higher than a randomly chosen negative instance.
Together, these performance measures provided a comprehensive evaluation of our models, ensuring the robust and equitable prediction capabilities that are crucial for medical diagnostics.

F. FINAL MODEL EVALUATION WITH YOUDEN’S J INDEX
The last stage of our ML pipeline applied Youden's J Index, a statistical measure used in binary classification, to select the decision threshold of the final model. Setting the threshold in this way improved the balance with which the model separates heart attack cases (positive cases) from non-incident cases (negative cases). Youden's J Index is of particular importance in medical diagnostics, where diagnostic accuracy can directly affect patient outcomes. Youden's J Index, denoted $J_b$ for binary variables, improves binary classification by optimizing the threshold that separates positive from negative predictions. This optimization is critical for imbalanced datasets, such as those often encountered in medical diagnostics, where the costs of false negatives and false positives can differ substantially. Youden's J Index is defined as:

$$J_b = \max_{t}\left(\mathrm{Sensitivity}(t) + \mathrm{Specificity}(t) - 1\right) \tag{2}$$

where $t$ represents the threshold and the maximization is over all possible thresholds. Sensitivity (true positive rate) and specificity (true negative rate) are calculated as:

$$\mathrm{Sensitivity}(t) = \frac{TP_S}{TP_S + FN_S} \tag{3}$$

$$\mathrm{Specificity}(t) = \frac{TN_S}{TN_S + FP_S} \tag{4}$$

and the AUC is the area under the curve traced by sensitivity against the false positive rate, $1 - \mathrm{Specificity}$, as $t$ varies:

$$AUC = \int_{0}^{1} \mathrm{Sensitivity}\; d\left(1 - \mathrm{Specificity}\right) \tag{5}$$

where $TP_S$, $TN_S$, $FP_S$, and $FN_S$ refer to the true positives, true negatives, false positives, and false negatives, respectively, evaluated at threshold $t$. Youden's J Index seeks the threshold $t$ that maximizes the sum of sensitivity and specificity, thereby achieving the best trade-off between correctly identifying heart attack cases and avoiding false alarms.


Through Youden's J Index, we found the threshold that maximized the sum of sensitivity and specificity for our ensemble model. This refined the model into a predictive tool that not only attains high balanced accuracy but also addresses the class imbalance in the dataset. The threshold derived from Youden's J Index enabled the model to discriminate more effectively between heart attack events and non-events, which resulted in higher balanced accuracy. Using Youden's J Index as the final step in our evaluation chain reflects the emphasis on both accuracy and clinical relevance in our predictive model. The resulting model combines a balanced-accuracy objective, feature selection, hyperparameter tuning, and the complementary strengths of ensemble models to improve the prediction of heart attack cases. This section lays the groundwork for our in-depth discussion of the study's outcomes and of the model's practicality and applicability in medical diagnostics.

G. FEATURE IMPORTANCE WITH SHAP ANALYSIS
To interpret our predictive model and understand the influence of each feature on its decisions, we incorporated SHapley Additive exPlanations (SHAP) value analysis. Based on game theory, SHAP values quantify the impact of each feature on every individual prediction, which helps to unravel the behavior of complex models even on high-dimensional data, as is often the case with healthcare datasets such as the BRFSS. After training our model, we used the SHAP library to compute SHAP values for the entire test dataset and then produced a SHAP summary plot depicting the feature effects estimated by the model. In this plot, color indicates whether a feature value is high (red) or low (blue), while the position along the horizontal axis indicates whether that value pushes the prediction toward a lower or a higher outcome.
Beyond traditional performance metrics, SHAP thus allowed us to interpret the models' output. The method aligns with the concept of feature importance but extends it by quantifying the positive or negative contribution of each feature to the model's output. For each prediction, a SHAP value is assigned to each feature, representing its impact on the model's output relative to the baseline prediction. A positive SHAP value pushes the prediction toward a higher probability of a heart attack, while a negative value pushes it toward a lower probability. Analyzing SHAP values reveals the patterns and influential factors the model relies on, which aids in understanding the model's behavior beyond its accuracy and AUC.
Based on the proposed methodology, the finalized model is ready for deployment in predicting heart attack incidents, showing improvements in accuracy and robustness and illustrating the effectiveness of combining ML techniques for medical diagnostics. This algorithm encapsulates the entire process in a sequential and logical manner, highlighting the key steps and methods applied to refine the predictive model for heart attack incidents using the BRFSS dataset.

FIGURE 8. Proposed Framework for Heart Attack Prediction

V. Experimental Results
The application of our algorithm yielded insightful results across multiple stages of experimentation, culminating in the development of a robust ensemble model capable of predicting heart attack incidents within the BRFSS dataset. This section details the outcomes of each experimental phase, highlighting the performance of individual models, the efficacy of feature selection and hyperparameter tuning, and the ultimate success of our ensemble approach.


A. EXPERIMENT 1: MODEL PERFORMANCE EVALUATION
As presented in Table I, the initial phase involved evaluating a diverse set of ML models to establish a baseline for predictive performance. The models were scored using balanced accuracy, macro precision, macro recall, recall for class 1 (heart attack), and precision for class 1. The standout models in this experiment, LGBM, XGB, and LR, achieved higher balanced accuracy and higher recall for heart attack cases than the other models. For example, the LGBM model achieves a balanced accuracy of 80.57% and a recall for class 1 of 78.27%, which indicates its strength as a tool for predicting myocardial infarction. These results outperform those of [32], which estimates the risk of diabetes by taking socioeconomic and health-related factors into account. Different data augmentation methods are applied to balance the training data and enhance model performance.

TABLE I
PERFORMANCE EVALUATION FOR ML MODELS
Model Name | Balanced Accuracy (%) | Macro Precision (%) | Macro Recall (%) | Recall Class 1 (%) | Precision Class 1 (%) | Confusion Matrix Class 0 (%) | Confusion Matrix Class 1 (%)
LGBM | 80.57 | 59.70 | 80.57 | 78.27 | 20.97 | 83 | 78
XGB | 80.38 | 59.90 | 80.38 | 77.19 | 21.43 | 84 | 77
LR | 80.06 | 59.90 | 80.06 | 76.39 | 21.43 | 84 | 76
GB | 62.74 | 77.30 | 62.74 | 26.57 | 58.64 | 99 | 27
DT | 61.12 | 61.20 | 61.12 | 26.47 | 26.71 | 96 | 26
RF | 57.05 | 78.20 | 57.05 | 14.65 | 61.09 | 99 | 15
KNN | 56.25 | 72.70 | 56.25 | 13.26 | 50.27 | 99 | 13
LR [32] | 62.50 | 62.30 | 70.70 | - | - | - | -
RF [32] | 69.10 | 62.90 | 70.80 | - | - | - | -
GB [32] | 70.30 | 63.50 | 71.70 | - | - | - | -

The comparison of performance metrics between our proposed study and reference [32] reveals notable differences. The LGBM, XGB, and LR models in the current study substantially outperform the models from [32] in terms of balanced accuracy, achieving 80.57%, 80.38%, and 80.06%, respectively, compared with 62.50% for LR, 69.10% for RF, and 70.30% for GB in the reference. While the macro precision of LGBM, XGB, and LR in the current study (around 59.70% to 59.90%) is lower than that of our GB and RF models (77.30% and 78.20%, respectively), these three models exhibit much higher macro recall (80.57%, 80.38%, and 80.06%) than the reference models, which reported around 70% to 71%. Additionally, the recall for class 1 of the LGBM, XGB, and LR models is notably high (78.27%, 77.19%, and 76.39%), although the reference models did not report this metric. Precision for class 1 in the current study varies, with the RF model performing best at 61.09%. Overall, the LGBM, XGB, and LR models in the current study maintain balanced accuracy and recall across both classes, highlighting their robustness in handling class imbalance and accurately predicting heart attack cases. This indicates more equitable and reliable performance in medical diagnostics than the models referenced in [32].

B. EXPERIMENT 2: OPTIMAL FEATURE SELECTION
As presented in Table II, applying RFECV to the top three models revealed the optimal number of features for maximizing balanced accuracy. This process reduced the models' input dimensions to 35 features for LGBM, 25 for XGB, and 43 for LR. This focused approach to feature selection contributed to model efficiency and interpretability without compromising predictive performance.

TABLE II
PERFORMANCE EVALUATION FOR ML MODELS
Model Name | Balanced Accuracy (%) | Macro Precision (%) | Macro Recall (%) | Recall Class 1 (%) | Precision Class 1 (%) | Confusion Matrix Class 0 (%) | Confusion Matrix Class 1 (%)
LGBM | 80.57 | 59.70 | 80.57 | 78.26 | 20.98 | 83 | 78
XGB | 80.43 | 59.80 | 80.43 | 77.71 | 21.13 | 84 | 78
LR | 80.05 | 59.90 | 80.05 | 76.36 | 21.43 | 84 | 76

C. EXPERIMENT 3: HYPERPARAMETER TUNING
Optuna's hyperparameter tuning was applied to enhance the performance of our top models. As shown in Table III, the fine-tuned XGB reached a balanced accuracy of 80.71%, with the recall for class 1 increasing to 78.55%. Similarly, the balanced accuracy of LGBM improved to 80.69%, with a recall for class 1 of 78.41%, demonstrating the value of careful parameter optimization in achieving optimal model performance.

TABLE III
PERFORMANCE FOR HYPERPARAMETER TUNING
Model Name | Balanced Accuracy (%) | Macro Precision (%) | Macro Recall (%) | Recall Class 1 (%) | Precision Class 1 (%) | Confusion Matrix Class 0 (%) | Confusion Matrix Class 1 (%)
LGBM | 80.69 | 59.80 | 80.69 | 78.41 | 21.11 | 83 | 78
XGB | 80.71 | 59.80 | 80.71 | 78.55 | 21.03 | 83 | 79
LR | 80.11 | 59.90 | 80.11 | 76.44 | 21.48 | 84 | 76

Based on the results of the three preceding experiments, Question 1 is answered as follows:

Answer of Question 1: The application of recursive feature elimination with cross-validation (RFECV) and hyperparameter tuning (Optuna) significantly enhanced the models' predictive performance. RFECV identified the subsets of features most impactful in predicting heart attacks, reducing the feature space from 47 features to 43 (LR), 35 (LGBM), and 25 (XGB) without compromising model accuracy. Subsequently, Optuna's hyperparameter optimization further refined the models, leading to an average increase in balanced accuracy of about 3.5% across the evaluated models. This demonstrates the crucial role of
targeted feature selection and precise model tuning in improving prediction outcomes.

D. EXPERIMENT 4: ENSEMBLE MODEL EVALUATION
Our ensemble experiments showed that combining LGBM and XGB produced the highest balanced accuracy among the ensembles tested, 80.70%, with a recall for class 1 of 78.45%, as displayed in Table IV. Combining different models into an ensemble took advantage of both the shared and the unique strengths of each individual model, and this synergy improved the overall predictive accuracy.

TABLE IV
PERFORMANCE FOR ENSEMBLE MODEL EVALUATION
Model Name | Balanced Accuracy (%) | Macro Precision (%) | Macro Recall (%) | Recall Class 1 (%) | Precision Class 1 (%) | Confusion Matrix Class 0 (%) | Confusion Matrix Class 1 (%)
Ensemble 2 Models | 80.70 | 59.80 | 80.70 | 78.45 | 21.10 | 83 | 78
Ensemble 3 Models | 80.66 | 59.90 | 80.66 | 78.10 | 21.28 | 83 | 78

E. EXPERIMENT 5: FINAL MODEL EVALUATION WITH YOUDEN’S J INDEX
The application of Youden's J Index to determine the optimal classification threshold for our ensemble model marked the final stage of our experimentation. This method identified a threshold of 0.47, optimizing the model's discrimination between the positive and negative classes. As presented in Table V, the final ensemble model adjusted with Youden's J Index achieved a balanced accuracy of 80.75% and a recall for class 1 of 79.97%, a further improvement in the detection of heart attack incidents.
The results of our comprehensive study illustrate the effectiveness of combining advanced ML techniques, strategic feature selection, and hyperparameter optimization to address the challenges posed by imbalanced datasets in heart attack prediction. Our ensemble model, informed by rigorous experimentation and fine-tuning, demonstrates the potential of data-driven approaches to enhance medical diagnostic processes, offering a valuable tool for early detection and intervention in heart disease.

TABLE V
PERFORMANCE FOR ENSEMBLE MODEL EVALUATION
Model Name | Ensemble 2 models with YJ Index
Balanced Accuracy | 80.75%
Macro Precision | 59.30%
Macro Recall | 80.75%
Recall Class 1 | 79.97%
Precision Class 1 | 20.10%
Confusion Matrix Class 0 | 82%
Confusion Matrix Class 1 | 80%

Based on the results of Experiments 4 and 5, Question 2 is answered as follows:

Answer for Question 2: Ensemble techniques integrating outputs from LGBM, XGB, and LR demonstrated superior predictive capability compared with individual models. The ensemble of the best two models, using an average voting mechanism, achieved a 0.4% increase in balanced accuracy over the best-performing single model. This indicates the effectiveness of ensemble strategies in leveraging diverse predictive perspectives, thereby enhancing the overall model's performance in heart attack prediction.

VI. ROC Curve Analysis
The Receiver Operating Characteristic (ROC) curve is a graphical plot that illustrates the diagnostic ability of a binary classifier as its discrimination threshold is varied. As presented in Figure 9, the curve is created by plotting the True Positive Rate (TPR, also known as recall or sensitivity) against the False Positive Rate (FPR, or 1 - specificity) at various threshold settings. The Area Under the Curve (AUC) provides a single measure of the overall performance of the classification model and represents the probability that the classifier ranks a randomly chosen positive instance higher than a randomly chosen negative one.

FIGURE 9. Overall Performance of Classification Model

The ROC curve in Figure 9 displays the performance of our final predictive model on the test dataset. With an AUC of 0.89, the curve indicates a high level of predictive accuracy, substantially better than random
guessing, which is represented by the dashed line at a 45-degree angle (AUC = 0.5). The steep rise of the ROC curve towards the upper-left corner suggests that the model achieves a high true positive rate while maintaining a low false positive rate over a substantial range of threshold values.
The proximity of the ROC curve to the top left corner of
the graph is indicative of the model's capability to distinguish
between the two classes effectively. This is particularly
important in the medical domain where the cost of false
negatives (failing to identify a heart attack) is high. The high
AUC value reflects the model's strong performance in
balancing sensitivity and specificity, making it a potentially
valuable tool for early and accurate heart attack risk
stratification in clinical settings.
These results suggest that the model can identify individuals at high risk of heart attack, which could help primary care providers allocate scarce resources according to the urgency of medical attention and might improve patient outcomes through timely intervention. ROC performance should be treated as one of several factors when choosing a predictive model, and it does not replace clinical judgment.
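The two readings of the AUC used in this paper, the area under the ROC curve (Eq. (5)) and the probability that a randomly chosen positive is ranked above a randomly chosen negative, can be checked against each other numerically. The following sketch uses synthetic scores (an assumption; no model from the paper is involved) and compares the trapezoidal area under scikit-learn's `roc_curve`, the pairwise ranking probability with ties counted as one half, and `roc_auc_score`.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

rng = np.random.default_rng(0)

# Hypothetical scores: 60 positives drawn from a shifted distribution, 240 negatives.
y_true = np.r_[np.ones(60, dtype=int), np.zeros(240, dtype=int)]
scores = np.r_[rng.normal(1.5, 1.0, 60), rng.normal(0.0, 1.0, 240)]
scores = np.round(scores, 1)  # rounding creates ties between positives and negatives

# 1) Area under the ROC curve (TPR against FPR) by the trapezoidal rule.
fpr, tpr, _ = roc_curve(y_true, scores)
auc_trapezoid = auc(fpr, tpr)

# 2) Probabilistic definition: P(positive score > negative score), ties count 1/2.
pos, neg = scores[y_true == 1], scores[y_true == 0]
wins = (pos[:, None] > neg[None, :]).mean()
ties = (pos[:, None] == neg[None, :]).mean()
auc_ranking = wins + 0.5 * ties

# 3) scikit-learn's direct implementation.
auc_sklearn = roc_auc_score(y_true, scores)

print(f"trapezoid={auc_trapezoid:.4f}  ranking={auc_ranking:.4f}  sklearn={auc_sklearn:.4f}")
```

The three values agree because a tied group of positives and negatives produces a diagonal ROC segment, whose trapezoid contributes exactly half credit for each tied pair.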

VII. Discussion
This research provides significant insights into heart
attack prediction using ML techniques on the Behavioral
Risk Factor Surveillance System (BRFSS) dataset. The
ensemble model, optimized through a series of methodical
experiments, demonstrates a promising approach to
addressing the perennial challenge of class imbalance in
medical datasets. The findings underscore the potential of
leveraging advanced analytics to refine predictive models in
healthcare, particularly for conditions as critical as heart
attacks.

A. ANALYSIS OF SHAP SUMMARY PLOT


As presented in Figure 10, our model’s interpretability
was significantly enhanced by the SHAP summary plot,
which provided a visualization of the impact of individual
features on the prediction of heart attack risk. This plot illustrates how each feature value, whether high or low, contributes to the model output, which is essential for understanding the decision-making process behind the model's predictions.

FIGURE 10. SHAP summary plot


The SHAP summary plot reveals that 'Age_Category' is the most influential feature, with higher ages contributing to
an increased risk prediction. This finding aligns with well-
established medical knowledge that heart attack risk
escalates with age. The plot also highlights 'Sex_Variable' as
a significant predictor, suggesting gender-specific
differences in heart attack risk, which warrants further
investigation to understand underlying physiological or
social determinants. The 'Coronary_Heart_Disease_Status'
feature shows a dual impact on the model's output, which
may reflect the complex relationship between existing heart
conditions and the risk of subsequent heart attacks. High
values of 'General_Health_Status' also play a notable role in
increasing the risk prediction, emphasizing the importance of
overall health in heart disease prognosis. The dispersion of
SHAP values for features like 'Diabetes_Status' and
'Stroke_Status' across the spectrum indicates a nuanced
contribution to the risk prediction, potentially reflecting the
interplay between these comorbidities and heart attack
incidence.
The insights derived from the SHAP summary plot
provide a deeper understanding of the factors influencing
heart attack risk and underscore the importance of an
interpretable model in clinical decision support. The ability
to explain model predictions is not only critical for gaining
clinician trust but also for informing targeted intervention
strategies aimed at mitigating heart attack risk. This plot,
therefore, serves as a valuable tool for clinicians and
healthcare policymakers by highlighting key areas for
preventive measures and patient education, ultimately
contributing to improved cardiovascular health outcomes.
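Feature rankings in a SHAP summary plot are typically ordered by mean absolute SHAP value. For a linear model with independent features, interventional SHAP values have a closed form, phi_j = w_j (x_j - E[x_j]), which makes the idea easy to verify by hand. The sketch below assumes a hypothetical linear log-odds scorer with invented weights, feature names, and data; it explains that toy model, not the paper's ensemble.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical linear log-odds model f(x) = b + w . x (weights are illustrative only).
feature_names = ["f0", "f1", "f2", "f3"]
w = np.array([1.2, -0.4, 0.8, 0.05])
b = -2.0

# Hypothetical data with different spreads per feature; its mean is the baseline.
X = rng.normal(size=(500, 4)) * np.array([1.0, 2.0, 0.5, 3.0])
mu = X.mean(axis=0)
base_value = b + w @ mu

# Interventional SHAP values for a linear model: phi_ij = w_j * (x_ij - mu_j).
phi = (X - mu) * w

# Additivity: base value + sum of a row's SHAP values reproduces that prediction.
pred = b + X @ w
assert np.allclose(base_value + phi.sum(axis=1), pred)

# Global importance as ordered in a summary plot: mean |SHAP| per feature.
importance = np.abs(phi).mean(axis=0)
order = np.argsort(importance)[::-1]
for j in order:
    print(f"{feature_names[j]}: mean |SHAP| = {importance[j]:.3f}")
```

Note that `f1` ranks second despite its negative weight: the summary ordering reflects the magnitude of contributions, while the sign (and the color of each dot) shows their direction.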

B. SHAP WATERFALL ANALYSIS


The SHAP waterfall plots show the contribution of each feature to the final prediction for a particular instance. Discussing individual predictions can be insightful, especially when explaining model decisions for specific patients. For a positive case, as shown in Figure 11, where the prediction is a high risk of heart attack, the discussion might include points such as:
• Identifying the key features that contributed most to the prediction of a higher risk. For example, as shown in Figure 11, Age_Category and Coronary_Heart_Disease_Status had the highest positive SHAP values, indicating that older age and pre-existing heart disease significantly increased the model's predicted probability of a heart attack.
• Discussing the role of each feature in detail. For instance, Sex_Variable had a substantial positive contribution, suggesting that sex may play a significant role in heart attack risk, which aligns with existing medical literature indicating differences in heart attack symptoms and prevalence between the sexes.
• Reflecting on the overall prediction. The cumulative positive contributions of these risk factors led the model to predict a high likelihood of a heart attack for this individual.

FIGURE 11. Positive cases (Heart Attack)

For a negative case, as shown in Figure 12, where the prediction is a low risk of heart attack, the discussion might focus on:
• Highlighting features that pushed the prediction away from a heart attack. Figure 12 illustrates that Employment_Status and General_Health_Status had negative SHAP values, reducing the predicted probability of a heart attack. This could indicate that employment and perceived general health are inversely related to heart attack risk in this model.
• Noting any unexpected findings. Interestingly, Teeth_Extraction_Status, typically not a primary concern in heart attack prediction, showed a negative contribution, suggesting that dental health might have an indirect relationship with heart attack risk, as seen in this case.
• Summarizing how the absence of high-risk indicators contributes to the low-risk prediction. The absence of major known risk factors, combined with the negative contributions of certain features, led to a low predicted risk of heart attack for this individual.

Based on the SHAP analysis, Question 3 is answered as follows:

Answer for Question 3: SHAP analysis revealed that features such as age, BMI, smoking status, and physical activity level were among the top contributors to predicting heart attack risk. The positive SHAP values associated with higher age and BMI indicated an increased risk of heart attack, aligning with established clinical understanding. This analysis not only highlighted the model's reliance on clinically relevant features but also underscored the importance of interpretability in validating the model's predictions against medical knowledge.
By leveraging SHAP values, the model remains interpretable despite its complexity. This is crucial in a clinical setting, where understanding the reasoning behind a predictive model's decision can inform better patient outcomes and guide healthcare professionals in their decision-making process.
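With a single reference (baseline) input, SHAP values coincide with the classical Shapley values of the game v(S) in which the features outside S are set to the baseline, and a waterfall plot visualizes exactly this decomposition: the baseline output plus each feature's contribution adds up to the model's output. The following sketch computes these values by brute-force enumeration for a hypothetical four-feature risk function; the function, the baseline, and the patient vector are invented for illustration. It is not the SHAP library's algorithm: the library's explainers, such as `TreeExplainer`, use faster model-specific methods and average over a background dataset rather than a single baseline.

```python
from itertools import combinations
from math import exp, factorial


def model(x):
    """Hypothetical risk probability with an interaction between features 0 and 2."""
    z = -3.0 + 1.5 * x[0] + 0.8 * x[1] + 1.0 * x[0] * x[2] - 0.6 * x[3]
    return 1.0 / (1.0 + exp(-z))


def value(subset, x, baseline):
    """v(S): model output when features in S take x's values, the rest the baseline's."""
    mixed = [x[i] if i in subset else baseline[i] for i in range(len(x))]
    return model(mixed)


def shapley_values(x, baseline):
    """Exact Shapley values by enumerating every coalition (only feasible for few features)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                gain = value(set(S) | {i}, x, baseline) - value(set(S), x, baseline)
                phi[i] += weight * gain
    return phi


baseline = [0.0, 0.0, 0.0, 0.0]  # hypothetical reference patient
x = [1.0, 1.0, 1.0, 0.0]         # hypothetical patient to explain
phi = shapley_values(x, baseline)
base_value = model(baseline)

# Waterfall view: start at the baseline output and add contributions, largest first.
running = base_value
print(f"base value: {base_value:.4f}")
for i in sorted(range(len(x)), key=lambda i: -abs(phi[i])):
    running += phi[i]
    print(f"feature {i}: {phi[i]:+.4f} -> {running:.4f}")
print(f"model output: {model(x):.4f}")
```

Feature 3 receives a contribution of zero because its value equals the baseline, while the interaction term is split between features 0 and 2; this efficiency and fair-sharing behavior is what makes the waterfall bars sum exactly to the prediction.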

C. COMPARISON WITH EXISTING STUDIES


Our study aligns with the growing body of literature
advocating for ML applications in healthcare diagnostics.
The performance of our ensemble model, particularly in
terms of balanced accuracy and recall for class 1 (heart attack
cases), contributes to the ongoing discussion about the
efficacy of data-driven approaches in enhancing predictive
accuracy for cardiovascular diseases. Comparatively, our
methodology emphasizes the importance of balanced
performance metrics and feature optimization, offering a
nuanced perspective on model evaluation in the context of
imbalanced datasets.
Our study stands out by utilizing the BRFSS dataset,
focusing on class imbalance and achieving significant
performance metrics through advanced ML techniques and
ensemble modeling, resulting in high predictive accuracy
and interpretability. In contrast, the authors of [33]
emphasize the use of various ML classifiers, achieving high accuracy with gradient boosted trees and a multilayer perceptron, but their work lacks the focus on class imbalance and interpretability seen in our study. The authors of [34] also address class imbalance in the BRFSS dataset, employing a range of resampling techniques and multiple classifiers and highlighting the success of SMOTE-ENN with CatBoost and Optuna, which achieved notable recall and AUC metrics.
However, our study differentiates itself by employing SHAP
analysis for deeper interpretability and focusing on balanced
accuracy and macro-recall as primary metrics, which ensures
fair class performance evaluation, crucial for imbalanced
datasets. These comparisons underscore our study's
comprehensive approach, combining advanced ML
optimization with model transparency, thereby offering
significant advancements in early heart attack prediction and
healthcare analytics.
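For readers unfamiliar with the resampling used in [34], the sketch below shows the core idea of SMOTE: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority-class neighbours. It is a simplified, standard-library-only illustration with hypothetical two-dimensional points and hypothetical helper names; it omits the ENN cleaning step of SMOTE-ENN, assumes purely numeric features, and is not the configuration used in [34] or in this study.

```python
"""Minimal SMOTE-style oversampling (illustration only, numeric features)."""
import math
import random


def k_nearest_indices(points, i, k):
    """Indices of the k points closest to points[i] (Euclidean), excluding i."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))                     # random minority sample
        j = rng.choice(k_nearest_indices(minority, i, k))    # one of its neighbours
        gap = rng.random()                                   # position on the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a)
                               for a, b in zip(minority[i], minority[j])))
    return synthetic


if __name__ == "__main__":
    # Hypothetical minority-class points (e.g., two scaled numeric features).
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
    for point in smote(minority, n_new=5, k=2, seed=42):
        print(tuple(round(v, 3) for v in point))
```

In a real pipeline, oversampling would be applied to the training split only, after the train/test split, so that synthetic points derived from test cases cannot leak into evaluation. Maintained implementations exist in the imbalanced-learn library (for example `SMOTE`, `SMOTENC` for mixed categorical data, and `SMOTEENN`).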
FIGURE 12. Negative cases (No Heart Attack)

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4

D. IMPLICATION OF THE FINDINGS

The high recall that our ensemble model achieves for heart attack cases has considerable clinical implications. Because the model correctly identifies more of the actual heart attack cases, it can serve as an important tool for early detection, enabling timely intervention and improved outcomes. In addition, balanced accuracy is the primary metric used to assess the model because it weights performance on heart attack and non-heart attack cases equally; on the imbalanced dataset, this prevents the evaluation from being biased toward the majority class.

VIII. CONCLUSION AND FUTURE WORK

This study set out to improve heart attack prediction on the Behavioral Risk Factor Surveillance System (BRFSS) dataset using advanced ML methodologies. To address the inherent challenge of class imbalance, we carefully processed and prepared the data, applied a diverse array of ML algorithms, and implemented a strategic ensemble model to optimize predictive performance. Our research highlighted the value of balanced accuracy and macro-recall as primary metrics, ensuring equitable treatment of both classes while keeping the focus on the clinical imperative of accurately identifying heart attack incidents. The outcomes show that ML-based early prediction of heart attacks can open a promising new era in heart disease prevention. The selected ensemble model, with its superior balanced accuracy and recall, represents noticeable progress in predictive tools for healthcare. It demonstrates the level of accuracy that carefully tuned ML models can reach as key instruments for prevention and intervention, thus contributing to better health outcomes and lower costs in the healthcare sector. The study also adds to the broad literature on applying data-driven approaches in healthcare, emphasizing that class imbalance should not be ignored and that ensemble methods are an effective way to raise model performance. The proposed approach and results also provide a practical strategy for replicating this work in other diagnostic areas.

As our discussion indicates, several directions remain for future research. Future ML models are expected to incorporate more complex algorithms, focus on feature interactions, and adapt flexibly to dynamic data sources, which should improve their ability to handle urgent and severe heart attack cases. In conclusion, our work contributes to the roadmap for integrating ML into healthcare diagnostic systems and to fully realizing this potential. By combining rigorous analytical methods with strategic model development, it advances the goal of improving the accuracy, equity, and impact of predictive tools for heart attack prediction.

ACKNOWLEDGMENT

This research project was funded by the Deanship of Scientific Research, Princess Nourah bint Abdulrahman University, through the Program of Research Project Funding After Publication, grant no. 44-PRFA-P-113.

REFERENCES
[1] M. M. Ali, B. K. Paul, K. Ahmed, F. M. Bui, J. M. W. Quinn, and M. A. Moni, "Heart disease prediction using supervised machine learning algorithms: Performance analysis and comparison," Computers in Biology and Medicine, vol. 136, p. 104672, Sep. 2021, doi: 10.1016/j.compbiomed.2021.104672.
[2] A. Ghasemieh, A. Lloyed, P. Bahrami, P. Vajar, and R. Kashef, "A novel machine learning model with Stacking Ensemble Learner for predicting emergency readmission of heart-disease patients," Decision Analytics Journal, vol. 7, p. 100242, Jun. 2023, doi: 10.1016/j.dajour.2023.100242.
[3] I. M. El-Hasnony, O. M. Elzeki, A. Alshehri, and H. Salem, "Multi-Label Active Learning-Based Machine Learning Model for Heart Disease Prediction," Sensors, vol. 22, no. 3, doi: 10.3390/s22031184.
[4] J. Liu, X. Dong, H. Zhao, and Y. Tian, "Predictive Classifier for Cardiovascular Disease Based on Stacking Model Fusion," Processes, vol. 10, no. 4, doi: 10.3390/pr10040749.
[5] P. Singh, P. S. Kourav, S. Mohapatra, V. Kumar, and S. K. Panda, "Human heart health prediction using GAIT parameters and machine learning model," Biomedical Signal Processing and Control, vol. 88, p. 105696, Feb. 2024, doi: 10.1016/j.bspc.2023.105696.
[6] A. H. Chen, S. Y. Huang, P. S. Hong, C. H. Cheng, and E. J. Lin, "HDPS: Heart disease prediction system," in 2011 Computing in Cardiology, 18-21 Sept. 2011, pp. 557-560.
[7] M. A. Jabbar and S. Samreen, "Heart disease prediction system based on hidden naïve bayes classifier," in 2016 International Conference on Circuits, Controls, Communications and Computing (I4C), 4-6 Oct. 2016, pp. 1-5, doi: 10.1109/CIMCA.2016.8053261.
[8] A. Gupta, R. Kumar, H. S. Arora, and B. Raman, "MIFH: A Machine Intelligence Framework for Heart Disease Diagnosis," IEEE Access, vol. 8, pp. 14659-14674, 2020, doi: 10.1109/ACCESS.2019.2962755.
[9] M. Elhoseny et al., "A New Multi-Agent Feature Wrapper Machine Learning Approach for Heart Disease Diagnosis," Computers, Materials & Continua, vol. 67, no. 1, 2021, doi: 10.32604/cmc.2021.012632.
[10] T. Obasi and M. O. Shafiq, "Towards comparing and using Machine Learning techniques for detecting and predicting Heart Attack and Diseases," in 2019 IEEE International Conference on Big Data (Big Data), 9-12 Dec. 2019, pp. 2393-2402, doi: 10.1109/BigData47090.2019.9005488.
[11] U. Nagavelli, D. Samanta, and P. Chakraborty, "Machine Learning Technology-Based Heart Disease Detection Models," Journal of Healthcare Engineering, vol. 2022, p. 7351061, Feb. 2022, doi: 10.1155/2022/7351061.
[12] S. S. Yadav, S. M. Jadhav, S. Nagrale, and N. Patil, "Application of Machine Learning for the Detection of Heart Disease," in 2020 2nd International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), 5-7 March 2020, pp. 165-172, doi: 10.1109/ICIMIA48430.2020.9074954.
[13] V. Singh, V. K. Asari, and R. Rajasekaran, "A Deep Neural Network for Early Detection and Prediction of Chronic Kidney Disease," Diagnostics, vol. 12, no. 1, p. 116, 2022, doi: 10.3390/diagnostics12010116.
[14] A. Rairikar, V. Kulkarni, V. Sabale, H. Kale, and A. Lamgunde, "Heart disease prediction using data mining techniques," in 2017 International Conference on Intelligent Computing and Control (I2C2), 23-24 June 2017, pp. 1-8, doi: 10.1109/I2C2.2017.8321771.
[15] S. Abbas et al., "Artificial intelligence framework for heart disease classification from audio signals," Scientific Reports, vol. 14, no. 1, p. 3123, Feb. 2024, doi: 10.1038/s41598-024-53778-7.
[16] B. Akkaya, E. Sener, and C. Gursu, "A Comparative Study of Heart Disease Prediction Using Machine Learning Techniques," in 2022 International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA), 9-11 June 2022, pp. 1-8, doi: 10.1109/HORA55278.2022.9799978.
[17] N. Sharma, L. Malviya, A. Jadhav, and P. Lalwani, "A hybrid deep neural net learning model for predicting Coronary Heart Disease using
Randomized Search Cross-Validation Optimization," Decision Analytics Journal, vol. 9, p. 100331, Dec. 2023, doi: 10.1016/j.dajour.2023.100331.
[18] R. C. Das, M. C. Das, M. A. Hossain, M. A. Rahman, M. H. Hossen, and R. Hasan, "Heart Disease Detection Using ML," in 2023 IEEE 13th Annual Computing and Communication Workshop and Conference (CCWC), 8-11 March 2023, pp. 0983-0987, doi: 10.1109/CCWC57344.2023.10099294.
[19] D. Mehta, A. Naik, R. Kaul, P. Mehta, and P. J. Bide, "Death by heart failure prediction using ML algorithms," in 2021 4th Biennial International Conference on Nascent Technologies in Engineering (ICNTE), 15-16 Jan. 2021, pp. 1-5, doi: 10.1109/ICNTE51185.2021.9487652.
[20] V. Selvakumar, A. Achanta, and N. Sreeram, "Machine Learning based Chronic Disease (Heart Attack) Prediction," in 2023 International Conference on Innovative Data Communication Technologies and Application (ICIDCA), 14-16 March 2023, pp. 1-6, doi: 10.1109/ICIDCA56705.2023.10099566.
[21] M. S. Manoj, K. Madhuri, K. Anusha, and K. U. Sree, "Design and Analysis of Heart Attack Prediction System Using ML," in 2023 IEEE International Conference on Integrated Circuits and Communication Systems (ICICACS), 24-25 Feb. 2023, pp. 01-06, doi: 10.1109/ICICACS57338.2023.10099819.
[22] CDC Behavioral Risk Factor Surveillance System (BRFSS). [Online]. Available: https://catalog.data.gov/dataset/cdc-behavioral-risk-factor-surveillance-system-brfss
[23] E. W. Ingwersen et al., "Machine learning versus logistic regression for the prediction of complications after pancreatoduodenectomy," Surgery, vol. 174, no. 3, pp. 435-440, Sep. 2023, doi: 10.1016/j.surg.2023.03.012.
[24] J. Jeppesen, J. Christensen, P. Johansen, and S. Beniczky, "Personalized seizure detection using logistic regression machine learning based on wearable ECG-monitoring device," Seizure: European Journal of Epilepsy, vol. 107, pp. 155-161, Apr. 2023, doi: 10.1016/j.seizure.2023.04.012.
[25] V.-H. Truong, S. Tangaramvong, and G. Papazafeiropoulos, "An efficient LightGBM-based differential evolution method for nonlinear inelastic truss optimization," Expert Systems with Applications, vol. 237, p. 121530, Mar. 2024, doi: 10.1016/j.eswa.2023.121530.
[26] G. V. D. Kumar, V. Deepa, N. Vineela, and G. Emmanuel, "Detection of Parkinson’s disease using LightGBM Classifier," in 2022 6th International Conference on Computing Methodologies and Communication (ICCMC), 29-31 March 2022, pp. 1292-1297, doi: 10.1109/ICCMC53470.2022.9753909.
[27] S. M. Ganie and P. K. Dutta Pramanik, "A comparative analysis of boosting algorithms for chronic liver disease prediction," Healthcare Analytics, p. 100313, Feb. 2024, doi: 10.1016/j.health.2024.100313.
[28] S. Rajeashwari and K. Arunesh, "Chronic disease prediction with deep convolution based modified extreme-random forest classifier," Biomedical Signal Processing and Control, vol. 87, p. 105425, Jan. 2024, doi: 10.1016/j.bspc.2023.105425.
[29] E. S. Mohamed, T. A. Naqishbandi, S. A. C. Bukhari, I. Rauf, V. Sawrikar, and A. Hussain, "A hybrid mental health prediction model using Support Vector Machine, Multilayer Perceptron, and Random Forest algorithms," Healthcare Analytics, vol. 3, p. 100185, Nov. 2023, doi: 10.1016/j.health.2023.100185.
[30] S. S. Bhat, M. Banu, G. A. Ansari, and V. Selvam, "A risk assessment and prediction framework for diabetes mellitus using machine learning algorithms," Healthcare Analytics, vol. 4, p. 100273, Dec. 2023, doi: 10.1016/j.health.2023.100273.
[31] F. M. Delpino, Â. K. Costa, S. R. Farias, A. D. P. Chiavegatto Filho, R. A. Arcêncio, and B. P. Nunes, "Machine learning for predicting chronic diseases: a systematic review," Public Health, vol. 205, pp. 14-25, Apr. 2022, doi: 10.1016/j.puhe.2022.01.007.
[32] M. M. Chowdhury, R. S. Ayon, and M. S. Hossain, "An investigation of machine learning algorithms and data augmentation techniques for diabetes diagnosis using class imbalanced BRFSS dataset," Healthcare Analytics, vol. 5, p. 100297, Jun. 2024, doi: 10.1016/j.health.2023.100297.
[33] C. A. Hassan et al., "Effectively Predicting the Presence of Coronary Heart Disease Using Machine Learning Classifiers," Sensors, vol. 22, no. 19, doi: 10.3390/s22197227.
[34] K.-V. Tompra, G. Papageorgiou, and C. Tjortjis, "Strategic Machine Learning Optimization for Cardiovascular Disease Prediction and High-Risk Patient Identification," Algorithms, vol. 17, no. 5, doi: 10.3390/a17050178.

Nacim YANES was born in Gabes, Tunisia, in 1981. He received the master's degree in computer science applied to management from the Higher Institute of Management (ISG), Tunisia, and the Ph.D. degree in computer science from the National School of Computer Science (ENSI), University of Manouba, Tunisia. He is an Assistant Professor with the Higher Institute of Management of Gabes (ISGGB), University of Gabes, Tunisia. His current research interests include AI-based healthcare recommender systems, software reuse, recommender systems in software engineering, serious games and gamification, and outcome-based education.

LEILA JAMEL received the Engineering degree in computer sciences and the Ph.D. degree in computer sciences and information systems. She was the Program Leader of the IS Program and of the ABET and NCAAA Accreditation Committees, CCIS, Princess Nourah bint Abdulrahman University (PNU), Saudi Arabia. She was the Head of Information Systems Security at the Prime Ministry of Tunisia. She is currently an Assistant Professor with the College of Computer and Information Sciences, PNU, and a Researcher with the RIADI Laboratory, Tunisia. Her research interests include business process modeling, business process management/re-engineering and quality, context-awareness in business models, data sciences, ML, process mining, e-learning, and software engineering. She was a member of the Steering and Scientific Committees of the IEEE International Conference on Cloud Computing. She is a reviewer for many international journals and conferences.

Mohamed Ezz is an Associate Professor with the Faculty of Engineering, Al Azhar University, and is currently a Visiting Professor with the College of Computer and Information Sciences, Jouf University. He received the B.Sc., M.Sc., and Ph.D. degrees in systems and computers engineering from the Faculty of Engineering, Al Azhar University. He is a member of IEEE. His areas of interest include pattern recognition, applied machine learning, application security, intrusion detection, and the semantic web. He has published 40 scientific papers in various national and international journals and conferences. He has contributed to more than 16 large-scale software projects in electronic banking (EBPP, EMV), mobile banking, and e-commerce, and he is CBAP certified.

Ayman Mohamed Mostafa is an Associate Professor with the Faculty of Computers and Informatics, Zagazig University, Egypt, and is currently an Assistant Professor with the College of Computer and Information Sciences, Jouf University, Saudi Arabia. He received the M.Sc. and Ph.D. degrees in information systems from the Faculty of Computers and Informatics, Zagazig University, Egypt. He is a member of IEEE. His areas of interest include information security, cloud computing, e-business, e-commerce, big data, and data science. He has published more than 50 scientific papers in various national and international journals and conferences. He is also an Oracle Certified Associate, an Oracle Certified Professional, and an EMC Academic Associate in Cloud Infrastructure and Services.
