
Performance Evaluation of Machine Learning Techniques

This research article evaluates various machine learning techniques for predicting heart disease, highlighting the effectiveness of the k-nearest neighbor (KNN) and random forest (RF) algorithms, which achieved a 99.04% accuracy rate. The study emphasizes the importance of early diagnosis in reducing mortality rates associated with heart disease and discusses the use of feature selection methods to enhance prediction accuracy. The findings suggest that machine learning can significantly improve heart disease prognosis and patient outcomes.

Hindawi

Computational and Mathematical Methods in Medicine


Volume 2023, Article ID 8191261, 10 pages
https://doi.org/10.1155/2023/8191261

Research Article
Performance Evaluation of Machine Learning Techniques
(MLT) for Heart Disease Prediction

Gufran Ahmad Ansari,1 Salliah Shafi Bhat,2 Mohd Dilshad Ansari,3 Sultan Ahmad,4,5 Jabeen Nazeer,4 and A. E. M. Eljialy6

1 Department of Computer Science, Dr. Vishwanath Karad MIT World Peace University, Pune 411038, India
2 Department of Computer Applications, B.S Abdur Rahman Institute of Science & Technology, Chennai 600048, India
3 Guru Nanak University, Hyderabad 501506, India
4 Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, P.O. Box 151, Alkharj 11942, Saudi Arabia
5 University Center for Research and Development (UCRD), Department of Computer Science and Engineering, Chandigarh University, Gharuan, Mohali, 140413 Punjab, India
6 Department of Information Systems, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, P.O. Box 151, Alkharj 11942, Saudi Arabia

Correspondence should be addressed to Sultan Ahmad; [email protected]

Received 23 January 2023; Revised 9 March 2023; Accepted 10 March 2023; Published 29 May 2023

Academic Editor: Yaser Ahangari N.

Copyright © 2023 Gufran Ahmad Ansari et al. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.

Heart disease (HD) is the leading cause of death worldwide today. The heart is recognised as the second-most significant organ behind the brain. Early diagnosis improves the chance of successful treatment and can significantly reduce the risk of death. In this paper, we propose a method to predict heart disease. We used various machine learning algorithms (MLA), namely, logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), Naive Bayes (NB), random forest (RF), and decision tree (DT). With the testing data set, we evaluated each model's accuracy in heart disease prediction. The random forest and k-nearest neighbor approaches perform better than the other models: with a 99.04% accuracy rate, they provide the best match to the data. Six feature selection algorithms were used for the performance evaluation matrix. Accuracy, precision, recall, F measure, and the Matthews correlation coefficient (MCC) are used to evaluate the models.

1. Introduction

One of the most difficult and severe illnesses affecting individuals worldwide is heart disease. The heart, which regulates blood flow throughout the body, is a crucial component of the human body, and heart disease shortens the human lifespan. HD affects around 15 million people each year [1]. Heart disease is one of the top causes of death in the contemporary world. Heart illnesses are caused by many risk factors, such as high blood pressure, high cholesterol, diabetes, and irregular heartbeats. Doctors, researchers, and scientists are working to identify the causes of heart disease in its early stages to make human life better [2]. Due to the limited accessibility of diagnostic tools, the lack of specialists, and other resources that affect the accurate diagnosis and treatment of heart patients, heart disease diagnosis and therapy are particularly difficult in developing countries [3]. Since cardiac illness has a complex character, it requires cautious management. Regression, KNN, SVM, NB, and DT are used to categorise the severity of the condition. Machine learning (ML) has proven useful for decision-making and prediction from the vast quantity of data generated by the healthcare business [4]. Around 17.9 million people died in 2016
7396, 2023, 1, Downloaded from https://onlinelibrary.wiley.com/doi/10.1155/2023/8191261, Wiley Online Library on [08/01/2025]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
which is 31% of all deaths worldwide. Among them, heart attack and stroke account for 85% of the deaths. Patients are facing more cardiac problems due to a variety of factors, including lifestyle choices like smoking, eating poorly, and having high blood pressure [5]. The RF and KNN approaches outperform the other methods. Compared to the other algorithms, the k-nearest neighbor method and RF offer the best match to the data with a 99.04% accuracy rate. Based on symptoms such as pulse rate, age, gender, asthma, smoking, and blood pressure, heart disease is predicted with accuracy [6]. Additionally, many researchers have recently created machine learning-based methods for forecasting the prevalence of heart illnesses [7]. The categorization and prediction for the diagnosis of cardiac disease have been the subject of numerous studies, and a variety of machine learning models are being applied. Using a simulated classifier, the patients with high and low risks of congestive heart failure are displayed [8]. Shortness of breath, muscular weakness, swollen feet, and exhaustion are among the indications and symptoms of heart disease [9]. Heart illness can be fatal and should not be ignored. Males are more likely than females to suffer heart disease, according to Harvard Health Publishing [10]. We gathered a dataset for the research of heart disease from the University of California, Irvine (UCI). Using machine learning techniques, the UCI database is used to identify heart disease. Using NB, DT, LR, and the random forest algorithm, researchers demonstrated the accuracy of the random forest algorithm at 90.16 percent. The accuracy achieved with logistic regression is 89.06 percent, whereas the accuracy achieved without using logistic regression is 87.77 percent [11, 12]. Researchers applied the random forest and nearest neighbor algorithms for improving accuracy. A detailed analysis of heart disease prediction using machine learning was published in 2020. As a result, the annual decline in heart disease deaths has been significant. However, it is really helpful to utilize machine learning techniques to forecast results from existing data. This research employs a classification-based machine learning technique to anticipate the risk of heart disease from the risk factors. It also aims to improve the accuracy of heart disease risk predictions.

1.1. Motivation of Study. There are several diseases that affect people everywhere in the world. Today, HD is a serious problem that has a big impact on mortality in both men and women. 17.9 million deaths from heart disease are reported annually by the WHO, which accounts for 31% of all deaths. Although there are machine learning tools and approaches available, there are no models that are now suitable for quickly and accurately predicting the disease. There is currently no reliable automated system that can improve heart disease prognosis or reduce its consequences. Because of this, using machine learning algorithms to lessen the effects of the disease would be a significant accomplishment. It might improve the quality of life for heart patients while also significantly delaying the onset of the condition. The major goal of this research is to build a model to predict the presence of heart disease. Additionally, the goal of this research is to determine the classification algorithm that will predict the above sickness with the highest level of accuracy. This research is supported by a comparative analysis of logistic regression, KNN, support vector machine, Naive Bayes, decision tree, and random forest for the prediction of heart disease, and the most accurate algorithm is considered to be the better one. The paper is organized as follows: Section 1 is the Introduction. Section 2 discusses related work with existing methods. Section 3 discusses the flow chart of the proposed framework. Section 4 describes data collection and methodology. Section 5 is about results and analysis. Finally, Section 6 ends with a conclusion as well as a future enhancement.

2. Related Work

HD is a common disease that affects many people during middle age or old age. A wide variety of issues related to heart disease can be solved using a machine learning approach. Marimuthu et al. conducted a review for the prediction of heart disease using a data analytical technique. For predicting cardiac disease, machine learning techniques (MLT) have included DT, NB, KNN, and SVM [13]. A comprehensive review of heart disease prediction using machine learning was written by Battula et al. They have created a table that contrasts every MLT used to predict heart disease since 2012 [14]. Comparative analysis of cardiac disorders using MLA has been done in numerous research articles. The literature evaluation has shown the classification effectiveness of various machine learning algorithms on the dataset for heart disease [15]. A proposed decision support system based on a logistic regression classifier for categorising heart disease attained a classification accuracy of 77%. Machine learning is useful for a variety of problems. One use of this method is predicting a dependent variable from the values of the independent variables. Due to its extensive data resources, which are difficult to manage manually, the health sector has advanced analytics. Even in developed economies, heart disease has been found to be one of the leading causes of death. Heart disease deaths are caused in part because the risks are not identified or are detected much later than they ought to be. However, using machine learning techniques can help resolve this problem and provide early risk predictions. Support vector machines (SVM), DT, regression, and NB classifiers are a few of the methods utilised for these prediction issues. With 92.1% accuracy, SVM was found to be the strongest predictor followed by neural networks (91%) and decision trees (89.6%) diabetes, hypertension [16]. It was believed that gender and smoking were risk factors for heart disease [17]. Machine learning techniques such as DT, NB, and associative classification are effective at predicting cardiac disease according to analytical research. Compared to standard classifiers, associative classification produces higher accuracy and flexibility, especially when dealing with unstructured data. Decision tree classifiers are easy to use and precise, according to a comparison of classification methods. The best algorithm was discovered to be Naive Bayes, which was then followed by neural networks and decision trees [18]. Additionally used for disease prediction

are artificial neural networks. Supervised networks have been utilised for diagnosis, and the back propagation algorithm can be used to train them. The test results have demonstrated satisfactory accuracy. It introduced the Intelligent Heart Disease Prediction System (IHDPS) and techniques like DT, NB, and neural networks (NN) [19]. The authors' experiments showed that the NB model had the highest prediction accuracy (86.1%). DT came in third with a score of 80.4%, and NN came in second with a score of 86.12% for right prediction. The majority of high-accuracy reduction research employs a mixed method that involves categorization algorithms. Our research, which is summarized here, is aimed at improving the classification of algorithms by using machine learning techniques. Both the effectiveness of these classification algorithms and the accuracy of heart disease prediction are enhanced. Research on LR, KNN, SVM, NB, DT, and RF is carried out, and the outcomes are evaluated. Applying feature selection improves the outcomes much more. The results are used to evaluate how effectively these classifiers may be used in the healthcare sector.

3. Flow Chart of Proposed Framework

The proposed flow chart for the entire experiment from data collection to result development is shown in Figure 1. Data is first preprocessed after being collected from sources (as described earlier).
Preprocessing data is used to reduce bias, noise, and inaccuracy. Following the data preprocessing stage, the database is divided into training and testing sets.
In addition, many machine learning technologies are utilised to train and test the data. The technique is finished with the generation of accurate results that are compared across various machine learning techniques.

4. Data Collection and Methodology

The purpose of this research and the process followed are briefly covered in the following subsections.

4.1. Data Set. The researchers analyze the Cleveland heart disease dataset from the UCI Machine Learning Repository. The dataset has 12 attributes and 520 instances. The dataset's description can be found in Table 1. This proposed research used the dataset to create a machine-learning-based method for diagnosing heart problems. The features are age, gender, Trestbps, Chol, fbs, Thalch, smoker, CP, skin cancer, BMI, blood pressure, and outcome. The main class has two values, "False" and "True," which represent the absence or presence of any heart disease, respectively.

4.2. Data Preprocessing. When using machine learning algorithms, cleaning the data is crucial for maximizing precision and effectiveness. For MLT to represent data effectively and be trained and validated properly, the data must first be preprocessed. The standard scaler guarantees that each feature has a mean of 0 and a variance of 1, so that all features are on a comparable scale. The data is modified similarly by the MinMax scaler so that all features fall between 0 and 1. Rows with missing feature values are deleted from the dataset. This research implemented each of these data preparation methods.

4.2.1. Data Cleaning. The data were acquired as unprocessed information. As a result, a variety of methods has been used to clean the data, including eliminating duplicates and irrelevant information.

4.3. Feature Selection. Feature selection, a type of dimensionality reduction, chooses the most pertinent information in order to categorise and predict the disease. In many well-known classification applications, the feature selection process is one of the fundamental elements [20]. Before classifying the data, more relevant features must be chosen in order to produce a better result in accuracy, and unnecessary features must be eliminated [21]. In order to classify the input data, the most relevant feature is selected. This feature selection approach is frequently used in all application domains because it removes duplicate data without sacrificing any information. As a result, this technique is used with a variety of algorithms. The following reasons support the implementation of the feature selection technique:

(i) Reduced training time
(ii) It facilitates the identification of the data by the algorithm
(iii) The removal of unnecessary data from high-dimensional space
(iv) By lowering the number of variables, the output data can be enhanced

4.3.1. Correlation Matrix. When creating a useful dataset analysis, it is frequently simpler to take the relationship between variables into consideration. A statistic known as correlation determines how closely two variables move in relationship to one another. Two variables are considered to be positively correlated when they move in the same direction and negatively correlated when they move in the opposite direction. The correlation map for the dataset is shown in Figure 2. The dataset is evaluated, and a heat map is created to show the correlation between the values. From this, it can be seen that age, gender, and Thalch are the characteristics that most strongly correlate with the target variable. The correlation between age and outcome is 0.11 in Figure 2, which is greater than that of the other attributes.

4.4. K-Fold and Data Splitting. Researchers and practitioners frequently utilize the K-fold cross-validation method to build models and get rid of information bias. With a k value of 10, the K-fold cross-validation method has been applied. Ten equal-sized partitions of the full dataset were created at random. In each iteration, only one partition was utilised to validate (test) the model, and the remaining nine partitions were used as training data. Each of the 10 partitions was used as the validation data exactly once during the course of the entire process' ten iterations. The accumulation function was used to combine the results

[Flow chart nodes: Start; Dataset; Data pre-processing; File missing value; Training data set; Testing data set; K-fold cross validation technique; Apply machine learning (LR, KNN, SVM, NB, DT); Performance evaluation; Compare; Similar patients model found (Yes/No); Prediction of disease; Accuracy of model updated; End]

Figure 1: Proposed flow chart for predicting heart disease.
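
The two scaling steps named in Section 4.2 can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the authors' implementation: the function names `standardize` and `min_max_scale` and the sample blood pressure readings are hypothetical.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Rescale a feature to mean 0 and variance 1 (z-scores)."""
    mu = fmean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - mu) / sigma for v in values]

def min_max_scale(values):
    """Rescale a feature linearly so that it falls between 0 and 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: map everything to 0
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical resting blood pressure readings (mm Hg), not from the paper.
trestbps = [120, 130, 140, 150, 160]
z = standardize(trestbps)
u = min_max_scale(trestbps)
```

In practice the scaling statistics would be computed on the training data only and then reused for the test data, so that no information from the test set leaks into training.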

Table 1: Features of the dataset.

S.no | Attribute | Description
1 | Age | Age in years
2 | Gender | Whether the person is male or female
3 | Trestbps | Resting blood pressure on admission to hospital (mm Hg)
4 | Chol | Serum cholesterol (mg/dl)
5 | Fbs | Fasting blood sugar above 120 mg/dl (1 indicates true, 0 indicates false)
6 | Thalch | Maximum heart rate achieved
7 | Smoker | Whether the person is a smoker or not
8 | CP | Whether the person has chest pain or not
9 | Skin cancer | Whether the person is suffering from skin cancer or not
10 | BMI | Body mass index of the person
11 | Blood pressure | Whether the person has a blood pressure problem or not
12 | Outcome | 0 denotes the absence of heart disease; 1 denotes cardiovascular disease
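
The 10-fold procedure of Section 4.4 can be sketched as follows. This is an illustrative sketch under stated assumptions: the helper name `k_fold_indices` and the fixed seed are hypothetical, and the paper does not specify how its "accumulation function" combines the per-fold results (averaging the fold scores is one common choice).

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random partitioning
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]  # one fold validates the model
        # The other k - 1 folds together form the training data.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

# 520 records as reported in Section 4.1; each fold then holds 52 records.
splits = list(k_fold_indices(520, k=10))
```

Every record appears in exactly one validation fold, so with k = 10 each iteration trains on nine folds (468 records) and validates on the remaining one (52 records).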

of all iterations. To match the performance of both training and testing datasets, the issue of overfitting and underfitting has been reduced in the dataset. The advantage of this strategy was that it eliminated bias from the data when creating ML models to produce accurate results. In order to validate the results, each bin of testing data has been used exactly once. All data samples are used for both training and testing. The dataset is split into 70% for testing and 30% for training, and the analysis is carried out using the method identified below.

4.5. Apply Machine Learning Technique. Using machine learning classification, patients with heart disease and healthy people are separated into groups. The entire experimental work was performed using Anaconda 2020, a free and open-source Python distribution used for data science and machine learning in scientific computing, including the preprocessing of large amounts of data, predictive analysis, and other applications. It was developed to simplify package management and distribution. Together with Python, Spyder is used as an integrated development environment for programming tasks and calculations (3.7.6). A machine is trained using machine learning to take information from the data and predict the results of new sets of information. As a result, we now have training and test sets of data. After the machine has been trained using the training data set, the results are verified using the test data set. Software is created as part of the machine learning model that we build. Supervised learning and unsupervised learning are the two subcategories of machine learning. In supervised learning, the computer receives instruction (mentoring), but in unsupervised learning, the machine picks up skills on its own (self-study). The examples which follow will help us understand how the two vary. Supervised learning (SL) algorithms:

(i) The machine must determine if an incoming mail is spam or not given the data of emails designated by users as trash or not

[Heat map: pairwise correlation coefficients for Age, Gender, Trestbps, Chol, Fbs, Thalch, Smoker, CP, Skin cancer, BMI, and Blood pressure; colour scale from −0.2 to 1.0]

Figure 2: Heat Map of a data set.
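
The correlation statistic behind a heat map such as Figure 2 can be sketched as below. The text of Section 4.3.1 does not name the coefficient, so this sketch assumes the Pearson coefficient; the helper names `pearson` and `correlation_matrix` and the toy columns are hypothetical and are not taken from the paper's dataset.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Map each ordered pair of column names to its Pearson coefficient."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

# Hypothetical toy columns, not taken from the paper's dataset.
toy = {
    "age": [40, 50, 60, 70],
    "thalch": [180, 170, 160, 150],  # falls as age rises
    "chol": [200, 220, 240, 260],    # rises with age
}
corr = correlation_matrix(toy)
```

A coefficient near +1 means the two variables move in the same direction, a coefficient near −1 means they move in opposite directions, and the diagonal of the matrix is always 1.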

(ii) The machine should be able to determine whether a new patient has cancer based on the data of individuals who have been diagnosed with the disease
(iii) The machine must predict the cost of a property of a specific size given data on the costs of homes of varying sizes in a certain area

And the following are examples of unsupervised learning algorithms:

(i) Finding patterns in the data using the scientific data
(ii) Noise reduction in the audio input
(iii) Obtaining song background music for the chorus

In short, SL uses labeled data, whereas unsupervised learning uses unlabelled data. The machine learning algorithms considered are listed below: decision tree, Naive Bayes, support vector machine, logistic regression, k-nearest neighbor, and random forest.

4.5.1. Logistic Regression. Supervised learning problems, which include classification and regression problems, can be resolved using the technique of logistic regression. The range of logistic regression's result is between 0 and 1. The maximum likelihood estimate is the foundation of this technique. In logistic regression, the Sigmoid function, whose output is interpreted as the probability of a binary outcome, is used as an activation function [22]. In equation (1), it is shown as

$$q = \left(1 + e^{-(c + dx)}\right)^{-1}, \tag{1}$$

where q is the probability, c and d are the parameters of the model, and x is a factor.

4.5.2. K-Nearest Neighbor. It extracts knowledge on the basis of the Euclidean distance E(y_i, y_j) between samples and the majority of the k nearest neighbors.

$$E(y_i, y_j) = \sqrt{(y_{i,1} - y_{j,1})^2 + \cdots + (y_{i,m} - y_{j,m})^2}. \tag{2}$$

4.5.3. Support Vector Machine. Models are described as finite-dimensional vector spaces, where each dimension denotes a "feature" of a particular object. It has been demonstrated to be a successful strategy in high-dimensional space issues. Due to its computational effectiveness on huge datasets, this technique is typically utilised in sentiment analysis and the classification of data.

4.5.4. Naïve Bayes. The Naïve Bayes algorithm classifies the dataset using the Bayes rule. Based on the probability observed in the training data, the classification is made using all the features. It is a supervised learning algorithm. The classification is made based on the probability

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}, \tag{3}$$

where P(A|B) is the conditional probability of A given B, P(B|A) is the conditional probability of B given A, P(A) is the probability of event A, and P(B) is the probability of event B.
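
Equations (1) to (3) can be written as small functions, which may help when checking them by hand. This is a sketch, not the authors' code: it reads equation (1) as the logistic sigmoid, equation (2) as the Euclidean distance over all m features, and equation (3) as Bayes' rule. The function names and the toy training points are hypothetical.

```python
from math import exp, sqrt

def logistic(x, c, d):
    """Equation (1): q = 1 / (1 + exp(-(c + d*x))). Not hardened against overflow."""
    return 1.0 / (1.0 + exp(-(c + d * x)))

def euclidean(y_i, y_j):
    """Equation (2): Euclidean distance between two feature vectors."""
    if len(y_i) != len(y_j):
        raise ValueError("vectors must have the same length")
    return sqrt(sum((a - b) ** 2 for a, b in zip(y_i, y_j)))

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training samples nearest to `query`."""
    nearest = sorted(zip(train_x, train_y), key=lambda s: euclidean(s[0], query))[:k]
    votes = [label for _, label in nearest]
    # Ties are broken arbitrarily; an odd k avoids them for two classes.
    return max(set(votes), key=votes.count)

def bayes_posterior(p_b_given_a, p_a, p_b):
    """Equation (3): P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Hypothetical toy data: two features per sample, binary labels.
train_x = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
train_y = [0, 0, 1, 1]
label = knn_predict(train_x, train_y, (0.95, 1.0), k=3)
```

For the toy query, two of the three nearest neighbours carry label 1, so the majority vote returns 1.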

[Histogram panels: Age, Gender, Trestbps, Fbs, Thalch, Smoker, CP, Blood pressure, Outcome, Chol_0, Chol_1, Skin cancer_0, Skin cancer_1, BMI_0, BMI_1]

Figure 3: Histogram of attributes.

4.5.5. Decision Tree. In these supervised machine learning algorithms, each leaf node has a class label, and each branch shows the outcome of a test on a specific variable. At the top of the tree is the parent node, also referred to as the root node. To identify a separate category based on the data collected, decision-makers can choose the best option and make their way through a decision tree from root to leaf [23]. DT can handle constant and continuous parameters. A major drawback of the decision tree is that it can overfit.

4.5.6. Random Forest. One algorithm for classification is the random forest approach. It is trained using the bagging process. For this supervised learning algorithm, the mean squared error is given in

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (f_i - y_i)^2, \tag{4}$$

where N is the occurrence count, f_i is the model's output, and y_i represents the instances' true values.

4.6. Performance Evaluations. A comparison of several categorization techniques has been done using the Cleveland dataset. The performance metrics Accuracy, Precision, Recall, F-Measure, and MCC are all explained by Equations (5–9). These evaluation measures are utilised to contrast the efficiency of our suggested strategy and possible alternatives.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100, \tag{5}$$

$$\text{Precision} = \frac{TP}{TP + FP} \times 100, \tag{6}$$

$$\text{Recall} = \frac{TP}{TP + FN} \times 100, \tag{7}$$

$$\text{F Measure} = \frac{2 \times (\text{Precision} \times \text{Recall})}{\text{Precision} + \text{Recall}} \times 100, \tag{8}$$

$$\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \times 100. \tag{9}$$

5. Result and Analysis

Numerous classification models and their statistical analyses are provided in this section of the research. On the Cleveland heart disease data, we assess the effectiveness of LR, KNN, SVM, NB, RF, and DT in the first stage. In this research, we investigated different machine learning algorithms for the prediction of cardiac disease using an

[Bar chart: Accuracy (%) on the vertical axis (0 to 90) against Algorithms on the horizontal axis: Logistic regression, KNN, SVM, Naive bayes, Decision tree, Random forest]

Figure 4: Accuracy of models for different algorithms.

[Scatterplot matrix over Age, Gender, Trestbps, Fbs, Thalch, Smoker, CP, Blood pressure, Chol_0, Chol_1, Skin cancer_0, Skin cancer_1, BMI_0, and BMI_1]

Figure 5: Scatterplot in a data set representation.

experimental and analytical techniques. Figure 3 displays the histogram that was created in addition to the plots that depict the distribution of each dataset attribute.

5.1. Model Accuracy. Twelve features are used in the development of the prediction models, and the modelling techniques' accuracy is evaluated. Figure 4 compares multiple algorithms and shows the accuracy numbers so that we can better understand the variations. MLA style depends on their consistency. The comparison shows that RF and KNN are more accurate than the other models. The bar graph illustrates the accuracy of the various algorithms.

Six machine learning algorithms were used in this paper for predicting heart disease. The relationship between the features used in the dataset is depicted in the scatterplot in

Figure 5. For each dot's location along the X and Y axes, the values that are utilised to quantify a specific data point are displayed.
In machine learning, the performance of the algorithms is evaluated using a confusion matrix. In a tabular arrangement, the rows reflect the actual values and the columns the predicted values. In Figures 6–11, these classifier confusion matrices are displayed. The performance assessment of ML models is checked using the confusion matrix to look for mistakes or miscalculations while predicting heart disease. Based on four factors, namely true positive (TP), true negative (TN), false positive (FP), and false negative (FN), it compares the actual results with the predicted ones. The different ML classifiers have been analyzed using statistical metrics including accuracy, precision, recall, F measure, and MCC, all derived from the confusion matrices.
Additionally, as shown in Figure 12, certain other statistical measures are also calculated. The machine learning classifiers are evaluated using these parameters. Accuracy, precision, recall, F measure, and MCC are some of the different parameters.

Logistic regression confusion matrix
         0    1
    0   37    3
    1    4   60
Figure 6: Logistic algorithms.

K nearest neighbors confusion matrix
         0    1
    0   35    5
    1    1   63
Figure 7: K-nearest neighbor.

Support vector machine confusion matrix
         0    1
    0   39    1
    1    1   63
Figure 8: Support vector machine.

Naive bayes confusion matrix
         0    1
    0   35    5
    1    4   60
Figure 9: Naïve Bayes algorithm.

Decision tree classifier confusion matrix
         0    1
    0   38    2
    1    2   62
Figure 10: Decision tree algorithm.

Random forest confusion matrix
         0    1
    0   40    0
    1    1   63
Figure 11: Random forest algorithm.

5.2. Comparative Analysis. Table 2 compares the effectiveness of our proposed framework with a variety of relevant literature in terms of the methodologies employed, the dataset, and the analysis. Most cardiac markers are consistent throughout all studies conducted for comparison with the suggested study. It was discovered that our well-planned approach produced positive outcomes for several evaluation measures, especially accuracy for the prediction of heart disease. The employment of techniques like data imputation for handling missing values, the scatterplot method for identifying and replacing outliers, and the transformation method for standardizing and normalizing data has led to outcomes superior to those of other relevant research. When creating the proposed framework, the K-fold cross-validation technique was used to get results that were more reliable than those from similar research.
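The K-fold procedure mentioned above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the helper names `k_fold_indices`, `cross_val_accuracy`, and `majority_class` are hypothetical, the data are synthetic, and the majority-class predictor is only a stand-in for the classifiers compared in the paper.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    if k < 2 or k > n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_val_accuracy(fit_predict, X, y, k=10, seed=0):
    """Average the per-fold accuracy of fit_predict(X_train, y_train, X_test)."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k, seed):
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        X_test = [X[i] for i in test_idx]
        y_pred = fit_predict(X_train, y_train, X_test)
        correct = sum(p == y[i] for p, i in zip(y_pred, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def majority_class(X_train, y_train, X_test):
    """Toy stand-in classifier: always predict the most frequent training label."""
    label = max(set(y_train), key=y_train.count)
    return [label] * len(X_test)


# Synthetic data (assumption): 20 samples, 12 positive and 8 negative.
X = [[i] for i in range(20)]
y = [1] * 12 + [0] * 8
folds = list(k_fold_indices(len(X), k=10))
mean_acc = cross_val_accuracy(majority_class, X, y, k=10)
print(f"mean 10-fold accuracy: {mean_acc:.2f}")
```

Every sample lands in exactly one test fold, so each observation is used for both training and evaluation across the ten rounds, which is what makes the averaged score less dependent on a single split.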

Performance evaluation of algorithms

            Logistic     K-nearest   Support vector   Naïve bayes   Decision tree   Random
            regression   neighbour   machine                                        forest
            1            2           3                4             5               6
Accuracy    93           99          98               91            95              99
Precision   94           97          94               89            92              97
Recall      93           99          98               91            95              99
F measure   93           97          93               86            79              97
MCC         83           93          83               85            69              93

Figure 12: Performance evaluation of algorithms.
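As a sketch of how such measures follow from a confusion matrix, the snippet below applies the standard binary definitions to the random forest matrix of Figure 11. The function name `binary_metrics` is hypothetical, and the layout (rows as actual classes, columns as predicted classes, class 1 as the positive class) is an assumption based on the description in the text. The paper does not state which averaging convention underlies Figure 12, so the values need not coincide with that figure.

```python
import math


def binary_metrics(cm):
    """Compute metrics from cm = [[TN, FP], [FN, TP]].

    Assumed layout: rows are actual classes, columns are predicted
    classes, and class 1 is the positive class.
    """
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "mcc": mcc,
    }


# Random forest confusion matrix as printed in Figure 11.
rf_cm = [[40, 0], [1, 63]]
metrics = binary_metrics(rf_cm)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With 103 of 104 test cases classified correctly, the accuracy evaluates to 103/104, which rounds to the 99.04% reported for random forest.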

Table 2: Comparison with the existing system.

Authors               Techniques used            Dataset                                      Accuracy
Chauhan et al. [24]   RF, LR, SVM                PIDD data set                                78%
Pouriyeh et al. [25]  KNN, SVM, RF, NB           Cleveland data                               89%
Kedia et al. [26]     LR, KNN, SVM, DT           UCI data                                     90%
Atallah et al. [27]   SVM, DT, KNN, RF           Cleveland data                               87%
Proposed method       LR, KNN, SVM, NB, DT, RF   Cleveland Heart from UCI’s machine learning  99.04%

6. Conclusions

The main contribution of this work is a comparison of various ML techniques for the early detection of heart disease. Preprocessing techniques were used to enhance the dataset’s quality, with the primary objectives being the handling of corrupted and missing values and the removal of outliers in order to predict the illness. Additionally, we used a variety of machine learning techniques, and the outcomes were compared using various statistical metrics. The experiments split the data in a 70 : 30 ratio for training and testing. In this study, we apply 10-fold cross-validation to a number of machine learning methods, and we find that random forest and k-nearest neighbor reach 99.04% accuracy, the highest among the algorithms compared. Future work can be carried out using various combinations of machine learning methodologies to enhance prediction techniques. New feature selection approaches can also be developed to better understand the critical features and to increase the precision of heart disease prediction.

Data Availability

The data used to support the findings of this study are available from the first author upon request ([email protected]).

Conflicts of Interest

The authors declare no conflict of interest.

Authors’ Contributions

All authors have contributed equally to this work and have also read and agreed to submit the current version of the manuscript to this journal.

Acknowledgments

This study is supported via funding from the Prince Sattam Bin Abdulaziz University, project number (PSAU/2023/R/1444).

References

[1] K. Battula, R. Durgadinesh, K. Suryapratap, and G. Vinaykumar, “Use of machine learning techniques in the prediction of heart disease,” in 2021 International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME), pp. 1–5, Mauritius, Mauritius, 2021.
[2] S. Mohan, C. Thirumalai, and G. Srivastava, “Effective heart disease prediction using hybrid machine learning techniques,” IEEE Access, vol. 7, pp. 81542–81554, 2019.
[3] S. Ghwanmeh, A. Mohammad, and A. Al-Ibrahim, “Innovative artificial neural networks-based decision support system for heart diseases diagnosis,” Journal of Intelligent Learning Systems and Applications, vol. 5, no. 3, pp. 176–183, 2013.
[4] V. Vijayaganth and M. Naveenkumar, “Smart sensor based prognostication of cardiac disease prediction using machine learning techniques,” in Applications of Machine Learning in Big-Data Analytics and Cloud Computing, pp. 63–80, River Publishers, 2022.
[5] H. B. Kibria and A. Matin, “The severity prediction of the binary and multi-class cardiovascular disease − a machine learning-based fusion approach,” Computational Biology and Chemistry, vol. 98, article 107672, 2022.
[6] M. Gandhi and S. N. Singh, “Predictions in heart disease using techniques of data mining,” in 2015 International Conference on Futuristic Trends on Computational Analysis and Knowledge Management (ABLAZE), pp. 520–525, Greater Noida, India, 2015.
[7] S. S. Bhat, V. Selvam, G. A. Ansari, M. D. Ansari, and M. H. Rahman, “Prevalence and early prediction of diabetes using machine learning in North Kashmir: a case study of district Bandipora,” Computational Intelligence and Neuroscience, vol. 2022, Article ID 2789760, 12 pages, 2022.
[8] P. Melillo, N. de Luca, M. Bracale, and L. Pecchia, “Classification tree for risk assessment in patients suffering from congestive heart failure via long-term heart rate variability,” IEEE Journal of Biomedical and Health Informatics, vol. 17, no. 3, pp. 727–733, 2013.
[9] S. Ahmad, S. Khan, M. Fahad AlAjmi et al., “Deep learning enabled disease diagnosis for secure internet of medical things,” Computers, Materials & Continua, vol. 73, no. 1, pp. 965–979, 2022.
[10] A. A. Ahdal, M. Rakhra, S. Badotra, and T. Fadhaeel, “An integrated machine learning techniques for accurate heart disease prediction,” in 2022 International Mobile and Embedded Technology Conference (MECON), pp. 594–598, Noida, India, 2022.
[11] A. Noor, L. Ali, H. T. Rauf, U. Tariq, and S. Aslam, “An integrated decision support system for heart failure prediction based on feature transformation using grid of stacked autoencoders,” Measurement, vol. 205, article 112166, 2022.
[12] Y. A. Nanehkaran, Z. Licai, J. Chen et al., “Anomaly detection in heart disease using a density-based unsupervised approach,” Wireless Communications and Mobile Computing, vol. 2022, Article ID 6913043, 14 pages, 2022.
[13] M. Marimuthu, M. Abinaya, K. S. Hariesh, K. Madhankumar, and V. Pavithra, “A review on heart disease prediction using machine learning and data analytics approach,” International Journal of Computer Applications, vol. 181, no. 18, pp. 20–25, 2018.
[14] B. L. Y. Agbley, J. P. Li, A. U. Haq et al., “Federated Fusion of Magnified Histopathological Images for Breast Tumor Classification in the Internet of Medical Things,” in IEEE Journal of Biomedical and Health Informatics, 2023.
[15] I. M. El-Hasnony, O. M. Elzeki, A. Alshehri, and H. Salem, “Multi-label active learning-based machine learning model for heart disease prediction,” Sensors, vol. 22, no. 3, p. 1184, 2022.
[16] S. Goel, A. Deep, S. Srivastava, and A. Tripathi, “Comparative analysis of various techniques for heart disease prediction,” in 2019 4th International Conference on Information Systems and Computer Networks (ISCON), pp. 88–94, Mathura, India, 2019.
[17] R. Poonguzhali, S. Ahmad, P. T. Sivasankar et al., “Automated brain tumor diagnosis using deep residual u-net segmentation model,” Computers, Materials & Continua, vol. 74, no. 1, pp. 2179–2194, 2023.
[18] F. Ma, T. Sun, L. Liu, and H. Jing, “Detection and diagnosis of chronic kidney disease using deep learning-based heterogeneous modified artificial neural network,” Future Generation Computer Systems, vol. 111, pp. 17–26, 2020.
[19] Y. A. Nanehkaran, Z. Licai, J. Chen et al., “Diagnosis of chronic diseases based on patients’ health records in iot healthcare using the recommender system,” Wireless Communications and Mobile Computing, vol. 2022, Article ID 5663001, 14 pages, 2022.
[20] A. Tsanas, “Relevance, redundancy, and complementarity trade-off (RRCT): a principled, generic, robust feature-selection tool,” Patterns, vol. 3, no. 5, article 100471, 2022.
[21] A. Alharbi, K. Equbal, S. Ahmad, H. U. Rahman, and H. Alyami, “Human gait analysis and prediction using the Levenberg-Marquardt method,” Journal of Healthcare Engineering, vol. 2021, Article ID 5541255, 11 pages, 2021.
[22] S. Ahmad, H. A. Abdeljaber, J. Nazeer, M. Y. Uddin, V. Lingamuthu, and A. Kaur, “Issues of clinical identity verification for healthcare applications over mobile terminal platform,” Wireless Communications and Mobile Computing, vol. 2022, Article ID 6245397, 10 pages, 2022.
[23] G. A. Ansari and S. S. Bhat, “Exploring a link between fasting perspective and different patterns of diabetes using a machine learning approach,” Educational Research, vol. 12, no. 2, pp. 500–517, 2022.
[24] T. Chauhan, S. Rawat, S. Malik, and P. Singh, “Supervised and unsupervised machine learning based review on diabetes care,” in 2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS), pp. 581–585, Coimbatore, India, 2021.
[25] S. Pouriyeh, S. Vahid, G. Sannino, G. De Pietro, H. Arabnia, and J. Gutierrez, “A comprehensive investigation and comparison of machine learning techniques in the domain of heart disease,” in 2017 IEEE Symposium on Computers and Communications (ISCC), pp. 204–207, Heraklion, Greece, 2017.
[26] S. Kedia and M. Bhushan, “Prediction of mortality from heart failure using machine learning,” in 2022 2nd International Conference on Emerging Frontiers in Electrical and Electronic Technologies (ICEFEET), pp. 1–6, Patna, India, 2022.
[27] R. Atallah and A. Al-Mousa, “Heart disease detection using machine learning majority voting ensemble method,” in 2019 2nd International Conference on New Trends in Computing Sciences (ICTCS), pp. 1–6, Amman, Jordan, 2019.
