Heart Disease Prediction Technical Seminar Report
BACHELOR OF ENGINEERING
IN
This is to certify that the seminar titled “Heart Disease Detection” is the
bona fide work carried out by Pranav Raj B (160121729050), a student of
B.E. (AIML) of Chaitanya Bharathi Institute of Technology (A), Hyderabad, affiliated
to Osmania University, Hyderabad, Telangana (India), during the period 2021-2025,
submitted in partial fulfillment of the requirements for the award of the degree of
Bachelor of Engineering - Artificial Intelligence and Machine Learning, and that
the seminar has not previously formed the basis for the award of any other degree,
diploma, fellowship or any other similar title.
ABSTRACT
Heart disease is one of the leading causes of death globally. Early detection and
timely treatment play a crucial role in preventing fatal outcomes. Traditional
diagnostic methods are often time-consuming and may lack predictive capabilities. In
recent years, Machine Learning (ML) has emerged as a powerful tool for predictive
healthcare analytics.
This seminar report presents a detailed study on heart disease prediction using
various machine learning algorithms including Logistic Regression, Decision Trees,
Random Forest, K-Nearest Neighbors (KNN), and Support Vector Machines (SVM).
The model is trained and evaluated on the Cleveland Heart Disease dataset, which
consists of patient health parameters such as age, sex, chest pain type, blood pressure,
cholesterol levels, and more.
LIST OF CONTENTS
Title Page
Certificate
Abstract
List of Tables
List of Abbreviations
Chapter 1: Introduction
Chapter 2: Literature Survey
Chapter 3: Performance Evaluation
Conclusion
Future Scope
References
LIST OF TABLES
LIST OF ABBREVIATIONS
Abbreviation Description
SVM Support Vector Machine
MLP Multilayer Perceptron
CNN Convolutional Neural Network
KNN K-Nearest Neighbors
EWOA Enhanced Whale Optimization Algorithm
SMOTE Synthetic Minority Oversampling Technique
UCI University of California, Irvine
CHAPTER 1 INTRODUCTION
INTRODUCTION
Heart disease remains one of the leading causes of mortality globally, affecting
millions each year. Early detection and accurate diagnosis are critical to reducing the
risk of severe health complications or death. Traditional diagnostic methods often rely
on manual evaluation, which can be time-consuming and prone to human error. With
the increasing availability of clinical data and advancements in computational
technology, there is an urgent need for automated systems that can assist healthcare
professionals in identifying heart disease risk more efficiently.
1.1 Methodology of Research
The review reveals that no single model universally outperforms others across all
datasets. SVM and Random Forest-based models frequently report high accuracy,
often exceeding 95%. Deep learning approaches such as CNNs, particularly when
combined with Federated Learning or hybrid models, demonstrate robustness in
preserving data privacy while maintaining competitive accuracy. Feature selection
techniques like Enhanced Whale Optimization Algorithm (EWOA) and sequential
feature selection further enhance model performance. However, results also
emphasize the dependency on data quality, preprocessing steps, and hyperparameter
tuning for optimal outcomes.
CHAPTER 2 LITERATURE SURVEY
LITERATURE SURVEY
2.1 Heart Disease Detection Techniques and Methodologies
In 2022, Abdellatif et al. [2] developed a predictive model for cardiovascular disease
(CVD) built as a multi-stage pipeline: data preprocessing, dataset partitioning,
classifier training, hyperparameter optimization, and benchmarking. Using six
machine learning classifiers and SMOTE (Synthetic Minority Over-sampling Technique)
to counter class imbalance, the study reported accuracies of 99.2% for CVD
classification and 95.73% for severity-level classification. Its main strengths are
the comparative evaluation of multiple classifiers, the comprehensive comparison
against benchmarks, and the effective handling of imbalanced data. Its drawbacks are
the dependence on data quality, the computational cost of hyperparameter tuning,
limited applicability across datasets, and the risk of overfitting. The distinguishing
feature of this implementation is the pairing of SMOTE with hyperparameter
optimization, which together improve precision and robustness in both CVD prediction
and severity-level classification.
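
The oversampling idea behind SMOTE can be illustrated with a minimal sketch. This is
a simplified, dependency-free illustration and not the implementation used by
Abdellatif et al.; the function name smote_like, the neighbour count k, and the toy
[age, cholesterol] values are assumptions made for the example.

```python
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Simplified illustration of the SMOTE idea; assumes at least two
    minority samples are available.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


# Hypothetical toy data: [age, cholesterol] for the minority (disease) class.
minority = [[63.0, 233.0], [67.0, 286.0], [56.0, 236.0], [62.0, 268.0]]
new_points = smote_like(minority, n_new=6)
```

Each synthetic point lies on a line segment between two real minority samples, so the
minority class grows without exact duplication of existing records.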
In 2023, Sharma et al. [3] introduced a heart disease prediction model that combines
Federated Learning (FL) with Convolutional Neural Networks (CNNs). The study used the
UCI Cleveland dataset, a popular heart disease prediction dataset, and applied random
oversampling to correct class imbalance and keep the sample distribution consistent.
The FL setup divided the data among five clients, each receiving 18,000 samples from
a pool of 90,000, preserving data privacy during model training. The CNN, made up of
convolutional, pooling, and fully connected layers, was trained for 15 epochs over
six communication rounds. The FL model reached a validation accuracy of 94.99%, a
good predictive ability though below the 97% of the centralized CNN model. The study
identifies FL's strengths as patient data privacy and scalability across health
centers, at the cost of dependence on the quality of each client's data. Its novelty
is the FL-CNN combination, which offers secure, collaborative heart disease
prediction without central data storage and so addresses critical privacy issues in
medical AI applications.
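
The round structure of federated training can be sketched as follows. This is an
illustrative approximation only: it swaps the CNN for a tiny logistic regression
model, uses synthetic one-feature data, and aggregates with sample-weighted averaging
(the FedAvg scheme), which the source does not confirm as the authors' aggregation
rule. The client count (five), local epochs (15), and communication rounds (six)
follow the description above; all function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def local_train(weights, data, epochs=15, lr=0.1):
    """Run stochastic gradient descent on one client's private data."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def fed_avg(client_weights, client_sizes):
    """Average client models, weighting each by its number of samples."""
    total = sum(client_sizes)
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(len(client_weights[0]))
    ]


rng = random.Random(0)


def make_client(n):
    # Synthetic stand-in data: the label is 1 when the single feature is positive.
    rows = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        rows.append(([1.0, x], 1 if x > 0 else 0))  # 1.0 is the bias input
    return rows


def predict(w, x):
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x))) > 0.5 else 0


clients = [make_client(40) for _ in range(5)]  # five clients
global_w = [0.0, 0.0]
for _ in range(6):  # six communication rounds
    # Only model weights leave each client; the raw rows never do.
    updates = [local_train(global_w, data) for data in clients]
    global_w = fed_avg(updates, [len(d) for d in clients])

holdout = make_client(100)
accuracy = sum(predict(global_w, x) == y for x, y in holdout) / len(holdout)
```

The server only ever sees weight vectors, which is the property that lets
institutions collaborate without pooling patient records.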
In 2023, Lakshmi et al. [4] built a heart disease prediction model on the Framingham
heart disease dataset from Kaggle. The research used an Enhanced Whale Optimization
Algorithm (EWOA) for feature reduction, improving model performance by selecting the
optimal predictors and reducing computational complexity. Several machine learning
classifiers, including the hybrid models SVM-RF and SVM-KNN, were trained on the
reduced feature space and evaluated on accuracy, precision, recall, and F1-score.
The approach improved accuracy through EWOA's efficient feature selection and the
complementary strengths of the hybrid models. However, performance remained dependent
on dataset quality, and the added complexity of the hybrid classifiers made
hyperparameter tuning harder. The main contribution of the research is its
enhancement of the Whale Optimization Algorithm, which makes feature selection more
robust and generalizable and thereby improves predictive accuracy in heart disease
diagnosis.
In 2023, Singh et al. [5] proposed a heart disease prediction model built on an
open-source dataset of 1025 records and 11 features. The authors trained several
machine learning models, namely Naive Bayes, Multilayer Perceptron (MLP), Decision
Tree, and Logistic Regression, on an 80:20 train-test split and measured performance
with accuracy, precision, recall, and F1-score. The Decision Tree model recorded the
highest accuracy, 98.04%. The strength of the study lies in the diversity of
algorithms it compares, which improves understanding of their relative behavior. Its
weaknesses are the dependence on dataset quality, restricted generalizability, and
the lower accuracy of some models (e.g., Naive Bayes at 83.12%). A distinctive aspect
of this implementation is its combination of machine learning and deep learning
strategies, used together for better predictive performance.
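
The evaluation protocol described above, an 80:20 split followed by accuracy,
precision, recall, and F1-score, can be sketched in a few lines. The helper names and
the example label vectors below are illustrative assumptions, not data from Singh et
al.; only the dataset size (1025) and split ratio come from the text.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and split them into train and test portions."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]


def classification_metrics(y_true, y_pred):
    """Compute binary-classification metrics with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# An 80:20 split of 1025 record indices.
train_idx, test_idx = train_test_split(range(1025))

# Hypothetical true labels and predictions for ten test patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

Reporting precision and recall alongside accuracy matters in this domain because a
missed diagnosis (false negative) and a false alarm (false positive) carry very
different costs.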
In 2023, Jawalkar et al. [7] built a predictive model for heart disease using an
SGB-optimized decision tree-based random forest (DTRF) classifier. The methodology
involved rigorous data preprocessing: handling missing data, removing duplicate
records, and encoding categorical variables. The classifier was trained on cleaned
patient health records with features such as age, blood pressure, and cholesterol
level. Validated on a publicly available real-world dataset, the model achieved an
accuracy of 96%. The reported advantages are resistance to noisy data, improved
health outcomes through early detection, and stringent data preprocessing standards.
The reported weaknesses are dependence on data quality, model complexity, narrow
generalizability, and the need for repeated updates as medical knowledge advances. A
unique feature of this implementation is SGB optimization, which helps the model
converge and avoid overfitting, making it a viable tool for heart disease prediction.
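
The three preprocessing steps named above can be sketched on a handful of records.
The source does not say how missing values were filled or how categories were
encoded, so median imputation and integer encoding are assumptions here, as are the
column names and the sample values.

```python
from statistics import median

# Hypothetical patient records; None marks a missing measurement.
records = [
    {"age": 63, "bp": 145, "chol": 233, "cp": "typical"},
    {"age": 67, "bp": None, "chol": 286, "cp": "asymptomatic"},
    {"age": 63, "bp": 145, "chol": 233, "cp": "typical"},  # exact duplicate
    {"age": 41, "bp": 130, "chol": None, "cp": "atypical"},
]


def preprocess(rows, numeric, categorical):
    # 1. Remove exact duplicate records, preserving the original order.
    seen, unique = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(r))  # copy so the input is left untouched
    # 2. Fill missing numeric values with the column median (assumed strategy).
    for col in numeric:
        fill = median(r[col] for r in unique if r[col] is not None)
        for r in unique:
            if r[col] is None:
                r[col] = fill
    # 3. Integer-encode categorical values in sorted order (assumed strategy).
    for col in categorical:
        codes = {v: i for i, v in enumerate(sorted({r[col] for r in unique}))}
        for r in unique:
            r[col] = codes[r[col]]
    return unique


clean = preprocess(records, numeric=["age", "bp", "chol"], categorical=["cp"])
```

Duplicates are removed before imputation so that repeated records do not skew the
median used to fill the gaps.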
In 2024, Giri et al. [8] applied Support Vector Machines (SVM) and k-Nearest
Neighbors (KNN) to heart disease prediction. The research used a structured dataset
of key health markers such as age, gender, resting blood pressure, cholesterol, blood
sugar, and ECG results, and compared the two algorithms' performance in medical
diagnosis. The SVM model reached an accuracy of 89%, ahead of KNN at 86%. The study
highlights the benefit of non-invasive diagnostic techniques, in line with the trend
toward reducing risk in medical testing, and notes the models' flexibility across
datasets and their potential applicability across populations. Its limitations are
its reliance on available datasets and a narrow scope that excludes other machine
learning algorithms. The main contribution of this study is its comparative analysis
of SVM and KNN for heart disease detection, which sheds light on their classification
capability and validity in healthcare use.
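
Of the two algorithms compared, KNN is simple enough to sketch in full. The example
below is a minimal illustration on made-up, already-scaled feature pairs; the
function name, the choice k=3, and the data are assumptions and do not reproduce the
setup of Giri et al.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training rows."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical rows: ([scaled resting BP, scaled cholesterol], label).
train = [
    ([0.2, 0.1], 0), ([0.3, 0.2], 0), ([0.1, 0.3], 0),
    ([0.8, 0.9], 1), ([0.9, 0.7], 1), ([0.7, 0.8], 1),
]

high_risk = knn_predict(train, [0.85, 0.80])
low_risk = knn_predict(train, [0.15, 0.20])
```

Because KNN relies on raw distances, features on different scales (for example age in
years and cholesterol in mg/dL) would normally be scaled first, which is why the toy
values are shown in the range 0 to 1.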
In 2024, Mall et al. [9] developed a Support Vector Machine (SVM)-based heart disease
prediction model, chosen for its ability to classify patients into “high risk” and
“low risk” classes. The implementation used feature mapping to transform the data
into a high-dimensional space in which the SVM could identify an optimal separating
hyperplane. Effective data preprocessing allowed accurate interpretation of
significant features such as age, BMI, and cholesterol, leading to better
predictions. The study emphasized the model's cost-effectiveness and real-time
operation, which suit healthcare applications. The approach was limited, however, by
large and noisy datasets, which could reduce accuracy. A unique feature of the
implementation was its use of data mining methods to detect hidden patterns,
enhancing predictive accuracy and supporting informed clinical decision-making.
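
Why feature mapping helps can be shown with a toy case. In the sketch below, a single
raw value cannot be split into two classes by any one threshold, but after the
mapping x -> (x, x^2) a straight line separates them. The data, the mapping, and the
hand-picked hyperplane are illustrative assumptions; no SVM is trained here, and this
is not the mapping used by Mall et al.

```python
def feature_map(x):
    """Map a 1-D value into 2-D space as (x, x squared)."""
    return (x, x * x)


# Hypothetical 1-D samples: extreme values (either sign) are labelled high risk.
samples = [(-2.0, 1), (-1.0, 0), (-0.5, 0), (0.5, 0), (1.0, 0), (2.0, 1)]


def separable_by_threshold(rows):
    """True if one cut on the raw value splits the two classes."""
    ordered = [label for _, label in sorted(rows)]
    # A single threshold works only if the sorted labels change at most once.
    changes = sum(1 for a, b in zip(ordered, ordered[1:]) if a != b)
    return changes <= 1


# Hand-picked hyperplane in the mapped space: x^2 = 2.25.
w, b = (0.0, 1.0), -2.25
predictions = []
for x, _ in samples:
    fx = feature_map(x)
    score = w[0] * fx[0] + w[1] * fx[1] + b
    predictions.append(1 if score > 0 else 0)
```

An SVM with a kernel performs this kind of mapping implicitly and additionally
chooses the hyperplane with the largest margin, rather than relying on a hand-picked
one.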
In 2024, Salau et al. [10] built a Support Vector Machine (SVM)-based heart disease
detection model with sequential feature selection (FS). The paper aimed to improve
classification efficiency by systematically identifying the most critical features,
reducing redundancy and increasing predictive power. The model was validated with
5-fold cross-validation and achieved an accuracy of 98.6% with a selected subset of 8
features. The strengths of the approach are high accuracy, low cost, and potential as
an automated decision aid in medical diagnosis. Its weaknesses are the dependence on
the selected features and potentially limited generalizability to other datasets. The
distinctive feature of this method is its use of sequential feature selection, which
both improves the efficiency of the model and supports clinical decision-making by
identifying the most critical indicators of heart disease.
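
Sequential forward selection scored by 5-fold cross-validation can be sketched as
below. To keep the example dependency-free, a nearest-centroid classifier stands in
for the SVM, and the data is synthetic (one informative feature and two noise
features); both are assumptions, as are all function names.

```python
import random


def nearest_centroid_accuracy(train, test, features):
    """Fit class centroids on `train` using `features`, then score on `test`."""
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(features))
        for i, f in enumerate(features):
            acc[i] += x[f]
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}
    correct = 0
    for x, y in test:
        pred = min(
            centroids,
            key=lambda c: sum(
                (x[f] - m) ** 2 for f, m in zip(features, centroids[c])
            ),
        )
        correct += pred == y
    return correct / len(test)


def cv_score(data, features, folds=5):
    """Mean accuracy over `folds` interleaved cross-validation folds."""
    scores = []
    for i in range(folds):
        test = data[i::folds]
        train = [row for j, row in enumerate(data) if j % folds != i]
        scores.append(nearest_centroid_accuracy(train, test, features))
    return sum(scores) / folds


def sequential_forward_selection(data, n_features, n_select):
    """Greedily add the feature that most improves the cross-validated score."""
    selected = []
    while len(selected) < n_select:
        remaining = [f for f in range(n_features) if f not in selected]
        best = max(remaining, key=lambda f: cv_score(data, selected + [f]))
        selected.append(best)
    return selected


# Synthetic data: feature 0 separates the classes, features 1 and 2 are noise.
rng = random.Random(0)
data = []
for i in range(60):
    y = i % 2
    x = [2.0 * y + rng.uniform(-0.5, 0.5), rng.uniform(-1, 1), rng.uniform(-1, 1)]
    data.append((x, y))

selected = sequential_forward_selection(data, n_features=3, n_select=2)
```

The greedy search evaluates each candidate feature only in combination with those
already chosen, which keeps the cost far below an exhaustive search over all subsets.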
CHAPTER 3 PERFORMANCE EVALUATION
PERFORMANCE EVALUATION
Beyond these core metrics, the models were evaluated on other performance aspects
such as feature selection methods, scalability, and interpretability. For instance, Salau
et al.’s use of sequential feature selection resulted in improved prediction accuracy
with minimal input variables, supporting clinical interpretability. Similarly, Lakshmi
et al.’s implementation of the Enhanced Whale Optimization Algorithm (EWOA)
enhanced both robustness and generalization by selecting optimal feature subsets.
Scalability was addressed in Sharma et al.’s work using Federated Learning, which
enabled decentralized training without compromising data privacy. The trade-off
between performance and complexity was also noted—models like CNNs and
XGBoost provided high accuracy but required substantial computational resources
and parameter tuning. Overall, the comparative evaluations highlight that the most
effective models are those that strike a balance between predictive accuracy,
interpretability, and adaptability across diverse datasets.
Study | Dataset | Method | Accuracy (%) | Remarks
... | ... | ... | ... | ... central performance
Lakshmi et al., 2023 | Framingham (Kaggle) | Hybrid SVM-KNN, Hybrid SVM-RF + EWOA | 95.3 | Enhanced Whale Optimization Algorithm for feature reduction
Bhatt et al., 2023 | Kaggle (70,000 → 59,000) | Random Forest, MLP, XGBoost, Decision Tree | 87.28 (MLP) | Hyperparameter tuning with GridSearchCV; MLP performed best
Mall et al., 2024 | UCI Cleveland | SVM + Feature Mapping | 96.8 | Real-time, cost-effective SVM model with feature mapping
Salau et al., 2024 | UCI Cleveland | SVM + Sequential Feature Selection | 98.6 | Focused on feature relevance using sequential selection
Giri et al., 2024 | Structured Dataset | SVM vs. KNN | 89 (SVM), 86 (KNN) | Comparative model performance using basic health indicators
CONCLUSION
In conclusion, heart disease remains a leading cause of death worldwide, and early
diagnosis is essential to reducing mortality rates. This technical seminar reviewed a
wide range of research studies that applied various machine learning algorithms to
predict heart disease using clinical data. From traditional classifiers like Decision
Trees and Naive Bayes to advanced techniques like Deep Neural Networks, SVMs,
and Federated Learning models, the reviewed works demonstrate how machine
learning has significantly contributed to predictive accuracy in the medical domain.
Feature selection techniques, data preprocessing, and model optimization have also
played a vital role in improving model performance and reducing computational
complexity.
The studies collectively indicate that no single model is universally superior; instead,
model selection and success depend heavily on the nature of the dataset and the
context of application. While many models achieved high accuracy, challenges such
as overfitting, data imbalance, lack of generalization, and limited interpretability
remain areas for improvement. As healthcare systems continue to embrace artificial
intelligence, ongoing research and innovation will be key to refining these models,
ensuring they are reliable, interpretable, and scalable for real-world deployment. The
review underscores the importance of interdisciplinary collaboration between data
scientists, medical experts, and policy-makers in achieving the full potential of
machine learning in heart disease prediction.
FUTURE SCOPE
The scope for future research in heart disease prediction using machine learning is
broad and full of potential. One promising direction is the integration of these
predictive models into real-time clinical systems and wearable health devices. This
could enable continuous monitoring of vital signs and early alerts for potential heart
conditions, thus improving preventive care and timely intervention. Furthermore, the
adoption of Explainable Artificial Intelligence (XAI) frameworks will be essential in
making machine learning models more transparent and interpretable to healthcare
professionals. By providing insights into the decision-making process of these
models, medical practitioners can better understand and trust the predictions being
made.
Another critical area for development lies in the use of larger and more diverse
datasets. Most current studies rely on limited or geographically constrained data,
which can result in biased models. Expanding datasets to include broader
demographic and regional information will enhance the robustness and applicability
of the predictive systems. In addition, privacy-aware techniques like Federated
Learning hold great promise for collaborative model training without the need to
share sensitive patient data. This will be especially important as ethical and legal
concerns regarding data privacy continue to grow. Lastly, future work could focus on
developing hybrid models that combine the strengths of traditional machine learning,
deep learning, and optimization algorithms to achieve higher accuracy and
adaptability. Multi-disease detection models that can identify and differentiate
between various cardiovascular diseases simultaneously could also increase the
practical utility of these systems in clinical environments.
REFERENCES
[1] R. Mittal and S. Kaur, “Sentiment Analysis Using Integrated BiLSTM and CNN
Model Optimized by Improved Grey Wolf Optimizer,” *IEEE Access*, vol. 10, pp.
113371–113385, 2022.
[5] M. A. Khan, T. Zia, and R. Riaz, “Aspect-Based Sentiment Analysis Using Hybrid
BERT and BiLSTM Approach,” *IEEE Access*, vol. 11, pp. 125372–125384, 2023.
[6] A. H. Pathan, M. A. Shah, and A. Rauf, “An Enhanced Pre-Trained Language
Model for Context-Aware Sentiment Analysis,” *IEEE Access*, vol. 12, pp.
15722–15732, 2024.
[7] H. Chen, Y. Zhou, and W. Xu, “Leveraging DeBERTa and Knowledge Graphs for
Fine-Grained Sentiment Analysis,” *IEEE Transactions on Affective Computing*,
Early Access, 2024.
[8] P. Roy, S. Pal, and M. De, “Gated Convolutional Network With Attention-Based
BiLSTM for Multi-Class Sentiment Detection,” *IEEE Access*, vol. 12, pp.
24334–24345, 2024.
[9] Y. Zhang, J. Wang, and X. Zhang, “Attribute-Based Injection Transformer for
Personalized Sentiment Analysis,” *IEEE Transactions on Emerging Topics in
Computational Intelligence*, vol. 8, no. 3, pp. 2581–2591, June 2024.
[10] L. Yao and N. Zheng, “Sentiment Analysis Based on Improved Transformer
Model and Conditional Random Fields,” *IEEE Access*, vol. 12, pp. 90145–90157,
2024.