Research Paper
ABSTRACT
Cardiovascular diseases remain one of the leading causes of mortality worldwide. Early detection and accurate
diagnosis are crucial for effective treatment and patient survival. This research presents a comparative analysis of
machine learning algorithms for predicting the presence of heart disease using selected clinical parameters, including
age, resting blood pressure, serum cholesterol, maximum heart rate, and ST depression. The study utilizes the publicly
available UCI Heart Disease dataset and evaluates the performance of multiple models—namely Logistic Regression,
Support Vector Machines (SVM), Random Forest, and Artificial Neural Networks (ANN). The models are trained and
validated using cross-validation techniques to ensure generalizability and robustness. Performance is assessed using
metrics such as accuracy, precision, recall, F1-score, and ROC-AUC. The experimental results demonstrate the
potential of machine learning approaches in developing decision support systems for early heart disease prediction.
Such systems can assist clinicians in making informed diagnostic decisions, potentially improving patient outcomes
and optimizing healthcare delivery.
Keywords: Heart Disease Prediction, Machine Learning, Logistic Regression, Support Vector Machine, Random
Forest, Artificial Neural Network, Clinical Parameters, ROC-AUC, Medical Diagnosis, UCI Dataset
I. INTRODUCTION
Heart disease is one of the most common causes of death in the world today, and many lives are lost simply because
the signs are not caught early enough [1]. People often ignore the warning signs, and in many places, proper medical
testing is either too expensive or not easily available. This is where modern technology can help. Machine learning, a
part of artificial intelligence, can learn patterns from patient data and help doctors predict the chances of heart disease
more quickly and accurately [2]. Instead of relying only on costly diagnostic procedures, we can use routinely recorded measurements such as age, blood pressure,
cholesterol, and heart rate to train machine learning models that estimate a patient's risk. This paper explores how
different algorithms like Logistic Regression, Random Forest, Support Vector Machine, and Neural Networks perform
when predicting heart disease using the UCI Heart Disease dataset [3]. Our goal is to make early detection easier,
smarter, and more accessible, especially in areas with fewer medical facilities. Using machine learning in this way can
support doctors and save lives by giving timely alerts and suggestions based on real health data [4].
II. METHODOLOGY
To build an effective heart disease prediction model, we followed a step-by-step approach starting with data
understanding and ending with model evaluation. We used the UCI Heart Disease dataset, which includes important
health parameters like age, blood pressure, cholesterol, and maximum heart rate. These features help indicate whether a
person is likely to have heart disease [5].
1. Data Collection and Preprocessing
www.irjmets.com @International Research Journal of Modernization in Engineering,
Technology and Science
[1]
e-ISSN: 2582-5208
International Research Journal of Modernization in Engineering
Technology and Science
( Peer-Reviewed, Open Access, Fully Refereed International Journal )
Volume:07/Issue:04/April-2025 Impact Factor- 8.187
www.irjmets.com
We began by collecting the dataset and cleaning it. Records with missing or incorrectly formatted values were
either removed or corrected. We also converted text categories (such as "male"/"female") into
numeric codes so that the machine learning models could process them [6].
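The cleaning and encoding step can be illustrated with a minimal sketch. It assumes that a missing value is marked with "?" and that rows with missing values are dropped rather than imputed; the sample records, the SEX_CODES mapping, and the function name are hypothetical and are not taken from the study.

```python
# Hypothetical raw records; "?" as the missing-value marker is an assumption.
RAW_ROWS = [
    {"age": "63", "sex": "male", "chol": "233"},
    {"age": "67", "sex": "female", "chol": "?"},
    {"age": "41", "sex": "female", "chol": "204"},
]

# Illustrative label encoding for the text category.
SEX_CODES = {"female": 0, "male": 1}


def clean_rows(rows, missing_marker="?"):
    """Drop rows containing a missing value, then convert fields to numbers."""
    cleaned = []
    for row in rows:
        if any(value == missing_marker for value in row.values()):
            continue  # removal strategy; imputation would be the alternative
        cleaned.append({
            "age": float(row["age"]),
            "sex": SEX_CODES[row["sex"]],
            "chol": float(row["chol"]),
        })
    return cleaned


cleaned_rows = clean_rows(RAW_ROWS)
print(cleaned_rows)
```

Dropping rows is the simplest option and is reasonable when few records are affected; replacing a missing value with a column median would keep more data at the cost of some bias.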
2. Feature Selection
Next, we selected the most important features that could impact heart health. Based on healthcare studies, we chose
features such as age, trestbps (resting blood pressure), chol (serum cholesterol), thalach (maximum heart rate achieved), and
oldpeak (ST depression induced by exercise relative to rest) [7]. These features were selected to reduce noise and improve prediction
accuracy.
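As a small illustration of this step, the sketch below projects a patient record onto the five named features in a fixed order. The example record and its values are hypothetical, and the helper name select_features is an assumption made for this sketch.

```python
# Column names as listed in the text.
SELECTED_FEATURES = ["age", "trestbps", "chol", "thalach", "oldpeak"]


def select_features(record, features=SELECTED_FEATURES):
    """Return the chosen feature values as floats, always in the same order."""
    return [float(record[name]) for name in features]


# Hypothetical record with extra columns ("sex", "cp") that are ignored here.
example_record = {
    "age": 54, "sex": 1, "cp": 3, "trestbps": 130,
    "chol": 246, "thalach": 150, "oldpeak": 1.2,
}

feature_vector = select_features(example_record)
print(feature_vector)
```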
3. Data Splitting
We then divided the dataset into two parts: 80% for training the model and 20% for testing. The held-out test set
shows how well the model performs on unseen data. We also applied k-fold cross-validation so that the performance
estimate does not depend on a single split [8].
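The 80/20 split and the k-fold procedure can be sketched as plain index bookkeeping. The sample count of 100, k = 5, and the fixed random seed are illustrative assumptions, since the text does not state them; libraries such as scikit-learn offer equivalent utilities (train_test_split, KFold).

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and hold out a fraction of them for testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=5):
    """Yield (train, validation) lists; each index is validated exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


train_idx, test_idx = train_test_split_indices(100)
folds = list(k_fold_indices(train_idx, k=5))
print(len(train_idx), len(test_idx), len(folds))
```

In this sketch cross-validation runs only on the training portion, so the test set stays untouched until the final evaluation; whether the study applied it this way is not stated.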
4. Model Building
We used four machine learning models for comparison:
Logistic Regression – a simple, fast linear model for binary classification
Random Forest – combines many decision trees to reduce overfitting and improve accuracy
Support Vector Machine (SVM) – effective when the classes are separated by a clear margin
Neural Network – learns non-linear patterns through stacked layers
Each model was trained using the training data and then tested using the test data to check its performance [9].
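To make the train-then-predict cycle concrete, the following sketch fits the simplest of the four models, logistic regression, with batch gradient descent. It is an illustration only: the learning rate, epoch count, and the one-feature toy data are assumptions, and the study's actual implementation and hyperparameters are not given in the text.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(X, y, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_samples, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for features, label in zip(X, y):
            z = bias + sum(w * x for w, x in zip(weights, features))
            error = sigmoid(z) - label  # gradient of log-loss w.r.t. z
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n_samples
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict_proba(weights, bias, features):
    """Estimated probability that the sample belongs to class 1."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


# Hypothetical, already standardized toy data: label 1 for larger values.
X_train = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
y_train = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic_regression(X_train, y_train)
print(predict_proba(weights, bias, [2.0]), predict_proba(weights, bias, [-2.0]))
```

The other three models follow the same fit-on-training-data, predict-on-test-data pattern, with only the learning algorithm changing.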
5. Model Evaluation
We evaluated all models using standard metrics: accuracy, precision, recall, and ROC AUC score. These
show not just how many predictions were correct, but also how well each model balances false positives
against false negatives [10].
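These metrics can be computed directly from the predictions, as the sketch below shows. The labels and scores are hypothetical toy values, the 0.5 decision threshold is an assumption, and ROC AUC is computed here through its pairwise-ranking interpretation rather than by integrating the curve.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from binary labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """Chance that a random positive outscores a random negative (ties = 0.5)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

In a screening setting recall usually deserves particular attention, because a false negative corresponds to a patient with heart disease who is not flagged.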
Key Findings:
Table 3: Performance of Machine Learning Models Across Key Clinical Trial Tasks
Task                        Model                    Metric      Result
Patient Grouping            Autoencoder + K-Means    Silhouette  0.71
Treatment Prediction        XGBoost                  AUC-ROC     0.93 ± 0.02
Imaging Outcome             CNN                      Accuracy    89.6% ± 1.3%
Health Forecasting (Time)   RNN                      RMSE        3.01
Adaptive Trial Simulation   Bayesian Optimization    Uplift      +19.8% better than fixed
Doctors reported an 85% increase in confidence when model outputs were explained clearly using SHAP and LIME
[19].
D. Future Directions: There are still some challenges. Real-world patient behavior and social factors are hard
to model. Also, simulations are helpful, but testing in live hospital settings is the real goal.
Next Steps:
Test in live clinical trials for real-world impact
Use Federated Learning to protect privacy across hospitals
Build systems where doctors and ML work together—not one replacing the other [26]
ACKNOWLEDGEMENTS
We would like to express our sincere thanks to everyone who supported us throughout this journey.
First and foremost, we are deeply grateful to Dr. Yagnesh Shukla, Dean of FoET, Atmiya University, for his
constant encouragement and visionary guidance. We would also like to thank Mr. Darshan Jani, Head of the
B.Tech Information Technology Department, Atmiya University, for his valuable support and motivation, which
inspired us to keep pushing forward.
A big thank you to all the scholars, researchers, and organizations whose work and open access to data
helped shape this study. Your contributions laid the foundation for our research.
We are also thankful to our faculty members and research mentors for their thoughtful feedback and
continuous support, which helped us improve and refine our work.
To our peers and colleagues, your positive words and meaningful discussions played a big part in
shaping the direction of this research.
Lastly, from the bottom of our hearts, we thank our family—our loving mother, supportive father,
caring brothers and sisters—for their unconditional love and belief in us. Your support means the world to us.
REFERENCES
1. World Health Organization. (2021). Cardiovascular diseases (CVDs). Retrieved from
https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds)
2. Deo, R. C. (2015). Machine learning in medicine. Circulation, 132(20), 1920–1930.
https://doi.org/10.1161/CIRCULATIONAHA.115.001593
3. Detrano, R., Janosi, A., Steinbrunn, W., Pfisterer, M., Schmid, J. J., Sandhu, S., ... & Froelicher, V. (1989).
International application of a new probability algorithm for the diagnosis of coronary artery disease. The
American Journal of Cardiology, 64(5), 304–310. https://doi.org/10.1016/0002-9149(89)90524-9
4. Fernandes, S., Cardoso, J. S., & Fernandes, J. (2020). Data mining and machine learning in heart disease
prediction: A systematic review. Health and Technology, 10(5), 1135–1144.
https://doi.org/10.1007/s12553-020-00447-8
5. Detrano, R., Janosi, A., Steinbrunn, W., Pfisterer, M., Schmid, J. J., Sandhu, S., ... & Froelicher, V. (1989).
International application of a new probability algorithm for the diagnosis of coronary artery disease. The
American Journal of Cardiology, 64(5), 304–310. https://doi.org/10.1016/0002-9149(89)90524-9
6. Han, J., Pei, J., & Kamber, M. (2011). Data mining: concepts and techniques (3rd ed.). Morgan Kaufmann.
7. Gudadhe, M., Wankhade, K., & Dongre, S. (2010). Decision support system for heart disease based on support
vector machine and artificial neural network. International Conference on Computer and Communication
Technology, 741–745. https://doi.org/10.1109/ICCCT.2010.5640410
8. Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection.
International Joint Conference on Artificial Intelligence, 14(2), 1137–1145.
9. Kotsiantis, S. B. (2007). Supervised machine learning: A review of classification techniques. Informatica, 31(3),
249–268.
10. Sokolova, M., & Lapalme, G. (2009). A systematic analysis of performance measures for classification tasks.
Information Processing & Management, 45(4), 427–437.
11. Bodenreider, O. (2004). The unified medical language system (UMLS): integrating biomedical terminology.
Nucleic acids research, 32(suppl_1), D267-D270.
12. Van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of machine learning research,
9(Nov), 2579-2605.
13. Vincent, P., et al. (2010). Stacked denoising autoencoders: Learning useful representations in a deep
network with a local denoising criterion. JMLR, 11, 3371-3408.
14. McInnes, L., Healy, J., & Melville, J. (2018). UMAP: Uniform Manifold Approximation and Projection for
Dimension Reduction. arXiv preprint arXiv:1802.03426.