Multi-Disease Prediction With Machine Learning


How to Cite:

Karwa, H., Gupta, P., Agrawal, R., Virdi, G. S., Kumar, A., & Jain, S. (2022). Multi-disease
prediction with machine learning. International Journal of Health Sciences, 6(S2), 9477–9483.
https://doi.org/10.53730/ijhs.v6nS2.7487

Multi-disease prediction with machine learning

Harsh Karwa
Student, CSE Department, Shri Ramdeobaba College of Engineering and
Management, Nagpur, India
Corresponding author email: [email protected]

Pavan Gupta
Student, CSE Department, Shri Ramdeobaba College of Engineering and
Management, Nagpur, India

Ram Agrawal
Student, CSE Department, Shri Ramdeobaba College of Engineering and
Management, Nagpur, India

Gursewak Singh Virdi
Student, CSE Department, Shri Ramdeobaba College of Engineering and
Management, Nagpur, India

Amit Kumar
Student, CSE Department, Shri Ramdeobaba College of Engineering and
Management, Nagpur, India

Sweta Jain
Professor, CSE Department, Shri Ramdeobaba College of Engineering and
Management, Nagpur, India

Abstract---Machine learning (ML) algorithms are now extensively used for
computer-assisted diagnosis of disease based on symptoms. The widespread use
of healthcare applications during the pandemic motivates the development of
new computer-assisted diagnostic applications in the healthcare domain.
Accurate and timely diagnosis of any health-related problem is essential for
the prevention and treatment of disease, and in the case of a serious illness
a standard diagnostic method may not be enough. We propose a system for
predicting disease from symptoms. The data corpus covers about forty-one
diseases to be analyzed based on their symptoms. Given a person's symptoms,
the system predicts the disease that person may have. This diagnostic program
can assist a physician in diagnosing disease, allowing for timely treatment
and saving lives. The disease prediction system was developed using the
Random Forest, Naive Bayes, and Support Vector Machine classification
algorithms. The presented work outlines an analysis of these algorithms.

International Journal of Health Sciences ISSN 2550-6978 E-ISSN 2550-696X © 2022.

Manuscript submitted: 27 March 2022, Manuscript revised: 18 April 2022, Accepted for publication: 9 May 2022

Keywords---machine learning, disease prediction, random forest, naive bayes,
support vector machine.

Introduction

Medicine and health are important factors in economic growth and human life.
Technology-assisted healthcare applications have increased significantly over
the past two decades. Even in the pandemic period, there are still many remote
areas without emergency healthcare services. To cater effectively to the needs
of the masses in a heavily populated country like India, an online diagnosis
system is the need of the hour. Disease prediction systems can be a boon in
many cases: they can flag life-threatening diseases early and advise the
individual to seek immediate treatment, preventing further damage from the
underlying disease. In addition, this strategy can cut treatment costs and
reduce the fear associated with late-stage diagnosis, enabling sufficient
treatment to be provided at the right moment and reducing the death rate.
Furthermore, several localized diseases have distinct features in different
places, making disease outbreak prediction difficult.

Related Work

There has been a lot of research into disease prediction models using various
machine learning algorithms, with results that vary across medical
applications. Chauhan et al. (2020) reported accuracies of 92.4 %, 95.7 %, and
94.5 % for Decision Tree, Random Forest, and Naive Bayes models, respectively
[1]. Chen et al. (2017) achieved an accuracy of 94.5 % with CNN-based
multimodal disease risk prediction [2]. Leoni et al. (2017) reported
accuracies of 58.8 %, 91 %, and 68.7 % for Fuzzy Logic, Fuzzy Neural Networks,
and Decision Tree, respectively [3]. Furthermore, Vijayarani et al. reported
accuracies of 79.66 % and 61.28 % for SVM and Naive Bayes, respectively [4].

Materials and Methods

Our proposed work predicts multiple diseases from a patient's symptoms. In any
medical application, the first important step is to obtain the data corpus.
Data preprocessing is then essential to clean the data and prepare it for
building a model during the training phase. The model is tested on unseen data
(test data) from the corpus. The user provides symptoms to our system, and
these symptoms are passed as input features to our ML models, which use Random
Forest, Naive Bayes, and SVM to predict the disease and so help the patient in
the early stages of the disease. In this work, we used Python as the platform
for the machine learning algorithms. We have also built a GUI through which
the user interacts with the system.

Dataset

The data corpus for this application was collected from Kaggle and includes
attributes containing diseases and their symptoms. The user needs to
understand the related features in the dataset. The dataset is readily
available on Kaggle; the link is provided in the references [5]. The workflow
diagram of the system is shown in Figure 1.

Figure 1: Workflow diagram of the system. The steps are: (1) dataset collected
from Kaggle; (2) preprocess the data and split the dataset into training and
test sets; (3) build the ML models (RF, SVM, and NB) using the training set;
(4) test the models using the test set; (5) predict the class of disease and
evaluate the performance.
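The preprocessing and splitting steps of the workflow can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in this work: the toy records, the symptom names, the helper names `encode_symptoms` and `train_test_split`, and the 80/20 split are all assumptions made for the example.

```python
import random

def encode_symptoms(record_symptoms, vocabulary):
    """Turn a list of symptom names into a 0/1 feature vector over the vocabulary."""
    present = set(record_symptoms)
    return [1 if symptom in present else 0 for symptom in vocabulary]

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and split them into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical toy records: (disease label, reported symptoms).
records = [
    ("Common Cold", ["cough", "runny_nose"]),
    ("Common Cold", ["cough", "sneezing", "runny_nose"]),
    ("Malaria", ["fever", "chills"]),
    ("Malaria", ["fever", "chills", "headache"]),
    ("Migraine", ["headache", "nausea"]),
]

# The vocabulary is the sorted set of all symptoms seen in the corpus.
vocabulary = sorted({s for _, symptoms in records for s in symptoms})
dataset = [(encode_symptoms(symptoms, vocabulary), label)
           for label, symptoms in records]
train_rows, test_rows = train_test_split(dataset, test_fraction=0.2, seed=42)
```

Each record becomes a fixed-length binary vector, which is a common input representation for symptom-based classifiers; the held-out rows play the role of the unseen test data.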

Machine Learning

ML algorithms depend heavily on the amount of data presented to them for
learning the target variable. In this work we have implemented multiclass
classification of diseases. The dataset is publicly available. Finding a
subset of the relevant features is the major task in the design of any
healthcare application, as many diseases may share overlapping symptoms. The
results obtained may not generalize to real-world diagnosis applications
unless they are examined by a specialist doctor.

Naive Bayes

Naive Bayes is a classification technique based on Bayes' theorem together
with the assumption that the predictors are independent. The model is
straightforward and is especially advantageous for big datasets. Despite its
simplicity, Naive Bayes is reputed to outperform even highly sophisticated
classification algorithms in some cases. It is rooted in Bayes' theorem, which
lets us evaluate a conditional probability, say P(F|G), from P(F), P(G), and
P(G|F). Bayes' theorem can thus be represented as

$$P(F \mid G) = \frac{P(G \mid F)\, P(F)}{P(G)}$$

The Naive Bayes classifier computes the conditional probabilities and the
class probabilities P(Yi) from the training dataset. Because correlated
features are effectively counted twice in the model, their importance is
overemphasized; the Naive Bayes classifier therefore works best when such
features are removed.
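To make the estimation step concrete, the following is a minimal sketch of a Naive Bayes classifier over binary symptom features. It is not the implementation used in this work: the Bernoulli feature model, the add-one (Laplace) smoothing, the function names, and the toy data are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, n_features):
    """Estimate class priors P(Y) and per-feature likelihoods P(x_j = 1 | Y)."""
    class_counts = Counter(label for _, label in rows)
    feature_counts = defaultdict(lambda: [0] * n_features)
    for features, label in rows:
        for j, value in enumerate(features):
            feature_counts[label][j] += value
    total = len(rows)
    priors = {c: n / total for c, n in class_counts.items()}
    # Add-one smoothing keeps every probability strictly between 0 and 1.
    likelihoods = {
        c: [(feature_counts[c][j] + 1) / (class_counts[c] + 2)
            for j in range(n_features)]
        for c in class_counts
    }
    return priors, likelihoods

def predict_naive_bayes(features, priors, likelihoods):
    """Return the class with the highest log posterior (up to a constant)."""
    best_label, best_score = None, -math.inf
    for label, prior in priors.items():
        score = math.log(prior)
        for value, p in zip(features, likelihoods[label]):
            score += math.log(p if value == 1 else 1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data with features [fever, cough].
toy_rows = [
    ([1, 0], "Malaria"),
    ([1, 0], "Malaria"),
    ([0, 1], "Common Cold"),
    ([1, 1], "Common Cold"),
]
priors, likelihoods = train_naive_bayes(toy_rows, n_features=2)
prediction = predict_naive_bayes([1, 0], priors, likelihoods)
```

Working in log space avoids numerical underflow when many per-feature probabilities are multiplied together. The denominator P(G) of Bayes' theorem is the same for every class, so it can be dropped when only the most probable class is needed.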

Random Forest

Random forest is a supervised learning technique that can be used for both
classification and regression. However, it is mostly employed to solve
classification problems. A forest is made up of trees, and more trees make for
a more robust forest. It is an ensemble method that is superior to a single
decision tree because it combines the results of many trees, which reduces
overfitting of the model. For classification, the final prediction is chosen
by majority vote. Random Forest performs well on real-world problems mainly
because it tolerates noise in the data and is less prone to overfitting, and
it shows excellent performance compared with other tree-based algorithms. To
train the trees, bootstrap aggregation (bagging) is widely used.
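The two ingredients mentioned above, bootstrap sampling and voting, can be illustrated with a deliberately small sketch. It is a toy, not a full random forest and not the model used in this work: each "tree" here is a single-split stump over binary features, and the function names, the number of trees, and the toy data are assumptions.

```python
import random
from collections import Counter

def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, feature_indices):
    """Among the candidate features, keep the binary split with fewest training errors."""
    fallback = majority([label for _, label in rows])
    best = None
    for j in feature_indices:
        leaves, errors = {}, 0
        for value in (0, 1):
            labels = [label for features, label in rows if features[j] == value]
            leaves[value] = majority(labels) if labels else fallback
            errors += sum(1 for label in labels if label != leaves[value])
        if best is None or errors < best[0]:
            best = (errors, j, leaves)
    _, j, leaves = best
    return j, leaves

def fit_forest(rows, n_trees=25, n_candidate_features=2, seed=0):
    """Train each stump on a bootstrap sample and a random subset of features."""
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # sampling with replacement
        candidates = rng.sample(range(n_features), n_candidate_features)
        forest.append(fit_stump(sample, candidates))
    return forest

def predict_forest(forest, features):
    """Each stump votes; the majority label wins."""
    return majority([leaves[features[j]] for j, leaves in forest])

# Hypothetical toy data with features [fever, cough, chills].
toy_rows = [([1, 0, 1], "Malaria")] * 3 + [([0, 1, 0], "Common Cold")] * 3
forest = fit_forest(toy_rows, n_trees=25, n_candidate_features=2, seed=0)
```

Because every stump sees a different resample of the data and a different subset of features, individual mistakes tend to be outvoted, which is the intuition behind the reduced overfitting of the ensemble.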

Support Vector Machine

SVM is a popular classification method. It is widely used in machine learning
to separate the classes in a given dataset. Linear SVM is used for linearly
separable data, that is, data that can be divided into two categories using a
single straight line. Non-linear SVM is used for non-linear data, that is,
data that cannot be separated using a straight line. Several kernels are used
in non-linear SVM, among them the Gaussian radial basis function (RBF), the
polynomial kernel, the hyperbolic tangent (sigmoid) kernel, and the ANOVA
radial basis kernel. In our SVM, we used the RBF kernel. For two feature
vectors a and a', the RBF kernel can be represented as

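$$K(a, a') = \exp\left(-\frac{\lVert a - a' \rVert^2}{2\sigma^2}\right)$$

Here, $\lVert a - a' \rVert^2$ is the square of the Euclidean distance between
the two feature vectors, and $\sigma$ is a free parameter (for positive
values).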
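As a small numerical illustration, the RBF kernel can be computed directly from two feature vectors. The sketch below assumes the common Gaussian form K(a, a') = exp(-||a - a'||^2 / (2σ^2)) with a positive width parameter `sigma`; the parameter name and its default value are assumptions, and some libraries use the equivalent parameterization gamma = 1 / (2σ^2) instead.

```python
import math

def squared_euclidean_distance(a, b):
    """Sum of squared coordinate differences between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))

def rbf_kernel(a, b, sigma=1.0):
    """Gaussian RBF kernel; sigma is the (assumed) positive width parameter."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return math.exp(-squared_euclidean_distance(a, b) / (2.0 * sigma ** 2))

# Identical vectors give the maximum similarity of 1.0; the value decays
# toward 0 as the vectors move apart.
same = rbf_kernel([1, 0, 1], [1, 0, 1])
different = rbf_kernel([1, 0, 1], [0, 1, 1])
```

The kernel depends only on the distance between the two vectors, so it is symmetric in its arguments, and a smaller `sigma` makes the similarity fall off more quickly with distance.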

Result and Discussion

This section presents the results of the developed system, which can predict
disease faster and more accurately, with higher fidelity, than the existing
system. Results were obtained with Random Forest, Naive Bayes, and SVM using
Python. When a user accesses the Disease Prediction Website, he or she is
directed to the homepage. The homepage lists the specific steps for predicting
one's disease, as seen in Figure 2.

The application also has a visual interface that handles all of the inputs
required for prediction. The user selects symptoms from the drop-down menu and
adds them by clicking the add button. To remove a specific symptom, the user
clicks the delete button; clicking the clear button removes all symptoms, as
illustrated in Figure 3.

By clicking the predict now button, the user can see all of the probable
diseases predicted by the various algorithms, as seen in Figure 4.

Conclusion

In this paper, we used three machine learning algorithms to predict disease
and achieve a desirable result for the user, making the system more efficient
than the existing one and thus providing a better user experience than other
available systems. The present study focused only on a structured dataset of
symptoms. Most disease prediction systems need multimodal data as input for
correct diagnosis of the disease. There are no standard ways of dealing with
semi-structured and unstructured data.

References

1. “Disease Prediction using Machine Learning”, Raj H. Chauhan, Daksh N.
Naik, Rinal A. Halpati, Sagarkumar J. Patel, A. D. Prajapati, International
Research Journal of Engineering and Technology (IRJET), vol. 07, no. 05,
May 2020.
2. “Disease prediction by machine learning over big data from healthcare
communities”, M. Chen, Y. Hao, K. Hwang, L. Wang, IEEE Access, vol. 5, no.
1, pp. 8869-8879, 2017. DOI: 10.1109/ACCESS.2017.2694446
https://ieeexplore.ieee.org/abstract/document/7912315
3. “Disease Classification Using Machine Learning Algorithms - A Comparative
Study”, S. Leoni Sharmila, C. Dharuman, P. Venkatesan, International
Journal of Pure and Applied Mathematics, vol. 114, no. 6, pp. 1-10, 2017.
4. “Liver Disease Prediction using SVM and Naive Bayes Algorithms”, S.
Vijayarani, S. Dayananda, International Journal of Science, Engineering
and Technology Research (IJSETR), vol. 4, no. 4, April 2015.
5. Disease and symptoms dataset, Kaggle. Dataset link:
https://www.kaggle.com/itachi9604/disease-symptom-description-dataset?select=dataset.csv
