
Fin Irjmets1680519036

This document discusses using machine learning techniques to predict multiple diseases. It describes predicting breast cancer, heart disease, and diabetes using a web application built with machine learning models. The document provides background on challenges in healthcare predictive analytics and discusses how machine learning can be used to analyze large amounts of medical data to help practitioners make timely treatment decisions. It also summarizes several studies that have used support vector machines and logistic regression algorithms to predict diabetes and heart disease with 78-97.5% accuracy.

e-ISSN: 2582-5208

International Research Journal of Modernization in Engineering Technology and Science


( Peer-Reviewed, Open Access, Fully Refereed International Journal )
Volume:05/Issue:04/April-2023 Impact Factor- 7.868 www.irjmets.com
MULTIPLE DISEASE PREDICTION USING MACHINE LEARNING
D. Vasavi*1, D. Venkatesh*2, S. Santhosh Kumar*3, S. Sahaja*4,
V. Santhosh Kumar*5
*1Sr. Asst. Professor, Dept. Of Computer Science And Engineering, Aditya Institute Of
Technology And Management (A)-Tekkali, India.
*2,3,4,5Students, Dept. Of Computer Science And Engineering, Aditya Institute Of
Technology And Management (A)-Tekkali, India.
ABSTRACT
Machine learning offers multiple techniques that can perform predictive analytics on large amounts of data in a
variety of industries. Predictive analytics in healthcare is a difficult endeavour, but it can eventually assist
practitioners in making timely decisions regarding patients' health and treatment based on massive data.
Diseases like breast cancer, diabetes, and heart-related diseases cause many deaths globally, and most of
these deaths are due to the lack of timely check-ups. This problem stems from a lack of
medical infrastructure and a low ratio of doctors to the population. Heart disease, cancer, and
diabetes pose a serious threat to mankind if not found early. This work covers breast cancer, heart disease, and
diabetes. To make it seamless and usable by the general public, our team built a medical test
web application that uses machine learning to make predictions about these diseases.
Keywords: Machine Learning, SVM, Logistic Regression, Accuracy.
I. INTRODUCTION
In terms of data collection and processing, healthcare is one of the most challenging industries. With the advent
of the digital era and technological advancements, a vast quantity of multidimensional data on patients is
created, including clinical factors, hospital resources, illness diagnostic information, patients’ records, and
medical equipment. The enormous, dense, and complex data must be processed and evaluated in order to
extract knowledge for effective decision making. Medical data mining offers a lot of potential for uncovering
hidden patterns in medical data sets. By identifying significant patterns and detecting correlations and
relationships among many variables in huge databases, the use of various data mining tools and machine
learning approaches has changed healthcare organizations. It serves as an important instrument in the medical
sector, providing and comparing existing data for the future course of action. This technology combines
multiple analytic methodologies with modern and complex algorithms, allowing for the exploration of massive
amounts of data. It is used in healthcare to gather, organize, and analyze patient data in a systematic manner. It
may be used to identify inherent inefficiencies and best practices for providing better services, which may lead
to improved diagnosis, better medicine, and more successful treatment, as well as a platform for a deeper
knowledge of the mechanisms in practically all elements of the medical domain. Overall, it assists in the early
detection and prevention of disease epidemics by searching medical databases for pertinent information. The
process of determining a condition based on a person’s symptoms and indicators is known as medical
diagnosis. In the diagnostic process, one or more diagnostic procedures, such as diagnostic tests, are
performed. Diagnosis of chronic illnesses is a vital issue in the medical industry since it is based on many
symptoms. It is a complex procedure that frequently leads to incorrect assumptions. When diagnosing illnesses,
the clinical judgment is based mostly on the patient’s symptoms as well as the physicians’ knowledge and
experience.
Diabetes is a chronic disease characterized by elevated blood sugar, mainly due either to reduced or absent
production of insulin in the body (type 1 diabetes) or to cells failing to respond to the insulin that is produced
(type 2 diabetes). In recent years, the number of diabetic patients has increased drastically, mainly due
to the ageing population and irregular Western food habits. According to the World Health Organization,
diabetes affects around 346 million people in the world, most of them with type 2 diabetes. Moreover,

diabetes is a major cause of heart attack, stroke, kidney failure, lower-limb amputation, and blindness. The absence
of symptoms, or the failure to recognize the indicators in the patient's data, may cause a pre-diabetic
or diabetic condition to go undetected in more than one-third of the people who are later diagnosed with
diabetes.
Machine learning is the area of artificial intelligence that uses statistical analysis, and it is recognized as a
promising approach that, given a dataset of diabetic patients, can help in patient classification or probability
prediction regarding a patient's pre-diabetic or diabetic condition. The main strength of these methods lies
in the ability of the algorithms to learn from data and to use that knowledge for later predictions and
decisions. A number of machine learning and statistical modeling approaches have so far been
applied to various aspects of the problem. Although other classifiers perform well, the SVM is reported
to outperform them with respect to accuracy.
Early detection and treatment of heart disease is very difficult, especially in developing countries,
because of the lack of diagnostic centers, qualified doctors, and other resources needed for an accurate
prognosis. For this reason, computer technology and machine learning techniques have recently been
used to build medical aid software that supports the early diagnosis of heart
disease. Identifying a heart-related illness at an early stage can reduce the risk of death. Various ML
techniques are applied to medical data to understand patterns in the data and make predictions from them.
Healthcare data are generally massive in volume and complex in structure. ML algorithms are capable of
handling big data and mining it for meaningful information. They learn from
past data and make predictions on real-time data. An ML framework of this kind for heart disease prediction can
help cardiologists act more quickly so that more patients receive treatment within a shorter
time frame, thus saving a large number of lives.
The clinical diagnosis of Parkinson disease (PD) can be confirmed based on neuropathologic and
histopathologic criteria [1]. Clinical diagnostic classification of PD can be based on a comprehensive review of the
literature and on a selection of characteristic clinical features according to their sensitivity and specificity.
Prospective clinicopathologic studies in representative populations of patients with PD are needed to
investigate the clinical, pathologic, and nosologic aspects of the disease based on frequency of occurrence,
characteristics, and risk factors in patients. PD causes vocal impairment and affects speech, motor skills, and
other functions such as behavior, mood, sensation, and thinking. Telemonitoring of the disease using voice
measurements plays a vital role in the early diagnosis of PD. Conventional bootstrapping or leave-one-out
validation methods can be combined with a Support Vector Machine (SVM) to build a predictive classification model
and to assess the relevance and statistical significance of the relations between PD and the attributes.
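Leave-one-out validation, mentioned above, can be illustrated with a minimal sketch. The paper does not give its implementation, so everything here is an assumption: the data are made up, and a simple nearest-centroid rule stands in for the SVM so that the example stays dependency-free. Any classifier with the same call shape could be substituted.

```python
from math import dist


def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier (not an SVM): pick the class whose mean is closest to x."""
    centroids = {}
    for label in set(train_y):
        rows = [row for row, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids, key=lambda label: dist(centroids[label], x))


def leave_one_out_accuracy(X, y, predict):
    """Hold out each sample once, fit on the rest, and return the fraction of hits."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += predict(train_X, train_y, X[i]) == y[i]
    return hits / len(X)


# Made-up two-feature measurements (1 = PD, 0 = healthy), for illustration only.
X = [[0.2, 1.1], [0.3, 0.9], [0.1, 1.0], [1.8, 2.9], [2.1, 3.2], [1.9, 3.0]]
y = [0, 0, 0, 1, 1, 1]
loo_accuracy = leave_one_out_accuracy(X, y, nearest_centroid_predict)
```

Because every sample is used for testing exactly once, this scheme suits small clinical datasets, at the cost of fitting the model once per sample.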
II. LITERATURE SURVEY

Author Name                                                 Disease             Algorithm             Accuracy
Anuja Kumari, R. Chitra                                     Diabetes            SVM                   78%
Tafa Zhilbert, Nerxhivane Pervetica, and Bertran Karahoda   Diabetes            SVM                   95.52%
Mujumdar, Aishwarya, and V. Vaidehi                         Diabetes            Logistic Regression   97.5%
Sharma, Vijeta, Shrinkhala Yadav, and Manjari Gupta         Heart Disease       SVM                   0.995 (precision)
Jindal, Harshit                                             Heart Disease       Logistic Regression   87.5%
Sriram T. V                                                 Parkinson Disease   SVM                   88.9%

Streamlit:
Streamlit is a free and open-source framework for rapidly building and sharing machine learning and data
science web apps. It is a Python-based library designed specifically for machine learning engineers. Data
scientists and machine learning engineers are not web developers, and they are not interested in spending weeks
learning web frameworks to build web apps. Instead, they want a tool that is easier to learn and to use,
as long as it can display data and collect the parameters needed for modeling. Streamlit allows you to create a
polished application with only a few lines of code.
The best thing about Streamlit is that you don't even need to know the basics of web development to get started
or to create your first web application. So if you're somebody who's into data science and you want to deploy
your models easily, quickly, and with only a few lines of code, Streamlit is a good fit.
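As a rough illustration of how few lines such an app needs, the sketch below collects two inputs and displays a prediction. It is not the authors' application: the input fields (glucose, BMI), the `risk_label` helper, and the `predict_proba` callback are hypothetical names chosen for this example, and the model itself is left to the caller.

```python
def risk_label(probability, threshold=0.5):
    """Turn a predicted probability into a display label (hypothetical helper)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    return "High risk" if probability >= threshold else "Low risk"


def render_app(predict_proba):
    """Draw the form; predict_proba(disease, features) is supplied by the caller."""
    import streamlit as st  # imported here so risk_label works without Streamlit

    st.title("Multiple Disease Prediction")
    disease = st.selectbox("Disease", ["Diabetes", "Heart Disease", "Parkinson's Disease"])
    # Hypothetical input fields; a real app would ask for the model's own features.
    glucose = st.number_input("Glucose", min_value=0.0, value=100.0)
    bmi = st.number_input("BMI", min_value=0.0, value=25.0)
    if st.button("Predict"):
        probability = predict_proba(disease, [glucose, bmi])
        st.success(f"{disease}: {risk_label(probability)} ({probability:.2f})")
```

To try it, one would add a call such as `render_app(my_predict_fn)` as the last line of a script (for example `app.py`, a name assumed here) and start it with `streamlit run app.py`.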
III. METHODOLOGY
Machine Learning:
As for the formal definition of Machine Learning, we can say that a Machine Learning algorithm learns from
experience E with respect to some type of task T and performance measure P, if its performance at tasks in T, as
measured by P, improves with experience E.
Logistic Regression:
Logistic regression is one of the most popular machine learning algorithms and belongs to the supervised
learning family. It is used to predict a categorical dependent variable from a given set of independent
variables.
Because logistic regression predicts a categorical dependent variable, the outcome must be a
categorical or discrete value such as Yes or No, 0 or 1, True or False. Instead of giving the exact
value 0 or 1, however, it gives probabilistic values that lie between 0 and 1.
Logistic regression is very similar to linear regression except in how it is used: linear regression
is used for solving regression problems, whereas logistic regression is used for solving classification
problems.
In logistic regression, instead of fitting a regression line, we fit an "S"-shaped logistic function whose
output is bounded by the two extreme values 0 and 1.
The curve of the logistic function indicates the likelihood of something, such as whether cells are
cancerous or not, or whether a mouse is obese or not based on its weight.
Logistic regression can classify observations using different types of data and can easily
identify the most effective variables for the classification.
[Figure: the logistic function]
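The S-shaped mapping from a linear score to a probability can be sketched as follows. The weights, bias, and feature values are made-up numbers for illustration, not parameters fitted in this work.

```python
import math


def sigmoid(z):
    """Logistic function; the two branches avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(features, weights, bias):
    """Probability of the positive class for one sample."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict_label(features, weights, bias, threshold=0.5):
    """Convert the probability into a 0/1 decision at the given threshold."""
    return int(predict_proba(features, weights, bias) >= threshold)


# Hypothetical model parameters and one sample.
weights, bias = [0.8, -0.4], -0.2
p = predict_proba([1.5, 0.5], weights, bias)
label = predict_label([1.5, 0.5], weights, bias)
```

The threshold of 0.5 is a common default; in a screening setting one might lower it to trade more false alarms for fewer missed cases.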

Support Vector Machine:


Support Vector Machines (SVMs) are a type of supervised learning algorithm that can be used for classification
or regression tasks. The main idea behind SVMs is to find a hyperplane that maximally separates the different
classes in the training data. This is done by finding the hyperplane that has the largest margin, which is defined
as the distance between the hyperplane and the closest data points from each class. Once the hyperplane is
determined, new data can be classified by determining on which side of the hyperplane it falls. SVMs are
particularly useful when the data has many features, and/or when there is a clear margin of separation in the
data.
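The two ideas in this paragraph, classifying a point by the side of the hyperplane it falls on and measuring the margin as the distance to the closest points, can be sketched for a hyperplane that is already given. The weight vector, offset, and points below are hand-picked assumptions; no training is performed here.

```python
import math


def decision_value(x, w, b):
    """Signed score w.x + b; its sign tells which side of the hyperplane x lies on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(x, w, b):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(x, w, b) >= 0 else -1


def geometric_margin(points, w, b):
    """Distance from the hyperplane to the closest of the given points."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(abs(decision_value(x, w, b)) / norm for x in points)


# Hand-picked hyperplane x1 + x2 = 3.5, sitting midway between two small groups.
w, b = [1.0, 1.0], -3.5
points = [[1.0, 1.0], [0.0, 2.0], [3.0, 2.0], [2.0, 3.0]]
labels = [classify(p, w, b) for p in points]
margin = geometric_margin(points, w, b)
```

Training an SVM amounts to searching for the `w` and `b` that make this margin as large as possible while keeping the classes on opposite sides.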

Support Vector Machine (SVM) is a relatively simple supervised machine learning algorithm used for
classification and/or regression. It is preferred for classification but is sometimes very useful for
regression as well. Basically, SVM finds a hyperplane that creates a boundary between the types of data. In
2-dimensional space, this hyperplane is simply a line. In SVM, we plot each data item in the dataset in an
N-dimensional space, where N is the number of features/attributes in the data, and then find the optimal hyperplane
that separates the data. It follows that, inherently, SVM can only perform binary
classification (i.e., choose between two classes). However, various techniques extend it to multi-class
problems.
Support Vector Machine for multi-class problems: To apply SVM to multi-class problems, we can
create a binary classifier for each class of the data. The two possible results of each classifier are:
•The data point belongs to that class OR
•The data point does not belong to that class.
For example, to perform multi-class classification over a set of fruits, we can create a binary classifier for each
fruit. For the 'mango' class, say, there will be a binary classifier that predicts whether an item IS a mango OR is NOT a mango.
The class whose classifier gives the highest score is chosen as the output of the SVM.
SVM for complex (non-linearly separable) data: SVM works very well without any modification on linearly separable
data. Linearly separable data is any data that can be plotted in a graph and separated into classes using a
straight line. For data that is not linearly separable, kernel functions such as the following are used.
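The one-classifier-per-class scheme described above (often called one-vs-rest) can be sketched as follows. The three linear scorers and their parameters are invented for the fruit example and are not trained models.

```python
def one_vs_rest_predict(x, classifiers):
    """Score x with every binary classifier and return the best label and all scores.

    classifiers maps a class label to a (weights, bias) pair for a linear scorer.
    """
    scores = {
        label: sum(wi * xi for wi, xi in zip(w, x)) + b
        for label, (w, b) in classifiers.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical "is it this fruit or not" scorers over two made-up features.
classifiers = {
    "mango": ([1.0, -1.0], 0.0),
    "apple": ([-1.0, 1.0], 0.0),
    "banana": ([0.5, 0.5], -2.0),
}
label, scores = one_vs_rest_predict([2.0, 0.5], classifiers)
```

With K classes this trains K binary models; an alternative, one-vs-one, trains a model for every pair of classes and takes a vote instead.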
Radial Basis Function Kernel (RBF): The similarity between two points in the transformed feature space is an
exponentially decaying function of the distance between the vectors in the original input space, as shown
below. RBF is the default kernel in many SVM implementations.
$$k(x, x') = \exp\left(-\gamma \lVert x - x' \rVert^{2}\right)$$
Polynomial Kernel: The polynomial kernel takes an additional parameter, 'degree', that controls the model's
complexity and the computational cost of the transformation.
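Both kernels can be written as short functions. This sketch assumes the commonly used forms, an RBF kernel on the squared Euclidean distance and a polynomial kernel of the shape (gamma * x.y + coef0) ** degree; the parameter names `gamma` and `coef0` are conventions borrowed from common SVM libraries rather than notation taken from this paper.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Similarity that decays exponentially with the squared distance between x and y."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=0.0):
    """Polynomial similarity; a higher degree allows a more complex boundary."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree


same_point = rbf_kernel([1.0, 2.0], [1.0, 2.0])            # identical points
far_apart = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5)  # squared distance 2
quadratic = polynomial_kernel([1.0, 2.0], [3.0, 4.0], degree=2, coef0=1.0)
```

A larger `gamma` makes the RBF similarity fall off faster, so each training point influences a smaller neighbourhood.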

About Dataset
[Figures: Parkinson Disease, Heart Disease, and Diabetes datasets]
IV. RESULTS AND ANALYSIS
Disease               Algorithm             Accuracy score
Diabetes              SVM                   0.77
Heart Disease         Logistic Regression   0.81
Parkinson's Disease   SVM                   0.87
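For reference, an accuracy score of this kind is the fraction of test samples whose predicted label matches the true label. The sketch below uses made-up labels and does not reproduce the figures in the table.

```python
def accuracy_score(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up labels for ten test patients (1 = disease present, 0 = absent).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0]
score = accuracy_score(y_true, y_pred)
```

Accuracy alone can mislead when one class is much rarer than the other, which is why one of the surveyed studies reports precision instead.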


V. CONCLUSION
Data for healthcare is an interdisciplinary topic of research that evolved from database statistics and is valuable
in assessing the efficiency of medical interventions, including through data visualization with machine learning.
Diabetes-related heart disease is a kind of heart disease that occurs in diabetics. Diabetes is a chronic disease
that arises when the pancreas fails to produce enough insulin or when the body fails to properly use the insulin
that is produced. Heart disease, often known as cardiovascular disease, is a group of disorders affecting the heart
or blood vessels. Despite the existence of many data mining classification methods for predicting heart
disease, there is insufficient data to predict heart disease in a diabetic individual. We fine-tuned the SVM model
for optimum performance in forecasting the chance of heart disease in diabetic patients, since it consistently
outperformed the logistic regression model.
VI. REFERENCES
[1] Kumari, V. Anuja, and R. Chitra. "Classification of diabetes disease using support vector machine."
International Journal of Engineering Research and Applications 3.2 (2013): 1797-1801.
[2] Sisodia, Deepti, and Dilip Singh Sisodia. "Prediction of diabetes using classification algorithms."
Procedia Computer Science 132 (2018): 1578-1585.
[3] Tafa, Zhilbert, Nerxhivane Pervetica, and Bertran Karahoda. "An intelligent system for diabetes
prediction." 2015 4th Mediterranean Conference on Embedded Computing (MECO). IEEE, 2015.
[4] Mujumdar, Aishwarya, and V. Vaidehi. "Diabetes prediction using machine learning algorithms."
Procedia Computer Science 165 (2019): 292-299.
[5] Joshi, Tejas N., and P. P. M. Chawan. "Diabetes prediction using machine learning techniques." IJERA 8.1
(2018): 9-13.
[6] Sriram, T. V., et al. "Intelligent Parkinson disease prediction using machine learning algorithms." Int. J.
Eng. Innov. Technol. 3 (2013): 212-215.
[7] Yadav, Anupama, Levish Gediya, and Adnanuddin Kazi. "Heart disease prediction using machine
learning." International Research Journal of Engineering and Technology (IRJET) 8.09 (2021).
[8] Jindal, Harshit, et al. "Heart disease prediction using machine learning algorithms." IOP Conference
Series: Materials Science and Engineering. Vol. 1022. No. 1. IOP Publishing, 2021.
[9] Sharma, Vijeta, Shrinkhala Yadav, and Manjari Gupta. "Heart disease prediction using machine learning
techniques." 2020 2nd International Conference on Advances in Computing, Communication Control
and Networking (ICACCCN). IEEE, 2020.
[10] Arumugam, K., et al. "Multiple disease prediction using machine learning algorithms." Materials Today:
Proceedings (2021).
