BE Project13
Abstract— With the continuing increase in the number of deadly diseases that threaten human health and life, medical Decision Support Systems (DSS) continue to prove their effectiveness in supporting physicians and other healthcare professionals in clinical decision making. Among these diseases, diabetes remains one of the leading causes of death worldwide. It is characterized by an increase in blood sugar levels, which can have severe effects on other organs. According to the International Diabetes Federation (IDF), 382 million people are living with diabetes, and by 2035 this figure is expected to reach 592 million. In this paper, we propose a DSS for diabetes prediction based on Machine Learning (ML) techniques. We compare conventional machine learning with deep learning approaches. For conventional machine learning, we consider two commonly used classifiers: the Support Vector Machine (SVM) and the Random Forest (RF). For Deep Learning (DL), we employ a fully Convolutional Neural Network (CNN) to predict and detect diabetic patients. The proposed system is evaluated on the publicly available Pima Indians Diabetes database, which consists of 768 samples, each with 8 features; 500 samples are labeled as non-diabetic and 268 as diabetic. The overall accuracy obtained using DL, SVM and RF was 76.81%, 65.38% and 83.67%, respectively. The experimental results show that RF was more effective for diabetes prediction than the deep learning and SVM methods.

Keywords— Decision Support Systems, diabetes, machine learning, deep learning, Support Vector Machine, Random Forest, Convolutional Neural Network.

I. INTRODUCTION

Diabetes mellitus, or diabetes, is an incurable chronic disease caused by a lack or absence of the hormone insulin [1]. Insulin is an essential hormone produced by the pancreas that allows cells to absorb glucose (blood sugar) from food in order to provide them with the necessary energy [2]. The presence of high sugar levels in the blood is known medically as hyperglycemia. It can occur for two main reasons: (1) the body cannot make the insulin that the cells require, or (2) the body cannot respond to insulin properly. The body needs insulin so that glucose in the blood can enter the cells, where it is used for energy. If the body fails to use glucose to produce energy, glucose builds up in the blood, resulting in hyperglycemia. This can cause serious health problems such as diabetic ketoacidosis, nonketotic hyperosmolar syndrome, cardiovascular disease and stroke.

According to the World Health Organization, diabetes is one of the leading causes of death worldwide, and about 422 million people have the disease. It caused the deaths of 1.6 million people in 2016 [3].

There are two main types of diabetes, type 1 and type 2. Type 1 diabetes accounts for 5 to 10% of all diabetes cases. It appears most often during childhood or adolescence and is characterized by partial functioning of the pancreas. At the beginning, type 1 diabetes does not produce any symptoms, as the pancreas remains partially functional. The disease only becomes apparent when 80-90% of the pancreatic insulin-producing cells have already been destroyed [4].

Type 2 diabetes accounts for 90% of all diabetes cases. It is characterized by chronic hyperglycemia and the body's inability to regulate blood sugar levels, which leads to an excessively high glucose (sugar) level in the blood. The disease usually occurs in older adults and more often affects obese or overweight people [5].

Doctors and current research agree that the earlier the disease is discovered, the greater the chances of recovery. With the continuous advancement of technology, machine learning and deep learning techniques have become very useful in early prediction and disease analysis. Among these techniques, the Support Vector Machine (SVM), the Random Forest (RF) and the Convolutional Neural Network (CNN) are used in this research to predict diabetes.

Recently, several studies have focused on predicting diabetes using machine learning and deep learning techniques. For instance, the authors of [6] proposed a deep learning-based method for diabetes data classification using a Deep Neural Network (DNN). The system was tested on the Pima Indians Diabetes data set and achieved a classification accuracy of 86.26%, which shows the effectiveness of the DNN in helping doctors to predict the disease.

In [7], the authors presented a theoretical study based on three classification methods from machine learning: the SVM, Logistic Regression (LR) and the Artificial Neural Network (ANN).

V. CONCLUSION
This study performed a comparative analysis of machine learning and deep learning-based algorithms for the prediction of diabetes. The results showed that RF was the most effective classifier in all rounds of experiments, with an overall accuracy of 83.67%. The SVM reached 65.38%, while the DL method reached 76.81% on our dataset. In future work, we would like to improve the feature extraction step by applying an automatic deep feature extraction approach, and to obtain a better-fitting model in order to improve prediction accuracy.
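
As a reference point for the accuracies reported above, one can compute the majority-class baseline: the accuracy of a trivial classifier that always predicts the most frequent label. The sketch below is not part of the study's pipeline. The helper names `accuracy` and `majority_baseline` are hypothetical, and the label list is rebuilt only from the class counts stated in the abstract (500 non-diabetic, 268 diabetic), with 0 and 1 as assumed label encodings.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_true):
    """Accuracy obtained by always predicting the most frequent class."""
    most_common_label, _ = Counter(y_true).most_common(1)[0]
    return accuracy(y_true, [most_common_label] * len(y_true))


# Class counts from the abstract; the 0/1 encoding is an assumption.
labels = [0] * 500 + [1] * 268
baseline = majority_baseline(labels)
print(f"majority-class baseline: {baseline:.2%}")  # prints 65.10%
```

On the full dataset this baseline is 500/768, or about 65.10%. It is only indicative, since the accuracies in this paper may have been measured on a held-out subset with a slightly different class ratio.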