Prediction of Diabetes Using R
Submitted: 18-12-2022   Accepted: 31-12-2022
ABSTRACT
Diabetes is a chronic disease caused by persistently high blood sugar levels in the human body. It is classified into Type 1 and Type 2 diabetes, and there is also gestational diabetes (diabetes during pregnancy). Currently, diabetes is diagnosed using the A1C test, the fasting blood sugar test, the glucose tolerance test and the random blood sugar test. However, if the risk is detected early, diabetes can be avoided. Machine Learning and Deep Learning techniques for detecting diabetes can help address this issue. This research paper experiments with and analyzes three Machine Learning algorithms, Random Forest (RF), Decision Tree and K-Nearest Neighbor (KNN), together with upsampling, feature selection and performance metrics (precision and recall). The data used in this study was procured from the Iraqi Society, from the laboratory of Medical City Hospital (the Specialized Center for Endocrinology and Diabetes-Al-Kindy Teaching Hospital). The dataset consists of 11 risk factors. Upsampling was applied, and feature selection with a correlation matrix helped to weed out some irrelevant factors.
Keywords: Machine Learning, Diabetes Prediction, Regression Analysis, KNN, Random Forest, Decision Tree, Upsampling, Feature Selection, Precision, Recall.
I. INTRODUCTION
Diabetes is a disease that threatens lives around the world today. The most common types of diabetes are Type 1, Type 2 and gestational diabetes. Risk factors include age, high blood pressure, weight and family history, and symptoms may include hunger, fatigue, excessive thirst, blurred vision and numbness [1]. India's adult population has an estimated 72.96 million cases of diabetes. The prevalence in urban areas ranged from 10.9% to 14.2% [9]. In rural India, the prevalence was 3.0-7.8% in the population aged 20 years and above, with a much higher prevalence among individuals over the age of 50 (INDIAB Study) [9]. More than 200 million people are affected, and the annual prevalence of diabetes in the world is increasing by about seven percent [16].
The K-Nearest Neighbor algorithm is a simple supervised algorithm used for both classification and regression. The Decision Tree algorithm builds a training model from the data, which is then used to predict outcomes. Random Forest is one of the most widely used algorithms for classification and regression analysis. This paper therefore implements these three prediction techniques, taking into consideration only the significant factors in the dataset. For better results, upsampling, feature selection and data cleaning have been applied.
II. DATASET DESCRIPTION
The diabetes data was obtained from the Iraqi Society, from the laboratory of Medical City Hospital (the Specialized Center for Endocrinology and Diabetes-Al-Kindy Teaching Hospital). The dataset includes 10 risk factors, and the patient's gender is also taken into consideration. These characteristics are listed in Table 1. The dataset consists of a total of 1000 observations with 11 attributes. It contains 2 integer, 2 character and 8 numeric attributes.
Table 1
Diabetes Dataset Risk Factors
Feature Number   Attribute Name   Attribute Type
1                Gender           Character
2                Age              Integer
3                Urea             Numeric
4                Cr               Integer
5                HbA1c            Numeric
6                Chol             Numeric
7                TG               Numeric
DOI: 10.35629/5252-0412885890 Impact Factor value 7.429 | ISO 9001: 2008 Certified Journal Page 885
International Journal of Advances in Engineering and Management (IJAEM)
Volume 4, Issue 12 Dec. 2022, pp: 885-890 www.ijaem.net ISSN: 2395-5252
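The abstract and introduction state that upsampling was applied, but the surviving text does not say which method was used. A minimal sketch, assuming simple random oversampling (rows of the smaller classes are duplicated at random until every class is as large as the biggest one), is shown below. The paper's own work is in R; Python is used here only for illustration, and the function name `upsample`, the seed and the toy values are hypothetical.

```python
import random
from collections import Counter


def upsample(rows, labels, seed=0):
    """Random oversampling (an assumed method; the paper does not specify one):
    duplicate randomly chosen rows of each smaller class until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Positions of the rows belonging to this class.
        members = [i for i, lab in enumerate(labels) if lab == cls]
        # Draw the missing rows with replacement (k is 0 for the largest class).
        for i in rng.choices(members, k=target - n):
            out_rows.append(rows[i])
            out_labels.append(labels[i])
    return out_rows, out_labels


if __name__ == "__main__":
    # Hypothetical HbA1c values with an imbalanced 0/1 outcome.
    X = [[4.9], [5.1], [8.2], [7.9], [9.0], [8.8]]
    y = [0, 0, 1, 1, 1, 1]
    X_up, y_up = upsample(X, y)
    print(Counter(y_up))
```

Oversampling is normally applied to the training split only; duplicating rows before the train/test split lets copies of the same patient appear in both sets and inflates the evaluation scores.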
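The abstract says a correlation matrix helped remove irrelevant factors. One plausible reading, sketched below under that assumption, is to compute the Pearson correlation of each numeric attribute with a 0/1-encoded outcome and keep only the attributes whose absolute correlation reaches a threshold. The threshold of 0.3, the helper names `pearson` and `select_features`, and the toy columns are hypothetical; the paper does not report its cut-off.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (sx * sy)


def select_features(columns, target, threshold=0.3):
    """Keep features whose |correlation| with the target is >= threshold
    (the threshold is a hypothetical choice). Returns the kept names,
    strongest first, and the score of every feature."""
    scores = {name: pearson(vals, target) for name, vals in columns.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    return sorted(kept, key=lambda name: -abs(scores[name])), scores


if __name__ == "__main__":
    # Hypothetical toy columns; target is 1 = diabetic, 0 = non-diabetic.
    cols = {
        "HbA1c": [4.8, 5.0, 5.2, 7.5, 8.1, 9.0],
        "Chol": [4.2, 4.2, 4.3, 4.2, 4.2, 4.3],
        "Age": [30, 45, 38, 52, 60, 41],
    }
    target = [0, 0, 0, 1, 1, 1]
    kept, scores = select_features(cols, target)
    print(kept)
    print({k: round(v, 2) for k, v in scores.items()})
```

Pearson correlation only captures linear, pairwise relationships, so a low score does not prove that an attribute is useless to a tree-based model such as Random Forest.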
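The K-Nearest Neighbor algorithm described in the introduction classifies a new patient by looking at the k most similar training records. The sketch below assumes Euclidean distance and a simple majority vote; `knn_predict`, k = 3 and the toy points are illustrative choices, not the paper's settings. For regression, the vote would be replaced by the mean of the neighbours' target values.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training rows."""
    # Sort training rows by distance to the query and keep the k closest.
    nearest = sorted(zip(train_X, train_y), key=lambda p: euclidean(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # most_common(1) returns [(label, count)] for the most frequent label.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Two hypothetical, already-scaled features; labels 0/1.
    X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.8, 0.9], [0.9, 0.85], [0.85, 0.8]]
    y = [0, 0, 0, 1, 1, 1]
    print(knn_predict(X, y, [0.82, 0.88], k=3))
```

Because KNN is distance-based, attributes on larger numeric scales (for example Age compared with HbA1c) dominate the distance unless the features are standardised first.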
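The introduction describes the Decision Tree algorithm as building a training model that is then used to predict outcomes. The paper does not name its splitting criterion; the sketch below assumes CART-style Gini impurity, choosing at each node the split row[f] <= t with the lowest weighted impurity and stopping at a pure node or a maximum depth. All function names, `max_depth=3` and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (feature, threshold, weighted_gini) for the split row[f] <= t
    with the lowest weighted Gini impurity, or None if no split exists."""
    best = None
    n = len(y)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # both children must receive at least one row
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


def build_tree(X, y, depth=0, max_depth=3):
    """Grow the tree recursively. Internal nodes are tuples
    (feature, threshold, left_subtree, right_subtree); leaves are labels."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or gini(y) == 0.0:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    f, t, _ = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (
        f,
        t,
        build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
        build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth),
    )


def predict(tree, row):
    """Follow the thresholds from the root down to a leaf label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


if __name__ == "__main__":
    # Hypothetical rows of (HbA1c, TG); labels 0 = non-diabetic, 1 = diabetic.
    X = [[5.0, 1.2], [5.4, 1.5], [6.9, 2.1], [8.1, 2.8], [7.6, 1.9], [4.8, 2.6]]
    y = [0, 0, 1, 1, 1, 0]
    tree = build_tree(X, y)
    print(tree, predict(tree, [7.0, 2.0]))
```

Limiting the depth is a simple guard against overfitting; without it a tree keeps splitting until every leaf is pure.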
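Random Forest, mentioned in the introduction, combines many decision trees, each trained on a bootstrap sample of the rows with a random subset of features considered at each split, and predicts by majority vote. To stay short, the sketch below grows depth-1 trees (stumps), so choosing features per tree and per split coincide; a practical forest grows much deeper trees. The default of sqrt(p) candidate features, the helper names and the toy data are assumptions for illustration, not the paper's configuration.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(X, y, features):
    """Depth-1 tree: best Gini split row[f] <= t over the candidate features.
    Returns (feature, threshold, left_label, right_label)."""
    majority = Counter(y).most_common(1)[0][0]
    # Fallback if no valid split exists: every row goes left and gets the majority label.
    best = (features[0], float("inf"), majority, majority)
    best_score = float("inf")
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = len(left) * gini(left) + len(right) * gini(right)
            if score < best_score:
                best_score = score
                best = (f, t, Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return best


def fit_forest(X, y, n_trees=25, n_features=None, seed=0):
    """Train n_trees stumps, each on a bootstrap sample of the rows and a
    random subset of n_features features (default: sqrt of the feature count)."""
    rng = random.Random(seed)
    p = len(X[0])
    m = n_features or max(1, int(p ** 0.5))
    forest = []
    for _ in range(n_trees):
        # Bootstrap: draw len(X) row indices with replacement.
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        feats = rng.sample(range(p), m)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Majority vote over all stumps in the forest."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical rows of (HbA1c, Urea); labels 0 = non-diabetic, 1 = diabetic.
    X = [[5.0, 4.1], [5.4, 3.8], [4.8, 5.2], [7.6, 4.5], [8.1, 3.9], [9.0, 5.0]]
    y = [0, 0, 0, 1, 1, 1]
    forest = fit_forest(X, y, n_trees=15)
    print(predict_forest(forest, [8.5, 4.2]))
```

The bootstrap and the random feature choice make the individual trees differ from one another, which is what lets the majority vote reduce variance compared with a single tree.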
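The abstract names precision and recall as the performance metrics. For a chosen positive class (here, diabetic), precision is TP / (TP + FP), the share of predicted positives that are correct, and recall is TP / (TP + FN), the share of actual positives that are found. A minimal sketch, with a hypothetical function name and hypothetical labels:

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN).
    Returns 0.0 for a metric whose denominator is zero."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical outcomes for 8 patients (1 = diabetic).
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 1, 0, 0]
    p, r = precision_recall(y_true, y_pred)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.60 recall=0.75
```

In a screening setting, recall is usually the more critical of the two, since a missed diabetic patient (a false negative) is costlier than an extra confirmatory test.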
V. CONCLUSION
Diabetes is one of the most common medical problems in today's world, and its detection and prediction matter because, if it is not diagnosed in the early phase, it can lead to many other health problems. The algorithms and model-effectiveness techniques used above can serve as