Data Analysis of Diabetes Using Machine Learning: Dept. of Mechanical Engineering MITS, Madanapalle

The document summarizes a project that uses machine learning techniques to analyze medical data related to diabetes. Specifically, it used a K-Nearest Neighbors (KNN) algorithm to predict diabetes status using 760 instances from a diabetes dataset. The project aims to help with early prediction of illness and identify contributing factors. It describes preprocessing the data, applying the KNN model to make predictions, and evaluating the accuracy of the predictions against actual outcomes.

Uploaded by

Hemasundar Reddy Jollu

DATA ANALYSIS OF DIABETES USING MACHINE LEARNING
Dept. of Mechanical Engineering
MITS, Madanapalle

Under the guidance of:

Dr. RAM KRISHNA

Project by:
P. Vishnu Vardhan (18691A03F6)
C.V. Akhileswar Reddy (18691A03F1)
B. Sravan kumar (18691A03F2)
N. Yogeshwar Reddy (18691A03G1)
K. Santhosh kumar (18691A03B9)
Abstract:

Machine learning techniques are used to analyse medical information so that
illness can be detected early and human life protected. One such task is the
analysis of illness data. Diabetes is currently among the main causes of
mortality worldwide. Researchers have employed various variables at different
stages to organise and analyse symptoms in medical data. In this study, a total
of 760 instances from a data set obtained from the National Institute of
Diabetes and Digestive and Kidney Diseases were used for analysis. Among
machine learning algorithms, one of the best-known predictive algorithms is
K-Nearest Neighbors (KNN). We applied the KNN algorithm to historical medical
data to predict illness early and to examine how each parameter contributes to
the illness.
PROBLEM STATEMENT
• This dataset is originally from the National Institute of Diabetes and Digestive and
Kidney Diseases. The objective of the dataset is to diagnostically predict whether or
not a patient has diabetes, based on certain diagnostic measurements included in the
dataset. Several constraints were placed on the selection of these instances from a
larger database. In particular, all patients here are females at least 21 years old.
Sample Data set

Pregnancies  Glucose  Blood pressure  Skin thickness  Insulin  BMI   Diabetes Pedigree Function  Age  Outcome
6            148      72              35              0        33.6  0.627                       50   1
1            85       66              29              0        26.6  0.351                       31   0
8            183      64              0               0        23.3  0.672                       32   1
1            89       66              23              94       28.1  0.167                       21   0
0            137      40              35              168      43.1  2.288                       33   1
5            116      74              0               0        25.6  0.201                       30   1

Problem Details
• X (Input) = Pregnancies, Glucose, Blood Pressure, Skin Thickness, Insulin, BMI,
Diabetes Pedigree Function, Age
• Y (Output) = Outcome
• The output is either 1 or 0.
• An Outcome of 1 indicates a diabetic patient and an Outcome of 0 indicates a
non-diabetic patient.
• If the patient is diabetic, treatment is carried out according to the patient's condition.
• Above all, health has to be given priority.
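
The input/output split described above can be sketched in a few lines. This is only an illustration built on the six sample rows from the slide; the column spellings (for example `BloodPressure`) and the helper name `split_inputs_and_output` are assumptions, not names taken from the project.

```python
import csv
import io

# Six sample rows from the slide, written as CSV text. The header spellings
# are assumed; the real file may name its columns differently.
SAMPLE_CSV = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
1,89,66,23,94,28.1,0.167,21,0
0,137,40,35,168,43.1,2.288,33,1
5,116,74,0,0,25.6,0.201,30,1
"""


def split_inputs_and_output(csv_text, target="Outcome"):
    """Return (feature_names, X, y): X holds the inputs, y the 0/1 labels."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    feature_names = [name for name in rows[0] if name != target]
    X = [[float(row[name]) for name in feature_names] for row in rows]
    y = [int(row[target]) for row in rows]
    return feature_names, X, y


feature_names, X, y = split_inputs_and_output(SAMPLE_CSV)
print(len(feature_names), "input columns;", len(X), "rows; labels:", y)
```

Every column except `Outcome` ends up in `X`, so the eight diagnostic measurements are the inputs and the single 0/1 column is the output.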
LITERATURE REVIEW

S.NO | TITLE | AUTHOR'S NAME | YEAR OF PUBLICATION | FINDINGS
1 | Public health Prev. Med | J.H. Kim, J & Choi, H.G | 2015 | Develops a statistical diagnosis model by identifying the principal parameters for type 2 diabetes mellitus
2 | International Journal of Engineering and Applied Sciences, 2(5), 145-148 | Kumar Dewangam, A & Agarwal P. | 2015 | Classification of diabetes mellitus using a machine learning technique
3 | Procedia Computer Science, 50, 230-208 | Eswari T., Sampath P., & Lavanya S. | 2015 | Predictive methodology for diabetic data analysis in big data
4 | International Journal of Engineering Research and Applications, 3(2), 1797-1801 | Kumari V.A. & Chitra R. | 2013 | Classification of diabetes disease using a support vector machine
METHODOLOGY (K-NN)

• K-Nearest Neighbor is one of the simplest machine learning algorithms and is based
on the supervised learning technique.
• The K-NN algorithm assumes that a new case is similar to the available cases and
puts the new case into the category whose members it most resembles.
• The K-NN algorithm stores all the available data and classifies a new data point
based on similarity. This means that when new data appears, it can easily be
classified into a well-suited category using the K-NN algorithm.
• The K-NN algorithm can be used for regression as well as for classification, but
it is mostly used for classification problems.
K-NN
• K-NN is a non-parametric algorithm, which means it does not make any
assumption about the distribution of the underlying data.
• It is also called a lazy learner algorithm because it does not learn from
the training set immediately. Instead, it stores the dataset and performs its
computation on the dataset at classification time.
• In the training phase, the KNN algorithm just stores the dataset. When it gets
new data, it classifies that data into the category that is most similar to
the new data.
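
The lazy-learner behaviour described above can be illustrated with a small from-scratch classifier. This is a minimal sketch, not the project's code: the class name `LazyKNN`, the use of Euclidean distance, the choice of `k=3`, and the toy points are all assumptions made for the example.

```python
import math
from collections import Counter


class LazyKNN:
    """Toy K-NN classifier: fit() only stores data, predict_one() does the work."""

    def __init__(self, k=3):
        self.k = k  # number of neighbours that vote (assumed value)
        self.points = []
        self.labels = []

    def fit(self, points, labels):
        # "Training" is just remembering the dataset; nothing is learned yet.
        self.points = list(points)
        self.labels = list(labels)
        return self

    def predict_one(self, query):
        # Distance from the query to every stored point (Euclidean, assumed).
        scored = [
            (math.dist(point, query), label)
            for point, label in zip(self.points, self.labels)
        ]
        scored.sort(key=lambda pair: pair[0])
        # Majority vote among the k closest stored points.
        nearest_labels = [label for _, label in scored[: self.k]]
        return Counter(nearest_labels).most_common(1)[0][0]


# Two small groups of 2-D points, labelled Category A and Category B.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["A", "A", "A", "B", "B", "B"]

model = LazyKNN(k=3).fit(points, labels)
print(model.predict_one((2, 2)))  # A
print(model.predict_one((6, 5)))  # B
```

All of the distance computation happens inside `predict_one`, which is why prediction cost grows with the size of the stored dataset.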
K-NN
• Example: Suppose we have an image of a creature that looks similar to both a cat
and a dog, and we want to know whether it is a cat or a dog. For this identification
we can use the KNN algorithm, as it works on a similarity measure. Our KNN model
will compare the features of the new image with those of the cat and dog images and,
based on the most similar features, put it in either the cat or the dog category.
Why do we need a K-NN algorithm?

• Suppose there are two categories, Category A and Category B, and we have a
new data point x1. In which of these categories does this data point lie? To
solve this type of problem, we need a K-NN algorithm. With the help of K-NN,
we can easily identify the category or class of a particular data point.
[Diagram: a new data point x1 that must be placed in either Category A or Category B]
Step by step approach:

1. Uploading: Upload the “Diabetes.csv” data file to Google Colab.
2. Importing: After uploading, import pandas as pd, matplotlib.pyplot as plt,
and numpy as np.
3. Reading: Read the uploaded data into df (a data frame) in Google Colab.
4. Division: Divide the data into inputs and outputs as the x and y variables
respectively.
5. K-NN classifier: This is where the actual classification starts.
6. Import KNeighborsClassifier from sklearn.neighbors and create an object kc
of KNeighborsClassifier.
7. Fit the x and y values in the kc object.
8. The model is now fitted in kc.
9. Predict the value of y from x and store it in the pred_y variable.
10. Compare the pred_y values with the y values to find the errors the model
makes (accuracy and confusion matrix).
11. Check the accuracy of the predicted values against the actual output
values using accuracy_score from sklearn.metrics.
12. The confusion matrix also shows how many values the model classified
correctly and how many it classified wrongly.
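
The steps above can be sketched roughly as follows. This is a hedged illustration rather than the project's notebook: the six sample rows stand in for the uploaded file, the column spellings are assumed, and `n_neighbors=3` is an arbitrary choice made so the example works on six rows (the slides do not state the value of K).

```python
import io

import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.neighbors import KNeighborsClassifier

# Stand-in for the uploaded file; with the real data this would be
# pd.read_csv("Diabetes.csv"). Header spellings are assumed.
csv_text = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
1,89,66,23,94,28.1,0.167,21,0
0,137,40,35,168,43.1,2.288,33,1
5,116,74,0,0,25.6,0.201,30,1
"""
df = pd.read_csv(io.StringIO(csv_text))

# Step 4: inputs (all measurement columns) and output (Outcome).
x = df.drop(columns="Outcome")
y = df["Outcome"]

# Steps 5-8: create the classifier object and fit it.
kc = KNeighborsClassifier(n_neighbors=3)  # K = 3 is an assumption
kc.fit(x, y)

# Step 9: predict y from x.
pred_y = kc.predict(x)

# Steps 10-12: compare predictions with the actual outcomes.
accuracy = accuracy_score(y, pred_y)
matrix = confusion_matrix(y, pred_y)
print("accuracy:", accuracy)
print(matrix)
```

As in the steps, this sketch predicts on the same rows it was fitted on, which tends to overstate accuracy. Holding out part of the data, for example with `sklearn.model_selection.train_test_split`, is the usual way to get a more realistic estimate.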
CONTRIBUTING PARAMETERS
459 (TP)   42 (FP)
68 (FN)    201 (TN)
Result
• True Positive (TP) = 459: 459 positive-class data points were correctly
classified by the model (they have diabetes).
• True Negative (TN) = 201: 201 negative-class data points were correctly
classified by the model (they are free from diabetes according to the model).
• False Positive (FP) = 42: 42 negative-class data points were incorrectly
classified as belonging to the positive class (they do not have diabetes, but
the model indicates that they do).
• False Negative (FN) = 68: 68 positive-class data points were incorrectly
classified as belonging to the negative class (they have diabetes, but the
model indicates that they do not).
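
Summary metrics follow directly from the four counts reported above. The sketch below uses only those counts with the slide's own TP/FP/FN/TN labelling; the metric definitions are the standard ones, and the variable names are chosen for the example.

```python
# Counts as labelled on the slide.
tp, fp, fn, tn = 459, 42, 68, 201

total = tp + fp + fn + tn
accuracy = (tp + tn) / total   # share of all predictions that were correct
precision = tp / (tp + fp)     # share of predicted positives that were right
recall = tp / (tp + fn)        # share of actual positives that were found

print(f"total={total} accuracy={accuracy:.3f} "
      f"precision={precision:.3f} recall={recall:.3f}")
# total=770 accuracy=0.857 precision=0.916 recall=0.871
```

If the counts were produced by scikit-learn's `confusion_matrix` with labels 0 and 1, note that its layout is `[[TN, FP], [FN, TP]]` with class 1 as the positive class, so which cell counts as "TP" depends on which class is treated as positive.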
Thank You
