
Data Analysis of Diabetes Using Machine Learning: Dept. of Mechanical Engineering MITS, Madanapalle

The document summarizes a project that uses machine learning techniques to analyze medical data related to diabetes. Specifically, it used a K-Nearest Neighbors (KNN) algorithm to predict diabetes status using 760 instances from a diabetes dataset. The project aims to help with early prediction of illness and identify contributing factors. It describes preprocessing the data, applying the KNN model to make predictions, and evaluating the accuracy of the predictions against actual outcomes.

DATA ANALYSIS OF DIABETES

USING
MACHINE LEARNING
Dept. of Mechanical Engineering
MITS, Madanapalle

Under the guidance of:

Dr. RAM KRISHNA

Project by:
P. Vishnu Vardhan (18691A03F6)
C.V. Akhileswar Reddy (18691A03F1)
B. Sravan kumar (18691A03F2)
N. Yogeshwar Reddy (18691A03G1)
K. Santhosh kumar (18691A03B9)
Abstract:

Machine learning techniques are used to analyse medical information at an
early stage in order to protect human life. One of their tasks is the analysis
of illness data. Diabetes is currently among the main causes of mortality
worldwide. Different researchers have used various variables at specific
stages to organise and analyse symptoms in medical data. In this study, a data
set of 760 instances, obtained from the National Institute of Diabetes and
Digestive and Kidney Diseases, is used for analysis. Among machine learning
algorithms, K-Nearest Neighbors (KNN) is one of the best-known predictive
algorithms. We use the KNN algorithm on historical medical data to predict the
illness early and to identify the parameters that contribute to it.
PROBLEM STATEMENT
• This dataset is originally from the National Institute of Diabetes and Digestive and
Kidney Diseases. The objective of the dataset is to diagnostically predict whether or
not a patient has diabetes, based on certain diagnostic measurements included in the
dataset. Several constraints were placed on the selection of these instances from a
larger database. In particular, all patients here are females at least 21 years old.
Sample Data set

Pregnancies  Glucose  Blood pressure  Skin thickness  Insulin  BMI   Diabetes Pedigree Function  Age  Outcome
6            148      72              35              0        33.6  0.627                       50   1
1            85       66              29              0        26.6  0.351                       31   0
8            183      64              0               0        23.3  0.672                       32   1
1            89       66              23              94       28.1  0.167                       21   0
0            137      40              35              168      43.1  2.288                       33   1
5            116      74              0               0        25.6  0.201                       30   1
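As a small illustration, the six sample rows can be loaded into a pandas data frame and separated into the eight input columns and the Outcome column. This is only a sketch: the column names below are assumed spellings of the table headers, and the names COLUMNS, ROWS, X and y are chosen here for illustration.

```python
import pandas as pd

# Assumed column names, spelled after the headers of the sample table.
COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]

# The six sample rows, copied from the table.
ROWS = [
    [6, 148, 72, 35, 0, 33.6, 0.627, 50, 1],
    [1, 85, 66, 29, 0, 26.6, 0.351, 31, 0],
    [8, 183, 64, 0, 0, 23.3, 0.672, 32, 1],
    [1, 89, 66, 23, 94, 28.1, 0.167, 21, 0],
    [0, 137, 40, 35, 168, 43.1, 2.288, 33, 1],
    [5, 116, 74, 0, 0, 25.6, 0.201, 30, 1],
]

df = pd.DataFrame(ROWS, columns=COLUMNS)

# Inputs: the eight measurements. Output: the Outcome label.
X = df.drop(columns="Outcome")
y = df["Outcome"]

print(X.shape)
print(y.tolist())
```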


Problem Details
• X (Input) = "Pregnancies, Glucose, Blood pressure, Skin Thickness, Insulin, BMI,
Diabetes Pedigree Function, Age"
• Y (Output) = Outcome
• The output is either 1 or 0.
• An Outcome of 1 indicates a diabetic patient and an Outcome of 0 indicates a
non-diabetic patient.
• If the patient is diabetic, treatment is carried out according to the patient's condition.
• Above everything, health has to be given priority.
LITERATURE REVIEW

S.NO | TITLE | AUTHOR'S NAME | YEAR OF PUBLICATION | FINDINGS
1 | Public health Prev. Med | J.H. Kim, J & Choi, H.G | 2015 | The author explains developing a statistical diagnosis model by discovering principal parameters for type 2 diabetes mellitus
2 | International journal of engineering and applied sciences, 2(5), 145-148 | Kumar Dewangam, A & Agarwal P. | 2015 | Classification of diabetes mellitus using machine learning technique
3 | Procedia computer science, 50, 230-208 | Eswari T, Sampath P., & Lavanya S. | 2015 | Predictive methodology for diabetic data analysis in big data
4 | International journal of engineering research and applications, 3(2), 1797-1801 | Kumari V.A, & Chitra R. | 2013 | Classification of diabetes disease using support vector machine
METHODOLOGY (K-NN)

• K-Nearest Neighbor is one of the simplest Machine Learning algorithms based
on the Supervised Learning technique.
• The K-NN algorithm assumes similarity between the new case/data and the
available cases and puts the new case into the category that is most similar to
the available categories.
• The K-NN algorithm stores all the available data and classifies a new data point
based on similarity. This means that when new data appears, it can be
easily classified into a well-suited category by using the K-NN algorithm.
• The K-NN algorithm can be used for Regression as well as for Classification, but
it is mostly used for Classification problems.
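The use of K-NN for both classification and regression can be sketched with scikit-learn's KNeighborsClassifier and KNeighborsRegressor. The one-dimensional toy data and the choice of 3 neighbours below are assumptions made for illustration and are not taken from the project.

```python
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Toy data (assumed): two well-separated groups on a line.
X = [[0], [1], [2], [10], [11], [12]]
y_class = [0, 0, 0, 1, 1, 1]                  # labels for classification
y_value = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]   # targets for regression

clf = KNeighborsClassifier(n_neighbors=3).fit(X, y_class)
reg = KNeighborsRegressor(n_neighbors=3).fit(X, y_value)

new_point = [[1.5]]
# Classification: majority label among the 3 nearest neighbours.
predicted_class = clf.predict(new_point)[0]
# Regression: mean target of the 3 nearest neighbours.
predicted_value = reg.predict(new_point)[0]

print(predicted_class, predicted_value)
```

The three nearest neighbours of 1.5 are the points 0, 1 and 2, so the classifier votes for class 0 and the regressor averages their targets.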
K-NN
• K-NN is a non-parametric algorithm, which means it does not make any
assumption about the underlying data.
• It is also called a lazy learner algorithm because it does not learn from
the training set immediately; instead it stores the dataset and performs its
computation on the dataset at the time of classification.
• In the training phase the KNN algorithm just stores the dataset; when it gets
new data, it classifies that data into the category that is most similar to
the new data.
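The lazy-learner behaviour can be made concrete with a minimal from-scratch sketch. The class name LazyKNN, the Euclidean distance, the default k = 3 and the toy points are all assumptions for illustration; this is not the implementation used in the project.

```python
from collections import Counter

import numpy as np


class LazyKNN:
    """Minimal K-NN classifier: fit() only stores data, predict() does the work."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" is just storing the dataset.
        self.X_ = np.asarray(X, dtype=float)
        self.y_ = np.asarray(y)
        return self

    def predict(self, X_new):
        predictions = []
        for point in np.asarray(X_new, dtype=float):
            # Euclidean distance from the new point to every stored point.
            distances = np.linalg.norm(self.X_ - point, axis=1)
            # Indices of the k closest stored points.
            nearest = np.argsort(distances)[: self.k]
            # Majority vote among their labels.
            votes = Counter(self.y_[nearest].tolist())
            predictions.append(votes.most_common(1)[0][0])
        return predictions


model = LazyKNN(k=3).fit(
    [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]],
    [0, 0, 0, 1, 1, 1],
)
labels = model.predict([[0.5, 0.5], [5.5, 5.5]])
print(labels)
```

All the distance computation happens inside predict(), which is why the cost of K-NN is paid at classification time rather than at training time.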
K-NN
• Example: Suppose we have an image of a creature that looks similar to both a cat
and a dog, and we want to know whether it is a cat or a dog. For this identification
we can use the KNN algorithm, as it works on a similarity measure. The KNN model
compares the features of the new image with those of the cat and dog images and,
based on the most similar features, puts it in either the cat or the dog category.
Why do we need a K-NN algorithm?

• Suppose there are two categories, Category A and Category B, and we have a
new data point x1. In which of these categories does this data point lie? To
solve this type of problem, we need a K-NN algorithm. With the help of K-NN,
we can easily identify the category or class of a particular data point.
[Figure: the new data point x1 between Category A and Category B]
Suppose we have a new data point and we need
to put it in the required category.
[Figure: the new data point to be assigned to a category]
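As a small numeric illustration of the two-category case, the vote for a new point x1 can be worked through step by step. The coordinates, the value k = 5 and the use of Euclidean distance are assumptions chosen for this sketch.

```python
from collections import Counter
from math import dist

# Assumed toy points: four in Category A, four in Category B.
points = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"), ((2, 2), "A"),
    ((6, 6), "B"), ((6, 7), "B"), ((7, 6), "B"), ((7, 7), "B"),
]
x1 = (3, 3)  # the new data point
k = 5        # number of neighbours to consult

# Sort the known points by Euclidean distance to x1 and keep the k nearest.
nearest = sorted(points, key=lambda item: dist(item[0], x1))[:k]

# Count how many of the k neighbours fall in each category.
votes = Counter(label for _, label in nearest)

# x1 is assigned to the category with the most neighbours.
category = votes.most_common(1)[0][0]
print(votes, category)
```

Here four of the five nearest neighbours belong to Category A and one to Category B, so x1 is placed in Category A.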
Step by step approach:

1. Uploading: Upload the "Diabetes.csv" data file to Google Colab.
2. Importing: After uploading, import pandas as pd, matplotlib.pyplot as plt,
and numpy as np.
3. Reading: Read the uploaded data into a data frame, df, in Google Colab.
4. Division: Divide the data into inputs and outputs as the x and y variables
respectively.
5. K-NN classifier: This is where the actual classification starts.
6. Import KNeighborsClassifier from sklearn.neighbors and create an object kc
of KNeighborsClassifier.
7. Fit the x and y values to the kc object.
8. The model is now fitted in kc.
9. Predict the value of y from x and store it in the pred_y variable.
10. Compare the pred_y values with the y values to find the errors the model
makes.
Accuracy and Confusion Matrix
11. Check the accuracy of the predicted values against the actual output values
by using accuracy_score from sklearn.metrics.
12. The confusion matrix shows how many values the model classified correctly
and how many it classified wrongly.
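The steps above can be sketched roughly as follows. The names df, x, y, kc and pred_y follow the step list; the function name fit_and_evaluate, the column name "Outcome" and n_neighbors=5 (scikit-learn's default) are assumptions, since the slides do not state them. Because "Diabetes.csv" is not available here, the sketch runs on the six sample rows as a stand-in, and, as in the steps, it predicts on the same data the model was fitted on.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.neighbors import KNeighborsClassifier


def fit_and_evaluate(df, n_neighbors=5):
    """Fit K-NN on all rows of df and evaluate it on the same rows."""
    # Step 4: divide the data into inputs (x) and output (y).
    x = df.drop(columns="Outcome")
    y = df["Outcome"]
    # Steps 5-8: create the classifier object kc and fit it.
    kc = KNeighborsClassifier(n_neighbors=n_neighbors)
    kc.fit(x, y)
    # Step 9: predict y from x.
    pred_y = kc.predict(x)
    # Steps 10-12: compare predictions with the actual values.
    accuracy = accuracy_score(y, pred_y)
    # Rows are actual classes, columns are predicted classes, in label order 0, 1.
    matrix = confusion_matrix(y, pred_y, labels=[0, 1])
    return accuracy, matrix


# In Colab the data frame would come from: df = pd.read_csv("Diabetes.csv")
# Stand-in data: the six sample rows, with assumed column names.
sample = pd.DataFrame(
    [
        [6, 148, 72, 35, 0, 33.6, 0.627, 50, 1],
        [1, 85, 66, 29, 0, 26.6, 0.351, 31, 0],
        [8, 183, 64, 0, 0, 23.3, 0.672, 32, 1],
        [1, 89, 66, 23, 94, 28.1, 0.167, 21, 0],
        [0, 137, 40, 35, 168, 43.1, 2.288, 33, 1],
        [5, 116, 74, 0, 0, 25.6, 0.201, 30, 1],
    ],
    columns=[
        "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
        "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
    ],
)
accuracy, matrix = fit_and_evaluate(sample)
print(accuracy)
print(matrix)
```

With labels ordered 0, 1, scikit-learn's confusion_matrix lays the counts out as [[TN, FP], [FN, TP]], which is worth keeping in mind when reading the printed matrix.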
CONTRIBUTING PARAMETERS
459 (TP)   42 (FP)
68 (FN)    201 (TN)
Result
• True Positive (TP) = 459: 459 positive-class data points were
correctly classified by the model (they have diabetes).
• True Negative (TN) = 201: 201 negative-class data points
were correctly classified by the model (they are free from diabetes
according to the model).
• False Positive (FP) = 42: 42 negative-class data points were
incorrectly classified as belonging to the positive class by the
model (they do not have diabetes, but the model shows that they
do).
• False Negative (FN) = 68: 68 positive-class data points were
incorrectly classified as belonging to the negative class by the model
(they have diabetes, but the model shows that they do
not).
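The overall accuracy implied by these four counts can be checked with a few lines of arithmetic. The sketch below takes the TP, FP, FN and TN labels exactly as reported; precision and recall depend on which class is treated as positive, so they hold only under that labelling. If the matrix was printed directly by scikit-learn's confusion_matrix with labels 0 and 1, its layout would be [[TN, FP], [FN, TP]] and the two diagonal counts would swap roles, which is an assumption that cannot be verified from the slides. Accuracy is the same either way.

```python
# Counts as reported in the result.
tp, fp, fn, tn = 459, 42, 68, 201

total = tp + fp + fn + tn  # all classified data points

# Accuracy: share of all predictions that are correct.
accuracy = (tp + tn) / total
# Precision: share of predicted positives that are truly positive.
precision = tp / (tp + fp)
# Recall: share of actual positives that the model found.
recall = tp / (tp + fn)

print(total)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```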
Thank You
