Vijaya Vittala Institute of Technology

Department of Computer Science and Engineering


Internship
on

“EARLY DIABETES PREDICTION USING MACHINE LEARNING”


Submitted by

MOHAN KRISHNA K
1VJ17CS028

Internship carried out at
“Exposys Data Labs”

Internal Guide: Ms. Deepa Pattan, Asst. Professor, Dept. of CS&E
External Guide: Y Visnuvardhan, Chief Director
TABLE OF CONTENTS
• ABOUT THE COMPANY
• ABOUT THE DEPARTMENT
• TASK PERFORMED
• INTRODUCTION
• PROPOSED METHOD WITH ARCHITECTURE
• METHODOLOGY
• IMPLEMENTATION

• REFLECTION NOTES

• REFERENCES
ABOUT THE COMPANY
• EXPOSYS DATA LABS AIMS TO SOLVE REAL-WORLD BUSINESS PROBLEMS IN AREAS LIKE
AUTOMATION, BIG DATA AND DATA SCIENCE. OUR CORE TEAM OF EXPERTS IN VARIOUS
TECHNOLOGIES HELPS BUSINESSES IDENTIFY ISSUES AND OPPORTUNITIES AND
PROTOTYPE SOLUTIONS USING TRENDING TECHNOLOGIES LIKE AI, ML, DEEP LEARNING
AND DATA SCIENCE. WE FOLLOW A HUMAN-FOCUSSED, NOT TECHNOLOGY-DRIVEN,
APPROACH TO ACHIEVE SUCCESS IN OUR CLIENTS’ ENDEAVOURS.

• “OUR DISCOVERIES ARE BEYOND BELIEF AND IF YOU’RE WITH US, YOU’LL DISCOVER A
NEWER WAY TO THINK!”
ABOUT THE DEPARTMENT
• THE FOLLOWING IS A SUMMARY OF THE ROLES AND RESPONSIBILITIES OF THE DATA SCIENTIST AND DJANGO
DEVELOPER DEPARTMENT:
• DJANGO DEVELOPERS ARE RESPONSIBLE FOR DEVELOPING CLOUD-BASED PRODUCTS AND WORKING ON UX WITH
FRONT-END DEVELOPERS
• WORK WITH STAKEHOLDERS TO DETERMINE HOW TO USE BUSINESS DATA FOR VALUABLE BUSINESS SOLUTIONS
• SEARCH FOR WAYS TO GET NEW DATA SOURCES AND ASSESS THEIR ACCURACY
• BROWSE AND ANALYZE ENTERPRISE DATABASES TO SIMPLIFY AND IMPROVE PRODUCT DEVELOPMENT, MARKETING
TECHNIQUES, AND BUSINESS PROCESSES
• CREATE CUSTOM DATA MODELS AND ALGORITHMS
• USE PREDICTIVE MODELS TO IMPROVE CUSTOMER EXPERIENCE, AD TARGETING, REVENUE GENERATION, AND MORE
• DEVELOP THE ORGANIZATION’S A/B TESTING FRAMEWORK AND TEST MODEL QUALITY
• COORDINATE WITH VARIOUS TECHNICAL/FUNCTIONAL TEAMS TO IMPLEMENT MODELS AND MONITOR RESULTS
• DEVELOP PROCESSES, TECHNIQUES, AND TOOLS TO ANALYZE AND MONITOR MODEL PERFORMANCE WHILE
ENSURING DATA ACCURACY
TASK PERFORMED
1. INTRODUCTION
• PROBLEM STATEMENT

• IN THE USUAL DIAGNOSTIC PROCESS, PATIENTS NEED TO VISIT A DIAGNOSTIC CENTER, CONSULT THEIR
DOCTOR, AND WAIT A DAY OR MORE TO GET THEIR REPORTS.

• MOREOVER, EVERY TIME THEY WANT A DIAGNOSIS REPORT, THEY HAVE TO PAY FOR IT AGAIN.

• DIABETES MELLITUS (DM) IS DEFINED AS A GROUP OF METABOLIC DISORDERS MAINLY CAUSED BY ABNORMAL
INSULIN SECRETION AND/OR ACTION.

• MACHINE LEARNING INTRODUCTION - TOM M. MITCHELL DEFINES MACHINE LEARNING AS FOLLOWS: “A COMPUTER
PROGRAM IS SAID TO LEARN FROM EXPERIENCE E WITH RESPECT TO SOME CLASS OF TASKS T AND PERFORMANCE
MEASURE P IF ITS PERFORMANCE AT TASKS IN T, AS MEASURED BY P, IMPROVES WITH EXPERIENCE E.”

• GENERAL DEFINITION - MACHINE LEARNING IS THE TRAINING OF A MODEL FROM DATA THAT GENERALIZES A
DECISION AGAINST A PERFORMANCE MEASURE.
2.1 PROPOSED METHOD

• K-NEAREST NEIGHBOR ALGORITHM: KNN is a method for classifying objects based on
the closest training examples in the feature space. KNN is the most basic type of
instance-based learning, also called lazy learning. It assumes all instances are
points in n-dimensional space. A distance measure is needed to determine the
“closeness” of instances. KNN classifies an instance by finding its nearest
neighbors and picking the most popular class among those neighbors.
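
The description above can be illustrated with a minimal from-scratch sketch. It assumes Euclidean distance as the distance measure and k = 3, neither of which is stated on the slide; the function names and the toy data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair the distance from each training point to the query with its label,
    # then sort so the closest points come first.
    distances = sorted(
        (euclidean(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Keep the labels of the k closest points and return the most common one.
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy two-class data (made-up values, already scaled to [0, 1]).
points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.8, 0.9), (0.9, 0.8), (0.85, 0.75)]
labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(points, labels, (0.2, 0.2)))  # query near the class-0 cluster
print(knn_predict(points, labels, (0.8, 0.8)))  # query near the class-1 cluster
```

No model is fitted in advance: all the work happens at prediction time, which is why KNN is described as lazy learning.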

Figure 2: KNN model with Two classes


2.2 Architecture of the proposed algorithm

Figure 3: Flow diagram of the proposed system


3. METHODOLOGY

• In this dataset, missing values are represented by zeros, which need to be
replaced. The zeros are first replaced by NaN so that the missing values can
easily be imputed using the fillna() method. We then perform feature scaling on
the dataset using MinMaxScaler(), which rescales every feature so that it
lies between 0 and 1. This is an important preprocessing step for many
algorithms.
• In the feature correlation heatmap, we can observe that Glucose, Insulin,
Age and BMI are highly correlated with the outcome. So, we select these
features as X and the outcome as Y. The dataset is then split using
train_test_split with an 80:20 ratio.
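
The preprocessing steps above might look roughly like the following sketch, which uses pandas and scikit-learn. The data frame holds made-up values, the column name `Outcome` is assumed, and imputing with the column mean is an assumption, since the slide does not say which fill value was used. Zeros are treated as missing only in Glucose, Insulin and BMI, which is also an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Small synthetic frame (made-up values); zeros stand for missing measurements.
df = pd.DataFrame({
    "Glucose": [148, 85, 0, 89, 137, 116, 78, 115, 197, 125],
    "Insulin": [0, 0, 0, 94, 168, 0, 88, 0, 543, 0],
    "BMI": [33.6, 26.6, 23.3, 28.1, 43.1, 25.6, 31.0, 35.3, 30.5, 0.0],
    "Age": [50, 31, 32, 21, 33, 30, 26, 29, 53, 54],
    "Outcome": [1, 0, 1, 0, 1, 0, 1, 0, 1, 1],
})

features = ["Glucose", "Insulin", "Age", "BMI"]
zero_means_missing = ["Glucose", "Insulin", "BMI"]  # assumption: Age is never 0

# Step 1: mark zeros as NaN, then impute them (column mean is an assumed strategy).
df[zero_means_missing] = df[zero_means_missing].replace(0, np.nan)
df[zero_means_missing] = df[zero_means_missing].fillna(df[zero_means_missing].mean())

# Step 2: rescale each selected feature to the range [0, 1].
X = MinMaxScaler().fit_transform(df[features])
y = df["Outcome"].to_numpy()

# Step 3: 80:20 train/test split (random_state is fixed only for repeatability).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```

A common variation is to fit the scaler on the training split only, so that test-set statistics do not influence the scaling.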
4. IMPLEMENTATION

• The training data is divided into classes. To classify a new data point, the model
finds that point's nearest neighbours and assigns the label that receives the majority of their votes.
• The classification report, confusion matrix, accuracy, precision and F1-score are
shown below:

Figure 4.1: Data cells and training classification report of the system
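
The scores listed in a classification report can all be derived from the four confusion-matrix counts. The following is a minimal sketch on made-up labels, not the project's actual results; `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix counts and derived scores for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "confusion_matrix": [[tn, fp], [fn, tp]],  # rows: actual, columns: predicted
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```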
Threshold value

• The threshold value of the model is 0.72, based on the ROC curve.

Fig 4.2: Diabetes classifier ROC curve
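
The slides do not say how the threshold was read off the ROC curve. One common choice, assumed here purely for illustration, is the cut-off that maximises Youden's J statistic (true positive rate minus false positive rate). The sketch below uses made-up scores, and the function names are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (threshold, fpr, tpr) for every distinct score used as a cut-off."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = []
    for threshold in sorted(set(scores), reverse=True):
        # Predict positive when the score is at or above the threshold.
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((threshold, fp / negatives, tp / positives))
    return points


def best_threshold(y_true, scores):
    """Pick the cut-off with the largest Youden's J = TPR - FPR (assumed criterion)."""
    return max(roc_points(y_true, scores), key=lambda point: point[2] - point[1])[0]


# Made-up predicted probabilities and true labels, for illustration only.
scores = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95]
y_true = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

print(best_threshold(y_true, scores))
```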


DIABETES EARLY PREDICTION
SYSTEM USING DJANGO

Fig 5: Diabetes prediction model deployed with the Django framework, with the Django server hosted
locally
Figure 6: Diabetes early prediction system model using the Django framework
Reflection notes

1. Problem Solving Skill


• Handling of Missing data
• Removing rows with missing values
This method is a simple but messy way to handle missing values: in addition to
removing the missing values themselves, it can also remove data that aren’t null.
You can call dropna() on your entire data frame or on specific columns:

Fig 7: dropna() method
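
As a minimal sketch of the two call styles, assuming pandas and a made-up data frame (the column names and values are for illustration only):

```python
import numpy as np
import pandas as pd

# Made-up frame with a few missing measurements.
df = pd.DataFrame({
    "Glucose": [148.0, np.nan, 183.0, 89.0],
    "Insulin": [np.nan, 94.0, np.nan, 168.0],
    "BMI": [33.6, 26.6, 23.3, 28.1],
})

# Whole data frame: drop every row that has a NaN in any column.
all_columns = df.dropna()

# Specific column: drop a row only when Glucose is NaN.
glucose_only = df.dropna(subset=["Glucose"])

print(len(df), len(all_columns), len(glucose_only))
```

Dropping on the whole frame keeps only one of the four rows here, discarding valid Glucose and BMI readings along the way, which is the drawback described above.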


• Similar to problem solving skills, there is no one way to increase your
curiosity.
2. Work Experience

• The internship has definitely reaffirmed my passion for Data Science, and I am
grateful that my work left some traction for future work. The research and
development phase, the communication skills required to talk to different stakeholders, and the
curiosity and passion to solve business problems using data (to name just a few) have all
contributed to my interest in this field.
• The Data Science industry is still very young, and its job descriptions can seem
vague and ambiguous to job seekers like us. It’s perfectly normal not to possess all the
skills listed, as most job descriptions are written idealistically to match an employer's best
expectations.
REFERENCES

[1] W. Xueli, J. Zhiyong and Y. Dahai, "An Improved KNN Algorithm Based on Kernel Methods
and Attribute Reduction," 2015 Fifth International Conference on Instrumentation and Measurement,
Computer, Communication and Control (IMCCC), 2015, pp. 567-570, doi: 10.1109/IMCCC.2015.125.
[2] D. Shetty, K. Rit, S. Shaikh and N. Patil, "Diabetes disease prediction using data mining," 2017
International Conference on Innovations in Information, Embedded and Communication Systems
(ICIIECS), 2017, pp. 1-5, doi: 10.1109/ICIIECS.2017.8276012.
[3] V. S. Lakshmi, V. Nithya, K. Sripriya, C. Preethi and K. Logeshwari, "Prediction of Diabetes
Patient Stage Using Ontology Based Machine Learning System," 2019 IEEE International
Conference on System, Computation, Automation and Networking (ICSCAN), 2019, pp. 1-4, doi:
10.1109/ICSCAN.2019.8878831.
[4] Y. Chang and H. Liu, "Semi-supervised classification algorithm based on the KNN," 2011 IEEE
3rd International Conference on Communication Software and Networks, 2011, pp. 9-12, doi:
10.1109/ICCSN.2011.6014376.
