Internship
MOHAN KRISHNA K
1VJ17CS028
• REFLECTION NOTES
• REFERENCES
ABOUT THE COMPANY
• EXPOSYS DATA LABS AIMS TO SOLVE REAL-WORLD BUSINESS PROBLEMS IN AREAS LIKE
AUTOMATION, BIG DATA AND DATA SCIENCE. OUR CORE TEAM OF EXPERTS IN VARIOUS
TECHNOLOGIES HELPS BUSINESSES TO IDENTIFY ISSUES AND OPPORTUNITIES AND TO
PROTOTYPE SOLUTIONS USING TRENDING TECHNOLOGIES LIKE AI, ML, DEEP LEARNING
AND DATA SCIENCE. WE FOLLOW A HUMAN-FOCUSED, NOT TECHNOLOGY-DRIVEN,
APPROACH TO ACHIEVE SUCCESS IN OUR CLIENTS’ ENDEAVOURS.
• “OUR DISCOVERIES ARE BEYOND BELIEF AND IF YOU’RE WITH US, YOU’LL DISCOVER A
NEWER WAY TO THINK!”
ABOUT THE DEPARTMENT
• THE FOLLOWING IS A SUMMARY OF THE ROLES AND RESPONSIBILITIES OF THE DATA SCIENTIST AND DJANGO
DEVELOPER DEPARTMENT:
• DJANGO DEVELOPERS ARE RESPONSIBLE FOR DEVELOPING CLOUD-BASED PRODUCTS AND WORKING ON UX WITH
FRONT-END DEVELOPERS
• WORK WITH STAKEHOLDERS TO DETERMINE HOW TO USE BUSINESS DATA FOR VALUABLE BUSINESS SOLUTIONS
• SEARCH FOR WAYS TO GET NEW DATA SOURCES AND ASSESS THEIR ACCURACY
• BROWSE AND ANALYZE ENTERPRISE DATABASES TO SIMPLIFY AND IMPROVE PRODUCT DEVELOPMENT, MARKETING
TECHNIQUES, AND BUSINESS PROCESSES
• CREATE CUSTOM DATA MODELS AND ALGORITHMS
• USE PREDICTIVE MODELS TO IMPROVE CUSTOMER EXPERIENCE, AD TARGETING, REVENUE GENERATION, AND MORE
• DEVELOP THE ORGANIZATION’S A/B TESTING FRAMEWORK AND TEST MODEL QUALITY
• COORDINATE WITH VARIOUS TECHNICAL/FUNCTIONAL TEAMS TO IMPLEMENT MODELS AND MONITOR RESULTS
• DEVELOP PROCESSES, TECHNIQUES, AND TOOLS TO ANALYZE AND MONITOR MODEL PERFORMANCE WHILE
ENSURING DATA ACCURACY
TASK PERFORMED
1. INTRODUCTION
• PROBLEM STATEMENT
• THE USUAL DIAGNOSTIC PROCESS REQUIRES PATIENTS TO VISIT A DIAGNOSTIC CENTER, CONSULT THEIR
DOCTOR, AND WAIT A DAY OR MORE TO GET THEIR REPORTS.
• MOREOVER, THEY HAVE TO SPEND MONEY EVERY TIME THEY WANT TO GET A DIAGNOSIS REPORT.
• DIABETES MELLITUS (DM) IS DEFINED AS A GROUP OF METABOLIC DISORDERS MAINLY CAUSED BY ABNORMAL
INSULIN SECRETION AND/OR ACTION.
• MACHINE LEARNING INTRODUCTION - TOM M. MITCHELL DEFINES MACHINE LEARNING AS FOLLOWS: "A COMPUTER
PROGRAM IS SAID TO LEARN FROM EXPERIENCE E WITH RESPECT TO SOME CLASS OF TASKS T AND PERFORMANCE
MEASURE P IF ITS PERFORMANCE AT TASKS IN T, AS MEASURED BY P, IMPROVES WITH EXPERIENCE E."
• GENERAL DEFINITION - MACHINE LEARNING IS THE TRAINING OF A MODEL FROM DATA THAT GENERALIZES A
DECISION AGAINST A PERFORMANCE MEASURE.
2.1 PROPOSED METHOD
In this dataset, missing values are represented by zeros, which need to be
replaced. The zeros are first converted to NaN so that the missing values can
easily be imputed using the pandas fillna() method. We then perform feature
scaling on the dataset using MinMaxScaler so that every feature lies between
0 and 1. Scaling is an important preprocessing step for many algorithms,
particularly distance-based ones, where features with large ranges would
otherwise dominate.
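The preprocessing steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the sample values are made up, the column names are assumed, and mean imputation is an assumption, since the text only says that fillna() is used.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Tiny made-up sample; column names and values are illustrative assumptions.
df = pd.DataFrame({
    "Glucose": [148, 0, 183, 89],
    "Insulin": [0, 94, 0, 168],
    "BMI": [33.6, 26.6, 0.0, 28.1],
    "Age": [50, 31, 32, 21],
})

# Zeros stand for missing measurements, so mark them as NaN first.
df = df.replace(0, np.nan)

# Impute each missing value with its column mean (assumed strategy).
df = df.fillna(df.mean())

# Rescale every column to the range [0, 1].
scaler = MinMaxScaler()
scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

print(scaled)
```

In a real pipeline the scaler would normally be fitted on the training split only and then applied to the test split, so that no information from the test data leaks into training.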
In the feature correlation heatmap, we can observe that Glucose, Insulin,
Age and BMI are highly correlated with the outcome. So, we select these
features as X and the outcome as Y. The dataset is then split using
train_test_split with an 80:20 ratio.
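A small sketch of the feature selection and 80:20 split might look like this. The data frame is made up, the column name "Outcome" is an assumption, and random_state is added only to make the example reproducible.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up rows standing in for the preprocessed dataset (values are illustrative).
df = pd.DataFrame({
    "Glucose": [0.74, 0.43, 0.92, 0.45, 0.69, 0.58, 0.39, 0.58, 0.99, 0.63],
    "Insulin": [0.17, 0.11, 0.17, 0.11, 0.20, 0.17, 0.10, 0.17, 0.64, 0.17],
    "Age":     [0.48, 0.17, 0.18, 0.00, 0.20, 0.15, 0.08, 0.13, 0.53, 0.55],
    "BMI":     [0.31, 0.17, 0.10, 0.20, 0.51, 0.15, 0.26, 0.35, 0.25, 0.29],
    "Outcome": [1, 0, 1, 0, 1, 0, 1, 0, 1, 1],
})

# The four features selected from the correlation heatmap, and the target.
X = df[["Glucose", "Insulin", "Age", "BMI"]]
y = df["Outcome"]

# test_size=0.2 gives the 80:20 train/test ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(len(X_train), len(X_test))
```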
4. IMPLEMENTATION
The data is divided into classes. When a new sample needs to be classified, the
algorithm finds its nearest neighbours in the training data and assigns the label
that receives the majority of votes among them.
The classification report, confusion matrix, accuracy, precision, and F1-score are
shown below:
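The neighbour-voting idea described here reads like k-nearest neighbours (KNN), the algorithm cited in the references. A minimal pure-Python sketch is given below; the function name knn_predict, the value k=3, the Euclidean distance, and the sample points are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_X, train_y)
    )
    # Keep the labels of the k nearest points and take a majority vote.
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up, already-scaled two-feature samples: 0 = non-diabetic, 1 = diabetic.
train_X = [(0.1, 0.2), (0.2, 0.1), (0.8, 0.9), (0.9, 0.8), (0.85, 0.75)]
train_y = [0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, (0.7, 0.8)))    # prints 1
print(knn_predict(train_X, train_y, (0.15, 0.15)))  # prints 0
```

An odd k is a common choice for two-class problems because it avoids tied votes.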
Figure 4.1: Data cells and training classification report of the system
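For reference, the reported metrics can be derived from the confusion-matrix counts as sketched below. The helper name binary_metrics and the label lists are hypothetical; they are not the results of this project.

```python
def binary_metrics(y_true, y_pred):
    """Compute confusion-matrix counts and common metrics for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "confusion_matrix": [[tn, fp], [fn, tp]],
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(binary_metrics(y_true, y_pred))
```

The confusion matrix is laid out with true labels as rows and predicted labels as columns, which matches the layout scikit-learn uses.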
Fig 5: Diabetes prediction deployed with the Django framework, with the Django server hosted
locally
DIABETES EARLY PREDICTION
SYSTEM USING DJANGO
The internship has definitely reaffirmed my passion for Data Science, and I am
grateful that my work left some groundwork for future projects. The research and
development phase, the communication skills required to talk to different stakeholders, and
the curiosity and passion to solve business problems using data (just to name a few) have all
contributed to my interest in this field.
The Data Science industry is still very young, and its job descriptions can seem
vague and ambiguous to job seekers like us. It is perfectly normal not to possess all the
skills listed, since most job descriptions are written idealistically to match employers'
best expectations.
REFERENCES
[1] W. Xueli, J. Zhiyong and Y. Dahai, "An Improved KNN Algorithm Based on Kernel Methods
and Attribute Reduction," 2015 Fifth International Conference on Instrumentation and Measurement,
Computer, Communication and Control (IMCCC), 2015, pp. 567-570, doi:
10.1109/IMCCC.2015.125.
[2] D. Shetty, K. Rit, S. Shaikh and N. Patil, "Diabetes disease prediction using data mining," 2017
International Conference on Innovations in Information, Embedded and Communication Systems
(ICIIECS), 2017, pp. 1-5, doi: 10.1109/ICIIECS.2017.8276012.
[3] V. S. Lakshmi, V. Nithya, K. Sripriya, C. Preethi and K. Logeshwari, "Prediction of Diabetes
Patient Stage Using Ontology Based Machine Learning System," 2019 IEEE International
Conference on System, Computation, Automation and Networking (ICSCAN), 2019, pp. 1-4, doi:
10.1109/ICSCAN.2019.8878831.
[4] Y. Chang and H. Liu, "Semi-supervised classification algorithm based on the KNN," 2011 IEEE
3rd International Conference on Communication Software and Networks, 2011, pp. 9-12, doi:
10.1109/ICCSN.2011.6014376.