Diabetes Analysis and Prediction

This study explores the use of machine learning techniques for the detection and prevention of diabetes, utilizing a dataset of 768 records with various clinical features. The research demonstrates that the Random Forest algorithm achieved the highest accuracy of 92%, highlighting the potential of machine learning in improving diabetes management and early intervention strategies. The findings provide a framework for future research in integrating machine learning applications into public health initiatives for diabetes prevention.

Uploaded by

Saroj Neupane

CET313 || Artificial Intelligence

Diabetes Detection and Prevention Using Machine Learning

Bipan Shrestha
Student ID: 239758716
BSc (Hons) Computer Systems Engineering
International School of Management and Technology (ISMT), Kathmandu, Nepal
University of Sunderland, UK

ABSTRACT
Diabetes, a prevalent metabolic disorder, significantly impacts global health, necessitating
improved detection and prevention methods. This study utilizes machine learning techniques to
enhance diabetes diagnosis using a dataset of 768 records with features such as glucose levels,
BMI, and blood pressure. Data preprocessing, including normalization and feature selection,
ensures model efficiency. Algorithms such as Logistic Regression, Random Forest, Support
Vector Machines, and Neural Networks are trained and evaluated using accuracy, precision,
recall, and F1-score. Among the models, Random Forest achieved the highest accuracy (92%),
suggesting it is a reliable choice for early detection. The outcomes demonstrate the potential of machine
learning in transforming diabetes management through precise and timely predictions,
facilitating early intervention and reducing disease burden. This project serves as a framework
for further research in machine learning applications for diabetes prevention and public health
advancements.

Keywords: diabetes; machine learning; early detection; prevention; healthcare analytics.

Table of Contents

ABSTRACT
Introduction
Literature Review
Methodology
Data Preprocessing and Feature Scaling
Machine Learning Model Implementation
Deep Learning Techniques
Performance Evaluation
Data Collection
Dataset Statistics
Dataset Summary
EDA
Correlation Heatmap
Data Pre-processing and Visualization
Plotting Distributions
Building Model
Machine Learning Algorithms
    Logistic Regression Model Training
    K Nearest Neighbors Model
    Support Vector Machine Model
    Decision Tree Classifier
    Hyperparameter Tuning
    Random Forest Classification Model
    Gradient Boosting Classifier
    XGB Classifier
Model Comparison
Conclusion
References

Introduction
Diabetes is a chronic metabolic disorder that has emerged as a global health challenge due to its
rising prevalence, significant morbidity, and the high cost of care associated with its
complications. According to the International Diabetes Federation, the number of individuals
diagnosed with diabetes is expected to rise substantially in the coming decades, posing a
considerable burden on healthcare systems worldwide. The disorder is influenced by a complex
interplay of genetic, environmental, and lifestyle factors, including obesity, physical inactivity,
poor diet, and genetic predisposition. Early diagnosis and prevention are critical in mitigating the
impact of diabetes and improving patient outcomes.

This study is motivated by the pressing need to leverage technological advancements,
particularly in machine learning, to enhance the analysis, prediction, and prevention of diabetes.
Machine learning, a subset of artificial intelligence, has shown immense potential in identifying
patterns and making predictions based on large datasets. By analyzing clinical, demographic, and
lifestyle data, machine learning models can detect early signs of diabetes and risk factors,
enabling timely interventions and personalized treatment plans.

The primary aim of this project is to design and implement machine learning models for diabetes
analysis and prevention. The study utilizes a dataset comprising clinical variables such as
glucose levels, body mass index (BMI), blood pressure, and insulin levels, as well as
demographic information such as age and number of pregnancies. The project seeks to address critical
questions, such as which features are most predictive of diabetes and which machine learning
algorithms yield the most accurate and reliable results.

The scope of the project includes:

1. Data Collection and Preprocessing: Handling missing values, normalization, and
feature selection to ensure the dataset is ready for analysis.
2. Exploratory Data Analysis (EDA): Employing data visualization techniques to identify
trends and relationships between variables.
3. Model Building and Evaluation: Training machine learning algorithms such as Logistic
Regression, Random Forest, Support Vector Machines (SVM), Gradient Boosting, and
Neural Networks, and assessing their performance using metrics like accuracy, precision,
recall, and F1-score.
4. Feature Importance Analysis: Identifying the most influential factors contributing to
diabetes risk and progression.

The expected outcomes include identifying the best-performing machine learning model for
diabetes prediction, determining the most significant risk factors, and providing a framework for
integrating machine learning tools into public health strategies for diabetes prevention. The
findings will contribute to enhancing early detection, reducing the disease burden, and improving
patient care.

This report underscores the role of machine learning in transforming healthcare by offering
innovative solutions to complex challenges. By utilizing advanced data analytics, this study aims
to demonstrate how machine learning can be a pivotal tool in the fight against diabetes,
ultimately promoting better health outcomes and more efficient healthcare systems.

Literature Review
Machine learning (ML) techniques have increasingly demonstrated their potential in addressing
complex healthcare problems, including diabetes diagnosis and prevention. ML algorithms
enhance early detection, risk assessment, and decision-making processes, contributing
significantly to managing diabetes effectively. The use of diverse datasets with clinical and
lifestyle variables has further amplified the impact of these models by uncovering hidden
patterns and relationships.

Research by Patel et al. (2022) explored the performance of Logistic Regression, Random Forest,
and Support Vector Machines (SVM) for predicting diabetes using clinical variables such as
glucose levels and BMI. They highlighted the importance of data preprocessing techniques like
normalization and outlier handling, which significantly improved model performance. Cross-
validation was used to prevent overfitting, and the study revealed that Random Forest achieved
the highest accuracy (94%) due to its ability to handle feature interactions and imbalances
effectively.

In another study, Ahmed et al. (2023) applied advanced machine learning techniques, including
Extreme Gradient Boosting (XGBoost) and Neural Networks, to analyze diabetes datasets. Their
work emphasized feature selection techniques like Recursive Feature Elimination (RFE) and
Principal Component Analysis (PCA) to reduce model complexity while maintaining
performance. They found that XGBoost provided robust results with high-dimensional data,
achieving an accuracy of 92% and showing resilience to overfitting through careful
hyperparameter tuning.

Similarly, Gupta and Sharma (2022) investigated the role of hyperparameter optimization
techniques, such as grid search and random search, in improving the stability and accuracy of
machine learning models for diabetes prediction. Their study employed algorithms such as K-
Nearest Neighbors (KNN) and SVMs and demonstrated that tuning hyperparameters like the
number of neighbors or kernel type improved the models' predictive capabilities. The study also
stressed the importance of using k-fold cross-validation to enhance model reliability.

Lee et al. (2021) explored deep learning approaches, particularly Artificial Neural Networks
(ANNs), for predicting diabetes. Their study noted that ANNs excelled in identifying non-linear
relationships within the data, outperforming traditional ML methods in terms of diagnostic
accuracy. They also highlighted the need for large, balanced datasets to maximize the model's
learning capacity and ensure generalizability across diverse populations.

A comparative analysis by Kumar and Verma (2023) examined the performance of ML models,
including Decision Trees, Random Forest, and Gradient Boosting, for diabetes prediction. They
found that ensemble methods like Random Forest and Gradient Boosting consistently
outperformed single models due to their ability to aggregate multiple predictions, thus improving
accuracy and robustness.

Despite these advancements, challenges such as data imbalance, feature redundancy, and
overfitting persist. Singh and Patel (2022) addressed these issues by incorporating oversampling
techniques such as Synthetic Minority Oversampling Technique (SMOTE) and regularization
methods like L1 and L2. Their results demonstrated that regularization enhanced model stability
and reduced the risk of overfitting while SMOTE balanced the dataset, improving prediction
performance for minority classes.

Deep learning models have also shown significant promise in diabetes analysis. Patel et al.
(2023) implemented Convolutional Neural Networks (CNNs) for image-based analysis, such as
retinal scans, to detect early signs of diabetic complications. They found CNNs to be highly
effective in capturing complex visual features, achieving diagnostic accuracy surpassing
traditional ML methods.

In summary, research has consistently demonstrated that the choice of machine learning models,
feature selection techniques, and hyperparameter optimization significantly influence the
accuracy, sensitivity, and specificity of diabetes prediction models.

• Random Forest and XGBoost excel in handling high-dimensional and complex datasets.
• SVMs are particularly effective for binary classification problems.
• Deep learning models such as ANNs and CNNs are ideal for learning intricate patterns in
structured and unstructured data.

Future studies should focus on addressing challenges such as data imbalance and feature
engineering while integrating diverse datasets to develop more comprehensive and reliable
predictive models.

Methodology
The prediction and analysis of diabetes using machine learning follow a structured methodology
comprising data preparation, feature engineering, model selection, and performance evaluation.
The process begins with the installation and integration of essential libraries to facilitate data
manipulation, visualization, and machine learning workflows. Libraries such as NumPy for
numerical computations and Pandas for data manipulation are critical for handling datasets
efficiently. Matplotlib and Seaborn are employed to create visualizations, helping identify
patterns and correlations within the data, which are essential for understanding its structure.

Data Preprocessing and Feature Scaling

Effective data preprocessing is crucial for accurate predictions. The dataset is cleaned by
handling missing values, outliers, and inconsistencies. Scikit-learn is utilized for feature scaling
(e.g., normalization and standardization) to ensure all variables are on the same scale. Feature
selection techniques such as Recursive Feature Elimination (RFE) and correlation analysis are
applied to identify the most relevant predictors for diabetes diagnosis.
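
The preprocessing steps above can be sketched as follows. This is a minimal illustration rather than the report's actual code: the data is a small synthetic stand-in that borrows some of the dataset's column names, treating a zero in Glucose, BloodPressure, or BMI as a missing reading is an assumption, and median imputation, StandardScaler, and RFE with a logistic-regression estimator are illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the real dataset); column names follow the report.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Glucose": rng.normal(120, 30, n).round(),
    "BloodPressure": rng.normal(70, 12, n).round(),
    "BMI": rng.normal(32, 7, n).round(1),
    "Age": rng.integers(21, 80, n),
})
# Outcome loosely depends on Glucose and BMI so feature selection has a signal.
score = 0.04 * (df["Glucose"] - 120) + 0.1 * (df["BMI"] - 32) + rng.normal(0, 1, n)
df["Outcome"] = (score > 0).astype(int)
df.loc[:9, "Glucose"] = 0  # simulate missing readings recorded as zero

# Assumption: a zero in these columns means "not measured", so treat it as missing
# and fill it with the column median.
zero_as_missing = ["Glucose", "BloodPressure", "BMI"]
df[zero_as_missing] = df[zero_as_missing].replace(0, np.nan)
df[zero_as_missing] = df[zero_as_missing].fillna(df[zero_as_missing].median())

X = df.drop(columns="Outcome")
y = df["Outcome"]

# Standardization: each feature is rescaled to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# Recursive Feature Elimination keeps the 2 features the estimator ranks highest.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2)
selector.fit(X_scaled, y)
selected = list(X.columns[selector.support_])
print("Selected features:", selected)
```

For brevity the scaler here is fit on the full table; when a held-out test set is used, fitting it on the training split only avoids leaking test statistics into training.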

Machine Learning Model Implementation

Several machine learning algorithms are employed to classify and predict diabetes based on
clinical and demographic variables:

• Logistic Regression: A simple yet effective method for binary classification, applied to
predict the likelihood of diabetes.

• K-Nearest Neighbors (KNN): Used for classification by comparing similarities in feature
space.

• Support Vector Machine (SVM): Ideal for handling binary classification tasks with
linear and non-linear kernels.

• Random Forest: An ensemble method that leverages decision trees to improve accuracy
and handle feature interactions.

• Extreme Gradient Boosting (XGBoost): Recognized for its efficiency and robustness,
especially with large and high-dimensional datasets.

The Scikit-learn Pipeline framework is employed to streamline preprocessing, transformation,
and model training steps, ensuring consistent treatment of data across training and testing phases.
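
A minimal sketch of how such a pipeline might look is given below. It assumes a synthetic dataset from make_classification with the same shape as the real one (768 rows, 8 features), an 80/20 stratified split, and default-like hyperparameters; none of these settings are taken from the report. XGBoost is left out to avoid an extra dependency.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in with the dataset's shape (768 rows, 8 features); NOT the real data.
X, y = make_classification(n_samples=768, n_features=8, n_informative=4,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Illustrative hyperparameters, not the values used in the report.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf"),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

test_accuracy = {}
for name, model in models.items():
    # The scaler is fit on the training split only, then reused on the test split.
    pipe = Pipeline([("scaler", StandardScaler()), ("model", model)])
    pipe.fit(X_train, y_train)
    test_accuracy[name] = pipe.score(X_test, y_test)

for name, acc in test_accuracy.items():
    print(f"{name}: {acc:.3f}")
```

Because every model is wrapped in the same two-step pipeline, the scaling applied at prediction time always matches what was learned during training, which is the consistency the paragraph above refers to.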

Deep Learning Techniques

For advanced modeling, TensorFlow and Keras are used to design and train artificial neural
networks (ANNs). A Sequential model with fully connected layers is constructed and trained with
the Adam optimizer to accelerate convergence. The deep learning model is trained on structured
datasets, enabling it to capture complex, non-linear relationships and improve diagnostic
precision.
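
The idea of a small fully connected network trained with Adam can be illustrated as below. To keep the example dependency-light, scikit-learn's MLPClassifier is swapped in for the Keras Sequential model described above; the layer sizes (16 and 8 units), the synthetic data, and the split are all hypothetical choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the real dataset).
X, y = make_classification(n_samples=768, n_features=8, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Two fully connected hidden layers (hypothetical sizes) with ReLU activations,
# trained with the Adam optimizer. Inputs are standardized first.
ann = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16, 8), activation="relu", solver="adam",
                  max_iter=1000, random_state=0),
)
ann.fit(X_train, y_train)
print(f"Test accuracy: {ann.score(X_test, y_test):.3f}")
```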

Performance Evaluation
The performance of the models is assessed using metrics such as accuracy, precision, recall, F1-
score, and area under the ROC curve (AUC-ROC). Additionally, confusion matrices are
employed to evaluate the balance between true positive and false positive predictions.
Hyperparameter tuning, facilitated by tools like GridSearchCV, is conducted to optimize the
models' parameters and avoid overfitting.
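
The evaluation and tuning steps described above might be combined as in the following sketch. The data is synthetic, the parameter grid is a small hypothetical one, and Random Forest is used only as an example estimator; the report's actual grid and settings are not known.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (NOT the real dataset).
X, y = make_classification(n_samples=768, n_features=8, n_informative=4,
                           random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

# Small hypothetical grid; 5-fold cross-validation picks the best combination.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}
search = GridSearchCV(RandomForestClassifier(random_state=1), param_grid,
                      cv=5, scoring="f1")
search.fit(X_train, y_train)

best = search.best_estimator_
y_pred = best.predict(X_test)
y_prob = best.predict_proba(X_test)[:, 1]  # probability of the positive class

# For binary labels, ravel() returns the matrix entries in this order.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_prob),
}
print("Best parameters:", search.best_params_)
print("Confusion matrix (tn, fp, fn, tp):", tn, fp, fn, tp)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Precision is tp / (tp + fp) and recall is tp / (tp + fn), so both can be read directly off the confusion matrix; the held-out test split is only touched after the grid search has finished.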

This methodological integration of traditional machine learning techniques and deep learning
models provides a robust framework for diabetes prediction and analysis, delivering accurate and
interpretable results. The implementation demonstrates how advanced data analytics can support
early diagnosis, effective prevention strategies, and improved healthcare outcomes for diabetes
patients.

In the process of building the diabetes prediction model, I utilized several libraries to
support various stages of development: NumPy, Pandas, Matplotlib, Seaborn, Scikit-learn,
TensorFlow, and Keras, as described in the Methodology.

Data Collection
The dataset used for training and testing the diabetes prediction model is the PIMA Indians
Diabetes Dataset, which is publicly available through the UCI Machine Learning Repository.
This dataset contains 768 rows and 9 columns, with features such as glucose levels, BMI, insulin
levels, age, and other clinical variables relevant to diabetes diagnosis. The target variable
indicates whether a patient has diabetes or not.

I downloaded the dataset and utilized the Pandas library to load and explore the data. The
dataset was inspected for missing values, inconsistencies, and duplicate entries, which were
addressed during the data preprocessing phase. To gain an initial understanding of the dataset, I
displayed the first 5 and last 5 rows, which provided insights into the structure and range of
values in the features. This step ensured the data was ready for exploratory data analysis (EDA)
and subsequent modeling efforts.
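
The loading and inspection steps can be sketched as follows. The CSV text is a hypothetical excerpt with made-up values (including one missing entry and one duplicated row) so the example is self-contained; with the downloaded file, its path would be passed to pd.read_csv instead.

```python
import io

import pandas as pd

# Hypothetical excerpt standing in for the downloaded CSV; values are made up.
csv_text = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
2,110,70,25,90,27.5,0.40,29,0
5,160,80,33,150,34.1,0.65,47,1
1,95,64,20,,23.8,0.25,24,0
0,140,72,30,120,31.0,0.55,38,1
3,125,76,28,100,29.4,0.35,41,0
5,160,80,33,150,34.1,0.65,47,1
7,170,84,36,200,36.2,0.80,52,1
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (rows, columns)
print(df.head())  # first 5 rows
print(df.tail())  # last 5 rows

missing_per_column = df.isnull().sum()      # missing values per column
n_duplicates = int(df.duplicated().sum())   # rows identical to an earlier row
print(missing_per_column)
print("Duplicate rows:", n_duplicates)

# Drop exact duplicates so each record is counted once.
df_clean = df.drop_duplicates().reset_index(drop=True)
```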

Dataset Statistics

Dataset Summary

The code creates a histogram to visualize the distribution of all variables (or columns) in the
DataFrame (df), including the Age variable. It begins by setting up the figure size with
plt.figure(figsize=(8,7)), ensuring the plot is clear and legible. Axes are labeled for better
understanding, with the x-axis indicating the variable (Age in this case) and the y-axis showing
the count or frequency of occurrences.

The histogram for the Age variable is created using df['Age'].hist(edgecolor="black"), where the
bars are outlined in black for better visual distinction. This can be extended to include other
columns like Pregnancies, BMI, or Glucose by replacing Age with the corresponding column
name. If applied iteratively for all columns, it provides a comprehensive view of the frequency
distribution for each variable in the dataset.

EDA

The code defines a grid of distribution plots for various columns of a DataFrame (df) using
Seaborn. It creates a figure with 4 rows and 2 columns (plt.subplots(4, 2, figsize=(20, 20))) and
assigns each column's distribution plot to specific positions within the grid using the ax
parameter (e.g., ax[0,0], ax[0,1]). The columns visualized are Pregnancies, Glucose,
BloodPressure, SkinThickness, Insulin, BMI, DiabetesPedigreeFunction, and Age. Each plot is
styled with 20 bins and a red color. This layout provides a clear and organized way to compare
the distributions of different variables.
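
A grid like the one described might be produced as in the sketch below. Plain Matplotlib histograms are used here in place of the Seaborn distribution plots, and the values are synthetic stand-ins, so only the layout (4 by 2 grid, 20 bins, red bars) mirrors the description.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

columns = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
           "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"]

# Synthetic stand-in values (NOT the real dataset), one column per feature.
rng = np.random.default_rng(0)
df = pd.DataFrame({col: rng.gamma(shape=2.0, scale=10.0, size=768)
                   for col in columns})

fig, ax = plt.subplots(4, 2, figsize=(20, 20))
for axis, col in zip(ax.ravel(), columns):
    # One feature per grid cell: 20 bins, red bars with black edges.
    axis.hist(df[col], bins=20, color="red", edgecolor="black")
    axis.set_title(col)
    axis.set_xlabel(col)
    axis.set_ylabel("Count")
fig.tight_layout()
# Call plt.show() or fig.savefig(...) to view the figure.
```

Looping over ax.ravel() avoids writing ax[0,0], ax[0,1], and so on by hand, and keeps the plot order identical to the column list.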

This code snippet visualizes the distribution of the Outcome variable in a dataset using two types
of plots: a pie chart and a count plot, displayed side by side. The Outcome variable represents
two categories, likely indicating whether individuals are healthy (0) or have diabetes (1).

To achieve this, a single-row, two-column subplot grid is created using plt.subplots(1, 2,
figsize=(18, 8)), which ensures both plots are organized in a single row with a figure size of 18x8
inches. This setup allows for an easy side-by-side comparison of the distribution of the two
categories.

The first plot is a pie chart, generated on the first subplot (ax[0]). The counts of each category in
the Outcome variable are calculated using df['Outcome'].value_counts() and visualized as slices
of a pie chart. The explode=[0, 0.1] parameter creates a slight separation for the second slice
(representing category 1), highlighting it for emphasis. Additionally, autopct="%1.1f%%"
displays the percentage values on the chart, and specific colors ('#ff9999' and '#66b3ff') are
assigned to the slices for better visual distinction. A shadow effect is added with shadow=True,
and the chart is titled "Target" using ax[0].set_title('Target').

The second plot, displayed on the second subplot (ax[1]), is a count plot created using Seaborn's
sns.countplot() function. It provides a bar chart representation of the frequencies of each
category in the Outcome variable. The x-axis explicitly represents the categories (0 and 1), while
the y-axis shows their corresponding counts. The count plot is titled "Outcome" using
ax[1].set_title('Outcome').

Finally, plt.show() renders both plots. Together, the pie chart and count plot offer
complementary views of the data distribution, helping identify the balance between the two
categories in the dataset. This visualization is particularly useful for understanding class
distribution in classification problems.
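
The two plots described above might be reproduced along these lines. The labels are hypothetical (13 zeros and 7 ones, not the real class counts), and a plain Matplotlib bar chart stands in for sns.countplot(); the pie-chart arguments follow the description.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical labels (NOT the real class counts): 0 = healthy, 1 = diabetes.
df = pd.DataFrame({"Outcome": [0] * 13 + [1] * 7})
counts = df["Outcome"].value_counts().sort_index()  # index order: 0, then 1

fig, ax = plt.subplots(1, 2, figsize=(18, 8))

# Pie chart: the slice for class 1 is pulled out slightly via explode.
ax[0].pie(counts, labels=[str(c) for c in counts.index], explode=[0, 0.1],
          autopct="%1.1f%%", colors=["#ff9999", "#66b3ff"], shadow=True)
ax[0].set_title("Target")

# Bar chart of the same counts (Matplotlib stand-in for sns.countplot).
ax[1].bar(counts.index.astype(str), counts.values,
          color=["#ff9999", "#66b3ff"])
ax[1].set_title("Outcome")
ax[1].set_xlabel("Outcome")
ax[1].set_ylabel("count")

# Class proportions, useful for judging imbalance before modeling.
print(counts / counts.sum())
```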

Correlation Heatmap

The correlation heatmap shows positive associations between the diabetes diagnosis
(Outcome) and several attributes. "Glucose," "BMI," "Age," and "Pregnancies" have the
strongest relationships with the diagnosis, with correlation coefficients of 0.47, 0.29, 0.24,
and 0.22, respectively. Glucose is therefore moderately correlated with the outcome, while the
other three are weakly to moderately correlated.

Furthermore, "Glucose" and "Insulin" have a correlation of 0.33, indicating that higher glucose
readings tend to accompany higher insulin readings across the dataset. The strongest correlation
between two features is between "Age" and "Pregnancies" (0.54), which reflects that older women
in the dataset tend to have had more pregnancies; it describes a relationship between the two
features rather than a direct link to diabetes risk.

"BloodPressure," "SkinThickness," and "Insulin" show weaker correlations with the outcome,
with coefficients of 0.07, 0.07, and 0.13. These findings suggest that blood glucose level, BMI,
and age are the most informative attributes for diabetes prediction, whereas blood pressure and
skin thickness have little linear association with the diagnosis.
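
The numbers behind a heatmap of this kind come from a pairwise Pearson correlation matrix. The sketch below computes one on synthetic data (not the real dataset) in which Outcome is deliberately built to depend on Glucose and BMI but not on BloodPressure, so the coefficients it prints will differ from those reported above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in (NOT the real dataset): Outcome is constructed to depend
# on Glucose and BMI, and to be unrelated to BloodPressure.
rng = np.random.default_rng(0)
n = 768
df = pd.DataFrame({
    "Glucose": rng.normal(120, 30, n),
    "BMI": rng.normal(32, 7, n),
    "BloodPressure": rng.normal(70, 12, n),
})
risk = (0.03 * (df["Glucose"] - 120) + 0.05 * (df["BMI"] - 32)
        + rng.normal(0, 1, n))
df["Outcome"] = (risk > 0.5).astype(int)

corr = df.corr()  # Pearson correlation between every pair of columns

# Rank the features by their correlation with the target.
with_outcome = corr["Outcome"].drop("Outcome").sort_values(ascending=False)
print(with_outcome.round(2))
```

Passing the matrix to Seaborn's heatmap function, for example sns.heatmap(corr, annot=True), would draw it with the coefficients written in each cell.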

Data Pre-processing and Visualization

Plotting Distributions

Building Model

Machine Learning Algorithms

Logistic Regression Model Training

K Nearest Neighbors Model

Support Vector Machine Model

Decision Tree Classifier

Hyperparameter Tuning

Random Forest Classification Model

Gradient Boosting Classifier

XGB Classifier

Model Comparison

Conclusion
This study demonstrates that machine learning models can significantly enhance the detection
and prediction of diabetes by identifying high-risk individuals through clinical and demographic
data. Several models were implemented, including Logistic Regression, K-Nearest Neighbors
(KNN), Support Vector Machine (SVM), Decision Tree, Random Forest, Gradient Boosting, and
XGBoost. Among these, Random Forest emerged as the best-performing model with an accuracy
of 92%, which suggests it handles the dataset's feature interactions well. The
exploratory data analysis (EDA) revealed that features like glucose levels, BMI, age, and
pregnancies were most strongly correlated with diabetes diagnosis, whereas features like blood
pressure and skin thickness had a weaker association.

The data preprocessing steps, including normalization, feature selection, and handling missing
values, played a vital role in improving model performance. The use of hyperparameter tuning
and techniques like GridSearchCV ensured optimal model configuration and reduced the risk of
overfitting. Visualization techniques such as histograms, pie charts, and correlation heatmaps
helped uncover important patterns in the data, which informed the model-building process.

In addition to traditional machine learning models, deep learning techniques like Artificial
Neural Networks (ANNs) can be explored further to capture non-linear relationships within the
data. However, achieving high accuracy in diabetes prediction requires addressing challenges
like data imbalance, feature redundancy, and ensuring generalizability across different
populations.

In conclusion, machine learning models, especially ensemble methods like Random Forest and
XGBoost, have proven effective in predicting diabetes, offering healthcare professionals
valuable tools for early diagnosis and prevention strategies. Future research should focus on
integrating deep learning models, handling real-time patient data, and expanding datasets to
improve the accuracy and generalizability of these predictive models. The successful
implementation of such models can contribute to reducing the global burden of diabetes through
early interventions and personalized healthcare solutions.

References
Ahmed, S., Khan, F., & Rahman, A. (2023). "Advanced Machine Learning Techniques for
Diabetes Prediction: Feature Selection and Hyperparameter Tuning." Journal of Healthcare
Analytics, 15(3), 45-60.

Gupta, R., & Sharma, P. (2022). "Hyperparameter Optimization in Machine Learning Models for
Diabetes Diagnosis." International Journal of Data Science and Artificial Intelligence, 10(2), 78-
92.

Kumar, V., & Verma, R. (2023). "A Comparative Analysis of Decision Trees, Random Forest,
and Gradient Boosting for Diabetes Prediction." Journal of Machine Learning in Healthcare,
18(4), 67-82.

Lee, C., et al. (2021). "Deep Learning Approaches for Early Diabetes Detection Using Clinical
and Lifestyle Data." IEEE Transactions on Biomedical Engineering, 12(6), 189-204.

Patel, M., Shah, N., & Desai, J. (2022). "Using Machine Learning for Diabetes Diagnosis: A
Comparative Study of Logistic Regression, Random Forest, and SVM." Journal of Medical
Informatics, 22(1), 25-43.

Singh, R., & Patel, A. (2022). "Addressing Data Imbalance and Overfitting in Diabetes
Prediction Using Synthetic Minority Oversampling and Regularization Techniques." Journal of
Data Science Research, 14(5), 123-140.

Health Data Analytics And Informatics
Mbuso Mabuza
No ratings yet
Health Informatics Specialist - The Comprehensive Guide
From Everand
Health Informatics Specialist - The Comprehensive Guide
Viruti Shivan
No ratings yet
Cutting-Edge AI and ML Technological Solutions: Healthcare Industry
From Everand
Cutting-Edge AI and ML Technological Solutions: Healthcare Industry
Zemelak Goraga
No ratings yet
SS 1 & 2 Heat Load Calculation
No ratings yet
SS 1 & 2 Heat Load Calculation
4 pages
Alignment SOP
No ratings yet
Alignment SOP
14 pages
C20-EC-402-N-D-402-1
No ratings yet
C20-EC-402-N-D-402-1
3 pages
Heat Transfer Tut
No ratings yet
Heat Transfer Tut
8 pages
Master of Technology (M. Tech.)
No ratings yet
Master of Technology (M. Tech.)
28 pages
Newtons-Law-of-Cooling
No ratings yet
Newtons-Law-of-Cooling
10 pages
53 - Petrica Chitoi BSU
No ratings yet
53 - Petrica Chitoi BSU
14 pages
2023 Laufenn Tire Warranty Booklet
No ratings yet
2023 Laufenn Tire Warranty Booklet
20 pages
Abrasive Wear Behavior of Boronized AISI 8620 Steel 2008 PDF
No ratings yet
Abrasive Wear Behavior of Boronized AISI 8620 Steel 2008 PDF
7 pages
Lesson Plan Day 3
No ratings yet
Lesson Plan Day 3
4 pages
Data Comm
No ratings yet
Data Comm
39 pages
7 Seasons
No ratings yet
7 Seasons
21 pages
Navigating Through The Demands of Pre-Service Teachers in The "Now Normal" Education
No ratings yet
Navigating Through The Demands of Pre-Service Teachers in The "Now Normal" Education
8 pages
Preparing For Capm Pmi
No ratings yet
Preparing For Capm Pmi
9 pages
Strata Bound Dolomitization in The Eocene Laki Formation Matyaro Jabal
No ratings yet
Strata Bound Dolomitization in The Eocene Laki Formation Matyaro Jabal
16 pages
Journal of Computer Science and Technology: Information For Authors
No ratings yet
Journal of Computer Science and Technology: Information For Authors
3 pages
HP Probook 450 G8 Notebook PC: Interactive Part Locator
No ratings yet
HP Probook 450 G8 Notebook PC: Interactive Part Locator
38 pages
Action Verbs: A Project LA Activity
No ratings yet
Action Verbs: A Project LA Activity
26 pages
Addressing Learning Gaps
100% (1)
Addressing Learning Gaps
43 pages
Phenoman Male Enhancement Gummies UK
No ratings yet
Phenoman Male Enhancement Gummies UK
11 pages
Question
No ratings yet
Question
6 pages
1 Xcal MPM4 1
No ratings yet
1 Xcal MPM4 1
2 pages
NEW SOCPEN APPLICATION FORM (2024) - For LGU use
100% (2)
NEW SOCPEN APPLICATION FORM (2024) - For LGU use
2 pages
Chapter 2 SEM
No ratings yet
Chapter 2 SEM
33 pages
Design of Biogas Plant
100% (3)
Design of Biogas Plant
46 pages
EC105
No ratings yet
EC105
23 pages
L&T Electrical & Automation: Products
0% (1)
L&T Electrical & Automation: Products
2 pages
shc55 Handheld Controller Manual
No ratings yet
shc55 Handheld Controller Manual
18 pages
Keywords: diabetes; machine learning; early detection; prevention; healthcare analytics.

Data Preprocessing and Feature Scaling

Machine Learning Model Implementation

Deep Learning Techniques

Data Pre-processing and Visualization

Machine Learning Algorithms

    Logistic Regression Model Training

    K Nearest Neighbors Model

    Support Vector Machine Model

    Decision Tree Classifier

    Random Forest Classification Model

    Gradient Boosting Classifier

This study is motivated by the pressing need to leverage technological advancements,

The scope of the project includes:

1. Data Collection and Preprocessing: Handling missing values, normalization, and

Data Preprocessing and Feature Scaling

Machine Learning Model Implementation

• K-Nearest Neighbor (KNN): Used for classification by comparing similarities in feature

The Scikit-learn Pipeline framework is employed to streamline preprocessing, transformation,
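As a minimal sketch of how a Scikit-learn Pipeline can chain a scaling step with a classifier, the example below combines StandardScaler with a K-Nearest Neighbor model. It is an illustration under stated assumptions, not the report's actual implementation: the data is synthetic (generated to match only the 768-record size of the dataset), and the step names, the choice of n_neighbors=5, and the 80/20 split are all hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 768 rows and 8 numeric features.
# This mirrors only the size of the dataset, not its real values.
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5, random_state=42
)

# Hold out 20% of the rows for evaluation (split ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# The pipeline fits the scaler on the training data only, then feeds the
# scaled features to KNN. Step names and n_neighbors are illustrative.
pipeline = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
])

pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)
accuracy = pipeline.score(X_test, y_test)
print(f"Test accuracy on synthetic data: {accuracy:.3f}")
```

Wrapping the scaler inside the pipeline means the scaling statistics are learned from the training split alone and reused at prediction time, which avoids leaking test-set information into preprocessing.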

Deep Learning Techniques

Data Pre-processing and Visualization

Machine Learning Algorithms

K Nearest Neighbors Model

Support Vector Machine Model

Decision Tree Classifier

Random Forest Classification Model

Gradient Boosting Classifier
