logistic.ipynb - Colaboratory

The document walks through building a logistic regression model to predict diabetes from a diabetes dataset. It loads and explores the data, splits it into training and test sets, trains a logistic regression classifier on the training set, evaluates the model on the test set, and predicts on a new sample. Key steps include data preprocessing, training the model, computing accuracy and other metrics, and predicting the probability of diabetes for a new data point.

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

import matplotlib.pyplot as plt
import seaborn as sns 
import pandas_profiling as pp 
import plotly.express as px

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report,accuracy_score,f1_score,precision_score,recall_score,roc_curve,roc_auc_score

%matplotlib inline

df = pd.read_csv('/content/diabetes2.csv')

df_temp = df.copy()

df.head()

   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  ...
0            6      148             72             35        0  33.6  ...
1            1       85             66             29        0  26.6  ...
2            8      183             64              0        0  23.3  ...
3            1       89             66             23       94  28.1  ...

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Pregnancies 768 non-null int64
1 Glucose 768 non-null int64
2 BloodPressure 768 non-null int64
3 SkinThickness 768 non-null int64
4 Insulin 768 non-null int64
5 BMI 768 non-null float64
6 DiabetesPedigreeFunction 768 non-null float64
7 Age 768 non-null int64
8 Outcome 768 non-null int64
dtypes: float64(2), int64(7)
memory usage: 54.1 KB

df.shape

(768, 9)

df["Outcome"].value_counts()

0 500
1 268
Name: Outcome, dtype: int64

df.isnull().sum()

Pregnancies 0
Glucose 0

BloodPressure 0
SkinThickness 0
Insulin 0
BMI 0
DiabetesPedigreeFunction 0
Age 0
Outcome 0
dtype: int64

# Visualise the distribution of the Outcome variable
sns.catplot(x="Outcome", kind="count", data=df_temp, palette="Set2")
plt.show()

# Count of patients at each Age, split by diabetes outcome
ax = sns.catplot(x="Age", kind="count",hue="Outcome",data=df_temp, palette="pastel", legend=False)
ax.fig.set_figwidth(20)
plt.legend(loc='upper right', labels= ["Non diabetic", "Diabetic"])
plt.show()

# Age Distribution
fig = px.histogram(df, x="Age",
                   marginal="box")
fig.show()


[Figure: histogram of Age with box-plot marginal; y-axis shows count]
# Age distribution by Outcome 0
fig = px.histogram(df, x=df[df.Outcome==0].Age,
                   marginal="box",
                   color_discrete_sequence=['lightgreen'])
fig.show()

[Figure: histogram of Age for Outcome 0 with box-plot marginal; y-axis shows count]

# Age distribution by Outcome 1
fig = px.histogram(df, x=df[df.Outcome==1].Age,
                   marginal="box",
                   color_discrete_sequence=['purple'])
fig.show()

[Figure: histogram of Age for Outcome 1 with box-plot marginal; y-axis shows count]

# Glucose distribution by Outcome 1
fig = px.histogram(df, x=df[df.Outcome==1].Glucose,
                   marginal="box",
                   color_discrete_sequence=['#AB63FA'])
fig.show()

[Figure: histogram of Glucose for Outcome 1 with box-plot marginal; y-axis shows count]

# Average glucose for diabetic patients (Outcome 1)
df[df.Outcome==1].Glucose.mean()

141.25746268656715
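
For comparison, a minimal extra cell (not part of the original run) computing the same statistic for the non-diabetic group:

# Average glucose for non-diabetic patients (Outcome 0)
df[df.Outcome==0].Glucose.mean()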

x = df_temp.drop(['Outcome'], axis = 1)
y = df_temp.loc[:,"Outcome"].values

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.33, random_state = 123)

print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)

(514, 8)
(514,)
(254, 8)
(254,)
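
MinMaxScaler is imported above but never applied. A minimal sketch of how the split features could be scaled to [0, 1] before fitting, assuming the x_train and x_test produced by train_test_split above (the rest of the notebook trains on the unscaled data):

# Scale features to [0, 1]; fit the scaler on the training split only
scaler = MinMaxScaler()
x_train_scaled = pd.DataFrame(scaler.fit_transform(x_train), columns=x_train.columns)
x_test_scaled = pd.DataFrame(scaler.transform(x_test), columns=x_test.columns)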
lr = LogisticRegression(solver='liblinear', max_iter = 10) # solver='liblinear' is required for Kaggle
lr.fit(x_train, y_train)

/usr/local/lib/python3.9/dist-packages/sklearn/svm/_base.py:1244: ConvergenceWarning:
Liblinear failed to converge, increase the number of iterations.

▾ LogisticRegression
LogisticRegression(max_iter=10, solver='liblinear')
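
One way to address the warning is simply to give liblinear more iterations; a minimal sketch (assuming the same x_train and y_train as above), noting that refitting this way could shift the scores reported below slightly:

# Refit with a larger iteration budget so liblinear can converge
lr = LogisticRegression(solver='liblinear', max_iter=1000)
lr.fit(x_train, y_train)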

x_pred = lr.predict(x_train)

from sklearn.metrics import confusion_matrix
confusion_matrix(y_train, x_pred)

array([[311,  32],
       [120,  51]])
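
In scikit-learn's convention the rows are the actual classes and the columns the predicted classes, so the binary matrix can be unpacked as follows (a small illustrative addition, assuming the same y_train and x_pred):

# Unpack the binary confusion matrix into TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_train, x_pred).ravel()
print(tn, fp, fn, tp)  # 311 32 120 51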

#train score
score = accuracy_score(y_train, x_pred)
score

0.7042801556420234

y_pred = lr.predict(x_test)

confusion_matrix(y_pred,y_test)

array([[143,  61],
       [ 14,  36]])

# Test-set accuracy (the `score` computed above is the training accuracy)
test_score = accuracy_score(y_test, y_pred)

cm1 = confusion_matrix(y_test, y_pred)
sns.heatmap(cm1, annot=True, fmt=".0f")
plt.xlabel('Predicted Values')
plt.ylabel('Actual Values')
plt.title('Accuracy Score: {0}'.format(test_score), size = 15)
plt.show()

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.70      0.91      0.79       157
           1       0.72      0.37      0.49        97

    accuracy                           0.70       254
   macro avg       0.71      0.64      0.64       254
weighted avg       0.71      0.70      0.68       254
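
roc_curve and roc_auc_score are imported at the top but never used; a short sketch of how they could be applied to the test split, assuming the fitted lr and the x_test/y_test from above:

# Probability of the positive class (diabetic) for each test sample
y_prob = lr.predict_proba(x_test)[:, 1]
print("ROC AUC:", roc_auc_score(y_test, y_prob))
fpr, tpr, thresholds = roc_curve(y_test, y_prob)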

# Define a new sample (it should be diabetic)
data = [[5, 150, 33.7, 50, 150, 74, 0.5, 53]]

# Create the pandas DataFrame
df_test = pd.DataFrame(data, columns = ['Pregnancies','Glucose','BloodPressure','SkinThickness','Insulin','BMI','DiabetesPedigreeFunction','Age'])

# Predict on new data
res = lr.predict(df_test)
res

array([1])
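
The model predicts class 1 (diabetic) for the new sample. lr.predict returns only the label; since the summary mentions the probability of diabetes, a minimal sketch using predict_proba on the same df_test (an addition, not part of the original notebook):

# Class probabilities for the new sample: [P(Outcome=0), P(Outcome=1)]
lr.predict_proba(df_test)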
