ML Lab Programs PDF

1. The document describes implementing k-nearest neighbors (KNN) classification in Python. KNN is an algorithm that classifies data points based on their distance from neighboring training data points.
2. The code loads training data, scales the features, splits the data into training and test sets, and trains a KNN classifier with different values of k neighbors. It evaluates the model using accuracy on both training and test sets.
3. Metrics like the confusion matrix and ROC curve are used to evaluate model performance on the test set. The optimal k value is selected based on achieving the highest test accuracy.

Uploaded by

Ravi Kiran

List of Experiments

1. The probability that it is Friday and that a student is absent is 3%. Since there are 5 school
days in a week, the probability that it is Friday is 20%. What is the probability that a student
is absent given that today is Friday? Apply Bayes' rule in Python to get the result. (Ans: 15%)

2. Extract the data from database using Python

3. Implement k-nearest neighbours classification using Python

4. Given the following data, which specify classifications for nine combinations of VAR1
and VAR2, predict a classification for a case where VAR1=0.906 and VAR2=0.606, using
the result of k-means clustering with 3 means (i.e., 3 centroids)

VAR1 VAR2 CLASS

1.713 1.586 0
0.180 1.786 1
0.353 1.240 1
0.940 1.566 0
1.486 0.759 1
1.266 1.106 0
1.540 0.419 1
0.459 1.799 1
0.773 0.186 1
5. The following training examples map descriptions of individuals onto high, medium and
low credit-worthiness.
medium skiing design single twenties no -> highRisk
high golf trading married forties yes -> lowRisk
low speedway transport married thirties yes -> medRisk
medium football banking single thirties yes -> lowRisk
high flying media married fifties yes -> highRisk
low football security single twenties no -> medRisk
medium golf media single thirties yes -> medRisk
medium golf transport married forties yes -> lowRisk
high skiing banking single thirties yes -> highRisk
low golf unemployed married forties yes -> highRisk
Input attributes are (from left to right) income, recreation, job, status, age-group,
home-owner. Find the unconditional probability of 'golf' and the conditional probability of
'single' given 'medRisk' in the dataset.

6. Implement linear regression using Python.

PROBLEM STATEMENT 1:

The probability that it is Friday and that a student is absent is 3%. Since there are 5
school days in a week, the probability that it is Friday is 20%. What is the probability
that a student is absent given that today is Friday? Apply Bayes' rule in Python to get
the result. (Ans: 15%)

PROCEDURE:

If A and B are two events in a sample space S, then the conditional probability
of A given B is defined as
P(A|B) = P(A∩B) / P(B)
Similarly, the formula for P(B|A) is:
P(B|A) = P(A∩B) / P(A)
Two events A and B are independent if and only if P(A∩B) = P(A)P(B).
For the given experiment, let A be the event that it is Friday and B the event that
a student is absent. We have P(A∩B) = 0.03 and P(A) = 0.2, so
P(B|A) = P(A∩B) / P(A)
       = 0.03 / 0.2 = 0.15
Another example: a fair die is rolled. Let A be the event that the outcome is
an odd number, i.e., A = {1,3,5}. Also let B be the event that the outcome is less
than or equal to 3, i.e., B = {1,2,3}. The figure shows the Venn diagram of the
events. Since A∩B = {1,3}, P(B|A) = (2/6)/(3/6) = 2/3 ≈ 0.667
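The die example can be checked by enumerating the sample space. The sketch below is illustrative only: the helper name cond_prob is an assumption (it is not part of the lab program), and it assumes all outcomes are equally likely. Exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

def cond_prob(event, given, sample_space):
    """Return P(event | given) when every outcome in sample_space is equally likely."""
    given_outcomes = given & sample_space
    if not given_outcomes:
        raise ValueError("the conditioning event has probability zero")
    # P(event ∩ given) / P(given) reduces to a ratio of counts
    return Fraction(len(event & given_outcomes), len(given_outcomes))

S = {1, 2, 3, 4, 5, 6}   # outcomes of a fair die
A = {1, 3, 5}            # the outcome is odd
B = {1, 2, 3}            # the outcome is less than or equal to 3

p_b_given_a = cond_prob(B, A, S)
print(p_b_given_a)       # prints 2/3
```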

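SOURCE CODE:
# Joint probability P(Friday and absent), entered by the user (0.03 in this experiment)
probitisFridaynstudentAbsent = float(input("Enter the probability of being Friday and student is absent: "))
# P(Friday): one of the five school days
probitisFriday = 0.2
# Conditional probability: P(absent | Friday) = P(Friday and absent) / P(Friday)
pstudentisAbsentgivenitisFriday = probitisFridaynstudentAbsent / probitisFriday
print("Probability that student is absent given it is Friday is:", pstudentisAbsentgivenitisFriday)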

OUTPUT:
Enter the probability of being Friday and student is absent: 0.03
Probability that student is absent given it is Friday is: 0.15
PROBLEM STATEMENT 2
Extract the data from database using Python

MySQL is an open-source relational database management system (RDBMS)
that is based on Structured Query Language (SQL). Download and install the
MySQL database from the official website https://www.mysql.com/downloads/.
Next, install MySQL Connector for Python, which enables Python programs
to access the MySQL database. It can be downloaded and installed from
https://dev.mysql.com/downloads/connector/python/ or with the
following command:
python -m pip install mysql-connector-python
If the MySQL connector is installed correctly, the statement
import mysql.connector executes without an error.

PROCEDURE:

The steps to connect a Python application to a MySQL database are:

1. Import the mysql.connector module.
2. Create the connection object.
3. Create the cursor object.
4. Execute the query.
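The four steps above can be tried without a MySQL server. The sketch below is a stand-in, not the lab program: it uses Python's built-in sqlite3 module instead of MySQL Connector, with an in-memory database. The table name AIML_demo and its two rows are made up for illustration. Both modules follow the Python DB-API, so the connect, cursor, execute and fetch calls have the same shape.

```python
# Step 1: import the database module (sqlite3 here, as a stand-in for mysql.connector)
import sqlite3

# Step 2: create the connection object (in-memory database, nothing is written to disk)
conn = sqlite3.connect(":memory:")

# Step 3: create the cursor object
cursor = conn.cursor()

# Hypothetical table and rows, created only so that the query has something to return
cursor.execute("CREATE TABLE AIML_demo (roll_no INTEGER, name TEXT)")
cursor.executemany("INSERT INTO AIML_demo VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
conn.commit()

# Step 4: execute the query, then fetch every row it returns
cursor.execute("SELECT * FROM AIML_demo ORDER BY roll_no")
rows = cursor.fetchall()
print(rows)

conn.close()
```

One difference to keep in mind when moving back to MySQL Connector: sqlite3 uses ? as the parameter placeholder, while MySQL Connector uses %s.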

SOURCE CODE:

import mysql.connector

# Establish the connection
conn = mysql.connector.connect(user='root', password='cmrec@1234',
                               host='localhost', database='cmrec')

# Create a cursor object using the cursor() method
cursor = conn.cursor()

# Query that selects every row of the AIML table
sql = '''SELECT * from AIML'''

# Execute the query
cursor.execute(sql)

# Fetch only the first row from the table
#result = cursor.fetchone()
#print(result)

# Fetch all rows from the table
result = cursor.fetchall()
print(result)

# Close the cursor and the connection
cursor.close()
conn.close()

OUTPUT:

PROBLEM STATEMENT 3
Implement k-nearest neighbours classification using Python

K-Nearest Neighbour (KNN) is one of the simplest machine learning algorithms based
on the supervised learning technique. It is an instance-based, or lazy, learning
algorithm: no model is built from the training data in advance, and the work
is postponed until a prediction is requested for a new instance.

K-NN Algorithm
• Load the training data.
• Choose K, the number of nearest neighbors to consider.
• Compute the test point's distance from each training point.
• Sort the distances in ascending order.
• Use the sorted distances to select the K nearest neighbors.
• Use majority vote (for classification) or averaging (for regression) over those neighbors.
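The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the lab program: the function name knn_predict is an assumption, Euclidean distance is assumed, and the toy data are the nine VAR1/VAR2 points tabulated under experiment 4.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Distance from the query to every training point, paired with that point's label
    distances = [(math.dist(p, query), label)
                 for p, label in zip(train_points, train_labels)]
    # Ascending sort puts the nearest points first
    distances.sort(key=lambda pair: pair[0])
    # Labels of the k nearest neighbors
    nearest_labels = [label for _, label in distances[:k]]
    # Majority vote
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy data: the VAR1, VAR2, CLASS table from experiment 4
points = [(1.713, 1.586), (0.180, 1.786), (0.353, 1.240),
          (0.940, 1.566), (1.486, 0.759), (1.266, 1.106),
          (1.540, 0.419), (0.459, 1.799), (0.773, 0.186)]
labels = [0, 1, 1, 0, 1, 0, 1, 1, 1]

prediction = knn_predict(points, labels, (0.906, 0.606), k=3)
print(prediction)   # prints 1
```

For regression, the majority vote would be replaced by the mean of the neighbors' target values.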

Advantages of KNN
1. Easy to understand
2. No assumptions about data
3. Can be applied to both classification and regression
4. Works easily on multi-class problems

Disadvantages of KNN
1. Memory intensive and computationally expensive
2. Sensitive to the scale of the data
3. Does not work well on a rare-event (skewed) target variable
4. Struggles when there is a high number of independent variables
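The sensitivity to scale can be seen on a tiny made-up example. In the sketch below the two training points and the query are hypothetical, chosen so that the second feature is measured in much larger units than the first. The nearest neighbor changes once both features are standardized (mean 0, standard deviation 1), which is the reason for scaling the features before fitting KNN.

```python
import math
import statistics

# Hypothetical data: feature 1 spans a few units, feature 2 spans hundreds
points = [(1.0, 1000.0), (9.0, 1200.0)]
query = (2.0, 1150.0)

def nearest_index(pts, q):
    """Index of the point in pts that is closest to q (Euclidean distance)."""
    return min(range(len(pts)), key=lambda i: math.dist(pts[i], q))

# Column-wise mean and population standard deviation of the training points
means = [statistics.mean(col) for col in zip(*points)]
stds = [statistics.pstdev(col) for col in zip(*points)]

def scale(p):
    """Standardize one point with the training means and standard deviations."""
    return tuple((v - m) / s for v, m, s in zip(p, means, stds))

raw_nearest = nearest_index(points, query)
scaled_nearest = nearest_index([scale(p) for p in points], scale(query))
print(raw_nearest, scaled_nearest)   # prints 1 0
```

On the raw values the large-unit feature dominates the distance, so the second point wins. After standardization both features contribute comparably and the first point is closer.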

SOURCE CODE:

import math
import warnings

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.ticker import StrMethodFormatter

from sklearn import metrics
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score, KFold, train_test_split

sns.set()
warnings.filterwarnings('ignore')
#%matplotlib inline

# Import the CSV file
df = pd.read_csv(r'C:\Users\AIMLJAVA4\Desktop\lab 3\diabetes.csv')
print(df)
df.info(verbose=True)
print(df.describe().T)

# A zero in these columns stands for a missing measurement, so replace it with NaN
df_copy = df.copy(deep=True)
zero_cols = ['Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI']
df_copy[zero_cols] = df_copy[zero_cols].replace(0, np.nan)

# Show the count of NaNs per column
print(df_copy.isnull().sum())

p = df.hist(figsize=(20, 20))

# Fill the missing values with the column mean or median
df_copy['Glucose'] = df_copy['Glucose'].fillna(df_copy['Glucose'].mean())
df_copy['BloodPressure'] = df_copy['BloodPressure'].fillna(df_copy['BloodPressure'].mean())
df_copy['SkinThickness'] = df_copy['SkinThickness'].fillna(df_copy['SkinThickness'].median())
df_copy['Insulin'] = df_copy['Insulin'].fillna(df_copy['Insulin'].median())
df_copy['BMI'] = df_copy['BMI'].fillna(df_copy['BMI'].median())

# Standardize the features (zero mean, unit variance)
sc_X = StandardScaler()
X = pd.DataFrame(sc_X.fit_transform(df_copy.drop(["Outcome"], axis=1)),
                 columns=['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',
                          'BMI', 'DiabetesPedigreeFunction', 'Age'])
print(X.head())

# Split the data into training and test sets
y = df_copy.Outcome
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=42, stratify=y)

# Record the training and test accuracy for k = 1 to 14
test_scores = []
train_scores = []
for i in range(1, 15):
    knn = KNeighborsClassifier(i)
    knn.fit(X_train, y_train)
    train_scores.append(knn.score(X_train, y_train))
    test_scores.append(knn.score(X_test, y_test))

# Best score that comes from testing on the same data points that were used for training
max_train_score = max(train_scores)
train_scores_ind = [i for i, v in enumerate(train_scores) if v == max_train_score]
print('Max train score {} % and k = {}'.format(max_train_score*100, list(map(lambda x: x+1, train_scores_ind))))

# Best score that comes from testing on the data points that were held out solely for testing
max_test_score = max(test_scores)
test_scores_ind = [i for i, v in enumerate(test_scores) if v == max_test_score]
print('Max test score {} % and k = {}'.format(max_test_score*100, list(map(lambda x: x+1, test_scores_ind))))

# Set up a KNN classifier with k = 11 neighbors
knn = KNeighborsClassifier(11)
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))

# Confusion matrix of the test-set predictions
y_pred = knn.predict(X_test)
cnf_matrix = metrics.confusion_matrix(y_test, y_pred)
p = sns.heatmap(pd.DataFrame(cnf_matrix), annot=True, cmap="YlGnBu", fmt='g')
plt.title('Confusion matrix', y=1.1)
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
plt.show()

# Precision, recall and F1 score for each class
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))

# The ROC (Receiver Operating Characteristic) curve shows how well the model
# can distinguish between the two classes
from sklearn.metrics import roc_curve
y_pred_proba = knn.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)

# Area under the ROC curve
from sklearn.metrics import roc_auc_score
print(roc_auc_score(y_test, y_pred_proba))

# Import GridSearchCV
from sklearn.model_selection import GridSearchCV

# For a classifier like KNN, the parameter to be tuned is n_neighbors
param_grid = {'n_neighbors': np.arange(1, 50)}
knn = KNeighborsClassifier()
knn_cv = GridSearchCV(knn, param_grid, cv=5)
knn_cv.fit(X, y)
print("Best Score:" + str(knn_cv.best_score_))
print("Best Parameters: " + str(knn_cv.best_params_))

OUTPUT:

accuracy of 121/143 = 84.6%.
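In the program above the scaler is fitted on all rows before the split, and the test scores are also used to compare values of k, so the reported test accuracy may be somewhat optimistic. One possible alternative, sketched here under stated assumptions, is to put the scaler and the classifier in a scikit-learn Pipeline and choose k by cross-validation on the training part only. The data are synthetic (make_classification stands in for diabetes.csv), and the names pipe, search, X_demo and y_demo are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; the lab program reads diabetes.csv)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=1/3, random_state=42, stratify=y_demo)

# The scaler is refitted inside every cross-validation fold, on that fold's training part
pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])
search = GridSearchCV(pipe, {"knn__n_neighbors": list(range(1, 15))}, cv=5)
search.fit(X_tr, y_tr)

best_k = search.best_params_["knn__n_neighbors"]
# The held-out test set is used once, after k has been chosen
test_accuracy = search.score(X_te, y_te)
print(best_k, round(test_accuracy, 3))
```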

PROBLEM STATEMENT 4
Given the following data, which specify classifications for nine combinations of VAR1 and
VAR2, predict a classification for a case where VAR1=0.906 and VAR2=0.606, using the
result of k-means clustering with 3 means (i.e., 3 centroids)
VAR1 VAR2 CLASS
1.713 1.586 0
0.180 1.786 1
0.353 1.240 1
0.940 1.566 0
1.486 0.759 1
1.266 1.106 0
1.540 0.419 1
0.459 1.799 1
0.773 0.186 1

SOURCE CODE:

from sklearn.cluster import KMeans
import numpy as np

X = np.array([[1.713, 1.586], [0.180, 1.786], [0.353, 1.240],
              [0.940, 1.566], [1.486, 0.759], [1.266, 1.106],
              [1.540, 0.419], [0.459, 1.799], [0.773, 0.186]])
y = np.array([0, 1, 1, 0, 1, 0, 1, 1, 1])

# k-means is unsupervised: fit() ignores y and groups the points into 3 clusters
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X, y)

print("The input data is ")
print("VAR1 \t VAR2 \t CLASS")
for val, label in zip(X, y):
    print(val[0], "\t", val[1], "\t", label)
print("="*20)

# To get test data from the user
print("The Test data to predict ")
VAR1 = float(input("Enter Value for VAR1 :"))
VAR2 = float(input("Enter Value for VAR2 :"))
test_data = [VAR1, VAR2]
print("="*20)

# predict() returns a cluster index, not a CLASS value, so the cluster is
# labelled here with the majority CLASS of its training points (one common choice)
cluster = kmeans.predict([test_data])[0]
members = y[kmeans.labels_ == cluster]
print("The predicted Class is : ", np.bincount(members).argmax())
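What the clustering step does can be made explicit with a small from-scratch version. The sketch below is illustrative and makes several assumptions: it implements Lloyd's algorithm (the usual k-means iteration) in plain Python, the starting centroids in start are picked by eye rather than at random, and each cluster is labelled with the majority CLASS of its members. The names nearest, k_means and cluster_class are hypothetical.

```python
import math
from collections import Counter

points = [(1.713, 1.586), (0.180, 1.786), (0.353, 1.240),
          (0.940, 1.566), (1.486, 0.759), (1.266, 1.106),
          (1.540, 0.419), (0.459, 1.799), (0.773, 0.186)]
classes = [0, 1, 1, 0, 1, 0, 1, 1, 1]

def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))

def k_means(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    for _ in range(max_iter):
        assignment = [nearest(p, centroids) for p in points]
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == j]
            # New centroid is the mean of the members; an empty cluster keeps its old one
            updated.append(tuple(sum(c) / len(members) for c in zip(*members)) if members else old)
        if updated == centroids:
            break
        centroids = updated
    return centroids, assignment

# Hypothetical starting centroids, chosen by eye (an assumption, not from the lab program)
start = [(0.3, 1.6), (1.3, 1.4), (1.3, 0.5)]
centroids, assignment = k_means(points, start)

# Label each cluster with the majority CLASS of its members
cluster_class = {
    j: Counter(c for c, a in zip(classes, assignment) if a == j).most_common(1)[0][0]
    for j in set(assignment)
}

query = (0.906, 0.606)
predicted_class = cluster_class[nearest(query, centroids)]
print(assignment, predicted_class)   # prints [1, 0, 0, 1, 2, 1, 2, 0, 2] 1
```

With these starting centroids the query point falls in the cluster formed by the three points with the smallest VAR2 values, all of which have CLASS 1.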
OUTPUT:
PROBLEM STATEMENT 5

The following training examples map descriptions of individuals onto high, medium and
low credit-worthiness.
medium skiing design single twenties no -> highRisk
high golf trading married forties yes -> lowRisk
low speedway transport married thirties yes -> medRisk
medium football banking single thirties yes -> lowRisk
high flying media married fifties yes -> highRisk
low football security single twenties no -> medRisk
medium golf media single thirties yes -> medRisk
medium golf transport married forties yes -> lowRisk
high skiing banking single thirties yes -> highRisk
low golf unemployed married forties yes -> highRisk
Input attributes are (from left to right) income, recreation, job, status, age-group,
home-owner. Find the unconditional probability of 'golf' and the conditional probability of
'single' given 'medRisk' in the dataset.

SOURCE CODE:

# Counts read off the ten training examples above
total_Records = 10
numGolfRecords = 4       # records whose recreation is 'golf'
unConditionalprobGolf = numGolfRecords / total_Records
print("Unconditional probability of golf: ={}".format(unConditionalprobGolf))

# Conditional probability of 'single' given 'medRisk'
numMedRiskSingle = 2     # records that are both 'medRisk' and 'single'
numMedRisk = 3           # records labelled 'medRisk'
probMedRiskSingle = numMedRiskSingle / total_Records
probMedRisk = numMedRisk / total_Records
conditionalProb = probMedRiskSingle / probMedRisk
print("Conditional probability of single given medRisk: = {}".format(conditionalProb))

OUTPUT:
Unconditional probability of golf: =0.4
Conditional probability of single given medRisk: = 0.6666666666666667
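The counts used in the program (4 golf records, 3 medRisk records, 2 of them single) can also be derived from the training examples instead of being typed in by hand. The sketch below is one way to do that. The tuple layout and the variable names are assumptions, and exact fractions are used so that the results read as 2/5 and 2/3.

```python
from fractions import Fraction

# (income, recreation, job, status, age-group, home-owner, risk)
records = [
    ("medium", "skiing",   "design",     "single",  "twenties", "no",  "highRisk"),
    ("high",   "golf",     "trading",    "married", "forties",  "yes", "lowRisk"),
    ("low",    "speedway", "transport",  "married", "thirties", "yes", "medRisk"),
    ("medium", "football", "banking",    "single",  "thirties", "yes", "lowRisk"),
    ("high",   "flying",   "media",      "married", "fifties",  "yes", "highRisk"),
    ("low",    "football", "security",   "single",  "twenties", "no",  "medRisk"),
    ("medium", "golf",     "media",      "single",  "thirties", "yes", "medRisk"),
    ("medium", "golf",     "transport",  "married", "forties",  "yes", "lowRisk"),
    ("high",   "skiing",   "banking",    "single",  "thirties", "yes", "highRisk"),
    ("low",    "golf",     "unemployed", "married", "forties",  "yes", "highRisk"),
]

# Unconditional probability of 'golf': golf records / all records
num_golf = sum(1 for r in records if r[1] == "golf")
p_golf = Fraction(num_golf, len(records))

# Conditional probability of 'single' given 'medRisk': count within the medRisk records only
med_risk = [r for r in records if r[6] == "medRisk"]
p_single_given_med = Fraction(sum(1 for r in med_risk if r[3] == "single"), len(med_risk))

print(p_golf, p_single_given_med)   # prints 2/5 2/3
```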
PROBLEM STATEMENT 6

Implement linear regression using Python.

SOURCE CODE:

import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)

    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)

    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x

    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x

    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)

    # predicted response vector
    y_pred = b[0] + b[1]*x

    # plotting the regression line
    plt.plot(x, y_pred, color="g")

    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')

    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))

    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()
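For the ten observations in main(), the closed-form estimates can be worked by hand: n = 10, mean(x) = 4.5, mean(y) = 6.5, sum(x*y) = 389 and sum(x*x) = 285, so SS_xy = 389 - 292.5 = 96.5, SS_xx = 285 - 202.5 = 82.5, b_1 = 96.5 / 82.5 ≈ 1.1697 and b_0 = 6.5 - 4.5 * b_1 ≈ 1.2364. As an independent check, the sketch below fits the same line with numpy.polyfit, which is not used in the lab program.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# A degree-1 polyfit returns [slope, intercept] of the least-squares line
slope, intercept = np.polyfit(x, y, 1)
print(round(intercept, 4), round(slope, 4))   # prints 1.2364 1.1697
```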
OUTPUT:
