ML Lab Programs PDF

1. The document describes implementing k-nearest neighbors (KNN) classification in Python. KNN is an algorithm that classifies data points based on their distance from neighboring training data points.
2. The code loads training data, scales the features, splits the data into training and test sets, and trains a KNN classifier with different values of k neighbors. It evaluates the model using accuracy on both training and test sets.
3. Metrics like the confusion matrix and ROC curve are used to evaluate model performance on the test set. The optimal k value is selected based on achieving the highest test accuracy.

Uploaded by

Ravi Kiran
Copyright
© All Rights Reserved

List of Experiments

1. The probability that it is Friday and that a student is absent is 3%. Since there are 5 school
days in a week, the probability that it is Friday is 20%. What is the probability that a student
is absent given that today is Friday? Apply Bayes' rule in Python to get the result. (Ans: 15%)

2. Extract the data from a database using Python

3. Implement k-nearest neighbours classification using Python

4. Given the following data, which specify classifications for nine combinations of VAR1
and VAR2, predict a classification for a case where VAR1=0.906 and VAR2=0.606, using
the result of k-means clustering with 3 means (i.e., 3 centroids)

VAR1    VAR2    CLASS
1.713   1.586   0
0.180   1.786   1
0.353   1.240   1
0.940   1.566   0
1.486   0.759   1
1.266   1.106   0
1.540   0.419   1
0.459   1.799   1
0.773   0.186   1

5. The following training examples map descriptions of individuals onto high, medium and
low credit-worthiness.
medium skiing design single twenties no -> highRisk
high golf trading married forties yes -> lowRisk
low speedway transport married thirties yes -> medRisk
medium football banking single thirties yes -> lowRisk
high flying media married fifties yes -> highRisk
low football security single twenties no -> medRisk
medium golf media single thirties yes -> medRisk
medium golf transport married forties yes -> lowRisk
high skiing banking single thirties yes -> highRisk
low golf unemployed married forties yes -> highRisk
Input attributes are (from left to right) income, recreation, job, status, age-group,
home-owner. Find the unconditional probability of 'golf' and the conditional probability of
'single' given 'medRisk' in the dataset.

6. Implement linear regression using Python.


PROBLEM STATEMENT 1:

The probability that it is Friday and that a student is absent is 3%. Since there are 5
school days in a week, the probability that it is Friday is 20%. What is the probability
that a student is absent given that today is Friday? Apply Bayes' rule in Python to get
the result. (Ans: 15%)

PROCEDURE:

If A and B are two events in a sample space S, then the conditional probability
of A given B is defined as
P(A|B) = P(A∩B) / P(B)
Similarly, the formula for P(B|A) is:
P(B|A) = P(A∩B) / P(A)
Two events A and B are independent if and only if P(A∩B) = P(A)P(B).
For the given experiment, let A be the event that it is Friday and B the event
that a student is absent. Then
P(A∩B) = 0.03
P(A) = 0.2
P(B|A) = P(A∩B) / P(A)
= 0.03 / 0.2 = 0.15
Another example: a fair die is rolled. Let A be the event that the outcome is
an odd number, i.e., A={1,3,5}. Also let B be the event that the outcome is less
than or equal to 3, i.e., B={1,2,3}. The figure shows the Venn diagram of the
events. Here A∩B={1,3}, so P(B|A) = P(A∩B) / P(A) = (2/6)/(3/6) = 2/3 ≈ 0.667
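
The die example can be checked by enumerating the sample space directly. The following is a minimal sketch that assumes equally likely outcomes; the helper name conditional_probability is hypothetical and is not part of the lab program.

```python
from fractions import Fraction


def conditional_probability(event, given, sample_space):
    """Return P(event | given) when all outcomes are equally likely."""
    given_outcomes = given & sample_space
    if not given_outcomes:
        raise ValueError("the conditioning event has probability zero")
    # P(event | given) = |event ∩ given| / |given|
    return Fraction(len(event & given_outcomes), len(given_outcomes))


S = {1, 2, 3, 4, 5, 6}  # outcomes of a fair die
A = {1, 3, 5}           # outcome is odd
B = {1, 2, 3}           # outcome is less than or equal to 3

p_b_given_a = conditional_probability(B, A, S)
print(p_b_given_a)  # 2/3
```

Using Fraction keeps the result exact, which makes it easy to compare against the hand calculation.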

Source code:
probitisFridaynstudentAbsent = float(input("Enter the probability of being Friday and student is absent: "))
probitisFriday = 0.2
pstudentisAbsentgivenitisFriday = probitisFridaynstudentAbsent / probitisFriday
print("Probability that student is absent given it is Friday is:", pstudentisAbsentgivenitisFriday)

Output:
Enter the probability of being Friday and student is absent: 0.03
Probability that student is absent given it is Friday is: 0.15

PROBLEM STATEMENT 2
Extract the data from a database using Python

MySQL is an open-source relational database management system (RDBMS)
that is based on Structured Query Language (SQL). Download and install the
MySQL database from the official website https://www.mysql.com/downloads/.
Next, install MySQL Connector for Python. MySQL Connector enables
Python programs to access the MySQL database. It can be downloaded and
installed from https://dev.mysql.com/downloads/connector/python/ or by using the
following command.
python -m pip install mysql-connector-python
If the MySQL connector is installed correctly, there will be no error after
executing the import statement, import mysql.connector.

PROCEDURE:

Following are the steps to connect a Python application to a MySQL database.

1. Import the mysql.connector module
2. Create the connection object
3. Create the cursor object
4. Execute the query
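
The same four steps can be tried without a running MySQL server. The sketch below swaps in Python's built-in sqlite3 module purely as a stand-in, since it follows the same connection, cursor, execute, fetch pattern; the table name demo and its rows are hypothetical. One difference to keep in mind is that sqlite3 uses ? as the parameter placeholder, whereas MySQL Connector/Python uses %s.

```python
import sqlite3  # Step 1: import the database module (stand-in for mysql.connector)

# Step 2: create the connection object (an in-memory database, assumed for illustration)
conn = sqlite3.connect(":memory:")

# Step 3: create the cursor object
cursor = conn.cursor()

# Hypothetical table and rows so that the query below has something to return
cursor.execute("CREATE TABLE demo (first_name TEXT, country TEXT)")
cursor.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [("Virat", "India"), ("Kumara", "Srilanka")],
)
conn.commit()

# Step 4: execute the query, passing values as parameters instead of
# formatting them into the SQL string
cursor.execute("SELECT first_name, country FROM demo WHERE country = ?", ("India",))
rows = cursor.fetchall()
print(rows)  # [('Virat', 'India')]

cursor.close()
conn.close()
```

Passing values as query parameters, rather than building the SQL text by hand, is the usual way to avoid SQL injection in either library.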

SOURCE CODE:

import mysql.connector

# Establishing the connection
conn = mysql.connector.connect(user='root', password='cmrec@1234',
                               host='localhost', database='cmrec')

# Creating a cursor object using the cursor() method
cursor = conn.cursor()

# Query that retrieves every row of the AIML table
sql = '''SELECT * from AIML'''

# Executing the query
cursor.execute(sql)

# Fetching only the 1st row from the table (alternative to fetchall)
#result = cursor.fetchone()
#print(result)

# Fetching all rows from the table
result = cursor.fetchall()
print(result)

# Closing the cursor and the connection
cursor.close()
conn.close()

OUTPUT:

+------------+-------------+
| FIRST_NAME | Country     |
+------------+-------------+
| Shikhar    | India       |
| Jonathan   | SouthAfrica |
| Kumara     | Srilanka    |
| Virat      | India       |
| Rohit      | India       |
+------------+-------------+

PROBLEM STATEMENT 3

Implement k-nearest neighbours classification using Python

K-Nearest Neighbour is one of the simplest machine learning algorithms based
on the supervised learning technique. It is an instance-based, or lazy, learning
algorithm: no model is learned from the training data in advance, and the learning
process is postponed until a prediction is requested for a new instance.

K-NN Algorithm
• Load the training data.
• Choose K, the number of nearest neighbors to look at.
• Compute the test point's distance from each training point.
• Sort the distances in ascending order.
• Use the sorted distances to select the K nearest neighbors.
• Use majority rule (for classification) or averaging (for regression).
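
The steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name knn_predict and the toy training points are hypothetical, Euclidean distance is assumed, and the lab program itself uses scikit-learn's KNeighborsClassifier instead.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Compute the distance from the query to every training point,
    # then sort in ascending order so the closest points come first
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Keep the labels of the k nearest neighbours
    nearest_labels = [label for _, label in distances[:k]]
    # Majority rule for classification
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9)]
train_labels = ["a", "a", "a", "b", "b"]

prediction = knn_predict(train_points, train_labels, (1.1, 1.0), k=3)
print(prediction)  # a
```

For regression, the final step would average the neighbours' target values instead of taking a vote.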

Advantages of KNN
1. Easy to understand
2. No assumptions about data
3. Can be applied to both classification and regression
4. Works easily on multi-class problems

Disadvantages of KNN
1. Memory intensive and computationally expensive
2. Sensitive to the scale of the data
3. Does not work well on a rare-event (skewed) target variable
4. Struggles when there is a high number of independent variables
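
The sensitivity to scale can be seen on a tiny example. The sketch below uses made-up points in which the second feature has much larger values than the first; the helper names nearest_index and standardize are hypothetical. Before standardization the large-valued feature dominates the distance, and after standardization the nearest neighbour changes.

```python
import math
from statistics import mean, pstdev


def nearest_index(points, query):
    """Return the index of the point closest to `query` (Euclidean distance)."""
    return min(range(len(points)), key=lambda i: math.dist(points[i], query))


def standardize(points, reference):
    """Scale each feature to zero mean and unit variance using `reference` statistics."""
    columns = list(zip(*reference))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(point, means, stds))
        for point in points
    ]


# Hypothetical data: feature 1 is in single digits, feature 2 is in the thousands
train = [(1.0, 1000.0), (5.0, 1010.0)]
query = (1.2, 1009.0)

raw_nearest = nearest_index(train, query)
scaled_nearest = nearest_index(
    standardize(train, train), standardize([query], train)[0]
)
print(raw_nearest, scaled_nearest)  # 1 0
```

This is the reason the lab program applies StandardScaler to the features before fitting the classifier.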

SOURCE CODE:

import numpy as np
import pandas as pd
import math
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.ticker import StrMethodFormatter

sns.set()
import warnings
warnings.filterwarnings('ignore')
#%matplotlib inline

from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.model_selection import train_test_split

# Import the csv file
df = pd.read_csv(r'C:\Users\AIMLJAVA4\Desktop\lab 3\diabetes.csv')
print(df)
df.info(verbose=True)
print(df.describe().T)

# Work on a copy and treat zeros in these columns as missing values
df_copy = df.copy(deep=True)
zero_as_missing = ['Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI']
df_copy[zero_as_missing] = df_copy[zero_as_missing].replace(0, np.nan)

## showing the count of NaNs
print(df_copy.isnull().sum())

p = df.hist(figsize=(20, 20))

# Fill missing values with the column mean or median.
# Assigning the result back avoids relying on inplace=True on a column selection.
df_copy['Glucose'] = df_copy['Glucose'].fillna(df_copy['Glucose'].mean())
df_copy['BloodPressure'] = df_copy['BloodPressure'].fillna(df_copy['BloodPressure'].mean())
df_copy['SkinThickness'] = df_copy['SkinThickness'].fillna(df_copy['SkinThickness'].median())
df_copy['Insulin'] = df_copy['Insulin'].fillna(df_copy['Insulin'].median())
df_copy['BMI'] = df_copy['BMI'].fillna(df_copy['BMI'].median())


# Standardize every feature column (all columns except Outcome)
sc_X = StandardScaler()
X = pd.DataFrame(sc_X.fit_transform(df_copy.drop(["Outcome"], axis=1)),
                 columns=['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',
                          'BMI', 'DiabetesPedigreeFunction', 'Age'])
print(X.head())

# Splitting the data into training and test sets
y = df_copy.Outcome
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=42, stratify=y)

test_scores = []
train_scores = []

# Train one classifier for each k from 1 to 14 and record both accuracies
for i in range(1, 15):
    knn = KNeighborsClassifier(n_neighbors=i)
    knn.fit(X_train, y_train)
    train_scores.append(knn.score(X_train, y_train))
    test_scores.append(knn.score(X_test, y_test))

## score that comes from testing on the same datapoints that were used for training
max_train_score = max(train_scores)
train_scores_ind = [i for i, v in enumerate(train_scores) if v == max_train_score]
print('Max train score {} % and k = {}'.format(max_train_score*100,
                                               list(map(lambda x: x+1, train_scores_ind))))

## score that comes from testing on the datapoints that were split in the beginning
## to be used for testing solely
max_test_score = max(test_scores)
test_scores_ind = [i for i, v in enumerate(test_scores) if v == max_test_score]
print('Max test score {} % and k = {}'.format(max_test_score*100,
                                              list(map(lambda x: x+1, test_scores_ind))))

# Setup a knn classifier with k = 11 neighbors
knn = KNeighborsClassifier(n_neighbors=11)
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))

# Confusion matrix of the test-set predictions, drawn on its own figure
y_pred = knn.predict(X_test)
cnf_matrix = metrics.confusion_matrix(y_test, y_pred)
plt.figure()
p = sns.heatmap(pd.DataFrame(cnf_matrix), annot=True, cmap="YlGnBu", fmt='g')
plt.title('Confusion matrix', y=1.1)
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
plt.show()

# Precision, recall and F1-score per class
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))

# ROC (Receiver Operating Characteristic) curve: shows how well the model
# can distinguish between the two classes
from sklearn.metrics import roc_curve
y_pred_proba = knn.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)

# Area under ROC curve
from sklearn.metrics import roc_auc_score
print(roc_auc_score(y_test, y_pred_proba))

# Import GridSearchCV
from sklearn.model_selection import GridSearchCV

# In case of a classifier like knn the parameter to be tuned is n_neighbors
param_grid = {'n_neighbors': np.arange(1, 50)}
knn = KNeighborsClassifier()
knn_cv = GridSearchCV(knn, param_grid, cv=5)
knn_cv.fit(X, y)
print("Best Score:" + str(knn_cv.best_score_))
print("Best Parameters: " + str(knn_cv.best_params_))

OUTPUT:

accuracy of 121/143 = 84.6%.


PROBLEM STATEMENT 4
Given the following data, which specify classifications for nine combinations of VAR1 and
VAR2, predict a classification for a case where VAR1=0.906 and VAR2=0.606, using the
result of k-means clustering with 3 means (i.e., 3 centroids)

VAR1    VAR2    CLASS
1.713   1.586   0
0.180   1.786   1
0.353   1.240   1
0.940   1.566   0
1.486   0.759   1
1.266   1.106   0
1.540   0.419   1
0.459   1.799   1
0.773   0.186   1

SOURCE CODE:

from sklearn.cluster import KMeans
import numpy as np

X = np.array([[1.713,1.586], [0.180,1.786], [0.353,1.240],
              [0.940,1.566], [1.486,0.759], [1.266,1.106],
              [1.540,0.419], [0.459,1.799], [0.773,0.186]])
y = np.array([0,1,1,0,1,0,1,1,1])

# KMeans is unsupervised and ignores class labels, so it is fitted on X only
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print("The input data is ")
print("VAR1 \t VAR2 \t CLASS")
for val, label in zip(X, y):
    print(val[0], "\t", val[1], "\t", label)
print("="*20)

# To get test data from the user
print("The Test data to predict ")
VAR1 = float(input("Enter Value for VAR1 :"))
VAR2 = float(input("Enter Value for VAR2 :"))
test_data = [VAR1, VAR2]
print("="*20)

# predict() returns the index of the nearest centroid, not a class label.
# The class is taken as the majority class of the training points in that cluster.
cluster = kmeans.predict([test_data])[0]
members = y[kmeans.labels_ == cluster]
print("The predicted Cluster is : ", cluster)
print("The predicted Class is : ", np.bincount(members).argmax())
OUTPUT:
PROBLEM STATEMENT 5

The following training examples map descriptions of individuals onto high, medium and
low credit-worthiness.
medium skiing design single twenties no -> highRisk
high golf trading married forties yes -> lowRisk
low speedway transport married thirties yes -> medRisk
medium football banking single thirties yes -> lowRisk
high flying media married fifties yes -> highRisk
low football security single twenties no -> medRisk
medium golf media single thirties yes -> medRisk
medium golf transport married forties yes -> lowRisk
high skiing banking single thirties yes -> highRisk
low golf unemployed married forties yes -> highRisk
Input attributes are (from left to right) income, recreation, job, status, age-group,
home-owner. Find the unconditional probability of 'golf' and the conditional probability of
'single' given 'medRisk' in the dataset.

SOURCE CODE:

total_Records=10
numGolfRecords=4
unConditionalprobGolf=numGolfRecords / total_Records
print("Unconditional probability of golf: ={}".format(unConditionalprobGolf))
#conditional probability of 'single' given 'medRisk'
numMedRiskSingle=2
numMedRisk=3
probMedRiskSingle=numMedRiskSingle/total_Records
probMedRisk=numMedRisk/total_Records
conditionalProb=(probMedRiskSingle/probMedRisk)
print("Conditional probability of single given medRisk: = {}".format(conditionalProb))

OUTPUT:
Unconditional probability of golf: =0.4
Conditional probability of single given medRisk: = 0.6666666666666667
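
The counts in the program above are entered by hand. As a cross-check, the following sketch derives the same probabilities from the ten training examples themselves; the field names in FIELDS are assumptions chosen to match the attribute order given in the problem statement.

```python
from fractions import Fraction

# The ten training examples from the problem statement
DATA = """\
medium skiing design single twenties no -> highRisk
high golf trading married forties yes -> lowRisk
low speedway transport married thirties yes -> medRisk
medium football banking single thirties yes -> lowRisk
high flying media married fifties yes -> highRisk
low football security single twenties no -> medRisk
medium golf media single thirties yes -> medRisk
medium golf transport married forties yes -> lowRisk
high skiing banking single thirties yes -> highRisk
low golf unemployed married forties yes -> highRisk
"""

# Assumed field names, in the left-to-right order of the input attributes
FIELDS = ("income", "recreation", "job", "status", "age_group", "home_owner", "risk")
records = [
    dict(zip(FIELDS, line.replace("->", "").split()))
    for line in DATA.splitlines()
]

# Unconditional probability of golf: golf records / all records
num_golf = sum(1 for r in records if r["recreation"] == "golf")
p_golf = Fraction(num_golf, len(records))

# Conditional probability of single given medRisk:
# single records among the medRisk records / medRisk records
med_risk = [r for r in records if r["risk"] == "medRisk"]
num_single = sum(1 for r in med_risk if r["status"] == "single")
p_single_given_med = Fraction(num_single, len(med_risk))

print(p_golf)              # 2/5
print(p_single_given_med)  # 2/3
```

The exact fractions 2/5 and 2/3 agree with the decimal values 0.4 and 0.667 in the output above.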
PROBLEM STATEMENT 6

Implement linear regression using Python.

SOURCE CODE:

import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)

    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)

    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x

    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x

    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)

    # predicted response vector
    y_pred = b[0] + b[1]*x

    # plotting the regression line
    plt.plot(x, y_pred, color="g")

    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')

    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {} \nb_1 = {}".format(b[0], b[1]))

    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()
OUTPUT:
