0% found this document useful (0 votes)

3 views

ML LAB

The document is a lab manual for a Machine Learning course, detailing a series of experiments and programming tasks related to various algorithms and techniques in machine learning. It includes implementations of algorithms such as FIND-S, Candidate-Elimination, ID3 decision trees, k-Nearest Neighbors, and Support Vector Machines, among others. Each experiment outlines the objectives, programming requirements, and expected outputs, using datasets in CSV format for training and testing.

Uploaded by

sampathkondapalli064

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

3 views

ML LAB

Uploaded by

sampathkondapalli064

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 51

III B.

Tech II Sem ML Lab Manual

S NO LIST OF EXPERIMENT

1 Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based
on a given set of training data samples. Read the training data from a .CSV file.
2 For a given set of training data examples stored in a .CSV file, implement and demonstrate the
Candidate- Elimination algorithm to output a description of the set of all hypotheses consistent
with the training examples.
3 Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an
appropriate data set for building the decision tree and apply this knowledge to classify a new
sample.
4 Exercises to solve the real-world problems using the following machine learning methods: a)
Linear Regression b) Logistic Regression c) Binary Classifier
5 Develop a program for Bias, Variance, Remove duplicates , Cross Validation

6 Write a program to implement Categorical Encoding, One-hot Encoding

7 Build an Artificial Neural Network by implementing the Back propagation algorithm and test the
same
using appropriate data sets.
8 Write a program to implement k-Nearest Neighbor algorithm to classify the iris data set. Print
both correct
and wrong predictions.
9 Implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select appropriate data set for your experiment and draw graphs.
10 Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model
to
perform this task. Built-in Java classes/API can be used to write the program. Calculate the
accuracy,
precision, and recall for your data set.
11 Apply EM algorithm to cluster a Heart Disease Data Set. Use the same data set for
clustering using k-Means algorithm. Compare the results of these two algorithms and comment
on the
quality of clustering. You can add Java/Python ML library classes/API in the program.
12 Exploratory Data Analysis for Classification using Pandas or Matplotlib.

13 Write a Python program to construct a Bayesian network considering medical data. Use this
model to
demonstrate the diagnosis of heart patients using standard Heart Disease Data Set
14 Write a program to Implement Support Vector Machines and Principle Component Analysis

15 Write a program to Implement Principle Component Analysis

III B.Tech II Sem ML Lab Manual

Experiment – 1:

Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis

based on a given set of training data samples. Read the training data from a .CSV file.

Aim: Demonstration of FIND-S algorithm for finding the most specific hypothesis

Import csv

With open(‘tennis.csv’, ‘r’) as f:

Reader=csv.reader(f)

Your_list=list(reader)

H=[[‘0’, ‘0’, ‘0’, ‘0’, ‘0’]]

For i in your

list Print(i)

Ifi[-1]==”True”:

J=0

For x in i:

If x!=”True”

if x != h[0][j] and h[0][j] == '0':

h[0][j] = x

elif x != h[0][j] and h[0][j] != '0':

h[0][j] = '?'
else:

pass

j=j+1

print("Most specific hypothesis is")

print(h)

PITT 1
III B.Tech II Sem ML Lab Manual

Output

'Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same',True

'Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same',True

'Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change',Fals

'Sunny', 'Warm', 'High', 'Strong', 'Cool','Change',True

Maximally Specific set

[['Sunny', 'Warm', '?', 'Strong', '?', '?']]

PITT 2
III B.Tech II Sem ML Lab Manual

Experiment – 2:

For a given set of training data examples stored in a .CSV file, implement and demonstrate the

Candidate-Elimination algorithm to output a description of the set of all hypotheses

consistent with the training examples.

Aim: Demonstration of Candidate-Elimination algorithm

Program code

class Holder:

factors={} #Initialize an empty dictionary

attributes = () #declaration of dictionaries parameters with an arbitrary

length '''

Constructor of class Holder holding two parameters,self refers to the instance of the

class '''

def init (self,attr): # self.attributes = attrfor i in attr:

self.factors[i]=[]

def

add_values(self,factor,values):self.factors[factor]=values

class CandidateElimination:

Positive={}

#Initialize positive empty dictionary

Negative={} #Initialize negative empty

dictionary

def init (self,data,fact): self.num_factors = len(data[0][0])self.factors = fact.factors

self.attr = fact.attributesself.dataset = data

def run_algorithm(self):'''

Initialize the specific and general boundaries, and loop the dataset against the

algorithm

PITT 3
III B.Tech II Sem ML Lab Manual
'''

PITT 4
III B.Tech II Sem ML Lab Manual

G = self.initializeG()S = self.initializeS() '''

Programmatically populate list in the iterating variable

trial_set''' count=0

for trial_set in self.dataset:

if self.is_positive(trial_set): #if trial set/example consists of positive examples

G = self.remove_inconsistent_G(G,trial_set[0]) #remove inconsitent data fromthe general boundary

S_new = S[:] #initialize the dictionary with no key-value
pairprint (S_new) for s in S:
if not self.consistent(s,trial_set[0]):S_new.remove(s)

generalization = self.generalize_inconsistent_S(s,trial_set[0])generalization =

self.get_general(generalization,G)

if generalization:

S_new.append(generalization) S = S_new[:]

S = self.remove_more_general(S)print(S) else:#if it is negative

S = self.remove_inconsistent_S(S,trial_set[0]) #remove inconsitent data fromthe specific

boundary G_new = G[:] #initialize the dictionary with no key-value pair (dataset cantake

any value)

print (G_new)for g in G:

ifself.consistent(g,trial_set[0]):G_new.remove(g)

specializations = self.specialize_inconsistent_G(g,trial_set[0])specializationss =

self.get_specific(specializations,S)

if specializations != []: G_new +=

specializations

G = G_new[:]

G = self.remove_more_specific(G)print(G) print (S)print (G)

def initializeS(self):

''' Initialize the specific boundary '''

S = tuple(['-' for factor in range(self.num_factors)]) #6 constraints in the vectorreturn

[S] def initializeG(self):

PITT 5
III B.Tech II Sem ML Lab Manual
''' Initialize the general boundary '''

G = tuple(['?' for factor in range(self.num_factors)]) # 6 constraints in the

vectorreturn [G] def is_positive(self,trial_set):

''' Check if a given training trial_set is positive '''if trial_set[1] == 'Y':

return True

elif trial_set[1] == 'N':return False

else:

raise TypeError("invalid target

value") def

match_factor(self,value1,value2):

''' Check for the factors values match, necessary while checking the consistency oftraining trial_set

with the hypothesis '''

if value1 == '?' or value2 == '?':return

True elif value1 == value2 :return True

return False

def consistent(self,hypothesis,instance):

''' Check whether the instance is part of the hypothesis '''for i,factor in

enumerate(hypothesis): if not self.match_factor(factor,instance[i]):return False

return True

def remove_inconsistent_G(self,hypotheses,instance):''' For a positive trial_set, the hypotheses in

G inconsistent with it should be removed '''G_new = hypotheses[:]

for g in hypotheses:
if not self.consistent(g,instance):G_new.remove(g) return G_new

def remove_inconsistent_S(self,hypotheses,instance):
''' For a negative trial_set, the hypotheses in S
inconsistent with it should be removed ''
'S_new = hypotheses[:] for s in hypotheses:

if self.consistent(s,instance):S_new.remove(s)

return S_new
PITT 6
III B.Tech II Sem ML Lab Manual
def remove_more_general(self,hypotheses):
''' After generalizing S for a positive trial_set, the hypothesis in Sgeneral than others in S should be removed '''
S_new = hypotheses[:]for old in hypotheses:

for new in S_new:

if old!=new and self.more_general(new,old):S_new.remove[new] return S_new
def remove_more_specific(self,hypotheses):

''' After specializing G for a negative trial_set, the hypothesis in Gspecific than others in G should

be removed '''

G_new = hypotheses[:]for old in hypotheses: for new in G_new:

if old!=new and self.more_specific(new,old):G_new.remove[new] return G_new
def generalize_inconsistent_S(self,hypothesis,instance):
''' When a inconsistent hypothesis for positive trial_set is seen in the specificboundary S, itshould be
generalized to be consistent with the trial_set ... we will get onehypothesis''' hypo = list(hypothesis) # convert
tuple to list for mutabilityfor i,factor in enumerate(hypo): if factor == '-':
hypo[i] = instance[i]

elif not self.match_factor(factor,instance[i]):hypo[i] = '?'

generalization = tuple(hypo) # convert list back to tuple for immutabilityreturn

eneralization def specialize_inconsistent_G(self,hypothesis,instance):

''' When a inconsistent hypothesis for negative trial_set is seen in the generalboundary G

should be specialized to be consistent with the trial_set.. we will get a set of

hypotheses ''' specializations = []

hypo = list(hypothesis) # convert tuple to list for mutabilityfor i,factor in

enumerate(hypo): if factor == '?':

values = self.factors[self.attr[i]]for j in values:

if instance[i] != j:hyp=hypo[:] hyp[i]=j

hyp=tuple(hyp) # convert list back to tuple for

immutabilityspecializations.append(hyp) return specializations

def get_general(self,generalization,G):

''' Checks if there is more general hypothesis

in G for a generalization of inconsistent

hypothesis in S

in case of positive trial_set and returns valid generalization

PITT 7
III B.Tech II Sem ML Lab Manual
''' for g in G:

if self.more_general(g,generalization):return

generalization return None

def get_specific(self,specializations,S):

''' Checks if there is more specific hypothesis in Sfor each of hypothesis in specializations of

an inconsistent hypothesis in G in case of negative trial_setand return the valid

specializations''' valid_specializations = [] for hypo in specializations:

for s in S:

if self.more_specific(s,hypo) or

s==self.initializeS()[0]:valid_specializations.append(hypo) return

valid_specializations

def exists_general(self,hypothesis,G):

'''Used to check if there exists a more general hypothesis ingeneral boundary for version

space''' for g in G:

if self.more_general(g,hypothesis):return True

return False
def exists_specific(self,hypothesis,S):

'''Used to check if there exists a more specific hypothesis ingeneral boundary for version

space''' for s in S:

if self.more_specific(s,hypothesis):return

True return False

def more_general(self,hyp1,hyp2):

''' Check whether hyp1 is more general than hyp2 ''

'hyp = zip(hyp1,hyp2) for i,j in hyp:if i == '?':

continue

elif j == '?':

if i != '?':
PITT 8
III B.Tech II Sem ML Lab Manual
return

False elif i

!= j:

return

False else:

continue

return

True

def more_specific(self,hyp1,hyp2): ''' hyp1 more specific than hyp2 is

equivalent to hyp2 being more general than hyp1 '''return self.more_general(hyp2,hyp1)

dataset=[(('sunny','warm','normal','strong','warm','same'),'Y'),(('sunny','warm','high','stron

g','warm','same'),'Y'),(('rainy','cold','high','strong','warm','change'),'N'),(('sunny','warm','hi

gh','strong','cool','change'),'Y')]

attributes =('Sky','Temp','Humidity','Wind','Water','Forecast')f = Holder(attributes)

f.add_values('Sky',('sunny','rainy','cloudy')) #sky can be sunny rainy or cloudy

f.add_values('Temp',('cold','warm')) #Temp can be sunny cold or warm

f.add_values('Humidity',('normal','high')) #Humidity can be normal or high

f.add_values('Wind',('weak','strong')) #wind can be weak or strong f.add_values('Water',('warm','cold’)

#water can be warm or cold f.add_values('Forecast',('same','change')) #Forecast can be same or change

a = CandidateElimination(dataset,f) #pass the dataset to the algorithm class and call therun algoritm method

a.run_algorithm()

PITT 9
III B.Tech II Sem ML Lab Manual

Output

[('sunny', 'warm', 'normal', 'strong', 'warm', 'same')]

[('sunny', 'warm', 'normal', 'strong', 'warm','same')]

[('sunny', 'warm', '?', 'strong', 'warm', 'same')]

[('?', '?', '?', '?', '?', '?')]

[('sunny', '?', '?', '?', '?', '?'), ('?', 'warm', '?', '?', '?', '?'), ('?', '?', '?', '?', '?', 'same')]

[('sunny', 'warm', '?', 'strong', 'warm', 'same')]

[('sunny', 'warm', '?', 'strong', '?', '?')]

[('sunny']

PITT 10
III B.Tech II Sem ML Lab Manual

Experiment-3:

Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use

an appropriate data set for building the decision tree and apply this knowledge to classify a

new sample. Aim: Demonstration of ID3 algorithm

Dataset: Tennis dataset

Program code:

import numpy as npimport math

from data_loader import read_data

class Node:

def init (self, attribute): self.attribute = attributeself.children = [] self.answer = "" def str (self): return self.attribute
def subtables(data, col, delete):dict = {}

items = np.unique(data[:, col])

count = np.zeros((items.shape[0], 1), dtype=np.int32)for x in range(items.shape[0]):

for y in range(data.shape[0]):

if data[y, col] == items[x]:count[x]

+= 1 for x in range(items.shape[0]):

dict[items[x]] = np.empty((int(count[x]), data.shape[1]),

dtype="|S32") pos = 0

for y in range(data.shape[0]):if data[y, col] == items[x]:

dict[items[x]][pos] = data[y]

pos += 1 if delete:

dict[items[x]] = np.delete(dict[items[x]], col, 1)return items,

dict def entropy(S):

items = np.unique(S)if items.size == 1:

return 0

counts = np.zeros((items.shape[0], 1))sums =

0 for x in range(items.shape[0]):

PITT 11
III B.Tech II Sem ML Lab Manual
counts[x] = sum(S == items[x]) / (S.size *
1.0) for count in counts:
sums += -1 * count * math.log(count, 2)return
sums def gain_ratio(data, col):
items, dict = subtables(data, col, delete=False)
total_size = data.shape[0]
entropies = np.zeros((items.shape[0], 1))intrinsic = np.zeros((items.shape[0], 1)) for x in
range(items.shape[0]):
ratio = dict[items[x]].shape[0]/(total_size * 1.0) entropies[x] = ratio * entropy(dict[items[x]][:, -1])
intrinsic[x] = ratio * math.log(ratio, 2)
total_entropy = entropy(data[:, -1])iv = -1 * sum(intrinsic)
for x in range(entropies.shape[0]):total_entropy -=
entropies[x] return total_entropy / iv
def create_node(data, metadata):
if (np.unique(data[:, -1])).shape[0] == 1:node = Node("")
node.answer = np.unique(data[:, -1])[0]return node

gains = np.zeros((data.shape[1] - 1, 1))for col in range(data.shape[1] - 1):

gains[col] = gain_ratio(data, col)split = np.argmax(gains)

node = Node(metadata[split])

metadata = np.delete(metadata, split, 0)

items, dict = subtables(data, split,

delete=True) for x in

range(items.shape[0]):

child = create_node(dict[items[x]], metadata)node.children.append((items[x],

child)) return node def empty(size):

s = ""

for x in range(size):s += " "

return s

def print_tree(node, level):if node.answer != "":

print(empty(level), node.answer)return

PITT 12
III B.Tech II Sem ML Lab Manual
print(empty(level), node.attribute)for value, n in node.children:

print(empty(level + 1), value)print_tree(n, level + 2)

metadata, traindata = read_data("tennis.csv")data =

np.array(traindata) node = create_node(data,

metadata)print_tree(node, 0)

Data_loader.py

import csv

def read_data(filename):

with open(filename, 'r') as csvfile:

datareader = csv.reader(csvfile, delimiter=',')headers = next(datareader)

metadata = []traindata = []

for name in headers: metadata.append(name)

for row in datareader: traindata.append(row)

return (metadata, traindata)

PITT 13
III B.Tech II Sem ML Lab Manual

Input:

Tennis.csv

outlook,temperature,humidity,wind,answer sunny,hot,high,weak,no sunny,hot,high,strong,no

overcast,hot,high,weak,yes rain,mild,high,weak,yes rain,cool,normal,weak,yes

rain,cool,normal,strong,no overcast,cool,normal,strong,yes sunny,mild,high,weak,no

sunny,cool,normal,weak,yes rain,mild,normal,weak,yes sunny,mild,normal,strong,yes

overcast,mild,high,strong,yes overcast,hot,normal,weak,yes rain,mild,high,strong,no

Output

outlook

overcastb'yes'

rain

wind

b'strong'b'no' b'weak'

b'yes' sunny

humidityb'high'b'no'

b'normal'b'yes

PITT 14
III B.Tech II Sem ML Lab Manual

Experiment – 4:
Exercises to solve the real-world problems using the following machine learning methods.a).
Linear Regression b). Logistic Regression

Aim: To solve the real-world problems using the machine learning methods. Linear Regression and Logistic
Regression
Dataset: std_marks.csv-constructed on own by using students lab internal and external marks.
Program code:
import pandas as pd
from sklearn import linear_model
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
data=pd.read_csv(r"E:\sudhakar\std_marks.csv")
print('First 5 rows of the data set are:')
print(data.head())
dim=data.shape
print('Dimensions of the data set are',dim)
print('Statistics of the data are:')
print(data.describe())
print('Correlation matrix of the data set is:')
print(data.corr())
x_set=data[['internal']]
print('First 5 rows of features set are:')
print(x_set.head())
y_set=data[['external']]
print('First 5 rows of features set are:')
print(y_set.head())
x_train,x_test, y_train, y_test = train_test_split(x_set,y_set, test_size = 0.3)
model=linear_model.LinearRegression()
model.fit(x_train,y_train)
print('Regression coefficient is',float(model.coef_))
print('Regression intercept is',float(model.intercept_))
y_pred=model.predict(x_test)

PITT 15
III B.Tech II Sem ML Lab Manual
y_preds=[]
for i in y_pred:
7 y_preds.append(float(i))
print('Predicted values for test data are:')
print(y_preds)
print('mean squared error is ',mean_squared_error(y_test,y_pred))
plt.scatter(x_test,y_test,color='blue',label='actual y values')
plt.plot(x_test,y_pred,color='red',linewidth=3,label='predicted regression line')
plt.ylabel('y value')
plt.xlabel('x value')
plt.title('simple linear regression')
plt.legend(loc='best')
plt.show()

PITT 16
III B.Tech II Sem ML Lab Manual
Output screen shots :

PITT 17
III B.Tech II Sem ML Lab Manual

Experiment – 4b:

Aim: To solve the real-world problems using the machine learning methods. Logistic Regression

Program code:
import warnings
warnings.filterwarnings("ignore")
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
from sklearn.metrics import recall_score
from sklearn.metrics import precision_score
from sklearn.preprocessing import StandardScaler
data=pd.read_csv(r"E:\sudhakar\heart.csv")
print('The first 5 rows of the data set are:')
print(data.head())
dim=data.shape
print('Dimensions of the data set are',dim)
print('Statistics of the data are:')
print(data.describe())
print('Correlation matrix of the data set is:')
print(data.corr())
class_lbls=data['target'].unique()
class_labels=[]
for x in class_lbls:
class_labels.append(str(x))
print('Class labels are:')
print(class_labels)
sns.countplot(data['target'])
col_names=data.columns
feature_names=col_names[:-1]
feature_names=list(feature_names)
print('Feature names are:')
print(feature_names)
x_set = data.drop(['target'], axis=1)
print('First 5 rows of features set are:')
print(x_set.head())
y_set=data[['target']]
print('First 5 rows of features set are:')
PITT 18
III B.Tech II Sem ML Lab Manual
print(y_set.head())
scaler=StandardScaler()
x_train,x_test, y_train, y_test = train_test_split(x_set,y_set, test_size = 0.3)
scaler.fit(x_train)
x_train=scaler.transform(x_train)
model = LogisticRegression()
model.fit(x_train, y_train)
x_test=scaler.transform(x_test)
y_pred=model.predict(x_test)
print('Predicted class labels for test data are:')
print(y_pred)
print("Accuracy:",accuracy_score(y_test, y_pred))
print("Precision:",precision_score(y_test, y_pred))
print("Recall:",recall_score(y_test, y_pred))
print(classification_report(y_test,y_pred,target_names=class_labels))
cm=confusion_matrix(y_test,y_pred)
df_cm = pd.DataFrame(cm, columns=class_labels, index = class_labels)
df_cm.index.name = 'Actual'
df_cm.columns.name = 'Predicted'
sns.set(font_scale=1.5)
sns.heatmap(df_cm, annot=True,cmap="Blues",fmt='d')

PITT 19
III B.Tech II Sem ML Lab Manual
Output screen shots:

PITT 20
III B.Tech II Sem ML Lab Manual

PITT 21
III B.Tech II Sem ML Lab Manual
Experiment – 5:

Aim: Implement a program for Bias, Variance and cross-validation

Dataset: winequality.csv- The data set is related to white variant of the Portuguese "Vinho Verde"
wine. The data set is collected from https://archive.ics.uci.edu/ml/datasets/wine+quality.

Program code:
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn import linear_model
import matplotlib.pyplot as plt
from statistics import mean,stdev
data=pd.read_csv(r"E:\machine learning\datasets\winequality.csv")
dim=data.shape
print('Dimensions of the data set are',dim)
print('First 5 rows of the data set are:')
print(data.head())
col_names=data.columns
col_names=list(col_names)
print('Attrubte names are:')
print(col_names)
feature_names=col_names[:-1]
print('Feature names are:',feature_names)
x_set=data.drop('quality',axis=1)
y_set=data['quality']
model=linear_model.LinearRegression()
scores=cross_val_score(model, x_set, y_set, cv=10)
k_list=range(2,200)
bias=[]
variance=[]
for k in k_list:
model=linear_model.LinearRegression()
scores=cross_val_score(model, x_set, y_set, cv=k)
bias.append(mean(scores))

PITT 22
III B.Tech II Sem ML Lab Manual
variance.append(stdev(scores))
plt.plot(k_list, bias, 'b', label='bias of model')
plt.plot(k_list, variance, 'r', label='Variance of model')
plt.xlabel('k value')
plt.title('bias-variance trade off')
plt.legend(loc='best')
plt.show()
#From, graph , best value is about 85
model=linear_model.LinearRegression()
scores=cross_val_score(model, x_set, y_set, cv=85)
bias=mean(scores)
variance=stdev(scores)
print('Bias of the model is',bias)
print('Variance of the model is',variance)

PITT 23
III B.Tech II Sem ML Lab Manual

Output screen shots:

PITT 24
III B.Tech II Sem ML Lab Manual
Experiment-7

Build an Artificial Neural Network by implementing the Back propagation algorithm

and test the same using appropriate data sets.

Aim: Demonstration of Artificial neural network using back propagation algorithm Program Code
import numpy as np

X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)

y = np.array(([92], [86], [89]), dtype=float)

X = X/np.amax(X,axis=0) # maximum of X array longitudinallyy =

y/100 #Sigmoid Functiondef sigmoid (x):

return 1/(1 + np.exp(-x))

#Derivative of Sigmoid Functiondef derivatives_sigmoid(x):

return x * (1 - x)

#Variable initialization

epoch=7000 #Setting training iterationslr=0.1 #Setting learning rate

inputlayer_neurons = 2 #number of features in data set hiddenlayer_neurons = 3 #number of

hidden layers neuronsoutput_neurons = 1 #number of neurons at output layer #weight and bias

initialization wh=np.random.uniform(size=(inputlayer_neurons,hiddenlayer_neurons))

bh=np.random.uniform(size=(1,hiddenlayer_neurons))

wout=np.random.uniform(size=(hiddenlayer_neurons,output_neurons))

bout=np.random.uniform(size=(1,output_neurons))

#draws a random range of numbers uniformly of dim x*yfor i in

range(epoch): #Forward Propogation

hinp1=np.dot(X,wh) hinp=hinp1 + bh hlayer_act = sigmoid(hinp)

outinp1=np.dot(hlayer_act,wout)outinp= outinp1+ bout

output = sigmoid(outinp)

#Backpropagation

EO = y-output

PITT 25
III B.Tech II Sem ML Lab Manual
outgrad = derivatives_sigmoid(output)d_output = EO*

outgrad EH = d_output.dot(wout.T)

hiddengrad = derivatives_sigmoid(hlayer_act)#how much hidden layer wtscontributed to error

d_hiddenlayer = EH * hiddengrad

wout += hlayer_act.T.dot(d_output) *lr# dotproduct of nextlayererror andcurrentlayerop

# bout += np.sum(d_output, axis=0,keepdims=True) *lrwh +=

X.T.dot(d_hiddenlayer) *lr #bh += np.sum(d_hiddenlayer, axis=0,keepdims=True)

*lrprint("Input: \n" + str(X)) print("Actual Output: \n" + str(y)) print("Predicted

Output: \n" ,output)

Input:

[[ 0.66666667 1. ]

[ 0.33333333 0.55555556]

[ 1. 0.66666667]]

Actual

Output:[[0.92] [

0.86]

[ 0.89]]

Predicted

Output:[[0.89559591] [

0.88142069]

[ 0.8928407 ]]

PITT 26
III B.Tech II Sem ML Lab Manual

Experiment-8:

Write a program to implement k-Nearest Neighbor algorithm to classify the iris data set.

Print both correct and wrong predictions.

Aim: To implement k-Nearest Neighbor algorithm

Program Code:

import csv import random

import math import operator

def loadDataset(filename, split, trainingSet=[] , testSet=[]):with open(filename, 'rb') as

csvfile: lines = csv.reader(csvfile)dataset = list(lines)

for x in range(len(dataset)-1):for y in range(4):

dataset[x][y] = float(dataset[x][y])

if random.random() < split:

trainingSet.append(dataset[x])else:

testSet.append(dataset[x])

def euclideanDistance(instance1, instance2, length):distance = 0

for x in range(length):

distance += pow((instance1[x] - instance2[x]), 2)return

math.sqrt(distance) def getNeighbors(trainingSet, testInstance, k):

distances = []

length = len(testInstance)-1 for x in range(len(trainingSet)):

dist = euclideanDistance(testInstance, trainingSet[x], length)distances.append((trainingSet[x], dist))
distances.sort(key=operator.itemgetter(1))neighbors = []
for x in range(k):

neighbors.append(distances[x][0])return

neighbors def

getResponse(neighbors):classVotes = {}

for x in range(len(neighbors)): response = neighbors[x][-1]if response in classVotes:

classVotes[response] += 1

PITT 27
III B.Tech II Sem ML Lab Manual
else:

classVotes[response] = 1

sortedVotes = sorted(classVotes.iteritems(),reverse=True)

return sortedVotes[0][0]

def getAccuracy(testSet, predictions): correct = 0 for x in

range(len(testSet)):

key=operator.itemgetter(1),
if testSet[x][-1] == predictions[x]:correct += 1 return (correct/float(len(testSet))) * 100.0 def main():
# prepare data trainingSet=[] testSet=[]split = 0.67

loadDataset('knndat.data', split, trainingSet, testSet) print('Train set: ' +

repr(len(trainingSet))) print('Test set: ' + repr(len(testSet)))

# generate predictions

predictions=[]k=3 for x in

range(len(testSet)):

neighbors = getNeighbors(trainingSet, testSet[x],k) result = getResponse(neighbors)

predictions.append(result)

print('> predicted=' + repr(result) + ', actual=' + repr(testSet[x][-1])) accuracy =

getAccuracy(testSet, predictions)

print('Accuracy: ' + repr(accuracy) +'%') main()

OUTPUT

Confusion matrix is as follows

[[11 0 0]

[0 9 1]

[0 1 8]]

Accuracy metrics0 1.00 1.00 1.00 11

1 0.90 0.90 0.90 10

2 0.89 0.89 0,89 9

Avg/Total 0.93 0.93 0.93 30

PITT 28
III B.Tech II Sem ML Lab Manual
Experiment – 9:

Implement the non-parametric Locally Weighted Regression algorithm in orderto fit

data points. Select appropriate data set for your experiment and drawgraphs.

Aim: Demonstration of -parametric Locally Weighted Regression

algorithm Program Code

from numpy import *import operator

from os import listdirimport matplotlib

import matplotlib.pyplot as pltimport pandas as

pd import numpy as np1 import numpy.linalg as

np from scipy.stats.stats import pearsonr

def kernel(point,x mat, k):

m, n = np1.shape(xmat)

weights = np1.mat(np1.eye((m)))

for j in range(m):

diff = point - X[j]

weights[j,j] = np1.exp(diffdiff.T/(-2.0k**2))return weights

def localWeight(point,xmat,ymat,k):wei = kernel(point,xmat,k)

W=(X.T*(wei*X)).I*(X.T*(wei*ymat.T)) return W

def localWeightRegression(xmat,ymat,k):m,n = np1.shape(xmat) ypred = np1.zeros(m)for i in range(m):

ypred[i] = xmat[i]*localWeight(xmat[i],xmat,ymat,k)return

ypred # load data points

data = pd.read_csv('data10.csv')bill = np1.array(data.total_bill) tip =

np1.array(data.tip) #preparing and add 1 in billmbill = np1.mat(bill)

mtip = np1.mat(tip)

m= np1.shape(mbill)[1]

one = np1.mat(np1.ones(m))

PITT 29
III B.Tech II Sem ML Lab Manual
X= np1.hstack((one.T,mbill.T)) #set k here

ypred = localWeightRegression(X,mtip,2)

SortIndex = X[:,1].argsort(0)xsort = X[SortIndex][:,0]

Output

PITT 30
III B.Tech II Sem ML Lab Manual

Experiment-10:
Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model
to perform this task. Built-in Java classes/API can be used to write the program. Calculate the
accuracy,precision, and recall for your data set
Aim: classification of set of documents using Naive Bayesian classification
Program code
import pandas as pd
msg=pd.read_csv('naivetext1.csv',names=['message','label'])
print('The dimensions of the dataset',msg.shape)
msg['labelnum']=msg.label.map({'pos':1,'neg':0})
X=msg.messagey=msg.labelnumprint(X)
print(y)
#splitting the dataset into train and test data from
sklearn.model_selection import train_test_split
xtrain,xtest,ytrain,ytest=train_test_split(X,y)
print(xtest.shape)
print(xtrain.shape) print(ytest.shape) print(ytrain.shape)
#output of count vectoriser is a sparse matrix
from sklearn.feature_extraction.text
import CountVectorizercount_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm=count_vect.transform(xtest)
print(count_vect.get_feature_names())
df=pd.DataFrame(xtrain_dtm.toarray(),columns=count_vect.get_feature_names())
print(df)
#tabular representation
print(xtrain_dtm)
#sparse matrix representation
# Training Naive Bayes (NB) classifier on training data
from sklearn.naive_bayes import MultinomialNB clf== MultinomialNB().fit(xtrain_dtm,ytrain)

predicted = clf.predict(xtest_dtm)
#printing accuracy metrics
from sklearn import metricsprint('Accuracy metrics')

PITT 31
III B.Tech II Sem ML Lab Manual
print('Accuracy of the classifer is',metrics.accuracy_score(ytest,predicted))
print('Confusion matrix')
print(metrics.confusion_matrix(ytest,predicted))
print('Recall and Precison ')
print(metrics.recall_score(ytest,predicted))
print(metrics.precision_score(ytest,predicted))
'''docs_new = ['I like this place', 'My boss is not my saviour']

X_new_counts = count_vect.transform(docs_new)predictednew = clf.predict(X_new_counts)

for doc, category in zip(docs_new, predictednew):
print('%s->%s' % (doc, msg.labelnum[category]))'''
I love this sandwich,pos This is an amazing place,pos
I feel very good about these beers,posThis is my best work,pos
What an awesome view,pos
I do not like this restaurant,negI am tired of this stuff,neg
I can't deal with this,neg He is my sworn enemy,negMy boss is horrible,neg
This is an awesome place,pos
I do not like the taste of this juice,negI love to dance,pos
I am sick and tired of this place,negWhat a great holiday,pos
That is a bad locality to stay,neg
We will have good fun tomorrow,posI went to my enemy's house today,neg

OUTPUT:-

['about', 'am', 'amazing', 'an', 'and', 'awesome', 'beers', 'best', 'boss', 'can', 'deal',
'do', 'enemy', 'feel', 'fun', 'good', 'have', 'horrible', 'house', 'is', 'like', 'love', 'my',
'not', 'of', 'place', 'restaurant', 'sandwich', 'sick', 'stuff', 'these', 'this', 'tired', 'to',
'today', 'tomorrow', 'very', 'view', 'we', 'went', 'what', 'will', 'with', 'work']about am amazing an and
awesome beers best boss can ... today \
0 10 0 0 0 0 1 0 0 0 ... 0
1 00 0 0 0 0 0 1 0 0 ... 0
2 00 1100 0 0 0 0 ... 0
3 00 0 0 0 0 0 0 0 0 ... 1
4 00 0 0 0 0 0 0 0 0 ... 0
5 01 001 0 0 0 0 0 ... 0
6 00 0 0 0 0 0 0 0 1 ... 0
7 00 0 0 0 0 0 0 0 0 ... 0
8 01 0 0 0 0 0 0 0 0 ... 0
9 00 0 1 0 1 0 0 0 0 ... 0
10 0 0 0 0 0 0 0 0 0 0 ... 0
11 0 0 0 0 0 0 0 0 1 0 ... 0
12 0 0 0 1 0 1 0 0 0 0 ... 0

PITT 32
III B.Tech II Sem ML Lab Manual

Experiment-11:
Apply EM algorithm to cluster a Heart Disease Data Set. Use the same data set for clustering
usingkMeans algorithm. Compare the results of these two algorithms and comment on the
quality of clustering. You can add Java/Python ML library classes/API in the program.
Aim: Implementation of EM algorithm to cluster a Heart Disease Data Set
Program Code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets.samples_generator
import make_blobsX, y_true = make_blobs(n_samples=100, centers =
4,Cluster_std=0.60,random_state=0)
X = X[:, ::-1]
#flip axes for better plotting
from sklearn.mixture import GaussianMixture
gmm = GaussianMixture (n_components = 4).fit(X)lables = gmm.predict(X)
plt.scatter(X[:, 0], X[:, 1], c=labels, s=40, cmap=‟viridis‟);probs = gmm.predict_proba(X)
print(probs[:5].round(3))
size = 50 * probs.max(1) ** 2
# square emphasizes differences
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap=‟viridis‟, s=size);
from matplotlib.patches import Ellipse
def draw_ellipse(position, covariance, ax=None, **kwargs);
“””Draw an ellipse with a given position and covariance”””
Ax= ax or plt.gca()

# Convert covariance to principal axes

if covariance.shape ==(2,2):
U, s, Vt = np.linalg.svd(covariance)
Angle = np.degrees(np.arctan2(U[1, 0], U[0,0]))Width, height = 2 * np.sqrt(s)
else:
angle = 0
width, height = 2 * np.sqrt(covariance)
#Draw the Ellipse
for nsig in range(1,4):
ax.add_patch(Ellipse(position, nsig * width, nsig *height,angle, **kwargs))
def plot_gmm(gmm, X, label=True, ax=None):ax = ax or plt.gca()
labels = gmm.fit(X).predict(X)if label:
ax.scatter(X[:, 0], x[:, 1], c=labels, s=40, cmap=‟viridis‟, zorder=2)else:
ax.scatter(X[:, 0], x[:, 1], s=40, zorder=2)ax.axis(„equal‟)
w_factor = 0.2 / gmm.weights_.max()
for pos, covar, w in zip(gmm.means_, gmm.covariances_, gmm.weights_):draw_ellipse(pos, covar,
alpha=w * w_factor)
gmm = GaussianMixture(n_components=4, random_state=42)plot_gmm(gmm, X)

PITT 33
III B.Tech II Sem ML Lab Manual
gmm = GaussianMixture(n_components=4, covariance_type=‟full‟,random_state=42)
plot_gmm(gmm, X)

Output

[[1 ,0, 0, 0]
[0 ,0, 1, 0]
[1 ,0, 0, 0]
[1 ,0, 0, 0]
[1 ,0, 0, 0]]

K MEANS :
from sklearn.cluster import KMeans
#from sklearn import metricsimport numpy as np
import matplotlib.pyplot as plt
import pandas as pd
data=pd.read_csv("kmeansdata.csv")
df1=pd.DataFrame(data)
print(df1)
f1 = df1['Distance_Feature'].valuesf2 = df1['Speeding_Feature'].values
X=np.matrix(list(zip(f1,f2)))plt.plot()
plt.xlim([0, 100])
plt.ylim([0, 50]) plt.title('Dataset') plt.ylabel('speeding_feature')plt.xlabel('Distance_Feature')
plt.scatter(f1,f2)
plt.show()
# create new plot and data
plt.plot()
colors = ['b', 'g', 'r']
markers = ['o', 'v', 's']
# KMeans algorithm#K = 3
kmeans_model = KMeans(n_clusters=3).fit(X)
plt.plot()
for i, l in enumerate(kmeans_model.labels_):
plt.plot(f1[i], f2[i], color=colors[l], marker=markers[l],ls='None')plt.xlim([0, 100])
plt.ylim([0, 50])plt.show()
Driver_ID,Distance_Feature,Speeding_Feature
3423311935,71.24,28
3423313212,52.53,25
3423313724,64.54,27
3423311373,55.69,22
3423310999,54.58,25
3423313857,41.91,10
3423312432,58.64,20
3423311434,52.02,8
3423311328,31.25,34
3423312488,44.31,19
3423311254,49.35,40

PITT 34
III B.Tech II Sem ML Lab Manual
3423312943,58.07,45
3423312536,44.22,22
3423311542,55.73,19
3423312176,46.63,43
3423314176,52.97,32
3423314202,46.25,35
3423311346,51.55,27
3423310666,57.05,26
3423313527,58.45,30
3423312182,43.42,23
3423313590,55.68,37
3423312268,55.15,18

PITT 35
III B.Tech II Sem ML Lab Manual

Experiment -12

Aim: Exploratory data analysis for classification using pandas and Matplotlib
Dataset: tae.csv- The data consist of evaluations of teaching performance over three regular
semesters and two summer semesters of 151 teaching assistant (TA) assignments at the Statistics
Department of the University of Wisconsin-Madison. The scores were divided into 3 roughly equal-
sized categories ("low", "medium", and "high") to form the class variable. The data set is collected
from https://archive.ics.uci.edu/ml/datasets/Teaching+Assistant+Evaluation
Program code:
import pandas as pd
import matplotlib.pyplot as plt
print(‘pandas version is’, pd. version )
data = pd.read_csv(r"E:\sudhakar\tae.csv",header=None)
col_names=['native_speaker','instructor','course','semester','class_size','score']
data.columns=col_names
print('Data type of target variable is:',data['score'].dtype)
print('Converting target variable data type to categorical')
data['score']=data['score'].astype('category')
print('Afrer conversion, data type of target variable is:',data['score'].dtype)
print('Dimesnions of the data set:')
print(data.shape)
print('The first 5 rows of the data set are:')
print(data.head())
print('The last 5 rows of the data set are:')
print(data.tail())
print('Randomly selected 5 rows of the data set are:')
print(data.sample(5))
print('The columns of the data set are:')
print(data.columns.tolist())
print('Names and data types of attributes are:')
print(data.dtypes)
print('Converting native_speaker data type to categorical')
data['native_speaker']=data['native_speaker'].astype('category')

PITT 36
III B.Tech II Sem ML Lab Manual
print('After conversion,Names and data types of attributes are:')
print(data.dtypes)
print('Information of the data set attributes:')
print(data.info())
print('Statistics of the numerical attributes of the data set are:')
print(data.describe())
print('Statistics of the all attributes of the data set are:')
print(data.describe(include='all'))
print('Corelation matrix of the numerical attributes of the data set is:')
corr=data.corr()
print(corr)
print('Distribution of the target variable is:')
print(data['score'].value_counts())
print('Target class distrubtion w.r.t \'native_speaker\' attribute')
print(pd.crosstab(data.native_speaker,data.score))
print('Target class distrubtion w.r.t \'native_speaker\' attribute')
print(pd.crosstab(data.native_speaker,data.score,normalize='index'))
print('Target class distrubtion w.r.t \'native_speaker\' attribute using groupby')
data.groupby('native_speaker').score.value_counts()
print('Checking for null values:')
print(data.isnull().sum())
data.dropna(subset=['instructor'],axis=0,inplace=True)
print('After removal rows with null values in column \'instructor\'')
print(data.isnull().sum())
print('Unique values in the column named \'score\'')
print(data['score'].unique())
data.plot(kind='scatter',x='semester',y='class_size',color='red')
print('Number of distinct courses semester wise')
data.groupby('semester')['course'].nunique().plot(kind='bar')
print('Frequency of values in column \'semester\'')
data[['semester']].plot(kind='hist')
data.plot(kind='bar',x='semester',y='course',color='red')
ax = plt.gca()#gca means get current axes
data.plot(kind='line',x='semester',y='class_size',ax=ax)
PITT 37
III B.Tech II Sem ML Lab Manual

Output screen shots:

PITT 38
III B.Tech II Sem ML Lab Manual

PITT 39
III B.Tech II Sem ML Lab Manual

PITT 40
III B.Tech II Sem ML Lab Manual

PITT 41
III B.Tech II Sem ML Lab Manual

PITT Page 42
III B.Tech II Sem ML Lab Manual

Experiment -13:

Write a program to construct a Bayesian network considering medical data. Use this
model to demonstrate the diagnosis of heart patients using standard Heart Disease
Data Set.
import bayespy as bp
import numpy as np
import csv from colorama
import init from colorama
import Fore, Back, Style init()

# Define Parameter Enum values

#Age
ageEnum = {'SuperSeniorCitizen':0, 'SeniorCitizen':1, 'MiddleAged':2, 'Youth':3, 'Teen':4}
# Gender
genderEnum = {'Male':0, 'Female':1}
# FamilyHistory
familyHistoryEnum = {'Yes':0, 'No':1}
# Diet(Calorie Intake)
dietEnum = {'High':0, 'Medium':1, 'Low':2}
# LifeStyle
lifeStyleEnum = {'Athlete':0, 'Active':1, 'Moderate':2, 'Sedetary':3}
# Cholesterol
cholesterolEnum = {'High':0, 'BorderLine':1, 'Normal':2}
# HeartDisease
heartDiseaseEnum = {'Yes':0,'No':1}
#heart_disease_data.csv

with open('heart_disease_data.csv')as csvfile:

lines = csv.reader(csvfile)
dataset= list(lines) data = []
for x in dataset:

data.append([ageEnum[x[0]],genderEnum[x[1]],familyHistoryEnum[x[2]],dietEnum[x[3]],lifeSt
yleEn um[x[4]],cholesterolEnum[x[5]],heartDiseaseEnum[x[6]]])
# Training data for machine learning todo: should import
from csv data = np.array(data)
N = len(data)

# Input data column assignment

p_age=bp.nodes.Dirichlet(1.0*np.nes(5))
age = bp.nodes.Categorical(p_age, plates=(N,)) age.observe(data[:,0])
p_gender=bp.nodes.Dirichlet(1.0*np.one(2))
gender = bp.nodes.Categorical(p_gender, plates=(N,))
gender.observe(data[:,1])

PITT Page 43
III B.Tech II Sem ML Lab Manual

p_familyhistory = bp.nodes.Dirichlet(1.0*np.ones(2))
familyhistory = bp.nodes.Categorical(p_familyhistory, plates=(N,))
familyhistory.observe(data[:,2])
p_diet = bp.nodes.Dirichlet(1.0*np.ones(3))
diet = bp.nodes.Categorical(p_diet, plates=(N,))
diet.observe(data[:,3])

p_lifestyle = bp.nodes.Dirichlet(1.0*np.ones(4))
lifestyle = bp.nodes.Categorical(p_lifestyle, plates=(N,)) lifestyle.observe(data[:,4])
p_cholesterol = bp.nodes.Dirichlet(1.0*np.ones(3))
cholesterol = bp.nodes.Categorical(p_cholesterol, plates=(N,))

C:\Anaconda3\lib\site-packages\bayespy\inference\vmp\nodes\categorical.py:107: FutureWarning:
Using a non-tuple sequence for multidimensional indexing is deprecated; use àrr[tuple(seq)]` instead
of àrr[seq]`. In the future this will be interpreted as an array index, àrr[np.array(seq)]`, which will
result either in an error or a different result.
u0[[np.arange(np.size(x)), np.ravel(x)]] = 1
cholesterol.observe(data[:,5])

# Prepare nodes and establish edges

# np.ones(2) -> HeartDisease has 2 options Yes/No
# plates(5, 2, 2, 3, 4, 3) -> corresponds to options present for domain values
p_heartdisease = bp.nodes.Dirichlet(np.ones(2), plates=(5, 2, 2, 3, 4, 3))
heartdisease = bp.nodes.MultiMixture([age, gender, familyhistory, diet, lifestyle, cholesterol],
bp.nodes.Categorical, p_heartdisease)
heartdisease.observe(data[:,6])
p_heartdisease.update()

# Sample Test with hardcoded values

#print("Sample Probability")
#print("Probability(HeartDisease|Age=SuperSeniorCitizen, Gender=Female,
FamilyHistory=Yes, DietIntake=Medium, LifeStyle=Sedetary, Cholesterol=High)")
#print(bp.nodes.MultiMixture([ageEnum['SuperSeniorCitizen'], genderEnum['Female'],
familyHistoryEnum['Yes'], dietEnum['Medium'], lifeStyleEnum['Sedetary'],
cholesterolEnum['High']], bp.nodes.Categorical,
p_heartdisease).get_moments()[0][heartDiseaseEnum['Yes']])

# Interactive Test
m=0
while m == 0:
print("\n")

res = bp.nodes.MultiMixture([int(input('Enter Age: ' + str(ageEnum))), int(input('Enter

Gender: ' + str(genderEnum))), int(input('Enter FamilyHistory: ' + str(familyHistoryEnum))),
int(input('Enter dietEnum: ' + str(dietEnum))), int(input('Enter LifeStyle: ' +
str(lifeStyleEnum))), int(input('Enter Cholesterol: ' + str(cholesterolEnum)))],

PITT Page 44
III B.Tech II Sem ML Lab Manual

bp.nodes.Categorical, p_heartdisease).get_moments()[0][heartDiseaseEnum['Yes']]
print("Probability(HeartDisease) = " + str(res))
#print(Style.RESET_ALL)
m = int(input("Enter for Continue:0, Exit :1 "))

OUTPUT:

Enter Age: {'SuperSeniorCitizen': 0, 'SeniorCitizen': 1, 'MiddleAged': 2, 'Youth': 3, 'Teen': 4}1

Enter Gender: {'Male': 0, 'Female': 1}0
Enter FamilyHistory: {'Yes': 0, 'No': 1}0
Enter dietEnum: {'High': 0, 'Medium': 1, 'Low': 2}2
Enter LifeStyle: {'Athlete': 0, 'Active': 1, 'Moderate': 2, 'Sedetary': 3}2
Enter Cholesterol: {'High': 0, 'BorderLine': 1, 'Normal': 2}1
C:\Anaconda3\lib\site-packages\bayespy\inference\vmp\nodes\categorical.py:43: FutureWarning:
Using a non-tuple sequence for multidimensional indexing is deprecated; use àrr[tuple(seq)]` instead
of àrr[seq]`. In the future this will be interpreted as an array index, àrr[np.array(seq)]`, which will
result either in an error or a different result.
u0[[np.arange(np.size(x)), np.ravel(x)]] = 1
Probability(HeartDisease) = 0.5
Enter for Continue:0, Exit :1 1

PITT Page 45
III B.Tech II Sem ML Lab Manual

Experiment -14:

Write a program to implement Support

Vector Machines Aim:
To implement Support Vector Machines
Dataset: haberman.csv- The dataset contains cases from a study that was conducted
between 1958 and 1970 at the University of Chicago's Billings Hospital on the survival of
patients who had undergone surgery for breast cancer. The goal is to predict the Survival
status (class attribute) of the patient(1 = the patient survived 5 years or longer,2 = the
patient died within 5 years). The data set is
collected from https://archive.ics.uci.edu/ml/datasets/Haberman's+Survival.

Program code:
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
from sklearn.metrics import recall_score
from sklearn.metrics import precision_score
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
data = pd.read_csv(r"E:\sudhakar\haberman.csv", header=None)
#age=age of the patient
#year=Patient's year of operation (year - 1900)
#pos_axil_nodes=Number of positive axillary nodes detected
#survival_status:
1 -the patient survived 5 years or longer
#:2 -the patient died within 5 year col_names=['age','year','pos_axil_nodes','survival_status']
data.columns=col_names
#we removed the attribute year of operation data=data.drop(['year'], axis=1)
print('The first 5 rows of the data set are:') print(data.head())
dim=data.shape
print('Dimensions of the data set are',dim)
print('Statistics of the data are:')
print(data.describe())
print('Correlation matrix of the dataset is:')
print(data.corr())
class_lbls=data['survival_status'].unique() class_labels=[]
for x in class_lbls:
class_labels.append(
str(x))
print('Class labels are:')
print(class_labels) sns.countplot(data['survival_status']) col_names=data.columns
feature_names=col_names[:-1]
feature_names=list(feature_names)

PITT Page 46
III B.Tech II Sem ML Lab Manual

print('Feature names are:')

print(feature_names)
x_set = data.drop(['survival_status'], axis=1)
print('First 5 rows of features set are:')
print(x_set.head())
y_set=data['survival_status']
print('First 5 rows of target variable are:')
print(y_set.head())
print('Distribution of Target variable is:')
print(y_set.value_counts())
scaler=StandardScaler()
x_train,x_test, y_train, y_test = train_test_split(x_set,y_set,
test_size = 0.3) scaler.fit(x_train)
x_train=scaler.transform(x_train) model =SVC()
print("Traning the model with train data set")model.fit(x_train, y_train)
x_test=scaler.transform(x_test) y_pred=model.predict(x_test)
print('Predicted class labels for test data are:')
print(y_pred)
print("Accuracy:",accuracy_score(y_test, y_pred))
print("Precision:",precision_score(y_test, y_pred))
print("Recall:",recall_score(y_test, y_pred))
print(classification_report(y_test,y_pred,target_names=cl
ass_labels)) cm=confusion_matrix(y_test,y_pred)
df_cm = pd.DataFrame(cm, columns=class_labels, index =
class_labels) df_cm.index.name = 'Actual'
df_cm.columns.name = 'Predicted' sns.set(font_scale=1.5)
sns.heatmap(df_cm, annot=True,cmap="Blues",fmt='d')
plt.scatter(x_train[:, 0], x_train[:, 1], c=y_train, s=30,
cmap=plt.cm.Paired) plt.xlabel('age')
plt.ylabel('pos_axil_nodes') plt.title('Data points in traning data set')
plt.scatter(x_train[:, 0], x_train[:, 1], c=y_train, s=30,
cmap=plt.cm.Paired) plt.xlabel('age')
plt.ylabel('pos_axil_nodes')
plt.title('support vectors and decision boundary') ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()
# create grid to evaluate model
xx = np.linspace(xlim[0], xlim[1], 30)
yy = np.linspace(ylim[0], ylim[1], 30)
YY, XX = np.meshgrid(yy, xx)
xy = np.vstack([XX.ravel(), YY.ravel()]).T
Z = model.decision_function(xy).reshape(XX.shape)
ax.contour(XX, YY, Z, colors='red', levels=[-1, 0, 1],
alpha=0.5,
linestyles=['--', '-', '--']) # plot support vectors
ax.scatter(model.support_vectors_[:, 0], model.support_vectors_[:, 1], s=30,
facecolors='green')

PITT Page 47
III B.Tech II Sem ML Lab Manual

plt.show()
Output screen shots:

PITT Page 48
III B.Tech II Sem ML Lab Manual

PITT Page 49
III B.Tech II Sem ML Lab Manual
Experiment -15:

Write a program to implement principle component analysis

import numpy as nmp

import matplotlib.pyplot as mpltl

import pandas as pnd

DS = pnd.read_csv('Wine.csv')

# Now, we will distribute the dataset into two components "X" and "Y"

X = DS.iloc[: , 0:13].values
Y = DS.iloc[: , 13].values
from sklearn.model_selection import train_test_split as tts

X_train, X_test, Y_train, Y_test = tts(X, Y, test_size = 0.2, random_state = 0)

from sklearn.preprocessing import StandardScaler as SS

SC = SS()
X_train = SC.fit_transform(X_train)
X_test = SC.transform(X_test)
from sklearn.decomposition import PCA

PCa = PCA (n_components = 1)

X_train = PCa.fit_transform(X_train)
X_test = PCa.transform(X_test)
explained_variance = PCa.explained_variance_ratio_

from sklearn.linear_model import LogisticRegression as LR

classifier_1 = LR (random_state = 0)
classifier_1.fit(X_train, Y_train)

Output:

LogisticRegression(random_state=0)

PITT Page 50

Material Handling Study Report For Removable Tube Bundle
No ratings yet
Material Handling Study Report For Removable Tube Bundle
9 pages
Cisco Systems: Managing The Go-To-Market Evolution
No ratings yet
Cisco Systems: Managing The Go-To-Market Evolution
14 pages
Advanced C++ Interview Questions You'll Most Likely Be Asked
From Everand
Advanced C++ Interview Questions You'll Most Likely Be Asked
Vibrant Publishers
No ratings yet
BIM Standard Infrastructure en
No ratings yet
BIM Standard Infrastructure en
63 pages
PMx-300 Rhino User Manual
No ratings yet
PMx-300 Rhino User Manual
125 pages
ML Lab Manual R20 1
No ratings yet
ML Lab Manual R20 1
63 pages
ML-LAB-MANUAL-R20-1
No ratings yet
ML-LAB-MANUAL-R20-1
63 pages
Machine Learning Lab
No ratings yet
Machine Learning Lab
31 pages
B.TECH Machine Learning-Lab
No ratings yet
B.TECH Machine Learning-Lab
99 pages
Machine Learning Lab Mannual CS 601
No ratings yet
Machine Learning Lab Mannual CS 601
30 pages
IT 804
No ratings yet
IT 804
33 pages
original ML lab manual (1)
No ratings yet
original ML lab manual (1)
22 pages
ml lab(print copy)
No ratings yet
ml lab(print copy)
23 pages
Ml_Lab_Manual
No ratings yet
Ml_Lab_Manual
70 pages
Edited - Edited - Final ML Lab Manual Version11
No ratings yet
Edited - Edited - Final ML Lab Manual Version11
83 pages
6CS4-22 Machine Learning Lab
No ratings yet
6CS4-22 Machine Learning Lab
30 pages
Machine Learning Lab Manual
No ratings yet
Machine Learning Lab Manual
30 pages
ML RECORD NEW FORMAT
No ratings yet
ML RECORD NEW FORMAT
48 pages
ML NEW Final Format
No ratings yet
ML NEW Final Format
37 pages
Jntuk R20 ML
No ratings yet
Jntuk R20 ML
43 pages
ML - LAB Record
No ratings yet
ML - LAB Record
36 pages
22K61A0618_removed_lab manual sasi cld
No ratings yet
22K61A0618_removed_lab manual sasi cld
25 pages
Machine Learning Lab Manual
No ratings yet
Machine Learning Lab Manual
43 pages
ML-LAB-MANUAL-R20
No ratings yet
ML-LAB-MANUAL-R20
77 pages
My ML Lab Manual
No ratings yet
My ML Lab Manual
21 pages
ML Lab Manual
No ratings yet
ML Lab Manual
36 pages
Machine Learning Lab Manual
No ratings yet
Machine Learning Lab Manual
26 pages
ML_LAB Record_final
No ratings yet
ML_LAB Record_final
39 pages
Practical 7 Thsem
No ratings yet
Practical 7 Thsem
50 pages
Machine Learning Techniques Lab: Session: 2023-24, Even Semester
No ratings yet
Machine Learning Techniques Lab: Session: 2023-24, Even Semester
20 pages
IT ML Lab
No ratings yet
IT ML Lab
35 pages
Machine Learning Laboratory 18CSL76: Institute of Technology and Management
No ratings yet
Machine Learning Laboratory 18CSL76: Institute of Technology and Management
49 pages
R20-21NM-III-I-ML-LAB MANUAL (1)
No ratings yet
R20-21NM-III-I-ML-LAB MANUAL (1)
38 pages
ML Record
No ratings yet
ML Record
18 pages
Shashidhar-18csl76 Final
No ratings yet
Shashidhar-18csl76 Final
19 pages
ML Lab R20
No ratings yet
ML Lab R20
42 pages
ML Lab
No ratings yet
ML Lab
49 pages
ML Experiments
No ratings yet
ML Experiments
22 pages
15CSL76
No ratings yet
15CSL76
35 pages
ML Lab Manual
No ratings yet
ML Lab Manual
26 pages
csml1819 PDF
No ratings yet
csml1819 PDF
36 pages
Ad3461 Ml Lab Manual
100% (1)
Ad3461 Ml Lab Manual
54 pages
24CSPC212-PIC Lab Manual
No ratings yet
24CSPC212-PIC Lab Manual
45 pages
FINAL LAB PROGRAMS (2)
No ratings yet
FINAL LAB PROGRAMS (2)
52 pages
ML Lab Manual
No ratings yet
ML Lab Manual
28 pages
AD3461-Machine Learning Lab Manual
No ratings yet
AD3461-Machine Learning Lab Manual
26 pages
lab_program-2
No ratings yet
lab_program-2
4 pages
Lab Manual ML
No ratings yet
Lab Manual ML
28 pages
Cognitive Science Manual
No ratings yet
Cognitive Science Manual
17 pages
22K61A0654_2_sasi_auto
No ratings yet
22K61A0654_2_sasi_auto
24 pages
ML Lab Manual-99
No ratings yet
ML Lab Manual-99
23 pages
C1 W2 Lab05 Sklearn GD Soln
No ratings yet
C1 W2 Lab05 Sklearn GD Soln
3 pages
C2W3_Lab_01_Model_Evaluation_and_Selection
No ratings yet
C2W3_Lab_01_Model_Evaluation_and_Selection
21 pages
ML Lab Manual Devansh (1)
No ratings yet
ML Lab Manual Devansh (1)
57 pages
C2W3 Lab 01 Model Evaluation and Selection
No ratings yet
C2W3 Lab 01 Model Evaluation and Selection
21 pages
ML Lab Manual - Ex No. 1 To 9
No ratings yet
ML Lab Manual - Ex No. 1 To 9
26 pages
School of Engineering: Lab Manual On Machine Learning Lab
No ratings yet
School of Engineering: Lab Manual On Machine Learning Lab
23 pages
ML Manual
No ratings yet
ML Manual
34 pages
P06 The Classification Pipeline Ans
No ratings yet
P06 The Classification Pipeline Ans
16 pages
1.implement FIND-S Algorithm: Desription
No ratings yet
1.implement FIND-S Algorithm: Desription
19 pages
ML Lab Programs
No ratings yet
ML Lab Programs
18 pages
Python For Beginners
From Everand
Python For Beginners
Célio Azevedo
No ratings yet
Profound Python Data Science
From Everand
Profound Python Data Science
Onder Teker
No ratings yet
Amazing Java: Learn Java Quickly
From Everand
Amazing Java: Learn Java Quickly
Andrei Besedin
No ratings yet
DIY Water Quality Testing ESP8266 NodeMCU and TDS
No ratings yet
DIY Water Quality Testing ESP8266 NodeMCU and TDS
5 pages
Project Members: Underground Cable Fault Detection by Using GSM
No ratings yet
Project Members: Underground Cable Fault Detection by Using GSM
14 pages
Fivt Assignment-Lo-1 Time - 2 Hrs
No ratings yet
Fivt Assignment-Lo-1 Time - 2 Hrs
6 pages
Ex 2
No ratings yet
Ex 2
15 pages
Materials and Systems For Pre-Stressing
No ratings yet
Materials and Systems For Pre-Stressing
20 pages
Landscape Photography Case Study
No ratings yet
Landscape Photography Case Study
1 page
02 Handout 3
No ratings yet
02 Handout 3
8 pages
Us Android User Guide
No ratings yet
Us Android User Guide
15 pages
Edutap Computer Awareness Notes
No ratings yet
Edutap Computer Awareness Notes
104 pages
IECExBVS17.0089 Issue2
No ratings yet
IECExBVS17.0089 Issue2
5 pages
Iwe Ogun PDF
No ratings yet
Iwe Ogun PDF
4 pages
En tg200727 Uk0yr1215d
No ratings yet
En tg200727 Uk0yr1215d
8 pages
1pluschem, LLC (Presentation)
No ratings yet
1pluschem, LLC (Presentation)
12 pages
ECE-R09 Syllabus, 26.07.2012
No ratings yet
ECE-R09 Syllabus, 26.07.2012
85 pages
Major Security Challenges in 5G Network
No ratings yet
Major Security Challenges in 5G Network
8 pages
Graph 110 - Fall 2018 - Greg Kammer
No ratings yet
Graph 110 - Fall 2018 - Greg Kammer
3 pages
Delhi NCR Based Companies List 1680421970
100% (1)
Delhi NCR Based Companies List 1680421970
3 pages
STM32C011x4/x6: Arm Cortex - M0+ 32-Bit MCU, 32 KB Flash, 6 KB RAM, 2 X USART, Timers, ADC, Comm. I/Fs, 2-3.6 V
No ratings yet
STM32C011x4/x6: Arm Cortex - M0+ 32-Bit MCU, 32 KB Flash, 6 KB RAM, 2 X USART, Timers, ADC, Comm. I/Fs, 2-3.6 V
92 pages
Bacharuddin Jusuf Habibie Known As BJ
No ratings yet
Bacharuddin Jusuf Habibie Known As BJ
2 pages
AXIUM DX8000: Smart Payment and Commerce Solution For In-Store Performance
No ratings yet
AXIUM DX8000: Smart Payment and Commerce Solution For In-Store Performance
2 pages
Untitled
No ratings yet
Untitled
4 pages
2013 Smart RK11 Autorefra
No ratings yet
2013 Smart RK11 Autorefra
3 pages
Electra x2s
No ratings yet
Electra x2s
2 pages
Oodp Unit 2
No ratings yet
Oodp Unit 2
100 pages
ANT AQU4518R24v06 Datasheet
No ratings yet
ANT AQU4518R24v06 Datasheet
2 pages
JavaScript Allonge The Six Edition
No ratings yet
JavaScript Allonge The Six Edition
297 pages