ML Lab Manual (1-9)
EX.NO: 1
CANDIDATE - ELIMINATION ALGORITHM
DATE:
AIM:
To implement and demonstrate the Candidate-Elimination algorithm, outputting a description of the set of all hypotheses consistent with the training examples stored in a given .CSV file.
ALGORITHM:
Initialize G to the set of maximally general hypotheses and S to the set of maximally specific hypotheses. For each training example d, do:
• If d is a positive example
• Remove from G any hypothesis inconsistent with d
• For each hypothesis s in S that is not consistent with d
• Remove s from S
• Add to S all minimal generalizations h of s that are consistent with d and for which some member of G is more general than h
• Remove from S any hypothesis that is more general than another hypothesis in S
• If d is a negative example
• Remove from S any hypothesis inconsistent with d
• For each hypothesis g in G that is not consistent with d
• Remove g from G
• Add to G all minimal specializations h of g that are consistent with d and for which some member of S is more specific than h
• Remove from G any hypothesis that is less general than another hypothesis in G
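To make the update rules concrete, here is a short hand trace of the first two steps. The row values are an assumption based on the usual version of the EnjoySport data, not taken from this manual:

# After a first positive example <sunny, warm, normal, strong, warm, same>,
# S jumps from the maximally specific hypothesis to that exact tuple,
# while G stays maximally general:
S = ['sunny', 'warm', 'normal', 'strong', 'warm', 'same']
G = [['?', '?', '?', '?', '?', '?']]
# A second positive example that differs only in humidity ('high')
# forces that position of S to generalize to '?':
S = ['sunny', 'warm', '?', 'strong', 'warm', 'same']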
PROGRAM:
import numpy as np
import pandas as pd

# Load the training data; the last column is the target concept
data = pd.read_csv('enjoysport.csv')
concepts = np.array(data.iloc[:, 0:-1])
print(concepts)
target = np.array(data.iloc[:, -1])
print(target)

# Initialize S to the first positive example and G to the maximally general boundary
specific_h = concepts[0].copy()
print(specific_h)
general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
print(general_h)

for i, h in enumerate(concepts):
    if target[i] == "yes":            # positive example: generalize S
        for x in range(len(specific_h)):
            if h[x] != specific_h[x]:
                specific_h[x] = '?'
                general_h[x][x] = '?'
    if target[i] == "no":             # negative example: specialize G
        for x in range(len(specific_h)):
            if h[x] != specific_h[x]:
                general_h[x][x] = specific_h[x]
            else:
                general_h[x][x] = '?'
    print(specific_h)
    print(general_h)

# Discard rows of general_h that remained fully general
indices = [i for i, val in enumerate(general_h) if val == ['?'] * len(specific_h)]
for i in indices:
    general_h.remove(['?'] * len(specific_h))

print("Final Specific_h:", specific_h)
print("Final General_h:", general_h)
DATA SET:
OUTPUT:
Final Specific_h:
Final General_h:
RESULT:
Thus the program to implement the Candidate-Elimination algorithm using the given dataset has been executed successfully and the output has been verified.
EX.NO: 2
DECISION TREE BASED ID3 ALGORITHM
DATE:
AIM:
To demonstrate the working of the decision tree based ID3 algorithm. Use an
appropriate data set for building the decision tree and apply this knowledge to classify a new
sample.
ALGORITHM:
1. Create a root node for the tree.
2. If all examples have the same class value, return the single-node tree with that class as its label.
3. Otherwise, compute the information gain of every attribute and choose the attribute A with the highest gain.
4. For each value of A, add a branch, partition the examples by that value, and recursively build the subtree on that partition with A removed from the attribute list.
5. If a partition is empty, attach a leaf labelled with the most common class among the parent's examples.
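Before reading the entropy() function in the program below, it may help to check the formula on a small hand example (the 9/5 class split here is an illustrative assumption, not this manual's dataset):

import math

# Entropy of a set with 9 positive and 5 negative examples:
# Entropy(S) = -(9/14)*log2(9/14) - (5/14)*log2(5/14) ~ 0.940
p_yes, p_no = 9/14, 5/14
print(-p_yes * math.log(p_yes, 2) - p_no * math.log(p_no, 2))   # ~0.940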
PROGRAM:
import math
import csv

def load_csv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    headers = dataset.pop(0)
    return dataset, headers

class Node:
    def __init__(self, attribute):
        self.attribute = attribute
        self.children = []
        self.answer = ""

def subtables(data, col, delete):
    # partition the rows of data by the values of column col
    dic = {}
    coldata = [row[col] for row in data]
    attr = list(set(coldata))
    counts = [0] * len(attr)
    r = len(data)
    c = len(data[0])
    for x in range(len(attr)):
        for y in range(r):
            if data[y][col] == attr[x]:
                counts[x] += 1
    for x in range(len(attr)):
        dic[attr[x]] = [[0 for i in range(c)] for j in range(counts[x])]
        pos = 0
        for y in range(r):
            if data[y][col] == attr[x]:
                if delete:
                    del data[y][col]
                dic[attr[x]][pos] = data[y]
                pos += 1
    return attr, dic

def entropy(S):
    attr = list(set(S))
    if len(attr) == 1:          # all examples have the same class
        return 0
    counts = [0, 0]
    for i in range(2):
        counts[i] = sum([1 for x in S if attr[i] == x]) / (len(S) * 1.0)
    sums = 0
    for cnt in counts:
        sums += -1 * cnt * math.log(cnt, 2)
    return sums

def compute_gain(data, col):
    attr, dic = subtables(data, col, delete=False)
    total_size = len(data)
    entropies = [0] * len(attr)
    ratio = [0] * len(attr)
    total_entropy = entropy([row[-1] for row in data])
    for x in range(len(attr)):
        ratio[x] = len(dic[attr[x]]) / (total_size * 1.0)
        entropies[x] = entropy([row[-1] for row in dic[attr[x]]])
        total_entropy -= ratio[x] * entropies[x]
    return total_entropy

def build_tree(data, features):
    lastcol = [row[-1] for row in data]
    if (len(set(lastcol))) == 1:    # pure node: return a leaf with the answer
        node = Node("")
        node.answer = lastcol[0]
        return node
    n = len(data[0]) - 1
    gains = [0] * n
    for col in range(n):
        gains[col] = compute_gain(data, col)
    split = gains.index(max(gains))
    node = Node(features[split])
    fea = features[:split] + features[split + 1:]
    attr, dic = subtables(data, split, delete=True)
    for x in range(len(attr)):
        child = build_tree(dic[attr[x]], fea)
        node.children.append((attr[x], child))
    return node

def print_tree(node, level):
    if node.answer != "":
        print(" " * level, node.answer)
        return
    print(" " * level, node.attribute)
    for value, n in node.children:
        print(" " * (level + 1), value)
        print_tree(n, level + 2)

def classify(node, x_test, features):
    if node.answer != "":
        print(node.answer)
        return
    pos = features.index(node.attribute)
    for value, n in node.children:
        if x_test[pos] == value:
            classify(n, x_test, features)

'''Main program'''
dataset, features = load_csv("data3.csv")
node1 = build_tree(dataset, features)
print("The decision tree for the dataset using ID3 algorithm is")
print_tree(node1, 0)
testdata, features = load_csv("data3_test.csv")
for xtest in testdata:
    print("The test instance:", xtest)
    print("The label for test instance:", end=" ")
    classify(node1, xtest, features)
DATA SET:
OUTPUT:
Outlook
 rain
  Wind
   strong
    no
   weak
    yes
 overcast
  yes
 sunny
  Humidity
   normal
    yes
   high
    no
RESULT:
Thus the program to implement the decision tree based ID3 algorithm using the given dataset has been executed successfully and the output has been verified.
EX.NO: 3
BACKPROPAGATION ALGORITHM
DATE:
AIM:
To build an Artificial Neural Network by implementing the Backpropagation algorithm and to test the same using an appropriate data set.
ALGORITHM:
• Create a feed-forward network with n_in inputs, n_hidden hidden units, and n_out output units.
• Initialize all network weights to small random numbers.
• Until the termination condition is met, do:
• For each (x, t) in the training examples, do:
• Normalize the input x and propagate it forward through the network, computing the output of every unit.
• Propagate the errors backward: compute the error term of each output unit, then of each hidden unit.
• Update each network weight using the error terms and the learning rate.
PROGRAM :
import numpy as np

X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X / np.amax(X, axis=0)   # normalize input features column-wise
y = y / 100

#Sigmoid Function and its derivative
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def derivatives_sigmoid(x):
    return x * (1 - x)

#Variable initialization
epoch = 5000                 # number of training iterations
lr = 0.1                     # learning rate
inputlayer_neurons = 2
hiddenlayer_neurons = 3
output_neurons = 1

wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))

for i in range(epoch):
    #Forward Propagation
    hinp1 = np.dot(X, wh)
    hinp = hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1 = np.dot(hlayer_act, wout)
    outinp = outinp1 + bout
    output = sigmoid(outinp)
    #Backpropagation
    EO = y - output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act)
    d_hiddenlayer = EH * hiddengrad
    wout += hlayer_act.T.dot(d_output) * lr
    wh += X.T.dot(d_hiddenlayer) * lr

print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n", output)
OUTPUT:
Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output:
[[0.92]
[0.86]
[0.89]]
Predicted Output:
[[0.89726759]
[0.87196896]
[0.9000671]]
RESULT:
Thus the program to implement the Backpropagation algorithm using the given dataset has been executed successfully and the output has been verified.
EX.NO: 4
NAÏVE BAYESIAN CLASSIFIER
DATE:
AIM:
To implement the naïve Bayesian classifier for a sample training data set stored as a
.CSV file.
ALGORITHM:
The data set used in this program is the Pima Indians Diabetes data set. It comprises 768 observations of medical details for Pima Indian patients. The records describe instantaneous measurements taken from each patient, such as age, number of pregnancies, and blood workup. All patients are women aged 21 or older. All attributes are numeric, and their units vary from attribute to attribute.
The attributes are Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI, DiabetesPedigreeFunction, Age and Outcome.
Each record has a class value that indicates whether the patient suffered an onset of diabetes within 5 years of when the measurements were taken (1) or not (0).
PROGRAM:
import csv
import random
import math

def loadcsv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    for i in range(len(dataset)):
        # convert every attribute from string to float
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset

def splitdataset(dataset, splitratio):
    trainsize = int(len(dataset) * splitratio)
    trainset = []
    copy = list(dataset)
    while len(trainset) < trainsize:
        # generate indices for the dataset list randomly to pick elements for training data
        index = random.randrange(len(copy))
        trainset.append(copy.pop(index))
    return [trainset, copy]

def separatebyclass(dataset):
    separated = {}
    for i in range(len(dataset)):
        vector = dataset[i]
        if vector[-1] not in separated:
            separated[vector[-1]] = []
        separated[vector[-1]].append(vector)
    return separated

def mean(numbers):
    return sum(numbers) / float(len(numbers))

def stdev(numbers):
    avg = mean(numbers)
    variance = sum([pow(x - avg, 2) for x in numbers]) / float(len(numbers) - 1)
    return math.sqrt(variance)

def summarize(dataset):
    summaries = [(mean(attribute), stdev(attribute)) for attribute in zip(*dataset)]
    del summaries[-1]          # drop the summary of the class column
    return summaries

def summarizebyclass(dataset):
    separated = separatebyclass(dataset)
    #print(separated)
    summaries = {}
    for classvalue, instances in separated.items():
        summaries[classvalue] = summarize(instances)
    return summaries

def calculateprobability(x, mean, stdev):
    # Gaussian probability density function
    exponent = math.exp(-(math.pow(x - mean, 2) / (2 * math.pow(stdev, 2))))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

def calculateclassprobabilities(summaries, inputvector):
    probabilities = {}
    for classvalue, classsummaries in summaries.items():
        probabilities[classvalue] = 1
        for i in range(len(classsummaries)):
            m, s = classsummaries[i]
            probabilities[classvalue] *= calculateprobability(inputvector[i], m, s)
    return probabilities

def predict(summaries, inputvector):
    # the class with the highest probability is passed back as the prediction
    probabilities = calculateclassprobabilities(summaries, inputvector)
    bestLabel, bestProb = None, -1
    for classvalue, probability in probabilities.items():
        if bestLabel is None or probability > bestProb:
            bestProb = probability
            bestLabel = classvalue
    return bestLabel

def getpredictions(summaries, testset):
    predictions = []
    for i in range(len(testset)):
        predictions.append(predict(summaries, testset[i]))
    return predictions

def getaccuracy(testset, predictions):
    correct = 0
    for i in range(len(testset)):
        if testset[i][-1] == predictions[i]:
            correct += 1
    return (correct / float(len(testset))) * 100.0

def main():
    filename = 'naivedata.csv'
    splitratio = 0.67
    dataset = loadcsv(filename)
    trainingset, testset = splitdataset(dataset, splitratio)
    print('Split {0} rows into train={1} and test={2} rows'.format(len(dataset), len(trainingset), len(testset)))
    # prepare model
    summaries = summarizebyclass(trainingset)
    #print(summaries)
    # test model
    predictions = getpredictions(summaries, testset)
    print('Accuracy of the classifier is: {0}%'.format(getaccuracy(testset, predictions)))

main()
OUTPUT:
RESULT:
Thus the program to implement the naïve Bayesian classifier using the given dataset has been executed successfully and the output has been verified.
EX.NO: 5
NAÏVE BAYESIAN CLASSIFIER USING ACCURACY, PRECISION AND RECALL
DATE:
AIM:
To implement the naïve Bayesian classifier model to classify a set of text documents, using built-in classes/APIs, and to calculate the accuracy, precision, and recall for the data set.
ALGORITHM:
1. Collect all words, punctuation, and other tokens that occur in Examples.
• Vocabulary ← the set of all distinct words and other tokens occurring in any text document from Examples
2. Calculate the required P(vj) and P(wk|vj) probability terms. For each target value vj, do:
• docsj ← the subset of documents from Examples for which the target value is vj
• P(vj) ← |docsj| / |Examples|
• Textj ← a single document created by concatenating all members of docsj
• n ← the total number of word positions in Textj
• for each word wk in Vocabulary: nk ← the number of times wk occurs in Textj, and P(wk|vj) ← (nk + 1) / (n + |Vocabulary|), as checked numerically below
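A quick numeric check of the smoothed estimate above (the counts here are hypothetical, chosen only for illustration):

# Suppose the word "love" occurs nk = 3 times among the n = 20 word
# positions of all 'pos' documents, and |Vocabulary| = 40 (hypothetical counts).
nk, n, vocab = 3, 20, 40
print((nk + 1) / (n + vocab))   # P('love'|pos) = 4/60 ~ 0.0667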
PROGRAM:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics

msg = pd.read_csv('naivetext.csv', names=['message', 'label'])
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum
print(X)
print(y)

# Split the data and build the document-term matrix
xtrain, xtest, ytrain, ytest = train_test_split(X, y)
count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm = count_vect.transform(xtest)
print(count_vect.get_feature_names_out())   # get_feature_names() in older scikit-learn
df = pd.DataFrame(xtrain_dtm.toarray(), columns=count_vect.get_feature_names_out())

# Train the multinomial naive Bayes classifier and evaluate it
clf = MultinomialNB().fit(xtrain_dtm, ytrain)
predicted = clf.predict(xtest_dtm)
print('\n Accuracy of the classifier is', metrics.accuracy_score(ytest, predicted))
print('\n Confusion matrix')
print(metrics.confusion_matrix(ytest, predicted))
print('\n The value of Precision', metrics.precision_score(ytest, predicted))
print('\n The value of Recall', metrics.recall_score(ytest, predicted))
DATA SET:
OUTPUT:
8 He is my sworn enemy
9 My boss is horrible
12 I love to dance
0     1
1     1
2     1
3     1
4     1
5     0
6     0
7     0
8     0
9     0
10    1
11    0
12    1
13    0
14    1
15    0
16    1
17    0
['about', 'am', 'amazing', 'an', 'and', 'awesome', 'beers', 'best', 'can', 'deal', 'do', 'enemy', 'feel',
'fun', 'good', 'great', 'have', 'he', 'holiday', 'house', 'is', 'like', 'love', 'my', 'not', 'of', 'place',
'restaurant', 'sandwich', 'sick', 'sworn', 'these', 'this', 'tired', 'to', 'today', 'tomorrow', 'very',
Confusion matrix
[[2 1]
[0 2]]
RESULT:
Thus the program to implement the naïve Bayesian classifier with accuracy, precision, and recall using the given dataset has been executed successfully and the output has been verified.
EX.NO: 6
BAYESIAN NETWORK
DATE:
AIM:
To construct a Bayesian network to demonstrate the diagnosis of heart patients, using the standard Heart Disease data set.
ALGORITHM:
1. Read the heart disease data set and encode each attribute value as an integer.
2. For each attribute, define a Dirichlet prior and an observed Categorical node over the data.
3. Model heartdisease as a categorical variable conditioned on all the attribute nodes, with a Dirichlet prior over its conditional probability table.
4. Observe the recorded heartdisease values and update the posterior.
5. Query the network with user-supplied attribute values to obtain the probability of heart disease.
PROGRAM:
import bayespy as bp
import numpy as np
import csv
from colorama import init
from colorama import Fore, Back, Style
init()
ageEnum = {'SuperSeniorCitizen':0, 'SeniorCitizen':1, 'MiddleAged':2, 'Youth':3, 'Teen':4}
genderEnum = {'Male':0, 'Female':1}
familyHistoryEnum = {'Yes':0, 'No':1}
dietEnum = {'High':0, 'Medium':1, 'Low':2}
lifeStyleEnum = {'Athlete':0, 'Active':1, 'Moderate':2, 'Sedetary':3}
cholesterolEnum = {'High':0, 'BorderLine':1, 'Normal':2}
heartDiseaseEnum = {'Yes':0, 'No':1}
with open('heart_disease_data.csv') as csvfile:
    lines = csv.reader(csvfile)
    dataset = list(lines)
data = []
for x in dataset:
    data.append([ageEnum[x[0]], genderEnum[x[1]], familyHistoryEnum[x[2]], dietEnum[x[3]], lifeStyleEnum[x[4]], cholesterolEnum[x[5]], heartDiseaseEnum[x[6]]])
data = np.array(data)
N = len(data)
p_age = bp.nodes.Dirichlet(1.0*np.ones(5))
age = bp.nodes.Categorical(p_age, plates=(N,))
age.observe(data[:,0])
p_gender = bp.nodes.Dirichlet(1.0*np.ones(2))
gender = bp.nodes.Categorical(p_gender, plates=(N,))
gender.observe(data[:,1])
p_familyhistory = bp.nodes.Dirichlet(1.0*np.ones(2))
familyhistory = bp.nodes.Categorical(p_familyhistory, plates=(N,))
familyhistory.observe(data[:,2])
p_diet = bp.nodes.Dirichlet(1.0*np.ones(3))
diet = bp.nodes.Categorical(p_diet, plates=(N,))
diet.observe(data[:,3])
p_lifestyle = bp.nodes.Dirichlet(1.0*np.ones(4))
lifestyle = bp.nodes.Categorical(p_lifestyle, plates=(N,))
lifestyle.observe(data[:,4])
p_cholesterol = bp.nodes.Dirichlet(1.0*np.ones(3))
cholesterol = bp.nodes.Categorical(p_cholesterol, plates=(N,))
cholesterol.observe(data[:,5])
p_heartdisease = bp.nodes.Dirichlet(np.ones(2), plates=(5, 2, 2, 3, 4, 3))
heartdisease = bp.nodes.MultiMixture([age, gender, familyhistory, diet, lifestyle, cholesterol], bp.nodes.Categorical, p_heartdisease)
heartdisease.observe(data[:,6])
p_heartdisease.update()
m=0
while m == 0:
print("\n")
res = bp.nodes.MultiMixture([int(input('Enter Age: ' str(ageEnum))),
int(input('Enter Gender: ' + str(genderEnum))), int(input('Enter FamilyHistory: ' +
str(familyHistoryEnum))), int(input('Enter dietEnum: ' + str(dietEnum))),
int(input('Enter LifeStyle: ' + str(lifeStyleEnum))), int(input('Enter Cholesterol: ' +
str(cholesterolEnum)))], bp.nodes.Categorical,
p_heartdisease).get_moments()[0][heartDiseaseEnum['Yes']]
print("Probability(HeartDisease) = " + str(res))
m = int(input("Enter for Continue:0, Exit :1 "))
OUTPUT:
RESULT:
Thus the program to construct a Bayesian network for the diagnosis of heart patients using the given dataset has been executed successfully and the output has been verified.
EX.NO: 7
EM ALGORITHM FOR CLUSTERING
DATE:
AIM:
To implement the EM algorithm for clustering using the given dataset, and to compare its grouping with that of the k-Means algorithm.
ALGORITHM:
Step 1: Initialize the parameters θ (the mixture weights, means and covariances) randomly.
Step 2: E-step: using the current θ, compute for every data point the expected (posterior) probability of its membership in each cluster.
Step 3: M-step: re-estimate θ so as to maximize the expected log-likelihood computed in the E-step.
Step 4: Repeat the E-step and M-step until the log-likelihood converges to get θ.
PROGRAM:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn import preprocessing

dataset = load_iris()
# print(dataset)
X = pd.DataFrame(dataset.data)
X.columns = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width']
y = pd.DataFrame(dataset.target)
y.columns = ['Targets']
# print(X)

plt.figure(figsize=(14, 7))
colormap = np.array(['red', 'lime', 'black'])

# REAL PLOT
plt.subplot(1, 3, 1)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y.Targets], s=40)
plt.title('Real')

# K-MEANS PLOT
plt.subplot(1, 3, 2)
model = KMeans(n_clusters=3)
model.fit(X)
predY = np.choose(model.labels_, [0, 1, 2]).astype(np.int64)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[predY], s=40)
plt.title('KMeans')

# GMM PLOT: standardize the features, then fit a 3-component Gaussian mixture
scaler = preprocessing.StandardScaler()
scaler.fit(X)
xsa = scaler.transform(X)
xs = pd.DataFrame(xsa, columns=X.columns)
gmm = GaussianMixture(n_components=3)
gmm.fit(xs)
y_cluster_gmm = gmm.predict(xs)
plt.subplot(1, 3, 3)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y_cluster_gmm], s=40)
plt.title('GMM Classification')
plt.show()
OUTPUT:
RESULT:
Thus the program to implement the EM algorithm for clustering using the given dataset has been executed successfully and the output has been verified.
EX.NO: 8
K-NEAREST NEIGHBOUR ALGORITHM
DATE:
AIM:
To implement the k-Nearest Neighbour algorithm for the given dataset.
ALGORITHM:
1. Load the data set and split it into training and test sets.
2. Choose the number of neighbours k.
3. For each test instance, compute its distance (e.g. Euclidean) to every training instance.
4. Select the k training instances closest to the test instance.
5. Assign the test instance the class that occurs most frequently among those k neighbours (majority vote), as sketched below.
6. Report the accuracy of the predictions.
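The scikit-learn program below hides the distance computation and the vote; a minimal sketch of what happens internally (illustrative names only, not the library's API) might look like:

import numpy as np
from collections import Counter

def knn_predict(x_train, y_train, x, k=6):
    dists = np.linalg.norm(x_train - x, axis=1)            # Euclidean distance to every training point
    nearest = np.argsort(dists)[:k]                        # indices of the k closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]  # majority vote among their labels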
PROGRAM:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

# Load the iris data and hold out 20% for testing
iris = datasets.load_iris()
iris_data = iris.data
iris_labels = iris.target
x_train, x_test, y_train, y_test = train_test_split(iris_data, iris_labels, test_size=0.20)
classifier = KNeighborsClassifier(n_neighbors=6)
classifier.fit(x_train, y_train)
y_pred = classifier.predict(x_test)
print("accuracy is")
print(classification_report(y_test, y_pred))
OUTPUT:
accuracy is
              precision    recall  f1-score   support

    accuracy                           0.97        30
   macro avg       0.96      0.98      0.97        30
weighted avg       0.97      0.97      0.97        30
RESULT:
Thus the program to implement the k-Nearest Neighbour algorithm using the given dataset has been executed successfully and the output has been verified.
EX.NO: 9
LOCALLY WEIGHTED REGRESSION ALGORITHM
DATE:
AIM:
To implement the non-parametric Locally Weighted Regression algorithm in order to fit data points, selecting an appropriate data set.
ALGORITHM:
Step 1: Read the given data sample to X and the curve (linear or non-linear) to Y.
Step 2: Set the value of the smoothening parameter or free parameter, say τ.
Step 3: Set the bias / point of interest x0, which is a subset of X.
Step 4: Determine the weight matrix using:
        w(x, x0) = exp( -(x - x0)² / (2τ²) )
Step 5: Determine the value of the model parameter β using:
        β(x0) = (XᵀWX)⁻¹ XᵀWy
Step 6: The prediction at x0 is x0 · β(x0).
PROGRAM:
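The program listing is missing here, so the following is a minimal sketch of locally weighted regression following the steps above. It assumes a 'tips.csv' file with 'total_bill' and 'tip' columns (a common choice for this exercise); any one-dimensional x/y data would do.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def kernel(point, xmat, tau):
    # Step 4: Gaussian weights w(x, x0) = exp(-(x - x0)^2 / (2*tau^2))
    m = np.shape(xmat)[0]
    weights = np.mat(np.eye(m))
    for j in range(m):
        diff = point - xmat[j]
        weights[j, j] = np.exp(diff * diff.T / (-2.0 * tau ** 2))
    return weights

def localweight(point, xmat, ymat, tau):
    # Step 5: beta(x0) = (X^T W X)^-1 X^T W y
    wei = kernel(point, xmat, tau)
    return (xmat.T * (wei * xmat)).I * (xmat.T * (wei * ymat.T))

def localweightregression(xmat, ymat, tau):
    # Step 6: prediction at each point is x0 . beta(x0)
    m = np.shape(xmat)[0]
    ypred = np.zeros(m)
    for i in range(m):
        ypred[i] = xmat[i] * localweight(xmat[i], xmat, ymat, tau)
    return ypred

data = pd.read_csv('tips.csv')          # assumed dataset
bill = np.array(data.total_bill)
tip = np.array(data.tip)
mbill = np.mat(bill)                    # 1 x m
mtip = np.mat(tip)                      # 1 x m
m = np.shape(mbill)[1]
one = np.mat(np.ones(m))
X = np.hstack((one.T, mbill.T))         # m x 2 design matrix with bias column

ypred = localweightregression(X, mtip, tau=0.5)
sortindex = np.argsort(np.array(X[:, 1]).flatten())
xsort = np.array(X[sortindex][:, 1]).flatten()
plt.scatter(bill, tip, color='green')
plt.plot(xsort, ypred[sortindex], color='red', linewidth=2)
plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.show()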
OUTPUT:
RESULT:
Thus the program to implement the Locally Weighted Regression algorithm using the given dataset has been executed successfully and the output has been verified.