ML Lab Manual (1-9)

EX.NO: 1
CANDIDATE - ELIMINATION ALGORITHM
DATE:

AIM:

To implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.

ALGORITHM:

Initialize G to the set of maximally general hypotheses in H

Initialize S to the set of maximally specific hypotheses in H

For each training example d, do

• If d is a positive example

• Remove from G any hypothesis inconsistent with d

• For each hypothesis s in S that is not consistent with d

• Remove s from S

• Add to S all minimal generalizations h of s such that

• h is consistent with d, and some member of G is more general than h

• Remove from S any hypothesis that is more general than another hypothesis
in S

• If d is a negative example

• Remove from S any hypothesis inconsistent with d

• For each hypothesis g in G that is not consistent with d

• Remove g from G

• Add to G all minimal specializations h of g such that

• h is consistent with d, and some member of S is more specific than h

• Remove from G any hypothesis that is less general than another hypothesis
in G
PROGRAM:

import numpy as np
import pandas as pd

# read the training examples; the last column is the target concept (yes/no)
data = pd.DataFrame(data=pd.read_csv('enjoysport.csv'))
concepts = np.array(data.iloc[:, 0:-1])
print(concepts)
target = np.array(data.iloc[:, -1])
print(target)

def learn(concepts, target):
    # start with the first positive example as the most specific hypothesis
    specific_h = concepts[0].copy()
    print("initialization of specific_h and general_h")
    print(specific_h)
    # start with the most general hypothesis boundary
    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    print(general_h)

    for i, h in enumerate(concepts):
        if target[i] == "yes":
            # generalize specific_h just enough to cover the positive example
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
            print(specific_h)
        if target[i] == "no":
            # specialize general_h to exclude the negative example
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
        print(" steps of Candidate Elimination Algorithm", i + 1)
        print(specific_h)
        print(general_h)

    # drop the fully general rows left over in general_h
    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
    for i in indices:
        general_h.remove(['?', '?', '?', '?', '?', '?'])
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("Final Specific_h:", s_final, sep="\n")
print("Final General_h:", g_final, sep="\n")

DATA SET:

Sky AirTemp Humidity Wind Water Forecast EnjoySport

sunny warm normal strong warm same yes

sunny warm high strong warm same yes

rainy cold high strong warm change no

sunny warm high strong cool change yes
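
To run the program above, the table can be written to enjoysport.csv with a few lines of Python. This is a minimal helper, not part of the original listing; the lowercase header names are an assumption (the program only relies on the last column holding yes/no):

import csv

rows = [
    ["sky", "airtemp", "humidity", "wind", "water", "forecast", "enjoysport"],
    ["sunny", "warm", "normal", "strong", "warm", "same", "yes"],
    ["sunny", "warm", "high", "strong", "warm", "same", "yes"],
    ["rainy", "cold", "high", "strong", "warm", "change", "no"],
    ["sunny", "warm", "high", "strong", "cool", "change", "yes"],
]
with open("enjoysport.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)  # header row plus one row per training example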

OUTPUT:

Final Specific_h:

['sunny' 'warm' '?' 'strong' '?' '?']

Final General_h:

[['sunny', '?', '?', '?', '?', '?'],

['?', 'warm', '?', '?', '?', '?']]


RESULT:

Thus the program to implement the Candidate-Elimination algorithm using the given dataset has been executed successfully and the output verified.
EX.NO: 2
DECISION TREE BASED ID3 ALGORITHM
DATE:

AIM:

To demonstrate the working of the decision tree based ID3 algorithm. Use an
appropriate data set for building the decision tree and apply this knowledge to classify a new
sample.

ALGORITHM :

 Create a Root node for the tree


 If all Examples are positive, Return the single-node tree Root, with label = +
 If all Examples are negative, Return the single-node tree Root, with label = -
 If Attributes is empty, Return the single-node tree Root, with label = most common
value of Target_attribute in Examples
 Otherwise Begin
 A ← the attribute from Attributes that best classifies Examples (highest information gain; see the formulas below)
 The decision attribute for Root ← A
 For each possible value, vi, of A,
 Add a new tree branch below Root, corresponding to the test A = vi
 Let Examples vi, be the subset of Examples that have value vi for A
 If Examples vi , is empty
 Then below this new branch add a leaf node with label = most common value of
Target _ attribute in Examples
 Else below this new branch add the subtree ID3(Examples_vi, Target_attribute,
Attributes – {A})
 End
 Return Root
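
For reference (not part of the original listing), "best classifies" is measured by information gain, which is what entropy() and compute_gain() in the program below compute:

    Entropy(S) = - Σ_i p_i · log2(p_i)
    Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

For example, for the full PlayTennis data set (9 Yes, 5 No), Entropy(S) = -(9/14)·log2(9/14) - (5/14)·log2(5/14) ≈ 0.940, and the attribute with the largest Gain becomes the root.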

PROGRAM :

import math
import csv

def load_csv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    headers = dataset.pop(0)   # first row holds the attribute names
    return dataset, headers

class Node:
    def __init__(self, attribute):
        self.attribute = attribute
        self.children = []
        self.answer = ""

def subtables(data, col, delete):
    # split the data into sub-tables, one per value of the attribute in column col
    dic = {}
    coldata = [row[col] for row in data]
    attr = list(set(coldata))
    counts = [0] * len(attr)
    r = len(data)
    c = len(data[0])
    for x in range(len(attr)):
        for y in range(r):
            if data[y][col] == attr[x]:
                counts[x] += 1
    for x in range(len(attr)):
        dic[attr[x]] = [[0 for i in range(c)] for j in range(counts[x])]
        pos = 0
        for y in range(r):
            if data[y][col] == attr[x]:
                if delete:
                    del data[y][col]
                dic[attr[x]][pos] = data[y]
                pos += 1
    return attr, dic

def entropy(S):
    # entropy of a list of class labels (assumes a binary target)
    attr = list(set(S))
    if len(attr) == 1:
        return 0
    counts = [0, 0]
    for i in range(2):
        counts[i] = sum([1 for x in S if attr[i] == x]) / (len(S) * 1.0)
    sums = 0
    for cnt in counts:
        sums += -1 * cnt * math.log(cnt, 2)
    return sums

def compute_gain(data, col):
    # information gain of the attribute in column col
    attr, dic = subtables(data, col, delete=False)
    total_size = len(data)
    entropies = [0] * len(attr)
    ratio = [0] * len(attr)
    total_entropy = entropy([row[-1] for row in data])
    for x in range(len(attr)):
        ratio[x] = len(dic[attr[x]]) / (total_size * 1.0)
        entropies[x] = entropy([row[-1] for row in dic[attr[x]]])
        total_entropy -= ratio[x] * entropies[x]
    return total_entropy

def build_tree(data, features):
    lastcol = [row[-1] for row in data]
    if len(set(lastcol)) == 1:
        # all examples have the same label: return a leaf node
        node = Node("")
        node.answer = lastcol[0]
        return node
    n = len(data[0]) - 1
    gains = [0] * n
    for col in range(n):
        gains[col] = compute_gain(data, col)
    split = gains.index(max(gains))      # attribute with the highest gain
    node = Node(features[split])
    fea = features[:split] + features[split + 1:]
    attr, dic = subtables(data, split, delete=True)
    for x in range(len(attr)):
        child = build_tree(dic[attr[x]], fea)
        node.children.append((attr[x], child))
    return node

def print_tree(node, level):
    if node.answer != "":
        print(" " * level, node.answer)
        return
    print(" " * level, node.attribute)
    for value, n in node.children:
        print(" " * (level + 1), value)
        print_tree(n, level + 2)

def classify(node, x_test, features):
    if node.answer != "":
        print(node.answer)
        return
    pos = features.index(node.attribute)
    for value, n in node.children:
        if x_test[pos] == value:
            classify(n, x_test, features)

'''Main program'''
dataset, features = load_csv("data3.csv")
node1 = build_tree(dataset, features)
print("The decision tree for the dataset using ID3 algorithm is")
print_tree(node1, 0)
testdata, features = load_csv("data3_test.csv")
for xtest in testdata:
    print("The test instance:", xtest)
    print("The label for test instance:", end=" ")
    classify(node1, xtest, features)

DATA SET:

Day Outlook Temperature Humidity Wind PlayTennis

D1 Sunny Hot High Weak No

D2 Sunny Hot High Strong No

D3 Overcast Hot High Weak Yes

D4 Rain Mild High Weak Yes

D5 Rain Cool Normal Weak Yes

D6 Rain Cool Normal Strong No

D7 Overcast Cool Normal Strong Yes

D8 Sunny Mild High Weak No

D9 Sunny Cool Normal Weak Yes


D10 Rain Mild Normal Weak Yes

D11 Sunny Mild Normal Strong Yes

D12 Overcast Mild High Strong Yes

D13 Overcast Hot Normal Weak Yes

D14 Rain Mild High Strong No

OUTPUT:

The decision tree for the dataset using ID3 algorithm is

Outlook
  rain
    Wind
      strong
        no
      weak
        yes
  overcast
    yes
  sunny
    Humidity
      normal
        yes
      high
        no

The test instance: ['rain', 'cool', 'normal', 'strong']

The label for test instance: no

The test instance: ['sunny', 'mild', 'normal', 'strong']

The label for test instance: yes


RESULT:

Thus the program to implement the decision tree based ID3 algorithm using the given dataset has been executed successfully and the output verified.
EX.NO: 3
BACKPROPAGATION ALGORITHM
DATE:

AIM :

To build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.

ALGORITHM :

 Create a feed-forward network with n_in inputs, n_hidden hidden units, and n_out output units.
 Initialize all network weights to small random numbers.
 Until the termination condition is met, Do
 For each (x⃗, t) in training examples, Do
 Normalize the input, propagate it forward, and update the weights using the rules shown below.
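
For reference (not part of the original text), the weight-update rules for a sigmoid network that the loop body applies, and that the program below implements, are:

    For each output unit k:   δ_k = o_k · (1 - o_k) · (t_k - o_k)
    For each hidden unit h:   δ_h = o_h · (1 - o_h) · Σ_k w_kh · δ_k
    For each weight:          w_ji ← w_ji + η · δ_j · x_ji

where o is a unit's output, t the target, and η the learning rate (lr in the code).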

PROGRAM :

import numpy as np

X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X / np.amax(X, axis=0)   # normalize each input feature column-wise
y = y / 100                  # scale the target marks to [0, 1]

# Sigmoid activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of sigmoid (x is already the sigmoid output)
def derivatives_sigmoid(x):
    return x * (1 - x)

# Variable initialization
epoch = 5000                # number of training iterations
lr = 0.1                    # learning rate
inputlayer_neurons = 2      # number of features in the data set
hiddenlayer_neurons = 3     # number of hidden layer neurons
output_neurons = 1          # number of neurons at the output layer

# Weight and bias initialization: uniform random values of dimension x*y
wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))

for i in range(epoch):
    # Forward propagation
    hinp1 = np.dot(X, wh)
    hinp = hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1 = np.dot(hlayer_act, wout)
    outinp = outinp1 + bout
    output = sigmoid(outinp)

    # Backpropagation
    EO = y - output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)            # how much the hidden layer contributed to the error
    hiddengrad = derivatives_sigmoid(hlayer_act)
    d_hiddenlayer = EH * hiddengrad

    # Weight updates: dot product of layer activation and next-layer error
    wout += hlayer_act.T.dot(d_output) * lr
    wh += X.T.dot(d_hiddenlayer) * lr
    # Note: the bias terms bh and bout are left unchanged in this simplified version.

print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n", output)

OUTPUT:

Input:

[[0.66666667 1. ]

[0.33333333 0.55555556]

[1. 0.66666667]]

Actual Output:

[[0.92]

[0.86]

[0.89]]

Predicted Output:

[[0.89726759]

[0.87196896]

[0.9000671]]

RESULT:

Thus the program to implement the Backpropagation algorithm using the given dataset has been executed successfully and the output verified.
EX.NO: 4
NAÏVE BAYESIAN CLASSIFIER
DATE:

AIM:

To implement the naïve Bayesian classifier for a sample training data set stored as a
.CSV file.

ALGORITHM :

 The data set used in this program is the Pima Indians Diabetes problem.
 This data set comprises 768 observations of medical details for Pima Indians patients. The records describe instantaneous measurements taken from the patient, such as age, the number of times pregnant, and blood workup. All patients are women aged 21 or older. All attributes are numeric, and their units vary from attribute to attribute.
 The attributes are Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI, DiabetesPedigreeFunction, Age, Outcome.
 Each record has a class value that indicates whether the patient suffered an onset of diabetes within 5 years of when the measurements were taken (1) or not (0). The Gaussian density used for these numeric attributes is given below.
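
Since all attributes are numeric, the program below models each attribute per class with a Gaussian distribution; calculateprobability evaluates

    P(x | class) = 1 / (√(2π) · σ) · exp( -(x - μ)² / (2σ²) )

where μ and σ are the per-class mean and standard deviation of that attribute, and the class score is obtained by multiplying these densities over all attributes (the naïve independence assumption).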

PROGRAM:

import csv
import random
import math

def loadcsv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    for i in range(len(dataset)):
        # converting strings into numbers for processing
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset

def splitdataset(dataset, splitratio):
    # 67% training size
    trainsize = int(len(dataset) * splitratio)
    trainset = []
    copy = list(dataset)
    while len(trainset) < trainsize:
        # pick random indices from the dataset for the training data
        index = random.randrange(len(copy))
        trainset.append(copy.pop(index))
    return [trainset, copy]

def separatebyclass(dataset):
    # creates a dictionary of classes 1 and 0 where the values are
    # the instances belonging to each class
    separated = {}
    for i in range(len(dataset)):
        vector = dataset[i]
        if vector[-1] not in separated:
            separated[vector[-1]] = []
        separated[vector[-1]].append(vector)
    return separated

def mean(numbers):
    return sum(numbers) / float(len(numbers))

def stdev(numbers):
    avg = mean(numbers)
    variance = sum([pow(x - avg, 2) for x in numbers]) / float(len(numbers) - 1)
    return math.sqrt(variance)

def summarize(dataset):
    # (mean, stdev) for every attribute column
    summaries = [(mean(attribute), stdev(attribute)) for attribute in zip(*dataset)]
    del summaries[-1]   # exclude the class label column
    return summaries

def summarizebyclass(dataset):
    separated = separatebyclass(dataset)
    summaries = {}
    for classvalue, instances in separated.items():
        # summaries is a dict of (mean, stdev) tuples for each class value
        summaries[classvalue] = summarize(instances)
    return summaries

def calculateprobability(x, mean, stdev):
    # Gaussian (normal) probability density
    exponent = math.exp(-(math.pow(x - mean, 2) / (2 * math.pow(stdev, 2))))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

def calculateclassprobabilities(summaries, inputvector):
    # probabilities of every class for the given test instance
    probabilities = {}
    for classvalue, classsummaries in summaries.items():
        probabilities[classvalue] = 1
        for i in range(len(classsummaries)):
            mean, stdev = classsummaries[i]   # mean and stdev of attribute i for this class
            x = inputvector[i]
            probabilities[classvalue] *= calculateprobability(x, mean, stdev)
    return probabilities

def predict(summaries, inputvector):
    probabilities = calculateclassprobabilities(summaries, inputvector)
    bestLabel, bestProb = None, -1
    for classvalue, probability in probabilities.items():
        # assign the class that has the highest probability
        if bestLabel is None or probability > bestProb:
            bestProb = probability
            bestLabel = classvalue
    return bestLabel

def getpredictions(summaries, testset):
    predictions = []
    for i in range(len(testset)):
        result = predict(summaries, testset[i])
        predictions.append(result)
    return predictions

def getaccuracy(testset, predictions):
    correct = 0
    for i in range(len(testset)):
        if testset[i][-1] == predictions[i]:
            correct += 1
    return (correct / float(len(testset))) * 100.0

def main():
    filename = 'naivedata.csv'
    splitratio = 0.67
    dataset = loadcsv(filename)
    trainingset, testset = splitdataset(dataset, splitratio)
    print('Split {0} rows into train={1} and test={2} rows'.format(
        len(dataset), len(trainingset), len(testset)))
    # prepare model
    summaries = summarizebyclass(trainingset)
    # test model
    predictions = getpredictions(summaries, testset)
    accuracy = getaccuracy(testset, predictions)
    print('Accuracy of the classifier is : {0}%'.format(accuracy))

main()

OUTPUT:

Split 768 rows into train=514 and test=254 rows

Accuracy of the classifier is : 71.65354330708661%

RESULT:

Thus the program to implement the naïve Bayesian classifier using the given dataset has been executed successfully and the output verified.
EX.NO: 5
NAÏVE BAYESIAN CLASSIFIER USING ACCURACY, PRECISION, RECALL
DATE:

AIM:

To use the naïve Bayesian classifier model to classify a set of text documents. Built-in classes/APIs (here, scikit-learn in Python) can be used to write the program. Calculate the accuracy, precision, and recall for your data set.

ALGORITHM :

1. collect all words, punctuation, and other tokens that occur in Examples

• Vocabulary ← the set of all distinct words and other tokens occurring in
any text document from Examples

2. calculate the required P(vj) and P(wk|vj) probability terms

• For each target value vj in V do

• docsj ← the subset of documents from Examples for which the target value
is vj

• P(vj) ← | docsj | / |Examples|

• Textj ← a single document created by concatenating all members of docsj

• n ← total number of distinct word positions in Textj

• for each word wk in Vocabulary

• nk ← number of times word wk occurs in Textj

• P(wk|vj) ← ( nk + 1) / (n + | Vocabulary| )
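
As an illustrative example of the estimate in the last step (the counts here are hypothetical, while |Vocabulary| = 45 matches the token list shown in the output below): if the word "awesome" occurred nk = 2 times among the n = 30 word positions of the positive documents, then P(awesome | pos) = (2 + 1) / (30 + 45) = 0.04.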

PROGRAM:

import pandas as pd

msg = pd.read_csv('naivetext.csv', names=['message', 'label'])
print('The dimensions of the dataset', msg.shape)
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum
print(X)
print(y)

# splitting the dataset into train and test data
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(X, y)
print('\n The total number of Training Data :', ytrain.shape)
print('\n The total number of Test Data :', ytest.shape)

# output of the count vectoriser is a sparse matrix
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm = count_vect.transform(xtest)
print('\n The words or Tokens in the text documents \n')
print(count_vect.get_feature_names())   # in newer scikit-learn versions use get_feature_names_out()
df = pd.DataFrame(xtrain_dtm.toarray(), columns=count_vect.get_feature_names())

# Training a Multinomial Naive Bayes (NB) classifier on the training data
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB().fit(xtrain_dtm, ytrain)
predicted = clf.predict(xtest_dtm)

# printing accuracy, confusion matrix, precision and recall
from sklearn import metrics
print('\n Accuracy of the classifier is', metrics.accuracy_score(ytest, predicted))
print('\n Confusion matrix')
print(metrics.confusion_matrix(ytest, predicted))
print('\n The value of Precision', metrics.precision_score(ytest, predicted))
print('\n The value of Recall', metrics.recall_score(ytest, predicted))

DATA SET:

Text Documents Label

1 I love this sandwich pos

2 This is an amazing place pos

3 I feel very good about these beers pos

4 This is my best work pos

5 What an awesome view pos

6 I do not like this restaurant neg

7 I am tired of this stuff neg

8 I can't deal with this neg

9 He is my sworn enemy neg

10 My boss is horrible neg

11 This is an awesome place pos

12 I do not like the taste of this juice neg

13 I love to dance pos

14 I am sick and tired of this place neg

15 What a great holiday pos

16 That is a bad locality to stay neg

17 We will have good fun tomorrow pos


18 I went to my enemy's house today neg

OUTPUT:

The dimensions of the dataset (18, 2)

0 I love this sandwich

1 This is an amazing place

2 I feel very good about these beers

3 This is my best work

4 What an awesome view

5 I do not like this restaurant

6 I am tired of this stuff

7 I can't deal with this

8 He is my sworn enemy

9 My boss is horrible

10 This is an awesome place

11 I do not like the taste of this juice

12 I love to dance

13 I am sick and tired of this place

14 What a great holiday

15 That is a bad locality to stay

16 We will have good fun tomorrow

17 I went to my enemy's house today

Name: message, dtype: object

0     1
1     1
2     1
3     1
4     1
5     0
6     0
7     0
8     0
9     0
10    1
11    0
12    1
13    0
14    1
15    0
16    1
17    0

Name: labelnum, dtype: int64

The total number of Training Data: (13,)

The total number of Test Data: (5,)

The words or Tokens in the text documents

['about', 'am', 'amazing', 'an', 'and', 'awesome', 'beers', 'best', 'can', 'deal', 'do', 'enemy', 'feel',

'fun', 'good', 'great', 'have', 'he', 'holiday', 'house', 'is', 'like', 'love', 'my', 'not', 'of', 'place',

'restaurant', 'sandwich', 'sick', 'sworn', 'these', 'this', 'tired', 'to', 'today', 'tomorrow', 'very',

'view', 'we', 'went', 'what', 'will', 'with', 'work']


Accuracy of the classifier is 0.8

Confusion matrix

[[2 1]

[0 2]]

The value of Precision 0.6666666666666666

The value of Recall 1.0

RESULT:

Thus the program to implement the naïve Bayesian classifier and to compute accuracy, precision, and recall on the given dataset has been executed successfully and the output verified.
EX.NO: 6
BAYESIAN NETWORK
DATE:

AIM:
To construct a Bayesian network to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set.

ALGORITHM:

Step 1: Read the training dataset T;
Step 2: Calculate the mean and standard deviation of the predictor variables in each class;
Step 3: Repeat: calculate the probability of f_i using the Gaussian density equation in each class, until the probability of every predictor variable (f1, f2, f3, ..., fn) has been calculated;
Step 4: Calculate the likelihood for each class;
Step 5: Get the greatest likelihood.

PROGRAM:

import bayespy as bp
import numpy as np
import csv
from colorama import init
from colorama import Fore, Back, Style
init()

# attribute value encodings
ageEnum = {'SuperSeniorCitizen': 0, 'SeniorCitizen': 1, 'MiddleAged': 2, 'Youth': 3, 'Teen': 4}
genderEnum = {'Male': 0, 'Female': 1}
familyHistoryEnum = {'Yes': 0, 'No': 1}
dietEnum = {'High': 0, 'Medium': 1, 'Low': 2}
lifeStyleEnum = {'Athlete': 0, 'Active': 1, 'Moderate': 2, 'Sedetary': 3}
cholesterolEnum = {'High': 0, 'BorderLine': 1, 'Normal': 2}
heartDiseaseEnum = {'Yes': 0, 'No': 1}

with open('heart_disease_data.csv') as csvfile:
    lines = csv.reader(csvfile)
    dataset = list(lines)

data = []
for x in dataset:
    data.append([ageEnum[x[0]], genderEnum[x[1]], familyHistoryEnum[x[2]],
                 dietEnum[x[3]], lifeStyleEnum[x[4]], cholesterolEnum[x[5]],
                 heartDiseaseEnum[x[6]]])
data = np.array(data)
N = len(data)

# one Dirichlet/Categorical pair per parent variable, observed from the data
p_age = bp.nodes.Dirichlet(1.0*np.ones(5))
age = bp.nodes.Categorical(p_age, plates=(N,))
age.observe(data[:, 0])
p_gender = bp.nodes.Dirichlet(1.0*np.ones(2))
gender = bp.nodes.Categorical(p_gender, plates=(N,))
gender.observe(data[:, 1])
p_familyhistory = bp.nodes.Dirichlet(1.0*np.ones(2))
familyhistory = bp.nodes.Categorical(p_familyhistory, plates=(N,))
familyhistory.observe(data[:, 2])
p_diet = bp.nodes.Dirichlet(1.0*np.ones(3))
diet = bp.nodes.Categorical(p_diet, plates=(N,))
diet.observe(data[:, 3])
p_lifestyle = bp.nodes.Dirichlet(1.0*np.ones(4))
lifestyle = bp.nodes.Categorical(p_lifestyle, plates=(N,))
lifestyle.observe(data[:, 4])
p_cholesterol = bp.nodes.Dirichlet(1.0*np.ones(3))
cholesterol = bp.nodes.Categorical(p_cholesterol, plates=(N,))
cholesterol.observe(data[:, 5])

# heart disease is conditioned on all six parent variables
p_heartdisease = bp.nodes.Dirichlet(np.ones(2), plates=(5, 2, 2, 3, 4, 3))
heartdisease = bp.nodes.MultiMixture([age, gender, familyhistory, diet, lifestyle, cholesterol],
                                     bp.nodes.Categorical, p_heartdisease)
heartdisease.observe(data[:, 6])
p_heartdisease.update()

m = 0
while m == 0:
    print("\n")
    res = bp.nodes.MultiMixture([int(input('Enter Age: ' + str(ageEnum))),
                                 int(input('Enter Gender: ' + str(genderEnum))),
                                 int(input('Enter FamilyHistory: ' + str(familyHistoryEnum))),
                                 int(input('Enter dietEnum: ' + str(dietEnum))),
                                 int(input('Enter LifeStyle: ' + str(lifeStyleEnum))),
                                 int(input('Enter Cholesterol: ' + str(cholesterolEnum)))],
                                bp.nodes.Categorical,
                                p_heartdisease).get_moments()[0][heartDiseaseEnum['Yes']]
    print("Probability(HeartDisease) = " + str(res))
    m = int(input("Enter for Continue:0, Exit :1 "))
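
The listing appears to assume that heart_disease_data.csv has no header row and one row per patient, with the seven attributes in the order age, gender, family history, diet, lifestyle, cholesterol, heart disease, each written exactly as one of the enum keys above. An illustrative (made-up) row would be:

    SeniorCitizen,Male,Yes,Medium,Sedetary,High,Yes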

OUTPUT:

Enter Age: {'SuperSeniorCitizen': 0, 'SeniorCitizen': 1, 'MiddleAged': 2, 'Youth': 3, 'Teen': 4}1
Enter Gender: {'Male': 0, 'Female': 1}0
Enter FamilyHistory: {'Yes': 0, 'No': 1}0
Enter dietEnum: {'High': 0, 'Medium': 1, 'Low': 2}2
Enter LifeStyle: {'Athlete': 0, 'Active': 1, 'Moderate': 2, 'Sedetary': 3}2
Enter Cholesterol: {'High': 0, 'BorderLine': 1, 'Normal': 2}1
Probability(HeartDisease) = 0.5
Enter for Continue:0, Exit :1 1
RESULT:

Thus the program to implement a Bayesian network on the given heart disease dataset has been executed successfully and the output verified.
EX.NO: 7
EM ALGORITHM USING K MEANS
DATE:

AIM:

To implement the EM algorithm for clustering, using the given dataset (the Iris data set).

ALGORITHM:

Initialize θ randomly.
Repeat until convergence:
E-step: Compute q(h) = P(H = h | E = e; θ) for each h (probabilistic inference); create fully-observed weighted examples (h, e) with weight q(h).
M-step: Maximum likelihood (count and normalize) on the weighted examples to get θ.
The Gaussian-mixture form of these updates is given below.
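
For the Gaussian mixture fitted in the program below, the E-step and M-step take the familiar form (k indexes clusters, i indexes points, N is the number of points):

    E-step:  γ_ik = π_k · N(x_i | μ_k, Σ_k) / Σ_j π_j · N(x_i | μ_j, Σ_j)
    M-step:  N_k = Σ_i γ_ik,   μ_k = (1/N_k) Σ_i γ_ik · x_i,
             Σ_k = (1/N_k) Σ_i γ_ik (x_i - μ_k)(x_i - μ_k)ᵀ,   π_k = N_k / N

scikit-learn's GaussianMixture performs these updates internally when fit() is called.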

PROGRAM:

from sklearn.cluster import KMeans
from sklearn import preprocessing
from sklearn.mixture import GaussianMixture
from sklearn.datasets import load_iris
import sklearn.metrics as sm
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

dataset = load_iris()
# print(dataset)

X = pd.DataFrame(dataset.data)
X.columns = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width']
y = pd.DataFrame(dataset.target)
y.columns = ['Targets']
# print(X)

plt.figure(figsize=(14, 7))
colormap = np.array(['red', 'lime', 'black'])

# REAL PLOT: true species labels
plt.subplot(1, 3, 1)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y.Targets], s=40)
plt.title('Real')

# K-MEANS PLOT
plt.subplot(1, 3, 2)
model = KMeans(n_clusters=3)
model.fit(X)
predY = np.choose(model.labels_, [0, 1, 2]).astype(np.int64)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[predY], s=40)
plt.title('KMeans')

# GMM PLOT: fit a Gaussian mixture on standardized features
scaler = preprocessing.StandardScaler()
scaler.fit(X)
xsa = scaler.transform(X)
xs = pd.DataFrame(xsa, columns=X.columns)
gmm = GaussianMixture(n_components=3)
gmm.fit(xs)
y_cluster_gmm = gmm.predict(xs)
plt.subplot(1, 3, 3)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y_cluster_gmm], s=40)
plt.title('GMM Classification')
plt.show()
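
The sklearn.metrics import (sm) is not used in the listing above; as an optional addition (not part of the original program), two lines appended after it would quantify how well each clustering matches the true species labels, using a score that is invariant to label permutation:

print('KMeans ARI :', sm.adjusted_rand_score(dataset.target, model.labels_))
print('GMM ARI    :', sm.adjusted_rand_score(dataset.target, y_cluster_gmm))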

OUTPUT:

(The output is a figure with three side-by-side scatter plots of the Iris data: Real, KMeans, and GMM Classification.)
RESULT:
Thus the program to implement the EM algorithm for clustering the given dataset has been executed successfully and the output verified.
EX.NO: 8
K-NEAREST NEIGHBOUR ALGORITHM
DATE:

AIM:
To implement the k-Nearest Neighbour algorithm for the given dataset.

ALGORITHM:

Step-1: Select the number K of neighbours.
Step-2: Calculate the Euclidean distance from the new data point to each training point (see the formula below).
Step-3: Take the K nearest neighbours as per the calculated Euclidean distance.
Step-4: Among these K neighbours, count the number of data points in each category.
Step-5: Assign the new data point to the category for which the number of neighbours is maximum.
Step-6: Our model is ready.
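
The Euclidean distance in Step-2 between two feature vectors a and b is

    d(a, b) = √( Σ_j (a_j - b_j)² )

For example, for the (hypothetical) 2-D points (1, 2) and (4, 6), d = √(3² + 4²) = 5.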

PROGRAM:

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
import pandas as pd
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
iris_data = iris.data
iris_labels = iris.target
x_train, x_test, y_train, y_test = train_test_split(iris_data, iris_labels, test_size=0.20)
classifier = KNeighborsClassifier(n_neighbors=6)
classifier.fit(x_train, y_train)
y_pred = classifier.predict(x_test)
print("accuracy is")
print(classification_report(y_test, y_pred))
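
confusion_matrix is imported above but never used; if desired, the per-class error breakdown can be printed with one extra line appended to the listing (an optional addition, not part of the original program):

print(confusion_matrix(y_test, y_pred))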
OUTPUT:
accuracy is
precision recall f1-score support

0 1.00 1.00 1.00 9


1 1.00 0.93 0.96 14
2 0.88 1.00 0.93 7

accuracy 0.97 30
macro avg 0.96 0.98 0.97 30
weighted avg 0.97 0.97 0.97 30

RESULT:

Thus the program to implement the k-Nearest Neighbour algorithm for classifying the Iris dataset has been executed successfully and the output verified.
EX.NO: 9
BUILD REGRESSION MODELS
DATE:

AIM:

To build regression models such as locally weighted linear regression and plot the necessary graphs.

ALGORITHM:

Step 1: Read the given data sample to X and the curve (linear or non-linear) to Y.
Step 2: Set the value of the smoothening (free) parameter τ.
Step 3: Set the bias / point of interest x0, which is a subset of X.
Step 4: Determine the weight matrix using the standard locally weighted regression expression: W is a diagonal matrix with w(i, i) = exp( -(x_i - x0)² / (2τ²) ).
Step 5: Determine the value of the model term parameter β using: β = (XᵀWX)⁻¹ XᵀWy.
Step 6: Prediction = x0·β.

PROGRAM:

from math import ceil
import math
import numpy as np
from scipy import linalg
import matplotlib.pyplot as plt

def lowess(x, y, f, iterations):
    n = len(x)
    r = int(ceil(f * n))
    # bandwidth: distance to the r-th nearest neighbour of each point
    h = [np.sort(np.abs(x - x[i]))[r] for i in range(n)]
    w = np.clip(np.abs((x[:, None] - x[None, :]) / h), 0.0, 1.0)
    w = (1 - w ** 3) ** 3          # tricube weights
    yest = np.zeros(n)
    delta = np.ones(n)
    for iteration in range(iterations):
        for i in range(n):
            # weighted linear fit around point i
            weights = delta * w[:, i]
            b = np.array([np.sum(weights * y), np.sum(weights * y * x)])
            A = np.array([[np.sum(weights), np.sum(weights * x)],
                          [np.sum(weights * x), np.sum(weights * x * x)]])
            beta = linalg.solve(A, b)
            yest[i] = beta[0] + beta[1] * x[i]
        # robustifying weights: down-weight points with large residuals
        residuals = y - yest
        s = np.median(np.abs(residuals))
        delta = np.clip(residuals / (6.0 * s), -1, 1)
        delta = (1 - delta ** 2) ** 2
    return yest

n = 100
x = np.linspace(0, 2 * math.pi, n)
y = np.sin(x) + 0.3 * np.random.randn(n)   # noisy sine-wave samples
f = 0.25
iterations = 3
yest = lowess(x, y, f, iterations)

plt.plot(x, y, "r.")
plt.plot(x, yest, "b-")
plt.show()

OUTPUT:

(The output is a plot of the noisy sine-wave samples (red dots) together with the LOWESS-fitted curve (blue line).)
RESULT:

Thus the program to implement locally weighted regression using the given dataset has been executed successfully and the output verified.
