0% found this document useful (0 votes)

2 views

Ml Lab Record

The document outlines the implementation of the Candidate-Elimination and ID3 algorithms for machine learning. It details the steps for both algorithms, including loading datasets, initializing hypotheses, and calculating entropy and information gain. The results demonstrate the successful application of these algorithms to classify data and generate decision trees.

Uploaded by

examcell.vvcet

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

2 views

Ml Lab Record

Uploaded by

examcell.vvcet

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 49

EX NO:1

DATE: Implementation of Candidate-Elimination algorithm

AIM:
To implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of
all hypotheses consistent with the training examples.

ALGORITHM:

Step1:Load Data set

Step2: Initialize General Hypothesis and Specific Hypothesis.
Step3: For each training example
Step4: a. If the example is positive, remove from the specific boundary any hypothesis that is not
consistent with the example.
b. If the example is negative, remove from the general boundary any hypothesis that is consistent
with the example.
Step5: If example is Negative example make generalize hypothesis more specific.

PROGRAM:

import numpy as np
import pandas as pd
data=pd.read_csv('https://raw.githubusercontent.com/manishsingh7163/Machine-Learning-
Algorithms/master/Candidate%20Elimination%20Algorithm/enjoysport.csv')
concepts = np.array(data.iloc[:,:-1])
target = np.array(data.iloc[:,-1])
def learn(concepts, target):
specific_h = concepts[0].copy()
general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
for i, h in enumerate(concepts):
if target[i] == "yes":
for x in range(len(specific_h)):
if h[x] != specific_h[x]:
specific_h[x] = '?'
general_h[x][x] = '?'
print(" \n\nFor Training instance No:{0} the hypothesis is\n".format(i))
print("Specific Hypothesis: ",specific_h)
print("General Hypothesis: ",general_h,)
if target[i] == "no":
for x in range(len(specific_h)):
if h[x] != specific_h[x]:
general_h[x][x] = specific_h[x]
else:
general_h[x][x] = '?'
print(" \n\nFor Training instance No:{0} the hypothesis is\n".format(i))
print("Specific Hypothesis: ", specific_h)
print("General Hypothesis: ",general_h,)
indices = [i for i,val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
for i in indices:
general_h.remove(['?', '?', '?', '?', '?', '?'])

return specific_h, general_h

print("*"*20,"Costumer Elimination Algorithm","*"*20)
s_final, g_final = learn(concepts, target)
print("Final Specific hypothesis:", s_final)
print("Final General hypothesis:", g_final)

Output:

For Training instance No:0 the hypothesis is

Specific Hypothesis: ['sunny' 'warm' 'normal' 'strong' 'warm' 'same']
General Hypothesis: [['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?',
'?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]

For Training instance No:1 the hypothesis is

Specific Hypothesis: ['sunny' 'warm' '?' 'strong' 'warm' 'same']
General Hypothesis: [['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?',
'?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]
For Training instance No:2 the hypothesis is
Specific Hypothesis: ['sunny' 'warm' '?' 'strong' 'warm' 'same']
General Hypothesis: [['sunny', '?', '?', '?', '?', '?'], ['?', 'warm', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?',
'?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', 'same']]

For Training instance No:3 the hypothesis is

Specific Hypothesis: ['sunny' 'warm' '?' 'strong' '?' '?']
General Hypothesis: [['sunny', '?', '?', '?', '?', '?'], ['?', 'warm', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?',
'?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]
Final Specific hypothesis: ['sunny' 'warm' '?' 'strong' '?' '?']
Final General hypothesis: [['sunny', '?', '?', '?', '?', '?'], ['?', 'warm', '?', '?', '?', '?']]

Result:
Thus, we have successfully implemented and demonstrated the Candidate-Elimination algorithm to
output a description of the set of all hypotheses consistent with the training examples.
EX NO:2
DATE: Decision Tree Based ID3 Algorithm

AIM:
To Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use
an appropriate data set for building the decision tree and apply this knowledge to classify a new
sample.

ALGORITHM:

1. Create a Root node for the tree

2. If all Examples are positive, Return the single-node tree Root, with label = +
3. If all Examples are negative, Return the single-node tree Root, with label = -
4. If Attributes is empty, Return the single-node tree Root, with label = most common value
5. of Target_attribute in Examples
6. Otherwise Begin
7. A ← the attribute from Attributes that best* classifies Examples
8. The decision attribute for Root ← A
9. For each possible value, vi, of A,
i. Add a new tree branch below Root, corresponding to the test A = vi
ii. Let Examples vi, be the subset of Examples that have value vi for A
iii. If Examples vi , is empty
iv. Then below this new branch add a leaf node with label = most common
v. value of Target_attribute in Examples
vi. Else below this new branch add the subtree
ID3(Examples vi, Targe_tattribute, Attributes – {A}))
10. End
11. Return Root

PROGRAM:

import pandas as pd
from pandas import DataFrame
df_tennis = DataFrame.from_csv('PlayTennis.csv')
print("\n Given Play Tennis Data Set:\n\n", df_tennis)
#Function to calculate the entropy of probaility of observations
# -p*log2*p

def entropy(probs):
import math
return sum( [-prob*math.log(prob, 2) for prob in probs] )
#Function to calulate the entropy of the given Data Sets/List with respect to target attributes
def entropy_of_list(a_list):
#print("A-list",a_list)
from collections import Counter
cnt = Counter(x for x in a_list) # Counter calculates the propotion of class
# print("\nClasses:",cnt)
#print("No and Yes Classes:",a_list.name,cnt)
num_instances = len(a_list)*1.0 # = 14
print("\n Number of Instances of the Current Sub Class is {0}:".format(num_instances ))
probs = [x / num_instances for x in cnt.values()] # x means no of YES/NO
print("\n Classes:",min(cnt),max(cnt))
print(" \n Probabilities of Class {0} is {1}:".format(min(cnt),min(probs)))
print(" \n Probabilities of Class {0} is {1}:".format(max(cnt),max(probs)))
return entropy(probs) # Call Entropy :

# The initial entropy of the YES/NO attribute for our dataset.

print("\n INPUT DATA SET FOR ENTROPY CALCULATION:\n", df_tennis['PlayTennis'])

total_entropy = entropy_of_list(df_tennis['PlayTennis'])

print("\n Total Entropy of PlayTennis Data Set:",total_entropy)

def information_gain(df, split_attribute_name, target_attribute_name, trace=0):
print("Information Gain Calculation of ",split_attribute_name)
'''
Takes a DataFrame of attributes, and quantifies the entropy of a target
attribute after performing a split along the values of another attribute.
'''
# Split Data by Possible Vals of Attribute:
df_split = df.groupby(split_attribute_name)
# for name,group in df_split:
# print("Name:\n",name)
# print("Group:\n",group)

# Calculate Entropy for Target Attribute, as well as

# Proportion of Obs in Each Data-Split
nobs = len(df.index) * 1.0
# print("NOBS",nobs)
df_agg_ent = df_split.agg({target_attribute_name : [entropy_of_list, lambda x: len(x)/nobs]
})[target_attribute_name]
#print([target_attribute_name])
#print(" Entropy List ",entropy_of_list)
#print("DFAGGENT",df_agg_ent)
df_agg_ent.columns = ['Entropy', 'PropObservations']
#if trace: # helps understand what fxn is doing:
# print(df_agg_ent)

# Calculate Information Gain:

new_entropy = sum( df_agg_ent['Entropy'] * df_agg_ent['PropObservations'] )
old_entropy = entropy_of_list(df[target_attribute_name])
return old_entropy - new_entropy

print('Info-gain for Outlook is :'+str( information_gain(df_tennis, 'Outlook', 'PlayTennis')),"\n")

print('\n Info-gain for Humidity is: ' + str( information_gain(df_tennis, 'Humidity', 'PlayTennis')),"\n")
print('\n Info-gain for Wind is:' + str( information_gain(df_tennis, 'Wind', 'PlayTennis')),"\n")
print('\n Info-gain for Temperature is:' + str( information_gain(df_tennis,
'Temperature','PlayTennis')),"\n")
def id3(df, target_attribute_name, attribute_names, default_class=None):

## Tally target attribute:

from collections import Counter
cnt = Counter(x for x in df[target_attribute_name])# class of YES /NO

## First check: Is this split of the dataset homogeneous?

if len(cnt) == 1:
return next(iter(cnt)) # next input data set, or raises StopIteration when EOF is hit.

## Second check: Is this split of the dataset empty?

# if yes, return a default value
elif df.empty or (not attribute_names):
return default_class # Return None for Empty Data Set

## Otherwise: This dataset is ready to be devied up!

else:
# Get Default Value for next recursive call of this function:
default_class = max(cnt.keys()) #No of YES and NO Class
# Compute the Information Gain of the attributes:
gainz = [information_gain(df, attr, target_attribute_name) for attr in attribute_names] #
index_of_max = gainz.index(max(gainz)) # Index of Best Attribute
# Choose Best Attribute to split on:
best_attr = attribute_names[index_of_max]

# Create an empty tree, to be populated in a moment

tree = {best_attr:{}} # Iniiate the tree with best attribute as a node
remaining_attribute_names = [i for i in attribute_names if i != best_attr]

# Split dataset
# On each split, recursively call this algorithm.
# populate the empty tree with subtrees, which
# are the result of the recursive call
for attr_val, data_subset in df.groupby(best_attr):
subtree = id3(data_subset,
target_attribute_name,
remaining_attribute_names,
default_class)
tree[best_attr][attr_val] = subtree
return tree
# Get Predictor Names (all but 'class')
attribute_names = list(df_tennis.columns)
print("List of Attributes:", attribute_names)
attribute_names.remove('PlayTennis') #Remove the class attribute
print("Predicting Attributes:", attribute_names)
# Run Algorithm:
from pprint import pprint
tree = id3(df_tennis,'PlayTennis',attribute_names)
print("\n\nThe Resultant Decision Tree is :\n")
#print(tree)
pprint(tree)
attribute = next(iter(tree))
print("Best Attribute :\n",attribute)
print("Tree Keys:\n",tree[attribute].keys())
def classify(instance, tree, default=None): # Instance of Play Tennis with Predicted

#print("Instance:",instance)
attribute = next(iter(tree))
if instance[attribute] in tree[attribute].keys(): # Value of the attributs in set of Tree keys
result = tree[attribute][instance[attribute]]
print("Instance Attribute:",instance[attribute],"TreeKeys :",tree[attribute].keys())
if isinstance(result, dict): # this is a tree, delve deeper
return classify(instance, result)
else:
return result # this is a label
else:
return default
df_tennis['predicted'] = df_tennis.apply(classify, axis=1, args=(tree,'No') )
# classify func allows for a default arg: when tree doesn't have answer for a particular
# combitation of attribute-values, we can use 'no' as the default guess

print(df_tennis['predicted'])

print('\n Accuracy is:\n' + str( sum(df_tennis['PlayTennis']==df_tennis['predicted'] ) /

(1.0*len(df_tennis.index)) ))

df_tennis[['PlayTennis', 'predicted']]
training_data = df_tennis.iloc[1:-4] # all but last four instances
test_data = df_tennis.iloc[-4:] # just the last four
train_tree = id3(training_data, 'PlayTennis', attribute_names)

test_data['predicted2'] = test_data.apply( # < --- test_data source

classify,
axis=1,
args=(train_tree,'Yes') ) # < ---- train_data tree

print ('\n\n Accuracy is : ' + str( sum(test_data['PlayTennis']==test_data['predicted2'] ) /

(1.0*len(test_data.index)) ))

OUTPUT:

Given Play Tennis Data Set:

PlayTennis Outlook Temperature Humidity Wind

0 No Sunny Hot High Weak
1 No Sunny Hot High Strong
2 Yes Overcast Hot High Weak
3 Yes Rain Mild High Weak
4 Yes Rain Cool Normal Weak
5 No Rain Cool Normal Strong
6 Yes Overcast Cool Normal Strong
7 No Sunny Mild High Weak
8 Yes Sunny Cool Normal Weak
9 Yes Rain Mild Normal Weak
10 Yes Sunny Mild Normal Strong
11 Yes Overcast Mild High Strong
12 Yes Overcast Hot Normal Weak
13 No Rain Mild High Strong

INPUT DATA SET FOR ENTROPY CALCULATION:

0 No
1 No
2 Yes
3 Yes
4 Yes
5 No
6 Yes
7 No
8 Yes
9 Yes
10 Yes
11 Yes
12 Yes
13 No
Name: PlayTennis, dtype: object

Number of Instances of the Current Sub Class is 14.0:

Classes: No Yes

Probabilities of Class No is 0.35714285714285715:

Probabilities of Class Yes is 0.6428571428571429:

Total Entropy of PlayTennis Data Set: 0.9402859586706309

Information Gain Calculation of Outlook

Number of Instances of the Current Sub Class is 4.0:

Classes: Yes Yes

Probabilities of Class Yes is 1.0:

Number of Instances of the Current Sub Class is 5.0:

Classes: No Yes

Probabilities of Class No is 0.4:

Probabilities of Class Yes is 0.6:

Number of Instances of the Current Sub Class is 5.0:

Classes: No Yes

Probabilities of Class No is 0.4:

Probabilities of Class Yes is 0.6:

Number of Instances of the Current Sub Class is 14.0:

Classes: No Yes

Probabilities of Class No is 0.35714285714285715:

Probabilities of Class Yes is 0.6428571428571429:

Info-gain for Outlook is :0.246749819774

Information Gain Calculation of Humidity

Number of Instances of the Current Sub Class is 7.0:

Classes: No Yes

Probabilities of Class No is 0.42857142857142855:

Probabilities of Class Yes is 0.5714285714285714:

Number of Instances of the Current Sub Class is 7.0:

Classes: No Yes

Probabilities of Class No is 0.14285714285714285:

Probabilities of Class Yes is 0.8571428571428571:

Number of Instances of the Current Sub Class is 14.0:

Classes: No Yes

Probabilities of Class No is 0.35714285714285715:

Probabilities of Class Yes is 0.6428571428571429:

Info-gain for Humidity is: 0.151835501362

Information Gain Calculation of Wind

Number of Instances of the Current Sub Class is 6.0:

Classes: No Yes

Probabilities of Class No is 0.5:

Probabilities of Class Yes is 0.5:

Number of Instances of the Current Sub Class is 8.0:

Classes: No Yes

Probabilities of Class No is 0.25:

Probabilities of Class Yes is 0.75:

Number of Instances of the Current Sub Class is 14.0:

Classes: No Yes

Probabilities of Class No is 0.35714285714285715:

Probabilities of Class Yes is 0.6428571428571429:

Info-gain for Wind is:0.0481270304083

Information Gain Calculation of Temperature

Number of Instances of the Current Sub Class is 4.0:

Classes: No Yes

Probabilities of Class No is 0.25:

Probabilities of Class Yes is 0.75:

Number of Instances of the Current Sub Class is 4.0:

Classes: No Yes

Probabilities of Class No is 0.5:

Probabilities of Class Yes is 0.5:

Number of Instances of the Current Sub Class is 6.0:

Classes: No Yes

Probabilities of Class No is 0.3333333333333333:

Probabilities of Class Yes is 0.6666666666666666:

Number of Instances of the Current Sub Class is 14.0:

Classes: No Yes

Probabilities of Class No is 0.35714285714285715:

Probabilities of Class Yes is 0.6428571428571429:

Info-gain for Temperature is:0.029222565659

Information Gain Calculation of Outlook

Number of Instances of the Current Sub Class is 4.0:

Classes: Yes Yes

Probabilities of Class Yes is 1.0:

Number of Instances of the Current Sub Class is 5.0:

Classes: No Yes

Probabilities of Class No is 0.4:

Probabilities of Class Yes is 0.6:

Number of Instances of the Current Sub Class is 5.0:

Classes: No Yes

Probabilities of Class No is 0.4:

Probabilities of Class Yes is 0.6:

Number of Instances of the Current Sub Class is 14.0:

Classes: No Yes

Probabilities of Class No is 0.35714285714285715:

Probabilities of Class Yes is 0.6428571428571429:

Information Gain Calculation of Temperature

Number of Instances of the Current Sub Class is 4.0:

Classes: No Yes

Probabilities of Class No is 0.25:

Probabilities of Class Yes is 0.75:

Number of Instances of the Current Sub Class is 4.0:

Classes: No Yes

Probabilities of Class No is 0.5:

Probabilities of Class Yes is 0.5:

Number of Instances of the Current Sub Class is 6.0:

Classes: No Yes

Probabilities of Class No is 0.3333333333333333:

Probabilities of Class Yes is 0.6666666666666666:

Number of Instances of the Current Sub Class is 14.0:

Classes: No Yes
Probabilities of Class No is 0.35714285714285715:

Probabilities of Class Yes is 0.6428571428571429:

Information Gain Calculation of Humidity

Number of Instances of the Current Sub Class is 7.0:

Classes: No Yes

Probabilities of Class No is 0.42857142857142855:

Probabilities of Class Yes is 0.5714285714285714:

Number of Instances of the Current Sub Class is 7.0:

Classes: No Yes

Probabilities of Class No is 0.14285714285714285:

Probabilities of Class Yes is 0.8571428571428571:

Number of Instances of the Current Sub Class is 14.0:

Classes: No Yes

Probabilities of Class No is 0.35714285714285715:

Probabilities of Class Yes is 0.6428571428571429:

Information Gain Calculation of Wind

Number of Instances of the Current Sub Class is 6.0:

Classes: No Yes

Probabilities of Class No is 0.5:

Probabilities of Class Yes is 0.5:

Number of Instances of the Current Sub Class is 8.0:

Classes: No Yes

Probabilities of Class No is 0.25:

Probabilities of Class Yes is 0.75:

Number of Instances of the Current Sub Class is 14.0:

Classes: No Yes

Probabilities of Class No is 0.35714285714285715:

Probabilities of Class Yes is 0.6428571428571429:
Information Gain Calculation of Temperature

Number of Instances of the Current Sub Class is 2.0:

Classes: No Yes

Probabilities of Class No is 0.5:

Probabilities of Class Yes is 0.5:

Number of Instances of the Current Sub Class is 3.0:

Classes: No Yes

Probabilities of Class No is 0.3333333333333333:

Probabilities of Class Yes is 0.6666666666666666:

Number of Instances of the Current Sub Class is 5.0:

Classes: No Yes

Probabilities of Class No is 0.4:

Probabilities of Class Yes is 0.6:

Information Gain Calculation of Humidity

Number of Instances of the Current Sub Class is 2.0:

Classes: No Yes

Probabilities of Class No is 0.5:

Probabilities of Class Yes is 0.5:

Number of Instances of the Current Sub Class is 3.0:

Classes: No Yes

Probabilities of Class No is 0.3333333333333333:

Probabilities of Class Yes is 0.6666666666666666:

Number of Instances of the Current Sub Class is 5.0:

Classes: No Yes

Probabilities of Class No is 0.4:

Probabilities of Class Yes is 0.6:

Information Gain Calculation of Wind
Number of Instances of the Current Sub Class is 2.0:

Classes: No No

Probabilities of Class No is 1.0:

Number of Instances of the Current Sub Class is 3.0:

Classes: Yes Yes

Probabilities of Class Yes is 1.0:

Number of Instances of the Current Sub Class is 5.0:

Classes: No Yes
Probabilities of Class No is 0.4:

Probabilities of Class Yes is 0.6:

Information Gain Calculation of Temperature

Number of Instances of the Current Sub Class is 1.0:

Classes: Yes Yes

Probabilities of Class Yes is 1.0:

Number of Instances of the Current Sub Class is 2.0:

Classes: No No

Probabilities of Class No is 1.0:

Number of Instances of the Current Sub Class is 2.0:

Classes: No Yes

Probabilities of Class No is 0.5:

Probabilities of Class Yes is 0.5:

Number of Instances of the Current Sub Class is 5.0:

Classes: No Yes

Probabilities of Class No is 0.4:

Probabilities of Class Yes is 0.6:
Information Gain Calculation of Humidity

Number of Instances of the Current Sub Class is 3.0:

Classes: No No

Probabilities of Class No is 1.0:

Number of Instances of the Current Sub Class is 2.0:

Classes: Yes Yes

Probabilities of Class Yes is 1.0:

Number of Instances of the Current Sub Class is 5.0:

Classes: No Yes

Probabilities of Class No is 0.4:

Probabilities of Class Yes is 0.6:

Information Gain Calculation of Wind

Number of Instances of the Current Sub Class is 2.0:

Classes: No Yes

Probabilities of Class No is 0.5:

Probabilities of Class Yes is 0.5:

Number of Instances of the Current Sub Class is 3.0:

Classes: No Yes

Probabilities of Class No is 0.3333333333333333:

Probabilities of Class Yes is 0.6666666666666666:

Number of Instances of the Current Sub Class is 5.0:

Classes: No Yes

Probabilities of Class No is 0.4:

Probabilities of Class Yes is 0.6:

The Resultant Decision Tree is :

{'Outlook': {'Overcast': 'Yes',
'Rain': {'Wind': {'Strong': 'No', 'Weak': 'Yes'}},
'Sunny': {'Humidity': {'High': 'No', 'Normal': 'Yes'}}}}
Best Attribute :
Outlook
Tree Keys:
dict_keys(['Overcast', 'Rain', 'Sunny'])

Instance Attribute: Strong TreeKeys : dict_keys(['Strong', 'Weak'])

0 No
1 No
2 Yes
3 Yes
4 Yes
5 No
6 Yes
7 No
8 Yes
9 Yes
10 Yes
11 Yes
12 Yes
13 No

Name: predicted, dtype: object

Accuracy is:
1.0

PlayTennis predicted

0 No No

1 No No

2 Yes Yes

3 Yes Yes

4 Yes Yes

5 No No

6 Yes Yes

7 No No

8 Yes Yes

9 Yes Yes

10 Yes Yes

11 Yes Yes

12 Yes Yes

13 No No
Information Gain Calculation of Outlook

Number of Instances of the Current Sub Class is 2.0:

Classes: Yes Yes

Probabilities of Class Yes is 1.0:

Number of Instances of the Current Sub Class is 4.0:

Classes: No Yes

Probabilities of Class No is 0.25:

Probabilities of Class Yes is 0.75:

Number of Instances of the Current Sub Class is 3.0:

Classes: No Yes

Probabilities of Class No is 0.3333333333333333:

Probabilities of Class Yes is 0.6666666666666666:

Number of Instances of the Current Sub Class is 9.0:

Classes: No Yes

Probabilities of Class No is 0.3333333333333333:

Probabilities of Class Yes is 0.6666666666666666:

Information Gain Calculation of Temperature

Number of Instances of the Current Sub Class is 4.0:

Classes: No Yes

Probabilities of Class No is 0.25:

Probabilities of Class Yes is 0.75:

Number of Instances of the Current Sub Class is 2.0:

Classes: No Yes

Probabilities of Class No is 0.5:

Probabilities of Class Yes is 0.5:

Number of Instances of the Current Sub Class is 3.0:

Classes: No Yes

Probabilities of Class No is 0.3333333333333333:

Probabilities of Class Yes is 0.6666666666666666:

Number of Instances of the Current Sub Class is 9.0:

Classes: No Yes

Probabilities of Class No is 0.3333333333333333:

Probabilities of Class Yes is 0.6666666666666666:

Information Gain Calculation of Humidity

Number of Instances of the Current Sub Class is 4.0:

Classes: No Yes

Probabilities of Class No is 0.5:

Probabilities of Class Yes is 0.5:

Number of Instances of the Current Sub Class is 5.0:

Classes: No Yes

Probabilities of Class No is 0.2:

Probabilities of Class Yes is 0.8:

Number of Instances of the Current Sub Class is 9.0:

Classes: No Yes

Probabilities of Class No is 0.3333333333333333:

Probabilities of Class Yes is 0.6666666666666666:

Information Gain Calculation of Wind

Number of Instances of the Current Sub Class is 3.0:

Classes: No Yes

Probabilities of Class No is 0.3333333333333333:

Probabilities of Class Yes is 0.6666666666666666:

Number of Instances of the Current Sub Class is 6.0:

Classes: No Yes

Probabilities of Class No is 0.16666666666666666:

Probabilities of Class Yes is 0.8333333333333334:

Number of Instances of the Current Sub Class is 9.0:

Classes: No Yes

Probabilities of Class No is 0.3333333333333333:

Probabilities of Class Yes is 0.6666666666666666:

Information Gain Calculation of Temperature

Number of Instances of the Current Sub Class is 2.0:

Classes: No Yes

Probabilities of Class No is 0.5:

Probabilities of Class Yes is 0.5:

Number of Instances of the Current Sub Class is 2.0:

Classes: Yes Yes

Probabilities of Class Yes is 1.0:

Number of Instances of the Current Sub Class is 4.0:

Classes: No Yes

Probabilities of Class No is 0.25:

Probabilities of Class Yes is 0.75:

Information Gain Calculation of Humidity

Number of Instances of the Current Sub Class is 1.0:

Classes: Yes Yes

Probabilities of Class Yes is 1.0:

Number of Instances of the Current Sub Class is 3.0:

Classes: No Yes

Probabilities of Class No is 0.3333333333333333:

Probabilities of Class Yes is 0.6666666666666666:

Number of Instances of the Current Sub Class is 4.0:

Classes: No Yes

Probabilities of Class No is 0.25:

Probabilities of Class Yes is 0.75:

Information Gain Calculation of Wind

Number of Instances of the Current Sub Class is 1.0:

Classes: No No

Probabilities of Class No is 1.0:

Number of Instances of the Current Sub Class is 3.0:

Classes: Yes Yes

Probabilities of Class Yes is 1.0:

Number of Instances of the Current Sub Class is 4.0:

Classes: No Yes

Probabilities of Class No is 0.25:

Probabilities of Class Yes is 0.75:

Information Gain Calculation of Temperature

Number of Instances of the Current Sub Class is 1.0:

Classes: Yes Yes

Probabilities of Class Yes is 1.0:

Number of Instances of the Current Sub Class is 1.0:

Classes: No No

Probabilities of Class No is 1.0:

Number of Instances of the Current Sub Class is 1.0:

Classes: No No

Probabilities of Class No is 1.0:

Number of Instances of the Current Sub Class is 3.0:

Classes: No Yes

Probabilities of Class No is 0.3333333333333333:

Probabilities of Class Yes is 0.6666666666666666:

Information Gain Calculation of Humidity

Number of Instances of the Current Sub Class is 2.0:

Classes: No No

Probabilities of Class No is 1.0:

Number of Instances of the Current Sub Class is 1.0:

Classes: Yes Yes

Probabilities of Class Yes is 1.0:

Number of Instances of the Current Sub Class is 3.0:

Classes: No Yes

Probabilities of Class No is 0.3333333333333333:

Probabilities of Class Yes is 0.6666666666666666:

Information Gain Calculation of Wind

Number of Instances of the Current Sub Class is 1.0:

Classes: No No

Probabilities of Class No is 1.0:

Number of Instances of the Current Sub Class is 2.0:

Classes: No Yes

Probabilities of Class No is 0.5:

Probabilities of Class Yes is 0.5:

Number of Instances of the Current Sub Class is 3.0:

Classes: No Yes

Probabilities of Class No is 0.3333333333333333:

Probabilities of Class Yes is 0.6666666666666666:

Key: dict_keys(['Outlook'])
Attribute: Outlook
Instance Attribute: Sunny TreeKeys : dict_keys(['Overcast', 'Rain', 'Sunny'])
Key: dict_keys(['Temperature'])
Attribute: Temperature
Instance Attribute: Mild TreeKeys : dict_keys(['Cool', 'Hot', 'Mild'])
Key: dict_keys(['Outlook'])
Attribute: Outlook
Instance Attribute: Overcast TreeKeys : dict_keys(['Overcast', 'Rain', 'Sunny'])
Key: dict_keys(['Outlook'])
Attribute: Outlook
Instance Attribute: Overcast TreeKeys : dict_keys(['Overcast', 'Rain', 'Sunny'])
Key: dict_keys(['Outlook'])
Attribute: Outlook
Instance Attribute: Rain TreeKeys : dict_keys(['Overcast', 'Rain', 'Sunny'])
Key: dict_keys(['Wind'])
Attribute: Wind
Instance Attribute: Strong TreeKeys : dict_keys(['Strong', 'Weak'])

Accuracy is : 0.75

Result:
Thus, we have successfully implemented and demonstrated the working of the decision tree based ID3
algorithm.
EX NO:3
DATE: Artificial Neural Network by Backpropagation

AIM:
To build an artificial neural network by implementing the backpropagation algorithm and test
the same using appropriate data sets.

ALGORITHM:

Step 1: Inputs X, arrive through the preconnected path.

Step 2: The input is modeled using true weights W. Weights are usually chosen randomly.

Step 3: Calculate the output of each neuron from the input layer to the hidden layer to the
output layer.

Step 4: Calculate the error in the outputs

Backpropagation Error= Actual Output – Desired Output

Step 5: From the output layer, go back to the hidden layer to adjust the weights to reduce the
error.

Step 6: Repeat the process until the desired output is achieved.

PROGRAM:

import random
from math import exp
from random import seed

# Initialize a network

def initialize_network(n_inputs, n_hidden, n_outputs):

network = list()
hidden_layer = [{'weights':[random.uniform(-0.5,0.5) for i in range(n_inputs + 1)]} for i in
range(n_hidden)]
network.append(hidden_layer)
output_layer = [{'weights':[random.uniform(-0.5,0.5) for i in range(n_hidden + 1)]} for i in
range(n_outputs)]
network.append(output_layer)
i= 1
print("\n The initialised Neural Network:\n")
for layer in network:
j=1
for sub in layer:
print("\n Layer[%d] Node[%d]:\n" %(i,j),sub)
j=j+1
i=i+1
return network

# Calculate neuron activation (net) for an input

def activate(weights, inputs):

activation = weights[-1]
for i in range(len(weights)-1):
activation += weights[i] * inputs[i]
return activation

# Transfer neuron activation to sigmoid function

def transfer(activation):
return 1.0 / (1.0 + exp(-activation))

# Calculate the derivative of an neuron output

def transfer_derivative(output):
return output * (1.0 - output)

# Backpropagate error and store in neurons

def backward_propagate_error(network, expected):
for i in reversed(range(len(network))):
layer = network[i]
errors = list()

if i != len(network)-1:
for j in range(len(layer)):
error = 0.0
for neuron in network[i + 1]:
error += (neuron['weights'][j] * neuron['delta'])
errors.append(error)
else:
for j in range(len(layer)):
neuron = layer[j]
errors.append(expected[j] - neuron['output'])
for j in range(len(layer)):
neuron = layer[j]
neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])

# Update network weights with error

def update_weights(network, row, l_rate):
for i in range(len(network)):
inputs = row[:-1]
if i != 0:
inputs = [neuron['output'] for neuron in network[i - 1]]
for neuron in network[i]:
for j in range(len(inputs)):
neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
neuron['weights'][-1] += l_rate * neuron['delta']

# Train a network for a fixed number of epochs

def train_network(network, train, l_rate, n_epoch, n_outputs):

print("\n Network Training Begins:\n")

for epoch in range(n_epoch):

sum_error = 0
for row in train:
outputs = forward_propagate(network, row)
expected = [0 for i in range(n_outputs)]
expected[row[-1]] = 1
sum_error += sum([(expected[i]-outputs[i])**2 for i in range(len(expected))])
backward_propagate_error(network, expected)
update_weights(network, row, l_rate)
print('>epoch=%d, lrate=%.3f, error=%.3f' % (epoch, l_rate, sum_error))

print("\n Network Training Ends:\n")

#Test training backprop algorithm

seed(2)
dataset = [[2.7810836,2.550537003,0],
[1.465489372,2.362125076,0],
[3.396561688,4.400293529,0],
[1.38807019,1.850220317,0],
[3.06407232,3.005305973,0],
[7.627531214,2.759262235,1],
[5.332441248,2.088626775,1],
[6.922596716,1.77106367,1],
[8.675418651,-0.242068655,1],
[7.673756466,3.508563011,1]]
print("\n The input Data Set :\n",dataset)
n_inputs = len(dataset[0]) - 1
print("\n Number of Inputs :\n",n_inputs)
n_outputs = len(set([row[-1] for row in dataset]))
print("\n Number of Outputs :\n",n_outputs)

#Network Initialization
network = initialize_network(n_inputs, 2, n_outputs)

# Training the Network

train_network(network, dataset, 0.5, 20, n_outputs)

# Calculate neuron activation for an input

def activate(weights, inputs):
activation = weights[-1]
for i in range(len(weights)-1):
activation += weights[i] * inputs[i]
return activation

# Transfer neuron activation

def transfer(activation):
return 1.0 / (1.0 + exp(-activation))

# Forward propagate input to a network output

# Make a prediction with a network

def predict(network, row):
outputs = forward_propagate(network, row)
return outputs.index(max(outputs))

# Test making predictions with the network

dataset = [[2.7810836,2.550537003,0],
[1.465489372,2.362125076,0],
[3.396561688,4.400293529,0],
[1.38807019,1.850220317,0],
[3.06407232,3.005305973,0],
[7.627531214,2.759262235,1],
[5.332441248,2.088626775,1],
[6.922596716,1.77106367,1],
[8.675418651,-0.242068655,1],
[7.673756466,3.508563011,1]]
#network = [[{'weights': [-1.482313569067226, 1.8308790073202204,
1.078381922048799]}, {'weights': [0.23244990332399884, 0.3621998343835864,
0.40289821191094327]}],
# [{'weights': [2.5001872433501404, 0.7887233511355132, -1.1026649757805829]},
{'weights': [-2.429350576245497, 0.8357651039198697, 1.0699217181280656]}]]
for row in dataset:
prediction = predict(network, row)
print('Expected=%d, Got=%d' % (row[-1], prediction))

Output:
The input Data Set :
[[2.7810836, 2.550537003, 0], [1.465489372, 2.362125076, 0], [3.396561688, 4.400293529,
0], [1.38807019, 1.850220317, 0], [3.06407232, 3.005305973, 0], [7.627531214,
2.759262235, 1], [5.332441248, 2.088626775, 1], [6.922596716, 1.77106367, 1],
[8.675418651, -0.242068655, 1], [7.673756466, 3.508563011, 1]]

Number of Inputs :
2

Number of Outputs :
2
The initialised Neural Network:
Layer[1] Node[1]:
{'weights': [0.4560342718892494, 0.4478274870593494, -0.4434486322731913]}

Layer[1] Node[2]:
{'weights': [-0.41512800484107837, 0.33549887812944956, 0.2359699890685233]}

Layer[2] Node[1]:
{'weights': [0.1697304014402209, -0.1918635424108558, 0.10594416567846243]}

Layer[2] Node[2]:
{'weights': [0.10680173364083789, 0.08120401711200309, -0.3416171297451944]}

Network Training Begins:

>epoch=0, lrate=0.500, error=5.278

>epoch=1, lrate=0.500, error=5.122
>epoch=2, lrate=0.500, error=5.006
>epoch=3, lrate=0.500, error=4.875
>epoch=4, lrate=0.500, error=4.700
>epoch=5, lrate=0.500, error=4.466
>epoch=6, lrate=0.500, error=4.176
>epoch=7, lrate=0.500, error=3.838
>epoch=8, lrate=0.500, error=3.469
>epoch=9, lrate=0.500, error=3.089
>epoch=10, lrate=0.500, error=2.716
>epoch=11, lrate=0.500, error=2.367
>epoch=12, lrate=0.500, error=2.054
>epoch=13, lrate=0.500, error=1.780
>epoch=14, lrate=0.500, error=1.546
>epoch=15, lrate=0.500, error=1.349
>epoch=16, lrate=0.500, error=1.184
>epoch=17, lrate=0.500, error=1.045
>epoch=18, lrate=0.500, error=0.929
>epoch=19, lrate=0.500, error=0.831

Network Training Ends:

Final Neural Network :
Layer[1] Node[1]:
{'weights': [0.8642508164347665, -0.8497601716670763, -0.8668929014392035], 'output':
0.9295587965836384, 'delta': 0.005645382825629247}

Layer[1] Node[2]:
{'weights': [-1.2934302410111025, 1.7109363237151507, 0.7125327507327329], 'output':
0.04760703296164151, 'delta': -0.005928559978815076}

Layer[2] Node[1]:
{'weights': [-1.3098359335096292, 2.16462207144596, -0.3079052288835876], 'output':
0.19895563952058462, 'delta': -0.03170801648036037}
Layer[2] Node[2]:
{'weights': [1.5506793402414165, -2.11315950446121, 0.1333585709422027], 'output':
0.8095042653312078, 'delta': 0.029375796661413225}
Expected=0, Got=0
Expected=0, Got=0
Expected=0, Got=0
Expected=0, Got=0
Expected=0, Got=0
Expected=1, Got=1
Expected=1, Got=1
Expected=1, Got=1
Expected=1, Got=1
Expected=1, Got=1

Result:
Thus, we have successfully built an artificial neural network by implementing the
backpropagation algorithm and test the same using appropriate data.
EX NO:4
DATE: Implementation Of Naïve Bayesian Classifier

AIM:
To write a program to implement the naïve Bayesian classifier for a sample training data set
stored as a .CSV file and compute the accuracy with a few test data sets.

ALGORITHM:

Step1: Load Data set

Step2: Lable the data
Step3: Split it into train and test data
Step4: Fit the data
Step5: Calculate accuracy and print it

PROGRAM:

# import necessary libarities

import pandas as pd
from sklearn import tree
from sklearn.preprocessing import LabelEncoder
from sklearn.naive_bayes import GaussianNB

# load data from CSV

data = pd.read_csv('tennisdata.csv')
print("THe first 5 values of data is :\n",data.head())

# obtain Train data and Train output

X = data.iloc[:,:-1]
print("\nThe First 5 values of train data is\n",X.head())

y = data.iloc[:,-1]
print("\nThe first 5 values of Train output is\n",y.head())

# Convert then in numbers

le_outlook = LabelEncoder()
X.Outlook = le_outlook.fit_transform(X.Outlook)

le_Temperature = LabelEncoder()
X.Temperature = le_Temperature.fit_transform(X.Temperature)

le_Humidity = LabelEncoder()
X.Humidity = le_Humidity.fit_transform(X.Humidity)
le_Windy = LabelEncoder()
X.Windy = le_Windy.fit_transform(X.Windy)

print("\nNow the Train data is :\n",X.head())

le_PlayTennis = LabelEncoder()
y = le_PlayTennis.fit_transform(y)
print("\nNow the Train output is\n",y)

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.20)

classifier = GaussianNB()
classifier.fit(X_train,y_train)

from sklearn.metrics import accuracy_score

print("Accuracy is:",accuracy_score(classifier.predict(X_test),y_test))

Output:
The first 5 values of data is :
Outlook Temperature Humidity Windy PlayTennis
0 Sunny Hot High False No
1 Sunny Hot High True No
2 Overcast Hot High False Yes
3 Rainy Mild High False Yes
4 Rainy Cool Normal False Yes

The First 5 values of train data is:

Outlook Temperature Humidity Windy
0 Sunny Hot High False
1 Sunny Hot High True
2 Overcast Hot High False
3 Rainy Mild High False
4 Rainy Cool Normal False

The first 5 values of Train output is

0 No
1 No
2 Yes
3 Yes
4 Yes
Name: PlayTennis, dtype: object

Now the Train data is :

Outlook Temperature Humidity Windy
0 2 1 0 0
1 2 1 0 1
2 0 1 0 0
3 1 2 0 0
4 1 0 1 0

Now the Train output is

[0 0 1 1 1 0 1 0 1 1 1 1 1 0]

Accuracy is: 0.6666666666666666

Result:
Thus, we have successfully written a program to implement the naïve Bayesian classifier for
a sample training data set stored as a .CSV file and compute the accuracy with a few test data
sets.
EX NO:5
DATE: Implementation Of Naïve Bayesian Classifier Model

AIM:
To write a program to Implement naïve Bayesian Classifier model to classify a set of
documents and measure the accuracy, precision, and recall.

ALGORITHM:
Step1: Data Pre-processing step

Step2: Fitting Naive Bayes to the Training set

Step3: Predicting the test result

Step4: Test accuracy of the result(Creation of Confusion matrix)

Step5: Visualizing the test set result.

PROGRAM:

import pandas as pd
msg = pd.read_csv('document.csv', names=['message', 'label'])
print("Total Instances of Dataset: ", msg.shape[0])
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})

X = msg.message
y = msg.labelnum
from sklearn.model_selection import train_test_split
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)
from sklearn.feature_extraction.text import CountVectorizer

count_v = CountVectorizer()
Xtrain_dm = count_v.fit_transform(Xtrain)
Xtest_dm = count_v.transform(Xtest)

df = pd.DataFrame(Xtrain_dm.toarray(),columns=count_v.get_feature_names())
print(df[0:5])

from sklearn.naive_bayes import MultinomialNB

clf = MultinomialNB()
clf.fit(Xtrain_dm, ytrain)
pred = clf.predict(Xtest_dm)
for doc, p in zip(Xtrain, pred):
p = 'pos' if p == 1 else 'neg'
print("%s -> %s" % (doc, p))

from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

print('Accuracy Metrics: \n')
print('Accuracy: ', accuracy_score(ytest, pred))
print('Recall: ', recall_score(ytest, pred))
print('Precision: ', precision_score(ytest, pred))
print('Confusion Matrix: \n', confusion_matrix(ytest, pred))

Output:
Total Instances of Dataset: 18

about am an and awesome bad beers best boss can ... tired t
o
0 0 1 0 1 0 0 0 0 0 0 ... 1
0
1 0 0 0 0 0 0 0 0 0 0 ... 0
0
2 0 0 0 0 0 0 0 0 0 0 ... 0
0
3 0 0 0 0 0 0 0 0 0 1 ... 0
0
4 0 0 0 0 0 0 0 0 0 0 ... 0
0

today tomorrow very we went will with work

0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 1 0
4 0 0 0 0 0 0 0 0

[5 rows x 49 columns]

I am sick and tired of this place -> pos

I do not like the taste of this juice -> neg
I love this sandwich -> neg
I can't deal with this -> pos
I do not like this restaurant -> neg

Accuracy Metrics:

Accuracy: 0.6
Recall: 0.5
Precision: 1.0
Confusion Matrix:
[[1 0]
[2 2]]
Result:
Thus, we have successfully written a program to implement naïve Bayesian Classifier model
to classify a set of documents and measure the accuracy, precision, and recall.
EX NO:6
DATE: Bayesian network to diagnose CORONA infection

AIM:
To Write a program to construct a Bayesian network to diagnose CORONA infection using
standard WHO Data Set.

ALGORITHM:
Step1: Data Pre-processing step

Step2: Fitting Naive Bayes to the Training set

Step3: Predicting the test result

Step4: Visualizing the test set result.

PROGRAM:

import pandas as pd

inside = pd.read_csv('./covid/data/00/COVID19_2020_open_line_list - Hubei.csv',

low_memory=False)
outside = pd.read_csv('./covid/data/00/COVID19_2020_open_line_list - outside_Hubei.csv',
low_memory=False)

outside = outside.drop(['data_moderator_initials'], axis=1)

data = pd.concat([inside, outside])

import json
import itertools
from datetime import datetime

with open('./covid/data/00/symptom-mapping.json', 'r') as f:

symptom_map = json.load(f)

def tokenize(s):
if s is None or isinstance(s, float) or len(s) < 1 or pd.isna(s):
return None
try:
delim = ';' if ';' in s else ','
return [t.strip().lower() for t in s.split(delim) if len(t.strip()) > 0]
except:
return s

def map_to_symptoms(s):
if s.startswith('fever') or s.startswith('low fever'):
return ['fever']
return [k for k, v in symptom_map.items() if s in v]

d = data[['symptoms']].dropna(how='all').copy(deep=True)
print(d.shape)

d.symptoms = d.symptoms.apply(lambda s: tokenize(s))

d.symptoms = d.symptoms.apply(lambda tokens: [map_to_symptoms(s) for s in tokens] if
tokens is not None else None)
d.symptoms = d.symptoms.apply(lambda arrs: None if arrs is None else
list(itertools.chain(*arrs)))

for s in symptom_map.keys():
d[s] = d.symptoms.apply(lambda arr: 0 if arr is None else 1 if s in arr else 0)

d = d.drop(['symptoms'], axis=1)
print(d.shape)

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

plt.style.use('seaborn')

v = [d[d[c] == 1].shape[0] for c in d.columns]

s = pd.Series(v, d.columns)

fig, ax = plt.subplots(figsize=(15, 5))

_ = s.plot(kind='bar', ax=ax, title=f'Frequency of symptoms, n={d.shape[0]}')

plt.tight_layout()

from pybbn.graph.dag import Bbn

from pybbn.pptc.inferencecontroller import InferenceController
import json

with open('./covid/naive.json', 'r') as f:

bbn = Bbn.from_dict(json.load(f))
join_tree = InferenceController.apply(bbn)

from pybbn.generator.bbngenerator import convert_for_drawing

import networkx as nx
import warnings

with warnings.catch_warnings():
warnings.simplefilter('ignore')

graph = convert_for_drawing(bbn)
pos = nx.nx_agraph.graphviz_layout(graph, prog='neato')

plt.figure(figsize=(15, 8))
plt.subplot(121)
labels = dict([(k, node.variable.name) for k, node in bbn.nodes.items()])
nx.draw(graph, pos=pos, with_labels=True, labels=labels)
plt.title('BBN DAG')

def potential_to_series(potential):
def get_entry_kv(entry):
arr = [(k, v) for k, v in entry.entries.items()]
arr = sorted(arr, key=lambda tup: tup[0])
return arr[0][1], entry.value

tups = [get_entry_kv(e) for e in potential.entries]

return pd.Series([tup[1] for tup in tups], [tup[0] for tup in tups])

series = [(node, potential_to_series(join_tree.get_bbn_potential(node))) for node in

join_tree.get_bbn_nodes()]

n_cols = 3
n_rows = int(len(series) / n_cols)

fig, axes = plt.subplots(n_rows, n_cols, figsize=(10, 20))

axes = np.ravel(axes)

for ax, (node, s) in zip(axes, series):

s.plot(kind='bar', ax=ax, title=f'{node.variable.name}')

plt.tight_layout()

%%time

from pybbn.graph.jointree import EvidenceBuilder

names = [
'anosmia', 'sputum', 'muscle', 'chills', 'fever',
'wheezing', 'nasal', 'fatigue', 'headache', 'sore_throat',
'dry_cough', 'diarrhoea', 'dyspnea', 'nausea', 'sneezing',
'running_nose'
]

predictions = []
for i, r in d.iterrows():
fields = [name for name in names if r[name] == 1]

join_tree.unobserve_all()

if len(fields) > 0:
bbn_nodes = [join_tree.get_bbn_node_by_name(f) for f in fields]
evidences = [EvidenceBuilder().with_node(n).with_evidence('t', 1.0).build() for n in
bbn_nodes]
join_tree.update_evidences(evidences)

disease = join_tree.get_bbn_node_by_name('disease')
disease_potential = join_tree.get_bbn_potential(disease)
s = potential_to_series(disease_potential)

predictions.append(s)

predictions = pd.DataFrame(predictions)
predictions

Output:

(656, 1)
(656, 32)
CPU times: user 6.85 s, sys: 40.2 ms, total: 6.89 s
Wall time: 6.93 s

Result:
Thus, to write a program to construct a bayesian network to diagnose corona infection using
standard WHO data set.
EX NO:7
DATE: EM Algorithm & K-Means Algorithm

AIM:
To write a program to apply EM algorithm to cluster a set of data stored in a .CSV file. Use
the same data set for clustering using the kk-Means
Means algorithm. Compare the results of these
two algorithms.

ALGORITHM:

K-Means Algorithm:
Step-1: Select the number K to decide the number of clusters.

Step-2: Select random K points or centroids. (It can be other from the input dataset).

Step-3: Assign each data point to their closest centroid, which will form the predefined K
clusters.
Step-4: Calculate the variance and place a new centroid of each cluster.

Step-5: Repeat the third steps, which means reassign each datapoint to the new closest
centroid of each cluster.

Step-6: If any reassignment occurs, then go to step-4 else go to FINISH.

Step-7: The model is ready.

EM Algorithm:

The EM algorithm is the combination of various unsupervised ML algorithms, such as the

k-means clustering algorithm. Being an iterative approach, it consists of two modes. In the
first mode, we estimate the missing or latent variables. Hence it is referred to as the
Expectation/estimation step (E--step).
step). Further, the other mode is used to optimize the
parameters of the models so that it can explain the data more clearly. The second mode is
known as the maximization-step or M-step.

 Expectation step (E - step): It involves the estimation (guess) of all missing values in
the dataset so that after completing this step, there should not be any missing value.
 Maximization step (M - step): This step involves the use of estimated data in the E-
step and updating the parameters.
 Repeat E-step and M-step until the convergence of the values occurs.
The primary goal of the EM algorithm is to use the available observed data of the dataset to
estimate the missing data of the latent variables and then use that data to update the values of
the parameters in the M-step.

PROGRAM:

from sklearn.cluster import KMeans

from sklearn import preprocessing
from sklearn.mixture import GaussianMixture
from sklearn.datasets import load_iris
import sklearn.metrics as sm
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

dataset=load_iris()
# print(dataset)

X=pd.DataFrame(dataset.data)
X.columns=['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width']
y=pd.DataFrame(dataset.target)
y.columns=['Targets']
# print(X)

plt.figure(figsize=(14,7))
colormap=np.array(['red','lime','black'])
# REAL PLOT
plt.subplot(1,3,1)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[y.Targets],s=40)
plt.title('Real')

# K-PLOT
plt.subplot(1,3,2)
model=KMeans(n_clusters=3)
model.fit(X)
predY=np.choose(model.labels_,[0,1,2]).astype(np.int64)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[predY],s=40)
plt.title('KMeans')

# GMM PLOT
scaler=preprocessing.StandardScaler()
scaler.fit(X)
xsa=scaler.transform(X)
xs=pd.DataFrame(xsa,columns=X.columns)
gmm=GaussianMixture(n_components=3)
gmm.fit(xs)
y_cluster_gmm=gmm.predict(xs)
plt.subplot(1,3,3)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[y_cluster_gmm],s=40)
plt.title('GMM Classification')

Output:

Text(0.5, 1.0, 'GMM Classification')

Result:
Thus, we have successfully written a program to apply EM algorithm to cluster a set of data
stored in a .CSV file. Use the same data set for clustering using the k-Means algorithm.
Compare the results of these two algorithms.
EX NO:8
DATE: K-Nearest Neighbour Algorithm

AIM:
To write a program to implement k-Nearest
k Neighbour algorithm to classify the iris data set.
Print both correct and wrong predictions

ALGORITHM:

K-Nearest Neighbor Algorithm:

Step-1: Select the number K of the neighbors

Step-2: Calculate the Euclidean distance of K number of neighbors
Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
Step-4: Among these k neighbors, count the number of the data points in each category.
Step-5: Assign the new data points to that category for which the number of the neighbor is
maximum.
Step-6: Our model is ready.

PROGRAM:

from sklearn.datasets import load_iris

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
import numpy as np

dataset=load_iris()
#print(dataset)
X_train,X_test,y_train,y_test=train_test_split(dataset["data"],dataset["target"],random_state=
0)

kn=KNeighborsClassifier(n_neighbors=1)
kn.fit(X_train,y_train)

for i in range(len(X_test)):
x=X_test[i]
x_new=np.array([x])
prediction=kn.predict(x_new)

print("TARGET=",y_test[i],dataset["target_names"][y_test[i]],"PREDICTED=",prediction,da
taset["target_names"][prediction])
print(kn.score(X_test,y_test))

Output:

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',

metric_params=None, n_jobs=None, n_neighbors=1, p=2,
weights='uniform')

TARGET= 2 virginica PREDICTED= [2] ['virginica']

TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 2 virginica PREDICTED= [2] ['virginica']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 2 virginica PREDICTED= [2] ['virginica']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 2 virginica PREDICTED= [2] ['virginica']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 2 virginica PREDICTED= [2] ['virginica']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 2 virginica PREDICTED= [2] ['virginica']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 2 virginica PREDICTED= [2] ['virginica']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 2 virginica PREDICTED= [2] ['virginica']
TARGET= 2 virginica PREDICTED= [2] ['virginica']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 1 versicolor PREDICTED= [2] ['virginica']
0.9736842105263158

Result:
Thus, we have successfully written a program to implement k-Nearest Neighbour algorithm
to classify the iris data set.Print both correct and wrong predictions.
EX NO:9
DATE: Locally Weighted Regression Algorithm

AIM:
To write a program to implement the non-parametric locally weighted regression algorithm in
order to fit data points. select an appropriate data set for your experiment and draw graphs

ALGORITHM:
 LWR is a non-parametric regression technique that fits a linear regression model
to a dataset by giving more weight to nearby data points.

 LWR fits a separate linear regression model for each query point based on the
weights assigned to the training data points.

 The weights assigned to each training data point are inversely proportional to
their distance from the query point.

 Training data points that are closer to the query point will have a higher weight
and contribute more to the linear regression model.

 LWR is useful when a global linear model does not well-capture the relationship
between the input and output variables. The goal is to capture local patterns in
the data.

PROGRAM:

from math import ceil

import numpy as np
from scipy import linalg

def lowess(x, y, f, iterations):

n = len(x)
r = int(ceil(f * n))
h = [np.sort(np.abs(x - x[i]))[r] for i in range(n)]
w = np.clip(np.abs((x[:, None] - x[None, :]) / h), 0.0, 1.0)
w = (1 - w ** 3) ** 3
yest = np.zeros(n)
delta = np.ones(n)
for iteration in range(iterations):
for i in range(n):
weights = delta * w[:, i]
b = np.array([np.sum(weights * y), np.sum(weights * y * x)])
A = np.array([[np.sum(weights), np.sum(weights * x)],[np.sum(weights * x),
np.sum(weights * x * x)]])
beta = linalg.solve(A, b)
yest[i] = beta[0] + beta[1] * x[i]

residuals = y - yest
s = np.median(np.abs(residuals))
delta = np.clip(residuals / (6.0 * s), -1, 1)
delta = (1 - delta ** 2) ** 2

return yest

import math
n = 100
x = np.linspace(0, 2 * math.pi, n)
y = np.sin(x) + 0.3 * np.random.randn(n)
f =0.25
iterations=3
yest = lowess(x, y, f, iterations)

import matplotlib.pyplot as plt

plt.plot(x,y,"r.")
plt.plot(x,yest,"b-")

Output:

[<matplotlib.lines.Line2D at 0x37459696d8>]

Result:
Thus, we have successfully written a program to implement the non non-parametric
parametric locally
weighted regression algorithm in order to fit data points. select an appropriate data set for
your experiment and draw graphs.

Lab Manual
No ratings yet
Lab Manual
25 pages
Machine Learning Lab Manual
No ratings yet
Machine Learning Lab Manual
26 pages
ML LAB P-1
No ratings yet
ML LAB P-1
10 pages
AD3461_ML Lab Manual
No ratings yet
AD3461_ML Lab Manual
54 pages
Machine Learning Lab Record: Dr. Sarika Hegde
No ratings yet
Machine Learning Lab Record: Dr. Sarika Hegde
23 pages
R20 Iii-Ii ML Lab Manual
100% (1)
R20 Iii-Ii ML Lab Manual
79 pages
code mlt
No ratings yet
code mlt
9 pages
Machine Learning Laboratory Record Book: 1 Find S Algorithm
No ratings yet
Machine Learning Laboratory Record Book: 1 Find S Algorithm
22 pages
MLlab Manual LIET
No ratings yet
MLlab Manual LIET
52 pages
ML1 3 Merged
No ratings yet
ML1 3 Merged
19 pages
PESIT Bangalore South Campus: Vii Semester Lab Manual Subject: Machine Learning
No ratings yet
PESIT Bangalore South Campus: Vii Semester Lab Manual Subject: Machine Learning
31 pages
ML MANUAL (1)
No ratings yet
ML MANUAL (1)
74 pages
Machine Learning Lab (17CSL76)
No ratings yet
Machine Learning Lab (17CSL76)
48 pages
ML Lab Prog1-5 (5) College PDF
No ratings yet
ML Lab Prog1-5 (5) College PDF
12 pages
ML Lab Manual (1-9)
No ratings yet
ML Lab Manual (1-9)
37 pages
Lab Manual
No ratings yet
Lab Manual
55 pages
New ML Lab Manual
No ratings yet
New ML Lab Manual
29 pages
MANUAL (1)
No ratings yet
MANUAL (1)
34 pages
ML
No ratings yet
ML
30 pages
IV - ML Lab
No ratings yet
IV - ML Lab
31 pages
MLT Shivani
No ratings yet
MLT Shivani
8 pages
ML Lab Programs 1-10-Converted NAM COLLEGE PDF
No ratings yet
ML Lab Programs 1-10-Converted NAM COLLEGE PDF
33 pages
Advance Machine Learning
No ratings yet
Advance Machine Learning
28 pages
Machine learning
No ratings yet
Machine learning
27 pages
Lab Manual ML
No ratings yet
Lab Manual ML
28 pages
Amit MLT1
No ratings yet
Amit MLT1
22 pages
ML Lab Record
No ratings yet
ML Lab Record
33 pages
15CSL76 Students
No ratings yet
15CSL76 Students
18 pages
MANUAL (2)
No ratings yet
MANUAL (2)
33 pages
Machine Learning Through Python Lab Mannual
No ratings yet
Machine Learning Through Python Lab Mannual
33 pages
Shashidhar-18csl76 Final
No ratings yet
Shashidhar-18csl76 Final
19 pages
AIML
No ratings yet
AIML
12 pages
ML Lab Observation
100% (1)
ML Lab Observation
44 pages
ML lab manual
No ratings yet
ML lab manual
25 pages
ML Lab Manual-99
No ratings yet
ML Lab Manual-99
23 pages
ML Lab Programs
No ratings yet
ML Lab Programs
18 pages
DOC-20250509-WA0027.
No ratings yet
DOC-20250509-WA0027.
34 pages
ML Lab Manual PDF
No ratings yet
ML Lab Manual PDF
9 pages
ad3461-ml-lab-manual-format-edited
No ratings yet
ad3461-ml-lab-manual-format-edited
45 pages
ML Record
No ratings yet
ML Record
18 pages
ML Lab Manual
No ratings yet
ML Lab Manual
90 pages
ML Lab PFG - Removed - Removed - Removed
No ratings yet
ML Lab PFG - Removed - Removed - Removed
22 pages
ML Lab Manual
No ratings yet
ML Lab Manual
14 pages
Machine Learning Laboratory (21AIL66)
No ratings yet
Machine Learning Laboratory (21AIL66)
7 pages
ML Lab
No ratings yet
ML Lab
9 pages
AD3461 ML lab manual
No ratings yet
AD3461 ML lab manual
32 pages
Machine Learning Techniques Lab: Session: 2023-24, Even Semester
No ratings yet
Machine Learning Techniques Lab: Session: 2023-24, Even Semester
20 pages
Machine Learning Lab Manual
No ratings yet
Machine Learning Lab Manual
43 pages
Machine Learning Manual Final
No ratings yet
Machine Learning Manual Final
37 pages
MLWP LAB Experiment's
No ratings yet
MLWP LAB Experiment's
11 pages
MACHINE LEARNING LAB MANUAL (1)
No ratings yet
MACHINE LEARNING LAB MANUAL (1)
23 pages
Lab Manual Final
No ratings yet
Lab Manual Final
34 pages
Screenshot 2023-12-07 at 11.07.49 AM
No ratings yet
Screenshot 2023-12-07 at 11.07.49 AM
14 pages
ML Lab - 231009 - 210335
No ratings yet
ML Lab - 231009 - 210335
38 pages
ML Manual
No ratings yet
ML Manual
34 pages
Machine Learninf File Final
No ratings yet
Machine Learninf File Final
45 pages
Machine Learning LAB MANUAL
No ratings yet
Machine Learning LAB MANUAL
23 pages
ML LAB record[1]
No ratings yet
ML LAB record[1]
35 pages
Profound Python Data Science
From Everand
Profound Python Data Science
Onder Teker
No ratings yet
Python For Beginners
From Everand
Python For Beginners
Célio Azevedo
No ratings yet
AI-Important Question
No ratings yet
AI-Important Question
1 page
IT8761 Security Laboratory Manual - Print NEW
No ratings yet
IT8761 Security Laboratory Manual - Print NEW
36 pages
Os Lab Record
No ratings yet
Os Lab Record
80 pages
FDSA Lab Record
No ratings yet
FDSA Lab Record
30 pages
Ccw332 Digital Marketing-Syllabus-R 2021
No ratings yet
Ccw332 Digital Marketing-Syllabus-R 2021
1 page
The Spatial and Temporal Variation Features of Wind-Sun Complementarity in China
No ratings yet
The Spatial and Temporal Variation Features of Wind-Sun Complementarity in China
11 pages
16 Flat
No ratings yet
16 Flat
88 pages
Data Mining and Machine Learning PDF
No ratings yet
Data Mining and Machine Learning PDF
10 pages
DENCLUE 2.0: Fast Clustering Based On Kernel Density Estimation
No ratings yet
DENCLUE 2.0: Fast Clustering Based On Kernel Density Estimation
11 pages
Patterns_of_Parentification_Health_and_Life_Satisfaction
No ratings yet
Patterns_of_Parentification_Health_and_Life_Satisfaction
17 pages
Lecture 15 (91 Slides)
No ratings yet
Lecture 15 (91 Slides)
91 pages
Acoustic emission with machine learning in fracture of composites- preliminary study
No ratings yet
Acoustic emission with machine learning in fracture of composites- preliminary study
18 pages
Base SAS Certification Questions Series - Part 3
No ratings yet
Base SAS Certification Questions Series - Part 3
4 pages
Automatic Vehicle License Plate Recognition Using Optimal K-Means With Convolutional Neural Network For Intelligent Transportation Systems
No ratings yet
Automatic Vehicle License Plate Recognition Using Optimal K-Means With Convolutional Neural Network For Intelligent Transportation Systems
11 pages
Imbalanced K-Means: An Algorithm To Cluster Imbalanced-Distributed Data
No ratings yet
Imbalanced K-Means: An Algorithm To Cluster Imbalanced-Distributed Data
9 pages
DMDW R20 Unit 5
No ratings yet
DMDW R20 Unit 5
21 pages
Machine Learning
100% (5)
Machine Learning
56 pages
Deteksi Uang Palsu Rupiah Dengan Menggunakan Metode Deteksi Tepi
No ratings yet
Deteksi Uang Palsu Rupiah Dengan Menggunakan Metode Deteksi Tepi
8 pages
Assigment 3
No ratings yet
Assigment 3
2 pages
Step by Step K Means Example
No ratings yet
Step by Step K Means Example
2 pages
21CS54 SIMP Questions - 21SCHEME: To Pass and Score Decent Just Study Module 1,2 3
No ratings yet
21CS54 SIMP Questions - 21SCHEME: To Pass and Score Decent Just Study Module 1,2 3
5 pages
RBFNetworks
No ratings yet
RBFNetworks
32 pages
Ml Unit 5 Material Svck Cse
No ratings yet
Ml Unit 5 Material Svck Cse
22 pages
dataminingshort Question part2
No ratings yet
dataminingshort Question part2
17 pages
Unit4-Clustering-Evaluation
No ratings yet
Unit4-Clustering-Evaluation
53 pages
Manual of Python
No ratings yet
Manual of Python
43 pages
Internship Presentation
No ratings yet
Internship Presentation
18 pages
CSE4062S24 Group5 Project DescriptiveAnalysis
No ratings yet
CSE4062S24 Group5 Project DescriptiveAnalysis
10 pages
Hierarchical Clustering in Machine Learning - GeeksforGeeks
No ratings yet
Hierarchical Clustering in Machine Learning - GeeksforGeeks
8 pages
nptel-assignment-answers
No ratings yet
nptel-assignment-answers
52 pages
Scalable Machine-Learning Algorithms For Big Data Analytics: A Comprehensive Review
No ratings yet
Scalable Machine-Learning Algorithms For Big Data Analytics: A Comprehensive Review
21 pages
Lec 06 Clustering
No ratings yet
Lec 06 Clustering
44 pages
KMeansPP Soda
No ratings yet
KMeansPP Soda
9 pages
Multivariate Time Series Clustering Based On Common Principal Component Analysis 2019
No ratings yet
Multivariate Time Series Clustering Based On Common Principal Component Analysis 2019
9 pages
Deconstructing Disengagement
No ratings yet
Deconstructing Disengagement
10 pages