ML Lab Record

This lab record outlines the implementation of machine learning algorithms including Candidate-Elimination, the ID3 decision tree, backpropagation neural networks, naïve Bayesian classifiers, Bayesian networks, EM and k-Means clustering, k-Nearest Neighbour, and locally weighted regression. For each experiment it details the aim, the algorithm steps, the program, and the observed results.


EX NO:1

DATE: Implementation of Candidate-Elimination algorithm

AIM:
To implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of
all hypotheses consistent with the training examples.

ALGORITHM:

Step 1: Load the data set.

Step 2: Initialize the general hypothesis G and the specific hypothesis S.
Step 3: For each training example:
Step 4: a. If the example is positive, remove from the general boundary any hypothesis that is
           inconsistent with the example, and minimally generalize the specific hypothesis so that it
           covers the example.
        b. If the example is negative, remove from the specific boundary any hypothesis that is
           consistent with the example.
Step 5: If the example is negative, minimally specialize the hypotheses in the general boundary so
        that they exclude the example.
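
The update rules above can be traced by hand. The following minimal sketch (using attribute rows from the enjoysport data set loaded in the program below) shows how one positive example generalizes the specific hypothesis S and one negative example specializes the general boundary G:

# Sketch of one Candidate-Elimination update; the example rows follow the
# six-attribute enjoysport encoding used in the program below.
S = ['sunny', 'warm', 'normal', 'strong', 'warm', 'same']      # current specific hypothesis
positive = ['sunny', 'warm', 'high', 'strong', 'warm', 'same']
# Positive example: generalize S wherever it disagrees with the example
S = [s if s == v else '?' for s, v in zip(S, positive)]
print(S)        # ['sunny', 'warm', '?', 'strong', 'warm', 'same']

G = [['?'] * 6 for _ in range(6)]                              # most general boundary
negative = ['rainy', 'cold', 'high', 'strong', 'warm', 'change']
# Negative example: specialize G on each attribute where S disagrees with the example
for x in range(6):
    if S[x] != '?' and S[x] != negative[x]:
        G[x][x] = S[x]
print(G[0])     # ['sunny', '?', '?', '?', '?', '?']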

PROGRAM:

import numpy as np
import pandas as pd

data = pd.read_csv('https://raw.githubusercontent.com/manishsingh7163/Machine-Learning-Algorithms/master/Candidate%20Elimination%20Algorithm/enjoysport.csv')
concepts = np.array(data.iloc[:, :-1])
target = np.array(data.iloc[:, -1])

def learn(concepts, target):
    # Start from the first training instance as the specific hypothesis
    specific_h = concepts[0].copy()
    # The most general boundary: every attribute of every hypothesis is '?'
    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    for i, h in enumerate(concepts):
        if target[i] == "yes":
            # Positive example: generalize the specific hypothesis where it disagrees
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
            print("\n\nFor Training instance No:{0} the hypothesis is\n".format(i))
            print("Specific Hypothesis: ", specific_h)
            print("General Hypothesis: ", general_h)
        if target[i] == "no":
            # Negative example: specialize the general boundary on the differing attributes
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
            print("\n\nFor Training instance No:{0} the hypothesis is\n".format(i))
            print("Specific Hypothesis: ", specific_h)
            print("General Hypothesis: ", general_h)
    # Drop the hypotheses in the general boundary that stayed fully general
    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
    for i in indices:
        general_h.remove(['?', '?', '?', '?', '?', '?'])

    return specific_h, general_h


print("*" * 20, "Candidate Elimination Algorithm", "*" * 20)
s_final, g_final = learn(concepts, target)
print("Final Specific hypothesis:", s_final)
print("Final General hypothesis:", g_final)

Output:

For Training instance No:0 the hypothesis is


Specific Hypothesis: ['sunny' 'warm' 'normal' 'strong' 'warm' 'same']
General Hypothesis: [['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?',
'?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]

For Training instance No:1 the hypothesis is


Specific Hypothesis: ['sunny' 'warm' '?' 'strong' 'warm' 'same']
General Hypothesis: [['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?',
'?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]
For Training instance No:2 the hypothesis is
Specific Hypothesis: ['sunny' 'warm' '?' 'strong' 'warm' 'same']
General Hypothesis: [['sunny', '?', '?', '?', '?', '?'], ['?', 'warm', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?',
'?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', 'same']]

For Training instance No:3 the hypothesis is


Specific Hypothesis: ['sunny' 'warm' '?' 'strong' '?' '?']
General Hypothesis: [['sunny', '?', '?', '?', '?', '?'], ['?', 'warm', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?',
'?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]
Final Specific hypothesis: ['sunny' 'warm' '?' 'strong' '?' '?']
Final General hypothesis: [['sunny', '?', '?', '?', '?', '?'], ['?', 'warm', '?', '?', '?', '?']]

Result:
Thus, we have successfully implemented and demonstrated the Candidate-Elimination algorithm to
output a description of the set of all hypotheses consistent with the training examples.
EX NO:2
DATE: Decision Tree Based ID3 Algorithm

AIM:
To write a program to demonstrate the working of the decision tree based ID3 algorithm. Use
an appropriate data set for building the decision tree and apply this knowledge to classify a new
sample.

ALGORITHM:

1. Create a Root node for the tree.

2. If all Examples are positive, return the single-node tree Root with label = +.
3. If all Examples are negative, return the single-node tree Root with label = -.
4. If Attributes is empty, return the single-node tree Root with label = the most common value
   of Target_attribute in Examples.
5. Otherwise begin:
6. A ← the attribute from Attributes that best classifies Examples (the one with the highest
   information gain).
7. The decision attribute for Root ← A.
8. For each possible value vi of A:
   i. Add a new tree branch below Root, corresponding to the test A = vi.
   ii. Let Examples_vi be the subset of Examples that have value vi for A.
   iii. If Examples_vi is empty, then below this new branch add a leaf node with label = the most
        common value of Target_attribute in Examples.
   iv. Else below this new branch add the subtree ID3(Examples_vi, Target_attribute, Attributes - {A}).
9. End.
10. Return Root.
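
The attribute that "best classifies" the examples in step 6 is the one with the highest information gain. The arithmetic can be checked by hand; a minimal sketch using the class counts of the PlayTennis data set printed below (9 Yes, 5 No overall; Outlook splits it into Sunny 2 Yes/3 No, Overcast 4 Yes/0 No, Rain 3 Yes/2 No):

# Sketch of the entropy and information-gain arithmetic behind ID3,
# using the PlayTennis class counts shown in the output below.
from math import log2

def entropy(counts):
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

print(entropy([9, 5]))   # about 0.940, the entropy of the full data set

# Weighted entropy after splitting on Outlook, then the information gain
weighted = (5/14) * entropy([2, 3]) + (4/14) * entropy([4, 0]) + (5/14) * entropy([3, 2])
print(entropy([9, 5]) - weighted)   # about 0.247, matching the Info-gain for Outlook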

PROGRAM:

import pandas as pd

# pandas.DataFrame.from_csv is deprecated/removed; pd.read_csv is the modern equivalent
df_tennis = pd.read_csv('PlayTennis.csv')
print("\n Given Play Tennis Data Set:\n\n", df_tennis)

# Function to calculate the entropy of a list of class probabilities: sum of -p*log2(p)
def entropy(probs):
    import math
    return sum([-prob * math.log(prob, 2) for prob in probs])

# Function to calculate the entropy of the given data set/list with respect to the target attribute
def entropy_of_list(a_list):
    from collections import Counter
    cnt = Counter(x for x in a_list)                      # tally of the Yes/No classes
    num_instances = len(a_list) * 1.0                     # = 14 for the full data set
    print("\n Number of Instances of the Current Sub Class is {0}:".format(num_instances))
    probs = [x / num_instances for x in cnt.values()]     # proportion of each class
    print("\n Classes:", min(cnt), max(cnt))
    print(" \n Probabilities of Class {0} is {1}:".format(min(cnt), min(probs)))
    print(" \n Probabilities of Class {0} is {1}:".format(max(cnt), max(probs)))
    return entropy(probs)

# The initial entropy of the Yes/No target attribute for our data set
print("\n INPUT DATA SET FOR ENTROPY CALCULATION:\n", df_tennis['PlayTennis'])

total_entropy = entropy_of_list(df_tennis['PlayTennis'])

print("\n Total Entropy of PlayTennis Data Set:", total_entropy)


def information_gain(df, split_attribute_name, target_attribute_name, trace=0):
    '''
    Takes a DataFrame of attributes, and quantifies the entropy of a target
    attribute after performing a split along the values of another attribute.
    '''
    print("Information Gain Calculation of ", split_attribute_name)

    # Split the data by the possible values of the attribute:
    df_split = df.groupby(split_attribute_name)

    # Calculate the entropy of the target attribute, as well as the
    # proportion of observations in each data split
    nobs = len(df.index) * 1.0
    df_agg_ent = df_split.agg({target_attribute_name: [entropy_of_list, lambda x: len(x) / nobs]})[target_attribute_name]
    df_agg_ent.columns = ['Entropy', 'PropObservations']
    if trace:  # helps understand what the function is doing
        print(df_agg_ent)

    # Calculate information gain = entropy before the split - weighted entropy after the split:
    new_entropy = sum(df_agg_ent['Entropy'] * df_agg_ent['PropObservations'])
    old_entropy = entropy_of_list(df[target_attribute_name])
    return old_entropy - new_entropy

print('Info-gain for Outlook is :' + str(information_gain(df_tennis, 'Outlook', 'PlayTennis')), "\n")
print('\n Info-gain for Humidity is: ' + str(information_gain(df_tennis, 'Humidity', 'PlayTennis')), "\n")
print('\n Info-gain for Wind is:' + str(information_gain(df_tennis, 'Wind', 'PlayTennis')), "\n")
print('\n Info-gain for Temperature is:' + str(information_gain(df_tennis, 'Temperature', 'PlayTennis')), "\n")
def id3(df, target_attribute_name, attribute_names, default_class=None):

    ## Tally the values of the target attribute:
    from collections import Counter
    cnt = Counter(x for x in df[target_attribute_name])  # counts of the Yes/No classes

    ## First check: Is this split of the dataset homogeneous?
    if len(cnt) == 1:
        return next(iter(cnt))  # only one class remains, so return it as the leaf label

    ## Second check: Is this split of the dataset empty?
    # If yes, return a default value
    elif df.empty or (not attribute_names):
        return default_class  # return None for an empty data set

    ## Otherwise: this dataset is ready to be divided up!
    else:
        # Get the default value for the next recursive call of this function:
        default_class = max(cnt.keys())
        # Compute the information gain of the attributes:
        gainz = [information_gain(df, attr, target_attribute_name) for attr in attribute_names]
        index_of_max = gainz.index(max(gainz))  # index of the best attribute
        # Choose the best attribute to split on:
        best_attr = attribute_names[index_of_max]

        # Create an empty tree, to be populated in a moment
        tree = {best_attr: {}}  # initiate the tree with the best attribute as a node
        remaining_attribute_names = [i for i in attribute_names if i != best_attr]

        # Split the dataset; on each split, recursively call this algorithm
        # and populate the empty tree with the resulting subtrees
        for attr_val, data_subset in df.groupby(best_attr):
            subtree = id3(data_subset,
                          target_attribute_name,
                          remaining_attribute_names,
                          default_class)
            tree[best_attr][attr_val] = subtree
        return tree

# Get the predictor names (all but the class attribute)
attribute_names = list(df_tennis.columns)
print("List of Attributes:", attribute_names)
attribute_names.remove('PlayTennis')  # remove the class attribute
print("Predicting Attributes:", attribute_names)

# Run the algorithm:
from pprint import pprint
tree = id3(df_tennis, 'PlayTennis', attribute_names)
print("\n\nThe Resultant Decision Tree is :\n")
pprint(tree)
attribute = next(iter(tree))
print("Best Attribute :\n", attribute)
print("Tree Keys:\n", tree[attribute].keys())
def classify(instance, tree, default=None):
    # Classify one PlayTennis instance by walking down the decision tree
    attribute = next(iter(tree))
    if instance[attribute] in tree[attribute].keys():  # value of the attribute is among the tree keys
        result = tree[attribute][instance[attribute]]
        print("Instance Attribute:", instance[attribute], "TreeKeys :", tree[attribute].keys())
        if isinstance(result, dict):  # this is a subtree, so delve deeper
            return classify(instance, result)
        else:
            return result  # this is a label
    else:
        return default

df_tennis['predicted'] = df_tennis.apply(classify, axis=1, args=(tree, 'No'))
# classify allows a default argument: when the tree has no answer for a particular
# combination of attribute values, we use 'No' as the default guess

print(df_tennis['predicted'])

print('\n Accuracy is:\n' + str(sum(df_tennis['PlayTennis'] == df_tennis['predicted']) / (1.0 * len(df_tennis.index))))

df_tennis[['PlayTennis', 'predicted']]

training_data = df_tennis.iloc[1:-4]  # all but the first and the last four instances
test_data = df_tennis.iloc[-4:]       # just the last four instances
train_tree = id3(training_data, 'PlayTennis', attribute_names)

test_data['predicted2'] = test_data.apply(   # <-- test_data source
    classify,
    axis=1,
    args=(train_tree, 'Yes'))                # <-- tree built from training_data

print('\n\n Accuracy is : ' + str(sum(test_data['PlayTennis'] == test_data['predicted2']) / (1.0 * len(test_data.index))))

OUTPUT:

Given Play Tennis Data Set:

PlayTennis Outlook Temperature Humidity Wind


0 No Sunny Hot High Weak
1 No Sunny Hot High Strong
2 Yes Overcast Hot High Weak
3 Yes Rain Mild High Weak
4 Yes Rain Cool Normal Weak
5 No Rain Cool Normal Strong
6 Yes Overcast Cool Normal Strong
7 No Sunny Mild High Weak
8 Yes Sunny Cool Normal Weak
9 Yes Rain Mild Normal Weak
10 Yes Sunny Mild Normal Strong
11 Yes Overcast Mild High Strong
12 Yes Overcast Hot Normal Weak
13 No Rain Mild High Strong

INPUT DATA SET FOR ENTROPY CALCULATION:


0 No
1 No
2 Yes
3 Yes
4 Yes
5 No
6 Yes
7 No
8 Yes
9 Yes
10 Yes
11 Yes
12 Yes
13 No
Name: PlayTennis, dtype: object

Number of Instances of the Current Sub Class is 14.0:

Classes: No Yes

Probabilities of Class No is 0.35714285714285715:

Probabilities of Class Yes is 0.6428571428571429:

Total Entropy of PlayTennis Data Set: 0.9402859586706309

Information Gain Calculation of Outlook

Number of Instances of the Current Sub Class is 4.0:

Classes: Yes Yes

Probabilities of Class Yes is 1.0:

Probabilities of Class Yes is 1.0:

Number of Instances of the Current Sub Class is 5.0:

Classes: No Yes

Probabilities of Class No is 0.4:

Probabilities of Class Yes is 0.6:

Number of Instances of the Current Sub Class is 5.0:

Classes: No Yes

Probabilities of Class No is 0.4:

Probabilities of Class Yes is 0.6:

Number of Instances of the Current Sub Class is 14.0:


Classes: No Yes

Probabilities of Class No is 0.35714285714285715:

Probabilities of Class Yes is 0.6428571428571429:


Info-gain for Outlook is :0.246749819774

Information Gain Calculation of Humidity

Number of Instances of the Current Sub Class is 7.0:

Classes: No Yes

Probabilities of Class No is 0.42857142857142855:

Probabilities of Class Yes is 0.5714285714285714:

Number of Instances of the Current Sub Class is 7.0:

Classes: No Yes

Probabilities of Class No is 0.14285714285714285:

Probabilities of Class Yes is 0.8571428571428571:

Number of Instances of the Current Sub Class is 14.0:

Classes: No Yes

Probabilities of Class No is 0.35714285714285715:

Probabilities of Class Yes is 0.6428571428571429:

Info-gain for Humidity is: 0.151835501362

Information Gain Calculation of Wind

Number of Instances of the Current Sub Class is 6.0:

Classes: No Yes

Probabilities of Class No is 0.5:

Probabilities of Class Yes is 0.5:

Number of Instances of the Current Sub Class is 8.0:

Classes: No Yes

Probabilities of Class No is 0.25:

Probabilities of Class Yes is 0.75:

Number of Instances of the Current Sub Class is 14.0:


Classes: No Yes

Probabilities of Class No is 0.35714285714285715:

Probabilities of Class Yes is 0.6428571428571429:

Info-gain for Wind is:0.0481270304083

Information Gain Calculation of Temperature

Number of Instances of the Current Sub Class is 4.0:

Classes: No Yes

Probabilities of Class No is 0.25:

Probabilities of Class Yes is 0.75:

Number of Instances of the Current Sub Class is 4.0:

Classes: No Yes

Probabilities of Class No is 0.5:

Probabilities of Class Yes is 0.5:

Number of Instances of the Current Sub Class is 6.0:

Classes: No Yes

Probabilities of Class No is 0.3333333333333333:

Probabilities of Class Yes is 0.6666666666666666:

Number of Instances of the Current Sub Class is 14.0:

Classes: No Yes

Probabilities of Class No is 0.35714285714285715:

Probabilities of Class Yes is 0.6428571428571429:

Info-gain for Temperature is:0.029222565659

Information Gain Calculation of Outlook

Number of Instances of the Current Sub Class is 4.0:

Classes: Yes Yes

Probabilities of Class Yes is 1.0:

Probabilities of Class Yes is 1.0:


Number of Instances of the Current Sub Class is 5.0:

Classes: No Yes

Probabilities of Class No is 0.4:

Probabilities of Class Yes is 0.6:

Number of Instances of the Current Sub Class is 5.0:

Classes: No Yes

Probabilities of Class No is 0.4:

Probabilities of Class Yes is 0.6:

Number of Instances of the Current Sub Class is 14.0:

Classes: No Yes

Probabilities of Class No is 0.35714285714285715:

Probabilities of Class Yes is 0.6428571428571429:


Information Gain Calculation of Temperature

Number of Instances of the Current Sub Class is 4.0:

Classes: No Yes

Probabilities of Class No is 0.25:

Probabilities of Class Yes is 0.75:

Number of Instances of the Current Sub Class is 4.0:

Classes: No Yes

Probabilities of Class No is 0.5:

Probabilities of Class Yes is 0.5:

Number of Instances of the Current Sub Class is 6.0:

Classes: No Yes

Probabilities of Class No is 0.3333333333333333:

Probabilities of Class Yes is 0.6666666666666666:

Number of Instances of the Current Sub Class is 14.0:

Classes: No Yes
Probabilities of Class No is 0.35714285714285715:

Probabilities of Class Yes is 0.6428571428571429:


Information Gain Calculation of Humidity

Number of Instances of the Current Sub Class is 7.0:

Classes: No Yes

Probabilities of Class No is 0.42857142857142855:

Probabilities of Class Yes is 0.5714285714285714:

Number of Instances of the Current Sub Class is 7.0:

Classes: No Yes

Probabilities of Class No is 0.14285714285714285:

Probabilities of Class Yes is 0.8571428571428571:

Number of Instances of the Current Sub Class is 14.0:

Classes: No Yes

Probabilities of Class No is 0.35714285714285715:

Probabilities of Class Yes is 0.6428571428571429:


Information Gain Calculation of Wind

Number of Instances of the Current Sub Class is 6.0:

Classes: No Yes

Probabilities of Class No is 0.5:

Probabilities of Class Yes is 0.5:

Number of Instances of the Current Sub Class is 8.0:

Classes: No Yes

Probabilities of Class No is 0.25:

Probabilities of Class Yes is 0.75:

Number of Instances of the Current Sub Class is 14.0:

Classes: No Yes

Probabilities of Class No is 0.35714285714285715:


Probabilities of Class Yes is 0.6428571428571429:
Information Gain Calculation of Temperature

Number of Instances of the Current Sub Class is 2.0:

Classes: No Yes

Probabilities of Class No is 0.5:

Probabilities of Class Yes is 0.5:

Number of Instances of the Current Sub Class is 3.0:

Classes: No Yes

Probabilities of Class No is 0.3333333333333333:

Probabilities of Class Yes is 0.6666666666666666:

Number of Instances of the Current Sub Class is 5.0:

Classes: No Yes

Probabilities of Class No is 0.4:

Probabilities of Class Yes is 0.6:


Information Gain Calculation of Humidity

Number of Instances of the Current Sub Class is 2.0:

Classes: No Yes

Probabilities of Class No is 0.5:

Probabilities of Class Yes is 0.5:

Number of Instances of the Current Sub Class is 3.0:

Classes: No Yes

Probabilities of Class No is 0.3333333333333333:

Probabilities of Class Yes is 0.6666666666666666:

Number of Instances of the Current Sub Class is 5.0:

Classes: No Yes

Probabilities of Class No is 0.4:

Probabilities of Class Yes is 0.6:


Information Gain Calculation of Wind
Number of Instances of the Current Sub Class is 2.0:

Classes: No No

Probabilities of Class No is 1.0:

Probabilities of Class No is 1.0:

Number of Instances of the Current Sub Class is 3.0:

Classes: Yes Yes

Probabilities of Class Yes is 1.0:

Probabilities of Class Yes is 1.0:

Number of Instances of the Current Sub Class is 5.0:

Classes: No Yes
Probabilities of Class No is 0.4:

Probabilities of Class Yes is 0.6:


Information Gain Calculation of Temperature

Number of Instances of the Current Sub Class is 1.0:

Classes: Yes Yes

Probabilities of Class Yes is 1.0:

Probabilities of Class Yes is 1.0:

Number of Instances of the Current Sub Class is 2.0:

Classes: No No

Probabilities of Class No is 1.0:

Probabilities of Class No is 1.0:

Number of Instances of the Current Sub Class is 2.0:

Classes: No Yes

Probabilities of Class No is 0.5:

Probabilities of Class Yes is 0.5:

Number of Instances of the Current Sub Class is 5.0:

Classes: No Yes

Probabilities of Class No is 0.4:


Probabilities of Class Yes is 0.6:
Information Gain Calculation of Humidity

Number of Instances of the Current Sub Class is 3.0:

Classes: No No

Probabilities of Class No is 1.0:

Probabilities of Class No is 1.0:

Number of Instances of the Current Sub Class is 2.0:

Classes: Yes Yes

Probabilities of Class Yes is 1.0:

Probabilities of Class Yes is 1.0:

Number of Instances of the Current Sub Class is 5.0:

Classes: No Yes

Probabilities of Class No is 0.4:

Probabilities of Class Yes is 0.6:


Information Gain Calculation of Wind

Number of Instances of the Current Sub Class is 2.0:

Classes: No Yes

Probabilities of Class No is 0.5:

Probabilities of Class Yes is 0.5:

Number of Instances of the Current Sub Class is 3.0:

Classes: No Yes

Probabilities of Class No is 0.3333333333333333:

Probabilities of Class Yes is 0.6666666666666666:

Number of Instances of the Current Sub Class is 5.0:

Classes: No Yes

Probabilities of Class No is 0.4:

Probabilities of Class Yes is 0.6:

The Resultant Decision Tree is :


{'Outlook': {'Overcast': 'Yes',
'Rain': {'Wind': {'Strong': 'No', 'Weak': 'Yes'}},
'Sunny': {'Humidity': {'High': 'No', 'Normal': 'Yes'}}}}
Best Attribute :
Outlook
Tree Keys:
dict_keys(['Overcast', 'Rain', 'Sunny'])

Instance Attribute: Strong TreeKeys : dict_keys(['Strong', 'Weak'])


0 No
1 No
2 Yes
3 Yes
4 Yes
5 No
6 Yes
7 No
8 Yes
9 Yes
10 Yes
11 Yes
12 Yes
13 No

Name: predicted, dtype: object

Accuracy is:
1.0

PlayTennis predicted

0 No No

1 No No

2 Yes Yes

3 Yes Yes

4 Yes Yes

5 No No

6 Yes Yes

7 No No

8 Yes Yes

9 Yes Yes

10 Yes Yes

11 Yes Yes

12 Yes Yes

13 No No
Information Gain Calculation of Outlook

Number of Instances of the Current Sub Class is 2.0:

Classes: Yes Yes

Probabilities of Class Yes is 1.0:

Probabilities of Class Yes is 1.0:

Number of Instances of the Current Sub Class is 4.0:

Classes: No Yes

Probabilities of Class No is 0.25:

Probabilities of Class Yes is 0.75:

Number of Instances of the Current Sub Class is 3.0:

Classes: No Yes

Probabilities of Class No is 0.3333333333333333:

Probabilities of Class Yes is 0.6666666666666666:

Number of Instances of the Current Sub Class is 9.0:

Classes: No Yes

Probabilities of Class No is 0.3333333333333333:

Probabilities of Class Yes is 0.6666666666666666:


Information Gain Calculation of Temperature

Number of Instances of the Current Sub Class is 4.0:

Classes: No Yes

Probabilities of Class No is 0.25:

Probabilities of Class Yes is 0.75:

Number of Instances of the Current Sub Class is 2.0:

Classes: No Yes

Probabilities of Class No is 0.5:

Probabilities of Class Yes is 0.5:

Number of Instances of the Current Sub Class is 3.0:


Classes: No Yes

Probabilities of Class No is 0.3333333333333333:

Probabilities of Class Yes is 0.6666666666666666:

Number of Instances of the Current Sub Class is 9.0:

Classes: No Yes

Probabilities of Class No is 0.3333333333333333:

Probabilities of Class Yes is 0.6666666666666666:


Information Gain Calculation of Humidity

Number of Instances of the Current Sub Class is 4.0:

Classes: No Yes

Probabilities of Class No is 0.5:

Probabilities of Class Yes is 0.5:

Number of Instances of the Current Sub Class is 5.0:

Classes: No Yes

Probabilities of Class No is 0.2:

Probabilities of Class Yes is 0.8:

Number of Instances of the Current Sub Class is 9.0:

Classes: No Yes

Probabilities of Class No is 0.3333333333333333:

Probabilities of Class Yes is 0.6666666666666666:


Information Gain Calculation of Wind

Number of Instances of the Current Sub Class is 3.0:

Classes: No Yes

Probabilities of Class No is 0.3333333333333333:

Probabilities of Class Yes is 0.6666666666666666:

Number of Instances of the Current Sub Class is 6.0:

Classes: No Yes

Probabilities of Class No is 0.16666666666666666:


Probabilities of Class Yes is 0.8333333333333334:

Number of Instances of the Current Sub Class is 9.0:

Classes: No Yes

Probabilities of Class No is 0.3333333333333333:

Probabilities of Class Yes is 0.6666666666666666:


Information Gain Calculation of Temperature

Number of Instances of the Current Sub Class is 2.0:

Classes: No Yes

Probabilities of Class No is 0.5:

Probabilities of Class Yes is 0.5:

Number of Instances of the Current Sub Class is 2.0:

Classes: Yes Yes

Probabilities of Class Yes is 1.0:

Probabilities of Class Yes is 1.0:

Number of Instances of the Current Sub Class is 4.0:

Classes: No Yes

Probabilities of Class No is 0.25:

Probabilities of Class Yes is 0.75:


Information Gain Calculation of Humidity

Number of Instances of the Current Sub Class is 1.0:

Classes: Yes Yes

Probabilities of Class Yes is 1.0:

Probabilities of Class Yes is 1.0:

Number of Instances of the Current Sub Class is 3.0:

Classes: No Yes

Probabilities of Class No is 0.3333333333333333:

Probabilities of Class Yes is 0.6666666666666666:

Number of Instances of the Current Sub Class is 4.0:


Classes: No Yes

Probabilities of Class No is 0.25:

Probabilities of Class Yes is 0.75:


Information Gain Calculation of Wind

Number of Instances of the Current Sub Class is 1.0:

Classes: No No

Probabilities of Class No is 1.0:

Probabilities of Class No is 1.0:

Number of Instances of the Current Sub Class is 3.0:

Classes: Yes Yes

Probabilities of Class Yes is 1.0:

Probabilities of Class Yes is 1.0:

Number of Instances of the Current Sub Class is 4.0:

Classes: No Yes

Probabilities of Class No is 0.25:

Probabilities of Class Yes is 0.75:


Information Gain Calculation of Temperature

Number of Instances of the Current Sub Class is 1.0:

Classes: Yes Yes

Probabilities of Class Yes is 1.0:

Probabilities of Class Yes is 1.0:

Number of Instances of the Current Sub Class is 1.0:

Classes: No No

Probabilities of Class No is 1.0:

Probabilities of Class No is 1.0:

Number of Instances of the Current Sub Class is 1.0:

Classes: No No

Probabilities of Class No is 1.0:


Probabilities of Class No is 1.0:

Number of Instances of the Current Sub Class is 3.0:

Classes: No Yes

Probabilities of Class No is 0.3333333333333333:

Probabilities of Class Yes is 0.6666666666666666:


Information Gain Calculation of Humidity

Number of Instances of the Current Sub Class is 2.0:

Classes: No No

Probabilities of Class No is 1.0:

Probabilities of Class No is 1.0:

Number of Instances of the Current Sub Class is 1.0:

Classes: Yes Yes

Probabilities of Class Yes is 1.0:

Probabilities of Class Yes is 1.0:

Number of Instances of the Current Sub Class is 3.0:

Classes: No Yes

Probabilities of Class No is 0.3333333333333333:

Probabilities of Class Yes is 0.6666666666666666:


Information Gain Calculation of Wind

Number of Instances of the Current Sub Class is 1.0:

Classes: No No

Probabilities of Class No is 1.0:

Probabilities of Class No is 1.0:

Number of Instances of the Current Sub Class is 2.0:

Classes: No Yes

Probabilities of Class No is 0.5:

Probabilities of Class Yes is 0.5:

Number of Instances of the Current Sub Class is 3.0:


Classes: No Yes

Probabilities of Class No is 0.3333333333333333:

Probabilities of Class Yes is 0.6666666666666666:


Key: dict_keys(['Outlook'])
Attribute: Outlook
Instance Attribute: Sunny TreeKeys : dict_keys(['Overcast', 'Rain', 'Sunny'])
Key: dict_keys(['Temperature'])
Attribute: Temperature
Instance Attribute: Mild TreeKeys : dict_keys(['Cool', 'Hot', 'Mild'])
Key: dict_keys(['Outlook'])
Attribute: Outlook
Instance Attribute: Overcast TreeKeys : dict_keys(['Overcast', 'Rain', 'Sunny'])
Key: dict_keys(['Outlook'])
Attribute: Outlook
Instance Attribute: Overcast TreeKeys : dict_keys(['Overcast', 'Rain', 'Sunny'])
Key: dict_keys(['Outlook'])
Attribute: Outlook
Instance Attribute: Rain TreeKeys : dict_keys(['Overcast', 'Rain', 'Sunny'])
Key: dict_keys(['Wind'])
Attribute: Wind
Instance Attribute: Strong TreeKeys : dict_keys(['Strong', 'Weak'])

Accuracy is : 0.75

Result:
Thus, we have successfully implemented and demonstrated the working of the decision tree based ID3
algorithm.
EX NO:3
DATE: Artificial Neural Network by Backpropagation

AIM:
To build an artificial neural network by implementing the backpropagation algorithm and test
the same using appropriate data sets.

ALGORITHM:

Step 1: Inputs X, arrive through the preconnected path.

Step 2: The input is modeled using true weights W. Weights are usually chosen randomly.

Step 3: Calculate the output of each neuron from the input layer to the hidden layer to the
output layer.

Step 4: Calculate the error at the outputs:

Backpropagation error = Expected (desired) output - Actual output


Step 5: From the output layer, go back to the hidden layer to adjust the weights to reduce the
error.

Step 6: Repeat the process until the desired output is achieved.
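
These steps can be illustrated numerically for a single sigmoid neuron. The sketch below uses hypothetical weight, input and target values purely to show the forward pass, the error, and one weight update; the full program afterwards does the same for a whole layered network:

# Sketch of one backpropagation update for a single sigmoid neuron.
# The weight, bias, input and target values are hypothetical illustration values.
from math import exp

w, b = 0.4, -0.2          # weight and bias
x = 1.5                   # input
target = 1.0              # desired output
l_rate = 0.5              # learning rate

out = 1.0 / (1.0 + exp(-(w * x + b)))     # Step 3: forward pass through the sigmoid
error = target - out                      # Step 4: error at the output
delta = error * out * (1.0 - out)         # error scaled by the sigmoid derivative
w += l_rate * delta * x                   # Step 5: adjust the weight...
b += l_rate * delta                       # ...and the bias
print(out, w, b)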

PROGRAM:

import random
from math import exp
from random import seed

# Initialize a network with one hidden layer and one output layer
def initialize_network(n_inputs, n_hidden, n_outputs):
    network = list()
    hidden_layer = [{'weights': [random.uniform(-0.5, 0.5) for i in range(n_inputs + 1)]} for i in range(n_hidden)]
    network.append(hidden_layer)
    output_layer = [{'weights': [random.uniform(-0.5, 0.5) for i in range(n_hidden + 1)]} for i in range(n_outputs)]
    network.append(output_layer)
    i = 1
    print("\n The initialised Neural Network:\n")
    for layer in network:
        j = 1
        for sub in layer:
            print("\n Layer[%d] Node[%d]:\n" % (i, j), sub)
            j = j + 1
        i = i + 1
    return network

# Calculate neuron activation (net input) for an input
def activate(weights, inputs):
    activation = weights[-1]  # the last weight is the bias
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

# Transfer neuron activation through the sigmoid function
def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

# Forward propagate an input row to the network output
def forward_propagate(network, row):
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            activation = activate(neuron['weights'], inputs)
            neuron['output'] = transfer(activation)
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs

# Calculate the derivative of a neuron's output (sigmoid derivative)
def transfer_derivative(output):
    return output * (1.0 - output)

# Backpropagate the error and store the deltas in the neurons
def backward_propagate_error(network, expected):
    for i in reversed(range(len(network))):
        layer = network[i]
        errors = list()

        if i != len(network) - 1:
            # Hidden layer: error is the weighted sum of the deltas in the next layer
            for j in range(len(layer)):
                error = 0.0
                for neuron in network[i + 1]:
                    error += (neuron['weights'][j] * neuron['delta'])
                errors.append(error)
        else:
            # Output layer: error is (expected - actual output)
            for j in range(len(layer)):
                neuron = layer[j]
                errors.append(expected[j] - neuron['output'])
        for j in range(len(layer)):
            neuron = layer[j]
            neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])

# Update the network weights with the stored deltas
def update_weights(network, row, l_rate):
    for i in range(len(network)):
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] += l_rate * neuron['delta']  # bias update

# Train a network for a fixed number of epochs
def train_network(network, train, l_rate, n_epoch, n_outputs):

    print("\n Network Training Begins:\n")

    for epoch in range(n_epoch):
        sum_error = 0
        for row in train:
            outputs = forward_propagate(network, row)
            expected = [0 for i in range(n_outputs)]
            expected[row[-1]] = 1  # one-hot encode the class label
            sum_error += sum([(expected[i] - outputs[i]) ** 2 for i in range(len(expected))])
            backward_propagate_error(network, expected)
            update_weights(network, row, l_rate)
        print('>epoch=%d, lrate=%.3f, error=%.3f' % (epoch, l_rate, sum_error))

    print("\n Network Training Ends:\n")

# Test training the backpropagation algorithm
seed(2)
dataset = [[2.7810836, 2.550537003, 0],
           [1.465489372, 2.362125076, 0],
           [3.396561688, 4.400293529, 0],
           [1.38807019, 1.850220317, 0],
           [3.06407232, 3.005305973, 0],
           [7.627531214, 2.759262235, 1],
           [5.332441248, 2.088626775, 1],
           [6.922596716, 1.77106367, 1],
           [8.675418651, -0.242068655, 1],
           [7.673756466, 3.508563011, 1]]
print("\n The input Data Set :\n", dataset)
n_inputs = len(dataset[0]) - 1
print("\n Number of Inputs :\n", n_inputs)
n_outputs = len(set([row[-1] for row in dataset]))
print("\n Number of Outputs :\n", n_outputs)

# Network initialization
network = initialize_network(n_inputs, 2, n_outputs)

# Training the network
train_network(network, dataset, 0.5, 20, n_outputs)

print("\n Final Neural Network :")

i = 1
for layer in network:
    j = 1
    for sub in layer:
        print("\n Layer[%d] Node[%d]:\n" % (i, j), sub)
        j = j + 1
    i = i + 1
from math import exp

# Calculate neuron activation for an input
def activate(weights, inputs):
    activation = weights[-1]
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

# Transfer neuron activation
def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

# Forward propagate an input row to the network output
def forward_propagate(network, row):
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            activation = activate(neuron['weights'], inputs)
            neuron['output'] = transfer(activation)
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs

# Make a prediction with the trained network (index of the largest output)
def predict(network, row):
    outputs = forward_propagate(network, row)
    return outputs.index(max(outputs))

# Test making predictions with the network
dataset = [[2.7810836, 2.550537003, 0],
           [1.465489372, 2.362125076, 0],
           [3.396561688, 4.400293529, 0],
           [1.38807019, 1.850220317, 0],
           [3.06407232, 3.005305973, 0],
           [7.627531214, 2.759262235, 1],
           [5.332441248, 2.088626775, 1],
           [6.922596716, 1.77106367, 1],
           [8.675418651, -0.242068655, 1],
           [7.673756466, 3.508563011, 1]]
# network = [[{'weights': [-1.482313569067226, 1.8308790073202204, 1.078381922048799]},
#             {'weights': [0.23244990332399884, 0.3621998343835864, 0.40289821191094327]}],
#            [{'weights': [2.5001872433501404, 0.7887233511355132, -1.1026649757805829]},
#             {'weights': [-2.429350576245497, 0.8357651039198697, 1.0699217181280656]}]]
for row in dataset:
    prediction = predict(network, row)
    print('Expected=%d, Got=%d' % (row[-1], prediction))

Output:
The input Data Set :
[[2.7810836, 2.550537003, 0], [1.465489372, 2.362125076, 0], [3.396561688, 4.400293529,
0], [1.38807019, 1.850220317, 0], [3.06407232, 3.005305973, 0], [7.627531214,
2.759262235, 1], [5.332441248, 2.088626775, 1], [6.922596716, 1.77106367, 1],
[8.675418651, -0.242068655, 1], [7.673756466, 3.508563011, 1]]

Number of Inputs :
2

Number of Outputs :
2
The initialised Neural Network:
Layer[1] Node[1]:
{'weights': [0.4560342718892494, 0.4478274870593494, -0.4434486322731913]}

Layer[1] Node[2]:
{'weights': [-0.41512800484107837, 0.33549887812944956, 0.2359699890685233]}

Layer[2] Node[1]:
{'weights': [0.1697304014402209, -0.1918635424108558, 0.10594416567846243]}

Layer[2] Node[2]:
{'weights': [0.10680173364083789, 0.08120401711200309, -0.3416171297451944]}

Network Training Begins:

>epoch=0, lrate=0.500, error=5.278


>epoch=1, lrate=0.500, error=5.122
>epoch=2, lrate=0.500, error=5.006
>epoch=3, lrate=0.500, error=4.875
>epoch=4, lrate=0.500, error=4.700
>epoch=5, lrate=0.500, error=4.466
>epoch=6, lrate=0.500, error=4.176
>epoch=7, lrate=0.500, error=3.838
>epoch=8, lrate=0.500, error=3.469
>epoch=9, lrate=0.500, error=3.089
>epoch=10, lrate=0.500, error=2.716
>epoch=11, lrate=0.500, error=2.367
>epoch=12, lrate=0.500, error=2.054
>epoch=13, lrate=0.500, error=1.780
>epoch=14, lrate=0.500, error=1.546
>epoch=15, lrate=0.500, error=1.349
>epoch=16, lrate=0.500, error=1.184
>epoch=17, lrate=0.500, error=1.045
>epoch=18, lrate=0.500, error=0.929
>epoch=19, lrate=0.500, error=0.831

Network Training Ends:


Final Neural Network :
Layer[1] Node[1]:
{'weights': [0.8642508164347665, -0.8497601716670763, -0.8668929014392035], 'output':
0.9295587965836384, 'delta': 0.005645382825629247}

Layer[1] Node[2]:
{'weights': [-1.2934302410111025, 1.7109363237151507, 0.7125327507327329], 'output':
0.04760703296164151, 'delta': -0.005928559978815076}

Layer[2] Node[1]:
{'weights': [-1.3098359335096292, 2.16462207144596, -0.3079052288835876], 'output':
0.19895563952058462, 'delta': -0.03170801648036037}
Layer[2] Node[2]:
{'weights': [1.5506793402414165, -2.11315950446121, 0.1333585709422027], 'output':
0.8095042653312078, 'delta': 0.029375796661413225}
Expected=0, Got=0
Expected=0, Got=0
Expected=0, Got=0
Expected=0, Got=0
Expected=0, Got=0
Expected=1, Got=1
Expected=1, Got=1
Expected=1, Got=1
Expected=1, Got=1
Expected=1, Got=1

Result:
Thus, we have successfully built an artificial neural network by implementing the
backpropagation algorithm and tested it using an appropriate data set.
EX NO:4
DATE: Implementation Of Naïve Bayesian Classifier

AIM:
To write a program to implement the naïve Bayesian classifier for a sample training data set
stored as a .CSV file and compute the accuracy with a few test data sets.

ALGORITHM:

Step 1: Load the data set.

Step 2: Label the data (encode the categorical attributes as numbers).
Step 3: Split it into training and test data.
Step 4: Fit the model to the training data.
Step 5: Calculate the accuracy and print it.
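
Underneath the library call, the classifier applies Bayes' rule with the naive independence assumption: the class with the largest prior-times-likelihood score is predicted. A minimal sketch with illustrative counts from a play-tennis style data set (the program below instead lets scikit-learn's GaussianNB estimate these quantities from the training split):

# Sketch of the naive Bayes decision rule; the probabilities below are illustrative
# counts from a play-tennis style data set, not values computed by the program.
p_yes, p_no = 9/14, 5/14                 # class priors
likelihood_yes = (2/9) * (3/9)           # e.g. P(Outlook=Sunny|Yes) * P(Windy|Yes)
likelihood_no = (3/5) * (3/5)            # e.g. P(Outlook=Sunny|No) * P(Windy|No)

score_yes = p_yes * likelihood_yes
score_no = p_no * likelihood_no
print('Yes' if score_yes > score_no else 'No')   # the class with the larger score wins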

PROGRAM:

# import necessary libraries


import pandas as pd
from sklearn import tree
from sklearn.preprocessing import LabelEncoder
from sklearn.naive_bayes import GaussianNB

# load data from CSV


data = pd.read_csv('tennisdata.csv')
print("The first 5 values of data is :\n",data.head())

# obtain Train data and Train output


X = data.iloc[:,:-1]
print("\nThe First 5 values of train data is\n",X.head())

y = data.iloc[:,-1]
print("\nThe first 5 values of Train output is\n",y.head())

# Convert them into numbers


le_outlook = LabelEncoder()
X.Outlook = le_outlook.fit_transform(X.Outlook)

le_Temperature = LabelEncoder()
X.Temperature = le_Temperature.fit_transform(X.Temperature)

le_Humidity = LabelEncoder()
X.Humidity = le_Humidity.fit_transform(X.Humidity)
le_Windy = LabelEncoder()
X.Windy = le_Windy.fit_transform(X.Windy)

print("\nNow the Train data is :\n",X.head())

le_PlayTennis = LabelEncoder()
y = le_PlayTennis.fit_transform(y)
print("\nNow the Train output is\n",y)

from sklearn.model_selection import train_test_split


X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.20)

classifier = GaussianNB()
classifier.fit(X_train,y_train)

from sklearn.metrics import accuracy_score


print("Accuracy is:",accuracy_score(classifier.predict(X_test),y_test))

Output:
The first 5 values of data is :
Outlook Temperature Humidity Windy PlayTennis
0 Sunny Hot High False No
1 Sunny Hot High True No
2 Overcast Hot High False Yes
3 Rainy Mild High False Yes
4 Rainy Cool Normal False Yes

The First 5 values of train data is:


Outlook Temperature Humidity Windy
0 Sunny Hot High False
1 Sunny Hot High True
2 Overcast Hot High False
3 Rainy Mild High False
4 Rainy Cool Normal False

The first 5 values of Train output is


0 No
1 No
2 Yes
3 Yes
4 Yes
Name: PlayTennis, dtype: object

Now the Train data is :


Outlook Temperature Humidity Windy
0 2 1 0 0
1 2 1 0 1
2 0 1 0 0
3 1 2 0 0
4 1 0 1 0

Now the Train output is


[0 0 1 1 1 0 1 0 1 1 1 1 1 0]

Accuracy is: 0.6666666666666666

Result:
Thus, we have successfully written a program to implement the naïve Bayesian classifier for
a sample training data set stored as a .CSV file and computed the accuracy with a few test data
sets.
EX NO:5
DATE: Implementation Of Naïve Bayesian Classifier Model

AIM:
To write a program to Implement naïve Bayesian Classifier model to classify a set of
documents and measure the accuracy, precision, and recall.

ALGORITHM:
Step1: Data Pre-processing step

Step2: Fitting Naive Bayes to the Training set

Step3: Predicting the test result

Step4: Test the accuracy of the result (creation of the confusion matrix)

Step5: Visualizing the test set result.
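
All of the step 4 metrics come straight from the confusion matrix. A minimal sketch using the matrix reported in the output below ([[1 0], [2 2]], with rows as actual classes and columns as predicted classes, negative class first):

# Sketch of deriving accuracy, precision and recall from the confusion matrix
# shown in the output below: [[1, 0], [2, 2]] -> tn=1, fp=0, fn=2, tp=2.
tn, fp, fn, tp = 1, 0, 2, 2

accuracy = (tp + tn) / (tp + tn + fp + fn)   # 0.6
precision = tp / (tp + fp)                   # 1.0
recall = tp / (tp + fn)                      # 0.5
print(accuracy, precision, recall)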

PROGRAM:

import pandas as pd

msg = pd.read_csv('document.csv', names=['message', 'label'])
print("Total Instances of Dataset: ", msg.shape[0])
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})

X = msg.message
y = msg.labelnum

from sklearn.model_selection import train_test_split
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)

from sklearn.feature_extraction.text import CountVectorizer
count_v = CountVectorizer()
Xtrain_dm = count_v.fit_transform(Xtrain)
Xtest_dm = count_v.transform(Xtest)

df = pd.DataFrame(Xtrain_dm.toarray(), columns=count_v.get_feature_names())
print(df[0:5])

from sklearn.naive_bayes import MultinomialNB

clf = MultinomialNB()
clf.fit(Xtrain_dm, ytrain)
pred = clf.predict(Xtest_dm)
# The predictions correspond to the test documents, so pair them with Xtest
for doc, p in zip(Xtest, pred):
    p = 'pos' if p == 1 else 'neg'
    print("%s -> %s" % (doc, p))

from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

print('Accuracy Metrics: \n')
print('Accuracy: ', accuracy_score(ytest, pred))
print('Recall: ', recall_score(ytest, pred))
print('Precision: ', precision_score(ytest, pred))
print('Confusion Matrix: \n', confusion_matrix(ytest, pred))

Output:
Total Instances of Dataset: 18

   about  am  an  and  awesome  bad  beers  best  boss  can  ...  tired  to
0      0   1   0    1        0    0      0     0     0    0  ...      1   0
1      0   0   0    0        0    0      0     0     0    0  ...      0   0
2      0   0   0    0        0    0      0     0     0    0  ...      0   0
3      0   0   0    0        0    0      0     0     0    1  ...      0   0
4      0   0   0    0        0    0      0     0     0    0  ...      0   0

today tomorrow very we went will with work


0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 1 0
4 0 0 0 0 0 0 0 0

[5 rows x 49 columns]

I am sick and tired of this place -> pos


I do not like the taste of this juice -> neg
I love this sandwich -> neg
I can't deal with this -> pos
I do not like this restaurant -> neg

Accuracy Metrics:

Accuracy: 0.6
Recall: 0.5
Precision: 1.0
Confusion Matrix:
[[1 0]
[2 2]]
Result:
Thus, we have successfully written a program to implement the naïve Bayesian classifier model
to classify a set of documents and measured the accuracy, precision, and recall.
EX NO:6
DATE: Bayesian network to diagnose CORONA infection

AIM:
To write a program to construct a Bayesian network to diagnose CORONA infection using
standard WHO Data Set.

ALGORITHM:
Step1: Data Pre-processing step

Step2: Fitting Naive Bayes to the Training set

Step3: Predicting the test result

Step4: Visualizing the test set result.
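
Inference in the Bayesian network amounts to conditioning the 'disease' node on the observed symptom nodes. A minimal sketch of that conditioning for a single symptom, using hypothetical probabilities (the actual conditional probability tables come from ./covid/naive.json in the program below):

# Sketch of updating belief in the disease node given one observed symptom.
# The prior and conditional probabilities are hypothetical placeholders; the real
# values are defined by the network loaded from ./covid/naive.json.
p_disease = 0.3                      # P(disease = true)
p_fever_given_disease = 0.8          # P(fever = true | disease = true)
p_fever_given_healthy = 0.1          # P(fever = true | disease = false)

evidence = p_fever_given_disease * p_disease + p_fever_given_healthy * (1 - p_disease)
posterior = p_fever_given_disease * p_disease / evidence
print(posterior)                     # P(disease = true | fever = true), about 0.77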

PROGRAM:

import pandas as pd

inside = pd.read_csv('./covid/data/00/COVID19_2020_open_line_list - Hubei.csv',
                     low_memory=False)
outside = pd.read_csv('./covid/data/00/COVID19_2020_open_line_list - outside_Hubei.csv',
                      low_memory=False)

outside = outside.drop(['data_moderator_initials'], axis=1)

data = pd.concat([inside, outside])

import json
import itertools
from datetime import datetime

with open('./covid/data/00/symptom-mapping.json', 'r') as f:
    symptom_map = json.load(f)

def tokenize(s):
    # Split a free-text symptom string into lower-cased tokens
    if s is None or isinstance(s, float) or len(s) < 1 or pd.isna(s):
        return None
    try:
        delim = ';' if ';' in s else ','
        return [t.strip().lower() for t in s.split(delim) if len(t.strip()) > 0]
    except:
        return s

def map_to_symptoms(s):
    # Map a raw symptom token to the canonical symptom names
    if s.startswith('fever') or s.startswith('low fever'):
        return ['fever']
    return [k for k, v in symptom_map.items() if s in v]

d = data[['symptoms']].dropna(how='all').copy(deep=True)
print(d.shape)

d.symptoms = d.symptoms.apply(lambda s: tokenize(s))
d.symptoms = d.symptoms.apply(lambda tokens: [map_to_symptoms(s) for s in tokens] if tokens is not None else None)
d.symptoms = d.symptoms.apply(lambda arrs: None if arrs is None else list(itertools.chain(*arrs)))

# One indicator column per canonical symptom
for s in symptom_map.keys():
    d[s] = d.symptoms.apply(lambda arr: 0 if arr is None else 1 if s in arr else 0)

d = d.drop(['symptoms'], axis=1)
print(d.shape)

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

plt.style.use('seaborn')

# Frequency of each symptom across the records
v = [d[d[c] == 1].shape[0] for c in d.columns]
s = pd.Series(v, d.columns)

fig, ax = plt.subplots(figsize=(15, 5))
_ = s.plot(kind='bar', ax=ax, title=f'Frequency of symptoms, n={d.shape[0]}')
plt.tight_layout()

from pybbn.graph.dag import Bbn
from pybbn.pptc.inferencecontroller import InferenceController
import json

with open('./covid/naive.json', 'r') as f:
    bbn = Bbn.from_dict(json.load(f))
join_tree = InferenceController.apply(bbn)

from pybbn.generator.bbngenerator import convert_for_drawing
import networkx as nx
import warnings

with warnings.catch_warnings():
    warnings.simplefilter('ignore')

    graph = convert_for_drawing(bbn)
    pos = nx.nx_agraph.graphviz_layout(graph, prog='neato')

    plt.figure(figsize=(15, 8))
    plt.subplot(121)
    labels = dict([(k, node.variable.name) for k, node in bbn.nodes.items()])
    nx.draw(graph, pos=pos, with_labels=True, labels=labels)
    plt.title('BBN DAG')

def potential_to_series(potential):
    def get_entry_kv(entry):
        arr = [(k, v) for k, v in entry.entries.items()]
        arr = sorted(arr, key=lambda tup: tup[0])
        return arr[0][1], entry.value

    tups = [get_entry_kv(e) for e in potential.entries]
    return pd.Series([tup[1] for tup in tups], [tup[0] for tup in tups])

series = [(node, potential_to_series(join_tree.get_bbn_potential(node))) for node in join_tree.get_bbn_nodes()]

n_cols = 3
n_rows = int(len(series) / n_cols)

fig, axes = plt.subplots(n_rows, n_cols, figsize=(10, 20))
axes = np.ravel(axes)

for ax, (node, s) in zip(axes, series):
    s.plot(kind='bar', ax=ax, title=f'{node.variable.name}')

plt.tight_layout()

%%time

from pybbn.graph.jointree import EvidenceBuilder

names = [
    'anosmia', 'sputum', 'muscle', 'chills', 'fever',
    'wheezing', 'nasal', 'fatigue', 'headache', 'sore_throat',
    'dry_cough', 'diarrhoea', 'dyspnea', 'nausea', 'sneezing',
    'running_nose'
]

predictions = []
for i, r in d.iterrows():
    # Collect the symptoms observed for this patient record
    fields = [name for name in names if r[name] == 1]

    join_tree.unobserve_all()

    if len(fields) > 0:
        bbn_nodes = [join_tree.get_bbn_node_by_name(f) for f in fields]
        evidences = [EvidenceBuilder().with_node(n).with_evidence('t', 1.0).build() for n in bbn_nodes]
        join_tree.update_evidences(evidences)

    disease = join_tree.get_bbn_node_by_name('disease')
    disease_potential = join_tree.get_bbn_potential(disease)
    s = potential_to_series(disease_potential)

    predictions.append(s)

predictions = pd.DataFrame(predictions)
predictions

Output:

(656, 1)
(656, 32)
CPU times: user 6.85 s, sys: 40.2 ms, total: 6.89 s
Wall time: 6.93 s

Result:
Thus, we have successfully written a program to construct a Bayesian network to diagnose
CORONA infection using the standard WHO data set.
EX NO:7
DATE: EM Algorithm & K-Means Algorithm

AIM:
To write a program to apply the EM algorithm to cluster a set of data stored in a .CSV file. Use
the same data set for clustering using the k-Means algorithm. Compare the results of these
two algorithms.

ALGORITHM:

K-Means Algorithm:
Step-1: Select the number K to decide the number of clusters.

Step-2: Select K random points as centroids (they need not come from the input dataset).

Step-3: Assign each data point to its closest centroid, which forms the predefined K
clusters.
Step-4: Calculate the variance and place a new centroid for each cluster.

Step-5: Repeat the third step, reassigning each data point to the new closest centroid of
its cluster.

Step-6: If any reassignment occurs, go to Step-4; otherwise go to FINISH.

Step-7: The model is ready.

EM Algorithm:

The EM algorithm underlies various unsupervised ML algorithms, such as the k-means
clustering algorithm. Being an iterative approach, it consists of two modes. In the
first mode, we estimate the missing or latent variables; hence it is referred to as the
expectation/estimation step (E-step). The other mode is used to optimize the
parameters of the model so that it can explain the data more clearly; it is known
as the maximization step (M-step).

• Expectation step (E-step): It involves the estimation (guess) of all missing values in
  the dataset so that, after completing this step, there should not be any missing value.
• Maximization step (M-step): This step involves the use of the estimated data from the
  E-step to update the parameters.
• Repeat the E-step and M-step until the values converge.

The primary goal of the EM algorithm is to use the available observed data of the dataset to
estimate the missing data of the latent variables and then use that data to update the values of
the parameters in the M-step.
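
A single K-Means iteration (assignment step followed by centroid update) can be written in a few lines; the sketch below uses hypothetical one-dimensional points, while the program that follows relies on scikit-learn's KMeans and GaussianMixture implementations instead:

# Sketch of one K-Means iteration on hypothetical 1-D points (K = 2).
import numpy as np

points = np.array([1.0, 1.2, 0.8, 5.0, 5.2, 4.8])
centroids = np.array([0.0, 6.0])                   # initial centroids

# Assignment step: attach each point to its closest centroid
labels = np.argmin(np.abs(points[:, None] - centroids[None, :]), axis=1)

# Update step: recompute each centroid as the mean of its assigned points
centroids = np.array([points[labels == k].mean() for k in range(2)])
print(labels, centroids)    # [0 0 0 1 1 1] [1. 5.]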

PROGRAM:

from sklearn.cluster import KMeans


from sklearn import preprocessing
from sklearn.mixture import GaussianMixture
from sklearn.datasets import load_iris
import sklearn.metrics as sm
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

dataset=load_iris()
# print(dataset)

X=pd.DataFrame(dataset.data)
X.columns=['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width']
y=pd.DataFrame(dataset.target)
y.columns=['Targets']
# print(X)

plt.figure(figsize=(14,7))
colormap=np.array(['red','lime','black'])
# REAL PLOT
plt.subplot(1,3,1)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[y.Targets],s=40)
plt.title('Real')

# K-PLOT
plt.subplot(1,3,2)
model=KMeans(n_clusters=3)
model.fit(X)
predY=np.choose(model.labels_,[0,1,2]).astype(np.int64)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[predY],s=40)
plt.title('KMeans')

# GMM PLOT
scaler=preprocessing.StandardScaler()
scaler.fit(X)
xsa=scaler.transform(X)
xs=pd.DataFrame(xsa,columns=X.columns)
gmm=GaussianMixture(n_components=3)
gmm.fit(xs)
y_cluster_gmm=gmm.predict(xs)
plt.subplot(1,3,3)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[y_cluster_gmm],s=40)
plt.title('GMM Classification')

Output:

Text(0.5, 1.0, 'GMM Classification')


Result:
Thus, we have successfully written a program to apply the EM algorithm to cluster a set of data
stored in a .CSV file, used the same data set for clustering with the k-Means algorithm, and
compared the results of the two algorithms.
EX NO:8
DATE: K-Nearest Neighbour Algorithm

AIM:
To write a program to implement the k-Nearest Neighbour algorithm to classify the iris data set.
Print both correct and wrong predictions.

ALGORITHM:

K-Nearest Neighbor Algorithm:

Step-1: Select the number K of the neighbors


Step-2: Calculate the Euclidean distance of K number of neighbors
Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
Step-4: Among these k neighbors, count the number of the data points in each category.
Step-5: Assign the new data points to that category for which the number of the neighbor is
maximum.
Step-6: Our model is ready.
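
The same steps can be carried out by hand for one query point. The sketch below uses hypothetical 2-D training points with K = 3; the program that follows delegates this work to scikit-learn's KNeighborsClassifier:

# Sketch of the k-Nearest Neighbour vote for one query point (K = 3).
# The training points, labels and query are hypothetical illustration values.
import numpy as np
from collections import Counter

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1, 0])
query = np.array([1.1, 0.9])

distances = np.linalg.norm(X_train - query, axis=1)   # Step 2: Euclidean distances
nearest = np.argsort(distances)[:3]                   # Step 3: the K nearest neighbours
votes = Counter(y_train[nearest])                     # Step 4: count labels among them
print(votes.most_common(1)[0][0])                     # Step 5: majority class, here 0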

PROGRAM:

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
import numpy as np

dataset = load_iris()

X_train, X_test, y_train, y_test = train_test_split(dataset["data"], dataset["target"], random_state=0)

kn = KNeighborsClassifier(n_neighbors=1)
kn.fit(X_train, y_train)

for i in range(len(X_test)):
    x = X_test[i]
    x_new = np.array([x])
    prediction = kn.predict(x_new)
    print("TARGET=", y_test[i], dataset["target_names"][y_test[i]],
          "PREDICTED=", prediction, dataset["target_names"][prediction])
print(kn.score(X_test, y_test))

Output:

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=1, p=2,
                     weights='uniform')

TARGET= 2 virginica PREDICTED= [2] ['virginica']


TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 2 virginica PREDICTED= [2] ['virginica']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 2 virginica PREDICTED= [2] ['virginica']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 2 virginica PREDICTED= [2] ['virginica']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 2 virginica PREDICTED= [2] ['virginica']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 2 virginica PREDICTED= [2] ['virginica']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 2 virginica PREDICTED= [2] ['virginica']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 2 virginica PREDICTED= [2] ['virginica']
TARGET= 2 virginica PREDICTED= [2] ['virginica']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 1 versicolor PREDICTED= [2] ['virginica']
0.9736842105263158

Result:
Thus, we have successfully written a program to implement the k-Nearest Neighbour algorithm
to classify the iris data set, printing both correct and wrong predictions.
EX NO:9
DATE: Locally Weighted Regression Algorithm

AIM:
To write a program to implement the non-parametric locally weighted regression algorithm in
order to fit data points. select an appropriate data set for your experiment and draw graphs

ALGORITHM:
• LWR is a non-parametric regression technique that fits a linear regression model
  to a dataset by giving more weight to nearby data points.

• LWR fits a separate linear regression model for each query point, based on the
  weights assigned to the training data points.

• The weight assigned to each training data point is inversely related to its
  distance from the query point.

• Training data points that are closer to the query point have a higher weight
  and contribute more to the local linear regression model.

• LWR is useful when a global linear model does not capture the relationship
  between the input and output variables well; the goal is to capture local patterns in
  the data.
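
The distance-based weighting described above is realised with a tricube kernel inside the lowess() function of the program below. A minimal sketch of computing those weights for one query point, with hypothetical inputs and bandwidth:

# Sketch of the tricube weights used inside lowess(); the inputs and bandwidth are
# hypothetical. Points near the query x0 get weights near 1, distant points near 0.
import numpy as np

x = np.linspace(0, 10, 11)     # training inputs
x0 = 3.0                       # query point
h = 4.0                        # bandwidth: distance beyond which the weight is 0

w = np.clip(np.abs(x - x0) / h, 0.0, 1.0)
w = (1 - w ** 3) ** 3          # tricube kernel
print(np.round(w, 3))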

PROGRAM:

from math import ceil
import numpy as np
from scipy import linalg

def lowess(x, y, f, iterations):
    n = len(x)
    r = int(ceil(f * n))                                   # number of neighbours in each local fit
    h = [np.sort(np.abs(x - x[i]))[r] for i in range(n)]   # bandwidth for each point
    w = np.clip(np.abs((x[:, None] - x[None, :]) / h), 0.0, 1.0)
    w = (1 - w ** 3) ** 3                                  # tricube weight function
    yest = np.zeros(n)
    delta = np.ones(n)
    for iteration in range(iterations):
        for i in range(n):
            weights = delta * w[:, i]
            b = np.array([np.sum(weights * y), np.sum(weights * y * x)])
            A = np.array([[np.sum(weights), np.sum(weights * x)],
                          [np.sum(weights * x), np.sum(weights * x * x)]])
            beta = linalg.solve(A, b)                      # weighted least-squares fit
            yest[i] = beta[0] + beta[1] * x[i]

        # Robustness step: down-weight points with large residuals
        residuals = y - yest
        s = np.median(np.abs(residuals))
        delta = np.clip(residuals / (6.0 * s), -1, 1)
        delta = (1 - delta ** 2) ** 2

    return yest

import math
n = 100
x = np.linspace(0, 2 * math.pi, n)
y = np.sin(x) + 0.3 * np.random.randn(n)
f = 0.25
iterations = 3
yest = lowess(x, y, f, iterations)

import matplotlib.pyplot as plt

plt.plot(x, y, "r.")
plt.plot(x, yest, "b-")

Output:

[<matplotlib.lines.Line2D at 0x37459696d8>]

Result:
Thus, we have successfully written a program to implement the non-parametric locally
weighted regression algorithm in order to fit data points, selecting an appropriate data set for
the experiment and drawing the corresponding graphs.
