ML Lab Manual
INSTITUTE
VISION
Emerging as a technical institution of high standard and excellence to produce quality Engineers,
Researchers, Administrators and Entrepreneurs with ethical and moral values, contributing to the
sustainable development of society.
MISSION
MI: To impart in-depth domain knowledge with analytical and practical skills in cutting-edge
technologies through quality technical education.
MII: To develop industry-ready, multi-skilled personalities who transfer technology to industries and
rural areas, by creating interest among students in Research and Development and Entrepreneurship.
DEPARTMENT
VISION
To impart holistic education with niche technologies for the enrichment of knowledge and skills
through an updated curriculum and inspired learning.
MISSION
To impart value-based AI education to the students for developing intelligent systems and
innovative products that address societal problems with ethical values.
The Programme Educational Objectives of B.Tech (Artificial Intelligence and Data Science) are
listed below:
PEO-1: Our Graduates are competent in building intelligent machines, software, or applications
with a cutting-edge combination of machine learning, analytics and visualization
technologies to identify new opportunities.
PEO-2: Our Graduates adapt to new technologies and develop solutions to real-world problems
with ethical practices, enhancing their own stature and contributing to society.
PEO-3: Our Graduates pursue continuing education to fulfil their lifelong goals and succeed as
professionals in industry, government, academia, research and consultancy.
The Programme Specific Outcomes of B.Tech (Artificial Intelligence and Data Science) are
listed below:
PSO-1: Apply fundamental knowledge to develop intelligent systems using computational
principles, methods and systems for extracting knowledge from data, to identify, formulate
and solve real-time problems and societal issues for sustainable development.
PSO-2: Enrich their abilities to qualify for employment, higher studies and research in Artificial
Intelligence and Data Science with ethical values.
GNANAMANI COLLEGE OF TECHNOLOGY (AUTONOMOUS)
Accredited by NBA & NAAC with “A” grade
A.K.Samuthiram, Pachal (PO), Namakkal – 637 018
CONTENTS
OBJECTIVES:
To understand data sets and apply suitable algorithms for selecting the appropriate
features for analysis.
To learn to implement supervised machine learning algorithms on standard datasets and
evaluate their performance.
To experiment with unsupervised machine learning algorithms on standard datasets and
evaluate their performance.
To build graph-based learning models for standard data sets.
To compare the performance of different ML algorithms and select the suitable one based on
the application.
LIST OF EXPERIMENTS:
1. For a given set of training data examples stored in a .CSV file, implement and demonstrate the
Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent
with the training examples.
2. Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an
appropriate data set for building the decision tree and apply this knowledge to classify a new
sample.
3. Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the
same using appropriate data sets.
4. Write a program to implement the naïve Bayesian classifier for a sample training data set stored
as a .CSV file and compute the accuracy with a few test data sets.
5. Implement naïve Bayesian Classifier model to classify a set of documents and measure the
accuracy, precision, and recall.
6. Write a program to construct a Bayesian network to diagnose CORONA infection using standard
WHO Data Set.
7. Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for
clustering using the k-Means algorithm. Compare the results of these two algorithms.
8. Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set. Print
both correct and wrong predictions.
9. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points.
Select an appropriate data set for your experiment and draw graphs.
OUTCOMES:
At the end of this course, the students will be able to:
CO1: Apply suitable algorithms for selecting the appropriate features for analysis.
CO2: Implement supervised machine learning algorithms on standard datasets and evaluate their
performance.
CO3: Apply unsupervised machine learning algorithms on standard datasets and evaluate their
performance.
CO4: Build graph-based learning models for standard data sets.
CO5: Assess and compare the performance of different ML algorithms and select the suitable one based
on the application.
EX NO:1
DATE: Implementation of Candidate-Elimination algorithm
AIM:
To implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of
all hypotheses consistent with the training examples.
ALGORITHM:
Step 1: Load the training examples from the .CSV file.
Step 2: Initialise the specific hypothesis S to the first positive example and the general hypothesis G to the most general hypothesis.
Step 3: For each positive example, generalise S just enough to cover the example.
Step 4: For each negative example, specialise G just enough to exclude the example, keeping it consistent with S.
Step 5: Output the final S and G, which bound the version space of all hypotheses consistent with the training examples.
PROGRAM:
import numpy as np
import pandas as pd

data = pd.read_csv('https://raw.githubusercontent.com/manishsingh7163/Machine-Learning-Algorithms/master/Candidate%20Elimination%20Algorithm/enjoysport.csv')
concepts = np.array(data.iloc[:, :-1])
target = np.array(data.iloc[:, -1])

def learn(concepts, target):
    # Initialise S with the first example and G with the most general hypotheses
    specific_h = concepts[0].copy()
    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    for i, h in enumerate(concepts):
        if target[i] == "yes":
            # Positive example: generalise S wherever it disagrees with the example
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        if target[i] == "no":
            # Negative example: specialise G using the attribute values of S
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
        print("\n\nFor Training instance No:{0} the hypothesis is\n".format(i))
        print("Specific Hypothesis: ", specific_h)
        print("General Hypothesis: ", general_h)
    # Drop the rows of G that stayed fully general
    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
    for i in indices:
        general_h.remove(['?', '?', '?', '?', '?', '?'])
    return specific_h, general_h
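# The listing above defines learn() but never calls it; a minimal driver
# (a sketch, relying on the return statement added at the end of learn()):
s_final, g_final = learn(concepts, target)
print("\nFinal Specific Hypothesis:", s_final)
print("Final General Hypothesis:", g_final)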
Output:
Result:
Thus, we have successfully implemented and demonstrated the Candidate-Elimination algorithm to
output a description of the set of all hypotheses consistent with the training examples.
EX NO:2
DATE: Decision Tree Based ID3 Algorithm
AIM:
To write a program to demonstrate the working of the decision tree based ID3 algorithm, use
an appropriate data set for building the decision tree, and apply this knowledge to classify a
new sample.
ALGORITHM:
Step 1: Load the training data set.
Step 2: Compute the entropy of the target attribute for the current set of examples.
Step 3: For each candidate attribute, compute the information gain of splitting on it.
Step 4: Choose the attribute with the highest gain as the decision node and split the examples on its values.
Step 5: Recurse on each subset until the examples are pure or no attributes remain, then label the leaves.
Step 6: Use the resulting tree to classify new samples.
PROGRAM:
import pandas as pd
# DataFrame.from_csv was removed in modern pandas; read_csv is the replacement
df_tennis = pd.read_csv('PlayTennis.csv')
print("\n Given Play Tennis Data Set:\n\n", df_tennis)
#Function to calculate the entropy of a list of class probabilities: sum of -p*log2(p)
def entropy(probs):
    import math
    return sum([-prob * math.log(prob, 2) for prob in probs])

#Function to calculate the entropy of the given data set/list with respect to the target attribute
def entropy_of_list(a_list):
    from collections import Counter
    cnt = Counter(x for x in a_list)  # Counter tallies each class
    num_instances = len(a_list) * 1.0  # e.g. 14 for the full PlayTennis set
    print("\n Number of Instances of the Current Sub Class is {0}:".format(num_instances))
    probs = [x / num_instances for x in cnt.values()]  # x is the count of YES/NO
    print("\n Classes:", min(cnt), max(cnt))
    print(" \n Probabilities of Class {0} is {1}:".format(min(cnt), min(probs)))
    print(" \n Probabilities of Class {0} is {1}:".format(max(cnt), max(probs)))
    return entropy(probs)
total_entropy = entropy_of_list(df_tennis['PlayTennis'])
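# NOTE: the listing jumps from the overall entropy straight into the tail of the
# recursive split; the information_gain helper and the opening of id3 are missing.
# A minimal sketch (an assumption, modelled on the classic Mitchell-style ID3,
# with names chosen to match the calls below):
from collections import Counter

def information_gain(df, split_attribute_name, target_attribute_name):
    # entropy before the split minus the weighted entropy of each subset
    print("Information Gain Calculation of", split_attribute_name)
    df_split = df.groupby(split_attribute_name)
    nobs = len(df.index) * 1.0
    df_agg_ent = df_split.agg({target_attribute_name:
                               [entropy_of_list, lambda x: len(x) / nobs]})
    df_agg_ent.columns = ['Entropy', 'PropObservations']
    new_entropy = sum(df_agg_ent['Entropy'] * df_agg_ent['PropObservations'])
    old_entropy = entropy_of_list(df[target_attribute_name])
    return old_entropy - new_entropy

def id3(df, target_attribute_name, attribute_names, default_class=None):
    cnt = Counter(x for x in df[target_attribute_name])
    if len(cnt) == 1:
        return next(iter(cnt))            # pure subset: return the single class
    elif df.empty or (not attribute_names):
        return default_class              # nothing left to split on
    else:
        default_class = max(cnt.keys())   # fallback class for unseen branches
        gainz = [information_gain(df, attr, target_attribute_name)
                 for attr in attribute_names]
        best_attr = attribute_names[gainz.index(max(gainz))]
        tree = {best_attr: {}}            # root node of this subtree
        remaining_attribute_names = [a for a in attribute_names if a != best_attr]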
        # Split the dataset on each value of the best attribute; on each split,
        # recursively call this algorithm and populate the tree with the
        # subtrees returned by the recursive calls
        for attr_val, data_subset in df.groupby(best_attr):
            subtree = id3(data_subset,
                          target_attribute_name,
                          remaining_attribute_names,
                          default_class)
            tree[best_attr][attr_val] = subtree
        return tree
# Get Predictor Names (all but 'class')
attribute_names = list(df_tennis.columns)
print("List of Attributes:", attribute_names)
attribute_names.remove('PlayTennis') #Remove the class attribute
print("Predicting Attributes:", attribute_names)
# Run Algorithm:
from pprint import pprint
tree = id3(df_tennis,'PlayTennis',attribute_names)
print("\n\nThe Resultant Decision Tree is :\n")
#print(tree)
pprint(tree)
attribute = next(iter(tree))
print("Best Attribute :\n",attribute)
print("Tree Keys:\n",tree[attribute].keys())
def classify(instance, tree, default=None):  # classify an instance of PlayTennis
    # the root key of the (sub)tree is the attribute to test at this node
    attribute = next(iter(tree))
    if instance[attribute] in tree[attribute].keys():  # value of the attribute is among the tree keys
        result = tree[attribute][instance[attribute]]
        if isinstance(result, dict):  # this is a subtree, delve deeper
            return classify(instance, result, default)
        else:
            return result  # this is a leaf label
    else:
        return default
df_tennis['predicted'] = df_tennis.apply(classify, axis=1, args=(tree, 'No'))
# classify takes a default argument: when the tree has no answer for a particular
# combination of attribute values, 'No' is used as the default guess
print(df_tennis['predicted'])
print("Accuracy is:\n", sum(df_tennis['PlayTennis'] == df_tennis['predicted']) / len(df_tennis.index))
df_tennis[['PlayTennis', 'predicted']]
training_data = df_tennis.iloc[:-4]  # all but the last four instances
test_data = df_tennis.iloc[-4:]      # just the last four
train_tree = id3(training_data, 'PlayTennis', attribute_names)
test_data['predicted2'] = test_data.apply(classify, axis=1, args=(train_tree, 'Yes'))
print("Accuracy is :", sum(test_data['PlayTennis'] == test_data['predicted2']) / len(test_data.index))
OUTPUT:
[... repeated entropy-trace lines ("Classes: No Yes" with the class probabilities
for each candidate split, e.g.
"Probabilities of Class No is 0.35714285714285715:",
"Probabilities of Class No is 0.4:") omitted for brevity ...]
Accuracy is:
1.0
PlayTennis predicted
0 No No
1 No No
2 Yes Yes
3 Yes Yes
4 Yes Yes
5 No No
6 Yes Yes
7 No No
8 Yes Yes
9 Yes Yes
10 Yes Yes
11 Yes Yes
12 Yes Yes
13 No No
Information Gain Calculation of Outlook
[... repeated entropy-trace lines omitted for brevity ...]
Accuracy is : 0.75
Result:
Thus, we have successfully implemented and demonstrated the working of the decision tree based ID3
algorithm and applied it to classify new samples.
EX NO:3
DATE: Artificial Neural Network by Backpropagation
AIM:
To build an artificial neural network by implementing the backpropagation algorithm and test
the same using appropriate data sets.
ALGORITHM:
Step 1: Load the training data and create a network with the required number of input, hidden
and output neurons.
Step 2: Initialise the weights W; weights are usually chosen randomly.
Step 3: Calculate the output of each neuron from the input layer, through the hidden layer, to the
output layer.
Step 4: Propagate the error backwards from the output layer, computing a delta for each neuron,
and update the weights using the learning rate.
Step 5: Repeat steps 3 and 4 for every training row over a fixed number of epochs.
PROGRAM:
import random
from math import exp
from random import seed
# Initialize a network
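# NOTE: the definitions below are missing from the original listing; a minimal
# sketch (an assumption, in the classic network-as-list-of-dicts style) so that
# initialize_network(), forward_propagate() and transfer_derivative() resolve.
def initialize_network(n_inputs, n_hidden, n_outputs):
    # one hidden layer and one output layer; each neuron stores its weights,
    # with the bias as the last weight
    network = list()
    hidden_layer = [{'weights': [random.random() for i in range(n_inputs + 1)]}
                    for i in range(n_hidden)]
    network.append(hidden_layer)
    output_layer = [{'weights': [random.random() for i in range(n_hidden + 1)]}
                    for i in range(n_outputs)]
    network.append(output_layer)
    return network

# Calculate a neuron's activation: weighted sum of inputs plus bias
def activate(weights, inputs):
    activation = weights[-1]
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

# Sigmoid transfer function and its derivative
def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

def transfer_derivative(output):
    return output * (1.0 - output)

# Forward propagate an input row through the network, layer by layer
def forward_propagate(network, row):
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            neuron['output'] = transfer(activate(neuron['weights'], inputs))
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs

# Backpropagate the error and store a 'delta' on each neuron; the fragment that
# follows is the body of this loop from the original listing.
def backward_propagate_error(network, expected):
    for i in reversed(range(len(network))):
        layer = network[i]
        errors = list()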
        if i != len(network) - 1:
            # hidden layer: error is the delta-weighted sum from the next layer
            for j in range(len(layer)):
                error = 0.0
                for neuron in network[i + 1]:
                    error += (neuron['weights'][j] * neuron['delta'])
                errors.append(error)
        else:
            # output layer: error is the difference from the expected output
            for j in range(len(layer)):
                neuron = layer[j]
                errors.append(expected[j] - neuron['output'])
        for j in range(len(layer)):
            neuron = layer[j]
            neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])
#Network Initialization
seed(1)  # seed for reproducibility (choice arbitrary)
# training data: two input coordinates, last value is the class label
# (values reproduced from "The input Data Set" shown in the output below)
dataset = [[2.7810836, 2.550537003, 0], [1.465489372, 2.362125076, 0],
           [3.396561688, 4.400293529, 0], [1.38807019, 1.850220317, 0],
           [3.06407232, 3.005305973, 0], [7.627531214, 2.759262235, 1],
           [5.332441248, 2.088626775, 1], [6.922596716, 1.77106367, 1],
           [8.675418651, -0.242068655, 1], [7.673756466, 3.508563011, 1]]
print("The input Data Set :\n", dataset)
n_inputs = len(dataset[0]) - 1
print("Number of Inputs :\n", n_inputs)
n_outputs = len(set([row[-1] for row in dataset]))
print("Number of Outputs :\n", n_outputs)
network = initialize_network(n_inputs, 2, n_outputs)
i = 1
for layer in network:
    j = 1
    for sub in layer:
        print("\n Layer[%d] Node[%d]:\n" % (i, j), sub)
        j = j + 1
    i = i + 1
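# The listing ends here, but the output below shows a trained network and
# "Expected=..., Got=..." predictions; a minimal sketch of the missing training
# and prediction code (an assumption, consistent with that output):
def update_weights(network, row, l_rate):
    # move each weight by learning-rate * delta * input
    for i in range(len(network)):
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] += l_rate * neuron['delta']

def train_network(network, train, l_rate, n_epoch, n_outputs):
    # one forward + backward pass per training row, per epoch
    for epoch in range(n_epoch):
        for row in train:
            forward_propagate(network, row)
            expected = [0 for i in range(n_outputs)]
            expected[row[-1]] = 1          # one-hot encode the class label
            backward_propagate_error(network, expected)
            update_weights(network, row, l_rate)

def predict(network, row):
    outputs = forward_propagate(network, row)
    return outputs.index(max(outputs))     # class with the highest output

train_network(network, dataset, 0.5, 20, n_outputs)
for row in dataset:
    print('Expected=%d, Got=%d' % (row[-1], predict(network, row)))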
Output:
The input Data Set :
[[2.7810836, 2.550537003, 0], [1.465489372, 2.362125076, 0], [3.396561688, 4.400293529,
0], [1.38807019, 1.850220317, 0], [3.06407232, 3.005305973, 0], [7.627531214,
2.759262235, 1], [5.332441248, 2.088626775, 1], [6.922596716, 1.77106367, 1],
[8.675418651, -0.242068655, 1], [7.673756466, 3.508563011, 1]]
Number of Inputs :
2
Number of Outputs :
2
The initialised Neural Network:
Layer[1] Node[1]:
{'weights': [0.4560342718892494, 0.4478274870593494, -0.4434486322731913]}
Layer[1] Node[2]:
{'weights': [-0.41512800484107837, 0.33549887812944956, 0.2359699890685233]}
Layer[2] Node[1]:
{'weights': [0.1697304014402209, -0.1918635424108558, 0.10594416567846243]}
Layer[2] Node[2]:
{'weights': [0.10680173364083789, 0.08120401711200309, -0.3416171297451944]}
Layer[1] Node[2]:
{'weights': [-1.2934302410111025, 1.7109363237151507, 0.7125327507327329], 'output':
0.04760703296164151, 'delta': -0.005928559978815076}
Layer[2] Node[1]:
{'weights': [-1.3098359335096292, 2.16462207144596, -0.3079052288835876], 'output':
0.19895563952058462, 'delta': -0.03170801648036037}
Layer[2] Node[2]:
{'weights': [1.5506793402414165, -2.11315950446121, 0.1333585709422027], 'output':
0.8095042653312078, 'delta': 0.029375796661413225}
Expected=0, Got=0
Expected=0, Got=0
Expected=0, Got=0
Expected=0, Got=0
Expected=0, Got=0
Expected=1, Got=1
Expected=1, Got=1
Expected=1, Got=1
Expected=1, Got=1
Expected=1, Got=1
Result:
Thus, we have successfully built an artificial neural network by implementing the
backpropagation algorithm and tested it using an appropriate data set.
EX NO:4
DATE: Implementation Of Naïve Bayesian Classifier
AIM:
To write a program to implement the naïve Bayesian classifier for a sample training data set
stored as a .CSV file and compute the accuracy with a few test data sets.
ALGORITHM:
Step 1: Load the training data from the .CSV file.
Step 2: Encode the categorical attributes as integers using LabelEncoder.
Step 3: Split the data into training and test sets.
Step 4: Fit a Gaussian naïve Bayes classifier on the training set.
Step 5: Predict on the test set and compute the accuracy.
PROGRAM:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# assuming the PlayTennis data shown in the output below is stored as PlayTennis.csv
data = pd.read_csv('PlayTennis.csv')
print("The first 5 values of data is :\n", data.head())
X = data.iloc[:, :-1]
y = data.iloc[:, -1]
print("\nThe first 5 values of Train output is\n", y.head())
# encode each categorical attribute as integers
le_Outlook = LabelEncoder()
X.Outlook = le_Outlook.fit_transform(X.Outlook)
le_Temperature = LabelEncoder()
X.Temperature = le_Temperature.fit_transform(X.Temperature)
le_Humidity = LabelEncoder()
X.Humidity = le_Humidity.fit_transform(X.Humidity)
le_Windy = LabelEncoder()
X.Windy = le_Windy.fit_transform(X.Windy)
le_PlayTennis = LabelEncoder()
y = le_PlayTennis.fit_transform(y)
print("\nNow the Train output is\n", y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
classifier = GaussianNB()
classifier.fit(X_train, y_train)
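# The listing stops after fitting the classifier; a minimal evaluation sketch
# (an assumption, using accuracy_score from sklearn.metrics) to compute the
# accuracy called for in the aim:
from sklearn.metrics import accuracy_score
y_pred = classifier.predict(X_test)
print("Accuracy is:", accuracy_score(y_test, y_pred))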
Output:
The first 5 values of data is :
Outlook Temperature Humidity Windy PlayTennis
0 Sunny Hot High False No
1 Sunny Hot High True No
2 Overcast Hot High False Yes
3 Rainy Mild High False Yes
4 Rainy Cool Normal False Yes
Result:
Thus, we have successfully written a program to implement the naïve Bayesian classifier for
a sample training data set stored as a .CSV file and computed the accuracy with a few test data
sets.
EX NO:5
DATE: Naïve Bayesian Classifier Model for Document Classification
AIM:
To write a program to implement the naïve Bayesian Classifier model to classify a set of
documents and measure the accuracy, precision, and recall.
ALGORITHM:
Step 1: Data pre-processing: load the documents, map the pos/neg labels to 1/0, and split into
train and test sets.
Step 2: Convert the text into a document-term matrix with CountVectorizer.
Step 3: Fit a multinomial naïve Bayes classifier on the training matrix.
Step 4: Predict on the test matrix and compute accuracy, precision, recall and the confusion matrix.
PROGRAM:
import pandas as pd
msg = pd.read_csv('document.csv', names=['message', 'label'])
print("Total Instances of Dataset: ", msg.shape[0])
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum
from sklearn.model_selection import train_test_split
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)
from sklearn.feature_extraction.text import CountVectorizer
count_v = CountVectorizer()
Xtrain_dm = count_v.fit_transform(Xtrain)
Xtest_dm = count_v.transform(Xtest)
# get_feature_names was removed in newer scikit-learn; use get_feature_names_out
df = pd.DataFrame(Xtrain_dm.toarray(), columns=count_v.get_feature_names_out())
print(df[0:5])
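# The listing stops after printing the document-term matrix; a minimal sketch of
# the missing classification and metrics code (an assumption, consistent with the
# metrics printed in the output below):
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

clf = MultinomialNB()
clf.fit(Xtrain_dm, ytrain)
pred = clf.predict(Xtest_dm)

print('Accuracy Metrics:')
print('Accuracy:', accuracy_score(ytest, pred))
print('Recall:', recall_score(ytest, pred))
print('Precision:', precision_score(ytest, pred))
print('Confusion Matrix:\n', confusion_matrix(ytest, pred))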
Output:
Total Instances of Dataset: 18
about am an and awesome bad beers best boss can ... tired t
o
0 0 1 0 1 0 0 0 0 0 0 ... 1
0
1 0 0 0 0 0 0 0 0 0 0 ... 0
0
2 0 0 0 0 0 0 0 0 0 0 ... 0
0
3 0 0 0 0 0 0 0 0 0 1 ... 0
0
4 0 0 0 0 0 0 0 0 0 0 ... 0
0
[5 rows x 49 columns]
Accuracy Metrics:
Accuracy: 0.6
Recall: 0.5
Precision: 1.0
Confusion Matrix:
[[1 0]
[2 2]]
Result:
Thus, we have successfully written a program to implement the naïve Bayesian Classifier model
to classify a set of documents and measured the accuracy, precision, and recall.
EX NO:6
DATE: Bayesian network to diagnose CORONA infection
AIM:
To write a program to construct a Bayesian network to diagnose CORONA infection using the
standard WHO Data Set.
ALGORITHM:
Step 1: Data pre-processing: load the WHO line-list data and tokenise the free-text symptoms column.
Step 2: Map the raw symptom strings onto a fixed set of symptom variables (one binary column each).
Step 3: Build the Bayesian network (BBN) over the symptom and disease nodes and convert it to a join tree.
Step 4: For each patient, set the observed symptoms as evidence and query the posterior of the disease node.
PROGRAM:
import pandas as pd
import json
import itertools
from datetime import datetime
def tokenize(s):
    # treat None / NaN / empty values as missing
    if s is None or isinstance(s, float) or len(s) < 1 or pd.isna(s):
        return None
    try:
        delim = ';' if ';' in s else ','
        return [t.strip().lower() for t in s.split(delim) if len(t.strip()) > 0]
    except Exception:  # fall back to the raw string on unexpected input
        return s
def map_to_symptoms(s):
    if s.startswith('fever') or s.startswith('low fever'):
        return ['fever']
    return [k for k, v in symptom_map.items() if s in v]
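# NOTE: symptom_map is referenced above but never defined in the listing; a
# hypothetical fragment to illustrate its shape (the real mapping is built from
# the WHO data set, with one entry per symptom in the `names` list further below):
symptom_map = {
    'fever': ['fever', 'low fever', 'high fever'],
    'dry_cough': ['dry cough', 'cough'],
    'sore_throat': ['sore throat', 'throat pain'],
}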
# `data` is the WHO line-list DataFrame loaded earlier (not shown); its
# `symptoms` column has already been tokenised with tokenize() above
d = data[['symptoms']].dropna(how='all').copy(deep=True)
print(d.shape)
# one binary indicator column per canonical symptom
for s in symptom_map.keys():
    d[s] = d.symptoms.apply(lambda arr: 0 if arr is None else 1 if s in arr else 0)
d = d.drop(['symptoms'], axis=1)
print(d.shape)
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import networkx as nx  # needed for the graph drawing below
import warnings
plt.style.use('seaborn')  # older matplotlib style name; 'seaborn-v0_8' on newer versions
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    graph = convert_for_drawing(bbn)
    pos = nx.nx_agraph.graphviz_layout(graph, prog='neato')

plt.figure(figsize=(15, 8))
plt.subplot(121)
labels = dict([(k, node.variable.name) for k, node in bbn.nodes.items()])
nx.draw(graph, pos=pos, with_labels=True, labels=labels)
plt.title('BBN DAG')
def potential_to_series(potential):
    # convert a pybbn potential into a pandas Series indexed by state name
    def get_entry_kv(entry):
        arr = [(k, v) for k, v in entry.entries.items()]
        arr = sorted(arr, key=lambda tup: tup[0])
        return arr[0][1], entry.value
    # (completing lines: the original listing breaks off after the inner helper)
    tups = [get_entry_kv(e) for e in potential.entries]
    return pd.Series([t[1] for t in tups], index=[t[0] for t in tups])
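# NOTE: the cell below assumes a Bayesian network `bbn` and its join tree that
# are built in cells not shown here; a minimal sketch of the missing plumbing,
# using names from the py-bbn package (an assumption):
from pybbn.graph.jointree import EvidenceBuilder
from pybbn.pptc.inferencecontroller import InferenceController
join_tree = InferenceController.apply(bbn)  # `bbn` built earlier (not shown)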
%%time
names = [
    'anosmia', 'sputum', 'muscle', 'chills', 'fever',
    'wheezing', 'nasal', 'fatigue', 'headache', 'sore_throat',
    'dry_cough', 'diarrhoea', 'dyspnea', 'nausea', 'sneezing',
    'running_nose'
]
predictions = []
for i, r in d.iterrows():
    # symptoms observed for this patient
    fields = [name for name in names if r[name] == 1]
    join_tree.unobserve_all()
    if len(fields) > 0:
        bbn_nodes = [join_tree.get_bbn_node_by_name(f) for f in fields]
        evidences = [EvidenceBuilder().with_node(n).with_evidence('t', 1.0).build()
                     for n in bbn_nodes]
        join_tree.update_evidences(evidences)
    disease = join_tree.get_bbn_node_by_name('disease')
    disease_potential = join_tree.get_bbn_potential(disease)
    s = potential_to_series(disease_potential)
    predictions.append(s)

predictions = pd.DataFrame(predictions)
predictions
Output:
(656, 1)
(656, 32)
CPU times: user 6.85 s, sys: 40.2 ms, total: 6.89 s
Wall time: 6.93 s
Result:
Thus, we have successfully constructed a Bayesian network to diagnose CORONA infection using
the standard WHO data set.
EX NO:7
DATE: EM Algorithm & K-Means Algorithm
AIM:
To write a program to apply EM algorithm to cluster a set of data stored in a .CSV file. Use
the same data set for clustering using the k-Means algorithm. Compare the results of these
two algorithms.
ALGORITHM:
K-Means Algorithm:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as the initial centroids (they need not come from the input
dataset).
Step-3: Assign each data point to its closest centroid, forming the predefined K clusters.
Step-4: Recompute each cluster's centroid as the mean of the points assigned to it.
Step-5: Repeat steps 3 and 4, reassigning each datapoint to the new closest centroid, until the
assignments stop changing.
EM Algorithm:
Expectation step (E-step): estimate (guess) the values of all missing values / latent
variables in the dataset, so that after this step there are no missing values.
Maximization step (M-step): use the data estimated in the E-step to update the model
parameters.
Repeat the E-step and M-step until the values converge.
The primary goal of the EM algorithm is to use the available observed data of the dataset to
estimate the missing data of the latent variables, and then to use that data to update the values
of the parameters in the M-step.
PROGRAM:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture

dataset = load_iris()
# print(dataset)
X=pd.DataFrame(dataset.data)
X.columns=['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width']
y=pd.DataFrame(dataset.target)
y.columns=['Targets']
# print(X)
plt.figure(figsize=(14,7))
colormap=np.array(['red','lime','black'])
# REAL PLOT
plt.subplot(1,3,1)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[y.Targets],s=40)
plt.title('Real')
# K-PLOT
plt.subplot(1,3,2)
model=KMeans(n_clusters=3)
model.fit(X)
predY=np.choose(model.labels_,[0,1,2]).astype(np.int64)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[predY],s=40)
plt.title('KMeans')
# GMM PLOT
scaler=preprocessing.StandardScaler()
scaler.fit(X)
xsa=scaler.transform(X)
xs=pd.DataFrame(xsa,columns=X.columns)
gmm=GaussianMixture(n_components=3)
gmm.fit(xs)
y_cluster_gmm=gmm.predict(xs)
plt.subplot(1,3,3)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[y_cluster_gmm],s=40)
plt.title('GMM Classification')
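# The program above compares the two clusterings only visually; a short numeric
# comparison (a sketch, using adjusted_rand_score from sklearn.metrics) can
# quantify how well each clustering matches the true labels:
from sklearn.metrics import adjusted_rand_score
print("K-Means vs. true labels:", adjusted_rand_score(y.Targets, model.labels_))
print("GMM vs. true labels:", adjusted_rand_score(y.Targets, y_cluster_gmm))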
Output:
Result:
Thus, we have successfully applied the EM algorithm and the k-Means algorithm to cluster the
data set and compared the results of the two algorithms.
EX NO:8
DATE: k-Nearest Neighbour Algorithm
AIM:
To write a program to implement the k-Nearest Neighbour algorithm to classify the iris data set
and print both correct and wrong predictions.
ALGORITHM:
Step 1: Load the iris data set and split it into training and test sets.
Step 2: Fit a k-Nearest Neighbour classifier (k = 1) on the training set.
Step 3: For each test sample, predict the class from the nearest neighbour in the training set.
Step 4: Print the target and predicted class for each test sample, and report the overall accuracy.
PROGRAM:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

dataset = load_iris()
#print(dataset)
X_train, X_test, y_train, y_test = train_test_split(dataset["data"], dataset["target"], random_state=0)
kn = KNeighborsClassifier(n_neighbors=1)
kn.fit(X_train, y_train)
for i in range(len(X_test)):
    x = X_test[i]
    x_new = np.array([x])
    prediction = kn.predict(x_new)
    print("TARGET=", y_test[i], dataset["target_names"][y_test[i]],
          "PREDICTED=", prediction, dataset["target_names"][prediction])
print(kn.score(X_test, y_test))
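# The aim asks for correct and wrong predictions to be printed; a small
# extension (a sketch) that tags each test sample explicitly:
for i in range(len(X_test)):
    prediction = kn.predict(X_test[i].reshape(1, -1))[0]
    status = "CORRECT" if prediction == y_test[i] else "WRONG"
    print(status, "TARGET=", dataset["target_names"][y_test[i]],
          "PREDICTED=", dataset["target_names"][prediction])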
Output:
Result:
Thus, we have successfully written a program to implement the k-Nearest Neighbour algorithm
to classify the iris data set and printed both correct and wrong predictions.
EX NO:9
DATE: Locally Weighted Regression Algorithm
AIM:
To write a program to implement the non-parametric locally weighted regression algorithm in
order to fit data points. Select an appropriate data set for your experiment and draw graphs.
ALGORITHM:
LWR is a non-parametric regression technique that fits a linear regression model
to a dataset by giving more weight to nearby data points.
LWR fits a separate linear regression model for each query point based on the
weights assigned to the training data points.
The weights assigned to each training data point are inversely proportional to
their distance from the query point.
Training data points that are closer to the query point will have a higher weight
and contribute more to the linear regression model.
LWR is useful when a global linear model does not capture the relationship
between the input and output variables well; the goal is to capture local
patterns in the data.
PROGRAM:
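# NOTE: the listing below begins inside a LOWESS implementation; the opening of
# the function is missing. A minimal sketch that makes the fragment runnable
# (an assumption, modelled on the classic tricube-weighted local linear fit):
import numpy as np

def lowess(x, y, f, iterations):
    # f is the smoothing span: the fraction of points used in each local fit
    n = len(x)
    r = int(np.ceil(f * n))
    # per-point bandwidth: distance to the r-th nearest neighbour
    h = [np.sort(np.abs(x - x[i]))[r] for i in range(n)]
    w = np.clip(np.abs((x[:, None] - x[None, :]) / h), 0.0, 1.0)
    w = (1 - w ** 3) ** 3              # tricube kernel weights
    yest = np.zeros(n)
    delta = np.ones(n)                 # robustness weights
    for iteration in range(iterations):
        for i in range(n):
            weights = delta * w[:, i]
            b = np.array([np.sum(weights * y), np.sum(weights * y * x)])
            A = np.array([[np.sum(weights), np.sum(weights * x)],
                          [np.sum(weights * x), np.sum(weights * x * x)]])
            beta = np.linalg.solve(A, b)
            yest[i] = beta[0] + beta[1] * x[i]
        # the robust re-weighting below is the fragment from the original listing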
        residuals = y - yest
        s = np.median(np.abs(residuals))
        delta = np.clip(residuals / (6.0 * s), -1, 1)
        delta = (1 - delta ** 2) ** 2  # bisquare down-weighting of outliers
    return yest
import math
import matplotlib.pyplot as plt

n = 100
x = np.linspace(0, 2 * math.pi, n)
y = np.sin(x) + 0.3 * np.random.randn(n)  # noisy sine data
f = 0.25
iterations = 3
yest = lowess(x, y, f, iterations)
plt.plot(x, y, "r.")     # raw data points
plt.plot(x, yest, "b-")  # locally weighted fit
Output:
[<matplotlib.lines.Line2D at 0x37459696d8>]
Result:
Thus, we have successfully written a program to implement the non-parametric locally
weighted regression algorithm to fit data points, selected an appropriate data set for the
experiment, and drawn the graphs.