ML Lab Record
EX NO:1
DATE: Candidate-Elimination Algorithm
AIM:
To implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of
all hypotheses consistent with the training examples.
ALGORITHM:
Step-1: Initialize the specific hypothesis S to the first positive training example and the general hypothesis G to the most general hypothesis, ('?', '?', '?', '?', '?', '?').
Step-2: For every positive example, generalize S just enough to cover it, and drop from G any hypothesis inconsistent with the example.
Step-3: For every negative example, specialize G just enough to exclude it, and drop from S any hypothesis that covers the example.
Step-4: Output S and G; every hypothesis lying between them in the version space is consistent with the training examples (a short worked trace follows the steps).
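For instance, on the standard EnjoySport data used below, after the first positive example (Sunny, Warm, Normal, Strong, Warm, Same), S becomes exactly that instance while G stays fully general; a later negative example then specializes G only on the attributes where it disagrees with S.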
PROGRAM:
import numpy as np
import pandas as pd

data = pd.read_csv('https://raw.githubusercontent.com/manishsingh7163/Machine-Learning-Algorithms/master/Candidate%20Elimination%20Algorithm/enjoysport.csv')
concepts = np.array(data.iloc[:, :-1])
target = np.array(data.iloc[:, -1])

def learn(concepts, target):
    specific_h = concepts[0].copy()
    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    for i, h in enumerate(concepts):
        if target[i] == "yes":
            # Positive example: generalize the specific hypothesis
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
            print("\n\nFor Training instance No:{0} the hypothesis is\n".format(i))
            print("Specific Hypothesis: ", specific_h)
            print("General Hypothesis: ", general_h)
        if target[i] == "no":
            # Negative example: specialize the general hypothesis
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
            print("\n\nFor Training instance No:{0} the hypothesis is\n".format(i))
            print("Specific Hypothesis: ", specific_h)
            print("General Hypothesis: ", general_h)
    # Drop the fully general rows left over in G
    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
    for i in indices:
        general_h.remove(['?', '?', '?', '?', '?', '?'])
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("\nFinal Specific Hypothesis:", s_final)
print("Final General Hypothesis:", g_final)
Output:
Result:
Thus, we have successfully implemented and demonstrated the Candidate-Elimination algorithm to
output a description of the set of all hypotheses consistent with the training examples.
EX NO:2
DATE: Decision Tree Based ID3 Algorithm
AIM:
To write a program to demonstrate the working of the decision tree based ID3 algorithm. Use
an appropriate data set for building the decision tree and apply this knowledge to classify a new
sample.
ALGORITHM:
Step-1: Compute the entropy of the target attribute over the whole data set (a worked example follows the steps).
Step-2: For every candidate attribute, compute its information gain: the entropy of the data set minus the weighted entropy of the subsets produced by splitting on that attribute.
Step-3: Select the attribute with the highest information gain as the decision node.
Step-4: Split the data set on that attribute and recursively repeat Steps 1-3 on each subset, stopping when a subset is pure or no attributes remain.
Step-5: Classify a new sample by walking the tree along the branches that match its attribute values.
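As a worked instance of Step-1: the PlayTennis data set has 9 Yes and 5 No instances, so its entropy is -(9/14)*log2(9/14) - (5/14)*log2(5/14) ≈ 0.940 bits; this is the total_entropy the program computes first (note the class probability 5/14 ≈ 0.357 appearing in the output).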
PROGRAM:
import pandas as pd

df_tennis = pd.read_csv('PlayTennis.csv')
print("\n Given Play Tennis Data Set:\n\n", df_tennis)

# Function to calculate the entropy of a list of class probabilities: sum of -p*log2(p)
def entropy(probs):
    import math
    return sum([-prob * math.log(prob, 2) for prob in probs])

# Function to calculate the entropy of the given data set/list with respect to the target attribute
def entropy_of_list(a_list):
    from collections import Counter
    cnt = Counter(x for x in a_list)  # Counter tallies the occurrences of each class
    num_instances = len(a_list) * 1.0
    print("\n Number of Instances of the Current Sub Class is {0}:".format(num_instances))
    probs = [x / num_instances for x in cnt.values()]  # proportion of each class
    print("\n Classes:", min(cnt), max(cnt))
    print(" \n Probabilities of Class {0} is {1}:".format(min(cnt), min(probs)))
    print(" \n Probabilities of Class {0} is {1}:".format(max(cnt), max(probs)))
    return entropy(probs)

total_entropy = entropy_of_list(df_tennis['PlayTennis'])
# Information gain of splitting df on split_attribute_name:
# entropy before the split minus the weighted entropy of the resulting subsets
def information_gain(df, split_attribute_name, target_attribute_name):
    print("Information Gain Calculation of", split_attribute_name)
    nobs = len(df.index) * 1.0
    new_entropy = 0.0
    for _, subset in df.groupby(split_attribute_name):
        new_entropy += (len(subset) / nobs) * entropy_of_list(subset[target_attribute_name])
    old_entropy = entropy_of_list(df[target_attribute_name])
    return old_entropy - new_entropy

def id3(df, target_attribute_name, attribute_names, default_class=None):
    from collections import Counter
    cnt = Counter(x for x in df[target_attribute_name])
    if len(cnt) == 1:
        return next(iter(cnt))  # pure subset: return its class label
    elif df.empty or (not attribute_names):
        return default_class  # nothing left to split on
    else:
        default_class = max(cnt, key=cnt.get)  # majority class as the fallback
        # Pick the attribute with the highest information gain
        gainz = [information_gain(df, attr, target_attribute_name) for attr in attribute_names]
        best_attr = attribute_names[gainz.index(max(gainz))]
        tree = {best_attr: {}}
        remaining_attribute_names = [i for i in attribute_names if i != best_attr]
        # Split the data set; on each split, recursively call this algorithm and
        # populate the empty tree with the subtrees returned by the recursive call
        for attr_val, data_subset in df.groupby(best_attr):
            subtree = id3(data_subset, target_attribute_name, remaining_attribute_names, default_class)
            tree[best_attr][attr_val] = subtree
        return tree
# Get Predictor Names (all but 'class')
attribute_names = list(df_tennis.columns)
print("List of Attributes:", attribute_names)
attribute_names.remove('PlayTennis') #Remove the class attribute
print("Predicting Attributes:", attribute_names)
# Run Algorithm:
from pprint import pprint
tree = id3(df_tennis,'PlayTennis',attribute_names)
print("\n\nThe Resultant Decision Tree is :\n")
#print(tree)
pprint(tree)
attribute = next(iter(tree))
print("Best Attribute :\n",attribute)
print("Tree Keys:\n",tree[attribute].keys())
def classify(instance, tree, default=None):
    attribute = next(iter(tree))  # root attribute of this (sub)tree
    if instance[attribute] in tree[attribute].keys():
        result = tree[attribute][instance[attribute]]
        if isinstance(result, dict):  # this is a subtree, delve deeper
            return classify(instance, result)
        else:
            return result  # this is a leaf label
    else:
        return default

# classify allows a default arg: when the tree has no answer for a particular
# combination of attribute values, we use 'No' as the default guess
df_tennis['predicted'] = df_tennis.apply(classify, axis=1, args=(tree, 'No'))
print(df_tennis['predicted'])
print(' Accuracy is: ' + str(sum(df_tennis['PlayTennis'] == df_tennis['predicted']) / (1.0 * len(df_tennis.index))))
df_tennis[['PlayTennis', 'predicted']]

# Hold out the last four instances as a test set and retrain on the rest
training_data = df_tennis.iloc[:-4]
test_data = df_tennis.iloc[-4:].copy()
train_tree = id3(training_data, 'PlayTennis', attribute_names)
test_data['predicted2'] = test_data.apply(classify, axis=1, args=(train_tree, 'Yes'))
print(' Accuracy is : ' + str(sum(test_data['PlayTennis'] == test_data['predicted2']) / (1.0 * len(test_data.index))))
OUTPUT:
Classes: No Yes
Probabilities of Class No is 0.35714285714285715:
Classes: No Yes
Classes: No No
Probabilities of Class No is 0.4:
Accuracy is:
1.0
PlayTennis predicted
0 No No
1 No No
2 Yes Yes
3 Yes Yes
4 Yes Yes
5 No No
6 Yes Yes
7 No No
8 Yes Yes
9 Yes Yes
10 Yes Yes
11 Yes Yes
12 Yes Yes
13 No No
Information Gain Calculation of Outlook
Classes: No Yes
Classes: No No
Accuracy is : 0.75
Result:
Thus, we have successfully implemented and demonstrated the working of the decision tree based ID3
algorithm.
EX NO:3
DATE: Artificial Neural Network by Backpropagation
AIM:
To build an artificial neural network by implementing the backpropagation algorithm and test
the same using appropriate data sets.
ALGORITHM:
Step 1: Load the training data and create a network with an input layer, one hidden layer, and an output layer.
Step 2: Initialize the weights of the network; weights are usually chosen randomly.
Step 3: Calculate the output of each neuron from the input layer through the hidden layer to the output layer (a sketch of this forward pass follows the steps).
Step 4: Backward-propagate the error from the output layer toward the input layer, computing a delta for every neuron.
Step 5: Update the weights using the deltas and a learning rate, and repeat for a fixed number of epochs or until the error converges.
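The forward pass in Step 3 can be sketched as follows, assuming the same neuron representation used in the program below (a dict whose 'weights' list stores the bias as its last entry); this follows the standard from-scratch formulation:

from math import exp

# Weighted sum of the inputs plus the bias (stored as the last weight)
def activate(weights, inputs):
    activation = weights[-1]
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

# Sigmoid transfer function
def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

# Push one row of inputs through every layer of the network in turn
def forward_propagate(network, row):
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            neuron['output'] = transfer(activate(neuron['weights'], inputs))
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs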
PROGRAM:
import random
from math import exp
from random import seed

# Initialize a network: one hidden layer and one output layer,
# each neuron holding a weight list whose last entry is the bias
def initialize_network(n_inputs, n_hidden, n_outputs):
    network = list()
    hidden_layer = [{'weights': [random.random() for i in range(n_inputs + 1)]} for i in range(n_hidden)]
    network.append(hidden_layer)
    output_layer = [{'weights': [random.random() for i in range(n_hidden + 1)]} for i in range(n_outputs)]
    network.append(output_layer)
    return network

# Derivative of the sigmoid transfer function
def transfer_derivative(output):
    return output * (1.0 - output)

# Backpropagate error and store a delta in every neuron
def backward_propagate_error(network, expected):
    for i in reversed(range(len(network))):
        layer = network[i]
        errors = list()
        if i != len(network) - 1:
            # Hidden layer: error is the delta-weighted sum from the next layer
            for j in range(len(layer)):
                error = 0.0
                for neuron in network[i + 1]:
                    error += (neuron['weights'][j] * neuron['delta'])
                errors.append(error)
        else:
            # Output layer: error is (expected - actual)
            for j in range(len(layer)):
                neuron = layer[j]
                errors.append(expected[j] - neuron['output'])
        for j in range(len(layer)):
            neuron = layer[j]
            neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])

# Network initialization (n_inputs = 2 and n_outputs = 2 for the sample data set below)
seed(1)
n_inputs, n_outputs = 2, 2
network = initialize_network(n_inputs, 2, n_outputs)
i = 1
for layer in network:
    j = 1
    for sub in layer:
        print("\n Layer[%d] Node[%d]:\n" % (i, j), sub)
        j = j + 1
    i = i + 1
Output:
The input Data Set :
[[2.7810836, 2.550537003, 0], [1.465489372, 2.362125076, 0], [3.396561688, 4.400293529,
0], [1.38807019, 1.850220317, 0], [3.06407232, 3.005305973, 0], [7.627531214,
2.759262235, 1], [5.332441248, 2.088626775, 1], [6.922596716, 1.77106367, 1],
[8.675418651, -0.242068655, 1], [7.673756466, 3.508563011, 1]]
Number of Inputs :
2
Number of Outputs :
2
The initialised Neural Network:
Layer[1] Node[1]:
{'weights': [0.4560342718892494, 0.4478274870593494, -0.4434486322731913]}
Layer[1] Node[2]:
{'weights': [-0.41512800484107837, 0.33549887812944956, 0.2359699890685233]}
Layer[2] Node[1]:
{'weights': [0.1697304014402209, -0.1918635424108558, 0.10594416567846243]}
Layer[2] Node[2]:
{'weights': [0.10680173364083789, 0.08120401711200309, -0.3416171297451944]}
Layer[1] Node[2]:
{'weights': [-1.2934302410111025, 1.7109363237151507, 0.7125327507327329], 'output':
0.04760703296164151, 'delta': -0.005928559978815076}
Layer[2] Node[1]:
{'weights': [-1.3098359335096292, 2.16462207144596, -0.3079052288835876], 'output':
0.19895563952058462, 'delta': -0.03170801648036037}
Layer[2] Node[2]:
{'weights': [1.5506793402414165, -2.11315950446121, 0.1333585709422027], 'output':
0.8095042653312078, 'delta': 0.029375796661413225}
Expected=0, Got=0
Expected=0, Got=0
Expected=0, Got=0
Expected=0, Got=0
Expected=0, Got=0
Expected=1, Got=1
Expected=1, Got=1
Expected=1, Got=1
Expected=1, Got=1
Expected=1, Got=1
Result:
Thus, we have successfully built an artificial neural network by implementing the
backpropagation algorithm and tested it using an appropriate data set.
EX NO:4
DATE: Implementation Of Naïve Bayesian Classifier
AIM:
To write a program to implement the naïve Bayesian classifier for a sample training data set
stored as a .CSV file and compute the accuracy with a few test data sets.
ALGORITHM:
Step 1: Load the training data from the .CSV file and separate the predictor attributes X from the class attribute y.
Step 2: Encode the categorical attribute values as integers using LabelEncoder.
Step 3: Split the data into training and test sets.
Step 4: Train the naive Bayesian classifier, which applies Bayes' theorem under the assumption that attributes are conditionally independent given the class: P(class | x1,...,xn) is proportional to P(class) * P(x1 | class) * ... * P(xn | class) (a worked example follows the steps).
Step 5: Predict the classes of the test instances and compute the accuracy.
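As a worked instance of Step 4, assuming the standard 14-row PlayTennis table (9 Yes, 5 No): for a new day (Sunny, Cool, High humidity, Windy), P(No)*P(Sunny|No)*P(Cool|No)*P(High|No)*P(Windy|No) = (5/14)(3/5)(1/5)(4/5)(3/5) ≈ 0.0206, while the corresponding product for Yes is (9/14)(2/9)(3/9)(3/9)(3/9) ≈ 0.0053, so the classifier predicts No.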
PROGRAM:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

data = pd.read_csv('tennisdata.csv')  # file name assumed; substitute your own .CSV path
print("The first 5 values of data is :\n", data.head())

X = data.iloc[:, :-1]
y = data.iloc[:, -1]
print("\nThe first 5 values of Train output is\n", y.head())

# Encode each categorical attribute as integers
le_Outlook = LabelEncoder()
X.Outlook = le_Outlook.fit_transform(X.Outlook)
le_Temperature = LabelEncoder()
X.Temperature = le_Temperature.fit_transform(X.Temperature)
le_Humidity = LabelEncoder()
X.Humidity = le_Humidity.fit_transform(X.Humidity)
le_Windy = LabelEncoder()
X.Windy = le_Windy.fit_transform(X.Windy)

le_PlayTennis = LabelEncoder()
y = le_PlayTennis.fit_transform(y)
print("\nNow the Train output is\n", y)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
classifier = GaussianNB()
classifier.fit(X_train, y_train)
print("Accuracy is:", accuracy_score(classifier.predict(X_test), y_test))
Output:
The first 5 values of data is :
Outlook Temperature Humidity Windy PlayTennis
0 Sunny Hot High False No
1 Sunny Hot High True No
2 Overcast Hot High False Yes
3 Rainy Mild High False Yes
4 Rainy Cool Normal False Yes
Result:
Thus, we have successfully written a program to implement the naïve Bayesian classifier for
a sample training data set stored as a .CSV file and computed its accuracy on a few test data
sets.
EX NO:5
DATE: Implementation Of Naïve Bayesian Classifier Model
AIM:
To write a program to implement the naïve Bayesian Classifier model to classify a set of
documents and measure the accuracy, precision, and recall.
ALGORITHM:
Step 1: Load the labelled documents from document.csv and map the 'pos'/'neg' labels to 1/0.
Step 2: Split the documents into training and test sets.
Step 3: Convert the text into a document-term matrix of word counts using CountVectorizer (illustrated below).
Step 4: Train a multinomial naive Bayes classifier on the training matrix.
Step 5: Predict the labels of the test documents and report the accuracy, recall, precision, and confusion matrix.
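To see what Step 3 produces, here is a small illustration on two made-up sentences (not from document.csv):

from sklearn.feature_extraction.text import CountVectorizer

docs = ["I love this sandwich", "this is an amazing place"]
cv = CountVectorizer()
dm = cv.fit_transform(docs)
print(cv.get_feature_names_out())  # the vocabulary learned from the two sentences
print(dm.toarray())                # one row of word counts per document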
PROGRAM:
import pandas as pd

msg = pd.read_csv('document.csv', names=['message', 'label'])
print("Total Instances of Dataset: ", msg.shape[0])
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum

from sklearn.model_selection import train_test_split
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)

from sklearn.feature_extraction.text import CountVectorizer
count_v = CountVectorizer()
Xtrain_dm = count_v.fit_transform(Xtrain)  # learn the vocabulary from the training split
Xtest_dm = count_v.transform(Xtest)        # reuse the same vocabulary for the test split
df = pd.DataFrame(Xtrain_dm.toarray(), columns=count_v.get_feature_names_out())
print(df[0:5])

# Train the classifier and report the metrics shown in the output below
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB()
clf.fit(Xtrain_dm, ytrain)
pred = clf.predict(Xtest_dm)

from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score
print('Accuracy Metrics:')
print('Accuracy: ', accuracy_score(ytest, pred))
print('Recall: ', recall_score(ytest, pred))
print('Precision: ', precision_score(ytest, pred))
print('Confusion Matrix: \n', confusion_matrix(ytest, pred))
Output:
Total Instances of Dataset: 18
   about  am  an  and  awesome  bad  beers  best  boss  can  ...  tired  to
0      0   1   0    1        0    0      0     0     0    0  ...      1   0
1      0   0   0    0        0    0      0     0     0    0  ...      0   0
2      0   0   0    0        0    0      0     0     0    0  ...      0   0
3      0   0   0    0        0    0      0     0     0    1  ...      0   0
4      0   0   0    0        0    0      0     0     0    0  ...      0   0
[5 rows x 49 columns]
Accuracy Metrics:
Accuracy: 0.6
Recall: 0.5
Precision: 1.0
Confusion Matrix:
[[1 0]
[2 2]]
Result:
Thus, we have successfully written a program to implement a naïve Bayesian Classifier model
to classify a set of documents and measured the accuracy, precision, and recall.
EX NO:6
DATE: Bayesian network to diagnose CORONA infection
AIM:
To write a program to construct a Bayesian network to diagnose CORONA infection using the
standard WHO data set.
ALGORITHM:
Step 1: Load the data set and pre-process it: tokenize the free-text symptoms column and turn each record into a row of binary symptom indicators.
Step 2: Build a Bayesian belief network whose nodes are the symptom variables and the disease variable (a minimal sketch follows the steps).
Step 3: Compile the network into a junction tree for exact inference.
Step 4: For each patient, set the observed symptoms as evidence and query the posterior probability of the disease node.
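A minimal sketch of Steps 2-4, assuming the pybbn library used in the program below; the two-node network and its probabilities are illustrative, not taken from the WHO data:

from pybbn.graph.dag import Bbn
from pybbn.graph.edge import Edge, EdgeType
from pybbn.graph.node import BbnNode
from pybbn.graph.variable import Variable
from pybbn.graph.jointree import EvidenceBuilder
from pybbn.pptc.inferencecontroller import InferenceController

# Illustrative two-node network: disease -> fever
disease = BbnNode(Variable(0, 'disease', ['corona', 'other']), [0.1, 0.9])
fever = BbnNode(Variable(1, 'fever', ['t', 'f']),
                [0.8, 0.2,   # P(fever | disease = corona)
                 0.3, 0.7])  # P(fever | disease = other)
bbn = Bbn().add_node(disease).add_node(fever) \
           .add_edge(Edge(disease, fever, EdgeType.DIRECTED))

# Compile to a junction tree, observe fever = t, and read off the posteriors
join_tree = InferenceController.apply(bbn)
ev = EvidenceBuilder().with_node(join_tree.get_bbn_node_by_name('fever')).with_evidence('t', 1.0).build()
join_tree.set_observation(ev)
for node in join_tree.get_bbn_nodes():
    print(node.variable.name, join_tree.get_bbn_potential(node))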
PROGRAM:
import pandas as pd
import json
import itertools
from datetime import datetime
def tokenize(s):
if s is None or isinstance(s, float) or len(s) < 1 or pd.isna(s):
return None
try:
delim = ';' if ';' in s else ','
return [t.strip().lower() for t in s.split(delim) if len(t.strip()) > 0]
except:
return s
def map_to_symptoms(s):
if s.startswith('fever') or s.startswith('low fever'):
return ['fever']
return [k for k, v in symptom_map.items() if s in v]
d = data[['symptoms']].dropna(how='all').copy(deep=True)
print(d.shape)
for s in symptom_map.keys():
d[s] = d.symptoms.apply(lambda arr: 0 if arr is None else 1 if s in arr else 0)
d = d.drop(['symptoms'], axis=1)
print(d.shape)
%matplotlib inline
import warnings
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np

plt.style.use('seaborn')

# Draw the directed acyclic graph of the Bayesian belief network
# (convert_for_drawing is a helper defined earlier in the notebook)
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    graph = convert_for_drawing(bbn)
    pos = nx.nx_agraph.graphviz_layout(graph, prog='neato')
    plt.figure(figsize=(15, 8))
    plt.subplot(121)
    labels = dict([(k, node.variable.name) for k, node in bbn.nodes.items()])
    nx.draw(graph, pos=pos, with_labels=True, labels=labels)
    plt.title('BBN DAG')
    plt.tight_layout()
# Convert a clique potential into a pandas Series of state probabilities
# (the last two lines complete the function so it can be used below)
def potential_to_series(potential):
    def get_entry_kv(entry):
        arr = [(k, v) for k, v in entry.entries.items()]
        arr = sorted(arr, key=lambda tup: tup[0])
        return arr[0][1], entry.value
    tups = [get_entry_kv(e) for e in potential.entries]
    return pd.Series([t[1] for t in tups], index=[t[0] for t in tups])

# Fragment of a plotting cell: lay the per-node probability plots out in a grid
n_cols = 3
n_rows = int(len(series) / n_cols)
plt.tight_layout()
%%time
names = [
    'anosmia', 'sputum', 'muscle', 'chills', 'fever',
    'wheezing', 'nasal', 'fatigue', 'headache', 'sore_throat',
    'dry_cough', 'diarrhoea', 'dyspnea', 'nausea', 'sneezing',
    'running_nose'
]
predictions = []
for i, r in d.iterrows():
    # Collect the symptoms observed for this patient
    fields = [name for name in names if r[name] == 1]
    join_tree.unobserve_all()
    if len(fields) > 0:
        bbn_nodes = [join_tree.get_bbn_node_by_name(f) for f in fields]
        evidences = [EvidenceBuilder().with_node(n).with_evidence('t', 1.0).build() for n in bbn_nodes]
        join_tree.update_evidences(evidences)
    # Query the posterior of the disease node given the evidence
    disease = join_tree.get_bbn_node_by_name('disease')
    disease_potential = join_tree.get_bbn_potential(disease)
    s = potential_to_series(disease_potential)
    predictions.append(s)
predictions = pd.DataFrame(predictions)
predictions
Output:
(656, 1)
(656, 32)
CPU times: user 6.85 s, sys: 40.2 ms, total: 6.89 s
Wall time: 6.93 s
Result:
Thus, we have successfully constructed a Bayesian network to diagnose CORONA infection
using the standard WHO data set.
EX NO:7
DATE: EM Algorithm & K-Means Algorithm
AIM:
To write a program to apply the EM algorithm to cluster a set of data stored in a .CSV file. Use
the same data set for clustering using the k-Means algorithm. Compare the results of these
two algorithms.
ALGORITHM:
K-Means Algorithm:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as the initial centroids (they need not come from the input data set).
Step-3: Assign each data point to its closest centroid, forming the predefined K clusters.
Step-4: Recompute the centroid of each cluster as the mean of its assigned points.
Step-5: Repeat Steps 3 and 4, reassigning each data point to the new closest centroid, until the assignments stop changing (see the sketch below).
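A minimal sketch of these steps on toy 2-D data; the points and K = 2 here are illustrative assumptions, not the lab data set:

import numpy as np

X = np.array([[1.0, 2.0], [1.5, 1.8], [8.0, 8.0], [9.0, 9.5]])
centroids = X[np.random.choice(len(X), 2, replace=False)]  # Step-2: random centroids
for _ in range(10):  # Steps 3-5 repeated until the assignments stabilize
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)  # Step-3: nearest centroid
    centroids = np.array([X[labels == k].mean(axis=0) for k in range(2)])  # Step-4
print(labels, centroids)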
EM Algorithm:
Expectation step (E - step): It involves the estimation (guess) of all missing values in
the dataset so that after completing this step, there should not be any missing value.
Maximization step (M - step): This step involves the use of estimated data in the E-
step and updating the parameters.
Repeat E-step and M-step until the convergence of the values occurs.
The primary goal of the EM algorithm is to use the available observed data of the dataset to
estimate the missing data of the latent variables and then use that data to update the values of
the parameters in the M-step.
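The E-step and M-step can be made concrete with a minimal sketch of one iteration for a two-component 1-D Gaussian mixture; the data and starting parameters below are illustrative assumptions:

import numpy as np

def gauss_pdf(x, mu, sd):
    return np.exp(-(x - mu) ** 2 / (2 * sd ** 2)) / (sd * np.sqrt(2 * np.pi))

x = np.array([1.0, 1.2, 0.8, 5.0, 5.3, 4.9])
pi = np.array([0.5, 0.5])  # mixing weights
mu = np.array([0.0, 4.0])  # initial means
sd = np.array([1.0, 1.0])  # initial standard deviations

# E-step: responsibility of each component for each point
r = np.vstack([p * gauss_pdf(x, m, s) for p, m, s in zip(pi, mu, sd)])
r = r / r.sum(axis=0)

# M-step: re-estimate the parameters from the responsibilities
pi = r.sum(axis=1) / len(x)
mu = (r * x).sum(axis=1) / r.sum(axis=1)
sd = np.sqrt((r * (x - mu[:, None]) ** 2).sum(axis=1) / r.sum(axis=1))
print(pi, mu, sd)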
PROGRAM:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn import preprocessing
from sklearn.datasets import load_iris

dataset = load_iris()
# print(dataset)
X=pd.DataFrame(dataset.data)
X.columns=['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width']
y=pd.DataFrame(dataset.target)
y.columns=['Targets']
# print(X)
plt.figure(figsize=(14,7))
colormap=np.array(['red','lime','black'])
# REAL PLOT
plt.subplot(1,3,1)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[y.Targets],s=40)
plt.title('Real')
# K-PLOT
plt.subplot(1,3,2)
model=KMeans(n_clusters=3)
model.fit(X)
predY=np.choose(model.labels_,[0,1,2]).astype(np.int64)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[predY],s=40)
plt.title('KMeans')
# GMM PLOT
scaler=preprocessing.StandardScaler()
scaler.fit(X)
xsa=scaler.transform(X)
xs=pd.DataFrame(xsa,columns=X.columns)
gmm=GaussianMixture(n_components=3)
gmm.fit(xs)
y_cluster_gmm=gmm.predict(xs)
plt.subplot(1,3,3)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[y_cluster_gmm],s=40)
plt.title('GMM Classification')
Output:
EX NO:8
DATE: K-Nearest Neighbour Algorithm
AIM:
To write a program to implement the k-Nearest Neighbour algorithm to classify the iris data
set and print both correct and wrong predictions.
ALGORITHM:
Step-1: Load the iris data set and split it into training and test sets.
Step-2: Choose the number of neighbours k.
Step-3: For each test instance, compute its distance to every training instance.
Step-4: Pick the k nearest training instances and assign the majority class among them to the test instance (see the sketch below).
Step-5: Compare the predictions with the true labels and print which are correct and which are wrong.
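A minimal sketch of the decision rule in Steps 3-4, assuming Euclidean distance and a majority vote; knn_predict is an illustrative helper, separate from the sklearn-based program below:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    dists = np.linalg.norm(X_train - x, axis=1)            # Step-3: distance to every training point
    nearest = np.argsort(dists)[:k]                        # indices of the k closest
    return Counter(y_train[nearest]).most_common(1)[0][0]  # Step-4: majority vote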
PROGRAM:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(dataset["data"], dataset["target"], random_state=0)
kn = KNeighborsClassifier(n_neighbors=1)
kn.fit(X_train, y_train)
for i in range(len(X_test)):
    x = X_test[i]
    x_new = np.array([x])
    prediction = kn.predict(x_new)
    print("TARGET=", y_test[i], dataset["target_names"][y_test[i]], "PREDICTED=", prediction, dataset["target_names"][prediction])
print(kn.score(X_test, y_test))
Output:
Result:
Thus, we have successfully written a program to implement the k-Nearest Neighbour algorithm
to classify the iris data set and printed both correct and wrong predictions.
EX NO:9
DATE: Locally Weighted Regression Algorithm
AIM:
To write a program to implement the non-parametric locally weighted regression algorithm in
order to fit data points. select an appropriate data set for your experiment and draw graphs
ALGORITHM:
LWR is a non-parametric regression technique that fits a linear regression model
to a dataset by giving more weight to nearby data points.
LWR fits a separate linear regression model for each query point based on the
weights assigned to the training data points.
The weights assigned to each training data point are inversely proportional to
their distance from the query point.
Training data points that are closer to the query point will have a higher weight
and contribute more to the linear regression model.
LWR is useful when a global linear model does not capture the relationship
between the input and output variables well. The goal is to capture local patterns in
the data.
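As a minimal sketch of this idea, the following computes a locally weighted linear prediction at a single query point, assuming a Gaussian kernel with bandwidth tau; the function name and shapes are illustrative, separate from the lab program below:

import numpy as np

def lwr_predict(x_query, X, y, tau=0.5):
    # Gaussian kernel: training points near x_query get weights close to 1,
    # distant points get weights close to 0
    w = np.exp(-((X - x_query) ** 2) / (2 * tau ** 2))
    W = np.diag(w)
    A = np.column_stack([np.ones_like(X), X])  # design matrix [1, x]
    # Weighted least squares: beta = (A^T W A)^(-1) A^T W y
    beta = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
    return beta[0] + beta[1] * x_query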
PROGRAM:
import math
import numpy as np
import matplotlib.pyplot as plt

def lowess(x, y, f, iterations):
    # Locally weighted regression with robustifying iterations: fit a weighted
    # linear model at every point, then down-weight points with large residuals
    n = len(x)
    r = int(math.ceil(f * n))
    h = [np.sort(np.abs(x - x[i]))[r] for i in range(n)]
    w = np.clip(np.abs((x[:, None] - x[None, :]) / h), 0.0, 1.0)
    w = (1 - w ** 3) ** 3  # tricube kernel weights
    yest = np.zeros(n)
    delta = np.ones(n)
    for _ in range(iterations):
        for i in range(n):
            weights = delta * w[:, i]
            b = np.array([np.sum(weights * y), np.sum(weights * y * x)])
            A = np.array([[np.sum(weights), np.sum(weights * x)],
                          [np.sum(weights * x), np.sum(weights * x * x)]])
            beta = np.linalg.solve(A, b)
            yest[i] = beta[0] + beta[1] * x[i]
        # Robustness step: shrink the influence of outlying points
        residuals = y - yest
        s = np.median(np.abs(residuals))
        delta = np.clip(residuals / (6.0 * s), -1, 1)
        delta = (1 - delta ** 2) ** 2
    return yest

n = 100
x = np.linspace(0, 2 * math.pi, n)
y = np.sin(x) + 0.3 * np.random.randn(n)
f = 0.25
iterations = 3
yest = lowess(x, y, f, iterations)
plt.plot(x, y, 'r.')
plt.plot(x, yest, 'b-')
Output:
[<matplotlib.lines.Line2D at 0x37459696d8>]
Result:
Thus, we have successfully written a program to implement the non-parametric locally
weighted regression algorithm in order to fit data points, selected an appropriate data set for
the experiment, and drawn the corresponding graphs.