ML Lab Record
EX NO: 1
DATE:
For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.
AIM:
PROCEDURE:
Training Examples:
enjoysport.csv
PROGRAM:
import numpy as np
import pandas as pd
# Loading Data from a CSV File
data = pd.DataFrame(data=pd.read_csv('E:/ML/enjoysport.csv'))
print(data)
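The listing jumps from loading the data to the tail of the learning routine. A minimal sketch of the presumably omitted part (the standard Candidate-Elimination initialisation and boundary updates, assuming a 'yes'/'no' target in the last column, as in enjoysport.csv):

# Separate the attribute columns (concepts) from the target column
concepts = np.array(data.iloc[:, 0:-1])
target = np.array(data.iloc[:, -1])

def learn(concepts, target):
    # S-boundary starts at the first instance, G-boundary fully general
    specific_h = concepts[0].copy()
    general_h = [['?' for _ in range(len(specific_h))] for _ in range(len(specific_h))]
    for i, h in enumerate(concepts):
        if target[i] == "yes":
            # Positive example: generalise specific_h where it disagrees
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        else:
            # Negative example: specialise general_h using specific_h
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'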
    # Find indices of fully general rows, meaning those that are unchanged
    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
    for i in indices:
        # Remove those rows from general_h
        general_h.remove(['?', '?', '?', '?', '?', '?'])
    # Return final values
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("\nFinal Specific_h:", s_final, sep="\n")
print("\nFinal General_h:", g_final, sep="\n")
OUTPUT:
RESULT:
EX NO: 2
DATE:
Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.
AIM:
PROCEDURE:
ID3 Algorithm:
ID3(Examples, Target_attribute, Attributes)
o Create a Root node for the tree
o If all Examples are positive, Return the single-node tree Root, with label = +
o If all Examples are negative, Return the single-node tree Root, with label = -
o If Attributes is empty, Return the single-node tree Root, with label = most common value of Target_attribute in Examples
o Otherwise Begin
    A ← the attribute from Attributes that best* classifies Examples
    The decision attribute for Root ← A
    For each possible value, υi, of A,
        Add a new tree branch below Root, corresponding to the test A = υi
        Let Examples_υi be the subset of Examples that have value υi for A
        If Examples_υi is empty
            Then below this new branch add a leaf node with label = most common value of Target_attribute in Examples
        Else below this new branch add the subtree ID3(Examples_υi, Target_attribute, Attributes - {A})
  End
o Return Root
PROGRAM:
import numpy as np
import math
import csv
def read_data(filename):
    # Read the CSV header (metadata) and the remaining rows (traindata)
    with open(filename, 'r') as csvfile:
        datareader = csv.reader(csvfile, delimiter=',')
        headers = next(datareader)
        metadata = []
        traindata = []
        for name in headers:
            metadata.append(name)
        for row in datareader:
            traindata.append(row)
    return (metadata, traindata)
class Node:
    def __init__(self, attribute):
        self.attribute = attribute
        self.children = []
        self.answer = ""

    def __str__(self):
        return self.attribute
def subtables(data, col, delete):
    # Partition the rows of data by the values appearing in column col
    dict = {}
    items = np.unique(data[:, col])
    count = np.zeros((items.shape[0], 1), dtype=np.int32)
    for x in range(items.shape[0]):
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                count[x] += 1
    for x in range(items.shape[0]):
        dict[items[x]] = np.empty((int(count[x, 0]), data.shape[1]), dtype="U32")
        pos = 0
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                dict[items[x]][pos] = data[y]
                pos += 1
        if delete:
            # Drop the splitting column from each subtable
            dict[items[x]] = np.delete(dict[items[x]], col, 1)
    return items, dict
def entropy(S):
    items = np.unique(S)
    if items.size == 1:
        return 0
    counts = np.zeros(items.shape[0])
    sums = 0
    for x in range(items.shape[0]):
        counts[x] = sum(S == items[x]) / (S.size * 1.0)
    for count in counts:
        sums += -1 * count * math.log(count, 2)
    return sums
def gain_ratio(data, col):
    items, dict = subtables(data, col, delete=False)
    total_size = data.shape[0]
    entropies = np.zeros((items.shape[0], 1))
    intrinsic = np.zeros((items.shape[0], 1))
    for x in range(items.shape[0]):
        ratio = dict[items[x]].shape[0]/(total_size * 1.0)
        entropies[x] = ratio * entropy(dict[items[x]][:, -1])
        intrinsic[x] = ratio * math.log(ratio, 2)
    # Information gain divided by the intrinsic (split) information
    total_entropy = entropy(data[:, -1])
    iv = -1 * sum(intrinsic)
    for x in range(entropies.shape[0]):
        total_entropy -= entropies[x]
    return total_entropy / iv
def create_node(data, metadata):
    # If all rows share one class, return a leaf node with that answer
    if (np.unique(data[:, -1])).shape[0] == 1:
        node = Node("")
        node.answer = np.unique(data[:, -1])[0]
        return node
    gains = np.zeros((data.shape[1] - 1, 1))
    for col in range(data.shape[1] - 1):
        gains[col] = gain_ratio(data, col)
    split = np.argmax(gains)
    node = Node(metadata[split])
    metadata = np.delete(metadata, split, 0)
    items, dict = subtables(data, split, delete=True)
    for x in range(items.shape[0]):
        child = create_node(dict[items[x]], metadata)
        node.children.append((items[x], child))
    return node
def empty(size):
    # Indentation string used when printing the tree
    s = ""
    for x in range(size):
        s += " "
    return s
def print_tree(node, level):
    if node.answer != "":
        print(empty(level), node.answer)
        return
    print(empty(level), node.attribute)
    for value, n in node.children:
        print(empty(level + 1), value)
        print_tree(n, level + 2)
metadata, traindata = read_data("Tennisdata.csv")
data = np.array(traindata)
node = create_node(data, metadata)
print_tree(node, 0)
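The listing only builds and prints the tree, while the aim also asks to classify a new sample. A minimal sketch of that step, assuming a hypothetical test sample ordered like the CSV columns:

def classify(node, x_test, headers):
    # Walk the tree, following the branch that matches the sample's attribute value
    if node.answer != "":
        return node.answer
    pos = list(headers).index(node.attribute)
    for value, child in node.children:
        if x_test[pos] == value:
            return classify(child, x_test, headers)
    return None  # attribute value not seen during training

# Hypothetical unseen sample, ordered like the Tennisdata.csv columns
print(classify(node, ['sunny', 'hot', 'high', 'weak'], metadata))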
Data Set:
Tennisdata.csv
OUTPUT:
RESULT:
EX NO: 3
DATE:
Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.
AIM:
PROCEDURE:
PROGRAM:
import numpy as np
X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X/np.amax(X,axis=0) # normalise each feature by its column-wise maximum
y = y/100
#Sigmoid Function
def sigmoid(x):
    return 1/(1 + np.exp(-x))

#Derivative of Sigmoid Function
def derivatives_sigmoid(x):
    return x * (1 - x)
#Variable initialization
epoch=5000 #Setting training iterations
lr=0.1 #Setting learning rate
inputlayer_neurons = 2 #number of features in data set
hiddenlayer_neurons = 3 #number of hidden layer neurons
output_neurons = 1 #number of neurons at output layer
#Weight and bias initialization
wh=np.random.uniform(size=(inputlayer_neurons,hiddenlayer_neurons))
bh=np.random.uniform(size=(1,hiddenlayer_neurons))
wout=np.random.uniform(size=(hiddenlayer_neurons,output_neurons))
bout=np.random.uniform(size=(1,output_neurons))
for i in range(epoch):
    #Forward propagation
    hinp1=np.dot(X,wh)
    hinp=hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1=np.dot(hlayer_act,wout)
    outinp= outinp1+ bout
    output = sigmoid(outinp)
    #Backpropagation
    EO = y-output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act)
    d_hiddenlayer = EH * hiddengrad
    #Weight updates
    wout += hlayer_act.T.dot(d_output) * lr
    wh += X.T.dot(d_hiddenlayer) * lr
print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n", output)
OUTPUT:
RESULT:
EX NO: 4
DATE:
Write a program to implement the naïve Bayesian classifier for a sample training data set stored as a .CSV file and compute the accuracy with a few test data sets.
AIM:
PROCEDURE:
DATA SET:
Tennisdata.csv
PROGRAM:
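The record's listing begins mid-program. A minimal sketch of the presumably omitted head, assuming the Tennisdata.csv columns are Outlook, Temperature, Humidity, Windy and PlayTennis:

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the play-tennis data and split off the attribute columns
data = pd.read_csv("Tennisdata.csv")
print("The first 5 values of data is:\n", data.head())
X = data.iloc[:,:-1]
print("\nThe first 5 values of Train data is\n", X.head())

# Encode the first two categorical attributes (assumed column names)
le_Outlook = LabelEncoder()
X.Outlook = le_Outlook.fit_transform(X.Outlook)
le_Temperature = LabelEncoder()
X.Temperature = le_Temperature.fit_transform(X.Temperature)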
y = data.iloc[:,-1]
print("\nThe first 5 values of Train output is\n",y.head())
le_Humidity = LabelEncoder()
X.Humidity = le_Humidity.fit_transform(X.Humidity)
le_Windy = LabelEncoder()
X.Windy = le_Windy.fit_transform(X.Windy)
le_PlayTennis = LabelEncoder()
y = le_PlayTennis.fit_transform(y)
print("\nNow the Train output is\n",y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
classifier = GaussianNB()
classifier.fit(X_train,y_train)
print("Accuracy is:", accuracy_score(classifier.predict(X_test), y_test))
OUTPUT:
RESULT:
EX NO: 5
DATE:
Implement naïve Bayesian Classifier model to classify a set of documents and measure the accuracy, precision, and recall.
AIM:
Data Set:
document.csv
PROGRAM:
import pandas as pd
msg = pd.read_csv('document.csv', names=['message', 'label'])
print("Total Instances of Dataset: ", msg.shape[0])
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
OUTPUT:
X = msg.message
y = msg.labelnum
from sklearn.model_selection import train_test_split
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)
from sklearn.feature_extraction.text import CountVectorizer
count_v = CountVectorizer()
Xtrain_dm = count_v.fit_transform(Xtrain)
Xtest_dm = count_v.transform(Xtest)
df = pd.DataFrame(Xtrain_dm.toarray(),columns=count_v.get_feature_names()) # use get_feature_names_out() on scikit-learn >= 1.0
print(df[0:5])
OUTPUT:
about am an and awesome bad beers best boss can ... tired to \
0 0 1 0 1 0 0 0 0 0 0 ... 1 0
1 0 0 0 0 0 0 0 0 0 0 ... 0 0
2 0 0 0 0 0 0 0 0 0 0 ... 0 0
3 0 0 0 0 0 0 0 0 0 1 ... 0 0
4 0 0 0 0 0 0 0 0 0 0 ... 0 0
[5 rows x 49 columns]
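The record omits the classifier step that produces the outputs below; a minimal sketch, assuming the usual multinomial naive Bayes over the count features:

from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

# Train on the document-term matrix and predict the held-out documents
clf = MultinomialNB().fit(Xtrain_dm, ytrain)
pred = clf.predict(Xtest_dm)

for doc, p in zip(Xtest, pred):
    print("%s -> %s" % (doc, 'pos' if p == 1 else 'neg'))

print('Accuracy Metrics:')
print('Accuracy:', accuracy_score(ytest, pred))
print('Recall:', recall_score(ytest, pred))
print('Precision:', precision_score(ytest, pred))
print('Confusion Matrix:\n', confusion_matrix(ytest, pred))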
OUTPUT:
I am sick and tired of this place -> pos
I do not like the taste of this juice -> neg
I love this sandwich -> neg
I can't deal with this -> pos
I do not like this restaurant -> neg
OUTPUT:
Accuracy Metrics:
Accuracy: 0.6
Recall: 0.5
Precision: 1.0
Confusion Matrix:
[[1 0]
[2 2]]
RESULT:
EX NO: 6
DATE:
Write a program to construct a Bayesian network to diagnose CORONA infection using standard WHO Data Set.
AIM:
PROCEDURE:
Let's see if a Bayesian Belief Network (BBN) is able to diagnose the COVID-19 virus with any reasonable success. The idea is that a patient presents with some symptoms, and we must diagnostically reason from the symptoms back to the cause. The BBN is taken from BayesiaLab's Differential Diagnosis model.
PROGRAM:
Data
import pandas as pd
Data Transformation
We will apply transformations to the data, primarily to the symptoms. There are only about 200 unique symptoms across all the COVID-19 patients. We map these 200 unique symptoms, in a many-to-many fashion, to 32 broad symptom categories, listed below (a sketch of the resulting symptom_map follows the list).
abdominal_pain
anorexia
anosmia
chest
chills
coronary
diarrhoea
digestive
discharge
dizziness
dry_cough
dryness
dyspnea
eye
fatigue
fever
headache
lungs
malaise
mild
muscle
myelofibrosis
nasal
nausea
respiratory
running_nose
sneezing
sore_throat
sputum
sweating
walking
wheezing
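The code below looks symptoms up in a symptom_map that the record never shows; a hypothetical fragment illustrating its assumed shape (category name mapped to raw symptom strings):

# Hypothetical fragment of the category -> raw-symptom mapping; the real
# map covers roughly 200 raw symptom strings across all 32 categories.
symptom_map = {
    'fever': ['fever', 'low fever', 'high fever'],
    'dry_cough': ['dry cough', 'cough'],
    'fatigue': ['fatigue', 'tiredness'],
    # ... remaining categories
}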
import json
import itertools
from datetime import datetime
def tokenize(s):
    if s is None or isinstance(s, float) or len(s) < 1 or pd.isna(s):
        return None
    try:
        delim = ';' if ';' in s else ','
        return [t.strip().lower() for t in s.split(delim) if len(t.strip()) > 0]
    except Exception:
        return s
def map_to_symptoms(s):
    if s.startswith('fever') or s.startswith('low fever'):
        return ['fever']
    return [k for k, v in symptom_map.items() if s in v]
d = data[['symptoms']].dropna(how='all').copy(deep=True)
print(d.shape)
# Tokenize the raw symptom strings, then one-hot encode each broad category
d['symptoms'] = d.symptoms.apply(tokenize)
for s in symptom_map.keys():
    d[s] = d.symptoms.apply(lambda arr: 0 if arr is None else 1 if s in arr else 0)
d = d.drop(['symptoms'], axis=1)
print(d.shape)
OUTPUT:
(656, 1)
(656, 32)
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

plt.style.use('seaborn')
# Plot how often each broad symptom category occurs
d.sum().sort_values(ascending=False).plot(kind='bar', figsize=(15, 5), title='Symptom Frequency')
plt.tight_layout()
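The record stops at data preparation and never shows the diagnosis network itself. A toy sketch of the diagnostic reasoning step with pgmpy (two symptoms only, illustrative probabilities, not the BayesiaLab model referenced above):

from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Disease node with two symptom children; all CPD values are illustrative
model = BayesianNetwork([('covid', 'fever'), ('covid', 'dry_cough')])
cpd_covid = TabularCPD('covid', 2, [[0.9], [0.1]])
cpd_fever = TabularCPD('fever', 2, [[0.8, 0.3], [0.2, 0.7]],
                       evidence=['covid'], evidence_card=[2])
cpd_cough = TabularCPD('dry_cough', 2, [[0.9, 0.4], [0.1, 0.6]],
                       evidence=['covid'], evidence_card=[2])
model.add_cpds(cpd_covid, cpd_fever, cpd_cough)
assert model.check_model()

# Reason diagnostically from observed symptoms back to the disease
infer = VariableElimination(model)
print(infer.query(['covid'], evidence={'fever': 1, 'dry_cough': 1}))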
OUTPUT:
RESULT:
EX NO: 7
DATE:
Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms.
AIM:
PROGRAM:
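The record's listing starts at the scaling step. A minimal sketch of the presumably omitted setup, assuming the Iris data with the column names used below, plus the k-Means run that the aim asks to compare against:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets, preprocessing
from sklearn.cluster import KMeans

# Load Iris into a DataFrame with the column names used below
iris = datasets.load_iris()
X = pd.DataFrame(iris.data, columns=['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width'])
y = pd.DataFrame(iris.target, columns=['Targets'])
colormap = np.array(['red', 'lime', 'black'])
plt.figure(figsize=(14, 7))

# Plot the true classes
plt.subplot(1, 3, 1)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y.Targets], s=40)
plt.title('Real Clusters')

# k-Means clustering with 3 clusters
model = KMeans(n_clusters=3)
model.fit(X)
plt.subplot(1, 3, 2)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[model.labels_], s=40)
plt.title('K-Means Clustering')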
# Transform the data so its distribution has mean 0 and standard deviation 1
scaler = preprocessing.StandardScaler()
scaler.fit(X)
xsa = scaler.transform(X)
xs = pd.DataFrame(xsa, columns = X.columns)

from sklearn.mixture import GaussianMixture
gmm = GaussianMixture(n_components=3) # one component per Iris class
gmm.fit(xs)
y_cluster_gmm = gmm.predict(xs)

plt.subplot(1, 3, 3)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y_cluster_gmm], s=40)
plt.title('GMM Clustering')
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')

print('Observation: The GMM using EM algorithm based clustering matched the true labels more closely than the Kmeans.')
OUTPUT:
Observation: The GMM using EM algorithm based clustering matched the true labels more closely than the
Kmeans.
RESULT:
EX NO: 8
DATE:
Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set. Print both correct and wrong predictions.
AIM:
PROCEDURE:
Training algorithm:
For each training example (x, f(x)), add the example to the list training_examples.
Classification algorithm:
Given a query instance xq to be classified,
Let x1 ... xk denote the k instances from training_examples that are nearest to xq.
Return the most common value of f among x1 ... xk (majority vote); where f is real-valued, return instead the mean value of the k nearest training examples. A minimal sketch of this step follows.
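A minimal distance-based sketch of this classification step (hypothetical helper; Euclidean distance and majority vote assumed), before the sklearn-based program used below:

import numpy as np
from collections import Counter

def knn_predict(Xtrain, ytrain, xq, k=3):
    # Majority vote among the k training instances nearest to xq
    dists = np.linalg.norm(Xtrain - xq, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(ytrain[nearest]).most_common(1)[0][0]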
Data Set:
Iris Plants Dataset: contains 150 instances (50 in each of three classes). Number of attributes: 4 numeric, predictive attributes and the class.
Sample Data:
PROGRAM:
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics
# Load the Iris data described above, hold out a test split, and fit a 1-NN classifier
dataset = pd.read_csv("iris.csv")
X = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1]
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.10)
classifier = KNeighborsClassifier(n_neighbors=1).fit(Xtrain, ytrain)
ypred = classifier.predict(Xtest)
i=0
print ("\n-------------------------------------------------------------------------")
print ('%-25s %-25s %-25s' % ('Original Label', 'Predicted Label', 'Correct/Wrong'))
print ("-------------------------------------------------------------------------")
for label in ytest:
    print ('%-25s %-25s' % (label, ypred[i]), end="")
    if (label == ypred[i]):
        print (' %-25s' % ('Correct'))
    else:
        print (' %-25s' % ('Wrong'))
    i = i + 1
print ("-------------------------------------------------------------------------")
print("\nConfusion Matrix:\n",metrics.confusion_matrix(ytest, ypred))
print ("-------------------------------------------------------------------------")
print("\nClassification Report:\n",metrics.classification_report(ytest, ypred))
print ("-------------------------------------------------------------------------")
print('Accuracy of the classifier is %0.2f' % metrics.accuracy_score(ytest,ypred))
print ("-------------------------------------------------------------------------")
OUTPUT:
-------------------------------------------------------------------------
Original Label Predicted Label Correct/Wrong
-------------------------------------------------------------------------
Iris-versicolor Iris-versicolor Correct
Iris-virginica Iris-versicolor Wrong
Iris-virginica Iris-virginica Correct
Iris-versicolor Iris-versicolor Correct
Iris-setosa Iris-setosa Correct
Iris-versicolor Iris-versicolor Correct
Iris-setosa Iris-setosa Correct
Iris-setosa Iris-setosa Correct
Iris-virginica Iris-virginica Correct
Iris-virginica Iris-versicolor Wrong
Iris-virginica Iris-virginica Correct
Iris-setosa Iris-setosa Correct
Iris-virginica Iris-virginica Correct
Iris-virginica Iris-virginica Correct
Iris-versicolor Iris-versicolor Correct
-------------------------------------------------------------------------
Confusion Matrix:
[[4 0 0]
[0 4 0]
[0 2 5]]
-------------------------------------------------------------------------
Classification Report:
precision recall f1-score support
-------------------------------------------------------------------------
Accuracy of the classifier is 0.87
-------------------------------------------------------------------------
RESULT:
EX NO: 9
DATE:
Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.
AIM:
PROCEDURE:
Regression:
Regression is a technique from statistics that is used to predict values of a desired target quantity when the target quantity is continuous.
In regression, we seek to identify (or estimate) a continuous variable y associated with a given input vector x.
y is called the dependent variable.
x is called the independent variable.
Loess/Lowess Regression:
Loess regression is a nonparametric technique that uses locally weighted regression to fit a smooth curve through points in a scatter plot.
Lowess Algorithm:
Locally weighted regression is a very powerful nonparametric model used in statistical learning.
Given a dataset X, y, we attempt to find a model parameter β(x) that minimizes the residual sum of weighted squared errors.
The weights are given by a kernel function (k or w) which can be chosen arbitrarily.
ALGORITHM:
1. Read the given data sample to X and the curve (linear or nonlinear) to Y.
2. Set the value for the smoothening parameter or free parameter, say τ.
3. Set the bias/point of interest set x0, which is a subset of X.
4. Determine the weight matrix using the Gaussian kernel: w(x, x0) = exp(-(x - x0)² / (2τ²)).
5. Determine the value of the model term parameter β using the weighted normal equations: β = (XᵀWX)⁻¹XᵀWy.
6. Prediction = x0 · β
PROGRAM:
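The listing begins inside the smoothing routine. A minimal sketch of the presumably omitted head, following the standard robust LOWESS formulation (tricube kernel and iteratively reweighted local linear fits); the function name and signature are taken from the call at the end of the program:

import math
import numpy as np
from scipy import linalg

def lowess(x, y, f, iterations):
    # f is the smoothing span; iterations is the number of robustifying passes
    n = len(x)
    r = int(math.ceil(f * n))
    # Tricube weights over each point's local neighbourhood
    h = [np.sort(np.abs(x - x[i]))[r] for i in range(n)]
    w = np.clip(np.abs((x[:, None] - x[None, :]) / h), 0.0, 1.0)
    w = (1 - w ** 3) ** 3
    yest = np.zeros(n)
    delta = np.ones(n)
    for iteration in range(iterations):
        # Weighted linear least-squares fit around every point
        for i in range(n):
            weights = delta * w[:, i]
            b = np.array([np.sum(weights * y), np.sum(weights * y * x)])
            A = np.array([[np.sum(weights), np.sum(weights * x)],
                          [np.sum(weights * x), np.sum(weights * x * x)]])
            beta = linalg.solve(A, b)
            yest[i] = beta[0] + beta[1] * x[i]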
        # Robustifying weights: downweight points with large residuals
        residuals = y - yest
        s = np.median(np.abs(residuals))
        delta = np.clip(residuals / (6.0 * s), -1, 1)
        delta = (1 - delta ** 2) ** 2
    return yest
import math
n = 100
x = np.linspace(0, 2 * math.pi, n)
y = np.sin(x) + 0.3 * np.random.randn(n)
f = 0.25
iterations = 3
yest = lowess(x, y, f, iterations)
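The aim asks for graphs but the listing stops after the fit; a minimal plotting sketch, assuming matplotlib:

import matplotlib.pyplot as plt

# Plot the noisy samples against the smoothed LOWESS estimate
plt.plot(x, y, "r.", label="noisy data")
plt.plot(x, yest, "b-", label="LOWESS fit")
plt.legend()
plt.show()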
OUTPUT:
RESULT: