ML Experiments
Experiment-1:
Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of
training data samples. Read the training data from a .CSV file.
Program:
import csv

a = []
# Read the training data from the CSV file
with open('C:\\Users\\KOTES\\OneDrive\\Desktop\\BOOK2.CSV', 'r') as csvfile:
    for row in csv.reader(csvfile):
        a.append(row)
print(a)
print("\nThe total number of training instances is:", len(a))

num_attribute = len(a[0]) - 1
print("\nThe initial hypothesis is:")
hypothesis = ['0'] * num_attribute
print(hypothesis)

# FIND-S: generalize the hypothesis only on positive examples
for i in range(len(a)):
    if a[i][num_attribute] == 'yes':
        for j in range(num_attribute):
            if hypothesis[j] == '0' or hypothesis[j] == a[i][j]:
                hypothesis[j] = a[i][j]
            else:
                hypothesis[j] = '?'
    print("\nThe hypothesis after training instance {} is:\n".format(i + 1), hypothesis)

print("\nThe maximally specific hypothesis for the training data is:")
print(hypothesis)
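Note: BOOK2.CSV is not included in this record. FIND-S expects each row to hold the attribute values followed by a yes/no class label; the classic EnjoySport training set from Mitchell's textbook is a typical input, for example:

sunny,warm,normal,strong,warm,same,yes
sunny,warm,high,strong,warm,same,yes
rainy,cold,high,strong,warm,change,no
sunny,warm,high,strong,cool,change,yes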
Output:
Experiment-2:
For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-
Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.
Program:
import numpy as np
import pandas as pd

data = pd.read_csv('C:\\Users\\KOTES\\OneDrive\\Desktop\\BOOK2.CSV')
concepts = np.array(data.iloc[:, 0:-1])
print(concepts)
target = np.array(data.iloc[:, -1])
print(target)

def learn(concepts, target):
    specific_h = concepts[0].copy()
    print("Initialization of specific_h and general_h")
    print(specific_h)
    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    print(general_h)
    for i, h in enumerate(concepts):
        if target[i] == "yes":
            # Positive example: generalize specific_h, prune general_h
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
            print(specific_h)
        if target[i] == "no":
            # Negative example: specialize general_h against specific_h
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
        print("Steps of Candidate Elimination Algorithm", i + 1)
        print(specific_h)
        print(general_h)
    # Remove the unchanged (all-'?') rows from general_h
    indices = [i for i, val in enumerate(general_h) if val == ['?'] * len(specific_h)]
    for i in indices:
        general_h.remove(['?'] * len(specific_h))
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("Final Specific_h:", s_final, sep="\n")
print("Final General_h:", g_final, sep="\n")
Output:
Experiment-3:
Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set
for building the decision tree and apply this knowledge to classify a new sample.
Program:
import pandas as pd
import math

data = pd.read_csv("C:\\Users\\KOTES\\OneDrive\\Desktop\\EXP 3\\buys.csv")
features = [feat for feat in data.columns if feat != "Buys_computer"]

class Node:
    def __init__(self):
        self.children = []
        self.value = ""
        self.isLeaf = False
        self.pred = ""

    def printTree(self, depth=0):
        for i in range(depth):
            print("\t", end="")
        print(self.value, end="")
        if self.isLeaf:
            print("->", self.pred, end="")
        print()
        for child in self.children:
            child.printTree(depth + 1)

def entropy(examples):
    pos = examples["Buys_computer"].eq("yes").sum()
    neg = examples["Buys_computer"].eq("no").sum()
    if pos == 0.0 or neg == 0.0:
        return 0.0
    else:
        p = pos / (pos + neg)
        n = neg / (pos + neg)
        return -(p * math.log(p, 2) + n * math.log(n, 2))

def info_gain(example, attr):
    gain = entropy(example)
    for u in example[attr].unique():
        subdata = example[example[attr] == u]
        gain -= (len(subdata) / len(example)) * entropy(subdata)
    return gain

def ID3(example, attrs):
    root = Node()
    max_gain = 0
    max_feat = ""
    # Choose the attribute with the highest information gain
    for feature in attrs:
        gain = info_gain(example, feature)
        if gain > max_gain:
            max_gain = gain
            max_feat = feature
    root.value = max_feat
    for u in example[max_feat].unique():
        subdata = example[example[max_feat] == u]
        if entropy(subdata) == 0.0:
            # Pure subset: create a leaf carrying the class label
            newNode = Node()
            newNode.isLeaf = True
            newNode.value = u
            newNode.pred = subdata["Buys_computer"].unique()
            root.children.append(newNode)
        else:
            # Impure subset: recurse on the remaining attributes
            dummyNode = Node()
            dummyNode.value = u
            new_attrs = [feat for feat in attrs if feat != max_feat]
            child = ID3(subdata, new_attrs)
            dummyNode.children.append(child)
            root.children.append(dummyNode)
    return root

root = ID3(data, features)
print("Decision Tree is:")
root.printTree()
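Note: buys.csv is not included in this record. The program expects a table of categorical attributes with a Buys_computer label column; the classic AllElectronics data set (Han & Kamber) is a typical choice, for example:

age,income,student,credit_rating,Buys_computer
youth,high,no,fair,no
youth,high,no,excellent,no
middle_aged,high,no,fair,yes
senior,medium,no,fair,yes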
Output:
Experiment-4:
Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to perform this
task. Built-in Java classes/API can be used to write the program. Calculate the accuracy, precision, and recall for
your data set.
Program:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

df = pd.read_csv("C:\\Users\\KOTES\\OneDrive\\Desktop\\diabetes.csv")
# The feature matrix must not include the target column
X = df[['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',
        'BMI', 'DiabetesPedigreeFunction', 'Age']]
y = df['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = GaussianNB()
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)

accuracy = accuracy_score(y_test, predictions)
confusion = confusion_matrix(y_test, predictions)
print(classification_report(y_test, predictions))
print("accuracy:", accuracy)
print("confusion matrix:\n", confusion)
print("predictions", predictions)
Output:
Experiment-5:
Program:
Output:
Experiment-6:
Write a program to implement k-Nearest Neighbor algorithm to classify the iris data set. Print both correct and
wrong predictions.
Program:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
data = pd.read_csv("C:\\Users\\KOTES\\OneDrive\\Desktop\\diabetes.csv")  # this listing uses a local diabetes CSV rather than the iris data named in the statement
X = data.drop('target', axis=1)
y = data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
n_neighbors = int(input("Enter the number of neighbors (k): "))
knn_classifier = KNeighborsClassifier(n_neighbors=n_neighbors)
knn_classifier.fit(X_train_scaled, y_train)
y_pred = knn_classifier.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')
print('Classification Report:\n', report)
print('Confusion Matrix:\n', conf_matrix)
# Print both correct and wrong predictions, as the problem statement requires
for actual, predicted in zip(y_test.values, y_pred):
    result = "Correct" if actual == predicted else "Wrong"
    print("Actual:", actual, "Predicted:", predicted, "->", result)
Output:
Experiment-7: Develop a program for bias, variance, duplicate removal, and cross-validation
Program:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Load the dataset (replace the path with your own dataset)
data = pd.read_csv("C:\\Users\\KOTES\\OneDrive\\Desktop\\diabetes.csv")
# Remove duplicates
data = data.drop_duplicates()
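The listing stops after duplicate removal. Continuing from the imports and data above, a minimal sketch of the remaining steps (a train/test split, a simple bias/variance reading from the train-test accuracy gap, and 5-fold cross-validation) could look like this; the 'target' column name is an assumption:

X = data.drop('target', axis=1)  # 'target' is an assumed label column name
y = data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
# A high training error suggests bias; a large train-test gap suggests variance
print("Bias indicator (1 - training accuracy):", 1 - train_acc)
print("Variance indicator (train-test accuracy gap):", train_acc - test_acc)
# 5-fold cross-validation
scores = cross_val_score(model, X, y, cv=5)
print("Cross-validation scores:", scores)
print("Mean CV accuracy:", scores.mean())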
Output:
Experiment-8:
Exercises to solve real-world problems using the following machine learning methods:
a) Linear Regression
b) Logistic Regression
c) Binary Classifier
Program:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import mean_squared_error, accuracy_score, confusion_matrix, classification_report

# Load the data (the diabetes CSV from the earlier experiments is assumed here)
df = pd.read_csv("C:\\Users\\KOTES\\OneDrive\\Desktop\\diabetes.csv")
X = df.drop('target', axis=1)
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# a) Linear Regression
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print('Linear Regression Mean squared error:', mse)

# b) Logistic Regression
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
print('Logistic Regression Accuracy:', accuracy)
print('Logistic Regression Confusion Matrix:\n', conf_matrix)

# c) Binary Classifier (support vector classifier)
model = SVC().fit(X_train, y_train)
y_pred = model.predict(X_test)
print('SVC Accuracy:', accuracy_score(y_test, y_pred))
print('SVC Classification Report:\n', classification_report(y_test, y_pred))
Output:
Experiment-9: Write a program to implement Support Vector Machines
Program:
from sklearn import datasets

# Load the iris dataset (you can replace this with your own dataset)
iris = datasets.load_iris()
X = iris.data
y = iris.target
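The remainder of this program is not shown in the record. Continuing from the load above, a minimal sketch of the usual steps — split, fit an SVC, and report accuracy — could be (the 80/20 split and the default RBF kernel are assumptions):

from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Split, train, and evaluate
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = SVC()  # default RBF kernel (assumed; the original kernel choice is not shown)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")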
Output:
Accuracy: 0.97
Experiment-10:
Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select
appropriate data set for your experiment and draw graphs.
Program:
# Set k here
ypred = localWeightRegression(X, mtip, 0.5)
SortIndex = X[:, 1].argsort(0)
xsort = X[SortIndex][:, 0]
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.scatter(bill, tip, color='black')
ax.plot(xsort[:, 1], ypred[SortIndex], color='green', linewidth=5)
plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.show()
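The fragment above calls localWeightRegression and uses X, mtip, bill, and tip, none of which are defined in the listing. A self-contained sketch of a standard locally weighted regression program on the tips data set (the tips.csv path and the Gaussian kernel form are assumptions) might look like this:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def kernel(point, xmat, k):
    # Gaussian weights: points near the query point get weights close to 1
    m = np.shape(xmat)[0]
    weights = np.asmatrix(np.eye(m))
    for j in range(m):
        diff = point - xmat[j]
        weights[j, j] = np.exp(diff * diff.T / (-2.0 * k ** 2))
    return weights

def localWeight(point, xmat, ymat, k):
    # Solve the weighted least-squares problem at one query point
    wei = kernel(point, xmat, k)
    W = (xmat.T * (wei * xmat)).I * (xmat.T * (wei * ymat.T))
    return W

def localWeightRegression(xmat, ymat, k):
    m = np.shape(xmat)[0]
    ypred = np.zeros(m)
    for i in range(m):
        ypred[i] = xmat[i] * localWeight(xmat[i], xmat, ymat, k)
    return ypred

data = pd.read_csv("tips.csv")  # assumed path to the tips data set
bill = np.array(data.total_bill)
tip = np.array(data.tip)
mbill = np.asmatrix(bill)
mtip = np.asmatrix(tip)
m = np.shape(mbill)[1]
one = np.asmatrix(np.ones(m))
X = np.hstack((one.T, mbill.T))  # prepend a bias column

# Set k here
ypred = localWeightRegression(X, mtip, 0.5)
SortIndex = X[:, 1].argsort(0)
xsort = X[SortIndex][:, 0]

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.scatter(bill, tip, color='black')
ax.plot(xsort[:, 1], ypred[SortIndex], color='green', linewidth=5)
plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.show()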
Output:
Experiment-11: Write a program to implement Categorical Encoding and One-hot Encoding
Program:
import pandas as pd

def categorical_encoding(df):
    # Dummy-encode every categorical column (one indicator column per category)
    return pd.get_dummies(df)

def one_hot_encoding(df):
    # Drop the first category of each feature to avoid redundant columns
    return pd.get_dummies(df, drop_first=True)

# Example usage:
df = pd.read_csv("C:\\Users\\KOTES\\OneDrive\\Desktop\\tips.csv")
print(categorical_encoding(df).head())
print(one_hot_encoding(df).head())
Output:
Experiment-12:
Apply EM algorithm to cluster a Heart Disease Data Set. Use the same data set for clustering using k-Means
algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can add
Java/Python ML library classes/API in the program.
Program:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score
import matplotlib.pyplot as plt
import seaborn as sns

# Load the heart disease data set ('heart.csv' is a placeholder path; numeric columns assumed)
heart_data = pd.read_csv("heart.csv")
# Impute missing values and standardize the features
heart_data_imputed = SimpleImputer(strategy='mean').fit_transform(heart_data)
heart_data_scaled = StandardScaler().fit_transform(heart_data_imputed)

# Apply k-Means
kmeans_clusters = KMeans(n_clusters=3, random_state=42, n_init=10).fit_predict(heart_data_scaled)
heart_data['kmeans_cluster'] = kmeans_clusters

# Apply EM Algorithm
em_clusters = GaussianMixture(n_components=3, random_state=42).fit_predict(heart_data_scaled)
heart_data['em_cluster'] = em_clusters

# Compare clustering quality with silhouette scores (higher is better)
print("k-Means silhouette score:", silhouette_score(heart_data_scaled, kmeans_clusters))
print("EM silhouette score:", silhouette_score(heart_data_scaled, em_clusters))

# Visualize Clusters
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
sns.scatterplot(x=heart_data.columns[0], y=heart_data.columns[1], hue='kmeans_cluster',
                data=heart_data, palette='viridis', legend='full')
plt.title('k-Means Clustering')
plt.subplot(1, 2, 2)
sns.scatterplot(x=heart_data.columns[0], y=heart_data.columns[1], hue='em_cluster',
                data=heart_data, palette='viridis', legend='full')
plt.title('EM Clustering')
plt.show()
Output:
Experiment-13: Write a program to implement Principal Component Analysis
Program:
import numpy as np

# Example usage
if __name__ == "__main__":
    # Generate some example data: 10 samples, 5 features
    np.random.seed(42)
    data = np.random.rand(10, 5)
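The PCA implementation itself does not appear in this record. A minimal from-scratch sketch (centering the data, eigendecomposing the covariance matrix, and projecting onto the top components; the function name and the choice of two components are illustrative) could look like this:

import numpy as np

def pca(data, n_components=2):
    # Center the data
    centered = data - data.mean(axis=0)
    # Covariance matrix of the features
    cov = np.cov(centered, rowvar=False)
    # Eigendecomposition; eigh suits symmetric matrices
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Sort components by descending eigenvalue
    order = np.argsort(eigvals)[::-1]
    components = eigvecs[:, order[:n_components]]
    # Project the centered data onto the principal components
    return centered @ components

np.random.seed(42)
data = np.random.rand(10, 5)
reduced = pca(data, n_components=2)
print("Reduced data shape:", reduced.shape)
print(reduced)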
Output:
Experiment-14: Write a Python program to construct a Bayesian network considering medical data. Use this
model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set.
Program:
import pandas as pd
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.models import BayesianModel
from pgmpy.inference import VariableElimination

data = pd.read_csv("C:\\Users\\KOTES\\Downloads\\ds4.csv")
heart_disease = pd.DataFrame(data)
print(heart_disease)

# Network structure: each recorded factor is taken as a parent of heartdisease.
# (The original edge list is not shown in this record; this simple structure is an assumption.)
model = BayesianModel([('age', 'heartdisease'), ('Gender', 'heartdisease'),
                       ('Family', 'heartdisease'), ('diet', 'heartdisease'),
                       ('Lifestyle', 'heartdisease'), ('cholestrol', 'heartdisease')])
model.fit(heart_disease, estimator=MaximumLikelihoodEstimator)

HeartDisease_infer = VariableElimination(model)
q = HeartDisease_infer.query(variables=['heartdisease'], evidence={
    'age': int(input('Enter Age: ')),
    'Gender': int(input('Enter Gender: ')),
    'Family': int(input('Enter Family History: ')),
    'diet': int(input('Enter Diet: ')),
    'Lifestyle': int(input('Enter Lifestyle: ')),
    'cholestrol': int(input('Enter Cholestrol: '))
})
print(q)
Output:
Experiment-15: Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the
same using appropriate data sets.
Program:
import numpy as np

# Training data and normalization
X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X / np.amax(X, axis=0)
y = y / 100

# Sigmoid function and its derivative (expressed in terms of the sigmoid output)
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def derivatives_sigmoid(x):
    return x * (1 - x)

# Variable initialization
epoch = 7000
lr = 0.1
inputlayer_neurons = 2
hiddenlayer_neurons = 3
output_neurons = 1

# Random weight and bias initialization
wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))

# Training
for i in range(epoch):
    # Forward Propagation
    hinp1 = np.dot(X, wh)
    hinp = hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1 = np.dot(hlayer_act, wout)
    outinp = outinp1 + bout
    output = sigmoid(outinp)
    # Backpropagation
    EO = y - output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act)
    d_hiddenlayer = EH * hiddengrad
    # Weight and bias updates
    wout += hlayer_act.T.dot(d_output) * lr
    bout += np.sum(d_output, axis=0, keepdims=True) * lr
    wh += X.T.dot(d_hiddenlayer) * lr
    bh += np.sum(d_hiddenlayer, axis=0, keepdims=True) * lr

# Display results
print("Predicted Output: \n", output)
Output:
Predicted Output:
[[0.89384034]
[0.8807541 ]
[0.89510843]]