ML Experiments

The document outlines a series of programming experiments demonstrating various machine learning algorithms and techniques, including FIND-S, Candidate-Elimination, ID3 decision trees, Naïve Bayes, k-Nearest Neighbors, and Support Vector Machines. Each experiment includes code snippets in Python for implementing the algorithms, along with data preprocessing steps, model training, and evaluation metrics such as accuracy and confusion matrices. Additionally, it covers exploratory data analysis and categorical encoding methods, culminating in clustering comparisons using the EM algorithm and k-Means.

Uploaded by

dmoby93933

Experiment-1:

Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of
training data samples. Read the training data from a .CSV file.

Program:

import csv

a = []
with open('C:\\Users\\KOTES\\OneDrive\\Desktop\\BOOK2.CSV', 'r') as csvfile:
    for row in csv.reader(csvfile):
        a.append(row)
    print(a)
print("\n The total number of training instances are : ", len(a))
num_attribute = len(a[0]) - 1
print("\n The initial hypothesis is : ")
hypothesis = ['0'] * num_attribute
print(hypothesis)
for i in range(0, len(a)):
    # FIND-S only generalizes on positive examples
    if a[i][num_attribute] == 'yes':
        for j in range(0, num_attribute):
            if hypothesis[j] == '0' or hypothesis[j] == a[i][j]:
                hypothesis[j] = a[i][j]
            else:
                hypothesis[j] = '?'
    print("\n The hypothesis for the training instance {} is :\n".format(i+1), hypothesis)
print("\n The Maximally specific hypothesis for the training instance is ")
print(hypothesis)

Output:
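
The contents of BOOK2.CSV are not reproduced here, so the following is a minimal sketch of FIND-S on inline data that can be run without the file. The four training rows are an assumption (a weather-style data set with six attributes and a Yes/No label), and the function name find_s is hypothetical.

```python
def find_s(examples):
    """Return the most specific hypothesis that covers every positive example."""
    hypothesis = None
    for *attributes, label in examples:
        if label.lower() != "yes":
            continue  # FIND-S ignores negative examples
        if hypothesis is None:
            # The first positive example becomes the initial hypothesis
            hypothesis = list(attributes)
            continue
        for j, value in enumerate(attributes):
            if hypothesis[j] != value:
                hypothesis[j] = "?"  # generalize a mismatching attribute
    return hypothesis

# Assumed training data: six attributes followed by the class label
training_data = [
    ["Sunny", "Warm", "Normal", "Strong", "Warm", "Same", "Yes"],
    ["Sunny", "Warm", "High", "Strong", "Warm", "Same", "Yes"],
    ["Rainy", "Cold", "High", "Strong", "Warm", "Change", "No"],
    ["Sunny", "Warm", "High", "Strong", "Cool", "Change", "Yes"],
]

result = find_s(training_data)
print(result)  # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```

Only the positive rows change the hypothesis; the third (negative) row is skipped entirely.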
Experiment-2:

For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-
Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.

Program:

import numpy as np
import pandas as pd

data = pd.read_csv('C:\\Users\\KOTES\\OneDrive\\Desktop\\BOOK2.CSV')
concepts = np.array(data.iloc[:, 0:-1])
print(concepts)
target = np.array(data.iloc[:, -1])
print(target)

def learn(concepts, target):
    # Start from the first example as the most specific hypothesis
    specific_h = concepts[0].copy()
    print("initialization of specific_h and general_h")
    print(specific_h)
    general_h = [["?" for _ in range(len(specific_h))] for _ in range(len(specific_h))]
    print(general_h)
    for i, h in enumerate(concepts):
        if target[i] == "yes":
            # Positive example: generalize specific_h where it disagrees
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        if target[i] == "no":
            # Negative example: specialize general_h where the example differs
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
        print(" steps of Candidate Elimination Algorithm", i+1)
        print(specific_h)
        print(general_h)
    # Drop rows of general_h that are still fully general (works for any number of attributes)
    all_general = ['?'] * len(specific_h)
    general_h = [g for g in general_h if g != all_general]
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("Final Specific_h:", s_final, sep="\n")
print("Final General_h:", g_final, sep="\n")

Output:
Experiment-3:

Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set
for building the decision tree and apply this knowledge to classify a new sample.
Program:

import pandas as pd
import math

data = pd.read_csv("C:\\Users\\KOTES\\OneDrive\\Desktop\\EXP 3\\buys.csv")
features = [feat for feat in data.columns if feat != "Buys_computer"]

class Node:
    def __init__(self):
        self.children = []
        self.value = ""
        self.isLeaf = False
        self.pred = ""

    def printTree(self, depth=0):
        for i in range(depth):
            print("\t", end="")
        print(self.value, end="")
        if self.isLeaf:
            print("->", self.pred)
        print()
        for child in self.children:
            child.printTree(depth+1)

def entropy(examples):
    pos = examples["Buys_computer"].eq("yes").sum()
    neg = examples["Buys_computer"].eq("no").sum()
    if pos == 0.0 or neg == 0.0:
        return 0.0
    else:
        p = pos / (pos + neg)
        n = neg / (pos + neg)
        return -(p * math.log(p, 2) + n * math.log(n, 2))

def info_gain(example, attr):
    # Entropy of the whole set minus the weighted entropy of each subset
    gain = entropy(example)
    for u in example[attr].unique():
        subdata = example[example[attr] == u]
        gain -= (len(subdata) / len(example)) * entropy(subdata)
    return gain

def ID3(example, attr):
    root = Node()
    # Pick the attribute with the highest information gain
    max_gain = 0
    max_feat = ""
    for feature in attr:
        gain = info_gain(example, feature)
        if gain > max_gain:
            max_gain = gain
            max_feat = feature
    root.value = max_feat
    for u in example[max_feat].unique():
        subdata = example[example[max_feat] == u]
        if entropy(subdata) == 0.0:
            # Pure subset: create a leaf carrying the class label
            newNode = Node()
            newNode.isLeaf = True
            newNode.value = u
            newNode.pred = subdata["Buys_computer"].unique()
            root.children.append(newNode)
        else:
            # Mixed subset: recurse on the remaining attributes
            dummyNode = Node()
            dummyNode.value = u
            new_attrs = [a for a in attr if a != max_feat]
            child = ID3(subdata, new_attrs)
            dummyNode.children.append(child)
            root.children.append(dummyNode)
    return root

root = ID3(data, features)
print("Decision Tree is:")
root.printTree()

Output:
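
The task for Experiment-3 also asks to classify a new sample. The following is a minimal, self-contained sketch of that step under stated assumptions: the six training rows, the attribute names Age and Student, and the helper names id3 and classify are all hypothetical and are not taken from buys.csv. The tree is stored as nested dictionaries rather than as Node objects.

```python
import math
from collections import Counter

def entropy(rows, target):
    counts = Counter(r[target] for r in rows)
    total = len(rows)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def info_gain(rows, attr, target):
    total = len(rows)
    remainder = 0.0
    for v in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == v]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(rows, target) - remainder

def id3(rows, attrs, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:
        return labels[0]  # pure subset: leaf
    if not attrs:
        return Counter(labels).most_common(1)[0][0]  # majority label
    best = max(attrs, key=lambda a: info_gain(rows, a, target))
    tree = {best: {}}
    for v in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == v]
        tree[best][v] = id3(subset, [a for a in attrs if a != best], target)
    return tree

def classify(tree, sample, default=None):
    # Walk from the root until a leaf (a plain label) is reached
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr].get(sample[attr], default)
    return tree

# Hypothetical training rows, for illustration only
rows = [
    {"Age": "youth", "Student": "no", "Buys": "no"},
    {"Age": "youth", "Student": "yes", "Buys": "yes"},
    {"Age": "middle", "Student": "no", "Buys": "yes"},
    {"Age": "middle", "Student": "yes", "Buys": "yes"},
    {"Age": "senior", "Student": "no", "Buys": "no"},
    {"Age": "senior", "Student": "yes", "Buys": "yes"},
]

tree = id3(rows, ["Age", "Student"], "Buys")
print(tree)
print(classify(tree, {"Age": "middle", "Student": "no"}))  # yes
```

On this data Student has the higher information gain, so it becomes the root, and Age is only consulted for non-students.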
Experiment-4:

Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to perform this
task. Built-in Java classes/API can be used to write the program. Calculate the accuracy, precision, and recall for
your data set.

Program:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score,classification_report,confusion_matrix
df=pd.read_csv("C:\\Users\\KOTES\\OneDrive\\Desktop\\diabetes.csv")
# Features only: the label column 'target' must not be part of X, otherwise the model sees the answer
X=df[['Pregnancies','Glucose','BloodPressure','SkinThickness','Insulin','BMI','DiabetesPedigreeFunction','Age']]
y=df['target']
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=.3,random_state=0)
clf=GaussianNB()
clf.fit(X_train,y_train)
predictions=clf.predict(X_test)
accuracy=accuracy_score(y_test,predictions)
confusion=confusion_matrix(y_test,predictions)
y_pred=clf.predict(X_test)
print(classification_report(y_test,y_pred))
print("accuracy:",accuracy)
print("confusion matrix: \n",confusion)
print('predictions',predictions)

Output:
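
The task for Experiment-4 speaks of classifying documents. The following is a minimal sketch of a multinomial naive Bayes text classifier with add-one (Laplace) smoothing, written in plain Python so that each step is visible. The tiny training and test corpora, the labels pos and neg, and the function names are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs is a list of (text, label) pairs."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for text, label in docs:
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    priors = {c: n / len(docs) for c, n in class_counts.items()}
    return priors, word_counts, vocab

def predict_nb(text, priors, word_counts, vocab):
    scores = {}
    for label, prior in priors.items():
        total = sum(word_counts[label].values())
        score = math.log(prior)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are skipped
                # Add-one smoothing keeps unseen (word, class) pairs from zeroing the score
                score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical corpora, for illustration only
train_docs = [
    ("great movie loved it", "pos"),
    ("wonderful acting great plot", "pos"),
    ("boring movie hated it", "neg"),
    ("terrible plot awful acting", "neg"),
]
test_docs = [
    ("great plot loved acting", "pos"),
    ("awful boring movie", "neg"),
    ("great acting terrible movie", "neg"),
]

model = train_nb(train_docs)
predicted = [predict_nb(text, *model) for text, _ in test_docs]
actual = [label for _, label in test_docs]

# Metrics for the "pos" class
tp = sum(p == "pos" and a == "pos" for p, a in zip(predicted, actual))
fp = sum(p == "pos" and a == "neg" for p, a in zip(predicted, actual))
fn = sum(p == "neg" and a == "pos" for p, a in zip(predicted, actual))
accuracy = sum(p == a for p, a in zip(predicted, actual)) / len(actual)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print("accuracy:", accuracy, "precision:", precision, "recall:", recall)
```

The third test document is deliberately mixed in tone and is misclassified, which is why precision and recall differ here.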
Experiment-5:

Exploratory Data Analysis for Classification using Pandas or Matplotlib.

Program:

import matplotlib.pyplot as plt


import numpy as np
import pandas as pd
from mpl_toolkits.mplot3d import Axes3D
from sklearn import datasets
df = pd.read_csv("C:\\Users\\KOTES\\OneDrive\\Desktop\\diabetes.csv")
# Bar Charts
df['BloodPressure'].value_counts().plot(kind='bar')
plt.xlabel('BloodPressure')
plt.ylabel('Count')
plt.title('Bar Chart - BloodPressure')
plt.show()
# Histograms
plt.hist(df['Age'], bins=20)
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.title('Histogram - Age Distribution')
plt.show()
# Scatter plot (plt.plot with markers only, no connecting line)
plt.plot(df['Age'], df['BMI'], marker='o', linestyle='None')
plt.xlabel('Age')
plt.ylabel('BMI')
plt.title('Scatter Plot - Age vs. BMI')
plt.show()
# Bubble Charts
plt.scatter(df['Age'], df['BMI'], s=df['BloodPressure'], alpha=0.5)
plt.xlabel('Age')
plt.ylabel('BMI')
plt.title('Bubble Chart - Age vs. BMI')
plt.show()

Output:
Experiment-6:
Write a program to implement k-Nearest Neighbor algorithm to classify the iris data set. Print both correct and
wrong predictions.

Program:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
data = pd.read_csv("C:\\Users\\KOTES\\OneDrive\\Desktop\\diabetes.csv")
X = data.drop('target', axis=1)
y = data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
n_neighbors = int(input("Enter the number of neighbors (k): "))
knn_classifier = KNeighborsClassifier(n_neighbors=n_neighbors)
knn_classifier.fit(X_train_scaled, y_train)
y_pred = knn_classifier.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')
print('Classification Report:\n', report)
print('Confusion Matrix:\n', conf_matrix)
print("Predicted Labels", y_pred)
print("Actual Labels", y_test)

Output:
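
The task for Experiment-6 asks for both correct and wrong predictions to be printed. The following is a minimal sketch of k-nearest-neighbour classification in plain Python that labels each test prediction as Correct or Wrong. The measurements (two petal-style features per flower) are illustrative values chosen for this sketch, not the full iris data set, and the function name knn_predict is hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    # Sort training points by Euclidean distance to the query and vote among the k closest
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Illustrative (petal length, petal width) measurements, assumed for this sketch
train = [
    ((1.4, 0.2), "setosa"), ((1.3, 0.2), "setosa"), ((1.5, 0.2), "setosa"),
    ((4.7, 1.4), "versicolor"), ((4.5, 1.5), "versicolor"), ((4.9, 1.5), "versicolor"),
    ((6.0, 2.5), "virginica"), ((5.1, 1.9), "virginica"), ((5.9, 2.1), "virginica"),
]
test = [
    ((1.6, 0.2), "setosa"),
    ((4.6, 1.3), "versicolor"),
    ((5.0, 1.7), "virginica"),
    ((5.8, 2.2), "virginica"),
]

results = []
for features, actual in test:
    predicted = knn_predict(train, features, k=3)
    status = "Correct" if predicted == actual else "Wrong"
    results.append((actual, predicted, status))
    print(f"actual={actual:<11} predicted={predicted:<11} {status}")
```

The third test flower sits on the boundary between two classes, so two of its three nearest neighbours are versicolor and the prediction is reported as Wrong.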
Experiment-7: Develop a program for Bias, Variance, Remove duplicates, Cross Validation

Program:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Load the dataset (replace the path with your own dataset path)
data = pd.read_csv("C:\\Users\\KOTES\\OneDrive\\Desktop\\diabetes.csv")

# Remove duplicates
data = data.drop_duplicates()

# Prepare features and target


X = data.drop('target', axis=1)
y = data['target']

# Split the dataset into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a Random Forest Classifier


model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

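# Rough proxies, not a formal bias-variance decomposition:
# the training error stands in for bias and the test error for variance
y_pred_train = model.predict(X_train)
bias = 1 - accuracy_score(y_train, y_pred_train)
variance = 1 - accuracy_score(y_test, model.predict(X_test))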
# Cross-validation
cv_scores = cross_val_score(model, X, y, cv=10)
average_cv_score = np.mean(cv_scores)

print("Bias (Training Error):", bias)


print("Variance (Test Error):", variance)
print("Average Cross-Validation Accuracy:", average_cv_score)
print("cv_scores",cv_scores)
print("After removing duplicates \n",data)
Output:
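
Experiment-7 relies on cross_val_score with cv=10 to perform the splitting. The following is a minimal sketch of how plain k-fold splitting partitions the sample indices, assuming no shuffling and no stratification (for classifiers scikit-learn uses a stratified variant, which this sketch does not reproduce). The function name k_fold_indices is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Return a list of (train_indices, test_indices) pairs, one per fold."""
    # The first n_samples % k folds receive one extra sample
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    folds, start = [], 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds

folds = k_fold_indices(10, 3)
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Every sample is used for testing exactly once and for training in the remaining folds; the cross-validation accuracy is the mean of the per-fold scores.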
Experiment-8:

Exercises to solve the real-world problems using the following machine learning methods:
a) Linear Regression
b) Logistic Regression
c) Binary Classifier

Program:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import mean_squared_error, accuracy_score, confusion_matrix, classification_report

# Load the data


df = pd.read_csv('C:\\Users\\KOTES\\OneDrive\\Desktop\\heart.csv')

# Split the data into training and test sets


features = ['age','sex','cp','trtbps','chol','fbs','restecg','thalachh','exng','oldpeak','slp','caa','thall']
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df['output'], test_size=0.25, random_state=42)

# Linear Regression
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print('Linear Regression Mean squared error:', mse)

# Logistic Regression
# A higher max_iter gives the solver room to converge on unscaled features
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
print('Logistic Regression Accuracy:', accuracy)
print('Logistic Regression Confusion Matrix:\n', conf_matrix)

# Support Vector Machine (SVM)


model = SVC().fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
print('SVM Accuracy:', accuracy)
print('SVM Confusion Matrix:\n', conf_matrix)

Output:
Experiment-9: Write a program to Implement Support Vector Machines

Program:

# Import necessary libraries


from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the iris dataset (you can replace this with your own dataset)
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features


scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Create an SVM classifier


svm_classifier = SVC(kernel='linear', C=1.0, random_state=42)

# Train the classifier on the training data


svm_classifier.fit(X_train, y_train)

# Make predictions on the testing data


predictions = svm_classifier.predict(X_test)

# Evaluate the accuracy of the model


accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy:.2f}")

Output:
Accuracy: 0.97
Experiment-10:
Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select
appropriate data set for your experiment and draw graphs.

Program:

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

def kernel(point, xmat, k):
    # Gaussian weight of every training row relative to the query point
    m, n = np.shape(xmat)
    weights = np.eye(m)
    for j in range(m):
        diff = point - xmat[j, :]
        weights[j, j] = np.exp(np.dot(diff, diff) / (-2.0 * k**2))
    return weights

def localWeight(point, xmat, ymat, k):
    # Weighted least-squares solution for this query point
    wei = kernel(point, xmat, k)
    W = np.linalg.inv(xmat.T @ (wei @ xmat)) @ (xmat.T @ (wei @ ymat))
    return W

def localWeightRegression(xmat, ymat, k):
    m, n = np.shape(xmat)
    ypred = np.zeros(m)
    for i in range(m):
        ypred[i] = np.dot(xmat[i, :], localWeight(xmat[i, :], xmat, ymat, k))
    return ypred

# Load data points
data = pd.read_csv('C:\\Users\\KOTES\\OneDrive\\Desktop\\tips.csv')
bill, tip = np.array(data.total_bill), np.array(data.tip)
# Design matrix: a column of ones (intercept) next to the bill amounts.
# Plain arrays are used instead of np.mat, which recent NumPy releases no longer provide.
X = np.column_stack((np.ones(len(bill)), bill))

# Set k here
ypred = localWeightRegression(X, tip, 0.5)
SortIndex = X[:, 1].argsort()

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.scatter(bill, tip, color='black')
ax.plot(X[SortIndex, 1], ypred[SortIndex], color='green', linewidth=5)
plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.show()
Output:
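
The program for Experiment-10 corresponds to the following weighted least-squares formulas, stated here as a summary sketch. The symbols are notation chosen for this note: x is the query point (including the leading 1 for the intercept), x_j is the j-th training row, k is the bandwidth, X is the design matrix and y the vector of tips.

```latex
\[
w_j(x) = \exp\!\left(-\frac{\lVert x - x_j \rVert^2}{2k^2}\right),
\qquad
W(x) = \operatorname{diag}\bigl(w_1(x), \dots, w_m(x)\bigr)
\]
\[
\hat{\theta}(x) = \bigl(X^{\top} W(x)\, X\bigr)^{-1} X^{\top} W(x)\, y,
\qquad
\hat{y}(x) = x^{\top} \hat{\theta}(x)
\]
```

A small k makes the weights fall off quickly, so each prediction depends on only a few nearby points; a large k approaches ordinary linear regression.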
Experiment-11: Write a program to implement Categorical Encoding, One-hot Encoding

Program:

import pandas as pd
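def categorical_encoding(df):
    # get_dummies creates one 0/1 column per category of every categorical column
    return pd.get_dummies(df)
def one_hot_encoding(df):
    # drop_first=True drops the first category of each column (k-1 columns for k categories)
    return pd.get_dummies(df, drop_first=True)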
# Example usage:
df = pd.read_csv("C:\\Users\\KOTES\\OneDrive\\Desktop\\tips.csv")

# Perform categorical encoding.


categorical_encoded_df = categorical_encoding(df.copy())

# Perform one-hot encoding.


one_hot_encoded_df = one_hot_encoding(df.copy())

# Print the encoded dataframes.


print(categorical_encoded_df)
print(one_hot_encoded_df)

Output:
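
Both functions in the Experiment-11 program rely on get_dummies. For contrast, the following is a minimal plain-Python sketch of the two encodings usually meant by the title: integer (label) encoding and one-hot encoding. The sample values mimic a day-of-week column and are an assumption; they are not read from tips.csv, and the function names are hypothetical.

```python
def label_encode(values):
    # Map each distinct category to an integer code (alphabetical order)
    categories = sorted(set(values))
    mapping = {c: i for i, c in enumerate(categories)}
    return [mapping[v] for v in values], mapping

def one_hot_encode(values):
    # One 0/1 indicator per category, exactly one of them set per row
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories

days = ["Sun", "Sat", "Thur", "Sun", "Fri"]  # illustrative values

codes, mapping = label_encode(days)
one_hot, categories = one_hot_encode(days)
print(mapping)     # {'Fri': 0, 'Sat': 1, 'Sun': 2, 'Thur': 3}
print(codes)       # [2, 1, 3, 2, 0]
print(categories)  # ['Fri', 'Sat', 'Sun', 'Thur']
print(one_hot)
```

Label encoding is compact but implies an order between categories; one-hot encoding avoids that at the cost of one column per category.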
Experiment-12:
Apply EM algorithm to cluster a Heart Disease Data Set. Use the same data set for clustering using k-Means
algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can add
Java/Python ML library classes/API in the program.

Program:

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset from CSV


heart_data = pd.read_csv("C:\\Users\\KOTES\\OneDrive\\Desktop\\heart.csv")

# Handle missing values and standardize features


imputer = SimpleImputer(strategy="mean")
scaler = StandardScaler()
heart_data_scaled = pd.DataFrame(scaler.fit_transform(imputer.fit_transform(heart_data)),
columns=heart_data.columns)

# Apply k-Means Algorithm


kmeans_clusters = KMeans(n_clusters=3, random_state=42).fit_predict(heart_data_scaled)
heart_data['kmeans_cluster'] = kmeans_clusters

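# Apply EM Algorithm
em_clusters = GaussianMixture(n_components=3, random_state=42).fit_predict(heart_data_scaled)
heart_data['em_cluster'] = em_clusters

# Compare clustering quality (silhouette score: higher means better-separated clusters)
print("k-Means silhouette score:", silhouette_score(heart_data_scaled, kmeans_clusters))
print("EM silhouette score:", silhouette_score(heart_data_scaled, em_clusters))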

# Visualize Clusters
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
sns.scatterplot(x=heart_data.columns[0], y=heart_data.columns[1], hue='kmeans_cluster', data=heart_data,
palette='viridis', legend='full')
plt.title('k-Means Clustering')

plt.subplot(1, 2, 2)
sns.scatterplot(x=heart_data.columns[0], y=heart_data.columns[1], hue='em_cluster', data=heart_data,
palette='viridis', legend='full')
plt.title('EM Clustering')

plt.show()

Output:
Experiment-13: Write a program to Implement Principal Component Analysis

Program:

import numpy as np
from sklearn.decomposition import PCA

# Example usage
if __name__ == "__main__":
    # Generate some example data
    np.random.seed(42)
    data = np.random.rand(10, 5)  # 10 samples, 5 features

    # Specify the number of principal components
    num_components = 2

    # Create a PCA instance
    pca_model = PCA(n_components=num_components)

    # Fit the model and transform the data
    result = pca_model.fit_transform(data)

    # Display the results
    print(f"Original data shape: {data.shape}")
    print(f"PCA result shape: {result.shape}")
    print("PCA Result:")
    print(result)

Output:
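
To show what the library call in Experiment-13 computes, the following is a minimal sketch of PCA written with NumPy only: centre the data, take the eigenvectors of the covariance matrix, and project onto the directions with the largest eigenvalues. The function name pca is hypothetical, and the signs of the projected columns may differ from those returned by scikit-learn.

```python
import numpy as np

def pca(data, num_components):
    # Centre each feature at zero
    centered = data - data.mean(axis=0)
    # Covariance matrix of the features (columns)
    cov = np.cov(centered, rowvar=False)
    # eigh returns eigenvalues in ascending order for symmetric matrices
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:num_components]
    components = eigvecs[:, order]
    # Project the centred data onto the leading components
    return centered @ components, eigvals[order]

rng = np.random.default_rng(42)
data = rng.random((10, 5))  # 10 samples, 5 features

projected, variances = pca(data, 2)
print("Projected shape:", projected.shape)
print("Variance captured by each component:", variances)
```

Each eigenvalue equals the sample variance of the data along its component, which is why components are ranked by eigenvalue.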
Experiment-14: Write a Python program to construct a Bayesian network considering medical data. Use this
model to demonstrate the diagnosis of heart patients using standard Heart Disease Data Set.

Program:

import pandas as pd
from pgmpy.estimators import MaximumLikelihoodEstimator
# Newer pgmpy releases name this class BayesianNetwork; BayesianModel is the older name
from pgmpy.models import BayesianModel
from pgmpy.inference import VariableElimination

data = pd.read_csv("C:\\Users\\KOTES\\Downloads\\ds4.csv")
heart_disease = pd.DataFrame(data)
print(heart_disease)

# Each pair is a directed edge (parent, child); the duplicated ('diet', 'cholestrol') edge is listed once
model = BayesianModel([
    ('age', 'Lifestyle'), ('Gender', 'Lifestyle'), ('Family', 'heartdisease'),
    ('diet', 'cholestrol'), ('Lifestyle', 'diet'), ('cholestrol', 'heartdisease')
])

model.fit(heart_disease, estimator=MaximumLikelihoodEstimator)

HeartDisease_infer = VariableElimination(model)

q = HeartDisease_infer.query(variables=['heartdisease'], evidence={
'age': int(input('Enter Age: ')),
'Gender': int(input('Enter Gender: ')),
'Family': int(input('Enter Family History: ')),
'diet': int(input('Enter Diet: ')),
'Lifestyle': int(input('Enter Lifestyle: ')),
'cholestrol': int(input('Enter Cholestrol: '))
})

print(q)

Output:
Experiment-15: Build an Artificial Neural Network by implementing the Back propagation algorithm and test the
same using appropriate data sets.

Program:

import numpy as np

# Data normalization
X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X / np.amax(X, axis=0)
y = y / 100

# Sigmoid Function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of the sigmoid, expressed in terms of its output x = sigmoid(z)
def derivatives_sigmoid(x):
    return x * (1 - x)

# Variable initialization
epoch = 7000
lr = 0.1
inputlayer_neurons = 2
hiddenlayer_neurons = 3
output_neurons = 1

# Weight and bias initialization
wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))

# Training
for i in range(epoch):
    # Forward Propagation
    hinp1 = np.dot(X, wh)
    hinp = hinp1 + bh
    hlayer_act = sigmoid(hinp)

    outinp1 = np.dot(hlayer_act, wout)
    outinp = outinp1 + bout
    output = sigmoid(outinp)

    # Backpropagation
    EO = y - output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad

    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act)
    d_hiddenlayer = EH * hiddengrad

    # Update weights and biases (the bias updates were missing)
    wout += hlayer_act.T.dot(d_output) * lr
    bout += np.sum(d_output, axis=0, keepdims=True) * lr
    wh += X.T.dot(d_hiddenlayer) * lr
    bh += np.sum(d_hiddenlayer, axis=0, keepdims=True) * lr

# Display results
print("Predicted Output: \n", output)

Output:

Predicted Output:
[[0.89384034]
[0.8807541 ]
[0.89510843]]
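
The update steps in the Experiment-15 program follow from the derivative of the sigmoid, which can be written in terms of the sigmoid's own output. This is why derivatives_sigmoid is applied to the activations rather than to the raw inputs. The notation below is chosen for this note: h is the hidden-layer activation, \hat{y} the network output, \eta the learning rate, and \odot element-wise multiplication.

```latex
\[
\sigma(z) = \frac{1}{1 + e^{-z}},
\qquad
\sigma'(z) = \sigma(z)\bigl(1 - \sigma(z)\bigr)
\]
\[
\delta_{\text{out}} = (y - \hat{y}) \odot \hat{y}\,(1 - \hat{y}),
\qquad
\delta_{\text{hid}} = \bigl(\delta_{\text{out}}\, W_{\text{out}}^{\top}\bigr) \odot h\,(1 - h)
\]
\[
W_{\text{out}} \leftarrow W_{\text{out}} + \eta\, h^{\top} \delta_{\text{out}},
\qquad
W_{\text{hid}} \leftarrow W_{\text{hid}} + \eta\, X^{\top} \delta_{\text{hid}}
\]
```

Because the error is defined as y minus the output, the updates are added rather than subtracted.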
