AD3461 ML lab manual
1. For a given set of training data examples stored in a .CSV file, implement and demonstrate
the Candidate-Elimination algorithm to output a description of the set of all hypotheses
consistent with the training examples.
2. Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use
an appropriate data set for building the decision tree and apply this knowledge to classify a
new sample.
3. Build an Artificial Neural Network by implementing the Backpropagation algorithm and test
the same using appropriate data sets.
4. Write a program to implement the naïve Bayesian classifier for a sample training data set
stored as a .CSV file and compute the accuracy with a few test data sets.
5. Implement naïve Bayesian Classifier model to classify a set of documents and measure
the accuracy, precision, and recall.
6. Write a program to construct a Bayesian network to diagnose CORONA infection using
standard WHO Data Set.
7. Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for
clustering using the k-Means algorithm. Compare the results of these two algorithms.
8. Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set.
Print both correct and wrong predictions.
9. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select an appropriate data set for your experiment and draw graphs.
EX. NO.1 CANDIDATE ELIMINATION ALGORITHM
AIM:
For a given set of training data examples stored in a .CSV file, implement and demonstrate the
Candidate-Elimination algorithm to output a description of the set of all hypotheses
consistent with the training examples
ALGORITHM:
The Candidate Elimination algorithm is a method in machine learning used to find a
hypothesis that is consistent with the training examples. It maintains two sets of hypotheses: the most
specific hypotheses (S) and the most general hypotheses (G).
Step 1: Initialization:
• Set the most specific hypothesis S0 to the most specific hypothesis possible.
• Set the most general hypothesis G0 to the most general hypothesis possible.
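Step 2: For each training example d:
• If d is a positive example:
o Remove from G any hypothesis that is inconsistent with d.
o Replace each hypothesis in S that is inconsistent with d by its minimal generalizations that are
consistent with d and more specific than some hypothesis in G.
• If d is a negative example:
o Remove from S any hypothesis that is inconsistent with d.
o Replace each hypothesis in G that is inconsistent with d by its minimal specializations that are
consistent with d and more general than some hypothesis in S.
Step 3: Termination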
• The algorithm terminates when all training examples have been processed.
• The version space (the set of hypotheses consistent with all training examples) is represented by the
hypotheses in S and G.
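The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the manual's own program: the function names covers and candidate_elimination are hypothetical, hypotheses are tuples in which '?' accepts any value, and the four training rows are the textbook EnjoySport examples, used here only because the manual's .CSV file is not shown. The sketch assumes noise-free data whose first example is positive, so the S boundary stays a single conjunctive hypothesis.

```python
def covers(h, x):
    """True if hypothesis h matches instance x ('?' accepts any value)."""
    return all(hv == '?' or hv == xv for hv, xv in zip(h, x))


def candidate_elimination(examples):
    n = len(examples[0][0])
    S = None            # most specific boundary: no instance covered yet
    G = [('?',) * n]    # most general boundary: every instance covered
    for x, positive in examples:
        if positive:
            # Remove general hypotheses that fail to cover the positive example
            G = [g for g in G if covers(g, x)]
            # Minimally generalise S so that it covers x
            S = tuple(x) if S is None else tuple(
                sv if sv == xv else '?' for sv, xv in zip(S, x))
        else:
            # Assumption: S is already set, i.e. a positive example came first
            specialised = []
            for g in G:
                if not covers(g, x):
                    specialised.append(g)   # already excludes the negative example
                    continue
                # Minimal specialisations: pin one '?' to the value S requires
                for i in range(n):
                    if g[i] == '?' and S[i] not in ('?', x[i]):
                        specialised.append(g[:i] + (S[i],) + g[i + 1:])
            specialised = list(dict.fromkeys(specialised))  # drop duplicates
            # Keep only hypotheses that no other candidate generalises
            G = [g for g in specialised
                 if not any(h != g and covers(h, g) for h in specialised)]
    return S, G


# Illustrative data (textbook EnjoySport rows): (attribute values, is_positive)
examples = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
    (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), True),
    (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), True),
]
S, G = candidate_elimination(examples)
print("S:", S)
print("G:", G)
```

For these rows S ends as ('Sunny', 'Warm', '?', 'Strong', '?', '?') and G holds the two hypotheses that fix only Sky = Sunny or only AirTemp = Warm.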
PROCEDURE:
PROGRAM:
OUTPUT:
RESULT:
Thus the Candidate Elimination algorithm was implemented and demonstrated successfully for
the given training example.
EX. NO.2 DECISION TREE BASED ID3 ALGORITHM
AIM:
Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an
appropriate data set for building the decision tree and apply this knowledge to classify a new sample.
ALGORITHM:
Step 1: Data Preparation
PROCEDURE:
PROGRAM:
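# information_gain and the recursive split in id3 are a reconstruction of the standard ID3 procedure
import math
from collections import Counter
import numpy as np
import pandas as pd

def entropy(data):
    # Shannon entropy of the class labels, taken from the last column
    labels = data.iloc[:, -1]
    label_counts = Counter(labels)
    total_count = len(labels)
    ent = 0.0
    for count in label_counts.values():
        probability = count / total_count
        ent -= probability * math.log2(probability)
    return ent

def information_gain(data, attribute):
    # Entropy before the split minus the weighted entropy of each subset
    remainder = 0.0
    for _, subset in data.groupby(attribute):
        remainder += len(subset) / len(data) * entropy(subset)
    return entropy(data) - remainder

def id3(data, attributes):
    labels = data.iloc[:, -1]
    # All samples share one label: return it as a leaf
    if len(np.unique(labels)) == 1:
        return labels.iloc[0]
    # No attributes left: return the majority label
    if len(attributes) == 0:
        return Counter(labels).most_common(1)[0][0]
    # Split on the attribute with the highest information gain
    best = max(attributes, key=lambda a: information_gain(data, a))
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    for value, subset in data.groupby(best):
        tree[best][value] = id3(subset, remaining)
    return tree

def classify(tree, sample):
    # A leaf is a plain label, not a dictionary
    if not isinstance(tree, dict):
        return tree
    attribute = next(iter(tree))
    attribute_value = sample[attribute]
    if attribute_value in tree[attribute]:
        return classify(tree[attribute][attribute_value], sample)
    else:
        return None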
OUTPUT:
The decision tree will be printed in a nested dictionary format, representing the structure of the
decision tree. Each key represents a decision node, and each value represents the branches (subtrees) for each
possible value of the attribute.
The classification result will be a single value, either 'Yes' or 'No', indicating whether the new sample
should be classified as 'Play' or 'No Play'.
Decision Tree: {'Outlook': {'Overcast': 'Yes', 'Rain': {'Wind': {'Strong': 'No', 'Weak': 'Yes'}},
'Sunny': {'Humidity': {'High': 'No', 'Normal': 'Yes'}}}}
Classification of new sample: No
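To see how the nested dictionary is read during classification, the short sketch below walks the tree printed above for one sample. The tree literal is copied from the sample output; the sample itself is hypothetical, since the manual does not show which sample was classified.

```python
# Tree copied from the sample output above
tree = {'Outlook': {'Overcast': 'Yes',
                    'Rain': {'Wind': {'Strong': 'No', 'Weak': 'Yes'}},
                    'Sunny': {'Humidity': {'High': 'No', 'Normal': 'Yes'}}}}


def classify(tree, sample):
    # A leaf is stored as a plain label rather than a dictionary
    if not isinstance(tree, dict):
        return tree
    attribute = next(iter(tree))                 # attribute tested at this node
    branch = tree[attribute].get(sample[attribute])
    # An attribute value never seen in training has no branch
    return None if branch is None else classify(branch, sample)


# Hypothetical sample: a sunny, humid day with weak wind
sample = {'Outlook': 'Sunny', 'Humidity': 'High', 'Wind': 'Weak'}
print("Classification of new sample:", classify(tree, sample))
```

The walk tests Outlook first, follows the 'Sunny' branch to the Humidity node, and stops at the leaf 'No'; the Wind value is never consulted on this path.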
RESULT:
Thus the decision tree based ID3 algorithm was implemented and demonstrated successfully for the
given training example.
EX. NO.3 ARTIFICIAL NEURAL NETWORK BY
BACKPROPAGATION ALGORITHM
AIM:
To build an Artificial Neural Network by implementing the Backpropagation algorithm and
test the same using appropriate data sets.
ALGORITHM:
Step 1: Data Preparation
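o Compute the weighted sum of the inputs for the hidden layer:
hidden_layer_activation = inputs.dot(weights_input_hidden)
o Apply the sigmoid function to get the output of the hidden layer:
hidden_layer_output = sigmoid(hidden_layer_activation)
o Compute the output-layer activation and apply the sigmoid function to get the prediction:
final_layer_activation = hidden_layer_output.dot(weights_hidden_output)
predicted_output = sigmoid(final_layer_activation)
5. Calculate Error:
o Compute the error by subtracting the predicted output from the actual output:
error = outputs - predicted_output
o Propagate the error back through the output layer to the hidden layer:
d_predicted_output = error * sigmoid_derivative(predicted_output)
error_hidden_layer = d_predicted_output.dot(weights_hidden_output.T)
d_hidden_layer = error_hidden_layer * sigmoid_derivative(hidden_layer_output)
o Update the weights in proportion to the learning rate:
weights_hidden_output += hidden_layer_output.T.dot(d_predicted_output) * learning_rate
weights_input_hidden += inputs.T.dot(d_hidden_layer) * learning_rate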
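The backward pass calls sigmoid_derivative on layer outputs rather than on raw activations. That works because the derivative of the sigmoid can be written in terms of its own output: σ'(z) = σ(z)(1 - σ(z)). The sketch below checks this identity numerically with a central difference; the value of z and the step size h are arbitrary choices made for illustration.

```python
import math


def sigmoid(x):
    return 1 / (1 + math.exp(-x))


def sigmoid_derivative(s):
    # s must already be a sigmoid output, not a raw activation
    return s * (1 - s)


z = 0.7      # arbitrary pre-activation value
h = 1e-6     # step size for the central difference
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
analytic = sigmoid_derivative(sigmoid(z))
print(abs(numeric - analytic) < 1e-8)
```

Passing a raw activation to sigmoid_derivative would silently give a wrong gradient, which is why the update rules apply it to hidden_layer_output and predicted_output.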
PROCEDURE:
PROGRAM:
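import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # x is a sigmoid output, so the derivative is x * (1 - x)
    return x * (1 - x)

np.random.seed(42)
# Input dataset (XOR truth table)
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
# Output dataset
outputs = np.array([[0], [1], [1], [0]])
input_neurons = inputs.shape[1]
hidden_neurons = 2
output_neurons = outputs.shape[1]
# Uniform random initialisation and the bias terms are assumptions: the surviving lines show
# only the weight matrices, but a network without biases cannot represent XOR
weights_input_hidden = np.random.uniform(size=(input_neurons, hidden_neurons))
bias_hidden = np.random.uniform(size=(1, hidden_neurons))
weights_hidden_output = np.random.uniform(size=(hidden_neurons, output_neurons))
bias_output = np.random.uniform(size=(1, output_neurons))
# Learning rate
learning_rate = 0.5
epochs = 10000
for epoch in range(epochs):
    # Forward pass
    hidden_layer_activation = inputs.dot(weights_input_hidden) + bias_hidden
    hidden_layer_output = sigmoid(hidden_layer_activation)
    final_layer_activation = hidden_layer_output.dot(weights_hidden_output) + bias_output
    predicted_output = sigmoid(final_layer_activation)
    # Calculate error
    error = outputs - predicted_output
    # Backward pass
    d_predicted_output = error * sigmoid_derivative(predicted_output)
    error_hidden_layer = d_predicted_output.dot(weights_hidden_output.T)
    d_hidden_layer = error_hidden_layer * sigmoid_derivative(hidden_layer_output)
    # Update weights
    weights_hidden_output += hidden_layer_output.T.dot(d_predicted_output) * learning_rate
    weights_input_hidden += inputs.T.dot(d_hidden_layer) * learning_rate
    bias_output += np.sum(d_predicted_output, axis=0, keepdims=True) * learning_rate
    bias_hidden += np.sum(d_hidden_layer, axis=0, keepdims=True) * learning_rate
    if epoch % 1000 == 0:
        loss = np.mean(np.abs(error))
        print(f"Epoch {epoch}, Loss: {loss}")
# Test the trained network on the same inputs
hidden_layer_activation = inputs.dot(weights_input_hidden) + bias_hidden
hidden_layer_output = sigmoid(hidden_layer_activation)
final_layer_activation = hidden_layer_output.dot(weights_hidden_output) + bias_output
predicted_output = sigmoid(final_layer_activation)
print("Predicted Output:")
print(predicted_output)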
OUTPUT:
Epoch 2000, Loss: 0.006559532659682358
Predicted Output:
[[0.01202748]
[0.98827313]
[0.98814568]
[0.01163065]]
• Epoch and Loss: The loss decreases over time, indicating that the network is learning.
• Predicted Output: The predicted values are close to the actual XOR outputs [0, 1, 1, 0]
RESULT:
Thus the above code demonstrates the implementation of a simple neural network using the
backpropagation algorithm and tests it using the XOR problem.
EX. NO.4 NAÏVE BAYESIAN CLASSIFIER
AIM:
To write a program to implement the naïve Bayesian classifier for a sample training data set
stored as a .CSV file and to compute the accuracy of the classifier on a few test data sets.
ALGORITHM:
Step 1: Load the Dataset
o Use the trained model to predict the labels for the test set.
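What a Gaussian naïve Bayes model does when it is trained and then asked to predict can be sketched without any library. The code below is a from-scratch illustration, not the scikit-learn implementation: the functions fit and predict and the toy numbers are hypothetical. Each class stores a prior plus a mean and variance per feature, and prediction picks the class with the largest log-posterior.

```python
import math
from collections import defaultdict


def fit(X, y):
    """Estimate the prior, feature means and feature variances of each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        columns = list(zip(*rows))
        means = [sum(c) / len(c) for c in columns]
        # A small constant keeps every variance strictly positive
        variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                     for c, m in zip(columns, means)]
        model[label] = (len(rows) / len(X), means, variances)
    return model


def predict(model, row):
    """Return the class with the largest log-posterior for one sample."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            # Log of the Gaussian density, features treated as independent
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


# Hypothetical toy data with two numeric features and two classes
X_train = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 0.5], [3.2, 0.7], [2.9, 0.4]]
y_train = [0, 0, 0, 1, 1, 1]
X_test = [[1.1, 2.0], [3.1, 0.6]]
y_test = [0, 1]

model = fit(X_train, y_train)
y_pred = [predict(model, row) for row in X_test]
accuracy = sum(p == t for p, t in zip(y_pred, y_test)) / len(y_test)
print(f"Accuracy: {accuracy * 100:.2f}%")
```

Working in log space avoids numerical underflow when many per-feature probabilities are multiplied together.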
PROGRAM:
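import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder

# Load the dataset
data = pd.read_csv('data.csv')
# Encode every categorical column as integers (the encoder is refitted per column)
le = LabelEncoder()
data = data.apply(le.fit_transform)
# Separate the features from the target column
X = data.drop('Play', axis=1)
y = data['Play']
# Split into training and test sets (test_size and random_state are assumed values)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Implement the Naïve Bayes Classifier
model = GaussianNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Report the accuracy, then the predicted and the actual labels
print(f"Accuracy: {accuracy_score(y_test, y_pred) * 100:.2f}%")
print(y_pred)
print(y_test.values)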
OUTPUT:
Accuracy: 77.78%
[1 0 1 1 1]
[1 0 1 0 1]
RESULT:
Thus the naïve Bayesian classifier has been implemented for a given training sample
dataset.
EX. NO.5 NAÏVE BAYESIAN CLASSIFIER FOR DOCUMENT CLASSIFICATION
AIM:
To implement the naïve Bayesian Classifier model to classify a set of documents and measure
the accuracy, precision, and recall.
ALGORITHM:
Step 1: Load and Prepare the Dataset
1. Load Dataset:
o Fetch or download a dataset of documents. The 20 Newsgroups dataset is a common choice for
text classification tasks.
2. Preprocess Data:
o Preprocess the text data by tokenizing, removing stopwords, and converting text to numerical
features using techniques like TF-IDF or CountVectorizer.
3. Split Dataset:
o Split the dataset into training and testing sets. The usual split ratio is 70-30 or 80-20 for training and
testing, respectively.
Step 3: Train the Naïve Bayes Classifier
4. Initialize Classifier:
o Choose a suitable Naïve Bayesian classifier variant, such as MultinomialNB or GaussianNB.
5. Train the Model:
o Fit the classifier to the training data to learn the relationships between features and labels.
PROGRAM:
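from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, classification_report, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Load the 20 Newsgroups dataset (subset='all' is an assumption)
newsgroups = fetch_20newsgroups(subset='all')
# Vectorize the text data using CountVectorizer
vectorizer = CountVectorizer(stop_words='english')
X = vectorizer.fit_transform(newsgroups.data)
y = newsgroups.target
# Split into training and test sets (the split ratio and seed are assumptions)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = MultinomialNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred) * 100:.2f}%")
print(f"Precision: {precision_score(y_test, y_pred, average='weighted') * 100:.2f}%")
print(f"Recall: {recall_score(y_test, y_pred, average='weighted') * 100:.2f}%")
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=newsgroups.target_names))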
OUTPUT:
Accuracy: 83.52%
Precision: 83.93%
Recall: 83.52%
Classification Report:
precision recall f1-score support
alt.atheism 0.90 0.70 0.79 319
comp.graphics 0.84 0.77 0.80 389
comp.os.ms-windows.misc 0.81 0.80 0.81 394
comp.sys.ibm.pc.hardware 0.68 0.88 0.77 392
comp.sys.mac.hardware 0.89 0.88 0.88 385
comp.windows.x 0.89 0.86 0.87 395
misc.forsale 0.83 0.85 0.84 390
rec.autos 0.90 0.91 0.90 396
rec.motorcycles 0.91 0.96 0.94 398
rec.sport.baseball 0.94 0.92 0.93 397
rec.sport.hockey 0.92 0.95 0.94 399
sci.crypt 0.97 0.96 0.97 396
sci.electronics 0.80 0.80 0.80 393
sci.med 0.94 0.86 0.90 396
sci.space 0.86 0.97 0.91 394
soc.religion.christian 0.67 0.98 0.79 398
talk.politics.guns 0.76 0.92 0.83 364
talk.politics.mideast 0.98 0.87 0.92 376
talk.politics.misc 0.90 0.59 0.71 310
talk.religion.misc 0.93 0.42 0.58 251
accuracy 0.84 7532
macro avg 0.87 0.84 0.84 7532
weighted avg 0.86 0.84 0.84 7532
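The per-class figures in a report like the one above come from counting true positives, false positives, and false negatives for each class. The sketch below reproduces that arithmetic on a handful of hypothetical labels; per_class_metrics and weighted_average are illustrative helper names, not library functions. It also shows why the support-weighted recall always equals the accuracy, as it does in the summary lines of the output.

```python
def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, support)} from raw label lists."""
    pairs = list(zip(y_true, y_pred))
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        metrics[label] = (precision, recall, tp + fn)
    return metrics


def weighted_average(metrics, index):
    """Average one metric over classes, weighting each class by its support."""
    total = sum(m[2] for m in metrics.values())
    return sum(m[index] * m[2] for m in metrics.values()) / total


# Hypothetical labels for six documents in three classes
y_true = ['a', 'a', 'a', 'b', 'b', 'c']
y_pred = ['a', 'a', 'b', 'b', 'b', 'c']

metrics = per_class_metrics(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
weighted_precision = weighted_average(metrics, 0)
weighted_recall = weighted_average(metrics, 1)
print(f"Accuracy: {accuracy:.4f}")
print(f"Weighted precision: {weighted_precision:.4f}")
print(f"Weighted recall: {weighted_recall:.4f}")
```

Each class contributes recall times support, which is simply its number of true positives, so the weighted sum divided by the total number of samples is the fraction of correct predictions.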
RESULT:
Thus the above code demonstrated the implementation of the naïve Bayesian classifier to classify a
set of documents, and the accuracy, precision, and recall have been measured.
EX. NO.6 BAYESIAN NETWORK
AIM:
To write a program to construct a Bayesian network to diagnose CORONA infection using
standard WHO Data Set.
ALGORITHM:
Step 1: Load and Preprocess the Dataset
1. Load Dataset:
o Load the dataset containing patient data with symptoms and CORONA infection status.
2. Preprocess Data:
o Convert categorical variables to numerical format suitable for Bayesian network construction.
3. Define Nodes:
o Identify the variables (nodes) representing symptoms and CORONA infection status.
4. Define Edges:
o Determine the relationships (edges) between variables based on domain knowledge or data
analysis.
5. Learn Parameters:
o Use statistical methods like Maximum Likelihood Estimation (MLE) to estimate the parameters
(conditional probabilities) of the Bayesian network from the dataset.
6. Perform Inference:
o Use the constructed Bayesian network for inference tasks such as predicting the probability of
CORONA infection given observed symptoms.
7. Display Results:
o Output the results of inference, including the probability of CORONA infection given observed
symptoms.
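Parameter learning and inference can be illustrated by hand on a two-symptom version of the network. The sketch below is a simplified stand-in for the library-based program: the ten patient records are hypothetical, the helper names are illustrative, and only Cough and Fever are modelled as parents of the infection node. Conditional probabilities are estimated by counting (maximum likelihood), and the query P(Infection = 1 | Cough = 1) marginalises over the unobserved Fever node.

```python
# Hypothetical patient records: (cough, fever, infection), each 0 or 1
records = [
    (1, 1, 1), (1, 1, 1), (1, 1, 0), (1, 0, 0), (1, 0, 1),
    (0, 1, 0), (0, 1, 1), (0, 0, 0), (0, 0, 0), (0, 0, 0),
]


def p_fever(value):
    """Maximum likelihood estimate of P(Fever = value)."""
    return sum(1 for _, f, _ in records if f == value) / len(records)


def p_infection_given(cough, fever):
    """Maximum likelihood estimate of P(Infection = 1 | Cough, Fever)."""
    outcomes = [i for c, f, i in records if c == cough and f == fever]
    return sum(outcomes) / len(outcomes)


def query_infection(cough):
    """P(Infection = 1 | Cough = cough), summing out the unobserved Fever.

    Cough and Fever are root nodes, so observing Cough leaves the
    distribution of Fever unchanged.
    """
    return sum(p_fever(f) * p_infection_given(cough, f) for f in (0, 1))


posterior = query_infection(1)
print(f"P(Infection=1 | Cough=1) = {posterior:.4f}")
print(f"P(Infection=0 | Cough=1) = {1 - posterior:.4f}")
```

When every parent is observed, the answer is a direct lookup in the learned conditional probability table; summing over unobserved parents is the part that variable elimination automates for larger networks.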
PROGRAM:
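import pandas as pd
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination
# The model class name varies across pgmpy releases (BayesianModel in older versions)
from pgmpy.models import BayesianNetwork

# Load the dataset
data = pd.read_csv('corona_dataset.csv')
# Define the network structure: each symptom is a parent of CoronaInfection
# (the 'Cough' edge is inferred from the evidence used in the query below)
model = BayesianNetwork([('Cough', 'CoronaInfection'),
                         ('Fever', 'CoronaInfection'),
                         ('BreathingDifficulty', 'CoronaInfection'),
                         ('SoreThroat', 'CoronaInfection')])
# Learn the parameters from the data using Maximum Likelihood Estimation
model.fit(data, estimator=MaximumLikelihoodEstimator)
# Perform inference
inference = VariableElimination(model)
query = inference.query(variables=['CoronaInfection'], evidence={'Cough': 1, 'Fever': 1})
print(query)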
OUTPUT:
+----------------------+--------------------------+
| CoronaInfection | phi(CoronaInfection) |
+======================+==========================+
| CoronaInfection(0) | 0.5429 |
+----------------------+--------------------------+
| CoronaInfection(1) | 0.4571 |
+----------------------+--------------------------+
This output represents the probability distribution of CORONA infection given the observed
symptoms (Cough=1, Fever=1), obtained from performing inference using the Bayesian network. In this
example, the probability of CORONA infection being positive is approximately 45.71%, while the probability
of being negative is approximately 54.29%.
The actual output will depend on the dataset and the observed symptoms provided for inference.
RESULT:
Thus the program to construct a Bayesian network to diagnose CORONA infection
using standard WHO Data Set has been implemented successfully.
EX. NO.7 EXPECTATION-MAXIMIZATION (EM) AND K-MEANS
ALGORITHM
AIM:
To apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for
clustering using the k-Means algorithm. Compare the results of these two algorithms.
ALGORITHM:
1. Load Dataset:
o Load the dataset from the .CSV file into a pandas DataFrame.
3. Extract Features:
o Extract the features from the dataset, converting it into a format suitable for clustering algorithms.
4. Apply k-Means:
o Apply the k-Means algorithm to cluster the data into k clusters using the KMeans class from scikit-
learn.
5. Apply EM Algorithm:
o Apply the Expectation-Maximization (EM) algorithm to cluster the data into k clusters using the
GaussianMixture class from scikit-learn.
7. Compare Results:
o Compare the clustering results of the EM algorithm and the k-Means algorithm based on the
performance metric.
PROGRAM:
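import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
data = pd.read_csv('data.csv')
data.dropna(inplace=True)
X = data.values
# Apply k-Means (k = 3 and random_state = 42 are assumed values)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
kmeans_labels = kmeans.fit_predict(X)
# Apply EM algorithm
gmm = GaussianMixture(n_components=3, random_state=42)
gmm_labels = gmm.fit_predict(X)
# Compare the two clusterings with the silhouette score
print("k-Means silhouette score:", silhouette_score(X, kmeans_labels))
print("EM silhouette score:", silhouette_score(X, gmm_labels))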
OUTPUT:
• In this example, the silhouette score for the k-Means algorithm is 0.6, while the silhouette score for the
EM (Gaussian Mixture) algorithm is 0.7.
• The silhouette score ranges from -1 to 1, where a higher score indicates better clustering performance.
Therefore, in this hypothetical scenario, the EM algorithm performs slightly better than the k-Means
algorithm based on the silhouette score.
• Your actual output may vary depending on the dataset and the parameters used in the clustering
algorithms.
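The silhouette score used for the comparison can be computed directly from its definition. The sketch below is a plain-Python illustration with hypothetical one-dimensional points; the function name silhouette is illustrative. For each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to any other cluster, and the point's score is (b - a) / max(a, b).

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Distances to the other members of the point's own cluster
        same = [math.dist(p, q) for j, (q, lab) in enumerate(zip(points, labels))
                if lab == own and j != i]
        if not same:
            scores.append(0.0)      # convention for single-point clusters
            continue
        a = sum(same) / len(same)
        # Smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in set(labels) if other != own
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical data: two well-separated groups on a line
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette(points, [0, 0, 1, 1])   # groups match the clusters
bad = silhouette(points, [0, 1, 0, 1])    # clusters cut across the groups
print(f"Good clustering: {good:.3f}")
print(f"Bad clustering:  {bad:.3f}")
```

A clustering that matches the natural groups scores close to 1, while one that splits them scores below 0, which is the sense in which a higher score indicates better clustering.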
RESULT:
Thus the above program demonstrates the clustering of a sample dataset using the EM and k-Means
algorithms, and their results have been compared.
EX. NO.8 K-NEAREST NEIGHBOUR ALGORITHM
AIM:
To write a program to implement k-Nearest Neighbour algorithm to classify the iris data set.
Print both correct and wrong predictions.
ALGORITHM:
• Load the Iris dataset, which contains features (sepal length, sepal width, petal length, petal width)
and corresponding labels (species).
• Split the dataset into training and testing sets. Typically, a 70-30 or 80-20 split is used for training
and testing, respectively.
• Predict the labels for the test set using the trained classifier.
• Evaluate the accuracy of the classifier by comparing the predicted labels with the true labels from
the test set.
Step 7: Print Correct and Wrong Predictions
• Print both correct and wrong predictions, along with the corresponding feature values, true labels,
and predicted labels.
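The classification rule behind these steps is a majority vote among the k closest training points. The sketch below implements that rule in plain Python on a few hypothetical two-feature measurements (loosely petal length and petal width); knn_predict is an illustrative helper, and the third test label is deliberately set so that one prediction comes out wrong.

```python
import math
from collections import Counter


def knn_predict(X_train, y_train, query, k=3):
    """Label of the majority among the k training points nearest to query."""
    neighbours = sorted(
        (math.dist(x, query), label) for x, label in zip(X_train, y_train))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Hypothetical measurements, not rows from the real Iris data set
X_train = [(1.4, 0.2), (1.5, 0.2), (1.3, 0.3), (4.5, 1.5), (4.7, 1.4), (4.4, 1.3)]
y_train = ['setosa', 'setosa', 'setosa', 'versicolor', 'versicolor', 'versicolor']
X_test = [(1.6, 0.2), (4.6, 1.5), (3.0, 1.0)]
y_test = ['setosa', 'versicolor', 'setosa']

y_pred = [knn_predict(X_train, y_train, x) for x in X_test]
correct = [(x, t, p) for x, t, p in zip(X_test, y_test, y_pred) if t == p]
wrong = [(x, t, p) for x, t, p in zip(X_test, y_test, y_pred) if t != p]
print("Accuracy:", len(correct) / len(y_test))
print("Correct predictions:", correct)
print("Wrong predictions:", wrong)
```

The point (3.0, 1.0) lies between the two groups; two of its three nearest neighbours are versicolor, so the vote disagrees with the label assigned to it here.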
PROGRAM:
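# Install the dependency first if needed: pip install scikit-learn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target
# Split into training and test sets (test_size and random_state are assumed values)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Train the k-NN classifier (k = 3 is an assumed value)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
# Predict the test labels and evaluate the accuracy
y_pred = knn.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
# Separate the correct predictions from the wrong ones
correct_predictions = []
wrong_predictions = []
for i in range(len(y_test)):
    if y_test[i] == y_pred[i]:
        correct_predictions.append((X_test[i], y_test[i], y_pred[i]))
    else:
        wrong_predictions.append((X_test[i], y_test[i], y_pred[i]))
print("\nCorrect Predictions:")
for i, (X_instance, true_label, predicted_label) in enumerate(correct_predictions):
    print(f"Instance {i+1}: Features: {X_instance}, True Label: {iris.target_names[true_label]}, "
          f"Predicted Label: {iris.target_names[predicted_label]}")
print("\nWrong Predictions:")
for i, (X_instance, true_label, predicted_label) in enumerate(wrong_predictions):
    print(f"Instance {i+1}: Features: {X_instance}, True Label: {iris.target_names[true_label]}, "
          f"Predicted Label: {iris.target_names[predicted_label]}")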
OUTPUT:
The accuracy of the k-NN classifier on the test set, printed as "Accuracy: [accuracy_value]"
Correct Predictions:
Instance 1: Features: [feature_values], True Label: [true_label], Predicted Label:
[predicted_label]
Instance 2: Features: [feature_values], True Label: [true_label], Predicted Label:
[predicted_label]
...
Wrong Predictions:
Instance 1: Features: [feature_values], True Label: [true_label], Predicted Label:
[predicted_label]
Instance 2: Features: [feature_values], True Label: [true_label], Predicted Label:
[predicted_label]
...
RESULT:
The above program implements the k-Nearest Neighbors (k-NN) algorithm for classifying the
Iris dataset and printing both correct and wrong predictions. It demonstrates how to load the dataset,
split it into training and testing sets, train the k-NN classifier, predict labels, evaluate accuracy, and
print predictions. Adjusting the value of k may affect the classification results.
EX. NO.9 LOCALLY WEIGHTED REGRESSION ALGORITHM
AIM:
To implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select an appropriate data set for your experiment and draw graphs.
ALGORITHM:
1. Input Parameters:
o Accept the input parameters:
▪ X: Feature matrix (shape: m x n).
▪ y: Target vector (shape: m x 1).
▪ query_point: Query point for prediction (shape: 1 x n).
▪ tau: Bandwidth parameter for weighting (optional).
2. Calculate Weights:
o For each data point in X, calculate the weight based on the distance from the query_point
using a Gaussian kernel:
▪ \text{weights} = \exp \left( -\frac{\sum (X - \text{query\_point})^2}{2 \tau^2} \right)
3. Apply Weights:
o Create a diagonal weight matrix W using the calculated weights.
4. Calculate Theta:
o Compute the parameter vector θ using the weighted least squares formula:
▪ \theta = (X^T W X)^{-1} X^T W y
5. Predict:
o Predict the target value for the query_point using the computed θ:
▪ \text{prediction} = \text{query\_point} \cdot \theta
6. Output:
o Return the prediction.
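For a single feature plus an intercept, the weighted least squares step has a closed form, so the whole procedure fits in a few lines of plain Python. The sketch below is an illustration under that one-feature assumption; lwr_predict and the sample points are hypothetical. The data lie exactly on the line y = 2x + 1, which makes the result easy to check: any local linear fit must recover that line whatever the weights are.

```python
import math


def lwr_predict(xs, ys, query, tau=0.5):
    """Locally weighted linear fit around query, evaluated at query."""
    # Gaussian kernel: points near the query get weights close to 1
    weights = [math.exp(-(x - query) ** 2 / (2 * tau ** 2)) for x in xs]
    # Weighted sums that make up the 2 x 2 normal equations
    sw = sum(weights)
    swx = sum(w * x for w, x in zip(weights, xs))
    swy = sum(w * y for w, y in zip(weights, ys))
    swxx = sum(w * x * x for w, x in zip(weights, xs))
    swxy = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    # Closed-form weighted least squares solution for slope and intercept
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    intercept = (swy - slope * swx) / sw
    return intercept + slope * query


# Hypothetical data lying exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2 * x + 1 for x in xs]
prediction = lwr_predict(xs, ys, 2.5)
print(f"Prediction at 2.5: {prediction:.4f}")
```

A smaller tau concentrates the weight on fewer neighbours and lets the curve follow local detail; a larger tau moves the fit towards ordinary linear regression.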
PROGRAM:
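import numpy as np
import matplotlib.pyplot as plt

# Function for Locally Weighted Regression
def locally_weighted_regression(X, y, query_point, tau=0.1):
    # Calculate weights
    weights = np.exp(-np.sum((X - query_point)**2, axis=1) / (2 * tau**2))
    # Create weight matrix
    W = np.diag(weights)
    # Calculate theta
    theta = np.linalg.inv(X.T.dot(W).dot(X)).dot(X.T).dot(W).dot(y)
    # Predict target value
    prediction = query_point.dot(theta)
    return prediction

# Example usage (synthetic data, assumed here because the original data set is not shown)
np.random.seed(0)
x = np.linspace(0, 10, 100)
X = np.column_stack((np.ones_like(x), x))  # bias column plus one feature
y = np.sin(x) + 0.1 * np.random.randn(100)
query_points = X
tau = 0.5
predictions = []
for query_point in query_points:
    prediction = locally_weighted_regression(X, y, query_point, tau)
    predictions.append(prediction)
# Draw the data points and the fitted curve
plt.scatter(x, y, s=10, label='Data')
plt.plot(x, predictions, color='red', label='LWR fit')
plt.legend()
plt.show()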
OUTPUT:
RESULT:
Thus the above program was implemented successfully to demonstrate Locally Weighted
Regression for fitting data points. You can adjust the bandwidth parameter tau and experiment with
different datasets to observe the effects on the regression curve.