AD3461 ML lab manual

The document outlines a series of experiments focused on implementing various machine learning algorithms, including Candidate-Elimination, ID3 decision trees, Backpropagation for neural networks, and Naïve Bayesian classifiers. Each experiment includes aims, algorithms, procedures, and sample programs to demonstrate the implementation and results. The document serves as a comprehensive guide for practical applications of these algorithms using datasets in CSV format.

Uploaded by

mohanprashad2005


LIST OF EXPERIMENTS

1. For a given set of training data examples stored in a .CSV file, implement and demonstrate
the Candidate-Elimination algorithm to output a description of the set of all hypotheses
consistent with the training examples.

2. Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use
an appropriate data set for building the decision tree and apply this knowledge to classify a
new sample.

3. Build an Artificial Neural Network by implementing the Backpropagation algorithm and
test the same using appropriate data sets.

4. Write a program to implement the naïve Bayesian classifier for a sample training data set
stored as a .CSV file and compute the accuracy with a few test data sets.

5. Implement naïve Bayesian Classifier model to classify a set of documents and measure
the accuracy, precision, and recall.

6. Write a program to construct a Bayesian network to diagnose CORONA infection using
standard WHO Data Set.

7. Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for
clustering using the k-Means algorithm. Compare the results of these two algorithms.

8. Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set.
Print both correct and wrong predictions.

9. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select an appropriate data set for your experiment and draw graphs.

EX. NO.1 CANDIDATE ELIMINATION ALGORITHM

AIM:
For a given set of training data examples stored in a .CSV file, implement and demonstrate the
Candidate-Elimination algorithm to output a description of the set of all hypotheses
consistent with the training examples.

ALGORITHM:
The Candidate Elimination algorithm is a method in machine learning used to find the set of all
hypotheses consistent with the training examples (the version space). It maintains two boundary sets:
the most specific hypotheses (S) and the most general hypotheses (G).

Step 1: Initialization:

• Set the most specific hypothesis S0 to the most specific hypothesis possible.
• Set the most general hypothesis G0 to the most general hypothesis possible.

Step 2: For each training example d:

• If d is a positive example:

• Remove from G any hypothesis that does not match d.

• For each hypothesis s in S that does not match d:
o Remove s from S.
o Add to S all minimal generalizations h of s such that h matches d and some hypothesis
in G is more general than h.
o Remove from S any hypothesis that is more general than another hypothesis in S.

• If d is a negative example:

• Remove from S any hypothesis that matches d.

• For each hypothesis g in G that matches d:
o Remove g from G.
o Add to G all minimal specializations h of g such that h does not match d and some
hypothesis in S is more specific than h.
o Remove from G any hypothesis that is less general than another hypothesis in G.

Step 3: Termination

• The algorithm terminates when all training examples have been processed.
• The version space (the set of hypotheses consistent with all training examples) is represented by the
hypotheses in S and G.

PROCEDURE:

1. Input: Read the training examples and initialize S and G.

2. Iterate over each example and update S and G.
3. Print the final S and G.
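
The steps above can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes conjunctive hypotheses written as tuples with '?' as a wildcard, keeps S as a single hypothesis, infers attribute domains from the examples themselves, and uses a small in-memory weather-style data set in place of a .CSV file. The names covers and candidate_elimination are hypothetical helpers introduced only for this sketch.

```python
def covers(h, x):
    """True if hypothesis h is at least as general as x (an instance or a hypothesis)."""
    return all(a == '?' or a == b for a, b in zip(h, x))


def candidate_elimination(examples):
    """examples: list of (attribute_tuple, is_positive) pairs."""
    n = len(examples[0][0])
    # Assumption of this sketch: attribute domains are inferred from the data.
    domains = [sorted({x[i] for x, _ in examples}) for i in range(n)]
    S = None              # None stands for the most specific hypothesis (covers nothing)
    G = [('?',) * n]      # the most general hypothesis
    for x, positive in examples:
        if positive:
            # Drop general hypotheses that fail to cover the positive example
            G = [g for g in G if covers(g, x)]
            # Minimal generalization of S that covers x
            S = tuple(x) if S is None else tuple(
                s if s == v else '?' for s, v in zip(S, x))
        else:
            if S is not None and covers(S, x):
                raise ValueError("training data is inconsistent")
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)
                    continue
                # Minimal specializations of g that exclude x but still cover S
                for i in range(n):
                    if g[i] != '?':
                        continue
                    for v in domains[i]:
                        h = g[:i] + (v,) + g[i + 1:]
                        if v != x[i] and (S is None or covers(h, S)):
                            new_G.append(h)
            new_G = list(dict.fromkeys(new_G))  # remove duplicates, keep order
            # Keep only the maximally general members
            G = [g for g in new_G
                 if not any(h != g and covers(h, g) for h in new_G)]
    return S, G


examples = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
    (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), True),
    (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), True),
]
S, G = candidate_elimination(examples)
print("Final S:", S)
print("Final G:", G)
```

Reading the examples from a .CSV file would only change how the examples list is built; the update logic stays the same.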

PROGRAM:

OUTPUT:

RESULT:

Thus the Candidate Elimination algorithm was implemented and demonstrated successfully for
the given training example.

EX. NO.2 DECISION TREE BASED ID3 ALGORITHM

AIM:
Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an
appropriate data set for building the decision tree and apply this knowledge to classify a new sample.

ALGORITHM:
Step 1: Data Preparation

1. Create the Dataset:

o Organize the dataset into a table where each row represents an instance, and each column
represents an attribute (feature), except the last column which represents the target variable
(label).

Step 2: Calculate Entropy

2. Define Entropy Function:

o Input: Data subset.
o Calculate the frequency of each label in the subset.
o Compute the probability of each label.
o Calculate the entropy using the formula: H(S) = -\sum_{i} p_i \log_2 p_i, where p_i is the probability of label i in the subset S.

o Output: Entropy value of the subset.

Step 3: Calculate Information Gain

3. Define Information Gain Function:

o Input: Data subset, attribute.
o Calculate the total entropy of the subset.
o For each unique value of the attribute:
▪ Create a subset of data where the attribute has the specific value.
▪ Calculate the entropy of this subset.
▪ Compute the weighted entropy.
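o Calculate information gain using the formula: Gain(S, A) = H(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} H(S_v), where H denotes entropy and S_v is the subset of S in which attribute A has value v.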

o Output: Information gain for the attribute.
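
As a worked check of the two steps above, the following sketch computes the entropy and the information gain of the Outlook attribute from label counts alone. It assumes the 14-example weather data used in this experiment's program (9 'Yes' and 5 'No' labels); the helper entropy_from_counts and the outlook_counts table are introduced only for this illustration.

```python
import math


def entropy_from_counts(counts):
    """Entropy (in bits) of a label distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


# Whole data set: 9 'Yes' and 5 'No'
total_entropy = entropy_from_counts([9, 5])

# (Yes, No) counts for each value of Outlook
outlook_counts = {'Sunny': (2, 3), 'Overcast': (4, 0), 'Rain': (3, 2)}

# Weighted entropy after splitting on Outlook
weighted_entropy = sum(
    sum(counts) / 14 * entropy_from_counts(counts)
    for counts in outlook_counts.values()
)
gain_outlook = total_entropy - weighted_entropy

print("Entropy of the data set:", round(total_entropy, 3))
print("Information gain of Outlook:", round(gain_outlook, 3))
```

The entropy comes out near 0.940 bits and the gain of Outlook near 0.247, which is why Outlook ends up at the root of the tree built from this data.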

Step 4: Build the Decision Tree

4. Define ID3 Function:

o Input: Data, list of attributes, target attribute.
o Check if all instances have the same label:
▪ If true, return the label.

o Check if there are no more attributes:

▪ If true, return the most common label.
o Calculate information gain for each attribute.
o Select the attribute with the highest information gain as the best attribute.
o Create a node for the best attribute.
o For each unique value of the best attribute:
▪ Create a subset of data where the best attribute has the specific value.
▪ Recursively call the ID3 function to build the subtree.
▪ Attach the subtree to the node.
o Output: Decision tree.

Step 5: Classify a New Sample

5. Define Classify Function:

o Input: Decision tree, new sample.
o Traverse the tree starting from the root:
▪ At each node, check the value of the corresponding attribute in the sample.
▪ Follow the branch corresponding to the attribute value.
▪ If a leaf node is reached, return the label.
o Output: Classification result.

PROCEDURE:

1. Load and prepare the dataset.

2. Define the list of attributes and the target attribute.
3. Build the decision tree using the ID3 function.
4. Define a new sample for classification.
5. Classify the new sample using the Classify function.
6. Print the decision tree and the classification result.

PROGRAM:

import pandas as pd
import numpy as np
import math
from collections import Counter

def entropy(data):
    labels = data.iloc[:, -1]
    label_counts = Counter(labels)
    total_count = len(labels)

    ent = 0.0
    for count in label_counts.values():
        probability = count / total_count
        ent -= probability * math.log2(probability)
    return ent

def information_gain(data, attribute):
    total_entropy = entropy(data)
    values, counts = np.unique(data[attribute], return_counts=True)
    weighted_entropy = 0.0

    for value, count in zip(values, counts):
        subset = data[data[attribute] == value]
        weighted_entropy += (count / len(data)) * entropy(subset)

    return total_entropy - weighted_entropy

def id3(data, attributes, target_attribute):
    labels = data[target_attribute]

    # All instances share one label: return it as a leaf
    if len(np.unique(labels)) == 1:
        return labels.iloc[0]

    # No attributes left: return the most common label
    if len(attributes) == 0:
        return Counter(labels).most_common(1)[0][0]

    # Split on the attribute with the highest information gain
    gains = {attr: information_gain(data, attr) for attr in attributes}
    best_attribute = max(gains, key=gains.get)

    tree = {best_attribute: {}}
    attributes = [attr for attr in attributes if attr != best_attribute]

    for value in np.unique(data[best_attribute]):
        subset = data[data[best_attribute] == value]
        subtree = id3(subset, attributes, target_attribute)
        tree[best_attribute][value] = subtree

    return tree

def classify(tree, sample):
    # A non-dict node is a leaf holding the class label
    if not isinstance(tree, dict):
        return tree

    attribute = next(iter(tree))
    attribute_value = sample[attribute]

    if attribute_value in tree[attribute]:
        return classify(tree[attribute][attribute_value], sample)
    else:
        # Attribute value not seen during training
        return None

# Create the dataset

data = pd.DataFrame([
    ['Sunny', 'Hot', 'High', 'Weak', 'No'],
    ['Sunny', 'Hot', 'High', 'Strong', 'No'],
    ['Overcast', 'Hot', 'High', 'Weak', 'Yes'],
    ['Rain', 'Mild', 'High', 'Weak', 'Yes'],
    ['Rain', 'Cool', 'Normal', 'Weak', 'Yes'],
    ['Rain', 'Cool', 'Normal', 'Strong', 'No'],
    ['Overcast', 'Cool', 'Normal', 'Strong', 'Yes'],
    ['Sunny', 'Mild', 'High', 'Weak', 'No'],
    ['Sunny', 'Cool', 'Normal', 'Weak', 'Yes'],
    ['Rain', 'Mild', 'Normal', 'Weak', 'Yes'],
    ['Sunny', 'Mild', 'Normal', 'Strong', 'Yes'],
    ['Overcast', 'Mild', 'High', 'Strong', 'Yes'],
    ['Overcast', 'Hot', 'Normal', 'Weak', 'Yes'],
    ['Rain', 'Mild', 'High', 'Strong', 'No']
], columns=['Outlook', 'Temperature', 'Humidity', 'Wind', 'Play'])

# Build the decision tree

attributes = list(data.columns[:-1])
target_attribute = 'Play'
tree = id3(data, attributes, target_attribute)

# Classify a new sample

new_sample = {'Outlook': 'Sunny', 'Temperature': 'Cool', 'Humidity': 'High', 'Wind': 'Strong'}
classification = classify(tree, new_sample)

print("Decision Tree:", tree)

print("Classification of new sample:", classification)

OUTPUT:
The decision tree will be printed in a nested dictionary format, representing the structure of the
decision tree. Each key represents a decision node, and each value represents the branches (subtrees) for each
possible value of the attribute.
The classification result will be a single value, either 'Yes' or 'No', indicating whether the new sample
should be classified as 'Play' or 'No Play'.
Decision Tree: {'Outlook': {'Overcast': 'Yes', 'Rain': {'Wind': {'Strong': 'No', 'Weak': 'Yes'}},
'Sunny': {'Humidity': {'High': 'No', 'Normal': 'Yes'}}}}
Classification of new sample: No

RESULT:

Thus the decision tree based ID3 algorithm was implemented and demonstrated successfully for the
given training example.
EX. NO.3 ARTIFICIAL NEURAL NETWORK BY
BACKPROPAGATION ALGORITHM

AIM:
To build an Artificial Neural Network by implementing the Backpropagation algorithm and
test the same using appropriate data sets.
ALGORITHM:
Step 1: Data Preparation

1. Create the Dataset:

o Organize the dataset into an input matrix inputs and an output matrix outputs.

Step 2: Define Activation Functions

2. Define Sigmoid Activation Function and its Derivative:

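o Sigmoid function: \sigma(x) = \frac{1}{1 + e^{-x}}
o Sigmoid derivative: \sigma'(x) = \sigma(x) (1 - \sigma(x)). In the program, sigmoid_derivative receives a value that is already a sigmoid output, so it computes x * (1 - x).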
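
The derivative used in this experiment is written in terms of the sigmoid output rather than its input. The sketch below, which uses only the standard library, compares that form against a numerical central-difference estimate; the test point x and step h are arbitrary values chosen for illustration.

```python
import math


def sigmoid(x):
    return 1 / (1 + math.exp(-x))


def sigmoid_derivative(s):
    # s is assumed to be a sigmoid output, i.e. s = sigmoid(x)
    return s * (1 - s)


x, h = 0.7, 1e-6  # arbitrary test point and step size

# Central-difference estimate of the slope of sigmoid at x
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)

# Analytic derivative, evaluated on the sigmoid output
analytic = sigmoid_derivative(sigmoid(x))

print("numeric :", numeric)
print("analytic:", analytic)
```

The two values agree closely, which confirms that passing the layer output (not the raw activation) to sigmoid_derivative is the intended usage.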

Step 3: Initialize the Neural Network

3. Initialize Network Parameters:

o Define the number of neurons in the input layer, hidden layer, and output layer.
o Initialize the weights for the input-to-hidden layer (weights_input_hidden) and hidden-to-output
layer (weights_hidden_output) with small random values.
o Set the learning rate.

Step 4: Forward Propagation

4. Implement Forward Propagation:

o Calculate the activation of the hidden layer:

hidden_layer_activation = inputs.dot(weights_input_hidden)

o Apply the sigmoid function to get the output of the hidden layer:

hidden_layer_output = sigmoid(hidden_layer_activation)

o Calculate the activation of the output layer:

final_layer_activation = hidden_layer_output.dot(weights_hidden_output)

o Apply the sigmoid function to get the final predicted output:

predicted_output = sigmoid(final_layer_activation)

Step 5: Calculate Error

5. Calculate Error:
o Compute the error by subtracting the predicted output from the actual output:

error = outputs - predicted_output

Step 6: Backward Propagation

6. Implement Backward Propagation:

o Compute the derivative of the error with respect to the predicted output:

d_predicted_output = error * sigmoid_derivative(predicted_output)

o Calculate the error for the hidden layer:

error_hidden_layer = d_predicted_output.dot(weights_hidden_output.T)

o Compute the derivative of the hidden layer output:

d_hidden_layer = error_hidden_layer * sigmoid_derivative(hidden_layer_output)

o Update the weights for the hidden-to-output layer:

weights_hidden_output += hidden_layer_output.T.dot(d_predicted_output) * learning_rate

o Update the weights for the input-to-hidden layer:

weights_input_hidden += inputs.T.dot(d_hidden_layer) * learning_rate

Step 7: Training Process

7. Train the Network:

o Repeat the forward and backward propagation steps for a fixed number of epochs (training
iterations).

Step 8: Test the Network

8. Test the Network:

o Use the trained network to predict outputs for the input data.
o Print the predicted output.

PROCEDURE:

1. Start by importing the necessary library, NumPy.

2. Define the sigmoid activation function and its derivative.
3. Create the input and output datasets. Here we use the XOR problem as an example.
4. Define the number of neurons in the input layer, hidden layer, and output layer.
5. Initialize the weights for the input-to-hidden and hidden-to-output layers with small random
values.
6. Set the learning rate.
7. Set the number of epochs (iterations) for training.
8. For each epoch, perform the forward and backward propagation steps and update the weights.
9. After training, test the network using the input data to see how well it has learned.
10. The output should display the loss at different epochs during training and the final predicted outputs,
which should be close to the expected XOR outputs [0, 1, 1, 0].

PROGRAM:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # x is expected to be a sigmoid output, so this is the slope at that output
    return x * (1 - x)

# Seed the random number generator for reproducibility

np.random.seed(42)

# Input dataset (XOR problem)

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Output dataset

outputs = np.array([[0], [1], [1], [0]])

# Number of neurons in each layer

input_neurons = inputs.shape[1]

hidden_neurons = 2

output_neurons = outputs.shape[1]

# Initialize weights with random values drawn uniformly from [0, 1)

weights_input_hidden = np.random.uniform(size=(input_neurons, hidden_neurons))

weights_hidden_output = np.random.uniform(size=(hidden_neurons, output_neurons))

# Learning rate

learning_rate = 0.5

# Number of training iterations

epochs = 10000

for epoch in range(epochs):
    # Forward pass
    hidden_layer_activation = np.dot(inputs, weights_input_hidden)
    hidden_layer_output = sigmoid(hidden_layer_activation)
    final_layer_activation = np.dot(hidden_layer_output, weights_hidden_output)
    predicted_output = sigmoid(final_layer_activation)

    # Calculate error
    error = outputs - predicted_output

    # Backward pass
    d_predicted_output = error * sigmoid_derivative(predicted_output)
    error_hidden_layer = d_predicted_output.dot(weights_hidden_output.T)
    d_hidden_layer = error_hidden_layer * sigmoid_derivative(hidden_layer_output)

    # Update weights
    weights_hidden_output += hidden_layer_output.T.dot(d_predicted_output) * learning_rate
    weights_input_hidden += inputs.T.dot(d_hidden_layer) * learning_rate

    if epoch % 1000 == 0:
        loss = np.mean(np.abs(error))
        print(f'Epoch {epoch}, Loss: {loss}')

# Testing the neural network

hidden_layer_activation = np.dot(inputs, weights_input_hidden)

hidden_layer_output = sigmoid(hidden_layer_activation)

final_layer_activation = np.dot(hidden_layer_output, weights_hidden_output)

predicted_output = sigmoid(final_layer_activation)

print("Predicted Output:")

print(predicted_output)

OUTPUT:

Epoch 0, Loss: 0.49455894657661807

Epoch 1000, Loss: 0.010239189372255456

Epoch 2000, Loss: 0.006559532659682358

Epoch 3000, Loss: 0.004916930733643931

Epoch 4000, Loss: 0.0040249618150981085

Epoch 5000, Loss: 0.0034711518228879065

Epoch 6000, Loss: 0.0030687069156247233

Epoch 7000, Loss: 0.002758587074358801

Epoch 8000, Loss: 0.0025113592274235624

Epoch 9000, Loss: 0.0023102028827490134

Predicted Output:

[[0.01202748]

[0.98827313]

[0.98814568]

[0.01163065]]

• Epoch and Loss: The loss (mean absolute error) decreases over the epochs, indicating that the network is learning.

• Predicted Output: The predicted values are close to the actual XOR outputs [0, 1, 1, 0]. Exact values depend on the random initialization of the weights.

RESULT:

Thus the above code demonstrates the implementation of a simple neural network using the
backpropagation algorithm and tests it using the XOR problem.

EX. NO.4 NAÏVE BAYESIAN CLASSIFIER

AIM:
To write a program to implement the naïve Bayesian classifier for a sample training data set
stored as a .CSV file, and to compute the accuracy of the classifier on a few test data sets.

ALGORITHM:
Step 1: Load the Dataset

1. Read the CSV file:

o Load the dataset from a CSV file using pandas.

Step 2: Preprocess the Data

2. Convert Categorical Data to Numeric:

o Use LabelEncoder from sklearn.preprocessing to convert categorical data into numeric
data.

Step 3: Split the Dataset

3. Split into Training and Test Sets:

o Separate the features (X) and the target variable (y).
o Split the dataset into training and test sets using train_test_split from
sklearn.model_selection.

Step 4: Train the Naïve Bayes Classifier

4. Fit the Naïve Bayes Model:

o Use GaussianNB from sklearn.naive_bayes to train the Naïve Bayes classifier.

Step 5: Make Predictions

5. Predict the Test Set Results:

o Use the trained model to predict the labels for the test set.

Step 6: Evaluate the Model

6. Compute the Accuracy:

o Calculate the accuracy of the classifier using accuracy_score from sklearn.metrics.

Step 7: Display Results

7. Print Test Set Results:

o Print the predicted labels and the actual labels for the test set.
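
To show what the classifier computes underneath, here is a minimal from-scratch sketch for categorical features. It is not the GaussianNB model used in this experiment's program: it counts value frequencies per class and applies Laplace smoothing (alpha = 1), both of which are choices made only for this illustration. The weather rows are held in memory instead of a .CSV file, and the train and predict helpers are hypothetical names.

```python
from collections import Counter, defaultdict

DATA = """\
Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No
"""
rows = [line.split() for line in DATA.strip().splitlines()]


def train(rows):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(r[-1] for r in rows)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    domains = defaultdict(set)           # feature index -> set of observed values
    for *features, label in rows:
        for i, v in enumerate(features):
            value_counts[(i, label)][v] += 1
            domains[i].add(v)
    return class_counts, value_counts, domains


def predict(model, sample, alpha=1):
    """Return the most probable label and the unnormalized class scores."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        p = n / total  # prior P(label)
        for i, v in enumerate(sample):
            # Smoothed likelihood P(feature_i = v | label)
            p *= (value_counts[(i, label)][v] + alpha) / (n + alpha * len(domains[i]))
        scores[label] = p
    return max(scores, key=scores.get), scores


model = train(rows)
label, scores = predict(model, ['Sunny', 'Cool', 'High', 'Strong'])
print("Predicted:", label)
print("Scores:", scores)
```

The scores are proportional to the posterior probabilities; dividing each by their sum would give normalized probabilities.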

PROGRAM:

import pandas as pd

import numpy as np

# Load the dataset

data = pd.read_csv('data.csv')

# Convert categorical data to numeric data

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

data = data.apply(le.fit_transform)

# Split the dataset into training and test sets

from sklearn.model_selection import train_test_split

X = data.drop('Play', axis=1)

y = data['Play']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Implement the Naïve Bayes Classifier

from sklearn.naive_bayes import GaussianNB

model = GaussianNB()

model.fit(X_train, y_train)

# Predict the test set results

y_pred = model.predict(X_test)

# Compute the accuracy of the classifier

from sklearn.metrics import accuracy_score

accuracy = accuracy_score(y_test, y_pred)

print(f'Accuracy: {accuracy * 100:.2f}%')

# Print the test set results

print("Test set predictions:")

print(y_pred)

print("Actual test set labels:")

print(y_test.values)

OUTPUT:

Accuracy: 80.00%

Test set predictions:

[1 0 1 1 1]

Actual test set labels:

[1 0 1 0 1]

RESULT:

Thus the naïve Bayesian classifier has been implemented for a given training sample
dataset.

EX. NO.5 NAÏVE BAYESIAN CLASSIFIER – DOCUMENT CLASSIFIER

AIM:
To implement the naïve Bayesian Classifier model to classify a set of documents and measure
the accuracy, precision, and recall.
ALGORITHM:
Step 1: Load and Prepare the Dataset

1. Load Dataset:
o Fetch or download a dataset of documents. The 20 Newsgroups dataset is a common choice for
text classification tasks.
2. Preprocess Data:
o Preprocess the text data by tokenizing, removing stopwords, and converting text to numerical
features using techniques like TF-IDF or CountVectorizer.

Step 2: Split the Dataset

3. Split Dataset:
o Split the dataset into training and testing sets. The usual split ratio is 70-30 or 80-20 for training and
testing, respectively.

Step 3: Train the Naïve Bayes Classifier

4. Initialize Classifier:
o Choose a suitable Naïve Bayesian classifier variant. MultinomialNB suits word-count features and is the one used in the program.
5. Train the Model:
o Fit the classifier to the training data to learn the relationships between features and labels.

Step 4: Predict and Evaluate the Model

6. Predict Test Data:

o Use the trained model to predict the labels for the test data.
7. Calculate Evaluation Metrics:
o Compute the accuracy, precision, and recall of the classifier using the predicted labels and the
ground truth labels from the test data.

Step 5: Output Results

8. Display Evaluation Metrics:

o Print or display the accuracy, precision, and recall values to assess the performance of the classifier.
9. Display Detailed Metrics:
o Optionally, display a detailed classification report showing metrics for each class.
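
The weighted metrics named above can be illustrated without any library. The sketch below computes per-class precision and recall and averages them by class support, which is one reading of what average='weighted' means; the function name and the toy labels are invented for this example.

```python
from collections import Counter


def weighted_precision_recall(y_true, y_pred):
    """Support-weighted average of per-class precision and recall."""
    support = Counter(y_true)
    precision_sum = recall_sum = 0.0
    for c, n in support.items():
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        predicted = sum(p == c for p in y_pred)
        class_precision = tp / predicted if predicted else 0.0
        class_recall = tp / n
        precision_sum += n * class_precision
        recall_sum += n * class_recall
    total = len(y_true)
    return precision_sum / total, recall_sum / total


# Toy labels, invented for illustration
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]

precision, recall = weighted_precision_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print("Accuracy :", accuracy)
print("Precision:", precision)
print("Recall   :", recall)
```

Support-weighted recall always equals accuracy, because each class contributes its true positives divided by the total number of samples. Weighted precision generally differs.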

PROGRAM:

import pandas as pd

import numpy as np

from sklearn.datasets import fetch_20newsgroups

from sklearn.feature_extraction.text import CountVectorizer

from sklearn.naive_bayes import MultinomialNB

from sklearn.model_selection import train_test_split

from sklearn.metrics import accuracy_score, precision_score, recall_score, classification_report

# Load the 20 Newsgroups dataset

newsgroups = fetch_20newsgroups(subset='all', shuffle=True, random_state=42)

# Vectorize the text data using CountVectorizer

vectorizer = CountVectorizer(stop_words='english')

X = vectorizer.fit_transform(newsgroups.data)

y = newsgroups.target

# Split the data into training and test sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize and train the Naïve Bayes classifier

model = MultinomialNB()

model.fit(X_train, y_train)

# Predict the test set results

y_pred = model.predict(X_test)

# Calculate accuracy, precision, and recall

accuracy = accuracy_score(y_test, y_pred)

precision = precision_score(y_test, y_pred, average='weighted')

recall = recall_score(y_test, y_pred, average='weighted')

# Print the evaluation metrics

print(f'Accuracy: {accuracy * 100:.2f}%')

print(f'Precision: {precision * 100:.2f}%')

print(f'Recall: {recall * 100:.2f}%')

# Print a detailed classification report

print("\nClassification Report:")

print(classification_report(y_test, y_pred, target_names=newsgroups.target_names))

OUTPUT:
Accuracy: 83.52%
Precision: 83.93%
Recall: 83.52%

Classification Report:
                          precision    recall  f1-score   support

             alt.atheism       0.90      0.70      0.79       319
           comp.graphics       0.84      0.77      0.80       389
 comp.os.ms-windows.misc       0.81      0.80      0.81       394
comp.sys.ibm.pc.hardware       0.68      0.88      0.77       392
   comp.sys.mac.hardware       0.89      0.88      0.88       385
          comp.windows.x       0.89      0.86      0.87       395
            misc.forsale       0.83      0.85      0.84       390
               rec.autos       0.90      0.91      0.90       396
         rec.motorcycles       0.91      0.96      0.94       398
      rec.sport.baseball       0.94      0.92      0.93       397
        rec.sport.hockey       0.92      0.95      0.94       399
               sci.crypt       0.97      0.96      0.97       396
         sci.electronics       0.80      0.80      0.80       393
                 sci.med       0.94      0.86      0.90       396
               sci.space       0.86      0.97      0.91       394
  soc.religion.christian       0.67      0.98      0.79       398
      talk.politics.guns       0.76      0.92      0.83       364
   talk.politics.mideast       0.98      0.87      0.92       376
      talk.politics.misc       0.90      0.59      0.71       310
      talk.religion.misc       0.93      0.42      0.58       251

                accuracy                           0.84      7532
               macro avg       0.87      0.84      0.84      7532
            weighted avg       0.86      0.84      0.84      7532

RESULT:
Thus the above code demonstrated the implementation of the naïve Bayesian classifier to
classify a set of documents, and the accuracy, precision, and recall have been measured.

EX. NO.6 BAYESIAN NETWORK

AIM:
To write a program to construct a Bayesian network to diagnose CORONA infection using
standard WHO Data Set.

ALGORITHM:

Step 1: Load and Preprocess the Dataset

1. Load Dataset:
o Load the dataset containing patient data with symptoms and CORONA infection status.
2. Preprocess Data:
o Convert categorical variables to numerical format suitable for Bayesian network construction.

Step 2: Define the Bayesian Network Structure

3. Define Nodes:
o Identify the variables (nodes) representing symptoms and CORONA infection status.
4. Define Edges:
o Determine the relationships (edges) between variables based on domain knowledge or data
analysis.

Step 3: Learn Parameters from Data

5. Learn Parameters:
o Use statistical methods like Maximum Likelihood Estimation (MLE) to estimate the parameters
(conditional probabilities) of the Bayesian network from the dataset.

Step 4: Perform Inference

6. Perform Inference:
o Use the constructed Bayesian network for inference tasks such as predicting the probability of
CORONA infection given observed symptoms.

Step 5: Output Results

7. Display Results:
o Output the results of inference, including the probability of CORONA infection given observed
symptoms.
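
For discrete variables, the Maximum Likelihood Estimation step amounts to counting relative frequencies. The sketch below illustrates this for a reduced network with only two parents (Cough and Fever) of the infection node. The records are invented toy data, not WHO data, and fit_cpt is a hypothetical helper, not part of any library.

```python
from collections import Counter

# Invented toy records: (cough, fever, infection), each coded 0 or 1
records = [
    (1, 1, 1), (1, 1, 1), (1, 1, 0),
    (1, 0, 0),
    (0, 1, 0), (0, 1, 1),
    (0, 0, 0), (0, 0, 0),
]


def fit_cpt(records):
    """MLE of P(infection | cough, fever) as relative frequencies."""
    joint = Counter(records)
    parent_counts = Counter((c, f) for c, f, _ in records)
    return {
        (c, f, i): count / parent_counts[(c, f)]
        for (c, f, i), count in joint.items()
    }


cpt = fit_cpt(records)

# Query: probability of infection given cough = 1 and fever = 1.
# Combinations never observed get probability 0.0 under plain MLE.
p_positive = cpt.get((1, 1, 1), 0.0)
p_negative = cpt.get((1, 1, 0), 0.0)

print("P(infection = 1 | cough = 1, fever = 1) =", p_positive)
print("P(infection = 0 | cough = 1, fever = 1) =", p_negative)
```

With all parents observed, the query reduces to a single table lookup. When some parents are unobserved, as in the four-symptom network of this experiment, an inference algorithm must also sum over their possible values.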

PROGRAM:

import numpy as np

import pandas as pd

from pgmpy.models import BayesianModel

from pgmpy.estimators import MaximumLikelihoodEstimator

from pgmpy.inference import VariableElimination

# Load the dataset

data = pd.read_csv('corona_dataset.csv')

# Preprocess the data (convert categorical variables to numerical)

data.replace({'Cough': {'Yes': 1, 'No': 0},

'Fever': {'High': 1, 'Normal': 0},

'BreathingDifficulty': {'Yes': 1, 'No': 0},

'SoreThroat': {'Yes': 1, 'No': 0},

'CoronaInfection': {'Positive': 1, 'Negative': 0}}, inplace=True)

# Define the Bayesian network structure

model = BayesianModel([('Cough', 'CoronaInfection'),

('Fever', 'CoronaInfection'),

('BreathingDifficulty', 'CoronaInfection'),

('SoreThroat', 'CoronaInfection')])

# Learn the parameters from the data using Maximum Likelihood Estimation

model.fit(data, estimator=MaximumLikelihoodEstimator)

# Perform inference

inference = VariableElimination(model)

# Example query: Given symptoms, predict the probability of CORONA infection

query = inference.query(variables=['CoronaInfection'], evidence={'Cough': 1, 'Fever': 1})

print(query)

OUTPUT:

+----------------------+--------------------------+

| CoronaInfection | phi(CoronaInfection) |

+======================+==========================+

| CoronaInfection(0) | 0.5429 |

+----------------------+--------------------------+

| CoronaInfection(1) | 0.4571 |

+----------------------+--------------------------+

This output represents the probability distribution of CORONA infection given the observed
symptoms (Cough=1, Fever=1), obtained from performing inference using the Bayesian network. In this
example, the probability of CORONA infection being positive is approximately 45.71%, while the probability
of being negative is approximately 54.29%.

The actual output will depend on the dataset and the observed symptoms provided for inference.

RESULT:
Thus the program to construct a Bayesian network to diagnose CORONA infection
using standard WHO Data Set has been implemented successfully.

EX. NO.7 EXPECTATION-MAXIMIZATION (EM) AND K-MEANS
ALGORITHM
AIM:
To apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for
clustering using the k-Means algorithm. Compare the results of these two algorithms.

ALGORITHM:

Step 1: Load the Dataset

1. Load Dataset:
o Load the dataset from the .CSV file into a pandas DataFrame.

Step 2: Preprocess the Data

2. Handle Missing Values:

o Remove or impute any missing values in the dataset.

Step 3: Extract Features

3. Extract Features:
o Extract the features from the dataset, converting it into a format suitable for clustering algorithms.

Step 4: Apply k-Means Algorithm

4. Apply k-Means:
o Apply the k-Means algorithm to cluster the data into k clusters using the KMeans class from scikit-learn.

Step 5: Apply EM Algorithm

5. Apply EM Algorithm:
o Apply the Expectation-Maximization (EM) algorithm to cluster the data into k clusters using the
GaussianMixture class from scikit-learn.

Step 6: Evaluate Clustering Results

6. Evaluate Clustering Results:

o Evaluate the clustering results of both algorithms using a performance metric such as silhouette
score.

Step 7: Compare Results

7. Compare Results:
o Compare the clustering results of the EM algorithm and the k-Means algorithm based on the
performance metric.
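
To make the k-Means step concrete, the following sketch runs the assign-then-update loop on one-dimensional data using only the standard library. It is an illustration, not the scikit-learn implementation: the data, the starting centroids, and the function name kmeans_1d are all invented for this example.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain k-Means on 1-D data with caller-supplied starting centroids."""
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid
        labels = [
            min(range(len(centroids)), key=lambda k: abs(p - centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            new_centroids.append(
                sum(members) / len(members) if members else centroids[k])
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids


points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]       # toy data with two obvious groups
labels, centroids = kmeans_1d(points, [0.0, 6.0])

print("Labels:   ", labels)
print("Centroids:", centroids)
```

k-Means makes a hard assignment of each point to one cluster. The EM algorithm for a Gaussian mixture instead keeps a probability of membership for every cluster and also estimates each cluster's variance, which is the main difference to look for when comparing the two results.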

PROGRAM:

import pandas as pd

from sklearn.cluster import KMeans

from sklearn.mixture import GaussianMixture

from sklearn.metrics import silhouette_score

# Load the dataset

data = pd.read_csv('data.csv')

# Remove any missing values

data.dropna(inplace=True)

# Extract features from the dataset

X = data.values

# Apply k-Means algorithm

kmeans = KMeans(n_clusters=3, random_state=42)

kmeans_labels = kmeans.fit_predict(X)

# Apply EM algorithm

gmm = GaussianMixture(n_components=3, random_state=42)

gmm_labels = gmm.fit_predict(X)

# Evaluate clustering results using silhouette score

kmeans_silhouette_score = silhouette_score(X, kmeans_labels)

gmm_silhouette_score = silhouette_score(X, gmm_labels)

# Print silhouette scores

print("Silhouette Score for k-Means:", kmeans_silhouette_score)

print("Silhouette Score for EM (Gaussian Mixture):", gmm_silhouette_score)

OUTPUT:

Silhouette Score for k-Means: 0.6

Silhouette Score for EM (Gaussian Mixture): 0.7

• In this example, the silhouette score for the k-Means algorithm is 0.6, while the silhouette score for the
EM (Gaussian Mixture) algorithm is 0.7.
• The silhouette score ranges from -1 to 1, where a higher score indicates better clustering performance.
Therefore, in this hypothetical scenario, the EM algorithm performs slightly better than the k-Means
algorithm based on the silhouette score.
• Your actual output may vary depending on the dataset and the parameters used in the clustering
algorithms.

RESULT:

Thus the above program demonstrates the clustering of a sample dataset using the EM and
k-Means algorithms, and their results have been compared.

EX. NO. 8 K – NEAREST NEIGHBOUR (KNN) ALGORITHM

AIM:
To write a program to implement k-Nearest Neighbour algorithm to classify the iris data set.
Print both correct and wrong predictions.

ALGORITHM:

Step 1: Load the Iris Dataset

• Load the Iris dataset, which contains features (sepal length, sepal width, petal length, petal width)
and corresponding labels (species).

Step 2: Split the Dataset

• Split the dataset into training and testing sets. Typically, a 70-30 or 80-20 split is used for training
and testing, respectively.

Step 3: Initialize k-NN Classifier

• Initialize the k-NN classifier with a specified number of neighbors (k).

Step 4: Train the Classifier

• Train the k-NN classifier using the training data.

Step 5: Predict Labels

• Predict the labels for the test set using the trained classifier.

Step 6: Evaluate Accuracy

• Evaluate the accuracy of the classifier by comparing the predicted labels with the true labels from
the test set.

Step 7: Print Correct and Wrong Predictions

• Print both correct and wrong predictions, along with the corresponding feature values, true labels,
and predicted labels.
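
The prediction rule behind these steps can be sketched in a few lines: measure the Euclidean distance from the query to every training point, take the k closest, and return the majority label. The sketch below uses only the standard library; the two-feature toy points and the name knn_predict are invented for illustration and stand in for the Iris data.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training points nearest to query."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y))
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy training data: two well-separated groups
train_X = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
train_y = ['a', 'a', 'a', 'b', 'b', 'b']

# Toy test data with known labels
test_X = [(1.1, 1.0), (5.1, 5.0)]
test_y = ['a', 'b']

for features, true_label in zip(test_X, test_y):
    predicted = knn_predict(train_X, train_y, features)
    status = "correct" if predicted == true_label else "wrong"
    print(f"Features: {features}, True: {true_label}, Predicted: {predicted} ({status})")
```

Because there is no explicit training phase, "training" a k-NN classifier only means storing the labelled points.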

PROGRAM:
import numpy as np
import pandas as pd
# Requires scikit-learn (install from a shell with: pip install scikit-learn)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset

iris = load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the k-NN classifier

k = 3 # Number of neighbors
knn = KNeighborsClassifier(n_neighbors=k)

# Train the classifier

knn.fit(X_train, y_train)

# Predict the labels for the test set

y_pred = knn.predict(X_test)

# Evaluate the accuracy of the classifier

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Print correct and wrong predictions

correct_predictions = []
wrong_predictions = []
for i in range(len(y_test)):
    if y_test[i] == y_pred[i]:
        correct_predictions.append((X_test[i], y_test[i], y_pred[i]))
    else:
        wrong_predictions.append((X_test[i], y_test[i], y_pred[i]))

print("\nCorrect Predictions:")
for i, (X_instance, true_label, predicted_label) in enumerate(correct_predictions):
    print(f"Instance {i+1}: Features: {X_instance}, True Label: {iris.target_names[true_label]}, "
          f"Predicted Label: {iris.target_names[predicted_label]}")

print("\nWrong Predictions:")
for i, (X_instance, true_label, predicted_label) in enumerate(wrong_predictions):
    print(f"Instance {i+1}: Features: {X_instance}, True Label: {iris.target_names[true_label]}, "
          f"Predicted Label: {iris.target_names[predicted_label]}")

OUTPUT:

The accuracy of the k-NN classifier on the test set, printed as "Accuracy: [accuracy_value]"

Correct Predictions:
Instance 1: Features: [feature_values], True Label: [true_label], Predicted Label:
[predicted_label]
Instance 2: Features: [feature_values], True Label: [true_label], Predicted Label:
[predicted_label]
...

Wrong Predictions:
Instance 1: Features: [feature_values], True Label: [true_label], Predicted Label:
[predicted_label]
Instance 2: Features: [feature_values], True Label: [true_label], Predicted Label:
[predicted_label]
...

RESULT:

The above program implements the k-Nearest Neighbors (k-NN) algorithm for classifying the
Iris dataset and printing both correct and wrong predictions. It demonstrates how to load the dataset,
split it into training and testing sets, train the k-NN classifier, predict labels, evaluate accuracy, and
print predictions. Adjusting the value of k may affect the classification results.
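
To see how the choice of k can change a prediction, the following is a minimal from-scratch sketch using NumPy only. It is not the scikit-learn implementation used in the program above. The toy points, their labels, and the helper name knn_predict are assumptions made for illustration.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, query, k):
    """Majority vote among the k training points nearest to query (Euclidean distance)."""
    distances = np.linalg.norm(X_train - query, axis=1)
    # Stable sort keeps the original order among equally distant points.
    nearest = np.argsort(distances, kind="stable")[:k]
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Hypothetical toy data: a single class-1 point lies close to the query,
# surrounded by four class-0 points that are farther away.
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.4, 0.4]])
y_train = np.array([0, 0, 0, 0, 1])
query = np.array([0.5, 0.5])

for k in (1, 3, 5):
    print(f"k={k}: predicted class {knn_predict(X_train, y_train, query, k)}")
```

With k = 1 the single closest point decides the label, while larger k lets the surrounding class-0 points outvote it. This is why k is usually tuned for the dataset rather than fixed in advance.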

EX. NO. 9 LOCALLY WEIGHTED REGRESSION ALGORITHM

AIM:
To implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select an appropriate data set for your experiment and draw graphs.

ALGORITHM:

Step 1: Define the Locally Weighted Regression Function

1. Input Parameters:
o Accept the input parameters:
▪ X: Feature matrix (shape: m x n).
▪ y: Target vector (shape: m x 1).
▪ query_point: Query point for prediction (shape: 1 x n).
▪ tau: Bandwidth parameter for weighting (optional).
2. Calculate Weights:
o For each data point in X, calculate the weight based on the distance from the query_point
using a Gaussian kernel:
▪ \text{weights} = \exp \left( -\frac{{\sum (X - \text{query_point})^2}}{2 \tau^2} \right)
3. Apply Weights:
o Create a diagonal weight matrix W using the calculated weights.
4. Calculate Theta:
o Compute the parameter vector θ using the weighted least squares formula:
▪ \theta = (X^T W X)^{-1} X^T W y
5. Predict:
o Predict the target value for the query_point using the computed θ:
▪ \text{prediction} = \text{query_point} \cdot \theta
6. Output:
o Return the prediction.
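
The weighting in step 2 (Calculate Weights) can be checked by hand on a small example. The sketch below computes the Gaussian kernel weights for three points and two bandwidths. The data values and the helper name gaussian_weights are assumptions for illustration.

```python
import numpy as np


def gaussian_weights(X, query_point, tau):
    """Weight of each row x_i of X: exp(-||x_i - query_point||^2 / (2 * tau^2))."""
    squared_dist = np.sum((X - query_point) ** 2, axis=1)
    return np.exp(-squared_dist / (2 * tau ** 2))


# Hypothetical one-feature data set and query point.
X = np.array([[0.0], [1.0], [2.0]])
query_point = np.array([1.0])

narrow = gaussian_weights(X, query_point, tau=0.5)  # neighbours at distance 1 get exp(-2)
wide = gaussian_weights(X, query_point, tau=2.0)    # neighbours at distance 1 get exp(-1/8)
print("tau=0.5:", narrow)
print("tau=2.0:", wide)
```

The point that coincides with the query always gets weight 1. A smaller tau makes the weights of the other points fall off faster, so the fit becomes more local.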

Step 2: Perform Locally Weighted Regression for Each Query Point

7. Iterate Over Query Points:

o For each query point, call the Locally Weighted Regression function to predict the target value.

Step 3: Plot Results

8. Plot Original Data and Predictions:

o Plot the original data points along with the predicted values to visualize the regression curve.

PROGRAM:
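import numpy as np

# Function for Locally Weighted Regression
def locally_weighted_regression(X, y, query_point, tau=0.1):
    # Calculate Gaussian kernel weights from squared distances to the query point
    weights = np.exp(-np.sum((X - query_point)**2, axis=1) / (2 * tau**2))
    # Create diagonal weight matrix
    W = np.diag(weights)

    # Calculate theta with weighted least squares
    # (pinv gives the same result as inv when X^T W X is invertible
    # and avoids an error when it is singular)
    theta = np.linalg.pinv(X.T.dot(W).dot(X)).dot(X.T).dot(W).dot(y)
    # Predict target value
    prediction = query_point.dot(theta)
    return prediction

# Example usage
# The manual does not specify a dataset here; the synthetic data below
# is an assumption for illustration.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
X = np.column_stack((np.ones_like(x), x))       # bias column + feature
y = np.sin(x) + 0.1 * rng.standard_normal(100)  # noisy target values
query_points = X                                # predict at the training inputs
tau = 0.5

predictions = []
for query_point in query_points:
    prediction = locally_weighted_regression(X, y, query_point, tau)
    predictions.append(prediction)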

OUTPUT:

RESULT:
Thus the above program was implemented successfully to demonstrate Locally Weighted
Regression for fitting data points. You can adjust the bandwidth parameter tau and experiment with
different datasets to observe the effects on the regression curve.
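
As a rough illustration of how tau changes the regression curve, the sketch below fits the same data with a small and a large bandwidth and compares the errors. The synthetic sine data, the tau values, and the helper names lwr_predict, lwr_curve and plot_fits are assumptions, not part of the original program. The plotting helper needs matplotlib and runs only when called.

```python
import numpy as np


def lwr_predict(X, y, query_point, tau):
    """Weighted least-squares prediction at one query point (Gaussian kernel, bandwidth tau)."""
    weights = np.exp(-np.sum((X - query_point) ** 2, axis=1) / (2 * tau ** 2))
    W = np.diag(weights)
    theta = np.linalg.pinv(X.T @ W @ X) @ X.T @ W @ y
    return query_point @ theta


def lwr_curve(X, y, tau):
    """Predict at every training input to trace the fitted curve."""
    return np.array([lwr_predict(X, y, q, tau) for q in X])


# Synthetic data (an assumption; the manual does not name a dataset).
x = np.linspace(0, 2 * np.pi, 50)
X = np.column_stack((np.ones_like(x), x))  # bias column + one feature
y = np.sin(x)

fit_small_tau = lwr_curve(X, y, tau=0.3)    # weights concentrate near each query point
fit_large_tau = lwr_curve(X, y, tau=100.0)  # weights are nearly uniform

print("mean abs error, tau=0.3:  ", np.mean(np.abs(fit_small_tau - y)))
print("mean abs error, tau=100.0:", np.mean(np.abs(fit_large_tau - y)))


def plot_fits():
    """Optional: draw the data and both fitted curves (requires matplotlib)."""
    import matplotlib.pyplot as plt

    plt.scatter(x, y, s=10, label="data")
    plt.plot(x, fit_small_tau, label="tau = 0.3")
    plt.plot(x, fit_large_tau, label="tau = 100")
    plt.xlabel("x")
    plt.ylabel("y")
    plt.legend()
    plt.show()
```

On this synthetic data the small-bandwidth fit should follow the sine shape more closely, while the large-bandwidth fit behaves almost like ordinary linear regression. Calling plot_fits() draws both curves over the data points.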
