AD3461 ML lab manual

The document outlines a series of experiments focused on implementing various machine learning algorithms, including Candidate-Elimination, ID3 decision trees, Backpropagation for neural networks, and Naïve Bayesian classifiers. Each experiment includes aims, algorithms, procedures, and sample programs to demonstrate the implementation and results. The document serves as a comprehensive guide for practical applications of these algorithms using datasets in CSV format.

Uploaded by

mohanprashad2005

LIST OF EXPERIMENTS

1. For a given set of training data examples stored in a .CSV file, implement and demonstrate
the Candidate-Elimination algorithm to output a description of the set of all hypotheses
consistent with the training examples.

2. Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use
an appropriate data set for building the decision tree and apply this knowledge to classify a
new sample.

3. Build an Artificial Neural Network by implementing the Backpropagation algorithm and
test the same using appropriate data sets.

4. Write a program to implement the naïve Bayesian classifier for a sample training data set
stored as a .CSV file and compute the accuracy with a few test data sets.

5. Implement naïve Bayesian Classifier model to classify a set of documents and measure
the accuracy, precision, and recall.

6. Write a program to construct a Bayesian network to diagnose CORONA infection using
standard WHO Data Set.

7. Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for
clustering using the k-Means algorithm. Compare the results of these two algorithms.

8. Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set.
Print both correct and wrong predictions.

9. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select an appropriate data set for your experiment and draw graphs.

Ex.No.-1 CANDIDATE ELIMINATION ALGORITHM

AIM:
For a given set of training data examples stored in a .CSV file, implement and demonstrate the
Candidate-Elimination algorithm to output a description of the set of all hypotheses
consistent with the training examples.

ALGORITHM:
The Candidate Elimination algorithm is a method in machine learning used to find a
hypothesis that is consistent with the training examples. It maintains two sets of hypotheses: the most
specific hypotheses (S) and the most general hypotheses (G).

Step 1: Initialization:

• Set the most specific hypothesis S0 to the most specific hypothesis possible.
• Set the most general hypothesis G0 to the most general hypothesis possible.

Step 2: For each training example d:

• If d is a positive example:
o Remove from G any hypothesis that does not match d.
o For each hypothesis s in S that does not match d:
▪ Remove s from S.
▪ Add to S all minimal generalizations h of s such that h matches d and some hypothesis
in G is more general than h.
▪ Remove from S any hypothesis that is more general than another hypothesis in S.

• If d is a negative example:
o Remove from S any hypothesis that matches d.
o For each hypothesis g in G that matches d:
▪ Remove g from G.
▪ Add to G all minimal specializations h of g such that h does not match d and some
hypothesis in S is more specific than h.
▪ Remove from G any hypothesis that is less general than another hypothesis in G.

Step 3: Termination

• The algorithm terminates when all training examples have been processed.
• The version space (the set of hypotheses consistent with all training examples) is represented by the
hypotheses in S and G.

PROCEDURE:

1. Input: Read the training examples and initialize S and G.
2. Iterate over each example and update S and G.
3. Print the final S and G.
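
The steps above can be sketched in plain Python. This is a minimal sketch, not the manual's own listing: it assumes conjunctive hypotheses over discrete attributes, uses '?' for "any value" and '0' for "no value", and hard-codes the well-known EnjoySport examples instead of reading a .CSV file. The names matches, more_general_or_equal, candidate_elimination, domains and examples are illustrative assumptions.

```python
def matches(h, x):
    """True if hypothesis h covers instance x ('?' matches anything, '0' nothing)."""
    return all(a == '?' or a == v for a, v in zip(h, x))


def more_general_or_equal(h1, h2):
    """True if h1 covers every instance that h2 covers."""
    if '0' in h2:  # h2 covers no instance at all
        return True
    return all(a == '?' or a == b for a, b in zip(h1, h2))


def candidate_elimination(examples, domains):
    n = len(domains)
    S = [tuple('0' for _ in range(n))]  # most specific boundary
    G = [tuple('?' for _ in range(n))]  # most general boundary
    for x, positive in examples:
        if positive:
            G = [g for g in G if matches(g, x)]
            new_S = []
            for s in S:
                if matches(s, x):
                    new_S.append(s)
                    continue
                # For conjunctive hypotheses the minimal generalization is unique.
                h = tuple(v if a == '0' else (a if a == v else '?')
                          for a, v in zip(s, x))
                if any(more_general_or_equal(g, h) for g in G):
                    new_S.append(h)
            S = [s for s in new_S
                 if not any(s != t and more_general_or_equal(s, t) for t in new_S)]
        else:
            S = [s for s in S if not matches(s, x)]
            new_G = []
            for g in G:
                if not matches(g, x):
                    new_G.append(g)
                    continue
                # Minimal specializations: replace one '?' by a value that excludes x.
                for i, a in enumerate(g):
                    if a != '?':
                        continue
                    for v in domains[i]:
                        h = g[:i] + (v,) + g[i + 1:]
                        if v != x[i] and any(more_general_or_equal(h, s) for s in S):
                            new_G.append(h)
            new_G = list(dict.fromkeys(new_G))  # drop duplicates, keep order
            G = [g for g in new_G
                 if not any(g != t and more_general_or_equal(t, g) for t in new_G)]
    return S, G


# Assumed attribute domains and training examples (EnjoySport).
domains = [('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
           ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change')]
examples = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
    (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), True),
    (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), True),
]

S, G = candidate_elimination(examples, domains)
print("Final S:", S)
print("Final G:", G)
```

To use training data from a .CSV file, the examples list could be built from the rows of the file, with the last column converted to True or False.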

PROGRAM:

OUTPUT:

RESULT:

Thus the Candidate Elimination algorithm was implemented and demonstrated successfully for
the given training example.

EX. NO.2 DECISION TREE BASED ID3 ALGORITHM

AIM:
Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an
appropriate data set for building the decision tree and apply this knowledge to classify a new sample.

ALGORITHM:
Step 1: Data Preparation

1. Create the Dataset:
o Organize the dataset into a table where each row represents an instance, and each column
represents an attribute (feature), except the last column which represents the target variable
(label).

Step 2: Calculate Entropy

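2. Define Entropy Function:
o Input: Data subset.
o Calculate the frequency of each label in the subset.
o Compute the probability of each label.
o Calculate the entropy using the formula:
$H(S) = -\sum_{i} p_i \log_2 p_i$, where $p_i$ is the proportion of instances in $S$ that carry label $i$.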
o Output: Entropy value of the subset.

Step 3: Calculate Information Gain

3. Define Information Gain Function:
o Input: Data subset, attribute.
o Calculate the total entropy of the subset.
o For each unique value of the attribute:
▪ Create a subset of data where the attribute has the specific value.
▪ Calculate the entropy of this subset.
▪ Compute the weighted entropy.
o Calculate information gain using the formula:
$\mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} H(S_v)$, where $S_v$ is the subset of $S$ in which attribute $A$ has value $v$.
o Output: Information gain for the attribute.
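
As a worked example of Steps 2 and 3, the entropy and information gain can be computed by hand for the weather data set used in the program of this experiment (14 instances: 9 'Yes', 5 'No'). The attribute Outlook splits them into Sunny (2 Yes, 3 No), Overcast (4 Yes, 0 No) and Rain (3 Yes, 2 No). The symbols $H$ and $\mathrm{Gain}$ follow the usual textbook notation and are an assumption of this sketch; the values are rounded to three decimals.

```latex
\begin{aligned}
H(S) &= -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} \approx 0.940 \\
H(S_{\text{Sunny}}) &= -\tfrac{2}{5}\log_2\tfrac{2}{5} - \tfrac{3}{5}\log_2\tfrac{3}{5} \approx 0.971 \\
H(S_{\text{Overcast}}) &= 0 \\
H(S_{\text{Rain}}) &= -\tfrac{3}{5}\log_2\tfrac{3}{5} - \tfrac{2}{5}\log_2\tfrac{2}{5} \approx 0.971 \\
\mathrm{Gain}(S, \text{Outlook}) &= 0.940 - \tfrac{5}{14}(0.971) - \tfrac{4}{14}(0) - \tfrac{5}{14}(0.971) \approx 0.247
\end{aligned}
```

Outlook has the highest gain of the four attributes on this data, which is why it appears at the root of the tree printed in the OUTPUT section.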

Step 4: Build the Decision Tree

4. Define ID3 Function:
o Input: Data, list of attributes, target attribute.
o Check if all instances have the same label:
▪ If true, return the label.
o Check if there are no more attributes:
▪ If true, return the most common label.
o Calculate information gain for each attribute.
o Select the attribute with the highest information gain as the best attribute.
o Create a node for the best attribute.
o For each unique value of the best attribute:
▪ Create a subset of data where the best attribute has the specific value.
▪ Recursively call the ID3 function to build the subtree.
▪ Attach the subtree to the node.
o Output: Decision tree.

Step 5: Classify a New Sample

5. Define Classify Function:
o Input: Decision tree, new sample.
o Traverse the tree starting from the root:
▪ At each node, check the value of the corresponding attribute in the sample.
▪ Follow the branch corresponding to the attribute value.
▪ If a leaf node is reached, return the label.
o Output: Classification result.

PROCEDURE:

1. Load and prepare the dataset.
2. Define the list of attributes and the target attribute.
3. Build the decision tree using the ID3 function.
4. Define a new sample for classification.
5. Classify the new sample using the Classify function.
6. Print the decision tree and the classification result.

PROGRAM:

import pandas as pd
import numpy as np
import math
from collections import Counter

def entropy(data):
    # Entropy of the label column (assumed to be the last column)
    labels = data.iloc[:, -1]
    label_counts = Counter(labels)
    total_count = len(labels)

    ent = 0.0
    for count in label_counts.values():
        probability = count / total_count
        ent -= probability * math.log2(probability)
    return ent

def information_gain(data, attribute):
    # Entropy of the whole subset minus the weighted entropy after the split
    total_entropy = entropy(data)
    values, counts = np.unique(data[attribute], return_counts=True)
    weighted_entropy = 0.0

    for value, count in zip(values, counts):
        subset = data[data[attribute] == value]
        weighted_entropy += (count / len(data)) * entropy(subset)

    return total_entropy - weighted_entropy

def id3(data, attributes, target_attribute):
    labels = data[target_attribute]

    # All instances share one label: return it as a leaf
    if len(np.unique(labels)) == 1:
        return labels.iloc[0]

    # No attributes left: return the most common label
    if len(attributes) == 0:
        return Counter(labels).most_common(1)[0][0]

    gains = {attr: information_gain(data, attr) for attr in attributes}
    best_attribute = max(gains, key=gains.get)

    tree = {best_attribute: {}}
    attributes = [attr for attr in attributes if attr != best_attribute]

    for value in np.unique(data[best_attribute]):
        subset = data[data[best_attribute] == value]
        subtree = id3(subset, attributes, target_attribute)
        tree[best_attribute][value] = subtree

    return tree

def classify(tree, sample):
    # A non-dict node is a leaf holding the label
    if not isinstance(tree, dict):
        return tree

    attribute = next(iter(tree))
    attribute_value = sample[attribute]

    if attribute_value in tree[attribute]:
        return classify(tree[attribute][attribute_value], sample)
    else:
        # Attribute value not seen during training
        return None

# Create the dataset
data = pd.DataFrame([
    ['Sunny', 'Hot', 'High', 'Weak', 'No'],
    ['Sunny', 'Hot', 'High', 'Strong', 'No'],
    ['Overcast', 'Hot', 'High', 'Weak', 'Yes'],
    ['Rain', 'Mild', 'High', 'Weak', 'Yes'],
    ['Rain', 'Cool', 'Normal', 'Weak', 'Yes'],
    ['Rain', 'Cool', 'Normal', 'Strong', 'No'],
    ['Overcast', 'Cool', 'Normal', 'Strong', 'Yes'],
    ['Sunny', 'Mild', 'High', 'Weak', 'No'],
    ['Sunny', 'Cool', 'Normal', 'Weak', 'Yes'],
    ['Rain', 'Mild', 'Normal', 'Weak', 'Yes'],
    ['Sunny', 'Mild', 'Normal', 'Strong', 'Yes'],
    ['Overcast', 'Mild', 'High', 'Strong', 'Yes'],
    ['Overcast', 'Hot', 'Normal', 'Weak', 'Yes'],
    ['Rain', 'Mild', 'High', 'Strong', 'No']
], columns=['Outlook', 'Temperature', 'Humidity', 'Wind', 'Play'])

# Build the decision tree
attributes = list(data.columns[:-1])
target_attribute = 'Play'
tree = id3(data, attributes, target_attribute)

# Classify a new sample
new_sample = {'Outlook': 'Sunny', 'Temperature': 'Cool', 'Humidity': 'High', 'Wind': 'Strong'}
classification = classify(tree, new_sample)

print("Decision Tree:", tree)
print("Classification of new sample:", classification)

OUTPUT:
The decision tree will be printed in a nested dictionary format, representing the structure of the
decision tree. Each key represents a decision node, and each value represents the branches (subtrees) for each
possible value of the attribute.
The classification result will be a single value, either 'Yes' or 'No', indicating whether the new sample
should be classified as 'Play' or 'No Play'.
Decision Tree: {'Outlook': {'Overcast': 'Yes', 'Rain': {'Wind': {'Strong': 'No', 'Weak': 'Yes'}},
'Sunny': {'Humidity': {'High': 'No', 'Normal': 'Yes'}}}}
Classification of new sample: No

RESULT:

Thus the decision tree based ID3 algorithm was implemented and demonstrated successfully for the
given training example.
EX. NO.3 ARTIFICIAL NEURAL NETWORK BY
BACKPROPAGATION ALGORITHM

AIM:
To build an Artificial Neural Network by implementing the Backpropagation algorithm and
test the same using appropriate data sets.
ALGORITHM:
Step 1: Data Preparation

1. Create the Dataset:
o Organize the dataset into an input matrix inputs and an output matrix outputs.

Step 2: Define Activation Functions

2. Define Sigmoid Activation Function and its Derivative:
o Sigmoid function: $\sigma(x) = \frac{1}{1 + e^{-x}}$
o Sigmoid derivative: $\sigma'(x) = \sigma(x)\,(1 - \sigma(x))$
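
One detail of this step is easy to miss: the derivative of the sigmoid can be written in terms of the sigmoid's own output, so the sigmoid_derivative function in the program expects an already activated value rather than the raw input. The following minimal sketch, which uses only the standard math module, compares that form with a central-difference estimate. The helper numeric_derivative and the sample points are assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(activated):
    # Expects activated = sigmoid(x), not x itself.
    return activated * (1.0 - activated)


def numeric_derivative(f, x, h=1e-6):
    # Central-difference approximation of f'(x).
    return (f(x + h) - f(x - h)) / (2 * h)


for x in (-2.0, 0.0, 1.5):
    analytic = sigmoid_derivative(sigmoid(x))
    numeric = numeric_derivative(sigmoid, x)
    print(f"x={x:+.1f}  analytic={analytic:.6f}  numeric={numeric:.6f}")
```

The two columns should agree to several decimal places. Passing the raw input x to sigmoid_derivative instead of sigmoid(x) would give a different, incorrect value.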

Step 3: Initialize the Neural Network

3. Initialize Network Parameters:
o Define the number of neurons in the input layer, hidden layer, and output layer.
o Initialize the weights for the input-to-hidden layer (weights_input_hidden) and hidden-to-output
layer (weights_hidden_output) with small random values.
o Set the learning rate.

Step 4: Forward Propagation

4. Implement Forward Propagation:
o Calculate the activation of the hidden layer:

hidden_layer_activation = inputs.dot(weights_input_hidden)

o Apply the sigmoid function to get the output of the hidden layer:

hidden_layer_output = sigmoid(hidden_layer_activation)

o Calculate the activation of the output layer:

final_layer_activation = hidden_layer_output.dot(weights_hidden_output)

o Apply the sigmoid function to get the final predicted output:

predicted_output = sigmoid(final_layer_activation)

Step 5: Calculate Error

5. Calculate Error:
o Compute the error by subtracting the predicted output from the actual output:

error = outputs - predicted_output


Step 6: Backward Propagation

6. Implement Backward Propagation:
o Compute the derivative of the error with respect to the predicted output:

d_predicted_output = error * sigmoid_derivative(predicted_output)

o Calculate the error for the hidden layer:

error_hidden_layer = d_predicted_output.dot(weights_hidden_output.T)

o Compute the derivative of the hidden layer output:

d_hidden_layer = error_hidden_layer * sigmoid_derivative(hidden_layer_output)

o Update the weights for the hidden-to-output layer:

weights_hidden_output += hidden_layer_output.T.dot(d_predicted_output) * learning_rate

o Update the weights for the input-to-hidden layer:

weights_input_hidden += inputs.T.dot(d_hidden_layer) * learning_rate

Step 7: Training Process

7. Train the Network:
o Repeat the forward and backward propagation steps for a fixed number of epochs (training
iterations).

Step 8: Test the Network

8. Test the Network:
o Use the trained network to predict outputs for the input data.
o Print the predicted output.

PROCEDURE:

1. Start by importing the necessary library, NumPy.
2. Define the sigmoid activation function and its derivative.
3. Create the input and output datasets. Here we use the XOR problem as an example.
4. Define the number of neurons in the input layer, hidden layer, and output layer.
5. Initialize the weights for the input-to-hidden and hidden-to-output layers with small random
values.
6. Set the learning rate.
7. Set the number of epochs (iterations) for training.
8. For each epoch, perform the forward and backward propagation steps and update the weights.
9. After training, test the network using the input data to see how well it has learned.
10. The output should display the loss at different epochs during training and the final predicted outputs,
which should be close to the expected XOR outputs [0, 1, 1, 0].

PROGRAM:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # x is expected to be an already activated value, i.e. sigmoid(z)
    return x * (1 - x)

# Seed the random number generator for reproducibility
np.random.seed(42)

# Input dataset (XOR problem)
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Output dataset
outputs = np.array([[0], [1], [1], [0]])

# Number of neurons in each layer
input_neurons = inputs.shape[1]
hidden_neurons = 2
output_neurons = outputs.shape[1]

# Initialize weights with random values drawn uniformly from [0, 1)
weights_input_hidden = np.random.uniform(size=(input_neurons, hidden_neurons))
weights_hidden_output = np.random.uniform(size=(hidden_neurons, output_neurons))

# Learning rate
learning_rate = 0.5

# Number of training iterations
epochs = 10000

for epoch in range(epochs):
    # Forward pass
    hidden_layer_activation = np.dot(inputs, weights_input_hidden)
    hidden_layer_output = sigmoid(hidden_layer_activation)
    final_layer_activation = np.dot(hidden_layer_output, weights_hidden_output)
    predicted_output = sigmoid(final_layer_activation)

    # Calculate error
    error = outputs - predicted_output

    # Backward pass
    d_predicted_output = error * sigmoid_derivative(predicted_output)
    error_hidden_layer = d_predicted_output.dot(weights_hidden_output.T)
    d_hidden_layer = error_hidden_layer * sigmoid_derivative(hidden_layer_output)

    # Update weights
    weights_hidden_output += hidden_layer_output.T.dot(d_predicted_output) * learning_rate
    weights_input_hidden += inputs.T.dot(d_hidden_layer) * learning_rate

    # Report the mean absolute error every 1000 epochs
    if epoch % 1000 == 0:
        loss = np.mean(np.abs(error))
        print(f'Epoch {epoch}, Loss: {loss}')

# Testing the neural network
hidden_layer_activation = np.dot(inputs, weights_input_hidden)
hidden_layer_output = sigmoid(hidden_layer_activation)
final_layer_activation = np.dot(hidden_layer_output, weights_hidden_output)
predicted_output = sigmoid(final_layer_activation)

print("Predicted Output:")
print(predicted_output)

OUTPUT:

Epoch 0, Loss: 0.49455894657661807

Epoch 1000, Loss: 0.010239189372255456

Epoch 2000, Loss: 0.006559532659682358

Epoch 3000, Loss: 0.004916930733643931

Epoch 4000, Loss: 0.0040249618150981085

Epoch 5000, Loss: 0.0034711518228879065

Epoch 6000, Loss: 0.0030687069156247233

Epoch 7000, Loss: 0.002758587074358801

Epoch 8000, Loss: 0.0025113592274235624

Epoch 9000, Loss: 0.0023102028827490134

Predicted Output:

[[0.01202748]

[0.98827313]

[0.98814568]

[0.01163065]]

• Epoch and Loss: The loss decreases over time, indicating that the network is learning.

• Predicted Output: The predicted values are close to the actual XOR outputs [0, 1, 1, 0].

RESULT:

Thus the above code demonstrates the implementation of a simple neural network using the
backpropagation algorithm and tests it using the XOR problem.

EX. NO.4 NAÏVE BAYESIAN CLASSIFIER

AIM:
To write a program to implement the naïve Bayesian classifier for a sample training data set
stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.

ALGORITHM:
Step 1: Load the Dataset

1. Read the CSV file:
o Load the dataset from a CSV file using pandas.

Step 2: Preprocess the Data

2. Convert Categorical Data to Numeric:
o Use LabelEncoder from sklearn.preprocessing to convert categorical data into numeric
data.

Step 3: Split the Dataset

3. Split into Training and Test Sets:
o Separate the features (X) and the target variable (y).
o Split the dataset into training and test sets using train_test_split from
sklearn.model_selection.

Step 4: Train the Naïve Bayes Classifier

4. Fit the Naïve Bayes Model:
o Use GaussianNB from sklearn.naive_bayes to train the Naïve Bayes classifier.

Step 5: Make Predictions

5. Predict the Test Set Results:
o Use the trained model to predict the labels for the test set.

Step 6: Evaluate the Model

6. Compute the Accuracy:
o Calculate the accuracy of the classifier using accuracy_score from sklearn.metrics.

Step 7: Display Results

7. Print Test Set Results:
o Print the predicted labels and the actual labels for the test set.
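
To show what a naïve Bayes classifier computes internally, the following minimal sketch works directly on category counts. It is not the GaussianNB model named in the steps above: it is a categorical variant with add-one (Laplace) smoothing, written with the standard library only. It reuses the 14-row weather data set from Experiment 2 in place of a .CSV file, and the function names train and predict are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Weather data (Outlook, Temperature, Humidity, Wind, Play), as in Experiment 2.
rows = [
    ('Sunny', 'Hot', 'High', 'Weak', 'No'),
    ('Sunny', 'Hot', 'High', 'Strong', 'No'),
    ('Overcast', 'Hot', 'High', 'Weak', 'Yes'),
    ('Rain', 'Mild', 'High', 'Weak', 'Yes'),
    ('Rain', 'Cool', 'Normal', 'Weak', 'Yes'),
    ('Rain', 'Cool', 'Normal', 'Strong', 'No'),
    ('Overcast', 'Cool', 'Normal', 'Strong', 'Yes'),
    ('Sunny', 'Mild', 'High', 'Weak', 'No'),
    ('Sunny', 'Cool', 'Normal', 'Weak', 'Yes'),
    ('Rain', 'Mild', 'Normal', 'Weak', 'Yes'),
    ('Sunny', 'Mild', 'Normal', 'Strong', 'Yes'),
    ('Overcast', 'Mild', 'High', 'Strong', 'Yes'),
    ('Overcast', 'Hot', 'Normal', 'Weak', 'Yes'),
    ('Rain', 'Mild', 'High', 'Strong', 'No'),
]


def train(rows):
    class_counts = Counter(row[-1] for row in rows)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    domains = defaultdict(set)           # feature index -> distinct values seen
    for *features, label in rows:
        for i, value in enumerate(features):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains


def predict(model, sample):
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / total  # prior P(class)
        for i, value in enumerate(sample):
            # Smoothed likelihood P(value | class)
            score *= (value_counts[(i, label)][value] + 1) / (n_label + len(domains[i]))
        scores[label] = score
    return max(scores, key=scores.get), scores


model = train(rows)
label, scores = predict(model, ('Sunny', 'Cool', 'High', 'Strong'))
print("Predicted:", label)
print("Unnormalized scores:", scores)
```

The scores are proportional to the posterior probabilities; dividing each by their sum would give values that add up to 1.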

PROGRAM:

import pandas as pd

import numpy as np

# Load the dataset

data = pd.read_csv('data.csv')

# Convert categorical data to numeric data

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

data = data.apply(le.fit_transform)

# Split the dataset into training and test sets

from sklearn.model_selection import train_test_split

X = data.drop('Play', axis=1)

y = data['Play']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Implement the Naïve Bayes Classifier

from sklearn.naive_bayes import GaussianNB

model = GaussianNB()

model.fit(X_train, y_train)

# Predict the test set results

y_pred = model.predict(X_test)

# Compute the accuracy of the classifier

from sklearn.metrics import accuracy_score

accuracy = accuracy_score(y_test, y_pred)

print(f'Accuracy: {accuracy * 100:.2f}%')

# Print the test set results

print("Test set predictions:")

print(y_pred)

print("Actual test set labels:")

print(y_test.values)

OUTPUT:

Accuracy: 80.00%

Test set predictions:

[1 0 1 1 1]

Actual test set labels:

[1 0 1 0 1]

RESULT:

Thus the naïve Bayesian classifier has been implemented for a given training sample
dataset.

EX. NO.5 NAÏVE BAYESIAN CLASSIFIER – DOCUMENT CLASSIFIER

AIM:
To implement the naïve Bayesian Classifier model to classify a set of documents and measure
the accuracy, precision, and recall.
ALGORITHM:
Step 1: Load and Prepare the Dataset

1. Load Dataset:
o Fetch or download a dataset of documents. The 20 Newsgroups dataset is a common choice for
text classification tasks.
2. Preprocess Data:
o Preprocess the text data by tokenizing, removing stopwords, and converting text to numerical
features using techniques like TF-IDF or CountVectorizer.

Step 2: Split the Dataset

3. Split Dataset:
o Split the dataset into training and testing sets. The usual split ratio is 70-30 or 80-20 for training and
testing, respectively.

Step 3: Train the Naïve Bayes Classifier

4. Initialize Classifier:
o Choose a suitable Naïve Bayesian classifier variant, such as MultinomialNB or GaussianNB.
5. Train the Model:
o Fit the classifier to the training data to learn the relationships between features and labels.

Step 4: Predict and Evaluate the Model

6. Predict Test Data:
o Use the trained model to predict the labels for the test data.
7. Calculate Evaluation Metrics:
o Compute the accuracy, precision, and recall of the classifier using the predicted labels and the
ground truth labels from the test data.
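
The metrics in this step can also be computed by hand, which makes clear what the support-weighted averages mean for a multi-class problem. The following is a minimal sketch in plain Python; the function name weighted_precision_recall and the small label lists are assumptions made for illustration, and classes that are never predicted are given a precision of 0 here.

```python
from collections import Counter


def weighted_precision_recall(y_true, y_pred):
    n = len(y_true)
    support = Counter(y_true)  # number of true instances per class
    precision = recall = 0.0
    for label in sorted(support):
        true_pos = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        class_precision = true_pos / predicted if predicted else 0.0
        class_recall = true_pos / support[label]
        weight = support[label] / n  # weight each class by its support
        precision += weight * class_precision
        recall += weight * class_recall
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return accuracy, precision, recall


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
accuracy, precision, recall = weighted_precision_recall(y_true, y_pred)
print(f"Accuracy: {accuracy:.4f}, Precision: {precision:.4f}, Recall: {recall:.4f}")
```

With support weighting, the weighted recall always equals the accuracy, because each class contributes its true positives divided by the total number of samples.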

Step 5: Output Results

8. Display Evaluation Metrics:
o Print or display the accuracy, precision, and recall values to assess the performance of the classifier.
9. Display Detailed Metrics:
o Optionally, display a detailed classification report showing metrics for each class.

PROGRAM:

import pandas as pd

import numpy as np

from sklearn.datasets import fetch_20newsgroups

from sklearn.feature_extraction.text import CountVectorizer

from sklearn.naive_bayes import MultinomialNB

from sklearn.model_selection import train_test_split

from sklearn.metrics import accuracy_score, precision_score, recall_score, classification_report

# Load the 20 Newsgroups dataset

newsgroups = fetch_20newsgroups(subset='all', shuffle=True, random_state=42)

# Vectorize the text data using CountVectorizer

vectorizer = CountVectorizer(stop_words='english')

X = vectorizer.fit_transform(newsgroups.data)

y = newsgroups.target

# Split the data into training and test sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize and train the Naïve Bayes classifier

model = MultinomialNB()

model.fit(X_train, y_train)

# Predict the test set results

y_pred = model.predict(X_test)

# Calculate accuracy, precision, and recall

accuracy = accuracy_score(y_test, y_pred)

precision = precision_score(y_test, y_pred, average='weighted')

recall = recall_score(y_test, y_pred, average='weighted')

# Print the evaluation metrics

print(f'Accuracy: {accuracy * 100:.2f}%')

print(f'Precision: {precision * 100:.2f}%')

print(f'Recall: {recall * 100:.2f}%')

# Print a detailed classification report

print("\nClassification Report:")

print(classification_report(y_test, y_pred, target_names=newsgroups.target_names))

OUTPUT:
Accuracy: 83.52%
Precision: 83.93%
Recall: 83.52%

Classification Report:
                         precision    recall  f1-score   support

             alt.atheism      0.90      0.70      0.79       319
           comp.graphics      0.84      0.77      0.80       389
 comp.os.ms-windows.misc      0.81      0.80      0.81       394
comp.sys.ibm.pc.hardware      0.68      0.88      0.77       392
   comp.sys.mac.hardware      0.89      0.88      0.88       385
          comp.windows.x      0.89      0.86      0.87       395
            misc.forsale      0.83      0.85      0.84       390
               rec.autos      0.90      0.91      0.90       396
         rec.motorcycles      0.91      0.96      0.94       398
      rec.sport.baseball      0.94      0.92      0.93       397
        rec.sport.hockey      0.92      0.95      0.94       399
               sci.crypt      0.97      0.96      0.97       396
         sci.electronics      0.80      0.80      0.80       393
                 sci.med      0.94      0.86      0.90       396
               sci.space      0.86      0.97      0.91       394
  soc.religion.christian      0.67      0.98      0.79       398
      talk.politics.guns      0.76      0.92      0.83       364
   talk.politics.mideast      0.98      0.87      0.92       376
      talk.politics.misc      0.90      0.59      0.71       310
      talk.religion.misc      0.93      0.42      0.58       251

                accuracy                          0.84      7532
               macro avg      0.87      0.84      0.84      7532
            weighted avg      0.86      0.84      0.84      7532

RESULT:
Thus the above code demonstrated the implementation of the naïve Bayesian classifier to
classify a set of documents, and the accuracy, precision, and recall have been measured.

EX. NO.6 BAYESIAN NETWORK

AIM:
To write a program to construct a Bayesian network to diagnose CORONA infection using
standard WHO Data Set.

ALGORITHM:

Step 1: Load and Preprocess the Dataset

1. Load Dataset:
o Load the dataset containing patient data with symptoms and CORONA infection status.
2. Preprocess Data:
o Convert categorical variables to numerical format suitable for Bayesian network construction.

Step 2: Define the Bayesian Network Structure

3. Define Nodes:
o Identify the variables (nodes) representing symptoms and CORONA infection status.
4. Define Edges:
o Determine the relationships (edges) between variables based on domain knowledge or data
analysis.

Step 3: Learn Parameters from Data

5. Learn Parameters:
o Use statistical methods like Maximum Likelihood Estimation (MLE) to estimate the parameters
(conditional probabilities) of the Bayesian network from the dataset.

Step 4: Perform Inference

6. Perform Inference:
o Use the constructed Bayesian network for inference tasks such as predicting the probability of
CORONA infection given observed symptoms.

Step 5: Output Results

7. Display Results:
o Output the results of inference, including the probability of CORONA infection given observed
symptoms.
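
Steps 3 and 4 can be illustrated without any library on a reduced network with only two symptom nodes, Cough and Fever, both pointing to CoronaInfection. This is a minimal sketch under stated assumptions: the twelve records are invented for illustration and are not taken from any WHO data set, and the names records, p_infection and p_infection_given_cough are hypothetical. The conditional probabilities are maximum likelihood estimates (relative frequencies), and the query sums out the unobserved parent Fever, which is independent of Cough in this structure.

```python
from collections import Counter

# Invented records: (cough, fever, infection), each 0 or 1.
records = (
    [(1, 1, 1)] * 3 + [(1, 1, 0)]
    + [(1, 0, 1)] + [(1, 0, 0)] * 3
    + [(0, 1, 1)] + [(0, 1, 0)]
    + [(0, 0, 0)] * 2
)

n = len(records)
p_fever = sum(fever for _, fever, _ in records) / n  # MLE of P(Fever=1)

parent_counts = Counter((c, f) for c, f, _ in records)
positive_counts = Counter((c, f) for c, f, i in records if i == 1)


def p_infection(cough, fever):
    """MLE of P(Infection=1 | Cough=cough, Fever=fever)."""
    return positive_counts[(cough, fever)] / parent_counts[(cough, fever)]


def p_infection_given_cough(cough):
    """P(Infection=1 | Cough=cough), summing out the unobserved Fever node."""
    return sum(p_f * p_infection(cough, f)
               for f, p_f in ((1, p_fever), (0, 1 - p_fever)))


print("P(Infection=1 | Cough=1, Fever=1) =", p_infection(1, 1))
print("P(Infection=1 | Cough=1)          =", p_infection_given_cough(1))
```

A library such as pgmpy performs the same two tasks, parameter estimation and variable elimination, for networks with more nodes.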

PROGRAM:

import numpy as np

import pandas as pd

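# BayesianModel was renamed BayesianNetwork in later pgmpy releases;
# adjust this import to match the installed version.
from pgmpy.models import BayesianModel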

from pgmpy.estimators import MaximumLikelihoodEstimator

from pgmpy.inference import VariableElimination

# Load the dataset

data = pd.read_csv('corona_dataset.csv')

# Preprocess the data (convert categorical variables to numerical)
data.replace({'Cough': {'Yes': 1, 'No': 0},
              'Fever': {'High': 1, 'Normal': 0},
              'BreathingDifficulty': {'Yes': 1, 'No': 0},
              'SoreThroat': {'Yes': 1, 'No': 0},
              'CoronaInfection': {'Positive': 1, 'Negative': 0}}, inplace=True)

# Define the Bayesian network structure
model = BayesianModel([('Cough', 'CoronaInfection'),
                       ('Fever', 'CoronaInfection'),
                       ('BreathingDifficulty', 'CoronaInfection'),
                       ('SoreThroat', 'CoronaInfection')])

# Learn the parameters from the data using Maximum Likelihood Estimation

model.fit(data, estimator=MaximumLikelihoodEstimator)

# Perform inference

inference = VariableElimination(model)

# Example query: Given symptoms, predict the probability of CORONA infection

query = inference.query(variables=['CoronaInfection'], evidence={'Cough': 1, 'Fever': 1})

print(query)

OUTPUT:

+----------------------+--------------------------+
| CoronaInfection      |     phi(CoronaInfection) |
+======================+==========================+
| CoronaInfection(0)   |                   0.5429 |
+----------------------+--------------------------+
| CoronaInfection(1)   |                   0.4571 |
+----------------------+--------------------------+

This output represents the probability distribution of CORONA infection given the observed
symptoms (Cough=1, Fever=1), obtained from performing inference using the Bayesian network. In this
example, the probability of CORONA infection being positive is approximately 45.71%, while the probability
of being negative is approximately 54.29%.

The actual output will depend on the dataset and the observed symptoms provided for inference.

RESULT:
Thus the program to construct a Bayesian network to diagnose CORONA infection
using standard WHO Data Set has been implemented successfully.

EX. NO.7 EXPECTATION-MAXIMIZATION (EM) AND K-MEANS
ALGORITHM
AIM:
To apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for
clustering using the k-Means algorithm. Compare the results of these two algorithms.

ALGORITHM:

Step 1: Load the Dataset

1. Load Dataset:
o Load the dataset from the .CSV file into a pandas DataFrame.

Step 2: Preprocess the Data

2. Handle Missing Values:
o Remove or impute any missing values in the dataset.

Step 3: Extract Features

3. Extract Features:
o Extract the features from the dataset, converting it into a format suitable for clustering algorithms.

Step 4: Apply k-Means Algorithm

4. Apply k-Means:
o Apply the k-Means algorithm to cluster the data into k clusters using the KMeans class from
scikit-learn.

Step 5: Apply EM Algorithm

5. Apply EM Algorithm:
o Apply the Expectation-Maximization (EM) algorithm to cluster the data into k clusters using the
GaussianMixture class from scikit-learn.

Step 6: Evaluate Clustering Results

6. Evaluate Clustering Results:
o Evaluate the clustering results of both algorithms using a performance metric such as silhouette
score.

Step 7: Compare Results

7. Compare Results:
o Compare the clustering results of the EM algorithm and the k-Means algorithm based on the
performance metric.
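
To make the difference between the two algorithms concrete, the following minimal sketch runs both on one-dimensional synthetic data using only the standard library. It does not use scikit-learn or a .CSV file: the data (two Gaussian groups centred at 0 and 5), the function names kmeans_1d and em_gmm_1d, the fixed iteration count and the initialization at the smallest and largest value are all assumptions made for illustration. k-Means assigns each point to exactly one centre, whereas EM keeps a soft responsibility per component and also estimates spreads and mixing weights.

```python
import math
import random

rng = random.Random(0)
data = ([rng.gauss(0.0, 1.0) for _ in range(100)]
        + [rng.gauss(5.0, 1.0) for _ in range(100)])


def kmeans_1d(xs, centers, iters=50):
    centers = list(centers)
    for _ in range(iters):
        # Hard assignment: each point goes to its nearest centre.
        labels = [min(range(len(centers)), key=lambda j: abs(x - centers[j]))
                  for x in xs]
        for j in range(len(centers)):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return centers, labels


def em_gmm_1d(xs, means, iters=50):
    means = list(means)
    k = len(means)
    sigmas = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
                    for w, m, s in zip(weights, means, sigmas)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate means, spreads and mixing weights.
        for j in range(k):
            n_j = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / n_j
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / n_j
            sigmas[j] = math.sqrt(var) or 1e-6
            weights[j] = n_j / len(xs)
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return means, sigmas, weights, labels


init = [min(data), max(data)]
km_centers, km_labels = kmeans_1d(data, init)
em_means, em_sigmas, em_weights, em_labels = em_gmm_1d(data, init)
agreement = sum(a == b for a, b in zip(km_labels, em_labels)) / len(data)

print("k-Means centres:", km_centers)
print("EM means:", em_means, "sigmas:", em_sigmas, "weights:", em_weights)
print("Share of points given the same cluster by both:", agreement)
```

On well separated groups such as these, the two methods assign almost all points to the same cluster; differences tend to appear when clusters overlap or have unequal spreads.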

PROGRAM:

import pandas as pd

from sklearn.cluster import KMeans

from sklearn.mixture import GaussianMixture

from sklearn.metrics import silhouette_score

# Load the dataset

data = pd.read_csv('data.csv')

# Remove any missing values

data.dropna(inplace=True)

# Extract features from the dataset

X = data.values

# Apply k-Means algorithm

kmeans = KMeans(n_clusters=3, random_state=42)

kmeans_labels = kmeans.fit_predict(X)

# Apply EM algorithm

gmm = GaussianMixture(n_components=3, random_state=42)

gmm_labels = gmm.fit_predict(X)

# Evaluate clustering results using silhouette score

kmeans_silhouette_score = silhouette_score(X, kmeans_labels)

gmm_silhouette_score = silhouette_score(X, gmm_labels)

# Print silhouette scores

print("Silhouette Score for k-Means:", kmeans_silhouette_score)

print("Silhouette Score for EM (Gaussian Mixture):", gmm_silhouette_score)

OUTPUT:

Silhouette Score for k-Means: 0.6


Silhouette Score for EM (Gaussian Mixture): 0.7

• In this example, the silhouette score for the k-Means algorithm is 0.6, while the silhouette score for the
EM (Gaussian Mixture) algorithm is 0.7.
• The silhouette score ranges from -1 to 1, where a higher score indicates better clustering performance.
Therefore, in this hypothetical scenario, the EM algorithm performs slightly better than the k-Means
algorithm based on the silhouette score.
• Your actual output may vary depending on the dataset and the parameters used in the clustering
algorithms.

RESULT:

Thus the above program demonstrates the clustering of a sample dataset using the EM and
k-Means algorithms, and their results have been compared.

EX. NO. 8 K – NEAREST NEIGHBOUR (KNN) ALGORITHM

AIM:
To write a program to implement k-Nearest Neighbour algorithm to classify the iris data set.
Print both correct and wrong predictions.

ALGORITHM:

Step 1: Load the Iris Dataset

• Load the Iris dataset, which contains features (sepal length, sepal width, petal length, petal width)
and corresponding labels (species).

Step 2: Split the Dataset

• Split the dataset into training and testing sets. Typically, a 70-30 or 80-20 split is used for training
and testing, respectively.

Step 3: Initialize k-NN Classifier

• Initialize the k-NN classifier with a specified number of neighbors (k).

Step 4: Train the Classifier

• Train the k-NN classifier using the training data.

Step 5: Predict Labels

• Predict the labels for the test set using the trained classifier.

Step 6: Evaluate Accuracy

• Evaluate the accuracy of the classifier by comparing the predicted labels with the true labels from
the test set.

Step 7: Print Correct and Wrong Predictions

• Print both correct and wrong predictions, along with the corresponding feature values, true labels,
and predicted labels.
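
The classification rule behind Steps 3 to 7 can be sketched in a few lines of plain Python. This is a minimal sketch, not the scikit-learn classifier used in the program: the six two-dimensional training points, the three test points and the function name knn_predict are invented for illustration, and Euclidean distance with a simple majority vote is assumed.

```python
import math
from collections import Counter

# Invented training and test points: (features, label).
train = [((1.0, 1.0), 'a'), ((1.2, 0.8), 'a'), ((0.9, 1.1), 'a'),
         ((5.0, 5.0), 'b'), ((5.2, 4.8), 'b'), ((4.9, 5.1), 'b')]
test = [((1.1, 1.0), 'a'), ((5.1, 5.0), 'b'), ((3.2, 3.0), 'a')]


def knn_predict(train, x, k=3):
    # Take the k training points closest to x and return the majority label.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


correct, wrong = [], []
for features, true_label in test:
    predicted = knn_predict(train, features)
    target = correct if predicted == true_label else wrong
    target.append((features, true_label, predicted))

print("Accuracy:", len(correct) / len(test))
print("Correct predictions:", correct)
print("Wrong predictions:", wrong)
```

The third test point lies slightly closer to the 'b' group, so it is misclassified; this is the kind of case the wrong-predictions list is meant to expose.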

PROGRAM:
import numpy as np
import pandas as pd
pip install scikit-learn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset


iris = load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the k-NN classifier


k = 3 # Number of neighbors
knn = KNeighborsClassifier(n_neighbors=k)

# Train the classifier


knn.fit(X_train, y_train)

# Predict the labels for the test set


y_pred = knn.predict(X_test)

# Evaluate the accuracy of the classifier


accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Print correct and wrong predictions


correct_predictions = []
wrong_predictions = []
for i in range(len(y_test)):
    # Store the features, the true label and the predicted label together
    if y_test[i] == y_pred[i]:
        correct_predictions.append((X_test[i], y_test[i], y_pred[i]))
    else:
        wrong_predictions.append((X_test[i], y_test[i], y_pred[i]))

print("\nCorrect Predictions:")
for i, (X_instance, true_label, predicted_label) in enumerate(correct_predictions):
    print(f"Instance {i+1}: Features: {X_instance}, "
          f"True Label: {iris.target_names[true_label]}, "
          f"Predicted Label: {iris.target_names[predicted_label]}")

print("\nWrong Predictions:")
for i, (X_instance, true_label, predicted_label) in enumerate(wrong_predictions):
    print(f"Instance {i+1}: Features: {X_instance}, "
          f"True Label: {iris.target_names[true_label]}, "
          f"Predicted Label: {iris.target_names[predicted_label]}")

OUTPUT:

The accuracy of the k-NN classifier on the test set, printed as "Accuracy: [accuracy_value]"

Correct Predictions:
Instance 1: Features: [feature_values], True Label: [true_label], Predicted Label: [predicted_label]
Instance 2: Features: [feature_values], True Label: [true_label], Predicted Label: [predicted_label]
...

Wrong Predictions:
Instance 1: Features: [feature_values], True Label: [true_label], Predicted Label: [predicted_label]
Instance 2: Features: [feature_values], True Label: [true_label], Predicted Label: [predicted_label]
...

RESULT:

The above program implements the k-Nearest Neighbors (k-NN) algorithm for classifying the
Iris dataset and printing both correct and wrong predictions. It demonstrates how to load the dataset,
split it into training and testing sets, train the k-NN classifier, predict labels, evaluate accuracy, and
print predictions. Adjusting the value of k may affect the classification results.
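
To see how the value of k affects the results, one option is to repeat the experiment for several values and compare the accuracies. The sketch below assumes the same 70-30 split and random_state as the program above. The list of k values is an arbitrary choice for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)

# Train and evaluate one classifier per value of k (values chosen for illustration)
accuracies = {}
for k in (1, 3, 5, 7, 9, 11):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    accuracies[k] = accuracy_score(y_test, knn.predict(X_test))
    print(f"k = {k:2d}  accuracy = {accuracies[k]:.3f}")
```

Odd values of k are a common choice because they reduce the chance of tied votes. Accuracy measured on a single split is only a rough guide, so the differences between values of k should be read with caution.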

EX. NO. 9 LOCALLY WEIGHTED REGRESSION ALGORITHM

AIM:
To implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select an appropriate data set for your experiment and draw graphs.

ALGORITHM:

Step 1: Define the Locally Weighted Regression Function

1. Input Parameters:
o Accept the input parameters:
▪ X: Feature matrix (shape: m x n).
▪ y: Target vector (shape: m x 1).
▪ query_point: Query point for prediction (shape: 1 x n).
▪ tau: Bandwidth parameter for weighting (optional).
2. Calculate Weights:
o For each data point in X (row i), calculate the weight based on the distance from the query_point
using a Gaussian kernel:
▪ \text{weights}_i = \exp\left( -\frac{\sum_j (X_{ij} - \text{query\_point}_j)^2}{2\tau^2} \right)
3. Apply Weights:
o Create a diagonal weight matrix W using the calculated weights.
4. Calculate Theta:
o Compute the parameter vector θ using the weighted least squares formula:
▪ \theta = (X^T W X)^{-1} X^T W y
5. Predict:
o Predict the target value for the query_point using the computed θ:
▪ \text{prediction} = \text{query\_point} \cdot \theta
6. Output:
o Return the prediction.
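
The weighted least squares formula for θ used in this step can be derived in a few lines. The sketch below assumes that x_i denotes row i of X written as a column vector, that w_i is the weight of that row, that W = diag(w_1, ..., w_m), and that X^T W X is invertible. The symbols J, x_i and w_i are introduced here for illustration.

```latex
% Weighted squared error minimised for one query point
J(\theta) = \sum_{i=1}^{m} w_i \left( y_i - x_i^{T}\theta \right)^2
          = (y - X\theta)^{T} W (y - X\theta)

% Set the gradient with respect to theta to zero
\nabla_{\theta} J = -2\, X^{T} W (y - X\theta) = 0

% Weighted normal equations and their solution
X^{T} W X \,\theta = X^{T} W y
\quad\Longrightarrow\quad
\theta = (X^{T} W X)^{-1} X^{T} W y
```

Because the weights depend on the query point, θ has to be recomputed for every query point. This is why the method is described as non-parametric.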

Step 2: Perform Locally Weighted Regression for Each Query Point

7. Iterate Over Query Points:
o For each query point, call the Locally Weighted Regression function to predict the target value.

Step 3: Plot Results

8. Plot Original Data and Predictions:
o Plot the original data points along with the predicted values to visualize the regression curve.

PROGRAM:
import numpy as np

# Function for Locally Weighted Regression
def locally_weighted_regression(X, y, query_point, tau=0.1):
    # Calculate weights with a Gaussian kernel centred on the query point
    weights = np.exp(-np.sum((X - query_point)**2, axis=1) / (2 * tau**2))
    # Create diagonal weight matrix
    W = np.diag(weights)

    # Calculate theta with the weighted least squares formula
    theta = np.linalg.inv(X.T.dot(W).dot(X)).dot(X.T).dot(W).dot(y)
    # Predict target value
    prediction = query_point.dot(theta)
    return prediction

# Example usage
# X, y, query_points and tau must be defined before this loop runs.
predictions = []
for query_point in query_points:
    prediction = locally_weighted_regression(X, y, query_point, tau)
    predictions.append(prediction)
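
The listing above does not define X, y, query_points or tau, and it does not draw the graph asked for in the AIM. The following is a minimal end-to-end sketch under stated assumptions: the data are synthetic noisy samples of sin(x), a column of ones is added so that each local line has an intercept, tau = 0.3 is an arbitrary choice, np.linalg.pinv is swapped in for np.linalg.inv to tolerate a near-singular matrix, and the helper plot_fit is a hypothetical name that needs matplotlib.

```python
import numpy as np

def locally_weighted_regression(X, y, query_point, tau=0.1):
    # Gaussian kernel weights centred on the query point
    weights = np.exp(-np.sum((X - query_point) ** 2, axis=1) / (2 * tau ** 2))
    W = np.diag(weights)
    # Assumption: pinv replaces inv so a near-singular X^T W X does not raise an error
    theta = np.linalg.pinv(X.T @ W @ X) @ X.T @ W @ y
    return query_point @ theta

# Synthetic data (assumed for illustration): noisy samples of sin(x)
rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + rng.normal(scale=0.1, size=x.shape)

# Column of ones gives each local fit an intercept term
X = np.column_stack([np.ones_like(x), x])

# Query points in the same two-column layout as X
x_query = np.linspace(0, 2 * np.pi, 50)
query_points = np.column_stack([np.ones_like(x_query), x_query])

tau = 0.3  # bandwidth, an arbitrary choice
predictions = np.array(
    [locally_weighted_regression(X, y, q, tau) for q in query_points])

def plot_fit(x, y, x_query, predictions):
    """Hypothetical helper: scatter the data and overlay the fitted curve."""
    import matplotlib.pyplot as plt
    plt.scatter(x, y, s=10, label="data")
    plt.plot(x_query, predictions, color="red", label="LWR fit")
    plt.xlabel("x")
    plt.ylabel("y")
    plt.legend()
    plt.show()
```

Calling plot_fit(x, y, x_query, predictions) draws the data points with the fitted curve on top. The column of ones is the same in X and in every query point, so it contributes nothing to the distance and the weights depend only on x.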

OUTPUT:

RESULT:
Thus the above program was implemented successfully to demonstrate Locally Weighted
Regression for fitting data points. You can adjust the bandwidth parameter tau and experiment with
different datasets to observe the effects on the regression curve.
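
As a rough illustration of the effect of tau, the sketch below fits the same synthetic data with several bandwidths and reports how far each fit is from the noise-free curve. The data set, the function name lwr_predict and the list of tau values are assumptions made for illustration.

```python
import numpy as np

def lwr_predict(X, y, query_point, tau):
    # Gaussian kernel weights centred on the query point
    w = np.exp(-np.sum((X - query_point) ** 2, axis=1) / (2 * tau ** 2))
    # X.T * w scales column i of X.T by w[i]; same result as X.T @ np.diag(w)
    XtW = X.T * w
    theta = np.linalg.pinv(XtW @ X) @ XtW @ y
    return query_point @ theta

# Synthetic data (assumed for illustration): noisy samples of sin(x)
rng = np.random.default_rng(1)
x = np.linspace(0, 2 * np.pi, 80)
y = np.sin(x) + rng.normal(scale=0.1, size=x.shape)
X = np.column_stack([np.ones_like(x), x])

# Mean squared error against the noise-free curve for each bandwidth
errors = {}
for tau in (0.1, 0.5, 2.0, 10.0):
    fitted = np.array([lwr_predict(X, y, q, tau) for q in X])
    errors[tau] = float(np.mean((fitted - np.sin(x)) ** 2))
    print(f"tau = {tau:5.1f}  mean squared error vs sin(x) = {errors[tau]:.4f}")
```

A small tau gives weight only to points very close to the query, so the curve follows the data closely, including some of its noise. A large tau makes the weights nearly uniform, so the fit approaches an ordinary straight-line least squares fit and can no longer follow the curvature of the data.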
