
Machine learning record

Uploaded by

mmonica0703

MISRIMAL NAVAJEE MUNOTH JAIN ENGINEERING

COLLEGE
OWNED AND MANAGED BY TAMILNADU EDUCATIONAL AND MEDICAL FOUNDATION A
JAIN MINORITY INSTITUTION
APPROVED BY AICTE &PROGRAMMES ACCREDITED BY NBA, NEW DELHI, (UG PROGRAMMES – MECH, AI&DS, ECE, CSE,IT) ALL
PROGRAMMES RECOGNIZED BY THE GOVERNMENT OF TAMIL NADU AND AFFILIATED TO ANNA UNIVERSITY, CHENNAI GURU
MARUDHARKESARI BUILDING, JYOTHI NAGAR, RAJIV GANDHI SALAI, OMR THORAIPAKKAM, CHENNAI - 600 097.

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND


DATA SCIENCE

AD3461 MACHINE LEARNING LABORATORY

REGULATION-2021

NAME :

REGISTER NUMBER :

YEAR : II

SEMESTER : IV

VISION
To produce high quality, creative and ethical engineers, and technologists
contributing effectively to the ever-advancing Artificial Intelligence and Data
Science field.

MISSION
To educate future software engineers with strong fundamentals by
continuously improving the teaching-learning methodologies using
contemporary aids.
To produce ethical engineers/researchers by instilling the values of
humility, humaneness, honesty and courage to serve the society.
To create a knowledge hub of Artificial Intelligence and Data Science
with everlasting urge to learn by developing, maintaining and continuously
improving the resources/Data Science.

Register No:

BONAFIDE CERTIFICATE

This is to certify that this is a bonafide record of the work done


by Mr./Ms. of IInd YEAR / IV SEM

B.Tech ARTIFICIAL INTELLIGENCE AND DATA SCIENCE in


AD3461- MACHINE LEARNING LABORATORY during the Academic year 2023 – 2024.

Faculty-in-charge Head of the Department

Submitted for the University Practical Examination held on :

Internal Examiner External Examiner

DATE:

AD3461 MACHINE LEARNING LABORATORY

COURSE OUTCOMES

CO 1 Understand the implementation procedures for the machine learning algorithms.

CO 2 Design Java/Python programs for various Learning algorithms.

CO 3 Apply appropriate Machine Learning algorithms to data sets.

CO 4 Identify and apply Machine Learning algorithms to solve real world problems.
AD3461 MACHINE LEARNING LABORATORY
CONTENT

S.NO.   TOPIC   PAGE NO   DATE   SIGNATURE

1. For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.
2. Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.
3. Write a program to implement the naïve Bayesian classifier for a sample training dataset stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.
4. Implement naïve Bayesian Classifier model to classify a set of documents and measure the accuracy, precision, and recall.
5. Write a program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using standard Heart Disease Data Set.
6. Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms.
7. Write a program to implement k-Nearest Neighbour algorithm to classify the iris dataset. Print both correct and wrong predictions.
8. Write a program to implement Decision Tree classification model.
9. Implement Logistic regression Algorithm with a dataset and measure the accuracy score and confusion matrix.
10. Implement Linear regression Algorithm with a dataset and measure the accuracy score.

SYLLABUS
AD3461 MACHINE LEARNING LABORATORY
COURSE OBJECTIVES
• To gain practical knowledge of implementing machine learning algorithms to solve real-time problems
• To implement supervised learning and their applications
• To understand unsupervised learning like clustering and EM algorithms
• To understand the theoretical and practical aspects of probabilistic graphical models.

Tools: Python, Numpy, Scipy, Matplotlib, Pandas, statsmodels, seaborn, plotly, bokeh

Suggested Exercises:

1. For a given set of training data examples stored in a .CSV file, implement and demonstrate the
Candidate-Elimination algorithm to output a description of the set of all hypotheses
consistent with the training examples.
2. Build an Artificial Neural Network by implementing the Backpropagation algorithm and test
the same using appropriate data sets.
3. Write a program to implement the naïve Bayesian classifier for a sample training data set
stored as a .CSV file and compute the accuracy with a few test data sets.
4. Implement naïve Bayesian Classifier model to classify a set of documents and measure
the accuracy, precision, and recall.
5. Write a program to construct a Bayesian network considering medical data. Use this model
to demonstrate the diagnosis of heart patients using standard Heart Disease Data Set.
6. Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set
for clustering using the k-Means algorithm. Compare the results of these two
algorithms.
7. Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set.
Print both correct and wrong predictions.
8. Write a program to implement Decision Tree classification model.
9. Implement Logistic regression Algorithm with a dataset and measure the accuracy score
and confusion matrix.
10. Implement Linear regression Algorithm with a dataset and measure the accuracy score.

Ex.No: 1
Date :
For a given set of training data examples stored in a .CSV file, implement and demonstrate
the Candidate-Elimination algorithm to output a description of the set of all hypotheses
consistent with the training examples

AIM:
To implement and demonstrate the Candidate-Elimination algorithm to output a
description of the set of all hypotheses consistent with the training examples.

DATASET: trainingdata1.xlsx

Link : https://docs.google.com/spreadsheets/d/1X-SG3qz2zCkWvGXMQ2GlsP76zvf7hMD4/edit?usp=share_link&ouid=107168863405783275058&rtpof=true&sd=true

ALGORITHM:

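Step 1: Load the data set.

Step 2: Import the dataset into the program as attribute values (concepts) and class labels (target).

Step 3: For each training example, apply Step 4 or Step 5 according to its label.

Step 4: If the example is a positive example, then for each attribute:

    if attribute_value == hypothesis_value:
        do nothing
    else:
        replace the attribute value with '?' (basically generalizing it)

Step 5: If the example is a negative example, make the general hypothesis more specific.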
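
The steps above can be sketched in plain Python as follows. This is a minimal sketch and not the record's own program: it assumes the class labels are the strings "yes" and "no", it uses four EnjoySport-style examples typed in by hand instead of trainingdata1.xlsx, and it follows the simplified bookkeeping in which the general boundary keeps one candidate row per attribute. The names `learn` and `examples` are illustrative.

```python
def learn(concepts, target):
    """Simplified Candidate-Elimination; returns (specific_h, general_h)."""
    n = len(concepts[0])
    specific_h = list(concepts[0])             # assumes the first example is positive
    general_h = [["?"] * n for _ in range(n)]  # one candidate row per attribute

    for h, label in zip(concepts, target):
        if label == "yes":
            # Positive example: generalize every attribute that disagrees.
            for x in range(n):
                if h[x] != specific_h[x]:
                    specific_h[x] = "?"
                    general_h[x][x] = "?"
        elif label == "no":
            # Negative example: specialize the general rows on attributes
            # where the example differs from the specific hypothesis.
            for x in range(n):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = "?"

    # Drop rows that are still fully general.
    general_h = [g for g in general_h if g != ["?"] * n]
    return specific_h, general_h


# Hand-typed examples (assumed data); the last column is the label.
examples = [
    ["Sunny", "Warm", "Normal", "Strong", "Warm", "Same", "yes"],
    ["Sunny", "Warm", "High", "Strong", "Warm", "Same", "yes"],
    ["Rainy", "Cold", "High", "Strong", "Warm", "Change", "no"],
    ["Sunny", "Warm", "High", "Strong", "Cool", "Change", "yes"],
]
concepts = [row[:-1] for row in examples]
target = [row[-1] for row in examples]

s_final, g_final = learn(concepts, target)
print("Final specific_h:", s_final)
print("Final general_h:", g_final)
```

The label strings compared inside `learn` have to match the values stored in the target column exactly; otherwise neither branch runs and both hypotheses keep their initial values.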

PROGRAM/SOURCE CODE:
import numpy as np
import pandas as pd

# Load the training examples; the last column holds the class label.
data = pd.read_excel('trainingdata1.xlsx')
print(data)

  Origin Manufacturer  color  Decade     Type Example Type
0  Japan        Honda   blue    1980  economy     positive
1  Japan       Toyota  green    1970   sports     positive
2  Japan       Toyota   blue    1990  economy     negative
3    USA     Chrysler    red    1980  economy     positive
4  Japan        Honda  white    1980  economy     positive

# Separate the attribute columns (concepts) from the label column (target).
concepts = np.array(data.iloc[:, 0:-1])
target = np.array(data.iloc[:, -1])
print("concept:", concepts)
print("target:", target)

concept: [['Japan' 'Honda' 'blue' 1980 'economy']
 ['Japan' 'Toyota' 'green' 1970 'sports']
 ['Japan' 'Toyota' 'blue' 1990 'economy']
 ['USA' 'Chrysler' 'red' 1980 'economy']
 ['Japan' 'Honda' 'white' 1980 'economy']]
target: ['positive' 'positive' 'negative' 'positive' 'positive']

def learn(concepts, target):
    # Start from the first training example as the most specific hypothesis.
    specific_h = concepts[0].copy()
    print("Initialization of specific_h and general_h")
    print("specific_h: ", specific_h)
    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    print("general_h: ", general_h)
    print("concepts: ", concepts)
    for i, h in enumerate(concepts):
        # The tests below expect "yes"/"no" labels. The target column of
        # trainingdata1.xlsx holds 'positive'/'negative', so on this data
        # neither branch runs and both hypotheses keep their initial values.
        if target[i] == "yes":
            for x in range(len(specific_h)):
                print(specific_h[x])
                #print("h[x]",h[x])
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        if target[i] == "no":
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
    print("\nSteps of Candidate Elimination Algorithm: ", i+1)
    print("Specific_h: ", i+1)
    print(specific_h, "\n")
    print("general_h :", i+1)
    print(general_h)
    # Rows that are still fully general are meant to be dropped here. The
    # pattern has six '?' while this data set has five attributes, so no row matches.
    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
    print("\nIndices", indices)
    for i in indices:
        general_h.remove(['?', '?', '?', '?', '?', '?'])
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("\nFinal Specific_h:", s_final, sep="\n")

OUTPUT :

Initialization of specific_h and general_h


specific_h: ['Japan' 'Honda' 'blue' 1980 'economy']
general_h: [['?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?'],
['?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?'], ['?', '?',
'?', '?', '?']]
concepts: [['Japan' 'Honda' 'blue' 1980 'economy']
['Japan' 'Toyota' 'green' 1970 'sports']
['Japan' 'Toyota' 'blue' 1990 'economy']
['USA' 'Chrysler' 'red' 1980 'economy']
['Japan' 'Honda' 'white' 1980 'economy']]

Steps of Candidate Elimination Algorithm: 5


Specific_h: 5
['Japan' 'Honda' 'blue' 1980 'economy']

general_h : 5
[['?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?'], ['?', '?', '?',
'?', '?'], ['?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?']]

Indices []

Final Specific_h:
['Japan' 'Honda' 'blue' 1980 'economy']

RESULT:
Thus, the program to implement and demonstrate the Candidate-Elimination algorithm on the
given training examples using Python has been executed successfully.

Ex.No: 2
Date :
Build an Artificial Neural Network by implementing the Backpropagation algorithm and
test the same using appropriate data sets.

AIM:
Build an Artificial Neural Network by implementing the Backpropagation algorithm and
test the same using appropriate data sets.

ALGORITHM:

Step 1: Import `numpy`.


Step 2: Define and normalize the input and output data.
Step 3: Define the sigmoid function and its derivative.
Step 4: Initialize the hyperparameters and random weights and biases.
Step 5: Train the neural network using forward and backpropagation.
Step 6: Predict the output for new data using the trained weights and biases.
Step 7: Print the input, actual output, and predicted output.
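
The steps above can be sketched without NumPy as follows. This is a minimal sketch under stated assumptions: a 2-3-1 network, full-batch gradient descent, a fixed random seed for repeatability, and targets divided by 100 so that they fall inside the sigmoid's output range. The helper names `forward` and `mse` are illustrative.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable (an assumption)

X_raw = [[2.0, 9.0], [1.0, 5.0], [3.0, 6.0]]
y_raw = [92.0, 86.0, 89.0]

# Step 2: scale each input column by its maximum and the targets into (0, 1);
# a sigmoid output unit cannot produce values above 1.
col_max = [max(row[j] for row in X_raw) for j in range(2)]
X = [[row[j] / col_max[j] for j in range(2)] for row in X_raw]
y = [t / 100.0 for t in y_raw]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Step 4: hyperparameters and random weights and biases.
n_in, n_hid, lr, epochs = 2, 3, 0.1, 5000
wh = [[random.random() for _ in range(n_hid)] for _ in range(n_in)]
bh = [random.random() for _ in range(n_hid)]
wout = [random.random() for _ in range(n_hid)]
bout = random.random()

def forward(x):
    """Return (hidden activations, network output) for one input row."""
    hidden = [sigmoid(sum(x[i] * wh[i][j] for i in range(n_in)) + bh[j])
              for j in range(n_hid)]
    out = sigmoid(sum(hidden[j] * wout[j] for j in range(n_hid)) + bout)
    return hidden, out

def mse():
    return sum((t - forward(x)[1]) ** 2 for x, t in zip(X, y)) / len(X)

initial_loss = mse()

# Step 5: forward pass, then propagate the error backwards.
for _ in range(epochs):
    g_wh = [[0.0] * n_hid for _ in range(n_in)]
    g_bh = [0.0] * n_hid
    g_wout = [0.0] * n_hid
    g_bout = 0.0
    for x, t in zip(X, y):
        hidden, out = forward(x)
        d_out = (t - out) * out * (1.0 - out)          # output-layer delta
        for j in range(n_hid):
            d_hid = d_out * wout[j] * hidden[j] * (1.0 - hidden[j])
            g_wout[j] += hidden[j] * d_out
            g_bh[j] += d_hid
            for i in range(n_in):
                g_wh[i][j] += x[i] * d_hid
        g_bout += d_out
    # Apply the accumulated updates once per epoch.
    for j in range(n_hid):
        wout[j] += lr * g_wout[j]
        bh[j] += lr * g_bh[j]
        for i in range(n_in):
            wh[i][j] += lr * g_wh[i][j]
    bout += lr * g_bout

final_loss = mse()
predictions = [forward(x)[1] for x in X]
print("Scaled targets:", y)
print("Predictions:   ", predictions)
print("MSE before/after training:", initial_loss, final_loss)
```

Multiplying a prediction by 100 maps it back to the original target scale.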

PROGRAM/SOURCE CODE:
import numpy as np

X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X/np.amax(X, axis=0)  # scale each column of X by its maximum
# y = y/100
# (the target scaling sits inside the preceding comment in this listing, so y
# keeps its raw values 92, 86, 89, as the recorded OUTPUT shows)

# Sigmoid function
def sigmoid(x):
    return 1/(1 + np.exp(-x))

# Derivative of the sigmoid, written in terms of the sigmoid's output x
def derivatives_sigmoid(x):
    return x * (1 - x)

# Variable initialization
epoch = 7000                # number of training iterations
lr = 0.1                    # learning rate
inputlayer_neurons = 2      # number of features in the data set
hiddenlayer_neurons = 3     # number of neurons in the hidden layer
output_neurons = 1          # number of neurons at the output layer

# Weight and bias initialization:
# np.random.uniform draws uniform random numbers of the given shape
wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))

for i in range(epoch):
    # Forward propagation
    hinp1 = np.dot(X, wh)
    hinp = hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1 = np.dot(hlayer_act, wout)
    outinp = outinp1 + bout
    output = sigmoid(outinp)

    # Backpropagation
    EO = y - output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act)
    # how much the hidden layer weights contributed to the error
    d_hiddenlayer = EH * hiddengrad
    # dot product of the next layer's error and the current layer's output
    wout += hlayer_act.T.dot(d_output) * lr
    bout += np.sum(d_output, axis=0, keepdims=True) * lr
    wh += X.T.dot(d_hiddenlayer) * lr
    #bh += np.sum(d_hiddenlayer, axis=0, keepdims=True) * lr

print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n", output)
OUTPUT:
Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output:
[[92.]
[86.]
[89.]]
Predicted Output:
[[0.99999908]
[0.99999712]
[0.99999904]]

RESULT:
Thus, the program to Build an Artificial Neural Network by implementing the Backpropagation
algorithm using python has been executed successfully.

Ex.No: 3
Date :
Write a program to implement the naïve Bayesian classifier for a sample training dataset
stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets

AIM:
To implement the naïve Bayesian classifier for a sample training dataset stored as a .CSV
file and compute the accuracy with a few test data sets.

DATASET: pim_indian.csv

LINK: https://drive.google.com/file/d/18PcjOtDELvR8wY4-iCiAXm1wNuox67a7/view?usp=share_link

ALGORITHM:

Step 1: Import `pandas`, `train_test_split` from `sklearn.model_selection`, `GaussianNB` from
`sklearn.naive_bayes`, and `metrics` from `sklearn`.

Step 2: Load the dataset and split it into training and testing datasets using `train_test_split`.

Step 3: Train the Naive Bayes classifier on the training data using
`GaussianNB().fit(xtrain,ytrain.ravel())`.

Step 4: Predict the class labels for the testing data using `clf.predict(xtest)`.

Step 5: Print the accuracy of the classifier using `metrics.accuracy_score()`.

PROGRAM/SOURCE CODE:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn import metrics

# Load the data
df = pd.read_csv("pim_indian.csv")
feature_col_names = ['Pregnancies', 'Glucose', 'BloodPressure', 'BMI',
                     'DiabetesPedigreeFunction', 'Age']
predicted_class_name = ['Diabetes']
X = df[feature_col_names].values
y = df[predicted_class_name].values

# Hold out 33% of the rows as test data
xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.33)
print('\n the total number of Training Data: ', ytrain.shape)
print('\n the total number of Test Data: ', ytest.shape)

# Train the Gaussian naive Bayes classifier and predict the test labels
clf = GaussianNB().fit(xtrain, ytrain.ravel())
predicted = clf.predict(xtest)
# Predict a single hand-written record (values in the order of feature_col_names)
predicttestdata = clf.predict([[6, 148, 72, 35, 0, 50]])

print('\n confusion matrix')
print(metrics.confusion_matrix(ytest, predicted))
print('\n accuracy of the classifier is', metrics.accuracy_score(ytest, predicted))
print('\n the value of precision', metrics.precision_score(ytest, predicted))
print('\n the value of recall', metrics.recall_score(ytest, predicted))
print("predict value for individual test dataa:", predicttestdata)

OUTPUT:
 the total number of Training Data:  (514, 1)

 the total number of Test Data:  (254, 1)

 confusion matrix
[[144  28]
 [ 35  47]]

 accuracy of the classifier is 0.7519685039370079

 the value of precision 0.6266666666666667

 the value of recall 0.573170731707317
predict value for individual test dataa: [1]

RESULT:
Thus, the program to implement the naïve Bayesian classifier for a sample training dataset stored as a
.CSV file using Python has been executed successfully.
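
The accuracy, precision and recall reported in the output above follow directly from the confusion matrix. The short check below recomputes them, assuming scikit-learn's layout in which rows are the actual classes and columns the predicted classes, with class 1 as the positive class.

```python
# Confusion matrix copied from the recorded output:
# rows = actual class (0, 1), columns = predicted class (0, 1).
cm = [[144, 28],
      [35, 47]]

tn, fp = cm[0]
fn, tp = cm[1]

accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all predictions that are right
precision = tp / (tp + fp)                   # share of predicted positives that are right
recall = tp / (tp + fn)                      # share of actual positives that are found

print("accuracy :", accuracy)
print("precision:", precision)
print("recall   :", recall)
```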

Ex.No: 4
Date :
Implement naïve Bayesian Classifier model to classify a set of documents and measure
the accuracy, precision, and recall.

AIM:
Implement naïve Bayesian Classifier model to classify a set of documents and
measure the accuracy, precision, and recall

DATASET: naivetext.csv

LINK: https://drive.google.com/file/d/1sEpbtiB9qP6DdpqlvbB_6OL8aBsJqe8s/view?usp=share_link

ALGORITHM:

Step 1: Import required libraries.

Step 2: Load the dataset and convert labels to numerical values.

Step 3: Split the dataset into training and test data.

Step 4: Convert messages into document-term matrices using CountVectorizer.

Step 5: Train a Multinomial Naive Bayes classifier on the training data.

Step 6: Use the trained classifier to make predictions on the test data.

Step 7: Calculate and print the accuracy, confusion matrix, precision, and recall.
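
What Steps 4 to 6 do internally can be sketched in plain Python. This is a minimal sketch, not scikit-learn's implementation: it assumes whitespace tokenization (CountVectorizer's default tokenizer additionally drops one-letter words), Laplace (add-one) smoothing, and a small hand-picked subset of the messages with labels 1 (positive) and 0 (negative). The names `fit` and `predict` are illustrative.

```python
import math
from collections import Counter

# Hand-picked training messages (assumed subset); 1 = positive, 0 = negative.
train = [
    ("I love this sandwich", 1),
    ("This is an amazing place", 1),
    ("What an awesome view", 1),
    ("I love to dance", 1),
    ("I do not like this restaurant", 0),
    ("I am tired of this stuff", 0),
    ("He is my sworn enemy", 0),
    ("My boss is horrible", 0),
]

def tokenize(text):
    return text.lower().split()

def fit(samples):
    """Count words per class (the document-term counts) and documents per class."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in samples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab

def predict(text, model):
    """Pick the class with the largest log prior plus summed log likelihoods."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)
        for w in tokenize(text):
            if w in vocab:  # words never seen in training are ignored
                score += math.log((counts[w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = fit(train)
print(predict("I love this awesome place", model))
print(predict("My boss is my enemy", model))
```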

PROGRAM/SOURCE CODE:
import pandas as pd

msg = pd.read_csv('NaiveText.csv', names=['message', 'label'])
print('The dimensions of the dataset', msg.shape)
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.label
print(X)
print(y)

# Split the dataset into training and test data
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(X, y)
print('\n The total number of Training Data :', ytrain.shape)
print('\n The total number of Test Data :', ytest.shape)

# The output of CountVectorizer is a sparse document-term matrix
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm = count_vect.transform(xtest)
df = pd.DataFrame(xtrain_dtm.toarray())

# Training the Naive Bayes (NB) classifier on the training data
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB().fit(xtrain_dtm, ytrain)
predicted = clf.predict(xtest_dtm)

# Printing accuracy, confusion matrix, precision and recall
from sklearn import metrics
print("\n Accuracy of the classifer is")
# The score is computed here but not passed to print(), so nothing is shown for it
metrics.accuracy_score(ytest, predicted)
print('\n Confusion matrix')
print(metrics.confusion_matrix(ytest, predicted))
print('\nThe value of Precision:', metrics.precision_score(ytest, predicted, average='macro'))
print('\nThe value of Recall:', metrics.recall_score(ytest, predicted, average='macro'))

OUTPUT:
The dimensions of the dataset (19, 2)
0 message
1 I love this sandwich
2 This is an amazing place
3 I feel very good about these beers
4 This is my best work
5 What an awesome view
6 I do not like this restaurant
7 I am tired of this stuff
8 I can't deal with this
9 He is my sworn enemy
10 My boss is horrible
11 This is an awesome place
12 I do not like the taste of this juice
13 I love to dance
14 I am sick and tired of this place
15 What a great holiday
16 That is a bad locality to stay
17 We will have good fun tomorrow
18 I went to my enemy's house today
Name: message, dtype: object
0 label
1 1
2 1
3 1
4 1
5 1
6 0
7 0
8 0
9 0
10 0
11 1
12 0
13 1
14 0
15 1
16 0
17 1
18 0
Name: label, dtype: object

 The total number of Training Data : (14,)

 The total number of Test Data : (5,)

 Accuracy of the classifer is

 Confusion matrix
[[2 0]
 [1 2]]

The value of Precision: 0.8333333333333333

The value of Recall: 0.8333333333333333

RESULT:

Thus, the program to implement the naïve Bayesian Classifier model to classify a set of
documents and measure the accuracy, precision, and recall using Python has been executed
successfully.

Ex.No: 5
Date :
Write a program to construct a Bayesian network considering medical data. Use this model
to demonstrate the diagnosis of heart patients using standard Heart Disease Data Set.

AIM:
To construct a Bayesian network considering medical data. Use this model to
demonstrate the diagnosis of heart patients using standard Heart Disease Data Set.

DATASET: heart.csv

LINK: https://drive.google.com/file/d/10C80zeowRWEGazpPZw_n0wK4f_rRlRbL/view?usp=share_link

ALGORITHM:

Step 1: Install 'pgmpy' package.


Step 2: Import required libraries and classes.
Step 3: Read heart disease dataset in CSV format.
Step 4: Replace missing values with 'NaN'.

Step 5: Define a Bayesian network model

Step 6: Learn CPDs of the model from the dataset using MLE.

Step 7: Perform inference with the Bayesian network using 'VariableElimination' class.

Step 8: Compute and print the probabilities of heart disease given evidence of 'restecg=1'
and 'cp=2' using the 'query' method of 'VariableElimination' object.

PROGRAM/SOURCE CODE:
!pip install pgmpy
import numpy as np
import pandas as pd
import csv
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.models import BayesianNetwork
from pgmpy.inference import VariableElimination

# Read the Cleveland Heart Disease data
heartDisease = pd.read_csv('heart.csv')
heartDisease = heartDisease.replace('?', np.nan)

# Display the data
print('Sample instances from the dataset are given below')
print(heartDisease.head())

# Display the attribute names and datatypes
print('\n Attributes and datatypes')
print(heartDisease.dtypes)

# Create the model: Bayesian network (each pair is an edge parent -> child)
model = BayesianNetwork([('age', 'heartdisease'), ('sex', 'heartdisease'),
                         ('exang', 'heartdisease'), ('cp', 'heartdisease'),
                         ('heartdisease', 'restecg'), ('heartdisease', 'chol')])

# Learning CPDs using Maximum Likelihood Estimators
print('\n Learning CPD using Maximum likelihood estimators')
model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)

# Inferencing with the Bayesian network
print('\n Inferencing with Bayesian Network:')
HeartDiseasetest_infer = VariableElimination(model)

# Computing the probability of HeartDisease given restecg
print('\n 1.Probability of HeartDisease given evidence=restecg :1')
q1 = HeartDiseasetest_infer.query(variables=['heartdisease'], evidence={'restecg': 1})
print(q1)

# Computing the probability of HeartDisease given cp
print('\n 2.Probability of HeartDisease given evidence= cp:2 ')
q2 = HeartDiseasetest_infer.query(variables=['heartdisease'], evidence={'cp': 2})
print(q2)

OUTPUT:
Sample instances from the dataset are given below
age sex cp trestbps chol ... oldpeak slope ca thal heartdisease
0 63 1 1 145 233 ... 2.3 3 0 6 0
1 67 1 4 160 286 ... 1.5 2 3 3 2
2 67 1 4 120 229 ... 2.6 2 2 7 1
3 37 1 3 130 250 ... 3.5 3 0 3 0
4 41 0 2 130 204 ... 1.4 1 0 3 0

[5 rows x 14 columns]

 Attributes and datatypes
age               int64
sex               int64
cp                int64
trestbps          int64
chol              int64
fbs               int64
restecg           int64
thalach           int64
exang             int64
oldpeak         float64
slope             int64
ca               object
thal             object
heartdisease      int64
dtype: object

Learning CPD using Maximum likelihood estimators

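 Inferencing with Bayesian Network:

 1.Probability of HeartDisease given evidence=restecg :1
+-----------------+---------------------+
| heartdisease    |   phi(heartdisease) |
+=================+=====================+
| heartdisease(0) |              0.1016 |
+-----------------+---------------------+
| heartdisease(1) |              0.0000 |
+-----------------+---------------------+
| heartdisease(2) |              0.2361 |
+-----------------+---------------------+
| heartdisease(3) |              0.2017 |
+-----------------+---------------------+
| heartdisease(4) |              0.4605 |
+-----------------+---------------------+

 2.Probability of HeartDisease given evidence= cp:2
+-----------------+---------------------+
| heartdisease    |   phi(heartdisease) |
+=================+=====================+
| heartdisease(0) |              0.3742 |
+-----------------+---------------------+
| heartdisease(1) |              0.2018 |
+-----------------+---------------------+
| heartdisease(2) |              0.1375 |
+-----------------+---------------------+
| heartdisease(3) |              0.1541 |
+-----------------+---------------------+
| heartdisease(4) |              0.1323 |
+-----------------+---------------------+
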
RESULT:
Thus, the program to construct a Bayesian network considering medical data and to
demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set
has been executed successfully.

Ex.No: 6
Date :
Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set
for clustering using the k-Means algorithm. Compare the results of these two algorithms.

AIM:
To Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set
for clustering using the k-Means algorithm. Compare the results of these two algorithms.

DATASET: iris.csv

LINK: https://drive.google.com/file/d/1-lseekjQ6h1xHKETLYlm7a_IfZ-3sAOY/view?usp=share_link

ALGORITHM:

Step 1: Import required libraries/modules/packages.

Step 2: Read the dataset from the given CSV file into a pandas dataframe.

Step 3: Extract the input features from the dataset and store them in a new dataframe X.

Step 4: Create a KMeans model with three clusters and fit it to the input data X.

Step 5: Create a Gaussian Mixture model with three components and fit it to the input
data X.

Step 6: Print the accuracy score and confusion matrix of both models.
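
The difference between the two models in Steps 4 and 5 can be sketched on a tiny one-dimensional data set. This is a minimal sketch under stated assumptions: six hand-typed points, two clusters, fixed starting centres, and a small variance floor to keep the Gaussians well defined. k-Means assigns each point to exactly one centre, while EM for a Gaussian mixture computes soft responsibilities before updating the parameters. The function names are illustrative.

```python
import math

data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]   # assumed toy data, two obvious groups

def kmeans_1d(xs, centers, iters=10):
    """Hard assignment: each point belongs to its nearest centre."""
    centers = list(centers)
    for _ in range(iters):
        labels = [min(range(len(centers)), key=lambda k: abs(x - centers[k]))
                  for x in xs]
        for k in range(len(centers)):
            members = [x for x, l in zip(xs, labels) if l == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels, centers

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(xs, mus, iters=20):
    """Soft assignment: EM for a one-dimensional Gaussian mixture."""
    mus = list(mus)
    k = len(mus)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in xs:
            p = [weights[j] * normal_pdf(x, mus[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate means, variances and mixing weights
        for j in range(k):
            nj = sum(r[j] for r in resp)
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)   # variance floor (assumption)
            weights[j] = nj / len(xs)
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return labels, mus

km_labels, km_centers = kmeans_1d(data, [0.0, 6.0])
em_labels, em_means = em_gmm_1d(data, [0.0, 6.0])
print("k-Means labels:", km_labels, "centres:", km_centers)
print("EM labels:     ", em_labels, "means:  ", em_means)
```

On well-separated data such as this, both methods end up with the same grouping; they tend to differ when clusters overlap or have unequal spread.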

PROGRAM/SOURCE CODE:
from sklearn.cluster import KMeans
from sklearn import preprocessing
from sklearn.mixture import GaussianMixture
from sklearn.datasets import load_iris
import sklearn.metrics as sm
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

dataset = load_iris()
# print(dataset)

X = pd.DataFrame(dataset.data)
X.columns = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width']
y = pd.DataFrame(dataset.target)
y.columns = ['Targets']
# print(X)

plt.figure(figsize=(14, 7))
colormap = np.array(['red', 'lime', 'black'])

# REAL PLOT: points coloured by their true class
plt.subplot(1, 3, 1)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y.Targets], s=40)
plt.title('Real')

# K-PLOT: points coloured by their k-Means cluster
plt.subplot(1, 3, 2)
model = KMeans(n_clusters=3)
model.fit(X)
predY = np.choose(model.labels_, [0, 1, 2]).astype(np.int64)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[predY], s=40)
plt.title('KMeans')

# GMM PLOT: standardize the features, then fit a Gaussian mixture with EM
scaler = preprocessing.StandardScaler()
scaler.fit(X)
xsa = scaler.transform(X)
xs = pd.DataFrame(xsa, columns=X.columns)
gmm = GaussianMixture(n_components=3)
gmm.fit(xs)
y_cluster_gmm = gmm.predict(xs)
plt.subplot(1, 3, 3)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y_cluster_gmm], s=40)
plt.title('GMM Classification')

OUTPUT:

Text(0.5, 1.0, 'GMM Classification')

RESULT:
Thus, the program to apply the EM algorithm to cluster a set of data stored in a .CSV file,
and to cluster the same data set using the k-Means algorithm, has been executed successfully.

Ex.No: 7
Date :
Write a program to implement k-Nearest Neighbour algorithm to classify the iris dataset.
Print both correct and wrong predictions

AIM:
To implement k-Nearest Neighbour algorithm to classify the iris data set. Print both
correct and wrong predictions

DATASET: iris.csv

LINK: https://drive.google.com/file/d/1vVwpEuIyb-r3uVtrvbZxBMDsX5wpq-ml/view?usp=share_link

ALGORITHM:

Step 1: Load the Iris dataset from a CSV file into a pandas dataframe.

Step 2: Split the dataset into the input features X and output class y.

Step 3: Split the dataset into training and testing sets.

Step 4: Initialize a KNN classifier with n_neighbors set to 5 and fit it to the training data.
Step 5: Predict the class labels for the testing set using the KNN classifier.

Step 6: Calculate and print the confusion matrix, classification report, and accuracy score
of the KNN classifier.

Step 7: Print the accuracy of the classifier.
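
The prediction step itself can be sketched in plain Python. This is a minimal sketch, not the scikit-learn classifier: it assumes Euclidean distance, majority voting among k = 3 neighbours, and a handful of Iris measurements typed in by hand as training and test rows. The name `knn_predict` is illustrative.

```python
import math
from collections import Counter

# (sepal length, sepal width, petal length, petal width), class name
train = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.0, 1.4, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ((5.8, 2.7, 5.1, 1.9), "virginica"),
]
test = [
    ((5.0, 3.6, 1.4, 0.2), "setosa"),
    ((6.9, 3.1, 4.9, 1.5), "versicolor"),
    ((7.1, 3.0, 5.9, 2.1), "virginica"),
]

def knn_predict(x, samples, k=3):
    """Return the majority class among the k training rows closest to x."""
    nearest = sorted(samples, key=lambda s: math.dist(x, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

results = []
for features, actual in test:
    predicted = knn_predict(features, train)
    status = "correct" if predicted == actual else "wrong"
    results.append((actual, predicted, status))
    print("TARGET=", actual, "PREDICTED=", predicted, status)

accuracy = sum(status == "correct" for _, _, status in results) / len(results)
print("accuracy:", accuracy)
```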

PROGRAM/SOURCE CODE:
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
import numpy as np

dataset = load_iris()
#print(dataset)
X_train, X_test, y_train, y_test = train_test_split(dataset["data"], dataset["target"], random_state=0)

kn = KNeighborsClassifier(n_neighbors=1)
kn.fit(X_train, y_train)

# Predict each test row separately and print the target next to the prediction
for i in range(len(X_test)):
    x = X_test[i]
    x_new = np.array([x])
    prediction = kn.predict(x_new)
    print("TARGET=", y_test[i], dataset["target_names"][y_test[i]],
          "PREDICTED=", prediction, dataset["target_names"][prediction])
print(kn.score(X_test, y_test))

OUTPUT:
TARGET= 2 virginica PREDICTED= [2] ['virginica']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 2 virginica PREDICTED= [2] ['virginica']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 2 virginica PREDICTED= [2] ['virginica']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 2 virginica PREDICTED= [2] ['virginica']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']

RESULT:
Thus, the program to implement k-Nearest Neighbour algorithm to classify the iris
data set using Python has been executed successfully.

Ex.No: 8
Date :
Write a program to implement Decision Tree classification model using a .CSV file to
measure the accuracy.

AIM:
To implement Decision Tree classification model using a .CSV file to measure the
accuracy.

DATASET: data_cleaned.csv

LINK: https://drive.google.com/file/d/1VbrVnGcblK7e2PXVQlMkUTtxTkaVVYBW/view?usp=share_link

ALGORITHM:

Step 1: Load cleaned data and split into training and validation sets.

Step 2: Train a decision tree classifier on the training set.

Step 3: Evaluate classifier accuracy on training and validation sets.

Step 4: Use predict() to get predicted outcomes for validation set.

Step 5: Use predict_proba() to get survival probabilities for validation set.

Step 6: Create new predictions based on a probability threshold of 0.7.

Step 7: Iterate through different values of max_depth to generate train and validation
accuracy scores.

Step 8: Visualize train and validation accuracy scores using a line graph.

Step 9: Create a decision tree classifier with max_depth of 8 and max_leaf_nodes of 25.

Step 10: Fit the classifier to the training set and evaluate accuracy on training and validation
sets.

Step 11: Use graphviz to create a visualization of the decision tree.

Step 12: Display the visualization using matplotlib.pyplot.
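
Step 6 replaces the classifier's default decision rule with a custom cut-off. A minimal sketch of that thresholding follows, assuming a plain list of positive-class probabilities in place of the `predict_proba()` output; the values and the name `apply_threshold` are made up for illustration.

```python
def apply_threshold(probabilities, threshold=0.7):
    """Return 1 where the positive-class probability exceeds the threshold, else 0."""
    return [1 if p > threshold else 0 for p in probabilities]

# Hypothetical survival probabilities and true outcomes
probs = [0.20, 0.70, 0.71, 1.00]
y_true = [0, 1, 1, 1]

y_new = apply_threshold(probs)
accuracy = sum(p == t for p, t in zip(y_new, y_true)) / len(y_true)
print(y_new)      # a probability of exactly 0.7 is still mapped to class 0
print(accuracy)
```

Raising the threshold makes the model predict the positive class less often, which usually trades recall for precision.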

PROGRAM/SOURCE CODE:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

data = pd.read_csv('data_cleaned.csv')
print(data.shape)
data.isnull().sum()

# Separate the target column from the features
y = data['Survived']
X = data.drop(['Survived'], axis=1)

from sklearn.model_selection import train_test_split
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=101, stratify=y,
                                                      test_size=0.25)
y_train.value_counts(normalize=True)
y_valid.value_counts(normalize=True)
X_train.shape, y_train.shape
X_valid.shape, y_valid.shape

from sklearn.tree import DecisionTreeClassifier
dt_model = DecisionTreeClassifier(random_state=10)
dt_model.fit(X_train, y_train)
dt_model.score(X_train, y_train)
dt_model.score(X_valid, y_valid)
dt_model.predict(X_valid)
dt_model.predict_proba(X_valid)

# Re-label the validation set with a probability threshold of 0.7
y_pred = dt_model.predict_proba(X_valid)[:, 1]
y_new = []
for i in range(len(y_pred)):
    if y_pred[i] <= 0.7:
        y_new.append(0)
    else:
        y_new.append(1)
from sklearn.metrics import accuracy_score
accuracy_score(y_valid, y_new)

# Train and validation accuracy for different tree depths
train_accuracy = []
validation_accuracy = []
for depth in range(1, 30):
    dt_model = DecisionTreeClassifier(max_depth=depth, random_state=10)
    dt_model.fit(X_train, y_train)
    train_accuracy.append(dt_model.score(X_train, y_train))
    validation_accuracy.append(dt_model.score(X_valid, y_valid))
frame = pd.DataFrame({'max_depth': range(1, 30), 'train_acc': train_accuracy,
                      'valid_acc': validation_accuracy})
frame.head(15)

plt.figure(figsize=(12, 6))
plt.plot(frame['max_depth'], frame['train_acc'], marker='o')
plt.plot(frame['max_depth'], frame['valid_acc'], marker='o')
plt.xlabel('Depth of tree')
plt.ylabel('performance')

dt_model = DecisionTreeClassifier(max_depth=8, max_leaf_nodes=25, random_state=10)
dt_model
dt_model.fit(X_train, y_train)
dt_model.score(X_train, y_train)
dt_model.score(X_valid, y_valid)

from sklearn import tree
!pip install graphviz
decision_tree = tree.export_graphviz(dt_model, out_file='tree.dot',
                                     feature_names=X_train.columns,
                                     max_depth=2, filled=True)
# tree.png has to be produced from tree.dot before it can be read; with the
# Graphviz command-line tools installed this is typically done with:
!dot -Tpng tree.dot -o tree.png

image = plt.imread('tree.png')
plt.figure(figsize=(15, 15))
plt.imshow(image)

OUTPUT:
(891, 25)

RESULT:
Thus, the program to implement a Decision Tree classification model on a .csv file and
measure its accuracy using Python has been executed successfully.
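
The program applies a custom cut-off of 0.7 to the class-1 probabilities with an explicit loop. As a hedged sketch, the same rule can be written as a single NumPy comparison. The probability values below are made up for illustration and stand in for `predict_proba(X_valid)[:, 1]`.

```python
import numpy as np

# Hypothetical class-1 probabilities (illustrative values only)
y_pred = np.array([0.10, 0.70, 0.71, 0.95, 0.40])

# Loop form, as in the program: probabilities up to 0.7 map to class 0
y_loop = []
for p in y_pred:
    y_loop.append(0 if p <= 0.7 else 1)

# Vectorized form: True where the probability exceeds 0.7, cast to 0/1
y_new = (y_pred > 0.7).astype(int)

print(y_loop)
print(y_new.tolist())
```

Both forms give the same labels; the vectorized one avoids the Python-level loop and returns an array that can be passed straight to `accuracy_score`.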

Ex.No: 9
Date :
Implement Logistic Regression Algorithm with a dataset and measure the accuracy score and confusion matrix

AIM:
To implement the Logistic Regression algorithm with a dataset and measure the accuracy
score and confusion matrix.

DATASET: iris.csv

LINK: https://drive.google.com/file/d/1w8C2PmuZkDOuVEhIwTdBb3LJMW7HIJ1R/view?usp=share_link

ALGORITHM:

Step 1: Import necessary libraries and load the dataset using pandas.read_csv().
Step 2: Extract the 'temp' and 'label' columns and reshape them.

Step 3: Create a scatter plot with a logistic regression line using seaborn.regplot().

Step 4: Split the data into training and testing sets using train_test_split().

Step 5: Initialize a LogisticRegression model object and fit the training data.

Step 6: Predict the y values for the testing data and calculate the accuracy score using
accuracy_score().

Step 7: Generate a confusion matrix to evaluate the model's performance on the entire
dataset.
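
The steps above can be sketched as follows. This is a minimal illustration, not the record's program: the `temp` and `label` values are generated synthetically (an assumption, since the CSV contents are not shown here), and the plotting in step 3 is left out. The program in the next section loads scikit-learn's built-in Iris data instead of a CSV file.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Hypothetical data: 'temp' readings with a binary 'label' (not the record's file)
rng = np.random.default_rng(0)
temp = rng.uniform(0, 40, size=200)
label = (temp + rng.normal(0, 3, size=200) > 20).astype(int)
df = pd.DataFrame({"temp": temp, "label": label})

# Step 2: scikit-learn expects a 2-D feature array, hence the reshape
X = df["temp"].to_numpy().reshape(-1, 1)
y = df["label"].to_numpy()

# Steps 4 and 5: split, then fit the model on the training portion
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
model = LogisticRegression()
model.fit(X_train, y_train)

# Steps 6 and 7: accuracy on the test set, confusion matrix on the full dataset
accuracy = accuracy_score(y_test, model.predict(X_test))
conf_matrix = confusion_matrix(y, model.predict(X))
print(f"Accuracy: {accuracy:.3f}")
print(conf_matrix)
```

A confusion matrix computed on the full dataset includes the training samples, so it tends to look more favourable than one computed on the test set alone.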

PROGRAM/SOURCE CODE:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# For binary classification, let's only use two classes
X = X[y != 2]
y = y[y != 2]

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train the Logistic Regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Calculate the accuracy score
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')

# Generate the confusion matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print('Confusion Matrix:')
print(conf_matrix)

# Plot the confusion matrix
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix')
plt.show()

OUTPUT:
Accuracy: 1.0
Confusion Matrix:
[[17  0]
 [ 0 13]]
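
The reported accuracy follows directly from the confusion matrix: the diagonal holds the correctly classified test samples, so accuracy is the diagonal sum divided by the total. A short sketch using the values printed above:

```python
import numpy as np

# Confusion matrix from the output (rows: true class, columns: predicted class)
conf_matrix = np.array([[17, 0],
                        [0, 13]])

# Accuracy is the share of test samples that lie on the diagonal
accuracy = np.trace(conf_matrix) / conf_matrix.sum()
print(accuracy)  # 1.0
```

Here all 30 test samples (17 + 13) are on the diagonal, which gives the accuracy of 1.0.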

RESULT:

Thus, the program to implement the Logistic Regression algorithm with a dataset and
measure the accuracy score and confusion matrix using Python has been executed
successfully.

Ex.No: 10
Date :
Implement Linear Regression Algorithm with a dataset and measure the accuracy score

AIM:
To implement the Linear Regression algorithm with a dataset and measure its accuracy using the R-squared score.

DATASET: linear data.csv

LINK: https://drive.google.com/file/d/1zSPRRkSxqLPYWsQTCQHjAqUfNv14zrqU/view?usp=share_link

ALGORITHM:

Step 1: Import necessary libraries and mount Google Drive to access the dataset using
drive.mount().

Step 2: Load the 'linear_data.csv' dataset using pandas.read_csv() function and store it in a
DataFrame variable called df.

Step 3: Extract the 'x' and 'y' columns and reshape them.

Step 4: Split the data into training and testing sets using train_test_split() with a test size of
0.25.

Step 5: Initialize a LinearRegression model object and fit the training data.

Step 6: Predict the y values for the testing data using lr.predict().

Step 7: Create a scatter plot of the 'x' and 'y' data with a linear regression line using the
matplotlib.pyplot.scatter() and matplotlib.pyplot.plot() functions.

Step 8: Calculate the R-squared score of the model using r2_score() function by comparing
the predicted y values with the actual y values in the testing set.

Step 9: Print the calculated R-squared score.
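
The steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the `x` and `y` columns are generated synthetically from y = 3x + 2 plus noise rather than read from the CSV file, and the Google Drive mounting and plotting steps are left out. The program in the next section uses scikit-learn's California Housing data instead of a CSV file.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical data with 'x' and 'y' columns (not the record's CSV)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
df = pd.DataFrame({"x": x, "y": 3 * x + 2 + rng.normal(0, 1, size=100)})

# Step 3: reshape x into the 2-D form scikit-learn expects
X = df["x"].to_numpy().reshape(-1, 1)
y = df["y"].to_numpy()

# Steps 4 to 6: split with a test size of 0.25, fit, and predict
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
lr = LinearRegression()
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)

# Steps 8 and 9: compare predictions with the actual test values
r2 = r2_score(y_test, y_pred)
print(f"slope: {lr.coef_[0]:.2f}, intercept: {lr.intercept_:.2f}, R^2: {r2:.3f}")
```

Because the synthetic data is nearly linear, the fitted slope should land close to 3 and the R-squared score close to 1.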

PROGRAM/SOURCE CODE:
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
import matplotlib.pyplot as plt

# Load the California Housing dataset
california = fetch_california_housing()
X = california.data
y = california.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train the Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Calculate the R^2 score
r2 = r2_score(y_test, y_pred)
print(f'R^2 score: {r2}')

# Plotting the scatter plot
plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred, alpha=0.5)
plt.xlabel('Actual Values')
plt.ylabel('Predicted Values')
plt.title('Actual vs Predicted Values')
# Diagonal line for reference
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], color='red')
plt.show()

OUTPUT:
R^2 score: 0.595770232606166
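
An R^2 score of about 0.596 means the model accounts for roughly 60% of the variance in the test targets. The score is defined as 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the actual values from their mean. A hedged sketch with made-up numbers, checking the hand calculation against `r2_score`:

```python
import numpy as np
from sklearn.metrics import r2_score

# Small made-up example values (not taken from the program's data)
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2_manual = 1 - ss_res / ss_tot

print(round(r2_manual, 4), round(r2_score(y_true, y_pred), 4))
```

The two printed values agree, since `r2_score` computes the same ratio.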

RESULT:
Thus, the program to implement the Linear Regression algorithm with a dataset and
measure the accuracy score using Python has been executed successfully.
