Machine Learning

The document discusses building a Naive Bayes classifier model to classify iris flower types using scikit-learn. It loads and preprocesses the iris dataset, builds and tunes the Naive Bayes model, evaluates its performance, and achieves an accuracy of 96.7%.

Uploaded by

bunsglazing135
Copyright
© All Rights Reserved

Machine Learning Lab 3

Naive Bayes Classifier

M. Ashwin 21BCE5695

Importing required libraries

import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler

Function to load dataset

def load_dataset():
    # Read the CSV; the last column is the class label, the others are features
    data = pd.read_csv("Iris_Data.csv")
    dataset = data.values
    X = dataset[:, :-1]
    y = dataset[:, -1]
    return X, y
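
If Iris_Data.csv is not available, a similar loader can be sketched with the copy of the iris data that ships with scikit-learn. This is only an illustration: the function name load_dataset_bundled is hypothetical, and it assumes the CSV has the same four feature columns followed by labels such as Iris-setosa.

```python
import numpy as np
from sklearn.datasets import load_iris


def load_dataset_bundled():
    """Hypothetical stand-in for load_dataset() using scikit-learn's bundled iris data."""
    iris = load_iris()
    X = iris.data  # shape (150, 4): sepal length/width, petal length/width
    # Map integer targets (0, 1, 2) to string labels in the style used by the report
    label_names = np.array(["Iris-" + name for name in iris.target_names])
    y = label_names[iris.target]
    return X, y


X, y = load_dataset_bundled()
print(X.shape, y.shape)
```

The rest of the lab code only needs X and y, so this function can be swapped in without other changes, under the assumption stated above.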

Load the dataset by calling the function and split it into testing and training data

X, Y = load_dataset()
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, train_size=0.8, random_state=1, shuffle=True)
print('Train', X_train.shape, Y_train.shape)
print('Test', X_test.shape, Y_test.shape)

Train (120, 4) (120,)
Test (30, 4) (30,)
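
A plain shuffled split can leave the classes unevenly represented in the 30 test rows. As a possible variation (not what the lab code does), train_test_split accepts a stratify argument that keeps class proportions equal in both parts. The sketch below assumes scikit-learn's bundled iris data in place of Iris_Data.csv.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: the bundled iris data stands in for Iris_Data.csv
X, y = load_iris(return_X_y=True)

# Same 80/20 split as the lab, but stratified on the class label
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, train_size=0.8, random_state=1, shuffle=True, stratify=y)

test_counts = np.bincount(y_te)  # number of test samples per class
print(X_tr.shape, X_te.shape, test_counts)
```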

Pre-processing the data

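sc = StandardScaler()
# Learn the per-feature mean and standard deviation from the training data only
X_train = sc.fit_transform(X_train)
# Reuse the training statistics on the test data
X_test = sc.transform(X_test)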
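
The scaling step can be sketched in plain NumPy to show what is computed: statistics are estimated on the training split and then applied unchanged to the test split. The helper names fit_standardizer and standardize are hypothetical, and the demo arrays are random placeholders rather than the iris data.

```python
import numpy as np


def fit_standardizer(X_train):
    """Return per-feature mean and population standard deviation (ddof=0)."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # leave constant features unscaled
    return mean, std


def standardize(X, mean, std):
    """Apply previously fitted statistics to any split."""
    return (X - mean) / std


# Placeholder data with the same shapes as the lab's train/test splits
rng = np.random.default_rng(0)
X_train_demo = rng.normal(loc=5.0, scale=2.0, size=(120, 4))
X_test_demo = rng.normal(loc=5.0, scale=2.0, size=(30, 4))

mean, std = fit_standardizer(X_train_demo)
Z_train = standardize(X_train_demo, mean, std)
Z_test = standardize(X_test_demo, mean, std)  # test split uses the training statistics
```

After this step the training features have zero mean and unit variance; the test features are only approximately so, because they were scaled with statistics from the training split.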

Building the model

NB=GaussianNB()
NB.fit(X_train,Y_train)

GaussianNB()

Predicting the values for the testing data

Y_pred = NB.predict(X_test)
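
To make the fit and predict steps less of a black box, here is a minimal NumPy sketch of Gaussian naive Bayes. It is an illustration under simplifying assumptions, not scikit-learn's implementation: the class name TinyGaussianNB is hypothetical, and the demo data are two synthetic blobs rather than the iris data.

```python
import numpy as np


class TinyGaussianNB:
    """Illustrative Gaussian naive Bayes: one normal distribution per class and feature."""

    def __init__(self, var_smoothing=1e-9):
        self.var_smoothing = var_smoothing

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # Smoothing term: a fraction of the largest feature variance, added to every variance
        eps = self.var_smoothing * X.var(axis=0).max()
        self.theta_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + eps
        self.log_prior_ = np.log(np.array([(y == c).mean() for c in self.classes_]))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # diff has shape (n_samples, n_classes, n_features)
        diff = X[:, None, :] - self.theta_[None, :, :]
        # Sum of per-feature Gaussian log densities ("naive" independence assumption)
        log_lik = -0.5 * (np.log(2 * np.pi * self.var_) + diff ** 2 / self.var_).sum(axis=2)
        # Pick the class with the highest log posterior (up to a constant)
        return self.classes_[np.argmax(log_lik + self.log_prior_, axis=1)]


# Synthetic demo: two well-separated clusters labelled "a" and "b"
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])
y_demo = np.array(["a"] * 50 + ["b"] * 50)

model = TinyGaussianNB().fit(X_demo, y_demo)
pred = model.predict(np.array([[0.0, 0.0], [6.0, 6.0]]))
print(pred)
```

The var_smoothing parameter plays the same role as the hyperparameter tuned later in the lab: larger values widen every class distribution and make the decision boundaries smoother.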

Performance evaluation

cm=confusion_matrix(Y_test,Y_pred)
acc=accuracy_score(Y_test,Y_pred)
print(cm, acc)

[[11  0  0]
 [ 0 12  1]
 [ 0  0  6]] 0.9666666666666667

class_rep=classification_report(Y_test,Y_pred)
print(class_rep)

                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        11
Iris-versicolor       1.00      0.92      0.96        13
 Iris-virginica       0.86      1.00      0.92         6

       accuracy                           0.97        30
      macro avg       0.95      0.97      0.96        30
   weighted avg       0.97      0.97      0.97        30
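
The figures in the report follow directly from the confusion matrix printed earlier (rows are true classes, columns are predicted classes). A short NumPy sketch of that arithmetic, using the matrix from the first evaluation:

```python
import numpy as np

# Confusion matrix from the untuned model: rows = true class, columns = predicted class
cm = np.array([[11, 0, 0],
               [0, 12, 1],
               [0, 0, 6]])

true_pos = np.diag(cm)
precision = true_pos / cm.sum(axis=0)  # correct predictions / all predictions of that class
recall = true_pos / cm.sum(axis=1)     # correct predictions / all true members of that class
f1 = 2 * precision * recall / (precision + recall)
support = cm.sum(axis=1)               # number of true samples per class

accuracy = true_pos.sum() / cm.sum()
macro_f1 = f1.mean()
weighted_f1 = (f1 * support).sum() / support.sum()

print(np.round(precision, 2), np.round(recall, 2), np.round(f1, 2), round(accuracy, 4))
```

For example, one Iris-versicolor sample was predicted as Iris-virginica, so versicolor recall is 12/13 and virginica precision is 6/7.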

Hyperparameter tuning

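from sklearn.model_selection import RepeatedStratifiedKFold

# Preview of a coarse grid: ten values from 1e0 down to 1e-9 (the result is not used)
np.logspace(0, -9, 10)
# 5 stratified folds, repeated 3 times with different shuffles
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=1)

from sklearn.preprocessing import PowerTransformer
from sklearn.model_selection import GridSearchCV

# 100 candidate values for var_smoothing, log-spaced between 1e0 and 1e-9
grid_param = {'var_smoothing': np.logspace(0, -9, 100)}
grid_NB = GridSearchCV(estimator=NB, param_grid=grid_param, cv=cv,
                       verbose=1, scoring='accuracy')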

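# Note: the search is fitted on the power-transformed test split (30 samples),
# not on the training split, so the cross-validation scores it reports are
# computed on the same data later used for evaluation.
data_trans = PowerTransformer().fit_transform(X_test)
grid_NB.fit(data_trans, Y_test)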

Fitting 15 folds for each of 100 candidates, totalling 1500 fits

GridSearchCV(cv=RepeatedStratifiedKFold(n_repeats=3, n_splits=5,
random_state=1),
estimator=GaussianNB(),
param_grid={'var_smoothing': array([1.00000000e+00,
8.11130831e-01, 6.57933225e-01, 5.33669923e-01,
4.32876128e-01, 3.51119173e-01, 2.84803587e-01, 2.31012970e-01,
1.87381742e-01, 1.51991108e-01, 1.23284674e-01, 1.00000000e-01,
8.11130831e-02, 6.57933225e-02, 5.3...
1.23284674e-07, 1.00000000e-07, 8.11130831e-08, 6.57933225e-08,
5.33669923e-08, 4.32876128e-08, 3.51119173e-08, 2.84803587e-08,
2.31012970e-08, 1.87381742e-08, 1.51991108e-08, 1.23284674e-08,
1.00000000e-08, 8.11130831e-09, 6.57933225e-09, 5.33669923e-09,
4.32876128e-09, 3.51119173e-09, 2.84803587e-09, 2.31012970e-09,
1.87381742e-09, 1.51991108e-09, 1.23284674e-09, 1.00000000e-09])},
scoring='accuracy', verbose=1)
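
The grid values and the fit count in the log can be reproduced with a few lines of NumPy. This is only a sketch of the arithmetic; the variable names are illustrative.

```python
import numpy as np

# logspace(start, stop, num) returns 10**start ... 10**stop, evenly spaced in the exponent
coarse = np.logspace(0, -9, 10)   # exponents 0, -1, ..., -9
fine = np.logspace(0, -9, 100)    # same range, exponent step of -9/99

# RepeatedStratifiedKFold(n_splits=5, n_repeats=3) yields 5 * 3 train/validation pairs
n_splits, n_repeats = 5, 3
folds = n_splits * n_repeats

# GridSearchCV fits every candidate once per fold
total_fits = folds * len(fine)
print(coarse[0], coarse[-1], fine[1], folds, total_fits)
```

The second value of the fine grid is 10**(-9/99), which is the 8.11130831e-01 that appears in the printed parameter grid.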

grid_NB.best_score_

0.9666666666666668

grid_NB.best_params_

{'var_smoothing': 1.0}

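# X_test here is the standardized test split; it is not passed through the
# PowerTransformer that was applied before fitting the search
Y_pred = grid_NB.predict(X_test)
cm = confusion_matrix(Y_test, Y_pred)
acc = accuracy_score(Y_test, Y_pred)
print(cm, acc)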

[[11  0  0]
 [ 0 13  0]
 [ 0  1  5]] 0.9666666666666667

cr=classification_report(Y_test,Y_pred)
print(cr)

                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        11
Iris-versicolor       0.93      1.00      0.96        13
 Iris-virginica       1.00      0.83      0.91         6

       accuracy                           0.97        30
      macro avg       0.98      0.94      0.96        30
   weighted avg       0.97      0.97      0.97        30
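
In the tuning code of this lab, the grid search is fitted on the transformed test split, and predictions are then made on test data that did not go through the same transform. One possible way to avoid both issues is to put the transform and the classifier in a single Pipeline and run the search on the training split only. The sketch below is an alternative arrangement, not the lab's method. It assumes scikit-learn's bundled iris data in place of Iris_Data.csv, uses a smaller grid of 10 values to keep it quick, and omits StandardScaler because PowerTransformer standardizes its output by default.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer

# Assumption: the bundled iris data stands in for Iris_Data.csv
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=1, shuffle=True)

# The transform is refitted inside every CV fold, so validation folds never leak into it
pipe = Pipeline([("power", PowerTransformer()), ("nb", GaussianNB())])
param_grid = {"nb__var_smoothing": np.logspace(0, -9, 10)}
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=1)

search = GridSearchCV(pipe, param_grid=param_grid, cv=cv, scoring="accuracy")
search.fit(X_train, y_train)  # tuning sees the training split only

# The fitted pipeline applies the same transform to the test split before predicting
test_acc = accuracy_score(y_test, search.predict(X_test))
print(search.best_params_, round(search.best_score_, 3), round(test_acc, 3))
```

With this arrangement best_score_ is a cross-validated estimate from the training data, and test_acc remains an independent check on data the search never saw.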
