C5

The document walks through a Python script that classifies the Iris dataset with a Gaussian Naive Bayes model. It covers data loading, preprocessing, model training, and evaluation with a confusion matrix and a classification report; the model reaches 97% accuracy on the test set.

In [1]: import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris
In [2]: data = load_iris()
df = pd.DataFrame(data=data.data, columns=data.feature_names)

In [3]: df.head()

Out[3]:    sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
        0                5.1               3.5                1.4               0.2
        1                4.9               3.0                1.4               0.2
        2                4.7               3.2                1.3               0.2
        3                4.6               3.1                1.5               0.2
        4                5.0               3.6                1.4               0.2

In [4]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 4 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   sepal length (cm)  150 non-null    float64
 1   sepal width (cm)   150 non-null    float64
 2   petal length (cm)  150 non-null    float64
 3   petal width (cm)   150 non-null    float64
dtypes: float64(4)
memory usage: 4.8 KB

In [5]: data.target

Out[5]: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In [6]: X = data.data
Y = data.target
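
For reference (not in the original notebook), the integer labels above map to the species names stored alongside the data; a quick sketch:

print(data.target_names)   # ['setosa' 'versicolor' 'virginica']
print(np.bincount(Y))      # 50 samples per class: [50 50 50]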

In [7]: from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)  # assumed seed value

# Fit the scaler on the training split only, then apply the same transform to the test split
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)

print(f'Train Dataset Size - X: {X_train.shape}, Y: {Y_train.shape}')
print(f'Test Dataset Size - X: {X_test.shape}, Y: {Y_test.shape}')

Train Dataset Size - X: (120, 4), Y: (120,)
Test Dataset Size - X: (30, 4), Y: (30,)
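
As a quick sanity check (not part of the original notebook), the scaled training features should have roughly zero mean and unit variance, while the test features are only approximately standardized because the scaler statistics come from the training split alone:

print(X_train.mean(axis=0).round(3), X_train.std(axis=0).round(3))  # ~0 and ~1 per feature
print(X_test.mean(axis=0).round(3), X_test.std(axis=0).round(3))    # close to, but not exactly, 0 and 1
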
In [8]: from sklearn.naive_bayes import GaussianNB

classifier = GaussianNB()
classifier.fit(X_train, Y_train)
predictions = classifier.predict(X_test)

fig, axs = plt.subplots(2, 2, figsize=(12, 10), constrained_layout=True)
_ = fig.suptitle('Regression Line Tracing')

# One panel per feature: regression line of predictions vs. the feature,
# with the true test labels overlaid as '+' markers
for i in range(4):
    x, y = i // 2, i % 2
    _ = sns.regplot(x=X_test[:, i], y=predictions, ax=axs[x, y])
    _ = axs[x, y].scatter(X_test[:, i][::-1], Y_test[::-1], marker='+', color="white")
    _ = axs[x, y].set_xlabel(df.columns[i])

Out[8]: GaussianNB()
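
The fitted model stores the per-class feature means and variances that drive its Gaussian likelihoods. A short sketch for inspecting them (not in the original notebook; the variance attribute is named var_ in recent scikit-learn releases, sigma_ in older ones):

print(data.target_names)   # class order: ['setosa' 'versicolor' 'virginica']
print(classifier.theta_)   # per-class mean of each scaled feature, shape (3, 4)
print(classifier.var_)     # per-class variance of each scaled feature, shape (3, 4)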

Confusion matrix
In [9]: from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report

# cm is a 3x3 matrix here (three Iris classes); the table below reports only
# its top-left 2x2 block using binary TP/FN/FP/TN labels
cm = confusion_matrix(Y_test, predictions)
print(f'''Confusion matrix :\n
               | Positive Prediction\t| Negative Prediction
---------------+------------------------+----------------------
Positive Class | True Positive (TP) {cm[0, 0]}\t| False Negative (FN) {cm[0, 1]}
---------------+------------------------+----------------------
Negative Class | False Positive (FP) {cm[1, 0]}\t| True Negative (TN) {cm[1, 1]}\n\n''')

report = classification_report(Y_test, predictions)
print('Classification report : \n', report)

Confusion matrix :

               | Positive Prediction    | Negative Prediction
---------------+------------------------+----------------------
Positive Class | True Positive (TP) 11 | False Negative (FN) 0
---------------+------------------------+----------------------
Negative Class | False Positive (FP) 0 | True Negative (TN) 13

Classification report :
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        11
           1       0.93      1.00      0.96        13
           2       1.00      0.83      0.91         6

    accuracy                           0.97        30
   macro avg       0.98      0.94      0.96        30
weighted avg       0.97      0.97      0.97        30
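
As a closing sketch (not part of the original notebook), the 97% accuracy can be recovered from the full 3x3 confusion matrix: correct predictions lie on the diagonal, so accuracy is the trace divided by the total number of test samples.

from sklearn.metrics import accuracy_score

full_cm = confusion_matrix(Y_test, predictions)  # full 3x3 matrix for the three Iris classes
print(full_cm)
print(np.trace(full_cm) / full_cm.sum())         # 29/30 ≈ 0.97
print(accuracy_score(Y_test, predictions))       # same value computed by scikit-learn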
