Machine Learning Lab5
Code (Python):
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix, roc_curve, auc, recall_score
import matplotlib.pyplot as plt
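# Load the breast cancer dataset (feature matrix X, binary labels y)
X, y = load_breast_cancer(return_X_y=True)
# Split the data into training and testing sets (18% held out as test data, as specified)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.18, random_state=42)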
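# Train the Gaussian Naive Bayes classifier on the training set
gnb = GaussianNB()
gnb.fit(X_train, y_train)
# Make predictions: class labels and the probability of the positive class
y_pred = gnb.predict(X_test)
y_pred_prob = gnb.predict_proba(X_test)[:, 1]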
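# Calculate metrics
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
recall = recall_score(y_test, y_pred)
error_rate = 1 - accuracy
# Compute the ROC curve and the area under it from the positive-class probabilities
fpr, tpr, thresholds = roc_curve(y_test, y_pred_prob)
roc_auc = auc(fpr, tpr)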
# Print results
print("Performance Metrics:")
print(f"Accuracy: {accuracy:.3f}")
print(f"Error Rate: {error_rate:.3f}")
print(f"Recall: {recall:.3f}")
print(f"ROC AUC: {roc_auc:.3f}")
print("\nConfusion Matrix:")
print(conf_matrix)
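
As a rough sketch of how the printed metrics relate to the confusion matrix, the snippet below recomputes accuracy, error rate, and recall directly from the four cell counts. It assumes scikit-learn's layout for a binary problem (rows are true labels, columns are predicted labels, so the matrix reads [[TN, FP], [FN, TP]]). The helper name `metrics_from_confusion` and the example counts are hypothetical and chosen only for illustration; they are not results from this lab.

```python
import numpy as np

def metrics_from_confusion(conf_matrix):
    """Derive accuracy, error rate, and recall from a 2x2 confusion matrix.

    Assumes the layout [[TN, FP], [FN, TP]] (rows = true, columns = predicted).
    """
    tn, fp, fn, tp = np.asarray(conf_matrix).ravel()
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total      # fraction of correct predictions
    error_rate = (fp + fn) / total    # fraction of wrong predictions (1 - accuracy)
    recall = tp / (tp + fn)           # fraction of true positives that were found
    return accuracy, error_rate, recall

# Hypothetical counts for illustration only (not output from the lab run)
example = [[40, 5], [3, 52]]
acc, err, rec = metrics_from_confusion(example)
print(f"Accuracy: {acc:.3f}, Error Rate: {err:.3f}, Recall: {rec:.3f}")
```

Passing the lab's own conf_matrix to this helper should reproduce the accuracy, error rate, and recall printed above, which is a quick way to check that the matrix is being read in the right orientation.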
Now, let me explain why Gaussian Naive Bayes is a suitable classifier for this dataset: