MLT Practical 3 and 4

The document outlines two Python programs that implement a naïve Bayesian classifier, one for PlayTennis.csv and one for Spam.csv. They show how to preprocess the data, train the model, and evaluate it: accuracy for PlayTennis.csv, and accuracy, precision, and recall for Spam.csv. The programs use pandas for data handling and scikit-learn (sklearn) for model training, including its CountVectorizer class for turning text into word-count features.

Uploaded by

shubhipandey25

// Write a program to implement the naïve Bayesian classifier for a sample training dataset stored in a .CSV file. Compute the accuracy of the classifier using a few test datasets (PlayTennis.csv).

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load the dataset and show the first few rows
file_path = "PlayTennis.csv"
df = pd.read_csv(file_path)
print("Dataset:")
print(df.head())

# Encode every categorical column (features and target) as integer codes
for column in df.columns:
    df[column] = df[column].astype('category').cat.codes

X = df.iloc[:, :-1]  # All columns except the last one
y = df.iloc[:, -1]   # Last column as the target variable

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Note: GaussianNB treats the integer codes as continuous values
nb_classifier = GaussianNB()
nb_classifier.fit(X_train, y_train)

# Evaluate on the held-out rows
y_pred = nb_classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
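
The program above delegates the probability calculations to scikit-learn. As a rough illustration of what a naïve Bayes classifier computes for categorical features, the sketch below counts class and feature-value frequencies by hand and applies Laplace smoothing. The rows, the `train` and `predict` helpers, and the `alpha` parameter are hypothetical and chosen only for illustration; the rows are not the contents of PlayTennis.csv, and this categorical formulation differs from the Gaussian model that GaussianNB fits.

```python
from collections import Counter, defaultdict

# Hypothetical toy rows (outlook, wind, play); NOT the contents of PlayTennis.csv.
rows = [
    ("sunny", "weak", "no"),
    ("sunny", "strong", "no"),
    ("overcast", "weak", "yes"),
    ("rain", "weak", "yes"),
    ("rain", "strong", "no"),
    ("overcast", "strong", "yes"),
]


def train(rows):
    """Count class frequencies and, per class, the frequency of each feature value."""
    class_counts = Counter(row[-1] for row in rows)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    vocab = defaultdict(set)             # feature index -> distinct values seen
    for *features, label in rows:
        for i, value in enumerate(features):
            value_counts[(i, label)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab


def predict(model, features, alpha=1.0):
    """Return the class with the highest unnormalized posterior score, plus all scores."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # class prior P(class)
        for i, value in enumerate(features):
            # Laplace-smoothed estimate of P(value | class)
            count = value_counts[(i, label)][value]
            score *= (count + alpha) / (n + alpha * len(vocab[i]))
        scores[label] = score
    return max(scores, key=scores.get), scores


model = train(rows)
label, scores = predict(model, ("overcast", "weak"))
print(label, scores)
```

For the query ("overcast", "weak") the "yes" score is the larger of the two, so the sketch predicts "yes". The scores are unnormalized; dividing each by their sum would give posterior probabilities.
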
// Assume a set of documents that need to be classified. Use the naïve Bayesian classifier model to perform this task. Calculate accuracy, precision, and recall for your dataset (Spam.csv).

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Load the dataset; it is expected to have 'label' and 'message' columns
file_path = "Spam.csv"
df = pd.read_csv(file_path)

# Encode the target: spam -> 1 (positive class), ham -> 0
df['label'] = df['label'].map({'spam': 1, 'ham': 0})
X = df['message']
y = df['label']

# Split the raw text first so the vocabulary is learned from training data only
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Convert messages to word-count vectors
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(X_train)
X_test = vectorizer.transform(X_test)

# Train the multinomial naive Bayes model on the word counts
nb_classifier = MultinomialNB()
nb_classifier.fit(X_train, y_train)

# Evaluate on the held-out messages
y_pred = nb_classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)

print("Dataset Sample:")
print(df.head())
print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
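
The three metrics printed above can also be derived directly from the counts of true and false positives and negatives. The sketch below shows that arithmetic for a binary problem with spam as the positive class (1). The `binary_metrics` helper and the label lists are hypothetical and made up for illustration; they are not results from Spam.csv, and returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall from the four confusion counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Precision: of the messages flagged as spam, the fraction that are spam
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the actual spam messages, the fraction that were flagged
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Made-up labels for illustration only (1 = spam, 0 = ham)
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0, 0, 0]

accuracy, precision, recall = binary_metrics(y_true, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
```

Here there are 2 true positives, 1 false positive, 2 false negatives, and 5 true negatives, so accuracy is 7/10, precision is 2/3, and recall is 2/4. On an imbalanced spam dataset, accuracy alone can look high even when recall is low, which is why the task asks for all three.
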
