DWM Exp 4


import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from imblearn.over_sampling import SMOTE
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
df = pd.read_csv('/content/MOCK_DATA.csv')

# Explore the dataset
print(df.head())

              contract      gender  Churn
0   Five-year contract  non-binary      0
1   Five-year contract  non-binary      0
2  Three-year contract      female      1
3    Two-year contract  non-binary      0
4   Five-year contract  non-binary      1

# Handle missing values (if needed)
df = df.dropna()

# Encode categorical variables as integers
# (fit_transform refits the encoder, so one instance can be reused per column)
label_encoder = LabelEncoder()
df['gender'] = label_encoder.fit_transform(df['gender'])
df['contract'] = label_encoder.fit_transform(df['contract'])
# You may need to encode other categorical columns as well.
print(df['contract'])

# Define features (X) and target variable (y)
X = df.drop(columns=['Churn'])
y = df['Churn']

0 0
1 0
2 3
3 4
4 0
..
995 1
996 0
997 3
998 4
999 2
Name: contract, Length: 1000, dtype: int64

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Apply SMOTE (Synthetic Minority Over-sampling Technique) to the training
# set to address class imbalance
smote = SMOTE(random_state=42)
X_resampled, y_resampled = smote.fit_resample(X_train, y_train)

# Train the Naïve Bayes classifier
nb_classifier = GaussianNB()
nb_classifier.fit(X_resampled, y_resampled)

GaussianNB()

y_pred = nb_classifier.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

# Create a confusion matrix
confusion = confusion_matrix(y_test, y_pred)
print('Confusion Matrix:')
print(confusion)

# Display classification report
print('Classification Report:')
print(classification_report(y_test, y_pred))

Accuracy: 0.52
Confusion Matrix:
[[36 65]
 [32 67]]
Classification Report:
              precision    recall  f1-score   support

           0       0.53      0.36      0.43       101
           1       0.51      0.68      0.58        99

    accuracy                           0.52       200
   macro avg       0.52      0.52      0.50       200
weighted avg       0.52      0.52      0.50       200

# Create a confusion matrix (seaborn was already imported above)
confusion = confusion_matrix(y_test, y_pred)

# Visualize the confusion matrix using Seaborn
plt.figure(figsize=(8, 6))
sns.set(font_scale=1.2)  # Adjust the font size if needed
sns.heatmap(confusion, annot=True, fmt='d', cmap='Blues', cbar=False,
            xticklabels=['Not Churn', 'Churn'], yticklabels=['Not Churn', 'Churn'])

plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()

[Output: Seaborn heatmap of the confusion matrix]
Rishi Kokil | D12C | 38 | DWM

Experiment No. 4

Aim
Implementation of the Naïve Bayes classification algorithm.

Theory
Bayes’ Theorem describes the probability of an event based on prior knowledge of
conditions that might be related to the event. In other words, Bayes’ Theorem is an
extension of conditional probability.
With the help of conditional probability, one can find the probability of X given H,
denoted by P(X | H). Bayes’ Theorem states that if we know the conditional probability
P(X | H), then we can find P(H | X), provided that P(X) and P(H) are already known to us.
Bayes’ Theorem is named after Thomas Bayes, who first used conditional probability to
provide an algorithm that uses evidence to calculate limits on an unknown parameter.
Bayes’ Theorem involves two types of probabilities:

Prior Probability [P(H)]
Posterior Probability [P(H | X)]

Where,
X – a data tuple.
H – some hypothesis.

1. Prior Probability
The prior probability is the probability of an event before any new data is collected. It
is the best logical estimate of the probability of an outcome based on the present
knowledge of the event, before any inspection is performed.

2. Posterior Probability
When new data or information is collected, the prior probability of an event is revised
to produce a more accurate measure of a possible outcome. This revised probability is the
posterior probability, calculated using Bayes’ Theorem. The posterior probability is thus
the probability of hypothesis H holding given that the evidence X has been observed.


Formula
Bayes’ Theorem can be mathematically represented by the equation below:

P(H | X) = P(X | H) · P(H) / P(X)

Where,
H and X are events and P(X) ≠ 0.
P(H | X) – conditional probability of H given that X occurs (the posterior probability).
P(X | H) – conditional probability of X given that H occurs.
P(H) and P(X) – prior probabilities of H and X occurring independently of each other;
P(X) is also called the marginal probability.
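
To make the formula concrete, here is a small worked example in Python; the spam-filtering numbers (2%, 40%, 5%) are illustrative assumptions, not figures from this experiment:

# Hypothetical numbers for a spam filter:
p_h = 0.02          # P(H): prior probability that an email is spam
p_x_given_h = 0.40  # P(X|H): probability the word "offer" appears, given spam
p_x = 0.05          # P(X): probability the word "offer" appears in any email

# Bayes' Theorem: P(H|X) = P(X|H) * P(H) / P(X)
p_h_given_x = p_x_given_h * p_h / p_x
print(f'{p_h_given_x:.2f}')  # 0.16 -> posterior probability the email is spam

Seeing the word raises the probability of spam from the 2% prior to a 16% posterior.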

The Naive Bayes classifier assumes that the presence (or absence) of a particular feature
of a class is unrelated to the presence (or absence) of any other feature. For example, a
fruit may be considered to be an apple if it is red, round, and about 4" in diameter. Even
though these features may depend on one another, a naive Bayes classifier treats all of
these properties as contributing independently to the probability that this fruit is an
apple.
An advantage of the naive Bayes classifier is that it requires only a small amount of
training data to estimate the parameters (the means and variances of the variables)
necessary for classification. Because the variables are assumed independent, only the
variances of the variables for each class need to be determined, not the entire
covariance matrix.
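
In scikit-learn’s GaussianNB, used in the code above, these per-class means and variances are available as attributes after fitting. A minimal sketch, assuming the fitted nb_classifier from the experiment and a recent scikit-learn version (where the variance attribute is named var_):

# Parameters estimated by GaussianNB during fit:
print(nb_classifier.theta_)  # per-class mean of each feature
print(nb_classifier.var_)    # per-class variance of each feature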

Naïve Bayesian Classifier Algorithm

Step 1: Convert the data set into a frequency table.
Step 2: Create a likelihood table by finding the probabilities.


Step 3: Use the Naive Bayesian equation to calculate the posterior probability for each
class. The class with the highest posterior probability is the outcome of the prediction.
A short sketch of these three steps follows.
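
As an illustration, here is a minimal sketch of the three steps in Python on a hypothetical toy table (the contract values echo the experiment's dataset, but the rows are made up and this is not the MOCK_DATA.csv used above):

import pandas as pd

# Hypothetical toy table
data = pd.DataFrame({
    'contract': ['Five-year', 'Five-year', 'Three-year', 'Two-year', 'Five-year'],
    'Churn':    [0, 0, 1, 0, 1],
})

# Step 1: frequency table of each feature value per class
freq = pd.crosstab(data['contract'], data['Churn'])

# Step 2: likelihood table P(feature value | class)
likelihood = freq / freq.sum(axis=0)

# Step 3: posterior is proportional to P(x | class) * P(class);
# the class with the largest posterior is the prediction
prior = data['Churn'].value_counts(normalize=True)
posterior = likelihood.loc['Five-year'] * prior
print(posterior.idxmax())  # predicted Churn class for a 'Five-year' contract -> 0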

Applications of Bayes’ Theorem

Bayes’ Theorem and Bayesian classification in data mining have a wide range of
applications in many fields, including statistics, machine learning, artificial
intelligence, natural language processing, medical diagnosis, and image and speech
recognition. Here are some examples of its applications:

Spam filtering - Bayes’ Theorem is commonly used in email spam filtering, where it helps
identify emails that are likely to be spam based on the text content and other features.
Medical diagnosis - Bayes’ Theorem can be used to diagnose medical conditions based on
observed symptoms, test results, and prior knowledge about the prevalence and
characteristics of the disease.
Risk assessment - Bayes’ Theorem can be used to assess the risk of events such as
accidents, natural disasters, or financial market fluctuations based on historical data
and other relevant factors.
Natural language processing - Bayes’ Theorem can be used for document classification,
sentiment analysis, and topic modeling.
Recommendation systems - Bayes’ Theorem can be used in recommendation systems, such as
e-commerce websites, to suggest products or services to users based on their previous
behavior and preferences.
Fraud detection - Bayes’ Theorem can be used to detect fraudulent behavior, such as
credit card or insurance fraud, by analyzing patterns of transactions and other data.


Conclusion
In this experiment, a Gaussian Naïve Bayes classifier was implemented in Python on a mock
churn dataset, using label encoding for the categorical features and SMOTE to address
class imbalance. The classifier achieved an accuracy of 52%, which was examined further
through a confusion matrix and a classification report. The experiment provided valuable
insight into Bayesian inference, which offers a robust and principled framework for
handling uncertainty and making informed decisions across a wide range of applications.
