Ex.no.5_Naïve Bayesian classifier

The Naïve Bayes classifier uses Bayes' Theorem to classify data based on feature probabilities, primarily in high-dimensional text classification. It operates under the assumption that features are independent and is commonly applied in spam filtering and sentiment analysis. The document outlines the algorithm's steps, provides an example of its application to weather conditions, and presents problem statements for developing a text classification system and predicting tumor malignancy using a breast cancer dataset.

Uploaded by

Soyeb Mohammad


Ex no:5 Naïve Bayesian classifier

Description:
The main idea behind the Naive Bayes classifier is to use Bayes' Theorem to classify data
based on the probability of each class given the features of the data. It is used mostly
in high-dimensional text classification. Naive Bayes is a simple probabilistic classifier
with very few parameters, so models built with it typically train and predict faster than
many other classification algorithms. It is called "naive" because it assumes that the
features of a data point are conditionally independent of each other given the class. The
Naïve Bayes algorithm is used in spam filtering, sentiment analysis, article classification, etc.
Bayes' Theorem:
Bayes' theorem, also known as Bayes' Rule or Bayes' law, is used to determine the
probability of a hypothesis given prior knowledge. It depends on conditional probability.
The formula for Bayes' theorem is given as:

P(A|B) = P(B|A) P(A) / P(B)

Where,
P(A|B) is the posterior probability: the probability of hypothesis A given the observed event B.
P(B|A) is the likelihood: the probability of the evidence B given that hypothesis A is true.
P(A) is the prior probability: the probability of the hypothesis before observing the evidence.
P(B) is the marginal probability: the probability of the evidence.

Steps in Naïve Bayesian Classifier Algorithm

The Naive Bayes classifier calculates the probability of each class in the following steps:

Step 1: Calculate the prior probability of each class label.

Step 2: Find the likelihood of each attribute value for each class.
Step 3: Put these values into Bayes' formula and calculate the posterior probability of each class.
Step 4: Compare the posterior probabilities and assign the input to the class with the
highest probability.
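
The four steps can be sketched in Python for categorical attributes. This is a minimal illustration rather than a reference implementation: the function name naive_bayes_posteriors and the small two-attribute dataset are hypothetical, and exact fractions are used so that the intermediate values are easy to check by hand.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def naive_bayes_posteriors(rows, labels, query):
    """Return {label: posterior probability} for a categorical query."""
    n = len(labels)
    class_counts = Counter(labels)

    # Step 1: prior probability of each class label.
    priors = {c: Fraction(k, n) for c, k in class_counts.items()}

    # Step 2: count each attribute value within each class (for the likelihoods).
    value_counts = defaultdict(Counter)  # (class, attribute index) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1

    # Step 3: Bayes' formula; numerator = prior * product of likelihoods.
    scores = {}
    for c in class_counts:
        score = priors[c]
        for i, value in enumerate(query):
            score *= Fraction(value_counts[(c, i)][value], class_counts[c])
        scores[c] = score
    evidence = sum(scores.values())  # normalising constant
    if evidence == 0:
        raise ValueError("query has zero probability under every class")
    return {c: s / evidence for c, s in scores.items()}


# Hypothetical toy data: (outlook, temperature) -> play?
rows = [
    ("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool"),
    ("overcast", "hot"), ("overcast", "mild"), ("sunny", "cool"), ("rainy", "mild"),
]
labels = ["no", "no", "yes", "no", "yes", "yes", "yes", "yes"]

posteriors = naive_bayes_posteriors(rows, labels, ("rainy", "mild"))

# Step 4: pick the class with the highest posterior.
predicted = max(posteriors, key=posteriors.get)
print(posteriors, predicted)
```

For the query ("rainy", "mild") the posteriors come out as 18/23 for "yes" and 5/23 for "no", so the sketch predicts "yes".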
Example:

Given data on weather conditions and whether a sport was played, classify whether players
will play or not based on the weather condition.

The frequency table contains the number of occurrences of each label for every feature
value. There are two likelihood tables: Likelihood Table 1 shows the prior probabilities of
the labels, and Likelihood Table 2 shows the likelihood of each weather condition given each label.

Suppose you want to calculate the probability of playing when the weather is overcast.

Probability of playing:

P(Yes | Overcast) = P(Overcast | Yes) P(Yes) / P (Overcast) .....................(1)

Calculate the prior and evidence probabilities:

P(Overcast) = 4/14 = 0.29

P(Yes) = 9/14 = 0.64

Calculate the likelihood:

P(Overcast | Yes) = 4/9 = 0.44

Put the prior, evidence, and likelihood into equation (1):

P(Yes | Overcast) = 0.44 * 0.64 / 0.29 = 0.97 (higher)

With the unrounded fractions, the result is exactly (4/9)(9/14) / (4/14) = 1.

Probability of not playing:

P(No | Overcast) = P(Overcast | No) P(No) / P (Overcast) .....................(2)

Calculate the prior and evidence probabilities:

P(Overcast) = 4/14 = 0.29

P(No) = 5/14 = 0.36

Calculate the likelihood:

P(Overcast | No) = 0/5 = 0

Put the prior, evidence, and likelihood into equation (2):

P(No | Overcast) = 0 * 0.36 / 0.29 = 0

The probability of the 'Yes' class is higher, so if the weather is overcast, the players
will play the sport.
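
The arithmetic of this example can be rechecked with a short script. It is only a sketch: the variable names are illustrative, and the counts are those quoted in the example (14 days, 9 "Yes", 5 "No", 4 overcast days, none of them "No"). Exact fractions are used alongside the two-decimal roundings to show how much the rounding shifts the result.

```python
from fractions import Fraction

# Counts quoted in the worked example.
total_days = 14
yes_days, no_days = 9, 5
overcast_yes, overcast_no = 4, 0

# Evidence and priors.
p_overcast = Fraction(overcast_yes + overcast_no, total_days)
p_yes = Fraction(yes_days, total_days)
p_no = Fraction(no_days, total_days)

# Likelihoods of "Overcast" within each class.
p_overcast_given_yes = Fraction(overcast_yes, yes_days)
p_overcast_given_no = Fraction(overcast_no, no_days)

# Equations (1) and (2).
p_yes_given_overcast = p_overcast_given_yes * p_yes / p_overcast
p_no_given_overcast = p_overcast_given_no * p_no / p_overcast

# The same calculation with values rounded to two decimals.
rounded = round(0.44 * 0.64 / 0.29, 2)

print(p_yes_given_overcast, p_no_given_overcast, rounded)  # prints: 1 0 0.97
```

The exact posteriors are 1 and 0, which sum to 1 as expected; the rounded inputs give roughly 0.97 for the 'Yes' class.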

Problem statements:

1. Given a collection of text documents belonging to multiple categories (e.g., news
articles, emails, or product reviews), manually classifying them is inefficient and
prone to errors. Develop a Naïve Bayes-based text classification system in Python to
automatically categorize documents into predefined labels. Investigate how different
preprocessing techniques (such as stopword removal, stemming, and TF-IDF
vectorization) impact classification accuracy, precision, and recall. The goal is to
build an efficient, scalable, and interpretable model for real-world applications like
spam detection, sentiment analysis, and topic categorization.
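
A possible starting point for this problem is sketched below using only the Python standard library. It is not a full solution: the four-document corpus, the stopword list, and the function names are hypothetical; word counts are used instead of TF-IDF; stemming is omitted; and add-one (Laplace) smoothing, which the exercise text does not mention, is assumed so that an unseen word does not force a probability of zero as in the weather example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "is", "to", "and", "of", "your", "for", "at", "you"}


def tokenize(text, remove_stopwords=True):
    """Lowercase the text and split it into word tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if not (remove_stopwords and t in STOPWORDS)]


def train(docs, labels, remove_stopwords=True):
    """Count documents per class and words per class."""
    class_docs = Counter(labels)
    word_counts = {c: Counter() for c in class_docs}
    for text, label in zip(docs, labels):
        word_counts[label].update(tokenize(text, remove_stopwords))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_docs, word_counts, vocab


def predict(model, text, remove_stopwords=True):
    """Return the class with the highest log posterior score."""
    class_docs, word_counts, vocab = model
    n_docs = sum(class_docs.values())
    scores = {}
    for c in class_docs:
        total = sum(word_counts[c].values())
        score = math.log(class_docs[c] / n_docs)  # log prior
        for w in tokenize(text, remove_stopwords):
            if w in vocab:  # words never seen in training are ignored
                # log likelihood with add-one (Laplace) smoothing
                score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical toy corpus.
docs = [
    "Win a free prize now",
    "Claim your free cash reward",
    "Meeting agenda for Monday",
    "Lunch at noon tomorrow",
]
labels = ["spam", "spam", "ham", "ham"]

model = train(docs, labels)
first = predict(model, "free prize for you")
second = predict(model, "agenda for lunch meeting")
print(first, second)
```

Passing remove_stopwords=False to both train and predict gives a simple way to compare one preprocessing choice; on a real corpus the comparison would be made on a held-out test set using accuracy, precision, and recall.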

2. Given a dataset of breast cancer patient records, the task is to build a Naïve Bayes
Classifier to predict whether a tumor is malignant or benign based on various features
such as tumor size, texture, and other medical measurements.
Breast Cancer dataset

mean_radius  mean_texture  mean_perimeter  mean_area  mean_smoothness  diagnosis
17.99        10.38         122.8           1001       0.1184           0
20.57        17.77         132.9           1326       0.08474          0
19.69        21.25         130             1203       0.1096           0
11.42        20.38         77.58           386.1      0.1425           0
20.29        14.34         135.1           1297       0.1003           0
12.45        15.7          82.57           477.1      0.1278           0
16.13        20.68         108.1           798.8      0.117            0
19.81        22.15         130             1260       0.09831          0
13.54        14.36         87.46           566.3      0.09779          1
13.08        15.71         85.63           520        0.1075           1
9.504        12.44         60.34           273.9      0.1024           1
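
Because these features are continuous, one common choice is Gaussian Naive Bayes, which models each feature within each class as a normal distribution. The sketch below fits such a model to the eleven rows shown above using only the standard library. The function names are hypothetical, a small constant is added to each variance for numerical stability (an assumption), and the diagnosis values 0 and 1 are treated as opaque class labels because the table does not say what they mean. Eleven rows are far too few for a meaningful model, so this only illustrates the mechanics.

```python
import math
from statistics import mean, pvariance

# Rows from the table: five features followed by the diagnosis label.
DATA = [
    (17.99, 10.38, 122.8, 1001.0, 0.1184, 0),
    (20.57, 17.77, 132.9, 1326.0, 0.08474, 0),
    (19.69, 21.25, 130.0, 1203.0, 0.1096, 0),
    (11.42, 20.38, 77.58, 386.1, 0.1425, 0),
    (20.29, 14.34, 135.1, 1297.0, 0.1003, 0),
    (12.45, 15.7, 82.57, 477.1, 0.1278, 0),
    (16.13, 20.68, 108.1, 798.8, 0.117, 0),
    (19.81, 22.15, 130.0, 1260.0, 0.09831, 0),
    (13.54, 14.36, 87.46, 566.3, 0.09779, 1),
    (13.08, 15.71, 85.63, 520.0, 0.1075, 1),
    (9.504, 12.44, 60.34, 273.9, 0.1024, 1),
]


def fit(rows):
    """Estimate the prior, per-feature mean, and variance for each class."""
    by_class = {}
    for *features, label in rows:
        by_class.setdefault(label, []).append(features)
    model = {}
    for label, samples in by_class.items():
        columns = list(zip(*samples))
        model[label] = {
            "prior": len(samples) / len(rows),
            "means": [mean(col) for col in columns],
            # 1e-9 keeps the variance strictly positive (an assumption).
            "vars": [pvariance(col) + 1e-9 for col in columns],
        }
    return model


def log_gaussian(x, mu, var):
    """Log of the normal density with mean mu and variance var at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def predict(model, features):
    """Return the label with the highest log prior plus log likelihood."""
    scores = {
        label: math.log(p["prior"])
        + sum(log_gaussian(x, m, v)
              for x, m, v in zip(features, p["means"], p["vars"]))
        for label, p in model.items()
    }
    return max(scores, key=scores.get)


model = fit(DATA)
first_row_prediction = predict(model, DATA[0][:-1])
last_row_prediction = predict(model, DATA[-1][:-1])
print(first_row_prediction, last_row_prediction)
```

On the full dataset, the rows would normally be split into training and test sets before measuring accuracy. scikit-learn's GaussianNB class provides a library implementation of the same model.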
