Foundations of Data Science - Unit 6 - Naive Bayes

The document covers the Naive Bayes classifier, a machine learning algorithm. It begins with a review of probability theory and Bayes' theorem, then explains how the Naive Bayes classifier uses conditional independence assumptions to simplify the calculation of class probabilities. The classifier predicts the class with the highest posterior probability given the attributes, estimating the attribute probabilities for each class from training data. Worked examples demonstrate calculating posterior probabilities with Bayes' theorem.


Data Science
Unit 6

Naïve Bayes Classifier


Outline
▪ Supervised Learning
▪ Revision of Probability Theory and Bayes' Theorem
▪ Naïve Bayes Classifier


Notation

▪ P(A) – Probability of an Event A


▪ P(B|A) – Probability of an Event B given Event A
▪ Also called the Conditional Probability of B given A
▪ So what is the probability of you passing the exam if the teacher is angry at you?
P(Passing | Teacher angry) = ?

Conditional Probability
▪ Independent Events – each event is not affected by any other event
▪ Example – tossing a coin
▪ Dependent Events – an event is affected by previous events
▪ Example – drawing marbles from a bag without replacement


Conditional Probability Example


        B           ~B
        C     ~C    C     ~C
A       12    5     9     2
~A      4     8     20    4

▪ P(A | B, C) = 12 / (12 + 4) = 12/16
▪ P(A, B | ~C) = 5 / (5 + 8 + 2 + 4) = 5/19
▪ P(B | ~A, C) = 4 / (4 + 20) = 4/24

Conditional Independence
▪ Two events A and B are independent if knowing that A has
happened does not say anything about B happening.
P(A, B) = P(A) P(B)
P(A | B) = P(A)
▪ Two events A and B are conditionally independent given a
third event C precisely if the occurrence or non-
occurrence of A and B are independent events in their
conditional probability distribution given C.
P(A, B | C) = P(A | C) P(B | C)
P(A | B, C) = P(A | C)


Bayes Theorem
▪ P(A | B) = P(B | A) P(A) / P(B)
           = P(B | A) P(A) / [P(B | A) P(A) + P(B | ~A) P(~A)]
▪ P(A) is the prior probability and P(A | B) is the posterior probability.

▪ Suppose events A1, A2, …, Ak are mutually exclusive and exhaustive; i.e., exactly one of the events must occur. Then for any event B:
P(Ai | B) = P(B | Ai) P(Ai) / Σj P(B | Aj) P(Aj)
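
The expansion of the denominator is just the law of total probability. A minimal sketch in Python (the function name and signature are illustrative, not from the lecture):

    def posterior(prior_a, p_b_given_a, p_b_given_not_a):
        """Bayes' theorem: P(A | B) from P(A), P(B | A), and P(B | ~A)."""
        # Denominator via the law of total probability:
        # P(B) = P(B | A) P(A) + P(B | ~A) P(~A)
        p_b = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
        return p_b_given_a * prior_a / p_b

The two examples that follow can be checked by calling this function with the given numbers.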


Example I
▪ According to the American Lung Association, 7% of the population has lung disease. Of those having lung disease, 90% are smokers; of those not having lung disease, 25.3% are smokers.
▪ Determine the probability that a randomly selected smoker has lung disease.


Example I Solution
▪ Let L = Lung Disease, S = Smoker
▪ Given that
▪ P(L) = 0.07
▪ P(S | L) = 0.90 P(~S | L) = 0.10
▪ P(S | ~L) = 0.253 P(~S | ~L) = 0.747

▪ Find probability, P(L | S)
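
Working it through (arithmetic from the given numbers):
P(S) = P(S | L) P(L) + P(S | ~L) P(~L) = (0.90)(0.07) + (0.253)(0.93) ≈ 0.298
P(L | S) = P(S | L) P(L) / P(S) = 0.063 / 0.298 ≈ 0.21
So roughly 21% of smokers have lung disease.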


Example II
▪ Assume that about 1 in 1000 individuals in a given organization have
committed a security violation.
▪ Assume that the sensitivity of a routine screening polygraph is about
85%. That is, the probability that the polygraph report will indicate a
concern is about 85% if the individual has committed a security
violation.
▪ Assume the specificity of the polygraph is about 80%. That is, if the
individual has not committed a security violation, there is about an 80%
chance that the polygraph report will not indicate a concern.
▪ What is the posterior probability that an individual whose polygraph
report indicates a concern has committed a security violation?


Example II Solution
▪ Let
▪ S = Security Violation Committed,
▪ T = Test Positive

▪ Given that
▪ P(S) = 0.001
▪ P(T | S) = 0.85 P(~T | S) = 0.15
▪ P(T | ~S) = 0.20 P(~T | ~S) = 0.80

▪ Find probability, P(S | T)
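
Working it through:
P(T) = P(T | S) P(S) + P(T | ~S) P(~S) = (0.85)(0.001) + (0.20)(0.999) ≈ 0.2007
P(S | T) = 0.00085 / 0.2007 ≈ 0.0042
So despite the polygraph's good sensitivity and specificity, only about 0.4% of flagged individuals have actually committed a violation, because the prior is so small.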


Bayesian Classifiers
▪ Consider each attribute and class label as random variables
▪ Given a record with attributes (A1, A2,…,An)
▪ Goal is to predict class C
▪ Specifically, we want to find the value of C that maximizes P(C| A1, A2,…,An )
▪ Can we estimate P(C| A1, A2,…,An ) directly from data?


Bayesian Classifiers
▪ Approach:
▪ Compute the posterior probability P(C | A1, A2, …, An) for all values of C using Bayes' theorem
▪ Choose the value of C that maximizes P(C | A1, A2, …, An)
▪ This is equivalent to choosing the value of C that maximizes P(A1, A2, …, An | C) P(C), since the denominator P(A1, A2, …, An) is the same for every class
▪ How to estimate P(A1, A2, …, An | C)?


Naïve Bayes Classifier


▪ Naïve Bayes classifiers assume that the effect of an attribute value
on a given class is independent of the values of the other
attributes.
▪ This assumption is called class conditional independence.
▪ It is made to simplify the computations involved and, in this sense,
is considered “naïve”.
▪ Remember:
▪ Two events A and B are conditionally independent given a third event C precisely if the
occurrence or non-occurrence of A and B are independent events in their conditional
probability distribution given C.
P(A, B | C) = P(A | C) P(B | C)


Naïve Bayes Classifier

▪ Under this assumption, P(A1, A2, …, An | C) = P(A1 | C) P(A2 | C) … P(An | C)
▪ The classifier therefore predicts the class C that maximizes P(C) P(A1 | C) P(A2 | C) … P(An | C)

How to Estimate Probabilities from Data?

Training data (Refund and Marital Status are categorical, Taxable Income is continuous, Evade is the class):

Tid  Refund  Marital Status  Taxable Income  Evade
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

▪ Class priors: P(C) = Nc / N
▪ e.g., P(No) = 7/10, P(Yes) = 3/10
▪ For discrete attributes: P(Ai | Ck) = |Aik| / Nck
▪ where |Aik| is the number of instances having attribute value Ai and belonging to class Ck
▪ Examples:
P(Status=Married | No) = 4/7
P(Refund=Yes | Yes) = 0

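A minimal sketch of these counting estimates in Python (data transcribed from the table above; the continuous Taxable Income column is omitted, since it would need e.g. a Gaussian model rather than counts; all names are illustrative):

    from collections import Counter

    # (Refund, Marital Status, Evade) for the 10 training records
    records = [
        ("Yes", "Single", "No"), ("No", "Married", "No"),
        ("No", "Single", "No"), ("Yes", "Married", "No"),
        ("No", "Divorced", "Yes"), ("No", "Married", "No"),
        ("Yes", "Divorced", "No"), ("No", "Single", "Yes"),
        ("No", "Married", "No"), ("No", "Single", "Yes"),
    ]

    # Class priors: P(C) = Nc / N
    class_counts = Counter(evade for _, _, evade in records)
    priors = {c: n / len(records) for c, n in class_counts.items()}

    # Discrete conditional probability: P(Ai | Ck) = |Aik| / Nck
    def conditional(attr_index, value, cls):
        matches = sum(1 for r in records if r[attr_index] == value and r[2] == cls)
        return matches / class_counts[cls]

    print(priors)                            # {'No': 0.7, 'Yes': 0.3}
    print(conditional(1, "Married", "No"))   # 4/7 ≈ 0.571
    print(conditional(0, "Yes", "Yes"))      # 0.0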

Naïve Bayes
Classification: Mammals vs. Non-mammals

Name           Give Birth  Can Fly  Live in Water  Have Legs  Class
human          yes         no       no             yes        mammals
python         no          no       no             no         non-mammals
salmon         no          no       yes            no         non-mammals
whale          yes         no       yes            no         mammals
frog           no          no       sometimes      yes        non-mammals
komodo         no          no       no             yes        non-mammals
bat            yes         yes      no             yes        mammals
pigeon         no          yes      no             yes        non-mammals
cat            yes         no       no             yes        mammals
leopard shark  yes         no       yes            no         non-mammals
turtle         no          no       sometimes      yes        non-mammals
penguin        no          no       sometimes      yes        non-mammals
porcupine      yes         no       no             yes        mammals
eel            no          no       yes            no         non-mammals
salamander     no          no       sometimes      yes        non-mammals
gila monster   no          no       no             yes        non-mammals
platypus       no          no       no             yes        mammals
owl            no          yes      no             yes        non-mammals
dolphin        yes         no       yes            no         mammals
eagle          no          yes      no             yes        non-mammals

▪ Train the model (learn the parameters) using the given data set.
▪ Apply the learned model to new cases, e.g.:

Give Birth  Can Fly  Live in Water  Have Legs  Class
yes         no       yes            no         ?


Naïve Bayes
Classification: Mammals vs. Non-mammals

Using the same training data (7 mammals, 13 non-mammals out of 20), let A denote the attributes of the new case (Give Birth = yes, Can Fly = no, Live in Water = yes, Have Legs = no), M = mammals, N = non-mammals:

P(A | M) = 6/7 × 6/7 × 2/7 × 2/7 ≈ 0.06
P(A | N) = 1/13 × 10/13 × 3/13 × 4/13 ≈ 0.0042

P(A | M) P(M) = 0.06 × 7/20 = 0.021
P(A | N) P(N) = 0.0042 × 13/20 ≈ 0.0027

P(A | M) P(M) > P(A | N) P(N) => Mammals
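The same arithmetic, sketched in Python (counts transcribed from the table; illustrative only):

    # Likelihoods for the query: Give Birth=yes, Can Fly=no, Live in Water=yes, Have Legs=no
    p_a_given_m = (6/7) * (6/7) * (2/7) * (2/7)       # ≈ 0.0600
    p_a_given_n = (1/13) * (10/13) * (3/13) * (4/13)  # ≈ 0.0042

    score_m = p_a_given_m * 7 / 20    # ≈ 0.0210
    score_n = p_a_given_n * 13 / 20   # ≈ 0.0027
    print("mammals" if score_m > score_n else "non-mammals")  # mammals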

Example: Play Tennis

Outlook   Temperature  Humidity  Windy  Class
sunny     hot          high      false  N
sunny     hot          high      true   N
overcast  hot          high      false  P
rain      mild         high      false  P
rain      cool         normal    false  P
rain      cool         normal    true   N
overcast  cool         normal    true   P
sunny     mild         high      false  N
sunny     cool         normal    false  P
rain      mild         normal    false  P
sunny     mild         normal    true   P
overcast  mild         high      true   P
overcast  hot          normal    false  P
rain      mild         high      true   N

Estimated probabilities:

outlook:      P(sunny|p) = 2/9     P(sunny|n) = 3/5
              P(overcast|p) = 4/9  P(overcast|n) = 0
              P(rain|p) = 3/9      P(rain|n) = 2/5
temperature:  P(hot|p) = 2/9       P(hot|n) = 2/5
              P(mild|p) = 4/9      P(mild|n) = 2/5
              P(cool|p) = 3/9      P(cool|n) = 1/5
humidity:     P(high|p) = 3/9      P(high|n) = 4/5
              P(normal|p) = 6/9    P(normal|n) = 2/5
windy:        P(true|p) = 3/9      P(true|n) = 3/5
              P(false|p) = 6/9     P(false|n) = 2/5

Class priors: P(P) = 9/14, P(N) = 5/14

New case:

Outlook  Temperature  Humidity  Windy  Class
rain     hot          high      false  ?
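
Working through the new case (arithmetic from the probabilities above):
P(X | p) P(p) = 3/9 × 2/9 × 3/9 × 6/9 × 9/14 ≈ 0.0106
P(X | n) P(n) = 2/5 × 2/5 × 4/5 × 2/5 × 5/14 ≈ 0.0183
Since 0.0183 > 0.0106, the new case is classified as N (don't play).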


Characteristics of Naïve Bayes Classifier


▪ They are robust to isolated noise points, because such points are averaged out when conditional probabilities are estimated from the data.
▪ Naïve Bayes classifiers can handle missing values by ignoring the example during model building and classification.
▪ They are robust to irrelevant attributes: if Xi is an irrelevant attribute, then P(Xi | Y) becomes almost uniformly distributed.
▪ Correlated attributes can degrade performance, because the conditional independence assumption no longer holds for such attributes.


How Effective are Bayesian Classifiers?

▪ Various empirical studies comparing this classifier with decision tree and neural network classifiers have found it to be comparable in some domains.
▪ In theory, Bayesian classifiers have the minimum error rate in comparison with all other classifiers.
▪ In practice, however, this is not always the case, owing to inaccuracies in the assumptions made for its use, such as class conditional independence, and to the lack of available probability data.
