Unit 2 Bayesian Learning Bayes Theorem and Bayes Optimal Classifier
Unit 2
Dr. Sushil Kumar
Associate Professor
Topics
• Bayes theorem
• Conditional Probability
• Posterior Probability
• Likelihood
• Prior Probability
• Marginal Probability
• Bayes Optimal Classifier
• Naïve Bayes classifier
• Bayesian belief networks
• EM (Expectation-Maximization) algorithm
Bayes Theorem (presented to the Royal Society of London in 1763)
Thomas Bayes (1702–1761)

P(A|B) = P(B|A) · P(A) / P(B)

Posterior Probability P(A|B): probability of an event A given that an event B has already occurred.
Likelihood P(B|A): probability of an event B given that an event A has already occurred.

Expanded form:
P(A|B) = P(B|A) · P(A) / [P(B|A) · P(A) + P(B|¬A) · P(¬A)]

Proof of Bayes Theorem
By the definition of conditional probability,
P(A|B) = P(A ∩ B) / P(B) and P(B|A) = P(A ∩ B) / P(A).
Equating P(A ∩ B) from the two equations gives P(A|B) · P(B) = P(B|A) · P(A), and dividing both sides by P(B) yields Bayes theorem.
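The theorem above can be checked numerically with a small helper (a sketch; the function name `bayes_posterior` is my own, not from the slides):

```python
def bayes_posterior(prior_a, likelihood_b_given_a, marginal_b):
    """Bayes theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood_b_given_a * prior_a / marginal_b

# Toy check: with P(A)=0.5, P(B|A)=0.8, P(B)=0.4, the posterior is 0.8*0.5/0.4 = 1.0
print(bayes_posterior(0.5, 0.8, 0.4))  # 1.0
```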
Conditional Probability
• The probability of an event A given that another event B has occurred is termed the conditional probability: P(A|B) = P(A ∩ B) / P(B), defined when P(B) > 0.
Example 1
A factory uses two machines. Machine A produces 40% of the bottles and Machine B the remaining 60%. Of their output, 2% of Machine A's bottles and 5% of Machine B's bottles are defective.
Now, suppose you pick a bottle at random and find that it is defective. What is the probability that it was produced by Machine A?
Solution:
P(A)=0.40
P(B)=0.60
P(D|A)=0.02
P(D|B)=0.05
P(D) = P(D|A)·P(A) + P(D|B)·P(B)
     = 0.02×0.40 + 0.05×0.60 = 0.008 + 0.030 = 0.038
P(A|D) = P(D|A)·P(A) / P(D) = 0.008 / 0.038 ≈ 0.2105
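The arithmetic above can be reproduced in a few lines (a sketch using the quantities from the example; variable names are my own):

```python
# Bottle example: priors are the machines' production shares,
# likelihoods are their defect rates.
p_a, p_b = 0.40, 0.60        # P(A), P(B)
p_d_a, p_d_b = 0.02, 0.05    # P(D|A), P(D|B)

p_d = p_d_a * p_a + p_d_b * p_b   # total probability of a defect: 0.038
p_a_d = p_d_a * p_a / p_d         # Bayes theorem: P(A|D)
print(round(p_d, 3), round(p_a_d, 4))  # 0.038 0.2105
```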
Example 2
A bag I contains 4 white and 6 black balls while another Bag II
contains 4 white and 3 black balls. One ball is drawn at random
from one of the bags, and it is found to be black. Find the
probability that it was drawn from Bag I.
Solution:
Bag I: n(W)=4, n(B)=6; Bag II: n(W)=4, n(B)=3
P(B | I) = 6/10 = 3/5
P(B | II) = 3/7
P(I) = P(II) = 1/2
P(B) = P(I)·P(B | I) + P(II)·P(B | II) = (1/2)(3/5) + (1/2)(3/7) = 3/10 + 3/14 = 18/35
P(I | B) = P(I)·P(B | I) / P(B) = (3/10) / (18/35) = 7/12
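The fractions in this example can be verified exactly with Python's `fractions` module (a sketch; variable names are my own):

```python
from fractions import Fraction as F

# Two-bag example: a bag is chosen uniformly, then a ball is drawn.
p_black_bag1 = F(6, 10)  # P(Black | Bag I)
p_black_bag2 = F(3, 7)   # P(Black | Bag II)
prior = F(1, 2)          # each bag equally likely

p_black = prior * p_black_bag1 + prior * p_black_bag2   # total probability
p_bag1_given_black = prior * p_black_bag1 / p_black     # Bayes theorem
print(p_black, p_bag1_given_black)  # 18/35 7/12
```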
General Bayes Theorem
For mutually exclusive and exhaustive events A1, A2, …, An and observed evidence/data B:
P(Ai | B) = P(B | Ai) · P(Ai) / Σj P(B | Aj) · P(Aj)
Example
A company has three machines M1, M2, and M3 that produce bulbs. The
machines contribute to the total production as follows:
Machine M1 produces 30% of the bulbs.
Machine M2 produces 50% of the bulbs.
Machine M3 produces 20% of the bulbs.
The machines produce defective bulbs at the following rates:
Machine M1 produces defective bulbs 2% of the time.
Machine M2 produces defective bulbs 3% of the time.
Machine M3 produces defective bulbs 1% of the time.
A bulb is picked at random, and it turns out to be defective. What is the
probability that it was produced by Machine M1, Machine M2, or Machine
M3?
Solution:
P(D) = P(D|M1)·P(M1) + P(D|M2)·P(M2) + P(D|M3)·P(M3)
     = 0.02×0.30 + 0.03×0.50 + 0.01×0.20 = 0.006 + 0.015 + 0.002 = 0.023
P(M1|D) = 0.006/0.023 ≈ 0.2609
P(M2|D) = 0.015/0.023 ≈ 0.6522
P(M3|D) = 0.002/0.023 ≈ 0.0870
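The three-machine case is exactly the general Bayes theorem with n = 3; a small sketch with the example's numbers (variable names are my own):

```python
# General Bayes theorem over the three machines.
priors = {"M1": 0.30, "M2": 0.50, "M3": 0.20}        # production shares
defect_rate = {"M1": 0.02, "M2": 0.03, "M3": 0.01}   # P(D | Mi)

# Evidence P(D) = sum over machines of P(D|Mi) * P(Mi)
evidence = sum(defect_rate[m] * priors[m] for m in priors)
posteriors = {m: defect_rate[m] * priors[m] / evidence for m in priors}

for m in ("M1", "M2", "M3"):
    print(m, round(posteriors[m], 4))
```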
Bayes theorem on a dataset
• A dataset of emails, where each email is labeled as spam or not spam, records whether the word "offer" appears in the email. The dataset is as follows:

Email | Spam | Contains "offer"
  1   | Yes  | Yes
  2   | No   | Yes
  3   | Yes  | No
  4   | No   | Yes
  5   | Yes  | Yes
  6   | No   | No
  7   | No   | No
  8   | Yes  | Yes

Calculate the probability that an email is spam, given that it contains the word "offer".
Solution:
P(Spam) = 4/8 = 1/2 (4 spam emails — 1, 3, 5, and 8 — out of all 8 emails)
P("offer" | Spam) = 3/4 (of the 4 spam emails, emails 1, 5, and 8 contain "offer")
P("offer") = 5/8 (5 emails — 1, 2, 4, 5, and 8 — contain the word "offer")
P(Spam | "offer") = P("offer" | Spam) · P(Spam) / P("offer") = (3/4 × 1/2) / (5/8) = (3/8) / (5/8) = 3/5 = 0.6
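The same answer can be obtained by counting rows of the table directly (a sketch; the tuple encoding of the dataset is my own):

```python
# Each row is (spam, contains_offer), in the order of emails 1..8.
emails = [
    (True, True), (False, True), (True, False), (False, True),
    (True, True), (False, False), (False, False), (True, True),
]

n_offer = sum(1 for spam, offer in emails if offer)                 # 5 emails
n_spam_and_offer = sum(1 for spam, offer in emails if spam and offer)  # 3 emails

# P(Spam | "offer") = P(Spam and "offer") / P("offer") = 3/5
p_spam_given_offer = n_spam_and_offer / n_offer
print(p_spam_given_offer)  # 0.6
```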
Bayes Optimal Classifier
• Bayes Optimal Classifier is a theoretical model that provides the best
possible classification for a given problem by minimizing the probability
of misclassification.
• It uses Bayes' Theorem to calculate the posterior probabilities for each
possible class label given the evidence (or feature values), and then
selects the class with the highest posterior probability.
• The classifier is considered optimal because it maximizes the probability
of making the correct classification decision, provided that all the
probabilities (priors, likelihoods, etc.) are known exactly.
• However, in practice, it's often difficult to know these probabilities, so
the Bayes Optimal Classifier is usually used as a theoretical benchmark
rather than an actual classifier.
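The decision rule described above can be sketched as an argmax over class posteriors, assuming the true priors and likelihoods are known exactly (they rarely are in practice; the function and the toy probability table below are my own illustration, with numbers taken from the "offer" example):

```python
def bayes_optimal_class(x, classes, prior, likelihood):
    """Return the class c maximizing P(c) * P(x | c).

    The denominator P(x) is the same for every class, so it can be ignored.
    """
    return max(classes, key=lambda c: prior[c] * likelihood(x, c))

# Toy usage: x is True if the email contains "offer".
prior = {"spam": 0.5, "ham": 0.5}

def likelihood(x, c):
    # Hand-specified table: P("offer"|spam)=3/4, P("offer"|ham)=2/4.
    table = {("spam", True): 0.75, ("spam", False): 0.25,
             ("ham", True): 0.50, ("ham", False): 0.50}
    return table[(c, x)]

print(bayes_optimal_class(True, ["spam", "ham"], prior, likelihood))  # spam
```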
Formal Definition
For an instance with feature values x and a set of classes C, the Bayes optimal classifier predicts
c* = argmax_{c ∈ C} P(c | x) = argmax_{c ∈ C} P(x | c) · P(c),
i.e., the class with the highest posterior probability given the evidence.
Example: Bayes Optimal Classifier with a dataset
Consider the following dataset where we have several features for emails
(like whether they contain the words "free," "offer," or "win") and labels
(whether the email is spam or not spam).
The two class scores (posterior numerators) evaluate to 0.09375 and 0.07031. Both posteriors share the same denominator — the probability of the observed features — so the comparison reduces to these numerators, and the class whose score is 0.09375 is selected.