2024 - Slide2 - BayesML Sub
Today we learn:
• Bayesian classification
– E.g. How to decide if a patient is ill or healthy,
based on
• A probabilistic model of the observed data
• Prior knowledge
Classification problem
• Training data: examples of the form (d,h(d))
– where d are the data objects to classify (inputs)
– and h(d) is the correct class label for d, $h(d) \in \{1, \ldots, K\}$
• Goal: given $d_{new}$, provide $h(d_{new})$
Why Bayesian?
• Provides practical learning algorithms
– E.g. Naïve Bayes
• Prior knowledge and observed data can be
combined
• It is a generative (model-based) approach, which offers a useful conceptual framework
– E.g. sequences could also be classified, based on
a probabilistic model specification
– Any kind of objects can be classified, based on a
probabilistic model specification
Bayes’ Rule
Understanding Bayes’ rule:
$$P(h \mid d) = \frac{P(d \mid h)\,P(h)}{P(d)}$$
– d: data, h: hypothesis (model)
– rearranging: $P(h \mid d)\,P(d) = P(d \mid h)\,P(h)$
– i.e. both sides equal the same joint probability $P(d, h)$
Who is who in Bayes’ rule
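To ground the rule in the deck’s opening example (deciding whether a patient is ill), here is a minimal sketch in Python; the prior and test accuracies below are illustrative assumptions, not numbers from the slides:

# Bayes' rule for diagnosis: P(ill | positive test)
# All numbers are illustrative assumptions.
p_ill = 0.01                 # prior P(ill)
p_pos_given_ill = 0.95       # likelihood P(positive | ill)
p_pos_given_healthy = 0.05   # P(positive | healthy)

# Total probability of the evidence: P(positive)
p_pos = p_pos_given_ill * p_ill + p_pos_given_healthy * (1 - p_ill)

# Posterior: P(ill | positive) = P(positive | ill) * P(ill) / P(positive)
p_ill_given_pos = p_pos_given_ill * p_ill / p_pos
print(round(p_ill_given_pos, 3))  # ~0.161: with a rare disease, one positive test is weak evidence

Note how the small prior P(ill) dominates: this is exactly the combination of prior knowledge and observed data that the slides advertise.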
Choosing Hypotheses
• Maximum Likelihood hypothesis:
$$h_{ML} = \arg\max_{h \in H} P(d \mid h)$$
• In Bayes’ rule, $P(x \mid e) \propto P(e \mid x)\,P(x)$: posterior ∝ likelihood × prior
$$h_{NB} = \arg\max_{h} P(h)\,P(\mathbf{x} \mid h) = \arg\max_{h} P(h) \prod_{t} P(a_t \mid h)$$
$$= \arg\max_{h \in \{yes,\,no\}} P(h)\,P(Outlook{=}sunny \mid h)\,P(Temp{=}cool \mid h)\,P(Humidity{=}high \mid h)\,P(Wind{=}strong \mid h)$$
• Working:
$P(PlayTennis = yes) = 9/14 = 0.64$
$P(PlayTennis = no) = 5/14 = 0.36$
$P(Wind = strong \mid PlayTennis = yes) = 3/9 = 0.33$
$P(Wind = strong \mid PlayTennis = no) = 3/5 = 0.60$
etc.
$P(yes)\,P(sunny \mid yes)\,P(cool \mid yes)\,P(high \mid yes)\,P(strong \mid yes) = 0.0053$
$P(no)\,P(sunny \mid no)\,P(cool \mid no)\,P(high \mid no)\,P(strong \mid no) = 0.0206$
answer: $PlayTennis(x) = no$
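The same argmax is a few lines of code. A minimal sketch: the conditional probabilities for sunny, cool and high are filled in from Mitchell’s PlayTennis data, which these slides follow (they reproduce the 0.0053 and 0.0206 totals above):

# Naive Bayes decision for x = (sunny, cool, high, strong)
priors = {"yes": 9/14, "no": 5/14}
cond = {  # P(attribute value | class), estimated from the training counts
    "yes": {"sunny": 2/9, "cool": 3/9, "high": 3/9, "strong": 3/9},
    "no":  {"sunny": 3/5, "cool": 1/5, "high": 4/5, "strong": 3/5},
}

x = ["sunny", "cool", "high", "strong"]
scores = {}
for h in priors:
    score = priors[h]
    for value in x:
        score *= cond[h][value]   # P(h) * product over attributes of P(a_t | h)
    scores[h] = score

print(scores)                       # {'yes': ~0.0053, 'no': ~0.0206}
print(max(scores, key=scores.get))  # 'no'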
Example: Training Dataset
Class:
C1: buys_computer = ‘yes’
C2: buys_computer = ‘no’
Data sample to classify:
X = (age <= 30, income = medium, student = yes, credit_rating = fair)

age    income  student  credit_rating  buys_computer
<=30   high    no       fair           no
<=30   high    no       excellent      no
31…40  high    no       fair           yes
>40    medium  no       fair           yes
>40    low     yes      fair           yes
>40    low     yes      excellent      no
31…40  low     yes      excellent      yes
<=30   medium  no       fair           no
<=30   low     yes      fair           yes
>40    medium  yes      fair           yes
<=30   medium  yes      excellent      yes
31…40  medium  no       excellent      yes
31…40  high    yes      fair           yes
>40    medium  no       excellent      no
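The classification of X is not worked out on the slide; carrying it through the PlayTennis pattern, with counts read directly off the 14 rows above:

$P(yes) = 9/14 = 0.643$, $P(no) = 5/14 = 0.357$
$P(X \mid yes) = P(age{\le}30 \mid yes)\,P(medium \mid yes)\,P(student{=}yes \mid yes)\,P(fair \mid yes) = \frac{2}{9}\cdot\frac{4}{9}\cdot\frac{6}{9}\cdot\frac{6}{9} = 0.044$
$P(X \mid no) = \frac{3}{5}\cdot\frac{2}{5}\cdot\frac{1}{5}\cdot\frac{2}{5} = 0.019$

Since $P(X \mid yes)\,P(yes) = 0.028 > P(X \mid no)\,P(no) = 0.007$, X is classified as buys_computer = ‘yes’.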
Learning to classify text
• Learn from examples which articles are of
interest
• The attributes are the words
• Note that the Naïve Bayes assumption just means we have a random-sequence (bag-of-words) model within each class!
• NB classifiers are one of the most effective for
this task
• Resources for those interested:
– Tom Mitchell: Machine Learning (book) Chapter 6.
Results on a benchmark text corpus
Learning and inference pipeline
[Figure: the learning/inference pipeline.
Learning: Training Samples → Features → Training (with Training Labels) → Learned model.
Inference: Test Sample → Features → Learned model → Prediction.]
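This pipeline can be made concrete with scikit-learn’s Naïve Bayes; a minimal sketch with made-up spam/ham toy data (the slides do not prescribe a library, so treat this as one possible realization):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Learning: training samples -> features -> training (with labels) -> learned model
train_docs = ["cheap pills buy now", "meeting agenda attached",
              "buy cheap now", "see agenda notes"]
train_labels = ["spam", "ham", "spam", "ham"]

vectorizer = CountVectorizer()       # features = word-count vectors
X_train = vectorizer.fit_transform(train_docs)
model = MultinomialNB(alpha=1.0)     # alpha=1.0 is Laplace smoothing (see later slides)
model.fit(X_train, train_labels)

# Inference: test sample -> features -> learned model -> prediction
X_test = vectorizer.transform(["cheap meeting pills"])
print(model.predict(X_test))         # ['spam']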
Feature vectors
• Features should be useful in discriminating
between categories.
Feature Representation
$$\prod_{d=1}^{D} \prod_{i=1}^{n_d} P(w_{d,i} \mid class_{d,i})$$
– d: index of training document, i: index of a word; D: number of training documents, $n_d$: number of words in document d
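A sketch of estimating these word probabilities by counting, over a tiny made-up corpus (no smoothing yet; that is what the next slides add):

from collections import Counter, defaultdict

docs = [("buy cheap pills", "spam"), ("agenda for meeting", "ham")]

word_counts = defaultdict(Counter)   # class -> Counter of word occurrences
totals = Counter()                   # class -> total number of words
for text, c in docs:
    words = text.split()
    word_counts[c].update(words)
    totals[c] += len(words)

def p_word_given_class(w, c):
    # relative-frequency estimate: count(w, c) / total words in class c
    return word_counts[c][w] / totals[c]

print(p_word_given_class("cheap", "spam"))  # 1/3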
Parameter estimation
• Parameter estimate (relative frequency): $P_{ML}(x) = \dfrac{c(x)}{N}$, where $c(x)$ is the count of outcome x in N observations
• Laplace’s estimate:
– Pretend you saw every outcome once more than you actually did:
$$P_{LAP}(x) = \frac{c(x) + 1}{N + |X|}$$
– E.g. after observing r, r, b: $P_{ML}(r) = 2/3$ but $P_{LAP}(r) = (2+1)/(3+2) = 3/5$
Laplace Smoothing
• Laplace’s estimate (extended):
– Pretend you saw every outcome k extra times:
$$P_{LAP,k}(x) = \frac{c(x) + k}{N + k\,|X|}$$
– k = 1 recovers Laplace’s estimate above; larger k pulls the estimate further toward uniform
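Both estimates in code, using the r, r, b sample from the example on these slides:

from collections import Counter

sample = ["r", "r", "b"]
counts = Counter(sample)
n = len(sample)
domain = set(sample)   # assume the outcome space is {r, b}

def p_ml(x):
    # relative-frequency estimate: c(x) / N
    return counts[x] / n

def p_laplace(x, k=1):
    # pretend every outcome was seen k extra times: (c(x) + k) / (N + k|X|)
    return (counts[x] + k) / (n + k * len(domain))

print(p_ml("r"))            # 2/3 ~ 0.667
print(p_laplace("r"))       # (2+1)/(3+2) = 0.6
print(p_laplace("r", 100))  # 102/203 ~ 0.502, pulled toward uniform 0.5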