Lecture 13
Examples and the generalization patterns they elicit:
• 60: diffuse similarity
• 60 80 10 30: rule (“multiples of 10”)
• 60 52 57 55: focused similarity (numbers near 50-60)
Normalization: Σ_x p(X = x | Y = y) = 1
Prior: p(h)
• Choice of hypothesis space embodies a strong prior:
effectively, p(h) ~ 0 for many logically possible but
conceptually unnatural hypotheses.
• Do we need this? Why not allow all logically possible
hypotheses, with uniform priors, and let the data sort
them out (via the likelihood)?
• Prevents overfitting by highly specific but unnatural
hypotheses, e.g. “multiples of 10 except 50 and 70” (see the
likelihood sketch below).
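To see why the prior has to do this work, compare likelihoods under the number game’s size-principle likelihood, p(X | h) = (1/|h|)^n for n examples drawn from h. A minimal Python sketch, using the example set 60 80 10 30 from above (the hypothesis definitions are spelled out here for illustration):

```python
# Size-principle likelihood: p(X | h) = (1/|h|)^n if all n examples lie in h, else 0.
def likelihood(examples, hypothesis):
    if not all(x in hypothesis for x in examples):
        return 0.0
    return (1.0 / len(hypothesis)) ** len(examples)

mult_10 = {x for x in range(1, 101) if x % 10 == 0}  # "multiples of 10" (10 numbers)
gerrymandered = mult_10 - {50, 70}                   # "multiples of 10 except 50 and 70" (8 numbers)

X = [60, 80, 10, 30]
print(likelihood(X, mult_10))        # (1/10)**4 = 1.0e-4
print(likelihood(X, gerrymandered))  # (1/8)**4  ≈ 2.4e-4: higher likelihood!
```

The unnatural hypothesis fits the data strictly better, so only a near-zero prior keeps it from dominating.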
H1: Math properties (24 hypotheses)
• even numbers
• powers of two
• multiples of three
• …
p(h) = p(H1) / 24

H2: Raw magnitude (5050 hypotheses)
• 10-15
• 20-32
• 37-54
• …
p(h) = p(H2) / 5050

H3: Approx. magnitude (10 hypotheses)
• 10-20
• 20-30
• 30-40
• …
p(h) = p(H3) / 10
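One way to read this table as code is a two-level prior: pick a hypothesis class H_k with weight p(H_k), then divide that weight uniformly among the hypotheses in the class. The class weights below are assumptions, chosen so the per-hypothesis priors roughly match the figures quoted on the next slide:

```python
# Two-level prior: p(h) = p(H_k) / |H_k| for each hypothesis h in class H_k.
# Class sizes are from the table; class weights are assumed for illustration.
classes = {
    "H1: math properties":   {"weight": 0.2, "size": 24},
    "H2: raw magnitude":     {"weight": 0.6, "size": 5050},  # all intervals [a, b], 1 <= a <= b <= 100
    "H3: approx. magnitude": {"weight": 0.2, "size": 10},
}

for name, c in classes.items():
    print(f"{name}: p(h) = {c['weight'] / c['size']:.6f}")
# H1: 0.2/24   ≈ 1/120
# H2: 0.6/5050 ≈ 1/8417 (≈ 1/8500)
# H3: 0.2/10   = 1/50
```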
• p(h) encodes relative plausibility of alternative theories:
– Mathematical properties: p(h) ~ 1/120
– Approximate magnitude: p(h) ~ 1/50
– Raw magnitude: p(h) ~ 1/8500 (on average)
[Figure: distribution p(s) plotted as a function of s]
Generalizing to new objects
p(A = a | B = b) = Σ_z p(A = a | Z = z, B = b) p(Z = z | B = b)

…especially useful if A and B are independent conditioned on Z:

p(A = a | B = b) = Σ_z p(A = a | Z = z) p(Z = z | B = b)
Hypothesis averaging
p(A = a | B = b) = Σ_z p(A = a | Z = z, B = b) p(Z = z | B = b)

…especially useful if A and B are independent conditioned on Z:

p(A = a | B = b) = Σ_z p(A = a | Z = z) p(Z = z | B = b)
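The first identity is just the law of total probability applied under conditioning. A quick numeric check on an arbitrary, made-up joint distribution over three binary variables:

```python
import itertools

# Arbitrary (made-up) joint distribution p(A, B, Z) over binary variables.
weights = {abz: w for abz, w in zip(itertools.product((0, 1), repeat=3),
                                    (3, 1, 2, 2, 1, 4, 2, 1))}
total = sum(weights.values())
p = {abz: w / total for abz, w in weights.items()}

def marginal(a=None, b=None, z=None):
    """Sum p(A, B, Z) over entries matching the given values (None = sum out)."""
    return sum(pr for (ai, bi, zi), pr in p.items()
               if (a is None or ai == a) and (b is None or bi == b) and (z is None or zi == z))

lhs = marginal(a=1, b=1) / marginal(b=1)  # p(A=1 | B=1)
rhs = sum((marginal(a=1, b=1, z=z) / marginal(b=1, z=z)) *  # p(A=1 | Z=z, B=1)
          (marginal(b=1, z=z) / marginal(b=1))              # p(Z=z | B=1)
          for z in (0, 1))
print(lhs, rhs)  # equal up to floating-point rounding
```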
Another example: what is the probability that the Republican will
win the election, given that the weatherman predicts rain?
p(Republican wins | Weather report: “Rain storm”) =
Σ_{w ∈ weather conditions} p(Republican wins | W = w) p(W = w | Weatherman says “rain”)
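Plugging in made-up numbers (none of these probabilities are from the lecture) shows how the weather report influences the prediction only through the distribution it induces over the actual weather:

```python
# Hypothetical conditional probabilities, for illustration only.
p_win_given_weather = {"rain": 0.60, "no_rain": 0.45}   # p(Republican wins | W = w)
p_weather_given_report = {"rain": 0.80, "no_rain": 0.20}  # p(W = w | report: "rain storm")

p_win = sum(p_win_given_weather[w] * p_weather_given_report[w]
            for w in ("rain", "no_rain"))
print(p_win)  # 0.60*0.80 + 0.45*0.20 = 0.57
```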
Generalizing to new objects
Hypothesis averaging:
Compute the probability that C applies to some
new object y by averaging the predictions of all
hypotheses h, weighted by p(h|X):
p(y ∈ C | X) = Σ_{h∈H} p(y ∈ C | h) p(h | X)

where p(y ∈ C | h) = 1 if y ∈ h and 0 if y ∉ h, so this reduces to

p(y ∈ C | X) = Σ_{h ⊇ {y, X}} p(h | X)
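A compact sketch of the whole model: the size-principle likelihood, a prior over a toy hypothesis space (a handful of hypotheses standing in for the lecture’s much larger space; the names, extensions, and weights are all illustrative assumptions), Bayes’ rule for p(h | X), and hypothesis averaging for p(y ∈ C | X):

```python
NUMBERS = range(1, 101)

# Toy hypothesis space: (name, extension, prior weight). Illustrative only.
HYPOTHESES = [
    ("even numbers",    {x for x in NUMBERS if x % 2 == 0},  0.15),
    ("odd numbers",     {x for x in NUMBERS if x % 2 == 1},  0.15),
    ("multiples of 10", {x for x in NUMBERS if x % 10 == 0}, 0.15),
    ("powers of two",   {2, 4, 8, 16, 32, 64},               0.15),
    ("numbers 10-30",   set(range(10, 31)),                  0.20),
    ("numbers 50-70",   set(range(50, 71)),                  0.20),
]

def posterior(examples):
    """p(h | X) ∝ p(X | h) p(h), with the size-principle likelihood (1/|h|)^n."""
    scores = {}
    for name, ext, prior in HYPOTHESES:
        consistent = all(x in ext for x in examples)
        scores[name] = prior * (1.0 / len(ext)) ** len(examples) if consistent else 0.0
    z = sum(scores.values())
    return {name: s / z for name, s in scores.items()}

def p_in_concept(y, examples):
    """Hypothesis averaging: sum p(h | X) over hypotheses h that contain y."""
    post = posterior(examples)
    return sum(post[name] for name, ext, _ in HYPOTHESES if y in ext)

print(p_in_concept(20, [60, 80, 10, 30]))  # ≈ 1: "multiples of 10" dominates the posterior
print(p_in_concept(20, [60]))              # ≈ 0.65: the posterior is still diffuse
```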
[Figures: generalization predictions for three example sets]
Examples: 16
Examples: 16 8 2 64
Examples: 16 23 19 20
[Figure: human generalization vs. Bayesian model predictions, one row per example set:
60; 60 80 10 30; 60 52 57 55; 16; 16 8 2 64; 16 23 19 20]
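Continuing the toy model sketched above (same assumed hypothesis space), its predictions reproduce the qualitative pattern in this comparison: rule-like generalization from 60 80 10 30 and 16 8 2 64, interval-like generalization from 60 52 57 55 and 16 23 19 20, and diffuse generalization from a single example:

```python
# Probe the generalization function at a few test numbers for each example set.
probes = (4, 16, 20, 32, 60, 64, 70, 87)
for X in ([60], [60, 80, 10, 30], [60, 52, 57, 55],
          [16], [16, 8, 2, 64], [16, 23, 19, 20]):
    print(X, [round(p_in_concept(y, X), 2) for y in probes])
```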
Summary of the Bayesian model
[Figure: three objects labeled “tufa”, illustrating word learning from a few examples]
Learning rectangle concepts
Bayesian concept learning with tree-structured hypothesis space
Exploring different models
• Different priors?
– More complex language-like hypothesis spaces, allowing
exceptions, compound concepts, and much more…
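As a hint of what a language-like hypothesis space might look like, here is a toy generator that extends a few base rules with “except …” variants, the kind of hypothesis (“multiples of 10 except 50 and 70”) that a flat prior would otherwise let overfit. Everything here is an illustrative assumption, not the lecture’s construction:

```python
from itertools import combinations

# Toy language-like hypothesis space: base rules plus "except ..." variants.
BASE = {
    "multiples of 10": frozenset(x for x in range(1, 101) if x % 10 == 0),
    "powers of two":   frozenset({2, 4, 8, 16, 32, 64}),
}

def with_exceptions(base, max_exceptions=2):
    """Yield each base rule and variants dropping up to max_exceptions elements."""
    for name, ext in base.items():
        yield name, ext
        for k in range(1, max_exceptions + 1):
            for dropped in combinations(sorted(ext), k):
                label = f"{name} except {' and '.join(map(str, dropped))}"
                yield label, ext - set(dropped)

space = dict(with_exceptions(BASE))
print(len(space))  # the space grows combinatorially with allowed exceptions
```

A natural prior over such a space decreases with description length, which is one way to make “multiples of 10” a priori more plausible than “multiples of 10 except 50 and 70”.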