Lecture 13
9.66

Class announcements

• Recitations Th, F 4 PM – 46-3189
  – This week: Review of Basic Bayes
• PSet 1 out today, due Oct 3. Other psets due approximately every two weeks thereafter.
• Classes next week are virtual: We will have a guest lecture from Vikash Mansinghka on Thursday that you can watch asynchronously, and I may give one virtual lecture (depending on where we end up today).
Plan for today

• Basic Bayesian cognition
  – The number game

The number game

60:           Diffuse similarity
60 80 10 30:  Rule: “multiples of 10”
60 52 57 55:  Focused similarity: numbers near 50-60
Main phenomena to explain:
– Generalization can appear either similarity-based (graded) or rule-based (all-or-none).
– Learning from just a few positive examples.
A single unifying account of (number) concept learning?

• We’re going to use this to introduce Bayesian approaches, but first consider ...
  – The “naïve programmer” approach?
  – The “modern neural network” approach?
Traditional (algorithmic level) cognitive models

• Multiple representational systems: rules and similarity
  – Categorization, language (past tense), reasoning
• Questions this leaves open:
  – How does each system work? How far, and in what ways, does it generalize as a function of the examples observed?
• Which rule to choose?
– E.g., X = {60, 80, 10, 30}: multiples of 10 vs. even numbers?
• Which similarity metric?
– E.g., X = {60, 53} vs. {60, 20}?
– Why these two systems?
– When and why does a learner switch between them?
Reverse-engineering a cognitive system:
Marr’s three levels

• Level 1: Computational theory


– What are the inputs and outputs to the computation,
what is its goal, and what is the logic by which it is
carried out?
• Level 2: Representation and algorithm
– How is information represented and processed to
achieve the computational goal?
• Level 3: Hardware implementation
– How is the computation realized in physical or
biological hardware?
Bayesian model
• H: Hypothesis space of possible concepts:
– h1 = {2, 4, 6, 8, 10, 12, …, 96, 98, 100} (“even numbers”)
– h2 = {10, 20, 30, 40, …, 90, 100} (“multiples of 10”)
– h3 = {2, 4, 8, 16, 32, 64} (“powers of 2”)
– h4 = {50, 51, 52, …, 59, 60} (“numbers between 50 and 60”)
– ...

• Representational interpretations for H:
  – Candidate rules
  – Features for similarity
  – “Consequential subsets” (Shepard, 1987)
Three hypothesis subspaces for number
concepts
• Mathematical properties (24 hypotheses):
– Odd, even, square, cube, prime numbers
– Multiples of small integers
– Powers of small integers
• Raw magnitude (5050 hypotheses):
– All intervals of integers with endpoints between 1 and
100.
• Approximate magnitude (10 hypotheses):
– Decades (1-10, 10-20, 20-30, …)
Bayesian model

• H: Hypothesis space of possible concepts:
  – Mathematical properties: even, odd, square, prime, ...
  – Approximate magnitude: {1-10}, {10-20}, {20-30}, ...
  – Raw magnitude: all intervals between 1 and 100.
• X = {x1, ..., xn}: n examples of a concept C.
• Evaluate hypotheses given data:

  p(h|X) = p(X|h) p(h) / Σ_{h′ ∈ H} p(X|h′) p(h′)

  – p(h) [“prior”]: domain knowledge, pre-existing biases
  – p(X|h) [“likelihood”]: statistical information in examples.
  – p(h|X) [“posterior”]: degree of belief that h is the true extension of C.
Likelihood: p(X|h)
• Size principle: Smaller hypotheses receive greater likelihood, and exponentially more so as n increases.

  p(X|h) = [1 / size(h)]^n   if x1, ..., xn ∈ h
         = 0                 if any xi ∉ h

• Captures the intuition of a “representative” sample, versus a “suspicious coincidence”.
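The size principle can be sketched in a few lines of code. This is a minimal illustration (not code from the lecture), using the slide's two example hypotheses:

```python
# Minimal sketch of the size-principle likelihood:
# p(X|h) = (1/size(h))^n if every example falls in h, else 0.

def likelihood(X, h):
    """Size-principle likelihood of example set X under hypothesis h (a set)."""
    if all(x in h for x in X):
        return (1.0 / len(h)) ** len(X)
    return 0.0

evens = set(range(2, 101, 2))           # "even numbers": 50 members
mults_of_10 = set(range(10, 101, 10))   # "multiples of 10": 10 members

X = [60, 80, 10, 30]
# The smaller hypothesis earns exponentially more likelihood as n grows:
print(likelihood(X, mults_of_10))        # (1/10)^4 ≈ 1e-4
print(likelihood(X, evens))              # (1/50)^4 ≈ 1.6e-7
print(likelihood([60, 55], mults_of_10)) # 0.0: 55 is not a multiple of 10
```

With four examples, “multiples of 10” is already preferred over “even numbers” by a factor of 625.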
Illustrating the size principle

[Figure: the even numbers 2–100 arranged in a grid, with h1 (even numbers) and h2 (multiples of 10) marked as nested regions.]
Illustrating the size principle

[Figure: the same grid, with a few observed examples marked.]

Data slightly more of a coincidence under h1


Illustrating the size principle

[Figure: the same grid, with more observed examples marked.]

Data much more of a coincidence under h1


Likelihood: p(X|h)

• Size principle: Smaller hypotheses receive greater likelihood, and exponentially more so as n increases.

  p(X|h) = [1 / size(h)]^n   if x1, ..., xn ∈ h
         = 0                 if any xi ∉ h

• Captures the intuition of a “representative” sample, versus a “suspicious coincidence”.
• A special case of the law of “conservation of belief”:

  Σ_x p(X = x | Y = y) = 1
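This normalization can be checked numerically for the size-principle likelihood; a quick sketch (not code from the lecture):

```python
# Sanity check of "conservation of belief": summed over every possible
# n-example ordered sample drawn from h, the probabilities add to 1.
from itertools import product

def likelihood(X, h):
    if all(x in h for x in X):
        return (1.0 / len(h)) ** len(X)
    return 0.0

h = set(range(10, 101, 10))  # "multiples of 10", size 10
n = 3                        # 10^3 equally likely ordered samples
total = sum(likelihood(X, h) for X in product(h, repeat=n))
print(total)  # 1.0, up to floating-point rounding
```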
Prior: p(h)
• Choice of hypothesis space embodies a strong prior:
effectively, p(h) ~ 0 for many logically possible but
conceptually unnatural hypotheses.
• Do we need this? Why not allow all logically possible
hypotheses, with uniform priors, and let the data sort
them out (via the likelihood)?
Prior: p(h)
• Choice of hypothesis space embodies a strong prior:
effectively, p(h) ~ 0 for many logically possible but
conceptually unnatural hypotheses.
• Prevents overfitting by highly specific but unnatural
hypotheses, e.g. “multiples of 10 except 50 and 70”.

e.g., X = {60, 80, 10, 30}:

  p(X | multiples of 10) = [1/10]^4 = 0.0001
  p(X | multiples of 10 except 50, 70) = [1/8]^4 ≈ 0.00024
Posterior:

  p(h|X) = p(X|h) p(h) / Σ_{h′ ∈ H} p(X|h′) p(h′)

• X = {60, 80, 10, 30}


• Why prefer “multiples of 10” over “even
numbers”? p(X|h).
• Why prefer “multiples of 10” over “multiples of
10 except 50 and 70”? p(h).
• Why does a good generalization need both high prior and high likelihood? p(h|X) ∝ p(X|h) p(h)
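A toy posterior computation puts these pieces together. This is a sketch: the prior values below are made up for illustration, not the lecture's numbers.

```python
# Toy posterior over three hypotheses for X = {60, 80, 10, 30}.

def likelihood(X, h):
    return (1.0 / len(h)) ** len(X) if all(x in h for x in X) else 0.0

hypotheses = {
    "even numbers": set(range(2, 101, 2)),
    "multiples of 10": set(range(10, 101, 10)),
    "multiples of 10 except 50, 70": set(range(10, 101, 10)) - {50, 70},
}
# Conceptually unnatural hypotheses get a very small prior (illustrative values):
prior = {"even numbers": 0.495,
         "multiples of 10": 0.495,
         "multiples of 10 except 50, 70": 0.01}

X = [60, 80, 10, 30]
unnorm = {name: likelihood(X, h) * prior[name] for name, h in hypotheses.items()}
Z = sum(unnorm.values())
posterior = {name: v / Z for name, v in unnorm.items()}

# "multiples of 10" wins: higher likelihood than "even numbers",
# higher prior than the exception-laden variant.
best = max(posterior, key=posterior.get)
print(best)  # multiples of 10
```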
Prior: p(h)
• Choice of hypothesis space embodies a strong prior:
effectively, p(h) ~ 0 for many logically possible but
conceptually unnatural hypotheses.
• Prevents overfitting by highly specific but unnatural
hypotheses, e.g. “multiples of 10 except 50 and 70”.
• p(h) encodes relative weights of alternative theories:

  H: Total hypothesis space
    p(H1) = 1/5    p(H2) = 3/5    p(H3) = 1/5

  H1: Math properties (24)      H2: Raw magnitude (5050)   H3: Approx. magnitude (10)
    • even numbers                • 10-15                    • 10-20
    • powers of two               • 20-32                    • 20-30
    • multiples of three          • 37-54                    • 30-40
    ...                           ...                        ...
    p(h) = p(H1) / 24             p(h) = p(H2) / 5050        p(h) = p(H3) / 10
Prior: p(h)
• Choice of hypothesis space embodies a strong prior:
effectively, p(h) ~ 0 for many logically possible but
conceptually unnatural hypotheses.
• Prevents overfitting by highly specific but unnatural
hypotheses, e.g. “multiples of 10 except 50 and 70”.
• p(h) encodes relative plausibility of alternative theories:
– Mathematical properties: p(h) ~ 1/120
– Approximate magnitude: p(h) ~ 1/50
– Raw magnitude: p(h) ~ 1/8500 (on average)

• Also degrees of plausibility within a theory, e.g., for magnitude intervals of size s:

  [Figure: prior p(s) as a function of interval size s.]
Generalizing to new objects

From hypotheses to predictions: How do we compute the probability that C applies to some new object y, given the posterior p(h|X)?
Hypothesis averaging

In general, we have the law of total probability:

  p(A = a) = Σ_z p(A = a | Z = z) p(Z = z)

  p(A = a | B = b) = Σ_z p(A = a | Z = z, B = b) p(Z = z | B = b)

…especially useful if A and B are independent conditioned on Z:

  p(A = a | B = b) = Σ_z p(A = a | Z = z) p(Z = z | B = b)
Another example: what is the probability that the Republican will win the election, given that the weatherman predicts rain?

  p(Republican wins | Weather report: “Rain storm”)
    = Σ_{w ∈ weather conditions} p(Republican wins | W = w) p(W = w | Weather report: “Rain storm”)
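Numerically, the averaging looks like this; all probabilities below are invented purely for illustration:

```python
# Law of total probability with conditional independence: given the
# actual weather W, the election outcome is assumed independent of the
# forecast. All numbers are made up.
p_win_given_w = {"rain": 0.7, "no rain": 0.4}     # p(win | W = w)
p_w_given_report = {"rain": 0.8, "no rain": 0.2}  # p(W = w | report "rain storm")

p_win = sum(p_win_given_w[w] * p_w_given_report[w] for w in p_win_given_w)
print(p_win)  # 0.7*0.8 + 0.4*0.2 = 0.64
```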
Generalizing to new objects

Hypothesis averaging: Compute the probability that C applies to some new object y by averaging the predictions of all hypotheses h, weighted by p(h|X):

  p(y ∈ C | X) = Σ_{h ∈ H} p(y ∈ C | h) p(h|X)

where p(y ∈ C | h) = 1 if y ∈ h, and 0 if y ∉ h, so

  p(y ∈ C | X) = Σ_{h ⊇ {y, X}} p(h|X)
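Hypothesis averaging can be sketched directly from the formula. This toy version uses a tiny hypothesis space with a uniform prior, not the lecture's full space:

```python
# p(y in C | X): average each hypothesis's 0/1 prediction, weighted by
# its posterior (uniform prior, size-principle likelihood). Toy sketch.

def likelihood(X, h):
    return (1.0 / len(h)) ** len(X) if all(x in h for x in X) else 0.0

def p_in_concept(y, X, hypotheses):
    weights = {name: likelihood(X, h) for name, h in hypotheses.items()}
    Z = sum(weights.values())
    return sum(w / Z for name, w in weights.items() if y in hypotheses[name])

hypotheses = {
    "even numbers": set(range(2, 101, 2)),
    "multiples of 5": set(range(5, 101, 5)),
    "multiples of 10": set(range(10, 101, 10)),
}
X = [60, 80, 10, 30]
print(p_in_concept(20, X, hypotheses))  # ~1.0: every consistent h contains 20
print(p_in_concept(84, X, hypotheses))  # small: only "even numbers" contains 84
```

The graded output for 84 versus the near-certain output for 20 previews how averaging produces similarity-like gradients.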
Examples: {16}

Examples: {16, 8, 2, 64}

Examples: {16, 23, 19, 20}
[Figure: human generalization vs. Bayesian model predictions for the example sets {60}, {60, 80, 10, 30}, {60, 52, 57, 55}, {16}, {16, 8, 2, 64}, and {16, 23, 19, 20}.]
Summary of the Bayesian model

• How do the statistics of the examples interact with prior knowledge to guide generalization?

  posterior ∝ likelihood × prior

• Why does generalization appear rule-based or similarity-based?

  hypothesis averaging + size principle
  broad p(h|X): similarity gradient
  narrow p(h|X): all-or-none rule
Summary of the Bayesian model

• How do the statistics of the examples interact with prior knowledge to guide generalization?

  posterior ∝ likelihood × prior

• Why does generalization appear rule-based or similarity-based?

  hypothesis averaging + size principle
  broad p(h|X): many h of similar size, or very few examples (i.e., 1)
  narrow p(h|X): one h much smaller
Model variants

1. Bayes with weak sampling (drops the size principle, keeps hypothesis averaging):

  “Weak sampling”:  p(X|h) ∝ 1   if x1, ..., xn ∈ h
                            = 0   if any xi ∉ h

2. Maximum a posteriori (MAP) / maximum likelihood / subset principle (keeps the size principle, drops hypothesis averaging):

  p(y ∈ C | X) = 1 if y ∈ h*,  where h* = argmax_{h ∈ H} p(h|X)
               = 0 if y ∉ h*
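Each lesioned variant differs from full Bayes in one line. A sketch on a toy two-hypothesis space with a uniform prior (not the lecture's implementation):

```python
# Full Bayes vs. the two lesioned variants.
# map_only=True drops hypothesis averaging (subset principle);
# use_size_principle=False drops the size principle (weak sampling).

def generalize(y, X, hypotheses, use_size_principle=True, map_only=False):
    def lik(h):
        if any(x not in h for x in X):
            return 0.0
        return (1.0 / len(h)) ** len(X) if use_size_principle else 1.0
    weights = {name: lik(h) for name, h in hypotheses.items()}
    if map_only:
        best = max(weights, key=weights.get)  # h* = argmax p(h|X)
        return 1.0 if y in hypotheses[best] else 0.0
    Z = sum(weights.values())
    return sum(w / Z for name, w in weights.items() if y in hypotheses[name])

hypotheses = {"even numbers": set(range(2, 101, 2)),
              "multiples of 10": set(range(10, 101, 10))}
X = [60, 80, 10, 30]
print(generalize(84, X, hypotheses))                            # graded, small
print(generalize(84, X, hypotheses, map_only=True))             # all-or-none: 0.0
print(generalize(84, X, hypotheses, use_size_principle=False))  # 0.5: no size preference
```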
[Figure: human generalization compared with the full Bayesian model, Bayes with weak sampling (no size principle), and maximum a posteriori (MAP) / subset principle (no hypothesis averaging).]
Taking stock

• A model of high-level, knowledge-driven inductive reasoning that makes strong quantitative predictions with minimal free parameters.
  (r² > 0.9 for mean judgments on 180 generalization stimuli, with 3 free numerical parameters)
• Explains qualitatively different patterns of generalization
(rules, similarity) as the output of a single general-purpose
rational inference engine.
– Marr level 1 (Computational theory) explanation of phenomena that
have traditionally been treated only at Marr level 2 (Representation
and algorithm).
Looking forward
• Can we see these ideas at work in more natural cognitive
function, not just toy problems and games?
– What differently structured hypothesis spaces, likelihood functions, or priors might be needed?
• Can we move from ‘weak rational analysis’ to ‘strong
rational analysis’ in the priors, as with the likelihood?
– “Weak”: behavior consistent with some reasonable prior.
– “Strong”: behavior consistent with the “correct” prior given the
structure of the world.
• Can we work with more flexible priors, not just restricted to
a small subset of all logically possible concepts?
– Would like to be able to learn any concept, even very complex ones,
given enough data (a non-dogmatic prior).
• Can we describe formally how these hypothesis spaces and
priors are generated by abstract knowledge or theories?
• Can we explain how people learn these rich priors?
Learning more natural concepts
[Figure: three example images labeled “horse” and three labeled “tufa”, illustrating word learning from a few positive examples.]
Learning rectangle concepts

Weighting different rectangle hypotheses based on the size principle:

  p(X|h) = [1 / size(h)]^n   if x1, ..., xn ∈ h
         = 0                 if any xi ∉ h
Generalization gradients

[Figure: gradients under full Bayes, the subset principle (MAP Bayes), and Bayes without the size principle (0/1 likelihoods).]
Modeling word learning (Xu & Tenenbaum, 2007)
[Figure: children’s generalizations compared with Bayesian concept learning over a tree-structured hypothesis space.]
Exploring different models

• Priors and likelihoods here were derived from simple assumptions. What about more complex cases?
• Different likelihoods?
– Suppose the examples are sampled by a different process,
such as active learning, or active pedagogy.

• Different priors?
– More complex language-like hypothesis spaces, allowing
exceptions, compound concepts, and much more…
