12. Uncertainty Reasoning
Uncertainty
AIFA (AI61005)
2021 Autumn
o Non-deterministic outcomes
  ▪ A robot with a smell sensor does not have the exact location of an object
o Representing uncertainty
  ▪ Probability: the calculus of gambling
Certain vs. Uncertain World
Probability Basics
Probabilistic Knowledge Base
Propositions are represented via random variables (e.g., 𝑇 = ℎ𝑜𝑡).

𝑃(¬𝛼) = 1 − 𝑃(𝛼)
If 𝛼 ⟺ 𝛽, then 𝑃(𝛼) = 𝑃(𝛽)
∀𝛼: 𝑃(𝛼) = Σ_{𝑑∈𝑑𝑜𝑚(𝑉)} 𝑃(𝛼 ∧ 𝑉 = 𝑑)
Conditional Probability
o Conditional Probability
  ▪ How is the agent's belief updated when it has new evidence?
o Posterior Probability
  ▪ Conditioning on everything the agent knows about a situation
  ▪ 𝑃(ℎ|𝑒): belief in proposition ℎ given another proposition 𝑒
  ▪ 𝑃(ℎ) = 𝑃(ℎ|𝑇𝑟𝑢𝑒): belief in ℎ before the agent has observed anything (the prior)
Conditional Probability: Diagnostic Assistant
𝑃(ℎ|𝑒) = 𝑃(ℎ ∧ 𝑒) / 𝑃(𝑒),   provided 𝑃(𝑒) > 0
Joint Probability
Chain rule:
𝑃(𝛼1 ∧ 𝛼2 ∧ ⋯ ∧ 𝛼𝑛) = ∏_{𝑖=1}^{𝑛} 𝑃(𝛼𝑖 | 𝛼1 ∧ 𝛼2 ∧ ⋯ ∧ 𝛼𝑖−1)
For example, with 𝑛 = 3: 𝑃(𝛼1 ∧ 𝛼2 ∧ 𝛼3) = 𝑃(𝛼1) 𝑃(𝛼2|𝛼1) 𝑃(𝛼3|𝛼1 ∧ 𝛼2).

Probability of an event 𝐸 (a set of possible worlds):
𝑃(𝐸) = Σ_{(𝛼1,𝛼2,…,𝛼𝑛)∈𝐸} 𝑃(𝛼1 ∧ 𝛼2 ∧ ⋯ ∧ 𝛼𝑛)
Joint Probability: Event

Joint 𝑃(𝑇, 𝑊):
  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3
Joint Probability: Marginal Distribution
Marginalization: combine collapsed rows by adding.

𝑃(𝑡) = Σ_{𝑤∈𝑊} 𝑃(𝑡, 𝑤)        𝑃(𝑤) = Σ_{𝑡∈𝑇} 𝑃(𝑡, 𝑤)

Joint 𝑃(𝑇, 𝑊):        Marginal 𝑃(𝑇):      Marginal 𝑃(𝑊):
  T     W     P          T     P             W     P
  hot   sun   0.4        hot   0.5           sun   0.6
  hot   rain  0.1        cold  0.5           rain  0.4
  cold  sun   0.2
  cold  rain  0.3
Joint to Conditional Distribution
𝑃(𝑊 = 𝑠|𝑇 = 𝑐) = 𝑃(𝑊 = 𝑠 ∧ 𝑇 = 𝑐) / 𝑃(𝑇 = 𝑐)
              = 0.2 / (𝑃(𝑐, 𝑠) + 𝑃(𝑐, 𝑟)) = 0.2 / (0.2 + 0.3) = 0.4

𝑃(𝑊 = 𝑟|𝑇 = 𝑐) = 0.3 / (0.2 + 0.3) = 0.6

Joint 𝑃(𝑇, 𝑊):
  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3
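A minimal Python sketch of the two operations above, marginalization and conditioning, applied to the 𝑃(𝑇, 𝑊) table from the slides (the code itself is illustrative, not part of the lecture):

```python
# Joint distribution P(T, W) from the slides
joint = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}

# Marginalization: P(t) = sum_w P(t, w)
p_T = {}
for (t, w), p in joint.items():
    p_T[t] = p_T.get(t, 0.0) + p
print(p_T)  # {'hot': 0.5, 'cold': 0.5}

# Conditioning: P(W = w | T = cold) = P(cold, w) / P(T = cold)
p_cold = p_T["cold"]
p_W_given_cold = {w: joint[("cold", w)] / p_cold for w in ("sun", "rain")}
print(p_W_given_cold)  # {'sun': 0.4, 'rain': 0.6}
```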
Bayes’ Rule
How should an agent update its belief in a proposition based on a new piece of evidence?

[Figure: a partition 𝐵1, 𝐵2, 𝐵3, …, 𝐵𝑛 of the sample space, with evidence 𝐴 overlapping the parts]

𝑃(𝐵𝑖|𝐴) = 𝑃(𝐴|𝐵𝑖) × 𝑃(𝐵𝑖) / Σ𝑗 𝑃(𝐴|𝐵𝑗) × 𝑃(𝐵𝑗)
Bayes’ Rule: Example

𝑃(𝐵𝑖|𝐴) = 𝑃(𝐴|𝐵𝑖) × 𝑃(𝐵𝑖) / Σ𝑗 𝑃(𝐴|𝐵𝑗) × 𝑃(𝐵𝑗)

[Figure: three containers of marbles; 𝐶1 holds 60/40, 𝐶2 holds 45/55, 𝐶3's counts are not shown]

Suppose the chosen marble is red. What is the probability that 𝐶1 was chosen?
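A hedged sketch of this container example. The slide does not show 𝐶3's composition, so the 50/50 split below (and the uniform prior over containers) is an assumption made purely for illustration:

```python
# Bayes' rule on the container example. C3's counts are ASSUMED (50/50),
# and containers are ASSUMED to be chosen uniformly at random.
priors = {"C1": 1 / 3, "C2": 1 / 3, "C3": 1 / 3}
p_red = {"C1": 60 / 100, "C2": 45 / 100, "C3": 50 / 100}  # C3 value is hypothetical

# P(C1 | red) = P(red | C1) P(C1) / sum_j P(red | Cj) P(Cj)
evidence = sum(p_red[c] * priors[c] for c in priors)
posterior_C1 = p_red["C1"] * priors["C1"] / evidence
print(round(posterior_C1, 3))  # ~0.387 under these assumed numbers
```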
Independence
Number of assignments needed to specify a joint probability distribution:
𝑃(𝛼1 ∧ 𝛼2 ∧ ⋯ ∧ 𝛼𝑛) ⟹ 𝑂(2ⁿ)

𝑋 ⊥ 𝑌 :  ∀𝑥,𝑦 𝑃(𝑋 = 𝑥, 𝑌 = 𝑦) = 𝑃(𝑋 = 𝑥) 𝑃(𝑌 = 𝑦)
Equivalently, ∀𝑥,𝑦 𝑃(𝑥|𝑦) = 𝑃(𝑥)
Independence
Independence is a simplifying modelling assumption.
o Independent: {2020_US_Presidential_Election_Result, your_toothache}
o Not independent: {weather, traffic, cavity, toothache}

Test of independence: compare 𝑃(𝑇, 𝑊) against 𝑃(𝑇) × 𝑃(𝑊) for every assignment.

Joint 𝑃(𝑇, 𝑊):        𝑃(𝑇):            𝑃(𝑊):
  T     W     P          T     P          W     P
  hot   sun   0.4        hot   0.5        sun   0.6
  hot   rain  0.1        cold  0.5        rain  0.4
  cold  sun   0.2
  cold  rain  0.3

Here 𝑃(hot, sun) = 0.4 but 𝑃(hot) × 𝑃(sun) = 0.5 × 0.6 = 0.3, so 𝑇 and 𝑊 are not independent.
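A short sketch that runs the independence test above on the 𝑃(𝑇, 𝑊) table:

```python
# Test T ⊥ W by checking P(t, w) == P(t) * P(w) for every assignment.
joint = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
         ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

p_T, p_W = {}, {}
for (t, w), p in joint.items():
    p_T[t] = p_T.get(t, 0.0) + p
    p_W[w] = p_W.get(w, 0.0) + p

independent = all(abs(joint[(t, w)] - p_T[t] * p_W[w]) < 1e-9
                  for (t, w) in joint)
print(independent)  # False: P(hot, sun) = 0.4 but P(hot) * P(sun) = 0.3
```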
Independence
N fair, independent coin flips:

𝑃(𝑋1), 𝑃(𝑋2), …, 𝑃(𝑋𝑛):  each table is  H 0.5 / T 0.5
With independence, 𝑃(𝑋1, 𝑋2, …, 𝑋𝑛) = 𝑃(𝑋1) 𝑃(𝑋2) ⋯ 𝑃(𝑋𝑛), so 𝑛 numbers suffice instead of a full joint table over 𝑋1, 𝑋2, …, 𝑋𝑛 with 2ⁿ entries.
Conditional Independence
Two events 𝐴 and 𝐵 are conditionally independent given another event 𝐶 with 𝑃(𝐶) > 0 if:
𝑃(𝐴 ∧ 𝐵|𝐶) = 𝑃(𝐴|𝐶) 𝑃(𝐵|𝐶)

Recall 𝑃(𝐴|𝐵) = 𝑃(𝐴 ∧ 𝐵) / 𝑃(𝐵). Then:
𝑃(𝐴|𝐵, 𝐶) = 𝑃(𝐴 ∧ 𝐵|𝐶) / 𝑃(𝐵|𝐶) = 𝑃(𝐴|𝐶) 𝑃(𝐵|𝐶) / 𝑃(𝐵|𝐶) = 𝑃(𝐴|𝐶)
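A small sketch verifying this derivation numerically: the joint is built as 𝑃(𝐶) 𝑃(𝐴|𝐶) 𝑃(𝐵|𝐶), so 𝐴 ⊥ 𝐵 | 𝐶 holds by construction; all CPT numbers are made up for illustration:

```python
# Made-up CPTs (illustrative only), chosen so that A ⊥ B | C by construction.
p_C = {True: 0.3, False: 0.7}
p_A_given_C = {True: 0.9, False: 0.2}   # P(A = true | C)
p_B_given_C = {True: 0.6, False: 0.1}   # P(B = true | C)

def joint(a, b, c):
    """P(A, B, C) = P(C) P(A|C) P(B|C)."""
    pa = p_A_given_C[c] if a else 1 - p_A_given_C[c]
    pb = p_B_given_C[c] if b else 1 - p_B_given_C[c]
    return p_C[c] * pa * pb

# Check P(A | B, C) = P(A | C) for C = true, B = true
c = True
p_A_given_BC = joint(True, True, c) / (joint(True, True, c) + joint(False, True, c))
print(abs(p_A_given_BC - p_A_given_C[c]) < 1e-9)  # True
```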
Conditional Independence
𝑃(𝑇𝑜𝑜𝑡ℎ𝑎𝑐ℎ𝑒, 𝐶𝑎𝑣𝑖𝑡𝑦, 𝐶𝑎𝑡𝑐ℎ)
Car example: Motor working (M), Radio working (R), Battery state (B)

Intuition: imagine that you know the value of Z and you are trying to guess the value of X. In your pocket is an envelope containing the value of Y. Would opening the envelope help you guess X? If not, 𝑋 ⊥ 𝑌|𝑍.
Bayesian Networks
Conditional Independence & Chain Rule
Interested in: 𝑃(𝑋1, 𝑋2, …, 𝑋𝑛) — specifying the full joint needs 𝑂(2ⁿ) assignments.
Conditional Independence & Chain Rule
𝑃(𝑅𝑎𝑖𝑛, 𝑇𝑟𝑎𝑓𝑓𝑖𝑐, 𝑈𝑚𝑏𝑟𝑒𝑙𝑙𝑎), with 𝑇 ⊥ 𝑈|𝑅:
𝑃(𝑈|𝑇, 𝑅) = 𝑃(𝑈|𝑅)
𝑃(𝑇|𝑈, 𝑅) = 𝑃(𝑇|𝑅)
𝑃(𝑇, 𝑈|𝑅) = 𝑃(𝑇|𝑅) 𝑃(𝑈|𝑅)
So the chain rule 𝑃(𝑅, 𝑇, 𝑈) = 𝑃(𝑅) 𝑃(𝑇|𝑅) 𝑃(𝑈|𝑇, 𝑅) simplifies to 𝑃(𝑅) 𝑃(𝑇|𝑅) 𝑃(𝑈|𝑅), as sketched below.
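A sketch of this factorization with made-up CPT numbers (not from the slides), showing that 1 + 2 + 2 = 5 parameters suffice instead of 2³ − 1 = 7, and that the factored product is a valid distribution:

```python
# Made-up CPTs (illustrative only) for the factorization P(R) P(T|R) P(U|R).
p_R = {True: 0.2, False: 0.8}
p_T_given_R = {True: 0.7, False: 0.1}   # P(T = true | R)
p_U_given_R = {True: 0.9, False: 0.05}  # P(U = true | R)

def p(r, t, u):
    """P(R, T, U) via the CI-simplified chain rule."""
    pt = p_T_given_R[r] if t else 1 - p_T_given_R[r]
    pu = p_U_given_R[r] if u else 1 - p_U_given_R[r]
    return p_R[r] * pt * pu

# 5 parameters total; the product still sums to 1 over all 2^3 worlds.
print(sum(p(r, t, u) for r in (True, False)
                     for t in (True, False)
                     for u in (True, False)))  # ~1.0 (sanity check)
```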
Bayes’ Rule: Revisited
𝑃(ℎ|𝑒) = 𝑃(𝑒|ℎ) 𝑃(ℎ) / 𝑃(𝑒)
Bayesian Networks
Given a random variable 𝑋, a small set of variables may directly affect 𝑋's value.

[Diagram: parent nodes 𝑋𝑗, 𝑋𝑘, 𝑋𝑚 with arcs into 𝑋𝑖]
Bayesian Networks: Example
o Fire Diagnostic Assistant
  ▪ Decide whether there is a fire in the building based on noisy sensor information
o Agent Information
  ▪ Report: whether everyone is leaving the building
    • False positive: reports people leaving when they are not
    • False negative: does not report when everyone is leaving
  ▪ Fire Alarm
    • People may leave when it goes off
    • Tampering or fire could affect the alarm
Bayesian Networks: Example
Structure: Tampering (T) → Alarm (A) ← Fire (F);  Fire (F) → Smoke (S);
Alarm (A) → Leaving (L);  Leaving (L) → Report (R)

𝑃(𝑇𝑎𝑚𝑝𝑒𝑟𝑖𝑛𝑔 = 𝑡) = 0.02        𝑃(𝐹𝑖𝑟𝑒 = 𝑡) = 0.01

𝑃(𝐴 = 𝑡 | 𝑇, 𝐹):        𝑃(𝑆 = 𝑡 | 𝐹):
  T  F   P                 F   P
  t  t   0.5               t   0.9
  t  f   0.85              f   0.01
  f  t   0.99
  f  f   0.0001

𝑃(𝐿 = 𝑡 | 𝐴):            𝑃(𝑅 = 𝑡 | 𝐿):
  A   P                    L   P
  t   0.88                 t   0.75
  f   0.001                f   0.01

Number of assignments: 1 + 1 + 4 + 2 + 2 + 2 = 12 (vs. 2⁶ − 1 = 63 for the full joint)
Bayesian Networks: Example
𝑃(𝑇𝑎𝑚𝑝𝑒𝑟𝑖𝑛𝑔 = 𝑡, 𝐹𝑖𝑟𝑒 = 𝑓, 𝐴𝑙𝑎𝑟𝑚 = 𝑡, 𝑆𝑚𝑜𝑘𝑒 = 𝑓, 𝐿𝑒𝑎𝑣𝑖𝑛𝑔 = 𝑡, 𝑅𝑒𝑝𝑜𝑟𝑡 = 𝑡) = ?
𝑃(𝑆𝑚𝑜𝑘𝑒 = 𝑡) = ?
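A sketch of both computations using the CPTs above; a full-joint entry is just one number per node multiplied out via the chain rule for Bayesian networks:

```python
# One full-joint entry: product of one CPT entry per node.
p = (0.02        # P(Tampering = t)
     * 0.99      # P(Fire = f) = 1 - 0.01
     * 0.85      # P(Alarm = t | Tampering = t, Fire = f)
     * 0.99      # P(Smoke = f | Fire = f) = 1 - 0.01
     * 0.88      # P(Leaving = t | Alarm = t)
     * 0.75)     # P(Report = t | Leaving = t)
print(p)  # ~0.0110

# P(Smoke = t) depends only on Fire: sum over Fire of P(S = t | F) P(F)
p_smoke = 0.9 * 0.01 + 0.01 * 0.99
print(p_smoke)  # 0.0189
```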
Bayesian Networks: Example
𝑃(𝐴𝑙𝑎𝑟𝑚 = 𝑡) = ?
𝑃(𝐿𝑒𝑎𝑣𝑖𝑛𝑔) = ?
Bayesian Networks: Example
𝑃(𝑅𝑒𝑝𝑜𝑟𝑡 = 𝑡) = ?
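A brute-force sketch that answers these prompts by enumerating all 2⁶ assignments of the fire network (illustrative code, not from the slides):

```python
from itertools import product

P_T, P_F = 0.02, 0.01                                  # P(Tampering=t), P(Fire=t)
P_A = {(True, True): 0.5, (True, False): 0.85,
       (False, True): 0.99, (False, False): 0.0001}    # P(Alarm=t | T, F)
P_S = {True: 0.9, False: 0.01}                         # P(Smoke=t | F)
P_L = {True: 0.88, False: 0.001}                       # P(Leaving=t | A)
P_R = {True: 0.75, False: 0.01}                        # P(Report=t | L)

def bern(p_true, val):          # P(X = val) given P(X = true)
    return p_true if val else 1.0 - p_true

def joint(t, f, a, s, l, r):    # chain-rule product over the network
    return (bern(P_T, t) * bern(P_F, f) * bern(P_A[(t, f)], a)
            * bern(P_S[f], s) * bern(P_L[a], l) * bern(P_R[l], r))

tf = (True, False)
print(sum(joint(*x) for x in product(tf, repeat=6) if x[2]))  # P(Alarm = t)   ~0.0267
print(sum(joint(*x) for x in product(tf, repeat=6) if x[4]))  # P(Leaving = t) ~0.0244
print(sum(joint(*x) for x in product(tf, repeat=6) if x[5]))  # P(Report = t)  ~0.0281
```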
Bayesian Networks: Example
Structure: Burglary (B) → Alarm (A) ← Earthquake (E);
Alarm (A) → JohnCalls (J);  Alarm (A) → MaryCalls (M)

𝑃(𝐵𝑢𝑟𝑔𝑙𝑎𝑟𝑦 = 𝑡) = 0.001        𝑃(𝐸𝑎𝑟𝑡ℎ𝑞𝑢𝑎𝑘𝑒 = 𝑡) = 0.002

𝑃(𝐴 = 𝑡 | 𝐵, 𝐸):         𝑃(𝐽 = 𝑡 | 𝐴):        𝑃(𝑀 = 𝑡 | 𝐴):
  B  E   P                  A   P                 A   P
  t  t   0.95               t   0.90              t   0.70
  t  f   0.94               f   0.05              f   0.01
  f  t   0.29
  f  f   0.001
Bayesian Networks: Construction Issues
Bayesian Networks: CI and BN Topology
[Diagram: chains 𝑋 → 𝑌 → 𝑍 → 𝑊 and 𝑋 → 𝑌 → 𝑍; chain 𝐿 → 𝑅 → 𝑇 (T = Traffic)]

For the chain 𝐿 → 𝑅 → 𝑇:
𝑃(𝑡|𝑙, 𝑟) = 𝑃(𝑙, 𝑟, 𝑡) / 𝑃(𝑙, 𝑟) = 𝑃(𝑙) 𝑃(𝑟|𝑙) 𝑃(𝑡|𝑟) / (𝑃(𝑙) 𝑃(𝑟|𝑙)) = 𝑃(𝑡|𝑟)

[Diagram: common cause 𝑅 with children 𝑂 and 𝑇; common effect 𝑇′ with parents 𝑅 and 𝐵]

Are 𝑂 and 𝑇 independent given 𝑅?
𝑃(𝑟, 𝑜, 𝑡) = 𝑃(𝑟) 𝑃(𝑜|𝑟) 𝑃(𝑡|𝑟), so 𝑃(𝑡|𝑟, 𝑜) = 𝑃(𝑡|𝑟): yes.
Path Patterns
[Figure: active triples vs. inactive triples (chain, common cause, common effect, with and without conditioning)]

o Is 𝑋 ⊥ 𝑌|𝑍?
  ▪ Yes, if 𝑋 and 𝑌 are "d-separated" by 𝑍
Path Pattern & CI: Examples
[Diagram: network over 𝑅, 𝑀, 𝑇, 𝑇′]

𝑅 ⊥ 𝑀?
𝑅 ⊥ 𝑀|𝑇?
𝑅 ⊥ 𝑀|𝑇′?
Path Pattern & CI: Examples
[Diagram: network over 𝐿, 𝑅, 𝑀, 𝑇, 𝑂, 𝑇′]

𝐿 ⊥ 𝑇′|𝑇?
𝐿 ⊥ 𝑀?
𝐿 ⊥ 𝑀|𝑇?
𝐿 ⊥ 𝑀|𝑇′?
𝐿 ⊥ 𝑀|𝑇, 𝑅?
Path Pattern & CI: Examples
[Diagram: network over 𝑂, 𝑇, 𝑅, 𝐻]
O = Lawn Overflow, R = Rain, T = Traffic, H = Stuck at home

𝑇 ⊥ 𝑂?
𝑇 ⊥ 𝑂|𝑅?
𝑇 ⊥ 𝑂|𝑅, 𝐻?
Topology & Distribution
[Figure: the possible graph structures over three nodes 𝑋, 𝑌, 𝑍, grouped by the conditional independences they encode]

o {𝑋 ⊥ 𝑌, 𝑋 ⊥ 𝑍, 𝑌 ⊥ 𝑍, 𝑋 ⊥ 𝑌|𝑍, 𝑋 ⊥ 𝑍|𝑌, 𝑌 ⊥ 𝑍|𝑋}: the fully disconnected graph
o {𝑋 ⊥ 𝑍|𝑌}: chains 𝑋 → 𝑌 → 𝑍, 𝑋 ← 𝑌 ← 𝑍 and the common cause 𝑋 ← 𝑌 → 𝑍
o {}: graphs in which 𝑋, 𝑌, 𝑍 are fully connected
Probabilistic Inference
Probabilistic Inference Problem
Let 𝑉 be the set of all variables in a given Bayesian network. Compute the posterior probability
𝑃(𝑋|𝐸1, 𝐸2, …, 𝐸𝑚)
where 𝑉 = {𝑋} ∪ {𝐸1, 𝐸2, …, 𝐸𝑚} ∪ {𝑌1, 𝑌2, …, 𝑌𝑙}:
o 𝑋: query variable
o 𝐸1, …, 𝐸𝑚: evidence variables
o 𝑌1, …, 𝑌𝑙: hidden variables

The most likely value of the query: argmax_𝑥 𝑃(𝑋 = 𝑥|𝐸1, 𝐸2, …, 𝐸𝑚)
Inference by Enumeration
𝑃(𝑋|𝑒) = 𝑃(𝑋, 𝑒) / 𝑃(𝑒) = 𝛼 𝑃(𝑋, 𝑒) = 𝛼 Σ_𝑦 𝑃(𝑋, 𝑒, 𝑦)

[Diagram: burglary network — 𝐵 and 𝐸 are parents of 𝐴; 𝐴 is parent of 𝐽 and 𝑀]

𝑃(𝑏|𝑗, 𝑚) = 𝛼 Σ_𝑒 Σ_𝑎 𝑃(𝑏, 𝑗, 𝑚, 𝑒, 𝑎)
          = 𝛼 Σ_𝑒 Σ_𝑎 𝑃(𝑏) 𝑃(𝑒) 𝑃(𝑎|𝑏, 𝑒) 𝑃(𝑗|𝑎) 𝑃(𝑚|𝑎)
Total number of products: 2² × 4 = 16

𝑃(𝑗) = 𝛼 Σ_𝑏 Σ_𝑒 Σ_𝑎 Σ_𝑚 𝑃(𝑏) 𝑃(𝑒) 𝑃(𝑎|𝑏, 𝑒) 𝑃(𝑗|𝑎) 𝑃(𝑚|𝑎)
Pushing sums inside the products:
𝑃(𝑗) = 𝛼 Σ_𝑎 Σ_𝑚 𝑃(𝑗|𝑎) 𝑃(𝑚|𝑎) Σ_𝑏 𝑃(𝑏) Σ_𝑒 𝑃(𝑒) 𝑃(𝑎|𝑏, 𝑒)
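A sketch of inference by enumeration for 𝑃(𝐵|𝑗, 𝑚) on the burglary network above, with the normalization constant 𝛼 applied at the end:

```python
P_B, P_E = 0.001, 0.002                              # P(Burglary=t), P(Earthquake=t)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm=t | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(JohnCalls=t | A)
P_M = {True: 0.70, False: 0.01}                      # P(MaryCalls=t | A)

def unnormalized(b):
    """Sum over hidden e, a with j and m observed true."""
    total = 0.0
    for e in (True, False):
        for a in (True, False):
            pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
            total += ((P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
                      * pa * P_J[a] * P_M[a])
    return total

scores = {b: unnormalized(b) for b in (True, False)}
alpha = 1.0 / sum(scores.values())
print(round(alpha * scores[True], 4))                # P(b | j, m) ~0.2842
```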
Variable Elimination
o A dynamic programming approach
o Works on factors
Variable Elimination
𝑃(𝐵|𝑗, 𝑚) = 𝛼 𝑃(𝐵) Σ_𝑒 𝑃(𝑒) Σ_𝑎 𝑃(𝑎|𝐵, 𝑒) 𝑃(𝑗|𝑎) 𝑃(𝑚|𝑎)
          = 𝛼 𝑓1(𝐵) Σ_𝑒 𝑓2(𝐸) Σ_𝑎 𝑓3(𝐴, 𝐵, 𝐸) 𝑓4(𝐴) 𝑓5(𝐴)
where 𝑓1(𝐵) = 𝑃(𝐵), 𝑓2(𝐸) = 𝑃(𝐸), 𝑓3(𝐴, 𝐵, 𝐸) = 𝑃(𝐴|𝐵, 𝐸), 𝑓4(𝐴) = 𝑃(𝑗|𝐴), 𝑓5(𝐴) = 𝑃(𝑚|𝐴)
Variable Elimination
Evaluation proceeds right-to-left: Product → Sum → Product → Sum → …

𝑃(𝐵|𝑗, 𝑚) = 𝛼 𝑓1(𝐵) × Σ_𝑒 𝑓2(𝐸) × Σ_𝑎 𝑓3(𝐴, 𝐵, 𝐸) × 𝑓4(𝐴) × 𝑓5(𝐴)

Product, then sum out A:  𝑓6(𝐵, 𝐸) = Σ_𝑎 𝑓3(𝑎, 𝐵, 𝐸) × 𝑓4(𝑎) × 𝑓5(𝑎)
𝑃(𝐵|𝑗, 𝑚) = 𝛼 𝑓1(𝐵) × Σ_𝑒 𝑓2(𝐸) × 𝑓6(𝐵, 𝐸)

Product, then sum out E:  𝑓7(𝐵) = Σ_𝑒 𝑓2(𝑒) × 𝑓6(𝐵, 𝑒) = 𝑓2(𝑒) × 𝑓6(𝐵, 𝑒) + 𝑓2(¬𝑒) × 𝑓6(𝐵, ¬𝑒)
𝑃(𝐵|𝑗, 𝑚) = 𝛼 𝑓1(𝐵) × 𝑓7(𝐵)
Variable Elimination
Pointwise product (×):
𝑓(𝑋1, …, 𝑋𝑗, 𝑌1, …, 𝑌𝑘, 𝑍1, …, 𝑍𝑙) = 𝑓1(𝑋1, …, 𝑋𝑗, 𝑌1, …, 𝑌𝑘) × 𝑓2(𝑌1, …, 𝑌𝑘, 𝑍1, …, 𝑍𝑙)

Summing out a variable:
𝑓(𝐵, 𝐶) = Σ_𝑎 𝑓3(𝐴, 𝐵, 𝐶) = 𝑓3(𝑎, 𝐵, 𝐶) + 𝑓3(¬𝑎, 𝐵, 𝐶)

  = | .06  .24 | + | .18  .72 | = | .24  .96 |
    | .42  .28 |   | .06  .04 |   | .48  .32 |
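A sketch of both factor operations with numpy arrays, reproducing the sum-out example above; the two factors in the pointwise-product part hold made-up values:

```python
import numpy as np

# f3(A, B, C): axis 0 indexes A, axis 1 indexes B, axis 2 indexes C.
f3 = np.array([[[0.06, 0.24],   # f3(a, B, C)
                [0.42, 0.28]],
               [[0.18, 0.72],   # f3(not-a, B, C)
                [0.06, 0.04]]])

f_BC = f3.sum(axis=0)           # summing out A
print(f_BC)                     # [[0.24 0.96] [0.48 0.32]]

# Pointwise product f1(A, B) x f2(B, C) -> f(A, B, C) via broadcasting.
f1 = np.array([[0.3, 0.7], [0.9, 0.1]])   # made-up values
f2 = np.array([[0.2, 0.8], [0.6, 0.4]])   # made-up values
f = f1[:, :, None] * f2[None, :, :]       # shape (A, B, C)
print(f.shape)                            # (2, 2, 2)
```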
Variable Elimination
Variable ordering matters. For a query such as 𝑃(𝐽|𝑏):
𝑃(𝐽|𝑏) = 𝛼 𝑃(𝑏) Σ_𝑒 𝑃(𝑒) Σ_𝑎 𝑃(𝑎|𝑏, 𝑒) 𝑃(𝐽|𝑎) Σ_𝑚 𝑃(𝑚|𝑎)

Σ_𝑚 𝑃(𝑚|𝑎) = 𝑃(𝑚|𝑎) + 𝑃(¬𝑚|𝑎) = 1

o Variable M is irrelevant to this query
o Any leaf node that is not a query or evidence variable can be removed
o Every variable that is not an ancestor of a query variable or an evidence variable is irrelevant to the query
Variable Elimination: Example
Joint Distribution 𝑃(𝑇, 𝑊):   Selected Joint 𝑃(𝑇 = 𝑐𝑜𝑙𝑑, 𝑊):   Single Conditional 𝑃(𝑊|𝑇 = 𝑐𝑜𝑙𝑑):
  T     W     P                  T     W     P                    T     W     P
  hot   sun   0.4                cold  sun   0.2                  cold  sun   0.4
  hot   rain  0.1                cold  rain  0.3                  cold  rain  0.6
  cold  sun   0.2
  cold  rain  0.3

Factors can also be a family of conditionals 𝑃(𝑊|𝑇) or a specified single conditional 𝑃(𝑟𝑎𝑖𝑛|𝑇):
  T     W     P                  T     W     P
  hot   sun   0.8                hot   rain  0.2
  hot   rain  0.2                cold  rain  0.6
  cold  sun   0.4
  cold  rain  0.6
Example
R = Raining, B = Bad Weather, L = Late for Class

[Diagram: chain 𝑅 → 𝐵 → 𝐿]

Factors: 𝑃(𝑅), 𝑃(𝐵|𝑅), 𝑃(𝐿|𝐵)
Inference by Enumeration: Example
𝑃(𝑅, 𝐵, 𝐿)
𝑃(𝑅, 𝐵):            𝑃(𝐿|𝐵):            Join B → 𝑃(𝑅, 𝐵, 𝐿):
  r   b   0.08        b   l   0.3          r   b   l   0.024
  r   ¬b  0.02        b   ¬l  0.7          r   b   ¬l  0.056
  ¬r  b   0.09        ¬b  l   0.1          r   ¬b  l   0.002
  ¬r  ¬b  0.81        ¬b  ¬l  0.9          r   ¬b  ¬l  0.018
                                           ¬r  b   l   0.027
                                           ¬r  b   ¬l  0.063
                                           ¬r  ¬b  l   0.081
                                           ¬r  ¬b  ¬l  0.729

𝑃(𝑅, 𝐵) × 𝑃(𝐿|𝐵) = 𝑃(𝑅, 𝐵, 𝐿)
Inference by Enumeration: Example
Marginalization:

𝑃(𝑅, 𝐵):              Sum R → 𝑃(𝐵):
  r   b   0.08          b   0.17  (= 0.08 + 0.09)
  r   ¬b  0.02          ¬b  0.83  (= 0.02 + 0.81)
  ¬r  b   0.09
  ¬r  ¬b  0.81

𝑃(𝑅, 𝐵, 𝐿)  --Sum R-->  𝑃(𝐵, 𝐿)  --Sum B-->  𝑃(𝐿)

𝑃(𝐿) = Σ_𝐵 Σ_𝑅 𝑃(𝑅) × 𝑃(𝐵|𝑅) × 𝑃(𝐿|𝐵)
Variable Elimination
Interleaved Join-Eliminate:

Start with the factors 𝑃(𝑅), 𝑃(𝐵|𝑅), 𝑃(𝐿|𝐵)
  Join R:  𝑃(𝑅) × 𝑃(𝐵|𝑅) → 𝑃(𝑅, 𝐵)
  Sum R:   𝑃(𝑅, 𝐵) → 𝑃(𝐵)
  Join B:  𝑃(𝐵) × 𝑃(𝐿|𝐵) → 𝑃(𝐵, 𝐿)
  Sum B:   𝑃(𝐵, 𝐿) → 𝑃(𝐿)

𝑃(𝐿) = Σ_𝐵 𝑃(𝐿|𝐵) × Σ_𝑅 𝑃(𝑅) × 𝑃(𝐵|𝑅)
Variable Elimination
With evidence 𝑅 = 𝑟, compute 𝑃(𝐿|𝑟). Start with 𝑃(𝑟) = 0.1 and the restricted factors:

𝑃(𝐵|𝑟):            𝑃(𝐿|𝐵):
  r  b   0.8          b   l   0.3
  r  ¬b  0.2          b   ¬l  0.7
                      ¬b  l   0.1
                      ¬b  ¬l  0.9

Join R → 𝑃(𝑟, 𝐵):    Join B → 𝑃(𝑟, 𝐿, 𝐵):    Sum out B → 𝑃(𝐿, 𝑟):   Normalize → 𝑃(𝐿|𝑟):
  r  b   0.08          r  b   l   0.024         r  l   0.026            r  l   0.26
  r  ¬b  0.02          r  b   ¬l  0.056         r  ¬l  0.074            r  ¬l  0.74
                       r  ¬b  l   0.002
                       r  ¬b  ¬l  0.018
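A sketch of the interleaved join/sum-out pipeline for this chain, computing both 𝑃(𝐿) and 𝑃(𝐿|𝑟) with the numbers from the slides (𝑃(𝑏|¬𝑟) = 0.1 is read off the 𝑃(𝑅, 𝐵) table above):

```python
p_R = {True: 0.1, False: 0.9}
p_B_given_R = {True: 0.8, False: 0.1}   # P(B=t | R); P(b | not-r) inferred from the P(R,B) table
p_L_given_B = {True: 0.3, False: 0.1}   # P(L=t | B)

# Join R then sum out R: P(B) = sum_r P(r) P(B | r)
p_B = {b: sum(p_R[r] * (p_B_given_R[r] if b else 1 - p_B_given_R[r])
              for r in (True, False)) for b in (True, False)}

# Join B then sum out B: P(L) = sum_b P(b) P(L | b)
p_L = {l: sum(p_B[b] * (p_L_given_B[b] if l else 1 - p_L_given_B[b])
              for b in (True, False)) for l in (True, False)}
print(p_L)  # {True: 0.134, False: 0.866}

# With evidence R = r: restrict to r, sum out B, then normalize
unnorm = {l: sum(p_R[True] * (p_B_given_R[True] if b else 1 - p_B_given_R[True])
                 * (p_L_given_B[b] if l else 1 - p_L_given_B[b])
                 for b in (True, False)) for l in (True, False)}
z = sum(unnorm.values())
print({l: round(v / z, 2) for l, v in unnorm.items()})  # {True: 0.26, False: 0.74}
```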
Summary
o Modelling uncertainty and belief with probability
o Using conditional independence to factorize a joint probability efficiently
o A Bayesian network is a graphical model of a set of conditional independences
o Exact inference in a BN with variable elimination
o Approximate inference algorithms (Monte Carlo, Gibbs sampling, etc.)
o Probabilistic reasoning over time (Hidden Markov Models, Dynamic BNs)