
Reasoning with Uncertainty

AIFA (AI61005)
2021 Autumn

Plaban Kumar Bhowmick


Introduction
Agent in Uncertain World

o World view till now


▪ Completely observable

▪ Deterministic outcomes

▪ Complete state information

o Uncertainty in world view


▪ Partially observable

▪ Outcomes are non-deterministic

▪ Incomplete state information


3
Agent in Uncertain World: Example

o A robot with a smell sensor does not know the exact location of an object

o A doctor diagnoses based on observations of the patient

o A teacher does not know exactly what, or whether, a student understands

o The exact causes of a machine's malfunction are unknown

o A bomb detector runs multiple probes


4
Agent in the Face of Uncertainty

o Holds beliefs about the state of the partially observable world

o Has to reason with uncertain knowledge/beliefs

o Must live with uncertain outcomes

o Representing uncertainty
▪ Probability: the calculus of gambling

5
Certain vs. Uncertain World

o CSP – States and Variables
▪ Constraints eliminate some worlds
▪ The other worlds remain possible

o Probabilistic – States and Variables
▪ Does not eliminate any world
▪ Some worlds are more likely; others are less likely

T     W     CSP   P
hot   sun   T     0.4
hot   rain  F     0.1
cold  sun   F     0.2
cold  rain  T     0.3

6
Probability Basics
Probabilistic Knowledge Base
Propositions are represented via random variables, e.g., (T = hot)

Probability Distribution P(X):

P(X) : dom(X) → ℝ such that
for x ∈ dom(X), P(x) = P(X = x) is the probability of the
proposition X = x

Joint Probability Distribution P(X, Y):

P(X, Y): the distribution over the probabilities of the expressions
X = x ∧ Y = y, i.e., P(X = x, Y = y)
for x ∈ dom(X), y ∈ dom(Y)
8
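To make these definitions concrete, here is a minimal sketch (not part of the slides) that represents P(X) and a joint P(X, Y) as Python dictionaries, using the temperature/weather values that appear in later slides.

```python
# Minimal sketch: a distribution maps each value in dom(X) to a probability.
P_T = {"hot": 0.5, "cold": 0.5}                       # P(T)

P_TW = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,    # joint P(T, W)
        ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

# A valid distribution is non-negative and sums to 1.
assert all(p >= 0 for p in P_TW.values())
assert abs(sum(P_TW.values()) - 1.0) < 1e-9
print(P_TW[("hot", "sun")])   # P(T = hot, W = sun) = 0.4
```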
Probabilistic Knowledge Base: Axioms

Axiom 1: 0 ≤ P(α) for any proposition α

Axiom 2: P(τ) = 1 if τ is a tautology

Axiom 3: If α and β are mutually exclusive [¬(α ∧ β) is a tautology],
P(α ∨ β) = P(α) + P(β)

Some consequences:
P(¬α) = 1 − P(α)
If α ⟺ β, then P(α) = P(β)
P(α) = P(α ∧ β) + P(α ∧ ¬β)
For any variable V:  P(α) = Σ_{d ∈ dom(V)} P(α ∧ V = d)
P(α ∨ β) = P(α) + P(β) − P(α ∧ β)

9
Conditional Probability

o Agent belief: prior probability

o Conditional probability
▪ How is the belief updated when the agent has new evidence?

o Posterior probability
▪ Conditioning on everything the agent knows about a situation
▪ P(h|e): belief in proposition h given another proposition e
▪ P(h) = P(h|True): belief in h before the agent has observed anything

10
Conditional Probability: Diagnostic Assistant

o The patient's symptoms are the evidence [e]

o Prior: distribution over possible diseases before looking at
the symptoms [P(h)]

o Posterior: probability distribution over diseases after
considering the evidence [P(h|e)]

P(h|e) = P(h ∧ e) / P(e),    P(e) > 0
11
Joint Probability

Chain Rule: decompose conjunctions

P(α1 ∧ α2 ∧ ⋯ ∧ αn)
= P(α1) × P(α2|α1) × P(α3|α1 ∧ α2) × ⋯ × P(αn|α1 ∧ α2 ∧ ⋯ ∧ α_{n−1})
= ∏_{i=1}^{n} P(αi | α1 ∧ α2 ∧ ⋯ ∧ α_{i−1})

Event: a set E of outcomes

P(E) = Σ_{(α1, α2, …, αn) ∈ E} P(α1 ∧ α2 ∧ ⋯ ∧ αn)
12
Joint Probability: Event

T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

Event: a set E of outcomes
P(hot ∧ sun) = 0.4
P(hot) = 0.4 + 0.1 = 0.5
P(hot ∨ sun) = 0.4 + 0.1 + 0.2 = 0.7

13
Joint Probability: Marginal Distribution
Marginalization: collapse rows by summing

P(t) = Σ_{w ∈ dom(W)} P(t, w)        P(w) = Σ_{t ∈ dom(T)} P(t, w)

Joint P(T, W):          P(T):              P(W):
T     W     P           T     P            W     P
hot   sun   0.4         hot   0.5          sun   0.6
hot   rain  0.1         cold  0.5          rain  0.4
cold  sun   0.2
cold  rain  0.3
14
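Marginalization is exactly the row-collapsing step above; a small illustrative Python sketch (not from the slides):

```python
from collections import defaultdict

P_TW = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
        ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

def marginalize(joint, keep_index):
    """Collapse a two-variable joint table onto one variable by summing the other out."""
    out = defaultdict(float)
    for assignment, p in joint.items():
        out[assignment[keep_index]] += p
    return dict(out)

P_T = marginalize(P_TW, 0)
P_W = marginalize(P_TW, 1)
print({k: round(v, 3) for k, v in P_T.items()})   # P(T): {'hot': 0.5, 'cold': 0.5}
print({k: round(v, 3) for k, v in P_W.items()})   # P(W): {'sun': 0.6, 'rain': 0.4}
```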
Joint to Conditional Distribution

P(W = s | T = c) = P(W = s ∧ T = c) / P(T = c)
                 = 0.2 / (P(c, s) + P(c, r)) = 0.2 / (0.2 + 0.3) = 0.4

P(W = r | T = c) = 0.3 / (0.2 + 0.3) = 0.6

T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

In general:  P(α1|α2) = P(α1 ∧ α2) / P(α2) = P(α1 ∧ α2) / Σ_{α1} P(α1 ∧ α2)

15
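Conditioning on T = cold amounts to selecting the matching rows of the joint and renormalizing; an illustrative Python sketch:

```python
P_TW = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
        ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

def condition_on_T(joint, t_value):
    """P(W | T = t_value): keep rows with T = t_value, then renormalize by P(T = t_value)."""
    selected = {w: p for (t, w), p in joint.items() if t == t_value}
    total = sum(selected.values())              # P(T = t_value)
    return {w: p / total for w, p in selected.items()}

print(condition_on_T(P_TW, "cold"))   # {'sun': 0.4, 'rain': 0.6}
```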
Bayes’ Rule
How should an agent update its belief in a proposition
based on a new piece of evidence?

P(h|k) → belief in proposition h based on background knowledge k

P(h|e ∧ k) → new belief in proposition h after observing new evidence e

P(h ∧ e) = P(h|e) × P(e) = P(e|h) × P(h)

P(h|e) = P(e|h) P(h) / P(e)

With background knowledge k:

P(h|e ∧ k) = P(e|h ∧ k) × P(h|k) / P(e|k)
16
Bayes’ Rule

Let the sample space be partitioned into B1, B2, …, Bn

[Figure: a sample space partitioned into regions B1, B2, B3, …, Bn]

P(Bi|A) = P(A|Bi) × P(Bi) / Σ_j P(A|Bj) × P(Bj)

17
Bayes’ Rule: Example

P(Bi|A) = P(A|Bi) × P(Bi) / Σ_j P(A|Bj) × P(Bj)

[Figure: three containers of marbles: C1 with 75 / 25, C2 with 60 / 40, C3 with 45 / 55]

A container is chosen at random and a marble from it is picked at random.
What is the probability that the chosen marble is red?

Suppose the chosen marble is red. What is the probability that C1 was chosen?
18
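A worked sketch of the container example, assuming the first count shown for each container is its number of red marbles (the extracted figure does not label the colours):

```python
# Assumed red-marble fractions per container; each container is picked with probability 1/3.
red_fraction = {"C1": 75 / 100, "C2": 60 / 100, "C3": 45 / 100}
prior = {c: 1 / 3 for c in red_fraction}

# Total probability: P(red) = sum_i P(red | Ci) P(Ci)
p_red = sum(red_fraction[c] * prior[c] for c in prior)
print(round(p_red, 3))                 # 0.6

# Bayes' rule: P(C1 | red) = P(red | C1) P(C1) / P(red)
p_c1_given_red = red_fraction["C1"] * prior["C1"] / p_red
print(round(p_c1_given_red, 3))        # 0.417
```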
Independence
Number of assignments needed to specify a joint probability distribution
P(α1 ∧ α2 ∧ ⋯ ∧ αn) ⇒ O(2^n)

Assume some events to be independent:

X ⊥ Y :  ∀x,y  P(X = x, Y = y) = P(X = x) P(Y = y)
         ∀x,y  P(x|y) = P(x)

19
Independence
Independence is a simplifying modelling assumption
{2020_US_Presidential_Election_Result, your_toothache}
{weather, traffic, cavity, toothache}

Test of independence: does P(T, W) = P(T) P(W)?

Joint P(T, W):          P(T):           P(W):          Product P(T) P(W):
T     W     P           T     P         W     P        T     W     P
hot   sun   0.4         hot   0.5       sun   0.6      hot   sun   ?
hot   rain  0.1         cold  0.5       rain  0.4      hot   rain  ?
cold  sun   0.2                                        cold  sun   ?
cold  rain  0.3                                        cold  rain  ?
20
Independence
N fair independent coin flips:

P(X1):        P(X2):        …        P(Xn):
H   0.5       H   0.5                H   0.5
T   0.5       T   0.5                T   0.5

Joint table over X1, X2, …, Xn:  P(X1, X2, …, Xn) = P(X1) P(X2) ⋯ P(Xn)

21
Conditional Independence
Two events A and B are conditionally independent given another
event C with P(C) > 0 if:

P(A ∧ B|C) = P(A|C) P(B|C)

Recall:  P(A|B) = P(A ∧ B) / P(B)

Then:  P(A|B, C) = P(A ∧ B|C) / P(B|C) = P(A|C) P(B|C) / P(B|C) = P(A|C)

23
Conditional Independence
P(Toothache, Cavity, Catch)

P(+catch | +toothache, +cavity) = P(+catch | +cavity)

P(+catch | +toothache, −cavity) = P(+catch | −cavity)

P(Catch | Toothache, Cavity) = P(Catch | Cavity)

P(Toothache | Catch, Cavity) = P(Toothache | Cavity)

P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)

Toothache ⊥ Catch | Cavity


25
Conditional Independence
Amount of speeding fine (SF), Type of Car (CT), Speed (S)

Lung Cancer (LC), Yellow Teeth (YT), Smoking (S)

Car Motor working? (M), Radio working (R), Battery State (B)

Future (F), Past (P), Present (C)

Imagine that you know the value of Z and you are trying to guess the value of
X. In your pocket is an envelope containing the value of Y. Would opening the
envelope help you guess X? If not, then X ⊥ Y | Z.
26
Bayesian Networks
Conditional Independence & Chain Rule
Interested in: P(X1, X2, …, Xn), which needs O(2^n) assignments

Chain Rule: P(X1, X2, …, Xn) = P(X1) P(X2|X1) ⋯ P(Xn|X1, X2, …, X_{n−1})

Still O(2^n) assignments

Make conditional independence assumptions, e.g., P(Xi | Xj, Xk) = P(Xi | Xj)

Markov assumption: P(X1, X2, …, Xn) = P(X1) P(X2|X1) ⋯ P(Xn|X_{n−1})

28
Conditional Independence & Chain Rule
P(Rain, Traffic, Umbrella)        Assume T ⊥ U | R:
P(U | T, R) = P(U|R)
P(T | U, R) = P(T|R)
P(T, U | R) = P(T|R) P(U|R)

P(Rain, Traffic, Umbrella) = P(Rain) P(Traffic|Rain) P(Umbrella|Traffic, Rain)
                           = P(Rain) P(Traffic|Rain) P(Umbrella|Rain)

29
Bayes’ Rule: Revisited
P(h|e) = P(e|h) P(h) / P(e)

Modelling causality with Bayes' Rule:

P(Cause|Effect) = P(Effect|Cause) P(Cause) / P(Effect)

Diagnostic probability:

P(fire|alarm) = P(alarm|fire) P(fire) / P(alarm)

30
Bayesian Networks
Given a random variable X, a small set of variables may directly affect X's value.

X is conditionally independent of the other variables given
values for the directly affecting variables.

[Figure: parents Xj, Xk, Xm with arrows into Xi]

A Bayesian network is a graphical representation of conditional
dependence among a set of random variables.
31
Bayesian Networks
[Figure: parents Xj, Xk, Xm with arrows into Xi]

Chain Rule:

P(X1, X2, …, Xn) = ∏_{i=1}^{n} P(Xi | X1, X2, …, X_{i−1})

Parent(Xi) = the minimal set of predecessors of Xi in the total
ordering such that the other predecessors are
conditionally independent of Xi given Parent(Xi)

Parent(Xi) ⊆ {X1, X2, …, X_{i−1}}

P(Xi | X1, X2, …, X_{i−1}) = P(Xi | Parent(Xi))

P(X1, X2, …, Xn) = ∏_{i=1}^{n} P(Xi | Parent(Xi))     (the factors of the JPD)
32
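As a minimal sketch (the names and numbers below are illustrative, not from the slides), the factored joint can be evaluated by storing, for each node, its parent set and a CPT indexed by the parents' values:

```python
def joint_probability(assignment, parents, cpt):
    """P(X1, ..., Xn = assignment) = product over nodes of P(Xi | Parent(Xi))."""
    p = 1.0
    for node, value in assignment.items():
        parent_values = tuple(assignment[q] for q in parents[node])
        p *= cpt[node][parent_values + (value,)]
    return p

# Tiny illustrative two-node network: Rain -> Traffic (made-up numbers).
parents = {"Rain": (), "Traffic": ("Rain",)}
cpt = {
    "Rain":    {(True,): 0.1, (False,): 0.9},
    "Traffic": {(True, True): 0.8, (True, False): 0.2,
                (False, True): 0.1, (False, False): 0.9},   # keyed by (Rain, Traffic)
}
print(round(joint_probability({"Rain": True, "Traffic": True}, parents, cpt), 3))   # 0.08
```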
Bayesian Networks
Bayesian Belief Networks: Representation

A Directed Acyclic Graph (DAG)

Each node is labeled by a random variable

A domain for each random variable

A set of CPDs, one associated with each node:

P(Xi | Parent(Xi)) for each node Xi

33
Bayesian Networks: Example
o Fire Diagnostic Assistant
▪ Infer whether there is a fire in the building based on noisy sensor information

o Agent Information
▪ Report: whether everyone is leaving the building
• False positive: reports that people are leaving when they are not
• False negative: does not report when everyone is leaving

▪ Fire alarm
• People may leave when it goes off
• Tampering or fire could set it off

▪ Smoke: fire can cause smoke to rise

34
Bayesian Networks: Example
[Network: Tampering (T) → Alarm (A) ← Fire (F);  Fire → Smoke (S);  Alarm → Leaving (L);  Leaving → Report (R)]

P(Tampering = t) = 0.02          P(Fire = t) = 0.01

P(Alarm = t | Tampering, Fire):
Tampering  Fire   A = t
t          t      0.5
t          f      0.85
f          t      0.99
f          f      0.0001

P(Smoke = t | Fire):        P(Leaving = t | Alarm):        P(Report = t | Leaving):
Fire   S = t                Alarm   L = t                  Leaving   R = t
t      0.9                  t       0.88                   t         0.75
f      0.01                 f       0.001                  f         0.01

Number of Assignments:
35
Bayesian Networks: Example
𝑃(𝑇𝑎𝑚𝑝𝑒𝑟𝑖𝑛𝑔 = 𝑡, 𝐹𝑖𝑟𝑒 = 𝑓, 𝐴𝑙𝑎𝑟𝑚 = 𝑡, 𝑆𝑚𝑜𝑘𝑒 = 𝑓, 𝐿𝑒𝑎𝑣𝑖𝑛𝑔 = 𝑡, 𝑅𝑒𝑝𝑜𝑟𝑡 = 𝑡) =

𝑃(𝑆𝑚𝑜𝑘𝑒 = 𝑡) =

36
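Both queries on this slide can be read off the CPTs above; a worked sketch using those numbers:

```python
# CPT entries from the fire-alarm network (probability of each variable being true).
p_tampering = 0.02
p_fire      = 0.01
p_alarm     = {(True, True): 0.5, (True, False): 0.85,
               (False, True): 0.99, (False, False): 0.0001}   # keyed by (Tampering, Fire)
p_smoke     = {True: 0.9, False: 0.01}                        # keyed by Fire
p_leaving   = {True: 0.88, False: 0.001}                      # keyed by Alarm
p_report    = {True: 0.75, False: 0.01}                       # keyed by Leaving

# P(T=t, F=f, A=t, S=f, L=t, R=t)
#   = P(T=t) P(F=f) P(A=t | T=t, F=f) P(S=f | F=f) P(L=t | A=t) P(R=t | L=t)
joint = (p_tampering * (1 - p_fire) * p_alarm[(True, False)]
         * (1 - p_smoke[False]) * p_leaving[True] * p_report[True])
print(round(joint, 4))        # 0.011

# P(Smoke = t) = P(S=t | F=t) P(F=t) + P(S=t | F=f) P(F=f)
p_smoke_t = p_smoke[True] * p_fire + p_smoke[False] * (1 - p_fire)
print(round(p_smoke_t, 4))    # 0.0189
```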
Bayesian Networks: Example
𝑃(𝐴𝑙𝑎𝑟𝑚 = 𝑡) =

𝑃(𝐿𝑒𝑎𝑣𝑖𝑛𝑔) =

37
Bayesian Networks: Example
𝑃(𝑅𝑒𝑝𝑜𝑟𝑡 = 𝑡) =

38
Bayesian Networks: Example
[Network: Burglary (B) → Alarm (A) ← Earthquake (E);  Alarm → JohnCalls (J);  Alarm → MaryCalls (M)]

P(Burglary = t) = 0.001          P(Earthquake = t) = 0.002

P(Alarm = t | Burglary, Earthquake):
B    E    A = t
t    t    0.95
t    f    0.94
f    t    0.29
f    f    0.001

P(JohnCalls = t | Alarm):        P(MaryCalls = t | Alarm):
A    J = t                       A    M = t
t    0.90                        t    0.70
f    0.05                        f    0.01

39
Bayesian Networks: Construction Issues

40
Bayesian Networks: CI and BN Topology

X → Y → Z → W

Full chain rule:            P(X, Y, Z, W) = P(X) P(Y|X) P(Z|X, Y) P(W|X, Y, Z)
Using the chain structure:  P(X, Y, Z, W) = P(X) P(Y|X) P(Z|Y) P(W|Z)

X → Y → Z

Are X and Z truly independent?

Low pressure causes rain, rain causes traffic

D-separation to study independence properties
41
D-Separation: Causal Chains
L = Low Pressure,  R = Rain,  T = Traffic

L → R → T

P(l, r, t) = P(l) P(r|l) P(t|r)

Are L and T independent? NO

Are L and T independent given R?

P(t | l, r) = P(l, r, t) / P(l, r) = P(l) P(r|l) P(t|r) / (P(l) P(r|l)) = P(t|r), so YES

Evidence along the chain blocks the flow of influence (information).
42


Common Cause

R = Rain,  O = Lawn Overflow,  T = Traffic

O ← R → T

P(r, o, t) = P(r) P(o|r) P(t|r)

Are O and T independent? NO

Are O and T independent given R?

P(t | o, r) = P(r, o, t) / P(o, r) = P(r) P(o|r) P(t|r) / (P(r) P(o|r)) = P(t|r), so YES

Observing the cause blocks influence between the effects.


43
Common Effect (V-Structure)

R = Rain,  M = IPL Match,  T = Traffic

R → T ← M

P(r, m, t) = P(r) P(m) P(t|r, m)

Are R and M independent? YES

Are R and M independent given T? NO

Traffic is a common effect of two competing explanations (causes).

Observing an effect activates influence between its causes.


44
D-Separation

[Figure: example network over nodes L, R, B, O, T, T′]

o Two nodes connected by an undirected path become independent when an
evidence node along the path blocks it (chain and common-cause connections).

o A V-structure on the path blocks it as long as the collider (evidence) node is
not observed.

45
Path Patterns
[Figure: active triples vs. inactive triples]

o X ⊥ Y | Z ??
▪ Yes, if X and Y are “d-separated” by Z
▪ If there is no active path among all (undirected) paths from X to Y,
then they are independent

o A path is active if each triple along it is active
▪ Causal chain (X → Y → Z), Y unobserved
▪ Common cause (X ← Y → Z), Y unobserved
▪ Common effect (X → Y ← Z), Y or one of its descendants observed

46
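These triple rules can be turned into a brute-force d-separation check: enumerate the undirected paths between two nodes and test whether every triple on some path is active. The following is an illustrative sketch, not code from the course:

```python
def d_separated(x, y, given, edges):
    """True if x and y are d-separated by the set `given` in the DAG `edges` (list of (parent, child))."""
    children = {}
    for p, c in edges:
        children.setdefault(p, set()).add(c)

    def descendants(n):
        out, stack = set(), [n]
        while stack:
            for c in children.get(stack.pop(), ()):
                if c not in out:
                    out.add(c)
                    stack.append(c)
        return out

    def paths(a, b, visited):
        # all simple undirected paths from a to b
        if a == b:
            yield [a]
            return
        for p, c in edges:
            for u, v in ((p, c), (c, p)):
                if u == a and v not in visited:
                    for rest in paths(v, b, visited | {v}):
                        yield [a] + rest

    def triple_active(a, b, c):
        if (a, b) in edges and (b, c) in edges:   # causal chain a -> b -> c
            return b not in given
        if (c, b) in edges and (b, a) in edges:   # causal chain c -> b -> a
            return b not in given
        if (b, a) in edges and (b, c) in edges:   # common cause a <- b -> c
            return b not in given
        # common effect a -> b <- c: active iff b or one of b's descendants is observed
        return b in given or bool(descendants(b) & given)

    return not any(
        all(triple_active(*path[i:i + 3]) for i in range(len(path) - 2))
        for path in paths(x, y, {x})
    )

# The V-structure from the slides: R -> T <- M, with T -> T' (written T2 here).
edges = [("R", "T"), ("M", "T"), ("T", "T2")]
print(d_separated("R", "M", set(), edges))    # True:  R is independent of M
print(d_separated("R", "M", {"T"}, edges))    # False: observing the common effect activates the path
print(d_separated("R", "M", {"T2"}, edges))   # False: a descendant of the collider is observed
```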
Path Pattern & CI: Examples

[Network: R → T ← M,  T → T′]

R ⊥ M ?        Yes
R ⊥ M | T ?    No
R ⊥ M | T′ ?   No (T′ is a descendant of the collider T)

47
Path Pattern & CI: Examples

[Network: L → R,  R → O,  R → T ← M,  T → T′]

L ⊥ T′ | T ?     Yes
L ⊥ M ?          Yes
L ⊥ M | T ?      No
L ⊥ M | T′ ?     No
L ⊥ M | T, R ?   Yes
48
Path Pattern & CI: Examples

[Network over O, T, R, H]

T ⊥ O ?
T ⊥ O | R ?
T ⊥ O | R, H ?

O = Lawn Overflow,  R = Rain,  T = Traffic,  H = Stuck at home
49
Topology & Distribution
[Figure: alternative three-node topologies over X, Y, Z, grouped by the set of
independence statements each encodes]

{X ⊥ Y, X ⊥ Z, Y ⊥ Z, X ⊥ Y | Z, X ⊥ Z | Y, Y ⊥ Z | X}   (no edges among X, Y, Z)

{X ⊥ Z | Y}   (chains and the common cause through Y)

{}   (fully connected structures)

50
Probabilistic Inference
Probabilistic Inference Problem
Let V be the set of all variables in a given Bayesian network.
Compute the posterior probability

P(X | E1, E2, …, Em)

where X is the query variable and E1, …, Em are the evidence variables.

V = {X} ∪ {E1, E2, …, Em} ∪ {Y1, Y2, …, Yl}

where Y1, …, Yl are the hidden variables.

Example:  P(Burglary | MaryCalls = true, JohnCalls = true)


52
Probabilistic Inference Problem
Let V be the set of all variables in a given Bayesian network.
Compute the most likely explanation (cause)

argmax_x P(X = x | E1, E2, …, Em)

where X is the query variable and E1, …, Em are the evidence variables.

Example: between burglary and earthquake, which is the more probable
explanation for both Mary and John calling?

53
Inference by Enumeration
P(X|e) = P(X, e) / P(e) = α P(X, e) = α Σ_y P(X, e, y)

[Network: B → A ← E,  A → J,  A → M]

P(b | j, m) = α Σ_e Σ_a P(b, j, m, e, a)

P(b | j, m) = α Σ_e Σ_a P(b) P(e) P(a|b, e) P(j|a) P(m|a)

Total number of products: 2^2 × 4 = 16

P(j) = α Σ_b Σ_e Σ_a Σ_m P(b) P(e) P(a|b, e) P(j|a) P(m|a)

Total number of products: 2^4 × 4 = 64 ≈ O(n · 2^n)


54
Inference by Enumeration
P(b | j, m) = α Σ_e Σ_a P(b, j, m, e, a)

P(b | j, m) = α Σ_e Σ_a P(b) P(e) P(a|b, e) P(j|a) P(m|a)

[Network: B → A ← E,  A → J,  A → M]

Pushing the summations inwards:

P(b | j, m) = α P(b) Σ_e P(e) Σ_a P(a|b, e) P(j|a) P(m|a)

P(j) = α Σ_a P(j|a) Σ_m P(m|a) Σ_b P(b) Σ_e P(e) P(a|b, e)

Total number of products: 2^4 = 16 ≈ O(2^n)


55
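A brute-force enumeration sketch for the burglary network (using the CPT values from the earlier slide) that computes P(Burglary | JohnCalls = t, MaryCalls = t):

```python
import itertools

# CPTs from the burglary/earthquake network (probability of each variable being true).
P_b = 0.001
P_e = 0.002
P_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A = t | B, E)
P_j = {True: 0.90, False: 0.05}                      # P(J = t | A)
P_m = {True: 0.70, False: 0.01}                      # P(M = t | A)

def pr(p_true, value):
    """Probability of a Boolean variable taking `value`, given P(variable = true)."""
    return p_true if value else 1 - p_true

# Unnormalized P(B = b, j = t, m = t) = sum_e sum_a P(b) P(e) P(a|b,e) P(j|a) P(m|a)
unnormalized = {}
for b in (True, False):
    total = 0.0
    for e, a in itertools.product((True, False), repeat=2):
        total += pr(P_b, b) * pr(P_e, e) * pr(P_a[(b, e)], a) * P_j[a] * P_m[a]
    unnormalized[b] = total

alpha = 1 / sum(unnormalized.values())
print({b: round(alpha * p, 3) for b, p in unnormalized.items()})
# {True: 0.284, False: 0.716}: a burglary is still fairly unlikely even after both calls.
```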
Inference by Enumeration
P(b | j, m) = α P(b) Σ_e P(e) Σ_a P(a|b, e) P(j|a) P(m|a)

56
Variable Elimination

o Eliminate repeated calculations

▪ Compute once and save the results for future use

▪ Dynamic programming

o Summations over each variable are done over only the portions of
the expression that depend on that variable

o Works on factors

57
Variable Elimination

P(B | j, m) = α P(B) Σ_e P(e) Σ_a P(a|B, e) P(j|a) P(m|a)
            = α f1(B)  Σ_e f2(E)  Σ_a f3(A, B, E) f4(A) f5(A)

f1(B) = ⟨0.001, 0.999⟩        f2(E) = ⟨0.002, 0.998⟩        f3(A, B, E): a 2 × 2 × 2 table

f4(A) = ⟨P(j|a), P(j|¬a)⟩ = ⟨0.90, 0.05⟩        f5(A) = ⟨P(m|a), P(m|¬a)⟩ = ⟨0.70, 0.01⟩

P(B | j, m) = α f1(B) × Σ_e f2(E) × Σ_a f3(A, B, E) × f4(A) × f5(A)

58
Variable Elimination
Evaluation process (right-to-left): Product → Sum → Product → Sum …

P(B | j, m) = α f1(B) × Σ_e f2(E) × Σ_a f3(A, B, E) × f4(A) × f5(A)

Product, then sum out A:
f6(B, E) = Σ_a f3(A, B, E) × f4(A) × f5(A)
         = f3(a, B, E) × f4(a) × f5(a) + f3(¬a, B, E) × f4(¬a) × f5(¬a)

P(B | j, m) = α f1(B) × Σ_e f2(E) × f6(B, E)

Product, then sum out E:
f7(B) = Σ_e f2(E) × f6(B, E)
      = f2(e) × f6(B, e) + f2(¬e) × f6(B, ¬e)

P(B | j, m) = α f1(B) × f7(B)
59
Variable Elimination
Pointwise Product (×):
f(X1, …, Xj, Y1, …, Yk, Z1, …, Zl) = f1(X1, …, Xj, Y1, …, Yk) × f2(Y1, …, Yk, Z1, …, Zl)

Summing out a variable:
f(B, C) = Σ_a f3(A, B, C) = f3(a, B, C) + f3(¬a, B, C)

        = [.06 .24; .42 .28] + [.18 .72; .06 .04] = [.24 .96; .48 .32]
60
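The two factor operations can be sketched generically in Python: pointwise product joins two factors on their shared variables, and summing out collapses one variable. An illustrative sketch (the factor representation is my own, not the course's):

```python
import itertools
from collections import defaultdict

# A factor is a pair (variables, table); the table maps a tuple of values to a number.
def pointwise_product(f1, f2, domains):
    vars1, t1 = f1
    vars2, t2 = f2
    out_vars = vars1 + [v for v in vars2 if v not in vars1]
    table = {}
    for values in itertools.product(*(domains[v] for v in out_vars)):
        row = dict(zip(out_vars, values))
        table[values] = (t1[tuple(row[v] for v in vars1)]
                         * t2[tuple(row[v] for v in vars2)])
    return out_vars, table

def sum_out(var, factor):
    vars_, table = factor
    keep = [v for v in vars_ if v != var]
    idx = [vars_.index(v) for v in keep]
    out = defaultdict(float)
    for values, p in table.items():
        out[tuple(values[i] for i in idx)] += p
    return keep, dict(out)

# Example: f(B, C) = sum_a f3(A, B, C), with the 2x2x2 numbers from this slide.
f3 = (["A", "B", "C"], {
    (True, True, True): .06, (True, True, False): .24,
    (True, False, True): .42, (True, False, False): .28,
    (False, True, True): .18, (False, True, False): .72,
    (False, False, True): .06, (False, False, False): .04,
})
print({k: round(v, 2) for k, v in sum_out("A", f3)[1].items()})
# {(True, True): 0.24, (True, False): 0.96, (False, True): 0.48, (False, False): 0.32}
```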
Variable Elimination
Variable Ordering:

P(B | j, m) = α f1(B) × Σ_e f2(E) × Σ_a f3(A, B, E) × f4(A) × f5(A)

P(B | j, m) = α f1(B) × Σ_a f4(A) × f5(A) × Σ_e f2(E) × f3(A, B, E)

o Each choice of elimination ordering is valid

o Different orderings lead to different factorizations

o Time and space requirements are dominated by the size of the largest factor
▪ Finding the optimal ordering is intractable → heuristics
61
Variable Elimination
Variable Relevance:

P(J | b) = α P(b) Σ_e P(e) Σ_a P(a|b, e) P(J|a) Σ_m P(m|a)

Σ_m P(m|a) = P(m|a) + P(¬m|a) = 1

o Variable M is irrelevant

o P(J | b) remains unchanged if M is removed from the BN

o Any leaf node that is not a query or evidence variable can be removed

o Every variable that is not an ancestor of a query variable or evidence
variable is irrelevant to the query.
62
Variable Elimination: Example
Joint distribution P(T, W):    Selected joint P(T = cold, W):    Single conditional P(W | T = cold):
T     W     P                  T     W     P                     T     W     P
hot   sun   0.4                cold  sun   0.2                   cold  sun   0.4
hot   rain  0.1                cold  rain  0.3                   cold  rain  0.6
cold  sun   0.2
cold  rain  0.3

Family of conditionals P(W | T):    Specified family P(W = rain | T):
T     W     P                       T     W     P
hot   sun   0.8                     hot   rain  0.2
hot   rain  0.2                     cold  rain  0.6
cold  sun   0.4
cold  rain  0.6

All of these tables are factors.
63
Example
R = Raining,  B = Bad Weather,  L = Late for Class

R → B → L

P(R):          P(B | R):             P(L | B):
r    0.1       r     b     0.8       b     l     0.3
¬r   0.9       r     ¬b    0.2       b     ¬l    0.7
               ¬r    b     0.1       ¬b    l     0.1
               ¬r    ¬b    0.9       ¬b    ¬l    0.9
64
Inference by Enumeration: Example

P(R, B, L)

Join R:  P(R) × P(B|R) = P(R, B)

P(R):          P(B | R):             P(R, B):
r    0.1       r     b     0.8       r     b     0.08
¬r   0.9       r     ¬b    0.2       r     ¬b    0.02
               ¬r    b     0.1       ¬r    b     0.09
               ¬r    ¬b    0.9       ¬r    ¬b    0.81

65
Inference by Enumeration: Example

P(R, B, L)

Join B:  P(R, B) × P(L|B) = P(R, B, L)

P(R, B):             P(L | B):            P(R, B, L):
r     b     0.08     b     l     0.3      r     b     l     0.024
r     ¬b    0.02     b     ¬l    0.7      r     b     ¬l    0.056
¬r    b     0.09     ¬b    l     0.1      r     ¬b    l     0.002
¬r    ¬b    0.81     ¬b    ¬l    0.9      r     ¬b    ¬l    0.018
                                          …     …     …     …

66
Inference by Enumeration: Example
Marginalization

Sum out R:
P(R, B):             P(B):
r     b     0.08     b     0.17 = 0.08 + 0.09
r     ¬b    0.02     ¬b    0.83 = 0.02 + 0.81
¬r    b     0.09
¬r    ¬b    0.81

P(R, B, L)  --Sum R-->  P(B, L)  --Sum B-->  P(L)

P(L) = Σ_B Σ_R P(R) × P(B|R) × P(L|B)
67
Variable Elimination
Interleaved Join-Eliminate

[Figure: along the chain R → B → L, Join R combines P(R) and P(B|R) into P(R, B);
Sum R gives P(B); Join B combines P(B) and P(L|B) into P(B, L); Sum B gives P(L)]

P(L) = Σ_B P(L|B) × Σ_R P(R) × P(B|R)
68
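Running the numbers from the example CPTs through this interleaved scheme gives P(L); an illustrative sketch:

```python
# CPTs from the R -> B -> L example.
P_R = {True: 0.1, False: 0.9}
P_B_given_R = {(True, True): 0.8, (True, False): 0.2,
               (False, True): 0.1, (False, False): 0.9}   # keyed by (R, B)
P_L_given_B = {(True, True): 0.3, (True, False): 0.7,
               (False, True): 0.1, (False, False): 0.9}   # keyed by (B, L)

# Eliminate R:  P(B) = sum_R P(R) P(B|R)
P_B = {b: sum(P_R[r] * P_B_given_R[(r, b)] for r in (True, False))
       for b in (True, False)}
print({b: round(p, 3) for b, p in P_B.items()})   # {True: 0.17, False: 0.83}

# Eliminate B:  P(L) = sum_B P(B) P(L|B)
P_L = {l: sum(P_B[b] * P_L_given_B[(b, l)] for b in (True, False))
       for l in (True, False)}
print({l: round(p, 3) for l, p in P_L.items()})   # {True: 0.134, False: 0.866}
```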
Variable Elimination
With evidence R = r, compute P(L | r):

P(r) = 0.1          P(B | r):  b 0.8, ¬b 0.2          P(r, B):  (r, b) 0.08, (r, ¬b) 0.02

P(L | B):  (b, l) 0.3, (b, ¬l) 0.7, (¬b, l) 0.1, (¬b, ¬l) 0.9

P(r, B, L):  (r, b, l) 0.024, (r, b, ¬l) 0.056, (r, ¬b, l) 0.002, (r, ¬b, ¬l) 0.018

Sum out B:   P(L, r):  (r, l) 0.026, (r, ¬l) 0.074

Normalize:   P(L | r):  l 0.26, ¬l 0.74
69
Summary
o Modelling uncertainty and belief with probability
o Using conditional independence to factorize a joint probability distribution
efficiently
o A Bayesian network is a graphical model of a set of conditional
independences
o Exact inference in a BN with variable elimination
o Approximate inference algorithms (Monte Carlo, Gibbs sampling, etc.)
o Probabilistic reasoning with time (Hidden Markov Models, Dynamic BNs)
70
