cs188-su24-lec08
Example: Alarm Network

(Structure: B -> A <- E, with A -> J and A -> M)

B  P(B)          E  P(E)
+b 0.001         +e 0.002
-b 0.999         -e 0.998

B  E  A  P(A|B,E)
+b +e +a 0.95
+b +e -a 0.05
+b -e +a 0.94
+b -e -a 0.06
-b +e +a 0.29
-b +e -a 0.71
-b -e +a 0.001
-b -e -a 0.999

A  J  P(J|A)     A  M  P(M|A)
+a +j 0.9        +a +m 0.7
+a -j 0.1        +a -m 0.3
-a +j 0.05       -a +m 0.01
-a -j 0.95       -a -m 0.99
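As a sanity check on these CPTs, here is a small enumeration sketch (mine, not from the slides) that computes the classic query P(+b | +j, +m) directly from the joint distribution the network defines:

from itertools import product

# CPTs from the slide, stored as P(+value | parent assignment)
P_B = {True: 0.001, False: 0.999}                     # P(B)
P_E = {True: 0.002, False: 0.998}                     # P(E)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}    # P(+a | B, E)
P_J = {True: 0.9, False: 0.05}                        # P(+j | A)
P_M = {True: 0.7, False: 0.01}                        # P(+m | A)

def joint(b, e, a, j, m):
    """P(b, e, a, j, m) as the product of the five CPT entries."""
    pa = P_A[(b, e)]
    return (P_B[b] * P_E[e]
            * (pa if a else 1 - pa)
            * (P_J[a] if j else 1 - P_J[a])
            * (P_M[a] if m else 1 - P_M[a]))

# P(+b | +j, +m) = P(+b, +j, +m) / P(+j, +m), summing out E and A
num = sum(joint(True, e, a, True, True) for e, a in product((True, False), repeat=2))
den = sum(joint(b, e, a, True, True) for b, e, a in product((True, False), repeat=3))
print(num / den)   # ~0.284: even with both calls, a burglary is still unlikely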
Conditional Independence
o X and Y are independent iff: for all x, y: P(x, y) = P(x) P(y)
o X and Y are conditionally independent given Z iff: for all x, y, z: P(x, y | z) = P(x | z) P(y | z)
Bayes Nets: Assumptions
o Assumptions we are required to make to define the
Bayes net when given the graph:
X Y Z W
X Y Z
D-separation: Outline
o Study independence properties for triples
o Why triples?
Causal Chains
o This configuration is a "causal chain": X -> Y -> Z
  X: Low pressure, Y: Rain, Z: Traffic
o Is X guaranteed to be independent of Z? No! In numbers:
  P( +y | +x ) = 1, P( -y | -x ) = 1,
  P( +z | +y ) = 1, P( -z | -y ) = 1
  Here X and Z are perfectly correlated: low pressure guarantees rain, which guarantees traffic.
o Given Y, is X guaranteed to be independent of Z?
  Yes!
  P(z | x, y) = P(x, y, z) / P(x, y) = P(x) P(y|x) P(z|y) / (P(x) P(y|x)) = P(z | y)
  Evidence along the chain "blocks" the influence
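To make the "blocking" claim concrete, here is a quick enumeration sketch; the CPT numbers are invented for illustration (the slide's deterministic 0/1 CPTs would leave some conditionals undefined):

import itertools

# Hypothetical CPTs for the chain X -> Y -> Z
P_x = {True: 0.3, False: 0.7}
P_y_given_x = {True: {True: 0.9, False: 0.1},
               False: {True: 0.2, False: 0.8}}   # P(y | x)
P_z_given_y = {True: {True: 0.75, False: 0.25},
               False: {True: 0.4, False: 0.6}}   # P(z | y)

def joint(x, y, z):
    return P_x[x] * P_y_given_x[x][y] * P_z_given_y[y][z]

# Check P(z | x, y) == P(z | y) for every assignment: once Y is observed,
# the chain is blocked and X carries no extra information about Z.
for x, y, z in itertools.product([True, False], repeat=3):
    p_xy = sum(joint(x, y, zz) for zz in [True, False])
    p_y = sum(joint(xx, y, zz) for xx in [True, False] for zz in [True, False])
    p_yz = sum(joint(xx, y, z) for xx in [True, False])
    assert abs(joint(x, y, z) / p_xy - p_yz / p_y) < 1e-12

print("X is independent of Z given Y in the chain")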
Common Causes
o This configuration is a "common cause": Y -> X and Y -> Z
  Y: Project due, X: Forums busy, Z: Lab full
o Guaranteed X independent of Z? No!
  One example set of CPTs for which X is not independent of Z is sufficient to show this independence is not guaranteed.
o Example: the project being due causes both busy forums and a full lab. In numbers:
  P( +x | +y ) = 1, P( -x | -y ) = 1,
  P( +z | +y ) = 1, P( -z | -y ) = 1
Common Cause
o This configuration is a "common cause": Y -> X and Y -> Z
  Y: Project due, X: Forums busy, Z: Lab full
o Guaranteed X and Z independent given Y?
  Yes!
  P(z | x, y) = P(x, y, z) / P(x, y) = P(y) P(x|y) P(z|y) / (P(y) P(x|y)) = P(z | y)
  Observing the cause blocks influence between effects.
Common Effect
o Last configuration: two causes of one effect (v-structure): X -> Z <- Y
  X: Raining, Y: Ballgame, Z: Traffic
o Are X and Y independent?
  Yes: the ballgame and the rain cause traffic, but they are not correlated
  Proof: P(x, y) = sum_z P(x, y, z) = sum_z P(x) P(y) P(z | x, y) = P(x) P(y)
Common Effect
o Last configuration: two causes of one effect (v-structure): X -> Z <- Y
  X: Raining, Y: Ballgame, Z: Traffic
o Are X and Y independent? Yes (proved previously)
o Are X and Y independent given Z? No!
  Observing an effect activates influence between possible causes.
  This is backwards from the other cases: observing the middle node activates the triple rather than blocking it.
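A small numeric sketch of this "explaining away" effect (the CPT values here are invented, not the lecture's): marginally X and Y are independent, but once the effect Z is observed, learning Y changes our belief about X.

bools = (True, False)
P_x = {True: 0.1, False: 0.9}    # X: Raining
P_y = {True: 0.1, False: 0.9}    # Y: Ballgame
P_z = {(True, True): 0.95, (True, False): 0.80,   # P(+z | X, Y): Traffic
       (False, True): 0.70, (False, False): 0.05}

def joint(x, y, z):
    pz = P_z[(x, y)]
    return P_x[x] * P_y[y] * (pz if z else 1.0 - pz)

# Marginally independent: P(+x | +y) equals P(+x)
p_x_given_y = (sum(joint(True, True, z) for z in bools)
               / sum(joint(x, True, z) for x in bools for z in bools))
print(p_x_given_y, P_x[True])                 # 0.1  0.1

# Given traffic, the ballgame "explains away" the rain: beliefs differ
p_x_given_z = (sum(joint(True, y, True) for y in bools)
               / sum(joint(x, y, True) for x in bools for y in bools))
p_x_given_zy = (joint(True, True, True)
                / sum(joint(x, True, True) for x in bools))
print(p_x_given_z, p_x_given_zy)              # ~0.44 vs ~0.13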
The General Case
o Any complex example can be analyzed by breaking it into its member triples (causal chain, common cause, common effect)

Example
(Figure: example graphs over variables such as R, B, T, T' with independence queries, e.g. R independent of B? Yes, answered by d-separation)
Example
o Variables:
  o R: Raining
  o T: Traffic
  o D: Roof drips
  o S: I'm sad
o Structure: R -> T, R -> D, T -> S, D -> S
o Questions:
  o T independent of D? Not guaranteed (active common-cause path through R)
  o T independent of D given R? Yes (R blocks the common cause; the v-structure at S is inactive)
  o T independent of D given R, S? Not guaranteed (observing S activates the v-structure)
Another Perspective: Bayes Ball
Structure Implications
o Given a Bayes net structure, can run the d-separation algorithm to build a complete
  list of conditional independences that are necessarily true, of the form
  X_i independent of X_j given {X_k1, ..., X_kn}
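Here is a compact sketch of such a d-separation check (a reachability algorithm in the spirit of Bayes Ball; the function and variable names are mine, not the course staff's):

from collections import defaultdict

def d_separated(parents, x, y, z):
    """True iff x and y are d-separated given evidence set z.
    parents maps each node to a list of its parents."""
    children = defaultdict(list)
    for node, ps in parents.items():
        for p in ps:
            children[p].append(node)
    # Ancestors of the evidence: a v-structure is active iff its
    # collider is in z or has a descendant in z.
    ancestors, frontier = set(), list(z)
    while frontier:
        for p in parents.get(frontier.pop(), []):
            if p not in ancestors:
                ancestors.add(p)
                frontier.append(p)
    # Search over (node, direction) pairs: 'up' = arrived from a child,
    # 'down' = arrived from a parent.
    visited, frontier = set(), [(x, 'up')]
    while frontier:
        node, direction = frontier.pop()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node == y:
            return False                        # an active trail reached y
        if direction == 'up' and node not in z:
            frontier += [(p, 'up') for p in parents.get(node, [])]
            frontier += [(c, 'down') for c in children[node]]
        elif direction == 'down':
            if node not in z:                   # chain / common cause continues
                frontier += [(c, 'down') for c in children[node]]
            if node in z or node in ancestors:  # activated v-structure
                frontier += [(p, 'up') for p in parents.get(node, [])]
    return True

# Alarm network: B -> A <- E, A -> J, A -> M
parents = {'B': [], 'E': [], 'A': ['B', 'E'], 'J': ['A'], 'M': ['A']}
print(d_separated(parents, 'B', 'E', set()))    # True: causes independent
print(d_separated(parents, 'B', 'E', {'A'}))    # False: explaining away
print(d_separated(parents, 'J', 'M', {'A'}))    # True: A blocks the path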
Bayes Nets Representation Summary

Sampling
o Basic idea:
  o Draw N samples from a sampling distribution S
  o Compute an approximate posterior probability
  o Show this converges to the true probability P
o Sampling from a given distribution:
  o Step 1: Get sample u from uniform distribution over [0, 1)
    o E.g. random() in python
  o Step 2: Convert this sample u into an outcome for the given distribution by associating each target outcome with a sub-interval of [0, 1), with sub-interval size equal to the probability of the outcome

Example:
C     P(C)   sub-interval
red   0.6    [0, 0.6)
green 0.1    [0.6, 0.7)
blue  0.3    [0.7, 1.0)

If random() returns u = 0.83, then our sample is C = blue
E.g., after sampling 8 times: (figure of 8 sampled outcomes)
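In code, this two-step recipe is just inversion of the cumulative distribution; a minimal sketch (the function name is mine):

from random import random

def sample_discrete(dist):
    """dist: dict mapping outcome -> probability (sums to 1).
    Sub-intervals of [0, 1) follow the dict's insertion order."""
    u = random()                 # Step 1: u ~ Uniform[0, 1)
    cumulative = 0.0
    for outcome, p in dist.items():
        cumulative += p          # Step 2: find the sub-interval containing u
        if u < cumulative:
            return outcome
    return outcome               # guard against floating-point round-off

P_C = {'red': 0.6, 'green': 0.1, 'blue': 0.3}
# red -> [0, 0.6), green -> [0.6, 0.7), blue -> [0.7, 1.0),
# so u = 0.83 lands in blue's interval, as in the example above
print([sample_discrete(P_C) for _ in range(8)])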
Sampling in Bayes’ Nets
o Prior Sampling
o Rejection Sampling
o Likelihood Weighting
o Gibbs Sampling
Prior Sampling

(Network: Cloudy -> Sprinkler, Cloudy -> Rain; Sprinkler, Rain -> WetGrass)

C  P(C)
+c 0.5
-c 0.5

C  S  P(S|C)     C  R  P(R|C)
+c +s 0.1        +c +r 0.8
+c -s 0.9        +c -r 0.2
-c +s 0.5        -c +r 0.2
-c -s 0.5        -c -r 0.8

S  R  W  P(W|S,R)
+s +r +w 0.99
+s +r -w 0.01
+s -r +w 0.90
+s -r -w 0.10
-s +r +w 0.90
-s +r -w 0.10
-s -r +w 0.01
-s -r -w 0.99

Samples:
+c, -s, +r, +w
-c, +s, -r, +w
…
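A sketch of prior sampling for this network, hard-coding the slide's CPTs (the helper names are mine): each variable is sampled in topological order, conditioned on its already-sampled parents.

import random

def bern(p):
    """Return True ('+') with probability p."""
    return random.random() < p

def prior_sample():
    c = bern(0.5)                                 # P(+c)
    s = bern(0.1 if c else 0.5)                   # P(+s | c)
    r = bern(0.8 if c else 0.2)                   # P(+r | c)
    w = bern({(True, True): 0.99, (True, False): 0.90,
              (False, True): 0.90, (False, False): 0.01}[(s, r)])  # P(+w | s, r)
    return c, s, r, w

samples = [prior_sample() for _ in range(100000)]
# Any query can then be estimated by counting, e.g. P(+w):
print(sum(w for _, _, _, w in samples) / len(samples))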
Prior Sampling
o Then the probability of generating a particular sample is
  S_PS(x1, …, xn) = P(x1 | Parents(X1)) · … · P(xn | Parents(Xn)) = P(x1, …, xn)
o i.e., prior sampling generates samples with the right frequencies: the procedure is consistent

Likelihood Weighting (example: evidence S = +s, W = +w)
(Same network and CPTs as above; each evidence variable is fixed and contributes its CPT entry to the sample's weight)

Samples:
+c, +s, +r, +w    w = 1.0 x 0.1 x 0.99
-c, +s, -r, +w    w = 1.0 x 0.5 x 0.90
…
Likelihood Weighting
o Input: evidence instantiation
o w = 1.0
o for i = 1, 2, …, n in topological order:
  o if Xi is an evidence variable:
    o Xi = observation xi for Xi
    o Set w = w * P(xi | Parents(Xi))
  o else:
    o Sample xi from P(Xi | Parents(Xi))
o return (x1, x2, …, xn), w
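A direct translation of this pseudocode for the sprinkler network with evidence S = +s and W = +w (matching the example weights above); the network structure is hard-coded for brevity:

import random

W_CPT = {(True, True): 0.99, (True, False): 0.90,
         (False, True): 0.90, (False, False): 0.01}     # P(+w | S, R)

def weighted_sample():
    w = 1.0
    c = random.random() < 0.5                  # C: not evidence, so sample it
    s = True                                   # S: evidence, fix to +s ...
    w *= 0.1 if c else 0.5                     # ... and multiply in P(+s | c)
    r = random.random() < (0.8 if c else 0.2)  # R: not evidence, sample it
    wet = True                                 # W: evidence, fix to +w ...
    w *= W_CPT[(s, r)]                         # ... and multiply in P(+w | s, r)
    return (c, s, r, wet), w

# Estimate P(+c | +s, +w) as a ratio of weight totals
num = den = 0.0
for _ in range(100000):
    (c, _, _, _), w = weighted_sample()
    den += w
    num += w if c else 0.0
print(num / den)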
Likelihood Weighting
o Sampling distribution if z sampled and e fixed evidence:
  S_WS(z, e) = prod_i P(zi | Parents(Zi))
o Now, samples have weights:
  w(z, e) = prod_i P(ei | Parents(Ei))
o Together, the weighted sampling distribution is consistent:
  S_WS(z, e) · w(z, e) = P(z, e)
Example: Car Insurance: P(PropertyCost|e)
Gibbs Sampling
Markov Chain Monte Carlo
o Gibbs sampling is an MCMC technique (a special case of Metropolis-Hastings)
o MCMC (Markov chain Monte Carlo) is a family of randomized
algorithms for approximating some quantity of interest over a very
large state space
o Markov chain = a sequence of randomly chosen states (“random walk”),
where each state is chosen conditioned on the previous state
o Monte Carlo = a very expensive city in Monaco with a famous casino
o Monte Carlo = an algorithm (usually based on sampling) that has some
probability of producing an incorrect answer
o MCMC = wander around for a bit, average what you see
Gibbs sampling
o A particular kind of MCMC
o States are complete assignments to all variables
o (local search: closely related to simulated annealing!)
o Evidence variables remain fixed, other variables change
o To generate the next state, pick a variable and sample a value for it
conditioned on all the other variables: Xi’ ~ P(Xi | x1,..,xi–1,xi+1,..,xn)
o Will tend to move towards states of higher probability, but can go down too
o In a Bayes net, P(Xi | x1,..,xi–1,xi+1,..,xn) = P(Xi | MarkovBlanket(Xi))
o Theorem: Gibbs sampling is consistent*
o Provided all Gibbs distributions are bounded away from 0 and 1 and variable selection is fair
Gibbs Sampling Example: P( S | +r)
o Step 1: Fix evidence: R = +r
o Step 2: Initialize other variables randomly
o Step 3: Repeat:
  o Choose a non-evidence variable X
  o Resample X from P( X | MarkovBlanket(X))
(Figure: the network C, S, +r, W redrawn after each resampling step)
Resampling of One Variable
o Sample from P(S | +c, +r, -w)
o Most factors of the joint cancel: P(S | +c, +r, -w) is proportional to P(S | +c) P(-w | S, +r), so only the CPTs that mention S are needed
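Putting the pieces together, a sketch of the full Gibbs loop for P(S | +r); each update uses only the CPTs that mention the resampled variable, i.e. its Markov blanket (helper names are mine):

import random

W_CPT = {(True, True): 0.99, (True, False): 0.90,
         (False, True): 0.90, (False, False): 0.01}    # P(+w | S, R)

def p_s_given_c(s, c):
    p = 0.1 if c else 0.5
    return p if s else 1.0 - p

def p_r_given_c(r, c):
    p = 0.8 if c else 0.2
    return p if r else 1.0 - p

def flip(w_true, w_false):
    """Sample True with probability w_true / (w_true + w_false)."""
    return random.random() < w_true / (w_true + w_false)

def gibbs_p_s_given_r(n_steps=200000):
    r = True                            # evidence R = +r stays fixed
    c = random.random() < 0.5           # initialize the rest randomly
    s = random.random() < 0.5
    w = random.random() < 0.5
    count = 0
    for _ in range(n_steps):
        var = random.choice('CSW')      # pick a non-evidence variable
        if var == 'C':                  # P(c | s, r): proportional to P(c) P(s|c) P(r|c)
            c = flip(0.5 * p_s_given_c(s, True) * p_r_given_c(r, True),
                     0.5 * p_s_given_c(s, False) * p_r_given_c(r, False))
        elif var == 'S':                # P(s | c, r, w): proportional to P(s|c) P(w|s,r)
            def score(sv):
                pw = W_CPT[(sv, r)]
                return p_s_given_c(sv, c) * (pw if w else 1.0 - pw)
            s = flip(score(True), score(False))
        else:                           # P(w | s, r) is just the WetGrass CPT
            w = random.random() < W_CPT[(s, r)]
        count += s
    return count / n_steps

print(gibbs_p_s_given_r())              # estimates P(+s | +r)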
Gibbs Sampling P( Q | e ) vs. Likelihood Weighting P( Q | e )
(Comparison figure omitted)
CS 188: Artificial Intelligence
Hidden Markov Models

o CPT P(Xt | Xt-1): Two new ways of representing the same CPT
  X1 -> X2 -> X3 -> X4 -> …
  (Figure: the transition CPT drawn as a state diagram, with arc probabilities such as 0.7 and 0.1)
o P( Xt ) = ?
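To preview the answer to "P(Xt) = ?", a mini-forward sketch that repeatedly pushes the state distribution through the transition CPT; the sun/rain numbers here are the classic example values and should be treated as assumptions, since the slide's table did not survive extraction:

T = {'sun': {'sun': 0.9, 'rain': 0.1},      # P(X_t | X_{t-1} = sun)
     'rain': {'sun': 0.3, 'rain': 0.7}}     # P(X_t | X_{t-1} = rain)

def step(p_prev):
    """P(x_t) = sum over x_{t-1} of P(x_t | x_{t-1}) P(x_{t-1})."""
    return {x: sum(T[xp][x] * p_prev[xp] for xp in p_prev) for x in T}

p = {'sun': 1.0, 'rain': 0.0}    # P(X_1): assume day 1 is sunny
for _ in range(50):
    p = step(p)                  # converges regardless of the initial belief
print(p)                         # approaches the stationary distribution,
                                 # here {'sun': 0.75, 'rain': 0.25}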