Session 3
as the eighth comes from the constraint that the probabilities need to sum to 1: p(Ā, B̄, C̄) = 1 − p1 − p2 − p3 − p4 − p5 − p6 − p7.
a) Suppose that the joint distribution is factorized according to the Bayesian network in Figure 3.1 (left). How many
parameters are needed to specify the joint distribution? Express these parameters in terms of the parameters in (17).
b) Suppose that the joint distribution is factorized according to the Bayesian network in Figure 3.1 (right). How many
parameters are needed to specify the joint distribution? Express these parameters in terms of the parameters in (17).
(Figure 3.1: Bayesian networks over A, B, and C — the left and right networks referenced in parts a) and b) above.)
(Figure: the left, center, and right Bayesian networks over A, B, and C referenced in the exercise below.)
Show that
a) for the left network, A ⊥⊥ B | C;
b) for the center network, A ⊥⊥ B | C;
c) for the right network, A ⊥⊥ B | ∅.
(Figure 3.3: Bayesian network over A, B, C, D, E, F, and X, in which X is a child of A and B and a parent of E and F; C is a parent of E and D a parent of F.)
3.2 Modeling
Exercise 3.5 The Krakozhia problem (adapted from [4] and [1])
You are tasked with developing a clinical model for the ChestPain Inc. clinic to diagnose tuberculosis, lung cancer, and
bronchitis. Smoking is a known risk factor in bronchitis and lung cancer and, recently, there was an outbreak of tuberculosis
in Krakozhia, so prolonged stays in the country increase the risk of infection. Tuberculosis and lung cancer may cause lung
damage. Lung damage and bronchitis are both causes of dyspnoea, or shortness of breath.
Express this information as a Bayesian network, and use it to determine whether the following statements are true or not:
a) lung cancer ⊥⊥ bronchitis | smoking
b) stay in Krakozhia ⊥⊥ smoking | lung cancer
c) stay in Krakozhia ⊥⊥ smoking | lung cancer, dyspnoea
d)
p(E = l) = 0.2
p(S = l|O = l) = 0.9 p(S = n|O = l) = 0.1
p(S = l|O = h) = 0.1 p(S = n|O = h) = 0.4
p(O = l|E = l) = 0.9 p(O = l|E = h) = 0.05
p(F = l|I = l, E = l) = 0.9 p(F = l|I = l, E = h) = 0.1
p(F = l|I = h, E = l) = 0.1 p(F = l|I = h, E = h) = 0.01
p(I = l|O = l, E = l) = 0.9 p(I = l|O = l, E = h) = 0.1
p(I = l|O = h, E = l) = 0.1 p(I = l|O = h, E = h) = 0.01
a) Using the conditional probabilities given in Table 3.2, draw the Bayesian network corresponding to this model.
b) Then, compute the probability that inflation (I) is high, given that the stock price (S) is normal and the futures price (F) is high.
Solutions 3
Solution to Exercise 3.1
a) To specify a conditional probability, we need two parameters (one for when the antecedent is true and one for when it is false), q1 = p(B|A) and q2 = p(B|Ā), and similarly q3 = p(C|B) and q4 = p(C|B̄). Together with the Bernoulli probability q5 = p(A), the factorization p(A)p(B|A)p(C|B) therefore requires a total of 5 parameters.
Then
q5 = p(A) = Σ_b p(A, b) = p(A, B) + p(A, B̄) = p1 + p3 + p4 + p5,
so
q1 = p(B|A) = p(A, B)/p(A) = (p1 + p4)/(p1 + p3 + p4 + p5),
q2 = p(B|Ā) = p(Ā, B)/p(Ā) = (p2 + p6)/(1 − p1 − p3 − p4 − p5).
Similarly, p(B) = p(A, B) + p(Ā, B) = p1 + p2 + p4 + p6, so
q3 = p(C|B) = p(B, C)/p(B) = (p1 + p2)/(p1 + p2 + p4 + p6),
q4 = p(C|B̄) = p(B̄, C)/p(B̄) = (p3 + p7)/(1 − p1 − p2 − p4 − p6).
b) To specify the conditional probability of C given A and B, we need four parameters (one for each combination of the antecedents),
r1 = p(C|A, B), r2 = p(C|Ā, B), r3 = p(C|A, B̄), r4 = p(C|Ā, B̄),
with the probabilities of the complements given by normalization. With the additional two parameters to specify the Bernoulli probabilities p(A) = r5 and p(B) = r6, we have a total of 6 parameters.
Using the law of conditional probability and the expressions found in (a), we have that
r1 = p(C|A, B) = p(A, B, C)/p(A, B) = p1/(p1 + p4),
r2 = p(C|Ā, B) = p(Ā, B, C)/p(Ā, B) = p2/(p2 + p6),
r3 = p(C|A, B̄) = p(A, B̄, C)/p(A, B̄) = p3/(p3 + p5).
To compute r4, we use p(Ā, B̄) = 1 − p(A, B) − p(Ā, B) − p(A, B̄) = 1 − p1 − p2 − p3 − p4 − p5 − p6, so that
r4 = p(C|Ā, B̄) = p(Ā, B̄, C)/p(Ā, B̄) = p7/(1 − p1 − p2 − p3 − p4 − p5 − p6).
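These closed-form expressions can be sanity-checked numerically. The sketch below uses an arbitrary, made-up joint distribution (not part of the exercise) and the state-to-parameter correspondence implied by the derivation above (p1 = p(A, B, C), p2 = p(Ā, B, C), p3 = p(A, B̄, C), p4 = p(A, B, C̄), p5 = p(A, B̄, C̄), p6 = p(Ā, B, C̄), p7 = p(Ā, B̄, C)), comparing each formula with direct conditioning on the joint table:

```python
# Sanity check for the expressions of q1..q4 and r1, r4 above.
# The joint distribution is an arbitrary example; p1..p7 follow the
# correspondence implied by the derivation (see the lead-in text).
weights = {
    (1, 1, 1): 3, (0, 1, 1): 2, (1, 0, 1): 1, (1, 1, 0): 4,
    (1, 0, 0): 2, (0, 1, 0): 1, (0, 0, 1): 2, (0, 0, 0): 5,
}
Z = sum(weights.values())
joint = {abc: w / Z for abc, w in weights.items()}  # p(A, B, C), states coded 0/1

p1, p2, p3 = joint[1, 1, 1], joint[0, 1, 1], joint[1, 0, 1]
p4, p5 = joint[1, 1, 0], joint[1, 0, 0]
p6, p7 = joint[0, 1, 0], joint[0, 0, 1]

def prob(a=None, b=None, c=None):
    """Marginal probability of the specified assignment, e.g. prob(a=1, b=0)."""
    return sum(p for (A, B, C), p in joint.items()
               if (a is None or A == a) and (b is None or B == b) and (c is None or C == c))

# Closed-form expression versus direct conditioning on the joint table.
checks = {
    "q1 = p(B|A)":  ((p1 + p4) / (p1 + p3 + p4 + p5), prob(a=1, b=1) / prob(a=1)),
    "q2 = p(B|~A)": ((p2 + p6) / (1 - p1 - p3 - p4 - p5), prob(a=0, b=1) / prob(a=0)),
    "q3 = p(C|B)":  ((p1 + p2) / (p1 + p2 + p4 + p6), prob(b=1, c=1) / prob(b=1)),
    "q4 = p(C|~B)": ((p3 + p7) / (1 - p1 - p2 - p4 - p6), prob(b=0, c=1) / prob(b=0)),
    "r1 = p(C|A,B)":   (p1 / (p1 + p4), prob(a=1, b=1, c=1) / prob(a=1, b=1)),
    "r4 = p(C|~A,~B)": (p7 / (1 - p1 - p2 - p3 - p4 - p5 - p6),
                        prob(a=0, b=0, c=1) / prob(a=0, b=0)),
}
for name, (formula, direct) in checks.items():
    assert abs(formula - direct) < 1e-12, name
    print(f"{name}: {formula:.4f}")
```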
a) For the left network, p(A, B, C) = p(A|C)p(B|C)p(C), so
p(A, B|C) = p(A, B, C)/p(C) = p(A|C)p(B|C),
and hence A ⊥⊥ B | C.
b) We have that
p(A, B|C) = p(A, B, C)/p(C) = p(B|C)p(C|A)p(A)/p(C) = p(B|C)p(A|C),
so A ⊥⊥ B | C.
c) We have that
p(A, B) = Σ_C p(A, B, C) = Σ_C p(C|A, B)p(A)p(B) = p(A)p(B),
so A ⊥⊥ B | ∅.
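The same statements can be checked mechanically with a d-separation routine. Below is a minimal sketch using NetworkX (an assumption: it relies on nx.d_separated, available in recent 2.x/3.x releases and renamed nx.is_d_separator in NetworkX 3.3); the three structures are written out explicitly as a chain, a common cause, and a collider rather than tied to the left/center/right labels of the figure:

```python
# d-separation check for the three canonical three-node structures
# (chain, common cause, collider).  Assumes a NetworkX version that
# exposes nx.d_separated (renamed nx.is_d_separator in NetworkX 3.3).
import networkx as nx

graphs = {
    "chain    A -> C -> B": nx.DiGraph([("A", "C"), ("C", "B")]),
    "fork     A <- C -> B": nx.DiGraph([("C", "A"), ("C", "B")]),
    "collider A -> C <- B": nx.DiGraph([("A", "C"), ("B", "C")]),
}

for name, g in graphs.items():
    given_c = nx.d_separated(g, {"A"}, {"B"}, {"C"})      # A independent of B given C?
    given_none = nx.d_separated(g, {"A"}, {"B"}, set())   # A independent of B given nothing?
    print(f"{name}:  given C: {given_c},  given nothing: {given_none}")

# Expected: the chain and the fork are d-separated given C (not given the
# empty set), while the collider is d-separated given the empty set only.
```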
One possible Bayesian network resulting from the marginalization of X is presented in Figure 3.1.
Figure 3.1: Bayesian network over A, B, C, D, E, and F resulting from the marginalization of X in Figure 3.3. The edge between E and F has arbitrary orientation.
The same result can be achieved using explicit computations. To this end we let
p(A, B, C, D, E, F) = Σ_X p(A, B, C, D, E, F, X) = Σ_X p(F|D, X)p(E|C, X)p(X|A, B)p(A)p(B)p(C)p(D).
Now, using the dependency relations implied by the graph, we have that E ⊥⊥ A, B | C, X and that X ⊥⊥ C | A, B, so
p(X|A, B) = p(X|A, B, C),    p(E|C, X) = p(E|A, B, C, X),
and
p(E|C, X)p(X|A, B) = p(E, X|A, B, C);
similarly, F ⊥⊥ A, B, C, E | D, X, so p(F|D, X) = p(F|A, B, C, D, E, X), and, since E, X ⊥⊥ D | A, B, C,
p(F|D, X)p(E, X|A, B, C) = p(F, E, X|A, B, C, D);
finally,
p(A, B, C, D, E, F) = Σ_X p(F, E, X|A, B, C, D)p(A)p(B)p(C)p(D) = p(F, E|A, B, C, D)p(A)p(B)p(C)p(D).
Note that, from the network, F ⊥⊥ C | A, B, D and E ⊥⊥ D | A, B, C, so we have the result shown in Figure 3.1, where E and F share the parents A and B.
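The claimed relations can also be checked numerically. The sketch below (with arbitrary, randomly generated conditional probability tables, which are not part of the exercise) builds the joint of the original network, sums out X, and tests F ⊥⊥ C | A, B, D and E ⊥⊥ D | A, B, C, as well as the dependence between E and F that makes the extra edge in Figure 3.1 necessary:

```python
# Numerical check of the marginalization argument above: build random CPTs
# for the original network (X a child of A and B, E a child of C and X,
# F a child of D and X), sum out X, and test conditional independences on
# the resulting joint over A, B, C, D, E, F.  The CPTs are arbitrary.
import itertools
import random

random.seed(0)

def rnd():
    return random.uniform(0.05, 0.95)

pA, pB, pC, pD = rnd(), rnd(), rnd(), rnd()            # p(V = 1) for the roots
pX = {(a, b): rnd() for a in (0, 1) for b in (0, 1)}   # p(X = 1 | A, B)
pE = {(c, x): rnd() for c in (0, 1) for x in (0, 1)}   # p(E = 1 | C, X)
pF = {(d, x): rnd() for d in (0, 1) for x in (0, 1)}   # p(F = 1 | D, X)

def bern(p, v):
    return p if v else 1 - p

# Joint over (A, B, C, D, E, F) with X summed out.
joint = {}
for a, b, c, d, e, f in itertools.product((0, 1), repeat=6):
    joint[a, b, c, d, e, f] = sum(
        bern(pA, a) * bern(pB, b) * bern(pC, c) * bern(pD, d)
        * bern(pX[a, b], x) * bern(pE[c, x], e) * bern(pF[d, x], f)
        for x in (0, 1))

names = ["A", "B", "C", "D", "E", "F"]

def prob(**fixed):
    """Marginal probability of the given assignment, e.g. prob(A=1, C=0)."""
    return sum(p for k, p in joint.items()
               if all(k[names.index(n)] == v for n, v in fixed.items()))

def independent(x, y, given):
    """Test X independent of Y given `given` via p(x,y,z) p(z) = p(x,z) p(y,z)."""
    for vals in itertools.product((0, 1), repeat=2 + len(given)):
        vx, vy, *vz = vals
        z = dict(zip(given, vz))
        lhs = prob(**{x: vx, y: vy, **z}) * prob(**z)
        rhs = prob(**{x: vx, **z}) * prob(**{y: vy, **z})
        if abs(lhs - rhs) > 1e-9:
            return False
    return True

print("F indep C | A,B,D   :", independent("F", "C", ["A", "B", "D"]))       # True
print("E indep D | A,B,C   :", independent("E", "D", ["A", "B", "C"]))       # True
print("E indep F | A,B,C,D :", independent("E", "F", ["A", "B", "C", "D"]))  # False
```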
Hence,
p(A = 0, B = 0|C = 0) = p(A = 0, B = 0, C = 0)/p(C = 0) = 0.192/0.48 = 0.4;
hence,
p(A = 0|C = 0) = p(A = 0, C = 0)/p(C = 0) = [p(A = 0, B = 0, C = 0) + p(A = 0, B = 1, C = 0)]/p(C = 0) = (0.192 + 0.048)/0.48 = 0.5
and
p(B = 0|C = 0) = p(B = 0, C = 0)/p(C = 0) = [p(A = 0, B = 0, C = 0) + p(A = 1, B = 0, C = 0)]/p(C = 0) = (0.192 + 0.192)/0.48 = 0.8.
Hence,
p(A = 0|C = 0)p(B = 0|C = 0) = 0.5 · 0.8 = 0.4 = p(A = 0, B = 0|C = 0).
Since similar relationships hold for the other possible outcomes, we have that p(A, B|C) = p(A|C)p(B|C) and A ⊥⊥ B | C.
Figure 3.2: Bayesian network for the Krakozhia exercise, with the added chest X-ray variable.
a) L ⊥⊥ B | S? Yes, as L → S ↛ B and L → D → P ↛ B.
b) K ⊥⊥ S | L? Yes, as T → D ↛ L and T → D → P ↛ B.
c) K ⊥⊥ S | L, P? No, they are connected by T → D → P → B → S (see the code sketch after this list for a mechanical check).
d)
b)
c) See red addition in Figure 3.3.
W ⊥⊥ C | A? Yes, because W → A ↛ C.
d)
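The Krakozhia answers a)–c) above can be verified the same way. A minimal sketch, again assuming NetworkX with nx.d_separated, and assuming the variable coding used in the solution (K: stay in Krakozhia, T: tuberculosis, S: smoking, L: lung cancer, B: bronchitis, D: lung damage, P: dyspnoea); the added chest X-ray node is omitted, since a leaf that is not conditioned on does not affect these queries:

```python
# d-separation checks for the Krakozhia queries a)-c).
# Variable coding (as in the solution): K stay in Krakozhia, T tuberculosis,
# S smoking, L lung cancer, B bronchitis, D lung damage, P dyspnoea.
import networkx as nx

g = nx.DiGraph([
    ("K", "T"),               # stay in Krakozhia -> tuberculosis
    ("S", "L"), ("S", "B"),   # smoking -> lung cancer, bronchitis
    ("T", "D"), ("L", "D"),   # tuberculosis, lung cancer -> lung damage
    ("D", "P"), ("B", "P"),   # lung damage, bronchitis -> dyspnoea
])

queries = [
    ("a", {"L"}, {"B"}, {"S"}),       # lung cancer vs bronchitis given smoking
    ("b", {"K"}, {"S"}, {"L"}),       # Krakozhia vs smoking given lung cancer
    ("c", {"K"}, {"S"}, {"L", "P"}),  # ... given lung cancer and dyspnoea
]
for label, x, y, z in queries:
    print(label, nx.d_separated(g, x, y, z))
# Expected output: a True, b True, c False.
```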
p(O) = 0.8, p(L|O) = 0.75, p(L|Ō) = 0.02, p(M|O) = 0.92, p(M|Ō) = 0.15.
a) The Bayesian network of the model is given by Figure 3.5, and p(M, O, L) = p(M|O)p(L|O)p(O).
b) We have that
p(L) = Σ_M Σ_O p(M, O, L) = Σ_M Σ_O p(M|O)p(L|O)p(O) = Σ_O p(L|O)p(O) = 0.75 · 0.8 + 0.02 · 0.2 = 0.604;
note that Σ_M p(M|O) = 1.
c) We have that
p(L|M) = p(L, M)/p(M),
where
p(L, M) = Σ_O p(M, O, L) = p(M|O)p(L|O)p(O) + p(M|Ō)p(L|Ō)p(Ō) = 0.92 · 0.75 · 0.8 + 0.15 · 0.02 · 0.2 = 0.5526
and
p(M) = p(M|O)p(O) + p(M|Ō)p(Ō) = 0.92 · 0.8 + 0.15 · 0.2 = 0.766,
so that p(L|M) = 0.5526/0.766 ≈ 0.721.
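These values are straightforward to reproduce by enumerating the joint p(M, O, L) = p(M|O)p(L|O)p(O); a minimal sketch using only the probabilities given in the exercise:

```python
# Enumeration check for p(L), p(L, M), p(M) and p(L|M) in the model
# p(M, O, L) = p(M|O) p(L|O) p(O), with the probabilities given above.
p_O = {1: 0.8, 0: 0.2}
p_L_given_O = {1: 0.75, 0: 0.02}   # p(L = 1 | O)
p_M_given_O = {1: 0.92, 0: 0.15}   # p(M = 1 | O)

def joint(m, o, l):
    """p(M=m, O=o, L=l) from the factorization of the network."""
    pl = p_L_given_O[o] if l else 1 - p_L_given_O[o]
    pm = p_M_given_O[o] if m else 1 - p_M_given_O[o]
    return pm * pl * p_O[o]

p_L = sum(joint(m, o, 1) for m in (0, 1) for o in (0, 1))
p_LM = sum(joint(1, o, 1) for o in (0, 1))
p_M = sum(joint(1, o, l) for o in (0, 1) for l in (0, 1))

print(f"p(L)   = {p_L:.4f}")          # 0.6040
print(f"p(L,M) = {p_LM:.4f}")         # 0.5526
print(f"p(M)   = {p_M:.4f}")          # 0.7660
print(f"p(L|M) = {p_LM / p_M:.4f}")   # 0.7214
```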
(Figure 3.5: Bayesian network of the model, with O a parent of both L and M.)

a) From Table 3.2, the Bayesian network has E as a parent of O, I, and F; O as a parent of S and I; and I as a parent of F.
b)
p(I = h|S = n, F = h) = p(I = h, S = n, F = h)/p(S = n, F = h).
We have that
p(I = h, S = n, F = h) = Σ_O Σ_E p(I = h, S = n, F = h, O, E)
    = Σ_O Σ_E p(F = h|I = h, E)p(S = n|O)p(I = h|O, E)p(O|E)p(E)
    = p(F = h|I = h, E = l)p(S = n|O = l)p(I = h|O = l, E = l)p(O = l|E = l)p(E = l)
    + p(F = h|I = h, E = l)p(S = n|O = h)p(I = h|O = h, E = l)p(O = h|E = l)p(E = l)
    + p(F = h|I = h, E = h)p(S = n|O = l)p(I = h|O = l, E = h)p(O = l|E = h)p(E = h)
    + p(F = h|I = h, E = h)p(S = n|O = h)p(I = h|O = h, E = h)p(O = h|E = h)p(E = h)
    ≈ 0.3096
and that
p(S = n, F = h) = Σ_I Σ_O Σ_E p(I, S = n, F = h, O, E)
    = Σ_O Σ_E [p(I = h, S = n, F = h, O, E) + p(I = l, S = n, F = h, O, E)]
    = p(I = h, S = n, F = h) + Σ_O Σ_E p(F = h|I = l, E)p(S = n|O)p(I = l|O, E)p(O|E)p(E)
    = p(I = h, S = n, F = h)
    + p(F = h|I = l, E = l)p(S = n|O = l)p(I = l|O = l, E = l)p(O = l|E = l)p(E = l)
    + p(F = h|I = l, E = l)p(S = n|O = h)p(I = l|O = h, E = l)p(O = h|E = l)p(E = l)
    + p(F = h|I = l, E = h)p(S = n|O = l)p(I = l|O = l, E = h)p(O = l|E = h)p(E = h)
    + p(F = h|I = l, E = h)p(S = n|O = h)p(I = l|O = h, E = h)p(O = h|E = h)p(E = h)
    ≈ 0.3144.
Hence,
p(I = h|S = n, F = h) = 0.3096/0.3144 ≈ 0.9847.
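The two sums and the final ratio can be reproduced by enumerating over the unobserved variables; a minimal sketch using only the Table 3.2 probabilities (the remaining entries of each conditional are obtained by normalization):

```python
# Enumeration check for p(I=h | S=n, F=h) using the Table 3.2 probabilities.
# States are coded as strings: E, O, I, F in {"l", "h"}; S in {"l", "n", "h"}.
p_E = {"l": 0.2, "h": 0.8}
p_O_given_E = {"l": {"l": 0.9, "h": 0.1}, "h": {"l": 0.05, "h": 0.95}}  # [E][O]
p_S_given_O = {"l": {"l": 0.9, "n": 0.1, "h": 0.0},
               "h": {"l": 0.1, "n": 0.4, "h": 0.5}}                     # [O][S]
p_I_given_OE = {("l", "l"): 0.9, ("l", "h"): 0.1,
                ("h", "l"): 0.1, ("h", "h"): 0.01}                      # p(I=l | O, E)
p_F_given_IE = {("l", "l"): 0.9, ("l", "h"): 0.1,
                ("h", "l"): 0.1, ("h", "h"): 0.01}                      # p(F=l | I, E)

def joint(e, o, i, s, f):
    """p(E, O, I, S, F) under the factorization implied by Table 3.2."""
    pi = p_I_given_OE[o, e] if i == "l" else 1 - p_I_given_OE[o, e]
    pf = p_F_given_IE[i, e] if f == "l" else 1 - p_F_given_IE[i, e]
    return p_E[e] * p_O_given_E[e][o] * pi * p_S_given_O[o][s] * pf

states = ("l", "h")
num = sum(joint(e, o, "h", "n", "h") for e in states for o in states)
den = sum(joint(e, o, i, "n", "h") for e in states for o in states for i in states)
print(f"p(I=h, S=n, F=h)  = {num:.4f}")        # ~0.3096
print(f"p(S=n, F=h)       = {den:.4f}")        # ~0.3144
print(f"p(I=h | S=n, F=h) = {num / den:.4f}")  # ~0.9847
```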
Bibliography
[1] David Barber. Bayesian reasoning and machine learning. Cambridge University Press, 2012.
[2] Christopher M Bishop. Pattern recognition and machine learning. Springer, 2006.
[3] Kevin B Korb and Ann E Nicholson. Bayesian artificial intelligence. CRC Press, 2010.
[4] Steffen L Lauritzen and David J Spiegelhalter. Local computations with probabilities on graphical structures and their
application to expert systems. Journal of the Royal Statistical Society: Series B (Methodological), 50(2):157–194,
1988.
[5] Kevin P Murphy. Machine learning: a probabilistic perspective. MIT Press, 2012.