
Exercises 3

Directed graphical models

3.1 Bayesian network basics


Exercise 3.1 Number of parameters
Consider a probabilistic model involving three binary random variables A, B, and C. In the general case, there are 2³ = 8
possible outcomes for the three binary random variables, and we need up to 7 distinct probabilities to specify the joint
distribution of the random variables,

p(A, B, C) = p1,   p(Ā, B, C) = p2,   p(A, B̄, C) = p3,   p(A, B, C̄) = p4,
p(A, B̄, C̄) = p5,   p(Ā, B, C̄) = p6,   p(Ā, B̄, C) = p7,        (17)

since the eighth probability follows from the constraint that the probabilities sum to 1:
p(Ā, B̄, C̄) = 1 − p1 − p2 − p3 − p4 − p5 − p6 − p7.
a) Suppose that the joint distribution is factorized according to the Bayesian network in Figure 3.1 (left). How many
parameters are needed to specify the joint distribution? Express these parameters in terms of the parameters in (17).
b) Suppose that the joint distribution is factorized according to the Bayesian network in Figure 3.1 (right). How many
parameters are needed to specify the joint distribution? Express these parameters in terms of the parameters in (17).

[Figure: left, the chain A → B → C; right, the collider A → C ← B.]

Figure 3.1: Bayesian networks in Exercise 3.1

Exercise 3.2 D-separation basic structures


Consider the three Bayesian networks in Figure 3.2.

[Figure: left, the fork A ← C → B; center, the chain A → C → B; right, the collider A → C ← B.]

Figure 3.2: Bayesian networks in Exercise 3.2

Show that

a) for the left network, A ⊥ B | C;
b) for the center network, A ⊥ B | C;
c) for the right network, A ⊥ B | ∅.

Table 3.1: Joint distribution of the variables in Exercise 3.4

A   B   C   p(A, B, C)
0   0   0   0.192
0   0   1   0.144
0   1   0   0.048
0   1   1   0.216
1   0   0   0.192
1   0   1   0.064
1   1   0   0.048
1   1   1   0.096

Exercise 3.3 Marginalization (adapted from [5])


Consider the Bayesian network in Figure 3.3. Suppose that we marginalize out the random variable X. What does the
resulting Bayesian network look like? Which extra edges need to be added?

[Figure: the DAG with edges A → X, B → X, C → E, X → E, X → F, D → F.]

Figure 3.3: Bayesian network in Exercise 3.3.

Exercise 3.4 (adapted from [2])


Consider the three binary random variables A, B, and C having the joint distribution in Table 3.1. Show, by explicit
computation, that A and B are marginally dependent (i.e., p(A, B) ≠ p(A)p(B)) and that they become independent when
conditioned on C (i.e., p(A, B|C) = p(A|C)p(B|C)).

3.2 Modeling
Exercise 3.5 The Krakozhia problem (adapted from [4] and [1])
You are tasked with developing a clinical model for the ChestPain Inc. clinic to diagnose tuberculosis, lung cancer, and
bronchitis. Smoking is a known risk factor in bronchitis and lung cancer and, recently, there was an outbreak of tuberculosis
in Krakozhia, so prolonged stays in the country increase the risk of infection. Tuberculosis and lung cancer may cause lung
damage. Lung damage and bronchitis are both causes of dyspnoea, or shortness of breath.
Express this information as a Bayesian network, and use that for determining whether the following statements are true
or not:

a) tuberculosis ⊥ smoking | dyspnoea
b) lung cancer ⊥ bronchitis | smoking
c) stay in Krakozhia ⊥ smoking | lung cancer
d) stay in Krakozhia ⊥ smoking | lung cancer, dyspnoea

Exercise 3.6 Life of a builder (adapted from [1])


Exposure to asbestos (A) and smoking (S) have a synergistic effect on lung cancer (C) as described by

p(C, A, S) = p(C|A, S)p(A)p(S) (18)

a) Draw the graphical model representing (18).


b) Conditioned on a cancer diagnosis, are S and A independent?
c) Modify the model accounting for the fact that construction workers (W ) are more likely to be exposed to asbestos.
d) Given A, show that W and C are independent.

Exercise 3.7 D-separation


Consider the joint probability

p(A, B, C, D) = p(A|B) p(B|C) p(D|C) p(C).        (19)

a) Draw the corresponding Bayesian network.


b) Consider B and D; conditioning on which variable makes them independent? Prove it explicitly.

Exercise 3.8 At the office (adapted from [3])


The manager Laura wants to track the working habits of the developer Marcus. Laura knows that Marcus spends 80% of
the time at the office. In order not to disturb him while he works, she only checks for light at his window. Laura knows that
Marcus sometimes works in the dark, and he keeps the light on 75% of the time he is at the office. He leaves the light on
2% of the time when he is away.
In addition to checking the window, Laura has access to the login information of the mainframe Marcus is working on.
When he is at the office, Marcus is logged in 92% of the time. When he is at home, he logs in 15% of the time.

a) Draw the Bayesian network representing the scenario just described.


b) Compute Laura’s belief that Marcus’s light is on.
c) Suppose that Laura sees that Marcus is logged into the mainframe. What effect does this observation have on her
belief that Marcus’s light is on?

Exercise 3.9 An economic model (adapted from [1])


Consider the probabilistic model linking economy (E) to inflation (I) and price of oil (O) through observations of stock
prices (S) and prices of oil futures (F). Suppose that all the quantities can take on two distinct values, high (h) and low
(l), except for the stock price, which can also be normal (n).

Table 3.2: Conditional probabilities of the model in Exercise 3.9

p(E = l) = 0.2
p(S = l|O = l) = 0.9 p(S = n|O = l) = 0.1
p(S = l|O = h) = 0.1 p(S = n|O = h) = 0.4
p(O = l|E = l) = 0.9 p(O = l|E = h) = 0.05
p(F = l|I = l, E = l) = 0.9 p(F = l|I = l, E = h) = 0.1
p(F = l|I = h, E = l) = 0.1 p(F = l|I = h, E = h) = 0.01
p(I = l|O = l, E = l) = 0.9 p(I = l|O = l, E = h) = 0.1
p(I = l|O = h, E = l) = 0.1 p(I = l|O = h, E = h) = 0.01

a) Using the conditional probabilities given in Table 3.2, draw the Bayesian network corresponding to this model.
b) Then, compute the probability that inflation (I) is high, given that the stock price (S) is normal and the futures price
(F ) is high.
Solutions 3

Directed graphical models

Solution to Exercise 3.1 a) To specify a conditional probability, we need two parameters (one for when the antecedent
is true and one for when it is false),

p(B|A) = q1,   p(B|Ā) = q2,   p(B̄|A) = 1 − q1,   p(B̄|Ā) = 1 − q2,

and similarly

p(C|B) = q3,   p(C|B̄) = q4,   p(C̄|B) = 1 − q3,   p(C̄|B̄) = 1 − q4.

In addition, we need one Bernoulli probability p(A) = q5; so in total we need 5 parameters.


Using the law of total probability, we have that

p(A, B) = Σ_c p(A, B, c) = p(A, B, C) + p(A, B, C̄) = p1 + p4,
p(Ā, B) = Σ_c p(Ā, B, c) = p(Ā, B, C) + p(Ā, B, C̄) = p2 + p6,
p(A, B̄) = Σ_c p(A, B̄, c) = p(A, B̄, C) + p(A, B̄, C̄) = p3 + p5.

Then

q5 = p(A) = Σ_b p(A, b) = p(A, B) + p(A, B̄) = p1 + p3 + p4 + p5;

so

q1 = p(B|A) = p(A, B) / p(A) = (p1 + p4) / (p1 + p3 + p4 + p5),
q2 = p(B|Ā) = p(Ā, B) / p(Ā) = (p2 + p6) / (1 − p1 − p3 − p4 − p5).

Similarly,

p(B, C) = p(A, B, C) + p(Ā, B, C) = p1 + p2,
p(B, C̄) = p(A, B, C̄) + p(Ā, B, C̄) = p4 + p6,
p(B̄, C) = p(A, B̄, C) + p(Ā, B̄, C) = p3 + p7,

so

q3 = p(C|B) = p(B, C) / p(B) = (p1 + p2) / (p1 + p2 + p4 + p6),
q4 = p(C|B̄) = p(B̄, C) / p(B̄) = (p3 + p7) / (1 − p1 − p2 − p4 − p6).


b) To specify the conditional probability of C given A and B, we need four parameters (one for each combination of the
antecedents):

p(C|A, B) = r1,   p(C|Ā, B) = r2,   p(C|A, B̄) = r3,   p(C|Ā, B̄) = r4,

with the probabilities of the complements given by normalization. With the additional two parameters to specify the
Bernoulli probabilities p(A) = r5 and p(B) = r6, we have a total of 6 parameters.
Using the definition of conditional probability and the expressions found in (a), we have that

r1 = p(C|A, B) = p(A, B, C) / p(A, B) = p1 / (p1 + p4),
r2 = p(C|Ā, B) = p(Ā, B, C) / p(Ā, B) = p2 / (p2 + p6),
r3 = p(C|A, B̄) = p(A, B̄, C) / p(A, B̄) = p3 / (p3 + p5).

To compute r4, we use

r4 = p(C|Ā, B̄) = p(Ā, B̄, C) / p(Ā, B̄) = p(Ā, B̄, C) / (p(Ā, B̄, C) + p(Ā, B̄, C̄)) = p7 / (1 − p1 − p2 − p3 − p4 − p5 − p6).

Finally, r5 = p(A) = q5 and r6 = p(B) = p(A, B) + p(Ā, B) = p1 + p2 + p4 + p6.
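As a numerical sanity check (not part of the original solution), the parameter mappings above can be verified with a short Python sketch; the variable names are illustrative and the joint distribution is drawn at random:

import numpy as np

rng = np.random.default_rng(0)

# Arbitrary joint over (A, B, C); index 0 means the event holds (A),
# index 1 means its complement (Ā), matching the ordering in (17).
joint = rng.random((2, 2, 2))
joint /= joint.sum()

p1 = joint[0, 0, 0]; p2 = joint[1, 0, 0]; p3 = joint[0, 1, 0]
p4 = joint[0, 0, 1]; p5 = joint[0, 1, 1]; p6 = joint[1, 0, 1]
p7 = joint[1, 1, 0]

# q1 = p(B|A), computed directly from the joint and from the formula above.
q1_direct = joint[0, 0, :].sum() / joint[0].sum()
q1_formula = (p1 + p4) / (p1 + p3 + p4 + p5)
assert np.isclose(q1_direct, q1_formula)

# q3 = p(C|B), same comparison.
q3_direct = joint[:, 0, 0].sum() / joint[:, 0, :].sum()
q3_formula = (p1 + p2) / (p1 + p2 + p4 + p6)
assert np.isclose(q3_direct, q3_formula)

The same pattern checks r1 through r6 for part (b).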

Solution to Exercise 3.2

The idea is to show that the joint density of A and B factorizes into two terms, using the joint density expressed by
the Bayesian networks.

a) We have that

p(A, B|C) = p(A, B, C) / p(C) = p(A|C) p(B|C) p(C) / p(C) = p(A|C) p(B|C),

so A ⊥ B | C.

b) We have that

p(A, B|C) = p(A, B, C) / p(C) = p(B|C) p(C|A) p(A) / p(C) = p(B|C) · [p(C|A) p(A) / p(C)] = p(B|C) p(A|C),

so A ⊥ B | C.

c) We have that

p(A, B) = Σ_C p(A, B, C) = Σ_C p(C|A, B) p(A) p(B) = p(A) p(B),

so A ⊥ B | ∅.
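These three cases can also be illustrated numerically. The sketch below (illustrative, with randomly drawn conditional probability tables) builds the collider of part (c) and confirms that A and B are marginally independent but become dependent once C is observed (explaining away):

import numpy as np

rng = np.random.default_rng(1)

# Collider A → C ← B: p(A, B, C) = p(C|A, B) p(A) p(B); index 1 = "true".
pA, pB = rng.random(), rng.random()
pC = rng.random((2, 2))                  # p(C = 1 | A, B)

joint = np.zeros((2, 2, 2))
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            pa = pA if a else 1 - pA
            pb = pB if b else 1 - pB
            pc = pC[a, b] if c else 1 - pC[a, b]
            joint[a, b, c] = pa * pb * pc

# Marginal independence A ⊥ B holds exactly...
pAB = joint.sum(axis=2)
assert np.allclose(pAB, np.outer(pAB.sum(axis=1), pAB.sum(axis=0)))

# ...but conditioning on C = 0 breaks it for generic tables.
pAB_c0 = joint[:, :, 0] / joint[:, :, 0].sum()
print(np.allclose(pAB_c0, np.outer(pAB_c0.sum(axis=1), pAB_c0.sum(axis=0))))  # False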

Solution to Exercise 3.3


This exercise can be solved using the independence relationships expressed by the graph. Using d-separation, we have the
following:
• D is independent of the whole graph except F;
• F is not independent of A, B, and E;
• E is not independent of A, B, and F;
• C is independent of the whole graph except E.

Therefore, one possible Bayesian network resulting from the marginalization is presented in Figure 3.1.

[Figure: the marginal DAG with edges A → E, B → E, C → E, A → F, B → F, D → F, and E → F.]

Figure 3.1: Bayesian network resulting from the marginalization of X in Figure 3.3. The edge between E and F has
arbitrary orientation.

The same result can be achieved using explicit computations. To this end we let

p(A, B, C, D, E, F) = Σ_X p(A, B, C, D, E, F, X) = Σ_X p(F|D, X) p(E|C, X) p(X|A, B) p(A) p(B) p(C) p(D).

Now, using the dependency relations implied by the graph, we have that E ⊥ A, B | C, X and that X ⊥ C | A, B, so

p(X|A, B) = p(X|A, B, C),   p(E|C, X) = p(E|A, B, C, X),

and

p(E|C, X) p(X|A, B) = p(E, X|A, B, C);

similarly, F ⊥ A, B, C, E | D, X, so p(F|D, X) = p(F|A, B, C, D, E, X) and

p(F|D, X) p(E, X|A, B, C) = p(F, E, X|A, B, C, D);

finally,

p(A, B, C, D, E, F) = Σ_X p(F, E, X|A, B, C, D) p(A) p(B) p(C) p(D) = p(F, E|A, B, C, D) p(A) p(B) p(C) p(D).

Using conditioning, we have that

p(F, E|A, B, C, D) = p(F|E, A, B, C, D) p(E|A, B, C, D);

note that, from the marginalized network, F ⊥ C | A, B, D, E and E ⊥ D | A, B, C, so we have the result

p(A, B, C, D, E, F) = p(F|E, A, B, D) p(E|A, B, C) p(A) p(B) p(C) p(D).

Solution to Exercise 3.4

We compute the joint distribution p(A, B) by using the law of total probability and marginalization:

p(A = 0, B = 0) = Σ_C p(A = 0, B = 0, C) = p(A = 0, B = 0, C = 0) + p(A = 0, B = 0, C = 1) = 0.336;

similarly, p(A = 0, B = 1) = 0.264, p(A = 1, B = 0) = 0.256, and p(A = 1, B = 1) = 0.144.

Using the same reasoning, we can compute the marginal distribution of A:

p(A = 0) = Σ_B p(A = 0, B) = p(A = 0, B = 0) + p(A = 0, B = 1) = 0.336 + 0.264 = 0.6,

and p(A = 1) = 1 − p(A = 0) = 0.4; and the marginal distribution of B:

p(B = 0) = Σ_A p(A, B = 0) = p(A = 0, B = 0) + p(A = 1, B = 0) = 0.336 + 0.256 = 0.592.

Finally, p(A = 0) p(B = 0) = 0.6 · 0.592 = 0.3552 ≠ p(A = 0, B = 0) = 0.336, so A and B are not marginally independent.

Consider now the marginal distribution of C; we have that

p(C = 0) = Σ_A Σ_B p(A, B, C = 0) = 0.48.

Hence,

p(A = 0, B = 0 | C = 0) = p(A = 0, B = 0, C = 0) / p(C = 0) = 0.192 / 0.48 = 0.4,

p(A = 0 | C = 0) = p(A = 0, C = 0) / p(C = 0) = (0.192 + 0.048) / 0.48 = 0.5,

and

p(B = 0 | C = 0) = p(B = 0, C = 0) / p(C = 0) = (0.192 + 0.192) / 0.48 = 0.8.

Hence,

p(A = 0 | C = 0) p(B = 0 | C = 0) = 0.5 · 0.8 = 0.4 = p(A = 0, B = 0 | C = 0).

Since similar relationships hold for the other possible outcomes, we have that p(A, B|C) = p(A|C) p(B|C) and A ⊥ B | C.
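The whole computation can be reproduced in a few lines of Python (an illustrative sketch; the array holds Table 3.1 with index order (A, B, C)):

import numpy as np

# Table 3.1: joint[a, b, c] = p(A = a, B = b, C = c).
joint = np.array([[[0.192, 0.144],
                   [0.048, 0.216]],
                  [[0.192, 0.064],
                   [0.048, 0.096]]])

pAB = joint.sum(axis=2)                              # p(A, B)
pA, pB = pAB.sum(axis=1), pAB.sum(axis=0)            # p(A), p(B)
print(np.allclose(pAB, np.outer(pA, pB)))            # False: dependent

pC = joint.sum(axis=(0, 1))                          # p(C)
pAB_C = joint / pC                                   # p(A, B | C)
pA_C = pAB_C.sum(axis=1)                             # p(A | C)
pB_C = pAB_C.sum(axis=0)                             # p(B | C)
print(np.allclose(pAB_C, pA_C[:, None, :] * pB_C[None, :, :]))  # True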

Solution to Exercise 3.5


Define the random variables Tuberculosis (T), Krakozhia (K), lung Damage (D), Lung cancer (L), Smoking (S), Bronchitis
(B), and dysPnoea (P). Figure 3.2 shows the final Bayesian network.

[Figure: the DAG with edges K → T, S → L, S → B, T → D, L → D, D → P, B → P.]

Figure 3.2: Bayesian network for the Krakozhia exercise.

a) T ⊥ S | P? No: conditioning on the collider P unblocks the path T → D → P ← B ← S.
b) L ⊥ B | S? Yes: the path L ← S → B is blocked by the observed S, and L → D → P ← B is blocked at the unobserved collider P.
c) K ⊥ S | L? Yes: the path K → T → D ← L ← S is blocked at the unobserved collider D, and K → T → D → P ← B ← S is blocked at the unobserved collider P.
d) K ⊥ S | L, P? No: conditioning on the collider P unblocks the path K → T → D → P ← B ← S.
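These four answers can be cross-checked mechanically. Assuming a networkx version that still provides nx.d_separated (renamed is_d_separator in newer releases), a short sketch reads:

import networkx as nx

# Edges of Figure 3.2: K → T, S → L, S → B, T → D, L → D, D → P, B → P.
G = nx.DiGraph([("K", "T"), ("S", "L"), ("S", "B"),
                ("T", "D"), ("L", "D"), ("D", "P"), ("B", "P")])

print(nx.d_separated(G, {"T"}, {"S"}, {"P"}))       # a) False
print(nx.d_separated(G, {"L"}, {"B"}, {"S"}))       # b) True
print(nx.d_separated(G, {"K"}, {"S"}, {"L"}))       # c) True
print(nx.d_separated(G, {"K"}, {"S"}, {"L", "P"}))  # d) False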

Solution to Exercise 3.6

Using (18):

a) The Bayesian network is drawn with solid edges in Figure 3.3.
b) S ⊥ A | C? No: conditioning on the collider C unblocks the path S → C ← A (explaining away).
c) See the red addition W → A in Figure 3.3.
d) W ⊥ C | A? Yes: the only path W → A → C is blocked by the observed A.

Solution to Exercise 3.7


Using (19):

[Figure: solid edges A → C and S → C; the red addition is W → A.]

Figure 3.3: Bayesian network in the worker example.

[Figure: edges C → B, B → A, and C → D.]

Figure 3.4: Bayesian network in Exercise 3.7.

a) The Bayesian network is given in Figure 3.4.


b) The node C is tail-to-tail on the path from B to D, so we need to condition on this variable. We have that

p(B, D|C) = Σ_A p(A, B, D|C)
          = (1 / p(C)) Σ_A p(A, B, D, C)
          = (1 / p(C)) Σ_A p(A|B) p(B|C) p(D|C) p(C)
          = p(B|C) p(D|C) Σ_A p(A|B)
          = p(B|C) p(D|C).

Solution to Exercise 3.8


Define boolean random variables indicating whether there is light in the window (L), whether Marcus is in the office (O),
and whether he is logged in to the mainframe (M ). Laura’s model corresponds to the following probabilities:

p(O) = 0.8 p(L|O) = 0.75 p(L|Ō) = 0.02 p(M |O) = 0.92 p(M |Ō) = 0.15

a) The Bayesian network of the model is given in Figure 3.5, and p(M, O, L) = p(M|O) p(L|O) p(O).
b) We have that

p(L) = Σ_M Σ_O p(M, O, L) = Σ_M Σ_O p(M|O) p(L|O) p(O) = Σ_O p(L|O) p(O) = 0.75 · 0.8 + 0.02 · 0.2 = 0.604;

note that Σ_M p(M|O) = 1.

c) We have that

p(L|M) = p(L, M) / p(M),

where

p(L, M) = Σ_O p(M, O, L) = p(M|O) p(L|O) p(O) + p(M|Ō) p(L|Ō) p(Ō) = 0.92 · 0.75 · 0.8 + 0.15 · 0.02 · 0.2 = 0.5526

and

p(M) = p(M|O) p(O) + p(M|Ō) p(Ō) = 0.92 · 0.8 + 0.15 · 0.2 = 0.766;

so p(L|M) = 0.5526 / 0.766 ≈ 0.72 > p(L): observing the login increases Laura's belief that the light is on.

[Figure: edges O → L and O → M.]

Figure 3.5: Bayesian network of the model in Exercise 3.8.
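The two numbers above can be reproduced with a direct translation of the model into Python (an illustrative sketch; dictionary keys index whether Marcus is at the office):

# p(light | office?) and p(login | office?) from the problem statement.
p_office = 0.8
p_light = {True: 0.75, False: 0.02}
p_login = {True: 0.92, False: 0.15}

# b) prior belief that the light is on: sum out O.
pL = p_light[True] * p_office + p_light[False] * (1 - p_office)

# c) posterior belief after observing the login.
pLM = (p_login[True] * p_light[True] * p_office
       + p_login[False] * p_light[False] * (1 - p_office))
pM = p_login[True] * p_office + p_login[False] * (1 - p_office)

print(pL, pLM / pM)   # 0.604 and ≈ 0.7214: the login raises the belief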

Solution to Exercise 3.9


Observing the conditional distributions in Table 3.2, we have that

p(E, O, I, S, F) = p(F|E, I) p(S|O) p(I|O, E) p(O|E) p(E).

a) The Bayesian network is presented in Figure 3.6.

[Figure: edges E → O, E → I, O → I, O → S, E → F, I → F.]

Figure 3.6: Bayesian network of the economic model in Exercise 3.9.

b) We are after p(I = h|S = n, F = h); we can compute it using

p(I = h|S = n, F = h) = p(I = h, S = n, F = h) / p(S = n, F = h).

We have that

p(I = h, S = n, F = h) = Σ_O Σ_E p(I = h, S = n, F = h, O, E)
  = Σ_O Σ_E p(F = h|I = h, E) p(S = n|O) p(I = h|O, E) p(O|E) p(E)
  = p(F = h|I = h, E = l) p(S = n|O = l) p(I = h|O = l, E = l) p(O = l|E = l) p(E = l)
  + p(F = h|I = h, E = h) p(S = n|O = l) p(I = h|O = l, E = h) p(O = l|E = h) p(E = h)
  + p(F = h|I = h, E = l) p(S = n|O = h) p(I = h|O = h, E = l) p(O = h|E = l) p(E = l)
  + p(F = h|I = h, E = h) p(S = n|O = h) p(I = h|O = h, E = h) p(O = h|E = h) p(E = h)
  ≈ 0.3096

and that

p(S = n, F = h) = Σ_I Σ_O Σ_E p(I, S = n, F = h, O, E)
  = p(I = h, S = n, F = h) + Σ_O Σ_E p(F = h|I = l, E) p(S = n|O) p(I = l|O, E) p(O|E) p(E)
  = p(I = h, S = n, F = h)
  + p(F = h|I = l, E = l) p(S = n|O = l) p(I = l|O = l, E = l) p(O = l|E = l) p(E = l)
  + p(F = h|I = l, E = h) p(S = n|O = l) p(I = l|O = l, E = h) p(O = l|E = h) p(E = h)
  + p(F = h|I = l, E = l) p(S = n|O = h) p(I = l|O = h, E = l) p(O = h|E = l) p(E = l)
  + p(F = h|I = l, E = h) p(S = n|O = h) p(I = l|O = h, E = h) p(O = h|E = h) p(E = h)
  ≈ 0.3144.

Hence,

p(I = h|S = n, F = h) = 0.3096 / 0.3144 ≈ 0.9847.
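The enumeration is easy to get wrong by hand, so a short Python sketch (illustrative; the dictionaries encode Table 3.2, with the complementary probabilities filled in by normalization) is a useful check:

import itertools

pE = {"l": 0.2, "h": 0.8}
pO_l = {"l": 0.9, "h": 0.05}                       # p(O = l | E)
pS_n = {"l": 0.1, "h": 0.4}                        # p(S = n | O)
pI_l = {("l", "l"): 0.9, ("l", "h"): 0.1,          # p(I = l | O, E)
        ("h", "l"): 0.1, ("h", "h"): 0.01}
pF_l = {("l", "l"): 0.9, ("l", "h"): 0.1,          # p(F = l | I, E)
        ("h", "l"): 0.1, ("h", "h"): 0.01}

def p_o(o, e): return pO_l[e] if o == "l" else 1 - pO_l[e]
def p_i(i, o, e): return pI_l[(o, e)] if i == "l" else 1 - pI_l[(o, e)]
def p_f(f, i, e): return pF_l[(i, e)] if f == "l" else 1 - pF_l[(i, e)]

def joint_i(i):   # p(I = i, S = n, F = h), summing out O and E
    return sum(p_f("h", i, e) * pS_n[o] * p_i(i, o, e) * p_o(o, e) * pE[e]
               for o, e in itertools.product("lh", repeat=2))

num = joint_i("h")                 # ≈ 0.3096
den = joint_i("h") + joint_i("l")  # ≈ 0.3144
print(num / den)                   # ≈ 0.9847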
Bibliography

[1] David Barber. Bayesian Reasoning and Machine Learning. Cambridge University Press, 2012.
[2] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
[3] Kevin B. Korb and Ann E. Nicholson. Bayesian Artificial Intelligence. CRC Press, 2010.
[4] Steffen L. Lauritzen and David J. Spiegelhalter. Local computations with probabilities on graphical structures and their
application to expert systems. Journal of the Royal Statistical Society: Series B (Methodological), 50(2):157–194, 1988.
[5] Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.

Version: September 10, 2019
