
Bayesian Belief Networks: CS 2740 Knowledge Representation

The document discusses Bayesian belief networks (BBNs). BBNs represent a probabilistic graphical model that compactly represents a joint probability distribution over a set of variables. A BBN has two components: 1) a directed acyclic graph whose nodes are variables and edges represent direct dependencies between variables, and 2) local conditional probability distributions that quantify the relationship between a variable and its parents in the graph. BBNs address limitations of representing the full joint distribution by exploiting conditional independencies between variables.


CS 2740 Knowledge Representation

Lecture 19

Bayesian belief networks

Milos Hauskrecht
[email protected]
5329 Sennott Square

CS 2740 Knowledge Representation M. Hauskrecht

Probabilistic inference
Various inference tasks:

• Diagnostic task (from effect to cause):
P(Pneumonia | Fever = T)
• Prediction task (from cause to effect):
P(Fever | Pneumonia = T)
• Other probabilistic queries (queries on joint distributions):
P(Fever)
P(Fever, ChestPain)

Inference
Any query can be computed from the full joint distribution!
• A joint over a subset of variables is obtained through marginalization:

P(A = a, C = c) = ∑_i ∑_j P(A = a, B = b_i, C = c, D = d_j)

• A conditional probability over a set of variables, given other variables' values, is obtained through marginalization and the definition of conditionals:

P(D = d | A = a, C = c) = P(A = a, C = c, D = d) / P(A = a, C = c)

= ∑_i P(A = a, B = b_i, C = c, D = d) / ∑_i ∑_j P(A = a, B = b_i, C = c, D = d_j)
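Both operations can be sketched in code. A minimal illustration, assuming the full joint over four binary variables A, B, C, D is stored as a Python dict keyed by assignment tuples (the uniform table here is only a placeholder; any normalized table works the same way):

```python
from itertools import product

vals = (True, False)
# Placeholder full joint P(A, B, C, D): uniform over the 16 assignments.
# Variable positions in each assignment tuple: A=0, B=1, C=2, D=3.
joint = {asg: 1 / 16 for asg in product(vals, repeat=4)}

def marginal(joint, keep):
    """Sum out every variable position not listed in `keep`."""
    out = {}
    for asg, p in joint.items():
        key = tuple(asg[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def conditional(joint, target, given):
    """P(target | given) via marginalization and the definition of conditionals."""
    num = marginal(joint, given + [target])   # P(given, target)
    den = marginal(joint, given)              # P(given)
    return {k: v / den[k[:-1]] for k, v in num.items()}

p_ac = marginal(joint, [0, 2])                 # P(A, C): B and D summed out
p_d_given_ac = conditional(joint, 3, [0, 2])   # P(D | A, C)
```

Each marginal sums to 1, and for every assignment of (A, C) the conditional over D sums to 1, matching the two formulas above.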


Inference
Any query can be computed from the full joint distribution!
• Any joint probability can be expressed as a product of conditionals via the chain rule:

P(X_1, X_2, …, X_n) = P(X_n | X_1, …, X_{n−1}) P(X_1, …, X_{n−1})
= P(X_n | X_1, …, X_{n−1}) P(X_{n−1} | X_1, …, X_{n−2}) P(X_1, …, X_{n−2})
= ∏_{i=1}^{n} P(X_i | X_1, …, X_{i−1})

• Sometimes it is easier to define the distribution in terms of conditional probabilities, e.g.:
P(Fever | Pneumonia = T)
P(Fever | Pneumonia = F)
Modeling uncertainty with probabilities
• Defining the full joint distribution makes it possible to
represent and reason with uncertainty in a uniform way
• We are able to handle an arbitrary inference problem
Problems:
– Space complexity. Storing the full joint distribution requires O(d^n) numbers, where n is the number of random variables and d is the number of values per variable.
– Inference (time) complexity. Computing some queries requires O(d^n) steps.
– Acquisition problem. Who is going to define all of the probability entries?


Medical diagnosis example


• Space complexity.
– Pneumonia (2 values: T,F), Fever (2: T,F), Cough (2: T,F),
WBCcount (3: high, normal, low), paleness (2: T,F)
– Number of assignments: 2*2*2*3*2=48
– We need to define at least 47 probabilities.
• Time complexity.
– Assume we need to compute the marginal of Pneumonia=T
from the full joint
P(Pneumonia = T) = ∑_{i∈{T,F}} ∑_{j∈{T,F}} ∑_{k∈{h,n,l}} ∑_{u∈{T,F}} P(Pneumonia = T, Fever = i, Cough = j, WBCcount = k, Pale = u)

– Sum over 2*2*3*2 = 24 combinations
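The counts above can be checked mechanically. A small sketch, again with a placeholder uniform joint (the variable names match the example; the probability values are illustrative only):

```python
from itertools import product

# Domains from the example: Pneumonia, Fever, Cough (T/F), WBCcount (3 values), Pale (T/F).
domains = [
    ("Pneumonia", (True, False)),
    ("Fever", (True, False)),
    ("Cough", (True, False)),
    ("WBCcount", ("high", "normal", "low")),
    ("Pale", (True, False)),
]
# Placeholder uniform joint over all 2*2*2*3*2 = 48 assignments.
joint = {asg: 1 / 48 for asg in product(*(d for _, d in domains))}

# Marginal P(Pneumonia = T): sum the entries with Pneumonia fixed to True.
terms = [p for asg, p in joint.items() if asg[0]]
p_pneumonia = sum(terms)
# 24 combinations of the remaining four variables are summed over.
```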


Modeling uncertainty with probabilities
• Knowledge-based system era (70s – early 80s)
– Extensional, non-probabilistic models
– Tried to solve the space, time and acquisition bottlenecks of probability-based models
– These bottlenecks froze the development and advancement of KB systems and contributed to the slow-down of AI in the 80s in general

• Breakthrough (late 80s, beginning of 90s)
– Bayesian belief networks
• Give solutions to the space and acquisition bottlenecks
• Partial solutions for time complexity


Bayesian belief networks (BBNs)


• Represent the full joint distribution over the variables more compactly, with a smaller number of parameters.
• Take advantage of conditional and marginal independences among random variables

• A and B are independent:
P(A, B) = P(A) P(B)

• A and B are conditionally independent given C:
P(A, B | C) = P(A | C) P(B | C)
P(A | C, B) = P(A | C)

Alarm system example.
• Assume your house has an alarm system against burglary.
You live in a seismically active area and the alarm system
can get occasionally set off by an earthquake. You have two
neighbors, Mary and John, who do not know each other. If
they hear the alarm they call you, but this is not guaranteed.
• We want to represent the probability distribution of events:
– Burglary, Earthquake, Alarm, Mary calls and John calls

Causal relations: Burglary → Alarm ← Earthquake, Alarm → JohnCalls, Alarm → MaryCalls


Bayesian belief network.


1. Directed acyclic graph
• Nodes = random variables
Burglary, Earthquake, Alarm, Mary calls and John calls
• Links = direct (causal) dependencies between variables. The chance of Alarm is influenced by Earthquake; the chance of John calling is affected by the Alarm.
Burglary P(B) Earthquake P(E)

Alarm P(A|B,E)

P(J|A) P(M|A)

JohnCalls MaryCalls

Bayesian belief network.
2. Local conditional distributions
• relate variables and their parents

Burglary P(B) Earthquake P(E)

Alarm P(A|B,E)

P(J|A) P(M|A)

JohnCalls MaryCalls


Bayesian belief network.

Burglary:            Earthquake:
P(B=T)  P(B=F)       P(E=T)  P(E=F)
0.001   0.999        0.002   0.998

Alarm:
P(A|B,E):
B  E | A=T    A=F
T  T | 0.95   0.05
T  F | 0.94   0.06
F  T | 0.29   0.71
F  F | 0.001  0.999

JohnCalls:           MaryCalls:
P(J|A):              P(M|A):
A | J=T   J=F        A | M=T   M=F
T | 0.90  0.10       T | 0.70  0.30
F | 0.05  0.95       F | 0.01  0.99

Bayesian belief networks (general)
Two components: B = (S, Θ_S)

• Directed acyclic graph S
– Nodes correspond to random variables
– (Missing) links encode independences

• Parameters Θ_S
– Local conditional probability distributions, one for every variable-parent configuration:

P(X_i | pa(X_i))

where pa(X_i) stands for the parents of X_i

Example, for the alarm network (B → A ← E, A → J, A → M):
P(A|B,E):
B  E | A=T    A=F
T  T | 0.95   0.05
T  F | 0.94   0.06
F  T | 0.29   0.71
F  F | 0.001  0.999


Full joint distribution in BBNs


The full joint distribution is defined in terms of the local conditional distributions (via the chain rule):

P(X_1, X_2, …, X_n) = ∏_{i=1..n} P(X_i | pa(X_i))

Example (alarm network B → A ← E, A → J, A → M):
Assume the following assignment of values to the random variables:
B = T, E = T, A = T, J = T, M = F
Then its probability is:

P(B = T, E = T, A = T, J = T, M = F) =
P(B = T) P(E = T) P(A = T | B = T, E = T) P(J = T | A = T) P(M = F | A = T)
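This factorization is directly computable. A sketch using the CPT numbers from the tables above (the helper names `bern` and `joint` are mine, not part of the lecture):

```python
from itertools import product

# P(A=T | B, E) from the alarm CPT; P(J=T | A) and P(M=T | A) below.
P_A_given = {(True, True): 0.95, (True, False): 0.94,
             (False, True): 0.29, (False, False): 0.001}
P_J_given = {True: 0.90, False: 0.05}
P_M_given = {True: 0.70, False: 0.01}

def bern(p_true, value):
    """P(X = value) for a Boolean X with P(X = True) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """P(B, E, A, J, M) as the product of the five local conditionals."""
    return (bern(0.001, b) * bern(0.002, e) * bern(P_A_given[(b, e)], a)
            * bern(P_J_given[a], j) * bern(P_M_given[a], m))

p = joint(True, True, True, True, False)
# p = 0.001 * 0.002 * 0.95 * 0.90 * 0.30
```

Summing `joint` over all 32 assignments gives 1, confirming that the product of local conditionals defines a proper distribution.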
Bayesian belief networks (BBNs)
• Represent the full joint distribution over the variables more compactly using the product of local conditionals.
• But how did we get to local parameterizations?
Answer:
• Graphical structure encodes conditional and marginal
independences among random variables
• A and B are independent: P(A, B) = P(A) P(B)
• A and B are conditionally independent given C:
P(A | C, B) = P(A | C)
P(A, B | C) = P(A | C) P(B | C)
• The graph structure implies the decomposition!

Independences in BBNs
3 basic independence structures:

1. Chain: Burglary → Alarm → JohnCalls
2. Common effect: Burglary → Alarm ← Earthquake
3. Common cause: JohnCalls ← Alarm → MaryCalls
Independences in BBNs

1. Chain: Burglary → Alarm → JohnCalls

JohnCalls is independent of Burglary given Alarm:

P(J | A, B) = P(J | A)
P(J, B | A) = P(J | A) P(B | A)
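This independence can be verified numerically. A sketch that rebuilds the joint from the CPT numbers used earlier and checks P(J | A, B) = P(J | A); the helper functions are mine:

```python
from itertools import product

P_A_given = {(True, True): 0.95, (True, False): 0.94,
             (False, True): 0.29, (False, False): 0.001}

def bern(p_true, value):
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    # Product of the five local conditionals of the alarm network.
    return (bern(0.001, b) * bern(0.002, e) * bern(P_A_given[(b, e)], a)
            * bern(0.90 if a else 0.05, j) * bern(0.70 if a else 0.01, m))

def prob(**fixed):
    """Marginal probability of the fixed assignments, summing out the rest."""
    names = ("b", "e", "a", "j", "m")
    return sum(joint(*asg) for asg in product((True, False), repeat=5)
               if all(dict(zip(names, asg))[k] == v for k, v in fixed.items()))

p_j_given_ab = prob(j=True, a=True, b=True) / prob(a=True, b=True)
p_j_given_a = prob(j=True, a=True) / prob(a=True)
# Both ratios equal P(J = T | A = T): knowing B adds nothing once A is known.
```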

Independences in BBNs

2. Common effect: Burglary → Alarm ← Earthquake

Burglary is independent of Earthquake (not knowing Alarm):
P(B, E) = P(B) P(E)
But Burglary and Earthquake become dependent given Alarm!

Independences in BBNs

3. Common cause: JohnCalls ← Alarm → MaryCalls

MaryCalls is independent of JohnCalls given Alarm:

P(J | A, M) = P(J | A)
P(J, M | A) = P(J | A) P(M | A)

Independences in BBNs
• A BBN distribution models many conditional independence relations among distant variables and sets of variables
• These are defined in terms of a graphical criterion called d-separation
• D-separation and independence:
– Let X, Y and Z be three sets of nodes
– If X and Y are d-separated by Z, then X and Y are conditionally independent given Z
• D-separation:
– A is d-separated from B given C if every undirected path between them is blocked by C
• Path blocking:
– 3 cases that expand on the three basic independence structures
Undirected path blocking
A is d-separated from B given C if every undirected path
between them is blocked

• 1. Path blocking with a linear substructure:

X → Z → Y, with X in A, Y in B, and Z in C

Undirected path blocking


A is d-separated from B given C if every undirected path
between them is blocked

• 2. Path blocking with the wedge substructure:

X ← Z → Y, with X in A, Y in B, and Z in C

Undirected path blocking
A is d-separated from B given C if every undirected path
between them is blocked

• 3. Path blocking with the vee substructure:

X → Z ← Y, with X in A, Y in B, and neither Z nor any of its descendants in C


Independences in BBNs
Network: Burglary → Alarm ← Earthquake, Earthquake → RadioReport, Alarm → JohnCalls, Alarm → MaryCalls

• Earthquake and Burglary are independent given MaryCalls ?


Independences in BBNs
Network: Burglary → Alarm ← Earthquake, Earthquake → RadioReport, Alarm → JohnCalls, Alarm → MaryCalls

• Earthquake and Burglary are independent given MaryCalls: False
• Burglary and MaryCalls are independent (not knowing Alarm): False
• Burglary and RadioReport are independent given Earthquake: True
• Burglary and RadioReport are independent given MaryCalls: False
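These answers can be checked mechanically with a d-separation test. A sketch using the ancestral-moral-graph criterion, a standard equivalent of path blocking (keep only the query and evidence nodes plus their ancestors, "marry" co-parents, drop edge directions, delete the evidence nodes, then test connectivity); the function and variable names are mine:

```python
from itertools import combinations

def d_separated(parents, x, y, z):
    """True iff nodes x and y are d-separated given the set z,
    via the moralized ancestral graph of {x, y} and z."""
    # 1. Keep only x, y, z and their ancestors.
    relevant, stack = set(), [x, y, *z]
    while stack:
        n = stack.pop()
        if n not in relevant:
            relevant.add(n)
            stack.extend(parents.get(n, []))
    # 2. Moralize: undirected child-parent edges plus edges between co-parents.
    adj = {n: set() for n in relevant}
    for n in relevant:
        ps = [p for p in parents.get(n, []) if p in relevant]
        for p in ps:
            adj[n].add(p)
            adj[p].add(n)
        for p, q in combinations(ps, 2):
            adj[p].add(q)
            adj[q].add(p)
    # 3. Delete z and test whether x can still reach y.
    seen, stack = set(z), [x]
    while stack:
        n = stack.pop()
        if n == y:
            return False
        if n not in seen:
            seen.add(n)
            stack.extend(adj[n] - seen)
    return True

# Alarm network extended with RadioReport: B -> A <- E, E -> R, A -> J, A -> M.
parents = {"A": ["B", "E"], "J": ["A"], "M": ["A"], "R": ["E"]}

answers = [
    d_separated(parents, "E", "B", {"M"}),   # False
    d_separated(parents, "B", "M", set()),   # False
    d_separated(parents, "B", "R", {"E"}),   # True
    d_separated(parents, "B", "R", {"M"}),   # False
]
```

The four results match the answers above.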

Bayesian belief networks (BBNs)
• Represent the full joint distribution over the variables more compactly using the product of local conditionals.
• So how did we get to local parameterizations?

P(X_1, X_2, …, X_n) = ∏_{i=1..n} P(X_i | pa(X_i))

• The decomposition is implied by the set of independences encoded in the belief network.

Full joint distribution in BBNs
Rewrite the full joint probability using the product rule and the independences encoded in the network (B → A ← E, A → J, A → M):

P(B = T, E = T, A = T, J = T, M = F)
= P(J = T | B = T, E = T, A = T, M = F) P(B = T, E = T, A = T, M = F)
= P(J = T | A = T) P(B = T, E = T, A = T, M = F)
= P(J = T | A = T) P(M = F | B = T, E = T, A = T) P(B = T, E = T, A = T)
= P(J = T | A = T) P(M = F | A = T) P(B = T, E = T, A = T)
= P(J = T | A = T) P(M = F | A = T) P(A = T | B = T, E = T) P(B = T, E = T)
= P(J = T | A = T) P(M = F | A = T) P(A = T | B = T, E = T) P(B = T) P(E = T)

Bayesian belief network (recap)

Directed acyclic graph:
• Nodes = random variables
• Links = direct dependencies; missing links encode independences.

Burglary P(B) Earthquake P(E)

Alarm P(A|B,E)

P(J|A) P(M|A)

JohnCalls MaryCalls

Parameter complexity problem
• In the BBN the full joint distribution is defined as:

P(X_1, X_2, …, X_n) = ∏_{i=1..n} P(X_i | pa(X_i))

• What did we save?
Alarm example: 5 binary (True/False) variables
– Number of parameters of the full joint: 2^5 = 32
One parameter comes for free (the entries must sum to 1): 2^5 − 1 = 31
– Number of parameters of the BBN: 2^3 + 2(2^2) + 2(2) = 20
One parameter in every conditional comes for free: 2^2 + 2(2) + 2(1) = 10
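The counting generalizes to any network: a variable with domain size d and parents with domain sizes d_1, …, d_k contributes (d − 1) · d_1 · … · d_k free parameters. A sketch with the alarm network (the dict layout and function name are mine):

```python
# Alarm network: node -> (domain size, parents). All variables are binary.
network = {
    "B": (2, []), "E": (2, []), "A": (2, ["B", "E"]),
    "J": (2, ["A"]), "M": (2, ["A"]),
}

def free_parameters(network):
    """Free BBN parameters: (d - 1) per parent configuration, per node."""
    total = 0
    for size, parents in network.values():
        configs = 1
        for p in parents:
            configs *= network[p][0]   # multiply parent domain sizes
        total += (size - 1) * configs
    return total

bbn_params = free_parameters(network)        # 1 + 1 + 4 + 2 + 2 = 10
full_joint_params = 2 ** len(network) - 1    # 31
```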
