Unit 3 Graphical Models
Graphical Models
Also known as Bayesian networks or probabilistic networks
Nodes are hypotheses (random variables), and the probabilities correspond to our belief in the truth of the hypothesis
Arcs are direct influences between hypotheses
The structure is represented as a directed acyclic graph (DAG)
The parameters are the conditional probabilities on the arcs (Pearl, 1988, 2000; Jensen, 1996; Lauritzen, 1996)
Causes and Bayes’ Rule
Diagnostic inference: Knowing that the grass is wet, what is the probability that rain is the cause?
(Figure: the causal arc goes from rain R to wet grass W; diagnostic inference reasons in the opposite direction.)
P(R|W) = P(W|R)P(R) / P(W)
       = P(W|R)P(R) / [P(W|R)P(R) + P(W|~R)P(~R)]
       = (0.9)(0.4) / [(0.9)(0.4) + (0.2)(0.6)]
       = 0.75
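A minimal sketch of this diagnostic computation in Python, using the probabilities from the slide (P(R) = 0.4, P(W|R) = 0.9, P(W|~R) = 0.2):

    # Diagnostic inference P(R|W) via Bayes' rule, with the slide's numbers
    p_r = 0.4              # prior P(R)
    p_w_given_r = 0.9      # P(W|R)
    p_w_given_not_r = 0.2  # P(W|~R)

    # Evidence P(W), marginalizing over R
    p_w = p_w_given_r * p_r + p_w_given_not_r * (1 - p_r)

    # Posterior P(R|W)
    p_r_given_w = p_w_given_r * p_r / p_w
    print(round(p_r_given_w, 4))  # 0.75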
Conditional Independence
X and Y are independent if
P(X,Y)=P(X)P(Y)
X and Y are conditionally independent given Z if
P(X,Y|Z)=P(X|Z)P(Y|Z)
or
P(X|Y,Z)=P(X|Z)
Three canonical cases: head-to-tail, tail-to-tail, head-to-head
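As a small numeric check of the definition, the sketch below builds a joint over three binary variables as P(Z)P(X|Z)P(Y|Z), so X and Y are conditionally independent given Z by construction; all probability values are hypothetical, chosen only for illustration:

    import itertools

    # Hypothetical CPTs (illustrative values only): Z is a parent of both X and Y
    p_z = {0: 0.7, 1: 0.3}
    p_x_given_z = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.4, 1: 0.6}}  # p_x_given_z[z][x]
    p_y_given_z = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.1, 1: 0.9}}  # p_y_given_z[z][y]

    def joint(x, y, z):
        return p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]

    # Verify P(X,Y|Z) = P(X|Z) P(Y|Z) for every configuration
    for x, y, z in itertools.product([0, 1], repeat=3):
        pz = sum(joint(i, j, z) for i in (0, 1) for j in (0, 1))  # = P(Z=z)
        p_xy_z = joint(x, y, z) / pz
        p_x_z = sum(joint(x, j, z) for j in (0, 1)) / pz
        p_y_z = sum(joint(i, y, z) for i in (0, 1)) / pz
        assert abs(p_xy_z - p_x_z * p_y_z) < 1e-12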
Case 1: Head-to-Tail
P(X,Y,Z)=P(X)P(Y|X)P(Z|Y)
P(W|C)=P(W|R)P(R|C)+P(W|~R)P(~R|C)
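A short sketch of this computation in Python for the chain C → R → W; the conditional probability values are hypothetical placeholders, only the form P(W|C) = P(W|R)P(R|C) + P(W|~R)P(~R|C) comes from the slide:

    # Head-to-tail chain: Cloudy -> Rain -> Wet grass (hypothetical numbers)
    p_r_given_c = {True: 0.8, False: 0.1}  # P(R|C)
    p_w_given_r = {True: 0.9, False: 0.2}  # P(W|R)

    def p_w_given_cloudy(c):
        # Marginalize over the intermediate node R:
        # P(W|C) = P(W|R) P(R|C) + P(W|~R) P(~R|C)
        return (p_w_given_r[True] * p_r_given_c[c]
                + p_w_given_r[False] * (1 - p_r_given_c[c]))

    print(p_w_given_cloudy(True))   # P(W|C)
    print(p_w_given_cloudy(False))  # P(W|~C)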
Case 2: Tail-to-Tail
P(X,Y,Z)=P(X)P(Y|X)P(Z|X)
Case 3: Head-to-Head
P(X,Y,Z)=P(X)P(Y)P(Z|X,Y)
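A brief sketch of the head-to-head structure X → Z ← Y with hypothetical binary tables; it builds the joint as P(X)P(Y)P(Z|X,Y) and checks that X and Y are independent as long as Z is not observed, which follows directly from summing the factorization over Z:

    # Head-to-head: X -> Z <- Y, joint P(X,Y,Z) = P(X) P(Y) P(Z|X,Y)
    # Hypothetical values for illustration
    p_x = {0: 0.6, 1: 0.4}
    p_y = {0: 0.7, 1: 0.3}
    p_z1_given_xy = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}  # P(Z=1|X,Y)

    def joint(x, y, z):
        pz1 = p_z1_given_xy[(x, y)]
        return p_x[x] * p_y[y] * (pz1 if z == 1 else 1 - pz1)

    # With Z unobserved, P(X,Y) = P(X) P(Y): X and Y are independent
    for x in (0, 1):
        for y in (0, 1):
            p_xy = sum(joint(x, y, z) for z in (0, 1))
            assert abs(p_xy - p_x[x] * p_y[y]) < 1e-12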
Causal vs Diagnostic Inference
Causal inference: If the sprinkler is on, what is the probability that the grass is wet?
Diagnostic inference: P(C|W) = ?
Classification
Naive Bayes’ Classifier
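In the naive Bayes' classifier, given the class C the inputs x_j are independent, so p(x|C) factorizes as the product of the p(x_j|C). A minimal sketch under that assumption, with hypothetical binary features and made-up probability tables:

    # Naive Bayes: p(x|C) = prod_j p(x_j|C), so P(C|x) ∝ P(C) prod_j p(x_j|C)
    # Hypothetical values for two classes and three binary features
    p_c = {0: 0.6, 1: 0.4}                      # prior P(C)
    p_xj1_given_c = [                           # P(x_j = 1 | C) for j = 0, 1, 2
        {0: 0.2, 1: 0.7},
        {0: 0.5, 1: 0.9},
        {0: 0.3, 1: 0.1},
    ]

    def posterior(x):
        scores = {}
        for c in p_c:
            score = p_c[c]
            for j, xj in enumerate(x):
                p1 = p_xj1_given_c[j][c]
                score *= p1 if xj == 1 else 1 - p1
            scores[c] = score
        z = sum(scores.values())                # normalize over classes
        return {c: s / z for c, s in scores.items()}

    print(posterior([1, 1, 0]))  # posterior class probabilities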
Linear Regression
Junction Trees
If X does not separate E+ and E-, we convert the graph into a junction tree and then apply the polytree algorithm
A junction tree is a tree whose nodes are cliques of the moralized graph
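A small sketch of the moralization step under its usual definition (connect, i.e. "marry", the parents of every node, then drop arc directions), run on a hypothetical sprinkler-style DAG:

    # Moralization: marry the parents of each node, then drop edge directions
    # Hypothetical DAG given as {node: list of parents}
    dag = {"C": [], "S": ["C"], "R": ["C"], "W": ["S", "R"]}

    undirected = {n: set() for n in dag}
    for child, parents in dag.items():
        for p in parents:
            # Drop direction: connect child and parent both ways
            undirected[child].add(p)
            undirected[p].add(child)
        # Marry the parents: connect every pair of parents of the same child
        for i, p in enumerate(parents):
            for q in parents[i + 1:]:
                undirected[p].add(q)
                undirected[q].add(p)

    print(undirected)  # S and R become connected because they share the child W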
Undirected Graphs: Markov Random Fields
In a Markov random field, dependencies are symmetric, for example, between pixels in an image
In an undirected graph, A and B are independent if removing C makes them unconnected
Potential function ψ_C(X_C) shows how favorable the particular configuration X_C is over the clique C
The joint is defined in terms of the clique potentials:
p(X) = (1/Z) Π_C ψ_C(X_C)   where the normalizer Z = Σ_X Π_C ψ_C(X_C)
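A minimal sketch of this factorization for a tiny undirected chain A - B - C with binary variables; the potential tables are hypothetical, only the form p(X) = (1/Z) Π_C ψ_C(X_C) comes from the slide:

    import itertools

    # Cliques of the chain A - B - C are {A,B} and {B,C}; one potential table each
    # Hypothetical non-negative potential values
    psi_ab = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
    psi_bc = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 1.0}

    def unnormalized(a, b, c):
        # Product of clique potentials
        return psi_ab[(a, b)] * psi_bc[(b, c)]

    # Normalizer Z sums the product of potentials over all configurations
    Z = sum(unnormalized(a, b, c)
            for a, b, c in itertools.product([0, 1], repeat=3))

    def p(a, b, c):
        return unnormalized(a, b, c) / Z

    print(sum(p(a, b, c) for a, b, c in itertools.product([0, 1], repeat=3)))  # 1.0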
Factor Graphs
Define new factor nodes and write the joint in terms of them:
p(X) = (1/Z) Π_S f_S(X_S)
Learning a Graphical Model
Learning the conditional probabilities, either as tables (for the discrete case with a small number of parents) or as parametric functions
Learning the structure of the graph: doing a state-space search over a score function that combines goodness of fit to the data with some measure of model complexity
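A minimal sketch of the first point, estimating a conditional probability table by counting, for a binary child with one binary parent; the data are hypothetical:

    # Estimate P(child | parent) as a table by counting co-occurrences
    # Hypothetical data: each row is (parent, child), both binary
    data = [(0, 0), (0, 1), (0, 0), (1, 1), (1, 1), (1, 0), (0, 0), (1, 1)]

    counts = {0: [0, 0], 1: [0, 0]}  # counts[parent][child]
    for parent, child in data:
        counts[parent][child] += 1

    # Maximum likelihood estimate of the conditional probability table
    cpt = {parent: [c / sum(cs) for c in cs] for parent, cs in counts.items()}
    print(cpt)  # {0: [0.75, 0.25], 1: [0.25, 0.75]}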
Influence Diagrams
(Figure: an influence diagram containing a decision node.)