CSE 291D Lecture 7
Announcements
• Project proposals are due today!!
Log-sum-exp trick
• When computing probabilities, you will quickly hit numerical underflow issues (e.g. in your homework!)
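For instance (a minimal MATLAB illustration, not from the slides): multiplying many small likelihoods underflows to zero, while summing their logs stays well behaved.

p = 1e-5 * ones(1000, 1);   % 1000 i.i.d. likelihood values of 1e-5 each
prod(p)                     % underflows to 0
log(prod(p))                % -Inf, useless
sum(log(p))                 % about -1.1513e+04, the correct log-probability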
Log-sum-exp
• $\mathrm{LSE}(x_1, \dots, x_n) = \log \sum_{i=1}^{n} e^{x_i}$
• Monotonic, convex (figure: surface plot of log-sum-exp in 2 dimensions)
• The trick: factor out the largest term, $\mathrm{LSE}(x) = \max_i x_i + \log \sum_i e^{x_i - \max_j x_j}$, so the largest exponent is $e^0 = 1$ and nothing underflows or overflows
MATLAB code:
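A minimal sketch of such a function (the slide's own code is not shown; this version assumes a vector x of log-probabilities):

function s = logsumexp(x)
% LOGSUMEXP  Numerically stable log(sum(exp(x))) for a vector x.
    m = max(x);                     % shift by the largest element
    s = m + log(sum(exp(x - m)));   % largest exponent is now exp(0) = 1
end

For example, logsumexp([-1000, -1001]) returns about -999.69, where the naive log(sum(exp(x))) returns -Inf.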
Latent variable models
• Latent variables Z
• Discrete latent variables Z: cluster assignments
Learning outcomes
By the end of the lesson, you should be able to:
Mixture models
• A convex combination of distributions: $p(x_i) = \sum_{k=1}^{K} \pi_k\, p_k(x_i)$, with $\pi_k \ge 0$ and $\sum_k \pi_k = 1$
• Equivalently, marginalizing over a latent variable $z_i$: $p(x_i) = \sum_{z_i} p(z_i)\, p(x_i \mid z_i)$, with $p(z_i = k) = \pi_k$
Mixture of Gaussians
• Each component is a Gaussian: $p(x_i) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)$
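As a concrete illustration (a sketch, not the slide's code), the log-density of a scalar Gaussian mixture can be evaluated stably by reusing the logsumexp sketch above; the means, variances, and weights here are made-up values:

mu    = [-2; 0; 3];        % component means
sigma = [ 1; 0.5; 2];      % component standard deviations
w     = [0.3; 0.5; 0.2];   % mixture weights (sum to 1)

x = 1.5;                                       % query point
logcomp = log(w) - 0.5*log(2*pi*sigma.^2) - (x - mu).^2 ./ (2*sigma.^2);
logp = logsumexp(logcomp);                     % stable log p(x)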
Uses of mixture models: Clustering
(Figure: data points grouped into clusters, with the cluster centers marked.)
Uses of mixture models: Density estimation
Mixture models for classification (e.g. naïve Bayes)
• Cluster assignments Y correspond to observed class labels
• (Graphical model: class label Y generates the data points X, with class-conditional parameters Φ)
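Concretely (a standard formulation; the slide's exact notation is not shown), classification applies Bayes' rule with the class label in the role of the mixture component:

$$p(y_i = c \mid x_i, \phi) \;\propto\; p(y_i = c)\, p(x_i \mid \phi_c), \qquad \text{na\"ive Bayes: } p(x_i \mid \phi_c) = \prod_j p(x_{ij} \mid \phi_{jc})$$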
Semi-supervised classification
• (Graphical models: for the labeled data points X the class labels Y are observed; for the unlabeled data points X the cluster assignments Z are latent; both share the parameters Φ)
Mixtures of experts
(Figure: regression output Y plotted against input feature X.)
Mixtures of experts
• Input-dependent cluster assignments: the gating function
(Figure: regression output Y vs. input feature X, with the data partitioned among the experts.)
Mixtures of experts
• Gating function: $p(z_i = k \mid x_i)$, the input-dependent mixture weights (typically a softmax of $x_i$)
• Predicted y: $p(y_i \mid x_i) = \sum_k p(z_i = k \mid x_i)\, p(y_i \mid x_i, z_i = k)$
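A sketch of one common parameterization (linear experts with a softmax gating function; the slide's exact form is not shown, and the parameter values below are placeholders):

V = randn(3, 2);     % gating parameters, one row per expert
W = randn(3, 2);     % expert regression weights
x = [1; 0.7];        % input feature vector

a    = V * x;                                      % gating activations
gate = exp(a - max(a)); gate = gate / sum(gate);   % softmax gating: p(z = k | x)
yhat = gate' * (W * x);                            % predicted y: gating-weighted expert means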
Topic models
Actually, D is strictly speaking the right answer: the complete data log-likelihood is concave.
Computing the MLE
• Suppose the complete data log-likelihood is in the exponential family:
  $\log p(x, z \mid \theta) = \theta^{\top} T(x, z) - A(\theta) + \text{const}$
• The first term is linear in $\theta$; the log-partition function $A(\theta)$ is a log-sum-exp, hence convex; so the complete data log-likelihood is concave in $\theta$
Computing the MLE
• Observed data LL:
  $\log p(x \mid \theta) = \log \sum_{z} \exp\big(\theta^{\top} T(x, z)\big) - A(\theta)$
• Both terms are log-sum-exp functions, so the observed data log-likelihood is a difference of convex functions (D.C.) and is not concave in general
Mixtures of exponential families
• 1-of-k notation: $z_i \in \{0, 1\}^{K}$ with $\sum_k z_{ik} = 1$; $z_{ik} = 1$ iff point $i$ belongs to cluster $k$, so $p(z_i \mid \pi) = \prod_k \pi_k^{z_{ik}}$
Mixtures of exponential families
• The complete data log-likelihood is in the exponential family:
  $\log p(x, z \mid \pi, \theta) = \sum_i \sum_k z_{ik}\big[\log \pi_k + \theta_k^{\top} T(x_i) - A(\theta_k)\big] + \text{const}$
  i.e., it is linear in the sufficient statistics $z_{ik}$ and $z_{ik}\, T(x_i)$
EM for exponential family mixtures
• E-step: Compute lower bound, the expected complete data log-likelihood:
  $Q(\pi, \theta) = \sum_i \sum_k r_{ik}\big[\log \pi_k + \theta_k^{\top} T(x_i) - A(\theta_k)\big]$
• In each $\theta_k$ this is a linear term minus the convex $A(\theta_k)$, hence concave; maximizing over $\pi$ on the simplex is also easy, so the M-step is tractable
EM for exponential family mixtures
• E-step responsibilities:
  $r_{ik} = p(z_{ik} = 1 \mid x_i) = \dfrac{\pi_k\, p(x_i \mid \theta_k)}{\sum_{k'} \pi_{k'}\, p(x_i \mid \theta_{k'})}$
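A sketch of computing the responsibilities in log space with the log-sum-exp trick (assumes an N-by-K matrix logjoint with entries $\log \pi_k + \log p(x_i \mid \theta_k)$, and MATLAB implicit expansion):

m       = max(logjoint, [], 2);                 % N x 1 per-row maxima
lognorm = m + log(sum(exp(logjoint - m), 2));   % N x 1 log-normalizers
r       = exp(logjoint - lognorm);              % N x K responsibilities; rows sum to 1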
M-step
• Maximize the expected complete data log-likelihood with respect to the parameters
• Mixture weights: $\pi_k = \frac{1}{N} \sum_i r_{ik}$
• Component parameters: a weighted exponential-family MLE (moment matching), $\mathbb{E}_{\theta_k}[T(x)] = \dfrac{\sum_i r_{ik}\, T(x_i)}{\sum_i r_{ik}}$
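For a mixture of univariate Gaussians, for instance, the corresponding weighted updates look like this (a sketch; x is the N-by-1 data vector and r the N-by-K responsibility matrix from above):

Nk   = sum(r, 1)';                     % K x 1 effective counts
wnew = Nk / sum(Nk);                   % updated mixture weights pi_k
mu   = (r' * x) ./ Nk;                 % weighted means
s2   = (r' * (x.^2)) ./ Nk - mu.^2;    % weighted variances (moment matching)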
Collapsed Gibbs sampling
• Marginalize out the parameters
• Perform Gibbs sampling on just the z's
• Recover parameter estimates based on z at the end of the algorithm
• Rao-Blackwell Theorem: for any function $f(z, \theta)$, $\operatorname{var}\big[\mathbb{E}[f(z, \theta) \mid z]\big] \le \operatorname{var}\big[f(z, \theta)\big]$, so estimates from the collapsed (marginalized) chain have variance no higher than those from the uncollapsed chain
Marginalize out mixture parameters
• Assume a Dirichlet prior on mixture weights: $\pi \sim \mathrm{Dir}(\alpha_1, \dots, \alpha_K)$, and integrate $\pi$ out of $p(z \mid \pi)$ (the Dirichlet-multinomial marginal below)
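The standard Dirichlet-multinomial marginal (a worked statement of the integral, writing $n_k$ for the number of points assigned to cluster $k$):

$$p(z \mid \alpha) = \int p(z \mid \pi)\, p(\pi \mid \alpha)\, d\pi = \frac{\Gamma\!\big(\sum_k \alpha_k\big)}{\Gamma\!\big(N + \sum_k \alpha_k\big)} \prod_k \frac{\Gamma(n_k + \alpha_k)}{\Gamma(\alpha_k)}$$

which gives the conditional used by the collapsed sampler:

$$p(z_i = k \mid z_{-i}, \alpha) = \frac{n_k^{-i} + \alpha_k}{N - 1 + \sum_{k'} \alpha_{k'}}$$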
Collapsed conditional probabilities
• $p(z_i = k \mid z_{-i}, x) \;\propto\; \dfrac{n_k^{-i} + \alpha_k}{N - 1 + \sum_{k'} \alpha_{k'}}\; p\big(x_i \mid \{x_j : z_j = k,\ j \neq i\}\big)$
• A Dirichlet-multinomial count term times the component's posterior predictive density for $x_i$
Collapsed Gibbs sampler for mixture model
• Repeatedly sweep over the data points; for each $i$, remove $x_i$ from its cluster's counts and sufficient statistics, sample $z_i$ from the collapsed conditional above, and add $x_i$ back under the new assignment
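A sketch of the whole sampler for a simple case: univariate Gaussian components with known variance, a conjugate Normal prior on each component mean, and a symmetric Dirichlet prior on the weights (x is an N-by-1 data vector, K the number of clusters, T the number of sweeps; the hyperparameter values are illustrative, not from the slides):

function z = collapsed_gibbs_gmm(x, K, T)
% Collapsed Gibbs sampler for a K-component mixture of univariate Gaussians.
    N      = numel(x);
    alpha  = 1;              % symmetric Dirichlet concentration
    mu0    = 0; tau2 = 10;   % prior on component means: N(mu0, tau2)
    sigma2 = 1;              % known component variance

    z   = randi(K, N, 1);               % random initial assignments
    cnt = accumarray(z, 1, [K 1]);      % cluster counts
    sx  = accumarray(z, x, [K 1]);      % per-cluster sums of x

    for t = 1:T
        for i = 1:N
            % Remove x(i) from its current cluster's statistics
            k = z(i); cnt(k) = cnt(k) - 1; sx(k) = sx(k) - x(i);

            % Posterior predictive density of each cluster for x(i)
            lam  = 1/tau2 + cnt/sigma2;               % posterior precisions of the means
            m    = (mu0/tau2 + sx/sigma2) ./ lam;     % posterior means
            v    = 1./lam + sigma2;                   % predictive variances
            logp = log(cnt + alpha) ...               % collapsed Dirichlet-multinomial term
                   - 0.5*log(2*pi*v) - (x(i) - m).^2 ./ (2*v);

            % Sample a new assignment from the normalized probabilities
            p = exp(logp - max(logp)); p = p / sum(p);
            k = find(rand < cumsum(p), 1);

            % Add x(i) back under the new assignment
            z(i) = k; cnt(k) = cnt(k) + 1; sx(k) = sx(k) + x(i);
        end
    end
end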
Performance of collapsed sampler
Mixing advantages of
collapsed sampler
• Stochasticity: updating z updates the counts
immediately, so the information is propagated
sooner
Think-pair-share: Triage