20-gaussian-mixture-model
Some of the slides are based on slides from Jiawei Han, Chao Zhang, Mahdi Roozbahani, and Barnabás Póczos.
Outline
• Overview
• Gaussian Mixture Model
• The Expectation-Maximization Algorithm
Recap
Conditional probabilities:
𝑝(𝐴, 𝐵) = 𝑝(𝐴|𝐵) 𝑝(𝐵) = 𝑝(𝐵|𝐴) 𝑝(𝐴)
Bayes rule:
𝑝(𝐴|𝐵) = 𝑝(𝐴, 𝐵) / 𝑝(𝐵) = 𝑝(𝐵|𝐴) 𝑝(𝐴) / 𝑝(𝐵)
Marginalization:
𝑝(𝐴 = 1) = Σ_{𝑖=1}^{𝐾} 𝑝(𝐴 = 1, 𝐵𝑖) = Σ_{𝑖=1}^{𝐾} 𝑝(𝐴 = 1|𝐵𝑖) 𝑝(𝐵𝑖)
                  Tomorrow=Rainy      Tomorrow=Cold       P(Today)
Today=Rainy       4/9                 2/9                 4/9 + 2/9 = 2/3
Today=Cold        2/9                 1/9                 2/9 + 1/9 = 1/3
P(Tomorrow)       4/9 + 2/9 = 2/3     2/9 + 1/9 = 1/3
P(Tomorrow = Rainy) = 𝑝(Tomorrow = Rainy, Today = Rainy) + 𝑝(Tomorrow = Rainy, Today = Cold) = 4/9 + 2/9 = 2/3
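A quick NumPy check of the table and the marginalization above (illustrative, not part of the original slides):

```python
import numpy as np

# Joint distribution p(Today, Tomorrow) from the table above.
# Rows: Today in {Rainy, Cold}; columns: Tomorrow in {Rainy, Cold}.
joint = np.array([[4/9, 2/9],
                  [2/9, 1/9]])

p_today = joint.sum(axis=1)     # marginalize over Tomorrow -> [2/3, 1/3]
p_tomorrow = joint.sum(axis=0)  # marginalize over Today    -> [2/3, 1/3]

# Conditioning (Bayes rule): p(Tomorrow | Today = Rainy) = p(Today=Rainy, Tomorrow) / p(Today=Rainy)
p_tomorrow_given_rainy = joint[0] / p_today[0]
print(p_tomorrow, p_tomorrow_given_rainy)
```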
Hard Clustering Can Be Difficult
• Overview
• Gaussian Mixture Model
• The Expectation-Maximization Algorithm
Gaussian Distribution
1-d Gaussian:
𝑁(𝑥|𝜇, 𝜎²) = (1 / √(2𝜋𝜎²)) exp(−(𝑥 − 𝜇)² / (2𝜎²))
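A minimal NumPy version of this density (illustrative, not from the slides):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma2):
    """1-d Gaussian density N(x | mu, sigma^2)."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
```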
Mixture Models
• Formally, a mixture model is a weighted sum of a number of pdfs, where the weights are determined by a distribution 𝜋.
What is f in GMM?
[Figure: three component densities 𝑓0(𝑥), 𝑓1(𝑥), 𝑓2(𝑥), weighted by 𝜋0, 𝜋1, 𝜋2, combine into a single mixture density over 𝑥.]
In a GMM, each component density 𝑓𝑘(𝑥) is a Gaussian, 𝑁(𝑥|𝜇𝑘, 𝜎𝑘).
Why is 𝑝(𝑥) a pdf?
Each component 𝑓𝑘 is a pdf and the weights 𝜋𝑘 are non-negative and sum to 1, so 𝑝(𝑥) is non-negative and integrates to 1.
Why GMM?
It creates a new pdf from which we can generate random samples: it is a generative model.
𝑝(𝑥) = 𝜋0 𝑁(𝑥|𝜇0, 𝜎0) + 𝜋1 𝑁(𝑥|𝜇1, 𝜎1) + ⋯ + 𝜋𝐾 𝑁(𝑥|𝜇𝐾, 𝜎𝐾)

𝑝(𝑥) = Σ_{𝑘} 𝑁(𝑥|𝜇𝑘, 𝜎𝑘) 𝜋𝑘

𝑝(𝑥) = Σ_{𝑘} 𝑝(𝑥|𝑧𝑘) 𝑝(𝑧𝑘)        (𝑧𝑘 is component 𝑘)

𝑝(𝑥) = Σ_{𝑘} 𝑝(𝑥, 𝑧𝑘)
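A small sketch of this generative view for 1-d data (the three components and their parameters below are my own illustrative choices, not values from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumed for this example).
pis    = np.array([0.5, 0.3, 0.2])   # mixing weights, sum to 1
mus    = np.array([-2.0, 0.0, 3.0])
sigmas = np.array([0.5, 1.0, 0.8])

# Generative process: pick a component z ~ Categorical(pi), then x ~ N(mu_z, sigma_z^2).
def sample_gmm(n):
    z = rng.choice(len(pis), size=n, p=pis)
    return rng.normal(mus[z], sigmas[z]), z

# Mixture density p(x) = sum_k pi_k N(x | mu_k, sigma_k^2)
def gmm_pdf(x):
    x = np.atleast_1d(x)[:, None]
    comp = np.exp(-(x - mus) ** 2 / (2 * sigmas ** 2)) / np.sqrt(2 * np.pi * sigmas ** 2)
    return comp @ pis

x, z = sample_gmm(5)
print(x, gmm_pdf(x))
```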
GMM with graphical model concept
[Graphical model: mixing weights 𝜋 → latent assignment 𝑍𝑛 → observation 𝑋𝑛 ← (𝜇𝑘, Σ𝑘), with a plate over the 𝑁 data points.]
𝑍𝑘 is the latent variable, with a 1-of-K representation:
𝑝(𝑧𝑛𝑘|𝜋𝑘) = ∏_{𝑘=1}^{𝐾} 𝜋𝑘^{𝑧𝑛𝑘}
Given 𝑧𝑛𝑘, 𝜋, 𝜇, and Σ, the probability of 𝑥 under component 𝑘 is:
𝑝(𝑥|𝑧𝑛𝑘, 𝜋, 𝜇, Σ) = ∏_{𝑘=1}^{𝐾} 𝑁(𝑥|𝜇𝑘, Σ𝑘)^{𝑧𝑛𝑘}
What is soft assignment?
𝑝(𝑥) = 𝜋0 𝑁(𝑥|𝜇0, 𝜎0) + 𝜋1 𝑁(𝑥|𝜇1, 𝜎1) + 𝜋2 𝑁(𝑥|𝜇2, 𝜎2)
Let’s calculate the responsibility of the first component (relative to the others) for one point 𝑥:

𝜏0 = 𝜋0 𝑁(𝑥|𝜇0, 𝜎0) / [𝜋0 𝑁(𝑥|𝜇0, 𝜎0) + 𝜋1 𝑁(𝑥|𝜇1, 𝜎1) + 𝜋2 𝑁(𝑥|𝜇2, 𝜎2)]

𝜏0 = 𝑝(𝑥|𝑧0) 𝑝(𝑧0) / [𝑝(𝑥|𝑧0) 𝑝(𝑧0) + 𝑝(𝑥|𝑧1) 𝑝(𝑧1) + 𝑝(𝑥|𝑧2) 𝑝(𝑧2)]

𝜏0 = 𝑝(𝑥, 𝑧0) / Σ_{𝑘=0}^{2} 𝑝(𝑥, 𝑧𝑘) = 𝑝(𝑥, 𝑧0) / 𝑝(𝑥) = 𝑝(𝑧0|𝑥)

The joint 𝑝(𝑥, 𝑧𝑘) is the probability of generating the datapoint 𝑥 from component 𝑘 AND of picking that specific mixture component: 𝑝(𝑥, 𝑧𝑘) = 𝑝(𝑥|𝑧𝑘) 𝑝(𝑧𝑘) = 𝑁(𝑥|𝜇𝑘, 𝜎𝑘) 𝜋𝑘.
𝑧 is latent: we observe 𝑥, but 𝑧 is hidden.
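A small NumPy sketch of this soft assignment for 1-d components (illustrative; the parameter arrays `pis`, `mus`, `sigmas` are assumed inputs):

```python
import numpy as np

def responsibilities(x, pis, mus, sigmas):
    """Soft assignment: tau[n, k] = p(z_k | x_n) for 1-d Gaussian components."""
    x = np.atleast_1d(x)[:, None]
    # joint p(x, z_k) = pi_k * N(x | mu_k, sigma_k^2)
    joint = pis * np.exp(-(x - mus) ** 2 / (2 * sigmas ** 2)) / np.sqrt(2 * np.pi * sigmas ** 2)
    return joint / joint.sum(axis=1, keepdims=True)  # divide by p(x)
```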
What are GMM parameters?
Mean 𝜇𝑘, variance 𝜎𝑘, and mixing weight (size) 𝜋𝑘 of each component 𝑓𝑘(𝑥).
How about a GMM for a multimodal distribution?
Gaussian Mixture Model
Why have latent variables?
• A variable can be unobserved (latent) because:
it is an imaginary quantity meant to provide a simplified and abstract view of the data-generation process
- e.g., speech recognition models, mixture models (soft clustering), …
it is a real-world object and/or phenomenon, but difficult or impossible to measure
- e.g., the temperature of a star, causes of a disease, evolutionary ancestors, …
it is a real-world object and/or phenomenon, but sometimes wasn’t measured, because of faulty sensors, etc.
Maximum Likelihood of a GMM
𝑝(𝑧𝑛𝑘|𝜃) = ∏_{𝑘=1}^{𝐾} 𝜋𝑘^{𝑧𝑛𝑘}          𝑝(𝑥|𝑧𝑛𝑘, 𝜃) = ∏_{𝑘=1}^{𝐾} 𝑁(𝑥|𝜇𝑘, Σ𝑘)^{𝑧𝑛𝑘}

Log-likelihood of the observed data:
ln 𝑝(𝑋|𝜋, 𝜇, Σ) = Σ_{𝑛=1}^{𝑁} ln[ Σ_{𝑘=1}^{𝐾} 𝜋𝑘 𝑁(𝑥𝑛|𝜇𝑘, Σ𝑘) ]

Writing 𝛾(𝑧𝑛𝑘) = 𝜋𝑘 𝑁(𝑥𝑛|𝜇𝑘, Σ𝑘) / Σ_{𝑗=1}^{𝐾} 𝜋𝑗 𝑁(𝑥𝑛|𝜇𝑗, Σ𝑗) and 𝑁𝑘 = Σ_{𝑛=1}^{𝑁} 𝛾(𝑧𝑛𝑘), setting the derivatives of the log-likelihood to zero gives:
• Optimization of means:  𝜇𝑘 = (1/𝑁𝑘) Σ_{𝑛=1}^{𝑁} 𝛾(𝑧𝑛𝑘) 𝑥𝑛
• Optimization of covariances:  Σ𝑘 = (1/𝑁𝑘) Σ_{𝑛=1}^{𝑁} 𝛾(𝑧𝑛𝑘)(𝑥𝑛 − 𝜇𝑘)(𝑥𝑛 − 𝜇𝑘)ᵀ
• Optimization of mixing coefficients (subject to Σ_{𝑘} 𝜋𝑘 = 1):  𝜋𝑘 = 𝑁𝑘 / 𝑁
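A NumPy sketch of these closed-form updates (my own illustration; `X` is an assumed (N, d) data array and `gamma` holds the responsibilities 𝛾(𝑧𝑛𝑘)):

```python
import numpy as np

def m_step(X, gamma):
    """Update GMM parameters from data X (N, d) and responsibilities gamma (N, K)."""
    Nk = gamma.sum(axis=0)                # effective number of points per component
    pis = Nk / X.shape[0]                 # pi_k = N_k / N
    mus = (gamma.T @ X) / Nk[:, None]     # mu_k = (1/N_k) sum_n gamma_nk x_n
    covs = []
    for k in range(gamma.shape[1]):
        diff = X - mus[k]
        # Sigma_k = (1/N_k) sum_n gamma_nk (x_n - mu_k)(x_n - mu_k)^T
        covs.append((gamma[:, k, None] * diff).T @ diff / Nk[k])
    return pis, mus, np.stack(covs)
```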
• Overview
• Gaussian Mixture Model
• The Expectation-Maximization Algorithm
EM for GMMs
1. Initialize the means 𝜇𝑘, covariances Σ𝑘, and mixing coefficients 𝜋𝑘.
2. E step: evaluate the responsibilities with the current parameters:
𝛾(𝑧𝑛𝑘) = 𝜋𝑘 𝑁(𝑥𝑛|𝜇𝑘, Σ𝑘) / Σ_{𝑗=1}^{𝐾} 𝜋𝑗 𝑁(𝑥𝑛|𝜇𝑗, Σ𝑗)
3. M step: re-estimate the parameters using the current responsibilities:
𝜇𝑘 = (1/𝑁𝑘) Σ_{𝑛} 𝛾(𝑧𝑛𝑘) 𝑥𝑛,   Σ𝑘 = (1/𝑁𝑘) Σ_{𝑛} 𝛾(𝑧𝑛𝑘)(𝑥𝑛 − 𝜇𝑘)(𝑥𝑛 − 𝜇𝑘)ᵀ,   𝜋𝑘 = 𝑁𝑘/𝑁,   with 𝑁𝑘 = Σ_{𝑛} 𝛾(𝑧𝑛𝑘)
4. Evaluate the log-likelihood ln 𝑝(𝑋|𝜋, 𝜇, Σ) and check for convergence of the parameters or the log-likelihood; if not converged, return to step 2.
Book: C.M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
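A minimal NumPy sketch of steps 1–4 for 1-d data (my own illustration under simplifying assumptions, not code from the book or slides):

```python
import numpy as np

def em_gmm_1d(x, K, n_iter=100, seed=0):
    """Minimal EM for a 1-d GMM; returns parameters and the log-likelihood trace."""
    rng = np.random.default_rng(seed)
    pis = np.full(K, 1.0 / K)
    mus = rng.choice(x, size=K, replace=False)   # initialize means at random data points
    var = np.full(K, x.var())
    log_lik = []
    for _ in range(n_iter):
        # E step: responsibilities gamma[n, k] = p(z_k | x_n)
        joint = pis * np.exp(-(x[:, None] - mus) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        gamma = joint / joint.sum(axis=1, keepdims=True)
        # Track ln p(X | theta); EM never decreases this quantity.
        log_lik.append(np.log(joint.sum(axis=1)).sum())
        # M step: closed-form updates weighted by the responsibilities
        Nk = gamma.sum(axis=0)
        mus = gamma.T @ x / Nk
        var = (gamma * (x[:, None] - mus) ** 2).sum(axis=0) / Nk
        pis = Nk / x.size
    return pis, mus, var, log_lik
```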
Relationship to K-means
• K-means can be viewed as a limiting case of EM for a GMM: with shared spherical covariances 𝜖𝐼 and 𝜖 → 0, the responsibilities 𝛾(𝑧𝑛𝑘) collapse to hard 0/1 assignments and the EM updates reduce to the K-means updates.
The EM algorithm in general:
• Want to maximize: ℒ(𝑞, 𝜃) = Σ_{𝑧} 𝑞(𝑧) ln 𝑝(𝑥, 𝑧|𝜃) − Σ_{𝑧} 𝑞(𝑧) ln 𝑞(𝑧)
1. Initialize parameters: 𝜃old
2. E step: evaluate: 𝑞(𝑧) = 𝑝(𝑧|𝑥, 𝜃old)
3. M step: maximize ℒ(𝑞, 𝜃) with respect to 𝜃
Maximizing this objective: the first term is the expected complete log likelihood and the second term, which does not depend on 𝜃, is the entropy, so the M step amounts to maximizing the expected complete log likelihood.
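In practice the full EM loop is available off the shelf; a quick scikit-learn illustration (not from the slides; GaussianMixture initializes its means with k-means by default, echoing the relationship above):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy data drawn from two clusters (illustrative).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, size=(100, 2)),
               rng.normal( 3, 1.0, size=(100, 2))])

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(X)                      # runs EM until convergence

print(gmm.weights_)             # pi_k
print(gmm.means_)               # mu_k
print(gmm.predict_proba(X[:3])) # responsibilities gamma(z_nk) (soft assignment)
```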
Silhouette Coefficient
[Figure: for a point 𝑋𝑖, 𝜇𝑖𝑛(𝑋𝑖) is the average distance to points in its own cluster, and 𝜇𝑜𝑢𝑡^𝑚𝑖𝑛(𝑋𝑖) = min{𝜇𝑜𝑢𝑡1(𝑋𝑖), 𝜇𝑜𝑢𝑡2(𝑋𝑖)} is the average distance to the nearest other cluster.]