Emily Denton - Unsupervised Learning of Disentangled Representations from Video - Luba Elliott
This talk by Emily Denton from New York University on "Unsupervised Learning of Disentangled Representations from Video" was presented at the Learning Image Representations event on 30th August at Twitter as part of the Creative AI meetup.
Paper Summary of Disentangling by Factorising (Factor-VAE) - 준식 최
The paper proposes Factor-VAE, which aims to learn disentangled representations in an unsupervised manner. Factor-VAE enhances disentanglement over the β-VAE by encouraging the latent distribution to be factorial (independent across dimensions) using a total correlation penalty. This penalty is optimized using a discriminator network. Experiments on various datasets show that Factor-VAE achieves better disentanglement than β-VAE, as measured by a proposed disentanglement metric, while maintaining good reconstruction quality. Latent traversals qualitatively demonstrate disentangled factors of variation.
Disentangled Representation Learning of Deep Generative Models - Ryohei Suzuki
This document discusses disentangled representation learning in deep generative models. It explains that generative models can generate realistic images but it is difficult to control specific attributes of the generated images. Recent research aims to learn disentangled representations where each latent variable corresponds to an independent perceptual factor, such as object pose or color. Methods described include InfoGAN, β-VAE, spatial conditional batch normalization, hierarchical latent variables, and StyleGAN's hierarchical modulation approach. Measuring entanglement through perceptual path length and linear separability is also discussed. The document suggests disentangled representation learning could help applications in biology and medicine by providing better explanatory variables for complex phenomena.
The document provides an introduction to variational autoencoders (VAE). It discusses how VAEs can be used to learn the underlying distribution of data by introducing a latent variable z that follows a prior distribution like a standard normal. The document outlines two approaches - explicitly modeling the data distribution p(x), or using the latent variable z. It suggests using z and assuming the conditional distribution p(x|z) is a Gaussian with mean determined by a neural network gθ(z). The goal is to maximize the likelihood of the dataset by optimizing the evidence lower bound objective.
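To make the objective in that summary concrete, the evidence lower bound (ELBO) that a VAE maximizes can be written as follows (this is the standard formulation, added here for reference; the summary itself does not spell it out):

```latex
\log p_\theta(x) \;\ge\; \mathcal{L}(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z \mid x)}\!\left[ \log p_\theta(x \mid z) \right]
  - D_{\mathrm{KL}}\!\left( q_\phi(z \mid x) \,\|\, p(z) \right)
```

with $p(z) = \mathcal{N}(0, I)$ and $p_\theta(x \mid z) = \mathcal{N}(g_\theta(z), \sigma^2 I)$ as in the summary; the first term rewards reconstruction while the second keeps the approximate posterior close to the prior.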
This document summarizes key concepts in diffusion models and their applications in generative AI systems. It discusses early diffusion models from Sohl-Dickstein and later improvements from DDPM. It also covers recent large diffusion models like GLIDE and DALL-E 2 that can generate images from text prompts. The document provides technical details on diffusion processes, loss functions, and model architectures.
Generative Adversarial Networks (GANs) are a type of deep learning model used for unsupervised machine learning tasks like image generation. GANs work by having two neural networks, a generator and discriminator, compete against each other. The generator creates synthetic images and the discriminator tries to distinguish real images from fake ones. This allows the generator to improve over time at creating more realistic images that can fool the discriminator. The document discusses the intuition behind GANs, provides a PyTorch implementation example, and describes variants like DCGAN, LSGAN, and semi-supervised GANs.
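The summary mentions a PyTorch implementation example; as a rough stand-in, here is our own minimal sketch of a single GAN training step (the tiny linear networks, noise dimension, and hyperparameters are placeholders, not the code from the document):

```python
import torch
import torch.nn as nn

# Placeholder networks: any generator mapping noise -> image and any
# binary discriminator would do for this sketch.
G = nn.Sequential(nn.Linear(100, 784), nn.Tanh())      # generator
D = nn.Sequential(nn.Linear(784, 1), nn.Sigmoid())     # discriminator
bce = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

def gan_step(real):                      # real: (batch, 784) tensor
    batch = real.size(0)
    fake = G(torch.randn(batch, 100))

    # 1) Discriminator: push D(real) -> 1 and D(fake) -> 0.
    opt_d.zero_grad()
    loss_d = bce(D(real), torch.ones(batch, 1)) + \
             bce(D(fake.detach()), torch.zeros(batch, 1))
    loss_d.backward()
    opt_d.step()

    # 2) Generator: push D(fake) -> 1, i.e. fool the discriminator.
    opt_g.zero_grad()
    loss_g = bce(D(fake), torch.ones(batch, 1))
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```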
GANs are the hottest new topic in the ML arena; however, they present a challenge for researchers and engineers alike. Their design and, most importantly, their code implementation have been causing headaches for ML practitioners, especially when moving to production.
Starting from the very basics of what a GAN is, passing through the TensorFlow implementation using the most cutting-edge APIs available in the framework, and finally arriving at production-ready serving at scale using Google Cloud ML Engine.
Slides for the talk: https://www.pycon.it/conference/talks/deep-diving-into-gans-form-theory-to-production
Github repo: https://github.com/zurutech/gans-from-theory-to-production
PR-231: A Simple Framework for Contrastive Learning of Visual Representations - Jinwon Lee
The document presents SimCLR, a framework for contrastive learning of visual representations using simple data augmentation. Key aspects of SimCLR include using random cropping and color distortions to generate positive sample pairs for the contrastive loss, a nonlinear projection head to learn representations, and large batch sizes. Evaluation shows SimCLR learns representations that outperform supervised pretraining on downstream tasks and achieves state-of-the-art results with only view augmentation and contrastive loss.
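The contrastive (NT-Xent) loss at the heart of SimCLR can be sketched roughly as below; this is a simplified illustration of the idea (two augmented views per image, cosine similarities, temperature τ), not the authors' reference implementation:

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """z1, z2: (N, d) projections of two augmented views of the same N images."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, d), unit norm
    sim = z @ z.t() / temperature                         # scaled cosine similarities
    sim.fill_diagonal_(float('-inf'))                     # exclude self-similarities
    n = z1.size(0)
    # For row i, its positive pair sits at index i+N (and vice versa).
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)
```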
Attention Is All You Need.
With these simple words, the Deep Learning industry was forever changed. Transformers were initially introduced in the field of Natural Language Processing to enhance language translation, but they demonstrated astonishing results even outside language processing. In particular, they recently spread in the Computer Vision community, advancing the state-of-the-art on many vision tasks. But what are Transformers? What is the mechanism of self-attention, and do we really need it? How did they revolutionize Computer Vision? Will they ever replace convolutional neural networks?
These and many other questions will be answered during the talk.
In this tech talk, we will discuss:
- A piece of history: Why did we need a new architecture?
- What is self-attention, and where does this concept come from?
- The Transformer architecture and its mechanisms
- Vision Transformers: An Image is worth 16x16 words
- Video Understanding using Transformers: the space + time approach
- The scale and data problem: Is Attention what we really need?
- The future of Computer Vision through Transformers
Speaker: Davide Coccomini, Nicola Messina
Website: https://www.aicamp.ai/event/eventdetails/W2021101110
[DL Reading Group] Deep Transformers without Shortcuts: Modifying Self-attention for Faithful Signal Propagation - Deep Learning JP
The document proposes modifications to self-attention in Transformers to improve faithful signal propagation without shortcuts like skip connections or layer normalization. Specifically, it introduces a normalization-free network that uses dynamic isometry to ensure unitary transformations, a ReZero technique to implement skip connections without adding shortcuts, and modifications to attention and normalization techniques to address issues like rank collapse in Transformers. The methods are evaluated on tasks like CIFAR-10 classification and language modeling, demonstrating improved performance over standard Transformer architectures.
PR-409: Denoising Diffusion Probabilistic Models - Hyeongmin Lee
This paper is Denoising Diffusion Probabilistic Models (DDPM), the work that first popularized the currently hot diffusion models. It elegantly resolves several practical issues of diffusion, which was originally proposed at ICML 2015, and marked the start of the current trend. We look at the various branches of generative models, diffusion itself, and what changed in DDPM.
Paper link: https://arxiv.org/abs/2006.11239
Video link: https://youtu.be/1j0W_lu55nc
(DL Hacks reading group) How to Train Deep Variational Autoencoders and Probabilistic Ladder Networks - Masahiro Suzuki
This document discusses techniques for training deep variational autoencoders and probabilistic ladder networks. It proposes three advances: 1) Using an inference model similar to ladder networks with multiple stochastic layers, 2) Adding a warm-up period to keep units active early in training, and 3) Using batch normalization. These advances allow training models with up to five stochastic layers and achieve state-of-the-art log-likelihood results on benchmark datasets. The document explains variational autoencoders, probabilistic ladder networks, and how the proposed techniques parameterize the generative and inference models.
The document discusses FactorVAE, a method for disentangling latent representations in variational autoencoders (VAEs). It introduces Total Correlation (TC) as a penalty term that encourages independence between latent variables. TC is added to the standard VAE objective function to guide the model to learn disentangled representations. The document provides details on how TC is defined and computed based on the density-ratio trick from generative adversarial networks. It also discusses how FactorVAE uses TC to learn disentangled representations and can be evaluated using a disentanglement metric.
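The density-ratio trick mentioned above can be written compactly (standard formulation, notation ours): with a discriminator D(z) trained to distinguish samples from q(z) and from the product of its marginals (obtained by permuting each latent dimension across the batch),

```latex
\mathrm{TC}(\mathbf{z}) = D_{\mathrm{KL}}\Big(q(\mathbf{z}) \,\Big\|\, \prod_j q(z_j)\Big)
  \;\approx\; \mathbb{E}_{q(\mathbf{z})}\!\left[\log \frac{D(\mathbf{z})}{1 - D(\mathbf{z})}\right].
```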
007 20151214 Deep Unsupervised Learning using Nonequilibrium Thermodynamics - Ha Phuong
The document discusses a new approach to unsupervised deep learning using concepts from nonequilibrium thermodynamics. Specifically, it proposes destroying structure in data through an iterative forward diffusion process, then learning the reverse diffusion process to restore structure and act as a generative model. This approach is shown to outperform other generative models on image datasets like CIFAR-10 and is able to perform tasks like inpainting. The diffusion process is modeled using Gaussian distributions and the reverse process is learned using a deep network as an approximator.
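Concretely, the Gaussian forward and learned reverse processes described above are usually written as follows (standard DDPM-style notation, added here for reference):

```latex
q(x_t \mid x_{t-1}) = \mathcal{N}\!\big(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t \mathbf{I}\big),
\qquad
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\big(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\big),
```

where the forward process gradually destroys structure over many steps and the learned reverse process restores it, acting as the generative model.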
This document summarizes a paper on Style GAN, which proposes a style-based GAN that can control image generation at multiple levels of style. It introduces new evaluation methods and collects a larger, more varied dataset (FFHQ). The paper aims to disentangle style embeddings to allow unsupervised separation of high-level attributes and introduce stochastic variation in generated images through control of the network architecture.
Introduction to Generative Adversarial Networks (GANs) by Michał Maj
Full story: https://appsilon.com/satellite-imagery-generation-with-gans/
DoWhy: An end-to-end library for causal inference - Amit Sharma
In addition to efficient statistical estimators of a treatment's effect, successful application of causal inference requires specifying assumptions about the mechanisms underlying the observed data and testing whether they are valid, and to what extent. However, most libraries for causal inference focus only on providing powerful statistical estimators. We describe DoWhy, an open-source Python library that treats causal assumptions as first-class citizens, based on the formal framework of causal graphs to specify and test causal assumptions. DoWhy presents an API for the four steps common to any causal analysis: 1) modeling the data using a causal graph and structural assumptions, 2) identifying whether the desired effect is estimable under the causal model, 3) estimating the effect using statistical estimators, and finally 4) refuting the obtained estimate through robustness checks and sensitivity analyses. In particular, DoWhy implements a number of robustness checks, including placebo tests, bootstrap tests, and tests for unobserved confounding. DoWhy is an extensible library that supports interoperability with other implementations, such as EconML and CausalML, for the estimation step.
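The four steps map onto DoWhy's API roughly as in the sketch below; this is a schematic illustration (the column names, the graph string, and the particular estimator/refuter method names are illustrative choices, so check the DoWhy documentation for exact options):

```python
import pandas as pd
from dowhy import CausalModel

df = pd.read_csv("observational_data.csv")   # hypothetical dataset

# 1) Model: encode assumptions as a causal graph.
model = CausalModel(
    data=df,
    treatment="treatment",
    outcome="outcome",
    graph="digraph { confounder -> treatment; confounder -> outcome; treatment -> outcome; }",
)

# 2) Identify: is the effect estimable under the stated assumptions?
estimand = model.identify_effect()

# 3) Estimate: apply a statistical estimator to the identified estimand.
estimate = model.estimate_effect(
    estimand, method_name="backdoor.propensity_score_matching")

# 4) Refute: stress-test the estimate with robustness checks.
refutation = model.refute_estimate(
    estimand, estimate, method_name="placebo_treatment_refuter")
print(estimate.value, refutation)
```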
Slides for "Do Deep Generative Models Know What They Don't know?"Julius Hietala
My slides that discuss different deep generative models, mainly normalizing flows for density estimation at a deep learning seminar at Aalto University fall 2019.
Machine learning workshop, session 3.
- Data sets
- Machine Learning Algorithms
- Algorithms by Learning Style
- Algorithms by Similarity
- People to follow
This document summarizes the NGBoost method for probabilistic regression. NGBoost uses gradient boosting to fit the parameters of an assumed probabilistic distribution for the target variable. It improves on existing probabilistic regression methods by using the natural gradient, which performs gradient descent in the space of distributions rather than the parameter space. This addresses issues with prior approaches and allows NGBoost to achieve state-of-the-art performance while remaining fast, flexible, and scalable. Future work may apply NGBoost to other problems like survival analysis or joint outcome regression.
Joint contrastive learning with infinite possibilities - taeseon ryu
Contrastive learning is a machine learning technique that learns features without any labels by judging whether two images are similar or dissimilar. It differs somewhat from conventional supervised learning: supervised learning incurs labeling cost and, being task-specific, its generalizability can suffer.
Contrastive learning, on the other hand, proceeds without labels, so there is no labeling cost and generalizability can be better. This paper proposes Joint Contrastive Learning for more useful contrastive learning. https://youtu.be/0NLq-ikBP1I
The document presents an unsupervised and self-supervised method for sentence summarization called BottleSum that is based on the Information Bottleneck principle. BottleSum includes an extractive model called BottleSum Ex and an abstractive model called BottleSum Self. BottleSum Ex achieves state-of-the-art results on automatic metrics for unsupervised models and BottleSum Self performs better than BottleSum Ex in human evaluations, demonstrating the effectiveness of the Information Bottleneck approach for summarization.
There are a few potential issues with modeling the data this way:
1. Students are nested within classrooms. A student's outcomes may be more similar to others in their classroom compared to students in other classrooms, due to shared classroom factors. This violates the independence assumption of ordinary least squares regression.
2. Classroom-level factors like teacher quality are not included in the model but likely influence student outcomes. Failing to account for these could lead to omitted variable bias.
3. The error terms for students within the same classroom may not be independent as assumed, since classroom factors induce correlation.
To properly account for the nested data structure, we need to model the classroom as a second level in a multilevel model.
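A minimal way to fit such a two-level (random-intercept) model in Python is sketched below, assuming a hypothetical data frame with `score`, `ses`, and `classroom` columns; the file and variable names are illustrative only:

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("students.csv")   # hypothetical: one row per student

# Random intercept per classroom: students are nested within classrooms,
# so their errors share a classroom-level component.
model = smf.mixedlm("score ~ ses", data=df, groups=df["classroom"])
result = model.fit()
print(result.summary())
```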
Top 50+ Data Science Interview Questions and Answers for 2025 (1).pdf - khushnuma khan
Preparing for a Data Science interview requires a solid grasp of fundamental concepts, algorithms, and techniques. The questions and answers outlined here cover a broad spectrum of topics, from machine learning algorithms to statistical methods, model evaluation, and real-world applications like recommendation systems and time series analysis.
1) Machine learning is a field of artificial intelligence that allows computers to learn without being explicitly programmed by finding patterns in data.
2) There are three main types of machine learning problems: supervised learning which uses labeled training data, unsupervised learning which finds hidden patterns in unlabeled data, and reinforcement learning where a system learns from feedback of rewards and punishments.
3) Key machine learning concepts include linear regression, which finds a linear relationship between variables, and gradient descent, an algorithm for minimizing cost functions to optimize model parameters like slope and intercept of a linear regression line.
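As a small illustration of the last point, here is a bare-bones gradient descent loop that fits the slope and intercept of a linear regression by minimizing mean squared error (a generic toy example, not taken from the document):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 2.0 + rng.normal(0, 1, size=100)   # true slope 3, intercept 2

w, b, lr = 0.0, 0.0, 0.01
for _ in range(2000):
    err = w * x + b - y
    # Gradients of the mean-squared-error cost w.r.t. slope and intercept.
    grad_w = 2 * np.mean(err * x)
    grad_b = 2 * np.mean(err)
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))   # should approach roughly 3 and 2
```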
Building useful models for imbalanced datasets (without resampling) - Greg Landrum
1) Building machine learning models on imbalanced datasets, where there are many more inactive compounds than active ones, can lead to models with high accuracy but low ability to predict actives.
2) Shifting the decision threshold from 0.5 to a lower value, such as 0.2, for classifiers like random forests can significantly improve the models' ability to predict actives, as measured by Cohen's kappa, without retraining the models.
3) Across a variety of bioactivity prediction datasets, this threshold-shifting approach generally performed better than alternative methods like balanced random forests at improving predictions of active compounds.
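The threshold-shifting idea in point 2) can be sketched with scikit-learn as follows; this is a generic illustration on made-up data with a 0.2 cutoff, not the study's bioactivity datasets, though it uses Cohen's kappa as the study did:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import train_test_split

# Imbalanced toy data: roughly 5% "active" compounds.
X, y = make_classification(n_samples=5000, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]          # predicted probability of "active"

for threshold in (0.5, 0.2):                   # default vs. shifted decision threshold
    pred = (proba >= threshold).astype(int)    # no retraining needed
    print(threshold, cohen_kappa_score(y_te, pred))
```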
The document discusses test equating, which is the process of establishing comparable scores on different forms of a test. It covers topics such as why scaled scores are reported instead of raw scores, considerations in choosing a score scale, limitations of equating, different equating methods like linear and equipercentile equating, and different equating designs like single-group and anchor designs. It provides explanations of key concepts in test equating and guidelines for effective equating.
May 2015 talk to SW Data Meetup by Professor Hendrik Blockeel from KU Leuven & Leiden University.
With increasing amounts of ever more complex forms of digital data becoming available, the methods for analyzing these data have also become more diverse and sophisticated. With this comes an increased risk of incorrect use of these methods, and a greater burden on the user to be knowledgeable about their assumptions. In addition, the user needs to know about a wide variety of methods to be able to apply the most suitable one to a particular problem. This combination of broad and deep knowledge is not sustainable.
The idea behind declarative data analysis is that the burden of choosing the right statistical methodology for answering a research question should no longer lie with the user, but with the system. The user should be able to simply describe the problem, formulate a question, and let the system take it from there. To achieve this, we need to find answers to questions such as: what languages are suitable for formulating these questions, and what execution mechanisms can we develop for them? In this talk, I will discuss recent and ongoing research in this direction. The talk will touch upon query languages for data mining and for statistical inference, declarative modeling for data mining, meta-learning, and constraint-based data mining. What connects these research threads is that they all strive to put intelligence about data analysis into the system, instead of assuming it resides in the user.
Hendrik Blockeel is a professor of computer science at KU Leuven, Belgium, and part-time associate professor at Leiden University, The Netherlands. His research interests lie mostly in machine learning and data mining. He has made a variety of research contributions in these fields, including work on decision tree learning, inductive logic programming, predictive clustering, probabilistic-logical models, inductive databases, constraint-based data mining, and declarative data analysis. He is an action editor for Machine Learning and serves on the editorial board of several other journals. He has chaired or organized multiple conferences, workshops, and summer schools, including ILP, ECMLPKDD, IDA and ACAI, and he has been vice-chair, area chair, or senior PC member for ECAI, IJCAI, ICML, KDD, ICDM. He was a member of the board of the European Coordinating Committee for Artificial Intelligence from 2004 to 2010, and currently serves as publications chair for the ECMLPKDD steering committee.
This document provides a summary of a 4-part training program on using PASW Statistics 17 (SPSS 17) software to perform descriptive statistics, tests of significance, regression analysis, and chi-square/ANOVA. The agenda covers topics like frequency analysis, correlations, t-tests, ANOVA, importing/exporting data, and more. The goal is to help users answer research questions and test hypotheses using techniques in PASW Statistics.
This document provides information about a business course, including that there will be class on Labor Day and lab materials are available on Canvas. It discusses fixed effects in models and how to include them in R scripts. Predicted values from models are covered, along with using residuals to detect outliers and interpreting interactions between variables in models. The document provides examples of adding interaction terms to model formulas and interpreting the results.
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ... - Vahid Taslimitehrani
Presented at 15th International Conference on BioInformatics and BioEngineering (BIBE2014)
Prognostic modeling is central to medicine, as it is often used to predict patients’ outcome and response to treatments and to identify important medical risk factors. Logistic regression is one of the most used approaches for clinical prediction modeling. Traumatic brain injury (TBI) is an important public health issue and a leading cause of death and disability worldwide. In this study, we adapt CPXR (Contrast Pattern Aided Regression, a recently introduced regression method) to develop a new logistic regression method called CPXR(Log), for general binary outcome prediction (including prognostic modeling), and we use the method to carry out prognostic modeling for TBI using admission time data. The models produced by CPXR(Log) achieved AUC as high as 0.93 and specificity as high as 0.97, much better than those reported by previous studies. Our method produced interpretable prediction models for diverse patient groups for TBI, which show that different kinds of patients should be evaluated differently for TBI outcome prediction and that the odds ratios of some predictor variables differ significantly from those given by previous studies; such results can be valuable to physicians.
Brief History of Visual Representation Learning - Sangwoo Mo
The document summarizes the history of visual representation learning in 3 eras: (1) 2012-2015 saw the evolution of deep learning architectures like AlexNet and ResNet; (2) 2016-2019 brought diverse learning paradigms for tasks like few-shot learning and self-supervised learning; (3) 2020-present focuses on scaling laws and foundation models through larger models, data and compute as well as self-supervised methods like MAE and multimodal models like CLIP. The field is now exploring how to scale up vision transformers to match natural language models and better combine self-supervision and generative models.
Learning Visual Representations from Uncurated Data - Sangwoo Mo
Slides from the defense of my Ph.D. dissertation: "Learning Visual Representations from Uncurated Data"
It includes four papers about
- Learning from multi-object images for contrastive learning [1] and Vision Transformer (ViT) [2]
- Learning with limited labels (semi-sup) for image classification [3] and vision-language [4] models
[1] Mo*, Kang* et al. Object-aware Contrastive Learning for Debiased Scene Representation. NeurIPS’21.
[2] Kang*, Mo* et al. OAMixer: Object-aware Mixing Layer for Vision Transformers. CVPRW’22.
[3] Mo et al. RoPAWS: Robust Semi-supervised Representation Learning from Uncurated Data. ICLR’23.
[4] Mo et al. S-CLIP: Semi-supervised Vision-Language Pre-training using Few Specialist Captions. Under Review.
This document proposes using hyperbolic space to embed hierarchical tree structures, like those that can represent sequences of events in reinforcement learning problems. Specifically, it suggests a method called S-RYM that applies spectral normalization to regularize gradients when training deep reinforcement learning agents with hyperbolic embeddings. This stabilization technique allows naive hyperbolic embeddings to outperform standard Euclidean embeddings. It works by reducing gradient norm explosions during training, allowing the entropy loss to converge properly. The document provides technical details on spectral normalization, hyperbolic space representations, and how S-RYM trains deep reinforcement learning agents with stabilized hyperbolic embeddings.
A Unified Framework for Computer Vision Tasks: (Conditional) Generative Model... - Sangwoo Mo
Lab seminar introducing three recent works by Ting Chen:
- Pix2seq: A Language Modeling Framework for Object Detection (ICLR’22)
- A Unified Sequence Interface for Vision Tasks (NeurIPS’22)
- A Generalist Framework for Panoptic Segmentation of Images and Videos (submitted to ICLR’23)
This document is a slide presentation on recent advances in deep learning. It discusses self-supervised learning, which involves using unlabeled data to learn representations by predicting structural information within the data. The presentation covers pretext tasks, invariance-based approaches, and generation-based approaches for self-supervised learning in computer vision and natural language processing. It provides examples of specific self-supervised methods like predicting image rotations, clustering representations to generate pseudo-labels, and masked language modeling.
Deep Learning Theory Seminar (Chap 3, part 2) - Sangwoo Mo
This document summarizes key points from a lecture on deep learning theory:
1) It discusses the Maurey sampling technique, which shows that a finite sample approximation X^ of a random variable X converges to X as the number of samples k goes to infinity.
2) It proposes extending this technique to sample finite-width neural networks by converting the weight distribution of an infinite network to a probability measure through normalization.
3) The approximation error between outputs of the infinite and finite networks is bounded using Maurey sampling, with the bound converging to zero as the number of samples increases.
Deep Learning Theory Seminar (Chap 1-2, part 1) - Sangwoo Mo
1. The document discusses the approximation capabilities of deep neural networks. It outlines topics that will be covered, including approximation, optimization, and generalization.
2. For approximation, it shows that a neural network can approximate any smooth function over a compact domain to any desired accuracy by bounding the function norm. Specifically, it presents constructive proofs that a univariate function can be approximated by a 2-layer network and a multivariate function by a 3-layer network.
3. The chapter will prove approximation capabilities of finite-width neural networks, including constructive proofs for specific activations and universal approximation for general activations. It will discuss approximating indicators with ReLU activations.
The document provides an introduction to diffusion models. It discusses that diffusion models have achieved state-of-the-art performance in image generation, density estimation, and image editing. Specifically, it covers the Denoising Diffusion Probabilistic Model (DDPM) which reparametrizes the reverse distributions of diffusion models to be more efficient. It also discusses the Denoising Diffusion Implicit Model (DDIM) which generates rough sketches of images and then refines them, significantly reducing the number of sampling steps needed compared to DDPM. In summary, diffusion models have emerged as a highly effective approach for generative modeling tasks.
1) The document discusses object-region video transformers (ORViT) for video recognition. ORViT applies attention at both the patch and object levels.
2) ORViT considers three aspects of objects: the objects themselves, interactions between objects, and object dynamics over time.
3) Experimental results show ORViT outperforms baseline models on action recognition, compositional action recognition, and spatio-temporal action detection tasks. ORViT better captures object-level information and dynamics compared to patch-level attention alone.
Deep Implicit Layers: Learning Structured Problems with Neural Networks - Sangwoo Mo
Deep implicit layers allow neural networks to solve structured problems by following algorithmic rules. They include layers for convex optimization, discrete optimization, differential equations, and more. The forward pass runs an algorithm, while the backward pass computes gradients using algorithmic properties like KKT conditions. This enables problems like structured prediction, meta-learning, and time series modeling to be solved reliably with neural networks by respecting their underlying structure.
Learning Theory 101 ...and Towards Learning the Flat Minima - Sangwoo Mo
The document discusses recent theories on why deep neural networks generalize well despite being highly overparameterized. Classic learning theory, which assumes restricting the hypothesis space is necessary for generalization, fails to explain modern neural networks. Recent studies suggest neural networks generalize because 1) their complexity is underestimated and 2) SGD regularization finds flat minima. Sharpness-aware minimization (SAM) directly optimizes for flat minima and consistently improves generalization, especially for vision transformers which have sharper loss landscapes than ResNets. SAM produces more interpretable attention maps and significantly boosts performance of vision transformers and MLP-Mixers on in-domain and out-of-domain tasks.
Lab seminar on
- Sharpness-Aware Minimization for Efficiently Improving Generalization (ICLR 2021)
- When Vision Transformers Outperform ResNets without Pretraining or Strong Data Augmentations (under review)
This document summarizes recent advances in deep generative models with explicit density estimation. It discusses variational autoencoders (VAEs), including techniques to improve VAEs such as importance weighting, semi-amortized inference, and mitigating posterior collapse. It also covers energy-based models, autoregressive models, flow-based models, vector-quantized VAEs, hierarchical VAEs, and diffusion probabilistic models. The document provides an overview of these generative models with a focus on density estimation and generation quality.
This document summarizes research on reducing the computational complexity of self-attention in Transformer models from O(L2) to O(L log L) or O(L). It describes the Reformer model which uses locality-sensitive hashing to achieve O(L log L) complexity, the Linformer model which uses low-rank approximations and random projections to achieve O(L) complexity, and the Synthesizer model which replaces self-attention with dense or random attention. It also briefly discusses the expressive power of sparse Transformer models.
This document summarizes two meta-learning papers:
1) "Meta-Learning with Implicit Gradients" which introduces Implicit Model-Agnostic Meta-Learning (iMAML), an efficient alternative to MAML that computes meta-gradients without differentiating through the inner loop.
2) "Modular Meta-Learning with Shrinkage" which proposes learning a separate set of parameters for each module with different levels of shrinkage, optimized in an alternating manner to avoid collapse.
Introduction (application) of generative models for general audiences. Many figures are borrowed from https://lilianweng.github.io.
Deep Learning for Natural Language Processing - Sangwoo Mo
This document summarizes a lecture on recent advances in deep learning for natural language processing. It discusses improvements to network architectures like attention mechanisms and self-attention, which help models learn long-term dependencies and attend to relevant parts of the input. It also discusses improved training methods to reduce exposure bias and the loss-evaluation mismatch. Newer models presented include the Transformer, which uses only self-attention, and BERT, which introduces a pretrained bidirectional transformer encoder that achieves state-of-the-art results on many NLP tasks.
This document discusses domain transfer and domain adaptation in deep learning. It begins with introductions to domain transfer, which learns a mapping between domains, and domain adaptation, which learns a mapping between domains with labels. It then covers several approaches for domain transfer, including neural style transfer, instance normalization, and GAN-based methods. It also discusses general approaches for domain adaptation such as source/target feature matching and target data augmentation.
Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations
1. Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations (ICML 2019 Best Paper)
2019.07.17.
Sangwoo Mo
2. Outline
• Quick Review
• What is disentangled representation (DR)?
• Prior work on the unsupervised learning of DR
• Theoretical Results
• Unsupervised learning of DR is impossible without inductive biases
• Empirical Results
• Q1. Which method should be used?
• Q2. How to choose the hyperparameters?
• Q3. How to select the best model from a set of trained models?
3. Quick Review
• Disentangled representation: Learn a representation 𝑧 from the data 𝑥 s.t.
• Contain all the information of 𝑥 in a compact and interpretable structure
• Currently no single formal definition (many definitions for the factor of variation)
* Image from BetaVAE (ICLR 2017)
4. Quick Review: Prior Methods
• BetaVAE (ICLR 2017)
• Use 𝛽 > 1 in the VAE objective (forces the posterior toward the factorized Gaussian prior)
5. Quick Review: Prior Methods
• BetaVAE (ICLR 2017)
• Use 𝛽 > 1 in the VAE objective (forces the posterior toward the factorized Gaussian prior)
• FactorVAE (ICML 2018) & 𝜷-TCVAE (NeurIPS 2018)
• Penalize the total correlation of the representation, which is estimated¹ by adversarial learning (FactorVAE) or a (biased) mini-batch approximation (𝛽-TCVAE)
¹ This requires the aggregated posterior 𝑞(𝒛)
6. Quick Review: Prior Methods
• BetaVAE (ICLR 2017)
• Use 𝛽 > 1 in the VAE objective (forces the posterior toward the factorized Gaussian prior)
• FactorVAE (ICML 2018) & 𝜷-TCVAE (NeurIPS 2018)
• Penalize the total correlation of the representation, which is estimated¹ by adversarial learning (FactorVAE) or a (biased) mini-batch approximation (𝛽-TCVAE)
• DIP-VAE (ICLR 2018)
• Match 𝑞(𝒛) to the disentangled prior 𝑝(𝒛), where 𝐷 is a (tractable) moment-matching divergence
¹ This requires the aggregated posterior 𝑞(𝒛)
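For reference, the three objectives just reviewed can be summarized in one place (standard formulations written in our notation; all modify the VAE evidence lower bound):

```latex
\mathcal{L}_{\beta\text{-VAE}} = \mathbb{E}_{q_\phi(\mathbf{z}\mid\mathbf{x})}[\log p_\theta(\mathbf{x}\mid\mathbf{z})]
  - \beta\, D_{\mathrm{KL}}\big(q_\phi(\mathbf{z}\mid\mathbf{x}) \,\|\, p(\mathbf{z})\big), \quad \beta > 1
\\[4pt]
\mathcal{L}_{\text{FactorVAE}} = \mathcal{L}_{\text{VAE}} - \gamma\, \mathrm{TC}(\mathbf{z}),
  \qquad \mathrm{TC}(\mathbf{z}) = D_{\mathrm{KL}}\Big(q(\mathbf{z}) \,\Big\|\, \textstyle\prod_j q(z_j)\Big)
\\[4pt]
\mathcal{L}_{\text{DIP-VAE}} = \mathcal{L}_{\text{VAE}} - \lambda\, D\big(q(\mathbf{z}),\, p(\mathbf{z})\big)
```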
7. Quick Review: Evaluation Metrics
• Many heuristics are proposed to quantitatively evaluate the disentanglement
• Basic idea: Factors and representation should have 1-1 correspondence
8. Quick Review: Evaluation Metrics
• Many heuristics are proposed to quantitatively evaluate the disentanglement
• Basic idea: Factors and representation should have 1-1 correspondence
• BetaVAE (ICLR 2017) & FactorVAE (ICML 2018) metric
• Given a factor $c_k$, generate two (simulated) data points $x, x'$ with the same $c_k$ but different $c_{-k}$, then train a classifier to predict $c_k$ from the difference of the representations $|z - z'|$
• Indeed, the classifier will map the zero-valued index of $|z - z'|$ to the factor $c_k$
9. Quick Review: Evaluation Metrics
• Many heuristics are proposed to quantitatively evaluate the disentanglement
• Basic idea: Factors and representation should have 1-1 correspondence
• BetaVAE (ICLR 2017) & FactorVAE (ICML 2018) metric
• Given a factor $c_k$, generate two (simulated) data points $x, x'$ with the same $c_k$ but different $c_{-k}$, then train a classifier to predict $c_k$ from the difference of the representations $|z - z'|$
• Indeed, the classifier will map the zero-valued index of $|z - z'|$ to the factor $c_k$
• Mutual Information Gap (NeurIPS 2018)
• Compute the mutual information between each factor $c_k$ and each dimension $z_j$
• For the dimensions $i_1$ and $i_2$ with the highest and second-highest mutual information, measure the gap between them: $I(c_k, z_{i_1}) - I(c_k, z_{i_2})$
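A small sketch of how this gap can be turned into a score, assuming a precomputed matrix `mi[k, j]` of mutual information between factor $c_k$ and latent dimension $z_j$ (how the MI itself is estimated is left out; normalizing by the factor entropy follows the original paper):

```python
import numpy as np

def mutual_information_gap(mi, factor_entropy):
    """mi: (num_factors, num_latents) MI matrix; factor_entropy: (num_factors,) H(c_k)."""
    top2 = np.sort(mi, axis=1)[:, ::-1][:, :2]            # highest and second-highest MI per factor
    gaps = (top2[:, 0] - top2[:, 1]) / factor_entropy      # normalized gap per factor
    return gaps.mean()

# Toy example: 3 factors, 5 latent dimensions.
mi = np.array([[0.9, 0.1, 0.0, 0.0, 0.1],
               [0.0, 0.8, 0.2, 0.1, 0.0],
               [0.1, 0.1, 0.1, 0.7, 0.6]])
print(mutual_information_gap(mi, factor_entropy=np.ones(3)))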
10. Theoretical Results
• “Unsupervised learning of disentangled representations is fundamentally impossible
without inductive biases on both the models and the data”
11. Theoretical Results
• “Unsupervised learning of disentangled representations is fundamentally impossible
without inductive biases on both the models and the data”
• Theorem. For $p(\mathbf{z}) = \prod_{i=1}^{d} p(z_i)$, there exists an infinite family of bijective functions $f$ s.t.
• $\mathbf{z}$ and $f(\mathbf{z})$ are completely entangled (i.e., $\partial f_i(\mathbf{u}) / \partial u_j \neq 0$ a.e. for all $i, j$)
• $\mathbf{z}$ and $f(\mathbf{z})$ have the same marginal distribution (i.e., $P(\mathbf{z} \leq \mathbf{u}) = P(f(\mathbf{z}) \leq \mathbf{u})$ for all $\mathbf{u}$)
12. Theoretical Results
• “Unsupervised learning of disentangled representations is fundamentally impossible
without inductive biases on both the models and the data”
• Theorem. For $p(\mathbf{z}) = \prod_{i=1}^{d} p(z_i)$, there exists an infinite family of bijective functions $f$ s.t.
• $\mathbf{z}$ and $f(\mathbf{z})$ are completely entangled (i.e., $\partial f_i(\mathbf{u}) / \partial u_j \neq 0$ a.e. for all $i, j$)
• $\mathbf{z}$ and $f(\mathbf{z})$ have the same marginal distribution (i.e., $P(\mathbf{z} \leq \mathbf{u}) = P(f(\mathbf{z}) \leq \mathbf{u})$ for all $\mathbf{u}$)
• Proof sketch. By construction.
• Let $g: \mathrm{supp}(\mathbf{z}) \to [0,1]^d$ s.t. $g_i(\mathbf{v}) = P(z_i \leq v_i)$
• Let $h: (0,1)^d \to \mathbb{R}^d$ s.t. $h_i(\mathbf{v}) = \psi^{-1}(v_i)$, where $\psi$ is the c.d.f. of a normal distribution
• Then for any orthogonal matrix $\mathbf{A}$, the following $f$ satisfies the conditions:
$f(\mathbf{u}) = (h \circ g)^{-1}(\mathbf{A} \, (h \circ g)(\mathbf{u}))$
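To make the construction tangible, here is a small numerical sketch of our own (not from the slides or the paper), specializing to a standard normal prior in two dimensions, where $h \circ g$ and its inverse reduce to identity maps and $f$ becomes a rotation: the marginals of $f(\mathbf{z})$ match those of $\mathbf{z}$ while every output coordinate depends on every input coordinate.

```python
import numpy as np
from scipy.stats import norm, ks_2samp

rng = np.random.default_rng(0)
n, d = 100_000, 2
z = rng.standard_normal((n, d))                 # p(z) = prod_i N(0, 1)

theta = np.pi / 4                               # orthogonal A with no zero entries (a rotation)
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

g, h = norm.cdf, norm.ppf                       # g_i(v) = P(z_i <= v_i),  h_i(v) = psi^{-1}(v_i)
hg_z = h(g(z))                                  # (h o g)(z); equals z for this prior
f_z = hg_z @ A.T                                # f(z) = (h o g)^{-1}(A (h o g)(z));
                                                # the outer (h o g)^{-1} is the identity here

# Same marginals: each coordinate of f(z) is statistically indistinguishable from z's.
print([round(ks_2samp(z[:, i], f_z[:, i]).pvalue, 3) for i in range(d)])

# Completely entangled: every coordinate of f(z) correlates with every coordinate of z.
print(np.round(np.corrcoef(z.T, f_z.T)[:d, d:], 2))
```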
13. Theoretical Results
• “Unsupervised learning of disentangled representations is fundamentally impossible
without inductive biases on both the models and the data”
• Theorem. For $p(\mathbf{z}) = \prod_{i=1}^{d} p(z_i)$, there exists an infinite family of bijective functions $f$ s.t.
• $\mathbf{z}$ and $f(\mathbf{z})$ are completely entangled (i.e., $\partial f_i(\mathbf{u}) / \partial u_j \neq 0$ a.e. for all $i, j$)
• $\mathbf{z}$ and $f(\mathbf{z})$ have the same marginal distribution (i.e., $P(\mathbf{z} \leq \mathbf{u}) = P(f(\mathbf{z}) \leq \mathbf{u})$ for all $\mathbf{u}$)
• Corollary. One cannot identify the disentangled representation $r(\mathbf{x})$ (w.r.t. the generative model $G(\mathbf{x}|\mathbf{z})$), as there are two equivalent generative models $G$ and $G'$ which have the same marginal distribution $p(\mathbf{x})$ but whose $\mathbf{z}' = f(\mathbf{z})$ is completely entangled w.r.t. $\mathbf{z}$ (and so is $r(\mathbf{x})$)
• Namely, inferring the representation $\mathbf{z}$ from the observation $\mathbf{x}$ is not a well-defined problem
14. Theoretical Results
• 𝛽-VAE learns some decorrelated features, but they are not semantically decomposed
• E.g., the width is entangled with the leg style in 𝛽-VAE
* Image from BetaVAE (ICLR 2017)
15. Empirical Results
• Q1. Which method should be used?
• A. Hyperparameters and random seeds matter more than the choice of the model
16. Empirical Results
• Q2. How to choose the hyperparameters?
• A. Selecting the best hyperparameter is extremely hard due to the randomness
17. Empirical Results
• Q2. How to choose the hyperparameters?
• A. Also, there is no obvious trend over the variation of hyperparameters
18. Empirical Results
• Q2. How to choose the hyperparameters?
• A. Good hyperparameters often can be transferred (e.g., dSprites → color-dSprites)
Rank correlation matrix
19. Empirical Results
• Q3. How to select the best model from a set of trained models?
• A. Unsupervised (training) scores do not correlate with the disentanglement metrics
Unsupervised scores vs disentanglement metrics
20. Summary
• TL;DR: Current unsupervised learning of disentangled representation has a limitation!
• Summary of findings:
• Q1. Which method should be used?
• A. Current methods should be rigorously validated (no significant difference)
21. Summary
• TL;DR: Current unsupervised learning of disentangled representation has a limitation!
• Summary of findings:
• Q1. Which method should be used?
• A. Current methods should be rigorously validated (no significant difference)
• Q2. How to choose the hyperparameters?
• A. No rule of thumb, but transfer across datasets seems to help!
22. Summary
• TL;DR: Current unsupervised learning of disentangled representation has a limitation!
• Summary of findings:
• Q1. Which method should be used?
• A. Current methods should be rigorously validated (no significant difference)
• Q2. How to choose the hyperparameters?
• A. No rule of thumb, but transfer across datasets seems to help!
• Q3. How to select the best model from a set of trained models?
• A. (Unsupervised) model selection remains a key challenge!
23. Following Work & Future Direction
• “Disentangling Factors of Variation Using Few Labels”
(ICLR Workshop 2019, NeurIPS 2019 submission)
• Summary of findings: Using a few labels greatly improves disentanglement!
24. Following Work & Future Direction
• “Disentangling Factors of Variation Using Few Labels”
(ICLR Workshop 2019, NeurIPS 2019 submission)
• Summary of findings: Using a few labels greatly improves disentanglement!
1. Existing disentanglement metrics + a few labels perform well for model selection, even though the models are trained in a completely unsupervised manner
25. Following Work & Future Direction
• “Disentangling Factors of Variation Using Few Labels”
(ICLR Workshop 2019, NeurIPS 2019 submission)
• Summary of findings: Using a few labels greatly improves disentanglement!
1. Existing disentanglement metrics + a few labels perform well for model selection, even though the models are trained in a completely unsupervised manner
2. One can obtain even better results by incorporating a few labels into the learning process (using a simple supervised regularizer)
26. Following Work & Future Direction
• “Disentangling Factors of Variation Using Few Labels”
(ICLR Workshop 2019, NeurIPS 2019 submission)
• Summary of findings: Using a few labels greatly improves disentanglement!
1. Existing disentanglement metrics + a few labels perform well for model selection, even though the models are trained in a completely unsupervised manner
2. One can obtain even better results by incorporating a few labels into the learning process (using a simple supervised regularizer)
• Take-home message: Future research should focus on “how to better utilize inductive biases”, e.g., using a few labels, rather than on the previous total-correlation-style approaches