Deep Generative Models
ADITYA GROVER AND STEFANO ERMON
STANFORD UNIVERSITY
IJCAI-ECAI 2018
URL: goo.gl/H1prjP
Introduction
Computational Speech
Computer Vision

Generative model = Data + Prior Knowledge

Data (e.g., images of bedrooms) and prior knowledge (e.g., physics, materials, ...) are combined through learning.

Discriminative query: P(Y = Bedroom | X = image), e.g., over labels Y = B, Y = D, ...
Progressive Growing of GANs

Audio generation: parametric and concatenative baselines vs. WaveNet; unconditional music samples.
Li et al., 2017
P(X = x) for each possible x: each term is simple ☺, but we need a probability for all possible values of (x1, x2, x3), so the full distribution is still complex ☹.
Representation – Bayes Nets
Chain rule: P(x1,x2,x3,x4) = P(x1) P(x2|x1) P(x3|x1,x2) P(x4|x1,x2,x3)
Solution #1: simplify the conditionals (Bayes Nets)
E.g., P(x1,x2,x3,x4) = P(x1) P(x2|x1) P(x3|x1,x2) P(x4|x1,x2,x3) ≈ P(x1) P(x2|x1) P(x3|x2) P(x4|x3)
[Figure: Bayes net x1 → x2 → x3 → x4, with conditionals that can be parameterized by a neural network]
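To make the factorization concrete, here is a minimal sketch in Python; the conditional probability tables are hypothetical, chosen only for illustration. It shows that the Bayes-net factorization needs only four small tables rather than 2^4 joint entries.

```python
# A minimal sketch of the factorization above for binary variables.
# All conditional tables below are made-up values for illustration.
p_x1 = {0: 0.6, 1: 0.4}              # P(x1)
p_x2_given_x1 = {0: 0.7, 1: 0.2}     # P(x2 = 1 | x1)
p_x3_given_x2 = {0: 0.5, 1: 0.9}     # P(x3 = 1 | x2)
p_x4_given_x3 = {0: 0.1, 1: 0.8}     # P(x4 = 1 | x3)

def bernoulli(p, v):
    """Probability of outcome v under Bernoulli(p)."""
    return p if v == 1 else 1.0 - p

def joint(x1, x2, x3, x4):
    # P(x1) P(x2|x1) P(x3|x2) P(x4|x3): 4 small tables instead of 2^4 entries.
    return (p_x1[x1]
            * bernoulli(p_x2_given_x1[x1], x2)
            * bernoulli(p_x3_given_x2[x2], x3)
            * bernoulli(p_x4_given_x3[x3], x4))

# Sanity check: the factorized joint sums to 1 over all 16 assignments.
total = sum(joint(a, b, c, d)
            for a in (0, 1) for b in (0, 1) for c in (0, 1) for d in (0, 1))
assert abs(total - 1.0) < 1e-9
```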
Latent variable models (Hou et al., 2016):
p_θ(x, z) = p(z) p_θ(x | z), with marginal log-likelihood log p_θ(x) = log ∫ p_θ(x, z) dz
[Figure: decoder network mapping latent z to image x]

1) Easy to sample from ☺
2) p(x) intractable ☹
3) Enables feature learning ☺
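A small sketch of these properties on a made-up linear-Gaussian latent variable model (all parameters are assumptions for illustration): ancestral sampling is easy, while p(x) must in general be estimated, e.g., by averaging over prior samples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy latent variable model: z ~ N(0, 1), x | z ~ N(2z, 0.5^2).
def sample_z(n):
    return rng.normal(0.0, 1.0, size=n)

def sample_x_given_z(z):
    return rng.normal(2.0 * z, 0.5)

# 1) Sampling is easy: ancestral sampling, first z, then x | z.
z = sample_z(5)
x = sample_x_given_z(z)

# 2) p(x) = ∫ p(x, z) dz is intractable in general; a naive Monte Carlo
#    estimate averages p(x | z_i) over prior samples z_i. (For this
#    linear-Gaussian toy the marginal happens to be N(0, 4.25); with a
#    neural decoder it would have no closed form.)
def gaussian_pdf(v, mean, std):
    return np.exp(-0.5 * ((v - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

zs = sample_z(100_000)
p_x_estimate = gaussian_pdf(1.0, 2.0 * zs, 0.5).mean()  # estimate of p(x = 1.0)
print(p_x_estimate)
```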
Learning in Generative Models
Given: samples x_i ~ p_data, i = 1, 2, …, n, from a data distribution, and a model family M
Goal: approximate p_data as closely as possible, by minimizing a distance d(p_data, p_θ) over θ ∈ M
Learning in Generative Models
Given: Samples from a data distribution
Goal: Approximate a data distribution as closely as possible
min_{θ ∈ M} d(p_data, p_θ)
• Statistically efficient
• Requires the ability to tractably evaluate or optimize likelihoods
With the Bayes net factorization, the target is p_data(x1) p_data(x2|x1) p_data(x3|x1,x2) p_data(x4|x3).
Given samples x_i ~ p_data, i = 1, 2, …, n, there is an analytic solution: pick θ so that the model conditionals match the empirical ones, e.g., p_θ(x1) = p_data(x1).
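A minimal sketch of this analytic solution for a tabular model, assuming toy binary data: the maximum-likelihood conditionals are just empirical frequencies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary dataset, made up for illustration; columns are x1, x2.
data = rng.integers(0, 2, size=(1000, 2))

# p_theta(x1 = 1) = empirical frequency of x1 = 1.
p_x1 = data[:, 0].mean()

# p_theta(x2 = 1 | x1 = v) = empirical frequency among rows with x1 = v.
p_x2_given_x1 = {v: data[data[:, 0] == v, 1].mean() for v in (0, 1)}
print(p_x1, p_x2_given_x1)
```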
Maximum likelihood estimation
Tractable likelihoods: Directed models such as autoregressive models
Neural network parameterization: p_θ(x1, x2, x3, x4) = p_θ(x1) p_θ(x2|x1) p_θ(x3|x1, x2) p_θ(x4|x3)
where p_θ(x3|x1, x2) ≈ Neural-Net(x1, x2; θ)

min_{θ ∈ M} E_{x ~ p_data}[−log p_θ(x)]
Figure from Goodfellow et al., 2014
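A minimal PyTorch sketch of this objective, with a made-up autoregressive model over four binary variables; the per-conditional linear networks are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Minimal autoregressive model: one small network per conditional p(x_i | x_<i).
class TinyAutoregressive(nn.Module):
    def __init__(self):
        super().__init__()
        self.nets = nn.ModuleList([nn.Linear(max(i, 1), 1) for i in range(4)])

    def log_prob(self, x):                       # x: (batch, 4) in {0, 1}
        logp = 0.0
        for i in range(4):
            context = x[:, :i] if i > 0 else torch.zeros(x.shape[0], 1)
            logits = self.nets[i](context).squeeze(-1)
            dist = torch.distributions.Bernoulli(logits=logits)
            logp = logp + dist.log_prob(x[:, i])
        return logp

model = TinyAutoregressive()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
x = torch.randint(0, 2, (256, 4)).float()        # stand-in for samples ~ p_data
for _ in range(100):
    loss = -model.log_prob(x).mean()             # min E_{x~p_data}[-log p_theta(x)]
    opt.zero_grad(); loss.backward(); opt.step()
```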
Pixel-RNN
Char-RNN
Sutskever et al., 2011; Karpathy, 2015; Theis & Bethge, 2015; van den Oord et al., 2016a
PixelCNN WaveNet
van den Oord et al., 2016a, 2016b, 2016c
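In the spirit of Char-RNN, a minimal ancestral-sampling sketch; the GRU, vocabulary size, and start symbol are assumptions for illustration, not the published architectures.

```python
import torch
import torch.nn as nn

vocab_size, hidden = 16, 32
rnn = nn.GRU(vocab_size, hidden, batch_first=True)
head = nn.Linear(hidden, vocab_size)

def sample(length):
    # Autoregressive ancestral sampling: one token at a time, each
    # conditioned on everything generated so far via the RNN state.
    token = torch.zeros(1, 1, vocab_size)        # start symbol (all-zeros)
    state, out = None, []
    for _ in range(length):
        h, state = rnn(token, state)
        probs = torch.softmax(head(h[:, -1]), dim=-1)
        idx = torch.multinomial(probs, 1).item()
        out.append(idx)
        token = torch.eye(vocab_size)[idx].view(1, 1, -1)
    return out

print(sample(10))
```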
Key ideas
§Approximate the posterior p_θ(z|x) with a simpler, tractable distribution q_φ(z|x)
§Cast inference as optimization over the parameters of the model and the approximate posterior

log Σ_z p_θ(x, z) = log Σ_z q_φ(z|x) · p_θ(x, z) / q_φ(z|x)

[Figure: latent variable model with latent z and observed x]
Inference as Optimization
New goal: maximize a tractable lower bound on the marginal log-likelihood of the data.

log Σ_z p_θ(x, z) = log Σ_z q_φ(z|x) · p_θ(x, z) / q_φ(z|x)
  ≥ Σ_z q_φ(z|x) log [ p_θ(x, z) / q_φ(z|x) ]   (by Jensen's inequality)

The bound is tight when q_φ(z|x) = p_θ(z|x).
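A tiny numeric check of this bound on a made-up discrete model with two latent states: any q(z|x) yields a lower bound, and the bound is tight at the true posterior.

```python
import numpy as np

# Hypothetical joint p(x, z) for a fixed x and z in {0, 1}.
p_xz = np.array([0.1, 0.3])
log_px = np.log(p_xz.sum())        # exact log marginal likelihood

q = np.array([0.5, 0.5])           # any q(z|x) gives a lower bound
elbo = (q * (np.log(p_xz) - np.log(q))).sum()
assert elbo <= log_px + 1e-12

# The bound is tight when q equals the true posterior p(z|x).
posterior = p_xz / p_xz.sum()
elbo_tight = (posterior * (np.log(p_xz) - np.log(posterior))).sum()
assert abs(elbo_tight - log_px) < 1e-12
```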
Variational Bayes
• Evidence lower bound (ELBO) for the marginal log-likelihood of !
Neal, 1998
ELBO(x; θ, φ)
[Figure: joint optimization of the ELBO over model parameters θ and variational parameters φ, with optimum (θ*, φ*)]
Improving variational learning via:
1. Better optimization techniques
2. More expressive approximating families
3. Alternate loss functions
Figure inspired by David Blei’s keynote at AISTATS 2018
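A minimal VAE sketch optimizing the ELBO with the reparameterization trick; the sizes, architecture, and Bernoulli decoder are assumptions for illustration, not the exact models in the references above.

```python
import torch
import torch.nn as nn

x_dim, z_dim = 784, 16
enc = nn.Sequential(nn.Linear(x_dim, 128), nn.ReLU(), nn.Linear(128, 2 * z_dim))
dec = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(), nn.Linear(128, x_dim))

def elbo(x):
    mu, log_var = enc(x).chunk(2, dim=-1)        # q_phi(z|x) = N(mu, sigma^2)
    z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()   # reparameterize
    logits = dec(z)                              # p_theta(x|z) = Bernoulli(logits)
    recon = -nn.functional.binary_cross_entropy_with_logits(
        logits, x, reduction="none").sum(-1)
    # Closed-form KL( q_phi(z|x) || N(0, I) )
    kl = 0.5 * (mu ** 2 + log_var.exp() - 1 - log_var).sum(-1)
    return (recon - kl).mean()

opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
x = torch.rand(64, x_dim).round()                # stand-in for binarized images
loss = -elbo(x)                                  # maximize ELBO = minimize -ELBO
loss.backward(); opt.step()
```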
Key idea: The mapping between z and x, given by f_θ : ℝ^n → ℝ^n, is deterministic and invertible, such that x = f_θ(z) and z = f_θ^{-1}(x).
Planar Flows
x = f_θ(z) = z + u h(wᵀz + b), where h is a smooth elementwise nonlinearity
Papamakarios et al., 2017, Kingma et al., 2016, van den Oord et al., 2018
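A sketch of evaluating a planar-flow density via the change-of-variables formula p_X(x) = p_Z(z) / |det ∂f_θ/∂z|, using the matrix determinant lemma for the log-determinant; the parameters are random stand-ins (a proper implementation would also constrain wᵀu ≥ −1 to guarantee invertibility).

```python
import numpy as np

rng = np.random.default_rng(0)

d = 2
u, w, b = rng.normal(size=d), rng.normal(size=d), 0.1

def f(z):                                        # x = z + u * tanh(w.z + b)
    return z + u * np.tanh(w @ z + b)

def log_det_jacobian(z):
    # det(I + u h'(w.z + b) wᵀ) = 1 + h'(w.z + b) uᵀw  (matrix determinant lemma)
    h_prime = 1.0 - np.tanh(w @ z + b) ** 2
    return np.log(np.abs(1.0 + h_prime * (u @ w)))

z = rng.normal(size=d)                           # z ~ N(0, I)
log_pz = -0.5 * (z @ z) - 0.5 * d * np.log(2 * np.pi)
log_px = log_pz - log_det_jacobian(z)            # density of x = f(z)
print(f(z), log_px)
```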
Generative Model
• The mapping between z and x, given by G_θ, is deterministic
Training procedure
Distributional perspective - Discriminator
Jensen-Shannon Divergence
• The optimal generator is given by p_G = p_data
Goodfellow et al., 2014
Wu et al., 2017, Grover et al., 2018, Salimans et al., 2016, Heusel et al., 2018
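A minimal GAN training sketch on 1-D toy data using the non-saturating generator loss from Goodfellow et al., 2014; the architectures and data distribution are assumptions for illustration.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for _ in range(200):
    real = torch.randn(64, 1) * 0.5 + 2.0        # stand-in for p_data
    fake = G(torch.randn(64, 8))                 # x = G(z), z ~ p(z)

    # Discriminator step: push real -> 1, fake -> 0.
    d_loss = (bce(D(real), torch.ones(64, 1))
              + bce(D(fake.detach()), torch.zeros(64, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: fool D (non-saturating loss).
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```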
Labeled data: (x1, y1), …, (xN, yN)
Unlabeled data: (xN+1, ?), …, (xN+M, ?)
How can we leverage unlabeled data?
Step 1: learn a latent variable generative model p_θ(x, z) on labeled and unlabeled data
Kingma et al., 2014
q_φ(y|x): the inference net is a classifier!
Class-conditional VAE: a fully probabilistic model where the label y is another latent variable.
Kingma et al., 2014
Labeled data: log p_θ(x_i, y_i) = log Σ_z p_θ(x_i, y_i, z) ≥ ELBO
Unlabeled data: log p_θ(x_j) = log Σ_y p_θ(x_j, y) ≥ ELBO
[Figure: generative model with label y and latent z]
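A high-level sketch of how the two bounds combine into one training loss, in the spirit of Kingma et al., 2014; `elbo_xy` is a hypothetical stub standing in for the class-conditional ELBO, and the classifier plays the role of q_φ(y|x).

```python
import torch
import torch.nn as nn

num_classes = 10
classifier = nn.Sequential(nn.Linear(784, 128), nn.ReLU(),
                           nn.Linear(128, num_classes))

def elbo_xy(x, y):
    # Stub standing in for a class-conditional VAE bound on log p(x, y);
    # a real implementation would use an encoder/decoder as in a VAE.
    return torch.zeros(x.shape[0])

def semi_supervised_loss(x_lab, y_lab, x_unl):
    # Labeled term: -ELBO(x, y).
    loss = -elbo_xy(x_lab, y_lab).mean()
    # Unlabeled term: marginalize y with q_phi(y|x):
    # log p(x) >= sum_y q(y|x) [ ELBO(x, y) - log q(y|x) ].
    q_y = torch.softmax(classifier(x_unl), dim=-1)
    bound = torch.zeros(x_unl.shape[0])
    for c in range(num_classes):
        y_c = torch.full((x_unl.shape[0],), c)
        bound = bound + q_y[:, c] * (elbo_xy(x_unl, y_c)
                                     - torch.log(q_y[:, c] + 1e-8))
    return loss - bound.mean()

x_lab, y_lab = torch.rand(4, 784), torch.randint(0, 10, (4,))
x_unl = torch.rand(8, 784)
print(semi_supervised_loss(x_lab, y_lab, x_unl))
```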
Semi-Supervised Learning
Idea 1: couple any discriminative model with a generative one through a shared latent space
[Figure: the discriminator D, a differentiable function, is applied both to samples from the expert and to samples from the model]
Output policy: from raw visual inputs
Li et al., 2017
Figure: (Left) Likelihoods of different adversarial examples. (Right) ROC curves for detecting various
attacks.
Song et al., 2018
Figure: Adversarial images (left) and purified images after PixelDefend (right).
[Figure: bar charts for MNIST (left) and Omniglot (right); x-axis values 50–750, y-axis 0.00–0.30]
Transfer Compressive Sensing
Transferring from a data-rich source domain to a data-hungry target domain