Class Generative Models.pptx

The document discusses generative models in deep learning, focusing on their role in unsupervised learning and the ability to generate new data samples from existing distributions. It covers various types of generative models, including variational autoencoders and generative adversarial networks, along with their applications, advantages, and limitations. Additionally, it highlights evaluation metrics for assessing the quality of generated samples.

Uploaded by Abhishek Sinha
Copyright © All Rights Reserved

Generative Models

Mina Rezaei, Goncalo Mordido

Deep Learning for Computer Vision


Content

1. Why unsupervised learning, and why generative models? (Selected slides from Stanford University, SS2017, Generative Models)

2. What is a variational autoencoder? (Jaan Altosaar's blog, OpenAI blog, and Victoria University, Generative Models)

Deep Learning for Computer Vision | Generative Models | Slide #2
Supervised Learning

Data: (x, y), where x is data and y is the label

Goal: Learn a function to map x → y

Examples: Classification, object detection, semantic segmentation, image captioning

Unsupervised Learning

Data: x, no labels!

Goal: Learn some underlying hidden structure of the data

Examples: Clustering, dimensionality reduction, feature learning, density estimation

Figure: K-means clustering

Figure: Principal Component Analysis

Figure: Density estimation

Supervised vs Unsupervised Learning

Supervised Learning
Data: (x, y), where x is data and y is the label
Goal: Learn a function to map x → y
Examples: Classification, object detection, semantic segmentation, image captioning, etc.

Unsupervised Learning
Data: x. Just data, no labels! Training data is cheap.
Goal: Learn some underlying hidden structure of the data. Solve unsupervised learning => understand the structure of the visual world
Examples: Clustering, dimensionality reduction, feature learning, density estimation, etc.

Generative Models

Given training data, generate new samples from the same distribution

Training data ~ p_data(x); generated samples ~ p_model(x)

Want to: learn p_model(x) similar to p_data(x)

Addresses density estimation, a core problem in unsupervised learning:

• Explicit density estimation: explicitly define and solve for p_model(x)

• Implicit density estimation: learn a model that can sample from p_model(x) without explicitly defining it
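To make "explicitly define and solve for p_model(x)" concrete, here is a minimal sketch of explicit density estimation under a deliberately simple assumption: p_model is a 1-D Gaussian whose parameters are solved for by maximum likelihood. The function names are hypothetical; the deep models in this deck replace the Gaussian with a neural network, but the two abilities shown (evaluate log p_model(x), draw new samples) are the same.

```python
import math
import random
import statistics

def fit_gaussian_mle(samples):
    """Explicit density: choose p_model = N(mu, sigma^2) and solve for its
    parameters by maximum likelihood (sample mean, variance divided by n)."""
    mu = statistics.fmean(samples)
    var = statistics.pvariance(samples, mu)  # MLE variance uses n, not n - 1
    return mu, var

def log_p_model(x, mu, var):
    """Log-density of the fitted model; available because the density is explicit."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def sample_p_model(mu, var, n, rng):
    """Generate new samples from the learned p_model(x)."""
    return [rng.gauss(mu, math.sqrt(var)) for _ in range(n)]

rng = random.Random(0)
# Stand-in for training data from an unknown p_data(x) (here secretly N(3, 2^2)).
train = [rng.gauss(3.0, 2.0) for _ in range(10_000)]
mu, var = fit_gaussian_mle(train)
new_samples = sample_p_model(mu, var, 5, rng)
print(f"mu={mu:.2f}, sigma={math.sqrt(var):.2f}")
print("log p_model(3.0) =", round(log_p_model(3.0, mu, var), 3))
```

An implicit model would expose only something like `sample_p_model`, with no `log_p_model` available.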

Why Generative Models?

• Realistic samples for artwork, super-resolution, colorization, etc., and for enlarging datasets

• Generative models of time-series data can be used for simulation and planning.

Taxonomy of Generative Models

Generative Model
• Explicit Density
   • Tractable Density
      • Fully Visible Belief Nets: PixelRNN, PixelCNN
      • Change of variables models
   • Approximate Density
      • Variational: Variational Autoencoder
      • Markov chain: Boltzmann Machine
• Implicit Density
   • Direct: GAN
   • Markov chain: GSN

Figure copyright and adapted from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017.
Fully visible belief network

Explicit density model

Use the chain rule to decompose the likelihood of an image x into a product of 1-D distributions:

$$p(x) = \prod_{i=1}^{n} p(x_i \mid x_1, \ldots, x_{i-1})$$

Here p(x) is the likelihood of image x, and each factor is the probability of the i-th pixel value given all previous pixels. This requires defining an ordering of the "previous pixels".

The distribution over pixel values is complex => express it using a neural network!

Then maximize the likelihood of the training data.
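The chain-rule factorization can be checked on a tiny binary "image". This is a minimal sketch, assuming raster-scan ordering and a hypothetical hand-set function `cond_prob_one` in place of the neural network; it only computes log p(x) as a sum of conditional log-probabilities and does not train anything.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def cond_prob_one(prev_pixels, w=4.0, b=-2.0):
    """Hypothetical stand-in for the neural network: p(x_i = 1 | x_1..x_{i-1}).
    It only looks at the mean of the previous pixels; PixelRNN/PixelCNN
    would learn this conditional from data instead."""
    if not prev_pixels:
        return 0.5
    return sigmoid(w * sum(prev_pixels) / len(prev_pixels) + b)

def log_likelihood(image):
    """log p(x) = sum_i log p(x_i | x_1..x_{i-1}) in raster-scan order."""
    pixels = [p for row in image for p in row]  # fixes the ordering of "previous pixels"
    total = 0.0
    for i, x_i in enumerate(pixels):
        p1 = cond_prob_one(pixels[:i])
        total += math.log(p1 if x_i == 1 else 1.0 - p1)
    return total

print(log_likelihood([[1, 1], [1, 0]]))
```

Because every factor is a proper conditional distribution, the probabilities of all 16 possible 2x2 images sum to 1, which is what makes the likelihood explicit and directly maximizable.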

PixelRNN [van den Oord et al., 2016]

Dependency on previous pixels modeled using an RNN (LSTM)

Generate image pixels starting from the corner

Drawback: sequential generation is slow!

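PixelCNN

Still generate image pixels starting from the corner

Dependency on previous pixels now modeled using a CNN over a context region

Training: maximize the likelihood of training images (faster than PixelRNN, because the context region values are known from the training images, so the convolutions can be parallelized)

Generation must still proceed sequentially => still slow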
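Why generation stays slow can be seen in a minimal sketch of the sampling loop. It assumes a hypothetical `cond_prob_one` that looks only at a small causal context (the left and upper neighbours), loosely mimicking PixelCNN's context region; every pixel needs one model evaluation, and each evaluation must wait for the pixels sampled before it.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def cond_prob_one(image, r, c, w=3.0, b=-1.5):
    """Hypothetical stand-in for a PixelCNN: p(x[r][c] = 1 | context), where the
    context is the already-generated neighbours to the left and above.
    A real PixelCNN computes this with masked convolutions."""
    context = []
    if c > 0:
        context.append(image[r][c - 1])  # left neighbour
    if r > 0:
        context.append(image[r - 1][c])  # neighbour above
    if not context:
        return 0.5
    return sigmoid(w * sum(context) / len(context) + b)

def generate(height, width, rng):
    """Sample pixel by pixel, starting from the top-left corner.
    Each pixel depends on earlier samples, so the loop cannot be parallelized."""
    image = [[0] * width for _ in range(height)]
    model_calls = 0
    for r in range(height):
        for c in range(width):
            p1 = cond_prob_one(image, r, c)
            model_calls += 1
            image[r][c] = 1 if rng.random() < p1 else 0
    return image, model_calls

img, calls = generate(4, 4, random.Random(0))
print(calls)  # 16: one sequential model evaluation per pixel
```

During training the full image is known, so all 16 conditionals could be evaluated at once; only sampling forces this loop.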

PixelCNN vs PixelRNN

Pros:
• Can explicitly compute likelihood p(x)
• Explicit likelihood of training data gives a good evaluation metric
• Good samples

Con:
• Sequential generation => slow

Improving PixelCNN performance:
• Gated convolutional layers
• Short-cut connections
• Discretized logistic loss
• Multi-scale
• Training tricks
• Etc.

See:
• van den Oord et al., NIPS 2016
• Salimans et al., 2017: PixelCNN++

Autoencoders

Figure: Encoder → Latent → Decoder
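The encoder → latent → decoder pipeline and its reconstruction loss can be illustrated with the smallest possible case. This is a minimal sketch, assuming a linear autoencoder (2-D input, 1-D latent code) trained by plain SGD on squared error; real autoencoders use deep nonlinear networks, and all names and hyperparameters here are hypothetical.

```python
import random

def encode(e, x):
    """Encoder: project the 2-D input onto a 1-D latent code z."""
    return e[0] * x[0] + e[1] * x[1]

def decode(d, z):
    """Decoder: map the latent code back to a 2-D reconstruction."""
    return [d[0] * z, d[1] * z]

def mse(e, d, data):
    """Mean squared reconstruction error over the dataset."""
    total = 0.0
    for x in data:
        x_hat = decode(d, encode(e, x))
        total += sum((xh - xi) ** 2 for xh, xi in zip(x_hat, x))
    return total / len(data)

def train_autoencoder(data, epochs=200, lr=0.01):
    e, d = [0.5, 0.5], [0.5, 0.5]  # fixed, hypothetical initialization
    for _ in range(epochs):
        for x in data:
            z = encode(e, x)
            err = [xh - xi for xh, xi in zip(decode(d, z), x)]  # reconstruction error
            # Gradients of 0.5 * ||x_hat - x||^2 w.r.t. decoder and encoder weights
            grad_d = [err[0] * z, err[1] * z]
            back = err[0] * d[0] + err[1] * d[1]
            grad_e = [back * x[0], back * x[1]]
            d = [d[k] - lr * grad_d[k] for k in range(2)]
            e = [e[k] - lr * grad_e[k] for k in range(2)]
    return e, d

rng = random.Random(0)
# Toy 2-D data close to the line y = 2x, so a 1-D latent code suffices.
data = []
for _ in range(100):
    t = rng.uniform(-1, 1)
    data.append([t, 2 * t + rng.gauss(0, 0.05)])

before = mse([0.5, 0.5], [0.5, 0.5], data)
e, d = train_autoencoder(data)
after = mse(e, d, data)
print(f"reconstruction MSE: {before:.4f} -> {after:.4f}")
```

A denoising autoencoder would feed a corrupted copy of x (for example, x plus Gaussian noise) to `encode` while still computing `err` against the clean x.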

Denoising Autoencoder

Autoencoder Applications

• Semantic segmentation

• Neural inpainting

Variational Autoencoders (VAE)

Figure: the training loss combines a reconstruction loss with a term that keeps the latent distribution close to normal(0, 1)
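The two loss terms named on this slide can be written out directly. A minimal sketch, assuming the encoder outputs a diagonal Gaussian as a mean vector `mu` and log-variance vector `log_var`, and using squared error as the reconstruction term (which corresponds to a Gaussian decoder with fixed variance, up to scale and constants); the function names are hypothetical.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dims.
    This is the "stay close to normal(0,1)" term."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))

def reconstruction_loss(x, x_hat):
    """Squared-error reconstruction loss; binary cross-entropy is another common choice."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

def vae_loss(x, x_hat, mu, log_var):
    """Negative variational lower bound (up to constants): reconstruction + KL."""
    return reconstruction_loss(x, x_hat) + kl_to_standard_normal(mu, log_var)

print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))  # 0.0: q already equals N(0, I)
print(kl_to_standard_normal([1.0], [0.0]))            # 0.5
```

Predicting log-variance instead of sigma keeps sigma positive without constraining the network output.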

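Variational Autoencoders (VAE)

Reparameterization trick: sample the latent code as

$$z = \mu + \sigma \odot \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)$$

where ⊙ is element-wise multiplication, so each component of ε ~ normal(0, 1).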
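A minimal sketch of the reparameterization step, assuming the encoder's output is given as plain lists `mu` and `log_var` (hypothetical names). The randomness lives entirely in ε, so z is a deterministic, differentiable function of μ and σ, which is what lets gradients flow back into the encoder.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps (element-wise) with eps ~ N(0, I)."""
    sigma = [math.exp(0.5 * lv) for lv in log_var]
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + s * e for m, s, e in zip(mu, sigma, eps)]

rng = random.Random(0)
mu, log_var = [2.0, -1.0], [math.log(0.25), 0.0]  # sigma = [0.5, 1.0]
samples = [reparameterize(mu, log_var, rng) for _ in range(20_000)]
mean0 = sum(z[0] for z in samples) / len(samples)
std0 = math.sqrt(sum((z[0] - mean0) ** 2 for z in samples) / len(samples))
print(round(mean0, 2), round(std0, 2))  # close to 2.0 and 0.5
```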

Variational Autoencoders (VAE)

• Model: Latent-variable model p(x|z, θ), usually specified by a neural network
• Inference: Recognition network for q(z|x, θ), usually specified by a neural network
• Training objective: Simple Monte Carlo for an unbiased estimate of the variational lower bound
• Optimization method: Stochastic gradient ascent, with automatic differentiation for gradients
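The "simple Monte Carlo" training objective can be demonstrated on a toy model where log p(x) is known in closed form. A minimal sketch, assuming p(z) = N(0, 1) and p(x|z) = N(z, 1), so that p(x) = N(0, 2) and the true posterior is N(x/2, 1/2); no neural networks are involved, and `elbo_mc` is a hypothetical name.

```python
import math
import random

def log_normal(v, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (v - mean) ** 2 / var)

def elbo_mc(x, q_mean, q_var, n_samples, rng):
    """Monte Carlo estimate of ELBO = E_q[ log p(x, z) - log q(z | x) ], z ~ q(z | x).
    Toy model (assumed for illustration): p(z) = N(0, 1), p(x | z) = N(z, 1)."""
    total = 0.0
    for _ in range(n_samples):
        z = q_mean + math.sqrt(q_var) * rng.gauss(0.0, 1.0)  # reparameterized sample
        log_joint = log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0)
        total += log_joint - log_normal(z, q_mean, q_var)
    return total / n_samples

rng = random.Random(0)
x = 1.0
log_px = log_normal(x, 0.0, 2.0)            # exact marginal: p(x) = N(0, 2)
exact_q = elbo_mc(x, x / 2, 0.5, 1000, rng)  # q equals the true posterior
poor_q = elbo_mc(x, -1.0, 2.0, 1000, rng)    # a deliberately poor q
print(round(log_px, 4), round(exact_q, 4), round(poor_q, 4))
```

With the exact posterior every sample equals log p(x), so the bound is tight; a poorer q gives a lower value, the gap being KL(q(z|x) || p(z|x)).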

Variational Autoencoders (VAE)

Pros:
• Flexible generative model
• End-to-end gradient training
• Measurable objective (a lower bound: the model is at least this good)
• Fast test-time inference
Cons:
• Sub-optimal variational factors
• Limited approximation to the true posterior (will revisit)
• Can have high-variance gradients

Generative Adversarial Networks (Goodfellow et al., 2014)

▪ Abbreviated as GAN (plural: GANs)

▪ Active research topic

▪ Have shown great improvements in image generation

https://github.com/hindupuravinash/the-gan-zoo

Images from Radford et al., 2016

Generative Adversarial Networks (Goodfellow et al., 2014)

▪ A generator (G) that learns the real data distribution in order to generate fake samples
▪ A discriminator (D) that assigns a probability p expressing its confidence that a sample is real (i.e., comes from the training data)

Figure: Training data → real sample → Discriminator (D) → p ("Is sample real?"); Noise → Generator (G) → fake sample → Discriminator (D)

Generative Adversarial Networks (Goodfellow et al., 2014)

▪ Both models are trained together in a minimax game:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

▪ G: increase the probability of D making mistakes
▪ D: classify real and fake samples with greater confidence

▪ G gradually adjusts the generated data based on D's feedback

▪ Ideal scenario (equilibrium): G eventually produces samples so realistic that D assigns p = 0.5 (i.e., it cannot distinguish real from fake samples)
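The minimax objective translates into two batch losses. A minimal sketch, assuming `d_real` and `d_fake` are lists of D's output probabilities on real and generated samples (hypothetical names, no networks involved). It also shows the non-saturating generator loss, which Goodfellow et al. (2014) suggest as a practical alternative, and the optimal discriminator, which outputs 0.5 when p_g = p_data.

```python
import math

def discriminator_loss(d_real, d_fake):
    """D maximizes log D(x) + log(1 - D(G(z))); we minimize the negative batch mean."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return -(real_term + fake_term)

def generator_loss_minimax(d_fake):
    """G minimizes log(1 - D(G(z))), exactly as in the minimax objective."""
    return sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)

def generator_loss_non_saturating(d_fake):
    """Alternative: G minimizes -log D(G(z)), giving stronger gradients early in
    training, when D easily rejects fake samples."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

def optimal_discriminator(p_data, p_g):
    """For a fixed G, the best D is p_data(x) / (p_data(x) + p_g(x))."""
    return p_data / (p_data + p_g)

# At the equilibrium p_g = p_data, D outputs 0.5 everywhere.
print(optimal_discriminator(0.3, 0.3))             # 0.5
print(round(discriminator_loss([0.5], [0.5]), 4))  # 1.3863 = 2 * log 2
```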

Conditional GANs (CGAN) (Mirza et al., 2014)

▪ G and D can be conditioned on additional information y

▪ Adding y as an input to both networks conditions their outputs
▪ y can be external information or data from the training set

Figure: Noise and y → Generator (G) → fake sample; training data → real sample; real or fake sample, together with y → Discriminator (D) → p ("Is sample real, given y?")
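One common way to "add y as an input" is to concatenate a one-hot encoding of y to each network's input. The sketch below assumes that choice (other implementations use a learned label embedding instead), 10 classes as in MNIST, a latent size of 100, and hypothetical function names.

```python
import random

NUM_CLASSES = 10  # e.g. MNIST digits 0-9 (assumed)

def one_hot(label, num_classes=NUM_CLASSES):
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

def generator_input(noise, y):
    """Condition G by concatenating the noise vector with a one-hot code of y."""
    return noise + one_hot(y)

def discriminator_input(flat_image, y):
    """Condition D the same way, so it judges 'is this sample real, given y?'."""
    return flat_image + one_hot(y)

rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(100)]  # latent size 100 is an arbitrary choice
g_in = generator_input(z, y=7)
print(len(g_in))  # 110
```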

Conditional GANs (CGAN) (Mirza et al., 2014)

Figure: generated faces conditioned on y = Senior and on y = Mouth open (Gauthier, 2015)

Limitations of GANs

1. Training instability

○ Good sample generation requires reaching a Nash equilibrium in the game, which might not always happen

2. Mode collapse

○ Occurs when G fools D by generating similar-looking samples from the same data mode

3. GANs were originally designed to work only with real-valued, continuous data (e.g., images)

○ G improves by making slight changes to its outputs, and slight changes to discrete data (e.g., text) are impractical

Evaluation metrics

▪ What makes a good generative model?

■ Each generated sample is indistinguishable from a real sample

■ Generated samples should have variety

Images from Karras et al., 2017
Evaluation metrics

▪ How to evaluate the generated samples?

■ Cannot rely on the models’ loss :-(

■ Human evaluation :-/

■ Use a pre-trained model :-)

Evaluation metrics

▪ Inception Score (IS) [Salimans et al., 2016]

■ Uses an Inception model (Szegedy et al., 2015) trained on ImageNet

■ Given a generated image x, the model assigns label y with probability p(y|x). A realistic image should get a confident prediction, i.e., p(y|x) should have low entropy (one class)

Figure source: https://github.com/Kulbear/deep-learning-nano-foundation/wiki/ReLU-and-Softmax-Activation-Functions

■ The label distribution over all generated images, p(y), should be spread out, i.e., have high entropy (many classes); this evaluates mode collapse

■ Combining the above, we get the final metric:

$$\mathrm{IS} = \exp\left(\mathbb{E}_{x}\left[D_{\mathrm{KL}}\big(p(y \mid x) \,\|\, p(y)\big)\right]\right)$$
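The IS formula can be computed from any set of class-probability vectors. A minimal sketch, assuming `probs` holds one softmax output p(y|x) per generated image (in the real metric these come from Inception v3; here they are hand-written). It shows how confident, diverse predictions score high while collapsed or unconfident ones score 1.

```python
import math

def inception_score(probs, eps=1e-12):
    """IS = exp( mean_x KL( p(y|x) || p(y) ) ), one probability vector per image."""
    n, k = len(probs), len(probs[0])
    # Marginal p(y): average prediction over all generated images.
    p_y = [sum(p[j] for p in probs) / n for j in range(k)]
    kl_sum = 0.0
    for p in probs:
        kl_sum += sum(pj * (math.log(pj + eps) - math.log(qj + eps))
                      for pj, qj in zip(p, p_y) if pj > 0)
    return math.exp(kl_sum / n)

# Confident and diverse: 3 images, each a different class -> IS = 3
print(round(inception_score([[1, 0, 0], [0, 1, 0], [0, 0, 1]]), 3))
# Confident but collapsed: every image is the same class -> IS = 1
print(round(inception_score([[1, 0, 0]] * 3), 3))
```

The maximum possible IS equals the number of classes, reached only when every image is confidently classified and all classes are equally represented.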

Evaluation metrics

▪ Fréchet Inception Distance (FID) [Heusel et al., 2017]

■ Calculates the distance between real and fake data (lower is better)

■ Uses the embeddings of the real and fake data from the last pooling layer of Inception v3

■ Models each set of embeddings as a multivariate Gaussian and uses the mean and covariance of each to calculate the Fréchet distance between them
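The Fréchet distance between two Gaussians is ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}). The sketch below is a simplification, not the full metric: it assumes diagonal covariances, so the trace term reduces to a sum of squared differences of per-dimension standard deviations. The full FID uses complete covariance matrices of Inception v3 features and a matrix square root (for example `scipy.linalg.sqrtm`). Function names are hypothetical.

```python
import math

def mean_and_var(rows):
    """Per-dimension mean and (population) variance of a list of feature vectors."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    var = [sum((r[j] - mean[j]) ** 2 for r in rows) / n for j in range(d)]
    return mean, var

def fid_diagonal(real_feats, fake_feats):
    """Frechet distance between Gaussians fitted to two sets of embeddings,
    assuming diagonal covariances: Tr(S_r + S_g - 2 (S_r S_g)^(1/2))
    becomes sum_j (sd_r,j - sd_g,j)^2."""
    mu_r, var_r = mean_and_var(real_feats)
    mu_g, var_g = mean_and_var(fake_feats)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_g))
    cov_term = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var_r, var_g))
    return mean_term + cov_term

real = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
shifted = [[x + 1.0 for x in row] for row in real]  # same spread, mean moved by 1 per dim
print(fid_diagonal(real, real))     # 0.0
print(fid_diagonal(real, shifted))  # 2.0
```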

Evaluation metrics

▪ IS vs FID

✓ FID takes the real dataset into account

✓ FID requires fewer samples (faster): ~10k instead of 50k for IS

✓ FID is more robust to noise and more consistent with human judgement

✓ FID is also sensitive to mode collapse

Figure: FID (lower is better) vs. IS (higher is better)

Images from Lucic et al., 2017 and Heusel et al., 2017
Practical scenario

▪ MNIST (handwritten digits dataset)

▪ Condition the generation so that each row shows a chosen digit

Figure: samples from a GAN vs. a CGAN

https://github.com/gftm/Class_Generative_Networks
Practical scenario

▪ Task 1 - Add label as input to both models (plus the combined model)

▪ Task 2 - Get labels (y) from dataset

▪ Task 3 - Add labels to the models’ losses

▪ Task 4 - Generate specific numbers for each row

https://github.com/gftm/Class_Generative_Networks

Practical scenario

▪ Task 1 - Add label as input to both models (plus the combined model)

■ def __init__(self):

Practical scenario

▪ Task 1 - Add label as input to both models (plus the combined model)

■ def build_generator(self):

■ def build_discriminator(self):

Practical scenario

▪ Task 2 - Get labels (y) from dataset

■ def train(self, epochs, batch_size=128, sample_interval=50):

Practical scenario

▪ Task 3 - Add labels to the models’ losses

■ def train(self, epochs, batch_size=128, sample_interval=50):
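Tasks 2 and 3 amount to carrying y through every part of a training step. The sketch below is framework-free and hypothetical (it is not the code from the linked repository): `cgan_training_inputs` only assembles the (inputs, targets) pairs that D and G would be trained on, and `generator` is any callable taking noise and labels.

```python
import random

def cgan_training_inputs(x_train, y_train, generator, batch_size, latent_dim,
                         num_classes, rng):
    """Assemble (inputs, targets) for one CGAN training step."""
    # Task 2: draw a random batch of real images together with their labels y.
    idx = [rng.randrange(len(x_train)) for _ in range(batch_size)]
    imgs = [x_train[i] for i in idx]
    labels = [y_train[i] for i in idx]
    noise = [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
             for _ in range(batch_size)]
    gen_imgs = generator(noise, labels)  # fakes conditioned on the same labels
    valid, fake = [1.0] * batch_size, [0.0] * batch_size
    # Task 3: the label enters every loss term, so D judges (sample, y) pairs.
    d_real = ((imgs, labels), valid)
    d_fake = ((gen_imgs, labels), fake)
    # G is trained, through D, so that fakes for randomly sampled y look real.
    sampled_labels = [rng.randrange(num_classes) for _ in range(batch_size)]
    g_step = ((noise, sampled_labels), valid)
    return d_real, d_fake, g_step

def dummy_generator(noise, labels):
    """Toy stand-in for G: returns a 1-pixel 'image' equal to the label."""
    return [[float(y)] for y in labels]

rng = random.Random(0)
x_train = [[0.1 * i] for i in range(10)]
y_train = list(range(10))
d_real, d_fake, g_step = cgan_training_inputs(
    x_train, y_train, dummy_generator, 4, 100, 10, rng)
```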

Practical scenario

▪ Task 4 - Generate specific numbers for each row

■ def sample_images(self, epoch):
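For Task 4, the only CGAN-specific step is choosing the labels fed to the generator when drawing the sample grid. A minimal sketch with hypothetical helpers: `row_labels` gives every cell in row r the digit `row_digits[r]`, and `sampling_inputs` pairs those labels with fresh noise vectors for the generator.

```python
import random

def row_labels(row_digits, cols):
    """One label per grid cell, row by row: row r shows digit row_digits[r]."""
    return [d for d in row_digits for _ in range(cols)]

def sampling_inputs(row_digits, cols, latent_dim, rng):
    """Noise vectors plus matching labels for a rows x cols sample grid."""
    labels = row_labels(row_digits, cols)
    noise = [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)] for _ in labels]
    return noise, labels

noise, labels = sampling_inputs([3, 7], cols=5, latent_dim=100, rng=random.Random(0))
print(labels)  # [3, 3, 3, 3, 3, 7, 7, 7, 7, 7]
```

Passing `range(10)` as `row_digits` yields a 10-row grid with one row per digit.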

References

■ A. Radford, L. Metz, and S. Chintala. 2016. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks.
■ I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. 2014. Generative Adversarial Nets.
■ M. Mirza and S. Osindero. 2014. Conditional Generative Adversarial Nets.
■ J. Gauthier. 2015. Conditional Generative Adversarial Nets for Convolutional Face Generation.
■ T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen. 2016. Improved Techniques for Training GANs.
■ M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter. 2017. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium.
■ T. Karras, T. Aila, S. Laine, and J. Lehtinen. 2017. Progressive Growing of GANs for Improved Quality, Stability, and Variation.
■ C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. 2015. Rethinking the Inception Architecture for Computer Vision.
