The document discusses generative models and summarizes three popular types: PixelRNN/CNN, variational autoencoders (VAEs), and generative adversarial networks (GANs). PixelRNN/CNN are fully visible belief networks that use a neural network to model the probability of each pixel given all previous pixels, explicitly defining a tractable data distribution. VAEs introduce a latent representation to define an explicit but intractable density, and are trained by maximizing a variational lower bound on the likelihood. GANs are implicit density models that train a generator and a discriminator in an adversarial game to produce samples from the data distribution.
2. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Administrative
2
● A3 is out. Due May 25.
● Milestone was due May 10th.
○ Read the website page for milestone requirements.
○ You need to finish data preprocessing and have initial results by then.
● Midterm and A2 grades will be out this week
3. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Supervised vs Unsupervised Learning
3
Supervised Learning
Data: (x, y)
x is data, y is label
Goal: Learn a function to map x -> y
Examples: Classification,
regression, object detection,
semantic segmentation, image
captioning, etc.
4. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Supervised vs Unsupervised Learning
4
Supervised Learning
Data: (x, y)
x is data, y is label
Goal: Learn a function to map x -> y
Examples: Classification,
regression, object detection,
semantic segmentation, image
captioning, etc.
Cat
Classification
This image is CC0 public domain
5. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Supervised vs Unsupervised Learning
5
Supervised Learning
Data: (x, y)
x is data, y is label
Goal: Learn a function to map x -> y
Examples: Classification,
regression, object detection,
semantic segmentation, image
captioning, etc.
Image captioning
A cat sitting on a suitcase on the floor
Caption generated using neuraltalk2
Image is CC0 Public domain.
6. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Supervised vs Unsupervised Learning
6
Supervised Learning
Data: (x, y)
x is data, y is label
Goal: Learn a function to map x -> y
Examples: Classification,
regression, object detection,
semantic segmentation, image
captioning, etc.
DOG, DOG, CAT
This image is CC0 public domain
Object Detection
7. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Supervised vs Unsupervised Learning
7
Supervised Learning
Data: (x, y)
x is data, y is label
Goal: Learn a function to map x -> y
Examples: Classification,
regression, object detection,
semantic segmentation, image
captioning, etc.
Semantic Segmentation
GRASS, CAT,
TREE, SKY
8. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
8
Unsupervised Learning
Data: x
Just data, no labels!
Goal: Learn some underlying
hidden structure of the data
Examples: Clustering,
dimensionality reduction, feature
learning, density estimation, etc.
Supervised vs Unsupervised Learning
9. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
9
Unsupervised Learning
Data: x
Just data, no labels!
Goal: Learn some underlying
hidden structure of the data
Examples: Clustering,
dimensionality reduction, density
estimation, etc.
Supervised vs Unsupervised Learning
K-means clustering
This image is CC0 public domain
10. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
10
Unsupervised Learning
Data: x
Just data, no labels!
Goal: Learn some underlying
hidden structure of the data
Examples: Clustering,
dimensionality reduction, density
estimation, etc.
Supervised vs Unsupervised Learning
Principal Component Analysis
(Dimensionality reduction)
This image from Matthias Scholz
is CC0 public domain
(3-d → 2-d)
11. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
11
Unsupervised Learning
Data: x
Just data, no labels!
Goal: Learn some underlying
hidden structure of the data
Examples: Clustering,
dimensionality reduction, density
estimation, etc.
Supervised vs Unsupervised Learning
2-d density estimation
2-d density images left and right
are CC0 public domain
1-d density estimation
Figure copyright Ian Goodfellow, 2016. Reproduced with permission.
Modeling p(x)
12. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Unsupervised Learning
Data: x
Just data, no labels!
Goal: Learn some underlying
hidden structure of the data
Examples: Clustering,
dimensionality reduction, density
estimation, etc.
12
Supervised vs Unsupervised Learning
Supervised Learning
Data: (x, y)
x is data, y is label
Goal: Learn a function to map x -> y
Examples: Classification,
regression, object detection,
semantic segmentation, image
captioning, etc.
13. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Generative Modeling
13
Training data ~ p_data(x)
Given training data, generate new samples from the same distribution
Objectives:
1. Learn p_model(x) that approximates p_data(x)
2. Sample new x from p_model(x)
(Diagram: learning p_model(x) from data, then sampling new x from it)
14. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Generative Modeling
14
Training data ~ p_data(x)
Given training data, generate new samples from the same distribution
Formulate as density estimation problems:
- Explicit density estimation: explicitly define and solve for p_model(x)
- Implicit density estimation: learn a model that can sample from p_model(x) without explicitly defining it.
15. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Why Generative Models?
15
- Realistic samples for artwork, super-resolution, colorization, etc.
- Learn useful features for downstream tasks such as classification.
- Getting insights from high-dimensional data (physics, medical imaging, etc.)
- Modeling physical world for simulation and planning (robotics and
reinforcement learning applications)
- Many more ...
Figures from L-R are copyright: (1) Alec Radford et al. 2016; (2) Phillip Isola et al. 2017, reproduced with the authors' permission; (3) BAIR Blog.
16. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Taxonomy of Generative Models
16
Generative models
├── Explicit density
│   ├── Tractable density: Fully Visible Belief Nets
│   │   (NADE, MADE, PixelRNN/CNN, NICE / RealNVP, Glow, Ffjord)
│   └── Approximate density
│       ├── Variational: Variational Autoencoder
│       └── Markov Chain: Boltzmann Machine
└── Implicit density
    ├── Markov Chain: GSN
    └── Direct: GAN
Figure copyright and adapted from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017.
17. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Taxonomy of Generative Models
17
Generative models
├── Explicit density
│   ├── Tractable density: Fully Visible Belief Nets
│   │   (NADE, MADE, PixelRNN/CNN, NICE / RealNVP, Glow, Ffjord)
│   └── Approximate density
│       ├── Variational: Variational Autoencoder
│       └── Markov Chain: Boltzmann Machine
└── Implicit density
    ├── Markov Chain: GSN
    └── Direct: GAN
Figure copyright and adapted from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017.
Today: discuss the 3 most popular types of generative models
18. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
18
PixelRNN and PixelCNN
(A very brief overview)
19. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
19
Fully visible belief network (FVBN)
Explicit density model
Likelihood of image x = joint likelihood of each pixel in the image:
p(x) = p(x_1, x_2, ..., x_n)
20. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
20
Fully visible belief network (FVBN)
Explicit density model
Use chain rule to decompose likelihood of an image x into a product of 1-d distributions:
p(x) = ∏_{i=1}^{n} p(x_i | x_1, ..., x_{i-1})
(left side: likelihood of image x; each factor: probability of the i'th pixel value given all previous pixels)
Then maximize likelihood of training data
21. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Then maximize likelihood of training data
Fully visible belief network (FVBN)
Explicit density model
Use chain rule to decompose likelihood of an image x into a product of 1-d distributions:
p(x) = ∏_{i=1}^{n} p(x_i | x_1, ..., x_{i-1})
(left side: likelihood of image x; each factor: probability of the i'th pixel value given all previous pixels)
Complex distribution over pixel values => express it using a neural network!
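To make the factorization concrete, here is a minimal sketch of evaluating log p(x) as a sum of per-pixel conditionals. The single linear layer stands in for the PixelRNN/CNN, and all names and sizes are illustrative, not from the lecture:

```python
import torch
import torch.nn as nn

n_pixels, n_values = 28 * 28, 256     # e.g. a flattened MNIST-sized image
net = nn.Linear(n_pixels, n_values)   # stand-in for the autoregressive network

def log_likelihood(x):
    """x: (batch, n_pixels) integer pixel values in [0, 255]."""
    total = torch.zeros(x.shape[0])
    for i in range(n_pixels):
        context = torch.zeros(x.shape[0], n_pixels)
        context[:, :i] = x[:, :i].float() / 255.0        # pixel i sees only x_1..x_{i-1}
        log_probs = torch.log_softmax(net(context), -1)  # log p(x_i | x_<i)
        total = total + log_probs.gather(1, x[:, i:i+1]).squeeze(1)
    return total                                         # sum_i log p(x_i | x_<i)

x = torch.randint(0, 256, (4, n_pixels))
print(log_likelihood(x).shape)        # torch.Size([4]): one log-likelihood per image
```

Training would maximize this quantity (equivalently, minimize the summed per-pixel cross-entropy) over the training set.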
23. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
PixelRNN
23
Generate image pixels starting from corner
Dependency on previous pixels modeled
using an RNN (LSTM)
[van den Oord et al. 2016]
24. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
PixelRNN
24
Generate image pixels starting from corner
Dependency on previous pixels modeled
using an RNN (LSTM)
[van den Oord et al. 2016]
25. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
PixelRNN
25
Generate image pixels starting from corner
Dependency on previous pixels modeled
using an RNN (LSTM)
[van den Oord et al. 2016]
26. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
PixelRNN
26
Generate image pixels starting from corner
Dependency on previous pixels modeled
using an RNN (LSTM)
[van den Oord et al. 2016]
Drawback: sequential generation is slow
in both training and inference!
27. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
PixelCNN
27
[van den Oord et al. 2016]
Still generate image pixels starting from
corner
Dependency on previous pixels now
modeled using a CNN over context region
(masked convolution)
Figure copyright van den Oord et al., 2016. Reproduced with permission.
28. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
PixelCNN
28
[van den Oord et al. 2016]
Still generate image pixels starting from
corner
Dependency on previous pixels now
modeled using a CNN over context region
(masked convolution)
Figure copyright van den Oord et al., 2016. Reproduced with permission.
Training is faster than PixelRNN
(can parallelize convolutions since context region
values known from training images)
Generation is still slow:
For a 32x32 image, we need 1024 sequential forward passes of the network to generate a single image
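The masked convolution itself is simple to sketch: zero out the kernel weights that would let a pixel see itself or any "future" pixel. This is a minimal single-channel version of the idea from van den Oord et al. 2016; mask types A/B and channel ordering are simplified away:

```python
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """Conv whose receptive field covers only pixels above / to the left."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        kH, kW = self.kernel_size
        mask = torch.ones(kH, kW)
        mask[kH // 2, kW // 2:] = 0      # current pixel and everything to its right
        mask[kH // 2 + 1:, :] = 0        # every row below the current pixel
        self.register_buffer("mask", mask[None, None])  # shape (1, 1, kH, kW)

    def forward(self, x):
        self.weight.data *= self.mask    # zero "future" weights before convolving
        return super().forward(x)

conv = MaskedConv2d(1, 16, kernel_size=5, padding=2)
features = conv(torch.randn(8, 1, 32, 32))  # per-pixel features from context only
```

During training all context pixels come from the real image, so every position's conditional can be computed in one parallel pass; at generation time the per-pixel passes remain sequential.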
29. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Generation Samples
29
Figures copyright Aaron van den Oord et al., 2016. Reproduced with permission.
32x32 CIFAR-10 32x32 ImageNet
30. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
30
PixelRNN and PixelCNN
Improving PixelCNN performance
- Gated convolutional layers
- Short-cut connections
- Discretized logistic loss
- Multi-scale
- Training tricks
- Etc…
See
- van den Oord et al., NIPS 2016
- Salimans et al. 2017
(PixelCNN++)
Pros:
- Can explicitly compute likelihood
p(x)
- Easy to optimize
- Good samples
Con:
- Sequential generation => slow
31. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Taxonomy of Generative Models
31
Generative models
Explicit density Implicit density
Direct
Tractable density Approximate density
Markov Chain
Variational Markov Chain
Variational Autoencoder Boltzmann Machine
GSN
GAN
Figure copyright and adapted from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017.
Fully Visible Belief Nets
- NADE
- MADE
- PixelRNN/CNN
- NICE / RealNVP
- Glow
- Ffjord
33. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
33
PixelRNN/CNNs define tractable density function, optimize likelihood of training data:
p_θ(x) = ∏_{i=1}^{n} p_θ(x_i | x_1, ..., x_{i-1})
So far...
34. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
So far...
34
Variational Autoencoders (VAEs) define intractable density function with latent z:
p_θ(x) = ∫ p_θ(z) p_θ(x|z) dz
Cannot optimize directly, derive and optimize lower bound on likelihood instead
No dependencies among pixels, can generate all pixels at the same time!
PixelRNN/CNNs define tractable density function, optimize likelihood of training data:
p_θ(x) = ∏_{i=1}^{n} p_θ(x_i | x_1, ..., x_{i-1})
35. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
So far...
35
Variational Autoencoders (VAEs) define intractable density function with latent z:
p_θ(x) = ∫ p_θ(z) p_θ(x|z) dz
Cannot optimize directly, derive and optimize lower bound on likelihood instead
No dependencies among pixels, can generate all pixels at the same time!
Why latent z?
PixelRNN/CNNs define tractable density function, optimize likelihood of training data:
p_θ(x) = ∏_{i=1}^{n} p_θ(x_i | x_1, ..., x_{i-1})
36. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Some background first: Autoencoders
36
Unsupervised approach for learning a lower-dimensional feature representation
from unlabeled training data
Encoder
Input data
Features
Decoder
37. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Some background first: Autoencoders
37
Input data
Features
Unsupervised approach for learning a lower-dimensional feature representation
from unlabeled training data
z usually smaller than x
(dimensionality reduction)
Q: Why dimensionality
reduction?
Decoder
Encoder
38. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Some background first: Autoencoders
38
Input data
Features
Unsupervised approach for learning a lower-dimensional feature representation
from unlabeled training data
z usually smaller than x
(dimensionality reduction)
Decoder
Encoder
Q: Why dimensionality
reduction?
A: Want features to
capture meaningful
factors of variation in
data
39. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Some background first: Autoencoders
39
Encoder
Input data
Features
How to learn this feature
representation?
Train such that features
can be used to
reconstruct original data
“Autoencoding” -
encoding input itself
Decoder
Reconstructed
input data
Reconstructed data
Encoder: 4-layer conv
Decoder: 4-layer upconv
Input data
40. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Some background first: Autoencoders
40
Encoder
Input data
Features
Decoder
Reconstructed data
Input data
Encoder: 4-layer conv
Decoder: 4-layer upconv
L2 loss function: ||x - x̂||²
Train such that features
can be used to
reconstruct original data
Doesn’t use labels!
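As a concrete (hypothetical) version of this slide's setup, here is a tiny conv encoder / upconv decoder trained with the L2 reconstruction loss; layer counts and sizes are illustrative, not the lecture's exact architecture:

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),            # 32x32 -> 16x16
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),           # 16x16 -> 8x8
)
decoder = nn.Sequential(
    nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),  # 8x8 -> 16x16
    nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),              # 16x16 -> 32x32
)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.randn(8, 1, 32, 32)        # a minibatch of images; no labels anywhere
x_hat = decoder(encoder(x))          # reconstruct input from features z
loss = ((x - x_hat) ** 2).mean()     # L2 loss: ||x - x_hat||^2
opt.zero_grad(); loss.backward(); opt.step()
```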
41. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Some background first: Autoencoders
41
Encoder
Input data
Features
Decoder
Reconstructed
input data
After training,
throw away decoder
42. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Some background first: Autoencoders
42
Encoder
Input data
Features
Classifier
Predicted Label
Fine-tune
encoder
jointly with
classifier
Loss function
(Softmax, etc)
Encoder can be
used to initialize a
supervised model
plane
dog deer
bird
truck
Train for final task
(sometimes with
small data)
Transfer from large, unlabeled
dataset to small, labeled dataset.
43. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Some background first: Autoencoders
43
Encoder
Input data
Features
Decoder
Reconstructed
input data
Autoencoders can reconstruct
data, and can learn features to
initialize a supervised model
Features capture factors of
variation in training data.
But we can’t generate new
images from an autoencoder
because we don’t know the
space of z.
How do we make autoencoder a
generative model?
44. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
44
Variational Autoencoders
Probabilistic spin on autoencoders - will let us sample from the model to generate data!
45. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
45
Sample from
true prior
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
Variational Autoencoders
Assume training data is generated from the distribution of unobserved (latent)
representation z
Probabilistic spin on autoencoders - will let us sample from the model to generate data!
Sample from
true conditional
46. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
46
Sample from
true prior
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
Variational Autoencoders
Assume training data is generated from the distribution of unobserved (latent)
representation z
Probabilistic spin on autoencoders - will let us sample from the model to generate data!
Sample from
true conditional
Intuition (remember from autoencoders!):
x is an image, z is latent factors used to
generate x: attributes, orientation, etc.
47. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
47
Sample from
true prior
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
Variational Autoencoders
Sample from
true conditional
We want to estimate the true parameters
of this generative model given training data x.
48. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
48
Sample from
true prior
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
Variational Autoencoders
Sample from
true conditional
We want to estimate the true parameters
of this generative model given training data x.
How should we represent this model?
49. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
49
Sample from
true prior
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
Variational Autoencoders
Sample from
true conditional
We want to estimate the true parameters
of this generative model given training data x.
How should we represent this model?
Choose prior p(z) to be simple, e.g.
Gaussian. Reasonable for latent attributes,
e.g. pose, how much smile.
50. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
50
Sample from
true prior
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
Variational Autoencoders
Sample from
true conditional
We want to estimate the true parameters
of this generative model given training data x.
How should we represent this model?
Choose prior p(z) to be simple, e.g.
Gaussian. Reasonable for latent attributes,
e.g. pose, how much smile.
Conditional p(x|z) is complex (generates
image) => represent with neural network
Decoder
network
51. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
51
Sample from
true prior
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
Variational Autoencoders
Sample from
true conditional
We want to estimate the true parameters
of this generative model given training data x.
How to train the model?
Decoder
network
52. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
52
Sample from
true prior
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
Variational Autoencoders
Sample from
true conditional
We want to estimate the true parameters
of this generative model given training data x.
How to train the model?
Learn model parameters to maximize likelihood
of training data
Decoder
network
53. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
53
Sample from
true prior
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
Variational Autoencoders
Sample from
true conditional
We want to estimate the true parameters
of this generative model given training data x.
How to train the model?
Learn model parameters to maximize likelihood
of training data
Q: What is the problem with this?
Intractable!
Decoder
network
54. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
54
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
Variational Autoencoders: Intractability
Data likelihood: p_θ(x) = ∫ p_θ(z) p_θ(x|z) dz
57. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
57
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
Variational Autoencoders: Intractability
Data likelihood: p_θ(x) = ∫ p_θ(z) p_θ(x|z) dz
Intractable to compute p(x|z) for every z! ❌ (prior p_θ(z) ✔ and decoder output p_θ(x|z) ✔ are fine)
58. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
58
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
Variational Autoencoders: Intractability
Data likelihood: p_θ(x) = ∫ p_θ(z) p_θ(x|z) dz
Intractable to compute p(x|z) for every z! ❌ (prior p_θ(z) ✔ and decoder output p_θ(x|z) ✔ are fine)
Monte Carlo estimation is too high variance
59. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
59
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
Variational Autoencoders: Intractability
Data likelihood: p_θ(x) = ∫ p_θ(z) p_θ(x|z) dz — intractable ❌ (prior ✔ and conditional ✔ are fine)
Posterior density: p_θ(z|x) = p_θ(x|z) p_θ(z) / p_θ(x) — also intractable, due to the intractable data likelihood in the denominator
60. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
60
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
Variational Autoencoders: Intractability
Data likelihood: p_θ(x) = ∫ p_θ(z) p_θ(x|z) dz
Posterior density also intractable: p_θ(z|x) = p_θ(x|z) p_θ(z) / p_θ(x)
Solution: In addition to modeling p_θ(x|z), learn q_ϕ(z|x) that approximates the true posterior p_θ(z|x).
We will see that the approximate posterior allows us to derive a lower bound on the data likelihood that is tractable, which we can optimize.
Variational inference approximates the unknown posterior distribution from only the observed data x.
62. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
62
Variational Autoencoders
Taking expectation wrt. z
(using encoder network) will
come in handy later
66. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
66
Variational Autoencoders
The expectation wrt. z (using
encoder network) let us write
nice KL terms
67. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
67
Variational Autoencoders
This KL term (between Gaussians for encoder and z prior) has nice closed-form solution!
p_θ(z|x) intractable (saw earlier), can't compute this KL term :( But we know KL divergence always >= 0.
Decoder network gives p_θ(x|z), can compute estimate of this term through sampling (need some trick to differentiate through sampling).
68. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
68
Variational Autoencoders
We want to maximize the data likelihood
This KL term (between Gaussians for encoder and z prior) has nice closed-form solution!
p_θ(z|x) intractable (saw earlier), can't compute this KL term :( But we know KL divergence always >= 0.
Decoder network gives p_θ(x|z), can compute estimate of this term through sampling.
69. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
69
Variational Autoencoders
Tractable lower bound which we can take gradient of and optimize! (p_θ(x|z) differentiable, KL term differentiable)
We want to maximize the data likelihood
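The bound on these slides appeared as an image; for reference, the standard decomposition from Kingma and Welling is:

$$
\log p_\theta(x) \;=\; \underbrace{\mathbb{E}_{z \sim q_\phi(z|x)}\big[\log p_\theta(x|z)\big] \;-\; D_{KL}\big(q_\phi(z|x) \,\|\, p_\theta(z)\big)}_{\text{tractable ELBO } \mathcal{L}(x;\theta,\phi)} \;+\; \underbrace{D_{KL}\big(q_\phi(z|x) \,\|\, p_\theta(z|x)\big)}_{\ge 0,\ \text{intractable}} \;\ge\; \mathcal{L}(x;\theta,\phi)
$$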
70. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
70
Variational Autoencoders
Tractable lower bound which we can take gradient of and optimize! (p_θ(x|z) differentiable, KL term differentiable)
Decoder:
reconstruct
the input data
Encoder:
make approximate
posterior distribution
close to prior
71. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
71
Variational Autoencoders
Putting it all together: maximizing the
likelihood lower bound
72. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
72
Input Data
Variational Autoencoders
Putting it all together: maximizing the
likelihood lower bound
Let’s look at computing the KL
divergence between the estimated
posterior and the prior given some data
73. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
73
Encoder network
Input Data
Variational Autoencoders
Putting it all together: maximizing the
likelihood lower bound
74. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
74
Encoder network
Input Data
Variational Autoencoders
Putting it all together: maximizing the
likelihood lower bound
Make approximate
posterior distribution
close to prior
Have analytical solution
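The analytical solution referred to here (shown as an image on the slide) is the closed-form KL between the diagonal-Gaussian posterior and the standard-normal prior:

$$
D_{KL}\big(\mathcal{N}(\mu, \operatorname{diag}(\sigma^2)) \,\|\, \mathcal{N}(0, I)\big) \;=\; \frac{1}{2} \sum_{j=1}^{J} \big(\mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1\big)
$$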
75. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
75
Encoder network
Sample z from
Input Data
Variational Autoencoders
Putting it all together: maximizing the
likelihood lower bound
Make approximate
posterior distribution
close to prior
Not part of the computation graph!
76. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
76
Encoder network
Sample z from
Input Data
Variational Autoencoders
Putting it all together: maximizing the
likelihood lower bound
Reparameterization trick to make sampling differentiable:
Sample ε ~ N(0, I), then set z = μ_{z|x} + σ_{z|x} ⊙ ε
77. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
77
Encoder network
Sample z from
Input Data
Variational Autoencoders
Putting it all together: maximizing the
likelihood lower bound
Reparameterization trick to make sampling differentiable:
Sample ε ~ N(0, I) — the input to the graph; z = μ_{z|x} + σ_{z|x} ⊙ ε is part of the computation graph
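In code, the trick is a couple of lines; a sketch with illustrative shapes (an encoder would normally produce mu and log_var):

```python
import torch

mu = torch.zeros(8, 32)                   # encoder output (illustrative)
log_var = torch.zeros(8, 32)              # encoder output (illustrative)
eps = torch.randn_like(mu)                # input to the graph, not part of it
z = mu + torch.exp(0.5 * log_var) * eps   # differentiable w.r.t. mu and log_var
```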
78. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
78
Encoder network
Decoder network
Sample z from
Input Data
Variational Autoencoders
Putting it all together: maximizing the
likelihood lower bound
79. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
79
Encoder network
Decoder network
Sample z from
Input Data
Variational Autoencoders
Putting it all together: maximizing the
likelihood lower bound
Maximize likelihood of original
input being reconstructed
80. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
80
Encoder network
Decoder network
Sample z from
Input Data
Variational Autoencoders
Putting it all together: maximizing the
likelihood lower bound
For every minibatch of input
data: compute this forward
pass, and then backprop!
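Putting the whole forward pass and backprop into one minimal sketch (diagonal-Gaussian encoder, standard-normal prior, and the decoder likelihood reduced to an L2-style reconstruction term; all networks and shapes are illustrative):

```python
import torch
import torch.nn as nn

x_dim, z_dim = 784, 32
enc = nn.Linear(x_dim, 2 * z_dim)    # outputs [mu, log_var]
dec = nn.Linear(z_dim, x_dim)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

x = torch.rand(64, x_dim)                                  # minibatch of inputs
mu, log_var = enc(x).chunk(2, dim=-1)
z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)   # reparameterized sample
x_hat = dec(z)

recon = ((x - x_hat) ** 2).sum(dim=-1)    # stands in for -E[log p(x|z)]
kl = 0.5 * (mu**2 + log_var.exp() - log_var - 1).sum(dim=-1)  # closed-form KL
loss = (recon + kl).mean()                # negative lower bound, minimized
opt.zero_grad(); loss.backward(); opt.step()
```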
81. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
81
Variational Autoencoders: Generating Data!
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
Sample from
true prior
Sample from
true conditional
Decoder
network
Our assumption about data generation
process
82. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
82
Variational Autoencoders: Generating Data!
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
Sample from
true prior
Sample from
true conditional
Decoder
network
Our assumption about data generation
process
Decoder network
Sample z from
Sample x|z from
Now given a trained VAE:
use decoder network & sample z from prior!
83. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
83
Decoder network
Sample z from
Sample x|z from
Variational Autoencoders: Generating Data!
Use decoder network. Now sample z from prior!
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
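Generation then needs only a few lines, reusing dec and z_dim from the training sketch above:

```python
import torch

with torch.no_grad():
    z = torch.randn(16, z_dim)   # sample z from the prior N(0, I)
    x_new = dec(z)               # sample x|z via the decoder network
```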
84. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
84
Decoder network
Sample z from
Sample x|z from
Variational Autoencoders: Generating Data!
Use decoder network. Now sample z from prior! Data manifold for 2-d z
Vary z1
Vary z2
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
85. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
85
Variational Autoencoders: Generating Data!
Vary z1
Vary z2
Degree of smile
Head pose
Diagonal prior on z => independent latent variables
Different dimensions of z encode interpretable factors of variation
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
86. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
86
Variational Autoencoders: Generating Data!
Vary z1
Vary z2
Degree of smile
Head pose
Diagonal prior on z => independent latent variables
Different dimensions of z encode interpretable factors of variation
Also good feature representation that can be computed using q_ϕ(z|x)!
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
87. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
87
Variational Autoencoders: Generating Data!
32x32 CIFAR-10
Labeled Faces in the Wild
Figures copyright (L) Durk Kingma et al. 2016; (R) Anders Larsen et al. 2017. Reproduced with permission.
88. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Variational Autoencoders
88
Probabilistic spin to traditional autoencoders => allows generating data
Defines an intractable density => derive and optimize a (variational) lower bound
Pros:
- Principled approach to generative models
- Interpretable latent space.
- Allows inference of q(z|x), can be useful feature representation for other tasks
Cons:
- Maximizes lower bound of likelihood: okay, but not as good evaluation as
PixelRNN/PixelCNN
- Samples blurrier and lower quality compared to state-of-the-art (GANs)
Active areas of research:
- More flexible approximations, e.g. richer approximate posterior instead of diagonal
Gaussian, e.g., Gaussian Mixture Models (GMMs), Categorical Distributions.
- Learning disentangled representations.
89. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Taxonomy of Generative Models
89
Generative models
├── Explicit density
│   ├── Tractable density: Fully Visible Belief Nets
│   │   (NADE, MADE, PixelRNN/CNN, NICE / RealNVP, Glow, Ffjord)
│   └── Approximate density
│       ├── Variational: Variational Autoencoder
│       └── Markov Chain: Boltzmann Machine
└── Implicit density
    ├── Markov Chain: GSN
    └── Direct: GAN
Figure copyright and adapted from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017.
91. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
So far...
91
VAEs define intractable density function with latent z:
p_θ(x) = ∫ p_θ(z) p_θ(x|z) dz
Cannot optimize directly, derive and optimize lower bound on likelihood instead
PixelRNN/CNNs define tractable density function, optimize likelihood of training data:
p_θ(x) = ∏_{i=1}^{n} p_θ(x_i | x_1, ..., x_{i-1})
92. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
So far...
VAEs define intractable density function with latent z:
p_θ(x) = ∫ p_θ(z) p_θ(x|z) dz
Cannot optimize directly, derive and optimize lower bound on likelihood instead
92
What if we give up on explicitly modeling density, and just want ability to sample?
PixelRNN/CNNs define tractable density function, optimize likelihood of training data:
p_θ(x) = ∏_{i=1}^{n} p_θ(x_i | x_1, ..., x_{i-1})
93. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
So far...
VAEs define intractable density function with latent z:
p_θ(x) = ∫ p_θ(z) p_θ(x|z) dz
Cannot optimize directly, derive and optimize lower bound on likelihood instead
93
What if we give up on explicitly modeling density, and just want ability to sample?
GANs: not modeling any explicit density function!
PixelRNN/CNNs define tractable density function, optimize likelihood of training data:
p_θ(x) = ∏_{i=1}^{n} p_θ(x_i | x_1, ..., x_{i-1})
94. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Generative Adversarial Networks
94
Ian Goodfellow et al., “Generative
Adversarial Nets”, NIPS 2014
Problem: Want to sample from complex, high-dimensional training distribution. No direct
way to do this!
Solution: Sample from a simple distribution we can easily sample from, e.g. random noise.
Learn transformation to training distribution.
95. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Problem: Want to sample from complex, high-dimensional training distribution. No direct
way to do this!
Solution: Sample from a simple distribution we can easily sample from, e.g. random noise.
Learn transformation to training distribution.
Generative Adversarial Networks
95
Ian Goodfellow et al., “Generative
Adversarial Nets”, NIPS 2014
z
Input: Random noise
Generator
Network
Output: Sample from
training distribution
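A generator in this sense is just a network mapping noise to data space; a minimal fully connected sketch (real GANs typically use transposed convolutions, and all sizes here are illustrative):

```python
import torch
import torch.nn as nn

z_dim = 100
G = nn.Sequential(
    nn.Linear(z_dim, 256), nn.ReLU(),
    nn.Linear(256, 784), nn.Tanh(),   # a flattened 28x28 "image" in [-1, 1]
)
z = torch.randn(8, z_dim)             # input: random noise
fake = G(z)                           # output: attempted sample from p_data
```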
96. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Problem: Want to sample from complex, high-dimensional training distribution. No direct
way to do this!
Solution: Sample from a simple distribution we can easily sample from, e.g. random noise.
Learn transformation to training distribution.
Generative Adversarial Networks
96
z
Input: Random noise
Generator
Network
Output: Sample from
training distribution
Ian Goodfellow et al., “Generative
Adversarial Nets”, NIPS 2014
But we don’t know which
sample z maps to which
training image -> can’t
learn by reconstructing
training images
97. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Problem: Want to sample from complex, high-dimensional training distribution. No direct
way to do this!
Solution: Sample from a simple distribution we can easily sample from, e.g. random noise.
Learn transformation to training distribution.
Generative Adversarial Networks
97
z
Input: Random noise
Generator
Network
Output: Sample from
training distribution
Ian Goodfellow et al., “Generative
Adversarial Nets”, NIPS 2014
But we don’t know which
sample z maps to which
training image -> can’t
learn by reconstructing
training images
Objective: generated
images should look “real”
98. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Problem: Want to sample from complex, high-dimensional training distribution. No direct
way to do this!
Solution: Sample from a simple distribution we can easily sample from, e.g. random noise.
Learn transformation to training distribution.
Generative Adversarial Networks
98
z
Input: Random noise
Generator
Network
Output: Sample from
training distribution
Ian Goodfellow et al., “Generative
Adversarial Nets”, NIPS 2014
But we don’t know which
sample z maps to which
training image -> can’t
learn by reconstructing
training images
Discriminator
Network
Real?
Fake?
Solution: Use a discriminator
network to tell whether the
generate image is within data
distribution (“real”) or not
gradient
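To make the noise-to-sample mapping concrete, here is a minimal, hypothetical PyTorch generator (the MLP architecture and sizes are illustrative assumptions, not the paper's):

```python
import torch
import torch.nn as nn

# Hypothetical generator: transforms random noise z into a 28x28 image.
noise_dim = 100
generator = nn.Sequential(
    nn.Linear(noise_dim, 256),
    nn.ReLU(),
    nn.Linear(256, 28 * 28),
    nn.Tanh(),  # outputs in [-1, 1], matching normalized training images
)

z = torch.randn(64, noise_dim)            # a batch of random noise vectors
fake = generator(z).view(64, 1, 28, 28)   # samples from the learned distribution
```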
99. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Training GANs: Two-player game
Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014
Discriminator network: tries to distinguish between real and fake images.
Generator network: tries to fool the discriminator by generating real-looking images.
Random noise z -> Generator Network -> fake images (from the generator); real images come from the training set. The Discriminator Network classifies each image as real or fake, and this decision is the learning signal for both the generator and the discriminator.
Fake and real images copyright Emily Denton et al. 2015. Reproduced with permission.
Train jointly in a minimax game. Minimax objective function:
$$\min_{\theta_g} \max_{\theta_d} \left[ \mathbb{E}_{x \sim p_{\text{data}}} \log D_{\theta_d}(x) + \mathbb{E}_{z \sim p(z)} \log\big(1 - D_{\theta_d}(G_{\theta_g}(z))\big) \right]$$
Here $D_{\theta_d}(x)$ is the discriminator output for real data x, and $D_{\theta_d}(G_{\theta_g}(z))$ is the discriminator output for generated fake data G(z); the discriminator outputs a likelihood in (0,1) of an image being real.
- The discriminator ($\theta_d$) wants to maximize the objective such that D(x) is close to 1 (real) and D(G(z)) is close to 0 (fake).
- The generator ($\theta_g$) wants to minimize the objective such that D(G(z)) is close to 1 (the discriminator is fooled into thinking the generated G(z) is real).
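In code, these two objectives are usually implemented as cross-entropy-style losses on the discriminator's outputs. A minimal PyTorch sketch of the losses implied by the minimax objective (function and variable names are illustrative assumptions, not from the paper):

```python
import torch

def discriminator_loss(d_real, d_fake):
    # Gradient *ascent* on  log D(x) + log(1 - D(G(z)))  is implemented as
    # gradient descent on its negation. d_real / d_fake are probabilities in (0, 1).
    return -(torch.log(d_real) + torch.log(1.0 - d_fake)).mean()

def generator_minimax_loss(d_fake):
    # The generator's original (saturating) objective: minimize log(1 - D(G(z))).
    return torch.log(1.0 - d_fake).mean()
```

In practice one works with raw logits and `torch.nn.functional.binary_cross_entropy_with_logits` for numerical stability; the sketch above keeps the direct form to match the equation.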
107. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Training GANs: Two-player game
Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014
Minimax objective function as above. Alternate between:
1. Gradient ascent on the discriminator:
$$\max_{\theta_d} \left[ \mathbb{E}_{x \sim p_{\text{data}}} \log D_{\theta_d}(x) + \mathbb{E}_{z \sim p(z)} \log\big(1 - D_{\theta_d}(G_{\theta_g}(z))\big) \right]$$
2. Gradient descent on the generator:
$$\min_{\theta_g} \mathbb{E}_{z \sim p(z)} \log\big(1 - D_{\theta_d}(G_{\theta_g}(z))\big)$$
In practice, optimizing this generator objective does not work well! When a sample is likely fake, we want to learn from it to improve the generator (move right along the D(G(z)) axis), but the gradient of log(1 - D(G(z))) in that region is relatively flat. The gradient signal is therefore dominated by the region where the sample is already good.
110. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Training GANs: Two-player game
Minimax objective function as above. Alternate between:
1. Gradient ascent on the discriminator (as before).
2. Instead: gradient ascent on the generator with a different objective:
$$\max_{\theta_g} \mathbb{E}_{z \sim p(z)} \log D_{\theta_d}(G_{\theta_g}(z))$$
Instead of minimizing the likelihood of the discriminator being correct, we now maximize the likelihood of the discriminator being wrong. This has the same goal of fooling the discriminator, but now there is a high gradient signal where samples are bad (D(G(z)) near 0) and a low gradient signal where samples are already good, so it works much better. This is the standard in practice.
Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014
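A quick calculation (not on the slide, but standard) makes the difference concrete. Writing $d = D(G(z))$, the two generator losses have derivatives

$$\frac{\partial}{\partial d}\,\log(1-d) = -\frac{1}{1-d}, \qquad \frac{\partial}{\partial d}\,\big(-\log d\big) = -\frac{1}{d}.$$

When the discriminator confidently rejects a sample ($d \approx 0$), the first gradient is small in magnitude (about $-1$) while the second blows up, so the non-saturating loss keeps pushing bad samples to improve.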
111. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Training GANs: Two-player game
Putting it together: GAN training algorithm. In each iteration, run k updates of the discriminator, then one update of the generator (see the sketch below).
Some find k = 1 more stable, others use k > 1; there is no best rule. Follow-up work (e.g., Wasserstein GAN, BEGAN) alleviates this problem and gives better stability!
Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014
Arjovsky et al., “Wasserstein GAN”, arXiv preprint arXiv:1701.07875 (2017)
Berthelot et al., “BEGAN: Boundary Equilibrium Generative Adversarial Networks”, arXiv preprint arXiv:1703.10717 (2017)
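A minimal sketch of that alternating loop in PyTorch, using the non-saturating generator loss from above (the names G, D, noise_dim, and all hyperparameters are illustrative assumptions; D is assumed to output raw logits of shape (batch, 1)):

```python
import torch
import torch.nn.functional as F

def train_gan(G, D, dataloader, opt_g, opt_d, noise_dim=100, k=1, epochs=10):
    for _ in range(epochs):
        for real, _ in dataloader:
            ones = torch.ones(real.size(0), 1)
            zeros = torch.zeros(real.size(0), 1)
            # k steps of gradient ascent on the discriminator.
            # (The original algorithm samples a fresh real minibatch for each of
            # the k steps; this sketch reuses one batch for simplicity.)
            for _ in range(k):
                z = torch.randn(real.size(0), noise_dim)  # reshape if G expects a 1x1 spatial map
                fake = G(z).detach()  # don't backprop into the generator here
                d_loss = (F.binary_cross_entropy_with_logits(D(real), ones)
                          + F.binary_cross_entropy_with_logits(D(fake), zeros))
                opt_d.zero_grad(); d_loss.backward(); opt_d.step()
            # One step on the generator: maximize log D(G(z)).
            z = torch.randn(real.size(0), noise_dim)
            g_loss = F.binary_cross_entropy_with_logits(D(G(z)), ones)
            opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```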
113. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Training GANs: Two-player game
Generator network: tries to fool the discriminator by generating real-looking images.
Discriminator network: tries to distinguish between real and fake images.
After training, use the generator network alone to generate new images from random noise z.
Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014
Fake and real images copyright Emily Denton et al. 2015. Reproduced with permission.
114. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Generative Adversarial Nets
Generated samples (and generated CIFAR-10 samples), each shown with the nearest neighbor from the training set for comparison.
Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014
Figures copyright Ian Goodfellow et al., 2014. Reproduced with permission.
116. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Generative Adversarial Nets: Convolutional Architectures
Radford et al, “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks”, ICLR 2016
Generator is an upsampling network with fractionally-strided convolutions
Discriminator is a convolutional network
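A minimal DCGAN-style generator sketch in PyTorch (channel counts and depths are illustrative assumptions loosely in the spirit of Radford et al., not an exact reproduction). Each transposed (fractionally-strided) convolution doubles the spatial resolution:

```python
import torch
import torch.nn as nn

# DCGAN-style generator: project 100-d noise to a 4x4 feature map, then
# upsample 4x4 -> 8x8 -> 16x16 -> 32x32 with transposed convolutions.
generator = nn.Sequential(
    nn.ConvTranspose2d(100, 256, kernel_size=4, stride=1, padding=0),  # 1x1 -> 4x4
    nn.BatchNorm2d(256), nn.ReLU(),
    nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),  # 4x4 -> 8x8
    nn.BatchNorm2d(128), nn.ReLU(),
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),   # 8x8 -> 16x16
    nn.BatchNorm2d(64), nn.ReLU(),
    nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),     # 16x16 -> 32x32
    nn.Tanh(),
)

z = torch.randn(8, 100, 1, 1)   # noise as a 1x1 spatial map
images = generator(z)           # shape (8, 3, 32, 32)
```

The discriminator is essentially the mirror image, built from strided Conv2d layers that downsample back to a single real/fake score.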
117. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Radford et al., ICLR 2016
Samples from the model look much better!
Generative Adversarial Nets: Convolutional Architectures
118. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Radford et al., ICLR 2016
Interpolating between random points in latent space
Generative Adversarial Nets: Convolutional Architectures
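Interpolation is easy to reproduce in code: linearly blend two latent vectors and decode each intermediate point. A minimal sketch assuming the generator sketched above (linear interpolation is the simplest choice; spherical interpolation is also common):

```python
import torch

# Decode a sequence of latents that morph from z0 to z1.
z0 = torch.randn(1, 100, 1, 1)
z1 = torch.randn(1, 100, 1, 1)
frames = [generator((1 - t) * z0 + t * z1) for t in torch.linspace(0, 1, 9)]
```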
119. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Generative Adversarial Nets: Interpretable Vector Math
Radford et al., ICLR 2016
Take samples from the model, average the z vectors that produced each concept, and do arithmetic on the averages:
Smiling woman - Neutral woman + Neutral man = Smiling man
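In code, this is just arithmetic on averaged latent vectors. A minimal sketch assuming the generator above and three small sets of z vectors whose samples were observed to show each attribute (all placeholders are hypothetical):

```python
import torch

# Placeholders for latent vectors collected by inspecting samples.
z_smiling_woman = torch.randn(16, 100, 1, 1)
z_neutral_woman = torch.randn(16, 100, 1, 1)
z_neutral_man = torch.randn(16, 100, 1, 1)

# Average each group, then do vector arithmetic in latent space.
z_new = (z_smiling_woman.mean(0, keepdim=True)
         - z_neutral_woman.mean(0, keepdim=True)
         + z_neutral_man.mean(0, keepdim=True))
smiling_man = generator(z_new)   # ideally decodes to a smiling man
```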
122. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Glasses man - No glasses man + No glasses woman = Woman with glasses
Radford et al., ICLR 2016
Generative Adversarial Nets: Interpretable Vector Math
123. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
2017: Explosion of GANs
“The GAN Zoo”: https://github.com/hindupuravinash/the-gan-zoo
See also https://github.com/soumith/ganhacks for tips and tricks for training GANs
124. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
2017: Explosion of GANs
Better training and generation: LSGAN (Mao et al., 2017), Wasserstein GAN (Arjovsky et al., 2017), Improved Wasserstein GAN (Gulrajani et al., 2017), Progressive GAN (Karras et al., 2018).
125. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
2017: Explosion of GANs
Many GAN applications:
- Source -> target domain transfer: CycleGAN (Zhu et al., 2017)
- Image-to-image translation: pix2pix (Isola et al., 2017); many examples at https://phillipi.github.io/pix2pix/
- Text -> image synthesis (Reed et al., 2017)
126. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
2019: BigGAN
Brock et al., 2019
127. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Scene graphs to GANs
Specifying exactly what kind of image you want to generate: the explicit structure in scene graphs provides better image generation for complex scenes.
Johnson et al., “Image Generation from Scene Graphs”, CVPR 2018
Figures copyright 2019. Reproduced with permission.
128. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
HYPE: Human eYe Perceptual Evaluations
hype.stanford.edu
Zhou, Gordon, Krishna et al. HYPE: Human eYe Perceptual Evaluations, NeurIPS 2019
Figures copyright 2019. Reproduced with permission.
129. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Summary: GANs
GANs don't work with an explicit density function. Instead, they take a game-theoretic approach: learn to generate from the training distribution through a two-player game.
Pros:
- Beautiful, state-of-the-art samples!
Cons:
- Trickier / more unstable to train
- Can't solve inference queries such as p(x) or p(z|x)
Active areas of research:
- Better loss functions and more stable training (Wasserstein GAN, LSGAN, many others)
- Conditional GANs and GANs for all kinds of applications
130. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Summary
Autoregressive models: PixelRNN, PixelCNN
Van den Oord et al., “Conditional Image Generation with PixelCNN Decoders”, NIPS 2016
Variational Autoencoders
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
Generative Adversarial Networks (GANs)
Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014
131. Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 13 - May 12, 2022
Useful Resources on Generative Models
CS 236: Deep Generative Models (Stanford)
CS 294-158 Deep Unsupervised Learning (Berkeley)