Deep Learning 2017 Lecture 7: GAN
Ian Goodfellow:
https://www.youtube.com/watch?v=YpdP_0-IEOw
Radford (voice generation is also covered here):
https://www.youtube.com/watch?v=KeJINHjyzOU
Tips for training GAN: https://github.com/soumith/ganhacks
Autoencoder
(Figure) An NN Encoder maps the input to a code, and an NN Decoder maps the code back to an output that should be as close as possible to the input.
(Figure) Idea for generation: randomly generate a vector, use it as the code, and feed it to the NN Decoder. Will the output be an image?
Autoencoder with 3 fully connected layers
Training: model.fit(X,X)
Cost function: Σ_{k=1}^{N} (x_k − x'_k)²
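A minimal runnable sketch of such an autoencoder, written here in Keras; the layer sizes (784 → 256 → 32 → 784), the optimizer, and the random placeholder data are illustrative assumptions rather than values from the lecture:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

input_dim, code_dim = 784, 32                                    # e.g. flattened 28x28 images

inputs = keras.Input(shape=(input_dim,))
h = layers.Dense(256, activation="relu")(inputs)                 # Encoder hidden layer
code = layers.Dense(code_dim, activation="relu")(h)              # code
outputs = layers.Dense(input_dim, activation="sigmoid")(code)    # Decoder / reconstruction

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")                # minimizes the mean of (x_k - x'_k)^2

X = np.random.rand(1000, input_dim).astype("float32")            # placeholder data
autoencoder.fit(X, X, epochs=10, batch_size=128)                 # Training: model.fit(X, X)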
(Figure) Taking a trained auto-encoder (input → NN Encoder → code → NN Decoder → output) and feeding the NN Decoder with codes sampled from the range [−1.5, 1.5] produces images directly from the decoder.
VAE
(Figure) The NN Encoder maps the input to means (m1, m2, m3) and variance-related outputs (σ1, σ2, σ3). Noise (e1, e2, e3) is sampled from a normal distribution, and the code is computed as c_i = exp(σ_i)·e_i + m_i. The NN Decoder maps (c1, c2, c3) to the output. Training minimizes the reconstruction error, plus a term that keeps the code distribution close to the prior.
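A minimal PyTorch sketch of the encoder/decoder and loss described above; the architecture, the dimensions, and the exact form of the regularizer (the standard KL term to a unit Gaussian, taking σ as the log standard deviation) are my assumptions, not the lecture's code:

import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, input_dim=784, code_dim=3):
        super().__init__()
        self.enc = nn.Linear(input_dim, 256)
        self.m = nn.Linear(256, code_dim)          # means m_i
        self.sigma = nn.Linear(256, code_dim)      # log standard deviations sigma_i
        self.dec = nn.Sequential(nn.Linear(code_dim, 256), nn.ReLU(),
                                 nn.Linear(256, input_dim), nn.Sigmoid())

    def forward(self, x):
        h = torch.relu(self.enc(x))
        m, sigma = self.m(h), self.sigma(h)
        e = torch.randn_like(sigma)                # e_i from a normal distribution
        c = torch.exp(sigma) * e + m               # c_i = exp(sigma_i) * e_i + m_i
        return self.dec(c), m, sigma

def vae_loss(x, x_hat, m, sigma):
    recon = ((x - x_hat) ** 2).sum(dim=1)          # reconstruction error
    # regularizer: KL divergence from N(m, exp(sigma)^2) to N(0, 1)
    reg = 0.5 * (torch.exp(2 * sigma) - (1 + 2 * sigma) + m ** 2).sum(dim=1)
    return (recon + reg).mean()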
(Figure) Problem with the VAE: the NN Decoder is only trained to make the output as close as possible to the target. A realistic image and a fake-looking image can have the same pixel-wise distance to the target, so the VAE treats them the same.
Gradual and step-wise generation
(Figure) A sequence of NN Generators v1, v2, v3, … is trained over time. A generator randomly samples a vector and outputs an image; an NN Discriminator (v1, …) takes an image and outputs 1/0 (real or fake).
GAN – learn a generator
Randomly sample a vector and feed it to the NN Generator v1. Generator + Discriminator together form a single network. Using gradient descent, update the parameters of the generator so that the discriminator's output on the generated image goes from, e.g., 0.13 toward 1.0, but fix the discriminator (do not train it in this step). This gives Generator v2.
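A PyTorch sketch of this single generator update; the architectures, latent dimension, and optimizer settings are illustrative assumptions:

import torch
import torch.nn as nn

z_dim = 100
G = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 1), nn.Sigmoid())
g_opt = torch.optim.Adam(G.parameters(), lr=2e-4)
bce = nn.BCELoss()

for p in D.parameters():                          # fix the discriminator: do not train it here
    p.requires_grad_(False)

z = torch.randn(64, z_dim)                        # randomly sample vectors
score = D(G(z))                                   # e.g. 0.13 before the update
loss = bce(score, torch.ones_like(score))         # push the score toward 1.0
g_opt.zero_grad()
loss.backward()                                   # gradients flow through the fixed D into G
g_opt.step()                                      # Generator v1 -> Generator v2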
Generating 2D (anime-style) character figures
You can use the following to start a project (but the page is in Chinese):
Source of images: https://zhuanlan.zhihu.com/p/24767059
From Dr. HY Lee’s notes.
DCGAN: https://github.com/carpedm20/DCGAN-tensorflow
GAN – generating 2D (anime-style) character figures
(Figures) Generated samples after 100, 1,000, 2,000, 5,000, 10,000, 20,000, and 50,000 training rounds.
The next few images are from Goodfellow's lecture.
Traditional mean-squared error produces averaged, blurry results; the last 2 are by deep learning approaches.
Vector arithmetic on codes, similar to word embedding (DCGAN paper).
256×256 high-resolution pictures by Plug and Play Generative Networks.
From natural language to pictures
Deriving GAN
Note: if PG is a Gaussian mixture model, the best θ still gives a mixture of Gaussians, which can only generate a few blobs. Thus the maximum likelihood approach above does not work well.
Next we will introduce GAN, which changes PG itself rather than just estimating PG's parameters. We will find the best PG, which can be more complicated and structured, to approximate Pdata.
Thus let’s use an NN as PG(x; θ)
(Figure) A prior distribution of z (smaller dimension) is mapped by a network with parameters θ to x (larger dimension); this defines PG(x; θ), which should approximate Pdata(x). How to compute the likelihood?
PG(x) = ∫_z Pprior(z) I[G(z) = x] dz
https://blog.openai.com/generative-models/
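To make the contrast concrete, here is a toy numpy illustration (my own, not from the lecture): drawing samples from PG is trivial — sample z from the prior and push it through G — but PG(x) itself is only available implicitly, e.g. through a histogram estimate. The 1-D "generator" G below is hypothetical:

import numpy as np

def G(z):                                          # a hypothetical 1-D generator
    return np.tanh(2.0 * z) + 0.1 * z

z = np.random.randn(100_000)                       # z from the prior (standard normal)
x = G(z)                                           # samples from PG(x): easy to obtain

# PG(x) has no closed form here; it can only be approximated, e.g. by a histogram
density, edges = np.histogram(x, bins=100, density=True)
print("approximate PG near x = 0:", density[np.searchsorted(edges, 0.0) - 1])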
Basic Idea of GAN
G* = arg min_G max_D V(G,D), where max_D V(G,D) measures the "difference" between PG and Pdata.
(Figure) For each candidate generator G1, G2, G3 there is a corresponding optimal discriminator D1(x), D2(x), D3(x); for example, V(G1, D1*) is the "difference" between PG1 and Pdata.
max_D V(G,D)
= V(G, D*), where D*(x) = Pdata(x) / (Pdata(x) + PG(x)) and 1 − D*(x) = PG(x) / (Pdata(x) + PG(x))
= E_{x~Pdata}[log D*(x)] + E_{x~PG}[log(1 − D*(x))]
≈ Σ_x [ Pdata(x) log D*(x) + PG(x) log(1 − D*(x)) ]
= −2 log 2 + 2 JSD(Pdata || PG)
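The step giving D*(x) can be filled in with a short standard calculation (a sketch, not copied from the slides):

\[
V(G,D) = \int_x \Big[ P_{data}(x)\,\log D(x) + P_G(x)\,\log\big(1 - D(x)\big) \Big]\,dx .
\]
For each fixed x, maximize f(D) = a log D + b log(1−D) with a = P_data(x), b = P_G(x):
\[
f'(D) = \frac{a}{D} - \frac{b}{1-D} = 0
\;\Longrightarrow\;
D^{*}(x) = \frac{a}{a+b} = \frac{P_{data}(x)}{P_{data}(x) + P_G(x)} .
\]
Substituting D* back and rewriting each log with the average distribution (P_data + P_G)/2 gives
\[
\max_D V(G,D)
= -2\log 2
+ \mathrm{KL}\!\Big(P_{data}\,\Big\|\,\tfrac{P_{data}+P_G}{2}\Big)
+ \mathrm{KL}\!\Big(P_G\,\Big\|\,\tfrac{P_{data}+P_G}{2}\Big)
= -2\log 2 + 2\,\mathrm{JSD}(P_{data}\,\|\,P_G).
\]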
Given G0:
Find D0* maximizing V(G0, D); V(G0, D0*) is the JS divergence between Pdata(x) and PG0(x).
θG ← θG − η ∂V(G, D0*)/∂θG, obtaining G1 (this decreases the JSD).
Find D1* maximizing V(G1, D); V(G1, D1*) is the JS divergence between Pdata(x) and PG1(x).
θG ← θG − η ∂V(G, D1*)/∂θG, obtaining G2 (decreasing the JSD).
And so on …
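A condensed PyTorch sketch of this alternating procedure (the architectures, the number of inner discriminator steps, and the data sampler are illustrative assumptions): the inner loop approximates D* for the current G, then one gradient step is taken on the generator.

import torch
import torch.nn as nn

z_dim, k_steps = 100, 3
G = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 1), nn.Sigmoid())
g_opt = torch.optim.Adam(G.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def sample_real(n):                                # placeholder for real training data
    return torch.rand(n, 784)

for it in range(10000):
    for _ in range(k_steps):                       # find D* (approximately) for the current G
        x, z = sample_real(64), torch.randn(64, z_dim)
        d_loss = (bce(D(x), torch.ones(64, 1))
                  + bce(D(G(z).detach()), torch.zeros(64, 1)))
        d_opt.zero_grad(); d_loss.backward(); d_opt.step()
    z = torch.randn(64, z_dim)                     # one step on G: theta_G <- theta_G - eta * dV/dtheta_G
    g_loss = bce(D(G(z)), torch.ones(64, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()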
In practice …
V = E_{x~Pdata}[log D(x)] + E_{x~PG}[log(1 − D(x))]
D is a binary classifier: maximizing V is the same as minimizing its cross-entropy loss L = −V. The generator is updated only once per iteration.
Objective Function for Generator in Real Implementation
Instead of minimizing E_{x~PG}[log(1 − D(x))], whose gradient is small when D(x) is near 0, the real implementation minimizes E_{x~PG}[−log D(x)], i.e. it labels x from PG as positive.
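A small snippet contrasting the two generator objectives (the tensor d_fake is a stand-in for D(G(z)) computed in a training loop like the one above):

import torch

d_fake = torch.rand(64, 1)                         # stand-in for D(G(z))

# original minimax objective: minimize E[log(1 - D(G(z)))];
# its gradient is small while D(G(z)) is still close to 0
g_loss_original = torch.log(1 - d_fake).mean()

# real implementation: label generated x as positive, minimize E[-log D(G(z))];
# gives much larger gradients early in training
g_loss_real_impl = -torch.log(d_fake).mean()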
Some issues in training GAN
The evolution of PG needs to be smooth. But when PG and Pdata do not overlap, the JS divergence is log 2 no matter how close they are:
(Figure) …… PG_50(x) vs Pdata(x): JSD(PG_50 || Pdata) = log 2, even though PG_50 is better (closer to Pdata) than earlier generators.
…… PG_100(x) vs Pdata(x): JSD(PG_100 || Pdata) = 0 once they overlap exactly.
One simple solution: add noise to the inputs of the discriminator.
(Figure) Data distribution — what we want … vs. what happens in reality …
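A minimal sketch of the add-noise fix (my own illustration; the noise level and its decay schedule are assumptions): Gaussian noise is added to both real and generated images before they reach the discriminator, so the two distributions overlap and the divergence gives a useful gradient.

import torch

def noisy(images, iteration, total_iters=10000, start_std=0.3):
    std = start_std * max(0.0, 1.0 - iteration / total_iters)   # anneal the noise over time
    return images + std * torch.randn_like(images)

# inside the discriminator update (x: real batch, fake: G(z)):
#   d_loss = bce(D(noisy(x, it)), ones) + bce(D(noisy(fake.detach(), it)), zeros)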
Text to Image, by conditional GAN
(Figure) Example: "red flower with black center"
Text to Image – results from CY Lee lecture.
Project topic: code and data are all on the web, many possibilities!
Algorithm WGAN
In each training iteration:
Sample m examples {x^1, x^2, …, x^m} from the data distribution Pdata(x).
Sample m noise samples {z^1, …, z^m} from a simple prior Pprior(z).
(Ian Goodfellow comment: this …)
The generator update is done only once per iteration; see the sketch below.
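A condensed PyTorch sketch of one WGAN training iteration (illustrative code, not the lecture's pseudo-code; the clipping constant, number of critic steps, and architectures are assumptions): the critic D has no sigmoid, V is estimated by mean D(x) − mean D(G(z)), weights are clipped, and the generator is updated only once.

import torch
import torch.nn as nn

z_dim, n_critic, clip_c = 100, 5, 0.01
G = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 1))   # critic: no sigmoid
g_opt = torch.optim.RMSprop(G.parameters(), lr=5e-5)
d_opt = torch.optim.RMSprop(D.parameters(), lr=5e-5)

def train_iteration(sample_real):
    for _ in range(n_critic):                                   # several critic updates
        x, z = sample_real(64), torch.randn(64, z_dim)
        d_loss = -(D(x).mean() - D(G(z).detach()).mean())       # maximize the estimate of V
        d_opt.zero_grad(); d_loss.backward(); d_opt.step()
        for p in D.parameters():                                # weight clipping to [-c, c]
            p.data.clamp_(-clip_c, clip_c)
    z = torch.randn(64, z_dim)                                  # generator update: only once
    g_loss = -D(G(z)).mean()
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()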
Experimental Results
(Figure) Earth Mover's Distance: the cost of the best plan to move distribution P onto distribution Q, where moving mass over a distance d costs proportionally to d.
JS vs Earth Mover's Distance
(Figure) As the generated distribution approaches the data distribution (d0 → d50 → d100), the JS divergence stays at log 2 until the supports overlap, while the Earth Mover's Distance decreases smoothly.
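A tiny numerical illustration (my own, using scipy; the distances 100/50/0 are stand-ins for the d0/d50/d100 snapshots) of why the Earth Mover's Distance is a more useful training signal than JSD for non-overlapping distributions:

import numpy as np
from scipy.stats import wasserstein_distance

Q = np.zeros(1000)                                 # target distribution: all mass at 0
for d in [100.0, 50.0, 0.0]:                       # P sliding toward Q
    P = np.full(1000, d)
    jsd = 0.0 if d == 0 else np.log(2)             # JSD of two disjoint point masses is log 2
    print(f"d = {d:5.1f}   EMD = {wasserstein_distance(P, Q):6.1f}   JSD = {jsd:.3f}")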