Slides 1
Diffusion Models
Part 1: From U-Nets to Diffusion
Agenda
• Part 1: From U-Nets to Diffusion
• Part 3: Optimizations
• Part 5: CLIP
• Electronic music
• Video game graphics
• Video game AI
• Computer animation
• Instant messaging chatbots
Figure: Discrimination Network built from Convolution, Normalization, Activation Function, and Pooling layers, followed by Flatten
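A minimal pure-Python sketch of what happens after a convolution in the Discrimination Network. The slide does not say which activation or pooling is used, so this assumes ReLU and 2x2 max pooling with stride 2. Normalization is left out to keep the example short. `relu`, `max_pool_2x2`, and `flatten` are illustrative helpers, not a library API.

```python
# Sketch of the Activation Function -> Pooling -> Flatten steps.
# Assumptions (not from the slides): ReLU activation, 2x2 max pooling, stride 2.

def relu(fmap):
    """Activation function: replace negative values with 0."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool_2x2(fmap):
    """Keep the largest value in each non-overlapping 2x2 window."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]

def flatten(fmap):
    """Turn a 2D feature map into a 1D list for the final Real/Fake decision."""
    return [v for row in fmap for v in row]

# A made-up 4x4 feature map, standing in for a convolution's output.
fmap = [[1.0, -2.0, 3.0, 0.0],
        [-1.0, 5.0, -3.0, 2.0],
        [0.5, 0.0, -4.0, 1.0],
        [2.0, -1.0, 6.0, -2.0]]
features = flatten(max_pool_2x2(relu(fmap)))
print(features)  # [5.0, 3.0, 2.0, 6.0]
```

Each pooling step halves the width and height, so a stack of these blocks shrinks the image until Flatten can hand a short vector to the final classifier.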
GANs
Generative Adversarial Networks
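Figure: Discrimination Network (Convolution, Normalization, Activation Function) beside the Generation Network. The Generation Network starts from a Noise Vector such as [0.2, -.8, … -0.1] and passes it through Feature Map Up Blocks built from Conv T (CT), Normalization, and Activation Function layers. Labels: Add to Dataset, Discriminator.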
Figure: the Generator turns Noise into Fake Images; the Discriminator receives Real Images and Fake Images and labels each one Real or Fake.
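The training loop in the figure can be summarized by two loss functions. This is a sketch assuming the Discriminator outputs a probability that an image is Real, and it uses the common binary cross-entropy formulation with the "non-saturating" generator loss; the slides do not specify a loss, and `bce`, `discriminator_loss`, and `generator_loss` are hypothetical helper names.

```python
import math

def bce(prob, target):
    """Binary cross-entropy for a single predicted probability."""
    eps = 1e-12  # keep log() away from 0
    prob = min(max(prob, eps), 1 - eps)
    return -(target * math.log(prob) + (1 - target) * math.log(1 - prob))

def discriminator_loss(d_real, d_fake):
    """The Discriminator should say 1 (Real) for Real Images and 0 (Fake) for Fake Images."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)

def generator_loss(d_fake):
    """The Generator wants the Discriminator to call its Fake Images Real."""
    return bce(d_fake, 1.0)

# Discriminator is fairly sure: 0.9 for a real image, 0.2 for a fake one.
print(discriminator_loss(0.9, 0.2))  # low: the Discriminator is doing well
print(generator_loss(0.2))           # high: the Generator is being caught
```

The two losses pull in opposite directions on the same number `d_fake`: lowering one raises the other, which is the "adversarial" part of the name.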
Image Segmentation
Figure: every pixel is labeled either Corgi or Not Corgi
Image Segmentation + GANs
NVIDIA SPADE
U-Nets
GANs U-Nets
The U-Shaped Autoencoder
Figure: Encoder → Latent Vector (e.g. [0.2, -.8, … -0.1]) → Decoder, where the Decoder is built from Feature Map Up Blocks using Conv T (CT) layers
Input image: 128 px x 128 px, 3 ch
Feature Map (Down0): 128 px, 50 ch → Copy → Feature Map (Up2): 2 x 50 ch
Feature Map (Down1): 64 px, 100 ch → Copy → Feature Map (Up1): 2 x 100 ch
Feature Map (Down2): 32 px, 200 ch → Copy → Feature Map (Up0): 2 x 200 ch
Latent Vector: 400 ch; 200 x 32 x 32 = 204,800 features
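The sizes in the figure follow a simple rule, sketched below. This assumes each level after Down0 halves the width and height, and that each Up level concatenates the Copy from its matching Down level, which is what the "2 x" channel counts suggest. `unet_shapes` and `latent_features` are illustrative names only.

```python
def unet_shapes(image_px, channels):
    """Return the (name, size_px, channels) of every Down and Up feature map."""
    downs = []
    size = image_px
    for level, ch in enumerate(channels):
        if level > 0:
            size //= 2  # assumed: each deeper level halves width and height
        downs.append((f"Down{level}", size, ch))
    ups = []
    # Up0 pairs with the deepest Down level; the Copy doubles its channels.
    for i, (_, size, ch) in enumerate(reversed(downs)):
        ups.append((f"Up{i}", size, 2 * ch))
    return downs, ups

def latent_features(downs):
    """Number of values in the deepest feature map: channels x height x width."""
    _, size, ch = downs[-1]
    return ch * size * size

downs, ups = unet_shapes(128, (50, 100, 200))
for level in downs + ups:
    print(level)
print(latent_features(downs))  # 204800
```

Calling `unet_shapes(16, (16, 32, 64))` gives the smaller configuration used later, whose deepest map is 64 x 4 x 4 = 1024 values.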
Transposed Convolution
Convolution Review
Image:
1 0 1
0 1 0
1 0 1
Kernel:
.25 .25
.25 .25
Sliding the 2x2 kernel across the image one step at a time: every window covers two 1s and two 0s, so every output value is .25 × (1 + 0 + 0 + 1) = .5.
Output:
.5 .5
.5 .5
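The convolution review can be reproduced in a few lines. This is a sketch of a single-channel "valid" convolution (stride 1, no padding) that, like deep learning libraries, does not flip the kernel; `conv2d` is an illustrative name, not a library function.

```python
def conv2d(image, kernel):
    """Valid 2D convolution: stride 1, no padding, kernel not flipped."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [sum(image[i + a][j + b] * kernel[a][b]
             for a in range(kh) for b in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]

image = [[1, 0, 1],
         [0, 1, 0],
         [1, 0, 1]]
kernel = [[.25, .25],
          [.25, .25]]
print(conv2d(image, kernel))  # [[0.5, 0.5], [0.5, 0.5]]
```

A 3x3 image and a 2x2 kernel give a 2x2 output: an ordinary convolution can only shrink the image, which is why the decoder needs a different operation.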
Transposed Convolution
Image Upscaling
Start from the 3x3 image and the 2x2 kernel of .25s:
1 0 1
0 1 0
1 0 1
Spread the pixels apart by inserting a zero between neighbors, giving a 5x5 grid:
1 0 0 0 1
0 0 0 0 0
0 0 1 0 0
0 0 0 0 0
1 0 0 0 1
Slide the kernel over the 5x5 grid one step at a time. The output is 4x4, larger than the 3x3 input:
.25 0 0 .25
0 .25 .25 0
0 .25 .25 0
.25 0 0 .25
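The upscaling steps above can be sketched directly: insert zeros, then run an ordinary convolution. `insert_zeros`, `conv2d`, and `upscale` are illustrative helpers written for this example.

```python
def insert_zeros(image, stride):
    """Spread pixels apart by placing (stride - 1) zeros between neighbors."""
    h, w = len(image), len(image[0])
    grid = [[0] * ((w - 1) * stride + 1) for _ in range((h - 1) * stride + 1)]
    for i in range(h):
        for j in range(w):
            grid[i * stride][j * stride] = image[i][j]
    return grid

def conv2d(image, kernel):
    """Valid convolution: stride 1, no padding, kernel not flipped."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
         for j in range(len(image[0]) - kw + 1)]
        for i in range(len(image) - kh + 1)
    ]

def upscale(image, kernel, stride=2):
    """Transposed convolution as drawn on the slide: insert zeros, then convolve."""
    return conv2d(insert_zeros(image, stride), kernel)

image = [[1, 0, 1],
         [0, 1, 0],
         [1, 0, 1]]
kernel = [[.25, .25],
          [.25, .25]]
for row in upscale(image, kernel):
    print(row)  # 4x4 output, matching the slide
```

A strict transposed convolution also flips the kernel, but this kernel is symmetric, so flipping changes nothing here. By hand calculation, this 4x4 result matches PyTorch's `nn.ConvTranspose2d(1, 1, kernel_size=2, stride=2, padding=1)` with every weight set to .25 and no bias; treat that equivalence as something to verify rather than a guarantee.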
Transposed Convolution
Stride
Stride = 2 (one zero between pixels, 5x5 grid):
1 0 0 0 1
0 0 0 0 0
0 0 1 0 0
0 0 0 0 0
1 0 0 0 1
Stride = 3 (two zeros between pixels, 7x7 grid):
1 0 0 0 0 0 1
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 1 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
1 0 0 0 0 0 1
Transposed Convolution
Padding
Stride = 3, Padding = 0 (7x7 grid):
1 0 0 0 0 0 1
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 1 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
1 0 0 0 0 0 1
Stride = 3, Padding = 2 (two rows and columns removed from each edge, 3x3 grid):
0 0 0
0 1 0
0 0 0
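A sketch of the Stride and Padding pictures, assuming padding is drawn as trimming rows and columns from every edge of the zero-inserted grid, as on the slide. (In PyTorch's `ConvTranspose2d`, `padding` likewise shrinks the output instead of growing it.) `insert_zeros` and `trim` are hypothetical helpers.

```python
def insert_zeros(image, stride):
    """Place (stride - 1) zeros between neighboring pixels."""
    h, w = len(image), len(image[0])
    grid = [[0] * ((w - 1) * stride + 1) for _ in range((h - 1) * stride + 1)]
    for i in range(h):
        for j in range(w):
            grid[i * stride][j * stride] = image[i][j]
    return grid

def trim(grid, padding):
    """Remove `padding` rows and columns from every edge."""
    h, w = len(grid), len(grid[0])
    return [row[padding:w - padding] for row in grid[padding:h - padding]]

image = [[1, 0, 1],
         [0, 1, 0],
         [1, 0, 1]]
grid = insert_zeros(image, stride=3)
print(len(grid), len(grid[0]))  # 7 7
print(trim(grid, padding=2))    # [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
```

Slicing with `h - padding` rather than `-padding` keeps `padding = 0` working, because `grid[0:-0]` would be empty.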
Transposed Convolution
Out Padding
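The 3x3 image, then the same image with one and with two extra rows and columns of zeros added on the bottom and right.
3x3:
1 0 1
0 1 0
1 0 1
4x4:
1 0 1 0
0 1 0 0
1 0 1 0
0 0 0 0
5x5:
1 0 1 0 0
0 1 0 0 0
1 0 1 0 0
0 0 0 0 0
0 0 0 0 0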
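Stride, padding, and out padding can be combined in one function. This sketch uses the "scatter" view of a transposed convolution (each input pixel adds a scaled copy of the kernel into the output), which is equivalent to the insert-zeros-then-convolve picture. It models out padding as extra rows and columns on the bottom and right; `conv_transpose2d` and `output_size` are illustrative names, single channel, no bias.

```python
def conv_transpose2d(image, kernel, stride=1, padding=0, output_padding=0):
    """Scatter-form transposed convolution (one channel, no bias)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    # Full canvas; output_padding adds extra rows/columns on the bottom and right.
    full_h = (h - 1) * stride + kh + output_padding
    full_w = (w - 1) * stride + kw + output_padding
    full = [[0.0] * full_w for _ in range(full_h)]
    for i in range(h):
        for j in range(w):
            for a in range(kh):
                for b in range(kw):
                    full[i * stride + a][j * stride + b] += image[i][j] * kernel[a][b]
    # Padding trims rows/columns from every edge.
    return [row[padding:full_w - padding] for row in full[padding:full_h - padding]]

def output_size(n, kernel, stride=1, padding=0, output_padding=0):
    """Side length of the output for an n x n input and a kernel x kernel kernel."""
    return (n - 1) * stride - 2 * padding + kernel + output_padding

image = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
kernel = [[.25, .25], [.25, .25]]
for row in conv_transpose2d(image, kernel, stride=2, padding=1):
    print(row)
print(output_size(3, 2, stride=2, padding=1, output_padding=1))  # 5
```

`output_size` is the dilation-1 case of the output-shape formula in PyTorch's `ConvTranspose2d` documentation. PyTorch also requires `output_padding` to be smaller than `stride`; this sketch does not check that.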
Image Resizing
Upsampling
Figure: the same image at 64 px x 64 px, 128 px x 128 px, and 192 px x 192 px
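The slide does not name a resizing method. As one common choice, here is a sketch of nearest-neighbor upsampling, where every pixel becomes a factor x factor block (64 px → 128 px is a factor of 2, 64 px → 192 px a factor of 3). Unlike a transposed convolution, it has no learned weights. `upsample_nearest` is an illustrative name.

```python
def upsample_nearest(image, factor):
    """Nearest-neighbor upsampling by an integer factor."""
    return [
        [image[i // factor][j // factor] for j in range(len(image[0]) * factor)]
        for i in range(len(image) * factor)
    ]

small = [[1, 2],
         [3, 4]]
for row in upsample_nearest(small, 2):
    print(row)
# [1, 1, 2, 2]
# [1, 1, 2, 2]
# [3, 3, 4, 4]
# [3, 3, 4, 4]
```

Bilinear interpolation is another common option; it blends neighboring pixels instead of copying them, giving smoother but blurrier results.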
Figure panels: Image, Convolved, Deconvolved, Random Noise
The Experiment
Input and output images: 16 px x 16 px, 1 ch
Feature Map (Down0): 16 px, 16 ch → Copy → Feature Map (Up2): 2 x 16 ch
Feature Map (Down1): 8 px, 32 ch → Copy → Feature Map (Up1): 2 x 32 ch
Feature Map (Down2): 4 px, 64 ch → Copy → Feature Map (Up0): 2 x 64 ch
Latent Vector: 128 ch; 64 x 4 x 4 = 1024 features
Let’s get started!
Appendix:
The Normal Distribution
De Moivre
From Coin Flips to Bells
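$p = \tfrac{1}{2}$, $n = 4$, $k = 2$

$$\Pr(X = k) = \binom{n}{k} p^k (1-p)^{n-k} = \frac{n!}{k!\,(n-k)!}\, p^k (1-p)^{n-k}$$

$$\Pr(X = 2) = \frac{4!}{2!\,(4-2)!} \left(\tfrac{1}{2}\right)^{2} \left(1 - \tfrac{1}{2}\right)^{4-2} = \frac{4 \cdot 3 \cdot 2 \cdot 1}{2 \cdot 1 \cdot 2 \cdot 1} \left(\tfrac{1}{4}\right)\left(\tfrac{1}{4}\right) = \frac{6}{16}$$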
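The coin-flip calculation can be checked with Python's standard library. This sketch uses `math.comb` for the binomial coefficient; passing a `Fraction` keeps the answer exact. `binomial_pmf` is an illustrative name.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """Pr(X = k) for n independent flips, each landing heads with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

print(binomial_pmf(2, 4, 0.5))             # 0.375
print(binomial_pmf(2, 4, Fraction(1, 2)))  # 3/8, the same as 6/16
```

Summing `binomial_pmf(k, 4, 0.5)` over k = 0..4 gives 1, a quick sanity check that every outcome is counted once.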
Image: a weighted coin flipping through the air like a cartoon
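$$\Pr(X = k) = \binom{n}{k} p^k (1-p)^{n-k} = \frac{n!}{k!\,(n-k)!}\, p^k (1-p)^{n-k}$$

Stirling's approximation:

$$n! \approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}$$

De Moivre's normal approximation, with $q = 1 - p$:

$$\binom{n}{k} p^k q^{n-k} \simeq \frac{1}{\sqrt{2\pi npq}}\, e^{-\frac{(k - np)^2}{2npq}}$$

With $\mu = np$ and $\sigma = \sqrt{npq}$, the exponent above is $-z^2/2$, where

$$z = \frac{x - \mu}{\sigma}$$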
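A quick numerical check of the approximations above, using only the standard library. The choice of n = 100 and p = 0.5 is ours, for illustration; `stirling`, `de_moivre_laplace`, and `z_score` are hypothetical helper names.

```python
import math

def stirling(n):
    """Stirling's approximation: n! is close to sqrt(2*pi*n) * (n/e)**n."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

def binomial_pmf(k, n, p):
    """Exact Pr(X = k) for a binomial distribution."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def de_moivre_laplace(k, n, p):
    """Normal approximation to the binomial, with q = 1 - p."""
    q = 1 - p
    return math.exp(-(k - n * p) ** 2 / (2 * n * p * q)) / math.sqrt(2 * math.pi * n * p * q)

def z_score(x, mu, sigma):
    """How many standard deviations x lies from the mean."""
    return (x - mu) / sigma

n, p = 100, 0.5
print(stirling(10), math.factorial(10))                       # within about 1%
print(binomial_pmf(50, n, p), de_moivre_laplace(50, n, p))    # exact vs approximate
print(z_score(60, n * p, math.sqrt(n * p * (1 - p))))         # 2.0
```

For 100 fair flips, mu = 50 and sigma = 5, so 60 heads sits two standard deviations above the mean.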