Slides 1

Uploaded by

Ons Hadrich

Generative AI with Diffusion Models
Part 1: From U-Nets to Diffusion
Agenda
• Part 1: From U-Nets to Diffusion

• Part 2: Denoising Diffusion Probabilistic Models

• Part 3: Optimizations

• Part 4: Classifier Free Diffusion

• Part 5: CLIP

• Part 6: Wrap-up & Assessment


Prerequisites

• Basic familiarity with convolutional neural networks (CNNs)
• Basic familiarity with a deep learning framework such as:
• PyTorch
• TensorFlow
A Brief History of Generative AI
The Imitation Game
A.K.A The Turing Test

A robot looks in a mirror and the reflection is human, cyberpunk


IBM 704
The First Singing Computer
Eliza
The First Gen AI Chatbot?
Generative AI of the '70s, '80s, and '90s?

• Electronic music
• Video game graphics
• Video game AI
• Computer animation
• Instant messaging chatbots

An 80’s arcade with lots of machines


The Rise of Neural Networks
Deep Dreaming
Images by Martin Thoma

Panels: Original, 10 Iterations, 50 Iterations


GANs
Generative Adversarial Networks

Discrimination Network
Image → CNN Down Block → Feature Map → CNN Down Block → Feature Map → Flatten → [0.1, -.2, … 0.9] → DNN → Real / Fake

CNN Down Block components: Convolution, Normalization, Activation Function, Pooling
GANs
Generative Adversarial Networks

Generation Network
Noise Vector [0.2, -.8, … -0.1] → CT Up Block → Feature Map → CT Up Block → Feature Map → Add to Discriminator Dataset
GANs
Generative Adversarial Networks

CT Up Block components: Conv T (transposed convolution), Normalization, Activation Function; Convolution, Normalization, Activation Function
GANs
Generative Adversarial Networks

Noise → Generator → Fake Images
Real Images and Fake Images → Discriminator → Real / Fake
Image Segmentation

Not Corgi

Corgi
Image Segmentation + GANs
NVIDIA SPADE
U-Nets
U-Nets
The U-shaped Autoencoder

Encoder
Image → CNN Down Block → Feature Map → CNN Down Block → Feature Map → DNN → Latent Vector [0.1, -.2, … 0.9]

Decoder
Latent Vector [0.2, -.8, … -0.1] → CT Up Block → Feature Map → CT Up Block → Feature Map
U-Nets
The U-shaped Autoencoder

Input: 128 px x 128 px, 3 ch
Feature Map (Down0): 128 px x 128 px, 50 ch
Feature Map (Down1): 64 px x 64 px, 100 ch
Feature Map (Down2): 32 px x 32 px, 200 ch
Latent Vector: 200 x 32 x 32 = 204,800 features
Feature Map (Up0): 32 px x 32 px, Down2 Copy, 2 x 200 ch = 400 ch
Feature Map (Up1): 64 px x 64 px, Down1 Copy, 2 x 100 ch
Feature Map (Up2): 128 px x 128 px, Down0 Copy, 2 x 50 ch
Output: 128 px x 128 px, 1 ch
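The feature counts in the diagram can be checked with a small shape trace. This is a minimal sketch, not the course's implementation: `trace_unet_shapes` is a hypothetical helper, and it assumes (as the diagram suggests) that the first down block keeps the input resolution while each later down block halves it.

```python
def trace_unet_shapes(image_size, image_channels, down_channels):
    """Return per-stage (name, channels, size) tuples and the latent feature count.

    Assumption: the first down block keeps the resolution and every
    later down block halves it, as the diagram's pixel sizes suggest.
    """
    shapes = [("input", image_channels, image_size)]
    size = image_size
    for index, channels in enumerate(down_channels):
        if index > 0:
            size //= 2  # halve height and width
        shapes.append((f"down{index}", channels, size))
    # Flattening the last feature map gives channels * height * width features.
    latent_features = down_channels[-1] * size * size
    return shapes, latent_features


shapes, latent_features = trace_unet_shapes(128, 3, (50, 100, 200))
for name, channels, size in shapes:
    print(f"{name}: {channels} ch, {size} px x {size} px")
print("latent features:", latent_features)

# Each up block receives its own input concatenated with the matching
# down-block copy, which is where the "2 x" channel counts come from.
skip_channels = [2 * channels for channels in reversed((50, 100, 200))]
print("up block input channels:", skip_channels)
```

The doubling in `skip_channels` reflects the copy arrows in the diagram: the skip connection concatenates channels, so Up0 sees 2 x 200 = 400 channels.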
Transposed Convolution
Transposed Convolution
Convolution Review

Kernel (2 x 2):
.25 .25
.25 .25

Image (3 x 3):
1 0 1
0 1 0
1 0 1

Output (2 x 2):
.5 .5
.5 .5
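The convolution review can be reproduced in a few lines of plain Python. This is a minimal sketch assuming stride 1 and no padding; `conv2d_valid` is a hypothetical helper name, and it computes the sliding-window sum that deep learning frameworks call convolution.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            # Multiply the kernel with the window under it and sum.
            total = 0.0
            for i in range(k_h):
                for j in range(k_w):
                    total += kernel[i][j] * image[row + i][col + j]
            out_row.append(total)
        output.append(out_row)
    return output


kernel = [[0.25, 0.25], [0.25, 0.25]]
image = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
output = conv2d_valid(image, kernel)
print(output)  # [[0.5, 0.5], [0.5, 0.5]]
```

Every 2 x 2 window of this image contains exactly two ones, so each output value is 2 x .25 = .5.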
Transposed Convolution
Image Upscaling

Kernel (2 x 2):
.25 .25
.25 .25

Image (3 x 3):
1 0 1
0 1 0
1 0 1

Image with Stride = 2 (5 x 5):
1 0 0 0 1
0 0 0 0 0
0 0 1 0 0
0 0 0 0 0
1 0 0 0 1

Output (4 x 4):
.25 0 0 .25
0 .25 .25 0
0 .25 .25 0
.25 0 0 .25
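The upscaling steps on these slides can be sketched as zero insertion followed by an ordinary convolution. The helper names below are hypothetical and the code follows the slides' picture rather than any library's internals. By the documented output-size formula, the same 4 x 4 size should correspond to `kernel_size=2, stride=2, padding=1` in PyTorch's `torch.nn.ConvTranspose2d`.

```python
def insert_zeros(image, stride):
    """Spread the pixels apart, leaving stride - 1 zeros between neighbors."""
    n_rows, n_cols = len(image), len(image[0])
    out_rows = (n_rows - 1) * stride + 1
    out_cols = (n_cols - 1) * stride + 1
    spread = [[0] * out_cols for _ in range(out_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            spread[r * stride][c * stride] = image[r][c]
    return spread


def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding."""
    k_h, k_w = len(kernel), len(kernel[0])
    return [
        [
            sum(
                kernel[i][j] * image[row + i][col + j]
                for i in range(k_h)
                for j in range(k_w)
            )
            for col in range(len(image[0]) - k_w + 1)
        ]
        for row in range(len(image) - k_h + 1)
    ]


kernel = [[0.25, 0.25], [0.25, 0.25]]
image = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]

spread = insert_zeros(image, stride=2)    # 5 x 5 grid from the slide
upscaled = conv2d_valid(spread, kernel)   # 4 x 4 output from the slide
for row in upscaled:
    print(row)
```

The 3 x 3 image grows to 4 x 4 because the zero-filled grid is 5 x 5 and a 2 x 2 kernel fits into it at 4 positions per axis.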
Transposed Convolution
Stride

Image, Stride = 2 (5 x 5):
1 0 0 0 1
0 0 0 0 0
0 0 1 0 0
0 0 0 0 0
1 0 0 0 1

Image, Stride = 3 (7 x 7):
1 0 0 0 0 0 1
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 1 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
1 0 0 0 0 0 1
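A short sketch of how stride controls the spacing in the grids above: an n x n image becomes (n - 1) x stride + 1 pixels wide. The helper name `insert_zeros` is an assumption made for illustration.

```python
def insert_zeros(image, stride):
    """Place each pixel `stride` cells apart and fill the gaps with zeros."""
    n_rows, n_cols = len(image), len(image[0])
    spread = [
        [0] * ((n_cols - 1) * stride + 1)
        for _ in range((n_rows - 1) * stride + 1)
    ]
    for r in range(n_rows):
        for c in range(n_cols):
            spread[r * stride][c * stride] = image[r][c]
    return spread


image = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
stride_2 = insert_zeros(image, 2)
stride_3 = insert_zeros(image, 3)
print(len(stride_2), "x", len(stride_2[0]))  # 5 x 5
print(len(stride_3), "x", len(stride_3[0]))  # 7 x 7
```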
Transposed Convolution
Padding

Image, Stride = 3, Padding = 0 (7 x 7):
1 0 0 0 0 0 1
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 1 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
1 0 0 0 0 0 1

Image, Stride = 3, Padding = 2 (3 x 3):
0 0 0
0 1 0
0 0 0
Transposed Convolution
Out Padding

Image, Out Padding = 0 (3 x 3):
1 0 1
0 1 0
1 0 1

Image, Out Padding = 1 (4 x 4):
1 0 1 0
0 1 0 0
1 0 1 0
0 0 0 0

Image, Out Padding = 2 (5 x 5):
1 0 1 0 0
0 1 0 0 0
1 0 1 0 0
0 0 0 0 0
0 0 0 0 0
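Stride, padding, and out padding all feed into one output-size formula, the one PyTorch documents for `torch.nn.ConvTranspose2d`. The function name below is hypothetical, and the sketch only does the arithmetic. It does not enforce PyTorch's rule that output padding must be smaller than the stride or the dilation.

```python
def conv_transpose_output_size(
    size_in, kernel_size, stride=1, padding=0, output_padding=0, dilation=1
):
    """Output height or width of a transposed convolution along one axis."""
    return (
        (size_in - 1) * stride
        - 2 * padding
        + dilation * (kernel_size - 1)
        + output_padding
        + 1
    )


# Upscaling slide: 3 x 3 image, 2 x 2 kernel, stride 2, one pixel cropped per side.
upscaled = conv_transpose_output_size(3, kernel_size=2, stride=2, padding=1)

# Padding slide: with a 1 x 1 kernel the formula gives the zero-filled grid itself.
stride_3 = conv_transpose_output_size(3, kernel_size=1, stride=3)
stride_3_pad_2 = conv_transpose_output_size(3, kernel_size=1, stride=3, padding=2)

# Out padding slide: each unit adds one row and one column.
out_pad_sizes = [
    conv_transpose_output_size(3, kernel_size=1, output_padding=extra)
    for extra in (0, 1, 2)
]

print(upscaled, stride_3, stride_3_pad_2, out_pad_sizes)  # 4 7 3 [3, 4, 5]
```

Padding shrinks the result here because, in a transposed convolution, it removes rows and columns from the output instead of adding them to the input.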
Image Resizing
Upsampling

Image sizes: 64 px x 64 px, 128 px x 128 px, 192 px x 192 px
Image Resizing
Upsampling

Nearest Bilinear Bicubic
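Of the three modes named above, nearest-neighbor is the simplest to write out. This is a minimal sketch for integer scale factors only, with a hypothetical helper name. Bilinear and bicubic interpolation blend neighboring pixels and are not shown.

```python
def upsample_nearest(image, factor):
    """Repeat every pixel `factor` times along both axes."""
    return [
        [image[r // factor][c // factor] for c in range(len(image[0]) * factor)]
        for r in range(len(image) * factor)
    ]


tiny = [[1, 2], [3, 4]]
doubled = upsample_nearest(tiny, 2)
for row in doubled:
    print(row)
# [1, 1, 2, 2]
# [1, 1, 2, 2]
# [3, 3, 4, 4]
# [3, 3, 4, 4]
```

With this helper, going from 64 px to 128 px is a factor of 2 and going from 64 px to 192 px is a factor of 3.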


Deconvolution?
Same as Transposed Convolution?

Diagram labels: Image, Point Spread Function, Convolved Image, FFT Convolved Image, FFT Point Spread Function, Deconvolved Image
Lab
FashionMNIST
Convolutional Neural Network “Hello World”
Hypothesis: Generate an image from Noise

Random Noise
The Experiment

Input: 16 px x 16 px, 1 ch
Feature Map (Down0): 16 px x 16 px, 16 ch
Feature Map (Down1): 8 px x 8 px, 32 ch
Feature Map (Down2): 4 px x 4 px, 64 ch
Latent Vector: 64 x 4 x 4 = 1,024 features
Feature Map (Up0): 4 px x 4 px, Down2 Copy, 2 x 64 ch = 128 ch
Feature Map (Up1): 8 px x 8 px, Down1 Copy, 2 x 32 ch
Feature Map (Up2): 16 px x 16 px, Down0 Copy, 2 x 16 ch
Output: 16 px x 16 px, 1 ch
Let’s get started!
Appendix:
The Normal Distribution
De Moivre
From Coin Flips to Bells

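\[ \Pr(X = k) = \binom{n}{k} p^{k} (1 - p)^{n - k} \]

\[ \Pr(X = k) = \frac{n!}{k!\,(n - k)!} \, p^{k} (1 - p)^{n - k} \]

With p = 1/2, n = 4, and k = 2:

\[ \Pr(X = 2) = \frac{4!}{2!\,(4 - 2)!} \left(\frac{1}{2}\right)^{2} \left(1 - \frac{1}{2}\right)^{4 - 2} \]

\[ \Pr(X = 2) = \frac{4 \cdot 3 \cdot 2 \cdot 1}{2 \cdot 1 \cdot 2 \cdot 1} \left(\frac{1}{4}\right) \left(\frac{1}{4}\right) \]

\[ \Pr(X = 2) = \frac{6}{16} \]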
A weighted coin flipping through
the air like a cartoon
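The worked coin-flip example (n = 4, p = 1/2, k = 2) can be checked with exact fractions from Python's standard library. The function name `binomial_pmf` is an illustrative choice.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


prob = binomial_pmf(2, 4, Fraction(1, 2))
print(prob)  # 3/8, the reduced form of 6/16
```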
De Moivre
From Coin Flips to Bells

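\[ \Pr(X = k) = \binom{n}{k} p^{k} (1 - p)^{n - k} \]

\[ \Pr(X = k) = \frac{n!}{k!\,(n - k)!} \, p^{k} (1 - p)^{n - k} \]

Stirling's approximation:

\[ n! \approx \sqrt{2 \pi n} \left(\frac{n}{e}\right)^{n} \]

With q = 1 - p:

\[ \binom{n}{k} p^{k} q^{n - k} \simeq \frac{1}{\sqrt{2 \pi n p q}} \, e^{-\frac{(k - np)^{2}}{2npq}} \]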
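To see how close the bell-curve approximation gets, the sketch below compares the exact binomial probability with De Moivre's formula. The choice of n = 100 and p = 0.5 is an arbitrary illustration, not a value from the slides.

```python
from math import comb, exp, pi, sqrt


def binomial_pmf(k, n, p):
    """Exact probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_approximation(k, n, p):
    """De Moivre's approximation with mean n*p and variance n*p*q."""
    variance = n * p * (1 - p)
    return exp(-((k - n * p) ** 2) / (2 * variance)) / sqrt(2 * pi * variance)


n, p = 100, 0.5
gaps = []
for k in (40, 50, 60):
    exact = binomial_pmf(k, n, p)
    approx = normal_approximation(k, n, p)
    gaps.append(abs(exact - approx))
    print(f"k={k}: exact={exact:.5f} approx={approx:.5f}")
```

For these values the two columns should agree to about three decimal places, and the agreement improves as n grows.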

De Moivre
From Coin Flips to Bells

σ = standard deviation, a.k.a. spread
μ = mean

\[ z = \frac{x - \mu}{\sigma} \]
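As a small illustration of the z formula, the sketch below standardizes the outcomes of four fair coin flips. It assumes the binomial mean n*p and standard deviation sqrt(n*p*q), the same quantities that appear in De Moivre's approximation. The function name is hypothetical.

```python
from math import sqrt


def z_score(x, mu, sigma):
    """Distance of x from the mean, measured in standard deviations."""
    return (x - mu) / sigma


n, p = 4, 0.5
mu = n * p                      # mean number of heads: 2.0
sigma = sqrt(n * p * (1 - p))   # standard deviation: 1.0

scores = [z_score(k, mu, sigma) for k in range(n + 1)]
print(scores)  # [-2.0, -1.0, 0.0, 1.0, 2.0]
```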
