Lecture 5: Diffusion Models, Part II
High-level overview
[Figure: the forward process maps the data distribution to a standard Gaussian; the reverse process maps the standard Gaussian back to the data distribution]
• Three categories: denoising diffusion probabilistic models (DDPMs), noise conditioned score networks (NCSNs), and diffusion models based on stochastic differential equations (SDEs)
Forward process
$x_0 \sim p(x_0) \;\rightarrow\; x_1 \;\rightarrow\; \dots \;\rightarrow\; x_T \sim \mathcal{N}(0, I)$
Denoising Diffusion Probabilistic Models (DDPMs)
[Figure: Markov chain $x_0 \leftrightarrow x_1 \leftrightarrow \dots \leftrightarrow x_{T-1} \leftrightarrow x_T$, with the forward process adding noise and the reverse process removing it]
Denoising Diffusion Probabilistic Models (DDPMs)
$x_t \sim p(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\hat{\beta}_t}\cdot x_0,\ (1-\hat{\beta}_t)\, I\right), \qquad \alpha_t = 1-\beta_t, \quad \hat{\beta}_t = \prod_{i=1}^{t}\alpha_i$
[Figure: forward chain $x_0 \rightarrow x_1 \rightarrow \dots \rightarrow x_T$]
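A minimal PyTorch sketch of this closed-form forward step; the linear `betas` schedule and the tensor shapes are illustrative assumptions, not part of the lecture:

```python
import torch

# Hypothetical linear noise schedule (an assumption for illustration).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)       # beta_1, ..., beta_T
alphas = 1.0 - betas                        # alpha_t = 1 - beta_t
beta_hat = torch.cumprod(alphas, dim=0)     # hat(beta)_t = prod_{i<=t} alpha_i

def forward_diffuse(x0, t, noise=None):
    """Sample x_t ~ N(sqrt(hat(beta)_t) * x0, (1 - hat(beta)_t) I) in a single step."""
    if noise is None:
        noise = torch.randn_like(x0)
    b = beta_hat[t].view(-1, *([1] * (x0.dim() - 1)))   # broadcast over image dims
    return b.sqrt() * x0 + (1.0 - b).sqrt() * noise, noise
```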
DDPMs. Training objective
Remember that:
$p(x_{t-1} \mid x_t) \approx p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)$
[Figure: reverse chain $x_T \rightarrow \dots \rightarrow x_1 \rightarrow x_0$]
Reverse process
[Figure: reverse chain $x_T \rightarrow \dots \rightarrow x_1 \rightarrow x_0$]
$p(x_{t-1} \mid x_t) \approx p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \sigma_t^2 I\right)$, where the covariance is fixed to $\sigma_t^2 I$.
Reverse process
$x_{t-1} \sim \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \sigma_t^2 I\right)$
DDPMs. Training Algorithm

$$\min_\theta \mathcal{L} = \min_\theta \frac{1}{T}\sum_{t=1}^{T} \mathbb{E}_{x_0 \sim p(x_0),\, z_t \sim \mathcal{N}(0, I)} \left\| z_t - z_\theta(x_t, t) \right\|^2, \qquad \alpha_t = 1-\beta_t, \quad \hat{\beta}_t = \prod_{i=1}^{t}\alpha_i$$

Training algorithm:
Repeat
  $x_0 \sim p(x_0)$  % sample an image from the data set
  $t \sim \mathcal{U}\{1, \dots, T\}$  % randomly choose a time step t of the forward process
  $z_t \sim \mathcal{N}(0, I)$  % sample the noise z_t
  $x_t = \sqrt{\hat{\beta}_t}\cdot x_0 + \sqrt{1-\hat{\beta}_t}\cdot z_t$  % get the noisy image
  $\theta = \theta - lr \cdot \nabla_\theta \mathcal{L}$  % update the neural network weights
Until convergence
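A minimal PyTorch sketch of this training loop; the `model(x_t, t)` interface, the data loader, and the hyper-parameters are assumptions for illustration:

```python
import torch

def train_ddpm(model, dataloader, beta_hat, T=1000, lr=1e-4, epochs=10, device="cpu"):
    """DDPM training loop following the algorithm above (illustrative sketch)."""
    beta_hat = beta_hat.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):                                       # "Repeat ... until convergence"
        for x0, _ in dataloader:                                  # x_0 ~ p(x_0), assuming (image, label) batches
            x0 = x0.to(device)
            t = torch.randint(0, T, (x0.size(0),), device=device)  # t ~ U{1, ..., T}
            z = torch.randn_like(x0)                                # z_t ~ N(0, I)
            b = beta_hat[t].view(-1, 1, 1, 1)
            xt = b.sqrt() * x0 + (1.0 - b).sqrt() * z               # noisy image x_t
            loss = ((z - model(xt, t)) ** 2).mean()                 # ||z_t - z_theta(x_t, t)||^2
            opt.zero_grad()
            loss.backward()
            opt.step()                                              # gradient step on theta
```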
DDPMs. Sampling
• Pass the current noisy image $x_t$ along with $t$ to the neural network to obtain the noise estimate $z_\theta(x_t, t)$
• Sample the next (less noisy) image:
$$x_{t-1} \sim \mathcal{N}\!\left(x_{t-1};\ \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{1-\alpha_t}{\sqrt{1-\hat{\beta}_t}}\, z_\theta(x_t, t)\right),\ \sigma_t^2 I\right)$$
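A matching PyTorch sketch of the sampling loop; the choice $\sigma_t^2 = \beta_t$ is one common option (an assumption here), and the `model(x_t, t)` interface is assumed:

```python
import torch

@torch.no_grad()
def sample_ddpm(model, shape, betas, device="cpu"):
    """Ancestral DDPM sampling: start from pure noise and denoise step by step."""
    betas = betas.to(device)
    alphas = 1.0 - betas
    beta_hat = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape, device=device)                         # x_T ~ N(0, I)
    for t in reversed(range(len(betas))):
        t_batch = torch.full((shape[0],), t, device=device)
        z_pred = model(x, t_batch)                                 # z_theta(x_t, t)
        mean = (x - (1 - alphas[t]) / (1 - beta_hat[t]).sqrt() * z_pred) / alphas[t].sqrt()
        if t > 0:
            x = mean + betas[t].sqrt() * torch.randn_like(x)       # sigma_t^2 = beta_t (assumed)
        else:
            x = mean                                               # no noise at the last step
    return x
```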
Outline
1. Motivation
2. High-level overview
3. Denoising diffusion probabilistic models
4. Noise Conditioned Score Network
5. Stochastic Differential Equations
6. Conditional Generation
7. Research directions
Score Function
• Apply iterative updates based on the score function to modify the sample
• The result has a higher chance of being a sample from the true distribution p(x)
Naïve score-based model
• Score: the gradient of the logarithm of the probability density with respect to the input, $s_\theta(x) \approx \nabla_x \log p(x)$
• Objective:
$$\mathcal{L} = \mathbb{E}_{x \sim p(x)} \left\| s_\theta(x) - \nabla_x \log p(x) \right\|^2$$
• After training, samples are drawn with Langevin dynamics:
$$x_{i+1} = x_i + \frac{\gamma}{2}\, \nabla_x \log p(x_i) + \sqrt{\gamma}\cdot \omega_i, \qquad \omega_i \sim \mathcal{N}(0, I)$$
where $\nabla_x \log p(x_i)$ is approximated by the learned score $s_\theta(x_i)$, and the step size $\gamma$ controls the magnitude of the update in the direction of the score.
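A minimal sketch of this Langevin sampler, assuming `score_fn(x)` returns the learned score $s_\theta(x)$:

```python
import torch

def langevin_sample(score_fn, x_init, gamma=1e-4, n_steps=1000):
    """Unadjusted Langevin dynamics: x <- x + (gamma / 2) * score(x) + sqrt(gamma) * omega."""
    x = x_init.clone()
    for _ in range(n_steps):
        omega = torch.randn_like(x)                        # omega_i ~ N(0, I)
        x = x + 0.5 * gamma * score_fn(x) + (gamma ** 0.5) * omega
    return x
```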
Naïve score-based model. Problems
o Perturb the data with Gaussian noise: $x_t \sim \mathcal{N}(x_t;\ x,\ \sigma_t^2 \cdot I) = p_{\sigma_t}(x_t)$
o For a large enough noise scale, $p_{\sigma_t}(x) \approx \mathcal{N}(0, I)$, i.e., it overlaps almost equally with the standard Gaussian distribution.
Noise Conditioned Score Network (NCSNs)
• Training the NCSN with denoising score matching, the following objective is minimized:
$$\mathcal{L} = \frac{1}{T}\sum_{t=1}^{T} \lambda(\sigma_t)\, \mathbb{E}_{p(x)}\, \mathbb{E}_{p_{\sigma_t}(x_t \mid x)} \left\| s_\theta(x_t, \sigma_t) + \frac{x_t - x}{\sigma_t^2} \right\|^2$$
where $\lambda(\sigma_t)$ is a weighting function.
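A minimal sketch of this denoising score matching loss; the weighting $\lambda(\sigma_t) = \sigma_t^2$ is one common choice (an assumption here), and `score_net(x_t, t)` is an assumed interface:

```python
import torch

def dsm_loss(score_net, x, sigmas):
    """Denoising score matching loss over randomly drawn noise scales."""
    t = torch.randint(0, len(sigmas), (x.size(0),), device=x.device)
    sigma = sigmas.to(x.device)[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(x)
    x_t = x + sigma * noise                                  # x_t ~ N(x, sigma_t^2 I)
    target = -(x_t - x) / sigma ** 2                         # grad_{x_t} log p_{sigma_t}(x_t | x)
    score = score_net(x_t, t)                                # s_theta(x_t, sigma_t)
    return ((sigma ** 2) * (score - target) ** 2).mean()     # lambda(sigma_t) = sigma_t^2
```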
Noise Conditioned Score Network (NCSNs). Sampling
Annealed Langevin dynamics
Parameters:
  $N$ – number of iterations for Langevin dynamics
  $\sigma_1 < \sigma_2 < \dots < \sigma_T$ – noise scales
  $\gamma_t$ – update magnitude (per noise scale)
Algorithm:
  Initialize $x$
  for $t = T, \dots, 1$ do:
    for $i = 1, \dots, N$ do:
      $z \sim \mathcal{N}(0, I)$
      $x \leftarrow x + \frac{\gamma_t}{2}\, s_\theta(x, \sigma_t) + \sqrt{\gamma_t}\cdot z$
  return $x$
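A minimal sketch of annealed Langevin dynamics; the per-scale step size $\gamma_t = \gamma \cdot \sigma_t^2 / \sigma_{\min}^2$ follows a common NCSN recipe and is an assumption here, as is the `score_net(x, t)` interface:

```python
import torch

@torch.no_grad()
def annealed_langevin(score_net, shape, sigmas, gamma=2e-5, n_steps=100, device="cpu"):
    """Run Langevin updates at each noise scale, from the largest sigma to the smallest."""
    sigmas = sigmas.to(device)                                 # sigma_1 < ... < sigma_T
    x = torch.rand(shape, device=device)                       # simple initialization
    for t in reversed(range(len(sigmas))):                     # largest -> smallest noise scale
        gamma_t = gamma * (sigmas[t] / sigmas[0]) ** 2         # per-scale step size (assumed recipe)
        for _ in range(n_steps):
            z = torch.randn_like(x)
            s = score_net(x, torch.full((shape[0],), t, device=device))
            x = x + 0.5 * gamma_t * s + gamma_t.sqrt() * z
    return x
```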
Noise Conditioned Score Network (NCSNs). Sampling
DDPM: $\mathcal{L} = \frac{1}{T}\sum_{t=1}^{T} \mathbb{E}_{x_0 \sim p(x_0),\, z_t \sim \mathcal{N}(0, I)} \left\| z_\theta(x_t, t) - z_t \right\|^2$
NCSN: $\mathcal{L} = \frac{1}{T}\sum_{t=1}^{T} \lambda(\sigma_t)\, \mathbb{E}_{x \sim p(x),\, x_t \sim p_{\sigma_t}(x_t \mid x)} \left\| s_\theta(x_t, \sigma_t) + \frac{x_t - x}{\sigma_t^2} \right\|^2$
• In DDPM, the weighting function is missing because sample quality is better when $\lambda$ is set to 1.
• Iterative updates are based on subtracting some form of noise from the noisy image.
DDPM: $x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{1-\alpha_t}{\sqrt{1-\hat{\beta}_t}}\, z_\theta(x_t, t)\right) + \sigma_t \cdot z$
NCSN: $x \leftarrow x + \frac{\gamma_t}{2}\, s_\theta(x, \sigma_t) + \sqrt{\gamma_t}\cdot z$
• This also holds for NCSN, because $s_\theta(x_t, \sigma_t)$ approximates the negative of the noise.
$$\mathcal{L} = \frac{1}{T}\sum_{t=1}^{T} \mathbb{E}_{p(x)}\, \mathbb{E}_{p_{\sigma_t}(x_t \mid x)} \left\| s_\theta(x_t, \sigma_t) + \frac{x_t - x}{\sigma_t^2} \right\|^2 = \frac{1}{T}\sum_{t=1}^{T} \mathbb{E}_{p(x)}\, \mathbb{E}_{p_{\sigma_t}(x_t \mid x)} \left\| s_\theta(x_t, \sigma_t) - \left(-\frac{x_t - x}{\sigma_t^2}\right) \right\|^2$$
With $z_t = \frac{x_t - x}{\sigma_t}$, the target $-\frac{x_t - x}{\sigma_t^2} = -\frac{z_t}{\sigma_t}$ is a scaled negative of the noise, mirroring the DDPM objective $\min_\theta \frac{1}{T}\sum_{t=1}^{T} \mathbb{E}_{x_0 \sim p(x_0),\, z_t \sim \mathcal{N}(0, I)} \left\| z_t - z_\theta(x_t, t) \right\|^2$.
• Therefore, the generative processes defined by NCSN and DDPM are very similar.
Outline
1. Motivation
2. High-level overview
3. Denoising diffusion probabilistic models
4. Noise Conditioned Score Network
5. Stochastic Differential Equations
6. Conditional Generation
7. Research directions
Stochastic Differential Equations (SDEs)
• A generalized framework that encompasses the previous two methods
$$\frac{\partial x}{\partial t} = f(x, t) + \sigma(t)\,\omega_t \iff \partial x = f(x, t)\,\partial t + \sigma(t)\cdot \partial\omega$$
where $\partial\omega$ is notation for $\mathcal{N}(0, \partial t)$ (white Gaussian noise), $f(x, t)$ is the drift coefficient, which gradually nullifies the data $x_0$, and $\sigma(t)$ is the diffusion coefficient, which controls how much Gaussian noise is added.
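A minimal sketch of how such an SDE can be simulated numerically with the Euler-Maruyama scheme; `drift` and `diffusion` stand in for $f(x, t)$ and $\sigma(t)$ and are assumed callables:

```python
import torch

def euler_maruyama(x0, drift, diffusion, t0=0.0, t1=1.0, n_steps=1000):
    """Simulate dx = f(x, t) dt + sigma(t) dw:
    x <- x + f(x, t) * dt + sigma(t) * sqrt(dt) * z, with z ~ N(0, I)."""
    dt = (t1 - t0) / n_steps
    x, t = x0.clone(), t0
    for _ in range(n_steps):
        z = torch.randn_like(x)
        x = x + drift(x, t) * dt + diffusion(t) * (dt ** 0.5) * z
        t += dt
    return x

# For the DDPM forward process (next slide): drift(x, t) = -beta(t) * x / 2,
# diffusion(t) = sqrt(beta(t)), for some noise schedule beta(t).
```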
Stochastic Differential Equations (SDEs)
• The training objective is similar to NCSN, but adapted for continuous time:
$$\mathcal{L}^{*} = \mathbb{E}_{t}\,\lambda(t)\, \mathbb{E}_{p(x_0)}\, \mathbb{E}_{p(x_t \mid x_0)} \left\| s_\theta(x_t, t) - \nabla_{x_t} \log p(x_t \mid x_0) \right\|^2$$
for the forward SDE $\partial x = f(x, t)\,\partial t + \sigma(t)\cdot \partial\omega$.
Stochastic Differential Equations (SDEs). DDPM
$$x_t \approx \left(1 - \frac{\beta(t)\,\Delta t}{2}\right) x_{t-\Delta t} + \sqrt{\beta(t)\,\Delta t}\cdot z_{t-\Delta t}$$
$$x_t \approx x_{t-\Delta t} - \frac{\beta(t)\,\Delta t}{2}\, x_{t-\Delta t} + \sqrt{\beta(t)\,\Delta t}\cdot z_{t-\Delta t} \iff x_t - x_{t-\Delta t} = -\frac{\beta(t)\,\Delta t}{2}\, x_{t-\Delta t} + \sqrt{\beta(t)\,\Delta t}\cdot z_{t-\Delta t}$$
As $\Delta t \to 0$, this recovers the SDE $\partial x = -\frac{\beta(t)}{2}\, x\, \partial t + \sqrt{\beta(t)}\cdot \partial\omega$, i.e., the DDPM forward process corresponds to $f(x, t) = -\frac{\beta(t)}{2} x$ and $\sigma(t) = \sqrt{\beta(t)}$.
Outline
1. Motivation
2. High-level overview
3. Denoising diffusion probabilistic models
4. Noise Conditioned Score Network
5. Stochastic Differential Equations
6. Conditional Generation
7. Research directions
Conditional generation.
Diffusion models estimate the score function $\nabla_{x_t} \log p_t(x_t)$ to sample from a distribution $p(x)$.
Sampling from $p(x \mid y)$ requires the score function of this conditional density, $\nabla_{x_t} \log p_t(x_t \mid y)$, where $y$ is the condition.
Solution 1. Conditional training: train the model with an additional input $y$ to estimate $\nabla_{x_t} \log p_t(x_t \mid y)$:
$$s_\theta(x_t, t, y) \approx \nabla_{x_t} \log p_t(x_t \mid y)$$
Conditional generation. Classifier Guidance
Diffusion models estimate the score function $\nabla_{x_t} \log p_t(x_t)$ to sample from a distribution $p(x)$.
Sampling from $p(x \mid y)$ requires the score function of this conditional density, $\nabla_{x_t} \log p_t(x_t \mid y)$.
Bayes rule:
$$p_t(x_t \mid y) = \frac{p_t(y \mid x_t) \cdot p_t(x_t)}{p_t(y)} \iff$$
Logarithm:
$$\log p_t(x_t \mid y) = \log p_t(y \mid x_t) + \log p_t(x_t) - \log p_t(y) \iff$$
Gradient:
$$\nabla_{x_t} \log p_t(x_t \mid y) = \nabla_{x_t} \log p_t(y \mid x_t) + \nabla_{x_t} \log p_t(x_t) - \nabla_{x_t} \log p_t(y) = \nabla_{x_t} \log p_t(y \mid x_t) + \nabla_{x_t} \log p_t(x_t),$$
since $p_t(y)$ does not depend on $x_t$.
[Figure: samples generated with guidance scale $s = 1$ vs. $s = 10$]
Problem
• Need good gradient estimates from the classifier at each step of the denoising process, i.e., on noisy images
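A minimal sketch of classifier guidance at sampling time: the classifier gradient $\nabla_{x_t} \log p_t(y \mid x_t)$ is added to the unconditional score, scaled by a guidance factor $s$ (as in the figure above). The `score_net(x_t, t)` and noise-aware `classifier(x_t, t)` interfaces are assumptions:

```python
import torch
import torch.nn.functional as F

def classifier_guided_score(score_net, classifier, x_t, t, y, s=1.0):
    """grad log p(x_t | y) ~= s_theta(x_t, t) + s * grad_{x_t} log p(y | x_t)."""
    x_in = x_t.detach().requires_grad_(True)
    log_probs = F.log_softmax(classifier(x_in, t), dim=-1)      # log p(y | x_t)
    log_p_y = log_probs[torch.arange(x_in.size(0)), y].sum()
    grad = torch.autograd.grad(log_p_y, x_in)[0]                # classifier gradient
    return score_net(x_t, t) + s * grad
```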
Bayes rule:
$$p_t(y \mid x_t) = \frac{p_t(x_t \mid y) \cdot p_t(y)}{p_t(x_t)}$$
Logarithm:
$$\log p_t(y \mid x_t) = \log p_t(x_t \mid y) - \log p_t(x_t) + \log p_t(y)$$
Gradient:
$$\nabla_{x_t} \log p_t(y \mid x_t) = \nabla_{x_t} \log p_t(x_t \mid y) - \nabla_{x_t} \log p_t(x_t)$$
This expresses the classifier gradient through the conditional and unconditional scores, which motivates classifier-free guidance.
$$s_\theta(x_t, t, y) \approx \nabla_{x_t} \log p_t(x_t \mid y)$$
Conditional generation. Classifier-free Guidance
$$s_\theta(x_t, t, y/0) \approx \nabla_{x_t} \log p_t(x_t \mid y)\ /\ \nabla_{x_t} \log p_t(x_t \mid 0)$$
A single network receives either the condition $y$ or a null condition $0$, and thus estimates both the conditional and the unconditional score.
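A minimal sketch of the combination commonly used at sampling time with classifier-free guidance, where both scores come from the same network; the guidance weight `w` and the null token are illustrative assumptions:

```python
def cfg_score(score_net, x_t, t, y, null_token, w=3.0):
    """Classifier-free guidance: s = s_uncond + w * (s_cond - s_uncond)."""
    s_cond = score_net(x_t, t, y)                  # s_theta(x_t, t, y)
    s_uncond = score_net(x_t, t, null_token)       # s_theta(x_t, t, 0)
    return s_uncond + w * (s_cond - s_uncond)
```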
CLIP guidance
What is a CLIP model?
Radford et al., “Learning Transferable Visual Models From Natural Language Supervision”, 2021.
Nichol et al., “GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models”, 2021.
CLIP guidance
Replace the classifier in classifier guidance with a CLIP model
Radford et al., “Learning Transferable Visual Models From Natural Language Supervision”, 2021.
Nichol et al., “GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models”, 2021.
Outline
1. Motivation
2. High-level overview
3. Denoising diffusion probabilistic models
4. Noise Conditioned Score Network
5. Stochastic Differential Equations
6. Conditional Generation
7. Research directions
Research directions
Unconditional image generation:
• Sampling efficiency
• Image quality
Conditional image generation:
• Text-to-image generation
Complex tasks in computer vision:
• Image editing, even based on text
• Super-resolution
• Image segmentation
• Anomaly detection in medical images
• Video generation
Thank you!
Survey: https://arxiv.org/abs/2209.04747
GitHub: https://github.com/CroitoruAlin/Diffusion-Models-in-Vision-A-Survey