
Diffusion Models (DDPM)
INTRODUCTION

Diffusion Models are the state of the art for generating diverse and high-resolution images. A few successful large-scale architectures are GLIDE, DALL-E 2/3, and IMAGEN.

Our focus will be on DDPM (Denoising Diffusion Probabilistic Models) by Ho et al., 2020.

Diffusion Models are fundamentally different from GANs and other image-generation techniques: in brief, generation is decomposed into smaller denoising steps. GANs can generate great samples but have some issues, and diffusion models are better equipped to handle them. Some of the recent works using Stable Diffusion look magical.

[Figure: Diffusion Models timeline]

Text-to-Image generation via a prompt; observe the detailing in the generated output.

[Figure: The overall DDPM illustration with its algorithms. This is plain image generation from noise, with training via noise prediction. Conditional image generation can additionally be conditioned on text.]
If we build a learning model that can learn the systematic decay of information due to noise, then it should be possible to reverse the process and recover this information back from the noise.
[Figure: the forward pass encodes the input sample into noise; the backward pass predicts the sample back.]
To generate new samples we need to train a good generator.
All of them use different ways to regularize the latent representation and make it suitable for training: data is projected into a latent space and then recovered back to its initial state.

VAE: the latent representation contains compressed information about the original images, and the mapping to the latent space is trainable.

DM: the data is destroyed entirely after the last step of the forward pass, the latent representation has the same dimensions as the original image, and the forward process is not trainable.

In DMs, instead of learning/modeling the data distribution directly, the aim is to model a series of noise distributions in a Markov chain, so as to decode the data by undoing/denoising it in a hierarchical fashion. Small iterative auto-correction steps lead to high-resolution, good-quality sample generation.

I
But this makes DMs slow compared to GANs. In DMs it is assumed that, during this process, the model itself can correctly and iteratively generate a good-quality image. The basic idea is iterative representation refinement; because of this iterative sampling, DMs are slower than GANs even at inference.

Question: how to do sampling and noise addition in a single step?

Adding a small amount of noise: x_t = x_{t-1} + \epsilon. If the noise \epsilon_t is sampled from the unit normal, then sampling from the unit normal and adding it can be combined/merged into a single step.

How? Given x_{t-1}, sample x_t directly from

    q(x_t \mid x_{t-1}) = \mathcal{N}(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t I)

i.e. sample \epsilon_{t-1} from the unit normal, \epsilon_{t-1} \sim \mathcal{N}(0, I), and set

    x_t = \sqrt{1-\beta_t}\, x_{t-1} + \sqrt{\beta_t}\, \epsilon_{t-1}

This is the reparameterization trick.
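As a minimal sketch (my own illustration, not from the notes), this single noising step can be written in NumPy; the function name and shapes are assumptions:

    import numpy as np

    def forward_step(x_prev, beta_t, rng=np.random.default_rng()):
        # One forward step: x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps, eps ~ N(0, I)
        eps = rng.standard_normal(x_prev.shape)
        return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * eps

    x0 = np.zeros((3, 32, 32))            # a dummy "image"
    x1 = forward_step(x0, beta_t=1e-4)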

Diffusion Process

There are two steps: Forward and Backward (reverse).

Forward Diffusion (overview): take the input x_0 and add noise gradually/iteratively through a series of T steps. The added noise is basic Gaussian noise and can be used as an intermediate target for our neural network. The forward step is a known noise addition and requires no learning, i.e. we add noise in a manner such that one can later learn how to denoise.


Reverse Diffusion (overview): now one needs to train a model to recover the original data by reversing the noising process, i.e. learning how to denoise, starting from x_T \sim p_\theta. Since the process is stochastic, the recovered output can differ from the actual input image, but it can be seen as a denoised version of some input image, obtained essentially by sampling.

x_0 is sampled from the real data distribution, x_0 \sim q(x_0), and small Gaussian noise is added at each step:

    x_1 = x_0 + \epsilon_1
    x_2 = x_1 + \epsilon_2 = x_0 + \epsilon_1 + \epsilon_2     (merge of 2 Gaussians)

Merging Gaussians gives another Gaussian: for independent \epsilon_i \sim \mathcal{N}(0, \sigma_i^2 I),

    \epsilon_1 + \epsilon_2 \sim \mathcal{N}(0,\ (\sigma_1^2 + \sigma_2^2)\, I)

The individual noise additions need to be small.
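A quick numeric sanity check of this merging property (illustrative only): the variance of a sum of independent Gaussian samples is the sum of the variances.

    import numpy as np

    rng = np.random.default_rng(0)
    eps1 = rng.normal(0.0, 0.3, size=1_000_000)   # sigma_1^2 = 0.09
    eps2 = rng.normal(0.0, 0.4, size=1_000_000)   # sigma_2^2 = 0.16
    print(np.var(eps1 + eps2))                    # ~0.25 = 0.09 + 0.16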
Diffusion Process Explained (in detail)

Forward Diffusion

Similar to VAEs, DMs can also be seen as latent variable models: the latent space holds features that cannot be directly observed. The forward diffusion can actually be seen as a Markov chain of T steps: the output of the t-th step depends only on the previous input x_{t-1}. Specifically, at each step of the Markov chain, small Gaussian noise with variance \beta_t is added to x_{t-1} to get a new latent variable x_t.
The real data distribution is q(x_0), and x_0 \sim q(x_0) is obtained by sampling. Iteratively adding small Gaussian noise, the new latent variable x_t follows the distribution

    q(x_t \mid x_{t-1}) = \mathcal{N}(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t I)

This is a Gaussian with small variance \beta_t; keeping \beta_t small ensures that only small changes are introduced at each step. I is the d x d identity matrix, so each dimension has the same standard deviation. Hence we can go from x_0 to x_T in a tractable way, via the trajectory

    q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1})

q(x_{1:T} \mid x_0) = q(x_1 \mid x_0)\, q(x_2 \mid x_1) \cdots is the joint probability of the trajectory: the \prod symbol means we keep applying q(x_t \mid x_{t-1}) repeatedly over the time steps t. But to get, say, x_{500} we need to apply it 500 times, i.e. basically 500 samplings. This makes sampling sequential and very slow.
The reparameterization trick is the remedy, giving one-step fast sampling with exactly the same closed form as in the VAE. We need a tractable closed-form sampling at any time step t. This reparameterization works just because we have chosen q(x_t \mid x_{t-1}) to be Gaussian; the mathematics is worked out below.

Reparameterization trick: a tractable closed-form sampling at any time step t.

Let \alpha_t = 1 - \beta_t and \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s (a cumulative multiplication over all steps). Assuming \epsilon_{t-1}, \epsilon_{t-2}, \ldots \sim \mathcal{N}(0, I), one can expand a single step:

    x_t = \sqrt{\alpha_t}\, x_{t-1} + \sqrt{1-\alpha_t}\, \epsilon_{t-1}
    x_{t-1} = \sqrt{\alpha_{t-1}}\, x_{t-2} + \sqrt{1-\alpha_{t-1}}\, \epsilon_{t-2}

Placing the second expression into the first:

    x_t = \sqrt{\alpha_t \alpha_{t-1}}\, x_{t-2} + \sqrt{\alpha_t (1-\alpha_{t-1})}\, \epsilon_{t-2} + \sqrt{1-\alpha_t}\, \epsilon_{t-1}
Adding two independent random variables coming from normal distributions: let

    X \sim \mathcal{N}(\mu_x, \sigma_x^2),    Y \sim \mathcal{N}(\mu_y, \sigma_y^2)
    Z = X + Y \sim \mathcal{N}(\mu_x + \mu_y,\ \sigma_x^2 + \sigma_y^2)

Revisiting the equation above, the two noise terms can be seen as

    X \sim \mathcal{N}(0,\ \alpha_t (1 - \alpha_{t-1})\, I),    Y \sim \mathcal{N}(0,\ (1 - \alpha_t)\, I)
    Z = X + Y \sim \mathcal{N}(0,\ (\alpha_t (1 - \alpha_{t-1}) + 1 - \alpha_t)\, I) = \mathcal{N}(0,\ (1 - \alpha_t \alpha_{t-1})\, I)

Hence

    x_t = \sqrt{\alpha_t \alpha_{t-1}}\, x_{t-2} + \sqrt{1 - \alpha_t \alpha_{t-1}}\, \bar{\epsilon}_{t-2}

Since all the \epsilon's are sampled from \mathcal{N}(0, I), the merged noise can simply be written as \epsilon. Repeating this all the way down to x_0 gives

    x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon_0
Now, if we revisit this closely:

    x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon_0

This can be written as x_t \sim \mathcal{N}(\sqrt{\bar{\alpha}_t}\, x_0,\ (1 - \bar{\alpha}_t)\, I), i.e. x_t is sampled from q(x_t \mid x_0), where

    q(x_t \mid x_0) = \mathcal{N}(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1 - \bar{\alpha}_t)\, I)

Here \beta_t, \alpha_t and \bar{\alpha}_t are all hyperparameters, already fixed, and can be precomputed as per the variance schedule.

But how can the above equation be used for fast sampling of x_t? It is exactly what is required to compute our tractable target, as we will see below.

Real data and intermediate latent variables: x_0, x_1, \ldots, x_t, \ldots, x_T. x_0 is sampled from the real data; then, for example, x_{500} \sim q(x_{500} \mid x_0) is computed directly, with \epsilon sampled from the unit Gaussian only once. Getting x_{500} is thus just one sampling.
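A minimal sketch of this closed-form, one-shot sampling, assuming a simple linear schedule; names such as q_sample are my own, not from the source:

    import numpy as np

    T = 1000
    betas = np.linspace(1e-4, 0.02, T)            # beta_1 ... beta_T (linear schedule)
    alphas = 1.0 - betas
    alphas_bar = np.cumprod(alphas)               # \bar{alpha}_t, precomputed once

    def q_sample(x0, t, rng=np.random.default_rng()):
        # Sample x_t ~ q(x_t | x_0) in a single step (t is 0-indexed here).
        eps = rng.standard_normal(x0.shape)
        return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps, eps

    x0 = np.zeros((3, 32, 32))
    x500, eps = q_sample(x0, t=499)               # one sampling instead of 500 sequential steps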

The variances \beta_1, \ldots, \beta_T can be fixed; DDPM uses a linear schedule. Typically a linear or quadratic schedule is used, or a cosine schedule (best, as reported by Nichol & Dhariwal, 2021).

[Figure: Linear vs cosine schedule based noise addition.]
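A sketch of the two schedules, the linear one used in DDPM and the cosine one from Nichol & Dhariwal (2021); the endpoint and clipping constants are commonly used values and should be treated as assumptions:

    import numpy as np

    def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
        return np.linspace(beta_start, beta_end, T)

    def cosine_beta_schedule(T, s=0.008):
        # alpha_bar follows a squared-cosine curve; betas are derived from its ratios.
        steps = np.arange(T + 1)
        f = np.cos(((steps / T) + s) / (1 + s) * np.pi / 2) ** 2
        alphas_bar = f / f[0]
        betas = 1.0 - (alphas_bar[1:] / alphas_bar[:-1])
        return np.clip(betas, 0.0, 0.999)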

Reverse Denoising (\theta is the parameter set)

The reverse step p_\theta(x_{t-1} \mid x_t) is a parametric distribution. Assuming it to be Gaussian, it can be parameterized by a mean \mu_\theta(x_t, t) and a covariance \Sigma_\theta(x_t, t). Provided the training is suitable, this parametric distribution can ensure the system learns denoising.
DM vs VAE

On the other hand, DM suggests that if we take data and project it into a latent space through systematic information decay due to noise addition, and build a model learning its distribution, then it is possible to reverse the process, utilizing the ELBO loss, to recover the information back via image reconstruction. Hence, instead of learning the data distribution directly, it learns to model a series of noise distributions and decodes the data by denoising in a hierarchical fashion, following a Markov chain.
REVERSE DIFFUSION

x_T is sampled from an isotropic Gaussian distribution: all the intermediate noise additions are small Gaussians, and together they essentially lead to one bigger isotropic Gaussian. Also, the inversion of a (small) Gaussian can itself be a Gaussian: Gaussian noise addition corresponds to Gaussian noise removal.
If we know/learn how to convert x_t into x_{t-1}, i.e. we can sample from q(x_{t-1} \mid x_t), then starting from x_T \sim \mathcal{N}(0, I) we can run the reverse process to get a novel data point sampled from the real data distribution, without explicitly learning the real data distribution, just by understanding how to systematically denoise at each step.

The question is how to get q(x_{t-1} \mid x_t). Practically this is not known. It could be estimated statistically, but that requires computation involving the full data and its distribution: basically all the data is needed, as well as all the intermediate steps.
This makes it intractable: for a dataset of real images, one would have to add noise T times per image, repeat this for every image, and learn such a distribution for every time step t. It requires the full data and all its intermediates, and hence huge memory and compute. It is practically infeasible.
Reversal Process

Instead of computing the reverse distribution q(x_{t-1} \mid x_t) directly, let us approximate it using a neural network parameterized by \theta. As discussed, this reverse step is assumed to be Gaussian. Assuming p_\theta to also be a Gaussian, we need to figure out its mean and covariance. These can be \mu_\theta(x_t, t) and \Sigma_\theta(x_t, t), i.e. conditioned on x_t but additionally on t:

    p_\theta(x_{t-1} \mid x_t) = \mathcal{N}(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t))

One can see that \mu_\theta and \Sigma_\theta are additionally conditioned on t. This means the network learns to predict the Gaussian parameters for each step specifically, as they vary with time.
The reverse trajectory is x_T \to x_{T-1} \to \cdots \to x_1 \to x_0: one can get the data distribution from the normal (noise) distribution by denoising step by step. The joint distribution of the trajectory is

    p_\theta(x_{0:T}) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t)

with each step p_\theta(x_{t-1} \mid x_t) = \mathcal{N}(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)).
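Structurally, this reverse trajectory is just an ancestral-sampling loop. A sketch of that structure, where `model` is a placeholder (not a real API) standing for any network that returns the mean and variance of p_theta:

    import torch

    @torch.no_grad()
    def reverse_trajectory(model, shape, T):
        x = torch.randn(shape)                    # x_T ~ N(0, I)
        for t in reversed(range(T)):              # t = T-1, ..., 0
            mean, var = model(x, t)               # parameters of p_theta(x_{t-1} | x_t)
            noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
            x = mean + var.sqrt() * noise
        return x                                  # approximate sample from q(x_0)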
If we knew what amount of (parameterized) noise was added, we could remove it and get x_{t-1}. But this reversal is ill-posed/intractable on its own, due to the stochasticity: it depends on the entire real data distribution. Denoising therefore needs to be conditioned on what to generate.

Essentially we need to predict the noise whose removal leads to the previous, less-noisy step. BUT x_t alone doesn't mean much: we also need some direction/guidance about where this diffusion step must lead, a direction coming from the source we want to reach.
q(x_{t-1} \mid x_t) is intractable (Sohl-Dickstein et al.). How can it be made tractable? With additional conditioning on x_0: after additionally conditioning on x_0 it becomes tractable,

    q(x_{t-1} \mid x_t, x_0)     (doubly conditioned)

Starting from a dataset image x_0 and diffusing it in small denoising steps, we can use this additional conditioning to get the ground truth for p_\theta, because the x_0's are available during training.
Hence

    q(x_{t-1} \mid x_t, x_0) = \mathcal{N}(x_{t-1};\ \tilde{\mu}_t(x_t, x_0),\ \tilde{\beta}_t I)

(this can be proved formally later, assuming a diagonal covariance), with

    \tilde{\beta}_t = \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t}\, \beta_t

    \tilde{\mu}_t(x_t, x_0) = \frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1 - \bar{\alpha}_t}\, x_0 + \frac{\sqrt{\alpha_t}\, (1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t}\, x_t

\tilde{\beta}_t depends only on the \beta_t's and \bar{\alpha}_t's, so it can all be precomputed.
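A sketch of computing this posterior (function and variable names are mine), reusing precomputed betas / alphas / alphas_bar arrays as above:

    import numpy as np

    def q_posterior(x0, xt, t, betas, alphas, alphas_bar):
        # q(x_{t-1} | x_t, x_0) = N(mu_tilde, beta_tilde * I); t is 0-indexed.
        alpha_bar_prev = alphas_bar[t - 1] if t > 0 else 1.0
        beta_tilde = (1.0 - alpha_bar_prev) / (1.0 - alphas_bar[t]) * betas[t]
        mu_tilde = (np.sqrt(alpha_bar_prev) * betas[t] / (1.0 - alphas_bar[t])) * x0 \
                 + (np.sqrt(alphas[t]) * (1.0 - alpha_bar_prev) / (1.0 - alphas_bar[t])) * xt
        return mu_tilde, beta_tilde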
Now, coming back to x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon with \epsilon \sim \mathcal{N}(0, I), and just rewriting for x_0:

    x_0 = \frac{1}{\sqrt{\bar{\alpha}_t}} \left( x_t - \sqrt{1 - \bar{\alpha}_t}\, \epsilon \right)

Putting this into the expression for \tilde{\mu}_t gives

    \tilde{\mu}_t = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon \right)

Target mean: each time step now has a target mean that only depends on x_t (and the noise \epsilon that was added to form it).
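Equivalently, once a network predicts the noise, this mean can be formed directly from x_t; a small sketch under the same assumptions as above:

    import numpy as np

    def mu_from_eps(xt, eps, t, betas, alphas, alphas_bar):
        # mu_tilde_t = (1 / sqrt(alpha_t)) * (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps)
        return (xt - betas[t] / np.sqrt(1.0 - alphas_bar[t]) * eps) / np.sqrt(alphas[t])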


Just summarizing the things discussed so far: there is a target distribution and a predicted distribution. The predicted, parameterized distribution needs to be learned by a neural network and is assumed to be Gaussian; the forward distribution q(x_t \mid x_{t-1}) is a fixed forward Gaussian.

Target distribution:

    q(x_{t-1} \mid x_t, x_0) = \mathcal{N}(x_{t-1};\ \tilde{\mu}_t(x_t, x_0),\ \tilde{\beta}_t I)

with \tilde{\mu}_t and \tilde{\beta}_t as given above (these can be formally proved later). This shows that at each step \tilde{\mu}_t is a fixed, known function of (x_t, t) and \tilde{\beta}_t is a fixed, known function of t.
Predicted distribution:

    p_\theta(x_{t-1} \mid x_t) = \mathcal{N}(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t))

\mu_\theta must be parameterized as a function of x_t and t. Since \tilde{\beta}_t depends only on t and not on x_t (because it is not conditioned on x_t), the variance need not be predicted: we can just precompute it and use it whenever required. Fixing the variance is convenient but also limiting; it is learnt as well by later algorithms.
Now, in order to match the predicted distribution p_\theta(x_{t-1} \mid x_t) as closely as possible to the target distribution q(x_{t-1} \mid x_t, x_0), one can minimize

    KL\big( q(x_{t-1} \mid x_t, x_0)\ \|\ p_\theta(x_{t-1} \mid x_t) \big)

Essentially both are Gaussians with the same variance, so the KL divergence minimization boils down to

    \| \tilde{\mu}_t - \mu_\theta \|^2     for a given t

Finally, this can be rewritten as a noise-prediction objective,

    \| \epsilon - \epsilon_\theta(x_t, t) \|^2,    where    x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon

i.e. a simple loss function that is essentially noise prediction.
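A hedged sketch of the resulting training step (DDPM's training algorithm in simplified form); `model` is a placeholder noise-prediction network taking (x_t, t), and alphas_bar is assumed to be a 1-D tensor on the same device as the images:

    import torch
    import torch.nn.functional as F

    def ddpm_loss(model, x0, alphas_bar):
        # Sample a random timestep and noise, form x_t in closed form, and regress the noise.
        b = x0.shape[0]
        t = torch.randint(0, len(alphas_bar), (b,), device=x0.device)
        eps = torch.randn_like(x0)
        ab = alphas_bar[t].view(b, 1, 1, 1)              # assumes 4-D image batches
        x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps
        return F.mse_loss(model(x_t, t), eps)            # || eps - eps_theta(x_t, t) ||^2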

The neural network is thus designed and used to predict the noise as output, for an input x_t of the same dimensions (DDPM uses a U-Net for this).
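Putting it together, sampling then repeatedly subtracts the predicted noise (DDPM's sampling algorithm, sketched under the same assumptions, with the variance fixed to beta_t):

    import torch

    @torch.no_grad()
    def ddpm_sample(model, shape, betas, alphas, alphas_bar):
        x = torch.randn(shape)                                   # x_T ~ N(0, I)
        for t in reversed(range(len(betas))):
            eps_hat = model(x, torch.full((shape[0],), t))       # predicted noise eps_theta(x_t, t)
            mean = (x - betas[t] / (1.0 - alphas_bar[t]).sqrt() * eps_hat) / alphas[t].sqrt()
            noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
            x = mean + betas[t].sqrt() * noise                   # variance fixed to beta_t
        return x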
The best and most popular diffusion applications are style conversion and text-to-realistic-image generation. Current limitations are face distortion and generating text within an image, which diffusion models are currently bad at.
Diffusion Models for Image Generation
Reading Plan for Deep Learning Enthusiasts (prerequisite: VAE)
Prepared by Aditya Nigam @ IIT Mandi

Excellent blog post by Lilian Weng (What are diffusion models?)

https://lilianweng.github.io/posts/2021-07-11-diffusion-models/#score

You must finish this blog post to understand all the components.

Both posts below summarize the work (from the top):

1. AI-Summer post by Sergios Karagiannakos and Nikolas Adaloglou (The math of
diffusion models from scratch) https://theaisummer.com/diffusion-models/
2. Medium post by J. Rafid Siddiqui, explaining with a very well annotated figure.
https://towardsdatascience.com/what-are-stable-diffusion-models-and-why-are-the
y-a-step-forward-for-image-generation-aa1182801d46

Two other resources, one comprehensive and the other for everyone:

1. A detailed tutorial by Calvin Luo, https://arxiv.org/pdf/2208.11970.pdf


2. YouTube video explanation by Jay Alammar (from the top, even for non-technical
persons), https://www.youtube.com/watch?v=MXmacOUJUaw with a
corresponding Blog Post https://jalammar.github.io/illustrated-stable-diffusion/
