Team15 Dreamfusion
DREAMFUSION: TEXT-TO-3D USING 2D DIFFUSION
[Poole et al., ICLR 2023]
Team 15
20190156 Yun Kim
20190063 Ki Nam Kim
https://forums.fast.ai/t/new-paper-upainting-unified-text-to-image-diffusion-generation-with-cross-modal-guidance/101669
https://paperswithcode.com/dataset/laion-5b
https://blog.allenai.org/objaverse-a-universe-of-annotated-3d-objects-718ef3d61fd6
Background: NeRF
Issue with using NeRF in text-to-3D
• Training a NeRF normally requires multiple images of the scene taken from different viewpoints.
• In text-to-3D, however, there are no ground-truth images, only a single text prompt. → NeRF cannot be trained in the usual way.
Q. How, then, can we train a NeRF from a single text prompt, without ground-truth images?
[Figure: an image rendered from the NeRF and the text prompt “A yellow lego bulldozer” are input to the text-to-image diffusion model, which yields the SDS loss. The SDS loss contains information on how to adjust the rendered image to align with the provided text. It is backpropagated to optimize the NeRF.]
[Figure: images rendered from the NeRF at different camera poses $(R_1, T_1)$ and $(R_2, T_2)$ are each passed to the text-to-image diffusion model.]
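To make the camera poses $(R, T)$ in the figure concrete, here is a minimal sketch of sampling a random pose on a sphere around the object. It is an illustration only: the function name `sample_camera_pose`, the radius, the elevation range, and the world-to-camera convention (camera looking down its negative z axis, y up) are all assumptions, not details taken from the slides.

```python
import math
import random

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def sample_camera_pose(radius=1.5, max_elevation_deg=80.0, rng=random):
    """Sample a camera on a sphere around the origin, looking at the origin.

    Returns (R, T) such that x_cam = R @ x_world + T.
    Hypothetical helper; ranges and conventions are assumptions.
    """
    azimuth = rng.uniform(0.0, 2.0 * math.pi)
    # Elevation is capped below 90 degrees so the look-at basis is well defined.
    elevation = math.radians(rng.uniform(0.0, max_elevation_deg))
    # Camera centre in world coordinates (y is up).
    center = [radius * math.cos(elevation) * math.sin(azimuth),
              radius * math.sin(elevation),
              radius * math.cos(elevation) * math.cos(azimuth)]
    forward = normalize([-c for c in center])        # points at the origin
    right = normalize(cross(forward, [0.0, 1.0, 0.0]))
    up = cross(right, forward)
    R = [right, up, [-f for f in forward]]           # rows are the camera axes
    T = [-sum(r * c for r, c in zip(row, center)) for row in R]
    return R, T

R1, T1 = sample_camera_pose()
R2, T2 = sample_camera_pose()
```

Each optimization step would render the NeRF from one such pose, so the diffusion model sees the object from many directions over the course of training.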
[Figure: reverse diffusion chain $x_T \to x_{T-1} \to \dots \to x_3 \to x_2 \to x_1 \to x_0$, with the text prompt given as input at each denoising step.]
Key point!
The diffusion model does not predict the denoised image directly; it first predicts the noise, then subtracts that noise from the noisy image to denoise it.
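The key point above can be illustrated with a small sketch. It assumes the common parameterization $x_t = \alpha_t x_0 + \sigma_t \epsilon$, in which a predicted noise $\hat{\epsilon}$ gives the denoised estimate $\hat{x}_0 = (x_t - \sigma_t \hat{\epsilon}) / \alpha_t$. The function name `predict_x0` and the numbers are hypothetical, and a perfect noise prediction stands in for the real model.

```python
def predict_x0(x_t, eps_hat, alpha_t, sigma_t):
    """Denoised estimate from a noise prediction: (x_t - sigma_t * eps_hat) / alpha_t."""
    return [(x - sigma_t * e) / alpha_t for x, e in zip(x_t, eps_hat)]

# A toy 3-pixel "image" and a fixed noise sample (illustrative values).
x0 = [0.5, -1.0, 2.0]
eps = [0.3, -0.7, 1.1]
alpha_t, sigma_t = 0.8, 0.6   # chosen so that alpha_t**2 + sigma_t**2 == 1

# Forward process: mix the clean image with noise.
x_t = [alpha_t * x + sigma_t * e for x, e in zip(x0, eps)]

# If the model predicted the noise perfectly, subtracting it recovers x0.
x0_hat = predict_x0(x_t, eps, alpha_t, sigma_t)
```

A real model's prediction is only approximate, which is why sampling repeats this over many timesteps rather than denoising in one shot.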
DreamFusion Pipeline
Step by step: how the NeRF is optimized using the SDS loss
1. Render an image $x_0$ from the NeRF.
2. Generate random noise $\epsilon$.
3. Select a random denoising timestep t: t = random(1, T).
4. Add the noise to $x_0$ to make the noisy image $x_t$ at timestep t.
5. Predict the noise $\hat{\epsilon}_t$ with the text-to-image diffusion model, conditioned on the text prompt “A yellow lego bulldozer” (encoded by the CLIP Encoder).
Gradient of the SDS loss, as a product of three terms:
$\nabla L:\ \underbrace{(\hat{\epsilon}_t - \epsilon)}_{1}\ \underbrace{(\text{U-Net Jacobian})}_{2}\ \underbrace{(\text{Generator Jacobian})}_{3}$
In practice, the U-Net Jacobian term (2) is expensive to compute, and omitting it still gives an effective update direction for the NeRF.
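The five steps and the simplified gradient (with the U-Net Jacobian omitted) can be sketched on a toy problem. Everything here is a stand-in and should be read as an assumption: `render` replaces the NeRF with an identity "renderer" (so the generator Jacobian is the identity), `toy_noise_predictor` replaces the frozen text-to-image model with the exact noise predictor for a world in which only one image, `target`, matches the prompt, and the schedule and weighting $w(t) = \sigma_t^2$ are arbitrary choices.

```python
import math
import random

def alpha_sigma(t, T):
    """Toy noise schedule (assumption): alpha^2 + sigma^2 = 1, both positive for 1 <= t <= T."""
    u = 0.5 * math.pi * t / (T + 1)
    return math.cos(u), math.sin(u)

def render(theta):
    """Stand-in for NeRF rendering: the 'image' is the parameter vector itself."""
    return list(theta)

def toy_noise_predictor(x_t, t, T, target):
    """Stand-in for the frozen diffusion model (hypothetical, not a real network)."""
    alpha, sigma = alpha_sigma(t, T)
    return [(x - alpha * m) / sigma for x, m in zip(x_t, target)]

def sds_step(theta, target, T=1000, lr=0.5, rng=random):
    x0 = render(theta)                                        # 1. render image
    eps = [rng.gauss(0.0, 1.0) for _ in x0]                   # 2. random noise
    t = rng.randint(1, T)                                     # 3. random timestep
    alpha, sigma = alpha_sigma(t, T)
    x_t = [alpha * x + sigma * e for x, e in zip(x0, eps)]    # 4. add noise
    eps_hat = toy_noise_predictor(x_t, t, T, target)          # 5. predict noise
    w = sigma ** 2                                            # weighting w(t), an assumption
    # SDS gradient without the U-Net Jacobian; the generator Jacobian is the identity here.
    grad = [w * (eh - e) for eh, e in zip(eps_hat, eps)]
    return [p - lr * g for p, g in zip(theta, grad)]

target = [0.2, -0.4, 0.9]
theta = [0.0, 0.0, 0.0]
for _ in range(300):
    theta = sds_step(theta, target)
```

In this toy setting the noise residual reduces to $\alpha_t (x_0 - \text{target}) / \sigma_t$, so every step pulls the rendered image toward the image the "model" prefers. With a real NeRF, the same residual would be backpropagated through the renderer instead of being applied to the image directly.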
Result: Examples
Limitation
• The SDS loss is not a perfect loss function: it often produces oversmoothed results.
• DreamFusion uses the 64x64 Imagen model, so the image resolution is limited to 64x64.
Oversmoothed example (64x64)
Limitation
Janus Problem
DreamFusion conditions on the view direction only coarsely, by sorting camera angles into four rough categories: “overhead”, “front”, “side”, and “back”.
This coarse conditioning can cause the same feature (e.g., a face or eyes) to appear at multiple viewing angles.
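The coarse view categorization described above can be sketched as a function that appends a view label to the prompt. The function name `view_prompt`, the angle thresholds, and the exact wording of the suffix are illustrative assumptions, not values taken from the slides.

```python
def view_prompt(prompt, azimuth_deg, elevation_deg):
    """Append one of four coarse view labels to the prompt (thresholds are assumptions)."""
    if elevation_deg > 60.0:
        view = "overhead"
    else:
        az = azimuth_deg % 360.0          # wrap into [0, 360)
        if az < 45.0 or az >= 315.0:
            view = "front"
        elif 135.0 <= az < 225.0:
            view = "back"
        else:
            view = "side"
    return f"{prompt}, {view} view"

front = view_prompt("A yellow lego bulldozer", 0.0, 10.0)
back = view_prompt("A yellow lego bulldozer", 180.0, 10.0)
```

With only four buckets, many distinct camera angles receive the same text, which is one way to see why a face can be encouraged from several directions at once.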
Contribution
Papers inspired by DreamFusion
- DreamFusion pioneered the use of a 2D diffusion model to create 3D objects, and it led to many subsequent studies that improved the SDS loss, resulting in better text-to-3D models.
- The approach shows that 3D-related tasks can be solved without relying on scarce 3D data, by using abundant 2D data alone.
Thank You
Quiz
https://forms.gle/HG67Nz3DrawxLVkq7