This document summarizes recent research on applying self-attention mechanisms from Transformers to domains other than language, such as computer vision. It discusses models that use self-attention for images, including ViT, DeiT, and T2T, which apply Transformers to divided image patches. It also covers more general attention modules like the Perceiver that aims to be domain-agnostic. Finally, it discusses work on transferring pretrained language Transformers to other modalities through frozen weights, showing they can function as universal computation engines.
2. Bibliography
[1] Vincent, P. (2011).
A connection between score matching and denoising autoencoders.
Neural computation, 23(7), 1661-1674.
[2] Song, Y., & Ermon, S. (2019).
Generative modeling by estimating gradients of the data distribution.
Advances in Neural Information Processing Systems, 32.
[3] Ho, J., Jain, A., & Abbeel, P. (2020).
Denoising diffusion probabilistic models.
Advances in Neural Information Processing Systems, 33, 6840-6851.
[4] Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., & Poole, B. (2020).
Score-based generative modeling through stochastic differential equations.
arXiv preprint arXiv:2011.13456.
[5] Anderson, B. D. (1982).
Reverse-time diffusion equation models.
Stochastic Processes and their Applications, 12(3), 313-326.
[6] Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022).
High-resolution image synthesis with latent diffusion models.
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 10684-10695).
[7] Song, Y., Shen, L., Xing, L., & Ermon, S. (2021).
Solving inverse problems in medical imaging with score-based generative models.
arXiv preprint arXiv:2111.08005.
Other reference URLs:
What are Diffusion Models?
https://lilianweng.github.io/posts/2021-07-11-diffusion-models/
Generative Modeling by Estimating Gradients of the Data Distribution
https://yang-song.net/blog/2021/score/
Reason for selection
Interest in understanding the fundamentals of diffusion models and in the latest application examples:
• Understanding the core ideas behind diffusion models
• Recent application examples
10. The challenge of computational cost
Each generation run evaluates the U-Net 1,000 times sequentially, so sampling is computationally heavy.
Architecture
• U-Net based (with various improvements)
• One U-Net evaluation per update x_{t−1} ← x_t
• T = 1000
• All computation in pixel space
Figure source: [4] Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., & Poole, B. (2020). Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456.
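The pixel-space sampling loop described above can be sketched in a few lines. The following is a minimal NumPy sketch of DDPM-style ancestral sampling in the spirit of [3], not the papers' actual implementation: the `unet` callable, the schedule arrays `alphas`/`alpha_bars`, and the choice of variance are illustrative assumptions. The key point is the sequential loop: one denoiser call per step t, T times in a row.

```python
import numpy as np

def ddpm_sample(unet, alphas, alpha_bars, shape, T=1000, rng=None):
    """Ancestral DDPM-style sampling: T sequential denoiser calls (x_t -> x_{t-1})."""
    rng = rng or np.random.default_rng(0)
    x = rng.standard_normal(shape)                 # x_T ~ N(0, I)
    for t in reversed(range(T)):                   # serial loop: cannot be parallelized
        eps = unet(x, t)                           # predicted noise eps_theta(x_t, t)
        coef = (1 - alphas[t]) / np.sqrt(1 - alpha_bars[t])
        x = (x - coef * eps) / np.sqrt(alphas[t])  # posterior mean of x_{t-1}
        if t > 0:                                  # add noise except at the final step
            x = x + np.sqrt(1 - alphas[t]) * rng.standard_normal(shape)
    return x
```

With T = 1000 and a full-resolution U-Net call at every iteration, this loop is the dominant cost of generation, which motivates the latent-space approach on the next slide.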
11. Diffusion-based models in latent space
Figure source: [6] Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 10684-10695).
Information compression
• Of the information an image carries (semantic and perceptual), the autoencoder handles the perceptual part
• A loss term that suppresses blur is also used
The AE compresses the image; sampling is then performed in the latent space for efficiency.
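The efficiency gain can be sketched as follows. This is a rough illustration of the latent diffusion pipeline of [6], under assumed shapes: `decoder` and `denoise_step` are hypothetical stand-ins for the autoencoder decoder and the latent-space U-Net step, and the latent channel count is illustrative. With a downsampling factor f = 8, every denoising step touches 8×8 = 64 times fewer spatial positions than pixel-space diffusion, and the decoder runs only once.

```python
import numpy as np

def latent_diffusion_sample(decoder, denoise_step, img_hw=(256, 256),
                            z_channels=4, f=8, T=50, rng=None):
    """Sketch: run the diffusion loop on (H/f) x (W/f) latents, then decode once."""
    rng = rng or np.random.default_rng(0)
    h, w = img_hw[0] // f, img_hw[1] // f
    z = rng.standard_normal((z_channels, h, w))  # noise in the AE's latent space
    for t in reversed(range(T)):
        z = denoise_step(z, t)                   # cheap: 64x fewer spatial positions
    return decoder(z)                            # single decode back to pixel space
```

The design choice is to spend the iterative compute where it is cheap (the compressed latent) and pay the full-resolution cost only once, in the final decode.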
12. Good image generation achieved with 1/8 downsampling
Figure source: [6] Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 10684-10695).