SlideShare a Scribd company logo
1
DEEP LEARNING JP
[DL Papers]
http://deeplearning.jp/
Masked World Models for Visual Control
Koki Yamane, University of Tsukuba
書誌情報
2022/12/9 2
題名 Masked World Models for Visual Control
著者 Younggyo Seo (1,2), Danijar Hafner (2,3,4), Hao Liu (2), Fangchen Liu (2),
Stephen James (2), Kimin Lee (3), Pieter Abbeel (2)
所属 (1) KAIST (2) UC Berkeley (3) Google Research (4) University of Toronto
会議 CoRL 2022
website https://sites.google.com/view/mwm-rl
概要  世界モデルの画像表現学習に Masked Autoencoder (MAE) を使用
 報酬の予測によりタスクに適した表現を獲得
先行研究:
世界モデル [Ha+ 2018]
 Vision (V) Model
 画像を潜在変数に圧縮
 VAE,対照学習など
 Memory (M) Model
 潜在変数の時間変化を学習
 RNNで潜在変数の系列を記憶
 Controller (C) Model
 潜在変数から行動を予測
 世界モデルが学習できれば方策は線
形モデルで単純なモデル化が可能
2022/12/9 D. Ha and J. Schmidhuber. World models. In Advances in Neural Information Processing Systems, 2018. 3
環境のシミュレータを学習により獲得し高いサンプル効率で強化学習
背景:物体消失問題
 画像表現学習とタスクのギャップ
 VAEのような再構成学習では面積の小
さい要素は無視してもLossが下がっ
てしまう
 一方でタスクに必要なのは対象物体
の位置などの一部の情報
 学習コストの問題
 画像モデルと状態遷移モデルを同時
に学習すると高次元データのRNNに
なり計算量が増大
2022/12/9
Okada, Masashi, and Tadahiro Taniguchi. "DreamingV2: Reinforcement Learning with Discrete World Models without
Reconstruction." arXiv preprint arXiv:2203.00494 (2022).
4
単純に再構成誤差でAEを学習してもタスクに適した表現は得られない
先行研究:
Masked Autoencoder (MAE) [He+ 2021]
 パッチに分割された画像の大部分
(75%)をマスクしてViTに入力
 損失関数
 マスクされたパッチの再構成誤差
(MSE)
 画像分類タスクで高精度を達成
2022/12/9
K. He, X. Chen, S. Xie, Y. Li, P. Dollar, and R. Girshick. Masked autoencoders are scalable vision learners. arXiv preprint
arXiv:2111.06377, 2021.
5
ViTをマスク復元タスクで事前学習
提案手法:
Masked World Models (MWM)
世界モデルの画像表現学習に Masked Autoencoder (MAE) を使用
2022/12/9 6
画像直接ではなく中間層でマスキング
(物体を消してしまうのを防ぐ?)
再構成に加え報酬を予測
(報酬にかかわる情報を重視させる)
実験
3つのシミュレーション環境で実験
2022/12/9 7
Meta-world RLBench
DeepMind
Control Suite
結果
性能・サンプル効率ともに従来手法(Dreamer V2)から改善
2022/12/9 8
小さな物体のない
タスクでは同等程度
小さな物体が重要な
タスクでは差が顕著
結果:Ablation Studies
75%の特徴量マスク+報酬予測で最高性能
2022/12/9 9
画像直接ではなく
特徴量のマスクで
性能向上
75%のマスクで最高性能 報酬予測で性能向上
結果:予測画像比較
Dreamer V2 と比較して MWM は物体の位置を予測できている
2022/12/9 10
既存手法では
物体消失
提案手法では
物体位置把握
まとめ
2022/12/9 11
 世界モデルの画像表現学習に Masked Autoencoder (MAE) を使用
 画像直接ではなく中間層でマスキング
 報酬の予測によりタスクに適した表現を獲得
 Dreamer V2 と比較して小さな物体を扱うタスクで大幅に性能改善
 感想
 損失関数にタスクの情報を含ませることが重要
 潜在変数がタスク依存になってしまう点が気になる
Ad

More Related Content

Similar to 【DL輪読会】Masked World Models for Visual Control (10)

関西CVPRML勉強会 2012.2.18 (一般物体認識 - データセット)
関西CVPRML勉強会 2012.2.18 (一般物体認識 - データセット)関西CVPRML勉強会 2012.2.18 (一般物体認識 - データセット)
関西CVPRML勉強会 2012.2.18 (一般物体認識 - データセット)
Akisato Kimura
 
【DL輪読会】SDEdit: Guided Image Synthesis and Editing with Stochastic Differentia...
【DL輪読会】SDEdit: Guided Image Synthesis and Editing with Stochastic Differentia...【DL輪読会】SDEdit: Guided Image Synthesis and Editing with Stochastic Differentia...
【DL輪読会】SDEdit: Guided Image Synthesis and Editing with Stochastic Differentia...
Deep Learning JP
 
【DL輪読会】DayDreamer: World Models for Physical Robot Learning
【DL輪読会】DayDreamer: World Models for Physical Robot Learning【DL輪読会】DayDreamer: World Models for Physical Robot Learning
【DL輪読会】DayDreamer: World Models for Physical Robot Learning
Deep Learning JP
 
[DL輪読会]Dream to Control: Learning Behaviors by Latent Imagination
[DL輪読会]Dream to Control: Learning Behaviors by Latent Imagination[DL輪読会]Dream to Control: Learning Behaviors by Latent Imagination
[DL輪読会]Dream to Control: Learning Behaviors by Latent Imagination
Deep Learning JP
 
12. Diffusion Model の数学的基礎.pdf
12. Diffusion Model の数学的基礎.pdf12. Diffusion Model の数学的基礎.pdf
12. Diffusion Model の数学的基礎.pdf
幸太朗 岩澤
 
ICCV2019 report
ICCV2019 reportICCV2019 report
ICCV2019 report
Tatsuya Shirakawa
 
Deep Fakes Detection
Deep Fakes DetectionDeep Fakes Detection
Deep Fakes Detection
Yusuke Uchida
 
[DL輪読会]Human Dynamics from Monocular Video with Dynamic Camera Movements
[DL輪読会]Human Dynamics from Monocular Video with Dynamic Camera Movements[DL輪読会]Human Dynamics from Monocular Video with Dynamic Camera Movements
[DL輪読会]Human Dynamics from Monocular Video with Dynamic Camera Movements
Deep Learning JP
 
20150414seminar
20150414seminar20150414seminar
20150414seminar
nlab_utokyo
 
東北大学 先端技術の基礎と実践_深層学習による画像認識とデータの話_菊池悠太
東北大学 先端技術の基礎と実践_深層学習による画像認識とデータの話_菊池悠太東北大学 先端技術の基礎と実践_深層学習による画像認識とデータの話_菊池悠太
東北大学 先端技術の基礎と実践_深層学習による画像認識とデータの話_菊池悠太
Preferred Networks
 
関西CVPRML勉強会 2012.2.18 (一般物体認識 - データセット)
関西CVPRML勉強会 2012.2.18 (一般物体認識 - データセット)関西CVPRML勉強会 2012.2.18 (一般物体認識 - データセット)
関西CVPRML勉強会 2012.2.18 (一般物体認識 - データセット)
Akisato Kimura
 
【DL輪読会】SDEdit: Guided Image Synthesis and Editing with Stochastic Differentia...
【DL輪読会】SDEdit: Guided Image Synthesis and Editing with Stochastic Differentia...【DL輪読会】SDEdit: Guided Image Synthesis and Editing with Stochastic Differentia...
【DL輪読会】SDEdit: Guided Image Synthesis and Editing with Stochastic Differentia...
Deep Learning JP
 
【DL輪読会】DayDreamer: World Models for Physical Robot Learning
【DL輪読会】DayDreamer: World Models for Physical Robot Learning【DL輪読会】DayDreamer: World Models for Physical Robot Learning
【DL輪読会】DayDreamer: World Models for Physical Robot Learning
Deep Learning JP
 
[DL輪読会]Dream to Control: Learning Behaviors by Latent Imagination
[DL輪読会]Dream to Control: Learning Behaviors by Latent Imagination[DL輪読会]Dream to Control: Learning Behaviors by Latent Imagination
[DL輪読会]Dream to Control: Learning Behaviors by Latent Imagination
Deep Learning JP
 
12. Diffusion Model の数学的基礎.pdf
12. Diffusion Model の数学的基礎.pdf12. Diffusion Model の数学的基礎.pdf
12. Diffusion Model の数学的基礎.pdf
幸太朗 岩澤
 
Deep Fakes Detection
Deep Fakes DetectionDeep Fakes Detection
Deep Fakes Detection
Yusuke Uchida
 
[DL輪読会]Human Dynamics from Monocular Video with Dynamic Camera Movements
[DL輪読会]Human Dynamics from Monocular Video with Dynamic Camera Movements[DL輪読会]Human Dynamics from Monocular Video with Dynamic Camera Movements
[DL輪読会]Human Dynamics from Monocular Video with Dynamic Camera Movements
Deep Learning JP
 
東北大学 先端技術の基礎と実践_深層学習による画像認識とデータの話_菊池悠太
東北大学 先端技術の基礎と実践_深層学習による画像認識とデータの話_菊池悠太東北大学 先端技術の基礎と実践_深層学習による画像認識とデータの話_菊池悠太
東北大学 先端技術の基礎と実践_深層学習による画像認識とデータの話_菊池悠太
Preferred Networks
 

More from Deep Learning JP (20)

【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
Deep Learning JP
 
【DL輪読会】事前学習用データセットについて
【DL輪読会】事前学習用データセットについて【DL輪読会】事前学習用データセットについて
【DL輪読会】事前学習用データセットについて
Deep Learning JP
 
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
Deep Learning JP
 
【DL輪読会】Zero-Shot Dual-Lens Super-Resolution
【DL輪読会】Zero-Shot Dual-Lens Super-Resolution【DL輪読会】Zero-Shot Dual-Lens Super-Resolution
【DL輪読会】Zero-Shot Dual-Lens Super-Resolution
Deep Learning JP
 
【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv
【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv
【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv
Deep Learning JP
 
【DL輪読会】マルチモーダル LLM
【DL輪読会】マルチモーダル LLM【DL輪読会】マルチモーダル LLM
【DL輪読会】マルチモーダル LLM
Deep Learning JP
 
【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
 【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo... 【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
Deep Learning JP
 
【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition
【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition
【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition
Deep Learning JP
 
【DL輪読会】Can Neural Network Memorization Be Localized?
【DL輪読会】Can Neural Network Memorization Be Localized?【DL輪読会】Can Neural Network Memorization Be Localized?
【DL輪読会】Can Neural Network Memorization Be Localized?
Deep Learning JP
 
【DL輪読会】Hopfield network 関連研究について
【DL輪読会】Hopfield network 関連研究について【DL輪読会】Hopfield network 関連研究について
【DL輪読会】Hopfield network 関連研究について
Deep Learning JP
 
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
Deep Learning JP
 
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...
Deep Learning JP
 
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"
Deep Learning JP
 
【DL輪読会】"Language Instructed Reinforcement Learning for Human-AI Coordination "
【DL輪読会】"Language Instructed Reinforcement Learning  for Human-AI Coordination "【DL輪読会】"Language Instructed Reinforcement Learning  for Human-AI Coordination "
【DL輪読会】"Language Instructed Reinforcement Learning for Human-AI Coordination "
Deep Learning JP
 
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
Deep Learning JP
 
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
Deep Learning JP
 
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
Deep Learning JP
 
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
Deep Learning JP
 
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
Deep Learning JP
 
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
Deep Learning JP
 
【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
【DL輪読会】AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
Deep Learning JP
 
【DL輪読会】事前学習用データセットについて
【DL輪読会】事前学習用データセットについて【DL輪読会】事前学習用データセットについて
【DL輪読会】事前学習用データセットについて
Deep Learning JP
 
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
Deep Learning JP
 
【DL輪読会】Zero-Shot Dual-Lens Super-Resolution
【DL輪読会】Zero-Shot Dual-Lens Super-Resolution【DL輪読会】Zero-Shot Dual-Lens Super-Resolution
【DL輪読会】Zero-Shot Dual-Lens Super-Resolution
Deep Learning JP
 
【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv
【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv
【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv
Deep Learning JP
 
【DL輪読会】マルチモーダル LLM
【DL輪読会】マルチモーダル LLM【DL輪読会】マルチモーダル LLM
【DL輪読会】マルチモーダル LLM
Deep Learning JP
 
【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
 【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo... 【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
Deep Learning JP
 
【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition
【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition
【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition
Deep Learning JP
 
【DL輪読会】Can Neural Network Memorization Be Localized?
【DL輪読会】Can Neural Network Memorization Be Localized?【DL輪読会】Can Neural Network Memorization Be Localized?
【DL輪読会】Can Neural Network Memorization Be Localized?
Deep Learning JP
 
【DL輪読会】Hopfield network 関連研究について
【DL輪読会】Hopfield network 関連研究について【DL輪読会】Hopfield network 関連研究について
【DL輪読会】Hopfield network 関連研究について
Deep Learning JP
 
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
Deep Learning JP
 
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...
Deep Learning JP
 
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"
Deep Learning JP
 
【DL輪読会】"Language Instructed Reinforcement Learning for Human-AI Coordination "
【DL輪読会】"Language Instructed Reinforcement Learning  for Human-AI Coordination "【DL輪読会】"Language Instructed Reinforcement Learning  for Human-AI Coordination "
【DL輪読会】"Language Instructed Reinforcement Learning for Human-AI Coordination "
Deep Learning JP
 
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
Deep Learning JP
 
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
Deep Learning JP
 
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
Deep Learning JP
 
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
Deep Learning JP
 
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
Deep Learning JP
 
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
Deep Learning JP
 
Ad

【DL輪読会】Masked World Models for Visual Control