[DL Reading Group] Masked World Models for Visual Control

Dec 9, 2022

Deep Learning JP

2022/12/9 Deep Learning JP http://deeplearning.jp/seminar-2/

Bibliographic information
Title: Masked World Models for Visual Control
Authors: Younggyo Seo (1,2), Danijar Hafner (2,3,4), Hao Liu (2), Fangchen Liu (2), Stephen James (2), Kimin Lee (3), Pieter Abbeel (2)
Affiliations: (1) KAIST (2) UC Berkeley (3) Google Research (4) University of Toronto
Venue: CoRL 2022
Website: https://sites.google.com/view/mwm-rl
Summary:
• Uses a Masked Autoencoder (MAE) for the image representation learning of the world model
• Acquires task-relevant representations by predicting rewards

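Prior work: World Models [Ha+ 2018]
• Vision (V) Model
  ◦ Compresses images into latent variables
  ◦ VAE, contrastive learning, etc.
• Memory (M) Model
  ◦ Learns how the latent variables change over time
  ◦ Stores the sequence of latent variables with an RNN
• Controller (C) Model
  ◦ Predicts actions from the latent variables
  ◦ Once the world model has been learned, the policy can be modeled simply, e.g. as a linear model
D. Ha and J. Schmidhuber. World models. In Advances in Neural Information Processing Systems, 2018.
Learn a simulator of the environment, then do reinforcement learning with high sample efficiency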
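
The V/M/C split above can be sketched as a toy control loop. This is only an illustration: the functions `vision`, `memory`, and `controller`, the 2x2 frames, and all weights are hypothetical stand-ins (a real V model would be a trained encoder and a real M model a trained RNN). The one detail taken from the slide is that the controller is a linear map over the latent variables.

```python
import math

def vision(image):
    """V: compress an image into a small latent vector (toy stand-in for an encoder)."""
    flat = [p for row in image for p in row]
    return [sum(flat) / len(flat), max(flat)]

def memory(h, z, action):
    """M: update the recurrent state from the previous state, latent, and action."""
    return [math.tanh(0.5 * hi + 0.5 * zi + 0.1 * action) for hi, zi in zip(h, z)]

def controller(z, h, weights, bias):
    """C: linear policy on the concatenation of latent z and recurrent state h."""
    features = z + h
    return sum(w * f for w, f in zip(weights, features)) + bias

# Hypothetical 2x2 "frames" and hand-picked controller weights.
frames = [
    [[0.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.0, 0.0]],
    [[1.0, 0.0], [0.0, 0.0]],
]
weights, bias = [0.5, -0.2, 0.3, 0.1], 0.0

h = [0.0, 0.0]
actions = []
for frame in frames:
    z = vision(frame)                    # image -> latent
    a = controller(z, h, weights, bias)  # latent + memory -> action
    h = memory(h, z, a)                  # roll the memory forward
    actions.append(a)

print(actions)
```

Because the controller only ever sees `z` and `h`, anything V fails to encode is invisible to the policy.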

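Background: the object vanishing problem
• Gap between image representation learning and the task
  ◦ With reconstruction-based learning such as a VAE, the loss still goes down even if elements with a small area are ignored
  ◦ Yet what the task needs is only part of the information, such as the position of the target object
• Training cost
  ◦ Training the image model and the state-transition model jointly means running an RNN over high-dimensional data, which increases the amount of computation
Okada, Masashi, and Tadahiro Taniguchi. "DreamingV2: Reinforcement Learning with Discrete World Models without Reconstruction." arXiv preprint arXiv:2203.00494 (2022).
Simply training an AE with a reconstruction error does not yield representations suited to the task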
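
A small worked example makes the reconstruction-loss argument concrete. The numbers are hypothetical and not from the paper: a 64x64 grayscale image with a flat background and a single 4x4 object. A reconstruction that drops the object entirely pays an MSE equal to the object's share of the image area, which is under 0.4% here.

```python
# Hypothetical setup: 64x64 image, background 0.0, one 4x4 object of value 1.0.
H = W = 64
OBJ = 4

def make_image(with_object):
    """Build the toy image, optionally containing the small object."""
    img = [[0.0] * W for _ in range(H)]
    if with_object:
        for y in range(30, 30 + OBJ):
            for x in range(30, 30 + OBJ):
                img[y][x] = 1.0
    return img

def mse(a, b):
    """Mean squared error over all pixels."""
    total = sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / (H * W)

target = make_image(with_object=True)
recon_without_object = make_image(with_object=False)  # the object has "vanished"

loss = mse(target, recon_without_object)
object_area_fraction = (OBJ * OBJ) / (H * W)
print(loss, object_area_fraction)
```

The penalty for losing the object is tiny relative to the whole image, so a pixel-reconstruction objective has little incentive to keep it.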

Prior work: Masked Autoencoder (MAE) [He+ 2021]
• Masks most (75%) of the patches of a patchified image before feeding it to a ViT
• Loss function
  ◦ Reconstruction error (MSE) on the masked patches
• Achieves high accuracy on image classification
K. He, X. Chen, S. Xie, Y. Li, P. Dollar, and R. Girshick. Masked autoencoders are scalable vision learners. arXiv preprint arXiv:2111.06377, 2021.
Pre-trains a ViT on a masked reconstruction task
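
The masking step and the masked-only loss can be sketched in a few lines. This is a minimal sketch, not the MAE implementation: the helper names `random_mask` and `masked_mse`, the 16-patch grid, and the dummy prediction are assumptions made for illustration. Only the 75% ratio and the MSE on masked patches come from the slide.

```python
import random

def random_mask(num_patches, mask_ratio, rng):
    """Randomly split patch indices into masked and visible sets."""
    num_masked = int(num_patches * mask_ratio)
    order = list(range(num_patches))
    rng.shuffle(order)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return masked, visible

def masked_mse(pred_patches, target_patches, masked):
    """MSE averaged over the masked patches only; visible patches are ignored."""
    total, count = 0.0, 0
    for i in masked:
        for p, t in zip(pred_patches[i], target_patches[i]):
            total += (p - t) ** 2
            count += 1
    return total / count

rng = random.Random(0)
num_patches, patch_dim = 16, 4  # hypothetical: a 4x4 grid of patches, 4 values each
target = [[rng.random() for _ in range(patch_dim)] for _ in range(num_patches)]
masked, visible = random_mask(num_patches, 0.75, rng)

# Dummy "prediction": exact on visible patches, all zeros on masked ones.
pred = [target[i] if i in visible else [0.0] * patch_dim for i in range(num_patches)]
loss = masked_mse(pred, target, masked)
print(len(masked), len(visible), loss)
```

With 16 patches and a 0.75 ratio, 12 patches are hidden and only 4 reach the encoder, which is why the loss has to be driven by the hidden ones.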

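Proposed method: Masked World Models (MWM)
Uses a Masked Autoencoder (MAE) for the image representation learning of the world model
Masking is applied at an intermediate layer rather than directly on the image
(to avoid erasing objects?)
Predicts the reward in addition to reconstruction
(to make the model emphasize reward-related information)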
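
The effect of adding reward prediction to the reconstruction objective can be illustrated with a toy loss. This is a sketch under stated assumptions: the function `representation_loss`, the squared-error reward term, and the `reward_weight` parameter are hypothetical. The slide only says that a reward prediction is added to reconstruction, not how the two terms are combined.

```python
def mse(pred, target):
    """Mean squared error between two equal-length lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def representation_loss(recon, pixels, reward_pred, reward, reward_weight=1.0):
    """Reconstruction term plus a reward-prediction term (the weighting is an assumption)."""
    recon_term = mse(recon, pixels)
    reward_term = (reward_pred - reward) ** 2
    return recon_term + reward_weight * reward_term

# Toy data: one "object" pixel that the reconstruction misses in both cases.
pixels = [0.0, 0.0, 1.0, 0.0]
recon = [0.0, 0.0, 0.0, 0.0]
reward = 1.0

loss_reward_wrong = representation_loss(recon, pixels, reward_pred=0.0, reward=reward)
loss_reward_right = representation_loss(recon, pixels, reward_pred=1.0, reward=reward)
print(loss_reward_wrong, loss_reward_right)
```

Both cases have the same reconstruction error (0.25), so only the reward term separates them. That is the sense in which the extra term pushes the representation toward reward-relevant information.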

Experiments
Experiments in three simulation environments
Meta-world, RLBench, DeepMind Control Suite

Results
Both performance and sample efficiency improve over the prior method (Dreamer V2)
Roughly on par on tasks without small objects
Clear gap on tasks where small objects matter

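Results: Ablation Studies
Best performance with 75% feature masking + reward prediction
Masking features rather than the image directly improves performance
A 75% masking ratio gives the best performance
Reward prediction improves performance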

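Results: comparison of predicted images
Compared with Dreamer V2, MWM is able to predict object positions
Existing method: the object vanishes
Proposed method: the object position is captured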

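Summary
• Uses a Masked Autoencoder (MAE) for the image representation learning of the world model
• Masking is applied at an intermediate layer rather than directly on the image
• Acquires task-relevant representations by predicting rewards
• Large performance gains over Dreamer V2 on tasks that involve small objects
• Impressions
  ◦ Including task information in the loss function is important
  ◦ One concern is that the latent variables become task-dependent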
