This document provides an overview of POMDPs (Partially Observable Markov Decision Processes) and their applications. It first defines the key concepts of a POMDP, such as states, actions, observations, and belief states, then uses the classic Tiger problem to illustrate them. The document discusses different approaches to solving POMDPs, including model-based methods that learn the environment model from data and model-free reinforcement learning methods. Finally, it gives examples of applying POMDPs to games such as ViZDoom and to robot navigation problems.
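To make the belief-state idea concrete, here is a minimal sketch (not from the original document) of the Bayesian belief update in the Tiger problem, assuming the standard formulation in which a "listen" action reports the tiger's side correctly with probability 0.85:

```python
def update_belief(b_left: float, obs: str, p_correct: float = 0.85) -> float:
    """Bayes update of P(tiger behind the left door) after a 'listen' action."""
    # Likelihood of the observation under each hidden state.
    p_obs_given_left = p_correct if obs == "hear_left" else 1.0 - p_correct
    p_obs_given_right = 1.0 - p_correct if obs == "hear_left" else p_correct
    numerator = p_obs_given_left * b_left
    evidence = numerator + p_obs_given_right * (1.0 - b_left)
    return numerator / evidence

b = 0.5  # uniform prior: tiger equally likely behind either door
for obs in ("hear_left", "hear_left"):  # two consistent observations
    b = update_belief(b, obs)
print(f"P(tiger left) = {b:.3f}")  # -> 0.970 after hearing the tiger twice on the left
```

The belief (a distribution over hidden states) is what the POMDP agent acts on, since the true state is never observed directly.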
An introduction to the AAAI 2023 paper "Are Transformers Effective for Time Series Forecasting?" and the HuggingFace blog post "Yes, Transformers are Effective for Time Series Forecasting (+ Autoformer)".
[DL Paper Reading] Mastering Diverse Domains through World Models (Deep Learning JP)
The document summarizes "Mastering Diverse Domains through World Models," which introduces Dreamer V3. Dreamer V3 improves on previous Dreamer models through symlog prediction networks and actor-critic networks trained with temporal-difference learning, and it outperforms its ablations in the Atari domain.
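As an illustration of the symlog trick mentioned above (a sketch based on the DreamerV3 paper, not code from this document): prediction targets are squashed with symlog before regression and decoded with its inverse, symexp, which compresses large magnitudes so one set of hyperparameters can work across domains with very different reward scales:

```python
import math

def symlog(x: float) -> float:
    # symlog(x) = sign(x) * ln(|x| + 1): roughly linear near 0, logarithmic for large |x|
    return math.copysign(math.log(abs(x) + 1.0), x)

def symexp(x: float) -> float:
    # Exact inverse of symlog: symexp(x) = sign(x) * (exp(|x|) - 1)
    return math.copysign(math.exp(abs(x)) - 1.0, x)

for v in (-100.0, -1.0, 0.0, 1.0, 100.0):
    assert abs(symexp(symlog(v)) - v) < 1e-9  # round-trips up to float error
    print(f"symlog({v:7.1f}) = {symlog(v):+.3f}")
```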
This document discusses generative adversarial networks (GANs) and their relationship to reinforcement learning. It begins with an introduction to GANs, explaining how they can generate images without explicitly defining a probability distribution, by using an adversarial training process. The second half relates GANs to actor-critic models and inverse reinforcement learning: a GAN trains a generator to fool a discriminator, much as policies are trained against a learned critic in reinforcement learning.
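A minimal sketch of the adversarial loop described above (illustrative only, not the slide's code, with a toy Gaussian standing in for real data): the discriminator D learns to separate real samples from generated ones, while the generator G learns to fool D; this generator/critic alternation is what the slide relates to actor-critic training:

```python
import torch
import torch.nn as nn

z_dim, x_dim = 8, 2
G = nn.Sequential(nn.Linear(z_dim, 32), nn.ReLU(), nn.Linear(32, x_dim))
D = nn.Sequential(nn.Linear(x_dim, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(64, x_dim) * 0.5 + 2.0  # stand-in "real data" distribution

for step in range(200):
    # Discriminator step: push real samples toward label 1, fakes toward 0.
    fake = G(torch.randn(64, z_dim)).detach()  # detach: don't update G here
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: update G so that D labels its samples as real (1).
    fake = G(torch.randn(64, z_dim))
    loss_g = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```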
10. Baseline Algorithms
Baselines are grouped by whether they use lookahead search:
IRIS (the proposed method) can be combined with Monte Carlo Tree Search,
but this paper uses methods without lookahead search as its points of comparison.
Without lookahead search:
SimPLe [5]、CURL [6]、DrQ [7]、SPR [8]
With lookahead search:
MuZero [9]、EfficientZero [10]
[5] Kaiser, Łukasz, et al. "Model-Based Reinforcement Learning for Atari." 2019.
[6] Srinivas, Aravind, Michael Laskin, and Pieter Abbeel. "CURL: Contrastive Unsupervised Representations for Reinforcement Learning." 2020.
[7] Yarats, Denis, Ilya Kostrikov, and Rob Fergus. "Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels." 2020.
[8] Schwarzer, Max, et al. "Data-Efficient Reinforcement Learning with Self-Predictive Representations." 2020.
[9] Schrittwieser, Julian, et al. "Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model." 2020.
[10] Ye, Weirui, et al. "Mastering Atari Games with Limited Data." 2021.