Semi-supervised, weakly-supervised, unsupervised, and active learning (Yusuke Uchida)
An overview of semi-supervised learning, weakly-supervised learning, unsupervised learning, and active learning.
Focuses on recent deep learning-based image recognition approaches.
Learning to summarize from human feedback (harmonylab)
URL: https://arxiv.org/abs/2009.01325
Reference: Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano: Learning to summarize from human feedback, arXiv:2009.01325 (2020)
Summary: As language models become more powerful, training and evaluation are increasingly bottlenecked by the data and metrics used for a particular task. Summarization models are typically trained to predict human-written reference summaries and evaluated with ROUGE, but these metrics are misaligned with the summary quality humans actually care about. This work collects a large, high-quality dataset of human feedback and trains a model to predict which summaries humans prefer. That model is then used as a reward function to fine-tune a summarization policy. Applied to the TL;DR dataset, the resulting summaries are rated higher than the reference summaries in human evaluation.
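As a rough illustration of the reward-modelling step, below is a minimal sketch of the pairwise preference loss commonly used for this setup, in PyTorch; the scores would come from a reward head on the language model, and all names here are illustrative rather than the authors' code.

```python
# Minimal sketch of a pairwise preference loss for reward-model training
# (dummy tensors stand in for reward-model scores of two candidate summaries).
import torch
import torch.nn.functional as F

def reward_model_loss(r_preferred: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Maximize the probability that the human-preferred summary scores higher:
    # loss = -log sigmoid(r_preferred - r_rejected)
    return -F.logsigmoid(r_preferred - r_rejected).mean()

# Dummy scores for a batch of 8 human comparisons
r_pref = torch.randn(8, requires_grad=True)
r_rej = torch.randn(8, requires_grad=True)
loss = reward_model_loss(r_pref, r_rej)
loss.backward()
```

The trained reward model then supplies the scalar reward used to fine-tune the summarization policy with RL (the paper uses PPO with a KL penalty toward the supervised policy).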
13. Abhishek Das, Samyak Datta, Georgia Gkioxari, Stefan Lee, Devi Parikh, Dhruv Batra (Facebook Research)
https://arxiv.org/abs/1711.11543
“Embodied Question Answering” (arXiv, 2017)
Overview
This paper proposes the Embodied Question Answering (EmbodiedQA) task.
The simulator is available on GitHub:
https://github.com/facebookresearch/house3d
Key Point of Proposed Method
Differences from existing QA tasks
1) The state is presented as a first-person view
2) The agent needs to take actions in order to answer correctly
In the experiments, they use hierarchical RL consisting of a planner and a controller
- Train the navigation and QA modules separately, then join the two modules (a minimal sketch follows this list)
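A minimal sketch of such a planner/controller split, in PyTorch; the dimensions, inputs, and class names are assumptions for illustration, not the authors' exact architecture.

```python
# Minimal sketch of a hierarchical planner/controller (illustrative
# dimensions and feature inputs; not the paper's exact architecture).
import torch
import torch.nn as nn

class Planner(nn.Module):
    """Chooses a high-level navigation action (e.g. forward / turn-left /
    turn-right / stop) from the current image feature and question encoding."""
    def __init__(self, img_dim=128, q_dim=128, hidden=256, n_actions=4):
        super().__init__()
        self.rnn = nn.LSTMCell(img_dim + q_dim, hidden)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, img_feat, q_feat, state=None):
        h, c = self.rnn(torch.cat([img_feat, q_feat], dim=-1), state)
        return self.head(h), (h, c)

class Controller(nn.Module):
    """At each low-level step, decides whether to keep executing the planner's
    current action or to return control to the planner."""
    def __init__(self, img_dim=128, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(img_dim + hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),  # logits: {continue, stop and return control}
        )

    def forward(self, img_feat, planner_h):
        return self.mlp(torch.cat([img_feat, planner_h], dim=-1))
```

The planner decides what to do next and the controller decides how long to keep doing it; the QA module (not shown) answers once the agent stops, matching the slide's note that navigation and QA are trained separately and then combined.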
Main Insights
Design concept of the task
“The long-term objective is to make intelligent agents that
can perceive, communicate and act”
- needs active perception
- needs inference with “common sense”
  e.g., if asked about a car, the agent should try to go to the garage
- needs grounding of symbols in the real world
15. David Ha, Jürgen Schmidhuber
https://arxiv.org/abs/1803.10122
“World Models” (arXiv, 2018)
Overview
This paper proposes to learn the dynamics of the environment and the control of the agent separately in an RL setting.
- models the dynamics of the environment using a VAE and a Gaussian mixture RNN (MDN-RNN)
- this allows the controller to be simpler (with fewer parameters)
By learning a model of the environment, the agent can learn policies without interacting with the real environment (a “hallucinated dream”), and even transfer them to the real environment.
Key Point of Proposed Method
Making the controller simpler by dividing the model into a “World Model” with an RNN and a controller with a small number of parameters
- dimensionality reduction with a VAE
- predict the latent representation z using a Gaussian mixture RNN (MDN-RNN)
- simple controller with a linear model (a minimal sketch follows this list)
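Below is a minimal sketch of the MDN-RNN and the linear controller in PyTorch; the dimensions and mixture parameterization are assumptions for illustration (the VAE that produces z is omitted), not the paper's exact hyper-parameters.

```python
# Minimal sketch of the MDN-RNN and the linear controller (illustrative
# dimensions and parameterization; the VAE that produces z is omitted).
import torch
import torch.nn as nn

class MDNRNN(nn.Module):
    """Predicts a Gaussian-mixture distribution over the next latent z,
    given the current latent z and the action taken."""
    def __init__(self, z_dim=32, action_dim=3, hidden=256, n_mix=5):
        super().__init__()
        self.rnn = nn.LSTMCell(z_dim + action_dim, hidden)
        # Per mixture component: one weight logit, plus a mean and a log-std
        # for every latent dimension.
        self.mdn = nn.Linear(hidden, n_mix * (1 + 2 * z_dim))

    def forward(self, z, action, state=None):
        h, c = self.rnn(torch.cat([z, action], dim=-1), state)
        return self.mdn(h), (h, c)

class Controller(nn.Module):
    """Single linear layer mapping [z, h] to an action vector; this is the
    only part optimized against the task reward."""
    def __init__(self, z_dim=32, hidden=256, action_dim=3):
        super().__init__()
        self.fc = nn.Linear(z_dim + hidden, action_dim)

    def forward(self, z, h):
        return torch.tanh(self.fc(torch.cat([z, h], dim=-1)))
```

Because the controller is this small, the paper can optimize it with an evolution strategy (CMA-ES) directly on the cumulative reward, while the VAE and MDN-RNN are trained from recorded rollouts.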
Differences from Previous Work
Large RNNs have high capacity, but in the RL setting there is a credit assignment problem, so existing methods tended to use smaller RNNs.
In the proposed method, the model is divided into the environment model and the controller, so large RNNs can be used.
Main Insights
- First model to achieve the required score on the CarRacing-v0 task
- Solves the task using only the learned environment model