This document summarizes a presentation on offline reinforcement learning. It discusses how offline RL can learn from fixed datasets without further interaction with the environment, which allows for fully off-policy learning. However, offline RL faces challenges from distribution shift between the behavior policy that generated the data and the learned target policy. The document reviews offline policy evaluation, policy gradient, and deep deterministic policy gradient methods, and discusses using uncertainty estimates and policy constraints to address distribution shift in offline deep reinforcement learning.
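To make the distribution-shift problem concrete, here is a minimal sketch of ordinary importance-sampling off-policy evaluation, a standard estimator in this setting; the function names, trajectory format, and discount value are illustrative, not taken from the presentation:

```python
import numpy as np

def is_policy_value(trajectories, pi_target, pi_behavior, gamma=0.99):
    """Ordinary importance-sampling estimate of the target policy's value.

    trajectories: list of [(state, action, reward), ...] tuples logged by
    the behavior policy; pi_target and pi_behavior are callables mapping
    (state, action) -> probability of taking that action.
    """
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            # Reweight by how much more (or less) likely the target policy
            # is to take the logged action than the behavior policy was.
            weight *= pi_target(s, a) / pi_behavior(s, a)
            ret += (gamma ** t) * r
        estimates.append(weight * ret)
    # The variance of this estimator blows up as the target policy strays
    # from the behavior policy: distribution shift in estimator form.
    return float(np.mean(estimates))
```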
008 20151221 Return of Frustratingly Easy Domain Adaptation (Ha Phuong)
The document proposes a simple and effective method called CORrelation ALignment (CORAL) for unsupervised domain adaptation. CORAL minimizes domain shift by aligning the second-order statistics of the source and target distributions without requiring any target labels. The method whitens the source distribution and recolors it with the target covariance matrix. Experiments on object recognition and sentiment analysis tasks show CORAL outperforms other unsupervised domain adaptation methods.
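The whiten-and-recolor step can be written in a few lines. The sketch below assumes the usual identity-regularized covariance estimates; the `eps` constant and the function name `coral` are my own choices:

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

def coral(source, target, eps=1.0):
    """Align source features to the target's second-order statistics.

    source, target: (n_samples, n_features) arrays of unlabeled features.
    Returns source features whitened by their own covariance and
    re-colored with the target covariance.
    """
    # Regularized covariances (the identity term keeps them full rank).
    c_s = np.cov(source, rowvar=False) + eps * np.eye(source.shape[1])
    c_t = np.cov(target, rowvar=False) + eps * np.eye(target.shape[1])
    # Whiten with C_s^{-1/2}, then re-color with C_t^{1/2}.
    whitened = source @ fractional_matrix_power(c_s, -0.5)
    return whitened @ fractional_matrix_power(c_t, 0.5)
```

Because the alignment is a closed-form linear transform, it needs no target labels and no iterative training, which is what makes the method "frustratingly easy."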
[Paper Reading] Learning Distributed Representations for Structured Output Prediction (Yusuke Iwasawa)
1) The document proposes a new method called DISTRO that uses distributed representations for structured output prediction tasks.
2) DISTRO represents labels as dense real-valued vectors rather than one-hot vectors, and defines compositionality of labels using tensor products of label vectors (see the sketch after this list).
3) Experiments on document classification and part-of-speech tagging show that DISTRO outperforms baselines by learning label vectors that capture similarities between labels.
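The summary does not give DISTRO's exact parameterization, but the two ideas it names, dense label vectors and tensor-product composition, can be illustrated with a toy sketch; all dimensions, label names, and function names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Dense label embeddings instead of one-hot vectors: similar labels
# (e.g. NN and NNS) can end up close in this space, which one-hot
# vectors cannot express. The dimension d is arbitrary.
labels = ["NN", "NNS", "VB", "DT"]
d = 8
label_vec = {y: rng.normal(size=d) for y in labels}

def compose(y1, y2):
    """Represent a structured output (here, a tag bigram) as the tensor
    product of its parts' label vectors, flattened to one vector."""
    return np.outer(label_vec[y1], label_vec[y2]).ravel()

# A linear scorer over the composed representation; in actual training
# the label vectors and weights would be learned jointly from data.
w = rng.normal(size=d * d)
score = w @ compose("DT", "NN")
print(f"score(DT, NN) = {score:.3f}")
```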