[DL Paper Reading Group] Diffusion Policy: Visuomotor Policy Learning via Action Diffusion (Deep Learning JP)
This document discusses a paper on visuomotor policy learning via action diffusion. The paper presents Diffusion Policy, which trains policies that map camera images directly to actions by representing the policy as a conditional denoising diffusion process over action sequences: starting from Gaussian noise, the policy iteratively denoises a candidate action sequence conditioned on visual observations. This formulation captures multimodal action distributions and scales to high-dimensional action spaces, and policies for complex manipulation tasks are learned entirely from pixels via behavior cloning on demonstration data.
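To make the action-diffusion formulation concrete, here is a minimal sketch of a DDPM-style training step over action sequences, assuming a hypothetical noise-prediction network eps_model and a precomputed observation embedding obs_emb; the noise schedule is illustrative, not the paper's exact choice.

```python
import torch
import torch.nn.functional as F

def diffusion_policy_loss(eps_model, obs_emb, actions, num_steps=100):
    """DDPM-style training step for an action-diffusion policy: corrupt a
    demonstrated action sequence (B, horizon, action_dim) with Gaussian
    noise at a random timestep, then train the network to predict that
    noise, conditioned on the image-derived observation embedding."""
    b = actions.shape[0]
    device = actions.device
    t = torch.randint(0, num_steps, (b,), device=device)
    # Linear beta schedule -> cumulative signal rate alpha_bar_t.
    betas = torch.linspace(1e-4, 0.02, num_steps, device=device)
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)[t].view(b, 1, 1)
    noise = torch.randn_like(actions)
    noisy_actions = alpha_bar.sqrt() * actions + (1.0 - alpha_bar).sqrt() * noise
    # The model sees the noisy actions, the timestep, and the observation.
    return F.mse_loss(eps_model(noisy_actions, t, obs_emb), noise)
```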
1. The document discusses implicit behavioral cloning, which was presented in a 2021 Conference on Robot Learning (CoRL) paper.
2. Implicit behavioral cloning uses an implicit model rather than an explicit model to map observations to actions. The implicit model is trained with an InfoNCE loss that discriminates positive observation-action pairs from negatively sampled ones (a minimal sketch of this loss follows the list).
3. Experiments showed that the implicit model outperformed explicit models on several manipulation tasks like bi-manual sweeping, insertion, and sorting. The implicit approach was able to generalize better than explicit behavioral cloning.
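A minimal sketch of the InfoNCE objective from item 2, assuming a hypothetical energy_model(obs, action) that returns one energy per observation-action pair (lower energy = better match), with K sampled negative actions per observation:

```python
import torch
import torch.nn.functional as F

def implicit_bc_loss(energy_model, obs, pos_action, neg_actions):
    """InfoNCE loss for implicit BC: the expert action should get lower
    energy than K negatively sampled actions for the same observation.
    obs: (B, obs_dim), pos_action: (B, act_dim), neg_actions: (B, K, act_dim)."""
    B, K, A = neg_actions.shape
    pos_logit = -energy_model(obs, pos_action).unsqueeze(1)            # (B, 1)
    obs_rep = obs.unsqueeze(1).expand(-1, K, -1).reshape(B * K, -1)
    neg_logits = -energy_model(obs_rep, neg_actions.reshape(B * K, A)).view(B, K)
    logits = torch.cat([pos_logit, neg_logits], dim=1)                 # (B, 1+K)
    # Index 0 of every row is the positive pair.
    target = torch.zeros(B, dtype=torch.long, device=obs.device)
    return F.cross_entropy(logits, target)
```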
Several recent papers have explored self-supervised learning methods for vision transformers (ViT). Key approaches include:
1. Masked prediction tasks that reconstruct masked patches of the input image (sketched after this list).
2. Contrastive learning using techniques like MoCo to learn representations by contrasting augmented views of the same image.
3. Self-distillation methods like DINO that distill a teacher ViT into a student ViT using different views of the same image.
4. Hybrid approaches that combine masked prediction with self-distillation, such as iBOT.
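As one concrete instance, here is a minimal masked-prediction objective in the spirit of approach 1 (MAE-style training); vit is a hypothetical model that maps a (B, N, D) patch tensor to same-shaped reconstructions:

```python
import torch
import torch.nn.functional as F

def masked_patch_loss(vit, patches, mask_ratio=0.75):
    """Hide a random subset of patches and regress the model's outputs at
    those positions back onto the original patch contents."""
    B, N, D = patches.shape
    mask = torch.rand(B, N, device=patches.device) < mask_ratio    # True = hidden
    visible = patches * (~mask).unsqueeze(-1).float()              # zero out hidden patches
    pred = vit(visible)                                            # (B, N, D) reconstruction
    # The loss is computed only where the input was masked.
    return F.mse_loss(pred[mask], patches[mask])
```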
[Paper Reading] Learning Distributed Representations for Structured Output Prediction (Yusuke Iwasawa)
1) The document proposes a new method called DISTRO that uses distributed representations for structured output prediction tasks.
2) DISTRO represents labels as dense real-valued vectors rather than one-hot vectors, and defines compositionality of labels via tensor products of label vectors (a minimal sketch follows this list).
3) Experiments on document classification and part-of-speech tagging show that DISTRO outperforms baselines by learning label vectors that capture similarities between labels.
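A minimal sketch of the tensor-product composition from point 2; the embedding size and label indices are illustrative, not the paper's actual setup:

```python
import torch

def compose_label(label_emb, parts):
    """Build a structured label's representation by taking tensor (outer)
    products of its parts' dense label vectors, flattened to one vector."""
    vec = label_emb[parts[0]]
    for idx in parts[1:]:
        vec = torch.outer(vec, label_emb[idx]).flatten()
    return vec

label_emb = torch.randn(5, 4)                 # 5 atomic labels embedded in R^4
composite = compose_label(label_emb, [1, 3])  # lives in R^16 = R^4 (x) R^4
```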
8. Related Works: Summary of Existing Methods
• Not deep (Fixed feature space)
– [Gong, 2012] GFK
– [Baktashmotlagh, 2013] UDICA
– [Sun, 2016] CORAL
– ...and many others
• Deep (Neural based, Flexible feature space)
– Feature Adaptation
– Classifier Adaptation (<- This Work)
• [Long, 2016] RTN (NIPS2016)
• [Sener, 2016] knn-Ad (NIPS2016)
• [Saito, 2017] ATDA (ICML2017)
9. Related Works: Feature Adaptation
Mathematical foundation: [Ben-David, 2010] "A theory of learning from different domains"
Visualization: [Ganin, 2016] "Domain-Adversarial Training of Neural Networks"
The Ben-David bound motivates matching feature distributions: for any hypothesis h,
ε_T(h) ≤ ε_S(h) + ½ d_HΔH(D_S, D_T) + λ
i.e., target loss ≤ source loss + distance between the domains + the loss gap when the ideal h is used.
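To make the adversarial route concrete, DANN [Ganin, 2016] aligns source and target feature distributions with a gradient reversal layer; a minimal sketch (the helper names are ours, not the paper's):

```python
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negated, scaled gradient on the
    backward pass, as in DANN's gradient reversal layer."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def domain_adversarial_loss(domain_clf, feats, domain_labels, lam=1.0):
    """The domain classifier learns to tell source from target, while the
    reversed gradient pushes the encoder toward indistinguishable features."""
    logits = domain_clf(GradReverse.apply(feats, lam)).squeeze(-1)
    return F.binary_cross_entropy_with_logits(logits, domain_labels.float())
```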
17. Related Works: Comparison of Existing Methods
(Zs/Zt: source/target features; Xs/Xt: source/target inputs; "/" = not applicable)

Name | Ref. | Feature Adaptation: Minimize | By Means | Enc unshared? | Classifier Adaptation? | By Means | Category
DDC | [Tzeng, 2014] | E‖P(Zs) – P(Zt)‖ | MMD | N | N | / | Feature, shared enc
DDA | [Long, 2015] | E‖P(Zs) – P(Zt)‖ | MK-MMD | N | N | / | Feature, shared enc
DANN | [Ganin, 2014] | E‖P(Zs)/P(Zt)‖ | Adversarial | N | N | / | Feature, shared enc
CORAL | [Sun, 2016] | E‖P(Zs) – P(Zt)‖ | 2nd-order moment | N | N | / | Feature, shared enc
VFAE | [Louizos, 2016] | E‖P(Zs) – P(Zt)‖ | MMD + Graphical | N | N | / | Feature, shared enc
AdaBN | [Li, 2017] | E‖P(Zs) – P(Zt)‖ | Domain-wise BN | N | N | / | Feature, shared enc
CMD | [Zellinger, 2017] | E‖P(Zs) – P(Zt)‖ | k-th order moment | N | N | / | Feature, shared enc
UPLDA | [Bousmalis, 2016] | E‖P(Xs)/P(Xt)‖ | GAN | / | N | / | Feature, shared enc
DSN | [Bousmalis, 2016] | E‖P(Zs) – P(Zt)‖ or E‖P(Zs)/P(Zt)‖ | MMD or Adversarial | Y | N | / | Feature, unshared enc
ADDA | [Tzeng, 2017] | E‖P(Zs)/P(Zt)‖ | Adversarial | Y | N | / | Feature, unshared enc
KNN-Ad | [Sener, 2016] | ?? | ?? | ?? | Y | ?? | Classifier
RTN | [Long, 2016] | E‖P(Zs) – P(Zt)‖ | MMD | N | Y | Residual Classifier | Classifier
ATDA | [Saito, 2017] | / | / | N | Y | Tri-Training | Classifier
DRCN | [Ghifary, 2016] | Implicit | Reconstruction | N | N | / | Feature, implicit
CoGAN | [Liu, 2016] | Implicit | GAN | Y | N | / | Feature, implicit
GTA | [Swami, 2017] | Implicit | Conditional GAN | N | N | / | Feature, implicit
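Several "By Means: MMD" rows (e.g. DDC, RTN) minimize the feature gap with maximum mean discrepancy; a minimal biased RBF-kernel estimate, with an illustrative bandwidth:

```python
import torch

def mmd_rbf(zs, zt, sigma=1.0):
    """Biased estimate of squared MMD between source features zs (Ns, D)
    and target features zt (Nt, D) under an RBF kernel."""
    def kernel(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return kernel(zs, zs).mean() + kernel(zt, zt).mean() - 2 * kernel(zs, zt).mean()
```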
18. (This slide repeats the comparison table above, marking ATDA [Saito, 2017], classifier adaptation via Tri-Training, with "<- Proposal".)
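For the Tri-Training entry (ATDA [Saito, 2017]), the core move is pseudo-labeling target samples on which two classifiers agree, then training a third, target-specific classifier on them; a minimal sketch of the agreement filter, with illustrative names and threshold:

```python
import torch

def agreeing_pseudo_labels(f1, f2, target_x, thresh=0.9):
    """Keep target samples on which two classifiers agree with high
    confidence; the resulting (x, pseudo-label) pairs then train a
    third, target-specific classifier."""
    p1 = torch.softmax(f1(target_x), dim=1)
    p2 = torch.softmax(f2(target_x), dim=1)
    y1, y2 = p1.argmax(dim=1), p2.argmax(dim=1)
    conf = torch.maximum(p1.max(dim=1).values, p2.max(dim=1).values)
    keep = (y1 == y2) & (conf > thresh)
    return target_x[keep], y1[keep]
```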