This document summarizes recent research on applying the self-attention mechanism of Transformers to domains other than language, such as computer vision. It discusses models that apply self-attention to images, including ViT, DeiT, and T2T, which run Transformers over divided image patches. It also covers more general attention architectures such as the Perceiver, which aims to be domain-agnostic. Finally, it discusses work on transferring pretrained language Transformers to other modalities with most of their weights frozen, showing they can function as universal computation engines.
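As a concrete illustration of the patch-based idea these models share, here is a minimal sketch (not taken from any of the papers) of how an image is split into patches and linearly embedded into a token sequence; all sizes are illustrative ViT-style defaults.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into fixed-size patches and embed each one as a token,
    as in ViT-style models. Sizes below are illustrative defaults."""
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A strided convolution is equivalent to flattening each patch and
        # applying one shared linear projection.
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                       # x: (B, C, H, W)
        x = self.proj(x)                        # (B, D, H/P, W/P)
        return x.flatten(2).transpose(1, 2)     # (B, num_patches, D)

tokens = PatchEmbedding()(torch.randn(1, 3, 224, 224))
print(tokens.shape)                             # torch.Size([1, 196, 768])
```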
Deep Reinforcement Learning from Scratch (NLP2018 lecture slides) / Introduction of Deep Reinforcement Learning (Preferred Networks)
An introduction to deep reinforcement learning, presented at a domestic NLP conference.
These are lecture slides from the 24th Annual Meeting of the Association for Natural Language Processing (NLP2018).
http://www.anlp.jp/nlp2018/#tutorial
This document discusses generative adversarial networks (GANs) and their relationship to reinforcement learning. It begins with an introduction to GANs, explaining how they can generate images without explicitly defining a probability distribution by using an adversarial training process. The second half discusses how GANs are related to actor-critic models and inverse reinforcement learning in reinforcement learning. It explains how GANs can be viewed as training a generator to fool a discriminator, similar to how policies are trained in reinforcement learning.
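To make the adversarial game concrete, here is a minimal, self-contained training-step sketch on toy data; the tiny MLPs and the synthetic 2-D "data" are stand-ins, not any particular paper's setup.

```python
import torch
import torch.nn as nn

# Toy GAN training loop; architectures and data are illustrative only.
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))   # noise -> sample
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))   # sample -> logit
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(200):
    real = torch.randn(64, 2) + 3.0        # stand-in for the training data
    noise = torch.randn(64, 8)

    # Discriminator step: label real data 1, generated data 0.
    opt_d.zero_grad()
    loss_d = bce(D(real), torch.ones(64, 1)) + \
             bce(D(G(noise).detach()), torch.zeros(64, 1))
    loss_d.backward()
    opt_d.step()

    # Generator step: try to make D output 1 on generated data ("fool" it).
    opt_g.zero_grad()
    loss_g = bce(D(G(noise)), torch.ones(64, 1))
    loss_g.backward()
    opt_g.step()
```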
Several recent papers have explored self-supervised learning methods for vision transformers (ViT). Key approaches include the following (a sketch of the first approach follows the list):
1. Masked prediction tasks that predict masked patches of the input image.
2. Contrastive learning using techniques like MoCo to learn representations by contrasting augmented views of the same image.
3. Self-distillation methods like DINO that distill a teacher ViT into a student ViT using different views of the same image.
4. Hybrid approaches that combine masked prediction with self-distillation, such as iBOT.
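A schematic of the masked-prediction approach, under simplifying assumptions: a SimMIM-style setup where masked patch tokens are replaced by a learned mask token, the full sequence is encoded, and the original tokens are regressed at the masked positions. The tiny encoder is a stand-in for a real ViT.

```python
import torch
import torch.nn as nn

class SimpleMaskedPredictor(nn.Module):
    """Schematic masked-patch prediction: hide a random subset of patch
    tokens behind a learned [MASK] token and regress their original values."""
    def __init__(self, dim=128):
        super().__init__()
        self.mask_token = nn.Parameter(torch.zeros(dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, dim)

    def forward(self, tokens, mask_ratio=0.6):      # tokens: (B, N, D)
        B, N, D = tokens.shape
        mask = torch.rand(B, N) < mask_ratio        # True where a patch is hidden
        x = torch.where(mask.unsqueeze(-1),
                        self.mask_token.expand(B, N, D), tokens)
        pred = self.head(self.encoder(x))
        # Reconstruction loss only on the masked positions.
        return ((pred - tokens)[mask] ** 2).mean()

loss = SimpleMaskedPredictor()(torch.randn(2, 16, 128))
```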
These slides were used by Umemoto of our company at an in-house technical study session. They explain the Transformer, an architecture that has attracted much attention in recent years.
The "Arithmer Seminar" is held weekly; professionals from inside and outside our company lecture on their respective areas of expertise. These slides were prepared by a lecturer from outside the company and are shared here with his/her permission.
Arithmer Inc. is a mathematics company that began at the University of Tokyo Graduate School of Mathematical Sciences. We apply modern mathematics to bring advanced AI systems to solutions in a wide range of fields; we see it as our job to use AI well to improve work efficiency and to produce results that are useful to people and society.
1. The document discusses probabilistic modeling and variational inference. It introduces concepts like Bayes' rule, marginalization, and conditioning.
2. An equation for the evidence lower bound is derived: it decomposes the log likelihood of the data into the Kullback-Leibler divergence between the approximate and true posteriors plus an expected log-likelihood term (written out after this list).
3. Variational autoencoders are discussed, where the approximate posterior is parameterized by a neural network and optimized to maximize the evidence lower bound. Latent variables are modeled as Gaussian distributions.
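In standard VAE notation, the decomposition referred to in point 2 reads:

```latex
\log p_\theta(x)
  = \underbrace{D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p_\theta(z \mid x)\big)}_{\ge 0}
  + \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x, z) - \log q_\phi(z \mid x)\big]}_{\text{ELBO }\mathcal{L}(\theta,\phi;x)}
```

Since the KL term is non-negative, maximizing the ELBO pushes up a lower bound on the log likelihood while pulling the approximate posterior toward the true one.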
This document summarizes an internship project that used deep reinforcement learning to train an agent to park a car automatically in a simulator. The agent takes input from virtual cameras mounted on the car and uses a DQN to learn which actions to take to reach a parking goal. Several agent configurations were tested; the three-camera subjective-view agent was the most successful, after the reward function was modified and task difficulty was staged via curriculum learning. While the agent could sometimes learn to park, learning was not always stable, indicating that the deep RL approach to this automatic parking task needs further refinement.
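For reference, a sketch of the one-step DQN regression target such an agent optimizes; the network, image sizes, and action set below are placeholders, not the project's actual configuration.

```python
import copy
import torch
import torch.nn as nn

num_actions = 5            # hypothetical: steer left/right, forward, reverse, brake
q_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256),
                      nn.ReLU(), nn.Linear(256, num_actions))
target_net = copy.deepcopy(q_net)   # periodically re-synced copy of q_net
gamma = 0.99

def dqn_loss(obs, action, reward, next_obs, done):
    """TD error: move Q(s, a) toward r + gamma * max_a' Q_target(s', a')."""
    q = q_net(obs).gather(1, action.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = reward + gamma * target_net(next_obs).max(1).values * (1 - done)
    return nn.functional.smooth_l1_loss(q, target)

# Toy batch of 64x64 camera images (three stacked views would use 9 channels).
loss = dqn_loss(torch.rand(8, 3, 64, 64), torch.randint(0, num_actions, (8,)),
                torch.rand(8), torch.rand(8, 3, 64, 64), torch.zeros(8))
```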
This document summarizes a presentation on variational autoencoders (VAEs) at ICLR 2016. It discusses five VAE-related papers presented there: Importance Weighted Autoencoders, The Variational Fair Autoencoder, Generating Images from Captions with Attention, Variational Gaussian Process, and Variationally Auto-Encoded Deep Gaussian Processes. It also provides background on variational inference and VAEs, explaining how VAEs use neural networks to model probability distributions and maximize a lower bound on the log likelihood.
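As one example of how these papers tighten the bound, the Importance Weighted Autoencoder replaces the single-sample ELBO with a k-sample average of importance weights, giving a bound that is monotonically tighter in k:

```latex
\mathcal{L}_k(x) \;=\; \mathbb{E}_{z_1,\dots,z_k \sim q_\phi(z \mid x)}
  \left[\log \frac{1}{k}\sum_{i=1}^{k}\frac{p_\theta(x, z_i)}{q_\phi(z_i \mid x)}\right],
\qquad
\log p_\theta(x) \;\ge\; \mathcal{L}_k \;\ge\; \mathcal{L}_1 = \mathrm{ELBO}.
```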
1) The document discusses using wearable sensors to measure electrodermal activity (EDA) in autistic individuals, to help understand their emotions and stress levels.
2) EDA can indicate sympathetic nervous system arousal which may not match outward appearances. Measuring EDA daily over long periods provides a better understanding of baseline levels.
3) A wearable EDA sensor was designed for comfort during long-term, everyday use to gain insights into how social interactions impact physiological states in autistic individuals.
The document introduces autoencoders: neural networks that compress an input into a lower-dimensional code and then reconstruct the output from that code. It notes that autoencoders can be initialized with unsupervised pre-training based on restricted Boltzmann machines and trained to minimize reconstruction error. Autoencoders can be used for dimensionality reduction, for document retrieval by compressing documents into codes, and for data visualization by compressing high-dimensional data points into 2D for plotting, with different categories colored separately.
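A minimal modern sketch of the idea, trained end-to-end with gradient descent rather than the RBM-based pre-training the document describes; the 2-D code matches the visualization use case, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Compress the input to a low-dimensional code, then reconstruct it."""
    def __init__(self, in_dim=784, code_dim=2):     # 2-D code for plotting
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                     nn.Linear(128, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 128), nn.ReLU(),
                                     nn.Linear(128, in_dim))

    def forward(self, x):
        code = self.encoder(x)
        return self.decoder(code), code

model = AutoEncoder()
x = torch.rand(32, 784)                             # stand-in for flattened images
recon, code = model(x)
loss = nn.functional.mse_loss(recon, x)             # reconstruction error
```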
Paper introduction: SlowFast Networks for Video Recognition (Toru Tamaki)
Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, Kaiming He, SlowFast Networks for Video Recognition, Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 6202-6211
https://openaccess.thecvf.com/content_ICCV_2019/html/Feichtenhofer_SlowFast_Networks_for_Video_Recognition_ICCV_2019_paper.html
1. Ishikawa Watanabe Lab
THE UNIVERSITY OF TOKYO
http://www.k2.t.u-tokyo.ac.jp/
A Variational AutoEncoder Even a Cat Can Understand
2016/07/30
龍野 翔 (Sho Tatsuno)
37. Supervised VAE
• Semi-supervised Learning with Deep Generative Models (14' M. Welling)
– Proposes supervised and semi-supervised VAEs
– Also enables generation of, e.g., different characters in the same handwriting
(Figure: graphical model with label variable Y)
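A sketch of the conditional-decoder idea behind the supervised VAE, under simplifying assumptions: the label y is fed to the decoder alongside the latent z, so z is free to capture style (e.g. handwriting) while y selects the character. All module sizes are illustrative.

```python
import torch
import torch.nn as nn

class ConditionalDecoder(nn.Module):
    """Decode an image from a style latent z and a class label y."""
    def __init__(self, z_dim=20, num_classes=10, out_dim=784):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim + num_classes, 256),
                                 nn.ReLU(), nn.Linear(256, out_dim))

    def forward(self, z, y_onehot):
        return self.net(torch.cat([z, y_onehot], dim=1))

# Fix the style z, vary the label y: same "handwriting", different characters.
dec = ConditionalDecoder()
z = torch.randn(1, 20).repeat(10, 1)
y = torch.eye(10)
images = dec(z, y)                                  # ten digits in one style
```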
38. GAN
• Generative Adversarial Net (14' I. J. Goodfellow)
– A Generator that produces images resembling the training data
– A Discriminator that distinguishes training data from data produced by the Generator
» The Generator and Discriminator play a game of cat and mouse
(Figure: generated samples; the rightmost image is the closest)
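The cat-and-mouse game on this slide is the standard minimax objective from the Goodfellow et al. paper:

```latex
\min_G \max_D \; V(D, G) \;=\;
  \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)]
  + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]
```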
39. LAPGAN
• Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks (15' E. Denton)
– Builds a GAN per frequency band to generate high-resolution images
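To make "one GAN per frequency band" concrete, here is a short generic sketch of the Laplacian-pyramid decomposition LAPGAN builds on (not the paper's implementation): each level stores the residual between an image and an upsampled coarser copy, and one conditional GAN is trained to generate each residual.

```python
import torch
import torch.nn.functional as F

def laplacian_pyramid(img, levels=3):               # img: (B, C, H, W)
    """Decompose an image into per-frequency residual bands plus a base."""
    bands = []
    for _ in range(levels):
        down = F.avg_pool2d(img, 2)                 # coarser copy
        up = F.interpolate(down, scale_factor=2, mode='bilinear',
                           align_corners=False)
        bands.append(img - up)                      # high-frequency residual
        img = down
    bands.append(img)                               # low-frequency base
    return bands

bands = laplacian_pyramid(torch.randn(1, 3, 64, 64))
print([b.shape for b in bands])   # residuals at 64, 32, 16 plus an 8x8 base
```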
41. Integrating VAE and GAN
• Autoencoding beyond pixels using a learned similarity metric (15' A. B. L. Larsen)
– Attaches a GAN to the back end of a VAE
– Combines the VAE's reconstruction ability with the GAN's sharpness
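A schematic of the loss combination in that setup, with all networks as small placeholders: the VAE's pixel-space reconstruction term is replaced by an error measured in the discriminator's intermediate feature space (the "learned similarity metric"), alongside the usual KL and adversarial terms.

```python
import torch
import torch.nn as nn

# Placeholders standing in for the paper's convolutional networks.
z_dim, x_dim = 8, 64
enc = nn.Linear(x_dim, 2 * z_dim)                       # outputs [mu, logvar]
dec = nn.Linear(z_dim, x_dim)
feat = nn.Sequential(nn.Linear(x_dim, 32), nn.ReLU())   # discriminator features
head = nn.Linear(32, 1)                                 # real/fake logit

def vaegan_losses(x):
    mu, logvar = enc(x).chunk(2, dim=1)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()    # reparameterization
    x_rec = dec(z)

    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1).mean()
    # Learned similarity: reconstruction error in discriminator feature space.
    rec = nn.functional.mse_loss(feat(x_rec), feat(x))
    # Usual adversarial term on the discriminator's logit for the reconstruction.
    adv = nn.functional.binary_cross_entropy_with_logits(
        head(feat(x_rec)), torch.ones(x.size(0), 1))
    return kl, rec, adv

kl, rec, adv = vaegan_losses(torch.randn(16, x_dim))
```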