These slides were used by Umemoto of our company at an in-house technical study session.
They explain the Transformer, an architecture that has attracted much attention in recent years.
"Arithmer Seminar" is weekly held, where professionals from within and outside our company give lectures on their respective expertise.
The slides are made by the lecturer from outside our company, and shared here with his/her permission.
Arithmer Inc. is a mathematics company that began at the University of Tokyo Graduate School of Mathematical Sciences. We apply modern mathematics and advanced AI systems to provide solutions to difficult, complex problems in a wide range of fields. At Arithmer, we believe our job is to put AI to good use: improving work efficiency and producing results that are genuinely useful to people and society.
12. And then, Attention appeared
[Cho+2014]
[Sutskever+2014]
[Bahdanau+2014]
Although most of the previous works (see, e.g., Cho et al., 2014a; Sutskever et al., 2014; Kalchbrenner and Blunsom, 2013) used to encode a variable-length input sentence into a fixed-length vector, it is not necessary, and even it may be beneficial to have a variable-length vector, as we will show later.
13. Agenda
- Background 1: Neural Network
- Background 2: Recurrent Neural Network
- Background 3: Encoder-Decoder approach (a.k.a. sequence-to-sequence approach)
- Attention mechanism and its variants
- Global attention
- Local attention
- Pointer networks
- Attention for images (image caption generation)
- Attention techniques
- NN with Memory
15. Although most of the previous works (see, e.g., Cho et al., 2014a; Sutskever et al., 2014; Kalchbrenner and Blunsom, 2013) used to encode a variable-length input sentence into a fixed-length vector, it is not necessary, and even it may be beneficial to have a variable-length vector, as we will show later.
[Bahdanau+2014]
Instead of packing the input sequence into a single fixed-length vector with a simple RNN encoder, keep every hidden state produced during encoding and use them at each decoding step (how they are used is explained shortly).
What is ultimately handed to the decoder is still a single vector, but that vector now changes dynamically with the context (e.g., the decoder's own outputs so far).
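To make the idea concrete, here is a minimal NumPy sketch of additive (Bahdanau-style) attention over the stored encoder hidden states; the parameter names Wa, Ua, va and the toy sizes are illustrative assumptions, not the lecturer's actual code.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(decoder_state, encoder_states, Wa, Ua, va):
    # Score each stored encoder hidden state h_i against the current
    # decoder state s: e_i = va . tanh(Wa s + Ua h_i)
    scores = np.array([va @ np.tanh(Wa @ decoder_state + Ua @ h)
                       for h in encoder_states])
    weights = softmax(scores)           # attention distribution over input positions
    context = weights @ encoder_states  # weighted sum = dynamic context vector
    return context, weights

# Toy sizes: 5 encoder positions, hidden size 4, attention size 3 (all made up)
rng = np.random.default_rng(0)
H = rng.normal(size=(5, 4))    # encoder hidden states, kept rather than discarded
s = rng.normal(size=4)         # current decoder state (changes every output step)
Wa = rng.normal(size=(3, 4))
Ua = rng.normal(size=(3, 4))
va = rng.normal(size=3)

context, weights = additive_attention(s, H, Wa, Ua, va)
print(weights.round(3))  # how much each input position contributes right now
print(context.round(3))  # the vector actually handed to the decoder at this step
```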
16. simple enc-dec vs. enc-dec + attention
Figures from [Luong+2015], shown for comparison.
In the plain enc-dec, the encoder compresses the input sequence into a single vector, which the decoder uses as its initial state when generating the output.
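As a rough sketch of the difference the figures illustrate (again with made-up NumPy tensors, not code from the deck): a plain RNN encoder hands the decoder only its final hidden state, whereas the attention variant above also keeps all intermediate states.

```python
import numpy as np

def rnn_encoder(inputs, Wx, Wh):
    # Plain RNN encoder: in the "simple enc-dec" setting, everything the
    # decoder will ever see is the final hidden state returned first.
    h = np.zeros(Wh.shape[0])
    all_states = []
    for x in inputs:
        h = np.tanh(Wx @ x + Wh @ h)
        all_states.append(h)
    return h, np.stack(all_states)

rng = np.random.default_rng(1)
xs = rng.normal(size=(7, 3))                   # a length-7 input sequence
Wx, Wh = rng.normal(size=(4, 3)), rng.normal(size=(4, 4))

final_state, all_states = rnn_encoder(xs, Wx, Wh)
# simple enc-dec: the decoder is initialized with `final_state` only.
# enc-dec + attention: the decoder also keeps `all_states` and re-weights
# them at every output step, as in the attention sketch above.
print(final_state.shape, all_states.shape)     # (4,) (7, 4)
```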
76. Reference
[Graves2013] Alex Graves, Generating Sequences With Recurrent Neural Networks.
[Cho+2014] Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio, Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation.
[Sutskever+2014] Ilya Sutskever, Oriol Vinyals, Quoc V. Le, Sequence to Sequence Learning with Neural Networks.
[Bahdanau+2014] Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio, Neural Machine Translation by Jointly Learning to Align and Translate.
[Luong+2015] Minh-Thang Luong, Hieu Pham, Christopher D. Manning, Effective Approaches to Attention-based Neural Machine Translation.
[Denil+2011] Misha Denil, Loris Bazzani, Hugo Larochelle, Nando de Freitas, Learning where to Attend with Deep Architectures for Image Tracking.
[Cho+2015] Kyunghyun Cho, Aaron Courville, Yoshua Bengio, Describing Multimedia Content using Attention-based Encoder-Decoder Networks.
[Rush+2015] Alexander M. Rush, Sumit Chopra, Jason Weston, A Neural Attention Model for Abstractive Sentence Summarization.
[Ling+2015] Wang Ling, Isabel Trancoso, Chris Dyer, Alan W Black, Character-based Neural Machine Translation.
[Vinyals+2014] Oriol Vinyals, Lukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, Geoffrey Hinton, Grammar as a Foreign Language.
[Shang+2015] Lifeng Shang, Zhengdong Lu, Hang Li, Neural Responding Machine for Short-Text Conversation.
[Hermann+15] Karl Moritz Hermann, Tomáš Kočiský, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, Phil Blunsom, Teaching Machines to Read and Comprehend.
[Vinyals+2015] Oriol Vinyals, Meire Fortunato, Navdeep Jaitly, Pointer Networks.
[Xu+2015] Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio, Show, Attend and Tell: Neural Image Caption Generation with Visual Attention.
[Vinyals+2015b] Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan, Show and Tell: A Neural Image Caption Generator.
[Mansimov+2016] Elman Mansimov, Emilio Parisotto, Jimmy Lei Ba, Ruslan Salakhutdinov, Generating Images from Captions with Attention.
[Meng+2016] Fandong Meng, Zhengdong Lu, Zhaopeng Tu, Hang Li, Qun Liu, A Deep Memory-based Architecture for Sequence-to-Sequence Learning.
[Sukhbaatar+2015] Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus, End-To-End Memory Networks.
[Graves+2014] Alex Graves, Greg Wayne, Ivo Danihelka, Neural Turing Machines.
[Tran+2016] Ke Tran, Arianna Bisazza, Christof Monz, Recurrent Memory Network for Language Modeling.