These slides were used by Umemoto of our company at an in-house technical study session.
They explain the Transformer, an architecture that has attracted considerable attention in recent years.
"Arithmer Seminar" is weekly held, where professionals from within and outside our company give lectures on their respective expertise.
The slides are made by the lecturer from outside our company, and shared here with his/her permission.
Arithmer Inc. is a mathematics company that originated in the University of Tokyo Graduate School of Mathematical Sciences. We apply modern mathematics to bring new, advanced AI systems to solutions in a wide range of fields. At Arithmer, we believe it is our job to work out how to use AI effectively to improve work efficiency and to produce results that are genuinely useful to people and society.
This document summarizes recent research on applying the self-attention mechanism of Transformers to domains other than language, such as computer vision. It discusses models that apply self-attention to images, including ViT, DeiT, and T2T, which run Transformers over images divided into patches. It also covers more general attention modules, such as the Perceiver, which aims to be domain-agnostic. Finally, it discusses work on transferring pretrained language Transformers to other modalities by freezing most of their weights, showing that they can function as universal computation engines.
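As a rough, self-contained illustration of the patch-based approach these models share (not any particular model's implementation; the module name, image size, and dimensions below are arbitrary assumptions), the following PyTorch sketch splits an image into non-overlapping patches, linearly embeds them, adds learned positional embeddings, and runs one multi-head self-attention layer over the patch tokens.

```python
# Minimal ViT-style patch embedding + self-attention sketch (illustrative only).
import torch
import torch.nn as nn

class PatchSelfAttention(nn.Module):
    def __init__(self, img_size=224, patch_size=16, in_ch=3, dim=768, heads=12):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A conv with stride = patch size is equivalent to a per-patch linear projection.
        self.to_patch_embed = nn.Conv2d(in_ch, dim, kernel_size=patch_size, stride=patch_size)
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                           # x: (B, C, H, W)
        tokens = self.to_patch_embed(x)             # (B, dim, H/p, W/p)
        tokens = tokens.flatten(2).transpose(1, 2)  # (B, num_patches, dim)
        tokens = tokens + self.pos_embed            # learned positional embeddings
        out, _ = self.attn(tokens, tokens, tokens)  # self-attention over patch tokens
        return out

x = torch.randn(2, 3, 224, 224)
print(PatchSelfAttention()(x).shape)   # torch.Size([2, 196, 768])
```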
【DL輪読会】Efficiently Modeling Long Sequences with Structured State Spaces (Deep Learning JP)
This document summarizes a research paper on modeling long-range dependencies in sequence data using structured state space models and deep learning. The proposed S4 model (1) derives recurrent and convolutional representations of state space models, (2) improves long-term memory using HiPPO matrices, and (3) efficiently computes state space model convolution kernels. Experiments show S4 outperforms existing methods on various long-range dependency tasks, achieves fast and memory-efficient computation comparable to efficient Transformers, and performs competitively as a general sequence model.
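As a toy illustration of point (1) only (NumPy, arbitrary small matrices; this is not the S4 implementation and includes neither the HiPPO initialization nor the efficient kernel computation), the sketch below checks that a discretized linear state space model can be evaluated either as a recurrence or as a 1-D convolution with kernel K_j = C A^j B.

```python
# Toy discrete state space model: recurrent vs. convolutional view (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
N, L = 4, 32                      # state size, sequence length
A = 0.9 * np.eye(N) + 0.05 * rng.standard_normal((N, N))   # arbitrary stable-ish A
B = rng.standard_normal((N, 1))
C = rng.standard_normal((1, N))
u = rng.standard_normal(L)        # scalar input sequence

# 1) Recurrent evaluation: x_{k+1} = A x_k + B u_k,  y_k = C x_k
x = np.zeros((N, 1))
y_rec = []
for k in range(L):
    y_rec.append((C @ x).item())
    x = A @ x + B * u[k]
y_rec = np.array(y_rec)

# 2) Convolutional evaluation with kernel K_j = C A^j B
K = np.array([(C @ np.linalg.matrix_power(A, j) @ B).item() for j in range(L)])
y_conv = np.array([sum(K[j] * u[k - 1 - j] for j in range(k)) for k in range(L)])

print(np.allclose(y_rec, y_conv))   # True: both views give the same output
```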
One of the key functions of a P2P network is locating the node that stores a given piece of data. This is done by routing a search message carrying the key of the data over the overlay network, and it is desirable to keep the latency of this process as small as possible. Most existing P2P systems do not achieve minimal latency because they use only local information to determine the route, so the shortest path is not necessarily chosen. To address this problem, this paper proposes the Distance Bloom Filter, a space-efficient data structure that holds per-route latency information, together with a routing method that uses it to select the shortest path to the destination with high probability. Simulation results of the method applied to Skip graphs, a structured P2P network, confirm its effectiveness.
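The paper's exact data-structure layout is not reproduced in this summary, so the sketch below is only a loose illustration of the general idea: per neighbor, keep an array of Bloom filters indexed by distance, insert each reachable key into the bucket for its distance via that neighbor, and at query time forward to the neighbor whose smallest matching bucket is lowest. All class names, bucket granularity, and parameters are assumptions.

```python
# Illustrative sketch of per-distance Bloom filters used for latency-aware routing.
# (Not the paper's implementation; the layout and hashing scheme are assumptions.)
import hashlib

class BloomFilter:
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _hashes(self, key):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, key):
        for h in self._hashes(key):
            self.bits |= 1 << h

    def __contains__(self, key):
        return all((self.bits >> h) & 1 for h in self._hashes(key))

class DistanceBloomFilter:
    """For one neighbor: filters[d] holds keys reachable at distance d via that neighbor."""
    def __init__(self, max_distance=8):
        self.filters = [BloomFilter() for _ in range(max_distance)]

    def add(self, key, distance):
        self.filters[distance].add(key)

    def estimate(self, key):
        # Smallest bucket that (probably) contains the key; None if no bucket matches.
        for d, bf in enumerate(self.filters):
            if key in bf:
                return d
        return None

# Routing choice: forward to the neighbor with the smallest estimated distance.
neighbors = {"n1": DistanceBloomFilter(), "n2": DistanceBloomFilter()}
neighbors["n1"].add("key42", 3)
neighbors["n2"].add("key42", 1)

def estimated_distance(name):
    d = neighbors[name].estimate("key42")
    return d if d is not None else float("inf")

print(min(neighbors, key=estimated_distance))   # n2
```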
Optimal Constant-Time Approximation Algorithms and Inapproximability for All CSPs in the Bounded-Degree Model (Yuichi Yoshida)
1. The document discusses the maximum constraint satisfaction problem (Max CSP) and how to approximate its optimal value. It presents a basic linear programming (LP) relaxation, BasicLP, that yields an (α_Λ - ε, ε)-approximation for any CSP Λ, where α_Λ is the integrality gap of the relaxation.
2. For some CSPs, such as Max Cut, BasicLP can be implemented as a packing LP and solved in √n time to give an (α_Λ + ε, δ)-approximation, improving on the Ω(n) time needed for general CSPs (a rough LP-relaxation sketch for Max Cut is given after this list).
3. The document outlines how to derive the (α_Λ + ε, δ)-approximation.
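BasicLP itself is not spelled out in this summary; the sketch below is only a generic LP relaxation of Max Cut (solved exactly with SciPy rather than as a sublinear-time packing LP), relaxing each vertex to x_v in [0, 1] and each edge (u, v) to a variable z_e with z_e <= x_u + x_v and z_e <= 2 - x_u - x_v, then maximizing the sum of z_e.

```python
# Toy LP relaxation of Max Cut (illustrative; not the paper's BasicLP or its
# constant-time / packing-LP solver).
import numpy as np
from scipy.optimize import linprog

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]    # small example graph
n, m = 4, len(edges)

c = np.concatenate([np.zeros(n), -np.ones(m)])       # maximize sum of z_e
A_ub, b_ub = [], []
for e, (u, v) in enumerate(edges):
    row1 = np.zeros(n + m); row1[n + e] = 1; row1[u] = -1; row1[v] = -1
    A_ub.append(row1); b_ub.append(0)                # z_e <= x_u + x_v
    row2 = np.zeros(n + m); row2[n + e] = 1; row2[u] = 1; row2[v] = 1
    A_ub.append(row2); b_ub.append(2)                # z_e <= 2 - x_u - x_v

res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub, bounds=[(0, 1)] * (n + m))
print("LP optimum (upper bound on Max Cut):", -res.fun)
```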
1. The document discusses various statistical and neural network-based models for representing words and modeling semantics, including LSI, PLSI, LDA, word2vec, and neural network language models.
2. These models represent words based on their distributional properties and contexts using techniques like matrix factorization, probabilistic modeling, and neural networks to learn vector representations.
3. Recent models such as word2vec use neural networks to learn word embeddings that capture linguistic regularities and can be used for tasks such as analogy solving and machine translation (a minimal training sketch follows this list).
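As a minimal usage sketch of the word2vec idea (skip-gram embeddings learned from context windows), the example below uses the gensim library; the toy corpus is far too small to yield meaningful analogies and is only meant to show the workflow.

```python
# Minimal word2vec (skip-gram) training sketch with gensim; toy corpus, illustrative only.
from gensim.models import Word2Vec

corpus = [
    ["the", "king", "rules", "the", "country"],
    ["the", "queen", "rules", "the", "country"],
    ["a", "man", "walks", "in", "the", "city"],
    ["a", "woman", "walks", "in", "the", "city"],
]

model = Word2Vec(sentences=corpus, vector_size=50, window=2,
                 min_count=1, sg=1, epochs=200, seed=0)   # sg=1 -> skip-gram

print(model.wv["king"].shape)   # (50,) dense embedding for "king"
# Analogy query of the "king - man + woman ≈ queen" form (unreliable on a toy corpus):
print(model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```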