These slides were used by Umemoto of our company at an internal technical study session.
They explain the Transformer, an architecture that has attracted much attention in recent years.
"Arithmer Seminar" is weekly held, where professionals from within and outside our company give lectures on their respective expertise.
The slides are made by the lecturer from outside our company, and shared here with his/her permission.
Arithmer Inc. is a mathematics company that grew out of the University of Tokyo Graduate School of Mathematical Sciences. We apply modern mathematics and advanced AI systems to deliver solutions to tough, complex problems across many fields. At Arithmer we believe it is our job to put AI to good use: improving work efficiency and producing results that are genuinely useful to people and society.
Several recent papers have explored self-supervised learning methods for vision transformers (ViT). Key approaches include:
1. Masked prediction tasks that predict masked patches of the input image (a minimal sketch of this approach follows the list).
2. Contrastive learning using techniques like MoCo to learn representations by contrasting augmented views of the same image.
3. Self-distillation methods like DINO that distill a teacher ViT into a student ViT using different views of the same image.
4. Hybrid approaches that combine masked prediction with self-distillation, such as iBOT.
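Below is a minimal sketch of approach 1, masked patch prediction, for a ViT-style encoder. It is not taken from any of the papers above; the tiny architecture, the 75% mask ratio, and the raw-pixel reconstruction target are illustrative assumptions only.

```python
# Toy masked-patch prediction for a ViT-style encoder.
# Sizes, mask ratio, and the pixel-reconstruction target are illustrative
# assumptions, not the configurations used in the cited papers.
import torch
import torch.nn as nn

class TinyMaskedViT(nn.Module):
    def __init__(self, img_size=32, patch=4, dim=64, depth=2, heads=4):
        super().__init__()
        self.patch = patch
        self.num_patches = (img_size // patch) ** 2
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, self.num_patches, dim))
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, 3 * patch * patch)  # predict raw pixels per patch

    def forward(self, x, mask):
        # x: (B, 3, H, W); mask: (B, num_patches) bool, True = patch is masked
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)        # (B, N, dim)
        tokens = torch.where(mask.unsqueeze(-1),
                             self.mask_token.expand_as(tokens), tokens)
        return self.head(self.encoder(tokens + self.pos))              # (B, N, 3*p*p)

model = TinyMaskedViT()
imgs = torch.randn(8, 3, 32, 32)
mask = torch.rand(8, model.num_patches) < 0.75                         # mask 75% of patches
target = nn.functional.unfold(imgs, model.patch, stride=model.patch).transpose(1, 2)
loss = ((model(imgs, mask) - target) ** 2)[mask].mean()                # loss only on masked patches
loss.backward()
```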
This document summarizes a research paper on scaling laws for neural language models. Some key findings of the paper include:
- Language model performance depends strongly on model scale and only weakly on model shape. With enough compute and data, performance scales as a power law in parameters, compute, and data (the functional form is sketched after this list).
- Overfitting is universal, with penalties depending on the ratio of parameters to data.
- Large models are more sample-efficient, reaching the same performance with fewer optimization steps and fewer data points.
- The paper motivated subsequent work by OpenAI on applying scaling laws to other domains like computer vision and developing increasingly large language models like GPT-3.
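For reference, the power-law relationships summarized above have roughly the following form (a sketch in the paper's notation; the fitted constants and exponents are reported in the paper and omitted here):

```latex
% Approximate form of the scaling laws (notation follows the paper;
% N_c, D_c, C_c and \alpha_N, \alpha_D, \alpha_C are the paper's fitted values).
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}
% L: test loss, N: non-embedding parameters, D: dataset size in tokens, C: compute.
```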
Lecture materials by Keisuke Fukuda of PFN for the University of Tokyo graduate lecture course 融合情報学特別講義Ⅲ, given on October 19, 2022.
・Introduction to Preferred Networks
・Our developments to date
・Our research & platform
・Simulation ✕ AI
25. PaintsChainer (#PaintsChainer)
A neural network that colorizes line art.
Released in January 2017; it has already painted about one million line-art images.
http://free-illustrations.gatag.net/2014/01/10/220000.html
Taizan Yonetsuji
60. GANs (generative adversarial models)
[Diagram: z → G → x = G(z)]
x is generated by the following procedure (a code sketch follows this slide):
(1) Sample z ~ U(0, I).
(2) Compute x = G(z).
Note that there is no sampling step at the end.
Another characteristic is that p(z) is a uniform distribution U rather than a Gaussian:
in high dimensions the corners of the uniform distribution are far apart, so it can represent well-separated codes.
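A minimal sketch of this two-step sampling procedure; the generator architecture and dimensions below are illustrative assumptions, not taken from the slides.

```python
# Generation procedure: sample z from a high-dimensional uniform distribution,
# then compute x = G(z). G is an arbitrary toy network for illustration.
import torch
import torch.nn as nn

z_dim, x_dim = 100, 784
G = nn.Sequential(
    nn.Linear(z_dim, 256), nn.ReLU(),
    nn.Linear(256, x_dim), nn.Tanh(),
)

z = torch.rand(16, z_dim)   # (1) z ~ U(0, 1)^z_dim: uniform, not Gaussian
x = G(z)                    # (2) x = G(z); note there is no further sampling step
print(x.shape)              # torch.Size([16, 784])
```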
61. Training a GAN (generative adversarial model)
Prepare a discriminator D(x) that judges whether a sample is fake
— it returns 1 for real and 0 for fake.
D is trained to maximize the objective above, and G is trained to minimize it (a minimal training-step sketch follows this slide).
— If training proceeds well, it can reach the equilibrium solution where ∫p(z)G(z)dz = p(x) and D(x) = 1/2.
[Diagram: z → x = G(z); real sample x'; both are fed to y = D(x) ∈ {1 (real), 0 (fake)}]
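A minimal sketch of one training step for this setup. The objective referred to above is presumably the standard GAN minimax objective min_G max_D E[log D(x')] + E[log(1 − D(G(z)))]; the code below uses the equivalent binary-cross-entropy form and the common non-saturating generator loss. The networks, optimizers, and "real" batch are stand-ins for illustration only.

```python
# One GAN training step: D is updated to maximize the objective
# (i.e. minimize BCE against real/fake labels), then G is updated to fool D.
import torch
import torch.nn as nn

z_dim, x_dim = 100, 784
G = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, x_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU(), nn.Linear(256, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

x_real = torch.randn(64, x_dim)      # stand-in for a batch of real samples x'
z = torch.rand(64, z_dim)            # z ~ U(0, 1)^z_dim
x_fake = G(z)                        # x = G(z)

# Update D: push D(x') toward 1 (real) and D(G(z)) toward 0 (fake).
d_loss = bce(D(x_real), torch.ones(64, 1)) + bce(D(x_fake.detach()), torch.zeros(64, 1))
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# Update G: push D(G(z)) toward 1 (the non-saturating form of "G minimizes the objective").
g_loss = bce(D(x_fake), torch.ones(64, 1))
opt_g.zero_grad()
g_loss.backward()
opt_g.step()
```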