1. The document discusses energy-based models (EBMs) and how they can be applied to classifiers. It introduces noise contrastive estimation and flow contrastive estimation as methods to train EBMs.
2. One of the papers presented trains energy-based models using flow contrastive estimation, with a flow-based generator providing the contrast distribution; this allows implicit modeling with EBMs.
3. Another paper argues that classifiers can be viewed as joint energy-based models over inputs and outputs, and should be trained as such; it introduces a method to train classifiers as EBMs using contrastive divergence (a minimal sketch of this reinterpretation is given below).
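As a concrete illustration of the classifier-as-joint-EBM view, the minimal sketch below (my own example, not code from the paper) reuses a classifier's logits f(x) as negative energies, so that E(x, y) = -f(x)[y] and E(x) = -log Σ_y exp(f(x)[y]):

```python
import numpy as np

def joint_energy(logits, y):
    """Joint energy of (x, y) when classifier logits f(x) are reused as
    negative energies: E(x, y) = -f(x)[y]."""
    return -logits[y]

def marginal_energy(logits):
    """Marginal energy of x: E(x) = -log sum_y exp(f(x)[y]),
    computed with the usual max-shift for numerical stability."""
    m = logits.max()
    return -(m + np.log(np.exp(logits - m).sum()))

# Example: logits for a single input under a 3-class classifier.
logits = np.array([2.0, -1.0, 0.5])
print(joint_energy(logits, y=0))   # -2.0
print(marginal_energy(logits))     # approx -2.24
```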
The document discusses hyperparameter optimization in machine learning models. It introduces various hyperparameters that can affect model performance, and notes that as models become more complex, the number of hyperparameters increases, making manual tuning difficult. It formulates hyperparameter optimization as a black-box optimization problem to minimize validation loss and discusses challenges like high function evaluation costs and lack of gradient information.
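The summary leaves the choice of optimizer open; the sketch below only illustrates the black-box setting it describes, using plain random search over hyperparameter configurations (all names and the toy loss are illustrative):

```python
import random

def random_search(validation_loss, space, n_trials=20, seed=0):
    """Minimal black-box hyperparameter search: sample configurations at
    random and keep the one with the lowest validation loss.
    `validation_loss` is treated as an expensive black box with no gradients."""
    rng = random.Random(seed)
    best_cfg, best_loss = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        loss = validation_loss(cfg)
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss

# Toy stand-in for an expensive train-then-validate run.
space = {"lr": [1e-4, 1e-3, 1e-2], "batch_size": [32, 64, 128]}
toy_loss = lambda cfg: abs(cfg["lr"] - 1e-3) * 100 + cfg["batch_size"] / 1000
print(random_search(toy_loss, space))
```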
This document discusses various methods for calculating Wasserstein distance between probability distributions, including:
- Sliced Wasserstein distance, which projects distributions onto lower-dimensional spaces to enable efficient 1D optimal transport calculations.
- Max-sliced Wasserstein distance, which focuses sampling on the most informative projection directions.
- Generalized sliced Wasserstein distance, which uses more flexible projection functions than simple slicing, like the Radon transform.
- Augmented sliced Wasserstein distance, which applies a learned transformation to distributions before projecting, allowing more expressive matching between distributions.
These sliced/generalized Wasserstein distances have been used as loss functions for generative models with promising results; a minimal sketch of the basic sliced variant is given below.
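A minimal Monte Carlo sketch of the basic sliced Wasserstein distance, assuming two equally weighted empirical distributions of the same size (function and parameter names are illustrative):

```python
import numpy as np

def sliced_wasserstein(X, Y, n_projections=100, p=2, seed=0):
    """Monte Carlo estimate of the sliced Wasserstein distance between two
    empirical distributions X, Y (each of shape [n_samples, dim]).
    Each random direction reduces the problem to 1D optimal transport,
    which is solved exactly by sorting the projections."""
    rng = np.random.default_rng(seed)
    dim = X.shape[1]
    total = 0.0
    for _ in range(n_projections):
        theta = rng.normal(size=dim)
        theta /= np.linalg.norm(theta)      # random unit projection direction
        x_proj = np.sort(X @ theta)         # 1D projections of each cloud
        y_proj = np.sort(Y @ theta)
        total += np.mean(np.abs(x_proj - y_proj) ** p)
    return (total / n_projections) ** (1.0 / p)

# Example: two Gaussian point clouds in 10 dimensions.
rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=(512, 10))
Y = rng.normal(0.5, 1.0, size=(512, 10))
print(sliced_wasserstein(X, Y))
```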
1. Two papers on unsupervised domain adaptation were presented at ICML2018: "Learning Semantic Representations for Unsupervised Domain Adaptation" and "CyCADA: Cycle-Consistent Adversarial Domain Adaptation".
2. The CyCADA paper uses cycle-consistent adversarial domain adaptation, building on CycleGAN to translate images at the pixel level while also aligning representations at the semantic level (a sketch of the cycle-consistency term appears after this list).
3. The semantic-representations paper relies on semantic alignment across domains and introduces techniques such as adding noise to improve over previous semantic-alignment methods.
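To illustrate the pixel-level translation component mentioned for CyCADA, the sketch below shows only a CycleGAN-style cycle-consistency term; the translators G_st and G_ts are hypothetical stand-ins, and the adversarial and semantic-consistency losses are omitted:

```python
import numpy as np

def cycle_consistency_loss(x_src, x_tgt, G_st, G_ts):
    """L1 cycle-consistency loss for pixel-level adaptation: translating an
    image to the other domain and back should reconstruct the original.
    G_st: source -> target translator, G_ts: target -> source translator
    (both are hypothetical callables here)."""
    loss_src = np.mean(np.abs(G_ts(G_st(x_src)) - x_src))
    loss_tgt = np.mean(np.abs(G_st(G_ts(x_tgt)) - x_tgt))
    return loss_src + loss_tgt

# Toy check with identity "translators": the loss is exactly zero.
identity = lambda x: x
imgs = np.zeros((4, 32, 32, 3))
print(cycle_consistency_loss(imgs, imgs, identity, identity))
```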
1. Representation Learning: A Review and New Perspectives
Yoshua Bengio, Aaron Courville, and Pascal Vincent
Department of Computer Science and Operations Research, U. Montreal
December 14, 2012
Presenter: 大知 正直 (D1)
7. 2. WHY SHOULD WE CARE ABOUT LEARNING REPRESENTATIONS?
• Reviews representation learning across several fields
1. Speech Recognition and Signal Processing
– MAVIS (Microsoft Research) improved the error rate by roughly 30%
2. Object Recognition
– On MNIST (a digit recognition task), deep learning reduced the error rate from the 1.4% of SVMs to 0.27%
– On the ImageNet (natural image dataset) recognition task, the error rate was improved to 15.3%
8. 2. WHY SHOULD WE CARE ABOUT LEARNING REPRESENTATIONS?
3. Natural Language Processing
– The SENNA system: a system for language modeling and related tasks (part-of-speech tagging, chunking, named entity recognition, semantic role labeling, syntactic parsing)
– (Mikolov et al., 2011), with a method that recurrently adds hidden layers, outperform smoothed n-grams in both perplexity and part-of-speech error rate
9. 2. WHY SHOULD WE CARE ABOUT LEARNING REPRESENTATIONS?
4. Multi-Task and Transfer Learning, Domain Adaptation
[Figure: conceptual illustration of a representation learning model discovering explanatory factors (red circles); sharing statistical strength across tasks makes it possible to obtain general representations]
– Good results have been reported at the ICML 2011 and NIPS 2011 workshops
10. 3. WHAT MAKES A REPRESENTATION GOOD?
1. Priors for representation learning in AI
– Smoothness
• Discussed in 3-2
– Multiple explanatory factors
• Discussed in 3-5 (builds on the distributed representations discussed in 3-3)
– A hierarchical organization of explanatory factors
• More abstract concepts belong to higher levels of the hierarchy (assumes the "deep representations" discussed in 3-4)
11. 3. WHAT MAKES A REPRESENTATION GOOD?
1. Priors for representation learning in AI
– Semi-supervised learning
• P(X), which describes the distribution of X, is useful for representing P(Y|X); this allows representations to be shared between supervised and unsupervised learning (discussed in Section 4)
– Shared factors across tasks
• Representations shared between X and the tasks act as explanatory factors (as noted in 2-3)
12. 3. WHAT MAKES A REPRESENTATION GOOD?
1. Priors for representation learning in AI
– Manifolds
• When the data concentrate in local regions whose dimensionality is much smaller than that of the original data space, autoencoder algorithms and other manifold learning algorithms can be applied (discussed in 7-2 and 8)
– Natural clustering
• Local distributions on the manifold directly form clusters (matching the human notion of inherent categories and classes); discussed in 8-3, Manifold Tangent Classifier
13. 3. WHAT MAKES A REPRESENTATION GOOD?
1. Priors for representation learning in AI
– Temporal and spatial coherence
• Observations from temporally or spatially nearby states tend to yield similar results (discussed in 11-3)
– Sparsity
• For any given observation x, only a small fraction of the possible factors are actually relevant (discussed in 6-1-3 and 7-2)
These priors appear frequently as ways for a learner to learn and disentangle the explanatory factors underlying the data.
14. 3. WHAT MAKES A REPRESENTATION GOOD?
2. Smoothness and the curse of dimensionality
– Addressed by using kernel functions to build locally smooth linear models
– Discovering such kernels themselves can also be regarded as part of representation learning
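A minimal sketch of the kind of locally smooth, kernel-weighted predictor this slide alludes to (a Nadaraya-Watson-style estimator with an RBF kernel; the bandwidth and names are illustrative, not taken from the paper):

```python
import numpy as np

def rbf_kernel(x, xi, bandwidth=1.0):
    """Gaussian (RBF) kernel measuring local similarity between x and xi."""
    return np.exp(-np.sum((x - xi) ** 2) / (2.0 * bandwidth ** 2))

def local_smooth_predict(x, X_train, y_train, bandwidth=1.0):
    """Locally smooth predictor: f(x) = sum_i w_i(x) * y_i, with weights
    given by a kernel centered on the training points."""
    weights = np.array([rbf_kernel(x, xi, bandwidth) for xi in X_train])
    weights /= weights.sum()
    return weights @ y_train

# 1D toy example: smooth interpolation between noisy samples of sin(x).
X_train = np.linspace(0, 2 * np.pi, 30)[:, None]
y_train = np.sin(X_train[:, 0]) + 0.1 * np.random.default_rng(0).normal(size=30)
print(local_smooth_predict(np.array([1.0]), X_train, y_train, bandwidth=0.5))
```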
32. 6. PROBABILISTIC MODELS
3. Generalizations of the RBM to Real-valued data
– Various extensions have been proposed for image data
• Gaussian RBM, mean and covariance RBM, covariance RBM, spike-and-slab RBM
[Figure: learned features closely resemble the training data images]
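For the simplest of these extensions, the Gaussian(-Bernoulli) RBM, the energy function can be written down explicitly; the sketch below uses a generic textbook parameterization rather than the exact form from any of the cited variants:

```python
import numpy as np

def gaussian_rbm_energy(v, h, W, b, c, sigma):
    """Energy of a Gaussian-Bernoulli RBM with real-valued visibles v and
    binary hiddens h:
      E(v, h) = sum_i (v_i - b_i)^2 / (2 sigma_i^2)
                - sum_j c_j h_j - sum_{ij} (v_i / sigma_i) W_ij h_j
    Parameter names here are illustrative."""
    quadratic = np.sum((v - b) ** 2 / (2.0 * sigma ** 2))
    hidden_bias = np.dot(c, h)
    interaction = (v / sigma) @ W @ h
    return quadratic - hidden_bias - interaction

# Toy dimensions: 4 real-valued visible units, 3 binary hidden units.
rng = np.random.default_rng(0)
v = rng.normal(size=4)
h = rng.integers(0, 2, size=3).astype(float)
W = rng.normal(scale=0.1, size=(4, 3))
b, c = np.zeros(4), np.zeros(3)
print(gaussian_rbm_energy(v, h, W, b, c, sigma=np.ones(4)))
```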