Overview
1. Auto-encoders aim to learn an embedding (latent representation) of input data by reconstructing the input.
2. Beyond simply minimizing reconstruction error, encoders can be evaluated by how well the learned embeddings support other tasks, such as classification.
3. Disentangling factors of variation in the data, such as content and style, allows these factors to be manipulated or transferred independently; adversarial training and purpose-designed network architectures can help achieve disentanglement.
4. Discrete representations make the embedding more interpretable and better suited to tasks like clustering than continuous representations; vector quantization techniques can discretize the embedding.


MORE ABOUT AUTO-ENCODER
Hung-yi Lee 李宏毅
Auto-encoder
[Diagram] input → NN Encoder → vector → NN Decoder → reconstruction, trained so the reconstruction is as close as possible to the input. The vector is called the embedding, latent representation, or latent code.

Two themes of this lecture:
• More than minimizing reconstruction error
• More interpretable embedding
What is a good embedding?
• An embedding should represent the object it was computed from.

Beyond Reconstruction — how to evaluate an encoder?
A discriminator φ (a binary classifier) takes an image together with an embedding and predicts whether they are a matched pair (是一對: the embedding was computed from that image, so the discriminator should say "yes") or not a matched pair (不是一對: the embedding comes from a different image, so it should say "no"). Let L_D be the loss of this binary classification task, and train φ to minimize it:

L*_D = min_φ L_D

• Small L*_D: the embeddings are representative.
• Large L*_D: the embeddings are not representative.

This suggests training the encoder θ to minimize L*_D:

θ* = arg min_θ L*_D = arg min_θ min_φ L_D

that is, train the encoder θ and the discriminator φ jointly to minimize L_D. This is the idea behind Deep InfoMax (DIM). (Compare: a typical auto-encoder trains the encoder and decoder to minimize reconstruction error.)
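The min-min objective can be made concrete in a few lines. Below is a minimal PyTorch sketch, not the actual DIM implementation: Encoder is assumed to map a flattened image to an embedding, mismatched pairs are made by shuffling embeddings within the batch, and opt is assumed to hold the parameters of both the encoder (θ) and the discriminator (φ).

```python
import torch
import torch.nn as nn

# Minimal sketch: train encoder (theta) and discriminator (phi) jointly
# so matched (image, embedding) pairs score high and mismatched pairs
# score low -- the min-min objective on the slide. Names are illustrative.

class Discriminator(nn.Module):          # phi: binary classifier on (x, z)
    def __init__(self, x_dim, z_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + z_dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, x, z):
        return self.net(torch.cat([x, z], dim=-1))   # logit: "is a pair?"

def dim_step(encoder, disc, opt, x, bce=nn.BCEWithLogitsLoss()):
    z = encoder(x)                        # embeddings for this batch
    z_shuf = z[torch.randperm(len(z))]    # mismatched pairs via shuffling
    pos = disc(x, z)                      # should say "yes" (label 1)
    neg = disc(x, z_shuf)                 # should say "no"  (label 0)
    loss = bce(pos, torch.ones_like(pos)) + bce(neg, torch.zeros_like(neg))
    opt.zero_grad(); loss.backward(); opt.step()   # updates theta AND phi
    return loss.item()
```

Because opt steps both networks at once, a single update moves θ and φ together toward min_θ min_φ L_D.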
A typical auto-encoder is a special case of this framework: the decoder plus the reconstruction error plays the role of the discriminator. Given an image and a vector, this "discriminator" runs the NN Decoder on the vector and scores the pair by how well the output matches the image: score = −(reconstruction error).
Sequential Data
A document is a sequence of sentences, which suggests training signals beyond reconstruction:
• Skip thought: encode the current sentence and decode (predict) both the previous and the next sentence. https://papers.nips.cc/paper/5950-skip-thought-vectors.pdf
• Quick thought: instead of generating neighbors, classify which candidate is the true next sentence, with random sentences as the other candidates. https://arxiv.org/pdf/1803.02893.pdf
• Contrastive Predictive Coding (CPC): https://arxiv.org/pdf/1807.03748.pdf
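To illustrate the Quick-thought/CPC idea of classification instead of generation, here is a hedged sketch of a batch-wise contrastive loss; the encoders producing cur_emb and next_emb are assumed, and this is not the exact loss of either paper.

```python
import torch
import torch.nn.functional as F

# Sketch of a contrastive objective: the embedding of the current
# sentence should be most similar to the embedding of its true next
# sentence, with the other sentences in the batch acting as the
# "random" negatives.

def contrastive_loss(cur_emb, next_emb):
    # cur_emb, next_emb: (batch, dim); row i of next_emb is the true
    # continuation of row i of cur_emb.
    logits = cur_emb @ next_emb.t()           # pairwise similarity scores
    targets = torch.arange(len(cur_emb))      # correct match is the diagonal
    return F.cross_entropy(logits, targets)   # classify which candidate is next
```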
Auto-encoder (recap): beyond minimizing reconstruction error, the second theme is a more interpretable embedding — the topic of the rest of this lecture.
Feature Disentangle

• An object contains multiple aspect information

Encoder Decoder
input audio reconstructed
Include phonetic information,
speaker information, etc.

Encoder Decoder

input sentence reconstructed


Include syntactic information,
semantic information, etc.
Feature Disentangle
Two ways to structure the latent code:
• One encoder whose output vector is split into a phonetic-information part and a speaker-information part, both fed to the decoder for reconstruction.
• Two encoders: Encoder 1 extracts the phonetic information, Encoder 2 extracts the speaker information, and the decoder reconstructs the input from both.
Feature Disentangle - Voice Conversion
Suppose one speaker says "How are you?" and another says "Hello". Encoding and decoding each utterance with its own phonetic and speaker information simply reconstructs it. But if we combine the phonetic information of "How are you?" with the speaker information extracted from "Hello", the decoder outputs "How are you?" in the second speaker's voice.
Feature Disentangle - Voice Conversion
• The same sentence has a different impact when it is said by different people: "Do you want to study a PhD?" gets "Go away!" from the student, but the same question in the voice of 新垣結衣 (Aragaki Yui) gets a very different reaction.
Feature Disentangle - Adversarial Training
A speaker classifier (the discriminator) takes the phonetic part of the embedding and tries to predict which speaker produced the utterance. The encoder learns to fool the speaker classifier, so that no speaker information survives in the phonetic embedding, while the decoder still reconstructs "How are you?" from the full code.
• The speaker classifier and the encoder are learned iteratively, as in GAN training.
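A minimal sketch of this iterative scheme, assuming encoder, clf (the speaker classifier), and their optimizers are given; the real systems also give the encoder a reconstruction loss, which is omitted here.

```python
import torch
import torch.nn as nn

# Iterative adversarial disentanglement: the classifier learns to spot
# the speaker from the phonetic embedding; the encoder then learns to
# make that impossible. Module names are illustrative.

ce = nn.CrossEntropyLoss()

def classifier_step(encoder, clf, clf_opt, audio, speaker_id):
    with torch.no_grad():
        z = encoder(audio)            # phonetic embedding (encoder frozen)
    loss = ce(clf(z), speaker_id)     # classifier learns to spot speaker
    clf_opt.zero_grad(); loss.backward(); clf_opt.step()

def encoder_step(encoder, clf, enc_opt, audio, speaker_id):
    z = encoder(audio)
    loss = -ce(clf(z), speaker_id)    # encoder maximizes classifier loss
    enc_opt.zero_grad(); loss.backward(); enc_opt.step()
    # (in the full model this term is added to the reconstruction loss)
```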
Feature Disentangle - Designed Network Architecture
• Encoder 1 (the content branch) applies instance normalization (IN) to its representation of "How are you?", removing global information such as speaker characteristics so that only the content survives.
• Encoder 2 (the speaker branch) feeds the decoder through adaptive instance normalization (AdaIN), which can only influence global information.
• The decoder combines the two branches to reconstruct "How are you?".
• IN = instance normalization (removes global information); AdaIN = adaptive instance normalization (only influences global information).
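To make IN and AdaIN concrete, here is a minimal sketch; the shapes and the way γ and β are produced are assumptions for illustration, not the exact architecture of the paper.

```python
import torch

# IN wipes per-channel statistics (global information such as speaker
# identity); AdaIN re-injects new statistics predicted from the speaker
# embedding, so the speaker branch can only affect global information.

def instance_norm(x, eps=1e-5):
    # x: (batch, channels, time) -- normalize each channel per example
    mu = x.mean(dim=-1, keepdim=True)
    sigma = x.std(dim=-1, keepdim=True)
    return (x - mu) / (sigma + eps)

def adain(content, gamma, beta):
    # gamma, beta: (batch, channels, 1), assumed to be predicted from
    # the speaker embedding by a small network (not shown)
    return gamma * instance_norm(content) + beta
```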
Feature Disentangle - Adversarial Training: Results
Audio demo: a source speaker's utterance is converted into a target speaker's voice (source to target), including the lecturer's own voice as a source speaker that was never seen during training.
Thanks to Ju-chieh Chou for providing the results.
https://jjery2243542.github.io/voice_conversion_demo/
Discrete Representation
• A discrete embedding is easier to interpret and better suited to clustering.
• The encoder's continuous output can be discretized in different ways:
  • One-hot: keep only the largest component, e.g. (0.9, 0.1, 0.3, 0.7) → (1, 0, 0, 0).
  • Binary: threshold each component at 0.5, e.g. (0.9, 0.1, 0.3, 0.7) → (1, 0, 0, 1).
• The decoder reconstructs the input from the discrete code. The discretization step is non-differentiable; tricks such as Gumbel-softmax make end-to-end training possible: https://arxiv.org/pdf/1611.01144.pdf
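A common way to train through the hard threshold is the straight-through estimator; below is a minimal sketch for the binary case, assuming the encoder output has already been squashed into [0, 1].

```python
import torch

# Straight-through trick: use hard 0/1 values in the forward pass, but
# let gradients flow as if the threshold were the identity function.
# (Gumbel-softmax, linked above, is the analogous trick for one-hot codes.)

def binarize(z):
    # z: continuous encoder output in [0, 1], e.g. after a sigmoid
    hard = (z > 0.5).float()         # e.g. (0.9, 0.1, 0.3, 0.7) -> (1, 0, 0, 1)
    return z + (hard - z).detach()   # forward: hard code; backward: grad of z
```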
Discrete Representation
• Vector Quantized Variational Auto-encoder (VQ-VAE): https://arxiv.org/abs/1711.00937
The encoder outputs a continuous vector. Its similarity to each entry of a codebook (a set of vectors, e.g. vector 1 … vector 5, learned from data) is computed, and the most similar codebook entry (say vector 3) is what the decoder receives as input.
For speech, the codebook learns to represent phonetic information: https://arxiv.org/pdf/1901.08810.pdf
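A minimal sketch of the quantization step, assuming L2 distance as the (dis)similarity measure and omitting VQ-VAE's codebook and commitment losses.

```python
import torch

# VQ-VAE quantization: snap the encoder output to the nearest codebook
# vector and feed that to the decoder, again using the straight-through
# trick for the non-differentiable argmin.

def quantize(z, codebook):
    # z: (batch, dim); codebook: (K, dim), learned from data
    dists = torch.cdist(z, codebook)           # pairwise L2 distances
    nearest = dists.argmin(dim=-1)             # index of most similar vector
    z_q = codebook[nearest]                    # decoder input
    return z + (z_q - z).detach(), nearest     # straight-through gradient
```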
Sequence as Embedding
https://arxiv.org/abs/1810.02851
This is a seq2seq2seq auto-encoder that uses a sequence of words as the latent representation: a Seq2seq generator G maps a document to a word sequence (ideally a summary), and a Seq2seq reconstructor R maps the word sequence back to the document. Only a large collection of documents is needed to train the model. Without further constraints, however, the latent word sequence is not readable.
Sequence as Embedding
To make the latent word sequence readable, add a discriminator D trained to tell human-written summaries (real) from G's outputs. G learns to make the discriminator consider its output as real, so the latent word sequence becomes a readable summary.
Thanks to 王耀賢 for providing the experimental results.
Sequence as Embedding
• Document: 澳大利亞今天與13個國家簽署了反興奮劑雙邊協議,旨在加強體育競賽之外的藥品檢查並共享研究成果…… (Australia today signed bilateral anti-doping agreements with 13 countries, aimed at strengthening out-of-competition drug testing and sharing research results …)
• Summary:
  • Human: 澳大利亞與13國簽署反興奮劑協議 (Australia signs anti-doping agreements with 13 countries)
  • Unsupervised: 澳大利亞加強體育競賽之外的藥品檢查 (Australia strengthens out-of-competition drug testing)
• Document: 中華民國奧林匹克委員會今天接到一九九二年冬季奧運會邀請函,由於主席張豐緒目前正在中南美洲進行友好訪問,因此尚未決定是否派隊赴賽…… (The ROC Olympic Committee today received an invitation to the 1992 Winter Olympics; as chairman 張豐緒 is currently on a goodwill visit to Central and South America, it has not yet decided whether to send a team …)
• Summary:
  • Human: 一九九二年冬季奧運會函邀我參加 (We are invited by letter to the 1992 Winter Olympics)
  • Unsupervised: 奧委會接獲冬季奧運會邀請函 (Olympic Committee receives Winter Olympics invitation letter)

Sequence as Embedding
• Document: 據此間媒體27日報道,印度尼西亞蘇門答臘島的兩個省近日來連降暴雨,洪水泛濫導致塌方,到26日為止至少已有60人喪生,100多人失蹤…… (Local media reported on the 27th that two provinces on Indonesia's Sumatra island have had days of torrential rain; flooding caused landslides, and by the 26th at least 60 people had died and over 100 were missing …)
• Summary:
  • Human: 印尼水災造成60人死亡 (Indonesian floods kill 60)
  • Unsupervised: 印尼門洪水泛濫導致塌雨 (garbled; roughly "Indonesia gate flooding causes collapse rain")
• Document: 安徽省合肥市最近為領導幹部下基層做了新規定:一律輕車簡從,不準搞迎來送往、不準搞層層陪同…… (Hefei, Anhui Province recently issued new rules for officials visiting the grassroots: travel with a minimal entourage, no welcome-and-send-off ceremonies, no layers of accompanying staff …)
• Summary:
  • Human: 合肥規定領導幹部下基層活動從簡 (Hefei rules that officials' grassroots visits be kept simple)
  • Unsupervised: 合肥領導幹部下基層做搞迎來送往規定:一律簡 (garbled; roughly "Hefei officials grassroots do welcome-send-off rules: all simple")
Tree as Embedding
The latent representation can also be structured as a tree:
https://arxiv.org/abs/1806.07832 https://arxiv.org/abs/1904.03746
Concluding Remarks
[Diagram] input → NN Encoder → code → NN Decoder → reconstruction, as close as possible to the input.
• More than minimizing reconstruction error
  • Using a discriminator
  • Sequential data
• More interpretable embedding
  • Feature disentangle
  • Discrete and structured representations