Overview
1. Auto-encoders aim to learn an embedding (latent representation) of input data by reconstructing the input.
2. Beyond simply minimizing reconstruction error, encoders can be evaluated by how well the learned embeddings support other tasks, such as classification.
3. Disentangling factors of variation in the data, such as content and style, allows these factors to be manipulated or transferred independently; adversarial training and purpose-designed network architectures can help achieve disentanglement.
4. Discrete representations make the embedding more interpretable and better suited to tasks like clustering than continuous representations; vector quantization techniques can discretize the embedding.


MORE ABOUT AUTO-ENCODER
Hung-yi Lee 李宏毅
Auto-encoder
[Diagram] input → NN Encoder → vector → NN Decoder → reconstruction, trained so the reconstruction is as close as possible to the input. The vector is called the embedding, latent representation, or latent code.

Two themes of this lecture:
• More than minimizing reconstruction error
• More interpretable embedding
What is a good embedding?
• An embedding should represent the object it was computed from.

Beyond Reconstruction — how to evaluate an encoder?
A discriminator φ (a binary classifier) takes an image together with an embedding and predicts whether they are a matched pair (是一對: the embedding was computed from that image, so the discriminator should say "yes") or not a matched pair (不是一對: the embedding comes from a different image, so it should say "no"). Let L_D be the loss of this binary classification task, and train φ to minimize it:

L*_D = min_φ L_D

• Small L*_D: the embeddings are representative.
• Large L*_D: the embeddings are not representative.

This suggests training the encoder θ to minimize L*_D:

θ* = arg min_θ L*_D = arg min_θ min_φ L_D

that is, train the encoder θ and the discriminator φ jointly to minimize L_D. This is the idea behind Deep InfoMax (DIM). (Compare: a typical auto-encoder trains the encoder and decoder to minimize reconstruction error.)
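The min-min objective can be made concrete in a few lines. Below is a minimal PyTorch sketch, not the actual DIM implementation: Encoder is assumed to map a flattened image to an embedding, mismatched pairs are made by shuffling embeddings within the batch, and opt is assumed to hold the parameters of both the encoder (θ) and the discriminator (φ).

```python
import torch
import torch.nn as nn

# Minimal sketch: train encoder (theta) and discriminator (phi) jointly
# so matched (image, embedding) pairs score high and mismatched pairs
# score low -- the min-min objective on the slide. Names are illustrative.

class Discriminator(nn.Module):          # phi: binary classifier on (x, z)
    def __init__(self, x_dim, z_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + z_dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, x, z):
        return self.net(torch.cat([x, z], dim=-1))   # logit: "is a pair?"

def dim_step(encoder, disc, opt, x, bce=nn.BCEWithLogitsLoss()):
    z = encoder(x)                        # embeddings for this batch
    z_shuf = z[torch.randperm(len(z))]    # mismatched pairs via shuffling
    pos = disc(x, z)                      # should say "yes" (label 1)
    neg = disc(x, z_shuf)                 # should say "no"  (label 0)
    loss = bce(pos, torch.ones_like(pos)) + bce(neg, torch.zeros_like(neg))
    opt.zero_grad(); loss.backward(); opt.step()   # updates theta AND phi
    return loss.item()
```

Because opt steps both networks at once, a single update moves θ and φ together toward min_θ min_φ L_D.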
A typical auto-encoder is a special case of this framework: the decoder plus the reconstruction error plays the role of the discriminator. Given an image and a vector, this "discriminator" runs the NN Decoder on the vector and scores the pair by how well the output matches the image: score = −(reconstruction error).
Sequential Data
A document is a sequence of sentences, which suggests training signals beyond reconstruction:
• Skip thought: encode the current sentence and decode (predict) both the previous and the next sentence. https://papers.nips.cc/paper/5950-skip-thought-vectors.pdf
• Quick thought: instead of generating neighbors, classify which candidate is the true next sentence, with random sentences as the other candidates. https://arxiv.org/pdf/1803.02893.pdf
• Contrastive Predictive Coding (CPC): https://arxiv.org/pdf/1807.03748.pdf
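To illustrate the Quick-thought/CPC idea of classification instead of generation, here is a hedged sketch of a batch-wise contrastive loss; the encoders producing cur_emb and next_emb are assumed, and this is not the exact loss of either paper.

```python
import torch
import torch.nn.functional as F

# Sketch of a contrastive objective: the embedding of the current
# sentence should be most similar to the embedding of its true next
# sentence, with the other sentences in the batch acting as the
# "random" negatives.

def contrastive_loss(cur_emb, next_emb):
    # cur_emb, next_emb: (batch, dim); row i of next_emb is the true
    # continuation of row i of cur_emb.
    logits = cur_emb @ next_emb.t()           # pairwise similarity scores
    targets = torch.arange(len(cur_emb))      # correct match is the diagonal
    return F.cross_entropy(logits, targets)   # classify which candidate is next
```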
Auto-encoder (recap): beyond minimizing reconstruction error, the second theme is a more interpretable embedding — the topic of the rest of this lecture.
Feature Disentangle

• An object contains multiple aspect information

Encoder Decoder
input audio reconstructed
Include phonetic information,
speaker information, etc.

Encoder Decoder

input sentence reconstructed


Include syntactic information,
semantic information, etc.
Feature Disentangle
Two ways to structure the latent code:
• One encoder whose output vector is split into a phonetic-information part and a speaker-information part, both fed to the decoder for reconstruction.
• Two encoders: Encoder 1 extracts the phonetic information, Encoder 2 extracts the speaker information, and the decoder reconstructs the input from both.
Feature Disentangle - Voice Conversion
Suppose one speaker says "How are you?" and another says "Hello". Encoding and decoding each utterance with its own phonetic and speaker information simply reconstructs it. But if we combine the phonetic information of "How are you?" with the speaker information extracted from "Hello", the decoder outputs "How are you?" in the second speaker's voice.
Feature Disentangle - Voice Conversion
• The same sentence has a different impact when it is said by different people: "Do you want to study a PhD?" gets "Go away!" from the student, but the same question in the voice of 新垣結衣 (Aragaki Yui) gets a very different reaction.
Feature Disentangle - Adversarial Training
A speaker classifier (the discriminator) takes the phonetic part of the embedding and tries to predict which speaker produced the utterance. The encoder learns to fool the speaker classifier, so that no speaker information survives in the phonetic embedding, while the decoder still reconstructs "How are you?" from the full code.
• The speaker classifier and the encoder are learned iteratively, as in GAN training.
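A minimal sketch of this iterative scheme, assuming encoder, clf (the speaker classifier), and their optimizers are given; the real systems also give the encoder a reconstruction loss, which is omitted here.

```python
import torch
import torch.nn as nn

# Iterative adversarial disentanglement: the classifier learns to spot
# the speaker from the phonetic embedding; the encoder then learns to
# make that impossible. Module names are illustrative.

ce = nn.CrossEntropyLoss()

def classifier_step(encoder, clf, clf_opt, audio, speaker_id):
    with torch.no_grad():
        z = encoder(audio)            # phonetic embedding (encoder frozen)
    loss = ce(clf(z), speaker_id)     # classifier learns to spot speaker
    clf_opt.zero_grad(); loss.backward(); clf_opt.step()

def encoder_step(encoder, clf, enc_opt, audio, speaker_id):
    z = encoder(audio)
    loss = -ce(clf(z), speaker_id)    # encoder maximizes classifier loss
    enc_opt.zero_grad(); loss.backward(); enc_opt.step()
    # (in the full model this term is added to the reconstruction loss)
```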
Feature Disentangle - Designed Network Architecture
• Encoder 1 (the content branch) applies instance normalization (IN) to its representation of "How are you?", removing global information such as speaker characteristics so that only the content survives.
• Encoder 2 (the speaker branch) feeds the decoder through adaptive instance normalization (AdaIN), which can only influence global information.
• The decoder combines the two branches to reconstruct "How are you?".
• IN = instance normalization (removes global information); AdaIN = adaptive instance normalization (only influences global information).
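To make IN and AdaIN concrete, here is a minimal sketch; the shapes and the way γ and β are produced are assumptions for illustration, not the exact architecture of the paper.

```python
import torch

# IN wipes per-channel statistics (global information such as speaker
# identity); AdaIN re-injects new statistics predicted from the speaker
# embedding, so the speaker branch can only affect global information.

def instance_norm(x, eps=1e-5):
    # x: (batch, channels, time) -- normalize each channel per example
    mu = x.mean(dim=-1, keepdim=True)
    sigma = x.std(dim=-1, keepdim=True)
    return (x - mu) / (sigma + eps)

def adain(content, gamma, beta):
    # gamma, beta: (batch, channels, 1), assumed to be predicted from
    # the speaker embedding by a small network (not shown)
    return gamma * instance_norm(content) + beta
```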
Feature Disentangle - Adversarial Training: Results
Audio demo: a source speaker's utterance is converted into a target speaker's voice (source to target), including the lecturer's own voice as a source speaker that was never seen during training.
Thanks to Ju-chieh Chou for providing the results.
https://jjery2243542.github.io/voice_conversion_demo/
Discrete Representation
• A discrete embedding is easier to interpret and better suited to clustering.
• The encoder's continuous output can be discretized in different ways:
  • One-hot: keep only the largest component, e.g. (0.9, 0.1, 0.3, 0.7) → (1, 0, 0, 0).
  • Binary: threshold each component at 0.5, e.g. (0.9, 0.1, 0.3, 0.7) → (1, 0, 0, 1).
• The decoder reconstructs the input from the discrete code. The discretization step is non-differentiable; tricks such as Gumbel-softmax make end-to-end training possible: https://arxiv.org/pdf/1611.01144.pdf
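A common way to train through the hard threshold is the straight-through estimator; below is a minimal sketch for the binary case, assuming the encoder output has already been squashed into [0, 1].

```python
import torch

# Straight-through trick: use hard 0/1 values in the forward pass, but
# let gradients flow as if the threshold were the identity function.
# (Gumbel-softmax, linked above, is the analogous trick for one-hot codes.)

def binarize(z):
    # z: continuous encoder output in [0, 1], e.g. after a sigmoid
    hard = (z > 0.5).float()         # e.g. (0.9, 0.1, 0.3, 0.7) -> (1, 0, 0, 1)
    return z + (hard - z).detach()   # forward: hard code; backward: grad of z
```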
Discrete Representation
• Vector Quantized Variational Auto-encoder (VQ-VAE): https://arxiv.org/abs/1711.00937
The encoder outputs a continuous vector. Its similarity to each entry of a codebook (a set of vectors, e.g. vector 1 … vector 5, learned from data) is computed, and the most similar codebook entry (say vector 3) is what the decoder receives as input.
For speech, the codebook learns to represent phonetic information: https://arxiv.org/pdf/1901.08810.pdf
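A minimal sketch of the quantization step, assuming L2 distance as the (dis)similarity measure and omitting VQ-VAE's codebook and commitment losses.

```python
import torch

# VQ-VAE quantization: snap the encoder output to the nearest codebook
# vector and feed that to the decoder, again using the straight-through
# trick for the non-differentiable argmin.

def quantize(z, codebook):
    # z: (batch, dim); codebook: (K, dim), learned from data
    dists = torch.cdist(z, codebook)           # pairwise L2 distances
    nearest = dists.argmin(dim=-1)             # index of most similar vector
    z_q = codebook[nearest]                    # decoder input
    return z + (z_q - z).detach(), nearest     # straight-through gradient
```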
Sequence as Embedding
https://arxiv.org/abs/1810.02851
This is a seq2seq2seq auto-encoder that uses a sequence of words as the latent representation: a Seq2seq generator G maps a document to a word sequence (ideally a summary), and a Seq2seq reconstructor R maps the word sequence back to the document. Only a large collection of documents is needed to train the model. Without further constraints, however, the latent word sequence is not readable.
Sequence as Embedding
To make the latent word sequence readable, add a discriminator D trained to tell human-written summaries (real) from G's outputs. G learns to make the discriminator consider its output as real, so the latent word sequence becomes a readable summary.
Thanks to 王耀賢 for providing the experimental results.
Sequence as Embedding
• Document: 澳大利亞今天與13個國家簽署了反興奮劑雙邊協議,旨在加強體育競賽之外的藥品檢查並共享研究成果…… (Australia today signed bilateral anti-doping agreements with 13 countries, aimed at strengthening out-of-competition drug testing and sharing research results …)
• Summary:
  • Human: 澳大利亞與13國簽署反興奮劑協議 (Australia signs anti-doping agreements with 13 countries)
  • Unsupervised: 澳大利亞加強體育競賽之外的藥品檢查 (Australia strengthens out-of-competition drug testing)
• Document: 中華民國奧林匹克委員會今天接到一九九二年冬季奧運會邀請函,由於主席張豐緒目前正在中南美洲進行友好訪問,因此尚未決定是否派隊赴賽…… (The ROC Olympic Committee today received an invitation to the 1992 Winter Olympics; as chairman 張豐緒 is currently on a goodwill visit to Central and South America, it has not yet decided whether to send a team …)
• Summary:
  • Human: 一九九二年冬季奧運會函邀我參加 (We are invited by letter to the 1992 Winter Olympics)
  • Unsupervised: 奧委會接獲冬季奧運會邀請函 (Olympic Committee receives Winter Olympics invitation letter)

Sequence as Embedding
• Document: 據此間媒體27日報道,印度尼西亞蘇門答臘島的兩個省近日來連降暴雨,洪水泛濫導致塌方,到26日為止至少已有60人喪生,100多人失蹤…… (Local media reported on the 27th that two provinces on Indonesia's Sumatra island have had days of torrential rain; flooding caused landslides, and by the 26th at least 60 people had died and over 100 were missing …)
• Summary:
  • Human: 印尼水災造成60人死亡 (Indonesian floods kill 60)
  • Unsupervised: 印尼門洪水泛濫導致塌雨 (garbled; roughly "Indonesia gate flooding causes collapse rain")
• Document: 安徽省合肥市最近為領導幹部下基層做了新規定:一律輕車簡從,不準搞迎來送往、不準搞層層陪同…… (Hefei, Anhui Province recently issued new rules for officials visiting the grassroots: travel with a minimal entourage, no welcome-and-send-off ceremonies, no layers of accompanying staff …)
• Summary:
  • Human: 合肥規定領導幹部下基層活動從簡 (Hefei rules that officials' grassroots visits be kept simple)
  • Unsupervised: 合肥領導幹部下基層做搞迎來送往規定:一律簡 (garbled; roughly "Hefei officials grassroots do welcome-send-off rules: all simple")
Tree as Embedding
The latent representation can also be structured as a tree:
https://arxiv.org/abs/1806.07832 https://arxiv.org/abs/1904.03746
Concluding Remarks
[Diagram] input → NN Encoder → code → NN Decoder → reconstruction, as close as possible to the input.
• More than minimizing reconstruction error
  • Using a discriminator
  • Sequential data
• More interpretable embedding
  • Feature disentangle
  • Discrete and structured representations