Self-Supervised
Learning
Hung-yi Lee 李宏毅
https://www.sesameworkshop.org/what-we-do/sesame-streets-50th-anniversary
The smelly otaku himself
Sesame Street
ELMo
(Embeddings from
Language Models)
BERT (Bidirectional
Encoder Representations
from Transformers)
ERNIE (Enhanced Representation
through Knowledge Integration)
Big Bird: Transformers for
Longer Sequences
Source of image: https://leemeng.tw/attack_on_bert_transfer_learning_in_nlp.html
BERT
Bertholdt
Hoover
340M
parameters
BERT
GPT-2
T5
GPT-3
ELMo
Source: https://youtu.be/wJJnjzNlMws
Source of image: https://huaban.com/pins/1714071707/
ELMO
(94M)
BERT
(340M)
GPT-2
(1542M)
The models become larger
and larger …
Megatron (8B)
GPT-2 T5 (11B)
Turing NLG
(17B)
The models become larger
and larger …
GPT-3 is 10 times larger than
Turing NLG.
BERT (340M)
GPT-3 (175B)
BERT
GPT-3
The smelly otaku himself
https://arxiv.org/abs/2101.03961
Switch
Transformer
(1.6T)
Outline
BERT series GPT series
Self-supervised Learning
Supervised
𝑥
𝑦
label
Model
^
𝑦
𝑥
𝑥′
𝑥′′
Model
Self-
supervised
𝑦
Masking Input
BERT
台
MASK
Random
(special
token)
https://arxiv.org/abs/1810.04805
灣 大 學
Transformer
Encoder
Linear
學 0.1
灣 0.7
台 0.1
大 0.1
…… ……
(all characters)
=
=
or
Randomly masking
some tokens
一、天、大、小
…
softmax
Masking Input
BERT
台
MASK
Random
(special
token)
https://arxiv.org/abs/1810.04805
灣 大 學
Transformer
Encoder
Linear
=
=
or
Randomly masking
some tokens
一、天、大、小
…
softmax 灣
Ground
truth
minimize cross
entropy
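The masked-token objective on this slide can be sketched in plain Python. This is only a minimal illustration, not BERT's actual implementation: the names mask_tokens, softmax, and cross_entropy, the 15% masking rate, and the 50/50 choice between the special token and a random token are assumptions made for the example. The Transformer encoder and linear layer are not modelled; the probabilities from the slide stand in for their output.

```python
import math
import random

MASK = "[MASK]"  # stand-in name for the special mask token


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Corrupt a token list and return (corrupted, targets).

    targets[i] holds the original token at corrupted positions, else None.
    mask_prob and the 50/50 split below are illustrative assumptions.
    """
    rng = rng or random.Random(0)
    corrupted, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            # Replace with the special token OR a random token from the vocabulary.
            replacement = MASK if rng.random() < 0.5 else rng.choice(vocab)
            corrupted.append(replacement)
            targets.append(tok)
        else:
            corrupted.append(tok)
            targets.append(None)
    return corrupted, targets


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, vocab, target):
    """Loss for one masked position: -log of the probability given to the ground truth."""
    return -math.log(probs[vocab.index(target)])


vocab = ["台", "灣", "大", "學"]
slide_probs = [0.1, 0.7, 0.1, 0.1]  # probabilities for 台, 灣, 大, 學 as on the slide
loss = cross_entropy(slide_probs, vocab, "灣")  # equals -log(0.7)
```

Training minimizes this loss summed over the masked positions only; unmasked positions (target None) contribute nothing.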
Next Sentence Prediction
BERT
[SEP]
Yes/No
[CLS]
Linear
Robustly optimized BERT approach
(RoBERTa)
w1 w2
Sentence 1
w3 w4 w5
Sentence 2
• This approach is not helpful.
https://arxiv.org/abs/1907.11692
• SOP: Sentence order prediction
Used in ALBERT
https://arxiv.org/abs/1909.11942
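To make the difference between the two sentence-level objectives concrete, here is a small sketch of how training examples could be built. The function names and the label encodings are assumptions for illustration; the real pipelines in the BERT and ALBERT papers work on tokenized segments drawn from a large corpus.

```python
import random


def nsp_examples(sentence1, sentence2, other_sentences, rng=None):
    """Next sentence prediction: the true next sentence ("Yes")
    versus a sentence picked at random from elsewhere ("No")."""
    rng = rng or random.Random(0)
    return [
        ((sentence1, sentence2), "Yes"),
        ((sentence1, rng.choice(other_sentences)), "No"),
    ]


def sop_examples(sentence1, sentence2):
    """Sentence order prediction: both sentences are always consecutive;
    the model must tell the original order (1) from the swapped order (0)."""
    return [
        ((sentence1, sentence2), 1),
        ((sentence2, sentence1), 0),
    ]
```

In SOP the negative example reuses the same two sentences, so the model cannot solve the task from topic cues alone.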
• Masked token prediction
• Next sentence prediction
BERT
Self-supervised
Learning
Model for
Task 1
Downstream Tasks
Model for
Task 2
Model for
Task 3
• The tasks we care about
• We have only a little labeled data.
Fine-tune
Pre-train
GLUE
• Corpus of Linguistic Acceptability (CoLA)
• Stanford Sentiment Treebank (SST-2)
• Microsoft Research Paraphrase Corpus (MRPC)
• Quora Question Pairs (QQP)
• Semantic Textual Similarity Benchmark (STS-B)
• Multi-Genre Natural Language Inference (MNLI)
• Question-answering NLI (QNLI)
• Recognizing Textual Entailment (RTE)
• Winograd NLI (WNLI)
General Language Understanding
Evaluation (GLUE)
https://gluebenchmark.com/
GLUE also has a Chinese version, CLUE (https://www.cluebenchmarks.com/)
BERT and its Family
• GLUE scores
Source of image: https://arxiv.org/abs/1905.00537
How to use BERT – Case 1
BERT
[CLS] w1 w2 w3
Linear
class Input: sequence
output: class
sentence
Example:
Sentiment analysis
Random
initialization
Init by pre-train
This is the model
to be learned.
this is good
positive
Better than random
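A minimal sketch of the Case 1 forward pass: only the output vector at [CLS] goes through the linear layer and a softmax to give a class. The token vectors are assumed to come from a pre-trained BERT, which is not modelled here, and the names linear, softmax, and classify_sequence are hypothetical. During fine-tuning both BERT and the randomly initialized linear layer would be updated by gradient descent, which this sketch leaves out.

```python
import math


def linear(vector, weights, bias):
    """Dense layer: weights holds one row per output class."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_sequence(token_vectors, weights, bias, labels):
    """token_vectors[0] is assumed to be the encoder output at [CLS]."""
    cls_vector = token_vectors[0]
    probs = softmax(linear(cls_vector, weights, bias))
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs
```

For example, with labels ["positive", "negative"], the sentence "this is good" should end up with most of the probability on "positive" after fine-tuning.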
Pre-train vs. Random Initialization
Source of image: https://arxiv.org/abs/1908.05620
(fine-
tune)
(scratch)
How to use BERT – Case 2
BERT
[CLS] w1 w2 w3
Linear
class
Input: sequence
output: same as input
sentence
Linear
class
Linear
class
I saw a saw
N V DET N
Example:
POS tagging
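Case 2 differs from Case 1 only in where the linear layer is applied: once per input token instead of once at [CLS]. The sketch below assumes the token vectors come from BERT (not modelled here) and uses a hypothetical function name, tag_tokens. Taking the highest score is equivalent to taking the highest softmax probability.

```python
def tag_tokens(token_vectors, weights, bias, labels):
    """Return one label per token, skipping the [CLS] position at index 0.

    The same linear layer (weights, bias) is shared by all positions.
    """
    tags = []
    for vec in token_vectors[1:]:
        scores = [sum(w * x for w, x in zip(row, vec)) + b
                  for row, b in zip(weights, bias)]
        tags.append(labels[scores.index(max(scores))])
    return tags


# Toy vectors for "[CLS] I saw a saw": the two occurrences of "saw" get
# different vectors because the encoder output depends on context.
toy_vectors = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
tags = tag_tokens(toy_vectors, identity, [0, 0, 0], ["N", "V", "DET"])
```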
How to use BERT – Case 3
Input: two sequences
Output: a class
premise: A person on a horse
jumps over a broken down airplane
hypothesis: A person is at a diner. contradiction
Model
contradiction
entailment
neutral
Example:
Natural Language Inference (NLI)
Linear
w1 w2
How to use BERT – Case 3
BERT
[CLS] [SEP]
Class
Sentence 1 Sentence 2
w3 w4 w5
Input: two sequences
Output: a class
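For two-sequence tasks such as NLI, the two sentences are packed into a single input in the layout drawn on the slide. The sketch below builds that layout; build_pair_input is a hypothetical helper, and the segment ids (0 for the first sentence and the special tokens, 1 for the second) are an added assumption, since the slide shows only the tokens.

```python
def build_pair_input(sentence1, sentence2):
    """Concatenate two token lists as [CLS] sentence1 [SEP] sentence2.

    Returns the token list and a parallel list of segment ids
    (the segment ids are an illustrative assumption).
    """
    tokens = ["[CLS]"] + sentence1 + ["[SEP]"] + sentence2
    segment_ids = [0] * (len(sentence1) + 2) + [1] * len(sentence2)
    return tokens, segment_ids


tokens, segment_ids = build_pair_input(["w1", "w2"], ["w3", "w4", "w5"])
```

The class (contradiction, entailment, or neutral) is then predicted from the [CLS] output exactly as in Case 1.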
How to use BERT – Case 4
• Extraction-based Question
Answering (QA)
𝐷={𝑑1,𝑑2 ,⋯ ,𝑑𝑁 }
𝑄={𝑞1 , 𝑞2 , ⋯ , 𝑞𝑀 }
QA
Model
output: two integers (𝑠, 𝑒)
𝐴={𝑑𝑠 , ⋯ ,𝑑𝑒 }
Document:
Query:
Answer:
𝐷
𝑄
𝑠
𝑒
17
77 79
𝑠=17 , 𝑒=17
𝑠=77 , 𝑒=79
q1 q2
How to use BERT – Case 4
BERT
[CLS] [SEP]
question document
d1 d2 d3
inner product
Softmax
0.5
0.3 0.2
s = 2
Randomly
initialized
q1 q2
How to use BERT – Case 4
BERT
[CLS] [SEP]
question document
d1 d2 d3
inner product
Softmax
0.2
0.1 0.7
The answer is “d2 d3”.
s = 2 e = 3
Randomly
initialized
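The answer-span selection in Case 4 can be sketched as follows. Two vectors, one for the start and one for the end, are the only new parameters; each takes an inner product with every document token output, and a softmax over the document positions picks s and e. The function names are hypothetical, and the document vectors are assumed to come from BERT, which is not modelled here.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def extract_answer(doc_vectors, start_vector, end_vector):
    """Return 1-based (s, e) plus the two distributions over document tokens."""
    start_probs = softmax([dot(start_vector, d) for d in doc_vectors])
    end_probs = softmax([dot(end_vector, d) for d in doc_vectors])
    s = start_probs.index(max(start_probs)) + 1
    e = end_probs.index(max(end_probs)) + 1
    return s, e, start_probs, end_probs


# Toy outputs for d1, d2, d3 chosen so that the result matches the slide.
doc = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
s, e, _, _ = extract_answer(doc, start_vector=[0.0, 1.0], end_vector=[-1.0, -1.0])
```

This sketch picks s and e independently and does not enforce e ≥ s; a practical system would need to handle that case.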
That is all about BERT!
Training BERT is challenging!
GLUE scores
This work is done by 姜成翰
Research results of the Delta Electronics industry-academia collaboration project
Our ALBERT-base
Google’s ALBERT-base
https://arxiv.org/abs/2010.02480
Google’s BERT-base
The training data has more than 3 billion words,
about 3,000 times the length of the Harry Potter series.
8 days with TPU v3
BERT Embryology
When does BERT know POS tagging,
syntactic parsing, semantics?
The answer is counterintuitive!
https://arxiv.org/abs/2010.02480
Pre-training a seq2seq model
w1 w2 w3
w5 w6 w7
w4
Cross
Attention
w8
Decoder
Encoder
w1 w2 w3 w4
Reconstruct the input
Corrupted
MASS / BART
BART
A B [SEP] C D E
A B [SEP] C D E
A B [SEP] C E
C D E [SEP] A B
D E A B [SEP] C
A B [SEP] E
MASS
(Delete
“D”)
Text Infilling
(permutation)
(rotation)
https://arxiv.org/abs/1905.02450
https://arxiv.org/abs/1910.13461
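The corruption schemes listed on this slide are simple list operations. The sketch below reproduces the slide's examples on the sequence A B [SEP] C D E; the function names and the "[MASK]" string are assumptions for illustration. In training, the encoder reads the corrupted sequence and the decoder is asked to reconstruct the original.

```python
def mask_token(tokens, index, mask="[MASK]"):
    """MASS-style masking: hide one token behind a mask symbol."""
    return tokens[:index] + [mask] + tokens[index + 1:]


def delete_token(tokens, index):
    """Deletion: drop a token without leaving any marker."""
    return tokens[:index] + tokens[index + 1:]


def permute_sentences(tokens, sep="[SEP]"):
    """Permutation: swap the two sentences around the separator."""
    i = tokens.index(sep)
    return tokens[i + 1:] + [sep] + tokens[:i]


def rotate(tokens, start):
    """Rotation: begin the sequence at position `start` and wrap around."""
    return tokens[start:] + tokens[:start]


def infill(tokens, start, length, mask="[MASK]"):
    """Text infilling: replace a whole span with a single mask symbol."""
    return tokens[:start] + [mask] + tokens[start + length:]


original = ["A", "B", "[SEP]", "C", "D", "E"]
deleted = delete_token(original, 4)    # A B [SEP] C E
permuted = permute_sentences(original) # C D E [SEP] A B
rotated = rotate(original, 4)          # D E A B [SEP] C
infilled = infill(original, 3, 2)      # A B [SEP] [MASK] E
```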
T5 – Comparison
• Text-to-Text Transfer Transformer (T5)
• Colossal Clean Crawled Corpus (C4)
Why does BERT work?
BERT
台 灣 大 學
Represent the
meaning of “ 大”
魚 (fish)
鳥 (bird)
草 (grass)
電 (electricity)
吃蘋果 (eat an apple)
蘋果手機 (Apple phone)
embedding
The tokens with similar meaning
have similar embedding.
Context is considered.
Why does BERT work?
BERT
喝 蘋 果 汁 (drink apple juice)
BERT
蘋 果 電 腦 (Apple computer)
compute cosine similarity
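The comparison on this slide takes the BERT output vector for the same character in two different sentences and measures their cosine similarity. A minimal stdlib sketch, with a hypothetical function name and made-up toy vectors in the test:

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
```

If the embeddings are contextualized, the vectors of 蘋 in two fruit sentences should score higher than the vectors of 蘋 in a fruit sentence and a company sentence.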
Why does BERT work?
John Rupert Firth
You shall know a word by
the company it keeps
BERT
w1 w2 w3 w4
w2
word
embedding
Contextualized
word embedding
Why does BERT work?
• Applying BERT to protein, DNA, music classification
This work is done by 高瑋聰
https://arxiv.org/abs/2103.07162
EI CCAGCTGCATCACAGGAGGCCAGCG
EI AGACCCGCCGGGAGGCGGAGGACC
IE AACGTGGCCTCCTTGTGCCCTTCCCC
IE CCACTCAGCCAGGCCCTTCTTCTCCT
IE CCTGATCTGGGTCTCCCCTCCCACCCT
IE AGCCCTCAACCCTTCTGTCTCACCCTC
IE CCACTCAGCCAGGCCCTTCTTCTCCT
N CTGTGTTCACCACATCAAGCGCCGGG
N GTGTTACCGAGGGCATTTCTAACAGT
N TCTGAGCTCTGCATTTGTCTATTCTCC
class DNA sequence
A we
T you
C he
G she
This work is done by 高瑋聰
https://arxiv.org/abs/2103.07162
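The DNA experiment feeds nucleotide sequences to an English pre-trained BERT by mapping each base to an arbitrary English word, as in the table on this slide. A small sketch of that mapping step follows; the function name dna_to_words is hypothetical.

```python
# Mapping taken from the slide; the particular words are arbitrary.
DNA_TO_WORD = {"A": "we", "T": "you", "C": "he", "G": "she"}


def dna_to_words(sequence):
    """Turn a DNA string into a list of English words, one per base."""
    return [DNA_TO_WORD[base] for base in sequence]


words = dna_to_words("AGAC")
```

The resulting word sequence is then classified through [CLS] and a linear layer, as in Case 1, with the class labels EI, IE, and N.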
BERT
[CLS]
Linear
class
DNA sequence
Random
initialization
Init by pre-train
pre-train on English
Why does BERT work?
A G A C
we we
she he
Why does BERT work?
• Applying BERT to protein, DNA, music classification
This work is done by 高瑋聰
https://arxiv.org/abs/2103.07162
To Learn More ……
BERT (Part 1) BERT (Part 2)
https://youtu.be/1_gRK9EIQpc https://youtu.be/Bywo7m6ySlk
Multi-lingual BERT
Multi-BERT
深 度 學 習
Training a BERT model on many different languages.
Multi-BERT
high est moun tain Mask
Mask
Zero-shot Reading Comprehension
Training on the sentences of 104 languages
Multi-BERT
Doc1
Query1
Ans1
Doc2
Query2
Ans2
Doc3
Query3
Ans3
Doc4
Query4
Ans4
Doc5
Query5
Ans5
Doc1
Query1
? Doc3
Query3
?
Doc2
Query2
?
Train on English QA
training examples
Test on Chinese
QA test
Zero-shot Reading Comprehension
• English: SQuAD, Chinese: DRCD
F1 score of Human performance is 93.30%
Model Pre-train Fine-tune Test EM F1
QANet none Chinese
Chinese
66.1 78.1
BERT
Chinese Chinese 82.0 89.1
104
languages
Chinese 81.2 88.7
English 63.3 78.8
Chinese + English 82.6 90.1
This work is done by 劉記良、許宗嫄
https://arxiv.org/abs/1909.09587
Cross-lingual Alignment?
Multi-BERT
深 度 學 習
high est moun tain
魚
兔
跳
游
swim
jump
rabbit
fish
Slide source: 許宗嫄's master's oral defense slides
https://arxiv.org/abs/2010.10938
Mean Reciprocal Rank (MRR):
Higher MRR, better alignment
Google’s
Multi-BERT
Our Multi-BERT
200k sentences
for each lang
How about 1000k?
The training is also challenging …
Two days ……
(the whole training took one week)
Slide source: 許宗嫄's master's oral defense slides
https://arxiv.org/abs/2010.10938
Mean Reciprocal Rank (MRR):
Higher MRR, better alignment
Google’s Multi-
BERT
Our Multi-BERT
200k sentences
for each lang
Our Multi-BERT
1000k sentences
The amount of training data is critical for alignment.
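Mean Reciprocal Rank, the alignment measure used on these slides, can be computed as below. This is a generic sketch of the metric, not the evaluation code from the cited work: for each source-language token, candidate target-language tokens are ranked by embedding similarity, and the score is the reciprocal of the rank at which the correct translation appears. The function name and the toy data are assumptions.

```python
def mean_reciprocal_rank(ranked_candidates, gold):
    """ranked_candidates[i] is a best-first list of candidates for query i;
    gold[i] is the correct answer. A missing answer contributes 0."""
    total = 0.0
    for candidates, answer in zip(ranked_candidates, gold):
        if answer in candidates:
            total += 1.0 / (candidates.index(answer) + 1)
    return total / len(gold)


# Queries 魚, 兔, 游: correct at rank 1, rank 2, and not retrieved.
rankings = [["fish", "jump"], ["swim", "rabbit"], ["jump"]]
score = mean_reciprocal_rank(rankings, ["fish", "rabbit", "swim"])
```

A score of 1.0 means every correct translation is ranked first, which is why a higher MRR indicates better cross-lingual alignment.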
魚
兔
跳
游
swim
jump
rabbit
fish
Multi-BERT
深 度 學 習
high est moun tain
Reconstruction
深 度 學 習
high est moun tain
Weird???
If the embedding is
language independent …
How to correctly
reconstruct?
There must be language
information.
https://arxiv.org/abs/2010.10041
Multi-BERT
Reconstruction
那 有 一 貓
Where is
Language?
Average of
Chinese
Average of
English
This work is done by 劉記良、許宗嫄、莊永松
there is a cat
+ + + +
魚
兔
跳
游
swim
jump
rabbit
fish
If this is true …
Average of
Chinese
Average of
English
This work is done by 劉記良、許宗嫄、莊永松
魚
兔
跳
游
swim
jump
rabbit
fish
x
Unsupervised token-level translation
https://arxiv.org/abs/2010.10041
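The idea on this slide is that language identity lives in the difference between the average Chinese embedding and the average English embedding. A hedged sketch: subtract the source-language mean, add the target-language mean, and look up the nearest target-language token. The function names, the toy 2-D embeddings, and the use of Euclidean nearest neighbour are assumptions for illustration; the cited paper should be consulted for the actual procedure.

```python
def mean_vector(vectors):
    """Component-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def translate_token(vec, src_mean, tgt_mean, tgt_embeddings):
    """Shift a source-language embedding by (tgt_mean - src_mean) and
    return the closest target-language token."""
    shifted = [x - s + t for x, s, t in zip(vec, src_mean, tgt_mean)]

    def squared_distance(word):
        return sum((a - b) ** 2 for a, b in zip(shifted, tgt_embeddings[word]))

    return min(tgt_embeddings, key=squared_distance)


# Toy embeddings: the two languages differ only by a constant offset.
zh = {"魚": [0.0, 0.0], "兔": [0.0, 1.0]}
en = {"fish": [5.0, 0.0], "rabbit": [5.0, 1.0]}
zh_mean = mean_vector(list(zh.values()))
en_mean = mean_vector(list(en.values()))
result = translate_token(zh["魚"], zh_mean, en_mean, en)
```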
Outline
BERT series GPT series
Predict Next Token
<BOS> 台 灣
台 灣 大
h1 h2 h3 h4
Model
? ? ? ?
大
學
Linear
Transform
softmax
Cross
entropy
wt+1
from wt h𝑡
Training
data: “ 台灣
大學”
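The GPT training objective can be sketched as follows: every prefix of the sentence, starting from <BOS>, becomes an input, and the token that follows it becomes the target. The function names are hypothetical, and the model itself (which produces h_t, then a linear transform and softmax) is replaced by hand-written probability tables in the usage lines.

```python
import math

BOS = "<BOS>"


def next_token_examples(tokens):
    """Return (prefix, target) pairs: the model sees the prefix and
    must predict the next token."""
    sequence = [BOS] + tokens
    return [(sequence[: t + 1], sequence[t + 1]) for t in range(len(tokens))]


def sequence_loss(predicted_probs, targets):
    """Average cross-entropy; predicted_probs[t] maps token -> probability."""
    losses = [-math.log(probs[target])
              for probs, target in zip(predicted_probs, targets)]
    return sum(losses) / len(losses)


examples = next_token_examples(["台", "灣", "大", "學"])
targets = [target for _, target in examples]
# Made-up model outputs, one distribution per position.
fake_probs = [{"台": 0.5}, {"灣": 0.5}, {"大": 0.5}, {"學": 0.5}]
loss = sequence_loss(fake_probs, targets)  # equals -log(0.5)
```

Unlike BERT's masked prediction, each prediction here may only use tokens to its left.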
Predict Next Token
They can do generation.
https://talktotransformer.com/
How to use GPT?
Description
A few examples
“Few-
shot”
Learning
“One-
shot”
Learning
“Zero-
shot”
Learning
(no gradient
descent)
“In-context” Learning
Average of 42 tasks
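In-context learning needs no gradient descent: the task description and examples are simply written into the input text, and the model continues it. The sketch below shows how zero-shot, one-shot, and few-shot prompts differ only in the number of examples. The function name, the "=>" separator, and the translation example are assumptions for illustration, not a fixed format required by GPT.

```python
def build_prompt(description, examples, query):
    """Assemble a prompt: task description, solved examples, then the query.

    examples is a list of (input, output) pairs; an empty list gives a
    zero-shot prompt, one pair gives one-shot, several give few-shot.
    """
    lines = [description]
    for source, target in examples:
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")
    return "\n".join(lines)


zero_shot = build_prompt("Translate English to French:", [], "cheese")
one_shot = build_prompt(
    "Translate English to French:",
    [("sea otter", "loutre de mer")],
    "cheese",
)
```

The model's continuation after the final "=>" is taken as its answer.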
To learn more ……
https://youtu.be/DOG1L9lvsDY
Beyond Text
Data Centric Prediction
Position, 2015
Jigsaw, 2017
Rotation, 2018 Cutout, 2015
RNNLM, 1997
word2v, 2013 audio2v, 2019
BERT, 2018 Mock, 2020
TERA, 2020
APC, 2019
NLP Speech CV
Contrastive
InfoNCE,
2017
CPC, 2019
MoCo, 2019
SimCLR, 2020
MoCov2, 2020
BYOL, 2020
SimSiam, 2020
This slide was provided by 劉廷緯.
Image- SimCLR https://arxiv.org/abs/2002.05709
https://github.com/google-research/simclr
Image- BYOL
Bootstrap your own latent:
A new approach to self-supervised Learning
https://arxiv.org/abs/2006.07733
Speech
Audio version
BERT
深 度 學 習
Speech GLUE- SUPERB
• Speech processing Universal PERformance
Benchmark
• Will be available soon
• Downstream: Benchmark with 10+ tasks
• The models need to know how to process
content, speaker, emotion, and even semantics.
• Toolkit: A flexible and modularized framework for
self-supervised speech models.
• https://github.com/s3prl/s3prl
https://github.com/andi611/Self-Supervised-Speech-Pretraining-and-Representation-Learning
Appendix
(a joke)
Predict Next Token
They can do generation.
Attorney
Anarchy (chaos)
I forced a bot to watch over 1,000 hours of XXX
It is a meme! A human imitating a machine imitating a human!!!
Editor's Notes

  • #3: BERT & PALs (Projected Attention Layers)
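  • #4: The Colossal Titan; author Hajime Isayama (諫山創); Bertholdt Hoover. "The Attack Titan can peek into the memories of its future inheritors, which means it can see the future."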
  • #6: https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf 93.6M 340M 1542M 40 GB
  • #8: Switch! Taller than Yushan (Jade Mountain).
  • #13: [CLS] stands for classification.
  • #14: Downstream tasks: the tasks you really want to solve. Pre-training first works better than directly using the labeled data alone.
  • #15: CoLA: each example is a sequence of words annotated with whether it is a grammatical English sentence. MRPC: sentence pairs with human annotations for whether the sentences in the pair are semantically equivalent. QQP: determine whether a pair of questions are semantically equivalent. STS-B: each pair is human-annotated with a similarity score from 1 to 5; the task is to predict these scores.
  • #17: Do I have to mention adaptor??? (https://arxiv.org/abs/1805.12471)
  • #20: (https://arxiv.org/abs/1805.12471)
  • #21: premise (前提): the statement that is given as true.
  • #22: determining whether a “hypothesis” is true (entailment), false (contradiction), or undetermined (neutral) given a “premise”.
  • #23: (E.g. SQuAD)
  • #24: determining whether a “hypothesis” is true (entailment), false (contradiction), or undetermined (neutral) given a “premise”.
  • #25: determining whether a “hypothesis” is true (entailment), false (contradiction), or undetermined (neutral) given a “premise”.
  • #27: (The complete Harry Potter series is about 1 million words.) BERT (a huge LM) used more than 3 billion words. 12500 x 60 x 130 = 97,500,000 (nearly 100 million). English Gigaword corpus (1200M words). The Harry Potter books contain 1,084,170 words. At a typical speaking pace of 130 words per minute, a 1 minute speech will be about 130 words. https://wordcounter.net/blog/2015/11/23/10922_how-many-words-harry-potter.html
  • #28: some spoil here
  • #29: [Song, et al., ICML’19] [Lewis, et al., arXiv’19]
  • #30: Permutation / Rotation do not perform well. Text Infilling is consistently good.
  • #31: The C4 dataset we created for unsupervised pre-training is available in TensorFlow Datasets, but it requires a significant amount of bandwidth for downloading the raw Common Crawl scrapes (~7 TB) and compute for its preparation (~335 CPU-days). We suggest you take advantage of the Apache Beam support in TFDS, which enables distributed preprocessing of the dataset and can be run on Google Cloud Dataflow. With 500 workers, the job should complete in ~16 hours. Colossal: huge (龐大); also the monster film 柯羅索巨獸. A joke about Google's T5 job level (senior software engineer). Interesting demo: https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html
  • #34: https://colab.research.google.com/drive/1mWOWFdmI_f18Vo2HpxaVrauzlnAAWw2C
  • #38: https://ppfocus.com/hk/0/di75f00e4.html
  • #40: You have an universal token set for all languages.
  • #41: Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT https://arxiv.org/abs/1904.09077
  • #43: https://www.youtube.com/watch?v=8rDN1jUI82g
  • #44: https://en.wikipedia.org/wiki/Language_code TH態文 EL希臘文
  • #44: https://en.wikipedia.org/wiki/Language_code TH: Thai, EL: Greek
  • #46: https://en.wikipedia.org/wiki/Language_code TH: Thai, EL: Greek
  • #52: https://app.inferkit.com/generate
  • #53: Two reasons: BERT has already done it; it is too large ……
  • #56: Self-supervised learning (or semi-supervised learning). Good ref: https://lilianweng.github.io/lil-log/2019/11/10/self-supervised-learning.html Scenario. Application: text, image, audio. Pretext: Contrastive: SimCLR, MoCo. Comparison of different approaches: https://arxiv.org/pdf/2011.10566v1.pdf No negative: BYOL. Why it works: https://www.untitled-ai.com/understanding-self-supervised-contrastive-learning.html Prediction. Reconstruction. Teacher / Student: Self-training with Noisy Student improves ImageNet classification; Billion-scale semi-supervised learning for image classification. Self-training vs. self-supervised learning: https://medium.com/%E8%BB%9F%E9%AB%94%E4%B9%8B%E5%BF%83/deep-learning-self-training%E7%95%B6%E9%81%93-%E5%B0%8D%E6%AF%94pre-training%E7%9A%84%E5%84%AA%E7%BC%BA%E9%BB%9E-4f1b5a937c5d Other tasks: Unsupervised Representation Learning by Predicting Image Rotations https://arxiv.org/pdf/1803.07728.pdf Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles https://arxiv.org/abs/1603.09246 Tracking Emerges by Colorizing Videos. Weird thing: https://research.aimultiple.com/self-supervised-learning/
  • #60: http://superbbenchmark.org/
  • #62: self-supervised speech
  • #64: https://talktotransformer.com/ https://disp.cc/b/115-bC2b
  • #65: əˈtərnē
  • #66: ˈanərkē