BERT
Bidirectional Encoder Representations from Transformers
PHAM QUANG KHANG
Concept
1. A pre-trained language representation built on the Transformer architecture and trained on 2 tasks
a. Randomly mask some words within a sequence and let the model predict those masked words
b. Predict whether a pair of sequences actually appears one after the other in a larger context: “next sentence prediction”
2. Can be used for transfer learning (similar to pre-training on ImageNet in Computer Vision)
a. Pre-train on a large corpus with unsupervised learning to learn the language representation
b. Fine-tune the model for specific tasks: text classification, named entity recognition, SQuAD (see the fine-tuning sketch below)
Devlin et al., 2018
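To make the transfer-learning recipe in point 2 concrete, here is a minimal fine-tuning sketch. It assumes the Hugging Face transformers library (the successor of the pytorch-pretrained-BERT repo in reference 4); the model name, toy data, and hyperparameters are illustrative, not taken from the slides.

```python
# Minimal fine-tuning sketch (assumed setup: pip install torch transformers)
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Load the pre-trained representation (the unsupervised pre-training step is already done)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Toy labeled data for the downstream task (text classification)
texts = ["the movie was great", "the movie was terrible"]
labels = torch.tensor([1, 0])
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# Fine-tune: all BERT weights plus the new classification head are updated
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few passes over the toy batch
    loss = model(**inputs, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```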
Architecture
1. Encoder: the encoder stack from the Transformer
a. Base model: N=12 layers, hidden dim=768, heads=12
b. Large model: N=24 layers, hidden dim=1024, heads=16
2. Embedding: the sum of token, segment, and positional embeddings
(Slide diagram: token, segment, and positional embeddings are summed to form the input, which passes through N× encoder blocks of multi-head attention → add & norm → feed-forward → add & norm, followed by a linear + softmax output layer.)
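To make the block diagram concrete, below is a rough PyTorch sketch of the input embedding sum and one encoder block. It only illustrates the structure (dimensions follow the base model); it is not the reference implementation, and details such as attention masking and dropout are omitted.

```python
import torch
import torch.nn as nn

class BertEmbedding(nn.Module):
    """Input representation = token + segment + positional embeddings."""
    def __init__(self, vocab_size=30522, hidden=768, max_len=512):  # ~30K WordPiece vocab
        super().__init__()
        self.token = nn.Embedding(vocab_size, hidden)
        self.segment = nn.Embedding(2, hidden)          # sentence A / sentence B
        self.position = nn.Embedding(max_len, hidden)   # learned positions
        self.norm = nn.LayerNorm(hidden)

    def forward(self, token_ids, segment_ids):
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        x = self.token(token_ids) + self.segment(segment_ids) + self.position(positions)
        return self.norm(x)

class EncoderBlock(nn.Module):
    """Multi-head attention -> add & norm -> feed forward -> add & norm."""
    def __init__(self, hidden=768, heads=12, ff=3072):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(hidden)
        self.ff = nn.Sequential(nn.Linear(hidden, ff), nn.GELU(), nn.Linear(ff, hidden))
        self.norm2 = nn.LayerNorm(hidden)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)        # add & norm
        return self.norm2(x + self.ff(x))   # add & norm

# N=12 such blocks stacked gives the base-model encoder
encoder = nn.Sequential(*[EncoderBlock() for _ in range(12)])
```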
Pre-training tasks
• Masked LM
• Next sentence prediction (sketched below)
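As an illustration of how training examples for these two tasks can be constructed, here is a simplified sketch (hypothetical helper functions, not the paper's exact procedure; for example, the paper replaces masked positions with [MASK] only 80% of the time):

```python
import random

def make_masked_lm_example(tokens, mask_prob=0.15):
    """Randomly mask words; the model is trained to predict the originals.
    Simplified: the paper also keeps 10% unchanged and replaces 10% with random tokens."""
    inputs, targets = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            inputs.append("[MASK]")
            targets.append(tok)      # this position is scored
        else:
            inputs.append(tok)
            targets.append(None)     # this position is not scored
    return inputs, targets

def make_next_sentence_example(sent_a, true_next, corpus_sentences):
    """50% of pairs are truly consecutive (label 1), 50% use a random sentence (label 0)."""
    if random.random() < 0.5:
        return sent_a, true_next, 1
    return sent_a, random.choice(corpus_sentences), 0
```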
Fine-tuning on SQuAD
• Use the output hidden states to predict the start and end of the answer span
• Apply a single linear layer (output dim = 2) to the output hidden state vectors T'i
• The output is the predicted start and end positions of the answer within the input paragraph
• The objective function is the log-likelihood of the correct start and end positions (sketched below)
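A minimal sketch of this span-prediction head in PyTorch (the hidden size and names are illustrative; this is not the authors' exact code):

```python
import torch.nn as nn
import torch.nn.functional as F

class SquadSpanHead(nn.Module):
    """One Linear(output=2) applied to every output hidden state T'i:
    column 0 gives the start logits, column 1 gives the end logits."""
    def __init__(self, hidden_dim=768):
        super().__init__()
        self.qa_outputs = nn.Linear(hidden_dim, 2)

    def forward(self, hidden_states):               # (batch, seq_len, hidden_dim)
        logits = self.qa_outputs(hidden_states)      # (batch, seq_len, 2)
        start_logits, end_logits = logits.unbind(dim=-1)
        return start_logits, end_logits              # each (batch, seq_len)

def span_loss(start_logits, end_logits, start_positions, end_positions):
    """Negative log-likelihood of the correct start and end positions."""
    return (F.cross_entropy(start_logits, start_positions)
            + F.cross_entropy(end_logits, end_positions)) / 2
```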
Result on SQuAD
• SQuAD 1.1: new state of the art (SOTA)
• SQuAD 2.0: used as the pre-trained model by the top leaderboard systems
https://rajpurkar.github.io/SQuAD-explorer/
Improving on BERT
RoBERTa
1. Train longer, with bigger batches, on more data
2. Remove the next-sentence-prediction task
3. Train on longer sequences
4. Dynamically change the masking pattern
ALBERT
1. Factorized embedding parameters
2. Cross-layer parameter sharing
3. Inter-sentence coherence loss
References
1. Devlin et al., BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
2. https://colab.research.google.com/github/tensorflow/tpu/blob/master/tools/colab/bert_finetuning_with_cloud_tpus.ipynb
3. https://github.com/google-research/bert
4. PyTorch version: https://github.com/huggingface/pytorch-pretrained-BERT
5. Liu et al., RoBERTa: A Robustly Optimized BERT Pretraining Approach
6. Lan et al., ALBERT: A Lite BERT for Self-supervised Learning of Language Representations