BERT
The BERT model was proposed in BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
It leverages a Transformer-based neural network to understand and represent human language.
The decision to use an encoder-only architecture in BERT suggests a primary emphasis on understanding input
sequences rather than generating output sequences.
Traditional language models limit the model's awareness by conditioning on only one side of the context (typically the preceding words). To overcome this, BERT uses a bi-directional approach: it draws on the context from both sides of a word by looking at all the words in the sequence simultaneously.
BERT is pre-trained on a large amount of unlabelled text data. The model learns contextual embeddings, which are representations of words that take into account their surrounding context in a sentence.
BERT is pre-trained with unsupervised objectives. It learns to predict masked-out words in a sentence (the Masked Language Model, or MLM, task) and to understand the relationship between two sentences by predicting whether the second sentence follows the first (Next Sentence Prediction).
After the pre-training phase, the BERT model, with its contextual embeddings, can be fine-tuned for specific NLP tasks.
This step tailors the model to more targeted applications by adapting its general language understanding to the
nuances of the particular task.
BERT is fine-tuned using labelled data specific to the downstream tasks of interest. The model's parameters are adjusted to optimize its performance on those tasks.
Working
BERT is designed to produce a language representation model, so only the encoder mechanism of the Transformer is used. A sequence of tokens is fed to the Transformer encoder: the tokens are first embedded into vectors and then processed through the stacked encoder layers. The output is a sequence of vectors, one for each input token, providing contextualized representations.
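As a minimal sketch of this flow (using the Hugging Face transformers library and the bert-base-uncased checkpoint as assumptions), each input token comes back as one contextualized vector:

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Tokenize the sentence and run it through the encoder stack.
inputs = tokenizer("BERT reads the whole sentence at once.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One hidden-size vector per input token (including [CLS] and [SEP]).
print(outputs.last_hidden_state.shape)  # torch.Size([1, sequence_length, 768])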
Masking words: BERT hides some of the input words (about 15% of them) and replaces them with a special [MASK] symbol.
Guessing hidden words: BERT’s job is to figure out what these hidden words are by looking at the
words around them.
How it learns:
o BERT adds a prediction layer on top of the encoder output to make these guesses. It then checks how close its guesses are to the actual hidden words.
o It does this by converting its output scores into probabilities over the vocabulary using a softmax.
o BERT's main focus during training is on getting these hidden words right: only the masked positions contribute to this loss. A small example of this masked-word prediction is sketched below.
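A minimal sketch of this guessing step, using the fill-mask pipeline from the transformers library (the checkpoint name is an assumption):

from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT scores every vocabulary word for the [MASK] position and returns
# the most probable candidates with their softmax probabilities.
for prediction in fill_mask("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))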
In the training process, BERT learns to understand the relationship between pairs of sentences,
predicting if the second sentence follows the first in the original document.
50% of the input pairs have the second sentence as the subsequent sentence in the original
document, and the other 50% have a randomly chosen sentence.
To help the model distinguish between connected and disconnected sentence pairs, the input is processed before entering the model:
o A [CLS] token is inserted at the beginning of the first sentence, and a [SEP] token is added at
the end of each sentence.
o A sentence embedding indicating Sentence A or Sentence B is added to each token.
o A positional embedding indicates the position of each token in the sequence.
BERT predicts if the second sentence is connected to the first. This is done by transforming the
output of the [CLS] token into a 2×1 shaped vector using a classification layer, and then calculating
the probability of whether the second sentence follows the first using SoftMax.
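A minimal sketch of this step with BertForNextSentencePrediction (checkpoint name assumed); note that the tokenizer adds the [CLS]/[SEP] tokens and the segment ids described above:

import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

sentence_a = "She opened the fridge."
sentence_b = "It was completely empty."
# Encoding a pair produces input_ids with [CLS]/[SEP] plus token_type_ids (segments A/B).
inputs = tokenizer(sentence_a, sentence_b, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, 2)

# Index 0: sentence_b follows sentence_a; index 1: it is a random sentence.
print(torch.softmax(logits, dim=-1))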
During the training of the BERT model, the Masked LM and Next Sentence Prediction objectives are trained together. The model aims to minimize the combined loss function of the Masked LM and Next Sentence Prediction tasks.
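A rough sketch of this joint objective with BertForPreTraining (checkpoint name and toy labels are assumptions): when both label sets are supplied, the returned loss is the sum of the MLM loss and the NSP loss.

import torch
from transformers import BertTokenizer, BertForPreTraining

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForPreTraining.from_pretrained("bert-base-uncased")

inputs = tokenizer("Paris is the capital of [MASK].", "It lies on the Seine.",
                   return_tensors="pt")

# MLM labels: -100 means "ignore this position"; only the masked token gets a target.
mlm_labels = torch.full_like(inputs["input_ids"], -100)
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)
mlm_labels[mask_pos] = tokenizer.convert_tokens_to_ids("france")

# NSP label: 0 means the second sentence really follows the first.
outputs = model(**inputs, labels=mlm_labels, next_sentence_label=torch.tensor([0]))
print(outputs.loss)  # combined Masked LM + Next Sentence Prediction loss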
BERT Architecture
BERT BASE has 12 layers in the encoder stack while BERT LARGE has 24 layers in the encoder stack.
The two architectures also differ in hidden size (768 units for BASE and 1024 for LARGE, with feed-forward layers of 3072 and 4096 units respectively) and in the number of attention heads (12 and 16 respectively).
BERT BASE contains 110M parameters while BERT LARGE has 340M parameters.
The model takes the [CLS] token as its first input, followed by the sequence of word tokens; [CLS] is a classification token. The input then passes through the stack of encoder layers: each layer applies self-attention, passes the result through a feed-forward network, and hands its output to the next encoder. For every input token, the model outputs a vector of hidden size (768 for BERT BASE).
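These hyperparameters can be checked against the published configuration; a small sketch using the transformers library (checkpoint name assumed):

from transformers import BertConfig

config = BertConfig.from_pretrained("bert-base-uncased")
print(config.num_hidden_layers)    # 12 encoder layers
print(config.hidden_size)          # 768-dimensional hidden vectors
print(config.num_attention_heads)  # 12 attention heads
print(config.intermediate_size)    # 3072 units in each feed-forward sub-layer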
Classification Task:
o Text can be classified into different categories (e.g. positive/negative/neutral), which can be implemented by adding a classification layer on top of the Transformer output for the [CLS] token.
o The [CLS] token represents the aggregated information from the entire input sequence, which can then serve as the input to the classification layer (see the sketch below).
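A minimal sketch of this setup, assuming a BERT checkpoint already fine-tuned for sentiment (the model name below is an assumption; any BERT-based sentiment classifier would work):

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "textattack/bert-base-uncased-SST-2"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

# The classification head sits on top of the [CLS] representation.
inputs = tokenizer("The movie was surprisingly good.", return_tensors="pt")
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)
print(probs)  # probability for each sentiment class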
Question Answering:
o BERT is trained for question answering by learning two additional vectors that mark the beginning
and end of the answer. During training, the model is provided with questions and corresponding
passages, and it learns to predict the start and end positions of the answer within the passage.
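A minimal sketch using a BERT model fine-tuned on SQuAD (checkpoint name assumed); the pipeline returns the predicted answer span and its character positions in the passage:

from transformers import pipeline

qa = pipeline("question-answering",
              model="bert-large-uncased-whole-word-masking-finetuned-squad")

result = qa(question="Who proposed BERT?",
            context="BERT was proposed by Jacob Devlin and colleagues at Google.")
print(result["answer"], result["start"], result["end"])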
Named Entity Recognition (NER):
o A BERT-based NER model is trained by taking the output vector of each token from the Transformer and feeding it into a classification layer. The layer predicts the named entity label for each token, indicating the type of entity it represents (see the sketch below).
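A minimal sketch using a publicly available BERT NER checkpoint (the model name is an assumption):

from transformers import pipeline

ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")

# Each detected span comes back with its entity type and confidence score.
for entity in ner("Barack Obama visited Paris with Google executives."):
    print(entity["word"], entity["entity_group"], round(entity["score"], 3))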
To tokenize and encode text using BERT, we will be using the 'transformers' library in Python, as in the short sketch below.
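The checkpoint name below is an assumption; any BERT tokenizer works the same way.

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

text = "BERT understands context."
tokens = tokenizer.tokenize(text)  # WordPiece tokens of the sentence
encoded = tokenizer(text)          # adds [CLS]/[SEP] and maps tokens to integer ids

print(tokens)
print(encoded["input_ids"])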
Applications of BERT
Text Representation: BERT is used to generate word embeddings or representations for words in a sentence.
Named Entity Recognition (NER): BERT can be fine-tuned for named entity recognition tasks, where the goal
is to identify entities such as names of people, organizations, locations, etc., in a given text.
Text Classification: BERT is widely used for text classification tasks, including sentiment analysis, spam
detection, and topic categorization. It has demonstrated excellent performance in understanding and
classifying the context of textual data.
Question-Answering Systems: BERT has been applied to question-answering systems, where the model is
trained to understand the context of a question and provide relevant answers. This is particularly useful for
tasks like reading comprehension.
Machine Translation: BERT’s contextual embeddings can be leveraged for improving machine translation
systems. The model captures the nuances of language that are crucial for accurate translation.
Text Summarization: BERT can be used in text summarization systems, most directly for extractive summarization and, as the encoder in an encoder-decoder model, for abstractive summarization, producing concise and meaningful summaries of longer texts by understanding the context and semantics.
Conversational AI: BERT is employed in building conversational AI systems, such as chatbots, virtual
assistants, and dialogue systems. Its ability to grasp context makes it effective for understanding and
generating natural language responses.
Semantic Similarity: BERT embeddings can be used to measure semantic similarity between sentences or
documents. This is valuable in tasks like duplicate detection, paraphrase identification, and information
retrieval.
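A rough sketch of one simple way to do this with BERT embeddings (mean-pooling the token vectors is an assumption; other pooling strategies exist):

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(sentence):
    # Mean-pool the contextualized token vectors into one sentence vector.
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)

a = embed("A man is playing a guitar.")
b = embed("Someone is strumming an instrument.")
print(torch.cosine_similarity(a, b, dim=0).item())  # closer to 1 means more similar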