
Introduction to Large Language Models

Assignment 7

Number of questions: 8 | Total marks: 6 × 1 + 2 × 2 = 10

QUESTION 1: [1 mark]

Which of the following best describes how ELMo’s architecture captures different linguistic
properties?

a) The model explicitly assigns specific linguistic functions to each layer.
b) The lower layers capture syntactic information, while higher layers capture semantic
information.
c) All layers capture similar properties.
d) ELMo uses a fixed, non-trainable weighting scheme for combining layer-wise
representations.

Correct Answer: b

Solution: ELMo uses a multi-layer bidirectional LSTM architecture, where different layers
capture different aspects of language. Empirical evidence shows that lower layers focus
more on syntactic information while higher layers capture more semantic nuances.
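
For reference, below is a minimal numpy sketch of ELMo's layer mixing (the shapes and values are toy placeholders, not the actual model): each token's final representation is a softmax-weighted sum of its layer-wise hidden states, and both the per-layer weights and the global scale are learned on the downstream task, which is also why option (d) is incorrect.

import numpy as np

# Illustrative sketch of ELMo's task-specific layer mixing (toy values).
# h[j] is the hidden state of layer j for a single token; the per-layer
# scores w and the scale gamma are trainable, not fixed.
num_layers, hidden_dim = 3, 4                 # e.g., token layer + 2 biLSTM layers
h = np.random.randn(num_layers, hidden_dim)   # layer-wise representations of one token

w = np.zeros(num_layers)                      # learnable per-layer scores
gamma = 1.0                                   # learnable global scale
s = np.exp(w) / np.exp(w).sum()               # softmax-normalized layer weights
elmo_vector = gamma * (s[:, None] * h).sum(axis=0)
print(elmo_vector.shape)                      # (4,): one mixed vector for the token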

_________________________________________________________________________

QUESTION 2: [1 mark]

BERT and BART differ in their architectures. While BERT is a(n) (i) model, BART is a(n) (ii)
one. Select the correct choices for (i) and (ii).

a) i: Decoder-only; ii: Encoder-only
b) i: Encoder-decoder; ii: Encoder-only
c) i: Encoder-only; ii: Encoder-decoder
d) i: Decoder-only; ii: Encoder-decoder

Correct Answer: c

Solution: BERT is an encoder-only transformer model, while BART is an encoder-decoder
model.

_________________________________________________________________________

QUESTION 3: [1 mark]

The pre-training objective for the T5 model is based on:


a) Next sentence prediction
b) Masked language modelling
c) Span corruption and reconstruction
d) Predicting the next token

Correct Answer: c

Solution: T5 is trained using a span corruption objective, which requires the model to
reconstruct masked spans of text.
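
A small sketch of what span corruption looks like is given below (the sentence and sentinel names here are illustrative; the actual sentinel tokens and corruption rate follow the T5 setup): spans of the input are dropped and replaced by sentinel tokens, and the model must generate the dropped spans.

# Illustrative span-corruption example (sentence and sentinels are toy values).
original    = "Thank you for inviting me to your party last week"
model_input = "Thank you <X> me to your party <Y> week"   # dropped spans replaced by sentinels
target      = "<X> for inviting <Y> last <Z>"             # model reconstructs the dropped spans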

________________________________________________________________________

QUESTION 4: [1 mark]

Which of the following datasets was used to pretrain the T5 model?


a) Wikipedia
b) BookCorpus
c) Common Crawl
d) C4

Correct Answer: d

Solution: T5 was pretrained on the “C4” (Colossal Clean Crawled Corpus) dataset.

_________________________________________________________________________

QUESTION 5: [1 mark]

Which of the following special tokens are introduced in BERT to handle sentence pairs?
a) [MASK] and [CLS]
b) [SEP] and [CLS]
c) [CLS] and [NEXT]
d) [SEP] and [MASK]

Correct Answer: b

Solution: BERT introduces the [CLS] token at the start for classification or overall sequence
representation and the [SEP] token to separate sentences. Thus, the special tokens are
“[SEP]” and “[CLS]”.
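
As a small illustration (toy sentences, with the layout built by hand rather than by an actual tokenizer), a sentence pair is packed as [CLS] sentence A [SEP] sentence B [SEP], with segment ids distinguishing the two sentences:

# Toy illustration of BERT's sentence-pair input layout.
sentence_a = ["the", "cat", "sat"]
sentence_b = ["it", "was", "tired"]

tokens = ["[CLS]"] + sentence_a + ["[SEP]"] + sentence_b + ["[SEP]"]
# Segment (token type) ids: 0 for [CLS] + sentence A + first [SEP], 1 for sentence B + final [SEP].
segment_ids = [0] * (len(sentence_a) + 2) + [1] * (len(sentence_b) + 1)

print(tokens)       # ['[CLS]', 'the', 'cat', 'sat', '[SEP]', 'it', 'was', 'tired', '[SEP]']
print(segment_ids)  # [0, 0, 0, 0, 0, 1, 1, 1, 1]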

_________________________________________________________________________

QUESTION 6: [2 marks]

ELMo and BERT represent two different pre-training strategies for language models. Which
of the following statement(s) about these approaches is/are true?

a) ELMo uses a bi-directional LSTM to pre-train word representations, while BERT uses
a transformer encoder with masked language modeling.
b) ELMo provides context-independent word representations, whereas BERT provides
context-dependent representations.
c) Pre-training of both ELMo and BERT involves next token prediction.
d) Both ELMo and BERT produce word embeddings that can be fine-tuned for
downstream tasks.

Correct Answer: a, d

Solution: ELMo uses bidirectional LSTMs with a language modeling objective, while BERT
uses a transformer encoder and masked language modelling. Both can produce embeddings
that are fine-tuned for downstream tasks. Hence, the correct answers are (a) and (d).

_________________________________________________________________________

QUESTION 7: [1 mark]

Decoder-only models are essentially trained with a probabilistic language modelling objective.
Which of the following correctly represents the training objective of GPT-style models?

a) P(y | x) where x is the input sequence and y is the gold output sequence
b) P(x ∣ y) where x is the input sequence and y is the gold output sequence
c) P(wt ∣ w1:t−1), where wt represents the token at position t, and w1:t−1 is the sequence of
tokens from position 1 to t-1
d) P(wt ∣ w1:t+1), where wt represents the token at position t, and w1:t+1 is the sequence of
tokens from position 1 to t+1

Correct Answer: c

Solution: Decoder-only (GPT-style) models are trained using left-to-right language modeling,
predicting each token given all previous tokens. Thus, the objective is P(wt ∣ w1:t−1).
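
A toy numpy sketch of this objective follows (the probabilities are random stand-ins for a model's softmax outputs; only the bookkeeping of "predict token t from tokens 1 to t−1" is meant to be accurate):

import numpy as np

# Toy sketch of the left-to-right objective: minimize -log P(w_t | w_1:t-1).
tokens = [2, 0, 3, 1]                       # toy token ids w_1 .. w_4
vocab_size = 5

rng = np.random.default_rng(0)
logits = rng.normal(size=(len(tokens) - 1, vocab_size))   # one prediction per context
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Context w_1..w_t predicts target w_{t+1}; average the negative log-probabilities.
nll = -np.mean([np.log(probs[t, tokens[t + 1]]) for t in range(len(tokens) - 1)])
print(nll)

_________________________________________________________________________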

QUESTION 8: (Numerical Question) [2 marks]

In the previous week, we saw the usage of the einsum function in numpy as a generalized
operation for performing tensor multiplications. Now, consider two matrices:
A = [[1, 5], [3, 7]] and B = [[2, -1], [4, 2]]. Then, what is the output of the following numpy
operation?
numpy.einsum('ij,ij->', A, B)

Correct Answer: 23

Solution: The operation numpy.einsum('ij,ij->', A, B) computes the elementwise
product of A and B, then sums all of those products.

Thus, output = 1*2 + 5*(-1) + 3*4 + 7*2 = 2 - 5 + 12 + 14 = 23
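
A quick numpy check of this result, using the matrices from the question:

import numpy as np

# Verify the einsum result for the matrices given in the question.
A = np.array([[1, 5],
              [3, 7]])
B = np.array([[2, -1],
              [4, 2]])

result = np.einsum('ij,ij->', A, B)   # elementwise product, then sum over all entries
print(result)                         # 23
assert result == (A * B).sum()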
