exercises-en-text-models-2
Submit your solutions by 07.05.2025, 23:59, on Moodle. The submission should be a single
PDF file with your group’s solutions to the exercises.
d1: not bad good film
d2: good film good plot
d3: good film bad plot
d4: not good bad film
Create a BoW representation of the document collection D. Which document is the most similar to
the document d1 based on the BoW representations of the documents?
Hint: you don’t need to calculate the cosine similarity, just compare the resulting representations.
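If you want to check your hand-built table, here is a minimal Python sketch of a count-based BoW representation (the sorted vocabulary order is an arbitrary choice for reproducibility, not prescribed by the exercise):

    # Minimal count-based BoW sketch: one count vector per document over a
    # shared vocabulary (sorted only to make the column order reproducible).
    from collections import Counter

    docs = {
        "d1": "not bad good film",
        "d2": "good film good plot",
        "d3": "good film bad plot",
        "d4": "not good bad film",
    }

    vocab = sorted({t for text in docs.values() for t in text.split()})
    bow = {name: [Counter(text.split())[t] for t in vocab]
           for name, text in docs.items()}

    print(vocab)                 # column order of the vectors
    for name, vec in bow.items():
        print(name, vec)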
(c) You want to use the BoW representation to train a model for sentiment analysis (e.g.
classifying movie reviews as positive or negative). Do you think the BoW representation is suitable
for this task? Use your BoW representation of the document collection D to support your answer.
Exercise 2 : Text Representation: Term Weighting tf·idf (1+2+2=5 Points)
The lecture introduced tf·idf as a measure to evaluate the importance w of a term t in a document d ∈ D as:
w(t, d) = tf(t, d) · idf(t, D)
(a) What is measured by tf(t, d) and idf(t, D) in the equation above? How are they calculated?
d1: bad bad fast cat
d2: run unix cat job
d3: big big big cat
d4: big cat big kill
d5: job big big cat
d6: kill big big job
d7: unix job run cat
d8: big cat big cat
(b1) Calculate the idf value for each term in the document collection D. Which term (or terms) have
the highest idf value in this collection? Report the words and their idf values.
(b2) The query q = big cat is run against the document collection D. Rank the documents
according to the weighted sum of tf·idf values for the query terms:
Σ_{t∈q} w(t) = Σ_{t∈q} tf(t, d) · idf(t, D)
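A minimal sketch of this ranking, assuming raw term counts for tf(t, d) and the plain variant idf(t, D) = log(|D| / df(t)); if the lecture defines these differently (e.g. a different log base or a smoothed idf), substitute accordingly:

    # Minimal tf-idf ranking sketch: tf = raw count of t in d,
    # idf = log(|D| / df(t)), df(t) = number of documents containing t.
    import math
    from collections import Counter

    docs = {
        "d1": "bad bad fast cat",  "d5": "job big big cat",
        "d2": "run unix cat job",  "d6": "kill big big job",
        "d3": "big big big cat",   "d7": "unix job run cat",
        "d4": "big cat big kill",  "d8": "big cat big cat",
    }
    tokens = {d: text.split() for d, text in docs.items()}

    def idf(term):
        df = sum(term in toks for toks in tokens.values())
        return math.log(len(docs) / df)

    def score(query, d):
        tf = Counter(tokens[d])
        return sum(tf[t] * idf(t) for t in query.split())

    ranking = sorted(docs, key=lambda d: score("big cat", d), reverse=True)
    print(ranking)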
Exercise 3 : Word Vectors (2+1+1+2=6 Points)
The lecture introduced distributional representations of words (word vectors) as embeddings of words in a
latent space.
(a) Which of the following statements about word vectors are true? (Select all that apply)
(b) Which of the following equations should hold for good word embeddings? (Select all that apply)
(c) What is the difference between static and contextualized word embeddings?
(d) Most computational models of distributional similarity, including neural embeddings such as
word2vec, often embed antonyms (e.g. “good” and “bad”) close to each other in the vector space and
assign them similar meanings*. Briefly explain the underlying reason for this phenomenon.
* You can visually inspect this phenomenon in the TensorFlow Projector tool.
Exercise 4 : Word Mover Distance (1+1+1=3 Points)
The lecture introduced the Word Mover Distance (WMD) for measuring the similarity of sentences based on word vectors. WMD
finds the minimum cumulative transportation cost of moving all words from one sentence to words in another
sentence in an embedding space.
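A minimal sketch of the WMD computation for the special case of uniform word weights and equally long sentences, where the optimal transport reduces to a minimum-cost one-to-one matching (the vectors below are hypothetical placeholders, not the exercise’s data):

    # WMD sketch for the special case of uniform word weights and equally
    # long sentences: the optimal transport plan is then a one-to-one
    # matching, solvable with the Hungarian algorithm.
    import numpy as np
    from scipy.optimize import linear_sum_assignment
    from scipy.spatial.distance import cdist

    def wmd_equal_length(vecs_a, vecs_b):
        """vecs_a, vecs_b: (n, d) arrays, one row per word; each word
        carries mass 1/n."""
        cost = cdist(vecs_a, vecs_b)              # pairwise Euclidean costs
        rows, cols = linear_sum_assignment(cost)  # cheapest matching
        return cost[rows, cols].sum() / len(vecs_a)

    # Hypothetical 3-dimensional vectors (NOT the ones given in the exercise):
    a = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    b = np.array([[0.9, 0.1, 0.0], [0.1, 0.8, 0.1]])
    print(wmd_equal_length(a, b))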
You are given the sentences A, B, and C and the 3-dimensional word vectors [d1, d2, d3] for all occurring
words:
Exercise 5 : Sentence Embeddings (1+1+1+0+2=5 Points)
The lecture introduced sentence embeddings as a way to represent sentences in a continuous vector space.
One approach to generating sentence embeddings is to average the word embeddings w of all words in a
sentence s:
s_emb = (1/|s|) · Σ_{wi∈s} wi
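A minimal numpy sketch of this averaging, together with the Manhattan (L1) distance needed in part (b) below (hypothetical vectors, not the ones from Exercise 4):

    # Minimal sketch: average word vectors to get a sentence embedding,
    # plus the Manhattan (L1) distance used to compare two embeddings.
    import numpy as np

    def sentence_embedding(word_vectors):
        """word_vectors: (n, d) array, one row per word in the sentence."""
        return np.mean(word_vectors, axis=0)

    def manhattan(u, v):
        return np.sum(np.abs(u - v))

    # Hypothetical 3-dimensional vectors (NOT the ones from Exercise 4):
    sent = np.array([[1.0, 2.0, 0.0], [3.0, 0.0, 1.0]])
    emb = sentence_embedding(sent)    # -> [2.0, 1.0, 0.5]
    print(emb, manhattan(emb, np.zeros(3)))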
Given the same sentences A, B, and C and the 3-dimensional word vectors for all occurring words from
Exercise 4:
(a) Calculate the sentence embeddings for Sentences A, B, and C using vector averaging.
(b) Which sentence embedding pair is more similar: (A, B) or (A, C)? For your answer, calculate the
Manhattan distance D between the embeddings of the sentences in each pair.
(c) What are the limitations of the vector averaging approach for generating sentence embeddings?
Name at least two.
(d) Another approach to measuring the similarity between two embedding representations is to calculate
the cosine similarity between them:
sim_cosine(s1, s2) = (s1 · s2) / (∥s1∥ · ∥s2∥)
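A minimal numpy sketch of this formula (hypothetical embeddings; a real implementation would also guard against zero-norm vectors):

    # Cosine similarity between two sentence embeddings: dot product
    # divided by the product of the Euclidean norms.
    import numpy as np

    def cosine_similarity(s1, s2):
        return np.dot(s1, s2) / (np.linalg.norm(s1) * np.linalg.norm(s2))

    # Hypothetical embeddings (NOT the exercise's values):
    print(cosine_similarity(np.array([1.0, 0.0, 1.0]),
                            np.array([0.5, 0.5, 1.0])))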
(d1) Interpret the following cosine similarity values between two sentence embeddings:
a) simcosine (s1 , s2 ) = −1
b) simcosine (s1 , s2 ) = 0
c) simcosine (s1 , s2 ) = 1
(d2) Calculate the cosine similarity between Sentence A and Sentence B, and between Sentence A
and Sentence C. Which sentence is more similar to Sentence A according to the cosine
similarity measure?