exercises-en-text-models-2
Submit your solutions by 07.05.2025, 23:59, on Moodle. The submission should be a single
PDF file with your group’s solutions to the exercises.
d1: not bad good film
d2: good film good plot
d3: good film bad plot
d4: not good bad film
Create a BoW representation of the document collection D. Which document is the most similar to
the document d1 based on the BoW representations of the documents?
Hint: you don’t need to calculate the cosine similarity, just compare the resulting representations.
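If you want to check your hand-built table, here is a minimal Python sketch of a count-based BoW representation (the sorted vocabulary order is an arbitrary choice for reproducibility, not prescribed by the exercise):

    # Minimal count-based BoW sketch: one count vector per document over a
    # shared vocabulary (sorted only to make the column order reproducible).
    from collections import Counter

    docs = {
        "d1": "not bad good film",
        "d2": "good film good plot",
        "d3": "good film bad plot",
        "d4": "not good bad film",
    }

    vocab = sorted({t for text in docs.values() for t in text.split()})
    bow = {name: [Counter(text.split())[t] for t in vocab]
           for name, text in docs.items()}

    print(vocab)                 # column order of the vectors
    for name, vec in bow.items():
        print(name, vec)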
(c) You want to use the BoW representation to train a model for sentiment analysis (e.g.
classifying movie reviews as positive or negative). Do you think the BoW representation is suitable
for this task? Use your BoW representation of the document collection D to support your answer.
Exercise 2 : Text Representation: Term Weighting tf·idf (1+2+2=5 Points)
The lecture introduced tf·idf as a measure to evaluate the importance w of a term t in a document d ∈ D as:
w(t, d) = tf(t, d) · idf(t, D)
(a) What is measured by tf(t, d) and idf(t, D) in the equation above? How are they calculated?
d1: bad bad fast cat
d2: run unix cat job
d3: big big big cat
d4: big cat big kill
d5: job big big cat
d6: kill big big job
d7: unix job run cat
d8: big cat big cat
(b1) Calculate the idf value for each term in the document collection D. Which term (or terms) have
the highest idf value in this collection? Report the words and their idf values.
(b2) The query q = big cat is run against the document collection D. Rank the documents
according to the weighted sum of tf·idf values for the query terms:
Σ_{t∈q} w(t) = Σ_{t∈q} tf(t, d) · idf(t, D)
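A minimal sketch of this ranking, assuming raw term counts for tf(t, d) and the plain variant idf(t, D) = log(|D| / df(t)); if the lecture defines these differently (e.g. a different log base or a smoothed idf), substitute accordingly:

    # Minimal tf-idf ranking sketch: tf = raw count of t in d,
    # idf = log(|D| / df(t)), df(t) = number of documents containing t.
    import math
    from collections import Counter

    docs = {
        "d1": "bad bad fast cat",  "d5": "job big big cat",
        "d2": "run unix cat job",  "d6": "kill big big job",
        "d3": "big big big cat",   "d7": "unix job run cat",
        "d4": "big cat big kill",  "d8": "big cat big cat",
    }
    tokens = {d: text.split() for d, text in docs.items()}

    def idf(term):
        df = sum(term in toks for toks in tokens.values())
        return math.log(len(docs) / df)

    def score(query, d):
        tf = Counter(tokens[d])
        return sum(tf[t] * idf(t) for t in query.split())

    ranking = sorted(docs, key=lambda d: score("big cat", d), reverse=True)
    print(ranking)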
Exercise 3 : Word Vectors (2+1+1+2=6 Points)
The lecture introduced distributional representations of words (word vectors) as embeddings of words in a
latent space.
(a) Which of the following statements about word vectors are true? (Select all that apply)
(b) Which of the following equations should hold for good word embeddings? (Select all that apply)
(c) What is the difference between static and contextualized word embeddings?
(d) Most computational models of distributional similarity, including neural embeddings such as
word2vec, often embed antonyms (e.g. “good” and “bad”) close to each other in the vector space and
assign them similar meanings*. Briefly explain the underlying reason for this phenomenon.
* You can visually inspect this phenomenon in the TensorFlow Projector tool.
Exercise 4 : Word Mover Distance (1+1+1=3 Points)
The lecture introduced the Word Mover Distance (WMD) for measuring the similarity of sentences based on word vectors. WMD
finds the minimum cumulative transportation cost of moving all words from one sentence to words in another
sentence in an embedding space.
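A minimal sketch of the WMD computation for the special case of uniform word weights and equally long sentences, where the optimal transport reduces to a minimum-cost one-to-one matching (the vectors below are hypothetical placeholders, not the exercise’s data):

    # WMD sketch for the special case of uniform word weights and equally
    # long sentences: the optimal transport plan is then a one-to-one
    # matching, solvable with the Hungarian algorithm.
    import numpy as np
    from scipy.optimize import linear_sum_assignment
    from scipy.spatial.distance import cdist

    def wmd_equal_length(vecs_a, vecs_b):
        """vecs_a, vecs_b: (n, d) arrays, one row per word; each word
        carries mass 1/n."""
        cost = cdist(vecs_a, vecs_b)              # pairwise Euclidean costs
        rows, cols = linear_sum_assignment(cost)  # cheapest matching
        return cost[rows, cols].sum() / len(vecs_a)

    # Hypothetical 3-dimensional vectors (NOT the ones given in the exercise):
    a = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    b = np.array([[0.9, 0.1, 0.0], [0.1, 0.8, 0.1]])
    print(wmd_equal_length(a, b))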
You are given the sentences A, B, and C and the 3-dimensional word vectors [d1, d2, d3] for all occurring
words:
Exercise 5 : Sentence Embeddings (1+1+1+0+2=5 Points)
The lecture introduced sentence embeddings as a way to represent sentences in a continuous vector space.
One approach to generating sentence embeddings is to average the word embeddings w of all words in a
sentence s:
s_emb = (1/|s|) · Σ_{wi∈s} wi
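A minimal numpy sketch of this averaging, together with the Manhattan (L1) distance needed in part (b) below (hypothetical vectors, not the ones from Exercise 4):

    # Minimal sketch: average word vectors to get a sentence embedding,
    # plus the Manhattan (L1) distance used to compare two embeddings.
    import numpy as np

    def sentence_embedding(word_vectors):
        """word_vectors: (n, d) array, one row per word in the sentence."""
        return np.mean(word_vectors, axis=0)

    def manhattan(u, v):
        return np.sum(np.abs(u - v))

    # Hypothetical 3-dimensional vectors (NOT the ones from Exercise 4):
    sent = np.array([[1.0, 2.0, 0.0], [3.0, 0.0, 1.0]])
    emb = sentence_embedding(sent)    # -> [2.0, 1.0, 0.5]
    print(emb, manhattan(emb, np.zeros(3)))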
Given the same sentences A, B, and C and the 3-dimensional word vectors for all occurring words from
Exercise 4:
(a) Calculate the sentence embeddings for Sentences A, B, and C using vector averaging.
(b) Which sentence embedding pair is more similar: (A, B) or (A, C)? For your answer, calculate the
Manhattan distance D between the embeddings of the sentences in each pair.
(c) What are the limitations of the vector averaging approach for generating sentence embeddings?
Name at least two.
(d) Another approach to measuring the similarity between two embedding representations is to calculate
the cosine similarity between them:
sim_cosine(s1, s2) = (s1 · s2) / (∥s1∥ · ∥s2∥)
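A minimal numpy sketch of this formula (hypothetical embeddings; a real implementation would also guard against zero-norm vectors):

    # Cosine similarity between two sentence embeddings: dot product
    # divided by the product of the Euclidean norms.
    import numpy as np

    def cosine_similarity(s1, s2):
        return np.dot(s1, s2) / (np.linalg.norm(s1) * np.linalg.norm(s2))

    # Hypothetical embeddings (NOT the exercise's values):
    print(cosine_similarity(np.array([1.0, 0.0, 1.0]),
                            np.array([0.5, 0.5, 1.0])))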
(d1) Interpret the following cosine similarity values between two sentence embeddings:
a) simcosine (s1 , s2 ) = −1
b) simcosine (s1 , s2 ) = 0
c) simcosine (s1 , s2 ) = 1
(d2) Calculate the cosine similarity between Sentence A and Sentence B, and between Sentence A
and Sentence C. Which sentence is more similar to Sentence A according to the cosine
similarity measure?