
Lexical and Vector Semantics

CSE538 - Spring 2025
Natural Language Processing
Topics
● Lexical Ambiguity (why word sense disambiguation)
● Word Vectors
● Topic Modeling

Objectives
● Define common semantic tasks in NLP and learn some approaches to solve them.
● Understand linguistic information necessary for semantic processing
● Motivate deep learning models necessary to capture language semantics.
● Learn word embeddings (the starting point for modern large language models)
(Jurafsky & Martin, SLP, 2019; Schwartz, 2011)
Word Sense Disambiguation

He put the port on the ship.

He walked along the port of the steamer.

He walked along the port next to the steamer.


Word Sense Disambiguation

He put the port on the ship.
He walked along the port of the steamer.
He walked along the port next to the steamer.

Senses of "port" as a noun:
● port.n.1 (a place (seaport or airport) where people and merchandise can enter or leave a country)
● port.n.2, port wine (sweet dark-red dessert wine originally from Portugal)
● port.n.3, embrasure, porthole (an opening (in a wall or ship or armored vehicle) for firing through)
● larboard, port.n.4 (the left side of a ship or aircraft to someone who is aboard and facing the bow or nose)
● interface, port.n.5 ((computer science) computer circuit consisting of the hardware and associated circuitry that links one device with another (especially a computer and a hard disk drive or other peripherals))

As a verb…
1. port (put or turn on the left side, of a ship) "port the helm"
2. port (bring to port) "the captain ported the ship at night"
3. port (land at or reach a port) "The ship finally ported"
4. port (turn or go to the port or left side, of a ship) "The big ship was slowly porting"
5. port (carry, bear, convey, or bring) "The small canoe could be ported easily"
6. port (carry or hold with both hands diagonally across the body, especially of weapons) "port a rifle"
7. port (drink port) "We were porting all in the club after dinner"
8. port (modify (software) for use on a different machine or platform)
Objective

great:
● great.a.1 (relatively large in size or number or extent; larger than others of its kind)
● great.a.2, outstanding (of major significance or importance)
● great.a.3 (remarkable or out of the ordinary in degree or magnitude or effect)
● bang-up, bully, corking, cracking, dandy, great.a.4, groovy, keen, neat, nifty, not bad, peachy, slap-up, swell, smashing, old (very good)
● capital, great.a.5, majuscule (uppercase)
● big, enceinte, expectant, gravid, great.a.6, large, heavy, with child (in an advanced stage of pregnancy)
● great.n.1 (a person who has achieved distinction and honor in some field)
Word Sense Disambiguation

A classification problem (candidate senses: port.n.1 … port.n.5):
General Form:
f(sent_tokens, (target_index, lemma, POS)) -> word_sense

He walked along the port next to the steamer.
Word Sense Disambiguation

A classification problem:
General Form:
f(sent_tokens, (target_index, lemma, POS)) -> word_sense

Logistic Regression (or any discriminative classifier):

P_{lemma,POS}(sense = s | features)

He walked along the port next to the steamer.

(Jurafsky, SLP 3)
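
To make the general form concrete, here is a minimal sketch (not the course's reference implementation) using scikit-learn: bag-of-words context features feed a per-lemma logistic regression. The training sentences and sense labels are invented for illustration.

```python
# Minimal sketch: WSD as classification with scikit-learn.
# The training sentences and sense labels below are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# (sentence, sense) pairs for the lemma "port" (noun) -- toy data
train = [
    ("He put the port on the ship after dinner", "port.n.2"),
    ("The sweet port paired well with dessert", "port.n.2"),
    ("The ship entered the port at dawn", "port.n.1"),
    ("Cranes lined the busy port", "port.n.1"),
    ("He walked along the port side of the steamer", "port.n.4"),
]
texts, senses = zip(*train)

# Bag-of-words ("multi-hot") features over the sentence context,
# then a discriminative classifier: P_{lemma,POS}(sense = s | features)
wsd = make_pipeline(CountVectorizer(binary=True), LogisticRegression(max_iter=1000))
wsd.fit(list(texts), list(senses))

print(wsd.predict(["He walked along the port next to the steamer"]))
```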
Distributional Hypothesis:

Wittgenstein, 1945: “The meaning of a word is its use in the language”


Distributional hypothesis -- A word’s meaning is defined by all the different
contexts it appears in (i.e. how it is “distributed” in natural language).

Firth, 1957: “You shall know a word by the company it keeps”

The nail hit the beam behind the wall.


Distributional Hypothesis

Similarity - Has the same or similar meaning.
  synonyms (same as), hypernyms (is-a), hyponyms (has-a)
  beam is-a piece of wood
  beam is similar to piece of wood

Relatedness - Any relationship:
  includes similarity but also antonyms, meronyms (part-of), etc.
  beam is-part-of a house
  beam is related to a house
  beam is similar to a house

The nail hit the beam behind the wall.
Approaches to WSD
I.e. how to operationalize the distributional hypothesis.
1. Bag of words for context
   E.g. multi-hot for any word in a defined “context”.
2. Surrounding window with positions
   E.g. one-hot per position relative to the target word.
3. Lesk algorithm
   E.g. compare context to sense definitions.
4. Selectors -- other target words that appear with the same context
   E.g. counts for any selector.
5. Contextual Embeddings
   E.g. real-valued vectors that “encode” the context (TBD).
Lesk Algorithm for WSD
I.e. compare the context to each sense's definition (gloss) and choose the sense with the most overlap.

● bank.n.1 (sloping land (especially the slope beside a body of water)) "they pulled the canoe up on the bank"; "he sat on the bank of the river and watched the currents"
● bank.n.2 (a financial institution that accepts deposits and channels the money into lending activities) "he cashed a check at the bank"; "that bank holds the mortgage on my home"
● ...
● bank.n.4 (an arrangement of similar objects in a row or in tiers) "he operated a bank of switches"
● ...
● bank.n.8 (a building in which the business of banking transacted) "the bank is on the corner of Nassau and Witherspoon"
● bank.n.9 (a flight maneuver; aircraft tips laterally about its longitudinal axis (especially in turning)) "the plane went into a steep bank"

The bank can guarantee deposits will cover future tuition costs, ...
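
As a rough illustration of the Lesk idea, a minimal sketch using NLTK's WordNet interface: score each sense by the overlap between the sentence context and the sense's gloss plus example sentences. This is the simplified Lesk variant, assuming nltk is installed and its WordNet data downloaded; it is not necessarily the exact variant shown in lecture.

```python
# Simplified Lesk sketch: pick the sense whose gloss/examples overlap most
# with the sentence context. Assumes `nltk` is installed and
# `nltk.download('wordnet')` has been run.
from nltk.corpus import wordnet as wn

STOP = {"the", "a", "an", "of", "in", "on", "at", "can", "will", "and", "to"}

def simplified_lesk(word, sentence, pos=wn.NOUN):
    context = {w.lower().strip(".,") for w in sentence.split()} - STOP
    best_sense, best_overlap = None, -1
    for sense in wn.synsets(word, pos=pos):
        # signature = gloss words + example-sentence words for this sense
        signature = set(sense.definition().lower().split())
        for ex in sense.examples():
            signature |= set(ex.lower().split())
        overlap = len(context & (signature - STOP))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

sent = "The bank can guarantee deposits will cover future tuition costs"
print(simplified_lesk("bank", sent))   # likely picks the financial-institution sense
```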
Lesk Algorithm for WSD

● striker.n.1 (a forward on a soccer team)
● striker.n.2 (someone receiving intensive training for a naval technical rating)
● striker.n.3 (an employee on strike against an employer)
● striker.n.4 (someone who hits) "a hard hitter"; "a fine striker of the ball"; "blacksmiths are good hitters"
● striker.n.5 (the part of a mechanical device that strikes something)

He addressed the strikers at the rally.
Selectors
… a word which can take the place of another given word within the same local context (Lin, 1997)

Original version: Local context defined by dependency parse

He addressed the strikers at the rally.   ("strikers" = object of "addressed")
Selectors
… a word which can take the place of another given word within the same local context (Lin, 1997)

Original version: Local context defined by dependency parse (Lin, 1997)

Web version: Local context defined by lexical patterns matched on the Web (Schwartz, 2008).

“He addressed the * at the rally.”
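
To make the web version concrete, a small sketch: whatever fills the * slot of the lexical pattern is taken as a selector, and selector counts become features. The "matched" sentences below are invented stand-ins for web hits.

```python
# Sketch: extract selectors as the words that fill the * slot of a lexical
# pattern. The "matched" sentences are invented stand-ins for web hits.
import re
from collections import Counter

pattern = re.compile(r"He addressed the (\w+) at the rally\.")

web_matches = [
    "He addressed the crowd at the rally.",
    "He addressed the workers at the rally.",
    "He addressed the strikers at the rally.",
    "He addressed the protesters at the rally.",
    "He addressed the crowd at the rally.",
]

selectors = Counter(m.group(1) for s in web_matches if (m := pattern.search(s)))
print(selectors)   # e.g. Counter({'crowd': 2, 'workers': 1, ...})
```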


Selectors

Each candidate selector becomes a feature; the slot-fillers observed for the target's context are encoded as a count / multi-hot vector, e.g. [0, 1, 0, 0, 0, 1, 0, 0, 0, ...].

Selectors leverage hypernymy: concept1 <is-a> concept2

Why Are Selectors Effective?
Sets of selectors tend to vary extensively by word sense.
Supervised Selectors
Approaches to WSD
I.e. how to operationalize the distributional hypothesis.
1. Bag of words for context
   E.g. multi-hot for any word in a defined “context”.
2. Surrounding window with positions
   E.g. one-hot per position relative to the target word.
3. Lesk algorithm
   E.g. compare context to sense definitions.
4. Selectors -- other target words that appear with the same context
   E.g. counts for any selector.
5. Contextual Embeddings - introduced with Transformer LMs
   E.g. real-valued vectors that “encode” the context (TBD).
Vector Semantics

1. Word2vec

2. Topic Modeling - Latent Dirichlet Allocation (LDA)


Timeline: Language Modeling and Vector Semantics (~logarithmic scale)
Phases: Language Models → Vector Semantics → Neural-net-based LMs + vector embeddings

● 1913 -- Markov: probability that the next letter would be a vowel or consonant
● 1948 -- Shannon: A Mathematical Theory of Communication (first digital language model)
● Osgood: The Measurement of Meaning; Switzer: vector space models
● 1980 -- Jelinek et al. (IBM): language models for speech recognition
● Brown et al.: class-based n-gram models of natural language; Deerwester: Indexing by Latent Semantic Analysis (LSA)
● 2003 -- Blei et al.: LDA topic modeling; Bengio: (neural) language models
● 2010s -- Mikolov: word2vec; Collobert and Weston: A unified architecture for natural language processing: deep neural networks...
● 2018 -- ELMo; then GPT, BERT, RoBERTa
● ... GPT-4o, DeepSeek-R1
Objective
To embed: convert a token (or sequence) to a vector that represents meaning, or is useful for performing a downstream NLP application.
Objective

embed: port → [0, …, 0, 1, 0, …]   (one-hot)
Objective

Prefer dense vectors

one-hot is a sparse vector:
embed: port → [0, …, 0, 1, 0, …]

● Fewer parameters (weights) for the machine learning model.
● May generalize better implicitly.
● May capture synonyms.

For deep learning, in practice, dense vectors work better. Why? Roughly, fewer parameters become increasingly important when you are learning multiple layers of weights rather than just a single layer.

(Jurafsky, 2012)
Objective
To embed: convert a token (or sequence) to a vector that represents meaning.

Distributional hypothesis -- A word’s meaning is defined by all the different contexts it appears in (i.e. how it is “distributed” in natural language).

Wittgenstein, 1945: “The meaning of a word is its use in the language”
Firth, 1957: “You shall know a word by the company it keeps”

The nail hit the beam behind the wall.
Word Vectors

"one-hot encoding" (sparse):
embed: port → [0, …, 0, 1, 0, …]
Word Vectors

"vector embedding" (dense):
embed: port → [0.53, 1.5, 3.21, -2.3, .76]
Objective

embed: port → [0.53, 1.5, 3.21, -2.3, .76]

(shown alongside the noun senses listed earlier: port.n.1 … port.n.5)
Objective

embed: great → [-0.2, 0.3, -1.1, -2.1, .26] ?

(shown alongside the senses listed earlier: great.a.1 … great.a.6, great.n.1)
Word2Vec
Principle: Predict the missing word.

Similar to classification where y = context and x = word:

p(context | word)

To learn, maximize p(context | word). In practice, minimize
J = 1 - p(context | word)
Word2Vec: Context -- p(context | word)

2 Versions of Context:
1. Continuous bag of words (CBOW): predict the word from its context
2. Skip-Grams (SG): predict context words from the target

1. Treat the target word and a neighboring context word as positive examples.
2. Randomly sample other words in the lexicon to get negative samples.
3. Use logistic regression to train a classifier to distinguish those two cases.
4. Use the weights as the embeddings.

(Jurafsky, 2017)
Skip-Grams (SG): predict context words from target -- p(context | word)

Steps:
1. Treat the target word and a neighboring context word as positive examples.
2. Randomly sample other words in the lexicon to get negative samples.
3. Use logistic regression to train a classifier to distinguish those two cases.
4. Use the weights as the embeddings.

The nail hit the beam behind the wall.
         c1  c2  [t]   c3     c4

Positive examples (target with each word in its context window):
x = (hit, beam), y = 1
x = (the, beam), y = 1
x = (behind, beam), y = 1
...

Negative examples (sampled):
x = (happy, beam), y = 0
x = (think, beam), y = 0
...

k negative samples (y = 0) for every positive. How? Randomly draw from the unigram distribution, adjusted:

P_α(w) = count(w)^α / Σ_w' count(w')^α,   where α = 0.75
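
A small sketch of steps 1 and 2: build positive (context, target) pairs from a window around each target and draw k negatives from the α-adjusted unigram distribution. The corpus, window size, and k are toy choices for illustration.

```python
# Sketch of steps 1-2: positive (context, target) pairs from a window,
# plus k negatives drawn from the alpha-adjusted unigram distribution.
import numpy as np
from collections import Counter

corpus = "the nail hit the beam behind the wall".split()
window, k, alpha = 2, 2, 0.75

# unigram counts -> P_alpha(w) proportional to count(w)**alpha
counts = Counter(corpus)
vocab = sorted(counts)
probs = np.array([counts[w] ** alpha for w in vocab])
probs /= probs.sum()

rng = np.random.default_rng(0)
pairs = []   # (context_word, target_word, label)
for i, target in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j == i:
            continue
        pairs.append((corpus[j], target, 1))                 # positive
        for neg in rng.choice(vocab, size=k, p=probs):       # k negatives
            pairs.append((neg, target, 0))

print(pairs[:6])
```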
Skip-Grams (SG): predict context words from target -- p(context | word)

Single context word c with target t:
P(y=1 | c, t) = σ(c · t)

All context words c_1 … c_L in the window:
P(y=1 | c_{1:L}, t) = Π_i σ(c_i · t)

Intuition: t · c is a measure of similarity. But it is not a probability! To make it one, apply the logistic activation:
σ(z) = 1 / (1 + e^(-z))

The nail hit the beam behind the wall.
         c1  c2  [t]   c3     c4
Skip-Grams (SG): predict context words from target -- p(context | word)

3a. Assume dim × |vocab| weights for each of c and t, initialized to random values (e.g. dim = 50 or dim = 300).
3b. Optimize the loss (for a target t with positive context c_pos and k sampled negatives c_neg_1 … c_neg_k):

L = -[ log σ(c_pos · t) + Σ_{i=1}^{k} log σ(-c_neg_i · t) ]

Maximizes similarity of (c, t) in the positive data (y = 1).
Minimizes similarity of (c, t) in the negative data (y = 0).
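
A compact numpy sketch of steps 3a/3b: two randomly initialized embedding matrices (targets T and contexts C), the σ(c · t) score, and a stochastic-gradient step on the negative-sampling loss above. Dimensions, learning rate, vocabulary, and training pairs are illustrative only.

```python
# Sketch of steps 3-4: learn target (T) and context (C) embeddings with the
# negative-sampling loss  -[log sigma(c_pos . t) + sum_i log sigma(-c_neg_i . t)].
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
vocab = ["the", "nail", "hit", "beam", "behind", "wall", "happy", "think"]
idx = {w: i for i, w in enumerate(vocab)}
dim, lr = 50, 0.05                                   # e.g. dim = 50 or 300
T = rng.normal(scale=0.1, size=(len(vocab), dim))    # target embeddings
C = rng.normal(scale=0.1, size=(len(vocab), dim))    # context embeddings

def sgd_step(target, pos_context, neg_contexts):
    t = idx[target]
    cs = [idx[pos_context]] + [idx[w] for w in neg_contexts]
    ys = np.array([1.0] + [0.0] * len(neg_contexts))
    scores = sigmoid(C[cs] @ T[t])        # sigma(c . t) for each context
    err = scores - ys                     # gradient of the loss wrt c . t
    grad_T = err @ C[cs]
    C[cs] -= lr * err[:, None] * T[t]     # update context vectors
    T[t] -= lr * grad_T                   # update target vector
    return -np.log(scores[0]) - np.log(1 - scores[1:]).sum()   # loss value

for _ in range(200):
    loss = sgd_step("beam", "hit", ["happy", "think"])
print(round(loss, 4))                     # the loss shrinks over iterations
```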
W2V uses the same multi-class loss function as LogReg!

Logistic Regression Likelihood:  Π_i ŷ_i^(y_i) (1 - ŷ_i)^(1 - y_i)

Log Likelihood:  Σ_i [ y_i log ŷ_i + (1 - y_i) log(1 - ŷ_i) ]

Log Loss:  -(1/N) Σ_i [ y_i log ŷ_i + (1 - y_i) log(1 - ŷ_i) ]

Cross-Entropy Cost (a “multiclass” log loss):  -(1/N) Σ_i Σ_c y_{i,c} log ŷ_{i,c}

In vector algebra form: - mean( sum( y*log(y_pred) ) )
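
For concreteness, the "vector algebra form" above computed on toy numbers (y one-hot, y_pred predicted probabilities):

```python
# The "multiclass" log loss (cross-entropy) in the vector-algebra form above:
#   cost = -mean( sum( y * log(y_pred) ) )
import numpy as np

y = np.array([[0, 1, 0],              # one-hot true labels (toy data)
              [1, 0, 0]])
y_pred = np.array([[0.2, 0.7, 0.1],   # predicted probabilities
                   [0.6, 0.3, 0.1]])

cost = -np.mean(np.sum(y * np.log(y_pred), axis=1))
print(cost)   # ~0.434
```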


Word2Vec captures analogies (kind of)

(Jurafsky, 2017)
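
A hedged sketch of probing analogies with gensim (the package mentioned later in these slides); it assumes internet access for gensim's downloader and uses the pretrained 'glove-wiki-gigaword-100' vectors, which are not necessarily the embeddings behind the lecture figure.

```python
# Sketch: analogy queries with gensim word vectors.
# Assumes internet access for gensim's downloader; the pretrained
# 'glove-wiki-gigaword-100' vectors are an assumption, not the lecture's model.
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-100")

# vector('king') - vector('man') + vector('woman') ~= ?
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```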
Word2Vec: Quantitative Evaluations
1. Compare to manually annotated pairs of words: WordSim-353
(Finkelstein et al., 2002)

2. Compare to words in context (Huang et al., 2012)

3. Answer TOEFL synonym questions.
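
For evaluation 1, a minimal sketch of the usual protocol: correlate human similarity ratings with model cosine similarities using Spearman's ρ. The word pairs and ratings below are toy stand-ins rather than the actual WordSim-353 data, and the pretrained vectors are an assumption.

```python
# Sketch of evaluation 1: Spearman correlation between human similarity
# ratings and model cosine similarities. Pairs/ratings are toy values,
# not the real WordSim-353 data.
import gensim.downloader as api
from scipy.stats import spearmanr

wv = api.load("glove-wiki-gigaword-100")   # same assumed vectors as above

toy_pairs = [("tiger", "cat", 7.35), ("book", "paper", 7.46),
             ("king", "cabbage", 0.23), ("computer", "keyboard", 7.62)]

human = [r for _, _, r in toy_pairs]
model = [wv.similarity(w1, w2) for w1, w2, _ in toy_pairs]

rho, _ = spearmanr(human, model)
print(round(rho, 3))
```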


What have we learned since word2vec? (a lot, but here are 2 important points)

1. Improved loss function: GloVe embeddings (Pennington et al., 2014)

2. Word2vec itself performs very similarly to PCA on a co-occurrence matrix
("LSA", Deerwester et al., 1988 -- a much, much older technique!).
Topic Modeling

(Doig, 2014)
Topic Modeling
Topic: A group of highly related words and phrases. (aka "semantic field")

Example: Topic 1, Topic 2, …, Topic 50 from WTC responder interviews (Son et al., 2021)
Topic Modeling
Topic: A group of highly related words and phrases. (aka "semantic field")

doc 1, doc 2, doc 3, …, doc 38,692  →  extract words or phrases  →  Topic Modeling  →  Topic 1, Topic 2, …, Topic 50
Select Example Topics
Generating Topics from Documents

● Latent Dirichlet Allocation -- a Bayesian probabilistic model whereby words which appear in similar contexts (i.e. in essays that have similar sets of words) will be clustered into a prespecified number of topics.

● Rule of thumb:
● Each document receives a score per topic -- a probability: p(topic|doc).

               Doc 1    Doc 2    Doc 3
  topic 1:      .05      .03      .04
  topic 2:      .02      .01      .03
  topic 3:      .01      .03      .03
  …              …        …        …
  topic 100:    .07      .05      .06
Latent Dirichlet Allocation
(Blei et al., 2003)

● LDA specifies a Bayesian probabilistic model whereby
  ○ documents are viewed as a distribution of topics,
  ○ topics are a distribution of words.

Observed:
  W -- observed word in document m
Inferred:
  θ -- topic distribution for document m
  Z -- topic for word n in document m
  𝛗 -- word distribution for topic k
Priors:
  α -- hyperparameter for Dirichlet prior on the topics per document
  β -- hyperparameter for Dirichlet prior on the words per topic
  K -- number of topics
Latent Dirichlet Allocation
(Blei et al., 2003)

● LDA specifies a Bayesian probabilistic model whereby documents are viewed as a distribution of topics, and topics are a distribution of words.

● How to estimate (i.e. fit) the model parameters given data and priors? Common choices:
  ○ Gibbs sampling (best)
  ○ variational Bayesian inference (fastest)

● Key output: the "posterior" 𝛗 = p(word | topic), the probability of a word given a topic.
  From this and p(topic), we can get: p(topic | word)
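
A tiny worked sketch of the last step, p(topic | word) from 𝛗 = p(word | topic) and p(topic) via Bayes' rule; the 2-topic, 3-word numbers are invented.

```python
# Sketch: p(topic | word) from phi = p(word | topic) and p(topic), via Bayes:
#   p(topic | word) = p(word | topic) * p(topic) / p(word)
# The numbers below are invented for a 2-topic, 3-word toy vocabulary.
import numpy as np

vocab = ["nurse", "doctor", "rally"]
phi = np.array([[0.5, 0.4, 0.1],     # topic 0: p(word | topic 0)
                [0.1, 0.1, 0.8]])    # topic 1: p(word | topic 1)
p_topic = np.array([0.6, 0.4])       # p(topic)

p_word = p_topic @ phi                                  # marginal p(word)
p_topic_given_word = (phi * p_topic[:, None]) / p_word  # Bayes' rule

print(dict(zip(vocab, p_topic_given_word.T.round(3))))
```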
Example
Most prevalent words for 4 topics are listed at
the top and words associated with them from
a Yelp review are colored accordingly below.

Ranard, B.L., Werner, R.M., Antanavicius, T., Schwartz, H.A., Smith, R.J.,
Meisel, Z.F., Asch, D.A., Ungar, L.H. & Merchant, R.M. (2016). Yelp Reviews
Of Hospital Care Can Supplement And Inform Traditional Surveys Of The
Patient Experience Of Care. Health Affairs, 35(4), 697-705.
Topic Modeling Packages
Most reliable: Mallet (Java; uses Gibbs sampling); pymallet (slower than Mallet but high-quality results)

Ease of use: Gensim (Python; uses variational inference; implements word2vec as well)
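
A minimal gensim sketch of the "ease of use" route; the toy documents, number of topics, and passes are illustrative, and real use needs many more documents plus preprocessing.

```python
# Minimal gensim LDA sketch (variational inference). Toy documents; real use
# needs many more documents, preprocessing, and a tuned number of topics.
from gensim import corpora
from gensim.models import LdaModel

docs = [
    ["nurse", "hospital", "doctor", "care", "patient"],
    ["doctor", "hospital", "er", "wait", "patient"],
    ["rally", "union", "strike", "workers", "wages"],
    ["strike", "workers", "rally", "picket", "union"],
]

dictionary = corpora.Dictionary(docs)              # word <-> id mapping
corpus = [dictionary.doc2bow(d) for d in docs]     # bag-of-words per doc

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2,
               passes=10, random_state=0)

print(lda.print_topics())                          # phi: p(word | topic)
print(lda.get_document_topics(corpus[0]))          # p(topic | doc)
```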
Topic Modeling
Common applications:

● Open vocabulary content analysis: describing the latent semantic categories of words or phrases present across a set of documents.

● Embeddings for a predictive task: for all topics, use p(topic|document) as a score; feed to a predictive model (e.g. a classifier).
Dimensionality reduction
PCA-Based Embeddings -- try to represent the data with only p’ dimensions
(also known as "Latent Semantic Analysis")

Supplement: SVD implementation details are not within scope, but the concept of using PCA on a word co-occurrence matrix was covered.
Dimensionality reduction
PCA-Based Embeddings -- try to represent the data with only p’ dimensions (p’ < p)
(also known as "Latent Semantic Analysis")

Original matrix: target words are observations (rows w1, w2, w3, …, wn); context words are features (columns w1, w2, w3, w4, …, wp); co-occurrence counts are the cells.

Reduced matrix: same rows (w1 … wn), but only p’ columns (c1, c2, c3, c4, …, cp’).
Concept: Dimensionality Reduction in 3-D, 2-D, and 1-D

(P = 2 → P’ = 1;  P = 3 → P’ = 2)

Data (or, at least, what we want from the data) may be accurately represented with fewer dimensions.
Concept: Dimensionality Reduction
Rank: Number of linearly independent columns of A.
(i.e. columns that can’t be derived from the other columns through addition).

Q: How many columns do we really need?

A = [ 1  -2   3
      2  -3   5
      1   1   0 ]

A: 2. The 1st column is just the sum of the other two columns … we can represent A as a linear combination of 2 vectors:

    [ 1  -2
      2  -3
      1   1 ]
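
A quick numerical check of the example above (a sketch using numpy's numerical rank):

```python
# Numerical check of the rank example: column 1 equals column 2 + column 3,
# so A has rank 2.
import numpy as np

A = np.array([[1, -2, 3],
              [2, -3, 5],
              [1,  1, 0]])

print(np.linalg.matrix_rank(A))                   # 2
print(np.allclose(A[:, 0], A[:, 1] + A[:, 2]))    # True
```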
Dimensionality reduction
SVD-Based Embeddings -- try to represent the data with only p’ dimensions

Original matrix: target words are observations (rows o1, o2, o3, …, on); context words are features (columns f1, f2, f3, f4, …, fp); co-occurrence counts are the cells.

Reduced matrix: same rows, but only p’ columns (c1, c2, c3, c4, …, cp’).
Dimensionality Reduction - PCA
Linear approximation of the data in r dimensions.

Found via Singular Value Decomposition:

X[n×p] ≅ U[n×r] D[r×r] V[p×r]ᵀ

X: original matrix, U: “left singular vectors”,
D: “singular values” (diagonal), V: “right singular vectors”
Dimensionality Reduction - PCA - Example

X[n×p] ≅ U[n×r] D[r×r] V[p×r]ᵀ

Word co-occurrence counts, plotted in two dimensions: the horizontal axis is a target word’s co-occurrence count with “hit”; the vertical axis is its co-occurrence count with “nail”.

Observation “beam”:  count(beam, hit) = 100 (horizontal dimension),  count(beam, nail) = 80 (vertical dimension).
Dimensionality Reduction - PCA

Projection (the dimensionality-reduced space) in 3 dimensions:

U[n×3] D[3×3] V[p×3]ᵀ
Dimensionality Reduction - PCA

To check how well the original matrix can be reproduced:

Z[n×p] = U D Vᵀ  -- how does Z compare to the original X?
Dimensionality Reduction - PCA

The loss function that SVD solves: minimize ‖X − U D Vᵀ‖²  (the total squared difference between the reconstruction Z = U D Vᵀ and the original X).
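
A numpy sketch tying the PCA/SVD slides together on an invented co-occurrence matrix: factor X, keep r singular values, use the rows of U·D as dense embeddings, and measure how well Z = U D Vᵀ reproduces X.

```python
# Sketch: truncated SVD of a toy word-by-context co-occurrence matrix X.
# Rows of U * D serve as dense embeddings; Z = U D V^T approximates X.
import numpy as np

words = ["beam", "nail", "wall", "wine"]
contexts = ["hit", "nail", "behind", "drink"]
X = np.array([[100., 80., 40.,  0.],      # invented co-occurrence counts
              [ 90., 10., 30.,  0.],
              [ 40., 20., 60.,  1.],
              [  2.,  0.,  1., 50.]])

U, d, Vt = np.linalg.svd(X, full_matrices=False)

r = 2                                     # keep r dimensions (r < p)
U_r, d_r, Vt_r = U[:, :r], d[:r], Vt[:r, :]

embeddings = U_r * d_r                    # one r-dim dense vector per word
Z = U_r @ np.diag(d_r) @ Vt_r             # rank-r reconstruction of X

print(dict(zip(words, embeddings.round(2))))
print("reconstruction error:", round(np.linalg.norm(X - Z), 3))
```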
Dimensionality Reduction - PCA
Linear approximation of the data in r dimensions.

Found via Singular Value Decomposition:

X[n×p] ≅ U[n×r] D[r×r] V[p×r]ᵀ

U, D, and V are unique.
D: always positive (the singular values on its diagonal).