
Distributed Word Representation

Md Shad Akhtar
[email protected]

A few key terms
● Neurons
● Layers
○ Input, Output and Hidden
● Activation functions
○ Sigmoid, Tanh, ReLU
● Softmax
● Weight matrices
○ Input → Hidden, Hidden → Output
● Backpropagation
○ Optimizers
■ Gradient Descent (GD), Stochastic Gradient Descent (SGD), etc.
○ Error (Loss) functions
■ Mean-Squared Error, Cross-Entropy etc.
○ Gradient of error
○ Passes: Forward pass and Backward pass (see the sketch below)
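A minimal NumPy sketch (my illustration, not from the slides) that ties these terms together: a one-hidden-layer network with a tanh activation, a softmax output, a cross-entropy loss, and a plain gradient-descent update over forward and backward passes. All data and sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 4 samples, 3 input features, 2 classes (one-hot targets).
X = rng.normal(size=(4, 3))
Y = np.eye(2)[[0, 1, 1, 0]]

# Weight matrices: Input -> Hidden and Hidden -> Output.
W1 = rng.normal(scale=0.1, size=(3, 5))
W2 = rng.normal(scale=0.1, size=(5, 2))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

for step in range(100):
    # Forward pass: input layer -> hidden layer (tanh) -> output layer (softmax).
    H = np.tanh(X @ W1)
    P = softmax(H @ W2)

    # Cross-entropy loss.
    loss = -np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))

    # Backward pass: gradients of the loss w.r.t. both weight matrices.
    dZ2 = (P - Y) / len(X)       # gradient at the output pre-activation
    dW2 = H.T @ dZ2
    dH = dZ2 @ W2.T
    dZ1 = dH * (1 - H ** 2)      # tanh derivative
    dW1 = X.T @ dZ1

    # Gradient Descent update.
    W1 -= 0.5 * dW1
    W2 -= 0.5 * dW2
```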

Neural Language Model and Distributed Word Representation

Input to the neural models
● Image
○ Pixel values

● Speech
○ Acoustic features, e.g., MFCCs, extracted with standard tools

● Text
○ Word embeddings
■ Cat [1, 0, 0]
■ Dog [0, 1, 0]
■ Lamp [0, 0, 1]

Word Embeddings
• Word Embedding, Word Vector, or Word Representation
• A vector/numeric representation of a word.

• Why do we need it?


• Some techniques do not operate on text
• SVM, Neural Network, etc.

• Objective
• Semantically-rich word representations
• Preserve the meaning of a word (in context), e.g.,
• e_good should convey the sense of a pleasant scenario
• e_good should lie at the opposite end of the spectrum, far away from unpleasant words such as e_bad.

Types of embeddings
– Local Representation
• One-hot encoding, e.g., for Cat, Dog, Man, Table
– Sparse
– No semantics
– Curse of Dimensionality
Types of embeddings
– Distributed Representation
• Co-occurrence vector
– Partially dense
– Low degree of semantics
– Curse of Dimensionality

Example corpus:
• I like deep learning.
• I like NLP.
• I enjoy flying.

Co-occurrence counts (context window of 1):

counts     I  like  enjoy  deep  learning  NLP  flying  .
I          0   2     1      0      0        0     0     0
like       2   0     0      1      0        1     0     0
enjoy      1   0     0      0      0        0     1     0
deep       0   1     0      0      1        0     0     0
learning   0   0     0      1      0        0     0     1
NLP        0   1     0      0      0        0     0     1
flying     0   0     1      0      0        0     0     1
.          0   0     0      0      1        1     1     0

e_like  = [2, 0, 0, 1, 0, 1, 0, 0]
e_enjoy = [1, 0, 0, 0, 0, 0, 1, 0]
Cosine(e_like, e_enjoy) != 0
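A small Python sketch (illustrative, not part of the slides) that builds this co-occurrence matrix for the three sentences and shows that e_like and e_enjoy have a non-zero cosine similarity:

```python
import numpy as np

corpus = ["I like deep learning .", "I like NLP .", "I enjoy flying ."]
vocab = ["I", "like", "enjoy", "deep", "learning", "NLP", "flying", "."]
idx = {w: i for i, w in enumerate(vocab)}

# Count co-occurrences of immediate neighbours (context window of 1).
counts = np.zeros((len(vocab), len(vocab)), dtype=int)
for sent in corpus:
    tokens = sent.split()
    for i, w in enumerate(tokens):
        for j in (i - 1, i + 1):
            if 0 <= j < len(tokens):
                counts[idx[w], idx[tokens[j]]] += 1

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

e_like, e_enjoy = counts[idx["like"]], counts[idx["enjoy"]]
print(e_like)                    # [2 0 0 1 0 1 0 0]
print(cosine(e_like, e_enjoy))   # non-zero: "like" and "enjoy" share the context "I"
```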
Types of embeddings
– Distributed Representation
• Word Embeddings
– Dense
– Semantically rich
– Dimension is not a function of the vocabulary size.

e_cat   = [1.5, 0.5, 3.2, 5.7]
e_table = [4.3, 1.7, 2.5, 1.9]
e_man   = [0.3, 5.6, 1.0, 3.9]

These representations are good (efficient) iff y < z < x.

*Matrix has hypothetical values
https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf

Probabilistic Language Modeling - [Bengio et al., 2003]


● Input layer
○ n previous words
● Projection layer (no non-linearity)
○ Projection matrix C of size |V| x m
● Hidden layer (tanh)
● Output layer
● Skip-connection (optional)

x = [C(w_{t-1}), C(w_{t-2}), ..., C(w_{t-n+1})]

y = softmax(b + Wx + U tanh(Hx + d))

Example: input context "Clouds in the" → predicted word "sky"
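A rough NumPy sketch of this forward pass, y = softmax(b + Wx + U tanh(Hx + d)); the sizes and random parameters are illustrative assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

V, m, n_ctx, h = 10, 4, 3, 8      # vocab size, embedding dim, #context words, hidden size
C = rng.normal(size=(V, m))       # projection matrix C of size |V| x m

# Parameters of y = softmax(b + Wx + U tanh(Hx + d))
H = rng.normal(size=(h, n_ctx * m))
d = np.zeros(h)
U = rng.normal(size=(V, h))
W = rng.normal(size=(V, n_ctx * m))   # optional skip-connection from x to the output
b = np.zeros(V)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def next_word_probs(context_ids):
    # x = [C(w_{t-1}), C(w_{t-2}), ...]: concatenation of the context word embeddings
    x = np.concatenate([C[i] for i in context_ids])
    return softmax(b + W @ x + U @ np.tanh(H @ x + d))

probs = next_word_probs([3, 7, 1])    # e.g., the ids of "Clouds", "in", "the"
print(probs.argmax())                 # index of the predicted next word (e.g., "sky")
```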
https://arxiv.org/pdf/1301.3781.pdf

Efficient Word Representation - [Mikolov et al., 2013a]


● Word2Vec (Mikolov et al.) inherited the idea from the neural language model and used it to capture word semantics.
○ Used previous as well as future context
○ Removed the hidden layer
○ Used the shared projection layer

• Offers remarkable performance

• Foundation stone of DL-based NLP

https://arxiv.org/pdf/1301.3781.pdf

Efficient Word Representation - [Mikolov et al., 2013a]

[Figure: the learned vector space captures analogies — the same "royal vs. ordinary" and gender directions relate king↔queen and man↔woman (e.g., king − man + woman ≈ queen).]
“Linguistics is the eye, computation is the body”: Word Embeddings

● “A word is known by the company it keeps” - [Firth, 1950s]

● Words with similar distributional properties have similar meanings. - [Harris, 1970s]

● Model differences/similarities in meaning rather than the proper meaning itself.


○ What is the meaning of the word “cat”?

• A word can be defined by the properties it possesses


– Cat: {mew, animal, mice, furry, purr, carnivore, pet, milk, curious}
– Dog: {bark, animal, bone, police, faithful, pet, carnivore, milk}
– Lamp: {candle, light, flash, stand, shade, bulb}

Distributional Properties

Cat  {mew, animal, mice, furry, purr, carnivore, pet, milk, curious}
  ≈
Dog  {bark, animal, bone, police, faithful, pet, carnivore, milk}
  ≠
Lamp {candle, light, flash, stand, shade, bulb}
https://arxiv.org/pdf/1301.3781.pdf

Efficient Word Representation - [Mikolov et al., 2013a]

[Figure: Continuous Bag-of-Words (CBOW) — the context words pass through a shared linear projection, a softmax over the vocabulary predicts the centre word, and the loss is computed on that prediction.]
https://arxiv.org/pdf/1301.3781.pdf

Efficient Word Representation - [Mikolov et al., 2013a]

[Figure: Continuous Skip-gram — the centre word passes through a linear projection, a softmax over the vocabulary predicts each surrounding context word, and the loss is computed on these predictions.]
Word2Vec: Preprocessing
Sentence: The domestic cat likes milk.

Context window size |C| = 2

For the target word "cat", the context is the previous |C| tokens ("The domestic") and the next |C| tokens ("likes milk").

One-Hot encoding of the vocabulary (size |V|) extracted from the corpus:
Word-1  [1 0 0 0 … 0]
Word-2  [0 1 0 0 … 0]
...
Word-N  [0 0 0 0 … 1]
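A short illustrative sketch (not the slides' code) of this preprocessing step: sliding a window of size |C| = 2 over the sentence to produce (target, context) pairs and one-hot vectors:

```python
tokens = "The domestic cat likes milk .".split()
vocab = sorted(set(tokens))
one_hot = {w: [1 if i == j else 0 for j in range(len(vocab))]
           for i, w in enumerate(vocab)}

C = 2  # context window size
pairs = []
for t, target in enumerate(tokens):
    for j in range(t - C, t + C + 1):
        if j != t and 0 <= j < len(tokens):
            pairs.append((target, tokens[j]))

# e.g., for target "cat": ("cat", "The"), ("cat", "domestic"), ("cat", "likes"), ("cat", "milk")
print([p for p in pairs if p[0] == "cat"])
print(one_hot["cat"])   # the one-hot input vector for "cat"
```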
https://arxiv.org/pdf/1301.3781.pdf

Efficient Word Representation - [Mikolov et al., 2013a]

One-hot vectors for the sentence (|V| = 6):
The       [1 0 0 0 0 0]
Domestic  [0 1 0 0 0 0]
Likes     [0 0 1 0 0 0]
Cat       [0 0 0 0 1 0]
Milk      [0 0 0 0 0 1]

[Figure: the one-hot vectors are wired through the linear projection and softmax layers of the word2vec architecture, and the loss is computed on the softmax output.]
https://arxiv.org/pdf/1301.3781.pdf

Efficient Word Representation - [Mikolov et al., 2013a]

W ∈ R^{|V| × |h|}

Input (target) word: Cat, with one-hot vector [0 0 0 0 1 0]

Projection: h = W^T · [0, 0, 0, 0, 1, 0]
(i.e., h is simply the row of W that corresponds to "Cat")

[Figure: h then passes through the linear and softmax layers to predict the surrounding words The, Domestic, Likes, and Milk, on which the loss is computed.]
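A quick numerical illustration (toy sizes, assumed values) of why this projection is cheap: multiplying W^T by a one-hot vector simply selects the corresponding row of W, i.e., an embedding lookup:

```python
import numpy as np

V, h = 6, 4                                        # vocabulary size and embedding dimension
W = np.arange(V * h, dtype=float).reshape(V, h)    # W ∈ R^{|V| × |h|}

x_cat = np.array([0, 0, 0, 0, 1, 0], dtype=float)  # one-hot for "Cat" (index 4)
hidden = W.T @ x_cat

assert np.allclose(hidden, W[4])   # identical to directly reading row 4 of W
```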
Objective function

Sentence: The domestic cat likes milk.
Context window size |C| = 2

J = − (1/T) Σ_{t=1..T} Σ_{−c ≤ j ≤ c, j ≠ 0} log p(w_{t+j} | w_t)

If w_t = cat, then
w_{t+j} = w_{t−2} = 'The',      for j = −2
        = w_{t−1} = 'domestic', for j = −1
        = w_{t+1} = 'likes',    for j = +1
        = w_{t+2} = 'milk',     for j = +2

Inner sum for w_t = cat:
log p(the | cat) + log p(domestic | cat) + log p(likes | cat) + log p(milk | cat)

p(milk | cat) = exp(v_milk · v_cat) / Σ_{i∈V} exp(v_i · v_cat)
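A toy numeric sketch (random, untrained vectors) of the softmax probability p(c | w) and of the loss contribution for the target word "cat":

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "domestic", "cat", "likes", "milk", "."]
dim = 4
V = {w: rng.normal(size=dim) for w in vocab}   # one vector per word

def p(context, word):
    # p(c | w) = exp(v_c · v_w) / Σ_i exp(v_i · v_w)
    scores = np.array([V[i] @ V[word] for i in vocab])
    scores -= scores.max()                     # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[vocab.index(context)]

context_of_cat = ["the", "domestic", "likes", "milk"]
loss_cat = -sum(np.log(p(c, "cat")) for c in context_of_cat)
print(loss_cat)   # − Σ_j log p(w_{t+j} | cat)
```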
Objective function

J = − (1/T) Σ_{t=1..T} Σ_{−c ≤ j ≤ c, j ≠ 0} log p(w_{t+j} | w_t)

p(w_{t+j} | w_t) = p(c | w) = exp(v_c · v_w) / Σ_{i∈V} exp(v_i · v_w)

Gradient with respect to v_w:

∂ log p(c | w) / ∂v_w
  = ∂/∂v_w [ log exp(v_c · v_w) − log Σ_{i∈V} exp(v_i · v_w) ]
  = ∂(v_c · v_w)/∂v_w − ∂/∂v_w log Σ_{i∈V} exp(v_i · v_w)
  = v_c − (1 / Σ_{i∈V} exp(v_i · v_w)) · Σ_{l∈V} exp(v_l · v_w) · ∂(v_l · v_w)/∂v_w
  = v_c − Σ_{l∈V} ( exp(v_l · v_w) / Σ_{i∈V} exp(v_i · v_w) ) · v_l
  = v_c − Σ_{l∈V} p(l | w) · v_l

The denominator (a sum over the entire vocabulary) is a costly operation.
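A quick finite-difference check (toy random vectors; v_w treated as the only variable, the other vectors held fixed) that the derived gradient v_c − Σ_l p(l | w) · v_l is correct:

```python
import numpy as np

rng = np.random.default_rng(1)
Vmat = rng.normal(size=(6, 4))   # 6 word vectors of dimension 4
c, w = 1, 2                      # indices of a context word and a target word

def log_p(vw):
    # log p(c | w) with the other vectors held fixed
    scores = Vmat @ vw
    return scores[c] - np.log(np.exp(scores).sum())

# Analytical gradient from the derivation above: v_c − Σ_l p(l | w) · v_l
probs = np.exp(Vmat @ Vmat[w])
probs /= probs.sum()
analytic = Vmat[c] - probs @ Vmat

# Central finite-difference gradient.
eps, numeric = 1e-6, np.zeros(4)
for k in range(4):
    e = np.zeros(4); e[k] = eps
    numeric[k] = (log_p(Vmat[w] + e) - log_p(Vmat[w] - e)) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-5)
```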
https://arxiv.org/pdf/1310.4546.pdf

Negative Sampling - [Mikolov et al., 2013b]

• The first version of word2vec had a computational issue

– One of the operations (the softmax denominator) in the loss computation was expensive.
– A word has a closed set of related/contextual words, usually small in number.
– A cat is related to mew, animal, mice, furry, purr, carnivore, pet, milk, curious, etc.
– On the other hand, a word has a huge number of unrelated/non-contextual words.
– A cat is unrelated to every word except the ones it is related to.

• Negative sampling
– To reduce the complexity, the unrelatedness of a word is computed against only a small, fixed set of sampled words.

https://arxiv.org/pdf/1310.4546.pdf

Negative Sampling - [Mikolov et al., 2013b]

Negative-sampling objective (to be maximised over the observed pairs D and the sampled noise pairs P_n):

  Σ_{(c,w)∈D} log P(D=1 | c, w) + Σ_{(i,w)∈P_n} log P(D=0 | i, w)

= Σ_{(c,w)∈D} log P(D=1 | c, w) + Σ_{(i,w)∈P_n} log (1 − P(D=1 | i, w))

= Σ_{(c,w)∈D} log σ(v_c · v_w) + Σ_{(i,w)∈P_n} log (1 − σ(v_i · v_w))

= Σ_{(c,w)∈D} log σ(v_c · v_w) + Σ_{(i,w)∈P_n} log σ(−v_i · v_w)
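An illustrative sketch (assumed toy vectors, k = 5 negative samples) of this objective for a single (word, context) pair, showing that only k + 1 dot products are needed instead of a sum over |V|:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

dim, k = 4, 5                          # embedding size, number of negative samples
v_w = rng.normal(size=dim)             # target word, e.g., "cat"
v_c = rng.normal(size=dim)             # one true context word, e.g., "milk"
v_neg = rng.normal(size=(k, dim))      # k sampled unrelated words

# Positive term pulls v_c and v_w together; negative terms push the samples away.
objective = np.log(sigmoid(v_c @ v_w)) + np.sum(np.log(sigmoid(-v_neg @ v_w)))
print(objective)   # to be maximised; only k + 1 dot products instead of |V|
```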
https://arxiv.org/pdf/1310.4546.pdf

Subsampling - [Mikolov et al., 2013b]


• The word2vec model learns semantics through co-occurrence.
• Cooccurrence("England", "London") is more informative than cooccurrence("England", "The").
• Frequent words ('a', 'an', 'the', etc.) usually provide less information value than rare words.

• Each word w_i in the training set is discarded with probability

P(w_i) = 1 − sqrt( t / f(w_i) )

where f(w_i) is the frequency of word w_i and t is a chosen threshold (around 10^-5).
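A toy sketch of the subsampling rule (the counts and the threshold t are invented for illustration; on large corpora t is typically around 10^-5):

```python
import math, random

counts = {"the": 50_000, "england": 120, "london": 90}
total = sum(counts.values())
t = 1e-3   # threshold for this tiny toy corpus

def discard_prob(word):
    f = counts[word] / total              # relative frequency f(w)
    return max(0.0, 1.0 - math.sqrt(t / f))

for w in counts:
    print(w, round(discard_prob(w), 3))   # "the" is discarded far more often

def keep(word):
    # Randomly drop frequent words according to P(w) = 1 - sqrt(t / f(w)).
    return random.random() >= discard_prob(word)
```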
Word Embedding Models
• Non-contextual
◦ Word2vec [Mikolov et al., 2013] - https://code.google.com/archive/p/word2vec/
▪ Two variants
▪ Skip-gram
▪ Continuous Bag-of-Words (CBOW)

◦ GloVe [Pennington et al., 2014] - https://nlp.stanford.edu/projects/glove/


▪ Co-occurrence matrix

◦ FastText [Bojanowski et al., 2016] - https://fasttext.cc/docs/en/unsupervised-tutorial.html


▪ Similar to word2vec
▪ Works at the sub-word level

• Contextual: The representation for each word depends on the context in which it is used.
◦ Embeddings from Language Models (ELMo) [Peters et al., 2018] - https://allennlp.org/elmo
◦ OpenAI’s Generative Pre-trained Transformer (GPT) [Radford et al., 2018] - https://openai.com/blog/better-language-models/
◦ Bidirectional Encoder Representations from Transformers (BERT) [Devlin et al., 2018] - https://huggingface.co/transformers/
◦ Many more based on BERT…

Gensim “word2vec” package.
● Explore https://radimrehurek.com/gensim/models/word2vec.html

● A toy example:
https://colab.research.google.com/drive/1CixWE8AR-tckgfV5fAzO9KqOqUwQBSIC#scrollTo=vR0upm_rUEF5
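A minimal usage sketch with the Gensim package (parameter names as in gensim 4.x; the toy corpus is invented):

```python
from gensim.models import Word2Vec

sentences = [
    ["the", "domestic", "cat", "likes", "milk"],
    ["the", "dog", "likes", "bones"],
    ["i", "like", "deep", "learning"],
]

model = Word2Vec(
    sentences=sentences,
    vector_size=50,   # embedding dimension
    window=2,         # context window size |C|
    min_count=1,      # keep every word in this tiny corpus
    sg=1,             # 1 = skip-gram, 0 = CBOW
    negative=5,       # number of negative samples
)

print(model.wv["cat"])                 # the learned vector for "cat"
print(model.wv.most_similar("cat"))    # nearest neighbours by cosine similarity
```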

Thanks

