Constructing dataset based_on_concept_hierarchy_for_evaluating_word_vectors_learned_from_multisense_words

Constructing Dataset Based on
Concept Hierarchy for Evaluating
Word Vectors Learned from
Multisense Words
2019/08/27
Aoyama Gakuin University, JPN
1

Background
Word Embedding
A technique that represents a word as a vector
◆ Optimizes vectors to distinguish words’ meanings from
each other
◆ Learns vectors from parts of sentences in which individual
words appear
2
Interesting features such as enabling analogy
− + =
e.g.) king − man + woman = queen
contexts

Existing Techiniques
◆ Assign a single (word) vector to each word
e.g.) word2vec [1]
3
◆ Assign multiple (sense) vectors to each word
To prevent individual meanings from being mixed in a
single vector
e.g.) sense2vec [2], MSSG [3], and DeConf [4]
Learning models

Leaning Sense Vectors
4
◆Assigning multiple vectors to each word
◆Learning multiple vectors from various information
➢ Part-of-Speech (PoS) base: sense2vec [2]
➢ Clustering base: MSSG [3]
➢ WordNet’s synset base: DeConf [4]
sense vector
king (ruler) 0.5 0.1 ⋯
king (businessman) 0.2 0.3 ⋯
word vector
king 0.4 0.2 ⋯
word (single) vector sense (multiple) vector

Datasets for evaluation
◆ Evaluate how well word vectors are learned
◆ Assign similarity scores to pairs of words
Recent Activities
◆ Assign a single (word) vector to each word
e.g.) word2vec [1]
5
◆ Assign multiple (sense) vectors to each word
To prevent individual meanings from being mixed in a
single vector
e.g.) sense2vec [2], MSSG [3], and DeConf [4]
Learning models

Word Similarity Dataset
6
word 1 word 2 similarity score [0, 10]
king princess 3.27
accomplish achieve 8.57
accomplish win 7.85
SimLex-999 [5]
◆ Contains 999 pairs of words with similarity scores
◆ Evaluates the rank correlation between 2 ordered lists
of word pairs:
similarity scores v.s. similarities between learned vectors
averaged scores given by annotators

Motivation
Existing datasets
Evaluate similarity between words
7
Can existing datasets properly evaluate multiple vectors?
word-based evaluation

Word-based Evaluation
8
king (ruler)
king (businessman)
queen (ruler)
queen (musician)
◆Using some summarized value such as average or max
◆Assigns a single similarity score to a pair of words
king queen e.g. 7.00
to a single value
summarizing multiple values

Motivation
Existing datasets
Evaluate similarity between words
9
Can existing datasets properly evaluate multiple vectors?
word-based evaluation
Problem
king (ruler)
king (businessman)
queen (ruler)
queen (musician)
Used in same contexts

Purpose
10
◆Compares only pairs of vectors used in the same context
◆Collecting related words used in the same context
sense
king (ruler)
queen (ruler)
crowned_head
musician, businessman
evaluate
not evaluate
Same contexts
Different contexts
We propose the sense-based evaluation
: related words : unrelated words

Our Approach
11
Collect related words from concept hierarchies
Constructing dataset and proposing an evaluation metric
king (ruler)
queen (ruler)
crowned_head
hypernyms in
WordNet & BabelNet
musician, businessman
evaluate
not evaluate
➢ node: synset (a set of synonyms)
➢ link: hyernym-hyponym (parent-child) relationship
: related words : unrelated words

Overview of Dataset
12
word PoS synonyms hypernyms-1 …
hectare Noun ha, hm2, … metric, area_unit, … …
liter
Noun microlitre, … metric_capacity_unit, … …
Noun litér, … village, hamlet, … …
accomplish
Verb fulfil, action, … effect, complete, … …
Verb achieve,attain, … win, succeed, … …
single
multiple
meanings
sub-records
Records are pairs of a word and sub-records
◆ Each word has sub-records corresponds to the sense
◆ Sub-records are composed of
PoS tag (Noun or Verb), synonyms (synset), and 3 hypernyms

How to Get Related Words
13
hierarchy of liter (unit) in WordNet
Collect related words from concept hierarchies
word synonyms hypernyms-1 …
liter cubic_decimiter, litre, … metric_capacity_unit, … …
{𝐥𝐢𝐭𝐞𝐫Noun
1
, cubic_decimiterNoun
1
, litreNoun
1
, … }
{metric_capacity_unit 𝑁𝑜𝑢𝑛
1
, … }
synonyms (synset)
hypernyms-1
is-a
is-a
related words of liter (unit)

WordNet v.s. BabelNet
WordNet (comprehension:△) [6]
Famous concept hierarchy maintained manually
BabelNet (comprehension:〇) [7]
Semi-automatically created by using WordNet and Wikipedia
14
Combine WordNet and BabelNet to collect meanings
liter ➢ a metric unit of capacity
➢ a village in Hungary
WordNet
BabelNet
e.g. ) BabelNet has more variety of synsets of words

Combining WordNet & BabelNet
15
liter
metric_
capacity_
unit
unit_of_
measure
metric_
capacity_
unit
is-a is-a
is-a
is-ais-a
is-a1𝐿
synonyms
megalitre,
microlitre, …
cubic_decimiter,
litre, l, …
word synonyms hypernyms-1 …
liter
cubic_decimiter, litre, l,
megalitre, microlitre, …
metric_capacity_unit,
unit_of_measure, …
…
synonyms
WordNet BabelNet (Wikipedia)
hypernyms hypernyms
Merge words to compose related words
WordNet
BabelNet

Removing Inappropriate Words
16
For properly evaluation, remove words
having inappropriate hypernym-hyponym relationships
play 𝑁𝑜𝑢𝑛
8 play 𝑁𝑜𝑢𝑛
14
diversion 𝑁𝑜𝑢𝑛
1
evaluate 𝑉𝑒𝑟𝑏
2
same wordevaluate 𝑉𝑒𝑟𝑏
1
① different sense appear in
different hierarchies
② different synsets of words
have the same synset
is-a
is-a is-a
same relation
Leave words that can be properly evaluated

Evaluation Metric
17
𝑠𝑐𝑜𝑟𝑒 =
1
|𝑊|
෍
𝑤∈𝑊
σ1≤𝑠≤𝑆 𝑤
max
1≤𝑑≤𝑆 𝑑 𝑤
𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛@𝑁(𝑁 𝑤 𝑠
, 𝑇𝑑)
max(𝑆 𝑤, 𝑆 𝑑 𝑤
)
➢ 𝑁 𝑤 𝑠
： set of 𝑁 neighboring words for a sense 𝑤𝑠 of word 𝑤
➢ 𝑇𝑑： union of sets of synonyms and hypernyms in dataset
12
3
4
1. Calculate 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 of neighbor words of learned vector for
each sense of the target word and select the sense that
achieves the maximum 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛
2. Aggregate maximum sense-based scores every sub-records
3. Regularize the influence from the number of sense vectors
4. Calculate scores of all words by repeating steps 1 to 3 for
all the words and obtain an average score as the final result

Evaluation Method
18
Evaluate multiple by 𝑷𝒓𝒆𝒄𝒊𝒔𝒊𝒐𝒏 and each related words
𝑠𝑐𝑜𝑟𝑒 =
1
|𝑊|
෍
𝑤∈𝑊
σ1≤𝑠≤𝑆 𝑤
max
1≤𝑑≤𝑆 𝑑 𝑤
𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛@𝑁(𝑁 𝑤 𝑠
, 𝑇𝑑)
max(𝑆 𝑤, 𝑆 𝑑 𝑤
)
➢ 𝑁 𝑤 𝑠
： set of 𝑁 neighbor words for a sense 𝑤𝑠 of word 𝑤
➢ 𝑇𝑑： union of sets of synonyms and hypernyms in dataset
word PoS synonyms hypernyms-1 …
accomplish
Verb carry_out, fulfil, … effect, … …
Verb achieve, attain, … win, … …
𝑇 𝑤: related words
𝑆 𝑑 𝑤
: number of
synsets of word
12
3
4

Case of “accomplish” vectors
19
0.4
0.2
final result
carry_out, fulfil
accomplish-1 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛@5sense-1
carry_out, fulfil,
execute, …
sense-2
achieve, attain,
reach, …achieve, by_luck
accomplish-2
: neighbor words : related words
1 Calculate 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 and select the sense
2 Aggregate
scores of “accomplish” and others
0.4 0.2
3
for regularization
÷ 𝑆 𝑤 𝑜𝑟 𝑆 𝑑 𝑤
・・・ 0.3 0.3 0.1
𝑆 𝑤 𝑜𝑟 𝑆 𝑑 𝑤・・・ 𝑆 𝑤 𝑜𝑟 𝑆 𝑑 𝑤
4 Average
learned vectors
dataset
÷ 𝑺 𝒘 𝒐𝒓 𝑺 𝒅 𝒘

Experiments
20
How they evaluate word vectors
We examined evaluation results from the following viewpoints:
◆ Influence of the number of neighbor words
◆ Influence of the existence of compound words
◆ Influence of the number of sense vectors
Investigate the validity of the proposed dataset and metric
How well they can handle multisense words
Comparing the proposed dataset with SimLex-999

Word Vectors to Evaluate
Word Vectors
We used 7 set of word vectors learned or pre-trained
21
model corpus
word2vec
wikinl
wikimulti
sense2vec
wikinl
wikimulti
model corpus
word2vec Google News [8]
DeConf Google News
MSSG Wikipedia [9]
Pre-trained vectorsLearned vectors
✓wikinl： without lemmatization
✓wikimulti： with lemmatization and multi word tokenization
e.g.）cubic decimiter ⇒ cubic_decimiter
[8] https://news.google.com

Influence of
theNumber of Neighbor Words
22
model corpus 𝑷𝒓𝒆𝒄𝒊𝒔𝒊𝒐𝒏 @𝑵 # of words
word2vec
wikinl
𝟎. 𝟏𝟖𝟐 1 1,000
0.085 5 1,000
0.058 10 1,000
0.011 100 1,000
wikimulti
𝟎. 𝟐𝟎𝟎 1 988
0.106 5 988
◆Increasing neighbor words 𝑁 decreases 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛
◆Related words are often appeared in near neighbors
Number of neigbor words is desirable to 𝟓 or 𝟏𝟎

Influence of
theExistence of Compound Words
23
model corpus 𝑷𝒓𝒆𝒄𝒊𝒔𝒊𝒐𝒏 @𝑵 # of words
word2vec
wikinl
𝟎. 𝟏𝟖𝟐 1 1,000
0.085 5 1,000
0.058 10 1,000
0.011 100 1,000
wikimulti
𝟎. 𝟐𝟎𝟎 1 988
0.106 5 988
Models learned from wikimulti improves 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛
Considering compound words is very important

Influence of
the Number of Sense Vectors
24
sense2vec
word2vec model learned in consideration of PoS tagging
If PoS tagging is perfect, sense2vec would outperform word2vec
model corpus @𝑵 𝑷𝒓𝒆𝒄𝒊𝒔𝒊𝒐𝒏 word2vec
sense2vec
wikinl
1 0.109 0.182
5 0.053 0.085
sense2vec*
1 𝟎. 𝟏𝟒𝟔 0.182
5 𝟎. 𝟎𝟕𝟒 0.085
use all PoS
use only
related PoS
When using sense2vec, accurate PoS tagging is important
sense2vec is worse than word2vec and sense2vec*
⇒ Errors of PoS tagging make another problems to learn multisense words

Comparison with SimLex-999
25
SimLex-999 Proposed
model corpus 𝑎𝑣𝑔𝑆𝑖𝑚 𝑚𝑎𝑥𝑆𝑖𝑚 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛@5
word2vec wikinl 𝟎.379 𝟎.375 0.098
MSSG Wikipedia 0.275 0.271 0.103
word2vec Google News 0.490 0.490 0.084
DeConf Google News 𝟎. 𝟓𝟑𝟗 𝟎. 𝟓𝟖𝟎 0.149
◆SimLex-999 sometimes evaluate vectors inappropriately
◆Proposed dataset tends to evaluate vectors appropriately
In SimLex-999 MSSG < word2vec < DeConf
In the proposed dataset word2vec < MSSG, DeConf

Conclusion
26
We proposed a method of constructing Dataset and
an evaluation metric to realize sense-based evaluation
of sense vectors for multisense words
◆ Evaluating the proposed dataset and evaluation metric under a wide variety
of settings
◆ Constructing dataset for evaluation of leaning models from sentence level
Future work
Proposed Dataset
Evaluation metric
Utilizes synonyms in WordNet and BabelNet to compose related words
that enable us to identify meanings of learned sense vectors
Can evaluate sense vectors more appropriately than existing word
similarity datasets such as SimLex-999

References 1
[1] Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of
word representations in vector space. arXiv preprint arXiv:1301.3781
(2013)
[2] Trask, A., Michalak, P., Liu, J.: sense2vec-a fast and accurate method
for word sense disambiguation in neural word embeddings. arXiv e-
prints arXiv:1511.06388 (2015)
[3] Neelakantan, A., Shankar, J., Passos, A., McCallum, A.: Efficient non-
parametric estimation of multiple embeddings per word in vector
space. In: Proceedings of the 2014 Conference on Empirical Methods
in Natural Language Processing (EMNLP), pp. 1059-–1069. Association
for Computational Linguistics (2014). https://doi.org/10.3115/v1/D14-
1113. http://aclweb.org/anthology/D14-1113
[4] Pilehvar, M.T., Collier, N.: De-conflated semantic representations. In:
Proceedings of the 2016 Conference on Empirical Methods in Natural
Language Processing, pp. 1680–-1690. Association for Computational
Linguistics (2016). https://doi.org/10.18653/v1/D16-1174.
http://aclweb.org/anthology/D16-1174
27

References 2
[5] Hill, F., Reichart, R., Korhonen, A.: Simlex-999: evaluating semantic
models with (genuine) similarity estimation. Comput. Linguist. 41(4), pp.
665–-695 (2015)
[6] Fellbaum, C.: Wordnet and wordnets. In: Barber, A. (ed.) Encyclopedia
of Language and Linguistics, pp. 2–-665. Elsevier, Amsterdam (2005)
[7] Navigli, R., Ponzetto, S.P.: Babelnet: the automatic construction,
evaluation and application of a wide-coverage multilingual semantic
network. Artif. Intell. 193, pp. 217--250 (2012)
[9] Shaoul, C.: The westbury lab wikipedia corpus. Edmonton, AB:
University of Alberta, p. 131. (2010)
28

Non-existent Meaning
“accomplish” vectors
29
model PoS matched words
word2vec - achieve, fulfill
sense2vec
Verb achieve
Noun -
Adjective achieve, fulfill
Adposition -
⇒ Same results both in the case of using only Verb and all PoS
Obtaining only necessary meaning leads to better result
non-existent PoS
PoS tagging errors cause non-existent meanings

Learning Different Meaning
30
model matched words
word2vec achieve, fulfill
MSSG achieve, fulfill
model matched words
word2vec proclaim
MSSG
declare
Inform, declare
-
“announce” vectors
MSSG = word2vec
MSSG fails
to learn different meanings
MSSG > word2vec
MSSG succeeds
in learning different meanings
Word vector learns multisense obtain better result

To Improve Evaluation Results
31
model common related words for each sense 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛@5 result
word2vec
fulfill 0.2
𝟎. 𝟐
achieve 0.2
MSSG
fulfill 0.2
𝟎. 𝟐
achieve 0.2
DeConf
carry_out, fulfill, carry_through 𝟎. 𝟔
𝟎. 𝟒
achieve 0.2
For a better result, multiple matched words are needed
DeConf obtains multiple neighbor words for one sense

Constructing dataset based_on_concept_hierarchy_for_evaluating_word_vectors_learned_from_multisense_words

More Related Content

What's hot (20)

Similar to Constructing dataset based_on_concept_hierarchy_for_evaluating_word_vectors_learned_from_multisense_words (20)

More from 禎晃山崎 (9)

Recently uploaded (20)