Improving WordNet Using Word Embeddings
Abstract—The main objective of this paper is to create a proof of concept regarding the improvement of the human-generated database WordNet using computer-generated information from Word2Vec. Thus, we change the WordNet content using information from existing corpora. The main method used to achieve this goal is to compare the results of path-based algorithms for computing the semantic similarity between WordNet concepts (such as the Path, Wu and Palmer, or Leacock and Chodorow similarities) with the cosine similarity between the Word2Vec vectors of the same concepts. One way to improve WordNet is by adding new concepts from the Word2Vec corpus which have strong connections with existing words from WordNet. Another way to improve it is by updating its existing connections to underline semantic change. Our experimental results show that the proposed method may be used to increase the number of concepts and improve the quality of links between synsets in WordNet, creating a more meaningful semantic resource.

Index Terms—WordNet, Word2Vec, semantic similarity, semantic change, word embeddings

I. INTRODUCTION

WordNet [1] is a lexical ontology for English manually created by the Cognitive Science Laboratory at Princeton University. It is based on grouping words into synsets according to their part of speech (noun, verb, adjective, adverb) and the concepts that they represent in specific contexts. The structure of WordNet is created by adding links between related synsets.

WordNet is a popular resource for building numerous practical applications, such as sentiment analysis [2], automated text summarization [3], question answering [4], word sense disambiguation [5], and artificial intelligence for chatbots [6]. One important application built on top of WordNet is WN:Similarity [7], which offers the possibility to compute the semantic similarity between two words / concepts. Semantic similarity is very important in computational linguistics and natural language processing, so having its value as correct as possible is crucial to the accuracy of text-processing applications. There are multiple ways to compute this similarity with the help of WordNet: using the path between concepts, their information content, their attributes, or various combinations of these elements. However, in this paper we only address path-based similarity to prove that the proposed method is feasible; all the other methods involve the same steps as shown in this paper.

Since WordNet is a human-curated database [8], errors might occur in the process of creating the synsets and the relations between them. Also, concepts or links between them may become deprecated or insufficient in time due to semantic change [9]. Thus, there is a need for a method for updating the WordNet information by adding or updating concepts and links between them, so that the WordNet-based similarities that are computed reflect the current reality as closely as possible.

Since its creation in 1995, WordNet has undergone multiple modifications ([10], [11]) aimed at improving its coverage and the correctness of its links. This paper presents an approach that is intended to improve the quality of WordNet by signaling concepts that should be added to the ontology, along with concepts whose meanings have changed since their introduction in the database and whose connections should therefore be updated, i.e., some of them should be modified to also mark that the meaning has changed, while others should be added to the database. To do that, we considered the semantic similarities provided by Word2Vec [12], another lexical resource that was built in a different way. Thus, by combining the information from the two sources, we aim to obtain a more accurate resource, along with a methodology for (semi-)automatically updating the content of the WordNet database to better reflect the current meanings of the words.

Word2Vec [12] was developed by a group of researchers from Google led by Tomas Mikolov, with the purpose of building a vectorized representation for the words of a very large corpus, based on the contexts in which they appear in that corpus. The algorithm returns a numeric vector representation for each word, this representation having the property that words with similar contexts are close to each other in the multi-dimensional vector space that is created. Starting from this property, one can compute the semantic similarity of two concepts by evaluating the cosine similarity between the vector representations of these concepts.

In this paper, we compare the similarities built using WordNet path-based methods with the ones obtained from the cosine similarity between the Word2Vec vectors. A small cosine similarity for a pair of words that has a large WordNet similarity means that the existing connections between those words may have become partially deprecated and should be updated to underline the semantic change [13]. Conversely, a large cosine similarity combined with a small path-based similarity means that the path between those words is too long, and thus one or more connections should be added.
The corpus used for building the Word2Vec vectors has significantly more words than WordNet, i.e., approximately three billion words in Word2Vec compared to approximately two hundred thousand words in WordNet. Thus, to determine what words / concepts may be added to WordNet, we analyze the words from the Word2Vec corpus that are not in WordNet and, if a word has multiple strong connections (according to a given heuristic) with existing words from WordNet, we assume that this word can also be added to WordNet.

The proposed method to improve WordNet is not guaranteed to provide correct suggestions and thus it should be used as a tagging system, such as the PayPal fraud system [14], for example. PayPal security algorithms do not say for sure whether a transaction is a fraud or not, but they tag it as conspicuous and a human is assigned to check its validity. Similar to the PayPal system, we cannot say for sure whether a connection or a word / concept should be added / updated, but we can tag connections and concepts and have a human check the validity of making those changes.

The remainder of this paper is structured as follows. Section II presents the required information about the two resources used in this research, i.e., WordNet and Word2Vec. Section III discusses some current approaches that are similar to the described work. Section IV details our methodology for finding the changes that should be applied to alter WordNet. In Section V, we showcase the experimental results and discuss our findings. Finally, in Section VI we summarize our work and list future research directions.

II. PRELIMINARIES

In this section, we detail the main concepts used in this paper. We present both Word2Vec and WordNet, as well as the different similarity metrics employed.

A. Word2Vec

Word2Vec is a shallow neural architecture that, given a textual dataset as input, produces a vector space using the values of the neural network's hidden layer [12]. The output of the model is also known as word embeddings. There are two approaches proposed in the literature: i) the CBOW (Continuous Bag-Of-Words) model and ii) the Skip-gram model.

The CBOW model accounts for the textual dataset vocabulary, representing documents as a continuous set of words, ignoring any semantic or syntactic context, but preserving their co-occurrence. In this model, the input-to-hidden-layer connections are replicated by the number of context words. The context is preserved through the use of multiple words that target a given word.

In the Skip-gram model, documents are represented as sequences of words with gaps between them. The input to the neural network is the target word and the output layer is replicated multiple times to accommodate the chosen number of context words. Thus, this model manages to embed in the word representation its linguistic context.

The two models mirror each other while both preserve context. The CBOW model uses multiple neighbouring words to preserve the context for a target word, while the Skip-gram model uses a word to preserve the context for multiple targeted neighbouring words.

Using word embeddings and the cosine similarity [15], the semantic similarity between two words can be computed [16]. This method assumes that the two word embeddings have non-zero norms and measures their orientation instead of their magnitude, as in the case of the Euclidean distance. Equation (1) presents the cosine similarity for two p-dimensional vectors $x = \{x_i \mid i = 1, \dots, p\}$ and $y = \{y_i \mid i = 1, \dots, p\}$:

$$\cos(\theta) = \frac{\sum_{i=1}^{p} x_i \cdot y_i}{\sqrt{\sum_{i=1}^{p} x_i^2} \cdot \sqrt{\sum_{i=1}^{p} y_i^2}} \qquad (1)$$
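As an illustration, the snippet below is a minimal sketch of how this cosine similarity can be computed between two pretrained Word2Vec vectors. It assumes the publicly released Google News vectors are available locally (the file name GoogleNews-vectors-negative300.bin is an assumption about the local setup) and that gensim 4.x is installed.

```python
import numpy as np
from gensim.models import KeyedVectors

# Pretrained Google News vectors; the file name is an assumption about the local setup.
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

def cosine_similarity(x: np.ndarray, y: np.ndarray) -> float:
    """Equation (1): cosine of the angle between two embedding vectors."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

# Only compute the similarity if both words are in the Word2Vec vocabulary.
if "burger" in vectors and "cheeseburger" in vectors:
    print(cosine_similarity(vectors["burger"], vectors["cheeseburger"]))
    print(vectors.similarity("burger", "cheeseburger"))  # gensim's equivalent call
```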
B. WordNet

WordNet is a semantic graph that curates natural language to create a structured resource of human language by defining the individual relations between its semantic building blocks, i.e., synsets (synonym sets), such as synonymy, hyponymy, and meronymy [17]. The synsets are grouped according to their part of speech, i.e., noun, verb, adjective, adverb, and the concepts that they represent in specific contexts. These concepts are created as multi-word terms, terms that give a certain interchangeable context within the synset. The structure of WordNet is created by adding links between related concepts in a graph manner.

For nouns, the synsets are grouped in a hierarchical structure having the following main relations between the terms (a minimal traversal sketch is given after the verb relations below):

• Hypernymy: a concept X is a hypernym of the concept Y if the Y concept is contained in the X concept, e.g., "entrance" is a hypernym of "door".
• Hyponymy: is the reverse relation of hypernymy; a concept is a hyponym of another if the first concept is contained in the second one, e.g., "exterior door" is a hyponym of "door".
• Meronymy: a synset X is a meronym of the concept Y if the X synset is part of the Y synset, e.g., "doorframe" is a meronym of "door".
• Holonymy: is the reverse relation of meronymy, e.g., "wall" is a holonym of "door".
• Coordinate terms: two synsets are coordinate terms if they share a common hypernym.

Verbs are also stored in a tree-like structure. The relations between them are:

• Hypernymy: a verb is a hypernym of another if the action of the second verb is a (kind of) action of the first verb, e.g., "to compete" is a hypernym of "to play".
• Troponymy: is the equivalent of the hyponymy relation for nouns. A verb is a troponym of another if the action represented by the first verb is a type of action of the second one, e.g., "to line up" is a troponym of "to play".
• Entailment: this is a special relation for verbs. A verb X is entailed by a verb Y if, in order to do action X, action Y is a prerequisite, e.g., the verb "to win" is entailed by the verb "to play".
• Coordinate terms: as in the noun structure, two verbs are coordinate terms if they have a common hypernym.
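The following sketch illustrates these noun and verb relations through the NLTK interface to WordNet; the chosen synset names only mirror the examples above, and the snippet assumes the NLTK WordNet corpus has already been downloaded (e.g., via nltk.download("wordnet")).

```python
from nltk.corpus import wordnet as wn

door = wn.synset("door.n.01")
print(door.hypernyms())       # hypernymy: more general concepts
print(door.hyponyms()[:5])    # hyponymy: more specific concepts
print(door.part_meronyms())   # meronymy: parts of a door
print(door.part_holonyms())   # holonymy: wholes that a door is part of

# Coordinate terms: synsets that share a hypernym with "door".
for parent in door.hypernyms():
    print([s.name() for s in parent.hyponyms() if s != door][:5])

play = wn.synset("play.v.01")
print(play.hypernyms())       # verb hypernymy: a more general action
print(play.entailments())     # entailment: actions presupposed by playing
```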
WordNet adjectives are not structured in a ranked set, but have a special pair arrangement. Thus, adjectives are arranged in pairs of strong antonyms, e.g., "cold" - "hot", which are surrounded by satellites of weaker, i.e., less used, antonyms connected by similarity relationships. Adverbs are very similar to adjectives in the way they are structured. Some adverbs are derived from adjectives and contain a link to their corresponding adjectives.

C. WordNet similarity metrics

There are multiple methods to compute the similarity between two WordNet concepts. These methods are based either on the path between concepts, their information content, their attributes, or various combinations of these [7].

Path Similarity is a heuristic where the similarity between two concepts (i.e., synsets C1 and C2) is computed considering the shortest path (i.e., P(C1, C2)) that connects the concepts in the tree, i.e., using the hypernym / hyponym relationships. Unlike the noun group, the verbs do not have a root node, so a fake root has been added to allow the computation of the similarity between any two verbs. The value returned by this metric is in the [0, 1] interval. Equation (2) presents the Path Similarity, where P(C1, C2) is the smallest number of links between two synsets C1 and C2.

$$sim_P(C_1, C_2) = \frac{1}{1 + P(C_1, C_2)} \qquad (2)$$

Wu-Palmer Similarity considers two factors: the depth of the two senses in the taxonomy and the depth of their most specific common ancestor in the WordNet tree, which is called the LCS (least common subsumer). Wu-Palmer Similarity is presented in Equation (3), where LCS(C1, C2) is the lowest node in the hierarchy that is a hypernym of both the C1 and C2 concepts, P(C1, C2) is the shortest path from C1 to C2, and D(C1) represents the depth of concept C1 in the hierarchy.

$$sim_{WP}(C_1, C_2) = \frac{2 \cdot D(LCS(C_1, C_2))}{P(C_1, C_2) + 2 \cdot D(LCS(C_1, C_2))} \qquad (3)$$

Leacock-Chodorow Similarity is computed according to the shortest path between concepts (i.e., P(C1, C2)), but combined with the maximum depth of the tree-like structure in which the concepts occur (i.e., ||D||). The mathematical formulation of this similarity measure is presented in Equation (4), where ||D|| is the taxonomy depth and P(x, y) is the shortest path from concept x to concept y, considering only the IS-A links from WordNet.

$$sim_{LC}(C_1, C_2) = -\log \frac{P(C_1, C_2)}{2 \cdot ||D||} \qquad (4)$$
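A minimal sketch of how these three metrics can be obtained through NLTK's WordNet interface is given below; the chosen synsets are only illustrative, and NLTK's implementations may differ in small details (e.g., how the fake root is handled for verbs) from the formulas above.

```python
from nltk.corpus import wordnet as wn

c1 = wn.synset("bus.n.01")
c2 = wn.synset("car.n.01")

print(c1.path_similarity(c2))  # Equation (2), value in [0, 1]
print(c1.wup_similarity(c2))   # Equation (3), based on the least common subsumer
print(c1.lch_similarity(c2))   # Equation (4); both synsets must share the same POS
```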
III. RELATED WORK

The need to improve the quality of WordNet was identified by other researchers as well. The authors of [18] discuss an attempt, reportedly similar to what we propose in this paper, aimed at improving the accuracy of the Japanese WordNet using Word2Vec, without much success. Therefore, they decided to use decision trees to improve the Japanese WordNet.

Another line of research that uses both WordNet and Word2Vec investigates the types of semantic relations that are returned by the cosine similarity built on top of the Word2Vec vectors, along with the chances that two similar words from Word2Vec are also connected in WordNet [19].

Finally, another research direction is to detect semantic change using word embeddings in order to update the relations between words in WordNet.

A. Improving the Japanese WordNet

The Japanese WordNet is the Japanese equivalent of WordNet. It was created in 2006 and has been under continuous development ever since. The structure of the lexical database is copied from WordNet; consequently, the concepts of synsets and links between them are also present here. However, the Japanese WordNet was not obtained by translating the English version, and thus linguistic differences led to the decision to update certain synsets by underlining the semantic change and to add other meanings.

Previous studies show that approximately 5% of the entries in the Japanese WordNet may contain errors [20]. Some of these errors consist of words that are placed in the same synset even though they are not synonyms. Other errors are represented by synsets that do not have the right connections or are not connected at all, causing WordNet similarities to give unrealistic values.

A study that uses decision trees to check the validity of hyponymy relationships is presented in [18]. The study tries to verify two hypotheses:
1) Does the vector space generated by Word2Vec contain "noise" that makes the cosine similarity values inaccurate? And is there a subspace of the Word2Vec space that is relevant for detecting synonyms?
2) Can Word2Vec be used to find the senses of synonyms that are members of the same synset?

As part of their experiment, they generate Word2Vec vectors using documents from Wikipedia. They run the training algorithm twice, to get vectors of different sizes, i.e., 200 and 800. They conclude that, using a relatively small number of Word2Vec embedding vectors, it is possible to find words connected in the Japanese version of the WordNet lexical database, and that there is noise in the vector space generated by Word2Vec that limits the use of the cosine similarity score in locating related words.

B. Word2Vec Vector Similarity to WordNet Relations

Word2Vec builds a vector space in which related words are closer to each other, but it is not clear what kind of relations (in terms of WordNet semantic relations) are captured by this closeness. Handler investigated this problem by computing the likelihood that two words that are at rank k of each other according to the Word2Vec cosine similarity have a connection in WordNet [19].
Handler starts from Word2Vec by generating a ranked list with the most similar words for any given word from the corpus. The similarity is computed only for nouns, using the cosine similarity between the vector representations of each word. Based on this list, he investigates the correlation between the Word2Vec ranking and the existence and type of a relation between the two words (the starting word and the one at rank k) in the human-curated English lexical database WordNet. The results show that, among the top-k most similar words from Word2Vec, more hypernyms than synonyms are returned, followed by meronyms, hyponyms, and holonyms. In terms of the value of k (the maximum rank) that still generates connected words in WordNet, the results show that for k smaller than 50 (the 50 most similar words), the words have high chances to be synonyms or hypernyms. With an increased value of k, i.e., k > 50, these probabilities tend to fall until k = 200, becoming 3 to 4 times smaller. When k is smaller than 10, the words have small chances to be holonyms, meronyms, and hyponyms, and after that threshold these chances drop to nearly 0.

Our work is related, but somewhat antithetical, to Handler's approach. Thus, instead of starting from the most similar nouns from Word2Vec, we start from WordNet's synsets and try to adjust the existing connections according to the data from Word2Vec. Moreover, we try to find out what connections should be present in WordNet but are not there, and also what new concepts (along with their connections) should be added to WordNet.

C. Semantic Change

Semantic change refers to the lexical or conceptual changes that appear in the meaning of terms. Word embeddings [12], [21]–[25] and transformer embeddings [26], [27] are used as lexical representations to compute the semantic drift that appears with the change of meaning over time. Many methods use embedding models, e.g., Word2Vec [28], BERT [29], to encapsulate the term's meaning whilst taking into account temporal-spatial information to detect semantic changes. Other approaches build partial graphs into low-dimensional continuous hierarchical spaces to create low latent spaces for representing changes in meaning [30]. Current methods employ LSTM-based sequence-to-sequence (Seq2Seq) models to track the changes in meaning over time [31]. Although we do not propose new methods for detecting semantic change, our proposed solution improves WordNet's capabilities to identify these changes and update its links.

IV. METHODOLOGY

This paper's main objective is to improve WordNet using the Word2Vec corpus. Thus, new concepts or connections can be added or updated. Also, during this phase we may discover that new types of connections are relevant and should be added to WordNet. To achieve these goals, several tests are defined and performed (a small sketch of the first test follows the list):

1) Comparison between synonyms in each synset. This set of tests is performed to identify words that should be updated to underline semantic changes. Thus, we compute the cosine similarity for each pair of synonyms from each WordNet synset and mark those words that have weak connections (reduced cosine similarity) to all the other words from the synset to be altered.
2) Comparison of directly connected synsets. The purpose of this set of tests is to see if there are any first-grade connections between synsets that should be modified to underline the semantic change. If a link has a weak cosine similarity and a strong path similarity, then there is a chance for that specific link to be deprecated or added incorrectly. Thus, for all direct connections of hyponymy / troponymy, hypernymy, and meronymy, we compute the cosine similarity and also all three path-based WordNet similarities, i.e., the Path, Wu and Palmer, and Leacock and Chodorow similarities.
3) Comparison between synsets. This set of tests, similar to the previous one, is performed separately for nouns and for verbs, being constrained by WordNet's structure, which divides synsets according to their part of speech. Each synset is compared with all the other synsets, taking into account the cosine similarity and all three path similarities (Path, Wu and Palmer, and Leacock and Chodorow). For each synset, the top-10 connections with the largest differences between the cosine similarity and the WordNet similarity are stored in three different databases, one for each type of WordNet similarity. These experiments are designed to find the connections that should be added to WordNet. They cover the cases when synsets that are not connected in WordNet have high cosine similarity in Word2Vec. This step was the most demanding task from the computational point of view, since the algorithm has a complexity of O(n²), where n is the number of synsets processed, i.e., n = 80 000 for nouns and n = 13 000 for verbs.
4) Identification of the words / concepts to be added. This set of tests is devised to determine what words from the Word2Vec corpus should be added to WordNet. In this step, only the cosine similarity from Word2Vec is used. Each word from the Word2Vec corpus that is not in WordNet is compared with all the words from WordNet, using the cosine similarity between the Word2Vec representations of these words. If a new word has multiple strong connections with the WordNet synsets, then that word is added to a "could be added" list for further human analysis.
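The snippet below is a minimal sketch of the first test (and, implicitly, of the flagging idea behind the others): it walks over the noun synsets with NLTK, computes the Word2Vec cosine similarity for every pair of single-word synonyms, and collects the weakest pairs for human review. The vector file name and the -0.05 flagging threshold are assumptions made for illustration, not values prescribed by the paper.

```python
from itertools import combinations
from gensim.models import KeyedVectors
from nltk.corpus import wordnet as wn

# Pretrained vectors; the file name is an assumption about the local setup.
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

flagged = []  # (synset name, lemma 1, lemma 2, cosine similarity)
for synset in wn.all_synsets(pos="n"):
    # Keep single-word lemmas that are present in the Word2Vec vocabulary.
    lemmas = [l for l in synset.lemma_names() if "_" not in l and l in vectors]
    for w1, w2 in combinations(lemmas, 2):
        sim = float(vectors.similarity(w1, w2))
        if sim < -0.05:  # illustrative threshold for a "weak" synonym pair
            flagged.append((synset.name(), w1, w2, sim))

flagged.sort(key=lambda row: row[3])  # weakest pairs first, for manual review
for row in flagged[:5]:
    print(row)
```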
V. RESULTS AND DISCUSSION

In this section, we first present the results obtained after applying the tests described above, and afterwards we discuss how these results may be used to improve WordNet.

A. Results
Synonymy Relations. The first test that we undertook was to compute the Word2Vec cosine similarity of the concepts that were placed in the same synset in WordNet. This step's purpose was to update the concepts whose sense was not close to the one of the synset. The top-5 pairs of words with the weakest cosine similarity are presented in Table I. For comparison, we also provide the top-5 pairs of words with the strongest cosine similarity in Table II.

TABLE I
WEAK SYNONYM CONNECTIONS ACCORDING TO COSINE SIMILARITY

Synset          Noun 1          Noun 2          sim_cos
bus.n.01        Coach           Omnibus         -0.1106
upset.n.04      Upset           Swage           -0.1022
taegu.n.01      Taegu           Tegu            -0.0934
keystone.n.02   Key             Headstone       -0.0874
rafter.n.01     Rafter          Balk            -0.0810

TABLE II
STRONG SYNONYM CONNECTIONS ACCORDING TO COSINE SIMILARITY

Synset                Noun 1           Noun 2      sim_cos
shiite.n.01           Shiite           Shi'ite     0.9493
taliban.n.01          Taliban          Taleban     0.9462
gaza strip.n.01       Gaza Strip       Gaza        0.9358
united nations.n.01   United Nations   UN          0.9331
hizballah.n.01        Hezbollah        Hizbollah   0.9311

Hypernymy / Hyponymy Relations. As part of this test, we computed the Word2Vec cosine similarities and all three WordNet path-related similarities for all pairs of WordNet synsets that are in a direct hypernymy / hyponymy relationship. It should be noted that, since the synsets were connected through a direct link, the distance considered in the Path similarity was 1 for all pairs of concepts, leading to a Path Similarity of 0.5. The five pairs of synsets related through a hypernymy / hyponymy relation that had the lowest cosine similarity are presented in Table III, while the ones with the highest cosine similarity are presented in Table IV.

TABLE III
TOP WEAK HYPERNYM / HYPONYM CONNECTIONS ACCORDING TO COSINE SIMILARITY

Noun 1               Noun 2              sim_cos   sim_P    sim_WP   sim_LC
addition.n.02        fluoridation.n.01   -0.0984   0.5000   0.9523   2.9444
seizure.n.04         impress.n.01        -0.0967   0.5000   0.9411   2.9444
leadership.n.02      rome.n.02           -0.0963   0.5000   0.9230   2.9444
panel.n.01           coffer.n.01         -0.0901   0.5000   0.9333   2.9444
mammary gland.n.01   dug.n.01            -0.0867   0.5000   0.9473   2.9444

TABLE IV
TOP STRONG HYPERNYM / HYPONYM CONNECTIONS ACCORDING TO COSINE SIMILARITY

Noun 1                        Noun 2                     sim_cos  sim_P    sim_WP   sim_LC
homer.n.01                    solo homer.n.01            0.9040   0.5000   0.9565   2.9444
trout.n.02                    rainbow trout.n.02         0.8903   0.5000   0.9090   2.9444
porch.n.01                    front porch.n.01           0.8879   0.5000   0.9333   2.9444
professor.n.01                associate professor.n.01   0.8847   0.5000   0.9600   2.9444
cardiovascular disease.n.01   heart disease.n.01         0.8815   0.5000   0.9411   2.9444

Synsets Relation. This test was developed to identify two different situations. The first one is when the Word2Vec cosine similarity between two concepts is small and the WordNet similarity is large, which means that some connections on the path between the two concepts need updates to underline semantic change. The other situation is when the cosine similarity is large, but the WordNet similarity is small, implying that some additional links should be introduced between synsets. As the three different path-related metrics that are built on top of WordNet provide different results, we considered them individually.

To detect the first situation, we computed the difference between the results obtained with each of the three path-related metrics and the ones obtained using the cosine similarity from Word2Vec, and retained only the ones with the highest values. The results obtained using the Wu and Palmer metric are highlighted in Table V for nouns and Table VI for verbs, where Δsim_P, Δsim_WP, and Δsim_LC represent the difference between the Path, Wu and Palmer, or Leacock and Chodorow similarities and the cosine similarity from Word2Vec.

TABLE V
TOP NOUN DEPRECATED CONNECTIONS ACCORDING TO THE ABSOLUTE DIFFERENCE BETWEEN WU-PALMER SIMILARITY AND COSINE SIMILARITY

TABLE VI
TOP VERB DEPRECATED CONNECTIONS ACCORDING TO THE ABSOLUTE DIFFERENCE BETWEEN WU-PALMER SIMILARITY AND COSINE SIMILARITY

To find the opposite situation, we computed the difference between the results obtained using the Word2Vec cosine similarity and the ones coming from each of the three path-related similarities from WordNet, i.e., Path, Wu and Palmer, and Leacock and Chodorow.

Concepts to be Added. The last test was designed to identify which concepts should be added to WordNet. The test revealed that the number of words to add is troublesome, as the corpus used to train Word2Vec contains many variations of a word, along with words or expressions that are not in the format required by WordNet. For example, the corpus contains a lot of proper nouns representing names of people that appear in the news, along with (personal or institutional) email addresses, which do not help the purpose of this paper, namely improving WordNet by adding new concepts. Therefore, these must be filtered out.
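A minimal sketch of such a filter is shown below. The rejection rules (email-like tokens, tokens containing digits or capital letters, and words already covered by WordNet) are illustrative assumptions about what "not in the format required by WordNet" means, not the exact rules used in the experiments.

```python
import re
from nltk.corpus import wordnet as wn

EMAIL_LIKE = re.compile(r".+@.+")

def keep_candidate(token: str) -> bool:
    """Return True if a Word2Vec vocabulary entry is worth proposing for WordNet."""
    word = token.replace("_", " ")
    if EMAIL_LIKE.match(word):
        return False                      # e-mail addresses
    if any(ch.isdigit() for ch in word):
        return False                      # model numbers, dates, etc.
    if word != word.lower():
        return False                      # proper nouns and other capitalized tokens
    if wn.synsets(token):
        return False                      # already covered by WordNet
    return True

candidates = [t for t in ["John_Smith", "real_fruit_smoothies", "info@example.com"]
              if keep_candidate(t)]
print(candidates)  # -> ['real_fruit_smoothies'] under these illustrative rules
```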
The remaining words to be added have been found in different domains. For example, in the food-related area, some new fast-food items or different types of drinks that have recently become popular have been detected.

Examples of words to add from the food-related area might be:

• real fruit smoothies: a drink where different types of fruit are smashed or juiced, depending on the fruit, and served as a fresh beverage.
• cask conditioned beers: a type of beer that is unfiltered and unpasteurized, giving it a different aroma.
• de boeuf: a type of salad made of different vegetables and mayonnaise, which is popular mostly during holiday periods.
• chicken chasseur: a French recipe made by combining chicken meat with chasseur sauce.
• steaks burgers: a burger where the meat between the loaves of bread is a steak instead of the normal meat for burgers.
• cheese coleslaw: a salad made of cabbage and cheese combined with a sauce, mostly used for picnics.

Although the corpus used for training consists of Google News documents, and is thus not specific to any field, several medical terms have been found. Some examples of medical terms are:

• metastatic colorectal cancer: a type of colon cancer.
• anxiety insomnia: a condition where the patient has trouble sleeping because of anxiety and panic.
• hereditary blindness: an inherited eye disease which causes blindness, mostly in small children, because of their genetics.
• medulloblastoma malignant brain tumor: a type of cancer which forms a tumor in the brain.
• monozygotic twins: the type of twins that, in the early stage of growing, share the same placenta.

Another field where Word2Vec discovered some new concepts to be added is the general science field. Also, a lot of technical terms from computer science could be added. Some examples of computer science terms are:

• ARM processor: a type of processor instruction set.
• SO DIMM memory: a type of memory for Apple computers.

Finally, some examples of general terms that were identified are:

• glacial meltwater: the phenomenon of glaciers melting due to global warming.
• synthetic fiber: fibers researched and produced by scientists to improve and replace natural fibers made from plants or animal hair.
• copper indium diselenide: a semiconductor material with many applications in solar energy.
• optical biosensors: a specific type of biosensor used for detecting and analyzing different chemical substances.
• magnetic resonance imaging: a medical procedure used for scanning the body of a patient with the purpose of detecting hard-to-find tumors.

B. Discussions

Since we use three different similarity metrics for computing the concepts' similarity in WordNet, the first thing to discuss is the quality of these metrics. The Path similarity is the most rudimentary of them, providing the poorest results. The Wu and Palmer similarity is the one that gave the best results at a human perception level, while the Leacock and Chodorow similarity generated the closest results to the ones from the Word2Vec cosine similarity.

We analyze and compare the results provided by the three WordNet similarity metrics with the ones obtained using the cosine similarity. For that, we normalize the values provided by these metrics to the [0, 1] interval (a small sketch of this step follows Table VII). The normalization step is necessary for the cosine similarity, which takes values in the [-1, 1] interval, and for the Leacock-Chodorow Similarity, whose values lie in the [0.8, 2.944] range. After this normalization step, we compute the average squared difference between the values generated by each of the WordNet similarity metrics and the corresponding Word2Vec cosine similarity. The results are shown in Table VII.

TABLE VII
COMPARISON BETWEEN SQUARED DIFFERENCES OF PATH-RELATED SIMILARITIES

WordNet Similarity                Squared difference with Cosine Similarity
Path Similarity                   0.6388
Wu and Palmer Similarity          0.2519
Leacock and Chodorow Similarity   0.1787
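A minimal sketch of this normalization and comparison step is given below; the two example arrays stand in for the per-pair similarity values and are placeholders for illustration only.

```python
import numpy as np

def min_max_normalize(values, low, high):
    """Map values from the [low, high] interval to [0, 1]."""
    return (np.asarray(values, dtype=float) - low) / (high - low)

# Illustrative per-pair scores (placeholders, not the paper's data).
cosine = np.array([0.90, -0.10, 0.45])   # Word2Vec cosine similarity, in [-1, 1]
lch = np.array([2.94, 1.20, 0.80])       # Leacock-Chodorow similarity, in [0.8, 2.944]

cosine_01 = min_max_normalize(cosine, -1.0, 1.0)
lch_01 = min_max_normalize(lch, 0.8, 2.944)

# Average squared difference between the normalized metrics, as in Table VII.
print(np.mean((lch_01 - cosine_01) ** 2))
```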
WordNet Improvements. After analyzing the obtained results, we notice that a few domains have weak WordNet connections, but strong cosine similarity.

The medical field is one such domain. We identify a few connections that should be added to WordNet, either directly or on the path between concepts. The analysis of these terms shows that they could be classified as connections between i) a disease and a symptom / cause, ii) a disease and the part of the body affected, or iii) a disease and a medicine. Some examples in this sense are provided in the list below:
• thromboembolism.n.01 - myocardial infarction.n.01: Thromboembolism is the obstruction of a blood vessel and is one of the causes of myocardial infarction, commonly known as an infarct.
• acetabulum.n.01 - osteochondroma.n.01: Osteochondroma is a type of benign tumor that starts in the bones. The acetabulum is a part of the pelvis bone and the place where that kind of tumor often starts.
• cefotaxime.n.01 - parasitemia.n.01: Cefotaxime is an antibiotic used to treat bacterial infections. Parasitemia is a condition where parasites are present in the body. This is an example of a cure-disease connection found by Word2Vec.
• amino.n.01 - tyrosine.n.01: Tyrosine is a type of amino acid, so a hyponymy / hypernymy relation should be added in WordNet.

The flora structure in WordNet would also require a review, with quite a few relationships between different plants being found by the Word2Vec method, in which one plant was similar to the other, but their corresponding concepts are unrelated in the WordNet taxonomy. Some examples that back up this claim are provided below:

• eggplant.n.01 - zucchini.n.01: both vegetables are related through shape and method of cooking, but they are unrelated in WordNet.
• potentilla.n.01 - snowberry.n.01: these are two flower species widely used as wedding decorations, but there is no relation between them in WordNet.
• cabbage.n.01 - cauliflower.n.01: both are part of the "Brassica oleracea" species, which includes cabbage, broccoli, and cauliflower. They have a strong cosine similarity, but a weak WordNet similarity.

Another field where connections are missing is that of food-related concepts. In many cases, either different types of food are not correctly connected, or the ingredients are not hyponyms of the food type. An example from this category is the pair of concepts burger.n.01 - cheeseburger.n.01, a cheeseburger being a type of burger where a slice of cheese is added.

C. Overview of the Proposed Changes in WordNet

To estimate the impact that the proposed method has on altering the WordNet structure, we evaluate five hundred concepts and connections that are flagged by the four tests presented above. The conclusion of this manual evaluation is that only 49 words are relevant and need to be added to WordNet (Table VIII), 23 connections should be added (Table IX), while 15 should be modified to underline the semantic change (Table X), leading to a total of 87 accepted operations out of 500, representing only about 17.4% of all the suggested modifications.

TABLE VIII
EXAMPLES OF WORDS TO BE ADDED TO WORDNET

New words                      WordNet synsets they are related to
metastatic colorectal cancer   multiple myeloma, carcinoma
anxiety insomnia               polyuria, insomnia, paraesthesia, sleeplessness
hereditary blindness           myotonic dystrophy, degenerative disorder
omelets pancakes               quesadilla, caramel apple, fudge sauce
glacial meltwater              glacier, lobate, water vapor
tight tolerances               strain gage, leadless, toroidal
ceramic brakes                 gearset, sunblind, brake pad, hp
jointed legs                   premolar, condyle, caudal fin, stretchability
nail lacquer                   potassium alum, nail varnish, spf
rhino beetle                   sacred ibis, palm civet, carpenter ant

TABLE IX
EXAMPLES OF CONNECTIONS TO BE ADDED TO WORDNET

New connections                     Connection type
burger.n.01 - cheeseburger.n.01     hypernymy-hyponymy relationship
flash memory.n.01 - memory.n.01     hyponymy-hypernymy relationship
aspen.n.01 - chlorosis.n.01         meronymy-holonymy relationship
wine.n.01 - pinot noir.n.01         hypernymy-hyponymy relationship
lobster.n.01 - scallop.n.01         hyponymy-hyponymy relationship
sesame oil.n.01 - soy sauce.n.01    coordinate terms
gnocchi.n.01 - risotto.n.01         coordinate terms
adagio.n.01 - allegro.n.01          coordinate terms
coral.n.01 - coral reef.n.01        hyponymy-hyponymy relationship
hesitance.n.01 - reluctance.n.01    meronymy-holonymy relationship

TABLE X
EXAMPLES OF CONNECTIONS TO BE MODIFIED IN WORDNET

Connection                       Reason for update
region.n.01 - hell.n.01          direct hypernymy-hyponymy relationship
acting.n.01 - heroics.n.01       direct hypernymy-hyponymy relationship
structure.n.01 - shoebox.n.01    direct hypernymy-hyponymy relationship
condition.n.01 - silence.n.01    direct hypernymy-hyponymy relationship
device.n.01 - key.n.01           direct hypernymy-hyponymy relationship
set.n.01 - threescore.n.01       direct hypernymy-hyponymy relationship
power.n.01 - repellent.n.03      direct hypernymy-hyponymy relationship
romania.n.01 - sultanate.n.01    relationship not direct, shorten path
romania.n.01 - billings.n.01     relationship not direct, shorten path
frenchman.n.01 - spartan.n.01    relationship not direct, shorten path

VI. CONCLUSION

In this paper, we present how Word2Vec can be used to (semi-)automatically update WordNet in order to create and maintain a more powerful resource for natural language processing tasks.

Thus, starting from the existing connection types from WordNet, Word2Vec can detect new connections that escaped the human annotators responsible for adding them, as is the case, for example, of the hypernymy / hyponymy connection between burger.n.01 and cheeseburger.n.01. Moreover, old connections that are no longer relevant to the current view of the world can be pointed out by comparing the connections from WordNet with the Word2Vec cosine similarity between the vectors of the corresponding concepts. One such example is the connection between romania.n.01 and sultanate.n.01. Besides that, new types of relations may be added to WordNet, as is the case of the connection between disease and medicine concepts, which are very similar according to their Word2Vec representations, although there is no WordNet connection type that could be used to describe this relation.
Besides updating the connections between WordNet's existing concepts to underline semantic change, the lexical resource may also be enhanced by adding new concepts, along with their due connections, by observing which words from Word2Vec are strongly connected to words from WordNet. However, to avoid adding concepts that are proper nouns or inflected forms of other words, a mechanism to reject such words should be implemented to ease the manual work of the humans who will check the validity of these words. To show the stringency of such a mechanism, out of the top-500 words tagged by Word2Vec as having strong connections with WordNet concepts, only 49 were considered relevant when they were evaluated by experts. Thus, only these may be added to WordNet.

An important factor for updating WordNet is represented by the vector representations of the concepts from Word2Vec, which in turn are influenced by the corpus on which Word2Vec is trained. The current work uses general representations, as the Word2Vec vectors used are obtained by training on news from the Google corpus. However, specific areas corresponding to different fields need to be improved, e.g., medical, food-related, flora, fauna, certain actions, etc. Therefore, instead of training Word2Vec on news, it can be trained on books and studies from a given field, leading to more specialized vector representations. These might provide better insights into the domain vocabulary and the connections between concepts.

Word2Vec ignores the polysemantic nature of words and creates a single representation for each word. Thus, different senses of the same word are mixed in the same vector. Another improvement of this work would be to use other methods to build the vector representations, methods that also consider the different senses that a word might have, e.g., FastText [22], ELMo [24], BERT [26], etc. This way, a better mapping from a synset to a word embedding would be available, leading to more accurate results.
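As a simple baseline for such a synset-to-embedding mapping, one could average the vectors of a synset's lemmas, as in the sketch below. This averaging strategy is our illustrative assumption rather than part of the evaluated method; the sense-aware models cited above would replace it with context-dependent representations.

```python
import numpy as np
from gensim.models import KeyedVectors
from nltk.corpus import wordnet as wn

# Pretrained vectors; the file name is an assumption about the local setup.
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

def synset_vector(synset):
    """Crude synset embedding: the mean of the lemma vectors found in the model."""
    found = [vectors[l] for l in synset.lemma_names() if l in vectors]
    return np.mean(found, axis=0) if found else None

v = synset_vector(wn.synset("door.n.01"))
print(None if v is None else v.shape)  # (300,) with the Google News vectors
```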
Acknowledgement. This research was funded by UEFISCDI, grant number PN-III-P2-2.1-PED-2019-4993, Smart Urban Water-Based on Community Participation Through Gamification – Watergame Project.

REFERENCES

[1] G. A. Miller, "WordNet: a lexical database for English," Communications of the ACM, vol. 38, no. 11, pp. 39–41, 1995.
[2] B. Liu, "Sentiment analysis and opinion mining," Synthesis Lectures on Human Language Technologies, vol. 5, no. 1, pp. 1–167, 2012.
[3] M. Pourvali and M. S. Abadeh, "Automated text summarization base on lexicales chain and graph using of wordnet and wikipedia knowledge base," arXiv preprint arXiv:1203.3586, 2012.
[4] A. G. Tapeh and M. Rahgozar, "A knowledge-based question answering system for B2C e-commerce," Knowledge-Based Systems, vol. 21, no. 8, pp. 946–950, 2008.
[5] N. Gao, W. Zuo, Y. Dai, and W. Lv, "Word sense disambiguation using WordNet semantic knowledge," in Knowledge Engineering and Management. Springer, 2014, pp. 147–156.
[6] T. Bauer, E. Devrim, M. Glazunov, W. L. Jaramillo, B. Mohan, and G. Spanakis, "#MeTooMaastricht: Building a chatbot to assist survivors of sexual harassment," in Machine Learning and Knowledge Discovery in Databases. Springer International, 2020, pp. 503–521.
[7] L. Meng, R. Huang, and J. Gu, "A review of semantic similarity measures in WordNet," International Journal of Hybrid Information Technology, vol. 6, no. 1, pp. 1–12, 2013.
[8] G. A. Miller, WordNet: An Electronic Lexical Database. MIT Press, 1998.
[9] G. de Melo, "Etymological Wordnet: Tracing the history of words," in International Conference on Language Resources and Evaluation, 2014, pp. 1148–1154.
[10] R. Agerri and A. García-Serrano, "Q-WordNet: Extracting polarity from WordNet senses," in LREC, 2010.
[11] T. Petrolito and F. Bond, "A survey of WordNet annotated corpora," in Proceedings of the Seventh Global WordNet Conference, 2014, pp. 236–245.
[12] T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient estimation of word representations in vector space," in International Conference on Learning Representations, 2013.
[13] F. Armaselu, E.-S. Apostol, A. F. Khan, C. Liebeskind, B. McGillivray, C.-O. Truică, and G. V. Oleškevičienė, "HISTORIAE, HIStory of culTural transfORmatIon as linguistIc dAta sciEnce, A Humanities Use Case," in Conference on Language, Data and Knowledge, 2021, pp. 34:1–34:13.
[14] B. Masters and P. Thiel, Zero to One: Notes on Startups, or How to Build the Future. Random House, 2014.
[15] O. Levy, Y. Goldberg, and I. Dagan, "Improving distributional similarity with lessons learned from word embeddings," Transactions of the Association for Computational Linguistics, vol. 3, pp. 211–225, 2015.
[16] T. Kenter and M. de Rijke, "Short text similarity with word embeddings," in ACM International Conference on Information and Knowledge Management. ACM, 2015, pp. 1411–1420.
[17] Y. Pinter and J. Eisenstein, "Predicting semantic relations using global graph properties," in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. ACL, 2018.
[18] T. Hirao, T. Suzuki, N. Wariishi, and S. Hirokawa, "Vector similarity of related words and synonyms in the Japanese WordNet," Information Engineering Express, vol. 1, no. 4, pp. 21–31, 2015.
[19] A. Handler, An Empirical Study of Semantic Similarity in WordNet and Word2Vec. University of New Orleans Theses and Dissertations, 2014.
[20] F. Bond, H. Isahara, S. Fujita, K. Uchimoto, T. Kuribayashi, and K. Kanzaki, "Enhancing the Japanese WordNet," in Proceedings of the 7th Workshop on Asian Language Resources. Association for Computational Linguistics, 2009, pp. 1–8.
[21] J. Pennington, R. Socher, and C. Manning, "GloVe: Global vectors for word representation," in Conference on Empirical Methods in Natural Language Processing, 2014, pp. 1532–1543.
[22] P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, "Enriching word vectors with subword information," Transactions of the Association for Computational Linguistics, vol. 5, pp. 135–146, 2017.
[23] M. Nickel and D. Kiela, "Poincaré embeddings for learning hierarchical representations," in International Conference on Neural Information Processing Systems, 2017, pp. 6341–6350.
[24] M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer, "Deep contextualized word representations," in Proceedings of NAACL, 2018.
[25] T. Mikolov, E. Grave, P. Bojanowski, C. Puhrsch, and A. Joulin, "Advances in pre-training distributed word representations," in International Conference on Language Resources and Evaluation, 2018, pp. 52–55.
[26] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics. ACL, 2019.
[27] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. Salakhutdinov, and Q. V. Le, "XLNet: Generalized autoregressive pretraining for language understanding," in Advances in Neural Information Processing Systems, 2019, pp. 5753–5763.
[28] H. Gong, S. Bhat, and P. Viswanath, "Enriching word embeddings with temporal and spatial information," in Conference on Computational Natural Language Learning, 2020, pp. 1–11.
[29] M. Giulianelli, M. D. Tredici, and R. Fernández, "Analysing lexical semantic change with contextualised word representations," in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020, pp. 3960–3973.
[30] Y. Bizzoni, M. Mosbach, D. Klakow, and S. Degaetano-Ortlieb, "Some steps towards the generation of diachronic WordNets," in Nordic Conference on Computational Linguistics, 2019, pp. 55–64.
[31] A. Tsakalidis and M. Liakata, "Sequential modelling of the evolution of word representations for semantic change detection," in Conference on Empirical Methods in Natural Language Processing, 2020, pp. 8485–8497.