Towards Understanding the Geometry of Knowledge Graph Embeddings

Chandrahas, Aditya Sharma, Partha Talukdar
Indian Institute of Science
[email protected] [email protected] [email protected]

Abstract

Knowledge Graph (KG) embedding has emerged as a very active area of research over the last few years, resulting in the development of several embedding methods. These KG embedding methods represent KG entities and relations as vectors in a high-dimensional space. Despite this popularity and effectiveness of KG embeddings in various tasks (e.g., link prediction), geometric understanding of such embeddings (i.e., the arrangement of entity and relation vectors in the vector space) is unexplored – we fill this gap in the paper. We initiate a study to analyze the geometry of KG embeddings and correlate it with task performance and other hyperparameters. To the best of our knowledge, this is the first study of its kind. Through extensive experiments on real-world datasets, we discover several insights. For example, we find that there are sharp differences between the geometry of embeddings learnt by different classes of KG embedding methods. We hope that this initial study will inspire other follow-up research on this important but unexplored problem.

1 Introduction

Knowledge Graphs (KGs) are multi-relational graphs where nodes represent entities and typed edges represent relationships among entities. Recent research in this area has resulted in the development of several large KGs, such as NELL (Mitchell et al., 2015), YAGO (Suchanek et al., 2007), and Freebase (Bollacker et al., 2008), among others. These KGs contain thousands of predicates (e.g., person, city, mayorOf(person, city), etc.), and millions of triples involving such predicates, e.g., (Bill de Blasio, mayorOf, New York City).

The problem of learning embeddings for Knowledge Graphs has received significant attention in recent years, with several methods being proposed (Bordes et al., 2013; Lin et al., 2015; Nguyen et al., 2016; Nickel et al., 2016; Trouillon et al., 2016). These methods represent entities and relations in a KG as vectors in a high-dimensional space. These vectors can then be used for various tasks, such as link prediction, entity classification, etc. Starting with TransE (Bordes et al., 2013), there have been many KG embedding methods, such as TransH (Wang et al., 2014), TransR (Lin et al., 2015) and STransE (Nguyen et al., 2016), which represent relations as translation vectors from head entities to tail entities. These are additive models, as the vectors interact via addition and subtraction. Other KG embedding models, such as DistMult (Yang et al., 2014), HolE (Nickel et al., 2016), and ComplEx (Trouillon et al., 2016), are multiplicative, where the entity-relation-entity triple likelihood is quantified by a multiplicative score function. All these methods employ a score function for distinguishing correct triples from incorrect ones.

In spite of the existence of many KG embedding methods, our understanding of the geometry and structure of such embeddings is very shallow. A recent work (Mimno and Thompson, 2017) analyzed the geometry of word embeddings. However, the problem of analyzing the geometry of KG embeddings is still unexplored – we fill this important gap. In this paper, we analyze the geometry of such vectors in terms of their lengths and conicity, which, as defined in Section 4, describe their positions and orientations in the vector space. We later study the effects of model type and training hyperparameters on the geometry of KG embeddings and correlate geometry with performance.

We make the following contributions:

• We initiate a study to analyze the geometry of various Knowledge Graph (KG) embeddings. To the best of our knowledge, this is the first study of its kind. We also formalize various metrics which can be used to study the geometry of a set of vectors.

• Through extensive analysis, we discover several interesting insights about the geometry of KG embeddings. For example, we find systematic differences between the geometries of embeddings learned by additive and multiplicative KG embedding methods.

• We also study the relationship between geometric attributes and predictive performance of the embeddings, resulting in several new insights. For example, in case of multiplicative models, we observe that for entity vectors generated with a fixed number of negative samples, lower conicity (as defined in Section 4) or higher average vector length leads to higher performance.

Source code of all the analysis tools developed as part of this paper is available at https://github.com/malllabiisc/kg-geometry. We are hoping that these resources will enable one to quickly analyze the geometry of any KG embedding, and potentially other embeddings as well.

2 Related Work

In spite of the extensive and growing literature on both KG and non-KG embedding methods, very little attention has been paid towards understanding the geometry of the learned embeddings. A recent work (Mimno and Thompson, 2017) is an exception, addressing this problem in the context of word vectors. That work revealed a surprising correlation between word vector geometry and the number of negative samples used during training. Instead of word vectors, in this paper we focus on understanding the geometry of KG embeddings. In spite of this difference, the insights we discover in this paper generalize some of the observations in the work of (Mimno and Thompson, 2017). Please see Section 6.2 for more details.

Since KGs contain only positive triples, negative sampling has been used for training KG embeddings. The effect of the number of negative samples on KG embedding performance was studied by (Toutanova et al., 2015). In this paper, we study the effect of the number of negative samples on KG embedding geometry as well as performance.

In addition to the additive and multiplicative KG embedding methods already mentioned in Section 1, there is another set of methods where the entity and relation vectors interact via a neural network. Examples of methods in this category include NTN (Socher et al., 2013), CONV (Toutanova et al., 2015), ConvE (Dettmers et al., 2017), R-GCN (Schlichtkrull et al., 2017), ER-MLP (Dong et al., 2014) and ER-MLP-2n (Ravishankar et al., 2017). Due to space limitations, in this paper we restrict our scope to the analysis of the geometry of additive and multiplicative KG embedding models only, and leave the analysis of the geometry of neural network-based methods as part of future work.

3 Overview of KG Embedding Methods

For our analysis, we consider six representative KG embedding methods: TransE (Bordes et al., 2013), TransR (Lin et al., 2015), STransE (Nguyen et al., 2016), DistMult (Yang et al., 2014), HolE (Nickel et al., 2016) and ComplEx (Trouillon et al., 2016). We refer to TransE, TransR and STransE as additive methods because they learn embeddings by modeling relations as translation vectors from one entity to another, which results in vectors interacting via the addition operation during training. On the other hand, we refer to DistMult, HolE and ComplEx as multiplicative methods as they quantify the likelihood of a triple belonging to the KG through a multiplicative score function. The score functions optimized by these methods are summarized in Table 1.

Notation: Let G = (E, R, T) be a Knowledge Graph (KG), where E is the set of entities, R is the set of relations and T ⊂ E × R × E is the set of triples stored in the graph. Most of the KG embedding methods learn vectors e ∈ R^{d_e} for e ∈ E, and r ∈ R^{d_r} for r ∈ R. Some methods also learn projection matrices M_r ∈ R^{d_r × d_e} for relations. The correctness of a triple is evaluated using a model-specific score function σ : E × R × E → R. For learning the embeddings, a loss function L(T, T′; θ), defined over a set of positive triples T, a set of (sampled) negative triples T′, and the parameters θ, is optimized.
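As an illustration of such a loss, the following is a minimal sketch of the pairwise margin-based ranking loss used by the additive methods considered here, written over scores of positive and sampled negative triples; the margin value and the toy numbers are illustrative assumptions, not taken from this paper.

```python
import numpy as np

def margin_ranking_loss(pos_scores, neg_scores, margin=1.0):
    # Hinge loss over (positive, sampled negative) pairs: each negative triple
    # should score at least `margin` below its corresponding positive triple.
    return np.sum(np.maximum(0.0, margin - pos_scores + neg_scores))

# Toy example: three positive triples, each paired with one sampled negative.
pos = np.array([0.9, 0.4, 0.7])
neg = np.array([0.1, 0.5, 0.2])
print(margin_ranking_loss(pos, neg))  # 0.2 + 1.1 + 0.5 = 1.8
```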

Type            Model                              Score Function σ(h, r, t)
Additive        TransE (Bordes et al., 2013)       −‖h + r − t‖_1
Additive        TransR (Lin et al., 2015)          −‖M_r h + r − M_r t‖_1
Additive        STransE (Nguyen et al., 2016)      −‖M_r^1 h + r − M_r^2 t‖_1
Multiplicative  DistMult (Yang et al., 2014)       r⊤ (h ⊙ t)
Multiplicative  HolE (Nickel et al., 2016)         r⊤ (h ⋆ t)
Multiplicative  ComplEx (Trouillon et al., 2016)   Re(r⊤ (h ⊙ t̄))

Table 1: Summary of various Knowledge Graph (KG) embedding methods used in the paper. Please see Section 3 for more details.
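To make Table 1 concrete, here is an illustrative NumPy sketch of these score functions (our own code, not the paper's released implementation; HolE's circular correlation is sketched separately at the end of Section 3.2). Vectors are plain NumPy arrays; for ComplEx they are complex-valued.

```python
import numpy as np

def score_transe(h, r, t):
    # TransE: -||h + r - t||_1
    return -np.sum(np.abs(h + r - t))

def score_stranse(h, r, t, M1, M2):
    # STransE: -||M_r^1 h + r - M_r^2 t||_1 ;
    # TransE and TransR are the special cases M1 = M2 = I and M1 = M2 = M_r.
    return -np.sum(np.abs(M1 @ h + r - M2 @ t))

def score_distmult(h, r, t):
    # DistMult: r^T (h ⊙ t), an entry-wise (trilinear) product
    return np.dot(r, h * t)

def score_complex(h, r, t):
    # ComplEx: Re(r^T (h ⊙ conj(t))) with complex-valued h, r, t
    return np.real(np.dot(r, h * np.conj(t)))

d = 4
h, r, t = np.random.randn(3, d)
print(score_transe(h, r, t), score_distmult(h, r, t))
hc, rc, tc = np.random.randn(3, d) + 1j * np.random.randn(3, d)
print(score_complex(hc, rc, tc))
```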

We use small italic characters (e.g., h, r) to represent entities and relations, and corresponding bold characters to represent their vector embeddings (e.g., h, r). We use bold capitalization (e.g., V) to represent a set of vectors. Matrices are represented by capital italic characters (e.g., M).

3.1 Additive KG Embedding Methods

This is the set of methods where entity and relation vectors interact via additive operations. The score function for these models can be expressed as below

σ(h, r, t) = −‖M_r^1 h + r − M_r^2 t‖_1    (1)

where h, t ∈ R^{d_e} and r ∈ R^{d_r} are vectors for head entity, tail entity and relation respectively. M_r^1, M_r^2 ∈ R^{d_r × d_e} are projection matrices from the entity space R^{d_e} to the relation space R^{d_r}.

TransE (Bordes et al., 2013) is the simplest additive model, where the entity and relation vectors lie in the same d-dimensional space, i.e., d_e = d_r = d. The projection matrices M_r^1 = M_r^2 = I_d are identity matrices. The relation vectors are modeled as translation vectors from head entity vectors to tail entity vectors. Pairwise ranking loss is then used to learn these vectors. Since the model is simple, it has limited capability in capturing many-to-one, one-to-many and many-to-many relations.

TransR (Lin et al., 2015) is another translation-based model which uses separate spaces for entity and relation vectors, allowing it to address the shortcomings of TransE. Entity vectors are projected into a relation-specific space using the corresponding projection matrix M_r^1 = M_r^2 = M_r. The training is similar to TransE.

STransE (Nguyen et al., 2016) is a generalization of TransR and uses different projection matrices for head and tail entity vectors. The training is similar to TransE. STransE achieves better performance than the previous methods, but at the cost of a larger number of parameters.

Equation 1 is the score function used in STransE. TransE and TransR are special cases of STransE with M_r^1 = M_r^2 = I_d and M_r^1 = M_r^2 = M_r, respectively.

3.2 Multiplicative KG Embedding Methods

This is the set of methods where the vectors interact via multiplicative operations (usually dot product). The score function for these models can be expressed as

σ(h, r, t) = r⊤ f(h, t)    (2)

where h, t, r ∈ F^d are vectors for head entity, tail entity and relation respectively. f(h, t) ∈ F^d measures the compatibility of head and tail entities and is specific to the model. F is either the real space R or the complex space C. Detailed descriptions of the models we consider are as follows.

DistMult (Yang et al., 2014) models entities and relations as vectors in R^d. It uses an entry-wise product (⊙) to measure compatibility between head and tail entities, while using logistic loss for training the model.

σ_DistMult(h, r, t) = r⊤ (h ⊙ t)    (3)

Since the entry-wise product in (3) is symmetric, DistMult is not suitable for asymmetric and anti-symmetric relations.

HolE (Nickel et al., 2016) also models entities and relations as vectors in R^d. It uses the circular correlation operator (⋆) as the compatibility function, defined as

[h ⋆ t]_k = Σ_{i=0}^{d−1} h_i t_{(k+i) mod d}

The score function is given as

σ_HolE(h, r, t) = r⊤ (h ⋆ t)    (4)

The circular correlation operator, being asymmetric, can capture asymmetric and anti-symmetric relations, but at the cost of higher time complexity (O(d log d)). For training, we use pairwise ranking loss.
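To make the circular correlation concrete, the following is a small illustrative NumPy check (our own sketch, not the paper's released code) that the direct O(d²) evaluation of the definition above matches the FFT-based O(d log d) computation commonly used for HolE.

```python
import numpy as np

def circ_corr_naive(h, t):
    # [h ⋆ t]_k = sum_i h_i * t_{(k+i) mod d}  -- direct O(d^2) evaluation
    d = len(h)
    return np.array([sum(h[i] * t[(k + i) % d] for i in range(d)) for k in range(d)])

def circ_corr_fft(h, t):
    # Same operator via the correlation theorem: F^{-1}(conj(F(h)) * F(t)), O(d log d)
    return np.fft.ifft(np.conj(np.fft.fft(h)) * np.fft.fft(t)).real

h, t, r = np.random.randn(3, 8)
assert np.allclose(circ_corr_naive(h, t), circ_corr_fft(h, t))
print("HolE score:", np.dot(r, circ_corr_fft(h, t)))   # r^T (h ⋆ t)
```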

Figure 1: Comparison of high vs low Conicity. Randomly generated vectors are shown in blue with
their sample mean vector M in black. Figure on the left shows the case when vectors lie in narrow cone
resulting in high Conicity value. Figure on the right shows the case when vectors are spread out having
relatively lower Conicity value. We skipped very low values of Conicity as it was difficult to visualize.
The points are sampled from 3d Spherical Gaussian with mean (1,1,1) and standard deviation 0.1 (left)
and 1.3 (right). Please refer to Section 4 for more details.

ComplEx (Trouillon et al., 2016) represents entities and relations as vectors in C^d. The compatibility of entity pairs is measured using the entry-wise product between the head and the complex conjugate of the tail entity vectors.

σ_ComplEx(h, r, t) = Re(r⊤ (h ⊙ t̄))    (5)

In contrast to (3), using complex vectors in (5) allows ComplEx to handle symmetric, asymmetric and anti-symmetric relations using the same score function. Similar to DistMult, logistic loss is used for training the model.

Dataset              FB15k      WN18
#Relations           1,345      18
#Entities            14,541     40,943
#Triples  Train      483,142    141,440
          Validation 50,000     5,000
          Test       59,071     5,000

Table 2: Summary of datasets used in the paper.

4 Metrics

For our geometrical analysis, we first define a term 'alignment to mean' (ATM) of a vector v belonging to a set of vectors V, as the cosine similarity¹ between v and the mean of all vectors in V:

ATM(v, V) = cosine(v, (1/|V|) Σ_{x ∈ V} x)

We also define the 'conicity' of a set V as the mean ATM of all vectors in V:

Conicity(V) = (1/|V|) Σ_{v ∈ V} ATM(v, V)

By this definition, a high value of Conicity(V) would imply that the vectors in V lie in a narrow cone centered at the origin. In other words, the vectors in the set V are highly aligned with each other. In addition to that, we define the variance of ATM across all vectors in V as the 'vector spread' (VS) of the set V:

VS(V) = (1/|V|) Σ_{v ∈ V} (ATM(v, V) − Conicity(V))²

Figure 1 visually demonstrates these metrics for randomly generated 3-dimensional points. The left figure shows high Conicity and low vector spread, while the right figure shows low Conicity and high vector spread.

We define the length of a vector v as the L2-norm of the vector, ‖v‖_2, and the 'average vector length' (AVL) for the set of vectors V as

AVL(V) = (1/|V|) Σ_{v ∈ V} ‖v‖_2

¹ cosine(u, v) = u⊤v / (‖u‖ ‖v‖)
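For concreteness, the metrics above can be computed with a few lines of NumPy. This is an illustrative sketch of the definitions (not the authors' released analysis code); the small Gaussian demo at the end mirrors the setup described in the Figure 1 caption.

```python
import numpy as np

def atm(v, V):
    # Alignment to Mean: cosine similarity between v and the mean vector of V
    m = V.mean(axis=0)
    return np.dot(v, m) / (np.linalg.norm(v) * np.linalg.norm(m))

def conicity(V):
    # Mean ATM over all vectors in the set
    return np.mean([atm(v, V) for v in V])

def vector_spread(V):
    # Variance of ATM around the conicity value
    c = conicity(V)
    return np.mean([(atm(v, V) - c) ** 2 for v in V])

def avg_vector_length(V):
    # Average L2 norm of the vectors
    return np.mean(np.linalg.norm(V, axis=1))

# Reproducing the flavor of Figure 1: a tight 3-d Gaussian cloud around (1, 1, 1)
# has high conicity; a wide one has lower conicity and higher vector spread.
rng = np.random.default_rng(0)
tight = rng.normal(loc=1.0, scale=0.1, size=(100, 3))
wide = rng.normal(loc=1.0, scale=1.3, size=(100, 3))
print(conicity(tight), vector_spread(tight))   # conicity close to 1
print(conicity(wide), vector_spread(wide))     # lower conicity, larger spread
```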

(a) Additive Models

(b) Multiplicative Models

Figure 2: Alignment to Mean (ATM) vs Density plots for entity embeddings learned by various additive
(top row) and multiplicative (bottom row) KG embedding methods. For each method, a plot averaged
across entity frequency bins is shown. From these plots, we conclude that entity embeddings from
additive models tend to have low (positive as well as negative) ATM and thereby low Conicity and high
vector spread. Interestingly, this is reversed in case of multiplicative methods. Please see Section 6.1 for
more details.

5 Experimental Setup

Datasets: We run our experiments on subsets of two widely used datasets, viz., Freebase (Bollacker et al., 2008) and WordNet (Miller, 1995), called FB15k and WN18 (Bordes et al., 2013), respectively. We detail the characteristics of these datasets in Table 2. Please note that while the results presented in Section 6 are on the FB15k dataset, we reach the same conclusions on WN18. The plots for our experiments on WN18 can be found in the Supplementary Section.

Hyperparameters: We experiment with multiple values of hyperparameters to understand their effect on the geometry of KG embeddings. Specifically, we vary the dimension of the generated vectors between {50, 100, 200} and the number of negative samples used during training between {1, 50, 100}. For more details on algorithm-specific hyperparameters, we refer the reader to the Supplementary Section.²

² For training, we used code from https://github.com/Mrlyk423/Relation_Extraction (TransE, TransR), https://github.com/datquocnguyen/STransE (STransE), https://github.com/mnick/holographic-embeddings (HolE) and https://github.com/ttrouill/complex (ComplEx and DistMult).

Frequency Bins: We follow (Mimno and Thompson, 2017) for the entity and relation samples used in the analysis. Multiple bins of entities and relations are created based on their frequencies, and 100 randomly sampled vectors are taken from each bin. These sets of sampled vectors are then used for our analysis. For more information about sampling vectors, please refer to (Mimno and Thompson, 2017).
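As an illustration of this sampling scheme, the sketch below groups entities by frequency and samples vectors per bin; the bin boundaries and helper names are our own hypothetical choices, not the paper's exact setup (see Mimno and Thompson (2017) for that).

```python
import numpy as np
from collections import Counter

def sample_entity_vectors_by_frequency(triples, embeddings,
                                       bin_edges=(10, 100, 1000), per_bin=100, seed=0):
    # Count how often each entity appears as head or tail in the training triples.
    freq = Counter()
    for h, _, t in triples:
        freq[h] += 1
        freq[t] += 1

    # Assign each entity to a frequency bin (bin_edges are illustrative placeholders).
    bins = {}
    for ent, f in freq.items():
        b = sum(f > edge for edge in bin_edges)
        bins.setdefault(b, []).append(ent)

    # Sample up to `per_bin` entity vectors from each non-empty bin.
    rng = np.random.default_rng(seed)
    sampled = {}
    for b, ents in bins.items():
        chosen = rng.choice(ents, size=min(per_bin, len(ents)), replace=False)
        sampled[b] = np.stack([embeddings[e] for e in chosen])
    return sampled
```

Conicity, vector spread, and average vector length can then be computed on each sampled set using the metric definitions from Section 4.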

(a) Additive Models

(b) Multiplicative Models

Figure 3: Alignment to Mean (ATM) vs Density plots for relation embeddings learned by various additive
(top row) and multiplicative (bottom row) KG embedding methods. For each method, a plot averaged
across entity frequency bins is shown. Trends in these plots are similar to those in Figure 2. Main
findings from these plots are summarized in Section 6.1.

6 Results and Analysis

In this section, we evaluate the following questions:

• Does model type (e.g., additive vs multiplicative) have any effect on the geometry of embeddings? (Section 6.1)
• Does negative sampling have any effect on the embedding geometry? (Section 6.2)
• Does the dimension of embedding have any effect on its geometry? (Section 6.3)
• How is task performance related to embedding geometry? (Section 6.4)

In each subsection, we summarize the main findings at the beginning, followed by evidence supporting those findings.

6.1 Effect of Model Type on Geometry

Summary of Findings:
Additive: Low conicity and high vector spread.
Multiplicative: High conicity and low vector spread.

In this section, we explore whether the type of the score function optimized during training has any effect on the geometry of the resulting embedding. For this experiment, we set the number of negative samples to 1 and the vector dimension to 100 (we got similar results for 50-dimensional vectors). Figure 2 and Figure 3 show the distribution of ATMs of these sampled entity and relation vectors, respectively.³

Entity Embeddings: As seen in Figure 2, there is a stark difference between the geometries of entity vectors produced by additive and multiplicative models. The ATMs of all entity vectors produced by multiplicative models are positive with very low vector spread. Their high conicity suggests that they are not uniformly dispersed in the vector space, but lie in a narrow cone along the mean vector. This is in contrast to the entity vectors obtained from additive models, which are both positive and negative with higher vector spread. From the lower values of conicity, we conclude that entity vectors from additive models are evenly dispersed in the vector space. This observation is also reinforced by the high vector spread of additive models in comparison to that of multiplicative models. We also observed that additive models are sensitive to the frequency of entities, with high-frequency bins having higher conicity than low-frequency bins. However, no such pattern was observed for multiplicative models, and conicity was consistently similar across frequency bins. For clarity, we have not shown different plots for individual frequency bins.

³ We also tried using the global mean instead of the mean of the sampled set for calculating cosine similarity in ATM, and got very similar results.

Figure 4: Conicity (left) and Average Vector Length (right) vs Number of negative samples for entity
vectors learned using various KG embedding methods. In each bar group, first three models are additive,
while the last three are multiplicative. Main findings from these plots are summarized in Section 6.2.

Relation Embeddings: As with entity embeddings, we observe a similar trend when we look at the distribution of ATMs for relation vectors in Figure 3. The conicity of relation vectors generated using additive models is almost zero across frequency bands. This, coupled with the high vector spread observed, suggests that these vectors are scattered throughout the vector space. Relation vectors from multiplicative models exhibit high conicity and low vector spread, suggesting that they lie in a narrow cone centered at the origin, like their entity counterparts.

6.2 Effect of Number of Negative Samples on Geometry

Summary of Findings:
Additive: Conicity and average length are invariant to changes in #NegativeSamples for both entities and relations.
Multiplicative: Conicity increases while average vector length decreases with increasing #NegativeSamples for entities. Conicity decreases, while average vector length remains constant (except HolE), for relations.

For experiments in this section, we keep the vector dimension constant at 100.

Entity Embeddings: As seen in Figure 4 (left), the conicity of entity vectors increases as the number of negative samples is increased for multiplicative models. In contrast, the conicity of the entity vectors generated by additive models is unaffected by changes in the number of negative samples, and they continue to be dispersed throughout the vector space. From Figure 4 (right), we observe that the average length of entity vectors produced by additive models is also invariant to any changes in the number of negative samples. On the other hand, an increase in negative sampling decreases the average entity vector length for all multiplicative models except HolE. The average entity vector length for HolE is nearly 1 for any number of negative samples, which is understandable considering it constrains the entity vectors to lie inside a unit ball (Nickel et al., 2016). This constraint is also enforced by the additive models: TransE, TransR, and STransE.

Relation Embeddings: Similar to entity embeddings, in the case of relation vectors trained using additive models, the average length and conicity do not change while varying the number of negative samples. However, the conicity of relation vectors from multiplicative models decreases with an increase in negative sampling. The average relation vector length is invariant for all multiplicative methods, except for HolE. We see a surprisingly big jump in the average relation vector length for HolE going from 1 to 50 negative samples, but it does not change after that. Due to space constraints in the paper, we refer the reader to the Supplementary Section for plots discussing the effect of the number of negative samples on the geometry of relation vectors.

We note that the multiplicative score between two vectors may be increased either by increasing the alignment between the two vectors (i.e., increasing Conicity and reducing vector spread between them), or by increasing their lengths. It is interesting to note that we see exactly these effects in the geometry of multiplicative methods analyzed above.
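To spell this point out (our own restatement of Equation 2, not an equation from the paper): for the real-valued multiplicative models the score factorizes as

σ(h, r, t) = r⊤ f(h, t) = ‖r‖_2 ‖f(h, t)‖_2 cos θ,

where θ is the angle between r and f(h, t); a similar view applies to ComplEx through its underlying real inner product. The score can therefore grow either through better alignment (larger cos θ, i.e., higher conicity and lower vector spread) or through longer vectors.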

Figure 5: Conicity (left) and Average Vector Length (right) vs Number of Dimensions for entity vectors
learned using various KG embedding methods. In each bar group, first three models are additive, while
the last three are multiplicative. Main findings from these plots are summarized in Section 6.3.

6.2.1 Correlation with Geometry of Word Embeddings

Our conclusions from the geometrical analysis of entity vectors produced by multiplicative models are similar to the results of (Mimno and Thompson, 2017), where an increase in negative sampling leads to increased conicity of word vectors trained using the skip-gram with negative sampling (SGNS) method. On the other hand, additive models remain unaffected by these changes. SGNS tries to maximize a score function of the form w⊤ · c for positive word-context pairs, where w is the word vector and c is the context vector (Mikolov et al., 2013). This is very similar to the score function of the multiplicative models as seen in Table 1. Hence, SGNS can be considered as a multiplicative model in the word domain.

Hence, we argue that our result, that increasing the number of negative samples increases the conicity of vectors trained using a multiplicative score function, can be considered as a generalization of the one in (Mimno and Thompson, 2017).

6.3 Effect of Vector Dimension on Geometry

Summary of Findings:
Additive: Conicity and average length are invariant to changes in dimension for both entities and relations.
Multiplicative: Conicity decreases for both entities and relations with increasing dimension. Average vector length increases for both entities and relations, except for HolE entities.

Entity Embeddings: To study the effect of vector dimension on conicity and length, we set the number of negative samples to 1, while varying the vector dimension. From Figure 5 (left), we observe that the conicity for entity vectors generated by any additive model is almost invariant to an increase in dimension, though STransE exhibits a slight decrease. In contrast, entity vectors from multiplicative models show a clear decreasing pattern with increasing dimension.

As seen in Figure 5 (right), the average lengths of entity vectors from multiplicative models increase sharply with increasing vector dimension, except for HolE. In the case of HolE, the average vector length remains constant at one. The deviation involving HolE is expected, as it enforces entity vectors to fall within a unit ball. Similar constraints are enforced on entity vectors for additive models as well. Thus, the average entity vector lengths are not affected by increasing vector dimension for all additive models.

Relation Embeddings: We reach a similar conclusion when analyzing the change in the geometry of relation vectors against increasing dimension for these KG embedding methods. In this setting, the average length of relation vectors learned by HolE also increases as dimension is increased. This is consistent with the other methods in the multiplicative family. This is because, unlike entity vectors, the lengths of relation vectors of HolE are not constrained to be less than unit length. Due to lack of space, we are unable to show plots for relation vectors here, but the same can be found in the Supplementary Section.

Figure 6: Relationship between Performance (HITS@10) on a link prediction task vs Conicity (left) and
Avg. Vector Length (right). For each point, N represents the number of negative samples used. Main
findings are summarized in Section 6.4.
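For reference, below is a minimal sketch of the HITS@10 metric used in this link prediction evaluation (our own illustration, assuming a generic score function and showing tail corruption only in the raw setting; the protocol of Bordes et al. (2013) also corrupts heads and, in the filtered setting, removes corrupted triples already present in the KG).

```python
import numpy as np

def hits_at_10(test_triples, entities, score_fn):
    # Fraction of test triples whose true tail is ranked in the top 10
    # when scored against every candidate tail entity.
    hits = 0
    for h, r, t in test_triples:
        scores = np.array([score_fn(h, r, cand) for cand in entities])
        ranking = np.argsort(-scores)            # indices sorted by decreasing score
        top10 = {entities[i] for i in ranking[:10]}
        hits += int(t in top10)
    return hits / len(test_triples)
```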

6.4 Relating Geometry to Performance

Summary of Findings:
Additive: Neither entities nor relations exhibit correlation between geometry and performance.
Multiplicative: Keeping negative samples fixed, lower conicity or higher average vector length for entities leads to improved performance. No relationship for relations.

In this section, we analyze the relationship between geometry and performance on the link prediction task, using the same setting as in (Bordes et al., 2013). Figure 6 (left) presents the effects of the conicity of entity vectors on performance, while Figure 6 (right) shows the effects of average entity vector length.⁴

As we see from Figure 6 (left), for a fixed number of negative samples, the multiplicative model with lower conicity of entity vectors achieves better performance. This performance gain is larger for higher numbers of negative samples (N). Additive models don't exhibit any relationship between performance and conicity, as they are all clustered around zero conicity, which is in line with our observations in previous sections. In Figure 6 (right), for all multiplicative models except HolE, a higher average entity vector length translates to better performance, while the number of negative samples is kept fixed. Additive models and HolE don't exhibit any such patterns, as they are all clustered just below unit average entity vector length.

The above two observations for multiplicative models make intuitive sense, as lower conicity and higher average vector length would both translate to vectors being more dispersed in the space.

We see another interesting observation regarding the high sensitivity of HolE to the number of negative samples used during training. Using a large number of negative examples (e.g., N = 50 or 100) leads to very high conicity in the case of HolE. Figure 6 (right) shows that the average entity vector length of HolE is always one. These two observations point towards HolE's entity vectors lying in a tiny part of the space. This translates to HolE performing poorer than all other models in the case of high numbers of negative samples.

We also did a similar study for relation vectors, but did not see any discernible patterns.

⁴ A more focused analysis for multiplicative models is presented in Section 3 of the Supplementary material.

7 Conclusion

In this paper, we have initiated a systematic study into the important but unexplored problem of analyzing the geometry of various Knowledge Graph (KG) embedding methods. To the best of our knowledge, this is the first study of its kind. Through extensive experiments on multiple real-world datasets, we are able to identify several insights into the geometry of KG embeddings. We have also explored the relationship between KG embedding geometry and its task performance. We have shared all our source code to foster further research in this area.

Acknowledgements

We thank the anonymous reviewers for their constructive comments. This work is supported in part by the Ministry of Human Resources Development (Government of India), Intel, Intuit, and by gifts from Google and Accenture.

References

Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. ACM, pages 1247–1250.

Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems. pages 2787–2795.

T. Dettmers, P. Minervini, P. Stenetorp, and S. Riedel. 2017. Convolutional 2D Knowledge Graph Embeddings. ArXiv e-prints.

Xin Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. 2014. Knowledge vault: A web-scale approach to probabilistic knowledge fusion. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pages 601–610.

Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. 2015. Learning entity and relation embeddings for knowledge graph completion. In AAAI. pages 2181–2187.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems. pages 3111–3119.

George A. Miller. 1995. WordNet: a lexical database for English. Communications of the ACM 38(11):39–41.

David Mimno and Laure Thompson. 2017. The strange geometry of skip-gram with negative sampling. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. pages 2863–2868.

T. Mitchell, W. Cohen, E. Hruschka, P. Talukdar, J. Betteridge, A. Carlson, B. Dalvi, M. Gardner, B. Kisiel, J. Krishnamurthy, N. Lao, K. Mazaitis, T. Mohamed, N. Nakashole, E. Platanios, A. Ritter, M. Samadi, B. Settles, R. Wang, D. Wijaya, A. Gupta, X. Chen, A. Saparov, M. Greaves, and J. Welling. 2015. Never-ending learning. In Proceedings of AAAI.

Dat Quoc Nguyen, Kairit Sirts, Lizhen Qu, and Mark Johnson. 2016. STransE: a novel embedding model of entities and relationships in knowledge bases. In Proceedings of NAACL-HLT. pages 460–466.

Maximilian Nickel, Lorenzo Rosasco, and Tomaso A. Poggio. 2016. Holographic embeddings of knowledge graphs. In AAAI.

Srinivas Ravishankar, Chandrahas, and Partha Pratim Talukdar. 2017. Revisiting simple neural networks for learning representations of knowledge graphs. 6th Workshop on Automated Knowledge Base Construction (AKBC) at NIPS 2017.

M. Schlichtkrull, T. N. Kipf, P. Bloem, R. van den Berg, I. Titov, and M. Welling. 2017. Modeling Relational Data with Graph Convolutional Networks. ArXiv e-prints.

Richard Socher, Danqi Chen, Christopher D. Manning, and Andrew Ng. 2013. Reasoning with neural tensor networks for knowledge base completion. In Advances in Neural Information Processing Systems. pages 926–934.

Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. YAGO: a core of semantic knowledge. In WWW.

Kristina Toutanova, Danqi Chen, Patrick Pantel, Hoifung Poon, Pallavi Choudhury, and Michael Gamon. 2015. Representing Text for Joint Embedding of Text and Knowledge Bases. In Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics.

Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard. 2016. Complex embeddings for simple link prediction. In ICML.

Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. 2014. Knowledge graph embedding by translating on hyperplanes. In AAAI. Citeseer, pages 1112–1119.

Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. 2014. Embedding entities and relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575.

