IntelliGraphs
Paul Groth
University of Amsterdam
Abstract
Knowledge Graph Embedding (KGE) models are used to learn continuous rep-
resentations of entities and relations. A key task in the literature is predicting
missing links between entities. However, Knowledge Graphs are not just sets of
links but also have semantics underlying their structure. Semantics is crucial in
several downstream tasks, such as query answering or reasoning. We introduce the
subgraph inference task, where a model has to generate likely and semantically
valid subgraphs. We propose IntelliGraphs, a set of five new Knowledge Graph
datasets. The IntelliGraphs datasets contain subgraphs with semantics expressed
in logical rules for evaluating subgraph inference. We also present the dataset
generator that produced the synthetic datasets. We designed four novel baseline
models, which include three models based on traditional KGEs. We evaluate their
expressiveness and show that these models cannot capture the semantics. We be-
lieve this benchmark will encourage the development of machine learning models
that emphasize semantic understanding.
1 Introduction
Knowledge Graphs (KGs) contain knowledge about the world structured as graphs with entities
connected through different relations [Hogan et al., 2021]. Large-scale KGs are widely used in a
range of applications, such as query answering [Arakelyan et al., 2020] and information retrieval
[Noy et al., 2019].
To address the problem of incompleteness in KGs, Knowledge Graph Embedding (KGE) models
were developed. These learn continuous representations for entities and relations [Bordes et al.,
2013, Yang et al., 2014] through link prediction, the task of predicting missing links in large KGs,
by learning scoring functions that rank entities [Ruffinelli et al., 2019]. These approaches implicitly
assume that each link (also known as a triple) in a Knowledge Graph can be predicted independently.
In this view of Knowledge Graphs, each triple is seen as a kind of “atomic fact” which is true or false
independent of other triples.
However, in modern Knowledge Graphs, triples depend on each other. For example, the triples
value(temperature_NY, 77) and unit(temperature_NY, Fahrenheit) together describe
that the temperature in New York is 77 °F. In this case, the truth of the first triple depends on
the content of the second. Figure 1 provides another example: the fact that "Barack Obama lives
in the White House" highly depends on the fact that "Barack Obama is the president of the United
States" and on the temporal context 2009 – 2017.
1. Subgraph Inference. We define a new task, where the goal is to generate, from a set of
examples, novel subgraphs that follow certain logical rules. We specified new evaluation
metrics that help empirically assess the semantic validity and novelty of generated graphs.
2. IntelliGraphs
(a) Synthetic Datasets. We propose three synthetic datasets, each designed to capture
different levels of semantics. We also describe the underlying semantics using First-
Order Logic.
(b) Real-world Datasets. We extract subgraphs from Wikidata[2] according to simple basic
patterns to generate two real-world datasets.
3. Data Generator. We developed a Python package that randomly generates and verifies
subgraphs using pre-defined logical constraints.
2 Benchmark Task
2.1 Limitations of Link Predictors
Binary relations. KGE models exploit structural regularities to perform Knowledge Graph completion. The last decade has seen the development of several KGE models [Ruffinelli et al., 2019], which predict the likelihood that a pair of entities are related by a given binary relation. However, a set of binary relations cannot represent an N-ary relation, because the links depend on each other.
[1] That is, for a true generalization of link prediction to subgraph prediction, we would provide a single, large knowledge graph and require a model to predict missing subgraphs. In the interest of separating concerns, we ask here only if generative models over small knowledge graphs are feasible.
[2] https://www.wikidata.org/
[3] https://pypi.org/project/intelligraphs
[4] https://anaconda.org/thiv/intelligraphs
[5] https://doi.org/10.5281/zenodo.7824818
Regardless of the context, KGE models assign a set of probabilities on links, and those probabilities
are independent of each other.
N-ary relations. Link prediction has been extended to cover N-ary relations, where the goal is to predict a missing link in an N-ary fact. An N-ary relation can operate on an arbitrary number of entities. Modelling N-ary relations as triples and treating them as entities in binary relations results in a loss of structural information [Wen et al., 2016]. Wen et al. [2016] define N-ary relations as mappings from attribute sequences to attribute values, such that each N-ary fact is an instance of the corresponding N-ary relation. GRAN is a graph-based approach which uses a Transformer decoder to score N-ary facts [Wang et al., 2021]. NeuInfer uses fully-connected neural networks to embed N-ary relations and score candidate triples [Guan et al., 2020]. These models were evaluated by inferring an element in an N-ary fact. Because a single N-ary relation can be represented as a set of binary relations (i.e. triples), subgraphs can be used to represent N-ary relations. This means that subgraph models could be used to solve N-ary relation prediction, but our task is strictly broader than that: every single N-ary relation can be represented as a subgraph, but not every subgraph can be represented as a single N-ary relation.
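To make the relationship concrete, an N-ary fact can be reified into a small subgraph of binary triples. The sketch below is our own illustration (the relation and role names are invented, not part of IntelliGraphs):

```python
# Sketch: reifying an N-ary fact relation(arg1, ..., argN) into binary
# triples by introducing a fresh node `fact_id` that represents the fact.
def reify(relation, arguments, fact_id):
    triples = [(fact_id, "instance_of", relation)]
    for role, value in arguments.items():
        triples.append((fact_id, role, value))  # one binary triple per role
    return triples

# A 3-ary fact: temperature(NY, 77, Fahrenheit), as a 4-triple subgraph.
triples = reify("temperature",
                {"place": "NY", "value": "77", "unit": "Fahrenheit"},
                "_fact1")
```

The converse does not hold: a subgraph like the one above happens to be star-shaped around a single fact node, but an arbitrary subgraph need not be, which is why subgraph inference is strictly broader.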
Link prediction evaluation. The standard link prediction evaluation framework [Ruffinelli et al.,
2019] uses ranking-based evaluation metrics which do not explicitly check for the semantics of
the predicted links. Instead, the evaluation protocol assumes that the underlying semantics can
be indirectly validated if a missing link has been correctly predicted. Therefore, the standard link
prediction framework is not suitable for evaluating Subgraph Inference. In our work, we set out to
explicitly check the semantics during evaluation.
A Knowledge Graph G is a tuple G = (V, E, R, S), where V is a set of nodes, E ⊆ V × R × V is a set of edges, and R is the set of relations. V is drawn from the set of possible entities. S is a set of functions that define the semantics of G by determining which structures are permissible in G.
Given a Knowledge Graph G, we call a subgraph F a tuple (V^F, E^F, R), where V^F ⊂ V and E^F ⊆ V^F × R × V^F.
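These definitions can be made concrete with a minimal sketch; the set-based representation below is our own and not the IntelliGraphs API:

```python
# A knowledge graph holds nodes V, relations R and triples E; a candidate
# subgraph F = (Vf, Ef, R) must satisfy Vf subset of V and
# Ef subset of Vf x R x Vf.
def is_subgraph(Vf, Ef, V, R):
    nodes_ok = Vf <= V                                        # V^F ⊂ V
    edges_ok = all(s in Vf and r in R and o in Vf             # E^F ⊆ V^F × R × V^F
                   for (s, r, o) in Ef)
    return nodes_ok and edges_ok

V = {"Obama", "White_House", "USA"}
R = {"lives_in", "president_of"}
ok = is_subgraph({"Obama", "USA"}, {("Obama", "president_of", "USA")}, V, R)
```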
These subgraphs can be added back to the KG; therefore, this task can be seen as an extension of link
prediction. To make this extension complete, we should also specify how the training subgraphs are
extracted from G. However, to isolate the question of generative modelling of knowledge graphs,
we take this process as given in our tasks. For instance, in the two real-world datasets, we extract
subgraphs from Wikidata according to a hand-designed pattern. In the synthetic datasets, we simply
provide a set of small knowledge graphs over a shared set of entities and relations, leaving the larger
graph G entirely implicit. With this choice, the task reduces to training a generative model over small
knowledge graphs with a shared set of entities and relations.
Key Challenge. The subgraphs need to adhere to specific semantics which, in a learning setting,
have to be inferred from a limited set of examples, such as learning the types of entities.
[6] We do not specify what larger graph this graph is a subgraph of. In most of our tasks, only the set of entities and relations of the larger graph is given, and the rest of the graph is left implicit.
[7] In symbolic AI, the term inference means formal reasoning. Here we use it to mean estimating the probability distribution over the model's unobserved variables given observed data (i.e. probabilistic inference).
Table 1: The size of the training, validation and test splits for the five datasets used in this work. The number of edges is fixed for the synthetic datasets and variable for the Wikidata-based graphs.
3 IntelliGraphs
Motivated by the aforementioned limitations of link prediction datasets and the new task of subgraph
inference, we introduce five new benchmark datasets where each dataset tests different semantics.
Table 1 shows key statistics about the synthetic and real-world graphs. The appendix (see Section
7.3.1) describes the algorithm used to generate the datasets.
Data Generator. The sampler D samples subgraphs according to a probability distribution P defined in the Python implementation of IntelliGraphs. For each dataset's logical constraints L, the sampler samples a graph F from the probability distribution P, ensuring that F satisfies all the constraints in L.
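A minimal sketch of this accept/reject loop, with a toy proposal distribution and constraint set of our own invention standing in for P and L:

```python
import random

# Draw candidate subgraphs from a proposal distribution and keep the first
# one that satisfies every constraint (simple rejection sampling).
def sample_valid_subgraph(propose, constraints, max_tries=10000, rng=random):
    for _ in range(max_tries):
        F = propose(rng)
        if all(c(F) for c in constraints):   # F satisfies every rule in L
            return F
    raise RuntimeError("no valid subgraph found")

# Toy example: sample a single triple whose relation matches entity types.
entities = {"Dutch": "Language", "NL": "Country"}
def propose(rng):
    s, o = rng.sample(list(entities), 2)
    return [(s, "could_be_spoken_in", o)]
constraints = [lambda F: all(entities[s] == "Language" and
                             entities[o] == "Country" for (s, r, o) in F)]
F = sample_valid_subgraph(propose, constraints)
```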
Existential Nodes. In some settings, it is necessary to have nodes that refer to entities that only
occur in one instance. For example, in the wd-movies dataset introduced below, each subgraph
in the data represents a movie. Its actors, directors and genres are entities that occur in multiple
instances, so a model can learn representations by observing the different contexts in which they
occur. However, each instance also contains one node representing the movie the graph describes.
These only occur in one instance, so a model cannot learn a representation for the specic instance,
only a general representation which expresses that some movie exists for which this subgraph is
true. We call such nodes existential nodes (in analogy to existentially quantied variables in logical
formulas) and use a special label, such as _movie, to refer to them in all instances.[8] Strictly speaking,
this turns the predicted subgraphs into subgraph patterns of the Knowledge Graph G, but we refer to
them as subgraphs to keep the terminology simple.
3.1 Semantics
We use First-Order Logic (FOL) to express the underlying logical rules of the datasets. Section 7.4 in
the appendix provides a complete set of logical constraints for each IntelliGraphs dataset.
Logical Constraint Verifier. The Logical Constraint Verifier v is a function that verifies whether the logical constraints L hold in a generated subgraph F. We built a constraint verifier within the IntelliGraphs Python package.[9] The constraint verifier v(F, L) returns true if and only if the subgraph F is consistent with all logical rules in L.
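The verifier can be sketched as a conjunction of rule checks; the Python predicates below are our simplification of the First-Order Logic constraints, with an interval rule modelled on the "start precedes end" semantics of syn-tipr:

```python
# v(F, L): True iff subgraph F is consistent with every rule in L.
def verify(F, L):
    return all(rule(F) for rule in L)

# Example rule: every interval's start must precede its end.
def starts_precede_ends(F):
    starts = {s: int(o) for (s, r, o) in F if r == "has_start"}
    ends = {s: int(o) for (s, r, o) in F if r == "has_end"}
    return all(starts[i] < ends[i] for i in starts if i in ends)

F = [("_interval", "has_start", "2009"), ("_interval", "has_end", "2017")]
```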
Synthetic datasets allow complete control over the problem setup and provide a convenient testbed for
developing new machine learning models. Each dataset is generated by the generator D, and we check that the generated subgraphs satisfy the logical rules L.[10] Here is a brief description of the synthetic datasets:
[8] For most models, the difference will only be in the interpretation. For example, our baseline models will learn one embedding vector for the node labelled _movie, which we use wherever movies occur. As such, we do not treat it differently from the node labelled Antonio_Banderas, although when we interpret the graph, these nodes mean different things.
[9] A reasoning engine could also be used for checking the subgraphs for logical consistency. We wrote a set of functions in Python for constraint verification and embedded it into the IntelliGraphs Python package to easily verify graphs without loading them into a reasoning engine.
[10] It is important to note that logical consistency does not equate to factual accuracy. We simply want to ensure that the generated dataset is consistent with the logical rules.
• syn-paths is a dataset with path graphs. Path graphs have simple semantics that can be algorithmically verified in linear time. Each path graph has a single directed path of length 3 and no other edges.
• syn-types contains entities with types Language, Country and City. These are connected by three relations according to the relations' type constraints: same_type_as can only exist between entities of the same type, could_be_part_of between a capital city and a country, and could_be_spoken_in between a language and a country. The connections are otherwise random.
• syn-tipr contains subgraphs based on the Time-indexed Person Role (tipr) ontology pattern.[11] Here, the semantics are defined by the tipr graph pattern. The semantics include the fact that the start of an interval must precede its end.
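As an illustration of how cheaply such semantics can be checked, a linear-time validity test for syn-paths might look as follows (our own sketch of the rule "a single directed path of length 3 and no other edges"):

```python
# A valid graph has exactly `length` edges forming one directed path:
# every node has in/out degree at most 1, and walking from the unique
# start node covers all edges (which also rules out cycles and branches).
def is_path_graph(triples, length=3):
    if len(triples) != length:
        return False
    out_, in_ = {}, {}
    for (s, r, o) in triples:
        if s in out_ or o in in_:        # degree would exceed 1
            return False
        out_[s], in_[o] = o, s
    starts = [n for n in out_ if n not in in_]
    if len(starts) != 1:                 # a cycle has no start node
        return False
    node, steps = starts[0], 0
    while node in out_:                  # walk the path end to end
        node, steps = out_[node], steps + 1
    return steps == length
```

Each triple is visited a constant number of times, so the check runs in time linear in the number of edges.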
Wikidata[12] [Vrandečić and Krötzsch, 2014] is a large graph-structured knowledge base which consists
of crowdsourced factual knowledge on various topics. We created two datasets from Wikidata using specific graph patterns to extract subgraphs about movies and research articles. Here is a brief
description of the two datasets:
• wd-movies contains small graphs extracted from Wikidata that describe movies. Each graph
contains one existential node representing the movie, entity nodes for the movie’s director(s)
connected by a has_director relation, entity nodes for the movie’s cast connected by a
has_actor relation and an entity for the movie’s genre connected by a has_genre relation.
• wd-articles contains small graphs that describe research articles extracted from Wikidata.
Each article is annotated by an ordered list of authors, implemented by a blank node for
each author linked to a node representing the author and to a node representing the order in
the author list. We add a list of the other articles that the current article references, and a list
of subjects, together with selected superclasses of those subjects. In this dataset, most node
types, including the article’s node, may be existential or entity nodes.
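As a concrete, hypothetical example of the wd-movies pattern, a star-shaped subgraph around the existential movie node and a check of that pattern (the director name is our invention; Antonio_Banderas appears in the paper's own footnote):

```python
# Hypothetical wd-movies style subgraph: one existential _movie node
# connected to its director(s), cast and genre(s).
movie = [("_movie", "has_director", "Pedro_Almodovar"),
         ("_movie", "has_actor", "Antonio_Banderas"),
         ("_movie", "has_actor", "Penelope_Cruz"),
         ("_movie", "has_genre", "drama")]

def is_movie_star(triples):
    """Check the star pattern: every edge leaves _movie via an allowed relation."""
    allowed = {"has_director", "has_actor", "has_genre"}
    return all(s == "_movie" and r in allowed for (s, r, o) in triples)
```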
4 Evaluation
4.1 Evaluation by bits-per-graph
The most common objective for a generative model is maximum likelihood: a graph from the test data should have maximal probability under the model or, equivalently, minimal negative log probability. When base-2 logarithms are used, the latter quantity, − log2 p(S, E), can be interpreted as the number of bits required to compress the graph [Rissanen, 1978, Grünwald, 2007].
Averaging over all graphs, we arrive at a bits-per-graph metric to evaluate how well our model satisfies the maximum likelihood objective. Moreover, each of the terms in Equation 1 can be read as a separate codelength: − log2 P(E) describes the bits required to encode the entities, and − log2 P(S | E) describes the bits required to encode the structure once the entities are known.
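The metric itself is straightforward to compute; a sketch with made-up probabilities, purely for illustration:

```python
import math

# Bits-per-graph: average negative base-2 log probability over test graphs,
# split into the entity term and the structure term of Equation 1.
def bits_per_graph(probs):
    """probs: list of (p_E, p_S_given_E) pairs, one per test graph."""
    return sum(-math.log2(pE) - math.log2(pS) for pE, pS in probs) / len(probs)

# Two toy graphs: (2 + 1) bits and (1 + 1) bits, averaging to 2.5 bits.
bits = bits_per_graph([(0.25, 0.5), (0.5, 0.5)])
```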
4.2 Semantics
We evaluate the semantics of graphs generated by our baseline models using the following evaluation metrics: 1) % Valid Graphs is the probability of sampling graphs that are logically valid according to the logical constraints of each dataset, 2) % Novel Graphs is the probability of sampling graphs that are not in the training data, 3) % Novel & Valid Graphs is the probability of sampling graphs that are both logically valid and not in the training data, and 4) % Empty Graphs is the probability that sampling yields an empty graph, due to either p(E) or p(S | E) being too low. An ideal model has a high probability of sampling logically valid graphs while using a minimal codelength to compress graphs.
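These four metrics reduce to simple counts over a batch of samples; in the sketch below, the graph representation (frozensets of triples) and the `is_valid` stand-in are our own assumptions:

```python
# Percentages of valid, novel, novel-and-valid, and empty sampled graphs.
def semantic_metrics(samples, train_graphs, is_valid):
    n = len(samples)
    pct = lambda k: 100.0 * k / n
    return {
        "valid": pct(sum(is_valid(g) for g in samples)),
        "novel": pct(sum(g not in train_graphs for g in samples)),
        "novel_valid": pct(sum(is_valid(g) and g not in train_graphs
                               for g in samples)),
        "empty": pct(sum(len(g) == 0 for g in samples)),
    }

train = {frozenset({("a", "r", "b")})}
samples = [frozenset({("a", "r", "b")}),   # valid but seen in training
           frozenset({("b", "r", "a")}),   # valid and novel
           frozenset()]                    # empty (here also invalid)
m = semantic_metrics(samples, train, is_valid=lambda g: len(g) > 0)
```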
[11] http://ontologydesignpatterns.org/wiki/Submissions:Time_indexed_person_role
[12] https://www.wikidata.org
4.3 Baseline Models
To the best of our knowledge, no probabilistic models in the literature can infer new subgraphs for
knowledge graphs. Therefore, we developed a set of simple baselines inspired by traditional KGE
models: ComplEx [Trouillon et al., 2016], DistMult [Yang et al., 2014] and TransE [Bordes et al.,
2013]. Traditional KGE models are trained to rank all possible triples to give the correct triple
the highest score [Ruffinelli et al., 2019]. ComplEx, DistMult and TransE all use different scoring functions. TransE represents relations as translations between entities, whereas DistMult models
relations as bilinear interactions. ComplEx extends DistMult using complex-valued embeddings.
We model a subgraph F by decomposing it into its entities and structure F = (E, S), that is,
p(F ) = p(S | E) p(E). Unlike traditional KGE models, we train our baseline models with a
maximum likelihood objective.
We decompose the objective function as
− log2 p(F ) = − log2 p(S|E) − log2 p(E). (1)
We model p(E) = ∏_{e∈E} p(e), with p(e) estimated as the relative frequency of e in the training data (the proportion of training subgraphs it occurs in). We train KGE models to estimate p(S | E). We use

p(S | E) = ∏_{(s,p,o)∈S_T} p((s, p, o) | E) · ∏_{(s,p,o)∈S_N} (1 − p((s, p, o) | E)),

where S_T represents the triples in the subgraph F, and S_N represents all possible triples that are not in the subgraph (i.e. all possible negatives).
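Putting Equation 1 together, a sketch with a made-up frequency table and fixed triple probabilities in place of a trained KGE scoring function:

```python
import math

# - log2 p(F) = - log2 p(E) - log2 p(S | E), with p(E) a product of entity
# frequencies and p(S | E) a product over all candidate triples: p for
# triples in the subgraph, (1 - p) for the negatives.
def neg_log2_pF(entities, triples, all_candidates, freq, p_triple):
    bits_E = sum(-math.log2(freq[e]) for e in entities)
    bits_S = 0.0
    for t in all_candidates:                 # every possible triple over E
        p = p_triple[t]
        bits_S += -math.log2(p) if t in triples else -math.log2(1.0 - p)
    return bits_E + bits_S

freq = {"a": 0.5, "b": 0.25}
cands = [("a", "r", "b"), ("b", "r", "a")]
p_triple = {("a", "r", "b"): 0.5, ("b", "r", "a"): 0.5}
# bits_E = 1 + 2 = 3; bits_S = 1 + 1 = 2; total 5 bits.
bits = neg_log2_pF({"a", "b"}, {("a", "r", "b")}, cands, freq, p_triple)
```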
Our random baseline model generates a random graph prediction by sampling p(E) and p(S|E)
from a uniform distribution. It then computes the exact number of bits required to represent these probabilities, using − log2(p) as the codelength of each probability value. This model does not need to be trained.
Table 2 shows that the KGE baselines learn more compact representations than the random baseline.
The ComplEx baseline is most effective at compressing the structure of these graphs p(S|E), despite
requiring real and imaginary parts. The scale of complexity, represented by code length, seems
to increase rapidly from synthetic to real-world datasets. For instance, the highest code length for
syn-paths is 69.51 (for the TransE baseline), while the lowest code length for wd-movies is 202.68
(for ComplEx). wd-movies and wd-articles have many more entities to sample, making them
more challenging to compress.
Table 3 shows the probabilities of sampling graphs that are logically consistent. We perform subgraph
inference under two different settings:
• Sampling P (E) and P (S | E). Here, the baseline models sample both the entities that
are relevant for a subgraph and infer their edge connectivity. Our results indicate that the
probability of sampling valid graphs is consistently 0%. Selecting the incorrect entities
negatively impacts the structure prediction. Our results indicate that this task is challenging,
especially for the random baseline, as it consistently fails to infer valid subgraphs.
• Sampling only P (S | E). In this setup, the model is given an advantage by having access to the correct set of entities (i.e., we provide E), such that it only needs to predict the edge connections between the given entities. It is worth noting that under this setting, the baseline model collapses into a link predictor, as it just predicts the edge connections between the given entities. Despite this advantage, the baseline models could not generate many logically consistent subgraphs. Interestingly, this also reveals the complexity of the datasets and what semantics these KGE models can learn. Most KGE models are able to generate some valid path graphs, while syn-tipr, which requires some temporal reasoning, seems more challenging for all baseline models. Inferring the correct entity types from syn-types was possible for a few graphs.
Table 2: Estimate of the codelengths, − log2 p(F) (the number of bits), required to compress a graph using the four baseline models on the IntelliGraphs datasets. We used the test split for this. We rounded the numbers to two decimal places.
5 Related Work
Datasets for Query Embedding. Query Embedding (QE) involves interpreting complex logical
queries, commonly represented as a small graph, and evaluated on QE datasets, such as GQE
[Hamilton et al., 2018], Query2Box [Ren et al., 2020], and BetaE [Ren and Leskovec, 2020]. Ren
et al. [2023] present a comprehensive comparison of datasets. As Ren et al. [2023] highlight, query
embedding datasets lack rules and types. Although the datasets in IntelliGraphs are similar to query
embedding datasets, there is a difference in the purpose and applications. Our datasets can be used
for learning distributions to infer new logically consistent subgraphs. In contrast, QE datasets are concerned with reasoning using logical rules to find a missing entity.
Datasets for n-ary Relations. N-ary relations are relations involving more than two entities. Various methods have been studied in the literature that embed complex N-ary relations, often in non-Euclidean spaces [Wang et al., 2021, Wen et al., 2016]. The difference between N-ary relations and
subgraphs is explained in Section 2.1.
Datasets for Neurosymbolic methods. If we interpret knowledge graphs as a set of logical statements,
we can see that the task of subgraph prediction is a neurosymbolic method: it combines symbolic
systems with neural networks. Datasets have been proposed to test various aspects of such systems:
interpretability, reasoning, and generalization capabilities. Several datasets were proposed to evaluate
the understanding and reasoning of complex rules and abstract concepts. Table 4 (in the appendix)
compares different datasets for Neurosymbolic AI from the literature. Existing datasets focus
primarily on the image and text modalities, neglecting background knowledge expressed in graphs.
6 Conclusion
Existing KG datasets used for representation learning lack well-understood semantics, which limits
studying how well KGE models capture new semantics. In our work, we propose Subgraph Inference
as a new research problem and IntelliGraphs, a collection of five new datasets for benchmarking
models. Furthermore, we used baseline models inspired by traditional KGE models to estimate the
code lengths of these graphs and sample logically valid subgraphs. Our findings show that traditional
KGE models show a limited understanding of semantics after training. We observed a rapid increase
Table 3: Semantic validity of the graphs produced by our baseline models. We have tested subgraph
inference under two settings: 1) Sampling from both P (E) and P (S | E), and 2) Sampling from
P (S | E) only, taking E from the test data. We check the novelty of the sampled graphs by comparing
them against the training and validation set. We used the same hyperparameters from the model
compression experiments here.
Setting: Sampling from P(E) and P(S | E)

Dataset       Model      % Valid   % Novel & Valid   % Novel   % Empty
                         Graphs    Graphs            Graphs    Graphs
syn-paths     random     0         0                 100       0
              TransE     0.25      0.25              23.45     76.55
              DistMult   0.69      0.69              14.59     85.41
syn-tipr      random     0         0                 100       0
              TransE     0         0                 99.45     0.55
              DistMult   0         0                 99.43     0.57
              ComplEx    0         0                 99.64     0.36
syn-types     random     0         0                 100       0
              TransE     1.43      1.43              95.42     4.58
              DistMult   1.44      1.44              96.19     4.81
              ComplEx    1.01      1.01              94.17     5.83
wd-movies     random     0         0                 100       0
              TransE     0.07      0.07              97.01     2.99
              DistMult   0.10      0.10              95.86     4.17
              ComplEx    0.41      0.41              93.04     6.96
wd-articles   random     0         0                 100       0
              TransE     0         0                 98.35     1.65
              DistMult   0         0                 98.77     1.23
              ComplEx    0         0                 100.00    0.00
in complexity, represented by code lengths, from synthetic to real-world datasets. This complexity
makes real-world datasets more challenging to compress, which is an essential consideration for
future research in graph compression. We found that the probability of sampling valid graphs was
consistently low, emphasizing the complexity and difficulty of the task.
Limitations. Subgraph inference assumes that the semantics of a KG is known. However, in some
cases, this assumption may not hold. Furthermore, our datasets assume we test the machine learning
models in a transductive setting; entities and relations not seen during training will not be handled
well.
Ethics Statement. Our synthetic graphs are based on the logical rules we constructed and should not
be used for applications where factual accuracy matters. However, wd-movies and wd-articles
are based on real-world factual knowledge retrieved from Wikidata. Therefore, certain biases may be
inherited from Wikidata. Since these datasets are likely unsuitable for training production models
or for pretraining, we do not expect that these biases will ever affect systems making real-world
decisions. Transparency about dataset creation and maintenance is critical for adopting new machine
learning datasets [Gebru et al., 2021]. In the appendix, we provide a data card for IntelliGraphs to
provide further information about the datasets.
Applications of IntelliGraphs. It is imperative to have guarantees for safety-critical applications to
prevent machine learning models from making fatal mistakes. To develop these systems, datasets
with logical constraints are helpful. In some problem domains, little or no data is available, for instance when machine learning models must be trained on sensitive data for medical or industrial use cases. In these cases, IntelliGraphs' dataset generation framework can be used to generate synthetic datasets using background knowledge about the problem domain.
Acknowledgements We would like to thank Frank van Harmelen and Patrick Koopmann for their
feedback on this work.
References
Erik Arakelyan, Daniel Daza, Pasquale Minervini, and Michael Cochez. Complex query answering
with neural link predictors. In International Conference on Learning Representations (ICLR),
2020.
Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko.
Translating embeddings for modeling multi-relational data. In Neural Information Processing
Systems (NIPS), pages 1–9, 2013.
Samuel R Bowman, Gabor Angeli, Christopher Potts, and Christopher D Manning. A large annotated
corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical
Methods in Natural Language Processing (EMNLP), pages 632–642, 2015.
Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, and
Oyvind Tafjord. Think you have solved question answering? try arc, the ai2 reasoning challenge.
arXiv preprint arXiv:1803.05457, 2018.
Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach,
Hal Daumé III, and Kate Crawford. Datasheets for datasets. Communications of the ACM, 64(12):
86–92, 2021.
Eleonora Giunchiglia, Mihaela Cătălina Stoian, Salman Khan, Fabio Cuzzolin, and Thomas
Lukasiewicz. Road-r: The autonomous driving dataset with logical requirements. arXiv preprint
arXiv:2210.01597, 2022.
Peter D Grünwald. The minimum description length principle. MIT press, 2007.
Saiping Guan, Xiaolong Jin, Jiafeng Guo, Yuanzhuo Wang, and Xueqi Cheng. NeuInfer: Knowledge
inference on N-ary facts. In Proceedings of the 58th Annual Meeting of the Association for
Computational Linguistics, pages 6141–6151, Online, July 2020. Association for Computational
Linguistics. doi: 10.18653/v1/2020.acl-main.546. URL https://aclanthology.org/2020.acl-main.546.
Will Hamilton, Payal Bajaj, Marinka Zitnik, Dan Jurafsky, and Jure Leskovec. Embedding logical
queries on knowledge graphs. Advances in neural information processing systems, 31, 2018.
Aidan Hogan, Eva Blomqvist, Michael Cochez, Claudia d’Amato, Gerard De Melo, Claudio Gutierrez,
Sabrina Kirrane, José Emilio Labra Gayo, Roberto Navigli, Sebastian Neumaier, et al. Knowledge
graphs. ACM Computing Surveys (CSUR), 54(4):1–37, 2021.
Drew A Hudson and Christopher D Manning. Gqa: A new dataset for real-world visual reasoning and
compositional question answering. In Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition (CVPR), pages 6700–6709, 2019.
Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, and
Ross Girshick. Clevr: A diagnostic dataset for compositional language and elementary visual
reasoning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), 2017.
Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint
arXiv:1412.6980, 2014.
Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen,
Yannis Kalantidis, Li-Jia Li, David A Shamma, et al. Visual genome: Connecting language and
vision using crowdsourced dense image annotations. In Proceedings of the 30th AAAI Conference
on Artificial Intelligence, pages 4088–4095, 2017.
Brenden Lake and Marco Baroni. Generalization without systematicity: On the compositional skills
of sequence-to-sequence recurrent networks. In International conference on machine learning,
pages 2873–2882. PMLR, 2018.
Natasha Noy, Yuqing Gao, Anshu Jain, Anant Narayanan, Alan Patterson, and Jamie Taylor. Industry-
scale knowledge graphs: Lessons and challenges: Five diverse technology companies show how
it’s done. Queue, 17(2):48–75, 2019.
Hongyu Ren and Jure Leskovec. Beta embeddings for multi-hop logical reasoning in knowledge
graphs. Advances in Neural Information Processing Systems, 33:19716–19726, 2020.
Hongyu Ren, Weihua Hu, and Jure Leskovec. Query2box: Reasoning over knowledge graphs in
vector space using box embeddings. arXiv preprint arXiv:2002.05969, 2020.
Hongyu Ren, Mikhail Galkin, Michael Cochez, Zhaocheng Zhu, and Jure Leskovec. Neural graph rea-
soning: Complex logical query answering meets graph databases. arXiv preprint arXiv:2303.14617,
2023.
Jorma Rissanen. Modeling by shortest data description. Automatica, 14(5):465–471, 1978.
Daniel Ruffinelli, Samuel Broscheit, and Rainer Gemulla. You can teach an old dog new tricks! on
training knowledge graph embeddings. In International Conference on Learning Representations,
2019.
Adam Santoro, David Raposo, David GT Barrett, Mateusz Malinowski, Razvan Pascanu, Peter
Battaglia, and Timothy Lillicrap. A simple neural network module for relational reasoning.
Advances in Neural Information Processing Systems, 30, 2017.
David Saxton, Edward Grefenstette, Felix Hill, and Pushmeet Kohli. Analysing mathematical
reasoning abilities of neural models. arXiv preprint arXiv:1904.01557, 2019.
Alane Suhr and Yoav Artzi. A corpus of natural language for visual reasoning. In Proceedings of the
55th Annual Meeting of the Association for Computational Linguistics (ACL), volume 1, pages
217–231, 2017.
Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard. Complex
embeddings for simple link prediction. In International conference on machine learning, pages
2071–2080. PMLR, 2016.
Denny Vrandečić and Markus Krötzsch. Wikidata: a free collaborative knowledgebase. Communica-
tions of the ACM, 57(10):78–85, 2014.
Quan Wang, Haifeng Wang, Yajuan Lyu, and Yong Zhu. Link prediction on n-ary relational facts: A
graph-based approach. In Findings of the Association for Computational Linguistics: ACL-IJCNLP
2021, pages 396–407, Online, August 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.findings-acl.35. URL https://aclanthology.org/2021.findings-acl.35.
Jianfeng Wen, Jianxin Li, Yongyi Mao, Shini Chen, and Richong Zhang. On the representation and
embedding of knowledge bases beyond binary relations. arXiv preprint arXiv:1604.08642, 2016.
Jason Weston, Antoine Bordes, Sumit Chopra, Alexander M. Rush, Bart van Merriënboer, Armand
Joulin, and Tomas Mikolov. Towards ai-complete question answering: A set of prerequisite toy
tasks. arXiv preprint arXiv:1502.05698, 2015.
Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. Embedding entities and
relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575, 2014.
Jiale Yang, Jinnan Li, and Yuke Zhu. A dataset and architecture for visual reasoning with a working
memory. arXiv preprint arXiv:1803.06092, 2018.
Xianda Zhang, Yifan Zhang, and Mirella Lapata. Metaqa: A dataset of metaphorically annotated
movieqa questions. In Proceedings of the 27th International Conference on Computational
Linguistics, pages 1954–1964, 2018.
7 Supplementary Material
Contents
Neurosymbolic methods aim to combine neural networks with symbolic representations. As men-
tioned in Section 5, several datasets already exist in the literature for evaluating the performance of
neurosymbolic methods. Table 4 highlights datasets widely used for benchmarking neurosymbolic
systems.
To make our work fully reproducible, we make the codebase of our experiments public and open. Our
code is available at https://github.com/thiviyanT/IntelliGraphs. For each experiment,
we also provide the hyperparameter configurations we used. Furthermore, we have released a new
Python package for interacting with the IntelliGraphs datasets through the following software package
repositories: conda (https://anaconda.org/thiv/intelligraphs) and pypi (https://pypi.
org/project/intelligraphs). To ensure long-term preservation and easy access, we made
the datasets available on Zenodo (https://doi.org/10.5281/zenodo.7824818). Experimental
details can be found in the next section.
We used the PyTorch library (https://pytorch.org/) to develop and test the models. All
experiments were performed on a single-node machine with an Intel(R) Xeon(R) Gold 5118
(2.30GHz, 12 cores) CPU and 64GB of RAM, with four NVIDIA RTX A4000 GPUs (16GB of
VRAM each). We used PyTorch's GPU acceleration for training the models. We used the Adam
optimiser with variable learning rates [Kingma and Ba, 2014].
Table 4: Brief comparison of commonly used datasets for benchmarking neurosymbolic methods,
listed in ascending order of publication year. For each dataset, we provide an overview of the task,
domain, modality, key characteristics, and whether the dataset is synthetic.
Dataset (Reference) | Task | Domain | Modality | Key characteristics | Synthetic
SNLI (Bowman et al. [2015]) | Logical Reasoning | Natural Language | Text | Entailment, contradiction, neutral relationships | No
CLEVR (Johnson et al. [2017]) | Visual Reasoning | Computer Vision | Images & Text | Object counting, comparison, querying attributes | Yes
NLVR (Suhr and Artzi [2017]) | Visual Reasoning | Computer Vision | Images & Text | Visual reasoning, natural language understanding | Yes
Sort-of-CLEVR (Santoro et al. [2017]) | Relational Reasoning | Computer Vision | Images & Text | Spatial and relational reasoning | Yes
Visual Genome (Krishna et al. [2017]) | Visual Reasoning | Computer Vision | Images & Text | Object recognition, relationships, attributes | No
Aristo (Clark et al. [2018]) | Science Reasoning | Natural Language | Text | Natural language understanding, applying knowledge | No
COG (Yang et al. [2018]) | Cognitive Capabilities | Computer Vision | Images & Text | Temporal and logical reasoning | Yes
MetaQA (Zhang et al. [2018]) | Multi-hop Reasoning | Graph | Knowledge Graph | Multi-step reasoning, knowledge base | No
SCAN (Lake and Baroni [2018]) | Compositional Generalization | Command-based Language | Text | Understanding and generating novel commands | No
Math Dataset (Saxton et al. [2019]) | Math Reasoning | Natural Language | Text | Language understanding, symbolic reasoning | No
GQA (Hudson and Manning [2019]) | Visual Reasoning | Computer Vision | Images & Text | Spatial and relational reasoning | No
ROAD-R (Giunchiglia et al. [2022]) | Visual Reasoning | Computer Vision | Videos & (handcrafted) Logical Rules | Logical reasoning | No
7.3.1 Hyperparameters
For each dataset, we performed hyperparameter sweeps for every baseline model (TransE, DistMult,
ComplEx) using Weights & Biases (https://wandb.ai/). For this, we used a random search strategy
with the goal of finding the hyperparameter configurations that yield the minimum compression bits
on the validation set. We do not include the reciprocal relation model, and we used the largest batch
size that we could fit in memory. Table 5 shows the hyperparameter values we obtained via the
sweeps. The random baseline did not require hyperparameter fine-tuning. We also used Weights &
Biases for monitoring our experiments.
Logical rules provide a formal framework for expressing and reasoning about the semantics of a
system. In this section, we discuss the logical rules we use to verify the semantics of the IntelliGraphs
datasets. Unless otherwise stated, we express each logical rule in First-Order Logic (FOL). We
opted for FOL as the formal language for communicating logical constraints due to
Table 5: The results of a random hyperparameter search, presenting the chosen hyperparameters
for different datasets and baseline models. The hyperparameters include batch size, embedding
size, learning rate, biases usage, and initialization method. The batch size indicates the number of
training subgraphs processed together before updating the model. The embedding size represents the
dimensionality of the entity and relation embeddings. The learning rate controls the step size taken
during model optimization. The biases denote whether bias terms are included in the model, and the
initialization method refers to the technique used to initialize the model’s parameters.
Dataset | Model | Batch Size | Emb. | Learning Rate | Biases | Init.
syn-paths | transe | 4096 | 1531 | 7.03e-05 | False | uniform
syn-paths | distmult | 4096 | 158 | 6.98e-02 | False | uniform
syn-paths | complex | 4096 | 587 | 5.26e-05 | False | uniform
syn-tipr | transe | 2048 | 147 | 8.72e-04 | True | normal
syn-tipr | distmult | 2048 | 168 | 5.50e-03 | True | normal
syn-tipr | complex | 2048 | 350 | 1.56e-03 | True | normal
syn-types | transe | 2048 | 376 | 3.02e-03 | True | uniform
syn-types | distmult | 2048 | 273 | 6.01e-04 | True | uniform
syn-types | complex | 2048 | 996 | 5.60e-05 | False | uniform
wd-movies | transe | 4096 | 68 | 6.38e-04 | False | normal
wd-movies | distmult | 4096 | 181 | 3.08e-03 | True | uniform
wd-movies | complex | 4096 | 102 | 1.95e-02 | False | uniform
wd-articles | transe | 32 | 888 | 6.09e-05 | True | normal
wd-articles | distmult | 32 | 65 | 3.83e-02 | False | uniform
wd-articles | complex | 32 | 283 | 2.25e-03 | False | normal
its ability to effectively express the necessary constraints and its widespread understanding within the
machine learning community. These FOL constraints can also be rewritten into data specification
languages, such as Datalog.
Although we provide general FOL rules that check the semantics of graphs of arbitrary length,
we apply a size constraint (i.e. we check for graphs with a fixed number of triples) for the synthetic
datasets. This is because the synthetic data generator produces graphs of fixed length, which we
defined as part of our semantics. The size constraint can also be expressed in FOL, but we specify
this constraint in natural language for brevity.
Traditionally, a reasoning engine is used to check logical consistency in knowledge bases. Instead,
we wrote a semantic checker in Python. This was more convenient to use within our framework, as
graphs could be evaluated without having to be loaded into a reasoning engine individually. Our
semantic checker was written to closely follow the logical rules, and it is accessible through the
IntelliGraphs Python package.
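To make this concrete, here is a minimal sketch of what one such rule check looks like, using the type constraint of the could_be_spoken_in relation as an example. The entity tables and function name below are illustrative stand-ins, not the actual API of the IntelliGraphs package.

```python
# Illustrative entity-type tables (hypothetical and abbreviated).
LANGUAGES = {"Dutch", "French", "Greek"}
COUNTRIES = {"Serbia", "Greece", "Norway"}

def check_could_be_spoken_in(triples):
    """Check the rule: could_be_spoken_in(x, y) => language(x) and country(y).
    Triples are (head, relation, tail) tuples; other relations are ignored."""
    for head, relation, tail in triples:
        if relation == "could_be_spoken_in":
            if head not in LANGUAGES or tail not in COUNTRIES:
                return False
    return True
```

The actual checker in the package applies one such function per rule and reports a subgraph as semantically valid only if every rule passes.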
7.4.2 Logical Rules of syn-types
7.4.5 Logical Rules of wd-articles
∃x : has_author(article_node, x)
∀x, y : connected(x, y) ⇔ has_author(x, y) ∨ has_name(x, y) ∨ has_order(x, y)∨
cites(x, y) ∨ has_subject(x, y) ∨ subclass_of(x, y)
∀x, y : connected(x, y) ⇒ ¬connected(y, x) ∨ cites(y, x)
∀x : ¬connected(x, x)
∀x, y : has_author(x, y) ⇒ x = article_node
article(article_node) ∨ iri(article_node)
∀x : has_author(article_node, x) ⇒ author_pos(x)
∀x : author_pos(x) ⇔ (∃y : has_order(x, y)) ∧ (∃y : has_name(x, y))
∀x, y : has_order(x, y) ⇒ author_pos(x) ∧ ordinal(y)
∀x, y : has_name(x, y) ⇒ author_pos(x) ∧ (name(y) ∨ iri(y))
∀x, y, z : has_order(x, y) ∧ has_order(x, z) ⇒ y = z
∀x, y, z : has_name(x, y) ∧ has_name(x, z) ⇒ y = z
∀x : author(x) ⇒ ¬subject(x) ∧ ¬iri(x) ∧ ¬name(x) ∧ ¬ordinal(x) ∧ ¬author_pos(x)
∀x : subject(x) ⇒ ¬author(x) ∧ ¬iri(x) ∧ ¬name(x) ∧ ¬ordinal(x) ∧ ¬author_pos(x)
∀x : iri(x) ⇒ ¬author(x) ∧ ¬subject(x) ∧ ¬name(x) ∧ ¬ordinal(x) ∧ ¬author_pos(x)
∀x : name(x) ⇒ ¬subject(x) ∧ ¬iri(x) ∧ ¬author(x) ∧ ¬ordinal(x) ∧ ¬author_pos(x)
∀x : ordinal(x) ⇒ ¬subject(x) ∧ ¬iri(x) ∧ ¬name(x) ∧ ¬author(x) ∧ ¬author_pos(x)
∀x : author_pos(x) ⇒ ¬subject(x) ∧ ¬iri(x) ∧ ¬name(x) ∧ ¬author(x) ∧ ¬ordinal(x)
∀x, y, z : subclass_trans(x, y) ∧ subclass_trans(y, z) ⇒ subclass_trans(x, z)
∀x, y : subclass_of(x, y) ⇒ subclass_trans(x, y)∧
(iri(x) ∨ subject(x)) ∧ (iri(y) ∨ subject(y))
∀x, y : subclass_of(x, y) ⇒ ∃z : subclass_trans(x, z) ∧ has_subject(article_node, z)
∀x, y : cites(x, y) ⇒ iri(y) ∧ x = article_node
∀x, y : has_subject(x, y) ⇒ (subject(y) ∨ iri(y)) ∧ x = article_node
In addition to the aforementioned rules for wd-articles, our semantic checker checks the ordinals
of the author positions to make sure that they form a complete list of consecutive numbers (i.e.
ordinal_000, ordinal_001, ordinal_002, etc.), but we leave this out of the rules for brevity.
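This ordinal check can be sketched as follows, assuming ordinal labels of the form ordinal_&lt;k&gt; as in the examples above (the function name is illustrative, not the package API):

```python
def ordinals_complete(ordinals):
    """Check that ordinal labels such as ordinal_000, ordinal_001, ... form
    a complete run of consecutive integers, with no gaps or repeats."""
    values = sorted(int(label.rsplit("_", 1)[1]) for label in ordinals)
    if not values:
        return True
    return values == list(range(values[0], values[0] + len(values)))
```
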
The synthetic dataset generator contains two main modules: 1) a triple sampler, which samples new
triples one by one, and 2) a triple verifier, which checks each triple for semantic validity before it is
added to a subgraph. The generator builds a subgraph by sampling one triple at a time and verifying
it. If the triple passes the semantic check, it is added to the subgraph. To avoid duplicate triples
within the same subgraph, we check whether a triple already exists before adding it. This is repeated
until a certain number of valid triples have been sampled. For reproducibility, we use the same seed
for all random data generation (seed=42). For each dataset, we generate training, validation and test
sets. To avoid data leakage, we check that these graphs are unique before splitting the dataset. In this
section, we briefly describe how IntelliGraphs efficiently samples valid subgraphs.
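The sampler/verifier loop can be sketched as follows. The two callables are supplied per dataset; the function signature here is an illustrative assumption, not the generator's actual interface.

```python
import random

def generate_subgraph(sample_triple, is_valid, num_triples, seed=42):
    """Sketch of the sampler/verifier loop: repeatedly sample a candidate
    triple, keep it only if it is not a duplicate and passes the dataset's
    semantic check, until enough valid triples are collected."""
    rng = random.Random(seed)
    subgraph = []
    while len(subgraph) < num_triples:
        triple = sample_triple(rng)
        if triple not in subgraph and is_valid(triple, subgraph):
            subgraph.append(triple)
    return subgraph
```

For example, a toy sampler over city pairs with a no-self-loop verifier yields a duplicate-free subgraph of the requested size.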
7.5.1 syn-paths
The entities are named after 49 Dutch cities, and the relations are different modes of transport
(train_to, drive_to, cycle_to). This dataset primarily checks whether baseline models can
do structure learning.
We denote by Πk (G) the set of all paths of G on k vertices (k ≥ 1). A path graph Pk (G) of a graph G
has vertex set Πk (G) and edges joining pairs of vertices that represent two paths Pk whose union
forms a path Pk+1 . We randomly sample n edges from Πk (G) to generate each path graph.
To generate a path graph, we begin by selecting a head (i.e. source node) by randomly picking a
Dutch city, and then we sample a relation and a tail (i.e. target node). For the next triple in the
subgraph, we use the previous target node as the source node and again sample a relation and a target
node. We repeat this last step k − 2 times to build a path graph with k edges. We ensure that each
subgraph includes all three different relations, and we avoid generating cyclic path graphs.
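The walk described above can be sketched as follows, assuming the requirement that all three relations occur is enforced by resampling (omitted here for brevity); the function is illustrative, not the actual generator code.

```python
import random

def sample_path_graph(cities, relations, k, seed=None):
    """Sample an acyclic path of k edges: each new triple starts from the
    previous target node, and visited nodes are never revisited."""
    rng = random.Random(seed)
    head = rng.choice(cities)
    visited = {head}
    triples = []
    for _ in range(k):
        tail = rng.choice([c for c in cities if c not in visited])
        triples.append((head, rng.choice(relations), tail))
        visited.add(tail)
        head = tail  # the previous target becomes the next source
    return triples
```
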
7.5.2 syn-types
This dataset contains three types of entities (cities, countries & languages), 30 entities in
total (10 instances of each entity type), and three relations (same_type_as, could_be_part_of &
could_be_spoken_in). This dataset primarily checks whether baseline models can learn the types
of entities correctly.
For each relation, we sample a head and a tail entity of the corresponding type. For instance, for the
relation could_be_spoken_in we sample a language for the head of a triple and a country for the
tail. Similarly, we sample other triples to be added to the same subgraph, until a certain number of
valid triples have been sampled.
It is important to note that the syn-types dataset is not meant to be factually accurate but rather
serves as a way to study the type semantics learned by machine learning models.
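A type-constrained sampler along these lines might look like the following. The abbreviated type tables and the relation signatures are illustrative stand-ins for the actual 30 entities and generator code.

```python
import random

# Hypothetical, abbreviated type tables; the real dataset has 10 entities per type.
TYPES = {
    "city": ["Paris", "Budapest", "Amsterdam"],
    "country": ["Norway", "Serbia", "Greece"],
    "language": ["Dutch", "French", "Greek"],
}

# Each relation constrains the entity types of its head and tail.
SIGNATURES = {
    "could_be_spoken_in": ("language", "country"),
    "could_be_part_of": ("city", "country"),
    "same_type_as": None,  # head and tail share one (arbitrary) type
}

def sample_typed_triple(relation, rng=random):
    """Sample head and tail entities of the types required by the relation."""
    signature = SIGNATURES[relation]
    if signature is None:  # same_type_as: pick a single type for both sides
        t = rng.choice(sorted(TYPES))
        signature = (t, t)
    head_type, tail_type = signature
    return (rng.choice(TYPES[head_type]), relation, rng.choice(TYPES[tail_type]))
```
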
7.5.3 syn-tipr
This dataset contains three entity types (names, roles, years) and the relations has_role,
has_name, has_time, start_year and end_year. We used a random name generator
(https://www.behindthename.com/random/) to generate 50 names. For simplicity, we treat
years as entities rather than literals. In each subgraph, there are two existential nodes: _academic
and _time. The main purpose of this dataset is to check structure learning and basic temporal
reasoning (in this case, whether end_year appears after start_year).
The subgraphs in this dataset were modelled after the time-indexed person role (tipr) pattern from the
Semantic Web. To generate these subgraphs, we take the tipr pattern as a template and randomly
sample entities of the correct entity type. For instance, the relation has_role always has the
_academic node in the head position of a triple and a role as its tail. Similarly, we sample triples for
the other relations (has_name, has_time, start_year, end_year). Valid triples containing every
relation are sampled. In total, every subgraph contains five triples.
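Filling the tipr template can be sketched as follows, under the assumption that years are plain integers; the function is an illustration, not the exact generator.

```python
import random

def sample_tipr_subgraph(names, roles, years, rng=random):
    """Fill the time-indexed person role (tipr) template with entities of
    the correct type, ensuring end_year comes after start_year."""
    start, end = sorted(rng.sample(years, 2))
    return [
        ("_academic", "has_name", rng.choice(names)),
        ("_academic", "has_role", rng.choice(roles)),
        ("_academic", "has_time", "_time"),
        ("_time", "start_year", start),
        ("_time", "end_year", end),
    ]
```
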
For reproducibility, we use a specific Wikidata dump to extract the data, rather than the live version.
For both datasets, we use the Wikidata HDT dump from 3 March 2021, available from the HDT
website (https://www.rdfhdt.org/datasets/).
In both cases, we first extract all data that fits the template of the graph; for instance, for every movie
we extract all actors, directors and genres. We then prune this data to ensure that every entity occurs in
enough instances to allow a model to learn a representation for it. Depending on the dataset, we either
remove the infrequent nodes or replace them with existential nodes. We set the minimal frequency to 6
in both datasets.
To avoid the situation where certain entity nodes are only present in the validation or test data, we
must make our splits carefully. Ideally, we would like each entity to be present in all three splits of
the data and, where this is not possible, to be present in at least the training data.
To achieve this, we use the following algorithm: for each instance, we collect “votes” among all its
entities for which of the three splits it should be part of. Simultaneously, for each entity, we collect
the splits of which it is a member. The aim is to have all entities in each instance vote for the same
split, and to have each entity represented in all splits. We alternately fix one of the two problems:
first, we unify the votes by choosing a random entity and setting the votes of the other entities in the
instance to that vote. After all votes have been fixed, we fix the split memberships: for each entity
that is not represented in all splits, we take its most frequent split and change the vote of one of its
instances to the missing split, repeating until all splits are represented.
We alternate these steps for 50 iterations. Then, in the first step, we move any instance with conflicting
votes to the training data, and repeat the iteration in this fashion for another 20 steps. For both datasets,
this leads to all entities being represented in the training data, and only a small number present in
only the test or only the training data.
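The voting procedure can be sketched, in heavily simplified form, as follows. This illustrates the idea only; the actual algorithm alternates vote unification and membership repair exactly as described above, and the function below is an assumption-laden stand-in.

```python
import random
from collections import defaultdict

def assign_splits(instances, iterations=50, seed=42):
    """Simplified sketch: each instance (a set of entity ids) holds one split
    vote; we repeatedly repair entities missing from some split, then force
    every entity into the training data as a final pass."""
    rng = random.Random(seed)
    splits = ("train", "valid", "test")
    votes = [rng.choice(splits) for _ in instances]

    for _ in range(iterations):
        membership = defaultdict(set)
        for vote, entities in zip(votes, instances):
            for entity in entities:
                membership[entity].add(vote)
        for entity, present in membership.items():
            missing = [s for s in splits if s not in present]
            if missing:
                # Move one instance containing this entity to a missing split.
                candidates = [i for i, ents in enumerate(instances) if entity in ents]
                votes[rng.choice(candidates)] = missing[0]

    # Final pass: any entity still absent from training is forced into it.
    for i, entities in enumerate(instances):
        in_train = {e for v, ents in zip(votes, instances) if v == "train" for e in ents}
        if not entities <= in_train:
            votes[i] = "train"
    return votes
```
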
For both datasets, the labels are Wikidata IRIs, but a mapping to human-readable labels is provided.
In this paper, we replace IRIs with these labels for readability.
7.6.1 wd-movies
We collect all entities that are labeled as “instance of” the class “film”. For each, we extract all
entities connected by the relations “cast member”, “director” and “genre” as its actors, directors and
genres, respectively.
We then prune the data by removing all actors, directors and genres that do not appear in at least 6
instances. We then remove any movies that are left with no actors or no directors. We allow movies
with no genres. We iterate these two steps until no changes are made. Finally, we make a test, train
and validation split by the process described above. The following Wikidata properties and entities
were used:
label | Wikidata IRI
instance of | http://www.wikidata.org/prop/P31
film | http://www.wikidata.org/entity/Q11424
cast member | http://www.wikidata.org/prop/P161
director | http://www.wikidata.org/prop/P57
genre | http://www.wikidata.org/prop/P136
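The iterative pruning described above can be sketched as follows. The movie dictionaries and the min_freq threshold mirror the description; everything else (names, structure) is illustrative.

```python
from collections import Counter

FIELDS = ("actors", "directors", "genres")

def prune(movies, min_freq=6):
    """Sketch of the pruning loop: drop entities occurring in fewer than
    min_freq movies, then drop movies left with no actors or no directors
    (movies with no genres are allowed), until nothing changes. Each movie
    is a dict with 'actors', 'directors' and 'genres' lists."""
    while True:
        counts = Counter()
        for movie in movies:
            for entity in {e for field in FIELDS for e in movie[field]}:
                counts[entity] += 1
        pruned = []
        for movie in movies:
            kept = {field: [e for e in movie[field] if counts[e] >= min_freq]
                    for field in FIELDS}
            if kept["actors"] and kept["directors"]:
                pruned.append(kept)
        if pruned == movies:
            return movies
        movies = pruned
```

Note that dropping a movie can push another entity below the threshold, which is why the loop runs until a fixed point is reached.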
7.6.2 wd-articles
We collect all entities from Wikidata that are the object of a triple with the relation “cites”.
For each article, we collect the full list of authors, using the relations “author” and “author name
string”. The former is used for authors that are represented in Wikidata as an entity, and the latter
for authors represented only by their name as a string literal. We require at least one of the authors
to be represented by an entity; if not, the article is filtered out.
Such statements are commonly annotated in Wikidata with an ordinal, representing the order of the
author in the author list. We extract these as well. If any author does not have an ordinal, or if the
collection of these ordinals does not coincide exactly with the sequence 1, . . . , n, with n the number
of authors, the article is filtered out.
We then collect all articles that, as recorded in Wikidata, the current article cites. If there are no such
references, the article is filtered out.
Finally, we collect the article's subjects and, for each subject, every superclass and its superclasses
that are instances of “academic discipline”. We do not filter based on the subjects (articles with no
subjects or superclasses are allowed).
We collect the first 100 000 such articles for the dataset wd-articles, and all such articles for the
dataset wd-articles-large.
As with wd-movies, we prune the data to eliminate any entities that occur in fewer than 6 instances.
For the authors, the article itself and the subjects, we replace these with existential nodes. These have
node labels specific to the role they play in the graph: _article, _author001, and _subject001.
Any references to infrequent entities are removed. As before, this removal process is iterated until
the dataset stabilizes.
Splits are then made using the algorithm described above. In the construction of the dataset, we
add authors by introducing a blank node (with the label _authorpos and the relation has_author),
to which the author identity (has_name) and the ordinal (has_order) are connected. References
are added by a single edge with the relation cites, and subjects and superclasses with the relations
has_subject and subclass_of.
syn-paths
[Nieuwegein drive_to Lelystad, Lelystad drive_to IJmuiden, IJmuiden cycle_to Zaanstad]
[IJmuiden cycle_to Maastricht, Maastricht train_to Roermond, Roermond drive_to Groningen]
[Hilversum cycle_to Emmen, Emmen drive_to Spijkenisse, Spijkenisse train_to Sittard]
syn-tipr
[_academic has_name Cleophas Erős, _academic has_role masters researcher, _academic has_time _time, _time start_year
2016, _time end_year 2018]
[_academic has_name Romana Sitk, _academic has_role professor, _academic has_time _time, _time start_year 1982, _time
end_year 2009]
[_academic has_name Drusus Krejči, _academic has_role assistant professor, _academic has_time _time, _time start_year
1996, _time end_year 2000]
[_academic has_name Božidar Bullard, _academic has_role professor, _academic has_time _time, _time start_year 1973, _time
end_year 1988]
syn-types
[Dutch same_type_as English, Budapest could_be_part_of United Kingdom, Czech could_be_spoken_in Serbia]
[Serbia same_type_as Spain, Paris could_be_part_of Norway, Dutch could_be_spoken_in Greece]
[Greek same_type_as Italian, Budapest could_be_part_of Ireland, French could_be_spoken_in Serbia]
wd-movies
[_movie has_director P. Pullaiah, _movie has_actor Gummadi Venkateswara Rao, _movie has_actor Akkineni Nageswara Rao,
_movie has_actor Anjali Devi, _movie has_actor Chittoor Nagaiah, _movie has_actor Ramana Reddy, _movie has_actor Relangi
Venkata Ramaiah, _movie has_actor S. V. Ranga Rao, _movie has_actor Santha Kumari, _movie has_genre historical film,
_movie has_genre biographical film]
[_movie has_director Albert Brooks, _movie has_actor Kathryn Harrold, _movie has_actor Albert Brooks, _movie has_actor
Bruno Kirby, _movie has_genre comedy film]
[_movie has_director Dragoslav Lazić, _movie has_actor Vesna Malohodžić, _movie has_actor Snežana Savić, _movie
has_genre comedy film]
[_movie has_director Balu Mahendra, _movie has_actor Silk Smitha, _movie has_actor Sridevi, _movie has_actor Kamal
Haasan, _movie has_genre romance film]
wd-articles
[_article has_author _authorpos000, _authorpos000 has_name _author000, _authorpos000 has_order ordinal_001,
_article has_author _authorpos001, _authorpos001 has_name _author001, _authorpos001 has_order ordinal_002, _article
has_author _authorpos002, _authorpos002 has_name _author002, _authorpos002 has_order ordinal_003, _article
has_author _authorpos003, _authorpos003 has_name _author003, _authorpos003 has_order ordinal_004, _article
has_author _authorpos004, _authorpos004 has_name _author004, _authorpos004 has_order ordinal_005, _article has_author
_authorpos005, _authorpos005 has_name _author005, _authorpos005 has_order ordinal_006, _article has_author _authorpos006,
_authorpos006 has_name _author006, _authorpos006 has_order ordinal_007, _article has_author _authorpos007, _authorpos007
has_name _author007, _authorpos007 has_order ordinal_008, _article cites http://www.wikidata.org/entity/Q25938995,
_article cites http://www.wikidata.org/entity/Q28242060, _article cites http://www.wikidata.org/entity/Q28286732,
_article cites http://www.wikidata.org/entity/Q34453213, _article cites http://www.wikidata.org/entity/Q34541710,
_article cites http://www.wikidata.org/entity/Q35758845, _article cites http://www.wikidata.org/entity/Q37942996,
_article cites http://www.wikidata.org/entity/Q37972005, _article cites http://www.wikidata.org/entity/Q42642132,
_article has_subject http://www.wikidata.org/entity/Q214781, http://www.wikidata.org/entity/Q214781 subclass_of
http://www.wikidata.org/entity/Q413, _article has_author _authorpos000, _authorpos000 has_name _author000, _authorpos000
has_order ordinal_001, _article has_author _authorpos001, _authorpos001 has_name _author001, _authorpos001 has_order
ordinal_002, _article has_author _authorpos002, _authorpos002 has_name _author002, _authorpos002 has_order ordinal_003,
_article has_author _authorpos003, _authorpos003 has_name http://www.wikidata.org/entity/Q41896189, _authorpos003
has_order ordinal_004, _article has_author _authorpos004, _authorpos004 has_name _author003, _authorpos004 has_order
ordinal_005, _article has_author _authorpos005, _authorpos005 has_name _author004, _authorpos005 has_order ordinal_006,
_article cites http://www.wikidata.org/entity/Q29547376, _article cites http://www.wikidata.org/entity/Q30655427,
_article cites http://www.wikidata.org/entity/Q53584979, _article has_subject http://www.wikidata.org/entity/Q13100823,
_article has_subject http://www.wikidata.org/entity/Q16, _article has_subject http://www.wikidata.org/entity/Q180507,
_article has_subject http://www.wikidata.org/entity/Q183]
Figure 2: IntelliGraphs contains five datasets: syn-paths, syn-tipr, syn-types, wd-movies, and
wd-articles. Here we showcase a few example subgraphs from each dataset. The subgraphs are
presented as lists of triples, where each list item represents a subgraph.
Figure 2 showcases a selection of example subgraphs from each dataset: syn-paths, syn-tipr,
syn-types, wd-movies, and wd-articles.
8 Datacard
An up-to-date version of the data card can be found on https://github.com/thiviyanT/
IntelliGraphs/blob/main/Datacard.md.