Tutorial: Build a Knowledge Graph using NLP and Ontologies
Goals
This guide shows how to build and query a Knowledge Graph of entities extracted using APOC NLP
procedures and Ontologies extracted using neosemantics.
Prerequisites
You should have a basic understanding of the property graph model. Having Neo4j Desktop downloaded
and installed will allow you to code along with the examples.
Introduction
In this tutorial we’re going to build a Software Knowledge Graph based on:
Articles taken from dev.to, a developer blogging platform, and the entities extracted (using NLP
techniques) from those articles.
Software ontologies extracted from Wikidata, the free and open knowledge base that acts as
central storage for the structured data of Wikipedia.
Once we’ve done that we’ll learn how to query the Knowledge Graph to find interesting insights that
are enabled by combining NLP and Ontologies.
The queries and data used in this guide can be found in the neo4j-examples/nlp-knowledge-graph GitHub repository.
Video
Jesús Barrasa and Mark Needham presented a talk based on this tutorial at the Neo4j Connections:
Knowledge Graphs event on 25th August 2020. The video from the talk is available below:
Tools
We’re going to use a couple of plugin libraries in this tutorial, so you’ll need to install those if you want
to follow along with the examples.
APOC
APOC is a library of more than 450 procedures and functions providing functionality for utilities,
conversions, graph updates, and more. We’re going to use this tool to scrape web
pages and apply NLP techniques on text data.
We can install APOC from the plugins section of a database in the Neo4j Desktop:
We’ll also need to install the APOC NLP Dependencies jar from GitHub releases. At the time of writing,
the latest version of APOC is 4.0.0.18, so we need to download apoc-nlp-dependencies-4.0.0.18.jar
from the 4.0.0.18 release page. Once we’ve downloaded that file, we need to place it in the plugins
directory of our database.
neosemantics (n10s)
neosemantics is a plugin that enables the use of RDF and its associated
vocabularies like OWL, RDFS, SKOS, and others in Neo4j. We’re going to use this tool
to import ontologies into Neo4j.
neosemantics only supports the Neo4j 4.0.x and 3.5.x series. It does not yet support the Neo4j 4.1.x series.
We can install neosemantics by following the instructions in the project installation guide.
What is a Knowledge Graph?
There are many different definitions of Knowledge Graphs. In this tutorial, the definition of a
Knowledge Graph is a graph that contains the following:
Facts
Instance data. This would include graph data imported from any data source and could be
structured (e.g. JSON/XML) or semi-structured (e.g. HTML)
Explicit Knowledge
Explicit description of how instance data relates. This comes from ontologies, taxonomies, or any
kind of metadata definition.
Wikidata is a free and open knowledge base that can be read and edited by both
humans and machines. It acts as central storage for the structured data of its
Wikimedia sister projects including Wikipedia, Wikivoyage, Wiktionary, Wikisource, and
others.
Wikidata provides a SPARQL API that lets users query the data directly. The screenshot below
shows an example of a SPARQL query along with the results from running that query:
The query describes that entity as far as it can. If we run the query, we get a stream of triples (subject, predicate, object).
We’re now going to learn how to import Wikidata into Neo4j using neosemantics.
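Before n10s can import any RDF we need a uniqueness constraint on Resource nodes and a graph configuration. A minimal setup, assuming the mapping-based handling of vocabulary URIs that the mappings below rely on, looks like this:

// required by n10s: every imported resource is keyed by its URI
CREATE CONSTRAINT n10s_unique_uri ON (r:Resource) ASSERT r.uri IS UNIQUE;
// initialise the graph config so that vocabulary terms can be mapped to our own names
CALL n10s.graphconfig.init({handleVocabUris: "MAP"});

With that in place, we can add a prefix for our custom vocabulary and map its terms onto the relationship types we want in the graph: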
CALL n10s.nsprefixes.add('neo','neo4j://voc#');
CALL n10s.mapping.add("neo4j://voc#subCatOf","SUB_CAT_OF");
CALL n10s.mapping.add("neo4j://voc#about","ABOUT");
Now we’re going to import the Wikidata taxonomies. We can get an importable URL directly from the
Wikidata SPARQL API, by clicking on the Code button:
Figure 4. Getting an importable URI for a Wikidata SPARQL query
We then pass that URL to the n10s.rdf.import.fetch procedure, which will import the stream of
triples into Neo4j.
The examples below contain queries that import taxonomies starting from Software Systems,
Programming Languages, and Data Formats.
WITH "https://query.wikidata.org/sparql?
query=prefix%20neo%3A%20%3Cneo4j%3A%2F%2Fvoc%23%3E%20%0A%23Cats%0A%23SELECT%20%3Fitem%20%3Flabe
l%20%0ACONSTRUCT%20%7B%0A%3Fitem%20a%20neo%3ACategory%20%3B%20neo%3AsubCatOf%20%3FparentItem%20
.%20%20%0A%20%20%3Fitem%20neo%3Aname%20%3Flabel%20.%0A%20%20%3FparentItem%20a%20neo%3ACategory%
3B%20neo%3Aname%20%3FparentLabel%20.%0A%20%20%3Farticle%20a%20neo%3AWikipediaPage%3B%20neo%3Aab
out%20%3Fitem%20%3B%0A%20%20%20%20%20%20%20%20%20%20%20%0A%7D%0AWHERE%20%0A%7B%0A%20%20%3Fitem%
20(wdt%3AP31%7Cwdt%3AP279)*%20wd%3AQ2429814%20.%0A%20%20%3Fitem%20wdt%3AP31%7Cwdt%3AP279%20%3Fp
arentItem%20.%0A%20%20%3Fitem%20rdfs%3Alabel%20%3Flabel%20.%0A%20%20filter(lang(%3Flabel)%20%3D
%20%22en%22)%0A%20%20%3FparentItem%20rdfs%3Alabel%20%3FparentLabel%20.%0A%20%20filter(lang(%3Fp
arentLabel)%20%3D%20%22en%22)%0A%20%20%0A%20%20OPTIONAL%20%7B%0A%20%20%20%20%20%20%3Farticle%20
schema%3Aabout%20%3Fitem%20%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20schema%3AinLanguage%20%22e
n%22%20%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20schema%3AisPartOf%20%3Chttps%3A%2F%2Fptop.only.wip.la%3A443%2Fhttps%2Fen.wikipe
dia.org%2F%3E%20.%0A%20%20%20%20%7D%0A%20%20%0A%7D" AS softwareSystemsUri
CALL n10s.rdf.import.fetch(softwareSystemsUri, 'Turtle' , { headerParams: { Accept:
"application/x-turtle" } })
YIELD terminationStatus, triplesLoaded, triplesParsed, namespaces, callParams
RETURN terminationStatus, triplesLoaded, triplesParsed, namespaces, callParams;
Table 1. Results
WITH "https://query.wikidata.org/sparql?
query=prefix%20neo%3A%20%3Cneo4j%3A%2F%2Fvoc%23%3E%20%0A%23Cats%0A%23SELECT%20%3Fitem%20%3Flabe
l%20%0ACONSTRUCT%20%7B%0A%3Fitem%20a%20neo%3ACategory%20%3B%20neo%3AsubCatOf%20%3FparentItem%20
.%20%20%0A%20%20%3Fitem%20neo%3Aname%20%3Flabel%20.%0A%20%20%3FparentItem%20a%20neo%3ACategory%
3B%20neo%3Aname%20%3FparentLabel%20.%0A%20%20%3Farticle%20a%20neo%3AWikipediaPage%3B%20neo%3Aab
out%20%3Fitem%20%3B%0A%20%20%20%20%20%20%20%20%20%20%20%0A%7D%0AWHERE%20%0A%7B%0A%20%20%3Fitem%
20(wdt%3AP31%7Cwdt%3AP279)*%20wd%3AQ9143%20.%0A%20%20%3Fitem%20wdt%3AP31%7Cwdt%3AP279%20%3Fpare
ntItem%20.%0A%20%20%3Fitem%20rdfs%3Alabel%20%3Flabel%20.%0A%20%20filter(lang(%3Flabel)%20%3D%20
%22en%22)%0A%20%20%3FparentItem%20rdfs%3Alabel%20%3FparentLabel%20.%0A%20%20filter(lang(%3Fpare
ntLabel)%20%3D%20%22en%22)%0A%20%20%0A%20%20OPTIONAL%20%7B%0A%20%20%20%20%20%20%3Farticle%20sch
ema%3Aabout%20%3Fitem%20%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20schema%3AinLanguage%20%22en%2
2%20%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20schema%3AisPartOf%20%3Chttps%3A%2F%2Fptop.only.wip.la%3A443%2Fhttps%2Fen.wikipedia
.org%2F%3E%20.%0A%20%20%20%20%7D%0A%20%20%0A%7D" AS programmingLanguagesUri
CALL n10s.rdf.import.fetch(programmingLanguagesUri, 'Turtle' , { headerParams: { Accept:
"application/x-turtle" } })
YIELD terminationStatus, triplesLoaded, triplesParsed, namespaces, callParams
RETURN terminationStatus, triplesLoaded, triplesParsed, namespaces, callParams;
Table 2. Results
WITH "https://query.wikidata.org/sparql?
query=prefix%20neo%3A%20%3Cneo4j%3A%2F%2Fvoc%23%3E%20%0A%23Cats%0A%23SELECT%20%3Fitem%20%3Flabe
l%20%0ACONSTRUCT%20%7B%0A%3Fitem%20a%20neo%3ACategory%20%3B%20neo%3AsubCatOf%20%3FparentItem%20
.%20%20%0A%20%20%3Fitem%20neo%3Aname%20%3Flabel%20.%0A%20%20%3FparentItem%20a%20neo%3ACategory%
3B%20neo%3Aname%20%3FparentLabel%20.%0A%20%20%3Farticle%20a%20neo%3AWikipediaPage%3B%20neo%3Aab
out%20%3Fitem%20%3B%0A%20%20%20%20%20%20%20%20%20%20%20%0A%7D%0AWHERE%20%0A%7B%0A%20%20%3Fitem%
20(wdt%3AP31%7Cwdt%3AP279)*%20wd%3AQ24451526%20.%0A%20%20%3Fitem%20wdt%3AP31%7Cwdt%3AP279%20%3F
parentItem%20.%0A%20%20%3Fitem%20rdfs%3Alabel%20%3Flabel%20.%0A%20%20filter(lang(%3Flabel)%20%3
D%20%22en%22)%0A%20%20%3FparentItem%20rdfs%3Alabel%20%3FparentLabel%20.%0A%20%20filter(lang(%3F
parentLabel)%20%3D%20%22en%22)%0A%20%20%0A%20%20OPTIONAL%20%7B%0A%20%20%20%20%20%20%3Farticle%2
0schema%3Aabout%20%3Fitem%20%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20schema%3AinLanguage%20%22
en%22%20%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20schema%3AisPartOf%20%3Chttps%3A%2F%2Fptop.only.wip.la%3A443%2Fhttps%2Fen.wikip
edia.org%2F%3E%20.%0A%20%20%20%20%7D%0A%20%20%0A%7D" AS dataFormatsUri
CALL n10s.rdf.import.fetch(dataFormatsUri, 'Turtle' , { headerParams: { Accept: "application/x-
turtle" } })
YIELD terminationStatus, triplesLoaded, triplesParsed, namespaces, callParams
RETURN terminationStatus, triplesLoaded, triplesParsed, namespaces, callParams;
Table 3. Results
Let’s have a look at what’s been imported. We can get an overview of the contents of our database by
running the following query:
CALL apoc.meta.stats()
YIELD labels, relTypes, relTypesCount
RETURN labels, relTypes, relTypesCount;
Table 4. Results
Any labels or relationship types that start with an underscore can be ignored, as they represent metadata
created by the n10s library.
We can see that we’ve imported over 2,000 Category nodes and 1,700 WikipediaPage nodes.
Every node that we create using n10s will have a Resource label, which is why we have over 4,000
nodes with this label.

We also have more than 7,000 SUB_CAT_OF relationships connecting the Category nodes, and 3,000
ABOUT relationships connecting the WikipediaPage nodes to the Category nodes.
Let’s now have a look at some of the actual data that we’ve imported. We can look at the sub
categories of the version control node by running the following query:
Finding sub categories of version control
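A sketch of that query, assuming the category node is stored with the name "version control" (the LIMIT is illustrative), looks like this:

// traverse SUB_CAT_OF relationships from children up to the "version control" category
MATCH path = (c:Category {name: "version control"})<-[:SUB_CAT_OF*]-(child:Category)
RETURN path
LIMIT 50;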
[Graph visualization: the version control category and its sub-categories, connected by SUB_CAT_OF relationships, including Rational Synergy, Pijul, Sun WorkShop TeamWare, Razor, GNU Bazaar, FishEye, MKS Integrity, Fossil, Dat, BitKeeper, Source Code Control System, and Autodesk Vault.]
So far, so good!
dev.to is a developer blogging platform that contains articles on a variety of topics, including
databases, JavaScript frameworks, the latest AWS API, chatbots, and more. A screenshot of the home
page is shown below:
We’re going to import some articles from dev.to into Neo4j. articles.csv contains a list of 30
articles of interest. We can query this file using Cypher’s LOAD CSV clause:
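A sketch of that query, pointing at the copy of articles.csv in the neo4j-examples/nlp-knowledge-graph repository (the same URL used by the apoc.periodic.iterate call later in this guide), looks like this:

// preview the rows in the CSV file of dev.to article URIs
LOAD CSV WITH HEADERS FROM 'https://github.com/neo4j-examples/nlp-knowledge-graph/raw/master/import/articles.csv' AS row
RETURN row
LIMIT 10;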
row
{uri: "https://dev.to/lirantal/securing-a-nodejs--rethinkdb--tls-setup-on-docker-containers"}
{uri: "https://dev.to/setevoy/neo4j-running-in-kubernetes-e4p"}
{uri: "https://dev.to/divyanshutomar/introduction-to-redis-3m2a"}
{uri: "https://dev.to/zaiste/15-git-commands-you-may-not-know-4a8j"}
{uri: "https://dev.to/michelemauro/atlassian-sunsetting-mercurial-support-in-bitbucket-2ga9"}
{uri: "https://dev.to/rootsami/rancher-kubernetes-on-openstack-using-terraform-1ild"}
{uri: "https://dev.to/jignesh_simform/comparing-mongodb--mysql-bfa"}
We’re going to use APOC’s apoc.load.html procedure to scrape the interesting information from
each of these URIs. Let’s first see how to use this procedure on a single article, as shown in the
following query:
1
MERGE (a:Article {uri: "https://dev.to/lirantal/securing-a-nodejs--rethinkdb--tls-setup-on-docker-containers"})
WITH a
2
CALL apoc.load.html(a.uri, {
  body: 'body div.spec__body p',
  title: 'h1',
  time: 'time'
})
YIELD value
UNWIND value.body AS item
3
WITH a,
     apoc.text.join(collect(item.text), '') AS body,
     value.title[0].text AS title,
     value.time[0].attributes.datetime AS date
4
SET a.body = body, a.title = title, a.datetime = datetime(date)
RETURN a;
1 Create node with Article label and uri property if it doesn’t already exist
2 Scrape data from the URI using the provided CSS selectors
3 Post-process the scraped values to build the body text, title, and date
4 Update the Article node with the body, title, and datetime properties
Table 6. Results
(:Article {processed: TRUE, datetime: 2017-08-21T18:41:06Z, title: "Securing a Node.js + RethinkDB + TLS setup on
Docker containers", body: "We use RethinkDB at work across different projects. It isn’t used for any sort of big-data
applications, but rather as a NoSQL database, which spices things up with real-time updates, and relational tables
support.RethinkDB features an officially supported Node.js driver, as well as a community-maintained driver as well
called rethinkdbdash which is promises-based, and provides connection pooling. There is also a database migration tool
called rethinkdb-migrate that aids in managing database changes such as schema changes, database seeding, tear up
and tear down capabilities.We’re going to use the official RethinkDB docker image from the docker hub and make use of
docker-compose.yml to spin it up (later on you can add additional services to this setup).A fair example for docker-
compose.yml:The compose file mounts a local tls directory as a mapped volume inside the container. The tls/ directory
will contain our cert files, and the compose file is reflecting this.To setup a secure connection we need to facilitate it
using certificates so an initial technical step:Important notes:Update the compose file to include a command
configuration that starts the RethinkDB process with all the required SSL configurationImportant notes:You’ll notice
there isn’t any cluster related configuration but you can add them as well if you need to so they can join the SSL
connection: — cluster-tls — cluster-tls-key /tls/key.pem — cluster-tls-cert /tls/cert.pem — cluster-tls-ca /tls/ca.pemThe
RethinkDB drivers support an ssl optional object which either sets the certificate using the ca property, or sets the
rejectUnauthorized property to accept or reject self-signed certificates when connecting. A snippet for the ssl
configuration to pass to the driver:Now that the connection is secured, it only makes sense to connect using a
user/password which are not the default.To set it up, update the compose file to also include the — initial-password
argument so you can set the default admin user’s password. For example:Of course you need to append this argument
to the rest of the command line options in the above compose file.Now, update the Node.js driver settings to use a user
and password to connect:Congratulations! You’re now eligible to “Ready for Production stickers.Don’t worry, I already
mailed them to your address.", uri: "https://dev.to/lirantal/securing-a-nodejs--rethinkdb--tls-setup-on-docker-
containers"})
Now we’re going to import the other articles.

We’ll use the apoc.periodic.iterate procedure so that we can parallelise this process. This procedure
takes in a data driven statement and an operation statement:
The data driven statement contains a stream of items to process, which will be the stream of URIs.
The operation statement defines what to do to each of these items, which will be to call
apoc.load.html and create nodes with the Article label.
The final parameter is for providing config. We’re going to tell the procedure to process these items in
batches of 5 that can be run concurrently.
CALL apoc.periodic.iterate(
"LOAD CSV WITH HEADERS FROM 'https://github.com/neo4j-examples/nlp-knowledge-
graph/raw/master/import/articles.csv' AS row
RETURN row",
"MERGE (a:Article {uri: row.uri})
WITH a
CALL apoc.load.html(a.uri, {
body: 'body div.spec__body p',
title: 'h1',
time: 'time'
})
YIELD value
UNWIND value.body AS item
WITH a,
apoc.text.join(collect(item.text), '') AS body,
value.title[0].text AS title,
value.time[0].attributes.datetime AS date
SET a.body = body , a.title = title, a.datetime = datetime(date)",
{batchSize: 5, parallel: true}
)
YIELD batches, total, timeTaken, committedOperations
RETURN batches, total, timeTaken, committedOperations;
Table 7. Results
batches total timeTaken committedOperations
7 32 15 32
On the left we have the Wikidata taxonomy graph, which represents the explicit knowledge in our
Knowledge Graph. And on the right we have the articles graph, which represents the facts in our
Knowledge Graph. We want to join these two graphs together, which we will do using NLP techniques.
In April 2020, the APOC standard library added procedures that wrap the NLP APIs
of each of the big cloud providers - AWS, GCP, and Azure. These procedures
extract text from a node property and then send that text to APIs that extract entities,
key phrases, categories, or sentiment.
We’re going to use the GCP Entity Extraction procedures on our articles. The GCP NLP API returns
Wikipedia pages for entities where those pages exist. To call the API we need a GCP API key with
access to the Natural Language API; once we have one, we’ll create a parameter that contains it:
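In Neo4j Browser the parameter can be set like this (the key value is a placeholder for your own GCP API key):

:param key => "<insert-your-gcp-api-key>"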
We’re going to use the apoc.nlp.gcp.entities.stream procedure, which will return a stream of
entities found for the text content contained in a node property. Before running this procedure
against all of the articles, let’s run it against one of them to see what data is returned:
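A sketch of that query, mirroring the parameters used by the batch query later in this guide ($key is the API key parameter set above), looks like this:

// stream the entities that GCP extracts from the body of a single article
MATCH (a:Article {uri: "https://dev.to/lirantal/securing-a-nodejs--rethinkdb--tls-setup-on-docker-containers"})
CALL apoc.nlp.gcp.entities.stream(a, {
  nodeProperty: 'body',
  key: $key
})
YIELD node, value
UNWIND value.entities AS entity
RETURN entity
LIMIT 5;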
Table 8. Results
entity
{name: "connection", salience: 0.04166339, metadata: {}, type: "OTHER", mentions: [{type: "COMMON", text: {content:
"connection", beginOffset: -1}}, {type: "COMMON", text: {content: "connection", beginOffset: -1}}]}
{name: "work", salience: 0.028608896, metadata: {}, type: "OTHER", mentions: [{type: "COMMON", text: {content: "work",
beginOffset: -1}}]}
Each row contains a name property that describes the entity. salience is an indicator of the
importance or centrality of that entity to the entire document text.
Some entities also contain a Wikipedia URL, which is found via the metadata.wikipedia_url key.
The first entity, RethinkDB, is the only entity in this list that has such a URL. We’re going to filter the
rows returned to only include ones that have a Wikipedia URL, and we’ll then connect the Article
nodes to the WikipediaPage nodes that have that URL.
Let’s have a look at how we’re going to do this for one article:
MATCH (a:Article {uri: "https://dev.to/lirantal/securing-a-nodejs--rethinkdb--tls-setup-on-docker-containers"})
CALL apoc.nlp.gcp.entities.stream(a, {
  nodeProperty: 'body',
  key: $key
})
1
YIELD node, value
UNWIND value.entities AS entity
2
WITH entity, node
WHERE not(entity.metadata.wikipedia_url is null)
3
MERGE (page:Resource {uri: entity.metadata.wikipedia_url})
SET page:WikipediaPage
4
MERGE (node)-[:HAS_ENTITY]->(page)
3 Find a node that matches the Wikipedia URL. Create one if it doesn’t already exist.
4 Create a HAS_ENTITY relationship between the Article node and WikipediaPage

We can see how running this query connects the article and taxonomy sub graphs by looking at the
following Neo4j Browser visualization:
[Neo4j Browser visualization: the "Securing a Node.js + RethinkDB + TLS setup on Docker containers" article has HAS_ENTITY relationships to Wikipedia pages, which have ABOUT relationships to the RethinkDB and NoSQL database management system categories; those categories have SUB_CAT_OF relationships to categories such as distributed database management system and free software.]
Now we can run the entity extraction technique over the rest of the articles with help from the
apoc.periodic.iterate procedure again:
CALL apoc.periodic.iterate(
"MATCH (a:Article)
WHERE not(exists(a.processed))
RETURN a",
"CALL apoc.nlp.gcp.entities.stream([item in $_batch | item.a], {
nodeProperty: 'body',
key: $key
})
YIELD node, value
SET node.processed = true
WITH node, value
UNWIND value.entities AS entity
WITH entity, node
WHERE not(entity.metadata.wikipedia_url is null)
MERGE (page:Resource {uri: entity.metadata.wikipedia_url})
SET page:WikipediaPage
MERGE (node)-[:HAS_ENTITY]->(page)",
{batchMode: "BATCH_SINGLE", batchSize: 10, params: {key: $key}})
YIELD batches, total, timeTaken, committedOperations
RETURN batches, total, timeTaken, committedOperations;
Semantic Search
The first query that we’re going to do is semantic search. The n10s.inference.nodesInCategory
procedure lets us search from a top level category, finding all its transitive sub categories, and then
returns nodes attached to any of those categories.
In our graph the nodes connected to category nodes are WikipediaPage nodes. We’ll therefore
need to add an extra MATCH clause to our query to find the connected articles via the HAS_ENTITY
relationship type. We can see how to do this in the following query:
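A sketch of that query, assuming the top level category is stored with the name "NoSQL database management system", looks like this:

// find all nodes attached to the category or any of its transitive sub-categories,
// then follow HAS_ENTITY back to the articles
MATCH (c:Category {name: "NoSQL database management system"})
CALL n10s.inference.nodesInCategory(c, {inCatRel: "ABOUT", subCatRel: "SUB_CAT_OF"})
YIELD node
MATCH (article:Article)-[:HAS_ENTITY]->(node)
RETURN article.uri AS uri, article.title AS title, article.datetime AS date
ORDER BY date DESC;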
Although we’ve searched for NoSQL, we can see from the results that a couple of articles don’t link
directly to that category. For example, we have a couple of articles about Apache Zookeeper. We can
see how this category is connected to NoSQL by writing the following query:
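A variable-length path query shows the connection; the sketch below assumes the categories are stored with the names "Apache ZooKeeper" and "NoSQL database management system":

// walk up the SUB_CAT_OF hierarchy from Apache ZooKeeper to the NoSQL category
MATCH path = (zookeeper:Category {name: "Apache ZooKeeper"})-[:SUB_CAT_OF*]->(nosql:Category {name: "NoSQL database management system"})
RETURN path;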
So Apache Zookeeper is actually a couple of levels away from the NoSQL category.
Similar Articles
Another thing that we can do with our Knowledge Graph is find similar articles based on the entities
that articles have in common. The simplest version of this query would be to find other articles that
share common entities, as shown in the following query:
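A sketch of the simplest version, parameterised on the URI of the starting article ($articleUri is a placeholder), looks like this:

// articles that share HAS_ENTITY links with the starting article, most overlap first
MATCH (a:Article {uri: $articleUri})-[:HAS_ENTITY]->(entity)<-[:HAS_ENTITY]-(other:Article)
RETURN other.title, other.uri, collect(entity.uri) AS commonEntities
ORDER BY size(commonEntities) DESC
LIMIT 10;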
The Neo4j performance testing article is about Neo4j, and there are two other Neo4j articles that we
could recommend to a reader that liked this article.
We can also use the category taxonomy in our query. We can find articles that share a common parent
category by writing the following query:
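A sketch of such a query, again parameterised on the starting article's URI and shaped to produce the otherCategories and pathToOther columns shown in the results below, looks like this:

// walk from the starting article's categories up to a shared parent category and back down
MATCH (a:Article {uri: $articleUri})-[:HAS_ENTITY]->()-[:ABOUT]->(cat:Category),
      path = (cat)-[:SUB_CAT_OF]->(parent:Category)<-[:SUB_CAT_OF]-(otherCat:Category),
      (otherCat)<-[:ABOUT]-()<-[:HAS_ENTITY]-(other:Article)
WHERE other <> a
RETURN other.title, other.uri,
       collect(DISTINCT otherCat.name) AS otherCategories,
       collect(DISTINCT [n IN nodes(path) | n.name]) AS pathToOther
LIMIT 10;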
Table 11. Results
other.title other.uri otherCategories pathToOther
"5 Best courses to learn Apache Kafka in 2020"  "https://dev.to/javinpaul/5-best-courses-to-learn-apache-kafka-in-2020-584h"  ["Java", "Scratch", "Scala", "Apache ZooKeeper"]  [["Neo4j", "free software", "Scratch"], ["Neo4j", "free software", "Apache ZooKeeper"]]
Note that in this query we’re also returning the path from the initial article to the other article. So for
"Couchbase GeoSearch with ASP.NET Core", there is a path that goes from the initial article to the
Neo4j category, from there to the proprietary software category, which is also a parent of the
Couchbase Server Category, which the "Couchbase GeoSearch with ASP.NET Core" article is connected
to.
This shows off another nice feature of Knowledge Graphs: as well as making a recommendation, it’s
easy to explain why it was made.
We might not consider proprietary software to be a very good measure of similarity between two
technology products. It would be unlikely that we’re looking for similar articles based on this type of
similarity.
Adding a custom ontology

But a common way that software products are connected is via technology stacks. We could
therefore create our own ontology containing some of these stacks.
nsmntx.org/2020/08/swStacks contains an ontology for the GRANDstack, MEAN Stack, and LAMP
Stack. Before we import this ontology, let’s set up some mappings in n10s:
CALL n10s.nsprefixes.add('owl','http://www.w3.org/2002/07/owl#');
CALL n10s.nsprefixes.add('rdfs','http://www.w3.org/2000/01/rdf-schema#');
CALL n10s.mapping.add("http://www.w3.org/2000/01/rdf-schema#subClassOf","SUB_CAT_OF");
CALL n10s.mapping.add("http://www.w3.org/2000/01/rdf-schema#label","name");
CALL n10s.mapping.add("http://www.w3.org/2002/07/owl#Class","Category");
And now we can preview the import on the ontology by running the following query:
CALL n10s.rdf.preview.fetch("http://www.nsmntx.org/2020/08/swStacks","Turtle");
[Graph preview of the Software Stacks ontology: a Software Stack category with the GRAND Stack, MEAN Stack, and LAMP Stack as sub-categories, and SUB_CAT_OF relationships linking component technologies such as Neo4j, GraphQL, Apollo (Data Graph Platform), React, MongoDB, Express.js, AngularJS, Node.js, Linux, Apache HTTP Server, MySQL, MariaDB, PHP, Perl, and Python to their stacks. The ontology describes some of the most popular software stacks.]
CALL n10s.rdf.import.fetch("http://www.nsmntx.org/2020/08/swStacks","Turtle")
YIELD terminationStatus, triplesLoaded, triplesParsed, namespaces, callParams
RETURN terminationStatus, triplesLoaded, triplesParsed, namespaces, callParams;
Table 12. Results
terminationStatus triplesLoaded triplesParsed namespaces callParams
"OK" 58 58 NULL {}
We can now re-run the similarity query, which returns the following results:
Table 13. Results
other.title other.uri otherCategories pathToOther
"Learn how YOU can build a Serverless GraphQL API on top of a Microservice architecture, part I"  "https://dev.to/azure/learn-how-you-can-build-a-serverless-graphql-api-on-top-of-a-microservice-architecture-233g"  ["Node.js", "GraphQL"]  [["Neo4j", "GRAND Stack", "GraphQL"]]
This time we’ve got a couple of extra articles at the top about GraphQL, which is one of the tools
in the GRANDstack, of which Neo4j is also a part.