Introduction To Neo4j
W. H. Bell
November 8, 2023
1 Introduction
The purpose of this document is to introduce common features of graph databases, using the Neo4j
database. Graph databases store entities and the relationships between them directly, which allows
connected entities to be joined rapidly and analysed across data samples. For highly connected data,
a graph database query can be much faster than the equivalent relational query with many table joins,
and is often syntactically simpler. Graph databases are used within machine learning models, for
fraud detection, and in other applications that involve relationships between many entities.
A fixed width font is used for commands or source code. To aid the reader, a small bucket character
is used to indicate spaces in commands or source code. This character should be replaced with
a space when the command or source code is entered. An example of the bucket that is used to
indicate spaces is demonstrated in Listing 1, where there is a single space between the command and
argument.
Listing 1: Demonstrating the bucket character used to indicate spaces in source code and commands.
command␣argument
An example relationship between a student and a course is given in Figure 1. This diagram includes
two nodes and a relationship between them. Each node comprises a label and a property. In this
example, the labels are :Student and :Course and the properties are two names, Joan MacNain and
MSc Software Development. A data sample might comprise many students, who are studying many
different degree programs. Students can be selected by the course that they are following, using the
Cypher query language as shown in Listing 2. Concerning this Cypher query:
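A query of this kind can be sketched as follows. The relationship type STUDIES, the variable names s and c, and the property key name are assumptions made for illustration; the relationship name used in Figure 1 may differ.

```cypher
// Sketch only: STUDIES is an assumed relationship type.
// Select the students who are following a given course.
MATCH (s:Student)-[:STUDIES]->(c:Course {name: "MSc Software Development"})
RETURN s.name
```

The pattern in MATCH mirrors the diagram: two nodes in round brackets, joined by a relationship in square brackets, with the arrow giving its direction.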
4 Assumptions
It is assumed that these exercises are being run on a Linux PC in the Computer and Information
Sciences labs. However, these exercises can be run on another computer, provided that Neo4j Desktop
has been installed. If Neo4j Desktop is used on another computer, it can be started from the application
menu rather than the command line.
This document includes Bash Linux commands to clone a software repository and change directory. A
separate sheet of Bash Linux commands is provided to Computer and Information Sciences students.
It is assumed that the reader is either familiar with these commands or has access to the reference
document.
The commands that are given in this document have been tested with Neo4j Desktop version 5.13.0.
They may work with other recent versions of Neo4j.
5 Exercises
For each Cypher script provided, view it in the Neo4j Desktop browser before running it.
When Neo4j Desktop first starts, the license agreement should be accepted by clicking on “I
Agree”. Then click on “Register later”. When working on the CIS lab PCs, ignore update requests
by clicking on “Later” as needed.
Neo4j Desktop files are stored in a temporary area on the CIS lab PCs, which is on the
hard disk of the physical lab PC. This implies that any databases that are created will be lost
when the temporary area is cleaned up by IT management processes.
4. Use the mouse to hover over “Movie DBMS” and click on “Stop”.
5. Create a new project by clicking on “New”, which is next to “Projects”.
6. Change the project name to “Sales” by hovering over the project name and clicking on it.
7. Click on “Add” to the right of the project name “Sales”. Then select “Local DBMS”.
8. Rename the database to “Sales DBMS”.
9. Set a password of “password”. This is important if the optional Python programs are used.
10. Leave the “Version” set at its default value.
11. Click on “Create”.
12. Use the mouse to hover over the “Sales DBMS” and click on “Start”.
13. Click on “Reveal files in File Explorer”.
14. Use the file explorer to copy the files ending in .cypher from introduction-to-neo4j/src/ into
the project folder that is associated with the database.
15. Use the mouse to hover over create-customers.cypher in the Neo4j Desktop and click on
“Open”. This opens a separate window.
16. Click on the triangle to the right of the file contents to run the script.
• Line 1: This is a comment, which is ignored by Neo4j when the script is run.
• Line 2: A customer is created using CREATE, followed by (). The text within the () describes
one node. The name barry is a variable, which is accessible when the script is running.
The :Customer is a label, where the label name can be chosen as needed. The name: is
a property name, which is followed by an associated value. Each node can have several
properties, which are separated by commas.
• Lines 3-6: Another four customers are created.
• In the results display, view the “Graph”, “Table”, and “Text” output.
• In the “Graph” results, click on one of the nodes to display information about it. Notice that
Neo4j has created an id value for each customer.
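The parts of a CREATE statement described above can be seen together in the minimal sketch below. It is not the contents of create-customers.cypher: the second customer, the city property and its value are hypothetical and are included only to show several properties on one node.

```cypher
// Illustrative sketch; not the contents of create-customers.cypher.
// barry is a variable, :Customer is a label, and name is a property key.
CREATE (barry:Customer {name: "Barry Rayner"})
// A node can carry several properties, separated by commas.
// The second customer and the city property are hypothetical.
CREATE (example:Customer {name: "Example Customer", city: "Example City"})
RETURN barry, example
```

The variables barry and example exist only while the statement runs; they are not stored in the database, whereas the label and the properties are.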
The contents of select-all.cypher are given in Listing 6. The command MATCH can be used
with or without an associated WHERE. If no constraints are provided, MATCH returns all nodes. In
this case, RETURN is followed by n. This causes a node to be returned, rather than a specified
property of a node.
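The difference between returning a node and returning a property can be sketched as follows. The second query is an illustrative assumption rather than one of the provided scripts; it reuses the Customer label and name property described earlier.

```cypher
// Return every node in the database.
MATCH (n)
RETURN n;

// Return only a property, restricted by a WHERE condition.
MATCH (n:Customer)
WHERE n.name = "Barry Rayner"
RETURN n.name;
```

The two statements are separated by semicolons. If the browser is not configured to accept several statements at once, they can be run one at a time.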
The contents of find-purchases.cypher are given in Listing 9. The command matches the Customer
node whose name is “Barry Rayner” and follows its BOUGHT relationships to find the purchases made
by that customer. The RETURN is followed by p.name to specify that only the name of each purchase
should be returned.
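A query matching this description might look like the sketch below. The direction of the BOUGHT relationship and the use of an inline property map are assumptions; the provided script may express the same condition with WHERE.

```cypher
// Sketch: follow BOUGHT relationships from one named customer.
// The relationship direction (customer to purchase) is assumed.
MATCH (c:Customer {name: "Barry Rayner"})-[:BOUGHT]->(p)
RETURN p.name
```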
The contents of find-customers.cypher are given in Listing 10. The command returns the names
of the Customer nodes who have bought ice-cream.
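The same pattern can be read in the opposite direction, fixing the purchase and returning the customers. In this sketch the exact property value "ice-cream" and the relationship direction are assumptions about the sample data.

```cypher
// Sketch: find the customers who bought a given purchase.
// The stored spelling of the purchase name is assumed.
MATCH (c:Customer)-[:BOUGHT]->(p)
WHERE p.name = "ice-cream"
RETURN c.name
```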
22. Run find-customers-2.cypher to search for customers who bought either of two selected purchases.
The contents of find-customers-2.cypher are given in Listing 11. The command returns the
Customer nodes who have bought either ice-cream or french fries. Due to the data and query
specified, the query would return multiple instances of the same node without DISTINCT.
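The role of DISTINCT can be illustrated with the sketch below, which assumes the purchase names are stored as "ice-cream" and "french fries" and uses IN to test for either value; the provided script may use OR instead.

```cypher
// Sketch: customers who bought either of two purchases.
// A customer who bought both matches the pattern twice,
// once per purchase, so DISTINCT collapses the duplicates.
MATCH (c:Customer)-[:BOUGHT]->(p)
WHERE p.name IN ["ice-cream", "french fries"]
RETURN DISTINCT c
```

MATCH produces one row per matching path rather than one row per node, which is why the same customer can appear more than once.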
23. Edit the find-customers-2.cypher command and remove DISTINCT. Then re-run the command
to check what happens. Look at the data to understand the results of the query.
24. Run delete-all.cypher to remove all of the nodes.
The contents of delete-all.cypher are given in Listing 12. Since no selection conditions are
included, the command deletes all nodes.
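One common way to write such a command is sketched below. Neo4j refuses to delete a node that still has relationships attached, so DETACH DELETE is used here to remove the relationships together with the nodes. Whether delete-all.cypher takes this form or deletes the relationships separately is an assumption.

```cypher
// Sketch: remove every node and any relationships attached to it.
MATCH (n)
DETACH DELETE n
```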
25. Run create-network.cypher to create a series of houses, a postcode and electrical connections.
Then use select-all.cypher to view the nodes that are created.
• Lines 3-27: Several nodes and relationships are created using one CREATE, where each
creation request is given within () that is separated from the next () by a comma.
• Lines 3-5: A Postcode is created, which has three properties.
• Lines 6-10: Five House nodes are created. These nodes have two labels, Node and House.
This implies that either label could be used within a query.
• Lines 11-15: The House nodes are associated with the postcode.
• Lines 16-18: Three Connection nodes are created, which are also given the Node label.
• Lines 19: A Splitter node is created, which is also given the Node label.
• Lines 20-27: Connected relationships between the House, Connection and Splitter nodes
are defined.
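The structure described above can be sketched on a smaller scale as follows. This is not the contents of create-network.cypher: all property keys and values, the relationship types IN_POSTCODE and CONNECTED, and the relationship directions are hypothetical placeholders.

```cypher
// Sketch: one CREATE with comma-separated nodes and relationships.
// All property names, values and relationship types are hypothetical.
CREATE
  (pc:Postcode {code: "AB1 2CD", town: "Example Town", region: "Example Region"}),
  // Two labels on one node: either :Node or :House can be used in a query.
  (h1:Node:House {name: "House 1"}),
  (h2:Node:House {name: "House 2"}),
  // Variables defined earlier in the statement can be reused here.
  (h1)-[:IN_POSTCODE]->(pc),
  (h2)-[:IN_POSTCODE]->(pc),
  (c1:Node:Connection {name: "Connection 1"}),
  (s1:Node:Splitter {name: "Splitter 1"}),
  (h1)-[:CONNECTED]->(c1),
  (h2)-[:CONNECTED]->(s1),
  (s1)-[:CONNECTED]->(c1)
```

Creating everything in one statement keeps the variables in scope, so the relationships can refer to the nodes without matching them again.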
26. Run count-connections.cypher to count the number of electrical connections for each
connection point.
The contents of count-connections.cypher are given in Listing 14. The command returns the
name of the Connection nodes and the number of houses that are connected to them.
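A counting query of this kind can be sketched with the count() aggregation. The relationship type CONNECTED and its direction are assumptions, and this sketch counts only houses that are linked directly to a connection point, not those reached through a splitter.

```cypher
// Sketch: count the houses attached to each connection point.
// Rows are grouped implicitly by the non-aggregated value c.name.
MATCH (h:House)-[:CONNECTED]->(c:Connection)
RETURN c.name AS connection, count(h) AS houses
ORDER BY houses DESC
```

Cypher has no GROUP BY clause; any returned expression that is not an aggregation acts as the grouping key.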
27. Select all of the houses that are connected to the electrical connection point with the
most connections. This can be achieved by using a Cypher command that is similar to
find-customers.cypher.
6 Further reading
The examples presented in this document demonstrate some of the features of a graph
database. More information on Cypher is given at https://neo4j.com/developer/cypher/ and
https://www.tutorialspoint.com/neo4j/.