Graph Databases: Phil Bartie
F20BD / F21BD
Big Data Management
Phil Bartie
Today we shall discuss
What is a graph?
What a graph database does
Use cases
Introduction to Neo4j
Database Landscape
RDF: Virtuoso, Jena, Stardog, RDF4J, GraphDB, Blazegraph
Object: Caché, Db4o, Versant, ObjectStore
XML: MarkLogic, Sedna, Tamino, BaseX, eXist-db
Relational: Oracle, MySQL, MS SQL Server, PostgreSQL, DB2, SQLite, MS Access, Teradata, SAP Adaptive Server, Hive, FileMaker, MariaDB, Informix, Vertica
NoSQL
Key-Value: Redis, Memcached, Riak KV, Aerospike, SimpleDB
Document: MongoDB, DynamoDB, CouchBase, Elasticsearch
Where is Neo4j?
CAP triangle: C (consistency), A (availability), P (partition tolerance)
CA: Guarantees to give a correct response, but only while the network works fine (Centralised / Traditional)
CP: Guarantees responses are correct even if there are network failures, but the response may fail (Weak availability)
But (like Dynamo), tables are tunable towards CP
https://aws.amazon.com/neptune/
6 degrees of Kevin Bacon
Can all actors be linked to Kevin Bacon in under 6 hops?
http://oracleofbacon.org/
Facebook claims 3.74 degrees of separation between ‘strangers’ (2011), falling to 3.57 by Feb 2016
https://research.fb.com/blog/2016/02/three-and-a-half-degrees-of-separation/
http://www.classtools.net/blog/wp-content/uploads/2015/04/bacon.jpg
https://www.flickr.com/photos/jasmic/
Network Diagram
http://www.conceptdraw.com/How-To-Guide/picture/Network-diagram-System-design.png
London Underground
http://www.animalsontheunderground.com
Underground Network Maps
Paris Metro
Adjacency: Example
Neighbourhood: Example
http://www.w3.org/TR/rdf11-primer/
Example Applications
Social Networks:
• Friendship relationships
• Cliques
• Centres
Transportation:
• Shortest route
• Intercity air links
• Inter-transport
• Travelling Salesman
Technology:
• Network links
• Software dependencies
Science:
• Biological pathways
• Chemistry
• Electrical networks
Other:
• Fraud detection
• Recommender Systems
What is a graph?
• Set of Vertices (nodes)
• Set of Edges (relationships):
  • Single relationship
  • Directed or undirected
  • May be weighted
Operations:
• Adjacency
• Neighbours
Algorithms:
• Degree
• Path Finding
(Figure: example graph; the number on each node shows its connections: 2, 3, 2, 1)
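The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the slide's own figure: the node names and the edge list are assumptions, chosen so that the degrees come out as 2, 2, 3 and 1.

```python
from collections import defaultdict

# Hypothetical undirected graph: node names and edges are illustrative only.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]

def degrees(edge_list):
    """Count how many edges touch each vertex (its degree)."""
    counts = defaultdict(int)
    for u, v in edge_list:
        counts[u] += 1
        counts[v] += 1
    return dict(counts)

# The vertex set is recovered from the endpoints of the edges.
vertices = sorted({v for edge in edges for v in edge})
node_degrees = degrees(edges)

print(vertices)
print(node_degrees)
```

A directed or weighted graph would need a richer edge record, for example a (source, target, weight) tuple.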
Graph representation
Sequential: adjacency matrix
Linked: adjacency list
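As a rough sketch of the two representations, the same small undirected graph can be stored both ways. The node names and edges here are assumptions made up for illustration.

```python
# Hypothetical undirected graph used for both representations.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
index = {name: i for i, name in enumerate(nodes)}

# Sequential: an n x n matrix where cell [i][j] is 1 if i and j are linked.
matrix = [[0] * len(nodes) for _ in nodes]
for u, v in edges:
    matrix[index[u]][index[v]] = 1
    matrix[index[v]][index[u]] = 1  # undirected, so mirror the entry

# Linked: each node keeps a list of only the nodes it is connected to.
adj_list = {name: [] for name in nodes}
for u, v in edges:
    adj_list[u].append(v)
    adj_list[v].append(u)

for row in matrix:
    print(row)
print(adj_list)
```

The matrix answers "are i and j linked?" in one lookup but always costs n x n cells; the list only stores the edges that exist, which suits sparse graphs.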
What is graph analytics?
Categories of analytics...
1. Graph query
2. Graph algorithm
3. Graph analytics
What is the difference between querying & analytics?
Good question… there is no agreed answer, as analysis needs to use querying.
1. Graph query
2. Graph algorithm: queries that use built-in algorithms,
e.g., find the shortest path between Alice & Bob. This is STILL A GRAPH QUERY!
3. Graph analytics: tells you something about the graph in general,
e.g., what is the average number of films that actors under 30 years old have appeared in?
Compare:
1. Find mutual friends of Alice & Bob.
2. Find all the friends of Alice that are not friends of Bob.
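To make the comparison concrete, both questions can be sketched as set operations over friend lists. The friend sets below are invented for illustration; only Alice and Bob come from the slide.

```python
# Hypothetical friend lists; every name other than Alice and Bob is made up.
friends = {
    "Alice": {"Carol", "Dave", "Erin"},
    "Bob": {"Carol", "Frank"},
}

# 1. Mutual friends: people in both sets (intersection).
mutual = friends["Alice"] & friends["Bob"]

# 2. Friends of Alice that are not friends of Bob (set difference).
only_alice = friends["Alice"] - friends["Bob"]

print(sorted(mutual))
print(sorted(only_alice))
```

The first query only needs matching pairs of edges; the second has to establish that an edge is absent, which is the harder thing to express in a pattern.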
7 Bridges of Königsberg
https://www.mathsisfun.com/activity/seven-bridges-konigsberg.html
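Euler's answer to the bridges puzzle rests on node degree: a connected graph can be walked crossing every edge exactly once only if it has zero or two nodes of odd degree. The sketch below checks that condition; the labels A to D for the four land masses are an assumption for illustration.

```python
from collections import Counter

# Seven bridges as edges between four land masses.
# Labels are hypothetical: A = central island, B and C = the two banks, D = eastern island.
bridges = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]

# Degree = number of bridge ends touching each land mass.
degree = Counter()
for u, v in bridges:
    degree[u] += 1
    degree[v] += 1

odd = sorted(node for node, d in degree.items() if d % 2 == 1)

# Assumes the graph is connected, which holds for this map.
has_euler_walk = len(odd) in (0, 2)

print(dict(degree))
print(odd, has_euler_walk)
```

All four land masses have odd degree, so no walk crosses every bridge exactly once.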
Graph databases
Graph operations
CRUD operations:
• Create: Add a vertex or edge
• Read: Retrieve data
• Update: Change vertex or edge properties
• Delete: Remove a vertex or edge
Graph specific operations:
• Adjacency: Find adjacent nodes of vertex
• Neighbourhood: Find connected vertices up to a certain radius
• Shortest path: Find the sequence of edges that make the shortest path between two nodes
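The three graph specific operations can be sketched with breadth-first search over an adjacency list. This is a minimal illustration on an invented, unweighted graph, not how any particular graph database implements them; the function names are hypothetical, and the path is returned as a sequence of nodes rather than edges.

```python
from collections import deque

# Hypothetical undirected, unweighted graph as an adjacency list.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}

def adjacent(g, node):
    """Adjacency: nodes directly linked to the given node."""
    return list(g[node])

def neighbourhood(g, start, radius):
    """Neighbourhood: nodes reachable within `radius` hops of start."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == radius:
            continue  # do not expand beyond the radius
        for nxt in g[node]:
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return {n for n in hops if n != start}

def shortest_path(g, start, goal):
    """Shortest path by hop count; returns a node list or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back through the parents
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in g[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

print(adjacent(graph, "c"))
print(sorted(neighbourhood(graph, "a", 2)))
print(shortest_path(graph, "a", "e"))
```

With weighted edges, breadth-first search would no longer give the shortest path and something like Dijkstra's algorithm would be needed instead.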
Native vs Non-Native Storage
Native = data stored as a graph
Non-native = data not stored as a graph
Check here for more details: https://neo4j.com/blog/native-vs-non-native-graph-technology/
Storing graph in RDBMS
(Figure: Person nodes 1, 2, … 99 linked by FRIEND relationships)

Person
ID   Name
1    Alice
2    Bob
…    …
99   Zach

Friendships
PersonID   FriendID
1          2
2          1
2          99
…          …
99         1

Bob's Friends:
SELECT p2.Name
FROM Person p1,
     Friendships f1,
     Person p2
WHERE p1.ID = f1.PersonID
  AND p2.ID = f1.FriendID
  AND p1.Name = 'Bob'
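The relational approach can be tried out with Python's built-in sqlite3 module. This sketch loads only the rows visible on the slide (the elided "…" rows are left out), uses the table name Friendships as shown in the table heading, and picks the column types as an assumption.

```python
import sqlite3

# In-memory database holding only the rows shown on the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Friendships (PersonID INTEGER, FriendID INTEGER);
INSERT INTO Person VALUES (1, 'Alice'), (2, 'Bob'), (99, 'Zach');
INSERT INTO Friendships VALUES (1, 2), (2, 1), (2, 99), (99, 1);
""")

# Bob's friends: join Person to itself through the Friendships table.
rows = conn.execute(
    """
    SELECT p2.Name
    FROM Person p1, Friendships f1, Person p2
    WHERE p1.ID = f1.PersonID
      AND p2.ID = f1.FriendID
      AND p1.Name = ?
    ORDER BY p2.Name
    """,
    ("Bob",),
).fetchall()

bobs_friends = [name for (name,) in rows]
print(bobs_friends)
conn.close()
```

Each extra hop (friends of friends, and so on) adds another self-join through Friendships, which is where the relational version starts to get expensive.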
Storing graph in RDBMS: Complex query
Relational:
• Exploit data model structure
• Use group by operators
Graphs:
• Have to visit every node
Benefits
• Data model looks like the graph you would draw on a whiteboard, so it is
simple to discuss with non-technical people
• ACID compliant
• Flexible schema
• Cheap to run (e.g. compared to Oracle)
• Relationships are no longer hidden; they are a first-class entity
• Fast
Native Graph Storage
http://neo4j.com/
Based on Java
Data model
Property graph: Collection of nodes and relationships
Nodes
(Figure: node KEN with RESEARCH and TEACH relationships)
{
  name: "ken",
  age: 26,
  job: "ra"
}
Relationships
• Link 2 entities with a direction
• Or be qualitative (e.g., start date)
• Have labels that define the class of relationship
• Relationships ALWAYS have 1 direction
BUT: Ken follows HWU (i.e. on Twitter) and HWU follows Ken, so a mutual link needs two relationships, one in each direction
Exercise
http://www.apcjones.com/arrows/
(Figure: PERSON, UNIVERSITY and CITY nodes joined by STUDIES AT, LOCATED IN and LIVES IN relationships)
With properties added:
UNIVERSITY { name: "hwu", founded: 1821 }
CITY { name: "Edinburgh" }
PERSON { name: "steve", age: 23, job: "student" }
Find all men who are connected within three friends of my women friends who
like sailing but not bowling and who live within 30 miles of my address.
Weaknesses
• Still something of a niche product. Many use cases, but not a
general purpose DBMS in the way that RDBMS are.
• Immature in comparison to RDBMS
• Writes are slow (in comparison to something like MongoDB)
• Flexible schemas are not that flexible. You need to fix a schema in
order that your program knows what to ask for and how to ask
for it. So if there is no schema defined at the DBMS level, you
end up defining one in the programs that access the data.
• Neo4j is pushing for Cypher to become a standard query language
https://neo4j.com/press-releases/query-language-graph-databases-international-standard/
Cypher
http://neo4j.com/developer/cypher-query-language/
Cypher: create a node
• The leading keyword is the operation to run
• () indicates a node
(Figure: node { name: 'kcm' } with a TEACHES relationship)
match (n {name: 'kcm'})
The node named 'kcm' is returned, then used to create a relationship with the new course node.
Cypher Summary
Create a node:
create ()
Retrieve all nodes:
match (n) return n
Create two nodes and a relationship called :LINKS_TO between them:
create ()-[:LINKS_TO]->()
Create LINKS_TO relationship between nodes 0 and 1:
match (n), (m) where id(n) = 0 and id(m) = 1
create (n)-[:LINKS_TO]->(m)
Delete all nodes and edges:
match (n) optional match (n)-[r]-() delete n, r
Pattern
• Sequence of interleaved nodes and relationships
• Relationships with arrow
• Describe a single path
Examples:
• Any node
• Node x and node y having any relation
• Any node that has a relation Play with any other node
• *2..4: Any node that has a path of 2 to 4 KNOW relations
Twitter Example:
http://neo4j.com/books/graph-databases/
https://neo4j.com/developer/cypher-query-language/
https://neo4j.com/developer/data-modeling/
https://neo4j.com/docs/pdf/cypher-refcard-3.3.pdf
Today we discussed
Graph databases
Neo4j