Graph Databases: Phil Bartie

Graph Databases

F20BD / F21BD
Big Data Management

Phil Bartie
Today we shall discuss
• What is a graph?
• What a graph database does
• Use cases
• Introduction to Neo4j
Database Landscape
• RDF: Virtuoso, Jena, Stardog, RDF4J, GraphDB, Blazegraph
• Object: Caché, Db4o, Versant, ObjectStore
• XML: MarkLogic, Sedna, Tamino, BaseX, eXist-db
• Relational: Oracle, MySQL, MS SQL Server, PostgreSQL, DB2, SQLite, MS Access, Teradata, SAP Adaptive Server, Hive, FileMaker, MariaDB, Informix, Vertica
• NoSQL
  – Key-Value: Redis, Memcached, Riak KV, Aerospike, SimpleDB
  – Document: MongoDB, DynamoDB, CouchBase, Elasticsearch
  – Wide-Column: Cassandra, HBase, Accumulo, HyperTable
  – Graph: Neo4j, Titan, Giraph, InfiniteGraph
• NewSQL: SAP HANA, Google Spanner, Clustrix, VoltDB, MemSQL, NuoDB
Where is Neo4j?
(Figure: the CAP triangle, with Consistency (C), Availability (A) and Partition tolerance (P) at its corners.)
• CA: guarantees a correct response, but only while the network works fine (centralised / traditional databases).
• CP: guarantees responses are correct even if there are network failures, but a response may fail (weak availability).
• AP: always provides a "best-effort" response, even in the presence of network failures (eventual consistency). But some AP systems (like Dynamo) have tables that are tunable towards CP.
Graphs
These are charts, not graphs!
A Simple Graph

A node (vertex) represents an entity


Nodes are connected by relationships (edges)
Relationships can have no direction, one direction or two directions
Can you think of use cases in which graphs feature?
Facebook

https://aws.amazon.com/neptune/
6 degrees of Kevin Bacon
• Can all actors be linked to Kevin Bacon in under 6 hops?
http://oracleofbacon.org/

Facebook claims 3.74 degrees of separation between 'strangers' (Nov 2016)

https://research.fb.com/blog/2016/02/three-and-a-half-degrees-of-separation/
http://www.classtools.net/blog/wp-content/uploads/2015/04/bacon.jpg
£$€

https://aws.amazon.com/neptune/

https://www.flickr.com/photos/jasmic/

https://aws.amazon.com/neptune/
Network Diagram

http://www.conceptdraw.com/How-To-Guide/picture/Network-diagram-System-design.png
London Underground

http://www.animalsontheunderground.com
Underground Network Maps

New York subway

Paris Metro
Adjacency: Example
Marble Arch is adjacent (connected) to:
• Bond Street
• Lancaster Gate
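Here is a minimal Python sketch of adjacency. It assumes an undirected graph stored as a dictionary that maps each station to a set of neighbours. The edge list is a hypothetical excerpt: it holds only the Tube links implied by this slide and the neighbourhood slide, not the real network data.

```python
# Hypothetical excerpt of the London Underground (undirected links), limited
# to the stations named on the adjacency and neighbourhood slides.
EDGES = [
    ("Marble Arch", "Bond Street"),
    ("Marble Arch", "Lancaster Gate"),
    ("Bond Street", "Baker Street"),
    ("Bond Street", "Oxford Circus"),
    ("Bond Street", "Green Park"),
    ("Lancaster Gate", "Queensway"),
]


def build_graph(edges):
    """Build an undirected adjacency structure: station -> set of neighbours."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)  # undirected: store both directions
    return graph


def adjacent(graph, v):
    """Vertices joined to v by a single edge, sorted for stable output."""
    return sorted(graph.get(v, set()))


graph = build_graph(EDGES)
print(adjacent(graph, "Marble Arch"))  # ['Bond Street', 'Lancaster Gate']
```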
Neighbourhood: Example
Marble Arch 2-step neighbourhood:
• Bond Street
• Baker Street
• Oxford Circus
• Green Park
• Lancaster Gate
• Queensway
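A 2-step neighbourhood is every vertex within 2 hops. One straightforward way to find it is a breadth-first search that stops expanding at the radius. The sketch below uses a hypothetical excerpt of the Tube map, holding only the links implied by the adjacency and neighbourhood slides.

```python
from collections import deque

# Hypothetical excerpt of the London Underground (undirected links), limited
# to the stations named on the adjacency and neighbourhood slides.
EDGES = [
    ("Marble Arch", "Bond Street"),
    ("Marble Arch", "Lancaster Gate"),
    ("Bond Street", "Baker Street"),
    ("Bond Street", "Oxford Circus"),
    ("Bond Street", "Green Park"),
    ("Lancaster Gate", "Queensway"),
]

graph = {}
for a, b in EDGES:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def neighbourhood(graph, start, radius):
    """All vertices within `radius` hops of `start`, excluding start itself.

    Breadth-first search reaches vertices in order of hop count, so a vertex
    is only expanded while its distance is below the radius.
    """
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if dist[v] == radius:
            continue  # do not look beyond the radius
        for w in graph.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return sorted(v for v in dist if v != start)


print(neighbourhood(graph, "Marble Arch", 1))  # ['Bond Street', 'Lancaster Gate']
print(neighbourhood(graph, "Marble Arch", 2))
# ['Baker Street', 'Bond Street', 'Green Park', 'Lancaster Gate', 'Oxford Circus', 'Queensway']
```

Radius 1 gives back the adjacency list, so adjacency is just a neighbourhood of radius 1.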
Anything can be a graph: social media

http://www.w3.org/TR/rdf11-primer/
Example Applications
• Social networks: friendship relationships, cliques, centres
• Transportation: shortest route, intercity air links, inter-transport, Travelling Salesman
• Technology: network links, software dependencies
• Science: biological pathways, chemistry, electrical networks
• Other: fraud detection, recommender systems
What is a graph?
• Single relationship
• Set of vertices (nodes)
• Set of edges (relationships):
  – Directed or undirected
  – May be weighted
Operations
• Adjacency
• Neighbours
Algorithms
• Degree
• Path finding
  – Least cost path
• Centrality
  – Degree (indegree, outdegree for directed graphs)
(Figure: a small graph where the number on each node shows its connections, i.e. its degree.)
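The slide notes that edges may be weighted and lists "least cost path", but it does not name an algorithm. One standard choice for non-negative weights is Dijkstra's algorithm. The sketch below applies it to a small hypothetical weighted, directed graph, with weights standing in for distances or travel times.

```python
import heapq


def least_cost_path(graph, source, target):
    """Dijkstra's algorithm (weights must be non-negative).

    graph: dict mapping vertex -> list of (neighbour, weight) pairs.
    Returns (total_cost, path), or (inf, []) if target is unreachable.
    """
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    visited = set()
    while heap:
        d, v = heapq.heappop(heap)
        if v in visited:
            continue  # stale heap entry: a cheaper route was already settled
        visited.add(v)
        if v == target:
            break
        for w, weight in graph.get(v, []):
            nd = d + weight
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                prev[w] = v
                heapq.heappush(heap, (nd, w))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Hypothetical weighted, directed graph.
GRAPH = {
    "A": [("B", 4), ("C", 1)],
    "C": [("B", 2), ("D", 5)],
    "B": [("D", 1)],
}
print(least_cost_path(GRAPH, "A", "D"))  # (4, ['A', 'C', 'B', 'D'])
```

The direct edge A→B costs 4, but the detour A→C→B costs only 3. This is why the least cost path need not be the path with the fewest edges.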
Formal Definition
• A graph G = (V, E)
  – V: a set of vertices
  – E: a set of edges
• Properties of vertices
  – Adjacency: v1 is adjacent to v2 if edge(v1, v2) exists
  – Neighbourhood (subgraph): $N(v) = \{v_1, \ldots, v_n\}$, with $\mathrm{edge}(v, v_i)\ \forall v_i$
  – Degree: d(v), the number of adjacent vertices
  – Clique (subgraph): every two vertices are adjacent
• Algorithms
  – Find shortest path
  – Calculate degree
  – Find maximum clique
  – Identify centrality
(Figure: example graph with vertices A, B, C, D, where d(A) = 3, d(B) = 2, d(C) = 2, d(D) = 1; a second figure shows 3 cliques of 5 vertices.)
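The definitions above map almost directly onto sets. In the sketch below the edge set is an assumption: it is the only simple graph on A to D that gives the degrees shown in the figure. The maximum-clique search is brute force. That is fine for four vertices, but the problem is NP-hard in general.

```python
from itertools import combinations

# G = (V, E), undirected. Assumed edges chosen to match the figure's degrees:
# d(A)=3, d(B)=2, d(C)=2, d(D)=1.
V = {"A", "B", "C", "D"}
E = {frozenset(e) for e in [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]}


def adjacent(v1, v2):
    """v1 is adjacent to v2 if edge(v1, v2) exists."""
    return frozenset((v1, v2)) in E


def neighbourhood(v):
    """N(v): every vertex joined to v by an edge."""
    return {u for u in V if u != v and adjacent(v, u)}


def degree(v):
    """d(v): the number of adjacent vertices."""
    return len(neighbourhood(v))


def is_clique(vertices):
    """Every two vertices in the set are adjacent."""
    return all(adjacent(a, b) for a, b in combinations(vertices, 2))


def maximum_clique():
    """Brute force: test subsets from largest to smallest."""
    for size in range(len(V), 0, -1):
        for subset in combinations(sorted(V), size):
            if is_clique(subset):
                return set(subset)
    return set()


print({v: degree(v) for v in sorted(V)})  # {'A': 3, 'B': 2, 'C': 2, 'D': 1}
print(sorted(maximum_clique()))           # ['A', 'B', 'C']
```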
Is a Tree a Graph?
A tree is a special form of graph which has minimal connections: just one path between any 2 vertices.

Graphs can have many connections between vertices, including loops.

A tree has a root node, while a graph doesn't have a root or any concept of a parent/child relationship.
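"Just one path between any 2 vertices" can be checked with a standard equivalent test: an undirected graph is a tree exactly when it is connected and has |V| - 1 edges. The sketch below uses that characterisation, which is a swapped-in textbook test rather than the slide's wording. The vertex names are hypothetical. Any vertex can then be picked as the root.

```python
def is_tree(vertices, edges):
    """Tree test: connected and exactly |V| - 1 edges (hence no cycles)."""
    if not vertices or len(edges) != len(vertices) - 1:
        return False
    adjacency = {v: set() for v in vertices}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    # Depth-first search from any vertex: every vertex must be reachable.
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        for w in adjacency[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(vertices)


tree_vertices = {"root", "a", "b", "c"}
tree_edges = [("root", "a"), ("root", "b"), ("a", "c")]
print(is_tree(tree_vertices, tree_edges))                   # True
print(is_tree(tree_vertices, tree_edges + [("b", "c")]))    # False (cycle)
```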
Graph representation
• Sequential: adjacency matrix
• Linked: adjacency list
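To compare the two representations, the sketch below builds both from the same hypothetical directed edge list. A matrix uses |V|² cells and checks a single edge in constant time. A list uses space proportional to |V| + |E|, which suits sparse graphs where most pairs are not connected.

```python
# Hypothetical directed graph.
VERTICES = ["A", "B", "C", "D"]
EDGES = [("A", "B"), ("A", "C"), ("C", "D"), ("D", "A")]


def adjacency_matrix(vertices, edges):
    """Sequential form: matrix[i][j] == 1 if there is an edge i -> j."""
    index = {v: i for i, v in enumerate(vertices)}
    matrix = [[0] * len(vertices) for _ in vertices]
    for src, dst in edges:
        matrix[index[src]][index[dst]] = 1
    return matrix


def adjacency_list(vertices, edges):
    """Linked form: each vertex keeps only the vertices it points to."""
    adj = {v: [] for v in vertices}
    for src, dst in edges:
        adj[src].append(dst)
    return adj


for row in adjacency_matrix(VERTICES, EDGES):
    print(row)
print(adjacency_list(VERTICES, EDGES))
# {'A': ['B', 'C'], 'B': [], 'C': ['D'], 'D': ['A']}
```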
What is graph analytics?

Trying to understand the content of your graph to gain business advantage.

Categories of analytics:
1. Graph query
2. Graph algorithm
3. Graph analytics
Photo by Hans-Peter Gauster on Unsplash
What is the difference between querying & analytics?

Good question… there is no agreed answer, as analysis needs to use querying.

Photo by Evan Dennis on Unsplash

1. Graph query: looking for simple patterns.
   e.g., Find mutual friends of Alice & Bob.
2. Graph algorithm: queries that use built-in algorithms. This is STILL A GRAPH QUERY!
   e.g., Find the shortest path between Alice & Bob.
3. Graph analytics: tells you something about the graph in general.
   e.g., What is the average number of films that actors under 30 years old have appeared in?

Note: all 3 categories are graph queries! The difference is the complexity of the query & its aim.
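The analytics example differs from a local query because it must look across the whole graph. The sketch below assumes a hypothetical data set: actor nodes with an `age` property and ACTED_IN relationships from actors to films. It computes the average number of films for actors under 30.

```python
# Hypothetical data: actor nodes with an age property, and ACTED_IN
# relationships from actors to films.
AGES = {"Ann": 25, "Ben": 28, "Cal": 41}
ACTED_IN = [
    ("Ann", "F1"), ("Ann", "F2"), ("Ann", "F3"),
    ("Ben", "F2"),
    ("Cal", "F1"), ("Cal", "F3"), ("Cal", "F4"), ("Cal", "F5"),
]


def average_films(ages, acted_in, max_age):
    """Average number of distinct films per actor younger than max_age.

    Unlike a local query (e.g. mutual friends), this has to consider every
    matching actor node and every ACTED_IN relationship.
    """
    films = {actor: set() for actor, age in ages.items() if age < max_age}
    for actor, film in acted_in:
        if actor in films:
            films[actor].add(film)
    if not films:
        return 0.0
    return sum(len(f) for f in films.values()) / len(films)


print(average_films(AGES, ACTED_IN, 30))  # 2.0
```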
Question

Compare:
1. Find mutual friends of Alice & Bob.
2. Find all the friends of Alice that are not friends of Bob.

Can you think of a common use case for these sorts of queries?
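Both queries only need two friend lists. The first is a set intersection and the second is a set difference. The friend lists below are hypothetical. Removing Bob from the second result is a design choice: Bob is Alice's friend, but he is not his own friend.

```python
# Hypothetical friend lists; only Alice's and Bob's are needed here.
FRIENDS = {
    "Alice": {"Bob", "Carol", "Dave", "Eve"},
    "Bob": {"Alice", "Carol", "Frank"},
}


def mutual_friends(friends, a, b):
    """Query 1: people who are friends of both a and b (intersection)."""
    return friends[a] & friends[b]


def friends_of_a_not_b(friends, a, b):
    """Query 2: friends of a who are not friends of b (difference).

    b itself is also removed, since b is a's friend but not b's own friend.
    """
    return friends[a] - friends[b] - {b}


print(sorted(mutual_friends(FRIENDS, "Alice", "Bob")))      # ['Carol']
print(sorted(friends_of_a_not_b(FRIENDS, "Alice", "Bob")))  # ['Dave', 'Eve']
```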
Classes of analysis
• Path – find the shortest distance between 2 nodes.
• Connectivity – are 2 nodes connected?
• Distance – what is the distance between 2 nodes?
• Density – number of connections.
• Centrality – determine the most important nodes.
Question!
Suggest a use case for each of these types of analysis:
• Path
• Connectivity
• Distance & density
• Centrality
Need a hint?

https://www.flickr.com/photos/jasmic/ Photo by William Iven on Unsplash
Answers
Guess the use cases for these types of analysis:
• Path – anything that requires route optimisation, e.g., logistics.
• Connectivity – determine weaknesses, e.g., in power/phone grids.
• Distance & density – understand social groups within a network.
• Centrality – determine the most influential people in a social network.
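The sketch below puts the connectivity, distance, density and centrality classes into code on a hypothetical undirected social graph. Two formulas here are my own choices, not the slide's:

* Density is the common normalised form: actual edges divided by possible edges, 2|E| / (|V|(|V| - 1)). The slide only says "number of connections".
* Centrality uses the simplest measure, degree.

```python
from collections import deque

# Hypothetical undirected social graph with two separate groups.
EDGES = [("Ann", "Ben"), ("Ann", "Cat"), ("Ann", "Dan"), ("Ben", "Cat"), ("Eve", "Fay")]

graph = {}
for a, b in EDGES:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def distance(graph, s, t):
    """Hop distance from s to t using breadth-first search; None if unreachable."""
    if s not in graph or t not in graph:
        return None
    if s == t:
        return 0
    dist = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                if w == t:
                    return dist[w]
                queue.append(w)
    return None


def connected(graph, s, t):
    """Connectivity: is there any path between s and t?"""
    return distance(graph, s, t) is not None


def density(graph):
    """Edges present divided by edges possible (0.0 for fewer than 2 nodes)."""
    n = len(graph)
    m = sum(len(nbrs) for nbrs in graph.values()) // 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def most_central(graph):
    """Degree centrality: the node with the most connections."""
    return max(graph, key=lambda v: len(graph[v]))


print(distance(graph, "Dan", "Cat"))   # 2
print(connected(graph, "Ann", "Eve"))  # False
print(most_central(graph))             # Ann
```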
7 Bridges of Königsberg
Can you walk a route that crosses every bridge only once?

https://www.mathsisfun.com/activity/seven-bridges-konigsberg.html

7 Bridges of Königsberg
A route visiting every edge exactly once is known as an Euler path. It is possible only if the number of nodes with an odd degree is 0 or 2.
• Original bridges: degrees A(3), B(3), C(5), D(3). The number of nodes with an odd degree is 4, so an Euler path is not possible.
• With one edge removed (between C and D): degrees A(3), B(3), C(4), D(2). Only 2 nodes have an odd degree, so an Euler path is possible.

https://www.mathsisfun.com/activity/seven-bridges-konigsberg.html
Which of these have Euler Paths?

Only possible if the number of nodes with odd degree is 0 or 2.

https://www.mathsisfun.com/activity/seven-bridges-konigsberg.html
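The odd-degree rule can be checked mechanically. The bridge list below is my reconstruction from the slide's degrees: two bridges A–C, two bridges B–C, and one each for C–D, A–D and B–D. The function also checks that the edges form one connected piece, because the odd-degree rule alone is not enough for a disconnected graph.

```python
from collections import Counter, defaultdict


def degrees(edges):
    """Degree of each vertex in an undirected multigraph (parallel edges count)."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)


def euler_path_possible(edges):
    """Euler's condition: edges connected, and 0 or 2 vertices of odd degree."""
    deg = degrees(edges)
    if not deg:
        return True
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    start = next(iter(deg))
    seen, stack = {start}, [start]
    while stack:
        for w in adjacency[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    if len(seen) != len(deg):
        return False  # edges split into separate pieces
    odd = sum(1 for d in deg.values() if d % 2 == 1)
    return odd in (0, 2)


# Reconstructed Königsberg bridges (assumed from the slide's degrees).
KONIGSBERG = [("A", "C"), ("A", "C"), ("B", "C"), ("B", "C"),
              ("C", "D"), ("A", "D"), ("B", "D")]
WITHOUT_CD = [e for e in KONIGSBERG if e != ("C", "D")]

print(euler_path_possible(KONIGSBERG))  # False (4 odd-degree nodes)
print(euler_path_possible(WITHOUT_CD))  # True (2 odd-degree nodes)
```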
Graph databases
Graph operations
CRUD operations:
• Create: add a vertex or edge
• Read: retrieve data
• Update: change vertex or edge properties
• Delete: remove a vertex or edge
Graph-specific operations:
• Adjacency: find adjacent nodes of a vertex
• Neighbourhood: find connected vertices up to a certain radius
• Shortest path: find the sequence of edges that makes the shortest path between two nodes
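For the shortest-path operation on an unweighted graph, breadth-first search finds a path with the fewest edges. The sketch returns the vertices on that path, and consecutive pairs give the sequence of edges. The graph is hypothetical and directed.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Vertices on a path with the fewest edges from source to target, or None."""
    if source == target:
        return [source]
    prev = {source: None}  # also serves as the 'visited' set
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph.get(v, ()):
            if w not in prev:
                prev[w] = v
                if w == target:
                    # Walk the predecessor links back to the source.
                    path = [w]
                    while prev[path[-1]] is not None:
                        path.append(prev[path[-1]])
                    return path[::-1]
                queue.append(w)
    return None


# Hypothetical directed graph stored as adjacency lists.
GRAPH = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Dave"],
    "Carol": ["Dave", "Eve"],
    "Dave": ["Frank"],
    "Eve": ["Frank"],
}
path = shortest_path(GRAPH, "Alice", "Frank")
print(path)                        # ['Alice', 'Bob', 'Dave', 'Frank']
print(list(zip(path, path[1:])))   # the sequence of edges
```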
Native vs Non-Native Storage
Native = data stored as a graph
Non-native = data not stored as a graph
Check here for more details: https://neo4j.com/blog/native-vs-non-native-graph-technology/
Storing graph in RDBMS
(Figure: Alice (1), Bob (2) and Zach (99) linked by FRIEND relationships.)

Person
ID | Name
1 | Alice
2 | Bob
… | …
99 | Zach

Friendships
PersonID | FriendID
1 | 2
2 | 1
2 | 99
… | …
99 | 1

Bob's Friends:
SELECT p2.Name
FROM Person p1,
     Friendships f1,
     Person p2
WHERE p1.ID = f1.PersonID
  AND p2.ID = f1.FriendID
  AND p1.Name = 'Bob'
Storing graph in RDBMS: complex query

Bob's Friends-of-friends (same Person and Friendships tables):
SELECT p2.Name AS Friend_of_friend
FROM Person p1,
     Friendships f1,
     Person p2,
     Friendships f2
WHERE p1.ID = f1.PersonID
  AND f1.FriendID = f2.PersonID
  AND p2.ID = f2.FriendID
  AND p1.Name = 'Bob'
  AND f2.FriendID <> p1.ID
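Both queries can be run end to end in Python's built-in `sqlite3` module, as a sketch. Only the visible rows of the slide's tables are loaded; the elided "…" rows are left out. Results are sorted in Python because the SQL has no ORDER BY. Each extra hop means one more copy of Friendships in the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Friendships (PersonID INTEGER, FriendID INTEGER);
    INSERT INTO Person VALUES (1, 'Alice'), (2, 'Bob'), (99, 'Zach');
    INSERT INTO Friendships VALUES (1, 2), (2, 1), (2, 99), (99, 1);
""")

FRIENDS = """
    SELECT p2.Name
    FROM Person p1, Friendships f1, Person p2
    WHERE p1.ID = f1.PersonID
      AND p2.ID = f1.FriendID
      AND p1.Name = 'Bob'
"""

# One extra self-join on Friendships per extra hop.
FRIENDS_OF_FRIENDS = """
    SELECT p2.Name AS Friend_of_friend
    FROM Person p1, Friendships f1, Person p2, Friendships f2
    WHERE p1.ID = f1.PersonID
      AND f1.FriendID = f2.PersonID
      AND p2.ID = f2.FriendID
      AND p1.Name = 'Bob'
      AND f2.FriendID <> p1.ID
"""

friends = sorted(name for (name,) in conn.execute(FRIENDS))
fof = sorted(name for (name,) in conn.execute(FRIENDS_OF_FRIENDS))
print(friends)  # ['Alice', 'Zach']
print(fof)      # ['Alice']  (via Zach; Bob himself is excluded)
```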
Storing Graphs in JSON
{
  "A": { "outE": ["B", "C"] },
  "B": { "outE": [] },
  "C": { "outE": ["D"] },
  "D": { "outE": ["A"] }
}
http://www.slideshare.net/slidarko/graph-databases-trends-in-the-web-of-data
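In this layout only outgoing edges are stored. Following an edge forward is a direct lookup, but finding incoming edges means scanning every vertex. The sketch below is an illustration with Python's `json` module. The helper names `out_neighbours` and `in_neighbours` are hypothetical.

```python
import json

DOC = """
{
  "A": { "outE": ["B", "C"] },
  "B": { "outE": [] },
  "C": { "outE": ["D"] },
  "D": { "outE": ["A"] }
}
"""

graph = json.loads(DOC)


def out_neighbours(graph, v):
    """Stored directly: one lookup."""
    return graph[v]["outE"]


def in_neighbours(graph, v):
    """Not stored: every vertex has to be scanned."""
    return sorted(u for u, node in graph.items() if v in node["outE"])


print(out_neighbours(graph, "A"))  # ['B', 'C']
print(in_neighbours(graph, "A"))   # ['D']
```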
Native Storage
• Index-free adjacency
  – Doesn't rely on indexes
• Graph databases store relationship data as first-class entities
  – Relationships are stored explicitly as links between nodes
  – Each node stores links to its adjacent nodes (a micro-index or local index)
Graph processing
• Executing queries by traversing the graph
  – Following paths
  – No index search: each node directly references its adjacent nodes, acting as a micro-index for all nearby nodes
• Relationships are easier to traverse in any direction with native graph processing
• Query time is proportional to the amount of the graph searched, not the total size of the graph
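As an analogy for index-free adjacency, the sketch below uses Python object references in place of the links a native store keeps on disk. It does not show how Neo4j actually lays out its files. It counts the nodes a friends-of-friends traversal touches, then adds 10,000 unrelated nodes and shows the count does not change.

```python
class Node:
    """A node that keeps direct references to its neighbours."""

    def __init__(self, name):
        self.name = name
        self.neighbours = []  # direct references: no index lookup needed


def link(a, b):
    """Undirected link: each node keeps a reference to the other."""
    a.neighbours.append(b)
    b.neighbours.append(a)


def friends_of_friends(start):
    """Return (names found, number of nodes touched) for a 2-hop traversal."""
    touched = 0
    found = set()
    for friend in start.neighbours:
        touched += 1
        for fof in friend.neighbours:
            touched += 1
            if fof is not start:
                found.add(fof.name)
    return found, touched


me, a, b, c = Node("me"), Node("a"), Node("b"), Node("c")
link(me, a)
link(a, b)
link(a, c)
before = friends_of_friends(me)

# Grow the rest of the graph with 10,000 nodes that are not near "me".
others = [Node(f"x{i}") for i in range(10_000)]
for x, y in zip(others, others[1:]):
    link(x, y)
after = friends_of_friends(me)

print(sorted(after[0]), after[1])  # ['b', 'c'] 4
print(before == after)             # True
```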
Performance: Relationship queries
e.g. Find all friends of my friends

Relational:
• Performance degrades with many joins
• Links are expressed as joins
• Multiple self-joins are even worse
• Each self-join must create a copy of the relation
• Cost increases as cardinality increases

Graph:
• Performance stays roughly constant as the total data grows
• Queries are localised to a node
• Nodes are linked via relationships
• Cost depends on the radius searched
Query Performance: Set queries

e.g., For each film, how many actors are there?

Relational:
• Exploit data model structure
• Use GROUP BY operators

Graphs:
• Have to visit every node
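The sketch below computes the same set query both ways on hypothetical data. The relational side is a single GROUP BY in `sqlite3`. The graph side walks every actor node and each of its ACTED_IN edges. The results match; the difference is how the work is done.

```python
import sqlite3
from collections import Counter

# Hypothetical ACTED_IN data: (actor, film) pairs.
ACTED_IN = [("Ann", "F1"), ("Ben", "F1"), ("Cal", "F1"), ("Ann", "F2"), ("Cal", "F3")]

# Relational: one GROUP BY over the ActedIn table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ActedIn (Actor TEXT, Film TEXT)")
conn.executemany("INSERT INTO ActedIn VALUES (?, ?)", ACTED_IN)
relational = dict(conn.execute("SELECT Film, COUNT(*) FROM ActedIn GROUP BY Film"))

# Graph: visit every actor node and follow each of its ACTED_IN edges.
graph = {}
for actor, film in ACTED_IN:
    graph.setdefault(actor, []).append(film)
traversal = Counter()
for actor, films in graph.items():
    for film in films:
        traversal[film] += 1

print(relational == dict(traversal))  # True
```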
Benefits
• Data model looks like the graph you would draw on a whiteboard, so it is simple to discuss with non-technical people
• ACID compliant
• Flexible schema
• Cheap to run (e.g. compared to Oracle)
• Relationships are no longer hidden; they are a 1st class entity
• Fast for relationship (traversal) queries
Native Graph Storage

For this course we will use… http://neo4j.com/

…other native graph databases are available…

Neo4j: graph database (https://neo4j.com/)
• Characteristics
  – Open source, massive scalability (billions of nodes), high availability, master-slave replication, ACID transactions
• Specific features
  – Graph query language: Cypher
  – Traversal framework
  – Based on Java
• Data model
  – Property graph: a collection of nodes and relationships
Nodes
• Correspond to entities
• Have properties that describe the entity; stored in a JSON-like form
• Have labels that define the class of entity
(Figure: a UNIVERSITY node HWU and a PERSON node KEN, linked by RESEARCH and TEACH relationships.)
KEN's properties:
{
  name: "ken",
  age: 26,
  job: "ra"
}
Relationships
• Link 2 entities with a direction
• Have properties that describe the relationship; stored in a JSON-like form
• Can describe the nature of the link (e.g., parent-child)
• Or be qualitative (e.g., start date)
• Have labels that define the class of relationship
(Figure: HWU -EMPLOYS-> KEN, where the EMPLOYS relationship has these properties:)
{
  start: "1.4.2006",
  contract: "rolling",
  grade: 7
}
• Relationships ALWAYS have 1 direction
• Relationships cannot have 2 directions
• You can have multiple relationships – do you need to?

Answer: no. You can navigate in reverse!

HOWEVER... you might want to!

For example: HWU employs Ken, so Ken is employed by HWU. It doesn't make sense that Ken employs HWU, so one EMPLOYS relationship is enough.

BUT: Ken follows HWU (e.g. on Twitter) and HWU follows Ken, so two FOLLOWS relationships are needed.
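Here is a minimal property-graph sketch in plain Python. It is an illustration, not Neo4j's API. The `Node` and `Relationship` classes and the helper functions are hypothetical. The point: a single directed EMPLOYS relationship answers both "whom does HWU employ?" and "who employs Ken?". FOLLOWS is a separate fact in each direction, so it is stored twice.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare by identity: two nodes are never "the same" by value
class Node:
    labels: set
    properties: dict


@dataclass(eq=False)
class Relationship:
    rel_type: str
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)


def outgoing(rels, node, rel_type):
    """Follow relationships in their stored direction."""
    return [r.end for r in rels if r.start is node and r.rel_type == rel_type]


def incoming(rels, node, rel_type):
    """Navigate the same relationships in reverse; no second copy needed."""
    return [r.start for r in rels if r.end is node and r.rel_type == rel_type]


def names(nodes):
    return [n.properties["name"] for n in nodes]


hwu = Node({"UNIVERSITY"}, {"name": "HWU"})
ken = Node({"PERSON"}, {"name": "ken", "age": 26, "job": "ra"})
rels = [
    Relationship("EMPLOYS", hwu, ken,
                 {"start": "1.4.2006", "contract": "rolling", "grade": 7}),
    # FOLLOWS holds independently in each direction, so it is stored twice.
    Relationship("FOLLOWS", ken, hwu),
    Relationship("FOLLOWS", hwu, ken),
]

print(names(outgoing(rels, hwu, "EMPLOYS")))  # ['ken']
print(names(incoming(rels, ken, "EMPLOYS")))  # ['HWU']
print(names(outgoing(rels, ken, "EMPLOYS")))  # []  (Ken does not employ HWU)
```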
Exercise

Create a graph that models the relationships between Heriot-Watt University, Edinburgh and yourself.

http://www.apcjones.com/arrows/

• PERSON -STUDIES AT-> UNIVERSITY
• UNIVERSITY -LOCATED IN-> CITY
• PERSON -LIVES IN-> CITY

With properties:
• UNIVERSITY { name: "hwu", founded: 1821 }
• CITY { name: "Edinburgh" }
• PERSON { name: "steve", age: 23, job: "student" }
Queries generate sub-graphs
(Figure: queries over the HWU and KEN nodes, with each result shown as a sub-graph.)

Query
Treating relationships as 1st class entities, and storing
quantitative or qualitative facts about the relationship
enables very complex querying:

Find all men who are connected within three friends of my women friends who
like sailing but not bowling and who live within 30 miles of my address.
Weaknesses
• Still something of a niche product. Many use cases, but not a general-purpose DBMS in the way that RDBMSs are.
• Immature in comparison to RDBMSs
• Writes are slow (in comparison to something like MongoDB)
• Flexible schemas are not that flexible. Your program needs a fixed schema so that it knows what to ask for and how to ask for it. So if there is no schema defined at the DBMS level, you end up defining one in the programs that access the data.
• No standard query language yet: Neo4j, with Cypher, is pushing for a standard graph query language
https://neo4j.com/press-releases/query-language-graph-databases-international-standard/
Cypher
• Neo4j uses Cypher as its query language
• You will use Cypher in the lab!

http://neo4j.com/developer/cypher-query-language/
Cypher: create a node

create ({name: 'kcm'})

• create: the operation to run
• () indicates a node

This node, { name: 'kcm' }, is created in Neo4j.
Cypher: retrieve a node

match (n {name: 'kcm'})
return n

• match: the operation to run
• The node is given a variable (i.e., n), which is returned.
• n is like a 'list' of every node that matches the query. Each element of the list (i.e. every node) is returned.
Cypher: retrieve a node & create a relationship

match (n {name: 'kcm'})
create (n)-[:TEACHES]->({course: 'f21df'})

The node matching { name: 'kcm' } is found first, then used to create a relationship with the label TEACHES to a new course node { course: 'f21df' }.
Cypher Summary
• Create a node:
  create ()
• Retrieve all nodes:
  match (n) return n
• Create two nodes and a relationship called :LINKS_TO between them:
  create ()-[:LINKS_TO]->()
• Create a LINKS_TO relationship between nodes 0 and 1:
  match (n), (m) where id(n) = 0 and id(m) = 1
  create (n)-[:LINKS_TO]->(m)
• Delete all nodes and edges:
  match (n) optional match (n)-[r]-() delete n, r
Pattern
• Sequence of interleaved nodes and relationships
• Nodes between ()
• Relationships between []
• Relationships with arrow: <--, --, -->
• Describe a single path

Examples:
• Any node
• Node x and node y having any relation
• Any movie having any relation with an actor
• A movie that has a relation with a node having a property name equal to 'Ivan Trojan'
• Any node that has a relation PLAY with any other node
• Any movie with relation PLAY having the property role = 'Ivan'
• Actor who participates in a path of more than 2 KNOW relations
• *2..4: any node that has a path of 2 to 4 KNOW relations
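This Python sketch approximates what a variable-length pattern such as *2..4 matches: nodes reachable from a start node by 2 to 4 consecutive KNOW relationships. It makes two assumptions. First, the KNOWS graph is hypothetical. Second, the paths are simple, so no node is visited twice. That is stricter than Cypher, which only forbids repeating a relationship within one path, so results can differ on graphs with cycles.

```python
# Hypothetical directed KNOWS relationships: a -> b -> c -> d -> e -> f
KNOWS = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"], "e": ["f"]}


def nodes_within_hops(graph, start, min_hops, max_hops):
    """Nodes reachable from start by a path of min_hops..max_hops edges.

    Assumption: paths are simple (no node repeated), which is stricter than
    Cypher's rule that only relationships may not repeat within a path.
    """
    found = set()

    def walk(node, depth, on_path):
        if depth >= min_hops:
            found.add(node)
        if depth == max_hops:
            return
        for nxt in graph.get(node, ()):
            if nxt not in on_path:
                walk(nxt, depth + 1, on_path | {nxt})

    walk(start, 0, {start})
    return found


print(sorted(nodes_within_hops(KNOWS, "a", 2, 4)))  # ['c', 'd', 'e']
```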
Twitter Example:

Neo4j Aura cloud version
• free
• limited to 200k nodes

MATCH (a:User)-[:FOLLOWS]->(u:User:Me)
MATCH (u)-[:FOLLOWS]->(a)
RETURN a, u
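The Twitter query returns users `a` who follow the node labelled Me, where Me also follows `a` back. The plain-Python sketch below reproduces that logic on hypothetical FOLLOWS pairs. The string "me" stands in for the `:Me` node.

```python
# Hypothetical FOLLOWS relationships as (follower, followed) pairs.
FOLLOWS = {("ann", "me"), ("me", "ann"), ("bob", "me"), ("me", "cat")}


def mutual_followers(follows, me):
    """Users who follow `me` and whom `me` follows back."""
    followers = {a for (a, b) in follows if b == me}   # (a)-[:FOLLOWS]->(me)
    return sorted(a for a in followers if (me, a) in follows)  # (me)-[:FOLLOWS]->(a)


print(mutual_followers(FOLLOWS, "me"))  # ['ann']
```

Bob follows me but is not followed back, and I follow Cat but she does not follow me. Only Ann satisfies both MATCH clauses.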
Lab Exercise
• See instructions on Canvas
• Sign up for Neo4j AuraDB or install the Desktop App on your own PC
https://neo4j.com/press-releases/announcing-neo4j-auradb-free/
http://neo4j.com/books/graph-databases/
https://neo4j.com/developer/cypher-query-language/
https://neo4j.com/developer/data-modeling/
https://neo4j.com/docs/pdf/cypher-refcard-3.3.pdf
Today we discussed
• Graph databases
• Neo4j