SlideShare a Scribd company logo
Graph Databases
&
Neo4J
Girish Khanzode
Graph Databases
• Graph Based NoSQL Database
• Property Graph Model
• Neo4j
• Noe4j Architecture
• Data Storage
• Programmatic Data Access
• Core API
• Lucene
• Auto Index lifecycle
• Traversers API
• Cypher
• Graph Algorithms
• Neo4j HA
• Cache Sharding
• References
Graphs
• A collection nodes (things) and edges (relationships) that
connect pairs of nodes
– Suitable for any data that is related
• Can attach properties (key-value pairs) on nodes and
relationships
• Relationships connect two nodes and both nodes and
relationships can hold an arbitrary amount of key-value pairs
Graph Relations are Universal
Graph
Graphs
• Well-understood patterns and algorithms
– Studied since Leonard Euler's 7 Bridges (1736)
– Codd's Relational Model (1970)
• Knowledge graph - beyond links, search is smarter when considering how things
are related
• Facebook graph search – people interested in finding things in their part of the
world
• Bing + Britannica: referencing and cross-referencing
• People - relationships to people, to organizations, to places, to things - personal
graph
A Graph Database
• Relationships are first citizens
• NoSQL database optimized for connected data
– Social networking, logistics networks, recommendation engines
– Relationships are as important as the records
– 1000 times faster than RDBMS for connected data
• Uses graph structures with nodes, edges and properties to store data
• Open source graph databases - Neo4j, InfiniteGraph, InfoGrid,OrientDB
• Very fast querying across records
Graph Database
A Graph Database
• Transactional with the usual operations
• RDBMS - can tell sales in last year
• Graph database – can tell customer which book to buy next
• Index-free adjacency
– Every node is a pointer to its adjacent element
• Edges hold most of the important information and relations
– nodes to other nodes
– nodes to properties
Graph Based NoSQL Database
• No rigid format of SQL or the tables and columns representation
• Uses a flexible graphical representation - addresses scalability concerns
• Data can be easily transformed from one model to the other using a
graph based NoSQL database
• Nodes are organised by some relationships with one another represented
by edges between the nodes
• Both nodes and the relationships have some defined properties
Graph Based NoSQL Database
• Labelled, directed, attributed multi-graph - Graphs contains nodes which
are labelled properly with some properties and these nodes have some
relationship with one another which is shown by the directional edges
• While relational database models can replicate the graphical ones, the
edge would require a join which is a costly proposition
Advantages
• Easier Relationships Analysis
• Very fast for associative data sets
– Like social networks
• Map more directly to object oriented applications
– Object classification and Parent->Child relationships
Disadvantages
• If data is just tabular with not much relationship between the
data, graph databases do not fare well
• OLAP support for graph databases not mature
Performance Experiment
• Compute social network path exists
• 1000 persons
• Average 50 friends per person
• pathExists(a, b) limited to depth 4
# persons query time
Relational
database
1000 2000ms
Neo4j 1000 2ms
Neo4j 1000000 2ms
Property Graph Model
name: the Doctor
age: 907
species:Time Lord
first name: Rose
late name:Tyler
vehicle: Skoda
model:Type 40
Graphs -Whiteboard-friendly
• No decomposition, ER design, normalization / de-
normalization as needed with RDBMS
Neo4j
• A Graph Database
• A Property Graph containing Nodes, Relationships with Properties on
both
• Manage complex, highly connected data
• Scalable - High-performance with High-Availability
– Traverse 1,000,000+ relationships / second on commodity hardware
• Server with REST API, or Embeddable on the JVM
Neo4j
• Full ACID transactions
• Schema free, bottom-up data model design
• Stable
• Easier than RDBMS since no need for normalization
• Implemented in Java
• Open Source
Neo4j
• Schema free – Data does not have to adhere to any convention
• Support for wide variety of languages - Java, Python, Perl, Scala,Cypher
• A graph database can be thought of as a key-value store, with full support
for relationships.
• Graph databases don’t avoid design efforts
• Good design still requires effort
Why Neo4J?
• The internet is a network of pages connected to each other.
What is a better way to model that than in graphs?
• No time lost fighting with less expressive data-stores
• Easy to implement experimental features
• A single instance of Neo4j can house at most 34 billion nodes,
34 billion relationships and 68 billion properties
Core API
REST API
JVM Language Bindings
Traversal Framework
Caches
Memory-Mapped (N)IO
Filesystem
Java Ruby Clojure…
Graph Matching
Noe4j Architecture
Software Architecture
Data Storage
• Neo4j stores graph data in a number of different store files
• Each store file contains the data for a specific part of the
graph
– neostore.nodestore.db
– neostore.relationshipstore.db
– neostore.propertystore.db
– neostore.propertystore.db.index
– neostore.propertystore.db.strings
– neostore.propertystore.db.arrays
Node Store
• Size: 9 bytes
– 1st byte - in-use flag
– Next 4 bytes - ID of first relationship
– Last 4 bytes - ID of first property of node
• Fixed size records enable fast lookups
Relationship store
• neostore.relationshipstore.db
• Size: 33 bytes
• 1st byte - In use flag
• Next 8 bytes - IDs of the nodes at the start and end of the relationship
• 4 bytes - Pointer to the relationship type
• 16 bytes - pointers for the next and previous relationship records for each of the start and end nodes. (
property chain)
• 4 bytes - next property id
Relationships Storage
Data Size
nodes 235 (∼ 34 billion)
relationships 235 (∼ 34 billion)
properties 236 to 238 depending on property types (maximum ∼ 274
billion, always at least ∼ 68 billion)
relationship
types
215 (∼ 32 000)
Neo4j API – LogicalView
Programmatic Data Access
• JavaAPIs - JVM languages bind to sameAPIs
• JRuby, Jython, Clojure, Scala…
• Manage nodes and relationships
• Indexing – find data without traversal
• Traversing
• Path finding
• Pattern matching
Core API
• Deals with graphs in terms of their fundamentals
• Nodes - properties
– KV Pairs
• Relationships
– Start node
– End node
– Properties
• KV Pairs
Create Node
GraphDatabaseService db = new EmbeddedGraphDatabase("/tmp/neo");
Transaction tx = db.beginTx();
try {
Node theDoctor = db.createNode();
theDoctor.setProperty("character", "the Doctor");
tx.success();
} finally
{
tx.finish();
}
Create Relationships
Transaction tx = db.beginTx();
try {
Node theDoctor = db.createNode();
theDoctor.setProperty("character", "The Doctor");
Node susan = db.createNode();
susan.setProperty("firstname", "Susan");
susan.setProperty("lastname", "Campbell");
susan.createRelationshipTo(theDoctor,DynamicRelationshipType.withName("COMPANION_OF"));
tx.success();
} finally
{
tx.finish();
}
Index a Graph
• Graphs themselves are indexes
• Can create short-cuts to well-known nodes
• In program, keep a reference to any interesting node
• Indexes offer flexibility in what constitutes an “interesting
node”
Lucene
• The default index implementation for Neo4j
– Default implementation for IndexManager
• Supports many indexes per database
• Each index supports nodes or relationships
• Supports exact and regex-based matching
• Supports scoring
– Number of hits in the index for a given item
– Great for recommendations
Create a Node Index
GraphDatabaseService db = …
Index<Node> planets = db.index().forNodes("planets");
Type
Type
Indexname
CreateOR
retrieve
Create a Relationship Index
GraphDatabaseService db = …
Index<Relationship> enemies = db.index().forRelationships("enemies");
Type
Type
Indexname
CreateOR
retrieve
Exact Matches
GraphDatabaseService db = …
Index<Node> actors = doctorWhoDatabase.index().forNodes("actors");
Node rogerDelgado = actors.get("actor", "Roger Delgado“).getSingle();
Valueto
match
Firstmatch
only
Key
Query Matches
GraphDatabaseService db = …
Index<Node> species = doctorWhoDatabase.index().forNodes("species");
IndexHits<Node> speciesHits = species.query("species“,"S*n");
Query
Key
Transactions to Mutate Indexes
• Mutating access is still protected by transactions which cover both index and graph
GraphDatabaseService db = …
Transaction tx = db.beginTx();
try {
Node nixon= db.createNode();
nixon("character", "Richard Nixon");
db.index().forNodes("characters").add(nixon,
"character“, nixon.getProperty("character"));
tx.success();
} finally {
tx.finish();
}
Auto Index lifecycle
• Auto Index - stays consistent with the graph data
• Specify the property name to index while creation
• If node/relationship or property is removed from the graph it is removed
from the index
• If database started with auto indexing enabled but different auto indexed
properties than the last run, then already auto-indexed entities will be
deleted as they are worked upon
• Re-indexing is a manual
– Existing properties not indexed unless touched
Auto Index lifecycle
AutoIndexer<Node> nodeAutoIndex = graphDb.index().getNodeAutoIndexer();
nodeAutoIndex.startAutoIndexingProperty("species");
nodeAutoIndex.setEnabled( true );
ReadableIndex<Node> autoNodeIndex = graphDb.index().getNodeAutoIndexer().getAutoIndex();
Node -> Relationship Indexes Supported
Core API
• Basic (nodes, relationships)
• Fast
• Imperative
• Flexible - Easily intermix mutating operations
Traversers API
• Mechanisms to query graph navigating from starting node to
related nodes according to algorithm to get answers
• Expressive
• Fast
• Declarative (mostly)
• Opinionated
Cypher - A Graph Query Language
• Query Language for Neo4j
• A declarative graph pattern matching language
– SQL for graphs
– Tabular results
• aggregation, ordering and limits
• Mutating operations
• CRUD
• Easy to formulate queries based on relationships
• Many features stem from improving pain points of SQL like join tables
Cypher - A Graph Query Language
Cypher
Query
• Query:
MATCH(n:Crew)-[r:KNOWS*]-m
WHERE n.name = ‘Neo’
RETUEN nAS Nep,r,m
Operations
• Aggregation - COUNT, SUM, AVG, MAX, MIN, COLLECT
• Where clause
start doctor=node:characters(name = 'Doctor‘)
match (doctor)<-[:PLAYED]-(actor)-[:APPEARED_IN]->(episode) where actor.actor = 'Tom
Baker‘ and episode.title =~ /.*Dalek.*/
return episode.title
• Ordering
– order by <property>
– order by <property> desc
Graph Algorithms
• Neo4j has built-in algorithms
• Callable through JVM and REST APIs
• Higher level of abstraction
• Graph Matching
– Look for patterns in a data set - retail analytics
– Higher-level abstraction than raw traversers
• REST API
– Access the server
• Binary protocol
– JSON as default format
Neo4j HA - High Availability Cluster
• A scalability package known as high availability or HA that
uses a master-slave cluster architecture
– Full data redundancy
– Service fault tolerance
– Linear read scalability
– Master-slave replication
• Single data-centre or global zones
– tolerance for high-latency
Neo4j HA
• Redundancy - improved uptime
– automatic failover
• In a Neo4j HA cluster the full graph is replicated to each instance in the
cluster.
• Full dataset is replicated across the entire cluster to each server
• Read operations can be done locally on each slave
• Read capacity of the HA cluster increases linearly with the number of
servers
Neo4j HA
HA Cluster Architecture
• Cluster performs automatic master election
• Supports master-slave replication for clustering and DR
across sites
HA Cluster Architecture
Write to a Master
• All write operations are co-ordinated by the master
• Writes to the master are fast
• Slaves eventually catch up
Write to a Master
Write to a Slave
• Writes to a slave cause a synchronous transaction
with the master
• Other slaves eventually catch up
Write to a Slave
Server Overload Problem
• Unlike other classes of NOSQL database, a graph does not
have predictable lookup since it is a highly mutable structure
• We want to co-locate related nodes for traversal
performance, but we don’t want to place so many connected
nodes on the same database that it becomes heavily loaded
• The black-hole problem - popular nodes get lumped together
on a single instance, but there is low point cut
Server Overload Problem
Thinly Spread Network
• The opposite is also true, that we don’t want too widely connected nodes
across different database instances since it will incur a substantial
performance penalty at runtime as traversals cross the (relatively latent)
network
• Load-leveling alone can lead to many relationships crossing instances
• These are very expensive to traverse, networks are many orders of
magnitude slower than in-memory traversals
Thinly Spread Network
Minimal Point Cut
• The best approach is to balance a graph across database instances by
creating a minimum point cut for a graph, where graph nodes are placed
such that there are few relationships that span shards
• Good strategy is to take a local view of the graph (no global locks) and
work incrementally (short bursts)
• Take into account use patterns
• Unlike other NoSQL stores, graph s are not predictable so we can not use
techniques like consistent hashing for scale out
Minimal Point Cut
Cache Sharding
• A strategy for large data sets of terabyte scale
• Mandates consistent request routing
• For instance, requests for user A are always sent to server 1,
while requests for user B are always sent to server 2 and so on
• The key assumption is that requests for user A typically touch
parts of the graph around user A, such has his or her friends,
preferences, likes and so on
Cache Sharding
• This means that the neighbourhood of the graph around user
A will be cached on server 1, while the neighbourhood around
user B will be cached on server 2
• By employing consistent routing of requests, the caches of all
servers in the HA cluster can be utilized maximally
• Strategy is highly effective for managing a large graph that
does not fit in RAM
Consistent Routing
• Always try to route related requests to the same server to hopefully
benefit from warm caches
Domain Specific Sharding
• No easy to shard graphs like documents or KV stores
• High performance graph databases limited in terms of data set size that
can be handled by a single machine
• Use replicas to speed up and improve availability but limits data set size
limited to a single machine’s disk/memory
• No perfect algorithm exists but domain insight of expert helps
Domain Specific Sharding
• Some domains can shard easily (geo, most web apps) using consistent
routing approach and cache sharding
– Geo - where the connections between cities are few compared with the
connections within the cities. So can place cities or countries on different
nodes
• Eventually (Petabytes) level data cannot be replicated practically
• Need to shard data across machines
References
1. http://www.neo4j.org
2. http://www.neo4j.org/learn/cypher
3. Bachman, Michal (2013)GraphAware -TowardsOnline Analytical Processing in Graph Databases
http://graphaware.com/assets/bachman-msc-thesis.pdf
4. Hunger, Michael (2012). Cypher and Neo4j http://vimeo.com/83797381
5. Mistry, Deep Neo4j: A Developer’s Perspective
http://osintegrators.com/opensoftwareintegrators%7Cneo4jadevelopersperspective
6. MapGraph:A High LevelAPI for Fast Development of High Performance GraphAnalytics on GPUs
7. Parallel Breadth First Search on GPU Clusters
8. DB-Engines Ranking of Graph DBMS
ThankYou
Check Out My LinkedIn Profile at
https://in.linkedin.com/in/girishkhanzode
Ad

More Related Content

What's hot (20)

Intro to Cypher
Intro to CypherIntro to Cypher
Intro to Cypher
Neo4j
 
Intro to Neo4j and Graph Databases
Intro to Neo4j and Graph DatabasesIntro to Neo4j and Graph Databases
Intro to Neo4j and Graph Databases
Neo4j
 
NoSQL Graph Databases - Why, When and Where
NoSQL Graph Databases - Why, When and WhereNoSQL Graph Databases - Why, When and Where
NoSQL Graph Databases - Why, When and Where
Eugene Hanikblum
 
NoSql
NoSqlNoSql
NoSql
Girish Khanzode
 
NoSql
NoSqlNoSql
NoSql
AnitaSenthilkumar
 
Introduction to Neo4j for the Emirates & Bahrain
Introduction to Neo4j for the Emirates & BahrainIntroduction to Neo4j for the Emirates & Bahrain
Introduction to Neo4j for the Emirates & Bahrain
Neo4j
 
Introduction: Relational to Graphs
Introduction: Relational to GraphsIntroduction: Relational to Graphs
Introduction: Relational to Graphs
Neo4j
 
5 Data Modeling for NoSQL 1/2
5 Data Modeling for NoSQL 1/25 Data Modeling for NoSQL 1/2
5 Data Modeling for NoSQL 1/2
Fabio Fumarola
 
Graph database
Graph database Graph database
Graph database
Shruti Arya
 
Graph databases
Graph databasesGraph databases
Graph databases
Vinoth Kannan
 
Neo4j in Depth
Neo4j in DepthNeo4j in Depth
Neo4j in Depth
Max De Marzi
 
Intro to Graphs and Neo4j
Intro to Graphs and Neo4jIntro to Graphs and Neo4j
Intro to Graphs and Neo4j
jexp
 
Knowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based SearchKnowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based Search
Neo4j
 
An overview of Neo4j Internals
An overview of Neo4j InternalsAn overview of Neo4j Internals
An overview of Neo4j Internals
Tobias Lindaaker
 
Nosql data models
Nosql data modelsNosql data models
Nosql data models
Viet-Trung TRAN
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
Prashant Gupta
 
Scaling into Billions of Nodes and Relationships with Neo4j Graph Data Science
Scaling into Billions of Nodes and Relationships with Neo4j Graph Data ScienceScaling into Billions of Nodes and Relationships with Neo4j Graph Data Science
Scaling into Billions of Nodes and Relationships with Neo4j Graph Data Science
Neo4j
 
Introduction to Neo4j
Introduction to Neo4jIntroduction to Neo4j
Introduction to Neo4j
Neo4j
 
Data Modeling with Neo4j
Data Modeling with Neo4jData Modeling with Neo4j
Data Modeling with Neo4j
Neo4j
 
Intro to HBase
Intro to HBaseIntro to HBase
Intro to HBase
alexbaranau
 
Intro to Cypher
Intro to CypherIntro to Cypher
Intro to Cypher
Neo4j
 
Intro to Neo4j and Graph Databases
Intro to Neo4j and Graph DatabasesIntro to Neo4j and Graph Databases
Intro to Neo4j and Graph Databases
Neo4j
 
NoSQL Graph Databases - Why, When and Where
NoSQL Graph Databases - Why, When and WhereNoSQL Graph Databases - Why, When and Where
NoSQL Graph Databases - Why, When and Where
Eugene Hanikblum
 
Introduction to Neo4j for the Emirates & Bahrain
Introduction to Neo4j for the Emirates & BahrainIntroduction to Neo4j for the Emirates & Bahrain
Introduction to Neo4j for the Emirates & Bahrain
Neo4j
 
Introduction: Relational to Graphs
Introduction: Relational to GraphsIntroduction: Relational to Graphs
Introduction: Relational to Graphs
Neo4j
 
5 Data Modeling for NoSQL 1/2
5 Data Modeling for NoSQL 1/25 Data Modeling for NoSQL 1/2
5 Data Modeling for NoSQL 1/2
Fabio Fumarola
 
Intro to Graphs and Neo4j
Intro to Graphs and Neo4jIntro to Graphs and Neo4j
Intro to Graphs and Neo4j
jexp
 
Knowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based SearchKnowledge Graphs - The Power of Graph-Based Search
Knowledge Graphs - The Power of Graph-Based Search
Neo4j
 
An overview of Neo4j Internals
An overview of Neo4j InternalsAn overview of Neo4j Internals
An overview of Neo4j Internals
Tobias Lindaaker
 
Scaling into Billions of Nodes and Relationships with Neo4j Graph Data Science
Scaling into Billions of Nodes and Relationships with Neo4j Graph Data ScienceScaling into Billions of Nodes and Relationships with Neo4j Graph Data Science
Scaling into Billions of Nodes and Relationships with Neo4j Graph Data Science
Neo4j
 
Introduction to Neo4j
Introduction to Neo4jIntroduction to Neo4j
Introduction to Neo4j
Neo4j
 
Data Modeling with Neo4j
Data Modeling with Neo4jData Modeling with Neo4j
Data Modeling with Neo4j
Neo4j
 

Similar to Graph Databases (20)

Neo4j Training Introduction
Neo4j Training IntroductionNeo4j Training Introduction
Neo4j Training Introduction
Max De Marzi
 
Graph Databases
Graph DatabasesGraph Databases
Graph Databases
thai
 
Ciel, mes données ne sont plus relationnelles
Ciel, mes données ne sont plus relationnellesCiel, mes données ne sont plus relationnelles
Ciel, mes données ne sont plus relationnelles
Xavier Gorse
 
Graph Database and Neo4j
Graph Database and Neo4jGraph Database and Neo4j
Graph Database and Neo4j
Sina Khorami
 
No SQL- The Future Of Data Storage
No SQL- The Future Of Data StorageNo SQL- The Future Of Data Storage
No SQL- The Future Of Data Storage
Bethmi Gunasekara
 
An Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4jAn Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4j
Debanjan Mahata
 
MongoDB Basics
MongoDB BasicsMongoDB Basics
MongoDB Basics
Sarang Shravagi
 
Neo4j tms
Neo4j tmsNeo4j tms
Neo4j tms
_mdev_
 
introduction to NOSQL Database
introduction to NOSQL Databaseintroduction to NOSQL Database
introduction to NOSQL Database
nehabsairam
 
Gerry McNicol Graph Databases
Gerry McNicol Graph DatabasesGerry McNicol Graph Databases
Gerry McNicol Graph Databases
Gerry McNicol
 
Sharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data LessonsSharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data Lessons
George Stathis
 
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
Transforming AI with Graphs: Real World Examples using Spark and Neo4jTransforming AI with Graphs: Real World Examples using Spark and Neo4j
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
Fred Madrid
 
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
Transforming AI with Graphs: Real World Examples using Spark and Neo4jTransforming AI with Graphs: Real World Examples using Spark and Neo4j
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
Databricks
 
Combine Spring Data Neo4j and Spring Boot to quickl
Combine Spring Data Neo4j and Spring Boot to quicklCombine Spring Data Neo4j and Spring Boot to quickl
Combine Spring Data Neo4j and Spring Boot to quickl
Neo4j
 
DBMS & Data Models - In Introduction
DBMS & Data Models - In IntroductionDBMS & Data Models - In Introduction
DBMS & Data Models - In Introduction
Rajeev Srivastava
 
Graph Databases & OrientDB
Graph Databases & OrientDBGraph Databases & OrientDB
Graph Databases & OrientDB
Arpit Poladia
 
Spring Data Neo4j Intro SpringOne 2011
Spring Data Neo4j Intro SpringOne 2011Spring Data Neo4j Intro SpringOne 2011
Spring Data Neo4j Intro SpringOne 2011
jexp
 
NoSql Brownbag
NoSql BrownbagNoSql Brownbag
NoSql Brownbag
Sandeep Kumar
 
Demo Neo4j - Big Data Paris
Demo Neo4j - Big Data ParisDemo Neo4j - Big Data Paris
Demo Neo4j - Big Data Paris
Neo4j
 
Intro to Graphs for Fedict
Intro to Graphs for FedictIntro to Graphs for Fedict
Intro to Graphs for Fedict
Rik Van Bruggen
 
Neo4j Training Introduction
Neo4j Training IntroductionNeo4j Training Introduction
Neo4j Training Introduction
Max De Marzi
 
Graph Databases
Graph DatabasesGraph Databases
Graph Databases
thai
 
Ciel, mes données ne sont plus relationnelles
Ciel, mes données ne sont plus relationnellesCiel, mes données ne sont plus relationnelles
Ciel, mes données ne sont plus relationnelles
Xavier Gorse
 
Graph Database and Neo4j
Graph Database and Neo4jGraph Database and Neo4j
Graph Database and Neo4j
Sina Khorami
 
No SQL- The Future Of Data Storage
No SQL- The Future Of Data StorageNo SQL- The Future Of Data Storage
No SQL- The Future Of Data Storage
Bethmi Gunasekara
 
An Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4jAn Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4j
Debanjan Mahata
 
Neo4j tms
Neo4j tmsNeo4j tms
Neo4j tms
_mdev_
 
introduction to NOSQL Database
introduction to NOSQL Databaseintroduction to NOSQL Database
introduction to NOSQL Database
nehabsairam
 
Gerry McNicol Graph Databases
Gerry McNicol Graph DatabasesGerry McNicol Graph Databases
Gerry McNicol Graph Databases
Gerry McNicol
 
Sharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data LessonsSharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data Lessons
George Stathis
 
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
Transforming AI with Graphs: Real World Examples using Spark and Neo4jTransforming AI with Graphs: Real World Examples using Spark and Neo4j
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
Fred Madrid
 
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
Transforming AI with Graphs: Real World Examples using Spark and Neo4jTransforming AI with Graphs: Real World Examples using Spark and Neo4j
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
Databricks
 
Combine Spring Data Neo4j and Spring Boot to quickl
Combine Spring Data Neo4j and Spring Boot to quicklCombine Spring Data Neo4j and Spring Boot to quickl
Combine Spring Data Neo4j and Spring Boot to quickl
Neo4j
 
DBMS & Data Models - In Introduction
DBMS & Data Models - In IntroductionDBMS & Data Models - In Introduction
DBMS & Data Models - In Introduction
Rajeev Srivastava
 
Graph Databases & OrientDB
Graph Databases & OrientDBGraph Databases & OrientDB
Graph Databases & OrientDB
Arpit Poladia
 
Spring Data Neo4j Intro SpringOne 2011
Spring Data Neo4j Intro SpringOne 2011Spring Data Neo4j Intro SpringOne 2011
Spring Data Neo4j Intro SpringOne 2011
jexp
 
Demo Neo4j - Big Data Paris
Demo Neo4j - Big Data ParisDemo Neo4j - Big Data Paris
Demo Neo4j - Big Data Paris
Neo4j
 
Intro to Graphs for Fedict
Intro to Graphs for FedictIntro to Graphs for Fedict
Intro to Graphs for Fedict
Rik Van Bruggen
 
Ad

More from Girish Khanzode (12)

Apache Spark Components
Apache Spark ComponentsApache Spark Components
Apache Spark Components
Girish Khanzode
 
Apache Spark Core
Apache Spark CoreApache Spark Core
Apache Spark Core
Girish Khanzode
 
Data Visulalization
Data VisulalizationData Visulalization
Data Visulalization
Girish Khanzode
 
IR
IRIR
IR
Girish Khanzode
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
Girish Khanzode
 
NLP
NLPNLP
NLP
Girish Khanzode
 
NLTK
NLTKNLTK
NLTK
Girish Khanzode
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
Girish Khanzode
 
Hadoop
HadoopHadoop
Hadoop
Girish Khanzode
 
Language R
Language RLanguage R
Language R
Girish Khanzode
 
Python Scipy Numpy
Python Scipy NumpyPython Scipy Numpy
Python Scipy Numpy
Girish Khanzode
 
Funtional Programming
Funtional ProgrammingFuntional Programming
Funtional Programming
Girish Khanzode
 
Ad

Recently uploaded (20)

tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 

Graph Databases

  • 2. Graph Databases • Graph Based NoSQL Database • Property Graph Model • Neo4j • Noe4j Architecture • Data Storage • Programmatic Data Access • Core API • Lucene • Auto Index lifecycle • Traversers API • Cypher • Graph Algorithms • Neo4j HA • Cache Sharding • References
  • 3. Graphs • A collection nodes (things) and edges (relationships) that connect pairs of nodes – Suitable for any data that is related • Can attach properties (key-value pairs) on nodes and relationships • Relationships connect two nodes and both nodes and relationships can hold an arbitrary amount of key-value pairs
  • 6. Graphs • Well-understood patterns and algorithms – Studied since Leonard Euler's 7 Bridges (1736) – Codd's Relational Model (1970) • Knowledge graph - beyond links, search is smarter when considering how things are related • Facebook graph search – people interested in finding things in their part of the world • Bing + Britannica: referencing and cross-referencing • People - relationships to people, to organizations, to places, to things - personal graph
  • 7. A Graph Database • Relationships are first citizens • NoSQL database optimized for connected data – Social networking, logistics networks, recommendation engines – Relationships are as important as the records – 1000 times faster than RDBMS for connected data • Uses graph structures with nodes, edges and properties to store data • Open source graph databases - Neo4j, InfiniteGraph, InfoGrid,OrientDB • Very fast querying across records
  • 9. A Graph Database • Transactional with the usual operations • RDBMS - can tell sales in last year • Graph database – can tell customer which book to buy next • Index-free adjacency – Every node is a pointer to its adjacent element • Edges hold most of the important information and relations – nodes to other nodes – nodes to properties
  • 10. Graph Based NoSQL Database • No rigid format of SQL or the tables and columns representation • Uses a flexible graphical representation - addresses scalability concerns • Data can be easily transformed from one model to the other using a graph based NoSQL database • Nodes are organised by some relationships with one another represented by edges between the nodes • Both nodes and the relationships have some defined properties
  • 11. Graph Based NoSQL Database • Labelled, directed, attributed multi-graph - Graphs contains nodes which are labelled properly with some properties and these nodes have some relationship with one another which is shown by the directional edges • While relational database models can replicate the graphical ones, the edge would require a join which is a costly proposition
  • 12. Advantages • Easier Relationships Analysis • Very fast for associative data sets – Like social networks • Map more directly to object oriented applications – Object classification and Parent->Child relationships
  • 13. Disadvantages • If data is just tabular with not much relationship between the data, graph databases do not fare well • OLAP support for graph databases not mature
  • 14. Performance Experiment • Compute social network path exists • 1000 persons • Average 50 friends per person • pathExists(a, b) limited to depth 4 # persons query time Relational database 1000 2000ms Neo4j 1000 2ms Neo4j 1000000 2ms
  • 15. Property Graph Model name: the Doctor age: 907 species:Time Lord first name: Rose late name:Tyler vehicle: Skoda model:Type 40
  • 16. Graphs -Whiteboard-friendly • No decomposition, ER design, normalization / de- normalization as needed with RDBMS
  • 17. Neo4j • A Graph Database • A Property Graph containing Nodes, Relationships with Properties on both • Manage complex, highly connected data • Scalable - High-performance with High-Availability – Traverse 1,000,000+ relationships / second on commodity hardware • Server with REST API, or Embeddable on the JVM
  • 18. Neo4j • Full ACID transactions • Schema free, bottom-up data model design • Stable • Easier than RDBMS since no need for normalization • Implemented in Java • Open Source
  • 19. Neo4j • Schema free – Data does not have to adhere to any convention • Support for wide variety of languages - Java, Python, Perl, Scala,Cypher • A graph database can be thought of as a key-value store, with full support for relationships. • Graph databases don’t avoid design efforts • Good design still requires effort
  • 20. Why Neo4J? • The internet is a network of pages connected to each other. What is a better way to model that than in graphs? • No time lost fighting with less expressive data-stores • Easy to implement experimental features • A single instance of Neo4j can house at most 34 billion nodes, 34 billion relationships and 68 billion properties
  • 21. Core API REST API JVM Language Bindings Traversal Framework Caches Memory-Mapped (N)IO Filesystem Java Ruby Clojure… Graph Matching Noe4j Architecture
  • 23. Data Storage • Neo4j stores graph data in a number of different store files • Each store file contains the data for a specific part of the graph – neostore.nodestore.db – neostore.relationshipstore.db – neostore.propertystore.db – neostore.propertystore.db.index – neostore.propertystore.db.strings – neostore.propertystore.db.arrays
  • 24. Node Store • Size: 9 bytes – 1st byte - in-use flag – Next 4 bytes - ID of first relationship – Last 4 bytes - ID of first property of node • Fixed size records enable fast lookups
  • 25. Relationship store • neostore.relationshipstore.db • Size: 33 bytes • 1st byte - In use flag • Next 8 bytes - IDs of the nodes at the start and end of the relationship • 4 bytes - Pointer to the relationship type • 16 bytes - pointers for the next and previous relationship records for each of the start and end nodes. ( property chain) • 4 bytes - next property id
  • 27. Data Size nodes 235 (∼ 34 billion) relationships 235 (∼ 34 billion) properties 236 to 238 depending on property types (maximum ∼ 274 billion, always at least ∼ 68 billion) relationship types 215 (∼ 32 000)
  • 28. Neo4j API – LogicalView
  • 29. Programmatic Data Access • JavaAPIs - JVM languages bind to sameAPIs • JRuby, Jython, Clojure, Scala… • Manage nodes and relationships • Indexing – find data without traversal • Traversing • Path finding • Pattern matching
  • 30. Core API • Deals with graphs in terms of their fundamentals • Nodes - properties – KV Pairs • Relationships – Start node – End node – Properties • KV Pairs
  • 31. Create Node GraphDatabaseService db = new EmbeddedGraphDatabase("/tmp/neo"); Transaction tx = db.beginTx(); try { Node theDoctor = db.createNode(); theDoctor.setProperty("character", "the Doctor"); tx.success(); } finally { tx.finish(); }
  • 32. Create Relationships Transaction tx = db.beginTx(); try { Node theDoctor = db.createNode(); theDoctor.setProperty("character", "The Doctor"); Node susan = db.createNode(); susan.setProperty("firstname", "Susan"); susan.setProperty("lastname", "Campbell"); susan.createRelationshipTo(theDoctor,DynamicRelationshipType.withName("COMPANION_OF")); tx.success(); } finally { tx.finish(); }
  • 33. Index a Graph • Graphs themselves are indexes • Can create short-cuts to well-known nodes • In program, keep a reference to any interesting node • Indexes offer flexibility in what constitutes an “interesting node”
  • 34. Lucene • The default index implementation for Neo4j – Default implementation for IndexManager • Supports many indexes per database • Each index supports nodes or relationships • Supports exact and regex-based matching • Supports scoring – Number of hits in the index for a given item – Great for recommendations
  • 35. Create a Node Index GraphDatabaseService db = … Index<Node> planets = db.index().forNodes("planets"); Type Type Indexname CreateOR retrieve
  • 36. Create a Relationship Index GraphDatabaseService db = … Index<Relationship> enemies = db.index().forRelationships("enemies"); Type Type Indexname CreateOR retrieve
  • 37. Exact Matches GraphDatabaseService db = … Index<Node> actors = doctorWhoDatabase.index().forNodes("actors"); Node rogerDelgado = actors.get("actor", "Roger Delgado“).getSingle(); Valueto match Firstmatch only Key
  • 38. Query Matches GraphDatabaseService db = … Index<Node> species = doctorWhoDatabase.index().forNodes("species"); IndexHits<Node> speciesHits = species.query("species“,"S*n"); Query Key
  • 39. Transactions to Mutate Indexes • Mutating access is still protected by transactions which cover both index and graph GraphDatabaseService db = … Transaction tx = db.beginTx(); try { Node nixon= db.createNode(); nixon("character", "Richard Nixon"); db.index().forNodes("characters").add(nixon, "character“, nixon.getProperty("character")); tx.success(); } finally { tx.finish(); }
  • 40. Auto Index lifecycle • Auto Index - stays consistent with the graph data • Specify the property name to index while creation • If node/relationship or property is removed from the graph it is removed from the index • If database started with auto indexing enabled but different auto indexed properties than the last run, then already auto-indexed entities will be deleted as they are worked upon • Re-indexing is a manual – Existing properties not indexed unless touched
  • 41. Auto Index lifecycle AutoIndexer<Node> nodeAutoIndex = graphDb.index().getNodeAutoIndexer(); nodeAutoIndex.startAutoIndexingProperty("species"); nodeAutoIndex.setEnabled( true ); ReadableIndex<Node> autoNodeIndex = graphDb.index().getNodeAutoIndexer().getAutoIndex(); Node -> Relationship Indexes Supported
  • 42. Core API • Basic (nodes, relationships) • Fast • Imperative • Flexible - Easily intermix mutating operations
  • 43. Traversers API • Mechanisms to query graph navigating from starting node to related nodes according to algorithm to get answers • Expressive • Fast • Declarative (mostly) • Opinionated
  • 44. Cypher - A Graph Query Language • Query Language for Neo4j • A declarative graph pattern matching language – SQL for graphs – Tabular results • aggregation, ordering and limits • Mutating operations • CRUD • Easy to formulate queries based on relationships • Many features stem from improving pain points of SQL like join tables
  • 45. Cypher - A Graph Query Language
  • 48. Operations • Aggregation - COUNT, SUM, AVG, MAX, MIN, COLLECT • Where clause start doctor=node:characters(name = 'Doctor‘) match (doctor)<-[:PLAYED]-(actor)-[:APPEARED_IN]->(episode) where actor.actor = 'Tom Baker‘ and episode.title =~ /.*Dalek.*/ return episode.title • Ordering – order by <property> – order by <property> desc
  • 49. Graph Algorithms • Neo4j has built-in algorithms • Callable through JVM and REST APIs • Higher level of abstraction • Graph Matching – Look for patterns in a data set - retail analytics – Higher-level abstraction than raw traversers • REST API – Access the server • Binary protocol – JSON as default format
  • 50. Neo4j HA - High Availability Cluster • A scalability package known as high availability or HA that uses a master-slave cluster architecture – Full data redundancy – Service fault tolerance – Linear read scalability – Master-slave replication • Single data-centre or global zones – tolerance for high-latency
  • 51. Neo4j HA • Redundancy - improved uptime – automatic failover • In a Neo4j HA cluster the full graph is replicated to each instance in the cluster. • Full dataset is replicated across the entire cluster to each server • Read operations can be done locally on each slave • Read capacity of the HA cluster increases linearly with the number of servers
  • 53. HA Cluster Architecture • Cluster performs automatic master election • Supports master-slave replication for clustering and DR across sites
  • 55. Write to a Master • All write operations are co-ordinated by the master • Writes to the master are fast • Slaves eventually catch up
  • 56. Write to a Master
  • 57. Write to a Slave • Writes to a slave cause a synchronous transaction with the master • Other slaves eventually catch up
  • 58. Write to a Slave
  • 59. Server Overload Problem • Unlike other classes of NOSQL database, a graph does not have predictable lookup since it is a highly mutable structure • We want to co-locate related nodes for traversal performance, but we don’t want to place so many connected nodes on the same database that it becomes heavily loaded • The black-hole problem - popular nodes get lumped together on a single instance, but there is low point cut
  • 61. Thinly Spread Network • The opposite is also true, that we don’t want too widely connected nodes across different database instances since it will incur a substantial performance penalty at runtime as traversals cross the (relatively latent) network • Load-leveling alone can lead to many relationships crossing instances • These are very expensive to traverse, networks are many orders of magnitude slower than in-memory traversals
  • 63. Minimal Point Cut • The best approach is to balance a graph across database instances by creating a minimum point cut for a graph, where graph nodes are placed such that there are few relationships that span shards • Good strategy is to take a local view of the graph (no global locks) and work incrementally (short bursts) • Take into account use patterns • Unlike other NoSQL stores, graph s are not predictable so we can not use techniques like consistent hashing for scale out
  • 65. Cache Sharding • A strategy for large data sets of terabyte scale • Mandates consistent request routing • For instance, requests for user A are always sent to server 1, while requests for user B are always sent to server 2 and so on • The key assumption is that requests for user A typically touch parts of the graph around user A, such has his or her friends, preferences, likes and so on
  • 66. Cache Sharding • This means that the neighbourhood of the graph around user A will be cached on server 1, while the neighbourhood around user B will be cached on server 2 • By employing consistent routing of requests, the caches of all servers in the HA cluster can be utilized maximally • Strategy is highly effective for managing a large graph that does not fit in RAM
  • 67. Consistent Routing • Always try to route related requests to the same server to hopefully benefit from warm caches
  • 68. Domain Specific Sharding • No easy to shard graphs like documents or KV stores • High performance graph databases limited in terms of data set size that can be handled by a single machine • Use replicas to speed up and improve availability but limits data set size limited to a single machine’s disk/memory • No perfect algorithm exists but domain insight of expert helps
  • 69. Domain Specific Sharding • Some domains can shard easily (geo, most web apps) using consistent routing approach and cache sharding – Geo - where the connections between cities are few compared with the connections within the cities. So can place cities or countries on different nodes • Eventually (Petabytes) level data cannot be replicated practically • Need to shard data across machines
  • 70. References 1. http://www.neo4j.org 2. http://www.neo4j.org/learn/cypher 3. Bachman, Michal (2013)GraphAware -TowardsOnline Analytical Processing in Graph Databases http://graphaware.com/assets/bachman-msc-thesis.pdf 4. Hunger, Michael (2012). Cypher and Neo4j http://vimeo.com/83797381 5. Mistry, Deep Neo4j: A Developer’s Perspective http://osintegrators.com/opensoftwareintegrators%7Cneo4jadevelopersperspective 6. MapGraph:A High LevelAPI for Fast Development of High Performance GraphAnalytics on GPUs 7. Parallel Breadth First Search on GPU Clusters 8. DB-Engines Ranking of Graph DBMS
  • 71. ThankYou Check Out My LinkedIn Profile at https://in.linkedin.com/in/girishkhanzode