Optimized Graph
Algorithms in Neo4j
Use the Power of Connections to Drive Discovery
January 2018
Mark Needham
Amy Hodler
Mark Needham
Software Engineer, Neo4j
mark.needham@neo4j.com
@markhneedham
Next 50 Minutes
• Why Use Graph Analytics
• Randomness vs. Reality
• Graph Analytics Takes Off
• How to Run Graph Analytics
• Neo4j Graph Analytics and Algorithms
• Demos and Implementation
Graph
Algorithms
Real-World
Networks
Amy E. Hodler
Analytics Marketing, Neo4j
amy.hodler@neo4j.com
@amyhodler
Understand. Predict. Prescribe.
Forecast Complex Network Behavior
and Prescribe Action
Cascading Failures
Airline Congestion - 2010
Source: “Systemic delay propagation in the US airport network” – Fleurquin, Ramasco, Eguiluz
Planning and Least
Cost Routing
Bridge Points
Languages – Telecom Network
Source: “Fast unfolding of communities in large networks” – Blondel, Guillaume, Lambiotte, Lefebvre
Extract Structure and Model Processes
Real Networks Aren’t Random
Preferential
Attachment
Nodes tend to link to nodes
that already have a lot of links
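A minimal Python sketch of preferential attachment, to make the mechanism concrete. The function names, the two links per new node, and the fixed random seed are illustrative assumptions, not something taken from the talk.

```python
import random
from collections import Counter


def preferential_attachment(n_nodes, links_per_node=2, seed=42):
    """Grow a graph where new nodes prefer already well-connected targets."""
    rng = random.Random(seed)
    # Start from a small fully connected core of links_per_node + 1 nodes.
    core = links_per_node + 1
    edges = [(i, j) for i in range(core) for j in range(i + 1, core)]
    # Each node appears in `endpoints` once per link it has, so a uniform
    # draw from this list picks a node with probability proportional to its degree.
    endpoints = [node for edge in edges for node in edge]
    for new_node in range(core, n_nodes):
        targets = set()
        while len(targets) < links_per_node:
            targets.add(rng.choice(endpoints))
        for target in targets:
            edges.append((new_node, target))
            endpoints.extend((new_node, target))
    return edges


def degree_counts(edges):
    """Map each node to its number of links (k)."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


edges = preferential_attachment(2000)
degree = degree_counts(edges)
histogram = Counter(degree.values())  # k -> number of nodes with k links
print("most common k:", histogram.most_common(3))
print("largest hub has", max(degree.values()), "links")
```

Printing the histogram should show many nodes with few links and a handful of hubs, which is the concentrated distribution the following slides describe.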
Origins Debated
• Local Mechanisms
• Global Optimization
• Mixed or Other
Network Structures are Inseparable from Development
Concentrated
Distribution
Source: “How Stuff Spreads” – Pulsar Platform
Nodes with k Links
Number of links (k)
Many nodes with only
a few links
A few hubs with a
large number of links
Power Law Distribution
“There is No Network in Nature that we
know of that would be described by the
Random network model.”
- Albert-László Barabási
Small-World
High local clustering
and short average path
lengths. Hub and spoke
architecture.
Scale-Free
Hub and spoke
architecture preserved
at multiple scales. High
power law distribution.
Random
Average distributions.
No structure or
hierarchical patterns.
Reality
The Lure of Averages
Source: Network Science - Barabasi
Art: Ulysses and the Sirens – Herbert James Draper
Nodes with k Links
Number of Links (k)
Average Distribution
- Random -
Most nodes have the
same number of links
No highly
connected nodes
Resist The Lure
of Averages
Nodes with k Links
Number of Links (k)
Average Distribution
- Random -
Most nodes have the
same number of links
No highly
connected nodes
Nodes with k Links
Number of links (k)
Power Law Distribution
- Scale-Free -
Many nodes with only
a few links
A few hubs with a
large number of links
Source: Network Science - Barabasi
Resist The Lure
of Averages
Nodes with k Links
Number of Links (k)
Average Distribution
- Random -
Art: Ulysses and the Sirens – Herbert James Draper
Most nodes have the
same number of links
No highly
connected nodes
You’ll Miss the Structure
Hidden in Your Networks
- Scale-Free -
- Small World -
Source: Network Science - Barabasi
Graph Analytics
Takes Off
#Finally!
Leonhard Euler 1707-1783
Critical Mass
• Collect, share and analyze
massive connected data
• Discovered common
principles and structures
• Existing mathematical tools
• Unfulfilled promises of
big data
Insights from Algorithms
Graph Algorithms
• Metrics
• Relevance
• Clustering
• Structural Insights
Machine Learning
• Classification, Regression
• NLP, Structural/Content
Predictions
• Neural Networks as Graphs
• Graph As Compute Fabric
Structures Can Hide
Source: “Communities, modules and large-scale structure in networks“ - Mark Newman
Source: “Hierarchical structure and the prediction of missing links in networks”; ”Structure and
inference in annotated networks” - A. Clauset, C. Moore, and M.E.J. Newman.
Graph of Thrones
A. Beveridge: GoT - Interaction Graph from Books
How to Run
Graph Analytics?
Existing Options (so far)
• Data Processing
  • Spark with GraphX, Flink with Gelly
• Dedicated Graph Processing
  • Urika, GraphLab, Giraph, Mosaic, GPS, Signal-Collect, Gradoop
• Data Scientist Toolkit
  • igraph, NetworkX, Boost (graph-tool) in Python, R, C
Drawbacks
• Manage several tools
• Selection -> learning ->
installation -> operation
• Data selection, projection and
transfer
• Tedious and time consuming
• Scalability
• Especially classic data
science tools
An Example
From Past GraphConnect
Source: John Swain - Twitter Analytics Right Relevance Talk
Many Moving Parts!
Example Workflow Pipeline
Twitter
Streaming API
Python Tweet
Collection
(includes user
data)
Rabbit
MQ
MongoDB
Neo4j
R Scripts
-Graph Stats
-Community
Detection
MySQL
Graph
.graphml
Tableau
Graph
Visualization
Moved from Twitter
Search API to
Streaming API
Replaced Python
Twitter libraries
(Tweepy) with raw API
calls
Streaming tweets in message queue
Full tweets and user data stored in
MongoDB
Built graph for analysis in Neo4j from
tweets persisted in MongoDB
Analysis in R
iGraph libraries for
algorithms
Some text analysis e.g.
LDA topics
Results published in MySQL
for Tableau
Graphml for import to Gephi
with stats precalculated
Our Goal
Twitter
Streaming API
Python Tweet
Collection
(includes user
data)
Rabbit
MQ
MongoDB
Neo4j
R Scripts
-Graph Stats
-Community
Detection
MySQL
Graph
.graphml
Tableau
Graph
Visualization
Example Workflow Pipeline
Neo4j Graph Analytics
and Algorithms
Neo4j
Native Graph
Database
Analytics
Integrations
Cypher Query
Language
Wide Range of
APOC Procedures
Optimized
Graph Algorithms
Finds the optimal path
or evaluates route
availability and quality
Evaluates how a
group is clustered
or partitioned
Determines the
importance of distinct
nodes in the network
1. Call as Cypher procedure
2. Pass in specification (Label, Prop, Query) and configuration
3. ~.stream variant returns (a lot of) results
CALL algo.<name>.stream('Label','TYPE',{conf})
YIELD nodeId, score
4. non-stream variant writes results to the graph and returns statistics
CALL algo.<name>('Label','TYPE',{conf})
Usage
Pass in Cypher statement for node- and relationship-lists.
CALL algo.<name>(
'MATCH ... RETURN id(n)',
'MATCH (n)-->(m)
RETURN id(n) as source,
id(m) as target', {graph:'cypher'})
Cypher Projection
• PageRank (baseline)
• Betweenness
• Closeness
• Degree
Algorithms - Centralities
Pathfinding
Centrality
Community
Detection
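As a rough illustration of what two of the listed centralities measure, here is a small Python sketch of degree and closeness centrality on a toy graph. The graph, the function names, and the closeness formula used (reachable nodes divided by the sum of shortest-path distances) are assumptions for illustration; the deck does not specify how the library defines or normalizes these scores.

```python
from collections import deque


def degree_centrality(graph):
    """Number of direct links per node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def closeness_centrality(graph):
    """(reachable nodes) / (sum of shortest-path distances) per node."""
    scores = {}
    for source in graph:
        # Breadth-first search gives hop distances in an unweighted graph.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for neighbor in graph[current]:
                if neighbor not in distance:
                    distance[neighbor] = distance[current] + 1
                    queue.append(neighbor)
        total = sum(distance.values())
        scores[source] = (len(distance) - 1) / total if total else 0.0
    return scores


# Toy undirected graph (hypothetical): a hub with three neighbors and a tail.
graph = {
    "hub": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "d"],
    "d": ["c"],
}

degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
print(sorted(closeness.items(), key=lambda item: -item[1]))
```

On this toy graph the hub ranks first on both measures, while node "d" at the end of the tail ranks last on closeness.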
• Label Propagation
• Union Find / WCC
• Strongly Connected Components
• Louvain
• Triangle-Count / Clustering Coefficient
Algorithms – Community Detection
Pathfinding
Community
Detection
Centrality
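To make the last bullet concrete, the following Python sketch counts triangles and computes the local clustering coefficient per node. The toy graph and function name are hypothetical, and this is a plain single-threaded version, not the library's implementation.

```python
from itertools import combinations


def triangles_and_clustering(graph):
    """Return {node: (triangle_count, local_clustering_coefficient)}."""
    result = {}
    for node, neighbors in graph.items():
        k = len(neighbors)
        # A triangle through `node` exists for every linked pair of its neighbors.
        links = sum(1 for a, b in combinations(neighbors, 2) if b in graph[a])
        possible = k * (k - 1) / 2
        result[node] = (links, links / possible if possible else 0.0)
    return result


# Toy undirected graph (hypothetical): one triangle a-b-c plus a pendant node d.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

stats = triangles_and_clustering(graph)
for node in sorted(stats):
    print(node, stats[node])
```

A coefficient of 1.0 means all of a node's neighbors are linked to each other; 0.0 means none are.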
• Single Source Shortest Path (SSSP)
• All-Nodes SSSP
• Parallel BFS / DFS
Algorithms - Pathfinding
Centrality Community
Detection
Pathfinding
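A single-source shortest path computation can be sketched with Dijkstra's algorithm. This is a generic textbook version in Python on a hypothetical weighted graph; the deck does not say which algorithm the library uses internally.

```python
import heapq


def single_source_shortest_paths(graph, source):
    """Dijkstra's algorithm; graph maps node -> list of (neighbor, weight)."""
    distance = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > distance.get(node, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for neighbor, weight in graph.get(node, []):
            candidate = dist + weight
            if candidate < distance.get(neighbor, float("inf")):
                distance[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return distance


# Toy directed graph (hypothetical) with non-negative weights.
graph = {
    "A": [("B", 4), ("C", 1)],
    "B": [("D", 1)],
    "C": [("B", 2), ("D", 7)],
    "D": [],
}

distances = single_source_shortest_paths(graph, "A")
print(distances)
```

The direct link A to B costs 4, but the route through C costs 3, so the cheaper detour wins.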
Iterate Quickly
• Combine data from sources into one graph
• Project to relevant subgraphs
• Enrich data with algorithms
• Traverse, collect, filter, aggregate with queries
• Visualize, Explore, Decide, Export
• From all APIs and Tools
Demo Time!
Datasets
Yelp Business Graph
• 5m nodes
• 17m relationships
Bitcoin
• 1.7bn nodes,
• 2.7bn rels
DBpedia
• 11m nodes
• 116m relationships
DBpedia
Shallow Copy of Wikipedia: (Page) -[:Link]-> (Page)
CALL algo.pageRank.stream('Page', 'Link', {iterations:5}) YIELD node, score
WITH *
ORDER BY score DESC
LIMIT 5
RETURN node.title, score;
+--------------------------------------+
| node.title                 | score   |
+--------------------------------------+
| "United States"            | 13349.2 |
| "Animal"                   | 6077.77 |
| "France"                   | 5025.61 |
| "List of sovereign states" | 4913.92 |
| "Germany"                  | 4662.32 |
+--------------------------------------+
5 rows, 46 seconds
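For readers who want to see what a fixed number of PageRank iterations does, here is a small power-iteration sketch in Python. The toy link graph, the damping factor of 0.85, and the normalization (scores sum to 1) are assumptions; the scores on the slide are on a different scale, so only the resulting ordering is comparable.

```python
def pagerank(links, iterations=5, damping=0.85):
    """Power iteration; links maps page -> list of pages it links to."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this page's score evenly across its outgoing links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its score over all pages.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy (Page)-[:Link]->(Page) graph with hypothetical page names.
links = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "post"],
    "post": ["home"],
}

rank = pagerank(links, iterations=5)
for page, score in sorted(rank.items(), key=lambda item: -item[1]):
    print(page, round(score, 4))
```

Every page links back to "home", so it collects the highest score, in the same way heavily linked Wikipedia pages rise to the top of the result above.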
DBpedia – Largest Clusters
CALL algo.labelPropagation();
// First 1M pages by Rank
MATCH (n:Page)
WITH n
ORDER BY n.pagerank DESC
LIMIT 1000000
// group by partition
WITH n.partition AS partition,
count(*) AS clusterSize,
collect(n.title) AS pages
// return most influential node for largest clusters
RETURN pages[0] AS mainPage,
pages[1..10] AS otherPages
ORDER BY clusterSize DESC
LIMIT 20
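The idea behind label propagation, followed by the same "group by partition" step as the query above, can be sketched in Python. The toy graph, the update order, and the tie-breaking rule (largest label wins) are arbitrary assumptions chosen to keep the sketch deterministic; they are not taken from the library.

```python
from collections import Counter


def label_propagation(graph, max_rounds=10):
    """Each node repeatedly adopts the most frequent label among its neighbors."""
    labels = {node: node for node in graph}  # every node starts in its own cluster
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            if not graph[node]:
                continue
            counts = Counter(labels[neighbor] for neighbor in graph[node])
            best = max(counts.values())
            # Assumption: ties are broken in favor of the largest label.
            new_label = max(label for label, count in counts.items() if count == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


# Toy undirected graph (hypothetical): two triangles joined by the bridge 2-3.
graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}

partition = label_propagation(graph)
cluster_sizes = Counter(partition.values())  # partition label -> clusterSize
print(partition)
print(cluster_sizes.most_common())
```

The two triangles end up with different labels, and counting nodes per label gives the cluster sizes that the Cypher query orders by.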
Yelp
• Business Reviews by Users
• Businesses have Categories and Locations
• Users have Friends
• Bipartite network (:User)-->(:Business),
  projections (:User)<-->(:User) &
  (:Business)<-->(:Business)
Yelp – Social - Statistics
MATCH (u:User) where exists ( (u)-[:FRIENDS]-() )
WITH u.average_stars as stars, u.review_count as reviews, u.funny as funny
RETURN max(stars),avg(stars),stdev(stars),max(reviews),avg(reviews),stdev(reviews),max(funny),avg(funny),stdev(funny);
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| max(stars) | avg(stars) | stdev(stars) | max(reviews) | avg(reviews) | stdev(reviews) | max(funny) | avg(funny) | stdev(funny) |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 5.0 | 3.8238072950764947 | 0.8862511758625753 | 11284 | 45.81704314022204 | 120.52419266925014 | 170896 | 36.26637835535585 | 731.6024752545679 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
MATCH (u:User) where exists ( (u)-[:FRIENDS]-() )
WITH u.yelping_since as since
RETURN substring(since,0,4) as year, count(*) as total
ORDER BY year asc limit 10;
+----------------+
| year   | total |
+----------------+
| "2004" | 64    |
| "2005" | 844   |
| "2006" | 4504  |
| "2007" | 11833 |
| "2008" | 20729 |
| "2009" | 33965 |
| "2010" | 53046 |
| "2011" | 70331 |
| "2012" | 62596 |
| "2013" | 57330 |
+----------------+
Yelp – Social - PageRank
call algo.pageRank.stream('User','FRIENDS')
yield node,score with node,score
order by score desc limit 10
return node {.name, .review_count, .average_stars,.useful,.yelping_since,.funny},
score,
size( (node)<-[:FRIENDS]-()<-[:FRIENDS]-()) as in,
size( (node)-[:FRIENDS]->()-[:FRIENDS]->()) as out;
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| node | score |
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| {funny -> 61200, name -> "Philip", average_stars -> 3.93, review_count -> 788, useful -> 69448, yelping_since -> "2007-06-09"} | 208.31336799999994 |
| {funny -> 21432, name -> "Des", average_stars -> 3.88, review_count -> 78, useful -> 140024, yelping_since -> "2014-04-01"} | 201.28600150000003 |
| {funny -> 465, name -> "Dallas", average_stars -> 4.17, review_count -> 330, useful -> 5517, yelping_since -> "2010-11-07"} | 192.164762 |
| {funny -> 1019, name -> "Cara", average_stars -> 3.96, review_count -> 842, useful -> 11738, yelping_since -> "2010-07-21"} | 184.01898249999996 |
| {funny -> 1233, name -> "Walker", average_stars -> 3.91, review_count -> 462, useful -> 12332, yelping_since -> "2007-01-25"} | 180.48898350000005 |
| {funny -> 13432, name -> "Gabi", average_stars -> 4.05, review_count -> 1730, useful -> 20759, yelping_since -> "2007-08-10"} | 163.29424850000004 |
| {funny -> 12848, name -> "Ruggy", average_stars -> 3.92, review_count -> 2118, useful -> 72265, yelping_since -> "2007-07-31"} | 161.87635500000002 |
| {funny -> 9997, name -> "Bill", average_stars -> 3.38, review_count -> 595, useful -> 12074, yelping_since -> "2014-04-05"} | 157.0438075 |
| {funny -> 1544, name -> "Ashley", average_stars -> 3.7, review_count -> 224, useful -> 1610, yelping_since -> "2009-09-29"} | 150.21423599999997 |
| {funny -> 3599, name -> "Risa", average_stars -> 4.08, review_count -> 1044, useful -> 22121, yelping_since -> "2011-07-30"} | 138.20863199999997 |
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
10 rows
3236 ms
Yelp
• Inferred network of users, via jointly reviewed businesses
• (u1:User)-[:WROTE]->(review1)-[:REVIEWS]->(business)<-[:REVIEWS]-(review2)<-[:WROTE]-(u2:User)
• 1.3bn paths
• Inferred network of businesses, via reviews by the same user
• (b1:Business)<-[:REVIEWS]-()<-[:WROTE]-(u)-[:WROTE]->()-[:REVIEWS]->(b2:Business)
• 214m paths
• subset: (b1:Business)-[:CO_OCCURENT_REVIEWS]-(b2:Business)
Yelp – Business – Co-Occurrence
• Find clusters of "similar" businesses
• Find peer groups of similar people
• Clusters of "interests"
Yelp – Business – Co-Occurrence
CALL apoc.periodic.iterate(
'MATCH (b:Business)
WHERE size((b)<-[:REVIEWS]-()) > 5 AND b.city="Las Vegas"
RETURN b',
'MATCH (b)<-[:REVIEWS]-(r1)<-[:WROTE]-(u)-[:WROTE]->(r2)-[:REVIEWS]->(b2)
WHERE id(b) < id(b2) AND b2.city="Las Vegas"
AND size((b2)<-[:REVIEWS]-()) > 5
AND r1.stars = r2.stars
WITH b, b2, count(*) AS weight, avg(r1.stars) as rating where weight > 5
MERGE (b)-[cr:B2B]-(b2)
ON CREATE SET cr.weight = weight, cr.rating = rating
SET b:Marked, b2:Marked',
{batchSize: 1});
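The projection built by the statement above can be mimicked in plain Python to show what ends up on each B2B relationship: a weight (number of same-star review pairs by one user) and the average rating. The review tuples, business names, and the lower threshold are made-up toy values; the slide keeps pairs with weight > 5.

```python
from collections import defaultdict
from itertools import combinations


def co_occurrence(reviews, min_weight=1):
    """reviews: iterable of (user, business, stars). Returns {(b1, b2): (weight, rating)}."""
    by_user = defaultdict(list)
    for user, business, stars in reviews:
        by_user[user].append((business, stars))

    weight = defaultdict(int)
    rating_sum = defaultdict(float)
    for rated in by_user.values():
        for (b1, s1), (b2, s2) in combinations(rated, 2):
            # Same user, two different businesses, identical star rating.
            if b1 != b2 and s1 == s2:
                pair = tuple(sorted((b1, b2)))  # one entry per unordered pair
                weight[pair] += 1
                rating_sum[pair] += s1
    return {
        pair: (count, rating_sum[pair] / count)
        for pair, count in weight.items()
        if count > min_weight
    }


# Hypothetical toy data.
reviews = [
    ("u1", "cafe", 5), ("u1", "diner", 5),
    ("u2", "cafe", 4), ("u2", "diner", 4),
    ("u3", "cafe", 5), ("u3", "bar", 5),
    ("u4", "cafe", 3), ("u4", "diner", 5),
]

b2b = co_occurrence(reviews, min_weight=1)
print(b2b)
```

Sorting each pair plays the same role as `id(b) < id(b2)` in the Cypher: every pair of businesses is counted once rather than once per direction.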
Yelp - Clustering Union Find
CALL algo.unionFind.stream(
'MATCH (b:Business:Marked) RETURN id(b) as id',
'MATCH (b1:Business:Marked)-[r:B2B]-(b2)
RETURN id(b1) as source,
id(b2) as target,
count(r) as value',
{graph:'cypher'}) YIELD setId as cluster, nodeId
RETURN cluster, count(*) as size
ORDER BY size DESC LIMIT 10;
+----------------+
| cluster | size |
+----------------+
| 3       | 5625 |
| 1876    | 3    |
| 155     | 2    |
| 1091    | 2    |
| 1728    | 2    |
| 1177    | 2    |
| 337     | 2    |
| 3046    | 2    |
| 674     | 2    |
| 1948    | 2    |
+----------------+
10 rows
6615 ms
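Union Find assigns every node a set id so that connected nodes share the same id. A compact Python sketch follows, with the same "count nodes per cluster" step as the query above. The toy node and edge lists are hypothetical, and this single-threaded version is only meant to show the technique.

```python
from collections import Counter


def connected_components(nodes, edges):
    """Union-find: returns {node: set_id} with one set id per connected component."""
    parent = {node: node for node in nodes}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving keeps trees shallow
            node = parent[node]
        return node

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Attach the larger root id under the smaller one.
            parent[max(root_a, root_b)] = min(root_a, root_b)
    return {node: find(node) for node in nodes}


# Toy graph (hypothetical): one chain of three nodes, one pair, two isolated nodes.
nodes = list(range(7))
edges = [(0, 1), (1, 2), (3, 4)]

set_ids = connected_components(nodes, edges)
sizes = Counter(set_ids.values())  # cluster -> size
print(sizes.most_common())
```

As in the Yelp result, a quick look at the size distribution shows whether the graph is dominated by one large component.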
Yelp - PageRank
CALL algo.pageRank.stream(
'MATCH (b:Business:Marked)
RETURN id(b) as id',
'MATCH (b1:Business:Marked)-[r:B2B]-(b2)
RETURN id(b1) as source,
id(b2) as target',
{graph:'cypher'})
YIELD node, score
RETURN node.name, score
ORDER BY score DESC
LIMIT 10;
+-------------------------------------------------------+
| node.name                        | score              |
+-------------------------------------------------------+
| "McCarran International Airport" | 27.49973599999999  |
| "Hash House A Go Go"             | 19.062398000000005 |
| "Bachi Burger"                   | 18.1494385         |
| "Mon Ami Gabi"                   | 17.720350000000003 |
| "Bacchanal Buffet"               | 15.783480500000003 |
| "Yard House Town Square"         | 14.427296999999998 |
| "Secret Pizza"                   | 13.156547          |
| "Rollin Smoke Barbeque"          | 12.808718499999998 |
| "Wicked Spoon"                   | 12.639942499999997 |
| "Monta Ramen"                    | 12.3904845         |
+-------------------------------------------------------+
10 rows
6979 ms
Bitcoin
Bitcoin Graph
• Full copy of the Bitcoin blockchain
• from learnmeabitcoin.com (Greg Walker)
• 1.7 billion nodes, 2.7 billion rels
• 474k blocks, 240m tx, 280m addresses, 650m outputs
• 600 GB on disk
Bitcoin Graph
Distribution of "locked" relationships for "addresses"
(participation in transactions)
call apoc.stats.degrees('<locked');
+--------------------------------------------------------------------------------------------------------------+
| type | direction | total | p50 | p75 | p90 | p95 | p99 | p999 | max | min | mean |
+--------------------------------------------------------------------------------------------------------------+
| "locked" | "INCOMING" | 654662356 | 0 | 0 | 1 | 1 | 2 | 28 | 1891327 | 0 | 0.37588608290716047 |
+--------------------------------------------------------------------------------------------------------------+
1 row
308 seconds
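The columns in this output summarize a degree distribution. The Python sketch below computes the same kind of summary for a list of per-node degrees, using the nearest-rank percentile method. That method, the function name, and the toy degree list are assumptions; the deck does not say how APOC computes its percentiles.

```python
def degree_stats(degrees):
    """Summarize a list of per-node degrees with nearest-rank percentiles."""
    ordered = sorted(degrees)
    n = len(ordered)

    def percentile(p):
        # Nearest-rank: smallest value with at least p percent of nodes at or below it.
        rank = max(1, (p * n + 99) // 100)  # integer ceiling of p * n / 100
        return ordered[rank - 1]

    return {
        "total": sum(ordered),
        "p50": percentile(50),
        "p75": percentile(75),
        "p90": percentile(90),
        "p95": percentile(95),
        "p99": percentile(99),
        "max": ordered[-1],
        "min": ordered[0],
        "mean": sum(ordered) / n,
    }


# Toy degrees (hypothetical): most nodes have 0 or 1 links, one hub has 50.
degrees = [0] * 6 + [1] * 3 + [50]
stats = degree_stats(degrees)
print(stats)
```

Even in this tiny example the median is 0 while the maximum is 50, the same skewed shape as the "locked" relationships above.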
Bitcoin Graph
Inferred network of addresses, via transaction and output
(a1)<-[:locked]-(o1)-[:in]->(tx)-[:out]->(o2)-[:locked]->(a2)
CALL algo.unionFind.stream(
'match (o:output)-[:locked]->(a) with a limit 10000000 return id(a) as id',
'match (o:output)-[:locked]->(a) with o, a limit 10000000
match (o)-[:in]->(tx)-[:out]->(o2)-[:locked]->(a2)
return id(a) as source, id(a2) as target, count(tx) as weight',
{graph:'cypher'})
YIELD setId as cluster, nodeId
RETURN cluster, count(*) AS size
ORDER BY size DESC
LIMIT 10;
+-------------------+
| cluster | size    |
+-------------------+
| 5036    | 4409420 |
| 6295282 | 1999    |
| 5839746 | 1488    |
| 9356302 | 833     |
| 6560901 | 733     |
| 6370777 | 637     |
| 8101710 | 392     |
| 5945867 | 369     |
| 2489036 | 264     |
| 1703620 | 203     |
+-------------------+
10 rows, 296 seconds
Implementation
Design Considerations
• Ease of Use – Call as Procedures
• Parallelize everything: load, compute, write
• Efficiency: Use direct access, efficient data structures, provide
high-level API
• Scale to billions of nodes and relationships
Use up to hundreds of CPUs and Terabytes of RAM
1. Load data in parallel from Neo4j
2. Store in efficient data structures
3. Run graph algorithm in parallel using Graph API
4. Write data back in parallel
[Diagram: Neo4j → (1, 2) → Algorithm Datastructures ↔ (3) Graph API; (4) back to Neo4j]
Architecture
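The deck does not say which data structures step 2 uses. One common compact layout for graph algorithms is compressed sparse row (CSR), sketched here in Python purely as an assumption about what "efficient data structures" could look like; the function names and toy edge list are hypothetical.

```python
def build_csr(node_count, edges):
    """Pack a directed edge list into two flat arrays (offsets, targets)."""
    offsets = [0] * (node_count + 1)
    for source, _ in edges:
        offsets[source + 1] += 1          # count outgoing edges per node
    for i in range(node_count):
        offsets[i + 1] += offsets[i]      # prefix sum: start index per node
    targets = [0] * len(edges)
    cursor = offsets[:-1].copy()          # next free slot for each source node
    for source, target in edges:
        targets[cursor[source]] = target
        cursor[source] += 1
    return offsets, targets


def neighbors(offsets, targets, node):
    """All outgoing neighbors of `node` as one contiguous slice."""
    return targets[offsets[node]:offsets[node + 1]]


# Toy graph (hypothetical) with node ids 0..3.
edges = [(0, 1), (0, 2), (2, 3), (1, 2)]
offsets, targets = build_csr(4, edges)
print(offsets, targets)
print(neighbors(offsets, targets, 0))
```

The appeal of such a layout is that a node's neighbors sit next to each other in memory, so independent node ranges can be scanned by separate threads without pointer chasing.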
Scale: 144 CPU
Neo4j Graph Platform with Neo4j Algorithms
vs. Apache Spark’s GraphX
[Bar chart: runtime in seconds (axis 0 to 450) for Union-Find (Connected Components) and PageRank, one bar each for GraphX and Neo4j; bar values shown: 251, 152, 416, 124]
Neo4j is Significantly Faster
Spark GraphX results publicly available
• Amazon EC2 cluster running 64-bit Linux
• 128 CPUs with 68 GB of memory, 2 hard disks
Neo4j Configuration
• Physical machine running 64-bit Linux
• 128 CPUs with 55 GB RAM, SSDs
Twitter 2010 Dataset
• 1.47 Billion Relationships
• 41.65 Million Nodes
Compute At Scale – Payment Graph
3,000,000,000 nodes and 18,000,000,000 relationships (600G)
PageRank (20 iterations) on 1 machine, 20 threads, 900G RAM
call algo.pageRank('Account','SENT',
{graph:'huge',iterations:20,write:true,concurrency:20});
+-------------------------------------------------------------------+
| nodes | iterations | loadMillis | computeMillis | writeMillis |
+-------------------------------------------------------------------+
| 300000000 | 20 | 401404 | 6024994 | 47106 |
+-------------------------------------------------------------------+
1 row 6473526 ms -> 1h 47min
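A few lines of Python make the timing arithmetic in this result explicit: the conversion of the total to hours and minutes, and how the three reported phases split the runtime. The numbers are copied from the output above; the helper name is hypothetical.

```python
def format_millis(millis):
    """Render a duration in milliseconds as whole hours and minutes."""
    total_minutes = millis // 60_000
    return f"{total_minutes // 60}h {total_minutes % 60}min"


# Values copied from the procedure output above.
phases = {"loadMillis": 401404, "computeMillis": 6024994, "writeMillis": 47106}
total_millis = 6473526

phase_sum = sum(phases.values())
compute_share = phases["computeMillis"] / phase_sum

print(format_millis(total_millis))
print(f"compute share of the three phases: {compute_share:.0%}")
```

Load and write together account for well under a tenth of the run; almost all of the time is spent in the compute phase.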
We Need Your Feedback
• neo4j.com/slack at #neo4j-graph-algorithms
• github.com/neo4j-contrib/neo4j-graph-algorithms
• Whitepaper on neo4j.com/graph-analytics
“Graphs are one of
the Unifying Themes of computer science . . .
That so many different structures
can be modeled using a single formalism
is a Source of Great Power
to the educated programmer.”
- Steven S. Skiena,
The Algorithm Design Manual
Kudos:
Paul Horn
Martin Knobloch from Avantgarde Labs
Tomasz Bratanic (docs)
Thank You!
Questions !?
Relationships Matter: Using Connected Data for Better Machine Learning
Neo4j
 
Machine Learning with ML.NET and Azure - Andy Cross
Machine Learning with ML.NET and Azure - Andy CrossMachine Learning with ML.NET and Azure - Andy Cross
Machine Learning with ML.NET and Azure - Andy Cross
Andrew Flatters
 
How Graph Databases used in Police Department?
How Graph Databases used in Police Department?How Graph Databases used in Police Department?
How Graph Databases used in Police Department?
Samet KILICTAS
 
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter BonczFOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
Ioan Toma
 
Predicting Influence and Communities Using Graph Algorithms
Predicting Influence and Communities Using Graph AlgorithmsPredicting Influence and Communities Using Graph Algorithms
Predicting Influence and Communities Using Graph Algorithms
Databricks
 

Graph Analytics: Graph Algorithms Inside Neo4j

  • 1. Optimized Graph Algorithms in Neo4j Use the Power of Connections to Drive Discovery January 2018 Mark Needham Amy Hodler
  • 2. Mark Needham Software Engineer, Neo4j mark.needham@neo4j.com @markhneedham Next 50 Minutes • Why Use Graph Analytics • Randomness vs. Reality • Graph Analytics Takes Off • How to Run Graph Analytics • Neo4j Graph Analytics and Algorithms • Demos and Implementation Graph Algorithms Real-World Networks Amy E. Hodler Analytics Marketing, Neo4j amy.hodler@neo4j.com @amyhodler
  • 4. Forecast Complex Network Behavior and Prescribe Action
  • 5. Cascading Failures Airline Congestion - 2010 Source: “Systemic delay propagation in the US airport network” – Fleurquin, Ramasco, Eguiluz
  • 7. Bridge Points Languages – Telecom Network Source: “Fast unfolding of communities in large networks” – Blondel, Guillaume, Lambiotte, Lefebvre
  • 8. Extract Structure and Model Processes
  • 10. Preferential Attachment Nodes tend to link to nodes that already have a lot of links Origins Debated • Local Mechanisms • Global Optimization • Mixed or Other Network Structures are Inseparable from Development
  • 11. Concentrated Distribution: Power Law Distribution. Many nodes with only a few links; a few hubs with a large number of links. [Plot: Nodes with k Links vs. Number of links (k)] Source: “How Stuff Spreads” – Pulsar Platform
  • 12. “There is No Network in Nature that we know of that would be described by the Random network model.” - Albert-László Barabási
  • 13. Small-World High local clustering and short average path lengths. Hub and spoke architecture. Scale-Free Hub and spoke architecture preserved at multiple scales. High power law distribution. Random Average distributions. No structure or hierarchical patterns.
  • 15. The Lure of Averages. Average Distribution (Random): most nodes have the same number of links; no highly connected nodes. [Plot: Nodes with k Links vs. Number of Links (k)] Source: Network Science - Barabasi. Art: Ulysses and the Sirens – Herbert James Draper
  • 16. Resist The Lure of Averages. Average Distribution (Random): most nodes have the same number of links; no highly connected nodes. Power Law Distribution (Scale-Free): many nodes with only a few links; a few hubs with a large number of links. [Plots: Nodes with k Links vs. Number of Links (k)] Source: Network Science - Barabasi
  • 17. Resist The Lure of Averages. Average Distribution (Random): most nodes have the same number of links; no highly connected nodes. You’ll Miss the Structure Hidden in Your Networks (Scale-Free, Small World). [Plot: Nodes with k Links vs. Number of Links (k)] Art: Ulysses and the Sirens – Herbert James Draper
  • 21. Critical Mass • Collect, share and analyze massive connected data • Discovered common principles and structures • Existing mathematical tools • Unfulfilled promises of big data
  • 23. Insights from Algorithms Graph Algorithms • Metrics • Relevance • Clustering • Structural Insights Machine Learning • Classification, Regression • NLP, Structural/Content Predictions • Neural Networks as Graphs • Graph As Compute Fabric
  • 24. Structures Can Hide Source: “Communities, modules and large-scale structure in networks” - Mark Newman Source: “Hierarchical structure and the prediction of missing links in networks”; “Structure and inference in annotated networks” - A. Clauset, C. Moore, and M.E.J. Newman.
  • 25. Graph of Thrones A. Beveridge: GoT - Interaction Graph from Books
  • 26. Graph of Thrones A. Beveridge: GoT - Interaction Graph from Books
  • 27. How to Run Graph Analytics?
  • 28. Existing Options (so far) • Data Processing: Spark with GraphX, Flink with Gelly • Dedicated Graph Processing: Urika, GraphLab, Giraph, Mosaic, GPS, Signal-Collect, Gradoop • Data Scientist Toolkit: igraph, NetworkX, Boost (graph-tool) in Python, R, C
  • 29. Drawbacks • Manage several tools: selection -> learning -> installation -> operation • Data selection, projection and transfer: tedious and time consuming • Scalability: especially classic data science tools
  • 30. An Example From Past GraphConnect
  • 31. Source: John Swain - Twitter Analytics Right Relevance Talk
  • 32. Many Moving Parts! Example Workflow Pipeline Twitter Streaming API Python Tweet Collection (includes user data) Rabbit MQ MongoDB Neo4j R Scripts -Graph Stats -Community Detection MySQL Graph .graphml Tableau Graph Visualization Moved from Twitter Search API to Streaming API Replaced Python Twitter libraries (Tweepy) with raw API calls Streaming tweets in message queue Full tweets and user data stored in MongoDB Built graph for analysis in Neo4j from tweets persisted in MongoDB Analysis in R iGraph libraries for algorithms Some text analysis e.g. LDA topics Results published in MySQL for Tableau Graphml for import to Gephi with stats precalculated
  • 33. Our Goal Twitter Streaming API Python Tweet Collection (includes user data) Rabbit MQ MongoDB Neo4j R Scripts -Graph Stats -Community Detection MySQL Graph .graphml Tableau Graph Visualization Example Workflow Pipeline
  • 35. Neo4j Native Graph Database Analytics Integrations Cypher Query Language Wide Range of APOC Procedures Optimized Graph Algorithms
  • 36. Finds the optimal path or evaluates route availability and quality Evaluates how a group is clustered or partitioned Determines the importance of distinct nodes in the network
  • 37. Usage
      1. Call as Cypher procedure
      2. Pass in specification (Label, Prop, Query) and configuration
      3. The ~.stream variant returns (a lot of) results:
         CALL algo.<name>.stream('Label','TYPE',{conf}) YIELD nodeId, score
      4. The non-stream variant writes results to the graph and returns statistics:
         CALL algo.<name>('Label','TYPE',{conf})
  • 38. Cypher Projection: pass in Cypher statements for node- and relationship-lists.
      CALL algo.<name>(
        'MATCH ... RETURN id(n)',
        'MATCH (n)-->(m) RETURN id(n) as source, id(m) as target',
        {graph:'cypher'})
  • 39. Algorithms - Centralities • PageRank (baseline) • Betweenness • Closeness • Degree [Categories: Pathfinding, Centrality, Community Detection]
  • 40. Algorithms - Community Detection • Label Propagation • Union Find / WCC • Strongly Connected Components • Louvain • Triangle-Count / Clustering Coefficient [Categories: Pathfinding, Community Detection, Centrality]
  • 41. Algorithms - Pathfinding • Single Source Shortest Path • All-Nodes SSP • Parallel BFS / DFS [Categories: Centrality, Community Detection, Pathfinding]
  • 42. Iterate Quickly • Combine data from sources into one graph • Project to relevant subgraphs • Enrich data with algorithms • Traverse, collect, filter, aggregate with queries • Visualize, Explore, Decide, Export • From all APIs and Tools
  • 44. Datasets • Yelp Business Graph: 5m nodes, 17m relationships • Bitcoin: 1.7bn nodes, 2.7bn relationships • DBPedia: 11m nodes, 116m relationships
  • 46. DBPedia Shallow Copy of Wikipedia: (Page) -[:Link]-> (Page)
      CALL algo.pageRank.stream('Page', 'Link', {iterations:5})
      YIELD node, score
      WITH * ORDER BY score DESC LIMIT 5
      RETURN node.title, score;

      +--------------------------------------+
      | node.title                 | score   |
      +--------------------------------------+
      | "United States"            | 13349.2 |
      | "Animal"                   | 6077.77 |
      | "France"                   | 5025.61 |
      | "List of sovereign states" | 4913.92 |
      | "Germany"                  | 4662.32 |
      +--------------------------------------+
      5 rows
      46 seconds
  • 47. DBPedia – Largest Clusters
      CALL algo.labelPropagation();
      // First 1M pages by Rank
      MATCH (n:Page) WITH n ORDER BY n.pagerank DESC LIMIT 1000000
      // group by partition
      WITH n.partition AS partition, count(*) AS clusterSize, collect(n.title) AS pages
      // return most influential node for largest clusters
      RETURN pages[0] AS mainPage, pages[1..10] AS otherPages
      ORDER BY clusterSize DESC LIMIT 20
  • 48. Yelp
  • 49. Yelp • Business Reviews by Users • Businesses have Categories and Locations • Users have Friends • Bipartite network (:User)-->(:Business) with projections (:User)<-->(:User) & (:Business)<-->(:Business)
  • 50. Yelp – Social - Statistics
      MATCH (u:User) where exists ( (u)-[:FRIENDS]-() )
      WITH u.average_stars as stars, u.review_count as reviews, u.funny as funny
      RETURN max(stars),avg(stars),stdev(stars),max(reviews),avg(reviews),stdev(reviews),max(funny),avg(funny),stdev(funny);

      +-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      | max(stars) | avg(stars)         | stdev(stars)       | max(reviews) | avg(reviews)      | stdev(reviews)     | max(funny) | avg(funny)        | stdev(funny)      |
      +-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      | 5.0        | 3.8238072950764947 | 0.8862511758625753 | 11284        | 45.81704314022204 | 120.52419266925014 | 170896     | 36.26637835535585 | 731.6024752545679 |
      +-------------------------------------------------------------------------------------------------------------------------------------------------------------------+

      MATCH (u:User) where exists ( (u)-[:FRIENDS]-() )
      WITH u.yelping_since as since
      RETURN substring(since,0,4) as year, count(*) as total
      ORDER BY year asc limit 10;

      +----------------+
      | year   | total |
      +----------------+
      | "2004" | 64    |
      | "2005" | 844   |
      | "2006" | 4504  |
      | "2007" | 11833 |
      | "2008" | 20729 |
      | "2009" | 33965 |
      | "2010" | 53046 |
      | "2011" | 70331 |
      | "2012" | 62596 |
      | "2013" | 57330 |
      +----------------+
  • 51. Yelp – Social - PageRank
    call algo.pageRank.stream('User','FRIENDS') yield node,score
    with node,score order by score desc limit 10
    return node {.name, .review_count, .average_stars,.useful,.yelping_since,.funny}, score,
           size( (node)<-[:FRIENDS]-()<-[:FRIENDS]-()) as in,
           size( (node)-[:FRIENDS]->()-[:FRIENDS]->()) as out;
    | node | score |
    | {funny -> 61200, name -> "Philip", average_stars -> 3.93, review_count -> 788, useful -> 69448, yelping_since -> "2007-06-09"} | 208.31336799999994 |
    | {funny -> 21432, name -> "Des", average_stars -> 3.88, review_count -> 78, useful -> 140024, yelping_since -> "2014-04-01"} | 201.28600150000003 |
    | {funny -> 465, name -> "Dallas", average_stars -> 4.17, review_count -> 330, useful -> 5517, yelping_since -> "2010-11-07"} | 192.164762 |
    | {funny -> 1019, name -> "Cara", average_stars -> 3.96, review_count -> 842, useful -> 11738, yelping_since -> "2010-07-21"} | 184.01898249999996 |
    | {funny -> 1233, name -> "Walker", average_stars -> 3.91, review_count -> 462, useful -> 12332, yelping_since -> "2007-01-25"} | 180.48898350000005 |
    | {funny -> 13432, name -> "Gabi", average_stars -> 4.05, review_count -> 1730, useful -> 20759, yelping_since -> "2007-08-10"} | 163.29424850000004 |
    | {funny -> 12848, name -> "Ruggy", average_stars -> 3.92, review_count -> 2118, useful -> 72265, yelping_since -> "2007-07-31"} | 161.87635500000002 |
    | {funny -> 9997, name -> "Bill", average_stars -> 3.38, review_count -> 595, useful -> 12074, yelping_since -> "2014-04-05"} | 157.0438075 |
    | {funny -> 1544, name -> "Ashley", average_stars -> 3.7, review_count -> 224, useful -> 1610, yelping_since -> "2009-09-29"} | 150.21423599999997 |
    | {funny -> 3599, name -> "Risa", average_stars -> 4.08, review_count -> 1044, useful -> 22121, yelping_since -> "2011-07-30"} | 138.20863199999997 |
    10 rows 3236 ms
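For readers who want to see what a PageRank score expresses, here is a minimal power-iteration sketch in Python. It is an illustration under stated assumptions, not the code behind `algo.pageRank`: the function name, the damping factor of 0.85, the handling of nodes without outgoing links, and the normalisation (scores sum to 1) are all choices made for this example, so its numbers are on a different scale from the scores in the slide.

```python
def pagerank(out_links, damping=0.85, iterations=20):
    """Toy PageRank by power iteration.

    out_links: dict mapping each node to the list of nodes it links to.
               Every target must also appear as a key.
    Returns a dict of scores that sum to 1.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Base score every node receives from the random-jump term.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in out_links.items():
            if targets:
                # Split this node's score evenly over its outgoing links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its score over all nodes (assumption).
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    friends = {"a": ["c"], "b": ["c"], "c": ["a"]}
    scores = pagerank(friends, iterations=100)
    for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(node, round(score, 4))
```

In this tiny graph, `c` ranks highest because both other nodes link to it, and `a` inherits most of that score through the single link from `c`.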
  • 52. Yelp • Inferred network of users, via jointly reviewed businesses • (u1:User)-[:WROTE]->(review1)-[:REVIEWS]->(business)<-[:REVIEWS]-(review2)<-[:WROTE]-(u2:User) • 1.3bn paths • Inferred network of businesses, via users who reviewed both • (b1:Business)<-[:REVIEWS]-()<-[:WROTE]-(u)-[:WROTE]->()-[:REVIEWS]->(b2:Business) • 214m paths • subset: (b1:Business)-[:CO_OCCURENT_REVIEWS]-(b2:Business)
  • 54. Yelp – Business – Co-Occurrence • Find clusters of "similar" businesses • Find peer groups of similar people • Clusters of "interests"
  • 55. Yelp – Business – Co-Occurrence
    CALL apoc.periodic.iterate(
      'MATCH (b:Business)
       WHERE size((b)<-[:REVIEWS]-()) > 5 AND b.city="Las Vegas"
       RETURN b',
      'MATCH (b)<-[:REVIEWS]-(r1)<-[:WROTE]-(u)-[:WROTE]->(r2)-[:REVIEWS]->(b2)
       WHERE id(b) < id(b2) AND b2.city="Las Vegas"
         AND size((b2)<-[:REVIEWS]-()) > 5 AND r1.stars = r2.stars
       WITH b, b2, count(*) AS weight, avg(r1.stars) AS rating
       WHERE weight > 5
       MERGE (b)-[cr:B2B]-(b2)
       ON CREATE SET cr.weight = weight, cr.rating = rating
       SET b:Marked, b2:Marked',
      {batchSize: 1});
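The core of the co-occurrence query can be sketched outside the database. The Python example below is a simplified stand-in, assuming reviews arrive as plain `(user, business, stars)` tuples: it counts how often one user gave two businesses the same star rating and keeps pairs whose count exceeds a threshold, as the `weight > 5` filter does. The function name and data layout are hypothetical, and the city and minimum-review-count filters from the query are left out.

```python
from collections import defaultdict
from itertools import combinations


def co_occurrence(reviews, min_weight=5):
    """Project (user, business, stars) reviews onto business pairs.

    Two businesses co-occur when the same user gave both the same
    number of stars. Returns {(b1, b2): {"weight": n, "rating": avg}}
    for pairs with weight > min_weight.
    """
    by_user = defaultdict(list)
    for user, business, stars in reviews:
        by_user[user].append((business, stars))

    # pair -> [number of co-occurrences, sum of the shared star ratings]
    stats = defaultdict(lambda: [0, 0.0])
    for user_reviews in by_user.values():
        for (b1, s1), (b2, s2) in combinations(user_reviews, 2):
            if b1 != b2 and s1 == s2:
                # Order the pair so each undirected edge is counted once,
                # like the id(b) < id(b2) condition in the query.
                pair = (min(b1, b2), max(b1, b2))
                stats[pair][0] += 1
                stats[pair][1] += s1

    return {
        pair: {"weight": count, "rating": total / count}
        for pair, (count, total) in stats.items()
        if count > min_weight
    }


if __name__ == "__main__":
    sample = [(f"u{i}", "A", 4) for i in range(6)]
    sample += [(f"u{i}", "B", 4) for i in range(6)]
    sample += [("u9", "A", 5), ("u9", "C", 5)]
    print(co_occurrence(sample))
```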
  • 56. Yelp - Clustering Union Find
    CALL algo.unionFind.stream(
      'MATCH (b:Business:Marked) RETURN id(b) as id',
      'MATCH (b1:Business:Marked)-[r:B2B]-(b2) RETURN id(b1) as source, id(b2) as target, count(r) as value',
      {graph:'cypher'})
    YIELD setId as cluster, nodeId
    RETURN cluster, count(*) as size
    ORDER BY size DESC LIMIT 10;
    +---------+------+
    | cluster | size |
    +---------+------+
    | 3       | 5625 |
    | 1876    | 3    |
    | 155     | 2    |
    | 1091    | 2    |
    | 1728    | 2    |
    | 1177    | 2    |
    | 337     | 2    |
    | 3046    | 2    |
    | 674     | 2    |
    | 1948    | 2    |
    +---------+------+
    10 rows 6615 ms
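Union-Find groups nodes into connected components by merging the sets at either end of every relationship. A minimal Python sketch of that technique follows; it is not the library's code, and the function name and the choice of the smallest member as the component id are assumptions for illustration.

```python
from collections import Counter


def connected_components(nodes, edges):
    """Union-Find over an undirected edge list.

    Returns a dict mapping each node to the id of its component
    (here: the smallest node in that component, an arbitrary choice).
    """
    parent = {node: node for node in nodes}

    def find(x):
        # Walk to the root, shortening the path on the way (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Attach the larger root under the smaller one.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    return {node: find(node) for node in nodes}


if __name__ == "__main__":
    components = connected_components(
        nodes=[1, 2, 3, 4, 5, 6],
        edges=[(1, 2), (2, 3), (4, 5)],
    )
    # Largest clusters first, like ORDER BY size DESC LIMIT 10.
    print(Counter(components.values()).most_common(10))
```

Counting nodes per component id and sorting by size reproduces the shape of the result table: one dominant cluster followed by many tiny ones is a common outcome on real networks.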
  • 57. Yelp - PageRank
    CALL algo.pageRank.stream(
      'MATCH (b:Business:Marked) RETURN id(b) as id',
      'MATCH (b1:Business:Marked)-[r:B2B]-(b2) RETURN id(b1) as source, id(b2) as target',
      {graph:'cypher'})
    YIELD node, score
    RETURN node.name, score
    ORDER BY score DESC LIMIT 10;
    +----------------------------------+--------------------+
    | node.name                        | score              |
    +----------------------------------+--------------------+
    | "McCarran International Airport" | 27.49973599999999  |
    | "Hash House A Go Go"             | 19.062398000000005 |
    | "Bachi Burger"                   | 18.1494385         |
    | "Mon Ami Gabi"                   | 17.720350000000003 |
    | "Bacchanal Buffet"               | 15.783480500000003 |
    | "Yard House Town Square"         | 14.427296999999998 |
    | "Secret Pizza"                   | 13.156547          |
    | "Rollin Smoke Barbeque"          | 12.808718499999998 |
    | "Wicked Spoon"                   | 12.639942499999997 |
    | "Monta Ramen"                    | 12.3904845         |
    +----------------------------------+--------------------+
    10 rows 6979 ms
  • 59. BitCoin Graph • Full Copy of the BitCoin BlockChain • from learnmeabitcoin.com (Greg Walker) • 1.7 billion nodes, 2.7 billion rels • 474k blocks, 240m tx, 280m addresses, 650m outputs • 600 GB on disk
  • 61. BitCoin Graph
    Distribution of "locked" relationships for "addresses" (participation in transactions)
    call apoc.stats.degrees('<locked');
    +----------+------------+-----------+-----+-----+-----+-----+-----+------+---------+-----+---------------------+
    | type     | direction  | total     | p50 | p75 | p90 | p95 | p99 | p999 | max     | min | mean                |
    +----------+------------+-----------+-----+-----+-----+-----+-----+------+---------+-----+---------------------+
    | "locked" | "INCOMING" | 654662356 | 0   | 0   | 1   | 1   | 2   | 28   | 1891327 | 0   | 0.37588608290716047 |
    +----------+------------+-----------+-----+-----+-----+-----+-----+------+---------+-----+---------------------+
    1 row 308 seconds
  • 62. BitCoin Graph
    Inferred network of addresses, via transaction and output
    (a1)<-[:locked]-(o1)-[:in]->(tx)-[:out]->(o2)-[:locked]->(a2)
    CALL algo.unionFind.stream(
      'match (o:output)-[:locked]->(a) with a limit 10000000 return id(a) as id',
      'match (o:output)-[:locked]->(a) with o, a limit 10000000
       match (o)-[:in]->(tx)-[:out]->(o2)-[:locked]->(a2)
       return id(a) as source, id(a2) as target, count(tx) as weight',
      {graph:'cypher'})
    YIELD setId as cluster, nodeId
    RETURN cluster, count(*) AS size
    ORDER BY size DESC LIMIT 10;
    +---------+---------+
    | cluster | size    |
    +---------+---------+
    | 5036    | 4409420 |
    | 6295282 | 1999    |
    | 5839746 | 1488    |
    | 9356302 | 833     |
    | 6560901 | 733     |
    | 6370777 | 637     |
    | 8101710 | 392     |
    | 5945867 | 369     |
    | 2489036 | 264     |
    | 1703620 | 203     |
    +---------+---------+
    10 rows, 296 seconds
  • 64. Design Considerations • Ease of use: call as procedures • Parallelize everything: load, compute, write • Efficiency: use direct access and efficient data structures, provide a high-level API • Scale to billions of nodes and relationships • Use up to hundreds of CPUs and terabytes of RAM
  • 65. Architecture 1. Load data in parallel from Neo4j 2. Store in efficient data structures 3. Run graph algorithm in parallel using Graph API 4. Write data back in parallel [Diagram: Neo4j, Algorithm Datastructures, Graph API, labelled with steps 1 to 4]
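The slides do not say which data structures the library uses internally. As one plausible illustration of steps 2 and 3, the Python sketch below packs an edge list into a compressed sparse row (CSR) layout, two flat integer arrays instead of per-node objects, and then computes out-degrees over node ranges in a thread pool. All names here are hypothetical, and in CPython the threads show only the partitioning scheme, not a real speed-up.

```python
from array import array
from concurrent.futures import ThreadPoolExecutor


def build_csr(node_count, edges):
    """Pack directed edges (source, target) into CSR arrays.

    offsets[i]..offsets[i + 1] delimits the slice of `targets`
    holding the neighbours of node i. Node ids are 0..node_count-1.
    """
    offsets = array("q", [0] * (node_count + 1))
    for source, _ in edges:
        offsets[source + 1] += 1
    # Prefix sum turns per-node counts into slice boundaries.
    for i in range(node_count):
        offsets[i + 1] += offsets[i]

    targets = array("q", [0] * len(edges))
    cursor = array("q", offsets[:node_count])  # next free slot per node
    for source, target in edges:
        targets[cursor[source]] = target
        cursor[source] += 1
    return offsets, targets


def out_degrees(offsets, start, end):
    """Out-degree of every node in the half-open range [start, end)."""
    return [offsets[i + 1] - offsets[i] for i in range(start, end)]


def parallel_out_degrees(offsets, node_count, workers=4):
    """Split the node range into chunks and process them concurrently."""
    chunk = max(1, -(-node_count // workers))  # ceiling division
    ranges = [(s, min(s + chunk, node_count)) for s in range(0, node_count, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda r: out_degrees(offsets, *r), ranges))
    return [degree for part in parts for degree in part]


if __name__ == "__main__":
    offsets, targets = build_csr(4, [(0, 1), (0, 2), (1, 2), (3, 0)])
    print(offsets.tolist(), targets.tolist())
    print(parallel_out_degrees(offsets, 4, workers=2))
```

A flat layout like this keeps each node's neighbours contiguous in memory, which is one common way to make graphs with billions of relationships fit in RAM and scan quickly.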
  • 67. Neo4j Graph Platform with Neo4j Algorithms vs. Apache Spark’s GraphX: Neo4j is Significantly Faster [Bar chart: runtime in seconds (axis 0 to 450) for Union-Find (Connected Components) and PageRank; legend: GraphX, Neo4j; bar labels as extracted: 251, 152, 416, 124] • Twitter 2010 Dataset: 1.47 Billion Relationships, 41.65 Million Nodes • Spark GraphX results publicly available: Amazon EC2 cluster running 64-bit Linux, 128 CPUs with 68 GB of memory, 2 hard disks • Neo4j Configuration: physical machine running 64-bit Linux, 128 CPUs with 55 GB RAM, SSDs
  • 68. Compute At Scale – Payment Graph
    3,000,000,000 nodes and 18,000,000,000 relationships (600G)
    PageRank (20 iterations) on 1 machine, 20 threads, 900G RAM
    call algo.pageRank('Account','SENT', {graph:'huge',iterations:20,write:true,concurrency:20});
    +-----------+------------+------------+---------------+-------------+
    | nodes     | iterations | loadMillis | computeMillis | writeMillis |
    +-----------+------------+------------+---------------+-------------+
    | 300000000 | 20         | 401404     | 6024994       | 47106       |
    +-----------+------------+------------+---------------+-------------+
    1 row 6473526 ms -> 1h 47min
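The timing figures in the result row can be checked with a few lines of arithmetic. The Python snippet below uses only the numbers printed on the slide; the variable names are made up for this example.

```python
# Phase timings and total runtime as reported on the slide, in milliseconds.
load_ms, compute_ms, write_ms = 401404, 6024994, 47106
total_ms = 6473526
iterations = 20

# The three phases account for almost all of the total runtime.
phases_ms = load_ms + compute_ms + write_ms

# Convert the total to hours, minutes and seconds.
hours, remainder = divmod(total_ms // 1000, 3600)
minutes, seconds = divmod(remainder, 60)

# Share of time spent computing, and average time per PageRank iteration.
compute_share = compute_ms / total_ms
seconds_per_iteration = compute_ms / iterations / 1000

if __name__ == "__main__":
    print(f"phases sum to {phases_ms} ms of {total_ms} ms")
    print(f"total runtime: {hours}h {minutes}min {seconds}s")
    print(f"compute share: {compute_share:.1%}")
    print(f"per iteration: {seconds_per_iteration:.1f} s")
```

The total works out to 1 h 47 min 53 s, matching the "1h 47min" on the slide, with roughly 93% of the time spent in the compute phase and only a small fraction in loading and writing.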
  • 69. We Need Your Feedback • neo4j.com/slack at #neo4j-graph-algorithms • github.com/neo4j-contrib/neo4j-graph-algorithms • Whitepaper on neo4j.com/graph-analytics
  • 70. “Graphs are one of the Unifying Themes of computer science . . . That so many different structures can be modeled using a single formalism is a Source of Great Power to the educated programmer.” - Steven S. Skiena, The Algorithm Design Manual
  • 71. Kudos: Paul Horn, Martin Knobloch from Avantgarde Labs, Tomasz Bratanic (docs)