Marcus Paradies, Michael Rudolf, Christof Bornhoevd, Wolfgang Lehner
GRATIN: Accelerating Graph Traversals
in Main-Memory Column Stores
GRADES’14 Workshop
June 22, 2014
Graphs from an Enterprise View
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 2
Relational + Application Logic
Application Layer (Graph Processing) on top of RDBMS
• Recursive Queries
• Chained Joins
• Data already in RDBMS
• SQL as interface
• Data transfer to application
Relational + Graph + Application Logic
Application Layer on top of RDBMS and GDBMS, with data replicated
• Efficient processing in GDBMS
• Processing on replicated data
• No combination with relational data
Graph Processing (The New World)
Graph Representation
Example graph
Vertices:
• id=1, name=John, type=User
• id=2, title=The Shining, type=Product
• id=3, title=The Stand, type=Product
• id=4, name=Horror, type=Category
• id=5, name=Literature, type=Category
Edges:
• type=category
• type=similar
• type=belongs
• type=belongs
• type=rated, rating=4.0
• type=rated, rating=5.0
Vertex table (? = no value)
id  type      name        . . .  title
1   User      John        . . .  ?
2   Product   ?           . . .  The Shining
3   Product   ?           . . .  The Stand
4   Category  Horror      . . .  ?
5   Category  Literature  . . .  ?
Edge table (? = no value)
Vs  Vt  type      . . .  rating
3   2   similar   . . .  ?
2   3   similar   . . .  ?
2   4   belongs   . . .  ?
3   4   belongs   . . .  ?
1   3   rated     . . .  5.0
1   2   rated     . . .  4.0
4   5   category  . . .  ?
• Each vertex/edge represented as a single record in universal tables
• Support for transactions and compression
• Combination with other data models (spatial, text, temporal)
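A minimal sketch of how the universal vertex and edge tables above might be laid out column by column. The dictionary layout and the helper names `record` and `select` are assumptions made for illustration, not the system's actual storage format; `None` stands in for the "?" entries.

```python
# Hypothetical column-oriented layout: one Python list per column.
# None stands in for the "?" entries (attribute not set for that record).
vertex_table = {
    "id":    [1, 2, 3, 4, 5],
    "type":  ["User", "Product", "Product", "Category", "Category"],
    "name":  ["John", None, None, "Horror", "Literature"],
    "title": [None, "The Shining", "The Stand", None, None],
}
edge_table = {
    "Vs":     [3, 2, 2, 3, 1, 1, 4],
    "Vt":     [2, 3, 4, 4, 3, 2, 5],
    "type":   ["similar", "similar", "belongs", "belongs",
               "rated", "rated", "category"],
    "rating": [None, None, None, None, 5.0, 4.0, None],
}


def record(table, row):
    """Reassemble one record from the columns, skipping unset attributes."""
    return {col: values[row] for col, values in table.items()
            if values[row] is not None}


def select(table, column, value):
    """Return the row positions where `column` equals `value` (full column scan)."""
    return [pos for pos, v in enumerate(table[column]) if v == value]


print(record(vertex_table, 1))
print([record(edge_table, r) for r in select(edge_table, "type", "rated")])
```

Each vertex or edge stays a single logical record, while sparse attributes such as `title` or `rating` only cost an unset entry in their column.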
Graph Processing (The New World)
Query Execution
EDGES
σ (Selection)
(Traversal)
π (Projection)
Graph Traversals
Example query: { id:D }-a-(1,*)
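The example query can be read as: start at vertex D and follow edges of type a for one or more hops. A minimal sketch of a scan-based traversal under that reading follows. It assumes edges are directed from Vs to Vt and uses the small edge table from the Edge Clustering slide; the function name `traverse` and its parameters are hypothetical, and this is not the authors' operator implementation.

```python
# Edge table as three parallel columns (values from the Edge Clustering slide).
VS   = ["D", "A", "A", "A", "E", "E", "D", "B", "F"]
VT   = ["F", "D", "B", "C", "B", "G", "B", "E", "G"]
TYPE = ["a", "a", "a", "a", "a", "a", "b", "b", "b"]


def traverse(start, edge_type, max_hops=None):
    """Return all vertices reachable from `start` in 1..max_hops hops.

    Assumption: edges are directed Vs -> Vt. Each traversal iteration
    performs one full scan over the edge columns. The start vertex itself
    is never reported, even if a cycle leads back to it.
    """
    frontier = {start}
    visited = {start}
    result = set()
    hop = 0
    while frontier and (max_hops is None or hop < max_hops):
        hop += 1
        next_frontier = set()
        for vs, vt, t in zip(VS, VT, TYPE):  # full column scan
            if t == edge_type and vs in frontier and vt not in visited:
                next_frontier.add(vt)
        visited |= next_frontier
        result |= next_frontier
        frontier = next_frontier
    return result


print(sorted(traverse("D", "a")))
print(sorted(traverse("A", "a")))
```

Because every iteration rescans all edge rows, the cost grows with the number of iterations times the table size, which is the overhead an index over the edge columns aims to reduce.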
Edge Clustering
• Clustering by edge type preserves subgraph meaning
• Clustering by edge source preserves vertex neighborhood
• Increases spatial locality in memory
• Allows reducing scan to range in column
Edge table with type clustering and edge clustering:
Vs  Vt  Type
D   F   a
A   D   a
A   B   a
A   C   a
E   B   a
E   G   a
D   B   b
B   E   b
F   G   b
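A minimal sketch of why clustering lets a scan shrink to a row range. Sorting by (type, source) is used here as one simple way to cluster; the slide only requires equal keys to be adjacent, so the sort and the helper names `type_range` and `neighbors` are assumptions for illustration.

```python
from bisect import bisect_left, bisect_right

# Edge rows as (Vs, Vt, Type), taken from the table above.
edges = [("D", "F", "a"), ("A", "D", "a"), ("A", "B", "a"),
         ("A", "C", "a"), ("E", "B", "a"), ("E", "G", "a"),
         ("D", "B", "b"), ("B", "E", "b"), ("F", "G", "b")]

# Cluster by edge type first, then by source vertex.
clustered = sorted(edges, key=lambda e: (e[2], e[0]))
types = [t for _, _, t in clustered]
keys = [(t, vs) for vs, _, t in clustered]


def type_range(edge_type):
    """Half-open row range [lo, hi) holding all edges of one type."""
    return bisect_left(types, edge_type), bisect_right(types, edge_type)


def neighbors(edge_type, source):
    """Targets of `source` via `edge_type`, read from one contiguous range."""
    lo = bisect_left(keys, (edge_type, source))
    hi = bisect_right(keys, (edge_type, source))
    return [vt for _, vt, _ in clustered[lo:hi]]


print(type_range("a"), type_range("b"))
print(neighbors("a", "A"))
```

With both clusterings in place, a neighborhood lookup touches only the rows between `lo` and `hi` instead of the whole column.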
GRATIN
[Figure: example column (values 2, 2, 2, 1, 3 at positions 1 to 5) indexed with minimal block size 2: blocks B1 and B2, value blocks, and block ranges]
• gratin replaces full column scans by block scans
• gratin indexes column with variable block size
• Allows efficient handling of vertices with high outdegree
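A minimal sketch of the block-scan idea: map each column value to the blocks that contain it, then scan only those blocks. The slides describe variable block sizes with a minimal block size; this sketch simplifies to fixed-size blocks and 0-based positions, and the class name `BlockIndex` and its methods are hypothetical, not the actual gratin data structure.

```python
from collections import defaultdict


class BlockIndex:
    """Maps each column value to the blocks containing it (fixed-size blocks)."""

    def __init__(self, column, block_size):
        self.column = column
        self.block_size = block_size
        self.blocks_of = defaultdict(list)  # value -> list of block start positions
        for start in range(0, len(column), block_size):
            for value in set(column[start:start + block_size]):
                self.blocks_of[value].append(start)

    def lookup(self, value):
        """Return positions of `value`, scanning only blocks known to contain it."""
        positions = []
        for start in self.blocks_of.get(value, []):
            end = min(start + self.block_size, len(self.column))
            positions.extend(p for p in range(start, end)
                             if self.column[p] == value)
        return positions


index = BlockIndex([2, 2, 2, 1, 3], block_size=2)
print(index.lookup(2))
print(index.lookup(1))
```

A value that occurs in only a few blocks is found without touching the rest of the column, while a value spread over many blocks degrades toward a full scan.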
Experiments on Static Graphs
ID                |V|    |E|    avg. d_out
Amazon            0.4 M  3.3 M  16.8
California-Roads  1.9 M  2.7 M  2.8
[Plot: Amazon; x-axis: # of Traversal Iterations; y-axis: Execution Time (ms); series: SCAN, gratin-512, gratin-4096, gratin-32768]
Figure: Comparison for different block sizes.
[Plots: Amazon and California-Roads; x-axis: Traversal Iteration; y-axis: Query time (ms)]
Figure: Query time for scan-based traversal and gratin-based traversal.
Handling Updates
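[Figure: the example column (values 2, 2, 2, 1, 3 at positions 1 to 5, minimal block size 2) receives a new value 2 at position 6, which adds block B3; value blocks and block ranges are extended accordingly. Health factors change from 1.0 for all blocks (GRATIN health h = 1.0) to hB1 = 1.0, hB2 = 0.5, hB3 = 1.0 (GRATIN health h = 0.83)]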
• Health factor describes viability of gratin
• If global health factor below threshold → rebuild index
• gratin allows updates in constant time (append-only)
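A minimal sketch of the rebuild check described above. The slides do not define how block health factors are combined; the arithmetic mean is an assumption that happens to reproduce the slide's value of 0.83 from block factors 1.0, 0.5 and 1.0. The function names and the threshold values are hypothetical.

```python
def global_health(block_health):
    """Combine per-block health factors (assumption: arithmetic mean)."""
    return sum(block_health) / len(block_health)


def needs_rebuild(block_health, threshold):
    """True if the global health factor has dropped below the threshold."""
    return global_health(block_health) < threshold


after_insert = [1.0, 0.5, 1.0]  # hB1, hB2, hB3 from the slide
print(round(global_health(after_insert), 2))
print(needs_rebuild(after_insert, threshold=0.5))
```

Appends stay cheap because they only adjust block-level factors; the expensive rebuild is deferred until the aggregate crosses the chosen threshold.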
Experiments on Dynamic Graphs
[Plots: Amazon and California-Roads; x-axis: ∆ Batch Insertions (0, then five batches of +20K); left y-axis: Slowdown Factor; right y-axis: Health Factor]
Figure: Query time for gratin-based traversal on dynamic graphs. Slowdown factor describes the relative execution time in multiples of the execution time on a static graph.
Summary
• Tight integration of traversal operator into main-memory column store
• gratin is a lightweight secondary index structure
• Handles dynamic graphs in predictable time
• Experiments show a diverse spectrum of performance improvements
• Performance of gratin depends on graph topology
Contact
Marcus Paradies
PhD Student at Database Technology Group, TU Dresden
https://wwwdb.inf.tu-dresden.de/team/external-members/marcus-paradies/
marcus.paradies@gmail.com
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 11
Ad

More Related Content

Similar to GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores (20)

Mastering MicroStation DGN: How to Integrate CAD and GIS
Mastering MicroStation DGN: How to Integrate CAD and GISMastering MicroStation DGN: How to Integrate CAD and GIS
Mastering MicroStation DGN: How to Integrate CAD and GIS
Safe Software
 
Map Reducec and Spark big data visualization and analytics
Map Reducec and Spark big data visualization and analyticsMap Reducec and Spark big data visualization and analytics
Map Reducec and Spark big data visualization and analytics
itesm
 
Mastering MicroStation DGN: How to Integrate CAD and GIS
Mastering MicroStation DGN: How to Integrate CAD and GISMastering MicroStation DGN: How to Integrate CAD and GIS
Mastering MicroStation DGN: How to Integrate CAD and GIS
Safe Software
 
MapReduce
MapReduceMapReduce
MapReduce
SatyaHadoop
 
W10L2 Scaling Up LLM Pretraining: Parallel Training Scaling Up Optimizer Basi...
W10L2 Scaling Up LLM Pretraining: Parallel Training Scaling Up Optimizer Basi...W10L2 Scaling Up LLM Pretraining: Parallel Training Scaling Up Optimizer Basi...
W10L2 Scaling Up LLM Pretraining: Parallel Training Scaling Up Optimizer Basi...
cniclsh1
 
Making sense of your data jug
Making sense of your data   jugMaking sense of your data   jug
Making sense of your data jug
Gerald Muecke
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between  CAD & GIS: 6 Ways to Automate Your  Data IntegrationBridging Between  CAD & GIS: 6 Ways to Automate Your  Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Safe Software
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
marketing932765
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
marketing932765
 
Exploration and 3D GIS Software - MapInfo Professional Discover3D 2015
Exploration and 3D GIS Software - MapInfo Professional Discover3D 2015Exploration and 3D GIS Software - MapInfo Professional Discover3D 2015
Exploration and 3D GIS Software - MapInfo Professional Discover3D 2015
Prakher Hajela Saxena
 
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
NAVER Engineering
 
MapReduce
MapReduceMapReduce
MapReduce
KavyaGo
 
Greenplum: A Pivotal Moment on Wall Street - Greenplum Summit 2018
Greenplum: A Pivotal Moment on Wall Street - Greenplum Summit 2018Greenplum: A Pivotal Moment on Wall Street - Greenplum Summit 2018
Greenplum: A Pivotal Moment on Wall Street - Greenplum Summit 2018
VMware Tanzu
 
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Ali Hodroj
 
Dsm Presentation
Dsm PresentationDsm Presentation
Dsm Presentation
richoe
 
NOSQL introduction for big data analytics
NOSQL introduction for big data analyticsNOSQL introduction for big data analytics
NOSQL introduction for big data analytics
Radhika R
 
Very large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDLVery large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDL
DESMOND YUEN
 
Processing Large Graphs
Processing Large GraphsProcessing Large Graphs
Processing Large Graphs
Nishant Gandhi
 
AutoML for user segmentation: how to match millions of users with hundreds of...
AutoML for user segmentation: how to match millions of users with hundreds of...AutoML for user segmentation: how to match millions of users with hundreds of...
AutoML for user segmentation: how to match millions of users with hundreds of...
Institute of Contemporary Sciences
 
Mastering MicroStation DGN: How to Integrate CAD and GIS
Mastering MicroStation DGN: How to Integrate CAD and GISMastering MicroStation DGN: How to Integrate CAD and GIS
Mastering MicroStation DGN: How to Integrate CAD and GIS
Safe Software
 
Map Reducec and Spark big data visualization and analytics
Map Reducec and Spark big data visualization and analyticsMap Reducec and Spark big data visualization and analytics
Map Reducec and Spark big data visualization and analytics
itesm
 
Mastering MicroStation DGN: How to Integrate CAD and GIS
Mastering MicroStation DGN: How to Integrate CAD and GISMastering MicroStation DGN: How to Integrate CAD and GIS
Mastering MicroStation DGN: How to Integrate CAD and GIS
Safe Software
 
W10L2 Scaling Up LLM Pretraining: Parallel Training Scaling Up Optimizer Basi...
W10L2 Scaling Up LLM Pretraining: Parallel Training Scaling Up Optimizer Basi...W10L2 Scaling Up LLM Pretraining: Parallel Training Scaling Up Optimizer Basi...
W10L2 Scaling Up LLM Pretraining: Parallel Training Scaling Up Optimizer Basi...
cniclsh1
 
Making sense of your data jug
Making sense of your data   jugMaking sense of your data   jug
Making sense of your data jug
Gerald Muecke
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between  CAD & GIS: 6 Ways to Automate Your  Data IntegrationBridging Between  CAD & GIS: 6 Ways to Automate Your  Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Safe Software
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
marketing932765
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
marketing932765
 
Exploration and 3D GIS Software - MapInfo Professional Discover3D 2015
Exploration and 3D GIS Software - MapInfo Professional Discover3D 2015Exploration and 3D GIS Software - MapInfo Professional Discover3D 2015
Exploration and 3D GIS Software - MapInfo Professional Discover3D 2015
Prakher Hajela Saxena
 
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
NAVER Engineering
 
MapReduce
MapReduceMapReduce
MapReduce
KavyaGo
 
Greenplum: A Pivotal Moment on Wall Street - Greenplum Summit 2018
Greenplum: A Pivotal Moment on Wall Street - Greenplum Summit 2018Greenplum: A Pivotal Moment on Wall Street - Greenplum Summit 2018
Greenplum: A Pivotal Moment on Wall Street - Greenplum Summit 2018
VMware Tanzu
 
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Ali Hodroj
 
Dsm Presentation
Dsm PresentationDsm Presentation
Dsm Presentation
richoe
 
NOSQL introduction for big data analytics
NOSQL introduction for big data analyticsNOSQL introduction for big data analytics
NOSQL introduction for big data analytics
Radhika R
 
Very large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDLVery large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDL
DESMOND YUEN
 
Processing Large Graphs
Processing Large GraphsProcessing Large Graphs
Processing Large Graphs
Nishant Gandhi
 
AutoML for user segmentation: how to match millions of users with hundreds of...
AutoML for user segmentation: how to match millions of users with hundreds of...AutoML for user segmentation: how to match millions of users with hundreds of...
AutoML for user segmentation: how to match millions of users with hundreds of...
Institute of Contemporary Sciences
 

Recently uploaded (20)

md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptxmd-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
fatimalazaar2004
 
183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag
fardin123rahman07
 
Ch3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendencyCh3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendency
ayeleasefa2
 
Calories_Prediction_using_Linear_Regression.pptx
Calories_Prediction_using_Linear_Regression.pptxCalories_Prediction_using_Linear_Regression.pptx
Calories_Prediction_using_Linear_Regression.pptx
TijiLMAHESHWARI
 
Digilocker under workingProcess Flow.pptx
Digilocker  under workingProcess Flow.pptxDigilocker  under workingProcess Flow.pptx
Digilocker under workingProcess Flow.pptx
satnamsadguru491
 
Conic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptxConic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptx
taiwanesechetan
 
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdfIAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
mcgardenlevi9
 
GenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.aiGenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.ai
Inspirient
 
Geometry maths presentation for begginers
Geometry maths presentation for begginersGeometry maths presentation for begginers
Geometry maths presentation for begginers
zrjacob283
 
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
ThanushsaranS
 
Data Analytics Overview and its applications
Data Analytics Overview and its applicationsData Analytics Overview and its applications
Data Analytics Overview and its applications
JanmejayaMishra7
 
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
Molecular methods diagnostic and monitoring of infection  -  Repaired.pptxMolecular methods diagnostic and monitoring of infection  -  Repaired.pptx
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
7tzn7x5kky
 
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your CompetitorsAI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
Contify
 
How iCode cybertech Helped Me Recover My Lost Funds
How iCode cybertech Helped Me Recover My Lost FundsHow iCode cybertech Helped Me Recover My Lost Funds
How iCode cybertech Helped Me Recover My Lost Funds
ireneschmid345
 
DPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdfDPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdf
inmishra17121973
 
FPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptxFPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptx
ssuser4ef83d
 
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbbEDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
JessaMaeEvangelista2
 
VKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptxVKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptx
Vinod Srivastava
 
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnTemplate_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
cegiver630
 
VKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptxVKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptx
Vinod Srivastava
 
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptxmd-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
fatimalazaar2004
 
183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag
fardin123rahman07
 
Ch3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendencyCh3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendency
ayeleasefa2
 
Calories_Prediction_using_Linear_Regression.pptx
Calories_Prediction_using_Linear_Regression.pptxCalories_Prediction_using_Linear_Regression.pptx
Calories_Prediction_using_Linear_Regression.pptx
TijiLMAHESHWARI
 
Digilocker under workingProcess Flow.pptx
Digilocker  under workingProcess Flow.pptxDigilocker  under workingProcess Flow.pptx
Digilocker under workingProcess Flow.pptx
satnamsadguru491
 
Conic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptxConic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptx
taiwanesechetan
 
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdfIAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
mcgardenlevi9
 
GenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.aiGenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.ai
Inspirient
 
Geometry maths presentation for begginers
Geometry maths presentation for begginersGeometry maths presentation for begginers
Geometry maths presentation for begginers
zrjacob283
 
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
ThanushsaranS
 
Data Analytics Overview and its applications
Data Analytics Overview and its applicationsData Analytics Overview and its applications
Data Analytics Overview and its applications
JanmejayaMishra7
 
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
Molecular methods diagnostic and monitoring of infection  -  Repaired.pptxMolecular methods diagnostic and monitoring of infection  -  Repaired.pptx
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
7tzn7x5kky
 
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your CompetitorsAI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
Contify
 
How iCode cybertech Helped Me Recover My Lost Funds
How iCode cybertech Helped Me Recover My Lost FundsHow iCode cybertech Helped Me Recover My Lost Funds
How iCode cybertech Helped Me Recover My Lost Funds
ireneschmid345
 
DPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdfDPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdf
inmishra17121973
 
FPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptxFPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptx
ssuser4ef83d
 
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbbEDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
JessaMaeEvangelista2
 
VKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptxVKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptx
Vinod Srivastava
 
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnTemplate_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
cegiver630
 
VKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptxVKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptx
Vinod Srivastava
 
Ad

GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

  • 1. Marcus Paradies, Michael Rudolf, Christof Bornhoevd, Wolfgang Lehner GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores GRADES’14 Workshop June 22, 2014
  • 2. Graphs from an Enterprise View © Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 2
  • 3. Graphs from an Enterprise View Relational + Application Logic Application Layer RDBMS Graph Processing Data Data • Recursive Queries • Chained Joins © Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 2
  • 4. Graphs from an Enterprise View Relational + Application Logic Application Layer RDBMS Graph Processing Data Data • Recursive Queries • Chained Joins Data already in RDBMS SQL as interface Data transfer to application © Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 2
  • 5. Graphs from an Enterprise View Relational + Application Logic Application Layer RDBMS Graph Processing Data Data • Recursive Queries • Chained Joins Data already in RDBMS SQL as interface Data transfer to application Relational + Graph + Application Logic Application Layer RDBMS GDBMS Replicate Data Data Data Data © Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 2
  • 6. Graphs from an Enterprise View Relational + Application Logic Application Layer RDBMS Graph Processing Data Data • Recursive Queries • Chained Joins Data already in RDBMS SQL as interface Data transfer to application Relational + Graph + Application Logic Application Layer RDBMS GDBMS Replicate Data Data Data Data Efficient processing in GDBMS Processing on replicated data No combination with relational data © Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 2
  • 7. Graph Processing (The New World) Graph Representation © Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 3
  • 8. Graph Processing (The New World) Graph Representation id=1 name=John type=User id=2 title=The Shining type=Product id=3 title=The Stand type=Product id=4 name=Horror type=Category id=5 name=Literature type=Category type=category type=similar type=belongs type=belongs type=rated rating=4.0 type=rated rating=5.0 Example graph id type name . . . title 1 User John . . . ? 2 Product ? . . . The Shining 3 Product ? . . . The Stand 4 Category Horror . . . ? 5 Category Literature . . . ? Vertex table Vs Vt type . . . rating 3 2 similar . . . ? 2 3 similar . . . ? 2 4 belongs . . . ? 3 4 belongs . . . ? 1 3 rated . . . 5.0 1 2 rated . . . 4.0 4 5 category . . . ? Edge table © Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 3
  • 9. Graph Processing (The New World) Graph Representation id=1 name=John type=User id=2 title=The Shining type=Product id=3 title=The Stand type=Product id=4 name=Horror type=Category id=5 name=Literature type=Category type=category type=similar type=belongs type=belongs type=rated rating=4.0 type=rated rating=5.0 Example graph id type name . . . title 1 User John . . . ? 2 Product ? . . . The Shining 3 Product ? . . . The Stand 4 Category Horror . . . ? 5 Category Literature . . . ? Vertex table Vs Vt type . . . rating 3 2 similar . . . ? 2 3 similar . . . ? 2 4 belongs . . . ? 3 4 belongs . . . ? 1 3 rated . . . 5.0 1 2 rated . . . 4.0 4 5 category . . . ? Edge table • Each vertex/edge represented as a single record in universal tables • Support for transactions and compression • Combination with other data models (spatial, text, temporal) © Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 3
  • 18. Graph Processing (The New World): Query Execution
      Query plan: EDGES, σ (Selection), (Traversal), π (Projection)
      Graph Traversals
        Example query: { id:D }-a-(1,*)
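A scan-based traversal of this kind can be sketched as a level-synchronous loop in which every iteration scans the edge columns once. The sketch assumes that `{ id:D }-a-(1,*)` means "start at vertex D, follow edges of type a in forward direction for one or more hops"; the slides do not spell out the query semantics. The function name `traverse`, its parameters, and the sample edge columns are hypothetical.

```python
def traverse(vs_col, vt_col, type_col, start, edge_type,
             min_hops=1, max_hops=None):
    """Level-synchronous traversal; each iteration is one full column scan."""
    visited = {start}
    frontier = {start}
    result = set()
    hops = 0
    while frontier and (max_hops is None or hops < max_hops):
        hops += 1
        next_frontier = set()
        # One scan over the edge columns per traversal iteration.
        for vs, vt, t in zip(vs_col, vt_col, type_col):
            if t == edge_type and vs in frontier and vt not in visited:
                next_frontier.add(vt)
        visited |= next_frontier
        if hops >= min_hops:
            result |= next_frontier
        frontier = next_frontier
    return result


# Hypothetical edge columns (source, target, type).
vs_col = list("DAAAEEDBF")
vt_col = list("FDBCBGBEG")
type_col = list("aaaaaabbb")

reachable_from_d = traverse(vs_col, vt_col, type_col, "D", "a")
reachable_from_a = traverse(vs_col, vt_col, type_col, "A", "a")
```

The number of column scans grows with the traversal depth, so the cost of a single scan dominates the query time.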
  • 19. Edge Clustering
      • Clustering by edge type preserves subgraph meaning
      • Clustering by edge source preserves vertex neighborhood
      • Increases spatial locality in memory
      • Allows reducing scan to range in column
      Edge table with type clustering and edge clustering:
        Vs  Vt  Type
        D   F   a
        A   D   a
        A   B   a
        A   C   a
        E   B   a
        E   G   a
        D   B   b
        B   E   b
        F   G   b
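The effect of clustering on scans can be sketched as follows. The sketch uses sorting by (type, source) as one simple way to obtain clustering; the slide only requires equal values to be adjacent, not sorted, so this is an assumption. The helper names `value_range` and `clustered_neighbors` are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Unordered edge records (source, target, type).
edges = [
    ("D", "B", "b"), ("A", "D", "a"), ("E", "B", "a"),
    ("B", "E", "b"), ("A", "B", "a"), ("D", "F", "a"),
    ("F", "G", "b"), ("A", "C", "a"), ("E", "G", "a"),
]

# Cluster by edge type first, then by edge source.
clustered = sorted(edges, key=lambda e: (e[2], e[0]))
vs_col = [e[0] for e in clustered]
vt_col = [e[1] for e in clustered]
type_col = [e[2] for e in clustered]


def value_range(column, value, lo=0, hi=None):
    """Half-open position range of value within column[lo:hi] (must be sorted)."""
    hi = len(column) if hi is None else hi
    return bisect_left(column, value, lo, hi), bisect_right(column, value, lo, hi)


def clustered_neighbors(source, edge_type):
    """Narrow the scan to the type range, then to the source range inside it."""
    t_lo, t_hi = value_range(type_col, edge_type)
    s_lo, s_hi = value_range(vs_col, source, t_lo, t_hi)
    return vt_col[s_lo:s_hi]


a_range = value_range(type_col, "a")
neighbors_of_a = clustered_neighbors("A", "a")
```

With this layout, a lookup touches only the contiguous range for the requested type and source instead of the whole column.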
  • 25. GRATIN
      Column (positions 1 to 5): 2 2 2 1 3
      Minimal blocksize: 2; blocks B1, B2
      Value Blocks (figure labels as extracted): 2 1 2 1 2 3
      Block ranges (figure labels as extracted): 2 1 3 5
      • gratin replaces full column scans by block scans
      • gratin indexes column with variable block size
      • Allows efficient handling of vertices with high outdegree
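The idea of replacing a full column scan by block scans can be illustrated with a deliberately simplified index. This sketch is not the gratin structure itself: it assumes fixed-size blocks (the slides describe variable block sizes with a minimal block size), uses 0-based positions, and the names `build_block_index` and `block_scan` are hypothetical.

```python
from collections import defaultdict


def build_block_index(column, block_size):
    """Map each value to the ordered list of block numbers that contain it."""
    index = defaultdict(list)
    for pos, value in enumerate(column):
        block = pos // block_size
        # Record each block only once per value.
        if not index[value] or index[value][-1] != block:
            index[value].append(block)
    return dict(index)


def block_scan(column, index, block_size, value):
    """Return positions of value, scanning only the blocks listed for it."""
    hits = []
    for block in index.get(value, []):
        start = block * block_size
        end = min(start + block_size, len(column))
        hits.extend(p for p in range(start, end) if column[p] == value)
    return hits


column = [2, 2, 2, 1, 3]          # column values from the slide
index = build_block_index(column, block_size=2)
positions_of_1 = block_scan(column, index, 2, 1)
positions_of_2 = block_scan(column, index, 2, 2)
```

A lookup for a rare value inspects a single block, while a frequent value (a vertex with high outdegree, if the column holds edge sources) still requires scanning every block that contains it.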
  • 26. Experiments on Static Graphs
      Data sets:
        ID                |V|    |E|    avg. outdegree
        Amazon            0.4 M  3.3 M  16.8
        California-Roads  1.9 M  2.7 M  2.8
      [Figure: Comparison for different block sizes. Data set: Amazon; x-axis: # of traversal iterations; y-axis: execution time (ms); series: SCAN, gratin-512, gratin-4096, gratin-32768.]
      [Figure: Query time for scan-based traversal and gratin-based traversal. Panels: Amazon, California-Roads; x-axis: traversal iteration; y-axis: query time (ms).]
  • 28. Handling Updates
      Column (value_position): 2_1 2_2 2_3 1_4 3_5
      Minimal blocksize: 2; blocks B1, B2
      Value Blocks (figure labels as extracted): 2 1 2 1 2 3
      Block ranges (figure labels as extracted): 2 1 3 5
      Health factor: hB1 = 1.0, hB2 = 1.0, hB3 = 1.0; GRATIN health h = 1.0
      • Health factor describes viability of gratin
      • If global health factor below threshold → rebuild index
  • 30. Handling Updates
      Column (value_position): 2_1 2_2 2_3 1_4 3_5 2_6
      Minimal blocksize: 2; blocks B1, B2, B3
      Value Blocks (figure labels as extracted): 2 1 2 3 1 2 3
      Block ranges (figure labels as extracted): 3 2 1 3 5 6
      Health factor: hB1 = 1.0, hB2 = 0.5, hB3 = 1.0; GRATIN health h = 0.83
      • Health factor describes viability of gratin
      • If global health factor below threshold → rebuild index
      • gratin allows updates in constant time (append-only)
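The rebuild rule can be sketched in a few lines. The slides do not define how the global health factor is derived; the sketch assumes it is the arithmetic mean of the per-block health factors, which is consistent with the values shown (1.0, 0.5, 1.0 giving 0.83). The threshold of 0.9 and the function names are hypothetical.

```python
def global_health(block_health):
    """Assumed aggregate: arithmetic mean of the per-block health factors."""
    return sum(block_health.values()) / len(block_health)


def needs_rebuild(block_health, threshold):
    """Rebuild the index once the global health drops below the threshold."""
    return global_health(block_health) < threshold


# Per-block health factors after the append shown on the slide.
block_health = {"B1": 1.0, "B2": 0.5, "B3": 1.0}
h = global_health(block_health)

# Hypothetical threshold; the slides do not state a concrete value.
rebuild = needs_rebuild(block_health, threshold=0.9)
```

Because appends only touch the per-block factors, the check stays cheap and the expensive rebuild is deferred until the index has degraded noticeably.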
  • 31. Experiments on Dynamic Graphs
      [Figure: Query time for gratin-based traversal on dynamic graphs. Panels: Amazon, California-Roads; x-axis: batch insertions (five batches of +20K); left y-axis: slowdown factor; right y-axis: health factor.]
      Slowdown factor describes the relative execution time in multiples of the execution time on a static graph.
  • 32. Summary
      • Tight integration of traversal operator into main-memory column store
      • gratin is a lightweight secondary index structure
      • Handles dynamic graphs in predictable time
      • Experiments show a diverse spectrum of performance improvements
      • Performance of gratin depends on graph topology
  • 33. Contact
      Marcus Paradies, PhD Student at Database Technology Group, TU Dresden
      https://wwwdb.inf.tu-dresden.de/team/external-members/marcus-paradies/
      [email protected]