Lazy Join Optimizations without
Upfront Statistics
MATTEO INTERLANDI
Cloud Computing Programs
Open Source Data-Intensive Scalable Computing (DISC) Platforms: Hadoop MapReduce and Spark
◦ functional API
◦ map and reduce User-Defined Functions
◦ RDD transformations (filter, flatMap, zipPartitions, etc.)
Several years later, introduction of high-level SQL-like declarative query languages (and systems)
◦ Conciseness
◦ Pick a physical execution plan from a number of alternatives
Query Optimization
Two-step process
◦ Logical optimizations (e.g., filter pushdown)
◦ Physical optimizations (e.g., join orders and implementation)
Physical optimizer in RDBMS:
◦ Cost-based
◦ Data statistics (e.g., predicate selectivities, cost of data access, etc.)
The role of the cost-based optimizer is to
(1) enumerate some set of equivalent plans
(2) estimate the cost of each plan
(3) select a sufficiently good plan
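The three roles of a cost-based optimizer can be sketched in a few lines. This is a minimal illustration, not any particular system's optimizer: the table names, cardinalities, selectivities, and the cost function (sum of estimated intermediate result sizes) are all assumptions made for the example, and only left-deep join orders are enumerated.

```python
from itertools import permutations

# Hypothetical base-table cardinalities (rows); illustrative values only.
CARD = {"A": 1_000, "B": 50_000, "C": 200_000}

# Hypothetical join selectivities, keyed by unordered table pair.
SEL = {
    frozenset(("A", "B")): 1e-4,
    frozenset(("B", "C")): 1e-5,
    frozenset(("A", "C")): 1e-3,
}


def join_selectivity(joined, new_table):
    """Combine the selectivities of all predicates linking new_table to the joined tables."""
    sel = 1.0
    for t in joined:
        sel *= SEL.get(frozenset((t, new_table)), 1.0)
    return sel


def plan_cost(order):
    """(2) Estimate: cost of a left-deep plan as the sum of intermediate result sizes."""
    joined = [order[0]]
    rows = CARD[order[0]]
    cost = 0.0
    for t in order[1:]:
        rows = rows * CARD[t] * join_selectivity(joined, t)
        cost += rows
        joined.append(t)
    return cost


def best_plan(tables):
    """(1) Enumerate all left-deep orders, then (3) select the cheapest one."""
    return min(permutations(tables), key=plan_cost)


if __name__ == "__main__":
    for order in permutations(CARD):
        print(order, round(plan_cost(order)))
    print("chosen:", best_plan(list(CARD)))
```

With these made-up numbers, the orders that join A and B first are far cheaper than the ones that start with C, which is the kind of gap a cost-based optimizer is meant to detect.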
Query Optimization: Why Important?
[Figure: bar chart of execution time in seconds (log scale) at Scale Factor = 10 for Spark, AsterixDB, Hive, and Pig, with one W bar and one R bar per system. Bar labels as extracted: 1082, 70, 343, 21, "unable to finish in 5+ hours", 276, 15102, 954.]
Bad plans over Big Data can be disastrous!
Cost-based Optimizer in DISC
No cost-based join enumeration
◦ Rely on order of relations in FROM clause
◦ Left-deep plans
No upfront statistics:
◦ Data often sits in HDFS and is unstructured
Even if input statistics are available:
◦ Correlations between predicates
◦ Exponential error propagation in joins
◦ Arbitrary UDFs
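The point about exponential error propagation can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that every join's selectivity estimate is off by the same multiplicative factor and that the errors all point in the same direction; the function name is hypothetical.

```python
def worst_case_error(per_join_error: float, num_joins: int) -> float:
    """Cardinality estimation error after num_joins joins.

    Assumes each join's selectivity estimate is off by per_join_error
    (a multiplicative factor) and that errors compound without cancelling.
    """
    return per_join_error ** num_joins


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"{n} joins, 2x error per join -> {worst_case_error(2.0, n):.0f}x overall")
```

Under these assumptions, an estimate that is only 2x off per join is 256x off after eight joins, so even decent input statistics can mislead the optimizer deep in a plan.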
Cost-based Optimizer in DISC: State of the Art
Bad statistics
◦ Adaptive Query planning
◦ RoPe [NSDI 12, VLDB 2013]
◦ Assumption is that some initial statistics exist
No upfront statistics
◦ Pilot runs (samples)
◦ DynO [SIGMOD 2014]
◦ Samples are expensive
◦ Only foreign-key joins
◦ No runtime plan revision
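To make the pilot-run idea concrete, here is a minimal sketch of estimating a predicate's selectivity from a random sample before running the full query. It is written in the spirit of sampling-based pilot runs in general and is not DynO's actual procedure; the function name, the seed parameter, and the uniform sampling strategy are assumptions.

```python
import random


def pilot_run_selectivity(rows, predicate, sample_size, seed=0):
    """Estimate the fraction of rows satisfying predicate from a uniform random sample."""
    rng = random.Random(seed)
    sample = rng.sample(rows, min(sample_size, len(rows)))
    if not sample:
        return 0.0
    return sum(1 for r in sample if predicate(r)) / len(sample)


if __name__ == "__main__":
    data = list(range(100_000))
    estimate = pilot_run_selectivity(data, lambda x: x % 4 == 0, sample_size=1_000)
    print(f"estimated selectivity: {estimate:.3f} (true value 0.250)")
```

The sample has to be drawn and evaluated before the real query starts, which is where the extra cost of this approach comes from.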
Lazy Cost-based Optimizer for Spark
Key idea: interleave query planning and execution
◦ Query plans are lazily executed
◦ Statistics are gathered at runtime
◦ Joins are greedily scheduled
◦ Next join can be dynamically changed if a bad decision was made
◦ Execute-Gather-Aggregate-Plan strategy (EGAP)
Neither upfront statistics nor pilot runs are required
◦ Raw dataset size for initial guess
Support for non-foreign-key joins
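The interleaving of planning and execution can be sketched as a small loop over in-memory data. This is a toy illustration under stated assumptions, not the Spark implementation: relations are plain lists of dicts, the only statistic gathered is the record count, the join graph is assumed to be connected and acyclic, and the names `hash_join` and `lazy_join` are hypothetical.

```python
def hash_join(left, right, key):
    """Inner equi-join of two lists of dicts on a shared column."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def lazy_join(relations, predicates):
    """Greedy Execute-Gather-Aggregate-Plan loop.

    relations:  dict name -> list of dict rows
    predicates: list of (left_name, right_name, key) equi-join predicates
    Returns the final result and the join order that was actually executed.
    """
    relations = dict(relations)
    predicates = list(predicates)
    order = []
    while predicates:
        # Gather + Aggregate: observed record counts of everything materialized so far.
        sizes = {name: len(rows) for name, rows in relations.items()}
        # Plan: pick the join whose inputs are currently the smallest.
        left, right, key = min(predicates, key=lambda p: sizes[p[0]] + sizes[p[1]])
        # Execute: run only that join, then loop again with fresh statistics.
        joined_name = left + right
        relations[joined_name] = hash_join(relations.pop(left), relations.pop(right), key)
        order.append((left, right))
        # Point the remaining predicates at the new intermediate result.
        remaining = []
        for l, r, k in predicates:
            l2 = joined_name if l in (left, right) else l
            r2 = joined_name if r in (left, right) else r
            if l2 != r2:
                remaining.append((l2, r2, k))
        predicates = remaining
    (result,) = relations.values()
    return result, order


if __name__ == "__main__":
    A = [{"k1": i} for i in range(3)]
    B = [{"k1": i % 3, "k2": i} for i in range(6)]
    C = [{"k2": i % 6, "c": i} for i in range(12)]
    result, order = lazy_join(
        {"A": A, "B": B, "C": C},
        [("A", "B", "k1"), ("B", "C", "k2")],
    )
    print(order, len(result))
```

Because the next join is chosen only after the previous one has produced real sizes, a poor early decision affects at most one step rather than the whole plan.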
Lazy Optimizer: an Example
Assumption: A < C
[Diagram sequence over relations A, B, and C, one slide per step: Execute, Gather, Aggregate (at the Driver), Plan, Execute (producing AB), Gather, Plan, Execute (producing ABC). Boxes labeled S appear from the first Gather step onward.]
Lazy Optimizer: Wrong Guess
Assumption: A < C, with selections σ applied; actual sizes: σ(A) > σ(C)
[Diagram sequence over relations A, B, and C: statistics boxes labeled S appear after the first step, a Repartition step on B follows, then BC is produced, and finally ABC.]
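The wrong-guess scenario can be replayed with toy numbers. In this sketch the initial plan is chosen from raw dataset sizes, the selections are then executed, and the plan is revised from the observed sizes. The sizes, the filters, and the function name `choose_first_join` are illustrative assumptions.

```python
def choose_first_join(sizes):
    """Join B first with whichever of A or C is believed to be smaller."""
    return ("A", "B") if sizes["A"] <= sizes["C"] else ("B", "C")


# Hypothetical raw inputs: A looks much smaller than C before filtering.
A = list(range(100))
B = list(range(500))
C = list(range(1000))
raw_sizes = {"A": len(A), "B": len(B), "C": len(C)}
initial_plan = choose_first_join(raw_sizes)

# Execute the selections, then gather the real sizes.
filtered_A = [x for x in A if x % 10 != 0]   # weak filter, keeps 90 rows
filtered_C = [x for x in C if x % 100 == 0]  # strong filter, keeps 10 rows
observed_sizes = {"A": len(filtered_A), "B": len(B), "C": len(filtered_C)}
revised_plan = choose_first_join(observed_sizes)

if __name__ == "__main__":
    print("initial guess:", initial_plan)
    print("after statistics:", revised_plan)
```

The initial guess joins A with B, but once the filtered sizes are known the plan switches to joining B with C first, mirroring the BC step in the slides.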
Runtime Integrated Optimizer for Spark (RIOS)
Spark batch execution model allows late binding of joins
Set of Statistics:
◦ Join estimations (based on sampling or sketches)
◦ Number of records
◦ Average size of each record
Statistics are aggregated using a Spark job or accumulators
Join implementations are picked based on thresholds
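A rough sketch of how such statistics could be combined and used follows. It runs on plain Python lists rather than Spark, and everything in it is an assumption made for illustration: the `Stats` class, approximating record size by the length of its `repr`, the 10 MB threshold, and the rule of broadcasting whenever the smaller side fits under that threshold.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    records: int = 0
    size_bytes: int = 0

    def merge(self, other):
        """Combine two partial statistics, as an accumulator would."""
        return Stats(self.records + other.records, self.size_bytes + other.size_bytes)

    @property
    def avg_record_size(self):
        return self.size_bytes / self.records if self.records else 0.0


def partition_stats(partition):
    """Per-partition statistics; record size is approximated by len(repr(record))."""
    return Stats(len(partition), sum(len(repr(r)) for r in partition))


def aggregate_stats(partitions):
    """Merge the per-partition statistics into one summary for the relation."""
    total = Stats()
    for p in partitions:
        total = total.merge(partition_stats(p))
    return total


BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024  # hypothetical threshold


def pick_join(left, right, threshold=BROADCAST_THRESHOLD_BYTES):
    """Choose a join implementation from aggregated statistics."""
    if min(left.size_bytes, right.size_bytes) <= threshold:
        return "broadcast"
    return "shuffle"


if __name__ == "__main__":
    small = aggregate_stats([["ab", "cd"], ["ef"]])
    large = Stats(records=10**9, size_bytes=10**12)
    print(small, small.avg_record_size, pick_join(small, large))
```

Keeping the statistics mergeable means each partition can report independently and the driver only has to add up small summaries.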
Challenges and Optimizations
Execute - Block and revise execution plans without wasting
computation
Gather - Asynchronous generation of statistics
Aggregate - Efficient accumulation of statistics
Plan - Try to schedule as many broadcast joins as possible
Experiments
Q1: Is RIOS able to generate good query plans?
Q2: What is the performance of RIOS compared to regular Spark and pilot runs?
Q3: How expensive are wrong guesses?
Minibenchmark with 3 Fact Tables
[Figure: bar chart of execution time in seconds (log scale) by scale factor (SF). Bar labels, read in legend order; DNF = unable to finish in 5+ hours.]
Series              SF 1   SF 10   SF 100   SF 1000
spark good-order    40     66      115      997
RIOS R2R            41     61      111      868
RIOS W2R            45     66      123      1140
spark wrong-order   136    4194    DNF      DNF
pilot-run           143    4230    DNF      DNF
Q1: RIOS is able to avoid bad plans
Q2: RIOS is always faster than the pilot-run approach
Q3: Bad guesses cost around 15% in the worst case
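TPC-DS and TPC-H Queries
[Figure: bar charts of execution time in seconds (log scale) by scale factor (SF) for four queries. Bar labels, read in legend order; DNF = unable to finish in 5+ hours.]
Query      Series             SF 1   SF 10   SF 100   SF 1000
Query 17   spark good-order   38     49      140      2298
Query 17   RIOS               37     41      67       899
Query 17   pilot-run          40     43      69       930
Query 17   spark bad-order    45     55      198      3898
Query 50   spark good-order   38     55      87       1185
Query 50   RIOS               38     41      54       843
Query 50   pilot-run          41     52      70       1128
Query 50   spark bad-order    47     105     291      6069
Query 28   spark good-order   37     41      107      617
Query 28   RIOS               34     35      39       464
Query 28   pilot-run          37     38      42       490
Query 28   spark bad-order    37     44      109      712
Query 9    spark good-order   55     80      326      8511
Query 9    RIOS               46     47      137      7250
Query 9    pilot-run          50     50      215      7831
Query 9    spark bad-order    70     153     633      DNF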
Q1: RIOS generates optimal plans
Q2: RIOS is always the fastest approach
Conclusions
RIOS: cost-based query optimizer for Spark
Statistics are gathered at runtime (no need for initial statistics or pilot runs)
Late binding of joins
Up to 2x faster than the best left-deep plans (Spark), and more than 100x faster than previous approaches for fact table joins.
Future Work
More flexible shuffle operations:
◦ Efficient switch from shuffle-based joins to broadcast joins
◦ Allow records to be partitioned in different ways
Take into consideration interesting orders and partitions
Add aggregation and additional statistics (IO and network cost)
Thank you
Experiment Configuration
๏ Datasets:
• TPC-DS
• TPC-H
๏ Configuration:
• 16 machines, each with 4 cores (2 hyper-threads per core), 32GB of RAM, 1TB disk
• Spark 1.6.3
• Scale factor from 1 to 1000 (~1TB)
