Confidential Use Only – Do Not Share
David Phillips
Software Engineer
Facebook
Presto: Fast SQL on Everything
What is Presto?
• Open source distributed SQL query engine
• ANSI SQL compliant
• Originally developed by Facebook
• Used in production at many well known companies
Commercial Offerings
Notable Characteristics
• Adaptive multi-tenant system
• Run hundreds of concurrent queries on thousands of nodes
• Extensible, federated design
• Plugins provide connectors, functions, types, security
• Flexible design supports many different use cases
• High performance
• Many optimizations, code generation, long-lived JVM
Use Cases at Facebook
Interactive Analytics
• Facebook has a massive multi-tenant data warehouse
• Employees need to quickly analyze small data (~50GB-3TB)
• Visualizations, dashboards, notebooks, BI tools
• Clusters run 50-100 concurrent queries w/ diverse shapes
• Queries usually execute in seconds or minutes
• Users are latency sensitive
• Fast improves productivity, slow blocks their work
Batch ETL
• Populate and process data in the warehouse
• Jobs are scheduled using a workflow management system
• Similar to Azkaban or Airflow
• Manages dependencies between jobs
• Queries are typically written by data engineers
• More expensive in CPU and data volume than Interactive
• Throughput and efficiency more important than latency
A/B Testing
• Evaluate product changes via statistical hypothesis testing
• Results need to be available in hours (not days)
• Data must be complete and accurate
• Arbitrary slice and dice at interactive latency (~5-30s)
• Cannot pre-aggregate data, must compute results on the fly
• Producing results requires joining multiple large data sets
• Web interface generates restricted query shapes
App Analytics
• External-user facing custom reporting tools
• Facebook Analytics offers analytics to application developers
• Web interface generates small set of query shapes
• Highly selective queries over large aggregate data volumes
• Application developers can only access their own data
• Very strict latency requirements (~100ms-5s)
• Highly available, hundreds of concurrent queries
System Design
[Figure: Presto Architecture. Labeled components: Query and Results; Coordinator (Queue, Planner/Optimizer, Scheduler, Metadata API, Data Location API); Workers (Processor, Data Source API); External Storage System.]
Predicate Pushdown
• Engine provides connectors with a two part constraint:
1. Domain of values: ranges and nullability
2. “Black box” predicate for filtering
• Connectors report the domain they can guarantee
• Engine can elide redundant filtering
• Optimizer can make further use of this information
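The two-part constraint can be sketched as follows. This is a minimal Python illustration, not Presto's connector interface: `Domain`, `Constraint`, `PartitionedConnector`, and `engine_scan` are hypothetical names, and the connector is assumed to prune whole partitions on a `day` column and nothing else.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[int]]

@dataclass(frozen=True)
class Domain:
    """Part 1 of the constraint: a closed value range plus nullability."""
    low: int
    high: int
    null_allowed: bool = False

    def contains(self, value: Optional[int]) -> bool:
        if value is None:
            return self.null_allowed
        return self.low <= value <= self.high

@dataclass
class Constraint:
    domains: Dict[str, Domain]        # part 1: per-column domains
    predicate: Callable[[Row], bool]  # part 2: opaque "black box" filter

class PartitionedConnector:
    """Hypothetical connector whose data is partitioned by `day`."""
    def __init__(self, partitions: Dict[int, List[Row]]):
        self.partitions = partitions

    def scan(self, constraint: Constraint):
        day = constraint.domains.get("day")
        rows = [r for d, part in self.partitions.items()
                if day is None or day.contains(d) for r in part]
        # Report only the domain this connector can guarantee.
        enforced = {"day": day} if day is not None else {}
        return rows, enforced

def engine_scan(connector: PartitionedConnector, constraint: Constraint) -> List[Row]:
    rows, enforced = connector.scan(constraint)
    # Elide filtering on columns whose domain the connector already guaranteed.
    remaining = {c: d for c, d in constraint.domains.items() if enforced.get(c) != d}
    # Assumption of this sketch: the engine always applies the black-box predicate.
    return [r for r in rows
            if all(d.contains(r[c]) for c, d in remaining.items())
            and constraint.predicate(r)]
```

Here the engine re-checks only the columns the connector did not guarantee, which is the redundant-filter elision described above.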
Data Layouts
• Optimizer takes advantage of physical layout of data
• Properties: partitioning, sorting, grouping, indexes
• Tables can have multiple layouts with different properties
• Layouts can have a subset of columns or data
• Optimizer chooses best layout for query
• Tune queries by adding new physical layouts
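As a rough illustration of layout selection, the sketch below picks among several layouts of one table. The `Layout` class and the scoring rule are assumptions made for this example; the slides do not describe how the optimizer actually ranks layouts.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Set

@dataclass(frozen=True)
class Layout:
    """Hypothetical description of one physical layout of a table."""
    name: str
    columns: FrozenSet[str]               # a layout may hold a subset of columns
    partitioned_by: Optional[str] = None
    sorted_by: Optional[str] = None

def choose_layout(layouts: Iterable[Layout], needed_columns: Set[str],
                  join_key: Optional[str] = None,
                  order_key: Optional[str] = None) -> Optional[Layout]:
    """Pick a layout that covers the query and matches the most desired properties."""
    candidates = [l for l in layouts if needed_columns <= l.columns]
    if not candidates:
        return None

    def score(layout: Layout) -> int:
        # Toy scoring: one point per property the query can exploit.
        partition_match = join_key is not None and layout.partitioned_by == join_key
        sort_match = order_key is not None and layout.sorted_by == order_key
        return int(partition_match) + int(sort_match)

    return max(candidates, key=score)
```

Adding a new `Layout` entry changes which plan is chosen without touching the query, which is the tuning approach the last bullet refers to.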
[Figure: two query plans. "Original plan without any data layout properties": Stages 0-4 with operators Output, AggregateFinal, AggregatePartial, LeftJoin, LocalShuffle, Hash, Filter, Scan, connected by one collecting-shuffle and three partitioned-shuffles. "Optimized plan using data layout properties": Stages 0-1 with operators Output, Aggregate, LeftJoin, LocalShuffle, Hash, Filter, Scan, connected by one collecting-shuffle.]
Pre-computing Hashes
• Computing hashes can be expensive
• Especially for strings or complex types
• Push computation to the lowest level of the plan tree
• Re-use for aggregations, joins, local or remote shuffles
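A small sketch of the idea: the hash is computed once at the scan and carried with each row, then reused by a shuffle and by an aggregation. Function names are hypothetical and `zlib.crc32` stands in for whatever hash function the engine uses. Python caches string hashes internally, so this only illustrates the plan-level reuse, not a real speedup.

```python
import zlib
from collections import defaultdict

def scan_with_hash(rows, key):
    """Lowest level of the plan: compute the key's hash once per row."""
    for row in rows:
        yield row, zlib.crc32(row[key].encode("utf-8"))

def partition(hashed_rows, num_partitions):
    """Shuffle step: picks a partition from the precomputed hash."""
    parts = defaultdict(list)
    for row, h in hashed_rows:
        parts[h % num_partitions].append((row, h))
    return parts

def count_by_key(hashed_rows, key):
    """Aggregation step: buckets by the precomputed hash, compares keys on collision."""
    buckets = defaultdict(list)  # hash -> list of [key value, count]
    for row, h in hashed_rows:
        for entry in buckets[h]:
            if entry[0] == row[key]:
                entry[1] += 1
                break
        else:
            buckets[h].append([row[key], 1])
    return {k: n for entries in buckets.values() for k, n in entries}
```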
Intra-node Parallelism
• Use multiple threads on a single node
• More efficient than parallelism across nodes
• Little latency overhead
• Efficiently share state (e.g., hash tables) between threads
• Needed due to skew or table transforms
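The state-sharing point can be sketched with a hash join whose build-side table is created once and then probed by several threads. This is an illustrative Python sketch with hypothetical function names; Python threads do not give real CPU parallelism for this workload, so it shows the structure only.

```python
from concurrent.futures import ThreadPoolExecutor

def build_hash_table(build_rows, key):
    """Build side: one hash table for the whole node."""
    table = {}
    for row in build_rows:
        table.setdefault(row[key], []).append(row)
    return table

def probe_chunk(table, probe_rows, key):
    """Probe side: each thread reads the shared table without copying it."""
    return [(p, b) for p in probe_rows for b in table.get(p[key], [])]

def parallel_join(build_rows, probe_rows, key, threads=4):
    table = build_hash_table(build_rows, key)  # built once, read-only afterwards
    chunks = [probe_rows[i::threads] for i in range(threads)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(lambda chunk: probe_chunk(table, chunk, key), chunks))
    return [pair for part in results for pair in part]
```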
[Figure: Physical Execution Plan. Labels: Stage 0 and Stage 1; Task 0, Task 1, Task 2, Task 3..n; Pipeline 0, Pipeline 1, Pipeline 2; operators LookupJoin, HashBuild, LocalShuffle, HashAggregate, Scan, Filter, Hash. Pipeline 1 is parallelized across multiple threads.]
Stage Scheduling
• Two scheduling policies:
1. All-at-once: minimize latency
2. Phased: minimize resource usage
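One way to picture the difference between the two policies is as the order in which stages are started. The sketch below is a simplification under stated assumptions: `stage_deps` is a hypothetical map from each stage to the stages that should start before it, and the phased policy is modeled as plain topological waves. It uses `graphlib`, which requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

def all_at_once(stage_deps):
    """Start every stage in a single wave to minimize latency."""
    return [sorted(stage_deps)]

def phased(stage_deps):
    """Start stages wave by wave in dependency order to limit resource usage."""
    sorter = TopologicalSorter(stage_deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

Every stage must appear as a key in `stage_deps`, with an empty set if it has no prerequisites.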
Split Scheduling
• Splits are enumerated as the query executes, not up front
• For Hive, this covers both reading partition metadata and discovering files
• Start executing immediately
• Queries often finish early (LIMIT or interactive)
• Reduces metadata memory usage on coordinator
• Splits are assigned to worker with shortest queue
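Lazy enumeration and shortest-queue assignment can be sketched with a generator. The names below are hypothetical, and `max_splits` stands in for a query that finishes early (for example because of a LIMIT).

```python
from itertools import islice

def enumerate_splits(partitions):
    """Lazily yield (partition, file) splits; nothing is listed until requested."""
    for partition, files in partitions.items():
        for f in files:
            yield (partition, f)

def assign_splits(splits, queues, max_splits=None):
    """Give each split to the worker with the shortest queue.

    `queues` maps worker name -> list of pending splits. Enumeration stops
    as soon as `max_splits` splits have been assigned.
    """
    for split in islice(splits, max_splits):
        worker = min(queues, key=lambda w: len(queues[w]))
        queues[worker].append(split)
    return queues
```

Because the splits come from a generator, stopping early means the remaining partitions and files are never enumerated or held in memory.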
Operating on Compressed Data
• Process dictionaries directly instead of values
• Shared dictionaries can have more entries than the block has rows
• Processing the whole dictionary is then speculative; use heuristics to determine if speculation is working
• Hash table creation takes advantage of dictionaries
• Joins can produce dictionary encoded data
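Processing the dictionary instead of the values can be sketched as below. The function names are hypothetical; the point is that the function runs once per dictionary entry and the index array is reused unchanged.

```python
def project_dictionary_block(dictionary, indices, fn):
    """Apply fn once per dictionary entry; the indices are passed through untouched."""
    return [fn(value) for value in dictionary], indices

def decode(dictionary, indices):
    """Expand a dictionary-encoded column back into plain values."""
    return [dictionary[i] for i in indices]

# Example: lower-casing a dictionary-encoded string column.
dictionary = ["IN PERSON", "COD", "RETURN", "NONE"]
indices = [1, 0, 1, 2, 0, 2, 2, 1]
lowered, same_indices = project_dictionary_block(dictionary, indices, str.lower)
```

In this example `str.lower` runs 4 times (once per entry) rather than 8 times (once per row). If the dictionary had more entries than rows referencing it, the same approach would do extra work, which is where the speculation heuristics come in.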
Page Layout in Memory
Page 0
• partkey (LongBlock): 52470, 50600, 18866, 72387, 7429, 44077, 148102, 101228
• returnflag (RLEBlock): "F" x 8
• shipinstruct (DictionaryBlock)
• Dictionary: 0: "IN PERSON", 1: "COD", 2: "RETURN", 3: "NONE"
• Indices: 1, 0, 1, 2, 0, 2, 2, 1
Page 1
• partkey (LongBlock): 164648, 35173, 139350, 40227, 87261, 184817, 153099
• returnflag (RLEBlock): "O" x 7
• shipinstruct (DictionaryBlock)
• Indices (into the Dictionary): 2, 2, 2, 0, 1, 3, 2
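The three block encodings can be modeled with a few small classes. This is an illustrative Python sketch that borrows the block names from the figure; the classes and the `row` helper are not Presto's actual implementation. The sample data is Page 0 from the figure.

```python
from dataclasses import dataclass
from typing import Any, List

@dataclass
class LongBlock:
    """Plain array of integer values."""
    values: List[int]
    def get(self, i: int) -> int:
        return self.values[i]

@dataclass
class RLEBlock:
    """Run-length encoding: one value repeated `count` times."""
    value: Any
    count: int
    def get(self, i: int) -> Any:
        if not 0 <= i < self.count:
            raise IndexError(i)
        return self.value

@dataclass
class DictionaryBlock:
    """Indices into a dictionary of distinct values."""
    dictionary: List[Any]
    indices: List[int]
    def get(self, i: int) -> Any:
        return self.dictionary[self.indices[i]]

def row(page, i):
    """Materialize row i of a page given as {column name: block}."""
    return {name: block.get(i) for name, block in page.items()}

page0 = {
    "partkey": LongBlock([52470, 50600, 18866, 72387, 7429, 44077, 148102, 101228]),
    "returnflag": RLEBlock("F", 8),
    "shipinstruct": DictionaryBlock(["IN PERSON", "COD", "RETURN", "NONE"],
                                    [1, 0, 1, 2, 0, 2, 2, 1]),
}
```

For example, `row(page0, 0)` resolves the first `shipinstruct` index, 1, to "COD".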
Writer Scaling
• Write performance dominated by concurrency
• Too few writers causes the query to be slow
• Too many writers creates small files
• Expensive to read later (metadata, IO, latency)
• Inefficient for storage system
• Add writers as needed when producer buffers are full, as long as data written exceeds a configured threshold
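The scaling rule can be sketched as a simple decision function. The names are hypothetical, and treating the threshold as a per-writer minimum is an assumption of this sketch; the slide only says that data written must exceed a configured threshold.

```python
def should_add_writer(buffers_full, bytes_written, writers, min_bytes_per_writer):
    """Add a writer only if producers are blocked and existing writers wrote enough."""
    # Assumption: the threshold scales with the current number of writers.
    return buffers_full and bytes_written >= writers * min_bytes_per_writer

def scale_writers(observations, min_bytes_per_writer, writers=1):
    """Replay (buffers_full, total_bytes_written) samples; return writer count after each."""
    history = []
    for buffers_full, bytes_written in observations:
        if should_add_writer(buffers_full, bytes_written, writers, min_bytes_per_writer):
            writers += 1
        history.append(writers)
    return history
```

The byte threshold is what keeps a small write from fanning out into many small files, even when the producer buffers are briefly full.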
Code Generation
• SQL → JVM bytecode → machine code
• Filter, project, sort comparators, aggregations
• Auto-vectorization, branch prediction, register use
• Eliminate virtual calls and allow inlining
• Profile each task independently based on data processed
• Avoid profile pollution across tasks and queries
• Profile can change during execution as data changes
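The contrast between interpreting an expression and generating code for it can be shown with a Python analogue. This is not how Presto emits JVM bytecode; it is a sketch in which expressions are hypothetical nested tuples `(op, left, right)`, strings are column references, and numbers are literals.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul, ">": operator.gt, "<": operator.lt}

def interpret(expr, row):
    """Naive evaluation: walk the expression tree again for every row."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return OPS[op](interpret(left, row), interpret(right, row))
    if isinstance(expr, str):
        return row[expr]  # column reference
    return expr           # numeric literal

def to_source(expr):
    """Translate the expression tree into Python source text."""
    if isinstance(expr, tuple):
        op, left, right = expr
        if op not in OPS:
            raise ValueError("unsupported operator: %r" % (op,))
        return "(%s %s %s)" % (to_source(left), op, to_source(right))
    if isinstance(expr, str):
        return "row[%r]" % expr
    return repr(expr)

def generate(expr):
    """Compile the expression once into a specialized function, then call it per row."""
    code = compile("lambda row: " + to_source(expr), "<generated>", "eval")
    return eval(code)

# Example: price + qty * 2 > 10
expr = (">", ("+", "price", ("*", "qty", 2)), 10)
predicate = generate(expr)
```

The generated function has no tree walk or operator lookup on the per-row path, which is the same kind of overhead the bytecode generator removes.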
[Figure: CPU Time Improvements for Bytecode Generation. Bar chart of average CPU time in seconds (axis 0-7000) comparing Generated and Naïve code for Baseline, 1 Transform, 2 Transforms, and 3 Transforms.]
Fault Tolerance
• Node crash causes query failure
• In practice, failures are rare, even on large clusters
• Checkpointing or other recovery mechanisms have a cost
• Re-run failures rather than making everything expensive
• Limit runtime to a few hours to reduce waste and latency
• Clients retry on failure
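Client-side retry can be as simple as the loop below. It is a generic sketch: `QueryFailed`, `run_with_retries`, and the backoff parameter are hypothetical and do not come from any Presto client library.

```python
import time

class QueryFailed(Exception):
    """Hypothetical error raised when a query fails, e.g. after a node crash."""

def run_with_retries(run_query, max_attempts=3, backoff_seconds=0.0):
    """Re-run the whole query on failure instead of recovering partial work."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query()
        except QueryFailed as err:
            last_error = err
            if attempt < max_attempts:
                time.sleep(backoff_seconds * attempt)
    raise last_error
```

Since a failed query is re-run from the start, capping runtime at a few hours bounds how much work a single retry can waste.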

  • 1. Confidential Use Only – Do Not Share David Phillips Software Engineer Facebook Presto: Fast SQL on Everything
  • 2. What is Presto? • Open source distributed SQL query engine • ANSI SQL compliant • Originally developed by Facebook • Used in production at many well-known companies
  • 5. Notable Characteristics • Adaptive multi-tenant system • Run hundreds of concurrent queries on thousands of nodes • Extensible, federated design • Plugins provide connectors, functions, types, security • Flexible design supports many different use cases • High performance • Many optimizations, code generation, long-lived JVM
  • 6. Use Cases at Facebook
  • 7. Interactive Analytics • Facebook has a massive multi-tenant data warehouse • Employees need to quickly analyze small data (~50GB-3TB) • Visualizations, dashboards, notebooks, BI tools • Clusters run 50-100 concurrent queries w/ diverse shapes • Queries usually execute in seconds or minutes • Users are latency sensitive • Fast improves productivity, slow blocks their work
  • 8. Batch ETL • Populate and process data in the warehouse • Jobs are scheduled using a workflow management system • Similar to Azkaban or Airflow • Manages dependencies between jobs • Queries are typically written by data engineers • More expensive in CPU and data volume than Interactive • Throughput and efficiency more important than latency
  • 9. A/B Testing • Evaluate product changes via statistical hypothesis testing • Results need to be available in hours (not days) • Data must be complete and accurate • Arbitrary slice and dice at interactive latency (~5-30s) • Cannot pre-aggregate data, must compute results on the fly • Producing results requires joining multiple large data sets • Web interface generates restricted query shapes
  • 10. App Analytics • External-user facing custom reporting tools • Facebook Analytics offers analytics to application developers • Web interface generates small set of query shapes • Highly selective queries over large aggregate data volumes • Application developers can only access their own data • Very strict latency requirements (~100ms-5s) • Highly available, hundreds of concurrent queries
  • 12. Presto Architecture [diagram; labels: Coordinator (Queue, Planner/Optimizer, Scheduler, Metadata API, Data Location API), Workers (Processor, Data Source API), External Storage System, Query, Results]
  • 13. Predicate Pushdown • Engine provides connectors with a two-part constraint: 1. Domain of values: ranges and nullability 2. “Black box” predicate for filtering • Connectors report the domain they can guarantee • Engine can elide redundant filtering • Optimizer can make further use of this information
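The exchange between engine and connector described on this slide can be sketched as follows. This is a minimal illustration, not Presto's connector API: `Domain`, `PartitionedConnector`, and `run_query` are hypothetical names, the domain is reduced to one closed integer range plus nullability, and the "black box" predicate is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    """Allowed values for one column: a closed range plus nullability."""
    low: int
    high: int
    null_allowed: bool = False

    def contains(self, value):
        if value is None:
            return self.null_allowed
        return self.low <= value <= self.high


class PartitionedConnector:
    """Hypothetical connector that can only prune whole partitions by 'day'."""

    def __init__(self, partitions):
        self.partitions = partitions  # dict: day -> list of row dicts

    def scan(self, constraint):
        """Return (rows, enforced). 'enforced' is the part of the constraint
        that this connector guarantees for every row it returns."""
        enforced = {c: d for c, d in constraint.items() if c == "day"}
        day_domain = enforced.get("day")
        rows = [
            row
            for day, part in self.partitions.items()
            if day_domain is None or day_domain.contains(day)
            for row in part
        ]
        return rows, enforced


def run_query(connector, constraint):
    """Engine side: re-check only the columns the connector did not guarantee."""
    rows, enforced = connector.scan(constraint)
    remaining = {c: d for c, d in constraint.items() if c not in enforced}
    kept = [r for r in rows if all(d.contains(r[c]) for c, d in remaining.items())]
    return kept, remaining
```

With a constraint on both `day` and `x`, the connector prunes partitions by `day` and the engine filters only on `x`, which is the redundant filtering the slide says can be elided.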
  • 14. Data Layouts • Optimizer takes advantage of physical layout of data • Properties: partitioning, sorting, grouping, indexes • Tables can have multiple layouts with different properties • Layouts can have a subset of columns or data • Optimizer chooses best layout for query • Tune queries by adding new physical layouts
  • 15. [Diagram: two query plans built from Scan, Filter, Hash, LeftJoin, LocalShuffle, Aggregate, and Output operators] • Original plan without any data layout properties: Stages 0-4, three partitioned shuffles and one collecting shuffle, AggregatePartial and AggregateFinal • Optimized plan using data layout properties: Stages 0-1, one collecting shuffle, a single Aggregate
  • 16. Pre-computing Hashes • Computing hashes can be expensive • Especially for strings or complex types • Push computation to the lowest level of the plan tree • Re-use for aggregations, joins, local or remote shuffles
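A small sketch of the idea on this slide: hash each key once at scan time and carry the hash with the row, so that the shuffle and the aggregation reuse it instead of rehashing. All names here (`stats`, `expensive_hash`, `scan_with_hash`, `partition`, `aggregate_counts`) are hypothetical, and `zlib.crc32` merely stands in for whatever hash function an engine would use.

```python
import zlib

# Counts how often the hash function actually runs (illustration only).
stats = {"hash_calls": 0}


def expensive_hash(key):
    stats["hash_calls"] += 1
    return zlib.crc32(key.encode("utf-8"))


def scan_with_hash(rows, key_column):
    """Lowest level of the plan: attach the key hash once per row."""
    for row in rows:
        yield row, expensive_hash(row[key_column])


def partition(hashed_rows, num_partitions):
    """Shuffle step: choose a partition from the precomputed hash."""
    parts = [[] for _ in range(num_partitions)]
    for row, h in hashed_rows:
        parts[h % num_partitions].append((row, h))
    return parts


def aggregate_counts(hashed_rows, key_column):
    """Group-by count that buckets on the precomputed hash, never rehashing."""
    buckets = {}
    for row, h in hashed_rows:
        chain = buckets.setdefault(h, [])
        for entry in chain:
            if entry[0] == row[key_column]:
                entry[1] += 1
                break
        else:
            chain.append([row[key_column], 1])
    return {key: count for chain in buckets.values() for key, count in chain}
```

Running a scan, a partition step, and an aggregation over the same rows calls `expensive_hash` once per row, regardless of how many downstream operators consume the hash.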
  • 17. Intra-node Parallelism • Use multiple threads on a single node • More efficient than parallelism across nodes • Little latency overhead • Efficiently share state (e.g., hash tables) between threads • Needed due to skew or table transforms
  • 18. Physical Execution Plan [diagram: Stage 0 (Task 0) and Stage 1 (Tasks 0, 1, 2, 3..n); Pipelines 0-2 built from Scan, Filter, Hash, HashBuild, LocalShuffle, LookupJoin, and HashAggregate operators] • Pipeline 1 is parallelized across multiple threads
  • 19. Stage Scheduling • Two scheduling policies: 1. All-at-once: minimize latency 2. Phased: minimize resource usage
  • 20. Split Scheduling • Splits are enumerated as the query executes, not up front • For Hive, both partition metadata and discovering files • Start executing immediately • Queries often finish early (LIMIT or interactive) • Reduces metadata memory usage on coordinator • Splits are assigned to worker with shortest queue
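The two behaviours on this slide, lazy split enumeration and assignment to the worker with the shortest queue, can be sketched with a generator. This is an illustration under simplifying assumptions: `enumerate_splits` and `assign_splits` are hypothetical names, a split is just a `(partition, file)` pair, and `limit` stands in for a query that finishes early.

```python
from itertools import islice


def enumerate_splits(partitions):
    """Lazily yield (partition, file) splits; nothing is listed up front."""
    for partition, files in partitions:
        for f in files:
            yield (partition, f)


def assign_splits(splits, queues, limit=None):
    """Send each split to the worker whose queue is currently shortest.

    'queues' maps a worker name to its list of pending splits. Stopping
    after 'limit' splits means later partitions are never enumerated.
    """
    for split in islice(splits, limit):
        worker = min(queues, key=lambda w: len(queues[w]))
        queues[worker].append(split)
    return queues
```

Because `enumerate_splits` is a generator, a partition's files are only listed when the scheduler asks for the next split, so a query that stops early never pays for the metadata of the partitions it did not reach.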
  • 21. Operating on Compressed Data • Process dictionaries directly instead of values • Shared dictionaries can be larger than rows • Use heuristics to determine if speculation is working • Hash table creation takes advantage of dictionaries • Joins can produce dictionary encoded data
  • 22. Page Layout in Memory • Page 0: partkey (LongBlock): 52470, 50600, 18866, 72387, 7429, 44077, 148102, 101228; returnflag (RLEBlock): "F" x 8; shipinstruct (DictionaryBlock) indices: 1 0 1 2 0 2 2 1 • Page 1: partkey (LongBlock): 164648, 35173, 139350, 40227, 87261, 184817, 153099; returnflag (RLEBlock): "O" x 7; shipinstruct (DictionaryBlock) indices: 2 2 2 0 1 3 2 • shipinstruct dictionary: 0: "IN PERSON", 1: "COD", 2: "RETURN", 3: "NONE"
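Using the `shipinstruct` values from the page layout slide, the following sketch shows what "process dictionaries directly instead of values" can look like. `DictionaryBlock` here is a simplified stand-in, not Presto's class, and the size check in `project` is only an assumed, simplistic version of the heuristic the slides mention for dictionaries that are larger than the rows referencing them.

```python
from dataclasses import dataclass


@dataclass
class DictionaryBlock:
    dictionary: list  # distinct values
    indices: list     # one dictionary position per row

    def values(self):
        """Decode to one value per row."""
        return [self.dictionary[i] for i in self.indices]


def project(block, fn):
    """Apply fn to a dictionary-encoded column.

    If the dictionary is no larger than the row count, evaluate fn once per
    dictionary entry and reuse the indices. Otherwise (assumed heuristic)
    evaluate fn per row, since most dictionary entries would be unused.
    """
    if len(block.dictionary) <= len(block.indices):
        return DictionaryBlock([fn(v) for v in block.dictionary], block.indices)
    per_row = [fn(v) for v in block.values()]
    return DictionaryBlock(per_row, list(range(len(per_row))))


page0 = DictionaryBlock(
    ["IN PERSON", "COD", "RETURN", "NONE"],
    [1, 0, 1, 2, 0, 2, 2, 1],
)
lengths = project(page0, len)  # len runs 4 times for 8 rows
```

The output is still dictionary encoded, so a downstream operator can apply the same trick again.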
  • 23. Writer Scaling • Write performance dominated by concurrency • Too few writers causes the query to be slow • Too many writers creates small files • Expensive to read later (metadata, IO, latency) • Inefficient for storage system • Add writers as needed when producer buffers are full, as long as data written exceeds a configured threshold
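The scaling rule in the last bullet can be written as a single decision function. This is a sketch, not Presto's scheduler: `next_writer_count` and its parameters are hypothetical, and scaling the threshold by the current number of writers is an assumption made here so that each existing writer has produced a reasonably sized file before another is added.

```python
def next_writer_count(writers, buffers_full, bytes_written,
                      min_bytes_per_writer, max_writers):
    """Return the writer count to use for the next scheduling round.

    A writer is added only when producers are blocked on full buffers AND
    enough data has been written to justify another output file.
    """
    enough_data = bytes_written >= writers * min_bytes_per_writer  # assumed rule
    if buffers_full and enough_data and writers < max_writers:
        return writers + 1
    return writers
```

Starting from one writer, a small insert never triggers scaling and ends up with a single file, while a large insert adds writers gradually as the data volume grows.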
  • 24. Code Generation • SQL → JVM bytecode → machine code • Filter, project, sort comparators, aggregations • Auto-vectorization, branch prediction, register use • Eliminate virtual calls and allow inlining • Profile each task independently based on data processed • Avoid profile pollution across tasks and queries • Profile can change during execution as data changes
  • 25. CPU Time Improvements for Bytecode Generation [bar chart: average CPU time in seconds (axis 0-7000) for Baseline, 1 Transform, 2 Transforms, and 3 Transforms; series: Generated vs. Naïve]
  • 26. Fault Tolerance • Node crash causes query failure • In practice, failures are rare, even on large clusters • Checkpointing or other recovery mechanisms have a cost • Re-run failures rather than making everything expensive • Limit runtime to a few hours to reduce waste and latency • Clients retry on failure
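Since the engine fails the whole query when a node crashes, recovery is left to the client. A minimal client-side retry loop might look like the sketch below; `QueryFailed`, `run_with_retry`, and `submit` are hypothetical names rather than part of any Presto client library.

```python
class QueryFailed(Exception):
    """Raised by the (hypothetical) client when a query fails."""


def run_with_retry(submit, max_attempts=3):
    """Re-run the whole query on failure.

    'submit' is any zero-argument callable that runs the query and returns
    its result. Returns (result, attempts_used); re-raises the last error
    once max_attempts is exhausted.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return submit(), attempt
        except QueryFailed as error:
            last_error = error
    raise last_error
```

This only makes sense for queries that are safe to re-run from scratch, which matches the slide's point that runtimes are capped at a few hours to limit wasted work.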