What's New in SQL-on-Hadoop and Beyond
Martin Traverso, Facebook
Kamil Bajda-Pawlikowski, Teradata
Agenda
● Introduction
● Presto at Facebook
● Presto users and use cases
● New features
● Roadmap
Introduction
What is Presto
● Open source distributed SQL engine
● ANSI SQL syntax
● Custom built for interactive analytic queries
● Queries data across multiple data stores
● Flexible deployment (on premise or cloud)
● Extensible
Presto at Facebook
Presto @ Facebook
● Ad-hoc/interactive queries for Hadoop warehouse
● Batch processing for Hadoop warehouse
● Analytics for user-facing products
● Analytics over various specialized stores
Hadoop Warehouse - Stats
● 1000s of internal daily active users
● Millions of queries each month
● Scan PBs of data every day
● Process trillions of rows every day
● 10s of concurrent queries
Hadoop Warehouse - Batch
Presto for User-facing Products
● Requirements
○ Hundreds of ms to seconds latency, low variability
○ Availability
○ Update semantics
○ 10 - 15 way joins
● Stats
○ > 99.99% query success rate
○ 100% system availability
○ 25 - 200 concurrent queries
○ 1 - 20 queries per second
○ <100ms - 5s latency
Presto with Raptor
● Large data sets (petabytes)
● Milliseconds to seconds latency
● Predictable performance
● 5-15 minute load latency
● Reliable data loads (no duplicates, no missing data)
● High availability
● 10s of concurrent queries
Presto users and use cases
Presto users
See more at https://github.com/prestodb/presto/wiki/Presto-Users
Netflix stats
Interactive, reporting, and app-driven queries
Data warehouse: 40PB in S3
~250 nodes across multiple clusters
~650 users with ~6K+ queries/day
Twitter stats
Ad-hoc and low-latency queries
~200 nodes dedicated to Presto
Parquet with nested data structures
Uber stats
2 clusters
100+ machines
2000+ queries per day
HDFS on premise
FINRA stats
120+ EC2 nodes (r3.4xlarge)
2+ PBs of data on S3 (bzip2 & orc)
200+ users
Distro supported by Teradata
New features
SQL features
● DDL syntax
CREATE / ALTER / DROP TABLE
● DML syntax
INSERT / DELETE
● SQL features:
Data types: DECIMAL, VARCHAR(n), INT, SMALLINT, TINYINT
CUBE, ROLLUP, GROUPING SETS
INTERSECT
Non-equi joins
Uncorrelated subqueries
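
To make the CUBE, ROLLUP, and GROUPING SETS bullet concrete, here is a minimal sketch of what those clauses compute, written in plain Python rather than run against Presto. The function names (`rollup`, `cube`, `grouping_sets`) and the sample `region`/`product`/`sales` rows are hypothetical, chosen only for illustration. In ANSI SQL the equivalent query would look like `SELECT region, product, sum(sales) FROM t GROUP BY ROLLUP (region, product)`.

```python
from itertools import combinations


def rollup(columns):
    """Grouping sets produced by ROLLUP: every prefix of the column list."""
    return [tuple(columns[:i]) for i in range(len(columns), -1, -1)]


def cube(columns):
    """Grouping sets produced by CUBE: every subset of the column list."""
    return [c for r in range(len(columns), -1, -1)
            for c in combinations(columns, r)]


def grouping_sets(rows, columns, sets, measure):
    """Sum `measure` once per grouping set.

    Columns that are not part of the current grouping set are reported
    as None, mirroring the NULLs a SQL engine returns for them.
    """
    result = []
    for cols in sets:
        totals = {}
        for row in rows:
            key = tuple(row[c] if c in cols else None for c in columns)
            totals[key] = totals.get(key, 0) + row[measure]
        # Sort within each grouping set only to keep the output stable.
        result.extend(sorted(totals.items(), key=repr))
    return result


# Hypothetical sample data.
ROWS = [
    {"region": "EU", "product": "a", "sales": 1},
    {"region": "EU", "product": "b", "sales": 2},
    {"region": "US", "product": "a", "sales": 4},
]
COLUMNS = ("region", "product")

if __name__ == "__main__":
    for key, total in grouping_sets(ROWS, COLUMNS, rollup(COLUMNS), "sales"):
        print(key, total)
```

ROLLUP yields detail rows, per-region subtotals, and a grand total in one pass; CUBE adds the per-product subtotals as well. The sketch does not cover the edge case of an empty input, where SQL still returns a row for the empty grouping set.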
Other features
● Performance
Join and aggregation optimizations
● Connectors
Redis
MongoDB
● Kerberos
● Presto-Admin
● Ambari and YARN (via Apache Slider)
Drivers and BI tools
● Enterprise-grade ODBC & JDBC drivers
● BI tools certifications
Information Builders, Looker, MicroStrategy, MS Power BI, Qlik, Tableau, ZoomData
Roadmap
Short term
● LDAP
● SQL features
Data types: FLOAT, CHAR(n), VAR/BINARY(n)
EXISTS, EXCEPT
Correlated subqueries
Lambda expressions
Prepared statements
● Connectors
Accumulo (by Bloomberg)
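
As a rough illustration of three of the SQL constructs on this list (EXISTS, EXCEPT, and correlated subqueries), the sketch below runs them with Python's built-in `sqlite3` module. SQLite is only a stand-in here, not Presto, and the `users`/`orders` tables and the `run_examples` helper are hypothetical. The queries use standard SQL forms, but their exact behavior in a given Presto release is not verified by this example.

```python
import sqlite3


def run_examples():
    """Run EXISTS, EXCEPT, and a correlated subquery on toy tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER, name TEXT);
        CREATE TABLE orders (user_id INTEGER, amount INTEGER);
        INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
        INSERT INTO orders VALUES (1, 10), (1, 30), (2, 5);
    """)

    # EXISTS: users that have at least one order.
    with_orders = conn.execute("""
        SELECT name FROM users u
        WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id)
        ORDER BY name
    """).fetchall()

    # EXCEPT: user ids that never appear in orders.
    without_orders = conn.execute("""
        SELECT id FROM users
        EXCEPT
        SELECT user_id FROM orders
        ORDER BY 1
    """).fetchall()

    # Correlated subquery: the largest order of each user. The inner
    # query refers to o.user_id from the outer query.
    largest = conn.execute("""
        SELECT user_id, amount FROM orders o
        WHERE amount = (SELECT MAX(amount) FROM orders i
                        WHERE i.user_id = o.user_id)
        ORDER BY user_id
    """).fetchall()

    conn.close()
    return with_orders, without_orders, largest


if __name__ == "__main__":
    for rows in run_examples():
        print(rows)
```

The correlated form differs from the uncorrelated subqueries listed under the already shipped features in that the inner query must be evaluated relative to each outer row, which is why it needs separate planner support.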
Long term
● Materialized Query Tables
● Workload management
● Spill to disk
● Cost-based Optimizer
See more at https://github.com/prestodb/presto/wiki/Roadmap
More about Presto
GitHub: https://github.com/prestodb & https://github.com/Teradata/presto
Website: http://prestodb.io
Group: https://groups.google.com/group/presto-users
Distro: http://www.teradata.com/presto
