Intro To Presto

Introduction to Open-Source Presto

Ali LeClerc
Open Source, IBM | Chair, Presto Foundation Outreach

Yi-hong Wang
Software Engineer, IBM
Today’s Agenda

What’s Presto?
Basic architecture & use cases
Presto at IBM
The community
Demo!
Challenges for today’s data engineer

● Different engines for different workloads → re-platforming down the road
● Managing multiple query languages and interfaces for siloed systems
● Data infrastructure costs

Presto Overview
Presto is an open-source SQL query engine
that’s fast, reliable, and efficient at scale

● Federate queries and query data where it lives - data lakes, lakehouses, and more
● Blazing fast analytics
● Standardize your SQL with one engine
● Open source
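
To make the federation bullet concrete, here is a minimal sketch in Python that only builds the SQL text for a join across two catalogs. Presto addresses tables as catalog.schema.table, which is what lets one statement reach several systems. The catalog, schema, table, and column names used here (hive.sales.orders, mysql.crm.customers, customer_id) are hypothetical and do not come from the talk.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified Presto table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def federated_join_sql(left: str, right: str, key: str, limit: int = 10) -> str:
    """Build one SQL statement that joins tables living in two different catalogs."""
    return (
        f"SELECT l.{key}, count(*) AS n_rows\n"
        f"FROM {left} AS l\n"
        f"JOIN {right} AS r ON l.{key} = r.{key}\n"
        f"GROUP BY l.{key}\n"
        f"ORDER BY n_rows DESC\n"
        f"LIMIT {limit}"
    )


# Hypothetical names: a data lake table and an operational database table.
orders = qualified("hive", "sales", "orders")
customers = qualified("mysql", "crm", "customers")
sql = federated_join_sql(orders, customers, key="customer_id")
print(sql)
```

The resulting string could be submitted through any Presto client; the point is only that both sources appear in a single statement, in one SQL dialect.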
Presto for Data Analytics
Presto for the Data Lakehouse

Data warehouse (SQL) functionality that’s:
● Open
● Flexible
● Better price performance

[Diagram: Open Lakehouse. Workloads on top: Reporting and Dashboarding; Data Science, ML, & AI. Layers: SQL Query Processing; ML and AI Frameworks; Governance, Discovery, Quality & Security; Data Lake; Open Formats; Storage.]
Why Presto?

One Interface

One Language

Fast, Reliable, Efficient

Key Features

High performance and low latency

Interactivity
Robust batch support
Highly scalable
Reliable
High Level Presto Architecture
Presto Use Cases

● Dashboarding
● Business intelligence
● Interactive exploration
● Data-driven apps

Powered by
● Digital advertising platform. Over 2000 daily reports and 100s of pipelines on a 7 PB data lake with over 400 billion records.
● Ride-hailing, micromobility rentals, and food delivery in Europe and Africa. Up to 100k daily queries (over 1.5M queries per month) with over 2000 active internal users on a 2 PB data lake.
● Social media. 30K queries per day with 1000 daily active users on a 300 PB data lake.
● Ride-hailing, food delivery. Over 100M queries per day with 7000 weekly active users on a 50 PB data lake.
● Internet technology. Over 2M queries per day for business intelligence and ad hoc use cases.
● Communications API technology. Over 2700 active internal users running 1M queries scanning 40 PB of data per month.
Presto at IBM
IBM® watsonx.data™
an open, hybrid, and governed fit-for-purpose data store optimized to scale all data, analytics, and AI workloads
Overview of the key components of IBM watsonx.data: multiple query engines, open table formats, and built-in enterprise governance
Access 100% of your data across databases and data lakes

Your existing ecosystem: Data warehouse, Data lake, Ecosystem infrastructure

Core watsonx.data functionality
● Query engines: Multiple engines such as Presto and Spark that provide fast, reliable, and efficient processing of big data at scale. Optimize workload costs and performance using multi-engine functionality.
● Governance and Metadata: Metadata store and access control management. Ensure governance and reduce time to insight with centralized metadata and access management.
● Data format: Vendor-agnostic open formats for analytic data sets, allowing different engines to access and share the same data, at the same time. Access all your data across databases and data lakes.
● Storage: Reduce storage costs and facilitate data ingest.
● Infrastructure: Deploy on any infrastructure and optimize available resources.
Presto Community
Presto Foundation: Community-driven and open

● Governing Board: Project Governance
● Technical Steering Committee: Project Roadmap
● Outreach Committee: Project Community
The #1 community working on a deeply integrated query

● 80% growth in unique committers
● 285 new PR submitters over last year
● 1K+ contributors since the beginning
● 1K+ GitHub stars over the past year
● largest increase in contributions this year

Demo
What we’ll do

Spin Presto up in Docker

Add data sources & catalog
Run a query
Build a dashboard
Tada!
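
The first three demo steps can be sketched as follows. This is an outline under stated assumptions, not the presenters' actual demo: the prestodb/presto Docker image, the catalog path /opt/presto-server/etc/catalog inside it, the choice of the built-in tpch connector, and the sample query are all picked here for illustration. The script only writes a catalog file to a temporary directory and prints the command and query; it does not start Docker.

```python
import tempfile
from pathlib import Path


def render_catalog(properties: dict) -> str:
    """Render a catalog definition in Java .properties form (key=value lines)."""
    return "".join(f"{key}={value}\n" for key, value in properties.items())


def write_catalog(catalog_dir: Path, name: str, properties: dict) -> Path:
    """Write <name>.properties; the file name becomes the catalog name in SQL."""
    catalog_dir.mkdir(parents=True, exist_ok=True)
    path = catalog_dir / f"{name}.properties"
    path.write_text(render_catalog(properties))
    return path


# Add a data source: tpch is a connector that generates sample data on the fly.
catalog_dir = Path(tempfile.mkdtemp()) / "catalog"
tpch_file = write_catalog(catalog_dir, "tpch", {"connector.name": "tpch"})

# Assumed command to spin Presto up in Docker with that catalog directory mounted.
docker_cmd = (
    "docker run -d -p 8080:8080 "
    f"-v {catalog_dir}:/opt/presto-server/etc/catalog "
    "prestodb/presto:latest"
)

# A query to run once the server is up, for example from the Presto CLI.
query = "SELECT name FROM tpch.tiny.nation ORDER BY name LIMIT 5"

print(tpch_file.read_text(), docker_cmd, query, sep="\n")
```

A real data source would use a different connector.name plus that connector's connection settings; a dashboarding tool would then point at the same server on port 8080.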
What’s next?
● Join the Slack channel: prestodb.slack.com
● Contribute to the project: github.com/prestodb
● Join the virtual meetup group: meetup.com/prestodb
● Join a working group: prestodb.io/community/presto-working-groups/
Next webinar in series: Diving into Presto C++, the next-generation Presto engine

Presto OS events: prestodb.io/events
Questions?
