
Intro To Presto


Introduction to Open-Source Presto

Ali LeClerc
Open Source, IBM | Chair, Presto Foundation Outreach

Yi-hong Wang
Software Engineer, IBM
Today’s Agenda

What’s Presto?
Basic architecture & use cases
Presto at IBM
The community
Demo!
Challenges for today’s data engineer

Different engines for different workloads → re-platforming down the road
Managing multiple query languages and interfaces for siloed systems
Data infrastructure costs


Presto Overview
Presto is an open-source SQL query engine
that’s fast, reliable, and efficient at scale

● Federate queries and query data where it lives - data lakes, lakehouses, and more
● Blazing fast analytics
● Standardize your SQL with one engine
● Open source
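
To make the federation bullet concrete: Presto addresses every table as catalog.schema.table, so one SQL statement can join data that lives in different systems. The sketch below only builds such a statement as a string. The catalog names (`hive`, `mysql`), the schemas, tables, and columns are hypothetical placeholders, not anything from this deck, and the helper names are made up for illustration.

```python
def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified Presto table name: catalog.schema.table."""
    parts = (catalog, schema, table)
    for part in parts:
        # Keep the sketch simple: accept plain identifiers only.
        if not part.replace("_", "").isalnum():
            raise ValueError(f"unsupported identifier: {part!r}")
    return ".".join(parts)


def federated_join_sql() -> str:
    """Build one SQL statement that joins tables from two catalogs."""
    # Hypothetical data-lake table exposed through a catalog named "hive".
    orders = qualified_name("hive", "sales", "orders")
    # Hypothetical operational-database table exposed through a catalog named "mysql".
    customers = qualified_name("mysql", "crm", "customers")
    return (
        "SELECT c.region, count(*) AS order_count\n"
        f"FROM {orders} AS o\n"
        f"JOIN {customers} AS c ON o.customer_id = c.id\n"
        "GROUP BY c.region\n"
        "ORDER BY order_count DESC"
    )
```

Calling `print(federated_join_sql())` shows the statement. The point is that neither table has to be copied anywhere first; each catalog maps to a connector that reads the data where it already lives.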
Presto for Data Analytics
Presto for the Data Lakehouse

Data warehouse (SQL) functionality that’s:
● Open
● Flexible
● Better price performance

[Diagram: Open Lakehouse. Labels: Reporting and Dashboarding; Data Science, ML, & AI; SQL Query Processing; ML and AI Frameworks; Governance, Discovery, Quality & Security; Data Lake; Open Formats; Storage]
Why Presto?

One Interface

One Language

Fast, Reliable, Efficient


Key Features

High performance and low latency


Interactivity
Robust batch support
Highly scalable
Reliable
High Level Presto Architecture
Presto Use Cases

Dashboarding
Business intelligence
Interactive exploration
Data-driven apps


Powered by

Digital advertising platform. Over 2000 daily reports and 100s of pipelines on a 7 PB data lake with over 400 billion records.

Ride-hailing, micromobility rentals, and food delivery in Europe and Africa. Up to 100k daily queries (over 1.5M queries per month) with over 2000 active internal users on a 2 PB data lake.

Social media. 30K queries per day with 1000 daily active users on a 300 PB data lake.

Ride-hailing, food delivery. Over 100M queries per day with 7000 weekly active users on a 50 PB data lake.

Internet technology. Over 2M queries per day for business intelligence and ad hoc use cases.

Communications API technology. Over 2700 active internal users running 1M queries scanning 40 PB of data per month.
Presto at IBM
IBM® watsonx.data™

An open, hybrid, and governed fit-for-purpose data store optimized to scale all data, analytics and AI workloads
Overview of the key components of IBM watsonx.data: multiple query engines, open table formats and built-in enterprise governance

Access 100% of your data across databases and data lakes

Your existing ecosystem: Data warehouse, Data lake, Ecosystem infrastructure

Core watsonx.data functionality

Query engines
● Multiple engines such as Presto and Spark that provide fast, reliable, and efficient processing of big data at scale
● Optimize workload costs and performance using multi-engine functionality

Governance and Metadata
● Metadata store, Access control management
● Ensure governance and reduce time to insight with centralized metadata and access management

Data format
● Vendor agnostic open formats for analytic data sets, allowing different engines to access and share the same data, at the same time
● Access all your data across databases and data lakes

Storage
● Reduce storage costs and facilitate data ingest

Infrastructure
● Deploy on any infrastructure and optimize available resources
Presto Community
Presto Foundation: Community-driven and open

Governing Board: Project Governance
Technical Steering Committee: Project Roadmap
Outreach Committee: Project Community
The #1 community working on a deeply integrated query

80% growth in unique committers
285 new PR submitters over the last year
1K+ contributors since the beginning
1K+ GitHub stars over the past year
Largest increase in contributions this year


Demo
What we’ll do

Spin Presto up in Docker
Add data sources & catalog
Run a query
Build a dashboard
Tada!
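
As a rough illustration of the "Run a query" step, the sketch below talks to Presto's HTTP client protocol using only the Python standard library: it POSTs the SQL text to `/v1/statement`, then follows each `nextUri` in the responses and collects the `data` rows until no `nextUri` remains. It is a minimal sketch, not the demo's actual code. The server address `http://localhost:8080`, the user name `demo`, and the `transport` parameter (added here so the loop can be exercised without a running server) are all assumptions.

```python
import json
import urllib.request
from typing import Callable, Optional

# A transport takes (method, url, body, headers) and returns the decoded JSON reply.
Transport = Callable[[str, str, Optional[bytes], dict], dict]


def http_transport(method: str, url: str, body: Optional[bytes], headers: dict) -> dict:
    """Send one HTTP request and decode the JSON response."""
    request = urllib.request.Request(url, data=body, headers=headers, method=method)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def run_query(
    sql: str,
    server: str = "http://localhost:8080",  # assumed address of the coordinator
    user: str = "demo",                     # assumed user name
    transport: Transport = http_transport,
) -> list:
    """Submit a statement and gather all result rows."""
    headers = {"X-Presto-User": user}
    # Submit the SQL text; the reply points at the next page of results.
    reply = transport("POST", f"{server}/v1/statement", sql.encode("utf-8"), headers)
    rows = []
    while True:
        if "error" in reply:
            raise RuntimeError(reply["error"].get("message", "query failed"))
        # Pages may or may not carry rows; early pages often carry none.
        rows.extend(reply.get("data", []))
        next_uri = reply.get("nextUri")
        if next_uri is None:
            return rows  # no further pages: the query has finished
        reply = transport("GET", next_uri, None, headers)
```

With a coordinator running locally and a catalog configured, something like `run_query("SELECT * FROM some_catalog.some_schema.some_table LIMIT 5")` would return the rows as lists; the table name here is a placeholder. A dashboarding tool connects in much the same way, just through a packaged driver rather than hand-written HTTP calls.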
What’s next?
Join the Slack channel: prestodb.slack.com
Contribute to the project: github.com/prestodb
Join the virtual meetup group: meetup.com/prestodb
Join a working group: prestodb.io/community/presto-working-groups/
Next webinar in series: Diving into Presto C++, the next-generation Presto engine

Presto OS events: prestodb.io/events
Questions?
