The Modern ELT Stack To Win With Cloud Data Warehousing
Today’s Speakers
Agenda
• Rise of Cloud Data Warehousing
• Modern ELT Stack Overview
• ELT in the Wild!
• Demo - ELT for Marketing Analytics
• Q&A
Traditional Analytics Process with ETL
• Rigid - hard to adapt to changing requirements/data
• Siloed - typically IT-led tools...not exposed broadly
[Diagram: CRM, DATABASE, FILES -> EXTRACT, TRANSFORM & LOAD -> DATA WAREHOUSE -> DATA MARTS -> REPORTING, ANALYTICS]
Why ELT?
• Flexible - can transform data on-the-fly to meet requirements
• Collaborative - fosters collaboration between data engineers & analysts
• No-code/low-code - empowers a variety of users to do this work
[Diagram: ERP, CRM, DATABASE, FILES -> DATA INGESTION (E + L) -> DATA WAREHOUSE, DATA LAKE -> DATA PREPARATION (T) -> REPORTING, ANALYTICS, AI/ML. Labels beneath the flow: IT-led, Business-led]
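The E + L then T split described above can be sketched in a few lines. This is a minimal illustration only, not the stack shown in the webinar: it uses Python's built-in SQLite as a stand-in for a cloud data warehouse, and the table names (`raw_orders`, `customer_revenue`), the sample rows, and the cleaning rules are all hypothetical.

```python
import sqlite3

# Hypothetical raw CRM export. Values are deliberately messy and are
# loaded as-is; no cleaning happens before the load step.
raw_rows = [
    ("  Acme ", "2021-01-05", "100.50"),
    ("acme", "2021-01-06", "20"),
    ("Globex", "2021-01-06", "n/a"),
]


def extract_and_load(conn, rows):
    """E + L: land the source rows unchanged in a raw table."""
    conn.execute(
        "CREATE TABLE raw_orders (customer TEXT, order_date TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)


def transform(conn):
    """T: clean and aggregate inside the warehouse, using SQL."""
    conn.execute(
        """
        CREATE TABLE customer_revenue AS
        SELECT lower(trim(customer)) AS customer,
               SUM(CAST(amount AS REAL)) AS revenue
        FROM raw_orders
        WHERE amount GLOB '[0-9]*'      -- skip rows whose amount is not numeric
        GROUP BY lower(trim(customer))
        """
    )


conn = sqlite3.connect(":memory:")  # stand-in for the warehouse
extract_and_load(conn, raw_rows)
transform(conn)
result = conn.execute(
    "SELECT customer, revenue FROM customer_revenue ORDER BY customer"
).fetchall()
print(result)
```

Because the raw table is kept untouched, the transform step can be rewritten and rerun when requirements change, without going back to the source systems.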
The Modern ELT Stack
• Data Movement - automate data integration from source to destination
• Data Preparation - explore, clean & blend data for use in analytics
• Amazon Redshift - centralized data warehouse for reporting & analytics
[Diagram labels: Data Movement, Amazon S3, Amazon Redshift, Data Preparation, Automation, Refresh]
ELT at Autodesk
[Diagram labels: Internal enterprise sources, Amazon S3, Amazon EMR, Amazon Redshift, Transformation, Automation]
● Lots of dirty data that requires constant cleansing before it can be used or reported on
● A team of analysts that need to be able to ask a lot of questions of the data very quickly
● Demanding client base that needs to be able to understand and communicate results fast
● Often tackling problems we have never run into before, and where there is no playbook to refer back to
[Diagram labels: Tableau, Cloud DW]
● Fundamentally changed the way Callahan does business, in a competitively advantageous way
○ Has kept time spent on setting up and managing data pipelines to less than 30% of overall time spent on projects
● Brought extreme value to our clients in terms of improved business results, and cost efficiencies
○ Media Result: 90% improvement in media impact, on a 50% reduction in media budget
○ Sales Result: 5% sales improvement during peak periods, with the ability to predict inventory out-of-stocks 3 weeks in advance
Demo
Amazon Redshift
THE MOST WIDELY USED CLOUD DATA WAREHOUSE, WITH TENS OF THOUSANDS OF CUSTOMERS
• Take a lake house approach by analyzing all your data across your data warehouse, your Amazon S3 data lake, and operational databases with consistent security and governance policies
• Get up to 3x better price performance than other cloud data warehouses with a self-tuning system, boost queries up to 10x with AQUA, and achieve <1s latency with materialized views
• Start small and pay only for what you use with predictable monthly costs; Amazon Redshift is 50% less expensive than other cloud data warehouses
Tens of thousands of customers process exabytes of data with Amazon Redshift daily
• NTT DOCOMO - moved >10 PB of data from on-premises to cloud
• FOX Corp. - taking a lake house approach with RA3 nodes and Amazon S3
• Yelp - enabling a data-driven organization with concurrency scaling
• Jack in the Box - improved ops by moving off of on-premises DW
• Warner Bros. Games - performance, scale, cost-effective
Amazon Redshift innovates to meet your needs
Performance & scale: fast and self-tuning (features marked NEW! or UPDATED! on the slide)
• Concurrency scaling
• RA3 nodes & managed storage
• AQUA
• Performance tuning: automated
• Materialized views with auto refresh & rewrite
• 100K tables
• HyperLogLog
[Diagram: SQL access to Amazon Redshift, with Federated Query, Redshift ML, and Materialized Views]
• Operational databases - query live data, maintain materialized views
• Amazon Redshift Spectrum query and S3 data lake export - analyze open standards-based data formats
• ML & analytics services
• RA3 compute nodes - managed storage, large high-speed cache, high-bandwidth networking
“When we tested ATO in our development environment the performance of our queries was 25% faster than our
production workload not using ATO, without requiring any additional effort by our administrators.”
Nishesh Aggarwal, Enterprise Architecture Manager, ZS Associates
Data sharing
A SECURE AND EASY WAY TO SHARE DATA ACROSS AMAZON REDSHIFT CLUSTERS
• Simplify and accelerate iterative and predictable workloads, such as ETL, BI/dashboarding queries
• MVs can be based on one or more Amazon Redshift tables or external tables (Spectrum, Federated)

sales
item  store  cust  price
i1    s1     c1    12.00
i2    s2     c1     3.00
i3    s2     c2     7.00

store_info
store  owner  loc
s1     Joe    SF
s2     Ann    NY
s3     Lisa   SF
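To make the materialized view idea concrete, here is a small sketch built on the sales and store_info rows from the slide. It is an emulation under stated assumptions: SQLite has no materialized views, so the precomputed result is stored with CREATE TABLE ... AS and rebuilt by a hypothetical helper, `refresh_loc_sales`. In Amazon Redshift the equivalent statements would be CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW. The view name `loc_sales` and the choice of query (total sales per location) are illustrative, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (item TEXT, store TEXT, cust TEXT, price REAL);
    CREATE TABLE store_info (store TEXT, owner TEXT, loc TEXT);
    INSERT INTO sales VALUES
        ('i1', 's1', 'c1', 12.00),
        ('i2', 's2', 'c1', 3.00),
        ('i3', 's2', 'c2', 7.00);
    INSERT INTO store_info VALUES
        ('s1', 'Joe', 'SF'),
        ('s2', 'Ann', 'NY'),
        ('s3', 'Lisa', 'SF');
    """
)

# The query whose result is precomputed: total sales per store location.
MV_QUERY = """
    SELECT si.loc AS loc, SUM(s.price) AS total_sales
    FROM sales s
    JOIN store_info si ON s.store = si.store
    GROUP BY si.loc
"""


def refresh_loc_sales(conn):
    """Rebuild the stored result (emulates a full materialized view refresh)."""
    conn.execute("DROP TABLE IF EXISTS loc_sales")
    conn.execute("CREATE TABLE loc_sales AS " + MV_QUERY)


refresh_loc_sales(conn)
totals = dict(conn.execute("SELECT loc, total_sales FROM loc_sales").fetchall())
print(totals)
```

Dashboards would read from `loc_sales` rather than rerunning the join. In this emulation, new rows in sales only show up after `refresh_loc_sales` is called again.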
Start Free:
https://www.trifacta.com/start-wrangling/