The Modern ELT Stack To Win With Cloud Data Warehousing

Today’s Speakers
• Will Davis - VP of Marketing
• TJ Holsman - Partner Engineer
• Vijay Balasubramaniam - Director, Partner Solutions Architect
• Greg Khairallah - Director of Analytics

Agenda
• Rise of Cloud Data Warehousing
• Modern ELT Stack Overview
• ELT in the Wild!
• Demo - ELT for Marketing Analytics
• Q&A

Traditional Analytics Process with ETL

[Diagram: ERP, CRM, database, and files → Extract, Transform & Load → Data Warehouse → Data Marts → Reporting & Analytics; IT-led on the source side, business-led on the reporting side]

Challenges with ETL
• Rigid - hard to adapt to changing requirements/data
• Siloed - typically IT-led tools, not exposed broadly
• Technical - not designed for data analysts & scientists

The Rise of the Cloud Data Warehouse

[Chart: Search Volume - Cloud Data Warehouse (CDW)]
[Chart: Search Volume - Amazon Redshift (red) compared to CDW (blue)]

“Traditionally, data was extracted, transformed, then loaded – ETL, in short – into a data warehouse. For ETL, complex transformation pipelines were built at the data source. However, cloud data warehouses have finally made it cost-effective to store all of a company’s data in a central location: we no longer need to transform data before we load it into a data warehouse. Transformation can be done when running analytics in a data warehouse.”
- Martin Casado, Andreessen Horowitz

Modern ELT Stack for Cloud DW

[Diagram: ERP, CRM, database, and files → Data Ingestion (E + L) → Data Warehouse / Data Lake → Data Preparation (T) → Data Warehouse → Reporting, Analytics, AI/ML; IT-led on the source side, business-led on the reporting side]

Why ELT?
• Flexible - can transform data on-the-fly to meet requirements
• Collaborative - fosters collaboration between data engineers & analysts
• No-code/low-code - empowers a variety of users to do this work
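
The load-first, transform-later pattern on this slide can be sketched in a few lines. This is a minimal illustration only, assuming Python's built-in sqlite3 module as a stand-in for a cloud data warehouse; the table names (raw_crm, crm_clean), the sample rows, and the run_elt helper are hypothetical and do not come from the presentation.

```python
import sqlite3

# Hypothetical raw CRM rows, as they might arrive from a source system.
RAW_ROWS = [
    ("1001", " Acme Corp ", "1200.50"),
    ("1002", "globex", "n/a"),
    ("1003", "Initech", "310"),
]


def run_elt(rows):
    """Load rows untouched, then transform them with SQL inside the database."""
    conn = sqlite3.connect(":memory:")  # stand-in for a cloud data warehouse

    # E + L: land the data as-is, with every column kept as text.
    conn.execute("CREATE TABLE raw_crm (id TEXT, account TEXT, revenue TEXT)")
    conn.executemany("INSERT INTO raw_crm VALUES (?, ?, ?)", rows)

    # T: clean and type the data after loading. raw_crm stays available,
    # so this step can be rewritten and rerun when requirements change.
    conn.execute(
        """
        CREATE TABLE crm_clean AS
        SELECT CAST(id AS INTEGER)   AS id,
               UPPER(TRIM(account))  AS account,
               CAST(revenue AS REAL) AS revenue
        FROM raw_crm
        WHERE revenue GLOB '[0-9]*'
        """
    )
    return conn.execute(
        "SELECT id, account, revenue FROM crm_clean ORDER BY id"
    ).fetchall()


clean_rows = run_elt(RAW_ROWS)
print(clean_rows)
```

Because the raw table is kept, a different cleaning rule (for example, keeping the "n/a" row with a NULL revenue) only requires rerunning the transform step, not re-extracting from the source.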
The Modern ELT Stack
• Data Integration - Automate data integration from source to destination
• Data Preparation - Explore, clean & blend data for use in analytics
• Data Warehouse - Centralized data warehouse for reporting & analytics

© 2020 Trifacta | Proprietary and Confidential

Analytics Workflow in the Cloud

[Diagram: Data Movement into Amazon S3 (raw data staging), then Data Preparation into Amazon Redshift (warehouse for analytics), with automation across the workflow]

ELT in the Wild
Autodesk and the Autodesk logo are registered trademarks or trademarks of Autodesk, Inc., and/or its subsidiaries and/or affiliates in the USA and/or other countries. All other brand names, product names, or trademarks belong to their respective holders.
Autodesk reserves the right to alter product and services offerings, and specifications and pricing at any time without notice, and is not responsible for typographical or graphical errors that may appear in this document.
© 2018 Autodesk. All rights reserved.
Data Challenges at Autodesk

[Diagram: Extract (Account & Product data, Financial Data, ID Mapping Table) → Manipulate (Excel) → Publish (PowerPoint or Excel), repeated on every refresh]

⮚ 3+ hours of work each iteration
⮚ No automation or live updates to data
⮚ Increased likelihood of inaccuracies in final output

ELT at Autodesk

[Diagram components: internal enterprise sources; external marketing and market research sources; data staging & storage (Amazon S3); transformation, automation (Amazon EMR, Amazon EC2); centralized data warehouse (Amazon Redshift); dashboarding tool]

ROI at Autodesk

Decreased time & effort in transformation process
⮚ 3+ hours → < 1 hour
⮚ No crashes; centralization of data

Speed of refresh & updates due to automation
⮚ Refresh: 3+ hours → minutes
⮚ Updates: Recreation of entire process → update 1 data source or pipeline

Decreased time & effort on first iteration

Deeper & Proactive Insights

Callahan
Transforming data into action that
inspires creativity and accelerates growth
Data Challenges at Callahan

● Lots of data coming from different sources and in different formats

● Lots of dirty data that requires constant cleansing before it can be used or reported on

● A team of analysts that need to be able to ask a lot of questions of the data very quickly

● Demanding client base that needs to be able to understand and communicate results fast

● Often tackling problems we have never run into before, and where there is no playbook to refer back to

● No database admins or data engineers on staff


ELT at Callahan

[Diagram components, grouped as on the slide]
▶ Data sources: PoS & eCommerce, Advertising & web data, CRM & other databases
▶ Ingestion: Custom APIs, Fivetran
▶ Storage: Cloud DW, Cloud Storage
▶ Tools: Trifacta, Tableau, SQL, Python, R
▶ Additional data: NOAA Weather, ESRI Location Science, US Census & Labor Data, etc.

ROI at Callahan

● Fundamentally changed the way Callahan does business, in a competitively advantageous way

○ Has kept time spent on setting up and managing data pipelines to less than 30% of overall time spent on projects, allowing analysts to spend more than 70% of their time on analysis

● Brought extreme value to our clients in terms of improved business results, and cost efficiencies

○ Media Result: 90% improvement in media impact, on a 50% reduction in media budget

○ CRM Result: 2x increase in customers, on a 60% reduction in leads purchased

○ Sales Result: 5% sales improvement during peak periods with ability to predict inventory out-of-stocks 3 weeks in advance

Demo
Amazon Redshift
THE MOST WIDELY USED CLOUD DATA WAREHOUSE, WITH TENS OF THOUSANDS OF CUSTOMERS

ANALYZE ALL YOUR DATA
Take a lake house approach by analyzing all your data across your data warehouse, your Amazon S3 data lake, and operational databases with consistent security and governance policies

PERFORMANCE AT ANY SCALE
Get up to 3x better price performance than other cloud data warehouses with a self-tuning system, boost queries up to 10x with AQUA, and achieve <1s latency with materialized views

LOWER YOUR COSTS
Start small and pay only for what you use with predictable monthly costs; Amazon Redshift is 50% less expensive than other cloud data warehouses

Tens of thousands of customers process exabytes of data with Amazon Redshift daily
• NTT DOCOMO - Moved >10 PB of data from on-premises to cloud
• FOX Corp. - Taking a lake house approach with RA3 nodes and Amazon S3
• Yelp - Enabling a data-driven organization with concurrency scaling
• Jack in the Box - Improved ops by moving off of on-premises DW
• Warner Bros. Games - Performance, scale, cost-effective

Amazon Redshift innovates to meet your needs
[Slide badges on individual features: NEW!, UPDATED!, COMING SOON!]

Analyze all your data - Lake house with AWS integration
• Amazon Redshift ML
• Data sharing
• SUPER data type with JSON support
• Federated Query
• Lambda UDF
• Partner console integration
• Materialized Views via AWS Glue Elastic Views
• Amazon Redshift Spectrum + Lake Formation
• Data Lake Export

Performance & scale - Fast and self-tuning
• Concurrency scaling
• RA3 nodes & managed storage
• AQUA
• Performance tuning: automated
• Materialized views with auto refresh & rewrite
• 100K tables
• HyperLogLog

Low cost & best value - Predictable costs
• Automatic workload manager
• Cross-AZ cluster recovery
• Data API
• Built-in security features
• On-demand and RIs
• Pause and resume
• Cost controls

Analyze all your data
WITH A LAKE HOUSE APPROACH TO ANALYTICS

[Diagram components]
• BI and analytics apps - Connect apps to analyze and visualize your data (SQL)
• Amazon Redshift - Federated Query, Redshift ML, Materialized Views, Spectrum query, S3 Data lake export
• Operational databases - Query live data, maintain materialized views
• ML & analytics services - Analyze open standards-based data formats
• Amazon S3 data lake - Keep up to exabytes of data in S3

RA3 nodes with managed storage
SCALE COMPUTE AND STORAGE INDEPENDENTLY

[Diagram: leader node; RA3 compute nodes with large high-speed cache; high-bandwidth networking; Amazon Redshift managed storage]

• Size of data warehouse only based on steady state compute needs
• Scale and pay independently for compute and storage
• Automatic, no changes to any workflows, no need to manage storage

Concurrency scaling
COMPUTE ELASTICITY AND SCALABILITY TO HANDLE UNPREDICTABLE USER DEMAND

• Scale out to multiple Amazon Redshift clusters from a single endpoint in seconds
• Support virtually unlimited concurrent users and queries while maintaining SLAs
• Per-second billing for additional clusters used
• Cost controls and free one-hour usage per day

Amazon Redshift automates performance tuning
ML-BASED OPTIMIZATIONS TO GET STARTED EASILY AND GET THE FASTEST PERFORMANCE QUICKLY

• Automates physical data design and optimization
• Optimizes for peak performance as data and workloads scale
• Leverages machine learning to adapt to shifting workloads

Automated performance tuning [slide badges on individual features: NEW, Updated]
• Automatic vacuum delete
• Automatic distribution keys
• Automatic sort keys
• Auto workload manager
• Automatic table sort
• MV auto-refresh and rewrite

“When we tested ATO in our development environment the performance of our queries was 25% faster than our production workload not using ATO, without requiring any additional effort by our administrators.”
Nishesh Aggarwal, Enterprise Architecture Manager, ZS Associates

Data sharing
A SECURE AND EASY WAY TO SHARE DATA ACROSS AMAZON REDSHIFT CLUSTERS

• Instant, granular, high-performance data access without data copies / movement
• Live and consistently updating views of data across all consumers
• Secure and governed collaboration within and across organizations and with external parties
• Workloads accessing shared data are isolated from each other
• Use cases: Cross-group collaboration and sharing, workload isolation and chargeability, data as a service

“Data sharing feature seamlessly allows multiple Amazon Redshift clusters to query data located in our RA3 clusters and their managed storage. This eliminates our concerns with delays in making data available for our teams, reduces the amount of data duplication and associated backfill headache. We now can concentrate even more of our time making use of our data in Amazon Redshift and enable better collaboration instead of data orchestration.”
Steven Moy, Yelp

Materialized views auto refresh and query rewrite
SPEED UP QUERY PERFORMANCE BY ORDERS OF MAGNITUDE WITH PRECOMPUTED RESULTS

• Simplify and accelerate iterative and predictable workloads, such as ETL, BI/dashboarding queries
• MVs can be based on one or more Amazon Redshift tables or external tables (Spectrum, Federated)
• Efficient incremental maintenance
• Scheduled, automatic, or manually timed refresh
• Amazon Redshift auto query rewrite optimizes queries by replacing native tables with materialized views

sales
item  store  cust  price
i1    s1     c1    12.00
i2    s2     c1    3.00
i3    s2     c2    7.00

store_info
store  owner  loc
s1     Joe    SF
s2     Ann    NY
s3     Lisa   SF

loc_sales
loc  total_sales
SF   12.00
NY   10.00

“The Amazon Redshift materialized view auto query rewrite feature reduced dashboard load times from 8 minutes to just 500 ms. The best part is that this is completely transparent for Tableau and the business user.”
Arman Nasrollahi, Home24
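
The loc_sales result on this slide can be reproduced from the sales and store_info rows with a join and a sum. The sketch below is only an illustration of precomputing and storing a query result, assuming Python's built-in sqlite3 module as a stand-in engine. SQLite has no materialized views, so a plain table plus a hypothetical refresh_loc_sales helper plays that role, and the refresh here is a full recomputation rather than the incremental maintenance the slide describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine, not Amazon Redshift
conn.executescript(
    """
    CREATE TABLE sales (item TEXT, store TEXT, cust TEXT, price REAL);
    CREATE TABLE store_info (store TEXT, owner TEXT, loc TEXT);
    INSERT INTO sales VALUES
        ('i1', 's1', 'c1', 12.00),
        ('i2', 's2', 'c1', 3.00),
        ('i3', 's2', 'c2', 7.00);
    INSERT INTO store_info VALUES
        ('s1', 'Joe', 'SF'), ('s2', 'Ann', 'NY'), ('s3', 'Lisa', 'SF');
    """
)

# The query whose result is stored: total sales per store location.
LOC_SALES_QUERY = """
    SELECT si.loc AS loc, SUM(s.price) AS total_sales
    FROM sales s JOIN store_info si ON s.store = si.store
    GROUP BY si.loc
"""


def refresh_loc_sales(connection):
    """Recompute the stored result in full (a manual, non-incremental refresh)."""
    connection.execute("DROP TABLE IF EXISTS loc_sales")
    connection.execute("CREATE TABLE loc_sales AS " + LOC_SALES_QUERY)


refresh_loc_sales(conn)
loc_sales = conn.execute(
    "SELECT loc, total_sales FROM loc_sales ORDER BY total_sales DESC"
).fetchall()
print(loc_sales)
```

Dashboards that read loc_sales avoid repeating the join and aggregation on every query; the cost is that the stored table must be refreshed after the base tables change.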
Amazon Redshift ML
EASILY CREATE AND TRAIN ML MODELS USING SQL QUERIES WITH AMAZON SAGEMAKER

• Use case: Product recommendations, fraud prevention, reduce customer churn
• Train and apply ML models using SQL
• From fully automated training to partially or fully guided training
• Automatic pre-processing, creation, training, deployment of your model

CREATE MODEL customer_churn
FROM (SELECT c.age, c.zip, c.monthly_spend,
      c.monthly_cases, c.active FROM customer_info_table c)
TARGET c.active
FUNCTION predict_customer_churn
…;
Amazon Redshift ML
USE ML MODELS USING SQL QUERIES

• Deploy inference models locally in Amazon Redshift
• Run an inference by invoking a user-defined function as part of SQL statements

SELECT n.id, n.firstName, n.lastName,
       predict_customer_churn(n.age, n.zip, ..)
       AS activity_prediction
FROM new_customers n
WHERE n.marital_status = 'single'
…;
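
The idea of calling a prediction function from inside a SQL statement can be tried locally. The sketch below is an illustration only, assuming Python's built-in sqlite3 module as a stand-in engine; it does not use Amazon Redshift ML or Amazon SageMaker. The predict_customer_churn function here is a hypothetical hand-written rule, not a trained model, and the table columns and sample rows are made up for the example.

```python
import sqlite3


def predict_customer_churn(age, monthly_spend):
    """Toy stand-in for a trained model: NOT a real churn model."""
    # Hypothetical rule, chosen only so the function returns a label.
    return "active" if monthly_spend >= 50 and age < 65 else "inactive"


conn = sqlite3.connect(":memory:")  # stand-in engine, not Amazon Redshift
# Register the Python function so SQL statements can call it by name.
conn.create_function("predict_customer_churn", 2, predict_customer_churn)

conn.execute(
    "CREATE TABLE new_customers (id INTEGER, age INTEGER, monthly_spend REAL)"
)
conn.executemany(
    "INSERT INTO new_customers VALUES (?, ?, ?)",
    [(1, 34, 80.0), (2, 70, 120.0), (3, 41, 20.0)],  # hypothetical rows
)

# The prediction is computed row by row as part of an ordinary SELECT.
predictions = conn.execute(
    """
    SELECT n.id,
           predict_customer_churn(n.age, n.monthly_spend) AS activity_prediction
    FROM new_customers n
    ORDER BY n.id
    """
).fetchall()
print(predictions)
```

Exposing the model as a SQL function means analysts can score rows with the same SELECT statements they already write, without moving data to a separate serving system.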
Q&A
How to Get Started?

Start Free:
https://www.trifacta.com/start-wrangling/



Thank You
Trifacta.com