Amazon Redshift - Analyze Data Across Your Lake House with Amazon Redshift
Rajesh Francis
Sr. Analytics Specialist Solutions Architect
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Discussion Topics
• Reference Architectures
Amazon Redshift
Data Trends
Large-Scale Data Warehousing Service
Used by more customers for their data warehouse workloads than anyone else
Challenges of data analytics at scale
VARIETY PERFORMANCE COST
• Proprietary formats
• Data silos
Machine learning
Anti-
Business use cases for Lake House architecture
• Capture all my Customer Orders, Income/Expenses, Credit/Debit, Clickstream data
• Run Supply Chain Optimization process, Route Planning…
• Store information about Customer, Product, Supplier details
• Analyze weather patterns…
• What are my open orders to fulfill?
Diagram: Lake House architecture
• Data sources: OLTP, ERP, CRM, LOB, Devices, Web, Sensors, Social
• Workloads: Relational databases, Non-relational databases, Big data processing, Data warehousing, Business Intelligence
• Services: Amazon Athena, Amazon S3, Amazon Elasticsearch Service, Amazon SageMaker, Amazon Redshift
• Properties: Seamless data movement, Unified governance, Performant and cost-effective
Amazon Redshift
Analyze all your data with the fastest and most widely used cloud data warehouse
• Analyze all your data: Deepest integration with your data lake
• Performance at any scale: Up to 3x better price performance than other cloud DW
Diagram: Amazon Redshift with Amazon Aurora, Amazon EMR, Amazon DynamoDB, Amazon Athena, Amazon S3, Amazon Elasticsearch Service, Amazon SageMaker
• Analyze all your data (Lake house with AWS integration): Amazon Redshift ML, Data sharing, SUPER data type with JSON support, Federated query, AWS Lambda UDF, Partner console integration, Amazon Redshift Spectrum + AWS Lake Formation, Data lake export
• Performance & scale (Fast and self-tuning): RA3 nodes & managed storage, AQUA, Performance tuning: automated, Materialized views, 100K tables, HyperLogLog, Concurrency scaling
• Low cost & best value (Predictable costs): Automatic workload manager, Cross-AZ cluster recovery, Data API, On-demand and RIs, Pause and resume, Cost controls, Built-in security features
Amazon Redshift
Analyze all your data
• Federated Query (SQL): Query live data in operational databases
• Amazon Redshift Spectrum: Query S3 and analyze open standards-based data formats
• Data lake export
• AWS Glue Elastic Views
• Redshift ML: ML & analytics services
Anil Chalasani
Gainsight, VP Product Operations
Redshift cluster architecture
• Leader node
  • SQL endpoint
  • Stores metadata
  • Coordinates parallel SQL processing & ML optimizations
  • Leader node is no-charge for clusters with 2+ nodes
• Compute nodes
  • Local, columnar storage
  • Executes queries in parallel
  • Load, unload, backup, restore from S3
• Amazon Redshift Spectrum nodes
  • Execute queries directly against data lake
Diagram: SQL Clients / BI Tools connect over JDBC/ODBC to the leader node, which coordinates the compute nodes. The compute nodes load, unload, back up, and restore against Amazon S3 (exabyte-scale object storage) and query it through Redshift Spectrum nodes 1 2 3 4 ... N.
Evolving architecture (2017–2020)
Incremental features released in the last few years
• Redshift Spectrum for data lake analytics
• Concurrency Scaling for bursty workloads
• RA3 with independent compute and storage scaling
• Data sharing across clusters
Diagram labels: Producer, Consumers, Amazon S3
Data Sharing
A SECURE AND EASY WAY TO SHARE DATA ACROSS AMAZON REDSHIFT CLUSTERS
Amazon Redshift Federated Query
UNIFIED ANALYTICS ACROSS DATABASES, DATA WAREHOUSE, AND DATA LAKE
Native semi-structured data support
Example table columns: id (INTEGER), name (SUPER), phones (SUPER)
Redshift Spectrum Overview
Benefits
Diagram labels: Amazon Aurora, Amazon S3
Steps to define and create External Schema & Tables
Data Lake query services: How to choose?
Diagram labels: Relational databases, Non-relational databases, Big data processing, Data Lake, Log analytics, Machine learning, Data warehousing
Amazon Redshift
• Data warehouse, highly-relational, complex joins
• Lake house architecture
• Sub-second latency
• Joins between data warehouse data & an S3 data lake
Amazon Athena
• Interactive ad-hoc queries
• Serverless
• No data warehouse, not 24x7
• Log analysis
• Offload S3 workload from data warehouse
Amazon EMR
• Process large volume of data
• Use big data tools like Apache Hadoop, Spark, Presto, Hive
• Run Jupyter-based EMR notebooks
Lambda UDFs
INTEGRATE WITH EXTERNAL SERVICES USING AWS LAMBDA
Invoke AWS Lambda programs as UDFs in Amazon Redshift SQL queries
SELECT my_lambda_fn(t.value)
FROM my_table t
WHERE customer = 'c1';
Simple integration with external services
• Tokenization with third-party vendors like Protegrity
• More language runtimes (C++, Java, etc.)
• Access Amazon DynamoDB, Amazon SageMaker, etc.
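The Lambda side of a UDF such as my_lambda_fn can be sketched as below. This is a minimal illustration, not the presenter's code: it assumes the batched request and response shape described in the Redshift Lambda UDF documentation (an "arguments" list with one inner list per row, and a JSON string reply with "success", "num_records", and "results"), which should be checked against the current documentation. The tokenize helper is hypothetical and stands in for a call to an external service.

```python
import json


def tokenize(value):
    """Hypothetical stand-in for a call to an external tokenization service."""
    if value is None:
        return None
    return "tok_" + str(value)[::-1]


def lambda_handler(event, context):
    """Handle one batch of rows sent by a Redshift Lambda UDF call.

    Assumed payload shape: event["arguments"] is a list of rows, and each
    row is the list of arguments passed to the UDF for that row.
    """
    try:
        rows = event["arguments"]
        # One result per input row, in the same order as the input.
        results = [tokenize(args[0]) for args in rows]
        response = {
            "success": True,
            "num_records": len(results),
            "results": results,
        }
    except Exception as exc:
        # Report the failure back instead of raising, so the query gets a message.
        response = {"success": False, "error_msg": str(exc)}
    # The reply is returned as a JSON string.
    return json.dumps(response)
```

Processing the whole batch in one invocation, rather than one invocation per row, is what keeps per-row overhead low in this sketch.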
Native partner integration
INTEGRATE WITH SELECT PARTNERS FROM THE AMAZON REDSHIFT CONSOLE
Reach out to [email protected] to integrate your product into the Amazon Redshift console today!
Reference Architectures
Reference Architecture
Data warehouse first approach - Load raw data to Redshift and publish refined data on S3 Data Lake
Healthcare Analytics use case
Analytics as a Service
Healthcare Analytics provider
• North Carolina: Subscriber1
• Virginia: Subscriber2
Amazon Redshift
Performance at any scale
"We saw an immediate 30 percent improvement in end-to-end ETL loading using the new DC2 node from Redshift. This is fantastic news for our clients as data volumes and demand for analytics continue to grow rapidly."
Martin Brambley
Sirocco Systems, Director
Diagram labels: Data Users, Separation of Workloads, Storage & Compute
RA3 nodes with managed storage
SCALE COMPUTE AND STORAGE INDEPENDENTLY
Diagram: Leader node; RA3 compute nodes with large high-speed cache and high-bandwidth networking; managed storage; new distributed & hardware-accelerated processing layer shared by Redshift clusters
Amazon Redshift automates performance tuning
ML-BASED OPTIMIZATIONS TO GET STARTED EASILY AND GET THE FASTEST PERFORMANCE QUICKLY
• Auto workload manager
• Automatic table sort
• MV auto-refresh and rewrite
• svv_alter_table_recommendations logs the recommended changes
• svl_auto_worker_action logs audit trail of changes
Concurrency scaling
Compute elasticity and scalability to handle unpredictable user demand
• Elastic Resize:
  • In-place: adds or removes nodes to/from the existing cluster
  • Scale-out: performance scales proportionally
  • Time: completes within a few minutes, with limited disruption to sessions and queries
  • Slice count: remains the same as the original cluster
• Classic Resize:
  • New cluster: a new cluster is provisioned and data is copied
  • Time: proportional to data volume in the original cluster
  • Slice count: changes based on the new cluster
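The slice-count difference between the two resize modes can be illustrated with a small model. This is only a sketch under stated assumptions: the function names are hypothetical, the even redistribution of slices in the elastic case is a simplification, and the per-node slice count for the classic case is a parameter you supply rather than a value taken from the service.

```python
def elastic_resize(slices_per_node, new_node_count):
    """Model an elastic resize: existing slices are spread over the new nodes.

    The total number of slices is unchanged; only their placement changes.
    Even redistribution is an assumption made for this illustration.
    """
    if new_node_count <= 0:
        raise ValueError("new_node_count must be positive")
    total = sum(slices_per_node)
    base, extra = divmod(total, new_node_count)
    # The first `extra` nodes take one additional slice when it does not divide evenly.
    return [base + 1 if i < extra else base for i in range(new_node_count)]


def classic_resize(new_node_count, default_slices_per_node):
    """Model a classic resize: a new cluster gets its own slice layout."""
    if new_node_count <= 0:
        raise ValueError("new_node_count must be positive")
    return [default_slices_per_node] * new_node_count


# Example: a 4-node cluster with 2 slices per node, resized to 8 nodes.
original = [2, 2, 2, 2]
after_elastic = elastic_resize(original, 8)   # total stays at 8 slices
after_classic = classic_resize(8, 2)          # total becomes 16 slices
```

In this model the elastic result keeps 8 slices across 8 nodes, while the classic result has 16, which is why the two modes can behave differently for parallelism after the same node-count change.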
Materialized views
Compute once, query many times
Authentication, Access control, Audit, Encryption, Helps achieve compliance
Diagram labels: AZ–1, AZ–2
Amazon Redshift has a service SLA of 99.9%
On-demand failover