SlideShare a Scribd company logo
Cobus Bernard
Sr Developer Advocate
Amazon Web Services
Getting Started with Data
Warehouses on AWS
@cobusbernard
cobusbernard
cobusbernard
Agenda
Amazon Redshift
Columnar Data
Compression
Query execution with Redshift Spectrum
Demo
Q&A
Load
Unload
Backup
Restore
Amazon Redshift architecture
Massively parallel, shared
nothing columnar architecture
Leader node
SQL endpoint
Stores metadata
Coordinates parallel SQL processing
Compute nodes
Local, columnar storage
Executes queries in parallel
Load, unload, backup, restore
Amazon Redshift Spectrum nodes
Execute queries directly against Amazon S3
SQL clients/BI tools
JDBC/ODBC
Compute
node
Compute
node
Compute
node
Leader
node
Amazon S3
...
1 2 3 4 N
Amazon
Redshift
Spectrum
Load
Query
Terminology and concepts: Node types
Amazon Redshift analytics – RA3
Amazon Redshift managed storage – Solid state disks + Amazon S3
Dense compute – DC2
Solid state disks
Dense storage – DS2
Magnetic disks
Instance type Disk type Size Memory CPUs Slices
RA3 4xlarge RMS Scales to 16 TB 96 GB 12 4
RA3 16xlarge RMS Scales to 64 TB 384 GB 48 16
DC2 large SSD 160 GB 16 GB 2 2
DC2 8xlarge SSD 2.56 TB 244 GB 32 16
DS2 xlarge Magnetic 2 TB 32 GB 4 2
DS2 8xlarge Magnetic 16 TB 244 GB 36 16
Terminology and concepts: Columnar
Amazon Redshift uses a columnar architecture for storing data on disk
Goal: reduce I/O for analytics queries
Physically store data on disk by column rather than row
Only read the column data that is required
Row-based storage
• Need to read everything
• Unnecessary I/O
aid loc dt
CREATE TABLE deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
);
aid loc dt
1 SFO 2017-10-20
2 JFK 2017-10-20
3 SFO 2017-04-01
4 JFK 2017-05-14
SELECT min(dt) FROM deep_dive;
Columnar architecture: Example
Column-based storage
• Only scan blocks for
relevant column
aid loc dt
CREATE TABLE deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
);
aid loc dt
1 SFO 2017-10-20
2 JFK 2017-10-20
3 SFO 2017-04-01
4 JFK 2017-05-14
SELECT min(dt) FROM deep_dive;
Columnar architecture: Example
Compression example
Add 1 of 13 different encodings
to each column
aid loc dt
CREATE TABLE deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
);
aid loc dt
1 SFO 2017-10-20
2 JFK 2017-10-20
3 SFO 2017-04-01
4 JFK 2017-05-14
Compression example
• More efficient compression is due
to storing the same data type in
the columnar architecture
• Columns grow and shrink
independently
• Reduces storage requirements
• Reduces I/O
aid loc dt
CREATE TABLE deep_dive (
aid INT ENCODE AZ64
,loc CHAR(3) ENCODE BYTEDICT
,dt DATE ENCODE RUNLENGTH
);
aid loc dt
1 SFO 2017-10-20
2 JFK 2017-10-20
3 SFO 2017-04-01
4 JFK 2017-05-14
CREATE TABLE deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
) SORTKEY (dt, loc);
Example: Sort key
Add a sort key to one or more
columns to physically sort
the data on disk
deep_dive
aid loc dt
1 SFO 2017-10-20
2 JFK 2017-10-20
3 SFO 2017-04-01
4 JFK 2017-05-14
deep_dive (sorted)
aid loc dt
3 SFO 2017-04-01
4 JFK 2017-05-14
2 JFK 2017-10-20
1 SFO 2017-10-20
deep_dive (sorted)
aid loc dt
3 SFO 2017-04-01
4 JFK 2017-05-14
2 JFK 2017-10-20
deep_dive (sorted)
aid loc dt
3 SFO 2017-04-01
4 JFK 2017-05-14
deep_dive (sorted)
aid loc dt
3 SFO 2017-04-01
SELECT count(*) FROM deep_dive WHERE dt = '06-09-2017';
MIN: 01-JUNE-2017
MAX: 06-JUNE-2017
MIN: 07-JUNE-2017
MAX: 12-JUNE-2017
MIN: 13-JUNE-2017
MAX: 21-JUNE-2017
MIN: 21-JUNE-2017
MAX: 30-JUNE-2017
Sorted by date
MIN: 01-JUNE-2017
MAX: 20-JUNE-2017
MIN: 08-JUNE-2017
MAX: 30-JUNE-2017
MIN: 12-JUNE-2017
MAX: 20-JUNE-2017
MIN: 02-JUNE-2017
MAX: 25-JUNE-2017
Unsorted table
Example: Zone maps and sorting
Data distribution
Distribution style is a table property that
dictates how that table’s data is distributed
throughout the cluster
KEY: Value is hashed, same value goes to same location (slice)
ALL: Full table data goes to the first slice of every node
EVEN: Round robin
AUTO: Combines EVEN and ALL
Goals
Distribute data evenly for
parallel processing
Minimize data movement
during query processing
KEY
Node 1
Slice
1
Slice
2
Node 2
Slice
3
Slice
4
keyA
keyB
keyC
keyD
ALL
Node 1
Slice
1
Slice
2
Node 2
Slice
3
Slice
4
EVEN
Node 1
Slice
1
Slice
2
Node 2
Slice
3
Slice
4
Node 1
Data distribution example
Slice 0 Slice 1
Node 2
Slice 2 Slice 3
Table: deep_dive
User columns System columns
aid loc dt ins del row
CREATE TABLE deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
) (EVEN|KEY|ALL|AUTO);
Node 1
Slice 0 Slice 1
Node 2
Slice 2 Slice 3
Data distribution, EVEN example
INSERT INTO deep_dive VALUES
(1, 'SFO', '2016-09-01'),
(2, 'JFK', '2016-09-14'),
(3, 'SFO', '2017-04-01'),
(4, 'JFK', '2017-05-14');
Table: deep_dive
User Columns System Columns
aid loc dt ins del row
Table: deep_dive
User Columns System Columns
aid loc dt ins del row
Table: deep_dive
User Columns System Columns
aid loc dt ins del row
Table: deep_dive
User Columns System Columns
aid loc dt ins del row
Rows: 0 Rows: 0 Rows: 0 Rows: 0Rows: 1 Rows: 1 Rows: 1 Rows: 1
CREATE TABLE deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
) DISTSTYLE EVEN;
Node 1
Slice 0 Slice 1
Node 2
Slice 2 Slice 3
Data distribution, KEY example #1
INSERT INTO deep_dive VALUES
(1, 'SFO', '2016-09-01'),
(2, 'JFK', '2016-09-14'),
(3, 'SFO', '2017-04-01'),
(4, 'JFK', '2017-05-14');
Table: deep_dive
User Columns System Columns
aid loc dt ins del row
Rows: 2 Rows: 0 Rows: 0Rows: 0Rows: 1
Table: deep_dive
User Columns System Columns
aid loc dt ins del row
Rows: 2Rows: 0Rows: 1
CREATE TABLE deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
) DISTSTYLE KEY DISTKEY (loc);
Node 1
Slice 0 Slice 1
Node 2
Slice 2 Slice 3
Data distribution, KEY example #2
INSERT INTO deep_dive VALUES
(1, 'SFO', '2016-09-01'),
(2, 'JFK', '2016-09-14'),
(3, 'SFO', '2017-04-01'),
(4, 'JFK', '2017-05-14');
Table: deep_dive
User Columns System Columns
aid loc dt ins del row
Table: deep_dive
User Columns System Columns
aid loc dt ins del row
Table: deep_dive
User Columns System Columns
aid loc dt ins del row
Table: deep_dive
User Columns System Columns
aid loc dt ins del row
Rows: 0 Rows: 0 Rows: 0 Rows: 0Rows: 1 Rows: 1 Rows: 1 Rows: 1
CREATE TABLE deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
) DISTSTYLE KEY DISTKEY (aid);
Node 1
Slice 0 Slice 1
Node 2
Slice 2 Slice 3
Data distribution, ALL example
INSERT INTO deep_dive VALUES
(1, 'SFO', '2016-09-01'),
(2, 'JFK', '2016-09-14'),
(3, 'SFO', '2017-04-01'),
(4, 'JFK', '2017-05-14');
Rows: 0 Rows: 0
Table: deep_dive
User Columns System Columns
aid loc dt ins del row
Rows: 0Rows: 1Rows: 2Rows: 4Rows: 3
Table: deep_dive
User Columns System Columns
aid loc dt ins del row
Rows: 0Rows: 1Rows: 2Rows: 4Rows: 3
CREATE TABLE deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
) DISTSTYLE ALL;
...
1 2 3 4 N
Amazon
Redshift
Spectrum
Load
Query
Amazon Redshift architecture
Massively parallel, shared
nothing columnar architecture
Leader node
SQL endpoint
Stores metadata
Coordinates parallel SQL processing
Compute nodes
Local, columnar storage
Executes queries in parallel
Load, unload, backup, restore
Amazon Redshift Spectrum nodes
Execute queries directly against Amazon S3
SQL clients/BI tools
JDBC/ODBC
Compute
node
Compute
node
Compute
node
Leader
node
Amazon S3
Query
SELECT COUNT(*)
FROM S3.EXT_TABLE
GROUP BY…
Life Of A Query
Amazon
Redshift
JDBC/ODBC
...
1 2 3 4 N
Amazon S3
Exabyte-scale object storage
Data Catalog
Apache Hive
Compatible Metastore
1
Life Of A Query
Amazon
Redshift
JDBC/ODBC
...
1 2 3 4 N
Amazon S3
Exabyte-scale object storage
Data Catalog
Apache Hive
Compatible Metastore
Query is optimised and compiled at
the leader node. Determine what gets
run locally and what goes to Amazon
Redshift Spectrum
2
Life Of A Query
Amazon
Redshift
JDBC/ODBC
...
1 2 3 4 N
Amazon S3
Exabyte-scale object storage
Data Catalog
Apache Hive
Compatible Metastore
Query plan is sent to
all compute nodes3
Life Of A Query
Amazon
Redshift
JDBC/ODBC
...
1 2 3 4 N
Amazon S3
Exabyte-scale object storage
Data Catalog
Apache Hive
Compatible Metastore
Compute nodes obtain partition info from
Data Catalog; dynamically prune partitions4
Life Of A Query
Amazon
Redshift
JDBC/ODBC
...
1 2 3 4 N
Amazon S3
Exabyte-scale object storage
Data Catalog
Apache Hive
Compatible Metastore
Each compute node issues multiple
requests to the Amazon Redshift
Spectrum layer
5
Life Of A Query
Amazon
Redshift
JDBC/ODBC
...
1 2 3 4 N
Amazon S3
Exabyte-scale object storage
Data Catalog
Apache Hive
Compatible Metastore
Amazon Redshift Spectrum nodes
scan your S3 data
6
Life Of A Query
Amazon
Redshift
JDBC/ODBC
...
1 2 3 4 N
Amazon S3
Exabyte-scale object storage
Data Catalog
Apache Hive
Compatible Metastore
7
Amazon Redshift
Spectrum projects,
filters, and aggregates
Life Of A Query
Amazon
Redshift
JDBC/ODBC
...
1 2 3 4 N
Amazon S3
Exabyte-scale object storage
Data Catalog
Apache Hive
Compatible Metastore
Final aggregations and joins with
local Amazon Redshift tables
done in-cluster
8
Life Of A Query
Amazon
Redshift
JDBC/ODBC
...
1 2 3 4 N
Amazon S3
Exabyte-scale object storage
Data Catalog
Apache Hive
Compatible Metastore
Result is sent back to client9
Data warehouse
(business data)
Data lake
(event data)
Amazon
Redshift
Amazon Redshift enables you to have a lake house approach
Customers moving to data lake architectures
© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Sources
Targets
Relational NoSQL Analytics Data Warehouse*
Amazon Aurora
Amazon Aurora
SAP ASE
Amazon DynamoDB
Amazon DocumentDB (with
MongoDB compatibility)
Db2 LUW SQL Azure
SAP ASE
MongoDB Cassandra
Amazon Redshift
Amazon S3
AWS SnowballAmazon S3
Amazon Elasticsearch
Service
Amazon Kinesis
Data Streams
Oracle SQL Server Netezza
Greenplum Teradata Vertica
* Supported via SCT data extractors
Supported DMS source and targets
https://aws.training
https://aws.amazon.com/training/path-databases
https://aws.amazon.com/training/path-advanced-networking
https://bit.ly/aws-office-hours
https://bit.ly/africa-virtual-day
https://bit.ly/emea-summit
https://bit.ly/cobus-youtube
Useful links
Thank you!
© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Cobus Bernard
Sr Developer Advocate
Amazon Web Services
@cobusbernard
cobusbernard
cobusbernard
Ad

More Related Content

Similar to AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS (12)

SQL Server 2008 Overview
SQL Server 2008 OverviewSQL Server 2008 Overview
SQL Server 2008 Overview
Eric Nelson
 
Fotolog: Scaling the World's Largest Photo Blogging Community
Fotolog: Scaling the World's Largest Photo Blogging CommunityFotolog: Scaling the World's Largest Photo Blogging Community
Fotolog: Scaling the World's Largest Photo Blogging Community
farhan "Frank"​ mashraqi
 
扩展世界上最大的图片Blog社区
扩展世界上最大的图片Blog社区扩展世界上最大的图片Blog社区
扩展世界上最大的图片Blog社区
yiditushe
 
Postgresql Database Administration Basic - Day2
Postgresql  Database Administration Basic  - Day2Postgresql  Database Administration Basic  - Day2
Postgresql Database Administration Basic - Day2
PoguttuezhiniVP
 
Ms sql server architecture
Ms sql server architectureMs sql server architecture
Ms sql server architecture
Ajeet Singh
 
Introduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big DataIntroduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big Data
Gruter
 
Querying federations 
of Triple Pattern Fragments
Querying federations 
of Triple Pattern FragmentsQuerying federations 
of Triple Pattern Fragments
Querying federations 
of Triple Pattern Fragments
Ruben Verborgh
 
Hadoop Hive Talk At IIT-Delhi
Hadoop Hive Talk At IIT-DelhiHadoop Hive Talk At IIT-Delhi
Hadoop Hive Talk At IIT-Delhi
Joydeep Sen Sarma
 
SequoiaDB Distributed Relational Database
SequoiaDB Distributed Relational DatabaseSequoiaDB Distributed Relational Database
SequoiaDB Distributed Relational Database
wangzhonnew
 
Tutorial On Database Management System
Tutorial On Database Management SystemTutorial On Database Management System
Tutorial On Database Management System
psathishcs
 
Apache Cassandra, part 2 – data model example, machinery
Apache Cassandra, part 2 – data model example, machineryApache Cassandra, part 2 – data model example, machinery
Apache Cassandra, part 2 – data model example, machinery
Andrey Lomakin
 
Presentation
PresentationPresentation
Presentation
Dimitris Stripelis
 
SQL Server 2008 Overview
SQL Server 2008 OverviewSQL Server 2008 Overview
SQL Server 2008 Overview
Eric Nelson
 
Fotolog: Scaling the World's Largest Photo Blogging Community
Fotolog: Scaling the World's Largest Photo Blogging CommunityFotolog: Scaling the World's Largest Photo Blogging Community
Fotolog: Scaling the World's Largest Photo Blogging Community
farhan "Frank"​ mashraqi
 
扩展世界上最大的图片Blog社区
扩展世界上最大的图片Blog社区扩展世界上最大的图片Blog社区
扩展世界上最大的图片Blog社区
yiditushe
 
Postgresql Database Administration Basic - Day2
Postgresql  Database Administration Basic  - Day2Postgresql  Database Administration Basic  - Day2
Postgresql Database Administration Basic - Day2
PoguttuezhiniVP
 
Ms sql server architecture
Ms sql server architectureMs sql server architecture
Ms sql server architecture
Ajeet Singh
 
Introduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big DataIntroduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big Data
Gruter
 
Querying federations 
of Triple Pattern Fragments
Querying federations 
of Triple Pattern FragmentsQuerying federations 
of Triple Pattern Fragments
Querying federations 
of Triple Pattern Fragments
Ruben Verborgh
 
Hadoop Hive Talk At IIT-Delhi
Hadoop Hive Talk At IIT-DelhiHadoop Hive Talk At IIT-Delhi
Hadoop Hive Talk At IIT-Delhi
Joydeep Sen Sarma
 
SequoiaDB Distributed Relational Database
SequoiaDB Distributed Relational DatabaseSequoiaDB Distributed Relational Database
SequoiaDB Distributed Relational Database
wangzhonnew
 
Tutorial On Database Management System
Tutorial On Database Management SystemTutorial On Database Management System
Tutorial On Database Management System
psathishcs
 
Apache Cassandra, part 2 – data model example, machinery
Apache Cassandra, part 2 – data model example, machineryApache Cassandra, part 2 – data model example, machinery
Apache Cassandra, part 2 – data model example, machinery
Andrey Lomakin
 

More from Cobus Bernard (20)

London Microservices Meetup: Lessons learnt adopting microservices
London Microservices  Meetup: Lessons learnt adopting microservicesLondon Microservices  Meetup: Lessons learnt adopting microservices
London Microservices Meetup: Lessons learnt adopting microservices
Cobus Bernard
 
AWS SSA Webinar 34 - Getting started with databases on AWS - Managing DBs wit...
AWS SSA Webinar 34 - Getting started with databases on AWS - Managing DBs wit...AWS SSA Webinar 34 - Getting started with databases on AWS - Managing DBs wit...
AWS SSA Webinar 34 - Getting started with databases on AWS - Managing DBs wit...
Cobus Bernard
 
AWS SSA Webinar 33 - Getting started with databases on AWS Amazon DynamoDB
AWS SSA Webinar 33 - Getting started with databases on AWS Amazon DynamoDBAWS SSA Webinar 33 - Getting started with databases on AWS Amazon DynamoDB
AWS SSA Webinar 33 - Getting started with databases on AWS Amazon DynamoDB
Cobus Bernard
 
AWS SSA Webinar 32 - Getting Started with databases on AWS: Choosing the righ...
AWS SSA Webinar 32 - Getting Started with databases on AWS: Choosing the righ...AWS SSA Webinar 32 - Getting Started with databases on AWS: Choosing the righ...
AWS SSA Webinar 32 - Getting Started with databases on AWS: Choosing the righ...
Cobus Bernard
 
AWS SSA Webinar 30 - Getting Started with AWS - Infrastructure as Code - Terr...
AWS SSA Webinar 30 - Getting Started with AWS - Infrastructure as Code - Terr...AWS SSA Webinar 30 - Getting Started with AWS - Infrastructure as Code - Terr...
AWS SSA Webinar 30 - Getting Started with AWS - Infrastructure as Code - Terr...
Cobus Bernard
 
AWS SSA Webinar 28 - Getting Started with AWS - Infrastructure as Code
AWS SSA Webinar 28 - Getting Started with AWS - Infrastructure as CodeAWS SSA Webinar 28 - Getting Started with AWS - Infrastructure as Code
AWS SSA Webinar 28 - Getting Started with AWS - Infrastructure as Code
Cobus Bernard
 
AWS Webinar 24 - Getting Started with AWS - Understanding DR
AWS Webinar 24 - Getting Started with AWS - Understanding DRAWS Webinar 24 - Getting Started with AWS - Understanding DR
AWS Webinar 24 - Getting Started with AWS - Understanding DR
Cobus Bernard
 
AWS Webinar 23 - Getting Started with AWS - Understanding total cost of owner...
AWS Webinar 23 - Getting Started with AWS - Understanding total cost of owner...AWS Webinar 23 - Getting Started with AWS - Understanding total cost of owner...
AWS Webinar 23 - Getting Started with AWS - Understanding total cost of owner...
Cobus Bernard
 
AWS SSA Webinar 19 - Getting Started with Multi-Region Architecture: Services
AWS SSA Webinar 19 - Getting Started with Multi-Region Architecture: ServicesAWS SSA Webinar 19 - Getting Started with Multi-Region Architecture: Services
AWS SSA Webinar 19 - Getting Started with Multi-Region Architecture: Services
Cobus Bernard
 
AWS SSA Webinar 18 - Getting Started with Multi-Region Architecture: Data
AWS SSA Webinar 18 - Getting Started with Multi-Region Architecture: DataAWS SSA Webinar 18 - Getting Started with Multi-Region Architecture: Data
AWS SSA Webinar 18 - Getting Started with Multi-Region Architecture: Data
Cobus Bernard
 
AWS EMEA Online Summit - Live coding with containers
AWS EMEA Online Summit - Live coding with containersAWS EMEA Online Summit - Live coding with containers
AWS EMEA Online Summit - Live coding with containers
Cobus Bernard
 
AWS EMEA Online Summit - Blending Spot and On-Demand instances to optimizing ...
AWS EMEA Online Summit - Blending Spot and On-Demand instances to optimizing ...AWS EMEA Online Summit - Blending Spot and On-Demand instances to optimizing ...
AWS EMEA Online Summit - Blending Spot and On-Demand instances to optimizing ...
Cobus Bernard
 
AWS SSA Webinar 17 - Getting Started on AWS with Amazon RDS
AWS SSA Webinar 17 - Getting Started on AWS with Amazon RDSAWS SSA Webinar 17 - Getting Started on AWS with Amazon RDS
AWS SSA Webinar 17 - Getting Started on AWS with Amazon RDS
Cobus Bernard
 
AWS SSA Webinar 16 - Getting Started on AWS with Amazon EC2
AWS SSA Webinar 16 - Getting Started on AWS with Amazon EC2AWS SSA Webinar 16 - Getting Started on AWS with Amazon EC2
AWS SSA Webinar 16 - Getting Started on AWS with Amazon EC2
Cobus Bernard
 
AWS SSA Webinar 15 - Getting started on AWS with Containers: Amazon EKS
AWS SSA Webinar 15 - Getting started on AWS with Containers: Amazon EKSAWS SSA Webinar 15 - Getting started on AWS with Containers: Amazon EKS
AWS SSA Webinar 15 - Getting started on AWS with Containers: Amazon EKS
Cobus Bernard
 
AWS SSA Webinar 13 - Getting started on AWS with Containers: Amazon ECS
AWS SSA Webinar 13 - Getting started on AWS with Containers: Amazon ECSAWS SSA Webinar 13 - Getting started on AWS with Containers: Amazon ECS
AWS SSA Webinar 13 - Getting started on AWS with Containers: Amazon ECS
Cobus Bernard
 
AWS SSA Webinar 11 - Getting started on AWS: Security
AWS SSA Webinar 11 - Getting started on AWS: SecurityAWS SSA Webinar 11 - Getting started on AWS: Security
AWS SSA Webinar 11 - Getting started on AWS: Security
Cobus Bernard
 
AWS SSA Webinar 12 - Getting started on AWS with Containers
AWS SSA Webinar 12 - Getting started on AWS with ContainersAWS SSA Webinar 12 - Getting started on AWS with Containers
AWS SSA Webinar 12 - Getting started on AWS with Containers
Cobus Bernard
 
HashiTalks Africa - Going multi-account on AWS with Terraform
HashiTalks Africa - Going multi-account on AWS with TerraformHashiTalks Africa - Going multi-account on AWS with Terraform
HashiTalks Africa - Going multi-account on AWS with Terraform
Cobus Bernard
 
AWS SSA Webinar 10 - Getting Started on AWS: Networking
AWS SSA Webinar 10 - Getting Started on AWS: NetworkingAWS SSA Webinar 10 - Getting Started on AWS: Networking
AWS SSA Webinar 10 - Getting Started on AWS: Networking
Cobus Bernard
 
London Microservices Meetup: Lessons learnt adopting microservices
London Microservices  Meetup: Lessons learnt adopting microservicesLondon Microservices  Meetup: Lessons learnt adopting microservices
London Microservices Meetup: Lessons learnt adopting microservices
Cobus Bernard
 
AWS SSA Webinar 34 - Getting started with databases on AWS - Managing DBs wit...
AWS SSA Webinar 34 - Getting started with databases on AWS - Managing DBs wit...AWS SSA Webinar 34 - Getting started with databases on AWS - Managing DBs wit...
AWS SSA Webinar 34 - Getting started with databases on AWS - Managing DBs wit...
Cobus Bernard
 
AWS SSA Webinar 33 - Getting started with databases on AWS Amazon DynamoDB
AWS SSA Webinar 33 - Getting started with databases on AWS Amazon DynamoDBAWS SSA Webinar 33 - Getting started with databases on AWS Amazon DynamoDB
AWS SSA Webinar 33 - Getting started with databases on AWS Amazon DynamoDB
Cobus Bernard
 
AWS SSA Webinar 32 - Getting Started with databases on AWS: Choosing the righ...
AWS SSA Webinar 32 - Getting Started with databases on AWS: Choosing the righ...AWS SSA Webinar 32 - Getting Started with databases on AWS: Choosing the righ...
AWS SSA Webinar 32 - Getting Started with databases on AWS: Choosing the righ...
Cobus Bernard
 
AWS SSA Webinar 30 - Getting Started with AWS - Infrastructure as Code - Terr...
AWS SSA Webinar 30 - Getting Started with AWS - Infrastructure as Code - Terr...AWS SSA Webinar 30 - Getting Started with AWS - Infrastructure as Code - Terr...
AWS SSA Webinar 30 - Getting Started with AWS - Infrastructure as Code - Terr...
Cobus Bernard
 
AWS SSA Webinar 28 - Getting Started with AWS - Infrastructure as Code
AWS SSA Webinar 28 - Getting Started with AWS - Infrastructure as CodeAWS SSA Webinar 28 - Getting Started with AWS - Infrastructure as Code
AWS SSA Webinar 28 - Getting Started with AWS - Infrastructure as Code
Cobus Bernard
 
AWS Webinar 24 - Getting Started with AWS - Understanding DR
AWS Webinar 24 - Getting Started with AWS - Understanding DRAWS Webinar 24 - Getting Started with AWS - Understanding DR
AWS Webinar 24 - Getting Started with AWS - Understanding DR
Cobus Bernard
 
AWS Webinar 23 - Getting Started with AWS - Understanding total cost of owner...
AWS Webinar 23 - Getting Started with AWS - Understanding total cost of owner...AWS Webinar 23 - Getting Started with AWS - Understanding total cost of owner...
AWS Webinar 23 - Getting Started with AWS - Understanding total cost of owner...
Cobus Bernard
 
AWS SSA Webinar 19 - Getting Started with Multi-Region Architecture: Services
AWS SSA Webinar 19 - Getting Started with Multi-Region Architecture: ServicesAWS SSA Webinar 19 - Getting Started with Multi-Region Architecture: Services
AWS SSA Webinar 19 - Getting Started with Multi-Region Architecture: Services
Cobus Bernard
 
AWS SSA Webinar 18 - Getting Started with Multi-Region Architecture: Data
AWS SSA Webinar 18 - Getting Started with Multi-Region Architecture: DataAWS SSA Webinar 18 - Getting Started with Multi-Region Architecture: Data
AWS SSA Webinar 18 - Getting Started with Multi-Region Architecture: Data
Cobus Bernard
 
AWS EMEA Online Summit - Live coding with containers
AWS EMEA Online Summit - Live coding with containersAWS EMEA Online Summit - Live coding with containers
AWS EMEA Online Summit - Live coding with containers
Cobus Bernard
 
AWS EMEA Online Summit - Blending Spot and On-Demand instances to optimizing ...
AWS EMEA Online Summit - Blending Spot and On-Demand instances to optimizing ...AWS EMEA Online Summit - Blending Spot and On-Demand instances to optimizing ...
AWS EMEA Online Summit - Blending Spot and On-Demand instances to optimizing ...
Cobus Bernard
 
AWS SSA Webinar 17 - Getting Started on AWS with Amazon RDS
AWS SSA Webinar 17 - Getting Started on AWS with Amazon RDSAWS SSA Webinar 17 - Getting Started on AWS with Amazon RDS
AWS SSA Webinar 17 - Getting Started on AWS with Amazon RDS
Cobus Bernard
 
AWS SSA Webinar 16 - Getting Started on AWS with Amazon EC2
AWS SSA Webinar 16 - Getting Started on AWS with Amazon EC2AWS SSA Webinar 16 - Getting Started on AWS with Amazon EC2
AWS SSA Webinar 16 - Getting Started on AWS with Amazon EC2
Cobus Bernard
 
AWS SSA Webinar 15 - Getting started on AWS with Containers: Amazon EKS
AWS SSA Webinar 15 - Getting started on AWS with Containers: Amazon EKSAWS SSA Webinar 15 - Getting started on AWS with Containers: Amazon EKS
AWS SSA Webinar 15 - Getting started on AWS with Containers: Amazon EKS
Cobus Bernard
 
AWS SSA Webinar 13 - Getting started on AWS with Containers: Amazon ECS
AWS SSA Webinar 13 - Getting started on AWS with Containers: Amazon ECSAWS SSA Webinar 13 - Getting started on AWS with Containers: Amazon ECS
AWS SSA Webinar 13 - Getting started on AWS with Containers: Amazon ECS
Cobus Bernard
 
AWS SSA Webinar 11 - Getting started on AWS: Security
AWS SSA Webinar 11 - Getting started on AWS: SecurityAWS SSA Webinar 11 - Getting started on AWS: Security
AWS SSA Webinar 11 - Getting started on AWS: Security
Cobus Bernard
 
AWS SSA Webinar 12 - Getting started on AWS with Containers
AWS SSA Webinar 12 - Getting started on AWS with ContainersAWS SSA Webinar 12 - Getting started on AWS with Containers
AWS SSA Webinar 12 - Getting started on AWS with Containers
Cobus Bernard
 
HashiTalks Africa - Going multi-account on AWS with Terraform
HashiTalks Africa - Going multi-account on AWS with TerraformHashiTalks Africa - Going multi-account on AWS with Terraform
HashiTalks Africa - Going multi-account on AWS with Terraform
Cobus Bernard
 
AWS SSA Webinar 10 - Getting Started on AWS: Networking
AWS SSA Webinar 10 - Getting Started on AWS: NetworkingAWS SSA Webinar 10 - Getting Started on AWS: Networking
AWS SSA Webinar 10 - Getting Started on AWS: Networking
Cobus Bernard
 
Ad

Recently uploaded (19)

Mobile database for your company telemarketing or sms marketing campaigns. Fr...
Mobile database for your company telemarketing or sms marketing campaigns. Fr...Mobile database for your company telemarketing or sms marketing campaigns. Fr...
Mobile database for your company telemarketing or sms marketing campaigns. Fr...
DataProvider1
 
Computers Networks Computers Networks Computers Networks
Computers Networks Computers Networks Computers NetworksComputers Networks Computers Networks Computers Networks
Computers Networks Computers Networks Computers Networks
Tito208863
 
Top Vancouver Green Business Ideas for 2025 Powered by 4GoodHosting
Top Vancouver Green Business Ideas for 2025 Powered by 4GoodHostingTop Vancouver Green Business Ideas for 2025 Powered by 4GoodHosting
Top Vancouver Green Business Ideas for 2025 Powered by 4GoodHosting
steve198109
 
(Hosting PHising Sites) for Cryptography and network security
(Hosting PHising Sites) for Cryptography and network security(Hosting PHising Sites) for Cryptography and network security
(Hosting PHising Sites) for Cryptography and network security
aluacharya169
 
APNIC Update, presented at NZNOG 2025 by Terry Sweetser
APNIC Update, presented at NZNOG 2025 by Terry SweetserAPNIC Update, presented at NZNOG 2025 by Terry Sweetser
APNIC Update, presented at NZNOG 2025 by Terry Sweetser
APNIC
 
5-Proses-proses Akuisisi Citra Digital.pptx
5-Proses-proses Akuisisi Citra Digital.pptx5-Proses-proses Akuisisi Citra Digital.pptx
5-Proses-proses Akuisisi Citra Digital.pptx
andani26
 
Smart Mobile App Pitch Deck丨AI Travel App Presentation Template
Smart Mobile App Pitch Deck丨AI Travel App Presentation TemplateSmart Mobile App Pitch Deck丨AI Travel App Presentation Template
Smart Mobile App Pitch Deck丨AI Travel App Presentation Template
yojeari421237
 
project_based_laaaaaaaaaaearning,kelompok 10.pptx
project_based_laaaaaaaaaaearning,kelompok 10.pptxproject_based_laaaaaaaaaaearning,kelompok 10.pptx
project_based_laaaaaaaaaaearning,kelompok 10.pptx
redzuriel13
 
Best web hosting Vancouver 2025 for you business
Best web hosting Vancouver 2025 for you businessBest web hosting Vancouver 2025 for you business
Best web hosting Vancouver 2025 for you business
steve198109
 
OSI TCP IP Protocol Layers description f
OSI TCP IP Protocol Layers description fOSI TCP IP Protocol Layers description f
OSI TCP IP Protocol Layers description f
cbr49917
 
highend-srxseries-services-gateways-customer-presentation.pptx
highend-srxseries-services-gateways-customer-presentation.pptxhighend-srxseries-services-gateways-customer-presentation.pptx
highend-srxseries-services-gateways-customer-presentation.pptx
elhadjcheikhdiop
 
Determining Glass is mechanical textile
Determining  Glass is mechanical textileDetermining  Glass is mechanical textile
Determining Glass is mechanical textile
Azizul Hakim
 
APNIC -Policy Development Process, presented at Local APIGA Taiwan 2025
APNIC -Policy Development Process, presented at Local APIGA Taiwan 2025APNIC -Policy Development Process, presented at Local APIGA Taiwan 2025
APNIC -Policy Development Process, presented at Local APIGA Taiwan 2025
APNIC
 
White and Red Clean Car Business Pitch Presentation.pptx
White and Red Clean Car Business Pitch Presentation.pptxWhite and Red Clean Car Business Pitch Presentation.pptx
White and Red Clean Car Business Pitch Presentation.pptx
canumatown
 
DNS Resolvers and Nameservers (in New Zealand)
DNS Resolvers and Nameservers (in New Zealand)DNS Resolvers and Nameservers (in New Zealand)
DNS Resolvers and Nameservers (in New Zealand)
APNIC
 
Perguntas dos animais - Slides ilustrados de múltipla escolha
Perguntas dos animais - Slides ilustrados de múltipla escolhaPerguntas dos animais - Slides ilustrados de múltipla escolha
Perguntas dos animais - Slides ilustrados de múltipla escolha
socaslev
 
Reliable Vancouver Web Hosting with Local Servers & 24/7 Support
Reliable Vancouver Web Hosting with Local Servers & 24/7 SupportReliable Vancouver Web Hosting with Local Servers & 24/7 Support
Reliable Vancouver Web Hosting with Local Servers & 24/7 Support
steve198109
 
Understanding the Tor Network and Exploring the Deep Web
Understanding the Tor Network and Exploring the Deep WebUnderstanding the Tor Network and Exploring the Deep Web
Understanding the Tor Network and Exploring the Deep Web
nabilajabin35
 
IT Services Workflow From Request to Resolution
IT Services Workflow From Request to ResolutionIT Services Workflow From Request to Resolution
IT Services Workflow From Request to Resolution
mzmziiskd
 
Mobile database for your company telemarketing or sms marketing campaigns. Fr...
Mobile database for your company telemarketing or sms marketing campaigns. Fr...Mobile database for your company telemarketing or sms marketing campaigns. Fr...
Mobile database for your company telemarketing or sms marketing campaigns. Fr...
DataProvider1
 
Computers Networks Computers Networks Computers Networks
Computers Networks Computers Networks Computers NetworksComputers Networks Computers Networks Computers Networks
Computers Networks Computers Networks Computers Networks
Tito208863
 
Top Vancouver Green Business Ideas for 2025 Powered by 4GoodHosting
Top Vancouver Green Business Ideas for 2025 Powered by 4GoodHostingTop Vancouver Green Business Ideas for 2025 Powered by 4GoodHosting
Top Vancouver Green Business Ideas for 2025 Powered by 4GoodHosting
steve198109
 
(Hosting PHising Sites) for Cryptography and network security
(Hosting PHising Sites) for Cryptography and network security(Hosting PHising Sites) for Cryptography and network security
(Hosting PHising Sites) for Cryptography and network security
aluacharya169
 
APNIC Update, presented at NZNOG 2025 by Terry Sweetser
APNIC Update, presented at NZNOG 2025 by Terry SweetserAPNIC Update, presented at NZNOG 2025 by Terry Sweetser
APNIC Update, presented at NZNOG 2025 by Terry Sweetser
APNIC
 
5-Proses-proses Akuisisi Citra Digital.pptx
5-Proses-proses Akuisisi Citra Digital.pptx5-Proses-proses Akuisisi Citra Digital.pptx
5-Proses-proses Akuisisi Citra Digital.pptx
andani26
 
Smart Mobile App Pitch Deck丨AI Travel App Presentation Template
Smart Mobile App Pitch Deck丨AI Travel App Presentation TemplateSmart Mobile App Pitch Deck丨AI Travel App Presentation Template
Smart Mobile App Pitch Deck丨AI Travel App Presentation Template
yojeari421237
 
project_based_laaaaaaaaaaearning,kelompok 10.pptx
project_based_laaaaaaaaaaearning,kelompok 10.pptxproject_based_laaaaaaaaaaearning,kelompok 10.pptx
project_based_laaaaaaaaaaearning,kelompok 10.pptx
redzuriel13
 
Best web hosting Vancouver 2025 for you business
Best web hosting Vancouver 2025 for you businessBest web hosting Vancouver 2025 for you business
Best web hosting Vancouver 2025 for you business
steve198109
 
OSI TCP IP Protocol Layers description f
OSI TCP IP Protocol Layers description fOSI TCP IP Protocol Layers description f
OSI TCP IP Protocol Layers description f
cbr49917
 
highend-srxseries-services-gateways-customer-presentation.pptx
highend-srxseries-services-gateways-customer-presentation.pptxhighend-srxseries-services-gateways-customer-presentation.pptx
highend-srxseries-services-gateways-customer-presentation.pptx
elhadjcheikhdiop
 
Determining Glass is mechanical textile
Determining  Glass is mechanical textileDetermining  Glass is mechanical textile
Determining Glass is mechanical textile
Azizul Hakim
 
APNIC -Policy Development Process, presented at Local APIGA Taiwan 2025
APNIC -Policy Development Process, presented at Local APIGA Taiwan 2025APNIC -Policy Development Process, presented at Local APIGA Taiwan 2025
APNIC -Policy Development Process, presented at Local APIGA Taiwan 2025
APNIC
 
White and Red Clean Car Business Pitch Presentation.pptx
White and Red Clean Car Business Pitch Presentation.pptxWhite and Red Clean Car Business Pitch Presentation.pptx
White and Red Clean Car Business Pitch Presentation.pptx
canumatown
 
DNS Resolvers and Nameservers (in New Zealand)
DNS Resolvers and Nameservers (in New Zealand)DNS Resolvers and Nameservers (in New Zealand)
DNS Resolvers and Nameservers (in New Zealand)
APNIC
 
Perguntas dos animais - Slides ilustrados de múltipla escolha
Perguntas dos animais - Slides ilustrados de múltipla escolhaPerguntas dos animais - Slides ilustrados de múltipla escolha
Perguntas dos animais - Slides ilustrados de múltipla escolha
socaslev
 
Reliable Vancouver Web Hosting with Local Servers & 24/7 Support
Reliable Vancouver Web Hosting with Local Servers & 24/7 SupportReliable Vancouver Web Hosting with Local Servers & 24/7 Support
Reliable Vancouver Web Hosting with Local Servers & 24/7 Support
steve198109
 
Understanding the Tor Network and Exploring the Deep Web
Understanding the Tor Network and Exploring the Deep WebUnderstanding the Tor Network and Exploring the Deep Web
Understanding the Tor Network and Exploring the Deep Web
nabilajabin35
 
IT Services Workflow From Request to Resolution
IT Services Workflow From Request to ResolutionIT Services Workflow From Request to Resolution
IT Services Workflow From Request to Resolution
mzmziiskd
 
Ad

AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS

  • 1. Cobus Bernard Sr Developer Advocate Amazon Web Services Getting Started with Data Warehouses on AWS @cobusbernard cobusbernard cobusbernard
  • 2. Agenda Amazon Redshift Columnar Data Compression Query execution with Redshift Spectrum Demo Q&A
  • 3. Load Unload Backup Restore Amazon Redshift architecture Massively parallel, shared nothing columnar architecture Leader node SQL endpoint Stores metadata Coordinates parallel SQL processing Compute nodes Local, columnar storage Executes queries in parallel Load, unload, backup, restore Amazon Redshift Spectrum nodes Execute queries directly against Amazon S3 SQL clients/BI tools JDBC/ODBC Compute node Compute node Compute node Leader node Amazon S3 ... 1 2 3 4 N Amazon Redshift Spectrum Load Query
  • 4. Terminology and concepts: Node types Amazon Redshift analytics – RA3 Amazon Redshift managed storage – Solid state disks + Amazon S3 Dense compute – DC2 Solid state disks Dense storage – DS2 Magnetic disks Instance type Disk type Size Memory CPUs Slices RA3 4xlarge RMS Scales to 16 TB 96 GB 12 4 RA3 16xlarge RMS Scales to 64 TB 384 GB 48 16 DC2 large SSD 160 GB 16 GB 2 2 DC2 8xlarge SSD 2.56 TB 244 GB 32 16 DS2 xlarge Magnetic 2 TB 32 GB 4 2 DS2 8xlarge Magnetic 16 TB 244 GB 36 16
  • 5. Terminology and concepts: Columnar Amazon Redshift uses a columnar architecture for storing data on disk Goal: reduce I/O for analytics queries Physically store data on disk by column rather than row Only read the column data that is required
  • 6. Row-based storage • Need to read everything • Unnecessary I/O aid loc dt CREATE TABLE deep_dive ( aid INT --audience_id ,loc CHAR(3) --location ,dt DATE --date ); aid loc dt 1 SFO 2017-10-20 2 JFK 2017-10-20 3 SFO 2017-04-01 4 JFK 2017-05-14 SELECT min(dt) FROM deep_dive; Columnar architecture: Example
  • 7. Column-based storage • Only scan blocks for relevant column aid loc dt CREATE TABLE deep_dive ( aid INT --audience_id ,loc CHAR(3) --location ,dt DATE --date ); aid loc dt 1 SFO 2017-10-20 2 JFK 2017-10-20 3 SFO 2017-04-01 4 JFK 2017-05-14 SELECT min(dt) FROM deep_dive; Columnar architecture: Example
  • 8. Compression example Add 1 of 13 different encodings to each column aid loc dt CREATE TABLE deep_dive ( aid INT --audience_id ,loc CHAR(3) --location ,dt DATE --date ); aid loc dt 1 SFO 2017-10-20 2 JFK 2017-10-20 3 SFO 2017-04-01 4 JFK 2017-05-14
  • 9. Compression example • More efficient compression is due to storing the same data type in the columnar architecture • Columns grow and shrink independently • Reduces storage requirements • Reduces I/O aid loc dt CREATE TABLE deep_dive ( aid INT ENCODE AZ64 ,loc CHAR(3) ENCODE BYTEDICT ,dt DATE ENCODE RUNLENGTH ); aid loc dt 1 SFO 2017-10-20 2 JFK 2017-10-20 3 SFO 2017-04-01 4 JFK 2017-05-14
  • 10. CREATE TABLE deep_dive ( aid INT --audience_id ,loc CHAR(3) --location ,dt DATE --date ) SORTKEY (dt, loc); Example: Sort key Add a sort key to one or more columns to physically sort the data on disk deep_dive aid loc dt 1 SFO 2017-10-20 2 JFK 2017-10-20 3 SFO 2017-04-01 4 JFK 2017-05-14 deep_dive (sorted) aid loc dt 3 SFO 2017-04-01 4 JFK 2017-05-14 2 JFK 2017-10-20 1 SFO 2017-10-20 deep_dive (sorted) aid loc dt 3 SFO 2017-04-01 4 JFK 2017-05-14 2 JFK 2017-10-20 deep_dive (sorted) aid loc dt 3 SFO 2017-04-01 4 JFK 2017-05-14 deep_dive (sorted) aid loc dt 3 SFO 2017-04-01
  • 11. SELECT count(*) FROM deep_dive WHERE dt = '06-09-2017'; MIN: 01-JUNE-2017 MAX: 06-JUNE-2017 MIN: 07-JUNE-2017 MAX: 12-JUNE-2017 MIN: 13-JUNE-2017 MAX: 21-JUNE-2017 MIN: 21-JUNE-2017 MAX: 30-JUNE-2017 Sorted by date MIN: 01-JUNE-2017 MAX: 20-JUNE-2017 MIN: 08-JUNE-2017 MAX: 30-JUNE-2017 MIN: 12-JUNE-2017 MAX: 20-JUNE-2017 MIN: 02-JUNE-2017 MAX: 25-JUNE-2017 Unsorted table Example: Zone maps and sorting
  • 12. Data distribution Distribution style is a table property that dictates how that table’s data is distributed throughout the cluster KEY: Value is hashed, same value goes to same location (slice) ALL: Full table data goes to the first slice of every node EVEN: Round robin AUTO: Combines EVEN and ALL Goals Distribute data evenly for parallel processing Minimize data movement during query processing KEY Node 1 Slice 1 Slice 2 Node 2 Slice 3 Slice 4 keyA keyB keyC keyD ALL Node 1 Slice 1 Slice 2 Node 2 Slice 3 Slice 4 EVEN Node 1 Slice 1 Slice 2 Node 2 Slice 3 Slice 4
  • 13. Node 1 Data distribution example Slice 0 Slice 1 Node 2 Slice 2 Slice 3 Table: deep_dive User columns System columns aid loc dt ins del row CREATE TABLE deep_dive ( aid INT --audience_id ,loc CHAR(3) --location ,dt DATE --date ) (EVEN|KEY|ALL|AUTO);
  • 14. Node 1 Slice 0 Slice 1 Node 2 Slice 2 Slice 3 Data distribution, EVEN example INSERT INTO deep_dive VALUES (1, 'SFO', '2016-09-01'), (2, 'JFK', '2016-09-14'), (3, 'SFO', '2017-04-01'), (4, 'JFK', '2017-05-14'); Table: deep_dive User Columns System Columns aid loc dt ins del row Table: deep_dive User Columns System Columns aid loc dt ins del row Table: deep_dive User Columns System Columns aid loc dt ins del row Table: deep_dive User Columns System Columns aid loc dt ins del row Rows: 0 Rows: 0 Rows: 0 Rows: 0Rows: 1 Rows: 1 Rows: 1 Rows: 1 CREATE TABLE deep_dive ( aid INT --audience_id ,loc CHAR(3) --location ,dt DATE --date ) DISTSTYLE EVEN;
  • 15. Node 1 Slice 0 Slice 1 Node 2 Slice 2 Slice 3 Data distribution, KEY example #1 INSERT INTO deep_dive VALUES (1, 'SFO', '2016-09-01'), (2, 'JFK', '2016-09-14'), (3, 'SFO', '2017-04-01'), (4, 'JFK', '2017-05-14'); Table: deep_dive User Columns System Columns aid loc dt ins del row Rows: 2 Rows: 0 Rows: 0Rows: 0Rows: 1 Table: deep_dive User Columns System Columns aid loc dt ins del row Rows: 2Rows: 0Rows: 1 CREATE TABLE deep_dive ( aid INT --audience_id ,loc CHAR(3) --location ,dt DATE --date ) DISTSTYLE KEY DISTKEY (loc);
  • 16. Node 1 Slice 0 Slice 1 Node 2 Slice 2 Slice 3 Data distribution, KEY example #2 INSERT INTO deep_dive VALUES (1, 'SFO', '2016-09-01'), (2, 'JFK', '2016-09-14'), (3, 'SFO', '2017-04-01'), (4, 'JFK', '2017-05-14'); Table: deep_dive User Columns System Columns aid loc dt ins del row Table: deep_dive User Columns System Columns aid loc dt ins del row Table: deep_dive User Columns System Columns aid loc dt ins del row Table: deep_dive User Columns System Columns aid loc dt ins del row Rows: 0 Rows: 0 Rows: 0 Rows: 0Rows: 1 Rows: 1 Rows: 1 Rows: 1 CREATE TABLE deep_dive ( aid INT --audience_id ,loc CHAR(3) --location ,dt DATE --date ) DISTSTYLE KEY DISTKEY (aid);
  • 17. Node 1 Slice 0 Slice 1 Node 2 Slice 2 Slice 3 Data distribution, ALL example INSERT INTO deep_dive VALUES (1, 'SFO', '2016-09-01'), (2, 'JFK', '2016-09-14'), (3, 'SFO', '2017-04-01'), (4, 'JFK', '2017-05-14'); Rows: 0 Rows: 0 Table: deep_dive User Columns System Columns aid loc dt ins del row Rows: 0Rows: 1Rows: 2Rows: 4Rows: 3 Table: deep_dive User Columns System Columns aid loc dt ins del row Rows: 0Rows: 1Rows: 2Rows: 4Rows: 3 CREATE TABLE deep_dive ( aid INT --audience_id ,loc CHAR(3) --location ,dt DATE --date ) DISTSTYLE ALL;
  • 18. ... 1 2 3 4 N Amazon Redshift Spectrum Load Query Amazon Redshift architecture Massively parallel, shared nothing columnar architecture Leader node SQL endpoint Stores metadata Coordinates parallel SQL processing Compute nodes Local, columnar storage Executes queries in parallel Load, unload, backup, restore Amazon Redshift Spectrum nodes Execute queries directly against Amazon S3 SQL clients/BI tools JDBC/ODBC Compute node Compute node Compute node Leader node Amazon S3
  • 19. Query SELECT COUNT(*) FROM S3.EXT_TABLE GROUP BY… Life Of A Query Amazon Redshift JDBC/ODBC ... 1 2 3 4 N Amazon S3 Exabyte-scale object storage Data Catalog Apache Hive Compatible Metastore 1
  • 20. Life Of A Query Amazon Redshift JDBC/ODBC ... 1 2 3 4 N Amazon S3 Exabyte-scale object storage Data Catalog Apache Hive Compatible Metastore Query is optimised and compiled at the leader node. Determine what gets run locally and what goes to Amazon Redshift Spectrum 2
  • 21. Life Of A Query Amazon Redshift JDBC/ODBC ... 1 2 3 4 N Amazon S3 Exabyte-scale object storage Data Catalog Apache Hive Compatible Metastore Query plan is sent to all compute nodes3
  • 22. Life Of A Query Amazon Redshift JDBC/ODBC ... 1 2 3 4 N Amazon S3 Exabyte-scale object storage Data Catalog Apache Hive Compatible Metastore Compute nodes obtain partition info from Data Catalog; dynamically prune partitions4
  • 23. Life Of A Query Amazon Redshift JDBC/ODBC ... 1 2 3 4 N Amazon S3 Exabyte-scale object storage Data Catalog Apache Hive Compatible Metastore Each compute node issues multiple requests to the Amazon Redshift Spectrum layer 5
  • 24. Life Of A Query Amazon Redshift JDBC/ODBC ... 1 2 3 4 N Amazon S3 Exabyte-scale object storage Data Catalog Apache Hive Compatible Metastore Amazon Redshift Spectrum nodes scan your S3 data 6
  • 25. Life Of A Query Amazon Redshift JDBC/ODBC ... 1 2 3 4 N Amazon S3 Exabyte-scale object storage Data Catalog Apache Hive Compatible Metastore 7 Amazon Redshift Spectrum projects, filters, and aggregates
  • 26. Life Of A Query Amazon Redshift JDBC/ODBC ... 1 2 3 4 N Amazon S3 Exabyte-scale object storage Data Catalog Apache Hive Compatible Metastore Final aggregations and joins with local Amazon Redshift tables done in-cluster 8
  • 27. Life Of A Query Amazon Redshift JDBC/ODBC ... 1 2 3 4 N Amazon S3 Exabyte-scale object storage Data Catalog Apache Hive Compatible Metastore Result is sent back to client9
  • 28. Data warehouse (business data) Data lake (event data) Amazon Redshift Amazon Redshift enables you to have a lake house approach Customers moving to data lake architectures
  • 29. © 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 30. Sources Targets Relational NoSQL Analytics Data Warehouse* Amazon Aurora Amazon Aurora SAP ASE Amazon DynamoDB Amazon DocumentDB (with MongoDB compatibility) Db2 LUW SQL Azure SAP ASE MongoDB Cassandra Amazon Redshift Amazon S3 AWS SnowballAmazon S3 Amazon Elasticsearch Service Amazon Kinesis Data Streams Oracle SQL Server Netezza Greenplum Teradata Vertica * Supported via SCT data extractors Supported DMS source and targets
  • 32. Thank you! © 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved. Cobus Bernard Sr Developer Advocate Amazon Web Services @cobusbernard cobusbernard cobusbernard