Benchmark Report - Amazon Redshift

Cloud Analytics

Performance Benchmark:
Amazon Redshift

By David Mariani & Krasimir Kovachki

2021-Q3
TABLE OF CONTENTS

Executive Summary
Introduction
Leveraging Amazon Redshift for BI and Analytics
The Power of AtScale and Amazon Redshift

Benchmark Methodology
Benchmark Measurements
Benchmark Dataset
Benchmark Queries
Test Harness
Configuration Tested
Query Performance for a Single User Test Methodology
Query Performance with Concurrency Test Methodology
Compute Cost Calculations

Summary Results
Query Performance Test Results
Concurrent Query Performance
Median Query Time by TPC-DS Query Test Results
Compute Cost Test Results
Complexity Test Results

Conclusion

© 2021 AtScale Inc. All rights reserved.

Executive Summary
This benchmarking study was conducted to quantify the benefits of using the AtScale semantic
layer platform with the Amazon Redshift data platform to manage BI and analytics workloads. The
comparative analysis was based on four defined measurements: Query Performance, Concurrent
Query Performance, Compute Cost, and SQL Complexity. Using the standard TPC-DS (10TB)
benchmarking framework, we took measurements for raw Amazon Redshift and for AtScale on
Amazon Redshift. The results show clear advantages to combining AtScale with Amazon Redshift to
accelerate and optimize BI and analytics programs.

Improvement Factor with AtScale

Test                              Amazon Redshift
Query Performance [1]             11x Faster
Concurrent Query Performance [2]  31x Faster
Compute Cost [3]                  3.7x Cheaper
Complexity [4]                    76% less complex SQL queries

Figure 1: Improvements with AtScale

This analysis is a refresh of a study first done in 2020 using the same methodology. The results
illustrate improvement in Amazon Redshift’s raw performance, but with clear benefits for the
combined solution.

[1] Elapsed time for executing 1 query five times
[2] Elapsed time executing 1 (x5), 5, 25, 50 queries
[3] Compute costs for cluster time for user concurrency test
[4] Complexity score for SQL queries for number of: functions, operations, tables, objects & subqueries (AtScale = 258, TPC-DS = 1,057)


Introduction
The enterprise has entered into a new era of data warehousing. Driven by the increasing popularity of
the public cloud, new cloud-based data platforms have become the dominant choice for enterprises
managing their data. By offering customers the power of a relational, scale-out data platform without
the overhead of managing it, cloud data platforms promise to make more data available at a lower cost
with fewer data management headaches.

Leveraging Amazon Redshift for BI and Analytics

Amazon Redshift is a great choice of cloud data platform for a number of reasons. First, Amazon
Redshift was the first cloud data platform to hit the market and it is therefore the most mature and
well-known cloud data platform with a wide range of third-party tooling support. Next, Redshift is the most
economical choice of all the cloud data platforms tested and delivers very good performance for the
money.


Finally, Amazon Redshift offers Redshift Spectrum, which provides direct query access to files on S3
and extends Redshift’s reach into data lakes.


The Power of AtScale and Amazon Redshift
While cloud data platforms reduce the maintenance cost and scaling headaches of managing data
infrastructure for IT, they don’t make data any easier to understand or access for analytics consumers,
nor do they help IT better predict and control cloud costs. The AtScale platform works natively with
cloud data platforms to deliver an analytics semantic layer for business intelligence (BI) and data
science teams.

The AtScale semantic layer provides the following benefits:

1. It presents a consistent set of business-friendly metrics for BI and data science teams
to consume data with the tools of their choice.

2. It provides an integration layer to support analytics discoverability, governance,
and security.

3. It accelerates end-to-end query performance while optimizing data platform resources
and costs.

By leveraging a graph-based semantic model, the AtScale platform sends queries to Amazon Redshift
using its data virtualization engine and pushes workloads to the Amazon Redshift platform. By
automatically creating and managing aggregate tables on Amazon Redshift based on user query
patterns, AtScale avoids costly atomic table scans and delivers superior query performance by
rewriting queries to access those aggregate tables.
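
To make the aggregate-table idea concrete, here is a toy sketch that uses Python's built-in sqlite3 module rather than Amazon Redshift. The table names (`sales`, `agg_sales_by_category`), the columns, and the rows are all invented for illustration, and the hand-written "rewrite" only stands in for the kind of redirection described above; it is not AtScale's engine, which this report does not describe in detail.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with a small 'atomic' fact table (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (category TEXT, year INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            ("Jewelry", 1999, 10.0),
            ("Jewelry", 1999, 15.0),
            ("Jewelry", 2000, 7.0),
            ("Books", 1999, 4.0),
        ],
    )
    # Pre-aggregate once; later queries can read this much smaller table.
    conn.execute(
        "CREATE TABLE agg_sales_by_category AS "
        "SELECT category, year, SUM(amount) AS total_amount "
        "FROM sales GROUP BY category, year"
    )
    return conn

def total_from_atomic(conn, category, year):
    """Answer the question by scanning the atomic rows."""
    row = conn.execute(
        "SELECT SUM(amount) FROM sales WHERE category = ? AND year = ?",
        (category, year),
    ).fetchone()
    return row[0]

def total_from_aggregate(conn, category, year):
    """Answer the same question from the pre-built aggregate table."""
    row = conn.execute(
        "SELECT total_amount FROM agg_sales_by_category "
        "WHERE category = ? AND year = ?",
        (category, year),
    ).fetchone()
    return row[0]

conn = build_demo_db()
print(total_from_atomic(conn, "Jewelry", 1999))
print(total_from_aggregate(conn, "Jewelry", 1999))
```

Both calls return the same total, but the second reads one pre-computed row instead of every matching atomic row, which is the effect the paragraph above attributes to aggregate tables.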

In this study, we compare the performance, complexity and costs of Amazon Redshift with
and without the AtScale platform.


Benchmarking Methodology
Benchmark Measurements
This benchmark uses four key metrics to compare Amazon Redshift to Amazon Redshift + AtScale. The
metrics are designed to answer basic questions relevant to enterprise analytics leaders.

Query Performance
How fast can the cloud data warehouse answer a query for one user?
Run 20 TPC-DS queries for 1 user five times & measure the total elapsed time on a TPC-DS 10TB dataset.

User Concurrency
How do multiple users running queries affect performance & stability?
Run 20 TPC-DS queries for 5, 25 & 50 users one time & measure the total elapsed time on a TPC-DS 10TB dataset.

Compute Costs
How do query workloads and configuration impact your monthly bill?
Measure the total elapsed time or bytes read for the query & concurrency test on a TPC-DS 10TB dataset.

Semantic Complexity
How difficult is it to write the query to answer the business question?
Compare the raw TPC-DS SQL queries to the equivalent BI semantic layer queries on a TPC-DS 10TB dataset.

Figure 2: Benchmark Testing Topics


Benchmark Dataset
We used the TPC-DS benchmark v2.11.0 from the Transaction Processing Performance Council (TPC) for our
tests. We chose the 10TB (scale factor 10,000) version for this benchmark to better measure the scalability
limits of each platform and to simulate a typical enterprise workload. This version’s largest fact table
(store_sales), at 28+ billion rows, and its largest dimension (customer), at 65 million rows, are a significant
scale challenge for most data platforms. In addition, the TPC-DS benchmark is ubiquitous among
data warehouse vendors, and we felt it represented a reasonable real-life analytics schema and set
of queries.

Table Name               Row Size (bytes)   Row Count

call_center              305                54
catalog_page             139                40,000
catalog_returns          166                1,440,033,112
catalog_sales            226                14,399,964,710
customer                 132                65,000,000
customer_address         110                32,500,000
customer_demographics    42                 1,920,800
date_dim                 141                73,049
household_demographics   21                 7,200
income_band              16                 20
inventory                16                 1,311,525,000
item                     281                402,000
promotions               124                2,000
reason                   38                 70
ship_mode                56                 20
store                    263                1,500
store_returns            134                2,879,970,104
store_sales              164                28,799,983,563
time_dim                 59                 86,400
warehouse                117                25
web_page                 96                 4,002
web_returns              162                720,020,485
web_sales                226                7,199,963,324
web_site                 292                78

The TPC-DS 10TB dataset has:
1. Multiple fact tables
2. Large fact tables
3. Large dimensions

Figure 3: TPC-DS 10TB Table Sizes
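
As a rough sanity check on the "10TB" label, the row sizes and row counts in Figure 3 can be multiplied out. The sketch below does this for the seven largest tables only. It assumes that "Row Size" means average bytes per row, which the figure itself does not state, and it ignores compression and storage overhead.

```python
# (row size, row count) for the seven largest tables in Figure 3.
# Assumption: row size is the average number of bytes per row.
LARGEST_TABLES = {
    "store_sales": (164, 28_799_983_563),
    "catalog_sales": (226, 14_399_964_710),
    "web_sales": (226, 7_199_963_324),
    "store_returns": (134, 2_879_970_104),
    "catalog_returns": (166, 1_440_033_112),
    "web_returns": (162, 720_020_485),
    "inventory": (16, 1_311_525_000),
}

def estimated_bytes(tables):
    """Sum of row_size * row_count over all tables (uncompressed estimate)."""
    return sum(row_size * row_count for row_size, row_count in tables.values())

total = estimated_bytes(LARGEST_TABLES)
print(f"{total / 1e12:.2f} TB (decimal, uncompressed estimate)")
```

Under those assumptions the estimate lands on the order of 10 TB, with the three sales fact tables accounting for most of it.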


Benchmark Queries
We selected a representative set of 20 queries from the set of 99 TPC-DS queries to keep the run time and
costs of running the benchmarks within reason without having to reduce the data size. The queries were
chosen in no particular order and were selected to eliminate redundancy and to ensure the usage of
most tables. It was imperative to benchmark the cloud data warehouse vendors with the largest dataset we
could afford to test, in order to reveal real-life differences in the respective platforms.

The following 20 TPC-DS queries were selected for the test:

2, 7, 13, 15, 26, 31, 33, 42, 48, 50, 52, 53, 55, 56, 60, 61, 71, 88, 96, 98

Figure 4: TPC-DS Test Queries

Test Harness
To ensure consistency for concurrency tests, we ran queries using v5.4.1 of Apache JMeter. The
instructions, documentation, utility scripts, results, and JMeter JMX files can be found in our GitHub
repository and are available upon request.

We designed the JMeter test suites to run the above 20 queries in the following four configurations:
▲ 1 concurrent user, 5 loops (averaging the result to even out cold starts)
▲ 5 concurrent users, 1 loop
▲ 25 concurrent users, 1 loop
▲ 50 concurrent users, 1 loop


We originally planned to run a 100-thread user concurrency test for Amazon Redshift but ran into
challenges at that level. Running 100 concurrent users without the help of a semantic layer like
AtScale proved to be a scaling challenge that resulted in extended run times caused by query
queuing. Amazon Redshift has an option for managing user concurrency automatically by spinning up
additional clusters. We did not test this feature because we wanted to keep our resource level fixed
for apples-to-apples comparisons between the different data platform choices. As a result, we only
ran the 100-thread tests with AtScale.

Configuration Tested
The following Amazon Redshift configuration was used for the test:

Vendor            Configuration                             Compute Cost per Hour [5]

Amazon Redshift   dc2.8xlarge (6 nodes at $4.80 per node)   $28.80

Figure 5: Data Platform Configurations

For the test, we used Amazon Redshift’s “out of the box” configuration. We did not manually tune any
of the TPC-DS queries and used the same clustering scheme used in AWS Labs’ TPC-DS benchmark in
GitHub.

Query Performance for a Single User Test Methodology

To test raw query performance, we ran the 20 TPC-DS queries with one concurrent user five times and
calculated the average elapsed time to finish each query. The elapsed time is simply the difference
between the start and end time of the test as reported by JMeter. We disabled Amazon Redshift’s query
caching for this test.

[5] Storage cost wasn’t factored in (only compute cost)


Query Performance with Concurrency Test Methodology
To test how the data warehouse performs with different levels of user concurrency, we ran each of
the 20 TPC-DS queries with 1, 5, 25 and 50 concurrent users using JMeter. We added a 750ms sleep
between each query start and used a single connection pool that was sized according to the number
of threads for the test. We used 1 loop (iteration) for the 5, 25, and 50 thread tests and 5 loops for the
1 thread test. The elapsed time is simply the difference between the start and end time of each thread
test as reported by JMeter. We disabled Amazon Redshift’s query caching for this test.
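
The thread-group procedure above can be approximated outside JMeter with a few lines of Python. This is only a sketch of the idea, not the JMX test plan used for the benchmark: `run_query` is a hypothetical callable that submits one query and blocks until the result arrives, the function names are invented, and the pause is applied after each query finishes, which only approximates a delay between query starts.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# (users, loops) pairs for the four configurations described in the report.
CONFIGS = [(1, 5), (5, 1), (25, 1), (50, 1)]

def run_user(queries, loops, run_query, stagger_s):
    """One simulated user: run every query `loops` times, pausing after each one."""
    timings = []
    for _ in range(loops):
        for query in queries:
            started = time.perf_counter()
            run_query(query)  # hypothetical: submit the SQL and wait for the result
            timings.append((query, time.perf_counter() - started))
            time.sleep(stagger_s)
    return timings

def run_thread_group(queries, users, loops, run_query, stagger_s=0.75):
    """Run `users` simulated users in parallel.

    Returns (elapsed_seconds, timings), where elapsed_seconds is the wall-clock
    time from the start of the group to the end of its last query.
    """
    group_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [
            pool.submit(run_user, queries, loops, run_query, stagger_s)
            for _ in range(users)
        ]
        timings = [t for future in futures for t in future.result()]
    elapsed = time.perf_counter() - group_start
    return elapsed, timings
```

Looping over `CONFIGS` and calling `run_thread_group` once per pair would give one elapsed time per thread group, which is the quantity the report charts.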

Compute Cost Calculations

Amazon Redshift charges per hour, per cluster node with options for higher powered node types. We
calculated the compute costs by multiplying the total end-to-end run time as reported by JMeter for the
concurrency test by the cluster compute cost per hour like so:

ConcurrencyRunTimeMinutes / 60 * ComputeCostPerHour

We explicitly excluded storage costs from our calculations. We found that storage cost was nominal
across all platforms and given that it’s a fixed cost, it was not subject to variation in our testing
scenarios.
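
The cost formula above is simple enough to express directly. In the sketch below the node count and per-node rate come from Figure 5; the 90-minute run time is a made-up input used only to show the arithmetic, and the function names are illustrative.

```python
NODE_COUNT = 6             # dc2.8xlarge nodes in the tested cluster (Figure 5)
COST_PER_NODE_HOUR = 4.80  # USD per node per hour (Figure 5)

def cluster_cost_per_hour(nodes=NODE_COUNT, node_rate=COST_PER_NODE_HOUR):
    """Hourly compute cost of the whole cluster."""
    return nodes * node_rate

def compute_cost(run_time_minutes, cost_per_hour):
    """ConcurrencyRunTimeMinutes / 60 * ComputeCostPerHour"""
    return run_time_minutes / 60 * cost_per_hour

hourly = cluster_cost_per_hour()
print(f"Cluster rate: ${hourly:.2f} per hour")
# Hypothetical 90-minute concurrency run, not a figure from the report.
print(f"90-minute run: ${compute_cost(90, hourly):.2f}")
```

Because the cluster size is fixed, cost scales linearly with elapsed run time, so any reduction in run time translates directly into a lower compute bill.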


Summary Results
We also ran the same 20 TPC-DS queries through the AtScale platform for Amazon Redshift. AtScale’s
Acceleration Structures showed major benefits in accelerating query performance, improving user
concurrency and reducing compute costs. AtScale’s semantic layer also drastically reduced the
complexity of the TPC-DS queries by hiding the joins and calculations from consumers. The illustration
below shows the extent of the benefits AtScale provides on top of the Amazon Redshift data warehouse:

Metric                    Improvement with AtScale
Query Performance [6]     11X Faster
User Concurrency [7]      31X Faster
Compute Costs [8]         3.7X Cheaper
Semantic Complexity [9]   76% less complex SQL queries

Figure 6: Improvements with AtScale

[6] Elapsed time for executing 1 query five times
[7] Elapsed time executing 1 (x5), 5, 25, 50 queries
[8] Compute costs for cluster time for user concurrency test
[9] Complexity score for SQL queries for number of: functions, operations, tables, objects & subqueries (AtScale = 258, TPC-DS = 1,057)
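
The headline factors can be recomputed from the totals reported in the results sections of this report. The sketch below assumes each factor is the plain ratio (or percentage reduction) of the "No AtScale" figure to the "AtScale" figure; the report does not spell out its rounding, so treat the comparison as approximate.

```python
def speedup(baseline, improved):
    """How many times smaller the improved figure is than the baseline."""
    return baseline / improved

def percent_reduction(baseline, improved):
    """Reduction from baseline to improved, as a percentage of the baseline."""
    return (baseline - improved) / baseline * 100

# Totals as reported later in this report.
single_user = speedup(3.483, 0.330)         # minutes, 1 user x 5 loops
all_runs = speedup(143.0, 4.6)              # minutes, all thread groups
cost = speedup(68.39, 18.31)                # USD, all thread groups
complexity = percent_reduction(1057, 258)   # complexity score

print(f"{single_user:.1f}x, {all_runs:.1f}x, {cost:.1f}x, {complexity:.0f}%")
# prints: 10.6x, 31.1x, 3.7x, 76%
```

Rounded to whole numbers, these ratios line up with the 11X, 31X, 3.7X and 76% figures shown in the summary.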


Query Performance Test Results
For the query performance test, we ran our 20 TPC-DS queries 5 times each using JMeter with a single
thread. Even at a single concurrent user, we saw roughly an order-of-magnitude improvement using AtScale
on the Amazon Redshift data warehouse in this test: 3.483 minutes without AtScale versus 0.330 minutes
with it.

Elapsed Run Time (Minutes), 1 User - Redshift

Configuration   Run Time (Minutes)
No AtScale      3.483
AtScale         0.330

Figure 7: Elapsed Run Time for 1 Thread


Concurrent Query Performance
For the user concurrency test, we ran consecutive JMeter suites configured to execute 1, 5, 25, and 50
queries at the same time to simulate user concurrency. Each test ran 1 iteration with the exception of
the 1 thread test which ran 5 iterations sequentially.

In this test, we saw a substantial impact on query performance under additional user concurrency load:
without AtScale, run time rose from 3.48 minutes for 1 user to 79.03 minutes for 50 users. To be fair,
Amazon Redshift offers an option for automated concurrency scaling. This option dynamically adds more
cluster resources to handle concurrency bottlenecks. We chose not to enable this option in order to
quantify performance for a fixed level of resources.

Elapsed Run Time (Minutes), All Runs - Redshift

Configuration        Run Time (Minutes)
No AtScale Q3 2021   143.0
AtScale Q3 2021      4.6

Figure 8: Elapsed Run Time for All Runs


Elapsed Time (Minutes) by Thread Group - Redshift

Thread Group   No AtScale   AtScale
1              3.48         0.33
5              8.55         0.35
25             37.53        0.78
50             79.03        1.42

Figure 9: Elapsed Run Time by Thread
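
Dividing the two series in Figure 9 shows how the gap changes with load. The sketch below assumes the chart's data labels pair up with the thread groups in increasing order (1, 5, 25, 50), which is how the bars read but is not stated explicitly in the text.

```python
# Elapsed minutes by thread group, read from Figure 9.
# Assumption: labels map to thread groups in increasing order of run time.
NO_ATSCALE = {1: 3.48, 5: 8.55, 25: 37.53, 50: 79.03}
ATSCALE = {1: 0.33, 5: 0.35, 25: 0.78, 50: 1.42}

def speedups(baseline, improved):
    """Per-thread-group ratio of baseline run time to improved run time."""
    return {users: baseline[users] / improved[users] for users in baseline}

factors = speedups(NO_ATSCALE, ATSCALE)
for users, factor in factors.items():
    print(f"{users:>2} users: {factor:.1f}x")
```

Under that pairing the ratio grows with every step up in concurrency, from roughly 10x at one user to over 50x at fifty users.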


Median Query Time by TPC-DS Query Test Results
The following chart (logarithmic scale) illustrates the benefits of AtScale for each of the 20 TPC-DS
queries (by TPC-DS Query number) tested with a median reference line overlay for comparison. This is
the median elapsed query time for all runs (1, 5, 25, 50 concurrent users) so data platform load is taken
into account. Notice that for raw Amazon Redshift (without AtScale), the median query time is 104.2
seconds (about 1.7 minutes), versus a median of 1.8 seconds for AtScale on Amazon Redshift. For
interactive business intelligence, elapsed query times over 10 seconds are typically not acceptable to
users, which may force IT to use data extracts or external caching solutions instead.

Average Query Time by Query (Seconds), All Runs - Redshift

X-axis: TPC-DS query number (2, 7, 13, 15, 26, 31, 33, 42, 48, 50, 52, 53, 55, 56, 60, 61, 71, 88, 96, 98)
Y-axis: Elapsed Time (Seconds), logarithmic scale

No AtScale: Median = 104.2 seconds
Data labels as extracted (not in query order): 546.3, 369.3, 340.9, 319.1, 249.7, 185.2, 136.3, 147.1, 136.3, 117.8, 61.7, 90.6, 55.1, 55.2, 55.7, 29.6, 27.3, 26.4, 11.6, 9.2

AtScale: Median = 1.8 seconds
Data labels as extracted (not in query order): 2.0, 10.4, 3.5, 3.4, 2.8, 2.7, 2.7, 2.3, 2.1, 1.7, 1.6, 1.7, 1.9, 1.5, 1.4, 1.4, 1.4, 1.2, 1.1, 0.8

Figure 10: Average query time by TPC-DS query number with median
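
The two medians quoted for Figure 10 can be cross-checked against the twenty data labels per series that survive from the chart. The mapping of labels to individual TPC-DS query numbers is not recoverable here, and treating the "2.0" label in the AtScale series as a data point rather than an axis tick is an assumption; the median does not depend on ordering, so the check still works.

```python
from statistics import median

# Data labels from the Figure 10 chart, in seconds (order is not meaningful).
NO_ATSCALE_SECONDS = [
    546.3, 369.3, 340.9, 319.1, 249.7, 185.2, 136.3, 147.1, 136.3, 117.8,
    61.7, 90.6, 55.1, 55.2, 55.7, 29.6, 27.3, 26.4, 11.6, 9.2,
]
# Assumption: 2.0 is a data label, not an axis tick.
ATSCALE_SECONDS = [
    2.0, 10.4, 3.5, 3.4, 2.8, 2.7, 2.7, 2.3, 2.1, 1.7,
    1.6, 1.7, 1.9, 1.5, 1.4, 1.4, 1.4, 1.2, 1.1, 0.8,
]

no_atscale_median = round(median(NO_ATSCALE_SECONDS), 1)
atscale_median = round(median(ATSCALE_SECONDS), 1)
print(no_atscale_median, atscale_median)
# prints: 104.2 1.8
```

With an even number of values, `statistics.median` averages the two middle values, which here reproduces the 104.2-second and 1.8-second reference lines.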

Compute Cost Test Results
You will also see the value that AtScale can bring to cost predictability. By minimizing the amount of
data scanned, AtScale takes less time to run queries, with fewer resources used, which means more
users can run queries at the same time (higher concurrency) without additional hardware or resources.

Compute Cost, All Runs - Redshift

Configuration   Compute Cost
No AtScale      $68.39
AtScale         $18.31

Figure 11: Compute Costs for All Thread Groups

Complexity Test Results
The TPC-DS benchmark provides a good illustration of just how hard it can be to write SQL to answer
a simple business question. Translating tables and star schemas into business logic is not an easy
task. With today’s BI tools, our business users are spending more and more time dealing with data
engineering tasks rather than getting answers to their business questions.

For example, with query #60 of the TPC-DS benchmark, the business question is fairly straightforward
but the SQL to express it is not.

BUSINESS QUESTION:

What is the monthly sales amount for a specific month in a specific year, for items in a specific
category, purchased by customers residing in a specific time zone?

SQL TO ANSWER BUSINESS QUESTION:

TPC-DS Raw
with ss as (
  select
    i_item_id, sum(ss_ext_sales_price) total_sales
  from
    store_sales,
    date_dim,
    customer_address,
    item
  where
    i_item_id in (select
                    i_item_id
                  from
                    item
                  where i_category in ('Jewelry'))
    and ss_item_sk = i_item_sk
    and ss_sold_date_sk = d_date_sk
    and d_year = 1999
    and d_moy = 9
    and ss_addr_sk = ca_address_sk
    and ca_gmt_offset = -6
  group by i_item_id),
cs as (
  select
    i_item_id, sum(cs_ext_sales_price) total_sales
  from
    catalog_sales,
    date_dim,
    customer_address,
    item
  where
    i_item_id in (select
                    i_item_id
                  from
                    item
                  where i_category in ('Jewelry'))
    and cs_item_sk = i_item_sk
    and cs_sold_date_sk = d_date_sk
    and d_year = 1999
    and d_moy = 9
    and cs_bill_addr_sk = ca_address_sk
    and ca_gmt_offset = -6
  group by i_item_id),
ws as (
  select
    i_item_id, sum(ws_ext_sales_price) total_sales
  from
    web_sales,
    date_dim,
    customer_address,
    item
  where
    i_item_id in (select
                    i_item_id
                  from
                    item
...

26,640 bytes
Figure 12: TPC-DS Raw SQL to answer question

As you can see, it’s not at all obvious what the query is doing, and the heavy repetition across
the three sales channels makes it very prone to error.

In response to this challenge, for this benchmark study, we defined an AtScale model that drastically
simplifies user queries by translating the raw tables and schema into a business semantic layer. The
following screenshot is the TPC-DS model expressed in AtScale Design Center:

Figure 13: AtScale TPC-DS Data Model

Instead of writing complex SQL or engineering data models in the BI tool, this business question was
easily answered with Tableau on AtScale as you can see below:

Figure 14: Tableau on AtScale TPC-DS Model for Query #60

The visualization above for TPC-DS query #60 generated the following SQL against AtScale:

AtScale SQL
SELECT
  `d_product_item_id` AS `d_product_item_id`,
  SUM( `Total Ext Sales Price` ) AS `sum_total__ext_sales_price_ok`
FROM
  `tpc-ds benchmark model - snowflake`.`tpc-ds benchmark model` `tpc_ds_benchmark_model`
WHERE
  `I Category` = 'Jewelry'
  AND `Sold Calendar Year` = 1999
  AND `Sold d_month_of_year` = 9
  AND `d_customer_gmt_offset` = -6
GROUP BY 1

18,593 bytes
Figure 15: AtScale SQL to answer question

As you can see, the SQL written against the AtScale semantic model is human readable and
understandable. In addition, this semantic model provides important context for query optimization
which delivers query acceleration, user concurrency improvements and cost reduction.

As a measure of complexity, we used an open source parser to break down each SQL statement into the
following groups:

1. Number of functions used
2. Number of arithmetic operations
3. Number of tables accessed
4. Number of objects used
5. Number of subqueries needed

Complexity Factor

Configuration   # of Functions   # of Operations   # of Tables   # of Objects   # of Subqueries   Total Score

No AtScale      87               66                177           700            27                1,057

AtScale         36               2                 21            198            1                 258

Figure 16: Complexity score for TPC-DS benchmark with and without AtScale Semantic Layer
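
The totals in Figure 16 are consistent with a plain, unweighted sum of the five factor counts. The sketch below reproduces that sum and the resulting reduction; the unweighted-sum reading is inferred from the numbers in the figure, and the function names are illustrative rather than taken from the parser used in the study.

```python
FACTORS = ("functions", "operations", "tables", "objects", "subqueries")

# Counts from Figure 16.
NO_ATSCALE = dict(zip(FACTORS, (87, 66, 177, 700, 27)))
ATSCALE = dict(zip(FACTORS, (36, 2, 21, 198, 1)))

def total_score(counts):
    """Unweighted sum of the five factor counts."""
    return sum(counts.values())

def reduction_pct(before, after):
    """Percentage reduction from `before` to `after`."""
    return (before - after) / before * 100

raw_total = total_score(NO_ATSCALE)
atscale_total = total_score(ATSCALE)
print(raw_total, atscale_total, f"{reduction_pct(raw_total, atscale_total):.0f}%")
# prints: 1057 258 76%
```

Most of the reduction comes from the object and table counts, which is what one would expect when joins are hidden behind a single semantic model.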

Conclusion
As you can see from the benchmark results, the future for data warehousing is definitely in the cloud.
The cloud data platforms we tested prove that the cloud is a viable alternative with many performance
and management advantages for data warehousing compared to the traditional on-premises options.
However, there are key differences in performance, scalability and cost that need to be considered.

We also proved that the inclusion of a semantic layer like AtScale’s can make the cloud data warehouses
even better by:

1. Drastically simplifying queries for users
2. Ensuring all users access the same, secure data
3. Increasing query performance by up to 11x
4. Improving user concurrency by up to 31x
5. Reducing cost by up to 3.7x

ABOUT ATSCALE
AtScale enables smarter decision-making by accelerating the flow of data-driven insights. The company’s semantic layer
platform simplifies, accelerates, and extends business intelligence and data science capabilities for enterprise customers
across all industries.

