Performance Benchmark:
Amazon Redshift
Executive Summary
Introduction
    Leveraging Amazon Redshift for BI and Analytics
    The Power of AtScale and Amazon Redshift
Benchmark Methodology
    Benchmark Measurements
    Benchmark Dataset
    Benchmark Queries
    Test Harness
    Configuration Tested
    Query Performance for a Single User Test Methodology
    Query Performance with Concurrency Test Methodology
    Compute Cost Calculations
Summary Results
    Query Performance Test Results
    Concurrent Query Performance
    Median Query Time by TPC-DS Query Test Results
    Compute Cost Test Results
    Complexity Test Results
Conclusion
This analysis is a refresh of a study first done in 2020 using the same methodology. The results
illustrate improvements in Amazon Redshift’s raw performance, along with clear benefits for the
combined solution.
1 Elapsed time for executing 1 query five times
2 Elapsed time executing 1 (x5), 5, 25, 50 queries
3 Compute costs for cluster time for user concurrency test
4 Complexity score for SQL queries for number of: functions, operations, tables, objects & subqueries (AtScale = 258, TPC-DS = 1,057)
Finally, Amazon Redshift offers Redshift Spectrum, which provides direct query access to files on S3
and seamlessly extends Redshift’s reach into data lakes.
1. It presents a consistent set of business-friendly metrics for BI and data science teams
to consume data with the tools of their choice.
By leveraging a graph-based semantic model, the AtScale platform sends queries to Amazon Redshift
using its data virtualization engine and pushes workloads to the Amazon Redshift platform. By
automatically creating and managing aggregate tables on Amazon Redshift based on user query
patterns, AtScale avoids costly atomic table scans and delivers superior query performance by
rewriting queries to access those aggregate tables.
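To make the aggregate-aware rewriting concrete, the following sketch (in Python, purely illustrative: the aggregate names, grains, and row counts are hypothetical, and AtScale’s actual navigation logic is not described in this report) shows the general idea of routing a query to the smallest aggregate that covers its requested dimensions instead of the atomic fact table:

from dataclasses import dataclass

@dataclass(frozen=True)
class Aggregate:
    table: str             # physical aggregate table name (hypothetical)
    dimensions: frozenset  # the grain the aggregate was built at
    row_count: int         # used to prefer the smallest covering aggregate

AGGREGATES = [
    Aggregate("agg_sales_by_item_day", frozenset({"i_item_id", "d_date"}), 50_000_000),
    Aggregate("agg_sales_by_item_month", frozenset({"i_item_id", "d_year", "d_moy"}), 5_000_000),
]
ATOMIC_FACT = "store_sales"  # ~28.8B rows at the 10TB scale factor

def choose_table(requested_dims: set) -> str:
    """Pick the smallest aggregate whose grain covers the requested dimensions;
    fall back to the atomic fact table when no aggregate qualifies."""
    covering = [a for a in AGGREGATES if requested_dims <= a.dimensions]
    return min(covering, key=lambda a: a.row_count).table if covering else ATOMIC_FACT

# A query grouped by item and month is answered from the 5M-row aggregate
# rather than a scan of the 28.8B-row store_sales table.
print(choose_table({"i_item_id", "d_year", "d_moy"}))  # -> agg_sales_by_item_month
print(choose_table({"ss_ticket_number"}))              # -> store_sales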
In this study, we will compare the performance, complexity and costs of these cloud data platforms with
and without the AtScale platform.
▲ Run 20 TPC-DS queries for 1 user five times & measure the total elapsed time on a TPC-DS 10TB dataset
▲ Run 20 TPC-DS queries for 5, 25 & 50 users one time & measure the total elapsed time on a TPC-DS 10TB dataset
▲ Measure the total elapsed time or bytes read for the query & concurrency tests on a TPC-DS 10TB dataset
▲ Compare the raw TPC-DS SQL queries to the equivalent BI semantic layer queries on a TPC-DS 10TB dataset
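For the elapsed-time measurements above, the sketch below shows one way the wall-clock run time of a test can be derived from the harness output. It assumes results exported in JMeter’s default CSV/JTL format (with its standard timeStamp and elapsed columns); the file name is hypothetical:

import csv

def total_elapsed_minutes(jtl_path: str) -> float:
    """Wall-clock minutes between the first query start and the last query finish."""
    starts, ends = [], []
    with open(jtl_path, newline="") as f:
        for row in csv.DictReader(f):
            start = int(row["timeStamp"])             # sample start, epoch milliseconds
            starts.append(start)
            ends.append(start + int(row["elapsed"]))  # plus per-query elapsed milliseconds
    return (max(ends) - min(starts)) / 1000 / 60

print(f"{total_elapsed_minutes('results_concurrency.jtl'):.2f} minutes")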
Table                    Row Size (Bytes)   Row Count (10TB Scale)
date_dim                 141                73,049
household_demographics   21                 7,200
income_band              16                 20
inventory                16                 1,311,525,000
item                     281                402,000
promotions               124                2,000
reason                   38                 70
ship_mode                56                 20
store                    263                1,500
store_returns            134                2,879,970,104
store_sales              164                28,799,983,563
time_dim                 59                 86,400
warehouse                117                25
web_page                 96                 4,002
web_returns              162                720,020,485
web_sales                226                7,199,963,324
web_site                 292                78

The dataset is characterized by:
1 Multiple fact tables
2 Large fact tables
3 Large dimensions
Test Harness
To ensure consistency for concurrency tests, we ran queries using v5.4.1 of Apache JMeter. The
instructions, documentation, utility scripts, results, and JMeter JMX files can be found in our GitHub
repository and are available upon request.
We designed the JMeter test suites to run the above 20 queries in the following four configurations:
▲ 1 concurrent user, 5 loops (averaging the result to even out cold starts)
▲ 5 concurrent users, 1 loop
▲ 25 concurrent users, 1 loop
▲ 50 concurrent users, 1 loop
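The actual JMX files and utility scripts live in the GitHub repository noted above; as a rough sketch of how the four configurations can be driven from JMeter’s non-GUI mode (assuming, hypothetically, that the test plan reads its thread and loop counts from the threads and loops properties, and using an invented test plan name):

import subprocess

CONFIGURATIONS = [
    {"threads": 1,  "loops": 5},   # 1 user x 5 loops, averaged to even out cold starts
    {"threads": 5,  "loops": 1},
    {"threads": 25, "loops": 1},
    {"threads": 50, "loops": 1},
]

for cfg in CONFIGURATIONS:
    subprocess.run(
        [
            "jmeter", "-n",                 # non-GUI mode
            "-t", "tpcds_redshift.jmx",     # hypothetical test plan name
            f"-Jthreads={cfg['threads']}",  # passed through to the thread group
            f"-Jloops={cfg['loops']}",
            "-l", f"results_{cfg['threads']}u.jtl",
        ],
        check=True,
    )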
Configuration Tested
The following Amazon Redshift configuration was used for the test:
Platform           Configuration                              Compute Cost per Hour
Amazon Redshift    dc2.8xlarge (6 nodes at $4.80 per node)    $28.80
For the test, we used Amazon Redshift’s “out of the box” configuration. We did not manually tune any
of the TPC-DS queries and used the same clustering scheme used in AWS Labs’ TPC-DS benchmark in
GitHub.
5 Storage cost wasn’t factored in (only compute cost)

Compute Cost = ConcurrencyRunTimeMinutes / 60 * ComputeCostPerHour
We explicitly excluded storage costs from our calculations. We found that storage cost was nominal
across all platforms and given that it’s a fixed cost, it was not subject to variation in our testing
scenarios.
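Expressed as code, the calculation reduces to the following minimal sketch. The $28.80/hour rate comes from the dc2.8xlarge configuration above; the 90-minute run time in the example is illustrative only, not a result from this study:

REDSHIFT_COST_PER_HOUR = 28.80  # 6 dc2.8xlarge nodes at $4.80 per node per hour

def compute_cost(run_time_minutes: float, cost_per_hour: float = REDSHIFT_COST_PER_HOUR) -> float:
    """Compute cost = (concurrency run time in minutes / 60) * compute cost per hour."""
    return run_time_minutes / 60 * cost_per_hour

print(f"${compute_cost(90):.2f}")  # 90 minutes of cluster time -> $43.20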
[Chart: Query performance test (single user). No AtScale: 3.483; AtScale: 0.330]
In this test, we saw a real impact on query performance under additional user concurrency load. To
be fair, Amazon Redshift offers an automated concurrency scaling option with its enterprise edition,
which dynamically adds cluster resources to handle concurrency bottlenecks. We chose not to enable
this option in order to quantify performance for a fixed level of resources.
[Chart: Concurrent query run time in minutes by user count (1, 5, 25, 50), No AtScale Q3 2021 vs. AtScale Q3 2021]
[Chart: Concurrency test run times in minutes for 1, 5, 25, and 50 users. No AtScale: 3.48, 8.55, 37.53, 79.03; AtScale: 0.33, 0.35, 0.78, 1.42]
[Chart: Average elapsed time in seconds (log scale) per TPC-DS query, two panels: No AtScale and AtScale. TPC-DS queries tested: 2, 7, 13, 15, 26, 31, 33, 42, 48, 50, 52, 53, 55, 56, 60, 61, 71, 88, 96, 98]
Figure 10: Average query time by TPC-DS query number with median
Compute Cost
[Chart: Compute cost, all runs on Redshift. No AtScale: $68.39; AtScale: $18.31]
For example, with query #60 of the TPC-DS benchmark, the business question is fairly straightforward,
but the SQL to express it is not.
BUSINESS QUESTION: What is the total extended sales price by item, across the store, catalog, and web channels, for items in the ‘Jewelry’ category sold in September 1999 to customers in the -6 GMT offset?
TPC-DS Raw

with ss as (
  select
    i_item_id, sum(ss_ext_sales_price) total_sales
  from
    store_sales,
    date_dim,
    customer_address,
    item
  where
    i_item_id in (select i_item_id from item where i_category in ('Jewelry'))
    and ss_item_sk = i_item_sk
    and ss_sold_date_sk = d_date_sk
    and d_year = 1999
    and d_moy = 9
    and ss_addr_sk = ca_address_sk
    and ca_gmt_offset = -6
  group by i_item_id),
cs as (
  select
    i_item_id, sum(cs_ext_sales_price) total_sales
  from
    catalog_sales,
    date_dim,
    customer_address,
    item
  where
    i_item_id in (select i_item_id from item where i_category in ('Jewelry'))
    and cs_item_sk = i_item_sk
    and cs_sold_date_sk = d_date_sk
    and d_year = 1999
    and d_moy = 9
    and cs_bill_addr_sk = ca_address_sk
    and ca_gmt_offset = -6
  group by i_item_id),
ws as (
  select
    i_item_id, sum(ws_ext_sales_price) total_sales
  from
    web_sales,
    date_dim,
    customer_address,
    item
  where
    i_item_id in (select i_item_id from item
    ...
26,640 bytes
Figure 12: TPC-DS Raw SQL to answer question
In response to this challenge, for this benchmark study, we defined an AtScale model that drastically
simplifies user queries by translating the raw tables and schema into a business semantic layer. The
following screenshot is the TPC-DS model expressed in AtScale Design Center:
The visualization above for TPC-DS query #60 generated the following SQL against AtScale:
AtScale SQL
SELECT
`d_product_item_id` AS `d_product_item_id`,
SUM( `Total Ext Sales Price` ) AS `sum_total__ext_sales_price_ok`
FROM
`tpc-ds benchmark model - snowflake`.`tpc-ds benchmark model` `tpc_ds_benchmark_model`
WHERE
`I Category` = 'Jewelry'
AND `Sold Calendar Year` = 1999
AND `Sold d_month_of_year` = 9
AND `d_customer_gmt_offset` = -6
GROUP BY 1
18,593 bytes
Figure 15: AtScale SQL to answer question
As a measure of complexity, we used an open source parser to break down each SQL statement into the
following groups:
Complexity Factor (per configuration): # of Functions, # of Operations, # of Tables, # of Objects, # of Subqueries, and Total Score
Figure 16: Complexity score for TPC-DS benchmark with and without AtScale Semantic Layer
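The report does not name the parser it used, so the sketch below uses the open source sqlglot library purely as a stand-in to show how such a per-statement complexity score can be computed; treating column references as the "objects" count is likewise an assumption:

import sqlglot
from sqlglot import expressions as exp

def complexity_score(sql: str) -> dict:
    """Count complexity factors in one SQL statement and total them."""
    tree = sqlglot.parse_one(sql)
    factors = {
        "functions":  sum(1 for _ in tree.find_all(exp.Func)),
        "operations": sum(1 for _ in tree.find_all(exp.Binary)),  # comparisons, arithmetic, etc.
        "tables":     sum(1 for _ in tree.find_all(exp.Table)),
        "objects":    sum(1 for _ in tree.find_all(exp.Column)),  # assumption: column/object references
        "subqueries": sum(1 for _ in tree.find_all(exp.Subquery, exp.CTE)),
    }
    factors["total"] = sum(factors.values())
    return factors

print(complexity_score(
    "select i_item_id, sum(ss_ext_sales_price) total_sales "
    "from store_sales, date_dim where ss_sold_date_sk = d_date_sk group by i_item_id"
))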
We also proved that the inclusion of a semantic layer like AtScale’s can make the cloud data warehouses
even better by:
1. Drastically simplifying queries for users
2. Ensuring all users access the same, secure data
3. Increasing query performance by up to 11x
4. Improving user concurrency by up to 31x
5. Reducing cost by up to 3.7x
ABOUT ATSCALE
AtScale enables smarter decision-making by accelerating the flow of data-driven insights. The company’s semantic layer
platform simplifies, accelerates, and extends business intelligence and data science capabilities for enterprise customers
across all industries.