Deep Dive Aurora
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
What is Amazon Aurora?
Database reimagined for the cloud
Re-imagining the relational database
Scale-out, distributed architecture
Automate administrative tasks

You: schema design, query construction, query optimization

AWS: automatic fail-over, backup & recovery, isolation & security, industry compliance, push-button scaling, automated patching, advanced monitoring, routine maintenance
Fastest growing service in AWS history. Aurora is used by ¾ of the top 100 AWS customers.
Who is moving to Aurora and why?

Customers using open source engines move for:
• Higher performance – up to 5x
• Better availability and durability
• Reduced cost – up to 60%
• Easy migration; no application change
AURORA PERFORMANCE
Aurora MySQL performance

[Charts: SysBench throughput, MySQL 5.6 vs. Aurora MySQL; R4.16XL: 64 cores / 488 GB RAM]
Aurora PostgreSQL performance

While running pgbench at load, throughput is 3x more consistent than PostgreSQL.

[Chart: pgbench throughput (tps) over time, 150 GiB, 1024 clients]
Aurora I/O profile

MySQL with a replica: the primary instance issues its writes against Amazon Elastic Block Store (EBS), the same I/O is shipped asynchronously to a replica instance in a second AZ, which repeats it, and backups go to Amazon S3.

Amazon Aurora: the primary instance sends only log records to the distributed storage layer, which acknowledges writes on a 4/6 quorum across three AZs; backups go to Amazon S3.

30-minute Sysbench comparison:
• MySQL: 780K transactions; 7,388K I/Os per million transactions, excluding mirroring and standby (average 7.4 I/Os per transaction, 6x amplification)
• Aurora: 27,378K transactions (35x more); 0.95 I/Os per transaction (7.7x less)
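The slide's headline ratios follow directly from the per-run numbers; a quick check (the slide rounds the I/O ratio down to 7.7x):

```python
# Numbers from the 30-minute Sysbench comparison on the slide.
mysql_txns, mysql_io_per_txn = 780_000, 7.4
aurora_txns, aurora_io_per_txn = 27_378_000, 0.95

throughput_gain = aurora_txns / mysql_txns            # ~35x more transactions
io_reduction = mysql_io_per_txn / aurora_io_per_txn   # ~7.8x fewer I/Os per txn

print(f"{throughput_gain:.0f}x throughput, {io_reduction:.1f}x less I/O per txn")
# prints "35x throughput, 7.8x less I/O per txn"
```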
Aurora lock management

MySQL lock manager: concurrent scans, inserts, and deletes serialize through the lock manager. Aurora lock manager: the same scans, inserts, and deletes proceed concurrently.

Works for queries using the Batched Key Access (BKA) join algorithm plus the Multi-Range Read (MRR) optimization; performs a secondary-to-primary index lookup during JOIN evaluation.

[Chart: per-query latency improvement across decision support queries Query-1 through Query-22, up to 14.57x]
Batched scans

Applies to:
• Table full scans
• Index full scans
• Index range scans

[Chart: latency improvement factor vs. the Batched Key Access (BKA) join algorithm, Query-1 through Query-22; decision support benchmark, R3.8xlarge]
MONITORING DATABASE PERFORMANCE
Performance Insights
WHAT ABOUT AVAILABILITY?
6-way replicated storage

Survives catastrophic failures:
• Six copies across three availability zones
• 4 out of 6 write quorum; 3 out of 6 read quorum
• Peer-to-peer replication for repairs
• Volume striped across hundreds of storage nodes

[Diagram: SQL, transaction, and caching layers above storage copies in AZ 1, AZ 2, and AZ 3; a reader end-point in front of the replicas]
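The quorum rule above can be sanity-checked in a few lines; an illustrative sketch, not Aurora's implementation:

```python
# Aurora-style quorum math: 6 copies across 3 AZs,
# writes need 4/6 acknowledgments, reads need 3/6.
COPIES = 6
WRITE_QUORUM = 4
READ_QUORUM = 3

def write_succeeds(acks: int) -> bool:
    """A write commits once 4 of the 6 storage copies acknowledge it."""
    return acks >= WRITE_QUORUM

def read_is_consistent() -> bool:
    """Read and write quorums overlap (3 + 4 > 6), so a read
    always intersects the copies that saw the latest write."""
    return READ_QUORUM + WRITE_QUORUM > COPIES

# Losing an entire AZ (2 copies) still leaves 4 copies: writes survive.
print(write_succeeds(COPIES - 2))            # True
# Losing an AZ plus one more node leaves 3 copies: reads still succeed.
print(COPIES - 3 >= READ_QUORUM)             # True
print(read_is_consistent())                  # True
```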
Database fail-over time

[Charts: distribution of fail-over times across the 1–35 second range]
RECENT INNOVATIONS
Availability is about more than HW failures

Aurora solutions for availability disruptions:
4. Disasters – global replication
Zero downtime patching

Before ZDP: the networking and application state live in the old DB engine, so user sessions terminate during patching while the new DB engine starts against the storage service.

With ZDP: the networking and application state are preserved and handed to the new DB engine, so user sessions remain active through patching.
Fast database cloning

[Diagram: a production database cloned for benchmarks]
Database backtrack

[Diagram: timeline t0–t4; rewinding to t3 makes t4 invisible, and rewinding again to t1 makes t2–t4 invisible]

Backtrack brings the database to a point in time without requiring a restore from backups.
• Backtrack from an unintentional DML or DDL operation
• Backtrack is not destructive; you can backtrack multiple times to find the right point in time
How does backtrack work?

• We keep periodic snapshots of each segment; we also preserve the redo logs
• For backtrack, we identify the appropriate segment snapshots
• We apply the log streams to the segment snapshots in parallel and asynchronously

[Diagram: per-segment snapshots and log records leading up to the recovery point]
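The three steps above can be sketched as follows. The segment, snapshot, and redo-record shapes are invented for illustration; Aurora's actual storage format is not public in this form:

```python
# Sketch of the backtrack recovery idea: per segment, pick the newest
# snapshot at or before the target time, then replay only that segment's
# redo records up to the target. Segments recover independently, so the
# work parallelizes.
from concurrent.futures import ThreadPoolExecutor

def restore_segment(segment, target_time):
    # Latest snapshot taken at or before the backtrack point.
    snap = max((s for s in segment["snapshots"] if s["time"] <= target_time),
               key=lambda s: s["time"])
    state = dict(snap["state"])
    # Replay only redo records between the snapshot and the target time.
    for rec in segment["redo"]:
        if snap["time"] < rec["time"] <= target_time:
            state[rec["key"]] = rec["value"]
    return state

def backtrack(segments, target_time):
    # Each segment is restored in parallel and independently.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda s: restore_segment(s, target_time), segments))
```

Because nothing is overwritten — snapshots and redo logs are both preserved — running `backtrack` again with a different target is cheap, which is why backtracking is not destructive.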
Online DDL: MySQL vs. Aurora

[Diagram: MySQL vs. Aurora approach; Aurora records the change as (table name, operation, column name, timestamp) at the root]
Online DDL performance
Global replication - logical
Faster disaster recovery and enhanced data locality
Global replication – physical

[Diagram: the primary Aurora cluster in Region 1 (AZ 1–3) ships the redo log and FRM files asynchronously through a replication server to a replication agent in Region 2, which feeds a read replica cluster with optional replicas]
AURORA SERVERLESS
Aurora Serverless use cases
Aurora Serverless

• Starts up on demand, shuts down when not in use
• Scales up & down automatically
• No application impact when scaling
• Pay per second, 1 minute minimum

[Diagram: application → database end-point → request router → warm pool of instances → scalable DB capacity → database storage]
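The billing rule in the last bullet reduces to one line; a sketch of the rule as stated on the slide, with nothing taken from the actual billing system:

```python
# Per-second billing with a one-minute minimum per usage period.
def billed_seconds(active_seconds: int, minimum: int = 60) -> int:
    if active_seconds == 0:
        return 0  # scaled to zero: nothing runs, nothing is billed
    return max(active_seconds, minimum)

print(billed_seconds(15))  # 60 -> the 1-minute minimum applies
print(billed_seconds(90))  # 90 -> billed per second beyond the minimum
```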
Instance provisioning and scaling

• First request triggers instance provisioning, usually in 1–3 seconds
• The instance auto-scales up and down as the workload changes, usually in 1–3 seconds
• Instances hibernate after a user-defined period of inactivity
• Scaling operations are transparent to the application – user sessions are not terminated

[Diagram: the request router moves sessions from the current instance to a new instance]
Scaling to and from zero

• If desired, instances are removed after a user-defined period of inactivity
AURORA MULTI-MASTER
Distributed Lock Manager

[Diagram: each node runs the full SQL / transactions / caching / logging stack; a global resource manager coordinates locks across masters M1–M3]

Pros:
• All data available to all nodes
• Easy to build applications
• Similar cache coherency as in multi-processors

Cons:
• Heavyweight cache coherency traffic, on a per-lock basis
• Networking can be expensive
• Negative scaling with hot blocks
Consensus with two-phase or Paxos commit

[Diagram: shared-nothing data ranges #1–#5, each with its own leader and SQL / transactions / caching / logging stack]

Pros:
• Query is broken up and sent to the data nodes
• Less coherence traffic – just for commits
• Can scale to many nodes

Cons:
• Heavyweight commit and membership change protocols
• Range partitioning can result in hot partitions, not just hot blocks; re-partitioning is expensive
• Cross-partition operations are expensive; better at small requests
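The "heavyweight commit protocol" con is easiest to see in code. This is a generic two-phase commit toy (class and method names invented, not Aurora code): every partition must be contacted twice, and a single "no" vote aborts the whole transaction.

```python
# Minimal two-phase commit sketch.
class Partition:
    def __init__(self, can_commit: bool):
        self.can_commit, self.state = can_commit, "init"
    def prepare(self) -> bool:
        # Phase 1: durably promise to commit, or vote no.
        self.state = "prepared" if self.can_commit else "abort-voted"
        return self.can_commit
    def finish(self, decision: str):
        self.state = decision

def two_phase_commit(partitions) -> str:
    # Phase 1: collect prepare votes from every participant.
    votes = [p.prepare() for p in partitions]
    decision = "commit" if all(votes) else "abort"
    # Phase 2: broadcast the single decision to all participants.
    for p in partitions:
        p.finish(decision)
    return decision
```

Two network round trips per participant per transaction is exactly the cost the slide calls heavyweight, and membership changes must be fenced around in-flight commits.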
Conflict resolution using distributed ledgers

• There are many "oases" of consistency in Aurora
• The database nodes know the transaction orders from that node
• The storage nodes know the transaction orders applied at that node

[Diagram: two masters issue transactions BT1–BT4 and OT1–OT4 against pages 1 and 2; each page's quorum of six storage nodes records the applied order]
Multi-region Multi-Master

[Diagram: head nodes in Region 1 and Region 2]
AURORA PARALLEL QUERY
Parallel query processing

[Diagram: query processing pushed down to the storage nodes]

Processing at head node

[Diagram: the head node's portion of query processing above the storage nodes]
Processing at storage node

Each storage node runs up to 16 PQ processes, each associated with a parallel query.

[Diagram: PQ processes exchanging data to/from the head node]
Example: parallel hash join

The head node sends a hash join context to each storage node, consisting of:
• List of pages – page IDs and LSNs
• Join expression byte code
• Read view – what rows to include
• Projections – what columns to select

The PQ process performs the following steps:
• Scans through the list of pages
• Builds partial bloom filters; collects statistics
• Sends two streams to the head node: filtered and projected rows with matching hashes, and unprocessed rows with pending undo

The head node performs the join after merging the data streams from the different storage nodes.
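The "builds partial bloom filters" step can be illustrated with a toy filter. Function names and the row format are invented, and Aurora's real internals differ; the point is that storage nodes can discard most non-matching rows locally and let the head node re-check the rare false positives:

```python
# Toy storage-side filtering for a hash join: hash the join key into a
# small bloom filter, then stream back only rows whose key *might* match.
import hashlib

def bloom_bits(key, size=1024, hashes=3):
    # Derive several bit positions per key (deterministic, via sha256).
    return [int(hashlib.sha256(f"{key}:{i}".encode()).hexdigest(), 16) % size
            for i in range(hashes)]

def build_filter(build_keys, size=1024):
    bits = [False] * size
    for k in build_keys:
        for b in bloom_bits(k, size):
            bits[b] = True
    return bits

def scan_and_filter(rows, bits):
    # Keep rows that may join. Bloom filters never produce false
    # negatives, so no matching row is lost; false positives are
    # re-checked when the head node performs the actual join.
    return [r for r in rows
            if all(bits[b] for b in bloom_bits(r["key"], len(bits)))]
```

Shipping a kilobyte-sized bit array to each storage node is far cheaper than shipping every scanned row back to the head node, which is where the parallel-query speed-ups come from.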
Parallel query performance

[Chart: per-query speed-up on a log scale, ranging from about 1.15x to 290.66x]
Enjoy trying new features?

Sign up for previews: aws.amazon.com/rds/aurora/

AURORA SERVERLESS – scales DB capacity to match app needs
AURORA MULTI-MASTER – scales out writes across multiple AZs
AURORA PARALLEL QUERY – improves performance of large queries
PERFORMANCE INSIGHTS – assess DB load & tune performance
Thank you!

Please complete the session survey in the summit mobile app.