Amazon Aurora: On Avoiding Distributed Consensus For I/Os, Commits, and Membership Changes
1 INTRODUCTION
IT workloads are increasingly moving to public cloud providers such as AWS. Many of these workloads require a relational database. Amazon Relational Database Service (RDS) provides a managed service that automates database provisioning, operating system and database patching, backup, point-in-time restore, storage and compute scaling, instance health monitoring, failover, and other capabilities. Our experience managing hundreds of thousands of

[Figure 1: Why are 6 copies necessary? A volume spread across three Availability Zones (AZ 1, AZ 2, AZ 3), with a 4/6 write quorum and a 3/6 read quorum; the quorum survives the failure of an AZ.]

Quorum models, such as the one used by Aurora, are rarely used in high-performance relational databases, despite the benefits they provide for availability, durability, and the reduction of latency jitter. We believe this is because the underlying distributed algorithms typically used in these systems – two-phase commit (2PC), Paxos commit, Paxos membership changes, and their variants – can be expensive and incur additional network overheads. The commercial systems we have seen built on these algorithms may scale well, but have order-of-magnitude worse cost, performance, and peak-to-average latency than a traditional relational database running on a single node against local disk.
In this paper, we show how Aurora leverages only quorum I/Os, locally observable state, and monotonically increasing log ordering to provide high performance, non-blocking, fault-tolerant I/O, commits, and membership changes. We limit our discussion to single-writer databases with read replicas. The approach described below is extensible to multi-writer databases by ordering writes at database nodes and storage nodes, and by using a journal to order operations that span multiple database instances and multiple storage nodes. We describe the following contributions:

(1) How Aurora performs writes using asynchronous flows, establishes local consistency points, uses consistency points

storage node (3) sorts and groups records, (4) gossips with peers to fill in missing records, (5) coalesces them into data blocks, (6) backs them up to Amazon Simple Storage Service (S3), (7) garbage collects backed-up data that will no longer be referenced by an instance, and (8) periodically scrubs data to ensure checksums continue to match the data on disk.

[Figure: storage node processing – log records flow from the primary instance into the storage node's incoming queue, are acknowledged (ACK), and are later garbage collected (GC).]
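To make the division of labor concrete, the sketch below illustrates the gap-filling part of this pipeline, steps (3) and (4) above. It is our own illustration, not Aurora code: the class and method names are hypothetical, and steps (5) through (8) are only noted in comments.

```python
# Illustrative sketch only, not Aurora code. It shows how a storage node could
# detect and fill holes in its log (steps (3) and (4) above) by gossiping with
# the other segments in its protection group; all names here are hypothetical.

class LogRecord:
    def __init__(self, lsn, prev_lsn_in_segment, payload=b""):
        self.lsn = lsn                                   # log sequence number
        self.prev_lsn_in_segment = prev_lsn_in_segment   # backward link, None for the first record
        self.payload = payload


class StorageNode:
    def __init__(self, peers=()):
        self.records = {}          # lsn -> LogRecord, already persisted and ACKed
        self.peers = list(peers)   # other segments in the protection group

    def receive(self, record):
        self.records[record.lsn] = record

    def missing_lsns(self):
        # Step (3): sort and group what we have, using backward links to spot gaps.
        have = set(self.records)
        return sorted({r.prev_lsn_in_segment for r in self.records.values()
                       if r.prev_lsn_in_segment is not None
                       and r.prev_lsn_in_segment not in have})

    def background_pass(self):
        # Step (4): gossip with peers to fill in any records we never received.
        for lsn in self.missing_lsns():
            for peer in self.peers:
                rec = peer.records.get(lsn)
                if rec is not None:
                    self.receive(rec)
                    break
        # Steps (5)-(8) -- coalescing into data blocks, backup to S3, garbage
        # collection, and checksum scrubbing -- are omitted from this sketch.
```

The point of the sketch is that each storage node can detect and repair its own holes purely from state it already holds, gossiping with its peers rather than involving any coordinator.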
Changes to data blocks modify the image in the Aurora buffer cache and add the corresponding redo record to a log buffer. These are periodically flushed to a storage driver to be made durable. Inside the driver, they are shuffled to individual write buffers for each storage node storing segments for the data volume. The driver asynchronously issues writes, receives acknowledgments, and establishes consistency points.

Each log record stores the LSN of the preceding log record in the volume, the previous LSN for the segment, and the previous LSN for the block being modified. The block chain is used by the storage node to materialize individual blocks on demand. The segment chain is used by each storage node to identify records that it has not received and fill in these holes by gossiping with other storage nodes. The full log chain is not needed by an individual storage node but provides a fallback path to regenerate storage volume metadata in case of a disastrous loss of metadata state.

Many database systems boxcar redo log writes to improve throughput. There is a challenge in deciding, with each record, whether to issue the write, to improve latency, or to wait for subsequent records, to improve write efficiency and throughput. Waiting creates performance jitter since early requests entering the boxcar have to wait for later requests or a timeout to fill the request. Jitter is greatest under low load, when the boxcar times out.

In Aurora, there are many segments partitioning the redo log, and the opportunity to boxcar is lower than with a single unsegmented redo log. Aurora handles this by submitting the asynchronous network operation when it receives the first redo log record in the boxcar but continuing to fill the buffer until the network operation executes. This ensures requests are sent without boxcar latency and jitter while packing records together to minimize network packets.

In Aurora, all log writes, including those for commit redo log records, are sent asynchronously to storage nodes, processed asynchronously at the storage node, and asynchronously acknowledged back to the database instance.

2.3 Storage Consistency Points and Commits

A traditional relational database working with local disk would write a commit redo log record, boxcar commits together using group commit, and flush the log to ensure that it has been made durable. When working with remote storage, it might use a two-phase commit, a Paxos commit, or a variant to establish a consistency point, since there is no individual flush operation across all storage nodes. This is heavyweight and introduces stalls and jitter into the write path. Distributed commit protocols also have failure modalities different from those of quorum writes, making it complex to reason about availability and durability.

As a storage node receives new log records, it may locally advance a Segment Complete LSN (SCL), representing the latest point in time for which it knows it has received all log records. More precisely, SCL is the inclusive upper bound on log records continuously linked through the segment chain without gaps. SCL is used by storage nodes as a compact way to identify missing writes when gossiping with their peers in a protection group. Note that, since any given write may be lost for any reason, we need to tolerate missing writes in the storage nodes.

SCL is sent by the storage node as part of acknowledging a write. Once the database instance observes SCL advance at four of six members of the protection group, it is able to locally advance the Protection Group Complete LSN (PGCL), representing the point at which the protection group has made all writes durable. For example, Figure 3 shows a database with two protection groups, PG1 and PG2, consisting of segments A1-F1 and A2-F2 respectively. In the figure, each solid cell represents a log record acknowledged by a segment, with the odd numbered log records going to PG1 and the even numbered log records going to PG2. Here, PG1's PGCL is 103 because 105 has not met quorum, PG2's PGCL is 104 because 106 has not met quorum, and the database's VCL is 104, which is the highest point at which all previous log records have met quorum.

[Figure 3: Storage Consistency Points]

For a database, it is not enough for individual writes to be made durable; the entire log chain must be complete to ensure recoverability. The database instance also locally advances a Volume Complete LSN (VCL) once there are no pending writes preventing PGCL from advancing for one of its protection groups. No consensus is required to advance SCL, PGCL, or VCL – all that is required is bookkeeping by each individual storage node and local ephemeral state on the database instance based on the communication between the database and storage nodes.

This is possible because storage nodes do not have a vote in determining whether to accept a write; they must do so. Locking, transaction management, deadlocks, constraints, and other conditions that influence whether an operation may proceed are all resolved at the database tier. Processing offloaded to the Aurora storage nodes can progress by executing idempotent operations using local state. This also ensures that failed storage nodes can transparently be repaired without involving the database instance.

A commit is acknowledged by the database to its caller once it is able to affirm that all data modified by the transaction has been durably recorded. A simple way to do so is to ensure that the commit redo record for the transaction, or System Commit Number (SCN), is below VCL. No flush, consensus, or grouping is required.
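The following sketch shows how this bookkeeping might look in code. It is our illustration rather than Aurora's implementation: the function names are hypothetical, and it hard-codes the 4/6 write quorum and the two-protection-group layout from the Figure 3 example above.

```python
# Illustrative sketch, not Aurora's implementation. It derives PGCL and VCL from
# per-segment SCLs and checks whether a commit can be acknowledged; the 4/6
# quorum and the PG1/PG2 layout follow the Figure 3 example above.

WRITE_QUORUM = 4  # out of 6 segments per protection group

def pgcl(segment_scls, group_lsns):
    """Protection Group Complete LSN: highest LSN of this group such that it and
    every earlier LSN of the group has been acknowledged by a write quorum."""
    complete = 0
    for lsn in sorted(group_lsns):
        acks = sum(1 for scl in segment_scls if scl >= lsn)
        if acks < WRITE_QUORUM:
            break
        complete = lsn
    return complete

def vcl(pg_pgcls, pg_lsns):
    """Volume Complete LSN: highest LSN such that all prior log records,
    whichever protection group they were sent to, have met quorum."""
    owner = {lsn: pg for pg, lsns in pg_lsns.items() for lsn in lsns}
    complete = 0
    for lsn in sorted(owner):
        if pg_pgcls[owner[lsn]] < lsn:
            break
        complete = lsn
    return complete

def can_ack_commit(scn, volume_complete_lsn):
    # A commit may be acknowledged once its commit record (SCN) is at or below VCL.
    return scn <= volume_complete_lsn

# Figure 3 example: odd LSNs go to PG1, even LSNs to PG2.
pg_lsns = {"PG1": [101, 103, 105], "PG2": [102, 104, 106]}
scls = {"PG1": [105, 103, 103, 103, 101, 101],   # SCLs reported by segments A1..F1
        "PG2": [104, 104, 104, 104, 102, 102]}   # SCLs reported by segments A2..F2
pg_pgcls = {pg: pgcl(scls[pg], pg_lsns[pg]) for pg in pg_lsns}
print(pg_pgcls)                                     # {'PG1': 103, 'PG2': 104}
print(vcl(pg_pgcls, pg_lsns))                       # 104
print(can_ack_commit(103, vcl(pg_pgcls, pg_lsns)))  # True
```

Running the example reproduces the values in the figure: a PGCL of 103 for PG1 and 104 for PG2, a VCL of 104, and a commit with SCN 103 that is safe to acknowledge.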
Aurora must wait to acknowledge commits until it is able to advance VCL beyond the requesting SCN. Typically, this would require stalling the worker thread acting upon the user request. In Aurora, user sessions are multiplexed to worker threads as requests are received. When a commit is received, the worker thread writes the commit record, puts the transaction on a commit queue, and returns to a common task queue to find the next request to be processed. When a driver thread advances VCL, it wakes up a dedicated commit thread that scans the commit queue for SCNs below the new VCL and sends acknowledgements to the clients waiting for commit. There is no induced latency from group commits and no idle time for worker threads.

2.4 Crash Recovery in Aurora

Aurora is able to avoid distributed consensus during writes and commits by managing consistency points in the database instance rather than establishing consistency across multiple storage nodes. But, instances fail. Customers shut them down, resize them, and restore them to older points in time. The time we save in the normal forward processing of commits using local transient state must be paid back by re-establishing consistency upon crash recovery. This is a trade worth making since commits are many orders of magnitude more common than crashes. Since instance state is ephemeral, the Aurora database instance must be able to construct PGCLs and VCL from local SCL state at storage nodes.

[Figure: crash recovery – at crash, the log contains records and gaps; immediately after crash recovery, the Volume Complete LSN (VCL) has been re-established.]

and writes, Aurora increments an epoch in its storage metadata service and records this volume epoch in a write quorum of each protection group comprising the volume. The volume epoch is provided as part of every read or write request to a storage node. Storage nodes will not accept requests at stale volume epochs. This boxes out old instances with previously open connections from accessing the storage volume after crash recovery has occurred. Some systems use leases to establish short-term entitlements to access the system, but leases introduce latency when one needs to wait for expiry. Aurora, rather than waiting for a lease to expire, just changes the locks on the door.

No redo replay is required as part of crash recovery, since segments are able to generate data blocks on their own. Undo of previously active transactions is required, but can occur after the database has been opened, in parallel with user activity.

3 MAKING READS EFFICIENT

Reads are one of the few operations in Aurora where threads have to wait. Unlike writes, which can stream asynchronously to storage nodes, or commits, where a worker can move on to other work while waiting for storage to acknowledge, a thread needing a block not in cache typically must wait for the read I/O to complete before it can progress.

In a quorum system, the I/O required for a read is amplified by the size of the read quorum. Network traffic is far higher since one is reading full data blocks, unlike writes, where Aurora only ships log records. A buffer cache miss in Aurora's quorum model would seem to require a minimum of three read I/Os, and likely five, to mask outlier latency and intermittent unavailability. Read performance in quorum systems compares poorly to traditional replication models where one writes to all copies, enabling a read from just one, though those models have worse write availability.

or by finding the latest durable version of the block in one of the segments of the protection group that it belongs to.

Aurora does not do quorum reads. Through its bookkeeping of writes and consistency points, the database instance knows which segments have the last durable version of a data block and can request it directly from any of those segments. Avoiding the amplification of read quorums does make Aurora subject to latency when storage nodes are down or jitter when they are busy. We manage this by tracking response time from storage nodes for read requests. The database instance will usually issue a request to the segment with the lowest measured latency, but occasionally also query one of the others in parallel to ensure up-to-date read latency response times. If a request is taking longer than expected, Aurora will issue a read to another storage node and accept whichever one returns first. This caps the latency due to slow or unavailable segments. In an active system, this can be done without request timeouts by inspecting the list of outstanding requests when performing other I/Os.
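A minimal sketch of this read path is shown below. It is our illustration and not Aurora code: the segment client, the rolling latency estimate, and the 50 ms hedging threshold are all hypothetical; the only ideas taken from the text are reading from a single up-to-date segment, preferring the lowest observed latency, and issuing a second read when the first is slow.

```python
# Illustrative sketch, not Aurora code: read a block from one up-to-date segment,
# preferring the lowest observed latency, and hedge with a second request if the
# first is slow. Names and the hedge threshold are hypothetical.
import time
import concurrent.futures as futures

class SegmentClient:
    def __init__(self, name, read_fn):
        self.name = name
        self.read_fn = read_fn          # performs the actual network read
        self.avg_latency = 0.010        # rolling latency estimate, in seconds

    def read(self, block_id):
        start = time.monotonic()
        data = self.read_fn(block_id)
        sample = time.monotonic() - start
        self.avg_latency = 0.8 * self.avg_latency + 0.2 * sample  # track response time
        return data

def read_block(block_id, up_to_date_segments, hedge_after=0.050):
    """Issue the read to the lowest-latency segment; if it has not answered within
    hedge_after seconds, issue a second read and accept whichever returns first."""
    ranked = sorted(up_to_date_segments, key=lambda s: s.avg_latency)
    pool = futures.ThreadPoolExecutor(max_workers=2)
    try:
        pending = {pool.submit(ranked[0].read, block_id)}
        done, pending = futures.wait(pending, timeout=hedge_after)
        if not done:                                   # first choice is slow
            if len(ranked) > 1:
                pending.add(pool.submit(ranked[1].read, block_id))
            done, pending = futures.wait(pending, return_when=futures.FIRST_COMPLETED)
        return next(iter(done)).result()
    finally:
        pool.shutdown(wait=False)                      # do not block on a straggler
```

As the text notes, a production system would piggyback the slow-request check on other I/O activity rather than relying on a timer as this sketch does.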
3.2 Scaling Reads Using Read Replicas

Many database systems scale reads by replicating updates from a writer instance to a set of read replica instances. Typically, this involves transporting either logical statement updates or physical redo log records from the writer to the readers. Replication is done synchronously if the replicas are intended as failover targets without data loss, and asynchronously if replica lag or data loss during failover is acceptable.

Both synchronous and asynchronous replication have undesirable characteristics. Synchronous replication introduces performance jitter and failure modalities in the write path. Asynchronous replication introduces data loss on failure of the writer. In both cases, replication takes time to set up, requiring copying the underlying database volume and catching up on active changes. It is also expensive, since it doubles not only the instance costs, but also storage costs. Much of the throughput of the replica instance goes to replicating write activity, not to scaling reads.

Aurora supports logical replication to communicate with non-Aurora systems and in cases where the application does not want physical consistency – for example, when schemas differ. Internally, within an Aurora cluster, we use physical replication. Aurora read replicas attach to the same storage volume as the writer instance. They receive a physical redo log stream from the writer instance and use this to update only data blocks present in their local caches. Redo records for uncached blocks can be discarded, as they can be read from the shared storage volume.

This approach allows Aurora customers to quickly set up and tear down replicas in response to sharp demand spikes, since durable state is shared. Adding replicas does not change availability or durability characteristics, since durable state is independent from the number of instances accessing that state. There is little latency added to the write path on the writer instance since replication is asynchronous. Since we only update cached data blocks on the replicas, most resources on the replica remain available for read requests. And most importantly, if a commit has been marked durable and acknowledged to the client, there is no data loss when a replica is promoted to a write instance – it only needs to run a local crash recovery to align its in-memory state.

3.3 Structural Consistency in Aurora Replicas

Managing structural consistency with asynchronous operations against shared durable state requires care. A single writer has local state for all writes and can easily coordinate snapshot isolation, consistency points for storage, transaction ordering, and structural atomicity. It is more complex for replicas.

Aurora uses three invariants to manage replicas. First, replica read views must lag durability consistency points at the writer instance. This ensures that the writer and reader need not coordinate cache eviction. Second, structural changes to the database, for example B-Tree splits and merges, must be made visible to the replica atomically. This ensures consistency during block traversals. Third, read views on replicas must be anchorable to equivalent points in time on the writer instance. This ensures that snapshot isolation is preserved across the system.

To understand structural consistency on the replica, let us first examine structural consistency on the writer instance, using Aurora MySQL as an example. Each database transaction in Aurora MySQL is a sequence of ordered mini-transactions (MTRs) that are performed atomically. Each MTR is composed of changes to one or more data blocks, represented as a batch of sequenced redo log records to provide consistency of structural changes, such as those involving B-Tree splits. The database instance acquires latches for each data block, allocates a batch of contiguously ordered LSNs, generates the log records, issues a write, shards them into write buffers for each protection group associated with the blocks, and writes them to the various storage nodes for the segments in the protection group. We use an additional consistency point, the Volume Durable LSN (VDL), to represent the last LSN below VCL representing an MTR completion.

Replicas do not have the benefit of the latching used at the writer instance to prevent read requests from seeing non-atomic structural updates. To create equivalent ordering, we ensure that log records are only shipped from the writer instance in MTR chunks. At the replica, they must be applied in LSN order, applied only if above the VDL in the writer as seen in the replica, and applied atomically in MTR chunks to the subset of blocks in the cache. Read requests are made relative to VDL points to avoid seeing structurally inconsistent data.
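The sketch below illustrates this replica-side application rule. It is our illustration and not Aurora code: the record and cache structures are hypothetical, a single lock stands in for the replica's block latching, and the VDL-based gate from the text is abstracted behind a may_apply callback rather than spelled out.

```python
# Illustrative sketch, not Aurora code: apply redo on a read replica in atomic
# MTR chunks, in LSN order, and only to blocks already present in the cache.
import threading

class RedoRecord:
    def __init__(self, lsn, block_id, apply_fn):
        self.lsn = lsn
        self.block_id = block_id
        self.apply_fn = apply_fn        # mutates a cached block image in place

cache_lock = threading.Lock()           # stand-in for per-block latches

def apply_mtr_chunks(mtr_chunks, block_cache, may_apply, last_applied_lsn=0):
    """mtr_chunks: MTRs in LSN order, each an ordered list of RedoRecords.
    may_apply(mtr): the VDL-based gate described in the text above."""
    for mtr in mtr_chunks:
        if not may_apply(mtr):
            break
        assert mtr[0].lsn > last_applied_lsn, "MTR chunks must be applied in LSN order"
        with cache_lock:                # readers never observe a half-applied MTR
            for rec in mtr:
                block = block_cache.get(rec.block_id)
                if block is not None:   # records for uncached blocks are discarded
                    rec.apply_fn(block)
        last_applied_lsn = mtr[-1].lsn
    return last_applied_lsn
```

Reads on the replica are then anchored to VDL points, as the next subsection describes.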
3.4 Snapshot Isolation and Read View Anchors in Aurora Replicas

Once we have ensured that cached replica state is structurally consistent, allowing traversal of physical data structures, we must also ensure it is logically consistent using snapshot isolation.

The redo log seen by a read replica does not carry the state needed to establish SCL, PGCL, VCL, or VDL consistency points. Nor is the read replica in the communication path between the writer and storage nodes to establish this state on its own. Note that VDL advances based on acknowledgements from storage nodes, not redo issuance from the writer. The writer instance sends VDL update control records as part of its replication stream. Although the active transaction list can be reconstructed at the replica using redo records and VDL advancement, for efficiency reasons we ship commit notifications and maintain transaction commit history. Read views at the replica are built based on these VDL points and transaction commit history. Replicas revert active transactions for MVCC using undo, just as on the writer instance.

Since VDL on the replica may lag the writer, Aurora storage nodes must ensure that past values are available to be read. Aurora blocks are written out-of-place and non-destructively. Older versions are not garbage collected until we can assure that neither the writer instance nor any replica might need to access them. We do this by maintaining a Protection Group Minimum Read Point LSN (PGMRPL), representing the lowest LSN read point for any active request on that database instance. A storage node may only advance its garbage collection point once PGMRPL has advanced for all instances that have opened the volume. The storage nodes will only accept read requests between PGMRPL and SCL.

ensuring each transition is reversible. Each membership change to a protection group is associated with a membership epoch, which is monotonically incremented with each change. Membership changes do not block either reads or writes.

Each read or write request from an instance and each gossip request from a peer segment passes in an epoch based on the caller's current understanding of quorum membership. As with volume epochs, clients with stale membership epochs have their requests rejected and must update membership information. An epoch increment requires a write quorum to be met, just as any other write does. The request to increment the membership epoch must pass in the correct membership epoch, just as any other request does. As with our other epochs, membership epochs ensure we can update membership without complex consensus, fence out others without waiting for lease expiry, and operate using the same failure tolerance as quorum reads and writes themselves.

[Figure: membership epochs for segments A-F; Epoch 1: all nodes healthy.]
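A minimal sketch of the epoch check is shown below. It is our illustration rather than Aurora code: the class, the exception, and the way membership is stored are all hypothetical; the behavior it captures is simply that every request carries the caller's epoch, stale epochs are rejected, and an epoch increment is itself an epoch-checked write.

```python
# Illustrative sketch, not Aurora code: membership-epoch fencing at a segment.

class StaleEpoch(Exception):
    """The caller must refresh its view of quorum membership and retry."""

class Segment:
    def __init__(self, members=("A", "B", "C", "D", "E", "F")):
        self.membership_epoch = 1
        self.members = list(members)

    def _check(self, request_epoch):
        if request_epoch < self.membership_epoch:
            raise StaleEpoch(self.membership_epoch)   # no waiting on lease expiry

    def read(self, block_id, request_epoch):
        self._check(request_epoch)
        return ("block", block_id)                    # placeholder for the real read

    def change_membership(self, new_members, request_epoch):
        # The increment is itself an ordinary epoch-checked write; in the real
        # system it must also reach a write quorum of the protection group.
        self._check(request_epoch)
        self.membership_epoch += 1
        self.members = list(new_members)
        return self.membership_epoch

seg = Segment()
new_epoch = seg.change_membership(["A", "B", "C", "D", "E", "G"], request_epoch=1)
try:
    seg.read("block-7", request_epoch=1)              # stale caller is boxed out
except StaleEpoch:
    seg.read("block-7", request_epoch=new_epoch)      # refresh membership and retry
```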
sequence of errors and repairs may be. Transitions require only the single epoch update to the write quorum of a protection group. Updates of stale state are similarly simple, requiring just one additional request past the one rejected.

We also use epochs to manage volume growth, using a volume geometry epoch that increments with each protection group added to the volume. This can also be used to change the quorum model itself, for example, when moving from a 4/6 write quorum to 3/4 to handle the extended loss of an AZ.

4.2 Using Quorum Sets to Reduce Costs

Quorums are generally thought of as a collection of like members, grouped together to transparently handle failures. However, there is nothing in the quorum model to prevent unlike members with differing latency, cost, or durability characteristics.

In Aurora, a protection group is composed of three full segments, which store both redo log records and materialized data blocks, and three tail segments, which contain redo log records alone. Since most databases use much more space for data blocks than for redo logs, this yields a cost amplification closer to three copies of the data rather than a full six, while satisfying our requirement to support AZ+1 failures.

The use of full and tail segments changes how we construct our read and write sets. Our write quorum is 4/6 of any segment OR 3/3 of full segments. Our read quorum is therefore 3/6 of any segment AND 1/3 of full segments. In practice, this means that we write log records to the same 4/6 quorum as we did previously. At least one of these log records arrives at a full segment and generates a data block. We read data from our full segments, using the optimization described earlier to avoid quorum reads.

Repairing a tail segment simply requires reading from the other members of the protection group, using our SCL to determine and fill in the gaps from other quorum members with SCLs higher than our own. Repairing a full segment is a bit more complex, since the segment being repaired may have been the only full segment that saw the last write to the protection group.

Even so, we must have at least one other full segment from which we can read data blocks, even if it has not seen the most recent write. We have enough copies of the redo log record so that we can rebuild a full segment and be up to date. We also gossip between the segments of a quorum to ensure that any missing writes are quickly filled in. This reduces the probability that we need to rebuild a full segment without adding a performance burden to our write path. Once we have our full segment baseline, we can obtain redo log records from other segments using our SCL, in the same manner as tail segments.

There are many options available once one moves to quorum sets of unlike members. One can combine local disks to reduce latency and remote disks for durability and availability. One can combine SSDs for performance and HDDs for cost. One can span quorums across regions to improve disaster recovery. There are numerous moving parts that one needs to get right, but the payoffs can be significant. For Aurora, the quorum set model described earlier lets us achieve storage prices comparable to low-cost alternatives, while providing high durability, availability, and performance.

5 RELATED WORK

In this section we discuss other contributions and how they relate to the techniques used in Aurora and discussed in this paper.

Consensus and Distributed Transactions. Distributed systems rely on consensus to allow a group of processes to agree on a single value and tolerate faults in one or more of its members. Some notable consensus algorithms include Paxos and variants [4, 5], Raft [9], and Viewstamped Replication [8]. A distributed database requires a commit protocol that enforces that all processes start out in a "working" state and all either end in an "aborted" or "committed" state. Distributed commit may be implemented using consensus protocols such as Paxos or other approaches like 2-phase commit, and can incur considerable network overheads. Another recent system that avoids the use of distributed commit is Calvin [11], which implements a transaction scheduling and data replication layer that uses a deterministic ordering guarantee. Since all nodes reach an agreement regarding what transactions to attempt and in what order, Calvin is able to completely avoid distributed commit protocols, reducing the contention footprints of distributed transactions.

Quorums. Quorum-based approaches have been used for distributed commit protocols [10] as well as for replicating data [3].

Distributed SQL Databases. Google Cloud Spanner [1] is a SQL database on a quorum replicated system, using Multi-Paxos to establish consensus for every write, providing strong consistency guarantees. Cloud Spanner enables clustering of tables to reduce the participants in distributed transactions.

Replication. Traditional database replication techniques consume a physical or logical log that represents changes made in the database and replicate these changes in a completely independent database. For example, Liu et al. [6] describe how DB2 implements transactional replication from a partitioned database system by combining the physical write-ahead log from each node. Oracle uses physical replication via Data Guard [2] to provide high availability and disaster recovery. Some database systems like MySQL support logical replication [7] using command/statement logging [13].

6 CONCLUSIONS

Aurora avoids considerable network, storage, and database processing by leveraging a few simple techniques to avoid complex, brittle, and expensive consensus protocols. Most distributed consensus algorithms abhor state and establish their baseline from first principles. But, databases are all about the management of state. Why not use it for our own benefit?

Aurora is able to avoid much of the work of consensus by recognizing that, during normal forward processing of a system, there are local oases of consistency. Using backward chaining of redo records, a storage node can tell if it is missing data and gossip with its peers to fill in gaps. Using the advancement of segment chains, a database instance can determine whether it can advance durable points and reply to clients requesting commits. Coordination and