Lecture 7.2 Consistency

The document discusses the concept of consistency in distributed systems, particularly in the context of ACID transactions and replication, highlighting different consistency models such as strict, sequential, causal, and linearizability. It introduces the CAP theorem, which states that in a distributed data store, one can only achieve two out of three guarantees: consistency, availability, and partition tolerance. Additionally, it covers client-centric consistency models and specific types like monotonic reads and writes, emphasizing their importance in managing data consistency across distributed environments.
Copyright © All Rights Reserved

• Consistency may mean different things in different contexts

• In the context of ACID transactions, C:


o A transaction transforms a database from one “consistent” state
to another
o Here, “consistent” = satisfying application-specific invariants
§ E.g., “Transferring money from a checking account to a
savings account in the same bank should not change the
total amount of money”
• In the context of replication, consistency refers to a relationship
between replicas
o Often called a consistency contract
§ Helps us reason about how to handle complex situations
§ E.g., concurrency, failures, RPC retransmission, leader
changes, etc.
o To keep replicas consistent, we generally need to ensure that all
conflicting operations are done "correctly" everywhere
o Conflicting operations (from the world of transactions)
§ Read-write conflict: a read operation and a write operation
act concurrently
§ Write-write conflict: two concurrent write operations
o Guaranteeing a global ordering of conflicting operations can be
costly and limits scalability
§ Solution: weaken consistency requirements
• Many to choose from
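The two kinds of conflict above can be captured by a small predicate. This is a minimal sketch, not part of the lecture: the `Op` record and the `conflict_type` function are hypothetical, and the pair of operations is assumed to already be known to execute concurrently (deciding concurrency is a separate problem).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Op:
    # Hypothetical operation record: issuing process, kind ("R" or "W"), data item.
    process: str
    kind: str
    item: str


def conflict_type(a: Op, b: Op) -> Optional[str]:
    """Classify two operations that are assumed to execute concurrently."""
    if a.item != b.item:
        return None  # operations on different data items never conflict
    if a.kind == "W" and b.kind == "W":
        return "write-write"
    if "W" in (a.kind, b.kind):
        return "read-write"
    return None  # two reads of the same item do not conflict


print(conflict_type(Op("P1", "W", "x"), Op("P2", "W", "x")))  # write-write
print(conflict_type(Op("P1", "R", "x"), Op("P2", "W", "x")))  # read-write
print(conflict_type(Op("P1", "R", "x"), Op("P2", "R", "x")))  # None
print(conflict_type(Op("P1", "W", "x"), Op("P2", "W", "y")))  # None
```

Pairs for which the function returns `None` can be applied in either order at each replica without changing the outcome; only the conflicting pairs need a global ordering.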

CAP Theorem
• Proposed by Eric Brewer (late 90s)
o Later proved by Gilbert and Lynch (2002)
• In a distributed data store, we can satisfy at most 2 out of the 3
guarantees:
o Consistency: all nodes see the same data at any time, i.e., reads return
the latest value written by any client
§ E.g., Bank transactions (write) should be propagated to all
the replicas before subsequent write operations
o Availability: the system allows operations all the time, and
operations return quickly
§ E.g., At Amazon, each added millisecond of latency implies
a $6M yearly loss (2009)
o Partition-tolerance: the system continues to work in spite of
network partitions
§ E.g., Internet router outages
• Traditional RDBMSs
o Not replicated, so partition tolerance is not a concern
§ Provide strong consistency and availability
• For replicated storage systems, partition tolerance is essential
o So, a replicated system has to choose between consistency and
availability
o Cassandra, Dynamo
§ Partition-tolerance, Availability, Eventual (weak)
consistency
o BigTable, Spanner
§ Partition-tolerance, Consistency
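The consistency-versus-availability choice can be illustrated with a toy two-replica register. This is a hypothetical sketch, not how Cassandra, Dynamo, BigTable, or Spanner are built: the class name, the `partitioned` flag, and the two modes are assumptions made for illustration. During a partition the "CP" variant refuses operations, while the "AP" variant keeps answering from its local copy, possibly with stale data.

```python
class Unavailable(Exception):
    """Raised when the CP variant refuses an operation during a partition."""


class TwoReplicaStore:
    """Toy register replicated at replicas "A" and "B" (hypothetical, not a real system)."""

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = {"A": None, "B": None}
        self.partitioned = False  # True means A and B cannot exchange messages

    def write(self, at: str, value) -> None:
        if not self.partitioned:
            # Link is up: update every replica before acknowledging the write.
            for name in self.replicas:
                self.replicas[name] = value
        elif self.mode == "CP":
            # Cannot reach the other replica: refuse rather than let copies diverge.
            raise Unavailable(f"write at {at} rejected during partition")
        else:
            # AP: accept locally; the other copy stays stale until the partition heals.
            self.replicas[at] = value

    def read(self, at: str):
        if self.partitioned and self.mode == "CP":
            # The local copy might be stale, so the CP variant refuses to answer.
            raise Unavailable(f"read at {at} rejected during partition")
        return self.replicas[at]


ap = TwoReplicaStore("AP")
ap.write("A", 1)
ap.partitioned = True
ap.write("A", 2)
print(ap.read("B"))  # 1: available, but stale

cp = TwoReplicaStore("CP")
cp.write("A", 1)
cp.partitioned = True
try:
    cp.read("B")
except Unavailable as err:
    print("unavailable:", err)  # unavailable: read at B rejected during partition
```

Quorum-based CP systems typically let a majority of replicas keep serving during a partition; with only two replicas there is no majority, so this toy simply refuses. Healing the partition and reconciling the copies is not modelled.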
Consistency model: a contract between a (distributed) data store and
processes, in which the data store specifies what the results of read and
write operations are in the presence of concurrency

Data-Centric Consistency Models

Strict Consistency
• Any read on a data item ‘x’ returns a value corresponding to the result
of the most recent write on ‘x’ (regardless of where the write
occurred)
• All writes are instantaneously visible to all processes and absolute
global time order is maintained throughout the distributed system
o Not easy to achieve in real world
Example:

• Each data item is initially NIL


• P1 does a write to a data item x, modifying its value to a
• Operation W1(x)a is first performed on a copy of the data store that is
local to P1, and is then propagated to the other local copies
NOTE: it takes some time to propagate the update of x to P2, which is okay

Behavior of two processes, operating on the same data item:


• Figure a: a strictly consistent data-store
• Figure b: a data-store that is not strictly consistent

Sequential consistency

• Defined by Lamport (1979)


o The result of any execution is the same as if the operations of
all processes were executed in some sequential order, and the
operations of each individual process appear in this sequence in
the order specified by its program
• When processes run concurrently on (possibly) different machines, any
valid interleaving of read and write operations is acceptable behavior,
but all processes see the same interleaving of operations
(a) A sequentially consistent data store (b) A data store that is not
sequentially consistent
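Lamport's definition suggests a direct, if exponential, check: search for one interleaving that keeps each process's program order and in which every read returns the most recent write. The sketch below is an illustration under assumptions, not a practical tool; the operation encoding and both example histories are hypothetical (P3 and P4 observe the writes of P1 and P2 in the same order in `ok`, and in opposite orders in `bad`).

```python
from typing import Dict, Iterator, List, Optional, Tuple

# An operation is (kind, item, value): ("W", "x", "a") writes a, ("R", "x", "a") reads a.
# A read of an item that was never written returns None (the notes' NIL).
Op = Tuple[str, str, Optional[str]]


def interleavings(seqs: List[List[Op]]) -> Iterator[List[Op]]:
    """Yield every merge of the sequences that preserves each sequence's own order."""
    if all(not s for s in seqs):
        yield []
        return
    for i, s in enumerate(seqs):
        if s:
            rest = seqs[:i] + [s[1:]] + seqs[i + 1:]
            for tail in interleavings(rest):
                yield [s[0]] + tail


def legal(order: List[Op]) -> bool:
    """In one sequential run, every read must return the latest write to its item."""
    store: Dict[str, Optional[str]] = {}
    for kind, item, value in order:
        if kind == "W":
            store[item] = value
        elif store.get(item) != value:
            return False
    return True


def is_sequentially_consistent(programs: Dict[str, List[Op]]) -> bool:
    """Search for one legal interleaving that respects every process's program order."""
    return any(legal(order) for order in interleavings(list(programs.values())))


ok = {
    "P1": [("W", "x", "a")],
    "P2": [("W", "x", "b")],
    "P3": [("R", "x", "b"), ("R", "x", "a")],
    "P4": [("R", "x", "b"), ("R", "x", "a")],
}
bad = dict(ok, P4=[("R", "x", "a"), ("R", "x", "b")])
print(is_sequentially_consistent(ok))   # True
print(is_sequentially_consistent(bad))  # False
```

In `bad`, P3 requires W(x)b to precede W(x)a while P4 requires the opposite, so no single interleaving satisfies both.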

Causal consistency

• Defined by Hutto and Ahamad (1990)


o Writes that are potentially causally related must be seen by all
processes in the same order
o Concurrent writes may be seen in a different order by different
processes

• Writes W(x)b and W(x)c are concurrent, so it is not required that all
processes see them in the same order
• W(x)b has a causal relationship with W(x)a but not with W(x)c
• Because W(x)a and W(x)b are causally related, all processes must see
them in the same order

• Figure (a) is incorrect because it violates the ordering required by causality


• On the other hand, in Figure (b) the read has been removed, so W(x)a
and W(x)b are now concurrent writes
• A causally consistent store does not require concurrent writes to be
globally ordered, so Figure (b) is correct

Note: Figure (b) reflects a situation that would not be acceptable for a
sequentially consistent store
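The notes do not say how a store decides whether two writes are causally related or concurrent. One common technique, assumed here rather than taken from the lecture, is to stamp each write with a vector clock: a write causally precedes another if its clock is less than or equal in every entry and strictly less in at least one; if neither precedes the other, they are concurrent. The `Process` class and its scenario are hypothetical and mirror the situation described above, where W(x)b depends on W(x)a but not on W(x)c.

```python
from typing import Dict

VectorClock = Dict[str, int]  # process name -> number of events counted from that process


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """a -> b iff a <= b in every entry and a < b in at least one (missing entries are 0)."""
    keys = a.keys() | b.keys()
    return (all(a.get(k, 0) <= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) < b.get(k, 0) for k in keys))


def relation(a: VectorClock, b: VectorClock) -> str:
    if happened_before(a, b):
        return "causal: a before b"
    if happened_before(b, a):
        return "causal: b before a"
    return "concurrent"


class Process:
    """Hypothetical process that stamps its writes with a vector clock."""

    def __init__(self, name: str):
        self.name = name
        self.clock: VectorClock = {}

    def write(self) -> VectorClock:
        # Each write is a new local event: bump our own entry, then stamp the write.
        self.clock[self.name] = self.clock.get(self.name, 0) + 1
        return dict(self.clock)

    def read(self, stamp: VectorClock) -> None:
        # Having read a value, our later writes may depend on it: merge its stamp.
        for proc, count in stamp.items():
            self.clock[proc] = max(self.clock.get(proc, 0), count)


p1, p2 = Process("P1"), Process("P2")
wa = p1.write()  # P1: W(x)a
p2.read(wa)      # P2: R(x)a
wb = p2.write()  # P2: W(x)b, potentially caused by W(x)a
wc = p1.write()  # P1: W(x)c, does not depend on W(x)b
print(relation(wa, wb))  # causal: a before b
print(relation(wb, wc))  # concurrent
```

A causally consistent replica could use such stamps to delay applying a write until every write that causally precedes it has been applied, while applying concurrent writes in any order.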
Linearizability
• "Linearizability" is the most common and intuitive definition; it
formalizes the behavior expected of a single server ("strong" consistency)
• An execution history is linearizable:
o If one can find a total order of all operations, that matches
real-time (for non-overlapping ops), and
o In which each read sees the value from the write preceding it in
the order
Note: A history is a record of client operations, each with arguments, return
value, invocation time, and completion time. A history is usually a trace of
what clients saw in an actual execution

Example history 1
|-W(x)1-| |-W(x)2-|
|---R(x)2---|
|-R(x)1-|
• Constraint arrows:
o The order obeys value constraints (W -> R)
o The order obeys real-time constraints (W(x)1 -> W(x)2)
• This order satisfies the constraints:
o W(x)1 R(x)1 W(x)2 R(x)2
• So, the history is linearizable

Note:
• The definition is based on external behavior
• So, we can apply it without having to know how the service works
• Histories explicitly incorporate concurrency in the form of
overlapping operations
• This makes them a good match for how distributed systems operate

Example history 2
|-W(x)1-| |-W(x)2-|
|--R(x)2--|
|-R(x)1-|
• Constraint arrows:
o W(x)1 before W(x)2 (time)
o W(x)2 before R(x)2 (value)
o R(x)2 before R(x)1 (time)
o R(x)1 before W(x)2 (value)
• There's a cycle
o So it cannot be turned into a linear order
• So, this history is not linearizable
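The procedure used in these examples (find a total order that respects real time and in which every read sees the preceding write) can be brute-forced for a single register. This is a minimal sketch assuming each operation is recorded with hypothetical numeric invocation and completion times; since the timelines above lost their spacing, the times below are invented so that they satisfy the constraints listed for histories 1 and 2.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Optional, Sequence


@dataclass(frozen=True)
class Op:
    kind: str             # "W" (write) or "R" (read)
    value: Optional[int]  # value written, or value the read returned
    start: float          # invocation time
    end: float            # completion time


def respects_real_time(order: Sequence[Op]) -> bool:
    # If an op completed before another was invoked, it must come first in the order.
    for i in range(len(order)):
        for j in range(i + 1, len(order)):
            if order[j].end < order[i].start:
                return False
    return True


def reads_see_latest_write(order: Sequence[Op], initial: Optional[int] = None) -> bool:
    # Replay the order against a single register; each read must see the latest write.
    current = initial
    for op in order:
        if op.kind == "W":
            current = op.value
        elif op.value != current:
            return False
    return True


def is_linearizable(history: Sequence[Op], initial: Optional[int] = None) -> bool:
    # Brute force: try every total order of the operations.
    return any(respects_real_time(order) and reads_see_latest_write(order, initial)
               for order in permutations(history))


# Hypothetical times chosen to match the constraints listed for histories 1 and 2.
history1 = [Op("W", 1, 0, 2), Op("W", 2, 3, 5), Op("R", 2, 1, 6), Op("R", 1, 1, 4)]
history2 = [Op("W", 1, 0, 2), Op("W", 2, 3, 10), Op("R", 2, 1, 5), Op("R", 1, 6, 8)]
print(is_linearizable(history1))  # True
print(is_linearizable(history2))  # False
```

Enumerating permutations grows factorially, so this only works for a handful of operations; dedicated linearizability checkers prune the search rather than trying every order.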

Example history 3
|--W(x)0--| |--W(x)1--|
|--W(x)2--|
|-R(x)2-| |-R(x)1-|
• Order: W(x)0 W(x)2 R(x)2 W(x)1 R(x)1
• So, the history is linearizable
• Because W(x)1 and W(x)2 overlap, the service can pick either order for
these concurrent writes
Example history 4
|--W(x)0--| |--W(x)1--|
|--W(x)2--|
C1: |-R(x)2-| |-R(x)1-|
C2: |-R(x)1-| |-R(x)2-|
Constraints:
• W(x)2 then C1: R(x)2 (value)
• C1: R(x)2 then W(x)1 (value)
• W(x)1 then C2: R(x)1 (value)
• C2: R(x)1 then W(x)2 (value)
• Cycle! so not linearizable
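The constraint-arrow argument used for histories 2, 4, and 5 amounts to asking whether a directed graph has a cycle. A sketch using Python's standard `graphlib` module (Python 3.9+): the arrows are transcribed by hand, and `linear_order` is a hypothetical helper, not part of any linearizability tool. It only checks the arrows it is given, so every value constraint must be listed, including "a read of 1 must come before the write of 2"; that is why history 1 below carries one more arrow than the list given for it earlier.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str]  # (must come before, must come after)


def linear_order(edges: List[Edge]) -> Optional[List[str]]:
    """Return one total order obeying every arrow, or None if the arrows form a cycle."""
    predecessors: Dict[str, Set[str]] = {}
    for before, after in edges:
        predecessors.setdefault(after, set()).add(before)
        predecessors.setdefault(before, set())
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return None


history4 = [
    ("W(x)2", "C1:R(x)2"),  # value
    ("C1:R(x)2", "W(x)1"),  # value
    ("W(x)1", "C2:R(x)1"),  # value
    ("C2:R(x)1", "W(x)2"),  # value
]
history1 = [
    ("W(x)1", "R(x)1"),  # value
    ("R(x)1", "W(x)2"),  # value: R(x)1 must not see W(x)2
    ("W(x)2", "R(x)2"),  # value
    ("W(x)1", "W(x)2"),  # real time
]
print(linear_order(history4))  # None: cycle, so not linearizable
print(linear_order(history1))  # ['W(x)1', 'R(x)1', 'W(x)2', 'R(x)2']
```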

Example history 5
|-W(x)1-|
|-W(x)2-|
|-R(x)1-|
• Constraints:
o W(x)2 before R(x)1 (time)
o R(x)1 before W(x)2 (value)
o Or: time constraints mean only possible order is W(x)1 W(x)2
R(x)1
• There's a cycle; not linearizable

Example history 6
• Suppose clients re-send requests if they don't get a reply
• In case it was the response that was lost:
o Leader remembers client requests it has already seen
o If it sees a duplicate, it replies with the saved response from the first
execution
• But this may yield a saved value from long time ago
o A stale value!

What does linearizability say?


C1: |-W(x)3-| |-W(x)4-|
C2: |-R(x)3-------------|
order: W(x)3 R(x)3 W(x)4
• So: returning either the old, saved value 3 or the newer value 4 is correct,
because C2's read overlaps W(x)4
• In practice, people are often (but not always) willing to live with
stale data in return for higher performance
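The duplicate-detection scheme from history 6 can be sketched as a table keyed by client and request ID. This is a hypothetical single-register leader, not code from any particular system; the key format, the `handle` method, and the response values are assumptions. The scenario replays the timeline above: C2's read first returns 3, its reply is lost, C1 writes 4, and C2's retransmission receives the saved value 3.

```python
class Leader:
    """Hypothetical single-register leader that de-duplicates retransmitted requests."""

    def __init__(self):
        self.value = None
        self.saved = {}  # (client_id, request_id) -> response from the first execution

    def handle(self, client_id: str, request_id: int, op: str, arg=None):
        key = (client_id, request_id)
        if key in self.saved:
            # Duplicate: do not re-execute; replay the saved (possibly stale) response.
            return self.saved[key]
        if op == "W":
            self.value = arg
            response = "ok"
        elif op == "R":
            response = self.value
        else:
            raise ValueError(f"unknown op {op!r}")
        self.saved[key] = response
        return response


leader = Leader()
leader.handle("C1", 1, "W", 3)
first = leader.handle("C2", 1, "R")  # 3; suppose this reply is lost
leader.handle("C1", 2, "W", 4)
retry = leader.handle("C2", 1, "R")  # retransmission gets the saved value
print(first, retry, leader.value)    # 3 3 4
```

A real implementation would also need to discard table entries once a client can no longer retransmit them; this sketch keeps them forever.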

Client-Centric Consistency Models


• Data-Centric consistency models maintain a consistent (globally
accessible) data-store in the presence of concurrent read/write
operations
• Client-centric consistency models instead describe the view of the data store
seen by the individual client process that is currently operating on it
Question: How fast should updates (writes) be made available to read-only
processes?
• Many data storage systems are mostly read
o E.g., DNS: each domain is updated by a single authority, so write-write
conflicts do not occur
o Such systems tolerate a relatively high degree of inconsistency, with
the replicas gradually becoming consistent over time
• Example: Consistency for mobile users
o Consider a distributed database to which you have access through
your notebook
o At location A you access the database doing reads and updates
o At location B you continue your work, but unless you access the
same server as the one at location A, you may detect
inconsistencies
o The problem can be alleviated by introducing client-centric
consistency
o In essence, client-centric consistency provides guarantees for a
single client concerning the consistency of accesses to a data
store by that client
o No guarantees are given concerning concurrent accesses by
different clients
§ If Bob modifies data that is shared with Alice but which is
stored at a different location, we may easily create write-
write conflicts

• Client-centric consistency models originate from the work on Bayou
[see, for example, Terry et al. (1994) and Terry et al. (1998)]

Bayou distinguishes four different consistency models:


1. Monotonic reads
2. Monotonic writes
3. Read your writes
4. Writes follow reads
Only the first two models are described here.
Notation:
• xi denotes the version of data item x at local copy Li at time t
• WS(xi) is the set of write operations at Li that lead to version xi of x
• If operations in WS(xi) have also been performed at local copy Lj at a
later time t2, we write WS(xi;xj)
• If we do not know if xj follows from xi, we use the notation WS(xi|xj)

Monotonic Reads
• If a process reads the value of a data item x, any successive read
operation on x by that process will always return that same or a more
recent value
• In our example:
o The read operations performed by a single process P at two
different local copies of the same data store
o Local data stores are L1 and L2

(a) A monotonic-read consistent data store. (b) A data store that does not
provide monotonic reads

• In Figure (a) process P1 first performs a write operation on x at L1,
producing version x1 and later reads this version
o At L2 process P2 first produces version x2, following from x1
o When process P1 moves to L2 and reads x again, it finds a more
recent value, but one that at least took its previous write into
account
• Figure (b) shows a situation in which monotonic-read consistency is
violated
o After process P1 has read x1 at L1, it later performs the
operation R1(x2) at L2
o However, the preceding write operation W2(x1|x2) by process P2 at
L2 is known to produce a version that does not follow from x1
o As a consequence, P1’s read operation at L2 is known not to
include the effect of the write operations that P1 had already seen
when it performed R1(x1) at location L1
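Under the notation above, "xj follows from xi" (WS(xi;xj)) can be modelled as set inclusion between write sets. This is a minimal sketch under that modelling assumption: each version is represented by a frozen set of hypothetical write IDs, and WS(xi|xj) corresponds to WS(xi) not being a subset of WS(xj). The two calls mirror Figures (a) and (b) as described above.

```python
from typing import FrozenSet, List

# A version xi is modelled by WS(xi): the set of (hypothetical) write IDs behind it.
Version = FrozenSet[str]


def follows(earlier: Version, later: Version) -> bool:
    """WS(xi;xj): every write that produced xi is reflected in xj."""
    return earlier <= later


def monotonic_reads(versions_read: List[Version]) -> bool:
    """Check the versions of x one process read, in the order it read them.
    Checking neighbours suffices because set inclusion is transitive."""
    return all(follows(prev, nxt) for prev, nxt in zip(versions_read, versions_read[1:]))


x1 = frozenset({"w1"})                 # produced at L1
x2_after_x1 = frozenset({"w1", "w2"})  # like W2(x1;x2) at L2
x2_concurrent = frozenset({"w2"})      # like W2(x1|x2) at L2
print(monotonic_reads([x1, x2_after_x1]))    # True, as in Figure (a)
print(monotonic_reads([x1, x2_concurrent]))  # False, as in Figure (b)
```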

Monotonic Writes
• A write operation by a process on a data item x is completed before any
successive write operation on x by the same process
• More formally, if we have two successive operations Wk(xi) and Wk(xj) by
process Pk, then, regardless of where Wk(xj) takes place, we also have
WS(xi;xj)
• Thus, completing a write operation means that the copy on which a
successive operation is performed reflects the effect of a previous
write operation by the same process, no matter where that operation was
initiated
o In other words, a write operation on a copy of item x is
performed only if that copy has been brought up to date by means
of any preceding write operation by that same process, which may
have taken place on other copies of x
• If need be, the new write must wait for old ones to finish

(a) A monotonic-write consistent data store. (b) A data store that does not
provide monotonic-write consistency. (c) Again, no consistency as WS(x1|x2)
and thus also WS(x1|x3) (d) Consistent as WS(x1;x3) although x1 has apparently
overwritten x2.

• In Figure (a) process P1 performs a write operation on x at L1,
presented as the operation W1(x1)
o Later, P1 performs another write operation on x, but this time at
L2, shown as W1(x2; x3)
o The version produced by P1 at L2 follows from an update by
process P2, which in turn is based on version x1
o The latter is expressed by the operation W2(x1; x2)
o To ensure monotonic-write consistency, it is necessary that the
previous write operation at L1 has already been propagated to L2,
and possibly updated
• In contrast, Figure (b) shows a situation in which monotonic-write
consistency is not guaranteed
o Compared to Figure (a) what is missing is the propagation of x1 to
L2 before another version of x is produced, expressed by the
operation W2(x1|x2)
o In this case, process P2 produced a concurrent version to x1,
after which process P1 simply produces version x3, but again
concurrently to x1
• Only slightly more subtle, but still violating monotonic write
consistency, is the situation sketched in Figure (c)
o Process P1 now produces version x3 which follows from x2
o However, because x2 does not incorporate the write operations that
led to x1, that is, WS(x1|x2), we also have WS(x1|x3)
• An interesting case is shown in Figure (d)
o The operation W2(x1|x2) produces version x2 concurrently to x1
o However, later process P1 produces version x3, but apparently
based on the fact that version x1 had become available at L2
o How and when x1 was transferred to L2 is left unspecified, but in
any case, a write-write conflict was created with version x2 and
resolved in favor of x1
o A consequence is that the situation shown in Figure (d) follows
the rules for monotonic-write consistency
o Note, however, that any subsequent write by process P2 at L2
(without having read version x1) will immediately violate
consistency again
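The same write-set model gives a check for monotonic writes: each write by a process must be applied to a copy whose write set already contains that process's previous write and everything that write was based on. A sketch under stated assumptions: the `Write` record, its `base` field (the WS of the copy the write is applied to), and the write IDs are hypothetical; the three histories mirror Figures (a), (c), and (d) as described above, listed in the order the writes were performed.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Write:
    process: str          # issuing process, e.g. "P1"
    write_id: str         # hypothetical unique label for this write
    base: FrozenSet[str]  # WS of the copy the write is applied to


def monotonic_writes(history: List[Write]) -> bool:
    """Each write must build on a copy that already reflects the same
    process's previous write, i.e. WS(xi;xj) in the notes' notation."""
    last: Dict[str, Write] = {}
    for w in history:
        prev = last.get(w.process)
        if prev is not None:
            produced_by_prev = prev.base | {prev.write_id}  # WS of the version prev produced
            if not produced_by_prev <= w.base:
                return False
        last[w.process] = w
    return True


EMPTY: FrozenSet[str] = frozenset()
fig_a = [Write("P1", "w1", EMPTY),                     # W1(x1) at L1
         Write("P2", "w2", frozenset({"w1"})),         # W2(x1;x2) at L2
         Write("P1", "w3", frozenset({"w1", "w2"}))]   # W1(x2;x3) at L2
fig_c = [Write("P1", "w1", EMPTY),
         Write("P2", "w2", EMPTY),                     # W2(x1|x2): x1 not yet at L2
         Write("P1", "w3", frozenset({"w2"}))]         # x3 follows x2 only
fig_d = [Write("P1", "w1", EMPTY),
         Write("P2", "w2", EMPTY),                     # W2(x1|x2)
         Write("P1", "w3", frozenset({"w1"}))]         # x1 reached L2 before w3
print(monotonic_writes(fig_a), monotonic_writes(fig_c), monotonic_writes(fig_d))  # True False True
```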
Optional Read:
• https://jepsen.io/consistency
• Linearizability vs serializability
• Testing Distributed Systems for Linearizability
