
University of Pennsylvania

CIS 5050
Software Systems

Linh Thi Xuan Phan

Department of Computer and Information Science


University of Pennsylvania

Lecture 13: Fault tolerance


March 28, 2024

©2016-2024 Linh Thi Xuan Phan


Announcements
• Each team please email me the GitHub IDs of your
team members

• Needed for creating the team GitHub repository

2
©2016-2024 Linh Thi Xuan Phan
Recall: Replication

• How does replication help?


– Durability: Once data is stored, the system should not lose it
– Availability: Data should be accessible whenever we need it
– Speed: Accessing the data should be fast
– Scalability: Distribute load and functionality across machines
• What do we need for replication?
– A replication protocol (how to propagate updates to all replicas)
– A consistency model (what is the ‘right’ value to return to the client)
• Are there any new challenges with replication?
3
©2016-2024 Linh Thi Xuan Phan
Plan for today
• Distributed commit NEXT

– Two-phase commit (2PC)


– Three-phase commit (3PC)

• Logging and recovery


– Centralized checkpointing
– Chandy-Lamport algorithm

4
©2016-2024 Linh Thi Xuan Phan
Why distributed commit?
• Suppose a large bank is operating a ‘shared’
account database
– Example: Node #1 has account data for customers whose first
names start with A, node #2 has B, node #3 C, ...

• Now suppose Alice wants to send $100 to Bob


– This involves changes on two separate nodes: Adding $100 to Bob's
account (node #2), and taking $100 from Alice's account (node #1)
– What if node #2 already finished the former, and then node #1
notices that there isn't enough money in Alice's account?
– What if node #1 finishes its part, but node #2 crashes before it can
complete the operation?
– ...

5
©2016-2024 Linh Thi Xuan Phan
Atomicity
• Goal: We need to ensure atomicity
– Either all parts of the transaction are completed, or none of them is!
– This is one of the four classical ACID properties from databases
• Atomicity, Consistency, Isolation, Durability

• How can we ensure atomicity?

• Idea: Let's do one-phase commit


– We elect one node as the coordinator; the others are subordinates
– The coordinator tells all the subordinates whether to finalize their part of
the transaction (commit), or whether to undo it (abort)
– The subordinates do what they are told, and then acknowledge

• Is this a good solution?
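Before answering, here is what the naive one-phase protocol would look like, written in the same pseudocode style as the 2PC listings later in this lecture (the helper names are illustrative):

void onePhaseCoordinator(Transaction t, Set nodes, String decision)
{
  foreach (n : nodes)
    send(n, decision);        // "COMMIT" or "ABORT"; there is no voting round and no log
  waitForAcks(nodes);         // subordinates simply obey and acknowledge
}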

6
©2016-2024 Linh Thi Xuan Phan
Why one-phase commit fails
[Figure: The coordinator sends COMMIT to subordinates A, B, and C, but crashes in the middle. One subordinate acknowledges with "OK!", another protests "But I already aborted!", and a third never learns the decision ("???").]
• Problem #1: Subordinate cannot independently abort the transaction
• Problem #2: Coordinator might crash before subordinate receives the message; partial results are lost

7
©2016-2024 Linh Thi Xuan Phan
Two-Phase Commit (2PC)
• Idea: We need two rounds of communication!

• First round: Voting


– Coordinator sends prepare message to each subordinate
– Each subordinate responds with yes if it is able to commit its part of
the transaction, otherwise with no
• If the subordinate needs locks to commit, it needs to acquire them before
responding with yes (why?)

• Second round: Decision


– Coordinator sends commit or abort to each subordinate
– Each subordinate responds with an ack

• Result: Any site can decide to abort a transaction!


8
©2016-2024 Linh Thi Xuan Phan
2PC: What about crashes?
• We also need to handle the case where a node
crashes in the middle
– Nodes need to be able to 'remember' some information!

• Idea: Each node is given a local, persistent log


– This log could be stored on disk, or in NVRAM
• Is this enough to ensure that data remains persistent despite a possible crash?
– When a node recovers after a crash, it can look at its local log to
figure out what the next steps should be

9
©2016-2024 Linh Thi Xuan Phan
2PC: Steps in more detail
• When a transaction wants to commit:
– The coordinator sends a prepare message to each subordinate
– Subordinate force-writes an abort or prepare log record, then
sends a no (abort) or yes (prepare) message to coordinator
– The coordinator then considers the votes:
• If everyone has voted yes, it force-writes a commit log record and sends
commit message to all subordinates
• Else, it force-writes an abort log record and sends an abort message
– The subordinates force-write abort/commit log records based on
the message they get, and then send an ack message to
coordinator
– The coordinator writes an end log record after getting all the
acks
• Why is the 'end' record useful?
(In the original slides, messages are shown in red and log records in green.)
10
©2016-2024 Linh Thi Xuan Phan
2PC: Protocol (1/2)
void coordinator(Transaction t, Set nodes)
{
  log.write("BEGIN");
  foreach (n : nodes)
    send(n, "PREPARE");

  Set responses = new Set();
  bool allInFavor = true;
  if (!t.localPartCanCommit())
    allInFavor = false;

  while (!responses.equals(nodes) &&
         !timeout() && allInFavor) {
    Node sender;
    Message msg = recv(&sender);
    responses.add(sender);
    if (msg == "NO")
      allInFavor = false;
  }
  if (timeout())
    allInFavor = false;

  String result;
  if (allInFavor)
    result = "COMMIT";
  else
    result = "ABORT";

  log.write(result);   // commits the result
  foreach (n : nodes)
    send(n, result);
  if (result == "COMMIT")
    t.performLocalPart();

  Set finished = new Set();
  while (!finished.equals(nodes)) {
    Node sender;
    Message msg = recv(&sender);
    if (msg == "STATUS?")
      send(sender, result);
    if (msg == "ACK")
      finished.add(sender);
  }
  log.write("END");
}
11
©2016-2024 Linh Thi Xuan Phan
2PC: Protocol (2/2)
void subordinate(Transaction t, Node coordinator)
{
  log.write("BEGIN");
  while (true) {
    Message msg = recvFrom(coordinator);
    if (msg == "PREPARE") {
      if (t.localPartCanCommit()) {
        log.write("PREPARE");
        send(coordinator, "YES");
      } else {
        log.write("ABORT");
        send(coordinator, "NO");
      }
    } else if (msg == "COMMIT") {
      log.write("COMMIT");
      t.performLocalPart();
      log.write("END");
      send(coordinator, "ACK");
      break;
    } else if (msg == "ABORT") {
      log.write("ABORT");
      log.write("END");
      send(coordinator, "ACK");
      break;
    }
  }
}
12
©2016-2024 Linh Thi Xuan Phan
2PC: Illustration
[Timeline: Coordinator, Subordinate 1, Subordinate 2]
1. The coordinator force-writes a begin log entry and sends "prepare" to both subordinates.
2. Each subordinate force-writes a prepare log entry and sends "yes" to the coordinator.
3. The coordinator force-writes a commit log entry (this is the commit point) and sends "commit" to both subordinates.
4. Each subordinate force-writes a commit log entry and sends "ack" to the coordinator.
5. The coordinator writes an end log entry.

©2016-2024 Linh Thi Xuan Phan 13


2PC: Some observations
• All log records for a transaction contain its ID and
the coordinator’s ID
– The coordinator’s abort/commit record also includes IDs of all
subordinates (why?)

• Every message reflects a decision by the sender


– To ensure that this decision survives failures, it is first recorded in
the local log

• There exists no distributed commit protocol that can recover without communicating with other processes, in the presence of multiple failures!

©2016-2024 Linh Thi Xuan Phan


What if a node fails in the middle?
• Suppose we find a commit or abort log record for
transaction T, but not an end record?
– Need to redo/undo T
– If this node is the coordinator for T, keep sending commit / abort
messages to subordinates until acks have been received
• Suppose we find a prepare log record for
transaction T, but not commit/abort?
– This node is a subordinate for T
– Repeatedly contact the coordinator to find status of T, then write
commit/abort log record; redo/undo T; and write end log record
• Suppose we don’t find even a prepare record for T?
– Unilaterally abort and undo T
– This site may be the coordinator! If so, subordinates may send messages asking about T; their parts of T need to be undone as well
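As a rough sketch, this recovery logic can be written in the same pseudocode style as the 2PC listings (helper names such as redoOrUndo and isCoordinator are illustrative, not part of the lecture's code):

void recover(Transaction t)
{
  if (log.hasCommitOrAbortRecord(t) && !log.hasEndRecord(t)) {
    // Decision was made, but not all acks had arrived before the crash
    redoOrUndo(t, log.decision(t));
    if (isCoordinator(t))
      resendDecisionUntilAllAcked(t, log.decision(t));  // then write the end record
  } else if (log.hasPrepareRecord(t)) {
    // We voted yes as a subordinate but never learned the outcome
    Message decision = askCoordinator(t, "STATUS?");    // retry until it answers
    log.write(decision);                                // COMMIT or ABORT
    redoOrUndo(t, decision);
    log.write("END");
  } else {
    // Not even a prepare record: safe to abort unilaterally
    undo(t);
    log.write("ABORT");
  }
}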

©2016-2024 Linh Thi Xuan Phan


Coordinator failure
• What should a subordinate do when the
coordinator fails after it has already voted yes?
– Problem: Cannot decide whether to commit or abort T until
coordinator recovers - T is blocked!
– Consequences?

• Suppose all the subordinates know each other?


– Can be implemented (requires extra overhead in prepare msg)
– But: They are still blocked, unless one of them voted no, or one of
them has already received the coordinator's decision
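A sketch of what a blocked subordinate could do if it knows its peers (illustrative pseudocode; the peer-query messages are an extension, not part of basic 2PC):

Message askPeersForDecision(Transaction t, Set peers)
{
  foreach (p : peers) {
    Message m = askStatus(p, t);     // peer replies COMMIT, ABORT, VOTED-NO, or UNKNOWN
    if (m == "COMMIT" || m == "ABORT")
      return m;                      // someone already saw the decision
    if (m == "VOTED-NO")
      return "ABORT";                // the decision can only have been ABORT
  }
  return "UNKNOWN";                  // everyone voted yes and nobody heard from
                                     // the coordinator: still blocked
}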

©2016-2024 Linh Thi Xuan Phan


Link and remote site failures
• What should a node do when a remote node does
not respond during the commit protocol for
transaction T (either because the remote node
failed or the link failed)?
– If the current node is the coordinator for T, should abort T
– If the current node is a subordinate, and has not yet voted yes, it
should abort T
– If the current node is a subordinate and has voted yes, it is blocked
until the coordinator responds!

• Can we do better?

©2016-2024 Linh Thi Xuan Phan


Plan for today
• Distributed commit
– Two-phase commit (2PC)
– Three-phase commit (3PC) NEXT

• Logging and recovery


– Centralized checkpointing
– Chandy-Lamport algorithm

18
©2016-2024 Linh Thi Xuan Phan
How can we improve 2PC?
• What is the real reason why 2PC can block?
– Suppose both the coordinator and a subordinate crash
– The decision could have been COMMIT, and the subordinate may
have already completed the operation
– But the other subordinates have no way to distinguish this from the
situation where the decision was ABORT

• Idea: Let's make sure that the subordinates know the decision before they execute anything!
– When the coordinator has received yes votes from all the subordinates, it tells them that there will be a COMMIT ("PRECOMMIT")
– Once everyone (or at least a large number) acknowledges the
receipt, the coordinator then sends the actual COMMIT message
– What happens now when the coordinator fails?
• ... during the PRECOMMIT phase? ... during the actual COMMIT phase? 19
©2016-2024 Linh Thi Xuan Phan
Three-phase commit (3PC)
• Phase 1: Voting
– Coordinator sends PREPARE to each subordinate
– Each subordinate votes either YES or NO

• Phase 2: Precommit
– If at least one vote is NO, the coordinator sends ABORT as before
– If all votes are YES, the coordinator sends PRECOMMIT to
at least k subordinates (where k is a tunable parameter)
• OK to send more PRECOMMITs if there are not enough responses
– Each subordinate replies with an ACK

• Phase 3: Commit
– Once the coordinator has received k ACKs, it sends COMMIT to
each subordinate
– Each subordinate responds with an ACK
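A sketch of the coordinator's side of 3PC, in the same pseudocode style as the 2PC listings (helper names such as allVotesYes and waitForAcks are illustrative; timeouts and abort handling after the precommit point are omitted):

void coordinator3PC(Transaction t, Set nodes, int k)
{
  log.write("BEGIN");
  foreach (n : nodes)                  // Phase 1: voting (as in 2PC)
    send(n, "PREPARE");
  if (!allVotesYes(nodes)) {
    log.write("ABORT");
    foreach (n : nodes)
      send(n, "ABORT");
    return;
  }

  log.write("PRECOMMIT");              // Phase 2: precommit
  foreach (n : nodes)
    send(n, "PRECOMMIT");
  waitForAcks(k);                      // need at least k acks; OK to send more
                                       // PRECOMMITs if responses are missing

  log.write("COMMIT");                 // Phase 3: commit
  t.performLocalPart();
  foreach (n : nodes)
    send(n, "COMMIT");
  waitForAcks(nodes.size());
  log.write("END");
}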
20
©2016-2024 Linh Thi Xuan Phan
3PC: Handling coordinator failures
• What if some nodes fail, including the coordinator?
– Remaining nodes ask each other what the coordinator has told them
• Situation #1: Nobody has seen a PRECOMMIT
– 2PC would have blocked in this case!
– But with 3PC, the remaining nodes can safely ABORT, since the
failed nodes could not have made any changes yet
• ... at least unless more than k nodes have failed! (why?)

• Situation #2: At least one PRECOMMIT or COMMIT


– The remaining subordinates know that the decision was (or was
going to be) COMMIT
– They can all decide to go ahead and COMMIT; once the other nodes
come back up, they can learn about this decision and COMMIT too
• Situation #3: Network partition
– 3PC isn't safe in this case! (why?)
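For situations #1 and #2, the surviving subordinates' decision rule can be sketched as follows (illustrative pseudocode):

void decideAfterCoordinatorFailure(Set survivors)
{
  bool sawPrecommitOrCommit = false;
  foreach (n : survivors)
    if (n.hasSeen("PRECOMMIT") || n.hasSeen("COMMIT"))
      sawPrecommitOrCommit = true;

  if (sawPrecommitOrCommit)
    decide("COMMIT");   // Situation #2: the decision was (or was going to be) COMMIT
  else
    decide("ABORT");    // Situation #1: safe as long as no more than k nodes failed
}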

21
©2016-2024 Linh Thi Xuan Phan
Recap: Distributed commit
• Goal: Do something on all nodes, or none of them
– A very common requirement in distributed systems
– Naïve solution (one-phase commit) isn't safe

• We have seen two solutions: 2PC and 3PC


– The key idea is for the nodes to 'vote' whether to commit or abort
– A coordinator is elected to collect votes and broadcast the decision
– 2PC is much simpler, but can block when the coordinator fails
– 3PC doesn't have that problem, but it is more complicated
– Neither 2PC nor 3PC can tolerate network partitions

22
©2016-2024 Linh Thi Xuan Phan
Plan for today
• Distributed commit
– Two-phase commit (2PC)
– Three-phase commit (3PC)

• Logging and recovery NEXT

– Centralized checkpointing
– Chandy-Lamport algorithm

23
©2016-2024 Linh Thi Xuan Phan
Why logging and recovery?
• Suppose a distributed system fails in the middle of
a long and expensive computation

• What should we do?


– Restarting from scratch may be expensive, or even impossible!

• Idea: Periodically record a checkpoint of the system


– Each node writes down ("logs") its current state, e.g., on its local disk
– If something bad happens, the nodes can go back to the latest
checkpoint and then continue from there!

24
©2016-2024 Linh Thi Xuan Phan
Message logging
• What if the latest checkpoint was some time ago?
– Rolling back the entire system would destroy a lot of useful work

• Idea: Use message logging + deterministic replay


– Whenever a message is sent, it is saved somewhere – either by the
sender (sender-based logging) or by the recipient (receiver-based)
– Nodes also record all nondeterministic events
• What are these? Why do they need to be remembered?
– When a node crashes, we can roll back only that node to its latest
checkpoint, and then feed it all the recorded messages and
nondeterministic events
– This will bring the node back to the state it was before the crash
• What are we assuming about the software on the node?
• Does this assumption generally hold? What does it take to make it hold?
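A sketch of sender-based logging and replay in the same pseudocode style (helper names are illustrative):

void sendLogged(Node dest, Message m)
{
  msgLog.append(dest, m);              // the sender keeps a copy of every outgoing message
  send(dest, m);
}

void onNondeterministicEvent(Event e)
{
  eventLog.append(e);                  // e.g., delivery order, timer values, random numbers
}

void recoverNode(Node crashed)
{
  restoreCheckpoint(crashed, latestCheckpoint(crashed));
  foreach (m : messagesDeliveredTo(crashed, sinceLatestCheckpoint))
    redeliver(crashed, m);             // in the recorded order
  foreach (e : eventLog.since(latestCheckpoint(crashed)))
    replay(crashed, e);
  // Assuming the node's software is deterministic given the same inputs,
  // this brings it back to the state it was in just before the crash.
}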

25
©2016-2024 Linh Thi Xuan Phan
Checkpointing on a single node
• How would you record a checkpoint on one node?
• Idea #1: Just write its memory contents to disk!
– Problem: Write can take a long time!
– If the node keeps running during that time, the checkpoint will be
inconsistent: the state at the beginning is older than that at the end
– The node may never have been in the state that the checkpoint
describes (at least not at any given time)

• Idea #2: Stop the node during the checkpoint


– Not ideal either – system will be unresponsive during that time!
– Actual checkpointing systems use techniques like copy-on-write
(CoW) to take a checkpoint in memory very quickly (a few ms)
• This can then be written to disk asynchronously!
– But this trick won't work if we have a distributed system!
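A minimal sketch of the copy-on-write idea mentioned above, in the same pseudocode style (markPagesCopyOnWrite and the background write are illustrative placeholders, not a real API):

void takeLocalCheckpoint()
{
  Snapshot snap = markPagesCopyOnWrite();   // freeze a logical copy; takes only a few ms
  // Execution resumes immediately; a write to a frozen page first copies
  // that page, so the snapshot itself never changes.
  inBackground {
    foreach (page : snap.pages())
      disk.write(page);                     // write the frozen copy asynchronously
    log.write("CHECKPOINT COMPLETE");
  }
}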
26
©2016-2024 Linh Thi Xuan Phan
Consistent vs. inconsistent cuts
[Figure: Node A executes events A0..A3 and node B executes events B0..B3; messages m1 and m2 are exchanged between them. Two cuts are drawn: an inconsistent cut, which includes the receipt of a message but not its send, and a consistent cut, which includes the send of every message it has received.]

• We can define a cut of the distributed execution


– Basically, one prefix of each node's local execution, taken together
• When can we call a cut consistent?
– If, for every event it contains, it also contains all events that
'happened before' that event.
– In particular, every received message must have been sent
• Which of the above cuts are consistent?
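In pseudocode, the message condition can be checked directly (a sketch with illustrative helper names; a cut is given as one prefix of each node's local execution):

bool isConsistent(Cut cut)
{
  foreach (e : cut.events())
    if (e.isReceive() && !cut.contains(e.matchingSend()))
      return false;     // the cut contains a receive without its send
  return true;
}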
27
©2016-2024 Linh Thi Xuan Phan
Recovery lines
[Figure: Several nodes take checkpoints over time, marked on each node's timeline, starting from the initial state. After a failure, the most recent set of checkpoints that forms a consistent cut is the recovery line; other combinations of checkpoints form inconsistent cuts.]

• Suppose we want to roll back the entire system


– Can we simply roll back each individual node to a recent checkpoint?
– Problem: Checkpoints could describe an inconsistent cut!
• A recovery line is a consistent set of checkpoints
– How can we find such a recovery line?

28
©2016-2024 Linh Thi Xuan Phan
The Domino Effect
[Figure: The nodes take checkpoints independently. After a failure, rolling one node back invalidates messages exchanged with the others, forcing them back to earlier checkpoints in turn; the rollbacks cascade until the only recovery line left is the initial state.]

• What if a set of checkpoints is not consistent?


– Need to roll back some of the nodes to an even earlier checkpoint!
– But that could create further inconsistencies!
• Problem: Cascading rollbacks! Why?
– In this example, the nodes just took their checkpoints individually
– Thus, it is unlikely that they would (ever) form a consistent cut
– The nodes need to coordinate their checkpointing!
29
©2016-2024 Linh Thi Xuan Phan
Centralized checkpointing
• Similar to 2PC with a coordinator

• Coordinator first multicasts a CHECKPOINT message to all processes

• When a process receives the message:


– Takes a local checkpoint
– Queues any subsequent outgoing messages (why?)
– Responds with an ACK

• When the coordinator receives all the ACKs, it sends a DONE message
– At that point, the queued messages are sent, and the execution
proceeds normally again
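A sketch of this coordinated protocol in the same pseudocode style (helper names are illustrative):

void checkpointCoordinator(Set nodes)
{
  foreach (n : nodes)
    send(n, "CHECKPOINT");
  waitForAcks(nodes);
  foreach (n : nodes)
    send(n, "DONE");
}

void onCheckpointMessage(Node coordinator)
{
  takeLocalCheckpoint();
  holdBackOutgoingMessages();      // queue new sends so the cut stays consistent
  send(coordinator, "ACK");
  waitFor(coordinator, "DONE");
  releaseQueuedMessages();         // resume normal execution
}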

©2016-2024 Linh Thi Xuan Phan


Can we do better?
• Centralized checkpointing has several drawbacks

• Problem #1: Need for a central coordinator


– Ideally, we would like a fully distributed protocol
– Any node should be able to initiate a checkpoint at any time

• Problem #2: Queueing


– This is necessary to ensure that the cut is consistent
– But it also holds up the system for a while!

• Problem #3: Messages are not captured


– Some messages may still be 'in flight' at checkpoint time
– If we roll back every node to the checkpoint, these messages will
not be re-sent!

31
©2016-2024 Linh Thi Xuan Phan
Snapshots
• Can we get a consistent "snapshot"
of the system?

• This should include:


– A checkpoint for each node
– The set of messages that were "in flight"
at the time (i.e., sent but not yet received)

• Can we expect to capture a state of the entire system as of some particular time?
– Not unless clocks are perfectly synchronized!
– All we can hope for is a state that the system could have been in,
and from which the actual state is reachable (why?)
32
©2016-2024 Linh Thi Xuan Phan
What are in flight messages from B to A?

[Figure: messages m1..m4 sent from Node B to Node A, relative to the two nodes' checkpoints]
– m1: sent and received before both A's and B's checkpoints (not in flight)
– m2, m3: sent before B's checkpoint but received after A's checkpoint (these are the in-flight messages)
– m4: sent and received after both A's and B's checkpoints (not in flight)
33
©2016-2024 Linh Thi Xuan Phan
Some simplifying assumptions
• To simplify our discussion,
we will assume that:
– Each node Ni has a direct "channel" cij
to every other node Nj
– Channels are unidirectional and
deliver messages in FIFO order
– Channels are reliable, i.e., messages
are never lost

• How can we make these assumptions true?


– Direct channel: This is just an abstraction; all we need is that every
node can send messages to every other node
– FIFO order: Can use sequence numbers
– Reliable: Can use retransmissions
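A sketch of how such a channel abstraction might be built on top of an unreliable network (illustrative pseudocode; sequence numbers give FIFO order, retransmission gives reliability):

void channelSend(Node dest, Message m)
{
  m.seq = nextSeq[dest]++;             // stamp each message with a sequence number
  retransmitUntilAcked(dest, m);       // reliability
}

void channelReceive(Node src, Message m)
{
  buffer[src].add(m);
  while (buffer[src].contains(expected[src])) {   // deliver strictly in FIFO order,
    deliverToApplication(buffer[src].remove(expected[src]));   // filling any gaps first
    expected[src]++;
  }
}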
34
©2016-2024 Linh Thi Xuan Phan
K. Mani Chandy and Leslie Lamport: "Distributed Snapshots: Determining Global States of Distributed Systems", ACM TOCS 3(1):63-75
Chandy-Lamport algorithm
• When a node Ni wants to initiate a snapshot:
– it takes a local checkpoint,
– it sends a special marker message on each of its outgoing channels, and
– it begins recording messages that arrive on its incoming channels
• When a node Nj receives a marker on channel cij:
– If Nj has not yet taken a checkpoint:
• it takes a local checkpoint and sends a special marker on each outgoing channel,
• it records the state of cij as the empty set, and
• it begins recording messages that arrive on any
other incoming channel ckj
– If Nj has already taken a checkpoint:
• it stops recording messages for cij

• Termination condition:
– The node that initiated the snapshot has received a marker from every other node

• What are the 'recorded' sets of messages?


– These are the messages that are "in flight" at the snapshot time
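A sketch of the algorithm in the same pseudocode style as the earlier listings (helper names such as startRecording are illustrative):

void initiateSnapshot()
{
  takeLocalCheckpoint();
  foreach (c : outgoingChannels())
    send(c, MARKER);
  foreach (c : incomingChannels())
    startRecording(c);                  // record messages arriving on each channel
}

void onMessage(Channel c, Message m)
{
  if (m == MARKER) {
    if (!checkpointTaken()) {
      takeLocalCheckpoint();
      foreach (out : outgoingChannels())
        send(out, MARKER);
      recordedState(c) = {};            // the state of c is the empty set
      foreach (in : incomingChannels())
        if (in != c)
          startRecording(in);
    } else {
      stopRecording(c);                 // recorded set = messages in flight on c
    }
  } else {
    if (isRecording(c))
      recordedState(c).add(m);
    deliverToApplication(m);
  }
}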
35
©2016-2024 Linh Thi Xuan Phan
Chandy-Lamport: Example
[Figure: three nodes running a money-transfer application over time.
 Node A (Alice):   $500 → $600 → $350 → $400
 Node B (Bob):     $300 → $200 → $150 → $50
 Node C (Charlie): $100 → $350 → $450
 Messages: Bob → Alice: $100;  Alice → Charlie: $250;  Bob → Alice: $50;  Bob → Charlie: $100
 The algorithm records the following snapshot:]

Node A: Chk = "Alice: $350";   S_BA = { Bob → Alice: $50 };       S_CA = {}
Node B: Chk = "Bob: $150";     S_AB = {};                         S_CB = {}
Node C: Chk = "Charlie: $100"; S_AC = { Alice → Charlie: $250 };  S_BC = {}

• What happens if we roll back & replay messages?


– We arrive at the state captured by the snapshot: each node's checkpoint plus the recorded in-flight messages (the orange dots in the original figure)! 36
©2016-2024 Linh Thi Xuan Phan
Recap: Chandy-Lamport algorithm
• Goal: A consistent "snapshot" of the system
– One checkpoint per node, and all the messages that were "in flight"
– Not necessarily a state that the system ever was in at any given
(wallclock) time

• Properties of the algorithm:


– Fully distributed; no need for a central coordinator
– Any node can trigger a snapshot at any time
– Execution can continue; no need to "queue" messages

• How can we use this?


– Example: Stable property detection
• Some of the algorithms we have discussed (e.g., deadlock detection via circular
wait) assume that we have a consistent snapshot of the global system state
– Example: Failure recovery

37
©2016-2024 Linh Thi Xuan Phan
