Presentation On Consistent Checkpoints & Recovery in Distributed System
Checkpointing
Orphan message : a message whose receipt is recorded but whose sending is not, which leaves the system in an inconsistent state
Domino Effect : a single rollback forces other processes to roll back as well
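[Figure: processes X, Y, and Z with checkpoints y1, y2, z1, z2. After a rollback, Y has not yet sent a message that X has already received (orphan message); rolling back to remove it forces further rollbacks (domino effect).]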
Lost messages
[Figure: processes X, Y, and Z with checkpoints x1, x2, x3, y1, y2, z1, z2. After a rollback, X has sent a message that Y can never receive (lost message).]
Livelocks
[Figure: processes X and Y with checkpoints x1 and y1, exchanging messages m1, m2, n1, and n2.]
Consistency of Checkpoint
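[Figure: checkpoints y1, y2, z1, z2. One set of checkpoints is labeled "strongly consistent" and another "consistent"; a set that is only consistent still needs to deal with lost messages.]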
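A consistent set of checkpoints contains no orphan messages; a strongly consistent set also contains no lost messages. The sketch below illustrates that distinction using per-channel message counts. It is only an illustration: the `Checkpoint` class and the `classify` function are hypothetical names, and it assumes each checkpoint records how many messages its process had sent to and received from every other process.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Hypothetical record of one process's local checkpoint."""
    process: str
    sent: dict = field(default_factory=dict)      # destination -> messages sent
    received: dict = field(default_factory=dict)  # source -> messages received


def classify(checkpoints):
    """Classify a set of local checkpoints, one per process.

    Returns "inconsistent" if any orphan message exists (received but not
    sent), "consistent" if there are no orphans but some message was sent
    and not received, and "strongly consistent" otherwise.
    """
    by_name = {c.process: c for c in checkpoints}
    lost = False
    for sender in checkpoints:
        for receiver_name, receiver in by_name.items():
            if receiver_name == sender.process:
                continue
            n_sent = sender.sent.get(receiver_name, 0)
            n_received = receiver.received.get(sender.process, 0)
            if n_received > n_sent:
                return "inconsistent"  # orphan message
            if n_received < n_sent:
                lost = True  # sent but not recorded as received
    return "consistent" if lost else "strongly consistent"
```

For example, if Y's checkpoint records one message sent to Z and Z's checkpoint records none received from Y, the pair is consistent but not strongly consistent.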
Checkpoint/Recovery Algorithm
Synchronous
Asynchronous
Preliminary (Assumption)
Synchronous Checkpoint
Goal
To make a consistent global checkpoint
Assumptions
Communication channels are FIFO
No partition of the network
End-to-end protocols cope with message loss due to rollback recovery and communication failure
No failure during the execution of the algorithm
tentative checkpoint :
a temporary checkpoint
a candidate for permanent checkpoint
permanent checkpoint :
a local checkpoint at a process
a part of a consistent global checkpoint
Checkpoint Algorithm
Synchronous Checkpoint
Algorithm
1. An initiating process (a single process that invokes this algorithm) takes a tentative checkpoint.
2. It requests all the processes to take tentative checkpoints.
3. It waits to hear from all the processes whether taking a tentative checkpoint has succeeded.
4. If it learns that all the processes have succeeded, it decides that all tentative checkpoints should be made permanent; otherwise, they should be discarded.
5. It informs all the processes of the decision.
6. The processes that receive the decision act accordingly.
Supplement
Once a process has taken a tentative checkpoint, it shouldn't send messages until it is informed of the initiator's decision.
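The two phases of the algorithm can be sketched in a few lines. This is a minimal in-memory illustration, not a networked implementation: the `Process` class, its `can_checkpoint` flag (used only to simulate a failed checkpoint attempt), and the `run_checkpoint` function are all hypothetical names, and message passing is replaced by direct method calls.

```python
class Process:
    """Hypothetical in-memory stand-in for one process."""

    def __init__(self, name, can_checkpoint=True):
        self.name = name
        self.can_checkpoint = can_checkpoint  # simulates success or failure
        self.state = 0
        self.tentative = None
        self.permanent = None
        self.sending_blocked = False

    def take_tentative(self):
        """Try to take a tentative checkpoint; report whether it succeeded."""
        if not self.can_checkpoint:
            return False
        self.tentative = self.state
        self.sending_blocked = True  # no sends until the decision arrives
        return True

    def on_decision(self, commit):
        """Make the tentative checkpoint permanent, or discard it."""
        if commit and self.tentative is not None:
            self.permanent = self.tentative
        self.tentative = None
        self.sending_blocked = False


def run_checkpoint(initiator, others):
    """Run both phases; return True if the checkpoints were made permanent."""
    # Phase 1: the initiator and every other process take tentative
    # checkpoints, and the initiator collects the replies.
    replies = [initiator.take_tentative()]
    replies += [p.take_tentative() for p in others]
    commit = all(replies)
    # Phase 2: the initiator informs everyone of the decision.
    for p in [initiator, *others]:
        p.on_decision(commit)
    return commit
```

If any single process reports failure, every tentative checkpoint is discarded and the previous permanent checkpoints stay in place.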
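[Figure: the initiator sends each process a request to take a tentative checkpoint, the processes reply OK, and the tentative checkpoints are made permanent, forming a consistent global checkpoint. One of the checkpoints taken is marked as an unnecessary checkpoint.]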
Correctness
Synchronous Checkpoint
No process sends messages after taking a tentative checkpoint until it receives the decision
New checkpoints include no message from processes that don't take a checkpoint
The set of tentative checkpoints is either made permanent as a whole or discarded as a whole
Disadvantages
Additional messages are exchanged
Synchronization delay
An unnecessary extra load on the system if failures rarely occur
Asynchronous Checkpoint
Characteristic
Each process takes checkpoints independently
No guarantee that a set of local checkpoints is consistent
A recovery algorithm has to search for a consistent set of checkpoints
No additional messages
No synchronization delay
Lighter load during normal execution
Preliminary (Assumptions)
Asynchronous Checkpoint / Recovery
Goal
To find the latest consistent set of checkpoints
Assumptions
Communication channels are FIFO
Communication channels are reliable
The underlying computation is event-driven
Each event is saved in memory on receipt of a message (volatile log)
The volatile log is periodically flushed to disk (stable log), which serves as the checkpoint
volatile log :
quick access
lost if the corresponding processor fails
stable log :
slow access
not lost even if processors fail
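The difference between the two logs can be illustrated with a small sketch. It assumes a local file stands in for stable storage and a Python list for the volatile log; the `EventLog` class and its method names are hypothetical, and `crash` only simulates a processor failure.

```python
import json
import os


class EventLog:
    """Hypothetical event log with a volatile part and a stable part."""

    def __init__(self, path):
        self.path = path    # file standing in for stable storage
        self.volatile = []  # in-memory events, lost on a crash

    def record(self, event):
        """Save an event in memory when a message is received."""
        self.volatile.append(event)

    def flush(self):
        """Append the volatile log to the stable log, then clear it."""
        with open(self.path, "a", encoding="utf-8") as f:
            for event in self.volatile:
                f.write(json.dumps(event) + "\n")
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before returning
        self.volatile.clear()

    def crash(self):
        """Simulate a processor failure: only the volatile log is lost."""
        self.volatile.clear()

    def stable_events(self):
        """Read back every event that reached the stable log."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f]
```

Events recorded after the last flush disappear on a crash, which is why recovery can only restart a failed processor from what the stable log holds.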
Preliminary (Definition)
Asynchronous Checkpoint / Recovery
Definition
CkPti : the checkpoint (stable log) that processor i rolls back to when a failure occurs
RCVDij(CkPti / e) : the number of messages received by processor i from processor j, per the information stored in the checkpoint CkPti or the event e
SENTij(CkPti / e) : the number of messages sent by processor i to processor j, per the information stored in the checkpoint CkPti or the event e
Recovery Algorithm
Asynchronous Checkpoint / Recovery
Algorithm
1. When a process crashes, it recovers to its latest checkpoint CkPt.
2. It broadcasts a message saying that it has failed. The other processes receive this message and roll back to their latest event.
3. Each process sends SENT(CkPt) to its neighboring processes.
4. Each process waits for SENT(CkPt) messages from every neighbor.
5. On receiving SENTji(CkPtj) from j, if i notices that RCVDij(CkPti) > SENTji(CkPtj), it rolls back to the latest event e such that RCVDij(e) = SENTji(CkPtj).
6. Repeat steps 3, 4, and 5 N times (N is the number of processes).
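The rollback rounds can be sketched as a single-machine simulation. This is an illustration under several assumptions rather than a distributed implementation: every process is a neighbor of every other, each log entry carries cumulative SENT and RCVD counters, index 0 of each log is the initial state with all counters at zero, and each receive is logged as its own event. The `Event` class and the `recover` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical log entry: cumulative counters after the event."""
    sent: dict      # destination -> messages sent so far
    received: dict  # source -> messages received so far


def recover(logs, failed, stable_index):
    """Return, per process, the index of the event to roll back to.

    logs: process name -> list of Event, oldest first.
    failed: name of the crashed process.
    stable_index: index of the failed process's latest stable checkpoint.
    """
    # Steps 1-2: the failed process restarts from its checkpoint; the others
    # start from their latest logged event.
    point = {p: len(log) - 1 for p, log in logs.items()}
    point[failed] = stable_index

    for _ in range(len(logs)):  # step 6: repeat N times
        # Steps 3-4: every process reports SENT at its current rollback point.
        sent = {p: logs[p][point[p]].sent for p in logs}
        # Step 5: roll back while more was received than the sender reports.
        # Because counters grow by one per receive event, this stops at the
        # latest event whose RCVD count equals the reported SENT count.
        for i in logs:
            for j in logs:
                if i == j:
                    continue
                limit = sent[j].get(i, 0)
                while logs[i][point[i]].received.get(j, 0) > limit:
                    point[i] -= 1
    return point
```

A rollback by one process can lower the SENT values it reports in the next round, which is why the exchange is repeated.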
In a distributed system, keeping replicas of data objects at different sites increases availability and reliability. Such a system is also known as a replicated distributed database system. Two methods are used in recovery algorithms to recover failed sites.
Message spoolers: update messages directed to the failed site are saved in message spoolers. On recovery, the failed site processes all the missed updates from the spoolers and then resumes normal operation.
Copier transactions: two things are necessary for this method. Replicas that have missed updates are not used in user transactions; if they are to be used in user transactions, they are first brought up to date by a copier transaction.
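The first method, message spooling, can be illustrated with a short sketch. It assumes updates are simple key-value writes and that one spooler serves all sites; the `Site` and `Spooler` classes and their method names are hypothetical.

```python
class Site:
    """Hypothetical replica site holding copies of data objects."""

    def __init__(self):
        self.data = {}
        self.up = True


class Spooler:
    """Hypothetical spooler that holds updates for sites that are down."""

    def __init__(self):
        self.pending = {}  # site -> list of (key, value) updates, in order

    def deliver(self, site, key, value):
        """Apply the update if the site is up; otherwise spool it."""
        if site.up:
            site.data[key] = value
        else:
            self.pending.setdefault(site, []).append((key, value))

    def recover(self, site):
        """Replay missed updates in order, then resume normal operation."""
        for key, value in self.pending.pop(site, []):
            site.data[key] = value
        site.up = True
```

Replaying the spooled updates in their original order matters, because a later write to the same object must win over an earlier one.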
Any Queries?
Thank you