Unit-V: Coordination and Agreement
Introduction:
• It is generally important that the processes within a distributed system have some
sort of agreement
• Agreement may be as simple as agreeing on the goal of the distributed system
• Has the general task been aborted?
• Should the main aim be changed?
• This is more complicated than it sounds, since all the processes must not
only agree, but also be confident that their peers agree
No Fixed Master
• We will also look at dynamic agreement on a master or leader process, i.e. an
election, generally held after the current master has failed.
• We saw in the Time and Global State section that some algorithms required a
global master/nominee, but there was no requirement for that master/nominee
process to be fixed
• With a fixed master process, agreement is made much simpler
• However, it then introduces a single point of failure
• So here we are generally assuming no fixed master process
Synchronous vs Asynchronous
• Again we meet the distinction between synchronous and asynchronous systems
• It is an important distinction here: synchronous systems allow us to determine
bounds on message transmission delays
• This allows us to use timeouts to detect message failure in a way that cannot be
done for asynchronous systems.
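As a minimal sketch of the timeout idea above, assume a synchronous system with a known maximum one-way message delay and a known maximum processing time at the receiver. The names max_delay, max_processing, reply_timeout and has_failed are hypothetical and chosen only for illustration.

```python
def reply_timeout(max_delay, max_processing):
    """Longest time a reply can take in a synchronous system:
    request in flight + processing at the receiver + reply in flight."""
    return 2 * max_delay + max_processing


def has_failed(sent_at, now, reply_received, max_delay, max_processing):
    """Report a failure only once the bound has been exceeded with no reply."""
    waited = now - sent_at
    return (not reply_received) and waited > reply_timeout(max_delay, max_processing)


# With a 50 ms delay bound and 20 ms processing bound the timeout is 120 ms.
print(reply_timeout(50, 20))                 # 120
print(has_failed(0, 100, False, 50, 20))     # False: the reply may still arrive
print(has_failed(0, 130, False, 50, 20))     # True: the bound has been exceeded
```

In an asynchronous system no such bound exists, so an expired timeout could only suggest a failure, never prove one.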
• A process enters the critical section (CS) when an assertion, defined on its local variables, becomes true
• Mutual exclusion is enforced because the assertion becomes true at only one site at any
given time
Lamport’s Algorithm
• Requires communication channels to deliver messages in FIFO order
• Satisfies conditions ME1, ME2 and ME3
• Based on Lamport logical clocks: timestamped requests for entering the CS
• Every process p_i keeps a queue, request_queue_i, which contains mutual exclusion
requests ordered by their timestamps
• IDEA: the algorithm executes CS requests in increasing order of timestamps
• Timestamp: (clock value, id of the process)
Requesting the CS
Process p_i updates its local clock and timestamps the request (ts_i)
Process p_i broadcasts a REQUEST(ts_i, i) to all the other processes
Process p_i places the request on request_queue_i
On Receiving REQUEST(ts_i, i) from a process p_i
Process p_j places p_i’s request on request_queue_j
Process p_j returns a timestamped REPLY message to p_i
Executing the CS
Process p_i enters the CS when the following two conditions hold:
‣ L1: p_i has received a message with timestamp larger than (ts_i, i) from all other processes
‣ L2: p_i’s request is at the top of request_queue_i
Releasing the CS
Process p_i removes its request from the top of request_queue_i
Process p_i broadcasts a timestamped RELEASE message to all other processes
On Receiving RELEASE from a process p_i
Process p_j removes p_i’s request from its request queue request_queue_j
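The steps above can be sketched as a small in-memory simulation. This is an illustrative sketch under stated assumptions, not a reference implementation: the class names Process and Network, the last_seen table and the deliver_all helper are hypothetical, and each channel is modelled as a FIFO queue to match the algorithm's requirement.

```python
import heapq
from collections import deque


class Network:
    """One FIFO queue per (sender, receiver) pair."""
    def __init__(self):
        self.channels = {}

    def send(self, src, dst, msg):
        self.channels.setdefault((src, dst), deque()).append(msg)

    def deliver_all(self, procs):
        progressed = True
        while progressed:                       # repeat until no message is in flight
            progressed = False
            for (src, dst), channel in list(self.channels.items()):
                while channel:
                    procs[dst].receive(self, src, channel.popleft())
                    progressed = True


class Process:
    def __init__(self, pid, all_pids):
        self.pid = pid
        self.peers = [p for p in all_pids if p != pid]
        self.clock = 0
        self.queue = []                         # request queue, a heap of (clock, pid)
        self.last_seen = {p: (0, p) for p in self.peers}
        self.my_request = None

    def _tick(self, received=0):
        self.clock = max(self.clock, received) + 1   # Lamport clock update
        return self.clock

    def request(self, net):
        self.my_request = (self._tick(), self.pid)
        heapq.heappush(self.queue, self.my_request)
        for p in self.peers:
            net.send(self.pid, p, ("REQUEST", self.my_request))

    def receive(self, net, src, msg):
        kind, ts = msg
        self._tick(ts[0])
        self.last_seen[src] = max(self.last_seen[src], ts)
        if kind == "REQUEST":
            heapq.heappush(self.queue, ts)
            net.send(self.pid, src, ("REPLY", (self._tick(), self.pid)))
        elif kind == "RELEASE":
            self.queue = [r for r in self.queue if r[1] != src]
            heapq.heapify(self.queue)

    def can_enter(self):
        if self.my_request is None:
            return False
        l1 = all(self.last_seen[p] > self.my_request for p in self.peers)
        l2 = self.queue[0] == self.my_request
        return l1 and l2

    def release(self, net):
        self.queue.remove(self.my_request)
        heapq.heapify(self.queue)
        self.my_request = None
        stamp = (self._tick(), self.pid)
        for p in self.peers:
            net.send(self.pid, p, ("RELEASE", stamp))


net = Network()
procs = {pid: Process(pid, [1, 2, 3]) for pid in (1, 2, 3)}
procs[1].request(net)                           # both requests carry clock value 1
procs[2].request(net)
net.deliver_all(procs)
print(procs[1].can_enter(), procs[2].can_enter())   # True False
procs[1].release(net)
net.deliver_all(procs)
print(procs[2].can_enter())                          # True
```

Both requests have clock value 1, so the process id breaks the tie and process 1 goes first; process 2 satisfies L1 but not L2 until the RELEASE arrives.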
Example:
Entering a CS
• p1 and p2 send out REQUEST messages for the CS to the other processes
• Both p1 and p2 have received timestamped REPLY messages from all processes
Exiting a CS
• p1 exits and sends RELEASE messages to all other processes
Group Communication
Communication between two processes in a distributed system is required to exchange
data, such as code or a file, between the processes. When one source process
communicates with multiple processes at once, it is called Group Communication.
A group is a collection of interconnected processes presented as a single abstraction. This
abstraction hides the message passing so that the communication looks like a normal
procedure call. Group communication also helps processes on different hosts to work
together and perform operations in a synchronized manner, which can improve the overall
performance of the system.
A multicast communication: process P1 communicating with only a group of the processes in the system
3. Unicast Communication:
The host process communicates with a single process in the distributed system
at a time, although the same information may be passed to multiple processes in turn. This
works best for two processes communicating, since the sender has to deal with only one specific process.
However, it incurs overhead, because the sender has to locate the exact process and then exchange
information/data with it.
A unicast communication: process P1 communicating with only process P3
2. detection: let a deadlock occur, detect it, and then deal with it by aborting and later
restarting a process that causes deadlock.
4. avoidance: choose resource allocation carefully so that deadlock will not occur.
Resource requests can be honored as long as the system remains in a safe (non-
deadlock) state after resources are allocated.
In a distributed system, deadlock prevention and avoidance are generally impractical, because no
single node has an up-to-date view of every resource and request in the system. Therefore,
deadlock detection is the approach that is usually implemented.
The techniques of deadlock detection in the distributed system require the following:
Progress – The method should be able to detect all the deadlocks in the system.
Safety – The method should not detect false or phantom deadlocks.
There are three approaches to detect deadlocks in distributed systems. They are as
follows:
1. Centralized approach –
In the centralized approach, only one node is responsible for detecting deadlock.
The advantage of this approach is that it is simple and easy to implement. The
drawbacks are the excessive workload at one node and the single point of failure (the
whole system depends on one node, so if that node fails, detection stops for the whole system),
which in turn makes the system less reliable.
2. Distributed approach –
In the distributed approach, different nodes work together to detect deadlocks. There is no
single point of failure, as the workload is divided among all nodes. The
speed of deadlock detection also increases.
3. Hierarchical approach –
This approach is often the most advantageous. It combines the centralized and
distributed approaches to deadlock detection in a distributed system. In this approach,
some selected nodes or clusters of nodes are responsible for deadlock detection, and
these selected nodes are controlled by a single node.
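The centralized approach can be sketched as follows. This is a minimal sketch that assumes a wait-for graph representation, which the text above does not prescribe: each node reports edges of the form (waiting process, process it waits for), and a coordinator merges the reports and searches for a cycle. The function name find_deadlock and the node and process names are hypothetical.

```python
def find_deadlock(local_reports):
    """Merge per-node wait-for edges; return one cycle of processes or None."""
    graph = {}
    for edges in local_reports.values():
        for waiter, holder in edges:
            graph.setdefault(waiter, set()).add(holder)
            graph.setdefault(holder, set())

    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    colour = {p: WHITE for p in graph}
    path = []

    def visit(p):
        colour[p] = GREY
        path.append(p)
        for q in sorted(graph[p]):
            if colour[q] == GREY:         # back edge: q is on the current path
                return path[path.index(q):]
            if colour[q] == WHITE:
                cycle = visit(q)
                if cycle:
                    return cycle
        colour[p] = BLACK
        path.pop()
        return None

    for p in sorted(graph):
        if colour[p] == WHITE:
            cycle = visit(p)
            if cycle:
                return cycle
    return None


reports = {
    "node_a": [("P1", "P2")],   # P1 waits for a resource held by P2
    "node_b": [("P2", "P3")],
    "node_c": [("P3", "P1")],
}
print(find_deadlock(reports))                                  # ['P1', 'P2', 'P3']
print(find_deadlock({"node_a": [("P1", "P2")],
                     "node_b": [("P2", "P3")]}))               # None
```

No single node sees a cycle in its own report; the deadlock only appears once the edges are merged. The sketch also assumes the reports describe the same instant. If they are stale, the merged graph may contain a cycle that no longer exists, which is the phantom deadlock that the Safety requirement rules out.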
Transaction Recovery
The atomic property of transactions means that the effect of performing a transaction
on behalf of one client is free from interference from concurrent transactions being
performed on behalf of other clients.
It requires that the effects of all committed transactions are reflected in the data items, and that none
of the effects of incomplete or aborted transactions are.
Two Aspects to consider
Durability - requires that data items are saved in permanent storage and will be available
indefinitely, at the servers, or the sites of storage.
Failure Atomicity - requires that the effects of the transaction are atomic even when the
server fails.
These two aspects are not completely independent, and they can be handled by a
so-called recovery manager, which is based on a two-phase commit protocol.
Recovery Manager
• Restores the server’s database from the Recovery File (RF) after a crash; the RF needs to be
resilient to media failure, so it is kept in stable storage
• Reorganizes the RF to improve the performance of recovery
• Reclaims storage space in the RF, through the execution of the application
Recovery File (as a log) is used to deal with recovery of a server involved in a distributed
transaction.
The RF contains:
• Trans Id and the status of the transaction - prepared, committed, aborted
• Data items that are part of the transaction and their values
• Intentions List for the transaction
• RF represents a log containing the history of all the transactions performed
• Contains a Checkpoint, a point where the state of database is precisely known.
• Order of entries reflects the order in which transactions have prepared, committed
and aborted
Intentions List
Contains a list of data item names and the positions in the RF where the values of the data
items that are altered by that transaction reside.
When a server is prepared to commit a transaction, the RM must save the intentions
list in the RF. This ensures the server is able to carry out the commitment later, even if it
crashes in the interim.
When a transaction is aborted, the RM uses the intentions list to delete all the
tentative versions of data items made by that transaction.
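A minimal sketch of how an intentions list ties a transaction to values stored in the RF, under simplifying assumptions: the RF is modelled as an in-memory append-only list rather than stable storage, and the names RecoveryFile, prepare, finish and recover are hypothetical.

```python
class RecoveryFile:
    """Append-only log; each entry is addressed by its position."""
    def __init__(self):
        self.entries = []

    def append(self, entry):
        self.entries.append(entry)
        return len(self.entries) - 1


def prepare(rf, tid, tentative):
    """Write the tentative values, then the intentions list that points at them."""
    intentions = {name: rf.append(("data", name, value))
                  for name, value in tentative.items()}
    rf.append(("status", tid, "prepared", intentions))


def finish(rf, tid, status):
    """Record the outcome of the transaction: committed or aborted."""
    rf.append(("status", tid, status, None))


def recover(rf):
    """Rebuild the committed values after a crash, newest commit first."""
    intentions = {e[1]: e[3] for e in rf.entries
                  if e[0] == "status" and e[2] == "prepared"}
    store = {}
    for entry in reversed(rf.entries):
        if entry[0] == "status" and entry[2] == "committed":
            for name, position in intentions[entry[1]].items():
                # setdefault keeps the value from the most recent commit
                store.setdefault(name, rf.entries[position][2])
    return store


rf = RecoveryFile()
prepare(rf, "T1", {"A": 100, "B": 200}); finish(rf, "T1", "committed")
prepare(rf, "T2", {"A": 80});            finish(rf, "T2", "aborted")
prepare(rf, "T3", {"B": 220});           finish(rf, "T3", "committed")
prepare(rf, "T4", {"A": 1})              # crash before T4 is resolved
print(recover(rf))                       # {'B': 220, 'A': 100}
```

In this sketch the values written by the aborted T2 and the unresolved T4 stay in the log but are never applied, because recovery follows only the intentions lists of committed transactions.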
Checkpointing
• The process of writing the current committed values of a server’s data items to a new
RF, together with transaction status entries and intentions lists of transactions that
have not yet been fully resolved
• Its purpose is to reduce the number of transactions to be dealt with during recovery and to
reclaim file space
• A checkpoint that fails partway through must itself be recoverable…
Recovery of Two-Phase Commit Protocol
• RMs use two new transaction status values, done and uncertain, which can be written to
the RF. Both done and uncertain are used when the RF is reorganized
• RM of coordinator uses done to indicate that two-phase commit is complete
• RM of worker uses uncertain to indicate that the worker has voted Yes but does not know
the outcome
• The RM at coordinator records a coordinator entry - (Trans Id, list of workers) in
coordinator’s RF
• The RM at worker records a worker entry - (Trans Id, coordinator) in worker’s RF
During Phase 1 - Voting
• When coordinator is prepared to commit, its RM writes prepared and a coordinator
entry to RF
• If worker votes Yes, its RM writes prepared, a worker entry and uncertain to the RF
• If worker votes No, its RM writes aborted to the RF
During Phase 2 - Completion
• RM of Coordinator writes either committed or aborted to the RF according to the
decision made
• RMs of Workers write committed or aborted to their RFs depending on message
received from coordinator
• RM of Coordinator writes done to RF when coordinator has received a have committed
message from all its workers
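The entries written in the two phases can be traced with a small sketch. It only records what each RM writes and does not model messages or timeouts. The function two_phase_commit, the crashed parameter and the coordinator id "C" are hypothetical names, and writing done only on the commit path is an assumption that follows the last bullet above.

```python
def two_phase_commit(tid, votes, crashed=(), coord_id="C"):
    """Return (coordinator RF, {worker: RF}) after one run of the protocol.

    votes maps worker name -> True (Yes) or False (No); workers listed in
    crashed fail after voting and never learn the decision.
    """
    coord_rf = []
    worker_rfs = {w: [] for w in votes}

    # Phase 1 - voting
    coord_rf.append(("prepared", tid))
    coord_rf.append(("coordinator", tid, sorted(votes)))
    for w, vote in votes.items():
        if vote:
            worker_rfs[w] += [("prepared", tid),
                              ("worker", tid, coord_id),
                              ("uncertain", tid)]
        else:
            worker_rfs[w].append(("aborted", tid))

    # Phase 2 - completion
    decision = "committed" if all(votes.values()) else "aborted"
    coord_rf.append((decision, tid))
    acks = 0
    for w, vote in votes.items():
        if w in crashed or not vote:
            continue                      # no decision entry is written here
        worker_rfs[w].append((decision, tid))
        if decision == "committed":
            acks += 1                     # stands in for a have committed message
    if decision == "committed" and acks == len(votes):
        coord_rf.append(("done", tid))
    return coord_rf, worker_rfs


coord, workers = two_phase_commit("T1", {"w1": True, "w2": True}, crashed={"w2"})
print(coord[-1])            # ('committed', 'T1'): done is not written yet
print(workers["w2"][-1])    # ('uncertain', 'T1'): voted Yes, outcome unknown
```

After a restart, the last status in each RF tells the RM where the protocol stopped: here w2 finds uncertain and so cannot decide on its own, while the coordinator has committed but has not yet reached done.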
Replication
In the distributed systems research area replication is mainly used to provide fault
tolerance. The entity being replicated is a process.
The basic model for managing replicated data includes the following components:
• Clients issue requests to a front end
• The front end provides transparency by hiding the fact that data is replicated.
• The front end contacts one or more replica managers to retrieve/store the data
• The replica managers interact to ensure that data is consistent
Two replication strategies have been used in distributed systems: Active and Passive
replication.
Active Replication:
Passive Replication
• There is only one server that processes client requests.
• After processing a request, the primary server updates the state on the other servers
and sends back the response to the client.
• If the primary server fails, one of the backup servers takes its place.
• This may be used for non-deterministic processes.
• Disadvantage – In case of failure the response is delayed.
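The passive scheme described above can be sketched in a few lines. This is an illustration under stated assumptions: the classes Replica and FrontEnd are hypothetical, failure is modelled by an alive flag, and the first live replica in the list is treated as the primary, whereas a real system would need failure detection and agreement on the new primary.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.state = {}
        self.alive = True


class FrontEnd:
    """Hides replication from the client and routes requests to the primary."""
    def __init__(self, replicas):
        self.replicas = replicas

    def primary(self):
        for replica in self.replicas:
            if replica.alive:
                return replica            # first live replica acts as primary
        raise RuntimeError("no replica available")

    def write(self, key, value):
        primary = self.primary()
        primary.state[key] = value        # only the primary processes the request
        for backup in self.replicas:
            if backup is not primary and backup.alive:
                backup.state[key] = value # primary updates the state on the backups
        return primary.name               # response goes back to the client

    def read(self, key):
        return self.primary().state.get(key)


r1, r2, r3 = Replica("r1"), Replica("r2"), Replica("r3")
front_end = FrontEnd([r1, r2, r3])
print(front_end.write("x", 1))   # r1
r1.alive = False                 # the primary fails
print(front_end.read("x"))       # 1, served by the backup that took over
print(front_end.write("x", 2))   # r2
```

Because the backups receive the resulting state rather than re-executing the request, the primary's result is the only one that matters, which is why the scheme tolerates non-deterministic processing.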
Sequence of Steps:
Data Replication
• It is the process of storing data at more than one site or node.
• Ensures availability of data.
• There can be two types of replication:
Full Replication – A copy of the whole database is stored at every site.
Partial Replication – Only some fragments of the database are replicated.