Chapter 4
A network is said to be synchronous if data is transmitted from the sender in accordance
with the response from the receiver for each transmission.
In synchronous networks, a leader election algorithm is used to minimize computation
time.
Leader Election
Leader election is an important problem in distributed systems, because data is distributed among
different nodes that are geographically separated. Designating a single node as an organizer in
a distributed system is a challenging issue that calls for suitable election algorithms. In distributed
systems, nodes communicate with each other using shared memory or via message passing. The
key requirement for executing any distributed task effectively is coordination among nodes. In a pure
distributed system, there does not exist any central controlling node that arbitrates decisions,
and therefore every node has to communicate with the rest of the nodes in the network to reach a
proper decision.
Often during the decision process, not all nodes make the same decision, which makes the
communication between nodes time-consuming and slows the decision-making process. Coordination among nodes
becomes very difficult when consistency is needed among all of them. A centralized controlling
node can be selected from the group of available nodes to reduce the complexity of decision
making. Many distributed algorithms require one node to act as coordinator, initiator, or
otherwise perform some special role.
Leader election is a technique that can be used to break the symmetry of distributed systems. In
order to determine a central controlling node in a distributed system, a node is usually elected
from the group of nodes as the leader to serve as the centralized controller for that decentralized
system. The purpose of leader election is to choose a node that will coordinate activities of the
system. In any election algorithm, a leader is chosen based on some criterion such as choosing
the node with the largest identifier.
Once the leader is elected, the nodes reach a particular state known as the terminated state. In leader
election algorithms, these states are partitioned into elected states and non-elected states. When a
node enters either state, it always remains in that state. Every leader election algorithm must
satisfy the safety and liveness conditions for an execution to be admissible. The liveness
condition states that every node will eventually enter an elected state or a non-elected state. The
safety condition for leader election requires that only a single node can enter the elected state
and eventually become the leader of the distributed system.
Page 1 of 10
Introduction to Distributed System ITec3102
Many distributed election algorithms have been proposed to resolve the problem of leader
election. Among the existing algorithms, the most prominent are:
a. Ring Algorithm
b. Bully Algorithm
The ring algorithm is suitable for a collection of processes arranged in a logical ring. Each process pi
has a communication channel to the next process in the ring, p(i+1) mod N, and all messages
are sent clockwise around the ring. The goal of this algorithm is to elect a single process called
the coordinator, which is the process with the largest identifier. Initially, every process is marked
as a non-participant in an election. Any process can begin an election. It proceeds by marking
itself as a participant, placing its identifier in an election message and sending it to its clockwise
neighbour. When a process receives an election message, it compares the identifier in the
message with its own. If the arrived identifier is greater, then it forwards the message to its
neighbour. If the arrived identifier is smaller and the receiver is not a participant, then it
substitutes its own identifier in the message and forwards it; but it does not forward the message
if it is already a participant. On forwarding an election message in any case, the process marks
itself as a participant.
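The message-handling rules above can be sketched as a single-threaded simulation. The identifiers, ring order, and starting processes below are invented for illustration; a real implementation would exchange messages over channels between machines.

```python
from collections import deque

def ring_election(ids, starters):
    """Simulate the ring algorithm. ids[i] is the identifier of process i;
    processes form a ring in index order. Returns elected_i per process."""
    n = len(ids)
    participant = [False] * n
    elected = [None] * n
    queue = deque()                       # in-flight messages: (dest, msg)

    for s in starters:                    # any process may begin an election
        participant[s] = True
        queue.append(((s + 1) % n, ("election", ids[s])))

    while queue:
        pos, (kind, ident) = queue.popleft()
        nxt = (pos + 1) % n
        if kind == "election":
            if ident > ids[pos]:          # greater identifier: forward it
                participant[pos] = True
                queue.append((nxt, ("election", ident)))
            elif ident < ids[pos]:
                if not participant[pos]:  # substitute our own identifier
                    participant[pos] = True
                    queue.append((nxt, ("election", ids[pos])))
                # already a participant: extinguish the duplicate message
            else:                         # our own identifier came back: we win
                participant[pos] = False
                elected[pos] = ident
                queue.append((nxt, ("elected", ident)))
        else:                             # "elected" announcement
            participant[pos] = False
            elected[pos] = ident
            if ids[pos] != ident:         # circulate once around the ring
                queue.append((nxt, ("elected", ident)))
    return elected

print(ring_election([3, 17, 24, 9, 1], starters=[1]))  # [24, 24, 24, 24, 24]
```

Starting the election at two processes at once (e.g. `starters=[0, 3]`) exercises the extinguishing rule: the duplicate carrying the smaller identifier dies at the first participant it reaches.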
Note: The election was started by process 17. The highest process identifier encountered
so far is 24.
If, however, the received identifier is that of the receiver itself, then this process’s identifier must
be the greatest, and it becomes the coordinator. The coordinator marks itself as a non-participant
once more and sends an elected message to its neighbour, announcing its election and enclosing
its identity. When a process pi receives an elected message, it marks itself as a nonparticipant,
sets its variable electedi to the identifier in the message and, unless it is the new coordinator,
forwards the message to its neighbour. It is easy to see that condition E1 (safety: at most one
process is elected) is met. All identifiers are compared, since a process must receive its own
identifier back before sending an elected message. For any two processes, the one with the
larger identifier will not pass on the other's identifier. It is therefore impossible that both
should receive their own identifier back. Condition E2 (liveness: every process eventually
learns the result) follows immediately from the guaranteed traversals of the ring
(there are no failures). Note how the non-participant and participant states are used so
that duplicate messages, arising when two processes start an election at the same time, are
extinguished as soon as possible, and always before the 'winning' election result has
been announced.
The bully algorithm allows processes to crash during an election, although it assumes that
message delivery between processes is reliable. Unlike the ring-based algorithm, this algorithm
assumes that the system is synchronous: it uses timeouts to detect a process failure. Another
difference is that the ring-based algorithm assumed that processes have minimal a priori
knowledge of one another: each knows only how to communicate with its neighbour, and none
knows the identifiers of the other processes. The bully algorithm, on the other hand, assumes that
each process knows which processes have higher identifiers, and that it can communicate with
all such processes.
There are three types of message in this algorithm: an election message is sent to announce an
election; an answer message is sent in response to an election message and
a coordinator message is sent to announce the identity of the elected process – the new
‘coordinator’. A process begins an election when it notices, through timeouts, that the
coordinator has failed. Several processes may discover this concurrently.
Since the system is synchronous, we can construct a reliable failure detector. There is a
maximum message transmission delay, Ttrans, and a maximum delay for processing a message,
Tprocess. Therefore, we can calculate a time T = 2Ttrans + Tprocess that is an upper bound on the time
that can elapse between sending a message to another process and receiving a response. If no
response arrives within time T, then the local failure detector can report that the intended
recipient of the request has failed.
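With made-up values for the two bounds (the text does not fix them), the failure-detection timeout works out as:

```python
# Assumed bounds, in milliseconds (illustrative values only).
T_trans = 10     # maximum one-way message transmission delay
T_process = 5    # maximum delay for processing a message

# Round trip: transmit the request, process it, transmit the response.
T = 2 * T_trans + T_process
print(T)         # 25 ms: no reply within T means the recipient has failed
```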
The process that knows it has the highest identifier can elect itself as the
coordinator simply by sending a coordinator message to all processes with lower
identifiers. On the other hand, a process with a lower identifier can begin an election by
sending an election message to those processes that have a higher identifier and awaiting
answer messages in response. If none arrives within time T, the process considers itself
the coordinator and sends a coordinator message to all processes with lower identifiers
announcing this. Otherwise, the process waits a further period T′ for a coordinator
message to arrive from the new coordinator. If none arrives, it begins another election.
If a process pi receives a coordinator message, it sets its variable electedi to the
identifier of the coordinator contained within it and treats that process as the coordinator.
If a process receives an election message, it sends back an answer message and
begins another election – unless it has begun one already. When a process is started to replace a
crashed process, it begins an election. If it has the highest process identifier, then it will decide
that it is the coordinator and announce this to the other processes. Thus it will become the
coordinator, even though the current coordinator is functioning. It is for this reason that the
algorithm is called the ‘bully’ algorithm.
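The rules above can be condensed into a round-based sketch. This is our own simplification: instead of real timeouts, a "crashed" process simply never answers, and we assume no process fails while the election runs.

```python
def bully_election(ids, alive, initiator):
    """ids[i]: identifier of process i; alive[i]: whether it responds.
    Returns the identifier announced in the coordinator message when
    `initiator` starts an election."""
    def run(i):
        higher = [j for j in range(len(ids)) if ids[j] > ids[i]]
        # Election messages go to all higher processes; only live ones
        # can send back an answer message within time T.
        answerers = [j for j in higher if alive[j]]
        if not answerers:
            return ids[i]     # no answer arrived: i considers itself coordinator
        # Each answerer begins its own election in turn; the election
        # started by the highest live process is the one that completes.
        return max(run(j) for j in answerers)
    return run(initiator)

# The coordinator (id 4) has crashed; process with id 1 starts the election
# and the process with id 3 ends up as the new coordinator.
print(bully_election([1, 2, 3, 4], alive=[True, True, True, False], initiator=0))  # 3
```

The winner would then send a coordinator message to every process with a lower identifier, as the text describes.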
The operation of the algorithm is shown in Figure 6. There are four processes,
p1–p4. Process p1 detects the failure of the coordinator p4 and announces an election
(stage 1 in the figure). On receiving an election message from p1, processes p2 and p3
send answer messages to p1 and begin their own elections; p3 sends an answer
message to p2, but p3 receives no answer message from the failed process p4 (stage 2). It
therefore decides that it is the coordinator. But before it can send out the
coordinator message, it too fails (stage 3). When p1's timeout period T′ expires (which
we assume occurs before p2's timeout expires), it deduces the absence of a coordinator
message and begins another election. Eventually, p2 is elected coordinator (stage 4).
Mutual Exclusion
A mutual exclusion (mutex) is a program object that prevents simultaneous access to a shared
resource. The concept is used in concurrent programming together with a critical section, a piece of code
in which processes or threads access a shared resource. Only one thread can own the mutex at a
time. A mutex with a unique name is created when a program starts. When a thread needs a shared
resource, it locks the mutex to prevent concurrent access to the resource by other threads.
Upon releasing the resource, the thread unlocks the mutex.
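A minimal sketch using Python's threading module (the shared counter and the thread counts are our own example, not from the text):

```python
import threading

counter = 0                      # the shared resource
mutex = threading.Lock()         # the mutex guarding it

def worker(increments):
    global counter
    for _ in range(increments):
        with mutex:              # lock: blocks until no other thread holds it
            counter += 1         # critical section on the shared resource
        # the mutex is unlocked automatically when the with-block exits

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)                   # 400000: no increments are lost
```

Without the lock, two threads could read the same old value of `counter` and each write back old value + 1, losing an update.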
1. The largest boost in performance will likely be noticed as improved response time while
running CPU-intensive processes, such as anti-virus scans or ripping/burning media.
2. Assuming that the die can physically fit into the package, multi-core CPU designs require
much less printed circuit board (PCB) space than multi-chip SMP designs. Also, a dual-core
processor uses slightly less power than two coupled single-core processors, principally because
of the decreased power required to drive signals external to the chip.
Consistency Models
Traditionally, consistency has been discussed in the context of read and write operations on
shared data, available by means of (distributed) shared memory, a (distributed) shared database,
or a (distributed) file system. In this section, we use the broader term data store.
A data store may be physically distributed across multiple machines. In particular, each process
that can access data from the store is assumed to have a local (or nearby) copy of the
entire store available. Write operations are propagated to the other copies, as shown in Fig. 1. A data
operation is classified as a write operation when it changes the data, and is otherwise classified
as a read operation. A consistency model is essentially a contract between processes and the data
store: if processes agree to obey certain rules, the store promises to work correctly.
Normally, a process that performs a read operation on a data item expects the operation to return
a value that shows the result of the last write operation on that data.
Fig. 1. The general organization of a logical data store, physically distributed and replicated
across multiple processes.
Data-centric consistency models are classified into the following types:
a. Sequential Consistency
In the following, we will use a special notation in which we draw the operations of a process
along a time axis. The time axis is always drawn horizontally, with time increasing from left to
right. The symbols Wi(x)a and Ri(x)b
mean that a write by process Pi to data item x with the value a, and a read from
that item by Pi returning b, have been done, respectively. We assume that each data item is
initially NIL. When there is no confusion concerning which process is accessing data, we omit
the index from the symbols W and R.
Fig. 2. Behavior of two processes operating on the same data item. The horizontal axis is time.
As an example, in Fig. 2, P1 does a write to a data item x, modifying its value to a. Note that, in
principle, this operation W1(x)a is first performed on a copy of the data store that is local to P1,
and is then subsequently propagated to the other local copies. In our example, P2 later reads the
value NIL, and some time after that a (from its local copy of the store). What we are seeing here
is that it took some time to propagate the update of x to P2, which is perfectly acceptable.
Sequential consistency is an important data-centric consistency model, which was first defined
by Lamport in the context of shared memory for multiprocessor systems. In general, a data store
is said to be sequentially consistent when it satisfies the following condition:
The result of any execution is the same as if the (read and write) operations by all processes on
the data store were executed in some sequential order, and the operations of each individual
process appear in this sequence in the order specified by its program.
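The definition can be checked mechanically for tiny histories: a history is sequentially consistent if some interleaving that preserves each process's program order makes every read return the most recent write. The brute-force sketch below (a single data item x, with our own tuple encoding of operations) is exponential and purely illustrative.

```python
def interleavings(seqs):
    """Yield every merge of the given sequences that preserves each
    sequence's internal (program) order."""
    if all(not s for s in seqs):
        yield []
        return
    for i, s in enumerate(seqs):
        if s:
            rest = [t[1:] if j == i else t for j, t in enumerate(seqs)]
            for tail in interleavings(rest):
                yield [s[0]] + tail

def legal(order):
    """A read must return the most recent write; x is initially NIL (None)."""
    value = None
    for op, v in order:
        if op == "W":
            value = v
        elif value != v:
            return False
    return True

def sequentially_consistent(histories):
    return any(legal(order) for order in interleavings(list(histories)))

# P1 writes a; P2 reads NIL and then a -- explainable by a sequential order.
print(sequentially_consistent([[("W", "a")], [("R", None), ("R", "a")]]))  # True
# P2 reads a and then NIL -- no sequential order can explain this.
print(sequentially_consistent([[("W", "a")], [("R", "a"), ("R", None)]]))  # False
```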
b. Strict Consistency
Any read always returns the result of the most recent write.
Analysis:
1. In a single-processor system, strict consistency comes for free: it is simply the behavior of
main memory with atomic reads and writes.
2. However, without a notion of global time, it is hard to determine which write is the most
recent.
c. Causal Consistency
The causal consistency model (Hutto and Ahamad) represents a weakening of sequential
consistency in that it makes a distinction between events that are potentially causally related and
those that are not. If event b is caused or influenced by an earlier event a, causality requires that
everyone else first see a, then see b.
Consider a simple interaction by means of a distributed shared database. Suppose that process
P1 writes a data item x. Then P2 reads x and writes y. Here the reading of x and the writing of y
are potentially causally related, because the computation of y may have depended on the value of
x as read by P2.
On the other hand, if two processes spontaneously and simultaneously write two different data
items, these are not causally related. Operations that are not causally related are said to be
concurrent. For a data store to be considered causally consistent, it is necessary that the
store obeys the following condition:
Writes that are potentially causally related must be seen by all processes in the same order.
Concurrent writes may be seen in a different order on different machines.
As an example of causal consistency, consider Fig. 3. Here we have an event
sequence that is allowed with a causally-consistent store, but which is forbidden
with a sequentially-consistent store or a strictly consistent store. The thing to note
is that the writes W2(x)b and W1(x)c are concurrent, so it is not required that all
processes see them in the same order.
Fig.3. This sequence is allowed with a causally-consistent store, but not with a sequentially
consistent store.
Now consider a second example. In Fig. 4(a) we have W2(x)b potentially depending on W1(x)a,
because the b may be the result of a computation involving the value read by R2(x)a. The two
writes are causally related, so all processes must see them in the same order. Therefore, Fig. 4(a)
is incorrect. On the other hand, in Fig. 4(b) the read has been removed, so W1(x)a and W2(x)b
are now concurrent writes. A causally-consistent store does not require concurrent writes
to be globally ordered, so Fig. 4(b) is correct. Note that Fig. 4(b) reflects a situation that would not
be acceptable for a sequentially consistent store.
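To make "potentially causally related" concrete, one standard mechanism (our illustration; the text above defines causality only informally) is to tag each write with a vector clock and compare the clocks:

```python
def happened_before(a, b):
    """Vector clock a precedes b if a <= b componentwise and a != b."""
    return all(x <= y for x, y in zip(a, b)) and a != b

def relation(vc_a, vc_b):
    """Classify two writes by their vector clocks."""
    if happened_before(vc_a, vc_b):
        return "causally related: a before b"
    if happened_before(vc_b, vc_a):
        return "causally related: b before a"
    return "concurrent"

# P1 writes x at clock (1, 0); P2, having seen that write, writes y at (1, 1):
print(relation((1, 0), (1, 1)))  # causally related: a before b
# Two spontaneous, simultaneous writes by different processes:
print(relation((1, 0), (0, 1)))  # concurrent
```

In the first case every replica must apply the writes in the same order; in the second, different machines may apply them in different orders without violating causal consistency.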