
Chapter 6 – Synchronization/Coordination

Outline
 Clock Synchronization
 Logical Clocks
 Mutual Exclusion
 Election Algorithms

Objectives of the Chapter

 At the end of this chapter, students should be able to discuss

 the issue of synchronization based on time (actual time and relative ordering)
 distributed mutual exclusion, to protect shared resources from simultaneous access by multiple processes
 how a group of processes can appoint a process as a coordinator, which can be done by means of election algorithms

Introduction

 Apart from communication, we need to identify how processes cooperate and synchronize with one another
 Cooperation is partly supported by naming: it allows processes to at least share resources (entities)
 This chapter mainly concentrates on how processes can synchronize and coordinate their actions
 It is important that multiple processes do not simultaneously access a shared resource, such as a file, but instead cooperate in granting each other temporary exclusive access
 We also need to address how events can be ordered
 such as whether message m1 from process P was sent before or after message m2 from process Q

Introduction…

 Synchronization and coordination are two closely related phenomena
 In process synchronization we make sure that one process waits for another to complete its operation
 In data synchronization the problem is to ensure that two sets of data are the same
 In coordination the goal is to manage the interactions and dependencies between activities in a DS
 From this perspective, one could state that coordination encapsulates synchronization

6.1 Clock Synchronization

 In centralized systems, time can be unambiguously decided by a system call
 e.g., process A at time t1 gets the time, say tA, and process B at time t2, where t1 < t2, gets the time, say tB
 then tA is always less than (possibly equal to but never greater than) tB
 Achieving agreement on time in distributed systems is difficult
 e.g., consider the make program on a UNIX machine;
 it compiles only those source files whose time of last update is later than that of the existing object file

when each machine has its own clock, an event that occurred after another event may nevertheless be assigned an earlier time

Is it possible to synchronize all the clocks in a distributed system?


The answer is surprisingly complicated

 Physical Clocks
 Even if all computers initially start at the same time, they will get out of synch after some time because the crystals in different computers run at different frequencies, a phenomenon called clock skew
 how is time actually measured?
 earlier astronomically;
 based on the earth's rotation relative to the sun; a solar day is the interval between two consecutive transits of the sun
 1 solar second = 1/86,400th of a solar day (24*3600 = 86,400)
 it was later discovered that the period of the earth's rotation is not constant;
 the earth is slowing down due to tidal friction and atmospheric drag;
 geologists believe that 300 million years ago there were about 400 days per year;
 the length of the year is not affected, only the days have become longer
 astronomical timekeeping was later replaced by counting the transitions of the cesium 133 atom, which led to what is known as TAI - International Atomic Time

 TAI was also found to have a problem: 86,400 TAI seconds are now about 3 msec less than a mean solar day, because the day keeps getting longer
 UTC (Universal Coordinated Time) was introduced by inserting leap seconds whenever the discrepancy between TAI and solar time grows to 800 msec
 UTC replaced the astronomical GMT
 in some countries, UTC is broadcast on shortwave radio and satellites (as a short pulse at the start of each UTC second) for those who need precise time; but one has to compensate for the propagation delay

 Clock Synchronization Algorithms
 two situations:
 one machine has a receiver of UTC time; how do we synchronize all other machines to it?
 no machine has a receiver and each machine keeps track of its own time; how do we synchronize them?
 Many algorithms have been proposed

 A model for all algorithms
 each machine has a timer that causes an interrupt H times per second; the interrupt handler adds 1 to a software clock
 let the value of the clock obtained this way be C
 when the UTC time is t, the value of the clock on machine p is Cp(t); if everything were perfect, Cp(t) = t, or dC/dt = 1
 but in practice there will be errors: a timer either ticks faster or slower
 if ρ is a constant such that 1 − ρ ≤ dC/dt ≤ 1 + ρ, then the timer is said to be working within its specification
 ρ is set by the manufacturer and is called the maximum drift rate

the relation between clock time and UTC when clocks tick at different rates
 if two clocks are drifting in opposite directions, at a time Δt after they were synchronized, they may be as much as 2ρ·Δt apart
 if the system designers want to guarantee a precision π, that is, that no two clocks ever differ by more than π seconds, then clocks must be resynchronized (in software) at least every π/(2ρ) seconds
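For concreteness, a small back-of-the-envelope check of this bound; the values of ρ and π below are made-up illustrations, not from the slides:

# illustrative numbers only; rho and pi are assumed values
rho = 1e-5    # maximum drift rate: the clock may gain or lose 10 us per second
pi = 0.001    # required precision: no two clocks may differ by more than 1 ms

# two clocks drifting in opposite directions diverge at up to 2*rho per second,
# so they must be resynchronized at least every pi / (2 * rho) seconds
resync_interval = pi / (2 * rho)
print(resync_interval)    # 50.0 -> resynchronize at least every 50 seconds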
 how is it done?

 Network Time Protocol (originally by Cristian)
 suitable when one machine has a UTC receiver or a correct clock; let us call this machine a time server
 then let clients (A) contact the time server (B)
 problem: message delays; how do we estimate these delays?

getting the current time from a time server

 assume that the propagation delay from A to B is the same as from B to A, i.e., T2 − T1 ≈ T4 − T3
 then A can estimate its offset relative to B as

θ = T3 + ((T2 − T1) + (T4 − T3))/2 − T4 = ((T2 − T1) + (T3 − T4))/2

 problem: time must never run backward, so a negative θ (meaning A's clock is ahead of B's) cannot simply be applied by setting the clock back
 solution: introduce the change gradually; if a timer is set to generate 100 interrupts per second, nominally 10 msec is added to the time at each interrupt; to adjust, add say 9 msec per tick (to slow the clock down, θ < 0) or 11 msec (to advance it gradually, θ > 0)
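As a hedged illustration of the estimate above, a minimal Python sketch; the timestamp values and the 10 msec tick are invented examples, not part of the protocol:

# a minimal sketch of the offset estimate; T1..T4 values are hypothetical
def estimate_offset(T1, T2, T3, T4):
    # assumes symmetric delays: theta = ((T2 - T1) + (T3 - T4)) / 2
    return ((T2 - T1) + (T3 - T4)) / 2

theta = estimate_offset(T1=10.000, T2=10.020, T3=10.021, T4=10.015)
print(theta)    # ~0.013 -> A's clock is about 13 msec behind B's

# gradual adjustment: change the amount added per tick instead of jumping
tick = 0.010                              # nominally 10 msec per interrupt
tick += 0.001 if theta > 0 else -0.001    # 11 msec to advance, 9 msec to slow down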

 The Berkeley Algorithm
 in the previous algorithm, the time server is passive;
 other machines only ask it periodically
 in Berkeley UNIX, a time daemon polls every machine from time to time to ask for its time
 it then calculates the average and sends messages to all machines so that they adjust their clocks accordingly
 suitable when no machine has a UTC receiver
 the time daemon's own time must be set manually from time to time

a) the time daemon asks all the other machines for their clock values
b) the machines answer how far ahead or behind the time daemon they are
c) the time daemon tells everyone how to adjust their clock
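A minimal sketch of the daemon's averaging step, assuming it has already polled the other machines; the sample times are invented for illustration:

# the daemon averages all clocks (its own included) and returns, for each
# machine, the adjustment that moves it to the average
def berkeley_round(daemon_time, reported_times):
    clocks = [daemon_time] + reported_times
    avg = sum(clocks) / len(clocks)
    return [avg - t for t in clocks]

print(berkeley_round(3.000, [2.950, 3.250]))
# [0.0666..., 0.1166..., -0.1833...] -> everyone converges on 3.0666...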

 there are also other algorithms
 read about clock synchronization in wireless networks

6.2 Logical Clocks
 For some applications, it is sufficient if all machines agree on the same time; it need not be the real time
 we need internal consistency of the clocks rather than closeness to real time
 hence the concept of logical clocks
 what matters is the order in which events occur
The Happened-before relationship

Issue
What usually matters is not that all processes agree on exactly what time it is, but that they agree on the order in which events occur. This requires a notion of ordering.

The happened-before relation
►If a and b are two events in the same process, and a comes before b, then a → b.
►If a is the sending of a message, and b is the receipt of that message, then a → b.
►If a → b and b → c, then a → c.

Note
This introduces a partial ordering of events in a system with concurrently operating processes.
Logical clocks

Problem
How do we maintain a global view of the system's behavior that is consistent with the happened-before relation?

Attach a timestamp C(e) to each event e, satisfying the following properties:
P1 If a and b are two events in the same process, and a → b, then we demand that C(a) < C(b).
P2 If a corresponds to sending a message m, and b to the receipt of that message, then also C(a) < C(b).

Problem
How do we attach a timestamp to an event when there is no global clock? ⇒ maintain a consistent set of logical clocks, one per process.
Logical clocks: solution

Each process Pi maintains a local counter Ci and adjusts this counter as follows:
1. For each new event that takes place within Pi, Ci is incremented by 1.
2. Each time a message m is sent by process Pi, the message receives a timestamp ts(m) = Ci.
3. Whenever a message m is received by a process Pj, Pj adjusts its local counter Cj to max{Cj, ts(m)}; it then executes step 1 before passing m to the application.

Notes
►Property P1 is satisfied by (1); Property P2 by (2) and (3).
►It can still occur that two events happen at the same time. Avoid this by breaking ties through process IDs.
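As a hedged illustration, a minimal Python sketch of these three update rules; message transport is assumed to be handled elsewhere, only the clock logic is shown:

class LamportClock:
    def __init__(self):
        self.c = 0

    def local_event(self):       # rule 1: increment for each new event
        self.c += 1
        return self.c

    def send(self):              # rule 2: timestamp the outgoing message
        return self.local_event()

    def receive(self, ts_m):     # rule 3: take the max, then increment
        self.c = max(self.c, ts_m)
        return self.local_event()

Ties can then be broken by appending the process ID to the counter value, as the note above suggests.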
Logical clocks: example

Consider three processes with event counters operating at different rates.

[Figure: three processes whose clocks tick 6, 8, and 10 time units per step. (a) Without adjustment, messages m3 and m4 arrive at clock values lower than their timestamps. (b) With Lamport's algorithm, the receivers advance their clocks on receipt: P2 adjusts its clock to 61, and P1 adjusts its clock to 70.]
Logical clocks: where implemented

Adjustments are implemented in the middleware layer:

[Figure: the application sends a message; the middleware layer adjusts the local clock and timestamps the message before handing it to the network layer; on receipt, the middleware again adjusts the local clock before the message is delivered to the application.]
Example: Total-ordered multicast

Concurrent updates on a replicated database must be seen in the same order everywhere:
►P1 adds $100 to an account (initial value: $1000)
►P2 increments the account by 1%
►There are two replicas

[Figure: Update 1 and Update 2 reach the two replicas in different orders; Update 1 is performed before Update 2 at replica #1, but after it at replica #2.]

Result
In the absence of proper synchronization: replica #1 ← $1111, while replica #2 ← $1110.
Example: Total-ordered multicast

Solution
►Process Pi sends timestamped message mi to all others. The message itself is put in a local queue queuei.
►Any incoming message at Pj is queued in queuej, according to its timestamp, and acknowledged to every other process.

Pj passes a message mi to its application if:
(1) mi is at the head of queuej
(2) for each process Pk, there is a message mk in queuej with a larger timestamp.

Note
We are assuming that communication is reliable and FIFO ordered.
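A hedged sketch of this delivery test, assuming each queue entry is a (timestamp, sender) pair and the queue is kept sorted by timestamp (ties broken by process ID); can_deliver is a hypothetical helper, not the book's code:

# deliver the head of the queue only if every other process has already
# contributed some message (request or acknowledgement) with a larger timestamp
def can_deliver(queue, other_procs):
    if not queue:
        return False
    head_ts = queue[0][0]
    for p in other_procs:
        if not any(s == p and ts > head_ts for ts, s in queue):
            return False
    return True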
Lamport's clocks for mutual exclusion

class Process:
    def __init__(self, chan):
        self.queue = []    # The request queue
        self.clock = 0     # The current logical clock

    def requestToEnter(self):
        self.clock = self.clock + 1                                           # Increment clock value
        self.queue.append((self.clock, self.procID, ENTER))                   # Append request to queue
        self.cleanupQ()                                                       # Sort the queue
        self.chan.sendTo(self.otherProcs, (self.clock, self.procID, ENTER))   # Send request

    def allowToEnter(self, requester):
        self.clock = self.clock + 1                                           # Increment clock value
        self.chan.sendTo([requester], (self.clock, self.procID, ALLOW))       # Permit the other process

    def release(self):
        tmp = [r for r in self.queue[1:] if r[2] == ENTER]                    # Remove all ALLOWs
        self.queue = tmp                                                      # and copy to a new queue
        self.clock = self.clock + 1                                           # Increment clock value
        self.chan.sendTo(self.otherProcs, (self.clock, self.procID, RELEASE)) # Release

    def allowedToEnter(self):
        commProcs = set([req[1] for req in self.queue[1:]])                   # See who has sent a message
        return (self.queue[0][1] == self.procID and len(self.otherProcs) == len(commProcs))
Lamport's clocks for mutual exclusion (continued)

def receive(self):
    msg = self.chan.recvFrom(self.otherProcs)[1]   # Pick up any message
    self.clock = max(self.clock, msg[0])           # Adjust clock value...
    self.clock = self.clock + 1                    # ...and increment
    if msg[2] == ENTER:
        self.queue.append(msg)                     # Append an ENTER request
        self.allowToEnter(msg[1])                  # and unconditionally allow it
    elif msg[2] == ALLOW:
        self.queue.append(msg)                     # Append an ALLOW
    elif msg[2] == RELEASE:
        del(self.queue[0])                         # Just remove the first message
        self.cleanupQ()                            # And sort and clean up
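A hedged sketch of how these pieces fit together at a call site; the busy-wait loop is an assumption about the surrounding framework, not part of the original code:

def enter_critical_section(proc):
    proc.requestToEnter()              # broadcast a timestamped ENTER request
    while not proc.allowedToEnter():   # own request at head + a message from everyone
        proc.receive()                 # process incoming ENTER / ALLOW / RELEASE
    # ... critical section: access the shared resource ...
    proc.release()                     # broadcast RELEASE so others may proceed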

Lamport's clocks for mutual exclusion: analogy with total-ordered multicast
►With total-ordered multicast, all processes build identical queues, delivering messages in the same order
►Mutual exclusion is about agreeing on the order in which processes are allowed to enter a critical section

Vector clocks

Observation
Lamport's clocks do not guarantee that if C(a) < C(b), then a causally preceded b.

[Figure: concurrent message transmission using logical clocks. Event a: m1 is received at T = 16; event b: m2 is sent at T = 20.]

Note
We cannot conclude that a causally precedes b.
Causal dependency

Definition
We say that b may causally depend on a if ts(a) < ts(b), with:
►for all k, ts(a)[k] ≤ ts(b)[k], and
►there exists at least one index k for which ts(a)[k] < ts(b)[k]

Precedence vs. dependency
►We say that a causally precedes b.
►b may causally depend on a, as there may be information from a that is propagated into b.
Capturing causality

Solution: each process Pi maintains a vector VCi
►VCi[i] is the local logical clock at process Pi.
►If VCi[j] = k then Pi knows that k events have occurred at Pj.

Maintaining vector clocks
1. Before executing an event, Pi executes VCi[i] ← VCi[i] + 1.
2. When process Pi sends a message m to Pj, it sets m's (vector) timestamp ts(m) equal to VCi after having executed step 1.
3. Upon receipt of a message m, process Pj sets VCj[k] ← max{VCj[k], ts(m)[k]} for each k, after which it executes step 1 and then delivers the message to the application.
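A minimal Python sketch of these three rules, assuming n processes with 0-based indices; this is an illustration, not the book's code:

class VectorClock:
    def __init__(self, n, i):
        self.vc = [0] * n    # vc[i] = own clock; vc[j] = events known at Pj
        self.i = i

    def event(self):                    # rule 1: increment own entry before an event
        self.vc[self.i] += 1

    def send(self):                     # rule 2: timestamp = VC after step 1
        self.event()
        return list(self.vc)

    def receive(self, ts_m):            # rule 3: componentwise max, then step 1
        self.vc = [max(a, b) for a, b in zip(self.vc, ts_m)]
        self.event()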
Vector clocks: example

Capturing potential causality when exchanging messages:

[Figure: two scenarios (a) and (b) of three processes exchanging messages m1..m4 with vector timestamps; in (a) m2 carries ts (2,1,0) and m4 carries ts (4,3,0), while in (b) m2 carries (4,1,0) and m4 carries (2,3,0).]

Analysis
Situation   ts(m2)      ts(m4)      ts(m2) < ts(m4)   ts(m2) > ts(m4)   Conclusion
(a)         (2, 1, 0)   (4, 3, 0)   Yes               No                m2 may causally precede m4
(b)         (4, 1, 0)   (2, 3, 0)   No                No                m2 and m4 may conflict
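A hedged sketch of the comparison used in this analysis; the helper names are hypothetical:

def causally_precedes(ts_a, ts_b):
    # ts(a) < ts(b): every entry <=, and at least one strictly smaller
    return all(a <= b for a, b in zip(ts_a, ts_b)) and ts_a != ts_b

def analyse(ts_a, ts_b):
    if causally_precedes(ts_a, ts_b):
        return "a may causally precede b"
    if causally_precedes(ts_b, ts_a):
        return "b may causally precede a"
    return "a and b may conflict"

print(analyse((2, 1, 0), (4, 3, 0)))   # situation (a): m2 may causally precede m4
print(analyse((4, 1, 0), (2, 3, 0)))   # situation (b): m2 and m4 may conflict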
Causally ordered multicasting

Observation
We can now ensure that a message is delivered only if all causally preceding messages have already been delivered.

Adjustment
Pi increments VCi[i] only when sending a message, and Pj "adjusts" VCj when receiving a message (i.e., it effectively does not change VCj[j]).

Pj postpones delivery of m until:
1. ts(m)[i] = VCj[i] + 1
2. ts(m)[k] ≤ VCj[k] for all k ≠ i
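A minimal Python sketch of this delivery test, assuming 0-based process indices; the example on the next slide is used as a check:

def may_deliver(ts_m, i, vc_j):
    # 1. m is the next message Pj expects from Pi
    if ts_m[i] != vc_j[i] + 1:
        return False
    # 2. Pj has already seen everything that causally precedes m
    return all(ts_m[k] <= vc_j[k] for k in range(len(vc_j)) if k != i)

# the example below: VC3 = [0, 2, 2] and ts(m) = [1, 3, 0] from P1 (index 0)
print(may_deliver([1, 3, 0], 0, [0, 2, 2]))
# False: ts(m)[1] = 3 > VC3[1] = 2, so P3 postpones m until it has received
# the message from P2 that it is still missing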
Causally ordered multicasting

Enforcing causal communication:

[Figure: P1 broadcasts m with ts (1,0,0); P2 receives m and then broadcasts m* with ts (1,1,0); P3 receives m* before m and must delay its delivery until m has arrived.]

Example
Take VC3 = [0, 2, 2], ts(m) = [1, 3, 0] from P1. What information does P3 have, and what will it do when receiving m (from P1)?
6.3 Mutual Exclusion
 Fundamental to DS are the concurrency of and the collaboration among multiple processes
 Processes will need to simultaneously access the same resources
 quite often a process has to read or update shared data or use a shared resource such as a printer or a file
 Concurrent accesses may corrupt the resource or make it inconsistent; solutions are needed to grant processes mutually exclusive access
 Distributed mutual exclusion algorithms can be classified into two:
a. Token-based solutions
b. Permission-based approaches
6.3 Mutual Exclusion…
a. Token-based solutions
 achieved by passing a special message, known as a token, between the processes
 a process that wants to access the shared resource waits for the token, accesses the resource, and then passes the token on
 assume a bus network (e.g., Ethernet): no physical ordering of processes is required, but a logical ring is constructed in software
 an unordered group of processes on a network
 a logical ring constructed in software

 when the ring is initialized, process 0 is given the token
 the token circulates around the ring
 a process may enter its critical section only when it holds the token
 it releases the token when it leaves the critical section
 it must not enter a second critical section with the same token
 mutual exclusion is guaranteed
 no starvation
 problems
 if the token is lost, it has to be regenerated, but detecting that it is lost is difficult
 if a process crashes
 the algorithm can be modified so that receipt of the token is acknowledged, allowing the token to bypass a crashed process
 in this case every process has to maintain the current ring configuration
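As a hedged illustration of the token discipline (not the slides' code; send_to_successor and the process attributes are assumptions):

TOKEN = "token"

def on_token_received(proc):
    if proc.wants_critical_section:
        proc.enter_critical_section()      # use the resource while holding the token
        proc.leave_critical_section()
        proc.wants_critical_section = False
    proc.send_to_successor(TOKEN)          # pass the token along the logical ring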
b. Permission-based approaches: based on getting permission from other processes
 two algorithms: centralized and distributed
 A Centralized Algorithm
 a coordinator is appointed and is in charge of granting permissions
 three messages are required: request, grant, release
a) process 1 asks the coordinator for permission to use the shared resource (through a request message); permission is granted (through a grant message)
b) process 2 then asks permission to use the same resource; the coordinator does not reply (it could also send a "no" message; this is implementation dependent) but queues process 2; process 2 blocks
c) when process 1 releases the resource, the coordinator replies to process 2
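A minimal sketch of the coordinator's side of this protocol; send() is a stand-in for the real messaging layer and only the request/grant/release bookkeeping is shown:

from collections import deque

def send(dest, msg):               # stand-in for the real messaging layer
    print("to", dest, ":", msg)

class Coordinator:
    def __init__(self):
        self.holder = None         # process currently holding the resource
        self.waiting = deque()     # blocked requesters, first come first served

    def on_request(self, p):
        if self.holder is None:
            self.holder = p
            send(p, "GRANT")           # resource is free: grant immediately
        else:
            self.waiting.append(p)     # busy: no reply, the requester blocks

    def on_release(self, p):
        self.holder = self.waiting.popleft() if self.waiting else None
        if self.holder is not None:
            send(self.holder, "GRANT")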

 The algorithm
 guarantees mutual exclusion
 is fair - first come, first served
 no starvation
 easy to implement; requires only three messages: request, grant, release
 Shortcoming: the coordinator is a single point of failure, so if it crashes, the entire system may go down (especially if processes block after sending a request); it also becomes a performance bottleneck; the solution is a decentralized algorithm (Reading Assignment)
 A Distributed Algorithm
 assume that there is a total ordering of all events in the system, e.g., obtained using Lamport's logical clocks
 when a process wants to access a shared resource, it builds a message containing the name of the resource, its process number, and the current (logical) time, and then sends the message to all other processes, conceptually including itself
 the sending of messages is assumed to be reliable; i.e., every message is acknowledged
 When a process receives a request message from another process, the action it takes depends on its own state w.r.t. the resource named in the message
 Three different cases have to be distinguished:
1. if the receiver is not accessing the resource and does not want to access it, it sends back an OK message to the sender
2. if the receiver already has access to the resource, it simply does not reply; instead it queues the request
3. if the receiver wants to access the resource as well but has not yet done so, it compares the timestamp of the incoming message with the timestamp of the request it has sent to everyone; the lowest one wins; if the incoming message has the lower timestamp, the receiver sends back an OK message; if its own message has the lower timestamp, the receiver queues the incoming request and sends nothing
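A hedged Python sketch of these three cases, written as a method in the style of the Process class shown earlier; the state values and the my_ts/deferred attributes are assumptions, not the book's code:

def on_request(self, ts, sender):
    if self.state == "RELEASED":                       # case 1: neither using nor wanting
        self.chan.sendTo([sender], "OK")
    elif self.state == "HELD":                         # case 2: currently using the resource
        self.deferred.append(sender)                   # queue the request, reply later
    else:                                              # case 3: wants the resource as well
        if (ts, sender) < (self.my_ts, self.procID):   # lowest timestamp wins, ties by id
            self.chan.sendTo([sender], "OK")           # the incoming request wins
        else:
            self.deferred.append(sender)               # own request wins: defer the reply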
 After sending out its requests, a process sits back and waits until everyone else has given permission
 As soon as all the permissions are in, it may go ahead
 When it finishes, it sends an OK message to all processes in its queue and deletes them all from the queue
 If there is no conflict, this clearly works
 It is possible that two processes try to access the resource simultaneously, as shown in Figure 6.16
a) two processes (P0 and P2) want to access a shared resource at the same moment
b) process P0 has the lowest timestamp, so it wins
c) when process P0 is done, it sends an OK message, so process P2 can now go ahead
 mutual exclusion is guaranteed
 if the total number of processes is N, then the number of messages a process needs to send and receive before it can enter its critical section is 2(N − 1): N − 1 requests and N − 1 OKs
 no single point of failure; unfortunately there are now N points of failure
 every process is also a potential bottleneck, so we have N bottlenecks
 hence, it is slower, more complicated, more expensive, and less robust than the centralized one; but it shows that a distributed algorithm is possible
 A comparison of the algorithms (assuming only point-to-point communication channels are used):

Algorithm       Messages per entry/exit     Delay before entry (in message times)   Problems
Centralized     3                           2                                       Coordinator crash
Distributed     3·(N − 1)                   2·(N − 1)                               Crash of any process
Token ring      1 to ∞                      0 to N − 1                              Lost token, process crash
Decentralized   2·m·k + m, k = 1, 2, …      2·m·k                                   Starvation, low efficiency
6.4 Election Algorithms
 There are situations where one process must act as a coordinator or initiator, or perform some other special task
 Assume that
 each process has a unique identifier id(P)
 every process knows the process number of every other process, but not the state of each process (which ones are currently running and which ones are down)
 Election algorithms attempt to locate the process with the highest identifier and designate it as coordinator
 The algorithms differ in the way they locate the coordinator
 Two traditional algorithms: the Bully algorithm and the Ring algorithm
 there are also election algorithms for wireless environments and large-scale systems (reading)
 The Bully Algorithm (the biggest one wins)
 when a process (say Pk) notices that the coordinator is no longer responding to requests, it initiates an election as follows
1. Pk sends an ELECTION message to all processes with higher identifiers (Pk+1, Pk+2, …, PN-1)
 if a process gets an ELECTION message from one of its lower-numbered colleagues, it sends an OK message back to the sender and holds an election itself
2. If no one responds, Pk wins the election and becomes the coordinator
3. If one of the higher-ups answers, it takes over and Pk's job is done

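A hedged, message-free sketch of the bully logic; 'alive' stands in for the set of processes that are currently up, an assumption that a real implementation discovers through the ELECTION/OK exchange:

def hold_election(k, ids, alive):
    higher = [p for p in ids if p > k and p in alive]
    if not higher:                 # step 2: no live higher-numbered process,
        return k                   # so k wins and announces COORDINATOR to all
    # step 3: a higher-up answers OK and takes over; it holds its own election,
    # so eventually the highest live identifier wins
    return hold_election(min(higher), ids, alive)

print(hold_election(4, list(range(8)), alive=set(range(7))))
# -> 6: process 7 (the crashed coordinator) is down, as in the figure below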
a) Process 4 holds an election
b) Processes 5 and 6 respond, telling 4 to stop
c) Now 5 and 6 each hold an election
d) Process 6 tells 5 to stop
e) Process 6 wins and tells everyone

 the winner sends a message to all processes telling them that it is the new coordinator
 if a process that was previously down comes back up, it holds an election
 The Ring Algorithm
 based on the use of a ring
 assume that the processes are physically or logically ordered, so that each process knows who its successor is
 when a process notices that the coordinator is not functioning,
 it builds an ELECTION message containing its own process identifier and sends it to its successor
 if the successor is down, the sender skips over it and goes to the next member along the ring, or the one after that, until a running process is located
 at each step, the current process adds its own identifier to the list in the message, effectively making itself a candidate
 eventually the message gets back to the originating process; the process with the highest identifier in the list is elected as coordinator, the message type is changed to COORDINATOR, and the message is circulated once again to announce the coordinator and the members of the ring

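A minimal sketch of one circulation of the ELECTION message, assuming 'ring' lists the live processes in ring order (a real implementation would skip crashed successors on the fly):

def ring_election(ring, starter):
    candidates = []
    i = ring.index(starter)
    for _ in range(len(ring)):         # the ELECTION message goes around once
        candidates.append(ring[i])     # each process appends its own identifier
        i = (i + 1) % len(ring)
    coordinator = max(candidates)      # the highest identifier wins
    # a COORDINATOR message is then circulated to announce the result
    return coordinator

print(ring_election([2, 5, 6, 0, 1], starter=5))   # -> 6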
 e.g., processes 2 and 5 simultaneously discover that the previous coordinator has crashed
 they each build an ELECTION message, and a coordinator is chosen
 although the process is done twice, there is no problem; only a little extra bandwidth is consumed

election algorithm using a ring
