Coordination and Agreement
Master 2007
Outline
Introduction
Distributed Mutual Exclusion
Election Algorithms
Group Communication
Consensus and Related Problems
11/26/20
VFSTR 2
19
Distributed Mutual Exclusion (1)
[Figure: Process 1, Process 2, Process 3, …, Process n accessing a shared resource]
Prevent interference
Ensure consistency when accessing the resources
Distributed Mutual Exclusion (2)
Mutual exclusion is useful when the server managing
the resources doesn't use locks
Critical section
Distributed Mutual Exclusion (3)
Distributed mutual exclusion: no shared variables, only message
passing
Properties:
Safety: At most one process may execute in the critical
section at a time
Liveness: Requests to enter and exit the critical section
eventually succeed
No deadlock and no starvation
Basic hypotheses:
System: asynchronous
Processes: don’t fail
Message transmission: reliable
If no other process has the token at the time of request, then the server
replies immediately, granting the token.
If the token is currently held by another process, then the server does not
reply but queues the request.
On exiting the critical section, a message is sent to the server, giving it
back the token.
Central Server Algorithm
[Figure: a server that holds the token and a queue of requests, with processes P1 to P4; one process holds the token and another is waiting; steps shown: 1) Request token, 2) Release token, 3) Grant token]
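The request, queue and release behaviour of the central server can be sketched in a few lines of Python. This is only an illustrative sketch: the class name `CentralServer`, its methods and the process names are assumptions made for the example, and message passing is replaced by direct method calls.

```python
from collections import deque


class CentralServer:
    """Hypothetical token server: grants the token or queues the request."""

    def __init__(self):
        self.holder = None      # process currently holding the token
        self.queue = deque()    # pending requests, oldest first

    def request(self, pid):
        """Return True if the token is granted at once, else queue pid."""
        if self.holder is None:
            self.holder = pid
            return True
        self.queue.append(pid)  # no reply: the requester keeps waiting
        return False

    def release(self, pid):
        """Take the token back and grant it to the oldest queued request."""
        if pid != self.holder:
            raise ValueError("only the token holder may release")
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder      # next holder, or None if nobody waits


server = CentralServer()
granted_p3 = server.request("P3")   # token is free: granted immediately
granted_p4 = server.request("P4")   # queued
granted_p2 = server.request("P2")   # queued behind P4
next_holder = server.release("P3")  # token passes to P4
```

Serving the queue in arrival order is a design choice of this sketch; it keeps waiting requests from being starved.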
Ring-Based Algorithm (1)
A group of unordered
processes in a network
[Figure: processes P1, P2, P3, P4, …, Pn connected by an Ethernet]
Ring-Based Algorithm (2)
One of the simplest ways to arrange mutual exclusion between the N
processes without requiring an additional process is to arrange them in a
logical ring. Exclusion is conferred by obtaining a token in the form of a
message passed from process to process in a single direction (clockwise)
around the ring.
If a process does not require entry to the critical section when it receives
the token, then it immediately forwards the token to its neighbour.
A process that requires the token waits until it receives it, and then retains it.
To exit the critical section, the process sends the token on to its
neighbour. This algorithm continuously consumes network bandwidth.
[Figure: processes P1, P2, P3, P4, …, Pn arranged in a ring; the token navigates around the ring; a process calls Enter(), executes its critical section, then calls Exit()]
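A small simulation may help to see how the token travels. The sketch below assumes processes are numbered 0 to n-1 in ring order and that each process in `wants` needs the critical section exactly once; the function name and parameters are hypothetical.

```python
def ring_mutex(n, wants, start=0, max_hops=100):
    """Pass a token clockwise around n processes.

    wants: indices of the processes that need the critical section once.
    Returns (order of entry, number of token messages sent).
    """
    pending = set(wants)
    order, hops, holder = [], 0, start
    while pending and hops < max_hops:
        if holder in pending:
            order.append(holder)     # enter, then exit, the critical section
            pending.remove(holder)
        holder = (holder + 1) % n    # forward the token to the neighbour
        hops += 1
    return order, hops


order, hops = ring_mutex(n=5, wants={3, 1}, start=2)
```

Entry follows ring position rather than request time, and the token keeps moving even when nobody needs it, which is the bandwidth cost noted above.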
Mutual Exclusion using Multicast and Logical Clocks
The basic idea is that processes that require entry to a critical section
multicast a request message and can enter it only when all the other
processes have replied to this message.
Messages requesting entry are of the form <T, Pi>, where T is the
sender's timestamp and Pi is the sender's identifier.
If a process requests entry and the state of all other processes is
RELEASED, then all processes will reply immediately to the request and the
requester will obtain entry.
If some process is in state HELD, then that process will not reply to
requests until it has finished with the critical section, and so the
requester cannot gain entry in the meantime.
If two or more processes request entry at the same time, then whichever
process's request bears the lowest timestamp will be the first to collect N-1
replies, granting it entry next.
If the requests bear equal timestamps, they are ordered according to the
processes' corresponding identifiers.
[Figure: P1 and P2 request entering the critical section simultaneously; the requests carry timestamps 19 and 23, P3 replies to both, and one request is placed in a waiting queue until Exit()]
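The ordering rule on <T, Pi> pairs maps directly onto tuple comparison. The following sketch is illustrative only: the state names come from the slides, while the function names and the assignment of timestamps 23 and 19 to P1 and P2 are assumptions made for the example.

```python
RELEASED, WANTED, HELD = "RELEASED", "WANTED", "HELD"


def should_defer(state, own_request, incoming_request):
    """Decide whether a process queues an incoming request instead of replying.

    Requests are (timestamp, process_id) pairs; tuple comparison orders by
    timestamp first and breaks ties with the process identifier.
    """
    if state == HELD:
        return True
    if state == WANTED and own_request < incoming_request:
        return True
    return False


def entry_order(requests):
    """Order in which simultaneous requests gain entry."""
    return [pid for _, pid in sorted(requests)]


order = entry_order([(23, 1), (19, 2)])   # lowest timestamp first
tie = entry_order([(19, 2), (19, 1)])     # equal timestamps: lowest id first
```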
Mutual Exclusion using Multicast and Logical Clocks (2)
Main steps of the algorithm:
Initialization
State := RELEASED;
M. E. Algorithms Comparison
Algorithm | Number of messages per Enter()/Exit() | Number of messages before Enter() | Problems
Virtual ring | 1 to ∞ | 0 to N-1 | Crash of a process; token lost; ordering not satisfied
Logical clocks | 2(N-1) | 2(N-1) | Crash of a process
Maekawa's Alg. | 3√N | 2√N | Crash of a process that votes
Election Algorithms (1)
Safety: Electedi = NIL, or Electedi = P, where P is the non-crashed process with the largest identifier
Liveness: pi participates and eventually sets Electedi ≠ NIL, or crashes
An election algorithm determines which process will play the role of
coordinator or server. All processes need to agree on the selected
process. Any process can start an election, for example if it notices
that the previous coordinator has failed. The requirements of an
election algorithm are as follows:
Safety: Only one process is chosen -- the one with the largest
identifying value. The value could be load, uptime, a random number,
etc.
Liveness: All processes eventually choose a winner or crash.
Election Algorithms (2)
Bully Algorithm
Processes are arranged in a logical ring. A process starts an
election by placing its ID and value in a message and sending the
message to its neighbor. When a message is received, a process
does the following:
If the value is greater than its own, it saves the ID and forwards
the value to its neighbor.
Else if its own value is greater and it has not yet participated
in the election, it replaces the ID with its own, the value with its
own, and forwards the message.
Else if it has already participated, it discards the message.
If a process receives its own ID and value, it knows it has been
elected. It then sends an elected message to its neighbor.
When an elected message is received, it is forwarded to the next
neighbor.
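The rules above can be traced with a short simulation. This is a sketch under simplifying assumptions: a single process starts the election, no process fails, the identifier doubles as the compared value, and the ring order used in the example is hypothetical.

```python
def ring_election(ids, starter):
    """Simulate one election on a unidirectional ring.

    ids: identifiers in ring order; starter: index of the initiating process.
    Returns (elected value recorded by each process, messages sent).
    """
    n = len(ids)
    participant = [False] * n
    elected = [None] * n
    participant[starter] = True
    msg = ("election", ids[starter])       # message currently in flight
    pos = (starter + 1) % n                # process about to receive it
    sent = 1
    while msg is not None:
        kind, value = msg
        me = ids[pos]
        if kind == "election":
            if value > me:
                participant[pos] = True    # forward unchanged
            elif value < me and not participant[pos]:
                participant[pos] = True
                msg = ("election", me)     # substitute own identifier
            elif value < me:
                msg = None                 # already participating: discard
            else:
                participant[pos] = False   # own identifier came back
                elected[pos] = me
                msg = ("elected", me)
        elif value == me:
            msg = None                     # announcement completed the ring
        else:
            participant[pos] = False
            elected[pos] = value
        if msg is not None:
            pos = (pos + 1) % n
            sent += 1
    return elected, sent


elected, sent = ring_election([5, 16, 9, 25], starter=0)
```

In this run the largest identifier sits just before the starter in ring order, so the election needs 3N-1 messages for N = 4.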
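Ring-Based Election Algorithm (1)
[Figure: a ring of processes including identifiers 5, 9, 16 and 25; process 5 starts the election, and the election message carries 5, then 16, then 25]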
Ring-Based Election Algorithm (2)
Initialization
Participanti := FALSE;
Electedi := NIL
Pi starts an election
Participanti := TRUE;
Send the message <election, pi> to its neighbor
Receipt of the message <elected, pj> at pi
Participanti := FALSE;
If pi ≠ pj
Then Send the message <elected, pj> to its neighbor
Ring-Based Election Algorithm (3)
Receipt of the election message <election, pi> at pj
If pi > pj
Then Send the message <election, pi> to its neighbor
Participantj := TRUE;
Else If pi < pj AND Participantj = FALSE
DelayTrans.: message transmission delay
DelayTrait.: message processing delay
T = 2 DelayTrans. + DelayTrait.
Bully Algorithm (2)
Hypotheses (cont’d):
Each process knows which processes have higher
identifiers, and it can communicate with all such
processes
Three types of messages:
Election: starts an election
OK: sent in response to an election message
Coordinator: announces the new coordinator
Election started by a process when it notices, through
timeouts, that the coordinator has failed
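The exchange of Election, OK and Coordinator messages can be sketched as follows. The sketch makes several assumptions for illustration: timeouts are modelled by membership in an `alive` set, each process that answers OK runs its own election once, and the process identifiers in the example (with 8 as the failed coordinator) are hypothetical.

```python
def bully_election(ids, alive, starter):
    """Simulate the bully algorithm and count messages by type.

    ids: all process identifiers; alive: identifiers that still respond;
    starter: the process that noticed the coordinator's failure.
    """
    counts = {"election": 0, "ok": 0, "coordinator": 0}
    to_run = [starter]              # processes that still run an election
    started = {starter}
    coordinator = None
    while to_run:
        p = to_run.pop(0)
        higher = [q for q in ids if q > p]
        counts["election"] += len(higher)       # Election to higher ids
        answers = [q for q in higher if q in alive]
        counts["ok"] += len(answers)            # OK from live processes
        if not answers:
            coordinator = p                     # nobody higher answered
            counts["coordinator"] += sum(1 for q in ids if q < p)
        for q in answers:
            if q not in started:                # each process starts once
                started.add(q)
                to_run.append(q)
    return coordinator, counts


coordinator, counts = bully_election(
    ids=[1, 3, 4, 5, 6, 7, 8], alive={1, 3, 4, 5, 6, 7}, starter=5)
```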
Bully Algorithm (3)
[Figure: processes 1, 3, 4, 5, 6, 7 and 8; the coordinator has failed and process 5 detects it first; message types shown: Election, OK, New Coordinator]
Bully Algorithm (4)
Initialization
Electedi := NIL
Elected := pj;
Group Communication (1)
Coordination
Agreement: on the set of messages that is
received and on the delivery ordering
Groups:
Primitives:
multicast(g, m): sends the message m to all
members of group g
deliver(m) : delivers the message m to the
calling process
sender(m) : unique identifier of the process that
sent the message m
group(m): unique identifier of the group to which
the message m was sent
Group Communication (4)
Basic Multicast
Reliable Multicast
Ordered Multicast
Basic Multicast
To B-multicast(g, m)
For each process p ∈ g, send(p, m);
Use of threads to perform the send operations simultaneously
On receive(m) at p
B-deliver(m) to p
Properties to satisfy:
Integrity: A correct process P delivers the message
m at most once
B-multicast(g, m); // p ∈ g
B-deliver(m) by q with g = group(m)
If (m ∉ msgReceived)
Then msgReceived := msgReceived ∪ {m};
If (q ≠ p) Then B-multicast(g, m);
R-deliver(m);
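A minimal sketch of reliable multicast built on basic multicast is given below. It assumes that B-multicast is a plain loop of direct method calls rather than network sends, and that a message is a (sender, payload) pair that is unique; the class and method names are hypothetical.

```python
class Process:
    """Group member that re-multicasts each message on first receipt."""

    def __init__(self, name, group):
        self.name = name
        self.group = group            # shared list of all group members
        self.group.append(self)
        self.msg_received = set()     # msgReceived in the pseudocode
        self.delivered = []           # payloads R-delivered so far

    def b_multicast(self, m):
        for p in self.group:          # basic multicast: send to every member
            p.on_b_deliver(m)

    def r_multicast(self, payload):
        self.b_multicast((self.name, payload))

    def on_b_deliver(self, m):
        if m in self.msg_received:
            return                    # duplicate: deliver at most once
        self.msg_received.add(m)
        if m[0] != self.name:         # q != p: relay to the whole group
            self.b_multicast(m)
        self.delivered.append(m[1])   # R-deliver


group = []
p1, p2, p3 = Process("p1", group), Process("p2", group), Process("p3", group)
p1.r_multicast("hello")
```

Because every receiver relays the message before delivering it, a message delivered by one correct process reaches the others even if the original sender stops partway through its sends. The price is that each message is sent many times over.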
Ordered Multicast
Ordering categories:
FIFO Ordering
Total Ordering
Causal Ordering
Hybrid Ordering: Total-Causal,
Total-FIFO
FIFO Ordering (1)
[Figure: example with messages m2 and m3]
TO-multicast(g, m) by p
Unique identifier of m
Total Ordering (6)
[Figure: processes p1, …, p5 of group g agree on a sequence number for a message in three steps]
1. Message transmission: the sender multicasts <m, Ident.>
2. Proposition of a sequence number: each pi computes P_g^{pi} := MAX(A_g^{pi}, P_g^{pi}) + 1 and replies with <Ident., P_g^{pi}>
3. Assigning a sequence number to the message: the sender computes SN := MAX(P_g^{pi}), i = 1, …, 5, and multicasts <Ident., SN>; each pi sets A_g^{pi} := SN
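The proposal and assignment steps reduce to two passes over per-process counters. The sketch below is illustrative: the function name and the starting counter values are assumptions, and the tie-breaking by process identifier that a full implementation would need is left out.

```python
def agree_sequence_number(agreed, proposed):
    """Run one agreement round for a single message.

    agreed[i]:   A_g at process i (largest agreed number seen so far)
    proposed[i]: P_g at process i (largest number it has proposed so far)
    Both lists are updated in place; returns the agreed number SN.
    """
    for i in range(len(proposed)):
        proposed[i] = max(agreed[i], proposed[i]) + 1   # each pi proposes
    sn = max(proposed)                                   # sender picks the max
    for i in range(len(agreed)):
        agreed[i] = sn                                   # A_g := SN
    return sn


agreed = [3, 5, 2, 4, 1]
proposed = [4, 5, 2, 6, 1]
sn = agreed_first = agree_sequence_number(agreed, proposed)
sn2 = agree_sequence_number(agreed, proposed)   # next message gets a larger SN
```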
Causal Ordering (1)
[Figure: example with messages m2 and m3]
Initialization
V_i^g[j] := 0 (j = 1, …, N);
Causal Ordering (3)
CO-multicast(g, m)
V_i^g[i] := V_i^g[i] + 1;
B-multicast(g, <m, V_i^g>);
B-deliver(<m, V_j^g>) of pj, with g = group(m)
Place <m, V_j^g> in a hold-back queue;
Wait until (V_j^g[j] = V_i^g[j] + 1) AND (V_j^g[k] ≤ V_i^g[k]) (k ≠ j);
CO-deliver(m);
V_i^g[j] := V_i^g[j] + 1;
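The hold-back rule can be exercised with a small vector-clock sketch. It rests on a few assumptions made for the example: processes are indexed from 0, the sender delivers its own message immediately and does not receive its own packet, and the class and method names are hypothetical.

```python
class CausalProcess:
    def __init__(self, index, n):
        self.i = index
        self.v = [0] * n          # V_i^g: messages delivered per sender
        self.hold_back = []       # queued (sender j, timestamp, message)
        self.delivered = []

    def co_multicast(self, m):
        self.v[self.i] += 1
        self.delivered.append(m)  # assumption: sender delivers to itself
        return (self.i, list(self.v), m)   # packet for the other members

    def b_deliver(self, packet):
        self.hold_back.append(packet)
        progress = True
        while progress:           # re-scan: one delivery may unblock others
            progress = False
            for item in list(self.hold_back):
                j, vj, m = item
                if self._deliverable(j, vj):
                    self.hold_back.remove(item)
                    self.delivered.append(m)    # CO-deliver(m)
                    self.v[j] += 1
                    progress = True

    def _deliverable(self, j, vj):
        return vj[j] == self.v[j] + 1 and all(
            vj[k] <= self.v[k] for k in range(len(vj)) if k != j)


p0, p1, p2 = (CausalProcess(i, 3) for i in range(3))
pkt1 = p0.co_multicast("m1")
p1.b_deliver(pkt1)
pkt2 = p1.co_multicast("m2")      # m2 causally follows m1
p2.b_deliver(pkt2)                # arrives early: held back
held = len(p2.hold_back)
p2.b_deliver(pkt1)                # m1 arrives: both are delivered in order
```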
Consensus introduction
Consensus (1)
[Figure: three processes run a consensus algorithm; values shown: V1 := proceed, V2 := proceed, V3 := abort; P3 crashes]
Consensus (3)
Properties to satisfy:
Termination: Eventually each correct process
sets its decision variable
Agreement: the decision value of all correct
processes is the same:
if Pi and Pj are correct, then di = dj (i, j = 1, …, N)
Integrity: If the correct processes all proposed
the same value, then any correct process in the
decided state has chosen that value
Consensus (4)
Consensus in a synchronous system:
Use of basic multicast
At most f processes may crash
f+1 rounds are necessary
Delay of one round is bounded by a timeout
Values_i^r: set of proposed values known to process pi at the beginning of round r
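The need for f+1 rounds can be seen in a small simulation of crash failures. This is a sketch under stated assumptions: rounds are simulated by a loop instead of timeouts, a crashing process reaches only a given subset of processes in its last round, and each process decides on the minimum of the values it knows (the decision rule is not stated on the slide). The function name and the `crashes` format are hypothetical.

```python
def synchronous_consensus(proposals, f, crashes=None):
    """Simulate f+1 rounds of value exchange.

    proposals: dict pid -> proposed value.
    crashes: dict pid -> (round, set of pids still reached in that round).
    Returns dict pid -> decision for the processes that did not crash.
    """
    crashes = crashes or {}
    values = {p: {v} for p, v in proposals.items()}   # Values_i^1
    sent = {p: set() for p in proposals}              # values already sent
    alive = set(proposals)
    for r in range(1, f + 2):                         # rounds 1 .. f+1
        inbox = {p: set() for p in proposals}
        for p in alive:
            new = values[p] - sent[p]                 # send only new values
            crash_round, reached = crashes.get(p, (None, None))
            targets = reached if crash_round == r else alive
            for q in targets:
                inbox[q] |= new
            sent[p] |= new
        alive = {p for p in alive if crashes.get(p, (None,))[0] != r}
        for p in alive:
            values[p] |= inbox[p]                     # Values_i^(r+1)
    return {p: min(values[p]) for p in alive}


proposals = {1: 3, 2: 7, 3: 5}
crashes = {1: (1, {2})}        # process 1 reaches only process 2, then stops
agreed = synchronous_consensus(proposals, f=1, crashes=crashes)
split = synchronous_consensus(proposals, f=0, crashes=crashes)
```

With a single round the survivors disagree because only process 2 saw the value 3. The second round lets process 2 pass it on, so both decide the same value.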
Consensus (5)
Interactive consistency problem: variant of the consensus
problem
Objective: correct processes must agree on a vector of values,
one for each process
Properties to satisfy:
Termination: Eventually each correct process sets its
decision variable
Agreement: the decision vector of all correct processes is
the same
Integrity: If Pi is correct, then all correct processes decide
on Vi as the ith component of their vector
Consensus (6)
Byzantine generals problem: variant of the consensus
problem
Objective: a distinguished process (the commander) supplies a value that the
others must agree upon
Properties to satisfy:
Termination: Eventually each correct process sets its
decision variable
Agreement: the decision value of all correct processes is
the same:
if Pi and Pj are correct, then di = dj (i, j = 1, …, N)
Integrity: If the commander is correct, then all correct
processes decide on the value that the commander
proposed
Consensus (7)
Byzantine agreement in a synchronous system:
Example : a system composed of three processes (must
[Figure: two scenarios, each with a commander; values shown: 1, 1 in the first and 1, 0 in the second]