Chapter 6: Synchronization and Coordination
Outline
Clock Synchronization
Logical Clocks
Mutual Exclusion
Election Algorithms
Objectives of the Chapter
Introduction
6.1 Clock Synchronization
when each machine has its own clock, an event that occurred after another event may nevertheless be assigned an earlier time
Physical Clocks
Even if all computers initially start at the same time, they will get out of sync after some time because the crystals in different computers run at slightly different frequencies, a phenomenon called clock skew
how is time actually measured?
earlier astronomically:
based on the time it takes the earth to complete one rotation relative to the sun (a solar day)
1 solar second = 1/86,400th of a solar day (24 * 3600 = 86,400)
it was later discovered that the period of the earth’s
rotation is not constant;
the earth is slowing down due to tidal friction and atmospheric drag;
geologists believe that 300 million years ago there were about 400 days per year;
the length of the year is not affected, only the days have become longer
astronomical timekeeping was later replaced by counting the transitions of the cesium-133 atom, leading to what is known as TAI (International Atomic Time)
TAI was also found to have a problem: 86,400 TAI seconds is now about 3 msec less than a mean solar day, because the solar day keeps getting longer
UTC (Universal Coordinated Time) was introduced by
having leap seconds whenever the discrepancy between
TAI and solar time grows to 800 msec
UTC replaced the astronomical GMT
in some countries, UTC is broadcast on shortwave radio and satellites (as a short pulse at the start of each UTC second) for those who need precise time; but one has to compensate for the propagation delay
Clock Synchronization Algorithms
two situations:
one machine has a receiver of UTC time; how do we synchronize all the other machines to it?
no machine has a receiver and each machine keeps track of its own time; how do we synchronize them?
many algorithms have been proposed
A model for all algorithms
each machine has a timer that causes an interrupt H times per second; the interrupt handler adds 1 to a software clock
let the value of this clock be C
when the UTC time is t, the value of the clock on machine p is Cp(t); if everything were perfect, Cp(t) = t, or dC/dt = 1
but in practice there will be errors; a timer ticks either slightly faster or slightly slower
if ρ is a constant such that 1 - ρ ≤ dC/dt ≤ 1 + ρ, then the timer is said to be working within its specification
ρ is set by the manufacturer and is called the maximum drift rate
Figure: the relation between clock time and UTC when clocks tick at different rates
if two clocks are drifting in opposite directions, then at a time Δt after they were synchronized they may be as much as 2ρΔt apart
if the system designers want to guarantee a precision π, that is, that no two clocks should ever differ by more than π seconds, then clocks must be resynchronized (in software) at least every π/(2ρ) seconds
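A quick worked example of this bound as a small Python sketch (the drift rate and precision values below are illustrative assumptions, not values from the slides):

rho = 1e-5        # assumed maximum drift rate (10 ppm), as specified by a manufacturer
precision = 1e-3  # assumed required precision: no two clocks may differ by more than 1 ms

# two clocks drifting in opposite directions move apart at rate 2*rho,
# so they must be resynchronized at least every precision/(2*rho) seconds
resync_interval = precision / (2 * rho)
print(resync_interval)  # 50.0 seconds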
how is it done?
Network Time Protocol (based on Cristian's algorithm)
suitable when one machine has a UTC receiver or
a correct clock; let us call this machine a time
server
then let clients (A) contact the time server (B)
problem: message delays; how can we estimate these delays?
assume that the propagation delay from A to B is roughly the same as from B to A, i.e., T2 - T1 ≈ T4 - T3
then A can estimate its offset relative to B as
θ = T3 + ((T2 - T1) + (T4 - T3))/2 - T4 = ((T2 - T1) + (T3 - T4))/2
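A minimal Python sketch of this estimate (the function name and the example timestamps are mine, for illustration only):

def estimate_offset_and_delay(t1, t2, t3, t4):
    # t1: A sends request, t2: B receives it, t3: B sends reply, t4: A receives reply
    # assumes the A->B and B->A propagation delays are roughly equal
    offset = ((t2 - t1) + (t3 - t4)) / 2   # how far A's clock is behind B's
    delay = (t4 - t1) - (t3 - t2)          # total time the messages spent on the network
    return offset, delay

# example: A's clock is running roughly 100 msec behind B's
print(estimate_offset_and_delay(10.000, 10.110, 10.112, 10.020))  # approx. (0.101, 0.018)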
problem: time must never run backward; if θ < 0, i.e., A's clock is ahead of B's, A cannot simply set its clock back
solution: introduce the change gradually; if a timer is set to generate 100 interrupts per second, normally 10 msec is added to the software clock at each interrupt; to slow the clock down (θ < 0), add only 9 msec per interrupt; to advance it gradually (θ > 0), add 11 msec instead
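A small sketch of this gradual correction (assuming, as above, a 100 Hz timer that normally adds 10 msec per interrupt; the numbers are illustrative):

def ms_per_tick(remaining_offset_ms):
    if remaining_offset_ms > 0:    # our clock is behind: advance it a bit faster
        return 11
    if remaining_offset_ms < 0:    # our clock is ahead: never jump back, just slow down
        return 9
    return 10                      # in sync: normal 10 msec per interrupt

clock_ms, offset_ms = 0, -30       # example: we are 30 msec ahead of the reference
for _ in range(5):                 # five timer interrupts
    step = ms_per_tick(offset_ms)
    clock_ms += step               # time still moves forward on every interrupt
    offset_ms -= step - 10         # each slow tick removes 1 msec of the discrepancy
print(clock_ms, offset_ms)         # 45 -25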
The Berkeley Algorithm
in the previous algorithm, the time server is passive; the other machines only ask it periodically
in Berkeley UNIX, a time daemon polls every machine from time to time to ask for its time
it then calculates the average and sends messages to all machines so that they adjust their clocks accordingly
suitable when no machine has a UTC receiver
the time daemon's time must still be set manually from time to time
a) the time daemon asks all the other machines for their clock values
b) the machines answer how far ahead or behind the time daemon they are
c) the time daemon tells everyone how to adjust their clock
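A minimal sketch of the averaging step (the function name and values are illustrative; the values mirror the classic 3:00 / 3:25 / 2:50 example, in minutes):

def berkeley_round(daemon_time, reported_times):
    # average all clock values, including the daemon's own,
    # and return the adjustment each machine should apply
    all_times = [daemon_time] + list(reported_times)
    average = sum(all_times) / len(all_times)
    return [average - t for t in all_times]

# daemon reads 3:00 (180 min); the other machines report 3:25 (205) and 2:50 (170)
print(berkeley_round(180, [205, 170]))   # [5.0, -20.0, 15.0] -> everyone moves to 3:05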
6.2 Logical Clocks
For some applications, it is sufficient if all machines agree on the same time; it is not essential that they also agree with the real time
we need internal consistency of the clocks rather than closeness to real time
hence the concept of logical clocks
what matters is the order in which events occur
Lamport's Logical Clocks
Issue
What usually matters is not that all processes agree on exactly what time it is, but that they agree on the order in which events occur. This requires a notion of ordering.
Note
This introduces a partial ordering of events (the happened-before relation) in a system with concurrently operating processes.
Problem
How do we maintain a global view of the system's behavior that is consistent with the happened-before relation?
How do we attach a timestamp to an event when there is no global clock? ⇒ maintain a consistent set of logical clocks, one per process.
Lamport's solution: each process Pi maintains a local counter Ci, updated as follows:
(1) before executing an event, Pi increments Ci: Ci ← Ci + 1
(2) when Pi sends a message m to Pj, it sets m's timestamp ts(m) equal to Ci
(3) upon receipt of m, Pj adjusts its own counter to max{Cj, ts(m)} and then executes step (1) before delivering m to the application
Notes
►Property P1 (events within one process are ordered) is satisfied by (1); Property P2 (a send precedes the corresponding receive) by (2) and (3).
►It can still occur that two events happen at the same time. Avoid this by breaking ties through process IDs.
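A minimal Python sketch of these rules (class and method names are mine, not from the slides):

class LamportClock:
    def __init__(self, pid):
        self.pid = pid
        self.time = 0

    def local_event(self):
        self.time += 1                             # rule (1): increment before each event
        return self.time

    def send_event(self):
        self.time += 1                             # rule (1), then rule (2):
        return self.time                           # this value is piggybacked on the message

    def receive_event(self, msg_timestamp):
        self.time = max(self.time, msg_timestamp)  # rule (3): adjust to the sender's clock...
        self.time += 1                             # ...and increment before delivering
        return self.time

    def timestamp(self):
        return (self.time, self.pid)               # break ties with the process ID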
Figure: three processes, each with its own clock running at a different rate (left), and the same processes after Lamport's algorithm has corrected the clocks (right); P2 adjusts its clock when message m3 arrives, and P1 adjusts its clock when message m4 arrives.
Figure: the positioning of Lamport's logical clocks in distributed systems — the middleware layer adjusts the local clock and timestamps each outgoing message.
Example: Total-ordered multicasting
Figure: updating a replicated database — one replica performs Update 1 before Update 2, while the other performs Update 2 before Update 1.
Result
In the absence of proper synchronization: replica #1 ← $1111, while replica #2 ← $1110.
Solution
►Process Pi sends timestamped message mi to all others. The message itself is put in a local queue queuei.
►Any incoming message at Pj is queued in queuej, according to its timestamp, and acknowledged to every other process.
►Pj passes a message mi to its application only when mi is at the head of queuej and has been acknowledged by every other process.
Note
We are assuming that communication is reliable and FIFO ordered.
def receive(self):
    msg = self.chan.recvFrom(self.otherProcs)[1]  # Pick up any message
    self.clock = max(self.clock, msg[0])          # Adjust local clock value...
    self.clock = self.clock + 1                   # ...and increment it
    if msg[2] == ENTER:
        self.queue.append(msg)                    # Append an ENTER request
        self.allowToEnter(msg[1])                 # and unconditionally allow it
    elif msg[2] == ALLOW:
        self.queue.append(msg)                    # Append an ALLOW message
    elif msg[2] == RELEASE:
        del self.queue[0]                         # Just remove the first message
    self.cleanupQ()                               # And sort and clean up the queue
Vector Clocks
Observation
Lamport's clocks do not guarantee that if C(a) < C(b), then a causally preceded b.
Figure: concurrent message transmission using logical clocks — event a: m1 is received at T = 16; event b: m2 is sent at T = 20.
Note
Even though C(a) = 16 < C(b) = 20, we cannot conclude that a causally precedes b.
Causal dependency
Definition
We say that b may causally depend on a if ts(a) < ts(b), with:
►for all k, ts(a)[k] ≤ ts(b)[k], and
►there exists at least one index k for which ts(a)[k] < ts(b)[k]
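Following the standard construction (each process Pi keeps a vector VCi, where VCi[i] counts Pi's own events and VCi[j] is what Pi knows about Pj), this definition can be checked mechanically. A small sketch with hypothetical helper names:

def vle(a, b):
    # ts(a) <= ts(b): every component of a is at most the corresponding component of b
    return all(x <= y for x, y in zip(a, b))

def causally_precedes(a, b):
    # ts(a) < ts(b): a may causally precede b
    return vle(a, b) and a != b

def concurrent(a, b):
    # neither precedes the other: the two events may conflict
    return not causally_precedes(a, b) and not causally_precedes(b, a)

print(causally_precedes([2, 1, 0], [4, 3, 0]))   # True
print(concurrent([4, 1, 0], [2, 3, 0]))          # True: the two events may conflict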
Capturing causality
Figure: capturing potential causality when exchanging messages, situations (a) and (b) — messages m1, m2, m3, m4 each carry a vector timestamp such as (2,3,0) or (4,3,2).
Analysis
For each situation, compare ts(m2) and ts(m4): if ts(m2) < ts(m4), then m2 may causally precede m4; if neither ts(m2) < ts(m4) nor ts(m2) > ts(m4) holds, the two messages are concurrent and may conflict.
Observation
We can now ensure that a message is delivered only if all causally preceding messages have already been delivered.
Adjustment
Pi increments VCi[i] only when sending a message, and Pj "adjusts" VCj when receiving a message (i.e., it effectively does not change VCj[j]).
Figure: enforcing causal communication — P1 broadcasts m with timestamp (1,0,0); P2 delivers m and then broadcasts m* with timestamp (1,1,0); if m* reaches P3 before m does, P3 must delay delivering m* until m has been delivered.
Example
Take VC3 = [0, 2, 2], ts(m) = [1, 3, 0] from P1. What information does P3 have, and what will it do when receiving m (from P1)?
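A sketch of the delivery test implied by this example (with the adjustment above, i.e., senders increment only their own entry; the function name is mine):

def can_deliver(ts_msg, vc_recv, sender):
    # deliver a message from `sender` only if it is the next one expected from that
    # process and the receiver has already seen everything the sender had seen
    next_expected = ts_msg[sender] == vc_recv[sender] + 1
    seen_the_rest = all(ts_msg[k] <= vc_recv[k]
                        for k in range(len(ts_msg)) if k != sender)
    return next_expected and seen_the_rest

# the example above: P3 has VC3 = [0, 2, 2] and receives m with ts(m) = [1, 3, 0] from P1
print(can_deliver([1, 3, 0], [0, 2, 2], 0))  # False: P3 first needs one more message from P2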
6.3 Mutual Exclusion
Concurrency and collaboration among multiple processes are fundamental to distributed systems
Processes often need to access the same resources simultaneously
When a process has to read or update shared data (quite often via a shared resource such as a printer or a file), concurrent accesses may corrupt the resource or make it inconsistent; solutions are needed to grant processes mutually exclusive access
Distributed mutual exclusion algorithms can be classified into two groups:
a. Token-based solutions
b. Permission-based approaches
6.3 Mutual Exclusion..
a. Token-based solutions
achieved by passing a special message between the
processes, known as a token
a token is passed between processes; if a process wants
to transmit, it waits for the token, transmits its message
and releases the token
assume a bus network (e.g., Ethernet); no physical
ordering of processes required but a logical ring is
constructed by software
a) an unordered group of processes on a network
b) a logical ring constructed in software
when the ring is initialized, process 0 is given a token
the token circulates around the ring
a process can enter into a critical region only when it has
access to the token
it releases it when it leaves the critical region
it should not enter into a second critical region with the
same token
mutual exclusion is guaranteed
no starvation
problems
if the token is lost, it has to be regenerated, but detecting that it is lost is difficult
if a process crashes, the algorithm can be modified so that receipt of the token is acknowledged, allowing the token to bypass a crashed process
in this case every process has to maintain the current ring configuration
a minimal sketch of the basic token-passing scheme follows
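A minimal simulation of the basic scheme (class and variable names are mine; a real implementation would pass the token as a network message between machines):

class RingNode:
    def __init__(self, pid):
        self.pid = pid
        self.wants_critical_region = False

    def on_token(self):
        # the critical region may be entered only while holding the token
        if self.wants_critical_region:
            print(f"P{self.pid} enters and leaves its critical region")
            self.wants_critical_region = False    # do not reuse the same token
        # afterwards the token is forwarded to the successor (done by the loop below)

nodes = [RingNode(i) for i in range(4)]           # logical ring 0 -> 1 -> 2 -> 3 -> 0
nodes[2].wants_critical_region = True             # P2 wants the shared resource
for _ in range(2):                                # circulate the token around the ring twice
    for node in nodes:                            # process 0 holds the token first
        node.on_token()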
b. Permission-based approach: based on getting permission from other
processes
two algorithms: centralized and distributed
A Centralized Algorithm
a coordinator is appointed and is in charge of granting permissions
three messages are required: request, grant, release
a) process 1 asks the coordinator for permission to access the shared resource (through a request message); permission is granted (through a grant message)
b) process 2 then asks permission to access the same resource; the coordinator does not reply (it could also send a "no" message; this is implementation dependent), but queues process 2; process 2 blocks
c) when process 1 releases the resource, the coordinator replies to process 2
a minimal sketch of the coordinator's behavior follows
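A minimal sketch of the coordinator side (hypothetical names; the return values stand in for real grant messages sent back to the requesting processes):

from collections import deque

class Coordinator:
    def __init__(self):
        self.holder = None            # process currently holding the grant
        self.waiting = deque()        # queued requests, first come first served

    def request(self, pid):
        if self.holder is None:
            self.holder = pid
            return "GRANT"            # reply immediately with a grant message
        self.waiting.append(pid)      # otherwise queue the request; no reply is sent
        return None                   # the requester stays blocked

    def release(self, pid):
        assert pid == self.holder
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder            # next process to receive a grant, if any

c = Coordinator()
print(c.request(1))   # GRANT: process 1 may access the resource
print(c.request(2))   # None: process 2 is queued and blocks
print(c.release(1))   # 2: the coordinator now grants the resource to process 2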
The algorithm
guarantees mutual exclusion
is fair - first come first served
no starvation
easy to implement; requires only three messages:
request, grant, release
Shortcoming: the coordinator is a single point of failure, so if it crashes, the entire system may go down (especially if processes block after sending a request); it also becomes a performance bottleneck; one solution is a decentralized algorithm (Reading Assignment)
A Distributed Algorithm
assume that there is a total ordering of all events in the
system, such as using Lamport logical clocks
when a process wants to access a shared resource, it builds a message containing the name of the resource, its process number, and the current (logical) time, and then sends the message to all other processes, conceptually including itself
the sending of messages is assumed to be reliable; i.e., every message is acknowledged
When a process receives a request message from another
process, the action it takes depends on its own state w.r.t the
resource named in the message
Three different cases have to be clearly distinguished:
1. if the receiver is not accessing the resource and does not want to access it, it sends back an OK message to the sender
2. if the receiver already has access to the resource, it simply does not reply; instead, it queues the request
3. if the receiver wants to access the resource as well but has not yet done so, it compares the timestamp of the incoming message with the timestamp of the request it has sent out itself; the lowest one wins: if the incoming message has the lower timestamp, the receiver sends back an OK message; if its own message has the lower timestamp, the receiver queues the incoming request and sends nothing
After sending out requests asking permission, a process
sits back and waits until everyone else has given
permission.
As soon as all the permissions are in, it may go ahead
When it finishes, it sends an OK message to all processes in its queue and deletes them all from the queue (a sketch of this logic follows)
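A sketch of the per-process logic described above (hypothetical names; send_ok stands in for a real network send, and requests are ordered by (logical timestamp, process id) pairs):

RELEASED, WANTED, HELD = "RELEASED", "WANTED", "HELD"

class Process:
    def __init__(self, pid):
        self.pid = pid
        self.state = RELEASED
        self.my_request = None        # (logical timestamp, pid) of our own pending request
        self.deferred = []            # requests queued instead of being answered

    def request_access(self, timestamp):
        self.state = WANTED
        self.my_request = (timestamp, self.pid)   # sent to all other processes

    def on_request(self, timestamp, sender):
        their_request = (timestamp, sender)
        if self.state == HELD or (self.state == WANTED and their_request > self.my_request):
            self.deferred.append(sender)   # cases 2 and 3 where our request wins: queue it
        else:
            self.send_ok(sender)           # case 1, or case 3 where their request wins

    def on_release(self):                  # called when leaving the critical region
        self.state = RELEASED
        for sender in self.deferred:       # answer everything that was deferred
            self.send_ok(sender)
        self.deferred = []

    def send_ok(self, dest):
        print(f"P{self.pid} -> P{dest}: OK")   # placeholder for a real message send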
If there is no conflict, it clearly works.
It is possible that two processes try to simultaneously
access the resource, as shown in Figure 6.16
a) two processes (P0 and P2) want to access a shared
resource at the same moment
b) process P0 has the lowest timestamp, so it wins
c) when process P0 is done, it sends an OK message, so
process P2 can now go ahead
mutual exclusion is guaranteed
if the total number of processes is N, then a process needs to send N-1 request messages and receive N-1 OK messages before it can enter its critical section, i.e., 2(N-1) messages per entry
no single point of failure; unfortunately, there are now N points of failure
we also have N bottlenecks
hence, it is slower, more complicated, more expensive, and less robust than the previous one; but it shows that a fully distributed algorithm is possible
Table: a comparison of the three algorithms (assuming that only point-to-point communication channels are used)
6.4 Election Algorithms
There are situations where one process must act as a coordinator, initiator,
or perform some special task
Assume that
each process has a unique identifier id(P)
every process knows the process number of every other process, but not
the state of each process (which ones are currently running and which
ones are down)
Election algorithms attempt to locate the process with the highest identifier
and designate it as coordinator
The algorithms differ in the way they locate the coordinator
Two traditional algorithms: the Bully algorithm and the Ring algorithm
there are also algorithms for elections in wireless environments and in large-scale systems
The Bully Algorithm (the biggest person wins)
when a process (say Pk) notices that the coordinator is no
longer responding to requests, it initiates an election as
follows
1. Pk sends an ELECTION message to all processes with
higher identifiers (Pk+1 , Pk+2, …, PN-1)
if a process gets an ELECTION message from one of
its lower-numbered colleagues, it sends an OK
message to the sender and holds an election
2. If no one responds, Pk wins the election and becomes a
coordinator
3. If one of the higher-ups answers, it takes over and Pk's job is done
a) Process 4 holds an election
b) Processes 5 and 6 respond, telling 4 to stop
c) Now 5 and 6 each hold an election
d) Process 6 tells 5 to stop
e) Process 6 wins and tells everyone
the winner sends a message to all processes telling them that it is the new coordinator (the boss)
if a process that was previously down comes back up, it holds a new election
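A compressed sketch of one election round (it skips the actual message exchange and just computes who ends up as coordinator; the helper names are mine):

def bully_election(initiator, all_processes, alive):
    # processes with a higher identifier than the initiator are asked to take over
    higher = [p for p in all_processes if p > initiator]
    responders = [p for p in higher if p in alive]   # these answer OK and hold elections
    if not responders:
        return initiator                             # nobody higher answered: initiator wins
    return max(responders)                           # the highest running process wins

alive = {1, 2, 3, 4, 5, 6}                           # process 7, the old coordinator, crashed
print(bully_election(4, [1, 2, 3, 4, 5, 6, 7], alive))   # 6 becomes the new coordinator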
The Ring Algorithm
based on the use of a ring
assume that processes are physically or logically ordered, so
that each process knows who its successor is
when a process notices that the coordinator is not functioning,
it builds an ELECTION message containing its own process
identifier and sends it to its successor
if the successor is down, the sender skips over it and goes to the next member along the ring, or the one after that, until a running process is located
at each step, the sender adds its own identifier to the list in the message, effectively making itself a candidate
eventually the message gets back to the originating process; the process with the highest identifier in the list is elected as coordinator, the message type is changed to COORDINATOR, and the message is circulated once again to announce who the coordinator is and who the members of the ring are (a sketch of one election round follows)
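A compressed sketch of one ring election (as above, hypothetical names; crashed processes are simply skipped):

def ring_election(initiator, ring, alive):
    # the ELECTION message travels once around the ring, collecting the identifiers
    # of all running processes; the highest one becomes coordinator
    candidates = []
    start = ring.index(initiator)
    for step in range(len(ring)):
        p = ring[(start + step) % len(ring)]
        if p in alive:                     # a crashed successor is skipped
            candidates.append(p)
    coordinator = max(candidates)
    # a COORDINATOR message would now circulate to announce the result and the ring members
    return coordinator, candidates

alive = {0, 1, 2, 3, 4, 5, 6}              # process 7 has crashed
print(ring_election(2, list(range(8)), alive))   # (6, [2, 3, 4, 5, 6, 0, 1])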
e.g., processes 2 and 5 simultaneously discover that the previous coordinator has crashed
each builds an ELECTION message and circulates it; the same coordinator is chosen in both cases
although the election is done twice, there is no problem; only a little extra bandwidth is consumed