DC - Unit 2 - Logical Time and Global State
DC - Unit 2 - Logical Time and Global State
Different systems will have their own local clock which will drift apart over
time due to difference in hardware, temperature and other factors.
Time taken by the messages to travel between nodes will vary due to which
synchronizing the events across the entire distributed system becomes a
challenge.
Physical clock
3. Faults and Failures
- Physical clock accuracy is disrupted by hardware failures, software errors
and network issues.
- When a node failure occurs maintaining a consistent global time becomes
more challenging
4. Scalability
In large scale distributed systems with numerous interconnected nodes
maintaining a single global clock across the entire system become more complex
and impractical.
Logical clock
Logical clock
• It refers to a mechanism used to establish an ordering of events that occurs
across the different nodes or processes in a distributed system.
• It satisfies causal relationship between events
• Useful for
a. coordinating activities
b. Maintaining consistency
c. Enabling the implementation of various distributed algorithms and protocols
d. Ensuring proper ordering of events
• Every event is assigned a timestamp and the causality relationship between
events can be inferred from their timestamps
Logical clock
Logical clock
• The timestamp assigned to events obey the fundamental monotonicity
property.
ie if an event a causally affects event b then the timestamp of a is smaller than
the timestamp of b
➢ Concurrency measure:
• The knowledge of how many events are causally dependent is useful in
measuring the amount of concurrency in a computation.
• All events that are not causally related can be executed concurrently.
• Thus, an analysis of the causality in a computation gives an idea of the
concurrency in the program
Physical clock synchronization : NTP
➢ Clock synchronization
• It is the process of ensuring that physically distributed processors have a
common notion of time.
• It has a significant effect on many problems like secure systems, fault
diagnosis and recovery, scheduled operations, database systems, and real-
world clock values.
• Clocks are synchronized to an accurate real-time standard like UTC (Universal
Coordinated Time).
Physical clock synchronization : NTP
➢ Definitions
Ca and Cb are any two clocks.
• Time: The time of a clock in a machine p is given by the function Cp(t), where
Cp(t) = t for a perfect clock.
• Offset: Clock offset is the difference between the time reported by a clock and
the real time.
✓ The offset of the clock Ca is given by Ca(t) − t.
✓ The offset of clock Ca relative to Cb at time t ≥ 0 is given by Ca(t) − Cb(t).
Physical clock synchronization : NTP
• Skew: The skew of a clock is the difference in the frequencies of the clock and
the perfect clock.
✓ The skew of a clock Ca relative to clock Cb at time t is (C ′ a (t) − C ′ b (t)).
✓ If the skew is bounded by ρ, then as per Equation 1 − ρ ≤ dC/dt ≤ 1 + ρ clock
values are allowed to diverge at a rate in the range of 1 − ρ to 1 + ρ.
• Drift (rate): The drift of clock Ca is the second derivative of the clock value with
respect to time, namely, C ′′ a (t).
✓ The drift of clock Ca relative to clock Cb at time t is C ′′ a (t) − C ′′ b (t).
Physical clock synchronization : NTP
➢ The behavior of fast, slow, and perfect clocks with respect to UTC.
Physical clock synchronization : NTP
➢ Offset delay estimation method using NTP
NTP : Network Time Protocol
• widely used for clock synchronization on the internet
• Calculates Offset by performing various trials and choosing the trial with
minimum delay.
• The design of NTP involves a hierarchical tree of time servers
-The primary server at the root synchronizes with the UTC.
-The next level contains secondary servers, which act as a backup to the
primary server
- At the lowest level is the synchronization subnet which has the clients.
Physical clock synchronization : NTP
➢ Offset delay estimation using NTP
• R2: This rule governs how a process updates its global logical clock to update
its view of the global time and global progress.
4. No strong consistency
ei → ej ⇔ C(ei) < C(ej )
Vector Time
• Proposed by Fidge, Mattern and Schmuck
• The timestamp associated with an event is the value of the vector clock of its
process when the event is executed.
• Process pi uses the following two rules R1 and R2 to update its clock:
Vector Time
R1:
Before executing an event, process pi updates its local logical time as
follows:
vti[i] := vti[i] + d (d > 0)
• R2:
Each message m is piggybacked with the vector clock vt of the sender
process at sending time. On the receipt of such a message (m,vt), process pi
executes the following sequence of actions:
Update its global logical time as follows:
• 1 ≤ k ≤ n : vti[k] := max(vti[k], vt[k])
• – Execute R1.
• – Deliver the message m.
Vector Time
The following relations are defined to compare two vector timestamps, vh and
vk:
• vh = vk ⇔ ∀x : vh[x] = vk[x]
• vh ≤ vk ⇔ ∀x : vh[x] ≤ vk[x]
• vh < vk ⇔ vh ≤ vk and ∃x : vh[x] < vk[x]
• vh || vk ⇔ ¬(vh < vk) ∧ ¬(vk < vh)
Vector Time
Example of vector clocks progress with the increment value d=1. Initially, a
vector clock is [0, 0, 0, ...., 0].
Vector Time
Properties of Vector time
1. Isomorphism
If two events x and y have timestamps vh and vk, respectively, then
x → y ⇔ vh < vk
x || y ⇔ vh || vk.
2. Strongly Consistent
x → y ⇔ vh < vk
3. Event counting
• If an event e has timestamp vh, vh[j] denotes the number of events executed by
process pj that causally precede e.
• The total number of events that causally precede e in the distributed
computation.
Σvh[j] − 1
Vector Time
Size of Vector clock
Case 2: Whenever there is a causality between pair of events, array size will be
equal to (E,≺) and need not be ‘n’.
Need:
• If the number of processes in a distributed computation is large, then vector
clocks will require piggybacking of huge amount of information in messages.
• The message overhead grows linearly with number of processors in the system
and when there are thousands of processors in the system, the message size
becomes huge even if there are only a few events occurring in few processors.
Method:
Singhal – KshemKalyani’s Differential technique
Global state
Why finding Global state is difficult?
• No shared memory between systems
• No common global clock
• Difficult to synchronize different local clocks
Global state computation
• Process state – Local memory and history of Activity
• Channel state – Message sent / received
Need of global state
• Deadlock detection
• Termination
• Failure recovery
• Debugging
Global state
Effect of state computation at different instant
Banking example:
• time t1: Site S1 initiates a transfer of $50 from Account A to Account B. Account
A is decremented by $50 to $550 and a request for $50 credit to Account B is
sent on Channel C12 to site S2.
Account A = $550, Account B = $200, C12 = $50, C21 = $0.
• time t2: Site S2 initiates a transfer of $80 from Account B to Account A. Account
B is decremented by $80 to $120 and a request for $80 credit to Account A is
sent on Channel C21 to site S1.
Account A = $550, Account B = $120, C12 = $50, C21 = $80.
Global state
• time t3: Site S1 receives the message for a $80 credit to Account A and updates
Account A.
Account A = $630, Account B = $120, C12 = $50, C21 = $0.
• time t4: Site S2 receives the message for a $50 credit to Account B and updates
Account B.
Account A = $630, Account B = $170, C12 = $0, C21 = $0.
System model and definitions
• The system consists of a collection of n processes, p1, p2, ..., pn, that are
connected by channels.
• There is no globally shared memory and physical global clock and processes
communicate solely by passing messages.
• Cij denote the channel from process pi to process pj and its state is denoted by
Scij
• For a message mij that is sent by process pi to process pj , let send(mij) and
rec(mij) denote its send and receive events, respectively
System model and definitions
• At any instant, the state of process pi, denoted by LSi, is a result of the
sequence of all the events executed by pi till that instant
• For an event e and a process state LSi, e ∈ LSi iff e belongs to the sequence of
events that have taken process pi to state LSi.
• For an event e and a process state LSi, e ∈ LSi iff e does not belong to the
sequence of events that have taken process pi to state LSi.
• For a channel Cij , the following set of messages can be defined based on the
local states of the processes pi and pj
• In FIFO model, each channel acts as a firstin first-out message queue and thus,
message ordering is preserved by a channel.
• In non-FIFO model, a channel acts like a set in which the sender process adds
messages and the receiver process removes messages from it in a random
order.
• A system that supports causal delivery of messages satisfies the following
property:
“ For any two messages mij and mkj , if send(mij) → send(mkj),
then rec(mij) → rec(mkj) ".
System model and definitions
• The global state of a distributed system is a collection of the local states of the
processes and the channels. Notationally, global state GS is defined as,
System model and definitions
Interpretation in terms of cuts
• A cut is a line joining an arbitrary point on each process line that slices the
space-time diagram into a PAST and a FUTURE
• For example, consider the space-time diagram for the computation illustrated
in the below Figure
System model and definitions
system model and definitions
• Cut C1 is inconsistent because message m1 is flowing from the FUTURE to the
PAST.
• Cut C2 is consistent and message m4 must be captured in the state of channel
C21.
Issues in recording a global state
The following two issues need to be addressed:
From C2 - Any message that is sent by a process after recording its snapshot,
must not be recorded in the global snapshot
I2: How to determine the instant when a process takes its snapshot.
From C2 - A process pj must record its snapshot before processing a message mij
that was sent by process pi after recording its snapshot.
Snapshot Recording Algorithm for FIFO channels
Chandy-Lamport Algorithm
• After a site has recorded its snapshot, it sends a marker along all of its outgoing
channels before sending out any more messages.
• A process must record its snapshot no later than when it receives a marker on
any of its incoming channels.
Snapshot Recording Algorithm for FIFO channels
• The algorithm can be initiated by any process by executing the “Marker
Sending Rule“ by which it records its local state and sends a marker on each
outgoing channel.
Case 1: If the process has not recorded its local state , it records the state of the
channel on which the marker is received as empty and executes the
“Marker Sending Rule“ to record its local state.
Case 2: If the process has recorded its local state , it records the state of the
channel as corresponding message between last snapshot and current
signal.
Snapshot Recording Algorithm for FIFO channels
• The algorithm terminates after each process receives a marker on all of its
incoming channels.
• All the local snapshots will get disseminated to all other processes and all the
processes can determine the global state.
Snapshot Recording Algorithm
Snapshot Recording Algorithm for FIFO channels
• Correctness
• To prove the correctness of the algorithm, a recorded snapshot must satisfy
conditions C1 and C2.
• Due to FIFO property of channels, it follows that no message sent after the
marker on that channel is recorded in the channel state. Thus, condition C2 is
satisfied.
• When a process pj receives message mij that precedes the marker on channel Cij
, it acts as follows:
If process pj has not taken its snapshot yet, then it includes mij in its
recorded snapshot.
Otherwise, it records mij in the state of the channel Cij . Thus, condition C1
is satisfied.
Snapshot Recording Algorithm for FIFO channels
• Complexity
The recording part of a single instance of the algorithm requires O(e) messages
and O(d) time, where e is the number of edges in the network and d is the
diameter of the network.
Snapshot Recording Algorithm for FIFO channels
Properties of Recorded Global state
• Consider two possible executions of the snapshot algorithm
1. Markers shown using dashed-and-dotted arrows
• Let site S1 initiate the algorithm just after t1.
• Site S1 records its local state (Account A = $550) and sends a marker to site S2.
The marker is received by site S2 after t4.
• When site S2 receives the marker, it records its local state (Account B = $170),
the state of channel C12 as $0, and sends a marker along channel C21.
Snapshot Recording Algorithm for FIFO channels
• When site S1 receives this marker, it records the state of Channel C21 as $80.
The $800 amount in the system is conserved in the recorded global state,
A = $550,B = $170,C12 = $0,C21 = $80
2. Markers shown using dotted arrows
• Let site S1 initiate the algorithm just after t0 and before sending the $50 for S2.
• Site S1 records its local state (Account A = $600) and sends a marker to site S2.
• The marker is received by site S2 between t2 and t3.
• When site S2 receives the marker, it records its local state (Account B = $120),
the state of channel C12 as $0, and sends a marker along channel C21.
• When site S1 receives this marker, it records the state of Channel C21 as $80.
The $800 amount in the system is conserved in the recorded global state,
A = $600,B = $120,C12 = $0,C21 = $80
Snapshot Recording Algorithm for FIFO channels
Snapshot Recording Algorithm for FIFO channels
• The recorded global state may not correspond to any of the global states that
occurred during the computation
• This happens because a process can change its state asynchronously before
the markers it sent are received by other sites and the other sites record their
states.
• The system could have passed through the recorded global states in some
equivalent executions
1. Non FIFO
• Asynchronous execution : An asynchronous execution (or A-execution) is an
execution (E,≺) for which the causality relation is a partial order.
• physical link typically delivers the messages sent on it in FIFO order due to the
physical properties of the medium
Message ordering paradigms
• On any logical link between two nodes in the system, messages may be
delivered in any order, not necessarily First-In First-Out. Such executions are
also known as non-FIFO executions.
Message ordering paradigms
2. FIFO
• Asynchronous execution : A FIFO execution is an A-execution in which:
for all (s, r) and (s′, r′) ∈ T , (s ∼ s′ and r ∼ r′ and s ≺ s′) =: r ≺ r′
Message ordering paradigms
To enforce CO, message m3 should be kept pending in the local buffer after it
arrives at P1, until m1 arrives and m1 is delivered
Message ordering paradigms
OTHER DEFINITIONS
(a) causal order (CO) for implementations
If send(m1) ≺ send(m2) then for each common destination d of messages m1
and m2, deliverd(m1) ≺ deliverd(m2) must be satisfied.
{s2, r2, s3, r3, s1, r1} is a linear extension that is non-separated.
{s2, s1, r2, s3, r3, r1} is a linear extension that is separated.
Asynchronous execution with synchronous communication
(ii) RSC execution
An A-execution (E,≺) is an RSC execution if and only if there exists a non-
separated linear extension of the partial order (E,≺)
(iii) Crown
• Characterization of the RSC execution can be studied in terms of a graph
structure called a crown.
• RSC ⊂ CO ⊂ FIFO ⊂ A.
• The above hierarchy implies that some executions belonging to a class X will
not belong to any of the classes included in X. Thus, there are more restrictions
on the possible message orderings in the smaller classes.
• This is because synchronous order offers the most simplicity due to the
restricted number of possibilities, whereas non-FIFO order offers the greatest
difficulties because it admits a much larger set of possibilities that the
developer and verifier need to account for.
Asynchronous execution with synchronous communication
(d) Simulations
(a) Asynchronous programs on synchronous systems
Asynchronous execution with synchronous communication
(b) synchronous programs on Asynchronous systems
• A (valid) S-execution can be trivially realized on an asynchronous system by
scheduling the messages in the order in which they appear in the S-execution.
• Once a message send event is scheduled, the middleware layer waits for an
acknowledgment; after the ack is received, the synchronous send primitive
completes.
Synchronous Program Order on an Asynchronous System
In order to implement system with synchronous communication we have to
determine whether there is non determinism in program execution.
Non-determinism.
programs are non-deterministic in the following senses.
1. A receive call can receive a message from any sender who has sent a
message, if the expected sender is not specified.
2. Multiple send and receive calls which are enabled at a process can be
executed in an interchangeable order.
Rendezvous
Types
1. Multiway
2. Binary
Synchronous Program Order on an Asynchronous System
1. Multiway Rendezvous
synchronous communication among an arbitrary number of asynchronous
processes.
[[
2. Binary Rendezvous
• synchronous communication between a pair of processes at a time.
• Programming language support : CSP and Ada
• Subset of CSP is used for implementation
∗[G1 → CL1 || G2 → CL2 || · · · || Gk → CLk]
the ⋆ operator → Repetative command
the || operator → Alternative command
G1 → CL1 → Guarded command
G -→ Guard and CL → Statement Block
Synchronous Program Order on an Asynchronous System
• M --- Message
• Ack(M) --- acknowledgement for the message
• Request(M) ---- requesting to send
• Permission(M) ---- permission for sending
Synchronous Program Order on an Asynchronous System
Bagrodia Binary Rendezvous Algorithm
Synchronous Program Order on an Asynchronous System
Group Communication
• If the sender of the multicast can be outside the destination group, the
multicast algorithm is said to be an open group algorithm
Causal order
Causal order
Criteria to be met by causal ordering
• Safety: In order to prevent causal order from being violated, a message M that
arrives at a process may need to be buffered until all systemwide messages
sent in the causal past of the send(M) event to that same destination have
already arrived.
Therefore, we distinguish between the arrival of a message at a process (at
which time it is placed in a local system buffer) and the event at which the
message is given to the application process (when the protocol deems it safe
to do so without violating causal order). The arrival of a message is transparent
to the application process. The delivery event corresponds to the receive event
in the execution model.
• Liveness: A message that arrives at a process must eventually be delivered to
the process
Causal order
Raynal –Schiper-Toueg (RST) Algorithm
Causal order
(b) SR (Sequence Record or Status Register) – Vector that tracks the last message
delivery from each node
(c) Log – stores causal dependencies (messages not yet delivered to other nodes)
(b) Send the message along with its timestamp and causal dependencies (from
log) to the destination
• Receiver
Initial setup:
• SR – It’s a record that tracks the latest message each node has received from
every other node. It helps to ensure that messages are received in CO.
SR_A = 0, SR_B = 0, SR_C = 0
Optimal KS Algorithm
• Local Log – A node keeps the record of all the messages it has sent out, along
with the information about which other nodes still needs to receive those
messages.
Log A = { }, Log B = { }, Log C = { }
b. The await call in line (1d) is nonblocking, i.e., any other messages received in
the meanwhile are processed.
c. Once all expected replies are received, the process computes the maximum of
the proposed timestamps for M, and uses the maximum as the final
timestamp.
Total order
• Phase 3: In the third phase, the process multicasts the final timestamp to the
group in line (1f).
Receivers:
b. It updates the variable priority that tracks the highest proposed timestamp
(line 2a), then revises the proposed timestamp to the priority, and places
the message with its tag and the revised timestamp at the tail of the queue
temp_Q (line 2b).
a. The receiver sends the revised timestamp (and the tag) back to the sender
(line 2c).
b. The receiver then waits in a nonblocking manner for the final timestamp
(correlated by the message tag).
c. The queue is then resorted using the timestamp field of the entries
as the key (line 3c).
d. As the queue is already sorted except for the modified entry for the
message under consideration, that message entry has to be placed
in its sorted position in the queue.
Total order
e. If the message entry is at the head of the temp_Q, that entry, and all
consecutive subsequent entries that are also marked as deliverable, are
dequeued from temp_Q, and enqueued in deliver_Q in that order (loop in
lines 3d-3g).