Relation to Computer System Components: M.D. Boomija, AP/CSE
1. Introduction
1. No common physical clock -> This introduces the element of “distribution” in the system and gives rise to the inherent asynchrony amongst the processors.
2. No shared memory -> distributed system may still provide the abstraction of a common
address space via the distributed shared memory abstraction.
3. Geographical separation -> The more geographically apart the processors are, the more representative the system is of a distributed system. However, the processors need not be on a wide-area network; the network/cluster of workstations (NOW/COW) configuration connecting processors on a LAN is also regarded as a small distributed system. The Google search engine is based on the NOW architecture.
4. Autonomy and heterogeneity -> The processors are “loosely coupled” in that they have
different speeds and each can be running a different operating system.
Each computer has a memory-processing unit and the computers are connected by a
communication network. Figure shows the relationships of the software components that run on
each of the computers and use the local operating system and network protocol stack for
functioning.
The distributed software is also termed middleware. A distributed execution is the execution
of processes across the distributed system to collaboratively achieve a common goal. An
execution is also sometimes termed a computation or a run.
A distributed system connects processors by a communication network.
The distributed system uses a layered architecture to break down the complexity of
system design. The middleware is the distributed software that drives the distributed
system, while providing transparency of heterogeneity at the platform level.
There are several standards, such as the Object Management Group’s (OMG) common object request broker architecture (CORBA) [36], and the remote procedure call (RPC) mechanism.
1.3 Motivation
The motivation for using a distributed system is some or all of the following requirements:
1. Inherently distributed computations
The computation is inherently distributed
e.g., money transfer in banking
2. Resource sharing
Resources such as peripherals, complete data sets in databases, special libraries, as well as data
(variables/files) cannot be fully replicated at all the sites. Further, they cannot be placed at a single
site. Therefore, such resources are typically distributed across the system.
For example, distributed databases such as DB2 partition the data sets across several servers
3. Access to geographically remote data and resources
In many scenarios, the data cannot be replicated at every site participating in the distributed
execution because it may be too large or too sensitive to be replicated.
For example, payroll data within a multinational corporation is both too large and too sensitive to
be replicated at every branch office/site.
4. Enhanced reliability
A distributed system has the inherent potential to provide increased reliability because of the
possibility of replicating resources and executions, as well as the reality that geographically
distributed resources are not likely to crash/malfunction at the same time under normal
circumstances. Reliability entails several aspects:
a. availability, i.e., the resource should be accessible at all times;
b. integrity, i.e., the value/state of the resource should be correct
c. fault-tolerance, i.e., the ability to recover from system failures
5. Modularity and incremental expandability
Heterogeneous processors may be easily added into the system without affecting the performance, as long as those processors are running the same middleware algorithms. Similarly, existing processors may be easily replaced by other processors.
1.4 Relation to parallel systems
A parallel system may be broadly classified as one of the following three types:
1. Multiprocessor system
2. Multicomputer parallel system
3. Array processors
1. A multiprocessor system is a parallel system in which the multiple processors have direct
access to shared memory which forms a common address space.
The architecture is shown in Figure (a). Such processors usually do not have a common clock.
Two standard architectures for parallel systems. (a) Uniform memory access (UMA)
multiprocessor system. (b) Non-uniform memory access (NUMA) multiprocessor. In both
architectures, the processors may locally cache data from memory.
Figure : Interconnection networks for shared memory multiprocessor systems. (a) Omega
network [4] for n = 8 processors P0–P7 and memory banks M0–M7. (b) Butterfly network [10]
for n = 8 processors P0–P7 and memory banks M0–M7.
Figure shows two popular interconnection networks – the Omega network and the Butterfly
network, each of which is a multi-stage network formed of 2 ×2 switching elements. Each 2 ×2
switch allows data on either of the two input wires to be switched to the upper or the lower
output wire.
Each 2 × 2 switch is represented as a rectangle in the figure. Furthermore, an n-input and n-output network uses log2 n stages and log2 n bits for addressing.
Omega interconnection function
The Omega network, which connects n processors to n memory units, has (n/2) log2 n switching elements of size 2 × 2 arranged in log2 n stages.
Interconnection function: Output i of a stage is connected to input j of the next stage (the perfect shuffle), where j = 2i for 0 ≤ i ≤ n/2 − 1 and j = 2i + 1 − n for n/2 ≤ i ≤ n − 1.
Consider any stage of switches. Informally, the upper (lower) input lines for each switch
come in sequential order from the upper (lower) half of the switches in the earlier stage.
With respect to the Omega network in Figure(a), n = 8. Hence, for any stage, for the
outputs i, where 0 ≤ i ≤ 3, the output i is connected to input 2i of the next stage. For 4 ≤ i
≤ 7, the output i of any stage is connected to input 2i + 1 − n of the next stage.
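For concreteness, a minimal Python sketch of this stage interconnection function (the function name is illustrative, not from the source):

```python
def omega_stage_connect(i, n):
    """Perfect-shuffle wiring of one Omega-network stage:
    output i of a stage is connected to input j of the next stage."""
    if i < n // 2:
        return 2 * i
    return 2 * i + 1 - n

# For n = 8 this reproduces the wiring described above:
# outputs 0..3 go to inputs 0, 2, 4, 6 and outputs 4..7 go to inputs 1, 3, 5, 7.
for i in range(8):
    print(i, "->", omega_stage_connect(i, 8))
```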
2. A multicomputer parallel system is a parallel system in which the multiple processors do not have direct access to shared memory; the memory of the processors may or may not form a common address space. Such computers usually do not have a common clock.
Examples of parallel multicomputers are: the NYU Ultracomputer and the Sequent shared
memory machines, the CM* Connection machine and processors configured in regular and
symmetrical topologies such as an array or mesh, ring, torus, cube, and hypercube (message-
passing machines).
Figure (a) shows a wrap-around 4 × 4 mesh. For a k × k mesh, which will contain k^2 processors, the maximum path length between any two processors is 2(k/2 − 1). Routing can be done along the Manhattan grid.
Figure (b) shows a four-dimensional hypercube. A k-dimensional hypercube has 2^k processor-and-memory units. Each such unit is a node in the hypercube, and has a unique k-bit label.
Hamming distance
The processors are labelled such that the shortest path between any two processors is the
Hamming distance (defined as the number of bit positions in which the two equal sized
bit strings differ) between the processor labels.
Example Nodes 0101 and 1100 have a Hamming distance of 2. The shortest path between
them has length 2.
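For concreteness, a one-line sketch of this computation: the shortest-path length between two hypercube nodes equals the Hamming distance between their k-bit labels.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which labels a and b differ; this equals the
    shortest-path length between the corresponding hypercube nodes."""
    return bin(a ^ b).count("1")

# Nodes 0101 and 1100 differ in two bit positions, so the shortest path has length 2.
assert hamming_distance(0b0101, 0b1100) == 2
```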
3. Array processors
Array processors belong to a class of parallel computers that are physically co-located, are
very tightly coupled, and have a common system clock (but may not share memory and
communicate by passing data using messages).
Array processors and systolic arrays that perform tightly synchronized processing and data
exchange in lock-step for applications such as DSP and image processing belong to this
category.
These applications usually involve a large number of iterations on the data. This class of
parallel systems has a very niche market.
1.4.2 Flynn’s Taxonomy
Flynn identified four processing modes – SISD (single instruction stream, single data stream), SIMD (single instruction stream, multiple data streams), MISD (multiple instruction streams, single data stream), and MIMD (multiple instruction streams, multiple data streams) – based on whether the processors execute the same or different instruction streams at the same time, and whether or not the processors process the same (identical) data at the same time.
When the degree of coupling is high (low), the modules are said to be tightly (loosely)
coupled.
SIMD and MISD architectures generally tend to be tightly coupled because of the
common clocking of the shared instruction stream or the shared data stream.
Various MIMD architectures in terms of coupling:
Tightly coupled multiprocessors (with UMA shared memory). These may be either switch-based or bus-based.
Tightly coupled multiprocessors (with NUMA shared memory or that communicate by message passing).
Loosely coupled multicomputers (without shared memory) that are physically co-located. These may be bus-based and the processors may be heterogeneous.
Loosely coupled multicomputers (without shared memory and without a common clock) that are physically remote.
Parallelism or speedup of a program on a specific system
This is a measure of the relative speedup of a specific program on a given machine, and depends on the number of processors and the mapping of the code to the processors.
Parallelism within a parallel/distributed program
This is an aggregate measure of the percentage of time that all the processors are executing CPU instructions productively, as opposed to waiting for communication (either via shared memory or message-passing) operations to complete.
Concurrency of a program
This is a broader term that means roughly the same as parallelism of a program, but it is used in the context of distributed programs. The parallelism/concurrency in a parallel/distributed program can be measured by the ratio of the number of local (non-communication and non-shared memory access) operations to the total number of operations, including the communication or shared memory access operations.
1.5 Message-passing systems versus shared memory systems
Shared memory systems are those in which there is a (common) shared address space
throughout the system.
Communication among processors takes place via shared data variables, and control
variables for synchronization among the processors.
Semaphores and monitors, which were originally designed for shared memory uniprocessors and multiprocessors, are examples of how synchronization can be achieved in shared memory systems.
The abstraction called shared memory is sometimes provided to simulate a shared address
space. For a distributed system, this abstraction is called distributed shared memory.
Implementing this abstraction has a certain cost but it simplifies the task of the
application programmer.
The communication via message-passing can be simulated by communication via shared
memory and vice-versa. Therefore, the two paradigms are equivalent.
Synchronous primitives (send/receive)
Handshake between sender and receiver
Send completes when Receive completes
Receive completes when data copied into buffer
A non-blocking send primitive. When the Wait call returns, at least one of its parameters is
posted.
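A minimal sketch, assuming Python's mpi4py bindings are available, contrasting a synchronous send (which completes only after the matching receive has been posted) with a non-blocking send that is completed later by a Wait:

```python
# Run as: mpirun -n 2 python primitives_demo.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    data = {"round": 1, "payload": [1, 2, 3]}
    comm.ssend(data, dest=1, tag=0)         # synchronous send: handshakes with the receiver
    req = comm.isend(data, dest=1, tag=1)   # non-blocking send: returns a request handle at once
    # ... useful computation can overlap with the message transfer here ...
    req.wait()                              # the Wait call: blocks until the send has completed
elif rank == 1:
    first = comm.recv(source=0, tag=0)      # blocking receive: returns once data is in the user buffer
    second = comm.recv(source=0, tag=1)
    print("received:", first, second)
```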
Blocking/non-blocking and synchronous/asynchronous send/receive primitives
Processor synchrony indicates that all the processors execute in lock-step with their
clocks synchronized.
It is used to ensure that no processor begins executing the next step of code until all the
processors have completed executing the previous steps of code assigned to each of the
processors.
The message-passing interface (MPI) library and the PVM (parallel virtual machine) library are examples of message-passing libraries.
Commercial software is often written using the remote procedure call (RPC) mechanism, for example, Sun RPC and the distributed computing environment (DCE) RPC.
“Messaging” and “streaming” are two other mechanisms for communication; remote method invocation (RMI) and remote object invocation (ROI) are related mechanisms for invoking operations on remote objects.
CORBA (common object request broker architecture) and DCOM (distributed
component object model) are two other standardized architectures with their own set of
primitives
When implementing this abstraction, observe that the fewer the steps or
“synchronizations” of the processors, the lower the delays and costs.
Virtual Synchrony
If processors are allowed to have an asynchronous execution for a period of time and then
they synchronize, then the granularity of the synchrony is coarse. This is really a
virtually synchronous execution, and the abstraction is sometimes termed as virtual
synchrony.
Ideally, many programs want the processes to execute a series of instructions in rounds
(also termed as steps or phases) asynchronously, with the requirement that after each
round/step/phase, all the processes should be synchronized and all messages sent should
be delivered.
This is the commonly understood notion of a synchronous execution. Within each
round/phase/step, there may be a finite and bounded number of sequential sub-rounds (or
sub-phases or sub-steps) that processes execute. Each sub-round is assumed to send at
most one message per process; hence the message(s) sent will reach in a single message
hop.
In this system, there are four nodes P0 to P3. In each round, process Pi sends a message to P(i+1) mod 4 and P(i−1) mod 4 and calculates some application-specific function on the received values.
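A minimal simulation sketch of this round structure, assuming (purely for illustration) that the application-specific function each process computes is the average of the values it receives:

```python
N = 4                      # four nodes P0..P3
values = [0.0, 1.0, 2.0, 3.0]

def run_round(values):
    # In a round, Pi sends its value to P(i+1) mod 4 and P(i-1) mod 4.
    # All messages of the round are delivered before any process starts the next round.
    inbox = [[] for _ in range(N)]
    for i in range(N):
        inbox[(i + 1) % N].append(values[i])
        inbox[(i - 1) % N].append(values[i])
    # After the round barrier, each process applies its function to the received values.
    return [sum(msgs) / len(msgs) for msgs in inbox]

for r in range(3):
    values = run_round(values)
    print("after round", r + 1, values)
```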
Asynchronous execution
No processor synchrony, no bound on drift rate of clocks
Message delays finite but unbounded
No bound on time for a step at a process
Synchronous execution
Processors are synchronized; clock drift rate bounded
Message delivery occurs in one logical step/round
Known upper bound on time to execute a step at a process
System Emulations
The shared memory system could be emulated by a message-passing system, and vice-
versa
If system A can be emulated by system B, denoted A/B, and if a problem is not solvable
in B, then it is also not solvable in A. Likewise, if a problem is solvable in A, it is also solvable
in B. Hence, in a sense, all four classes are equivalent in terms of “computability” – what can
and cannot be computed – in failure-free systems.
Emulations among the principal system classes in a failure-free system.
1.8 Design issues and challenges
Design issues and challenges can be categorized as (i) having a greater component related to systems design and operating systems design, (ii) having a greater component related to algorithm design, or (iii) emerging from recent technology advances and/or driven by new applications.
The following functions must be addressed when designing and building a distributed system:
Communication mechanisms: E.g., Remote Procedure Call (RPC), remote object invocation
(ROI), message-oriented vs. stream-oriented communication
Processes: Code migration, process/thread management at clients and servers, design of
software and mobile agents
Naming: Easy-to-use identifiers are needed to locate resources and processes transparently and scalably.
Synchronization
Mechanisms for synchronization or coordination among the processes are essential. Mutual
exclusion is the classical example of synchronization
Data storage and access
Schemes for data storage, search, and lookup should be fast and scalable across the network
Distributed file system design also needs to be revisited
Consistency and replication
Replication is used for fast access and scalability, and to avoid bottlenecks
It requires consistency management among the replicas
Fault-tolerance: correct and efficient operation despite link, node, process failures
Transparency: hiding the implementation policies from the user. The types of transparency include:
Access: hide differences in data representation across systems; provide uniform operations to access resources
Location: locations of resources are transparent
Migration: relocate resources without renaming
Relocation: relocate resources as they are being accessed
Replication: hide replication from the users
Concurrency: mask the use of shared resources
Failure: reliable and fault-tolerant operation
Scalability and modularity
Various techniques such as replication, caching and cache management, and
asynchronous processing help to achieve scalability.
Useful execution models and frameworks: to reason with and design correct distributed
programs
Interleaving model
Partial order model
Input/Output automata
Temporal Logic of Actions
Dynamic distributed graph algorithms and routing algorithms
System topology: distributed graph, with only local neighborhood knowledge
Graph algorithms: building blocks for group communication, data dissemination, object
location
Algorithms need to deal with dynamically changing graphs
Algorithm efficiency: also impacts resource consumption, latency, traffic, and congestion
Time and global state
The processes in the system are spread across three-dimensional physical space. Another
dimension, time, has to be superimposed uniformly across space.
The challenges pertain to providing accurate physical time, and to providing a variant of
time, called logical time
Logical time captures inter-process dependencies and tracks relative time progression
Global state observation: inherent distributed nature of system
Concurrency measures: concurrency depends on program logic, execution speeds within
logical threads, communication speeds
Synchronization/coordination mechanisms
Termination detection: global state of quiescence; no CPU processing and no in-transit
messages
Garbage collection: Reclaim objects no longer pointed to by any process
Group communication, multicast, and ordered message delivery
A group is a collection of processes that share a common context and collaborate on a common task within an application domain.
Multiple joins, leaves, fails
Concurrent sends: semantics of delivery order
Monitoring distributed events and predicates
Predicate: condition on global system state
An important paradigm for monitoring distributed events is that of event streaming,
wherein streams of relevant events reported from different processes are examined
collectively to detect predicates.
Distributed program design and verification tools
Methodically designed and verifiably correct programs can greatly reduce the overhead
of software design, debugging, and engineering.
Debugging distributed programs
Debugging sequential programs is hard; debugging distributed programs is that much
harder because of the concurrency in actions
Reliable and fault-tolerant distributed systems
Consensus algorithms: processes reach agreement in spite of faults (under various fault models)
Replication (as in having backup servers) is a classical method of providing fault-tolerance. The
triple modular redundancy (TMR) technique has long been used in software as well as hardware
installations.
Voting and quorum systems
Distributed databases, commit: ACID properties
Self-stabilizing systems: "illegal" system state changes to "legal" state; requires built-in
redundancy
Check pointing and recovery algorithms: roll back and restart from earlier "saved" state
Failure detectors:
It is difficult to distinguish a "slow" process/message from a failed process or a never-sent message. Failure detectors are algorithms that "suspect" a process as having failed and then converge on a determination of its up/down status.
Load balancing: to reduce latency, increase throughput, dynamically. E.g., server farms
Computation migration: relocate processes to redistribute workload
Data migration: move data, based on access patterns
Distributed scheduling: across processors
Real-time scheduling: difficult without global view, network delays make task harder
Performance modeling and analysis: Network latency to access resources must be reduced
Metrics: theoretical measures for algorithms, practical measures for systems
Measurement methodologies and tools
Mobile systems
Wireless communication: unit disk model; broadcast medium (MAC), power
management etc.
CS perspective: routing, location management, channel allocation, localization and
position estimation, mobility management
Base station model (cellular model)
Ad-hoc network model (rich in distributed graph theory problems)
Sensor networks: a processor with an electro-mechanical interface
Ubiquitous or pervasive computing
Processors embedded in and seamlessly pervading environment
Wireless sensor and actuator mechanisms; self-organizing; network-centric, resource-
constrained
E.g., intelligent home, smart workplace
Peer-to-peer computing
No hierarchy; all processors are equal and play a symmetric role in the computation; self-organizing; efficient object storage and lookup; scalable; dynamic reconfiguration
Publish/subscribe, content distribution
Filtering information to extract that of interest
Distributed agents
Processes that move and cooperate to perform specific tasks; coordination, controlling
mobility, software design and interfaces
Distributed data mining
Extract patterns/trends of interest
Data not available in a single repository
Grid computing
Grid of shared computing resources; use idle CPU cycles
Issues: scheduling, QoS guarantees, security of machines and jobs
Security
Confidentiality, authentication, availability in a distributed setting
Manage wireless, peer-to-peer, grid environments
Issues: e.g., Lack of trust, broadcast media, resource-constrained, lack of structure
1.10 A model of distributed executions
The occurrence of events changes the states of respective processes and channels. An
internal event changes the state of the process at which it occurs. A send event
changes the state of the process that sends the message and the state of the channel on
which the message is sent. A receive event changes the state of the process that
receives the message and the state of the channel on which the message is received.
The send and the receive events signify the flow of information between processes
and establish causal dependency from the sender process to the receiver process.
A relation →msg, which captures the causal dependency due to message exchange, is
defined as follows. For every message m that is exchanged between two processes, we
have send (m) →msg rec (m).
Relation →msg defines causal dependencies between the pairs of corresponding send
and receive events.
The evolution of a distributed execution is depicted by a space-time diagram.
A horizontal line represents the progress of the process; a dot indicates an event; a
slant arrow indicates a message transfer.
Since we assume that an event execution is atomic (hence, indivisible and
instantaneous), it is justified to denote it as a dot on a process line.
In the Figure, for process p1, the second event is a message send event, the third
event is an internal event, and the fourth event is a message receive event.
The causal precedence relation induces an irreflexive partial order on the events of a distributed
computation that is denoted as H=(H, →).
Note that the relation → is nothing but Lamport’s “happens before” relation.
For any two events ei and ej , if ei → ej , then event ej is directly or transitively dependent
on event ei . (Graphically, it means that there exists a path consisting of message arrows
and process-line segments (along increasing time) in the space-time diagram that starts at
ei and ends at ej .)
The relation → denotes flow of information in a distributed computation and ei → ej
dictates that all the information available at ei is potentially accessible at ej .
For example, in the figure, event e₂⁶ has the knowledge of all the other events shown.
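A small sketch, with a made-up two-process execution, of how the causal precedence relation can be computed as the transitive closure of the process-order edges and the send–receive edges:

```python
from itertools import product

# Events per process, in local order (names are illustrative only).
process_order = {
    "p1": ["a1", "a2", "a3"],
    "p2": ["b1", "b2", "b3"],
}
# Message edges: send event -> corresponding receive event.
msg_edges = [("a2", "b2")]

events = [e for evs in process_order.values() for e in evs]
edges = set(msg_edges)
for evs in process_order.values():
    edges.update(zip(evs, evs[1:]))          # process-line edges

# Transitive closure over the event set (Floyd-Warshall style).
reach = set(edges)
for k, i, j in product(events, events, events):
    if (i, k) in reach and (k, j) in reach:
        reach.add((i, j))

def happens_before(e, f):
    return (e, f) in reach

print(happens_before("a1", "b3"))   # True:  a1 -> a2 ->msg b2 -> b3
print(happens_before("a3", "b1"))   # False: a3 and b1 are concurrent
```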
Concurrent Events
For any two events ei and ej, if ei does not causally precede ej and ej does not causally precede ei, then ei and ej are said to be concurrent, denoted ei ǁ ej.
A Consistent Global State
Even if the state of all the components is not recorded at the same instant, such a state
will be meaningful provided every message that is recorded as received is also recorded
as sent.
Basic idea is that a state should not violate causality – an effect should not be present
without its cause. A message cannot be received if it was not sent.
Such states are called consistent global states and are meaningful global states.
An Example
Consider the distributed execution shown in the figure.
1.13 Cuts of a Distributed Computation
“In the space-time diagram of a distributed computation, a cut is a zigzag line joining one
arbitrary point on each process line.”
A cut slices the space-time diagram, and thus the set of events in the distributed
computation, into a PAST and a FUTURE.
The PAST contains all the events to the left of the cut and the FUTURE contains all the
events to the right of the cut.
For a cut C , let PAST(C ) and FUTURE(C ) denote the set of events in the PAST and
FUTURE of C , respectively.
Every cut corresponds to a global state and every global state can be graphically
represented as a cut in the computation’s space-time diagram.
Cuts in a space-time diagram provide a powerful graphical aid in representing and
reasoning about global states of a computation.
Figure: Illustration of cuts in a distributed execution.
In a consistent cut, every message received in the PAST of the cut was sent in the PAST
of that cut. (In Figure, cut C2 is a consistent cut.)
All messages that cross the cut from the PAST to the FUTURE are in transit in the
corresponding consistent global state.
A cut is inconsistent if a message crosses the cut from the FUTURE to the PAST. (In
Figure, cut C1 is an inconsistent cut.)
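A minimal sketch of this consistency test, with illustrative send/receive event identifiers: a cut is consistent iff every message whose receive event is in its PAST also has its send event in its PAST.

```python
def is_consistent_cut(past_events, messages):
    """past_events: set of event ids in PAST(C);
       messages: iterable of (send_event, receive_event) pairs."""
    return all(send in past_events
               for send, recv in messages
               if recv in past_events)

messages = [("s1", "r1"), ("s2", "r2")]
print(is_consistent_cut({"s1", "r1", "s2"}, messages))   # True: message 2 is in transit
print(is_consistent_cut({"r2"}, messages))               # False: r2 received but s2 not sent
```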
1.14 Past and Future Cones of an Event
Let Past(ej) denote all events in the past of ej in a computation (H, →), i.e., Past(ej) = {ei | ei ∈ H, ei → ej}. Let Pasti(ej) be the set of all those events of Past(ej) that are on process pi.
Pasti (ej ) is a totally ordered set, ordered by the relation →i , whose maximal element is
denoted by max (Pasti (ej )).
max (Pasti (ej )) is the latest event at process pi that affected event ej
1.15 Models of Process Communications
There are two basic models of process communication – synchronous and asynchronous.
The synchronous communication model is a blocking type where on a message send, the
sender process blocks until the message has been received by the receiver process. The
sender process resumes execution only after it learns that the receiver process has
accepted the message.
Thus, the sender and the receiver processes must synchronize to exchange a message. On
the other hand, the asynchronous communication model is a non-blocking type where the sender
and the receiver do not synchronize to exchange a message.
After having sent a message, the sender process does not wait for the message to be
delivered to the receiver process. The message is buffered by the system and is delivered to
the receiver process when it is ready to accept the message. Neither of the communication
models is superior to the other.
Asynchronous communication provides higher parallelism because the sender process
can execute while the message is in transit to the receiver.
However, a buffer overflow may occur if a process sends a large number of messages in
a burst to another process. Thus, an implementation of asynchronous communication
requires more complex buffer management.
In addition, due to higher degree of parallelism and non-determinism, it is much more
difficult to design, verify, and implement distributed algorithms for asynchronous
communications.
Synchronous communication is simpler to handle and implement.
However, due to frequent blocking, it is likely to have poor performance and is likely to
be more prone to deadlocks.
1.16 Logical time
The concept of causality between events is fundamental to the design and analysis of parallel
and distributed computing and operating systems.
Usually causality is tracked using physical time.
In distributed systems, it is not possible to have a global physical time.
As asynchronous distributed computations make progress in spurts, the logical time is
sufficient to capture the fundamental monotonicity property associated with causality in
distributed systems.
This chapter discusses three ways to implement logical time - scalar time, vector time,
and matrix time.
Causality among events in a distributed system is a powerful concept in reasoning,
analyzing, and drawing inferences about a computation.
The knowledge of the causal precedence relation among the events of processes helps
solve a variety of problems in distributed systems, such as distributed algorithms design,
tracking of dependent events, knowledge about the progress of a computation, and
concurrency measures.
1.17 A Framework for a System of Logical Clocks
1.17.1 Definition
A system of logical clocks consists of a time domain T and a logical clock C .
Elements of T form a partially ordered set over a relation <.
Relation < is called the happened before or causal precedence. Intuitively, this
relation is analogous to the earlier than relation provided by the physical time.
The logical clock C is a function that maps an event e in a distributed system to
an element in the time domain T , denoted as C(e) and called the timestamp of e,
and is defined as follows:
C : H → T
such that the following property is satisfied: for two events ei and ej, ei → ej =⇒ C(ei) < C(ej).
This monotonicity property is called the clock consistency condition.
The protocol ensures that a process’s logical clock, and thus its view of the global time, is
managed consistently. The protocol consists of the following two rules:
R1: This rule governs how the local logical clock is updated by a process when it
executes an event.
R2: This rule governs how a process updates its global logical clock to update its view of
the global time and global progress.
Systems of logical clocks differ in their representation of logical time and also in the
protocol to update the logical clocks.
1.18 Scalar Time
The scalar time representation was proposed by Lamport in 1978 [9] as an attempt to
totally order events in a distributed system. Time domain in this representation is the set
of non-negative integers.
The logical local clock of a process pi and its local view of the global time are squashed
into one integer variable Ci .
Rules R1 and R2 to update the clocks are as follows:
R1: Before executing an event (send, receive, or internal), process pi executes the following:
Ci := Ci + d (d > 0)
In general, every time R1 is executed, d can have a different value; however, typically d is kept at 1.
R2: Each message piggybacks the clock value of its sender at sending time. When a process pi
receives a message with timestamp Cmsg , it executes the following actions:
1. Ci := max (Ci , Cmsg )
2. Execute R1.
3. Deliver the message.
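A minimal sketch of a scalar (Lamport) clock implementing rules R1 and R2 with d = 1; the class and method names are illustrative.

```python
class ScalarClock:
    def __init__(self):
        self.c = 0

    def tick(self):
        """R1: executed before any send, receive, or internal event."""
        self.c += 1
        return self.c

    def send(self):
        """Timestamp piggybacked on an outgoing message."""
        return self.tick()

    def receive(self, c_msg):
        """R2: take the max with the piggybacked timestamp, then execute R1."""
        self.c = max(self.c, c_msg)
        return self.tick()

p1, p2 = ScalarClock(), ScalarClock()
t = p1.send()            # p1 sends a message carrying timestamp t
p2.tick()                # an internal event at p2
print(p2.receive(t))     # on delivery, p2's clock advances past t
```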
Figure: Evolution of scalar time in the space-time diagram of a distributed execution.
Basic Properties
Consistency Property
Scalar clocks satisfy the monotonicity and hence the consistency property: for two events ei and ej ,
ei → ej =⇒ C(ei ) < C(ej ).
Total Ordering
Scalar clocks can be used to totally order events in a distributed system.
The main problem in totally ordering events is that two or more events at different
processes may have identical timestamps.
For example in Figure, the third event of process P1 and the second event of process P2
have identical scalar timestamp.
A tie-breaking mechanism is needed to order such events. A tie is broken as follows:
Process identifiers are linearly ordered and tie among events with identical scalar
timestamp is broken on the basis of their process identifiers.
The lower the process identifier in the ranking, the higher the priority.
The timestamp of an event is denoted by a tuple (t, i ) where t is its time of occurrence
and i is the identity of the process where it occurred.
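A small sketch of this total order: events are sorted by the tuple (t, i), so ties on the scalar timestamp are broken by the process identifier (the event records below are made up for illustration).

```python
events = [
    {"name": "x", "t": 3, "pid": 2},
    {"name": "y", "t": 3, "pid": 1},   # same timestamp as x; the lower pid wins the tie
    {"name": "z", "t": 2, "pid": 3},
]
total_order = sorted(events, key=lambda e: (e["t"], e["pid"]))
print([e["name"] for e in total_order])   # ['z', 'y', 'x']
```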
Event counting
If the increment value d is always 1, the scalar time has the following interesting
property: if event e has a timestamp h, then h-1 represents the minimum logical duration,
counted in units of events, required before producing the event e;
We call it the height of the event e.
In other words, h-1 events have been produced sequentially before the event e
regardless of the processes that produced these events.
For example, in Figure, five events precede event b on the longest causal path ending at b.
No Strong Consistency
The system of scalar clocks is not strongly consistent; that is, for two events ei and ej, C(ei) < C(ej) does not imply ei → ej.
For example, in the figure, the third event of process P1 has a smaller scalar timestamp than the third event of process P2. However, the former did not happen before the latter.
The reason that scalar clocks are not strongly consistent is that the logical local clock and
logical global clock of a process are squashed into one, resulting in the loss of causal
dependency information among events at different processes.
For example, in Figure, when process P2 receives the first message from process P1, it
updates its clock to 3, forgetting that the timestamp of the latest event at P1 on which it
depends is 2.
1.19 Vector Time
The system of vector clocks was developed independently by Fidge, Mattern and Schmuck.
In the system of vector clocks, the time domain is represented by a set of
n-dimensional non-negative integer vectors.
Each process pi maintains a vector vti [1..n], where vti [i ] is the local logical clock of pi
and describes the logical time progress at process pi .
vti[j] represents process pi’s latest knowledge of process pj’s local time.
If vti [j]=x , then process pi knows that local time at process pj has progressed till x .
The entire vector vti constitutes pi ’s view of the global logical time and is used to
timestamp events.
Process pi uses the following two rules R1 and R2 to update its clock:
R1: Before executing an event, process pi updates its local logical time as follows:
vti [i ] := vti [i ] + d (d > 0)
R2: Each message m is piggybacked with the vector clock vt of the sender process at
sending time. On the receipt of such a message (m,vt), process pi executes the following
sequence of actions:
1. Update its global logical time as follows:
for 1 ≤ k ≤ n: vti[k] := max(vti[k], vt[k])
2. Execute R1.
3. Deliver the message m.
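A minimal sketch of a vector clock implementing rules R1 and R2 with d = 1; the class and method names are illustrative.

```python
class VectorClock:
    def __init__(self, n, i):
        self.i = i                  # index of this process (0-based here)
        self.vt = [0] * n           # initially [0, 0, ..., 0]

    def tick(self):
        """R1: increment the local component before executing an event."""
        self.vt[self.i] += 1
        return list(self.vt)

    def send(self):
        """The outgoing message piggybacks the sender's vector clock."""
        return self.tick()

    def receive(self, vt_msg):
        """R2: component-wise max with the piggybacked vector, then R1."""
        self.vt = [max(a, b) for a, b in zip(self.vt, vt_msg)]
        return self.tick()

p0, p1 = VectorClock(3, 0), VectorClock(3, 1)
stamp = p0.send()          # [1, 0, 0]
print(p1.receive(stamp))   # [1, 1, 0]
```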
The timestamp of an event is the value of the vector clock of its process when the event is
executed.
Figure shows an example of vector clocks progress with the increment value d=1.
Initially, a vector clock is [0, 0, ..., 0].
An Example of Vector Clocks
If the process at which an event occurred is known, the test to compare two timestamps can be simplified as follows: if events x and y respectively occurred at processes pi and pj and are assigned timestamps vh and vk, respectively, then
x → y ⇔ vh[i] ≤ vk[i]
x ǁ y ⇔ vh[i] > vk[i] and vk[j] > vh[j]
Basic Properties of Vector Time
Isomorphism
If events in a distributed system are time stamped using a system of vector clocks, we
have the following property.
If two events x and y have timestamps vh and vk, respectively, then
x → y ⇔ vh < vk
x ǁ y ⇔ vh ǁ vk
Thus, there is an isomorphism between the set of partially ordered events produced by a
distributed computation and their vector timestamps
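A small sketch of the componentwise comparison underlying this isomorphism: vh < vk iff vh ≤ vk componentwise and vh ≠ vk.

```python
def vc_leq(vh, vk):
    """Componentwise less-than-or-equal of two vector timestamps."""
    return all(a <= b for a, b in zip(vh, vk))

def happened_before(vh, vk):          # x -> y  iff  vh < vk
    return vc_leq(vh, vk) and vh != vk

def concurrent(vh, vk):               # x || y  iff  neither timestamp dominates the other
    return not happened_before(vh, vk) and not happened_before(vk, vh)

print(happened_before([1, 0, 0], [1, 1, 0]))   # True
print(concurrent([2, 1, 0], [1, 2, 0]))        # True
```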
Strong Consistency
The system of vector clocks is strongly consistent; thus, by examining the vector
timestamp of two events, we can determine if the events are causally related.
However, Charron-Bost showed that the dimension of vector clocks cannot be less than
n, the total number of processes in the distributed computation, for this property to hold.
Event Counting
If d=1 (in rule R1), then the i th component of vector clock at process pi , vti [i ], denotes
the number of events that have occurred at pi until that instant.
So, if an event e has timestamp vh,
vh[j] denotes the number of events executed by process pj that causally precede e.
Clearly, Σj vh[j] − 1 represents the total number of events that causally precede e in the distributed computation (the −1 excludes e itself).
Applications
Distributed debugging,
Implementations of causal ordering communication and causal distributed shared memory,
Establishment of global breakpoints
Determining the consistency of checkpoints in optimistic recovery
Size of vector clocks
A linear extension of a partial order (E, ≺) is a linear ordering of E that is consistent with the partial
order, i.e., if two events are ordered in the partial order, they are also ordered in the linear order.
A linear extension can be viewed as projecting all the events from the different processes on a
single time axis. However, the linear order will necessarily introduce ordering between each pair
of events, and some of these orderings are not in the partial order.
Now consider an execution on processes P1 and P2 such that each sends a message to the other
before receiving the other’s message. The two send events are concurrent, as are the two receive
events. To determine the causality between the send events or between the receive events, it is not
sufficient to use a single integer; a vector clock of size n = 2 is necessary. This execution exhibits
the graphical property called a crown, wherein there exist messages m0, ..., mn−1 such that Send(mi) ≺ Receive(m(i+1) mod n) for all i from 0 to n − 1. A crown of n messages has dimension n.
1.20 Physical Clock Synchronization
Clock Inaccuracies
Physical clocks are synchronized to an accurate real-time standard like UTC (Universal
Coordinated Time).
However, due to the clock inaccuracy discussed above, a timer (clock) is said to be working within its specification if 1 − ρ ≤ dC/dt ≤ 1 + ρ, where the constant ρ is the maximum skew rate specified by the manufacturer.
Figure illustrates the behavior of fast, slow, and perfect clocks with respect to UTC.
Clock offset and delay estimation:
In practice, a source node cannot accurately estimate the local time on the target node due to
varying message or network delays between the nodes. This protocol employs a common practice
of performing several trials and chooses the trial with the minimum delay.
Figure shows how NTP timestamps are numbered and exchanged between peers A and B.
Let T1, T2, T3, T4 be the values of the four most recent timestamps as shown. Assume clocks A
and B are stable and running at the same speed.
Offset and delay estimation.
Each NTP message includes the latest three timestamps T1, T2 and T3, while T4 is determined
upon arrival. Thus, both peers A and B can independently calculate delay and offset using a
single bidirectional message stream as shown in Figure.
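For concreteness, the standard NTP estimates computed from these four timestamps are the offset θ = ((T2 − T1) + (T3 − T4))/2 and the round-trip delay δ = (T4 − T1) − (T3 − T2); a minimal sketch (the timestamp values below are illustrative):

```python
def ntp_offset_delay(t1, t2, t3, t4):
    """T1: request sent by A, T2: request received by B,
       T3: reply sent by B,   T4: reply received by A."""
    offset = ((t2 - t1) + (t3 - t4)) / 2.0   # estimated offset of B's clock relative to A's
    delay = (t4 - t1) - (t3 - t2)            # estimated round-trip network delay
    return offset, delay

# Example: B's clock runs about 5 ms ahead of A's, with about 10 ms round-trip delay.
print(ntp_offset_delay(100.0, 110.0, 111.0, 111.0))   # -> (5.0, 10.0)
```

In practice, as noted above, several such (offset, delay) pairs are collected over multiple trials, and the offset from the trial with the minimum delay is used.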
PART A
1. What is a distributed system?
A distributed system is a system whose components are located on different networked computers, which
communicate and coordinate their actions by passing messages to one another.
A distributed system is a collection of independent entities that cooperate to solve a problem that cannot be
individually solved.
Autonomous processors communicating over a communication network
No shared memory -> distributed system may still provide the abstraction of a common address space via the
distributed shared memory abstraction.
Geographical separation -> The geographically wider apart that the processors are, the more representative is
the system of a distributed system network/cluster of workstations (NOW/COW) configuration connecting
processors. The Google search engine is based on the NOW architecture.
Autonomy and heterogeneity -> The processors are “loosely coupled” in that they have different speeds and
each can be running a different operating system.
11. List out MIMD architectures in terms of coupling:
Tightly coupled multiprocessors (with UMA shared memory). These may be either switch-based
Tightly coupled multiprocessors (with NUMA shared memory or that communicate by message passing).
Loosely coupled multicomputers (without shared memory) that are physically co-located. These may be bus-based and the processors may be heterogeneous.
Loosely coupled multicomputers (without shared memory and without a common clock) that are physically remote.
The parallelism/concurrency in a parallel/distributed program can be measured by the ratio of the number of
local (non-communication and non-shared memory access) operations to the total number of operations,
including the communication or shared memory access operations.
15. Identify some distributed applications in the scientific and commercial application areas. For each
application, determine which of the motivating factors are important for building the application over a
distributed system.
Scientific: Cosmology@Home
a. Inherently distributed
b. Resource sharing: CPU time
Commercial: HDFS
a. Access
b. Reliability
c. Scalability
d. Modularity and Expandability
16. Explain why a Receive call cannot be asynchronous.
The asynchronous/synchronous distinction applies to the Send primitive: an asynchronous Send returns control before the data has been copied out of the user buffer. A Receive, in contrast, is meaningful only when the data has been copied into the user buffer, at which point the call has effectively completed; hence there is no analogous asynchronous variant of Receive, and a Receive can only be blocking or non-blocking.
17. What are the three aspects of reliability? Is it possible to order them in different ways in terms of
importance, based on different applications’ requirements? Justify your answer by giving examples of
different applications.
Availability
Integrity
Fault-tolerance
Yes.
For banking service, fault-tolerance is of the top-most importance.
But for web service, availability is the most important one.
18. The emulations among the principal system classes in a failure-free system. 1. Which of these emulations
are possible in a failure-prone system? Explain. 2. Which of these emulations are not possible in a failure-
prone system? Explain.
1. Impossible:
MP -> SM: if there are no previously sent messages, any read will cause an error.
S -> A: if the process is out of synchronization, there will be an error.
2. Possible:
SM -> MP
A -> S
25. List out the models of communication networks
There are several models of the service provided by communication networks, namely, FIFO, Non-FIFO, and
causal ordering.
In the FIFO model, each channel acts as a first-in first-out message queue and thus, message ordering is
preserved by a channel.
In the non-FIFO model, a channel acts like a set in which the sender process adds messages and the receiver
process removes messages from it in a random order.
The “causal ordering” model is based on Lamport’s “happens before” relation.
PART B
1. Define distributed system. List out the characteristics of distributed systems. How are the computer system components related in a distributed environment? (1.1 & 1.2)
2. Describe the motivations of implementing distributed systems. (1.3)
3. Describe the parallel systems with examples. (1.4)
4. Differentiate message passing and shared memory and explain how they emulate each other. (1.5)
5. Describe the primitives of distributed computing. (1.6)
6. Differentiate synchronous and asynchronous execution with an example. (1.7)
7. Explain the Design issues and challenges of distributed computing. (1.8)
8. Discuss the model of distributed execution. (1.10)
9. Explain global states with example. (1.12)
10. What are cuts and the past and future cones of an event in distributed systems? (1.13 & 1.14)
11. Explain logical clocks with an example. (1.16 & 1.17)
12. Discuss scalar time and its properties. (1.18)
13. Discuss vector time. (1.19)
14. Explain physical clock synchronization with an example. (1.20)