CS3551 - 1_merged
UNIT I INTRODUCTION
Logical Time: Physical Clock Synchronization: NTP – A Framework for a System of Logical Clocks
– Scalar Time – Vector Time; Message Ordering and Group Communication: Message Ordering
Paradigms – Asynchronous Execution with Synchronous Communication – Synchronous Program
Order on Asynchronous System – Group Communication – Causal Order – Total Order; Global
State and Snapshot Recording Algorithms: Introduction – System Model and Definitions – Snapshot
Algorithms for FIFO Channels
Computation originally took place on a single processor. Such uni-processor computing is termed centralized computing.
A distributed system is a collection of independent computers, interconnected via a
network, capable of collaborating on a task. Distributed computing is computing
performed in a distributed system.
The interaction of the layers of the network with the operating system and
middleware is shown in Fig 1.2. The middleware contains important library functions for
facilitating the operations of DS.
The distributed system uses a layered architecture to break down the complexity of system
design. The middleware is the distributed software that drives the distributed system, while
providing transparency of heterogeneity at the platform level.
1.3 Motivation
The following are the key points that act as a driving force behind DS:
• Buffered: The standard option copies the data from the user buffer to the kernel
buffer. The data later gets copied from the kernel buffer onto the network. For the
Receive primitive, the buffered option is usually required because the data may
already have arrived when the primitive is invoked, and needs a storage place in
the kernel.
• Unbuffered: The data gets copied directly from the user buffer onto the network.
Blocking primitives
• The primitive commands wait for the message to be delivered. The execution of
the processes is blocked.
• The sending process must wait after a send until an acknowledgement is made by the receiver.
• The receiving process must wait for the expected message from the sending
process
• A primitive is blocking if control returns to the invoking process after the
processing for the primitive completes.
Non Blocking primitives
• If send is nonblocking, it returns control to the caller immediately, before the
message is sent.
• The advantage of this scheme is that the sending process can continue computing
in parallel with the message transmission, instead of having the CPU go idle.
• This is a form of asynchronous communication.
• A primitive is non-blocking if control returns back to the invoking process
immediately after invocation, even though the operation has not completed.
• For a non-blocking Send, control returns to the process even before the data is copied out of the user buffer.
For a non-blocking Receive, control returns to the process even before the data may have arrived from the sender.
Synchronous
• A Send or a Receive primitive is synchronous if both the Send() and Receive()
handshake with each other.
• The processing for the Send primitive completes only after the invoking processor learns that the corresponding Receive primitive has also been invoked and that its processing is complete.
• The processing for the Receive primitive completes when the data to be
received is copied into the receiver’s user buffer.
Asynchronous
• A Send primitive is said to be asynchronous, if control returns back to the
invoking process after the data item to be sent has been copied out of the user-
specified buffer.
• For non-blocking primitives, a return parameter on the primitive call returns a
system-generated handle which can be later used to check the status of
completion of the call.
• The process can check for the completion:
o checking if the handle has been flagged or posted
o issue a Wait with a list of handles as parameters: usually blocks until one
of the parameter handles is posted.
The send and receive primitives can be implemented in four modes:
• Blocking synchronous
• Non- blocking synchronous
• Blocking asynchronous
• Non- blocking asynchronous
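The four modes above differ only in when control returns to the caller and when the handshake (if any) completes. The Python sketch below illustrates the non-blocking idea with a completion handle that the caller can poll or Wait on; the names Handle, blocking_send, and non_blocking_send are illustrative assumptions, not primitives of any particular library.

```python
import threading
import queue

class Handle:
    """Completion handle returned by a non-blocking primitive (illustrative)."""
    def __init__(self):
        self._event = threading.Event()
    def post(self):
        # flagged/posted by the runtime when the operation completes
        self._event.set()
    def is_posted(self):
        return self._event.is_set()
    def wait(self, timeout=None):
        return self._event.wait(timeout)

def blocking_send(chan, data):
    """Blocking send: control returns only after the data leaves the user buffer."""
    chan.put(bytes(data))          # copy out of the user buffer before returning

def non_blocking_send(chan, data):
    """Non-blocking send: returns a handle immediately; copying happens in the background."""
    h = Handle()
    def _do_send():
        chan.put(bytes(data))      # copy proceeds concurrently with the caller's computation
        h.post()                   # mark the handle so the caller can check/Wait on it
    threading.Thread(target=_do_send, daemon=True).start()
    return h

# usage: the caller overlaps computation with the transfer, then Waits on the handle
channel = queue.Queue()
handle = non_blocking_send(channel, b"hello")
# ... caller keeps computing here ...
handle.wait()                      # blocks until the handle is posted (send complete)
```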
Processor synchrony indicates that all the processors execute in lock-step with their clocks synchronized. It ensures that no processor begins executing the next step of code until all the processors have completed executing the previous steps of code assigned to each of them.
Asynchronous Execution:
A communication among processes is considered asynchronous, when every
communicating process can have a different observation of the order of the messages being
exchanged. In an asynchronous execution:
• there is no processor synchrony and there is no bound on the drift rate of processor
clocks
• message delays are finite but unbounded
• no upper bound on the time taken by a process
Fig: Asynchronous execution in message passing system
Synchronous Execution:
A communication among processes is considered synchronous when every process
observes the same order of messages within the system. In a synchronous execution:
• processors are synchronized and the clock drift rate between any two processors is
bounded
• message delivery times are such that they occur in one logical step or round
• upper bound on the time taken by a process to execute a
step.
➢ Processes: The main challenges involved are: process and thread management at
both client and server environments, migration of code between systems, design of software
and mobile agents.
➢ Naming: Devising easy to use and robust schemes for names, identifiers, and
addresses is essential for locating resources and processes in a transparent and scalable
manner. The remote and highly varied geographical locations make this task difficult.
➢ Synchronization: Mutual exclusion, leader election, deploying physical clocks,
global state recording are some synchronization mechanisms.
➢ Data storage and access Schemes: Designing file systems for easy and efficient data
storage with implicit accessing mechanism is very much essential for distributed operation
➢ Consistency and replication: The notion of distributed systems goes hand in hand with replication of data to provide a high degree of scalability. The replicas must be handled with care since data consistency is a prime issue.
➢ Fault tolerance: This requires maintenance of fail proof links, nodes, and processes.
Some of the common fault tolerant techniques are resilience, reliable communication,
distributed commit, checkpointing and recovery, agreement and consensus, failure detection,
and self-stabilization.
➢ Security: Cryptography, secure channels, access control, key management – generation and distribution, authorization, and secure group management are some of the security measures imposed on distributed systems.
➢ Applications Programming Interface (API) and transparency: User friendliness and ease of use are very important to make distributed services usable by a wide community. Transparency is the hiding of the inner implementation policy from users.
➢ Performance
User perceived latency in distributed systems must be reduced. The common issues in
performance:
✓ Metrics: Appropriate metrics must be defined for measuring the performance of theoretical distributed algorithms and of their implementations.
✓ Measurement methods/tools: The distributed system is a complex entity; appropriate methodologies and tools must be developed for measuring the performance metrics.
1.7.3 Applications of distributed computing and newer challenges
The deployment environment of distributed systems ranges from mobile systems to
cloud storage. All the environments have their own challenges:
➢ Mobile systems
o Mobile systems which use wireless communication in shared broadcast
medium have issues related to physical layer such as transmission range,
power, battery power consumption, interfacing with wired internet, signal
processing and interference.
o The issues pertaining to other higher layers include routing, location
management, channel allocation, localization and position estimation, and
mobility management.
o Apart from the above mentioned common challenges, the architectural
differences of the mobile network demands varied treatment. The two
architectures are:
✓ Base-station approach (cellular approach): The geographical region is divided into
hexagonal physical locations called cells. The powerful base station transmits signals to all
other nodes in its range
➢ Sensor networks
o A sensor is a processor with an electro-mechanical interface that is capable of
sensing physical parameters.
o They are low cost equipment with limited computational power and battery
life. They are designed to handle streaming data and route it to external
computer network and processes.
o They are susceptible to faults and have to reconfigure themselves.
o These features introduce a whole new set of challenges, such as position estimation and time estimation, when designing a distributed system.
➢ Ubiquitous or pervasive computing
o In Ubiquitous systems the processors are embedded in the environment to
perform application functions in the background.
o Examples: Intelligent devices, smart homes etc.
o They are distributed systems with recent advancements operating in wireless
environments through actuator mechanisms.
o They can be self-organizing and network-centric with limited resources.
➢ Peer-to-peer computing
o Peer-to-peer (P2P) computing is computing over an application layer network where all interactions among the processors are at the same level.
o This is a form of symmetric computation, in contrast to the client-server paradigm.
o They are self-organizing with or without regular structure to the network.
Some of the key challenges include: object storage mechanisms; efficient object lookup and retrieval in a scalable manner; dynamic reconfiguration as nodes and objects join and leave the network randomly; replication strategies to expedite object search; tradeoffs between object size, latency, and table sizes; and anonymity, privacy, and security.
➢ Publish-subscribe, content distribution, and multimedia
o The users in present day require only the information of interest.
o In a dynamic environment where the information constantly fluctuates, there is great demand for:
o Publish: an efficient mechanism for distributing this information
o Subscribe: an efficient mechanism to allow end users to indicate interest in
receiving specific kinds of information
o An efficient mechanism for aggregating large volumes of published
information and filtering it as per the user’s subscription filter.
o Content distribution refers to a mechanism that categorizes the information
based on parameters.
o The publish subscribe and content distribution overlap each other.
o Multimedia data introduces special issue because of its large size.
➢ Distributed agents
o Agents are software processes or sometimes robots that move around the
system to do specific tasks for which they are programmed.
o Agents collect and process information and can exchange such information with other agents.
• An internal event changes the state of the process at which it occurs. A send event changes the state of the process that sends the message and the state of the channel on which the message is sent.
• A receive event changes the state of the process that receives the message and the state of the channel on which the message is received.
Causal Precedence Relations
Causal message ordering is a partial ordering of messages in a distributed computing environment. It is the delivery of messages to a process in the order in which they were transmitted to that process.
When all the above conditions are satisfied, it can be concluded that a→b, i.e., the events are causally related. Consider two events c and d; if both c→d and d→c are false, the events are not causally related, and c and d are said to be concurrent events, denoted c||d.
A system that supports the causal ordering model satisfies the following property: for any two messages m1 and m2 sent to the same destination, if send(m1) → send(m2), then deliver(m1) → deliver(m2).
GLOBAL STATE
Distributed Snapshot represents a state in which the distributed system might have been in. A snapshot
of the system is a single configuration of the system.
• The global state of a distributed system is a collection of the local states of its components, namely, the processes and the communication channels.
• The state of a process at any time is defined by the contents of processor registers, stacks, local memory, etc., and depends on the local context of the distributed application.
• The state of a channel is given by the set of messages in transit in the channel.
UNIT II
Logical clocks are based on capturing chronological and causal relationships of processes and
ordering events based on these relationships.
In a system of logical clocks, every process has a logical clock that is advanced using a set
of rules. Every event is assigned a timestamp and the causality relation between events can
be generally inferred from their timestamps.
The timestamps assigned to events obey the fundamental monotonicity property; that is, if
an event a causally affects an event b, then the timestamp of a is smaller than the timestamp
of b.
A Framework for a system of logical clocks
A system of logical clocks consists of a time domain T and a logical clock C. Elements of T form a
partially ordered set over a relation <. This relation is usually called the happened before or
causal precedence.
The logical clock C is a function that maps an event e in a distributed system to an element in the time domain T, denoted as C(e), such that for any two events ei and ej, ei → ej ⟹ C(ei) < C(ej).
This monotonicity property is called the clock consistency condition. When T and C satisfy the stronger condition ei → ej ⟺ C(ei) < C(ej), the system of clocks is said to be strongly consistent.
Data structures:
Each process pi maintains data structures with the given capabilities:
• A local logical clock (lci), that helps process pi measure its own progress.
• A logical global clock (gci), that is a representation of process pi’s local view of the
logicalglobal time. It allows this process to assign consistent timestamps to its local events.
Protocol:
The protocol ensures that a process's logical clock, and thus its view of the global time, is managed consistently with the following rules:
Rule 1: Decides the updates of the logical clock by a process. It controls send, receive and
other operations.
Rule 2: Decides how a process updates its global logical clock to update its view of the
global time and global progress. It dictates what information about the logical time is
piggybacked in a message and how this information is used by the receiving process to
update its view of the global time.
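A minimal sketch of rules R1 and R2 for a scalar (Lamport) clock is shown below, assuming the usual increment value d = 1; the class and method names are illustrative.

```python
class ScalarClock:
    """Lamport scalar clock: rules R1 and R2 (minimal sketch, d = 1)."""
    def __init__(self):
        self.c = 0

    def r1_local_or_send(self):
        # R1: before executing an internal or send event, advance the local clock
        self.c += 1
        return self.c            # timestamp assigned to the event / piggybacked on the message

    def r2_receive(self, piggybacked_ts):
        # R2: on receiving a message, first reconcile with the sender's view of time,
        # then execute R1 before delivering the receive event
        self.c = max(self.c, piggybacked_ts)
        self.c += 1
        return self.c

# usage: P1 sends a message to P2
p1, p2 = ScalarClock(), ScalarClock()
ts_send = p1.r1_local_or_send()      # P1's send event
ts_recv = p2.r2_receive(ts_send)     # P2's receive event; ts_recv > ts_send
```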
2. Total Ordering: Scalar clocks order the events in distributed systems, but events from different processes may carry identical timestamps. Hence a tie-breaking mechanism is essential to order such events. The tie breaking is done through:
• Linearly ordering the process identifiers.
• A process with a lower identifier value is given higher priority.
The term (t, i) indicates timestamp of an event, where t is its time of occurrence and i is the
identity of the process where it occurred.
The total order relation (≺) over two events x and y with timestamps (h, i) and (k, j), respectively, is given by: x ≺ y ⟺ h < k, or (h = k and i < j).
3. Event Counting
If event e has a timestamp h, then h − 1 represents the minimum logical duration, counted in units of events, required before producing the event e; this is called the height of the event e. That is, h − 1 events have been produced sequentially before the event e, regardless of the processes that produced these events.
4. No strong consistency
The scalar clocks are not strongly consistent because the logical local clock and the logical global clock of a process are squashed into one, resulting in the loss of causal dependency information among events at different processes.
The time domain is represented by a set of n-dimensional non-negative integer vectors in vector
time.
Rule 2: Each message m is piggybacked with the vector clock vt of the sender
process at sending time. On the receipt of such a message (m,vt), process
pi executes the following sequence of actions:
1. update its global logical time
2. execute R1
3. deliver the message m
• If the process at which an event occurred is known, the test to compare two timestamps can be simplified as follows: if events x and y occurred at processes pi and pj and are assigned timestamps vh and vk, respectively, then x → y ⟺ vh[i] ≤ vk[i], and x ∥ y ⟺ vh[i] > vk[i] and vh[j] < vk[j].
2. Strong consistency
The system of vector clocks is strongly consistent; thus, by examining the vector timestamp
of two events, we can determine if the events are causally related.
3. Event counting
If an event e has timestamp vh, then vh[j] denotes the number of events executed by process pj that causally precede e.
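The following sketch shows one possible implementation of vector clock rules R1 and R2, together with the comparison tests for causal precedence and concurrency; the class and function names are assumptions made for illustration.

```python
class VectorClock:
    """Vector clock for process pi in a system of n processes (minimal sketch)."""
    def __init__(self, i, n):
        self.i = i
        self.vt = [0] * n

    def local_or_send(self):
        # R1: increment own component before an internal or send event
        self.vt[self.i] += 1
        return list(self.vt)                 # piggyback a copy on outgoing messages

    def receive(self, piggybacked_vt):
        # R2: merge component-wise with the piggybacked vector, then execute R1
        self.vt = [max(a, b) for a, b in zip(self.vt, piggybacked_vt)]
        self.vt[self.i] += 1
        return list(self.vt)

def causally_precedes(vh, vk):
    """vh -> vk iff vh <= vk component-wise and vh != vk."""
    return all(a <= b for a, b in zip(vh, vk)) and vh != vk

def concurrent(vh, vk):
    return not causally_precedes(vh, vk) and not causally_precedes(vk, vh)

# usage: p0 sends to p1; the receive event causally follows the send event
p0, p1 = VectorClock(0, 2), VectorClock(1, 2)
vt_send = p0.local_or_send()
vt_recv = p1.receive(vt_send)
print(causally_precedes(vt_send, vt_recv))   # -> True
```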
Clock synchronization is the process of ensuring that physically distributed processors have a
common notion of time.
Due to differing clock rates, the clocks at various sites may diverge with time, and
periodically a clock synchronization must be performed to correct this clock skew in
distributed systems. Clocks are synchronized to an accurate real-time standard like UTC
(Universal Coordinated Time). Clocks that must not only be synchronized with each other
but also have to adhere to physical time are termed physical clocks. This degree of
synchronization additionally enables the coordination and scheduling of actions between multiple computers connected to a common network.
Basic terminologies:
If Ca and Cb are two different clocks, then:
• Time: The time of a clock in a machine p is given by the function Cp(t), where Cp(t) = t for a perfect clock.
• Frequency: Frequency is the rate at which a clock progresses. The frequency at time t of clock Ca is C′a(t).
• Offset: Clock offset is the difference between the time reported by a clock and the real time. The offset of the clock Ca is given by Ca(t) − t. The offset of clock Ca relative to Cb at time t ≥ 0 is given by Ca(t) − Cb(t).
• Skew: The skew of a clock is the difference in the frequencies of the clock and the perfect clock. The skew of a clock Ca relative to clock Cb at time t is C′a(t) − C′b(t).
• Drift (rate): The drift of clock Ca is the second derivative of the clock value with respect to time, i.e., C″a(t). The drift of clock Ca relative to clock Cb at time t ≥ 0 is C″a(t) − C″b(t).
Clocking Inaccuracies
Physical clocks are synchronized to an accurate real-time standard like UTC
(Universal Coordinated Time). Due to the clock inaccuracy discussed above, a timer (clock)
is said to be working within its specification if 1 − ρ ≤ dC/dt ≤ 1 + ρ, where the constant ρ is the maximum skew rate specified by the manufacturer.
Fig : Behavior of clocks
Fig a) Offset and delay estimation Fig b) Offset and delay estimation
between processes from same server between processes from different servers
Let T1, T2, T3, T4 be the values of the four most recent timestamps. The clocks A and B are stable and running at the same speed. Let a = T1 − T3 and b = T2 − T4. If the network delay difference from A to B and from B to A, called differential delay, is small, the clock offset θ and round-trip delay δ of B relative to A at time T4 are approximately given by:
θ = (a + b)/2, δ = a − b.
Each NTP message includes the latest three timestamps T1, T2, and T3, while T4 is determined upon arrival.
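A small helper following the relations above (a = T1 − T3, b = T2 − T4) might compute the offset and round-trip delay as shown below; the roles of the timestamps are taken from the figure, and the sample values are made up for illustration.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Offset and round-trip delay of B relative to A from the four most recent
    timestamps, using the relations in the text: a = T1 - T3, b = T2 - T4,
    offset ~ (a + b) / 2, delay ~ a - b (valid when the differential delay is small)."""
    a = t1 - t3
    b = t2 - t4
    offset = (a + b) / 2.0
    delay = a - b
    return offset, delay

# illustrative timestamp values (seconds)
print(ntp_offset_and_delay(10.002, 10.004, 10.000, 10.007))
```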
(i) non-FIFO
(ii) FIFO
(iii) causal order
(iv) synchronous order
There is always a trade-off between concurrency and ease of use and implementation.
Asynchronous Executions
An asynchronous execution (or A-execution) is an execution (E, ≺) for which the causality relation
is a partial order.
• On any logical link, there need not be any causal constraint on message delivery; messages can be delivered in any order, even non-FIFO.
• Though a physical link delivers the messages sent on it in FIFO order due to the physical properties of the medium, a logical link may be formed as a composite of physical links, and multiple paths may exist between the two end points of the logical link.
Fig 2.1: a) FIFO executions b) non FIFO executions
FIFO executions
A FIFO execution is an A-execution in which, for each logical channel, the messages are delivered in the order in which they were sent on that channel.
Causally ordered (CO) executions
Fig: CO Execution
• If two send events s and s' are related by causality ordering (not physical time ordering), then a causally ordered execution requires that their corresponding receive events r and r' occur in the same order at all common destinations.
Applications of causal order:
Applications that require updates to shared data, such as implementations of distributed shared memory, and fair resource allocation in distributed mutual exclusion.
If send(m1) ≺ send(m2), then for each common destination d of messages m1 and m2, deliverd(m1) ≺ deliverd(m2) must be satisfied.
Other properties of causal ordering
1. Message Order (MO): A MO execution is an A-execution in which, for all pairs of corresponding send–receive events (s, r) and (s', r'), s ≺ s' ⟹ ¬(r' ≺ r).
Synchronous Execution
• When all the communication between pairs of processes uses synchronous send and
receives primitives, the resulting order is the synchronous order.
• Synchronous communication always involves a handshake between the receiver and the sender; the handshake events may appear to be occurring instantaneously and atomically.
An execution (E, ≺) is synchronous if and only if there exists a mapping from E to T (scalar timestamps) such that
• for any message M, T(s(M)) = T(r(M));
• for each process Pi, if ei ≺ ei′ then T(ei) < T(ei′).
Rendezvous
Rendezvous systems are a form of synchronous communication among an arbitrary
number of asynchronous processes. All the processes involved meet with each other, i.e.,
communicate synchronously with each other at one time. Two types of rendezvous systems
are possible:
• Binary rendezvous: When two processes agree to synchronize.
• Multi-way rendezvous: When more than two processes agree to synchronize.
• Scheduling involves pairing of matching send and receive commands that are both enabled. The communication events for the control messages under the covers do not alter the partial order of the execution.
(message types: M, ack(M), request(M), permission(M))
Pi executes send(M) and blocks until it receives ack(M) from Pj. The send event SEND(M) now completes.
Any message M' (from a higher priority process) and any request(M') for synchronization (from a lower priority process) received during the blocking period are queued.
(2) At the lower priority process: Pi blocks after issuing send(request(M)) to the higher priority partner process Pj.
(i) If a message M' arrives from a higher priority process Pk, Pi accepts M' by scheduling a RECEIVE(M') event and then executes send(ack(M')) to Pk.
(ii) If a request(M') arrives from a lower priority process Pk, Pi executes send(permission(M')) to Pk and blocks waiting for the message M'. When M' arrives, the RECEIVE(M') event is executed.
(2c) When the permission(M) arrives, Pi knows partner Pj is synchronized and Pi executes
send(M). The SEND(M) now completes.
When Pi is unblocked, it dequeues the next (if any) message from the queue and processes it
as a message arrival (as per rules 3 or 4).
Propagation Constraint II: it is not known that a message has been sent to d in the causal
future of Send(M), and hence it is not guaranteed using a reasoning based on transitivity that
the message M will be delivered to d in CO.
The data structures maintained are sorted row–major and then column–major:
1. Explicit tracking:
▪ Tracking of (source, timestamp, destination) information for messages (i) not known to be delivered and (ii) not guaranteed to be delivered in CO, is done explicitly using the l.Dests field of entries in local logs at nodes and the o.Dests field of entries in messages.
▪ Sets li,a.Dests and oi,a.Dests contain explicit information about the destinations to which Mi,a is not guaranteed to be delivered in CO and is not known to be delivered.
▪ The information about d ∈ Mi,a.Dests is propagated up to the earliest events on all causal paths from (i, a) at which it is known that Mi,a is delivered to d or is guaranteed to be delivered to d in CO.
2. Implicit tracking:
▪ Tracking of messages that are either (i) already delivered, or (ii) guaranteed to be
delivered in CO, is performed implicitly.
▪ The information about messages (i) already delivered or (ii) guaranteed to be
delivered in CO is deleted and not propagated because it is redundant as far as
enforcing CO is concerned.
▪ It is useful in determining what information that is being carried in other messages
and is being stored in logs at other nodes has become redundant and thus can be
purged.
▪ The semantics are implicitly stored and propagated. This information about messages
that are (i) already delivered or (ii) guaranteed to be delivered in CO is tracked
without explicitly storing it.
▪ The algorithm derives it from the existing explicit information about messages (i) not known to be delivered and (ii) not guaranteed to be delivered in CO, by examining only oi,a.Dests or li,a.Dests, which is a part of the explicit information.
Multicast M4,3
At event (4, 3), the information P6 ∈ M5,1.Dests in Log4 is propagated on multicast M4,3 only to process P6 to ensure causal delivery using the Delivery Condition. The piggybacked information on message M4,3 sent to process P3 must not contain this information because of constraint II. As long as any future message sent to P6 is delivered in causal order w.r.t. M4,3 sent to P6, it will also be delivered in causal order w.r.t. M5,1. And as M5,1 is already delivered to P4, the information M5,1.Dests = ∅ is piggybacked on M4,3 sent to P3.
Similarly, the information P6 ∈ M5,1.Dests must be deleted from Log4 as it will no longer be needed, because of constraint II. M5,1.Dests = ∅ is stored in Log4 to remember that M5,1 has been delivered or is guaranteed to be delivered in causal order to all its destinations.
Learning implicit information at P2 and P3
When message M4,2 is received by processes P2 and P3, they insert the (new) piggybacked information in their local logs, as information M5,1.Dests = {P6}. They both continue to store this in Log2 and Log3 and propagate this information on multicasts until they learn at events (2, 4) and (3, 2), on receipt of messages M3,3 and M4,3, respectively, that any future message is expected to be delivered in causal order to process P6 w.r.t. M5,1 sent to P6. Hence, by constraint II, this information must be deleted from Log2 and Log3. The flow of events is given by:
• When M4,3 with piggybacked information M5,1.Dests = ∅ is received by P3 at (3, 2), this is inferred to be valid current implicit information about multicast M5,1 because the log Log3 already contains explicit information P6 ∈ M5,1.Dests about that multicast. Therefore, the explicit information in Log3 is inferred to be old and must be deleted to achieve optimality. M5,1.Dests is set to ∅ in Log3.
• The logic by which P2 learns this implicit knowledge on the arrival of M3,3 is identical.
Processing at P6
When message M5,1 is delivered to P6, only M5,1.Dests = {P4} is added to Log6. Further, P6 propagates only M5,1.Dests = {P4} on message M6,2, and this conveys the current implicit information that M5,1 has been delivered to P6 by its very absence in the explicit information.
• When the information P6 ∈ M5,1.Dests arrives on M4,3, piggybacked as M5,1.Dests = {P6}, it is used only to ensure causal delivery of M4,3 using the Delivery Condition, and is not inserted in Log6 (constraint I). Further, the presence of M5,1.Dests = {P4} in Log6 implies the implicit information that M5,1 has already been delivered to P6. Also, the absence of P4 in M5,1.Dests in the explicit piggybacked information implies the implicit information that M5,1 has been delivered or is guaranteed to be delivered in causal order to P4, and therefore M5,1.Dests is set to ∅ in Log6.
• When the information P6 ∈ M5,1.Dests arrives on M5,2, piggybacked as M5,1.Dests = {P4, P6}, it is used only to ensure causal delivery of M5,2 using the Delivery Condition, and is not inserted in Log6 because Log6 contains M5,1.Dests = ∅, which gives the implicit information that M5,1 has been delivered or is guaranteed to be delivered in causal order to both P4 and P6.
Processing at P1
• When M2,2 arrives carrying piggybacked information M5,1.Dests = {P6}, this (new) information is inserted in Log1.
• When M6,2 arrives with piggybacked information M5,1.Dests = {P4}, P1 learns the implicit information that M5,1 has been delivered to P6 by the very absence of explicit information P6 ∈ M5,1.Dests in the piggybacked information, and hence marks the information P6 ∈ M5,1.Dests for deletion from Log1.
• The information P6 ∈ M5,1.Dests piggybacked on M2,3, which arrives at P1, is inferred to be outdated using the implicit knowledge derived from M5,1.Dests = ∅ in Log1.
2.8 TOTAL ORDER
For each pair of processes Pi and Pj and for each pair of messages Mx and My that are delivered to both the processes, Pi is delivered Mx before My if and only if Pj is delivered Mx before My.
Each process sends the message it wants to broadcast to a centralized process, which
relays all the messages it receives to every other process over FIFO channels.
Complexity: Each message transmission takes two message hops and exactly n messages
in a system of n processes.
Drawbacks: A centralized algorithm has a single point of failure and congestion, and is
not an elegant solution.
Sender
Phase 1
• In the first phase, a process multicasts the message M with a locally unique tag and
the local timestamp to the group members.
Phase 2
• The sender process awaits a reply from all the group members who respond with a
tentative proposal for a revised timestamp for that message M.
• The await call is non-blocking.
Phase 3
• The process multicasts the final timestamp to the group.
Fig) Sender side of three phase distributed algorithm
Receiver Side
Phase 1
• The receiver receives the message with a tentative timestamp. It updates the variable
priority that tracks the highest proposed timestamp, then revises the proposed
timestamp to the priority, and places the message with its tag and the revised
timestamp at the tail of the queue temp_Q. In the queue, the entry is marked as
undeliverable.
Phase 2
• The receiver sends the revised timestamp back to the sender. The receiver then waits
in a non-blocking manner for the final timestamp.
Phase 3
• The final timestamp is received from the multicaster. The corresponding message entry in temp_Q is identified using the tag, and is marked as deliverable after the revised timestamp is overwritten by the final timestamp.
• The queue is then resorted using the timestamp field of the entries as the key. As the
queue is already sorted except for the modified entry for the message under
consideration, that message entry has to be placed in its sorted position in the queue.
• If the message entry is at the head of the temp_Q, that entry, and all consecutive
subsequent entries that are also marked as deliverable, are dequeued from temp_Q,
and enqueued in deliver_Q.
Complexity
This algorithm uses three phases and, to send a message to n − 1 processes, it uses 3(n − 1) messages and incurs a delay of three message hops.
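A simplified, single-process sketch of the receiver-side handling described above is given below. The priority update rule (propose a value larger than anything seen or proposed so far), the message encodings, and the queue representation are assumptions made for illustration; a real implementation would also carry process ids for tie-breaking.

```python
class TotalOrderReceiver:
    """Receiver side of the three-phase total ordering algorithm (simplified sketch)."""

    def __init__(self):
        self.priority = 0      # highest timestamp proposed by this receiver so far
        self.temp_q = []       # tentative entries: [timestamp, tag, deliverable_flag]
        self.deliver_q = []    # messages released for delivery, in final timestamp order

    def on_revise_ts(self, tag, ts):
        # Phase 1 at the receiver: propose a timestamp larger than anything
        # seen or proposed so far, and enqueue the entry as undeliverable.
        self.priority = max(self.priority + 1, ts)
        self.temp_q.append([self.priority, tag, False])
        self.temp_q.sort()
        # Phase 2: the proposed timestamp is sent back to the sender.
        return ("PROPOSED_TS", tag, self.priority)

    def on_final_ts(self, tag, final_ts):
        # Phase 3: overwrite the tentative timestamp, mark the entry deliverable,
        # and re-sort the queue on the timestamp field.
        for entry in self.temp_q:
            if entry[1] == tag:
                entry[0], entry[2] = final_ts, True
                break
        self.temp_q.sort()
        # Dequeue the head entry and any consecutive deliverable entries behind it.
        while self.temp_q and self.temp_q[0][2]:
            self.deliver_q.append(self.temp_q.pop(0))

# usage mirroring destination C in the worked example below
c = TotalOrderReceiver()
print(c.on_revise_ts("A", 7))   # -> ('PROPOSED_TS', 'A', 7)
print(c.on_revise_ts("B", 9))   # -> ('PROPOSED_TS', 'B', 9)
```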
Example
An example execution to illustrate the algorithm is given in the figure below. Here, A and B multicast to a set of destinations, and C and D are the common destinations for both multicasts.
Figure (a) The main sequence of steps is as follows:
1. A sends a REVISE_TS(7) message, having timestamp 7. B sends a REVISE_TS(9)
message, having timestamp 9.
2. C receives A’s REVISE_TS(7), enters the corresponding message in temp_Q, and marks
it as undeliverable; priority = 7. C then sends PROPOSED_TS(7) message to A
3. D receives B’s REVISE_TS(9), enters the corresponding message in temp_Q, and marks
it as undeliverable; priority = 9. D then sends PROPOSED_TS(9) message to B.
4. C receives B’s REVISE_TS(9), enters the corresponding message in temp_Q, and marks
it as undeliverable; priority = 9. C then sends PROPOSED_TS(9) message to B.
5. D receives A's REVISE_TS(7), enters the corresponding message in temp_Q, and marks it as undeliverable; priority = 10. D assigns a tentative timestamp value of 10, which is greater than all of the timestamps on REVISE_TSs seen so far, and then sends a PROPOSED_TS(10) message to A.
The state of the system is as shown in the figure
Fig) An example to illustrate the three-phase total ordering algorithm. (a) A snapshot for
PROPOSED_TS and REVISE_TS messages. The dashed lines show the further execution
after the snapshot. (b) The FINAL_TS messages in the example.
Figure (b) The continuing sequence of main steps is as follows:
6. When A receives PROPOSED_TS(7) from C and PROPOSED_TS(10) from D, it computes the final timestamp as max(7, 10) = 10, and sends FINAL_TS(10) to C and D.
7. When B receives PROPOSED_TS(9) from C and PROPOSED_TS(9) from D, it computes the final timestamp as max(9, 9) = 9, and sends FINAL_TS(9) to C and D.
8. C receives FINAL_TS(10) from A, updates the corresponding entry in temp_Q with the
timestamp, resorts the queue, and marks the message as deliverable. As the message is not
at the head of the queue, and some entry ahead of it is still undeliverable, the message is
not moved to delivery_Q.
9. D receives FINAL_TS(9) from B, updates the corresponding entry in temp_Q by
marking the corresponding message as deliverable, and resorts the queue. As the message
is at the head of the queue, it is moved to delivery_Q. This is the system snapshot shown in
Figure (b).
The following further steps will occur:
10. When C receives FINAL_TS(9) from B, it will update the corresponding entry in temp_Q by marking the corresponding message as deliverable. As the message is at the head of the queue, it is moved to the delivery_Q, and the next message (of A), which is also deliverable, is also moved to the delivery_Q.
11. When D receives FINAL_TS(10) from A, it will update the corresponding entry in temp_Q by marking the corresponding message as deliverable. As the message is at the head of the queue, it is moved to the delivery_Q.
Law of conservation of messages: Every message mij that is recorded as sent in the local state of a process pi must be captured in the state of the channel Cij or in the collected local state of the receiver process pj.
➢ In a consistent global state, every message that is recorded as received is also recorded
as sent. Such a global state captures the notion of causality that a message cannot be
received if it was not sent.
➢ Consistent global states are meaningful global states and inconsistent global states are
not meaningful in the sense that a distributed system can never be in an inconsistent
state.
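The conservation law above can be checked mechanically on a recorded global state. The sketch below assumes a particular dictionary representation of the recorded sends, receives, and channel states; that representation is an assumption made for illustration.

```python
from collections import Counter

def conserves_messages(global_state):
    """Check the law of conservation of messages on a recorded global state.

    Assumed representation (illustrative only):
      global_state["sent"][(i, j)]     - messages pi recorded as sent to pj
      global_state["received"][(j, i)] - messages pj recorded as received from pi
      global_state["channel"][(i, j)]  - messages recorded as in transit on channel Cij
    Every message recorded as sent must appear either in the receiver's recorded
    state or in the channel state, and nothing may be received without being sent.
    """
    for (i, j), sent in global_state["sent"].items():
        received = global_state["received"].get((j, i), [])
        in_channel = global_state["channel"].get((i, j), [])
        if Counter(received) + Counter(in_channel) != Counter(sent):
            return False
    return True

# usage: m1 was sent by p1 to p2 and is still in transit on C12
state = {"sent": {(1, 2): ["m1"]}, "received": {}, "channel": {(1, 2): ["m1"]}}
print(conserves_messages(state))   # -> True
```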
Issue 2:
How to determine the instant when a process takes its snapshot?
Answer: A process pj must record its snapshot before processing a message mij that was sent by process pi after recording its snapshot.
A snapshot captures the local states of each process along with the state of each communication channel.
Chandy–Lamport algorithm
• The algorithm records a global snapshot consisting of the local state of each process and the state of each channel.
• The Chandy-Lamport algorithm uses a control message, called a marker.
• After a site has recorded its snapshot, it sends a marker along all of its outgoing
channels before sending out any more messages.
• Since channels are FIFO, a marker separates the messages in the channel into those to
be included in the snapshot from those not to be recorded in the snapshot.
• This addresses issue I1. The role of markers in a FIFO system is to act as delimiters
for the messages in the channels so that the channel state recorded by the process
at the receiving end of the channel satisfies the condition C2.
Initiating a snapshot
• Process Pi initiates the snapshot
• Pi records its own state and prepares a special marker message.
• Send the marker message to all other processes.
• Start recording all incoming messages on channels Cji for j not equal to i.
Propagating a snapshot
• For all processes Pj, consider a message on channel Ckj.
Terminating a snapshot
• All processes have received a marker.
• All process have received a marker on all the N-1 incoming channels.
• A central server can gather the partial state to build a global snapshot.
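A compact sketch of the marker handling at one process is given below, assuming FIFO channels and a send(dest, msg) transport supplied by the environment; the class and method names are illustrative.

```python
class SnapshotProcess:
    """Chandy-Lamport marker handling at one process (minimal sketch, FIFO channels)."""
    MARKER = "MARKER"

    def __init__(self, pid, neighbors, send):
        self.pid = pid
        self.neighbors = neighbors          # processes with incoming/outgoing channels to us
        self.send = send                    # send(dest, msg) supplied by the transport
        self.recorded_state = None
        self.channel_state = {}             # per incoming channel: recorded in-transit messages
        self.recording = {}                 # per incoming channel: still recording?

    def initiate_snapshot(self, local_state):
        self._record(local_state)

    def _record(self, local_state):
        self.recorded_state = local_state
        for q in self.neighbors:            # start recording on every incoming channel
            self.channel_state[q] = []
            self.recording[q] = True
        for q in self.neighbors:            # send a marker on every outgoing channel
            self.send(q, self.MARKER)

    def on_message(self, src, msg, local_state):
        if msg == self.MARKER:
            if self.recorded_state is None:
                self._record(local_state)
                self.channel_state[src] = []     # channel from src recorded as empty
            self.recording[src] = False          # marker closes recording on this channel
        elif self.recorded_state is not None and self.recording.get(src):
            self.channel_state[src].append(msg)  # message in transit: part of channel state
```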
Complexity
The recording part of a single instance of the algorithm requires O(e) messages and O(d) time, where e is the number of edges in the network and d is the diameter of the network.
2. (Markers are shown using dotted arrows.) Let site S1 initiate the algorithm just after t0 and before sending the $50 to S2. Site S1 records its local state (account A = $600) and sends a marker to S2. The marker is received by site S2 between t2 and t3. When site S2 receives the marker, it records its local state (account B = $120), the state of channel C12 as $0, and sends a marker along channel C21. When site S1 receives this marker, it records the state of channel C21 as $80. The $800 amount in the system is conserved in the recorded global state:
A = $600, B = $120, C12 = $0, C21 = $80
The recorded global state may not correspond to any of the global states that occurred
during the computation.
This happens because a process can change its state asynchronously before the markers it sent are received by other sites and the other sites record their states.
But the system could have passed through the recorded global states in some equivalent
executions.
The recorded global state is a valid state in an equivalent execution and if a stable property
(i.e., a property that persists) holds in the system before the snapshot algorithm begins, it holds in
the recorded global snapshot.
Therefore, a recorded global state is useful in detecting stable properties.
UNIT III
Mutual exclusion in a distributed system states that only one process is allowed to execute the
critical section (CS) at any given time.
• Message passing is the sole means for implementing distributed mutual exclusion.
There are three basic approaches for implementing distributed mutual exclusion:
1. Token-based approach:
− A unique token (also known as the privilege message) is shared among the sites.
− A site is allowed to enter its CS if it possesses the token.
− Mutual Exclusion is ensured because the token is unique.
− Eg: Suzuki-Kasami’s Broadcast Algorithm, Raymond’s Tree- Based Algorithm
etc
2. Non-token-based approach:
− Two or more successive rounds of messages are exchanged among the sites to
determine which site will enter the CS next.
− Eg: Lamport's algorithm, Ricart–Agrawala algorithm
3. Quorum-based approach:
− Each site requests permission to execute the CS from a subset of sites, called a quorum.
− Any two quorums contain at least one common site, which ensures mutual exclusion.
− Eg: Maekawa's algorithm
3.1.1 Preliminaries
• The system consists of N sites, S1, S2, S3, …, SN.
• Assume that a single process is running on each site.
• The process at site Si is denoted by pi.
• All these processes communicate asynchronously over an underlying
communication network.
• A site can be in one of the following three states: requesting the CS, executing the CS,
or neither requesting nor executing the CS.
• In the requesting the CS state, the site is blocked and cannot make further requests for
the CS.
• In the idle state, the site is executing outside the CS.
• In the token-based algorithms, a site can also be in a state where a site holding the token is executing outside the CS. Such a state is referred to as the idle token state.
• At any instant, a site may have several pending requests for CS. A site queues up
these requests and serves them one at a time.
• N denotes the number of processes or sites involved in invoking the critical section, T
denotes the average message delay, and E denotes the average critical section
execution time.
• Safety property:
At any instant, only one process can execute the critical section. This is an essential property of a mutual exclusion algorithm.
• Liveness property:
This property states the absence of deadlock and starvation. Two or more sites should not endlessly wait for messages that will never arrive. This is an important property of a mutual exclusion algorithm.
• Fairness:
Fairness in the context of mutual exclusion means that each process gets a fair
chance to execute the CS. In mutual exclusion algorithms, the fairness property
generally means that the CS execution requests are executed in order of their arrival in
the system.
➢ Message complexity: This is the number of messages that are required per CS
execution by a site.
➢ Synchronization delay: This is the time required, after a site leaves the CS, before the next site enters the CS.
➢ Response time: This is the time interval a request waits for its CS execution to be
over after its request messages have been sent out. Thus, response time does not
include the time a request waits at a site before its request messages have been sent
out.
➢ System throughput: This is the rate at which the system executes requests for the CS. If SD is the synchronization delay and E is the average critical section execution time, then the throughput is given by 1/(SD + E).
▪ For example, the best and worst values of the response time are achieved when the load is, respectively, low and high;
▪ the best and the worst message traffic is generated at low and heavy load conditions, respectively.
3.2 LAMPORT’S ALGORITHM
• Requests for the CS are executed in the increasing order of timestamps, and time is determined by logical clocks.
• Every site Si keeps a queue, request_queuei which contains mutual exclusion requests
ordered by their timestamps
• This algorithm requires communication channels to deliver messages in FIFO order. Three types of messages are used: REQUEST, REPLY, and RELEASE. These messages with timestamps also update the logical clock.
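A simplified sketch of the per-site logic of Lamport's algorithm is shown below. The transport function send(dest, msg), the message encodings, and the use of a REPLY set to approximate condition L2 are assumptions made for illustration.

```python
class LamportMutex:
    """Lamport's mutual exclusion at one site (simplified sketch; FIFO channels assumed)."""

    def __init__(self, my_id, all_ids, send):
        self.my_id = my_id
        self.others = [i for i in all_ids if i != my_id]
        self.send = send               # send(dest, msg): transport supplied by the environment
        self.clock = 0
        self.request_queue = []        # pending requests as (timestamp, site_id) pairs
        self.replies = set()           # sites from which a later-stamped message was received

    def request_cs(self):
        self.clock += 1
        ts = (self.clock, self.my_id)
        self.request_queue.append(ts)
        self.replies.clear()
        for j in self.others:
            self.send(j, ("REQUEST", ts))

    def on_message(self, sender, msg):
        kind, ts = msg
        self.clock = max(self.clock, ts[0]) + 1
        if kind == "REQUEST":
            self.request_queue.append(ts)
            self.send(sender, ("REPLY", (self.clock, self.my_id)))
        elif kind == "REPLY":
            self.replies.add(sender)
        elif kind == "RELEASE":
            # remove the releasing site's request from the local queue
            self.request_queue = [t for t in self.request_queue if t[1] != sender]

    def can_enter_cs(self):
        # L1: own request has the smallest (timestamp, id) in the queue;
        # L2: a higher-timestamped message (here, a REPLY) received from every other site.
        if not self.request_queue:
            return False
        return (min(self.request_queue)[1] == self.my_id
                and len(self.replies) == len(self.others))

    def release_cs(self):
        self.request_queue = [t for t in self.request_queue if t[1] != self.my_id]
        self.clock += 1
        for j in self.others:
            self.send(j, ("RELEASE", (self.clock, self.my_id)))
```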
Correctness
Theorem: Lamport’s algorithm achieves mutual exclusion.
Proof: Proof is by contradiction.
Suppose two sites Si and Sj are executing the CS concurrently. For this to happen
conditions L1 and L2 must hold at both the sites concurrently.
This implies that at some instant in time, say t, both Si and Sj have their own requests
at the top of their request queues and condition L1 holds at them. Without loss of
generality, assume that Si ’s request has smaller timestamp than the request of Sj .
From condition L1 and the FIFO property of the communication channels, it is clear that at instant t the request of Si must be present in request_queuej when Sj was executing its CS. This implies that Sj's own request is at the top of its own request_queue when a smaller timestamp request, Si's request, is present in request_queuej – a contradiction!
Message Complexity:
Lamport’s Algorithm requires invocation of 3(N – 1) messages per critical section execution.
These 3(N – 1) messages involve
• (N – 1) request messages
• (N – 1) reply messages
• (N – 1) release messages
Drawbacks of Lamport’s Algorithm:
• Unreliable approach: failure of any one of the processes will halt the progress of the entire system.
• High message complexity: the algorithm requires 3(N – 1) messages per critical section invocation.
In the Ricart–Agrawala algorithm, when site Sj receives a REQUEST from site Si:
• If site Sj is neither requesting nor executing the CS, or if it is requesting and the timestamp of site Si's request is smaller than that of its own request, it sends a REPLY immediately.
• Otherwise the request is deferred by site Sj.
Upon exiting the CS, site Si sends a REPLY message to all the deferred requests.
Performance:
Synchronization delay is equal to maximum message transmission time. It requires 3(N –
1) messages per CS execution. Algorithm can be optimized to 2(N – 1) messages by
omitting the REPLY message in some situations.
• A site sends a REQUEST message to all other sites to get their permission to enter the critical section.
• A site sends a REPLY message to another site to give its permission to enter the critical section.
• A timestamp is given to each critical section request using Lamport's logical clock.
• The timestamp is used to determine the priority of critical section requests.
• A smaller timestamp gets higher priority over a larger timestamp.
• The execution of critical section requests is always in the order of their timestamps.
Fig) Site S1 exits the CS and sends a reply message to S2's deferred request
Message Complexity:
Ricart–Agrawala algorithm requires invocation of 2(N – 1) messages per critical section
execution. These 2(N – 1) messages involve:
• (N – 1) request messages
• (N – 1) reply messages
Performance:
Synchronization delay is equal to maximum message transmission time. It requires 2(N – 1) messages per critical section execution.
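The receive rule that decides between an immediate REPLY and deferral can be captured in a few lines; the function below is a sketch with an assumed (clock, site_id) timestamp representation.

```python
def ricart_agrawala_on_request(executing, requesting, my_request_ts, incoming_ts):
    """Receive rule of the Ricart-Agrawala algorithm at site Si (minimal sketch).
    Timestamps are assumed to be (clock, site_id) tuples, so ties break on the site id."""
    if executing:
        return "DEFER"                       # Si is inside the CS: defer until it exits
    if requesting and my_request_ts < incoming_ts:
        return "DEFER"                       # Si's own request has higher priority
    return "REPLY"                           # otherwise grant permission immediately

# usage: S1 is requesting with timestamp (4, 1) and receives S2's request stamped (3, 2)
print(ricart_agrawala_on_request(False, True, (4, 1), (3, 2)))   # -> "REPLY"
```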
Correctness
Mutual exclusion is guaranteed because there is only one token in the system and a site holds
the token during the CS execution.
Theorem: A requesting site enters the CS in finite time.
Proof: Token request messages of a site Si reach other sites in finite time.
Since one of these sites will have token in finite time, site Si ’s request will be placed in the
token queue in finite time.
Since there can be at most N − 1 requests in front of this request in the token queue, site Si
will get the token and execute the CS in finite time.
Message Complexity:
The algorithm requires no messages if the site already holds the idle token at the time of its critical section request, and a maximum of N messages per critical section execution otherwise. These N messages involve:
• (N – 1) request messages
• 1 reply message
Performance:
Synchronization delay is 0 and no message is needed if the site holds the idle token at the time of its request. In case the site does not hold the idle token, the maximum synchronization delay is equal to the maximum message transmission time, and a maximum of N messages are required per critical section invocation.
Distributed approach:
• In the distributed approach, different nodes work together to detect deadlocks. There is no single point of failure, as the workload is equally divided among all nodes.
• The speed of deadlock detection also increases.
Hierarchical approach:
• This approach is the most advantageous approach.
• It is a combination of both the centralized and distributed approaches of deadlock detection in a distributed system.
• In this approach, some selected nodes or clusters of nodes are responsible for deadlock detection, and these selected nodes are controlled by a single node.
System Model
Deadlock Detection
Correctness criteria
A deadlock detection algorithm must satisfy the following two conditions:
1. Progress-No undetected deadlocks:
The algorithm must detect all existing deadlocks in finite time. In other words, after
all wait-for dependencies for a deadlock have formed, the algorithm should not wait for any
more events to occur to detect the deadlock.
2. Safety – No false deadlocks:
The algorithm should not report deadlocks that do not exist. These are also called phantom or false deadlocks.
• A blocked process's request is fulfilled only after a message from each process in its dependent set has arrived.
• In the AND model, a process can request more than one resource simultaneously and the request is satisfied only after all the requested resources are granted to the process.
• The requested resources may exist at different locations.
• The out degree of a node in the WFG for AND model can be more than 1.
• The presence of a cycle in the WFG indicates a deadlock in the AND model.
• Each node of the WFG in such a model is called an AND node.
• In the AND model, if a cycle is detected in the WFG, it implies a deadlock but not vice versa; that is, even if a process is not part of a cycle, it can still be deadlocked (for example, if it waits on a process that is in a cycle).
• In the OR model, a process can make a request for numerous resources simultaneously and the request is satisfied if any one of the requested resources is granted.
• The presence of a cycle in the WFG of an OR model does not imply a deadlock in the OR model.
• In the OR model, the presence of a knot indicates a deadlock.
• Note that AND requests for p resources can be stated as a p-out-of-p request, and OR requests for p resources can be stated as a 1-out-of-p request.
Data structures
Each process Pi maintains a boolean array, dependenti, where dependenti(j) is true only if Pi knows that Pj is dependent on it. Initially, dependenti(j) is false for all i and j.
Performance analysis
In the algorithm, one probe message is sent on every edge of the WFG which
connects processes on two sites.
The algorithm exchanges at most m(n − 1)/2 messages to detect a deadlock that
involves m processes and spans over n sites.
The size of messages is fixed and is very small (only three integer words).
The delay in detecting a deadlock is O(n).
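The sketch below outlines the probe (edge-chasing) handling at one site for the AND model, in the spirit of the algorithm analyzed above; the wait-for bookkeeping, message encoding, and transport are simplifying assumptions, and the condition about outstanding replies to the probing process is omitted for brevity.

```python
class AndModelSite:
    """Edge-chasing (probe) deadlock detection for the AND model at one process
    (simplified sketch; the wait-for bookkeeping and transport are assumptions)."""

    def __init__(self, pid, send):
        self.pid = pid
        self.send = send            # send(dest, probe): transport supplied by the environment
        self.blocked = False
        self.waiting_for = set()    # processes this process is directly waiting on
        self.dependent = {}         # dependent[i]: Pi is known to be (transitively) waiting on us

    def initiate(self):
        # A blocked process starts detection by probing every process it waits for.
        if self.blocked:
            for k in self.waiting_for:
                self.send(k, ("PROBE", self.pid, self.pid, k))

    def on_probe(self, probe):
        _, i, j, k = probe          # probe(i, j, k): initiated by Pi, sent from Pj to Pk (= this process)
        if not self.blocked or self.dependent.get(i):
            return None             # active processes discard probes; duplicates are ignored
        self.dependent[i] = True
        if i == self.pid:
            return "DEADLOCK"       # the probe came back to its initiator: a cycle exists
        for m in self.waiting_for:  # forward the probe along every outgoing wait-for edge
            self.send(m, ("PROBE", i, self.pid, m))
        return None
```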
Advantages:
• It is easy to implement.
• Each probe message is of fixed length.
• There is very little computation.
• There is very little overhead.
• There is no need to construct a graph, nor to pass graph information to other sites.
• This algorithm does not find false (phantom) deadlocks.
• There is no need for special data structures.
The query and reply messages are of the form query(i, j, k) and reply(i, j, k), denoting that they belong to a diffusion computation initiated by a process Pi and are being sent from process Pj to process Pk.
A blocked process initiates deadlock detection by sending query messages to all
processes in its dependent set.
If an active process receives a query or reply message, it discards it. When a blocked
process Pk receives a query(i, j, k) message, it takes the following actions:
1. If this is the first query message received by Pk for the deadlock detection
initiated by Pi, then it propagates the query to all the processes in its
dependent set and sets a local variable numk (i) to the number of query
messages sent.
2. If this is not the engaging query, then Pk returns a reply message to it immediately, provided Pk has been continuously blocked since it received the corresponding engaging query. Otherwise, it discards the query.
• Process Pk maintains a boolean variable waitk(i) that denotes the fact that it has been continuously blocked since it received the last engaging query from process Pi.
• When a blocked process Pk receives a reply(i, j, k) message, it decrements numk(i) only if waitk(i) holds.
• A process sends a reply message in response to an engaging query only after it has received a reply to every query message it has sent out for this engaging query.
• The initiator process detects a deadlock when it has received reply messages to all the query messages it has sent out.
Performance analysis
For every deadlock detection, the algorithm exchanges e query messages and e reply
messages, where e = n(n – 1) is the number of edges.
Problem definition
Agreement among the processes in a distributed system is a fundamental requirement for a
wide range of applications. Many forms of coordination require the processes to exchange
information to negotiate with one another and eventually reach a common understanding or
agreement, before taking application-specific actions. A classical example is that of the
commit decision in database systems, wherein the processes collectively decide whether to
commit or abort a transaction that they participate in.
We first state some assumptions underlying our study of agreement algorithms:
• Failure models Among the n processes in the system, at most f processes can be faulty. A faulty process can behave in any manner allowed by the failure model assumed. The various failure models include fail-stop, send omission, receive omission, and Byzantine failures.
• Synchronous/asynchronous communication If a failure-prone process chooses to send a
message to process Pi but fails, then Pi cannot detect the non-arrival of the message in an
asynchronous system. In a synchronous system, however, the scenario in which a message
has not been sent can be recognized by the intended recipient, at the end of the round.
• Network connectivity The system has full logical connectivity, i.e., each process can
communicate with any other by direct message passing.
• Sender identification A process that receives a message always knows the identity of the
sender process.
• Channel reliability The channels are reliable, and only the processes may fail (under one of
various failure models).
• Authenticated vs. non-authenticated messages With unauthenticated messages, when a
faulty process relays a message to other processes, (i) it can forge the message and claim that
it was received from another process, and (ii) it can also tamper with the contents of a
received message before relaying it. When a process receives a message, it has no way to
verify its authenticity. An unauthenticated message is also called an oral message or an
unsigned message. Using authentication via techniques such as digital signatures, it is easier
to solve the agreement problem because, if some process forges a message or tampers with
the contents of a received message before relaying it, the recipient can detect the forgery or
tampering. Thus, faulty processes can inflict less damage.
• Agreement variable The agreement variable may be boolean or multivalued, and need not
be an integer.
Failure model | Synchronous system | Asynchronous system
Crash failure | agreement attainable; f < n processes | agreement not attainable
Byzantine failure | agreement attainable; f ≤ ⌊(n − 1)/3⌋ Byzantine processes | agreement not attainable
• The agreement condition is satisfied because in the f+ 1 rounds, there must be at least one round in which
no process failed.
• In this round, say round r, all the processes that have not failed so far succeed in broadcasting their
values, and all these processes take the minimum of the values broadcast and received in that round.
• Thus, the local values at the end of the round are the same, say x, for all non-failed processes.
• In further rounds, only this value may be sent by each process at most once, and no process i will update its value x.
• The validity condition is satisfied because processes do not send fictitious values in this failure model.
• For all i, if the initial value is identical, then the only value sent by any process is the value that has been
agreed upon as per the agreement condition.
• The termination condition is seen to be satisfied.
Complexity: The algorithm requires f + 1 rounds, where f < n. In each round, O(n²) messages are exchanged and each message carries one integer; hence the total number of messages is O((f + 1) · n²).
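The round structure can be illustrated with a small simulation in which each surviving process repeatedly broadcasts its value and adopts the minimum it has seen. The simulation below simplifies the crash model (a crashed process sends to nobody from its crash round onward), and its function signature is an assumption made for illustration.

```python
def crash_consensus(initial_values, crash_round_of, f):
    """Round-based consensus under crash failures (minimal simulation sketch).

    initial_values: dict {pid: value}; crash_round_of: dict {pid: round in which the
    process crashes}, with absent pids never crashing; f: maximum number of crashes.
    Each alive process broadcasts its current value in every one of the f + 1 rounds
    and adopts the minimum of the values it holds and receives."""
    values = dict(initial_values)
    for r in range(1, f + 2):                         # f + 1 rounds
        broadcast = {p: v for p, v in values.items()
                     if crash_round_of.get(p) is None or crash_round_of[p] > r}
        for p in values:                              # every non-crashed process receives
            if crash_round_of.get(p) is None or crash_round_of[p] > r:
                values[p] = min([values[p]] + list(broadcast.values()))
    alive = [p for p in values if crash_round_of.get(p) is None]
    return {p: values[p] for p in alive}              # all alive processes agree

# usage: 4 processes, at most f = 1 crash; P2 crashes in round 1
print(crash_consensus({1: 5, 2: 1, 3: 7, 4: 9}, {2: 1}, f=1))   # -> {1: 5, 3: 5, 4: 5}
```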
• In the second round, the "phase king" process arrives at an estimate based on the values it received in the first round, and broadcasts its new estimate to all others.
Fig. Message pattern for the phase-king algorithm.
The algorithm uses (f + 1) phases and (f + 1)[(n − 1)(n + 1)] messages, and can tolerate up to f < ⌈n/4⌉ malicious processes.
Correctness Argument
• Pi and Pj use their own majority values (Pi's mult > n/2 + f).
• Pi uses its majority value; Pj uses the phase king's tie-breaker value (Pi's mult > n/2 + f, Pj's mult > n/2 for the same value).
• Pi and Pj use the phase king's tie-breaker value (in the phase in which Pk is non-malicious, it sends the same value to Pi and Pj).
In all three cases, Pi and Pj end up with the same value as their estimate. If all non-malicious processes have the value x at the start of a phase, they will continue to have x as the consensus value at the end of the phase.
Check pointing and rollback recovery: Introduction
• Rollback recovery protocols restore the system back to a consistent state after a failure.
• They achieve fault tolerance by periodically saving the state of a process during failure-free execution.
• They treat a distributed system application as a collection of processes that communicate over a network.
• It treats a distributed system application as a collection of processes that communicate
over a network
Checkpoints
The saved state is called a checkpoint, and the procedure of restarting from a previously check
pointed state is called rollback recovery. A checkpoint can be saved on either the stable storage
or the volatile storage
Why is rollback recovery of distributed systems complicated?
Messages induce inter-process dependencies during failure-free operation
Rollback propagation
The dependencies among messages may force some of the processes that did not fail to roll back.
This phenomenon of cascaded rollback is called the domino effect.
Uncoordinated check pointing
If each process takes its checkpoints independently, then the system cannot avoid the domino
effect – this scheme is called independent or uncoordinated check pointing
Techniques that avoid domino effect
1. Coordinated check pointing rollback recovery - Processes coordinate their checkpoints to
form a system-wide consistent state
2. Communication-induced check pointing rollback recovery - Forces each process to take
checkpoints based on information piggybacked on the application.
3. Log-based rollback recovery - Combines check pointing with logging of non-deterministic events.
• It relies on the piecewise deterministic (PWD) assumption.
Background and definitions
System model
• A distributed system consists of a fixed number of processes, P1, P2, ..., PN, which communicate only through messages.
• Processes cooperate to execute a distributed application and interact with the outside
world by receiving and sending input and output messages, respectively.
• Rollback-recovery protocols generally make assumptions about the reliability of the inter-process communication.
• Some protocols assume that the communication uses first-in-first-out (FIFO) order, while
other protocols assume that the communication subsystem can lose, duplicate, or reorder
messages.
• Rollback-recovery protocols therefore must maintain information about the internal
interactions among processes and also the external interactions with the outside world.
A local checkpoint
• All processes save their local states at certain instants of time
• A local check point is a snapshot of the state of the process at a given instance
• Assumption
– A process stores all local checkpoints on the stable storage
– A process is able to roll back to any of its existing local checkpoints
1. In-transit messages
• messages that have been sent but not yet received
2. Lost messages
• messages whose send is done but receive is undone due to rollback
3. Delayed messages
• messages whose receive is not recorded because the receiving process was either down or the message arrived after rollback
4. Orphan messages
• messages with receive recorded but message send not recorded
• do not arise if processes roll back to a consistent global state
5. Duplicate messages
• arise due to message logging and replaying during process recovery
In-transit messages
In the figure, the global state {C1,8, C2,9, C3,8, C4,8} shows that message m1 has been sent but not yet received. We call such a message an in-transit message. Message m2 is also an in-transit message.
Delayed messages
Messages whose receive is not recorded because the receiving process was either down or the
message arrived after the rollback of the receiving process, are called delayed messages. For
example, messages m2 and m5 in Figure are delayed messages.
Lost messages
Messages whose send is not undone but receive is undone due to rollback are called lost messages. This type of message occurs when the receiving process rolls back to a checkpoint prior to the reception of the message while the sender does not roll back beyond the send operation of the message. In the figure, message m1 is a lost message.
Duplicate messages
• Duplicate messages arise due to message logging and replaying during process
recovery. For example, in Figure, message m4 was sent and received before the
rollback. However, due to the rollback of process P4 to C4,8 and process P3 to C3,8,
both send and receipt of message m4 are undone.
• When process P3 restarts from C3,8, it will resend message m4.
• Therefore, P4 should not replay message m4 from its log.
• If P4 replays message m4, then message m4 is called a duplicate message.
Issues in failure recovery
In a failure recovery, we must not only restore the system to a consistent state, but also
appropriately handle messages that are left in an abnormal state due to the failure and recovery
• The computation comprises three processes Pi, Pj, and Pk, connected through a
communication network. The processes communicate solely by exchanging messages over fault-free, FIFO communication channels.
• Processes Pi, Pj , and Pk, have taken checkpoints {Ci,0, Ci,1}, {Cj,0, Cj,1, Cj,2}, and {Ck,0,
Ck,1}, respectively, and these processes have exchanged messages A to J
Suppose process Pi fails at the instance indicated in the figure. All the contents of the volatile
memory of Pi are lost and, after Pi has recovered from the failure, the system needs to be
restored to a consistent global state from where the processes can resume their execution.
• Process Pi’s state is restored to a valid state by rolling it back to its most recent checkpoint
Ci,1. To restore the system to a consistent state, the process Pj rolls back to checkpoint Cj,1
because the rollback of process Pi to checkpoint Ci,1 created an orphan message H (the receive
event of H is recorded at process Pj while the send event of H has been undone at process Pi).
• Pj does not roll back to checkpoint Cj,2 but to checkpoint Cj,1. An orphan message I is created
due to the roll back of process Pj to checkpoint Cj,1. To eliminate this orphan message, process
Pk rolls back to checkpoint Ck,1.
• Messages C, D, E, and F are potentially problematic. Message C is in transit during the failure
and it is a delayed message. The delayed message C has several possibilities: C might arrive at
process Pi before it recovers, it might arrive while Pi is recovering, or it might arrive after Pi has
completed recovery. Each of these cases must be dealt with correctly.
• Message D is a lost message since the send event for D is recorded in the restored state for
process Pj , but the receive event has been undone at process Pi. Process Pj will not resend D
without an additional mechanism.
• Messages E and F are delayed orphan messages and pose perhaps the most serious problem of
all the messages. When messages E and F arrive at their respective destinations, they must be
discarded since their send events have been undone. Processes, after resuming execution from
their checkpoints, will generate both of these messages.
• Lost messages like D can be handled by having processes keep a message log of all the sent
messages. So when a process restores to a checkpoint, it replays the messages from its log to
handle the lost message problem.
• Overlapping failures further complicate the recovery process. If overlapping failures are to be
tolerated, a mechanism must be introduced to deal with amnesia and the resulting
inconsistencies.
Checkpoint-based recovery
Checkpoint-based rollback-recovery techniques can be classified into three categories:
1. Uncoordinated checkpointing
2. Coordinated checkpointing
3. Communication-induced checkpointing
1. Uncoordinated Checkpointing
• Each process has autonomy in deciding when to take checkpoints
• Advantages
The lower runtime overhead during normal execution
• Disadvantages
1. Domino effect during a recovery
2. Recovery from a failure is slow because processes need to iterate to find a
consistent set of checkpoints
3. Each process maintains multiple checkpoints and periodically invokes a
garbage collection algorithm
4. Not suitable for application with frequent output commits
• The processes record the dependencies among their checkpoints caused by message
exchange during failure-free operation
2. Coordinated Checkpointing
Algorithm
• The algorithm consists of two phases. During the first phase, the checkpoint initiator
identifies all processes with which it has communicated since the last checkpoint and
sends them a request.
• Upon receiving the request, each process in turn identifies all processes it has
communicated with since the last checkpoint and sends them a request, and so on, until
no more processes can be identified.
• During the second phase, all processes identified in the first phase take a checkpoint. The
result is a consistent checkpoint that involves only the participating processes.
• In this protocol, after a process takes a checkpoint, it cannot send any message until the
second phase terminates successfully, although receiving a message after the checkpoint
has been taken is allowable.
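A minimal sketch of this two-phase coordination is shown below. It is written as if a single coordinator could walk the whole dependency graph; in the real protocol each process forwards the request itself. The helpers communicated_since_last_checkpoint, send_request, and take_checkpoint are assumptions, not part of the original notes.

    # Sketch of two-phase coordinated checkpointing, seen from the initiator.
    def initiate_coordinated_checkpoint(initiator):
        # Phase 1: identify every process that has (transitively) communicated
        # with the initiator since the last checkpoint.
        participants = {initiator}
        frontier = set(initiator.communicated_since_last_checkpoint())
        while frontier:
            process = frontier.pop()
            if process not in participants:
                participants.add(process)
                process.send_request()            # ask it to join this checkpoint
                frontier |= set(process.communicated_since_last_checkpoint())

        # Phase 2: every identified process takes a checkpoint. Until phase 2
        # terminates, a process may receive messages but must not send any,
        # so the recorded global state stays consistent.
        for process in participants:
            process.take_checkpoint()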
3. Communication-induced Checkpointing
Communication-induced checkpointing is another way to avoid the domino effect, while
allowing processes to take some of their checkpoints independently. Processes may be forced to
take additional checkpoints
Two types of checkpoints
1. Autonomous checkpoints
2. Forced checkpoints
The checkpoints that a process takes independently are called local checkpoints, while those that
a process is forced to take are called forced checkpoints.
• Communication-induced check pointing piggybacks protocol-related information on
each application message
• The receiver of each application message uses the piggybacked information to determine
if it has to take a forced checkpoint to advance the global recovery line
• The forced checkpoint must be taken before the application may process the contents of
the message
• In contrast with coordinated check pointing, no special coordination messages are
exchanged
Two types of communication-induced checkpointing
1. Model-based checkpointing
2. Index-based checkpointing.
Model-based checkpointing
• Model-based checkpointing prevents patterns of communications and checkpoints
that could result in inconsistent states among the existing checkpoints.
• No control messages are exchanged among the processes during normal operation.
All information necessary to execute the protocol is piggybacked on application
messages
• There are several domino-effect-free checkpoint and communication models.
• The MRS (mark, send, and receive) model of Russell avoids the domino effect by
ensuring that within every checkpoint interval all message-receiving events precede
all message-sending events.
Index-based checkpointing.
• Index-based communication-induced checkpointing assigns monotonically increasing
indexes to checkpoints, such that the checkpoints having the same index at different
processes form a consistent state.
Optimization: it may not be necessary to roll back all processes, since some of them did not interact with the failed process.
In the event of a failure of process X, the above protocol will require processes X, Y, and Z to restart from checkpoints x2, y2, and z2, respectively.
Process Z need not roll back because there has been no interaction between process Z and the other two processes since the last checkpoint at Z.
The Algorithm
When a processor restarts after a failure, it broadcasts a ROLLBACK message announcing that it has failed.
Procedure RollBack_Recovery
processor pi executes the following:
STEP (a)
if processor pi is recovering after a failure then
CkPti := latest event logged in the stable storage
else
CkPti := latest event that took place in pi {The latest event at pi can be either in stable or in
volatile storage.}
end if
STEP (b)
for k = 1 to N {N is the number of processors in the system} do
for each neighboring processor pj do
compute SENTi→j(CkPti)
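The pseudocode above stops at the computation of SENTi→j(CkPti). As a rough illustration of how the rest of one recovery iteration typically proceeds in this style of algorithm, the following Python sketch compares the SENT and RCVD message counts exchanged in ROLLBACK messages; the helpers SENT, RCVD, latest_event_with_rcvd, send_rollback, and receive_rollbacks are assumptions standing in for the logged events and the network layer.

    # Sketch of one processor's recovery loop (STEP (a) and STEP (b) above).
    # SENT(ckpt, j): number of messages sent to neighbour j up to event ckpt
    # RCVD(ckpt, j): number of messages received from neighbour j up to event ckpt
    def rollback_recovery(pi, N, neighbours):
        ckpt = pi.restore_point()                        # STEP (a): latest stable/volatile event
        for _ in range(N):                               # STEP (b): N iterations
            for pj in neighbours:
                pi.send_rollback(pj, pi.SENT(ckpt, pj))  # ROLLBACK(i, SENT_i->j(CkPt_i))
            for (pj, c) in pi.receive_rollbacks():
                if pi.RCVD(ckpt, pj) > c:
                    # pi has recorded receives that pj never (re)sent: orphan messages.
                    # Roll back to the latest event where the counts are consistent.
                    ckpt = pi.latest_event_with_rcvd(pj, c)
        return ckpt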
*Cloud Computing*
The term cloud refers to a network or the internet. It is a technology that uses remote servers on the internet to store, manage, and access data online rather than on local drives. The data can be anything, such as files, images, documents, audio, video, and more.
1) Agility
2) High Availability and Reliability
The availability of servers is high and more reliable because the chances of
infrastructure failure are minimal.
3) High Scalability
4) Multi-Sharing
With the help of cloud computing, multiple users and applications can work more
efficiently with cost reductions by sharing common infrastructure.
6) Maintenance
7) Low Cost
By using cloud computing, the cost will be reduced because, to use cloud computing services, an IT company need not set up its own infrastructure and pays only as per its usage of resources.
Application Programming Interfaces (APIs) are provided to the users so that they can
access services on the cloud by using these APIs and pay the charges as per the usage
of services.
Which cloud model is an ideal fit for a business depends on your organization's computing and business needs. Choosing the right one from the various types of cloud deployment models is essential. It would ensure your business is equipped with the performance, scalability, privacy, security, compliance and cost-effectiveness it requires. It is important to learn and explore what the different deployment types can offer and what particular problems each can solve.
Read on as we cover the various cloud computing deployment and service models to help
discover the best choice for your business.
What Is A Cloud Deployment Model?
Most cloud hubs have tens of thousands of servers and storage devices to enable fast
loading. It is often possible to choose a geographic area to put the data "closer" to users.
Thus, deployment models for cloud computing are categorized based on their location.
To know which model would best fit the requirements of your organization, let us first
learn about the various types.
Public Cloud
The name says it all. It is accessible to the public. Public deployment models in the cloud
are perfect for organizations with growing and fluctuating demands. It also makes a great
choice for companies with low-security concerns. Thus, you pay a cloud service provider
for networking services, compute virtualization & storage available on the public internet.
It is also a great delivery model for the teams with development and testing. Its
configuration and deployment are quick and easy, making it an ideal choice for test
environments.
Limitations of Public Cloud
o Data Security and Privacy Concerns - Since it is accessible to all, it does not fully
protect against cyber-attacks and could lead to vulnerabilities.
o Reliability Issues - Since the same server network is open to a wide range of users,
it can lead to malfunction and outages
o Service/License Limitation - While there are many resources you can exchange
with tenants, there is a usage cap.
Private Cloud
Now that you understand what the public cloud could offer you, of course, you are keen
to know what a private cloud can do. Companies that look for cost efficiency and greater
control over data & resources will find the private cloud a more suitable choice.
It means that it will be integrated with your data center and managed by your IT team.
Alternatively, you can also choose to host it externally. The private cloud offers bigger
opportunities that help meet specific organizations' requirements when it comes to
customization. It's also a wise choice for mission-critical processes that may have
frequently changing requirements.
o Data Privacy - It is ideal for storing corporate data where only authorized
personnel gets access
o Security - Segmentation of resources within the same Infrastructure can help with
better access and higher levels of security.
o Supports Legacy Systems - This model supports legacy systems that cannot access
the public cloud.
Community Cloud
The community cloud operates in a way that is similar to the public cloud. There's just
one difference - it allows access to only a specific set of users who share common
objectives and use cases. This type of deployment model of cloud computing is managed
and hosted internally or by a third-party vendor. However, you can also choose a
combination of all three.
o Smaller Investment - A community cloud is much cheaper than the private &
public cloud and provides great performance
o Setup Benefits - The protocols and configuration of a community cloud must align
with industry standards, allowing customers to work much more efficiently.
Hybrid Cloud
As the name suggests, a hybrid cloud is a combination of two or more cloud architectures.
While each model in the hybrid cloud functions differently, it is all part of the same
architecture. Further, as part of this deployment of the cloud computing model, the
internal or external providers can offer resources.
Let's understand the hybrid model better. A company with critical data will prefer storing it on a private cloud, while less sensitive data can be stored on a public cloud. The hybrid cloud is also frequently used for 'cloud bursting': suppose an organization runs an application on-premises; due to heavy load, it can burst into the public cloud.
Characteristics of IaaS
PaaS cloud computing platform is created for the programmer to develop, test, run, and
manage the applications.
Characteristics of PaaS
Example: AWS Elastic Beanstalk, Windows Azure, Heroku, Force.com, Google App
Engine, Apache Stratos, Magento Commerce Cloud, and OpenShift.
Characteristics of SaaS
IaaS: It provides a virtual data center to store information and create platforms for app development, testing, and deployment.
PaaS: It provides virtual platforms and tools to create, test, and deploy apps.
SaaS: It provides web software and apps to complete business tasks.
IaaS is also known as Hardware as a Service (HaaS). It is one of the layers of the cloud computing platform. It allows customers to outsource their IT infrastructure such as servers, networking, processing, storage, virtual machines, and other resources. Customers access these resources over the Internet using a pay-as-per-use model.
In traditional hosting services, IT infrastructure was rented out for a specific period of
time, with pre-determined hardware configuration. The client paid for the configuration
and time, regardless of the actual use. With the help of the IaaS cloud computing platform
layer, clients can dynamically scale the configuration to meet changing requirements and
are billed only for the services actually used.
IaaS cloud computing platform layer eliminates the need for every organization to
maintain the IT infrastructure.
IaaS is offered in three models: public, private, and hybrid cloud. The private cloud
implies that the infrastructure resides at the customer-premise. In the case of public cloud,
it is located at the cloud computing platform vendor's data center, and the hybrid cloud is
a combination of the two in which the customer selects the best of both public cloud or
private cloud.
3. Pay-as-per-use model
IaaS providers provide services on a pay-as-per-use basis. The users are required to pay for what they have used.
IaaS allows an organization to focus on its core business rather than on IT infrastructure.
5. On-demand scalability
On-demand scalability is one of the biggest advantages of IaaS. Using IaaS, users do not need to worry about upgrading software or troubleshooting issues related to hardware components.
1. Security
Security is one of the biggest issues in IaaS. Most of the IaaS providers are not able to
provide 100% security.
Although IaaS service providers maintain the software, they do not upgrade the software for some organizations.
3. Interoperability issues
It is difficult to migrate a VM from one IaaS provider to another, so customers might face problems related to vendor lock-in.
1. Programming languages
PaaS providers provide various programming languages for the developers to develop the
applications. Some popular programming languages provided by PaaS providers are Java,
PHP, Ruby, Perl, and Go.
2. Application frameworks
3. Databases
PaaS providers provide various databases such as ClearDB, PostgreSQL, MongoDB, and
Redis to communicate with the applications.
4. Other tools
PaaS providers provide various other tools that are required to develop, test, and deploy
the applications.
Advantages of PaaS
1) Simplified Development
PaaS allows developers to focus on development and innovation without worrying about
infrastructure management.
2) Lower risk
No need for up-front investment in hardware and software. Developers only need a PC
and an internet connection to start building applications.
Some PaaS vendors also provide pre-defined business functionality so that users can avoid building everything from scratch and can start their projects directly.
4) Instant community
PaaS vendors frequently provide online communities where developers can get ideas, share experiences, and seek advice from others.
5) Scalability
Applications deployed can scale from one to thousands of users without any changes to
the applications.
1) Vendor lock-in
One has to write the applications according to the platform provided by the PaaS vendor,
so the migration of an application to another PaaS vendor would be a problem.
2) Data Privacy
Corporate data, whether critical or not, should remain private; if it is not located within the walls of the company, there can be a risk in terms of data privacy.
It may happen that some applications are local and some are in the cloud, so there can be increased complexity when we want to use data in the cloud together with local data.
Business Services - SaaS Provider provides various business services to start-up the
business. The SaaS business services include ERP (Enterprise Resource
Planning), CRM (Customer Relationship Management), billing, and sales.
Social Networks - As we all know, social networking sites are used by the general public,
so social networking service providers use SaaS for their convenience and handle the
general public's information.
Mail Services - To handle the unpredictable number of users and the load on e-mail services, many e-mail providers offer their services using SaaS.
Advantages of SaaS cloud computing layer
Unlike traditional software, which is sold under a license with an up-front cost (and often an optional ongoing support fee), SaaS providers generally price their applications using a subscription fee, most commonly a monthly or annual fee.
2. One to Many
SaaS services are offered on a one-to-many model, meaning a single instance of the application is shared by multiple users.
Software as a service removes the need for installation, set-up, and daily maintenance for the organizations. The initial set-up cost for SaaS is typically less than for enterprise software. SaaS vendors price their applications based on usage parameters, such as the number of users using the application, which makes SaaS easy to monitor. SaaS also provides automatic updates.
6. Multidevice support
SaaS services can be accessed from any device such as desktops, laptops, tablets, phones,
and thin clients.
7. API Integration
SaaS services easily integrate with other software or services through standard APIs.
8. No client-side installation
SaaS services are accessed directly from the service provider over an internet connection, so no client-side software installation is required.
1) Security
Actually, data is stored in the cloud, so security may be an issue for some users. However,
cloud computing is not more secure than in-house deployment.
2) Latency issue
Since data and applications are stored in the cloud at a variable distance from the end-
user, there is a possibility that there may be greater latency when interacting with the
application compared to local deployment. Therefore, the SaaS model is not suitable for
applications whose demand response time is in milliseconds.
Switching SaaS vendors involves the difficult and slow task of transferring very large data files over the internet and then converting and importing them into another SaaS application.
*Cloud Computing Challenges*
2. Cost Management
Even though almost all cloud service providers have a "Pay As You Go" model, which reduces the overall cost of the resources being used, there are times when huge costs are incurred by the enterprise using cloud computing. When resources are under-optimized, say when servers are not being used to their full potential, the hidden costs add up. Degraded application performance or sudden spikes or overages in usage also add to the overall cost. Unused resources are another main reason why costs go up: if you turn on a service or a cloud instance and forget to turn it off during the weekend or when it is not in use, the cost increases without the resources even being used.
3. Multi-Cloud Environments
Due to an increase in the options available to the companies, enterprises not only use a
single cloud but depend on multiple cloud service providers. Most of these companies
use hybrid cloud tactics and close to 84% are dependent on multiple clouds. This often ends up being difficult for the infrastructure team to manage. The process most of the time ends up being highly complex for the IT team due to the differences between multiple cloud providers.
4. Interoperability and Flexibility
When an organization uses a specific cloud service provider and wants to switch to another cloud-based solution, it often turns out to be a tedious procedure, since applications written for one cloud, along with their application stack, must be re-written for the other cloud. There is a lack of flexibility in switching from one cloud to another due to the complexities involved. Handling data movement and setting up the security and the network from scratch also add to the issues encountered when changing cloud solutions, thereby reducing flexibility.
Since cloud computing deals with provisioning resources in real time, it deals with enormous amounts of data transfer to and from the servers. This is only made possible by the availability of a high-speed network. Because these data and resources are exchanged over the network, this can prove to be highly vulnerable in case of limited bandwidth or a sudden outage. Even when enterprises can cut their hardware costs, they need to ensure that the internet bandwidth is high and that there are no network outages, or else it can result in a potential business loss. It is therefore a major challenge for smaller enterprises that have to maintain network bandwidth that comes at a high cost.
Due to its complex nature and the high demand for research, working with the cloud often ends up being a highly tedious task. It requires immense knowledge and wide expertise on the subject. Although there are a lot of professionals in the field, they need to constantly update themselves. Cloud computing jobs are highly paid due to the extensive gap between demand and supply. There are a lot of vacancies but very few talented cloud engineers, developers, and professionals. Therefore, there is a need for upskilling so these professionals can actively understand, manage, and develop cloud-based applications with minimum issues and maximum reliability.
*Virtualization*
Virtualization is the "creation of a virtual (rather than actual) version of something, such
as a server, a desktop, a storage device, an operating system or network resources".
In other words, virtualization is a technique which allows sharing a single physical instance of a resource or an application among multiple customers and organizations. It does this by assigning a logical name to physical storage and providing a pointer to that physical resource when demanded.
Creation of a virtual machine over an existing operating system and hardware is known as hardware virtualization. A virtual machine provides an environment that is logically separated from the underlying hardware.
The machine on which the virtual machine is created is known as the Host Machine, and that virtual machine is referred to as the Guest Machine.
Types of Virtualization:
1. Hardware Virtualization.
2. Operating system Virtualization.
3. Server Virtualization.
4. Storage Virtualization.
1) Hardware Virtualization:
When the virtual machine software or virtual machine manager (VMM) is installed directly on the hardware system, it is known as hardware virtualization.
The main job of the hypervisor is to control and monitor the processor, memory, and other hardware resources.
Usage:
Hardware virtualization is mainly done for the server platforms, because controlling
virtual machines is much easier than controlling a physical server.
2) Operating System Virtualization:
When the virtual machine software or virtual machine manager (VMM) is installed on the host operating system instead of directly on the hardware system, it is known as operating system virtualization.
Usage:
Operating System Virtualization is mainly used for testing the applications on different
platforms of OS.
3) Server Virtualization:
When the virtual machine software or virtual machine manager (VMM) is installed directly on the server system, it is known as server virtualization.
Usage:
Server virtualization is done because a single physical server can be divided into multiple
servers on the demand basis and for balancing the load.
4) Storage Virtualization:
Storage virtualization is the process of grouping the physical storage from multiple
network storage devices so that it looks like a single storage device.
Usage:
Virtualization plays a very important role in cloud computing technology. Normally in cloud computing, users share the data present in the cloud, such as applications, but with the help of virtualization, users actually share the infrastructure.
The main usage of virtualization technology is to provide standard versions of applications to cloud users; when the next version of an application is released, the cloud provider has to provide the latest version to its cloud users, and virtualization is what makes this practical, since doing so otherwise would be more expensive.
*Load Balancing*
Load balancing is the method that allows you to have a proper balance of the amount of
work being done on different pieces of device or hardware equipment. Typically, what
happens is that the load of the devices is balanced between different servers or between
the CPU and hard drives in a single cloud server.
Load balancing was introduced for various reasons. One of them is to improve the speed
and performance of each single device, and the other is to protect individual devices from
hitting their limits by reducing their performance.
Cloud load balancing is defined as dividing workload and computing properties in cloud
computing. It enables enterprises to manage workload demands or application demands
by distributing resources among multiple computers, networks or servers. Cloud load
balancing involves managing the movement of workload traffic and demands over the
Internet.
Traffic on the Internet is growing rapidly, increasing by almost 100% annually. Therefore, the workload on servers is increasing rapidly, leading to overloading of the servers, mainly for popular web servers. There are two primary solutions to overcome the problem of overloading on the servers:
o First is a single-server solution in which the server is upgraded to a higher-
performance server. However, the new server may also be overloaded soon,
demanding another upgrade. Moreover, the upgrading process is arduous and
expensive.
o The second is a multiple-server solution in which a scalable service system on a
cluster of servers is built. That's why it is more cost-effective and more scalable to
build a server cluster system for network services.
Cloud-based servers can achieve more precise scalability and availability by using farm
server load balancing. Load balancing is beneficial with almost any type of service, such
as HTTP, SMTP, DNS, FTP, and POP/IMAP.
1. Static Algorithm
Static algorithms are built for systems with very little variation in load. The entire traffic
is divided equally between the servers in the static algorithm. This algorithm requires in-
depth knowledge of server resources for better performance of the processor, which is
determined at the beginning of the implementation.
However, the decision of load shifting does not depend on the current state of the system.
One of the major drawbacks of the static load balancing algorithm is that load balancing tasks work only after they have been created; the load cannot be shifted to other devices for balancing.
2. Dynamic Algorithm
The dynamic algorithm first finds the lightest server in the entire network and gives it
priority for load balancing. This requires real-time communication with the network
which can help increase the system's traffic. Here, the current state of the system is used
to control the load.
The characteristic of dynamic algorithms is to make load transfer decisions in the current
system state. In this system, processes can move from a highly used machine to an
underutilized machine in real time.
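As a tiny sketch of this dynamic idea, the following function routes each incoming request to the currently least-loaded server; current_load is a hypothetical mapping (for example, active connections per server) assumed to be refreshed in real time.

    # Minimal sketch of a dynamic load balancer: each new request goes to the
    # server that currently reports the lightest load.
    def pick_server_dynamic(servers, current_load):
        # current_load maps server -> its load, kept up to date by monitoring
        return min(servers, key=lambda s: current_load[s])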
3. Round Robin Algorithm
As the name suggests, round robin load balancing algorithm uses round-robin method to
assign jobs. First, it randomly selects the first node and assigns tasks to other nodes in a
round-robin manner. This is one of the easiest methods of load balancing.
Processors assign each process circularly without defining any priority. It gives fast
response in case of uniform workload distribution among the processes. All processes
have different loading times. Therefore, some nodes may be heavily loaded, while others
may remain under-utilised.
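A minimal Python sketch of the round-robin assignment just described is given below; nodes is simply a list of server identifiers, and the caller is assumed to forward each task to the chosen node.

    import itertools
    import random

    # Round-robin dispatch: pick a random starting node, then hand out tasks
    # to the nodes in circular order without any priorities.
    def round_robin_dispatcher(nodes):
        start = random.randrange(len(nodes))          # randomly select the first node
        cycle = itertools.cycle(nodes[start:] + nodes[:start])
        def assign(task):
            node = next(cycle)
            return node, task                         # caller forwards task to node
        return assign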
Weighted round robin load balancing algorithms have been developed to address the most challenging issues of round robin algorithms. In this algorithm, there is a specified set of weights and functions, which are distributed according to the weight values.
Processors that have a higher capacity are given a higher value. Therefore, the highest
loaded servers will get more tasks. When the full load level is reached, the servers will
receive stable traffic.
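A small sketch of this weighted variant follows, assuming integer weights that reflect each server's capacity; the server names and weights are illustrative only.

    import itertools

    # Weighted round robin: servers with higher capacity get a larger weight
    # and therefore appear more often in the dispatch cycle.
    def weighted_round_robin(servers_with_weights):
        # servers_with_weights: list of (server, weight) pairs, weight a positive int
        expanded = [s for s, w in servers_with_weights for _ in range(w)]
        cycle = itertools.cycle(expanded)
        return lambda: next(cycle)

    # Example: the "big" server receives three tasks for every one task that
    # the "small" server receives.
    pick = weighted_round_robin([("big", 3), ("small", 1)])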
The opportunistic load balancing (OLB) algorithm keeps each node busy. It never considers the current workload of each system: regardless of the current workload on each node, OLB distributes all unfinished tasks to these nodes.
Tasks may be processed slowly under OLB because it does not consider the execution time of each node, which causes bottlenecks even when some nodes are free.
In the min-min load balancing algorithm, the tasks that take the minimum time to complete are considered first. Among all tasks, the one with the minimum completion time is selected and, according to that minimum time, the work is scheduled on the corresponding machine. The completion times of the remaining tasks on that machine are then updated, and the scheduled task is removed from the list. This process continues until the final assignment is made. This algorithm works best where many small tasks outweigh large tasks.
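A minimal sketch of this min-min scheduling loop is shown below; exec_time is an assumed table of estimated execution times for each (task, machine) pair.

    # Min-min scheduling: repeatedly pick the task whose best (minimum)
    # completion time across all machines is smallest, assign it there,
    # and update that machine's ready time.
    def min_min_schedule(exec_time, machines, tasks):
        # exec_time[(task, machine)]: estimated execution time of task on machine
        ready = {m: 0.0 for m in machines}            # when each machine becomes free
        schedule = {}
        pending = set(tasks)
        while pending:
            # pick the (task, machine) pair with the smallest completion time
            task, machine = min(((t, m) for t in pending for m in machines),
                                key=lambda tm: ready[tm[1]] + exec_time[tm])
            schedule[task] = machine
            ready[machine] += exec_time[(task, machine)]   # machine busy until then
            pending.remove(task)                           # task is now assigned
        return schedule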
Load balancing solutions can be categorized into two types -
o Software-based load balancers: Software-based load balancers run on standard
hardware (desktop, PC) and standard operating systems.
o Hardware-based load balancers: Hardware-based load balancers are dedicated
boxes that contain application-specific integrated circuits (ASICs) optimized for a
particular use. ASICs allow network traffic to be promoted at high speeds and are
often used for transport-level load balancing because hardware-based load
balancing is faster than a software solution.
Cloud load balancing takes advantage of network layer information and uses it to decide where network traffic should be sent. This is accomplished through Layer 4 load balancing, which handles TCP/UDP traffic. It is the fastest load balancing solution, but it cannot balance the traffic distribution across servers.
HTTP(S) load balancing is the oldest type of load balancing, and it relies on Layer 7. This means that load balancing operates in the application layer. It is the most flexible type of load balancing because it lets you make delivery decisions based on information retrieved from HTTP addresses.
Internal load balancing: it is very similar to network load balancing, but it is leveraged to balance the infrastructure internally.
Load balancers can be further divided into hardware, software and virtual load balancers.
Hardware load balancer: it depends on the base and the physical hardware that distributes the network and application traffic. The device can handle a large traffic volume, but these come with a hefty price tag and have limited flexibility.
Software load balancer: it can be an open-source or commercial form and must be installed before it can be used. These are more economical than hardware solutions.
Virtual load balancer: it differs from a software load balancer in that it deploys the software of the hardware load-balancing device on a virtual machine.
The technology of load balancing is less expensive and also easy to implement. This
allows companies to work on client applications much faster and deliver better results at
a lower cost.
Cloud load balancing can provide scalability to control website traffic. By using effective
load balancers, it is possible to manage high-end traffic, which is achieved using network
equipment and servers. E-commerce companies that need to deal with multiple visitors
every second use cloud load balancing to manage and distribute workloads.
Load balancers can handle any sudden traffic bursts they receive at once. For example, in the case of university results, a website may go down due to too many requests. When one uses a load balancer, one does not need to worry about the traffic flow. Whatever the size of the traffic, load balancers will divide the entire load of the website equally across different servers and provide maximum results in minimum response time.
Greater Flexibility
The main reason for using a load balancer is to protect the website from sudden crashes.
When the workload is distributed among different network servers or units, if a single
node fails, the load is transferred to another node. It offers flexibility, scalability and the
ability to handle traffic better.
Scalability and Elasticity
Scalability: Scalability refers to the ability of the load balancing system to handle increased load by adding more resources. It can be achieved in several ways:
1. Horizontal Scaling: This involves adding more nodes or servers to the system to handle
increased load. Load balancers distribute incoming requests across these additional
resources, allowing the system to handle a larger volume of traffic.
2. Load Balancer Redundancy: To ensure high availability and avoid single points of failure,
load balancers themselves can be scaled by implementing redundancy. Multiple load
balancers can be deployed in parallel, distributing the load across them and providing
fault tolerance. If one load balancer fails, others can take over seamlessly.
3. Dynamic Configuration: Scalable load balancing systems often have dynamic
configurations that allow for automatic adjustment of resources based on demand. This
includes dynamically adding or removing nodes from the load balancing pool based on
factors like CPU utilization, network traffic, or predefined thresholds.
Elasticity: Elasticity is closely related to scalability but emphasizes the ability of a system
to dynamically adapt its resource allocation in response to workload changes. In load
balancing, elasticity refers to the ability to scale resources up or down based on real-time
demand.
1. Auto Scaling: Auto scaling allows the system to automatically adjust the number of nodes
or servers based on predefined metrics or policies. When the workload increases, new
nodes can be provisioned to handle the additional load, and when the demand decreases,
unnecessary resources can be removed.
2. Load Balancer Health Monitoring: Elastic load balancing systems continuously monitor
the health and performance of individual nodes or servers. If a node becomes overloaded
or unresponsive, the load balancer can dynamically redirect traffic to healthier nodes,
ensuring efficient resource utilization.
3. Dynamic Load Distribution: Elastic load balancers can intelligently distribute incoming
requests based on real-time conditions. For example, they can route requests to nodes
with lower resource utilization or closer proximity to minimize latency.
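As an illustration of such an auto-scaling policy, the sketch below applies a simple threshold rule to the average CPU utilisation; the thresholds, cooldown behaviour, and node limits are example values only, not prescribed ones.

    # Illustrative threshold-based auto-scaling rule: add a node when average
    # CPU utilisation stays above the upper threshold, remove one when it
    # falls below the lower threshold.
    def autoscale(current_nodes, avg_cpu, min_nodes=2, max_nodes=20,
                  scale_up_at=0.75, scale_down_at=0.30):
        if avg_cpu > scale_up_at and current_nodes < max_nodes:
            return current_nodes + 1      # provision one more node
        if avg_cpu < scale_down_at and current_nodes > min_nodes:
            return current_nodes - 1      # release an unnecessary node
        return current_nodes              # load is within the comfortable band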
By combining scalability and elasticity, load balancing systems can efficiently distribute
workload across distributed resources, ensuring optimal performance, responsiveness,
and resource utilization. These characteristics are particularly important in cloud
computing environments, where workloads can vary significantly over time.
*Cloud services and platforms: Compute services*
Compute services are a fundamental component of cloud computing platforms. They
provide the necessary computing resources to run applications, process data, and perform
various computational tasks. Here are some prominent compute services offered by cloud
providers:
1. Amazon EC2 (Elastic Compute Cloud): EC2 is a web service provided by Amazon Web
Services (AWS) that offers resizable virtual servers in the cloud. It allows users to rent
virtual machines (EC2 instances) and provides flexibility in terms of instance types,
operating systems, and configurations. EC2 instances can be rapidly scaled up or down
based on demand, offering a highly scalable compute infrastructure.
2. Microsoft Azure Virtual Machines: Azure Virtual Machines provide users with on-
demand, scalable computing resources in the Microsoft Azure cloud. Users can deploy
virtual machines with various operating systems and configurations, choosing from a
wide range of instance types to meet their specific requirements.
3. Google Compute Engine: Compute Engine is the Infrastructure as a Service (IaaS)
offering of Google Cloud Platform (GCP). It allows users to create and manage virtual
machines with customizable configurations, including options for various CPU and
memory sizes. Compute Engine provides scalable and flexible compute resources in the
Google Cloud environment.
4. IBM Virtual Servers: IBM Cloud offers Virtual Servers, which are scalable and
customizable compute resources. Users can choose from a variety of instance types,
including bare metal servers, virtual machines, and GPU-enabled instances. IBM Virtual
Servers provide the flexibility to customize network and storage configurations according
to specific workload needs.
5. Oracle Compute: Oracle Cloud Infrastructure (OCI) provides compute services through
Oracle Compute, allowing users to provision and manage virtual machines in the Oracle
Cloud. It offers a range of compute shapes, including general-purpose instances,
memory-optimized instances, and GPU instances, enabling users to optimize their
compute resources for different workloads.
These compute services provide the necessary infrastructure to deploy and manage
applications, whether they require simple virtual machines or more specialized instances.
They offer scalability, flexibility, and on-demand provisioning, allowing users to scale
their compute resources up or down based on workload demands. Additionally, these
services often integrate with other cloud services like storage, networking, and databases,
enabling users to build comprehensive cloud-based solutions
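As a brief illustration of on-demand provisioning, the sketch below uses boto3 (the AWS SDK for Python) to launch and later terminate a single EC2 instance; the region, AMI ID, and instance type are placeholders that would need real values from your own account.

    import boto3

    # Provision a single EC2 instance and terminate it when no longer needed.
    ec2 = boto3.client("ec2", region_name="us-east-1")

    response = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",   # placeholder AMI ID
        InstanceType="t3.micro",
        MinCount=1,
        MaxCount=1,
    )
    instance_id = response["Instances"][0]["InstanceId"]

    # Scale down later by terminating the instance.
    ec2.terminate_instances(InstanceIds=[instance_id])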
*Storage services*
1. Amazon S3 (Simple Storage Service): Amazon S3 is a highly scalable object storage
service provided by AWS. It allows users to store and retrieve any amount of data from
anywhere on the web. S3 provides high durability, availability, and low latency access to
data. It is commonly used for backup and restore, data archiving, content distribution,
and hosting static websites.
2. Azure Blob Storage: Azure Blob Storage is a scalable object storage service in Microsoft
Azure. It offers high availability, durability, and global accessibility for storing large
amounts of unstructured data, such as documents, images, videos, and log files. Blob
Storage provides various storage tiers to optimize costs based on data access patterns.
3. Google Cloud Storage: Google Cloud Storage is a scalable and secure object storage
service in Google Cloud Platform (GCP). It provides a simple and cost-effective solution
for storing and retrieving unstructured data. Google Cloud Storage offers multiple storage
classes, including multi-regional, regional, and nearline, to meet different performance
and cost requirements.
4. IBM Cloud Object Storage: IBM Cloud Object Storage is a scalable and secure storage
service offered by IBM Cloud. It provides durable and highly available storage for storing
large volumes of unstructured data. IBM Cloud Object Storage supports different storage
tiers, data encryption, and integration with other IBM Cloud services.
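As a small illustrative example of object storage access, the sketch below stores and reads back an object in Amazon S3 using boto3; the bucket name and object key are placeholders.

    import boto3

    s3 = boto3.client("s3")

    # Store an object and read it back.
    s3.put_object(Bucket="my-example-bucket", Key="reports/2024.txt",
                  Body=b"hello from the cloud")
    obj = s3.get_object(Bucket="my-example-bucket", Key="reports/2024.txt")
    print(obj["Body"].read())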
*Application Services*
1. AWS Lambda: AWS Lambda is a serverless compute service provided by AWS. It allows
developers to run code without provisioning or managing servers. Lambda functions can
be triggered by various events, such as changes in data, API calls, or scheduled events. It
is commonly used for building event-driven architectures, data processing, and executing
small, self-contained tasks.
2. Azure Functions: Azure Functions is a serverless compute service in Microsoft Azure. It
enables developers to run event-triggered code in a serverless environment. Azure
Functions supports multiple programming languages and integrates with various Azure
services, making it suitable for building event-driven applications, data processing
pipelines, and microservices.
3. Google Cloud Functions: Google Cloud Functions is a serverless compute service in
GCP. It allows developers to write and deploy event-driven functions that automatically
scale based on demand. Cloud Functions can be triggered by various events from Google
Cloud services, HTTP requests, or Pub/Sub messages.
4. IBM Cloud Functions: IBM Cloud Functions is a serverless compute service offered by
IBM Cloud. It allows developers to run event-driven functions in a serverless
environment. IBM Cloud Functions supports multiple programming languages and
integrates with other IBM Cloud services, making it suitable for building serverless
applications and event-driven architectures.
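As a simple illustration of the serverless model, the sketch below shows a minimal AWS Lambda handler: the platform invokes lambda_handler with the triggering event and a runtime context object, and no server is managed by the developer. The event field used here is illustrative, since the actual event shape depends on the trigger.

    import json

    def lambda_handler(event, context):
        name = event.get("name", "world")          # event shape depends on the trigger
        return {
            "statusCode": 200,
            "body": json.dumps({"message": f"Hello, {name}!"}),
        }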
These storage services and application services provided by cloud computing platforms
offer scalable, reliable, and cost-effective solutions for data storage, processing, and
application development. They enable organizations to leverage the benefits of cloud
computing while reducing the burden of managing infrastructure and focusing more on
their core business goals.