Distributed System and Cloud Computing
(TWO YEARS PATTERN)
SEMESTER - III (CBCS)
Unit I
1. Introduction To Distributed Computing Concepts
2. Introduction To Distributed Computing Concepts
Unit II
3. Clock Synchronization
4. Election Algorithms
Unit III
5. Distributed Shared Memory
Unit IV
6. Distributed System Management
7. Distributed System Management
8. Distributed System Management
Unit V
9. Introduction To Cloud Computing
Unit VI
10. Cloud Computing
11. Cloud Platforms
12. Cloud Issues And Challenges
UNIT I
1
INTRODUCTION TO DISTRIBUTED
COMPUTING CONCEPTS
Unit Structure
1.0 Objective
1.1 Introduction
1.2 Types of distributed system
1.2.1 Client Server distributed system
1.2.2 Peer to Peer distributed system
1.3 Distributed system Overview
1.3.1 Advantages of distributed system
1.3.2 Disadvantages of distributed system
1.3.3 Challenges of distributed system
1.4 Designing issues of distributed system
1.5 Distributed system Architecture
1.6 Categories of distributed system
1.7 Distinguish between token-based and non-token-based algorithms
1.8 Summary
1.9 Unit End Exercise
1.0 OBJECTIVE
This chapter will enable you to understand the following concepts:
● What a distributed system is and its types
● Advantages, disadvantages and challenges of distributed systems
● Design issues and architecture of distributed systems
● Categories of distributed systems
● The difference between token-based and non-token-based algorithms
1.1 INTRODUCTION
1.1.1 What Is a Distributed System?
A distributed system consists of many components, possibly spread across
geographic boundaries, that communicate and coordinate through message
passing so that they appear as a single coherent system to actors outside
it.
Now consider the decentralized system, a distributed system in which no
specific component takes the decisions; every component owns its part of
the decision, and none of them has complete information. Hence, the
resulting decision depends upon some sort of consensus between all
components.
A distributed system is also closely related to a parallel system. Both
terms refer to scaling up computational capability, but they achieve this
in different ways. In parallel computing, we use multiple processors on a
single machine to perform multiple tasks simultaneously, possibly with
shared memory. In distributed computing, multiple autonomous machines
with no shared memory communicate through message passing.
2. Consistency:
With this feature, the same information is shared by all nodes
simultaneously, and all nodes see and return the same information. Hence
the nodes should work in synchronization, exchanging messages to keep
their views aligned.
Minor problems can still arise, such as difficulties in passing messages
through the network between the nodes. For example, the delivery of a
message may fail during communication, the message may get lost, or some
nodes may be unavailable at some point.
3. Idempotency:
Idempotency means that when a specific request is executed several times,
the actual event occurs only once, regardless of the number of
repetitions. With an appropriate level of idempotency, the system can
tolerate dropped connections, request errors, and more without bad
consequences.
For example, if after shopping a customer tries to make a payment but
nothing seems to happen, he or she may retry many times. When the system
is idempotent, the payment will be charged only once; with non-idempotent
systems one cannot guarantee the absence of double charges and of users
demanding their money back.
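A minimal sketch of this idea follows (the names process_payment,
charge_card and the request IDs are illustrative, not from the text):
retries that reuse the same request ID are recognized, and the charge
happens only once.

processed = {}  # request_id -> result of the first execution

def charge_card(amount):
    # stand-in for the real, non-repeatable charging operation
    return "charged " + str(amount)

def process_payment(request_id, amount):
    if request_id in processed:          # duplicate retry: no new charge
        return processed[request_id]
    result = charge_card(amount)         # the actual event happens once
    processed[request_id] = result
    return result

first = process_payment("req-001", 100)
retry = process_payment("req-001", 100)  # same ID: returns cached result
assert first == retry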
4. Data durability:
Durability means that once data has been added to the data storage, it
persists, even if some of the system's nodes are offline or have their
data corrupted. It is one of the key concerns of distributed systems.
The level of durability depends on the distributed database used. Some
databases support data durability at the machine/node level, some maintain
it at the cluster level, and a few do not offer this functionality out of
the box.
Data durability plays an important role when developing highly scalable
applications that are able to process millions of events per day.
In the business world, companies or owners often cannot afford data loss,
because the data is crucial, especially in cases dealing with critical
operations and transactions. Hence the developers' aim should be to
provide a high level of data durability and strong guarantees on the data.
Nowadays, most distributed data storage services, e.g.
Cassandra, MongoDB, and DynamoDB, offer durability support at
different levels and can all be configured to ensure data durability at
the cluster level.
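As a hedged illustration (assuming a MongoDB replica set reachable
locally and the pymongo driver; the database and collection names are
made up), a write concern can request cluster-level durability:
w="majority" acknowledges a write only after a majority of the nodes hold
it, and j=True additionally requires it to reach the on-disk journal.

from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://localhost:27017")
orders = client.shop.get_collection(
    "orders", write_concern=WriteConcern(w="majority", j=True))
orders.insert_one({"order_id": 42, "amount": 99.5})  # durable at cluster level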
5. Message Persistence:
In message passing it often happens that, while a message is being
processed, a node through which the message passes goes offline or fails;
there is then a risk of losing the message or part of it. Message
persistence assures that the message is saved and will be processed after
the issue is resolved.
Message persistence is one of the most important characteristics of a
quality application.
When we need to protect the system from losses, consider for example a
messaging or payment system like Uber's, with huge numbers of users and
millions of payments per day: doing this well is very difficult and
requires proven technologies and developers' expertise.
The solution to this challenge can be a messaging system that implements
at-least-once delivery, i.e. one that guarantees each message is delivered
one or more times. In distributed systems, messaging is generally ensured
by a distributed messaging service like RabbitMQ or Kafka, which supports
various levels of reliability in delivering messages and allows successful
app architectures to be built.
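As a minimal sketch (assuming a local RabbitMQ broker and the pika
client; the queue name is illustrative), message persistence is requested
by declaring the queue durable and marking each message persistent, so
the broker keeps the message on disk until it is consumed.

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="payments", durable=True)   # queue survives restarts
channel.basic_publish(
    exchange="",
    routing_key="payments",
    body=b"payment #42",
    properties=pika.BasicProperties(delivery_mode=2),   # persistent message
)
connection.close()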
● A typical application of distributed computing divides a large problem
into smaller parts so that the tasks can be performed on multiple
machines at the same time. This increases the performance of many
complex workloads, like matrix multiplications.
● Data can easily be transferred from one node to another, as all the
nodes are connected to each other in a distributed system.
● Nodes can easily be added to the distributed system at any point of
time, i.e. it can be scaled as required.
● If one node fails, the remaining nodes in the distributed system can
still communicate with each other; the entire system will not break
down.
● Printers and scanners can be shared among multiple nodes rather than
being restricted to just one.
Often in enterprise applications, we require multiple operations to happen
at the same time under a single transaction; for example, we may need to
make several updates as a single unit. This becomes quite complex when we
distribute data over a cluster of nodes. Many systems do provide
transaction-like semantics in a distributed environment using complex
protocols like Paxos and Raft.
1.5 DISTRIBUTED SYSTEM ARCHITECTURE
The architecture of a distributed system depends on the use case; it shows
how the data flows through the system. However, some general patterns
cover most cases. The core distribution models on which architectures are
based are the following:
1. Minicomputer Model:
2. Workstation Model:
3. Workstation–Server Model:
● In this model, a user logs onto his or her home workstation. Normal
algorithmic or numeric activities required by the user's processes are
performed at the home workstation, but requests for special services
are sent to the appropriate server, which performs the user's requested
activity and returns the result of the request processing to the user's
workstation.
● Therefore, in this model, the user's processes need not be migrated to
the server machines to get the work done by those machines.
4. Processor–Pool Model:
● The processor-pool model is based on the observation that most of the
time a user does not need any computing power, but once in a while may
need a very large amount of computing power for a short time. It thus
aims at better utilization of resources.
● Whereas the workstation-server model allocates a processor to each user
for his or her tasks, in the processor-pool model the processors are
pooled together and the resources are shared by the users as and when
needed.
● The pool of processors consists of a large number of microcomputers &
minicomputers attached to the network.
● In this model, every processor in the pool has its own memory to load
and run a system program or an application program of the distributed
computing system.
● No home machine is present, and the user does not log onto any
particular machine in this model.
● Better utilization of processing power & greater flexibility are the
highlighting advantages of this model.
● Example: Amoeba & the Cambridge Distributed Computing System.
5. Hybrid Model:
● The workstation-server model suits a large number of computer users
who only perform simple interactive tasks & execute small programs.
● The processor-pool model is more attractive & suitable for users or
groups of users who need to perform massive computations.
● The hybrid model combines the features of the workstation-server &
processor-pool models and can be used to build a distributed system.
● Processors can be allocated dynamically for computations that are too
large or require several computers for execution.
● This model assures that interactive jobs are processed at the user's
local workstation in the hybrid model.
A distributed system is a system in which components are situated at
distinct places; these distinct places refer to networked computers which
can easily communicate and coordinate their tasks by exchanging messages
with each other. These components communicate with each other to achieve
one common goal as a task.
Many algorithms are used to achieve coordination in distributed computing,
and these are broadly divided into 2 categories: token-based algorithms
and non-token-based algorithms.
1.8 SUMMARY
A distributed system consists of many components, possibly spread across
geographic boundaries, that communicate and coordinate through message
passing among themselves and with actors outside the system. Distributed
software systems can be made up of machines with a lower level of
availability: to develop an application with 99.99% (four nines)
availability, you can use machines/nodes that individually offer lower
availability. The architecture of a distributed system depends on the use
case; it shows the flow of the data, but some general patterns cover most
cases. A distributed system is a system in which components situated at
distinct places, i.e. on networked computers, communicate and coordinate
their tasks by exchanging messages with each other in order to achieve one
common goal as a task.
*****
2
INTRODUCTION TO DISTRIBUTED
COMPUTING CONCEPTS
Unit Structure
2.0 Objective
2.1 Introduction
2.2 Modes of Interprocess Communication
2.2.1 Shared memory
2.2.2 Message Passing
2.2.3 Synchronization in interprocess communication
2.3 Approaches to Interprocess communication
2.4 Group Communication in distributed system
2.5 RPC in Distributed system
2.5.1 Characteristics of RPC
2.5.2 Features of RPC
2.6 Types of RPC
2.7 Architecture of RPC
2.8 Advantages & Disadvantages of RPC
2.9 Remote Method Invocation
2.10 Summary
2.11 Unit End Exercise
2.0 OBJECTIVE
This chapter will enable you to understand the following concepts:
● What is IPC (Inter-Process Communication) and its need
● Different modes of IPC
● Different approaches of IPC
● Group communication overview with its types
● RPC and its working
● Remote Method Invocation
2.1 INTRODUCTION
What Is IPC?
To exchange data, cooperating processes need to communicate with each
other; this exchange between processes is called inter-process
communication (IPC). It is the mechanism by which processes communicate.
Inter-process communication (interprocess communication, IPC) refers,
especially in operating systems and computer science, to the mechanisms
an operating system provides to allow processes to share data or
resources and to manage shared data. Typically, applications using IPC
follow a client-server relationship, in which the client asks for
resources from the server and the server replies to the client's request.
IPC is a very important factor in the design of microkernels and
nano-kernels, which reduce the number of functionalities provided by the
kernel. Those functionalities are then obtained by communicating with
servers via IPC, leading to a large increase in communication when
compared to a regular monolithic kernel. IPC mechanisms rely on protocols
agreed between the communicating processes to ensure compatibility. An
IPC mechanism is either synchronous or asynchronous.
In the shared memory model of IPC, message sharing or information
exchange between process A and process B is done with the help of a
shared memory segment in the shared memory region.
By default, the operating system prevents processes from accessing
another process's memory. The shared memory model requires processes to
agree to remove this restriction. Besides, as shared memory is
established by agreement between processes, the processes are also
responsible for synchronization, so that both processes are not writing
to the same location at the same time.
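A minimal sketch of this agreement (using Python 3.8+
multiprocessing.shared_memory; shown in one process for brevity, though
the segment name would normally be passed to a second process):

from multiprocessing import shared_memory

# "Process A": create a named segment and write into it
segment = shared_memory.SharedMemory(create=True, size=32)
segment.buf[:5] = b"hello"

# "Process B": attach to the same segment by its name and read
view = shared_memory.SharedMemory(name=segment.name)
print(bytes(view.buf[:5]))  # b'hello'

view.close()
segment.close()
segment.unlink()  # release the segment once both sides are done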
● Semaphore:
A semaphore is a variable that controls access to a common resource
demanded by multiple processes at the same time. The main categories of
semaphores are binary semaphores and counting semaphores.
● Mutual Exclusion:
Mutual exclusion is the requirement that only one thread may enter its
critical section at a time. This technique is useful for preventing race
conditions, and it also helps with synchronization.
● Barrier:
As the name implies, a barrier does not allow any individual process to
proceed until all the processes reach it. Many parallel languages and
collective routines impose barriers.
● Spinlock:
A spinlock is one type of lock. Processes trying to acquire a spinlock
wait in a loop, repeatedly checking its availability. This is known as
busy waiting, because the process is not doing any useful operation even
though it is active.
● Socket:
The endpoint for sending or receiving data in a network is called a
socket. This is true both for data sent between processes on the same
computer and for data sent between different computers on the same
network. Most operating systems use sockets for interprocess
communication.
● File:
A file is a set of data records stored on a disk or acquired on demand
from a file server. Multiple processes can access a file as required.
Files can be used for data storage in all operating systems.
● Signal:
Signals are a limited form of interprocess communication in which
messages are sent from one process to another. Normally, signals are not
used to transfer data but to deliver remote commands between processes.
● Shared Memory:
Shared memory lets multiple processes share resources at the same time,
so all processes can communicate with one another in the easiest way.
All POSIX systems, as well as Windows operating systems, use shared
memory.
● Message Queue:
Multiple processes can read and write data to a message queue without
being directly connected to each other. Messages are stored in the queue
until their recipient retrieves them. Message queues are quite useful for
interprocess communication and are used by most operating systems; a
minimal sketch follows this list.
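The sketch below uses Python's multiprocessing.Queue: a producer writes
to the queue and a consumer reads from it, with no direct connection
between the two processes; messages wait in the queue until retrieved.

from multiprocessing import Process, Queue

def producer(q):
    for i in range(3):
        q.put("message %d" % i)   # stored until the recipient retrieves it
    q.put(None)                   # sentinel: nothing more to send

def consumer(q):
    while True:
        msg = q.get()
        if msg is None:
            break
        print("received:", msg)

if __name__ == "__main__":
    q = Queue()
    Process(target=producer, args=(q,)).start()
    Process(target=consumer, args=(q,)).start()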
Multicast Communication:
When a specific host process needs to communicate with a group of
processes in a distributed system at the same time, this is called
multicast communication. The implementation must address the problems of
a high workload on the host system and of redundant information reaching
processes in the system. Multicast mostly decreases the time taken for
message handling.
Unicast Communication:
When a single process wants to communicate with the host process in a
distributed system at one time, this is called unicast communication. As
the name implies, it deals with a single process; the same information
would have to be passed separately to reach multiple processes. Since the
specific process is addressed individually, unicast is best suited for
communication between two processes, but it leads to overhead, as the
exact process must be located before information/data can be exchanged.
A socket-level sketch of the two modes follows.
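The sketch below contrasts the two modes at the UDP socket level (the
group address and port are illustrative): a unicast sendto() reaches
exactly one receiver, while a sendto() addressed to a multicast group
reaches every process that has joined the group.

import socket
import struct

GROUP, PORT = "224.1.1.1", 5007

# Receiver: join the multicast group, then receive like any UDP socket.
recv_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
recv_sock.bind(("", PORT))
membership = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
recv_sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership)

# Sender: one sendto() to the group address delivers to all members;
# replacing GROUP with a single host's IP would make this a unicast send.
send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 2)
send_sock.sendto(b"hello group", (GROUP, PORT))

data, addr = recv_sock.recvfrom(1024)  # every joined receiver gets the message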
Notifying members of group membership changes: if a process is added to
the group or excluded from the group, the service indicates this to the
other group members.
Performing group address expansion: when a process multicasts a message,
it supplies the group identifier rather than a list of the processes in
the group.
1. Callback RPC:
Callback RPC enables a P2P paradigm between the participating processes.
With this, both client and server can act as a service.
2. Broadcast RPC:
In broadcast RPC, the client's request is broadcast on the network, and
the request is processed by all servers on the broadcast network.
3. Batch-mode RPC:
Batch-mode RPC helps queue separate RPC requests, on the client side, in
a transmission buffer, and then sends them over the network to the server
in one batch.
RPC Architecture
How RPC Works?:
Step 1) The client, the client stub, and one instance of the RPC runtime
execute on the client machine.
Step 2) The client starts the client stub by passing parameters to it.
The client stub stores the parameters within the client's own
address space and asks the local RPC runtime to send them to the
server stub.
Step 3) The user invokes the RPC as a regular local procedure call. The
RPC runtime manages the transmission of messages across the
network between client and server; it also performs
acknowledgment, routing, retransmission, and encryption.
Step 4) After the server procedure completes, control returns to the
server stub, which packs the returned values into a message and
hands that message to the transport layer.
Step 5) The transport layer sends the resulting message back to the
client transport layer, which returns it to the client stub.
Step 6) The client stub unpacks the return parameters from the resulting
packet, and execution returns to the caller.
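As a minimal sketch of this flow (using Python's built-in xmlrpc modules;
the procedure add and the port are illustrative), the client calls the
remote procedure exactly like a local call, while the stubs and the RPC
runtime handle marshalling and network transmission behind the scenes:

from xmlrpc.server import SimpleXMLRPCServer

def add(x, y):                      # the remote procedure
    return x + y

server = SimpleXMLRPCServer(("localhost", 8000))
server.register_function(add, "add")
server.serve_forever()              # run this in the server process

# --- client side, in a separate process ---
# import xmlrpc.client
# proxy = xmlrpc.client.ServerProxy("http://localhost:8000")
# print(proxy.add(2, 3))            # looks like a local call; prints 5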
Disadvantages of RPC:
● In a remote procedure call, parameters are passed by value; passing
pointers is not allowed in RPC.
● The time required for a remote procedure call is significantly higher
than that for a local procedure call.
● The probability of failure is high, as an RPC involves a communication
system, another machine, and another process.
● There are many different ways of implementing RPC, hence it is not
standardized.
● Flexibility is not offered in RPC for hardware architecture, as it is
mostly interaction-based.
● A remote procedure call increases the cost of the process.
RMI Registry:
The RMI registry is a namespace in which all server objects are placed.
Each time the server creates an object, it registers this object with the
RMI registry (using the bind() or rebind() methods). Objects are
registered using a unique name known as the bind name.
To invoke a remote object, the client needs a reference to that object.
The client then fetches the object from the registry using its bind name
(using the lookup() method); a sketch of this bind/lookup pattern follows.
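The sketch below is written in Python purely as an analogy to the RMI
registry (it is not Java's actual API): servers bind objects under a
unique name, and clients look them up by that name to obtain a reference.

class Registry:
    def __init__(self):
        self._objects = {}

    def bind(self, name, obj):
        if name in self._objects:
            raise KeyError("name already bound: " + name)
        self._objects[name] = obj

    def rebind(self, name, obj):
        self._objects[name] = obj        # replaces any existing binding

    def lookup(self, name):
        return self._objects[name]       # client obtains a reference

registry = Registry()
registry.bind("BankAccount", object())   # server side: register by bind name
account = registry.lookup("BankAccount") # client side: fetch by bind name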
2.10 SUMMARY
To exchange data, cooperating processes need to communicate with each
other; this exchange between processes is called inter-process
communication (IPC), the mechanism by which processes communicate. In the
shared memory model, the communicating processes establish and run
through a shared memory region. The message passing mechanism provides an
alternative means of communication for processes. In this mode, processes
interact with each other through messages with assistance from the
underlying operating system. In group communication all the servers are
connected to each other, so a message sent to a group is delivered to
every server of the group. Group communication is possible in both
environments, and multicast message processing is reliable and ordered.
RMI stands for Remote Method Invocation. It allows an object residing in
one system (JVM) to access/invoke an object running on another JVM. In
the case of primitive types, the parameters are put together and a header
is attached to them; if the parameters are objects, they are serialized.
This process is known as marshalling. At the server side, the packed
parameters are unbundled and then the required method is invoked. This
process is known as unmarshalling.
*****
UNIT II
3
CLOCK SYNCHRONIZATION
Unit Structure
3.1 Introduction
3.2 Clock Synchronization
3.2.1 How Computer Clocks Are Implemented
3.2.2 Drifting of Clocks
3.2.3 Clock Synchronization Issues
3.2.4 Clock Synchronization Algorithms
3.3 Mutual Exclusion
3.3.1 Centralized Approach
3.3.2 Distributed Approach
3.3.3 Token-Passing Approach
3.4 Reference
3.1 INTRODUCTION
A distributed system is a collection of distinct processes that are spatially
separated and run concurrently. In systems with multiple concurrent
processes, it is economical to share the hardware or software resources
among the concurrently executing processes. In such situations, sharing
may be cooperative or competitive. Since the number of available
resources in a computing system is restricted, one process must
necessarily influence the action of other concurrently executing processes
as it competes for resources. For example, for a resource such as a tape
drive that cannot be used simultaneously by multiple processes, a process
willing to use it must wait if it is in use by another process. This chapter
presents synchronization mechanisms that are suitable for distributed
systems.
Figure 3.1
To make the computer clock function as an ordinary clock used by us in
our day-to-day life, the following things are done:
1. The value in the constant register is chosen so that 60 clock ticks
occur in a second.
2. The computer clock is synchronized with real time (external clock).
For this, two more values are stored in the system: a fixed starting date
and time and the number of ticks since then. For example, in UNIX, time begins
at 0000 on January 1, 1970. At the time of initial booting, the system
asks the operator to enter the current date and time. The system
converts the entered value to the number of ticks after the fixed
starting date and time. At every clock tick, the interrupt service
routine increments the value of the number of ticks to keep the clock
running.
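A minimal sketch of this bookkeeping (assuming 60 clock ticks per second
and the UNIX starting point of 0000 on January 1, 1970): the
operator-entered date/time is converted into a number of ticks, which the
interrupt service routine then increments.

from datetime import datetime, timezone

TICKS_PER_SECOND = 60
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def ticks_since_epoch(entered):
    """Convert an entered date/time to ticks after the fixed start."""
    return int((entered - EPOCH).total_seconds()) * TICKS_PER_SECOND

ticks = ticks_since_epoch(datetime(2024, 1, 1, tzinfo=timezone.utc))
# at every clock tick the interrupt service routine does: ticks += 1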
Clock synchronization requires each node to read the other nodes' clock
values. The actual mechanism used by a node to read other clocks differs
from one algorithm to another. However, regardless of the actual reading
mechanism, a node can obtain only an approximate view of its clock skew
with respect to other nodes' clocks in the system. Errors occur mainly
because of unpredictable communication delays during message passing
used to deliver a clock signal or a clock message from one node to
another. A minimum value of the unpredictable communication delays
between two nodes can be computed by counting the time needed to
prepare, transmit, and receive an empty message in the absence of
transmission errors and any other system load. However, in general, it is
practically impossible to calculate the upper bound of this value because it
depends on the amount of communication and computation going on in
parallel in the system, on the possibility that transmission errors will cause
messages to be transmitted several times, and on other random events,
such as page faults, process switches, or the establishment of new
communication routes.
Centralized Algorithms:
Distributed Algorithms:
1. Mutual exclusion:
At any time only one process should access the shared resource. That is, a
process that has been granted the resource must release it before it can be
granted to another process.
2. No starvation:
If every process that is granted the resource eventually releases it, every
request must be eventually granted.
1. Process failure:
A process failure in the system causes the logical ring to break. In such
a situation, a new logical ring must be established to ensure the continued
circulation of the token among other processes. This requires detection of
a failed process and dynamic reconfiguration of the logical ring when a
failed process is detected or when a failed process recovers after failure.
A failed process can easily be detected by making it a rule that a process
receiving the token from its neighbor always sends an acknowledgment
message to that neighbor. With this rule, a process detects that its
neighbor has failed when it sends the token to it but does not receive the
acknowledgment message within a fixed time period.
On the other hand, dynamic reconfiguration of the logical ring can be done
by maintaining the current ring configuration at each process. When a
process detects that its neighbor has failed, it removes the failed process
from the group by skipping it and passing the token to the next alive
process in the sequence (a minimal sketch follows). When a process becomes
alive after recovery, it simply informs its previous neighbor in the ring
so that it gets the token during the next round of circulation.
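The sketch below (data structures are illustrative) shows the
reconfiguration rule: when passing the token, skip any neighbor detected
as failed and hand it to the next alive process in the ring.

def next_alive_neighbor(ring, current, alive):
    """ring: process ids in ring order; alive: id -> True/False."""
    n = len(ring)
    i = (ring.index(current) + 1) % n
    while not alive[ring[i]]:        # skip failed processes
        i = (i + 1) % n
    return ring[i]

ring = ["P1", "P2", "P3", "P4"]
alive = {"P1": True, "P2": False, "P3": True, "P4": True}
print(next_alive_neighbor(ring, "P1", alive))  # 'P3': failed P2 is skipped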
2. Lost token:
If the token is lost, a new token must be generated. The algorithm must
have mechanisms to detect and regenerate a lost token. One method to
solve this problem is to designate one of the processes on the ring as a
"monitor" process. The monitor process periodically circulates a "who
has the token?" message on the ring. This message rotates around the
ring from one process to another.
All processes simply pass this message to their neighbor process, except
the process that has the token when it receives this message. This process
writes its identifier in a special field of the message before passing it to its
neighbor. After one complete round, when the message returns to the
monitor process it checks the special field of the message. If there is no
entry in this field, it concludes that the token has been lost, generates a
new token, and circulates it around the ring.
There are two problems associated with this method: the monitor process
may itself fail, and the "who has the token?" message may itself get
lost. Both problems may be solved by using more than one monitor process.
Each monitor process independently checks the availability of
the token on the ring. However, when a monitor process detects that the
token is lost, it holds an election with other monitor processes to decide
which monitor process will generate and circulate a new token. An
election is required to prevent the generation of multiple tokens that may
happen when each monitor process independently detects that the token is
lost, and each one generates a new token.
3.4 REFERENCE
Pradeep K. Sinha, Distributed Operating System: Concepts and
Design, PHI Learning.
*****
4
ELECTION ALGORITHMS
Unit Structure
4.1 Deadlock Introduction
4.1.1 Necessary Conditions for Deadlock
4.1.2 Deadlock Modeling
4.1.3 Deadlock Prevention
4.2 Election Algorithms
4.2.1 The Bully Algorithm
4.2.2 A Ring Algorithm
4.2.3 Discussion of the Election Algorithms
4.3 Summary
4.4 Reference
4.1 DEADLOCK
In the previous section we saw that there are several resources in a system
for which the resource allocation policy must ensure exclusive access by a
process. A system consists of a finite number of units of each resource
type, for example, three printers, six tape drives, four disk drives, two
CPUs, etc. When multiple concurrent processes have to compete to use a
resource, the sequence of events required for a process to use a resource
is as follows:
1. Request:
The process first makes a request for the resource. If the requested
resource is not available, possibly because it is being used by another
process, the requesting process must wait until the requested resource is
allocated to it by the system.
Note that if the system has multiple units of the requested resource type,
the allocation of any unit of the type will satisfy the request. A process
may request as many units of a resource as it requires, with the
restriction that the number of units requested may not exceed the total
number of available units of the resource.
2. Allocate:
The system allocates the resource to the requesting process as soon as
possible. It maintains a table in which it keeps a record for each
resource of whether it is free or allocated and, if it is allocated, to
which process it is allocated. If the requested resource is currently
allocated to another process, the requesting process is added to a queue
of processes waiting for this resource. Once the system allocates the
resource to the requesting process, that process can exclusively use the
resource by operating on it.
3. Release:
Once the process has finished using the allocated resource, it releases the
resource to the system. The system table records are updated at the time of
allocation and release to reflect the current status of availability of
resources.
Requests and releases of resources are made through system calls, such as
request and release for devices, open and close for files, and allocate and
free for memory space. Notice that of the three operations, allocate is the
only operation that the system can control. The other two operations are
initiated by a process. If the total request made by multiple concurrent
processes for resources of a certain type exceeds the available amount,
some way is required to order the assignment of resources in time. Extra
care must be taken that the strategy applied cannot cause a deadlock, that
is, a situation where a set of processes are blocked because each process is
holding a resource and waiting for another resource acquired by some
other process.
It may happen that some processes that have entered the waiting state
(because the requested resources were not available at the time of request)
will never again change state, because the resources they have requested
are held by other waiting processes. This situation is called deadlock, and
the processes involved are said to be deadlocked. Hence, deadlock is the
state of permanent blocking of a set of processes each of which is waiting
for an event that only another process in the set can cause.
A deadlock situation can be explained with the help of an example given
below. Suppose that a system has two processes P1 and P2 & two tape
drives T1 and T2; resource allocation strategy is such that a requested
resource is immediately allocated to the requester if the resource is free.
Suppose that two concurrent processes P1 and P2 make requests for the
tape drives in the following order:
1. P1 requests for one tape drive and the system allocates tape drive T1 to
it.
2. P2 requests for one tape drive and the system allocates tape drive T2 to
it.
3. Now P1 requests for one more tape drive and enters a waiting state
because no tape drive is currently available.
4. Now P2 requests for one more tape drive and it also enters a waiting
state because no tape drive is currently available.
Now onwards, P1 and P2 will wait for each other indefinitely, since P1
will not release T1 until it gets T2 to carry out its designated task, that is,
not until P2 has released T2, whereas P2 will not release T2 until it gets
T1. Therefore, the two processes are in a state of deadlock. Note that the
requests made by the two processes are totally legal because each is
requesting only two tape drives, which is the total number of tape
drives available in the system. However, the deadlock problem occurs
because the total requests of both processes exceed the total number of
units for the tape drive and the resource allocation policy is such that it
immediately allocates a resource on request if the resource is free.
In the context of deadlocks, the term "resource" applies not only to
physical objects such as tape and disk drives, printers, CPU cycles, and
memory space but also to logical objects such as a locked record in a
database, files, tables, semaphores, and monitors. However, these
resources should permit only exclusive use by a single process at a time
and should be nonpreemptable. A nonpreemptable resource is one that
cannot be taken away from a process to which it was allocated until the
process voluntarily releases it. If taken away, it has ill effects on the
computation already performed by the process. For example, a printer is a
nonpreemptable resource because taking the printer away from a process
that has started printing but has not yet completed its printing job and
giving it to another process may produce printed output that contains a
mixture of the output of the two processes. This is certainly unacceptable.
1. Mutual-exclusion:
If a resource is held by a process, any other process requesting for that
resource must wait until the resource has been released.
2. Hold-and-wait:
Processes can request for new resources without releasing the resources
that they are currently holding.
3. No-preemption:
A resource that has been allocated to a process becomes available for
allocation to another process only after it has been voluntarily released by
the process holding it.
4. Circular-wait:
Two or more processes must form a circular chain in which each process
is waiting for a resource that is held by another process.
A set of processes are waiting for each other in a circular fashion. For
example, let’s say there are a set of processes {P0, P1, P2, P3} such
that P0 depends on P1, P1 depends on P2, P2 depends on P3 and P3
depends on P0. This creates a circular relation between all these processes,
and they have to wait forever to be executed.
All four conditions must hold simultaneously in a system for a deadlock to
occur. If any one of them is absent, no deadlock can occur. Notice that the
four conditions are not completely independent, because the circular-wait
condition implies the hold-and-wait condition. Although these four
conditions are somewhat interrelated, it is quite useful to consider them
separately to devise methods for deadlock prevention.
1. Directed graph:
A directed graph is a pair (N, E), where N is a nonempty set of nodes and
E is a set of directed edges. A directed edge is an ordered pair (a, b),
where a and b are nodes in N.
2. Path:
A path is a sequence of nodes (a, b, c, ......i, j) of a directed graph such that
(a, b), (b, c), ... , (i, j) are directed edges. Obviously, a path contains at
least two nodes.
3. Cycle:
A cycle is a path whose first and last nodes are the same.
4. Reachable set:
The reachable set of a node a is the set of all nodes b such that a path
exists from a to b.
5. Knot:
A knot is a nonempty set K of nodes such that the reachable set of each
node in K is exactly the set K. A knot always contains one or more cycles.
An example of a directed graph is shown in Figure 4.2.
The graph has a set of nodes {a, b, c, d, e, f} and a set of directed edges
{(a, b), (b, c), (c, d), (d, e), (e, f), (f, a), (e, b)}. It has two cycles,
(a, b, c, d, e, f, a) and (b, c, d, e, b). It also has a knot
{a, b, c, d, e, f} that contains the two cycles of the graph. A minimal
cycle-detection sketch follows.
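The sketch below detects cycles in such a directed graph with a
depth-first search that tracks the nodes on the current path; the same
check underlies deadlock detection in resource allocation graphs.

def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])

    WHITE, GRAY, BLACK = 0, 1, 2         # unvisited / on path / finished
    color = {node: WHITE for node in graph}

    def dfs(node):
        color[node] = GRAY
        for nxt in graph[node]:
            if color[nxt] == GRAY:       # back edge: a cycle exists
                return True
            if color[nxt] == WHITE and dfs(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and dfs(n) for n in list(graph))

edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"),
         ("e", "f"), ("f", "a"), ("e", "b")]
print(has_cycle(edges))  # True: the graph above contains cycles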
1. Process nodes:
A process node represents a process of the system. In a resource allocation
graph, it is normally shown as a circle, with the name of the process
written inside the circle (nodes P1, P2, and P3 of Fig. 4.3)
2. Resource nodes:
A resource node represents a resource of the system. In a resource
allocation graph, it is normally shown as a rectangle with the name of the
resource written inside the rectangle. Since a resource type R may have
more than one unit in the system, each such unit is represented as a bullet
within the rectangle. For instance, in the resource allocation graph of
Figure 4.3, there are two units of resource R1, one unit of R2, and three
units of R3.
3. Assignment edges:
An assignment edge is a directed edge from a resource node to a process
node. It signifies that the resource is currently held by the process. For
multiple units of a resource type, the tail of an assignment edge touches
one of the bullets in the rectangle to indicate that only one unit of the
resource is held by that process. Edges (R1, P1), (R1, P3), and (R2, P2)
are the three assignment edges in the resource allocation graph of Figure
4.3.
4. Request edges:
A request edge is a directed edge from a process node to a resource node.
It signifies that the process made a request for a unit of the resource type
and is currently waiting for that resource. Edges (P1, R2) and (P2, R1 ) are
the two request edges in the resource allocation graph of Figure 4.3.
Figure 4.4
3. Eliminate No Preemption:
Preempt resources from the process when resources are required by other
high priority processes.
4. Eliminate Circular Wait:
Each resource is assigned a numerical value, and a process has to access
resources in increasing (or decreasing) order. For example, if process P1
has been allocated resource R5, then a request by P1 for R4 or R3, which
are numbered lower than R5, will not be granted; only requests for
resources numbered greater than R5 will be granted. A minimal sketch of
this ordering discipline follows.
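The sketch below (lock numbers are illustrative) shows this discipline
with ordinary locks: every process acquires the resources it needs
strictly in increasing numeric order, so no circular chain of waits can
form.

import threading

locks = {3: threading.Lock(), 4: threading.Lock(), 5: threading.Lock()}

def acquire_in_order(needed):
    """Acquire the requested locks strictly in increasing numeric order."""
    for num in sorted(needed):
        locks[num].acquire()

def release_all(needed):
    for num in sorted(needed, reverse=True):
        locks[num].release()

acquire_in_order({5, 3})   # always takes R3 before R5, never the reverse
release_all({5, 3})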
Distributed system:
A distributed system is a collection of independent computers that do not
share memory. Each processor has its own memory, and the processors
communicate via communication networks. Communication in networks is
implemented by a process on one machine communicating with a process on
another machine.
A distributed algorithm is an algorithm that runs on a distributed system.
Several distributed algorithms require that there be a coordinator process
in the entire system that performs some type of coordination activity
needed for the smooth running of other processes in the system. Since all
other processes in the system must interact with the coordinator, they all
must unanimously agree on who the coordinator is. Furthermore, if the
coordinator process fails due to the failure of the site on which it is
located, a new coordinator process must be elected to take up the job of
the failed coordinator.
Two such election algorithms are described below. In the description of
both algorithms we will assume that there is only one process on each node
of the distributed system.
Now we will see the working of this algorithm with the help of an example.
Suppose the system consists of five processes P1, P2, P3, P4 and P5, and
their priority numbers are 1, 2, 3, 4, and 5 respectively. Also assume that
at an instant of time the system is in a state in which P2 has crashed, and
P1, P3, P4 and P5 are active. Starting from this state, the functioning of
the bully algorithm with the changing system states is illustrated below.
1. P5 is the coordinator in the starting state. Suppose P5 crashes.
2. Process P3 sends a request message to P5 and does not receive a reply
within the fixed time period.
3. Process P3 assumes that P5 has crashed and it initiates an election by
sending an election message to P4 and P5. An election message is sent
only to processes with higher priority numbers.
4. When P4 receives P3's election message, it sends an alive message to
P3, informing that it is alive and will take over the election activity.
Process P5 cannot respond to P3's election message because it is down.
5. Now P4 holds an election by sending an election message to P5.
6. Process P5 does not respond to P4's election message because it is
down; therefore, P4 wins the election and sends a coordinator
message to P1, P2 and P3, informing them that from now onwards it is
the new coordinator. Obviously, this message is not received by P2
because it is currently down.
7. Now suppose P2 recovers from failure and initiates an election by
sending an election message to P3, P4 and P5. Since P2's priority
number is lower than that of P4 (current coordinator), P4 will win the
election initiated by P2 and will continue to be the coordinator.
8. Finally, suppose P5 recovers from failure. Since P5 is the process with
the highest priority number, it simply sends a coordinator message to
P1, P2, P3 and P4 and becomes the new coordinator.
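A minimal sketch of the algorithm's core rule (a simplified,
single-process simulation rather than the full message-passing protocol):
the highest-priority process that is alive wins the election.

def bully_election(initiator, alive):
    """alive: dict mapping priority number -> True/False."""
    higher = [p for p in alive if p > initiator and alive[p]]
    if not higher:
        return initiator          # no higher process answered: initiator wins
    return max(higher)            # the highest alive process takes over

alive = {1: True, 2: False, 3: True, 4: True, 5: False}
print(bully_election(3, alive))   # 4, matching the example above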
coordinator messages. Hence, in the best case, the bully algorithm requires
only n-2 messages.
On the other hand, in the ring algorithm, irrespective of which process
detects the failure of the coordinator and initiates an election, an
election always requires 2(n-1) messages (assuming that only the
coordinator process has failed); n-1 messages are needed for one round
rotation of the election message, and another n-1 messages are needed for
one round rotation of the coordinator message.
Next let us consider the complexity involved in the recovery of a process.
In the bully algorithm, a failed process must initiate an election on
recovery. Therefore, once again depending on the priority number of the
process that initiates the recovery action, the bully algorithm requires
O(n²) messages in the worst case, and n-1 messages in the best case.
On the other hand, in the ring algorithm, a failed process does not initiate
an election on recovery but simply searches for the current coordinator.
Hence, the ring algorithm requires only n/2 messages on average for
recovery action.
In conclusion, as compared to the bully algorithm, the ring algorithm is
more efficient and easier to implement.
4.3 SUMMARY
Several distributed algorithms require that there should be a coordinator
process in the entire system. Election algorithms are meant for electing a
coordinator process from among the currently running processes. We
discussed two election algorithms in this chapter, i.e. the bully algorithm
and the ring algorithm.
4.4 REFERENCE
Pradeep K. Sinha, Distributed Operating System: Concepts and
Design, PHI Learning.
*****
UNIT III
5
DISTRIBUTED SHARED MEMORY
Unit Structure
5.1 Introduction of Distributed Shared Memory
5.2 Fundamentals concept of DSM
5.2.1 Architecture of DSM
5.2.2 Advantages of DSM
5.2.3 Disadvantages of DSM
5.2.4 Types of Distributed Shared Memory (DSM)
5.2.5 Various hardware DSM systems
5.2.6 Consistency Models
5.2.7 Issues in designing and implementing DSM systems
5.3 Summary
5.4 Reference for further reading
5.5 Bibliography
5.6 Further Reading topics
5.7 Questions
5.1 INTRODUCTION
DSM stands for Distributed Shared Memory. It is a form of memory
architecture where "shared" means that the address space of the memory is
shared. DSM is a mechanism allowing user processes to access shared data
without using inter-process communication. DSM refers to the shared
memory paradigm applied to loosely coupled distributed memory systems.
The shared memory exists only virtually, similar to virtual memory, so it
is sometimes also called DSVM (Distributed Shared Virtual Memory). It
provides a virtual address space shared among processes on loosely
coupled processors.
It is basically an abstraction that integrates the local memory of
different machines into a single logical entity shared by cooperating
processes.
In DSM, every node has its own memory, provides memory read and write
services, and provides consistency protocols. Distributed shared memory
(DSM) implements the shared memory model in a distributed system that has
no physical shared memory. All the nodes share the virtual address space
provided by the shared memory model.
Distributed Shared Memory:
Comparison of distributed shared memory with message passing:

Message passing                         Distributed shared memory
Variables must be marshalled.           Variables are shared directly.
Cost of communication is visible.       Cost of communication is invisible.
Processes are protected by having       A process could cause an error by
private address spaces.                 altering shared data.
Processes execute at the same time.     Processes may execute with
                                        non-overlapping lifetimes.
In simple language, we can say the nodes communicate through a managing
application that coordinates shared memory access among them.
5.2.4 Types of Distributed Shared Memory (DSM)
1) On-Chip Memory
2) Bus-Based Multiprocessors
3) Ring-Based Multiprocessors
4) Switched Multiprocessors
2) Bus-Based Multiprocessors:
A bus-based multiprocessor is a type of multiprocessor system in which
multiple processors (CPUs) are connected to a single shared communication
bus, which is used to access shared memory or to communicate with other
devices. This architecture is commonly used in small-scale multiprocessor
systems due to its simplicity and cost-effectiveness.
● A set of parallel wires called the bus acts as the connection between
the CPUs and memory.
● Network traffic is reduced by using a cache with each CPU.
● Algorithms are used to prevent two CPUs from trying to access the same
memory simultaneously.
● Because it uses a single bus, the bus can become overloaded.
Use case: small-scale multiprocessor systems where the number of
processors is limited (e.g., 2-8 processors).
Advantages: simplicity, cost-effectiveness, ease of programming.
Disadvantages: scalability issues; contention (multiple processors
competing for the bus can lead to delays); latency.
3) Ring-Based Multiprocessors:
● A single address space is partitioned into a private area and a shared
area.
● All nodes are connected via a token passing ring.
● The shared area is divided into 32-byte blocks.
● There is no global centralized memory present in ring-based
multiprocessors.
● Ring interconnect: a circular connection structure linking all
processors in the system.
● Communication mechanism: data travels along the ring, passing through
intermediate processors to reach its destination.
5.2.5 Various hardware DSM systems:
Distributed shared memory (DSM) can be implemented in software or in
hardware.
Software implementation of distributed shared memory is comparatively
easy: it uses message passing within the same cluster, and no hardware
changes are needed to apply software distributed shared memory.
In this diagram,
CPU-Central Processing Unit
Mm-Memory Management
NIC-Network Interface Card
5.2.6 Consistency Models:
Notations:
6) Processor Consistency Model:

5.2.7 Issues in designing and implementing DSM systems:
1) Granularity:
Granularity refers to the block size of the shared memory, i.e. the unit
in which data is shared and transferred across the network. When data is
shared and transferred across the network, block faults occur, and the
block may be a word or a page. A chunk of memory contains the words, and
the issue is choosing the size of the chunk.
5) Replacement Strategy:
Under this strategy, an existing data block must be replaced by a new
data block. When the local memory of a node is full, a cache miss at that
node implies not only fetching the accessed data block from a remote node
but also a replacement.
6) Thrashing:
Data blocks of memory migrate between nodes on demand; thrashing occurs
when blocks move back and forth between nodes too frequently. Different
techniques are used to reduce thrashing, such as application-controlled
locks and algorithms that tailor page access to the pattern in which
shared data is used.
7) Heterogeneity:
Heterogeneity applies to the operating systems, computer hardware,
networks, and implementations by different developers that make up the
system. A heterogeneous environment such as client-server is handled by
middleware, a set of services that interacts with the end users.
Basically, heterogeneity concerns:
1) Networks: Internet protocols are used in the network for
communication purposes.
2) Computer Hardware: different processors represent data internally in
different ways.
3) Programming Language: data structures are represented differently in
different programming languages.
4) Operating System: different operating systems use different
mechanisms to send messages.
5) Implementations of different Developers: different developers follow
different standards for communication.
5.3 SUMMARY
DSM means Distributed Shared Memory, which is not physical memory but
virtual memory. It is a form of memory architecture where "shared" means
that the address space of the memory is shared. DSM is a mechanism
allowing user processes to access shared data without using inter-process
communication. DSM refers to the shared memory paradigm applied to
loosely coupled distributed memory systems. We learned what distributed
shared memory is, along with its architecture and the roles of the memory
mapping manager and the communication network unit, and the difference
between message passing and distributed shared memory. There are
different types of distributed shared memory, like On-Chip Memory,
Bus-Based Multiprocessors, Ring-Based Multiprocessors and Switched
Multiprocessors.
Distributed shared memory can be implemented in software as well as in
hardware. In software there are three layers: page based, shared variable
based, and object based. In hardware, a CPU (Central Processing Unit),
memory management unit and NIC (Network Interface Card) are required. We
learned different consistency models of distributed shared memory, and we
discussed different issues of distributed shared memory.
5.4 BIBLIOGRAPHY
1) https://www.cc.gatech.edu/classes/AY2010/cs4210_fall/papers/DSM_protic.pdf
2) https://www.geeksforgeeks.org/what-is-distributed-shared-memory-and-its-advantages/
3) https://vedveethi.co.in/eNote/DistSys/CS-702%20U-2.htm
4) https://en.wikipedia.org/wiki/Distributed_shared_memory
5) https://www.slideserve.com/dianne/distributed-computing
6) https://slideplayer.com/slide/6897641/
7) https://nptel.ac.in/courses/106102114/
7) https://nptel.ac.in/courses/106102114/
8) Pradeep K. Sinha, Distributed Operating System: Concepts and
Design, PHI Learning, ISBN No. 978-81-203-1380-4
9) Andrew S. Tanenbaum, Distributed Operating Systems, Pearson
Education, ISBN No. 978-81-317-0147-8
5.7 QUESTIONS
1) Explain the architecture of distributed shared memory.
2) What is distributed shared memory? Explain its architecture.
3) Explain the various consistency models in detail.
4) Explain the different types of distributed shared memory.
5) What are the issues in the design and implementation of distributed
shared memory?
6) Explain the advantages and disadvantages of distributed shared memory.
7) Explain distributed shared memory with its architecture.
*****
UNIT IV
6
DISTRIBUTED SYSTEM MANAGEMENT
Unit Structure
6.1 Introduction to distributed system
6.1.1 Types of Distributed Systems
6.1.2 Advantages of Distributed Systems
6.1.3 Disadvantages of Distributed Systems
6.1.4 How does a distributed system work?
6.2 Introduction to resource management
6.2.1 Architecture of resource Management in distributed
Environment
6.3 Scheduling algorithms
6.3.1 Features of scheduling algorithms
6.3.2 Local Scheduling
6.3.3 Stride Scheduling
6.3.3.1 Extension to Stride Scheduling
6.3.4 Predictive Scheduling
6.3.5 Coscheduling
6.3.6 Gang Scheduling
6.3.7 Implicit Coscheduling
6.3.8 Dynamic Coscheduling
6.4 Task assignment Approach
6.4.1 Resource Management
6.4.2 Working of Task Assignment Approach
6.4.3 Goals of Task Assignment Algorithms
6.4.4 Need for Task Assignment in a Distributed System
6.4.5 Example of Task Assignment Approach
6.5 Load balancing approach
6.5.1 Classification of Load Balancing Algorithms
6.5.2 Issues in Load Balancing Algorithms
6.6 Load sharing approach
6.6.1 Basic idea
6.6.2 Load Estimation Policies
6.6.3 Process Transfer Policies
6.6.4 Differences between Load Balancing and Load Sharing
6.7 Summary
6.8 Reference for further reading
6.9 Model Questions
6.1 INTRODUCTION TO DISTRIBUTED SYSTEM
A distributed system contains multiple nodes that are physically separate
but linked together using the network. All the nodes in this system
communicate with each other and handle processes in tandem. Each of
these nodes contains a small part of the distributed operating system
software.
Fairness:
● Sharing resources among users raises a new challenge: guaranteeing
that each user obtains his or her fair share when demand is heavy.
● In a distributed system, this problem could be exacerbated to the
point that one user consumes the entire system.
● There are many mature strategies to achieve fairness on a single node.
Dynamic:
● The algorithms employed to decide where to process a task should
respond to load changes, and exploit the full extent of the resources
available.
Transparency:
● The behavior and result of a task’s execution should not be affected
by the host(s) on which it executes.
● In particular, there should be no difference between local and remote
execution.
● No user effort should be required in deciding where to execute a task
or in initiating remote execution; a user should not even be aware of
remote processing.
● Further, the applications should not be changed greatly.
● It is undesirable to have to modify the application programs in order
to execute them in the system.
Scalability:
● A scheduling algorithm should scale well as the number of nodes
increases.
● An algorithm that makes scheduling decisions by first inquiring the
workload from all the nodes and then selecting the most lightly loaded
node has poor scalability.
● This will work fine only when there are few nodes in the system.
● This is because the inquirer receives a flood of replies almost
simultaneously, and the time required to process the reply messages
for making a node selection is too long as the number of nodes (N)
increases.
Fault tolerance:
● A good scheduling algorithm should not be disabled by the crash of
one or more nodes of the system.
● Also, if the nodes are partitioned into two or more groups due to link
failures, the algorithm should be capable of functioning properly for
the nodes within a group.
● Algorithms that have decentralized decision making capability and
consider only available nodes in their decision making have better
fault tolerance capability.
Stability:
● Fruitless migration of processes, known as processor thrashing, must
be prevented.
E.g. suppose nodes n1 and n2 both observe that node n3 is idle and then
offload a portion of their work to n3, each unaware of the offloading
decision made by the other.
● If n3 now becomes overloaded as a result, it may again start
transferring its processes to other nodes.
● This is caused by scheduling decisions being made at each node
independently of decisions made by other nodes.
6.3.2 Local Scheduling
In a distributed system, local scheduling means how an individual
workstation should schedule the processes assigned to it in order to
maximize the overall performance. It may seem that local scheduling is
the same as scheduling on a stand-alone workstation; however, they differ
in many aspects. In a distributed system, the local scheduler may need
global information from other workstations to achieve the optimal overall
performance of the entire system. For example, in the extended stride
scheduling of clusters, the local schedulers need global ticket
information in order to achieve fairness across all the processes in the
system.
In recent years, there have been many scheduling techniques developed in
different models. Here, we introduce two of them: one is a proportional-
sharing scheduling approach, in which the resource consumption rights of
each active process are proportional to the relative shares that it is
allocated. The other is predictive scheduling, which is adaptive to the CPU
load and resource distribution of the distributed system.
The traditional priority-based schedulers are difficult to understand and
give more processing time to users with many jobs, which leads to
unfairness among users. Numerous research efforts have tried to find a scheduler that is easy to implement and can allocate resources to users fairly over time. In this environment, proportional-share scheduling was proposed to solve this problem effectively. With proportional-share scheduling, the resource consumption
rights of each active process are proportional to the relative shares that it is
allocated.
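A minimal sketch of proportional-share scheduling in the stride style may make this concrete. Each client's stride is inversely proportional to its ticket allocation, and the scheduler always picks the client with the smallest pass value; the names and the scaling constant below are illustrative, not taken from any particular system:

    # Minimal sketch of proportional-share (stride) scheduling.
    # Each client holds "tickets"; its stride is inversely proportional
    # to its ticket count, so clients with more tickets are picked more often.

    LARGE = 10_000  # scaling constant for stride arithmetic

    class Client:
        def __init__(self, name, tickets):
            self.name = name
            self.tickets = tickets
            self.stride = LARGE // tickets  # fewer tickets -> larger stride
            self.pass_value = self.stride   # virtual time of next selection

    def schedule(clients, quanta):
        """Repeatedly pick the client with the lowest pass value."""
        allocation = {c.name: 0 for c in clients}
        for _ in range(quanta):
            current = min(clients, key=lambda c: c.pass_value)
            allocation[current.name] += 1
            current.pass_value += current.stride
        return allocation

    # Two clients with a 3:1 ticket ratio receive CPU quanta in a 3:1 ratio.
    print(schedule([Client("A", 300), Client("B", 100)], quanta=40))
    # {'A': 30, 'B': 10}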
Loan & Borrow: In this approach, exhausted tickets are traded among
competing clients. When a user temporarily exits the system, other users
can borrow these otherwise inactive tickets. The borrowed tickets expire
when the user rejoins the system. When the sleeping user wakes up, it
stops loaning tickets and is paid back in exhaustible tickets by the
borrowing users. In general, the lifetime of the exhaustible tickets is equal to the length of time the original tickets were borrowed. This policy can keep the
total number of tickets in the system constant over time; thus, users can
accurately determine the amount of resources they receive. However, it
also introduces an excessive amount of computation into the scheduler on
every sleep and wake-up event, which we don’t expect.
System Credit: This second approach is an approximation of the first one.
With system credits, clients are given exhaustible tickets from the system
when they awaken. The idea behind this policy is that after a client sleeps
and awakens, the scheduler calculates the number of exhaustible tickets
for the clients to receive its proportional share over some longer interval.
The system credit policy is easy to implement and does not add significant
overhead to the scheduler on sleep and wakeup events.
Proportional-share of resources can be allocated to clients running
sequential jobs in a cluster. In the cluster, users are guaranteed a
proportional-share of resources if each local stride-scheduler is aware of
the number of tickets issued in its currency across the cluster and if the
total number of base tickets allocated on each workstation is balanced. The
solution for the first assumption is simple: each local scheduler is
informed of the number of tickets issued in each currency, and then
correctly calculates the base funding of each local job. The solution for
distributing tickets to the stride-schedulers is to run a user-level ticket server on each of the nodes in the cluster. Each stride-scheduler
periodically contacts the local ticket server to update and determine the
value of currencies.
Further, for parallel jobs in a distributed cluster, proportional-share
resources can be provided through a combination of stride-scheduling and
implicit coscheduling. Preliminary simulations of implicit coscheduling
for a range of communication patterns and computation granularity indicate that the stride-scheduler with system credit performs similarly to the Solaris time-sharing scheduler which is used in the Berkeley NOW environment.
6.3.5 Coscheduling:
In 1982, Ousterhout introduced the idea of coscheduling, which schedules the interacting activities (i.e., processes) in a job so that all the activities execute simultaneously on distinct workstations. It can produce benefits in both system and individual job efficiency. Without coordinated scheduling, processor thrashing may lead to high communication latencies and consequently degraded overall performance. With systems connected by high-performance networks that already achieve latencies within tens of microseconds, the success of coscheduling becomes a more important factor in deciding the performance.
The implementation consists of three parts:
Monitoring Communication/Thread Activity:
Firmware on the network interface card monitors thread activity by periodically reading the host's kernel memory. If the
incoming message is sent to the process currently running, the scheduler
should do nothing.
● The IPC (Inter-Process Communication) cost is known for every pair of tasks performed between nodes.
● Inter-task communication cost: cij refers to the inter-task communication cost between tasks i and j.
Inter-task Communication Cost (cij):

Tasks    t1    t2    t3    t4    t5    t6
t1        0     6     4     0     0    12
t2        6     0     8    12     3     0
t3        4     8     0     0    11     0
t4        0    12     0     0     5     0
t5        0     3    11     5     0     0
t6       12     0     0     0     0     0
Execution Cost:

Tasks       n1          n2
t1           5          10
t2           2          infinity
t3           4           4
t4           6           3
t5           5           2
t6      infinity         4
Note: The execution of task t2 on node n2 and the execution of task t6 on node n1 are not possible, as the Execution Cost table above shows (the required resources are not available, hence the infinite cost).
Case 1: Serial Assignment

Task    Node
t1      n1
t2      n1
t3      n1
t4      n2
t5      n2
t6      n2
Case 2: Optimal Assignment
Task Node
t1 n1
t2 n1
t3 n1
t4 n1
t5 n1
t6 n2
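The costs of the two assignments can be verified directly: the total cost of an assignment is the sum of each task's execution cost on its assigned node plus the communication cost of every pair of tasks placed on different nodes (tasks on the same node communicate for free). A small sketch using the data from the tables above:

    # Total cost of a task assignment = execution costs + communication
    # costs of pairs placed on different nodes. Data copied from the
    # tables above; float('inf') marks infeasible placements.
    INF = float('inf')
    tasks = ['t1', 't2', 't3', 't4', 't5', 't6']

    exec_cost = {
        't1': {'n1': 5, 'n2': 10}, 't2': {'n1': 2, 'n2': INF},
        't3': {'n1': 4, 'n2': 4},  't4': {'n1': 6, 'n2': 3},
        't5': {'n1': 5, 'n2': 2},  't6': {'n1': INF, 'n2': 4},
    }

    # symmetric inter-task communication costs (zero entries omitted)
    comm = {('t1','t2'): 6, ('t1','t3'): 4, ('t1','t6'): 12,
            ('t2','t3'): 8, ('t2','t4'): 12, ('t2','t5'): 3,
            ('t3','t5'): 11, ('t4','t5'): 5}

    def total_cost(assignment):
        cost = sum(exec_cost[t][assignment[t]] for t in tasks)
        for (a, b), c in comm.items():
            if assignment[a] != assignment[b]:  # cross-node pair
                cost += c
        return cost

    serial  = {'t1':'n1','t2':'n1','t3':'n1','t4':'n2','t5':'n2','t6':'n2'}
    optimal = {'t1':'n1','t2':'n1','t3':'n1','t4':'n1','t5':'n1','t6':'n2'}
    print(total_cost(serial))   # 20 execution + 38 communication = 58
    print(total_cost(optimal))  # 26 execution + 12 communication = 38

The serial assignment costs 58, while the optimal assignment costs only 38, because placing t1 through t5 on a single node eliminates most of the cross-node communication.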
6.5.1 Classification of Load Balancing Algorithms
Cooperative versus Non-cooperative:
● Priority assignment policy and migration limiting policy are the same
as that for the load-balancing algorithms.
Policies:
The location policy decides whether the sender node or the receiver node of the process takes the initiative to search for a suitable node in the system, and this policy can be one of the following:
Sender-initiated location policy: The sender node decides where to send the process. Heavily loaded nodes search for lightly loaded nodes.
Receiver-initiated location policy: The receiver node decides from where to get the process. Lightly loaded nodes search for heavily loaded nodes.
Sender-initiated location policy: When a node becomes overloaded, it either broadcasts a message or randomly probes the other nodes one by one to find a node that is able to receive remote processes. When broadcasting, a suitable node is known as soon as a reply arrives.
Receiver-initiated location policy: When a node becomes underloaded, it either broadcasts a message or randomly probes the other nodes one by one to indicate its willingness to receive remote processes. The receiver-initiated policy requires a preemptive process migration facility, since scheduling decisions are usually made at process departure epochs.
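As a concrete illustration, a sender-initiated policy with random probing can be sketched in a few lines; the threshold and probe limit are illustrative parameters, not values from any specific system:

    # Sketch of a sender-initiated location policy: an overloaded node
    # randomly probes up to PROBE_LIMIT peers and transfers a process
    # to the first node found below THRESHOLD.
    import random

    THRESHOLD = 4    # queue length above which a node counts as overloaded
    PROBE_LIMIT = 3  # maximum number of probes per transfer attempt

    def find_receiver(sender, peers, load_of):
        """Return a lightly loaded peer, or None if probing fails."""
        candidates = [p for p in peers if p != sender]
        for peer in random.sample(candidates, min(PROBE_LIMIT, len(candidates))):
            if load_of(peer) < THRESHOLD:
                return peer
        return None  # keep the process local rather than probe forever

    loads = {'n1': 7, 'n2': 1, 'n3': 5, 'n4': 2}
    if loads['n1'] > THRESHOLD:  # n1 is overloaded, so it probes
        target = find_receiver('n1', list(loads), loads.get)
        print('transfer to', target)  # e.g. transfer to n2

Bounding the number of probes is what keeps the policy stable: a failed search ends after PROBE_LIMIT messages instead of flooding the system.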
● Both policies give substantial performance advantages over the situation in which no load sharing is attempted.
● Sender-initiated policy is preferable at light to moderate system loads.
● Receiver-initiated policy is preferable at high system loads.
● The sender-initiated policy provides better performance in the case where process transfer costs significantly more under the receiver-initiated policy than under the sender-initiated policy, owing to the pre-emptive transfer of processes.
6.7 SUMMARY
A distributed file system (DFS) is a network file system wherein the file
system is distributed across multiple servers. DFS enables location
transparency and file directory replication as well as tolerance to faults.
Some implementations may also cache recently accessed disk blocks for
improved performance. Though distribution of file content increases
performance considerably, efficient management of metadata is crucial for
overall file system performance. It has been shown that 75% of all file
system calls access file metadata [15] and distributing metadata load is
important for scalability. Scaling metadata performance is more complex
than scaling raw I/O performance since even a small inconsistency in
metadata can lead to data corruption.
Resource management is the process by which businesses manage their
various resources effectively. Those resources can be intangible – people
and time – and tangible – equipment, materials, and finances.
It involves planning so that the right resources are assigned to the right
tasks. Managing resources involves schedules and budgets for people,
projects, equipment, and supplies. While it is often used in reference to
project management, it applies to many other areas of business
management. A small business, in particular, will pay attention to resource management in a number of areas.
6.9 MODEL QUESTIONS

3. How does a distributed system work?
4. Explain the Architecture of resource Management in distributed
Environment
5. Write a short note on Features of scheduling algorithms
6. Explain any two scheduling algorithms in detail
7. Explain Resource Management
8. Discuss on Working of Task Assignment Approach
9. Explain the Goals of Task Assignment Algorithms
10. Discuss the Need for Task Assignment in a Distributed System
11. Explain with example the Task Assignment Approach
12. Explain the concept of Classification of Load Balancing Algorithms
13. What are the Issues in Load Balancing Algorithms
14. Explain the Basic idea of load sharing
15. Differentiate between Load Balancing and Load Sharing
*****
7
DISTRIBUTED SYSTEM MANAGEMENT
Unit Structure
7.1 Process Management
7.1.1 Introduction
7.1.2 What is Process Management?
7.1.3 The Importance of Process Management
7.1.4 Realtime Process Management Examples
7.1.5 How does BPM differ from workflow management?
7.1.6 Distinction between Digital Process Automation (DPA) and
Process Management
7.1.7 Examples of Business Process Management (BPM) phases
7.1.8 Digital process management: Application examples
7.1.9 How can process management be implemented?
7.1.10 Selection criteria for good BPM software
7.1.11 Benefits of Process Management
7.2 Process Migration in Distributed System
7.2.1 Why use Process Migration?
7.2.2 What are the steps involved in Process Migration?
7.2.3 Methods of Process Migration
7.3 Threads
7.4 Summary
7.5 Reference for further reading
7.6 Model questions
Modeling:
The modeling phase essentially involves selecting and adapting processes
that are to be implemented.
Execution:
Once processes are defined, the execution phase begins, including efforts
to automate business processes.
Monitoring:
The monitoring phase is the prerequisite for subsequent optimization and
is used for targeted process control.
Optimization:
The optimization phase begins following monitoring. The knowledge
gained during monitoring makes it possible to improve the processes. It is
possible, for example, that there are subtasks that have not been
automated, that unnecessary steps are still being carried out, or that the
data structure in general may require readjustment.
Use cases that affect the entire company are, for example:
● Accounting and finance
● Purchasing decisions
● Administrative activities
● Customer services
● Facility management
● Personnel management
● Order processing
● Performance measurement
● Warehousing, logistics
● Standard operating procedures
● Employee performance and training
● Supplier and customer portal
In addition, there are processes whose origin can be assigned to a specific
department. All these workflows can be mapped digitally. Even if the
actual process management is detached from digital solutions, it can be
implemented much more easily. Digital and automated business processes
improve performance in all departments - this reduces overhead and
enables flexibility in the company.
Human Resources:
With the advancement of organization-wide digital strategies, the HR
department is also changing - and gaining in importance. In addition to
administrative tasks such as approving vacation requests or checking
applications, the HR department also has a strategic role to play in the
"war for talent". In the future, fewer administrative tasks will be on the
agenda than creative, intelligent activities in the corporate environment.
This, in turn, requires companies to digitize and automate HR processes
that used to be time-consuming and to connect them with the relevant data
and departments. More flexible work structures, further developing
corporate culture and supporting employees in digital transformation and
change management are today's new priorities. To free up time for
strategic and value-creating activities, HR employees must be relieved of
their administrative tasks. This is done through digital business processes.
Here are some workflows that can be digitized and automated in HR:
● Vacation request
● Digital personnel file
● Travel expense report
● Application management
● Employee onboarding
● Expense report
● Business trip application
And here are the reasons why digital automation is particularly worthwhile
in the HR sector:
● Promotion of new and agile ways of working and mobile working
● Increased employee satisfaction through fast processing
● Transparency in employee responsibilities
● Simplified change management
● Information exchange without interruption or integration problems
Administration:
Who hasn't had to handle a contract or obtain a permit? These tasks are
often time-consuming when performed in analog (manual) form.
Digitizing administrative processes is therefore an essential part of any
digitization strategy. In addition, workflows can only be accelerated
effectively when all systems, applications and company departments are
connected locally and across decentralized locations. To achieve this,
companies have to rethink their analog processes, sometimes redesigning
and streamlining them. As a result, the heterogeneous IT landscape
becomes interconnected, the company remains agile, and decision-makers
can focus on management tasks. This frees up employees to use their time
more effectively - and allows the company to maintain control over all
physical and electronic records at all times. Additionally, the newly
created visibility helps identify operational bottlenecks and continuously
improve processes. Here are a few workflows that can be digitized and
automated in administration:
● Contract management
● Training management
● Approval processes
● Digital construction file / digital project file
● Maintenance order / Production order
● Fleet management

And these are the reasons why digital automation in administration is worthwhile:
Finance:
The finance department is undergoing a transformation. Administrative
tasks still fill the business day, but according to McKinsey Research, as
much as 75-79 percent of general accounting operations, cash disbursement, and revenue management tasks in finance can be fully
automated in the future. Experts are convinced that with digitization,
employees in finance departments are developing into business partners,
answering trend-setting questions, interpreting data and contributing
increasing value to their organization. The problem arises when the volume of cross-departmental, control-relevant data continues to increase but beneficial processes for management are not implemented. As real-time and ad-hoc analysis grows in importance along the entire value chain, it will become a priority to set up an interconnected value-creation system.
Here are some workflows that can be digitized and automated in finance:
● Treasury Management
● Risk Assessment
● Purchase-to-Pay
o Invoice Receipt Verification
o Outgoing invoice processing
o Payment processing
● Expenditure planning
● Bank reporting
● Data controlling
● Accounts Payable / Accounts Receivable
And these are the reasons why digital automation in finance is worthwhile:
● More performance through integrated data exchange
● Creation of electronic invoices with information linking
● Establishment of a paperless filing system as a central reference for
all documents
● Ability to exchange, match and archive documents without material
costs
● Optimized cost control
Purchasing:
It is no secret that digital procurement processes can now be fully
automated. Companies also rely on operational and in-house digital
processes such as requirements gathering and pricing, with data from
various sources being integrated. However, it is the comprehensive
exchange of information that brings the full advantage of digital
procurement to light. Sustainable processes require more than strategic
data management. The degree of interconnection between employees,
departments and systems determines how digital and efficient purchasing
can be. In day-to-day business, this can be seen by optimizing the supply
chain and maximizing response time. With digital processes, the modern
buyer maintains full control and transparency over processes, tasks and
figures at all times, and can make decisions in real time, despite the large
number of purchasing processes involved.
Here are some workflows that can be digitized and automated in
purchasing:
● Investment request
● Goods receipt process
● Order processing
● Inventory process
● Delivery release
And these are the reasons why digital automation in purchasing is
worthwhile:
● Contract, supplier and procurement management with seamless
system and data integration
● Optimized supply chains
● Maximized reaction speed
● Automated routine processes (article dispositions, creation of order
proposals or price inquiries)
● Transparent bookings and stock levels
● New savings potential
● Reduced processing time
Sales:
Today, more than 50 percent of new employees already belong to the
"digital native" generation. They have has grown up with digital tools and
different ways of working and are transferring this experience to their
everyday working lives. They don't think much of mountains of
documents and Excel lists with manually prepared data or paperwork.
Customers have also opened up new information channels. By the time the
first contact with sales is made, the decision has often already been made,
so advance work must be done - on all channels. After all, the customer
should have the choice of how to get in touch and the employee should be
able to switch seamlessly between channels to qualify a lead efficiently.
Information overload and attention deficits among prospects demand that
companies streamline all sales processes and optimize them digitally.
Information must be quick and easy to find, and data must be efficient to
use. Here are some workflows that can be digitized and automated
in sales:
● Order processing
● Information download
● Quotation approval
● Compilation of product sheets
● B2B sales process
And these are the reasons why digital automation in sales is worthwhile:
● Simplification of processes, communication and advice
● Increased reach and sales
● Optimized sales productivity
● Build trust and prevent mistrust
● Reduced costs for administration and organization
● Increased effectiveness and reduced susceptibility to errors
● Simplified contact, data maintenance & collaboration
● Sustainable competitive advantage
Value creation processes:
Value-adding processes are essential for the creation of a product or the
provision of a service. They describe all corporate activities that are
geared to customer needs. The value-adding processes a company has
depend to a large extent on its industry focus or its core competencies.
Typically, sales and marketing processes are almost always part of the
value creation processes. It is also characteristic that different departments
of a company are integrated into the value creation processes.
Support processes:
Support processes, also known as supporting processes, are not customer-
oriented at first, but are necessary in order to carry out, control and
optimize value creation and management processes. These include, for
example, personnel selection and qualification as well as purchasing or the
payment of invoices. In contrast to value-added processes, support
processes can often be assigned to a single department.
Management processes:
Management processes relate to the company as a whole, contribute to the
planning and control of core and support processes, and serve to
strategically manage a company. Similar to support processes, this type of
process is not directly related to the value creation of a company.
Examples of management processes would include, but are not limited to:
Aligning the company strategically, defining the corporate mission
statement or formulating corporate goals.
Before you start with the implementation, however, you should have completed important tasks:
● Involve employees early on and offer training.
● Record process steps, responsibilities and other important information
in documentation.
● Identify and contact all stakeholders in a timely manner. Also
consider those who have no direct contact with the process but must
approve it - for example, the works council or data protection officers.
It is also advisable to set regular coordination and review dates even
before the introduction. In this way, you can ensure from the outset that
processes are constantly scrutinized and, if necessary, adapted or even
eliminated. We have also compiled five additional tips to help you
successfully implement your project management.
2. Start small:
"It's hard to get started" - a saying that couldn't be more apt for process
management. So don't make things unnecessarily difficult for yourself and
start small. Choose simple processes that address the current situation and avoid being overwhelmed, especially if process management is new. You should aim for small stages that can be implemented quickly and easily.
Success will inspire you to do more - so you can gradually tackle other, as
yet untried processes in your company.
3. Create a schedule:
You will probably realize pretty quickly that many things will take much
longer than initially anticipated. Nevertheless, it makes sense to create a
schedule that includes generous milestones and that you can regularly
review and adjust. Process management is not something you can just do
on the side. It is important to give all involved employees sufficient time
to deal with the topic in detail and successfully implement their tasks.
4. Enable the exchange of information and ideas:
As with any new big project, sooner or later challenges crop up.
Restructuring, new service providers and much more can mean that you
have to throw your well-thought-out plan out the window and start over
again. Regular meetings are a good way to learn about changes early on
and to support each other. Bring yourself up to date, discuss problems and
inspire each other. The intervals at which you schedule meetings depend
on how much you need to discuss. It is advisable to choose shorter
intervals, especially during critical phases, such as at the beginning. Once
everything has settled in, weekly or monthly meetings are also advisable.
5. Use tools:
With your process map, you already have a powerful tool at hand. It is the
basis for analysis, meetings and further developments and should always
be included in the regular coordination meetings. But the choice of a
suitable means of communication is also crucial. E-mails often lead to
misunderstandings and ultimately cause more confusion than they help the
process. Therefore, only write e-mails if your request can be explained
succinctly and be sure to file any important documents in an agreed-upon
location. This is where project management tools have proven to be the
tool of choice, as they bundle all communication including documents,
schedules, and to-do lists, etc. There are a number of software products
available, some of which can even be used free of charge - for example,
Asana or Trello.
Monitoring:
Make sure that you can use the BPM software to monitor key business
indicators in real time, if possible. Ideally, you should be able to visualize
the data in a dashboard.
Scalability:
Every company has its own special features and unique requirements.
Therefore, make sure that the selected process management software can
handle them. This includes integration with third-party systems as well as preferred data types and archiving and search functions for documents. A scalable solution is also necessary for companies with growth and expansion plans, enabling them to handle not just current requirements but also future changes.
Security:
Recent surveys across many industries show that data privacy and security have become top priorities driving digital transformation initiatives. Security is therefore one of the most important criteria when it comes to selecting a suitable BPM tool. This is particularly true for cloud solutions, where the question of data residency (where data is stored) is at the forefront. In many countries and states, government regulations can be
particularly strict. It is therefore advisable to choose a software provider
that offers cloud options that meet data residency requirements. In
addition, process management software should meet the requirements of
data protection regulations such as GDPR.
Usability:
For process designers, a flexible and user-friendly interface is essential.
This enables them to model and adapt even complicated process forms
quickly and easily. Of course, end users also benefit from user-friendly
BPM software that makes it particularly easy to start and control
processes.
7.2.3 Methods of Process Migration:
The methods of Process Migration are:
1. Homogeneous Process Migration: Homogeneous process migration means relocating a process in a homogeneous environment where all systems have the same operating system and architecture. There are two distinct strategies for performing process migration: i) user-level process migration and ii) kernel-level process migration.
● User-level process migration: In this approach, process migration is managed without modifying the operating system kernel. User-level migration implementations are simpler to create and maintain but usually have two issues: i) kernel state is not accessible to them, and ii) they must cross the kernel boundary using kernel requests, which are slow and expensive.
● Kernel-level process migration: In this approach, process migration is accomplished by modifying the operating system kernel. As a result, process migration becomes simpler and more efficient. This facility permits migration to be performed faster and allows more types of processes to be relocated.
Homogeneous Process Migration Algorithms:
There are five fundamental algorithms for homogeneous process migration:
● Total Copy Algorithm
● Pre-Copy Algorithm
● Demand Page Algorithm
● File Server Algorithm
● Freeze Free Algorithm
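Of these, the pre-copy algorithm is perhaps the easiest to sketch: pages are copied to the destination while the process keeps running, pages dirtied in the meantime are re-sent in further rounds, and the process is frozen only to transfer the small remainder. The following is a toy illustration (the dirtying model and the limits are invented for the example):

    # Illustrative sketch of pre-copy process migration: copy pages in
    # rounds while the process runs, re-sending pages dirtied in the
    # previous round, then freeze and copy the small remainder.

    def pre_copy_migrate(all_pages, dirtied_in_round, stop_limit=2, max_rounds=5):
        to_send = set(all_pages)
        for round_no in range(max_rounds):
            if len(to_send) <= stop_limit:
                break  # remainder is small enough to freeze and finish
            print(f"round {round_no}: sending {len(to_send)} pages")
            to_send = dirtied_in_round(round_no)  # pages dirtied meanwhile
        print(f"freeze process, send final {len(to_send)} pages, resume remotely")

    # Toy dirtying model: the working set shrinks each round.
    pre_copy_migrate(range(100), lambda r: set(range(20 // (r + 1))))

The trade-off is visible in the sketch: more rounds shrink the final freeze time but consume more network bandwidth re-sending dirtied pages.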
7.3 THREADS
A thread is a lightweight process; every process can have one or more threads. Each thread contains a stack and a Thread Control Block. There are four basic thread models:
1. User Level Single Thread Model:
● Each process contains a single thread.
● A single process is itself a single thread.
● The process table contains an entry for every process, maintaining its PCB.
7.4 SUMMARY
A thread is also known as lightweight process. The idea is to achieve
parallelism by dividing a process into multiple threads. For example, in a
browser, multiple tabs can be different threads. MS Word uses multiple
threads: one thread to format the text, another thread to process inputs, etc.
The primary difference is that threads within the same process run in a
shared memory space, while processes run in separate memory spaces.
Threads are not independent of one another the way processes are; as a result, threads share their code section, data section, and OS resources (like open files and signals) with other threads. But, like a process, a thread has its own program counter (PC), register set, and stack space.
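The shared memory space is easy to demonstrate: in the sketch below, two threads of one process increment the same counter variable, something two separate processes could not do without explicit message passing or shared-memory IPC. The lock synchronizes the concurrent updates:

    # Two threads in one process share the same memory space: both
    # update the same counter. A lock synchronizes the concurrent access.
    import threading

    counter = 0
    lock = threading.Lock()

    def worker(increments):
        global counter
        for _ in range(increments):
            with lock:          # protect the shared variable
                counter += 1

    threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)  # 20000 -- both threads updated the same memory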
*****
8
DISTRIBUTED SYSTEM MANAGEMENT
Unit Structure
8.1 Data file system
8.1.1 What is DFS (Distributed File System)?
8.1.2 Features of DFS
8.1.3 History
8.1.4 Applications
8.1.5 Working of DFS
8.1.6 Advantages
8.1.7 Disadvantages
8.1.8 Benefits of DFS Models
8.1.9 DFS and Backup
8.1.10 The challenges associated with DFS
8.1.11 Components
8.2 File Models in Distributed System
8.3 Summary
8.4 Reference for further reading
8.5 Model questions
The main purpose of the Distributed File System (DFS) is to allow users of physically distributed systems to share their data and resources by using a Common File System. A collection of workstations and mainframes connected by a Local Area Network (LAN) is a typical configuration of a Distributed File System. A DFS is executed as a part of the operating system. In DFS, a namespace is created and this process is transparent for the clients.
Redundancy:
Redundancy is done through a file replication component.
In the case of failure and heavy load, these components together improve data availability by allowing data stored in different locations to be logically grouped under one folder, known as the "DFS root".
It is not necessary to use both components of DFS together; it is possible to use the namespace component without the file replication component, and it is perfectly possible to use the file replication component without the namespace component between servers.
Structure transparency:
There is no need for the client to know about the number or locations of
file servers and the storage devices. Multiple file servers should be
provided for performance, adaptability, and dependability.
Access transparency:
Both local and remote files should be accessible in the same manner. The file system should automatically locate the accessed file and send it to the client's side.
Naming transparency:
There should not be any hint in the name of the file to the location of the
file. Once a name is given to the file, it should not be changed during
transferring from one node to another.
Replication transparency:
If a file is copied on multiple nodes, both the copies of the file and their
locations should be hidden from one node to another.
User mobility:
It will automatically bring the user’s home directory to the node where the
user logs in.
Performance:
Performance is based on the average amount of time needed to satisfy client requests. This time covers the CPU time + time taken to access
secondary storage + network access time. It is advisable that the
performance of the Distributed File System be similar to that of a
centralized file system.
Simplicity and ease of use:
The user interface of a file system should be simple and the number of
commands in the file should be small.
High availability:
A Distributed File System should be able to continue in case of any partial
failures like a link failure, a node failure, or a storage drive crash.
A highly reliable and adaptable distributed file system should have multiple independent file servers for controlling different and independent storage devices.
Scalability:
Since growing the network by adding new machines or joining two
networks together is routine, the distributed system will inevitably grow
over time. As a result, a good distributed file system should be built to
scale quickly as the number of nodes and users in the system grows.
Service should not be substantially disrupted as the number of nodes and
users grows.
High reliability:
The likelihood of data loss should be minimized as much as feasible in a
suitable distributed file system. That is, because of the system’s
unreliability, users should not feel forced to make backup copies of their
files. Rather, a file system should create backup copies of key files that
can be used if the originals are lost. Many file systems employ stable
storage as a high-reliability strategy.
Data integrity:
Multiple users frequently share a file system. The integrity of data saved
in a shared file must be guaranteed by the file system. That is, concurrent
access requests from many users who are competing for access to the same
file must be correctly synchronized using a concurrency control method.
Atomic transactions are a high-level concurrency management mechanism
for data integrity that is frequently offered to users by a file system.
Security:
A distributed file system should be secure so that its users may trust that
their data will be kept private. To safeguard the information contained in
the file system from unwanted & unauthorized access, security
mechanisms must be implemented.
Heterogeneity:
Heterogeneity in distributed systems is unavoidable as a result of huge
scale. Users of heterogeneous distributed systems have the option of using
multiple computer platforms for different purposes.
8.1.3 History:
The server component of the Distributed File System was initially introduced as an add-on feature. It was added to Windows NT 4.0 Server and was known as "DFS 4.1". Later it was included as a standard component for all editions of Windows 2000 Server. Client-side support has been included in Windows NT 4.0 and in later versions of Windows. Linux kernels 2.6.14 and later come with an SMB client VFS known as "cifs" which supports DFS. Mac OS X 10.7 (Lion) and onwards also supports DFS.
8.1.4 Applications:
NFS:
NFS stands for Network File System. It is a client-server architecture that
allows a computer user to view, store, and update files remotely. The
protocol of NFS is one of the several distributed file system standards for
Network-Attached Storage (NAS).
CIFS:
CIFS stands for Common Internet File System. CIFS is a dialect of SMB; that is, CIFS is an implementation of the SMB protocol, designed by Microsoft.
SMB:
SMB stands for Server Message Block. It is a protocol for sharing files and was invented by IBM. The SMB protocol was created to allow
computers to perform read and write operations on files to a remote host
over a Local Area Network (LAN). The directories present in the remote host can be accessed via SMB and are called "shares".
Hadoop:
Hadoop is a group of open-source software services. It gives a software
framework for distributed storage and operating of big data using the
MapReduce programming model. The core of Hadoop contains a storage
part, known as Hadoop Distributed File System (HDFS), and an operating
part which is a MapReduce programming model.
NetWare:
NetWare is a discontinued computer network operating system developed by Novell, Inc. It primarily used cooperative multitasking to run different services on a personal computer, using the IPX network protocol.
Space Management:
Storage devices are divided into fixed-sized blocks called sectors. A
sector is the minimum storage unit on a storage device and is between
512 bytes and 4096 bytes (Advanced Format). However, file systems use a
high-level concept as the storage unit, called blocks. Blocks are an
abstraction over physical sectors; each block usually consists of multiple
sectors. Depending on the file size, the file system allocates one or more
blocks to each file. Speaking of space management, the file system is
aware of every used and unused block on the partitions, so it’ll be able to
allocate space to new files or fetch the existing ones when requested.
The most basic storage unit in ext4-formatted partitions is the block.
However, the contiguous blocks are grouped into block groups for easier
management.
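The allocation arithmetic is simple: the number of blocks a file occupies is its size divided by the block size, rounded up, so the space allocated on disk is usually a little more than the file's own size. A small check, assuming a typical 4096-byte block:

    # Blocks allocated for a file = ceil(file_size / block_size).
    # A 4096-byte block size is typical for ext4.
    import math

    def allocated_bytes(file_size, block_size=4096):
        blocks = math.ceil(file_size / block_size)
        return blocks * block_size

    print(allocated_bytes(5000))   # 8192 -- two blocks for a 5000-byte file
    print(allocated_bytes(4096))   # 4096 -- exactly one block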
You can use the du command on Linux to see it yourself:

du -b "some-file.txt"
8.1.6 Advantages:
● DFS allows multiple users to access or store the data.
● It allows the data to be shared remotely.
● It improves file availability, access time, and network efficiency.
● It improves the capacity to change the size of the data and also improves the ability to exchange the data.
● A Distributed File System provides transparency of data even if the server or disk fails.
8.1.7 Disadvantages:
● In a Distributed File System, nodes and connections need to be secured; therefore we can say that security is at stake.
● There is a possibility of losing messages and data in the network while moving from one node to another.
● Database connection in the case of a Distributed File System is complicated.
● Handling of the database is also not easy in a Distributed File System as compared to a single-user system.
● There are chances that overloading will take place if all nodes try to send data at once.
8.1.8 Benefits of DFS Models:
The distributed file system brings with it some common benefits.
A DFS makes it possible to restrict access to the file system through access lists or capabilities on both the servers and the clients, depending on how the protocol is designed. Also, since the server provides a
single central point of access for data requests, it is thought to be fault-
tolerant (as mentioned above) in that it will still function well if some of
the nodes are taken offline. This dovetails with some of the reasons that
DFS was developed in the first place – the system can still have that
integrity if a few workstations get moved around.
8.1.11 Components:
The components of DFS are as follows:
● Block Storage provider
● Client Driver
● Security provider
● Meta- Data Service
● Object service.
Features:
The features of DFS are as follows:
● User mobility
● Easy to use
● High availability
● Performance
● Coherent access
● Location independence
● File locking
● Multi-networking access
● Local gateways
● Multi-protocol access
8.2 FILE MODELS IN DISTRIBUTED SYSTEM

There are mainly two types of file models in the distributed operating system.
1. Structure Criteria
2. Modifiability Criteria
Structure Criteria:
There are two types of file models in structure criteria. These are as
follows:
1. Structured Files
2. Unstructured Files
Structured Files:
The Structured file model is presently a rarely used file model. In the
structured file model, a file is seen as a collection of records by the file
system. Files come in various shapes and sizes and with a variety of
features. It is also possible that records from various files in the same file
system have varying sizes. Despite belonging to the same file system, files
have various attributes. A record is the smallest unit of data from which
data may be accessed. The read/write actions are executed on a set of
records. Different "File Attributes" are provided in a hierarchical file
system to characterize the file. Each attribute consists of two parts: a name
and a value. The file system used determines the file attributes. It provides
information on files, file sizes, file owners, the date of last modification,
the date of file creation, access permission, and the date of last access.
Because of the varied access rights, the Directory Service function is
utilized to manage file attributes.
Unstructured Files:
It is the most important and widely used file model. In the unstructured model, a file is an unstructured sequence of data; it does not support any substructure. The data and structure of each file in the file system is an uninterpreted sequence of bytes, as in UNIX or DOS. Most modern operating systems prefer the unstructured file model over the structured file model because files are shared by multiple applications. Since it has no structure, it can be interpreted in various ways by different applications.
Modifiability Criteria:
There are two file models under the Modifiability Criteria. These are as follows:
1. Mutable Files
2. Immutable Files
1. Mutable Files:
Most existing operating systems employ the mutable file model. A file is represented as a single sequence of records because the same file is updated repeatedly as new material is added. After a file is updated, the existing contents are replaced by the new contents.
2. Immutable Files:
The Immutable file model is used by Cedar File System (CFS). The file
may not be modified once created in the immutable file model. Only after
the file has been created can it be deleted. Several versions of the same file
are created to implement file updates. When a file is changed, a new file
version is created. There is consistent sharing because only immutable
files are shared in this file paradigm. Distributed systems allow caching
and replication strategies, overcoming the limitation of many copies and
maintaining consistency. The disadvantages of employing the immutable
file model include increased space use and disc allocation activity. CFS
uses the "Keep" parameter to keep track of the file's current version
number. When the parameter value is 1, it results in the production of a
new file version. The previous version is erased, and the disk space is
reused for a new one. When the parameter value is greater than 1, it
indicates the existence of several versions of a file. If the version number
is not specified, CFS utilizes the lowest version number for actions such
as "delete" and the highest version number for other activities such
as "open".
The specific client's request for accessing a particular file is serviced on
the basis of the file accessing model used by the distributed file system.
The file accessing model basically depends on 1) the unit of data access and 2) the method used for accessing remote files.
On the basis of the unit of data access, following file access models might
be used in order to access the specific file.
1. File-level transfer model
2. Block-level transfer model
3. Byte-level transfer model
4. Record-level transfer model
1. File-level transfer model: In the file-level transfer model, the complete file is moved when an operation requires the file data to be transmitted across the distributed computing network between client and server. This model has better scalability and is efficient.
2. Block-level transfer model: In the block-level transfer model, file data transfer through the network between client and server is accomplished in units of file blocks. In short, the unit of data transfer in the block-level transfer model is the file block. The block-level transfer model might be used in distributed computing environments comprising several diskless workstations.
3. Byte-level transfer model: In the byte-level transfer model, file data transfer through the network between client and server is accomplished in units of bytes. In short, the unit of data transfer in the byte-level transfer model is the byte. The byte-level transfer model offers more flexibility in comparison to the other file transfer models since it allows retrieval and storage of an arbitrary sequential subrange of a file. The major disadvantage of the byte-level transfer model is the difficulty of cache management, because of the variable-length data for different access requests.
4. Record-level transfer model: The record-level transfer model might be used with file models where the file contents are structured in the form of records. In the record-level transfer model, file data transfer through the network between client and server is accomplished in units of records. The unit of data transfer in the record-level transfer model is the record.
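The contrast between the transfer units can be seen in a sketch of a block-level client: it fetches and caches fixed-size blocks on demand rather than whole files. All names, the block size, and the toy in-memory "server" below are illustrative assumptions:

    # Sketch of block-level transfer: the client fetches and caches
    # fixed-size blocks on demand instead of whole files.
    BLOCK_SIZE = 4096  # illustrative block size

    class BlockClient:
        def __init__(self, fetch_block):
            self.fetch_block = fetch_block  # stands in for a server RPC
            self.cache = {}

        def read(self, path, offset, length):
            data = b""
            first = offset // BLOCK_SIZE
            last = (offset + length - 1) // BLOCK_SIZE
            for idx in range(first, last + 1):
                if (path, idx) not in self.cache:   # miss: one block over network
                    self.cache[(path, idx)] = self.fetch_block(path, idx)
                data += self.cache[(path, idx)]
            start = offset - first * BLOCK_SIZE
            return data[start:start + length]

    # Toy "server" that serves blocks of an in-memory 16 KiB file.
    file_bytes = bytes(range(256)) * 64
    server = lambda path, idx: file_bytes[idx * BLOCK_SIZE:(idx + 1) * BLOCK_SIZE]
    client = BlockClient(server)
    print(client.read("/a", 4090, 12))  # read spanning blocks 0 and 1

A file-level client would instead fetch the entire file on the first access, and a byte-level client would request exactly the byte range, trading simpler caching for more server round trips.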
8.3 SUMMARY
DFS allows multiple users to access or store the data. It allows the data to be shared remotely. It improves file availability, access time, and network efficiency, and it improves the capacity to change the size of the data as well as the ability to exchange the data.
The main purpose of the Distributed File System (DFS) is to allow users of physically distributed systems to share their data and resources by using a Common File System.
A collection of workstations and mainframes connected by a Local Area Network (LAN) is a typical configuration of a Distributed File System. A DFS is executed as a part of the operating system. In DFS, a namespace is created and this process is transparent for the clients.
8.4 REFERENCE FOR FURTHER READING
1. https://www.unf.edu/~sahuja/cis6302/filesystems.html
2. https://www.geeksforgeeks.org/what-is-dfsdistributed-file-system/
3. https://www.ques10.com/p/2247/what-are-the-good-features-of-a-distributed-file-1/
4. https://www.javatpoint.com/distributed-file-system
5. https://en.wikipedia.org/wiki/Distributed_File_System_(Microsoft)
6. https://erandipraboda.medium.com/characteristics-of-distributed-file-systems-bf5988f85d3
*****
UNIT V
9
INTRODUCTION TO CLOUD
COMPUTING
Unit Structure
9.0 Objective
9.1 Introduction
9.1.1 History and evolution
9.2 Characteristics of cloud computing
9.3 Cloud Computing example
9.4 Benefits of Cloud Computing
9.5 Risks of Cloud Computing
9.6 Cloud Computing Architecture
9.6.1 Cloud Architecture model
9.6.2 Types of Cloud
9.6.3 Cloud based Services
9.6.3.1 Software as a service (SaaS)
9.6.3.2 Platform as a service (PaaS)
9.6.3.3 Infrastructure as a service (IaaS)
9.7 Summary
9.8 References
9.0 OBJECTIVE
After studying this module, you will be able to understand
● Cloud Computing and its characteristics
● Benefits of Cloud Computing
● Cloud Computing Architecture
● Cloud based Services
9.1 INTRODUCTION
● Cloud Computing is a model for enabling convenient, on-demand
network access to a shared pool of resources that can be rapidly
provided and released with minimum management efforts.
● Cloud Computing intends to realize the concept of Computing as a
utility like water, gas, electricity etc.
● Cloud Computing refers to accessing and storing data and providing computing-related services over the internet.
9.2 CHARACTERISTICS OF CLOUD COMPUTING
1. On-demand self-service: The user is able to use web services and resources on demand. The user can log on to the website at any time and use them.

2. Ubiquitous access: As cloud computing is completely web based, the user can access it from anywhere and at any time.

3. Resource pooling: Resource pooling allows cloud providers to pool large-scale IT resources to serve multiple cloud consumers. Different physical and virtual IT resources are dynamically assigned and reassigned according to cloud consumer demand, typically followed by execution through statistical multiplexing.

4. Rapid elasticity: Elasticity is the automated ability of a cloud to transparently scale IT resources, as required in response to runtime conditions or as pre-determined by the cloud consumer or cloud provider.

5. Measured usage: The measured usage characteristic represents the ability of a cloud platform to keep track of the usage of its IT resources, primarily by cloud consumers. Based on what is measured, the cloud provider can charge a cloud consumer only for the IT resources used and/or for the time frame during which access to the IT resources was granted.

6. Resiliency: Resilient computing is a form of failover that distributes redundant implementations of IT resources across physical locations.
With EC2, users may rent virtual machine instances to run their own software and can also monitor the number of VMs as demand changes.
2. Lock in:
Switching from one Cloud Service Provider (CSP) to another is very difficult. This results in dependency on a particular CSP.
3. Isolation Failure:
This involves the failure of the isolation mechanisms that separate storage, memory, and routing between different tenants.
1. Frontend:
The frontend of the cloud architecture refers to the client side of the cloud computing system. It contains all the user interfaces and applications which are used by the client to access the cloud computing services/resources.
For example, a web browser may be used to access the cloud platform.
Client Infrastructure:
Client Infrastructure refers to the frontend components. It contains the
applications and user interfaces which are required to access the cloud
platform.
2. Backend:
Backend refers to the cloud itself which is used by the service provider. It
contains the resources as well as manages the resources and provides
security mechanisms. Along with this it includes huge storage, virtual
applications, virtual machines, traffic control mechanisms, deployment
models etc.
1. Application:
Application in the backend refers to the software or platform that the client accesses. It provides the service in the backend as per the client's requirements.
2. Service:
Service in the backend refers to the three major types of cloud-based services: SaaS, PaaS, and IaaS. It also manages which type of service the user accesses.
3. Cloud Runtime:
Runtime cloud in the backend refers to providing an execution and runtime platform/environment to the virtual machines.
4. Storage:
Storage in the backend refers to providing a flexible and scalable storage service and management of stored data.
5. Infrastructure:
Cloud infrastructure in the backend refers to the hardware and software components of the cloud, including servers, storage, network devices, virtualization software, etc.
6. Management:
Management in the backend refers to the management of backend components like application, service, runtime cloud, storage, infrastructure, and other security mechanisms.
7. Security:
Security in the backend refers to the implementation of different security mechanisms that secure cloud resources, systems, files, and infrastructure for end users.
8. Internet:
Internet connection acts as the medium or a bridge between frontend and
backend and establishes the interaction and communication between
frontend and backend.
Public Cloud:
A public cloud is a publicly accessible cloud environment owned by a
third-party cloud provider
The IT resources on public clouds are usually provisioned via cloud
delivery models and are generally offered to cloud consumers at a
cost or are commercialized via other avenues (such as
advertisement).
The cloud provider is responsible for the creation and on-going
maintenance of the public cloud and its IT resources.
Example-Google, Oracle, Microsoft
Community Clouds:
A community cloud is similar to a public cloud except that its
access is limited to a specific community of cloud consumers.
The community cloud may be jointly owned by the community
members or by third-party cloud provider that provisions a public
cloud with limited access.
The member cloud consumers of the community typically share the
responsibility for defining and evolving the community Cloud.
Example- Government agency
Private Clouds:
A private cloud is owned by a single organization.
Private clouds enable an organization to use cloud computing technology as a means of centralizing access to IT resources by different parts, locations, or departments of the organization.
The use of a private cloud can change how organizational and trust boundaries are defined and applied.
The actual administration of a private cloud environment may be
carried out by internal or outsourced staff.
Example-HP data centre, Ubuntu
Hybrid Clouds:
A hybrid cloud is a cloud environment comprised of two or more
different cloud deployment models.
The services of a hybrid cloud can be distributed across multiple cloud types.
Example- Amazon Web Services
Benefits:
1. It is beneficial in terms of scalability, efficiency, performance.
2. Modest software tools
3. Efficient use of software licence
4. Centralized management and data
5. Platform responsibility managed by provider
6. Multitenant solution
Figure 2: A cloud consumer is accessing a ready-made PaaS environment. The question mark indicates that the cloud consumer is intentionally shielded from the implementation details of the platform.
Benefits:
1. Lower administrative overhead:
The consumer need not bother much about administration because it is the responsibility of the cloud provider.
3. Scalable Solution:
It is easy to scale up and down automatically based upon application resource demand.
Issues:
1. Lack of portability between PaaS clouds
2. Event based processor scheduling
3. Security engineering of PaaS application
Benefits:
Some of the key benefits of IaaS are listed below:
1. Full control of computing resources through administrative access
to VMs:
It allows the consumer to access computing resources through administrative access to virtual machines in the following manner:
▪ The consumer issues administrative commands to the cloud provider to run the virtual machine or to save data on the cloud server.
▪ The consumer issues administrative commands to virtual machines they own to start a web server or install new applications.
Issues:
1. Compatibility with legacy security vulnerability
2. Virtual machine Sprawl
3. Robustness of VM level isolation
4. Data erase practices
9.7 SUMMARY
Cloud environments are comprised of highly extensive infrastructure that offers pools of IT resources that can be leased using a pay-for-use model whereby only the actual usage of the IT resources is billable.
The IaaS cloud delivery model offers cloud consumers a high level of
administrative control over “raw” infrastructure-based IT resources.
SaaS is a cloud delivery model for shared cloud services that can be
positioned as commercialized products hosted by clouds.
9.8 REFERENCES
Cloud Computing Concepts, Technology & Architecture by Thomas
Erl, Zaigham Mahmood, and Ricardo Puttini
*****
UNIT VI
10
CLOUD COMPUTING
Unit Structure
10.0 Objective
10.1 Introduction
10.2 Amazon Web Services
10.3 Microsoft Azure and Google Cloud
10.3.1 Compute services
10.3.2 Storage services
10.3.3 Database services
10.3.4 Additional services
10.0 OBJECTIVE
The main motivation behind cloud computing is to enable businesses to
get access to data centres and manage tasks from a remote location. Cloud
computing works on the pay-as-you-go pricing model, which helps
businesses lower their operating cost and run infrastructure more
efficiently.
How does cloud computing work?
10.1 INTRODUCTION
Cloud Computing is the delivery of computing services such as servers,
storage, databases, networking, software, analytics, intelligence, and more,
over the Cloud (Internet).
Scalability: We can increase or decrease the resources required according to the business requirements.
Productivity: While using cloud computing, we put less operational
effort. We do not need to apply patching, as well as no need to
maintain hardware and software. So, in this way, the IT team can be
more productive and focus on achieving business goals.
Reliability: Backup and recovery of data are less expensive and very
fast for business continuity.
Security: Many cloud vendors offer a broad set of policies,
technologies, and controls that strengthen our data security.
Public Cloud: The cloud resources that are owned and operated by a
third-party cloud service provider are termed as public clouds. It
delivers computing resources such as servers, software, and storage
over the internet
Private Cloud: The cloud computing resources that are exclusively
used inside a single business or organization are termed as a private
cloud. A private cloud may physically be located on the company’s
on-site datacentre or hosted by a third-party service provider.
Hybrid Cloud: It is the combination of public and private clouds, bound together by technology that allows data and applications to be shared between them. A hybrid cloud provides flexibility and more deployment options to the business.
Uses of AWS:
o A small manufacturing organization can leave its IT management to AWS and use its own expertise to expand the business.
o A large enterprise spread across the globe can utilize AWS to deliver training to its distributed workforce.
o An architecture consulting company can use AWS for high-compute rendering of construction prototypes.
o A media company can use AWS to deliver different types of content, such as ebooks or audio files, to users worldwide.
Pay-As-You-Go:
AWS provides its services to customers on a Pay-As-You-Go basis: services are available when required, without any prior commitment or upfront investment, and customers pay only for what they use. The services customers can procure from AWS include:
o Computing
o Programming models
o Database storage
o Networking
Advantages of AWS:
1) Flexibility:
o We can get more time for core business tasks due to the instant
availability of new features and services in AWS.
o It provides effortless hosting of legacy applications. AWS does not require learning new technologies; migrating applications to AWS provides advanced computing and efficient storage.
o AWS also offers a choice of whether to run applications and services together or not. We can also choose to run a part of the IT infrastructure in AWS and the remaining part in our own data centres.
2) Cost-effectiveness:
AWS requires no upfront investment or long-term commitment and involves minimal expense compared to traditional IT infrastructure, which requires huge investments.
3) Scalability/Elasticity:
With AWS auto scaling and elastic load balancing, resources are automatically scaled up or down as demand increases or decreases. These techniques are ideal for handling unpredictable or very high loads, so organizations enjoy the benefits of reduced cost and increased user satisfaction.
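For instance (a minimal sketch using the AWS boto3 SDK; the Auto Scaling group and policy names are hypothetical), a target-tracking policy keeps the group's average CPU near a chosen value, scaling out and in automatically:

    import boto3

    autoscaling = boto3.client("autoscaling", region_name="us-east-1")

    # Keep the group's average CPU near 50%; AWS adds or removes
    # instances automatically as demand rises or falls.
    autoscaling.put_scaling_policy(
        AutoScalingGroupName="web-asg",  # hypothetical group name
        PolicyName="cpu-target-50",
        PolicyType="TargetTrackingScaling",
        TargetTrackingConfiguration={
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": 50.0,
        },
    )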
4) Security:
AWS provides end-to-end security and privacy to customers.
AWS has a virtual infrastructure that offers optimum availability
while managing full privacy and isolation of their operations.
Customers can expect a high level of physical security because of Amazon's many years of experience in designing, developing, and maintaining large-scale IT operation centers.
AWS ensures the three aspects of security, i.e., confidentiality, integrity, and availability of users' data.
Azure Services:
Data services: This service is used to store data over the cloud that can be scaled according to requirements. It includes Microsoft Azure Storage (Blob, Queue, Table, and Azure File services), Azure SQL Database, and the Redis Cache; a short usage sketch follows this list.
Network services: It helps you to connect with the cloud and on-
premises infrastructure, which includes Virtual Networks, Azure
Content Delivery Network, and the Azure Traffic Manager.
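As an illustration of the data services above (a minimal sketch, assuming the azure-storage-blob Python package; the connection string, container, and blob names are placeholders), uploading a file to Azure Blob Storage looks like this:

    from azure.storage.blob import BlobServiceClient

    # Placeholder connection string, taken from the storage account's access keys.
    service = BlobServiceClient.from_connection_string("<connection-string>")

    # Write a local file into the "backups" container as "report.csv".
    blob = service.get_blob_client(container="backups", blob="report.csv")
    with open("report.csv", "rb") as data:
        blob.upload_blob(data, overwrite=True)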
● … including getting licenses and ensuring the best safety in all operations.
● There is a vast user base already on Microsoft Azure, and the infrastructure is constantly scaling up, running more processes for applications and selling more storage through the cloud. It can run without any additional coding.
● A hybrid cloud computing ecosystem is still a unique feature of
Microsoft Azure. It can improve the performance by utilizing Virtual
Private Networks (VPNs), ExpressRoute connections, caches, CDNs,
etc.
● As most enterprises rely on MS office tools, it is wise to invest in a
cloud platform that integrates well with all Microsoft products.
Additionally, knowing C++, C, and Visual Basic can help you steer
your career in Microsoft Azure. If you require further validation, then
you can try out the Azure certification courses for Windows
certificates.
● Microsoft Azure has intelligence and analytics capabilities to improve business processes with the help of machine learning bots, cognitive APIs, and Blockchain as a Service (BaaS).
● Microsoft Azure also has SQL and NoSQL data processing facilities to derive deeper, actionable insights from the available data.
● One of the major reasons to choose Azure services is the affordability,
as the virtual infrastructure maintenance is extremely cost-efficient.
Azure covers more global regions than any other cloud provider, which offers the scalability needed to bring applications and users closer around the world. It is globally available in 50 regions around the world. Due to its availability over many regions, it helps in preserving data residency and offers comprehensive compliance and flexible options to the customers.
The basic fundamental building block available in Azure is the SQL database. Microsoft offers the SQL server and SQL database on Azure in many ways: we can deploy a single database, or we can deploy multiple databases as part of a shared elastic pool.
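By way of example (a minimal sketch, assuming the pyodbc package and the Microsoft ODBC driver are installed; the server, database, and credentials are placeholders), an application connects to an Azure SQL database like any other SQL Server endpoint:

    import pyodbc

    conn_str = (
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=tcp:myserver.database.windows.net,1433;"  # placeholder server
        "DATABASE=mydb;UID=appuser;PWD=<password>;"
        "Encrypt=yes;"
    )
    conn = pyodbc.connect(conn_str)
    cursor = conn.cursor()
    cursor.execute("SELECT @@VERSION")  # simple round-trip check
    print(cursor.fetchone()[0])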
Data Factory:
Data Factory is an ETL (extract, transform, load) tool offered on the cloud. Using Data Factory, we can connect to different databases, extract data (even from our on-premises data center), perform transformations, and load the result into a destination such as the Azure SQL database.
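Conceptually, the kind of pipeline Data Factory automates boils down to three steps. The toy sketch below (plain Python with the standard library; the data and table are made up, and sqlite stands in for the real destination) shows extract, transform, and load in miniature:

    import sqlite3

    # Extract: rows as they might arrive from an on-premises source (made-up data).
    rows = [("alice", 120.0), ("bob", 80.5)]

    # Transform: convert currency amounts to integer cents.
    transformed = [(name, int(amount * 100)) for name, amount in rows]

    # Load: insert into a destination table (sqlite stands in for Azure SQL here).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer TEXT, amount_cents INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", transformed)
    conn.commit()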
Security:
All the databases existing in Azure need to be secured, and we need to accept connections only from known origins. For this purpose, all these database services come with firewall rules, where we can configure the particular IP addresses from which we want to allow connections. We can define those firewall rules to limit the number of connections and reduce the attack surface.
Cosmos DB:
Cosmos DB is a NoSQL data store available in Azure, designed to be globally scalable and highly available with extremely low latency; Microsoft guarantees latency bounds for reads and writes with Cosmos DB. For example, if we have applications such as IoT or gaming, where we get a lot of data from users spread across the globe, we would go for Cosmos DB, because its global scalability and high availability mean our users will experience low latency.
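As a small illustration (a minimal sketch, assuming the azure-cosmos Python SDK; the account URL, key, database, and container names are placeholders), writing a document to Cosmos DB:

    from azure.cosmos import CosmosClient

    # Placeholder account endpoint and key.
    client = CosmosClient(
        "https://myaccount.documents.azure.com:443/", credential="<account-key>"
    )
    container = client.get_database_client("gamedb").get_container_client("scores")

    # Upsert a score document; Cosmos DB replicates it to the regions
    # configured for the account, so nearby users read it with low latency.
    container.upsert_item({"id": "player-42", "score": 1337, "region": "EU"})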
Finally, two things remain. First, we need to secure all these services; for that purpose, we can integrate them with Azure Active Directory and manage users from Azure Active Directory as well. Second, to monitor all these services, we can use the Security Center. Individual monitoring tools exist too, but Azure Security Center will keep monitoring all these services and provide recommendations if something is wrong.
*****
11
CLOUD PLATFORMS
Unit Structure
11.0 Objective
11.1 Introduction
11.2 Google App Engine (GAE)
11.3 Aneka
11.4 Comparative study of various cloud computing platforms
11.0 OBJECTIVE
A computer platform is a system that consists of a hardware device and
an operating system that an application, program or process runs upon. An
example of a computer platform is a desktop computer with Microsoft
Windows installed on it. A desktop is a hardware device and Windows is
an operating system.
The operating system acts as an interface between the computer and the
user and also between the computer and the application. So, in order to
have a functional device, you need hardware and an operating system
together to make a usable computer platform for a program to run on.
The hardware portion of a computer platform consists of a processor,
memory, and storage. The processor is a bit like your brain and memory is
like a scratchpad for your brain to use while you're working out a problem.
It used to be that people referred to different computer platforms by their
physical size, from smallest to largest - microcomputers (smallest),
minicomputers (mid-size), and mainframes (largest). The term
microcomputer has fallen somewhat out of favor - now most people just
refer to these machines as computers or personal computers.
11.1 INTRODUCTION
Cloud computing is the delivery of computing services—including
servers, storage, databases, networking, software, analytics, and
intelligence—over the Internet ("the cloud") to offer faster innovation,
flexible resources, and economies of scale.
One benefit of using cloud-computing services is that firms can avoid the upfront cost and complexity of owning and maintaining their own IT infrastructure, and instead simply pay for what they use, when they use it.
In turn, providers of cloud-computing services can benefit from significant
economies of scale by delivering the same services to a wide range of
customers.
Now let's talk about some of these services in brief.
Compute Engine:
The Compute Engine service is Google’s unmanaged compute service. We
can think of Compute Engine as an Infrastructure as a Service (IaaS)
offering by Google Cloud. As the service is unmanaged, it is our
responsibility to configure, administer, and monitor the system. On
Google’s side, they will ensure that resources are available, reliable, and
ready for you to use. The main benefit of using Compute Engine is that you have complete control of the systems.
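For example (a minimal sketch, assuming the google-cloud-compute Python client; the project and zone are placeholders), listing the VM instances you administer in one zone:

    from google.cloud import compute_v1

    client = compute_v1.InstancesClient()

    # Enumerate instances in one zone of a placeholder project.
    for instance in client.list(project="my-project", zone="us-central1-a"):
        print(instance.name, instance.status)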
App Engine:
The App Engine is Google’s Platform as a Service(PaaS) offering. It is a
compute service that provides a managed platform for running
applications. As this is a managed service, your focus should be on the
application only and Google will manage the resources needed to run the
application. Thus App Engine users have less to manage, but you will
have less control over the compute resources. The applications hosted on
App Engine are highly scalable and run reliably even under heavy load.
The App Engine supports the following languages:
● Python
● Go
● Ruby
● PHP
● Node.js
● Java
● .NET
The App Engine provides two types of runtime environments: standard
and flexible.
1. The Standard environment provides a secured and sandboxed
environment for running applications and distributes requests across
multiple servers to meet the demand. The applications run
independently of the hardware, OS, and physical location of the
server.
2. The Flexible environment provides more options and control to the
developers who want to use App Engine, but without the language
constraints of the standard environment. It uses Docker containers as
the basic building blocks. These containers can be auto-scaled
according to load.
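To make this concrete (a minimal sketch, assuming a Python runtime with Flask, a common choice for App Engine applications; not tied to any particular project), the application code is just a web handler, and App Engine supplies the serving infrastructure and scaling:

    from flask import Flask

    app = Flask(__name__)

    @app.route("/")
    def index():
        # App Engine routes incoming HTTP requests to handlers like this one.
        return "Hello from App Engine!"

    if __name__ == "__main__":
        # Local testing only; in production App Engine serves `app` itself.
        app.run(host="127.0.0.1", port=8080)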
Difference between Compute Engine and App Engine:
Service model: Compute Engine is an IaaS offering, while App Engine is a PaaS offering.
Type of service: Compute Engine is an unmanaged service; App Engine is a managed service.
Control over resources: Compute Engine offers more control and flexibility; App Engine offers less control over computing resources.
Costs: Compute Engine costs less than App Engine; App Engine costs more than Compute Engine.
Running instances: With Compute Engine, at least one instance must be running while the application runs; App Engine can scale down to zero instances when no requests are coming.
Use cases: Compute Engine is best for general computing workloads; App Engine is best for web-facing and mobile applications.
Autoscaling: Compute Engine autoscaling is slower; App Engine autoscaling is faster.
Security: Compute Engine is less secure than App Engine; App Engine is comparatively more secure.
11.3 ANEKA
Aneka includes an extensible set of APIs associated with programming
models like MapReduce.
These APIs support different cloud models, such as private, public, and hybrid clouds.
Manjrasoft focuses on creating innovative software technologies to simplify the development and deployment of private or public cloud applications. Its product Aneka plays the role of an application platform as a service for multiple cloud computing models.
Structure of Aneka:
Aneka is a software platform for developing cloud computing applications.
In Aneka, cloud applications are executed.
Aneka is a pure PaaS solution for cloud computing.
Aneka is a cloud middleware product.
Aneka can be deployed over a network of computers, a multicore server, a data center, a virtual cloud infrastructure, or a combination thereof.
The services of the Aneka container can be classified into three major categories:
1. Fabric Services
2. Foundation Services
3. Application Services
1. Fabric Services:
Fabric Services define the lowest level of the software stack that represents the Aneka container. They provide access to the resource-provisioning subsystem and the monitoring features implemented in Aneka.
2. Foundation Services:
Foundation Services are the core services of the Aneka Cloud and define the infrastructure management features of the system. Foundation Services are concerned with the logical management of a distributed system built on top of the infrastructure and provide ancillary services for delivering applications.
3. Application Services:
Application Services manage the execution of applications; they provide the middleware with the implementations of the programming models that Aneka offers.
Taken together, these services make Aneka a runtime engine and platform for managing the deployment and execution of applications on a private or public cloud.
Architecture of Aneka:
An Aneka-based computing cloud is a collection of physical and virtualized resources connected via a network, either the Internet or a private intranet. Each resource hosts an instance of the Aneka container that represents the runtime environment where distributed applications are executed. The container provides the basic management features of a single node and takes advantage of all the other functions through the services it hosts.
One of the key features of Aneka is its ability to provide a variety of ways to express distributed applications by offering different programming models; execution services are mostly concerned with providing the middleware with the implementation of these models. Additional services such as persistence and security are transversal to the entire stack of services hosted by the container.
11.4 COMPARATIVE STUDY OF VARIOUS CLOUD COMPUTING PLATFORMS
The table below shows a comparative study of various cloud platforms.
12
CLOUD ISSUES AND CHALLENGES
Unit Structure
12.0 Objective
12.1 Introduction
12.2 Cloud computing issues and challenges
12.2.1 Security
12.2.2 Elasticity
12.2.3 Resource management and scheduling
12.3 Quality of service (QoS) and Resource allocation
12.4 Identity and Access management
12.0 OBJECTIVE
To study more about cloud issues and challenges.
To study about cloud security and elasticity.
To study and understand about Quality of Service (QoS) and resource
allocation.
12.1 INTRODUCTION
In Simplest terms, cloud computing means storing and accessing the data
and programs on remote servers that are hosted on internet instead of
computer’s hard drive or local server. Cloud computing is also referred as
Internet based computing.
Hosting a cloud:
There are three layers in cloud computing. Companies use these layers
based on the service they provide.
● Infrastructure
● Platform
● Application
The following are some of the main issues and challenges of cloud computing:
1. Security
2. Password Security
3. Cost Management
4. Lack of Expertise
5. Internet Connectivity
6. Control or Governance
7. Compliance
8. Multiple Cloud Management
9. Creating a private cloud
10. Performance
11. Migration
12. Interoperability and Portability
13. Reliability and High Availability
14. Hybrid-Cloud Complexity
1. SECURITY:
The topmost concern when investing in cloud services is security. Your data gets stored and processed by a third-party vendor, and you cannot see it. Every other day you get informed about broken authentication, compromised credentials, account hacking, data breaches, and the like in one organization or another, which makes you a little more skeptical.
Fortunately, cloud providers these days have started to put effort into improving their security capabilities. You can be cautious as well by verifying that the provider implements a safe user identity management system and access control procedures. Also, ensure that it implements database security and privacy protocols.
2. PASSWORD SECURITY:
As large numbers of people access your cloud account, it becomes vulnerable: anybody who knows your password or hacks into your cloud will be able to access your confidential information.
Here the organization should use multi-level (multi-factor) authentication and ensure that passwords remain protected. Passwords should also be modified regularly, especially when a particular employee resigns and leaves the organization, and access rights to usernames and passwords should be granted judiciously.
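One common second factor is a time-based one-time password (TOTP). The sketch below (assuming the pyotp package; purely illustrative) shows how a per-user secret yields short-lived codes that must accompany the password:

    import pyotp

    # Enrolment: generate and store a per-user secret
    # (usually shown to the user as a QR code to scan).
    secret = pyotp.random_base32()
    totp = pyotp.TOTP(secret)

    # Login: the user submits the 6-digit code from their authenticator app.
    code = totp.now()          # here we simulate the user's app
    print(totp.verify(code))   # True only within the validity window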
3. COST MANAGEMENT:
Cloud computing enables you to access application software over a fast internet connection and lets you save on investing in costly computer hardware, software, management, and maintenance. This makes it affordable. What is challenging and expensive is tuning the organization's needs on the third-party platform. Another costly affair is the cost of transferring data to a public cloud, especially for a small business or project.
4. LACK OF EXPERTISE:
With the increasing workload on cloud technologies and continuously
improving cloud tools, the management has become difficult. There has
been a consistent demand for a trained workforce who can deal with cloud
computing tools and services. Hence, firms need to train their IT staff to
minimize this challenge.
5. INTERNET CONNECTIVITY:
The cloud services are dependent on a high-speed internet connection. So
the businesses that are relatively small and face connectivity issues should
ideally first invest in a good internet connection so that no downtime
happens. It is because internet downtime might incur vast business losses.
6. CONTROL OR GOVERNANCE:
Another ethical issue in cloud computing is maintaining proper control
over asset management and maintenance. There should be a dedicated
team to ensure that the assets used to implement cloud services are used
according to agreed policies and dedicated procedures, that they are properly maintained, and that they are used to meet your organization's goals successfully.
7. COMPLIANCE:
Another major risk of cloud computing is maintaining compliance. By
compliance we mean, a set of rules about what data is allowed to be
moved and what should be kept in-house to maintain compliance. The
organizations must follow and respect the compliance rules set by various
government bodies.
11. MIGRATION:
Migration is nothing but moving a new application or an existing
application to a cloud. In the case of a new application, the process is
pretty straightforward. But if it is an age-old company application, it
becomes tedious.
14. HYBRID-CLOUD COMPLEXITY:
Challenges such as scalability, integration, and disaster recovery are magnified in a hybrid cloud environment.
12.2.1 Security:
Cloud computing is a type of technology that provides remote services on the internet to manage, access, and store data rather than storing it on servers or local drives (for this reason it is sometimes loosely called serverless technology, since users do not manage the servers themselves). Here the data can be anything: images, audio, video, documents, files, etc.
Data Loss:
Data loss is one of the issues faced in cloud computing; it is also known as data leakage. Our sensitive data is in the hands of somebody else, and we do not have full control over our database, so if the security of the cloud service is breached by hackers, they may gain access to our sensitive data or personal files.
Lack of Skill:
Shifting to another service provider, needing an extra feature, working out how to use a feature, and similar tasks are the main problems in an IT company that lacks skilled employees. Working with cloud computing therefore requires skilled staff.
12.2.2 Elasticity:
Elasticity refers to the ability of a cloud to automatically expand or compress infrastructural resources on a sudden rise or fall in demand so that the workload can be managed efficiently. This elasticity helps to minimize infrastructural cost. It is not applicable to every kind of environment; it helps only in those scenarios where resource requirements fluctuate up and down suddenly for a specific time interval. It is not practical where a persistent resource infrastructure is required to handle a heavy workload.
It is most commonly used in pay-per-use public cloud services, where IT managers are willing to pay only for the duration for which they consumed the resources.
Example:
Consider an online shopping site whose transaction workload increases during a festive season like Christmas. For this specific period, resources need to be spiked up. To handle this kind of situation, we can go for a cloud elasticity service rather than cloud scalability; as soon as the season is over, the deployed resources can be requested for withdrawal.
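A toy policy illustrates the idea (plain Python; the capacity figures are invented): compute how many instances the current load needs, clamped between a floor and a ceiling, so the festive spike drives the count up and the off-season brings it back down:

    import math

    def desired_instances(load_rps, per_instance_rps=100, floor=1, ceiling=20):
        # How many instances the current request rate needs, within limits.
        needed = math.ceil(load_rps / per_instance_rps)
        return max(floor, min(ceiling, needed))

    print(desired_instances(1500))  # festive spike -> 15 instances
    print(desired_instances(40))    # off-season    -> back to 1 instance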
Cloud Scalability:
Cloud scalability is used to handle the growing workload where good
performance is also needed to work efficiently with software or
applications. Scalability is commonly used where the persistent
deployment of resources is required to handle the workload statically.
Example:
Consider you are the owner of a company whose database size was small in the early days, but as time passed your business grew and the size of your database increased. In this case, you just need to request your cloud service vendor to scale up your database capacity to handle the heavy workload.
This is totally different from what you read above about cloud elasticity: scalability is used to fulfill the static needs of the organization, while elasticity is used to fulfill its dynamic needs. Like elasticity, scalability is provided by the cloud on a pay-per-use basis. In conclusion, we can say that scalability is useful where the workload remains high and increases statically.
Types of Scalability:
1. Vertical Scalability (Scale-up):
In this type of scalability, we increase the power of existing resources in
the working environment in an upward direction.
188
188
Cloud Issues and Challenges
2. Horizontal Scalability:
In this kind of scaling, the resources are added in a horizontal row.
3. Diagonal Scalability:
It is a mixture of both Horizontal and Vertical scalability where the
resources are added both vertically and horizontally.
The policies for cloud resource management (CRM) can be loosely grouped into five classes:
(1) admission control;
(2) capacity allocation;
(3) load balancing;
(4) energy optimization; and
(5) quality of service (QoS) guarantees.
The explicit goal of an admission control policy is to prevent the system
from accepting workload in violation of high-level system policies. A
system should not accept additional workload if this would prevent it from
completing work already in progress or contracted. Limiting the workload
requires some knowledge of the global state of the system.
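A minimal sketch of the idea (plain Python; the capacity model is deliberately simplified): the controller tracks the workload it has committed to and rejects any request that would push it past capacity, protecting work already accepted:

    class AdmissionController:
        def __init__(self, capacity):
            self.capacity = capacity   # total workload the system may commit to
            self.committed = 0.0

        def admit(self, demand):
            # Reject work that would violate commitments already made.
            if self.committed + demand > self.capacity:
                return False
            self.committed += demand
            return True

    ctrl = AdmissionController(capacity=100.0)
    print(ctrl.admit(60.0))  # True: fits within capacity
    print(ctrl.admit(50.0))  # False: would overcommit the system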
Capacity allocation means to allocate resources for individual instances.
An instance is an activation of a service on behalf of a cloud user.
Locating resources subject to multiple global optimization constraints
requires a search in a very large search space. Capacity allocation is more
challenging when the state of individual servers changes rapidly.
Load balancing and energy optimization are correlated and affect the cost
of providing the services; they can be done locally, but global load
balancing and energy optimization policies encounter the same difficulties
as the capacity allocation. Quality of service (QoS) is probably the most
challenging aspect of resource management and, at the same time,
possibly the most critical for the future of cloud computing.
Resource management policies must be based on a disciplined approach,
rather than ad hoc methods.
To our knowledge, none of the optimal or near-optimal methods to address the five classes of policies scale up; thus, there is a need to develop novel strategies for resource management in a computer cloud. Typically, these methods target a single aspect of resource management, e.g., admission control, but ignore energy conservation; many require very complex computations that cannot be done effectively in the time available to respond.
Performance models required by some of the methods are very complex,
analytical solutions are intractable, and the monitoring systems used to
gather state information for these models can be too intrusive and unable
to provide accurate data. Many techniques are concentrated on system
performance in terms of throughput and time in system, but they rarely
include energy trade-offs or QoS guarantees. Some techniques are based
on unrealistic assumptions; for example, capacity allocation is viewed as
an optimization problem, but under the assumption that servers are
protected from overload.
Virtually all mechanisms for the implementation of resource management policies require the presence of a few systems which monitor and control the entire cloud, while the large majority of systems run applications and store data; some of these mechanisms require two-level control, one at the cloud level and one at the application level. The strategies for resource management associated with IaaS, PaaS, and SaaS will differ, but in all cases the providers are faced with large, fluctuating loads.
In some cases, when a spike can be predicted, the resources can be
provisioned in advance, e.g., for Web services subject to seasonal spikes.
For an unplanned spike, the situation is slightly more complicated. Auto-
scaling can be used for unplanned spike loads provided that: (a) there is a
pool of resources that can be released or allocated on demand and (b) there
is a monitoring system which allows a control loop to decide in real time
to reallocate resources.
In the traditional view of a computing system, resource allocation, i.e., the sharing of a common set of resources between applications contending for their use, was a task performed by the operating system, and user applications had little control over this process. This situation has changed due to the following causes:
● The increasing number of applications subject to strong constraints in
time and space, e.g., embedded systems and applications managing
multimedia data.
● The growing variability of the environment and operating conditions
of many applications, e.g., those involving mobile communications.
● The trend towards more open systems, and the advent of open
middleware.
Thus, an increasing part of resource management is being delegated to the
upper levels, i.e., to the middleware layers and to the applications
themselves. In this chapter, we examine some aspects of resource
allocation in these upper levels.
The term resource applies to any identifiable entity (physical or virtual)
that is used by a system for service provision. The entity that actually
implements service provision, using resources, is called a resource
principal. Examples of physical resources are processors, memory, disk
storage, routers, network links, sensors. Examples of virtual resources are
virtual memory, network bandwidth, files and other data (note that virtual
resources are abstractions built on top of physical resources). Examples of
resource principals are processes in an operating system, groups of
processes dedicated to a common task (possibly across several machines),
various forms of "agents" (computational entities that may move or spread
across the nodes of a network). One may define a hierarchy of principals:
for example, a virtual machine provides (virtual) resources to the
applications it supports, while requesting (physical or even virtual)
resources from a hypervisor.
Goals and Policies:
The role of resource management is to allocate resources to the service
providers (principals), subject to the requirements and constraints of both
service providers and resource providers. The objective of a service
provider is to respect its SLA, the contract that binds it to service
requesters (clients). The objective of a resource provider is to maximize
the utilization rate of its resources, and possibly the revenue it draws from
their provision. The relationships between clients, service providers, and
resource providers are illustrated in the figure below.
12.2.3 Resource Management for Internet Services:
An increasing number of services are available over the Internet, and are
subject to high demand. Internet services include electronic commerce, e-
mail, news diffusion, stock trading, and many other applications. As these
services provide a growing number of functions to their users, their scale
and complexity have also increased. Many services may accept requests
from millions of clients. Processing a request submitted by a client
typically involves several steps, such as analyzing the request, looking up
one or several databases to find relevant information, doing some
processing on the results of the queries, dynamically generating a web
page to answer the request, and sending this page to the client. This cycle
may be shortened, e.g., if a result is available in a cache. To accommodate
this interaction pattern, a common form of organization of Internet
services is a multi-tier architecture, in which each tier is in charge of a
specific phase of request processing.
To answer the demand in computational power and storage space imposed
by large scale applications, clusters of commodity, low cost machines
have proved an economic alternative to mainframes and high-performance
multiprocessors. In addition to flexibility, clusters allow high availability
by replicating critical components. Thus each tier of a cluster-based
application is deployed on a set of nodes (Figure). How the application
components running on the servers of the different tiers are connected
together depends on the architecture of the application; examples may be
found in the rest of this chapter. Nodes may be reallocated between
different tiers, according to the resource allocation policy.
● The fixed cost of ownership (i.e., the part of the cost that is not proportional to the amount of resources) is shared between the users of the common facility.
● Mutualizing a large pool of resources between several applications
allows reacting to load peaks by reallocating resources, provided the
peaks are not correlated for the different applications.
● Resource sharing improves global availability, because of redundancy
in hosts and network connections.
Components of IAM:
Users
Roles
Groups
Policies
With new applications being created over the cloud, on mobile, and on-premises that can hold sensitive and regulated information, it is no longer acceptable or feasible to just create an identity server and provide access based on requests. In current times an organization should be able to track the flow of information and provide least-privileged access as and when required. Obviously, with a large workforce and new applications being added every day, it becomes quite difficult to do this, so organizations concentrate on managing identity and its access with the help of a few IAM tools. It is very difficult for a single tool to manage everything, but there are multiple IAM tools in the market that help organizations with any of the services given below.
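As one concrete example (a minimal sketch using the AWS boto3 SDK against AWS IAM, one such tool; the user name is hypothetical, and ReadOnlyAccess is an AWS-managed policy), creating an identity and granting it least-privileged access:

    import boto3

    iam = boto3.client("iam")

    # Create an identity for a reporting job (hypothetical name) ...
    iam.create_user(UserName="report-bot")

    # ... and grant only the access it needs: read-only, nothing more.
    iam.attach_user_policy(
        UserName="report-bot",
        PolicyArn="arn:aws:iam::aws:policy/ReadOnlyAccess",
    )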
Services By IAM:
Identity management
Access management
Federation
RBAC/EM
Multi-Factor authentication
Access governance
Customer IAM
API Security
IDaaS – Identity as a service
Granular permissions
Privileged Identity Management – PIM (PAM and PIM are the same)