Distributed OS

• Advantages of a distributed system over independent PCs
• Disadvantages of a distributed system
• Hardware concept
• Even though all distributed systems consist of multiple CPUs, there are several different
ways the hardware can be organized, especially in terms of how they are interconnected
and how they communicate.
• In this section we will take a brief look at distributed system hardware, in particular, how
the machines are connected together. In the next section we will examine some of the
software issues related to distributed systems.
• Various classification schemes for multiple CPU computer systems have been proposed
over the years, but none of them have really caught on and been widely adopted.
Probably the most frequently cited taxonomy is Flynn’s (1972), although it is fairly
rudimentary. Flynn picked two characteristics that he considered essential:
• the number of instruction streams
• the number of data streams
• A computer with a single instruction stream and a single data stream is
called SISD.
• SIMD, single instruction stream, multiple data stream. This type refers to
array processors with one instruction unit that fetches an instruction,
and then commands many data units to carry it out in parallel, each with
its own data.
• MISD, multiple instruction stream, single data stream. No known
computers fit this model.
• Finally comes MIMD, which essentially means a group of independent
computers, each with its own program counter, program, and data. All
distributed systems are MIMD.
• We divide all MIMD computers into two groups: those that have shared memory,
usually called multiprocessors, and those that do not, sometimes called
multicomputers.
• The essential difference is this: in a multiprocessor, there is a single virtual address
space that is shared by all CPUs. If any CPU writes, for example, the value 44 to
address 1000, any other CPU subsequently reading from its address 1000 will get the
value 44. All the machines share the same memory.
• In contrast, in a multicomputer, every machine has its own private memory. If one
CPU writes the value 44 to address 1000, when another CPU reads address 1000 it
will get whatever value was there before; the write of 44 does not affect the other
machine's memory at all. A common example of a multicomputer is a collection of
personal computers connected by a network.
• Both multiprocessors and multicomputers can be tightly coupled as well as loosely
coupled; a small sketch contrasting shared and private memory follows.
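• A rough illustration of the distinction, as a minimal Python sketch (not from the slides): threads stand in for CPUs sharing one address space, and a child process stands in for a node with private memory; the address 1000 and value 44 are taken from the example above.

# A minimal sketch: threads share one address space (multiprocessor-like),
# while a separate process works on its own private copy (multicomputer-like).
import threading
import multiprocessing

memory = {1000: 0}           # pretend this dict is "address 1000"

def write_44():
    memory[1000] = 44        # write the value 44 to address 1000

if __name__ == "__main__":
    # Shared memory: a thread's write is visible to every other "CPU".
    t = threading.Thread(target=write_44)
    t.start(); t.join()
    print("after thread write:", memory[1000])    # 44

    # Private memory: the child process changes only its own copy.
    memory[1000] = 0
    p = multiprocessing.Process(target=write_44)
    p.start(); p.join()
    print("after process write:", memory[1000])   # still 0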
• Software concept
• Although the hardware is important, the software is even more important. The image that a system
presents to its users, and how they think about the system, is largely determined by the operating
system software, not the hardware.
• Operating systems cannot be put into nice, neat pigeonholes like hardware. By nature software is vague
and amorphous. Still, it is more-or-less possible to distinguish two kinds of operating systems for
multiple CPU systems: loosely coupled and tightly coupled. As we shall see, loosely and tightly-coupled
software is roughly analogous to loosely and tightly-coupled hardware.
• Loosely-coupled software allows machines and users of a distributed system to be fundamentally
independent of one another, but still to interact to a limited degree where that is necessary. Consider a
group of personal computers, each of which has its own CPU, its own memory, its own hard disk, and
its own operating system, but which share some resources, such as laser printers and data bases, over
a LAN. This system is loosely coupled, since the individual machines are clearly distinguishable, each
with its own job to do. If the network should go down for some reason, the individual machines can still
continue to run to a considerable degree, although some functionality may be lost (e.g., the ability to
print files).
• To show how difficult it is to make definitions in this area, now consider the same system as
above, but without the network. To print a file, the user writes the file on a
floppy disk, carries it to the machine with the printer, reads it in, and then prints it. Is this
still a distributed system, only now even more loosely coupled? It’s hard to say. From a
fundamental point of view, there is not really any theoretical difference between
communicating over a LAN and communicating by carrying floppy disks around. At most one
can say that the delay and data rate are worse in the second example.
• At the other extreme we might find a multiprocessor dedicated to running a single chess
program in parallel. Each CPU is assigned a board to evaluate, and it spends its time
examining that board and all the boards that can be generated from it. When the evaluation
is finished, the CPU reports back the results and is given a new board to work on. The
software for this system, both the application program and the operating system required
to support it, is clearly much more tightly coupled than in our previous example.
• Communication Structure in distributed system
• Now that we have discussed the physical aspects of networking, we
turn to the internal workings. The designer of a communication
network must address four basic issues:
• Naming and name resolution. How do two processes locate each other to communicate?
• Routing strategies. How are messages sent through the network?
• Packet strategies. Are packets sent individually or as a sequence?
• Connection strategies. How do two processes send a sequence of messages?
• Naming and Name Resolution
• The first issue in network communication involves the naming of the systems in the network. For a
process at site A to exchange information with a process at site B, each must be able to specify the other.
Within a computer system, each process has a process identifier, and messages may be addressed with
the process identifier. Because networked systems share no memory, however, a host within the system
initially has no knowledge about the processes on other hosts.
• For example, a request made by a process on system A to communicate with bob.cs.brown.edu would
result in the following steps:
• 1. The system library or the kernel on system A issues a request to the name server for the edu domain,
asking for the address of the name server for brown.edu. The name server for the edu domain must be at
a known address, so that it can be queried.
• 2. The edu name server returns the address of the host on which the brown.edu name server resides.
• 3. System A then queries the name server at this address and asks about cs.brown.edu.
• 4. An address is returned. Now, finally, a request to that address for bob.cs.brown.edu returns an Internet
address (host-id) for that host (for example, 128.148.31.100).
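• In practice an application rarely performs these steps itself; it hands the whole lookup to the operating system's resolver, which carries out the queries on its behalf. A minimal Python sketch, with bob.cs.brown.edu taken from the example above:

# Minimal sketch: hand the whole lookup to the resolver, which performs
# the queries described in steps 1-4 on the host's behalf.
import socket

hostname = "bob.cs.brown.edu"                     # example name from the slides
try:
    host_id = socket.gethostbyname(hostname)      # e.g. "128.148.31.100"
    print(hostname, "resolves to", host_id)
except socket.gaierror as err:
    print("name resolution failed:", err)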
Routing Strategies
• When a process at site A wants to communicate with a process at site B, how is the message sent?
If there is only one physical path from A to B, the message must be sent through that path.
However, if there are multiple physical paths from A to B, then several routing options exist.
• Fixed routing. A path from A to B is specified in advance and does not change unless a hardware
failure disables it. Usually, the shortest path is chosen, so that communication costs are minimized.
• Virtual routing. A path from A to B is fixed for the duration of one session. Different sessions
involving messages from A to B may use different paths. A session could be as short as a file transfer
or as long as a remote-login period.
• Dynamic routing. The path used to send a message from site A to site B is chosen only when the
message is sent. Because the decision is made dynamically, separate messages may be assigned
different paths. Site A may decide to send the message to site C; C, in turn, will decide to
send it to site D, and so on. Eventually, some site delivers the message to B. Usually, a site sends a
message to another site on whatever link is the least used at that particular time, as in the sketch below.
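• A toy Python sketch of the dynamic case, with hypothetical neighbour sites and link loads, choosing the least-used link at the moment a message is forwarded:

# Toy dynamic routing at site A: forward on whichever link is least used.
link_load = {"C": 12, "D": 3, "E": 7}    # hypothetical neighbour sites and loads

def forward(message, loads):
    hop = min(loads, key=loads.get)      # pick the least-used link right now
    loads[hop] += 1                      # the new message adds load to that link
    print(f"forwarding {message!r} towards B via site {hop}")
    return hop

forward("hello B", link_load)            # chooses D, currently the least used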
• Packet Strategies
• Messages generally vary in length. To simplify the system design, we commonly
implement communication with fixed-length messages called packets, frames, or
datagrams. A communication implemented in one packet can be sent to its
destination in a connectionless message. A connectionless message can be
unreliable, in which case the sender has no guarantee that, and cannot tell
whether, the packet reached its destination.
• Alternatively, the packet can be reliable. Usually, in this case, an acknowledgement
packet is returned from the destination indicating that the original packet arrived.
(Of course, the return packet could be lost along the way.) If a message is too long
to fit within one packet, or if the packets need to flow back and forth between the
two communicators, a connection is established to allow the reliable exchange of
multiple packets.
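• The following Python sketch (illustrative only, with a hypothetical 8-byte packet size) shows a message being split into fixed-length, sequence-numbered packets and reassembled correctly even when they arrive out of order:

# Sketch: fixed-length, sequence-numbered packets and reassembly.
PACKET_SIZE = 8                          # hypothetical payload size in bytes

def packetize(message: bytes):
    return [(seq, message[i:i + PACKET_SIZE])
            for seq, i in enumerate(range(0, len(message), PACKET_SIZE))]

def reassemble(packets):
    return b"".join(payload for _, payload in sorted(packets))

packets = packetize(b"a message too long for one packet")
packets.reverse()                        # simulate out-of-order arrival
assert reassemble(packets) == b"a message too long for one packet"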
• Connection Strategies
• Once messages are able to reach their destinations, processes can institute
communications sessions to exchange information. Pairs of processes that want to
communicate over the network can be connected in a number of ways. The three most
common schemes are circuit switching, message switching, and packet switching.
Circuit switching. If two processes want to communicate, a permanent physical link is
established between them. This link is allocated for the duration of the communication
session, and no other process can use that link during this period (even if the two
processes are not actively communicating for a while). This scheme is similar to that used
in the telephone system. Once a communication line has been opened between two
parties (that is, party A calls party B), no one else can use this circuit until the
communication is terminated explicitly (for example, when the parties hang up).
Message switching.
• If two processes want to communicate, a temporary link is established for the duration of one
message transfer. Physical links are allocated dynamically among correspondents as needed and
are allocated for only short periods. Each message is a block of data with system information—such
as the source, the destination, and error correction codes (ECC)—that allows the communication
network to deliver the message to the destination correctly. This scheme is similar to the post-office
mailing system. Each letter is a message that contains both the destination address and
source (return) address. Many messages (from different users) can be shipped over the same link.
Packet switching. One logical message may have to be divided into a number of packets. Each packet
may be sent to its destination separately, and each therefore must include a source and a destination
address with its data. Furthermore, the various packets may take different paths through the
network. The packets must be reassembled into messages as they arrive. Note that it is not harmful
for data to be broken into packets, possibly routed separately, and reassembled at the destination.
Breaking up an audio signal (say, a telephone communication), in contrast, could cause great
confusion if it were not done carefully.
Communication in distributed systems
• Communication in distributed systems is a critical aspect that enables components, nodes, or
processes to exchange information and coordinate their activities. Effective communication is
essential for achieving the goals of transparency, fault tolerance, and resource sharing in
distributed computing environments. Here are some key aspects of communication in distributed
systems:
Inter-Process Communication (IPC): Processes in a distributed system need to communicate with
each other, whether they are running on the same machine or different machines. Various
mechanisms for IPC are employed, such as message passing, remote procedure calls (RPC), and
distributed objects. These mechanisms facilitate the exchange of data and commands between
processes.
Message Passing:
1. Synchronous Messaging: Communication happens in real-time, and processes wait for a response.
2. Asynchronous Messaging: Communication is not real-time, and processes can continue their tasks without
waiting for a response.
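• A minimal Python sketch of the two styles, using in-process queues as a stand-in for the network channel (the "server" thread and the message contents are hypothetical):

# Sketch: synchronous vs. asynchronous message passing between two threads,
# with queues standing in for the network channel.
import queue, threading

requests, replies = queue.Queue(), queue.Queue()

def server():
    while True:
        msg = requests.get()             # wait for a request
        replies.put(f"echo: {msg}")      # send the reply back

threading.Thread(target=server, daemon=True).start()

# Synchronous style: send, then block until the reply arrives.
requests.put("ping")
print(replies.get())                     # blocks, then prints "echo: ping"

# Asynchronous style: send and keep working, collect the reply later.
requests.put("ping again")
print("doing other work while the reply is in flight")
print(replies.get(timeout=1))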
Remote Procedure Calls (RPC):
1. RPC allows a process to invoke a procedure (method or function) on a remote machine as if it were a local
procedure.
2. It abstracts the communication details, making remote interactions resemble local function calls.
Distributed Objects:
1. Distributed objects involve the use of objects that can be distributed across multiple machines.
2. Objects communicate by invoking methods on remote objects, and the communication is transparent to the
calling code.
Middleware and Message Brokers:
1. Middleware provides a layer of abstraction for communication, often offering services like message queuing,
publish-subscribe mechanisms, and transaction support.
2. Message brokers (e.g., Apache Kafka, RabbitMQ) facilitate asynchronous communication by allowing
components to publish and subscribe to messages.
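• The following toy broker (an in-process Python sketch, not Kafka or RabbitMQ) illustrates the publish-subscribe pattern that real message brokers provide over the network:

# Toy in-process broker illustrating publish-subscribe (not Kafka/RabbitMQ).
from collections import defaultdict

class Broker:
    def __init__(self):
        self.subscribers = defaultdict(list)       # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers[topic]:   # deliver to every subscriber
            callback(message)

broker = Broker()
broker.subscribe("orders", lambda m: print("billing saw:", m))
broker.subscribe("orders", lambda m: print("shipping saw:", m))
broker.publish("orders", {"id": 42, "item": "disk"})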
Networking Protocols:
1. Distributed systems rely on various networking protocols for communication. Common protocols include
TCP/IP, UDP, HTTP, and others, depending on the requirements of the system.
Message passing
• Message passing in distributed operating systems can be implemented using various
communication mechanisms:
Direct Communication:
1. Processes directly communicate with each other by explicitly naming the recipient.
2. This may involve specifying the process identifier or a communication endpoint.
3. Direct communication can be implemented through send and receive primitives.
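• A minimal Python sketch of direct communication using send and receive primitives over a TCP socket; the sender names the recipient explicitly by its endpoint (the loopback address and port 5000 are hypothetical):

# Sketch of direct communication: the sender names the recipient's
# endpoint explicitly and uses send/receive primitives over TCP.
import socket, threading

ready = threading.Event()

def receiver():
    # Receiving process: listens on a well-known endpoint (hypothetical port).
    with socket.create_server(("127.0.0.1", 5000)) as srv:
        ready.set()
        conn, _ = srv.accept()
        with conn:
            print("received:", conn.recv(1024).decode())

t = threading.Thread(target=receiver)
t.start()
ready.wait()

# Sending process: send(recipient, message) with the recipient named explicitly.
with socket.create_connection(("127.0.0.1", 5000)) as s:
    s.sendall(b"hello from the sending process")
t.join()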
Indirect Communication:
1. Processes communicate indirectly through a shared communication channel or message queue.
2. The sending process places a message in a shared buffer or queue, and the receiving process
retrieves the message from the same location.
3. This model allows for more flexibility and decoupling between sender and receiver.
Message-Oriented Communication:
1. In a message-oriented communication model, processes communicate by sending and receiving
messages.
2. Messages typically include both data and metadata, such as the sender's identity, message type, and
payload.
• Message passing is essential for achieving communication and coordination in distributed systems,
enabling processes to work together to perform complex tasks. The choice between synchronous
and asynchronous message passing, as well as the specific communication mechanisms used,
depends on the requirements and design goals of the distributed operating system or application.
• Remote procedure call
• Remote Procedure Call (RPC) is a communication technology that is used
by one program to make a request to another program for utilizing its
service on a network without even knowing the network’s details. A
function call or a subroutine call are other terms for a procedure call.
• It is based on the client-server concept. The client is the program that makes
the request, and the server is the program that provides the service. An RPC,
like a local procedure call, is normally a synchronous operation: the requesting
program is suspended until the remote procedure returns its results. Multiple
RPCs can be executed concurrently by utilizing lightweight processes or threads
that share the same address space.
• A remote procedure call occurs in the following steps:
1. The client procedure calls the client stub in the normal way.
2. The client stub builds a message and traps to the kernel.
3. The kernel sends the message to the remote kernel.
4. The remote kernel gives the message to the server stub.
5. The server stub unpacks the parameters and calls the server.
6. The server does the work and returns the result to the stub.
7. The server stub packs it in a message and traps to the kernel.
8. The remote kernel sends the message to the client's kernel.
9. The client’s kernel gives the message to the client stub.
10.The stub unpacks the result and returns to the client.
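• These steps can be seen in miniature with Python's built-in XML-RPC support, where the library plays the role of the client and server stubs and the message transport; the add procedure and port 8000 are hypothetical:

# Sketch of one RPC round trip using Python's built-in XML-RPC stubs; the
# library performs the packing, transport and unpacking described above.
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def add(a, b):                       # the remote procedure on the server
    return a + b

server = SimpleXMLRPCServer(("127.0.0.1", 8000), logRequests=False)
server.register_function(add, "add")
threading.Thread(target=server.serve_forever, daemon=True).start()

proxy = ServerProxy("http://127.0.0.1:8000/")
print(proxy.add(2, 3))               # looks like a local call, prints 5
server.shutdown()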
Process in distributed system
In a distributed system, a process refers to an instance of a program in execution. A process is an independent entity
that runs on a specific node (a computer or server) within the distributed environment. Processes in a distributed system
may need to communicate and collaborate with each other to achieve common goals. Here are some key aspects related
to processes in distributed systems:
Definition:
1. A process is an execution unit that consists of an executable program, associated data, and a state.
2. It is an independent unit of execution with its own memory space and resources.
Characteristics of Processes in Distributed Systems:
3. Concurrency: Multiple processes can run concurrently in a distributed system, performing tasks simultaneously on different
nodes.
4. Independence: Each process is independent and has its own local memory space, making it isolated from other processes.
5. Communication: Processes may need to communicate with each other to share information, coordinate activities, or exchange
data.
Communication Between Processes:
6. Processes in a distributed system often communicate through message passing or other communication mechanisms.
7. Communication can be achieved through direct communication, where processes explicitly name the recipient, or through
indirect communication using shared channels or message queues.
Clock synchronization
• Clock synchronization is the mechanism for synchronizing the time of all the computers in a distributed
environment or system.
• Assume that there are three systems present in a distributed environment. To send, receive, and manage
data between these systems in a consistent, time-ordered manner, their clocks must agree; bringing the
clocks into agreement is what clock synchronization accomplishes.
• Synchronization in a distributed system is more complicated than in a centralized system because
distributed algorithms must be used.
• Properties of distributed algorithms that maintain clock synchronization:
• Relevant and correct information is scattered among multiple machines.
• Processes make decisions based only on local information.
• A single point of failure in the system must be avoided.
• No common clock or other source of precise global time exists.
• In a distributed system, time is inherently ambiguous.
• Types of Clock Synchronization
• Physical clock synchronization
• Logical clock synchronization

Physical Clock Synchronization:


• Definition: Physical clock synchronization aims to align the physical clocks of different nodes in a
distributed system.
• Objective: The goal is to make sure that the time reported by each node's clock is close to the time
reported by a reference clock, often referred to as a global time or a time server.
• Challenges: Clocks on different machines may drift due to variations in hardware, temperature, or other
factors. Network delays and asymmetries also contribute to the challenge of achieving precise
synchronization.
• Protocols: Various protocols are used for physical clock synchronization, with the Network Time Protocol
(NTP) being one of the most widely adopted. NTP adjusts the local clock of each machine to match a
reference clock, compensating for clock drift and network delays.
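• A Cristian-style Python sketch (a simplification, not NTP itself) of how a node can estimate its offset from a time server by assuming the reply travelled for about half of the measured round trip; the simulated server skew of +2 seconds is hypothetical:

# Cristian-style sketch: estimate the offset from a time server by
# assuming the reply travelled for about half of the round-trip delay.
import time

def time_server_now():
    # Stand-in for a remote time server whose clock runs 2 s ahead (hypothetical).
    return time.time() + 2.0

t0 = time.time()                     # request sent
server_time = time_server_now()      # reply carries the server's clock reading
t1 = time.time()                     # reply received
offset = server_time + (t1 - t0) / 2 - t1
print(f"adjust the local clock by about {offset:.3f} seconds")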
• Logical Clock Synchronization:
• Definition: Logical clock synchronization is concerned with establishing a logical ordering of events
across distributed nodes, even when their physical clocks may be unsynchronized.
• Objective: The goal is to provide a consistent and partial order of events to support coordination and
ensure causality is maintained across the distributed system.
• Challenges: Nodes in a distributed system may not have a shared global time, and events may occur
concurrently or with partial ordering. Logical clock synchronization addresses these challenges by
assigning logical timestamps to events.
• Protocols: Lamport clocks and vector clocks are commonly used logical clock synchronization
algorithms; a small sketch of both appears after the next two bullets.
• Lamport Clocks: Lamport clocks provide a partial ordering of events by assigning a timestamp to each event.
If one event causally happened before another, its timestamp is smaller; a total order can be imposed by
breaking timestamp ties with process identifiers.
• Vector Clocks: Vector clocks extend the concept of Lamport clocks by associating a vector of timestamps with
each process. Vector clocks capture both causality and concurrency, allowing for a more accurate representation of
the distributed system's state.
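• A small Python sketch of both schemes; the two "processes" here are plain objects in one program, whereas in a real system each would run on its own node and the timestamps would travel inside messages:

# Sketch of both logical clock schemes; in a real system each "process"
# would run on its own node and timestamps would travel inside messages.

class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):                          # local event or message send
        self.time += 1
        return self.time

    def receive(self, msg_time):             # merge rule: max(local, message) + 1
        self.time = max(self.time, msg_time) + 1
        return self.time


class VectorClock:
    def __init__(self, pid, n):
        self.pid, self.clock = pid, [0] * n

    def tick(self):
        self.clock[self.pid] += 1
        return list(self.clock)

    def receive(self, msg_clock):            # element-wise max, then advance self
        self.clock = [max(a, b) for a, b in zip(self.clock, msg_clock)]
        return self.tick()


# Lamport: P0 sends to P1; the receive timestamp exceeds the send timestamp.
p0, p1 = LamportClock(), LamportClock()
print("P1 after receive:", p1.receive(p0.tick()))          # 2

# Vector: the same exchange also records whose events each process has seen.
v0, v1 = VectorClock(0, 2), VectorClock(1, 2)
print("P1 vector after receive:", v1.receive(v0.tick()))   # [1, 1]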
