UNIT-I AOS
Architectures of Distributed Systems:
What are Distributed Systems?
Distributed Systems are networks of independent computers that work together to present
themselves as a unified system. These systems share resources and coordinate tasks across
multiple nodes, allowing them to work collectively to achieve common goals. Key characteristics
include:
Multiple Nodes: Consists of multiple interconnected computers or servers that
communicate over a network.
Resource Sharing: Enable sharing of resources such as processing power, storage, and
data among the nodes.
Scalability: The system can be scaled by adding more nodes to handle increased load or
expand functionality.
Fault Tolerance: Designed to handle failures of individual nodes without affecting the
overall system’s functionality.
Transparency: Aim to hide the complexities of the underlying network, making the
system appear as a single coherent entity to users.
The architecture of such a system is shaped by several elements:
Hardware Platform: The hardware platform includes the physical components of the system,
such as servers, storage devices, and network infrastructure. The hardware platform must be
chosen based on the specific requirements of the system, such as the amount of storage and
processing power needed, as well as any specific technical constraints.
Software Platform: Software platform includes the operating system, application servers,
and other software components that run on the hardware. The software platform must be
chosen based on the programming languages and frameworks used to build the system, as
well as any specific technical constraints.
System interfaces: System interfaces include the APIs and user interfaces used to interact
with the system. Interfaces must be designed to be easy to use and understand and must be
able to handle the expected load of users and requests.
System Structure: System structure includes the overall organization of the system,
including the relationship between different components and how they interact with each
other. The system structure must be designed to be modular and scalable so that new
features and components can be added easily.
Security: Security is an important aspect of system architecture. It must be designed to
protect the system and its users from malicious attacks and unauthorized access.
There are several different architectural styles that can be used when designing a system,
such as:
Monolithic architecture: This is a traditional approach where all components of the system
are tightly coupled and run on a single server. The components are often tightly integrated
and share a common codebase.
Microservices architecture: In this approach, the system is broken down into a set of small,
independent services that communicate with each other over a network. Each service is
responsible for a specific task and can be developed, deployed, and scaled independently.
This allows for greater flexibility and scalability, but also requires more complexity in
managing the interactions between services.
Event-driven architecture: This approach is based on the idea of sending and receiving
events between different components of the system. Events are generated by one
component, and are consumed by other components that are interested in that particular
event. This allows for a more asynchronous and decoupled system.
Serverless architecture: This approach eliminates the need for provisioning and managing
servers by allowing developers to run code without managing servers. The cloud provider is
responsible for scaling and maintaining the infrastructure, letting the developer focus on
writing code.
When designing a system, it is important to choose an architecture that aligns with the
requirements and constraints of the project, such as scalability, performance, security, and
maintainability.
Example: website for an online retail store.
One example of a system design architecture is the design of a website for an online retail
store. The architecture includes the following components:
Front-end: The user interface of the website, including the layout, design, and navigation.
This component is responsible for displaying the products, categories, and other information
to the user.
Back-end: The server-side of the website, including the database, application logic, and
APIs. This component is responsible for processing and storing the data, and handling the
user’s requests.
Database: The component that stores and manages the data for the website, such as
customer information, product information, and order information.
APIs: The component that allows the website to communicate with other systems, such as
payment systems, shipping systems, and inventory systems.
Security: The component that ensures the website is secure and protected from
unauthorized access. This includes measures such as SSL encryption, firewalls, and user
authentication.
Monitoring and analytics: The component that monitors the website’s performance, tracks
user behaviour, and provides data for analytics and reporting.
Key design issues in distributed operating systems include:
Naming
Scalability
Compatibility
Process synchronization
Data migration: data are brought to the location that needs them.
o distributed filesystem (file migration)
o distributed shared memory (page migration)
Computation migration: the computation migrates to another location.
o remote procedure call: computation is done at the remote machine.
o processes migration: processes are transferred to other processors.
Security
Structuring
Communication Networks
Communication Models
message passing
remote procedure call (RPC)
RPC
With message passing, the application programmer must worry about many details:
parsing messages
pairing responses with request messages
converting between data representations
knowing the address of the remote machine/server
handling communication and system failures
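RPC hides these details behind client and server stubs. The following is a minimal in-process sketch; the Server/ClientStub classes and the add procedure are hypothetical illustrations, not a real RPC framework, and a real system would marshal these messages over sockets rather than a direct method call:

```python
import json

# Hypothetical in-process "transport" so the sketch stays self-contained;
# a real RPC system would send these messages over a network.
class Server:
    def __init__(self):
        self._procedures = {}

    def register(self, name, fn):
        self._procedures[name] = fn

    def handle(self, raw_request):
        # Server-side stub: parse the message and dispatch to the procedure.
        request = json.loads(raw_request)
        result = self._procedures[request["proc"]](*request["args"])
        return json.dumps({"result": result})

class ClientStub:
    """Client-side stub: hides marshalling, addressing, and reply pairing."""
    def __init__(self, server):
        self._server = server

    def call(self, proc, *args):
        raw_reply = self._server.handle(
            json.dumps({"proc": proc, "args": args}))
        return json.loads(raw_reply)["result"]

server = Server()
server.register("add", lambda a, b: a + b)
stub = ClientStub(server)
result = stub.call("add", 2, 3)   # looks like a local call; result == 5
```

To the application programmer, `stub.call("add", 2, 3)` reads like an ordinary procedure call; all of the message parsing and data conversion listed above is done inside the stubs.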
RPC Diagram
Communication networks
Communication networks in advanced operating systems are crucial for enabling efficient data
exchange between different parts of the system, other systems, and the external environment. These
networks can be categorized based on their scope, functionality, and the technologies they employ.
Here are some key aspects of communication networks in advanced operating systems:
1. Network Types
Local Area Network (LAN): Used for connecting devices within a limited area, like a building
or campus.
Wide Area Network (WAN): Covers larger geographic areas, connecting multiple LANs.
Metropolitan Area Network (MAN): Spans a city or a large campus.
Personal Area Network (PAN): Connects devices within the range of an individual, typically
within a few meters.
2. Network Protocols
3. Network Topologies
4. Communication Mechanisms
5. Performance Optimization
Load Balancing: Distributes network traffic across multiple servers to ensure no single server
becomes a bottleneck.
Caching: Stores frequently accessed data closer to the user to reduce latency.
Compression: Reduces the size of data to speed up transmission.
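The load-balancing idea above can be sketched as a simple round-robin picker; the server names are made up purely for illustration:

```python
import itertools

# A minimal round-robin load balancer sketch: each incoming request is
# assigned to the next server in a fixed rotation.
class RoundRobinBalancer:
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

balancer = RoundRobinBalancer(["server-a", "server-b", "server-c"])
assignments = [balancer.pick() for _ in range(6)]
# Traffic is spread evenly: each server receives two of the six requests.
```

Real balancers add health checks and weighting, but the core dispatch step is this rotation.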
6. Emerging Technologies
5G Networks: Offer higher speeds, lower latency, and greater capacity than previous mobile
networks.
Internet of Things (IoT): Connects a wide range of devices to the internet, enabling new
applications and services.
Software-Defined Networking (SDN): Allows for centralized network management and
dynamic network configuration.
Examples of Advanced Operating Systems Implementing Communication Networks
Linux: Known for its robust networking capabilities, extensive support for networking protocols,
and strong community support.
Windows: Provides comprehensive networking features, integration with Active Directory for
centralized management, and support for various protocols.
macOS: Offers seamless integration with Apple's ecosystem, strong security features, and
user-friendly networking tools.
Communication primitives:
Communication primitives in advanced operating systems are fundamental mechanisms that facilitate
communication between processes or threads. These primitives ensure that different parts of a system
can coordinate and share data efficiently and safely. Here's an overview of some key communication
primitives:
1. Message Passing:
o Send/Receive: Processes send and receive messages using system calls. This can
be implemented using direct or indirect communication, synchronous or asynchronous
communication.
o Mailboxes/Message Queues: Intermediate storage locations where messages are
held until the receiving process retrieves them.
2. Shared Memory:
o Memory Mapped Files: Portions of files are mapped into the process's address
space, allowing multiple processes to share memory.
o POSIX Shared Memory: Standardized shared memory mechanisms provided by
POSIX-compliant systems.
3. Synchronization Primitives:
o Mutexes (Mutual Exclusion): Prevents multiple threads from accessing critical
sections of code simultaneously.
o Semaphores: Generalized synchronization tools used for signaling and controlling
access to resources.
o Condition Variables: Used in conjunction with mutexes to block threads until a
particular condition is met.
4. Pipes:
o Anonymous Pipes: Unidirectional data channels used for communication between
parent and child processes.
o Named Pipes (FIFOs): Named, bidirectional communication channels that can be
used between unrelated processes.
5. Signals:
o Software Interrupts: Allow processes to send simple notifications to each other.
Signals can carry basic information or be used to initiate actions in the receiving
process.
6. Remote Procedure Calls (RPC):
o RPC Mechanisms: Allow a process to execute a procedure in another address space
(on the same machine or across a network), abstracting the communication details.
7. Sockets:
o Interprocess Communication (IPC) Sockets: Used for communication between
processes on the same machine.
o Network Sockets: Facilitate communication between processes over a network,
supporting protocols like TCP/IP and UDP.
8. Memory Barriers:
o Memory Fences: Ensure proper ordering of memory operations, crucial in
multiprocessor environments.
9. Event Flags:
o Event Wait/Signal: Processes or threads wait for an event to occur and are signaled
when the event takes place.
10. Monitors:
o High-level Synchronization: Encapsulate shared variables, procedures, and the
synchronization between threads that access them.
Each of these primitives serves a specific purpose in ensuring efficient, reliable, and secure
communication and synchronization in an operating system. Advanced operating systems typically
provide a combination of these primitives to cover a wide range of communication and synchronization
needs.
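As a concrete illustration of the message-passing and mailbox primitives above, here is a minimal producer/consumer sketch using Python's standard `queue` and `threading` modules; the sentinel-based shutdown is one common convention, not the only one:

```python
import queue
import threading

# The queue acts as the "mailbox": messages are held until the receiving
# thread retrieves them, so the two threads never share state directly.
mailbox = queue.Queue()
received = []

def producer():
    for i in range(5):
        mailbox.put(i)        # send: enqueue a message
    mailbox.put(None)         # sentinel: no more messages will follow

def consumer():
    while True:
        msg = mailbox.get()   # receive: blocks until a message arrives
        if msg is None:
            break
        received.append(msg)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
# received == [0, 1, 2, 3, 4]
```

The queue internally uses a mutex and condition variables, so this one example exercises three of the primitives listed above.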
Theoretical foundations:
Theoretical foundations of advanced operating systems are built on several key concepts and principles
that ensure their reliability, efficiency, and scalability. Here are some of the core theoretical foundations:
These theoretical foundations form the basis for the design, implementation, and management of
advanced operating systems, enabling them to meet the demands of modern computing
environments.
Distributed systems, despite their numerous benefits such as scalability, fault tolerance, and resource
sharing, come with several inherent limitations. Here are some of the key challenges and limitations in
advanced operating systems:
1. Communication
Latency: Communication over a network introduces delays that are significantly higher than
memory access times. This can affect the performance of distributed applications.
Bandwidth: The bandwidth of the network can become a bottleneck, especially for data-
intensive applications.
2. Fault Tolerance
Partial Failures: Unlike a single system, parts of a distributed system can fail independently.
Detecting, handling, and recovering from these partial failures can be complex.
Consistency: Ensuring data consistency across distributed nodes is challenging, particularly in
the presence of network partitions and failures.
3. Security
4. Synchronization
5. Data Consistency
Consistency Models: Providing strong consistency guarantees (like those in a single system)
can be difficult and may impact performance.
Replication: Managing replicated data to ensure consistency, availability, and partition
tolerance (as per the CAP theorem) requires sophisticated mechanisms.
6. Scalability
Scalability Issues: While distributed systems are designed to be scalable, achieving seamless
scalability requires careful design to avoid bottlenecks and ensure balanced load distribution.
Resource Management: Efficiently managing and allocating resources across distributed
nodes is challenging.
7. Complexity
System Complexity: The overall system becomes more complex, making development,
debugging, and maintenance harder.
Software Development: Writing distributed applications requires knowledge of distributed
algorithms, network programming, and concurrency control.
8. Debugging and Monitoring
Debugging Difficulties: Identifying and fixing bugs in a distributed system is more challenging
due to the non-deterministic nature of network communication and concurrency.
Monitoring and Logging: Collecting and analyzing logs and performance metrics from
multiple nodes requires robust tools and infrastructure.
In advanced operating systems, particularly in distributed systems, logical ports and logical
clocks play crucial roles in ensuring correct and efficient communication, synchronization, and
event ordering among distributed components. Let's delve into these concepts:
Logical Ports
Ports provide a logical mechanism used to facilitate communication between distributed
processes. Each port acts as a conduit for sending and receiving messages, ensuring that processes
can communicate without direct knowledge of each other's physical locations or states.
1. Logical Ports: These are abstract representations that map to physical network ports. They
allow processes to send and receive messages, typically identified by unique identifiers within a
system.
2. Message Passing: Ports are essential in message-based communication models.
Messages sent to a port are queued until the receiving process retrieves them.
3. Port Binding: Processes bind to specific ports, allowing them to send or receive messages.
Binding can be static (fixed ports) or dynamic (ports assigned at runtime).
Logical Clocks
Logical clocks are a fundamental concept used to order events in a distributed system where there is
no global clock. They help in maintaining the causality of events and ensuring consistency across the
system.
1. Lamport Timestamps: Named after Leslie Lamport, these timestamps provide a way to order
events in a distributed system. Each event in the system is assigned a timestamp such that if
one event happens before another, the timestamp of the first event is less than the timestamp
of the second.
o Rules:
If an event a happens before event b in the same process, then the
timestamp of a is less than the timestamp of b.
If a message is sent from one process to another, the timestamp of the
sending event is less than the timestamp of the receiving event.
Each process increments its logical clock before timestamping an event.
2. Vector Clocks: An extension of Lamport timestamps, vector clocks provide a more precise
mechanism for capturing causality by using a vector of clocks, one for each process in the
system.
o Rules:
Each process maintains a vector of logical clocks.
When an event occurs locally, the process increments its own clock in the
vector.
When a process sends a message, it includes its vector clock with the
message.
When a process receives a message, it updates its own vector clock by taking
the element-wise maximum of its own clock and the received clock, then
increments its own clock.
By combining ports for communication and logical clocks for event ordering, advanced operating
systems can manage complex interactions between distributed processes, ensuring both efficiency and
correctness.
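The Lamport timestamp rules above can be sketched as a small class; this is a minimal illustration, not a production implementation:

```python
# A minimal Lamport logical clock following the rules above.
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Internal event: increment before timestamping."""
        self.time += 1
        return self.time

    def send(self):
        """Send event: increment, then attach the timestamp to the message."""
        self.time += 1
        return self.time

    def receive(self, msg_time):
        """Receive event: jump past the sender's timestamp, then increment."""
        self.time = max(self.time, msg_time) + 1
        return self.time

p1, p2 = LamportClock(), LamportClock()
a = p1.tick()        # internal event on P1 -> timestamp 1
ts = p1.send()       # send event on P1    -> timestamp 2
b = p2.receive(ts)   # receive on P2       -> max(0, 2) + 1 = 3
# a < ts < b, so the timestamps respect the causal order of the events.
```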
Vector Clocks:
Vector clocks are a mechanism used in distributed systems to keep track of the causal relationships
between events. They are an extension of Lamport clocks and provide more information about the
ordering of events. Here's a detailed explanation of vector clocks and their usage in advanced
operating systems:
1. Vector Structure:
o A vector clock is an array of logical clocks, where each element in the vector
corresponds to a node (process) in the distributed system.
o Each node maintains its own vector clock, which it updates based on the events it
processes.
2. Initialization:
o Initially, all elements of the vector clock are set to zero.
3. Update Rules:
o Internal Events: When a process P_i executes an internal event, it increments the
i-th element of its vector clock.
o Send Events: When a process P_i sends a message, it increments the i-th
element of its vector clock and attaches a copy of the updated vector clock to the
message.
o Receive Events: When a process P_i receives a message with a vector clock
V_m, it updates its own vector clock V_i by taking the element-wise
maximum of V_i and V_m, then increments the i-th element of V_i.
1. Causal Ordering:
o Vector clocks help maintain causal ordering of events. If V_i and V_j are the vector
clocks of events e_i and e_j respectively, then e_i happened-before e_j
(denoted e_i → e_j) if and only if every element of V_i is less than or equal
to the corresponding element of V_j, and at least one element of V_i is strictly less
than that of V_j.
2. Concurrency Control:
o In distributed databases or transactional systems, vector clocks help determine which
transactions or operations are concurrent, aiding in conflict resolution and consistency
maintenance.
3. Distributed Debugging:
o Vector clocks assist in identifying the causal relationships between events across
different nodes, making it easier to debug and understand the system behavior.
4. Snapshot Algorithms:
o Algorithms that capture consistent snapshots of the system state use vector clocks to
ensure that the snapshot reflects a possible global state.
Example
Consider a system with three processes P1, P2, and P3. Each process has a
vector clock V = [V1, V2, V3]:
1. Initial State:
o P1: [0, 0, 0]
o P2: [0, 0, 0]
o P3: [0, 0, 0]
2. Event Execution:
o P1 performs an internal event:
P1: [1, 0, 0]
o P1 sends a message to P2:
P1: [2, 0, 0] (after incrementing for the send event)
Message sent with vector clock [2, 0, 0]
o P2 receives the message from P1:
P2: [2, 1, 0] (after taking the element-wise maximum and incrementing its own entry)
o P2 sends a message to P3:
P2: [2, 2, 0] (after incrementing for the send event)
Message sent with vector clock [2, 2, 0]
o P3 receives the message from P2:
P3: [2, 2, 1] (after taking the element-wise maximum and incrementing its own entry)
By examining the vector clocks, the system can determine the causal relationships between events,
such as which events happened before others and which are concurrent.
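The example above can be replayed with a minimal vector clock sketch; the class is an illustration of the update rules, not a production implementation:

```python
# A vector clock sketch that replays the three-process example above.
class VectorClock:
    def __init__(self, pid, n):
        self.pid = pid            # this process's index in the vector
        self.v = [0] * n

    def internal(self):
        """Internal event: increment this process's own entry."""
        self.v[self.pid] += 1

    def send(self):
        """Send event: increment own entry, attach a copy to the message."""
        self.v[self.pid] += 1
        return list(self.v)

    def receive(self, msg_v):
        """Receive event: element-wise maximum, then increment own entry."""
        self.v = [max(a, b) for a, b in zip(self.v, msg_v)]
        self.v[self.pid] += 1

p1, p2, p3 = VectorClock(0, 3), VectorClock(1, 3), VectorClock(2, 3)
p1.internal()          # P1: [1, 0, 0]
m = p1.send()          # P1: [2, 0, 0], message carries [2, 0, 0]
p2.receive(m)          # P2: [2, 1, 0]
m = p2.send()          # P2: [2, 2, 0], message carries [2, 2, 0]
p3.receive(m)          # P3: [2, 2, 1]
```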
Causal ordering of messages
Causal ordering of messages in advanced operating systems, particularly in the context of distributed
systems, refers to the concept where messages are ordered based on their causal relationships rather
than strict chronological order. This is important to ensure consistency and coordination across different
nodes or processes in a system.
Here's a breakdown of key aspects related to causal ordering of messages:
1. Causal Relationships: If one event causally affects another, they have a causal relationship.
In distributed systems, if a message A is sent before message B and A influences
B, then A should be processed before B.
2. Vector Clocks: One way to implement causal ordering is through vector clocks. Each process
maintains a vector clock, and when a message is sent, it includes the current state of the
sender's vector clock. Upon receiving a message, a process can determine the causal order by
comparing vector clocks.
3. Lamport Timestamps: Another approach involves Lamport timestamps. Though they don’t
provide true causal ordering, they can help in determining a partial order of events in a
distributed system.
4. Happened-Before Relationship: The happened-before relationship (denoted →) is
fundamental to understanding causal ordering. If event E1 happened
before event E2, then E1 → E2. This relationship is transitive
and helps in defining the order in which messages should be processed.
5. Concurrency: Two events are concurrent if neither can causally affect the other. In this case,
their order of execution doesn’t matter with respect to causal consistency.
6. Causal Multicast: In systems requiring causal ordering, causal multicast protocols ensure that
messages are delivered to all recipients respecting their causal relationships. This involves
delaying messages until all causally preceding messages have been delivered.
7. Applications: Causal ordering is critical in collaborative applications, distributed databases,
and any system where maintaining consistency of operations across different nodes is
necessary.
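The causal multicast idea above can be sketched as a delivery predicate. This follows the delivery condition used by the Birman-Schiper-Stephenson protocol, one common (but not the only) formulation: deliver a message from sender j only when it is the next message expected from j and the receiver has already delivered everything the message causally depends on:

```python
# Causal multicast delivery check using vector clocks.
def can_deliver(msg_clock, recv_clock, sender):
    # The message must be the next one expected from its sender...
    next_from_sender = msg_clock[sender] == recv_clock[sender] + 1
    # ...and must not depend on any message the receiver has yet to deliver.
    nothing_missing = all(
        msg_clock[k] <= recv_clock[k]
        for k in range(len(msg_clock)) if k != sender
    )
    return next_from_sender and nothing_missing

# The receiver has delivered one message from P0 and none from P1:
recv = [1, 0, 0]
assert can_deliver([2, 0, 0], recv, sender=0)      # next from P0: deliver
assert not can_deliver([2, 1, 0], recv, sender=1)  # depends on an undelivered
                                                   # P0 message: delay it
```

A message that fails the check is buffered and re-examined after each delivery, which is exactly the "delaying messages until all causally preceding messages have been delivered" described above.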
Global state
In the context of advanced operating systems, the term "global state" typically refers to the overall
status of the operating system and its components at a given moment in time. This includes information
about processes, memory, storage, and other resources. Understanding the global state is crucial for
various operating system tasks, such as process scheduling, memory management, and resource
allocation. Here's an overview of key aspects of the global state in advanced operating systems:
In advanced operating systems, the concept of "cuts" in distributed computation is essential for
understanding the state of a distributed system at a given point in time. A cut represents a snapshot of
the global state of the distributed system, capturing the states of all processes and the messages in
transit between them. Here are some key aspects of cuts in distributed computation:
Types of Cuts
1. Consistent Cuts:
o A cut is consistent if it reflects a possible state of the system that could have occurred
during its execution. In other words, it does not violate the causality of events.
o Mathematically, a cut C is consistent if for any event e included in C, all
events causally preceding e are also included in C.
2. Inconsistent Cuts:
o A cut is inconsistent if it does not correspond to any possible state of the system, often
because it violates the causality of events.
o In an inconsistent cut, an event might appear without its causally preceding events,
which cannot happen in a real execution.
Applications of Cuts
1. Snapshot Algorithms:
o Chandy-Lamport Algorithm: A well-known algorithm to record a consistent snapshot
of a distributed system. It involves processes recording their state and the state of the
communication channels.
2. Vector Clocks:
o Used to capture the partial ordering of events in a distributed system. Each process
maintains a vector clock, and cuts can be determined by comparing these vector
clocks.
Imagine a distributed system with three processes P1, P2, and P3. Events e1,
e2, and e3 occur in P1, P2, and P3, respectively. A consistent cut could
include e1 and e2 but not e3 if e3 causally depends on e1 and e2.
Using the same processes and events, an inconsistent cut might include e3 but exclude e1
and e2, which is impossible in a real execution since e3 depends on e1 and e2.
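The consistency condition can be sketched as a check over vector clocks. The event names and clock values below follow the example above (e3 causally depends on e1 and e2); the representation of events as a name-to-clock mapping is an assumption for illustration:

```python
# A cut (a set of events) is consistent iff it is closed under
# happened-before: every causal predecessor of a member is also a member.
def happened_before(u, v):
    """Vector-clock comparison: u -> v iff u <= v elementwise and u != v."""
    return all(a <= b for a, b in zip(u, v)) and u != v

def is_consistent(cut, all_events):
    return all(
        name in cut
        for e in cut
        for name, clock in all_events.items()
        if happened_before(clock, all_events[e])
    )

# e1 on P1; e2 on P2 after a message from P1; e3 on P3 after a message from P2.
events = {"e1": [1, 0, 0], "e2": [1, 1, 0], "e3": [1, 1, 1]}
assert is_consistent({"e1", "e2"}, events)   # consistent: no predecessor missing
assert not is_consistent({"e3"}, events)     # inconsistent: e1 and e2 excluded
```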
Termination detection
Termination detection is a crucial aspect of distributed systems and advanced operating systems,
especially when dealing with concurrent processes and distributed algorithms. It involves determining
when a process or a set of processes in a distributed system has completed its execution.
Approaches
Centralized Approach: Uses a central coordinator to keep track of the state of all processes
and messages. The coordinator checks whether all processes have terminated.
o Pros: Simplified implementation.
o Cons: Single point of failure, scalability issues.
Distributed Approach: Each process maintains its own state and communicates with other
processes to determine termination. Commonly used algorithms include:
o Chandy-Misra-Haas Algorithm: A distributed algorithm where processes exchange
termination messages. It uses a combination of markers and messages to detect
termination.
o Dijkstra-Scholten Algorithm: A distributed algorithm that uses a depth-first search
approach to track the termination state of processes.
4. Challenges
5. Applications
Distributed Databases: Ensuring all transactions are completed and consistent before
proceeding with further operations.
Distributed Computing: Managing and synchronizing tasks in distributed computing
environments or parallel processing.
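A minimal sketch of the centralized approach, assuming the coordinator is informed of every send, receive, and idle transition; this is a simplification (real protocols must cope with delayed control messages and the coordinator observing stale state):

```python
# Centralized termination detection: the system has terminated only when
# every process is idle AND no message is still in flight (sent == received).
class Coordinator:
    def __init__(self, n):
        self.idle = [True] * n
        self.sent = 0
        self.received = 0

    def on_send(self, pid):
        self.sent += 1

    def on_receive(self, pid):
        self.received += 1
        self.idle[pid] = False   # receiving a message reactivates a process

    def on_idle(self, pid):
        self.idle[pid] = True

    def terminated(self):
        return all(self.idle) and self.sent == self.received

coord = Coordinator(2)
coord.on_send(0)                 # P0 sends work to P1
assert not coord.terminated()    # the message is still in flight
coord.on_receive(1)              # P1 receives it and becomes active
assert not coord.terminated()    # P1 is busy
coord.on_idle(1)                 # P1 finishes its work
assert coord.terminated()        # all idle, no messages in flight
```

The in-flight check matters: without it, the coordinator could wrongly declare termination while a message that will reactivate a process is still on the wire.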
Distributed Mutual Exclusion is a key concept in distributed systems and advanced operating systems.
Here's a brief introduction:
Mutual Exclusion is a fundamental principle in concurrent computing and distributed systems that
ensures that multiple processes or nodes do not enter a critical section of code simultaneously. The
critical section is a part of the code that accesses shared resources and must be executed by only one
process at a time to prevent inconsistencies and conflicts.
In a distributed system, where processes or nodes are spread across multiple machines or locations,
achieving mutual exclusion becomes more complex due to the lack of a centralized control and the
potential for communication delays.
Approaches to Distributed Mutual Exclusion
1. Centralized Algorithm:
o Description: A central coordinator or server manages the critical section requests and
grants access.
o Example: The Central Server (coordinator) Algorithm.
o Advantages: Simplicity and ease of implementation.
o Disadvantages: Single point of failure and scalability issues.
2. Distributed Algorithm:
o Description: All nodes participate in the decision-making process without a central
coordinator.
o Example: Lamport’s Logical Clocks, Ricart-Agrawala Algorithm.
o Advantages: No single point of failure and better scalability.
o Disadvantages: More complex to implement and manage.
3. Token-based Algorithm:
o Description: A unique token circulates among nodes, and only the node holding the
token can access the critical section.
o Example: Raymond's Tree-Based Algorithm.
o Advantages: Efficient in terms of message complexity.
o Disadvantages: Token loss or duplication needs to be managed.
4. Quorum-based Algorithm:
o Description: Nodes form a quorum (a subset of nodes) and require a majority vote to
enter the critical section.
o Example: Maekawa's Algorithm.
o Advantages: Suitable for systems with high node failure rates.
o Disadvantages: Complexity in maintaining quorum and ensuring consistency.
Applications
Effective distributed mutual exclusion algorithms are crucial for maintaining system reliability,
consistency, and performance in distributed environments.
Mutual exclusion is a fundamental concept in concurrent computing and operating systems. It refers to
the requirement that if one process is executing in its critical section, then no other process should be
allowed to execute in its critical section. Here’s an overview of the classification and associated
algorithms for mutual exclusion in advanced operating systems:
Classification of Mutual Exclusion
1. Centralized Algorithms:
o Centralized Mutex: A single coordinator (centralized process) controls access to the
critical section. All processes must request access from this coordinator.
o Example Algorithm: Centralized Mutex Algorithm.
2. Distributed Algorithms:
o Non-Token-Based Algorithms: Processes exchange timestamped request and reply
messages to agree on who may enter the critical section.
o Token-Based Algorithms: A unique token is circulated among the processes. Only
the process holding the token can enter its critical section.
o Example Algorithms:
Ricart-Agrawala Algorithm: A non-token-based, request-reply algorithm where
a process enters the critical section only after receiving permission from all
other processes.
Token Ring Algorithm: A token circulates in a logical ring. The process that
holds the token can enter its critical section.
1. Peterson’s Algorithm:
o A software-based mutual exclusion algorithm for two processes. It uses shared
variables and is easy to understand and implement.
2. Dekker’s Algorithm:
o A solution for two processes, using a shared flag and turn variable to ensure mutual
exclusion and avoid deadlock and starvation.
3. Queue-Based Algorithms:
o Queue-Based Locks: Processes form a queue to enter the critical section, ensuring
that they access it in the order of their requests.
o Example Algorithm: The Queue-Based Lock Algorithm.
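Peterson's algorithm, described above, can be sketched with two Python threads. Note the hedge: CPython's GIL effectively gives these simple reads and writes the sequentially consistent memory the algorithm assumes; on real multiprocessor hardware, memory fences would be required:

```python
import sys
import threading

sys.setswitchinterval(5e-5)      # switch threads often so busy-waits stay short

flag = [False, False]            # flag[i]: thread i wants to enter
turn = 0                         # whose turn it is to yield
counter = 0                      # shared state protected by the algorithm
N = 5_000

def worker(me):
    global turn, counter
    other = 1 - me
    for _ in range(N):
        flag[me] = True          # announce intent to enter
        turn = other             # politely let the other thread go first
        while flag[other] and turn == other:
            pass                 # busy-wait until it is safe to enter
        counter += 1             # critical section
        flag[me] = False         # exit section

threads = [threading.Thread(target=worker, args=(i,)) for i in (0, 1)]
for t in threads: t.start()
for t in threads: t.join()
assert counter == 2 * N          # mutual exclusion: no increment was lost
```

Without mutual exclusion, the read-modify-write of `counter` could interleave and lose updates; Peterson's flag-and-turn handshake guarantees only one thread is ever in the critical section.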