Chapter 1 Introduction
Introduction to Distributed Systems (DS)
• A distributed system is a collection of autonomous computers
connected through a network, working together as a single cohesive
system.
• Characteristics: Distributed resources, concurrent execution, and
independent failures (one component can fail while the others keep running).
• Examples: Internet, cloud computing, peer-to-peer networks.
Why Distributed Systems?
• Advantages: Increased performance, scalability, reliability, fault
tolerance, and resource sharing.
• Challenges: Coordination, communication, consistency, and security.
Design Goals of Distributed Systems
• Scalability: The system should be able to handle an increasing
number of users and resources.
• Reliability: The system should continue to function despite individual
component failures.
• Performance: The system should provide efficient and timely
responses to user requests.
• Transparency: The system should appear as a single, unified entity to
its users.
• Flexibility: The system should be adaptable to changing requirements
and environments.
Types of Distributed Systems
• Cluster Computing Systems:
• Cluster computing systems combine multiple machines or servers to form a
cluster that works together to perform large-scale computational tasks.
They distribute the workload among cluster nodes and often leverage
parallel processing techniques. Frameworks commonly run on such clusters
include Apache Hadoop and Apache Spark.
• Grid Computing Systems:
• Grid computing systems connect geographically distributed resources to
form a virtual supercomputer. They enable the sharing of computing
power, storage, and data across different organizations or institutions. Grid
systems are typically used for scientific computing, research collaborations,
and resource-intensive applications.
• Cloud Computing Systems:
• Cloud computing systems provide on-demand access to a pool of
computing resources, including virtual machines, storage, and services,
over the internet. They offer scalability, flexibility, and pay-per-use billing
models. Examples include Amazon Web Services (AWS), Microsoft Azure,
and Google Cloud Platform (GCP).
• Internet of Things (IoT) Systems:
• IoT systems connect a large number of devices and sensors, often
geographically distributed, to collect and exchange data. They involve
distributed processing, data aggregation, and coordination among devices
and services. IoT systems are used in various domains such as smart
homes, industrial automation, and smart cities.
• Distributed File Systems:
• Distributed file systems are designed to provide a unified view of file
storage across multiple machines. They distribute file data and metadata
across nodes, allowing clients to access and manipulate files as if they were
stored on a single machine. Examples include the Google File System (GFS)
and the Hadoop Distributed File System (HDFS).
• Distributed Database Systems:
• Distributed database systems store and manage data across multiple
nodes to provide scalability, fault tolerance, and improved performance.
They distribute data across nodes and support distributed query processing
and transaction management. Examples include Apache Cassandra and
Apache HBase.
Architectural Models
• Client-Server Model
• Clients request services from servers.
• Servers provide services to clients.
• Example: Web applications with web browsers (clients) and web servers.
• Peer-to-Peer Model
• Peers communicate and collaborate directly with each other.
• Each peer can act as both a client and a server.
• Example: File sharing networks like BitTorrent.
• Hybrid Models
• Combine elements of both client-server and peer-to-peer models.
• Example: Distributed databases with dedicated server nodes and peer replication.
Communication in Distributed Systems
• Communication Models
• Message Passing: Communication through explicit message exchanges.
• Remote Procedure Call (RPC): Invoking procedures on remote machines.
• Publish-Subscribe: Subscribers receive notifications about events from publishers.
• Message Queues: Messages are stored in queues for asynchronous processing.
• Communication Protocols
• TCP/IP: Transmission Control Protocol over the Internet Protocol; TCP
provides reliable, ordered, connection-oriented communication.
• UDP: User Datagram Protocol for connectionless, best-effort communication
with no delivery or ordering guarantees.
• HTTP: Hypertext Transfer Protocol for web-based communication.
• MQTT: Message Queuing Telemetry Transport for lightweight publish-subscribe
messaging.
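The publish-subscribe and message-queue models above can be sketched together in a few lines. This is a minimal in-memory illustration, not a real broker: the `Broker` class, its method names, and the topic string are all hypothetical and exist only for this example.

```python
from collections import defaultdict, deque


class Broker:
    """In-memory stand-in for a message broker (hypothetical, not a real library)."""

    def __init__(self):
        # topic -> list of subscriber queues
        self._queues = defaultdict(list)

    def subscribe(self, topic):
        """Register a new subscriber and return its private message queue."""
        queue = deque()
        self._queues[topic].append(queue)
        return queue

    def publish(self, topic, message):
        """Copy the message into the queue of every subscriber of the topic."""
        for queue in self._queues[topic]:
            queue.append(message)


broker = Broker()
dashboard = broker.subscribe("sensors/temperature")
logger = broker.subscribe("sensors/temperature")

broker.publish("sensors/temperature", {"celsius": 21.5})

# Each subscriber drains its own queue whenever it is ready, so the
# publisher never waits for a consumer.
print(dashboard.popleft())
print(logger.popleft())
```

The publisher only knows the topic name, never the subscribers, which is the decoupling that protocols such as MQTT provide over a network.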
Consistency Models
• Consistency models define the guarantees about the order and visibility of
updates to data in a distributed system.
• A replica is one copy of a data item or component; a replicated system
stores and maintains such copies on multiple nodes.
• Some commonly discussed consistency models include:
• Strong Consistency: A strongly consistent system behaves as if there
were a single copy of the data.
• Any read operation reflects the most recent completed write operation,
regardless of which replica serves it.
• Achieving strong consistency often requires coordination and
synchronization between replicas, which can impact performance and
availability.
Mechanisms for Strong Consistency
• Two-Phase Locking (2PL): This mechanism ensures that conflicting
operations on shared data are serialized.
• A transaction acquires locks before accessing shared data (growing
phase) and, once it has released any lock, acquires no further locks
(shrinking phase).
• 2PL guarantees conflict serializability but can introduce contention and
deadlocks, which affect system performance.
• Distributed Transaction Commit Protocols: Atomic commit protocols, such
as Two-Phase Commit (2PC) or Three-Phase Commit (3PC), ensure that a
transaction spanning several nodes either commits on all of them or aborts
on all of them.
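To make the commit protocol concrete, here is a minimal sketch of the two phases of 2PC under simplifying assumptions: there are no timeouts, crashes, or logging, and the `Participant` class and `two_phase_commit` function are hypothetical names invented for this illustration.

```python
class Participant:
    """Hypothetical resource manager taking part in one transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: vote yes only if the local work can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Return True if the transaction committed everywhere, False if aborted."""
    # Phase 1 (voting): ask every participant to prepare.
    votes = [p.prepare() for p in participants]

    # Phase 2 (decision): commit only on a unanimous yes, otherwise abort all.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


nodes = [Participant("a"), Participant("b"), Participant("c", can_commit=False)]
print(two_phase_commit(nodes))      # False: one "no" vote aborts everyone
print([p.state for p in nodes])     # ['aborted', 'aborted', 'aborted']
```

A single "no" vote aborts the transaction on every node. A real coordinator must also handle participants that never answer, which is where the blocking behavior of 2PC comes from.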
Consistency Models (continued)
• Eventual Consistency: Eventual consistency allows replicas to
temporarily show different data but guarantees that they will
eventually converge to a consistent state.
• This model relaxes the synchronization requirements, allowing
replicas to operate independently and asynchronously.
• It is often used in systems that prioritize availability and partition
tolerance over strict consistency.
Mechanisms for Eventual Consistency
• Vector Clocks: Vector clocks are used to track the causal ordering of events
in a distributed system. Each replica maintains a vector with one counter
per replica and increments its own counter on each local event. Comparing
two vectors shows whether one event happened before the other or whether
the two are concurrent.
• Anti-Entropy and Merkle Trees: Anti-entropy mechanisms, often built on
gossip protocols, periodically exchange updates between replicas to
synchronize data. Merkle trees (hash trees) let two replicas compare
hashes of data ranges, so that they detect differences efficiently and
transfer only the blocks that actually differ.
• Conflict Resolution and Convergence: In eventual consistency, conflicts
may arise when concurrent updates occur on different replicas. Conflict
resolution techniques, such as Last-Writer-Wins (LWW) or multi-value
approaches that keep all concurrent versions for later merging, are used
to reconcile conflicting updates and converge the data to a consistent
state over time.
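The interplay of vector clocks and conflict resolution can be sketched as follows. This is an illustrative sketch, assuming clocks are plain dictionaries from node name to counter and versions are `(value, clock, wall_time)` tuples; the function names are hypothetical.

```python
def increment(clock, node):
    """Return a copy of the clock with this node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a, b):
    """Element-wise maximum, used when a replica receives another's clock."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def compare(a, b):
    """Return 'before', 'after', 'equal', or 'concurrent' for clocks a and b."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


def resolve(x, y):
    """Pick a winner between two versions: (value, clock, wall_time) tuples."""
    order = compare(x[1], y[1])
    if order == "before":
        return y
    if order in ("after", "equal"):
        return x
    # Concurrent updates: fall back to Last-Writer-Wins on the wall-clock time.
    return x if x[2] >= y[2] else y


base = increment({}, "A")       # {"A": 1}
on_a = increment(base, "A")     # {"A": 2}
on_b = increment(base, "B")     # {"A": 1, "B": 1}
print(compare(base, on_a))      # before
print(compare(on_a, on_b))      # concurrent
```

Note that the LWW fallback silently discards one of the concurrent updates and depends on reasonably synchronized wall clocks, which is why some systems keep both versions instead.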
Consistency Models (continued)
• Causal Consistency: Causal consistency ensures that the order of
causally related events is preserved across replicas.
• If one event causally depends on another, every replica must observe
the two events in that order. However, replicas may observe causally
unrelated (concurrent) events in different orders.
• Read/Write Consistency: Some systems provide different consistency
levels for read and write operations.
• For example, a system may offer strong consistency for write
operations to ensure data integrity but provide eventual consistency
for read operations to improve performance.
Mechanisms for Causal Consistency
• Dependency Tracking: Causal consistency mechanisms track the
causal dependencies between events. This can be done through
explicit metadata or implicit tracking based on the ordering of events.
• Lamport Clocks: Lamport clocks assign an integer timestamp to each
event such that if one event causally precedes another, its timestamp is
smaller. The converse does not hold, so Lamport timestamps alone cannot
tell whether two events are concurrent. Breaking ties by process ID turns
the timestamps into a total order that is consistent with causality.
• Vector Clocks: Vector clocks, as mentioned earlier, are also used in
causal consistency mechanisms to track and enforce the causal
ordering of events across replicas.
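The Lamport clock rules are short enough to show in full. The sketch below is a minimal illustration with a hypothetical `LamportClock` class; message passing is simulated by handing the sender's timestamp to the receiver directly.

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event or message send: advance the counter."""
        self.time += 1
        return self.time

    def receive(self, sender_time):
        """Message receipt: jump past the sender's timestamp."""
        self.time = max(self.time, sender_time) + 1
        return self.time


p, q = LamportClock(), LamportClock()
p.tick()                      # p does local work, p.time == 1
sent = p.tick()               # p sends a message stamped 2
q.tick()                      # q does unrelated local work, q.time == 1
received = q.receive(sent)    # q.time == max(1, 2) + 1 == 3
print(sent, received)         # 2 3
```

The receive timestamp is always larger than the send timestamp, so causally related events are ordered correctly. Unrelated events on `p` and `q` may still share a timestamp, which is the limitation vector clocks address.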
Replication Techniques
• Replication involves creating and maintaining multiple copies of data
or components across distributed systems.
• Replication offers several benefits, including increased availability,
fault tolerance, and performance.
• Here are some common replication techniques:
• Primary-Backup Replication: In this approach, one replica (the primary)
handles all client requests and forwards the resulting updates to the
backup replicas.
• If the primary replica fails, one of the backups takes over its role.
• Because every update flows through the primary, a backup that has
received all updates holds a consistent copy of the data at failover;
updates not yet propagated when the primary fails can be lost.
• State Machine Replication: State machine replication involves executing
the same sequence of deterministic commands on all replicas in the same
order.
• Each replica applies the commands to its local state machine, so that
they all reach the same state.
• This technique provides strong consistency but can be resource-intensive
because the replicas must agree on the command order, typically through a
consensus protocol such as Paxos or Raft.
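The core idea of state machine replication, that identical deterministic replicas fed the same ordered log end in the same state, can be sketched as follows. The `KeyValueReplica` class and the command format are hypothetical, and the agreed log is simply assumed to exist; producing it is the job of a consensus protocol and is out of scope here.

```python
class KeyValueReplica:
    """A deterministic state machine: the same log always yields the same state."""

    def __init__(self):
        self.state = {}
        self.applied = 0  # number of log entries applied so far

    def apply(self, command):
        op, key, value = command
        if op == "set":
            self.state[key] = value
        elif op == "delete":
            self.state.pop(key, None)
        self.applied += 1

    def catch_up(self, log):
        """Apply every log entry this replica has not seen yet, in order."""
        for command in log[self.applied:]:
            self.apply(command)


# The agreed-upon, totally ordered command log (assumed to come from a
# consensus protocol).
log = [("set", "x", 1), ("set", "y", 2), ("delete", "x", None), ("set", "y", 5)]

replicas = [KeyValueReplica() for _ in range(3)]
replicas[2].catch_up(log[:2])   # one replica only sees a prefix at first
for replica in replicas:
    replica.catch_up(log)

print([r.state for r in replicas])  # [{'y': 5}, {'y': 5}, {'y': 5}]
```

A replica that falls behind only needs the missing suffix of the log to converge, which is why the log, rather than the state itself, is the unit of agreement.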
• Quorum-Based Replication: Quorum-based replication requires a
certain number of replicas to agree on a write operation before it is
considered successful.
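• The quorum can be a majority, a fixed number, or a percentage of the
replicas. With N replicas, a write quorum of W, and a read quorum of R,
choosing R + W > N guarantees that every read quorum overlaps the most
recent successful write.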
• Quorum-based replication balances the trade-off between
consistency and performance, allowing systems to continue operating
as long as a sufficient number of replicas are available.
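A small sketch can show why overlapping quorums let a read find the latest write even when some replicas missed it. It assumes N replicas with a write quorum W and a read quorum R; the `QuorumStore` class is hypothetical, and the version counter is derived from global knowledge purely to keep the example short.

```python
def quorums_overlap(n, w, r):
    """True if every read quorum intersects every write quorum (R + W > N)."""
    return r + w > n


class QuorumStore:
    """Hypothetical store: each replica holds a (version, value) pair."""

    def __init__(self, n, w, r):
        self.replicas = [(0, None)] * n
        self.w, self.r = w, r

    def write(self, value, reachable):
        """Write to the reachable replica indexes; succeed only with >= W acks."""
        if len(reachable) < self.w:
            return False
        # Simplification: the next version is computed from all replicas.
        version = max(v for v, _ in self.replicas) + 1
        for i in reachable:
            self.replicas[i] = (version, value)
        return True

    def read(self, reachable):
        """Read from R reachable replicas and return the newest value seen."""
        if len(reachable) < self.r:
            raise RuntimeError("not enough replicas for a read quorum")
        answers = [self.replicas[i] for i in reachable[: self.r]]
        return max(answers, key=lambda pair: pair[0])[1]


store = QuorumStore(n=3, w=2, r=2)
store.write("v1", reachable=[0, 1])    # replica 2 misses the write
print(store.read(reachable=[1, 2]))    # v1, because replica 1 overlaps
```

With N = 3, W = 2, R = 2 the store tolerates one unavailable replica for both reads and writes. Lowering W or R improves latency but gives up the overlap guarantee once R + W is no longer greater than N.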