
MODULE 5

DISTRIBUTED TRANSACTIONS

What Are Distributed Transactions?


• A distributed transaction is a transaction that involves multiple independent components
or servers (often in different physical or logical locations) working together.
• Each component may manage its own resources (e.g., databases, file systems), but they
all need to coordinate to complete the transaction as a single, unified operation.

Key Features of Distributed Transactions


1. Spanning Multiple Resources:
o A distributed transaction may involve databases on different servers, files on a
remote system, or APIs of other services.
2. Coordination:
o Since resources are managed independently, a mechanism is required to ensure all
participants agree on the transaction's outcome (commit or abort).
3. ACID Properties: Distributed transactions must adhere to the four ACID principles:
o Atomicity: All-or-nothing execution. If any part fails, the entire transaction is
rolled back.
o Consistency: The system transitions from one valid state to another.
o Isolation: The transaction does not interfere with others.
o Durability: Once committed, the changes are permanent.
4. Concurrency Management:
o Since transactions often run in parallel, distributed transactions require
mechanisms like locks to prevent conflicts.
How Do Distributed Transactions Work?
1. Transaction Coordinator:
o A special component or system, called the transaction coordinator, manages the
distributed transaction.
o It communicates with all participating servers, ensuring they work together to
commit or abort the transaction.
2. Transaction Participants:
o These are the systems or servers involved in executing parts of the transaction.
o For example:
▪ A payment service.
▪ An inventory management system.
▪ A shipping system.
3. Commit Protocol:
o A protocol, like the Two-Phase Commit Protocol (2PC), ensures consistency
among participants.

Challenges of Distributed Transactions


1. Network Failures:
o Communication between participants may fail, causing delays or inconsistencies.
2. Partial Failures:
o A single participant may crash while others are still operational, complicating
recovery.
3. Performance Overhead:
o Coordinating multiple servers adds latency.
4. Complexity:
o Implementing distributed transactions involves handling a variety of failure
scenarios and edge cases.
Examples of Distributed Transactions (Simplified)
1. Online Shopping:
o Actions: Deduct payment, update inventory, confirm order, schedule shipping.
o Requirement: All succeed or roll back.
2. Bank Transfer:
o Actions: Deduct money from sender’s account, credit to receiver’s account.
o Requirement: Both succeed or roll back.
3. Ride Booking:
o Actions: Confirm ride, assign driver, reserve fare.
o Requirement: All succeed or cancel booking.

What is Atomicity?
• Atomicity ensures that a transaction is treated as a single, indivisible unit.
• In distributed systems, where multiple servers are involved in the same transaction, the
atomicity property ensures:
o The transaction fully succeeds: All servers commit the changes, or
o The transaction completely fails: All servers abort and roll back the changes.
• Example:
o Imagine splitting a bill at a restaurant:
▪ If one person cannot pay, the whole group’s payment attempt is canceled
(abort).
▪ If everyone is ready to pay, the transaction goes through (commit).

Why is Coordination Needed?


• In distributed systems, each server must be in agreement for a transaction to either
commit or abort.
• To achieve this, one server takes on the role of a coordinator:
o The coordinator ensures all servers decide on the same action: either commit or
abort.
o Without coordination, some servers might commit while others abort, leading to
data inconsistency.

Two-Phase Commit Protocol (2PC)


The Two-Phase Commit (2PC) protocol is a widely used method to ensure atomicity in
distributed transactions.
It’s like a voting process to guarantee that all servers agree on a single decision.
Breaking Down the Two-Phase Commit Protocol
Phase 1: Prepare (Voting Stage)
• What happens?
1. The coordinator sends a message to all servers (participants):
▪ “Can you commit this transaction?”
2. Each server checks if it is ready to commit the transaction:
▪ Are the required resources (e.g., data or locks) available?
▪ Has the operation been successfully processed locally?
3. Servers respond to the coordinator:
▪ “Yes” (prepared to commit) if all conditions are met.
▪ “No” (must abort) if conditions are not met or an error occurs.
• Key rule:
o If even one server responds “No”, the entire transaction cannot proceed to commit.
Phase 2: Commit or Abort (Decision Stage)
• What happens next?
1. The coordinator collects all the responses:
▪ If all servers say “Yes” → The coordinator sends a “Commit” message to all
servers.
▪ If any server says “No” → The coordinator sends an “Abort” message to all
servers.

2. Each server then:


▪ Commits: Finalizes the transaction and saves changes permanently.
▪ Aborts: Rolls back any changes made during the transaction.
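The decision logic of the two phases can be summarized in a short Python sketch. This is only an illustration under assumed names (the Participant class and the can_commit/do_commit/do_abort methods are not from the text); a real 2PC implementation also needs logging and timeout handling.

```python
# Minimal sketch of the 2PC decision logic (illustrative only; the Participant
# class and its method names are assumptions, not a real API).

class Participant:
    """A toy participant that votes and applies or rolls back tentative changes."""
    def __init__(self, name, can=True):
        self.name, self.can = name, can

    def can_commit(self, tid):      # Phase 1: vote "Yes"/"No"
        return self.can

    def do_commit(self, tid):       # Phase 2: finalize changes permanently
        print(f"{self.name}: committed {tid}")

    def do_abort(self, tid):        # Phase 2: roll back tentative changes
        print(f"{self.name}: aborted {tid}")


def two_phase_commit(tid, participants):
    # Phase 1 (voting): ask every participant whether it can commit.
    votes = []
    for p in participants:
        try:
            votes.append(p.can_commit(tid))
        except Exception:           # lost message / crashed participant
            votes.append(False)     # treated as a "No" vote

    # Phase 2 (decision): commit only if every vote was "Yes".
    if all(votes):
        for p in participants:
            p.do_commit(tid)
        return "COMMITTED"
    for p in participants:
        p.do_abort(tid)
    return "ABORTED"


payment = Participant("payment service")
inventory = Participant("inventory system", can=False)   # votes "No"
print(two_phase_commit("T1", [payment, inventory]))       # -> ABORTED
```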

Why is 2PC Important?


1. Ensures consistency:
o All servers either commit or abort the transaction, maintaining synchronized states.
2. Prevents partial updates:
o No server finalizes changes unless all servers agree to commit, avoiding data
corruption.

3. Handles failures gracefully:


o If any server fails during the process, the transaction is aborted to maintain
integrity.
5.2 FLAT AND NESTED DISTRIBUTED TRANSACTIONS
A client transaction becomes distributed if it invokes operations in several different servers.
There are two different ways that distributed transactions can be structured: as flat transactions
and as nested transactions.

FLAT TRANSACTION : In a flat transaction, a client makes requests to more than one server.
For example, in Figure (a), transaction T is a flat transaction that invokes operations on objects
in servers X, Y and Z. A flat client transaction completes each of its requests before going on to
the next one. Therefore, each transaction accesses servers’ objects sequentially. When servers
use locking, a transaction can only be waiting for one object at a time.
NESTED TRANSACTION : In a nested transaction, the top-level transaction can open
subtransactions, and each subtransaction can open further subtransactions down to any depth of
nesting. Figure (b) shows a client transaction T that opens two subtransactions, T1 and T2,
which access objects at servers X and Y. The subtransactions T1 and T2 open further
subtransactions T11, T12, T21, and T22, which access objects at servers M, N and P. In the
nested case, subtransactions at the same level can run concurrently, so T1 and T2 are
concurrent, and as they invoke objects in different servers, they can run in parallel. The four
subtransactions T11, T12, T21 and T22 also run concurrently. The following is an example of a
nested transaction.

Consider a distributed transaction in which a client transfers $10 from account A to C and then
transfers $20 from B to D. Accounts A and B are at separate servers X and Y and accounts C
and D are at server Z. If this transaction is structured as a set of four nested transactions, as
shown in Figure 5.2 , the four requests (two deposits and two withdraws) can run in parallel and
the overall effect can be achieved with better performance than a simple transaction in which
the four operations are invoked sequentially.
5.2.1 The coordinator of a distributed transaction

Servers that execute requests as part of a distributed transaction need to be able to communicate
with one another to coordinate their actions when the transaction commits. A client starts a
transaction by sending an openTransaction request to a coordinator in any server, as described
in Section 4.2. The coordinator that is contacted carries out the openTransaction and returns the
resulting transaction identifier (TID) to the client. Transaction identifiers for distributed
transactions must be unique within a distributed system. A simple way to achieve this is for a
TID to contain two parts: the identifier (for example, an IP address) of the server that created it
and a number unique to the server. The coordinator that opened the transaction becomes the
coordinator for the distributed transaction and at the end is responsible for committing or
aborting it. Each of the servers that manages an object accessed by a transaction is a participant
in the transaction and provides an object we call the participant. Each participant is responsible
for keeping track of all of the recoverable objects at that server that are involved in the
transaction. The participants
are responsible for cooperating with the coordinator in carrying out the commit protocol.
During the progress of the transaction, the coordinator records a list of references to the
participants, and each participant records a reference to the coordinator. The interface for
Coordinator shown in Figure 5.3 provides an additional method, join, which is used whenever a
new participant joins the transaction: join(Trans, reference to participant) Informs a coordinator
that a new participant has joined the transaction Trans.The coordinator records the new
participant in its participant list. The fact that the coordinator knows all the participants and
each participant knows the coordinator will enable them to collect the information that will be
needed at commit time.
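The two-part TID scheme and the join operation described above can be illustrated with a minimal Python sketch. Only the idea of the TID as a (server identifier, local number) pair and the join method come from the text; the class layout below is an assumption for illustration.

```python
import itertools

class Coordinator:
    """Sketch of a coordinator that issues TIDs and records participants."""

    _counter = itertools.count(1)       # number unique within this server

    def __init__(self, server_id):
        self.server_id = server_id      # e.g. an IP address of the creating server
        self.participants = {}          # TID -> list of participant references

    def open_transaction(self):
        # TID = (identifier of the server that created it, locally unique number)
        tid = (self.server_id, next(Coordinator._counter))
        self.participants[tid] = []
        return tid

    def join(self, trans, participant):
        """Informs the coordinator that a new participant has joined trans."""
        if participant not in self.participants[trans]:
            self.participants[trans].append(participant)


coordinator = Coordinator("10.0.0.1")
tid = coordinator.open_transaction()                 # e.g. ('10.0.0.1', 1)
coordinator.join(tid, "participant at BranchY")      # recorded in the participant list
```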
Figure 5.3 shows a client whose (flat) banking transaction involves accounts A, B, C and D at
servers BranchX, BranchY and BranchZ. The client’s transaction, T, transfers $4 from account
A to account C and then transfers $3 from account B to account D. The transaction described on
the left is expanded to show that openTransaction and closeTransaction are directed to the
coordinator, which would be situated in one of the servers involved in the transaction. Each
server is shown with a participant, which joins the transaction by invoking the join method in
the coordinator. When the client invokes one of the methods in the transaction, for example
b.withdraw(T, 3), the object receiving the invocation (B at BranchY, in this case) informs its
participant object that the object belongs to the transaction T. If it has not already informed the
coordinator, the participant object uses the join operation to do so. In this example, we show the
transaction identifier being passed as an additional argument so that the recipient can pass it on
to the coordinator. By the time the client calls closeTransaction, the coordinator has references
to all of the participants.
Note that it is possible for a participant to call abortTransaction in the coordinator if for some
reason it is unable to continue with the transaction.
CONCURRENCY CONTROL
Each server manages a set of objects and is responsible for ensuring that they remain consistent
when accessed by concurrent transactions. Therefore, each server is responsible for applying
concurrency control to its own objects.
The members of a collection of servers executing distributed transactions are jointly responsible
for ensuring that the transactions are performed in a serially equivalent manner. This implies that
if transaction T is before transaction U in their conflicting access to objects at one of the servers,
then every server whose objects they both access must make sure that T happens before U for
those objects.
Locking
In a distributed transaction, the locks on an object are held locally (in the same server). The
local lock manager can decide whether to grant a lock or make the requesting transaction wait.
However, it cannot release any locks until it knows that the transaction has been committed or
aborted at all the servers involved in the transaction.
When locking is used for concurrency control, the objects remain locked and are unavailable for
other transactions during the atomic commit protocol, although an aborted transaction releases
its locks after phase 1 of the protocol. As lock managers in different servers set their locks
independently of one another, it is possible that different servers may impose different orderings
on transactions.
Consider the following interleaving of transactions T and U at servers X and Y:
The transaction T locks object A at server X, and then transaction U locks object B at server Y.
After that, T tries to access B at server Y and waits for U’s lock. Similarly, transaction U tries to
access A at server X and has to wait for T’s lock.
Therefore, we have T before U in one server and U before T in the other. These different
orderings can lead to cyclic dependencies between transactions, giving rise to a distributed
deadlock situation.
When a deadlock is detected, a transaction is aborted to resolve the deadlock. In this case, the
coordinator will be informed and will abort the transaction at the participants involved in the
transaction.
Timestamp ordering concurrency control
In a single server transaction, the coordinator issues a unique timestamp to each transaction
when it starts. Serial equivalence is enforced by committing the versions of objects in the order
of the timestamps of transactions that accessed them.
In distributed transactions, we require that each coordinator issue globally unique timestamps. A
globally unique transaction timestamp is issued to the client by the first coordinator accessed by
a transaction. The transaction timestamp is passed to the coordinator at each server whose
objects perform an operation in the transaction.

The servers of distributed transactions are jointly responsible for ensuring that they are
performed in a serially equivalent manner. For example, if the version of an object accessed by
transaction U commits after the version accessed by T at one server, then at any other server
where T and U both access the same object, the versions must be committed in the same order.
To achieve the same ordering at all the servers, the coordinators must agree as to the ordering of
their timestamps. A timestamp consists of a <local timestamp, server-id> pair. The agreed
ordering of pairs of timestamps is based on a comparison in which the server-id part is less
significant. The same ordering of transactions can be achieved at all the servers even if their
local clocks are not synchronized.
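A minimal sketch of this comparison, assuming timestamps are represented as (local timestamp, server-id) pairs:

```python
# A globally unique transaction timestamp is a <local timestamp, server-id> pair.
# Comparing the pairs with the server-id as the less significant part gives all
# servers the same total order even if their local clocks are not synchronized.
def earlier(ts1, ts2):
    # ts = (local_timestamp, server_id); lexicographic tuple comparison does the job
    return ts1 < ts2

assert earlier((100, "X"), (100, "Y"))   # same local time: server-id breaks the tie
assert earlier((99, "Z"), (100, "A"))    # otherwise the local timestamp dominates
```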
For reasons of efficiency it is required that the timestamps issued by one coordinator be roughly
synchronized with those issued by the other coordinators. When this is the case, the ordering of
transactions generally corresponds to the order in which they are started in real time.
Timestamps can be kept roughly synchronized by the use of synchronized local physical clocks.
If the resolution of a conflict requires a transaction to be aborted, the coordinator will be
informed and it will abort the transaction at all the participants. Therefore any transaction that
reaches the client request to commit should always be able to commit, and participants in the
two-phase commit protocol will normally agree to commit. The only situation in which a
participant will not agree to commit is if it has crashed during the transaction.
Optimistic Concurrency Control in Distributed Transactions
• Validation of Transactions:
In optimistic concurrency control, transactions are validated before they are allowed to
commit. Each transaction is assigned a transaction number at the start of validation.
Transactions are serialized based on these transaction numbers.
• Distributed Transactions:
For distributed transactions, independent servers validate the transactions that access their
respective objects. This validation occurs in the first phase of the Two-Phase Commit
(2PC) protocol.
Example of Interleaving:

o Transaction T accesses object A on server X, and transaction U accesses object B
on server Y.
o Both transactions write to these objects and later access the other server's object.
o The transactions are validated at different servers in different orders (T before U at
X, U before T at Y).
• Commitment Deadlock:
When server X validates T first and server Y validates U first, a commitment deadlock
can occur. This deadlock arises because the simplified protocol only allows one
transaction to validate and update at a time, causing servers to wait for each other.
• Challenges in Distributed Transactions:
Unlike single-server transactions, distributed transactions take longer due to the two-
phase commit protocol. Transactions are blocked from entering validation until a decision
is made on the current transaction.
• Parallel Validation:
o Parallel validation allows multiple transactions to be in the validation phase
simultaneously.
o Both backward and forward validation need to be applied, checking for overlaps
between the objects accessed by the transaction being validated and the write sets of
earlier overlapping transactions (a minimal overlap check is sketched after this list).
• Avoiding Commitment Deadlock:
Parallel validation prevents commitment deadlock. However, independent validations
may result in different serialization orders at different servers.
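The overlap check referred to above can be sketched as follows. This is an assumption for illustration (a backward-style comparison over sets of object identifiers), not the exact scheme in the text:

```python
# Minimal sketch of a backward-style validation check: the transaction being
# validated conflicts with an earlier overlapping transaction if the objects it
# has accessed intersect that transaction's write set.
def validate(accessed_objects, earlier_write_sets):
    for write_set in earlier_write_sets:
        if accessed_objects & write_set:   # conflicting access detected
            return False                   # the transaction must be aborted
    return True

# T accessed {A, B}; an earlier overlapping transaction wrote {B} -> T fails validation.
print(validate({"A", "B"}, [{"C"}, {"B"}]))   # False
print(validate({"A"}, [{"C"}, {"B"}]))        # True
```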
Preventing Inconsistent Serialization
• Global Validation:
o After local validation, a global validation is performed to ensure the serializability
of the combined orderings from individual servers. This prevents cycles in the
serialization order of transactions.
• Globally Unique Transaction Numbers:
o The coordinator of the two-phase commit protocol generates a globally unique
transaction number, passed to all servers participating in the transaction.
o Servers need to agree on an ordering of these numbers to ensure consistency, as
seen in distributed timestamp ordering.
Algorithms for Optimistic Concurrency in Distributed Systems
• MVGV (Multi-Version Generalized Validation):
o This algorithm favors read-only transactions, ensuring that transaction numbers
reflect the serial order.
o It allows transaction numbers to be adjusted so that some transactions can still
validate, avoiding unnecessary aborts.
• Transaction Number Proposal:
o In some algorithms, like that proposed by Agrawal et al., the coordinator proposes
a global transaction number at the end of the read phase.
o Each participant attempts to validate transactions using this number. If the number
is too small, participants negotiate for a higher number.
o If a common number is found that all participants can validate, the transaction is
committed; otherwise, it is aborted.
Distributed deadlock
Deadlocks in Servers:
• Deadlocks can occur within a single server when locking is used for concurrency control.
Need for Deadlock Management:
• Servers must adopt strategies to either prevent deadlocks or detect and resolve them when
they occur.
Timeouts for Deadlock Resolution:
• Timeouts can be used to resolve deadlocks, but:
o They are a clumsy approach.
o Choosing an appropriate timeout interval is challenging.
o Transactions may be unnecessarily aborted.
Deadlock Detection Schemes:
• Deadlock detection schemes are more efficient because:
o A transaction is aborted only when it is part of a deadlock.
• Most detection schemes work by identifying cycles in the transaction wait-for graph.
Distributed Systems Complexity:
• In distributed systems with multiple servers and transactions, the situation becomes more
complex.
• A global strategy is required to effectively manage and detect deadlocks across the entire
system.
1. Global Wait-For Graph:
o The global wait-for graph can theoretically be constructed by combining local
wait-for graphs from multiple servers.
o A deadlock in a distributed system arises if there is a cycle in the global wait-for
graph, which might not be present in any single local graph.
2. Wait-For Graph Details:
o Structure:
▪ Nodes represent transactions and objects.
▪ Edges represent either:
▪ An object held by a transaction, or
▪ A transaction waiting for an object.
o Deadlock Detection:
▪ A deadlock occurs if and only if there is a cycle in the wait-for graph.
o Simplification: Objects can be omitted from the graph, as a transaction can wait for
only one object at a time.
3. Illustrative Example:
o Transactions and Servers:
▪ Transactions: U, V, and W.
▪ Objects: A and B (Servers X and Y), C and D (Server Z).
o Local Wait-For Graphs:
▪ Server Y: U → V (when U requests b.withdraw(30)).
▪ Server Z: V → W (when V requests c.withdraw(20)).
▪ Server X: W → U (when W requests a.withdraw(20)).
4. Global Deadlock Detection:
o Challenge:
▪ The global wait-for graph is distributed across servers, requiring inter-server
communication to detect cycles.
o Centralized Deadlock Detection:
▪ One server acts as the global deadlock detector.
▪ Local wait-for graphs are periodically sent to the global detector.
▪ The global detector merges these local graphs to create a comprehensive
global wait-for graph.
▪ It checks for cycles to identify distributed deadlocks.
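A rough sketch of centralized detection, assuming each local wait-for graph is represented as a mapping from a waiting transaction to the transaction it waits for (objects are omitted, since a transaction waits for at most one object at a time):

```python
# Sketch of centralized detection: merge local wait-for graphs and look for a
# cycle. Each transaction has at most one outgoing edge, so following the
# "waits-for" chain from every transaction is enough to find a cycle.
def merge(local_graphs):
    global_graph = {}
    for g in local_graphs:
        global_graph.update(g)           # edges: waiter -> holder
    return global_graph

def find_cycle(graph):
    for start in graph:
        seen, node = set(), start
        while node in graph:
            if node in seen:
                return node              # a transaction inside a cycle
            seen.add(node)
            node = graph[node]
    return None

# Example from the text: U waits for V (at Y), V waits for W (at Z), W waits for U (at X).
local_graphs = [{"U": "V"}, {"V": "W"}, {"W": "U"}]
print(find_cycle(merge(local_graphs)))   # -> 'U' (a transaction in the deadlock cycle)
```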

• Deadlock Resolution:
• Upon finding a cycle in the global wait-for graph:
o The centralized detector decides how to resolve the deadlock.
o It instructs the servers on which transaction to abort.
• Drawbacks of Centralized Detection:
• Single Server Dependency:
o Relies entirely on one server to detect and resolve deadlocks.
• Poor Availability:
o The system becomes less available due to its reliance on a single point of control.
• Lack of Fault Tolerance:
o Failure of the central server disrupts the entire deadlock detection process.
• Limited Scalability:
o Not suitable for large distributed systems with high transaction and server volumes.
• High Communication Overhead:
• Frequent transmission of local wait-for graphs to the centralized detector incurs
significant communication costs.
• Delayed Deadlock Detection:
• Reducing the frequency of updates to minimize communication overhead can delay the
identification of deadlocks, prolonging their impact.
Phantom Deadlocks:
1. Definition of Phantom Deadlock:
o A phantom deadlock occurs when a deadlock is falsely detected even though it
does not actually exist.
2. Cause of Phantom Deadlocks in Distributed Systems:
o Information about wait-for relationships between transactions is transmitted
between servers.
o Detecting a deadlock requires collecting and analyzing this information, which
takes time.
o During this time, a transaction holding a lock may release it, resolving the
deadlock before it is detected.
3. Example of Phantom Deadlock:
o A global deadlock detector receives local wait-for graphs from servers X and Y.
o Transaction U releases an object at server X and requests an object held by V at
server Y.
o The global detector processes server Y's graph before receiving the updated graph
from server X.
o A cycle (e.g., T → U → V → T) is detected, even though the edge T → U no
longer exists.
4. Impact of Two-Phase Locking:
o If transactions use two-phase locking:
▪ Transactions cannot release objects and then acquire more locks.
▪ This reduces the likelihood of phantom deadlocks occurring.

(Figure: local wait-for graphs at servers X and Y, and the global deadlock detector.)

If transactions use two-phase locking, phantom deadlock cycles of this kind cannot occur in the
way suggested above. Consider the situation in which a cycle T → U → V → T is detected: either
this represents a deadlock or each of the transactions T, U and V must eventually commit. It is
actually impossible for any of them to commit, because each of them is waiting for an object that
will never be released.
A phantom deadlock could be detected if a waiting transaction in a deadlock cycle aborts during
the deadlock detection procedure. For example, if there is a cycle T → U → V → T and U aborts
after the information concerning U has been collected, then the cycle has already been broken
and there is no deadlock.

Edge Chasing for Distributed Deadlock Detection:


1. Definition of Edge Chasing:
o A distributed technique for deadlock detection where no global wait-for graph is
constructed.
o Servers maintain knowledge of local edges and use messages (probes) to detect
cycles across the system.
2. Probes:
o Probes are messages representing transaction wait-for paths in the global wait-for
graph.
o Probes follow edges of the graph across servers to identify potential cycles.
3. When to Send a Probe:
o A probe is sent when adding a local edge might indicate part of a cycle.
o Example: At server X, adding the edge W → U when U is waiting for V at server Y
justifies sending a probe to Y.
o Probes are not sent if no waiting transactions are involved, e.g., when W is not
waiting in the case of server Z adding V → W.

Steps in Edge-Chasing Algorithms:


• Initiation:
o A server initiates detection when a transaction T starts waiting for another
transaction U, where U is waiting for an object at another server.
o A probe containing the edge T → U is sent to the relevant server.
o If U shares a lock, probes are sent to all the lock holders, including new holders
added later.
• Detection:
o Servers receiving probes check whether the referenced transaction is also waiting.
o If it is waiting, the transaction it depends on (e.g., V) is added to the probe, and the
probe is forwarded.
o Servers check for cycles in the probe (e.g., T → U → V → T), which indicate a
deadlock.
• Resolution:
o When a cycle is detected, one transaction in the cycle is aborted to break the
deadlock.
o The transaction to abort is chosen based on priority.
Example of Deadlock Detection:
• Initiation: Server X sends the probe W → U to Server Y.
• Detection at Server Y:
o B is held by V, so V is appended to the probe to create W → U → V.
o The probe is forwarded to Server Z, as V is waiting for C.
• Detection at Server Z:
o C is held by W, so W is appended to the probe, creating W → U → V → W.
o A cycle is detected, indicating a deadlock.
Communication Process:
• Servers consult transaction coordinators before forwarding probes.
• Coordinators provide information about whether transactions are waiting and forward
probes to the relevant servers.
• Each probe message requires two transmissions (server to coordinator, coordinator to
server).
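The probe-handling rule at a receiving server can be sketched as follows; the list/dictionary representation of probes and local wait-for edges is an assumption for illustration:

```python
# Sketch of the edge-chasing rule applied when a server receives a probe: if the
# last transaction in the probe is itself waiting, extend the probe; if the
# extended probe contains a cycle, a deadlock has been detected.
def receive_probe(probe, waits_for):
    """probe: list of transactions, e.g. ["W", "U"] meaning W -> U.
    waits_for: this server's local edges, e.g. {"U": "V"}."""
    last = probe[-1]
    if last not in waits_for:
        return probe, False              # last transaction is not blocked here
    nxt = waits_for[last]
    if nxt in probe:
        return probe + [nxt], True       # cycle closed: deadlock detected
    return probe + [nxt], False          # extended probe is forwarded onwards

# Example from the text:
probe, _ = receive_probe(["W", "U"], {"U": "V"})     # at server Y: W -> U -> V
probe, deadlock = receive_probe(probe, {"V": "W"})   # at server Z: W -> U -> V -> W
print(probe, deadlock)                               # ['W', 'U', 'V', 'W'] True
```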
(Figure: (a) initial situation; (b) detection initiated at the object requested by T; (c) detection
initiated at the object requested by W. Probes travel downhill.)
(Figure: (a) V stores the probe when U starts waiting; (b) the probe is forwarded when V starts
waiting.)

Challenges and Improvements in Edge-Chasing Algorithms:


1. Priority Rule Pitfall:
o Without priority rules, probes are initiated whenever a transaction starts waiting.
o Under priority rules, a probe is not sent if the waiting transaction has a lower
priority than the transaction it waits for, potentially leaving deadlocks undetected.
o Example: In a cycle U → V → W → U, if W < U, the probe W → U will not be sent,
and the deadlock is not detected.
2. Avoiding the Pitfall:
o Use probe queues:
▪ Coordinators save all probes received for a transaction in a queue.
▪ When a transaction starts waiting, it forwards its probe queue to the server of
the object it is waiting for.
o Example:
▪ U waits for V, and V waits for W:
▪ The coordinator of V saves U → V.
▪ V forwards U → V to W.
▪ W waits for U:
▪ W's probe queue includes U → V → W, leading to detection of the cycle
U → V → W → U.
3. Challenges with Probe Queues:
o Probes must be passed on to new lock holders; failure to forward relevant probes may
result in undetected deadlocks.
o Probes referring to completed (committed or aborted) transactions must be discarded;
retaining such outdated probes can lead to phantom deadlocks.
4. Algorithm Complexity:
o Handling probe queues adds significant complexity to edge-chasing algorithms.
o Incorrect handling of probes can lead to:
▪ Missed deadlocks.
▪ False deadlock reports.
5. Algorithm Improvements:
o Early algorithms (e.g., by Sinha and Natarajan [1985]) were found to be incorrect.
o Choudhary et al. [1989] proposed improvements but had limitations.
o Kshemkalyani and Singhal [1991] corrected the Choudhary algorithm and
provided proof of correctness.
o Later work (Kshemkalyani and Singhal [1994]) highlighted that distributed
deadlocks remain poorly understood due to the absence of global state or time in
distributed systems.
REPLICATION
INTRODUCTION TO REPLICATION IN DISTRIBUTED SYSTEM:
Data replication refers to the process of creating and maintaining multiple copies of data
across different storage locations or systems. It involves duplicating data from a source
location, known as the primary or master copy, to one or more secondary copies. These
secondary copies can be stored on separate servers, data centers, or even in different
geographical locations.
Replication is key to the effectiveness of distributed systems in that it can provide
enhanced performance, high availability and fault tolerance, and it is used widely.
Replication in distributed systems involves creating duplicate copies of data or services
across multiple nodes.

FIG 1: Introduction to replication

Data Availability and Redundancy: By having multiple copies of data, replication ensures
high availability and redundancy. If the primary copy becomes unavailable due to hardware
failures, network issues, or disasters, the secondary copies can be used to serve data and
maintain continuous operations.

Improved Performance and Scalability: Replication can enhance read performance by
distributing the data across multiple locations. Clients can retrieve data from the nearest or
most suitable replica, reducing latency and improving response times. Replication also
enables the scaling of read operations by allowing multiple servers to serve data
concurrently.

Disaster Recovery and Business Continuity: Data replication plays a crucial role in
disaster recovery and business continuity strategies. In the event of a catastrophic failure,
such as a natural disaster or data center outage, secondary copies can be used to quickly
restore data and resume operations. Replication allows for data to be geographically
distributed, protecting against localized failures.
Load Balancing and Performance Optimization: Replication enables load balancing by
distributing read and write operations across multiple copies. This helps to distribute the
workload and prevent any single replica from becoming a performance bottleneck. By
spreading the load, replication can improve overall system performance and resource
utilization.
Data Consistency and Integrity: Replication techniques often include mechanisms for
maintaining data consistency and integrity across the replicas. Updates to the primary copy
are propagated to the secondary copies, ensuring that all replicas remain synchronized and up
to date. Techniques like synchronous or asynchronous replication can be employed based on
the desired consistency guarantees and trade-offs.
TRANSACTION RECOVERY
• The atomic property of transactions requires that all the effects of committed
transactions and none of the effects of incomplete or aborted transactions are reflected
in the objects they accessed.
• This property can be described in terms of two aspects: durability and failure atomicity.
• Durability requires that objects are saved in permanent storage and will be available
indefinitely thereafter.
• Failure atomicity requires that the effects of transactions are atomic even when the server
crashes.
• When a server is running, it keeps all of its objects in its volatile memory (temporary
storage) and records its committed objects in a recovery file.
• The requirements for durability and failure atomicity are not really independent of one
another and can be dealt with by a single mechanism – the recovery manager.
• The tasks of a recovery manager are:
• to save objects in permanent storage (in a recovery file) for committed transactions;
• to restore the server’s objects after a crash;
• to reorganize the recovery file to improve the performance of recovery;
• to reclaim storage space (in the recovery file).
• Intentions list
• The intentions list is a mechanism used by servers to track which objects are modified
by a transaction and the tentative values of those objects.
• It's essentially a "log" of the objects a transaction intends to update, before those
changes are finalized.
• Purpose - The intentions list helps ensure that the state of objects is properly tracked
for recovery purposes, allowing the system to correctly commit or abort changes made
by transactions even if the system crashes.
• For each active transaction, an intentions list holds the following:
• Object references: Identifiers of objects that were altered by the transaction.
• Tentative values: The new, temporary values of those objects, which may eventually
replace the old, committed values.
• If the transaction commits, the server uses the intentions list to finalize the changes by
replacing the original objects with the tentative ones.
• If the transaction aborts, the tentative versions are discarded, and the objects return to
their original state.
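A minimal sketch of an intentions list, assuming objects and their tentative values are held in dictionaries:

```python
# Minimal sketch of an intentions list: tentative values are recorded per
# transaction and only replace the committed values on commit.
class Server:
    def __init__(self, objects):
        self.objects = dict(objects)     # committed values
        self.intentions = {}             # tid -> {object reference: tentative value}

    def write(self, tid, ref, value):
        self.intentions.setdefault(tid, {})[ref] = value   # tentative only

    def commit(self, tid):
        self.objects.update(self.intentions.pop(tid, {}))  # finalize the changes

    def abort(self, tid):
        self.intentions.pop(tid, None)   # discard the tentative versions


s = Server({"A": 100, "B": 200})
s.write("T", "A", 90)
s.abort("T")             # A stays 100
s.write("U", "B", 220)
s.commit("U")            # B becomes 220
print(s.objects)         # {'A': 100, 'B': 220}
```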
• Discussion of recovery is based on the two-phase commit protocol, in which all the
participants involved in a transaction first say whether they are prepared to commit
and later, if all the participants agree, carry out the actual commit actions.
• If the participants cannot agree to commit, they must abort the transaction.
• When all the participants involved in a transaction agree to commit it, the coordinator
informs the client and then sends messages to the participants to commit their part of
the transaction.
• Once the client has been informed that a transaction has committed, the recovery files
of the participating servers must contain sufficient information to ensure that the
transaction is committed by all of the servers.
LOGGING
• In the logging technique, the recovery file represents a log containing the history of all
the transactions performed by a server.
• The history consists of values of objects, transaction status entries and transaction
intentions lists.
• The order of the entries in the log reflects the order in which transactions have
prepared, committed and aborted at that server.
• Appending to the Log: When a transaction prepares to commit, its intentions list and
the prepared status are appended to the log. If it commits, a committed status is added.
Similarly, if a transaction is aborted, an aborted status is logged.
• The log is written sequentially, which is more efficient than writing to random disk
locations.
• Each write is atomic, ensuring consistency even in the event of partial writes.
• After a crash, any transaction that does not have a committed status in the log is
aborted. Therefore when a transaction commits, its committed status entry must be
forced to the log – that is, written to the log together with any other buffered entries.
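A minimal sketch of sequential, append-only logging with a forced write for the committed status entry; the file name and record format here are assumptions for illustration:

```python
# Sketch of appending transaction status entries to a sequential log and
# forcing the committed status entry to disk.
import json, os

def append_entry(log_path, entry, force=False):
    with open(log_path, "a") as log:
        log.write(json.dumps(entry) + "\n")   # sequential append
        if force:                             # committed status must be forced
            log.flush()
            os.fsync(log.fileno())            # written through to the disk

append_entry("recovery.log", {"status": "prepared", "tid": "T",
                              "intentions": [["A", "P1"], ["B", "P2"]]})
append_entry("recovery.log", {"status": "committed", "tid": "T"}, force=True)
```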

Consider the banking service example provided in the text, which illustrates how transaction
logs are recorded:
Before Transaction T and U Start: The log contains a snapshot of objects A, B, and C
(unique identifiers for objects) with their values (e.g., A=100, B=200, C=300).
Transaction T Prepares to Commit: The log includes entries for the tentative values of
objects A and B, followed by the prepared status entry for T (< A, P1 >, < B, P2 >), where P1
and P2 are positions in the log where these tentative values are stored.
Transaction T Commits: The log includes a committed status entry for T at position P4.
Transaction U Prepares to Commit: Similar to T, the log records the tentative values for
objects C and B, along with the prepared status for U (< C, P5 >, < B, P6 >).
Recovery of objects:
• When a server is replaced after a crash, it first sets default initial values for its objects
and then hands over to its recovery manager.
• The recovery manager is responsible for restoring the server’s objects so that they
include all the effects of the committed transactions performed in the correct order and
none of the effects of incomplete or aborted transactions.
• There are two approaches to restoring the data from the recovery file
• In the first approach, the recovery manager starts at the beginning and restores the
values of all of the objects from the most recent checkpoint. It then reads in the
values of each of the objects, associates them with their transaction’s intentions lists
and, for committed transactions, replaces the values of the objects.
• In the second approach, the recovery manager will restore a server’s objects by
reading the recovery file backwards. The recovery file has been structured so that there
is a backwards pointer from each transaction status entry to the next.
• The recovery manager uses transactions with committed status to restore those objects
that have not yet been restored. It continues until it has restored all of the server’s
objects.
• During the recovery process, the recovery manager also tracks all prepared
transactions (those that are in the "prepared" state but not yet committed). For each
prepared transaction, the recovery manager adds an aborted transaction status to the
log. This ensures that every transaction in the recovery file is eventually marked as
either committed or aborted.
• The recovery process must be idempotent, meaning that it can be performed multiple
times without introducing errors or inconsistencies.
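A rough sketch of the backward-reading approach, assuming each committed status entry carries the values written by that transaction (the record format is an assumption for illustration):

```python
# Sketch of the second approach: scan the log backwards and restore each object
# from the most recent committed transaction that wrote it. Prepared and aborted
# transactions contribute no effects. Running this again gives the same result,
# so the recovery step is idempotent.
def recover(log_entries):
    objects, restored = {}, set()
    for entry in reversed(log_entries):            # newest entries first
        if entry["status"] != "committed":
            continue
        for ref, value in entry["values"].items():
            if ref not in restored:                # keep only the latest committed value
                objects[ref] = value
                restored.add(ref)
    return objects

log = [{"status": "committed", "tid": "T", "values": {"A": 80, "C": 320}},
       {"status": "committed", "tid": "U", "values": {"B": 180, "C": 340}},
       {"status": "prepared",  "tid": "V", "values": {"A": 70}}]
print(recover(log))    # {'B': 180, 'C': 340, 'A': 80}  (V's effects are excluded)
```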
• Reorganizing the recovery file:
• A recovery manager is responsible for reorganizing its recovery file so as to make the
process of recovery faster and to reduce its use of space.
• Conceptually, the only information required for recovery is a copy of the committed
version of each object in the server.
• The name checkpointing is used to refer to the process of writing the current
committed values of a server’s objects to a new recovery file, together with the
transaction status entries and intentions lists of transactions that have not yet been
fully resolved.
• The purpose of making checkpoints is to reduce the number of transactions to be
dealt with during recovery and to reclaim file space.
• Checkpointing can be done immediately after recovery but before any new
transactions are started.
• However, recovery may not occur very often. Therefore, checkpointing may need to
be done from time to time during the normal activity of a server.
• The checkpoint is written to a future recovery file, and the current recovery file
remains in use until the checkpoint is complete.
• Checkpointing consists of adding a mark to the recovery file when the checkpointing
starts, writing the server’s objects to the future recovery file and then copying to that
file (1) all entries before the mark that relate to as-yet-unresolved transactions and (2)
all entries after the mark in the recovery file.
• When the checkpoint is complete, the future recovery file becomes the recovery file.
• The recovery system can reduce its use of space by discarding the old recovery file.
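A rough sketch of checkpointing under simplifying assumptions (a one-record checkpoint of the committed objects, and copying only the entries of unresolved transactions before switching files):

```python
# Sketch of checkpointing: write the committed objects plus the entries of
# still-unresolved (prepared) transactions to a future recovery file, then make
# that file the recovery file and discard the old one.
import json, os

def checkpoint(objects, log_entries, recovery_path, future_path):
    resolved = {e["tid"] for e in log_entries
                if e["status"] in ("committed", "aborted")}
    with open(future_path, "w") as f:
        f.write(json.dumps({"status": "checkpoint", "objects": objects}) + "\n")
        for e in log_entries:                      # keep only unresolved work
            if e["tid"] not in resolved:
                f.write(json.dumps(e) + "\n")
        f.flush(); os.fsync(f.fileno())
    os.replace(future_path, recovery_path)         # future file becomes the recovery file

objects = {"A": 80, "B": 220}
log = [{"tid": "T", "status": "committed"},
       {"tid": "V", "status": "prepared", "intentions": [["A", "P7"]]}]
checkpoint(objects, log, "recovery.log", "recovery.log.new")
```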
