DISTRIBUTED TRANSACTIONS
What is Atomicity?
• Atomicity ensures that a transaction is treated as a single, indivisible unit.
• In distributed systems, where multiple servers are involved in the same transaction, the
atomicity property ensures:
o The transaction fully succeeds: All servers commit the changes, or
o The transaction completely fails: All servers abort and roll back the changes.
• Example:
o Imagine splitting a bill at a restaurant:
▪ If one person cannot pay, the whole group’s payment attempt is canceled
(abort).
▪ If everyone is ready to pay, the transaction goes through (commit).
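The all-or-nothing rule above can be illustrated with a minimal sketch in Java (the Server type and the prepare/commit/abort names are hypothetical, for illustration only): the group payment, or any distributed transaction, goes through only if every server involved is prepared to commit; otherwise every server aborts.

import java.util.List;

// Minimal sketch of the all-or-nothing rule (illustrative names):
// the transaction commits only if every server is prepared to commit;
// otherwise every server is told to abort and roll back.
class AtomicitySketch {
    interface Server {
        boolean prepare(String tid);   // can this server commit its part?
        void commit(String tid);
        void abort(String tid);
    }

    static void completeTransaction(String tid, List<Server> servers) {
        boolean allPrepared = servers.stream().allMatch(s -> s.prepare(tid));
        for (Server s : servers) {
            if (allPrepared) s.commit(tid);   // everyone succeeds together
            else             s.abort(tid);    // or everyone rolls back
        }
    }
}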
FLAT TRANSACTION: In a flat transaction, a client makes requests to more than one server.
For example, in Figure (a), transaction T is a flat transaction that invokes operations on objects
in servers X, Y and Z. A flat client transaction completes each of its requests before going on to
the next one. Therefore, each transaction accesses servers’ objects sequentially. When servers
use locking, a transaction can only be waiting for one object at a time.
NESTED TRANSACTION: In a nested transaction, the top-level transaction can open
subtransactions, and each subtransaction can open further subtransactions down to any depth of
nesting. Figure (b) shows a client transaction T that opens two subtransactions, T1 and T2,
which access objects at servers X and Y. The subtransactions T1 and T2 open further
subtransactions T11, T12, T21, and T22, which access objects at servers M, N and P. In the
nested case, subtransactions at the same level can run concurrently, so T1 and T2 are
concurrent, and as they invoke objects in different servers, they can run in parallel. The four
subtransactions T11, T12, T21 and T22 also run concurrently. The following example illustrates a nested transaction.
Consider a distributed transaction in which a client transfers $10 from account A to C and then
transfers $20 from B to D. Accounts A and B are at separate servers X and Y and accounts C
and D are at server Z. If this transaction is structured as a set of four nested transactions, as
shown in Figure 5.2, the four requests (two deposits and two withdrawals) can run in parallel, and the overall effect can be achieved with better performance than a simple transaction in which the four operations are invoked sequentially.
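A minimal sketch of this nested structure, assuming hypothetical withdraw/deposit helpers that send requests to the servers holding each account: the four subtransactions are issued concurrently and the top-level transaction waits for all of them, which is what allows the better performance mentioned above.

import java.util.concurrent.CompletableFuture;

// Hedged sketch of the nested transfer above: the four operations
// (two withdrawals, two deposits) are issued as subtransactions that
// may run in parallel because they touch accounts on different servers.
// The account names and the withdraw/deposit helpers are illustrative.
class NestedTransferSketch {
    static void withdraw(String account, int amount) { /* request to the server holding the account */ }
    static void deposit(String account, int amount)  { /* request to the server holding the account */ }

    public static void main(String[] args) {
        // T1: transfer $10 from A (server X) to C (server Z)
        CompletableFuture<Void> t11 = CompletableFuture.runAsync(() -> withdraw("A", 10));
        CompletableFuture<Void> t12 = CompletableFuture.runAsync(() -> deposit("C", 10));
        // T2: transfer $20 from B (server Y) to D (server Z)
        CompletableFuture<Void> t21 = CompletableFuture.runAsync(() -> withdraw("B", 20));
        CompletableFuture<Void> t22 = CompletableFuture.runAsync(() -> deposit("D", 20));

        // The top-level transaction T waits for all four subtransactions.
        CompletableFuture.allOf(t11, t12, t21, t22).join();
    }
}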
5.2.1 The coordinator of a distributed transaction
Servers that execute requests as part of a distributed transaction need to be able to communicate
with one another to coordinate their actions when the transaction commits. A client starts a
transaction by sending an openTransaction request to a coordinator in any server, as described
in Section 4.2. The coordinator that is contacted carries out the openTransaction and returns the
resulting transaction identifier (TID) to the client. Transaction identifiers for distributed
transactions must be unique within a distributed system. A simple way to achieve this is for a
TID to contain two parts: the identifier (for example, an IP address) of the server that created it
and a number unique to the server. The coordinator that opened the transaction becomes the
coordinator for the distributed transaction and at the end is responsible for committing or
aborting it. Each of the servers that manages an object accessed by a transaction is a participant
in the transaction and provides an object we call the participant. Each participant is responsible for keeping track of all of the recoverable objects at that server that are involved in the transaction. The participants
are responsible for cooperating with the coordinator in carrying out the commit protocol.
During the progress of the transaction, the coordinator records a list of references to the
participants, and each participant records a reference to the coordinator. The interface for
Coordinator shown in Figure 5.3 provides an additional method, join, which is used whenever a
new participant joins the transaction: join(Trans, reference to participant) informs the coordinator that a new participant has joined the transaction Trans. The coordinator records the new participant in its participant list. The fact that the coordinator knows all the participants and
each participant knows the coordinator will enable them to collect the information that will be
needed at commit time.
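The following sketch shows the coordinator operations named above as a Java interface. The parameter and return types (String TIDs, a Participant reference type) are assumptions for illustration, not exact signatures from the text.

// Sketch of the Coordinator operations described above (method names follow
// the text; parameter and return types are illustrative assumptions).
interface Coordinator {
    // Starts a new transaction and returns a globally unique TID,
    // e.g. <identifier of the creating server, per-server sequence number>.
    String openTransaction();

    // Commits or aborts the transaction on behalf of the client.
    void closeTransaction(String tid);
    void abortTransaction(String tid);

    // join(Trans, reference to participant): called by a participant the first
    // time one of its objects is used in the transaction, so the coordinator
    // can record it in its participant list.
    void join(String tid, Participant participant);
}

// Each server that manages objects accessed by the transaction provides one of
// these; it cooperates with the coordinator in the commit protocol.
interface Participant {
}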
Figure 5.3 shows a client whose (flat) banking transaction involves accounts A, B, C and D at
servers BranchX, BranchY and BranchZ. The client’s transaction, T, transfers $4 from account
A to account C and then transfers $3 from account B to account D. The transaction described on
the left is expanded to show that openTransaction and closeTransaction are directed to the
coordinator, which would be situated in one of the servers involved in the transaction. Each
server is shown with a participant, which joins the transaction by invoking the join method in
the coordinator. When the client invokes one of the methods in the transaction, for example
b.withdraw(T, 3), the object receiving the invocation (B at BranchY, in this case) informs its
participant object that the object belongs to the transaction T. If it has not already informed the
coordinator, the participant object uses the join operation to do so. In this example, we show the
transaction identifier being passed as an additional argument so that the recipient can pass it on
to the coordinator. By the time the client calls closeTransaction, the coordinator has references
to all of the participants.
Note that it is possible for a participant to call abortTransaction in the coordinator if for some
reason it is unable to continue with the transaction.
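A hedged sketch of this joining behaviour, reusing the Coordinator and Participant types from the sketch above (the BranchParticipant and Account names are illustrative): when an account object is first used in a transaction, it tells its local participant, which calls join on the coordinator exactly once for that transaction.

import java.util.HashSet;
import java.util.Set;

// Illustrative sketch: the object receiving an invocation informs its local
// participant, and the participant joins the transaction on first use.
class BranchParticipant implements Participant {
    private final Coordinator coordinator;
    private final Set<String> joined = new HashSet<>();

    BranchParticipant(Coordinator coordinator) { this.coordinator = coordinator; }

    // Called by every object on this server that is used in transaction tid.
    synchronized void objectUsedIn(String tid) {
        if (joined.add(tid)) {            // first time this server sees tid
            coordinator.join(tid, this);  // tell the coordinator we are a participant
        }
    }
}

class Account {
    private final BranchParticipant participant;
    private int balance;

    Account(BranchParticipant participant, int balance) {
        this.participant = participant;
        this.balance = balance;
    }

    void withdraw(String tid, int amount) {
        participant.objectUsedIn(tid);    // ensures the server has joined tid
        balance -= amount;                // tentative update, finalized at commit
    }
}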
CONCURRENCY CONTROL
Each server manages a set of objects and is responsible for ensuring that they remain consistent
when accessed by concurrent transactions. Therefore, each server is responsible for applying
concurrency control to its own objects.
The members of a collection of servers of distributed transactions are jointly responsible for
ensuring that they are performed in a serially equivalent manner. This implies that if transaction
T is ordered before transaction U in their conflicting access to objects at one of the servers, then every server involved must ensure that T comes before U for the objects they both use.
Locking
In a distributed transaction, the locks on an object are held locally (in the same server). The
local lock manager can decide whether to grant a lock or make the requesting transaction wait.
However, it cannot release any locks until it knows that the transaction has been committed or
aborted at all the servers involved in the transaction.
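A minimal sketch of such a local lock manager (the names are illustrative): lock requests are granted or blocked locally, but a transaction's locks are only released once the global commit or abort decision is known.

import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: a per-server lock manager grants or blocks lock requests
// locally, but only releases a transaction's locks once the coordinator has
// reported the global outcome (commit or abort) at all servers.
class LocalLockManager {
    private final Map<String, String> lockHolder = new HashMap<>(); // objectId -> tid

    synchronized void lock(String objectId, String tid) throws InterruptedException {
        while (lockHolder.containsKey(objectId) && !tid.equals(lockHolder.get(objectId))) {
            wait();                       // local decision: make the requester wait
        }
        lockHolder.put(objectId, tid);
    }

    // Called only after the transaction has committed or aborted at all servers.
    synchronized void releaseAll(String tid) {
        lockHolder.values().removeIf(holder -> holder.equals(tid));
        notifyAll();
    }
}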
When locking is used for concurrency control, the objects remain locked and are unavailable for
other transactions during the atomic commit protocol, although an aborted transaction releases
its locks after phase 1 of the protocol. As lock managers in different servers set their locks
independently of one another, it is possible that different servers may impose different orderings
on transactions.
Consider the following interleaving of transactions T and U at servers X and Y:
The transaction T locks object A at server X, and then transaction U locks object B at server Y.
After that, T tries to access B at server Y and waits for U’s lock. Similarly, transaction U tries to
access A at server X and has to wait for T’s lock.
Therefore, we have T before U in one server and U before T in the other. These different
orderings can lead to cyclic dependencies between transactions, giving rise to a distributed
deadlock situation.
When a deadlock is detected, a transaction is aborted to resolve the deadlock. In this case, the
coordinator will be informed and will abort the transaction at the participants involved in the
transaction.
Timestamp ordering concurrency control
In a single server transaction, the coordinator issues a unique timestamp to each transaction
when it starts. Serial equivalence is enforced by committing the versions of objects in the order
of the timestamps of transactions that accessed them.
In distributed transactions, we require that each coordinator issue globally unique timestamps. A
globally unique transaction timestamp is issued to the client by the first coordinator accessed by
a transaction. The transaction timestamp is passed to the coordinator at each server whose
objects perform an operation in the transaction.
The servers of distributed transactions are jointly responsible for ensuring that they are
performed in a serially equivalent manner. For example, if the version of an object accessed by transaction U commits after the version accessed by T at one server, then at any other server where T and U both access an object, the versions must be committed in the same order (T before U).
To achieve the same ordering at all the servers, the coordinators must agree as to the ordering of
their timestamps. A timestamp consists of a <local timestamp, server-id> pair. The agreed
ordering of pairs of timestamps is based on a comparison in which the server-id part is less
significant. The same ordering of transactions can be achieved at all the servers even if their
local clocks are not synchronized.
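A small sketch of this agreed ordering in Java: the local timestamp is compared first and the server-id only breaks ties, making it the less significant part of the pair. The GlobalTimestamp type is an assumption for illustration.

import java.util.Comparator;

// Illustrative sketch of the agreed ordering of <local timestamp, server-id>
// pairs: the local timestamp is the more significant part; the server-id is
// only used to break ties between equal local timestamps.
record GlobalTimestamp(long localTimestamp, String serverId) {}

class TimestampOrdering {
    static final Comparator<GlobalTimestamp> ORDER =
            Comparator.comparingLong(GlobalTimestamp::localTimestamp)
                      .thenComparing(GlobalTimestamp::serverId);
}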
For reasons of efficiency it is required that the timestamps issued by one coordinator be roughly
synchronized with those issued by the other coordinators. When this is the case, the ordering of
transactions generally corresponds to the order in which they are started in real time.
Timestamps can be kept roughly synchronized by the use of synchronized local physical clocks.
If the resolution of a conflict requires a transaction to be aborted, the coordinator will be
informed and it will abort the transaction at all the participants. Therefore, any transaction that reaches the point at which the client requests a commit should always be able to commit, and participants in the
two-phase commit protocol will normally agree to commit. The only situation in which a
participant will not agree to commit is if it has crashed during the transaction.
Optimistic Concurrency Control in Distributed Transactions
• Validation of Transactions:
In optimistic concurrency control, transactions are validated before they are allowed to
commit. Each transaction is assigned a transaction number at the start of validation.
Transactions are serialized based on these transaction numbers.
• Distributed Transactions:
For distributed transactions, independent servers validate the transactions that access their
respective objects. This validation occurs in the first phase of the Two-Phase Commit
(2PC) protocol.
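A minimal sketch of per-server validation during the prepare phase, under simplifying assumptions (a backward-validation style check of the transaction's read set against the write sets of transactions that committed since it started; the names and structures are illustrative, not from the text):

import java.util.Set;

// Illustrative sketch: with optimistic concurrency control, each server
// validates a distributed transaction locally during the first (prepare)
// phase of 2PC and returns its vote to the coordinator.
class OptimisticParticipant {
    boolean prepare(Set<String> readSet, Set<String> overlappingCommittedWrites) {
        for (String objectId : readSet) {
            if (overlappingCommittedWrites.contains(objectId)) {
                return false;   // conflict at this server: vote "abort"
            }
        }
        return true;            // no conflict at this server: vote "commit"
    }
}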
In the centralized approach to distributed deadlock detection, each server sends its local wait-for graph to a central (global) deadlock detector, which combines them into a single global wait-for graph; a minimal sketch of this scheme follows the list of drawbacks below.
Deadlock Resolution:
• Upon finding a cycle in the global wait-for graph:
o The centralized detector decides how to resolve the deadlock.
o It instructs the servers on which transaction to abort.
Drawbacks of Centralized Detection:
• Single Server Dependency:
o Relies entirely on one server to detect and resolve deadlocks.
• Poor Availability:
o The system becomes less available due to its reliance on a single point of control.
• Lack of Fault Tolerance:
o Failure of the central server disrupts the entire deadlock detection process.
• Limited Scalability:
o Not suitable for large distributed systems with high transaction and server volumes.
High Communication Overhead:
• Frequent transmission of local wait-for graphs to the centralized detector incurs
significant communication costs.
Delayed Deadlock Detection:
• Reducing the frequency of updates to minimize communication overhead can delay the
identification of deadlocks, prolonging their impact.
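A minimal sketch of the centralized scheme mentioned above: local wait-for graphs are merged into one global graph and a depth-first search looks for a cycle. The class and method names are illustrative.

import java.util.*;

// Illustrative sketch: the detector merges the local wait-for graphs sent by
// the servers into one global graph and looks for a cycle with a DFS.
class GlobalDeadlockDetector {
    // edge tid -> set of tids it is waiting for
    private final Map<String, Set<String>> waitsFor = new HashMap<>();

    void mergeLocalGraph(Map<String, Set<String>> localGraph) {
        localGraph.forEach((t, waits) ->
                waitsFor.computeIfAbsent(t, k -> new HashSet<>()).addAll(waits));
    }

    // Returns true if the global graph contains a cycle (a distributed deadlock).
    boolean hasDeadlock() {
        Set<String> visiting = new HashSet<>(), done = new HashSet<>();
        for (String t : waitsFor.keySet()) {
            if (cycleFrom(t, visiting, done)) return true;
        }
        return false;
    }

    private boolean cycleFrom(String t, Set<String> visiting, Set<String> done) {
        if (done.contains(t)) return false;
        if (!visiting.add(t)) return true;           // revisited a node on this path
        for (String next : waitsFor.getOrDefault(t, Set.of())) {
            if (cycleFrom(next, visiting, done)) return true;
        }
        visiting.remove(t);
        done.add(t);
        return false;
    }
}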
Phantom Deadlocks:
1. Definition of Phantom Deadlock:
o A phantom deadlock occurs when a deadlock is falsely detected even though it
does not actually exist.
2. Cause of Phantom Deadlocks in Distributed Systems:
o Information about wait-for relationships between transactions is transmitted
between servers.
o Detecting a deadlock requires collecting and analyzing this information, which
takes time.
o During this time, a transaction holding a lock may release it, resolving the
deadlock before it is detected.
3. Example of Phantom Deadlock:
o A global deadlock detector receives local wait-for graphs from servers X and Y.
o Transaction U releases an object at server X and requests an object held by V at
server Y.
o The global detector processes server Y's graph before receiving the updated graph
from server X.
o A cycle (e.g., T → U → V → T) is detected, even though the edge T → U no
longer exists.
4. Impact of Two-Phase Locking:
o If transactions use two-phase locking:
▪ Transactions cannot release objects and then acquire more locks.
▪ With two-phase locking, cycles cannot occur in the way suggested above, which reduces the likelihood of phantom deadlocks.
Consider the situation in which a cycle T → U → V → T is detected: either this represents a deadlock or each of the transactions T, U and V must eventually commit. It is actually impossible for any of them to commit, because each of them is waiting for an object that will never be released.
A phantom deadlock could still be detected if a waiting transaction in a deadlock cycle aborts during the deadlock detection procedure. For example, if there is a cycle T → U → V → T and U aborts
after the information concerning U has been collected, then the cycle has been broken already
and there is no deadlock.
REPLICATION
Data replication maintains multiple copies of data on different servers, which provides the following benefits:
Data Availability and Redundancy: By having multiple copies of data, replication ensures
high availability and redundancy. If the primary copy becomes unavailable due to hardware
failures, network issues, or disasters, the secondary copies can be used to serve data and
maintain continuous operations.
Disaster Recovery and Business Continuity: Data replication plays a crucial role in
disaster recovery and business continuity strategies. In the event of a catastrophic failure,
such as a natural disaster or data center outage, secondary copies can be used to quickly
restore data and resume operations. Replication allows for data to be geographically
distributed, protecting against localized failures.
Load Balancing and Performance Optimization: Replication enables load balancing by
distributing read and write operations across multiple copies. This helps to distribute the
workload and prevent any single replica from becoming a performance bottleneck. By
spreading the load, replication can improve overall system performance and resource
utilization.
Data Consistency and Integrity: Replication techniques often include mechanisms for
maintaining data consistency and integrity across the replicas. Updates to the primary copy
are propagated to the secondary copies, ensuring that all replicas remain synchronized and up
to date. Techniques like synchronous or asynchronous replication can be employed based on
the desired consistency guarantees and trade-offs.
TRANSACTION RECOVERY
• The atomic property of transactions requires that all the effects of committed
transactions and none of the effects of incomplete or aborted transactions are reflected
in the objects they accessed.
• This property can be described in terms of two aspects:
durability and failure atomicity.
• Durability requires that objects are saved in permanent storage and will be available indefinitely thereafter.
• Failure atomicity requires that the effects of transactions are atomic even when the server crashes.
• When a server is running, it keeps all of its objects in its volatile memory (temporary storage) and records its committed objects in a recovery file.
• The requirements for durability and failure atomicity are not really independent of one
another and can be dealt with by a single mechanism – the recovery manager.
• The tasks of a recovery manager are:
• to save objects in permanent storage (in a recovery file) for committed transactions;
• to restore the server’s objects after a crash;
• to reorganize the recovery file to improve the performance of recovery;
• to reclaim storage space (in the recovery file).
• Intentions list
• The intentions list is a mechanism used by servers to track which objects are modified
by a transaction and the tentative values of those objects.
• It's essentially a "log" of the objects a transaction intends to update, before those
changes are finalized.
• Purpose - The intentions list helps ensure that the state of objects is properly tracked
for recovery purposes, allowing the system to correctly commit or abort changes made
by transactions even if the system crashes.
• For each active transaction, an intentions list holds the following:
• Object references: Identifiers of objects that were altered by the transaction.
• Tentative values: The new, temporary values of those objects, which may eventually
replace the old, committed values.
• If the transaction commits, the server uses the intentions list to finalize the changes by
replacing the original objects with the tentative ones.
• If the transaction aborts, the tentative versions are discarded, and the objects return to
their original state.
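A minimal sketch of an intentions list as described above, with simplified types (object references as strings, values as plain objects): tentative values are recorded while the transaction runs, installed on commit and discarded on abort.

import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: for one active transaction, the intentions list maps
// object references to tentative values; commit installs them, abort discards them.
class IntentionsList {
    private final Map<String, Object> tentativeValues = new HashMap<>(); // objectRef -> tentative value

    void recordTentative(String objectRef, Object newValue) {
        tentativeValues.put(objectRef, newValue);
    }

    // On commit, the tentative values replace the committed versions.
    void commitInto(Map<String, Object> committedObjects) {
        committedObjects.putAll(tentativeValues);
        tentativeValues.clear();
    }

    // On abort, the tentative versions are simply discarded.
    void abort() {
        tentativeValues.clear();
    }
}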
• Discussion of recovery is based on the two phase commit protocol, in which all the
participants involved in a transaction first say whether they are prepared to commit
and later, if all the participants agree, carry out the actual commit actions.
• If the participants cannot agree to commit, they must abort the transaction.
• When all the participants involved in a transaction agree to commit it, the coordinator informs the client and then sends messages to the participants to commit their part of the transaction.
• Once the client has been informed that a transaction has committed, the recovery files
of the participating servers must contain sufficient information to ensure that the
transaction is committed by all of the servers.
LOGGING
• In the logging technique, the recovery file represents a log containing the history of all
the transactions performed by a server.
• The history consists of values of objects, transaction status entries and transaction
intentions lists.
• The order of the entries in the log reflects the order in which transactions have
prepared, committed and aborted at that server.
• Appending to the Log: When a transaction prepares to commit, its intentions list and
the prepared status are appended to the log. If it commits, a committed status is added.
Similarly, if a transaction is aborted, an aborted status is logged.
• The log is written sequentially, which is more efficient than writing to random disk
locations.
• Each write is atomic, ensuring consistency even in the event of partial writes.
• After a crash, any transaction that does not have a committed status in the log is
aborted. Therefore when a transaction commits, its committed status entry must be
forced to the log – that is, written to the log together with any other buffered entries.
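A hedged sketch of such a log in Java, using an illustrative one-line-per-entry format: prepared and aborted entries are simply appended, while the committed entry is flushed and forced to stable storage before the commit is acknowledged.

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintWriter;

// Illustrative sketch of the logging technique: entries (intentions lists and
// status records) are appended to one sequential recovery file, and the
// committed status entry is forced to disk before commit is acknowledged.
class TransactionLog {
    private final FileOutputStream out;
    private final PrintWriter writer;

    TransactionLog(String path) throws IOException {
        this.out = new FileOutputStream(path, true);   // append-only, sequential writes
        this.writer = new PrintWriter(out);
    }

    void appendPrepared(String tid, String intentionsList) {
        writer.println("PREPARED " + tid + " " + intentionsList);
    }

    void appendAborted(String tid) {
        writer.println("ABORTED " + tid);
    }

    // The committed entry is flushed and forced so it survives a crash.
    void appendCommitted(String tid) throws IOException {
        writer.println("COMMITTED " + tid);
        writer.flush();
        out.getFD().sync();     // force buffered entries to stable storage
    }
}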
Consider the banking service example provided in the text, which illustrates how transaction
logs are recorded:
Before Transaction T and U Start: The log contains a snapshot of objects A, B, and C
(unique identifiers for objects) with their values (e.g., A=100, B=200, C=300).
Transaction T Prepares to Commit: The log includes entries for the tentative values of
objects A and B, followed by the prepared status entry for T (<A, P1>, <B, P2>), where P1
and P2 are positions in the log where these tentative values are stored.
Transaction T Commits: The log includes a committed status entry for T at position P4.
Transaction U Prepares to Commit: Similar to T, the log records the tentative values for
objects C and B, along with the prepared status for U (<C, P5>, <B, P6>).
Recovery of objects:
• When a server is replaced after a crash, it first sets default initial values for its objects
and then hands over to its recovery manager.
• The recovery manager is responsible for restoring the server’s objects so that they
include all the effects of the committed transactions performed in the correct order and
none of the effects of incomplete or aborted transactions.
• There are two approaches to restoring the data from the recovery file.
• In the first approach, the recovery manager reads the recovery file forwards: it restores the values of all of the objects from the most recent checkpoint, then reads in the values of each of the objects, associates them with their transactions' intentions lists, and for committed transactions replaces the values of the objects.
• In the second approach, the recovery manager will restore a server’s objects by
reading the recovery file backwards. The recovery file has been structured so that there
is a backwards pointer from each transaction status entry to the next.
• The recovery manager uses transactions with committed status to restore those objects
that have not yet been restored. It continues until it has restored all of the server’s
objects.
• During the recovery process, the recovery manager also tracks all prepared
transactions (those that are in the "prepared" state but not yet committed). For each
prepared transaction, the recovery manager adds an aborted transaction status to the
log. This ensures that every transaction in the recovery file is eventually marked as
either committed or aborted.
• The recovery process must be idempotent, meaning that it can be performed multiple
times without introducing errors or inconsistencies.
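A minimal sketch of the backwards-reading approach described above, with a simplified LogEntry type that bundles each transaction's status and intentions: scanning from the newest entry, only committed transactions are applied, and an object already restored by a more recent transaction is not overwritten, so running the procedure twice gives the same result (idempotence).

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: the recovery manager scans the recovery file backwards
// and, for each committed transaction, restores only those objects that have
// not already been restored by a more recent transaction.
class BackwardRecovery {
    record LogEntry(String tid, String status, Map<String, Object> intentions) {}

    Map<String, Object> recover(List<LogEntry> log) {
        Map<String, Object> restored = new HashMap<>();
        for (int i = log.size() - 1; i >= 0; i--) {          // newest entry first
            LogEntry entry = log.get(i);
            if (!"COMMITTED".equals(entry.status())) continue;
            entry.intentions().forEach((objectRef, value) ->
                    restored.putIfAbsent(objectRef, value)); // keep the most recent value
        }
        return restored;   // running this again yields the same result (idempotent)
    }
}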
• Reorganizing the recovery file:
• A recovery manager is responsible for reorganizing its recovery file so as to make the
process of recovery faster and to reduce its use of space.
• Conceptually, the only information required for recovery is a copy of the committed
version of each object in the server.
• The name checkpointing is used to refer to the process of writing the current
committed values of a server’s objects to a new recovery file, together with transaction status entries and intentions lists of transactions that have not yet been fully resolved.
• The purpose of making checkpoints is to reduce the number of transactions to be
dealt with during recovery and to reclaim file space.
• Checkpointing can be done immediately after recovery but before any new
transactions are started.
• However, recovery may not occur very often. Therefore, checkpointing may need to
be done from time to time during the normal activity of a server.
• The checkpoint is written to a future recovery file, and the current recovery file
remains in use until the checkpoint is complete.
• Checkpointing consists of adding a mark to the recovery file when the checkpointing
starts, writing the server’s objects to the future recovery file and then copying to that
file (1) all entries before the mark that relate to as-yet-unresolved transactions and (2)
all entries after the mark in the recovery file.
• When the checkpoint is complete, the future recovery file becomes the recovery file.
• The recovery system can reduce its use of space by discarding the old recovery file.
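A minimal sketch of checkpointing under the simplifying assumption that the recovery file is a list of entries in memory: the committed values of all objects are written as a snapshot to the future recovery file, the entries of not-yet-resolved (prepared) transactions are copied across, and the old file can then be discarded. The data types and the SNAPSHOT/PREPARED labels are illustrative.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch of checkpointing: the current committed values of the
// server's objects become the new baseline, and entries for transactions that
// are not yet fully resolved are carried over to the future recovery file.
class Checkpointer {
    record Entry(String tid, String status, Map<String, Object> intentions) {}

    List<Entry> checkpoint(Map<String, Object> committedObjects,
                           List<Entry> currentRecoveryFile) {
        List<Entry> futureRecoveryFile = new ArrayList<>();
        // 1. Write the current committed values of every object as the new baseline.
        futureRecoveryFile.add(new Entry("checkpoint", "SNAPSHOT", Map.copyOf(committedObjects)));
        // 2. Copy across the entries of transactions that are not yet resolved
        //    (prepared but neither committed nor aborted).
        for (Entry e : currentRecoveryFile) {
            if ("PREPARED".equals(e.status())) {
                futureRecoveryFile.add(e);
            }
        }
        // 3. The future recovery file now replaces the current one; the old
        //    file can be discarded to reclaim space.
        return futureRecoveryFile;
    }
}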