Unit 3_Part 2_Transaction management
Example: Suppose a bank employee transfers Rs 800 from X's account to Y's account.
This small transaction contains several low-level tasks:
X's Account
1. Open_Account(X)
2. Old_Balance = X.balance
3. New_Balance = Old_Balance - 800
4. X.balance = New_Balance
5. Close_Account(X)
Y's Account
1. Open_Account(Y)
2. Old_Balance = Y.balance
3. New_Balance = Old_Balance + 800
4. Y.balance = New_Balance
5. Close_Account(Y)
Read(X): The read operation is used to read the value of X from the database and store it in a buffer in main memory.
Write(X): The write operation is used to write the value back to the database from the buffer.
For example, consider a debit transaction on an account X that consists of the following operations (assume the value of X in the database is 4000 before the transaction starts):
1. R(X);
2. X = X - 500;
3. W(X);
o The first operation reads X's value from the database and stores it in a buffer.
o The second operation decreases the value of X by 500, so the buffer will contain 3500.
o The third operation writes the buffer's value to the database, so X's final value will be 3500.
However, because of a hardware, software, or power failure, the transaction may fail before it finishes all the operations in the set.
For example: if the above debit transaction fails after executing operation 2, then X's value will remain 4000 in the database, which is not acceptable to the bank.
Transaction Properties
A transaction has four properties. These are used to maintain consistency in a database before and after the transaction.
Property of Transaction
1. Atomicity
2. Consistency
3. Isolation
4. Durability
If we want to maintain database consistency, then certain properties need to be followed in the
transactions known as the ACID (Atomicity, Consistency, Isolation, Durability) properties. Let
us discuss them in detail.
A – Atomicity
Atomicity means that an entire transaction either takes place all at once or it doesn’t occur at
all. It means that there’s no midway. The transactions can never occur partially. Every
transaction can be considered as a single unit, and they either run to completion or do not get
executed at all. We have the following two operations here:
—Commit: In case a transaction commits, the changes made are visible to us. Thus, atomicity
is also called the ‘All or nothing rule’.
—Abort: In case a transaction aborts, the changes made to the database are not visible to us.
Consider this transaction T, consisting of T1 and T2: transferring 100 from account A to account B.
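To make the all-or-nothing idea concrete, here is a minimal Python sketch (not a real DBMS; the in-memory dictionary standing in for the database and the account names are assumptions for illustration). A failed transfer leaves the stored balances untouched, while a successful one makes both changes visible together.

```python
# Minimal sketch of an atomic transfer over an assumed in-memory "database".
db = {"A": 1000, "B": 500}          # current committed values

def transfer(db, src, dst, amount):
    buffer = dict(db)               # work on a copy, like a main-memory buffer
    try:
        buffer[src] -= amount       # T1: debit
        if buffer[src] < 0:
            raise ValueError("insufficient balance")
        buffer[dst] += amount       # T2: credit
        db.update(buffer)           # commit: all changes become visible at once
        print("committed:", db)
    except Exception:
        # abort: the buffer is discarded, committed values stay untouched
        print("aborted, database unchanged:", db)

transfer(db, "A", "B", 100)         # commits: A=900, B=600
transfer(db, "A", "B", 5000)        # aborts: balances stay as they were
```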
C – Consistency
Consistency means that we have to maintain the integrity constraints so that any given database
stays consistent both before and after a transaction. If we refer to the example discussed above,
then we have to maintain the total amount, both before and after the transaction.
Thus, the given database is consistent. Here, an inconsistency would occur if T1 completes but T2 fails. As a result, T would remain incomplete.
I – Isolation
Isolation ensures that multiple transactions can occur concurrently without leading the database to an inconsistent state. A transaction occurs independently, i.e., without interference. Changes made in one particular transaction are not visible to other transactions until that change has been committed or written to memory.
The isolation property ensures that executing transactions concurrently results in a state equivalent to the state achieved by executing them serially in some particular order.
Without isolation, the sum computed partway through the transfer is not consistent with the sum obtained at the end of the transaction:
This results in an inconsistent database due to the loss of a total of 50 units. The transactions must, therefore, take place in isolation, and the changes must only be visible after they have been made to the main memory.
D – Durability
The durability property states that once the execution of a transaction is completed, the modifications and updates to the database are written to and stored on the disk. These persist even after a system failure. Such updates become permanent and are stored in non-volatile memory. Thus, the effects of this transaction are never lost.
States of Transaction
In a database, a transaction can be in one of the following states -
Active
o The active state is the first state of every transaction. In this state, the transaction is being executed.
o For example: insertion, deletion, or updating of a record is done here, but the records are still not saved to the database.
Partially committed
o In the partially committed state, a transaction executes its final operation, but the data
is still not saved to the database.
o In the total mark calculation example, a final display of the total marks step is executed
in this state.
Committed
o If a transaction executes all its operations successfully, it is said to be committed, and its effects are permanently saved in the database.
Failed state
o If any of the checks made by the database recovery system fails, then the transaction is
said to be in the failed state.
o In the example of total mark calculation, if the database is not able to fire a query to
fetch the marks, then the transaction will fail to execute.
Aborted
o If any of the checks fail and the transaction has reached the failed state, then the database recovery system will make sure that the database is in its previous consistent state. If not, it will abort or roll back the transaction to bring the database into a consistent state.
o If the transaction fails in the middle of execution, then all the operations it has already executed are rolled back to bring the database back to its consistent state.
o After aborting the transaction, the database recovery module will select one of the two operations:
1. Re-start the transaction
2. Kill the transaction
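The states listed above and the legal moves between them can be sketched as a small state machine. This is only an illustration; the transition table below is an assumption that mirrors the description (active to partially committed to committed, with failed and aborted on the error path), not part of any DBMS API.

```python
# Sketch of the transaction state diagram described above (transition table assumed).
ALLOWED = {
    "active":              {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "failed":              {"aborted"},
    "committed":           set(),
    "aborted":             set(),   # after abort: re-start or kill the transaction
}

def move(state, new_state):
    if new_state not in ALLOWED[state]:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state

s = "active"
s = move(s, "partially committed")
s = move(s, "committed")
print(s)                            # committed
```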
SCHEDULES IN DBMS
A series of operations from one transaction to another is known as a schedule. It is used to preserve the order of the operations in each of the individual transactions.
1. Serial Schedule
o The serial schedule is a type of schedule where one transaction is executed completely before starting another transaction. In the serial schedule, when the first transaction completes its cycle, then the next transaction is executed.
For example: Suppose there are two transactions T1 and T2 which have some operations. If there is no interleaving of operations, then there are the following two possible outcomes:
1. Execute all the operations of T1 followed by all the operations of T2.
2. Execute all the operations of T2 followed by all the operations of T1.
o In the given figure (a), Schedule A shows the serial schedule where T1 is followed by T2.
o In the given figure (b), Schedule B shows the serial schedule where T2 is followed by T1.
2. Non-serial Schedule
o It contains many possible orders in which the system can execute the individual
operations of the transactions.
o In the given figures (c) and (d), Schedule C and Schedule D are non-serial schedules. They have interleaving of operations.
3. Serializable schedule
o The serializability of schedules is used to find non-serial schedules that allow the
transaction to execute concurrently without interfering with one another.
o It identifies which schedules are correct when executions of the transaction have
interleaving of their operations.
o A non-serial schedule will be serializable if its result is equal to the result of its
transactions executed serially.
Serializability in DBMS
In computer science, serializability is a property of a system that describes how different processes operate on shared data. A system is called serializable if the result it produces is the same as the result of running those operations in some serial order, i.e., with no overlapping in the execution of the data. In a DBMS, while a data item is being written or read, the DBMS can stop all the other processes from accessing that data.
Testing of Serializability
Assume a schedule S. For S, we construct a graph known as a precedence graph. This graph is a pair G = (V, E), where V consists of a set of vertices and E consists of a set of edges. The set of vertices contains all the transactions participating in the schedule. The set of edges contains all edges Ti → Tj for which one of the three conditions holds:
1. Ti executes Write(Q) before Tj executes Read(Q).
2. Ti executes Read(Q) before Tj executes Write(Q).
3. Ti executes Write(Q) before Tj executes Write(Q).
o If a precedence graph contains a single edge Ti → Tj, then all the instructions of Ti are executed before the first instruction of Tj is executed.
o Schedule S is serializable if and only if its precedence graph contains no cycle.
For example:
Explanation: The precedence graph for schedule S2 contains no cycle; that is why schedule S2 is serializable.
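The test above can be automated: build the precedence graph from the conflicting pairs and check it for a cycle. The following Python sketch assumes a simple schedule encoding as a list of (transaction, operation, data item) tuples; the sample schedule S2 is an assumption chosen to mirror a serial example, since the original figure is not reproduced here.

```python
# Build a precedence graph from a schedule and test it for cycles.
# Schedule encoding (assumed): list of (transaction, op, item), op in {"R", "W"}.
def precedence_graph(schedule):
    edges = set()
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            if ti != tj and x == y and (op_i == "W" or op_j == "W"):
                edges.add((ti, tj))          # conflicting pair gives edge Ti -> Tj
    return edges

def has_cycle(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    seen, on_stack = set(), set()
    def dfs(u):
        seen.add(u); on_stack.add(u)
        for v in graph[u]:
            if v in on_stack or (v not in seen and dfs(v)):
                return True
        on_stack.discard(u)
        return False
    return any(dfs(u) for u in graph if u not in seen)

S2 = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A"),
      ("T1", "R", "B"), ("T1", "W", "B"), ("T2", "R", "B"), ("T2", "W", "B")]
print(precedence_graph(S2))                  # {('T1', 'T2')}
print("serializable:", not has_cycle(precedence_graph(S2)))   # True
```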
Types of Serializability
In a DBMS, all the transactions should be arranged in a particular order, even if they execute concurrently. If the transactions are not serializable, they may produce an incorrect result.
In a DBMS, there are different types of serializability. Each type has some advantages and disadvantages. The two most common types of serializability are:
1. Conflict Serializability
A schedule is conflict serializable if it can be transformed into a serial schedule by swapping non-conflicting operations.
Conflicting Operations
Two operations are said to be conflicting if they belong to different transactions, operate on the same data item, and at least one of them is a write operation.
Example:
Conflict Equivalent
Two schedules are said to be conflict equivalent if one can be transformed into the other by swapping non-conflicting operations.
Example:
T1: Read(A), Write(A), Read(B), Write(B)
T2: Read(A), Write(A), Read(B), Write(B)
2. View Serializability
View serializability is a criterion under which a schedule is correct if every transaction reads the same values and produces the same final result as some proper sequential (serial) execution of the transactions. Like conflict serializability, view serializability focuses on preventing inconsistency in the database, but it is a weaker, more permissive condition.
View Equivalent
Two schedules S1 and S2 are said to be view equivalent if they satisfy the following three conditions:
1. Initial Read
The initial read of both schedules must be the same. Suppose there are two schedules S1 and S2. In schedule S1, if a transaction T1 is reading the data item A, then in S2, transaction T1 should also read A.
The above two schedules are view equivalent because the initial read operation in S1 is done by T1, and in S2 it is also done by T1.
2. Updated Read
In schedule S1, if transaction Ti is reading data item A which was updated by transaction Tj, then in S2, Ti should also read A as updated by Tj.
The above two schedules are not view equal because, in S1, T3 is reading A updated by T2, while in S2, T3 is reading A updated by T1.
3. Final Write
The final write must be the same in both schedules. In schedule S1, if transaction T1 performs the final write on A, then in S2, the final write on A should also be done by T1.
Example:
Schedule S
With 3 transactions, the total number of possible serial schedules = 3! = 6:
1. S1 = <T1 T2 T3>
2. S2 = <T1 T3 T2>
3. S3 = <T2 T3 T1>
4. S4 = <T2 T1 T3>
5. S5 = <T3 T1 T2>
6. S6 = <T3 T2 T1>
Taking the first serial schedule S1:
In both schedules S and S1, there is no read except the initial read, so we don't need to check the updated-read condition.
The initial read operation in S is done by T1, and in S1 it is also done by T1.
The final write operation in S is done by T3, and in S1 it is also done by T3. So, S and S1 are view equivalent.
The first serial schedule S1 satisfies all three conditions, so we don't need to check any other schedule.
Hence, schedule S is view serializable, and its view equivalent serial schedule is:
1. T1 → T2 → T3
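Checking the three conditions by hand is mechanical, so it can also be sketched in code. The schedule encoding and the sample schedules below are assumptions for illustration (the tables from the original example are figures and are not reproduced); the sketch also assumes each transaction reads a given data item at most once.

```python
# Check the three view-equivalence conditions between two schedules.
# Schedule encoding (assumed): list of (transaction, op, item), op in {"R", "W"}.
def reads_from(schedule):
    # For every read, record which transaction wrote the value it sees (None = initial read).
    last_writer, result = {}, set()
    for t, op, x in schedule:
        if op == "R":
            result.add((t, x, last_writer.get(x)))   # covers conditions 1 and 2
        else:
            last_writer[x] = t
    return result

def final_writes(schedule):
    # Condition 3: the transaction performing the final write on each item.
    fw = {}
    for t, op, x in schedule:
        if op == "W":
            fw[x] = t
    return fw

def view_equivalent(s1, s2):
    return reads_from(s1) == reads_from(s2) and final_writes(s1) == final_writes(s2)

# Assumed sample: S is non-serial, S1 is the serial schedule <T1 T2 T3>.
S  = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A"), ("T3", "W", "A")]
S1 = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "W", "A"), ("T3", "W", "A")]
print(view_equivalent(S, S1))                         # True: S is view serializable
```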
For better understanding, let's explain these with an example. Suppose there are two users, Sona and Archita, and each executes two transactions. Let transactions T1 and T2 be executed by Sona, and T3 and T4 be executed by Archita. Suppose transaction T1 reads and writes the data item A, transaction T2 reads the data item B, transaction T3 reads and writes the data item C, and transaction T4 reads the data item D. Let us schedule the above transactions as below.
In order for a schedule to be considered serializable, it must first satisfy the conflict serializability property. In our example schedule above, notice that Transaction 1 (T1) and Transaction 2 (T2) read data item B before either writes it. This causes a conflict between T1 and T2 because they are both trying to read and write the same data item concurrently. Therefore, the given schedule is not conflict serializable.
However, there is another type of serializability called view serializability, which our example does satisfy. View serializability requires that if two transactions cannot see each other's updates (i.e., one transaction cannot see the effects of another concurrent transaction), the schedule is considered view serializable. In our example, Transaction 2 (T2) cannot see any updates made by Transaction 4 (T4) because they do not share common data items. Therefore, the schedule is view serializable.
It's important to note that conflict serializability is a stronger property than view serializability because it requires that all potential conflicts be resolved before any updates are made (i.e., each transaction must either read or write each data item before any other transaction can write it). View serializability only requires that if two transactions cannot see each other's updates, then the schedule is view serializable; it does not matter whether or not there are potential conflicts between them.
All in all, both properties are necessary for ensuring correctness in concurrent transactions in a
database management system.
Advantages of Serializability
1. Predictable execution: With serializability, the threads of the DBMS are effectively executed one at a time, so there are no surprises. All the variables are updated as expected, and there is no data loss or corruption.
2. Easier to reason about and debug: Since each thread effectively executes alone, it is much easier to reason about each thread of the database. This can make the debugging process very easy, and we don't have to worry about concurrent processes.
3. Reduced costs: The serializability property can help reduce the cost of the hardware needed for the smooth operation of the database. It can also reduce the development cost of the software.
But before knowing about concurrency control, we should know about concurrent execution.
For example:
Consider the below diagram where two transactions TX and TY are performed on the same account A, where the balance of account A is $300.
o At time t1, transaction TX reads the value of account A, i.e., $300 (only read).
o At time t2, transaction TX deducts $50 from account A that becomes $250 (only
deducted and not updated/write).
o Alternately, at time t3, transaction TY reads the value of account A that will be $300
only because TX didn't update the value yet.
For example:
Consider two transactions, TX and TY, performing the read/write operations on account
A, having an available balance = $300. The diagram is shown below:
o At time t1, transaction TX reads the value from account A, i.e., $300.
o At time t2, transaction TY reads the value from account A, i.e., $300.
o At time t3, transaction TY updates the value of account A by adding $100 to the
available balance, and then it becomes $400.
o At time t4, transaction TY writes the updated value, i.e., $400.
o After that, at time t5, transaction TX reads the available value of account A, and that
will be read as $400.
Thus, in order to maintain consistency in the database and avoid such problems that take place in concurrent execution, management is needed, and that is where the concept of Concurrency Control comes into play.
Concurrency Control
Concurrency Control is the working concept that is required for controlling and managing the
concurrent execution of database operations and thus avoiding the inconsistencies in the
database. Thus, for maintaining the concurrency of the database, we have the concurrency
control protocols.
o The Timestamp Ordering Protocol is used to order the transactions based on their timestamps. The order of the transactions is nothing but the ascending order of their creation times.
o The older transaction has the higher priority; that is why it executes first. To determine the timestamp of a transaction, this protocol uses the system time or a logical counter.
o The lock-based protocol is used to manage the order between conflicting pairs
among transactions at the execution time. But Timestamp based protocols start
working as soon as a transaction is created.
o Let's assume there are two transactions, T1 and T2. Suppose transaction T1 entered the system at time 007 and transaction T2 entered the system at time 009. T1 has the higher priority, so it executes first, as it entered the system first.
1. Check the following condition whenever a transaction Ti issues a Read(X) operation:
o If TS(Ti) < W_TS(X), then the operation is rejected and Ti is rolled back; otherwise, the operation is executed.
Where TS(Ti) is the timestamp of transaction Ti, and W_TS(X) is the write timestamp of X, i.e., the largest timestamp of any transaction that executed Write(X) successfully.
o However, the schedule may not be recoverable and may not even be cascade-free.
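A rough sketch of the read check described above is shown below. The class, its data structures, and the simplified write rule are assumptions for illustration; a complete protocol also maintains R_TS(X) and checks it before allowing a write.

```python
# Sketch of the timestamp-ordering read check (data structures assumed for illustration).
class TimestampManager:
    def __init__(self):
        self.next_ts = 0
        self.ts = {}                        # TS(Ti): timestamp assigned at creation
        self.w_ts = {}                      # W_TS(X): largest timestamp that wrote X

    def start(self, txn):
        self.ts[txn] = self.next_ts         # older transaction gets a smaller timestamp
        self.next_ts += 1

    def read(self, txn, item):
        if self.ts[txn] < self.w_ts.get(item, -1):
            raise RuntimeError(f"{txn} is rolled back: it reads {item} too late")
        return f"{txn} reads {item}"        # TS(Ti) >= W_TS(X): operation executed

    def write(self, txn, item):             # simplified write rule (assumption)
        self.w_ts[item] = max(self.w_ts.get(item, -1), self.ts[txn])
        return f"{txn} writes {item}"

tm = TimestampManager()
tm.start("T1"); tm.start("T2")              # T1 is older (TS = 0) than T2 (TS = 1)
print(tm.write("T2", "X"))                  # W_TS(X) becomes 1
try:
    print(tm.read("T1", "X"))               # TS(T1) = 0 < W_TS(X) = 1 -> rejected
except RuntimeError as e:
    print(e)
```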
Advantages:
o It ensures conflict-serializable schedules and is deadlock-free, since no transaction ever waits.
Disadvantages:
o Transactions may be repeatedly rolled back and restarted (starvation is possible), and the resulting schedules may not be recoverable or cascadeless.
Two-Phase Locking Protocol (2PL)
Locking and unlocking of the database should be done in such a way that there is no inconsistency, no deadlock, and no starvation.
Every transaction locks and unlocks the data items in two different phases.
• Growing Phase − All the locks are issued in this phase, and no locks are released. After all the changes to the data items have been made, the second phase (shrinking phase) starts.
• Shrinking Phase − No new locks are issued in this phase; all the changes to the data items are noted (stored), and then the locks are released.
The 2PL locking protocol is represented diagrammatically as follows −
The following shows how unlocking and locking work with 2-PL.
Transaction T1:
o Growing phase: from step 1-3
o Shrinking phase: from step 5-7
o Lock point: at 3
Transaction T2:
o Growing phase: from step 2-6
o Shrinking phase: from step 8-9
o Lock point: at 6
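The two-phase rule itself (acquire all locks before releasing any) can be sketched in a few lines. The class below is an assumption for illustration only; it tracks a single transaction's phases and ignores lock conflicts between transactions, which a real lock manager must handle.

```python
# Sketch of the 2PL rule: once a transaction releases any lock (shrinking phase),
# it must not acquire new ones. Lock conflicts between transactions are ignored here.
class TwoPhaseTxn:
    def __init__(self, name):
        self.name, self.locks, self.shrinking = name, set(), False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(f"{self.name}: cannot lock {item} after the shrinking phase began")
        self.locks.add(item)                 # growing phase

    def unlock(self, item):
        self.shrinking = True                # lock point passed: shrinking phase begins
        self.locks.discard(item)

t1 = TwoPhaseTxn("T1")
t1.lock("A"); t1.lock("B")                   # growing phase
t1.unlock("A")                               # shrinking phase starts
try:
    t1.lock("C")                             # violates the two-phase rule
except RuntimeError as e:
    print(e)
```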
T1 T2
Lock-X(A) Lock-X(B)
Read A; Read B;
Lock-X(B) Lock-X(A)
Here,
Lock-X(B) : Cannot execute Lock-X(B) since B is locked by T2.
Lock-X(A) : Cannot execute Lock-X(A) since A is locked by T1.
In the above situation, T1 waits for B and T2 waits for A. The waiting time never ends. Neither transaction can proceed unless one of them releases its lock voluntarily. This situation is called a deadlock.
The wait for graph is as follows −
Wait-for graph: It is used in the deadlock detection method. A node is created for each transaction, and an edge Ti → Tj is created if Ti is waiting to lock an item locked by Tj. A cycle in the WFG indicates that a deadlock has occurred. The WFG is created at regular intervals.
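Since a deadlock corresponds to a cycle in the wait-for graph, detection is just a cycle check. The edge set below mirrors the T1/T2 example above; the encoding is an assumption for illustration.

```python
# Wait-for graph of the example above: T1 waits for T2 (item B), T2 waits for T1 (item A).
wfg = {"T1": {"T2"}, "T2": {"T1"}}

def deadlocked(wfg):
    seen, on_stack = set(), set()
    def dfs(u):
        seen.add(u); on_stack.add(u)
        for v in wfg.get(u, ()):
            if v in on_stack or (v not in seen and dfs(v)):
                return True
        on_stack.discard(u)
        return False
    return any(dfs(u) for u in wfg if u not in seen)

print("deadlock detected:", deadlocked(wfg))   # True: cycle T1 -> T2 -> T1
```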
3. Conservative two-phase locking protocol :
• The transaction must lock all the data items it requires in the transaction before the
transaction begins.
• If any of the data items are not available for locking before execution begins, then no data items are locked.
• The data items to be read and written need to be known before the transaction begins, which is normally not possible.
• The conservative two-phase locking protocol is deadlock-free.
Validation-Based Protocol
The validation-based protocol is also known as the optimistic concurrency control technique. In the validation-based protocol, the transaction is executed in the following three phases:
1. Read phase: In this phase, the transaction T is read and executed. It reads the values of the various data items and stores them in temporary local variables. It can perform all the write operations on temporary variables without updating the actual database.
2. Validation phase: In this phase, the temporary variable value will be validated against
the actual data to see if it violates the serializability.
3. Write phase: If the transaction passes validation, then the temporary results are written to the database or system; otherwise, the transaction is rolled back.
Here each phase has the following different timestamps:
Start(Ti): It contains the time when Ti started its execution.
Validation (Ti): It contains the time when Ti finishes its read phase and starts its validation
phase.
Finish(Ti): It contains the time when Ti finishes its write phase.
o This protocol is used to determine the time stamp for the transaction for serialization
using the time stamp of the validation phase, as it is the actual phase which determines
if the transaction will commit or rollback.
o Hence TS(T) = validation(T).
o The serializability is determined during the validation process. It can't be decided in
advance.
o While executing the transaction, it ensures a greater degree of concurrency and a smaller number of conflicts.
o Thus it contains transactions with fewer rollbacks.
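A minimal sketch of the three phases is shown below. The classes and the simplified validation rule (reject the transaction if any data item it read was re-written after its Start timestamp) are assumptions for illustration, not the full set of validation conditions.

```python
# Sketch of optimistic (validation-based) execution: read phase on local copies,
# a simplified validation phase, then a write phase. Details are assumed.
class DB:
    def __init__(self, data):
        self.data, self.version = dict(data), {k: 0 for k in data}

class OptimisticTxn:
    def __init__(self, db):
        self.db, self.local, self.read_set = db, {}, set()
        self.start = dict(db.version)                 # versions seen at Start(Ti)

    def read(self, item):
        self.read_set.add(item)
        return self.local.get(item, self.db.data[item])

    def write(self, item, value):
        self.local[item] = value                      # only temporary variables change

    def commit(self):
        # Validation phase: fail if someone re-wrote an item we read since we started.
        if any(self.db.version[x] != self.start[x] for x in self.read_set):
            return False                              # rollback
        for x, v in self.local.items():               # write phase
            self.db.data[x] = v
            self.db.version[x] += 1
        return True

db = DB({"A": 300})
t = OptimisticTxn(db)
t.write("A", t.read("A") - 50)
print("committed:", t.commit(), db.data)              # committed: True {'A': 250}
```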
Failure Classification
To find where the problem has occurred, we generalize a failure into the following categories:
1. Transaction failure (e.g., a logical error or a system error inside a transaction)
2. System crash (hardware, software, or power failure causing the loss of volatile storage)
3. Disk failure (loss of non-volatile storage)
In the above table, in the left column, a transaction is written, and in the right column of the
table, a log record is written for this Transaction.
Key Points – The following points should be remembered while studying log-based recovery:
1. Whenever a transaction performs a write, the log record for that write must be created before the database is modified.
2. Once a log record exists, we can output the modification to the database if required. We also have the ability to undo a modification that has already been applied to the database.
Log-based recovery works in two modes. These modes are as follows:
1. Immediate Mode
2. Deferred Mode
Log Based Recovery in Immediate Mode
In the immediate mode of log-based recovery, database modifications are performed while the transaction is in the active state.
It means that as soon as the transaction performs or executes its WRITE operation, the changes are immediately saved in the database as well. In immediate mode, there is no need to wait for the execution of the COMMIT statement to update the database.
Explanation
Consider the transaction T1 as shown in the above table. The log of this transaction is written in the second column. So when the values of data items A and B are changed from 1000 to 950 and 1050 respectively, the values of A and B are also updated in the database at that time.
In the case of immediate mode, we need both the old value and the new value of the data item in the log file.
Now, if the system crashes or fails, the following cases may be possible.
Case 1: If the system crashes after Transaction executing the Commit statement.
In this case, when the transaction executed the commit statement, the corresponding commit entry was also made in the log file immediately.
To recover the database, the recovery manager will check the log file. If the recovery manager finds both <T, Start> and <T, Commit> in the log file, it means that transaction T completed successfully before the system failed, so the REDO(T) operation will be performed and the updated values of data items A and B will be set in the database.
Case 2: If Transaction failed before executing the Commit, it means there is no commit
statement in Transaction as shown in the table given below, then there will be no entry for
Commit in the log file.
It means that before the system failure, the transaction was not completed successfully, so to ensure the atomicity property, the UNDO(T) operation will be performed, because in immediate mode the updated values are written to the database immediately after the write operation. So the recovery manager will restore the old values of data items A and B.
Log-Based Recovery in Deferred Mode: In the deferred mode of log-based recovery, all modifications to the database are recorded in the log, but the WRITE operation to the database is deferred until the transaction is partially committed. It means that in deferred mode, the database is modified only after the commit operation of the transaction is performed. For database recovery in deferred mode, there may be two possible cases.
Case 1: If the system fails or crashes after the transaction performed the commit operation. In this situation, since the transaction performed the commit operation successfully, there will be an entry for the commit statement in the log file of the transaction.
So after a system failure, when the recovery manager recovers the database, it will check the log file. If it finds both <T, Start> and <T, Commit>, it means the transaction completed successfully before the system crash, so the REDO(T) operation will be performed and the updated values of data items A and B will be set in the database.
Case 2: If Transaction failed before executing the Commit, it means there is no commit
statement in Transaction as shown in the table given below, then there will be no entry for
Commit in the log file.
So, in this case, when the system fails or crashes, the recovery manager will check the log file. It will find the <T, Start> entry in the log file but not the <T, Commit> entry. It means that before the system failure, the transaction was not completed successfully, so to ensure the atomicity property, the transaction's updates are simply discarded and the data items A and B retain their old values.
Note – In the case of deferred mode, there is no need to perform UNDO(T), because the updated values of the data items are not written to the database immediately after the WRITE operation.
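Both modes reduce to scanning the log for <T, Start> and <T, Commit> records and then choosing between REDO(T), UNDO(T), or doing nothing, as in the cases above. The following Python sketch is an illustration only; the record encoding, the function, and the sample log are assumptions, with values mirroring the A and B example.

```python
# Decide REDO/UNDO per transaction from a log, following the cases described above.
# Assumed record formats: ("start", T), ("write", T, item, old, new), ("commit", T).
def recover(log, mode="immediate"):
    started, committed = set(), set()
    for rec in log:
        if rec[0] == "start":
            started.add(rec[1])
        elif rec[0] == "commit":
            committed.add(rec[1])
    actions = {}
    for t in sorted(started):
        if t in committed:
            actions[t] = "REDO"             # Case 1: <T, Commit> found in the log
        elif mode == "immediate":
            actions[t] = "UNDO"             # Case 2, immediate mode: early writes must be undone
        else:
            actions[t] = "IGNORE"           # Case 2, deferred mode: nothing reached the database
    return actions

log = [("start", "T1"), ("write", "T1", "A", 1000, 950),
       ("write", "T1", "B", 1000, 1050), ("commit", "T1"),
       ("start", "T2"), ("write", "T2", "A", 950, 900)]   # T2 never committed
print(recover(log, mode="immediate"))       # {'T1': 'REDO', 'T2': 'UNDO'}
print(recover(log, mode="deferred"))        # {'T1': 'REDO', 'T2': 'IGNORE'}
```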
Shadow Paging
In shadow paging, two page tables are maintained for a transaction: the current page table and the shadow page table. At the start of the transaction, both page tables are identical.
Advantages
The advantages of shadow paging are as follows −
• No need for log records.
• No undo/ Redo algorithm.
• Recovery is faster.
Disadvantages
The disadvantages of shadow paging are as follows −
• Data is fragmented or scattered.
• Garbage collection problem. Database pages containing old versions of modified data
need to be garbage collected after every transaction.
• Concurrent transactions are difficult to execute.
When more than one transaction are being executed in parallel, the logs are interleaved. At the
time of recovery, it would become hard for the recovery system to backtrack all logs, and then
start recovering. To ease this situation, most modern DBMS use the concept of 'checkpoints'.
Checkpoint
Keeping and maintaining logs in real time and in a real environment may fill up all the memory space available in the system. As time passes, the log file may grow too big to be handled at all. A checkpoint is a mechanism where all the previous logs are removed from the system and stored permanently on a storage disk. A checkpoint declares a point before which the DBMS was in a consistent state and all the transactions were committed.
Recovery via checkpoint
When a system with concurrent transactions crashes and recovers, it behaves in the following
manner −
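Broadly, the recovery system reads the logs backwards from the end to the most recent checkpoint: transactions that committed before the checkpoint are ignored, transactions with a commit record after the checkpoint are redone, and transactions with no commit record are undone. As a rough illustration of that classification (the log encoding is an assumption carried over from the earlier sketch, with an added ("checkpoint",) record):

```python
# Classify transactions around a checkpoint (log record encoding assumed as before,
# plus a ("checkpoint",) record). Transactions committed before the checkpoint
# are already reflected on disk and can be ignored.
def recover_with_checkpoint(log):
    seen_ckpt, committed_before, committed_after, started = False, set(), set(), set()
    for rec in log:
        if rec[0] == "checkpoint":
            seen_ckpt = True
        elif rec[0] == "start":
            started.add(rec[1])
        elif rec[0] == "commit":
            (committed_after if seen_ckpt else committed_before).add(rec[1])
    plan = {}
    for t in sorted(started):
        if t in committed_before:
            plan[t] = "IGNORE"       # changes already flushed at the checkpoint
        elif t in committed_after:
            plan[t] = "REDO"
        else:
            plan[t] = "UNDO"
    return plan

log = [("start", "T1"), ("commit", "T1"), ("checkpoint",),
       ("start", "T2"), ("commit", "T2"), ("start", "T3")]
print(recover_with_checkpoint(log))   # {'T1': 'IGNORE', 'T2': 'REDO', 'T3': 'UNDO'}
```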