Chapter 2 Transaction Processing
Introduction
Single-user vs. multiuser systems
◦ One criterion for classifying a database system is the
number of users who can use the system at the same
time.
Single-User System:
A DBMS is single-user if at most one user at a time can use the
system.
Multiuser System:
Many users can access the system concurrently.
Concurrency
Interleaved processing:
Concurrent execution of processes is interleaved on a single
CPU.
Parallel processing:
Processes are executed concurrently on multiple CPUs.
A Transaction:
◦ A logical unit of database processing that includes one or more access
operations (read or retrieval, and write: insert, update, or delete).
◦ An action, or series of actions, carried out by a single user or
application program, which reads or updates the contents of the
database.
◦ A transaction (a set of operations) may be specified stand-alone in a
high-level language such as SQL and submitted interactively, or it may be
embedded within a program.
Transaction boundaries:
◦ One way of specifying transaction boundaries is to use explicit Begin
transaction and End transaction statements.
◦ An application program may contain several transactions separated
by the Begin and End transaction boundaries
• Any transaction can have one of two outcomes:
◦ Success: the transaction commits and the database reaches a new
consistent state.
- A committed transaction cannot be aborted or rolled back.
◦ Failure: the transaction aborts, and the database must be restored to
the consistent state it was in before the transaction started.
Granularity of data: the size of a data item, which may be a field, a
record, or a whole disk block.
Transaction concepts are independent of granularity.
Basic operations are read and write
◦ read_item(X): Reads a database item named X into a
program variable. To simplify our notation, we assume
that the program variable is also named X.
◦ write_item(X): Writes the value of program variable X
into the database item named X.
Note: X may be a field (column) of a record or a whole disk
block.
read_item(X) command includes the following steps:
◦ Find the address of the disk block that contains item X.
◦ Copy that disk block into a buffer in main memory (if
that disk block is not already in some main memory
buffer).
◦ Copy item X from the buffer to the program variable
named X.
write_item(X) command includes the following steps:
◦ Find the address of the disk block that contains item
X.
◦ Copy that disk block into a buffer in main memory (if
that disk block is not already in some main memory
buffer)
◦ Copy item X from the program variable named X into
its correct location in the buffer.
◦ Store the updated block from the buffer back to disk
(either immediately or at some later point in time).
The decision about when to store back a modified
disk block that is in main memory is handled by the
recovery manager of the DBMS in cooperation
with the underlying operating system
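The read_item and write_item steps above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the names disk, block_of, buffer_pool, dirty, fetch_block, and flush are hypothetical, the "disk" is a dictionary, and flush stands in for whatever policy the recovery manager actually applies.

```python
# Hypothetical in-memory model, not a real DBMS API.
disk = {0: {"X": 100, "Y": 50}}   # block id -> block (item name -> value)
block_of = {"X": 0, "Y": 0}       # item name -> address of its disk block
buffer_pool = {}                  # block id -> copy of the block in main memory
dirty = set()                     # blocks changed in the buffer, not yet on disk


def fetch_block(item):
    """Find the item's block address and copy the block into a buffer if absent."""
    block_id = block_of[item]
    if block_id not in buffer_pool:
        buffer_pool[block_id] = dict(disk[block_id])
    return block_id


def read_item(item):
    """Copy the item from the buffer into a program variable (the return value)."""
    block_id = fetch_block(item)
    return buffer_pool[block_id][item]


def write_item(item, value):
    """Copy the program variable into the buffer; the disk write is deferred."""
    block_id = fetch_block(item)
    buffer_pool[block_id][item] = value
    dirty.add(block_id)


def flush(block_id):
    """Store the updated block back to disk (immediately or at some later time)."""
    disk[block_id] = dict(buffer_pool[block_id])
    dirty.discard(block_id)


x = read_item("X")
write_item("X", x - 10)
value_on_disk_before_flush = disk[0]["X"]   # the disk still holds the old value
flush(0)
value_on_disk_after_flush = disk[0]["X"]
```

Separating write_item from flush mirrors the last step of the list: the buffer is updated at once, while the moment the block reaches disk is a separate decision.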
◦ Example of transactions
[Figure: (a) Transaction T1. (b) Transaction T2.]
Assume T1 is withdrawing £10 from an account with balance balx,
initially £100, and T2 is depositing £100 into the same account. If these
transactions are executed serially, one after the other with no
interleaving of operations, the final balance is £190.
Now suppose T1 and T2 start at nearly the same time and both read the
balance as £100. T2 increases balx by £100 to £200 and stores the update
in the database. Meanwhile, T1 decrements its copy of balx by £10 to £90
and stores this value in the database, overwriting the previous update
and thereby ‘losing’ the £100 previously added to the balance. See the
following table.
Time  T1             T2              bal(X)
t1                   Begin Tx        100
t2    Begin Tx       R(balx)         100
t3    R(balx)        balx=balx+100   100
t4    balx=balx-10   W(balx)         200
t5    W(balx)        Commit          90
t6    Commit                         90
Lost update! This could have been avoided by preventing T1 from
reading balx until T2’s update has been completed.
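The lost update can be replayed with a small simulation. This is a minimal sketch, assuming each transaction keeps a private copy of balx between its read and its write; the function run and the step encoding are hypothetical and chosen only for illustration.

```python
def run(schedule, balance=100):
    """Execute (transaction, operation) steps against a single item balx.

    "read" copies balx into the transaction's private variable, "write"
    stores the private variable back, and an integer adds to the private copy.
    """
    db = {"balx": balance}
    local = {}
    for txn, op in schedule:
        if op == "read":
            local[txn] = db["balx"]
        elif op == "write":
            db["balx"] = local[txn]
        else:
            local[txn] += op
    return db["balx"]


# Serial execution: T1 (withdraw 10) runs to completion, then T2 (deposit 100).
serial = [("T1", "read"), ("T1", -10), ("T1", "write"),
          ("T2", "read"), ("T2", 100), ("T2", "write")]

# Interleaving from the example: both read 100, T2 writes 200, T1 overwrites with 90.
interleaved = [("T2", "read"), ("T1", "read"), ("T2", 100),
               ("T1", -10), ("T2", "write"), ("T1", "write")]

serial_result = run(serial)
interleaved_result = run(interleaved)
```

With these inputs the serial order ends at 190, while the interleaved order ends at 90 because T1's write is based on a value read before T2's update.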
The temporary update problem: Example
Time  T1             T2              bal(X)
t1                   Begin Tx        100
t2                   R(balx)         100
t3                   balx=balx+100   100
t4    Begin Tx       W(balx)         200
t5    R(balx)                        200
t6    balx=balx-10   Rollback        100
t7    W(balx)                        190
t8    Commit                         190
Temporary update!
o This could have been avoided by preventing T1 from reading balx until
after the decision to commit or roll back T2 has been made.
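The same scenario can be traced step by step in code. This is an illustrative sketch only: the variable names are hypothetical, and the rollback is modeled by restoring a saved before image of balx.

```python
db = {"balx": 100}

# T2 deposits 100 but has not committed yet.
t2_before_image = db["balx"]   # old value kept so the write can be undone
t2_local = db["balx"] + 100
db["balx"] = t2_local          # uncommitted write: balx is now 200

# T1 reads the uncommitted (dirty) value and withdraws 10 from its copy.
t1_local = db["balx"]
t1_local -= 10

# T2 rolls back, which restores the before image.
db["balx"] = t2_before_image

# T1 writes a value derived from an update that was undone.
db["balx"] = t1_local
final_balance = db["balx"]

# Since T2 was rolled back, only T1's withdrawal should have taken effect.
expected_balance = 100 - 10
```

The final balance is 190 although the only transaction that completed was a withdrawal of 10 from 100.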
C. The Incorrect Summary Problem
◦ If one transaction is calculating an aggregate summary function on a
number of records while other transactions are updating some of these
records, the aggregate function may calculate some values before they
are updated and others after they are updated.
Example: a transaction T6 that summarizes accounts is executing
concurrently with transaction T5. T6 is totaling the balances of
account x (£100), account y (£50), and account z (£25). In the
meantime, T5 transfers £10 from balx to balz, so T6 ends up with the
wrong result (£10 too high). See the following table.
[Figure (c): The incorrect summary problem.]
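One interleaving that produces the wrong total can be written out directly. This sketch assumes that T6 reads x before the transfer and z after it; the exact interleaving in the original figure may differ, and the variable names are hypothetical.

```python
accounts = {"x": 100, "y": 50, "z": 25}

total = 0
total += accounts["x"]   # T6 reads x before the transfer (100)
accounts["x"] -= 10      # T5 takes 10 out of x
accounts["z"] += 10      # T5 puts 10 into z
total += accounts["y"]   # T6 reads y (50)
total += accounts["z"]   # T6 reads z after the transfer (35)

# A transfer does not change the sum of the balances.
correct_total = sum(accounts.values())
```

T6 counts the transferred £10 twice, once in x and once in z, which is why its total exceeds the true sum by exactly that amount.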
Causes of Transaction failure
1. A computer failure (system crash):
A hardware or software error may occur in the computer
system during transaction execution. If the hardware crashes,
the contents of the computer’s internal memory may be lost.
2. A transaction or system error:
Some operation in the transaction may cause it to fail, such as
integer overflow or division by zero.
Transaction failure may also occur because of erroneous
parameter values or because of a logical programming error.
3. Local errors or exception conditions detected by the
transaction:
Certain conditions necessitate cancellation of the transaction.
For example, data for the transaction may not be found,
or a condition such as an insufficient account balance in
a banking database may cause a transaction, such as a
fund withdrawal from that account, to be canceled.
A programmed abort in the transaction causes it to fail.
4. Concurrency control enforcement:
The concurrency control method may decide to abort
the transaction, to be restarted later, because it violates
serializability or because several transactions are in a
state of deadlock.
5. Disk failure:
Some disk blocks may lose their data because of a read
or write malfunction or because of a disk read/write
head crash.
This may happen during a read or a write operation of
the transaction.
6. Physical problems and catastrophes:
This refers to an endless list of problems that
includes power or air-conditioning failure, fire, theft,
sabotage, overwriting disks or tapes by mistake, and
mounting of a wrong tape by the operator.
Transaction and System Concepts
Transaction states and additional operations
◦ Transaction states:
Active state, partially committed state, committed state, failed
state, and terminated state
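The five states can be modeled as a small state machine. The slides list only the states, so the transitions below are an assumption taken from the usual textbook state diagram; the event name "terminate" and the helper names advance and final_state are hypothetical.

```python
# Assumed transitions (not given in the slides): (current state, event) -> next state.
TRANSITIONS = {
    ("active", "end_transaction"): "partially committed",
    ("active", "abort"): "failed",
    ("partially committed", "commit"): "committed",
    ("partially committed", "abort"): "failed",
    ("committed", "terminate"): "terminated",
    ("failed", "terminate"): "terminated",
}


def advance(state, event):
    """Return the next state, or raise ValueError if the event is not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed in state {state!r}") from None


def final_state(events, state="active"):
    """Apply a sequence of events starting from the active state."""
    for event in events:
        state = advance(state, event)
    return state


successful = final_state(["end_transaction", "commit", "terminate"])
aborted = final_state(["abort", "terminate"])
```

Under these assumed transitions a committed transaction has no "abort" edge, which matches the rule that a committed transaction cannot be rolled back.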
Transaction operations
◦ For recovery purposes, the system needs to keep track of when the transaction
starts, terminates, and commits or aborts. Recovery manager keeps track of the
following operations:
begin_transaction: This marks the beginning of transaction
execution
read or write: These specify read or write operations on the
database items that are executed as part of a transaction
end_transaction: This specifies that the read and write
operations of the transaction have ended and marks the end of
transaction execution.
◦ commit_transaction:
This signals a successful end of the transaction so that any changes
(updates) executed by the transaction can be safely committed to the
database and will not be undone.
◦ rollback (or abort):
This signals that the transaction has ended unsuccessfully, so that any
changes or effects that the transaction may have applied to the database
must be undone.
The System Log
◦ Log or Journal: The log keeps track of all transaction operations
that affect the values of database items
This information is needed to permit recovery from transaction
failures
The log is kept on disk, so it is not affected by any type of failure
except for disk or catastrophic failure.
In addition, the log is periodically backed up to archival storage
(tape) to guard against such catastrophic failures.
◦ We use T to refer to a unique transaction id that is generated
automatically by the system and is used to identify each
transaction.
◦ Types of log record:
[start_transaction,T]: Records that transaction T has started execution.
[write_item,T,X,old_value,new_value]: Records that transaction T has
changed the value of database item X from old_value to new_value.
[read_item,T,X]: Records that transaction T has read the value of
database item X.
Recovery using log records:
Undoing transactions:
◦ If a system failure occurs, we search back in the log for all transactions
T that have written a [start_transaction,T] record into the log but no
[commit,T] record yet.
These transactions have to be rolled back to undo their
effects on the database during the recovery process.
Redoing transactions:
◦ Transactions that have written their commit record in the log must also
have recorded all their write operations in the log; otherwise they
would not be committed. Their effect on the database can therefore be
redone from the log records.
◦ Note that the log file must be kept on disk. At the time of a system
crash, only the log records that have been written back to disk are
considered in the recovery process, because the contents of main memory
may be lost.
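The undo and redo rules can be sketched as a function over a list of log records. This is a simplified illustration, not a full recovery algorithm: the records are Python tuples shaped like the log record types above, the function name recover is hypothetical, and the sketch assumes no committed transaction overwrote an item written by an uncommitted one.

```python
def recover(log, database):
    """Return a recovered copy of `database` using the records in `log`."""
    db = dict(database)
    committed = {record[1] for record in log if record[0] == "commit"}

    # Undo: scan backward and restore old values written by transactions
    # that started but have no commit record.
    for record in reversed(log):
        if record[0] == "write_item" and record[1] not in committed:
            _, _, item, old_value, _ = record
            db[item] = old_value

    # Redo: scan forward and reapply new values written by committed transactions.
    for record in log:
        if record[0] == "write_item" and record[1] in committed:
            _, _, item, _, new_value = record
            db[item] = new_value
    return db


log = [
    ("start_transaction", "T1"),
    ("write_item", "T1", "X", 100, 90),
    ("commit", "T1"),
    ("start_transaction", "T2"),
    ("write_item", "T2", "Y", 50, 70),
]

# State found on disk after the crash: T1's write never reached the disk,
# while T2's uncommitted write did.
on_disk = {"X": 100, "Y": 70}
recovered = recover(log, on_disk)
```

In this example T1 is redone (X becomes 90) and T2 is undone (Y returns to 50).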
Desirable Properties of Transactions
Schedules
Schedule (or history) of transaction
◦ When transactions are executing concurrently in an interleaved
fashion, the order of execution of operations from the various
transactions forms what is known as a transaction schedule (or
history).
A schedule (or history) S of n transactions T1,T2,.. ,Tn:
◦ is an ordering of the operations of the transactions subject to the
constraint that, for each transaction Ti that participates in S, the
operations of Ti in S must appear in the same order in which they
occur in Ti.
Note, however, that operations from other transactions Tj can be
interleaved with the operations of Ti in S.
A shorthand notation for describing a schedule uses the
symbols:
◦ r: read_item,
◦ w: write_item,
◦ c: commit, and
◦ a: abort.
The transaction number is appended as a
subscript to each operation in the
schedule.
The database item X that is read or
written follows the r and w operations
in parentheses.
◦ Example:
Sa: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y)
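The shorthand is regular enough to parse mechanically. The sketch below is one possible way to do it, assuming the operations are written as a letter, a transaction number, and an optional item in parentheses; parse_schedule and projection are hypothetical helper names.

```python
import re

# One operation: r, w, c, or a, then a transaction number, then an optional (item).
OPERATION = re.compile(r"([rwca])(\d+)(?:\((\w+)\))?")


def parse_schedule(text):
    """Turn 'r1(X); w2(X); c1' into [('r', 1, 'X'), ('w', 2, 'X'), ('c', 1, None)]."""
    operations = []
    for kind, txn, item in OPERATION.findall(text.lower()):
        operations.append((kind, int(txn), item.upper() if item else None))
    return operations


def projection(schedule, txn):
    """Operations of one transaction, in the order they appear in the schedule."""
    return [op for op in schedule if op[1] == txn]


sa = parse_schedule("r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y)")
t1_operations = projection(sa, 1)
```

The projection of a schedule onto one transaction must list that transaction's operations in their original order, which is the constraint in the definition of a schedule.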
Complete schedules
◦ A schedule S of n transactions T1, T2, ..., Tn is said to
be a complete schedule if the following conditions hold:
1. The operations in S are exactly those operations in T1, T2, ..., Tn,
including a commit or abort operation as the last operation for
each transaction in the schedule.
2. For any pair of operations from the same transaction Ti, their order
of appearance in S is the same as their order of appearance in Ti.
3. For any two conflicting operations, one of the two must occur
before the other in the schedule (theoretically, it is not necessary to
determine an order between pairs of nonconflicting operations).
Characterizing Schedules based on Recoverability
Consider the following schedule:
Sa’: r1(X); r2(X); w1(X); r1(Y); w2(X); c2; w1(Y); c1
◦ Sa’ is recoverable, even though it suffers from the lost update problem.
However, consider the two schedules Sc and Sd below:
◦ Sc: r1(X); w1(X); r2(X); r1(Y); w2(X); c2; a1 (not recoverable)
Sc is not recoverable because T2 reads X from T1 and then T2
commits before T1 commits.
If T1 aborts after the c2 operation in Sc, then the value of X that T2
read is no longer valid and T2 must be aborted after it has been
committed, leading to a schedule that is not recoverable.
For the above schedule to be recoverable, the c2 operation
in Sc must be postponed until after T1 commits, as shown in
Sd:
◦ Sd: r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); c1; c2 (recoverable)
If T1 aborts instead of committing, then T2 should also abort,
as shown in Se, because the value of X it read is no longer valid.
◦ Se: r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); a1; a2 (recoverable)
Recoverable Schedules
Recoverable schedule: a schedule is recoverable if, whenever a
transaction Tj reads a data item previously written by a transaction
Ti, the commit operation of Ti appears before the commit operation of
Tj.
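The definition translates into a simple check over a schedule. This is a minimal sketch under stated assumptions: a schedule is a list of hypothetical (operation, transaction, item) tuples, a read is attributed to the most recent writer of the item, and abort operations are ignored rather than undoing earlier writes.

```python
def is_recoverable(schedule):
    """Check that no transaction commits before a transaction it read from."""
    last_writer = {}      # item -> transaction that wrote it most recently
    reads_from = set()    # (reader, writer) pairs where the writer was uncommitted
    committed = set()
    for kind, txn, item in schedule:
        if kind == "w":
            last_writer[item] = txn
        elif kind == "r":
            writer = last_writer.get(item)
            if writer is not None and writer != txn and writer not in committed:
                reads_from.add((txn, writer))
        elif kind == "c":
            for reader, writer in reads_from:
                if reader == txn and writer not in committed:
                    return False   # txn commits before the writer it depends on
            committed.add(txn)
    return True


# Sc and Sd from the recoverability discussion, written as tuples.
sc = [("r", 1, "X"), ("w", 1, "X"), ("r", 2, "X"), ("r", 1, "Y"),
      ("w", 2, "X"), ("c", 2, None), ("a", 1, None)]
sd = [("r", 1, "X"), ("w", 1, "X"), ("r", 2, "X"), ("r", 1, "Y"),
      ("w", 2, "X"), ("w", 1, "Y"), ("c", 1, None), ("c", 2, None)]

sc_recoverable = is_recoverable(sc)
sd_recoverable = is_recoverable(sd)
```

Sc fails the check at c2 because T1, from which T2 read X, has not committed yet; Sd passes because c1 precedes c2.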
The following schedule is not recoverable if T9 commits
immediately after the read(A) operation.
S1                              S2
T1              T2              T1              T2
R(x)                            R(x)
x=x+10                          x=x+10
W(x)                            W(x)
                R(x)            C
                x=x-5                           R(x)
                W(x)                            x=x-5
                C                               W(x)
Fails/Abort/                                    C
Rollback
Non-serializable and recoverable schedules: example
S1: w2(x); w1(y); w1(x); r2(y); c1; c2
Why is S1 not serializable? There are two possible serial schedules:
- Serial schedule 1: w1(y); w1(x); c1; w2(x); r2(y); c2
- Serial schedule 2: w2(x); r2(y); c2; w1(y); w1(x); c1
Is schedule S1 equivalent to one of the serial schedules?
o S1 is not equivalent to serial schedule 1 because in S1, T1 makes
the final update to DB element x, but in serial schedule 1, T2 makes
the final update to x.
o S1 is not equivalent to serial schedule 2 because in S1, T2 reads
the value of DB element y written by T1, but in serial schedule 2, T2
reads y before T1 writes it, so it reads the value left by an earlier
transaction.
There is no equivalent serial schedule, and thus schedule S1 is not
serializable.
Schedule S1 is recoverable because:
o Transaction T2 has read dirty data from the
uncommitted T1, but T2 commits after T1.
Because schedule S1 is non-serializable, the final DB state
after executing S1 may be an inconsistent DB state.
What is the effect of a recoverable but non-serializable
schedule?
o A recoverable schedule means that we can restore the
database to a state that was achieved by committed
transactions.
o In a recoverable but non-serializable schedule, we will
(unfortunately) recover the database to the state achieved
by the non-serializable execution of the committed
transactions, that is, to the possibly inconsistent state
produced by schedule S1.
Serializable and non-recoverable schedules
S1: w1(x); w1(y); w2(x); r2(y); c2; c1
Schedule S1 is serializable because its read and write
operations occur in the serial order T1, T2, and a serial
order is always serializable.
Schedule S1 is not recoverable because T2 has read
dirty data from the uncommitted T1, yet T2 commits
before T1.
Effect of a serializable but non-recoverable schedule:
o A serializable schedule means that when there are no
system failures, the schedule results in a consistent
database state.
o A non-recoverable schedule means that when there is a
system failure, we may not be able to recover to a
consistent database state.
o The DB state after the recovery operation (using
any logging technique) may be an inconsistent
database state.
S1: w1(x); w1(y); w2(x); r2(y); c2; [crash]; c1
o Suppose the system crashes after c2, so that c1 is
never executed. T2 has read dirty data written by
transaction T1. During recovery, the uncommitted
T1 is rolled back (undone), so the dirty data never
becomes part of a consistent DB state. Therefore,
the committed T2 has used a value that no longer
exists.
Exercise
• Consider schedules S1, S2, and S3 below. Determine
whether each schedule is recoverable or non-recoverable.
S1: r1(A); r2(K); r1(K); r3(A); r3(Y); w1(A); c1; w3(Y); c3; r2(Y); w2(K); w2(Y); c2;
S2: r1(A); r2(K); r1(K); r3(A); r3(Y); w1(A); w3(Y); r2(Y); w2(K); w2(Y); c1; c2; c3;
S3: r1(A); r2(K); r3(A); r1(K); r2(Y); r3(Y); w1(A); c1; w2(K); w3(Y); w2(Y); c3; c2;
Recoverability…
Cascadeless schedule vs. cascading rollback
◦ Schedules requiring cascaded rollback:
A schedule in which uncommitted transactions
that have read an item written by a failed
transaction must also be rolled back.
◦ Cascadeless schedule:
One where every transaction reads only
items that were written by committed
transactions.
S1 (cascading rollback)                 S2 (cascadeless)
T1        T2        T3                  T1        T2        T3
R(x)                                    R(x)
W(x)                                    W(x)
          R(x)                          C
          W(x)                                    R(x)
                    R(x)                          W(x)
                    W(x)                          C
abort     abort     abort                                   R(x)
                                                            W(x)
                                                            C
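Whether a schedule is cascadeless can be checked by looking at each read. The sketch below assumes the two schedules in the table are encoded as hypothetical (operation, transaction, item) tuples and that a read is attributed to the most recent writer of the item.

```python
def is_cascadeless(schedule):
    """True if every read sees only data written by committed transactions."""
    last_writer = {}   # item -> transaction that wrote it most recently
    committed = set()
    for kind, txn, item in schedule:
        if kind == "w":
            last_writer[item] = txn
        elif kind == "c":
            committed.add(txn)
        elif kind == "r":
            writer = last_writer.get(item)
            if writer is not None and writer != txn and writer not in committed:
                return False   # dirty read: the writer has not committed yet
    return True


# S1: T2 and T3 read x before the previous writer commits, then all three abort.
s1 = [("r", 1, "x"), ("w", 1, "x"), ("r", 2, "x"), ("w", 2, "x"),
      ("r", 3, "x"), ("w", 3, "x"), ("a", 1, None), ("a", 2, None), ("a", 3, None)]

# S2: each transaction commits before the next one reads x.
s2 = [("r", 1, "x"), ("w", 1, "x"), ("c", 1, None),
      ("r", 2, "x"), ("w", 2, "x"), ("c", 2, None),
      ("r", 3, "x"), ("w", 3, "x"), ("c", 3, None)]

s1_cascadeless = is_cascadeless(s1)
s2_cascadeless = is_cascadeless(s2)
```

S1 fails at T2's read of x, which is the dependency that forces T2 and T3 to roll back once T1 aborts; S2 has no such dependency.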
Characterizing Schedules based on Serializability
Serial schedule:
◦ A schedule S is serial if, for every transaction T participating in the
schedule, all the operations of T are executed consecutively in the
schedule
Otherwise, the schedule is called a nonserial schedule.
Serializable schedule:
◦ A schedule S is serializable if it is equivalent to some serial schedule
of the same n transactions
Result equivalent:
◦ Two schedules are called result equivalent if they produce the same
final state of the database.
Conflict equivalent:
◦ Two schedules are said to be conflict equivalent if the order of any two
conflicting operations is the same in both schedules
◦ Conflict serializable:
A schedule S is said to be conflict serializable if it is conflict
equivalent to some serial schedule S’.
Being serializable is not the same as being serial.
Being serializable implies that the schedule is a correct
schedule:
◦ It will leave the database in a consistent state.
◦ The interleaving is appropriate and will result in a state as if
the transactions were serially executed, yet it achieves
efficiency due to concurrent execution.
In practice, it is difficult to determine when a
schedule begins and when it ends.
◦ Hence, we reduce the problem of checking the
whole schedule to checking only the committed
projection of the schedule (i.e., the operations from
only the committed transactions).
Determining conflict serializability
◦ To determine serializability, first identify the pairs of conflicting
operations, then check whether their order is preserved in one of the
possible serial schedules.
◦ Schedule A:
r1(x); w1(x); r1(y); w1(y); r2(x); w2(x) (serial schedule)
◦ Schedule B:
r2(x); w2(x); r1(x); w1(x); r1(y); w1(y) (serial schedule)
◦ Schedule C:
r1(x); r2(x); w1(x); w2(x); w1(y) (not serializable)
◦ Schedule D:
r1(x); w1(x); r2(x); w2(x); r1(y); w1(y) (serializable,
equivalent to schedule A)
Testing for conflict serializability with precedence graphs:
Algorithm
◦ For each transaction Ti participating in schedule S, create a node
labeled Ti in the precedence graph.
◦ For each case in S where Tj executes read_item(x) after Ti executes
write_item(x), create an edge (Ti → Tj) in the precedence graph.
◦ For each case in S where Tj executes write_item(x) after Ti executes
read_item(x), create an edge (Ti → Tj) in the precedence graph.
◦ For each case in S where Tj executes write_item(x) after Ti executes
write_item(x), create an edge (Ti → Tj) in the precedence graph.
◦ The schedule is serializable if and only if the precedence graph has no
cycles.
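The algorithm above can be sketched as follows. This is one possible implementation, assuming schedules are lists of hypothetical (operation, transaction, item) tuples; the cycle test repeatedly removes nodes without incoming edges, which is a choice made here and not something the slides prescribe.

```python
def precedence_graph(schedule):
    """Edges (Ti, Tj) for each pair of conflicting operations, Ti's coming first."""
    edges = set()
    for i, (kind_i, ti, item_i) in enumerate(schedule):
        for kind_j, tj, item_j in schedule[i + 1:]:
            same_item = item_i is not None and item_i == item_j
            if same_item and ti != tj and "w" in (kind_i, kind_j):
                edges.add((ti, tj))
    return edges


def has_cycle(nodes, edges):
    """Remove nodes with no incoming edge until none is left or none can be removed."""
    remaining = set(nodes)
    live = set(edges)
    while remaining:
        sources = {n for n in remaining if not any(dst == n for _, dst in live)}
        if not sources:
            return True            # every remaining node has a predecessor
        remaining -= sources
        live = {(src, dst) for src, dst in live if src not in sources}
    return False


def is_conflict_serializable(schedule):
    nodes = {txn for _, txn, _ in schedule}
    return not has_cycle(nodes, precedence_graph(schedule))


# r1(x); r2(x); w1(x); w2(x); w1(y)
schedule_c = [("r", 1, "x"), ("r", 2, "x"), ("w", 1, "x"), ("w", 2, "x"), ("w", 1, "y")]
# r1(x); w1(x); r2(x); w2(x); r1(y); w1(y)
schedule_d = [("r", 1, "x"), ("w", 1, "x"), ("r", 2, "x"), ("w", 2, "x"),
              ("r", 1, "y"), ("w", 1, "y")]

edges_c = precedence_graph(schedule_c)
edges_d = precedence_graph(schedule_d)
```

For the first schedule the graph contains both T1 → T2 and T2 → T1, so it has a cycle; for the second only T1 → T2 appears, so the equivalent serial order is T1 followed by T2.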
Testing serializability with Precedence Graphs
◦ Schedule A: r1(x); w1(x); r1(y); w1(y); r2(x); w2(x)
◦ Schedule B: r2(x); w2(x); r1(x); w1(x); r1(y); w1(y)
◦ Schedule C: r1(x); r2(x); w1(x); w2(x); w1(y) (not serializable)
◦ Schedule D: r1(x); w1(x); r2(x); w2(x); r1(y); w1(y) (serializable,
equivalent to schedule A)
[Figure: precedence graphs for schedules A (serial), B (serial),
C (not serializable), and D (serializable).]
Exercise
• Draw the serializability (precedence) graphs for S1 and S2 and state
whether each schedule is serializable or not. If a schedule is
serializable, write down the equivalent serial schedule(s).
S1: r1(A); r2(K); r1(K); r3(A); r3(Y); w1(A); w3(Y); r2(Y); w2(K); w2(Y);
S2: r1(A); r2(K); r3(A); r1(K); r2(Y); r3(Y); w1(A); w2(K); w3(Y); w2(Y);