DBMS UNIT 5 Part 2

UNIT – V (Part - 2)

CONCURRENCY CONTROL
Lock-Based Protocols
Deadlock Handling
Multiple Granularity
Timestamp-Based Protocols
Validation-Based Protocols
Multiversion Schemes
Lock-Based Protocols

A lock is a mechanism to control concurrent access to a data item.
Data items can be locked in two modes:
1. exclusive (X) mode. The data item can be both read and
written. An X-lock is requested using the lock-X instruction.
2. shared (S) mode. The data item can only be read. An S-lock is
requested using the lock-S instruction.
Lock requests are made to the concurrency-control manager. A
transaction can proceed only after the request is granted.
Lock-Based Protocols (Cont.)

Lock-compatibility matrix:

        S      X
   S    true   false
   X    false  false
A transaction may be granted a lock on an item if the requested lock
is compatible with locks already held on the item by other transactions.
Any number of transactions can hold shared locks on an item,
□ but if any transaction holds an exclusive lock on the item, no other
transaction may hold any lock on the item.
If a lock cannot be granted, the requesting transaction is made to wait
until all incompatible locks held by other transactions have been
released. The lock is then granted.
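
As a quick illustration, here is a minimal Python sketch (not from the
slides) of this compatibility-and-grant rule; the names are illustrative.

COMPAT = {('S', 'S'): True, ('S', 'X'): False,
          ('X', 'S'): False, ('X', 'X'): False}

def can_grant(requested_mode, held_modes):
    # Grant only if the requested lock is compatible with every lock
    # already held on the item by other transactions.
    return all(COMPAT[(held, requested_mode)] for held in held_modes)

assert can_grant('S', ['S', 'S'])     # any number of shared locks coexist
assert not can_grant('X', ['S'])      # X is incompatible with everything
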
Lock-Based Protocols (Cont.)

Example of a transaction performing locking:
T2: lock-S(A);
read (A);
unlock(A);
lock-S(B);
read (B);
unlock(B);
display(A+B)
Locking as above is not sufficient to guarantee serializability: if A and B
are updated in between the reads of A and B, the displayed sum will
be wrong.
A locking protocol is a set of rules followed by all transactions while
requesting and releasing locks. Locking protocols restrict the set of
possible schedules.
Pitfalls of Lock-Based Protocols

Consider the partial schedule:

    T3                  T4
    lock-X(B)
    read(B)
    write(B)
                        lock-S(A)
                        read(A)
                        lock-S(B)
    lock-X(A)

Neither T3 nor T4 can make progress: executing lock-S(B) causes T4
to wait for T3 to release its lock on B, while executing lock-X(A)
causes T3 to wait for T4 to release its lock on A.
 Such a situation is called a deadlock.
□ To handle a deadlock one of T3 or T4 must be rolled back
and its locks released.
Pitfalls of Lock-Based Protocols (Cont.)

The potential for deadlock exists in most locking protocols.
Deadlocks are a necessary evil.
Starvation is also possible if the concurrency-control manager is
badly designed. For example:
□ A transaction may be waiting for an X-lock on an item, while
a sequence of other transactions request and are granted an
S-lock on the same item.
□ The same transaction is repeatedly rolled back due to
deadlocks.
The concurrency-control manager can be designed to prevent
starvation.
The Two-Phase Locking Protocol

This is a protocol which ensures conflict-serializable schedules.
Phase 1: Growing Phase
□ transaction may obtain locks
□ transaction may not release locks
Phase 2: Shrinking Phase
□ transaction may release locks
□ transaction may not obtain locks
The protocol assures serializability. It can be proved that the
transactions can be serialized in the order of their lock points
(i.e., the point where a transaction acquires its final lock).
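
A tiny sketch (illustrative, not from the slides) of checking that a
transaction's lock/unlock sequence obeys the two-phase rule; events are
hypothetical ('lock', item) / ('unlock', item) pairs.

def is_two_phase(events):
    shrinking = False
    for action, _item in events:
        if action == 'unlock':
            shrinking = True          # shrinking phase has begun
        elif action == 'lock' and shrinking:
            return False              # acquiring after releasing: violation
    return True

assert is_two_phase([('lock', 'A'), ('lock', 'B'),
                     ('unlock', 'A'), ('unlock', 'B')])
assert not is_two_phase([('lock', 'A'), ('unlock', 'A'), ('lock', 'B')])
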
The Two-Phase Locking Protocol (Cont.)

Two-phase locking does not ensure freedom from deadlocks.
 Cascading roll-back is possible under two-phase locking. To
avoid this, follow a modified protocol called strict two-phase
locking. Here a transaction must hold all its exclusive locks till it
commits/aborts.
 Rigorous two-phase locking is even stricter: here all locks are
held till commit/abort. In this protocol transactions can be
serialized in the order in which they commit.
The Two-Phase Locking Protocol (Cont.)

There can be conflict-serializable schedules that cannot be
obtained if two-phase locking is used.
 However, in the absence of extra information (e.g., ordering of
access to data), two-phase locking is needed for conflict
serializability in the following sense:
Given a transaction Ti that does not follow two-phase locking,
we can find a transaction Tj that uses two-phase locking, and a
schedule for Ti and Tj that is not conflict serializable.
Lock Conversions

Two-phase locking with lock conversions:
– First Phase:
□ can acquire a lock-S on an item
□ can acquire a lock-X on an item
□ can convert a lock-S to a lock-X (upgrade)
– Second Phase:
□ can release a lock-S
□ can release a lock-X
□ can convert a lock-X to a lock-S (downgrade)
This protocol assures serializability. But it still relies on the
programmer to insert the various locking instructions.
Automatic Acquisition of Locks

A transaction Ti issues the standard read/write instruction,
without explicit locking calls.
 The operation read(D) is processed as:
if Ti has a lock on D
then
read(D)
else begin
if necessary wait until no other
transaction has a lock-X on D
grant Ti a lock-S on D;
read(D)
end
Automatic Acquisition of Locks (Cont.)

write(D) is processed as:
if Ti has a lock-X on D
then
write(D)
else begin
if necessary wait until no other trans. has any lock on D,
if Ti has a lock-S on D
then
upgrade lock on D to lock-X
else
grant Ti a lock-X on D
write(D)
end;
 All locks are released after commit or abort
Implementation of Locking

A lock manager can be implemented as a separate process to
which transactions send lock and unlock requests.
The lock manager replies to a lock request by sending a lock
grant message (or a message asking the transaction to roll back,
in case of a deadlock).
 The requesting transaction waits until its request is answered
 The lock manager maintains a data-structure called a lock table
to record granted locks and pending requests
 The lock table is usually implemented as an in-memory hash table
indexed on the name of the data item being locked
Lock Table

Black rectangles in the figure indicate granted locks; white ones
indicate waiting requests.
The lock table also records the type of lock granted or requested.
A new request is added to the end of the queue of requests for the
data item, and granted if it is compatible with all earlier locks.
Unlock requests result in the request being deleted, and later
requests are checked to see if they can now be granted.
If a transaction aborts, all waiting or granted requests of the
transaction are deleted.
□ the lock manager may keep a list of locks held by each
transaction, to implement this efficiently
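
A simplified lock-table sketch along the lines above: one FIFO queue per
data item, with a request granted only if compatible with all earlier
requests. Class and method names are illustrative, not a real DBMS API.

from collections import defaultdict, namedtuple

Request = namedtuple('Request', 'txn mode granted')
COMPAT = {('S', 'S'): True}   # every other S/X pair is incompatible

class LockTable:
    def __init__(self):
        self.table = defaultdict(list)    # item -> queue of Requests

    def lock(self, txn, item, mode):
        queue = self.table[item]
        ok = all(COMPAT.get((r.mode, mode), False) for r in queue)
        queue.append(Request(txn, mode, ok))  # waiters stay queued, in order
        return ok                             # False means the caller waits

    def unlock(self, txn, item):
        queue = self.table[item]
        queue[:] = [r for r in queue if r.txn != txn]
        # later requests are re-checked to see if they can now be granted
        for i, r in enumerate(queue):
            if not r.granted and all(COMPAT.get((p.mode, r.mode), False)
                                     for p in queue[:i]):
                queue[i] = r._replace(granted=True)
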
Graph-Based Protocols

Graph-based protocols are an alternative to two-phase locking.
Impose a partial ordering → on the set D = {d1, d2, ..., dh} of all
data items.
□ If di → dj, then any transaction accessing both di and dj must
access di before accessing dj.
□ This implies that the set D may now be viewed as a directed
acyclic graph, called a database graph.
 The tree-protocol is a simple kind of graph protocol.
Tree Protocol

1. Only exclusive locks are allowed.
2. The first lock by Ti may be on any data item. Subsequently, a
data item Q can be locked by Ti only if the parent of Q is
currently locked by Ti.
3. Data items may be unlocked at any time.
4. A data item that has been locked and unlocked by Ti cannot
subsequently be relocked by Ti.
(A small sketch of rules 2 and 4 follows.)
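
Illustrative Python check (not from the slides) of rules 2 and 4: a lock
on Q is legal only if it is the transaction's first lock, or Q's parent is
currently held and Q was never locked and released before.

def tree_lock_ok(held, unlocked_before, q, parent):
    if q in unlocked_before:
        return False                     # rule 4: no relocking after unlock
    if not held and not unlocked_before:
        return True                      # rule 2: first lock may be anywhere
    return parent.get(q) in held         # rule 2: parent currently locked

parent = {'B': 'A', 'C': 'A', 'D': 'B'}
assert tree_lock_ok({'A'}, set(), 'B', parent)
assert not tree_lock_ok({'A'}, set(), 'D', parent)   # parent B not held
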
Graph-Based Protocols (Cont.)

The tree protocol ensures conflict serializability as well as freedom
from deadlock.
 Unlocking may occur earlier in the tree-locking protocol than in the
two-phase locking protocol.
□ shorter waiting times, and increase in concurrency
□ protocol is deadlock-free, no rollbacks are required

 Drawbacks
□ Protocol does not guarantee recoverability or cascade freedom
 Need to introduce commit dependencies to ensure recoverability
□ Transactions may have to lock data items that they do not access.
 increased locking overhead, and additional waiting time
 potential decrease in concurrency
 Schedules not possible under two-phase locking are possible under
tree protocol, and vice versa.
 ******END******
Deadlock Handling

Consider the following two transactions:

    T1: write(X)        T2: write(Y)
        write(Y)            write(X)
Schedule with deadlock:

    T1                      T2
    lock-X on X
    write(X)
                            lock-X on Y
                            write(Y)
                            wait for lock-X on X
    wait for lock-X on Y
Deadlock Handling

A system is deadlocked if there is a set of transactions such that
every transaction in the set is waiting for another transaction in
the set.
 Deadlock prevention protocols ensure that the system will never
enter into a deadlock state. Some prevention strategies :
□ Require that each transaction locks all its data items before it
begins execution (predeclaration).
□ Impose partial ordering of all data items and require that a
transaction can lock data items only in the order specified by
the partial order (graph-based protocol).
More Deadlock Prevention Strategies

The following schemes use transaction timestamps for the sake of
deadlock prevention alone.
wait-die scheme - non-preemptive
□ an older transaction may wait for a younger one to release a data
item. Younger transactions never wait for older ones; they are
rolled back instead.
□ a transaction may die several times before acquiring a needed
data item
wound-wait scheme - preemptive
□ an older transaction wounds (forces rollback of) a younger
transaction instead of waiting for it. Younger transactions may
wait for older ones.
□ there may be fewer rollbacks than in the wait-die scheme.
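
The two rules compress to a pair of one-line decisions; here is an
illustrative sketch (smaller timestamp = older transaction).

def wait_die(ts_requester, ts_holder):
    # non-preemptive: older waits, younger dies (is rolled back)
    return 'wait' if ts_requester < ts_holder else 'rollback requester'

def wound_wait(ts_requester, ts_holder):
    # preemptive: older wounds (rolls back) the holder, younger waits
    return 'rollback holder' if ts_requester < ts_holder else 'wait'

assert wait_die(1, 5) == 'wait'                  # old requester waits
assert wait_die(5, 1) == 'rollback requester'    # young requester dies
assert wound_wait(1, 5) == 'rollback holder'     # old requester wounds
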
Deadlock prevention (Cont.)

In both the wait-die and wound-wait schemes, a rolled-back
transaction is restarted with its original timestamp. Older
transactions thus have precedence over newer ones, and
starvation is hence avoided.
Timeout-Based Schemes:
□ a transaction waits for a lock only for a specified amount of
time. After that, the wait times out and the transaction is rolled
back.
□ thus deadlocks are not possible
□ simple to implement; but starvation is possible. It is also difficult
to determine a good value for the timeout interval.
Deadlock Detection

Deadlocks can be described by a wait-for graph, which consists
of a pair G = (V, E), where
□ V is a set of vertices (all the transactions in the system)
□ E is a set of edges; each element is an ordered pair Ti → Tj.
If Ti → Tj is in E, then there is a directed edge from Ti to Tj,
implying that Ti is waiting for Tj to release a data item.
When Ti requests a data item currently being held by Tj, the
edge Ti → Tj is inserted in the wait-for graph. This edge is removed
only when Tj is no longer holding a data item needed by Ti.
The system is in a deadlock state if and only if the wait-for
graph has a cycle. A deadlock-detection algorithm must be invoked
periodically to look for cycles, as sketched below.
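
Illustrative cycle detection over a wait-for graph represented as a dict
{Ti: set of Tj that Ti waits for}; a depth-first search looks for a back
edge to a vertex on the current path.

def has_cycle(wait_for):
    visited, on_path = set(), set()

    def dfs(t):
        visited.add(t)
        on_path.add(t)
        for u in wait_for.get(t, ()):
            if u in on_path or (u not in visited and dfs(u)):
                return True
        on_path.discard(t)
        return False

    return any(dfs(t) for t in list(wait_for) if t not in visited)

assert not has_cycle({'T17': {'T18'}, 'T18': {'T20'}})
assert has_cycle({'T17': {'T18'}, 'T18': {'T17'}})    # deadlock
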
Deadlock Detection (Cont.)

(Figure: a wait-for graph without a cycle, and a wait-for graph with
a cycle)
Deadlock Recovery

When deadlock is detected:
□ Some transaction will have to be rolled back (made a victim) to
break the deadlock. Select as victim the transaction that will
incur minimum cost.
□ Rollback -- determine how far to roll back the transaction
  Total rollback: Abort the transaction and then restart it.
  It is more effective to roll back the transaction only as far as
necessary to break the deadlock.
□ Starvation happens if the same transaction is always chosen as
victim. Include the number of rollbacks in the cost factor to
avoid starvation.
******END******
Multiple Granularity

Allow data items to be of various sizes and define a hierarchy
of data granularities, where the small granularities are nested
within larger ones.
 Can be represented graphically as a tree (but don't confuse with
tree-locking protocol)
When a transaction locks a node in the tree explicitly, it implicitly
locks all the node's descendants in the same mode.
 Granularity of locking (level in tree where locking is done):
□ fine granularity (lower in tree): high concurrency, high locking
overhead
□ coarse granularity (higher in tree): low locking overhead, low
concurrency
Example of Granularity Hierarchy

The levels, starting from the coarsest (top) level, are:
□ database
□ area
□ file
□ record
Intention Lock Modes

In addition to S and X lock modes, there are three additional
lock modes with multiple granularity:
□ intention-shared (IS): indicates explicit locking at a lower level
of the tree but only with shared locks.
□ intention-exclusive (IX): indicates explicit locking at a lower
level with exclusive or shared locks
□ shared and intention-exclusive (SIX): the subtree rooted by that
node is locked explicitly in shared mode and explicit locking is
being done at a lower level with exclusive-mode locks.
Intention locks allow a higher-level node to be locked in S or X
mode without having to check all descendant nodes.
Compatibility Matrix with
Intention Lock Modes

The compatibility matrix for all lock modes is:

          IS     IX     S      SIX    X
   IS     true   true   true   true   false
   IX     true   true   false  false  false
   S      true   false  true   false  false
   SIX    true   false  false  false  false
   X      false  false  false  false  false
Multiple Granularity Locking Scheme

Transaction Ti can lock a node Q using the following rules (a small
sketch of rules 3 and 4 follows the list):

1. The lock compatibility matrix must be observed.
2. The root of the tree must be locked first, and may be locked in
any mode.
3. A node Q can be locked by Ti in S or IS mode only if the parent
of Q is currently locked by Ti in either IX or IS mode.
4. A node Q can be locked by Ti in X, SIX, or IX mode only if the
parent of Q is currently locked by Ti in either IX or SIX mode.
5. Ti can lock a node only if it has not previously unlocked any
node (that is, Ti is two-phase).
6. Ti can unlock a node Q only if none of the children of Q are
currently locked by Ti.
 Observe that locks are acquired in root-to-leaf order, whereas they
are released in leaf-to-root order.
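
Illustrative check of rules 3 and 4: the modes in which Ti must hold the
parent before locking node Q. The names are ours, not the slides'.

REQUIRED_PARENT_MODES = {
    'IS':  {'IS', 'IX'},          # rule 3
    'S':   {'IS', 'IX'},          # rule 3
    'IX':  {'IX', 'SIX'},         # rule 4
    'SIX': {'IX', 'SIX'},         # rule 4
    'X':   {'IX', 'SIX'},         # rule 4
}

def parent_mode_ok(requested_mode, parent_mode_held):
    return parent_mode_held in REQUIRED_PARENT_MODES[requested_mode]

assert parent_mode_ok('S', 'IS')
assert not parent_mode_ok('X', 'IS')   # parent must be IX or SIX for X
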
******END******
Timestamp-Based Protocols

Each transaction is issued a timestamp when it enters the system.
If an old transaction Ti has timestamp TS(Ti), a new transaction Tj
is assigned timestamp TS(Tj) such that TS(Ti) < TS(Tj).
The protocol manages concurrent execution such that the
timestamps determine the serializability order.
In order to assure such behavior, the protocol maintains for each
data item Q two timestamp values:
□ W-timestamp(Q) is the largest timestamp of any transaction
that executed write(Q) successfully.
□ R-timestamp(Q) is the largest timestamp of any transaction
that executed read(Q) successfully.
Timestamp-Based Protocols (Cont.)

The timestamp-ordering protocol ensures that any conflicting
read and write operations are executed in timestamp order.
Suppose a transaction Ti issues a read(Q):
1. If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of
Q that was already overwritten.
   Hence, the read operation is rejected, and Ti is rolled
back.
2. If TS(Ti) ≥ W-timestamp(Q), then the read operation is
executed, and R-timestamp(Q) is set to max(R-timestamp(Q),
TS(Ti)).
Timestamp-Based Protocols (Cont.)

Suppose that transaction Ti issues write(Q):
1. If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is
producing was needed previously, and the system assumed
that that value would never be produced.
 Hence, the write operation is rejected, and Ti is rolled
back.
2. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an
obsolete value of Q.
 Hence, this write operation is rejected, and Ti is rolled
back.
3. Otherwise, the write operation is executed, and W-
timestamp(Q) is set to TS(Ti).
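
Both rule sets fit in a few lines; this is an illustrative sketch, with
each item carrying its (R-timestamp, W-timestamp) pair and False meaning
"roll the transaction back".

class TimestampedItem:
    def __init__(self, value=None):
        self.value = value
        self.r_ts = 0     # R-timestamp(Q)
        self.w_ts = 0     # W-timestamp(Q)

def ts_read(ts, item):
    if ts < item.w_ts:
        return False                  # value already overwritten: reject
    item.r_ts = max(item.r_ts, ts)
    return True

def ts_write(ts, item, value):
    if ts < item.r_ts or ts < item.w_ts:
        return False                  # write is too late: reject
    item.value, item.w_ts = value, ts
    return True
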
Example Use of the Protocol

A partial schedule for several data items for transactions with
timestamps 1, 2, 3, 4, 5:

T1 T2 T3 T4 T5
read(X)
read(Y)
read(Y)
write(Y)
write(Z)
read(Z)
read(X)
abort
read(X)
write(Z)
abort
write(Y)
write(Z)
Correctness of Timestamp-Ordering Protocol

The timestamp-ordering protocol guarantees serializability, since
all the arcs in the precedence graph are of the form:

    (transaction with smaller timestamp) → (transaction with larger timestamp)

Thus, there will be no cycles in the precedence graph.
The timestamp protocol ensures freedom from deadlock, as no
transaction ever waits.
 But the schedule may not be cascade-free, and may not even
be recoverable.
Recoverability and Cascade Freedom

Problem with the timestamp-ordering protocol:
□ Suppose Ti aborts, but Tj has read a data item written by Ti.
□ Then Tj must abort; if Tj had been allowed to commit earlier, the
schedule is not recoverable.
□ Further, any transaction that has read a data item written by Tj must
abort.
□ This can lead to cascading rollback --- that is, a chain of rollbacks.
 Solution 1:
□ A transaction is structured such that its writes are all performed at the
end of its processing
□ All writes of a transaction form an atomic action; no transaction may
execute while a transaction is being written
□ A transaction that aborts is restarted with a new timestamp

Solution 2: Limited form of locking: wait for data to be committed before
reading it.
Solution 3: Use commit dependencies to ensure recoverability.
Thomas’ Write Rule

A modified version of the timestamp-ordering protocol in which obsolete
write operations may be ignored under certain circumstances.
When Ti attempts to write data item Q, if TS(Ti) < W-timestamp(Q),
then Ti is attempting to write an obsolete value of Q.
□ Rather than rolling back Ti as the timestamp-ordering protocol
would have done, this write operation can be ignored.
Otherwise this protocol is the same as the timestamp-ordering
protocol.
 Thomas' Write Rule allows greater potential concurrency.
□ Allows some view-serializable schedules that are not conflict-
serializable.
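
Relative to the ts_write sketch above, only one branch changes
(illustrative):

def thomas_write(ts, item, value):
    if ts < item.r_ts:
        return False          # a later reader needed this value: reject
    if ts < item.w_ts:
        return True           # obsolete write: ignore it, no rollback
    item.value, item.w_ts = value, ts
    return True
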
******END******
Validation-Based Protocol

Execution of transaction Ti is done in three phases:
1. Read and execution phase: Transaction Ti writes only to
temporary local variables
2. Validation phase: Transaction Ti performs a "validation test"
to determine if its local variables can be written without violating
serializability.
3. Write phase: If Ti is validated, the updates are applied to the
database; otherwise, Ti is rolled back.
The three phases of concurrently executing transactions can be
interleaved, but each transaction must go through the three phases in
that order.
□ Assume for simplicity that the validation and write phases occur
together, atomically and serially,
  i.e., only one transaction executes validation/write at a time.
Also called optimistic concurrency control, since the transaction
executes fully in the hope that all will go well during validation.
Validation-Based Protocol (Cont.)

Each transaction Ti has 3 timestamps:
□ Start(Ti): the time when Ti started its execution
□ Validation(Ti): the time when Ti entered its validation phase
□ Finish(Ti): the time when Ti finished its write phase
The serializability order is determined by the timestamp given at
validation time, to increase concurrency.
□ Thus TS(Ti) is given the value of Validation(Ti).
This protocol is useful and gives a greater degree of concurrency
if the probability of conflicts is low,
□ because the serializability order is not pre-decided, and
□ relatively few transactions will have to be rolled back.

Validation Test for Transaction Tj

If, for all Ti with TS(Ti) < TS(Tj), one of the following conditions
holds:
□ finish(Ti) < start(Tj)

□ start(Tj) < finish(Ti) < validation(Tj) and the set of data items
written by Ti does not intersect with the set of data items read by
Tj.
then validation succeeds and Tj can be committed. Otherwise,
validation fails and Tj is aborted.
Justification: Either the first condition is satisfied, and there is no
overlapped execution, or the second condition is satisfied and
□ the writes of Tj do not affect reads of Ti, since they occur after Ti
has finished its reads;
□ the writes of Ti do not affect reads of Tj, since Tj does not read
any item written by Ti.
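
The test translates directly into code; an illustrative sketch, where
start/validation/finish and read_set/write_set are hypothetical fields of
a transaction record.

from collections import namedtuple

Txn = namedtuple('Txn', 'start validation finish read_set write_set')

def validate(tj, older_transactions):
    for ti in older_transactions:        # all Ti with TS(Ti) < TS(Tj)
        if ti.finish < tj.start:
            continue                     # condition 1: no overlap
        if ti.finish < tj.validation and not (ti.write_set & tj.read_set):
            continue                     # condition 2 holds
        return False                     # neither condition: abort Tj
    return True

t1 = Txn(start=1, validation=2, finish=3, read_set={'A'}, write_set={'B'})
t2 = Txn(start=4, validation=5, finish=6, read_set={'B'}, write_set={'C'})
assert validate(t2, [t1])    # t1 finished before t2 started: condition 1
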
Schedule Produced by Validation

Example of a schedule produced using validation:

    T14                  T15
    read(B)
                         read(B)
                         B := B - 50
                         read(A)
                         A := A + 50
    read(A)
    (validate)
    display(A+B)
                         (validate)
                         write(B)
                         write(A)
Multiversion Schemes

Multiversion schemes keep old versions of data items to increase
concurrency:
□ Multiversion Timestamp Ordering
□ Multiversion Two-Phase Locking
Each successful write results in the creation of a new version of
the data item written.
Timestamps are used to label versions.
When a read(Q) operation is issued, select an appropriate
version of Q based on the timestamp of the transaction, and
return the value of the selected version.
Reads never have to wait, as an appropriate version is returned
immediately.
Multiversion Timestamp Ordering

Each data item Q has a sequence of versions <Q1, Q2, ..., Qm>.
Each version Qk contains three data fields:
□ Content -- the value of version Qk.
□ W-timestamp(Qk) -- timestamp of the transaction that created
(wrote) version Qk.
□ R-timestamp(Qk) -- largest timestamp of a transaction that
successfully read version Qk.
When a transaction Ti creates a new version Qk of Q, Qk's
W-timestamp and R-timestamp are initialized to TS(Ti).
The R-timestamp of Qk is updated whenever a transaction Tj reads
Qk and TS(Tj) > R-timestamp(Qk).
Multiversion Timestamp Ordering (Cont)

Suppose that transaction Ti issues a read(Q) or write(Q) operation.
Let Qk denote the version of Q whose write timestamp is the largest
write timestamp less than or equal to TS(Ti).
1. If transaction Ti issues a read(Q), then the value returned is the
content of version Qk.
2. If transaction Ti issues a write(Q):
   1. if TS(Ti) < R-timestamp(Qk), then transaction Ti is rolled back;
   2. if TS(Ti) = W-timestamp(Qk), the contents of Qk are overwritten;
   3. else a new version of Q is created.
Observe that
□ reads always succeed;
□ a write by Ti is rejected if some other transaction Tj that (in the
serialization order defined by the timestamp values) should read
Ti's write has already read a version created by a transaction older
than Ti.
The protocol guarantees serializability.
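
A compact illustrative sketch, with each version held as a dict of
(value, w_ts, r_ts) and the version selected as described above.

def latest_version(versions, ts):
    # version with the largest write timestamp <= TS(Ti)
    return max((v for v in versions if v['w_ts'] <= ts),
               key=lambda v: v['w_ts'])

def mv_read(versions, ts):
    qk = latest_version(versions, ts)
    qk['r_ts'] = max(qk['r_ts'], ts)     # reads always succeed
    return qk['value']

def mv_write(versions, ts, value):
    qk = latest_version(versions, ts)
    if ts < qk['r_ts']:
        return False                     # a later reader saw Qk: roll back
    if ts == qk['w_ts']:
        qk['value'] = value              # overwrite own version
    else:
        versions.append({'value': value, 'w_ts': ts, 'r_ts': ts})
    return True

versions = [{'value': 100, 'w_ts': 0, 'r_ts': 0}]
assert mv_read(versions, 1) == 100        # sets the R-timestamp to 1
assert not mv_write(versions, 0, 99)      # a reader with ts 1 saw this version
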
Multiversion Two-Phase Locking

Differentiates between read-only transactions and update
transactions.
 Update transactions acquire read and write locks, and hold all locks
up to the end of the transaction. That is, update transactions follow
rigorous two-phase locking.
□ Each successful write results in the creation of a new version of
the data item written.
□ each version of a data item has a single timestamp whose value
is obtained from a counter ts-counter that is incremented during
commit processing.
 Read-only transactions are assigned a timestamp by reading the
current value of ts-counter before they start execution; they follow
the multiversion timestamp-ordering protocol for performing reads.
Multiversion Two-Phase Locking (Cont.)

When an update transaction wants to read a data item:
□ it obtains a shared lock on it, and reads the latest version.
When it wants to write an item:
□ it obtains an X lock on it; it then creates a new version of the item
and sets this version's timestamp to ∞.
 When update transaction Ti completes, commit processing occurs:
□ Ti sets timestamp on the versions it has created to ts-counter + 1

□ Ti increments ts-counter by 1

Read-only transactions that start after Ti increments ts-counter will
see the values updated by Ti.
 Read-only transactions that start before Ti increments the
ts-counter will see the value before the updates by Ti.
 Only serializable schedules are produced.
MVCC (Multiversion Concurrency Control):
Implementation Issues

Creation of multiple versions increases storage overhead:
□ Extra tuples

□ Extra space in each tuple for storing version information

Versions can, however, be garbage collected:
□ E.g., if Q has two versions Q5 and Q9, and the oldest active
transaction has timestamp > 9, then Q5 will never be
required again.
UNIT – V (Part - 3)

RECOVERY SYSTEM
Failure classification
Storage structure
Recovery and atomicity
Recovery algorithm
Log-based recovery
Shadow paging
Recovery with concurrent transactions
Buffer management
Failure with loss of non-volatile storage
Early lock release and logical undo operations
Failure Classification

Transaction failure:
□ Logical errors: transaction cannot complete due to some internal error
condition
□ System errors: the database system must terminate an active transaction
due to an error condition (e.g., deadlock)
 System crash: a power failure or other hardware or software failure
causes the system to crash.
□ Fail-stop assumption: non-volatile storage contents are assumed to not
be corrupted by system crash
 Database systems have numerous integrity checks to prevent
corruption of disk data
 Disk failure: a head crash or similar disk failure destroys all or part of disk
storage
□ Destruction is assumed to be detectable: disk drives use checksums to
detect failures
Recovery Algorithms

Recovery algorithms are techniques to ensure database
consistency and transaction atomicity and durability despite
failures.
□ Focus of this chapter

Recovery algorithms have two parts:
1. Actions taken during normal transaction processing to ensure
enough information exists to recover from failures
2. Actions taken after a failure to recover the database
contents to a state that ensures atomicity, consistency and
durability
Storage Structure

 Volatile storage:
□ does not survive system crashes

□ examples: main memory, cache memory

Nonvolatile storage:
□ survives system crashes
□ examples: disk, tape, flash memory, non-volatile
(battery-backed-up) RAM
Stable storage:
□ a mythical form of storage that survives all failures
□ approximated by maintaining multiple copies on distinct
nonvolatile media
Stable-Storage Implementation

Maintain multiple copies of each block on separate disks:
□ copies can be at remote sites to protect against disasters such as fire
or flooding.
 Failure during data transfer can still result in inconsistent copies: Block
transfer can result in
□ Successful completion

□ Partial failure: destination block has incorrect information

□ Total failure: destination block was never updated

 Protecting storage media from failure during data transfer (one solution):
□ Execute output operation as follows (assuming two copies of each
block):
1. Write the information onto the first physical block.
2. When the first write successfully completes, write the same
information onto the second physical block.
3. The output is completed only after the second write successfully
completes.
Stable-Storage Implementation (Cont.)

Protecting storage media from failure during data transfer (cont.):
 Copies of a block may differ due to failure during output operation. To
recover from failure:
1. First find inconsistent blocks:
   1. Expensive solution: Compare the two copies of every disk block.
   2. Better solution:
 Record in-progress disk writes on non-volatile storage (Non-
volatile RAM or special area of disk).
 Use this information during recovery to find blocks that may
be inconsistent, and only compare copies of these.
 Used in hardware RAID systems
2. If either copy of an inconsistent block is detected to have an error
(bad checksum), overwrite it by the other copy. If both have no error,
but are different, overwrite the second block by the first block.
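
An illustrative sketch of the sequential two-copy output rule from the
previous slide; the directory names stand in for the two physical disks.

import os

def stable_write(block_no, data, disks=('disk1', 'disk2')):
    for d in disks:                    # strictly one copy after the other
        os.makedirs(d, exist_ok=True)
        with open(os.path.join(d, 'block%d' % block_no), 'wb') as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())       # this copy must reach the medium
                                       # before the next write begins
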
Block Storage Operations

(Figure: block movement between disk and main-memory buffer via
input and output operations)
Data Access

Physical blocks are those blocks residing on the disk.
 Buffer blocks are the blocks residing temporarily in main memory.
 Block movements between disk and main memory are initiated
through the following two operations:
□ input(B) transfers the physical block B to main memory.

□ output(B) transfers the buffer block B to the disk, and replaces


the appropriate physical block there.
 Each transaction Ti has its private work-area in which local copies of
all data items accessed and updated by it are kept.
□ Ti's local copy of a data item X is called xi.

 We assume, for simplicity, that each data item fits in, and is stored
inside, a single block.
Data Access (Cont.)

Transactions transfer data items between system buffer blocks and
their private work-areas using the following operations:
□ read(X) assigns the value of data item X to the local variable xi.
□ write(X) assigns the value of local variable xi to data item X in the
buffer block.
□ both these commands may necessitate the issue of an input(BX)
instruction before the assignment, if the block BX in which X resides
is not already in memory.
 Transactions
□ Perform read(X) while accessing X for the first time;

□ All subsequent accesses are to the local copy.

□ After last access, transaction executes write(X).

 output(BX) need not immediately follow write(X). System can perform


the output operation when it deems fit.
Example of Data Access

(Figure: buffer blocks A and B in main memory, connected to disk by
input(A) and output(B); transactions T1 and T2 each have a private
work area with local copies x1, y1 and x2, exchanged with the buffer
via read(X) and write(Y).)
******END******
Recovery and Atomicity

Modifying the database without ensuring that the transaction will
commit may leave the database in an inconsistent state.
 Consider transaction Ti that transfers $50 from account A to
account B; goal is either to perform all database modifications
made by Ti or none at all.
Several output operations may be required for Ti (to output A
and B). A failure may occur after one of these modifications has
been made, but before all of them are made.
Recovery and Atomicity (Cont.)

To ensure atomicity despite failures, we first output information
describing the modifications to stable storage, without modifying
the database itself.
 We study two approaches:
□ log-based recovery, and

□ shadow-paging

We assume (initially) that transactions run serially, that is, one
after the other.
Recovery Algorithm
Logging (during normal operation):
□ <Ti start> at transaction start
□ <Ti, Xj, V1, V2> for each update, and
□ <Ti commit> at transaction end
Transaction rollback (during normal operation):
□ Let Ti be the transaction to be rolled back.
□ Scan the log backwards from the end; for each log record of Ti of
the form <Ti, Xj, V1, V2>:
  perform the undo by writing V1 to Xj,
  write a log record <Ti, Xj, V1>
    (such log records are called compensation log records)
□ Once the record <Ti start> is found, stop the scan and write the
log record <Ti abort>.
Recovery Algorithm (Cont.)
Recovery from failure has two phases:
□ Redo phase: replay updates of all transactions, whether they
committed, aborted, or are incomplete
□ Undo phase: undo all incomplete transactions
Redo phase:
1. Find the last <checkpoint L> record, and set undo-list to L.
2. Scan forward from the above <checkpoint L> record:
   1. Whenever a record <Ti, Xj, V1, V2> or <Ti, Xj, V2> is found,
redo it by writing V2 to Xj.
   2. Whenever a log record <Ti start> is found, add Ti to undo-list.
   3. Whenever a log record <Ti commit> or <Ti abort> is found,
remove Ti from undo-list.
Recovery Algorithm (Cont.)
Undo phase:
1. Scan the log backwards from the end:
   1. Whenever a log record <Ti, Xj, V1, V2> is found where Ti is in
undo-list, perform the same actions as for transaction rollback:
      1. perform the undo by writing V1 to Xj,
      2. write a log record <Ti, Xj, V1>.
   2. Whenever a log record <Ti start> is found where Ti is in undo-list:
      1. write a log record <Ti abort>,
      2. remove Ti from undo-list.
   3. Stop when undo-list is empty,
      i.e., <Ti start> has been found for every transaction in undo-list.
After the undo phase completes, normal transaction processing can
commence. A sketch of the two phases appears below.
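
An illustrative replay of the two phases over a simple in-memory log
(checkpoints omitted: this sketch scans from the beginning of the log
rather than from the last <checkpoint L> record). Record formats are ours.

def recover(log, db):
    # Redo phase: repeat history forward, building undo-list.
    undo_list = set()
    for rec in log:
        if rec[0] == 'update':            # ('update', T, X, V1, V2)
            _, t, x, v1, v2 = rec
            db[x] = v2                    # redo by writing V2
        elif rec[0] == 'compensate':      # ('compensate', T, X, V1)
            _, t, x, v1 = rec
            db[x] = v1                    # redo-only records are replayed too
        elif rec[0] == 'start':
            undo_list.add(rec[1])
        elif rec[0] in ('commit', 'abort'):
            undo_list.discard(rec[1])
    # Undo phase: scan backwards, rolling back incomplete transactions.
    for rec in reversed(log):
        if not undo_list:
            break
        if rec[0] == 'update' and rec[1] in undo_list:
            _, t, x, v1, v2 = rec
            db[x] = v1                    # undo by restoring V1
        elif rec[0] == 'start' and rec[1] in undo_list:
            undo_list.discard(rec[1])     # a <Ti abort> record would be logged

db = {'A': 1000, 'B': 2000}
recover([('start', 'T0'), ('update', 'T0', 'A', 1000, 950), ('commit', 'T0'),
         ('start', 'T1'), ('update', 'T1', 'B', 2000, 2050)], db)
assert db == {'A': 950, 'B': 2000}        # T0 redone, T1 undone
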
Example of Recovery

(Figure: an example log, showing the actions of the redo and undo passes)

Log-Based Recovery

A log is kept on stable storage.
□ The log is a sequence of log records, and maintains a record of
update activities on the database.
□ Each log record is given a unique id, called the log sequence
number (LSN).
A log record is written for each of the following actions:
1. Updating a page
2. Commit
3. Abort
4. End
5. Undoing an update

Every log record contains the following fields:

  PrevLSN | TransId | Type | PageID | Length | Offset | Before Image | After Image
Log-Based Recovery Cont..

When transaction Ti starts, it registers itself by writing a
<Ti start> log record.
 Before Ti executes write(X), a log record <Ti, X, V1, V2> is written,
where V1 is the value of X before the write, and V2 is the value to be
written to X.
□ The log record notes that Ti has performed a write on data item Xj;
Xj had value V1 before the write, and will have value V2 after the
write.
When Ti finishes its last statement, the log record <Ti commit> is
written.
 We assume for now that log records are written directly to stable
storage (that is, they are not buffered)
Two approaches using logs:
□ Deferred database modification
□ Immediate database modification

Deferred Database Modification

The deferred database modification scheme records all
modifications to the log, but defers all the writes until after partial
commit.
 Assume that transactions execute serially
 Transaction starts by writing <Ti start> record to log.
 A write(X) operation results in a log record <Ti, X, V> being
written, where V is the new value for X
□ Note: old value is not needed for this scheme

The write is not performed on X at this time, but is deferred.
When Ti partially commits, <Ti commit> is written to the log.
Finally, the log records are read and used to actually execute
the previously deferred writes.
Deferred Database Modification (Cont.)

During recovery after a crash, a transaction needs to be redone if and
only if both <Ti start> and <Ti commit> are there in the log.
 Redoing a transaction Ti ( redoTi) sets the value of all data items
updated by the transaction to the new values.
 Crashes can occur while
□ the transaction is executing the original updates, or

□ while recovery action is being taken

Example transactions T0 and T1 (T0 executes before T1):

    T0: read(A)              T1: read(C)
        A := A - 50              C := C - 100
        write(A)                 write(C)
        read(B)
        B := B + 50
        write(B)
Deferred Database Modification (Cont.)

Below we show the log as it appears at three instants of time.

(Figure: log contents at instants (a), (b), and (c))

If the log on stable storage at the time of crash is as in case:


(a) No redo actions need to be taken
(b) redo(T0) must be performed since <T0 commit> is present
(c) redo(T0) must be performed, followed by redo(T1), since
<T0 commit> and <T1 commit> are present
Immediate Database Modification

The immediate database modification scheme allows database
updates of an uncommitted transaction to be made as the writes are
issued.
□ since undoing may be needed, update logs must have both old
value and new value
 Update log record must be written before database item is written
□ We assume that the log record is output directly to stable storage

□ Can be extended to postpone log record output, so long as, prior
to execution of an output(B) operation for a data block B, all log
records corresponding to items in B are flushed to stable storage.
 Output of updated blocks can take place at any time before or after
transaction commit
 Order in which blocks are output can be different from the order in
which they are written.
Immediate Database Modification Example

Log                      Write             Output

<T0 start>
<T0, A, 1000, 950>
<T0, B, 2000, 2050>
                         A = 950
                         B = 2050
<T0 commit>
<T1 start>
<T1, C, 700, 600>
                         C = 600
                                           BB, BC
<T1 commit>
                                           BA

Note: BX denotes the block containing X.
Immediate Database Modification (Cont.)

The recovery procedure has two operations instead of one:
□ undo(Ti) restores the value of all data items updated by Ti to their
old values, going backwards from the last log record for Ti
□ redo(Ti) sets the value of all data items updated by Ti to the new
values, going forward from the first log record for Ti
 Both operations must be idempotent
□ That is, even if the operation is executed multiple times the effect is
the same as if it is executed once
 Needed since operations may get re-executed during recovery
 When recovering after failure:
□ Transaction Ti needs to be undone if the log contains the record
<Ti start>, but does not contain the record <Ti commit>.
□ Transaction Ti needs to be redone if the log contains both the
record <Ti start> and the record <Ti commit>.
 Undo operations are performed first, then redo operations.
Immediate DB Modification Recovery Example

Below we show the log as it appears at three instants of time.

(Figure: log contents at instants (a), (b), and (c))

Recovery actions in each case above are:
(a) undo (T0): B is restored to 2000 and A to 1000.
(b) undo (T1) and redo (T0): C is restored to 700, and then A and B are
set to 950 and 2050 respectively.
(c) redo (T0) and redo (T1): A and B are set to 950 and 2050
respectively. Then C is set to 600
Checkpoints

Problems in the recovery procedure as discussed so far:
1. searching the entire log is time-consuming
2. we might unnecessarily redo transactions which have already
output their updates to the database

Streamline the recovery procedure by periodically performing
checkpointing:
1. Output all log records currently residing in main memory onto
stable storage.
2. Output all modified buffer blocks to the disk.
3. Write a log record <checkpoint> onto stable storage.
Checkpoints (Cont.)

During recovery we need to consider only the most recent
transaction Ti that started before the checkpoint, and transactions
that started after Ti:
1. Scan backwards from end of log to find the most recent
<checkpoint> record
2. Continue scanning backwards till a record <Ti start> is found.
3. Need only consider the part of the log following the above start
record. The earlier part of the log can be ignored during recovery,
and can be erased whenever desired.
4. For all transactions (starting from Ti or later) with no
<Ti commit>, execute undo(Ti). (Done only in case of
immediate modification.)
5. Scanning forward in the log, for all transactions starting
from Ti or later with a <Ti commit>, execute redo(Ti).
Example of Checkpoints

(Figure: timeline with a checkpoint at time Tc and a system failure at
time Tf; T1 completes before Tc, T2 and T3 commit between Tc and Tf,
and T4 is still active at Tf.)
T1 can be ignored (its updates are already output to disk due to the
checkpoint).
 T2 and T3 redone.
 T4 undone
Shadow Paging

Shadow paging is an alternative to log-based recovery; this
scheme is useful if transactions execute serially.
Idea: maintain two page tables during the lifetime of a transaction:
the current page table and the shadow page table.
Store the shadow page table in nonvolatile storage, so that the state
of the database prior to transaction execution may be recovered.
□ The shadow page table is never modified during execution.

 To start with, both the page tables are identical. Only current page
table is used for data item accesses during execution of the
transaction.
Whenever any page is about to be written for the first time:
□ A copy of this page is made onto an unused page.
□ The current page table is then made to point to the copy.
□ The update is performed on the copy.
Sample Page Table

(Figure: a page table mapping logical pages to physical pages)

Example of Shadow Paging

(Figure: shadow and current page tables after a write to page 4)
Shadow Paging (Cont.)

To commit a transaction:
1. Flush all modified pages in main memory to disk
2. Output current page table to disk
3. Make the current page table the new shadow page table, as follows:
□ keep a pointer to the shadow page table at a fixed (known)
location on disk.
□ to make the current page table the new shadow page table, simply
update the pointer to point to current page table on disk
 Once pointer to shadow page table has been written, transaction is
committed.
 No recovery is needed after a crash - new transactions can start right
away, using the shadow page table.
 Pages not pointed to from current/shadow page table should be freed
(garbage collected).
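
Step 3 is the heart of the scheme; an illustrative sketch of the commit
pointer swap, with file paths standing in for disk locations:

import os

def commit(current_page_table_path, root_pointer_path='root_ptr'):
    tmp = root_pointer_path + '.tmp'
    with open(tmp, 'w') as f:
        f.write(current_page_table_path)   # location of the new page table
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, root_pointer_path)     # atomic pointer update = commit
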
Shadow Paging (Cont.)

Advantages of shadow paging over log-based schemes:
□ no overhead of writing log records
□ recovery is trivial

Disadvantages:
□ Copying the entire page table is very expensive
 Can be reduced by using a page table structured like a B+-tree
 No need to copy entire tree, only need to copy paths in the
tree that lead to updated leaf nodes
□ Commit overhead is high even with above extension
 Need to flush every updated page, and page table
□ Data gets fragmented (related pages get separated on disk)

□ After every transaction completion, the database pages containing


old versions of modified data need to be garbage collected
□ Hard to extend algorithm to allow transactions to run concurrently
 Easier to extend log based schemes
******END******
Buffer Management (Cont.)

The database buffer can be implemented either
□ in an area of real main memory reserved for the database, or
□ in virtual memory

Implementing the buffer in reserved main memory has drawbacks:
□ Memory is partitioned beforehand between the database buffer
and applications, limiting flexibility.
□ Needs may change, and although the operating system knows best
how memory should be divided up at any time, it cannot change
the partitioning of memory.
Buffer Management (Cont.)

Database buffers are generally implemented in virtual memory in
spite of some drawbacks:
□ When operating system needs to evict a page that has been
modified, the page is written to swap space on disk.
□ When database decides to write buffer page to disk, buffer
page may be in swap space, and may have to be read from
swap space on disk and output to the database on disk,
resulting in extra I/O!
 Known as dual paging problem.
□ Ideally when OS needs to evict a page from the buffer, it
should pass control to database, which in turn should
1. Output the page to database instead of to swap space
(making sure to output log records first), if it is modified
2. Release the page from the buffer, for the OS to use
Dual paging can thus be avoided, but common operating
systems do not support such functionality.
******END******
Failure with Loss of Nonvolatile Storage

So far we assumed no loss of non-volatile storage.
 Technique similar to checkpointing used to deal with loss of non-
volatile storage
□ Periodically dump the entire content of the database to
stable storage
□ No transaction may be active during the dump procedure; a
procedure similar to checkpointing must take place
 Output all log records currently residing in main memory
onto stable storage.
 Output all buffer blocks onto the disk.
 Copy the contents of the database to stable storage.
 Output a record <dump> to log on stable storage.

******END******
Recovery with Early Lock Release and Logical Undo Operations
Recovery with Early Lock Release
Support for high-concurrency locking techniques, such as those used
for B+-tree concurrency control, which release locks early.
Supports "logical undo".
Recovery is based on "repeating history", whereby recovery executes
exactly the same actions as normal processing.
Logical Undo Logging
Operations like B+-tree insertions and deletions release locks early.
□ They cannot be undone by restoring old values (physical undo),
since once a lock is released, other transactions may have updated
the B+-tree.
□ Instead, insertions (resp. deletions) are undone by executing a
deletion (resp. insertion) operation (known as logical undo).
For such operations, undo log records should contain the undo
operation to be executed.
□ Such logging is called logical undo logging, in contrast to physical
undo logging.
□ The operations are called logical operations.
Other examples:
□ delete of tuple, to undo insert of tuple
  allows early lock release on space-allocation information
□ subtract amount deposited, to undo deposit
  allows early lock release on the bank balance
Physical Redo
Redo information is logged physically (that is, the new value for each
write), even for operations with logical undo.
□ Logical redo is very complicated, since the database state on disk
may not be "operation consistent" when recovery starts.
□ Physical redo logging does not conflict with early lock release.
Operation Logging
Operation logging is done as follows:
1. When the operation starts, log <Ti, Oj, operation-begin>. Here Oj is
a unique identifier of the operation instance.
2. While the operation is executing, normal log records with physical
redo and physical undo information are logged.
3. When the operation completes, <Ti, Oj, operation-end, U> is logged,
where U contains the information needed to perform a logical undo.
Example: insert of a (key, record-id) pair (K5, RID7) into index I9:

<T1, O1, operation-begin>
....
<T1, X, 10, K5>        ← physical redo of steps in insert
<T1, Y, 45, RID7>
<T1, O1, operation-end, (delete I9, K5, RID7)>
Operation Logging (Cont.)

If a crash/rollback occurs before the operation completes:
□ the operation-end log record is not found, and
□ the physical undo information is used to undo the operation.
If a crash/rollback occurs after the operation completes:
□ the operation-end log record is found, and in this case
□ logical undo is performed using U; the physical undo information
for the operation is ignored.
Redo of the operation (after a crash) still uses the physical redo
information.
Transaction Rollback with Logical Undo

Rollback of transaction Ti is done as follows: scan the log backwards.
1. If a log record <Ti, X, V1, V2> is found, perform the undo and log a
compensation log record <Ti, X, V1>.
2. If a <Ti, Oj, operation-end, U> record is found:
□ Roll back the operation logically, using the undo information U.
  Updates performed during rollback are logged just like during
normal operation execution.
  At the end of the operation rollback, instead of logging an
operation-end record, generate a record <Ti, Oj, operation-abort>.
□ Skip all preceding log records for Ti until the record
<Ti, Oj, operation-begin> is found.
Transaction Rollback with Logical Undo (Cont.)

Transaction rollback, scanning the log backwards (cont.):
3. If a redo-only record is found, ignore it.
4. If a <Ti, Oj, operation-abort> record is found:
□ skip all preceding log records for Ti until the record
<Ti, Oj, operation-begin> is found.
5. Stop the scan when the record <Ti start> is found.
6. Add a <Ti abort> record to the log.
Some points to note:
□ Cases 3 and 4 above can occur only if the database crashes while
a transaction is being rolled back.
□ Skipping of log records as in case 4 is important to prevent
multiple rollbacks of the same operation.
Transaction Rollback with Logical Undo

(Figure: transaction rollback during normal operation)

Failure Recovery with Logical Undo

(Figure: failure recovery with logical undo)

Transaction Rollback: Another Example
Example with a complete and an incomplete operation:

<T1, start>
<T1, O1, operation-begin>
....
<T1, X, 10, K5>
<T1, Y, 45, RID7>
<T1, O1, operation-end, (delete I9, K5, RID7)>
<T1, O2, operation-begin>
<T1, Z, 45, 70>
                   ← T1 rollback begins here
<T1, Z, 45>        ← redo-only log record during physical undo (of incomplete O2)
<T1, Y, .., ..>    ← normal redo records for logical undo of O1
<T1, O1, operation-abort>    ← what if a crash occurred immediately after this?
<T1, abort>
Recovery Algorithm with Logical Undo

Basically the same as the earlier algorithm, except for the changes
described earlier for transaction rollback:
1. (Redo phase): Scan the log forward from the last <checkpoint L>
record till the end of the log.
   1. Repeat history by physically redoing all updates of all
transactions.
   2. Create an undo-list during the scan, as follows:
      □ undo-list is set to L initially
      □ whenever <Ti start> is found, Ti is added to undo-list
      □ whenever <Ti commit> or <Ti abort> is found, Ti is deleted
from undo-list
This brings the database to the state as of the crash, with committed
as well as uncommitted transactions having been redone.
Now undo-list contains transactions that are incomplete, that is, have
neither committed nor been fully rolled back.
Recovery with Logical Undo (Cont.)

Recovery from system crash (cont.):
2. (Undo phase): Scan the log backwards, performing undo on log
records of transactions found in undo-list.
□ Log records of transactions being rolled back are processed as
described earlier, as they are found.
  A single shared scan serves all transactions being undone.
□ When <Ti start> is found for a transaction Ti in undo-list, write a
<Ti abort> log record.
□ Stop the scan when <Ti start> records have been found for all Ti
in undo-list.
This undoes the effects of incomplete transactions (those with neither
commit nor abort log records). Recovery is now complete.
