16 Logging and Recovery in Database Systems
16.1 Introduction: Fail safe systems
16.1.1 Failure Types and failure model
16.1.2 DBS related failures
16.2 DBS Logging and Recovery principles
16.2.1 The Redo / Undo principle
16.2.2 Writing in the DB
16.2.3 Buffer management
16.2.4 Write ahead log
16.2.5 Log entry types
16.2.6 Checkpoints
16.3 Recovery
16.3.1 ReDo / UnDo
16.3.2 Recovery algorithm
Lit.: Eickler/Kemper chap. 10; Elmasri/Navathe chap. 17; Garcia-Molina/Ullman/Widom chap. 21
Introduction
• Failure Model
[Figure: failure model; states: correct operation, fault, safe state]
HS / DBS05-20-LogRecovery 3
DBS related failure model
More failure types (not discussed in detail)
• Media failure (e.g. disk crash)
⇒ Archive
• Catastrophic ("9/11") failure
– loss of system
⇒ Geographically remote standby system
Fault tolerance
Fault tolerant system
– fail safe system, survives faults of the failure model
• How to achieve a fault tolerant system?
– Redundancy
• Which data should be stored redundantly?
• When and how to save and synchronize them?
– Recovery methods
• Utilize redundancy to reconstruct a consistent state
⇒ "warm start"
– Important principle:
Make frequent operations fast
Terminology
• Log
– redundantly stored data
– Short term redundancy
– Data, operations or both
• Archive storage
– Long term storage of data
– Sometimes forced by legal regulations
• Recovery
– Algorithms for restoring a consistent DB state
after system failure using log or archival data
16.2.1 The UNDO / REDO Principle
• Do-Undo-Redo
[Figure: DO takes the old DB state to the new DB state and writes a log record; REDO uses redo data from the log file to take the old DB state to the new DB state ("roll forward")]
DBS Architecture
• When are data safe?
[Figure: DBS architecture; labels: TA programs; under control of OS or middleware]
Redo / Undo
• Why REDO?
– If changed data are written into the database at each commit
⇒ no redo
– In general, forcing data to disk at commit time is too slow
Redo / Undo
• Why UNDO?
– If no dirty data are written into the DB before commit:
⇒ no undo
[Figure: TA timeline from BOT to EOT with a marked point; TA changes must not be written to disk before this point]
16.2.2 Writing into the DB
Update-in-place
A data page is written back to its physical
address
Writing data
• Indirect write to DB
Advantage: simple undo
[Figure: indirect write; labels: page tables, files]
16.2.3 Buffer Management
• Influence of buffering
– Database buffer (cache) has very high influence on
performance
DBS Buffer
• Buffer management
– Interface:
fetch(P)       load page P into the buffer (if not already there)
pin(P)         do not allow P to be written out or deallocated
unpin(P)       release the pin on P
flush(P)       write page P to disk if it is dirty
deallocate(P)  release P's block in the buffer
– No transaction-oriented operations
• Influence on logging and recovery
– When are dirty data written back?
– Update-in-place or update elsewhere?
• Interference with transaction management
– When are committed data in the DB, when still in buffer?
– May uncommitted data be written into the DB?
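The buffer interface above can be sketched in a few lines of Python. This is a minimal illustration only, not the interface of any real DBMS: the class names `Frame` and `BufferManager`, the dict used as "disk", and the helper `write` are assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    data: bytes
    dirty: bool = False
    pin_count: int = 0


class BufferManager:
    """Toy buffer manager; `disk` is a dict standing in for the DB files (assumption)."""

    def __init__(self, disk):
        self.disk = disk      # page id -> bytes
        self.frames = {}      # page id -> Frame

    def fetch(self, p):
        # Load page p into the buffer only if it is not already there.
        if p not in self.frames:
            self.frames[p] = Frame(self.disk[p])
        return self.frames[p]

    def pin(self, p):
        self.fetch(p).pin_count += 1

    def unpin(self, p):
        self.frames[p].pin_count -= 1

    def write(self, p, data):
        # Hypothetical helper, not part of the interface above:
        # changes the page in the buffer only and marks it dirty.
        frame = self.fetch(p)
        frame.data, frame.dirty = data, True

    def flush(self, p):
        # Write the page back only if it is dirty; pinned pages are refused.
        frame = self.frames[p]
        if frame.pin_count > 0:
            raise RuntimeError(f"page {p} is pinned")
        if frame.dirty:
            self.disk[p] = frame.data
            frame.dirty = False

    def deallocate(self, p):
        if self.frames[p].pin_count > 0:
            raise RuntimeError(f"page {p} is pinned")
        del self.frames[p]    # any unflushed change is lost
```

Note that nothing in this sketch knows about transactions: whether a dirty page belongs to a committed or an open TA is invisible to the buffer manager, which is exactly where the questions above come from.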
Logging and Recovery Buffering
• Influence on recovery
– Force: Flush buffer before EOT (commit processing)
– NoForce: Buffer manager decides on writes, not TA-mgr
– NoSteal: Do not write dirty pages before EOT
– Steal: Write dirty pages at any time

          Steal                             NoSteal
Force     Undo recovery, no Redo            No recovery (!), impossible with update-in-place / immediate
NoForce   Undo recovery and Redo recovery   No Undo but Redo recovery
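The Force/Steal combinations above reduce to two independent rules, which the following sketch makes explicit. The function name `recovery_needed` is an assumption made for this example.

```python
def recovery_needed(force: bool, steal: bool) -> dict:
    """Return which kinds of recovery a crash may require under the given policies."""
    return {
        # Steal lets uncommitted changes reach the DB, so they may need undoing.
        "undo": steal,
        # NoForce lets committed changes live in the buffer only, so they may need redoing.
        "redo": not force,
    }


for force in (True, False):
    for steal in (True, False):
        print("Force" if force else "NoForce",
              "Steal" if steal else "NoSteal",
              recovery_needed(force, steal))
```

The Force/NoSteal cell yields neither undo nor redo, but as noted above this combination cannot be realized with update-in-place.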
16.3 Implementing Backup and Recovery
• Commit Processing
[Figure: commit processing sequence: commit log record in buffer → write log buffer → release locks]
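A minimal sketch of the commit sequence, assuming a log that consists of a volatile buffer and a stable part. The class `Log`, the function `commit`, and the dict-based lock table are hypothetical names for illustration; real systems differ in many details.

```python
class Log:
    def __init__(self):
        self.buffer = []   # log records still in volatile memory
        self.stable = []   # log records on stable storage (a list stands in for the log file)

    def append(self, record):
        self.buffer.append(record)

    def force(self):
        # Write the whole log buffer to stable storage and empty it.
        self.stable.extend(self.buffer)
        self.buffer.clear()


def commit(ta_id, log, locks):
    log.append(("commit", ta_id))     # 1. commit log record in buffer
    log.force()                       # 2. write log buffer
    return locks.pop(ta_id, set())    # 3. release locks, only after the record is stable


log = Log()
log.append(("update", 5, "R1"))
locks = {5: {"R1"}}
released = commit(5, log, locks)
```

Only the log is written at commit time in this sketch; the data pages themselves may stay in the buffer, which corresponds to the NoForce policy.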
Safe write
Write must be safe – under all circumstances:
• Duplex disk write
[Figure: duplex disk write; labels: Page m, Page m+1]
This demonstrates how difficult it is to guarantee fail-safe operation.
Log types
2. Physical log
– Log each page that has been changed
Undo log data : old state of page (Before image)
Redo log data: new state (After image)
Advantage:
Redo / undo processing very simple
Disadvantage:
not compatible with finer lock granularity than page
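Why redo and undo are "very simple" with a physical log can be seen in a short sketch: both are plain page overwrites. The record type `PageLogRecord` and the dict used as database are assumptions for this example.

```python
from dataclasses import dataclass


@dataclass
class PageLogRecord:
    page_id: int
    before_image: bytes   # undo data: old state of the page
    after_image: bytes    # redo data: new state of the page


def undo(db, rec):
    db[rec.page_id] = rec.before_image


def redo(db, rec):
    db[rec.page_id] = rec.after_image


db = {7: b"AAAA"}
rec = PageLogRecord(7, before_image=b"AAAA", after_image=b"ABBA")
redo(db, rec)
redo(db, rec)   # repeating a redo does no harm: the page is simply overwritten again
undo(db, rec)
```

The disadvantage is also visible: each image covers the whole page, so two TAs changing different tuples of the same page would overwrite each other's changes on undo.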
Log types
Entry log:
only those parts of a page that have been changed are logged, e.g. a tuple
• Physiological
most popular method:
– physical on page level,
– logical within page.
• Transition log
may be applied for entry and page logging
Logical / Physiological log
[Figure: logical log entry "insert into A (r)" versus physiological log entry "insert A, page 473, r"; the figure also shows A, B, C, ... and the indexes]
16.3.4 Checkpoints
• Limiting the Undo / Redo work
– Assumption: no force at commit, steal (as in most systems)
[Figure: timeline from system start through checkpoints cp 1 and cp 2; after thousands of transactions, which ones are committed and which are still open?]
Checkpoints
• Different types of checkpoints
Checkpoints signal a specific system state,
– Most simple example:
all updates forced to disk, no open transaction
– Has to be prepared before writing the checkpoint entry
– Expensive: "calming down" of the system as
assumed above is very time-consuming:
• All transactions willing to begin have to be suspended
• All running transactions have to commit or rollback
• The buffer has to be flushed (i.e. write out dirty pages)
• The checkpoint entry has to be written
• Benefit: no Redo / Undo before last checkpoint
• Time needed: minutes!
Checkpoints
Direct checkpoints
– Write all dirty buffer pages to stable storage
1. Transaction oriented checkpoints (TOC)
– Force dirty buffer pages of committing transaction
– Commit log entry is basically checkpoint
Expensive:
- hot-spot pages used by different transactions must be written for each transaction
- Good for fast recovery (no redo), bad for normal processing
[Figure: timeline between checkpoints CP n and CP n+1; the TA marked red has to be undone]
Checkpoints
2. Transaction consistent checkpoint (TCC)
• Request CP
Wait until all active TAs have committed,
Write dirty buffer pages of TAs
[Figure: timeline from "Request CP" to "CP", marking TAs to be redone and TAs to be undone]
Checkpoints
3. Action consistent checkpoint (ACC)
• Request CP
Wait until no update operation is running,
Write dirty buffer pages of TAs
Note on the slide: "... not the problem any more, but that may be an awful lot of work"
[Figure: timeline from "Request CP" to "CP"; TAs to be redone and TAs to be undone (ROLLBACK); dirty data in stable storage; new update commands suspended]
Fuzzy Checkpoints
1. Stop accepting updates
2. Scan the buffer to build a list of dirty pages
(may already exist as a write queue)
3. Make a list of active (open) transactions, each together with a pointer
to its last log entry (see below)
4. Write the checkpoint record and resume accepting updates
Checkpoints
• ... Fuzzy checkpoints
– Last checkpoint does not limit redo log any more
– Use Log sequence number (LSN):
• For each dirty buffer page, record in the page header
the LSN of the first update after the page was transferred to the buffer
(or was last flushed)
• Minimal LSN (minDirtyLSN) limits redo recovery
[Figure: log timeline with checkpoint CP, showing two pages and their updates; a marker symbol means "write to disk"]
Checkpoints
• Fuzzy Checkpoints
– may be written at any time
– No need to flush buffer pages
flushing may occur asynchronously to writing the checkpoint
– Fuzzy checkpoints contain:
• ids of running transactions
• address of last log record for each TA
• "low water mark" minDirtyLSN
where
minDirtyLSN = min { LSN1(p) : p is a dirty buffer page }
and LSN1(p) is the LSN of the first update of page p after it was
read into the buffer. The minimum is taken over all dirty buffer
pages
• Buffer status: bit vector of dirty pages
(for optimization only)
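The contents of a fuzzy checkpoint record can be sketched as follows. The function name `fuzzy_checkpoint`, the dict layout, and the sample LSNs are assumptions for this illustration; the point is only that the record is built from bookkeeping data and no page is flushed.

```python
def fuzzy_checkpoint(active_tas, dirty_pages):
    """Build a checkpoint record without flushing any page.

    active_tas:  TA id -> LSN of that TA's last log record
    dirty_pages: page id -> LSN of the first update since the page was read into the buffer
    """
    return {
        "active_tas": dict(active_tas),
        # Low water mark: redo never has to start before this LSN.
        "min_dirty_lsn": min(dirty_pages.values(), default=None),
        # Buffer status, for optimization only.
        "dirty_pages": sorted(dirty_pages),
    }


cp = fuzzy_checkpoint(active_tas={5: 118, 0: 112},
                      dirty_pages={471: 112, 480: 115})
```

With no dirty pages at all, `min_dirty_lsn` is `None` in this sketch, meaning that no redo work predates the checkpoint.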
Do / Redo processing
Do and Redo
[Figure: log with LSN 112: w0[R1] (TA 0), ..., LSN 117: w5[R3] (TA 5), LSN 118: w5[R1] (TA 5), followed by the two commits. Page 471 is shown with LSN 112 (R1-new, ..., R3-old) and with LSN 117 (R1-new, ..., R3-new). The update with LSN 118 has been performed but has not been written into the DB.]
Redo recovery
• Find the log record with minDirtyLSN.
• For subsequent log records r which require redo:
apply the update if and only if LSN(r) > LSN(page)
• Example: page 471 is read from stable storage with LSN 117
LSN 112 <= 117: no redo
LSN 117 <= 117: no redo
LSN 118 > 117: redo
After redo, page 471 carries LSN 118.
• Important property: one update is never performed more than once
during recovery ("idempotent")
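The redo rule can be tried out on the page 471 example. The sketch below is a simplified illustration, not the ARIES algorithm: the function `redo_recovery` and the tuple and dict layouts are assumptions, and the record values are taken from the figure labels.

```python
def redo_recovery(log, pages, min_dirty_lsn):
    """Replay updates and return the LSNs that were actually redone.

    log:   list of (lsn, page_id, record, new_value), ordered by LSN
    pages: page_id -> {"lsn": int, "data": dict}, as read from stable storage
    """
    redone = []
    for lsn, page_id, record, new_value in log:
        if lsn < min_dirty_lsn:
            continue                      # older updates are already in the DB
        page = pages[page_id]
        if lsn > page["lsn"]:             # update not yet reflected in the page
            page["data"][record] = new_value
            page["lsn"] = lsn             # makes a repeated pass a no-op
            redone.append(lsn)
    return redone


log = [(112, 471, "R1", "R1-new"),
       (117, 471, "R3", "R3-new"),
       (118, 471, "R1", "R1-new")]
pages = {471: {"lsn": 117, "data": {"R1": "R1-new", "R3": "R3-new"}}}
first_pass = redo_recovery(log, pages, min_dirty_lsn=112)
second_pass = redo_recovery(log, pages, min_dirty_lsn=112)
```

Here `first_pass` contains only LSN 118, and `second_pass` is empty because the page LSN was advanced, which is the property called "idempotent" above.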
Do / Redo processing
Transaction rollback
– Each page contains the LSN of the last update in the page
– System is alive. Each logged operation of this transaction has to be undone
[Figure: buffer pages with LSN=115, LSN=211, LSN=118, LSN=213 and a log record page with entries 211, 212, 213]
1. log_entry := read last_entry of aborted TA (t)
2. repeat
   { p := locate_page(log_entry.pageAdr);   // may still be in buffer
     apply(undo);
     log_entry := log_entry.previous }
   until log_entry = NIL
Undo after crash: the update may have been written to stable storage
or may still have been in the lost buffer. Modified undo:
if LSN.page >= LSN.log_entry then apply(undo)
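Both variants of undo, for a live system and after a crash, can be sketched with one function. The name `rollback`, the `after_crash` flag, and the dict layouts are assumptions for this example; the sketch also leaves out details such as logging the undo actions themselves.

```python
def rollback(last_entry, log, pages, after_crash=False):
    """Undo a TA by following its backward chain of log records.

    log:   lsn -> {"page": id, "record": key, "old": value, "prev": lsn or None}
    pages: page id -> {"lsn": int, "data": dict}
    """
    undone = []
    lsn = last_entry
    while lsn is not None:
        entry = log[lsn]
        page = pages[entry["page"]]
        # After a crash the update may never have reached stable storage;
        # undo it only if the page already reflects it.
        if not after_crash or page["lsn"] >= lsn:
            page["data"][entry["record"]] = entry["old"]
            undone.append(lsn)
        lsn = entry["prev"]
    return undone


log = {211: {"page": 1, "record": "a", "old": "a0", "prev": None},
       212: {"page": 2, "record": "b", "old": "b0", "prev": 211},
       213: {"page": 1, "record": "c", "old": "c0", "prev": 212}}

# Live system: all three updates are in the buffer pages.
live = {1: {"lsn": 213, "data": {"a": "a1", "c": "c1"}},
        2: {"lsn": 212, "data": {"b": "b1"}}}
undone_live = rollback(213, log, live)

# After a crash: only the update with LSN 211 had reached stable storage.
crashed = {1: {"lsn": 211, "data": {"a": "a1", "c": "c0"}},
           2: {"lsn": 115, "data": {"b": "b0"}}}
undone_crash = rollback(213, log, crashed, after_crash=True)
```

In the crash case only LSN 211 is undone, because the pages on stable storage never saw the updates 212 and 213.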
Reference:
C. Mohan et al.:
ARIES: A Transaction Recovery Method Supporting
Fine-Granularity Locking and Partial Rollbacks Using
Write-Ahead Logging,
ACM TODS 17(1), March 1992 (see reader)
Summary
• Fault tolerance:
– failure model is essential
– make the common case fast
• Logging and recovery in DBS
– essential for implementation of TA atomicity
– simple principles
– interference with buffer management makes solutions
complex
– naive implementations: too slow