What are Database Recovery Techniques
What are Database Recovery Techniques
Database recovery refers to the process of restoring a database to a consistent state after a failure
or corruption. It involves bringing the database back to a point where data is accurate,
transactions are committed, and system integrity is maintained. Several techniques are employed
to achieve effective database recovery, each tailored to address specific types of failures.
Types of Database Recovery Techniques
Below are some Types of Database Recovery Techniques:
Backup and Restore: One of the fundamental database recovery techniques involves
regular backups. Organizations create backup copies of their databases at predefined
intervals. In the event of data loss or corruption, the database can be restored to a
previous state using these backups. However, the effectiveness of this technique depends
on the frequency of backups and the ability to minimize the data loss between backup
intervals.
Transaction Log Processing: Transaction logs record all changes made to the database,
allowing organizations to replay or roll forward transactions in the event of a failure. This
technique ensures that committed transactions are reapplied to the database, bringing it
back to a consistent state. Transaction log processing is particularly effective for point-in-
time recovery, allowing organizations to restore databases to a specific moment in time.
Checkpoint Mechanisms: Database management systems (DBMS) often employ
checkpoint mechanisms to create a stable point in the transaction log. Checkpoints help in
minimizing the recovery time by indicating a consistent state from which the system can
quickly recover in the event of a failure. Regular checkpoints reduce the amount of
transaction log processing required during recovery.
Shadow Paging: Shadow paging is a technique where a shadow or duplicate copy of the
entire database is maintained. Changes are applied to the shadow copy, and only once the
transaction is committed, the changes are atomically swapped with the original database.
In case of a failure, the system can revert to the unaltered shadow copy, ensuring data
consistency.
Database Replication: Database replication involves maintaining duplicate copies of the
database on separate servers. In the event of a failure, traffic can be redirected to the
replica, ensuring continuity of operations. Replication is not only a recovery technique
but also enhances data availability and load balancing.
Point-in-Time Recovery: Point-in-time recovery allows organizations to restore a
database to a specific moment in time before a failure occurred. This technique is
particularly useful when data corruption is identified after it has occurred, and
organizations need to roll back the database to a state before the corruption occurred.
Failure Classification
To find that where the problem has occurred, we generalize a failure into the following
categories:
Transaction failure
System crash
Disk failure
1. Transaction failure
The transaction failure occurs when it fails to execute or when it reaches a point from where it
can't go any further. If a few transaction or process is hurt, then this is called as transaction
failure.
Logical errors: If a transaction cannot complete due to some code error or an internal error
condition, then the logical error occurs.
Syntax error: It occurs where the DBMS itself terminates an active transaction because the
database system is not able to execute it. For example, The system aborts an active transaction,
in case of deadlock or resource unavailability.
2. System Crash
System failure can occur due to power failure or other hardware or software failure. Example:
Operating system error.
Fail-stop assumption: In the system crash, non-volatile storage is assumed not to be corrupted.
3. Disk Failure
It occurs where hard-disk drives or storage drives used to fail frequently. It was a common
problem in the early days of technology evolution.
Disk failure occurs due to the formation of bad sectors, disk head crash, and unreachability to the
disk or any other failure, which destroy all or part of disk storage.
Log-Based Recovery
o The log is a sequence of records. Log of each transaction is maintained in some stable
storage so that if any failure occurs, then it can be recovered from there.
o If any operation is performed on the database, then it will be recorded in the log.
o But the process of storing the logs should be done before the actual transaction is applied
in the database.
Let's assume there is a transaction to modify the City of a student. The following logs are written
for this transaction.
o When the transaction is initiated, then it writes 'start' log.
o <Tn, Start>
o When the transaction modifies the City from 'Noida' to 'Bangalore', then another log is
written to the file.
o <Tn, City, 'Noida', 'Bangalore' >
o When the transaction is finished, then it writes another log to indicate the end of the
transaction.
o <Tn, Commit>
There are two approaches to modify the database:
1. Deferred database modification:
o The deferred modification technique occurs if the transaction does not modify the
database until it has committed.
o In this method, all the logs are created and stored in the stable storage, and the database is
updated when a transaction commits.
2. Immediate database modification:
o The Immediate modification technique occurs if database modification occurs while the
transaction is still active.
o In this technique, the database is modified immediately after every operation. It follows
an actual database modification.
Recovery using Log records
When the system is crashed, then the system consults the log to find which transactions need to
be undone and which need to be redone.
1. If the log contains the record <Ti, Start> and <Ti, Commit> or <Ti, Commit>, then the
Transaction Ti needs to be redone.
2. If log contains record<Tn, Start> but does not contain the record either <Ti, commit> or
<Ti, abort>, then the Transaction Ti needs to be undone.
Checkpoint
o The checkpoint is a type of mechanism where all the previous logs are removed from the
system and permanently stored in the storage disk.
o The checkpoint is like a bookmark. While the execution of the transaction, such
checkpoints are marked, and the transaction is executed then using the steps of the
transaction, the log files will be created.
o When it reaches to the checkpoint, then the transaction will be updated into the database,
and till that point, the entire log file will be removed from the file. Then the log file is
updated with the new step of transaction till next checkpoint and so on.
o The checkpoint is used to declare a point before which the DBMS was in the consistent
state, and all transactions were committed.
Recovery using Checkpoint
In the following manner, a recovery system recovers the database from this failure:
o The recovery system reads log files from the end to start. It reads log files from T4 to T1.
o Recovery system maintains two lists, a redo-list, and an undo-list.
o The transaction is put into redo state if the recovery system sees a log with <Tn, Start>
and <Tn, Commit> or just <Tn, Commit>. In the redo-list and their previous list, all the
transactions are removed and then redone before saving their logs.
o For example: In the log file, transaction T2 and T3 will have <Tn, Start> and <Tn,
Commit>. The T1 transaction will have only <Tn, commit> in the log file. That's why the
transaction is committed after the checkpoint is crossed. Hence it puts T1, T2 and T3
transaction into redo list.
o The transaction is put into undo state if the recovery system sees a log with <Tn, Start>
but no commit or abort log found. In the undo-list, all the transactions are undone, and
their logs are removed.
o For example: Transaction T4 will have <Tn, Start>. So T4 will be put into undo list
since this transaction is not yet complete and failed amid.
Deadlock in DBMS
A deadlock is a condition where two or more transactions are waiting indefinitely for one
another to give up locks. Deadlock is said to be one of the most feared complications in DBMS
as no task ever gets finished and is in waiting state forever.
For example: In the student table, transaction T1 holds a lock on some rows and needs to update
some rows in the grade table. Simultaneously, transaction T2 holds locks on some rows in the
grade table and needs to update the rows in the Student table held by Transaction T1.
Now, the main problem arises. Now Transaction T1 is waiting for T2 to release its lock and
similarly, transaction T2 is waiting for T1 to release its lock. All activities come to a halt state
and remain at a standstill. It will remain in a standstill until the DBMS detects the deadlock and
aborts one of the transactions.
Deadlock Avoidance
o When a database is stuck in a deadlock state, then it is better to avoid the database rather
than aborting or restating the database. This is a waste of time and resource.
o Deadlock avoidance mechanism is used to detect any deadlock situation in advance. A
method like "wait for graph" is used for detecting the deadlock situation but this method
is suitable only for the smaller database. For the larger database, deadlock prevention
method can be used.
Deadlock Detection
In a database, when a transaction waits indefinitely to obtain a lock, then the DBMS should
detect whether the transaction is involved in a deadlock or not. The lock manager maintains a
Wait for the graph to detect the deadlock cycle in the database.
Advertisement
Wait for Graph
o This is the suitable method for deadlock detection. In this method, a graph is created
based on the transaction and their lock. If the created graph has a cycle or closed loop,
then there is a deadlock.
o The wait for the graph is maintained by the system for every transaction which is waiting
for some data held by the others. The system keeps checking the graph if there is any
cycle in the graph.
The wait for a graph for the above scenario is shown below:
Deadlock Prevention
o Deadlock prevention method is suitable for a large database. If the resources are allocated
in such a way that deadlock never occurs, then the deadlock can be prevented.
o The Database management system analyzes the operations of the transaction whether
they can create a deadlock situation or not. If they do, then the DBMS never allowed that
transaction to be executed.
Wait-Die scheme
In this scheme, if a transaction requests for a resource which is already held with a conflicting
lock by another transaction then the DBMS simply checks the timestamp of both transactions. It
allows the older transaction to wait until the resource is available for execution.
Let's assume there are two transactions Ti and Tj and let TS(T) is a timestamp of any transaction
T. If T2 holds a lock by some other transaction and T1 is requesting for resources held by T2
then the following actions are performed by DBMS:
1. Check if TS(Ti) < TS(Tj) - If Ti is the older transaction and Tj has held some resource,
then Ti is allowed to wait until the data-item is available for execution. That means if the
older transaction is waiting for a resource which is locked by the younger transaction,
then the older transaction is allowed to wait for resource until it is available.
2. Check if TS(Ti) < TS(Tj) - If Ti is older transaction and has held some resource and if Tj
is waiting for it, then Tj is killed and restarted later with the random delay but with the
same timestamp.
Wound wait scheme
o In wound wait scheme, if the older transaction requests for a resource which is held by
the younger transaction, then older transaction forces younger one to kill the transaction
and release the resource. After the minute delay, the younger transaction is restarted but
with the same timestamp.
o If the older transaction has held a resource which is requested by the Younger
transaction, then the younger transaction is asked to wait until older releases it.
DBMS Concurrency Control
Concurrency Control is the management procedure that is required for controlling concurrent
execution of the operations that take place on a database.
But before knowing about concurrency control, we should know about concurrent execution.
Concurrent Execution in DBMS
o In a multi-user system, multiple users can access and use the same database at one time,
which is known as the concurrent execution of the database. It means that the same
database is executed simultaneously on a multi-user system by different users.
o While working on the database transactions, there occurs the requirement of using the
database by multiple users for performing different operations, and in that case,
concurrent execution of the database is performed.
o The thing is that the simultaneous execution that is performed should be done in an
interleaved manner, and no operation should affect the other executing operations, thus
maintaining the consistency of the database. Thus, on making the concurrent execution of
the transaction operations, there occur several challenging problems that need to be
solved.
Problems with Concurrent Execution
In a database transaction, the two main operations are READ and WRITE operations. So, there
is a need to manage these two operations in the concurrent execution of the transactions as if
these operations are not performed in an interleaved manner, and the data may become
inconsistent. So, the following problems occur with the Concurrent Execution of the operations:
Problem 1: Lost Update Problems (W - W Conflict)
The problem occurs when two different database transactions perform the read/write operations
on the same database items in an interleaved manner (i.e., concurrent execution) that makes the
values of the items incorrect hence making the database inconsistent.
Advertisement
For example:
Consider the below diagram where two transactions T X and TY, are performed on the same
account A where the balance of account A is $300.
o At time t1, transaction TX reads the value of account A, i.e., $300 (only read).
o At time t2, transaction TX deducts $50 from account A that becomes $250 (only deducted
and not updated/write).
o Alternately, at time t3, transaction TY reads the value of account A that will be $300 only
because TX didn't update the value yet.
o At time t4, transaction TY adds $100 to account A that becomes $400 (only added but not
updated/write).
o At time t6, transaction TX writes the value of account A that will be updated as $250 only,
as TY didn't update the value yet.
o Similarly, at time t7, transaction TY writes the values of account A, so it will write as
done at time t4 that will be $400. It means the value written by T X is lost, i.e., $250 is
lost.
Hence data becomes incorrect, and database sets to inconsistent.
Dirty Read Problems (W-R Conflict)
The dirty read problem occurs when one transaction updates an item of the database, and
somehow the transaction fails, and before the data gets rollback, the updated database item is
accessed by another transaction. There comes the Read-Write Conflict between both
transactions.
For example:
Consider two transactions TX and TY in the below diagram performing read/write
operations on account A where the available balance in account A is $300:
Advertisement
o At time t1, transaction TX reads the value of account A, i.e., $300.
o At time t2, transaction TX adds $50 to account A that becomes $350.
o At time t3, transaction TX writes the updated value in account A, i.e., $350.
o Then at time t4, transaction TY reads account A that will be read as $350.
o Then at time t5, transaction T X rollbacks due to server problem, and the value changes
back to $300 (as initially).
o But the value for account A remains $350 for transaction T Y as committed, which is the
dirty read and therefore known as the Dirty Read Problem.
Unrepeatable Read Problem (W-R Conflict)
Also known as Inconsistent Retrievals Problem that occurs when in a transaction, two different
values are read for the same database item.
For example:
Consider two transactions, TX and TY, performing the read/write operations on account A,
having an available balance = $300. The diagram is shown below:
o At time t1, transaction TX reads the value from account A, i.e., $300.
o At time t2, transaction TY reads the value from account A, i.e., $300.
o At time t3, transaction TY updates the value of account A by adding $100 to the available
balance, and then it becomes $400.
o At time t4, transaction TY writes the updated value, i.e., $400.
o After that, at time t5, transaction TX reads the available value of account A, and that will
be read as $400.
o It means that within the same transaction T X, it reads two different values of account A,
i.e., $ 300 initially, and after updation made by transaction T Y, it reads $400. It is an
unrepeatable read and is therefore known as the Unrepeatable read problem.
Thus, in order to maintain consistency in the database and avoid such problems that take place in
concurrent execution, management is needed, and that is where the concept of Concurrency
Control comes into role.
Concurrency Control
Concurrency Control is the working concept that is required for controlling and managing the
concurrent execution of database operations and thus avoiding the inconsistencies in the
database. Thus, for maintaining the concurrency of the database, we have the concurrency
control protocols.
Concurrency Control Protocols
The concurrency control protocols ensure the atomicity, consistency, isolation,
durability and serializability of the concurrent execution of the database transactions. Therefore,
these protocols are categorized as:
o Lock Based Concurrency Control Protocol
o Time Stamp Concurrency Control Protocol
o Validation Based Concurrency Control Protocol
Distributed Database System