Unit 9 (DBMS)

The document discusses crash recovery in database management systems, emphasizing the importance of maintaining atomicity and durability during failures. It outlines various types of failures, storage structures, and recovery methods, including deferred and immediate database modifications, as well as the use of logs and checkpoints. Additionally, it addresses concurrency control and factors affecting COMMIT operation performance.
Unit 9 Crash Recovery

✓ A computer system, like any other device, is subject to failure from a variety of causes: disk crash, power outage, software error, fire, etc. In any failure, information may be lost.
✓ Therefore, the database system must take actions in advance to ensure that the atomicity and durability properties of transactions are preserved.
✓ An integral part of a database system is a recovery scheme that can restore the database to the consistent state that existed before the failure.

Failure classification
There are various types of failure that may occur in a system, each of which needs to be dealt with in a different manner. Some major types of failure are as follows:

Transaction failure
There are two types of errors that may cause a transaction to fail:
➢ Logical error. The transaction can no longer continue with its normal execution because of some internal condition, such as bad input, data not found, overflow, or resource limit exceeded.
➢ System error. The system has entered an undesirable state (for example, deadlock), as a result of which a transaction cannot continue with its normal execution. The transaction, however, can be re-executed at a later time.

System crash
➢ A power failure or other hardware or software failure causes the system to crash.
➢ Fail-stop assumption: non-volatile storage contents are assumed to not be corrupted by a system crash.
➢ Database systems have numerous integrity checks to prevent corruption of disk data.

Disk failure
➢ A disk block loses its content as a result of either a head crash or a failure during a data-transfer operation.
➢ Copies of the data on other disks, or tertiary media such as DVDs or tapes, are used to recover from the failure.

Storage structure
Various data items in the database may be stored and accessed in a number of different storage media. We identify three categories of storage:
• Volatile storage
• Nonvolatile storage
• Stable storage

Unit-9 Crash Recovery (DBMS) | Compiled By: Pradip Paudel

Volatile storage
✓ Data residing in volatile storage does not survive system crashes.
✓ Examples: main memory, cache memory.

Nonvolatile storage
✓ Data residing in nonvolatile storage survives system crashes.
✓ Examples: disk, tape, flash memory, non-volatile RAM.
✓ But it may still fail, losing data.

Stable storage
✓ A mythical form of storage that survives all failures.
✓ Approximated by maintaining multiple copies on distinct nonvolatile media.

Log and log records
✓ The log is a sequence of log records, recording all the update activities in the database. Logs for each transaction are maintained in stable storage.
✓ Any operation performed on the database is recorded in the log.
✓ When transaction Ti starts, it registers itself by writing a log record <Ti, start>.
✓ Prior to performing any modification to the database, an update log record is created to reflect that modification. An update log record is represented as <Ti, X, V1, V2> and has these fields:
  • Transaction identifier (Ti): unique identifier of the transaction that performed the write operation.
  • Data item (X): unique identifier of the data item written.
  • Old value (V1): value of the data item prior to the write.
  • New value (V2): value of the data item after the write operation.
✓ When Ti finishes its last statement, the log record <Ti, commit> is written.
✓ We assume for now that log records are written directly to stable storage (that is, they are not buffered).

Database modification
The database can be modified using two approaches:
1. Deferred database modification
2. Immediate database modification

Deferred database modification
✓ The deferred database modification scheme records all modifications to the log but defers all the writes until after partial commit.
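The three kinds of log record described above can be sketched as simple tuples. This is a minimal illustration of the record layout only; the function names and tuple shapes are my own, not part of any particular DBMS:

```python
# Minimal sketch of the three log-record kinds: <Ti, start>,
# <Ti, X, V1, V2>, and <Ti, commit>. Shapes are illustrative.

def start_record(tid):
    return ("start", tid)

def update_record(tid, item, old_value, new_value):
    # <Ti, X, V1, V2>: the old value enables undo, the new value enables redo.
    return ("update", tid, item, old_value, new_value)

def commit_record(tid):
    return ("commit", tid)

log = []
log.append(start_record("T0"))                    # <T0, start>
log.append(update_record("T0", "A", 1000, 950))   # <T0, A, 1000, 950>
log.append(commit_record("T0"))                   # <T0, commit>
```

Note that the update record is written before the data item itself is modified, which is what makes both undo and redo possible later.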
✓ If the system crashes before the transaction completes its execution, or if the transaction aborts, then the information in the log is simply ignored.
✓ A transaction starts by writing a <Ti, start> record to the log.
✓ A write(X) operation results in a log record <Ti, X, V> being written, where V is the new value for X.
✓ The write is not performed on X at this time, but is deferred.
✓ When Ti partially commits, <Ti, commit> is written to the log.
✓ Finally, the log records are read and used to actually execute the previously deferred writes.

During recovery after a crash, a transaction needs to be redone if and only if both <Ti, start> and <Ti, commit> are there in the log. Redoing a transaction Ti (redo Ti) sets the value of all data items updated by the transaction to the new values.

(Figure: the log on stable storage at the time of the crash, in three cases (a), (b), and (c).) Recovery actions in each case are:
a) No redo actions need to be taken.
b) redo(T0) must be performed, since <T0, commit> is present.
c) redo(T0) must be performed, followed by redo(T1), since <T0, commit> and <T1, commit> are present.

If a transaction fails before reaching its commit point, it will have made no changes to the database anyway, so no UNDO operation is necessary. It may be necessary to REDO the effect of the operations of committed transactions from the log, because their effect may not have been recorded in the database. Therefore this scheme is also known as the no-undo algorithm.

2. Immediate database modification
✓ Allows database modification while the transaction is still active, which means all modifications performed before the transaction reaches its commit state are updated to the database.
✓ Database modifications written by active transactions are called uncommitted modifications.
✓ The update log record must be written before the database item is written.
✓ A transaction starts by writing a <Ti, start> record to the log.
✓ A write(X) operation results in the log record <Ti, X, V1, V2>, where V1 is the old value and V2 is the new value.
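The deferred-modification recovery rule can be sketched as follows. This is a simplified model, not a real DBMS: log records are plain tuples and the database is a dict, and deferred update records carry only the new value since no undo is ever needed:

```python
# Deferred-modification recovery: redo Ti iff both <Ti, start> and
# <Ti, commit> appear in the log; uncommitted transactions are ignored.

def recover_deferred(log, db):
    committed = {tid for (kind, tid, *rest) in log if kind == "commit"}
    for kind, tid, *rest in log:            # forward scan
        if kind == "update" and tid in committed:
            item, new_value = rest          # deferred logs store only the new value
            db[item] = new_value            # redo: reapply the write
    return db

log = [("start", "T0"), ("update", "T0", "A", 950), ("commit", "T0"),
       ("start", "T1"), ("update", "T1", "B", 2050)]   # T1 never committed
db = recover_deferred(log, {"A": 1000, "B": 2000})
# A is redone to 950; T1's deferred write to B is ignored, so B stays 2000
```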
✓ Since undoing may be needed, update log records must have both the old value and the new value.
✓ The write operation on X is recorded in the log on disk and is output directly to stable storage, regardless of whether the transaction commits or not.

In case of failure, the recovery procedure has two operations instead of one:
1. undo(Ti) restores the value of all data items updated by Ti to their old values, going backwards from the last log record for Ti.
2. redo(Ti) sets the value of all data items updated by Ti to the new values, going forward from the first log record for Ti.

When recovering after failure:
✓ Transaction Ti needs to be undone if the log contains the record <Ti, start> but does not contain the record <Ti, commit>.
✓ Transaction Ti needs to be redone if the log contains both the record <Ti, start> and the record <Ti, commit>.
✓ Undo operations are performed first, then redo operations.

Example
(Figure: the log as it appears at three instances of time (a), (b), and (c).) Recovery actions in each case are:
a) undo(T0): B is restored to 2000 and A to 1000.
b) undo(T1) and redo(T0): C is restored to 700, and A and B are set to 950 and 2050 respectively.
c) redo(T0) and redo(T1): A and B are set to 950 and 2050 respectively; then C is set to 600.

Undo and redo transactions using the log
Because all database modifications must be preceded by the creation of a log record, the system has available both the old value prior to the modification of the data item and the new value that is to be written for the data item. This allows the system to perform redo and undo operations as appropriate:
✓ Undo: using a log record, sets the data item specified in the log record to the old value (used for immediate database modification only).
✓ Redo: using a log record, sets the data item specified in the log record to the new value.

Checkpoint
The checkpoint is used to declare a point before which the DBMS was in a consistent state, and all transactions were committed.
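The undo-then-redo procedure for immediate modification can be sketched like this (a simplified model with tuple log records <Ti, X, V1, V2> and a dict database; it reproduces case (b) of the example, where T0 committed but T1 was still active at the crash):

```python
# Immediate-modification recovery: first undo uncommitted transactions
# (backward scan, restoring old values), then redo committed ones
# (forward scan, reapplying new values).

def recover_immediate(log, db):
    started = {tid for (kind, tid, *rest) in log if kind == "start"}
    committed = {tid for (kind, tid, *rest) in log if kind == "commit"}
    for kind, tid, *rest in reversed(log):        # undo pass
        if kind == "update" and tid in started - committed:
            item, old, new = rest
            db[item] = old                        # restore V1
    for kind, tid, *rest in log:                  # redo pass
        if kind == "update" and tid in committed:
            item, old, new = rest
            db[item] = new                        # reapply V2
    return db

log = [("start", "T0"), ("update", "T0", "A", 1000, 950),
       ("update", "T0", "B", 2000, 2050), ("commit", "T0"),
       ("start", "T1"), ("update", "T1", "C", 700, 600)]
db = recover_immediate(log, {"A": 950, "B": 2050, "C": 600})
# undo(T1) restores C to 700; redo(T0) sets A = 950 and B = 2050
```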
Use of checkpoints
When a system crash occurs, we must consult the log. In principle, we would need to search the entire log to determine which transactions to undo and redo. There are two major difficulties with this approach:
✓ The search process is time-consuming.
✓ Most of the transactions that, according to our algorithm, need to be redone have already written their updates into the database. Although redoing them will cause no harm, it will cause recovery to take longer.

To reduce these types of overhead, we introduce checkpoints.
✓ A log record of the form <checkpoint L> is used to represent a checkpoint in the log, where L is a list of transactions active at the time of the checkpoint.
✓ When a checkpoint log record is added to the log, all transactions that committed before this checkpoint have a <Ti, commit> log record before the checkpoint record.
✓ Transactions are not allowed to perform any update actions, such as writing to a buffer block or writing a log record, while a checkpoint is in progress.

During recovery we need to consider only the most recent transaction Ti that started before the checkpoint, and the transactions that started after Ti:
✓ Scan backwards from the end of the log to find the most recent <checkpoint L> record.
✓ Continue scanning backwards until a <Ti, start> record is found.
✓ Only the part of the log following the above record needs to be considered; the earlier part of the log can be ignored during recovery.

After the transaction Ti is identified, the redo and undo operations are applied to Ti and to all transactions Tj that started execution after Ti. For all such transactions:
➢ with no <Ti, commit> record, execute undo(Ti) (done only in the case of immediate database modification);
➢ with a <Ti, commit> record, execute redo(Ti).
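The backward scan for the most recent checkpoint can be sketched as follows (an illustrative model using tuple log records; the function name is my own). It returns the set of transactions that recovery must still consider, so the earlier part of the log can be ignored:

```python
# Checkpoint-based recovery scan: find the most recent <checkpoint L>
# record; only transactions active at the checkpoint or started after it
# need undo/redo processing.

def transactions_to_consider(log):
    for i in range(len(log) - 1, -1, -1):         # scan backwards
        if log[i][0] == "checkpoint":
            active = set(log[i][1])               # L: active at the checkpoint
            later = {tid for (kind, tid, *rest) in log[i + 1:]
                     if kind == "start"}          # started after the checkpoint
            return active | later
    # no checkpoint in the log: every transaction must be considered
    return {tid for (kind, tid, *rest) in log if kind == "start"}

log = [("start", "T1"), ("commit", "T1"),
       ("start", "T2"), ("checkpoint", ["T2"]),
       ("start", "T3")]
# T1 committed before the checkpoint, so it can be ignored entirely
```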
(Figure: transactions T1 to T4 running relative to a checkpoint, with a system failure occurring after the checkpoint.)
✓ T1 can be ignored (its updates were already output to disk due to the checkpoint).
✓ T2 and T3 are redone.
✓ T4 is undone.

Concurrency control and recovery
Concurrency control means that multiple transactions can be executed at the same time, producing interleaved logs. But transaction results may differ depending on the order of execution, so that order must be maintained. During recovery, it would be very difficult for the recovery system to backtrack all the logs and then start recovering. Recovery with concurrent transactions can be done in the following four ways:
1. Interaction with concurrency control
2. Transaction rollback
3. Checkpoints
4. Restart recovery

1. Interaction with concurrency control
The recovery scheme depends greatly on the concurrency-control scheme that is used. To roll back a failed transaction, we must undo the updates performed by the transaction. Suppose that a transaction T0 has to be rolled back, and a data item Q that was updated by T0 has to be restored to its old value. Using the log-based schemes for recovery, we restore the value by using the undo information in a log record. Suppose now that a second transaction T1 has performed yet another update on Q before T0 is rolled back. Then, the update performed by T1 will be lost if T0 is rolled back.

Therefore, we require that, if a transaction T has updated a data item Q, no other transaction may update the same data item until T has committed or been rolled back. We can ensure this requirement easily by using strict two-phase locking, that is, two-phase locking with exclusive locks held until the end of the transaction.

2. Transaction rollback
We roll back a failed transaction, Ti, by using the log. The system scans the log backward; for every log record of the form <Ti, Xj, V1, V2> found in the log, the system restores the data item Xj to its old value V1. Scanning of the log terminates when the log record <Ti, start> is found.
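The backward rollback scan can be sketched like this (an illustrative model with tuple log records <Ti, X, V1, V2>; backward order matters when Ti updated the same item more than once):

```python
# Transaction rollback: scan the log backward, restoring each data item
# written by Ti to its old value V1, until <Ti, start> is reached.

def rollback(log, tid, db):
    for record in reversed(log):
        kind = record[0]
        if kind == "update" and record[1] == tid:
            _, _, item, old, new = record
            db[item] = old                # restore the old value V1
        elif kind == "start" and record[1] == tid:
            break                         # <Ti, start> found: rollback done
    return db

# The pair of records from the illustration in the text:
# <Ti, A, 10, 20> followed by <Ti, A, 20, 30>
log = [("start", "Ti"),
       ("update", "Ti", "A", 10, 20),
       ("update", "Ti", "A", 20, 30)]
db = rollback(log, "Ti", {"A": 30})
# the backward scan first restores A to 20, then to 10
```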
Scanning the log backward is important, since a transaction may have updated a data item more than once. As an illustration, consider the pair of log records <Ti, A, 10, 20> and <Ti, A, 20, 30>. The log records represent a modification of data item A by Ti, followed by another modification of A by Ti. Scanning the log backward sets A correctly to 10.

3. Checkpoint
The checkpoint is used to declare the point before which the DBMS was in a consistent state, and all the transactions were committed. This reduces the amount of work during recovery. By creating checkpoints, recovery operations don't need to start from the very beginning. Instead, they can begin from the most recent checkpoint, thereby considerably speeding up the recovery process.

Since we assumed no concurrency, it was necessary to consider only the following transactions during recovery:
✓ Those transactions that started after the most recent checkpoint.
✓ The one transaction, if any, that was active at the time of the most recent checkpoint.

The situation is more complex when transactions can execute concurrently, since several transactions may have been active at the time of the most recent checkpoint. In a concurrent transaction-processing system, we require that the checkpoint log record be of the form <checkpoint L>, where L is a list of transactions active at the time of the checkpoint. Transactions are not allowed to perform any update actions, such as writing to a buffer block or writing a log record, while a checkpoint is in progress.

4. Restart recovery
When the system recovers from a crash, it constructs two lists: the undo-list consists of transactions to be undone, and the redo-list consists of transactions to be redone. The system constructs the two lists as follows:
1. Initially, they are both empty.
2. The system scans the log backward, examining each record, until it finds the first <checkpoint L> record.
3. For each record found of the form <Ti, commit>, it adds Ti to the redo-list. For each record of the form <Ti, start>, if Ti is not in the redo-list, it adds Ti to the undo-list. Finally, for each transaction Ti in the list L, if Ti is not in the redo-list, it adds Ti to the undo-list.

Factors affecting COMMIT operation performance

System replication
If system replication is active, check whether less synchronous log replication modes (SYNC -> SYNCMEM or SYNCMEM -> ASYNC) improve the performance.
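The restart-recovery list construction can be sketched as follows (an illustrative model using the same tuple log records; the checkpoint record carries the list L of transactions active at checkpoint time):

```python
# Restart recovery: scan the log backward until the most recent
# <checkpoint L> record, building the redo-list (committed transactions)
# and undo-list (started or active-at-checkpoint but not committed).

def build_lists(log):
    redo_list, undo_list = set(), set()
    for record in reversed(log):
        kind = record[0]
        if kind == "commit":
            redo_list.add(record[1])
        elif kind == "start" and record[1] not in redo_list:
            undo_list.add(record[1])
        elif kind == "checkpoint":
            # transactions in L that never committed must also be undone
            undo_list |= {tid for tid in record[1] if tid not in redo_list}
            break
    return redo_list, undo_list

log = [("start", "T1"), ("checkpoint", ["T1"]),
       ("start", "T2"), ("commit", "T2"),
       ("start", "T3")]
redo_list, undo_list = build_lists(log)
# T2 committed after the checkpoint -> redo; T1 and T3 -> undo
```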
If yes, you can analyze whether there are unnecessary replication delays, e.g. due to limited network bandwidth or high latency times.

Synchronous table replication
If a transaction performs changes on a table with activated optimistic synchronous table replication (OSTR), the replicas need to be refreshed during COMMIT. This can have an adverse impact on the COMMIT performance.

High amount of active versions
The COMMIT performance can be significantly impacted by a high amount of active versions.

Integrated liveCache
If an SAP HANA integrated liveCache is used, the SAP HANA COMMIT performance can be impacted by related COMMITs on the liveCache side. The time for the 'Kernel-Commit' method is part of the SAP HANA COMMIT time. Other related times like 'Flush-Cache', 'Commit-Invalidate-Callback' and 'Validate-Callback' have to be considered on top of the SAP HANA COMMIT time. During a liveCache COMMIT a potentially large amount of data has to be flushed, so in case of high COMMIT times you should always consider the amount of flushed data to judge whether it is more an I/O issue or a data volume issue.

SAP HANA bugs
The following SAP HANA bug can be responsible for increased COMMIT times:
Impacted Revision: 122.062,00.000
Details: Overhead in internal COMMIT processing design. If COMMITs take very long, a termination with error "snapshot timestamp synchronization failed" is possible.

High Availability Using Remote Backup System
✓ Traditional transaction-processing systems are centralized or client-server systems. Such systems are vulnerable to environmental disasters such as fire, flooding, or earthquakes.
✓ So there is a need for transaction-processing systems that can function in spite of system failures or environmental disasters, and such systems must provide high availability.
✓ We can achieve high availability by performing transaction processing at one site, called the primary site, and having a remote backup site where all the data from the primary site are replicated. The remote backup site is sometimes also called the secondary site.
✓ The remote site must be kept synchronized with the primary site, as updates are performed at the primary. We achieve synchronization by sending all log records from the primary site to the remote backup site.
✓ The remote backup site must be physically separated from the primary (for example, we can locate it in a different state) so that a disaster at the primary does not damage the remote backup site.

(Figure: architecture of a remote backup system, with the primary site shipping log records over the network to the backup site.)

✓ When the primary site fails, the remote backup site takes over processing. It performs recovery, using its copy of the data from the primary and the log records received from the primary. In effect, the remote backup site is performing recovery actions that would have been performed at the primary site when the latter recovered.
✓ Once recovery has been performed, the remote backup site starts processing transactions.

Several issues must be addressed in designing a remote backup system:

Detection of failure: It is important for the remote backup system to detect when the primary has failed. Failure of communication lines can fool the remote backup into believing that the primary has failed. To avoid this problem, we maintain several communication links with independent modes of failure between the primary and the remote backup.

Transfer of control: When the primary fails, the backup site takes over processing and becomes the new primary. When the original primary site recovers, it can either play the role of remote backup or take over the role of primary site again. In either case, the old primary must receive a log of updates carried out by the backup site while the old primary was down.
Time to recover: If the log at the remote backup grows large, recovery will take a long time. The remote backup site can periodically process the redo log records that it has received and can perform a checkpoint, so that earlier parts of the log can be deleted. A hot-spare configuration can make takeover by the backup site almost instantaneous. In this configuration, the remote backup site continually processes redo log records as they arrive, applying the updates locally. As soon as the failure of the primary is detected, the backup site completes recovery by rolling back incomplete transactions; it is then ready to process new transactions.

Time to commit: To ensure that the updates of a committed transaction are durable, a transaction must not be declared committed until its log records have reached the backup site. This delay can result in a longer wait to commit a transaction, and some systems therefore permit lower degrees of durability. The degrees of durability can be classified as follows:

✓ One-safe: A transaction commits as soon as its commit log record is written to stable storage at the primary site.
✓ Two-very-safe: A transaction commits as soon as its commit log record is written to stable storage at the primary and the backup site.
✓ Two-safe: This scheme is the same as two-very-safe if both primary and backup sites are active. If only the primary is active, the transaction is allowed to commit as soon as its commit log record is written to stable storage at the primary site.
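The three degrees of durability can be summarized as a small decision rule (an illustrative sketch; the function and parameter names are my own):

```python
# Degrees of durability: when may a transaction be declared committed,
# given where its commit log record has reached stable storage and
# whether the backup site is currently active?

def can_commit(mode, on_primary, on_backup, backup_active=True):
    if mode == "one-safe":
        return on_primary                     # primary stable storage only
    if mode == "two-very-safe":
        return on_primary and on_backup       # both sites, always
    if mode == "two-safe":
        if backup_active:
            return on_primary and on_backup   # behaves like two-very-safe
        return on_primary                     # backup down: primary suffices
    raise ValueError("unknown durability mode: " + mode)
```

For example, one-safe allows a commit before the record reaches the backup, which is faster but risks losing that commit if the primary is destroyed; two-very-safe cannot commit at all while the backup is down, which is why two-safe is the usual compromise.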
