Key For II Mid DBMS B
PART A
1. Answer all the questions. Each question carries 2 Marks.
5*2=10
d) Describe Lost update problem and Unrepeatable read Problem.
Lost Update Problem (W-W Conflict)
Suppose that the operations of T1 and T2 are interleaved in such a way that T2 reads the value of a data item before T1 updates it. When T2 then writes its own update of that data item to the database, the value written by T1 is overwritten by the value written by T2 and is therefore lost. This is known as the lost update problem.
Unrepeatable Read Problem (R-W Conflict)
Suppose that T1 reads the value of a data item, T2 then updates that item, and T1 later reads the same item again. T1 now sees a different value for the same item within a single transaction, so its earlier read cannot be repeated. This is known as the unrepeatable read problem.
Static Hashing
Insertion: When a record is to be entered using static hashing, the hash function h computes the bucket address for search key K, where the record will be stored: bucket address = h(K).
Search: When a record needs to be retrieved, the same hash function is used to compute the address of the bucket where the record is stored.
Delete: This is simply a search followed by a deletion operation.
Bucket Overflow
The condition of bucket overflow is known as a collision. Because a static hash function cannot change the number of buckets, overflow must be handled explicitly; overflow chaining can be used.
Overflow Chaining: When a bucket is full, a new bucket is allocated for the same hash result and is linked after the previous one. This mechanism is called closed hashing.
Linear Probing: When the hash function generates an address at which data is already stored, the next free bucket is allocated to the record. This mechanism is called open hashing.
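A minimal Python sketch of these operations, with overflow chaining on full buckets; the class name StaticHashTable, the bucket count, and the bucket capacity are assumptions for illustration, not part of any particular DBMS:

BUCKETS = 8          # fixed number of primary buckets (static hashing)
BUCKET_CAPACITY = 2  # records per bucket before overflow

class StaticHashTable:
    def __init__(self):
        # Each primary bucket is a chain of pages (lists of records).
        self.buckets = [[[]] for _ in range(BUCKETS)]

    def _address(self, key):
        # Bucket address = h(K); here h is a simple modulo hash.
        return hash(key) % BUCKETS

    def insert(self, key, record):
        pages = self.buckets[self._address(key)]
        if len(pages[-1]) >= BUCKET_CAPACITY:
            pages.append([])              # overflow chaining: link a new bucket
        pages[-1].append((key, record))

    def search(self, key):
        # The same hash function locates the bucket; then scan its chain.
        for page in self.buckets[self._address(key)]:
            for k, record in page:
                if k == key:
                    return record
        return None

    def delete(self, key):
        # Delete is simply a search followed by removal.
        for page in self.buckets[self._address(key)]:
            for entry in page:
                if entry[0] == key:
                    page.remove(entry)
                    return True
        return False

t = StaticHashTable()
t.insert("A-101", {"balance": 500})
print(t.search("A-101"))                  # {'balance': 500}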
Dynamic Hashing
The problem with static hashing is that the bucket address space does not expand or shrink as the database grows or shrinks. Dynamic hashing provides a mechanism in which data buckets are added and removed dynamically and on demand. Dynamic hashing is also known as extendible hashing. In dynamic hashing, the hash function is made to produce a large number of values, of which only a few are used initially.
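The following is a compact, illustrative Python sketch of extendible hashing: the directory doubles and a bucket splits only when it overflows, so only a few of the hash function's bits are used initially. All names and capacities are assumptions, and pathological cases (many keys sharing the same hash bits) are ignored:

BUCKET_CAPACITY = 2

class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.items = {}

class ExtendibleHash:
    def __init__(self):
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]

    def _index(self, key):
        # Use only the low global_depth bits of the hash value.
        return hash(key) & ((1 << self.global_depth) - 1)

    def insert(self, key, value):
        bucket = self.directory[self._index(key)]
        if key in bucket.items or len(bucket.items) < BUCKET_CAPACITY:
            bucket.items[key] = value
            return
        # Overflow: split the bucket, doubling the directory if needed.
        if bucket.local_depth == self.global_depth:
            self.directory += self.directory      # double the directory
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        # Slots whose new distinguishing bit is 1 now point to the sibling.
        for i, b in enumerate(self.directory):
            if b is bucket and (i >> (bucket.local_depth - 1)) & 1:
                self.directory[i] = sibling
        # Rehash the old bucket's items, then retry the insert.
        old_items, bucket.items = bucket.items, {}
        for k, v in old_items.items():
            self.directory[self._index(k)].items[k] = v
        self.insert(key, value)

    def search(self, key):
        return self.directory[self._index(key)].items.get(key)

eh = ExtendibleHash()
for k in range(10):
    eh.insert(k, "rec%d" % k)
print(eh.global_depth, eh.search(7))      # depth has grown on demand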
PART B
2) a) Describe testing of serializability.
There is a simple and efficient method for determining conflict serializability of a schedule. Consider a schedule S. We construct a directed graph, called a precedence graph, from S. This graph consists of a pair G = (V, E), where V is a set of vertices and E is a set of edges. The set of vertices consists of all the transactions participating in the schedule. The set of edges consists of all edges Ti → Tj for which one of three conditions holds:
1. Ti executes write(Q) before Tj executes read(Q).
2. Ti executes read(Q) before Tj executes write(Q).
3. Ti executes write(Q) before Tj executes write(Q).
If an edge Ti → Tj exists in the precedence graph, then, in any serial schedule S′ equivalent to S, Ti must appear before Tj.
If the precedence graph for S has a cycle, then schedule S is not conflict serializable. If the graph contains no cycles, then the schedule S is conflict serializable. A serializability order of the transactions can be obtained through topological sorting, which determines a linear order consistent with the partial order of the precedence graph. There are, in general, several possible linear orders that can be obtained through topological sorting. Thus, to test for conflict serializability, we need to construct the precedence graph and to invoke a cycle-detection algorithm. Cycle-detection algorithms, such as those based on depth-first search, require on the order of n² operations, where n is the number of vertices in the graph (that is, the number of transactions).
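A small Python sketch of this test; the (transaction, action, item) schedule format is an assumption for illustration, and the cycle detection is an ordinary depth-first search:

def precedence_graph(schedule):
    edges = set()
    for i, (ti, act_i, item_i) in enumerate(schedule):
        for tj, act_j, item_j in schedule[i + 1:]:
            if ti != tj and item_i == item_j and "write" in (act_i, act_j):
                edges.add((ti, tj))     # Ti precedes a conflicting op of Tj
    return edges

def has_cycle(vertices, edges):
    # Ordinary depth-first search with white/grey/black colouring.
    adj = {v: [w for (u, w) in edges if u == v] for v in vertices}
    state = {v: "white" for v in vertices}
    def visit(v):
        state[v] = "grey"
        for w in adj[v]:
            if state[w] == "grey" or (state[w] == "white" and visit(w)):
                return True
        state[v] = "black"
        return False
    return any(state[v] == "white" and visit(v) for v in vertices)

# T1 reads Q, T2 writes Q, T1 writes Q: edges T1->T2 and T2->T1 form a cycle.
s = [("T1", "read", "Q"), ("T2", "write", "Q"), ("T1", "write", "Q")]
print(has_cycle({"T1", "T2"}, precedence_graph(s)))   # True: not serializable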
Testing for view serializability is complicated. The problem of testing for view serializability is itself NP-complete. Thus, almost certainly there exists no efficient algorithm to test for view serializability. However, concurrency-control schemes can still use sufficient conditions for view serializability. That is, if the sufficient conditions are satisfied, the schedule is view serializable, but there may be view-serializable schedules that do not satisfy the sufficient conditions.
As part of a checkpoint, output onto stable storage all the log records currently residing in main memory. Recovery then needs to process only the log records written since the last checkpoint.
(OR)
In the tree protocol, the only lock instruction allowed is lock-X. Each transaction
Ti can lock a data item at most once, and must observe the following rules:
1. The first lock by Ti may be on any data item.
2. Subsequently, a data item Q can be locked by Ti only if the parent of Q is
currently locked by Ti.
3. Data items may be unlocked at any time.
4. A data item that has been locked and unlocked by Ti cannot subsequently be
relocked by Ti.
All schedules that are legal under the tree protocol are conflict serializable.
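A brief Python sketch that checks a sequence of lock-X/unlock requests against the four rules above; the request format and the parent map are assumptions for illustration:

def follows_tree_protocol(requests, parent):
    held = {}       # txn -> items currently locked
    ever = {}       # txn -> items locked at some point (rule 4)
    first = set()   # txns that have taken their first lock (rule 1)
    for txn, op, item in requests:
        if op == "lock-X":
            if item in ever.get(txn, set()):
                return False                  # rule 4: no relocking
            if txn in first and parent.get(item) not in held.get(txn, set()):
                return False                  # rule 2: parent must be held
            held.setdefault(txn, set()).add(item)
            ever.setdefault(txn, set()).add(item)
            first.add(txn)
        elif op == "unlock":
            held.setdefault(txn, set()).discard(item)   # rule 3: any time
    return True

parent = {"B": "A", "C": "A", "D": "B"}       # a small item tree rooted at A
print(follows_tree_protocol(
    [("T1", "lock-X", "B"), ("T1", "lock-X", "D"),
     ("T1", "unlock", "B"), ("T1", "unlock", "D")], parent))   # True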
The tree-locking protocol has an advantage over the two-phase locking protocol in that, unlike two-phase locking, it is deadlock-free, so no rollbacks are required. The tree-locking protocol has another advantage over the two-phase locking protocol in that unlocking may occur earlier. Earlier unlocking may lead to shorter waiting times, and to an increase in concurrency. However, the protocol has the disadvantage that, in some cases, a transaction may have to lock data items that it does not access.
For a set of transactions, there may be conflict-serializable schedules that cannot be obtained through the tree protocol. Indeed, there are schedules possible under the two-phase locking protocol that are not possible under the tree protocol, and vice versa.
c) ARIES.
ARIES is a recovery algorithm that is designed to work with a steal, no-force approach. When the
recovery manager is invoked after a crash, restart proceeds in three phases:
1. Analysis: Identifies dirty pages in the buffer pool and active transactions at the time of the crash.
2. Redo: Repeats all actions, starting from an appropriate point in the log, and restores the database state to what it was at the time of the crash.
3. Undo: Undoes the actions of transactions that did not commit, so that the database reflects only the actions of committed transactions.
There are three main principles behind the ARIES recovery algorithm:
Write-ahead logging: Any change to a database object is first recorded in the log; the record in the log must be
written to stable storage before the change to the database object is written to disk.
Repeating history during Redo: Upon restart following a crash, ARIES retraces all actions of the DBMS
before the crash and brings the system back to the exact state that it was in at the time of the crash. Then, it
undoes the actions of transactions that were still active at the time of the crash.
Logging changes during Undo: Changes made to the database while undoing a transaction are logged in order
to ensure that such an action is not repeated in the event of repeated restarts.
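A minimal Python sketch of the write-ahead-logging rule that ARIES relies on: before a dirty page is written to disk, the log is flushed up to that page's pageLSN. The Log and BufferManager classes here are illustrative assumptions, not ARIES data structures:

class Log:
    def __init__(self):
        self.records, self.flushed_lsn = [], -1

    def append(self, rec):
        self.records.append(rec)
        return len(self.records) - 1          # the record's LSN

    def flush(self, up_to_lsn):
        # In a real system this forces log records to stable storage.
        self.flushed_lsn = max(self.flushed_lsn, up_to_lsn)

class BufferManager:
    def __init__(self, log):
        self.log, self.pages = log, {}

    def update(self, page_id, change):
        lsn = self.log.append(("UPDATE", page_id, change))
        self.pages[page_id] = {"pageLSN": lsn, "dirty": True}

    def write_page(self, page_id):
        page = self.pages[page_id]
        # WAL: the log record of the change reaches disk before the page.
        if self.log.flushed_lsn < page["pageLSN"]:
            self.log.flush(page["pageLSN"])
        page["dirty"] = False                 # page is now safely on disk

log = Log()
bm = BufferManager(log)
bm.update("P1", "x = 5")
bm.write_page("P1")
print(log.flushed_lsn)                        # 0: log flushed before the page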
Strict Two-Phase Locking (Strict 2PL):
The most widely used locking protocol, called Strict Two-Phase Locking, or Strict 2PL, has two rules:
1. If a transaction T wants to read (respectively, write) an object, it first requests a shared (respectively, exclusive) lock on the object. A transaction that requests a lock is suspended until the DBMS is able to grant it the requested lock. The DBMS keeps track of the locks it has granted and ensures that if a transaction holds an exclusive lock on an object, no other transaction holds a shared or exclusive lock on the same object.
2. All locks held by a transaction are released when the transaction is completed.
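A minimal Python sketch of these two rules; for simplicity an incompatible request returns False instead of suspending the transaction, and all names are illustrative:

class StrictTwoPhaseLocking:
    def __init__(self):
        self.locks = {}    # object -> [mode, set of holding transactions]

    def acquire(self, txn, obj, mode):          # mode is "S" or "X"
        if obj not in self.locks:
            self.locks[obj] = [mode, {txn}]
            return True
        held_mode, holders = self.locks[obj]
        if mode == "S" and held_mode == "S":    # shared locks are compatible
            holders.add(txn)
            return True
        if holders == {txn}:                    # sole holder: upgrade allowed
            if mode == "X":
                self.locks[obj][0] = "X"
            return True
        return False        # incompatible: the requester would be suspended

    def release_all(self, txn):
        # Rule 2: locks are released only when the transaction completes.
        for obj in list(self.locks):
            self.locks[obj][1].discard(txn)
            if not self.locks[obj][1]:
                del self.locks[obj]

mgr = StrictTwoPhaseLocking()
print(mgr.acquire("T1", "A", "S"))   # True
print(mgr.acquire("T2", "A", "X"))   # False: T2 would wait for the lock
mgr.release_all("T1")                # T1 completes; its locks are released
print(mgr.acquire("T2", "A", "X"))   # True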
Multiple-Granularity Locking
Another specialized locking strategy is called multiple-granularity locking, and it allows us to efficiently set locks on objects that contain other objects. For instance, a database contains several files, a file is a collection of pages, and a page is a collection of records. A transaction that expects to access most of the pages in a file should probably set a lock on the entire file, rather than locking individual pages as and when it needs them. Doing so reduces the locking overhead considerably. On the other hand, other transactions that require access to parts of the file (even parts that are not needed by this transaction) are blocked. If a transaction accesses relatively few pages of the file, it is better to lock only those pages. Similarly, if a transaction accesses most or all records on a page, it should lock the entire page, and if it accesses just a few records, it should lock just those records.
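A Python sketch of the lock-compatibility test this strategy relies on: intention modes (IS, IX) are set on containing objects (database, file, page) before a contained object is locked in shared (S) or exclusive (X) mode. The matrix below is the standard one, with the SIX mode omitted for brevity:

COMPAT = {
    "IS": {"IS": True,  "IX": True,  "S": True,  "X": False},
    "IX": {"IS": True,  "IX": True,  "S": False, "X": False},
    "S":  {"IS": True,  "IX": False, "S": True,  "X": False},
    "X":  {"IS": False, "IX": False, "S": False, "X": False},
}

def can_grant(requested, held_modes):
    # A request is granted only if it is compatible with every held lock.
    return all(COMPAT[held][requested] for held in held_modes)

# T1 holds IX on a file (it intends to X-lock some pages inside it).
print(can_grant("IS", ["IX"]))   # True: another txn may still read a page
print(can_grant("S",  ["IX"]))   # False: a whole-file S lock must wait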
The recovery manager of a DBMS is responsible for ensuring two important properties of transactions: atomicity and durability. It ensures atomicity by undoing the actions of transactions that do not commit, and durability by making sure that all actions of committed transactions survive system crashes (e.g., a core dump caused by a bus error) and media failures (e.g., a corrupted disk).
Timestamp Ordering Protocol
The timestamp-ordering protocol ensures serializability among transactions in their conflicting read and write operations. The protocol system ensures that every conflicting pair of operations is executed according to the timestamp values of the transactions.
If a transaction Ti issues a read(X) operation:
o If TS(Ti) < W-timestamp(X): operation rejected.
o If TS(Ti) >= W-timestamp(X): operation executed.
o All data-item timestamps updated.
If a transaction Ti issues a write(X) operation:
o If TS(Ti) < R-timestamp(X): operation rejected.
o If TS(Ti) < W-timestamp(X): operation rejected and Ti rolled back.
o Otherwise, operation executed.
Thomas' Write Rule
Under basic timestamp ordering, if TS(Ti) < W-timestamp(X), the write operation is rejected and Ti is rolled back. Thomas' write rule modifies the timestamp-ordering rules to make more schedules view serializable: instead of rolling Ti back, the outdated 'write' operation itself is ignored.
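A Python sketch of these checks, including Thomas' write rule; R and W are illustrative per-item timestamp tables, and ts stands for TS(Ti):

R, W = {}, {}     # item -> largest read / write timestamp seen so far

def read(ts, item):
    if ts < W.get(item, -1):
        return "reject"                   # a younger txn already wrote item
    R[item] = max(R.get(item, -1), ts)    # update the data-item timestamp
    return "execute"

def write(ts, item, thomas=True):
    if ts < R.get(item, -1):
        return "reject"                   # a younger txn already read item
    if ts < W.get(item, -1):
        # Thomas' write rule: ignore the obsolete write instead of
        # rejecting it and rolling the transaction back.
        return "ignore" if thomas else "reject and roll back"
    W[item] = ts
    return "execute"

print(write(10, "X"))                     # execute
print(write(5, "X"))                      # ignore (obsolete under the rule)
print(read(5, "X"))                       # reject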
Detection of failure: The backup site must detect when the primary site has failed.
o To distinguish primary-site failure from link failure, maintain several communication links between the primary and the remote backup.
o Heart-beat messages.
Transfer of control:
o To take over control, the backup site first performs recovery using its copy of the database and all the log records it has received from the primary.
o Thus, completed transactions are redone and incomplete transactions are rolled back.
o When the backup site takes over processing, it becomes the new primary.
o To transfer control back to the old primary when it recovers, the old primary must receive redo logs from the old backup and apply all updates locally.
Time to recover: To reduce delay in takeover, the backup site periodically processes the redo log records (in effect, performing recovery from the previous database state), performs a checkpoint, and can then delete earlier parts of the log.
Hot-spare configuration permits very fast takeover:
o The backup continually processes redo log records as they arrive, applying the updates locally.
o When failure of the primary is detected, the backup rolls back incomplete transactions and is ready to process new transactions.
Alternative to remote backup: distributed database with replicated data
o Remote backup is faster and cheaper, but less tolerant to failure
Ensure durability of updates by delaying transaction commit until the update is logged at the backup; avoid this delay by permitting lower degrees of durability.
One-safe: commit as soon as the transaction's commit log record is written at the primary.
o Problem: updates may not arrive at the backup before it takes over.
Two-very-safe: commit when the transaction's commit log record is written at both the primary and the backup.
o Reduces availability, since transactions cannot commit if either site fails.
Two-safe: proceed as in two-very-safe if both the primary and the backup are active. If only the primary is active, the transaction commits as soon as its commit log record is written at the primary.
o Better availability than two-very-safe; avoids the problem of lost transactions in one-safe.
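A small Python sketch of the commit-point decision under the three durability levels above; the boolean flags model where the transaction's commit log record has been durably written, and all names are illustrative:

def can_commit(level, logged_at_primary, logged_at_backup, backup_alive):
    if level == "one-safe":
        return logged_at_primary
    if level == "two-very-safe":
        return logged_at_primary and logged_at_backup
    if level == "two-safe":
        if backup_alive:
            return logged_at_primary and logged_at_backup
        return logged_at_primary    # backup down: commit at the primary alone
    raise ValueError(level)

print(can_commit("one-safe", True, False, True))    # True: updates may be lost
print(can_commit("two-safe", True, False, True))    # False: wait for the backup
print(can_commit("two-safe", True, False, False))   # True: backup unavailable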
(OR)
Indexing is a data structure technique for efficiently retrieving records from database files based on the attributes on which the indexing has been done. Indexing in database systems is similar to the index we see in books.
INDEXED SEQUENTIAL ACCESS METHOD (ISAM)
The potentially large size of the index file motivates the ISAM idea: build an auxiliary index file on the index file, and so on recursively, until the final auxiliary file fits on one page. This repeated construction of a one-level index leads to the tree structure illustrated in Figure. The data entries of the ISAM index are in the leaf pages of the tree and in additional overflow pages chained to some leaf page. In addition, some systems carefully organize the layout of pages so that page boundaries correspond closely to the physical characteristics of the underlying storage device. The ISAM structure is completely static and facilitates such low-level optimizations.
B+ tree
A static structure such as the ISAM index suffers from the problem that long overflow chains can develop as
the file grows, leading to poor performance. This problem motivated the development of more flexible, dynamic
structures that adjust gracefully to inserts and deletes. The B+ tree search structure, which is widely used, is a
balanced tree in which the internal nodes direct the search and the leaf nodes contain the data entries. Since the
tree structure grows and shrinks dynamically, it is not feasible to allocate the leaf pages sequentially as in
ISAM, where the set of primary leaf pages was static. In order to retrieve all leaf pages efficiently, we have to
link them using page pointers. By organizing them into a doubly linked list, we can easily traverse the sequence
of leaf pages in either direction. This structure is illustrated in Figure.
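A short Python sketch of search through such a tree-structured index (as in ISAM or a B+ tree): internal nodes direct the search, and leaves hold the data entries and sibling links. The node layout is an assumption for illustration, not any system's actual page format:

class Node:
    def __init__(self, keys, children=None, records=None, next_leaf=None):
        self.keys = keys            # separator keys (internal) or entry keys (leaf)
        self.children = children    # child pointers, only in internal nodes
        self.records = records      # data entries, only in leaves
        self.next_leaf = next_leaf  # leaf-level sibling link

    @property
    def is_leaf(self):
        return self.children is None

def search(node, key):
    while not node.is_leaf:
        # Follow the child whose key range contains `key`.
        i = 0
        while i < len(node.keys) and key >= node.keys[i]:
            i += 1
        node = node.children[i]
    for k, rec in zip(node.keys, node.records):
        if k == key:
            return rec
    return None

leaf1 = Node(keys=[5, 10], records=["r5", "r10"])
leaf2 = Node(keys=[20, 30], records=["r20", "r30"])
leaf1.next_leaf = leaf2             # the linked sequence of leaf pages
root = Node(keys=[20], children=[leaf1, leaf2])
print(search(root, 20))             # r20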
I. Answer all the questions. Each question carries 1 Mark.
5*1=5
1) What are FD and Transitive Dependencies?
Functional dependency (FD) is a set of constraints between two sets of attributes in a relation. Functional dependency says that if two tuples have the same values for attributes A1, A2, ..., An, then those two tuples must have the same values for attributes B1, B2, ..., Bn.
Functional dependency is represented by an arrow sign (→); that is, X → Y, where X functionally determines Y. The left-hand side attributes determine the values of the attributes on the right-hand side.
Transitive Dependency
A transitive dependency is a functional dependency which holds by virtue of transitivity. A transitive dependency can occur only in a relation that has three or more attributes. Let A, B, and C designate three distinct attributes (or distinct collections of attributes) in the relation. A → C is a transitive dependency when it holds only because both A → B and B → C hold.
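A small Python sketch that checks whether a functional dependency X → Y holds over a relation represented as a list of dictionaries; the relation and attribute names are made up for the example:

def fd_holds(rows, X, Y):
    seen = {}
    for row in rows:
        lhs = tuple(row[a] for a in X)
        rhs = tuple(row[a] for a in Y)
        if lhs in seen and seen[lhs] != rhs:
            return False          # same X-values but different Y-values
        seen[lhs] = rhs
    return True

emp = [{"eid": 1, "dept": "CS", "head": "Rao"},
       {"eid": 2, "dept": "CS", "head": "Rao"}]
print(fd_holds(emp, ["eid"], ["dept"]))    # True: eid -> dept
print(fd_holds(emp, ["dept"], ["head"]))   # True: dept -> head
# eid -> dept and dept -> head together yield the transitive
# dependency eid -> head.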
2) What is a clustered index?
A clustered index is a special type of index that reorders the way records in the table are physically stored. Therefore, a table can have only one clustered index. The leaf nodes of a clustered index contain the data pages.
3) Define lock and list various types of lock modes in multi granularity locking.
A lock is a variable associated with a data item that describes the status of the item with respect to the possible operations that can be applied to it. In multiple-granularity locking, the lock modes are: shared (S), exclusive (X), intention-shared (IS), intention-exclusive (IX), and shared with intention-exclusive (SIX).
4) What is a checkpoint?
Keeping and maintaining logs in real time and in a real environment may fill all the memory space available in the system. As time passes, the log file may grow too big to be handled at all. Checkpoint is a mechanism where all the previous logs are removed from the system and stored permanently on a storage disk. A checkpoint declares a point before which the DBMS was in a consistent state and all transactions were committed.
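A tiny Python sketch of why checkpoints shorten recovery: only the log records written after the most recent checkpoint need to be scanned (a real checkpoint record would also carry the list of active transactions). The log format is illustrative:

log = [
    ("T1", "start"), ("T1", "update", "A"), ("T1", "commit"),
    ("CHECKPOINT",),
    ("T2", "start"), ("T2", "update", "B"),   # T2 still active at the crash
]

def records_to_recover(log):
    # Find the last checkpoint; only later records need to be processed.
    last_cp = max((i for i, rec in enumerate(log) if rec[0] == "CHECKPOINT"),
                  default=-1)
    return log[last_cp + 1:]

print(records_to_recover(log))
# [('T2', 'start'), ('T2', 'update', 'B')] -> only T2 needs attention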
5*1/2=2
12. ______________________________
a. log              b. table            c. block            d. statement
13. Recovery is possible if we maintain a
a. Dirty Record     b. Log file         c. W-W conflict     d. Tree            [ B ]
14. Example of dense index is
a. Ternary          b. Secondary        c. Primary          d. Clustered       [ C ]
15. Join Dependency is removed in
a. 2NF              b. Denormalization  c. 3NF              d. 5NF             [ D ]