Dbms Unit III Notes 2022-23
UNIT – III
Indexing: Types of Single Level Ordered Indexes, Multilevel Indexes, Dynamic Multilevel Indexes.
Transaction Processing and Concurrency Control: Transaction Concepts, ACID Properties,
Transaction States, Concurrency Control Problems, Serializability, Recoverability, Pessimistic and
Optimistic Concurrency Control Schemes.
COURSE OBJECTIVES:
To get familiar with data storage techniques and indexing.
To impart knowledge of transaction management, concurrency control techniques, and
recovery techniques.
COURSE OUTCOMES:
Implement storage of data, indexing, and hashing.
Apply the knowledge about transaction management, concurrency control and recovery of
database systems.
INDEXING
Indexing mechanisms are used to speed up access to desired data using a search key. For
example, the author catalog in a library.
Search Key: It is an attribute or set of attributes used to look up records in a file.
Indexes are classified based on the attributes they are defined on. Indexing can be of the following types:
Primary Index: In a sequentially ordered file, the index whose search key specifies the sequential
order of the file. It is also called a clustering index. The search key of a primary index is usually,
but not necessarily, the primary key.
Files, with a clustering index on the search key, are called index-sequential files.
Secondary Index: An index whose search key specifies an order different from the sequential
order of the file. It is also called non-clustering index.
Index files are typically much smaller than the original file. Two basic kinds of indices are:
Lecture Notes for DBMS
1. Dense Index: In a dense index, there is an index record for every search-key value in the database.
This makes searching faster but requires more space to store the index records themselves. Index records
contain a search-key value and a pointer to the actual record on the disk.
Here, the number of records in the index table is the same as the number of records in the main
table.
2. Sparse Index: In a sparse index, an index record appears for only some of the search-key values, and each
index entry points to a block. Instead of pointing to every record in the main table, the index points to records in
the main table at intervals. To search for a record, we locate the index entry with the largest search-key
value less than or equal to the target and follow its pointer to the data. If the data we are looking for is not where we
directly reach by following the index, the system performs a sequential search from there until the desired data is found.
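The sparse-index lookup just described can be sketched in Python. This is an illustrative example, not code from any real DBMS; the names `sparse_lookup`, `index_keys`, and `blocks` are hypothetical.

```python
import bisect

def sparse_lookup(index_keys, blocks, key):
    """index_keys[i] is the smallest key stored in blocks[i] (file is sorted)."""
    # Binary-search the (small) index for the last block whose first key <= key.
    i = bisect.bisect_right(index_keys, key) - 1
    if i < 0:
        return None                      # key is smaller than every stored key
    # Sequential search inside the chosen block.
    for record_key, record in blocks[i]:
        if record_key == key:
            return record
    return None

# One index entry per block -> sparse index.
blocks = [[(1, "A"), (3, "B")], [(5, "C"), (7, "D")], [(9, "E")]]
index_keys = [1, 5, 9]
print(sparse_lookup(index_keys, blocks, 7))   # finds "D" via the second block
```

Note that the index holds 3 entries for 5 records; a dense index would hold 5.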
MULTILEVEL INDEXES: Index records comprise search-key values and data pointers.
Multilevel index is stored on the disk along with the actual database files. As the size of the database
grows, so does the size of the indices. There is an immense need to keep the index records in the main
memory so as to speed up the search operations. If single-level index is used, then a large size index
cannot be kept in memory which leads to multiple disk accesses.
Multi-level Index helps in breaking down the index into several smaller indices in order
to make the outermost level so small that it can be saved in a single disk block, which can easily be
accommodated anywhere in the main memory.
B+ Tree: A B+ tree is a balanced multi-way search tree that follows a multi-level index format. The
leaf nodes of a B+ tree hold the actual data pointers. A B+ tree ensures that all leaf nodes remain at the
same height, and is thus balanced. The leaf nodes are linked together in a linked list, so a B+ tree supports
sequential access as well as random access.
The main goals of a B+ tree include:
Sorted intermediary and leaf nodes
Fast traversal and quick search
No overflow pages
Structure of B+ Tree: Every leaf node is at equal distance from the root node. A B+ tree is of
order n, where n is fixed for every B+ tree.
Internal/Intermediary Nodes: Internal (non-leaf) nodes contain only keys and pointers to child nodes; they
do not hold any data. An internal node contains at least ⌈n/2⌉ and at
most n child pointers.
Leaf Nodes: In a B+ tree, the leaf nodes hold the actual records (or pointers to them). Every leaf node contains
one block pointer to the next leaf node, forming a linked list.
Properties of B+ Tree:
The leaves are all at the same height.
The root has at least two children.
Except for the root, each node has at most n children and at least ⌈n/2⌉ children.
Each node stores at most n – 1 keys and at least ⌈n/2⌉ – 1 keys.
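The node-size bounds above can be computed directly for any order n. A small sketch (the function name `bplus_bounds` is illustrative):

```python
import math

def bplus_bounds(n):
    """Key/child bounds for a non-root internal node of a B+ tree of order n."""
    return {
        "max_children": n,
        "min_children": math.ceil(n / 2),
        "max_keys": n - 1,
        "min_keys": math.ceil(n / 2) - 1,
    }

# Order 4: at most 4 children / 3 keys, at least 2 children / 1 key.
print(bplus_bounds(4))
```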
TRANSACTION CONCEPTS: Consider a transaction Ti that transfers $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
Atomicity Requirement: If the transaction fails after step 3 and before step 6, money will be “lost”,
leading to an inconsistent database state.
The failure could be due to software or hardware.
The system should ensure that updates of a partially executed transaction are not reflected in the database.
Durability Requirement: Once the user has been notified that the transaction has completed (i.e., the
transfer of the $50 has taken place), the updates to the database by the transaction must persist even if
there are software or hardware failures.
Consistency Requirement: In the above example, the sum of A and B is unchanged by the execution of
the transaction. In general, consistency requirements include:
Explicitly specified integrity constraints, such as primary keys and foreign keys
Implicit integrity constraints
Ex: the sum of balances of all accounts, minus the sum of loan amounts, must equal the value of cash-in-hand
A transaction must see a consistent database.
During transaction execution the database may be temporarily inconsistent.
When the transaction completes successfully, the database must be consistent.
Erroneous transaction logic can lead to inconsistency.
Isolation Requirement: If between steps 3 and 6, another transaction T2 is allowed to access the
partially updated database, it will see an inconsistent database (the sum A + B will be less than it
should be).
Isolation can be ensured trivially by running transactions serially, that is, one after the other.
However, executing multiple transactions concurrently has significant benefits.
ACID PROPERTIES: A transaction is a unit of program execution that accesses and possibly
updates various data items. To preserve the integrity of data the database system must ensure:
1. Atomicity: Either all operations of the transaction are properly reflected in the database or
none are.
2. Consistency: Execution of a transaction in isolation preserves the consistency of the database.
3. Isolation: Although multiple transactions may execute concurrently, each transaction must be
unaware of other concurrently executing transactions. Intermediate transaction results must be
hidden from other concurrently executed transactions.
That is, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished
execution before Ti started, or Tj started execution after Ti finished.
4. Durability: After a transaction completes successfully, the changes it has made to the database
persist, even if there are system failures.
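Atomicity can be illustrated with a toy sketch: all updates go to a working copy and are installed only if every step succeeds, so a failed transfer leaves the database untouched. This is illustrative only; a real DBMS achieves atomicity and durability through logging and recovery, not in-memory copies.

```python
def transfer(db, src, dst, amount):
    """All-or-nothing transfer: either both updates apply, or neither does."""
    working = dict(db)            # shadow copy of the database state
    working[src] -= amount        # steps of the transaction run on the copy
    if working[src] < 0:
        raise ValueError("insufficient funds")   # transaction aborts here
    working[dst] += amount
    db.update(working)            # "commit": install all updates at once

db = {"A": 100, "B": 50}
transfer(db, "A", "B", 50)
print(db)                         # A and B updated; sum A + B is preserved
```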
TRANSACTION STATES: A transaction must be in one of the following states:
1. Active – the initial state; the transaction stays in this state while it is executing
2. Partially committed – after the final statement has been executed.
3. Failed – after the discovery that normal execution can no longer proceed.
4. Aborted – after the transaction has been rolled back and the database restored to its state prior to
the start of the transaction. Two options after it has been aborted:
Restart the transaction – can be done only if there was no internal logical error
Kill the transaction
5. Committed – after successful completion.
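The five states and their legal transitions can be written as a small transition table (an illustrative sketch; the names are from the state diagram above):

```python
# Allowed transitions of the transaction state model.
TRANSITIONS = {
    "active":              {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "failed":              {"aborted"},
    "aborted":             set(),   # terminal; a restart creates a new transaction
    "committed":           set(),   # terminal
}

def can_move(state, next_state):
    return next_state in TRANSITIONS[state]

# A transaction cannot jump straight from active to committed:
print(can_move("active", "committed"))            # it must pass through partially committed
```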
CONCURRENCY CONTROL PROBLEMS:
1. Lost Update Problem (W-W Conflict): This problem occurs when two database
transactions perform read/write operations on the same database item in an interleaved
manner (i.e., concurrent execution), so that the final value of the item is incorrect, making the
database inconsistent.
2. Dirty Read Problem (W-R Conflict): The dirty read problem occurs when one transaction
updates a database item and then fails; before the update is rolled back, the updated
item is accessed by another transaction. This creates a write-read
conflict between the two transactions.
3. Unrepeatable Read Problem (R-W Conflict): Also known as the Inconsistent Retrievals Problem,
it occurs when a transaction reads two different values for the same database item, because another
transaction updated the item between the two reads.
Thus, in order to maintain consistency in the database and avoid such problems in
concurrent execution, management is needed, and that is where the concept of Concurrency Control
comes into play.
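The lost-update anomaly can be reproduced in a few lines: two transactions read the same balance, each updates its own stale copy, and the second write overwrites the first (illustrative sketch):

```python
balance = 100

t1_local = balance        # T1: read(X)
t2_local = balance        # T2: read(X), interleaved before T1 writes back
t1_local += 50            # T1 deposits 50 into its local copy
t2_local += 30            # T2 deposits 30 into its local copy
balance = t1_local        # T1: write(X) -> 150
balance = t2_local        # T2: write(X) -> 130; T1's update is lost

print(balance)            # 130, not the correct 180
```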
CONCURRENCY CONTROL:
Concurrent Executions: Multiple transactions are allowed to run concurrently in the system.
Advantages are:
1. Increased processor and disk utilization, leading to better transaction throughput
Ex: One transaction can be using the CPU while another is reading from or writing to the disk
2. Reduced average response time for transactions: short transactions need not wait behind long
ones.
Concurrency Control Schemes: Concurrency control schemes are the mechanisms to achieve
isolation. That is, to control the interaction among the concurrent transactions in order to prevent them
from destroying the consistency of the database.
Schedules: A schedule is a sequence of instructions that specifies the chronological order in which instructions of
concurrent transactions are executed. A schedule for a set of transactions must consist of all
instructions of those transactions and must preserve the order in which the instructions appear in each
individual transaction.
A transaction that successfully completes its execution will have a commit instruction as the
last statement. By default, a transaction is assumed to execute a commit instruction as its last step. A
transaction that fails to successfully complete its execution will have an abort instruction as the last
statement.
Schedule 1: Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B.
A serial schedule in which T1 is followed by T2:
Schedule 2:
Schedule 3:
Let T1 and T2 be the transactions defined previously. The above schedule is not a serial schedule, but
it is equivalent to Schedule 1.
Schedule 4: The following concurrent schedule does not preserve the value of (A + B).
SERIALIZABILITY
Each transaction preserves database consistency. Thus serial execution of a set of transactions
preserves database consistency. A (possibly concurrent) schedule is serializable if it is equivalent to a
serial schedule. Different forms of schedule equivalence give rise to the notions of:
1. Conflict Serializability
2. View Serializability
Simplified view of transactions
We ignore operations other than read and write instructions
We assume that transactions may perform arbitrary computations on data in local buffers in
between reads and writes.
Our simplified schedules consist of only read and write instructions.
Conflict Serializability: Two instructions of different transactions conflict if they access the same data
item and at least one of them is a write. A schedule is conflict serializable if it can be transformed into
a serial schedule by swapping non-conflicting instructions.
Schedule 3 can be transformed into Schedule 6, a serial schedule where T2 follows T1, by a series of
swaps of non-conflicting instructions. Therefore, Schedule 3 is conflict serializable.
We are unable to swap instructions in the above schedule to obtain either the serial
schedule < T3, T4 >, or the serial schedule < T4, T3 >.
View Serializability: Let S and S′ be two schedules with the same set of transactions. S and S′ are
view equivalent if the following three conditions are met for each data item Q:
1. Initial Read: If, in schedule S, transaction Ti reads the initial value of Q, then in schedule S′
transaction Ti must also read the initial value of Q.
2. Updated Read: If, in schedule S, transaction Ti executes read(Q) and that value was
produced by transaction Tj, then in schedule S′ transaction Ti must also read the value of Q that
was produced by transaction Tj.
3. Final Write: The transaction that performs the final write(Q) operation in schedule S must
also perform the final write(Q) operation in schedule S′.
As can be seen, view equivalence is based purely on reads and writes alone.
Every view serializable schedule that is not conflict serializable has blind writes.
Testing for Conflict Serializability: A precedence graph has a node for each transaction and an edge
Ti → Tj if an instruction of Ti conflicts with, and precedes, an instruction of Tj. A schedule is conflict
serializable if and only if its precedence graph is acyclic.
If the precedence graph is acyclic, the serializability order can be obtained by a topological sort
of the graph. This is a linear order consistent with the partial order of the graph.
For example, a serializability order for Schedule A would be T5 → T1 → T3 → T2 → T4.
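The topological-sort step can be sketched with Python's standard `graphlib` module (3.9+). The edge set below is illustrative, chosen to match the order T5 T1 T3 T2 T4; `TopologicalSorter` takes a mapping from each node to its set of predecessors:

```python
from graphlib import TopologicalSorter, CycleError

# Predecessor map: T1 must come after T5, T3 after T1, and so on
# (i.e., edges T5 -> T1 -> T3 -> T2 -> T4 in the precedence graph).
edges = {"T1": {"T5"}, "T3": {"T1"}, "T2": {"T3"}, "T4": {"T2"}}

try:
    order = list(TopologicalSorter(edges).static_order())
    print("conflict serializable, serial order:", order)
except CycleError:
    print("cycle in precedence graph: not conflict serializable")
```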
RECOVERABILITY:
Recoverable Schedules: Recoverable schedules address the effect of transaction failures on
concurrently running transactions. A schedule is recoverable if, for each pair of transactions Ti and Tj
such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the
commit operation of Tj.
The following schedule is not recoverable if T9 commits immediately after the read.
Should T8 abort, T9 would have read (and possibly shown to the user) an inconsistent database state.
Hence, the database must ensure that schedules are recoverable.
Irrecoverable Schedule: A schedule is irrecoverable if Tj reads the updated value of Ti and Tj
commits before Ti commits.
Cascading Rollbacks: A single transaction failure leads to a series of transaction rollbacks. Consider
the following schedule where none of the transactions has yet committed (so the schedule is
recoverable)
Cascadeless schedules — cascading rollbacks cannot occur; for each pair of transactions Ti and Tj
such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the
read operation of Tj.
Every cascadeless schedule is also recoverable. It is desirable to restrict the schedules to
those that are cascadeless.
PESSIMISTIC CONCURRENCY CONTROL (Lock-Based Protocols): A lock is a mechanism to control
concurrent access to a data item. Data items can be locked in two modes: shared (S), which permits
reading, and exclusive (X), which permits both reading and writing. Lock requests are made to the
concurrency-control manager. A transaction can proceed only after the request is granted.
Lock-compatibility matrix:
        S       X
S       true    false
X       false   false
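The lock-compatibility matrix can be expressed as a small check: a request is granted only if the requested mode is compatible with every mode currently held on the item. This is an illustrative sketch of what a lock manager does, not a real implementation:

```python
# S is compatible only with S; X is compatible with nothing.
COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

def can_grant(requested, held_modes):
    """Grant `requested` only if it is compatible with every lock already held."""
    return all(COMPATIBLE[(requested, h)] for h in held_modes)

print(can_grant("S", ["S", "S"]))   # readers can coexist
print(can_grant("X", ["S"]))        # a writer must wait for readers to finish
```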
Two-Phase Locking Protocol: This protocol ensures conflict-serializable schedules by dividing each transaction into two phases:
Phase 1 (Growing Phase): The transaction may obtain locks but may not release any lock.
Phase 2 (Shrinking Phase): The transaction may release locks but may not obtain any lock.
Cascading roll-back is possible under two-phase locking. To avoid this, a modified protocol called
strict two-phase locking is followed. Here a transaction must hold all its exclusive locks till it
commits/aborts.
Rigorous two-phase locking is even stricter: here all locks are held till commit/abort. In this
protocol transactions can be serialized in the order in which they commit.
Two-phase locking does not ensure freedom from deadlocks.
Timestamp Ordering Protocol: The timestamp ordering protocol orders transactions based on their
timestamps; the serialization order is the ascending order of transaction creation.
The older transaction has the higher priority, so it executes first. To determine the
timestamp of a transaction, this protocol uses the system time. Timestamp-based protocols start
working as soon as a transaction is created.
For example, suppose transaction T1 enters the system at time 007 and transaction T2 enters the
system at time 009. T1 has the higher priority, so it executes first, as it entered the system first.
The timestamp ordering protocol also maintains the timestamps of the last 'read' and 'write' operations
on each data item.
1. Whenever a transaction Ti issues a Read(X) operation, check the following condition:
If W_TS(X) > TS(Ti), then the operation is rejected and Ti is rolled back.
If W_TS(X) <= TS(Ti), then the operation is executed and R_TS(X) is set to
max(R_TS(X), TS(Ti)).
2. Whenever a transaction Ti issues a Write(X) operation, check the following condition:
If TS(Ti) < R_TS(X), then the operation is rejected and Ti is rolled back.
If TS(Ti) < W_TS(X), then the operation is rejected and Ti is rolled back. Otherwise, the
operation is executed and W_TS(X) is set to TS(Ti).
Where, TS(Ti) denotes the timestamp of the transaction Ti.
R_TS(X) denotes the Read time-stamp of data-item X.
W_TS(X) denotes the Write time-stamp of data-item X.
The timestamp ordering protocol ensures freedom from deadlock, i.e., no transaction ever waits.
However, the schedule may not be recoverable and may not even be cascade-free.
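The read and write rules above can be sketched directly. This is illustrative; a real scheduler would also roll the rejected transaction back and restart it with a new timestamp:

```python
def check_read(ts_ti, r_ts, w_ts):
    """Return (granted, new R_TS(X)) for Read(X) by Ti."""
    if w_ts > ts_ti:                  # a younger transaction already wrote X
        return False, r_ts            # reject; Ti is rolled back
    return True, max(r_ts, ts_ti)     # execute and record the latest reader

def check_write(ts_ti, r_ts, w_ts):
    """Return (granted, new W_TS(X)) for Write(X) by Ti."""
    if ts_ti < r_ts or ts_ti < w_ts:  # a younger transaction read or wrote X
        return False, w_ts            # reject; Ti is rolled back
    return True, ts_ti                # execute and stamp X with TS(Ti)

print(check_read(7, 0, 9))    # rejected: W_TS(X) = 9 > TS(Ti) = 7
print(check_write(9, 7, 5))   # granted: W_TS(X) becomes 9
```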
OPTIMISTIC CONCURRENCY CONTROL: Optimistic (validation-based) schemes assume that conflicts
are rare. Each transaction executes in three phases: a read phase, in which it works on local copies of
the data; a validation phase, in which the system checks that committing the transaction will not
violate serializability; and a write phase, in which the updates are applied to the database if validation
succeeds.
Advantages:
In an optimistic approach, transaction rollback is easy when conflicts occur, because updates exist
only in local copies.
In an optimistic approach, there are no cascading rollbacks, because each transaction works only on its
local copy of the data and not on the database itself.
Disadvantages:
Using an optimistic approach for concurrency control can be expensive, because a transaction that
fails validation must be rolled back and all its work repeated.
If there is a conflict between large and small transactions, the large transactions tend to be rolled back
repeatedly, as they are involved in more conflicts.
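A simplified version of the validation test can be sketched as follows. This is a rough illustration of the idea, not the full three-condition validation test: a later transaction Tj passes validation against an earlier-validated Ti if Ti finished its write phase before Tj started, or if Ti's write set is disjoint from Tj's read set.

```python
def validate(tj_read_set, tj_start, ti_write_set, ti_finish):
    """Simplified validation of Tj against an earlier-validated Ti."""
    if ti_finish < tj_start:               # Ti fully finished before Tj began
        return True
    # Otherwise Ti's writes must not touch anything Tj read.
    return not (set(ti_write_set) & set(tj_read_set))

print(validate({"A", "B"}, 10, {"C"}, 12))   # disjoint sets: Tj validates
print(validate({"A"}, 10, {"A"}, 12))        # overlap: Tj must be restarted
```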