
Lecture Notes for DBMS

UNIT – III
Indexing: Types of Single Level Ordered Indexes, Multilevel Indexes, Dynamic Multilevel Indexes.
Transaction Processing and Concurrency Control: Transaction Concepts, ACID Properties,
Transaction States, Concurrency Control Problems, Serializability, Recoverability, Pessimistic and
Optimistic Concurrency Control Schemes.

COURSE OBJECTIVES:
 To get familiar with data storage techniques and indexing.
 To impart knowledge in transaction management, concurrency control techniques and recovery techniques.
COURSE OUTCOMES:
 Implement storage of data, indexing, and hashing.
 Apply the knowledge about transaction management, concurrency control and recovery of
database systems.

INDEXING
Indexing mechanisms are used to speed up access to desired data using a search key. For example, an author catalog in a library.
Search Key: It is an attribute or set of attributes used to look up records in a file.

An index file consists of records (called index entries) of the form

Search Key | Pointer

Indexes are classified based on the attributes they are built on. Indexing can be of the following types:
 Primary Index: In a sequentially ordered file, the primary index is the index whose search key specifies the sequential order of the file. It is also called a clustering index. The search key of a primary index is usually, but not necessarily, the primary key.
 Files, with a clustering index on the search key, are called index-sequential files.

 Secondary Index: An index whose search key specifies an order different from the sequential order of the file. It is also called a non-clustering index.

Index files are typically much smaller than the original file. Two basic kinds of indices are:

1. Ordered indices: Search keys are stored in sorted order


2. Hash indices: Search keys are distributed uniformly across “buckets” using a “hash function”.


SINGLE LEVEL ORDERED INDEXES/ORDERED INDICES: In an ordered index, index entries are stored sorted on the search-key value. Ex: an author catalog in a library.

Ordered Indexing is of two types:


1. Dense Index
2. Sparse Index

1. Dense Index: In a dense index, there is an index record for every search-key value in the database. This makes searching faster but requires more space to store the index records themselves. Each index record contains a search-key value and a pointer to the actual record on the disk.
 Here, the number of records in the index table is the same as the number of records in the main table.

2. Sparse Index: An index record appears for only some of the search-key values; each index entry points to a block. Instead of pointing to every record in the main table, the index points to records in the main table at intervals. To search for a record, we first use the index to reach the closest preceding indexed location in the data. If the record is not found where the index entry leads directly, the system performs a sequential search from that point until the desired data is found.
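To make the two lookup strategies concrete, the following is a minimal Python sketch (the data, block layout, and function names are hypothetical, not from the notes) contrasting a dense index, which has one entry per search-key value, with a sparse index, which has one entry per block:

    import bisect

    # Sorted data file: each block holds a few (search_key, record) pairs.
    blocks = [
        [(10, "rec10"), (20, "rec20")],
        [(30, "rec30"), (40, "rec40")],
        [(50, "rec50"), (60, "rec60")],
    ]

    # Dense index: one entry per search-key value -> (block_no, slot).
    dense = {key: (b, s) for b, blk in enumerate(blocks)
             for s, (key, _) in enumerate(blk)}

    # Sparse index: one entry per block (the block's first key).
    sparse_keys = [blk[0][0] for blk in blocks]

    def dense_lookup(key):
        b, s = dense[key]                # direct hit: one index probe
        return blocks[b][s][1]

    def sparse_lookup(key):
        # Find the last block whose first key <= key, then scan inside it.
        b = max(bisect.bisect_right(sparse_keys, key) - 1, 0)
        for k, rec in blocks[b]:
            if k == key:
                return rec
        return None                      # a sequential search would continue here

    print(dense_lookup(40), sparse_lookup(40))   # rec40 rec40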

MULTILEVEL INDEXES: Index records comprise search-key values and data pointers. A multilevel index is stored on the disk along with the actual database files. As the size of the database grows, so does the size of the indices, and there is a strong need to keep the index records in main memory so as to speed up search operations. If a single-level index is used, a large index cannot be kept in memory, which leads to multiple disk accesses.


Multi-level Index helps in breaking down the index into several smaller indices in order
to make the outermost level so small that it can be saved in a single disk block, which can easily be
accommodated anywhere in the main memory.

DYNAMIC MULTILEVEL INDEXES: Most implementations of dynamic multilevel indexes use a variation of the B-tree known as the B+ tree.

B+ Tree: A B+ tree is a balanced multi-way search tree that follows a multilevel index format. The leaf nodes of a B+ tree hold the actual data pointers. A B+ tree ensures that all leaf nodes remain at the same height, and is thus balanced. The leaf nodes are linked together in a linked list, so a B+ tree can support random access as well as sequential access.
The main goals of a B+ tree include:
 Sorted intermediate and leaf nodes
 Fast traversal and quick search
 No overflow pages

Structure of B+ Tree: Every leaf node is at equal distance from the root node. A B+ tree is of order n, where n is fixed for every B+ tree.

Internal/Intermediate Nodes: Internal (non-leaf) nodes hold only keys and pointers that guide the search down toward the leaf nodes; they do not hold any data. An internal node contains at least ⌈n/2⌉ and at most n tree pointers.
Leaf Nodes: In a B+ tree, all leaf nodes hold the actual records (or pointers to them). Every leaf node contains one block pointer to the next leaf node, so the leaves form a linked list.

Properties of B+ Tree:
 The leaves are all at the same height.
 The root has at least two children.
 Except for the root, each node can have a maximum of n children and a minimum of ⌈n/2⌉ children.
 A maximum of n – 1 keys and a minimum of ⌈n/2⌉ – 1 keys can be stored in each node.
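A search in a B+ tree follows directly from these properties: descend from the root through internal nodes by key comparison until a leaf is reached, then scan the leaf. Below is a minimal Python sketch (a hand-built two-level tree; the field names are assumptions for illustration):

    class Node:
        def __init__(self, keys, children=None, records=None):
            self.keys = keys            # sorted search-key values
            self.children = children    # internal node: pointers to children
            self.records = records      # leaf node: data pointers/records
            self.next = None            # leaf node: link to the next leaf

    leaf1 = Node([10, 20], records=["rec10", "rec20"])
    leaf2 = Node([30, 40], records=["rec30", "rec40"])
    leaf1.next = leaf2                  # leaves form a linked list
    root = Node([30], children=[leaf1, leaf2])   # keys >= 30 go right

    def search(node, key):
        while node.children is not None:          # descend to a leaf
            i = 0
            while i < len(node.keys) and key >= node.keys[i]:
                i += 1
            node = node.children[i]
        for k, rec in zip(node.keys, node.records):
            if k == key:
                return rec
        return None

    print(search(root, 30))   # rec30
    # Range scans follow the .next links, giving sequential access.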

Insertion in B+ Tree: Refer to problems in Class notes.


TRANSACTION PROCESSING AND CONCURRENCY CONTROL

TRANSACTION CONCEPTS: A transaction is a unit of program execution that accesses and possibly updates various data items.
EX: transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
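As a concrete illustration, here is a minimal sketch using Python's built-in sqlite3 module (the table and balances are hypothetical, not part of the notes) that performs the same transfer as one atomic transaction: either both updates are committed together, or a rollback undoes the partial work.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)",
                     [("A", 100), ("B", 200)])
    conn.commit()

    try:
        conn.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
        conn.execute("UPDATE account SET balance = balance + 50 WHERE name = 'B'")
        conn.commit()        # both writes become durable together
    except Exception:
        conn.rollback()      # a failure between the steps loses nothing

    print(conn.execute("SELECT * FROM account ORDER BY name").fetchall())
    # [('A', 50), ('B', 250)] -- the sum A + B is preserved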

Two main issues to deal with:


 Failures of various kinds, such as hardware failures and system crashes
 Concurrent execution of multiple transactions

Atomicity Requirement: If the transaction fails after step 3 and before step 6, money will be “lost”, leading to an inconsistent database state.
 Failure could be due to software or hardware
 System should ensure that updates of a partially executed transaction are not reflected in database

Durability Requirement: Once the user has been notified that the transaction has completed (i.e., the
transfer of the $50 has taken place), the updates to the database by the transaction must persist even if
there are software or hardware failures.

Consistency Requirement: In the above example, the sum of A and B is unchanged by the execution of the transaction. In general, consistency requirements include:
 Explicitly specified integrity constraints such as primary keys and foreign keys
 Implicit integrity constraints
Ex: sum of balances of all accounts, minus sum of loan amounts must equal value of cash-in-hand
 A transaction must see a consistent database.
 During transaction execution the database may be temporarily inconsistent.
 When the transaction completes successfully the database must be consistent
 Erroneous transaction logic can lead to inconsistency

Isolation Requirement: If between steps 3 and 6, another transaction T2 is allowed to access the
partially updated database, it will see an inconsistent database (the sum A + B will be less than it
should be).


 Isolation can be ensured trivially by running transactions serially, that is, one after the other.
 However, executing multiple transactions concurrently has significant benefits.

ACID PROPERTIES: A transaction is a unit of program execution that accesses and possibly
updates various data items. To preserve the integrity of data the database system must ensure:

1. Atomicity: Either all operations of the transaction are properly reflected in the database or
none are.
2. Consistency: Execution of a transaction in isolation preserves the consistency of the database.
3. Isolation: Although multiple transactions may execute concurrently, each transaction must be
unaware of other concurrently executing transactions. Intermediate transaction results must be
hidden from other concurrently executed transactions.
 That is, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti started, or Tj started execution after Ti finished.
4. Durability: After a transaction completes successfully, the changes it has made to the database
persist, even if there are system failures.

Implementation of Atomicity and Durability: The recovery-management component of a database system implements the support for atomicity and durability.

TRANSACTION STATES: The following are various states of a transaction:

1. Active – the initial state; the transaction stays in this state while it is executing
2. Partially committed – after the final statement has been executed.
3. Failed -- after the discovery that normal execution can no longer proceed.
4. Aborted – after the transaction has been rolled back and the database restored to its state prior to
the start of the transaction. Two options after it has been aborted:
 restart the transaction
 can be done only if no internal logical error
 kill the transaction
5. Committed – after successful completion.


Fig: States of a Transaction
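The diagram can also be read as a transition table. The following minimal Python sketch (illustrative only, not from the notes) encodes the legal moves between the five states:

    ALLOWED = {
        "active":              {"partially committed", "failed"},
        "partially committed": {"committed", "failed"},
        "failed":              {"aborted"},
        "aborted":             set(),   # terminal: restart or kill the transaction
        "committed":           set(),   # terminal
    }

    def move(state, new_state):
        if new_state not in ALLOWED[state]:
            raise ValueError(f"illegal transition: {state} -> {new_state}")
        return new_state

    s = "active"
    s = move(s, "partially committed")
    s = move(s, "committed")
    print(s)   # committed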

CONCURRENCY CONTROL PROBLEMS:


In a database transaction, the two main operations are READ and WRITE. These operations must be managed carefully under concurrent execution, because if they are performed in an interleaved manner without control, the data may become inconsistent. The following problems occur with the concurrent execution of operations:

1. Lost Update Problems (W - W Conflict): The problem occurs when two different database
transactions perform the read/write operations on the same database items in an interleaved
manner (i.e., concurrent execution) that makes the values of the items incorrect hence making the
database inconsistent.

2. Dirty Read Problem (W-R Conflict): The dirty read problem occurs when one transaction updates an item of the database and then fails; before the update is rolled back, the updated item is accessed by another transaction. This is a Write-Read conflict between the two transactions.

3. Unrepeatable Read Problem (R-W Conflict): Also known as the Inconsistent Retrievals Problem, this occurs when, within one transaction, two different values are read for the same database item.

Thus, in order to maintain consistency in the database and avoid such problems in concurrent execution, management is needed, and that is where the concept of Concurrency Control comes into play.
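A minimal Python sketch (hypothetical values) makes the lost update problem concrete: two transactions read the same item before either one writes it back, so the first write is silently overwritten:

    x = 100                  # shared database item X

    t1_local = x             # T1: read(X)
    t2_local = x             # T2: read(X) -- interleaved before T1 writes
    t1_local -= 30           # T1: X := X - 30
    t2_local += 50           # T2: X := X + 50
    x = t1_local             # T1: write(X) -> 70
    x = t2_local             # T2: write(X) -> 150; T1's update is lost

    print(x)   # 150, but any serial execution would give 120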


CONCURRENCY CONTROL:
Concurrent Executions: Multiple transactions are allowed to run concurrently in the system.
Advantages are:
1. Increased processor and disk utilization, leading to better transaction throughput
Ex: One transaction can be using the CPU while another is reading from or writing to the disk
2. Reduced average response time for transactions: short transactions need not wait behind long
ones.

Concurrency Control Schemes: Concurrency control schemes are the mechanisms to achieve
isolation. That is, to control the interaction among the concurrent transactions in order to prevent them
from destroying the consistency of the database.

Schedules: A schedule is a sequence of instructions that specifies the chronological order in which instructions of concurrent transactions are executed. A schedule for a set of transactions must consist of all instructions of those transactions and must preserve the order in which the instructions appear in each individual transaction.
A transaction that successfully completes its execution will have a commit instruction as its last statement; by default, a transaction is assumed to execute commit as its last step. A transaction that fails to successfully complete its execution will have an abort instruction as its last statement.

Schedule 1: Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B.
A serial schedule in which T1 is followed by T2:


Schedule 2:

Schedule 3:

Let T1 and T2 be the transactions defined previously. The above schedule is not a serial schedule, but
it is equivalent to Schedule 1.

Note: In Schedules 1, 2 and 3, the sum A + B is preserved.


Schedule 4: The following concurrent schedule does not preserve the value of (A + B).

SERIALIZABILITY
Each transaction preserves database consistency. Thus serial execution of a set of transactions
preserves database consistency. A (possibly concurrent) schedule is serializable if it is equivalent to a
serial schedule. Different forms of schedule equivalence give rise to the notions of:
1. Conflict Serializability
2. View Serializability
Simplified view of transactions
 We ignore operations other than read and write instructions
 We assume that transactions may perform arbitrary computations on data in local buffers in
between reads and writes.
 Our simplified schedules consist of only read and write instructions.

Conflict Serializability: A schedule is called conflict serializable if it can be converted into a serial schedule by swapping its non-conflicting operations. If a schedule S can be transformed into a schedule S´ by a series of swaps of non-conflicting instructions, we say that S and S´ are conflict equivalent.

Conflicting Instructions: Instructions li and lj of transactions Ti and Tj respectively conflict if and only if there exists an item Q accessed by both li and lj, and at least one of these instructions wrote Q.
1. li = read(Q), lj = read(Q). li and lj don’t conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict
4. li = write(Q), lj = write(Q). They conflict


Schedule 3 can be transformed into Schedule 6, a serial schedule where T2 follows T1, by a series of swaps of non-conflicting instructions. Therefore, Schedule 3 is conflict serializable.

Example of a schedule that is not conflict serializable:

We are unable to swap instructions in the above schedule to obtain either the serial
schedule < T3, T4 >, or the serial schedule < T4, T3 >.

View Serializability: Let S and S´ be two schedules with the same set of transactions. S and S´ are
view equivalent if the following three conditions are met, for each data item Q,
1. Initial Read: In schedule S, if transaction Ti reads the initial value of Q, then in schedule S’
also transaction Ti must read the initial value of Q.
2. Updated Read: In schedule S, if transaction Ti executes read (Q), and that value was
produced by transaction Tj, then in schedule S’ also transaction Ti must read the value of Q that
was produced by transaction Tj .
3. Final Write: The transaction that performs the final write (Q) operation in schedule S must
also perform the final write (Q) operation in schedule S’.

As can be seen, view equivalence is also based purely on reads and writes.

A schedule S is view serializable if it is view equivalent to a serial schedule. Every conflict serializable schedule is also view serializable. Below is a schedule which is view serializable but not conflict serializable.


 Every view serializable schedule that is not conflict serializable has blind writes.

Testing for Serializability:
 Consider some schedule of a set of transactions T1, T2, ..., Tn.
 Precedence graph — a directed graph where the vertices are the transactions.
 We draw an arc from Ti to Tj if the two transactions conflict and Ti accessed the data item on which the conflict arose first.
 We may label the arc by the item that was accessed.
Example:

Example Schedule (Schedule A) + Precedence Graph

Test for Conflict Serializability:
A schedule is conflict serializable if and only if its precedence graph is acyclic.
 Cycle-detection algorithms exist which take order n² time, where n is the number of vertices in the graph.
 Better algorithms take order n + e, where e is the number of edges.


 If precedence graph is acyclic, the serializability order can be obtained by a topological sorting
of the graph.
 This is a linear order consistent with the partial order of the graph.
 For example, a serializability order for Schedule A would be T5 → T1 → T3 → T2 → T4.
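The whole test can be sketched in a few lines of Python. The (transaction, operation, item) encoding of a schedule is an assumption for illustration; the code builds the precedence graph from conflicting pairs and then topologically sorts it with Kahn's algorithm, returning None when a cycle makes the schedule non-serializable:

    from collections import defaultdict

    # A Schedule-3-like schedule: one (transaction, operation, item) per step.
    schedule = [("T1", "R", "A"), ("T1", "W", "A"),
                ("T2", "R", "A"), ("T2", "W", "A"),
                ("T1", "R", "B"), ("T1", "W", "B"),
                ("T2", "R", "B"), ("T2", "W", "B")]

    txns = {t for t, _, _ in schedule}
    edges = defaultdict(set)
    for i, (ti, op_i, q_i) in enumerate(schedule):
        for tj, op_j, q_j in schedule[i + 1:]:
            # Conflict: same item, different transactions, at least one write.
            if q_i == q_j and ti != tj and "W" in (op_i, op_j):
                edges[ti].add(tj)           # Ti accessed Q first: arc Ti -> Tj

    def topological_order(nodes, edges):    # Kahn's algorithm; None if cyclic
        indeg = {n: 0 for n in nodes}
        for n in edges:
            for m in edges[n]:
                indeg[m] += 1
        order = []
        ready = [n for n in nodes if indeg[n] == 0]
        while ready:
            n = ready.pop()
            order.append(n)
            for m in edges.get(n, ()):
                indeg[m] -= 1
                if indeg[m] == 0:
                    ready.append(m)
        return order if len(order) == len(nodes) else None

    print(topological_order(txns, edges))   # ['T1', 'T2']: a serial order exists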

Test for View Serializability:
 The precedence graph test for conflict serializability cannot be used directly to test for view serializability.
 The problem of checking whether a schedule is view serializable is NP-complete. Thus, the existence of an efficient algorithm is extremely unlikely.

RECOVERABILITY:
Recoverable Schedules: Recoverable schedules address the effect of transaction failures on
concurrently running transactions.

 For a recoverable schedule, if a transaction Tj reads a data item previously written by a transaction Ti, then the commit operation of Ti must appear before the commit operation of Tj.

The following schedule is not recoverable if T9 commits immediately after the read:

Should T8 abort, T9 would have read (and possibly shown to the user) an inconsistent database state. Hence, the database must ensure that schedules are recoverable.


Irrecoverable schedule: A schedule is irrecoverable if Tj reads a value updated by Ti and Tj commits before Ti commits.

Cascading Rollbacks: A single transaction failure leads to a series of transaction rollbacks. Consider
the following schedule where none of the transactions has yet committed (so the schedule is
recoverable)

If T10 fails, T11 and T12 must also be rolled back.

 Can lead to the undoing of a significant amount of work.

Cascadeless schedules — cascading rollbacks cannot occur; for each pair of transactions Ti and Tj
such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the
read operation of Tj.
 Every cascadeless schedule is also recoverable. It is desirable to restrict the schedules to
those that are cascadeless.
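The definition of recoverability can be checked mechanically. The following minimal Python sketch (the (transaction, operation, item) schedule encoding, with commits written as "C", is an assumption for illustration) rejects a schedule in which a reader commits before the writer it read from:

    from collections import defaultdict

    def is_recoverable(schedule):
        last_writer = {}                # item -> transaction that last wrote it
        reads_from = defaultdict(set)   # Tj -> set of Ti that Tj read from
        committed = set()
        for txn, op, item in schedule:
            if op == "W":
                last_writer[item] = txn
            elif op == "R" and last_writer.get(item, txn) != txn:
                reads_from[txn].add(last_writer[item])
            elif op == "C":
                if any(t not in committed for t in reads_from[txn]):
                    return False        # Tj commits before a Ti it read from
                committed.add(txn)
        return True

    # T9 reads A written by uncommitted T8, then commits first: not recoverable.
    bad  = [("T8", "W", "A"), ("T9", "R", "A"), ("T9", "C", None)]
    good = [("T8", "W", "A"), ("T9", "R", "A"),
            ("T8", "C", None), ("T9", "C", None)]
    print(is_recoverable(bad), is_recoverable(good))   # False True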

PESSIMISTIC AND OPTIMISTIC CONCURRENCY CONTROL SCHEMES:

Pessimistic Approach: A pessimistic approach to concurrency control delays a transaction if it may conflict with another transaction at some point in the future. It locks a database record for update access, so other users can access the record only as read-only or must wait for the record to be unlocked. Programming an application with a pessimistic concurrency approach can be more complicated, and more complex to manage, because of the risk of deadlocks.
In the pessimistic approach, the validate operation is performed first; only if the validation shows the requested lock is compatible are the read, compute, and write operations performed, i.e., Validate -> Read -> Compute -> Write.

In the pessimistic approach, two protocols are commonly used:

1. Two-phase locking protocol
2. Timestamp ordering protocol

LOCK-BASED PROTOCOLS: A locking protocol is a set of rules followed by all transactions while requesting and releasing locks. Locking protocols restrict the set of possible schedules.
A lock is a mechanism to control concurrent access to a data item. Data items can be locked in
two modes:
1. shared (S) mode: The data item can only be read. An S-lock is requested using the lock-S instruction.
2. exclusive (X) mode: The data item can be both read and written. An X-lock is requested using the lock-X instruction.

Lock requests are made to the concurrency-control manager. A transaction can proceed only after the request is granted.
Lock-compatibility matrix (true means the requested mode is compatible with the mode already held):

          S       X
    S    true    false
    X    false   false

Example of a transaction performing locking:


T2: lock-S(A);
read (A);
unlock(A);
lock-S(B);
read (B);
unlock(B);
display(A+B)

The Two-Phase Locking Protocol: This is a protocol which ensures conflict-serializable schedules. It can be proved that the transactions can be serialized in the order of their lock points (i.e., the points at which each transaction acquired its final lock).

Phase 1 (Growing Phase): The transaction may obtain locks but may not release any lock.
Phase 2 (Shrinking Phase): The transaction may release locks but may not obtain any new lock.

 Cascading roll-back is possible under two-phase locking. To avoid this, a modified protocol called
strict two-phase locking is followed. Here a transaction must hold all its exclusive locks till it
commits/aborts.
 Rigorous two-phase locking is even stricter: here all locks are held till commit/abort. In this
protocol transactions can be serialized in the order in which they commit.
 Two-phase locking does not ensure freedom from deadlocks.
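A minimal Python sketch (illustrative and single-threaded; a real lock manager would queue or block waiters) of a lock manager that enforces the S/X compatibility above together with the two-phase rule:

    class TwoPhaseLockManager:
        def __init__(self):
            self.locks = {}             # item -> (mode, set of holding txns)
            self.shrinking = set()      # txns that have entered phase 2

        def acquire(self, txn, item, mode):       # mode is "S" or "X"
            if txn in self.shrinking:
                raise RuntimeError(f"{txn} violates 2PL: lock after unlock")
            held = self.locks.get(item)
            if held is None:
                self.locks[item] = (mode, {txn})
                return True
            held_mode, holders = held
            if mode == "S" and held_mode == "S":  # only S is compatible with S
                holders.add(txn)
                return True
            return False                # conflict: the caller must wait

        def release(self, txn, item):
            self.shrinking.add(txn)     # the growing phase is over for txn
            mode, holders = self.locks[item]
            holders.discard(txn)
            if not holders:
                del self.locks[item]

    lm = TwoPhaseLockManager()
    lm.acquire("T1", "A", "X")
    lm.release("T1", "A")
    # lm.acquire("T1", "B", "S") would now raise: T1 is in its shrinking phase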


Pitfalls of Lock-Based Protocols:


 It may lead to Deadlocks.
 Starvation is also possible if the concurrency-control manager is badly designed. For example, a transaction may be waiting for an X-lock on an item while a sequence of other transactions request and are granted S-locks on the same item, or the same transaction may be repeatedly rolled back due to deadlocks.

Timestamp Ordering Protocol: The timestamp ordering protocol orders transactions by their timestamps; the order of the transactions is simply the ascending order of their creation times.
 The older transaction has the higher priority, so it executes first. To determine the timestamp of a transaction, this protocol uses the system time. Timestamp-based protocols start working as soon as a transaction is created.
 For example, suppose transaction T1 entered the system at time 007 and transaction T2 entered at time 009. T1 has the higher priority, so it executes first, since it entered the system first.
 The timestamp ordering protocol also maintains the timestamps of the last 'read' and 'write' operations on each data item.

Basic timestamp ordering protocol works as follows:

1. Whenever a transaction Ti issues a Read(X) operation, check the following conditions:
 If W_TS(X) > TS(Ti), then the operation is rejected and Ti is rolled back.
 If W_TS(X) <= TS(Ti), then the operation is executed and R_TS(X) is updated to max(R_TS(X), TS(Ti)).
2. Whenever a transaction Ti issues a Write(X) operation, check the following conditions:
 If TS(Ti) < R_TS(X), then the operation is rejected and Ti is rolled back.
 If TS(Ti) < W_TS(X), then the operation is rejected and Ti is rolled back; otherwise the operation is executed and W_TS(X) is set to TS(Ti).
Where TS(Ti) denotes the timestamp of transaction Ti,
R_TS(X) denotes the read timestamp of data item X, and
W_TS(X) denotes the write timestamp of data item X.
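These rules translate almost directly into code. A minimal Python sketch (illustrative; timestamps come from a simple counter rather than the system clock) of the basic timestamp-ordering scheduler:

    class TimestampScheduler:
        def __init__(self):
            self.clock = 0
            self.ts = {}        # transaction -> timestamp
            self.r_ts = {}      # item -> largest read timestamp
            self.w_ts = {}      # item -> largest write timestamp

        def begin(self, txn):
            self.clock += 1
            self.ts[txn] = self.clock

        def read(self, txn, item):
            if self.w_ts.get(item, 0) > self.ts[txn]:
                raise RuntimeError(f"{txn} rolled back on read({item})")
            self.r_ts[item] = max(self.r_ts.get(item, 0), self.ts[txn])

        def write(self, txn, item):
            if (self.r_ts.get(item, 0) > self.ts[txn]
                    or self.w_ts.get(item, 0) > self.ts[txn]):
                raise RuntimeError(f"{txn} rolled back on write({item})")
            self.w_ts[item] = self.ts[txn]

    s = TimestampScheduler()
    s.begin("T1"); s.begin("T2")        # TS(T1) = 1 < TS(T2) = 2
    s.read("T2", "X"); s.write("T2", "X")
    try:
        s.write("T1", "X")              # rejected: R_TS(X) = 2 > TS(T1) = 1
    except RuntimeError as e:
        print(e)                        # T1 rolled back on write(X)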

Advantages and Disadvantages of Timestamp Ordering protocol:


 Timestamp ordering protocol ensures serializability, since every arc in the precedence graph points from the transaction with the smaller timestamp to the transaction with the larger timestamp:

Transaction with smaller TS → Transaction with larger TS

Fig: Precedence Graph for Timestamp Ordering



 Timestamp ordering protocol ensures freedom from deadlock, i.e., no transaction ever waits.
 But the schedule may not be recoverable and may not even be cascade-free.

Optimistic Approach: An optimistic approach is based on the assumption that conflicts between operations on a database are rare. It is advisable to run such transactions to completion and to check for conflicts only before they commit; no checking is done during the execution of the transactions. This approach needs no locking or timestamping. In an optimistic approach, a transaction executes without restriction until it is committed: transactions are allowed to proceed in an unsynchronized way, and conflicts are checked at the end. This approach is also known as the validation or certification approach.
During optimistic execution, we perform only read and compute operations without validation, and validate the transaction just before the write operation, i.e., Read -> Compute -> Validate -> Write.
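A minimal Python sketch (hypothetical version counters stand in for the certification test) of the Read -> Compute -> Validate -> Write flow:

    db = {"X": 100}
    versions = {"X": 0}                 # bumped on every committed write

    class OptimisticTxn:
        def __init__(self):
            self.read_versions = {}     # item -> version seen at read time
            self.local = {}             # local copies: no cascading rollback

        def read(self, item):           # Read phase
            self.read_versions[item] = versions[item]
            self.local[item] = db[item]
            return self.local[item]

        def commit(self):               # Validate phase, then Write phase
            for item, v in self.read_versions.items():
                if versions[item] != v:
                    return False        # conflict: restart the transaction
            for item, value in self.local.items():
                db[item] = value
                versions[item] += 1
            return True

    t = OptimisticTxn()
    t.local["X"] = t.read("X") + 50     # Compute on the local copy
    print(t.commit(), db["X"])          # True 150 (no concurrent writer)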

Advantages:
 In an optimistic approach, transaction rollback is easy when conflicts occur.
 In an optimistic approach, there are no cascading rollbacks, because each transaction uses only its local copy of the data and not the database itself.

Disadvantages:
 Using an optimistic approach for concurrency control can be expensive, since conflicting transactions must be rolled back and restarted.
 If there is a conflict between large and small transactions, the large transactions tend to be rolled back repeatedly, as they are involved in more conflicts.

Difference between Pessimistic Approach and Optimistic Approach:

 Locking: The pessimistic approach locks a record selected for update so that it will not be changed by another user in the meantime; the optimistic approach does not lock records, but verifies that a record was not changed between the SELECT and SUBMIT operations.
 Conflicts: Conflicts between transactions are frequent under the pessimistic approach; they are fewer under the optimistic approach.
 Synchronization: The pessimistic approach synchronizes transactions at the start of their execution life cycle; the optimistic approach delays synchronization to a later phase of execution.
 Design complexity: The pessimistic approach is more complex to design and manage because of the risk of deadlocks; the optimistic approach is simpler in design and programming.
 Storage cost: The pessimistic approach has a higher storage cost; the optimistic approach has a relatively lower storage cost.
 Degree of concurrency: The pessimistic approach offers a lower degree of concurrency; the optimistic approach offers a higher degree of concurrency.
 Usage: The pessimistic approach is used where transaction conflicts are frequent; the optimistic approach is used where conflicts are few or very rare.
 Flow of transaction phases: pessimistic: Validate -> Read -> Compute -> Write; optimistic: Read -> Compute -> Validate -> Write.
 Conflict handling: The pessimistic approach protects the system from concurrency conflicts; the optimistic approach allows conflicts to happen and resolves them at validation time.
 Suitability: The pessimistic approach is suitable for a small database or a table with few records; the optimistic approach is suitable for a large database with many records.
