0% found this document useful (0 votes)

26 views36 pages

DBMS UNIT-5

This document covers the transaction concept in database management systems, detailing transaction states, ACID properties, and concurrent execution. It explains the importance of atomicity, consistency, isolation, and durability in ensuring data integrity during transactions, along with issues like deadlocks and recovery algorithms. Additionally, it introduces indexing techniques such as B+ Trees and hash-based indexing for efficient data retrieval.

Uploaded by

bashaa0669

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

26 views36 pages

DBMS UNIT-5

Uploaded by

bashaa0669

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 36

UNIT-5

Transaction Concept: Transaction State, ACID properties, Concurrent Executions,

Serializability, Recoverability, Implementation of Isolation, Testing for Serializability, lock based,
time stamp based, optimistic, concurrency protocols, Deadlocks, Failure Classification, Storage,
Recovery and Atomicity, Recovery algorithm. Introduction to Indexing Techniques: B+ Trees,
operations on B+Trees, Hash Based Indexing:

Transaction

o The transaction is a set of logically related operation. It contains a group of tasks.

o A transaction is an action or series of actions. It is performed by a single user to perform
operations for accessing the contents of the database.
Example: Suppose an employee of bank transfers Rs 800 from X's account to Y's account. This
small transaction contains several low-level tasks:

X's Account
1. Open_Account(X)
2. Old_Balance = X.balance
3. New_Balance = Old_Balance - 800
4. X.balance = New_Balance
5. Close_Account(X)
Y's Account
1. Open_Account(Y)
2. Old_Balance = Y.balance
3. New_Balance = Old_Balance + 800
4. Y.balance = New_Balance
5. Close_Account(Y)

Operations of Transaction:
Following are the main operations of transaction:
Read(X): Read operation is used to read the value of X from the database and stores it in a
buffer in main memory.
Write(X): Write operation is used to write the value back to the database from the buffer.
Let's take an example to debit transaction from an account which consists of following
operations:

1. 1. R(X);
2. 2. X = X - 500;
3. 3. W(X);
Let's assume the value of X before starting of the transaction is 4000.

o The first operation reads X's value from database and stores it in a buffer.
o The second operation will decrease the value of X by 500. So buffer will contain 3500.
o The third operation will write the buffer's value to the database. So X's final value will be
3500.
But it may be possible that because of the failure of hardware, software or power, etc. that
transaction may fail before finished all the operations in the set.
For example: If in the above transaction, the debit transaction fails after executing operation 2
then X's value will remain 4000 in the database which is not acceptable by the bank.
To solve this problem, we have two important operations:
Commit: It is used to save the work done permanently.
Rollback: It is used to undo the work done.

States of Transaction

Active state
o The active state is the first state of every transaction. In this state, the transaction is being
executed.
o For example: Insertion or deletion or updating a record is done here. But all the records
are still not saved to the database.
Partially committed
o In the partially committed state, a transaction executes its final operation, but the data is
still not saved to the database.
o In the total mark calculation example, a final display of the total marks step is executed in
this state.
Committed
A transaction is said to be in a committed state if it executes all its operations successfully. In
this state, all the effects are now permanently saved on the database system.
Failed state
o If any of the checks made by the database recovery system fails, then the transaction is
said to be in the failed state.
o In the example of total mark calculation, if the database is not able to fire a query to fetch
the marks, then the transaction will fail to execute.
Aborted
o If any of the checks fail and the transaction has reached a failed state then the database
recovery system will make sure that the database is in its previous consistent state. If not
then it will abort or roll back the transaction to bring the database into a consistent state.
o If the transaction fails in the middle of the transaction then before executing the
transaction, all the executed transactions are rolled back to its consistent state.
o After aborting the transaction, the database recovery module will select one of the two
operations:
o Re-start
start the transaction
o Kill the transaction

ACID Properties
The expansion of the term ACID defines for:

1) Atomicity

Data's ability to remain atomic is defined by the word "atomicity."." It indicates that any operation
performed on the data should be done so either fully or not at all.It also implies that there
shouldn't be any breaks or incomplete execution of the process. When carrying out operations on
a transaction, the
he action need to be carried out whole and not only in part.
Example: If Remo has account A having $30 in his account from which he wishes to send $10
to Sheero's account, which is B. In account B, a sum of $ 100 is already present. When $10 will
be transferred
erred to account B, the sum will become $110. Now, there will be two operations that
will take place. One is the amount of $10 that Remo wants to transfer will be debited from his
account A, and the same amount will get credited to account B, i.e., into Sh Sheero's
eero's account. Now,
what happens - the first operation of debit executes successfully, but the credit operation,
however, fails. Thus, in Remo's account A, the value becomes $20, and to that of Sheero's
account, it remains $100 as it was previously presen
present.

In the above diagram, it can be seen that after crediting $10, the amount is still $100 in account
B. So, it is not an atomic transaction.
The below image shows that both debit and credit operations are done successfully. Thus the
transaction is atomic.

Thus, when the amount loses atomicity, then in the bank systems, this becomes a huge issue, and
so the atomicity is the main focus in the bank systems.

2) Consistency
The definition of consistency is that a value should always be maintained. Any modifications
done to the database should always be kept, since this ensures the integrity of the data in
DBMSs. Data integrity is essential for transactions in order to assure consistency in the database
earlier and later in the transaction.
Example:

In the above figure, there are three accounts, A, B, and C, where A is making a transaction T one
by one to both B & C. There are two operations that take place, i.e., Debit and Credit. Account A
firstly debits $50 to account B, and the amount in account A iiss read $300 by B before the
transaction. After the successful transaction T, the available amount in B becomes $150. Now, A
debits $20 to account C, and that time, the value read by C is $250 (that is correct as a debit of
$50 has been successfully done to B). The debit and credit operation from account A to C has
been done successfully. We can see that the transaction is done successfully, and the value is also
read correctly. Thus, the data is consistent. In case the value read by B and C is $300, which
means
eans that data is inconsistent because when the debit operation executes, it will not be
consistent.

3) Isolation
'Isolation' indicates to a phase of separation. A database's isolation, as defined by DBMS, is the
quality wherein no data should impact any other database and can happen simultaneously. To put
it briefly, one database operation should start once the first database operation is finished. It
signifies that two processes on two different databases could not affect the values of the other
process. When two or more transactions happen at the same time, consistency in the case of the
transactions should be preserved. Until a modification is not committed in the memory, it cannot
be observed by other transactions in that same transaction.
Example: If two operations are concurrently running on two different accounts, then the value of
both accounts should not get affected. The value should remain persistent. As you can see in the
below diagram, account A is making T1 and T2 transactions to account B and C, but both are
executing independently without affecting each other. It is known as Isolation.

4) Durability
Something's durability guarantees its permanence. The word ""durability
durability" in database
management systems (DBMS) refers to the guarantee that data is permanently stored in the
database following a successful transaction. Data need to be so completely persistent that it
keeps the database intact even in the case of a crash or system failure.However, the recovery
manager is responsible for ensuring th that
at the database remains durable in the unfortunate
circumstances. To commit the data, we must use the COMMIT command each time we make a
modification.

What are another example of an ACID transaction?

Let's get straight to the banking example, where funds are being moved between accounts. Let's
look at this example's ACID characteristics one by one:
Atomicity:Money
Money must be moved from one account to another for a transaction to be completed.
The data would become inconsistent if funds were taken out of one account and not added to
another.
Consistency: Let's take a look at a database constraint where an account balance cannot be lower
than $0. Any changes made to the account balance during a transaction must result in a
legitimate, non-negative
negative balance when the transaction is completed; otherwise, it should be
cancelled.
Isolation: Take into consideration two requests for money transfers from the same bank account
made at the same time. When the transfer requests are processed sequentially and
simultaneously, the outcome should be the same.
Durability: As soon as a database verifies that funds have been moved from one bank account to
another, take into consideration a power outage. Despite the unexpected failure, the database
ought to still have the updated data.

Concurrent Executions in DBMS

Concurrent execution refers to the simultaneous execution of more than one transaction. This is
a common scenario in multi-user database environments where many users or applications might
be accessing or modifying the database at the same time. Concurrent execution is crucial for
achieving high throughput and efficient resource utilization. However, it introduces the potential
for conflicts and data inconsistencies.
Advantages of Concurrent Execution
1. Increased System Throughput: Multiple transactions can be in progress at the same
time, but at different stages
2. Maximized Processor Utilization: If one transaction is waiting for I/O operations,
another transaction can utilize the processor.
3. Decreased Wait Time: Transactions no longer have to wait for other long transactions to
complete.
4. Improved Transaction Response Time: Transactions get processed faster because they
can be executed in parallel.
Potential Problems with Concurrent Execution
1. Lost Update Problem (Write-Write conflict):
One transaction's updates could be overwritten by another.
Examples:
T1 | T2

----------|-----------

Read(A) |

A = A+50 |

| Read(A)

| A = A+100

Write(A) |

| Write(A)

Result: T1's updates are lost.

2.Temporary Inconsistency or Dirty Read Problem (Write-Read conflict):
One transaction might read an inconsistent state of data that's being updated by another.
Examples:
T1 | T2

------------------|-----------

Read(A) |

A = A+50 |

Write(A) |

| Read(A)
| A = A+100

| Write(A)

Read(A)(rollbacks)|

| commit

Result: T2 has a "dirty" value, that was never committed in T1 and doesn't actually exist in the
database.
3. Unrepeatable Read Problem (Read-Write conflict):
when a single transaction reads the same row multiple times and observes different values each
time. This occurs because another concurrent transaction has modified the row between the two
reads.
Examples:
T1 | T2

----------|----------

Read(A) |

| Read(A)

| A = A+100

| Write(A)

Read(A) |

Result: Within the same transaction, T1 has read two different values for the same data item.
This inconsistency is the unrepeatable read.

Serializability in DBMS
Schedule is an order of multiple transactions executing in concurrent environment.
Serial Schedule: The schedule in which the transactions execute one after the other is called
serial schedule. It is consistent in nature.
For example: Consider following two transactions T1 and T2.
T1 | T2

----------|----------

Read(A) |

Write(A) |

Read(B) |

Write(B) |

| Read(A)

| Write(A)

| Read(B)

| Write(B)

All the operations of transaction T1 on data items A and then B executes and then in transaction
T2 all the operations on data items A and B execute.
Non Serial Schedule: The schedule in which operations present within the transaction are
intermixed. This may lead to conflicts in the result or inconsistency in the resultant data.
For example- Consider following two transactions,
T1 | T2

----------|----------

Read(A) |

Write(A) |

| Read(A)

| Write(B)

Read(A) |

Write(B) |

| Read(B)

| Write(B)

The above transaction is said to be non serial which result in inconsistency or conflicts in the
data.
Types of Serializability

1. Conflict Serializability
2. View Serializability
Conflict Serializability
Conflict serializability is a form of serializability where the order of non-conflicting operations is
not significant. It determines if the concurrent execution of several transactions is equivalent to
some serial execution of those transactions.
Two operations are said to be in conflict if:

They belong to different transactions.


They access the same data item.

At least one of them is a write operation.

Examples of non-conflicting operations
T1 | T2

----------|----------

Read(A) | Read(A)

Read(A) | Read(B)

Write(B) | Read(A)

Read(B) | Write(A)

Write(A) | Write(B)
Examples of conflicting operations
T1 | T2

----------|----------

Read(A) | Write(A)

Write(A) | Read(A)

Write(A) | Write(A)

A schedule is conflict serializable if it can be transformed into a serial schedule (i.e., a schedule
with no overlapping transactions) by swapping non-conflicting operations. If it is not possible to
transform a given schedule to any serial schedule using swaps of non-conflicting operations, then
the schedule is not conflict serializable.
To determine if S is conflict serializable:
Precedence Graph (Serialization Graph): Create a graph where:
Nodes represent transactions.
Draw an edge from $ T_i $ to $ T_j $ if an operation in $ T_i $ precedes and conflicts with an
operation in $ T_j $.
For the given example:
T1 | T2

----------|----------

Read(A) |

| Read(A)

Write(A) |

| Read(B)

| Write(B)

R1(A)conflicts with W1(A),so there's an edge from T1 to T1, but this is ignored because they´re
from the same transaction.
R2(A) conflicts with W1(A), so there's an edge from T2 to T1.
No other conflicting pairs.
The graph has nodes T1 and T2 with an edge from T2 to T1. There are no cycles in this graph.
Decision: Since the precedence graph doesn't have any cycles,Cycle is a path using which we
can start from one node and reach to the same node. the schedule S is conflict serializable. The
equivalent serial schedules, based on the graph, would be T2 followed by T1.

View Serializability
View Serializability is one of the types of serializability in DBMS that ensures the consistency of
a database schedule. Unlike conflict serializability, which cares about the order of conflicting
operations, view serializability only cares about the final outcome. That is, two schedules are
view equivalent if they have:

 Initial Read: The same set of initial reads (i.e., a read by a transaction with no preceding
write by another transaction on the same data item).
 Updated Read: For any other writes on a data item in between, if a transaction $T_j$
reads the result of a write by transaction $T_i$ in one schedule, then $T_j$ should read
the result of a write by $T_i$ in the other schedule as well.
 Final Write: The same set of final writes (i.e., a write by a transaction with no
subsequent writes by another transaction on the same data item).
Let's understand view serializability with an example:
Consider two transactions $T_1$ and $T_2$:
Schedule 1(S1):
| Transaction T1 | Transaction T2 |

|---------------------|---------------------|

| Write(A) | |

| | Read(A) |

| | Write(B) |

| Read(B) | |

| Write(B) | |

| Commit | Commit |

Schedule 2(S2):
| Transaction T1 | Transaction T2 |

|---------------------|---------------------|

| | Read(A) |

| Write(A) | |

| | Write(A) |

| Read(B) | |

| Write(B) | |

| Commit | Commit |

Here,
1. Both S1 and S2 have the same initial read of A by $T_2$.
2. Both S1 and S2 have the final write of A by $T_2$.
3. For intermediate writes/reads, in S1, $T_2$ reads the value of A after $T_1$ has written
to it. Similarly, in S2, $T_2$ reads A which can be viewed as if it read the value after
$T_1$ (even though in actual sequence $T_2$ read it before $T_1$ wrote it). The
important aspect is the view or effect is equivalent.
4. B is read and then written by $T_1$ in both schedules.
Considering the above conditions, S1 and S2 are view equivalent. Thus, if S1 is serializable, S2
is also view serializable.
Recoverability in DBMS
Recoverability refers to the ability of a system to restore its state to a point where the integrity of
its data is not compromised, especially after a failure or an error.
When multiple transactions are executing concurrently, issues may arise that affect the system's
recoverability. The interaction between transactions, if not managed correctly, can result in
scenarios where a transaction's effects cannot be undone, which would violate the system's
integrity.
Importance of Recoverability:
The need for recoverability arises because databases are designed to ensure data reliability and
consistency. If a system isn't recoverable:
 The integrity of the data might be compromised.
 Business processes can be adversely affected due to corrupted or inconsistent data.
 The trust of end-users or businesses relying on the database will be diminished.
Levels of Recoverability 1. Recoverable Schedules
A schedule is said to be recoverable if, for any pair of transactions $T_i$ and $T_j$, if $T_j$
reads a data item previously written by $T_i$, then $T_i$ must commit before $T_j$ commits.
If a transaction fails for any reason and needs to be rolled back, the system can recover without
having to rollback other transactions that have read or used data written by the failed transaction.
Example of a Recoverable Schedule:Suppose we have two transactions $T_1$ and $T_2$.
| Transaction T1 | Transaction T_2 |

|---------------------|---------------------|

| Write(A) | |

| | Read(A) |

| Commit | |

| | Write(B) |

| | Commit |

In the above schedule, $T_2$ reads a value written by $T_1$, but $T_1$ commits before
$T_2$, making the schedule recoverable
2. Non-Recoverable Schedules
A schedule is said to be non-recoverable (or irrecoverable) if there exists a pair of transactions
$T_i$ and $T_j$ such that $T_j$ reads a data item previously written by $T_i$, but $T_i$ has
not committed yet and $T_j$ commits before $T_i$. If $T_i$ fails and needs to be rolled back
after $T_j$ has committed, there's no straightforward way to roll back the effects of $T_j$,
leading to potential data inconsistency.
Example of a Non-Recoverable Schedule:Again, consider two transactions $T_1$ and $T_2$.
| Transaction T1 | Transaction T2 |

|---------------------|---------------------|

| Write(A) | |

| | Read(A) |

| | Write(B) |

| | Commit |

| Commit | |

In this schedule, $T_2$ reads a value written by $T_1$ and commits before $T_1$ does. If
$T_1$ encounters a failure and has to be rolled back after $T_2$ has committed, we're left in a
problematic situation since we cannot easily roll back $T_2$, making the schedule non-
recoverable.
3. Cascading Rollback
A cascading rollback occurs when the rollback of a single transaction causes one or more
dependent transactions to be rolled back. This situation can arise when one transaction reads
uncommitted changes of another transaction, and then the latter transaction fails and needs to be
rolled back. Consequently, any transaction that has read the uncommitted changes of the failed
transaction also needs to be rolled back, leading to a cascade effect.
Example of Cascading Rollback
Consider two transactions $T_1$ and $T_2$:
| Transaction T1 | Transaction T2 |

|---------------------|---------------------|

| Write(A) | |

| | Read(A) |

| | Write(B) |

| Abort(some failure) | |

| Rollback | |

| | Rollback (because it read uncommitted changes from T1)

Here, $T_2$ reads an uncommitted value of A written by $T_1$. When $T_1$ fails and is
rolled back, $T_2$ also has to be rolled back, leading to a cascading rollback. This is
undesirable because it wastes computational effort and can complicate recovery procedures.
4. Cascadeless Schedules
A schedule is considered cascadeless if transactions only read committed values. This means, in
such a schedule, a transaction can read a value written by another transaction only after the latter
has committed. Cascadeless schedules prevent cascading rollbacks.
Example of Cascadeless Schedule
Consider two transactions $T_1$ and $T_2$:
| Transaction T1 | Transaction T2 |
|---------------------|---------------------|

| Write(A) | |

| Commit | |

| | Read(A) |

| | Write(B) |

| | Commit |

In this schedule, $T_2$ reads the value of A only after $T_1$ has committed. Thus, even if
$T_1$ were to fail before committing (not shown in this schedule), it would not affect $T_2$.
This means there's no risk of cascading rollback in this schedule.
Implementation of Isolation in DBMS: Isolation is one of the core ACID properties of a
database transaction, ensuring that the operations of one transaction remain hidden from other
transactions until completion. It means that no two transactions should interfere with each other
and affect the other's intermediate state.
Isolation Levels
Isolation levels defines the degree to which a transaction must be isolated from the data
modifications made by any other transaction in the database system. There are four levels of
transaction isolation defined by SQL –
1. Serializable
 The highest isolation level.
 Guarantees full serializability and ensures complete isolation of transaction operations.
2. Repeatable Read
 This is the most restrictive isolation level.
 The transaction holds read locks on all rows it references.
 It holds write locks on all rows it inserts, updates, or deletes.
 Since other transaction cannot read, update or delete these rows, it avoids non repeatable
read.
3. Read Committed
 This isolation level allows only committed data to be read.
 Thus it does not allows dirty read (i.e. one transaction reading of data immediately after
written by another transaction).
 The transaction hold a read or write lock on the current row, and thus prevent other rows
from reading, updating or deleting it.
4. Read Uncommitted
 It is lowest isolation level.
 In this level, one transaction may read not yet committed changes made by other
transaction.
 This level allows dirty reads.

The proper isolation level or concurrency control mechanism to use depends on the specific
requirements of a system and its workload. Some systems may prioritize high throughput and
can tolerate lower isolation levels, while others might require strict consistency and higher
isolation.

Isolation level Dirty Read Unrepetable Read

Serializable NO NO

Repeatable Read NO NO

Read Committed NO Maybe

Read Uncommitted Maybe Maybe

Implementation of Isolation
Implementing isolation typically involves concurrency control mechanisms. Here are common
mechanisms used:
1. Locking Mechanisms
Locking ensures exclusive access to a data item for a transaction. This means that while one
transaction holds a lock on a data item, no other transaction can access that item.
 Shared Lock (S-lock): Allows a transaction to read an item but not write to it.
 Exclusive Lock (X-lock): Allows a transaction to read and write an item. No other
transaction can read or write until the lock is released.
 Two-phase Locking (2PL): This protocol ensures that a transaction acquires all the
locks before it releases any. This results in a growing phase (acquiring locks and not
releasing any) and a shrinking phase (releasing locks and not acquiring any).
2. Timestamp-based Protocols
Every transaction is assigned a unique timestamp when it starts. This timestamp determines the
order of transactions. Transactions can only access the database if they respect the timestamp
order, ensuring older transactions get priority.
Lock Based Protocols in DBMS:Locks are essential in a database system to ensure:
1. Consistency: Without locks, multiple transactions could modify the same data item
simultaneously, resulting in an inconsistent state.
2. Isolation: Locks ensure that the operations of one transaction are isolated from other transactions,
i.e., they are invisible to other transactions until the transaction is committed.
3. Concurrency: While ensuring consistency and isolation, locks also allow multiple transactions to
be processed simultaneously by the system, optimizing system throughput and overall
performance.
4. Avoiding Conflicts: Locks help in avoiding data conflicts that might arise due to simultaneous
read and write operations by different transactions on the same data item.
5. Preventing Dirty Reads: With the help of locks, a transaction is prevented from reading data
that hasn't yet been committed by another transaction.
Lock-Based Protocols
1. Simple Lock Based Protocol
The Simple lock based protocol is a mechanism in which there is exclusive use of locks on the
data item for current transaction.
Types of Locks: There are two types of locks used –
Shared Lock (S-lock)
This lock allows a transaction to read a data item. Multiple transactions can hold shared locks on
the same data item simultaneously. It is denoted by Lock-S. This is also called as read lock.
Exclusive Lock (X-lock):
This lock allows a transaction to read and write a data item. If a transaction holds an exclusive
lock on an item, no other transaction can hold any kind of lock on the same item. It is denoted
as Lock-X. This is also called as write lock.
T1 | T2
----------|----------
Lock-S(A) |
Read(A) |
Unlock(A) | Lock-X(A)
| Read(A)
| Write(A)
| Unlock(A)

Shared Lock Exclusive Lock

Shared lock is used for when the transaction Exclusive lock is used when the transaction wants to
wants to perform read operation. perform both read and write operation.

Any number of transactions can hold shared lock But exclusive lock can be hold by only one transaction.
on an item.

Using shared lock data item can be viewed. Using exclusive lock data can be inserted or deleted.

The difference between shared lock and exclusive lock?

2. Two-Phase Locking (2PL) Protocol
The two-phase locking protocol ensures a transaction gets all the locks it needs before it releases
any. The protocol has two phases:
 Growing Phase: The transaction may obtain any number of locks but cannot release any.
 Shrinking Phase: The transaction may release but cannot obtain any new locks.
The point where the transaction releases its first lock is the end of the growing phase and the
beginning of the shrinking phase.
This protocol ensures conflict-serializability and avoids many of the concurrency issues, but it
doesn't guarantee the absence of deadlocks.
T1 | T2

----------|----------

Lock-S(A) |

| Lock-S(A)

Lock-X(B) |

Unlock(A) |

| Lock-X(C)

Unlock(B) |

| Unlock(A)

| Unlock(C)

Pros of Two-Phase Locking (2PL)

 Ensures Serializability: 2PL guarantees conflict-serializability, ensuring the consistency
of the database.
 Concurrency: By allowing multiple transactions to acquire locks and release them, 2PL
increases the concurrency level, leading to better system throughput and overall
performance.
 Avoids Cascading Rollbacks: Since a transaction cannot read a value modified by
another uncommitted transaction, cascading rollbacks are avoided, making recovery
simpler.
Cons of Two-Phase Locking (2PL)
 Deadlocks: The main disadvantage of 2PL is that it can lead to deadlocks, where two or
more transactions wait indefinitely for a resource locked by the other.
 Reduced Concurrency (in certain cases): Locking can block transactions, which can
reduce concurrency. For example, if one transaction holds a lock for a long time, other
transactions needing that lock will be blocked.
 Overhead: Maintaining locks, especially in systems with a large number of items and
transactions, requires overhead. There's a time cost associated with acquiring and
releasing locks, and memory overhead for maintaining the lock table.
 Starvation: It's possible for some transactions to get repeatedly delayed if other
transactions are continually requesting and acquiring locks.
Categories of Two-Phase Locking in DBMS
1. Strict Two-Phase Locking
2. Rigorous Two-Phase Locking
3. Conservative (or Static) Two-Phase Locking:
Strict Two-Phase Locking
 Transactions are not allowed to release any locks until after they commit or abort.
 Ensures serializability and avoids the problem of cascading rollbacks.
 However, it can reduce concurrency.
Pros Strict Two-Phase Locking
 Serializability: Ensures serializable schedules, maintaining the consistency of the
database.
 Avoids Cascading Rollbacks: A transaction cannot read uncommitted data, thus
avoiding cascading aborts.
 Simplicity: Conceptually straightforward in that a transaction simply holds onto its locks
until it's finished.
Cons Strict Two-Phase Locking
 Reduced Concurrency: Since transactions hold onto locks until they commit or abort,
other transactions might be blocked for longer periods.
 Potential for Deadlocks: Holding onto locks can increase the chances of deadlocks.
Rigorous Two-Phase Locking
 A transaction can release a lock after using it, but it cannot commit until all locks have
been acquired.
 Like strict 2PL, rigorous 2PL is deadlock-free and ensures serializability.
Pros Rigorous Two-Phase Locking
 Serializability: Like strict 2PL, ensures serializable schedules.
 Avoids Cascading Rollbacks: Prevents reading of uncommitted data.
 Improved Concurrency: By releasing locks before committing, it can potentially allow
higher concurrency compared to strict 2PL.
Cons Rigorous Two-Phase Locking
 Deadlock Possibility: Still susceptible to deadlocks as transactions might hold some
locks while releasing others.
Conservative or Static Two-Phase Locking
 A transaction must request all the locks it will ever need before it begins execution. If any
of the requested locks are unavailable, the transaction is delayed until they are all
available.
 This approach can avoid deadlocks since transactions only start when all their required
locks are available.
Pros Conservative or Static Two-Phase Locking
 Avoids Deadlocks: By ensuring that all required locks are acquired before a transaction
starts, it inherently avoids the possibility of deadlocks.
 Serializability: Ensures serializable schedules.
Cons Conservative or Static Two-Phase Locking
 Reduced Concurrency: Transactions might be delayed until all required locks are
available, leading to potential inefficiencies.

Timestamp Based Protocols in DBMS

 Timestamp-based protocols are concurrency control mechanisms used in databases to
ensure serializability and to avoid conflicts without the need for locking. The main idea
behind these protocols is to use a timestamp to determine the order in which transactions
should be executed. Each transaction is assigned a unique timestamp when it starts.
 Here are the primary rules associated with timestamp-based protocols:
 1. Timestamp Assignment: When a transaction $ T $ starts, it is given a timestamp $
TS(T) $. This timestamp can be the system's clock time or a logical counter that
increments with each new transaction.
2. Reading/Writing Rules: Timestamp-based protocols use the following rules to determine if a
transaction can read or write an item:
 Read Rule: If a transaction $ T $ wants to read an item that was last written by
transaction $ T' $ with $ TS(T') > TS(T) $, the read is rejected because it's trying to read
a value from the future. Otherwise, it can read the item.
 Write Rule: If a transaction $ T $ wants to write an item that has been read or written by
a transaction $ T' $ with $ TS(T') > TS(T) $, the write is rejected. This avoids
overwriting a value that has been read or written by a younger transaction.
3. Handling Violations: When a transaction's read or write operation violates the rules, the
transaction can be rolled back and restarted or aborted, depending on the specific protocol in use.
Let's look at examples to better understand these rules:
Example on Timestamp-based protocols
Suppose two transactions $ T1 $ and $ T2 $ with timestamps 5 and 10 respectively:
1. $ T1 $ reads item $ A $.
2. $ T2 $ writes item $ A $.
3. $ T1 $ tries to write item $ A $.
According to the write rule, $ T1 $ can't write item $ A $ after $ T2 $ has written it, because $
TS(T2) > TS(T1) $. Thus, $ T1 $'s write operation will be rejected.
Example-2
Suppose two transactions $ T1 $ and $ T2 $ with timestamps 5 and 10 respectively:
1. $ T2 $ writes item $ A $.
2. $ T1 $ tries to read item $ A $.
According to the read rule, $ T1 $ can't read item $ A $ after $ T2 $ has written it, as $ TS(T2)
> TS(T1) $. So, $ T1 $'s read operation will be rejected.
Advantages of Timestamp-based Protocols
1. Deadlock-free: Since there are no locks involved, there's no chance of deadlocks
occurring.
2. Fairness: Older transactions have priority over newer ones, ensuring that transactions do
not starve and get executed in a fair manner.
3. Increased Concurrency: In many situations, timestamp-based protocols can provide
better concurrency than lock-based methods.
Disadvantages of Timestamp-based Protocols
1. Starvation: If a transaction is continually rolled back due to timestamp rules, it may
starve.
2. Overhead: Maintaining and comparing timestamps can introduce overhead, especially in
systems with a high transaction rate.
3. Cascading Rollbacks: A rollback of one transaction might cause rollbacks of other
transactions.
One of the most well-known timestamp-based protocols is the Thomas Write Rule, which
modifies the write rule to allow certain writes that would be rejected under the basic timestamp
protocol. The idea is to ignore a write that would have no effect on the outcome, rather than
rolling back the transaction. This reduces the number of rollbacks but can result in non-
serializable schedules.
Optimistic Concurrency Control in DBMS
All data items are updated at the end of the transaction, at the end, if any data item is found
inconsistent with respect to the value in, then the transaction is rolled back.
Check for conflicts at the end of the transaction. No checking while the transaction is executing.
Checks are all made at once, so low transaction execution overhead. Updates are not applied
until end-transaction. They are applied to local copies in a transaction space.
Phases
The optimistic concurrency control has three phases, which are explained below −
Read Phase:Various data items are read and stored in temporary variables (local copies). All
operations are performed in these variables without updating the database.

Explore our latest online courses and learn new skills at your own pace. Enroll and become a
certified expert to boost your career.
Validation Phase
All concurrent data items are checked to ensure serializability will not be validated if the
transaction updates are actually applied to the database. Any changes in the value cause the
transaction rollback. The transaction timestamps are used and the write-sets and read-sets are
maintained.
To check that transaction A does not interfere with transaction B the following must hold −
 TransB completes its write phase before TransA starts the read phase.
 TransA starts its write phase after TransB completes its write phase, and the read set of
TransA has no items in common with the write set of TransB.
 Both the read set and write set of TransA have no items in common with the write set of
TransB and TransB completes its read before TransA completes its read Phase.
Write Phase
The transaction updates applied to the database if the validation is successful. Otherwise, updates
are discarded and transactions are aborted and restarted. It does not use any locks hence deadlock
free, however starvation problems of data items may occur.
Problem
S: W1(X), r2(Y), r1(Y), r2(X).
T1 -3
T2 – 4
Check whether timestamp ordering protocols allow schedule S.
Solution
Initially for a data-item X, RTS(X)=0, WTS(X)=0
Initially for a data-item Y, RTS(Y)=0, WTS(Y)=0

For W1(X) : TS(Ti)<RTS(X) i.e.

TS(T1)<RTS(X)
TS(T1)<WTS(X)
3<0 (FALSE)
=>goto else and perform write operation w1(X) and WTS(X)=3
For r2(Y): TS(T2)<WTS(Y)
4<0 (FALSE)
=>goto else and perform read operation r2(Y) and RTS(Y)=4
For r1(Y) :TS(T1)<WTS(Y)
3<0 (FALSE)
=>goto else and perform read operation r1(Y).
For r2(X) : TS(T2)<WTS(X)
4<3 (FALSE)
=>goto else and perform read operation r2(X) and RTS(X)=4
Deadlock in DBMS
A deadlock is a condition where two or more transactions are waiting indefinitely for one
another to give up locks. Deadlock is said to be one of the most feared complications in DBMS
as no task ever gets finished and is in waiting state forever.
For example: In the student table, transaction T1 holds a lock on some rows and needs to update
some rows in the grade table. Simultaneously, transaction T2 holds locks on some rows in the
grade table and needs to update the rows in the Student table held by Transaction T1.

Now, the main problem arises. Now Transaction T1 is waiting for T2 to release its lock and
similarly, transaction T2 is waiting for T1 to release its lock. All activities come to a halt state
and remain at a standstill. It will remain in a standstill until the DBMS detects the deadlock and
aborts one of the transactions.
Below is a list of conditions necessary for a deadlock to occur:

o Circular Waiting: It is when two or more transactions wait each other indefinitely for a
lock held by the others to be released.
o Partial Allocation: When a transaction acquires some of the required data items but not
all the data items as they may be exclusively locked by others.
o Non-Preemptive scheduling: A data item that could be only single transaction at a time.
o Mutual Exclusion: A data item can be locked exclusively by one transaction at a time.
To avoid a deadlock atleast one of the above mentioned necessary conditions should not occur.

Deadlock Avoidance
o When a database is stuck in a deadlock state, then it is better to avoid the database rather
than aborting or restating the database. This is a waste of time and resource.
o Deadlock avoidance mechanism is used to detect any deadlock situation in advance. A
method like "wait for graph" is used for detecting the deadlock situation but this method
is suitable only for the smaller database. For the larger database, deadlock prevention
method can be used.
Deadlock Detection
In a database, when a transaction waits indefinitely to obtain a lock, then the DBMS should
detect whether the transaction is involved in a deadlock or not. The lock manager maintains a
Wait for the graph to detect the deadlock cycle in the database.
Wait for Graph
o This is the suitable method for deadlock detection. In this method, a graph is created
based on the transaction and their lock. If the created graph has a cycle or closed loop,
then there is a deadlock.
o The wait for the graph is maintained by the system for every transaction which is waiting
for some data held by the others. The system keeps checking the graph if there is any
cycle in the graph.
Deadlock Prevention
o Deadlock prevention method is suitable for a large database. If the resources are allocated
in such a way that deadlock never occurs, then the deadlock can be prevented.
o The Database management system analyzes the operations of the transaction whether
they can create a deadlock situation or not. If they do, then the DBMS never allowed that
transaction to be executed.
Each transaction has unique identifier which is called timestamp. It is usually based on the state
of the transaction and assigned once the transaction is started. For example if the transaction T1
starts before the transaction T2 then the timestamp corresponding to the transaction T1 will be
less than timestamp corresponding to transaction T2. The timestamp decides whether a
transaction should wait or abort and rollback. Aborted transaction retain their timestamps values
and hence the seniority.
The following deadlock prevention schemes using timestamps have been proposed.
o Wait-Die scheme
o Wound wait scheme

The significant disadvantage of both of these techniques is that some transactions are aborted and
restarted unnecessarily even though those transactions never actually cause a deadlock.
Wait-Die scheme
In this scheme, if a transaction requests for a resource which is already held with a conflicting
lock by another transaction then the DBMS simply checks the timestamp of both transactions. It
allows the older transaction to wait until the resource is available for execution.
Let's assume there are two transactions Ti and Tj and let TS(T) is a timestamp of any transaction
T. If T2 holds a lock by some other transaction and T1 is requesting for resources held by T2
then the following actions are performed by DBMS:

1. Check if TS(Ti) < TS(Tj) - If Ti is the older transaction and Tj has held some resource,
then Ti is allowed to wait until the data-item is available for execution. That means if the
older transaction is waiting for a resource which is locked by the younger transaction,
then the older transaction is allowed to wait for resource until it is available.
2. Check if TS(Ti) < TS(Tj) - If Ti is older transaction and has held some resource and if Tj
is waiting for it, then Tj is killed and restarted later with the random delay but with the
same timestamp.
In the above representation, T1 is requesting transactions and T2 is the transaction holding the
lock on data item and t(Ti) is the timestamp of the transaction Ti.
Consider an example in which transactions having following timestamps:
If T1 request a data item is locked by transaction T2 then T1 has to wait until T2 completes and
all locks acquired by it are released because t(T1) < t(T2). On the other hand, if transaction T3
requests a data item locked by transaction T2 and T3 has to abort and rollback i.e. dies because
t(T3) < t(T2).
Deadlock Detection and Recovery
In the deadlock detection scheme, the deadlock detection algorithm checks the state of the
system periodically whether the deadlock has occurred or not, if the deadlock exists in the
system tries to recover from the deadlock.
In order to detect a deadlock the system must have the following information:The system
must provide an algorithm that uses this information i.e. the information about the current
allocations of data items to examine whether a system has entered a deadlock state or not. If the
deadlock exists then the system attempts to recover from the deadlock.
Recovery from Deadlock
If the wait for graph which is used for deadlock detection contains a deadlock situation i.e. there
exists cycles in it then those cycles should be removed to recover from the deadlock. The most
widely used technique of recovering from a deadlock is to rollback one or more transactions till
the system no longer displays a deadlock condition.
The selection of the transactions to be rolled back is based on the following deliberations:
Selection of victim: There may be many transactions which are involved in a deadlock i..e
deadlocked transaction. So to recover from the deadlock some of the transaction should be rolled
back, out of the possible transactions causing a deadlock. The one that is rolled back is known as
victim transaction and the mechanism is known as victim election.
The transactions to be rolled back are the one which has just started or has not made many
changes. Avoid selecting transactions that have made many updates and have been running for a
long time.
Rollback: Once the selection of the transaction to be rolled back is decided we should find out
how far the current transaction should be rolled back. One of the simplest solution is the total
rollback i.e. abort the transaction and restart it. However, the transaction should be rolled back to
the extent required to break the deadlock. Also, the additional information of the state of
currently executing transactions should be maintained.
Starvation: To recover from the deadlock, we must ensure that the same transaction should not
be selected again and again as a victim to rollback. The transaction will never complete if the
type of situation is not avoided. To avoid starvation, only a finite number of times a transaction
should be picked up as a victim.
A widely used solution is to include the number of rollbacks of the transaction that is selected as
the victim.
Failure Classification
To find that where the problem has occurred, we generalize a failure into the following
categories:
1. Transaction failure
2. System crash
3. Disk failure
1. Transaction failure
The transaction failure occurs when it fails to execute or when it reaches a point from
where it can't go any further. If a few transaction or process is hurt, then this is called as
transaction failure.
Reasons for a transaction failure could be -
1. Logical errors: If a transaction cannot complete due to some code error or an
internal error condition, then the logical error occurs.
2. Syntax error: It occurs where the DBMS itself terminates an active transaction
because the database system is not able to execute it. For example, The system
aborts an active transaction, in case of deadlock or resource unavailability.
2. System Crash
o System failure can occur due to power failure or other hardware or software
failure. Example: Operating system error.
Fail-stop assumption: In the system crash, non-volatile storage is assumed not to
be corrupted.
3. Disk Failure
o It occurs where hard-disk drives or storage drives used to fail frequently. It was a
common problem in the early days of technology evolution.
o Disk failure occurs due to the formation of bad sectors, disk head crash, and
unreachability to the disk or any other failure, which destroy all or part of disk
storage.
STORAGE: In DBMS (Database Management System), storage refers to the organization and
management of data on physical storage devices like disks or tapes, ensuring efficient data
handling and access for applications and queries. According to Talent Battle, the storage system
in DBMS plays a crucial role in ensuring efficient data handling, from rapid processing in
primary storage to long-term backup in tertiary storage.
Here's a more detailed explanation:
 Data Storage:
DBMS stores data as files on physical storage devices like disks, managing the allocation and
deallocation of storage space.
 Storage Management:
DBMS components, like the disk space manager and file manager, handle the interaction with
the storage devices, ensuring efficient data access and retrieval.
 File Organization:
DBMS uses various file organization techniques to store data efficiently, such as heap files,
sorted files, and indexed files.
 Storage Hierarchy:
Database storage systems often employ a hierarchy of storage devices, from fast primary
storage (like RAM) to slower secondary storage (like hard drives) and tertiary storage (like
tapes) for backups.
 Storage Engines:
Some DBMS systems use storage engines, which are specialized components responsible for
managing the storage of data, interacting with the file system, and providing efficient data
access.
 RAID:
Redundant Array of Independent Disks (RAID) technology is often used in DBMS to improve
performance, reliability, and scalability by distributing data across multiple storage devices.
 Metadata:
DBMS stores metadata, which is data about data, such as table names, column names, data
types, and constraints, to describe the structure and organization of the database.
o

Recovery and Atomicity in dbms

 When a system crashes, it may have several transactions being executed and various files
opened for them to modify the data items.
 But according to ACID properties of DBMS, atomicity of transactions as a whole must
be maintained, that is, either all the operations are executed or none.
 Database recovery means recovering the data when it get deleted, hacked or damaged
accidentally.
 Atomicity is must whether is transaction is over or not it should reflect in the database
permanently or it should not effect the database at all.
When a DBMS recovers from a crash, it should maintain the following −
 It should check the states of all the transactions, which were being executed.
 A transaction may be in the middle of some operation; the DBMS must ensure the
atomicity of the transaction in this case.
 It should check whether the transaction can be completed now or it needs to be rolled
back.
 No transactions would be allowed to leave the DBMS in an inconsistent state.
There are two types of techniques, which can help a DBMS in recovering as well as
maintaining the atomicity of a transaction −
 Maintaining the logs of each transaction, and writing them onto some stable storage
before actually modifying the database.
 Maintaining shadow paging, where the changes are done on a volatile memory, and later,
the actual database is updated
Log-Based Recovery
Log-based recovery is a widely used approach in database management systems to recover from
system failures and maintain atomicity and durability of transactions. The fundamental idea
behind log-based recovery is to keep a log of all changes made to the database, so that after a
failure, the system can use the log to restore the database to a consistent state.
How Log-Based Recovery Works
1. Transaction Logging:
For every transaction that modifies the database, an entry is made in the log. This entry typically
includes:
 Transaction ID: A unique identifier for the transaction.
 Data item identifier: Identifier for the specific item being modified.
 OLD value: The value of the data item before the modification.
 NEW value: The value of the data item after the modification.
We represent an update log record as <$T_i$ , $X_j$ , $V_1$, $V_2$>, indicating that
transaction $T_i$ has performed a write on data item $X_j$. $X_j$ had value $V_1$ before
the write, and has value $V_2$ after the write. Other special log records exist to record
significant events during transaction processing, such as the start of a transaction and the commit
or abort of a transaction. Among the types of log records are:
 <$T_i$ start>. Transaction Ti has started.
 <$T_i$ commit>. Transaction Ti has committed.
 <$T_i$ abort>. Transaction Ti has aborted.
2. Writing to the Log
Before any change is written to the actual database (on disk), the corresponding log entry is
stored. This is called the Write-Ahead Logging (WAL) principle. By ensuring that the log is
written first, the system can later recover and apply or undo any changes.
3. Checkpointing
Periodically, the DBMS might decide to take a checkpoint. A checkpoint is a point of
synchronization between the database and its log. At the time of a checkpoint:
 All the changes in main memory (buffer) up to that point are written to disk.
 A special entry is made in the log indicating a checkpoint. This helps in reducing the
amount of log that needs to be scanned during recovery.
4. Recovery Process
 Redo: If a transaction is identified (from the log) as having committed but its changes
have not been reflected in the database (due to a crash before the changes could be
written to disk), then the changes are reapplied using the 'After Image' from the log.
 Undo: If a transaction is identified as not having committed at the time of the crash, any
changes it made are reversed using the 'Before Image' in the log to ensure atomicity.
5. Commit/Rollback
Once a transaction is fully complete, a commit record is written to the log. If a transaction is
aborted, a rollback record is written, and using the log, the system undoes any changes made by
this transaction.
Benefits of Log-Based Recovery
 Atomicity: Guarantees that even if a system fails in the middle of a transaction, the
transaction can be rolled back using the log.
 Durability: Ensures that once a transaction is committed, its effects are permanent and
can be reconstructed even after a system failure.
 Efficiency: Since logging typically involves sequential writes, it is generally faster than
random access writes to a database.
Shadow paging - Its Working principle
Shadow Paging is an alternative disk recovery technique to the more common logging
mechanisms. It's particularly suitable for database systems. The fundamental concept behind
shadow paging is to maintain two page tables during the lifetime of a transaction: the current
page table and the shadow page table.
Here's a step-by-step breakdown of the working principle of shadow paging:
Initialization
When the transaction begins, the database system creates a copy of the current page table. This
copy is called the shadow page table.
The actual data pages on disk are not duplicated; only the page table entries are. This means both
the current and shadow page tables point to the same data pages initially.
During Transaction Execution
When a transaction modifies a page for the first time, a copy of the page is made. The current
page table is updated to point to this new page.
Importantly, the shadow page table remains unaltered and continues pointing to the original,
unmodified page.
Any subsequent changes by the same transaction are made to the copied page, and the current
page table continues to point to this copied page.
On Transaction Commit
Once the transaction reaches a commit point, the shadow page table is discarded, and the current
page table becomes the new "truth" for the database state.
The old data pages that were modified during the transaction (and which the shadow page table
pointed to) can be reclaimed.
Recovery after a Crash
If a crash occurs before the transaction commits, recovery is straightforward. Since the original
data pages (those referenced by the shadow page table) were never modified, they still represent
a consistent database state.
The system simply discards the changes made during the transaction (i.e., discards the current
page table) and reverts to the shadow page table.
Indexed File Organization
Indexed file organization is a method used to store and retrieve data in databases. It is designed
to provide quick random access to records based on key values. In this organization, an index is
created which helps in achieving faster search and access times.
Features Indexed File Organization:
 Primary Data File: The actual database file where records are stored.
 Index: An auxiliary file that contains key values and pointers to the corresponding
records in the data file.
 Multi-level Index: Sometimes, if the index becomes large, a secondary (or even tertiary)
index can be created on the primary index to expedite searching further.
Advantages Indexed File Organization:
 Quick Random Access: Direct access to records is possible using the index.
 Flexible Searches: Since an index provides a mechanism to jump directly to records,
different types of search operations (like range queries) can be efficiently supported.
 Ordered Access: If the primary file is ordered, then indexed file organization can
support efficient sequential access too.
Disadvantages Indexed File Organization:
 Overhead of Maintaining Index: Every time a record is added, deleted, or updated, the
index also needs to be updated. This can introduce overhead.
 Space Overhead: Indexes consume additional storage space.
 Complexity: Maintaining multiple levels of indexes can introduce complexity in terms of
design and implementation.
Practical Application: Consider a database that holds information about students.
where each student has a unique student ID. The main file would contain detailed records for
each student. A separate index file would contain student IDs and pointers to the location of the
detailed records in the main file. When you want to find a specific student's details, you first
search the index to find the pointer and then use that pointer to fetch the record directly from the
main file.
Note: Indexed file organization strikes a balance between the direct and sequential access of
records. While it offers faster search capabilities, there's a trade-off in terms of space and
maintenance overhead. It is ideal for databases where search operations are predominant, and
where the overhead of maintaining the index can be justified by the improved access times.
Indexed Sequential Access Method (ISAM)
ISAM is a popular method for indexed file organization. In ISAM:
 The primary file is stored in a sequential manner based on a primary key.
 There's a static primary index built on the primary key.
 Overflow areas are designated for insertion of new records, which keeps the main file in
sequence. Periodically, the overflow area can be merged back into the main file.
Indexing
Indexing involves creating an auxiliary structure (an index) to improve data retrieval times. Just
like the index in the back of a book, a database index provides pointers to the locations of
records.
Structure of Index
We can create indices using some columns of the database.
|-------------|----------------|
| Search Key | Data Reference |
|-------------|----------------|

 The search key is the database’s first column, and it contains a duplicate or copy of the
table’s candidate key or primary key. The primary key values are saved in sorted order so
that the related data can be quickly accessible.
 The data reference is the database’s second column. It contains a group of pointers that
point to the disk block where the value of a specific key can be found.
Types of Indexes:
1. Single-level Index: A single index table that contains pointers to the actual data records.
2. Multi-level Index: An index of indexes. This hierarchical approach reduces the number
of accesses (disk I/O operations) required to find an entry.
3. Dense and Sparse Indexes:
o In a dense index, there's an index entry for every search key value in the database.
o In a sparse index, there are fewer index entries. One entry might point to several
records.
4. Primary and Secondary Indexes:
o A primary index is an ordered file whose records are of fixed length with two
fields. The first field is the same as the primary key, and the second field is a
pointer to the data block. There's a one-to-one relationship between the number of
entries in the index and the number of records in the main file.
o A secondary index provides a secondary means of accessing data. For each
secondary key value, the index points to all the records with that key value.
5. Clustered vs. Non-clustered Index:
o In a clustered index, the rows of data in the table are stored on disk in the same
order as the index. There can only be one clustered index per table.
o In a non-clustered index, the order of rows does not match the index's order. You
can have multiple non-clustered indexes.
6. Bitmap Index: Used mainly for data warehousing setups, a bitmap index uses bit arrays
(bitmaps) and usually involves columns that have a limited number of distinct values.
7. B-trees and B+ trees: Balanced tree structures that ensure logarithmic access time. B+
trees are particularly popular in DBMS for their efficiency in disk I/O operations.
Benefits of Indexing:
 Faster search and retrieval times for database operations.
Drawbacks of Indexing:
 Overhead for insert, update, and delete operations, as indexes need to be maintained.
 Additional storage requirements for the index structures.
B+ Tree
o The B+ tree is a balanced binary search tree. It follows a multi-level index format.
o In the B+ tree, leaf nodes denote actual data pointers. B+ tree ensures that all leaf nodes
remain at the same height.
o In the B+ tree, the leaf nodes are linked using a link list. Therefore, a B+ tree can support
random access as well as sequential access.
Structure of B+ Tree
o In the B+ tree, every leaf node is at equal distance from the root node. The B+ tree is of
the order n where n is fixed for every B+ tree.
o It contains an internal node and leaf node.

Internal node
o An internal node of the B+ tree can contain at least n/2 record pointers except the root
node.
o At most, an internal node of the tree contains n pointers.

Leaf node
o The leaf node of the B+ tree can contain at least n/2 record pointers and n/2 key values.
o At most, a leaf node contains n record pointer and n key values.
o Every leaf node of the B+ tree contains one block pointer P to point to next leaf node.

Searching a record in B+ Tree

Suppose we have to search 55 in the below B+ tree structure. First, we will fetch for the
intermediary node which will direct to the leaf node that can contain a record for 55.
So, in the intermediary node, we will find a branch between 50 and 75 nodes. Then at the end,
we will be redirected to the third leaf node. Here DBMS will perform a sequential search to find
55.

B+ Tree Insertion
Suppose we want to insert a record 60 in the below structure. It will go to the 3rd leaf node after
55. It is a balanced tree, and a leaf node of this tree is already full, so we cannot insert 60 there.
In this case, we have to split the leaf node, so that it can be inserted into tree without affecting
the fill factor, balance and order.
The 3rd leaf node has the values (50, 55, 60, 65, 70) and its current root node is 50. We will split
the leaf node of the tree in the middle so that its balance is not altered. So we can group (50, 55)
and (60, 65, 70) into 2 leaf nodes.
If these two has to be leaf nodes, the intermediate node cannot branch from 50. It should have 60
added to it, and then we can have pointers to a new leaf node.

This is how we can insert an entry when there is overflow. In a normal scenario, it is very easy to
find the node where it fits and then place it in that leaf node.
B+ Tree Deletion
Suppose we want to delete 60 from the above example. In this case, we have to remove 60 from
the intermediate node as well as from the 4th leaf node too. If we remove it from the
intermediate node, then the tree will not satisfy the rule of the B+ tree. So we need to modify it to
have a balanced tree.
After deleting node 60 from above B+ tree and re-arranging the nodes, it will show as follows:

B+ Trees
A B+ Tree is a type of self-balancing tree structure commonly used in databases and file systems
to maintain sorted data in a way that allows for efficient insertion, deletion, and search
operations. B+ Trees are an extension of B-Trees but differ mainly in the way they handle leaf
nodes, which contain all the key values and point to the actual records.
A B+ Tree of order `n` has the following properties:
1. Every node has a maximum of `n` children.
2. Every node (except the root) has a minimum of `n/2` children.
3. The tree is perfectly balanced, meaning that all leaf nodes are at the same level.
4. All keys are stored in the leaf nodes, and the internal nodes act as 'guides' to locate the
leaf nodes faster.
Operations on B+ Trees:
1. Search: Starts at the root and traverses down the tree, guided by the key values in each
node, until it reaches the appropriate leaf node.
2. Insert: Inserts a new key-value pair and then reorganizes the tree as needed to maintain its
properties.
3. Delete: Removes a key-value pair and then reorganizes the tree, again to maintain its
properties.
Example of B+ Tree Operations
Let's say we have a B+ Tree of order 4, and we want to insert the keys `[10, 20, 5, 6, 12, 30, 7,
17]` into an initially empty tree.
Let's say we have a B+ Tree of order 4, and we want to insert the keys `[10, 20, 5, 6, 12, 30, 7, 17]` into an initially
empty tree.

-------------------
|| || || ||
-------------------

Insertion
1. Insert 10:
- The tree is empty, so 10 becomes the root.

[10]
2. Insert 20:
- There's room in the leaf node for 20.

[10, 20]
3. Insert 5:
- Still room in the leaf node for 5.

[5, 10, 20]

4. Insert 6:
- The leaf node is full; split it and promote the smallest key in the right node to be the new root.

[10]
/ \
[5, 6] [10, 20]
5. Insert 12:
- Insert into the appropriate leaf node.

[10, , ]
/ \
[5, 6] [10, 12, 20]
6. Insert 30:
- Need to split the right leaf node, promote 20.

[10 , 20 , ]
/ | \
[5, 6] [10, 12] [20, 30]

7. Insert 7:
- Insert into the appropriate leaf node.
[10 , 20 , ]
/ | \
[5, 6, 7] [10, 12] [20, 30]
8. Insert 17:
- Insert into the appropriate leaf node and split.

[10 , 20 , ]
/ | \
[5, 6, 7] [10, 12] [17, 20, 30]

- Here, the middle internal node gets split, and 17 is promoted.

[10 , 17 , 20 ]
/ | | \
[5, 6, 7] [10, 12] [17] [20, 30]
Search (for 12):
- Start at the root, go down the second child because 12 > 10 and 12 < 17, and find 12 in the
corresponding leaf node.
Deletion (of 10):

1. Remove 10 from the leaf node.

[10 , 17 , 20 ]
/ | | \
[5, 6, 7] [12] [17] [20, 30]
2. Since the key 10 is also present in the internal node, we replace it with its in-order predecessor
(or successor based on design), which is 7.

[7 , 17 , 20 ]
/ | | \
[5, 6] [7, 12] [17] [20, 30]
And that's how B+ Trees work for search, insert, and delete operations. B+ Trees are dynamic,
adapting efficiently as keys are added or removed, which makes them quite useful for databases
where high-speed data retrieval is crucial.
Hash Based Indexing:

For a huge database structure, it can be almost next to impossible to search all the index values
through all its level and then reach the destination data block to retrieve the desired data.
Hashing is an effective technique to calculate the direct location of a data record on the disk
without using index structure. Hashing uses hash functions with search keys as parameters to
generate the address of a data record.

Hash Organization

Bucket − A hash file stores data in bucket format. Bucket is considered a unit of storage. A
bucket typically stores one complete disk block, which in turn can store one or more records.
Hash Function − A hash function, h, is a mapping function that maps all the set of search- search
keys K to the address where actual records are placed. It is a function from search keys to bucket
addresses.

Static Hashing

In static hashing, when a search-key

key value is provided, the hash function always computes the
same address. For example, if mod
mod-44 hash function is used, then it shall generate only 5 values.
The output address shall always be same for that function. The number of buckets provided
remains unchanged at all times.

Operation
Insertion − When a record is required to be entered using static hash, the hash
function h computes the bucket address for search key K,, where the record will be stored.
Bucket address = h(K)
Search − When a record needs to be retrieved, the same hash function can be used to retrieve the
address of the bucket where the data is stored.
Delete − This is simply a search followed by a deletion operation.
Bucket Overflow The conditionndition of bucket
bucket-overflow is known as collision.. This is a fatal state
for any static hash function. In this case, overflow chaining can be used.
Overflow Chaining − When buckets are full, a new bucket is allocated for the same hash result
and is linked after
fter the previous one. This mechanism is called Closed Hashing.
 Linear Probing − When a hash function generates an address at which data is already stored,
the next free bucket is allocated to it. This mechanism is called Open Hashing.

Dynamic Hashing
The problem with static hashing is that it does not expand or shrink dynamically as the size of
the database grows or shrinks. Dynamic hashing provides a mechanism in which data buckets are
added and removed dynamically and on on-demand. Dynamic hashing is also so known as extended
hashing.Hash
.Hash function, in dynamic hashing, is made to produce a large number of values and
only a few are used initially.

Organization
The prefix of an entire hash value is taken as a hash index. Only a portion of the hash value is
used for computing bucket addresses. Every hash index has a depth value to signify how many
bits are used for computing a hash function. These bits can address 2n buckets. When all these
bits are consumed − that is, when all the buckets are full − then the depth value is increased
linearly and twice the buckets are allocated.
Operation
Querying − Look at the depth value of the hash index and use those bits to compute the bucket
address.
Update − Perform a query as above and update the data.
Deletion − Perform rm a query to locate the desired data and delete the same.
Insertion − Compute the address of the bucket
If the bucket is full.
Add more buckets already.
Add additional bits to the hash value.
Re-compute the hash function.
Else
Add data to the bucket,
If all the buckets are full, perform the remedies of static hashing.

Hashing is not favorable when the data is organized in some ordering and the queries require a
range of data. When data is discrete and random, hash performs the best.

Hashing algorithms have high complexity than indexing. All hash operations are done in
constant time.

Hash-Based Indexing
In hash-based indexing, a hash function is used to convert a key into a hash code. This hash code
serves as an index where the value associated with that key is stored. The goal is to distribute the
keys uniformly across an array, so that access time is, on average, constant.
Let's break down some of these elements to further understand how hash-based indexing works
in practice:
Buckets
In hash-based indexing, the data space is divided into a fixed number of slots known as
"buckets." A bucket usually contains a single page (also known as a block), but it may have
additional pages linked in a chain if the primary page becomes full. This is known as overflow.
Hash Function
The hash function is a mapping function that takes the search key as an input and returns the
bucket number where the record should be located. Hash functions aim to distribute records
uniformly across buckets to minimize the number of collisions (two different keys hashing to the
same bucket).
Disk I/O Efficiency
Hash-based indexing is particularly efficient when it comes to disk I/O operations. Given a
search key, the hash function quickly identifies the bucket (and thereby the disk page) where the
desired record is located. This often requires only one or two disk I/Os, making the retrieval
process very fast.
Insert Operations
When a new record is inserted into the dataset, its search key is hashed to find the appropriate
bucket. If the primary page of the bucket is full, an additional overflow page is allocated and
linked to the primary page. The new record is then stored on this overflow page.
Search Operations
To find a record with a specific search key, the hash function is applied to the search key to
identify the bucket. All pages (primary and overflow) in that bucket are then examined to find
the desired record.
Limitations
Hash-based indexing is not suitable for range queries or when the search key is not known. In
such cases, a full scan of all pages is required, which is resource-intensive.
Hash-Based Indexing Example
Let's consider a simple example using employee names as the search key.
Employee Records
| Name | Age | Salary
|-----------|----------|--------
| Alice | 28 | 50000
| Bob | 35 | 60000
| Carol | 40 | 70000

Hash Function: H(x) = ASCII value of first letter of the name mod 3
 Alice: 65 mod 3 = 2
 Bob: 66 mod 3 = 0
 Carol: 67 mod 3 = 1
Buckets:
Bucket 0: Bob
Bucket 1: Carol
Bucket 2: Alice
Pros of Hash-Based Indexing
 Extremely fast for exact match queries.
 Well-suited for equality comparisons.
Cons of Hash-Based Indexing
 Not suitable for range queries (e.g., "SELECT * FROM table WHERE age BETWEEN
20 AND 30").
 Performance can be severely affected by poor hash functions or a large number of
collisions.
Tree-based Indexing
The most commonly used tree-based index structure is the B-Tree, and its variations like B+
Trees and B* Trees. In tree-based indexing, data is organized into a tree-like structure. Each
node represents a range of key values, and leaf nodes contain the actual data or pointers to the
data.
Why Tree-based Indexing?
Tree-based indexes like B-Trees offer a number of advantages:
 Sorted Data: They maintain data in sorted order, making it easier to perform range
queries.
 Balanced Tree: B-Trees and their variants are balanced, meaning the path from the root
node to any leaf node is of the same length. This balancing ensures that data retrieval
times are consistently fast, even as the dataset grows.
 Multi-level Index: Tree-based indexes can be multi-level, which helps to minimize the
number of disk I/Os required to find an item.
 Dynamic Nature: B-Trees are dynamic, meaning they're good at inserting and deleting
records without requiring full reorganization.
 Versatility: They are useful for both exact-match and range queries.

Tree-based Indexing Example

Continuing with the "Students" table:
ID Name
1 Abhi
2 Bharath
3 Chinni
4 Devid
A simplified B-Tree index could look like this:

[1, 3]
/ \
[1] [3, 4]
/ \ / \
1 2 3 4

In the tree, navigating from the root to the leaf nodes will lead us to the desired data record.
Pros of Tree-based Indexing:
 Efficient for range queries.
 Good for both exact and partial matches.
 Keeps data sorted.
Cons of Tree-based Indexing:
 Slower than hash-based indexing for exact queries.
 More complex to implement and maintain.

Dbms Notes Unit 4
No ratings yet
Dbms Notes Unit 4
45 pages
DBMS UNIT-III
No ratings yet
DBMS UNIT-III
24 pages
M7 Transaction Management
No ratings yet
M7 Transaction Management
14 pages
UNIT4
No ratings yet
UNIT4
43 pages
Relational Database Management Systems - I
No ratings yet
Relational Database Management Systems - I
2 pages
Versions 5.4 and 5.5 Technical Guide: IBM Tivoli Storage Manager
No ratings yet
Versions 5.4 and 5.5 Technical Guide: IBM Tivoli Storage Manager
408 pages
Dbms - r18 Unit 4 Notes
No ratings yet
Dbms - r18 Unit 4 Notes
33 pages
Sppu dbms unit 5
No ratings yet
Sppu dbms unit 5
15 pages
UNIT 5 NOTES
No ratings yet
UNIT 5 NOTES
64 pages
Transaction Online
No ratings yet
Transaction Online
27 pages
DBMS_UNTI-5
No ratings yet
DBMS_UNTI-5
75 pages
unit 4 ADBMS
No ratings yet
unit 4 ADBMS
12 pages
UNIT-5
No ratings yet
UNIT-5
60 pages
Module 6
No ratings yet
Module 6
35 pages
What is a Transaction
No ratings yet
What is a Transaction
4 pages
DBMS_Module6
No ratings yet
DBMS_Module6
94 pages
DBMS_UNTI-5
No ratings yet
DBMS_UNTI-5
57 pages
U4T1_Transaction_Concept
No ratings yet
U4T1_Transaction_Concept
5 pages
DBMS Unit 4 Notes
No ratings yet
DBMS Unit 4 Notes
29 pages
Types of Failure and ACID Property (Basics Transaction) by Aditi Waghela
100% (2)
Types of Failure and ACID Property (Basics Transaction) by Aditi Waghela
5 pages
DBMS-Module - 5 Updated
No ratings yet
DBMS-Module - 5 Updated
98 pages
DBMS-UNIT-4
No ratings yet
DBMS-UNIT-4
61 pages
UNIT V dbms notes
No ratings yet
UNIT V dbms notes
46 pages
3 week-2-lecture-1-10102024-010318pm-11022025-091226am
No ratings yet
3 week-2-lecture-1-10102024-010318pm-11022025-091226am
21 pages
IDs notes
No ratings yet
IDs notes
52 pages
Chapter 6 Database Management (1)
No ratings yet
Chapter 6 Database Management (1)
71 pages
THE SECRET CODE OF JAPANESE CANDLESTICKS
No ratings yet
THE SECRET CODE OF JAPANESE CANDLESTICKS
261 pages
Transaction Management
No ratings yet
Transaction Management
12 pages
De Lab Manual
No ratings yet
De Lab Manual
40 pages
DBMS Unit-4
No ratings yet
DBMS Unit-4
19 pages
Chapter 5
No ratings yet
Chapter 5
24 pages
R22 Unit-4
No ratings yet
R22 Unit-4
29 pages
Transaction Management in DBMS
No ratings yet
Transaction Management in DBMS
7 pages
Introduction To Transaction:: Active State
No ratings yet
Introduction To Transaction:: Active State
11 pages
6 Transaction Processing and Management
No ratings yet
6 Transaction Processing and Management
10 pages
DBMS 2
No ratings yet
DBMS 2
25 pages
Chapter 7 - Part 1
No ratings yet
Chapter 7 - Part 1
157 pages
ADBMSUnit3pptx__2023_10_06_09_10_59
No ratings yet
ADBMSUnit3pptx__2023_10_06_09_10_59
60 pages
Transaction Management in DBMS
No ratings yet
Transaction Management in DBMS
8 pages
UNIT-5 notes[1]
No ratings yet
UNIT-5 notes[1]
22 pages
CH8 Slide
No ratings yet
CH8 Slide
97 pages
Artificial Intelligence and P___
No ratings yet
Artificial Intelligence and P___
7 pages
Transaction
No ratings yet
Transaction
5 pages
ADBMS Unit 2
No ratings yet
ADBMS Unit 2
59 pages
De Programs2
No ratings yet
De Programs2
16 pages
Chapter 14
No ratings yet
Chapter 14
56 pages
DBMS unit4
No ratings yet
DBMS unit4
51 pages
Adobe Scan 09 Dec 2024
No ratings yet
Adobe Scan 09 Dec 2024
8 pages
DB Transaction-1
No ratings yet
DB Transaction-1
3 pages
Transaction Management in DBMS
No ratings yet
Transaction Management in DBMS
6 pages
DE UNIT 4
No ratings yet
DE UNIT 4
33 pages
Unit 4RTH Notes (DBMS)
No ratings yet
Unit 4RTH Notes (DBMS)
18 pages
CSE_CSPC403_DBMS (1)
No ratings yet
CSE_CSPC403_DBMS (1)
12 pages
JavaScript SEO Essentials
No ratings yet
JavaScript SEO Essentials
18 pages
Lec 7
No ratings yet
Lec 7
26 pages
Transactions
No ratings yet
Transactions
20 pages
DBMS UNIT-4
No ratings yet
DBMS UNIT-4
29 pages
Data Warehousing Management
No ratings yet
Data Warehousing Management
18 pages
04 Advanced SQL Commands
No ratings yet
04 Advanced SQL Commands
78 pages
Load Example SQLServer
No ratings yet
Load Example SQLServer
43 pages
Professional SQL Server 2012 Internals and Troubleshooting
From Everand
Professional SQL Server 2012 Internals and Troubleshooting
Christian Bolton
4/5 (4)
Holiday Homework Computer Science Part1
No ratings yet
Holiday Homework Computer Science Part1
14 pages
UNIT 5 Notes
No ratings yet
UNIT 5 Notes
47 pages
Database II
No ratings yet
Database II
17 pages
Unit 3 Transactions
No ratings yet
Unit 3 Transactions
16 pages
CO4 Notes Transaction
No ratings yet
CO4 Notes Transaction
15 pages
Dbms - r18 Unit 4 Notes
100% (2)
Dbms - r18 Unit 4 Notes
29 pages
DBMS4
No ratings yet
DBMS4
23 pages
Term Paper On Database Management System
100% (1)
Term Paper On Database Management System
4 pages
01 Spatial Database 22-9-2016
No ratings yet
01 Spatial Database 22-9-2016
60 pages
Unit-4 DBMS
No ratings yet
Unit-4 DBMS
7 pages
TRANSACTION
No ratings yet
TRANSACTION
5 pages
UNIT-IV-TRANSACTION CONCEPT
No ratings yet
UNIT-IV-TRANSACTION CONCEPT
50 pages
Fpse 64
No ratings yet
Fpse 64
4 pages
MMS - Chapter 4 - Optical Storage
No ratings yet
MMS - Chapter 4 - Optical Storage
29 pages
1st Place Solution To Google Landmark Retrieval 2020 Modified
No ratings yet
1st Place Solution To Google Landmark Retrieval 2020 Modified
3 pages
CIS017-1 - CIS095-1 - Assignment 1 (Design and Implement A Database) Report Template 2020-2021-16!3!2021
No ratings yet
CIS017-1 - CIS095-1 - Assignment 1 (Design and Implement A Database) Report Template 2020-2021-16!3!2021
7 pages
COLA-070071 - Unit 04 - Database Design and Development
No ratings yet
COLA-070071 - Unit 04 - Database Design and Development
86 pages
Concepts of Database Management Eighth Edition
No ratings yet
Concepts of Database Management Eighth Edition
47 pages
The Entrepreneur’S Dictionary of Business and Financial Terms
From Everand
The Entrepreneur’S Dictionary of Business and Financial Terms
Khwaja Masoom
No ratings yet
Assignment Brief BTEC Level 4-5 HNC/HND Diploma (QCF) : Merit and Distinction Descriptor
100% (1)
Assignment Brief BTEC Level 4-5 HNC/HND Diploma (QCF) : Merit and Distinction Descriptor
11 pages
Biw Basics
100% (1)
Biw Basics
109 pages
Websites That Pay You To Promote Them - SEO Tools
100% (1)
Websites That Pay You To Promote Them - SEO Tools
1 page
Azure Data Engineer Resume - Hire IT People - We Get IT Done
100% (1)
Azure Data Engineer Resume - Hire IT People - We Get IT Done
4 pages
Unit-4 DBMS Notes
No ratings yet
Unit-4 DBMS Notes
32 pages
Assignment No. 9
No ratings yet
Assignment No. 9
6 pages
Informatica ETL Beginner's Guide Informatica Tutorial Edureka
No ratings yet
Informatica ETL Beginner's Guide Informatica Tutorial Edureka
40 pages
Binary Search Tree: Basic Operations
No ratings yet
Binary Search Tree: Basic Operations
4 pages
Financial Accounting Handbook
From Everand
Financial Accounting Handbook
Dr. Anis I. Milad D.B.A. S.C.P.M
No ratings yet
Performance and Monitoring
No ratings yet
Performance and Monitoring
36 pages
21CS71-model-set-1-paper
No ratings yet
21CS71-model-set-1-paper
2 pages
Tutorial Class ERD PDF
No ratings yet
Tutorial Class ERD PDF
15 pages
Select Command in Abap
No ratings yet
Select Command in Abap
2 pages
Oracle 11g Certified Database Administrator
No ratings yet
Oracle 11g Certified Database Administrator
0 pages
Configuration Example: SAP Electronic Bank Statement (SAP - EBS)
From Everand
Configuration Example: SAP Electronic Bank Statement (SAP - EBS)
Conrad Jarrett
3/5 (1)

DBMS UNIT-5

Uploaded by

DBMS UNIT-5

Uploaded by

UNIT-5

Transaction Concept: Transaction State, ACID properties, Concurrent Executions,

o The transaction is a set of logically related operation. It contains a group of tasks.

What are another example of an ACID transaction?

Concurrent Executions in DBMS

Result: T1's updates are lost.

They belong to different transactions.

| | Rollback (because it read uncommitted changes from T1)

Isolation level Dirty Read Unrepetable Read

Read Committed NO Maybe

Read Uncommitted Maybe Maybe

Shared Lock Exclusive Lock

The difference between shared lock and exclusive lock?

Pros of Two-Phase Locking (2PL)

Timestamp Based Protocols in DBMS

For W1(X) : TS(Ti)<RTS(X) i.e.

Recovery and Atomicity in dbms

Searching a record in B+ Tree

[5, 10, 20]

- Here, the middle internal node gets split, and 17 is promoted.

1. Remove 10 from the leaf node.

In static hashing, when a search-key

Tree-based Indexing Example

You might also like