DBMS UNIT-5

Introduction to Transactions
• A transaction is a set of logically related operations; it groups several
tasks into a single unit of work.
• A transaction is an action, or a series of actions, performed by a
single user to access and modify the contents of the
database.
Operations in Transaction
• A transaction carries out some logical set of operations through a certain
sequence of steps. For example, when we go to withdraw
money from an ATM, we encounter the following set of operations:
• Transaction Initiated
• You have to insert an ATM card
• Select your choice of language
• Select whether savings or current account
• Enter the amount to withdraw
• Entering your ATM pin
• Transaction processes
• You collect the cash
• You press finish to end the transaction
• The above-mentioned steps are the set of operations performed by you.
• In the case of a transaction in a DBMS, there are three major operations
used for the transaction to execute efficiently. These are:
• 1. Read/Access Data  2. Write/Change Data  3. Commit
• Let's understand these three operations in a transaction
with a real-life example of transferring money from Account1 to
Account2.
• Initial balances in both accounts before the start of the transaction:
• Account1 - ₹ 5000 and Account2 - ₹ 2000
• Before the transaction starts, this data is stored in secondary
memory (hard disk); once the transaction is initiated, it is brought into the primary
memory (RAM) of the system for faster and better access.
• Now, for a transfer of ₹ 500 from Account1 to Account2, the
following set of operations will take place.
• Read (Account1) --> 5000
• Account1 = Account1 - 500
• Write (Account1) --> 4500
• Read (Account2) --> 2000
• Account2 = Account2 + 500
• Write (Account2) --> 2500
• Commit
• After the commit operation the transaction ends, with the updated values
Account1 = ₹ 4500 and Account2 = ₹ 2500. A small sketch of these read/write/commit steps is shown below.
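The following is a minimal Python sketch of the read → compute → write → commit sequence above. The in-memory dictionaries and the function names (read_item, write_item, commit) are illustrative assumptions for this sketch, not an actual DBMS API.

# Minimal illustration of Read/Write/Commit for the Account1 -> Account2 transfer.
# 'disk' stands in for secondary storage, 'buffer' for primary memory (RAM).
disk = {"Account1": 5000, "Account2": 2000}   # balances before the transaction
buffer = {}

def read_item(item):
    """Read: bring the data item from disk into the buffer and return its value."""
    buffer[item] = disk[item]
    return buffer[item]

def write_item(item, value):
    """Write: change the data item in the buffer only (not yet permanent)."""
    buffer[item] = value

def commit():
    """Commit: make all buffered changes permanent on disk."""
    disk.update(buffer)
    buffer.clear()

a = read_item("Account1")         # Read (Account1) --> 5000
write_item("Account1", a - 500)   # Write (Account1) --> 4500
b = read_item("Account2")         # Read (Account2) --> 2000
write_item("Account2", b + 500)   # Write (Account2) --> 2500
commit()                          # Account1 = 4500, Account2 = 2500 on disk
print(disk)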
Transaction States in DBMS
• These are the states a transaction passes through during its lifetime. They
indicate the current status of the transaction and determine how its
processing continues, governing whether the transaction will eventually
commit or abort.
• Transactions also use a transaction log. The transaction log is a file maintained by the
recovery management component to record all the activities of the
transaction. Once the commit is done, the corresponding log records can be discarded.
These are the different types of transaction states:
• Active State –
When the instructions of the transaction are running then the
transaction is in active state.
• Partially Committed – If all the ‘read and write’ operations are
performed without any error then it goes to the “partially committed
state”.
• Failed State –
When any instruction of the transaction fails, or a failure occurs while
making a permanent change of data in the database, the transaction goes to the
"failed state".
• Aborted State –
After any type of failure, the transaction moves from the "failed
state" to the "aborted state". Since in the previous states the changes were
made only to the local buffer or main memory, these changes
are deleted or rolled back.
• Committed State –
It is the state when the changes are made permanent on the Data
Base and the transaction is complete.
• Terminated State –
If there is no roll-back pending, or the transaction arrives from the
"committed state", then the system is consistent and ready for a new
transaction, and the old transaction is terminated.
• Atomicity:
• By this, we mean that either the entire transaction takes place at once or doesn’t happen at
all. There is no midway i.e. transactions do not occur partially. Each transaction is considered
as one unit and either runs to completion or is not executed at all. It involves the following
two operations.
—Abort: If a transaction aborts, changes made to the database are not visible.
—Commit: If a transaction commits, changes made are visible.
Atomicity is also known as the ‘All or nothing rule’.
• Consider the following transaction T consisting of T1 and T2: Transfer of 100 from
account X to account Y.
• Consistency:
• This means that integrity constraints must be maintained so that the
database is consistent before and after the transaction. It refers to the
correctness of a database. Referring to the example above,
The total amount before and after the transaction must be
maintained.
Total before T occurs = 500 + 200 = 700.
Total after T occurs = 400 + 300 = 700.
Therefore, the database is consistent.
Isolation:
• This property ensures that multiple transactions can occur concurrently without
leading to the inconsistency of the database state. Transactions occur
independently without interference. Changes occurring in a particular transaction
will not be visible to any other transaction until that particular change in that
transaction is written to memory or has been committed. This property ensures
that concurrent execution of transactions results in a state that is
equivalent to a state achieved if they had been executed serially in some order.
Let X = 500, Y = 500, and consider two transactions T and T″ operating on them.
Durability:
• This property ensures that once the transaction has completed
execution, the updates and modifications to the database are stored
in and written to disk and they persist even if a system failure occurs.
These updates now become permanent and are stored in non-volatile
memory. The effects of the transaction, thus, are never lost.
Transaction Processing and Recovery
• There are mainly two types of recovery techniques used in DBMS:
• Rollback/Undo Recovery Technique: The rollback/undo recovery
technique is based on the principle of backing out or undoing the effects
of a transaction that has not completed successfully due to a system
failure or error.
• Commit/Redo Recovery Technique: The commit/redo recovery
technique is based on the principle of reapplying the changes made by a
transaction that has been completed successfully to the database.
• system log: It contains information about the start and end of each
transaction and any updates which occur during the transaction. The log
keeps track of all transaction operations that affect the values of database
items. This information is needed to recover from transaction failure.
• Deferred update – This technique does not physically update the database on
disk until a transaction has reached its commit point. Before reaching commit, all
transaction updates are recorded in the local transaction workspace. If a
transaction fails before reaching its commit point, it will not have changed the
database in any way, so UNDO is not needed. It may be necessary to REDO the
effect of the operations that are recorded in the local transaction workspace,
because their effect may not yet have been written to the database. Hence,
deferred update is also known as the NO-UNDO/REDO algorithm (see the sketch below).
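Below is a minimal Python sketch of deferred update, assuming a simple dict-based "database" and an append-only redo log; the names (workspace, redo_log, run_transaction) are illustrative, not a real DBMS interface.

# Deferred update (NO-UNDO/REDO): writes go to a local workspace and a redo log;
# the database on disk is touched only after the commit point.
database = {"X": 100}          # persistent data (stand-in for disk)
redo_log = []                  # log records written before the commit point

def run_transaction(updates, fail_before_commit=False):
    workspace = {}             # local transaction workspace
    for item, value in updates.items():
        workspace[item] = value
        redo_log.append(("write", item, value))
    if fail_before_commit:
        # Transaction failed before its commit point: the database was never
        # changed, so no UNDO is needed; the workspace is simply discarded.
        return
    redo_log.append(("commit",))
    # After the commit point, REDO the logged writes against the database.
    for item, value in workspace.items():
        database[item] = value

run_transaction({"X": 150})                            # committed: database updated
run_transaction({"X": 999}, fail_before_commit=True)   # failed: database unchanged
print(database)   # {'X': 150}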

• Immediate update – In the immediate update technique, the database may be updated by
some operations of a transaction before the transaction reaches its commit point.
However, these operations are recorded in a log on disk before they are applied to
the database, making recovery still possible. If a transaction fails to reach its
commit point, the effect of its operations must be undone, i.e. the transaction must
be rolled back; hence we require both undo and redo. This technique is known
as the UNDO/REDO algorithm (a sketch follows below).
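A complementary Python sketch of immediate update (UNDO/REDO), again with made-up names: old values are logged before each in-place write so a failed transaction can be rolled back.

# Immediate update (UNDO/REDO): the database is changed in place before commit,
# but each change is first recorded in a log with the old value for possible UNDO.
database = {"X": 100}
log = []   # records of the form ("write", item, old_value, new_value)

def run_transaction(updates, fail_before_commit=False):
    txn_records = []
    for item, value in updates.items():
        rec = ("write", item, database[item], value)
        log.append(rec)            # write-ahead: log before applying the change
        txn_records.append(rec)
        database[item] = value     # apply immediately (before the commit point)
    if fail_before_commit:
        # UNDO: restore this transaction's old values in reverse order.
        for op, item, old, new in reversed(txn_records):
            database[item] = old
        return
    log.append(("commit",))

run_transaction({"X": 150})                            # committed in place
run_transaction({"X": 999}, fail_before_commit=True)   # rolled back via UNDO
print(database)   # {'X': 150}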
Checkpoint
• A checkpoint is a process that saves the current state of the database to disk.
• This includes all transactions that have been committed, as well as any changes
that have been made to the database but not yet committed.
• The checkpoint process also includes a log of all transactions that have occurred
since the last checkpoint.
• This log is used to recover the database in the event of a system failure or crash.
• When a checkpoint occurs, the DBMS will write a copy of the current state of the
database to disk.
• This is done to ensure that the database can be recovered quickly in the event of a
failure.
Importance of checkpoint
• Checkpoints not only play a crucial role in recovery, but they can also be used to
improve the performance of a DBMS.
• One way in which they do this is by reducing the amount of work that needs to be done
during recovery.
• As the DBMS writes a copy of the current state of the database to disk during a
checkpoint, it also discards any unnecessary information, such as old data or temporary
files.
• This helps to keep the database clean and optimized for performance.
• Another way in which checkpoints can be used to improve performance is by reducing
the amount of data that needs to be read from disk during recovery.
• When a system failure occurs, the DBMS reads the data from the checkpoint and
transaction log to rebuild the database.
• By configuring the checkpoint intervals appropriately, the DBMS can minimize the
amount of data that needs to be read from disk, which can significantly improve the
recovery time.
Steps performed during checkpoint
• The recovery system reads log files from the end to start. It reads log files from T4 to T1.
• Recovery system maintains two lists, a redo-list, and an undo-list.
• A transaction is put into the redo-list if the recovery system sees a log containing both <Tn, Start>
and <Tn, Commit>, or just <Tn, Commit>. All the transactions in the redo-list are
redone, and their log records are retained.
• For example: In the log file, transactions T2 and T3 have both <Tn, Start> and <Tn,
Commit>. Transaction T1 has only <Tn, Commit> in the log file, because it
committed after the checkpoint was crossed. Hence T1, T2 and T3 are put
into the redo-list.
• A transaction is put into the undo-list if the recovery system sees a log containing <Tn, Start>
but no commit or abort record. All the transactions in the undo-list are undone, and
their log records are removed.
• For example: Transaction T4 has only <Tn, Start>, so T4 is put into the undo-list, since
this transaction is not yet complete and failed midway (a sketch of this list-building step is shown below).
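Here is a small Python sketch of how a recovery manager might build the redo and undo lists by scanning log records written after the last checkpoint; the log record format (tuples such as ("start", "T4")) is an assumption made for illustration.

# Build redo/undo lists from log records that follow the last checkpoint.
# Records are (action, transaction) tuples; the format is illustrative only.
log = [
    ("commit", "T1"),            # T1 started before the checkpoint, committed after it
    ("start", "T2"), ("commit", "T2"),
    ("start", "T3"), ("commit", "T3"),
    ("start", "T4"),             # T4 never committed or aborted
]

def build_recovery_lists(records):
    started, committed = set(), set()
    for action, txn in records:
        if action == "start":
            started.add(txn)
        elif action == "commit":
            committed.add(txn)
    redo_list = sorted(committed)            # committed transactions: redo
    undo_list = sorted(started - committed)  # started but never finished: undo
    return redo_list, undo_list

redo, undo = build_recovery_lists(log)
print("redo:", redo)   # ['T1', 'T2', 'T3']
print("undo:", undo)   # ['T4']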
Advantages of ACID Properties in DBMS:
• Data Consistency: ACID properties ensure that the data remains
consistent and accurate after any transaction execution.
• Data Integrity: ACID properties maintain the integrity of the data by
ensuring that any changes to the database are permanent and cannot
be lost.
• Concurrency Control: ACID properties help to manage multiple
transactions occurring concurrently by preventing interference
between them.
• Recovery: ACID properties ensure that in case of any failure or crash,
the system can recover the data up to the point of failure or crash.
Concurrency
• Concurrency Control is the management procedure that is required for
controlling concurrent execution of the operations that take place on a database.
• In a multi-user system, multiple users can access and use the same database at
one time, which is known as the concurrent execution of the database. It means
that the same database is executed simultaneously on a multi-user system by
different users.
• While working on the database transactions, there occurs the requirement of
using the database by multiple users for performing different operations, and in
that case, concurrent execution of the database is performed.
• The thing is that the simultaneous execution that is performed should be done in
an interleaved manner, and no operation should affect the other executing
operations, thus maintaining the consistency of the database. Thus, on making
the concurrent execution of the transaction operations, there occur several
challenging problems that need to be solved.
Implementation of Concurrency in DBMS
Lock-Based Protocol
• Shared lock
• Exclusive lock
Shared lock:
• It is also known as a read-only lock. Under a shared lock, the data item can only be read
by the transaction.
• It can be shared between transactions, because a transaction holding a
shared lock cannot update the data on the data item.
Exclusive lock:
• Under an exclusive lock, the data item can be both read and written by the
transaction.
• This lock is exclusive: it prevents multiple transactions from modifying the
same data item simultaneously. A small sketch of such a lock manager is shown below.
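The following Python sketch illustrates shared (S) and exclusive (X) lock compatibility on a single data item; the LockManager class and its method names are invented for illustration.

# Toy lock manager: S locks are compatible with other S locks,
# X locks are incompatible with everything else.
class LockManager:
    def __init__(self):
        self.locks = {}   # item -> {"mode": "S" or "X", "holders": set of txn ids}

    def acquire(self, txn, item, mode):
        entry = self.locks.get(item)
        if entry is None:
            self.locks[item] = {"mode": mode, "holders": {txn}}
            return True
        if mode == "S" and entry["mode"] == "S":
            entry["holders"].add(txn)          # shared with the other readers
            return True
        return False                           # conflict: caller must wait or abort

    def release(self, txn, item):
        entry = self.locks.get(item)
        if entry and txn in entry["holders"]:
            entry["holders"].discard(txn)
            if not entry["holders"]:
                del self.locks[item]

lm = LockManager()
print(lm.acquire("T1", "A", "S"))   # True  (first reader)
print(lm.acquire("T2", "A", "S"))   # True  (shared lock can be shared)
print(lm.acquire("T3", "A", "X"))   # False (exclusive lock conflicts with S)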
Three kinds of intent locks
• Intent Shared (IS)
• Intent Exclusive (IX)
• Shared with Intent Exclusive (SIX)
Intent shared (IS): If a page or row holds this lock, then the transaction
intends to read resources in the lower hierarchy by obtaining shared locks (S)
on those resources independently.
Intent exclusive (IX): If a page or row holds this lock, the transaction intends
to change some lower hierarchy resources by obtaining exclusive (X) locks on
those resources independently.
Intent update (IU): This lock can only be obtained at the page level, and it
transforms to the intent exclusive lock when the update operation is
completed.
Two Phase Locking Protocol
• The two-phase locking protocol divides the execution phase of the
transaction into three parts.
• In the first part, when the execution of the transaction starts, it seeks
permission for the lock it requires.
• In the second part, the transaction acquires all the locks. The third
phase is started as soon as the transaction releases its first lock.
• In the third phase, the transaction cannot demand any new locks. It
only releases the acquired locks.
• There are two phases of 2PL:
• Growing phase: In the growing phase, new locks on data items may be
acquired by the transaction, but none can be released.
• Shrinking phase: In the shrinking phase, existing locks held by the
transaction may be released, but no new locks can be acquired.
• In the example below, if lock conversion is allowed, the following can happen:
• Upgrading of a lock (from S(a) to X(a)) is allowed in the growing phase.
• Downgrading of a lock (from X(a) to S(a)) must be done in the shrinking phase.
Example:
The following shows how locking and unlocking work with 2-PL.
Transaction T1:
• Growing phase: from step 1-3
• Shrinking phase: from step 5-7
• Lock point: at 3
Transaction T2:
• Growing phase: from step 2-6
• Shrinking phase: from step 8-9
• Lock point: at 6
Strict Two-Phase Locking
• The first phase of Strict-2PL is similar to 2PL. In the first phase, after
acquiring all the locks, the transaction continues to execute normally.
• The only difference between 2PL and Strict-2PL is that Strict-2PL does
not release a lock immediately after using it.
• Strict-2PL waits until the whole transaction commits, and then it
releases all the locks at once.
• The Strict-2PL protocol does not have a gradual shrinking phase of lock release.
• It does not suffer from cascading aborts the way plain 2PL can.
Strict Two-Phase Locking Advantage/Disadvantage
• The advantage of strict 2PL is that it guarantees serializability, which
is the highest level of isolation among transactions. In other words,
the results of concurrent transactions executed under strict 2PL will
be the same as if they were executed one after the other.

• The disadvantage of strict 2PL is that it can lead to decreased
concurrency and increased contention for resources, as transactions
are not able to release locks until they are committed.
Strict Two-Phase Locking Implementation Example
read_lock(X);        // growing phase: acquire all locks on X and Y
write_lock(X);
read_lock(Y);
write_lock(Y);
read_item(X);
X := X - 500;
read_item(Y);
Y := Y + 500;
write_item(X);
write_item(Y);
commit;              // strict 2PL: locks are released only after commit
unlock_write(Y);     // release phase
unlock_read(Y);
unlock_write(X);
unlock_read(X);
Timestamp based protocol for concurrency
control
• Basic Timestamp Ordering
• Strict Timestamp Ordering
Basic Timestamp Ordering
The Timestamp Ordering Protocol is used to order transactions based on their timestamps. The
order of the transactions is simply the ascending order of their creation times.
The older transaction has the higher priority, which is why it executes first. To determine the timestamp of
a transaction, this protocol uses the system time or a logical counter.
Lock-based protocols manage the order between conflicting pairs of transactions at
execution time, whereas timestamp-based protocols start working as soon as a transaction is created.
Let's assume there are two transactions T1 and T2. Suppose transaction T1 entered the system at
time 007 and transaction T2 entered the system at time 009. T1 has the higher priority, so it
executes first, as it entered the system first.
The timestamp ordering protocol also maintains the timestamp of the last 'read' and 'write' operation on each
data item (see the sketch below).
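A minimal Python sketch of the basic timestamp-ordering checks, assuming each data item keeps read_TS and write_TS and each transaction has a fixed timestamp; the function names and the "abort" return value are illustrative.

# Basic timestamp ordering: each item remembers the timestamps of the
# youngest transaction that read it (read_TS) and wrote it (write_TS).
items = {"X": {"read_TS": 0, "write_TS": 0, "value": 100}}

def read_item(ts, name):
    item = items[name]
    if ts < item["write_TS"]:
        return "abort"                 # a younger transaction already wrote the item
    item["read_TS"] = max(item["read_TS"], ts)
    return item["value"]

def write_item(ts, name, value):
    item = items[name]
    if ts < item["read_TS"] or ts < item["write_TS"]:
        return "abort"                 # a younger transaction already read/wrote the item
    item["write_TS"] = ts
    item["value"] = value
    return "ok"

print(read_item(7, "X"))       # T1 (TS = 007) reads X -> 100
print(write_item(9, "X", 50))  # T2 (TS = 009) writes X -> ok
print(write_item(7, "X", 70))  # T1 tries to write after T2 -> abort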
Strict Timestamp Ordering
• A variation of basic TO, called strict TO, ensures that schedules
are both strict and conflict serializable. In this variation, a transaction
T that issues a read_item(X) or write_item(X) such that TS(T) > write_TS(X) has
its read or write operation delayed until the transaction T′ that wrote
the value of X has committed or aborted.
Serializability
• A schedule is serializable if it is equivalent to a serial schedule. A
concurrent schedule must produce the same result as if the transactions were
executed serially, i.e. one after another. Serializability refers to the sequence in
which actions such as read, write, abort and commit are performed.
Example
• Let’s take two transactions T1 and T2,
• If both transactions are performed without interfering with each other, then it
is called a serial schedule; it can be represented as follows (Example 1).
• Non-serial schedule − when the operations of transactions T1 and T2 are
interleaved (Example 2).
Example 1 (serial schedule)

T1              T2
READ1(A)
WRITE1(A)
READ1(B)
C1
                READ2(B)
                WRITE2(B)
                C2

Example 2 (non-serial schedule)

T1              T2
READ1(A)
                READ2(B)
WRITE1(A)
                WRITE2(B)
READ1(B)
WRITE1(B)
Types of serializability

There are two types of serializability −

View serializability

A schedule is view-serializable if it is view-equivalent to a serial schedule.

Two schedules are view-equivalent if they follow these rules −

If a transaction reads the initial value of data item A in one schedule, it also reads the initial value of A in the other schedule.

If a transaction reads a value of A written by another transaction in one schedule, it reads the value written by that same transaction in the other schedule.

If a transaction performs the final write of A in one schedule, it also performs the final write of A in the other schedule.

Conflict serializability

• It orders any conflicting operations in the same way as some serial execution. A pair of operations is said to conflict if they operate
on the same data item and one of them is a write operation.

• That means −

Read_i(x) Read_j(x) − non-conflicting read-read operation

Read_i(x) Write_j(x) − conflicting read-write operation

Write_i(x) Read_j(x) − conflicting write-read operation

Write_i(x) Write_j(x) − conflicting write-write operation

A small conflict-check sketch follows the list.
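As a quick illustration of the conflict rules above, here is a tiny Python helper that decides whether two operations conflict; the operation tuple format (transaction, action, item) is an assumption made for this sketch.

# Two operations conflict if they belong to different transactions,
# access the same data item, and at least one of them is a write.
def conflicts(op1, op2):
    t1, action1, item1 = op1
    t2, action2, item2 = op2
    return (t1 != t2
            and item1 == item2
            and "write" in (action1, action2))

print(conflicts(("T1", "read", "x"),  ("T2", "read", "x")))   # False: read-read
print(conflicts(("T1", "read", "x"),  ("T2", "write", "x")))  # True: read-write
print(conflicts(("T1", "write", "x"), ("T2", "read", "x")))   # True: write-read
print(conflicts(("T1", "write", "x"), ("T2", "write", "x")))  # True: write-write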


Deadlock
• A deadlock is a condition where two or more transactions are waiting indefinitely for
one another to give up locks. Deadlock is said to be one of the most feared
complications in DBMS as no task ever gets finished and is in waiting state forever.
• For example: In the student table, transaction T1 holds a lock on some rows and
needs to update some rows in the grade table. Simultaneously, transaction T2 holds
locks on some rows in the grade table and needs to update the rows in the Student
table held by Transaction T1.
• Now, the main problem arises. Now Transaction T1 is waiting for T2 to release its lock
and similarly, transaction T2 is waiting for T1 to release its lock. All activities come to a
halt state and remain at a standstill. It will remain in a standstill until the DBMS detects
the deadlock and aborts one of the transactions.
Deadlock Avoidance

• When a database can end up in a deadlock state, it is better to avoid
the deadlock than to abort and restart transactions, since aborting and
restarting is a waste of time and resources.
• A deadlock avoidance mechanism is used to detect any deadlock
situation in advance. A method like the "wait-for graph" is used for
detecting a deadlock situation, but this method is suitable only for
a smaller database. For a larger database, the deadlock prevention
method can be used.
Deadlock Detection
• In a database, when a transaction waits indefinitely to obtain a lock, the
DBMS should detect whether the transaction is involved in a deadlock or not. The
lock manager maintains a wait-for graph to detect deadlock cycles in the
database.
• Wait-for Graph
• This is a suitable method for deadlock detection. In this method, a graph is
created based on the transactions and their locks. If the created graph has a cycle or
closed loop, then there is a deadlock.
• The wait-for graph is maintained by the system, with an edge for every transaction that is
waiting for some data item held by another. The system keeps checking whether
there is any cycle in the graph (a cycle-check sketch is shown below).
• The wait-for graph for the above scenario is shown below:
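Below is a minimal Python sketch of deadlock detection on a wait-for graph using depth-first search; the adjacency-dict representation and the has_cycle name are assumptions for illustration.

# Wait-for graph: an edge T1 -> T2 means "T1 is waiting for a lock held by T2".
# A cycle in this graph means a deadlock exists.
def has_cycle(graph):
    visiting, done = set(), set()

    def dfs(node):
        if node in visiting:
            return True            # back edge found: cycle (deadlock)
        if node in done:
            return False
        visiting.add(node)
        for nxt in graph.get(node, []):
            if dfs(nxt):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(dfs(node) for node in graph)

wait_for = {"T1": ["T2"], "T2": ["T1"]}   # T1 waits for T2 and T2 waits for T1
print(has_cycle(wait_for))                # True -> deadlock detected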
Deadlock Prevention

• Deadlock prevention method is suitable for a large database. If the resources are allocated in such a way that deadlock
never occurs, then the deadlock can be prevented.
• The database management system analyzes the operations of a transaction to check whether they can create a deadlock
situation. If they can, then the DBMS never allows that transaction to be executed.
• Wait-Die scheme
• In this scheme, if a transaction requests a resource that is already held with a conflicting lock by another transaction,
the DBMS simply compares the timestamps of both transactions and allows the older transaction to wait until the
resource is available for execution.
• Let's assume there are two transactions Ti and Tj, and let TS(T) be the timestamp of any transaction T. If Tj holds a lock on
a resource and Ti requests that resource, the DBMS performs the following actions:
• If TS(Ti) < TS(Tj) - Ti is the older transaction, so Ti is allowed to wait until the
data item is available for execution. That means if the older transaction is waiting for a resource locked by the
younger transaction, the older transaction is allowed to wait for the resource until it is available.
• If TS(Ti) > TS(Tj) - Ti is the younger transaction, so Ti is killed ("dies") and
restarted later after a random delay but with the same timestamp.
• Wound-Wait scheme
• In the wound-wait scheme, if the older transaction requests a resource held by the younger transaction, the
older transaction forces the younger one to abort ("wounds" it) and release the resource. After a small delay, the younger
transaction is restarted, but with the same timestamp.
• If the younger transaction requests a resource held by the older transaction, then the younger transaction is
asked to wait until the older one releases it. A sketch of both schemes follows.
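A small Python sketch of the two decision rules, assuming a lower timestamp means an older transaction; the function names and return strings are illustrative.

# Decide what happens when 'requester' asks for a resource held by 'holder'.
# Timestamps: a smaller value means the transaction is older.
def wait_die(ts_requester, ts_holder):
    # Wait-Die: an older requester waits; a younger requester dies (is restarted).
    return "wait" if ts_requester < ts_holder else "die (restart with same timestamp)"

def wound_wait(ts_requester, ts_holder):
    # Wound-Wait: an older requester wounds (aborts) the younger holder;
    # a younger requester simply waits.
    return "wound holder (holder aborts)" if ts_requester < ts_holder else "wait"

print(wait_die(7, 9))     # older requests from younger -> wait
print(wait_die(9, 7))     # younger requests from older -> die
print(wound_wait(7, 9))   # older requests from younger -> wound holder
print(wound_wait(9, 7))   # younger requests from older -> wait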
Database security
• Database security is the technique that protects and secures the database against
intentional or accidental threats.
• Security concerns are relevant not only to the data residing in an organization's
database: a breach of security may harm other parts of the system, which
may ultimately affect the database as well.
• Consequently, database security includes hardware parts, software parts, human
resources, and data.
• Effective security requires appropriate controls, which are defined for the
specific mission and purpose of the system.
• The requirement for proper security, while often neglected or
overlooked in the past, is now examined more and more thoroughly by
different organizations.
Database Integrity
• The overall accuracy, completeness, and consistency of data is known
as data integrity. Data integrity also applies to the data's protection
and security in terms of regulatory enforcement, such as GDPR
compliance. It is maintained by a set of procedures, guidelines,
and specifications that were put in place during the design phase.
Difference between distributed system and
centralized system
• A centralized database is basically a type of database that is stored, located as well as maintained
at a single location only. This type of database is modified and managed from that location itself.
This location is thus mainly any database system or a centralized computer system. The centralized
location is accessed via an internet connection (LAN, WAN, etc). This centralized database is mainly
used by institutions or organizations.
• Advantages:
• Since all data is stored at a single location only thus it is easier to access and coordinate data.
• The centralized database has very minimal data redundancy since all data is stored in a single
place.
• It is cheaper in comparison to all other databases available.
• Disadvantages:
• The data traffic in the case of a centralized database is more.
• If any kind of system failure occurs in the centralized system, then the entire data may be lost.
• Distributed Database:
• A distributed database is basically a type of database which consists of multiple
databases that are connected with each other and are spread across different
physical locations. The data that is stored in various physical locations can thus be
managed independently of other physical locations. The communication between
databases at different physical locations is done by a computer network.
• Advantages:
• This database can be easily expanded as data is already spread across different
physical locations.
• The distributed database can easily be accessed from different networks.
• This database is more secure in comparison to a centralized database.
• Disadvantages:
• This database is very costly and is difficult to maintain because of its complexity.
• In this database, it is difficult to provide a uniform view to users since it is spread
across different physical locations.
Transparency features of DDBMS
• The definition of a DDBMS states that the system should make the distribution transparent to the user.
Transparency hides implementation details from the user. For example, in a centralized DBMS, data
independence is a form of transparency: it hides changes in the definition and organization of the data from
the user. A DDBMS may provide various levels of transparency.
• Distribution transparency

Distribution transparency allows the user to perceive the database as a single, logical entity. If a DDBMS exhibits
distribution transparency, then the user does not need to know that the data is fragmented (fragmentation
transparency) or the location of data items (location transparency).
Distribution transparency can be classified into:
• Fragmentation transparency
• Location transparency
• Replication transparency
• Local mapping transparency
• Naming transparency
Fragmentation transparency
Fragmentation transparency is the highest level of distribution transparency. If
fragmentation transparency is provided by the DDBMS, then the user
does not need to know that the data is fragmented. As a result,
database accesses are based on the global schema, so the user does
not need to specify fragment names or data locations.
Replication transparency
Closely related to location transparency is replication transparency,
which means that the user is unaware of the replication of fragments.
Replication transparency is implied by location transparency.
Problems encountered in DDBMS while
considering concurrency and recovery
• Dealing with multiple copies of the data items. The concurrency
control method is responsible for maintaining consistency among
these copies. The recovery method is responsible for making a copy
consistent with other copies if the site on which the copy is stored
fails and recovers later.
• Failure of individual sites. The DDBMS should continue to operate
with its running sites, if possible, when one or more individual sites
fail. When a site recovers, its local database must be brought up-to-
date with the rest of the sites before it rejoins the system.
• Failure of communication links. The system must be able to deal
with the failure of one or more of the communication links that
connect the sites. An extreme case of this problem is that network
partitioning may occur. This breaks up the sites into two or more
partitions, where the sites within each partition can communicate
only with one another and not with sites in other partitions.
• Distributed commit. Problems can arise with committing a
transaction that is accessing databases stored on multiple sites if
some sites fail during the commit process.
• Distributed deadlock. Deadlock may occur among several sites, so
techniques for dealing with deadlocks must be extended to take this
into account.
Object Oriented database design.
• ODBMS stands for Object-Oriented Database Management System,
which is a type of database management system that is designed to
store and manage object-oriented data.
• Object-oriented data is data that is represented using objects, which
encapsulate data and behavior into a single entity.
Object Oriented database
• ODBMS, an abbreviation for object-oriented database
management system, is a data model in which data is stored in the form
of objects, which are instances of classes. These classes and objects
together make up an object-oriented data model.
• An object-oriented database is organized around objects rather than
actions, and data rather than logic. For example, a multimedia record
in a relational database can be a definable data object, as opposed to
an alphanumeric value.
RDBMS vs OODBMS vs ORDBMS
RDBMS
A relational database management system (RDBMS) is a collection of programs and
capabilities that enable IT teams and others to create, update, administer and
otherwise interact with a relational database.
OODBMS
• OODBMS stands for Object-Oriented Database Management System. It is a DBMS
where data is represented in the form of objects, as used in object-oriented
programming. OODB implements object-oriented concepts such as classes of
objects, object identity, polymorphism, encapsulation, and inheritance.
ORDBMS
• An object relational database management system (ORDBMS) is a database
management system that is similar to a relational database, except that it
has an object-oriented database model. This system supports objects, classes and
inheritance in database schemas and the query language.
Thank You
