Unit 4 Notes
Uploaded by ADARSH YADAV

DBMS

UNIT - IV
TRANSACTION PROCESSING CONCEPTS
Transaction Concept
Transaction: A transaction is a sequence of actions that performs a single logical unit of work.

or

A transaction is a unit of program execution that accesses and possibly updates various data items.

i. Two main issues to deal with:

● Failures of various kinds, such as hardware failures and system crashes


● Concurrent execution of multiple transactions

ii. Access to the database is done by two operations: read(X) and write(X)

E.g. Transaction to transfer $50 from account A to account B:

T1:

1. read(A)

2. A := A – 50

3. write(A)

4. read(B)
5. B := B + 50

6. write(B)

Transaction State

Active: The initial state; the transaction stays in this state while it is executing

Partially committed: After the final statement has been executed.

Failed: After the discovery that normal execution can no longer proceed.

Aborted: After the transaction has been rolled back and the database restored to its state prior to the start of the transaction.
Two options after it has been aborted:

i. restart the transaction
ii. kill the transaction

Committed: After successful completion. A transaction is said to have terminated if it has either committed or aborted.
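The transitions above can be sketched as a small lookup table (an illustrative Python sketch; the state and event names are assumptions, not standard DBMS identifiers):

```python
# Transaction states and the legal transitions between them,
# following the description above.
TRANSITIONS = {
    ("active", "last_statement_executed"): "partially_committed",
    ("active", "error"): "failed",
    ("partially_committed", "write_success"): "committed",
    ("partially_committed", "error"): "failed",
    ("failed", "rollback"): "aborted",
}

def step(state, event):
    """Return the next state, or raise if the transition is illegal."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state} on {event}")

# A transaction that completes successfully:
s = "active"
s = step(s, "last_statement_executed")   # -> partially_committed
s = step(s, "write_success")             # -> committed
```

A transaction is terminated exactly when this machine reaches "committed" or "aborted".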

ACID Properties
To preserve the integrity of data the database system must ensure:

Atomicity: Either all operations of the transaction are properly reflected in the database or none are.

Consistency: Execution of a transaction in isolation preserves the consistency of the database.

Isolation: Although multiple transactions may execute concurrently, each transaction must be unaware of other concurrently executing
transactions. Intermediate transaction results must be hidden from other concurrently executed transactions.

i. That is, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti started, or Tj started
execution after Ti finished.

Durability: After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.

Example of Fund Transfer

● Transaction to transfer $50 from account A to account B:


1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
Atomicity requirement

● If the transaction fails after step 3 and before step 6, money will be “lost”, leading to an inconsistent database state. The failure could be
due to software or hardware
● The system should ensure that updates of a partially executed transaction are not reflected in the database
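The atomicity requirement can be sketched in a few lines: remember the before-images and restore them if anything fails mid-transfer (a minimal illustration; the in-memory dictionary and the `transfer` helper are assumptions, not a real DBMS API):

```python
accounts = {"A": 100, "B": 50}

def transfer(db, src, dst, amount, fail_after_debit=False):
    """All-or-nothing transfer: on failure, restore the before-images."""
    old = {src: db[src], dst: db[dst]}    # remember old values (before-images)
    try:
        db[src] -= amount                 # steps 1-3: read(A), A := A - 50, write(A)
        if fail_after_debit:              # simulated failure between steps 3 and 6
            raise RuntimeError("crash after write(A)")
        db[dst] += amount                 # steps 4-6: read(B), B := B + 50, write(B)
    except Exception:
        db.update(old)                    # undo the partial update
        raise

transfer(accounts, "A", "B", 50)          # accounts is now {"A": 50, "B": 100}
```

If a call with `fail_after_debit=True` raises, the balances are exactly as they were before the call, so no money is "lost".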

Durability requirement

Once the user has been notified that the transaction has completed (i.e., the transfer of the $50 has taken place), the updates to the database
by the transaction must persist even if there are software or hardware failures.

Transaction to transfer $50 from account A to account B:

1. read(A)

2. A := A – 50

3. write(A)

4. read(B)

5. B := B + 50

6. write(B)

Consistency requirement in above example:

The sum of A and B is unchanged by the execution of the transaction

In general, consistency requirements include

● A transaction must see a consistent database.


● During transaction execution the database may be temporarily inconsistent.
● When the transaction completes successfully the database must be consistent

Isolation requirement:

If between steps 3 and 6, another transaction T2 is allowed to access the partially updated database, it will see an inconsistent database (the
sum A + B will be less than it should be).

T1                                T2

1. read(A)

2. A := A – 50

3. write(A)
                                  read(A), read(B), print(A+B)
4. read(B)

5. B := B + 50

6. write(B)

Isolation can be ensured by running transactions serially that is, one after the other. However, executing multiple transactions concurrently
has significant benefits.
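The inconsistent read can be simulated directly. The sketch below (assumed starting balances A = 100 and B = 50) lets T2 observe the database between T1's write(A) and read(B):

```python
def run_interleaved(observe_mid=True):
    """T1 transfers $50 from A to B; T2 reads A and B between T1's steps 3 and 4."""
    db = {"A": 100, "B": 50}
    a = db["A"]                            # 1. read(A)
    a -= 50                                # 2. A := A - 50
    db["A"] = a                            # 3. write(A)
    observed = None
    if observe_mid:
        observed = db["A"] + db["B"]       # T2: read(A), read(B), print(A + B)
    b = db["B"]                            # 4. read(B)
    b += 50                                # 5. B := B + 50
    db["B"] = b                            # 6. write(B)
    return observed, db["A"] + db["B"]

mid, final = run_interleaved()
# mid == 100: T2 saw a sum $50 short of the true total; final == 150
```

The final state is still correct; it is only T2's intermediate view that violates isolation.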

Schedules
Schedule: A sequence of instructions that specifies the chronological order (earliest first) in which the instructions of concurrent transactions
are executed

● a schedule for a set of transactions must consist of all instructions of those transactions
● must preserve the order in which the instructions appear in each individual transaction.
A transaction that successfully completes its execution will have a commit instruction as the last statement

● by default, transaction assumed to execute commit instruction as its last step

A transaction that fails to successfully complete its execution will have an abort instruction as the last statement.

● A schedule is the order in which the operations of multiple transactions appear for execution.
● Serial schedules are always consistent.
● Non-serial schedules are not always consistent.
Serial Schedules-

In serial schedules,

● All the transactions execute serially one after the other.


● When one transaction executes, no other transaction is allowed to execute.

Characteristics-

Serial schedules are always-

● Consistent
● Recoverable
● Cascadeless
● Strict
Example-01:

In this schedule,

● There are two transactions T1 and T2 executing serially one after the other.
● Transaction T1 executes first.
● After T1 completes its execution, transaction T2 executes.
● So, this schedule is an example of a Serial Schedule.
Non-Serial Schedules-

In non-serial schedules,

● Multiple transactions execute concurrently.


● Operations of all the transactions are interleaved or mixed with each other.

Characteristics-

Non-serial schedules are NOT always-

● Consistent
● Recoverable
● Cascadeless
● Strict

Example-01:
In this schedule,

● There are two transactions T1 and T2 executing concurrently.


● The operations of T1 and T2 are interleaved.
● So, this schedule is an example of a Non-Serial Schedule.
Transaction T1 to transfer $50 from account A to account B:

T1

1. read(A)

2. A := A – 50

3. write(A)

4. read(B)

5. B := B + 50

6. write(B)

Transaction T2 transfers 10% of the balance from account A to account B:

T2

1. read(A);

2. temp := A * 0.1;

3. A := A – temp;

4. write(A);

5. read(B);

6. B := B + temp;

7. write(B).

Schedule 1
Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B.

A serial schedule in which T1 is followed by T2 :

Schedule 1.2

Schedule 2
In Schedules 1.1, 1.2 and 2, the sum A + B is preserved.
Serializability
● Some non-serial schedules may lead to inconsistency of the database.
● Serializability is a concept that helps to identify which non-serial schedules are correct and will maintain the consistency of the database.
Serializable Schedules-

If a given non-serial schedule of ‘n’ transactions is equivalent to some serial schedule of ‘n’ transactions, then it is called as a serializable
schedule.

Characteristics-

Serializable schedules behave exactly same as serial schedules.

Thus, serializable schedules are always-

● Consistent
● Recoverable
● Cascadeless
● Strict
Serial Schedules Vs Serializable Schedules-

Serial Schedules                                      Serializable Schedules

No concurrency is allowed. Thus, all the              Concurrency is allowed. Thus, multiple
transactions necessarily execute serially             transactions can execute concurrently.
one after the other.

Serial schedules lead to less resource                Serializable schedules improve both resource
utilization and CPU throughput.                       utilization and CPU throughput.

Serial schedules are less efficient as                Serializable schedules are always better than
compared to serializable schedules                    serial schedules (due to the above reason).
(due to the above reason).

Types of Serializability-

Serializability is mainly of two types-


1. Conflict Serializability
2. View Serializability

Basic Assumption: Each transaction preserves database consistency.

Thus, serial execution of a set of transactions preserves database consistency.

A (possibly concurrent) schedule is Serializable if it is equivalent to a serial schedule.

Conflict Serializability-

If a given non-serial schedule can be converted into a serial schedule by swapping its non-conflicting operations, then it is called as a conflict
serializable schedule.

Conflicting Operations-

Two operations are called as conflicting operations if all the following conditions hold true for them-

● Both the operations belong to different transactions


● Both the operations are on the same data item
● At least one of the two operations is a write operation

Example-
Consider the following schedule-

In this schedule,

● W1 (A) and R2 (A) are called as conflicting operations.


● This is because all the above conditions hold true for them.

Checking Whether a Schedule is Conflict Serializable Or Not-

Follow the following steps to check whether a given non-serial schedule is conflict serializable or not-
Step-01:

Find and list all the conflicting operations.

Step-02:

Start creating a precedence graph by drawing one node for each transaction.

Step-03:

● Draw an edge for each conflict pair: if Xi(V) and Yj(V) form a conflict pair, then draw an edge from Ti to Tj.
● This ensures that Ti gets executed before Tj.

Step-04:

● Check if there is any cycle formed in the graph.


● If there is no cycle found, then the schedule is conflict serializable otherwise not.
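Step-01 can be mechanized. Assuming a schedule is represented as an ordered list of (transaction, operation, item) triples (this representation is an assumption for illustration), the conflict pairs fall out of the three conditions directly:

```python
def conflicting_pairs(schedule):
    """schedule: list of (transaction, 'R' or 'W', data_item) in execution order.
    Returns each conflict pair as ((earlier op), (later op)): different
    transactions, same data item, at least one write."""
    pairs = []
    for i, (ti, oi, xi) in enumerate(schedule):
        for tj, oj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and "W" in (oi, oj):
                pairs.append(((ti, oi, xi), (tj, oj, xj)))
    return pairs

S = [("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "B"), ("T1", "R", "B")]
# W1(A)/R2(A) and W2(B)/R1(B) conflict; read-read and same-transaction
# pairs do not.
```

Each pair (Ti op, Tj op) then contributes the edge Ti → Tj to the precedence graph of Step-03.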

Problem-01:
Check whether the given schedule S is conflict serializable or not. If yes, then determine all the possible serialized schedules-

Solution-

Checking Whether S is Conflict Serializable Or Not-


Step-01:

List all the conflicting operations and determine the dependency between the transactions-

● R4(A) , W2(A) (T4 → T2)


● R3(A) , W2(A) (T3 → T2)
● W1(B) , R3(B) (T1 → T3)
● W1(B) , W2(B) (T1 → T2)
● R3(B) , W2(B) (T3 → T2)

Step-02:

Draw the precedence graph-


● Clearly, there exists no cycle in the precedence graph.
● Therefore, the given schedule S is conflict serializable.

Finding the Serialized Schedules-

● All the possible topological orderings of the above precedence graph will be the possible serialized schedules.
● The topological orderings can be found by performing the Topological Sort of the above precedence graph.

After performing the topological sort, the possible serialized schedules are-

1. T1 → T3 → T4 → T2
2. T1 → T4 → T3 → T2
3. T4 → T1 → T3 → T2
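The worked problem can be checked mechanically. The sketch below builds the precedence graph from the dependencies listed in Step-01 and enumerates topological orderings by brute force (fine for four transactions; a real implementation would use Kahn's algorithm):

```python
from itertools import permutations

def topological_orders(nodes, edges):
    """All orderings of `nodes` that respect every edge (u must come before v)."""
    respects = lambda order: all(order.index(u) < order.index(v) for u, v in edges)
    return [order for order in permutations(nodes) if respects(order)]

def has_cycle(nodes, edges):
    # A finite directed graph is acyclic iff it admits a topological order.
    return not topological_orders(nodes, edges)

nodes = ["T1", "T2", "T3", "T4"]
edges = [("T4", "T2"), ("T3", "T2"), ("T1", "T3"), ("T1", "T2")]
# has_cycle(nodes, edges) is False, so S is conflict serializable, and the
# topological orders are exactly the three serialized schedules listed above.
```

Running `topological_orders(nodes, edges)` yields T1→T3→T4→T2, T1→T4→T3→T2 and T4→T1→T3→T2, matching the answer.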

View Serializability-

If a given schedule is found to be view equivalent to some serial schedule, then it is called as a view
serializable schedule.
View Equivalent Schedules-

Consider two schedules S1 and S2 each consisting of two transactions T1 and T2.

Schedules S1 and S2 are called view equivalent if the following three conditions hold true for them-

Condition-01:

For each data item X, if transaction Ti reads X from the database initially in schedule S1, then in schedule S2 also, Ti must perform the initial read
of X from the database.

Thumb Rule
“Initial readers must be same for all the data items”.

Condition-02:

If transaction Ti reads a data item that has been updated by the transaction Tj in schedule S1, then in schedule S2 also, transaction Ti must read
the same data item that has been updated by the transaction Tj.
Thumb Rule
“Write-read sequence must be the same.”

Condition-03:

For each data item X, if X has been updated at last by transaction Ti in schedule S1, then in schedule S2 also, X must be updated at last by
transaction Ti.

Thumb Rule
“Final writers must be same for all the data items”.
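The three conditions can be checked programmatically. A sketch, assuming the same (transaction, operation, item) list representation used for conflict pairs: it summarizes each schedule's initial reads, write-read (reads-from) pairs, and final writers, then compares the summaries.

```python
def view_summary(schedule):
    """schedule: list of (transaction, 'R'/'W', item) in execution order.
    Returns (initial reads, write-read pairs, final writers)."""
    initial, reads_from, last_writer = set(), set(), {}
    for txn, op, x in schedule:
        if op == "R":
            if x in last_writer:
                reads_from.add((last_writer[x], txn, x))  # txn reads x written by last_writer[x]
            else:
                initial.add((txn, x))                     # initial read from the database
        else:
            last_writer[x] = txn                          # final writer of x so far
    return initial, reads_from, last_writer

def view_equivalent(s1, s2):
    # View equivalent iff all three summaries match (conditions 01-03).
    return view_summary(s1) == view_summary(s2)
```

For example, serial T1-then-T2 on item A is view equivalent to itself but not to serial T2-then-T1, because the initial reader of A differs.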

Checking Whether a Schedule is View Serializable Or Not-

Method-01:

Check whether the given schedule is conflict serializable or not.

● If the given schedule is conflict serializable, then it is surely view serializable. Stop and report your answer.
● If the given schedule is not conflict serializable, then it may or may not be view serializable. Go and check using other methods.
Thumb Rules
● All conflict serializable schedules are view serializable.
● All view serializable schedules may or may not be conflict serializable.

Method-02:

Check if there exists any blind write operation.

(Writing without reading is called as a blind write).

● If there does not exist any blind write, then the schedule is surely not view serializable. Stop and report your answer.
● If there exists any blind write, then the schedule may or may not be view serializable. Go and check using other methods.

Thumb Rule
No blind write (in a schedule that is not conflict serializable) means the schedule is not view serializable.

Method-03:

In this method, try finding a view equivalent serial schedule.


● By using the above three conditions, write all the dependencies.
● Then, draw a graph using those dependencies.
● If there exists no cycle in the graph, then the schedule is view serializable otherwise not.

PRACTICE PROBLEMS BASED ON VIEW SERIALIZABILITY-

Problem-01:

Check whether the given schedule S is view serializable or not-


Solution-

● We know, if a schedule is conflict serializable, then it is surely view serializable.


● So, let us check whether the given schedule is conflict serializable or not.

Checking Whether S is Conflict Serializable Or Not-

Step-01:

List all the conflicting operations and determine the dependency between the transactions-

● R1(A) , W3(A) (T1 → T3)


● R2(A) , W3(A) (T2 → T3)
● R2(A) , W1(A) (T2 → T1)
● W3(A) , W1(A) (T3 → T1)

Step-02:

Draw the precedence graph-


● Clearly, there exists a cycle in the precedence graph.
● Therefore, the given schedule S is not conflict serializable.

Now,

● Since, the given schedule S is not conflict serializable, so, it may or may not be view serializable.
● To check whether S is view serializable or not, let us use another method.
● Let us check for blind writes.

Checking for Blind Writes-

● There exists a blind write W3 (A) in the given schedule S.


● Therefore, the given schedule S may or may not be view serializable.
Now,

● To check whether S is view serializable or not, let us use another method.


● Let us derive the dependencies and then draw a dependency graph.

Drawing a Dependency Graph-

● T1 firstly reads A and T3 firstly updates A.


● So, T1 must execute before T3.
● Thus, we get the dependency T1 → T3.
● The final update on A is made by the transaction T1.
● So, T1 must execute after all other transactions.
● Thus, we get the dependency (T2, T3) → T1.
● There exists no write-read sequence.

Now, let us draw a dependency graph using these dependencies-


● Clearly, there exists a cycle in the dependency graph.
● Thus, we conclude that the given schedule S is not view serializable.

Non-Serializable Schedules-

● A non-serial schedule which is not serializable is called as a non-serializable schedule.


● A non-serializable schedule is not guaranteed to produce the same effect as produced by some serial schedule on any consistent
database.

Characteristics-

Non-serializable schedules-

● may or may not be consistent


● may or may not be recoverable

Irrecoverable Schedules-

If in a schedule,

● A transaction performs a dirty read operation from an uncommitted transaction


● And commits before the transaction from which it has read the value
then such a schedule is known as an Irrecoverable Schedule.

Example-

Consider the following schedule-


Here,

● T2 performs a dirty read operation.


● T2 commits before T1.
● T1 fails later and rolls back.
● The value that T2 read now stands to be incorrect.
● T2 cannot recover since it has already committed.
Recoverable Schedules-

If in a schedule,

● A transaction performs a dirty read operation from an uncommitted transaction


● And its commit operation is delayed till the uncommitted transaction either commits or rolls back
then such a schedule is known as a Recoverable Schedule.

Here,

● The commit operation of the transaction that performs the dirty read is delayed.
● This ensures that it still has a chance to recover if the uncommitted transaction fails later.

Example-

Consider the following schedule-


Here,

● T2 performs a dirty read operation.


● The commit operation of T2 is delayed till T1 commits or rolls back.
● T1 commits later.
● T2 is now allowed to commit.
● Had T1 failed, T2 would have had a chance to recover by rolling back.

Checking Whether a Schedule is Recoverable or Irrecoverable-

Method-01:

Check whether the given schedule is conflict serializable or not.

● If the given schedule is conflict serializable, then it is surely recoverable. Stop and report your answer.
● If the given schedule is not conflict serializable, then it may or may not be recoverable. Go and check using other methods.

Thumb Rules
● All conflict serializable schedules are recoverable.
● All recoverable schedules may or may not be conflict serializable.

Method-02:

Check if there exists any dirty read operation.

(Reading from an uncommitted transaction is called as a dirty read)

● If there does not exist any dirty read operation, then the schedule is surely recoverable. Stop and report your answer.
● If there exists any dirty read operation, then the schedule may or may not be recoverable.

If there exists a dirty read operation, then follow the following cases-

Case-01:

If the commit operation of the transaction performing the dirty read occurs before the commit or abort operation of the transaction which
updated the value, then the schedule is irrecoverable.

Case-02:

If the commit operation of the transaction performing the dirty read is delayed till the commit or abort operation of the transaction which
updated the value, then the schedule is recoverable.

Thumb Rule

No dirty read means a recoverable schedule.
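The two cases can be turned into a small classifier (a sketch; the tuple-based schedule format with 'C' for commit and 'A' for abort is an assumption):

```python
def classify(schedule):
    """schedule entries: ('R', txn, item), ('W', txn, item), ('C', txn), ('A', txn).
    Returns 'recoverable' or 'irrecoverable' per Case-01 and Case-02 above."""
    last_writer, finished, dirty_from = {}, set(), {}
    for entry in schedule:
        kind, txn = entry[0], entry[1]
        if kind == "W":
            last_writer[entry[2]] = txn
        elif kind == "R":
            w = last_writer.get(entry[2])
            if w and w != txn and w not in finished:
                dirty_from.setdefault(txn, set()).add(w)   # dirty read from w
        else:  # 'C' or 'A'
            if kind == "C":
                # Case-01: committing while a dirty-read source is still unfinished
                for w in dirty_from.get(txn, ()):
                    if w not in finished:
                        return "irrecoverable"
            finished.add(txn)
    return "recoverable"
```

With W1(X), R2(X), the schedule is irrecoverable if T2 commits before T1, and recoverable if T2's commit is delayed until after T1's.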

Recoverable Schedules-

If in a schedule,

● A transaction performs a dirty read operation from an uncommitted transaction


● And its commit operation is delayed till the uncommitted transaction either commits or roll backs
then such a schedule is called as a Recoverable Schedule.

Types of Recoverable Schedules-

A recoverable schedule may be any one of these kinds-

1. Cascading Schedule
2. Cascadeless Schedule
3. Strict Schedule

Cascading Schedule-
● If in a schedule, failure of one transaction causes several other dependent transactions to rollback or abort, then such a schedule is called as
a Cascading Schedule or Cascading Rollback or Cascading Abort.
● It simply leads to the wastage of CPU time.

Example-
Here,

● Transaction T2 depends on transaction T1.


● Transaction T3 depends on transaction T2.
● Transaction T4 depends on transaction T3.

In this schedule,

● The failure of transaction T1 causes the transaction T2 to rollback.


● The rollback of transaction T2 causes the transaction T3 to rollback.
● The rollback of transaction T3 causes the transaction T4 to rollback.
Such a rollback is called as a Cascading Rollback.

NOTE-

If the transactions T2, T3 and T4 would have committed before the failure of transaction T1, then the schedule would have been irrecoverable.

Cascadeless Schedule-

If in a schedule, a transaction is not allowed to read a data item until the last transaction that has written it is committed or aborted, then such
a schedule is called as a Cascadeless Schedule.

In other words,

● Cascadeless schedule allows only committed read operations.


● Therefore, it avoids cascading roll back and thus saves CPU time.
Example-
NOTE-

● Cascadeless schedule allows only committed read operations.


● However, it allows uncommitted write operations.

Example-

Strict Schedule-
If in a schedule, a transaction is neither allowed to read nor write a data item until the last transaction that has written it is committed or
aborted, then such a schedule is called as a Strict Schedule.

In other words,

● Strict schedule allows only committed read and write operations.


● Clearly, strict schedule implements more restrictions than cascadeless schedule.

Example-

Remember-
● Strict schedules are more strict than cascadeless schedules.
● All strict schedules are cascadeless schedules.
● Not all cascadeless schedules are strict schedules.

Equivalence of Schedules-

In DBMS, schedules may have the following three different kinds of equivalence relations among them-
1. Result Equivalence
2. Conflict Equivalence
3. View Equivalence

1. Result Equivalent Schedules-

● If any two schedules generate the same result after their execution, then they are called as result equivalent schedules.
● This equivalence relation is considered of least significance.
● This is because some schedules might produce the same results for some sets of values and different results for other sets of values.

2. Conflict Equivalent Schedules-

If any two schedules satisfy the following two conditions, then they are called as conflict equivalent schedules-
1. The set of transactions present in both the schedules is same.
2. The order of pairs of conflicting operations of both the schedules is same.

3. View Equivalent Schedules-

We have already discussed about View Equivalent Schedules.

PRACTICE PROBLEMS BASED ON EQUIVALENCE OF SCHEDULES-

Problem-01:

Are the following three schedules result equivalent?


Solution-
To check whether the given schedules are result equivalent or not,

● We will consider some arbitrary values of X and Y.


● Then, we will compare the results produced by each schedule.
● Those schedules which produce the same results will be result equivalent.

Let X = 2 and Y = 5.

On substituting these values, the results produced by each schedule are-

Results by Schedule S1- X = 21 and Y = 10

Results by Schedule S2- X = 21 and Y = 10

Results by Schedule S3- X = 11 and Y = 10

● Clearly, the results produced by schedules S1 and S2 are same.


● Thus, we conclude that S1 and S2 are result equivalent schedules.

Failure Classification
Transaction failure:
Logical errors: transaction cannot complete due to some internal error condition
System errors: the database system must terminate an active transaction due to an error condition (e.g., deadlock)
System crash: a power failure or other hardware or software failure causes the system to crash.
Disk failure: a head crash or similar disk failure destroys all or part of disk storage.
Recoverability from Transaction Failure
To ensure atomicity despite failures, we first output information describing the modifications to stable storage without modifying the
database itself.
We study two approaches:
1. log-based recovery, and
2. shadow paging
Log-Based Recovery
A log is kept on stable storage.
The log is a sequence of log records, and maintains a record of update activities on the database.
Each and every log record is given a unique id called log sequence number.
A log record is written for each of the following actions:
1. Updating a page
2. Commit
3. Abort
4. End
5. Undoing an update
Every log record contains the following fields:
PrevLSN | TransID | Type | PageID | Length | Offset | BeforeImage | AfterImage
Other special log records exist to record significant events during transaction processing, such as the start of a transaction and the commit or
abort of a transaction. We denote the various types of log records as:

● <Ti start>: Transaction Ti has started.


● <Ti, Xj, V1, V2>: Transaction Ti has performed a write on data item Xj. Xj had value V1 before the write, and will have value V2 after the
write.
● <Ti commit>: Transaction Ti has committed.
● <Ti abort>: Transaction Ti has aborted.

Whenever a transaction performs a write, it is essential that the log record for that write be created before the database is modified. Once a log
record exists, we can output the modification to the database whenever desirable. We also have the ability to undo a modification that
has already been output to the database, by using the old-value field in the log records.
For log records to be useful for recovery from system and disk failures, the log must reside on stable storage. However, since the log contains a
complete record of all database activity, the volume of data stored in the log may become unreasonably large.

● Two approaches using logs


1. Deferred database modification
2. Immediate database modification
Deferred Database Modification
The deferred-modification technique ensures transaction atomicity by recording all database modifications in the log, but deferring all write
operations of a transaction until the transaction partially commits (i.e., once the final action of the transaction has been executed). Then the
information in the logs is used to execute the deferred writes. If the system crashes or if the transaction aborts, then the information in the
logs is ignored.

Let T0 be transaction that transfers $50 from account A to account B:


T0: read(A);
A := A – 50;
write(A);
read(B);
B := B + 50;
write(B).
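The deferred-modification idea can be sketched as follows: writes are appended to a private log and the database is touched only at commit, so an abort needs no undo at all (illustrative only; the class and method names are assumptions):

```python
class DeferredTxn:
    """Deferred-modification sketch: writes go only to the log until the
    transaction partially commits; commit replays the log into the DB."""
    def __init__(self, db):
        self.db, self.log = db, []
    def read(self, x):
        return self.db[x]            # reads see only the committed database state
    def write(self, x, value):
        self.log.append((x, value))  # deferred: the database is untouched for now
    def commit(self):
        for x, value in self.log:    # execute the deferred writes from the log
            self.db[x] = value
    def abort(self):
        self.log.clear()             # nothing to undo: the DB was never changed

db = {"A": 1000, "B": 2000}
t0 = DeferredTxn(db)
t0.write("A", t0.read("A") - 50)
t0.write("B", t0.read("B") + 50)
# db is still {"A": 1000, "B": 2000} here; only commit() applies the writes
t0.commit()
```

Note that a read in this sketch does not see the transaction's own deferred writes; that is acceptable for T0, whose reads of each item precede its writes.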

Immediate Database Modification

The immediate-update technique allows database modifications to be output to the database while the transaction is still in the active state.
These modifications are called uncommitted modifications. In the event of a crash or transaction failure, the system must use the old-value
field of the log records to restore the modified data items.
Transactions T0 and T1 executed one after the other in the order T0 followed by T1. The portion of the log containing the relevant
information concerning these two transactions appears in the following,
Portion of the system log corresponding to T0 and T1
< T0 start >
< T0, A, 1000, 950 >
< T0, B, 2000, 2050 >
< T0 commit >
< T1 start >
< T1, C, 700, 600 >
< T1 commit >

Checkpoints

When a system failure occurs, we must consult the log to determine those transactions that need to be redone and those that need to be
undone. Rather than reprocessing the entire log, which is time-consuming and largely unnecessary, we can use checkpoints:

● Output onto stable storage all the log records currently residing in main memory.
● Output to the disk all modified buffer blocks.
● Output onto stable storage a log record, <checkpoint>.
● The recovery system reads the logs backwards from the end to the last checkpoint.
● It maintains two lists, an undo-list and a redo-list.
● If the recovery system sees a log with <Tn, Start> and <Tn, Commit> or just <Tn, Commit>, it puts the transaction in the redo-list.
● If the recovery system sees a log with <Tn, Start> but no commit or abort log found, it puts the transaction in undo-list.
All the transactions in the undo-list are undone and their logs are removed. All the transactions in the redo-list are redone using their log
records.
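The backward scan described above can be sketched as follows (the tuple-based log format is an assumption; update records and finer ARIES-style details are omitted):

```python
def undo_redo_lists(log):
    """Scan the log backwards from the end to the last checkpoint.
    log entries: ("start", T), ("commit", T), ("abort", T), ("checkpoint",)."""
    redo, undo, seen = [], [], set()
    for entry in reversed(log):
        kind, txn = entry[0], entry[1] if len(entry) > 1 else None
        if kind == "checkpoint":
            break                        # stop at the last checkpoint record
        if kind == "commit" and txn not in seen:
            redo.append(txn)             # <Tn start> ... <Tn commit>: redo
        elif kind == "start" and txn not in seen:
            undo.append(txn)             # started, but no commit/abort seen: undo
        if txn:
            seen.add(txn)                # first (i.e. latest) record per txn wins
    return undo, redo

log = [("start", "T1"), ("commit", "T1"), ("checkpoint",),
       ("start", "T2"), ("commit", "T2"), ("start", "T3")]
# undo == ["T3"], redo == ["T2"]; T1 is before the checkpoint and is skipped
```

Because the scan stops at the checkpoint, T1's records never need to be reprocessed, which is exactly the savings checkpoints provide.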

Shadow Paging

Shadow paging is an alternative to log-based recovery techniques, which has both advantages and disadvantages. It may require fewer disk
accesses, but it is hard to extend paging to allow multiple concurrent transactions. The paging is very similar to paging schemes used by the
operating system for memory management.
The idea is to maintain two page tables during the life of a transaction: the current page table and the shadow page table. When the
transaction starts, both tables are identical. The shadow page is never changed during the life of the transaction. The current page is updated
with each write operation. Each table entry points to a page on the disk. When the transaction is committed, the shadow page entry becomes
a copy of the current page table entry and the disk block with the old data is released. If the shadow is stored in nonvolatile memory and a
system crash occurs, then the shadow page table is copied to the current page table. This guarantees that the shadow page table will point to
the database pages corresponding to the state of the database prior to any transaction that was active at the time of the crash, making aborts
automatic.
There are drawbacks to the shadow-page technique:

1. Commit overhead. The commit of a single transaction using shadow paging requires multiple blocks to be output -- the current page
table, the actual data and the disk address of the current page table. Log-based schemes need to output only the log records.
2. Data fragmentation. Shadow paging causes database pages to change locations (therefore, they are no longer contiguous).
3. Garbage collection. Each time that a transaction commits, the database pages containing the old version of data changed by the
transactions must become inaccessible. Such pages are considered to be garbage since they are not part of the free space and do not
contain any usable information. Periodically it is necessary to find all of the garbage pages and add them to the list of free pages. This
process is called garbage collection and imposes additional overhead and complexity on the system.

Deadlock Handling
System is deadlocked if there is a set of transactions such that every transaction in the set is waiting for another transaction in the set.
Deadlock prevention protocols ensure that the system will never enter into a deadlock state. Some prevention strategies:
1. Require that each transaction locks all its data items before it begins execution (predeclaration).
2. Impose partial ordering of all data items and require that a transaction can lock data items only in the order specified by the partial order
(graph-based protocol).
The following schemes use transaction timestamps for the sake of deadlock prevention alone.
wait-die scheme - non-preemptive
1. older transactions may wait for younger ones to release data items. Younger transactions never wait for older ones; they are rolled back
instead.
2. a transaction may die several times before acquiring a needed data item
wound-wait scheme – preemptive
1. older transaction wounds (forces rollback) of younger transaction instead of waiting for it. Younger transactions may wait for older ones.
2. may be fewer rollbacks than wait-die scheme.
Deadlock prevention
Both in wait-die and in wound-wait schemes, a rolled back transaction is restarted with its original timestamp. Older transactions thus have
precedence over newer ones, and starvation is hence avoided.
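Both schemes reduce to a comparison of timestamps. A sketch, assuming smaller timestamps denote older transactions:

```python
def wait_die(ts_requester, ts_holder):
    """Non-preemptive: an older requester may wait; a younger one dies."""
    return "wait" if ts_requester < ts_holder else "rollback requester"

def wound_wait(ts_requester, ts_holder):
    """Preemptive: an older requester wounds (rolls back) the younger
    holder; a younger requester waits for the older holder."""
    return "rollback holder" if ts_requester < ts_holder else "wait"

# T_old has timestamp 5, T_young has timestamp 9:
# wait_die(5, 9)   -> "wait"               (older waits for younger)
# wait_die(9, 5)   -> "rollback requester" (younger dies)
# wound_wait(5, 9) -> "rollback holder"    (older wounds younger)
# wound_wait(9, 5) -> "wait"               (younger waits)
```

In both schemes only the younger transaction is ever rolled back, and it restarts with its original timestamp, so it eventually becomes the oldest and cannot starve.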
Deadlock Detection
Deadlocks can be described by a wait-for graph, which consists of a pair G = (V, E), where V is a set of vertices (all the transactions in the
system) and E is a set of edges; each element of E is an ordered pair Ti → Tj.
If Ti → Tj is in E, then there is a directed edge from Ti to Tj, implying that Ti is waiting for Tj to release a data item.
When Ti requests a data item currently being held by Tj, the edge Ti → Tj is inserted in the wait-for graph. This edge is removed only when Tj
is no longer holding a data item needed by Ti.
The system is in a deadlock state if and only if the wait-for graph has a cycle. Must invoke a deadlock-detection algorithm periodically to look
for cycles.
Deadlock Recovery
When deadlock is detected:
Some transaction will have to be rolled back (made a victim) to break the deadlock. Select as victim the transaction that will incur minimum cost.
Rollback: determine how far to roll back the transaction. Total rollback: abort the transaction and then restart it.
It is more effective to roll back the transaction only as far as necessary to break the deadlock.
Starvation happens if the same transaction is always chosen as victim. Include the number of rollbacks in the cost factor to avoid starvation.
Distributed DBMS - Distributed Databases
A distributed database is a collection of multiple interconnected databases, which are spread physically across various locations that
communicate via a computer network.
Advantages of Distributed Database System:

● Transparent Management of Distributed and Replicated Data

● Reliability Through Distributed Transactions

● Improved Performance

● Easier System Expansion

Disadvantages of Distributed Database System

● Design Issues

● Cost of Update Replication and Synchronization

● Cost of Security Constraints

● Recovery from Failure and Synchronization

Distributed Data Storage

There are 2 ways in which data can be stored on different sites. These are:
1. Replication
In this approach, the entire relation is stored redundantly at two or more sites. If the entire database is available at all sites, it is a fully redundant
database. Hence, in replication, systems maintain copies of data.
This is advantageous as it increases the availability of data at different sites, and query requests can be processed in parallel.
However, it has certain disadvantages as well. Data needs to be kept consistent: any change made at one site must be recorded at
every site where that relation is stored, or else it may lead to inconsistency. This is a lot of overhead. Also, concurrency control becomes far more
complex, as concurrent access must now be checked across a number of sites.

2. Fragmentation
In this approach, the relations are fragmented (i.e., divided into smaller parts) and each fragment is stored at the different sites
where it is required. It must be ensured that the fragments can be used to reconstruct the original relation (i.e., there
is no loss of data).
Fragmentation is advantageous because it does not create copies of data, so consistency is not a problem.
Fragmentation of relations can be done in two ways:
● Horizontal fragmentation – Splitting by rows – The relation is fragmented into groups of tuples so that each tuple is assigned to at least
one fragment.
● Vertical fragmentation – Splitting by columns – The schema of the relation is divided into smaller schemas. Each fragment must contain a
common candidate key so as to ensure lossless join.
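The two fragmentation schemes and their lossless reconstruction can be illustrated with a small in-memory relation. The column names and values here are made up for illustration; a real DBMS performs this at the storage level:

```python
# Sketch: horizontal and vertical fragmentation of a small relation,
# followed by lossless reconstruction. "id" plays the candidate key.

employees = [
    {"id": 1, "name": "Asha",  "dept": "Sales"},
    {"id": 2, "name": "Ravi",  "dept": "HR"},
    {"id": 3, "name": "Meena", "dept": "Sales"},
]

# Horizontal fragmentation: split by rows (here, by department).
frag_sales = [r for r in employees if r["dept"] == "Sales"]
frag_hr    = [r for r in employees if r["dept"] == "HR"]
# Reconstruction is the union of the fragments.
assert sorted(frag_sales + frag_hr, key=lambda r: r["id"]) == employees

# Vertical fragmentation: split by columns, keeping the key "id" in each
# fragment so the original relation can be rebuilt with a lossless join.
frag_a = [{"id": r["id"], "name": r["name"]} for r in employees]
frag_b = [{"id": r["id"], "dept": r["dept"]} for r in employees]
# Reconstruction is a join on the shared candidate key.
rebuilt = [{**a, **b} for a in frag_a for b in frag_b if a["id"] == b["id"]]
assert sorted(rebuilt, key=lambda r: r["id"]) == employees

print("both fragmentations reconstruct the original relation")
```

Note that the vertical reconstruction only works because every fragment carries the candidate key, which is exactly the requirement stated above.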

Directory System
A Derby database contains dictionary objects such as tables, columns, indexes, and jar files. A Derby database can also store
its own configuration information.

The database directory


A Derby database is stored in files that live in a directory of the same name as the database. Database directories typically live in system
directories.

Note: An in-memory database does not use the file system, but the size limits listed in the table later in this topic still apply. For some limits,
the maximum value is determined by the available main memory instead of the available disk space and file system limitations.

A database directory contains the following, as shown in the following figure.


● log directory

Contains files that make up the database transaction log, used internally for data recovery (not the same thing as the error log).

● seg0 directory

Contains one file for each user table, system table, and index (known as conglomerates).

● service.properties file

A text file with internal configuration information.

● tmp directory

(might not exist.) A temporary directory used by Derby for large sorts and deferred updates and deletes. Sorts are used by a variety of
SQL statements. For databases on read-only media, you might need to set a property to change the location of this directory. See
"Creating Derby Databases for Read-Only Use".

● jar directory

(might not exist.) A directory in which jar files are stored when you use database class loading.

The following figure shows the files and directories in the Derby database directories that are used by the Derby software.

Figure 1. An example of a Derby database directory and file structure


Derby imposes relatively few limitations on the number and size of databases and database objects. The following table shows some size
limitations of Derby databases and database objects.

Table 1. Size limits for Derby database objects


Type of Object                       Limit
Tables in each database              java.lang.Long.MAX_VALUE. Some operating systems impose a limit on the number of
                                     files allowed in a single directory.
Indexes in each table                32,767 or storage
Columns in each table                1,012
Number of columns on an index key    16
Rows in each table                   No limit.
Size of table                        No limit. Some operating systems impose a limit on the size of a single file.
Size of row                          No limit. Rows can span pages but cannot span tables, so operating-system limits
                                     on the size of a single file limit the size of a table and hence the size of a
                                     row in that table.
