Unit 4 DBMS Transaction

Uploaded by

Mohit Singh
Copyright
© All Rights Reserved

UNIT-2

Database Management Systems(KCA 204)


Transaction
A transaction is an executing program that forms a logical unit of database processing. It
includes one or more database access operations, which can be insertion, deletion,
modification, or retrieval operations. Put differently, a transaction is a unit of program
execution that accesses and possibly updates various data items.
A transaction represents a unit of work performed within a database management
system (or similar system) against a database, and it is treated in a coherent and reliable
way, independent of other transactions. A transaction generally represents any change in a
database.

The transaction consists of all operations executed between the begin transaction
and end transaction.

Operations of transaction:
1. Read Operation-
 read_item(X). Reads a database item named X into a program variable. To simplify
our notation, we assume that the program variable is also named X.
2. Write Operation-
 write_item(X). Writes the value of program variable X into the database item named
X.

Transaction States
A transaction in a database can be in one of the following states –
1. Active − In this state, the transaction is being executed. This is the initial state of
every transaction.
2. Partially Committed − When a transaction executes its final operation, it is said to be
in a partially committed state.
3. Failed − A transaction is said to be in a failed state if any of the checks made by the
database recovery system fails. A failed transaction can no longer proceed further.
4. Aborted − If any of the checks fails and the transaction has reached a failed state,
then the recovery manager rolls back all its write operations on the database to
bring the database back to its original state where it was prior to the execution of
the transaction. Transactions in this state are called aborted. The database recovery
module can select one of the two operations after a transaction aborts −

 Re-start the transaction
 Kill the transaction

5. Committed − If a transaction executes all its operations successfully, it is said to be
committed. All its effects are now permanently established on the database system.
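
The five states above can be read as a small state machine. The following is a minimal sketch, assuming the usual transitions (active to partially committed or failed, partially committed to committed or failed, failed to aborted); the names TRANSITIONS, advance and run are illustrative and do not come from any DBMS.

```python
# Allowed state transitions, following the five states listed above.
# Assumption of this sketch: a restarted transaction counts as a new
# transaction, so "aborted" and "committed" are terminal states.
TRANSITIONS = {
    "active": {"partially_committed", "failed"},
    "partially_committed": {"committed", "failed"},
    "failed": {"aborted"},
    "committed": set(),
    "aborted": set(),
}


def advance(state, new_state):
    """Return new_state if the move is legal, otherwise raise ValueError."""
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state


def run(path):
    """Start in the active state and follow the given sequence of states."""
    state = "active"
    for nxt in path:
        state = advance(state, nxt)
    return state


print(run(["partially_committed", "committed"]))
print(run(["failed", "aborted"]))
```

A committed transaction cannot move to any other state, which is why advance("committed", "active") raises an error in this sketch.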

Desirable Properties of Transactions:


Transactions should possess several properties, often called the ACID properties; they
should be enforced by the concurrency control and recovery methods of the DBMS. The
following are the ACID properties:
 Atomicity: - A transaction is an atomic unit of processing; it should either be
performed in its entirety or not performed at all.
 Consistency: - A transaction should be consistency preserving, meaning that if it is
completely executed from beginning to end without interference from other
transactions, it should take the database from one consistent state to another.
 Isolation: - A transaction should appear as though it is being executed in isolation
from other transactions, even though many transactions are executing concurrently.
That is, the execution of a transaction should not be interfered with by any other
transactions executing concurrently.
 Durability: - The changes applied to the database by a committed transaction must
persist in the database. These changes must not be lost because of any failure.
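
Atomicity can be observed with any transactional database. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the account table, the names A and B, and the transfer function are assumptions made only for this illustration.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (bal,) = conn.execute(
                "SELECT balance FROM account WHERE name = ?", (src,)
            ).fetchone()
            if bal < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

ok = transfer(conn, "A", "B", 30)       # both updates are committed
failed = transfer(conn, "A", "B", 500)  # the debit is rolled back
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, failed, balances)
```

The second transfer debits A before the check fails, but the rollback removes that partial update, so the balances remain as they were after the first transfer.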

Concurrent Execution of Transaction


In the transaction process, a system usually allows executing more than one transaction
simultaneously. This process is called a concurrent execution.

Advantages of concurrent execution of a transaction


1. Decreased waiting time or turnaround time.
2. Improved response time.
3. Increased throughput or resource utilization.

Why Concurrency Control Needed?


1. The Lost Update Problem (Write-Write Conflict): This problem occurs when multiple
transactions execute concurrently and updates from one or more transactions get
lost.
1. T1 reads the value of A (= 10 say).
2. T1 updates the value of A (to 15 say) in the buffer.
3. T2 does a blind write A = 25 (write without read) in the buffer.
4. T2 commits.
5. When T1 commits, it writes A = 25 in the database.
 T1 writes the overwritten value of A to the database.
 Thus, the update from T1 gets lost.
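
The lost update sequence can be replayed in a few lines. This is only a sketch, assuming a single shared buffer that is flushed to the database on commit; the helper names write and commit are hypothetical.

```python
database = {"A": 10}
buffer = dict(database)  # shared buffer (an assumption of this sketch)
history = []             # every buffered write, for inspection


def write(txn, item, value):
    """Record a write in the shared buffer, not yet in the database."""
    buffer[item] = value
    history.append((txn, item, value))


def commit(txn):
    """Flush the current buffer contents to the database."""
    database.update(buffer)


t1_seen = buffer["A"]          # T1 reads A = 10
write("T1", "A", t1_seen + 5)  # T1 updates A to 15 in the buffer
write("T2", "A", 25)           # T2 blind-writes A = 25 over T1's value
commit("T2")
commit("T1")                   # T1 flushes 25, not its own 15

print(database["A"])
```

The value 15 written by T1 appears in the history but never reaches the database.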

2. Temporary Update or Dirty Read Problem (Write-Read Conflict): Reading data
written by an uncommitted transaction is called a dirty read.
 T1 writes a new value of A but has not yet committed.
 T2 reads the dirty value of A written by the uncommitted transaction T1.
 T1 fails in a later stage and rolls back.
 Thus, the value that T2 read is now incorrect.
 Therefore, the database becomes inconsistent.

3. The Incorrect Summary Problem: If one transaction is calculating an aggregate
summary function on a number of database items while other transactions are
updating some of these items, the aggregate function may calculate some values
before they are updated and others after they are updated.
4. Unrepeatable Read Problem:
The unrepeatable read problem occurs when two or more read operations of the same
transaction read different values of the same variable.
Example:
1. T1 reads the value of X (= 10 say).
2. T2 reads the value of X (= 10).
3. T1 updates the value of X (from 10 to 15 say) in the buffer.
4. T2 again reads the value of X (but = 15).
 T2 reads a different value of X in its second read.
 From T2's point of view the change is unexplained, because T2 assumes it is running in
isolation.

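5. Phantom Read Problem:
The phantom read problem occurs when a transaction reads a variable once but, when it
tries to read that same variable again, finds that the variable no longer exists. More
generally, a phantom occurs when a transaction repeats a query and gets a different set of
rows because another transaction has inserted or deleted rows in the meantime.
Example:
1. T1 reads X.
2. T2 reads X.
3. T1 deletes X.
4. T2 tries reading X but does not find it.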

Schedules in DBMS: The order in which the operations of multiple transactions appear for
execution is called a schedule. In other words, a schedule is an execution sequence in
chronological order.
A schedule consists of a sequence of instructions from various transactions, and it must
preserve the order in which the instructions appear within each individual transaction.

Types of Schedules

Serial Schedules-
 All the transactions execute serially one after the other.
 When one transaction executes, no other transaction is allowed to execute.
 Schedules in which the transactions are executed non-interleaved, i.e., a serial schedule
is one in which no transaction starts until a running transaction has ended are called
serial schedules.
 The serial schedule is a type of schedule where one transaction is executed completely
before starting another transaction.

Non-Serial Schedules-
 Multiple transactions execute concurrently.
 Operations of all the transactions are interleaved or mixed with each other.
 In a non-serial schedule, a transaction proceeds without waiting for the
previous transaction to complete.
 If interleaving of operations is allowed, the result is a non-serial schedule.
 There are many possible orders in which the system can execute the individual operations
of the transactions.
Serializability:-
When multiple transactions run concurrently, there is a possibility that the
database may be left in an inconsistent state. Serializability is the criterion used to
decide which concurrent schedules are acceptable: a serializable schedule has the same
effect as some serial schedule and therefore leaves the database in a consistent state,
provided each transaction preserves consistency when run alone.
If a given non-serial schedule of ‘n’ transactions is equivalent to some serial schedule of ‘n’
transactions, then it is called a serializable schedule.
A non-serial schedule is said to be serializable if it is conflict equivalent or view equivalent
to a serial schedule.

Types of Serializability-
Serializability is mainly of two types-

1. Conflict Serializability
2. View Serializability

Conflict Serializability-
If a given non-serial schedule can be converted into a serial schedule by swapping its
non-conflicting operations, then it is called a conflict serializable schedule.

If a schedule S can be transformed into a schedule S´ by a series of swaps of non-conflicting
instructions, we say that S and S´ are conflict equivalent.

We say that a schedule S is conflict serializable if it is conflict equivalent to a serial
schedule. For example, a schedule over T3 and T4 is not conflict serializable when its
instructions cannot be swapped to obtain either the serial schedule < T3, T4 > or the serial
schedule < T4, T3 >.

Instructions li and lj of transactions Ti and Tj respectively conflict if and only if there exists
some item Q accessed by both li and lj, and at least one of these instructions wrote Q.

1. li = read(Q), lj = read(Q). li and lj don’t conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict.
4. li = write(Q), lj = write(Q). They conflict.

Conflicting Operations-
Two operations are called conflicting operations if all the following conditions hold true
for them-
 Both the operations belong to different transactions
 Both the operations are on the same data item
 At least one of the two operations is a write operation
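
The three conditions translate directly into a predicate. The following is a minimal sketch; the Op tuple and the function name conflicts are illustrative choices, not part of any library.

```python
from typing import NamedTuple


class Op(NamedTuple):
    kind: str  # "R" for read, "W" for write
    txn: int   # transaction number
    item: str  # data item name


def conflicts(a: Op, b: Op) -> bool:
    """True when a and b satisfy all three conditions for a conflict."""
    return (
        a.txn != b.txn                # different transactions
        and a.item == b.item          # same data item
        and "W" in (a.kind, b.kind)   # at least one write
    )


print(conflicts(Op("R", 1, "Q"), Op("R", 2, "Q")))  # read-read
print(conflicts(Op("R", 1, "Q"), Op("W", 2, "Q")))  # read-write
print(conflicts(Op("W", 1, "Q"), Op("W", 1, "Q")))  # same transaction
```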
Example of Conflict Serializable Schedule

Testing of Serializability for Conflict Serializable Schedule
Q1. Check whether the given schedule S is conflict serializable or not-
S : R1(A) , R2(A) , R1(B) , R2(B) , R3(B) , W1(A) , W2(B)
Step-01:
List all the conflicting operations and determine the dependency between the transactions-
 R2(A) , W1(A) (T2 → T1)
 R1(B) , W2(B) (T1 → T2)
 R3(B) , W2(B) (T3 → T2)
Step-02:
Draw the precedence graph with one node per transaction and the edges T2 → T1, T1 → T2
and T3 → T2.
 Clearly, there exists a cycle in the precedence graph (T1 → T2 → T1).
 Therefore, the given schedule S is not conflict serializable.
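
The two steps of the test can be automated. The sketch below parses a schedule written in the R1(A) / W2(B) notation used above, builds the precedence graph edges, and looks for a cycle with a depth-first search; the function names parse, precedence_edges and has_cycle are assumptions of this example.

```python
import re
from collections import defaultdict


def parse(schedule):
    """Turn 'R1(A) , W2(B)' into [('R', 1, 'A'), ('W', 2, 'B')]."""
    return [(k, int(t), x)
            for k, t, x in re.findall(r"([RW])(\d+)\((\w+)\)", schedule)]


def precedence_edges(ops):
    """Add an edge Ti -> Tj for each conflicting pair where Ti's op comes first."""
    edges = set()
    for i, (k1, t1, x1) in enumerate(ops):
        for k2, t2, x2 in ops[i + 1:]:
            if t1 != t2 and x1 == x2 and "W" in (k1, k2):
                edges.add((t1, t2))
    return edges


def has_cycle(edges):
    """Depth-first search; reaching a node still on the stack means a cycle."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)

    def visit(u):
        colour[u] = GREY
        for v in graph[u]:
            if colour[v] == GREY or (colour[v] == WHITE and visit(v)):
                return True
        colour[u] = BLACK
        return False

    return any(colour[u] == WHITE and visit(u) for u in list(graph))


S = "R1(A) , R2(A) , R1(B) , R2(B) , R3(B) , W1(A) , W2(B)"
edges = precedence_edges(parse(S))
print(sorted(edges), "cycle:", has_cycle(edges))
```

For the schedule S above, the computed edges are T1 → T2, T2 → T1 and T3 → T2, and the search reports a cycle, matching the manual result.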

Q2. Check whether the given schedule S is conflict serializable and recoverable or not?

Step-01:
List all the conflicting operations and determine the dependency between the transactions-
 R2(X) , W3(X) (T2 → T3)
 R2(X) , W1(X) (T2 → T1)
 W3(X) , W1(X) (T3 → T1)
 W3(X) , R4(X) (T3 → T4)
 W1(X) , R4(X) (T1 → T4)
 W2(Y) , R4(Y) (T2 → T4)
Step-02:
Draw the precedence graph-
 Clearly, there exists no cycle in the precedence graph.
 Therefore, the given schedule S is conflict serializable (equivalent serial order:
T2 → T3 → T1 → T4).
 Conflict serializability alone does not guarantee recoverability, so recoverability has
to be checked from the reads-from relationships and the commit order.
 T4 reads X after W1(X) and Y after W2(Y), so S is recoverable provided T4 commits
only after T1 and T2 have committed.

View Serializability
o A schedule is view serializable if it is view equivalent to a serial schedule.
o If a schedule is conflict serializable, then it is also view serializable.
o A schedule that is view serializable but not conflict serializable contains blind writes.

View Equivalent

Two schedules S1 and S2 are said to be view equivalent if they satisfy the following
conditions:

1. Initial Read
The initial read of each data item must be the same in both schedules. If a transaction T1
reads the initial value of data item A in S1, then T1 must also read the initial value of A
in S2.
2. Updated Read
If a transaction Ti reads a value of A written by Tj in S1, then Ti must also read the value of
A written by Tj in S2. For example, two schedules are not view equivalent if T3 reads A
updated by T2 in S1 but reads A updated by T1 in S2.
3. Final Write
The final write on each data item must be done by the same transaction in both schedules. For
example, if the final write on A is done by T3 in S1, it must also be done by T3 in S2.
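
The three conditions can be checked mechanically by recording, for every read, which transaction wrote the value that was read (None standing for the initial value), together with the final writer of each item. The sketch below does this and then tries every serial order by brute force; the schedule S in it is a hypothetical example, and all function names are assumptions of this illustration.

```python
from collections import Counter
from itertools import permutations


def view_profile(ops):
    """Return (reads_from, final_writer) for a list of (kind, txn, item)."""
    last_writer, reads_from = {}, Counter()
    for kind, txn, item in ops:
        if kind == "R":
            # None as the writer means the read saw the initial value.
            reads_from[(txn, item, last_writer.get(item))] += 1
        else:
            last_writer[item] = txn
    return reads_from, last_writer


def view_equivalent(s1, s2):
    """Same initial reads, same updated reads, same final writes."""
    return view_profile(s1) == view_profile(s2)


def serial(ops, order):
    """Rearrange ops into the serial schedule that runs transactions in order."""
    return [op for t in order for op in ops if op[1] == t]


def view_serial_orders(ops):
    """All serial orders that are view equivalent to ops (brute force)."""
    txns = sorted({t for _, t, _ in ops})
    return [order for order in permutations(txns)
            if view_equivalent(ops, serial(ops, order))]


# Hypothetical schedule: R1(A), W2(A), W1(A), W3(A), with blind writes by T2 and T3.
S = [("R", 1, "A"), ("W", 2, "A"), ("W", 1, "A"), ("W", 3, "A")]
orders = view_serial_orders(S)
print(orders)
```

Trying all permutations is only practical for a handful of transactions, which is enough for textbook-sized exercises.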

Q1. Check whether the given schedule S is view serializable or not?

Solution:- First check whether S is conflict serializable.
[Precedence graph with nodes T1, T2 and T3]
The graph has a cycle between T1 and T2, so S is not conflict serializable.

Next, check for blind writes. A blind write exists (in T2 or T3), so S may still be view
serializable, and we check view equivalence against the serial schedule S1 (T1 → T2 → T3).

1. The initial read in S is done by T1, and in S1 it is also done by T1.
2. In both schedules S and S1 there is no write-read sequence (no read other than the
initial read), so the updated read condition does not need to be checked.
3. The final write in S is done by T3, and in S1 it is also done by T3.

So S and S1 are view equivalent. Since the first serial schedule S1 satisfies all three
conditions, we don’t need to check any other serial schedule. Hence, the view equivalent
serial schedule is

T1 → T2 → T3

Q2. Check whether the given schedule S is view serializable or not?

Solution-
We know that if a schedule is conflict serializable, then it is surely view serializable.
 So, let us check whether the given schedule is conflict serializable or not.
Checking Whether S is Conflict Serializable Or Not-
Step-01:
List all the conflicting operations and determine the dependency between the transactions-
 W1(B) , W2(B) (T1 → T2)
 W1(B) , W3(B) (T1 → T3)
 W1(B) , W4(B) (T1 → T4)
 W2(B) , W3(B) (T2 → T3)
 W2(B) , W4(B) (T2 → T4)
 W3(B) , W4(B) (T3 → T4)
Step-02:
Draw the precedence graph-
 There exists no cycle in the precedence graph. Therefore, the given schedule S is conflict
serializable.
 Thus, we conclude that the given schedule is also view serializable.
Q3. Check whether the given schedule S is view serializable or not?

Solution-
We know that if a schedule is conflict serializable, then it is surely view serializable.
 So, let us check whether the given schedule is conflict serializable or not.
Checking Whether S is Conflict Serializable Or Not-
Step-01:
List all the conflicting operations and determine the dependency between the transactions-
 R1(A) , W3(A) (T1 → T3)
 R2(A) , W3(A) (T2 → T3)
 R2(A) , W1(A) (T2 → T1)
 W3(A) , W1(A) (T3 → T1)
Step-02:
Draw the precedence graph-
Clearly, there exists a cycle between T1 and T3 in the precedence graph (T1 → T3 → T1).
Therefore, the given schedule S is not conflict serializable.

Since the given schedule S is not conflict serializable, it may or may not be view
serializable.
 Let us check for blind writes.
Checking for Blind Writes-
 There exists a blind write W3(A) in the given schedule S.
 Therefore, the given schedule S may or may not be view serializable.
 To decide, let us draw a dependency graph based on the view equivalence conditions.
Drawing a Dependency Graph-
 T1 firstly reads A and T3 firstly updates A.
 So, T1 must execute before T3.
 Thus, we get the dependency T1 → T3.
 The final update on A is made by the transaction T1.
 So, T1 must execute after all other transactions.
 Thus, we get the dependency (T2, T3) → T1.
 There exists no write-read sequence.
Now, let us draw a dependency graph using these dependencies-
 There exists a cycle in the dependency graph (T1 → T3 → T1). Thus, we conclude that the
given schedule S is not view serializable.

Non-Serializable Schedules-
A non-serial schedule which is not serializable is called a non-serializable schedule.
A non-serializable schedule is not guaranteed to produce the same effect as produced
by some serial schedule on any consistent database.

Such schedules are further characterized as:-
i) Recoverable Schedule
ii) Irrecoverable Schedule

Recoverable Schedule:
If some transaction Tj reads a value updated or written by some other transaction Ti,
then the commit of Tj must occur after the commit of Ti.
If in a schedule,
 A transaction performs a dirty read operation from an uncommitted transaction
 And its commit operation is delayed till the uncommitted transaction either commits
or rolls back
then such a schedule is known as a Recoverable Schedule.
 The commit operation of the transaction that performs the dirty read is delayed.
 This ensures that it still has a chance to recover if the uncommitted transaction fails
later.

Example 1:-
 T2 performs a dirty read operation.
 The commit operation of T2 is delayed till T1 commits or rolls back.
 If T1 fails, T2 still has a chance to recover by rolling back.

Example 2 - S1: R1(x), W1(x), R2(x), R1(y), R2(y), W2(x), W1(y), C1, C2;
In S1, T2 reads x after T1 has written it (W1(x) precedes R2(x)), so T2 reads from T1
(Ti = T1, Tj = T2). T1 commits before T2 (C1 precedes C2), so the commit order follows the
reads-from order and the given schedule is recoverable.
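
The recoverability condition can be checked by tracking who reads from whom and comparing that with the commit order. The following is a simplified sketch that handles only reads, writes and commits (aborts are not modelled); parse and is_recoverable are names invented for this example.

```python
import re


def parse(schedule):
    """Turn 'R1(x), C1' into [('R', 1, 'x'), ('C', 1, None)]."""
    pattern = r"([RWC])(\d+)(?:\((\w+)\))?"
    return [(k, int(t), x or None) for k, t, x in re.findall(pattern, schedule)]


def is_recoverable(ops):
    """True if every transaction commits after all transactions it read from."""
    last_writer = {}    # item -> transaction that wrote it most recently
    reads_from = set()  # (reader, writer) pairs
    committed = []      # transactions in commit order
    for kind, txn, item in ops:
        if kind == "W":
            last_writer[item] = txn
        elif kind == "R":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                reads_from.add((txn, writer))
        else:  # commit
            if any(r == txn and w not in committed for r, w in reads_from):
                return False
            committed.append(txn)
    return True


S1 = "R1(x), W1(x), R2(x), R1(y),R2(y), W2(x), W1(y), C1, C2"
print(is_recoverable(parse(S1)))
print(is_recoverable(parse("W1(x), R2(x), C2, C1")))
```

The second call uses a hypothetical schedule in which T2 commits before the transaction it read from, which is exactly the irrecoverable pattern.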

Irrecoverable Schedule- When Tj reads a value updated by Ti and Tj commits
before Ti commits, the schedule is irrecoverable.

If in a schedule a transaction performs a dirty read operation from an uncommitted
transaction and commits before the transaction from which it has read the value, then such
a schedule is known as an Irrecoverable Schedule.
 T2 performs a dirty read operation.
 T2 commits before T1.
 T1 fails later and rolls back.
 The value that T2 read is now incorrect.
 T2 cannot recover since it has already committed.

Types of Recoverable Schedules
There can be three types of recoverable schedule:

i) Cascading Schedule / Cascading Rollback
If in a schedule the failure of one transaction causes several other dependent transactions to
roll back or abort, then such a schedule is called a Cascading Rollback, Cascading Abort,
or Cascading Schedule. It leads to wasted CPU time, because the work of every rolled-back
transaction is discarded.
Cascading rollbacks occur because of dirty reads.
ii) Cascadeless Schedule:
Schedules in which transactions read values only after all transactions whose changes they
are going to read have committed are called cascadeless schedules. This prevents a single
transaction abort from leading to a series of transaction rollbacks. The strategy to prevent
cascading aborts is to disallow a transaction from reading uncommitted changes of another
transaction in the same schedule.
In other words, if some transaction Tj wants to read a value updated or written by some
other transaction Ti, then Tj must read it only after the commit of Ti.

Example: Consider the following schedule involving two transactions T1 and T2.
T1          T2
R(A)
W(A)
            W(A)
Commit
            R(A)
            Commit

This schedule is cascadeless, since the updated value of A is read by T2 only after the
updating transaction, i.e. T1, commits.

iii) Strict Schedule:
A schedule is strict if, for any two transactions Ti and Tj, whenever a write operation of Ti
precedes a conflicting operation of Tj (either read or write), the commit or abort event of Ti
also precedes that conflicting operation of Tj.
In other words, Tj can read or write a value updated or written by Ti only after
Ti commits or aborts.
Example: Consider the following schedule involving two transactions T1 and T2.
T1          T2
R(A)
            R(A)
W(A)
Commit
            W(A)
            R(A)
            Commit

This is a strict schedule, since T2 reads and writes A, which is written by T1, only after the
commit of T1.
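
The cascadeless and strict definitions differ only in which operations may touch uncommitted data, and a small checker makes the difference visible. This is a sketch under simplifying assumptions (only reads, writes and commits; only the latest uncommitted writer of each item is tracked); the function name classify and the three sample schedules are illustrative.

```python
def classify(ops):
    """ops: list of (kind, txn, item); kind is 'R', 'W' or 'C' (commit)."""
    uncommitted_writer = {}  # item -> transaction holding an uncommitted write
    cascadeless = strict = True
    for kind, txn, item in ops:
        if kind == "C":
            # After commit, this transaction's writes are no longer dirty.
            uncommitted_writer = {
                x: t for x, t in uncommitted_writer.items() if t != txn
            }
            continue
        holder = uncommitted_writer.get(item)
        if holder is not None and holder != txn:
            strict = False           # read or write over uncommitted data
            if kind == "R":
                cascadeless = False  # dirty read
        if kind == "W":
            uncommitted_writer[item] = txn
    return {"cascadeless": cascadeless, "strict": strict}


# T2 overwrites A before T1 commits, but reads A only after T1's commit.
overwrite_early = [("R", 1, "A"), ("W", 1, "A"), ("W", 2, "A"),
                   ("C", 1, None), ("R", 2, "A"), ("C", 2, None)]
# T2 touches T1's written value only after T1 commits.
after_commit = [("R", 1, "A"), ("R", 2, "A"), ("W", 1, "A"),
                ("C", 1, None), ("W", 2, "A"), ("R", 2, "A"), ("C", 2, None)]
# T2 reads A while T1's write is still uncommitted.
dirty_read = [("W", 1, "A"), ("R", 2, "A"), ("C", 1, None), ("C", 2, None)]

for name, s in [("overwrite_early", overwrite_early),
                ("after_commit", after_commit),
                ("dirty_read", dirty_read)]:
    print(name, classify(s))
```

The first schedule is cascadeless but not strict, because a write over uncommitted data is allowed by the cascadeless rule but not by the strict rule.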

1. Cascadeless schedules are stricter than recoverable schedules, i.e. they are a subset of
recoverable schedules.
2. Strict schedules are stricter than cascadeless schedules, i.e. they are a subset of
cascadeless schedules.
3. Serial schedules satisfy the constraints of recoverable, cascadeless and strict schedules
and hence are a subset of strict schedules.
The relation between the various types of schedules can be depicted as:
Serial ⊂ Strict ⊂ Cascadeless ⊂ Recoverable
Recovery System:-
Database recovery is the process of restoring the database to a correct (consistent) state in
the event of a failure. In other words, it is the process of restoring the database to the most
recent consistent state that existed shortly before the time of system failure.
Failure Classification
To see where the problem has occurred, we generalize a failure into various categories, as
follows −

 Transaction failure:
o Logical errors: the transaction cannot complete due to some internal error
condition such as bad input or data not found
o System errors: the database system must terminate an active transaction
due to an error condition (e.g., deadlock)
 System crash: a power failure or other hardware or software failure causes the
system to crash.
 Disk failure: a head crash or similar disk failure destroys all or part of disk storage
o Destruction is assumed to be detectable: disk drives use checksums to detect
failures


Recovery from Transaction Failure
Transaction recovery is the process of removing the undesirable effects of specific
transactions from the database. The following recovery schemes are used to recover from
transaction failure-

1. Log based recovery
2. Shadow paging
3. Checkpoint

Log based Recovery

A log is the most widely used structure for recording database modifications. It is a
sequence of log records that records all the update activities in the database. The
various types of log records are denoted as follows:

1. <Ti start>: transaction Ti has started.
2. <Ti, Xi, V1, V2>: transaction Ti has performed a write on data item Xi. Xi had
value V1 before the write and will have value V2 after the write.
3. <Ti commit>: transaction Ti has committed.
4. <Ti abort>: transaction Ti has aborted.

Two approaches that use log based recovery −

1. Deferred database modification − In the deferred modification technique, a
transaction does not modify the database until it has committed. In this method,
all the log records are created and stored in stable storage, and the database is updated
when the transaction commits. It is also called the NO-UNDO/REDO technique.

Example: let A=1000, B=2000, C=700, and let the transactions perform the following updates.
A=A-50 means A=1000-50 => A=950
B=B+50 means B=2000+50 => B=2050
C=C-100 means C=700-100 => C=600
The new values 950, 2050 and 600 are first recorded only in the log; each is applied to the
database only after the transaction that wrote it has committed.
2. Immediate database modification − The database may be modified while the transaction
is still active, but the log record for each write must reach stable storage before the
corresponding database modification. It uses both undo and redo operations.
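
Recovery under deferred modification needs only a redo pass over the log. The sketch below uses the values A=1000, B=2000, C=700 from the example; the split into two transactions named T0 and T1, the crash point, and the tuple layout of the log records are assumptions made for illustration.

```python
def redo_committed(log, database):
    """Replay writes of committed transactions only (NO-UNDO/REDO)."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, txn, item, new_value = rec
            database[item] = new_value
    return database


# Assumed log: T0 updates A and B and commits; T1 updates C, then the
# system crashes before <T1 commit> reaches the log.
log = [
    ("start", "T0"),
    ("write", "T0", "A", 950),
    ("write", "T0", "B", 2050),
    ("commit", "T0"),
    ("start", "T1"),
    ("write", "T1", "C", 600),
]

db = redo_committed(log, {"A": 1000, "B": 2000, "C": 700})
print(db)
```

Because T1 never committed, its write to C was never applied to the database, so there is nothing to undo and C keeps its old value.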

Shadow Paging
Shadow paging is a recovery technique that helps to maintain database consistency in
case of failure.

In this technique, the database is considered to be made up of fixed-size logical units of
storage called pages. Pages are mapped onto physical blocks of storage with the help
of a page table, which has one entry for each logical page of the database. The method
uses two page tables, named the current page table and the shadow page table.
The entries in the current page table point to the most recent database pages on disk.
When a transaction starts, the current page table is copied to create the shadow page
table. The shadow page table is then saved on disk, and the current page table is used
by the transaction. Entries in the current page table may change during execution, but
the shadow page table never changes. When the transaction commits, the current page
table becomes the new shadow page table; if it fails, the shadow page table is used to
restore the state that existed before the transaction.
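
The two-table idea can be sketched with plain lists standing in for disk blocks and page tables. This is a toy model under stated assumptions (one transaction at a time, no real disk I/O, old blocks never reclaimed); the class ShadowPagedDB and its methods are hypothetical.

```python
class ShadowPagedDB:
    def __init__(self, pages):
        self.blocks = list(pages)              # physical blocks on "disk"
        self.shadow = list(range(len(pages)))  # logical page -> block number
        self.current = None                    # exists only during a transaction

    def begin(self):
        self.current = list(self.shadow)       # both tables start identical

    def write(self, page, value):
        self.blocks.append(value)              # write to a fresh block
        self.current[page] = len(self.blocks) - 1

    def read(self, page):
        table = self.current if self.current is not None else self.shadow
        return self.blocks[table[page]]

    def commit(self):
        # The current table becomes the new shadow table in one step.
        self.shadow, self.current = self.current, None

    def abort(self):
        self.current = None                    # shadow table was never touched


db = ShadowPagedDB(["a", "b"])
db.begin()
db.write(0, "x")
seen_inside = db.read(0)
db.abort()
seen_after_abort = db.read(0)
db.begin()
db.write(1, "y")
db.commit()
print(seen_inside, seen_after_abort, db.read(1))
```

The blocks that no page table points to after an abort or commit are the garbage that a real implementation would have to collect.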
Advantages of shadow paging
The advantages of shadow paging are as follows −

 No need for log records.
 No undo/redo algorithm.
 Recovery is faster.
Disadvantages of shadow paging
The disadvantages of shadow paging are as follows −
 Data is fragmented or scattered.
 Garbage collection problem. Database pages containing old versions of modified
data need to be garbage collected after every transaction.
 Concurrent transactions are difficult to execute.
Checkpoint in DBMS
Keeping and maintaining logs in real time and in a real environment may use up all the
storage space available in the system. As time passes, the log file may grow too big to be
handled at all. Checkpoint is a mechanism where all the previous logs are removed from
the system and stored permanently on a storage disk. A checkpoint declares a point before
which the DBMS was in a consistent state and all the transactions were committed.

Recovery
When a system with concurrent transactions crashes and recovers, it behaves in the
following manner −
 The recovery system reads the logs backwards from the end to the last checkpoint.
 It maintains two lists, an undo-list and a redo-list.
 If the recovery system sees a log with <Tn, Start> and <Tn, Commit> or just <Tn,
Commit>, it puts the transaction in the redo-list.
 If the recovery system sees a log with <Tn, Start> but no commit or abort log,
it puts the transaction in the undo-list.
All the transactions in the undo-list are then undone and their logs are removed. For all the
transactions in the redo-list, their previous logs are removed and the transactions are redone
before their logs are saved.
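
The backward scan that builds the two lists can be sketched as follows. The log format (tuples whose first field is the record type) and the function name build_lists are assumptions of this example, and transactions that were still active when the checkpoint was taken are not handled.

```python
def build_lists(log):
    """Scan backwards to the last checkpoint; return (undo_list, redo_list)."""
    undo, redo, finished = [], [], set()
    for rec in reversed(log):
        kind = rec[0]
        if kind == "checkpoint":
            break
        txn = rec[1]
        if kind == "commit":
            redo.append(txn)      # committed after the checkpoint: redo
            finished.add(txn)
        elif kind == "abort":
            finished.add(txn)     # already rolled back: nothing to do
        elif kind == "start" and txn not in finished:
            undo.append(txn)      # started but never finished: undo
    return undo, redo


log = [
    ("start", "T1"),
    ("commit", "T1"),
    ("checkpoint",),
    ("start", "T2"),
    ("write", "T2", "A", 10, 20),
    ("commit", "T2"),
    ("start", "T3"),
    ("write", "T3", "B", 5, 7),
    # crash happens here
]

undo_list, redo_list = build_lists(log)
print("undo:", undo_list, "redo:", redo_list)
```

T1 finished before the checkpoint and is therefore ignored, T2 is redone, and T3 is undone.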
Deadlock Handling:-
A system is said to be in a deadlock state when, in a set of transactions, every transaction is
waiting for another transaction in the set to finish its operation.
A deadlock is an unwanted situation in which two or more transactions wait
indefinitely for one another to give up locks.
A deadlock occurs when each transaction T in a set is waiting for some item locked by some
other transaction T’ in the set. All the transactions in the set remain stuck in a waiting queue.

Necessary conditions for deadlock to occur in database
A deadlock can arise if the following 4 conditions hold simultaneously in a
system:
 Mutual exclusion: At least one resource is held in a non-sharable mode.
For example, if a transaction holds an Exclusive lock (Write lock) on a data
item, that data item cannot be shared with others.
 Hold and wait: A transaction holds a lock on one data item while it waits
for a lock on another data item.
 No preemption: A lock cannot be taken away from a transaction by force. A
transaction releases the locks it holds only voluntarily, normally after it has
completed.
 Circular wait: A transaction T1 is waiting for another transaction T2 to release
a lock on some data item, T2 in turn is waiting for another transaction T3 to
release a lock, and so on, until the last transaction in the chain is waiting for T1.

Methods for dealing with deadlock problems
There are two principal methods for dealing with deadlock problems:-
1. Deadlock Prevention
2. Deadlock Detection and Recovery
Deadlock Prevention
Deadlock prevention protocols ensure that the system will never enter a deadlock
state. Some prevention strategies:
 Require that each transaction locks all its data items before it begins execution.
 Impose a partial ordering on all data items and require that a transaction lock data
items only in the order specified by the partial order.
Deadlock prevention mechanisms also include two timestamp-based schemes:
 Wait_Die: It is a non-preemptive technique. An older transaction is allowed to wait
for a younger transaction, whereas a younger transaction requesting an item held by
an older transaction is aborted and restarted.
For example, consider three transactions T1, T2, T3 with timestamps 5, 10, 15. If T1
requests a data item held by T2, then T1 will wait. If T3 requests a data item held by T2,
then T3 will be rolled back.
 Wound_Wait: It is a preemptive technique. An older transaction wounds (forces the rollback
of) a younger transaction instead of waiting for it. Younger transactions may wait for older
ones. For example, consider three transactions T1, T2, T3 with timestamps 5, 10, 15. If T1
requests a data item held by T2, then the data item will be preempted from T2 and T2 will be
rolled back. If T3 requests a data item held by T2, then T3 will wait.
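
Both schemes reduce to a comparison of timestamps, where a smaller timestamp means an older transaction. The following sketch encodes the two rules; the function names and the returned strings are illustrative only.

```python
def wait_die(requester_ts, holder_ts):
    """Older requester waits; younger requester dies (is rolled back)."""
    return "wait" if requester_ts < holder_ts else "abort requester"


def wound_wait(requester_ts, holder_ts):
    """Older requester wounds the holder; younger requester waits."""
    return "abort holder" if requester_ts < holder_ts else "wait"


# Timestamps from the example: T1 = 5, T2 = 10, T3 = 15.
T1, T2, T3 = 5, 10, 15
print("wait-die,   T1 requests item held by T2:", wait_die(T1, T2))
print("wait-die,   T3 requests item held by T2:", wait_die(T3, T2))
print("wound-wait, T1 requests item held by T2:", wound_wait(T1, T2))
print("wound-wait, T3 requests item held by T2:", wound_wait(T3, T2))
```

In both schemes the transaction that gets rolled back is always the younger one, which is why an old transaction is never starved.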
Timeout-Based Schemes: A transaction waits for a lock only for a specified amount of time.
After that, the wait times out and the transaction is rolled back. Thus deadlocks are not
possible.

Deadlock detection
 Deadlocks can be described by a wait-for graph, which consists of a pair G = (V, E).
 V is a set of vertices (all the transactions in the system).
 E is a set of edges; each element is an ordered pair Ti → Tj.
 If Ti → Tj is in E, then there is a directed edge from Ti to Tj, implying that Ti is waiting
for Tj to release a data item.
 When Ti requests a data item currently being held by Tj, the edge Ti → Tj is
inserted in the wait-for graph. This edge is removed only when Tj is no longer
holding a data item needed by Ti.
 The system is in a deadlock state if and only if the wait-for graph has a cycle.
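
Deadlock detection therefore amounts to finding a cycle in the wait-for graph. The sketch below represents the graph as a dictionary that maps each transaction to the set of transactions it is waiting for, and returns one cycle if any exists; the function name find_cycle and the sample graphs are assumptions of this example.

```python
def find_cycle(wait_for):
    """Return the transactions on one cycle as a list, or None if no cycle."""
    visited = set()

    def dfs(node, path):
        if node in path:
            return path[path.index(node):]   # came back to a node on the path
        if node in visited:
            return None                      # already explored, no cycle there
        visited.add(node)
        for nxt in wait_for.get(node, ()):
            cycle = dfs(nxt, path + [node])
            if cycle:
                return cycle
        return None

    for start in wait_for:
        cycle = dfs(start, [])
        if cycle:
            return cycle
    return None


deadlocked = {"T1": {"T2"}, "T2": {"T3"}, "T3": {"T1"}}
waiting_only = {"T1": {"T2"}, "T2": {"T3"}}
print(find_cycle(deadlocked))
print(find_cycle(waiting_only))
```

The transactions returned in the cycle are the candidates from which a victim would be chosen.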

Recovery From Deadlock
When a deadlock is detected:
 Selecting a victim: Some transaction will have to be rolled back (made a victim) to break
the deadlock. Select as victim the transaction that will incur minimum cost.
 Rollback: Determine how far to roll back the transaction.
Total rollback: Abort the transaction and then restart it.
Partial rollback: It is more effective to roll back the transaction only as far as necessary to
break the deadlock.
 Starvation: Starvation happens if the same transaction is always chosen as victim. Include
the number of rollbacks in the cost factor to avoid starvation.

Deadlock Avoidance
o When a database is at risk of getting stuck in a deadlock state, it is better to avoid the
deadlock than to abort or restart transactions afterwards, which wastes time and resources.
o A deadlock avoidance mechanism is used to detect any deadlock situation in advance.
A method like the "wait-for graph" is used for detecting the deadlock situation, but this
method is suitable only for smaller databases. For larger databases, the deadlock
prevention method can be used.

Distributed database
The database systems that run on each site are independent of each other. Transactions may
access data at one or more sites.
 A distributed database is a system in which storage devices are not connected to a common
processing unit.
 The database is controlled by a Distributed Database Management System, and data may be
stored at the same location or spread over the interconnected network. It is a loosely
coupled system.
 Shared nothing architecture is used in distributed databases.
 In a typical distributed database system, a communication channel is used to communicate
between the different locations, and every system has its own memory and database.

Goals of Distributed Database system.

The concept of distributed database was built with a goal to improve:
Reliability: In a distributed database system, if one system fails or stops working for
some time, another system can complete the task.
Availability: In a distributed database system, availability can be maintained even if a server
fails, because another system is available to serve the client request.
Performance: Performance can be improved by distributing the database over different
locations, so that the data is available at every location and is easy to maintain.

Types of distributed databases.


The two types of distributed systems are as follows:

1. Homogeneous distributed database system:
 A homogeneous distributed database system is a network of two or more databases (with the
same type of DBMS software), which can be stored on one or more machines.
 In this system, data can be accessed and modified simultaneously on several databases
in the network. Homogeneous distributed systems are easy to handle.
 All sites have identical software.
 Sites are aware of each other and agree to cooperate in processing user requests.
 Each site surrenders part of its autonomy in terms of the right to change schemas or software.
 The system appears to the user as a single system.
Example: Consider three departments that use Oracle-9i as the DBMS. If some
changes are made in one department, the other departments are updated as well.

2. Heterogeneous distributed database system:
 Different sites may use different schemas and software.
o Difference in schema is a major problem for query processing
o Difference in software is a major problem for transaction processing
 Sites may not be aware of each other and may provide only limited facilities for
cooperation in transaction processing.
 A heterogeneous distributed database system is a network of two or more databases with
different types of DBMS software, which can be stored on one or more machines.
 In this system, data can be made accessible to several databases in the network with the
help of generic connectivity (ODBC and JDBC).

Example: Different DBMS software products are made accessible to each
other using ODBC and JDBC.

Data Replication in DBMS


 A relation or fragment of a relation is replicated if it is stored redundantly in two or
more sites.
 Full replication of a relation is the case where the relation is stored at all sites.
 Fully redundant databases are those in which every site contains a copy of the entire
database.
 Data replication is the process of storing separate copies of the database at two or
more sites. It is a popular fault tolerance technique of distributed databases.

Advantages of Replication

 Reliability − In case of failure of any site, the database system continues to work
since a copy is available at another site(s).
 Reduction in Network Load − Since local copies of data are available, query
processing can be done with reduced network usage, particularly during prime
hours. Data updating can be done at non-prime hours.
 Quicker Response − Availability of local copies of data ensures quick query
processing and consequently quick response time.
 Simpler Transactions − Transactions require fewer joins of tables located at
different sites and minimal coordination across the network. Thus, they become
simpler in nature.

Disadvantages of Replication


 More storage space is needed, since storing replicas of the same data at different
sites consumes more space.
 Data replication becomes expensive when the replicas at all the different sites need to
be updated.
 Maintaining data consistency across all sites involves complex measures.
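
The trade-off described above can be sketched in a few lines. This is a toy model under stated assumptions, not how any particular DBMS implements replication: the class name ReplicatedStore, its methods, and the site names are all hypothetical. It shows that a read can be served by any live copy (reliability, quick response), while a write has to touch every live copy (update cost).

```python
class ReplicatedStore:
    """Toy fully replicated store: every site holds a complete copy (hypothetical)."""

    def __init__(self, site_names):
        # One dict per site stands in for that site's local copy of the database.
        self.sites = {name: {} for name in site_names}
        self.failed = set()

    def write(self, key, value):
        # Every live replica must be updated; this is the cost of replication.
        updated = 0
        for name, copy in self.sites.items():
            if name not in self.failed:
                copy[key] = value
                updated += 1
        return updated

    def read(self, key, preferred_site):
        # Read from the local site when possible, else fall back to another live replica.
        order = [preferred_site] + [s for s in self.sites if s != preferred_site]
        for name in order:
            if name not in self.failed and key in self.sites[name]:
                return self.sites[name][key], name
        raise KeyError(key)


store = ReplicatedStore(["Pune", "Baroda", "Delhi"])
writes = store.write("A_101", 5000)      # one logical write, three physical writes
store.failed.add("Pune")                 # simulate a site failure
value, served_by = store.read("A_101", preferred_site="Pune")
```

A real system would also have to bring the failed site's copy up to date when it recovers, which is where the consistency measures mentioned above come in.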

Data Fragmentation in DBMS


 Fragmentation is the division of a relation r into fragments r1, r2, …, rn that contain
sufficient information to reconstruct r.
 The process of dividing the database into multiple smaller parts is called data
fragmentation.
 These fragments may be stored at different locations.
 The data fragmentation process should be carried out in such a way that the
original database can be reconstructed from the fragments.
Types of Data fragmentation
There are three types of data fragmentation:

1. Horizontal data fragmentation

Horizontal fragmentation divides a relation (table) horizontally into groups of rows to
create subsets of the table.

Example:
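
A minimal sketch of horizontal fragmentation, run through Python's built-in sqlite3 module. The table name Account, the fragment names, and the choice of Branch_Name as the splitting predicate are assumptions made for illustration; the sample rows are illustrative as well. Each fragment keeps every column but only some rows, and the union of the fragments gives back the original relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account (Acc_No TEXT PRIMARY KEY, Balance INTEGER, Branch_Name TEXT)"
)
conn.executemany(
    "INSERT INTO Account VALUES (?, ?, ?)",
    [("A_101", 5000, "Pune"), ("A_102", 10000, "Baroda"), ("A_103", 25000, "Delhi")],
)

# Each horizontal fragment keeps all columns but only the rows matching its predicate.
conn.execute(
    "CREATE TABLE Account_Pune AS SELECT * FROM Account WHERE Branch_Name = 'Pune'"
)
conn.execute(
    "CREATE TABLE Account_Other AS SELECT * FROM Account WHERE Branch_Name <> 'Pune'"
)

# Reconstruction: the union of the fragments returns the original rows.
rebuilt_rows = conn.execute(
    "SELECT * FROM Account_Pune UNION ALL SELECT * FROM Account_Other ORDER BY Acc_No"
).fetchall()
original_rows = conn.execute("SELECT * FROM Account ORDER BY Acc_No").fetchall()
pune_count = conn.execute("SELECT COUNT(*) FROM Account_Pune").fetchone()[0]
```

The two predicates are chosen so that every row lands in exactly one fragment; overlapping or incomplete predicates would duplicate or lose rows on reconstruction.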

2. Vertical Fragmentation

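Vertical fragmentation divides a relation (table) vertically into groups of columns to
create subsets of the table.
Example: Consider the following table, assumed here to be named Account.

Acc_No    Balance    Branch_Name
A_101     5000       Pune
A_102     10,000     Baroda
A_103     25,000     Delhi

Fragmentation1:
SELECT Acc_No, Balance FROM Account

Fragmentation2:
SELECT Acc_No, Branch_Name FROM Account

Each fragment keeps the key Acc_No so that the original table can be rebuilt by joining
the fragments.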
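
A minimal sketch of vertical fragmentation and its reconstruction, using Python's built-in sqlite3 module. The table name Account and the fragment names Account_Balance and Account_Branch are assumptions; the rows come from the sample table above. Both fragments carry the key Acc_No, and a join on that key rebuilds the original rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account (Acc_No TEXT PRIMARY KEY, Balance INTEGER, Branch_Name TEXT)"
)
conn.executemany(
    "INSERT INTO Account VALUES (?, ?, ?)",
    [("A_101", 5000, "Pune"), ("A_102", 10000, "Baroda"), ("A_103", 25000, "Delhi")],
)

# Each vertical fragment keeps a subset of the columns plus the key Acc_No.
conn.execute("CREATE TABLE Account_Balance AS SELECT Acc_No, Balance FROM Account")
conn.execute("CREATE TABLE Account_Branch AS SELECT Acc_No, Branch_Name FROM Account")

# Reconstruction: join the fragments on the shared key.
joined_rows = conn.execute(
    "SELECT b.Acc_No, b.Balance, r.Branch_Name "
    "FROM Account_Balance AS b JOIN Account_Branch AS r ON b.Acc_No = r.Acc_No "
    "ORDER BY b.Acc_No"
).fetchall()
```

Without the shared key there would be no way to tell which balance belongs to which branch, so the fragments could not be recombined.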
3. Hybrid (Mixed) Fragmentation

 Hybrid fragmentation is achieved by performing horizontal and vertical partitioning
together.
 A mixed fragment is a group of selected rows and columns of a relation.

Example: Consider the following table of employee information, assumed here to be
named Employee.

Emp_ID    Emp_Name    Emp_Address    Emp_Age    Emp_Salary
101       Surendra    Baroda         25         15000
102       Jaya        Pune           37         12000
103       Jayesh      Pune           47         10000

Fragmentation1:
SELECT Emp_Name FROM Employee WHERE Emp_Age < 40

Fragmentation2:
SELECT Emp_ID FROM Employee WHERE Emp_Address = 'Pune' AND Emp_Salary < 14000
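
A minimal sketch of mixed fragmentation, again using Python's built-in sqlite3 module. The table name Employee is an assumption, and the column lists are chosen for illustration (each fragment here also keeps the key Emp_ID, which the queries above do not). Each query restricts both the columns (vertical) and the rows (horizontal) at once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee ("
    "Emp_ID INTEGER PRIMARY KEY, Emp_Name TEXT, Emp_Address TEXT, "
    "Emp_Age INTEGER, Emp_Salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        (101, "Surendra", "Baroda", 25, 15000),
        (102, "Jaya", "Pune", 37, 12000),
        (103, "Jayesh", "Pune", 47, 10000),
    ],
)

# Fragment 1: two columns (vertical cut) of the employees under 40 (horizontal cut).
fragment1 = conn.execute(
    "SELECT Emp_ID, Emp_Name FROM Employee WHERE Emp_Age < 40 ORDER BY Emp_ID"
).fetchall()

# Fragment 2: three columns of the Pune employees earning less than 14000.
fragment2 = conn.execute(
    "SELECT Emp_ID, Emp_Address, Emp_Salary FROM Employee "
    "WHERE Emp_Address = 'Pune' AND Emp_Salary < 14000 ORDER BY Emp_ID"
).fetchall()
```

These two fragments alone do not cover every row and column of Employee; a complete fragmentation scheme would need further fragments so that the original table can still be reconstructed.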

Directory Systems in DBMS


 Typical kinds of directory information
o Employee information such as name, id, email, phone, office address
o Even personal information to be accessed from multiple places
 e.g. Web browser bookmarks
 White pages
o Entries organized by name or identifier
 Meant for forward lookup to find more about an entry
 Yellow pages
o Entries organized by properties
o For reverse lookup to find entries matching specific requirements
 When directories are to be accessed across an organization
o Alternative 1: Web interface. Not great for programs
o Alternative 2: Specialized directory access protocols
 Coupled with specialized user interfaces
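
The difference between white-pages and yellow-pages lookup can be sketched with plain Python data structures. The entries, field names, and the helper yellow_pages are hypothetical; a real directory would be queried through a directory access protocol rather than an in-memory list.

```python
# Hypothetical directory entries; field names are illustrative only.
entries = [
    {"id": "e101", "name": "Surendra", "email": "surendra@example.com", "office": "Baroda"},
    {"id": "e102", "name": "Jaya", "email": "jaya@example.com", "office": "Pune"},
    {"id": "e103", "name": "Jayesh", "email": "jayesh@example.com", "office": "Pune"},
]

# White pages: entries organized by name, for forward lookup (name -> details).
white_pages = {entry["name"]: entry for entry in entries}


def yellow_pages(**required):
    """Reverse lookup: names of all entries whose properties match every requirement."""
    return [
        entry["name"]
        for entry in entries
        if all(entry.get(field) == wanted for field, wanted in required.items())
    ]


jaya_email = white_pages["Jaya"]["email"]      # forward lookup by name
pune_staff = yellow_pages(office="Pune")       # reverse lookup by property
```

Forward lookup is a single keyed access, while reverse lookup has to test each entry against the requested properties (or use an index on those properties).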

Directory Access Protocols

 Most commonly used directory access protocol:
o LDAP (Lightweight Directory Access Protocol): an open, vendor-neutral,
industry-standard application protocol for accessing and maintaining
distributed directory information services over an Internet Protocol network.
o Simplified from the earlier X.500 protocol
