PMSCS-653 (Advanced DBMS Final Exam)
Indexing and Hashing
1- Explain the basic concept of indexing. Mention some
factors to be considered for evaluating an indexing
technique.
Basic Concept
An index for a file in a database system works in much the same way as the index at
the back of a textbook. If we want to learn about a particular topic, we search for the topic in
the index, find the pages where it occurs, and then read those pages
to find the information we are looking for. The words in the index are in sorted order,
making it easy to find the word we are looking for. Moreover, the index is much smaller
than the book, further reducing the effort needed to find the words we are looking for.
Evaluation Factors
 Access types: The types of access that are supported efficiently. These can include finding
records with a specified attribute value and finding records whose attribute values fall in a
specified range.
 Access time: The time it takes to find a particular data item, or set of items, using the
technique in question.
 Insertion time: The time it takes to insert a new data item. This value includes the
time it takes to find the correct place to insert the new data item, as well as the time it
takes to update the index structure.
 Deletion time: The time it takes to delete a data item. This value includes the time it
takes to find the item to be deleted, as well as the time it takes to update the index
structure.
 Space overhead: The additional space occupied by an index structure. Provided that
the amount of additional space is moderate, it is usually worthwhile to sacrifice the
space to achieve improved performance.
2- Explain multilevel indexing with example. When is it
preferable to use a multilevel index rather than a single
level index?
Multi-Level Indexing
 If the primary index does not fit in memory, access becomes expensive.
 Solution: treat the primary index kept on disk as a sequential file and construct a sparse
index on it.
- Outer index – a sparse index of the primary index
- Inner index – the primary index file
 If even the outer index is too large to fit in main memory, yet another level of index can
be created, and so on.
 Indices at all levels must be updated on insertion or deletion from the file.
An Example
 Consider a file of 100,000 records with 10 records per block. With one index entry per
block, the primary index has 10,000 entries; even at 100 index entries per block, the index
occupies 100 blocks. If the index is too large to be kept in main memory, a single search
requires several disk reads.
 For very large files, additional levels of indexing may be required.
 Indices must be updated at all levels when insertions or deletions require it.
 Frequently, each level of index corresponds to a unit of physical storage.
Fig: Multilevel Index
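To make the two-level lookup concrete, here is a minimal Python sketch (not from the source; the block sizes, keys and helper names are invented for illustration). It simulates an outer sparse index over an inner primary index, which in turn points to the data blocks:

```python
import bisect

# Hypothetical data: 12 sorted records split into blocks of 3 (toy numbers).
data_blocks = [[2, 3, 5], [7, 11, 17], [19, 23, 29], [31, 37, 41]]

# Inner (primary) index: one entry per data block -> (first key, block number).
inner_index = [(blk[0], i) for i, blk in enumerate(data_blocks)]

# Outer (sparse) index: one entry per "index block" of the inner index
# (here, one entry for every 2 inner-index entries).
INNER_PER_BLOCK = 2
outer_index = [(inner_index[i][0], i) for i in range(0, len(inner_index), INNER_PER_BLOCK)]

def lookup(key):
    """Follow outer index -> inner index -> data block, as in a two-level index."""
    # 1. Search the outer index (assumed to fit in memory).
    pos = bisect.bisect_right([k for k, _ in outer_index], key) - 1
    start = outer_index[max(pos, 0)][1]
    # 2. Scan the corresponding portion of the inner index (one disk-block read).
    candidates = inner_index[start:start + INNER_PER_BLOCK]
    blk_no = max((b for k, b in candidates if k <= key), default=candidates[0][1])
    # 3. Read the data block and search it for the key.
    return key in data_blocks[blk_no]

print(lookup(23))  # True
print(lookup(4))   # False
```

In a real system the inner-index portion and the data block would be read from disk; the point of the outer index is that only it needs to stay in memory.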
3- Differentiate between Dense & Sparse indexing. What is
the advantage of sparse index over dense index?
Dense vs. Sparse Indices
 A dense index contains an index entry for every search-key value in the file, whereas a
sparse index contains entries for only some of the search-key values, typically one entry
per block.
 It is generally faster to locate a record if we have a dense index rather than a sparse
index.
 However, sparse indices have advantages over dense indices in that they require less
space and they impose less maintenance overhead for insertions and deletions.
 There is therefore a trade-off that the system designer must make between access time
and space overhead.
4- Explain hash file organization with example.
Hash File Organization
 In a hash file organization, we obtain the address of the disk block (also called the
bucket) containing a desired record directly by computing a hash function on the search-key
value of the record.
Hash File Organization: An Example
 Let us choose a hash function for the account file using the search key branch_name.
 Suppose we have 26 buckets and we define a hash function that maps names
beginning with the ith letter of the alphabet to the ith bucket.
 This hash function has the virtue of simplicity, but it fails to provide a uniform
distribution, since we expect more branch names to begin with such letters as B and R
than Q and X.
 Instead, we consider 10 buckets and a hash function that computes the sum of the
binary representations of the characters of a key, then returns the sum modulo the
number of buckets.
 For branch name ‘Perryridge’: bucket no = h(Perryridge) = 5
 For branch name ‘Round Hill’: bucket no = h(Round Hill) = 3
 For branch name ‘Brighton’: bucket no = h(Brighton) = 3
Fig. Hash organization of account file, with branch-name as the key
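A minimal sketch of the hashing scheme described above, assuming that the "binary representation" of a character is simply its integer character code (the answer does not fix the encoding, so the bucket numbers computed here may differ from those quoted above):

```python
NUM_BUCKETS = 10

def h(key: str) -> int:
    """Sum the character codes of the key and take the result modulo the number of buckets."""
    return sum(ord(ch) for ch in key) % NUM_BUCKETS

# Toy in-memory "hash file": one list of records per bucket.
buckets = [[] for _ in range(NUM_BUCKETS)]

def insert(record):
    buckets[h(record["branch_name"])].append(record)

def lookup(branch_name):
    # Only the single bucket selected by the hash function has to be scanned.
    return [r for r in buckets[h(branch_name)] if r["branch_name"] == branch_name]

insert({"account_number": "A-102", "branch_name": "Perryridge", "balance": 400})
insert({"account_number": "A-201", "branch_name": "Brighton", "balance": 900})
print(lookup("Perryridge"))   # the Perryridge record, fetched from one bucket
```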
5- Construct a B+-tree for the following set of key values:
(2, 3, 5, 7, 11, 17, 19, 23, 29, 31) for n=4 and n=6.
For n=4
(Fig: B+-tree with n = 4 for the given keys; diagram not reproduced here.)
For n=6
(Fig: B+-tree with n = 6 for the given keys; diagram not reproduced here.)
6- Construct a B-tree for the following set of key values: (2,
3, 5, 7, 11, 17, 19, 23, 29, 31) for n=4 and n=6.
For n=4
(Fig: B-tree with n = 4 for the given keys; diagram not reproduced here.)
For n=6
Root node: [7, 23]; leaf nodes: [2, 3, 5], [11, 17, 19], [29, 31].
Functional Dependencies & Normalization
1- Normalize step by step the following database into 1NF,
2NF and 3NF:
PROJECT (Proj-ID, Proj-Name, Proj-Mgr-ID, Emp-ID,
Emp-Name, Emp-Dpt, Emp-Hrly-rate, Total-Hrs)
FIRST NORMAL FORM (1NF)
A relation r(R) is said to be in First Normal Form (1NF) if and only if every entry
of the relation has at most a single value. Thus, we decompose the table
PROJECT into two new tables. It is also necessary to identify a PK for each table.
First table with nonrepeating groups:
PROJECT (Proj-ID, Proj-Name, Proj-Mgr-ID)
Second table with table identifier and all repeating groups:
PROJECT-EMPLOYEE (Proj-ID, Emp-ID, Emp-Name, Emp-Dpt, Emp-Hrly-rate, Total-Hrs)
Here Proj-ID is the table identifier.
SECOND NORMAL FORM (2NF)
A relation r (R) is in Second Normal Form (2NF) if and only if the following two
conditions are met simultaneously:
1. r(R) is already in 1NF.
2. No nonprime attribute is partially dependent on any key or, equivalently, each
nonprime attribute in R is fully dependent on every key.
In the PROJECT-EMPLOYEE relation, the attributes Emp-Name, Emp-Dpt and Emp-Hrly-rate are
fully dependent only on Emp-ID, while the attribute Total-Hrs is the only one that fully
depends on the composite PK (Proj-ID, Emp-ID).
Now break this relation into two new relations so that partial dependency does not exist.
PHOURS-ASSIGNED (Proj-ID, Emp-ID, Total-Hrs)
EMPLOYEE (Emp-ID, Emp-Name, Emp-Dpt, Emp-Hrly-rate)
THIRD NORMAL FORM (3NF)
A relation r(R) is in Third Normal Form (3NF) if and only if the following conditions are
satisfied simultaneously:
1. r(R) is already in 2NF.
2. No nonprime attribute is transitively dependent on the key (i.e. No nonprime
attribute functionally determines any other nonprime attribute).
In the EMPLOYEE relation, the attribute Emp-Hrly-rate is transitively dependent on the PK
Emp-ID through the functional dependency Emp-Dpt → Emp-Hrly-rate.
Now break this relation into two new relations so that transitive dependency does not
exist.
EMPLOYEE (Emp-ID, Emp-Name, Emp-Dpt)
CHARGES (Emp-Dpt, Emp-Hrly-rate)
The new set of relations that we have obtained through the normalization process does
not exhibit any of the anomalies; that is, we can insert, delete and update tuples without
any of the side effects. The final set of relations is:
PROJECT (Proj-ID, Proj-Name, Proj-Mgr-ID)
PHOURS-ASSIGNED (Proj-ID, Emp-ID, Total-Hrs)
EMPLOYEE (Emp-ID, Emp-Name, Emp-Dpt)
CHARGES (Emp-Dpt, Emp-Hrly-rate)
2- Explain BCNF with example.
Boyce-Codd Normal Form:
A relation r(R) is in Boyce-Codd Normal Form (BCNF) if and only if the following
conditions are met simultaneously:
(1) The relation is in 3NF.
(2) For every functional dependency of the form X → A, either A ⊆ X
or X is a superkey of r. In other words, every functional dependency is either a
trivial dependency or, if it is not trivial, its left-hand side
X must be a superkey.
To transform the relation of example 2 into BCNF, we can decompose it onto the
following set of relational schemes:
Set No. 1
MANUFACTURER (ID, Name)
MANUFACTURER-PART (ID, Item-No, Quantity)
Or
Set No. 2
MANUFACTURER (ID, Name)
MANUFACTURER-PART (Name, Item-No, Quantity)
Notice that both relations are in BCNF and that the update data anomaly is no longer
present.
3- Given the relation r(A, B, C, D) and the set F = {AB → C, B
→ D, D → B} of functional dependencies, find the candidate
keys of the relation. How many candidate keys are in this
relation? What are the prime attributes?
AB is a candidate key.
(1) AB → A and AB → B ( Reflexivity axiom)
(2) AB → C (given)
(3) AB → B (step 1) and B → D (given) then AB → D ( Transitivity axiom)
Since AB determines every other attribute of the relation, we have that AB is a candidate
key.
AD is a candidate key.
(1) AD → A and AD → D ( Reflexivity axiom)
(2) AD → D (step 1) and D → B (given) then AD → B
(3) AD → B (step 2) and AB → C (given) then AAD → C (Pseudotransitivity
axiom).
i.e. AD → C
Since AD determines every other attribute of the relation, AD is also a candidate
key. No further candidate keys exist: every key must contain A (it never appears on the
right-hand side of any FD), and neither A alone nor AC determines B or D. Hence the
relation has exactly two candidate keys, AB and AD.
The prime attributes are: A, B, and D.
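The same conclusion can be checked mechanically with the attribute-closure algorithm. The following sketch (illustrative only, not part of the original answer) computes X+ under F and enumerates the minimal attribute sets that determine all of r(A, B, C, D):

```python
from itertools import combinations

ATTRS = set("ABCD")
FDS = [({"A", "B"}, {"C"}), ({"B"}, {"D"}), ({"D"}, {"B"})]

def closure(attrs, fds):
    """Compute the attribute closure X+ of a set of attributes under a set of FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_candidate_key(attrs):
    """A candidate key determines all attributes and has no proper subset that does."""
    if closure(attrs, FDS) != ATTRS:
        return False
    return all(closure(set(sub), FDS) != ATTRS
               for i in range(1, len(attrs))
               for sub in combinations(attrs, i))

keys = [frozenset(c) for n in range(1, len(ATTRS) + 1)
        for c in combinations(sorted(ATTRS), n)
        if is_candidate_key(set(c))]
print(sorted("".join(sorted(k)) for k in keys))  # ['AB', 'AD']
```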
4- What is Canonical Cover? Given the set F of FDs shown
below, find a canonical cover for F.
F= {X → Z, XY → WP, XY → ZWQ, XZ → R}.
Canonical cover
For a given set F of FDs, a canonical cover, denoted by Fc, is a set of FDs where the
following conditions are simultaneously satisfied:
(1) Every FD of Fc is simple. That is, the right-hand side of every functional
dependency of Fc has only one attribute.
(2) Fc is left-reduced.
(3) Fc is nonredundant.
Example: Given the set F of FDs shown below, find a canonical cover for F.
F= {X → Z, XY → WP, XY → ZWQ, XZ → R}.
FC = {X → Z, XY → WP, XY → ZWQ, XZ → R}
= {X → Z, XY → W, XY → P, XY → Z, XY → Q, XZ → R} [make every FD simple; the
duplicate XY → W is written only once]
= {X → Z, XY → W, XY → P, XY → Q, XZ → R} [remove the redundant XY → Z, since it
is already implied by X → Z]
= {X → Z, XY → W, XY → P, XY → Q, X → R} [left-reduce XZ → R to X → R, since
X → Z makes Z extraneous]
FC = {X → Z, XY → W, XY → P, XY → Q, X → R}. This resulting set, in which all FDs
are simple, left-reduced and non-redundant, is the canonical cover of F.
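For completeness, the three steps (make FDs simple, left-reduce, remove redundant FDs) can be automated. The sketch below is a straightforward, unoptimized reading of the definition, reusing the attribute-closure idea from the previous answer; the names and data representation are my own:

```python
def closure(attrs, fds):
    """Attribute closure X+ of `attrs` under `fds` (each FD is a (lhs_set, rhs_set) pair)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def canonical_cover(fds):
    # Step 1: make every FD simple (a single attribute on the right-hand side).
    cover = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            fd = (frozenset(lhs), frozenset({a}))
            if fd not in cover:
                cover.append(fd)
    changed = True
    while changed:
        changed = False
        # Step 2: left-reduce (drop extraneous attributes from left-hand sides).
        for i, (lhs, rhs) in enumerate(cover):
            for b in sorted(lhs):
                reduced = lhs - {b}
                if reduced and rhs <= closure(reduced, cover):
                    cover[i] = (frozenset(reduced), rhs)
                    lhs = frozenset(reduced)
                    changed = True
        # Drop any duplicates introduced by left reduction.
        deduped = []
        for fd in cover:
            if fd not in deduped:
                deduped.append(fd)
        if len(deduped) != len(cover):
            cover, changed = deduped, True
        # Step 3: remove redundant FDs (those implied by the remaining ones).
        for fd in list(cover):
            rest = [g for g in cover if g != fd]
            if fd[1] <= closure(fd[0], rest):
                cover = rest
                changed = True
    return cover

F = [({"X"}, {"Z"}), ({"X", "Y"}, {"W", "P"}),
     ({"X", "Y"}, {"Z", "W", "Q"}), ({"X", "Z"}, {"R"})]
for lhs, rhs in canonical_cover(F):
    print("".join(sorted(lhs)), "->", "".join(rhs))
```

Running it on F prints the same five dependencies as the hand derivation: X → Z, XY → W, XY → P, XY → Q, X → R (in some order).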
Distributed Database Management System (DDBMS)
1- Distinguish between local transaction & global
transaction.
 A local transaction is one that accesses data only from the site at which the transaction
was initiated.
 A global transaction, on the other hand, is one that either accesses data in a site
different from the one at which the transaction was initiated, or accesses data in
several different sites.
2- Draw & explain the general structure of a
distributed database system. Describe the relative
advantages & disadvantages of DDBMS.
• Consists of a single logical database that is split into a number of fragments. Each
fragment is stored on one or more computers.
• The computers in a distributed system communicate with one another through various
communication media, such as high-speed networks or telephone lines.
• They do not share main memory or disks.
• Each site is capable of independently processing user requests that require access to
local data, and it is also capable of processing user requests that require access to
remote data stored on other computers in the network.
The general structure of a distributed system appears in the following figure.
Advantages of DDBMS
 Sharing data. Users at one site may be able to access the data residing at other sites.
 Increased Local Autonomy. A global database administrator is responsible for the entire
system, and a part of these responsibilities is delegated to the local database administrator
of each site.
 Increased Availability. If one site fails in a distributed system, the remaining sites
may be able to continue operating.
 Speeding up of query processing. It is possible to process data at several sites
simultaneously. If a query involves data stored at several sites, it may be possible to
split the query into a number of subqueries that can be executed in parallel. Thus,
query processing becomes faster in distributed systems.
 Increased reliability. A single data item may exist at several sites. If one site fails, a
transaction requiring a particular data item from that site may access it from other
sites. Thus, reliability increases.
 Better performance. As each site handles only a part of the entire database, there
may not be the same level of contention for CPU and I/O services that characterizes a
centralized DBMS.
 Processor independence. A distributed system may contain multiple copies of a
particular data item. Thus, users can access any available copy of the data item, and
user requests can be processed by the processor at the data location. Hence, user-
requests do not depend on a specific processor.
 Modular extensibility. Modular extension in a distributed system is much easier.
New sites can be added to the network without affecting the operations of other sites.
Such flexibility allows organizations to extend their systems in a relatively rapid and
easy way.
Disadvantages of DDBMS
 Software-development cost. It is more difficult to implement a distributed database
system; thus, it is more costly.
 Greater potential for bugs. Since the sites operate in parallel, it is harder to ensure
the correctness of algorithms, especially operation during failures of part of the
system, and recovery from failures.
 Increased processing overhead. The exchange of messages and the additional
computation required to achieve inter-site coordination are a form of overhead that
does not arise in centralized systems.
 Security. As data in a distributed DBMS are located at multiple sites, the probability
of security lapses increases. Further, all communications between different sites in a
distributed DBMS are conveyed through the network, so the underlying network has
to be made secure to maintain system security.
 Lack of standards. A distributed system can consist of a number of sites that are
heterogeneous in terms of processors, communication networks and DBMSs. This
lack of standards significantly limits the potential of distributed DBMSs.
 Increased storage requirements. Replicating data across several sites requires
additional storage space.
 Maintenance of integrity is very difficult. Database integrity refers to the validity
and consistency of stored data.
 Lack of experience. Distributed DBMSs have not yet been widely accepted, so we do
not yet have the same level of experience with them in industry.
 Database design is more complex. The design of a distributed DBMS involves
fragmentation of data, allocation of fragments to specific sites and data replication.
3- Differentiate between Homogeneous and
Heterogeneous Databases.
Homogeneous Distributed Database
 All sites have identical database management system software, are aware of one
another, and agree to cooperate in processing users’ requests.
 Use same DB schemas at all sites.
 Easier to design and manage
 Addition of a new site is much easier.
Heterogeneous distributed database
 Usually constructed over a number of existing sites.
 Each site has its local database. Different sites may use different schemas (relational
model, OO model etc.).
 Use different DBMS software.
 Query processing is more difficult.
 Use gateways (as query translator) which convert the language and data model of
each different DBMS into the language and data model of the relational system.
4- Define data replication in DDBs. Mention some
major advantages and disadvantages of data
replication.
Data Storage in DDBMS
 Replication. The system maintains several identical replicas (copies) of the relation,
and stores each replica at a different site.
 Fragmentation. The system partitions the relation into several fragments, and stores
each fragment at a different site.
 Fragmentation and replication can be combined.
Advantages and disadvantages of Replication
 Availability: if one of the sites containing a replica of a relation fails, the relation can
still be found at another site, so the system can continue to process queries on it.
 Increased parallelism: when most accesses only read the relation, several sites can
process queries involving the relation in parallel.
 Increased overhead on update: the system must ensure that all replicas of a relation are
consistent, so every update must be propagated to all replicas.
5- What is Data Fragmentation? Discuss different
types of data fragmentation with examples.
Data Fragmentation
 If relation r is fragmented, r is divided into a number of fragments r1, r2, . . . , rn.
 These fragments contain sufficient information to allow reconstruction of the original
relation r.
 There are two different schemes for fragmenting a relation:
- Horizontal fragmentation and
- Vertical fragmentation
Horizontal Fragmentation
 In horizontal fragmentation, a relation r is partitioned into a number of subsets,
r1, r2, . . . , rn.
 Each tuple of relation r must belong to at least one of the fragments, so that the
original relation can be reconstructed, if needed.
As an illustration, consider the account relation:
Account-schema = (account-number, branch-name, balance)
The relation can be divided into several different fragments. If the banking system has
only two branches - Savar and Dhanmondi - then there are two different fragments:
account1 = σbranch-name = “Savar” (account)
account2 = σbranch-name = “Dhanmondi” (account)
 Use a predicate Pi to construct fragment ri:
ri = σPi (r)
 Reconstruct the relation r by taking the union of all fragments. That is,
r = r1 ∪ r2 ∪ · · · ∪ rn
 The fragments are disjoint.
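As a toy illustration (data and helper names invented, not from the source), horizontal fragmentation and its reconstruction can be mimicked on an in-memory relation, with a Python filter playing the role of σ and list concatenation the role of ∪:

```python
# Hypothetical account relation as a list of dictionaries.
account = [
    {"account_number": "A-101", "branch_name": "Savar", "balance": 500},
    {"account_number": "A-215", "branch_name": "Dhanmondi", "balance": 700},
    {"account_number": "A-305", "branch_name": "Savar", "balance": 350},
]

def select(relation, predicate):
    """Relational selection sigma_P(r): keep the tuples satisfying the predicate."""
    return [t for t in relation if predicate(t)]

# Horizontal fragments, one per branch.
account1 = select(account, lambda t: t["branch_name"] == "Savar")
account2 = select(account, lambda t: t["branch_name"] == "Dhanmondi")

# Reconstruction: the union of the fragments gives back the original relation.
reconstructed = account1 + account2
assert sorted(t["account_number"] for t in reconstructed) == \
       sorted(t["account_number"] for t in account)
```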
Vertical Fragmentation
 Vertical fragmentation of r(R) involves the definition of several subsets of
attributes R1, R2, . . ., Rn of the schema R so that
R = R1 ∪ R2 ∪ · · · ∪ Rn
 Each fragment ri of r is defined by
ri = ΠRi (r)
 We can reconstruct relation r from the fragments by taking the natural join
r = r1 ⋈ r2 ⋈ r3 ⋈ · · · ⋈ rn
 One way of ensuring that the relation r can be reconstructed is to include the
primary-key attributes of R in each of the Ri.
- To illustrate vertical fragmentation, consider the following relation:
employee-info=(employee-id, name, designation, salary)
- For privacy reasons, the relation may be fragmented into a relation employee-
privateinfo containing employee-id and salary, and another relation employee-
public-info containing attributes employee-id, name, and designation.
employee-privateinfo=(employee-id, salary)
employee-publicinfo=(employee-id, name, designation)
- These may be stored at different sites, again for security reasons.
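A matching sketch for vertical fragmentation, again with invented toy data: projection plays the role of Π, and a join on the shared primary key reconstructs the relation losslessly:

```python
employee_info = [
    {"employee_id": 1, "name": "Rahim", "designation": "Analyst", "salary": 50000},
    {"employee_id": 2, "name": "Karim", "designation": "Manager", "salary": 80000},
]

def project(relation, attrs):
    """Relational projection Pi_R(r): keep only the listed attributes."""
    return [{a: t[a] for a in attrs} for t in relation]

# Both fragments keep the primary key employee_id so that the join is lossless.
employee_privateinfo = project(employee_info, ["employee_id", "salary"])
employee_publicinfo = project(employee_info, ["employee_id", "name", "designation"])

def natural_join(r1, r2, key):
    """Join the fragments back together on the shared primary key."""
    index = {t[key]: t for t in r2}
    return [{**t, **index[t[key]]} for t in r1 if t[key] in index]

reconstructed = natural_join(employee_publicinfo, employee_privateinfo, "employee_id")
assert reconstructed == employee_info
```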
6- Write down the correctness rules for data
fragmentation. Give an example of horizontal &
vertical fragments that satisfy all the correctness
rules of fragmentation.
Correctness Rules for Data Fragmentation
To ensure no loss of information and no redundancy of data, there are three different
rules that must be considered during fragmentation.
 Completeness
If a relation instance R is decomposed into fragments R1, R2, . . . .Rn, each data item in
R must appear in at least one of the fragments. It is necessary in fragmentation to ensure
that there is no loss of data during data fragmentation.
 Reconstruction
If a relation R is decomposed into fragments R1, R2, . . ., Rn, it must be possible to define
a relational operation that will reconstruct the relation R from the fragments R1, R2, . . ., Rn.
This rule ensures that constraints defined on the data are preserved during data
fragmentation.
 Disjointness
If a relation R is decomposed into fragments R1, R2, . . ., Rn and a data item appears
in the fragment Ri, then it must not appear in any other fragment. This rule ensures
minimal data redundancy.
In the case of vertical fragmentation, the primary key attribute must be repeated in each
fragment to allow reconstruction. Therefore, for vertical fragmentation, disjointness is
defined only on the non-primary-key attributes of a relation.
 Example
Let us consider a relational schema Project, where the attribute project-type records whether a
project is an inside project or an abroad project. Assume that P1 and P2 are two horizontal
fragments of the relation Project, obtained by using predicates on the
value of the project-type attribute (‘inside’ or ‘abroad’).
 Example (Horizontal Fragmentation)
P1: σproject-type = “inside” (Project)
P2: σproject-type = “abroad” (Project)
These horizontal fragments satisfy all the correctness rules of fragmentation as
shown below.
Completeness: Each tuple in the relation Project appears either in fragment P1 or
P2. Thus, it satisfies completeness rule for fragmentation.
Reconstruction: The Project relation can be reconstructed from the horizontal
fragments P1 and P2 by using the union operation of relational algebra, which
ensures the reconstruction rule.
Thus, P1 ∪ P2 = Project.
Disjointness: The fragments P1 and P2 are disjoint, since there can be no such
project whose project type is both “inside” and “abroad”.
 Example (Vertical Fragmentation)
Assume that V1 and V2 are two vertical fragments of the relation Project, each containing
the primary key project-id together with a disjoint subset of the remaining attributes. These
vertical fragments also satisfy the correctness rules of fragmentation, as shown below.
Completeness: Each attribute of the relation Project appears in either fragment V1 or V2,
which satisfies the completeness rule for fragmentation.
Reconstruction: The Project relation can be reconstructed from the vertical fragments
V1 and V2 by using the natural join operation of relational algebra, which ensures the
reconstruction rule.
Thus, V1 ⋈ V2 = Project.
Disjointness: The fragments V1 and V2 are disjoint, except for the primary key project-id,
which is repeated in both fragments and is necessary for reconstruction.
Transparencies in DDBMS
1- What is meant by transparency in database
management system? Discuss different types of
distributed transparency.
Transparency
o It refers to the separation of the high-level semantics of a system from lower-level
implementation issues. In a distributed system, it hides the implementation details
from users of the system.
o The user believes that he/she is working with a centralized database system and that
all the complexities of a distributed database system are either hidden or transparent
to the user.
Distribution transparency can be classified into:
 Fragmentation transparency
 Location transparency
 Replication transparency
 Local mapping transparency
 Naming transparency
 Fragmentation transparency
It hides from users the fact that the data are fragmented. Users are not required to
know how a relation has been fragmented, and to retrieve data the user need not
specify the particular fragment names.
 Location transparency
Users are not required to know the physical location of the data. To retrieve data
from a distributed database with location transparency, the user has to specify the
database fragment names but need not specify where the fragments are located in
the system.
 Replication transparency
The user is unaware of the fact that the fragments of relations are replicated and
stored in different sites of the system.
 Local mapping transparency
It refers to the fact that users are aware of both the fragment names and the
locations of the fragments (taking into account that replicas of the fragments may
exist). In this case, the user has to mention both the fragment name and its location
for data access.
 Naming transparency
In a distributed database system, each database object, such as a relation,
fragment or replica, must have a unique name. Therefore, the DDBMS must
ensure that no two sites create a database object with the same name. One solution to
this problem is to create a central name server, which has the responsibility of ensuring
the uniqueness of all names in the system. Naming transparency means that users are
not aware of the actual names of the database objects in the system; in this case, the
user specifies alias names of the database objects for data access.
Distributed Deadlock Management
1- Define Deadlock in distributed DBMS. Discuss
how deadlock situation can be characterized in
distributed DBMS.
Deadlock
In a database environment, a deadlock is a situation when transactions are
endlessly waiting for one another. Any lock-based concurrency control algorithm and
some timestamp-based concurrency control algorithms may result in deadlocks, as these
algorithms require transactions to wait for one another.
Deadlock Situation
 Deadlock situations can be characterized by wait-for graphs, directed graphs that
indicate which transactions are waiting for which other transactions.
 In a wait-for graph, nodes of the graph represent transactions and edges of the graph
represent the waiting-for relationships among transactions. An edge is drawn in the
wait-for graph from transaction Ti to transaction Tj, if the transaction Ti is waiting for
a lock on a data item that is currently held by the transaction Tj.
 Using wait-for graphs, it is very easy to detect whether a deadlock situation has
occurred in a database environment or not. There is a deadlock in the system if and
only if the corresponding wait-for graph contains a cycle.
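The cycle test can be sketched in a few lines. The representation below (a wait-for graph as an adjacency mapping) and the function name are my own; the algorithm is an ordinary depth-first search for a back edge:

```python
def has_cycle(wait_for):
    """Return True if the directed wait-for graph contains a cycle (i.e., a deadlock)."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {t: WHITE for t in wait_for}

    def dfs(node):
        colour[node] = GREY
        for nxt in wait_for.get(node, []):
            if colour.get(nxt, WHITE) == GREY:      # back edge -> cycle
                return True
            if colour.get(nxt, WHITE) == WHITE and dfs(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[t] == WHITE and dfs(t) for t in wait_for)

# T1 waits for T2, T2 waits for T3, T3 waits for T1: a deadlock.
print(has_cycle({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}))  # True
print(has_cycle({"T1": ["T2"], "T2": ["T3"], "T3": []}))      # False
```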
2- Explain the distributed deadlock prevention method.
Distributed Deadlock Prevention Method
 Distributed Deadlock prevention is a cautious scheme in which a transaction is
restarted when the system suspects that a deadlock might occur. Deadlock prevention
is an alternative method to resolve deadlock situations in which a system is designed
in such a way that deadlocks are impossible. In this scheme, the transaction manager
checks a transaction when it is first initiated and does not permit it to proceed if there is
a risk that it may cause a deadlock.
 In the case of lock-based concurrency control, deadlock prevention in a distributed
system is implemented in the following way:
Let us consider that a transaction Ti is initiated at a particular site in a distributed
database system and that it requires a lock on a data item that is currently owned by
another transaction Tj. Here, a deadlock prevention test is done to check whether there is
any possibility of a deadlock occurring in the system. The transaction Ti is not permitted
to enter into a wait state for the transaction Tj, if there is the risk of a deadlock situation.
In this case, one of the two transactions is aborted to prevent a deadlock.
 The deadlock prevention algorithm is called non-preemptive if the transaction Ti is
aborted and restarted. On the other hand, if the transaction Tj is aborted and restarted,
then the deadlock prevention algorithm is called preemptive. The transaction Ti is
permitted to wait for the transaction Tj as usual, if they pass the prevention test. The
prevention test must guarantee that if Ti is allowed to wait for Tj, a deadlock can
never occur.
 A better approach to implementing the deadlock prevention test is to assign priorities to
transactions and to check these priorities to determine whether one transaction should
wait for the other. These priorities can be assigned by using a unique
identifier for each transaction in a distributed system. For instance, consider that i and
j are priorities of two transactions Ti and Tj respectively. The transaction Ti would
wait for the transaction Tj, if Ti has a lower priority than Tj, that is, if i<j. This
approach prevents deadlock, but one problem with this approach is that cyclic restart
is possible. Thus, some transactions could be restarted repeatedly without ever
finishing.
3- Explain the working principle of the wait-die &
wound-wait algorithms for deadlock prevention.
There are two different techniques for deadlock prevention: Wait-die and Wound-wait.
Wait-die is a non-preemptive deadlock prevention technique based on timestamp values
of transactions:
In this technique, when one transaction is about to block and is waiting for a lock
on a data item that is already locked by another transaction, timestamp values of both the
transactions are checked to give priority to the older transaction. If a younger transaction
is holding the lock on the data item, then the older transaction is allowed to wait; but if an
older transaction is holding the lock, the younger transaction is aborted and restarted with
the same timestamp value. This forces the wait-for graph to be directed from the older to
the younger transactions, making cyclic restarts impossible. For example, if the
transaction Ti requests a lock on a data item that is already locked by the transaction Tj,
then Ti is permitted to wait only if Ti has a lower timestamp value than Tj. On the other
hand, if Ti is younger than Tj, then Ti is aborted and restarted with the same timestamp
value.
Wound-Wait is an alternative preemptive deadlock prevention technique by which
cyclic restarts can be avoided.
In this method, if a younger transaction requests a lock on a data item that is
already held by an older transaction, the younger transaction is allowed to wait until the
older transaction releases the corresponding lock. In this case, the wait-for graph flows
from the younger to the older transactions, and cyclic restart is again avoided. For
instance, if the transaction Ti requests a lock on a data item that is already locked by the
transaction Tj, then Ti is permitted to wait only if Ti has a higher timestamp value than
Tj, otherwise, the transaction Tj is aborted and the lock is granted to the transaction Ti.
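The two rules can be condensed into a small decision sketch (an illustrative reading of the description above, with a smaller timestamp meaning an older transaction; the function names are invented):

```python
def wait_die(ts_requester, ts_holder):
    """Wait-die: an older requester waits; a younger requester dies (is restarted)."""
    return "wait" if ts_requester < ts_holder else "abort requester (restart with same timestamp)"

def wound_wait(ts_requester, ts_holder):
    """Wound-wait: an older requester wounds (aborts) the holder; a younger requester waits."""
    return "wait" if ts_requester > ts_holder else "abort holder (grant lock to requester)"

# Ti (timestamp 5, older) requests a lock held by Tj (timestamp 9, younger):
print(wait_die(5, 9))    # wait
print(wound_wait(5, 9))  # abort holder (grant lock to requester)

# Ti (timestamp 9, younger) requests a lock held by Tj (timestamp 5, older):
print(wait_die(9, 5))    # abort requester (restart with same timestamp)
print(wound_wait(9, 5))  # wait
```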
4- Discuss distributed deadlock detection technique
with appropriate figure.
Distributed Deadlock detection
 In the distributed deadlock detection method, a deadlock detector exists at each site of the
distributed system. In this method, each site has the same amount of responsibility,
and there is no distinction between local and global deadlock detectors. A variety of
approaches have been proposed for distributed deadlock detection algorithms, but the
most well-known and simplified version, which is presented here, was developed by
R. Obermarck in 1982.
 In this approach, a local wait-for graph (LWFG) is constructed at each site by the
respective local deadlock detectors. An additional external node is added to the LWFGs, as each site in the
distributed system receives the potential deadlock cycles from other sites. In the
distributed deadlock detection algorithm, the external node Tex is added to the
LWFGs to indicate whether any transaction from any remote site is waiting for a data
item that is being held by a transaction at the local site or whether any transaction
from the local site is waiting for a data item that is currently being used by any transaction
at any remote site. For instance, an edge from the node Ti to Tex exists in the LWFG,
if the transaction Ti is waiting for a data item that is already held by any transaction at
any remote site. Similarly, an edge from the external node Tex to Ti exists in the
graph, if a transaction from a remote site is waiting to acquire a data item that is
currently being held by the transaction Ti at the local site.
 Thus, the local detector checks for two things to determine a deadlock situation.
 If a LWFG contains a cycle that does not involve the external node Tex, then it
indicates that a deadlock has occurred locally and it can be handled locally.
 On the other hand, a global deadlock potentially exists if the LWFG contains a cycle
involving the external node Tex. However, the existence of such a cycle does not
necessarily imply that there is a global deadlock, as the external node Tex represents
different agents.
 The LWFGs are merged so as to determine global deadlock situations. To avoid sites
transmitting their LWFGs to each other, a simple strategy is followed here. According
to this strategy, one timestamp value is allocated to each transaction and a rule is
imposed such that a site Si transmits its LWFG to the site Sk if a transaction, say
Tk, at site Sk is waiting for a data item that is currently being held by a transaction Ti
at site Si and ts(Ti) < ts(Tk). In that case, the site Si transmits its LWFG to the site
Sk, and the site Sk adds this information to its LWFG and checks for cycles not
involving the external node Tex in the extended graph. If there is no cycle in the
extended graph, the process continues until a cycle appears and it may happen that the
entire global wait-for graph (GWFG) is constructed and no cycle is detected. In this case, it is decided that
there is no deadlock in the entire distributed system.
 On the other hand, if the GWFG contains a cycle not involving the external node Tex,
it is concluded that a deadlock has occurred in the system. The distributed deadlock
detection method is illustrated below.
Fig. Distributed deadlock detection: local wait-for graphs at Site 1, Site 2 and Site 3, each
extended with the external node Tex (edges among the transactions Ti, Tj, Tk and Tex).
5- Define false deadlock & phantom deadlock in
distributed database environment.
False Deadlocks
 To handle the deadlock situation in distributed database systems, a number of
messages are transmitted among the sites. The delay associated with the transmission
of messages that is necessary for deadlock detection can cause the detection of false
deadlocks.
 For instance, consider that at a particular time the deadlock detector has received the
information that the transaction Ti is waiting for the transaction Tj. Further assume
that after some time the transaction Tj releases the data item requested by the
transaction Ti and requests for data item that is being currently held by the transaction
Ti. If the deadlock detector receives the information that the transaction Tj has
requested for a data item that is held by the transaction Ti before receiving the
information that the transaction Ti is not blocked by the transaction Tj any more, a
false deadlock situation is detected.
Phantom Deadlocks
 Another problem is that a transaction Ti that blocks another transaction may be
restarted for reasons that are not related to deadlock detection. In this case, until the
restart message of the transaction Ti is transmitted to the deadlock detector, the
deadlock detector can find a cycle in the wait-for graph that includes the transaction
Ti. Hence, a deadlock situation is detected by the deadlock detector and this is called
a phantom deadlock. When the deadlock detector detects a phantom deadlock, it
may unnecessarily restart a transaction other than Ti. To avoid unnecessary restarts
for phantom deadlock, special safety measures are required.
6- Explain Centralized deadlock detection.
Centralized Deadlock detection
In Centralized Deadlock detection method, a single site is chosen as Deadlock
Detection Coordinator (DDC) for the entire distributed system. The DDC is responsible
for constructing the GWFG for the system. Each lock manager in the distributed database
transmits its LWFG to the DDC periodically. The DDC constructs the GWFG from these
LWFGs and checks for cycles in it. The occurrence of a global deadlock situation is
detected if there are one or more cycles in the GWFG. The DDC must break each cycle in
the GWFG by selecting the transactions to be rolled back and restarted to recover from a
deadlock situation. The information regarding the transactions that are to be rolled back
and restarted must be transmitted to the corresponding lock managers by the deadlock
detection coordinator.
– The centralized deadlock detection approach is very simple, but it has several
drawbacks.
– This method is less reliable, as the failure of the central site makes the deadlock
detection impossible.
– The communication cost is very high in this case, as the other sites in the distributed
system must send their LWFGs to the central site.
– Another disadvantage of centralized deadlock detection technique is that false
detection of deadlocks can occur, for which the deadlock recovery procedure may be
initiated, although no deadlock has occurred. In this method, unnecessary rollbacks
and restarts of transactions may also result owing to phantom deadlocks.
More Related Content

What's hot (20)

PDF
Database Normalization
Arun Sharma
 
PDF
DDBMS Paper with Solution
Gyanmanjari Institute Of Technology
 
PPT
11 Database Concepts
Praveen M Jigajinni
 
PPT
Database, Lecture-1.ppt
MatshushimaSumaya
 
PPT
Database Relationships
Forrester High School
 
DOC
rdbms-notes
Mohit Saini
 
PPT
File organization 1
Rupali Rana
 
PPTX
Object oriented database
Md. Hasan Imam Bijoy
 
PPTX
Database failure and recovery 1
vishal choudhary
 
PPTX
All data models in dbms
Naresh Kumar
 
PPTX
Data science life cycle
Manoj Mishra
 
PPT
Databases: Normalisation
Damian T. Gordon
 
PPTX
Distributed DBMS - Unit 8 - Distributed Transaction Management & Concurrency ...
Gyanmanjari Institute Of Technology
 
PDF
Database System Architecture
University of Potsdam
 
PPTX
Data Modeling PPT
Trinath
 
PPTX
Dbms normalization
Pratik Devmurari
 
PDF
Distributed Coordination-Based Systems
Ahmed Magdy Ezzeldin, MSc.
 
PPTX
Association rule mining.pptx
maha797959
 
PPTX
2 phase locking protocol DBMS
Dhananjaysinh Jhala
 
PPTX
Distributed DBMS - Unit 5 - Semantic Data Control
Gyanmanjari Institute Of Technology
 
Database Normalization
Arun Sharma
 
DDBMS Paper with Solution
Gyanmanjari Institute Of Technology
 
11 Database Concepts
Praveen M Jigajinni
 
Database, Lecture-1.ppt
MatshushimaSumaya
 
Database Relationships
Forrester High School
 
rdbms-notes
Mohit Saini
 
File organization 1
Rupali Rana
 
Object oriented database
Md. Hasan Imam Bijoy
 
Database failure and recovery 1
vishal choudhary
 
All data models in dbms
Naresh Kumar
 
Data science life cycle
Manoj Mishra
 
Databases: Normalisation
Damian T. Gordon
 
Distributed DBMS - Unit 8 - Distributed Transaction Management & Concurrency ...
Gyanmanjari Institute Of Technology
 
Database System Architecture
University of Potsdam
 
Data Modeling PPT
Trinath
 
Dbms normalization
Pratik Devmurari
 
Distributed Coordination-Based Systems
Ahmed Magdy Ezzeldin, MSc.
 
Association rule mining.pptx
maha797959
 
2 phase locking protocol DBMS
Dhananjaysinh Jhala
 
Distributed DBMS - Unit 5 - Semantic Data Control
Gyanmanjari Institute Of Technology
 

Viewers also liked (15)

PPTX
Semi join
Alokeparna Choudhury
 
PPTX
Dbms acid
Zaheer Soomro
 
PDF
Transaction Management - Lecture 11 - Introduction to Databases (1007156ANR)
Beat Signer
 
XLS
Dbms Final Examination Answer Key
Laguna State Polytechnic University
 
PPTX
Acid properties
Abhilasha Lahigude
 
PPTX
Fragmentation and types of fragmentation in Distributed Database
Abhilasha Lahigude
 
PDF
Transaction & Concurrency Control
Ravimuthurajan
 
PPT
Transaction concurrency control
Anand Grewal
 
PDF
Chapter 5 Database Transaction Management
Eddyzulham Mahluzydde
 
PPT
Transaction management
renuka_a
 
PPTX
8 drived horizontal fragmentation
Mohsan Ijaz
 
PPT
Databases: Concurrency Control
Damian T. Gordon
 
PPT
16. Concurrency Control in DBMS
koolkampus
 
PPT
Concurrency control
Virender Kumar
 
PPT
15. Transactions in DBMS
koolkampus
 
Dbms acid
Zaheer Soomro
 
Transaction Management - Lecture 11 - Introduction to Databases (1007156ANR)
Beat Signer
 
Dbms Final Examination Answer Key
Laguna State Polytechnic University
 
Acid properties
Abhilasha Lahigude
 
Fragmentation and types of fragmentation in Distributed Database
Abhilasha Lahigude
 
Transaction & Concurrency Control
Ravimuthurajan
 
Transaction concurrency control
Anand Grewal
 
Chapter 5 Database Transaction Management
Eddyzulham Mahluzydde
 
Transaction management
renuka_a
 
8 drived horizontal fragmentation
Mohsan Ijaz
 
Databases: Concurrency Control
Damian T. Gordon
 
16. Concurrency Control in DBMS
koolkampus
 
Concurrency control
Virender Kumar
 
15. Transactions in DBMS
koolkampus
 
Ad

Similar to Final exam in advance dbms (20)

PPTX
DBMS-Unit5-PPT.pptx important for revision
yuvivarmaa
 
PDF
PVP19 DBMS UNIT-4 Material.pdfvh kk ghkd DL of child gf
kiruthikan18
 
PPTX
Relational Database Management System
sweetysweety8
 
PPT
ch02-240507064009-ac337bf1 .ppt
iamayesha2526
 
PPT
QPOfutyfurfugfuyttruft7rfu65rfuyt PPT - Copy.ppt
ahmed518927
 
PDF
Introduction to database-Normalisation
Ajit Nayak
 
PPT
Query optimization and processing for advanced database systems
meharikiros2
 
PPT
Unit05 dbms
arnold 7490
 
PPTX
RDBMS
sowfi
 
DOCX
CSC388 Online Programming Languages Homework 3 (due b.docx
annettsparrow
 
PDF
International Journal of Engineering Research and Development
IJERD Editor
 
PPTX
ORDBMS.pptx
Anitta Antony
 
DOCX
Bt0066 dbms
smumbahelp
 
PPT
Designing A Syntax Based Retrieval System03
Avelin Huo
 
PDF
Query Optimization - Brandon Latronica
"FENG "GEORGE"" YU
 
PDF
Relational Database Design
Prabu U
 
PDF
Large-scale Reasoning with a Complex Cultural Heritage Ontology (CIDOC CRM) ...
Vladimir Alexiev, PhD, PMP
 
PPTX
Ch-2-Query-Process.pptx advanced database
tasheebedane
 
PPTX
700442110-advanced database Ch-2-Query-Process.pptx
tasheebedane
 
PPTX
Ir 02
Mohammed Romi
 
DBMS-Unit5-PPT.pptx important for revision
yuvivarmaa
 
PVP19 DBMS UNIT-4 Material.pdfvh kk ghkd DL of child gf
kiruthikan18
 
Relational Database Management System
sweetysweety8
 
ch02-240507064009-ac337bf1 .ppt
iamayesha2526
 
QPOfutyfurfugfuyttruft7rfu65rfuyt PPT - Copy.ppt
ahmed518927
 
Introduction to database-Normalisation
Ajit Nayak
 
Query optimization and processing for advanced database systems
meharikiros2
 
Unit05 dbms
arnold 7490
 
RDBMS
sowfi
 
CSC388 Online Programming Languages Homework 3 (due b.docx
annettsparrow
 
International Journal of Engineering Research and Development
IJERD Editor
 
ORDBMS.pptx
Anitta Antony
 
Bt0066 dbms
smumbahelp
 
Designing A Syntax Based Retrieval System03
Avelin Huo
 
Query Optimization - Brandon Latronica
"FENG "GEORGE"" YU
 
Relational Database Design
Prabu U
 
Large-scale Reasoning with a Complex Cultural Heritage Ontology (CIDOC CRM) ...
Vladimir Alexiev, PhD, PMP
 
Ch-2-Query-Process.pptx advanced database
tasheebedane
 
700442110-advanced database Ch-2-Query-Process.pptx
tasheebedane
 
Ad

More from Md. Mashiur Rahman (20)

PDF
Rule for creating power point slide
Md. Mashiur Rahman
 
PDF
Answer sheet of switching &amp; routing
Md. Mashiur Rahman
 
PDF
Routing and switching question1
Md. Mashiur Rahman
 
PPT
Lecture 1 networking &amp; internetworking
Md. Mashiur Rahman
 
PPTX
Lec 7 query processing
Md. Mashiur Rahman
 
PPTX
Lec 1 indexing and hashing
Md. Mashiur Rahman
 
PPTX
Cloud computing lecture 7
Md. Mashiur Rahman
 
PPTX
Cloud computing lecture 1
Md. Mashiur Rahman
 
PDF
parallel Questions &amp; answers
Md. Mashiur Rahman
 
DOCX
Computer network solution
Md. Mashiur Rahman
 
DOCX
Computer network answer
Md. Mashiur Rahman
 
Rule for creating power point slide
Md. Mashiur Rahman
 
Answer sheet of switching &amp; routing
Md. Mashiur Rahman
 
Routing and switching question1
Md. Mashiur Rahman
 
Lecture 1 networking &amp; internetworking
Md. Mashiur Rahman
 
Lec 7 query processing
Md. Mashiur Rahman
 
Lec 1 indexing and hashing
Md. Mashiur Rahman
 
Cloud computing lecture 7
Md. Mashiur Rahman
 
Cloud computing lecture 1
Md. Mashiur Rahman
 
parallel Questions &amp; answers
Md. Mashiur Rahman
 
Computer network solution
Md. Mashiur Rahman
 
Computer network answer
Md. Mashiur Rahman
 

Recently uploaded (20)

PDF
IMP NAAC-Reforms-Stakeholder-Consultation-Presentation-on-Draft-Metrics-Unive...
BHARTIWADEKAR
 
PDF
BÀI TẬP BỔ TRỢ THEO LESSON TIẾNG ANH - I-LEARN SMART WORLD 7 - CẢ NĂM - CÓ ĐÁ...
Nguyen Thanh Tu Collection
 
PPTX
How to Consolidate Subscription Billing in Odoo 18 Sales
Celine George
 
PPTX
Optimizing Cancer Screening With MCED Technologies: From Science to Practical...
i3 Health
 
PPTX
GENERAL METHODS OF ISOLATION AND PURIFICATION OF MARINE__MPHARM.pptx
SHAHEEN SHABBIR
 
PPTX
THE HUMAN INTEGUMENTARY SYSTEM#MLT#BCRAPC.pptx
Subham Panja
 
PPTX
Blanket Order in Odoo 17 Purchase App - Odoo Slides
Celine George
 
PDF
Module 1: Determinants of Health [Tutorial Slides]
JonathanHallett4
 
PPTX
Maternal and Child Tracking system & RCH portal
Ms Usha Vadhel
 
PPTX
Accounting Skills Paper-I, Preparation of Vouchers
Dr. Sushil Bansode
 
PPTX
Latest Features in Odoo 18 - Odoo slides
Celine George
 
PDF
TOP 10 AI TOOLS YOU MUST LEARN TO SURVIVE IN 2025 AND ABOVE
digilearnings.com
 
PDF
Comprehensive Guide to Writing Effective Literature Reviews for Academic Publ...
AJAYI SAMUEL
 
PPTX
IDEAS AND EARLY STATES Social science pptx
NIRANJANASSURESH
 
PDF
Right to Information.pdf by Sapna Maurya XI D
Directorate of Education Delhi
 
PDF
Living Systems Unveiled: Simplified Life Processes for Exam Success
omaiyairshad
 
PPTX
SCHOOL-BASED SEXUAL HARASSMENT PREVENTION AND RESPONSE WORKSHOP
komlalokoe
 
PDF
Exploring-the-Investigative-World-of-Science.pdf/8th class curiosity/1st chap...
Sandeep Swamy
 
PPTX
How to Configure Storno Accounting in Odoo 18 Accounting
Celine George
 
PDF
IMP NAAC REFORMS 2024 - 10 Attributes.pdf
BHARTIWADEKAR
 
IMP NAAC-Reforms-Stakeholder-Consultation-Presentation-on-Draft-Metrics-Unive...
BHARTIWADEKAR
 
BÀI TẬP BỔ TRỢ THEO LESSON TIẾNG ANH - I-LEARN SMART WORLD 7 - CẢ NĂM - CÓ ĐÁ...
Nguyen Thanh Tu Collection
 
How to Consolidate Subscription Billing in Odoo 18 Sales
Celine George
 
Optimizing Cancer Screening With MCED Technologies: From Science to Practical...
i3 Health
 
GENERAL METHODS OF ISOLATION AND PURIFICATION OF MARINE__MPHARM.pptx
SHAHEEN SHABBIR
 
THE HUMAN INTEGUMENTARY SYSTEM#MLT#BCRAPC.pptx
Subham Panja
 
Blanket Order in Odoo 17 Purchase App - Odoo Slides
Celine George
 
Module 1: Determinants of Health [Tutorial Slides]
JonathanHallett4
 
Maternal and Child Tracking system & RCH portal
Ms Usha Vadhel
 
Accounting Skills Paper-I, Preparation of Vouchers
Dr. Sushil Bansode
 
Latest Features in Odoo 18 - Odoo slides
Celine George
 
TOP 10 AI TOOLS YOU MUST LEARN TO SURVIVE IN 2025 AND ABOVE
digilearnings.com
 
Comprehensive Guide to Writing Effective Literature Reviews for Academic Publ...
AJAYI SAMUEL
 
IDEAS AND EARLY STATES Social science pptx
NIRANJANASSURESH
 
Right to Information.pdf by Sapna Maurya XI D
Directorate of Education Delhi
 
Living Systems Unveiled: Simplified Life Processes for Exam Success
omaiyairshad
 
SCHOOL-BASED SEXUAL HARASSMENT PREVENTION AND RESPONSE WORKSHOP
komlalokoe
 
Exploring-the-Investigative-World-of-Science.pdf/8th class curiosity/1st chap...
Sandeep Swamy
 
How to Configure Storno Accounting in Odoo 18 Accounting
Celine George
 
IMP NAAC REFORMS 2024 - 10 Attributes.pdf
BHARTIWADEKAR
 

Final exam in advance dbms

  • 1. PMSCS-653 (Advance DBMS Final-exam) Indexing and Hashing 1- Explain the basic concept of indexing. Mention some factors to be considered for evaluating an indexing technique. Basic Concept An index for a file in a database system works in much the same way as the index in this textbook. If we want to learn about a particular topic, we can search for the topic in the index at the back of the book, find the pages where it occurs, and then read the pages to find the information we are looking for. The words in the index are in sorted order, making it easy to find the word we are looking for. Moreover, the index is much smaller than the book, further reducing the effort needed to find the words we are looking for. Evaluation Factors  Access types: Access types can include finding records with a specified attribute value and finding records whose attribute values fall in a specified range.  Access time: The time it takes to find a particular data item, or set of items, using the technique in question.  Insertion time: The time it takes to insert a new data item. This value includes the time it takes to find the correct place to insert the new data item, as well as the time it takes to update the index structure.  Deletion time: The time it takes to delete a data item. This value includes the time it takes to find the item to be deleted, as well as the time it takes to update the index structure.  Space overhead: The additional space occupied by an index structure. Provided that the amount of additional space is moderate, it is usually worthwhile to sacrifice the space to achieve improved performance. 2- Explain multilevel indexing with example. When is it preferable to use a multilevel index rather than a single level index? Multi-Level Indexing  If primary index does not fit in memory, access becomes expensive.  Solution: treat primary index kept on disk as a sequential file and construct a sparse index on it. - Outer index – a sparse index of primary index - Inner index – the primary index file  If even outer index is too large to fit in main memory, yet another level of index can be created, and so on.  Indices at all levels must be updated on insertion or deletion from the file. An Example  Consider 100,000 records, 10 per block, at one index record per block, that's 10,000 index records. Even if we can fit 100 index records per block, this is 100 blocks. If index is too large to be kept in main memory, a search results in several disk reads.  For very large files, additional levels of indexing may be required.  Indices must be updated at all levels when insertions or deletions require it.  Frequently, each level of index corresponds to a unit of physical storage.
  • 2. Fig: Multilevel Index 3- Differentiate between Dense & Sparse indexing. What is the advantage of sparse index over dense index? Dense VS Sparse Indices  It is generally faster to locate a record if we have a dense index rather than a sparse index.  However, sparse indices have advantages over dense indices in that they require less space and they impose less maintenance overhead for insertions and deletions.  There is a trade-off that the system designer must make between access time and space overhead. However, sparse indices have advantages over dense indices in that they require less space and they impose less maintenance overhead for insertions and deletions. 4- Explain hash file organization with example. Hash File Organization  In a hash file organization, we obtain the address of the disk block, also called the bucket containing a desired record directly by computing a function on the search-key value of the record. Hash File Organization: An Example  Let us choose a hash function for the account file using the search key branch_name.  Suppose we have 26 buckets and we define a hash function that maps names beginning with the ith letter of the alphabet to the ith bucket.
  • 3.  This hash function has the virtue of simplicity, but it fails to provide a uniform distribution, since we expect more branch names to begin with such letters as B and R than Q and X.  Instead, we consider 10 buckets and a hash function that computes the sum of the binary representations of the characters of a key, then returns the sum modulo the number of buckets.  For branch name ‘Perryridge’  Bucket no=h(Perryridge) = 5  For branch name ‘Round Hill’  Bucket no=h(Round Hill) = 3  For branch name ‘Brighton’  Bucket no=h(Brighton) = 3 Fig. Hash organization of account file, with branch-name as the key 5- Construct a B+-tree for the following set of key values: (2, 3, 5, 7, 11, 17, 19, 23, 29, 31) for n=4 and n=6. For n=4 19 2 3 5 7 11 17 19 23 3129 5 11 29
  • 4. 7 19 2 3 5 7 11 17 19 23 3129 5 29 2 3 7 11 17 19 23 31 For n=6 6- Construct a B -tree for the following set of key values: (2, 3, 5, 7, 11, 17, 19, 23, 29, 31) for n=4 and n=6. For n=4 For n=6 7 23 2 3 5 11 17 19 29 31 Functional Dependencies & Normalization 1- Normalize step by step the following database into 1NF, 2NF and 3NF: PROJECT (Proj-ID, Proj-Name, Proj-Mgr-ID, Emp-ID, Emp-Name, Emp-Dpt, Emp-Hrly-rate, Total-Hrs) FIRST NORMAL FORM (1NF) A relation r(R) is said to be in First Normal Form (1NF), if and only if every entry of the relation has at most a single value. Thus, it requires to decompose the table PROJECT into two new tables. It is also necessary to identify a PK for each table. First table with nonrepeating groups: PROJECT (Proj-ID, Proj-Name, Proj-Mgr-ID) Second table with table identifier and all repeating groups: PROJECT-EMPLOYEE (Proj-ID, Emp-ID, Emp-Name, Emp-Dpt, Emp-Hrly-rate, Total-Hrs) Here Proj-ID is the table identifier.
  • 5. SECOND NORMAL FORM (2NF) A relation r (R) is in Second Normal Form (2NF) if and only if the following two conditions are met simultaneously: 1. r(R) is already in 1NF. 2. No nonprime attribute is partially dependent on any key or, equivalently, each nonprime attribute in R is fully dependent on every key. In PROJECT-EMPLOYEE relation attributes Emp-Name, Emp-Dpt and Emp-Hrly-rate are fully dependent only on Emp-ID. The only attribute Total-Hrs fully depends on the composite PK (Proj-ID, Emp-ID). This is shown below. Now break this relation into two new relations so that partial dependency does not exist. PHOURS-ASSIGNED (Proj-ID, Emp-ID, Total-Hrs) EMPLOYEE (Emp-ID, Emp-Name, Emp-Dpt, Emp-Hrly-rate) THIRD NORMAL FORM (3NF) A relation r(R) is in Third Normal Form (3NF) if and only if the following conditions are satisfied simultaneously: 1. r(R) is already in 2NF. 2. No nonprime attribute is transitively dependent on the key (i.e. No nonprime attribute functionally determines any other nonprime attribute). In EMPLOYEE relation, attribute Emp-Hrly-rate is transitively dependent on the PK Emp- ID through the functional dependency Emp-Dpt → Emp-Hrly-rate. This is shown below. Now break this relation into two new relations so that transitive dependency does not exist. EMPLOYEE (Emp-ID, Emp-Name, Emp-Dpt) CHARGES (Emp-Dpt, Emp-Hrly-rate) The new set of relations that we have obtained through the normalization process does not exhibit any of the anomalies. That is, we can insert, delete and update tuples without any of the side effects. PROJECT-EMPLOYEE (Proj-ID, Emp-ID, Emp-Name, Emp-Dpt, Emp-Hrly-rate, Total-Hrs) EMPLOYEE (Emp-ID, Emp-Name, Emp-Dpt, Emp-Hrly-rate)
  • 6. 2- Explain BCNF with example. Boyce-Codd Normal Form: A relation r(R) is in Boyce-Codd Normal Form (BCNF) if and only of the following conditions are met simultaneously: (1) The relation is in 3NF. (2) For every functional dependency of the form X → A, we have that either A  X or X is a super key of r. In other words, every functional dependency is either a trivial dependency or in the case that the functional dependency is not trivial, then X must be a super key. To transform the relation of example 2 into BCNF, we can decompose it onto the following set of relational schemes: Set No. 1 MANUFACTURER (ID, Name) MANUFACTURER-PART (ID, Item-No, Quantity) Or Set No. 2 MANUFACTURER (ID, Name) MANUFACTURER-PART (Name, Item-No, Quantity) Notice that both relations are in BCNF and that the update data anomaly is no longer present. 3- Given the relation r(A, B, C) and the set F={AB → C, B → D, D → B} of functional dependencies, find the candidate keys of the relation. How many candidate keys are in this relation? What are the prime attributes? AB is a candidate key. (1) AB → A and AB → B ( Reflexivity axiom) (2) AB → C (given) (3) AB → B (step 1) and B → D (given) then AB → D ( Transitivity axiom) Since AB determines every other attribute of the relation, we have that AB is a candidate key. AD is a candidate key. (1) AD → A and AD → D ( Reflexivity axiom) (2) AD → D (step 1) and D → B (given) then AD → B (3) AD → B (step 2) and AB → C (given) then AAD → C (Pseudotransitivity axiom). i.e. AD → C Since AD determines every other attribute of the relation, we have that AB is a candidate key. The prime attributes are: A, B, and D.
  • 7. 4- What is Canonical Cover? Given the set F of FDs shown below, find a canonical cover for F. F= {X → Z, XY → WP, XY → ZWQ, XZ → R}. Canonical cover For a given set F of FDs, a canonical cover, denoted by Fc, is a set of FDs where the following conditions are simultaneously satisfied: (1) Every FD of Fc is simple. That is, the right-hand side of every functional dependency of Fc has only one attribute. (2) Fc is left-reduced. (3) Fc is nonredundant. Example: Given the set F of FDs shown below, find a canonical cover for F. F= {X → Z, XY → WP, XY → ZWQ, XZ → R}. FC = {X → Z, XY → WP, XY → ZWQ, XZ → R} = {X → Z, XY → W, XY → P, XY → Z, XY → W, XY → Q, XZ → R}. [Simple] = {X → Z, XY → W, XY → P, XY → Z, XY → Q, XZ → R}. [Remove redundancy] = {X → Z, XY → W, XY → P, XY → Z, XY → Q, XZ → R}. = {X → Z, XY → W, XY → P, XY → Q, XZ → R}. [Remove redundancy] = {X → Z, XY → W, XY → P, XY → Q, XZ → R}. = {X → Z, XY → W, XY → P, XY → Q, X → R}. [Left-reduced] FC = {X → Z, XY → W, XY → P, XY → Q, X → R}. This resulting set where all FDs are simple, left-reduced and non-redundant, is the canonical cover of F. Distributed Database Management System (DDBMS) 1- Distinguish between local transaction & global transaction.  A local transaction is one that accesses data only from sites where the transaction was initiated.  A global transaction, on the other hand, is one that either accesses data in a site different from the one at which the transaction was initiated, or accesses data in several different sites. 2- Draw & explain the general structure of a distributed database system. Describe the relative advantages & disadvantages of DDBMS. • Consists of a single logical database that is split into a number of fragments. Each fragment is stored on one or more computers. • The computers in a distributed system communicate with one another through various communication media, such as high-speed networks or telephone lines. • They do not share main memory or disks.
Distributed Database Management System (DDBMS)

1- Distinguish between local transaction & global transaction.
 A local transaction is one that accesses data only from sites where the transaction was initiated.
 A global transaction, on the other hand, is one that either accesses data in a site different from the one at which the transaction was initiated, or accesses data in several different sites.

2- Draw & explain the general structure of a distributed database system. Describe the relative advantages & disadvantages of DDBMS.
 A distributed database consists of a single logical database that is split into a number of fragments. Each fragment is stored on one or more computers.
 The computers in a distributed system communicate with one another through various communication media, such as high-speed networks or telephone lines.
 They do not share main memory or disks.
 Each site is capable of independently processing user requests that require access to local data, and is also capable of processing user requests that require access to remote data stored on other computers in the network.
The general structure of a distributed system appears in the following figure.

Advantages of DDBMS
 Sharing data. Users at one site may be able to access the data residing at other sites.
 Increased local autonomy. The global database administrator is responsible for the entire system; a part of these responsibilities is delegated to the local database administrator of each site.
 Increased availability. If one site fails in a distributed system, the remaining sites may be able to continue operating.
 Speeding up of query processing. It is possible to process data at several sites simultaneously. If a query involves data stored at several sites, it may be possible to split the query into a number of subqueries that can be executed in parallel. Thus, query processing becomes faster in distributed systems.
 Increased reliability. A single data item may exist at several sites. If one site fails, a transaction requiring a particular data item from that site may access it from other sites. Thus, reliability increases.
 Better performance. As each site handles only a part of the entire database, there may not be the same level of contention for CPU and I/O services that characterizes a centralized DBMS.
 Processor independence. A distributed system may contain multiple copies of a particular data item. Thus, users can access any available copy of the data item, and user requests can be processed by the processor at the data location. Hence, user requests do not depend on a specific processor.
 Modular extensibility. Modular extension in a distributed system is much easier. New sites can be added to the network without affecting the operations of other sites. Such flexibility allows organizations to extend their systems relatively quickly and easily.
Disadvantages of DDBMS
 Software-development cost. It is more difficult to implement a distributed database system; thus, it is more costly.
 Greater potential for bugs. Since the sites operate in parallel, it is harder to ensure the correctness of algorithms, especially operation during failures of part of the system and recovery from failures.
 Increased processing overhead. The exchange of messages and the additional computation required to achieve inter-site coordination are a form of overhead that does not arise in centralized systems.
 Security. As data in a distributed DBMS are located at multiple sites, the probability of security lapses increases. Further, all communications between different sites in a distributed DBMS are conveyed through the network, so the underlying network has to be made secure to maintain system security.
 Lack of standards. A distributed system can consist of a number of sites that are heterogeneous in terms of processors, communication networks and DBMSs. This lack of standards significantly limits the potential of distributed DBMSs.
 Increased storage requirements.
 Maintenance of integrity is very difficult. Database integrity refers to the validity and consistency of stored data.
 Lack of experience. Distributed DBMSs have not been widely accepted, so we do not yet have the same level of experience in industry.
 Database design is more complex. The design of a distributed DBMS involves fragmentation of data, allocation of fragments to specific sites and data replication.

3- Differentiate between Homogeneous and Heterogeneous Databases.

Homogeneous distributed database
 All sites have identical database management system software, are aware of one another, and agree to cooperate in processing users' requests.
 The same database schemas are used at all sites.
 Easier to design and manage.
 Addition of a new site is much easier.

Heterogeneous distributed database
 Usually constructed over a number of existing sites.
 Each site has its local database, and different sites may use different schemas (relational model, OO model, etc.).
 Different DBMS software may be used at different sites.
 Query processing is more difficult.
 Gateways are used (as query translators) to convert the language and data model of each different DBMS into the language and data model of the relational system.
4- Define data replication in DDBs. Mention some major advantages and disadvantages of data replication.

Data Storage in DDBMS
 Replication. The system maintains several identical replicas (copies) of the relation, and stores each replica at a different site.
 Fragmentation. The system partitions the relation into several fragments, and stores each fragment at a different site.
 Fragmentation and replication can be combined.

Advantages and disadvantages of Replication
 Availability (advantage): if one of the sites containing relation r fails, the relation may still be found at another site.
 Increased parallelism (advantage): queries that read r can be processed at several sites in parallel.
 Increased overhead on update (disadvantage): the system must ensure that all replicas of relation r are kept consistent, so every update must be propagated to all replicas.

5- What is Data Fragmentation? Discuss different types of data fragmentation with examples.

Data Fragmentation
 If relation r is fragmented, r is divided into a number of fragments r1, r2, . . . , rn.
 These fragments contain sufficient information to allow reconstruction of the original relation r.
 There are two different schemes for fragmenting a relation:
- Horizontal fragmentation and
- Vertical fragmentation

Horizontal Fragmentation
 In horizontal fragmentation, a relation r is partitioned into a number of subsets, r1, r2, . . . , rn.
 Each tuple of relation r must belong to at least one of the fragments, so that the original relation can be reconstructed, if needed.
As an illustration, consider the account relation:
Account-schema = (account-number, branch-name, balance)
The relation can be divided into several different fragments. If the banking system has only two branches - Savar and Dhanmondi - then there are two different fragments:
account1 = σbranch-name = "Savar" (account)
account2 = σbranch-name = "Dhanmondi" (account)
 In general, a fragment ri is constructed with a predicate Pi: ri = σPi (r)
 The relation r is reconstructed by taking the union of all fragments. That is, r = r1 ∪ r2 ∪ · · · ∪ rn
 The fragments are disjoint.
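A minimal Python sketch of horizontal fragmentation, using the account relation above with a few made-up tuples: each fragment holds the tuples satisfying its predicate, the union of the fragments reconstructs the relation, and the fragments are disjoint.

# Hypothetical tuples of account(account-number, branch-name, balance)
account = [
    ("A-101", "Savar", 500),
    ("A-102", "Dhanmondi", 700),
    ("A-103", "Savar", 400),
]

# Horizontal fragments: sigma(branch-name = ...)(account)
account1 = [t for t in account if t[1] == "Savar"]
account2 = [t for t in account if t[1] == "Dhanmondi"]

# Reconstruction: r = r1 U r2 (union of the fragments)
reconstructed = set(account1) | set(account2)
assert reconstructed == set(account)

# Disjointness: no tuple belongs to more than one fragment
assert set(account1) & set(account2) == set()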
Vertical Fragmentation
 Vertical fragmentation of r(R) involves the definition of several subsets of attributes R1, R2, . . . , Rn of the schema R so that R = R1 ∪ R2 ∪ · · · ∪ Rn
 Each fragment ri of r is defined by ri = ΠRi (r)
 We can reconstruct relation r from the fragments by taking the natural join r = r1 ⋈ r2 ⋈ r3 ⋈ · · · ⋈ rn
 One way of ensuring that the relation r can be reconstructed is to include the primary-key attributes of R in each of the Ri.
- To illustrate vertical fragmentation, consider the following relation:
employee-info = (employee-id, name, designation, salary)
- For privacy reasons, the relation may be fragmented into a relation employee-privateinfo containing employee-id and salary, and another relation employee-publicinfo containing the attributes employee-id, name, and designation.
employee-privateinfo = (employee-id, salary)
employee-publicinfo = (employee-id, name, designation)
- These may be stored at different sites, again for security reasons.

6- Write down the correctness rules for data fragmentation. Give an example of horizontal & vertical fragments that satisfy all the correctness rules of fragmentation.

Correctness Rules for Data Fragmentation
To ensure no loss of information and no redundancy of data, three rules must be observed during fragmentation.
 Completeness. If a relation instance R is decomposed into fragments R1, R2, . . . , Rn, each data item in R must appear in at least one of the fragments. This rule ensures that there is no loss of data during fragmentation.
 Reconstruction. If a relation R is decomposed into fragments R1, R2, . . . , Rn, it must be possible to define a relational operation that will reconstruct the relation R from the fragments R1, R2, . . . , Rn. This rule ensures that constraints defined on the data are preserved during fragmentation.
 Disjointness. If a relation R is decomposed into fragments R1, R2, . . . , Rn and a data item appears in fragment Ri, then it must not appear in any other fragment. This rule ensures minimal data redundancy.
In the case of vertical fragmentation, the primary-key attributes must be repeated in every fragment to allow reconstruction. Therefore, for vertical fragmentation, disjointness is defined only on the non-primary-key attributes of a relation.

 Example
Consider the relational schema Project, where the attribute project-type records whether a project is an inside project or an abroad project. Assume that P1 and P2 are two horizontal fragments of the relation Project, obtained by selecting on the project-type attribute according to whether its value is "inside" or "abroad".

 Example (Horizontal Fragmentation)
P1 = σproject-type = "inside" (Project)
P2 = σproject-type = "abroad" (Project)
These horizontal fragments satisfy all the correctness rules of fragmentation, as shown below.
Completeness: Each tuple in the relation Project appears either in fragment P1 or in fragment P2. Thus, the completeness rule is satisfied.
Reconstruction: The Project relation can be reconstructed from the horizontal fragments P1 and P2 by using the union operation of relational algebra, which ensures the reconstruction rule. Thus, P1 ∪ P2 = Project.
Disjointness: The fragments P1 and P2 are disjoint, since there can be no project whose project type is both "inside" and "abroad".

 Example (Vertical Fragmentation)
Assume that V1 and V2 are two vertical fragments of the relation Project, each of which retains the primary key project-id. These vertical fragments also satisfy the correctness rules of fragmentation, as shown below.
Completeness: Each data item of the relation Project appears either in fragment V1 or in fragment V2, which satisfies the completeness rule.
Reconstruction: The Project relation can be reconstructed from the vertical fragments V1 and V2 by using the natural join operation of relational algebra, which ensures the reconstruction rule. Thus, V1 ⋈ V2 = Project.
Disjointness: The fragments V1 and V2 are disjoint, except for the primary key project-id, which is repeated in both fragments and is necessary for reconstruction.
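The analogous checks for vertical fragmentation can be sketched in Python with the employee-info relation of question 5 (the tuples below are made up): each fragment is a projection that retains the primary key employee-id, and a natural join on employee-id reconstructs the relation.

# Hypothetical tuples of employee-info(employee-id, name, designation, salary)
employee_info = [
    ("E-1", "Rahim", "Manager", 90000),
    ("E-2", "Karim", "Engineer", 60000),
]

# Vertical fragments: both projections keep the primary key employee-id
private_info = [(eid, salary) for eid, _, _, salary in employee_info]
public_info = [(eid, name, desig) for eid, name, desig, _ in employee_info]

# Reconstruction: natural join on employee-id
joined = [
    (eid, name, desig, salary)
    for eid, name, desig in public_info
    for eid2, salary in private_info
    if eid == eid2
]
assert sorted(joined) == sorted(employee_info)

# Completeness: every employee-id appears in both fragments; disjointness
# holds on the non-key attributes (only employee-id is shared).
assert {t[0] for t in private_info} == {t[0] for t in employee_info}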
Transparencies in DDBMS

1- What is meant by transparency in database management system? Discuss different types of distributed transparency.

Transparency
o It refers to the separation of the high-level semantics of a system from lower-level implementation issues. In a distributed system, it hides the implementation details from users of the system.
o The user believes that he or she is working with a centralized database system and that all the complexities of the distributed database system are either hidden or transparent to the user.

Distribution transparency can be classified into:
 Fragmentation transparency
 Location transparency
 Replication transparency
 Local mapping transparency
 Naming transparency

 Fragmentation transparency
It hides from users the fact that the data are fragmented. Users are not required to know how a relation has been fragmented, and to retrieve data the user need not specify the particular fragment names.

 Location transparency
Users are not required to know the physical location of the data. To retrieve data from a distributed database with location transparency, the user has to specify the database fragment names but need not specify where the fragments are located in the system.

 Replication transparency
The user is unaware of the fact that the fragments of relations are replicated and stored at different sites of the system.

 Local mapping transparency
It refers to the fact that users are aware of both the fragment names and the locations of the fragments, taking into account that replicas of the fragments may exist. In this case, the user has to mention both the fragment names and the locations for data access.

 Naming transparency
In a distributed database system, each database object, such as relations, fragments and replicas, must have a unique name. Therefore, the DDBMS must ensure that no two sites create a database object with the same name. One solution to this problem is to create a central name server, which has the responsibility of ensuring the uniqueness of all names in the system. Naming transparency means that the users are not aware of the actual names of the database objects in the system. In this case, the user specifies the alias names of the database objects for data access.
Distributed Deadlock Management

1- Define Deadlock in distributed DBMS. Discuss how deadlock situation can be characterized in distributed DBMS.

Deadlock
In a database environment, a deadlock is a situation in which transactions wait endlessly for one another. Any lock-based concurrency control algorithm, and some timestamp-based concurrency control algorithms, may result in deadlocks, as these algorithms require transactions to wait for one another.

Deadlock Situation
 Deadlock situations can be characterized by wait-for graphs, directed graphs that indicate which transactions are waiting for which other transactions.
 In a wait-for graph, the nodes represent transactions and the edges represent the waiting-for relationships among transactions. An edge is drawn in the wait-for graph from transaction Ti to transaction Tj if transaction Ti is waiting for a lock on a data item that is currently held by transaction Tj.
 Using wait-for graphs, it is very easy to detect whether a deadlock situation has occurred in a database environment: there is a deadlock in the system if and only if the corresponding wait-for graph contains a cycle.

2- Explain distributed deadlock prevention method.

Distributed Deadlock Prevention Method
 Distributed deadlock prevention is a cautious scheme in which a transaction is restarted when the system suspects that a deadlock might occur. Deadlock prevention is an alternative method to resolve deadlock situations, in which the system is designed in such a way that deadlocks are impossible. In this scheme, the transaction manager checks a transaction when it is first initiated and does not permit it to proceed if there is a risk that it may cause a deadlock.
 In the case of lock-based concurrency control, deadlock prevention in a distributed system is implemented in the following way. Let us consider that a transaction Ti is initiated at a particular site in a distributed database system and that it requires a lock on a data item that is currently owned by another transaction Tj. Here, a deadlock prevention test is done to check whether there is any possibility of a deadlock occurring in the system. The transaction Ti is not permitted to enter a wait state for the transaction Tj if there is a risk of a deadlock situation. In this case, one of the two transactions is aborted to prevent a deadlock.
 The deadlock prevention algorithm is called non-preemptive if the transaction Ti (the requester) is aborted and restarted. On the other hand, if the transaction Tj (the holder) is aborted and restarted, then the deadlock prevention algorithm is called preemptive. The transaction Ti is permitted to wait for the transaction Tj as usual if they pass the prevention test. The prevention test must guarantee that if Ti is allowed to wait for Tj, a deadlock can never occur.
 A better approach to implementing the deadlock prevention test is to assign priorities to transactions and check these priorities to determine whether one transaction should wait for the other. These priorities can be assigned by using a unique identifier for each transaction in the distributed system.
For instance, consider that i and j are the priorities of two transactions Ti and Tj respectively. The transaction Ti would wait for the transaction Tj if Ti has a lower priority than Tj, that is, if i < j. This approach prevents deadlock, but one problem with it is that cyclic restart is possible: some transactions could be restarted repeatedly without ever finishing.

3- Explain the working principle of wait-die & wound-wait algorithms for deadlock prevention.

There are two different techniques for deadlock prevention: wait-die and wound-wait.

Wait-die is a non-preemptive deadlock prevention technique based on the timestamp values of transactions. In this technique, when one transaction is about to block because it is waiting for a lock on a data item that is already locked by another transaction, the timestamp values of both transactions are checked to give priority to the older transaction. If a younger transaction is holding the lock on the data item, then the older transaction is allowed to wait; but if an older transaction is holding the lock, the younger transaction is aborted and restarted with the same timestamp value. This forces the wait-for graph to be directed from the older to the younger transactions, making cyclic restarts impossible. For example, if the transaction Ti requests a lock on a data item that is already locked by the transaction Tj, then Ti is permitted to wait only if Ti has a lower timestamp value than Tj. On the other hand, if Ti is younger than Tj, then Ti is aborted and restarted with the same timestamp value.

Wound-wait is an alternative, preemptive deadlock prevention technique by which cyclic restarts can also be avoided. In this method, if a younger transaction requests a lock on a data item that is already held by an older transaction, the younger transaction is allowed to wait until the older transaction releases the corresponding lock. In this case, the wait-for graph flows from the younger to the older transactions, and cyclic restart is again avoided. For instance, if the transaction Ti requests a lock on a data item that is already locked by the transaction Tj, then Ti is permitted to wait only if Ti has a higher timestamp value than Tj; otherwise, the transaction Tj is aborted and the lock is granted to the transaction Ti.
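The two rules differ only in what happens when the requesting transaction is the younger one. A minimal Python sketch of the decision logic, assuming each transaction carries a timestamp and a smaller timestamp means an older transaction:

def wait_die(requester_ts, holder_ts):
    """Wait-die (non-preemptive): older requesters wait, younger requesters die."""
    if requester_ts < holder_ts:            # requester is older than the holder
        return "requester waits"
    return "requester is aborted and later restarted with the same timestamp"

def wound_wait(requester_ts, holder_ts):
    """Wound-wait (preemptive): older requesters wound the holder, younger ones wait."""
    if requester_ts < holder_ts:            # requester is older than the holder
        return "holder is aborted; the lock is granted to the requester"
    return "requester waits"

# Ti (timestamp 5) requests a lock held by Tj (timestamp 9): Ti is the older one.
print(wait_die(5, 9))      # requester waits
print(wound_wait(5, 9))    # holder is aborted; the lock is granted to the requester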
4- Discuss distributed deadlock detection technique with appropriate figure.

Distributed Deadlock Detection
 In the distributed deadlock detection method, a deadlock detector exists at each site of the distributed system. In this method, each site has the same amount of responsibility, and there is no distinction between local and global deadlock detectors. A variety of distributed deadlock detection algorithms have been proposed, but the most well-known and simplified version, which is presented here, was developed by R. Obermarck in 1982.
 In this approach, an LWFG (local wait-for graph) is constructed at each site by the respective local deadlock detector. An additional external node, Tex, is added to each LWFG, as each site in the distributed system receives the potential deadlock cycles from other sites. The external node Tex indicates either that a transaction at a remote site is waiting for a data item held by a transaction at the local site, or that a transaction at the local site is waiting for a data item currently being used by a transaction at a remote site. For instance, an edge from the node Ti to Tex exists in the LWFG if the transaction Ti is waiting for a data item that is held by a transaction at a remote site. Similarly, an edge from the external node Tex to Ti exists in the graph if a transaction at a remote site is waiting to acquire a data item that is currently held by the transaction Ti at the local site.
 Thus, the local detector checks for two things to determine a deadlock situation:
- If an LWFG contains a cycle that does not involve the external node Tex, then a deadlock has occurred locally and it can be handled locally.
- On the other hand, a global deadlock potentially exists if the LWFG contains a cycle involving the external node Tex. However, the existence of such a cycle does not necessarily imply that there is a global deadlock, as the external node Tex represents different agents.
 The LWFGs are merged in order to determine global deadlock situations. To avoid every site transmitting its LWFG to every other site, a simple strategy is followed: a timestamp value is allocated to each transaction, and a rule is imposed such that a site Si transmits its LWFG to the site Sk if a transaction Tk at site Sk is waiting for a data item that is currently held by a transaction Ti at site Si and ts(Ti) < ts(Tk). In that case, the site Si transmits its LWFG to the site Sk; the site Sk adds this information to its own LWFG and checks for cycles not involving the external node Tex in the extended graph. If there is no cycle in the extended graph, the process continues; it may happen that the entire GWFG (global wait-for graph) is constructed and no cycle is detected, in which case it is concluded that there is no deadlock in the entire distributed system.
 On the other hand, if the GWFG contains a cycle not involving the external node Tex, it is concluded that a deadlock has occurred in the system. The distributed deadlock detection method is illustrated in the figure below.

Fig: Distributed Deadlock Detection - the local wait-for graphs of Site 1, Site 2 and Site 3, each involving the transactions Ti, Tj, Tk and the external node Tex.
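The local test described above can be sketched as follows, assuming each site stores its LWFG as a simple adjacency map that includes the external node Tex (the edges below are hypothetical): a cycle that avoids Tex indicates a local deadlock, while a cycle through Tex only signals a potential global deadlock whose LWFG must be forwarded.

def find_cycle(graph, start):
    """Return a cycle reachable from start as a list of nodes, or None."""
    path, seen = [], set()
    def dfs(node):
        if node in path:                      # back edge: a cycle was found
            return path[path.index(node):]
        if node in seen:
            return None
        seen.add(node)
        path.append(node)
        for nxt in graph.get(node, []):
            cycle = dfs(nxt)
            if cycle:
                return cycle
        path.pop()
        return None
    return dfs(start)

# Hypothetical LWFG at one site: Tj waits for a remote transaction (edge to Tex),
# a remote transaction waits for Ti (edge from Tex), and Ti waits for Tj.
lwfg = {"Tj": ["Tex"], "Tex": ["Ti"], "Ti": ["Tj"]}

for node in lwfg:
    cycle = find_cycle(lwfg, node)
    if cycle and "Tex" not in cycle:
        print("local deadlock, resolve locally:", cycle)
        break
    if cycle:
        print("potential global deadlock, forward the LWFG:", cycle)
        break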
5- Define false deadlock & phantom deadlock in distributed database environment.

False Deadlocks
 To handle deadlock situations in distributed database systems, a number of messages are transmitted among the sites. The delay associated with the transmission of the messages that are necessary for deadlock detection can cause the detection of false deadlocks.
 For instance, consider that at a particular time the deadlock detector has received the information that the transaction Ti is waiting for the transaction Tj. Further assume that, some time later, the transaction Tj releases the data item requested by the transaction Ti and requests a data item that is currently held by the transaction Ti. If the deadlock detector receives the information that Tj has requested a data item held by Ti before receiving the information that Ti is no longer blocked by Tj, a false deadlock situation is detected.

Phantom Deadlocks
 Another problem is that a transaction Ti that blocks another transaction may be restarted for reasons unrelated to deadlock detection. In this case, until the restart message of the transaction Ti reaches the deadlock detector, the detector may find a cycle in the wait-for graph that includes the transaction Ti. Such a detected deadlock is called a phantom deadlock. When the deadlock detector detects a phantom deadlock, it may unnecessarily restart a transaction other than Ti. To avoid unnecessary restarts due to phantom deadlocks, special safety measures are required.

6- Explain Centralized deadlock detection.

Centralized Deadlock Detection
In the centralized deadlock detection method, a single site is chosen as the Deadlock Detection Coordinator (DDC) for the entire distributed system. The DDC is responsible for constructing the GWFG for the system. Each lock manager in the distributed database transmits its LWFG to the DDC periodically. The DDC constructs the GWFG from these LWFGs and checks for cycles in it. A global deadlock situation is detected if there are one or more cycles in the GWFG. The DDC must break each cycle in the GWFG by selecting the transactions to be rolled back and restarted to recover from the deadlock situation. The information regarding the transactions that are to be rolled back and restarted must be transmitted to the corresponding lock managers by the deadlock detection coordinator.
- The centralized deadlock detection approach is very simple, but it has several drawbacks.
- This method is less reliable, as the failure of the central site makes deadlock detection impossible.
- The communication cost is very high in this case, as the other sites in the distributed system must send their LWFGs to the central site.
- Another disadvantage of the centralized deadlock detection technique is that false detection of deadlocks can occur, for which the deadlock recovery procedure may be initiated although no deadlock actually exists. In this method, unnecessary rollbacks and restarts of transactions may also result owing to phantom deadlocks.
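A minimal sketch of the coordinator's role, under the assumption that each site periodically reports its LWFG as a plain adjacency map: the DDC merges the reported edges into a GWFG and checks it for cycles. Victim selection is shown here as rolling back the youngest transaction in the cycle, which is one common policy rather than a requirement of the method; the graphs and timestamps below are made up.

def merge_lwfgs(lwfgs):
    """Union the edges of the reported local wait-for graphs into a GWFG."""
    gwfg = {}
    for lwfg in lwfgs:
        for node, waits_for in lwfg.items():
            gwfg.setdefault(node, set()).update(waits_for)
    return gwfg

def find_cycle(graph):
    """Return some cycle of the graph as a list of nodes, or None."""
    def dfs(node, path):
        if node in path:
            return path[path.index(node):]
        if node in seen:
            return None
        seen.add(node)
        for nxt in graph.get(node, ()):
            cycle = dfs(nxt, path + [node])
            if cycle:
                return cycle
        return None
    seen = set()
    for node in graph:
        cycle = dfs(node, [])
        if cycle:
            return cycle
    return None

# Hypothetical LWFGs reported by two sites; together they contain a global cycle.
lwfg_site1 = {"T1": {"T2"}}              # at site 1, T1 waits for T2
lwfg_site2 = {"T2": {"T1"}}              # at site 2, T2 waits for T1
gwfg = merge_lwfgs([lwfg_site1, lwfg_site2])
cycle = find_cycle(gwfg)
if cycle:
    timestamps = {"T1": 3, "T2": 8}      # assumed timestamps (larger = younger)
    victim = max(cycle, key=timestamps.get)
    print("global deadlock", cycle, "- roll back and restart", victim)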