DDBMS Design

Distributed Database design is discussed here.

Uploaded by

debjit7864
Copyright
© All Rights Reserved

DDBMS Design Principle

Design of a centralized database (CD) involves the following:

• Design the conceptual schema (describing all the data used by the database
applications)
• Design the physical database (mapping the conceptual schema to storage areas
and determining appropriate access methods)

Design of a distributed database (DD):

The two steps mentioned above are carried out by
• Designing the global schema
• Designing the local physical databases at each site
Additional steps for distributing the database
• Design fragmentation (logical concept)
• Design allocation (physical concept) of fragments, including the replication decision

Points to remember
• Although application programs are designed after the schema, knowledge of
application requirements influences schema design, since schemata must be able
to support applications efficiently.

• The site from which an application is issued is called the site of origin of the
application.
Objectives of data distribution design

• Processing locality – Place data as close as possible to the applications that use
them.
• Availability and reliability – A high degree of availability for read-only
applications is achieved by storing multiple copies of the same information.
Reliability is achieved in the same way, since a crash of one copy can be
recovered from by using another copy.

• Workload distribution – Workload is distributed in order to:

o Take advantage of the specialized powers/utilities of the computers at each
site
o Maximize the degree of parallelism
As workload distribution and processing locality are conflicting requirements,
there must be a trade-off between them.
• Storage cost and availability – Database distribution should reflect the cost and
availability of storage at the different sites. The cost of data storage is usually not
significant compared with the CPU, I/O and transmission costs of applications, but
the limit on available storage at each site must be considered.
Satisfying all these objectives together leads to complex optimization models.
Designer’s options:

• Option 1 -- Treat some of the above objectives as constraints rather than
objectives.
• Option 2 -- Consider the most important criterion in the initial design and
introduce the other criteria in post-optimization.

Design of data distribution: Top-Down vs. Bottom-up approaches

Top-down design:
• Design the global schema
• Fragment the database
• Allocate the fragments to the sites
• Create the physical images

This approach is suitable for systems developed from scratch, since it allows
the design to be carried out rationally.

When the distributed database is developed as an aggregation of existing databases,
the bottom-up approach is normally followed.

Bottom-up design:
• Select a common database model for describing the global schema of the database
• Translate each local schema into the common data model
• Integrate all the local schemata into a common global schema

Integration means merging the common data definitions and resolving
conflicts among the different representations given to the same data.

Example:
Consider a distributed database for a company in West Bengal with three sites: North
Bengal (NB, site 1), Kolkata (site 2) and South Bengal (SB, site 3). Kolkata is located
about halfway between NB and SB. There are 30 departments, physically grouped as
follows: departments 1 to 10 are close to NB, departments 11 to 20 are close to Kolkata,
and departments over 20 are close to SB.

All suppliers of the company are either in the city of NB or in the city of SB. Moreover,
NB is in area ‘North’ and SB is in area ‘South’. Kolkata falls on the border, so some of
its departments are in North and some are in South.

Let us design the fragmentation of the following two tables:


SUPPLIER (snum,name,city)
DEPT (deptnum,name,area,mgrnum)

Consider the following application:

• Retrieve the name of the supplier with a given number snum. The pseudocode is as
follows:
select name
from SUPPLIER
where snum = $X;

The supplier referenced most likely depends on the site at which the query is issued:

Query issued at site 1: city = NB
Query issued at site 2: city = NB or SB
Query issued at site 3: city = SB

Possible predicates in this application domain:


P1: city = “NB”
P2: city = “SB”
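
As a minimal sketch of how P1 and P2 split SUPPLIER horizontally, the Python below applies each predicate to a small set of sample rows. The rows and the helper names (horizontal_fragments, is_complete_and_disjoint) are hypothetical and only illustrate the idea; they are not part of the original example.

```python
# Hypothetical sample rows for SUPPLIER(snum, name, city); not from the source.
SUPPLIER = [
    {"snum": 1, "name": "A", "city": "NB"},
    {"snum": 2, "name": "B", "city": "SB"},
    {"snum": 3, "name": "C", "city": "NB"},
]

# Predicates P1 and P2 from the text.
PREDICATES = {
    "P1": lambda t: t["city"] == "NB",
    "P2": lambda t: t["city"] == "SB",
}


def horizontal_fragments(relation, predicates):
    """Return {predicate name: list of tuples that satisfy the predicate}."""
    return {
        name: [t for t in relation if pred(t)]
        for name, pred in predicates.items()
    }


def is_complete_and_disjoint(relation, predicates):
    """True if every tuple satisfies exactly one predicate."""
    return all(
        sum(1 for pred in predicates.values() if pred(t)) == 1
        for t in relation
    )


fragments = horizontal_fragments(SUPPLIER, PREDICATES)
```

Because every supplier is in either NB or SB, each tuple falls into exactly one fragment, which is what is_complete_and_disjoint checks.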
Consider the following applications:

• Requests for administrative information about departments in area = North are
issued at site 1, and those for departments in area = South are issued at site 3.
• Requests for regular information about the work conducted at each department may
be issued at any department.
Possible predicates in this application domain:
P1: deptnum ≤ 10
P2: 10 < deptnum ≤ 20
P3: deptnum > 20
P4: area = “North”
P5: area = “South”
There are a number of combinations between the elements of the two sets {P1, P2, P3}
and {P4, P5} which are not valid.
Example: P3 AND P4 is invalid; only four combinations are valid, as shown below:

Y1: deptnum ≤ 10
Y2: (10 < deptnum ≤ 20) AND (area = “North”)
Y3: (10 < deptnum ≤ 20) AND (area = “South”)
Y4: deptnum > 20

        P4: area = “North”    P5: area = “South”
P1      Y1                    False
P2      Y2                    Y3
P3      False                 Y4

Table: Fragmentation of relation DEPT
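
The table can be reproduced by enumerating every pairing of a range predicate with an area predicate and keeping only the pairings that some department can satisfy. The sketch below does this in Python. The function possible_areas is an assumed encoding of the example's geography (departments 1 to 10 in North, 11 to 20 in either area, over 20 in South), and all names are illustrative.

```python
from itertools import product

# Range and area predicates from the text.
RANGE_PREDICATES = {
    "P1": lambda d: d <= 10,
    "P2": lambda d: 10 < d <= 20,
    "P3": lambda d: d > 20,
}
AREA_PREDICATES = {
    "P4": lambda a: a == "North",
    "P5": lambda a: a == "South",
}


def possible_areas(deptnum):
    """Areas a department may lie in (assumed encoding of the example)."""
    if deptnum <= 10:
        return {"North"}
    if deptnum <= 20:
        return {"North", "South"}
    return {"South"}


def valid_combinations(deptnums=range(1, 31)):
    """Return the (range, area) predicate pairs that some department satisfies."""
    valid = []
    pairs = product(RANGE_PREDICATES.items(), AREA_PREDICATES.items())
    for (range_name, range_pred), (area_name, area_pred) in pairs:
        satisfiable = any(
            range_pred(d) and area_pred(a)
            for d in deptnums
            for a in possible_areas(d)
        )
        if satisfiable:
            valid.append((range_name, area_name))
    return valid
```

Under these assumptions, valid_combinations() keeps (P1, P4), (P2, P4), (P2, P5) and (P3, P5), which correspond to Y1, Y2, Y3 and Y4.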

Allocation of fragments:
• The fragments corresponding to Y1 and Y4 can easily be allocated at sites 1 and 3
respectively.
• The allocation of fragments Y2 and Y3 requires a trade-off between two conflicting
requirements:
o Administrative applications would like the fragments to be allocated at
sites 1 and 3 respectively.
o Regular applications would like the fragments to be allocated at site 2.

Note: In this example, fragments Y2 and Y3 are appropriate units for the allocation
problem.

A distributed join is a join between horizontally fragmented relations. Computing R JN S
means that all the tuples of R and S need to be compared, which in turn means comparing
all the fragments Ri with all the fragments Sj. For some applications, however, some of
the partial joins Ri JN Sj are intrinsically empty, because the values of the join
attribute in Ri and Sj are disjoint.

Figure: (a) Join graph (b) Partitioned join graph (c) Simple join graph

A distributed join can be represented by a join graph. It is defined as a graph (N, E)
where the nodes N represent fragments of R and S, and the non-directed edges E represent
joins between fragments which are not intrinsically empty.

• Total join graph – The graph contains all possible edges between fragments of R and S

• Reduced join graph – Some of the edges between fragments of R and fragments
of S are missing

o Partitioned – the graph is composed of two or more subgraphs without edges
between them (Fig. b)
o Simple – it is partitioned and each subgraph has just one edge (Fig. c)
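
The classification above can be sketched in Python as follows. The function name classify_join_graph and the fragment labels are hypothetical. The sketch assumes the fragments of R and S have distinct names, and it treats a fragment with no edges as a subgraph of its own, so such a graph counts as partitioned but not simple.

```python
def classify_join_graph(r_frags, s_frags, edges):
    """Classify a join graph as 'total', 'reduced', 'partitioned' or 'simple'.

    edges is an iterable of (Ri, Sj) pairs whose partial join is not
    intrinsically empty.
    """
    edge_set = set(edges)
    if edge_set == {(r, s) for r in r_frags for s in s_frags}:
        return "total"

    # Build an undirected adjacency map over all fragments.
    adjacency = {node: set() for node in list(r_frags) + list(s_frags)}
    for r, s in edge_set:
        adjacency[r].add(s)
        adjacency[s].add(r)

    # Collect connected components with a depth-first search.
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)

    if len(components) < 2:
        return "reduced"  # edges are missing, but the graph is still connected

    # Each edge has exactly one R endpoint, so count edges by that endpoint.
    edges_per_component = [
        sum(1 for r, _ in edge_set if r in component)
        for component in components
    ]
    if all(count == 1 for count in edges_per_component):
        return "simple"
    return "partitioned"
```

For instance, with fragments R1, R2 and S1, S2, the edges (R1, S1) and (R2, S2) give a simple join graph, since each of the two subgraphs has a single edge.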

Allocation of fragments
• Decide whether to use non-redundant or redundant allocation.
• Non-redundant – The best-fit approach
o A measure is associated with each possible allocation
o The site with the highest measure is selected

• Redundant – All beneficial sites approach
o Determine the set of all sites where the benefit of allocating one copy of
the fragment is higher than the update cost
o Allocate a copy of the fragment to each element of this set

• Redundant – Progressive introduction of replication approach
o Determine the solution of the non-replicated problem
o Progressively introduce replication starting from the most beneficial site
o Terminate replication when no additional replication is beneficial

• Both approaches have some disadvantages


o In the all beneficial site approach quantifying
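
The best-fit and all beneficial sites approaches described above can be sketched in Python as follows. The text does not define the measure, so this sketch assumes one: for best-fit, the total number of references (reads plus updates) issued from a site; for all beneficial sites, the reads issued from a site minus the weighted updates issued from every other site. The function names, the update_weight parameter and the access counts are all hypothetical.

```python
def best_fit(reads, updates):
    """Non-redundant allocation: pick the site with the most references.

    Assumed measure: reads + updates issued from the site.
    """
    totals = {site: reads[site] + updates[site] for site in reads}
    return max(totals, key=totals.get)


def all_beneficial_sites(reads, updates, update_weight=1.0):
    """Redundant allocation: every site where a local copy pays off.

    Assumed benefit: reads issued from the site.
    Assumed cost: updates from all other sites, scaled by update_weight,
    since each of them must also be applied to this copy.
    """
    chosen = []
    for site in reads:
        remote_updates = sum(u for s, u in updates.items() if s != site)
        if reads[site] - update_weight * remote_updates > 0:
            chosen.append(site)
    return chosen


# Hypothetical access counts for one fragment, keyed by site number.
reads = {1: 40, 2: 100, 3: 30}
updates = {1: 5, 2: 30, 3: 5}

single_site = best_fit(reads, updates)
replica_sites = all_beneficial_sites(reads, updates)
```

With these made-up counts, best-fit places the fragment at site 2, while the all beneficial sites rule keeps copies at sites 1 and 2 and rejects site 3, where the remote updates outweigh the local reads.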
