Lecture4-Distribution_Design_Replica Allocation

Distributed Design with Replication

Uploaded by

amirosama2121


Principles of Distributed Database Systems
M. Tamer Özsu
Patrick Valduriez

© 2020, M.T. Özsu & P. Valduriez 1

Outline
◼ Distributed and Parallel Database Design
❑ Fragmentation
❑ Data distribution
❑ Combined approaches


Distributed DB Design

◼ Distributed DB design refers to how the database should be split (fragmented) and allocated to sites so as to optimize a given objective function.

◼ There are two issues:

❑ Data fragmentation, which determines how the data should be fragmented;
❑ Data allocation, which determines how the fragments should be allocated.

These two issues have traditionally been studied independently, giving rise to a two-phase approach to the design problem.



General Goals of DB Design

◼ To provide high performance

◼ To provide reliability

◼ To provide functionality

◼ To fit into the existing environment

◼ To provide cost-saving solutions


Distributed Database Design: Four key issues

I. Fragmentation
A relation may be divided into a number of sub-relations, which are then distributed among network sites.
II. Allocation
Each fragment is stored at the site with “optimal” distribution.
III. Replication
A copy of a fragment may be maintained at several sites, increasing availability and performance.
IV. Location Transparency
Enables a user to access data without knowing, or being concerned with, the site at which the data resides. The location is hidden.

Fragmentation

◼ Can't we just distribute relations?

◼ What is a reasonable unit of distribution?
❑ relation
◼ views are subsets of relations ➔ locality
◼ extra communication
❑ fragments of relations (sub-relations)
◼ concurrent execution of a number of transactions that access
different portions of a relation
◼ views that cannot be defined on a single fragment will require extra
processing
◼ semantic data control (especially integrity enforcement) more
difficult


Example Database


Distributed Data Design: Fragmentation

I. Data Fragmentation

❑ Split a relation into logically related parts. A relation can be fragmented in three ways:

◼ Horizontal Fragmentation

◼ Vertical Fragmentation

◼ Mixed (Hybrid) fragmentation

Distributed Data Design:
Horizontal fragmentation

❑ A horizontal fragment is a subset of the tuples of a relation that satisfy a selection condition.

❑ Consider the Employee relation with the selection condition (DNO = 5). All tuples that satisfy this condition form a subset, which is a horizontal fragment of the Employee relation.

❑ To reconstruct R from its horizontal fragments, a UNION is applied.

Distributed Data Design:
Horizontal fragmentation

6/1/07    424252   Couch     $570
6/1/07    256623   Car       $1123
6/2/07    636353   Bike      $86
6/5/07    662113   Chair     $10
6/7/07    121113   Lamp      $19
6/9/07    887734   Bike      $56
6/11/07   252111   Scooter   $18
6/11/07   116458   Hammer    $8000

(Figure: the rows above are spread across Server 1, Server 2, Server 3, and Server 4.)

Distributed Data Design:

◼ Horizontal Fragmentation Example:

P1 = σ Dno=‘5’ (Employee) → Site 1

P2 = σ Dno=‘7’ (Employee) → Site 2

To reconstruct the Employee relation:

P1 ∪ P2
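
The selection and union above can be sketched in a few lines of Python. This is only an illustration: the rows, the attribute names `ssn`, `name`, and `dno`, and the helper functions `select` and `union` are assumptions made for the example, not part of the lecture.

```python
# Hypothetical Employee rows; only the Dno values 5 and 7 come from the example.
employee = [
    {"ssn": 1, "name": "Ann", "dno": 5},
    {"ssn": 2, "name": "Bob", "dno": 7},
    {"ssn": 3, "name": "Cid", "dno": 5},
]


def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def union(*fragments):
    """Union of fragments with duplicates removed, ordered by the key ssn."""
    by_key = {row["ssn"]: row for fragment in fragments for row in fragment}
    return [by_key[key] for key in sorted(by_key)]


p1 = select(employee, lambda r: r["dno"] == 5)  # would be stored at Site 1
p2 = select(employee, lambda r: r["dno"] == 7)  # would be stored at Site 2

reconstructed = union(p1, p2)
```

Note that P1 ∪ P2 gives back the whole relation only when every employee has Dno 5 or 7; a tuple with any other department number would belong to neither fragment.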

Fragmentation Alternatives – Horizontal

PROJ1: projects with budgets less than $200,000
PROJ2: projects with budgets greater than or equal to $200,000


Distributed Data Design: Fragmentation
Vertical Fragmentation
❑ A vertical fragment is a subset of a relation formed from a subset of its columns. No selection condition is used in vertical fragmentation.
❑ Consider the Employee relation. A vertical fragment can be created by projecting (π) the values of Name, DoB, and Address.
❑ Because there is no condition for creating a vertical fragment, each fragment must include the primary key attribute of the parent relation Employee. In this way all vertical fragments of a relation are connected.
❑ To reconstruct R from complete vertical fragments, an OUTER JOIN on the primary key is applied.

Distributed Data Design: Fragmentation

◼ Vertical Fragmentation
424252   6/1/07    Couch     $570
256623   6/1/07    Car       $1123
636353   6/2/07    Bike      $86
662113   6/5/07    Chair     $10
121113   6/7/07    Lamp      $19
887734   6/9/07    Bike      $56
252111   6/11/07   Scooter   $18
116458   6/11/07   Hammer    $8000

(Figure: the table is split by columns across Server 2, Server 3, and Server 4, with the ID column repeated.)

Distributed Data Design: Fragmentation

◼ Vertical Fragmentation Example:

S1 = π staffNo, position, DOB, salary (Staff)

S2 = π staffNo, fName, lName, branchNo (Staff)

To reconstruct the Staff relation:

S1 ⋈ S2
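
As a rough sketch of the projection and join above, the following Python uses the attribute names from the example; the two Staff rows and the helpers `project` and `join_on_key` are hypothetical, introduced only for illustration.

```python
# Hypothetical Staff rows; the attribute names follow the example above.
staff = [
    {"staffNo": "S01", "fName": "Ann", "lName": "Lee", "position": "Manager",
     "DOB": "1980-02-01", "salary": 30000, "branchNo": "B1"},
    {"staffNo": "S02", "fName": "Bob", "lName": "Kay", "position": "Assistant",
     "DOB": "1991-07-15", "salary": 12000, "branchNo": "B2"},
]


def project(relation, attributes):
    """Projection: keep only the listed attributes of every tuple."""
    return [{a: row[a] for a in attributes} for row in relation]


def join_on_key(left, right, key):
    """Join two fragments on the shared primary key attribute."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


# Both fragments keep the primary key staffNo so they can be joined again.
s1 = project(staff, ["staffNo", "position", "DOB", "salary"])
s2 = project(staff, ["staffNo", "fName", "lName", "branchNo"])

rebuilt = join_on_key(s1, s2, "staffNo")
```

Because both fragments contain every staffNo, the join loses no tuples and `rebuilt` holds the same rows as `staff`.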

10/30/2023
Fragmentation Alternatives – Vertical

PROJ1: information about project budgets
PROJ2: information about project names and locations


Distributed Data Design: Fragmentation

◼ Hybrid Fragmentation:

In hybrid fragmentation, a combination of horizontal and vertical fragmentation techniques is used. This is the most flexible fragmentation technique, since it generates fragments with minimal extraneous information. However, reconstruction of the original table is often an expensive task.

Distributed Data Design: Hybrid Fragmentation

◼ The first method is to create a set of horizontal fragments and then create vertical fragments from one or more of the horizontal fragments.

◼ The second method is to first create a set of vertical fragments and then create horizontal fragments from one or more of the vertical fragments.

The original relation can be obtained by a combination of join and union operations.
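
A minimal sketch of the first method (horizontal first, then vertical), followed by reconstruction with a join and a union. The EMP rows, the attribute names (`eno`, `name`, `dept`, `salary`), and the choice of predicate are assumptions for illustration only.

```python
# Hypothetical EMP rows, used only to illustrate the two steps.
emp = [
    {"eno": 1, "name": "Ann", "dept": "Design", "salary": 50},
    {"eno": 2, "name": "Bob", "dept": "Sales", "salary": 40},
    {"eno": 3, "name": "Cid", "dept": "Design", "salary": 60},
]

# Step 1: horizontal fragments, split on the department.
design = [r for r in emp if r["dept"] == "Design"]
others = [r for r in emp if r["dept"] != "Design"]

# Step 2: vertical fragments of the Design fragment; both keep the key eno.
design_pay = [{"eno": r["eno"], "salary": r["salary"]} for r in design]
design_info = [
    {"eno": r["eno"], "name": r["name"], "dept": r["dept"]} for r in design
]

# Reconstruction: join the vertical fragments on the key, then take the union
# with the remaining horizontal fragment.
info_by_key = {r["eno"]: r for r in design_info}
design_rebuilt = [{**info_by_key[p["eno"]], **p} for p in design_pay]
rebuilt = sorted(design_rebuilt + others, key=lambda r: r["eno"])
```

The join is applied first because the vertical fragments were created last; reconstruction undoes the fragmentation steps in reverse order.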


Distributed Data Design: Hybrid Fragmentation

Consider this EMP table,

Apply hybrid fragmentation as:


Correctness of Fragmentation
◼ Completeness
❑ Decomposition of relation R into fragments R1, R2, ..., Rn is
complete if and only if each data item in R can also be found in
some Ri
◼ Reconstruction
❑ Must be possible to define a relational operation that will reconstruct
R from the fragments.
❑ Reconstruction is by Union for horizontal fragmentation and by Outer Join for vertical fragmentation.

◼ Disjointness
❑ If data item di appears in fragment Ri, then it should not appear
in any other fragment. Exception: vertical fragmentation, where
primary key attributes must be repeated to allow reconstruction
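
The three rules can be checked mechanically for a horizontal fragmentation. The sketch below is one possible way to do so; the function names, the PROJ rows, and the use of `pno` as the key are assumptions made for the example.

```python
def is_complete(relation, fragments, key):
    """Completeness: every tuple of the relation appears in some fragment."""
    found = {row[key] for fragment in fragments for row in fragment}
    return all(row[key] in found for row in relation)


def is_disjoint(fragments, key):
    """Disjointness: no tuple appears in more than one fragment."""
    seen = set()
    for fragment in fragments:
        keys = {row[key] for row in fragment}
        if seen & keys:
            return False
        seen |= keys
    return True


def reconstructs(relation, fragments, key):
    """Reconstruction: the union of the fragments equals the relation."""
    merged = {row[key]: row for fragment in fragments for row in fragment}
    original = {row[key]: row for row in relation}
    return merged == original


# Hypothetical PROJ rows, split on the $200,000 budget boundary.
proj = [
    {"pno": "P1", "budget": 150000},
    {"pno": "P2", "budget": 200000},
    {"pno": "P3", "budget": 310000},
]
good = [
    [r for r in proj if r["budget"] < 200000],
    [r for r in proj if r["budget"] >= 200000],
]
# Overlapping predicates place P2 in both fragments.
overlapping = [
    [r for r in proj if r["budget"] <= 200000],
    [r for r in proj if r["budget"] >= 200000],
]
```

Here `good` passes all three checks, while `overlapping` is complete and reconstructible but fails the disjointness check because P2 satisfies both predicates.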


Fragmentation

◼ Horizontal Fragmentation (HF)

❑ Primary Horizontal Fragmentation (PHF)
❑ Derived Horizontal Fragmentation (DHF)

◼ Vertical Fragmentation (VF)

◼ Hybrid Fragmentation (HF)


Primary Horizontal Fragmentation

◼ What is it?
❑ It is the horizontal fragmentation of a single table using a set of simple conditions.

◼ What do we require?
❑ A procedure to find the simple conditions that are required to fragment the table.

◼ How to find the simple conditions?

❑ Simple predicates
❑ Min-term predicates


PHF – Information Requirements

◼ Database Information
❑ relationship

❑ cardinality of each relation: card(R)


PHF – Simple Predicates
◼ Given a relation R with a set of n attributes, a simple predicate p is a condition of the form:
Attribute i comparison-operator value
◼ Here,
❑ Attribute i is any attribute of the relation R
❑ The comparison operator can be one of =, <, <=, >, >=, <>
❑ Value is a permitted value from the domain of that attribute

◼ Example:
❑ REG_NO=119
❑ Gender=‘M’
❑ Grade>5


PHF – Simple Predicates
Set of simple predicates
❑ A relation is usually fragmented using multiple simple predicates collectively.

◼ Example:
❑ Employee (Emp_id, Ename, Department, Office)
❑ P = {Office=‘Alex’, Department=‘Design’} is a set of simple predicates (it contains several simple predicates)


Desirable Properties of Simple Predicates

The set of simple predicates should be complete and minimal.

Complete
No data is missing, and every tuple within a fragment has an equal probability of being accessed by every application.

Minimal
If all the predicates of a set P are relevant, then P is minimal. That is, if a predicate splits a fragment f into f1 and f2, there should be at least one application that accesses f1 and f2 differently.


Min-term Predicates

◼ Can we use simple predicates for fragmentation directly?
NO

◼ What is a min-term predicate?

It is a conjunction of the simple predicates, each taken either in its regular or in its negated form, used to define a fragment.


PHF - Information Requirements
◼ Application Information
❑ Simple predicates: Given R[A1, A2, …, An], a simple predicate pj is
pj : Ai θ Value
where θ ∈ {=, <, ≤, >, ≥, ≠}, Value ∈ Di, and Di is the domain of Ai.
For relation R we define Pr = {p1, p2, …, pm}
Example:
PNAME = "Maintenance"
BUDGET ≤ 200000
❑ Min-term predicates: Given R and Pr = {p1, p2, …, pm},
define M = {m1, m2, …, mz} as
M = { mi | mi = ∧ (pj ∈ Pr) pj* }, 1 ≤ j ≤ m, 1 ≤ i ≤ z
where pj* = pj or pj* = ¬(pj).


PHF – Information Requirements

Example
m1: PNAME = "Maintenance" ∧ BUDGET ≤ 200000

m2: NOT(PNAME = "Maintenance") ∧ BUDGET ≤ 200000

m3: PNAME = "Maintenance" ∧ NOT(BUDGET ≤ 200000)

m4: NOT(PNAME = "Maintenance") ∧ NOT(BUDGET ≤ 200000)

Do these min-term predicates result in a valid fragmentation?
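
One way to explore this question is to enumerate the min-term predicates and apply them to a small relation. The sketch below assumes a hypothetical PROJ instance and a helper named `minterms`; only the two simple predicates come from the example.

```python
from itertools import product

# The two simple predicates from the example, as (label, test function) pairs.
simple_predicates = [
    ('PNAME = "Maintenance"', lambda t: t["PNAME"] == "Maintenance"),
    ("BUDGET <= 200000", lambda t: t["BUDGET"] <= 200000),
]


def minterms(predicates):
    """Yield (label, test) for each combination of plain/negated predicates."""
    for signs in product([True, False], repeat=len(predicates)):
        label = " AND ".join(
            text if keep else f"NOT({text})"
            for (text, _), keep in zip(predicates, signs)
        )

        def test(t, signs=signs):
            # A tuple satisfies the min-term when each simple predicate
            # evaluates to the truth value this combination requires.
            return all(fn(t) == keep for (_, fn), keep in zip(predicates, signs))

        yield label, test


# Hypothetical PROJ tuples, for illustration only.
proj = [
    {"PNO": "P1", "PNAME": "Instrumentation", "BUDGET": 150000},
    {"PNO": "P2", "PNAME": "Database Develop.", "BUDGET": 135000},
    {"PNO": "P3", "PNAME": "CAD/CAM", "BUDGET": 250000},
    {"PNO": "P4", "PNAME": "Maintenance", "BUDGET": 310000},
]

fragments = {
    label: [t for t in proj if test(t)]
    for label, test in minterms(simple_predicates)
}
```

With m simple predicates there are 2^m min-term predicates, and every tuple satisfies exactly one of them, so the resulting fragments are disjoint and together cover the relation. Some fragments may be empty; in this made-up instance no Maintenance project has a budget of at most 200000.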


PHF – Information Requirements

◼ Application Information
❑ Min-term selectivities: sel(mi)
◼ The number of tuples of the relation that would be accessed by a user query which is specified according to a given min-term predicate mi.
❑ Access frequencies: acc(qi)
◼ The frequency with which a user application qi accesses data.
◼ Access frequency for a min-term predicate can also be defined.


Primary Horizontal Fragmentation

Definition :
Rj = σFj(R), 1 ≤ j ≤ w
where Fj is a selection formula, which is (preferably) a min-term predicate.
Therefore,
A horizontal fragment Ri of relation R consists of all the tuples of R that satisfy a min-term predicate mi.

Given a set of min-term predicates M, there are as many horizontal fragments of relation R as there are min-term predicates.
The set of horizontal fragments is also referred to as the set of min-term fragments.


PHF – Algorithm

Given: A relation R, the set of simple predicates Pr

Output: The set of fragments of R = {R1, R2,…,Rw} which
obey the fragmentation rules.

Preliminaries :
❑ Pr should be complete
❑ Pr should be minimal


Completeness of Simple Predicates

◼ A set of simple predicates Pr is said to be complete if and only if any two tuples belonging to the same min-term fragment defined on Pr have the same probability of being accessed by any application.

◼ Example :
❑ Assume PROJ[PNO,PNAME,BUDGET,LOC] has two
applications defined on it.
❑ Find the budgets of projects at each location. (1)
❑ Find projects with budgets less than $200000. (2)


Completeness of Simple Predicates

According to (1),
Pr={LOC=“Montreal”,LOC=“New York”,LOC=“Paris”}