Lecture4-Distribution_Design_Replica Allocation

Distributed Design with Replication


Principles of Distributed Database Systems
M. Tamer Özsu
Patrick Valduriez

© 2020, M.T. Özsu & P. Valduriez 1


Outline
◼ Distributed and Parallel Database Design
❑ Fragmentation
❑ Data distribution
❑ Combined approaches



Distributed DB Design

◼ Distributed DB design refers to how the database should be split (fragmented) and allocated to sites so as to optimize a certain objective function.

◼ There are two issues:
❑ Data fragmentation, which determines how the data should be fragmented;
❑ Data allocation, which determines how the fragments should be allocated.

◼ These two issues have traditionally been studied independently, giving rise to a two-phase approach to the design problem.


Outline
◼ Distributed and Parallel Database Design
❑ Fragmentation



General Goals of DB Design

◼ To provide high performance
◼ To provide reliability
◼ To provide functionality
◼ To fit into the existing environment
◼ To provide cost-saving solutions


Distributed Database Design: Key Issues

I. Fragmentation
A relation may be divided into a number of sub-relations, which are then distributed among network sites.
II. Allocation
Each fragment is stored at the site that gives an “optimal” distribution.
III. Replication
A copy of a fragment may be maintained at several sites, increasing availability and performance.
IV. Location Transparency
Users can access data without knowing, or being concerned with, the site at which the data resides. The location is hidden.

Fragmentation

◼ Can't we just distribute relations?


◼ What is a reasonable unit of distribution?
❑ relation
◼ views are subsets of relations ➔ locality
◼ extra communication
❑ fragments of relations (sub-relations)
◼ concurrent execution of a number of transactions that access
different portions of a relation
◼ views that cannot be defined on a single fragment will require extra
processing
◼ semantic data control (especially integrity enforcement) more
difficult



Example Database



Distributed Data Design: Fragmentation

I. Data Fragmentation

❑ Split a relation into logically related parts. A relation can be fragmented in three ways:
◼ Horizontal Fragmentation
◼ Vertical Fragmentation
◼ Mixed (Hybrid) Fragmentation

Distributed Data Design: Horizontal Fragmentation

❑ A horizontal fragment is a subset of the tuples (rows) of a relation that satisfy a selection condition.
❑ Consider the Employee relation with the selection condition (DNO = 5). All tuples that satisfy this condition form a subset that is a horizontal fragment of the Employee relation.
❑ To reconstruct R from its horizontal fragments, a UNION is applied.

Distributed Data Design: Horizontal Fragmentation

[Figure: the rows of the sample table below are distributed across Server 1, Server 2, Server 3, and Server 4.]

6/1/07    424252   Couch     $570
6/1/07    256623   Car       $1123
6/2/07    636353   Bike      $86
6/5/07    662113   Chair     $10
6/7/07    121113   Lamp      $19
6/9/07    887734   Bike      $56
6/11/07   252111   Scooter   $18
6/11/07   116458   Hammer    $8000

Distributed Data Design: Horizontal Fragmentation

◼ Horizontal Fragmentation Example:

P1 = σ Dno=‘5’ (Employee) → Site 1
P2 = σ Dno=‘7’ (Employee) → Site 2

To reconstruct the Employee relation:

P1 ∪ P2
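The selection and union steps of this example can be sketched in a few lines of Python. This is only an illustration: the employee names and the tuple layout (name, Dno) are made up, and only the department numbers 5 and 7 come from the slide.

```python
# Hypothetical Employee tuples as (name, dno); the values are invented.
employee = [
    ("Ali", "5"),
    ("Mona", "7"),
    ("Omar", "5"),
    ("Sara", "7"),
]


def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


p1 = select(employee, lambda t: t[1] == "5")  # fragment stored at Site 1
p2 = select(employee, lambda t: t[1] == "7")  # fragment stored at Site 2

# Reconstruction: the union of the fragments gives back Employee.
reconstructed = set(p1) | set(p2)
```

The union restores Employee here only because every sample tuple has Dno 5 or 7; a tuple from any other department would be lost unless a further fragment covered it.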

Fragmentation Alternatives – Horizontal

PROJ1: projects with budgets less than $200,000
PROJ2: projects with budgets greater than or equal to $200,000


Distributed Data Design: Fragmentation
Vertical Fragmentation
❑ A vertical fragment is a subset of a relation created by choosing a subset of its columns. No selection condition is used in vertical fragmentation.
❑ Consider the Employee relation. A vertical fragment can be created by projecting (π) the values of Name, DoB, and Address.
❑ Because no condition is used to create a vertical fragment, each fragment must include the primary key attribute of the parent relation Employee. In this way all vertical fragments of a relation are connected.
❑ To reconstruct R from complete vertical fragments, an OUTER JOIN on the primary key is applied.

Distributed Data Design: Fragmentation

◼ Vertical Fragmentation

[Figure: the columns of the sample table below are split across servers (labelled Server 2, Server 3, and Server 4 in the figure); the key in the first column is repeated with every group of columns.]

424252   6/1/07    Couch     $570
256623   6/1/07    Car       $1123
636353   6/2/07    Bike      $86
662113   6/5/07    Chair     $10
121113   6/7/07    Lamp      $19
887734   6/9/07    Bike      $56
252111   6/11/07   Scooter   $18
116458   6/11/07   Hammer    $8000

Distributed Data Design: Fragmentation

◼ Vertical Fragmentation Example:

S1 = π staffNo, position, DOB, salary (Staff)
S2 = π staffNo, fName, lName, branchNo (Staff)

To reconstruct the Staff relation:

S1 ⋈ S2
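A minimal Python sketch of this example follows. The attribute names are the ones on the slide; the two Staff rows and the helper names `project` and `join_on_key` are assumptions made for illustration.

```python
# Hypothetical Staff rows; only the attribute names come from the slide.
staff = [
    {"staffNo": "SL21", "fName": "John", "lName": "White",
     "position": "Manager", "DOB": "1965-10-01", "salary": 30000,
     "branchNo": "B005"},
    {"staffNo": "SG37", "fName": "Ann", "lName": "Beech",
     "position": "Assistant", "DOB": "1980-11-10", "salary": 12000,
     "branchNo": "B003"},
]


def project(relation, attrs):
    """Relational projection: keep only the listed attributes."""
    return [{a: row[a] for a in attrs} for row in relation]


# Both vertical fragments repeat the primary key staffNo.
s1 = project(staff, ["staffNo", "position", "DOB", "salary"])
s2 = project(staff, ["staffNo", "fName", "lName", "branchNo"])


def join_on_key(left, right, key):
    """Join two fragments on their shared key attribute."""
    right_by_key = {row[key]: row for row in right}
    return [{**l, **right_by_key[l[key]]}
            for l in left if l[key] in right_by_key]


rebuilt = join_on_key(s1, s2, "staffNo")
```

Because every fragment carries every staffNo value, the join on the key loses no rows, which is why the key must be repeated in each fragment.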

Fragmentation Alternatives – Vertical

PROJ1: information about project budgets
PROJ2: information about project names and locations


Distributed Data Design: Fragmentation

◼ Hybrid Fragmentation:

In hybrid fragmentation, a combination of horizontal and vertical fragmentation techniques is used. This is the most flexible fragmentation technique, since it generates fragments with minimal extraneous information. However, reconstructing the original table is often expensive.

Distributed Data Design: Hybrid Fragmentation

◼ The first method is to create a set of horizontal fragments and then create vertical fragments from one or more of the horizontal fragments.

◼ The second method is to first create a set of vertical fragments and then create horizontal fragments from one or more of the vertical fragments.

The original relation can be obtained by a combination of join and union operations.
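The first method (horizontal first, then vertical) and its reconstruction can be sketched as follows. The EMP rows, the attribute names, and the choice to split only the Design fragment vertically are all assumptions made for this illustration.

```python
# Hypothetical EMP rows; eno is assumed to be the primary key.
emp = [
    {"eno": 1, "name": "Ali", "dept": "Design", "salary": 100},
    {"eno": 2, "name": "Mona", "dept": "Sales", "salary": 120},
    {"eno": 3, "name": "Omar", "dept": "Design", "salary": 90},
]

# Step 1: horizontal fragments by department.
design = [r for r in emp if r["dept"] == "Design"]
others = [r for r in emp if r["dept"] != "Design"]

# Step 2: vertical fragments of the Design fragment; both keep the key eno.
design_pay = [{"eno": r["eno"], "salary": r["salary"]} for r in design]
design_info = [{"eno": r["eno"], "name": r["name"], "dept": r["dept"]}
               for r in design]

# Reconstruction: join the vertical fragments on eno, then take the union
# with the horizontal fragment that was left whole.
pay_by_eno = {r["eno"]: r for r in design_pay}
design_rebuilt = [{**info, **pay_by_eno[info["eno"]]} for info in design_info]
rebuilt = design_rebuilt + others
```

The reconstruction undoes the steps in reverse order: the last fragmentation applied (vertical) is undone first with a join, and the first one (horizontal) is undone last with a union.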



Distributed Data Design: Hybrid Fragmentation

Consider this EMP table:

[Figure: the EMP table]

Apply hybrid fragmentation as:

[Figure: the hybrid fragments of EMP]


Correctness of Fragmentation
◼ Completeness
❑ The decomposition of relation R into fragments R1, R2, ..., Rn is complete if and only if each data item in R can also be found in some Ri.
◼ Reconstruction
❑ It must be possible to define a relational operation that reconstructs R from the fragments.
❑ The reconstruction operation is Union for horizontal fragmentation and Outer Join for vertical fragmentation.

◼ Disjointness
❑ If data item di appears in fragment Ri, then it should not appear in any other fragment. Exception: vertical fragmentation, where primary key attributes must be repeated to allow reconstruction.
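For horizontal fragments, the three rules can be checked mechanically. The sketch below assumes that a relation is a list of hashable tuples and that reconstruction is a union; the function names and the sample PROJ data are illustrative only.

```python
def is_complete(relation, fragments):
    """Every tuple of the relation appears in at least one fragment."""
    return set(relation) <= set().union(*fragments)


def is_disjoint(fragments):
    """No tuple appears in more than one fragment."""
    seen = set()
    for fragment in fragments:
        for t in fragment:
            if t in seen:
                return False
            seen.add(t)
    return True


def reconstructs(relation, fragments):
    """The union of the fragments equals the original relation."""
    return set().union(*fragments) == set(relation)


# Hypothetical PROJ tuples as (pno, budget).
proj = [("P1", 150000), ("P2", 135000), ("P3", 250000), ("P4", 310000)]
proj1 = [t for t in proj if t[1] < 200000]
proj2 = [t for t in proj if t[1] >= 200000]

good = [proj1, proj2]
overlapping = [proj1, proj]   # P1 and P2 would be stored twice
incomplete = [proj1]          # P3 and P4 would be lost
```

The split at $200,000 passes all three checks, while the other two candidate fragmentations each violate one rule.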



Fragmentation

◼ Horizontal Fragmentation (HF)


❑ Primary Horizontal Fragmentation (PHF)
❑ Derived Horizontal Fragmentation (DHF)

◼ Vertical Fragmentation (VF)


◼ Hybrid Fragmentation (HF)



Primary Horizontal Fragmentation

◼ What is it?
❑ Fragmenting a single table horizontally using a set of simple conditions.

◼ What do we require?
❑ A procedure to find the simple conditions needed to fragment the table.

◼ How do we find the simple conditions?
❑ Simple predicates
❑ Min-term predicates


PHF – Information Requirements

◼ Database Information
❑ relationships between relations
❑ cardinality of each relation: card(R)


PHF – Simple Predicates
◼ Given a relation R with a set of n attributes, a simple predicate p is a condition of the form:
Attribute comparison-operator Value
◼ Here,
❑ Attribute is any attribute of the relation R
❑ Comparison-operator is one of =, <, <=, >, >=, <>
❑ Value is a value from the domain of that attribute

◼ Example:
❑ REG_NO=119
❑ Gender=‘M’
❑ Grade>5


PHF – Simple Predicates
Set of simple predicates
❑ A relation is usually fragmented using several simple predicates together.

◼ Example:
❑ Employee (Emp_id, Ename, Department, Office)
❑ P = {Office=‘Alex’, Department=‘Design’} is a set of simple predicates (it includes several simple predicates).


Desirable Properties of Simple Predicates

The set of simple predicates should be complete and minimal.

Complete
No data is missing, and all tuples of a fragment have an equal probability of access by every application.

Minimal
If all the predicates of a set P are relevant, then P is minimal. That is, if a predicate causes a fragment f to be split into f1 and f2, there should be at least one application that accesses f1 and f2 differently.


Min-term Predicates

◼ Can we use simple predicates for fragmentation directly?
NO

◼ What is a min-term predicate?
It is a conjunction of the simple predicates, each taken either in its regular or in its negated form, that defines a fragment.


PHF – Information Requirements
◼ Application Information
❑ Simple predicates: Given R[A1, A2, …, An], a simple predicate pj is
pj : Ai θ Value
where θ ∈ {=, <, ≤, >, ≥, ≠}, Value ∈ Di, and Di is the domain of Ai.
For relation R we define Pr = {p1, p2, …, pm}
Example:
PNAME = "Maintenance"
BUDGET ≤ 200000
❑ Min-term predicates: Given R and Pr = {p1, p2, …, pm}, define M = {m1, m2, …, mz} as
M = { mi | mi = ⋀ (pj ∈ Pr) pj* }, 1 ≤ j ≤ m, 1 ≤ i ≤ z
where pj* = pj or pj* = ¬(pj).


PHF – Information Requirements

Example
m1: PNAME = "Maintenance" ∧ BUDGET ≤ 200000
m2: NOT(PNAME = "Maintenance") ∧ BUDGET ≤ 200000
m3: PNAME = "Maintenance" ∧ NOT(BUDGET ≤ 200000)
m4: NOT(PNAME = "Maintenance") ∧ NOT(BUDGET ≤ 200000)

Do these min-term predicates result in a valid fragmentation?
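One way to explore this question is to build the four min-term fragments and count their tuples. The sketch below uses hypothetical PROJ rows; each min-term is represented as a pattern of signs, where True means the simple predicate is used as is and False means it is negated.

```python
from itertools import product

# Hypothetical PROJ rows used only for illustration.
proj = [
    {"PNO": "P1", "PNAME": "Instrumentation", "BUDGET": 150000},
    {"PNO": "P2", "PNAME": "Database Develop.", "BUDGET": 135000},
    {"PNO": "P3", "PNAME": "CAD/CAM", "BUDGET": 250000},
    {"PNO": "P4", "PNAME": "Maintenance", "BUDGET": 310000},
]

simple_predicates = [
    lambda t: t["PNAME"] == "Maintenance",
    lambda t: t["BUDGET"] <= 200000,
]


def minterm_fragments(relation, predicates):
    """Map each sign pattern (one min-term) to the tuples that satisfy it."""
    fragments = {}
    for signs in product([True, False], repeat=len(predicates)):
        fragments[signs] = [
            t for t in relation
            if all(p(t) == sign for p, sign in zip(predicates, signs))
        ]
    return fragments


fragments = minterm_fragments(proj, simple_predicates)
sizes = {signs: len(rows) for signs, rows in fragments.items()}
```

Every tuple makes each simple predicate either true or false, so it satisfies exactly one min-term. The min-term fragments are therefore disjoint and together cover the relation, although some of them (m1 in this sample data) may be empty.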



PHF – Information Requirements

◼ Application Information
❑ Min-term selectivities: sel(mi)
◼ The number of tuples of the relation that would be accessed by a user query specified according to a given min-term predicate mi.
❑ Access frequencies: acc(qi)
◼ The frequency with which a user application qi accesses data.
◼ The access frequency for a min-term predicate can also be defined.


Primary Horizontal Fragmentation

Definition:
Rj = σFj(R), 1 ≤ j ≤ w
where Fj is a selection formula, which is (preferably) a min-term predicate.
Therefore,
A horizontal fragment Ri of relation R consists of all the tuples of R that satisfy a min-term predicate mi.

Given a set of min-term predicates M, there are as many horizontal fragments of relation R as there are min-term predicates.
The set of horizontal fragments is also referred to as the set of min-term fragments.


PHF – Algorithm

Given: A relation R and the set of simple predicates Pr
Output: The set of fragments of R = {R1, R2, …, Rw} that obey the fragmentation rules.

Preliminaries:
❑ Pr should be complete
❑ Pr should be minimal


Completeness of Simple Predicates

◼ A set of simple predicates Pr is said to be complete if and only if any two tuples of the same min-term fragment defined on Pr have the same probability of being accessed by any application.

◼ Example:
❑ Assume PROJ[PNO, PNAME, BUDGET, LOC] has two applications defined on it.
❑ Find the budgets of projects at each location. (1)
❑ Find projects with budgets less than or equal to $200000. (2)


Completeness of Simple Predicates

According to (1),
Pr = {LOC=“Montreal”, LOC=“New York”, LOC=“Paris”}
which is not complete with respect to (2).

Modify Pr to
Pr = {LOC=“Montreal”, LOC=“New York”, LOC=“Paris”, BUDGET≤200000, BUDGET>200000}
which is complete.
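Five simple predicates give 2^5 = 32 candidate min-terms, but most of them are contradictory (a project cannot be located in both Montreal and Paris). The sketch below counts the min-terms that some tuple can actually satisfy, under the assumption that LOC takes only the three listed values; the two budget values are arbitrary representatives of each side of the threshold.

```python
from itertools import product

predicates = {
    "LOC=Montreal": lambda t: t["LOC"] == "Montreal",
    "LOC=New York": lambda t: t["LOC"] == "New York",
    "LOC=Paris": lambda t: t["LOC"] == "Paris",
    "BUDGET<=200000": lambda t: t["BUDGET"] <= 200000,
    "BUDGET>200000": lambda t: t["BUDGET"] > 200000,
}

# One representative tuple per (location, budget side) combination.
representatives = [
    {"LOC": loc, "BUDGET": budget}
    for loc, budget in product(["Montreal", "New York", "Paris"],
                               [200000, 200001])
]


def signature(t):
    """Sign pattern of the single min-term that tuple t satisfies."""
    return tuple(p(t) for p in predicates.values())


satisfiable = {signature(t) for t in representatives}
total = 2 ** len(predicates)
```

Under these assumptions only 6 of the 32 min-terms are satisfiable, one per combination of location and budget range, so the relation would be split into at most six non-empty fragments.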



Primary Horizontal Fragmentation
Example



Derived Horizontal Fragmentation
◼ The process of creating horizontal fragments of a table based on the already created horizontal fragments of another relation (for example, a base table) is called Derived Horizontal Fragmentation.


Derived Horizontal Fragmentation

Semijoin
A semijoin is a join operation whose result has the structure (attributes) of one table and contains only those of its records that match records of another table.


Schema Example



Derived Horizontal Fragmentation
Example
◼ Let us assume that the owner relation Student is horizontally fragmented as follows:

[Figure: the horizontal fragments of Student]

Derived Horizontal Fragmentation
Example
◼ The member relation Grade_Detail should be fragmented into 4 fragments using the fragments of the owner relation Student.
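A minimal sketch of derived horizontal fragmentation with a semijoin follows. The rows, the attribute names (`reg_no`, `dept`, `course`, `grade`), and the use of two owner fragments instead of four are assumptions made to keep the illustration short.

```python
# Hypothetical owner (Student) and member (Grade_Detail) relations.
student = [
    {"reg_no": 1, "name": "Ali", "dept": "CS"},
    {"reg_no": 2, "name": "Mona", "dept": "IT"},
    {"reg_no": 3, "name": "Omar", "dept": "CS"},
]
grade_detail = [
    {"reg_no": 1, "course": "DB", "grade": 8},
    {"reg_no": 2, "course": "DB", "grade": 7},
    {"reg_no": 3, "course": "OS", "grade": 9},
    {"reg_no": 1, "course": "OS", "grade": 6},
]


def semijoin(member, owner_fragment, attr):
    """Keep the member rows whose join attribute matches some owner row."""
    keys = {row[attr] for row in owner_fragment}
    return [row for row in member if row[attr] in keys]


# Primary horizontal fragments of the owner relation.
student_cs = [s for s in student if s["dept"] == "CS"]
student_it = [s for s in student if s["dept"] == "IT"]

# Derived fragments of the member relation, one per owner fragment.
grade_cs = semijoin(grade_detail, student_cs, "reg_no")
grade_it = semijoin(grade_detail, student_it, "reg_no")
```

Each derived fragment keeps only Grade_Detail attributes, and it can be stored at the same site as the Student fragment it was derived from, so that joins between the two stay local.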



DHF – Correctness
◼ Completeness
❑ Referential integrity
❑ Let R be the member relation of a link whose owner is relation S
which is fragmented as FS = {S1, S2, ..., Sn}. Furthermore, let A be
the join attribute between R and S. Then, for each tuple t of R,
there should be a tuple t' of S such that
t[A] = t' [A]
◼ Reconstruction
❑ Same as primary horizontal fragmentation.
◼ Disjointness
❑ Simple join graphs between the owner and the member
fragments.



Vertical Fragmentation

◼ Has been studied within the centralized context
❑ design methodology
❑ physical clustering
◼ More difficult than horizontal, because more alternatives exist.
◼ Two approaches:
❑ grouping
◼ attributes to fragments
❑ splitting
◼ relation to fragments


Data Replication and Allocation

▪ Replication is useful in improving the availability of data.
▪ The most extreme case is replication of the whole DB at every site in the distributed system, thus creating a fully replicated distributed DB.
▪ The other extreme is to have no replication at all.


Data Replication and Allocation

▪ Between these two extremes, we have a wide spectrum of partial replication of the data; that is, some fragments of the database may be replicated whereas others may not.
▪ The number of copies of each fragment can range from one up to the total number of sites in the distributed system (1:N).
▪ A description of the replication of fragments is sometimes called a replication schema.


Data Replication and Allocation

◼ Each fragment must be assigned to a particular site in the distributed system.
◼ This process is called data distribution (or data allocation).
◼ The choice of sites and the degree of replication depend on the performance and availability goals of the system and on the types and frequencies of transactions submitted at each site.


Data Replication and Allocation
◼ Non-replicated
❑ partitioned: each fragment resides at only one site
◼ Replicated
❑ fully replicated: each fragment at each site
❑ partially replicated: each fragment at some of the sites

▪ Synchronous and asynchronous replication

◼ Rule of thumb:
If (read-only queries)/(update queries) >= 1, data replication is advantageous.
If (read-only queries)/(update queries) < 1, data replication may cause problems.
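The rule of thumb can be written as a small function. The function name and the treatment of a workload with no updates at all (treated as favourable to replication) are assumptions of this sketch.

```python
def replication_advice(read_only_queries, update_queries):
    """Apply the read/update ratio rule of thumb to query counts."""
    if update_queries == 0:
        # No updates to propagate: assumed here to favour replication.
        return "advantageous"
    ratio = read_only_queries / update_queries
    return "advantageous" if ratio >= 1 else "may cause problems"


mostly_reads = replication_advice(900, 100)
mostly_updates = replication_advice(100, 900)
```

The intuition is that every extra copy makes reads cheaper, because a nearby copy can answer them, but makes updates more expensive, because every copy has to be kept consistent.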



Comparison of Replication Alternatives
◼ Full replication
Keeping a copy of the same database at multiple locations/sites is referred to as full replication.
◼ Partial replication
The database is fragmented, and some of the fragments are replicated (multiple copies of the same fragment) and maintained at several locations/sites. This kind of distribution is called partial replication.
◼ Partitioning
A non-replicated database is called a partitioned database. That is, a table is fragmented, and each fragment is stored at a different location.


Data Replication and Allocation
◼ If high availability is required, transactions can be submitted at any site, and most transactions are retrieval only, then a fully replicated DB is a good choice.
◼ If certain transactions that access particular parts of the DB are mostly submitted at a particular site, the corresponding set of fragments can be allocated at that site only.
◼ Data that is accessed at multiple sites can be replicated at those sites.
◼ If many updates are performed, it may be useful to limit replication.
Finding an optimal or even a good solution to distributed data allocation is a complex optimization problem.
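To make these guidelines concrete, here is a deliberately simple greedy heuristic for a single fragment. It is not taken from the lecture and makes no claim to be optimal: a site receives a copy when the reads it issues outnumber the updates arriving from all other sites. The function name, the input format, and the fallback rule are assumptions.

```python
def choose_sites(reads, updates):
    """Pick the sites that should hold a copy of one fragment.

    reads[site]   = number of retrievals issued at that site
    updates[site] = number of updates issued at that site
    """
    total_updates = sum(updates.values())
    chosen = [
        site for site in reads
        # Updates from other sites must be propagated to this copy.
        if reads[site] > total_updates - updates.get(site, 0)
    ]
    if not chosen:
        # Always keep one copy, at the site with the most accesses overall.
        chosen = [max(reads, key=lambda s: reads[s] + updates.get(s, 0))]
    return sorted(chosen)


read_heavy = choose_sites({"A": 100, "B": 80, "C": 5},
                          {"A": 10, "B": 0, "C": 10})
update_heavy = choose_sites({"A": 5, "B": 3}, {"A": 50, "B": 40})
```

In the read-heavy case the fragment is replicated at sites A and B but not at C, while in the update-heavy case only one copy is kept, which matches the advice to limit replication when many updates are performed.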

Allocation
◼ It is the process of deciding where exactly the data is to be stored.
◼ It involves deciding which data has to be stored at which location.
❑ Centralized data allocation (the entire DB is stored at one site)
❑ Partitioned data allocation (the database is divided into several disjoint parts (fragments) that are stored at several sites)
❑ Replicated data allocation (copies of one or more DB fragments are stored at several sites)

◼ Data distribution over a computer network is achieved through data partitioning, data replication, or a combination of both.

Outline
◼ Distributed and Parallel Database Design

❑ Combined approaches



Combining Fragmentation & Allocation

Partition the data to dictate where it is located

◼ Workload-agnostic techniques
❑ Round-robin partitioning
❑ Hash partitioning
❑ Range partitioning
◼ Workload-aware techniques
❑ Graph-based approach


Round-robin Partitioning

◼ Data is distributed evenly by Informatica among all partitions.
◼ This partitioning is used when the number of rows to process in each partition should be approximately the same.




Round-robin Partitioning

Four partitions are created.
Note: The table must not have primary keys.
The number of partitions is determined by the database at runtime according to its configuration.
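The idea behind round-robin partitioning can be sketched independently of any product: row i simply goes to partition i mod n. The function name and the sample rows below are illustrative assumptions, not the API of Informatica or of any database.

```python
def round_robin_partition(rows, num_partitions):
    """Assign row i to partition i mod num_partitions."""
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions


rows = list(range(10))  # ten hypothetical rows
parts = round_robin_partition(rows, 4)
partition_sizes = [len(p) for p in parts]
```

Partition sizes differ by at most one row, which gives an even load. However, the placement of a row says nothing about its content, so a lookup by value has to search every partition.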



Hash Partitioning
◼ The Informatica server applies a hash function to the partitioning keys to group data among partitions.
◼ It is used when groups of rows with the same partitioning key must be processed in the same partition.
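A generic sketch of hash partitioning follows; it does not reproduce Informatica's implementation. `zlib.crc32` is used as the hash function only because it is deterministic across runs, whereas Python's built-in `hash()` for strings is randomized per process. The function name and sample rows are assumptions.

```python
import zlib


def hash_partition(rows, key, num_partitions):
    """Place each row in the partition chosen by hashing its key value."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        partitions[digest % num_partitions].append(row)
    return partitions


# Hypothetical rows partitioned on the "dept" key.
rows = [
    {"emp": "Ali", "dept": "Design"},
    {"emp": "Mona", "dept": "Sales"},
    {"emp": "Omar", "dept": "Design"},
    {"emp": "Sara", "dept": "Support"},
    {"emp": "Huda", "dept": "Sales"},
]
parts = hash_partition(rows, "dept", 3)
```

Rows with equal key values always hash to the same partition, which is the property the slide asks for. How evenly the rows spread depends on the key distribution and the hash function.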



Range Partitioning
◼ Range partitioning creates dedicated partitions for certain values or value ranges in a table.
◼ The range partitioning specification usually takes ranges of values to determine one partition (the integers 1 to 10, for example), but it is also possible to define a partition for a single value.
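Range partitioning can be sketched with a sorted list of inclusive upper bounds and a binary search. The function name, the bounds, and the extra catch-all partition for values above the last bound are assumptions of this illustration.

```python
from bisect import bisect_left


def range_partition(values, upper_bounds):
    """Partition values by sorted, inclusive upper bounds.

    Partition k holds the values v with upper_bounds[k-1] < v <= upper_bounds[k];
    one extra partition at the end catches everything above the last bound.
    """
    partitions = [[] for _ in range(len(upper_bounds) + 1)]
    for v in values:
        partitions[bisect_left(upper_bounds, v)].append(v)
    return partitions


# Ranges 1..10 and 11..20, plus a catch-all partition for larger values.
by_range = range_partition([1, 10, 11, 20, 21, 35], [10, 20])

# With integer keys, adjacent bounds 10 and 11 dedicate a partition to 11.
single_value = range_partition([9, 11, 11, 12], [10, 11])
```

Unlike round-robin or hash partitioning, range partitioning keeps neighbouring values together, which helps range queries but can produce unbalanced partitions when the values are skewed.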



Comparison between Partitioning Techniques


Questions?