Lecture4-Distribution_Design_Replica Allocation
Systems
M. Tamer Özsu
Patrick Valduriez
◼ To provide reliability
◼ To provide functionality
I. Fragmentation
A relation may be divided into a number of sub-relations (fragments), which are
then distributed among network sites.
II. Allocation
Each fragment is stored at the site with “optimal” distribution.
III. Replication
A copy of a fragment may be maintained at several sites, increasing
availability and performance.
IV. Location Transparency
Enables a user to access data without knowing, or being
concerned with, the site at which the data resides. The location is
hidden.
Fragmentation
I. Data Fragmentation
◼ Horizontal Fragmentation
◼ Vertical Fragmentation
Distributed Data Design:
Horizontal fragmentation
5). All tuples that satisfy this condition will create a subset which will
Distributed Data Design:
P1 = σ_{Dno=‘5’}(Employee) → Site 1
P2 = σ_{Dno=‘7’}(Employee) → Site 2
Reconstruction: P1 ∪ P2
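The selection-based split above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample rows and the attribute names `Eno` and `Ename` are hypothetical (only `Dno` appears on the slide), and `select` is an assumed helper name.

```python
# Hypothetical Employee tuples; only the Dno attribute comes from the slide.
employee = [
    {"Eno": 1, "Ename": "Ali", "Dno": "5"},
    {"Eno": 2, "Ename": "Mona", "Dno": "7"},
    {"Eno": 3, "Ename": "Omar", "Dno": "5"},
]


def select(relation, predicate):
    """Relational selection: keep the tuples for which predicate holds."""
    return [row for row in relation if predicate(row)]


p1 = select(employee, lambda row: row["Dno"] == "5")  # fragment for Site 1
p2 = select(employee, lambda row: row["Dno"] == "7")  # fragment for Site 2

# The fragments are disjoint, so their union is a plain concatenation.
# Sorting by the key restores the original tuple order for comparison.
reconstructed = sorted(p1 + p2, key=lambda row: row["Eno"])
print(reconstructed == employee)
```

The union gives back the whole relation here only because every sample tuple has Dno equal to 5 or 7; a tuple with any other Dno would be lost, which is why the selection conditions must cover the relation.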
Fragmentation Alternatives – Horizontal
Distributed Data Design: Fragmentation
◼ Vertical Fragmentation
424252   6/1/07    Couch     $570
256623   6/1/07    Car       $1123
636353   6/2/07    Bike      $86
662113   6/5/07    Chair     $10
121113   6/7/07    Lamp      $19
887734   6/9/07    Bike      $56
252111   6/11/07   Scooter   $18
116458   6/11/07   Hammer    $8000
Distributed Data Design: Fragmentation
S1 S2
10/30/2023
Fragmentation Alternatives – Vertical
Distributed Data Design: Hybrid Fragmentation
◼ Disjointness
❑ If data item di appears in fragment Rj, then it should not appear
in any other fragment Rk (k ≠ j). Exception: vertical fragmentation, where
primary key attributes must be repeated to allow reconstruction.
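A minimal sketch of why the key has to be repeated in vertical fragments: each fragment is a projection that keeps the key, and the original relation is recovered by joining the fragments on that key. The column names `order_id`, `date`, `item` and `price` and the helpers `project` and `join_on_key` are assumptions made for this illustration.

```python
# Sample rows; the column names are hypothetical labels for this sketch.
orders = [
    {"order_id": 424252, "date": "6/1/07", "item": "Couch", "price": 570},
    {"order_id": 256623, "date": "6/1/07", "item": "Car", "price": 1123},
    {"order_id": 636353, "date": "6/2/07", "item": "Bike", "price": 86},
]
KEY = "order_id"


def project(relation, attributes):
    """Relational projection: keep only the listed attributes."""
    return [{a: row[a] for a in attributes} for row in relation]


def join_on_key(left, right, key):
    """Join two fragments on their shared key attribute."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**l_row, **right_by_key[l_row[key]]}
        for l_row in left
        if l_row[key] in right_by_key
    ]


# Both vertical fragments repeat the key; no other attribute overlaps.
v1 = project(orders, [KEY, "date"])
v2 = project(orders, [KEY, "item", "price"])

shared = set(v1[0]) & set(v2[0])
reconstructed = join_on_key(v1, v2, KEY)
print(shared, reconstructed == orders)
```

Without the repeated key there would be no way to tell which date belongs to which item, so the join (and therefore reconstruction) would be impossible.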
◼ What is it?
❑ It fragments a single table horizontally using a set of simple
conditions (predicates).
◼ What do we require?
❑ A procedure to find the simple conditions that are required to
fragment the table.
◼ Database Information
❑ relationship
◼ Example:
❑ REG_NO=119
❑ Gender=‘M’
❑ Grade>5
◼ Example:
❑ Employee (Emp_id, Ename, Department, Office)
❑ P={office=‘Alex’, department=‘Design’} is a set of simple
predicates (it contains several simple predicates)
Complete
No data is missing, and every tuple in a fragment has an equal probability
of being accessed by every application.
Minimal
If all the predicates of a set P are relevant, then P is minimal. A predicate is
relevant if it splits a fragment f into f1 and f2 such that at least one
application accesses f1 and f2 differently.
Example
m1: PNAME="Maintenance" ∧ BUDGET≤200000
m2: NOT(PNAME="Maintenance") ∧ BUDGET≤200000
m3: PNAME="Maintenance" ∧ NOT(BUDGET≤200000)
m4: NOT(PNAME="Maintenance") ∧ NOT(BUDGET≤200000)
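Minterm predicates are every conjunction of the simple predicates in which each predicate appears either in its natural or in its negated form, so n simple predicates give 2^n minterms. The sketch below enumerates them as strings; the function name `minterms` is an assumption, and the ordering is chosen only so that the numbering lines up with the example above.

```python
from itertools import product

simple_predicates = ['PNAME="Maintenance"', "BUDGET≤200000"]


def minterms(predicates):
    """Return all conjunctions of the predicates in natural or negated form."""
    result = []
    for signs in product([True, False], repeat=len(predicates)):
        # Reverse so the first predicate toggles fastest (m1, m2, ... order).
        signs = signs[::-1]
        literals = [
            p if keep else f"NOT({p})" for p, keep in zip(predicates, signs)
        ]
        result.append(" ∧ ".join(literals))
    return result


for i, m in enumerate(minterms(simple_predicates), start=1):
    print(f"m{i}: {m}")
```

In practice some minterms are contradictory or select no tuples, so the number of useful fragments is usually smaller than 2^n.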
◼ Application Information
❑ Minterm selectivities: sel(mi)
◼ The number of tuples of the relation that would be accessed by a
user query which is specified according to a given minterm
predicate mi.
❑ Access frequencies: acc(qi)
◼ The frequency with which a user application qi accesses data.
◼ Access frequency for a minterm predicate can also be defined.
Definition:
Rj = σ_{Fj}(R), 1 ≤ j ≤ w
where Fj is a selection formula, which is (preferably) a minterm
predicate.
Therefore,
A horizontal fragment Ri of relation R consists of all the tuples of R
which satisfy a minterm predicate mi.
Given a set of minterm predicates M, there are as many horizontal
fragments of relation R as there are minterm predicates.
The set of horizontal fragments is also referred to as minterm fragments.
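As a rough sketch of the definition, each minterm predicate can be applied as a selection to produce one fragment. The PROJ rows below are illustrative sample data, not taken from the slides, and the four minterms are the ones built from the simple predicates PNAME="Maintenance" and BUDGET≤200000.

```python
# Illustrative PROJ tuples (assumed sample data).
proj = [
    {"PNO": "P1", "PNAME": "Instrumentation", "BUDGET": 150000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "Database Develop.", "BUDGET": 135000, "LOC": "New York"},
    {"PNO": "P3", "PNAME": "CAD/CAM", "BUDGET": 250000, "LOC": "New York"},
    {"PNO": "P4", "PNAME": "Maintenance", "BUDGET": 310000, "LOC": "Paris"},
]


def is_maintenance(row):
    return row["PNAME"] == "Maintenance"


def low_budget(row):
    return row["BUDGET"] <= 200000


minterm_predicates = {
    "m1": lambda r: is_maintenance(r) and low_budget(r),
    "m2": lambda r: not is_maintenance(r) and low_budget(r),
    "m3": lambda r: is_maintenance(r) and not low_budget(r),
    "m4": lambda r: not is_maintenance(r) and not low_budget(r),
}

# One fragment per minterm: the tuples that satisfy that minterm.
fragments = {
    name: [row for row in proj if pred(row)]
    for name, pred in minterm_predicates.items()
}
# Minterms that select no tuples yield no useful fragment, so drop them.
fragments = {name: rows for name, rows in fragments.items() if rows}

# Each tuple satisfies exactly one minterm: complete and disjoint.
membership = [
    sum(pred(row) for pred in minterm_predicates.values()) for row in proj
]
print(sorted(fragments), membership)
```

Because exactly one minterm holds for any tuple, minterm fragments are disjoint and together cover the relation by construction.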
Preliminaries :
❑ Pr should be complete
❑ Pr should be minimal
◼ Example :
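❑ Assume PROJ(PNO, PNAME, BUDGET, LOC) has two
applications defined on it:
❑ (1) Find the budgets of projects at each location.
❑ (2) Find projects with budgets less than $200000.
According to (1),
Pr = {LOC=“Montreal”, LOC=“New York”, LOC=“Paris”}
which is complete with respect to (1). It is not complete with respect to (2),
because (2) accesses the tuples within each location fragment differently
depending on BUDGET.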
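One way to get a feel for completeness is to check whether an application touches either all or none of the tuples in every fragment. This is a simplification of "equal probability of access", offered only as a sketch; the PROJ rows are assumed sample data and the helper names are hypothetical.

```python
from collections import defaultdict

# Illustrative PROJ tuples (assumed sample data).
proj = [
    {"PNO": "P1", "PNAME": "Instrumentation", "BUDGET": 150000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "Database Develop.", "BUDGET": 135000, "LOC": "New York"},
    {"PNO": "P3", "PNAME": "CAD/CAM", "BUDGET": 250000, "LOC": "New York"},
    {"PNO": "P4", "PNAME": "Maintenance", "BUDGET": 310000, "LOC": "Paris"},
]


def fragment_by(relation, key_fn):
    """Group tuples into fragments by the value of key_fn."""
    groups = defaultdict(list)
    for row in relation:
        groups[key_fn(row)].append(row)
    return dict(groups)


def accessed_uniformly(fragment, app_predicate):
    """True if the application selects all or none of the fragment's tuples."""
    hits = [app_predicate(row) for row in fragment]
    return all(hits) or not any(hits)


def app2(row):
    """Application (2): projects with budgets less than $200000."""
    return row["BUDGET"] < 200000


# Fragments induced by the LOC predicates alone.
by_loc = fragment_by(proj, lambda row: row["LOC"])
# Fragments after also splitting on application (2)'s budget condition.
by_loc_and_budget = fragment_by(proj, lambda row: (row["LOC"], app2(row)))

loc_only_ok = all(accessed_uniformly(f, app2) for f in by_loc.values())
refined_ok = all(accessed_uniformly(f, app2) for f in by_loc_and_budget.values())
print(loc_only_ok, refined_ok)
```

With this sample data the New York fragment mixes a low-budget and a high-budget project, so the LOC predicates alone do not serve application (2); adding a budget predicate to Pr removes the mismatch.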
Semi –join
A semi-join is a join operation whose result has the structure
(attributes) of one table and contains only those of its records that
match records of the other table.
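A small sketch of a semi-join on a single shared attribute. The tables, attribute names and the function name `semi_join` are hypothetical; the point is that the result keeps only the left table's attributes.

```python
# Hypothetical tables for illustration.
employee = [
    {"Eno": 1, "Ename": "Ali", "Dno": 5},
    {"Eno": 2, "Ename": "Mona", "Dno": 7},
    {"Eno": 3, "Ename": "Omar", "Dno": 9},
]
department = [
    {"Dno": 5, "Dname": "Research"},
    {"Dno": 7, "Dname": "Design"},
]


def semi_join(left, right, key):
    """Return the rows of left that have at least one match in right on key.

    Only attributes of left appear in the result.
    """
    right_keys = {row[key] for row in right}
    return [row for row in left if row[key] in right_keys]


matched = semi_join(employee, department, "Dno")
print(matched)
```

In a distributed setting this is attractive because only the join-attribute values of one table need to be shipped to the other site to filter it.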
◼ Rule of thumb:
If (read-only queries)/(update queries) ≥ 1, data
replication is advantageous.
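The rule of thumb can be written as a one-line check. The function name and the handling of a workload with no updates at all are assumptions of this sketch, not part of the rule as stated.

```python
def replication_advantageous(read_only_queries, update_queries):
    """Apply the rule of thumb: read-only / update >= 1 favours replication.

    Both arguments are query counts (or frequencies) over the same period.
    """
    if update_queries == 0:
        # Assumption: with no updates there are no copies to keep
        # consistent, so any read traffic favours replication.
        return read_only_queries > 0
    return read_only_queries / update_queries >= 1


print(replication_advantageous(800, 200))
print(replication_advantageous(100, 400))
```

The intuition is that every replica speeds up reads but must be kept consistent on each update, so update-heavy workloads pay more for replication than they gain.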
❑ Combined approaches
◼ Data is distributed evenly by Informatica among all partitions.
◼ This partitioning is used where the number of rows to process in each
partition is approximately the same.
◼ It is used where groups of rows with the same partitioning key must be
processed in the same partition.
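The two behaviours described above resemble what are commonly called round-robin and hash partitioning; the slides do not name them, so that reading is an assumption. The sketch below is plain Python, not Informatica's implementation, and the function names and sample rows are hypothetical.

```python
import zlib


def round_robin(rows, n):
    """Deal rows to n partitions in turn, so sizes differ by at most one."""
    partitions = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        partitions[i % n].append(row)
    return partitions


def hash_partition(rows, n, key):
    """Send rows with the same key value to the same partition."""
    partitions = [[] for _ in range(n)]
    for row in rows:
        # crc32 is used as a stable hash so the result is reproducible.
        h = zlib.crc32(str(row[key]).encode("utf-8"))
        partitions[h % n].append(row)
    return partitions


rows = [
    {"id": i, "dept": d} for i, d in enumerate(["A", "B", "A", "C", "B", "A"])
]
rr = round_robin(rows, 3)
hp = hash_partition(rows, 3, "dept")
print([len(p) for p in rr])
print([[row["dept"] for row in p] for p in hp])
```

Round-robin balances the load but scatters rows that share a key, whereas hashing keeps each key's rows together at the cost of possibly uneven partition sizes.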