W7 DBMS Chapter23


CHAPTER 23

Distributed
Database Concepts

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe


Introduction
 Distributed databases (DDB) bring the
advantages of distributed computing to the
database domain.
 Distributed computing system
     Consists of several processing sites or nodes
      interconnected by a computer network
     Nodes cooperate in performing certain tasks
     Partitions a large task into smaller tasks for
      efficient solving



Introduction
 Distributed computing systems partition a big,
unmanageable problem into smaller pieces and
solve it efficiently in a coordinated manner.
 DDB technology resulted from a merger of two
technologies: database technology and
distributed systems technology.
 The origins of big data technologies come from
distributed systems and database systems, as
well as data mining and machine learning
algorithms that can process these vast amounts
of data to extract needed knowledge.

Distributed Database Concepts
 What constitutes a distributed database?
     Connection of database nodes over a computer
      network
     Logical interrelation of the connected databases
     Possible absence of homogeneity among
      connected nodes
 Distributed database management system
(DDBMS)
     Software system that manages a distributed
      database



Distributed Databases

Figure 23.1 Data distribution and replication among distributed databases



Transparency
 Transparency: Hiding implementation details from the
end user.
 A highly transparent system offers a lot of flexibility to
the end user/application developer since it requires little
or no awareness of underlying details on their part.
 In the case of a traditional centralized database,
transparency simply pertains to logical and physical data
independence for application developers.
 In a DDB scenario, the data and software are distributed
over multiple nodes connected by a computer network,
so additional types of transparencies are introduced.



Types of Transparency in DDB
 Data organization transparency (also known as
distribution or network transparency): This refers to
freedom for the user from the operational details of the
network and the placement of the data in the distributed
system.
 Replication transparency: makes the user unaware of the
existence of copies of the data.
 Fragmentation transparency: makes the user unaware of
the existence of fragments.
 Design transparency: refers to freedom from knowing how
the distributed database is designed.
 Execution transparency: refers to freedom from knowing
where a transaction executes.

Availability and Reliability
 Availability: Probability that the system is continuously available during
a time interval
 Reliability: Probability that the system is running (not down) at a certain
time point
 Availability and reliability are directly related to faults, errors, and
failures
     A failure can be described as a deviation of a system’s behaviour
      from that which is specified in order to ensure correct execution of
      operations.
     Errors constitute that subset of system states that causes the
      failure.
     A fault is the cause of an error.
 The fault-tolerance approach recognizes that faults will occur and
designs mechanisms that can detect and remove faults before they
can result in a system failure.

Scalability and Partition Tolerance
 Scalability determines the extent to which the system can
expand its capacity while continuing to operate without
interruption.
     Horizontal scalability: This refers to expanding the number of
      nodes in the distributed system. As nodes are added to the
      system, it should be possible to distribute some of the data
      and processing loads from existing nodes to the new nodes.
     Vertical scalability: This refers to expanding the capacity of the
      individual nodes in the system, such as expanding the storage
      capacity or the processing power of a node.
 The concept of partition tolerance states that the system
should have the capacity to continue operating while the
network is partitioned.



Autonomy
 Autonomy determines the extent to which individual nodes or
DBs in a connected DDB can operate independently.
 A high degree of autonomy is desirable for increased flexibility
and customized maintenance of an individual node.
 Autonomy can be applied to design, communication, and
execution:
 Design autonomy refers to independence of data model usage
and transaction management techniques among nodes.
 Communication autonomy determines the extent to which
each node can decide on sharing of information with other
nodes.
 Execution autonomy refers to independence of users to act as
they please.

Advantages of Distributed Databases
 Improved ease and flexibility of application
development
     Development at geographically dispersed sites
 Increased availability
     Isolate faults to their site of origin
 Improved performance
     Data localization
 Easier expansion via scalability
     Easier than in non-distributed systems



Data Fragmentation, Replication for
Distributed Database Design
 Data fragmentation comprises techniques used to break up the
database into logical units, called fragments, which may be
assigned for storage at the various nodes. The fragmentation
types are horizontal fragmentation, vertical
fragmentation, and mixed (hybrid) fragmentation.
 Data replication permits certain data to be stored at
more than one site to increase availability and reliability.
 Data allocation is the process of assigning fragments for
storage at the various nodes.
 Fragmentation schema: Defines a set of fragments that
includes all attributes and tuples in the database.
 Allocation schema: Describes the allocation of fragments to
nodes of the DDBS.

Types of Fragmentation

Figure: (a) Horizontal and (b) vertical fragmentation.
Figure: Mixed fragmentation: (a) vertical fragments, horizontally
fragmented; (b) horizontal fragments, vertically fragmented.

Horizontal Fragmentation
 Horizontal fragmentation
     A horizontal fragment of a relation is a subset of the
      tuples in that relation
     Can be specified by a condition on one or more
      attributes or by some other method
     Groups rows to create subsets of tuples
         Each subset has a certain logical meaning



Horizontal Fragmentation
Acc_No   Balance   Branch_Name
A_101    5,000     Pune
A_102    10,000    Baroda
A_103    25,000    Delhi

For the above table we can define simple conditions such as
Branch_Name = 'Pune', Branch_Name = 'Delhi', and Balance < 50000.

Fragment 1:
SELECT * FROM Account
WHERE Branch_Name = 'Pune' AND Balance < 50000

Acc_No   Balance   Branch_Name
A_101    5,000     Pune

Fragment 2:
SELECT * FROM Account
WHERE Branch_Name = 'Delhi' AND Balance < 50000

Acc_No   Balance   Branch_Name
A_103    25,000    Delhi
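The two fragment queries above can be tried out with a minimal sketch like the one below. It assumes an in-memory SQLite database and a hypothetical helper named horizontal_fragment; neither comes from the slides. It also checks which rows no fragment selects, since a horizontal fragmentation is only complete if every tuple lands in some fragment.

```python
import sqlite3

# In-memory copy of the Account table from the slide (storage choice is an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account (Acc_No TEXT PRIMARY KEY, Balance INTEGER, Branch_Name TEXT)"
)
conn.executemany(
    "INSERT INTO Account VALUES (?, ?, ?)",
    [("A_101", 5000, "Pune"), ("A_102", 10000, "Baroda"), ("A_103", 25000, "Delhi")],
)


def horizontal_fragment(branch, max_balance=50000):
    """Hypothetical helper: rows selected by one fragment's condition."""
    return conn.execute(
        "SELECT * FROM Account WHERE Branch_Name = ? AND Balance < ? ORDER BY Acc_No",
        (branch, max_balance),
    ).fetchall()


frag_pune = horizontal_fragment("Pune")
frag_delhi = horizontal_fragment("Delhi")

# Rows of Account that neither fragment contains.
all_rows = set(conn.execute("SELECT * FROM Account").fetchall())
missing = all_rows - (set(frag_pune) | set(frag_delhi))
print(frag_pune, frag_delhi, missing)
```

With only these two fragments the Baroda row is left out, so a third fragment (for example on Branch_Name = 'Baroda') would be needed before the original relation could be rebuilt as the union of its fragments.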



Vertical fragmentation
 Each site may not need all the attributes of a
relation, which would indicate the need for a different
type of fragmentation.
 Vertical fragmentation divides a relation “vertically”
by columns.
 A vertical fragment of a relation keeps only certain
attributes of the relation



Vertical Fragmentation
Acc_No   Balance   Branch_Name
A_101    5,000     Pune
A_102    10,000    Baroda
A_103    25,000    Delhi

For the above table we can define vertical fragments by listing the
attributes each fragment keeps. Both fragments below retain Acc_No, so
the original rows can be rebuilt by joining the fragments on it.

Fragment 1:
SELECT Acc_No, Balance FROM Account

Acc_No   Balance
A_101    5,000
A_102    10,000
A_103    25,000

Fragment 2:
SELECT Acc_No, Branch_Name FROM Account

Acc_No   Branch_Name
A_101    Pune
A_102    Baroda
A_103    Delhi
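As a rough check of the vertical fragments above, the sketch below materializes both projections and joins them back on Acc_No. The in-memory SQLite database and the table names Frag1 and Frag2 are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account (Acc_No TEXT PRIMARY KEY, Balance INTEGER, Branch_Name TEXT)"
)
conn.executemany(
    "INSERT INTO Account VALUES (?, ?, ?)",
    [("A_101", 5000, "Pune"), ("A_102", 10000, "Baroda"), ("A_103", 25000, "Delhi")],
)

# Each vertical fragment is a projection that keeps the shared attribute Acc_No.
conn.execute("CREATE TABLE Frag1 AS SELECT Acc_No, Balance FROM Account")
conn.execute("CREATE TABLE Frag2 AS SELECT Acc_No, Branch_Name FROM Account")

# Rebuild the relation by joining the fragments on Acc_No.
rebuilt = conn.execute(
    "SELECT f1.Acc_No, f1.Balance, f2.Branch_Name "
    "FROM Frag1 f1 JOIN Frag2 f2 ON f1.Acc_No = f2.Acc_No "
    "ORDER BY f1.Acc_No"
).fetchall()
original = conn.execute(
    "SELECT Acc_No, Balance, Branch_Name FROM Account ORDER BY Acc_No"
).fetchall()
print(rebuilt == original)
```

The join recovers the original rows only because both fragments carry Acc_No; a fragment without it could not be matched back to its rows.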



Mixed (hybrid) Fragmentation
 Mixed (hybrid) fragmentation: a combination of horizontal
and vertical fragmentation.
 Hybrid fragmentation can be done in two alternative
ways:
     First generate a set of horizontal fragments; then
      generate vertical fragments from one or more of the
      horizontal fragments.
     First generate a set of vertical fragments; then
      generate horizontal fragments from one or more of the
      vertical fragments.



Mixed (hybrid) Fragmentation
p_ID   Emp_Name   Emp_Address   Emp_Age   Emp_Salary
101    Surendra   Baroda        25        15000
102    Jaya       Pune          37        12000
103    Jayesh     Pune          47        10000

Fragment 1:
SELECT p_ID, Emp_Name FROM Employee
WHERE Emp_Age < 40

p_ID   Emp_Name
101    Surendra
102    Jaya

Fragment 2:
SELECT p_ID, Emp_Name FROM Employee
WHERE Emp_Address = 'Pune' AND Emp_Salary < 14000

p_ID   Emp_Name
102    Jaya
103    Jayesh
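The two mixed fragments above combine a selection (rows) with a projection (columns). The sketch below runs both queries against an in-memory SQLite copy of the table and also lists rows that fall into both fragments. The database setup and the variable names are assumptions, and the queries use the corrected identifiers Employee and Emp_Salary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (p_ID INTEGER PRIMARY KEY, Emp_Name TEXT, "
    "Emp_Address TEXT, Emp_Age INTEGER, Emp_Salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        (101, "Surendra", "Baroda", 25, 15000),
        (102, "Jaya", "Pune", 37, 12000),
        (103, "Jayesh", "Pune", 47, 10000),
    ],
)

# Mixed fragment 1: select on age, then keep only p_ID and Emp_Name.
frag1 = conn.execute(
    "SELECT p_ID, Emp_Name FROM Employee WHERE Emp_Age < 40 ORDER BY p_ID"
).fetchall()

# Mixed fragment 2: select on address and salary, then keep the same columns.
frag2 = conn.execute(
    "SELECT p_ID, Emp_Name FROM Employee "
    "WHERE Emp_Address = 'Pune' AND Emp_Salary < 14000 ORDER BY p_ID"
).fetchall()

# Rows present in both fragments.
overlap = sorted(set(frag1) & set(frag2))
print(frag1, frag2, overlap)
```

Employee 102 appears in both fragments, so these two conditions do not produce disjoint fragments; that may be acceptable when the overlap is intended as replication, but it is worth noticing when designing the fragmentation.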



Data Replication
 Replication is useful in improving the availability of data.
 The most extreme case is replication of the whole database at
every site in the distributed system, thus creating a fully
replicated distributed database.
     Improves availability remarkably
     Update operations can be slow, because every copy must
      be updated
 The other extreme is to store each fragment at exactly
one site, so that there is no replication.
 Between these two extremes, we have a wide spectrum of
partial replication of the data: some fragments of the
database may be replicated whereas others may not.

Data Allocation (Data Distribution)
 Data allocation: Each fragment is assigned to a
particular site in the distributed system.
 Choices depend on performance and availability
goals of the system and on the types and
frequencies of transactions submitted at each site.
 Data that is accessed at multiple sites can be
replicated at those sites.
 If many updates are performed, it may be useful to
limit replication.
 Finding an optimal or even a good solution to
distributed data allocation is a complex optimization
problem.

Example of Fragmentation,
Allocation, and Replication
 Use the company database below. Suppose that the company has
three computer sites, one for each current department.
 Site 1 is used by company headquarters and accesses all
employee and project information regularly, in addition to keeping
track of DEPENDENT information for insurance purposes.
 Sites 2 and 3 are for departments 5 and 4, respectively. At each of
these sites:
     We expect frequent access to the EMPLOYEE and PROJECT
      information for the employees who work in that department and
      the projects controlled by that department.
     Further, we assume that these sites mainly access the Name,
      Ssn, Salary, and Super_ssn attributes of EMPLOYEE.
 According to these requirements, the whole database can be
stored at site 1.

Problems Related to Concurrency Control and
Recovery in a Distributed DBMS Environment
 Multiple copies of the data items: The concurrency control
method is responsible for maintaining consistency among these
copies.
 Failure of individual sites: The DDBMS should continue to
operate with its running sites, if possible, when one or more
individual sites fail.
 Failure of communication links: The system must be able to deal
with the failure of one or more of the communication links that
connect the sites.
 Distributed commit: Problems can arise with committing a
transaction that is accessing databases stored on multiple sites if
some sites fail during the commit process.
 Distributed deadlock: Deadlock may occur among several sites,
so techniques for dealing with deadlocks must be extended to take
this into account.

Distributed Concurrency Control Based
on a Distinguished Copy of a Data Item
 Distinguished Copy of a Data Item: A particular copy of each
data item is designated as the distinguished copy. Locks are
associated with the distinguished copy, and all locking and
unlocking requests are sent to the site that contains that copy.
 Primary Site Technique: A single primary site is designated to
be the coordinator site for all database items. Hence, all locks
are kept at that site, and all requests for locking or unlocking are
sent there.
 Primary Site with Backup Site: This approach designates a
second site to be a backup site. All locking information is
maintained at both the primary and the backup sites.
 Primary Copy Technique: This method attempts to distribute
the load of lock coordination among various sites by having the
distinguished copies of different data items stored at different
sites.
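To make the primary copy technique concrete, here is a minimal sketch under strong simplifying assumptions: exclusive locks only, no queues of waiting transactions, and no site failures. The class Site, the function request_lock, and the item-to-site mapping are hypothetical names chosen for illustration.

```python
class Site:
    """A site that keeps the lock table for the distinguished copies it stores."""

    def __init__(self, name):
        self.name = name
        self.locks = {}  # data item -> id of the transaction holding the lock

    def lock(self, item, txn):
        holder = self.locks.get(item)
        if holder is None or holder == txn:
            self.locks[item] = txn
            return True
        return False  # another transaction already holds the lock

    def unlock(self, item, txn):
        if self.locks.get(item) == txn:
            del self.locks[item]


# Assumed layout: three sites, each holding the distinguished copy of one item.
sites = {name: Site(name) for name in ("S1", "S2", "S3")}
primary_copy = {"X": "S1", "Y": "S2", "Z": "S3"}


def request_lock(item, txn):
    """Route the request to the site that stores the item's distinguished copy."""
    return sites[primary_copy[item]].lock(item, txn)


print(request_lock("X", "T1"))  # handled by S1
print(request_lock("X", "T2"))  # also routed to S1, which already locked X for T1
print(request_lock("Y", "T2"))  # handled by S2, independent of S1
```

Mapping every item to the same site in primary_copy would turn this into the primary site technique, with that one site coordinating all locks.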
Distributed Concurrency Control
Based on Voting Method
 There is no distinguished copy.
 A lock request is sent to all sites that include a copy of the
data item. Each copy maintains its own lock and can grant or
deny the request for it.
     If a transaction that requests a lock is granted that lock by
      a majority of the copies, it holds the lock and informs all
      copies that it has been granted the lock.
     If a transaction does not receive a majority of votes
      granting it a lock within a certain time-out period, it cancels
      its request and informs all sites of the cancellation.
 The voting method has higher message traffic among sites than
do the distinguished copy methods.
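The majority rule described above can be sketched as follows. This is an illustration only: it assumes exclusive locks, treats each copy's lock table as a plain dictionary, collects votes synchronously instead of waiting for a time-out, and uses a hypothetical function name.

```python
def request_lock_by_voting(copies, item, txn):
    """Ask every copy for its vote; keep the lock only on a majority.

    copies is a list of per-site lock tables (dicts mapping item -> holder).
    """
    granted = []
    for table in copies:
        # A copy votes yes if the item is free there (or already held by txn).
        if table.get(item) in (None, txn):
            table[item] = txn
            granted.append(table)

    if len(granted) > len(copies) // 2:
        return True  # majority reached: the transaction holds the lock

    # No majority: cancel the request and release the votes already granted.
    for table in granted:
        if table.get(item) == txn:
            del table[item]
    return False


# Three copies of X; transaction T9 already holds the lock at one of them.
copies = [{"X": "T9"}, {}, {}]
print(request_lock_by_voting(copies, "X", "T1"))

# T9 holds the lock at two of three copies, so T1 cannot reach a majority.
blocked = [{"X": "T9"}, {"X": "T9"}, {}]
print(request_lock_by_voting(blocked, "X", "T1"))
```

Every request touches all copies, and a failed request needs a second round to cancel, which is where the extra message traffic relative to the distinguished copy methods comes from.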



Distributed Recovery
 It is difficult to determine whether a site is down
without exchanging numerous messages with
other sites.
 Distributed commit: when a transaction is
updating data at several sites, it cannot
commit until it is certain that its effect on every site
cannot be lost.
 The two-phase commit protocol is often used to
ensure correctness.
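The control flow of two-phase commit can be outlined with a short sketch. It is deliberately simplified: the classes and names are hypothetical, and logging to stable storage, time-outs, and coordinator or participant crashes are all left out, even though handling them is the main purpose of the real protocol.

```python
class Participant:
    """A site taking part in a distributed transaction (hypothetical class)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # stands in for "local updates are safely logged"
        self.state = "active"

    def prepare(self):
        # Phase 1: vote yes only if the local effects can no longer be lost.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator logic: commit only if every participant votes yes."""
    votes = [p.prepare() for p in participants]  # phase 1: collect votes
    if all(votes):
        for p in participants:  # phase 2: global commit
            p.commit()
        return "commit"
    for p in participants:  # phase 2: global abort
        p.abort()
    return "abort"


ok_sites = [Participant("S1"), Participant("S2")]
print(two_phase_commit(ok_sites))

mixed_sites = [Participant("S1"), Participant("S2", can_commit=False)]
print(two_phase_commit(mixed_sites))
```

A single negative vote aborts the transaction everywhere, which is how the protocol ensures that a transaction never commits at some sites while its effect is lost at others.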



Types of Distributed Database Systems
 Factors that influence types of DDBMSs
     Degree of homogeneity of DDBMS software
         Homogeneous
         Heterogeneous
     Degree of local autonomy
         No local autonomy
         Multidatabase system has full local autonomy
 Federated database system (FDBS)
     Global view or schema of the federation of
      databases is shared by the applications



Assignment 5
 Write the model answers for the mid-semester
exam questions.



Assignment 6-A
Using the Student table, show the content of each of the following
fragments.

Student
RollNo   Marks   University
T01      33      Harvard
T03      77      Stanford
T04      23      California
T02      89      California
T05      90      Harvard
T06      90      Harvard
T07      15      Stanford

Fragment 1:
SELECT * FROM Student WHERE University = 'California'
AND Marks >= 77;
Fragment 2:
SELECT * FROM Student WHERE University <> 'Harvard'
AND University <> 'California' AND Marks >= 77;
Fragment 3:
SELECT * FROM Student WHERE University = 'Harvard'
AND Marks >= 77;
Fragment 4:
SELECT * FROM Student WHERE University = 'Harvard'
AND Marks < 77;
Fragment 5:
SELECT * FROM Student WHERE University <> 'Harvard'
AND University <> 'California' AND Marks < 77;



Assignment 6-B

Student
RollNo   NAME      MARKS   COUNTRY
01       Fazal     22      IRAQ
02       Abdul     66      Italy
03       Sameed    77      UK
04       Shahzeb   90      China
05       Mumraiz   66      Chin

Address
RollNo   Address
01       City A, IRAQ
01       City A, UK
02       City D, Italy
02       City A, Pakistan
03       City D, IRAQ
04       City D, Iraq
04       City A, Pakistan
05       City B, China

An international university maintains information about its students. It
stores student information in the STUDENT table and student
addresses in the ADDRESS table, as shown above.
Suppose the university fragments the STUDENT relation
on the COUNTRY attribute and fragments the ADDRESS relation based
on the fragments created for the STUDENT relation.
Write the appropriate query statements to perform the required
fragmentation and show the content of each fragment for the STUDENT
and ADDRESS tables.