Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Chapter 25
Distributed Databases and
Client-Server Architectures
Chapter 25 Outline
1. Distributed Database Concepts
2. Data Fragmentation, Replication and Allocation
3. Types of Distributed Database Systems
4. Query Processing
5. Concurrency Control and Recovery
6. 3-Tier Client-Server Architecture
Distributed Database Concepts
• A transaction can be executed by multiple networked computers in a unified manner.
• A distributed database (DDB) processes a unit of execution (a transaction) in a distributed manner.
• A distributed database (DDB) is a collection of multiple logically related databases distributed over a computer network. A distributed database management system (DDBMS) is a software system that manages a distributed database while making the distribution transparent to the user.
Distributed Database System
• Advantages
  ◦ Management of distributed data with different levels of transparency:
    ▪ The physical placement of data (files, relations, etc.) is hidden from the user (distribution transparency).
Distributed Database System
• Advantages (transparency, contd.)
  ◦ The EMPLOYEE, PROJECT, and WORKS_ON tables may be fragmented horizontally and stored with possible replication, as shown in the accompanying figure.
Distributed Database System
• Advantages (transparency, contd.)
  ◦ Distribution and network transparency:
    ▪ Users do not have to worry about the operational details of the network.
    ▪ Location transparency: a command can be issued from any location without affecting how it works.
    ▪ Naming transparency: any named object (files, relations, etc.) can be accessed from any location.
Distributed Database System
• Advantages (transparency, contd.)
  ◦ Replication transparency:
    ▪ Copies of data may be stored at multiple sites, as in the preceding figure, without the user having to be aware of them.
    ▪ This is done to minimize access time to the required data.
  ◦ Fragmentation transparency:
    ▪ A relation may be fragmented horizontally (into subsets of its tuples) or vertically (into subsets of its columns).
Distributed Database System
• Other Advantages
  ◦ Increased reliability and availability:
    ▪ Reliability is the probability that the system is running (not down) at a given point in time. Availability is the probability that the system is continuously available (usable or accessible) during a time interval.
    ▪ A distributed database system has multiple nodes (computers); if one fails, the others remain available to do the job.
Distributed Database System
• Other Advantages (contd.)
  ◦ Improved performance:
    ▪ A distributed DBMS fragments the database to keep data closer to where it is needed most.
    ▪ This significantly reduces data management (access and modification) time.
  ◦ Easier expansion (scalability):
    ▪ New nodes (computers) can be added at any time without changing the entire configuration.
Data Fragmentation, Replication and Allocation
• Data Fragmentation
  ◦ Split a relation into logically related and correct parts. A relation can be fragmented in two ways:
    ▪ Horizontal fragmentation
    ▪ Vertical fragmentation
Data Fragmentation, Replication and Allocation
• Horizontal fragmentation
  ◦ A horizontal fragment is a subset of the tuples of a relation: those that satisfy a selection condition.
  ◦ Consider the Employee relation with the selection condition (DNO = 5). All tuples that satisfy this condition form a horizontal fragment of the Employee relation.
  ◦ A selection condition may be composed of several conditions connected by AND or OR.
  ◦ Derived horizontal fragmentation: the partitioning of a primary relation is applied to secondary relations that are related to the primary relation through foreign keys.
Data Fragmentation, Replication and Allocation
• Vertical fragmentation
  ◦ A vertical fragment is a subset of the columns of a relation, so it contains only the values of the selected columns. No selection condition is used in vertical fragmentation.
  ◦ Consider the Employee relation. A vertical fragment of Employee can be created by keeping the values of Name, Bdate, Sex, and Address.
  ◦ Because there is no condition for creating a vertical fragment, each fragment must include the primary key attribute of the parent relation Employee. In this way all vertical fragments of a relation are connected.
Data Fragmentation, Replication and Allocation
• Representation
  ◦ Horizontal fragmentation
    ▪ Each horizontal fragment of a relation R can be specified by a $\sigma_{C_i}(R)$ operation in the relational algebra.
    ▪ Complete horizontal fragmentation: a set of horizontal fragments whose conditions C1, C2, …, Cn include all the tuples in R; that is, every tuple in R satisfies (C1 OR C2 OR … OR Cn).
    ▪ Disjoint complete horizontal fragmentation: no tuple in R satisfies (Ci AND Cj) for any i ≠ j.
    ▪ To reconstruct R from horizontal fragments, a UNION is applied.
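The definitions above can be tried out on a toy relation. The following is a minimal sketch, not taken from the slides: the sample tuples, the fragment names (EMP_D5 and so on), and the helper functions are all made up for illustration. It builds one fragment per selection condition, checks completeness and disjointness, and reconstructs the relation by union.

```python
# Toy EMPLOYEE relation; the tuples are invented for illustration.
EMPLOYEE = [
    {"ssn": "111", "name": "Ann", "dno": 5},
    {"ssn": "222", "name": "Bob", "dno": 4},
    {"ssn": "333", "name": "Cy", "dno": 5},
    {"ssn": "444", "name": "Di", "dno": 1},
]

# One selection condition Ci per fragment (hypothetical fragment names).
CONDITIONS = {
    "EMP_D5": lambda t: t["dno"] == 5,
    "EMP_D4": lambda t: t["dno"] == 4,
    "EMP_D1": lambda t: t["dno"] == 1,
}


def select(relation, condition):
    """sigma_C(R): keep the tuples that satisfy the condition."""
    return [t for t in relation if condition(t)]


def is_complete(relation, conditions):
    """Every tuple satisfies (C1 OR C2 OR ... OR Cn)."""
    return all(any(c(t) for c in conditions.values()) for t in relation)


def is_disjoint(relation, conditions):
    """No tuple satisfies (Ci AND Cj) for i != j."""
    return all(sum(1 for c in conditions.values() if c(t)) <= 1 for t in relation)


def reconstruct(fragments):
    """UNION of the fragments; duplicates are dropped using the key ssn."""
    by_key = {}
    for fragment in fragments:
        for t in fragment:
            by_key[t["ssn"]] = t
    return sorted(by_key.values(), key=lambda t: t["ssn"])


fragments = {name: select(EMPLOYEE, c) for name, c in CONDITIONS.items()}
print(is_complete(EMPLOYEE, CONDITIONS), is_disjoint(EMPLOYEE, CONDITIONS))
print(reconstruct(fragments.values()) == EMPLOYEE)
```

Dropping one of the conditions makes is_complete fail, and the union then no longer returns the whole relation.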
Data Fragmentation, Replication and Allocation
• Representation
  ◦ Vertical fragmentation
    ▪ A vertical fragment of a relation R can be specified by a $\pi_{L_i}(R)$ operation in the relational algebra.
    ▪ Complete vertical fragmentation: a set of vertical fragments whose projection lists L1, L2, …, Ln include all the attributes in R but share only the primary key of R. In this case the projection lists satisfy the following two conditions:
      - $L_1 \cup L_2 \cup \cdots \cup L_n = \mathrm{ATTRS}(R)$
      - $L_i \cap L_j = \mathrm{PK}(R)$ for any $i \neq j$
      where ATTRS(R) is the set of attributes of R and PK(R) is the primary key of R.
    ▪ To reconstruct R from complete vertical fragments, an OUTER UNION is applied.
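The two conditions on the projection lists can be checked mechanically. The sketch below is illustrative only: the sample tuples, the lists L1 and L2, and the function names are assumptions. Its reconstruct helper merges fragments tuple by tuple on the primary key, which for this toy case plays the role of the OUTER UNION named above rather than implementing that operator in general.

```python
ATTRS = ("ssn", "name", "bdate", "address", "sex", "salary")
PK = ("ssn",)

# Invented sample tuples.
EMPLOYEE = [
    {"ssn": "111", "name": "Ann", "bdate": "1980-01-01",
     "address": "Oak St", "sex": "F", "salary": 50000},
    {"ssn": "222", "name": "Bob", "bdate": "1975-06-30",
     "address": "Elm St", "sex": "M", "salary": 42000},
]

# Hypothetical projection lists; both carry the primary key.
L1 = ("ssn", "name", "bdate", "address", "sex")
L2 = ("ssn", "salary")


def project(relation, attrs):
    """pi_L(R): keep only the listed attributes of every tuple."""
    return [{a: t[a] for a in attrs} for t in relation]


def is_complete_vertical(lists, attrs, pk):
    """Check L1 u ... u Ln = ATTRS(R) and Li n Lj = PK(R) for i != j."""
    covers_all = set().union(*map(set, lists)) == set(attrs)
    share_only_pk = all(
        set(a) & set(b) == set(pk)
        for i, a in enumerate(lists)
        for b in lists[i + 1:]
    )
    return covers_all and share_only_pk


def reconstruct(fragments, pk):
    """Merge the fragments on the primary key (toy stand-in for OUTER UNION)."""
    merged = {}
    for fragment in fragments:
        for t in fragment:
            key = tuple(t[k] for k in pk)
            merged.setdefault(key, {}).update(t)
    return [merged[k] for k in sorted(merged)]


fragments = [project(EMPLOYEE, L1), project(EMPLOYEE, L2)]
print(is_complete_vertical([L1, L2], ATTRS, PK))
print(reconstruct(fragments, PK) == EMPLOYEE)
```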
Data Fragmentation, Replication and Allocation
• Representation
  ◦ Mixed (hybrid) fragmentation
    ▪ A combination of vertical fragmentation and horizontal fragmentation.
    ▪ This is achieved by SELECT-PROJECT operations, represented by $\pi_{L}(\sigma_{C}(R))$.
    ▪ If C = True (select all tuples) and L ≠ ATTRS(R), we get a vertical fragment; if C ≠ True and L = ATTRS(R), we get a horizontal fragment; and if C ≠ True and L ≠ ATTRS(R), we get a mixed fragment.
    ▪ If C = True and L = ATTRS(R), then R itself can be considered a fragment.
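The case analysis on C and L can be written as a small function. This is only a sketch under assumed names: classify_fragment and mixed_fragment are hypothetical helpers, and the sample relation is invented. The "horizontal" case (C ≠ True with L = ATTRS(R)) follows from the definition of horizontal fragmentation as a pure selection.

```python
def classify_fragment(condition_is_true, attr_list, all_attrs):
    """Name the kind of fragment produced by pi_L(sigma_C(R))."""
    keeps_all_attrs = set(attr_list) == set(all_attrs)
    if condition_is_true and keeps_all_attrs:
        return "whole relation"
    if condition_is_true:
        return "vertical"
    if keeps_all_attrs:
        return "horizontal"
    return "mixed"


def mixed_fragment(relation, condition, attr_list):
    """Apply the selection first, then the projection."""
    return [{a: t[a] for a in attr_list} for t in relation if condition(t)]


ATTRS = ("ssn", "name", "dno")
EMPLOYEE = [
    {"ssn": "111", "name": "Ann", "dno": 5},
    {"ssn": "222", "name": "Bob", "dno": 4},
]

print(classify_fragment(False, ("ssn", "name"), ATTRS))
print(mixed_fragment(EMPLOYEE, lambda t: t["dno"] == 5, ("ssn", "name")))
```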
Data Fragmentation, Replication and Allocation
• Fragmentation schema
  ◦ A definition of a set of fragments (horizontal, vertical, or mixed) that includes all attributes and tuples in the database and satisfies the condition that the whole database can be reconstructed from the fragments by applying some sequence of OUTER UNION (or OUTER JOIN) and UNION operations.
• Allocation schema
  ◦ It describes the distribution of fragments to the sites of the distributed database. The database can be fully replicated, partially replicated, or partitioned.
Data Fragmentation, Replication and Allocation
• Data Replication
  ◦ Copies of the database, or of parts of it, are stored at multiple sites.
  ◦ In full replication the entire database is replicated at every site; in partial replication selected parts are replicated at some of the sites.
  ◦ Data replication is specified through a replication schema.
• Data Distribution (Data Allocation)
  ◦ This is relevant only in the case of partial replication or partition.
  ◦ The selected portion of the database is distributed to the database sites.
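One simple way to picture an allocation schema is as a mapping from fragments to the sites that hold them. The sketch below assumes that representation; the fragment names, site names, and the classify_allocation helper are hypothetical and only illustrate the three cases (fully replicated, partially replicated, partitioned).

```python
SITES = {"site1", "site2", "site3"}

# Hypothetical allocation: fragment name -> set of sites storing a copy.
ALLOCATION = {
    "EMP_D5": {"site1", "site2"},
    "EMP_D4": {"site3"},
    "DEPT": {"site1", "site2", "site3"},
}


def classify_allocation(allocation, sites):
    """Describe the replication level implied by an allocation mapping."""
    if all(placed == sites for placed in allocation.values()):
        return "fully replicated"      # every fragment is at every site
    if all(len(placed) == 1 for placed in allocation.values()):
        return "partitioned"           # exactly one copy of each fragment
    return "partially replicated"      # some fragments have several copies


print(classify_allocation(ALLOCATION, SITES))
```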
Types of Distributed Database Systems
• Homogeneous
  ◦ All sites of the database system have an identical setup, i.e., the same database system software.
  ◦ The underlying operating system may be different.
    ▪ For example, all sites run Oracle, or DB2, or Sybase, or some other database system.
    ▪ The underlying operating systems can be a mixture of Linux, Windows, Unix, etc.
[Figure: Sites 1 to 5 connected by a communications network; every site runs Oracle, on a mix of Linux, Windows, and Unix.]
Types of Distributed Database Systems
• Heterogeneous
  ◦ Federated: Each site may run a different database system, but data access is managed through a single conceptual schema.
    ▪ This implies that the degree of local autonomy is minimal. Each site must adhere to a centralized access policy. There may be a global schema.
  ◦ Multidatabase: There is no single conceptual global schema. For data access, a schema is constructed dynamically as needed by the application software.
[Figure: Sites 1 to 5 connected by a communications network; the sites run different DBMS types (network, relational, object-oriented, hierarchical) on a mix of Linux, Unix, and Windows.]
Types of Distributed Database Systems
• Federated Database Management Systems Issues
  ◦ Differences in data models:
    ▪ Relational, object-oriented, hierarchical, network, etc.
  ◦ Differences in constraints:
    ▪ Each site may have its own data access and processing constraints.
  ◦ Differences in query language:
    ▪ Sites may use different versions of SQL, such as SQL-89 or SQL-92.
Query Processing in Distributed Databases
• Issues
  ◦ Cost of transferring data (files and results) over the network.
    ▪ This cost is usually high, so some optimization is necessary.
    ▪ Example relations: Employee at site 1 and Department at site 2.
      - Employee at site 1: 10,000 rows, row size = 100 bytes, table size = $10^6$ bytes.
      - Department at site 2: 100 rows, row size = 35 bytes, table size = 3,500 bytes.
    ▪ Q: For each employee, retrieve the employee name and the name of the department where the employee works.
      - Q: $\pi_{Fname, Lname, Dname}(Employee \bowtie_{Dno = Dnumber} Department)$
Employee(Fname, Minit, Lname, SSN, Bdate, Address, Sex, Salary, Superssn, Dno)
Department(Dname, Dnumber, Mgrssn, Mgrstartdate)
Query Processing in Distributed Databases
• Result
  ◦ The result of this query will have 10,000 tuples, assuming that every employee is related to a department.
  ◦ Suppose each result tuple is 40 bytes long. The query is submitted at site 3, and the result is sent to this site.
  ◦ Problem: The Employee and Department relations are not present at site 3.
Query Processing in Distributed Databases
• Strategies:
  1. Transfer Employee and Department to site 3.
     ◦ Total transfer = 1,000,000 + 3,500 = 1,003,500 bytes.
  2. Transfer Employee to site 2, execute the join at site 2, and send the result to site 3.
     ◦ Query result size = 40 * 10,000 = 400,000 bytes. Total transfer = 400,000 + 1,000,000 = 1,400,000 bytes.
  3. Transfer the Department relation to site 1, execute the join at site 1, and send the result to site 3.
     ◦ Total transfer = 400,000 + 3,500 = 403,500 bytes.
• Optimization criterion: minimizing data transfer.
• Preferred approach: strategy 3.
Query Processing in Distributed Databases
• Consider the query
  ◦ Q’: For each department, retrieve the department name and the name of the department manager.
• Relational algebra expression:
  ◦ $\pi_{Fname, Lname, Dname}(Employee \bowtie_{Mgrssn = SSN} Department)$
Query Processing in Distributed Databases
• The result of this query will have 100 tuples, assuming that every department has a manager. The execution strategies are:
  1. Transfer Employee and Department to the result site (site 3) and perform the join there.
     ◦ Total transfer = 1,000,000 + 3,500 = 1,003,500 bytes.
  2. Transfer Employee to site 2, execute the join at site 2, and send the result to site 3. Query result size = 40 * 100 = 4,000 bytes.
     ◦ Total transfer = 4,000 + 1,000,000 = 1,004,000 bytes.
  3. Transfer the Department relation to site 1, execute the join at site 1, and send the result to site 3.
     ◦ Total transfer = 4,000 + 3,500 = 7,500 bytes.
• Preferred strategy: strategy 3.
Query Processing in Distributed Databases
• Now suppose the result site is site 2. Possible strategies:
  1. Transfer the Employee relation to site 2, execute the query, and present the result to the user at site 2.
     ◦ Total transfer = 1,000,000 bytes for both queries Q and Q’.
  2. Transfer the Department relation to site 1, execute the join at site 1, and send the result back to site 2.
     ◦ Total transfer for Q = 400,000 + 3,500 = 403,500 bytes, and for Q’ = 4,000 + 3,500 = 7,500 bytes.
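The transfer totals for Q and Q’ can be recomputed from the table sizes given earlier. The following sketch does only that arithmetic; the function names and the strategy labels are made up for illustration, and bytes transferred is the only cost considered.

```python
EMPLOYEE_BYTES = 10_000 * 100   # 10,000 rows of 100 bytes at site 1
DEPARTMENT_BYTES = 100 * 35     # 100 rows of 35 bytes at site 2
RESULT_TUPLE_BYTES = 40
RESULT_TUPLES = {"Q": 10_000, "Q'": 100}


def costs_result_at_site3(query):
    """Bytes transferred by each strategy when the result is needed at site 3."""
    result = RESULT_TUPLES[query] * RESULT_TUPLE_BYTES
    return {
        "ship both relations to site 3": EMPLOYEE_BYTES + DEPARTMENT_BYTES,
        "join at site 2": EMPLOYEE_BYTES + result,
        "join at site 1": DEPARTMENT_BYTES + result,
    }


def costs_result_at_site2(query):
    """Bytes transferred by each strategy when the result is needed at site 2."""
    result = RESULT_TUPLES[query] * RESULT_TUPLE_BYTES
    return {
        "join at site 2": EMPLOYEE_BYTES,
        "join at site 1": DEPARTMENT_BYTES + result,
    }


def cheapest(costs):
    """Return the strategy label with the fewest bytes transferred."""
    return min(costs, key=costs.get)


for query in RESULT_TUPLES:
    for costs in (costs_result_at_site3(query), costs_result_at_site2(query)):
        print(query, costs, "->", cheapest(costs))
```

In all four cases the join at site 1 moves the fewest bytes, because Department is far smaller than Employee.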
Query Processing in Distributed Databases
• Semijoin:
  ◦ The objective is to reduce the number of tuples in a relation before transferring it to another site.
• Example execution of Q or Q’:
  1. Project the join attributes of Department at site 2 and transfer them to site 1. For Q, 4 * 100 = 400 bytes are transferred; for Q’, 9 * 100 = 900 bytes.
  2. Join the transferred file with the Employee relation at site 1, and transfer the required attributes from the resulting file to site 2. For Q, 34 * 10,000 = 340,000 bytes are transferred; for Q’, 39 * 100 = 3,900 bytes.
  3. Execute the query by joining the transferred file with Department, and present the result to the user at site 2.
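Adding up the two transfers of the semijoin strategy gives a total that can be compared with shipping a whole relation. The sketch below only repeats the arithmetic from the steps above; semijoin_cost and its parameter names are hypothetical.

```python
DEPARTMENT_ROWS = 100


def semijoin_cost(join_attr_bytes, returned_row_bytes, returned_rows):
    """Total bytes moved by the semijoin strategy with the result at site 2."""
    # Step 1: projected join column of Department, site 2 -> site 1.
    step1 = join_attr_bytes * DEPARTMENT_ROWS
    # Step 2: matching Employee rows, reduced to the needed attributes,
    # site 1 -> site 2.
    step2 = returned_row_bytes * returned_rows
    return step1 + step2


cost_q = semijoin_cost(4, 34, 10_000)      # 400 + 340,000
cost_q_prime = semijoin_cost(9, 39, 100)   # 900 + 3,900
print(cost_q, cost_q_prime)
```

Under these numbers the semijoin moves 340,400 bytes for Q and 4,800 bytes for Q’. The cheaper non-semijoin strategy with the result at site 2 moves 403,500 and 7,500 bytes respectively.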
Concurrency Control and Recovery
• Distributed databases encounter a number of concurrency control and recovery problems that are not present in centralized databases. Some of them are listed below.
  ◦ Dealing with multiple copies of data items
  ◦ Failure of individual sites
  ◦ Communication link failure
  ◦ Distributed commit
  ◦ Distributed deadlock
Concurrency Control and Recovery
• Details
  ◦ Dealing with multiple copies of data items:
    ▪ Concurrency control must maintain global consistency. Likewise, the recovery mechanism must recover all copies and maintain consistency after recovery.
  ◦ Failure of individual sites:
    ▪ Database availability must not be affected by the failure of one or two sites, and the recovery scheme must recover failed sites before they are made available for use again.
Concurrency Control and Recovery
• Details (contd.)
  ◦ Communication link failure:
    ▪ This failure may create a network partition, which would affect database availability even though all database sites may be running.
  ◦ Distributed commit:
    ▪ A transaction may be split into parts that are executed at a number of sites. This requires a two-phase or three-phase commit approach for transaction commit.
  ◦ Distributed deadlock:
    ▪ Since transactions are processed at multiple sites, two or more sites may become involved in a deadlock. This must be resolved in a distributed manner.
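The slides name two-phase commit without describing it. As a rough illustration of its usual shape (a voting phase followed by a decision phase), here is an in-memory sketch with no logging, time-outs, or recovery; the function name and the participant interface are assumptions, not part of the source.

```python
def two_phase_commit(participants):
    """Run a toy two-phase commit.

    participants maps a site name to a callable that returns True
    (ready to commit) or False (must abort). Hypothetical interface.
    """
    # Phase 1 (voting): ask every site whether it can commit.
    votes = {}
    for site, prepare in participants.items():
        try:
            votes[site] = bool(prepare())
        except Exception:
            votes[site] = False  # an unreachable site counts as a "no" vote
    # Phase 2 (decision): commit only if every site voted yes.
    decision = "commit" if all(votes.values()) else "abort"
    return decision, votes


def _failing_site():
    raise ConnectionError("site unreachable")


ok = two_phase_commit({"site1": lambda: True, "site2": lambda: True})
bad = two_phase_commit({"site1": lambda: True, "site2": _failing_site})
print(ok[0], bad[0])
```

A real coordinator would also log its decision and send it to every participant, so that the outcome survives a crash.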
Concurrency Control and Recovery
• Distributed concurrency control based on a distinguished copy of a data item
  ◦ Primary site technique: A single site is designated as the primary site, which serves as the coordinator for transaction management.
[Figure: Sites 1 to 5 connected by a communications network, with one site marked as the primary site.]
Concurrency Control and Recovery
• Transaction management:
  ◦ Concurrency control and commit are managed by the primary site.
  ◦ In two-phase locking, the primary site manages the locking and releasing of data items. If all transactions follow the two-phase policy at all sites, then serializability is guaranteed.
Concurrency Control and Recovery
• Transaction Management
  ◦ Advantages:
    ▪ It is an extension of centralized two-phase locking, so implementation and management are simple.
    ▪ Data items are locked at only one site, but they can be accessed at any site.
  ◦ Disadvantages:
    ▪ All transaction management activities go to the primary site, which is likely to overload it.
    ▪ If the primary site fails, the entire system is inaccessible.
  ◦ To aid recovery, a backup site is designated that behaves as a shadow of the primary site. If the primary site fails, the backup site can act as the primary site.
Concurrency Control and Recovery
• Primary Copy Technique:
  ◦ In this approach, instead of a site, a data item partition is designated as the primary copy. To lock a data item, only the primary copy of the data item is locked.
  ◦ Advantages:
    ▪ Since primary copies are distributed at various sites, a single site is not overloaded with locking and unlocking requests.
  ◦ Disadvantages:
    ▪ Identification of a primary copy is complex. A distributed directory must be maintained, possibly at all sites.
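A minimal sketch of the idea, assuming exclusive locks only and a directory that maps each data item to the site holding its primary copy. The class name, its methods, and the sample directory are hypothetical. In a real system each site would run its own lock table, whereas here one object simulates all of them.

```python
class PrimaryCopyLocks:
    """Toy lock table keyed by data item; only the primary copy is locked."""

    def __init__(self, directory):
        self.directory = dict(directory)  # item -> site of its primary copy
        self.held = {}                    # item -> transaction holding the lock

    def site_for(self, item):
        """Look up which site a lock request for this item must go to."""
        return self.directory[item]

    def lock(self, txn, item):
        """Grant an exclusive lock if the item is free or already held by txn."""
        owner = self.held.get(item)
        if owner is None or owner == txn:
            self.held[item] = txn
            return True
        return False

    def unlock(self, txn, item):
        """Release the lock, but only for the transaction that holds it."""
        if self.held.get(item) == txn:
            del self.held[item]


locks = PrimaryCopyLocks({"x": "site1", "y": "site2"})
print(locks.site_for("y"), locks.lock("T1", "x"), locks.lock("T2", "x"))
```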
Concurrency Control and Recovery
• Recovery from a coordinator failure
  ◦ In both approaches, a coordinator site or copy may become unavailable. This requires the selection of a new coordinator.
• Primary site approach with no backup site:
  ◦ Abort and restart all active transactions at all sites. Elect a new coordinator and initiate transaction processing.
• Primary site approach with backup site:
  ◦ Suspend all active transactions, designate the backup site as the primary site, and identify a new backup site. The new primary site receives all transaction management information needed to resume processing.
• Primary and backup sites fail, or no backup site:
  ◦ Use an election process to select a new coordinator site.
Concurrency Control and Recovery
• Concurrency control based on voting:
  ◦ There is no primary copy or coordinator.
  ◦ A lock request is sent to all sites that have a copy of the data item.
  ◦ If a majority of these sites grant the lock, the requesting transaction gets the data item.
  ◦ Locking information (granted or denied) is sent to all these sites.
  ◦ To avoid an unacceptably long wait, a time-out period is defined. If the requesting transaction does not receive any vote information within this period, the transaction is aborted.
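The majority rule can be sketched as a single function. This is an illustration under assumptions: replies are collected into a dictionary, a missing reply after the time-out is represented by None, and a request that fails to reach a majority is treated as aborted. The function name is hypothetical.

```python
def vote_on_lock(replies):
    """Decide a lock request from the votes of the sites holding a copy.

    replies maps each site to True (grant), False (deny), or None
    (no reply before the time-out expired).
    """
    granted = sum(1 for reply in replies.values() if reply is True)
    # A strict majority of all copies is required, not just of the replies.
    if granted > len(replies) / 2:
        return "granted"
    return "aborted"


print(vote_on_lock({"site1": True, "site2": True, "site3": False}))
print(vote_on_lock({"site1": True, "site2": None, "site3": None}))
```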
Client-Server Database Architecture
• It consists of clients running client software, a set of servers that provide all database functionality, and a reliable communication infrastructure.
[Figure: Clients 1 to n connected to Servers 1 to n.]
Client-Server Database Architecture
• Clients reach the server for a desired service, but the server does not reach clients.
• The server software is responsible for local data management at a site, much like centralized DBMS software.
• The client software is responsible for most of the distribution functions.
• The communication software manages communication among clients and servers.
Client-Server Database Architecture
• The processing of an SQL query goes as follows:
  ◦ The client parses a user query and decomposes it into a number of independent subqueries. Each subquery is sent to the appropriate site for execution.
  ◦ Each server processes its subquery and sends the result to the client.
  ◦ The client combines the results of the subqueries and produces the final result.
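The three steps can be mimicked in a few lines. This is a deliberately simplified sketch: the "query" is just a row predicate, every site receives the same subquery, and the servers are in-memory lists. All names and data are invented for illustration.

```python
# Hypothetical servers, each holding a horizontal fragment of one table.
SERVERS = {
    "site1": [{"name": "Ann", "dno": 5}, {"name": "Bob", "dno": 4}],
    "site2": [{"name": "Cy", "dno": 5}],
}


def server_execute(site, predicate):
    """Server side: run the subquery against the local data only."""
    return [row for row in SERVERS[site] if predicate(row)]


def client_query(predicate):
    """Client side: decompose, dispatch, and combine."""
    # Step 1: decompose into one independent subquery per site.
    subqueries = {site: predicate for site in SERVERS}
    # Step 2: each server processes its subquery and returns its rows.
    partial = [server_execute(site, sub) for site, sub in subqueries.items()]
    # Step 3: the client combines the partial results.
    return [row for part in partial for row in part]


result = client_query(lambda row: row["dno"] == 5)
print(result)
```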
Recap
• Distributed Database Concepts
• Data Fragmentation, Replication and Allocation
• Types of Distributed Database Systems
• Query Processing
• Concurrency Control and Recovery
• 3-Tier Client-Server Architecture
Ray Dalio How Countries go Broke the Big Cycle
Dadang Solihin
 
ABCs of Bookkeeping for Nonprofits TechSoup.pdf
ABCs of Bookkeeping for Nonprofits TechSoup.pdf
TechSoup
 
How to Manage & Create a New Department in Odoo 18 Employee
How to Manage & Create a New Department in Odoo 18 Employee
Celine George
 
Non-Communicable Diseases and National Health Programs – Unit 10 | B.Sc Nursi...
Non-Communicable Diseases and National Health Programs – Unit 10 | B.Sc Nursi...
RAKESH SAJJAN
 
What are the benefits that dance brings?
What are the benefits that dance brings?
memi27
 
Ad

SR_R_Datamining.ppt detaled explanation re

  • 1. Slide 25- 1 Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
  • 2. Chapter 25: Distributed Databases and Client-Server Architectures
  • 3. Chapter 25 Outline: (1) Distributed Database Concepts; (2) Data Fragmentation, Replication and Allocation; (3) Types of Distributed Database Systems; (4) Query Processing; (5) Concurrency Control and Recovery; (6) 3-Tier Client-Server Architecture.
  • 4. Distributed Database Concepts. A transaction can be executed by multiple networked computers in a unified manner. A distributed database (DDB) processes a unit of execution (a transaction) in a distributed manner. A distributed database (DDB) is a collection of multiple logically related databases distributed over a computer network, and a distributed database management system (DDBMS) is a software system that manages a distributed database while making the distribution transparent to the user.
  • 5. Distributed Database System. Advantages: management of distributed data with different levels of transparency. The physical placement of data (files, relations, etc.) is hidden from the user (distribution transparency).
  • 6. Distributed Database System. Advantages (transparency, contd.): the EMPLOYEE, PROJECT, and WORKS_ON tables may be fragmented horizontally and stored with possible replication, as shown in the slide's figure.
  • 7. Distributed Database System. Advantages (transparency, contd.). Distribution and network transparency: users do not have to worry about the operational details of the network. Location transparency is the freedom to issue a command from any location without affecting how it works. Naming transparency allows access to any named object (files, relations, etc.) from any location.
  • 8. Distributed Database System. Advantages (transparency, contd.). Replication transparency: copies of data can be stored at multiple sites, as shown in the preceding diagram. This is done to minimize access time to the required data. Fragmentation transparency: a relation can be fragmented horizontally (a subset of the tuples of the relation) or vertically (a subset of the columns of the relation).
  • 9. Distributed Database System. Other advantages. Increased reliability and availability: reliability refers to system uptime, that is, the system is running efficiently most of the time. Availability is the probability that the system is continuously available (usable or accessible) during a time interval. A distributed database system has multiple nodes (computers), and if one fails, the others are available to do the job.
  • 10. Distributed Database System. Other advantages (contd.). Improved performance: a distributed DBMS fragments the database to keep data closer to where it is needed most, which significantly reduces data management (access and modification) time. Easier expansion (scalability): new nodes (computers) can be added at any time without changing the entire configuration.
  • 11. Data Fragmentation, Replication and Allocation. Data fragmentation: split a relation into logically related and correct parts. A relation can be fragmented in two ways: horizontal fragmentation and vertical fragmentation.
  • 12. Data Fragmentation, Replication and Allocation. Horizontal fragmentation: a horizontal fragment is a subset of a relation that contains the tuples satisfying a selection condition. Consider the Employee relation with the selection condition (DNO = 5). All tuples that satisfy this condition form a subset, which is a horizontal fragment of the Employee relation. A selection condition may be composed of several conditions connected by AND or OR. Derived horizontal fragmentation applies the partitioning of a primary relation to secondary relations that are related to it by foreign keys.
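The selection behind a horizontal fragment can be sketched in a few lines of Python. The sample rows and the helper name `horizontal_fragment` are illustrative assumptions and do not come from the slides.

```python
# Hypothetical sample rows of the Employee relation (only a few attributes shown).
employee = [
    {"SSN": "111", "Name": "Smith", "DNO": 5},
    {"SSN": "222", "Name": "Wong", "DNO": 5},
    {"SSN": "333", "Name": "Zelaya", "DNO": 4},
]


def horizontal_fragment(relation, condition):
    """Return the tuples of `relation` that satisfy `condition` (a selection)."""
    return [row for row in relation if condition(row)]


# Horizontal fragment for the selection condition DNO = 5.
emp_d5 = horizontal_fragment(employee, lambda row: row["DNO"] == 5)

# Conditions can be combined with AND / OR, as the slide notes.
emp_d4_or_d5 = horizontal_fragment(
    employee, lambda row: row["DNO"] == 4 or row["DNO"] == 5
)
```

Every row keeps all of its attributes; only the set of rows shrinks.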
  • 13. Data Fragmentation, Replication and Allocation. Vertical fragmentation: a vertical fragment is a subset of a relation defined by a subset of its columns, so it contains only the values of the selected columns. No selection condition is used in vertical fragmentation. Consider the Employee relation: a vertical fragment can be created by keeping the values of Name, Bdate, Sex, and Address. Because there is no condition for creating a vertical fragment, each fragment must include the primary key attribute of the parent relation Employee. In this way all vertical fragments of a relation are connected.
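A vertical fragment is a projection that always carries the primary key. The sketch below assumes SSN is the primary key of Employee; the sample rows and the helper name `vertical_fragment` are hypothetical.

```python
# Hypothetical sample rows of the Employee relation; SSN is assumed to be the primary key.
employee = [
    {"SSN": "111", "Name": "Smith", "Bdate": "1965-01-09", "Sex": "M",
     "Address": "Houston", "Salary": 30000, "DNO": 5},
    {"SSN": "222", "Name": "Wong", "Bdate": "1955-12-08", "Sex": "M",
     "Address": "Houston", "Salary": 40000, "DNO": 5},
]


def vertical_fragment(relation, attributes, primary_key="SSN"):
    """Project `relation` onto `attributes`, always keeping the primary key."""
    columns = [primary_key] + [a for a in attributes if a != primary_key]
    return [{c: row[c] for c in columns} for row in relation]


# Fragment with the personal attributes named on the slide.
personal = vertical_fragment(employee, ["Name", "Bdate", "Sex", "Address"])
# A second fragment with the remaining attributes.
work = vertical_fragment(employee, ["Salary", "DNO"])
```

Both fragments contain SSN, which is what lets them be matched up again later.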
  • 14. Data Fragmentation, Replication and Allocation. Representation of horizontal fragmentation: each horizontal fragment of a relation $R$ can be specified by a $\sigma_{C_i}(R)$ operation in the relational algebra. Complete horizontal fragmentation: a set of horizontal fragments whose conditions $C_1, C_2, \dots, C_n$ include all the tuples in $R$, that is, every tuple in $R$ satisfies ($C_1$ OR $C_2$ OR … OR $C_n$). Disjoint complete horizontal fragmentation: no tuple in $R$ satisfies ($C_i$ AND $C_j$) where $i \neq j$. To reconstruct $R$ from horizontal fragments, a UNION is applied.
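The completeness and disjointness conditions, and the UNION that rebuilds the relation, can be checked mechanically. This is a sketch under stated assumptions: the rows, the conditions, and the function names are made up for illustration, and SSN is assumed to identify a tuple.

```python
# Hypothetical Employee rows; SSN is assumed to identify a tuple.
employee = [
    {"SSN": "111", "DNO": 5},
    {"SSN": "222", "DNO": 5},
    {"SSN": "333", "DNO": 4},
    {"SSN": "444", "DNO": 1},
]


def is_complete(relation, conditions):
    """Every tuple satisfies (C1 OR C2 OR ... OR Cn)."""
    return all(any(c(row) for c in conditions) for row in relation)


def is_disjoint(relation, conditions):
    """No tuple satisfies (Ci AND Cj) for i != j."""
    return all(sum(1 for c in conditions if c(row)) <= 1 for row in relation)


def reconstruct(fragments, key="SSN"):
    """UNION of the fragments; a tuple appearing in several fragments is kept once."""
    merged = {}
    for fragment in fragments:
        for row in fragment:
            merged[row[key]] = row
    return list(merged.values())


by_department = [
    lambda r: r["DNO"] == 5,
    lambda r: r["DNO"] == 4,
    lambda r: r["DNO"] == 1,
]
fragments = [[r for r in employee if c(r)] for c in by_department]
rebuilt = reconstruct(fragments)

# Overlapping conditions are still complete but no longer disjoint (DNO = 4 matches both).
overlapping = [lambda r: r["DNO"] >= 4, lambda r: r["DNO"] <= 4]
```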
  • 15. Data Fragmentation, Replication and Allocation. Representation of vertical fragmentation: a vertical fragment of a relation $R$ can be specified by a $\pi_{L_i}(R)$ operation in the relational algebra. Complete vertical fragmentation: a set of vertical fragments whose projection lists $L_1, L_2, \dots, L_n$ include all the attributes in $R$ but share only the primary key of $R$. In this case the projection lists satisfy the following two conditions: (1) $L_1 \cup L_2 \cup \dots \cup L_n = \mathrm{ATTRS}(R)$; (2) $L_i \cap L_j = \mathrm{PK}(R)$ for any $i \neq j$, where $\mathrm{ATTRS}(R)$ is the set of attributes of $R$ and $\mathrm{PK}(R)$ is the primary key of $R$. To reconstruct $R$ from complete vertical fragments, an OUTER UNION is applied.
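The two conditions on the projection lists translate directly into set operations. The sketch below also rebuilds the relation by merging fragment rows that share a primary key value, which for this small example gives the same rows an OUTER UNION would; the attribute names, rows, and function names are illustrative assumptions.

```python
# Assumed attribute set and primary key of a small Employee relation.
ATTRS = {"SSN", "Name", "Bdate", "Salary", "DNO"}
PK = {"SSN"}

# Projection lists of two vertical fragments.
L1 = {"SSN", "Name", "Bdate"}
L2 = {"SSN", "Salary", "DNO"}


def is_complete_vertical(lists, attrs, pk):
    """Check L1 ∪ ... ∪ Ln = ATTRS(R) and Li ∩ Lj = PK(R) for every i != j."""
    covers_all = set().union(*lists) == attrs
    share_only_pk = all(
        a & b == pk for i, a in enumerate(lists) for b in lists[i + 1:]
    )
    return covers_all and share_only_pk


def reconstruct(fragments, pk="SSN"):
    """Merge rows from all fragments that carry the same primary key value."""
    merged = {}
    for fragment in fragments:
        for row in fragment:
            merged.setdefault(row[pk], {}).update(row)
    return list(merged.values())


frag1 = [{"SSN": "111", "Name": "Smith", "Bdate": "1965-01-09"}]
frag2 = [{"SSN": "111", "Salary": 30000, "DNO": 5}]
rebuilt = reconstruct([frag1, frag2])
```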
  • 16. Data Fragmentation, Replication and Allocation. Representation of mixed (hybrid) fragmentation: a combination of vertical and horizontal fragmentation, achieved by a SELECT-PROJECT operation written as $\pi_{L}(\sigma_{C}(R))$. If $C$ = True (select all tuples) and $L \neq \mathrm{ATTRS}(R)$, we get a vertical fragment; if $C \neq$ True and $L = \mathrm{ATTRS}(R)$, we get a horizontal fragment; and if $C \neq$ True and $L \neq \mathrm{ATTRS}(R)$, we get a mixed fragment. If $C$ = True and $L = \mathrm{ATTRS}(R)$, then $R$ itself can be considered a fragment.
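A mixed fragment is a selection followed by a projection. In this minimal sketch the rows and the helper name `mixed_fragment` are hypothetical; the projection list is expected to include the primary key SSN.

```python
# Hypothetical Employee rows.
employee = [
    {"SSN": "111", "Name": "Smith", "Salary": 30000, "DNO": 5},
    {"SSN": "222", "Name": "Wong", "Salary": 40000, "DNO": 5},
    {"SSN": "333", "Name": "Zelaya", "Salary": 25000, "DNO": 4},
]


def mixed_fragment(relation, condition, attributes):
    """Apply the selection `condition`, then project onto `attributes`."""
    return [
        {a: row[a] for a in attributes}
        for row in relation
        if condition(row)
    ]


# C is (DNO = 5) and L is (SSN, Name): fewer rows and fewer columns.
d5_names = mixed_fragment(employee, lambda r: r["DNO"] == 5, ["SSN", "Name"])

# C = True and L = ATTRS(R) gives back the whole relation.
whole = mixed_fragment(employee, lambda r: True, ["SSN", "Name", "Salary", "DNO"])
```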
  • 17. Data Fragmentation, Replication and Allocation. Fragmentation schema: a definition of a set of fragments (horizontal, vertical, or mixed) that includes all attributes and tuples in the database and satisfies the condition that the whole database can be reconstructed from the fragments by applying some sequence of OUTER UNION (or OUTER JOIN) and UNION operations. Allocation schema: it describes the distribution of fragments to the sites of the distributed database. The database can be fully replicated, partially replicated, or partitioned.
  • 18. Data Fragmentation, Replication and Allocation. Data replication: the database is replicated to the sites. In full replication the entire database is replicated at every site, and in partial replication selected parts are replicated to some of the sites. Data replication is specified through a replication schema. Data distribution (data allocation): this is relevant only in the case of partial replication or partitioning. The selected portion of the database is distributed to the database sites.
  • 19. Types of Distributed Database Systems. Homogeneous: all sites of the database system have an identical setup, i.e., the same database system software. The underlying operating system may be different. For example, all sites run Oracle, or DB2, or Sybase, or some other database system, while the underlying operating systems can be a mixture of Linux, Windows, Unix, etc. [Figure: five sites (Site 1 to Site 5), all running Oracle on a mix of Linux, Windows, and Unix, connected by a communications network.]
  • 20. Types of Distributed Database Systems. Heterogeneous. Federated: each site may run a different database system, but data access is managed through a single conceptual schema. This implies that the degree of local autonomy is minimal. Each site must adhere to a centralized access policy. There may be a global schema. Multidatabase: there is no single conceptual global schema. For data access, a schema is constructed dynamically as needed by the application software. [Figure: five sites connected by a communications network, running a mix of network, relational, object-oriented, and hierarchical DBMSs on Linux, Unix, and Windows.]
  • 21. Types of Distributed Database Systems. Federated database management system issues. Differences in data models: relational, object-oriented, hierarchical, network, etc. Differences in constraints: each site may have its own data access and processing constraints. Differences in query language: sites may use different versions of SQL, such as SQL-89 or SQL-92.
  • 22. Query Processing in Distributed Databases. Issues: the cost of transferring data (files and results) over the network is usually high, so some optimization is necessary. Example relations: Employee at site 1 with 10,000 rows, row size = 100 bytes, table size = $10^6$ bytes; Department at site 2 with 100 rows, row size = 35 bytes, table size = 3,500 bytes. Schemas: Employee(Fname, Minit, Lname, SSN, Bdate, Address, Sex, Salary, Superssn, Dno); Department(Dname, Dnumber, Mgrssn, Mgrstartdate). Q: for each employee, retrieve the employee name and the name of the department for which the employee works. Q: $\pi_{\text{Fname},\text{Lname},\text{Dname}}(\text{Employee} \bowtie_{\text{Dno} = \text{Dnumber}} \text{Department})$
  • 23. Query Processing in Distributed Databases. Result: the result of this query will have 10,000 tuples, assuming that every employee is related to a department. Suppose each result tuple is 40 bytes long. The query is submitted at site 3 and the result is sent to this site. Problem: the Employee and Department relations are not present at site 3.
  • 24. Query Processing in Distributed Databases. Strategies: (1) Transfer Employee and Department to site 3. Total bytes transferred = 1,000,000 + 3,500 = 1,003,500 bytes. (2) Transfer Employee to site 2, execute the join at site 2, and send the result to site 3. Query result size = 40 * 10,000 = 400,000 bytes. Total bytes transferred = 400,000 + 1,000,000 = 1,400,000 bytes. (3) Transfer Department to site 1, execute the join at site 1, and send the result to site 3. Total bytes transferred = 400,000 + 3,500 = 403,500 bytes. Optimization criterion: minimizing data transfer.
  • 25. Query Processing in Distributed Databases. Strategies: (1) Transfer Employee and Department to site 3. Total bytes transferred = 1,000,000 + 3,500 = 1,003,500 bytes. (2) Transfer Employee to site 2, execute the join at site 2, and send the result to site 3. Query result size = 40 * 10,000 = 400,000 bytes. Total bytes transferred = 400,000 + 1,000,000 = 1,400,000 bytes. (3) Transfer Department to site 1, execute the join at site 1, and send the result to site 3. Total bytes transferred = 400,000 + 3,500 = 403,500 bytes. Optimization criterion: minimizing data transfer. Preferred approach: strategy 3.
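The comparison of the three strategies for Q is plain arithmetic, so it can be reproduced in a short script. The variable names are illustrative; the sizes are the ones given on the slides.

```python
# Sizes from the slides.
EMPLOYEE_BYTES = 10_000 * 100    # 10,000 rows of 100 bytes = 1,000,000
DEPARTMENT_BYTES = 100 * 35      # 100 rows of 35 bytes = 3,500
RESULT_BYTES = 10_000 * 40       # 10,000 result tuples of 40 bytes = 400,000

# Bytes shipped over the network by each strategy when the result site is site 3.
strategies = {
    1: EMPLOYEE_BYTES + DEPARTMENT_BYTES,  # ship both relations to site 3
    2: EMPLOYEE_BYTES + RESULT_BYTES,      # ship Employee to site 2, result to site 3
    3: DEPARTMENT_BYTES + RESULT_BYTES,    # ship Department to site 1, result to site 3
}

# The optimization criterion is minimum data transfer.
best = min(strategies, key=strategies.get)

for number, cost in strategies.items():
    print(f"strategy {number}: {cost:,} bytes")
print(f"preferred: strategy {best}")
```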
  • 26. Query Processing in Distributed Databases. Consider the query Q': for each department, retrieve the department name and the name of the department manager. Relational algebra expression: $\pi_{\text{Fname},\text{Lname},\text{Dname}}(\text{Employee} \bowtie_{\text{Mgrssn} = \text{SSN}} \text{Department})$
  • 27. Query Processing in Distributed Databases. The result of this query will have 100 tuples, assuming that every department has a manager. The execution strategies are: (1) Transfer Employee and Department to the result site and perform the join at site 3. Total bytes transferred = 1,000,000 + 3,500 = 1,003,500 bytes. (2) Transfer Employee to site 2, execute the join at site 2, and send the result to site 3. Query result size = 40 * 100 = 4,000 bytes. Total bytes transferred = 4,000 + 1,000,000 = 1,004,000 bytes. (3) Transfer Department to site 1, execute the join at site 1, and send the result to site 3. Total bytes transferred = 4,000 + 3,500 = 7,500 bytes.
  • 28. Query Processing in Distributed Databases. The result of this query will have 100 tuples, assuming that every department has a manager. The execution strategies are: (1) Transfer Employee and Department to the result site and perform the join at site 3. Total bytes transferred = 1,000,000 + 3,500 = 1,003,500 bytes. (2) Transfer Employee to site 2, execute the join at site 2, and send the result to site 3. Query result size = 40 * 100 = 4,000 bytes. Total bytes transferred = 4,000 + 1,000,000 = 1,004,000 bytes. (3) Transfer Department to site 1, execute the join at site 1, and send the result to site 3. Total bytes transferred = 4,000 + 3,500 = 7,500 bytes. Preferred strategy: strategy 3.
  • 29. Query Processing in Distributed Databases. Now suppose the result site is site 2. Possible strategies: (1) Transfer Employee to site 2, execute the query, and present the result to the user at site 2. Total bytes transferred = 1,000,000 bytes for both queries Q and Q'. (2) Transfer Department to site 1, execute the join at site 1, and send the result back to site 2. Total bytes transferred for Q = 400,000 + 3,500 = 403,500 bytes and for Q' = 4,000 + 3,500 = 7,500 bytes.
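When the result site is site 2, the same arithmetic covers both queries with one small function. The function name `costs_at_site_2` is an illustrative assumption; the sizes are those given on the slides.

```python
# Sizes from the slides.
EMPLOYEE_BYTES = 1_000_000
DEPARTMENT_BYTES = 3_500
RESULT_TUPLE_BYTES = 40


def costs_at_site_2(result_tuples):
    """Bytes transferred by each strategy when the result is needed at site 2."""
    result_bytes = result_tuples * RESULT_TUPLE_BYTES
    return {
        1: EMPLOYEE_BYTES,                   # ship Employee to site 2 and join there
        2: DEPARTMENT_BYTES + result_bytes,  # ship Department to site 1, ship result back
    }


costs_q = costs_at_site_2(10_000)    # Q returns 10,000 tuples
costs_q_prime = costs_at_site_2(100)  # Q' returns 100 tuples
```

For both queries the second strategy moves fewer bytes under these figures.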
  • 30. Query Processing in Distributed Databases. Semijoin: the objective is to reduce the number of tuples in a relation before transferring it to another site. Example execution of Q or Q': (1) Project the join attributes of Department at site 2 and transfer them to site 1. For Q, 4 * 100 = 400 bytes are transferred, and for Q', 9 * 100 = 900 bytes are transferred. (2) Join the transferred file with the Employee relation at site 1, and transfer the required attributes from the resulting file to site 2. For Q, 34 * 10,000 = 340,000 bytes are transferred, and for Q', 39 * 100 = 3,900 bytes are transferred. (3) Execute the query by joining the transferred file with Department and present the result to the user at site 2.
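Adding up the two transfers of the semijoin plan shows how it compares with shipping a whole relation. The function name `semijoin_cost` and its parameter names are assumptions made for this sketch; the byte counts per tuple are the ones quoted on the slide, and the totals are simply their sums.

```python
def semijoin_cost(join_attr_bytes, dept_tuples, reduced_tuple_bytes, matching_emp_tuples):
    """Return (step 1 bytes, step 2 bytes, total) for the semijoin plan."""
    # Step 1: ship only the join column of Department from site 2 to site 1.
    step1 = join_attr_bytes * dept_tuples
    # Step 2: ship the needed attributes of the matching Employee tuples back to site 2.
    step2 = reduced_tuple_bytes * matching_emp_tuples
    return step1, step2, step1 + step2


# Q: join column of 4 bytes, 34 bytes per shipped Employee tuple, all 10,000 employees match.
q_cost = semijoin_cost(4, 100, 34, 10_000)

# Q': join column of 9 bytes, 39 bytes per shipped tuple, only the 100 managers match.
q_prime_cost = semijoin_cost(9, 100, 39, 100)

print(q_cost)        # (400, 340000, 340400)
print(q_prime_cost)  # (900, 3900, 4800)
```

With these numbers the semijoin moves 340,400 bytes for Q and 4,800 bytes for Q', compared with 403,500 and 7,500 bytes for the cheapest plan on the previous slide. The saving is larger for Q', where few Employee tuples take part in the join.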
  • 31. Concurrency Control and Recovery. Distributed databases encounter a number of concurrency control and recovery problems that are not present in centralized databases. Some of them are: dealing with multiple copies of data items; failure of individual sites; communication link failure; distributed commit; distributed deadlock.
  • 32. Concurrency Control and Recovery. Details. Dealing with multiple copies of data items: concurrency control must maintain global consistency. Likewise, the recovery mechanism must recover all copies and maintain consistency after recovery. Failure of individual sites: database availability must not be affected by the failure of one or two sites, and the recovery scheme must recover those sites before they are made available for use again.
  • 33. Concurrency Control and Recovery. Details (contd.). Communication link failure: this failure may create a network partition, which would affect database availability even though all database sites may be running. Distributed commit: a transaction may be split into parts that are executed at a number of sites. This requires a two-phase or three-phase commit approach for transaction commit. Distributed deadlock: since transactions are processed at multiple sites, two or more sites may become involved in a deadlock. This must be resolved in a distributed manner.
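The slide names two-phase commit without describing it. As a rough illustration only, the sketch below shows the basic idea of a coordinator that first collects a vote from every participating site and then commits only if all of them voted yes. It deliberately ignores logging, time-outs, and site failures, and all names in it are hypothetical.

```python
def two_phase_commit(participants):
    """Run a simplified two-phase commit.

    `participants` maps a site name to a callable that returns True when the
    site is ready to commit its part of the transaction, and False otherwise.
    """
    # Phase 1 (voting): ask every site to prepare and collect its vote.
    votes = {site: prepare() for site, prepare in participants.items()}

    # Phase 2 (decision): commit only if every site voted yes, otherwise abort everywhere.
    decision = "commit" if all(votes.values()) else "abort"
    return decision, votes


all_ready = two_phase_commit({"site1": lambda: True, "site2": lambda: True})
one_refuses = two_phase_commit({"site1": lambda: True, "site2": lambda: False})
```

The point of the second phase is that every site receives the same decision, so the parts of a transaction never commit at some sites and abort at others.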
  • 34. Concurrency Control and Recovery. Distributed concurrency control based on a distinguished copy of a data item. Primary site technique: a single site is designated as the primary site, which serves as the coordinator for transaction management. [Figure: Sites 1 to 5 connected by a communications network, with one site marked as the primary site.]
  • 35. Concurrency Control and Recovery. Transaction management: concurrency control and commit are managed by the primary site. In two-phase locking, this site manages the locking and releasing of data items. If all transactions follow the two-phase policy at all sites, then serializability is guaranteed.
  • 36. Concurrency Control and Recovery. Transaction management. Advantages: the technique is an extension of centralized two-phase locking, so implementation and management are simple. Data items are locked at only one site but can be accessed at any site. Disadvantages: all transaction management activities go to the primary site, which is likely to overload it. If the primary site fails, the entire system is inaccessible. To aid recovery, a backup site is designated that behaves as a shadow of the primary site. If the primary site fails, the backup site can act as the primary site.
  • 37. Concurrency Control and Recovery. Primary copy technique: in this approach, instead of a site, one copy of each data item is designated as its primary copy. To lock a data item, only the primary copy of that data item is locked. Advantages: since primary copies are distributed across various sites, no single site is overloaded with locking and unlocking requests. Disadvantages: identifying the primary copy is complex. A distributed directory must be maintained, possibly at all sites.
  • 38. Concurrency Control and Recovery. Recovery from a coordinator failure: in both approaches a coordinator site or copy may become unavailable, which requires the selection of a new coordinator. Primary site approach with no backup site: abort and restart all active transactions at all sites, elect a new coordinator, and initiate transaction processing. Primary site approach with a backup site: suspend all active transactions, designate the backup site as the primary site, and identify a new backup site. The new primary site receives all transaction management information needed to resume processing. Both primary and backup sites fail, or there is no backup site: use an election process to select a new coordinator site.
  • 39. Concurrency Control and Recovery. Concurrency control based on voting: there is no primary copy or coordinator. A lock request is sent to all sites that hold a copy of the data item. If a majority of those sites grant the lock, the requesting transaction gets the data item. The locking outcome (granted or denied) is sent to all these sites. To avoid an unacceptably long wait, a time-out period is defined. If the requesting transaction does not receive the vote information within this period, the transaction is aborted.
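The majority rule at the heart of the voting method fits in one function. In this sketch a vote of `None` stands for a site that did not reply before the time-out; the function name and the way votes are represented are assumptions made for illustration.

```python
def lock_granted_by_vote(votes, num_copies):
    """Decide a lock request under majority voting.

    `votes` maps each site holding a copy to True (grant), False (deny),
    or None (no reply before the time-out). The lock is granted only when
    strictly more than half of the copies vote to grant it.
    """
    granted = sum(1 for vote in votes.values() if vote is True)
    return granted > num_copies // 2


# Three copies: two grants form a majority even though one site never replied.
three_sites = lock_granted_by_vote({"s1": True, "s2": True, "s3": None}, 3)

# Four copies: two grants are exactly half, which is not a majority.
four_sites = lock_granted_by_vote({"s1": True, "s2": True, "s3": False, "s4": None}, 4)
```

Requiring a strict majority means two conflicting requests for the same item cannot both collect enough votes.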
  • 40. Client-Server Database Architecture. It consists of clients running client software, a set of servers that provide all database functionality, and a reliable communication infrastructure. [Figure: clients 1 to n connected to servers 1 to n.]
  • 41. Client-Server Database Architecture. Clients reach the server for the desired service, but the server does not reach out to clients. The server software is responsible for local data management at a site, much like centralized DBMS software. The client software is responsible for most of the distribution functions. The communication software manages communication among clients and servers.
  • 42. Client-Server Database Architecture. The processing of an SQL query goes as follows: the client parses a user query and decomposes it into a number of independent subqueries. Each subquery is sent to the appropriate site for execution. Each server processes its subquery and sends the result to the client. The client combines the results of the subqueries and produces the final result.
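The decompose, execute, and combine steps can be sketched with in-memory data standing in for the servers. This is an assumption-heavy illustration: each hypothetical server holds one horizontal fragment of Employee, the subquery is the same selection sent to every site, and combining the results is a simple concatenation.

```python
# Hypothetical servers, each holding one horizontal fragment of Employee.
servers = {
    "site1": [{"SSN": "111", "Name": "Smith", "Salary": 30000}],
    "site2": [
        {"SSN": "222", "Name": "Wong", "Salary": 40000},
        {"SSN": "333", "Name": "Zelaya", "Salary": 25000},
    ],
}


def run_subquery(site_rows, predicate):
    """Server side: evaluate the subquery against the locally stored rows."""
    return [row for row in site_rows if predicate(row)]


def client_query(servers, predicate):
    """Client side: send the subquery to every site and combine the partial results."""
    partial_results = [run_subquery(rows, predicate) for rows in servers.values()]
    return [row for part in partial_results for row in part]


# "Employees earning at least 30,000", answered from both sites.
well_paid = client_query(servers, lambda row: row["Salary"] >= 30000)
```

A real client would also have to parse the SQL text and decide which sites hold relevant data; both steps are skipped here.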
  • 43. Recap: Distributed Database Concepts; Data Fragmentation, Replication and Allocation; Types of Distributed Database Systems; Query Processing; Concurrency Control and Recovery; 3-Tier Client-Server Architecture.