1 Introduction

The document outlines the concepts and architecture of distributed database management systems (DDBMS), highlighting their design, integration, and transaction management. It discusses the advantages of distributed systems, such as improved reliability, performance, and scalability, while also addressing challenges like query processing and data control. Key topics include types of transparency, data replication, and the evolution of client-server architectures in distributed environments.

Uploaded by

kieunty22416c
Copyright
© © All Rights Reserved

Outline

• Introduction
● What is a distributed DBMS
● Distributed DBMS Architecture

• Background
• Distributed Database Design
• Database Integration
• Semantic Data Control
• Distributed Query Processing
• Multidatabase query processing
• Distributed Transaction Management
• Data Replication
• Parallel Database Systems
• Distributed Object DBMS
• Peer-to-Peer Data Management
• Web Data Management
• Current Issues
Distributed © M. T. Özsu & P.
Ch.1/1
File Systems
History of Distributed DBMS
[Figure: programs 1, 2, and 3 each carry their own data description and each access a separate file (File 1, File 2, File 3).]
Before the advent of database systems in the 1960s
Database Management
[Figure: application programs 1, 2, and 3 (each with data semantics) access one shared database through the DBMS, which handles description, manipulation, and control.]
Motivation
[Figure: database technology (integration) and computer networks (distribution) combine into distributed database systems (integration).]
integration ≠ centralization
Distributed Computing
• A number of autonomous processing elements (not necessarily
homogeneous) that are interconnected by a computer network
and that cooperate in performing their assigned tasks.
• What is being distributed?
● Processing logic
● Function (various pieces of hardware or software)
● Data
● Control (execution of various tasks)

What is a Distributed Database System?
A distributed database (DDB) is a collection of multiple, logically
interrelated databases distributed over a computer network.

A distributed database management system (D–DBMS) is the
software that manages the DDB and provides an access
mechanism that makes this distribution transparent to the users.

Distributed database system (DDBS) = DDB + D–DBMS

Examples of distributed enterprise applications: Web-based applications,
electronic commerce over the Internet, multimedia applications
(news-on-demand, medical imaging), and manufacturing control systems.
What is not a DDBS?
• A timesharing computer system
• A loosely or tightly coupled multiprocessor system
• A database system which resides at one of the nodes of a
network of computers - this is a centralized database on a
network node

Centralized DBMS on a Network
[Figure: Sites 1 through 5 connected by a communication network, with a single central database on the network.]
Distributed DBMS Environment
[Figure: Sites 1 through 5 connected by a communication network, together forming the DDBMS environment.]
Implicit Assumptions
• Data stored at a number of sites 🡪 each site logically consists of a
single processor.
• Processors at different sites are interconnected by a computer
network 🡪 not a multiprocessor system
● Parallel database systems
• Distributed database is a database, not a collection of files 🡪 data
logically related as exhibited in the users’ access patterns
● Relational data model
• D-DBMS is a full-fledged DBMS
● Not remote file system

Data Delivery Alternatives
• Delivery modes
● Pull-only
● Push-only
● Hybrid
• Frequency
● Periodic
● Conditional
● Ad-hoc or irregular
• Communication Methods
● Unicast (one-to-one)
● One-to-many: multicast (selective set of receivers) or broadcast (unbounded set of receivers)
• Note: not all combinations make sense
Distributed DBMS Promises
❶ Transparent management of distributed, fragmented, and
replicated data

❷ Improved reliability/availability through distributed transactions

❸ Improved performance

❹ Easier and more economical system expansion

Transparency
• Transparency is the separation of the higher level semantics of a
system from the lower level implementation issues.
• Fundamental issue is to provide data independence in the distributed environment
● Network (distribution) transparency

● Replication transparency

● Fragmentation transparency
● horizontal fragmentation: selection
● vertical fragmentation: projection
● hybrid
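
The link between fragmentation and the relational operators can be illustrated with a minimal sketch. It assumes a small, made-up EMP relation (the rows, the LOC attribute, and all function names are hypothetical) and shows horizontal fragments produced by selection and vertical fragments produced by projection, together with how each kind is reassembled.

```python
# Hypothetical EMP relation; rows and the LOC attribute are illustrative only.
EMP = [
    {"ENO": "E1", "ENAME": "J. Doe", "TITLE": "Elect. Eng.", "LOC": "Paris"},
    {"ENO": "E2", "ENAME": "M. Smith", "TITLE": "Syst. Anal.", "LOC": "Boston"},
    {"ENO": "E3", "ENAME": "A. Lee", "TITLE": "Mech. Eng.", "LOC": "Paris"},
]


def select(relation, predicate):
    """Relational selection: keep whole rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Relational projection: keep only the named attributes of every row."""
    return [{a: row[a] for a in attributes} for row in relation]


# Horizontal fragmentation: each fragment is a selection over EMP.
emp_paris = select(EMP, lambda r: r["LOC"] == "Paris")
emp_other = select(EMP, lambda r: r["LOC"] != "Paris")

# Vertical fragmentation: each fragment is a projection that repeats the key.
emp_names = project(EMP, ["ENO", "ENAME"])
emp_jobs = project(EMP, ["ENO", "TITLE", "LOC"])


def reconstruct_horizontal(*fragments):
    """Horizontal fragments are recombined by union."""
    return [row for fragment in fragments for row in fragment]


def reconstruct_vertical(left, right, key):
    """Vertical fragments are recombined by joining on the shared key."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left]
```

A hybrid scheme would apply one of these operators to the fragments produced by the other. Fragmentation transparency means the user keeps querying EMP and never names emp_paris or emp_names directly.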

Example

Transparent Access
SELECT ENAME, SAL
FROM EMP, ASG, PAY
WHERE DUR > 12
AND EMP.ENO = ASG.ENO
AND PAY.TITLE = EMP.TITLE
[Figure: sites in Tokyo, Boston, Paris, Montreal, and New York connected by a communication network. Each site stores some of the projects, employees, and assignments data, for example Paris projects, Boston employees, and New York projects with budget > 200000.]
Distributed Database - User View
[Figure: the user sees a single distributed database.]
Distributed DBMS - Reality
[Figure: user queries and user applications at several sites, each site running its own DBMS software, all connected through a communication subsystem.]
Types of Transparency
• Data independence
• Network transparency (or distribution transparency)
● Location transparency
● Name transparency
• Replication transparency
• Fragmentation transparency

Reliability Through Transactions
• Replicated components and data should make distributed DBMS
more reliable.
• Distributed transactions provide
● Concurrency transparency
● Failure atomicity
• Distributed transaction support requires implementation of
● Distributed concurrency control protocols
● Commit protocols
• Data replication
● Great for read-intensive workloads, problematic for updates
● Requires the implementation of replica control protocols
Potentially Improved Performance
• Proximity of data to its points of use

● Requires some support for fragmentation and replication

• Parallelism in execution

● Inter-query parallelism

● Intra-query parallelism
● Interoperator parallelism
✓ pipeline parallelism
✓ Independent parallelism
● Intraoperator parallelism

Parallelism Requirements
• Have as much of the data required by each application at the site
where the application executes

● Full replication

• How about updates?

● Mutual consistency

● Freshness of copies

System Expansion
• Issue is database scaling

• Emergence of microprocessor and workstation technologies

● Demise of Grosch's law

● Client-server model of computing

• Data communication cost vs telecommunication cost

Distributed DBMS Issues
• Distributed Database Design
How to distribute the database?
● Replicated and non-replicated (partitioned) database distribution
● A related problem in directory management

• Distributed Data Control
A DBMS maintains data consistency by controlling how data is
accessed. This is called data control (view management, access control,
and integrity enforcement).
Distribution imposes additional challenges: it requires distributed
rule checking and enforcement.
Distributed DBMS Issues
• Query Processing
● Analyze queries and convert them into a series of data manipulation
operations
● Optimization problem
● Min {cost = data transmission + local processing}
● General formulation is NP-hard
• Concurrency Control
● Synchronization of concurrent accesses
● Consistency and isolation of transactions' effects
● Deadlock management
• Reliability
● How to make the system resilient to failures
● Atomicity and durability
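
The query-optimization objective listed above, Min {cost = data transmission + local processing}, can be illustrated with a minimal sketch. It assumes a toy cost model: the Plan class, the weights, and the candidate plans with their byte and tuple counts are all hypothetical. The sketch enumerates the candidates exhaustively, which is only workable for a handful of plans; because the general formulation is NP-hard, real optimizers rely on heuristics or restricted search spaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """One candidate execution strategy for a distributed join (hypothetical)."""
    name: str
    bytes_shipped: int     # data moved between sites
    tuples_processed: int  # work done locally at the sites


def cost(plan, cost_per_byte=1.0, cost_per_tuple=0.1):
    """cost = data transmission + local processing, with assumed unit weights."""
    transmission = plan.bytes_shipped * cost_per_byte
    local = plan.tuples_processed * cost_per_tuple
    return transmission + local


def cheapest(plans, **weights):
    """Exhaustively pick the plan with the minimum total cost."""
    return min(plans, key=lambda p: cost(p, **weights))


# Made-up numbers for joining EMP and ASG stored at different sites.
plans = [
    Plan("ship EMP to the ASG site", bytes_shipped=40_000, tuples_processed=1_400),
    Plan("ship ASG to the EMP site", bytes_shipped=100_000, tuples_processed=1_400),
    Plan("ship both to a third site", bytes_shipped=140_000, tuples_processed=1_400),
]
best = cheapest(plans)
```

With these weights transmission dominates, so the plan that ships the smaller relation wins. Setting cost_per_byte close to zero, as on a fast local network, makes local processing the deciding term instead.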
Distributed DBMS Issues
• Replication
Distributed database is (partially or fully) replicated
⇒ implement protocols that ensure the consistency of the replicas
Protocols
● Eager : force the updates to be applied to all the replicas before
the transaction completes
● Lazy : the transaction updates one copy (called the master) from
which updates are propagated to the others after the transaction
completes
• Parallel DBMSs
Multiple processors work in parallel to provide the database
services, improving performance through parallelization of various
operations
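
The difference between the eager and lazy replication protocols described above can be shown with a minimal single-process sketch. It assumes one data item, treats replica 0 as the master, and ignores failures, concurrency, and commit protocols; the class and method names are hypothetical.

```python
class ReplicatedItem:
    """One logical data item stored as several physical copies (toy model)."""

    def __init__(self, n_replicas, initial=0):
        self.replicas = [initial] * n_replicas  # replica 0 plays the master
        self.pending = []                       # lazy updates not yet propagated

    def eager_write(self, value):
        """Eager: every replica is updated before the transaction completes."""
        for i in range(len(self.replicas)):
            self.replicas[i] = value

    def lazy_write(self, value):
        """Lazy: only the master is updated; the transaction completes now."""
        self.replicas[0] = value
        self.pending.append(value)

    def propagate(self):
        """Runs after the transaction: push queued updates to the other copies."""
        for value in self.pending:
            for i in range(1, len(self.replicas)):
                self.replicas[i] = value
        self.pending.clear()

    def consistent(self):
        """True when all copies hold the same value."""
        return len(set(self.replicas)) == 1
```

After lazy_write and before propagate, a read served by a non-master copy returns a stale value. That window is the price of the shorter transaction, whereas eager_write pays the propagation cost inside the transaction.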
Relationship Between Issues
[Figure: diagram relating directory management, query processing, distribution design, reliability, concurrency control, and deadlock management.]
Architecture
• Defines the structure of the system
● components identified
● functions of each component defined
● interrelationships and interactions between components defined

ANSI/SPARC Architecture
[Figure: users see external views defined by the external schema; below them is the conceptual view defined by the conceptual schema, and below that the internal view defined by the internal schema.]
Generic DBMS Architecture

DBMS Implementation Alternatives

Dimensions of the Problem
• Autonomy : refers to the distribution of control, not of data
● Indicates the degree to which individual DBMSs can operate
independently
● Design autonomy: Ability of a component DBMS to decide on issues
related to its own design.
● Communication autonomy: Ability of a component DBMS to decide
whether and how to communicate with other DBMSs.
● Execution autonomy: Ability of a component DBMS to execute local
operations in any manner it wants to.
● Classification highlights three alternatives
● tight integration
● semiautonomous systems
● total isolation

Dimensions of the Problem
• Distribution : this dimension deals with data
● Alternatives fall into two classes :
● client/server distribution
✓ Concentrates data management duties at servers
✓ Client/server DBMSs represent a practical compromise to
distributing functionality
● peer-to-peer distribution (or full distribution)
✓ No distinction of client machines versus servers.
Each machine has full DBMS functionality
✓ Can communicate with other machines to execute queries
and transactions

Dimensions of the Problem
• Heterogeneity
● Various levels (hardware, communications, operating system)
● The perspective of this book relates to
● data models
✓ Different modeling tools create heterogeneity
● query languages
✓ Use of completely different data access paradigms in
different data models
✓ Covers differences in vendors’ query languages
✓ Big data platforms and NoSQL systems have significantly
variable access languages and mechanisms.
● transaction management protocols

Client/Server Architecture

Functionality of Client-Server Architectures
• Relational client/server DBMSs
● The server does most of the data management work (query
processing and optimization, transaction management, and
storage management)
• DBMS client module
● Responsible for managing the data that is cached to the client
● Manages the transaction locks

Relational systems: the communication between the clients and
the server(s) is at the level of SQL statements.
The client passes SQL queries to the server without trying to
understand or optimize them.
Client-Server Architectures
Two-level client/server architecture: makes it easier to manage the
complexity of modern DBMSs and the complexity of distribution.
• Multiple client/single server
● Not much different from centralized databases
● Important difference from centralized systems: data is cached
at the client, so cache coherence protocols must be deployed
• Multiple client/multiple server
Two alternative management strategies
● Each client manages its own connection to the appropriate server
Called “heavy client” systems
● Each client knows of only its “home server”
Concentrates the data management functionality at the
servers
Called “light client” systems
Database Servers
Client/server can be naturally extended to provide for a more
efficient function distribution on different kinds of servers.

Three-tier distributed system architecture.
Distributed Database Servers
n-tier distributed approach

Advantages :
✔ Single focus on data management
✔ Overall performance of database management can be significantly enhanced
✔ Database servers can also exploit advanced hardware

Costs :
Additional overhead introduced by another layer of communication
between the application and the data servers
Advantages of Client-Server Architectures
• More efficient division of labor
• Horizontal and vertical scaling of resources
• Better price/performance on client machines
• Ability to use familiar tools on client machines
• Client access to remote data (via standards)
• Full DBMS functionality provided to client workstations
• Overall better system price/performance

Peer-to-Peer Systems
Database design follows a top-down design
• Input is a (centralized) database with its own schema definition
(global conceptual schema—GCS)
• This database is partitioned and allocated to sites of the
distributed DBMS
• At each site, there is a local database with its own schema (called
the local conceptual schema—LCS)

• The user formulates queries according to the GCS, irrespective of its
location
• The distributed DBMS translates global queries into a group of local
queries, which are executed by distributed DBMS components at
different sites that communicate with one another
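
The translation of a global query into local queries can be sketched as follows. The sketch assumes a global EMP relation that is horizontally partitioned across two sites, so the global answer is the union of the local answers. The site names, rows, and function names are hypothetical, and a real distributed DBMS would also localize, optimize, and ship the subqueries over the network.

```python
# Hypothetical local databases, one per site, each holding a partition of EMP.
SITES = {
    "Paris": [
        {"ENO": "E1", "ENAME": "J. Doe", "SAL": 40000},
        {"ENO": "E3", "ENAME": "A. Lee", "SAL": 27000},
    ],
    "Boston": [
        {"ENO": "E2", "ENAME": "M. Smith", "SAL": 34000},
    ],
}


def local_query(site, predicate, attributes):
    """Executed by the component at one site, against its local database only."""
    return [
        {a: row[a] for a in attributes}
        for row in SITES[site]
        if predicate(row)
    ]


def global_query(predicate, attributes):
    """Written against the global schema; the caller never names a site."""
    result = []
    for site in SITES:  # one local query per site that holds a partition
        result.extend(local_query(site, predicate, attributes))
    return result


well_paid = global_query(lambda r: r["SAL"] > 30000, ["ENAME"])
```

The caller of global_query sees only EMP attributes, which is the sense in which queries are formulated over the GCS while the work is carried out against the local schemas.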
Peer-to-Peer Systems

Peer-to-Peer Systems
Components of a distributed DBMS
• The first major component (user processor)
● user interface handler
● data controller
● global query optimizer and decomposer
● distributed execution monitor

• The second major component (data processor)


● local query optimizer
● local recovery manager
● runtime support processor

Multidatabase Systems
Multidatabase systems (MDBSs) represent the case where
individual DBMSs are fully autonomous and have no concept of
cooperation.
Differences between MDBSs and logically integrated distributed
DBMSs
• Global conceptual schema
● MDBSs: represents only the collection of some of the local
databases that each local DBMS wants to share. The GCS (which is
also called a mediated schema) is defined by integrating (possibly
parts of) local conceptual schemas
● Logically integrated distributed DBMSs: defines the conceptual view
of the entire database
Multidatabase Systems

• The global database
● MDBSs: only a subset of the union of local databases
● Logically integrated distributed DBMSs: equal to the union of local
databases

• The component-based architecture model
● MDBSs: provide a layer of software that runs on top of individual
DBMSs and provides users with the facilities of accessing various
databases
● Logically integrated distributed DBMSs: each site is a full-fledged
DBMS that manages a different database
Multidatabase Systems
The MDBS layer may run on multiple sites or on a central site.
To the individual DBMSs, the MDBS layer is simply another application
that submits requests and receives answers.
A popular implementation architecture for MDBSs is the
mediator/wrapper approach
• A mediator “is a software module that exploits encoded knowledge
about certain sets or subsets of data to create information for a higher
layer of applications”
Each mediator performs a particular function with clearly defined
interfaces
Each module in the MDBS layer is realized as a mediator.
The mediator level implements the GCS: it handles user queries
over the GCS and performs the MDBS functionality
The mediators typically operate using a common data model and
interface language.
Multidatabase Systems
• Wrappers : provide a mapping between a source DBMS’s view
and the mediators’ view.
One can view the collection of mediators as a middleware
layer that provides services above the source systems
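
The mediator/wrapper approach can be illustrated with a minimal sketch. It assumes two made-up sources with different native formats; each wrapper maps its source's view to a common tuple format, and one mediator answers a query over the integrated view. All class, method, and attribute names are hypothetical.

```python
class PayrollWrapper:
    """Wraps a hypothetical source that exports 'eno;salary' text lines."""

    def __init__(self, lines):
        self.lines = lines

    def tuples(self):
        # Map the source's text format to the mediators' common view.
        for line in self.lines:
            eno, salary = line.split(";")
            yield {"ENO": eno, "SAL": int(salary)}


class HrWrapper:
    """Wraps a hypothetical source that uses different attribute names."""

    def __init__(self, records):
        self.records = records

    def tuples(self):
        # Rename the source's attributes to the mediators' common names.
        for rec in self.records:
            yield {"ENO": rec["emp_id"], "ENAME": rec["full_name"]}


class Mediator:
    """Answers queries over the mediated schema using only wrapper output."""

    def __init__(self, hr, payroll):
        self.hr = hr
        self.payroll = payroll

    def employees_with_salary(self):
        # Join the two wrapped sources on the common ENO attribute.
        salary = {t["ENO"]: t["SAL"] for t in self.payroll.tuples()}
        return [
            {**t, "SAL": salary[t["ENO"]]}
            for t in self.hr.tuples()
            if t["ENO"] in salary
        ]


mediator = Mediator(
    HrWrapper([{"emp_id": "E1", "full_name": "J. Doe"},
               {"emp_id": "E2", "full_name": "M. Smith"}]),
    PayrollWrapper(["E1;40000", "E3;27000"]),
)
```

The mediator never touches a source format directly, so adding a source only requires a new wrapper that yields tuples in the common format.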

Cloud Computing
Technologies
• Supporting applications over the web : service-oriented
architectures (SOA) for high-level communication of applications
through web services.
• Utility computing for packaging computing and storage resources
as services
• Cluster and virtualization technologies to manage lots of
computing and storage resources
• Autonomous computing to enable self-management of complex
infrastructure.
The main functions provided by clouds :
Security, directory management, resource management
(provisioning, allocation, monitoring), and data management
(storage, file management, database management, data
Cloud Computing
The cloud provides various levels of functionality :
• Infrastructure-as-a-Service (IaaS)
● the delivery of a computing infrastructure (i.e., computing,
networking, and storage resources) as a service
• Platform-as-a-Service (PaaS)
● the delivery of a computing platform with development tools and
APIs as a service
• Software-as-a-Service (SaaS)
● the delivery of application software as a service
• Database-as-a-Service (DaaS)
● the delivery of database as a service.

Cloud Computing
Advantages of cloud computing
• Cost
• Ease of access and use
• Quality of service
• Innovation
• Elasticity
Disadvantages of cloud computing
• Provider dependency
• Loss of control
• Security
• Hidden costs
Cloud Computing
Three main multitenant database models with increasing
resource sharing and performance at the expense of less
isolation and increased complexity.
• Shared DBMS server
• Shared database
• Shared tables
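
The shared tables model, the one with the most sharing and the least isolation of the three, can be sketched with Python's built-in sqlite3 module. The sketch assumes a made-up orders table in which a tenant_id column is the only thing separating tenants; the table, column, and tenant names are hypothetical.

```python
import sqlite3

# One in-memory database and one table shared by all tenants.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (tenant_id TEXT NOT NULL, order_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("acme", 1, 10.0), ("acme", 2, 25.0), ("globex", 1, 99.0)],
)


def orders_for(connection, tenant_id):
    """Every query must filter on tenant_id; isolation rests on this predicate."""
    cursor = connection.execute(
        "SELECT order_id, amount FROM orders "
        "WHERE tenant_id = ? ORDER BY order_id",
        (tenant_id,),
    )
    return cursor.fetchall()
```

In the shared database model each tenant would instead get its own tables inside one database, and in the shared DBMS server model its own database on one server, trading some resource sharing for stronger isolation.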
