
1 Introduction


Principles of Distributed Database Systems
Dr. Phan Thị Hà
Faculty of Information Technology, PTIT

© 2020, M.T. Özsu & P. Valduriez
Outline
◼ Introduction
◼ Distributed and Parallel Database Design
◼ Distributed Data Control
◼ Distributed Query Processing
◼ Distributed Transaction Processing
◼ Data Replication
◼ Database Integration – Multidatabase Systems
◼ Parallel Database Systems
◼ Peer-to-Peer Data Management
◼ Big Data Processing
◼ NoSQL, NewSQL and Polystores
◼ Web Data Management
Outline
◼ Introduction
❑ What is a distributed DBMS
❑ History
❑ Distributed DBMS promises
❑ Design issues
❑ Distributed DBMS architecture

Distributed Computing

◼ A number of autonomous processing elements (not necessarily homogeneous) that are interconnected by a computer network and that cooperate in performing their assigned tasks.
◼ What is being distributed?
❑ Processing logic
❑ Function
❑ Data
❑ Control

Current Distribution – Geographically Distributed Data Centers

What is a Distributed Database System?

A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network.

A distributed database management system (Distributed DBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users.

What is not a DDBS?

◼ A timesharing computer system
◼ A loosely or tightly coupled multiprocessor system
◼ A database system which resides at one of the nodes of a network of computers: this is a centralized database on a network node

Distributed DBMS Environment

Implicit Assumptions

◼ Data stored at a number of sites → each site logically consists of a single processor
◼ Processors at different sites are interconnected by a computer network → not a multiprocessor system
❑ Parallel database systems
◼ Distributed database is a database, not a collection of files → data logically related as exhibited in the users’ access patterns
❑ Relational data model
◼ Distributed DBMS is a full-fledged DBMS
❑ Not remote file system, not a TP system

Important Point

Logically integrated
but
Physically distributed

Outline
◼ Introduction

❑ History

History – File Systems

History – Database Management

History – Early Distribution
Peer-to-Peer (P2P)

History – Client/Server

History – Data Integration

History – Cloud Computing

On-demand, reliable services provided over the Internet in a cost-efficient manner
◼ Cost savings: no need to maintain dedicated compute power
◼ Elasticity: better adaptivity to changing workload

Data Delivery Alternatives

◼ Delivery modes
❑ Pull-only
❑ Push-only
❑ Hybrid
◼ Frequency
❑ Periodic
❑ Conditional
❑ Ad-hoc or irregular
◼ Communication Methods
❑ Unicast
❑ One-to-many
◼ Note: not all combinations make sense
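
The difference between the pull and push delivery modes can be sketched in a few lines of Python. This is only an illustration: the `DataSource` class and its `get`, `subscribe`, and `update` methods are hypothetical names introduced here, not part of any system described in the slides.

```python
class DataSource:
    """Hypothetical server that supports both delivery modes."""

    def __init__(self):
        self._value = None
        self._subscribers = []

    def get(self):
        # Pull: the client asks, the server answers.
        return self._value

    def subscribe(self, callback):
        # Push: the client registers once and is notified afterwards.
        self._subscribers.append(callback)

    def update(self, value):
        self._value = value
        for callback in self._subscribers:
            callback(value)  # one-to-many delivery to every subscriber


source = DataSource()
received = []
source.subscribe(received.append)  # push-mode client

source.update("v1")
source.update("v2")
pulled = source.get()              # pull-mode client sees only the latest value
```

In this sketch the push-mode client receives every update, while the pull-mode client sees only what is current at the moment it asks. A hybrid mode would combine the two.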
Outline
◼ Introduction

❑ Distributed DBMS promises


Distributed DBMS Promises

◼ Transparent management of distributed, fragmented, and replicated data
◼ Improved reliability/availability through distributed transactions
◼ Improved performance
◼ Easier and more economical system expansion

Transparency

◼ Transparency is the separation of the higher-level semantics of a system from the lower-level implementation issues.
◼ Fundamental issue is to provide data independence in the distributed environment
❑ Network (distribution) transparency
❑ Replication transparency
❑ Fragmentation transparency
◼ horizontal fragmentation: selection
◼ vertical fragmentation: projection
◼ hybrid
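
To make the two fragmentation styles concrete, here is a minimal sketch in plain Python. The EMP rows, the LOC attribute, and the helper names (`select`, `project`, `join_on_key`) are assumptions made for illustration and do not come from the slides. Horizontal fragments are produced by a selection and rebuilt with a union. Vertical fragments are produced by a projection that keeps the key and rebuilt with a join on that key.

```python
# Hypothetical EMP relation; attribute names and rows are made up for illustration.
EMP = [
    {"ENO": "E1", "ENAME": "J. Doe", "TITLE": "Elect. Eng.", "LOC": "Paris"},
    {"ENO": "E2", "ENAME": "M. Smith", "TITLE": "Syst. Anal.", "LOC": "Boston"},
    {"ENO": "E3", "ENAME": "A. Lee", "TITLE": "Mech. Eng.", "LOC": "Paris"},
]


def select(relation, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection: keep only the named attributes of every row."""
    return [{a: row[a] for a in attributes} for row in relation]


def join_on_key(left, right, key):
    """Join two vertical fragments on their shared key attribute."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


# Horizontal fragmentation: one fragment per location (selection).
emp_paris = select(EMP, lambda r: r["LOC"] == "Paris")
emp_boston = select(EMP, lambda r: r["LOC"] == "Boston")

# Vertical fragmentation: each fragment keeps the key ENO (projection).
emp_names = project(EMP, ["ENO", "ENAME"])
emp_jobs = project(EMP, ["ENO", "TITLE", "LOC"])

# Reconstruction: union for horizontal fragments, join for vertical ones.
rebuilt_horizontal = emp_paris + emp_boston
rebuilt_vertical = join_on_key(emp_names, emp_jobs, "ENO")
```

Fragmentation transparency means the user keeps querying EMP as a whole, and the system performs this kind of reconstruction behind the scenes.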

Example

Transparent Access

[Figure: five sites connected by a communication network, with the query shown at Tokyo]

Tokyo

SELECT ENAME, SAL
FROM EMP, ASG, PAY
WHERE DUR > 12
AND EMP.ENO = ASG.ENO
AND PAY.TITLE = EMP.TITLE

Data stored at each site:
◼ Paris: Paris projects, Paris employees, Paris assignments, Boston employees
◼ Boston: Boston projects, Boston employees, Boston assignments
◼ New York: Boston projects, New York employees, New York projects, New York assignments
◼ Montreal: Montreal projects, Paris projects, New York projects with budget > 200000, Montreal employees, Montreal assignments
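
The point of transparent access is that the user writes the query above exactly as if EMP, ASG, and PAY were ordinary local tables. The following sketch runs the same query against a single in-memory SQLite database that stands in for the user's global view. The column lists and sample rows are assumptions made for illustration; only the attribute names used in the query come from the slide.

```python
import sqlite3

# One in-memory database plays the role of the global, location-independent view.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMP (ENO TEXT PRIMARY KEY, ENAME TEXT, TITLE TEXT);
    CREATE TABLE ASG (ENO TEXT, PNO TEXT, DUR INTEGER);
    CREATE TABLE PAY (TITLE TEXT PRIMARY KEY, SAL INTEGER);

    -- Illustrative sample data (not from the slides).
    INSERT INTO EMP VALUES ('E1', 'J. Doe', 'Elect. Eng.'),
                           ('E2', 'M. Smith', 'Syst. Anal.');
    INSERT INTO ASG VALUES ('E1', 'P1', 12),
                           ('E2', 'P1', 24);
    INSERT INTO PAY VALUES ('Elect. Eng.', 40000),
                           ('Syst. Anal.', 34000);
""")

# The query from the slide, unchanged: no site names, no fragment names.
QUERY = """
    SELECT ENAME, SAL
    FROM EMP, ASG, PAY
    WHERE DUR > 12
      AND EMP.ENO = ASG.ENO
      AND PAY.TITLE = EMP.TITLE
"""
rows = conn.execute(QUERY).fetchall()
print(rows)
```

In a distributed DBMS the same text would be accepted, and the system, not the user, would work out which sites hold the relevant employee and assignment fragments.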

Distributed Database - User View

Distributed Database

Distributed DBMS - Reality
[Figure: several sites, each running DBMS software and serving user queries and user applications, connected through a communication subsystem]

Types of Transparency

◼ Data independence
◼ Network transparency (or distribution transparency)
❑ Location transparency
❑ Naming transparency
◼ Fragmentation transparency
◼ Replication transparency

Reliability Through Transactions

◼ Replicated components and data should make distributed DBMS more reliable.
◼ Distributed transactions provide
❑ Concurrency transparency
❑ Failure atomicity
◼ Distributed transaction support requires implementation of
❑ Distributed concurrency control protocols
❑ Commit protocols
◼ Data replication
❑ Great for read-intensive workloads, problematic for updates
❑ Replication protocols
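
The slide names commit protocols without choosing one. As an illustration, the sketch below shows the basic shape of two-phase commit, a widely used commit protocol that is picked here only as an example. The `Participant` class and the `two_phase_commit` function are hypothetical, and the sketch deliberately leaves out logging, timeouts, and site failures, which a real protocol must handle.

```python
class Participant:
    """A site holding part of a distributed transaction (hypothetical)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote. A real site would force a log record before voting yes.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Return True if the transaction commits at every site, else False."""
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    # Phase 2: broadcast the single global decision.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision
```

Failure atomicity shows up in the outcome: either every participant ends in the committed state or every participant ends in the aborted state, never a mixture.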

Potentially Improved Performance

◼ Proximity of data to its points of use
❑ Requires some support for fragmentation and replication
◼ Parallelism in execution
❑ Inter-query parallelism
❑ Intra-query parallelism
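
Intra-query parallelism can be illustrated with a single aggregate that is split across fragments. In the sketch below the salary values and the three-way split are made up, and a thread pool stands in for three sites working at the same time. It shows the structure of the computation only and is not a performance demonstration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical salary column, split into three fragments as if stored at three sites.
fragments = [
    [40000, 34000, 27000],
    [24000, 40000],
    [34000, 27000, 24000],
]


def local_sum(fragment):
    """Partial aggregate computed where the fragment lives."""
    return sum(fragment)


# Each fragment is processed by its own worker; results come back in fragment order.
with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
    partial_sums = list(pool.map(local_sum, fragments))

# The coordinator combines the partial results into the final answer.
total = sum(partial_sums)
```

Inter-query parallelism, by contrast, would run several independent queries at once rather than splitting one query into parts.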

Scalability

◼ Issue is database scaling and workload scaling
◼ Adding processing and storage power
❑ Scale-out: add more servers
❑ Scale-up: increase the capacity of one server → has limits

Outline
◼ Introduction

❑ Design issues

Distributed DBMS Issues

◼ Distributed database design
❑ How to distribute the database
❑ Replicated & non-replicated database distribution
❑ A related problem in directory management
◼ Distributed query processing
❑ Convert user transactions to data manipulation instructions
❑ Optimization problem
◼ min{cost = data transmission + local processing}
❑ General formulation is NP-hard
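
The cost objective above can be read as a function that scores each candidate execution strategy. The sketch below compares two hypothetical strategies for a join between two sites. The unit costs, tuple counts, and the `Strategy` class are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    tuples_shipped: int    # tuples sent over the network
    tuples_processed: int  # tuples touched by local operators


# Assumed unit costs; a real optimizer would estimate these from statistics.
COST_PER_TUPLE_SHIPPED = 10.0
COST_PER_TUPLE_PROCESSED = 1.0


def total_cost(s: Strategy) -> float:
    """cost = data transmission + local processing"""
    return (s.tuples_shipped * COST_PER_TUPLE_SHIPPED
            + s.tuples_processed * COST_PER_TUPLE_PROCESSED)


candidates = [
    Strategy("ship whole ASG to the EMP site", tuples_shipped=1000, tuples_processed=1400),
    Strategy("filter ASG locally, ship matches", tuples_shipped=100, tuples_processed=1500),
]

# Exhaustive comparison is fine for two candidates.
best = min(candidates, key=total_cost)
```

With only two candidates the minimum is found by direct comparison. Because the general formulation is NP-hard, the space of strategies for larger queries is usually searched with heuristics rather than enumerated in full.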

Distributed DBMS Issues

◼ Distributed concurrency control
❑ Synchronization of concurrent accesses
❑ Consistency and isolation of transactions' effects
❑ Deadlock management
◼ Reliability
❑ How to make the system resilient to failures
❑ Atomicity and durability
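
One common way to approach deadlock management is to look for a cycle in a wait-for graph, where an edge from T1 to T2 means that T1 is waiting for a lock held by T2. The sketch below assumes that the wait-for information from all sites has already been merged into one global graph. The function name and the dictionary representation are illustrative choices, not something the slides prescribe.

```python
def has_deadlock(wait_for):
    """Detect a cycle in a wait-for graph given as {txn: set of txns it waits for}."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {t: WHITE for t in wait_for}

    def visit(t):
        colour[t] = GREY
        for u in wait_for.get(t, ()):
            if colour.get(u, WHITE) == GREY:
                return True  # back edge to the current path: cycle found
            if colour.get(u, WHITE) == WHITE and visit(u):
                return True
        colour[t] = BLACK
        return False

    return any(colour[t] == WHITE and visit(t) for t in wait_for)


# T1 waits for T2 and T2 waits for T1: neither can proceed.
deadlocked = has_deadlock({"T1": {"T2"}, "T2": {"T1"}})

# T1 waits for T2, which waits for nobody: T2 finishes and then T1 continues.
safe = has_deadlock({"T1": {"T2"}, "T2": set()})
```

In the distributed setting the hard part is building the global graph, since a cycle may span several sites even when every local graph is acyclic.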

Distributed DBMS Issues

◼ Replication
❑ Mutual consistency
❑ Freshness of copies
❑ Eager vs lazy
❑ Centralized vs distributed
◼ Parallel DBMS
❑ Objectives: high scalability and performance
❑ Not geo-distributed
❑ Cluster computing

Related Issues

◼ Alternative distribution approaches
❑ Modern P2P
❑ World Wide Web (WWW or Web)
◼ Big data processing
❑ 4V: volume, variety, velocity, veracity
❑ MapReduce & Spark
❑ Stream data
❑ Graph analytics
❑ NoSQL
❑ NewSQL
❑ Polystores

Outline
◼ Introduction

❑ Distributed DBMS architecture

DBMS Implementation Alternatives

Dimensions of the Problem

◼ Distribution
❑ Whether the components of the system are located on the same machine or not
◼ Heterogeneity
❑ Various levels (hardware, communications, operating system)
❑ DBMS heterogeneity is the important one
◼ data model, query language, transaction management algorithms
◼ Autonomy
❑ Not well understood and most troublesome
❑ Various versions
◼ Design autonomy: Ability of a component DBMS to decide on issues related to its own design.
◼ Communication autonomy: Ability of a component DBMS to decide whether and how to communicate with other DBMSs.
◼ Execution autonomy: Ability of a component DBMS to execute local operations in any manner it wants to.

Client/Server Architecture

Advantages of Client-Server Architectures
◼ More efficient division of labor
◼ Horizontal and vertical scaling of resources
◼ Better price/performance on client machines
◼ Ability to use familiar tools on client machines
◼ Client access to remote data (via standards)
◼ Full DBMS functionality provided to client workstations
◼ Overall better system price/performance

Database Server

Distributed Database Servers

Peer-to-Peer Component Architecture

MDBS Components & Execution

Mediator/Wrapper Architecture

Cloud Computing

On-demand, reliable services provided over the Internet in a cost-efficient manner
◼ IaaS – Infrastructure-as-a-Service
◼ PaaS – Platform-as-a-Service
◼ SaaS – Software-as-a-Service
◼ DaaS – Database-as-a-Service

Simplified Cloud Architecture
