6-Query Intro
• Introduction
• Background
• Distributed Database Design
• Database Integration
• Semantic Data Control
• Distributed Query Processing
➡ Overview
➡ Query decomposition and localization
➡ Distributed query optimization
• Multidatabase Query Processing
• Distributed Transaction Management
• Data Replication
• Parallel Database Systems
• Distributed Object DBMS
• Peer-to-Peer Data Management
• Web Data Management
• Current Issues
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.6/1
Query Processing in a DDBMS
[Figure: a high-level user query is passed to the query processor]
• Query optimization
➡ How do we determine the “best” execution plan?
Strategy 1
Π_{ENAME}(σ_{RESP=“Manager” ∧ EMP.ENO=ASG.ENO}(EMP × ASG))
Strategy 2
Π_{ENAME}(EMP ⋈_{ENO} (σ_{RESP=“Manager”}(ASG)))
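The difference between the two strategies can be seen on a small example. The sketch below is illustrative only: the toy EMP and ASG tuples and the function names are assumptions, not part of the slides. It evaluates both expressions and reports the size of the intermediate result that each one builds.

```python
from itertools import product

# Hypothetical toy data: EMP holds (ENO, ENAME), ASG holds (ENO, RESP).
EMP = [(1, "J. Doe"), (2, "M. Smith"), (3, "A. Lee")]
ASG = [(1, "Manager"), (2, "Analyst"), (3, "Manager"), (3, "Analyst")]


def strategy_1(emp, asg):
    """Build the Cartesian product first, then select and project."""
    pairs = list(product(emp, asg))  # |EMP| * |ASG| intermediate tuples
    names = {e[1] for e, a in pairs if a[1] == "Manager" and e[0] == a[0]}
    return names, len(pairs)


def strategy_2(emp, asg):
    """Select the managers first, then match on ENO and project ENAME."""
    managers = [a for a in asg if a[1] == "Manager"]
    manager_enos = {a[0] for a in managers}
    names = {e[1] for e in emp if e[0] in manager_enos}
    return names, len(managers)


names_1, size_1 = strategy_1(EMP, ASG)
names_2, size_2 = strategy_2(EMP, ASG)
print(names_1 == names_2, size_1, size_2)  # True 12 2
```

Both strategies return the same names, but Strategy 1 materializes every EMP/ASG pair before filtering, while Strategy 2 only carries the selected ASG tuples forward.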
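[Figure: two distributed execution strategies, each delivering the result at Site 5.
In the first strategy, Site 1 computes ASG1' = σ_{RESP=“Manager”}(ASG1) and Site 2 computes ASG2' = σ_{RESP=“Manager”}(ASG2); ASG1' and ASG2' are sent to Sites 3 and 4, which hold EMP1 and EMP2 and produce EMP1' and EMP2'; Site 5 computes result = EMP1' ∪ EMP2'.
In the second strategy, Sites 1 to 4 send ASG1, ASG2, EMP1, and EMP2 to Site 5, which computes result = (EMP1 ∪ EMP2) ⋈_{ENO} σ_{RESP=“Manager”}(ASG1 ∪ ASG2).]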
• Strategy 1
➡ produce ASG': (10+10) × tuple access cost = 20
➡ transfer ASG' to the sites of EMP: (10+10) × tuple transfer cost = 200
➡ produce EMP': (10+10) × tuple access cost × 2 = 40
➡ transfer EMP' to result site: (10+10) × tuple transfer cost = 200
Total Cost = 460
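The Strategy 1 arithmetic can be checked with a few lines of code. This is a minimal sketch: the unit costs (1 per tuple access, 10 per tuple transfer) are assumptions inferred from the subtotals 20 and 200 above, and the variable names are hypothetical.

```python
# Assumed unit costs, inferred from the subtotals in the list above.
TUPLE_ACCESS_COST = 1
TUPLE_TRANSFER_COST = 10

# 10 qualifying tuples at each of the two fragment sites.
tuples_per_site = [10, 10]
n = sum(tuples_per_site)

steps = {
    "produce ASG'": n * TUPLE_ACCESS_COST,
    "transfer ASG' to the sites of EMP": n * TUPLE_TRANSFER_COST,
    "produce EMP'": n * TUPLE_ACCESS_COST * 2,
    "transfer EMP' to result site": n * TUPLE_TRANSFER_COST,
}
total_cost = sum(steps.values())

for step, cost in steps.items():
    print(f"{step}: {cost}")
print("Total cost:", total_cost)  # Total cost: 460
```

Under these assumed unit costs the two transfer steps account for 400 of the 460 units, so communication dominates the total.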
Query Optimization Objectives
• Minimize a cost function
I/O cost + CPU cost + communication cost
These might have different weights in different distributed environments
• Wide area networks
➡ communication cost may dominate or vary widely
✦ bandwidth
✦ speed
✦ high protocol overhead
• Local area networks
➡ communication cost not that dominant
➡ total cost function should be considered
• Can also maximize throughput
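The weighted cost function above can be sketched as follows. The weights, the `CostWeights` type, and the workload numbers are all hypothetical and chosen only to show how the same plan is priced differently in a wide area and a local area setting.

```python
from dataclasses import dataclass


@dataclass
class CostWeights:
    """Hypothetical per-unit weights; real values depend on the environment."""
    io: float
    cpu: float
    comm: float


def total_cost(io_ops, cpu_units, msg_units, w):
    """Weighted sum: I/O cost + CPU cost + communication cost."""
    return w.io * io_ops + w.cpu * cpu_units + w.comm * msg_units


# Illustrative weights only: communication is far more expensive in the WAN profile.
WAN = CostWeights(io=1.0, cpu=0.25, comm=10.0)
LAN = CostWeights(io=1.0, cpu=0.25, comm=0.5)

plan = dict(io_ops=100, cpu_units=400, msg_units=200)
print(total_cost(**plan, w=WAN))  # 2200.0
print(total_cost(**plan, w=LAN))  # 300.0
```

With the WAN weights the communication term is 2000 of 2200, so minimizing transfers is nearly the whole problem; with the LAN weights all three terms are comparable and the full cost function has to be considered.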
Complexity of Relational Operations
• Assume
➡ relations of cardinality n
• Select, Project (without duplicate elimination): O(n)
• Join, Semi-join, Division, Set Operators: O(n log n)
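One way a join can meet the O(n log n) bound is a sort-merge join: sort both inputs on the join attribute, then merge them in a single pass. The slides do not say which algorithm the bound refers to, so the following is only a sketch. It assumes join keys are unique within each relation, and the function name and sample tuples are hypothetical.

```python
def sort_merge_join(r, s):
    """Join two lists of (key, value) tuples on key.

    Assumes keys are unique within each input, so the merge pass is
    linear and the two sorts dominate at O(n log n).
    """
    r_sorted = sorted(r, key=lambda t: t[0])  # O(n log n)
    s_sorted = sorted(s, key=lambda t: t[0])  # O(n log n)
    i = j = 0
    out = []
    while i < len(r_sorted) and j < len(s_sorted):  # O(n) merge pass
        rk, sk = r_sorted[i][0], s_sorted[j][0]
        if rk < sk:
            i += 1
        elif rk > sk:
            j += 1
        else:
            out.append((rk, r_sorted[i][1], s_sorted[j][1]))
            i += 1
            j += 1
    return out


emp = [(3, "A. Lee"), (1, "J. Doe"), (2, "M. Smith")]
asg = [(2, "Analyst"), (3, "Manager"), (4, "Engineer")]
joined = sort_merge_join(emp, asg)
print(joined)  # [(2, 'M. Smith', 'Analyst'), (3, 'A. Lee', 'Manager')]
```

Select and project without duplicate elimination need only one scan of the input, which is why they stay at O(n).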
➡ Optimal
• Heuristics
➡ Not optimal
• Distributed
➡ Cooperation among sites to determine the schedule
➡ Need only local information
➡ Cost of cooperation
• Hybrid
➡ One site determines the global schedule
➡ Each site optimizes the local subqueries
[Figure: layers of distributed query processing. Query Decomposition (uses GLOBAL SCHEMA) → Fragment Query → Global Optimization (uses STATS ON FRAGMENTS) → Local Optimization (uses LOCAL SCHEMAS), performed at the LOCAL SITES]