Lecture 17
Operator Implementations
• Selection
• Join
• Order By: Sorting
• Projection with Distinct
• Group By
Overview of the Sorting Problem
• Files are broken up into N pages.
• The DBMS has a finite number, B, of fixed-size buffers.
• Let’s start with a simple example...
Two-way External Merge Sort
• # of passes: $\log_2 N$
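A minimal in-memory sketch of two-way external merge sort, assuming each page is a Python list of keys (a real DBMS streams pages from disk through three buffers: two for input, one for output). Pass 0 sorts each page into a one-page run, and every later pass merges runs in pairs, so the exact pass count is 1 + ⌈log₂ N⌉, which the slide abbreviates as log₂ N. Each pass reads and writes every page once, so the total I/O is 2N times the pass count.

```python
import heapq
from typing import List, Tuple

Page = List[int]


def two_way_external_sort(pages: List[Page]) -> Tuple[List[int], int]:
    """Simulate two-way external merge sort over N pages.

    Returns the fully sorted records and the number of passes used.
    """
    if not pages:
        return [], 0
    # Pass 0: read each page, sort it in memory, write it back as a 1-page run.
    runs: List[List[int]] = [sorted(page) for page in pages]
    passes = 1
    # Passes 1, 2, ...: merge runs two at a time until a single run remains.
    while len(runs) > 1:
        next_runs = []
        for i in range(0, len(runs), 2):
            if i + 1 < len(runs):
                next_runs.append(list(heapq.merge(runs[i], runs[i + 1])))
            else:
                next_runs.append(runs[i])  # an odd run is carried to the next pass
        runs = next_runs
        passes += 1
    return runs[0], passes


pages = [[3, 4], [6, 2], [9, 4], [8, 7], [5, 6], [3, 1], [2]]
records, passes = two_way_external_sort(pages)
print(records)  # [1, 2, 2, 3, 3, 4, 4, 5, 6, 6, 7, 8, 9]
print(passes)   # 4, i.e. 1 + ceil(log2(7))
```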
General External Merge Sort
• # of passes: $\log_B N$
• Total I/O cost: $2N \log_B N = 2N \log N / \log B$
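The formula above is a simplification. A sketch comparing it with the more detailed count found in standard database textbooks: pass 0 uses all B buffers to build ⌈N/B⌉ sorted runs, and each later pass performs a (B − 1)-way merge, keeping one buffer for output, so the pass count is 1 + ⌈log_{B−1}⌈N/B⌉⌉. The function names are illustrative only.

```python
import math


def slide_io_estimate(n_pages: int, buffers: int) -> float:
    """Simplified total I/O from the slide: 2N * log_B(N) = 2N * log N / log B."""
    return 2 * n_pages * math.log(n_pages) / math.log(buffers)


def merge_sort_passes(n_pages: int, buffers: int) -> int:
    """Passes of general external merge sort with B buffers (B >= 3)."""
    if buffers < 3:
        raise ValueError("need at least 3 buffers")
    runs = math.ceil(n_pages / buffers)  # pass 0: sorted runs of B pages each
    passes = 1
    while runs > 1:
        runs = math.ceil(runs / (buffers - 1))  # one (B-1)-way merge pass
        passes += 1
    return passes


def merge_sort_io(n_pages: int, buffers: int) -> int:
    """Each pass reads and writes every page once: 2N I/Os per pass."""
    return 2 * n_pages * merge_sort_passes(n_pages, buffers)


print(merge_sort_passes(108, 5))  # 4: 22 runs -> 6 -> 2 -> 1
print(merge_sort_io(108, 5))      # 864
```

For N = 108 and B = 5 the simplified estimate is about 628 I/Os, below the 864 counted pass by pass, because it ignores rounding up to whole passes and the buffer reserved for output.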
Using B+ Tree for Sorting
Clustered B+ Tree for Sorting
Sorting Projection
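A minimal in-memory sketch of DISTINCT projection by sorting, assuming the usual steps: project each tuple onto the requested attributes, sort the projected tuples (an external merge sort when the input does not fit in memory) so that duplicates become adjacent, then drop duplicates in a single scan. `rows` and `cols` are hypothetical names.

```python
from typing import Iterable, List, Sequence, Tuple


def sort_project_distinct(rows: Iterable[Sequence], cols: Sequence[int]) -> List[Tuple]:
    """SELECT DISTINCT on the columns at positions `cols`, via sorting."""
    # Step 1: project every row onto the requested columns.
    projected = [tuple(row[c] for c in cols) for row in rows]
    # Step 2: sort, so that duplicate tuples end up next to each other.
    #         (On disk this would be an external merge sort.)
    projected.sort()
    # Step 3: one scan that keeps a tuple only if it differs from the last kept one.
    result: List[Tuple] = []
    for t in projected:
        if not result or result[-1] != t:
            result.append(t)
    return result


rows = [(1, "a", 10), (2, "b", 10), (3, "a", 10), (4, "c", 20)]
print(sort_project_distinct(rows, cols=(1, 2)))  # [('a', 10), ('b', 10), ('c', 20)]
```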
I/O cost: 2N
(N pages of input + N pages of output:
each page is read once and written once)
I/O cost: N
Each of the N pages is loaded into memory once for rehashing
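Assuming the two costs above are the partitioning and rehash phases of hash-based DISTINCT projection (which matches the 3N total for hashing), a minimal in-memory sketch: the partitioning phase reads each page once and writes projected tuples to hash partitions, and the rehash phase loads each partition back and removes duplicates with an in-memory hash table. `num_partitions` is a hypothetical parameter standing in for the B − 1 output buffers.

```python
from typing import List, Sequence, Tuple

Row = Sequence
Page = List[Row]


def hash_project_distinct(pages: List[Page], cols: Sequence[int],
                          num_partitions: int) -> List[Tuple]:
    """DISTINCT projection via hashing, simulated in memory."""
    # Partitioning phase: read every page once (N reads), project each tuple,
    # and route it to a partition by hash; partitions are written out (<= N writes).
    partitions: List[List[Tuple]] = [[] for _ in range(num_partitions)]
    for page in pages:
        for row in page:
            t = tuple(row[c] for c in cols)
            partitions[hash(t) % num_partitions].append(t)
    # Rehash phase: load each partition once (<= N reads); duplicates always
    # share a partition, so an in-memory hash table per partition removes them.
    result: List[Tuple] = []
    for part in partitions:
        result.extend(dict.fromkeys(part))  # keeps the first occurrence of each tuple
    return result


pages = [[(1, "a"), (2, "b")], [(3, "a"), (4, "c")]]
print(sorted(hash_project_distinct(pages, cols=(1,), num_partitions=3)))
# [('a',), ('b',), ('c',)]
```

Since projection can only shrink tuples, the partitions occupy at most N pages, so 2N + N = 3N is an upper bound under this model.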
Projection
• Total I/O cost for Sorting: $2N \log N / \log B$ (for $B \le N$)
• Total I/O cost for Hashing: 3N
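A worked comparison with hypothetical sizes, N = 1000 pages and B = 10 buffers, plugged into the two formulas above:

```latex
\begin{aligned}
\text{Sorting: } & 2N \frac{\log N}{\log B} = 2 \cdot 1000 \cdot \frac{\log 1000}{\log 10} = 2 \cdot 1000 \cdot 3 = 6000 \text{ I/Os} \\
\text{Hashing: } & 3N = 3 \cdot 1000 = 3000 \text{ I/Os}
\end{aligned}
```

Under this cost model the hash-based approach needs half the I/Os for these sizes.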
Operator Implementations
• Selection
• Join
• Order By
• Projection with Distinct
• Group By: implemented the same way as projection with DISTINCT!
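Rows with equal grouping keys must be brought together, just as duplicates must be for DISTINCT, so GROUP BY can reuse the sorting or hashing approach, keeping a running aggregate per key instead of a single copy. A minimal hash-based sketch, assuming a SUM aggregate over in-memory rows (`key_col` and `val_col` are hypothetical names):

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Sequence


def hash_group_by_sum(rows: Iterable[Sequence], key_col: int,
                      val_col: int) -> Dict[Hashable, int]:
    """SELECT key, SUM(val) ... GROUP BY key, using an in-memory hash table."""
    totals: Dict[Hashable, int] = defaultdict(int)
    for row in rows:
        # Rows with the same key land in the same hash entry, exactly as duplicate
        # tuples meet in DISTINCT projection; here we aggregate instead of dropping.
        totals[row[key_col]] += row[val_col]
    return dict(totals)


sales = [("east", 5), ("west", 2), ("east", 3)]
print(hash_group_by_sum(sales, key_col=0, val_col=1))  # {'east': 8, 'west': 2}
```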
Relational Algebra Equivalences and Query Optimization
Example Database
Indexed on Grade
Query Plan Example
[Figure: query plan with two “On-the-fly” operators and a “Hash Join”]
• Often the goal is not to find the optimal plan, but to avoid the horrible ones.
Relational Algebra Equivalences
• Two relational algebra expressions are equivalent if they generate the
same set of tuples on every database instance.
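Checking one instance can only illustrate an equivalence, never prove it, since equivalence must hold on every instance. With that caveat, a small sketch that models relations as Python sets of tuples (so results compare as sets, matching the definition) and checks the cascade and commutativity of selections, σ_{p1 ∧ p2}(R) ≡ σ_{p1}(σ_{p2}(R)) ≡ σ_{p2}(σ_{p1}(R)). The relation R and predicates p1, p2 are hypothetical.

```python
from typing import Callable, Set, Tuple

Relation = Set[Tuple]


def select(rel: Relation, pred: Callable[[Tuple], bool]) -> Relation:
    """Relational selection: the tuples of `rel` that satisfy `pred`."""
    return {t for t in rel if pred(t)}


def p1(t: Tuple) -> bool:
    return t[0] > 2       # hypothetical predicate on column 0 (id)


def p2(t: Tuple) -> bool:
    return t[1] == "x"    # hypothetical predicate on column 1 (tag)


R: Relation = {(1, "x"), (2, "y"), (3, "x"), (4, "x")}  # hypothetical R(id, tag)

combined = select(R, lambda t: p1(t) and p2(t))  # sigma_{p1 AND p2}(R)
cascade = select(select(R, p2), p1)              # sigma_{p1}(sigma_{p2}(R))
swapped = select(select(R, p1), p2)              # sigma_{p2}(sigma_{p1}(R))
print(combined == cascade == swapped)  # True
print(sorted(combined))                # [(3, 'x'), (4, 'x')]
```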
Predicate Pushdown
Selections
• Perform selections as early as possible
• Break a complex conjunctive predicate into simpler ones and push each one down
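A sketch of pushing a selection below a join, on hypothetical relations Student(sid, name) and Enrolled(sid, cid, grade); these schemas are assumptions, not the lecture's example database. Both plans return the same tuples, but filtering Enrolled first means the join sees only the matching rows. The join is a naive nested loop, used only for illustration.

```python
from typing import Callable, Set, Tuple

Relation = Set[Tuple]


def select(rel: Relation, pred: Callable[[Tuple], bool]) -> Relation:
    """Relational selection: the tuples of `rel` that satisfy `pred`."""
    return {t for t in rel if pred(t)}


def join(r: Relation, s: Relation, r_col: int, s_col: int) -> Relation:
    """Equi-join via nested loops; each output tuple is r-tuple + s-tuple."""
    return {tr + ts for tr in r for ts in s if tr[r_col] == ts[s_col]}


student = {(1, "Ann"), (2, "Bob"), (3, "Cy")}
enrolled = {(1, "db", "A"), (1, "os", "B"), (2, "db", "B"),
            (3, "db", "A"), (3, "ml", "C")}

# Late selection: join everything, then filter (grade is column 4 of the join output).
full_join = join(student, enrolled, 0, 0)
late = select(full_join, lambda t: t[4] == "A")

# Pushed-down selection: filter Enrolled first (grade is its column 2), then join.
filtered = select(enrolled, lambda t: t[2] == "A")
early = join(student, filtered, 0, 0)

print(late == early)                  # True
print(len(full_join), len(filtered))  # 5 2
```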
Selections with equality predicates
Example:
sel(rating = ‘2’) = 1/5 (assuming 5 distinct, uniformly distributed rating values)
Negated and disjunctive predicates
Example:
sel(rating != ‘2’) = 1 - 1/5 = 4/5
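These estimates assume uniformly distributed values and, for combined predicates, independent predicates. A sketch of the usual formulas with exact fractions: sel(A = c) = 1 / (number of distinct values of A), sel(NOT P) = 1 − sel(P), and for a disjunction under independence, sel(P1 OR P2) = sel(P1) + sel(P2) − sel(P1) · sel(P2). The second attribute `age` with 10 distinct values is hypothetical.

```python
from fractions import Fraction


def sel_eq(n_distinct: int) -> Fraction:
    """sel(A = c) under a uniform distribution over n_distinct values."""
    return Fraction(1, n_distinct)


def sel_not(sel: Fraction) -> Fraction:
    """sel(NOT P) = 1 - sel(P)."""
    return 1 - sel


def sel_or(s1: Fraction, s2: Fraction) -> Fraction:
    """sel(P1 OR P2), assuming P1 and P2 are independent."""
    return s1 + s2 - s1 * s2


rating_eq_2 = sel_eq(5)            # rating takes 5 distinct values
print(rating_eq_2)                 # 1/5
print(sel_not(rating_eq_2))        # 4/5
# Hypothetical: age has 10 distinct values; estimate rating = '2' OR age = 30.
print(sel_or(rating_eq_2, sel_eq(10)))  # 7/25
```

The independence formula does not fit two equality predicates on the same attribute, such as rating = ‘2’ OR rating = ‘3’: those are mutually exclusive, so their selectivities simply add.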
Range predicates
Example:
sel(rating > ‘2’) = (4 - 2)/(4 - 0) = 1/2
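The range estimate treats the column as uniformly spread over [low, high] (low = 0 and high = 4 in the example) and returns the fraction of that interval above the constant: sel(A > c) = (high − c) / (high − low). A sketch that also clamps constants outside the range; ratings are modeled as integers here rather than the quoted strings used above.

```python
from fractions import Fraction


def sel_greater(c: int, low: int, high: int) -> Fraction:
    """sel(A > c), assuming A is uniformly distributed over [low, high]."""
    if c >= high:
        return Fraction(0)  # no value exceeds the maximum
    if c < low:
        return Fraction(1)  # every value exceeds c
    return Fraction(high - c, high - low)


print(sel_greater(2, low=0, high=4))  # 1/2
```

Because the formula treats the domain as continuous, it can differ from a count over discrete values: if ratings are the integers 0 to 4, only 3 and 4 satisfy rating > 2, a true fraction of 2/5 rather than 1/2.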
Cost estimation: summary
• Using similar ideas, we can estimate the size of projection, duplicate
elimination, union, difference, aggregation (with grouping)
• Relies on many assumptions, so the estimates are very rough
• A highly accurate estimate is not needed; it only has to be good enough to rank plans
• Not covered: better estimation using histograms and sampling
Agenda: Query Optimization
• How to enumerate possible plans?
• Relational Algebra Equivalences
• How to estimate cost for each plan?
• How to pick the “best”?
• Often the goal is not to find the optimal plan, but to avoid the horrible ones.
A greedy algorithm
A dynamic programming algorithm
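A sketch contrasting the two strategies for choosing a left-deep join order, under a hypothetical cost model: a join result's size is the product of the relation cardinalities times the selectivity of every join predicate among them (1 when there is no predicate, i.e., a cross product), and a plan's cost is the sum of the sizes of the join results it produces. The greedy version starts from the smallest relation and repeatedly adds whichever relation yields the smallest next result; the dynamic programming version, in the style of System R, builds the cheapest plan for every subset of relations from the cheapest plans of its subsets. The cardinalities and selectivity are made up to show greedy choosing a poor order.

```python
from fractions import Fraction
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Card = Dict[str, int]
Sel = Dict[FrozenSet[str], Fraction]
Plan = Tuple[Fraction, List[str]]


def join_size(rels: FrozenSet[str], card: Card, sel: Sel) -> Fraction:
    """Estimated result size: product of cardinalities times predicate selectivities."""
    size = Fraction(1)
    for r in rels:
        size *= card[r]
    for a, b in combinations(sorted(rels), 2):
        size *= sel.get(frozenset((a, b)), Fraction(1))  # no predicate: cross product
    return size


def greedy_order(card: Card, sel: Sel) -> Plan:
    remaining = sorted(card)
    order = [min(remaining, key=lambda r: card[r])]  # start with the smallest relation
    remaining.remove(order[0])
    cost = Fraction(0)
    while remaining:
        joined = frozenset(order)
        # Add whichever relation makes the next intermediate result smallest.
        nxt = min(remaining, key=lambda r: join_size(joined | {r}, card, sel))
        cost += join_size(joined | {nxt}, card, sel)
        order.append(nxt)
        remaining.remove(nxt)
    return cost, order


def dp_order(card: Card, sel: Sel) -> Plan:
    rels = sorted(card)
    # best[S] = (cheapest cost, left-deep order) for joining exactly the set S.
    best: Dict[FrozenSet[str], Plan] = {
        frozenset([r]): (Fraction(0), [r]) for r in rels
    }
    for k in range(2, len(rels) + 1):
        for subset in combinations(rels, k):
            s = frozenset(subset)
            size = join_size(s, card, sel)
            # Left-deep: join the best plan for S - {r} with the single relation r.
            best[s] = min(
                ((best[s - {r}][0] + size, best[s - {r}][1] + [r]) for r in subset),
                key=lambda plan: plan[0],
            )
    return best[frozenset(rels)]


card = {"A": 10, "B": 1000, "C": 1000}
sel = {frozenset(("B", "C")): Fraction(1, 10000)}  # only B and C share a predicate
print(greedy_order(card, sel))  # (Fraction(11000, 1), ['A', 'B', 'C'])
print(dp_order(card, sel))      # (Fraction(1100, 1), ['C', 'B', 'A'])
```

Greedy commits to the tiny relation A and is then forced into a cross product, while dynamic programming finds that joining B and C first keeps the intermediate result small. The DP table has one entry per subset of relations, so its work grows exponentially with the number of tables, which is why optimizers such as PostgreSQL's fall back to other strategies for queries with many tables.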
Summary: Query Optimization
• How to enumerate possible plans?
• Relational Algebra Equivalences
• How to estimate cost for each plan?
• Assume uniform distributions
• Histogram and Sampling (optional)
• How to pick the “best”?
• Need statistics to estimate sizes of intermediate results
• Greedy approach
• Dynamic programming approach
PostgreSQL Query Optimizer
• Examines all types of join trees
• Left-deep, Right-deep, bushy
• Two optimizer implementations:
• Traditional Dynamic Programming Approach
• Genetic Query Optimizer (GEQO)
• PostgreSQL uses the traditional approach when a query has fewer than 12 tables
and switches to GEQO when there are 12 or more (set by the geqo_threshold parameter, default 12).
Postgres GEQO