Lecture 17

The document discusses sorting algorithms and query optimization in database management systems. It covers topics like external merge sort, B+ tree sorting, cost estimation for different operators, and equivalence rules for relational algebra expressions. The document also describes approaches for query optimization like greedy algorithms and dynamic programming.

Uploaded by

Faruk Karagoz
CSE 412 Database Management

Lecture 17: Equivalence of Relational Algebra and Query Optimization
Jia Zou
Arizona State University

Operator Implementations
• Selection
• Join
• Order By: Sorting
• Projection with Distinct
• Group By

Overview of the Sorting Problem
• Files are broken up into N pages.
• The DBMS has a finite number B of fixed-size buffers.
• Let’s start with a simple example...

Two-way External Merge Sort

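• In each pass we read and write every page in the file

• # of passes: 1 + ⌈log2 N⌉ (one initial sorting pass, then roughly log2 N merge passes)

• Total I/O cost: 2N × (# of passes), roughly 2N log2 N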
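The pass and I/O counts above can be checked with a small simulation. This is only a sketch under stated assumptions: pages are in-memory Python lists of equal size, the name `two_way_external_sort` is made up for illustration, and "I/O" is simply counted as one read plus one write per page per pass.

```python
import heapq

def two_way_external_sort(pages):
    """Sort a non-empty list of equal-size pages; return (sorted pages, passes, page I/Os)."""
    n = len(pages)
    page_size = len(pages[0])
    # Pass 0: read each page, sort it in memory, write it back as a one-page run.
    runs = [[sorted(p)] for p in pages]
    passes, io = 1, 2 * n
    # Merge passes: merge runs pairwise until a single run remains.
    while len(runs) > 1:
        next_runs = []
        for i in range(0, len(runs), 2):
            # Flatten each run in the pair into one sorted stream of keys.
            streams = [[k for page in run for k in page] for run in runs[i:i + 2]]
            keys = list(heapq.merge(*streams))
            # Cut the merged stream back into pages.
            next_runs.append([keys[j:j + page_size] for j in range(0, len(keys), page_size)])
        runs = next_runs
        passes += 1
        io += 2 * n  # every page is read once and written once in this pass
    return runs[0], passes, io

sorted_pages, passes, io = two_way_external_sort([[3, 1], [8, 2], [5, 4], [7, 6]])
print(sorted_pages, passes, io)
```

With N = 4 pages this takes 1 + log2 4 = 3 passes and 2 × 4 × 3 = 24 page I/Os.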


General External Merge Sort
• # of passes: about logB N (exactly 1 + ⌈log(B-1) ⌈N/B⌉⌉)
• Total I/O cost: about 2N logB N = 2N log N / log B
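As a rough illustration of the cost formula, the sketch below counts passes for N pages and B buffers, assuming the usual scheme: pass 0 produces ⌈N/B⌉ sorted runs, and every later pass merges B-1 runs at a time. The function name `external_sort_cost` is hypothetical.

```python
import math

def external_sort_cost(n_pages, n_buffers):
    """Return (passes, page I/Os) for external merge sort with B buffers."""
    if n_buffers < 3:
        raise ValueError("need at least 3 buffers (2 input + 1 output)")
    # Pass 0: fill all B buffers, sort in memory, write out ceil(N / B) runs.
    runs = math.ceil(n_pages / n_buffers)
    passes = 1
    # Each merge pass combines B - 1 runs into one.
    while runs > 1:
        runs = math.ceil(runs / (n_buffers - 1))
        passes += 1
    # Every pass reads and writes each page once.
    return passes, 2 * n_pages * passes

print(external_sort_cost(108, 5))  # (4, 864)
```

For N = 108 and B = 5, pass 0 leaves 22 runs, which shrink to 6, then 2, then 1, giving 4 passes and 864 page I/Os.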

Using B+ Tree for Sorting

Clustered B+ Tree for Sorting

IO costs: Read N pages


Unclustered B+ Tree for Sorting

A page has p records

I/O cost: up to pN page reads (in the worst case, one I/O per record)
Operator Implementations
• Selection
• Join
• Order By
• Projection with Distinct
• Group By

Sorting Projection

Hashing, partition phase: I/O cost 2N
(N pages of input + N pages of output; each page is read once and written once)

Hashing, rehash phase: I/O cost N
(each of the N pages is loaded into memory once for rehashing)

Projection
• Total I/O cost for Sorting: 2N × log N / log B (B ≤ N)
• Total I/O cost for Hashing: 3N
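The 3N figure for hashing can be illustrated with a small sketch of hash-based duplicate elimination. It assumes tuples are hashable Python values, that the partitions together occupy about N pages, and that each partition fits in memory during the rehash phase; `hash_distinct` is a made-up name.

```python
def hash_distinct(pages, n_buffers):
    """Remove duplicates by hashing; return (distinct tuples, page I/Os)."""
    n = len(pages)
    n_parts = n_buffers - 1  # one buffer for input, the rest for partitions
    io = 0
    # Partition phase: read every page and route each tuple to a partition.
    partitions = [[] for _ in range(n_parts)]
    for page in pages:
        io += 1  # read one input page
        for t in page:
            partitions[hash(t) % n_parts].append(t)
    io += n  # write the partitions back out (assumed to total about N pages)
    # Rehash phase: load each partition and drop duplicates with an in-memory table.
    result = []
    for part in partitions:
        seen = set()
        for t in part:
            if t not in seen:
                seen.add(t)
                result.append(t)
    io += n  # each partition page is read once
    return result, io

distinct, io = hash_distinct([[1, 2], [2, 3], [3, 1]], n_buffers=3)
print(sorted(distinct), io)  # [1, 2, 3] 9
```

Equal tuples always hash to the same partition, so removing duplicates within each partition is enough.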

Operator Implementations
• Selection
• Join
• Order By
• Projection with Distinct
• Group By: the same as projection with DISTINCT!

Equivalence of Relational Algebra, and Query Optimization
Example Database

Indexed on Grade
Query Plan Example
[Figure: query plan tree; operator labels: On-the-fly, On-the-fly, Hash Join, File Scan, File Scan]

Is this relational algebra expression optimal? Can we find a better query plan?

Query Optimization
• Step 1. Enumerate possible plans
• Relational Algebra Equivalences
• Step 2. Estimate cost for each plan
• Step 3. Pick the “best”

• Often the goal is not to find the optimal plan, but to avoid the horrible ones.
Relational Algebra Equivalences
• Two relational algebra expressions are equivalent if they generate the
same set of tuples on every legal database instance.
Predicate Pushdown
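A minimal sketch of why pushing a predicate below a join is safe and cheaper. The tables `student` and `enrolled`, their columns, and the helper names are hypothetical stand-ins, not the lecture's example database; relations are modeled as lists of dicts.

```python
def select(rel, pred):
    """Selection: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]

def hash_join(r, s, key):
    """Equi-join r and s on a shared column using a hash table built on s."""
    table = {}
    for t in s:
        table.setdefault(t[key], []).append(t)
    return [{**a, **b} for a in r for b in table.get(a[key], [])]

student = [{"sid": 1, "name": "Ann"}, {"sid": 2, "name": "Bob"}]
enrolled = [
    {"sid": 1, "cid": "CSE412", "grade": "A"},
    {"sid": 1, "cid": "CSE310", "grade": "B"},
    {"sid": 2, "cid": "CSE412", "grade": "C"},
]
is_a = lambda t: t["grade"] == "A"

# Plan 1: join first, then filter the (larger) join result.
late = select(hash_join(student, enrolled, "sid"), is_a)
# Plan 2: push the selection below the join, so the join sees fewer tuples.
early = hash_join(student, select(enrolled, is_a), "sid")

print(late == early)                   # True: both plans return the same tuples
print(len(enrolled), len(select(enrolled, is_a)))  # 3 1: join input shrinks
```

The pushdown is valid here because the predicate only mentions columns of `enrolled`.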
Selections
• Perform them early
• Break a complex predicate into its conjuncts and push each one down
• σ(p1 ∧ p2 ∧ … ∧ pn)(R) = σ(p1)(σ(p2)(… σ(pn)(R)))
• Simplify a complex predicate
• (X=Y AND Y=3) → X=3 AND Y=3
Projections
• Perform them early
• Smaller tuples
• Fewer tuples (if duplicates are eliminated)
• Project out all attributes except the ones requested or required (e.g.,
join attributes)
Projection Pushdown Example
Joins
• × and ⋈ are commutative and associative
• R ⋈ S = S ⋈ R
• (R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)
More relational algebra equivalences
Agenda: Query Optimization
• How to enumerate possible plans?
• Relational Algebra Equivalences
• How to estimate cost for each plan?
• How to pick the “best”?

• Often the goal is not to find the optimal plan, but to avoid the horrible ones.
Selections with equality predicates

Example:
sel(rating=‘2’) = 1/5
Negated and disjunctive predicates

Example:
sel(rating != ‘2’) = 1 - 1/5 = 4/5
Range predicates
Example:
sel(rating > ‘2’) = (4 - 2)/(4 - 0) = 1/2
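The three examples can be reproduced with a few helper functions. This is a sketch under the uniform-distribution assumption, taking rating to have 5 distinct values ranging from 0 to 4 as the examples suggest. The function names are hypothetical, and the disjunction rule additionally assumes the two predicates are independent, which the slides do not state.

```python
from fractions import Fraction

def sel_eq(n_distinct):
    """Equality predicate A = c: each distinct value is equally likely."""
    return Fraction(1, n_distinct)

def sel_not(s):
    """Negated predicate NOT P."""
    return 1 - s

def sel_gt(value, lo, hi):
    """Range predicate A > value, with A spread uniformly over [lo, hi]."""
    return Fraction(hi - value, hi - lo)

def sel_or(s1, s2):
    """Disjunction P1 OR P2, assuming P1 and P2 are independent."""
    return s1 + s2 - s1 * s2

print(sel_eq(5))           # 1/5  -> sel(rating = '2')
print(sel_not(sel_eq(5)))  # 4/5  -> sel(rating != '2')
print(sel_gt(2, 0, 4))     # 1/2  -> sel(rating > '2')
```

Multiplying a selectivity by the table's cardinality gives the estimated number of output tuples.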
Cost estimation: summary
• Using similar ideas, we can estimate the size of projection, duplicate
elimination, union, difference, aggregation (with grouping)
• Lots of assumptions and very rough estimation
• Accurate estimate is not needed
• Not covered: better estimation using histograms and sampling
Agenda: Query Optimization
• How to enumerate possible plans?
• Relational Algebra Equivalences
• How to estimate cost for each plan?
• How to pick the “best”?

• Often the goal is not to find the optimal plan, but to avoid the horrible ones.
A greedy algorithm
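One possible greedy strategy, sketched below, repeatedly joins the pair of inputs with the smallest estimated result. This is not necessarily the algorithm shown in the lecture; the cost model (sum of intermediate result sizes), the selectivity table, and all names are assumptions made for illustration.

```python
from itertools import combinations

def join_size(a, b, card, sel):
    """Estimated size of a ⋈ b: |a| * |b| * selectivity of every predicate between them."""
    s = 1.0
    for x in a:
        for y in b:
            s *= sel.get(frozenset((x, y)), 1.0)  # 1.0 means no predicate (cross product)
    return card[a] * card[b] * s

def greedy_join_order(base_card, sel):
    """Return (plan as nested tuples, total estimated intermediate size)."""
    card = {frozenset([r]): float(n) for r, n in base_card.items()}
    plans = {frozenset([r]): r for r in base_card}
    total = 0.0
    while len(card) > 1:
        # Greedy step: pick the pair whose join result is estimated to be smallest.
        a, b = min(combinations(card, 2), key=lambda p: join_size(p[0], p[1], card, sel))
        size = join_size(a, b, card, sel)
        total += size
        plan = (plans.pop(a), plans.pop(b))
        del card[a], card[b]
        card[a | b] = size
        plans[a | b] = plan
    (plan,) = plans.values()
    return plan, total

base_card = {"R": 1000, "S": 100, "T": 10}
sel = {frozenset(("R", "S")): 0.01, frozenset(("S", "T")): 0.1}
print(greedy_join_order(base_card, sel))
```

Here S ⋈ T (about 100 tuples) is chosen first, and R is joined afterwards. A greedy choice is fast but can miss the cheapest overall plan.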
A dynamic programming algorithm
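A dynamic programming optimizer can be sketched as follows: compute the best plan for every subset of relations from the best plans of smaller subsets. The sketch assumes left-deep plans only and a cost equal to the sum of estimated intermediate result sizes. Both are simplifications chosen for illustration, and the names are hypothetical.

```python
from itertools import combinations

def dp_join_order(base_card, sel):
    """Return (cost, left-deep plan as nested tuples) for joining all relations."""
    rels = sorted(base_card)

    def size(subset):
        # Estimated result size: product of cardinalities times all applicable selectivities.
        n = 1.0
        for r in subset:
            n *= base_card[r]
        for x, y in combinations(sorted(subset), 2):
            n *= sel.get(frozenset((x, y)), 1.0)
        return n

    # best[S] = (cost, plan) for the cheapest way to join the relations in S.
    best = {frozenset([r]): (0.0, r) for r in rels}
    for k in range(2, len(rels) + 1):
        for combo in combinations(rels, k):
            s = frozenset(combo)
            out = size(s)
            candidates = []
            for r in combo:  # r is the last relation joined in a left-deep plan
                cost, plan = best[s - {r}]
                candidates.append((cost + out, (plan, r)))
            best[s] = min(candidates, key=lambda c: c[0])
    return best[frozenset(rels)]

base_card = {"R": 1000, "S": 100, "T": 10}
sel = {frozenset(("R", "S")): 0.01, frozenset(("S", "T")): 0.1}
print(dp_join_order(base_card, sel))
```

Unlike a greedy choice, every subset is solved optimally under the cost model, at the price of examining a number of subsets that grows exponentially with the number of relations.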
Summary: Query Optimization
• How to enumerate possible plans?
• Relational Algebra Equivalences
• How to estimate cost for each plan?
• Assumptions of uniform distributions
• Histogram and Sampling (optional)
• How to pick the “best”
• Need statistics to estimate sizes of intermediate results
• Greedy approach
• Dynamic programming approach
PostgreSQL Query Optimizer
• Examines all types of join trees
• Left-deep, Right-deep, bushy
• Two optimizer implementations:
• Traditional Dynamic Programming Approach
• Genetic Query Optimizer (GEQO)
• Postgres uses the traditional one when the number of tables in the query is less
than 12 and switches to GEQO when there are 12 or more (the threshold is the
geqo_threshold setting, default 12).
Postgres GEQO