
Dario Della Monica

Chapter 16: Query Optimization

These slides are a modified version of the slides provided with the book:

Database System Concepts, 6th Ed.


©Silberschatz, Korth and Sudarshan
See www.db-book.com for conditions on re-use

(however, chapter numeration refers to 7th Ed.)

The original version of the slides is available at: https://www.db-book.com/


Chapter 16: Query Optimization
 Introduction
 Generating Equivalent Expressions
 Equivalence rules
 How to generate (all) equivalent expressions
 Estimating Statistics of Expression Results
 The Catalog
 Size estimation
 Selection
 Join
 Other operations (projection, aggregation, set operations, outer join)

 Estimation of number of distinct values


 Choice of Evaluation Plans
 Dynamic Programming for Choosing Evaluation Plans

Database System Concepts - 7th Edition 16.2 ©Silberschatz, Korth and Sudarshan
Introduction
 Query optimization: finding the “best” query execution plan (QEP) among the
many possible ones
 User is not expected to write queries efficiently (DBMS optimizer takes care of that)
 Alternative ways to execute a given query – 2 levels
 Equivalent relational algebra expressions
 Different implementation choices for each relational algebra operation
 Algorithms, indices, coordination between successive operations, …

Schema:
INSTR(i_id, name, dept_name, ...)
COURSE(c_id, title, ...)
TEACHES(i_id, c_id, ...)

Query: the names of all instructors in the Music department, together with the titles of all courses they teach

SELECT I.name, C.title
FROM INSTR I, COURSE C, TEACHES T
WHERE I.i_id = T.i_id
AND T.c_id = C.c_id
AND I.dept_name = 'Music'

Two equivalent relational algebra expressions for this query:
∏name,title (σdept_name="Music" (INSTR ⋈ (TEACHES ⋈ COURSE)))
∏name,title (σdept_name="Music"(INSTR) ⋈ (TEACHES ⋈ COURSE))


Introduction (Cont.)
 A query evaluation plan (QEP) defines exactly what algorithm is used
for each operation, and how the execution of the operations is
coordinated

 Find out how to view query execution plans on your favorite database

Introduction (Cont.)

 Cost difference between query evaluation plans can be enormous
 E.g. seconds vs. days in some cases
 It is worth spending time to find the “best” QEP
 Steps in cost-based query optimization
1. Generate logically equivalent expressions using equivalence rules
2. Annotate the resulting expressions in all possible ways to get alternative QEPs
3. Evaluate/estimate the cost (execution time) of each QEP
4. Choose the cheapest QEP based on estimated cost
 Estimation of QEP cost based on:
 Statistical information about relations (stored in the Catalog)
 number of tuples, number of distinct values for an attribute
 Statistics estimation for intermediate results
 to compute cost of complex expressions
 Cost formulae for algorithms, computed using statistics
Generating Equivalent Expressions
 Equivalence rules
 How to generate (all) equivalent expressions



Transformation of Relational Expressions

 Two relational algebra expressions are said to be equivalent if the two
expressions generate the same set of tuples on every legal database
instance
 Note: order of tuples is irrelevant (and also order of attributes)
 We don’t care if they generate different results on databases that
violate integrity constraints (e.g., uniqueness of keys)
 In SQL, inputs and outputs are multisets of tuples
 Two expressions in the multiset version of the relational algebra are
said to be equivalent if the two expressions generate the same multiset
of tuples on every legal database instance
 We focus on relational algebra and treat relations as sets
 An equivalence rule states that expressions of two forms are equivalent
 One can replace an expression of first form by one of the second form,
or vice versa

Equivalence Rules
1. Conjunctive selection operations can be deconstructed into a
sequence of individual selections.
σθ1∧θ2(E) = σθ1(σθ2(E))
2. Selection operations are commutative.
σθ1(σθ2(E)) = σθ2(σθ1(E))
3. Only the last in a sequence of projection operations is
needed, the others can be omitted
ΠL1(ΠL2(…(ΠLn(E))…)) = ΠL1(E)
where L1 ⊆ L2 ⊆ … ⊆ Ln
4. Selections can be combined with Cartesian products and
theta joins.
a. σθ(E1 × E2) = E1 ⋈θ E2
b. σθ1(E1 ⋈θ2 E2) = E1 ⋈θ1∧θ2 E2

Equivalence Rules (Cont.)
5. Theta-join (and thus natural join) operations are commutative.
E1 ⋈θ E2 = E2 ⋈θ E1
(but the order is important for efficiency)
6. (a) Natural join operations are associative:
(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)
(again, the order is important for efficiency)
(b) Theta joins are associative in the following manner:
(E1 ⋈θ1 E2) ⋈θ2∧θ3 E3 = E1 ⋈θ1∧θ3 (E2 ⋈θ2 E3)
where θ1 involves attributes from only E1 and E2
and θ2 involves attributes from only E2 and E3

Equivalence Rules (Cont.)

7. (a) Selection distributes over theta join in the following manner:
σθ1(E1 ⋈θ E2) = (σθ1(E1)) ⋈θ E2
where θ1 involves attributes from only E1
(b) Complex selection distributes over theta join in the following manner:
σθ1∧θ2(E1 ⋈θ E2) = (σθ1(E1)) ⋈θ (σθ2(E2))
where θ1 involves attributes from only E1
and θ2 involves attributes from only E2

More equivalences in Ch. 16.2 of the book ⋆

⋆ Silberschatz, Korth, and Sudarshan, Database System Concepts, 7th ed.
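
As a rough illustration of rule 7a, the sketch below evaluates both sides of the rule on one tiny instance and compares the results. Relations are modeled as lists of dicts; the helper names (select, theta_join, as_set) and the sample tuples are made up for this example. Agreement on one instance does not prove the rule, it only shows what the rule claims.

```python
# Relations are modeled as lists of dicts (attribute name -> value).
# All helper names and sample tuples are illustrative assumptions.

def select(pred, rel):
    """sigma_pred(rel): keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]

def theta_join(theta, r, s):
    """r join_theta s: merged pairs of tuples that satisfy theta."""
    return [{**tr, **ts} for tr in r for ts in s if theta(tr, ts)]

def as_set(rel):
    """Order-insensitive view of a relation, used to compare results."""
    return {frozenset(t.items()) for t in rel}

instr = [
    {"i_id": 1, "name": "Mozart", "dept_name": "Music"},
    {"i_id": 2, "name": "Einstein", "dept_name": "Physics"},
]
teaches = [
    {"t_i_id": 1, "c_id": "MU-199"},
    {"t_i_id": 2, "c_id": "PHY-101"},
]

def theta(tr, ts):
    """Join condition between INSTR and TEACHES."""
    return tr["i_id"] == ts["t_i_id"]

def theta1(t):
    """Selection condition that uses attributes of INSTR only."""
    return t["dept_name"] == "Music"

# Left side of rule 7a: selection applied after the join.
late = select(theta1, theta_join(theta, instr, teaches))
# Right side of rule 7a: selection pushed into the first operand.
early = theta_join(theta, select(theta1, instr), teaches)

rule_7a_holds_here = as_set(late) == as_set(early)
```

The second form joins only the Music instructors, so the intermediate result is smaller; this is the reason the order matters for efficiency even though the results coincide.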

Pictorial Depiction of Equivalence Rules

Exercise
 Disprove the equivalence

(R ⟕ S) ⟕ T = R ⟕ (S ⟕ T)

Definition (left outer join): the result of a left outer join T = R ⟕ S is a superset of the
result of the join T’ = R ⋈ S, in that all tuples in T’ appear in T. In addition, T preserves
those tuples of R that are lost in the join, by creating tuples in T that are filled with null
values

STUD
stud_id  name     surname
1        gino     bianchi
2        filippo  neri
3        mario    rossi

TAKES
stud_id  course  grade
1        Math    30
2        DB      22
2        Logic   30

STUD ⟕ TAKES
stud_id  name     surname  course  grade
1        gino     bianchi  Math    30
2        filippo  neri     DB      22
2        filippo  neri     Logic   30
3        mario    rossi    null    null

TAKES ⟕ STUD ??? equivalent to TAKES ⋈ STUD

Solution
 Disprove the equivalence (R ⟕ S) ⟕ T = R ⟕ (S ⟕ T)

R            S            T
A  AR        A  AS        A  AT
1  1         2  1         1  1

R ⟕ S                    S ⟕ T
A  AR  AS                A  AS  AT
1  1   null              2  1   null

(R ⟕ S) ⟕ T              R ⟕ (S ⟕ T)
A  AR  AS    AT          A  AR  AS    AT
1  1   null  1           1  1   null  null

The two results differ in the value of AT, so the equivalence does not hold.
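
The counterexample can be replayed mechanically. The sketch below is a minimal natural left outer join over lists of dicts, with None standing in for null; the function name and the explicit s_attrs parameter are assumptions made for this illustration, not part of the slides.

```python
def natural_left_outer_join(r, s, s_attrs):
    """Natural left outer join of r and s (lists of dicts).

    s_attrs lists the attributes of s, so that tuples of r without a
    match can be padded with None (standing in for null). Join
    attributes are assumed to be non-null, as in the example.
    """
    out = []
    for tr in r:
        common = [a for a in s_attrs if a in tr]
        matches = [ts for ts in s if all(ts[a] == tr[a] for a in common)]
        if matches:
            out.extend({**tr, **ts} for ts in matches)
        else:
            # Preserve the dangling tuple of r, padding the attributes of s.
            padding = {a: None for a in s_attrs if a not in tr}
            out.append({**tr, **padding})
    return out

R = [{"A": 1, "AR": 1}]
S = [{"A": 2, "AS": 1}]
T = [{"A": 1, "AT": 1}]

# (R left-outer-join S) left-outer-join T
left = natural_left_outer_join(
    natural_left_outer_join(R, S, ["A", "AS"]), T, ["A", "AT"]
)
# R left-outer-join (S left-outer-join T)
right = natural_left_outer_join(
    R, natural_left_outer_join(S, T, ["A", "AT"]), ["A", "AS", "AT"]
)
```

Here left holds the single tuple with AT = 1, while right holds the single tuple with AT = None, matching the two tables of the solution.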

Equivalence derivability and minimality

 Some equivalences can be derived from others
 example: 2 can be obtained from 1 (exploiting commutativity of conjunction)
 7b can be obtained from 1 and 7a
 Optimizers use minimal sets of equivalence rules

Enumeration of Equivalent Expressions
 Query optimizers use equivalence rules to systematically generate
expressions equivalent to the given one
 Can generate all equivalent expressions as follows:
 Repeat (starting from the set containing only the given expression)
 apply all applicable equivalence rules on every sub-expression of
every equivalent expression found so far
 add newly generated expressions to the set of equivalent
expressions
Until no new equivalent expressions are generated
 The above approach is very expensive in space and time
 Space: efficient expression-representation techniques
 1 copy is stored for shared sub-expressions
 Time: partial generation
 Dynamic programming
 Greedy techniques (select best choices at each step)
 Heuristics, e.g., single-relation operations
(selections, projections) are pushed inside (performed earlier)
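
The exhaustive enumeration loop described above can be sketched as follows, restricted for brevity to natural joins with rules 5 (commutativity) and 6a (associativity). The expression encoding (a relation is a string, a join is a tuple ("join", left, right)) and all function names are assumptions made for this illustration. The loop only revisits newly found expressions, which yields the same final set as reapplying the rules to every expression found so far.

```python
def rewrites_at_root(e):
    """Expressions obtained by applying one rule at the root of e."""
    out = set()
    if isinstance(e, tuple):
        _, a, b = e
        out.add(("join", b, a))                      # rule 5: commutativity
        if isinstance(a, tuple):                     # rule 6a, left to right
            _, a1, a2 = a
            out.add(("join", a1, ("join", a2, b)))
        if isinstance(b, tuple):                     # rule 6a, right to left
            _, b1, b2 = b
            out.add(("join", ("join", a, b1), b2))
    return out

def rewrites(e):
    """One rule application on e itself or on any sub-expression of e."""
    out = rewrites_at_root(e)
    if isinstance(e, tuple):
        _, a, b = e
        out |= {("join", a2, b) for a2 in rewrites(a)}
        out |= {("join", a, b2) for b2 in rewrites(b)}
    return out

def all_equivalent(e):
    """Repeat rule application until no new expression is generated."""
    found = {e}
    frontier = {e}
    while frontier:
        new = set()
        for x in frontier:
            new |= rewrites(x) - found
        found |= new
        frontier = new
    return found

start = ("join", ("join", "r1", "r2"), "r3")
plans = all_equivalent(start)
```

For three relations the loop stops with 12 expressions. The set grows very quickly with the number of relations, which is why practical optimizers share sub-expressions and generate only part of the space.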

Estimating Statistics of Expression Results
 The Catalog
 Size estimation
 Selection
 Join
 Other operations (projection, aggregation, set operations, outer join)

 Estimation of number of distinct values



Cost Estimation
 Cost of each operator computed as described in Chapter 15 ⋆
 Need statistics of input relations
 E.g. number of tuples, number of blocks
 Statistics are collected in the Catalog
 Inputs can be results of sub-expressions
 Need to estimate statistics of expression results
 Estimation of size of intermediate results
 number of tuples in input to successive operations
 Estimation of number of distinct values in intermediate results
 selectivity rate of successive selection operations
 Statistics are not totally accurate
 Information in the catalog might not always be up to date (delay)
 A precise estimate for intermediate results might be impossible to compute

⋆ Silberschatz, Korth, and Sudarshan, Database System Concepts, 7th ed.

Statistical Information for Cost Estimation
 Statistics information is maintained in the Catalog
 The catalog is itself stored in the database as relation(s)
 It contains:
 nr: number of tuples in a relation r
 br: number of blocks containing tuples of r
 lr: size of a tuple of r (in bytes)
 fr: blocking factor of r – i.e., the number of tuples of r that fit into one block
 V(A, r): number of distinct values that appear in r for set of attributes A
 V(A, r) = the size of ∏A(r) – if A is a key, then V(A,r) = nr
 min(A,r): smallest value appearing in relation r for set of attributes A
 max(A,r): largest value appearing in relation r for set of attributes A
 statistics about indices (height of B+-trees, number of blocks for leaves, …)
 We assume tuples of r are stored together physically in a file; then: br = ⌈ nr / fr ⌉
 Information is not always up to date
 Catalog is not updated on every DB change (updates are done during periods of light system load)
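
A minimal sketch of how these per-relation statistics might be held in memory, with br derived from nr and fr as in the formula above. The class name, field names, and sample numbers are assumptions for illustration only; real systems store this information in catalog relations.

```python
from dataclasses import dataclass, field
from math import ceil

@dataclass
class RelationStats:
    """Catalog statistics for one relation r (illustrative layout)."""
    n: int                      # n_r: number of tuples
    f: int                      # f_r: blocking factor (tuples per block)
    l: int                      # l_r: size of a tuple in bytes
    distinct: dict = field(default_factory=dict)   # V(A, r) per attribute A
    minimum: dict = field(default_factory=dict)    # min(A, r) per attribute A
    maximum: dict = field(default_factory=dict)    # max(A, r) per attribute A

    @property
    def b(self) -> int:
        """b_r = ceil(n_r / f_r), assuming tuples of r are stored together."""
        return ceil(self.n / self.f)

# Made-up numbers: 10001 tuples, 40 tuples per block.
stats = RelationStats(n=10001, f=40, l=100, distinct={"dept_name": 50})
blocks = stats.b
```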

Histograms
 Histogram on attribute age of relation person
(figure: bar chart with frequency, from 10 to 50, on the vertical axis and the value ranges 1–5, 6–10, 11–15, 16–20, 21–25 on the horizontal axis)
 For each range
 Number of records (tuples) with value in the range
 Also, number of distinct values in the range (red numbers in the picture)

 Without histogram information, uniform distribution is assumed


 Little space occupation
 Histograms for many attributes on many relations can be stored

Selection Size Estimation
 # of records that will satisfy the selection predicate (aka selection condition)
 σA=v(r) (we are assuming that v actually is present in A)
 nr / V(A,r) (no histogram, uniform distribution)
 1 if A is key
 σA≤v(r) (case σA≥v(r) is symmetric)
 0 if v < min(A,r)
 nr if v >= max(A,r)
 nr * (v − min(A,r)) / (max(A,r) − min(A,r)) otherwise (no histogram, uniform distribution)
 In absence of statistical information or when v is unknown at time of cost estimation
(e.g., v is computed at run-time by the application using the DB), we assume
 nr / 2
 If histograms are available, we can do more precise estimates
 use values for restricted ranges instead of nr , V(A,r), min(A, r), max(A,r)
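
The estimates above (without histograms) translate directly into code. The sketch below is one possible transcription; the function and parameter names are hypothetical, and passing v=None is an assumed convention for "v unknown at estimation time".

```python
def eq_selection_size(n_r, v_a_r, is_key=False):
    """Estimated size of sigma_{A=v}(r), assuming v occurs in r."""
    return 1 if is_key else n_r / v_a_r

def le_selection_size(n_r, min_a, max_a, v=None):
    """Estimated size of sigma_{A<=v}(r) under a uniform distribution."""
    if v is None:          # v unknown when the cost is estimated
        return n_r / 2
    if v < min_a:
        return 0
    if v >= max_a:
        return n_r
    return n_r * (v - min_a) / (max_a - min_a)

# Made-up statistics: 1000 tuples, A ranging over 0..100.
quarter = le_selection_size(1000, 0, 100, v=25)
unknown = le_selection_size(1000, 0, 100)
per_value = eq_selection_size(10000, 50)
```

With a histogram, the same functions could be applied to the range containing v, using the counts of that range in place of the whole-relation statistics.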

Complex Selection Size Estimation
 Conjunction E = σθ1∧θ2∧…∧θn(r)
 we compute si = size of the selection for θi (i = 1,…, n)
 selectivity rate (SR) of σθi(r): SR(σθi(r)) = si / nr (i = 1,…, n)
 SR(E) = Πi SR(σθi(r)) = (s1 / nr) * … * (sn / nr), where Πi is the product over i = 1,…,n (the conditions are assumed to be independent)
 # of records for E = nr * SR(E) = nr * (s1 * s2 * … * sn) / (nr)^n
 Disjunction E = σθ1∨θ2∨…∨θn(r) = σ¬(¬θ1∧¬θ2∧…∧¬θn)(r)
 SR(E) = 1 − SR(σ¬θ1∧¬θ2∧…∧¬θn(r))
 SR(σ¬θ1∧¬θ2∧…∧¬θn(r)) = (1 − s1 / nr) * … * (1 − sn / nr)
 # of records for E = nr * SR(E) = nr * [1 − (1 − s1 / nr) * (1 − s2 / nr) * … * (1 − sn / nr)]
 Negation E = σ¬θ(r)
 # of records for E = nr − # of records for σθ(r)
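
A small sketch of the three formulas, assuming the sizes si of the individual selections have already been estimated. Function names and the sample numbers are made up for illustration.

```python
from math import prod

def conjunction_size(n_r, sizes):
    """n_r * product of (s_i / n_r): conditions assumed independent."""
    return n_r * prod(s / n_r for s in sizes)

def disjunction_size(n_r, sizes):
    """n_r * (1 - product of (1 - s_i / n_r))."""
    return n_r * (1 - prod(1 - s / n_r for s in sizes))

def negation_size(n_r, size):
    """n_r minus the estimated size of the non-negated selection."""
    return n_r - size

# Made-up example: n_r = 1000, s1 = 100, s2 = 500.
both = conjunction_size(1000, [100, 500])      # about 50
either = disjunction_size(1000, [100, 500])    # about 550
neither = negation_size(1000, either)          # about 450
```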

Join Size Estimation
 # of records that will be included in the result
 (cartesian product) r x s: # of records = nr * ns
 (natural join on attribute A) r ⋈ s:
 for each tuple tr of r there are on average ns / V(A,s) tuples of s selected
 thus, # of records = nr * ns / V(A,s)
 by switching the roles of r and s we get # of records = nr * ns / V(A,r)
 the lower of the two estimates is the more accurate one: # of records = nr * ns / max{ V(A,r), V(A,s) }
 histograms can be used for more accurate estimations
 histograms must be on join attributes, for both relations, and with same ranges
 use values for each range of the histogram, instead of nr , ns , V(A,r), V(A,s), and then sum
estimations obtained for all ranges
 if A is key for r, then # of records <= ns (and vice versa)
 in addition, if A is not null in s, then # of records = ns (and vice versa)
 (theta join) r ⋈θ s
 r ⋈θ s = σθ(r x s): use formulas for cartesian product and selection
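
The join estimates can be written down in a few lines. The sketch below uses hypothetical function names; for the theta join it takes the selectivity rate of θ as a given parameter, which is an assumption of this example (in practice it would come from the selection formulas).

```python
def cartesian_product_size(n_r, n_s):
    """Size of r x s."""
    return n_r * n_s

def natural_join_size(n_r, n_s, v_a_r, v_a_s):
    """n_r * n_s / max(V(A,r), V(A,s)): the lower of the two estimates."""
    return n_r * n_s / max(v_a_r, v_a_s)

def theta_join_size(n_r, n_s, selectivity):
    """Theta join treated as a selection over the cartesian product."""
    return cartesian_product_size(n_r, n_s) * selectivity

# Made-up statistics: A is a key of r (V(A,r) = n_r = 5000),
# s has 10000 tuples with 2500 distinct A-values.
estimate = natural_join_size(5000, 10000, 5000, 2500)
```

In this example the estimate equals ns, consistent with the remark that a join on a key of r yields at most ns records.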

Size Estimation for Other Operations
 projection ∏A(r) (duplicates eliminated) # of records = V(A,r)
 aggregation GγF (r) # of records = V(G,r)
 set operations
 between selections on same relation use formulas for selection
 e.g.: σθ1(r) ∪ σθ2(r) = σθ1 ∨ θ2 (r)
 r∪s # of records = nr + ns
 r∩s # of records = min { nr , ns }
 r–s # of records = nr
 outer join
 left outer join # of records = # of records for inner join + nr
 right outer join # of records = # of records for inner join + ns
 full outer join # of records = # of records for inner join + nr + ns

Estimation for Number of Distinct Values
 # distinct values in the result for expression E and attribute (or set of attributes) A: V(A,E)
 Selection E = σθ (r)
 V(A, E) is a specific value for some conditions
 e.g., if condition θ is A=3 , then V(A, E) = 1
 e.g., if condition θ is 3 < A <= 6, then V(A, E) = 3 (assuming domain of A is the integers)
 condition A < v (or A > v, A >= v, … ) V(A,E) = V(A,r) * selectivity rate of the selection
 otherwise V(A,E) = min { nE , V(A,r) }
 Join E = r⋈s
 A only contains attributes from r V(A,E) = min { nE , V(A,r) }
 A only contains attributes from s V(A,E) = min { nE , V(A,s) }
 A contains attributes A1 from r and attributes A2 from s
V(A,E) = min { nE , V(A1, r) * V(A2 – A1, s) , V(A2, s) * V(A1 – A2, r) }

Choice of Evaluation Plans
 Dynamic Programming for Choosing Evaluation Plans



Choice of Evaluation Plans
 Must consider the interaction of evaluation techniques when choosing
evaluation plans
 choosing the cheapest algorithm for each operation independently
may not yield best overall algorithm. E.g.
 merge-join may be costlier than hash-join, but may provide a
sorted output which reduces the cost for an outer level
aggregation
 nested-loop join may provide opportunity for pipelining
 Practical query optimizers incorporate elements of the following two
broad approaches:
1. Search all the plans and choose the best plan in a cost-based
fashion
2. Use heuristics to choose a plan
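The merge-join remark above can be illustrated with a toy calculation. All costs below are invented numbers in arbitrary units, chosen only to show that the locally cheapest join algorithm need not give the cheapest overall plan when an outer aggregation benefits from sorted input.

```python
# Hypothetical per-operator costs (arbitrary units, not measurements).
JOIN_ALGOS = {
    "hash-join": {"cost": 100, "sorted_output": False},
    "merge-join": {"cost": 130, "sorted_output": True},
}
SORT_COST = 60  # extra sort needed before aggregation if the join output is unsorted
AGG_COST = 20   # sort-based aggregation over already sorted input


def plan_cost(join_algo: str) -> int:
    """Total cost of join followed by aggregation for the given join algorithm."""
    info = JOIN_ALGOS[join_algo]
    sort = 0 if info["sorted_output"] else SORT_COST
    return info["cost"] + sort + AGG_COST


locally_cheapest = min(JOIN_ALGOS, key=lambda a: JOIN_ALGOS[a]["cost"])
globally_cheapest = min(JOIN_ALGOS, key=plan_cost)

print("cheapest join in isolation:", locally_cheapest)
print("cheapest overall plan uses:", globally_cheapest)
```

With these made-up numbers the hash-join wins in isolation, while the merge-join plan is cheaper once the aggregation is taken into account.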

Cost-Based Optimization
 A big part of a cost-based optimizer (based on equivalence rules) is
choosing the “best” order for join operations
 Consider finding the best join-order for r1 ⋈ r2 ⋈ . . . ⋈ rn.
 There are (2(n – 1))!/(n – 1)! different join orders for above expression.
With n = 7, the number is 665280, with n = 10, the number is greater
than 17.6 billion!
 No need to generate all the join orders. Exploiting some monotonicity
(optimal substructure property), the least-cost join order for any subset
of {r1, r2, . . ., rn} is computed only once.
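The join-order count quoted above is easy to check numerically. The helper name below is an illustrative assumption; the formula is the one given on the slide.

```python
from math import factorial


def count_join_orders(n: int) -> int:
    """Number of different join orders for n relations: (2(n-1))! / (n-1)!."""
    if n < 1:
        raise ValueError("need at least one relation")
    return factorial(2 * (n - 1)) // factorial(n - 1)


for n in (5, 7, 10):
    print(n, count_join_orders(n))
```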

Cost-Based Optimization: An example
 Consider finding the best join-order for r1 ⋈ r2 ⋈ r3 ⋈ r4 ⋈ r5
 Number of possible different join orderings: (2(n – 1))!/(n – 1)! = 8!/4! = 1680
 The least-cost join order for any subset of { r1, r2, r3, r4, r5 } is computed only once
 Assume we want to compute N123/45 : number of possible different join orderings
where r1, r2, r3 are grouped together, e.g.,

(r1 ⋈ r2 ⋈ r3) ⋈ r4 ⋈ r5      (r2 ⋈ r3 ⋈ r1) ⋈ (r5 ⋈ r4)      r4 ⋈ (r5 ⋈ (r1 ⋈ (r2 ⋈ r3)))      …

 The naïve approach
 N123/45 = N123 * N45
 N123 = 4!/2! = 12 (N123 : # ways of arranging r1, r2, and r3)
 N45 = N123 = 12 (N45 : # ways of arranging r4 and r5 with respect to the block of r1, r2, and r3)
 N123/45 = 12 * 12 = 144
 Exploiting optimal substructure property:
 compute only once best ordering for r1 ⋈ r2 ⋈ r3 : 12 possibilities (N123)
 compute best ordering for R123 ⋈ r4 ⋈ r5 : 12 possibilities (N45)
 Therefore, N123/45 = 12 + 12 = 24 (orderings examined)
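The arithmetic of this example can be reproduced in a few lines. The variable names mirror the slide's N123 and N45; the helper `orderings` is a hypothetical name for the join-order formula.

```python
from math import factorial


def orderings(k: int) -> int:
    """Number of join orders for k operands: (2(k-1))! / (k-1)!."""
    return factorial(2 * (k - 1)) // factorial(k - 1)


n123 = orderings(3)  # ways of arranging r1, r2, r3
n45 = orderings(3)   # ways of arranging the block R123 together with r4 and r5

naive = n123 * n45       # every inner ordering combined with every outer ordering
with_reuse = n123 + n45  # best inner ordering found once, then reused

print(naive, with_reuse)
```

The product becomes a sum because the best ordering of the block is computed once and then treated as a single operand.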

Dynamic Programming in Optimization

 To find best join tree (equivalently, best join order) for a set S of n relations:
 Consider all possible plans of the form S’ ⋈ (S \ S’ ), for every non-empty proper subset S’ of S
 Recursively compute (and store) costs of best join orders for subsets S’ and S \ S’. Choose the cheapest of the 2^n – 2 alternatives
 Base case for recursion: find best algorithm for scanning relation
 When a plan for a subset is computed, store it and reuse it when it is required again, instead of re-computing it
 Dynamic programming
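As a small illustration of the splits considered at each step, the sketch below enumerates every pair (S’, S \ S’) for a set of relations. The generator name and the use of `itertools.combinations` are choices made for this example, not part of the slides.

```python
from itertools import combinations


def proper_splits(relations):
    """Yield (S1, S - S1) for every non-empty proper subset S1 of the given set."""
    whole = frozenset(relations)
    members = sorted(whole)
    for size in range(1, len(members)):
        for left in combinations(members, size):
            s1 = frozenset(left)
            yield s1, whole - s1


splits = list(proper_splits({"r1", "r2", "r3"}))
for s1, s2 in splits:
    print(sorted(s1), "JOIN", sorted(s2))
print(len(splits), "alternatives")
```

For n relations this yields 2^n – 2 alternatives, since the empty set and the full set are excluded.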

Join Order Optimization Algorithm
procedure findbestplan(S)
    if (bestplan[S].cost ≠ ∞)
        return bestplan[S]
    // else bestplan[S] has not been computed earlier, compute it now
    if (S contains only 1 relation)
        set bestplan[S].plan and bestplan[S].cost based on the best way
        of accessing S /* Using selections on S and indices on S */
    else for each non-empty subset S1 of S such that S1 ≠ S
        P1 = findbestplan(S1)
        P2 = findbestplan(S - S1)
        A = best algorithm for joining results of P1 and P2
        cost = P1.cost + P2.cost + cost of A
        if cost < bestplan[S].cost
            bestplan[S].cost = cost
            bestplan[S].plan = “execute P1.plan; execute P2.plan;
                                join results of P1 and P2 using A”
    return bestplan[S]

* This is the algorithm shown in the 6th edition of the textbook. It is slightly different from the algorithm we presented during our class, especially the way the base case is handled.
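A runnable sketch of findbestplan is given below. It is not the textbook's implementation: the cardinalities, the uniform join selectivity, and the cost model (scan cost = number of tuples, join cost = product of input sizes) are all hypothetical assumptions, and `functools.lru_cache` stands in for the bestplan[] table.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical statistics: relation name -> number of tuples.
CARD = {"r1": 1000, "r2": 50, "r3": 200, "r4": 10}
SELECTIVITY = 0.01  # assumed uniform selectivity of every join predicate


def result_size(rels: frozenset) -> float:
    """Toy estimate of the size of the join of all relations in rels."""
    size = 1.0
    for r in rels:
        size *= CARD[r]
    return size * SELECTIVITY ** (len(rels) - 1)


@lru_cache(maxsize=None)  # memoization plays the role of bestplan[S]
def find_best_plan(rels: frozenset):
    """Return (cost, plan); a plan is a relation name or a pair of sub-plans."""
    if len(rels) == 1:
        (name,) = rels
        return float(CARD[name]), name  # base case: cost of a full scan
    best_cost, best_plan = float("inf"), None
    members = sorted(rels)
    for size in range(1, len(members)):
        for left in combinations(members, size):
            s1 = frozenset(left)
            s2 = rels - s1
            c1, p1 = find_best_plan(s1)
            c2, p2 = find_best_plan(s2)
            join_cost = result_size(s1) * result_size(s2)  # toy nested-loop cost
            cost = c1 + c2 + join_cost
            if cost < best_cost:
                best_cost, best_plan = cost, (p1, p2)
    return best_cost, best_plan


cost, plan = find_best_plan(frozenset(CARD))
print(cost, plan)
```

Because results are cached per subset, each subset of relations is optimized only once, however many larger subsets contain it.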

Cost of Optimization
 With dynamic programming time complexity of optimization is O(3^n).
 With n = 10, this number is about 59000 instead of 17.6 billion!
 Space complexity is O(2^n)
 Better time performance when considering only left-deep join trees (heuristic approach): O(n 2^n)
 Space complexity remains at O(2^n)
 Cost-based optimization is expensive, but worthwhile for queries on large datasets (typical queries have small n, generally < 10)
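One way to see where the 3^n figure comes from: a subset with k relations can be split in 2^k ways (counting the two trivial splits), and there are C(n, k) subsets of size k. Summing over k gives 3^n by the binomial theorem. The sketch below checks this numerically; the function names are illustrative only.

```python
from math import comb, factorial


def splits_over_all_subsets(n: int) -> int:
    """Sum over all subsets of n relations of the number of ways to split each one."""
    return sum(comb(n, k) * 2**k for k in range(n + 1))


def all_join_orders(n: int) -> int:
    """Exhaustive enumeration size: (2(n-1))! / (n-1)!."""
    return factorial(2 * (n - 1)) // factorial(n - 1)


n = 10
print("dynamic programming, 3^n :", splits_over_all_subsets(n))
print("left-deep only, n * 2^n  :", n * 2**n)
print("stored plans, 2^n        :", 2**n)
print("all join orders          :", all_join_orders(n))
```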

Cost-Based Optimization with Equivalence Rules
 Physical equivalence rules equate logical operations (e.g., join) to physical ones (i.e., implementations, e.g., nested-loop join, merge join)
 Relational algebra expressions are converted into QEPs with implementation details
 An efficient optimizer based on equivalence rules depends on
 A space-efficient representation of expressions which avoids making multiple copies of sub-expressions
 Efficient techniques for detecting duplicate derivations of expressions
 Dynamic programming or memoization techniques, which store the “best” plan for a sub-expression the first time it is computed, and reuse it on repeated optimization calls on the same sub-expression
 Cost-based pruning techniques that avoid generating all plans (greedy, heuristics)

Heuristic Optimization
 Cost-based optimization is expensive, even with dynamic programming
 Systems may use heuristics to reduce the number of choices that must be considered
 Heuristic optimization transforms the query-tree by using a set of rules
that typically (but not in all cases) improve execution performance:
 Perform selection early (reduces the number of tuples)
 Perform projection early (reduces the number of attributes)
 Perform most restrictive selection and join operations (i.e. with
smallest result size) before other similar operations
 Only consider left-deep join orders (particularly suited for pipelining
as only one input has to be pipelined, the other is a relation)

Structure of Query Optimizers
 Some systems use only heuristics, others combine heuristics with partial cost-based optimization.
 Many optimizers consider only left-deep join orders.
 Plus heuristics to push selections and projections down the query tree
 Reduces optimization complexity and generates plans amenable to pipelined evaluation.
 Heuristic optimization used in some versions of Oracle:
 Repeatedly pick “best” relation to join next
 It obtains and compares n plans (each starting with a different relation); in each plan, it picks the best next relation for the join
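The greedy strategy described above can be sketched as follows. This is only an illustration of the idea, not Oracle's actual algorithm: the statistics, the uniform selectivity, and the cost model are hypothetical, and "best next relation" is taken to mean the one with the cheapest next join under that toy model.

```python
# Hypothetical statistics: relation name -> number of tuples.
CARD = {"r1": 1000, "r2": 50, "r3": 200, "r4": 10}
SELECTIVITY = 0.01  # assumed uniform selectivity of every join predicate


def greedy_plan(start: str, card: dict, sel: float):
    """Build one left-deep order from `start`, always joining the cheapest relation next."""
    order = [start]
    size = float(card[start])  # estimated size of the current intermediate result
    cost = size                # cost of scanning the first relation
    remaining = set(card) - {start}
    while remaining:
        # Toy cost of joining the current result with relation r: size * card[r].
        nxt = min(sorted(remaining), key=lambda r: size * card[r])
        cost += card[nxt] + size * card[nxt]  # scan of nxt plus toy join cost
        size = size * card[nxt] * sel         # estimated size after the join
        order.append(nxt)
        remaining.remove(nxt)
    return cost, order


def best_greedy_plan(card: dict, sel: float):
    """Try each relation as the starting point and keep the cheapest of the n plans."""
    return min((greedy_plan(s, card, sel) for s in sorted(card)), key=lambda p: p[0])


cost, order = best_greedy_plan(CARD, SELECTIVITY)
print(cost, order)
```

Only n left-deep plans are built, so the search is far cheaper than dynamic programming, at the price of possibly missing the optimal order.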

End of Chapter
