4 Chapter Four
Basic Steps in Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation
Basic Steps in Query Processing
(Cont.)
• Parsing and translation
– translate the query into its internal form. This is
then translated into relational algebra.
– Parser checks syntax, verifies relations
• Evaluation
– The query-execution engine takes a query-
evaluation plan, executes that plan, and returns
the answers to the query.
Basic Steps in Query Processing :
Optimization
• A relational algebra expression may have many equivalent
expressions
– E.g., $\sigma_{balance<2500}(\Pi_{balance}(account))$ is equivalent to
$\Pi_{balance}(\sigma_{balance<2500}(account))$
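The equivalence of the two forms can be checked on a small instance. The sketch below is only an illustration: the rows of account and the helper names select, project and as_set are assumptions made here, and agreement on one instance does not prove the rule in general.

```python
# Hypothetical account relation; the rows are invented for illustration.
account = [
    {"account_number": "A-101", "balance": 500},
    {"account_number": "A-215", "balance": 700},
    {"account_number": "A-102", "balance": 4000},
    {"account_number": "A-305", "balance": 500},
]

def select(pred, rows):
    """Selection: keep the tuples that satisfy pred."""
    return [r for r in rows if pred(r)]

def project(attrs, rows):
    """Projection with set semantics: keep attrs, drop duplicate tuples."""
    result = []
    for r in rows:
        t = {a: r[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result

def as_set(rows):
    """Order-independent view of a relation, used to compare results."""
    return {tuple(sorted(r.items())) for r in rows}

under_2500 = lambda r: r["balance"] < 2500

# Form 1: project on balance first, then select.
select_after_project = select(under_2500, project(["balance"], account))
# Form 2: select first, then project on balance.
project_after_select = project(["balance"], select(under_2500, account))

print(as_set(select_after_project) == as_set(project_after_select))  # True
```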
Translating SQL Queries into Relational Algebra
(cont…)
SELECT LNAME, FNAME
FROM EMPLOYEE
WHERE SALARY > ( SELECT MAX (SALARY)
FROM EMPLOYEE
WHERE DNO = 5);
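One way to read this query is as two blocks: the inner block computes a single value, and the outer block uses that value as a constant in its selection. The sketch below mimics that decomposition in Python. The EMPLOYEE rows are invented for illustration, and this is a reading aid under those assumptions rather than the way any particular system evaluates the query.

```python
# Hypothetical EMPLOYEE rows, invented for illustration.
employee = [
    {"LNAME": "Smith", "FNAME": "John", "SALARY": 30000, "DNO": 5},
    {"LNAME": "Wong", "FNAME": "Franklin", "SALARY": 40000, "DNO": 5},
    {"LNAME": "Wallace", "FNAME": "Jennifer", "SALARY": 43000, "DNO": 4},
    {"LNAME": "Borg", "FNAME": "James", "SALARY": 55000, "DNO": 1},
]

# Inner block: MAX(SALARY) over the employees with DNO = 5.
# default=None stands in for SQL's NULL when department 5 is empty.
max_salary_dno5 = max(
    (e["SALARY"] for e in employee if e["DNO"] == 5), default=None
)

# Outer block: select on SALARY > (inner result), then project on LNAME, FNAME.
# A comparison with NULL is never true, so an empty inner block gives no rows.
result = [
    (e["LNAME"], e["FNAME"])
    for e in employee
    if max_salary_dno5 is not None and e["SALARY"] > max_salary_dno5
]

print(result)  # [('Wallace', 'Jennifer'), ('Borg', 'James')]
```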
Introduction (Cont.)
• An evaluation plan defines exactly what algorithm is used for each
operation, and how the execution of the operations is coordinated.
Introduction (Cont.)
• Cost difference between evaluation plans for a query can
be enormous
– E.g. seconds vs. days in some cases
• Steps in cost-based query optimization
1. Generate logically equivalent expressions using equivalence
rules
2. Annotate resultant expressions to get alternative query plans
3. Choose the cheapest plan based on estimated cost
• Estimation of plan cost based on:
– Statistical information about relations. Examples:
• number of tuples, number of distinct values for an attribute
– Statistics estimation for intermediate results
• to compute cost of complex expressions
– Cost formulae for algorithms, computed using statistics
Generating Equivalent
Expressions
Transformation of Relational Expressions
• Two relational algebra expressions are said to be
equivalent if the two expressions generate the same
set of tuples on every legal database instance
– Note: order of tuples is irrelevant
• In SQL, inputs and outputs are multisets of tuples
– Two expressions in the multiset version of the relational
algebra are said to be equivalent if the two expressions
generate the same multiset of tuples on every legal
database instance.
• An equivalence rule says that expressions of two
forms are equivalent
– Can replace expression of first form by second, or vice
versa
Equivalence Rules
1. Conjunctive selection operations can be
deconstructed into a sequence of individual
selections.
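Written as a formula, the rule reads as follows. The symbols are chosen here for illustration: $\theta_1$ and $\theta_2$ stand for the two conjuncts of the selection condition, and $E$ for any relational algebra expression.

```latex
\sigma_{\theta_1 \wedge \theta_2}(E) = \sigma_{\theta_1}(\sigma_{\theta_2}(E))
```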
Equivalence Rules (Cont.)
7. The selection operation distributes over the theta join
operation under the following two conditions:
(a) When all the attributes in $\theta_0$ involve only the
attributes of one of the expressions ($E_1$) being joined.
Equivalence Rules (Cont.)
8. The projection operation distributes over the theta join operation as
follows:
(a) if $\theta$ involves only attributes from $L_1 \cup L_2$:
$\Pi_{L_1 \cup L_2}(E_1 \bowtie_\theta E_2) = (\Pi_{L_1}(E_1)) \bowtie_\theta (\Pi_{L_2}(E_2))$
Equivalence Rules (Cont.)
9. The set operations union and intersection are
commutative
$E_1 \cup E_2 = E_2 \cup E_1$
$E_1 \cap E_2 = E_2 \cap E_1$
(set difference is not commutative).
10. Set union and intersection are associative.
$(E_1 \cup E_2) \cup E_3 = E_1 \cup (E_2 \cup E_3)$
$(E_1 \cap E_2) \cap E_3 = E_1 \cap (E_2 \cap E_3)$
11. The selection operation distributes over $\cup$, $\cap$ and $-$.
$\sigma_\theta(E_1 - E_2) = \sigma_\theta(E_1) - \sigma_\theta(E_2)$
Example with Multiple Transformations
• Query: Find the names of all customers with an
account at a Brooklyn branch whose account balance is
over $1000.
$\Pi_{customer\_name}(\sigma_{branch\_city = \text{“Brooklyn”} \wedge balance > 1000}(\ldots))$
Multiple Transformations (Cont.)
Transformation Example: Pushing Projections
$\Pi_{customer\_name}((\sigma_{branch\_city = \text{“Brooklyn”}}(branch) \bowtie account) \bowtie depositor)$
• When we compute
$(\sigma_{branch\_city = \text{“Brooklyn”}}(branch) \bowtie account)$
Join Ordering Example
• For all relations $r_1$, $r_2$, and $r_3$,
$(r_1 \bowtie r_2) \bowtie r_3 = r_1 \bowtie (r_2 \bowtie r_3)$
(Join Associativity)
• If $r_2 \bowtie r_3$ is quite large and $r_1 \bowtie r_2$ is small, we
choose
$(r_1 \bowtie r_2) \bowtie r_3$
so that we compute and store a smaller
temporary relation.
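The effect of the grouping on the temporary relation can be seen on a toy instance. In the sketch below the schemas r1(A, B), r2(B, C), r3(C, D), the rows, and the helper natural_join are all assumptions made for illustration. Both groupings return the same tuples, but the intermediate results differ in size.

```python
def natural_join(r, s):
    """Natural join of two lists of dicts on their shared attribute names."""
    out = []
    for a in r:
        for b in s:
            shared = a.keys() & b.keys()
            if all(a[k] == b[k] for k in shared):
                out.append({**a, **b})
    return out

# Hypothetical relations r1(A, B), r2(B, C), r3(C, D).
r1 = [{"A": 1, "B": 1}]
r2 = [{"B": b, "C": 0} for b in range(1, 6)]   # 5 tuples
r3 = [{"C": 0, "D": d} for d in range(10)]     # 10 tuples

r12 = natural_join(r1, r2)   # small temporary relation
r23 = natural_join(r2, r3)   # large temporary relation

left_first = natural_join(r12, r3)    # (r1 join r2) join r3
right_first = natural_join(r1, r23)   # r1 join (r2 join r3)

def as_set(rows):
    """Order-independent view of a relation, used to compare results."""
    return {frozenset(row.items()) for row in rows}

print(len(r12), len(r23))                          # 1 50
print(as_set(left_first) == as_set(right_first))   # True
```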
Join Ordering Example (Cont.)
• Consider the expression
$\Pi_{customer\_name}((\sigma_{branch\_city = \text{“Brooklyn”}}(branch)) \bowtie (account \bowtie depositor))$
• Could compute $account \bowtie depositor$ first, and join
result with
$\sigma_{branch\_city = \text{“Brooklyn”}}(branch)$
Enumeration of Equivalent
Expressions
• Query optimizers use equivalence rules to
systematically generate expressions equivalent to
the given expression
• Can generate all equivalent expressions as follows:
– Repeat
• apply all applicable equivalence rules on every equivalent
expression found so far
• add newly generated expressions to the set of equivalent
expressions
Until no new equivalent expressions are generated above
• The above approach is very expensive in space and
time
– Two approaches
• Optimized plan generation based on transformation rules
• Special case approach for queries with only selections,
projections and joins
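The repeat-until procedure above can be sketched as a worklist loop. The example below is a minimal illustration restricted to two rules, join commutativity and join associativity. The tuple encoding of expressions and the function names rewrites and all_equivalent are assumptions made here, not part of the original material.

```python
# An expression is a relation name (str) or a tuple ("join", left, right).

def rewrites(expr):
    """Yield every expression reachable from expr by one rule application."""
    if isinstance(expr, str):
        return
    _, left, right = expr
    # Join commutativity: E1 join E2 -> E2 join E1
    yield ("join", right, left)
    # Join associativity: (E1 join E2) join E3 -> E1 join (E2 join E3)
    if not isinstance(left, str):
        _, a, b = left
        yield ("join", a, ("join", b, right))
    # Join associativity in the other direction.
    if not isinstance(right, str):
        _, b, c = right
        yield ("join", ("join", left, b), c)
    # Apply the rules inside the subexpressions as well.
    for new_left in rewrites(left):
        yield ("join", new_left, right)
    for new_right in rewrites(right):
        yield ("join", left, new_right)

def all_equivalent(expr):
    """Apply the rules until no new equivalent expression is generated."""
    found = {expr}
    frontier = [expr]
    while frontier:
        current = frontier.pop()
        for candidate in rewrites(current):
            if candidate not in found:
                found.add(candidate)
                frontier.append(candidate)
    return found

start = ("join", ("join", "r1", "r2"), "r3")
print(len(all_equivalent(start)))  # 12
```

Even with only two rules the set grows quickly with the number of relations, which is the space and time cost the slide refers to.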
Cost Estimation
• The cost of each operator must be computed
– Need statistics of input relations
• E.g. number of tuples, sizes of tuples
• Inputs can be results of sub-expressions
– Need to estimate statistics of expression results
– To do so, we require additional statistics
• E.g. number of distinct values for an attribute
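As an illustration of how the number of distinct values is used, a common textbook estimate for the size of an equality selection on attribute A divides the number of tuples by the number of distinct values of A. It assumes the values are uniformly distributed. The function name and the figures below are hypothetical.

```python
def estimate_equality_selection(num_tuples, num_distinct):
    """Estimated number of tuples with A = v, assuming A's values are uniform."""
    if num_distinct <= 0:
        raise ValueError("num_distinct must be positive")
    return num_tuples / num_distinct

# Hypothetical statistics: 10000 account tuples spread over 50 branch names.
print(estimate_equality_selection(10000, 50))  # 200.0
```

Real data is often skewed, so an estimate of this kind can be far from the true size.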
Choice of Evaluation Plans
• Must consider the interaction of evaluation
techniques when choosing evaluation plans
– choosing the cheapest algorithm for each operation
independently may not yield best overall algorithm.
E.g.
• merge-join may be costlier than hash-join, but may provide a
sorted output which reduces the cost for an outer level
aggregation.
• nested-loop join may provide opportunity for pipelining
• Practical query optimizers incorporate elements of
the following two broad approaches:
1. Search all the plans and choose the best plan in a
cost-based fashion.
2. Use heuristics to choose a plan.
Cost-Based Optimization
• Consider finding the best join-order for $r_1 \bowtie r_2 \bowtie \ldots \bowtie r_n$.
• There are (2(n – 1))!/(n – 1)! different join orders for
the above expression. With n = 7, the number is 665280;
with n = 10, the number is greater than 17.6 billion!
• No need to generate all the join orders. Using
dynamic programming, the least-cost join order for
any subset of {r1, r2, . . . rn} is computed only once
and stored for future use.
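The count of join orders can be evaluated directly from the formula. The function name below is chosen here for illustration.

```python
from math import factorial

def number_of_join_orders(n):
    """(2(n - 1))! / (n - 1)! different join orders for n relations."""
    return factorial(2 * (n - 1)) // factorial(n - 1)

for n in (3, 7, 10):
    print(n, number_of_join_orders(n))
# 3 12
# 7 665280
# 10 17643225600
```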
Left Deep Join Trees
• In left-deep join trees, the right-hand-side
input for each join is a relation, not the
result of an intermediate join.
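The definition translates into a short recursive check. The sketch below assumes a join tree is encoded either as a relation name or as a pair (left, right). Both the encoding and the function name are chosen here for illustration.

```python
def is_left_deep(tree):
    """True if every right-hand input in the tree is a base relation."""
    if isinstance(tree, str):
        return True
    left, right = tree
    return isinstance(right, str) and is_left_deep(left)

left_deep = ((("r1", "r2"), "r3"), "r4")   # ((r1 join r2) join r3) join r4
bushy = (("r1", "r2"), ("r3", "r4"))       # (r1 join r2) join (r3 join r4)

print(is_left_deep(left_deep))  # True
print(is_left_deep(bushy))      # False
```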
Cost of Optimization
• With dynamic programming, the time complexity of
optimization with bushy trees is $O(3^n)$.
– With n = 10, this number is about 59000 instead of 17.6 billion!
• Space complexity is $O(2^n)$
• To find best left-deep join tree for a set of n relations:
– Consider n alternatives with one relation as right-hand side
input and the other relations as left-hand side input.
– Modify optimization algorithm:
• Replace “for each non-empty subset S1 of S such that S1 ≠ S”
• By: for each relation r in S
let S1 = S – {r}.
• If only left-deep trees are considered, time complexity of
finding best join order is $O(n 2^n)$
– Space complexity remains at $O(2^n)$
• Cost-based optimization is expensive, but worthwhile for
queries on large datasets (typical queries have small n,
generally < 10)
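A minimal sketch of the dynamic programming search, covering both the bushy and the left-deep variant, is given below. The statistics in CARD and SEL are hypothetical. The cost model, which charges each join the estimated size of its result, is an assumption made here to keep the example short. The names size and best_plan are also illustrative.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import combinations

# Hypothetical statistics: cardinalities and join selectivities.
CARD = {"r1": 1000, "r2": 100, "r3": 10}
SEL = {
    frozenset({"r1", "r2"}): Fraction(1, 100),
    frozenset({"r2", "r3"}): Fraction(1, 10),
}  # a pair with no entry is treated as a cross product (selectivity 1)

def size(rels):
    """Estimated number of tuples in the join of all relations in rels."""
    n = Fraction(1)
    for r in rels:
        n *= CARD[r]
    for a, b in combinations(sorted(rels), 2):
        n *= SEL.get(frozenset({a, b}), Fraction(1))
    return n

def best_plan(relations, left_deep=False):
    """Return (cost, tree) for the cheapest join order of the given relations."""

    @lru_cache(maxsize=None)  # each subset S is solved once and stored
    def best(S):
        if len(S) == 1:
            (r,) = S
            return Fraction(0), r
        names = sorted(S)
        if left_deep:
            # The right-hand input is a single relation r, and S1 = S - {r}.
            splits = [(S - {r}, frozenset({r})) for r in names]
        else:
            # Every non-empty proper subset S1 of S, paired with S - S1.
            splits = [
                (frozenset(c), S - frozenset(c))
                for k in range(1, len(names))
                for c in combinations(names, k)
            ]
        best_cost, best_tree = None, None
        for s1, s2 in splits:
            c1, t1 = best(s1)
            c2, t2 = best(s2)
            cost = c1 + c2 + size(S)  # assumed cost: size of the join result
            if best_cost is None or cost < best_cost:
                best_cost, best_tree = cost, (t1, t2)
        return best_cost, best_tree

    return best(frozenset(relations))

bushy_cost, bushy_tree = best_plan(CARD)
deep_cost, deep_tree = best_plan(CARD, left_deep=True)
print(bushy_cost, bushy_tree)
print(deep_cost, deep_tree)
```

With these numbers both variants join r2 and r3 first, because that pair has the smallest estimated result.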
Interesting Sort Orders
• Consider the expression $(r_1 \bowtie r_2) \bowtie r_3$ (with A as
common attribute)
• An interesting sort order is a particular sort order of
tuples that could be useful for a later operation
– Using merge-join to compute $r_1 \bowtie r_2$ may be costlier than
hash join but generates result sorted on A
– This in turn may make the merge-join with $r_3$ cheaper,
which may minimize the overall cost
– Sort order may also be useful for order by and for grouping
• Not sufficient to find the best join order for each subset
of the set of n given relations
– must find the best join order for each subset, for each
interesting sort order
– Simple extension of earlier dynamic programming algorithms
– Usually, number of interesting orders is quite small and
doesn’t affect time/space complexity significantly
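The extension amounts to keying the table of best plans by (subset, sort order) instead of by subset alone. The sketch below shows why both entries for {r1, r2} must be kept. All cost figures are hypothetical, and r3 is assumed to be already sorted on A.

```python
R1_R2 = frozenset({"r1", "r2"})

# Best known cost per (subset, sort order); None means "no particular order".
best = {
    (R1_R2, None): 100,  # hash join: cheapest for this subset, output unsorted
    (R1_R2, "A"): 120,   # merge join: costlier, but output is sorted on A
}

SORT_ON_A = 60       # hypothetical cost of sorting the intermediate result
MERGE_WITH_R3 = 50   # hypothetical cost of the final merge-join with r3

def total_cost(order):
    """Cost of finishing with a merge-join on A, given the order of r1 join r2."""
    cost = best[(R1_R2, order)]
    if order != "A":
        cost += SORT_ON_A  # an unsorted input must be sorted first
    return cost + MERGE_WITH_R3

totals = {order: total_cost(order) for order in (None, "A")}
print(totals)  # {None: 210, 'A': 170}
```

Keeping only the cheapest entry for {r1, r2} would have discarded the plan that turns out to be cheaper overall.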
Heuristic Optimization
• Cost-based optimization is expensive, even with
dynamic programming.
• Systems may use heuristics to reduce the number of
choices that must be made in a cost-based fashion.
• Heuristic optimization transforms the query-tree by
using a set of rules that typically (but not in all cases)
improve execution performance:
– Perform selection early (reduces the number of tuples)
– Perform projection early (reduces the number of attributes)
– Perform most restrictive selection and join operations (i.e.
with smallest result size) before other similar operations.
– Some systems use only heuristics, others combine heuristics
with partial cost-based optimization.
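The "perform selection early" rule can be sketched as a rewrite on a query tree. The tuple encoding, the function names, and the attribute lists given for branch, account and depositor are assumptions made for this illustration. The rewrite handles only the case where one join input supplies every attribute the selection needs.

```python
# Expression trees (an encoding assumed for this sketch):
#   ("rel", name, attributes)
#   ("join", left, right)
#   ("select", needed_attributes, predicate_text, child)

def attributes(expr):
    """Set of attribute names produced by an expression."""
    kind = expr[0]
    if kind == "rel":
        return set(expr[2])
    if kind == "join":
        return attributes(expr[1]) | attributes(expr[2])
    return attributes(expr[3])  # a selection keeps its child's attributes

def push_selections(expr):
    """Move each selection below a join when one input has all its attributes."""
    kind = expr[0]
    if kind == "rel":
        return expr
    if kind == "join":
        return ("join", push_selections(expr[1]), push_selections(expr[2]))
    _, needed, predicate, child = expr
    child = push_selections(child)
    if child[0] == "join":
        _, left, right = child
        if needed <= attributes(left):
            pushed = push_selections(("select", needed, predicate, left))
            return ("join", pushed, right)
        if needed <= attributes(right):
            pushed = push_selections(("select", needed, predicate, right))
            return ("join", left, pushed)
    return ("select", needed, predicate, child)

# Hypothetical schemas for the three relations.
branch = ("rel", "branch", ("branch_name", "branch_city", "assets"))
account = ("rel", "account", ("account_number", "branch_name", "balance"))
depositor = ("rel", "depositor", ("customer_name", "account_number"))

needed = frozenset({"branch_city"})
query = ("select", needed, "branch_city = 'Brooklyn'",
         ("join", ("join", branch, account), depositor))

rewritten = push_selections(query)
print(rewritten[1][1][0])  # select
```

After the rewrite the selection sits directly on branch, so fewer tuples reach either join.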
Structure of Query Optimizers
• Many optimizers consider only left-deep join
orders.
– Plus heuristics to push selections and projections down
the query tree
– Reduces optimization complexity and generates plans
amenable to pipelined evaluation.
• Heuristic optimization used in some versions of
Oracle:
– Repeatedly pick “best” relation to join next
• Starting from each of n starting points. Pick best among these
• Intricacies of SQL complicate query optimization
– E.g. nested subqueries
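The "repeatedly pick best relation to join next" heuristic can be sketched as a greedy loop that is run once from each starting relation. The slide does not define "best". The sketch below assumes it means the smallest estimated intermediate result. The statistics and function names are hypothetical, and this is not a description of Oracle's actual implementation.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical statistics: cardinalities and join selectivities.
CARD = {"r1": 1000, "r2": 100, "r3": 10}
SEL = {
    frozenset({"r1", "r2"}): Fraction(1, 100),
    frozenset({"r2", "r3"}): Fraction(1, 10),
}  # a pair with no entry is treated as a cross product (selectivity 1)

def size(rels):
    """Estimated number of tuples in the join of all relations in rels."""
    n = Fraction(1)
    for r in rels:
        n *= CARD[r]
    for a, b in combinations(sorted(rels), 2):
        n *= SEL.get(frozenset({a, b}), Fraction(1))
    return n

def greedy_from(start, relations):
    """Build a left-deep order from start, always adding the 'best' relation."""
    joined = frozenset({start})
    order, cost = [start], Fraction(0)
    remaining = set(relations) - {start}
    while remaining:
        # Assumed meaning of "best": smallest estimated intermediate result.
        nxt = min(sorted(remaining), key=lambda r: size(joined | {r}))
        joined = joined | {nxt}
        order.append(nxt)
        cost += size(joined)
        remaining.remove(nxt)
    return cost, order

def greedy_join_order(relations):
    """Try each of the n starting points and keep the cheapest order."""
    plans = (greedy_from(s, relations) for s in sorted(relations))
    return min(plans, key=lambda plan: plan[0])

cost, order = greedy_join_order(CARD)
print(cost, order)  # 1100 ['r2', 'r3', 'r1']
```

The loop examines far fewer orders than the dynamic programming search, but it can miss the cheapest plan.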
Structure of Query Optimizers (Cont.)
• Some query optimizers integrate heuristic selection
and the generation of alternative access plans.
– Frequently used approach
• heuristic rewriting of nested block structure and aggregation
• followed by cost-based join-order optimization for each block
– Some optimizers (e.g. SQL Server) apply transformations
to entire query and do not depend on block structure
• Even with the use of heuristics, cost-based query
optimization imposes a substantial overhead.
– But is worth it for expensive queries
– Optimizers often use simple heuristics for very cheap
queries, and perform exhaustive enumeration for more
expensive queries
End of Chapter