
QueryOptimization Siao

This document discusses query optimization in database management systems. It describes the key steps in cost-based query optimization as parsing, transformation, implementation, and plan selection based on cost estimates. The goal of query optimization is to process queries in the most efficient way possible by considering factors like disk I/O costs and operation costs. Transformation involves rewriting the query algebraically, while implementation generates different physical execution plans. The lowest cost plan is then selected to run the query.

Uploaded by

Mayank Kukreti
Copyright
© All Rights Reserved

Query Optimization

CS 157B
Ch. 14
Mien Siao
Outline
- Introduction
- Steps in Cost-based Query Optimization: Query Flow
- Projection Example
- Query Interaction in DBMS
- Cost-based Query Optimization: Algebraic Expressions
Introduction
What is Query Optimization?
Suppose you were given the chance to visit 15 pre-selected cities in Europe, and the only constraint was time.
-> Would you visit the cities in any order, or would you have a plan?
Plan:
-> Place the 15 cities in groups based on their proximity to each other.
-> Start with one group and move on to the next group.

The important point here is that you would visit the cities in a more organized manner, and the time constraint mentioned earlier would be dealt with efficiently.
Query optimization works in a similar way: there can be many different ways to get the answer to a given query, and the result is the same in all of them.

A DBMS strives to process the query in the most efficient way (in terms of time) to produce the answer.

Cost = Time needed to get all answers


Starting with System R, most commercial DBMSs use cost-based optimizers.
The cost estimation should be accurate and easy to compute. It also needs to be logically consistent, so that the plan with the least estimated cost is consistently the cheapest one.
Steps in Cost-based Query Optimization

1. Parsing
2. Transformation
3. Implementation
4. Plan selection based on cost estimates
Query Flow
SQL -> Parser -> Optimizer -> Code Generator/Interpreter -> Processor

Query Parser: Verify the validity of the SQL statement, and translate the query into an internal structure using relational calculus.
Query Optimizer: Find the best expression among the various equivalent algebraic expressions. The criterion used is cheapness (lowest estimated cost).
Code Generator/Interpreter: Make calls for the Query Processor based on the plan chosen by the optimizer.
Query Processor: Execute the calls obtained from the code generator.
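The chain of components can be pictured as a sequence of functions, each consuming the previous one's output. This is a toy sketch only: `parser`, `optimizer`, `code_generator` and `processor` are hypothetical stand-ins, not a real DBMS API, and the two candidate plans with costs 100 and 50 are placeholders rather than real estimates.

```python
from typing import Any, Callable

# Hypothetical stand-ins for the four components; none is a real DBMS API.
def parser(sql: str) -> dict:
    # Validity check, then an "internal structure" (here just a dict).
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("invalid statement")
    return {"query": sql.strip()}

def optimizer(tree: dict) -> dict:
    # Placeholder costs for two equivalent expressions; keep the cheapest.
    candidates = {"filter after join": 100, "filter before join": 50}
    return {**tree, "plan": min(candidates, key=candidates.get)}

def code_generator(tree: dict) -> list[str]:
    # Turn the chosen plan into calls for the processor.
    return [f"execute({tree['plan']!r})"]

def processor(calls: list[str]) -> list[str]:
    # "Execute" each call; here we only report that it ran.
    return [f"done: {call}" for call in calls]

PIPELINE: list[tuple[str, Callable[[Any], Any]]] = [
    ("Parser", parser),
    ("Optimizer", optimizer),
    ("Code Generator/Interpreter", code_generator),
    ("Processor", processor),
]

def run_query(sql: str, trace: list[str]) -> Any:
    value: Any = sql
    for name, stage in PIPELINE:
        trace.append(name)  # record the order in which components run
        value = stage(value)
    return value

trace: list[str] = []
result = run_query("SELECT p.pname FROM Patients p", trace)
print(trace)   # ['Parser', 'Optimizer', 'Code Generator/Interpreter', 'Processor']
print(result)  # ["done: execute('filter before join')"]
```

Each component only sees the output of the one before it, which is why the optimizer can swap plans freely as long as the code generator receives a valid plan.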
The cost of a physical plan includes processor time and communication time. The most important factor to consider is disk I/O, because it is the most time-consuming action.
Some other associated costs are:
- Operations (joins, unions, intersections).
- The order of operations.
  Why? Joins, unions, and intersections are associative and commutative, so they can be evaluated in many different orders, and those orders can differ greatly in cost.
- Managing the storage of arguments and passing them between operations.

The factors mentioned above should be minimized when creating the best physical plan.
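Because joins are associative and commutative, the number of equivalent join orders grows very quickly with the number of relations. The counts below are standard combinatorics, not taken from the slides: n! left-deep orders, and n! times the Catalan number C(n-1) for all binary join trees. They are a sketch of why the order of operations is worth optimizing.

```python
from math import comb, factorial

def left_deep_orders(n: int) -> int:
    # Each permutation of the n relations gives a different left-deep plan.
    return factorial(n)

def all_join_trees(n: int) -> int:
    # n! leaf orders times Catalan(n - 1) tree shapes, i.e. (2n-2)! / (n-1)!.
    catalan = comb(2 * (n - 1), n - 1) // n
    return factorial(n) * catalan

for n in (2, 3, 4, 5):
    print(n, left_deep_orders(n), all_join_trees(n))
# 2 2 2
# 3 6 12
# 4 24 120
# 5 120 1680
```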
Projection Example:
Projections produce one result tuple for every argument tuple.
What changes?
The number of tuples stays the same, so the change in output size comes entirely from the change in the length of the tuples.

Let's take a relation R:

Relation (20,000 tuples): R(a, b, c)
Each tuple (190 bytes): header = 24 bytes, a = 8 bytes, b = 8 bytes, c = 150 bytes
Each block (1,024 bytes): header = 24 bytes, leaving 1,000 bytes for tuples
We can fit 5 tuples into 1 block
- 5 tuples * 190 bytes/tuple = 950 bytes, which fits into 1 block
- For 20,000 tuples, we would require 4,000 blocks (20,000 tuples / 5 tuples per block = 4,000 blocks)

With a projection that eliminates column c (150 bytes), we can estimate that each tuple shrinks to 40 bytes (190 - 150 bytes).
Now the new estimate is 25 tuples per block.
- 25 tuples * 40 bytes/tuple = 1,000 bytes, which fits into 1 block
- With 20,000 tuples, the new estimate is 800 blocks (20,000 tuples / 25 tuples per block = 800 blocks)

The result is a reduction by a factor of 5 (4,000 blocks / 800 blocks).
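The block arithmetic above can be reproduced with a small helper. It assumes, as the example implicitly does, that tuples do not span blocks, so only whole tuples fit into the 1,000 usable bytes of each block; `blocks_needed` is a hypothetical name.

```python
import math

def blocks_needed(num_tuples: int, tuple_bytes: int,
                  block_bytes: int = 1024, block_header: int = 24) -> int:
    usable = block_bytes - block_header       # 1,000 bytes per block for data
    tuples_per_block = usable // tuple_bytes  # only whole tuples fit in a block
    return math.ceil(num_tuples / tuples_per_block)

TUPLE_HEADER = 24
A, B, C = 8, 8, 150

before = blocks_needed(20_000, TUPLE_HEADER + A + B + C)  # 190-byte tuples
after = blocks_needed(20_000, TUPLE_HEADER + A + B)       # c projected out: 40 bytes
print(before, after, before // after)  # 4000 800 5
```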


Query Interaction in DBMS
How does a query interact with a DBMS?
- Interactive users
- Embedded queries in programs written in C, C++, etc.
What is the difference between these two?
Interactive Users:
- When an interactive user submits a query, the query goes through the Query Parser, Query Optimizer, Code Generator, and Query Processor each time.
Embedded Query:
- An embedded query does not have to go through the Query Parser, Query Optimizer, and Code Generator each time; only the Query Processor runs on every execution.
- In an embedded query, the calls generated by the code generator are stored in the database. Each time the query is reached within the program at run time, the Query Processor invokes the stored calls in the database.
- In embedded queries, optimization is independent of execution: it is done once, ahead of time, rather than each time the query runs.
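The difference between the two cases can be illustrated by counting how often each component runs. This is a minimal sketch under stated assumptions: `compile_query`, `process`, `run_interactive`, `run_embedded` and the `stored_calls` dictionary are hypothetical, and the dictionary merely stands in for the calls that the slides say are stored in the database.

```python
from collections import Counter

stage_runs = Counter()                   # how often each component has run
stored_calls: dict[str, list[str]] = {}  # stand-in for calls stored in the database

def compile_query(sql: str) -> list[str]:
    # Parser, Optimizer and Code Generator together turn SQL into calls.
    for stage in ("Parser", "Optimizer", "Code Generator"):
        stage_runs[stage] += 1
    return [f"call({sql!r})"]

def process(calls: list[str]) -> None:
    # The Query Processor executes the calls (simulated by a counter).
    stage_runs["Processor"] += 1

def run_interactive(sql: str) -> None:
    # Every submission goes through all four components.
    process(compile_query(sql))

def run_embedded(sql: str) -> None:
    # Compile once, store the calls, and reuse them on later executions.
    if sql not in stored_calls:
        stored_calls[sql] = compile_query(sql)
    process(stored_calls[sql])

QUERY = "SELECT pname FROM Patients"
for _ in range(3):
    run_embedded(QUERY)
embedded_counts = dict(stage_runs)
stage_runs.clear()
for _ in range(3):
    run_interactive(QUERY)
interactive_counts = dict(stage_runs)
print(embedded_counts)     # {'Parser': 1, 'Optimizer': 1, 'Code Generator': 1, 'Processor': 3}
print(interactive_counts)  # {'Parser': 3, 'Optimizer': 3, 'Code Generator': 3, 'Processor': 3}
```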
Cost-based Query Optimization: Algebraic Expressions
If we had the following query:

SELECT p.pname, d.dname
FROM Patients p, Doctors d
WHERE p.doctor = d.dname
AND d.dgender = 'M'

its algebraic expression tree would be:

projection
  filter
    join
      Scan(Patients)
      Scan(Doctors)


Cost-based Query Optimization: Transformation
Original expression:
projection
  filter
    join
      Scan(Patients)
      Scan(Doctors)

Transformed expression (filter pushed below the join):
projection
  join
    Scan(Patients)
    filter
      Scan(Doctors)
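To see that the transformation preserves the answer, both expressions can be evaluated on a few rows. The sample patients and doctors below are invented for illustration, and the join is a plain nested loop; the point is only that filtering `d.dgender = 'M'` before the join gives the same result while the join has fewer pairs to compare.

```python
from typing import NamedTuple

class Patient(NamedTuple):
    pname: str
    doctor: str

class Doctor(NamedTuple):
    dname: str
    dgender: str

# Hypothetical sample rows, invented for this illustration.
patients = [Patient("Ann", "Lee"), Patient("Bob", "Kim"), Patient("Cy", "Lee")]
doctors = [Doctor("Lee", "M"), Doctor("Kim", "F")]

def join(ps, ds):
    # Nested-loop join on p.doctor = d.dname.
    return [(p, d) for p in ps for d in ds if p.doctor == d.dname]

def original_plan():
    joined = join(patients, doctors)                            # join first
    filtered = [(p, d) for p, d in joined if d.dgender == "M"]  # then filter
    return [(p.pname, d.dname) for p, d in filtered]            # projection

def transformed_plan():
    male_doctors = [d for d in doctors if d.dgender == "M"]     # filter first
    return [(p.pname, d.dname) for p, d in join(patients, male_doctors)]

print(original_plan())     # [('Ann', 'Lee'), ('Cy', 'Lee')]
print(transformed_plan())  # [('Ann', 'Lee'), ('Cy', 'Lee')]
```

Here the transformed plan's join compares 3 x 1 pairs instead of 3 x 2, which is the kind of saving the optimizer looks for.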


Cost-based Query Optimization: Implementation
Plan A:
projection
  filter
    natural join
      Scan(Patients)
      Scan(Doctors)

Plan B:
projection
  hash join
    Scan(Patients)
    filter
      Scan(Doctors)
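Implementation chooses a physical algorithm for each logical operator. The slides do not say how the "natural join" is executed; the sketch below assumes a nested-loop join for it and contrasts it with a hash join. Rows are plain dictionaries and the sample data is invented for illustration.

```python
from collections import defaultdict

def nested_loop_join(left, right, lkey, rkey):
    # Compare every left row with every right row: |left| * |right| comparisons.
    return [(l, r) for l in left for r in right if l[lkey] == r[rkey]]

def hash_join(left, right, lkey, rkey):
    # Build phase: hash the right input on its join key.
    table = defaultdict(list)
    for r in right:
        table[r[rkey]].append(r)
    # Probe phase: one hash lookup per left row.
    return [(l, r) for l in left for r in table.get(l[lkey], [])]

patients = [{"pname": "Ann", "doctor": "Lee"},
            {"pname": "Bob", "doctor": "Kim"},
            {"pname": "Cy", "doctor": "Lee"}]
doctors = [{"dname": "Lee", "dgender": "M"}, {"dname": "Kim", "dgender": "F"}]

nl = nested_loop_join(patients, doctors, "doctor", "dname")
hj = hash_join(patients, doctors, "doctor", "dname")
print([(l["pname"], r["dname"]) for l, r in hj])  # [('Ann', 'Lee'), ('Bob', 'Kim'), ('Cy', 'Lee')]
```

Both algorithms return the same pairs; they differ only in how much work they do, which is what the cost estimates in plan selection try to capture.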


Cost-based Query Optimization: Plan selection based on costs
Plan A (estimated cost = 100 ms):
projection
  filter
    natural join
      Scan(Patients)
      Scan(Doctors)

Plan B (estimated cost = 50 ms):
projection
  hash join
    Scan(Patients)
    filter
      Scan(Doctors)

The plan with the lowest estimated cost, Plan B, is selected to run the query.
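The final step reduces to picking the minimum over the candidates' estimated costs. In this sketch, the 100 ms and 50 ms figures come from the slide, while the `Plan` type and `select_plan` function are hypothetical names used for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Plan:
    description: str
    estimated_cost_ms: float

def select_plan(plans: list[Plan]) -> Plan:
    # Keep the candidate with the lowest estimated cost.
    return min(plans, key=lambda p: p.estimated_cost_ms)

candidates = [
    Plan("projection <- filter <- natural join(Patients, Doctors)", 100),
    Plan("projection <- hash join(Patients, filter(Doctors))", 50),
]
best = select_plan(candidates)
print(best.description, best.estimated_cost_ms)
```

The selection itself is trivial; the quality of the choice depends entirely on how accurate and consistent the cost estimates are.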
