Chapter 15
Query Execution
Sukarna Barua
Associate Professor, CSE, BUET
03/20/2024
The Query Processor
Converts high level SQL queries into a sequence of database operations and
executes those operations.
Converts high level query to a detailed description.
The Query Processor
Some algorithms assume that main memory is large enough to hold at least one of the relations involved
in an operation.
Others assume that the arguments are too big to fit in the main memory.
Query execution.
Query Compilation
Example: Parse tree shown (right) for the given SQL (left).
Issues to Consider
Issue 1: Which of the algebraically equivalent forms of a query leads to the
most efficient algorithm for answering the query?
Issue 2: For each operation, what algorithms should be used to implement that
operation?
Issue 3: How should the operations pass data from one to the other, e.g., in a
pipelined fashion, in memory buffers, or via the disk?
Best Query Plan
Issues to Consider
Image: https://www.cs.emory.edu/~cheung/Courses/554/Syllabus/4-query-exec/phys-ops.html
Physical Query Plan Operators
Scanning operation
Scanning
Read the entire contents of a relation R.
Read only those tuples of R that satisfy a given predicate.
Scanning Operation
Scanning approaches
Table-scan: R is stored in secondary memory, and its tuples are arranged in blocks.
- The blocks holding R are already known to the DBMS.
- Read the blocks one by one.
- This is called a table-scan.
Usage:
- When all blocks of R must be read.
Scanning Operation
Scanning approaches
Index-scan: There is an index on some attribute of R.
- Read the index.
- Use the index to locate all the blocks of R.
- Read the blocks one by one, in index order.
- Tuples are read sorted by the index attribute.
Usage:
- When the block locations of R are not known to the DBMS.
- When only the tuples satisfying a condition on an attribute are to be retrieved
[the index must be on that attribute].
Sorting While Scanning
Sort-Scan Operation
Sort-scan operation:
Sorts the relation R while scanning it.
If R is to be sorted by attribute a and there is a B-tree index on a:
- An index-scan produces R in sorted order.
If R fits in main memory:
- Retrieve the tuples of R using a table-scan or index-scan.
- Use a main-memory sorting algorithm.
If R is too large to fit in main memory:
- Use multi-way merge-sort [discussed later].
Query Execution Cost
Parameters for Measuring Cost
Main memory cost metric:
Main memory is divided into buffers.
M: number of main-memory buffers available to an operator.
M can be:
- The entire main memory, or
- A portion of main memory [typically when several operations share main
memory]
Parameters for Measuring Cost
Secondary memory (disk) cost metric:
Data is accessed one block at a time from disk.
Three parameters: B, T, and V.
B(R): number of blocks needed to hold R on disk.
- Can be written as B, if R is implied.
T(R): number of tuples in R.
- Can be written as T, if R is implied.
- T/B is the number of tuples in a single block.
V(R, a): number of distinct values of attribute "a" in R.
I/O Cost for Scan Operation
Cost of scan
Number of disk I/Os is approximately:
Cost = B(R) [if R is clustered]
Cost = T(R) [if R is not clustered, and its tuples are stored along with tuples of other
relations in disk blocks]
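The two scan-cost cases can be checked with a small calculation. This is only an illustrative sketch: the function name table_scan_cost and the relation sizes below are hypothetical, chosen to show how far apart B(R) and T(R) can be.

```python
def table_scan_cost(b_r, t_r, clustered):
    """Approximate disk I/Os for scanning R: B(R) if clustered, else T(R)."""
    return b_r if clustered else t_r

# Hypothetical relation: 20,000 tuples packed 20 per block, so 1,000 blocks.
T_R = 20_000
B_R = 1_000

clustered_cost = table_scan_cost(B_R, T_R, clustered=True)
unclustered_cost = table_scan_cost(B_R, T_R, clustered=False)
print(clustered_cost, unclustered_cost)
```

With these assumed sizes, the unclustered scan needs one I/O per tuple and is 20 times more expensive than the clustered scan.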
I/O Cost for Scan Operation
Cost of sort-scan
If R fits in main memory:
- Read R into memory.
- Perform an in-memory sort on R.
- Cost = B(R) disk I/Os [if R is clustered]; the in-memory sort adds no disk I/O.
I/O Cost for Index-scan Operation
Cost of index-scan
Read the index I first: B(I) blocks read.
Read the blocks of R: B(R) blocks read.
Total = B(I) + B(R) ≈ B(R) [since B(I) << B(R)]
Not useful when all of R is required.
Useful when only a part of R is required:
only the relevant blocks of R are retrieved.
Iterators for Physical Plan Operators
Iterators are implemented for physical operators:
An iterator returns the result of an operator one tuple at a time.
Iterators have the following three methods:
Open: initializes the data structures for getting blocks and tuples.
GetNext: returns the next tuple in the result.
Close: releases the data structures.
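The Open/GetNext/Close interface can be sketched for a table-scan as follows. This is a minimal illustration, not a real DBMS component: the class name TableScan is hypothetical, and a list of in-memory lists stands in for the disk blocks of R.

```python
class TableScan:
    """Iterator over a relation given as a list of blocks (each block is a list of tuples)."""

    def __init__(self, blocks):
        self.blocks = blocks   # assumption: in-memory stand-in for disk blocks
        self.block_no = None
        self.tuple_no = None

    def open(self):
        # Position the cursor before the first tuple of the first block.
        self.block_no = 0
        self.tuple_no = 0

    def get_next(self):
        # Skip exhausted (or empty) blocks, then return the next tuple.
        while self.block_no < len(self.blocks):
            block = self.blocks[self.block_no]
            if self.tuple_no < len(block):
                t = block[self.tuple_no]
                self.tuple_no += 1
                return t
            self.block_no += 1
            self.tuple_no = 0
        return None  # signals that no tuples remain

    def close(self):
        # Release the cursor state.
        self.block_no = None
        self.tuple_no = None


scan = TableScan([[(1, "a"), (2, "b")], [], [(3, "c")]])
scan.open()
rows = []
while (t := scan.get_next()) is not None:
    rows.append(t)
scan.close()
print(rows)
```

Returning None at the end is one possible convention for "no more tuples"; an operator higher in the plan would call get_next on its child in the same way.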
Types of Algorithms for Physical Plan Operators
One-pass algorithms
Involve reading the data only once from disk.
Require at least one argument to fit in main memory.
Two-pass algorithms
Used when relations are too large to fit in main memory.
Involve reading the data twice from disk:
read a first time from disk, process in some way, write to disk, and read a
second time from disk.
Types of Algorithms for Physical Plan Operators
Many-pass algorithms
No limit on the size of the data.
Involve three or more passes.
Types of Physical Plan Operations
Tuple-at-a-time, Unary Operations
Example operators:
Selection: σ
Projection: π
Do not require the entire relation in memory at once.
Read one block at a time into a main-memory buffer and produce the
output.
Types of Physical Plan Operations
Full-relation, Unary Operations
Example operators:
Gamma: γ (grouping operator)
Delta: δ (duplicate-elimination operator)
Require all or most of the tuples in memory at once. [ Why? ]
One-pass algorithms can be used only if R fits in the M available buffers.
Types of Physical Plan Operations
Full-relation, Binary Operations
Example operators:
Union: R ∪ S
Intersection: R ∩ S
Natural join: R ⋈ S
Product: R × S
A one-pass algorithm may be used if at least one argument fits in main memory.
One pass algorithm for tuple-at-a-time operation
Relational algebra operations: σ(R) and π(R)
Approach:
Read the blocks of R one at a time into an input buffer.
Perform the operation on each tuple and move the selected tuples to the output
buffer.
Requirement: M ≥ 1, regardless of B(R).
I/O Cost:
B(R) if R is clustered [table-scan].
T(R) if R is not clustered.
Exception: For a selection with a condition on an attribute for which an index is
available, use the index to retrieve a subset of R.
Cost of index-scan: < B(R)
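The block-at-a-time approach for selection and projection can be sketched as below. It is a simplified illustration: the function name one_pass_select_project is hypothetical, blocks are modeled as in-memory lists, and the predicate and column positions are supplied by the caller.

```python
def one_pass_select_project(blocks, predicate, columns):
    """Yield projected tuples that satisfy predicate, handling one block at a time."""
    for block in blocks:          # only the current block occupies the input buffer
        for t in block:
            if predicate(t):      # selection
                yield tuple(t[i] for i in columns)   # projection


# Hypothetical relation R(id, name, score) stored in two blocks.
R_blocks = [[(1, "x", 10), (2, "y", 25)], [(3, "z", 40)]]
result = list(one_pass_select_project(R_blocks, lambda t: t[2] > 20, columns=(0, 1)))
print(result)
```

Because no tuple is retained after it is processed, memory use stays at one input buffer no matter how many blocks R has.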
One-pass algorithm for Unary, Full-Relation Operation
Relational algebra operation: δ(R) [Duplicate elimination]
Use one memory buffer to hold one block of R.
Use the remaining M-1 buffers to hold the output tuples [a single copy of each tuple of R].
Algorithm:
For each tuple in the retrieved block:
If it is already among the output tuples, discard it (don't copy it to the output buffer).
Otherwise, copy it to the output buffer.
One-pass algorithm for Unary, Full-Relation Operation
Cost of checking whether a tuple already exists in the output list:
If each check takes time proportional to n [size of the output], the total time for duplicate
checking is O(n^2).
To avoid this, use a hash table with a large number of buckets for the output list.
What if B(δ(R)) > M-1?
The output doesn't fit in main memory.
The output must be moved back and forth between memory and disk, resulting in thrashing.
This increases the cost of duplicate checking.
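One-pass duplicate elimination with a hash-based output list can be sketched as follows. This is an illustration under stated assumptions: the function name one_pass_distinct is hypothetical, blocks are in-memory lists, and a Python set plays the role of the hash table held in the remaining buffers.

```python
def one_pass_distinct(blocks):
    """Yield each distinct tuple once, in order of first appearance."""
    seen = set()   # hash table holding one copy of every tuple output so far
    for block in blocks:
        for t in block:
            if t not in seen:   # expected constant-time membership test
                seen.add(t)
                yield t


R_blocks = [[(1, "a"), (2, "b")], [(1, "a"), (3, "c")], [(2, "b")]]
distinct = list(one_pass_distinct(R_blocks))
print(distinct)
```

The sketch assumes the set of distinct tuples fits in memory; it makes no attempt to handle the case where it does not.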
One-pass algorithm for Unary, Full-Relation Operation
Grouping operation: γ(R)
Involves zero or more grouping attributes.
One or more aggregated attributes.
Algorithm:
Create one entry for each group in main memory:
- Scan the blocks of R one at a time.
- For each tuple, find the entry for the tuple's group and update the aggregated
result of the group.
- For a MIN(a) or MAX(a) aggregate, record the minimum or maximum value seen so far.
- For a COUNT aggregation, add one to the accumulated value.
- For a SUM(a) aggregation, add the value of a to the accumulated sum.
- For AVG(a), use two accumulations, SUM(a) and COUNT.
One-pass algorithm for Unary, Full-Relation Operation
Result generation:
Results are produced only when all tuples of R have been read.
The output contains one entry for each group, taken from main memory.
Requirement:
An efficient data structure for finding the group entry for a tuple.
Hash tables or balanced trees can be used.
Number of disk I/Os: B(R)
Memory buffers requirement: the group entries must fit in M-1 buffers.
[Size of R with unique rows should fit in M-1 buffers]
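One-pass grouping with per-group accumulators can be sketched as below. It is a simplified example, assuming a single grouping attribute and a single aggregated attribute; the function name one_pass_group and the sample data are hypothetical, and a Python dict serves as the hash table of group entries.

```python
def one_pass_group(blocks, group_col, agg_col):
    """Return {group: (min, max, count, sum, avg)} after a single scan of the blocks."""
    groups = {}   # one entry per group: [min, max, count, sum]
    for block in blocks:
        for t in block:
            key, a = t[group_col], t[agg_col]
            entry = groups.get(key)
            if entry is None:
                groups[key] = [a, a, 1, a]
            else:
                entry[0] = min(entry[0], a)   # MIN: smallest value seen so far
                entry[1] = max(entry[1], a)   # MAX: largest value seen so far
                entry[2] += 1                 # COUNT: add one
                entry[3] += a                 # SUM: add the value of a
    # Output is produced only after all tuples are read; AVG = SUM / COUNT.
    return {k: (mn, mx, c, s, s / c) for k, (mn, mx, c, s) in groups.items()}


# Hypothetical relation R(dept, marks) stored in three blocks.
R_blocks = [[("cse", 10), ("eee", 30)], [("cse", 20), ("eee", 50)], [("cse", 60)]]
summary = one_pass_group(R_blocks, group_col=0, agg_col=1)
print(summary)
```

Memory use here grows with the number of groups rather than with the number of tuples.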
Binary operations: Bag Union
Bag union: R ∪ S
R and S are bags. The union keeps all tuples.
Algorithm:
First copy every tuple of R to the output.
Then copy every tuple of S to the output.
Number of disk I/Os: B(R) + B(S)
Number of memory buffers required: M = 1 suffices.
[no need to store tuples; pipelining outputs one tuple at a time]
Binary operations: Set Union
Set union: R ∪ S
R and S are sets. The union keeps one copy of each tuple that occurs in both R and S.
Algorithm:
Assume B(S) ≤ B(R) [S is the smaller relation].
First read S into main-memory buffers and build a main-memory data structure on the
search key [the entire tuple is the search key].
Copy all tuples of S to the output.
Retrieve one block of R at a time into a main-memory buffer. For each tuple t of R, check if it
is also in S [using the main-memory data structure on the search key].
If t is not in S, copy it to the output.
If t is in S, don't copy it.
An efficient data structure is required for storing S in main memory so that
the check can be done efficiently.
Binary operations: Set Union
Set union: R ∪ S
Number of disk I/Os required: B(R) + B(S)
Number of memory buffers required: min(B(R), B(S)) ≤ M-1.
[ At least one of R and S must fit in M-1 buffers ]
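One-pass set union can be sketched as follows, holding the smaller relation in memory and streaming the larger one. This is an illustration only: the function name one_pass_set_union is hypothetical, blocks are in-memory lists, and a Python set stands in for the main-memory search structure keyed on the whole tuple.

```python
def one_pass_set_union(r_blocks, s_blocks):
    """Set union of R and S; S (assumed to be the smaller relation) is held in memory."""
    s_tuples = set()
    out = []
    # Build phase: read S, remember every tuple, and copy it to the output.
    for block in s_blocks:
        for t in block:
            s_tuples.add(t)
            out.append(t)
    # Probe phase: read R one block at a time; output only tuples not seen in S.
    for block in r_blocks:
        for t in block:
            if t not in s_tuples:
                out.append(t)
    return out


R_blocks = [[(1,), (2,)], [(3,)]]
S_blocks = [[(2,), (4,)]]
union = one_pass_set_union(R_blocks, S_blocks)
print(union)
```

Each relation is read exactly once, which matches a disk cost of B(R) + B(S).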
Binary operations: Set Intersection
Set intersection: R ∩ S
R and S are sets. The intersection keeps only the tuples common to R and S.
Algorithm:
Assume B(S) ≤ B(R) [S is the smaller relation].
Read S into main-memory buffers and build a search data structure [the key is the full
tuple].
Read each block of R. For each tuple t of R, check if it is also in S
[using the main-memory data structure on the search key].
If t is in S, copy it to the output.
If t is not in S, don't copy it.
Number of disk I/Os required: B(R) + B(S)
Number of memory buffers required: min(B(R), B(S)) ≤ M-1.
Binary operations: Set Difference
Set difference: R - S
R and S are sets.
Keeps the tuples of R that are not in S.
Algorithm:
Assume B(S) ≤ B(R) [S is the smaller relation].
Read S into main-memory buffers and build a search data structure
[the search key is the full tuple].
Read each block of R. For each tuple t of R, check if it is also in S.
If t is not in S, copy it to the output.
If t is in S, don't copy it to the output.
Number of disk I/Os required: B(R) + B(S)
Number of memory buffers required: min(B(R), B(S)) ≤ M-1.
[ At least one of R and S must fit in M-1 buffers ]
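Set intersection and set difference share the same build-then-probe shape; only the test applied to each tuple of R differs. The sketch below shows both side by side. The function names are hypothetical, blocks are in-memory lists, and a Python set stands in for the search structure built on S.

```python
def build_lookup(s_blocks):
    """Read S once and return a set keyed on the full tuple."""
    return {t for block in s_blocks for t in block}


def one_pass_set_intersection(r_blocks, s_blocks):
    """Tuples of R that also appear in S."""
    s_tuples = build_lookup(s_blocks)
    return [t for block in r_blocks for t in block if t in s_tuples]


def one_pass_set_difference(r_blocks, s_blocks):
    """Tuples of R that do not appear in S."""
    s_tuples = build_lookup(s_blocks)
    return [t for block in r_blocks for t in block if t not in s_tuples]


R_blocks = [[(1,), (2,)], [(3,)]]
S_blocks = [[(2,), (4,)]]
common = one_pass_set_intersection(R_blocks, S_blocks)
only_r = one_pass_set_difference(R_blocks, S_blocks)
print(common, only_r)
```

Note that the difference is not symmetric: this sketch always keeps S in memory and streams R, so it computes the tuples of R missing from S, not the reverse.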
Binary operations: Bag Intersection
Bag intersection: R ∩ S
R and S are bags.
Bags allow multiple copies of the same tuple.
Also known as multisets.
A tuple appears in the intersection the minimum of the number of times it
appears in R and in S.
Binary operations: Bag Intersection
Algorithm for bag intersection:
Assume B(S) ≤ B(R) [S is the smaller relation].
Read S into main-memory buffers and build a search data structure [the key is the full tuple]
along with a count value.
Main memory stores only the unique tuples of S.
The count is the number of times the tuple occurs in S.
Read each block of R. For each tuple t of R, check if it is also in S and check the
count:
If the count is positive, decrease it by one and copy t to the output.
If the count is zero, no action is required.
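The counting scheme for bag intersection can be sketched as below. It is a minimal illustration: the function name one_pass_bag_intersection is hypothetical, blocks are in-memory lists, and collections.Counter from the Python standard library stores each distinct tuple of S with its count.

```python
from collections import Counter


def one_pass_bag_intersection(r_blocks, s_blocks):
    """Bag intersection of R and S; distinct tuples of S and their counts are held in memory."""
    counts = Counter(t for block in s_blocks for t in block)
    out = []
    for block in r_blocks:
        for t in block:
            if counts[t] > 0:    # a missing tuple has count 0
                counts[t] -= 1   # consume one copy from S
                out.append(t)
    return out


# ("a",) occurs three times in R and twice in S, so it appears twice in the result.
R_blocks = [[("a",), ("a",)], [("a",), ("b",)]]
S_blocks = [[("a",), ("a",), ("c",)]]
bag_result = one_pass_bag_intersection(R_blocks, S_blocks)
print(bag_result)
```

Decrementing the count on every match is what caps each tuple's multiplicity at the minimum of its counts in the two bags.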
Binary operations: Natural Join
Natural join: R(X, Y) ⋈ S(Y, Z)
Algorithm:
Assume B(S) ≤ B(R) [S is the smaller relation].
Read S into main-memory buffers and build a search data structure on S with search
key = Y.
Read each block of R into the one remaining memory buffer. For each tuple t of R:
Find the tuples of S that agree with t on Y.
Concatenate each matching tuple of S with t in main memory.
Send the concatenated tuple to the output.
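The one-pass natural join can be sketched as follows. This is a simplified illustration that assumes a single shared join attribute; the function name one_pass_natural_join and the parameters r_key and s_key (the position of the join attribute in each relation) are hypothetical, and a dict of lists stands in for the search structure on S.

```python
from collections import defaultdict


def one_pass_natural_join(r_blocks, s_blocks, r_key, s_key):
    """Join R and S on one shared attribute; S is held in memory."""
    # Build phase: index every tuple of S by its join-attribute value.
    index = defaultdict(list)
    for block in s_blocks:
        for s in block:
            index[s[s_key]].append(s)
    # Probe phase: stream R one block at a time and emit the concatenations.
    out = []
    for block in r_blocks:
        for r in block:
            for s in index.get(r[r_key], []):
                # Keep the shared attribute only once in the joined tuple.
                out.append(r + s[:s_key] + s[s_key + 1:])
    return out


# Hypothetical relations R(x, y) and S(y, z).
R_blocks = [[(1, "a"), (2, "b")], [(3, "a")]]
S_blocks = [[("a", 100), ("a", 200), ("c", 300)]]
joined = one_pass_natural_join(R_blocks, S_blocks, r_key=1, s_key=0)
print(joined)
```

A tuple of R can match several tuples of S, which is why the index maps each key to a list rather than to a single tuple.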