Chapter 15 (TCDS)

Query Execution
Sukarna Barua
Associate Professor, CSE, BUET
03/20/2024
The Query Processor

 Functions of the query processor:

 Converts high-level SQL queries into a sequence of database operations and
executes those operations.
 Converts the high-level query into a detailed description of the operations to perform.

 Uses efficient algorithms to execute each operation of the query.

The Query Processor

 Major approaches in query processing:

 Scanning, hashing, sorting, and indexing.

 Algorithms differ significantly in cost and structure:

 Some algorithms assume that main memory is large enough to hold at least one of
the relations involved in an operation.
 Others assume that the arguments are too big to fit in main memory.

 Query processing has two parts:
 Query compilation
 Query execution

Query Compilation

 Three major steps in query compilation:


(a) Parsing:
 A parse tree is constructed from the SQL query.
 When the tree is represented using relational-algebra operators, it is also
known as an expression tree.
 This representation is more succinct.

 Example: Parse tree shown (right) for the given SQL (left).

Query Compilation

 Three major steps in query compilation:

(b) Query rewrite:
 The parse tree is converted to an initial query plan.
 The initial plan is usually an algebraic representation of the query.
 The initial plan is transformed into a logical query plan.
 The logical plan is expected to require less time to execute than the
initial plan.

Query Compilation

 Three major steps in query compilation:


(c) Physical plan generation
 Converts the logical query plan to a physical query plan.
 Selects algorithms to implement each of the operations
in the logical plan.
 The physical plan includes details such as:
- How the relations in the query are accessed.
- Whether and when a relation is to be sorted.

Query Compilation

 Three major steps in query compilation:

(c) Physical plan generation
 Selects the physical plan with the lowest estimated cost.

Query Compilation

 Query rewrite + physical plan generation = Query optimizer

Issues to Consider

 Issue 1: Which of the algebraically equivalent forms of a query leads to the
most efficient algorithm for answering it?
 Issue 2: For each operation, which algorithm should be used to implement it?
 Issue 3: How should the operations pass data from one to the next: in a
pipelined fashion, through main-memory buffers, or via the disk?

Best Query Plan

 Metadata to consider when generating the best query plan:

 The size of each relation
 Statistics such as the approximate number and frequency of different values
for an attribute
 Existence of certain indexes
 Layout of data on disk

Issues to Consider

 [Figure: physical query plan operators. Image source: https://www.cs.emory.edu/~cheung/Courses/554/Syllabus/4-query-exec/phys-ops.html]

Physical Query Plan Operators

 Physical query plans are built from operators.

 Physical operators implement:
 Relational-algebra operations:
 Each implements one of the operations of relational algebra.
 Example: σ (selection), π (projection), etc.

 Non-relational-algebra operations:
 Scanning, sorting while scanning, etc.
 Example: table-scan, index-scan, etc.

Scanning Operation

 Scanning:
 Read the entire contents of a relation R, or
 Read only those tuples of R that satisfy a given predicate.

Scanning Operation

 Scanning approaches
 Table-scan: R is stored in secondary memory, with its tuples arranged in blocks.
- The blocks containing R are known to the DBMS.
- Read the blocks one by one.
- This is called a table-scan.
 Usage:
- When all blocks of R must be read.

Scanning Operation

 Scanning approaches
 Index-scan: There is an index on some attribute A of R.
- Read the index.
- Use the index to locate all the blocks of R.
- Read the blocks one by one, following the index.
- Tuples are read in sorted order of the index attribute A [for a sorted index such as a B-tree].
 Usage:
- When the locations of the blocks of R are not otherwise known to the DBMS.
- When only the tuples satisfying a condition on an attribute A are to be retrieved
[the index must be on A].

Sorting While Scanning

 Why sort while scanning?

 The query has an ORDER BY clause, so the final output must be sorted.
 Some algorithms for relational-algebra operations require one or both
arguments to be sorted relations.

Sort-Scan Operation

 Sort-scan operation:
 Sorts the relation while scanning it.
 If R is to be sorted by attribute a and there is a B-tree index on a:
- An index-scan produces R in sorted order.
 If R fits in main memory:
- Retrieve the tuples of R using a table-scan or index-scan.
- Use a main-memory sorting algorithm.
 If R is too large to fit in main memory:
- Use multiway merge-sort [discussed later].

Query Execution Cost

 A query consists of several relational-algebra operations.

 A physical query plan consists of several physical operators.
 Each operator implements an operation (relational or non-relational).
 Assumptions:
 The arguments of an operation are on disk.
 The final result is left in main memory or pipelined to the next operator [this
does not matter for cost calculation; why?]
 Measure of cost:
 Number of disk I/Os [the primary cost].
 Why? Getting data from disk takes far longer than getting it from main memory.

Parameters for Measuring Cost
 Main-memory cost metric:
 Main memory is divided into buffers.
 M: number of main-memory buffers available to an operator.
 M can be:
- The entire main memory, or
- A portion of main memory [typically when several operations share main
memory].

Parameters for Measuring Cost
 Secondary-memory (disk) cost metric:
 Data is accessed from disk one block at a time.
 Three parameters: B(R), T(R), and V(R, a).
 B(R): number of blocks needed to hold R on disk.
- Can be written as B if R is implied.
 T(R): number of tuples in R.
- Can be written as T if R is implied.
- T/B is the (average) number of tuples in a single block when R is clustered.
 V(R, a): number of distinct values of attribute a in R.

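To make B, T, and V concrete, here is a minimal Python sketch. It assumes a toy model, not taken from the slides, in which a relation on disk is a list of blocks and each block is a list of tuples. The function names and the sample relation are hypothetical.

```python
# Toy model (assumption): a relation on disk is a list of blocks,
# and each block is a list of tuples.

def B(blocks):
    """B(R): number of blocks holding R."""
    return len(blocks)

def T(blocks):
    """T(R): number of tuples in R."""
    return sum(len(block) for block in blocks)

def V(blocks, a):
    """V(R, a): number of distinct values of the attribute at position a."""
    return len({t[a] for block in blocks for t in block})

# Hypothetical R(a, b), stored two tuples per block
R = [[(1, "x"), (2, "y")], [(2, "z"), (3, "x")]]
print(B(R), T(R), V(R, 0), V(R, 1))  # 2 4 3 3
```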
I/O Cost for Scan Operation
 Cost of a table-scan
 Number of disk I/Os is approximately:
 B(R) [if R is clustered]
 T(R) [if R is not clustered, i.e., its tuples share disk blocks with tuples of
other relations, so each tuple may need its own I/O]

 We assume all relations are clustered, so Cost = B(R).

I/O Cost for Scan Operation
 Cost of sort-scan
 If R fits in main memory:
- Read R into memory.
- Perform an in-memory sort on R.
- Cost = B(R) [the sort itself needs no disk I/O].

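The following is a minimal sketch of sort-scan for the in-memory case. It uses a hypothetical toy model in which a relation is a list of blocks of tuples. The sketch counts one disk I/O per block read, so the reported cost is B(R). The function name and the error raised when B(R) > M are illustrative assumptions.

```python
def sort_scan(blocks, key, M):
    """Sort-scan of R when R fits in memory (B(R) <= M).
    Returns the sorted tuples and the number of disk I/Os."""
    if len(blocks) > M:
        # Too large for memory: this case needs multiway merge-sort instead.
        raise ValueError("B(R) > M: use multiway merge-sort instead")
    ios = 0
    tuples = []
    for block in blocks:      # read R into memory: one I/O per block
        ios += 1
        tuples.extend(block)
    tuples.sort(key=key)      # main-memory sort: no further disk I/O
    return tuples, ios

R = [[(3, "c"), (1, "a")], [(2, "b")]]
print(sort_scan(R, key=lambda t: t[0], M=4))
# ([(1, 'a'), (2, 'b'), (3, 'c')], 2)
```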
I/O Cost for Index-scan Operation

 Cost of index-scan
 Read the index first: B(I) blocks read.
 Read the blocks of R: B(R) blocks read.
 Total = B(I) + B(R) ≈ B(R) [since B(I) << B(R)].
 Not useful when all of R is required.
 Useful when only a part of R is required:
 Only the relevant blocks of R are retrieved.

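The scan and index-scan cost approximations can be compared using a few hypothetical statistics. This sketch encodes only the formulas on these slides. The input `relevant_blocks_of_R` is an assumption: it stands for the number of blocks of R that the index actually leads to.

```python
def table_scan_cost(B_R, T_R, clustered=True):
    """Approximate disk I/Os for a table-scan of R."""
    # Clustered: each block of R is read once.
    # Not clustered: roughly one I/O per tuple in the worst case.
    return B_R if clustered else T_R

def index_scan_cost(B_I, relevant_blocks_of_R):
    """Read the index (B(I) blocks), then only the blocks of R that are needed."""
    return B_I + relevant_blocks_of_R

# Hypothetical statistics: B(R) = 1000, T(R) = 20000, B(I) = 10
print(table_scan_cost(1000, 20000))                   # 1000
print(table_scan_cost(1000, 20000, clustered=False))  # 20000
print(index_scan_cost(10, 1000))  # 1010: all of R needed, no gain over a table-scan
print(index_scan_cost(10, 50))    # 60: only 50 relevant blocks of R are read
```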
Iterators for Physical Plan Operators
 Iterators are implemented for physical operators:
 An iterator returns the result of its operator one tuple at a time.
 Iterators have the following three methods:
 Open: initializes the data structures needed for getting blocks and tuples.
 GetNext: returns the next tuple in the result.
 Close: releases the data structures.

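The Open/GetNext/Close interface can be sketched in Python as shown below. This is an illustrative sketch, not the textbook's code. It makes three assumptions: relations are lists of blocks of tuples, `get_next` returns `None` when the result is exhausted, and the class names `TableScan` and `Select` are hypothetical. `Select` shows how one iterator can consume another so that tuples flow one at a time without materializing the whole relation.

```python
class TableScan:
    """Iterator for a table-scan over a relation stored as a list of blocks."""

    def __init__(self, blocks):
        self.blocks = blocks

    def open(self):
        # Initialize the position: current block index and tuple index within it.
        self.b, self.t = 0, 0

    def get_next(self):
        # Return the next tuple, moving to the next block when one is used up.
        while self.b < len(self.blocks):
            block = self.blocks[self.b]
            if self.t < len(block):
                self.t += 1
                return block[self.t - 1]
            self.b, self.t = self.b + 1, 0
        return None  # no more tuples

    def close(self):
        # Release the position state.
        self.b = self.t = None


class Select:
    """Iterator for sigma_pred(child): pulls tuples from the child on demand."""

    def __init__(self, child, pred):
        self.child, self.pred = child, pred

    def open(self):
        self.child.open()

    def get_next(self):
        while (tup := self.child.get_next()) is not None:
            if self.pred(tup):
                return tup
        return None

    def close(self):
        self.child.close()


R = [[(1, "a"), (5, "b")], [], [(7, "c")]]
op = Select(TableScan(R), lambda t: t[0] > 2)
op.open()
result = []
while (t := op.get_next()) is not None:
    result.append(t)
op.close()
print(result)  # [(5, 'b'), (7, 'c')]
```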
Types of Algorithms for Physical Plan Operators
 One-pass algorithms
 Read the data from disk only once.
 Require at least one argument to fit in main memory.
 Two-pass algorithms
 Used when relations are too large to fit in main memory.
 Read the data from disk twice:
 Read it the first time, process it in some way, write it back to disk, and
read it a second time.

Types of Algorithms for Physical Plan Operators
 Many-pass algorithms
 No inherent limit on the size of the data.
 Involve three or more passes.

Types of Physical Plan Operations
 Tuple-at-a-time, unary operations
 Example operators:
 Selection: σ
 Projection: π
 Do not require the entire relation in memory at once.
 Read one block at a time into a main-memory buffer and produce the
output.

Types of Physical Plan Operations
 Full-relation, unary operations
 Example operators:
 Gamma: γ (grouping operator)
 Delta: δ (duplicate-elimination operator)
 Require all or most of the tuples in memory at once. [Why?]
 One-pass algorithms can be used only if R fits (approximately) in M buffers.

Types of Physical Plan Operations
 Full-relation, binary operations
 Example operators:
 Union: ∪
 Intersection: ∩
 Natural join: ⋈
 Product: ×
 A one-pass algorithm may be used if at least one argument fits in main memory.

One-pass algorithm for tuple-at-a-time operations
 Relational-algebra operations: σ and π
 Approach:
 Read the blocks of R one at a time into an input buffer.
 Perform the operation on each tuple and move the selected tuples to the output
buffer.
 Requirement: M ≥ 1, regardless of B(R).
 I/O cost:
 B(R) if R is clustered [table-scan].
 T(R) if R is not clustered.
 Exception: for a selection whose condition is on an attribute for which an index is
available, use the index to retrieve only a subset of R.
 The cost of this index-scan is typically < B(R).

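Here is a minimal sketch of the one-pass algorithm for σ followed by π. It assumes a hypothetical toy model in which a relation is a list of blocks of tuples, and `pred` and `proj` are illustrative parameters. The sketch counts one disk I/O per block read, so for a clustered R the count equals B(R). Only one input block is held at a time.

```python
def one_pass_select_project(blocks, pred, proj):
    """Apply sigma_pred then pi_proj, reading R one block at a time.
    Returns the output tuples and the number of block reads (disk I/Os)."""
    out, ios = [], 0
    for block in blocks:            # the single input buffer holds this block
        ios += 1
        for t in block:
            if pred(t):
                out.append(proj(t))  # move the selected, projected tuple to the output
    return out, ios

R = [[(1, "a", 10), (2, "b", 20)], [(3, "c", 30), (4, "d", 40)]]
out, ios = one_pass_select_project(R, lambda t: t[2] >= 20, lambda t: (t[0], t[1]))
print(out, ios)  # [(2, 'b'), (3, 'c'), (4, 'd')] 2
```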
One-pass algorithm for Unary, Full-Relation Operation
 Relational-algebra operation: δ(R) [duplicate elimination]
 Use one memory buffer to hold one block of R.
 Use the remaining M − 1 buffers to hold the output tuples [a single copy of each tuple of R].
 Algorithm:
 For each tuple in the retrieved block:
 If it is already among the output tuples, discard it (do not copy it to the output buffer).
 Otherwise, copy it to the output.

One-pass algorithm for Unary, Full-Relation Operation
 Cost of checking whether a tuple already exists in the output list:
 If each check takes time proportional to n [the size of the output], the total
time for duplicate checking is O(n²).
 Use a hash table with a large number of buckets for the output list, so that each
check takes close to constant time.

 Requirement: B(δ(R)) ≤ M − 1 [the distinct tuples of R must fit in M − 1 buffers].

 What if B(δ(R)) > M − 1?
 The output does not fit in main memory.
 The output must be moved back and forth between memory and disk, resulting in thrashing.
 This greatly increases the cost of duplicate checking.

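One-pass duplicate elimination can be sketched as follows. The sketch makes three assumptions that are not from the slides. First, a relation is a list of blocks of tuples. Second, every block holds the same number of tuples, so M − 1 buffers hold `(M - 1) * tuples_per_block` tuples. Third, Python's built-in `set` stands in for the hash table. The function name is hypothetical.

```python
def one_pass_distinct(blocks, M, tuples_per_block):
    """One-pass delta(R). One buffer holds the current block of R; the other
    M - 1 buffers hold one copy of each distinct tuple seen so far."""
    capacity = (M - 1) * tuples_per_block
    seen = set()          # hash table: near-constant-time membership check
    out = []
    for block in blocks:
        for t in block:
            if t not in seen:
                if len(seen) == capacity:
                    # B(delta(R)) > M - 1: the one-pass algorithm does not apply.
                    raise MemoryError("distinct tuples do not fit in M - 1 buffers")
                seen.add(t)
                out.append(t)   # first occurrence goes to the output
    return out

R = [[(1,), (2,)], [(2,), (3,)], [(1,), (3,)]]
print(one_pass_distinct(R, M=3, tuples_per_block=2))  # [(1,), (2,), (3,)]
```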
One-pass algorithm for Unary, Full-Relation Operation

 Grouping operation γ(R):
 Involves zero or more grouping attributes
 And one or more aggregated attributes
 Algorithm:
 Create one main-memory entry for each group:
- Scan the blocks of R one at a time.
- For each tuple, find the entry for its group and update the group's aggregated
result.
- For a MIN(a) or MAX(a) aggregate, record the minimum or maximum value of a seen so far.
- For a COUNT aggregate, add one to the accumulated count.
- For a SUM(a) aggregate, add the value of a to the accumulated sum.
- For AVG(a), keep two accumulations, the count and the sum of a, and divide at the end.

One-pass algorithm for Unary, Full-Relation Operation

 Result generation:
 When all tuples of R have been read,
 output one tuple for each group from its main-memory entry.
 Requirement:
 An efficient data structure for finding the group entry of a tuple.
 Hash tables or balanced trees can be used.
 Number of disk I/Os: B(R).
 Memory buffers requirement: all group entries must fit in M − 1 buffers
[one more buffer holds the current block of R].

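The following is a sketch of one-pass grouping. It assumes a hypothetical toy model in which a relation is a list of blocks of tuples. It also assumes a single grouping attribute and a single aggregated attribute, selected by the illustrative `key` and `val` functions. A Python `dict` plays the role of the hash table. Each group entry keeps the running MIN, MAX, COUNT, and SUM, and AVG is computed only after all of R has been read.

```python
def one_pass_group(blocks, key, val):
    """gamma over R in one pass; each group entry is [min, max, count, sum]."""
    groups = {}  # hash table: grouping value -> running aggregates
    for block in blocks:          # one buffer holds the current block of R
        for t in block:
            k, v = key(t), val(t)
            entry = groups.get(k)
            if entry is None:
                groups[k] = [v, v, 1, v]
            else:
                entry[0] = min(entry[0], v)
                entry[1] = max(entry[1], v)
                entry[2] += 1
                entry[3] += v
    # All of R has been read: produce (MIN, MAX, COUNT, SUM, AVG) per group.
    return {k: (mn, mx, c, s, s / c) for k, (mn, mx, c, s) in groups.items()}

R = [[("cse", 3), ("eee", 5)], [("cse", 7), ("cse", 2)]]
result = one_pass_group(R, key=lambda t: t[0], val=lambda t: t[1])
print(result["cse"])  # (2, 7, 3, 12, 4.0)
print(result["eee"])  # (5, 5, 1, 5, 5.0)
```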
Binary operations: Bag Union
 Bag union: R ∪_B S
 R and S are bags. The union keeps all tuples.
 Algorithm:
 First copy every tuple of R to the output.
 Then copy every tuple of S to the output.
 Number of disk I/Os: B(R) + B(S).
 Number of memory buffers required: M = 1 suffices.
[Nothing needs to be stored; pipelining outputs one tuple at a time.]

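Bag union needs no stored state, and a Python generator shows this directly. This is a sketch in a hypothetical toy model in which a relation is a list of blocks of tuples. The function name is illustrative.

```python
def bag_union(R_blocks, S_blocks):
    """R ∪_B S: stream every tuple of R, then every tuple of S.
    Only one input block is needed at a time; nothing else is kept in memory."""
    for block in R_blocks:
        yield from block
    for block in S_blocks:
        yield from block

print(list(bag_union([[(1,), (2,)]], [[(2,)], [(3,)]])))  # [(1,), (2,), (2,), (3,)]
```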
Binary operations: Set Union
 Set union: R ∪_S S
 R and S are sets. The union keeps only one copy of each tuple that occurs in both R and S.
 Algorithm:
 Assume B(S) ≤ B(R), i.e., S is the smaller relation.
 First read S into M − 1 main-memory buffers and build a main-memory data structure
on the search key [the entire tuple is the search key].
 An efficient data structure is required for storing S in main memory so that
each check can be done efficiently.
 Copy all tuples of S to the output.
 Retrieve one block of R at a time into the remaining buffer. For each tuple t of R, check
whether t is also in S [using the main-memory data structure].
 If t is not in S, copy it to the output.
 If t is in S, don't copy it.

Binary operations: Set Union
 Set union: R ∪_S S
 Number of disk I/Os required: B(R) + B(S).
 Number of memory buffers required: min(B(R), B(S)) ≤ M − 1.
[At least one of R and S must fit in M − 1 buffers.]

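Here is a sketch of one-pass set union. It makes three assumptions. First, a relation is a list of blocks of tuples (a hypothetical toy model). Second, S is the smaller relation and fits in M − 1 buffers. Third, a Python `set` serves as the main-memory search structure, keyed on the whole tuple. The function name is illustrative.

```python
def one_pass_set_union(R_blocks, S_blocks):
    """R ∪_S S, assuming B(S) <= B(R) and S fits in M - 1 buffers."""
    s_table = set()
    out = []
    for block in S_blocks:          # read the smaller relation S into memory
        for t in block:
            s_table.add(t)
            out.append(t)           # every tuple of S belongs to the result
    for block in R_blocks:          # stream R one block at a time
        for t in block:
            if t not in s_table:    # skip tuples already output from S
                out.append(t)
    return out

R = [[(1,), (2,)], [(3,), (4,)]]
S = [[(2,), (5,)]]
print(one_pass_set_union(R, S))  # [(2,), (5,), (1,), (3,), (4,)]
```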
Binary operations: Set Intersection
 Set intersection: R ∩_S S
 R and S are sets. The intersection keeps only the tuples common to R and S.
 Algorithm:
 Assume B(S) ≤ B(R).
 Read S into M − 1 main-memory buffers and build a search data structure [key is the full
tuple].
 Read each block of R. For each tuple t of R, check whether t is also in S
[using the main-memory data structure on the search key].
 If t is in S, copy it to the output.
 If t is not in S, don't copy it.
 Number of disk I/Os required: B(R) + B(S).
 Number of memory buffers required: min(B(R), B(S)) ≤ M − 1.

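Here is a sketch of one-pass set intersection. It makes three assumptions. First, a relation is a list of blocks of tuples (a hypothetical toy model). Second, S is the smaller relation and fits in M − 1 buffers. Third, a Python `set` serves as the search structure, keyed on the full tuple. The function name is illustrative.

```python
def one_pass_set_intersection(R_blocks, S_blocks):
    """R ∩_S S, assuming B(S) <= B(R) and S fits in M - 1 buffers."""
    s_table = {t for block in S_blocks for t in block}  # S in memory, key = whole tuple
    # Stream R block by block and keep only the tuples that are also in S.
    return [t for block in R_blocks for t in block if t in s_table]

R = [[(1,), (2,)], [(3,)]]
S = [[(2,), (3,), (9,)]]
print(one_pass_set_intersection(R, S))  # [(2,), (3,)]
```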
Binary operations: Set Difference
 Set difference: R −_S S
 R and S are sets.
 Keeps the tuples of R that are not in S.
 Algorithm:
 Assume B(S) ≤ B(R).
 Read S into M − 1 main-memory buffers and build a search data structure
[search key is the full tuple].
 Read each block of R. For each tuple t of R, check whether t is also in S.
 If t is not in S, copy it to the output.
 If t is in S, don't copy it to the output.
 Number of disk I/Os required: B(R) + B(S).
 Number of memory buffers required: min(B(R), B(S)) ≤ M − 1.
[At least one of R and S must fit in M − 1 buffers.]

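Here is a sketch of one-pass set difference R −_S S. It makes three assumptions. First, a relation is a list of blocks of tuples (a hypothetical toy model). Second, S is the smaller relation and fits in M − 1 buffers. Third, a Python `set` serves as the search structure. The function name is illustrative.

```python
def one_pass_set_difference(R_blocks, S_blocks):
    """R −_S S, assuming B(S) <= B(R) and S fits in M - 1 buffers."""
    s_table = {t for block in S_blocks for t in block}  # S in memory, key = whole tuple
    # Stream R block by block and keep only the tuples that are not in S.
    return [t for block in R_blocks for t in block if t not in s_table]

R = [[(1,), (2,)], [(3,)]]
S = [[(2,), (3,), (9,)]]
print(one_pass_set_difference(R, S))  # [(1,)]
```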
Binary operations: Bag Intersection
 Bag intersection: R ∩_B S
 R and S are bags.
 Bags allow multiple copies of the same tuple.
 Also known as multisets.
 An element appears in the intersection as many times as the minimum of the
numbers of times it appears in R and in S.

Binary operations: Bag Intersection
 Algorithm for bag intersection:
 Assume B(S) ≤ B(R).
 Read S into M − 1 main-memory buffers and build a search data structure [key is the full tuple]
along with a count value.
 Main memory stores only the distinct tuples of S.
 The count is the number of times the tuple occurs in S.
 Read each block of R. For each tuple t of R, check whether t is also in S and check its
count:
 If the count is positive, decrease it by one and copy t to the output.
 If the count is zero (or t is not in S), no action is required.

 Number of disk I/Os required: B(R) + B(S).

 Number of memory buffers required: min(B(R), B(S)) ≤ M − 1.

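The counting step can be sketched with Python's `collections.Counter`, which stores each distinct tuple of S once together with its count. The sketch assumes a hypothetical toy model in which a relation is a list of blocks of tuples, and that S is the smaller relation. The function name is illustrative.

```python
from collections import Counter

def one_pass_bag_intersection(R_blocks, S_blocks):
    """R ∩_B S, assuming B(S) <= B(R). Memory holds each distinct tuple of S
    once, with the number of times it occurs in S."""
    counts = Counter(t for block in S_blocks for t in block)
    out = []
    for block in R_blocks:          # stream R one block at a time
        for t in block:
            if counts[t] > 0:       # Counter returns 0 for tuples not in S
                counts[t] -= 1
                out.append(t)
    return out

R = [[("a",), ("a",)], [("a",), ("b",)]]
S = [[("a",), ("a",), ("c",)]]
print(one_pass_bag_intersection(R, S))  # [('a',), ('a',)]
```

Here the tuple `("a",)` appears three times in R and twice in S, so it appears min(3, 2) = 2 times in the result.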
Binary operations: Product
 Product: R × S
 R and S are sets. The product implements the Cartesian product.
 Algorithm:
 Assume B(S) ≤ B(R).
 Read S into M − 1 main-memory buffers; no special data structure is required.
 Read each block of R. For each tuple t of R:
 Concatenate t with each tuple of S in main memory.
 Send each concatenated tuple to the output.

 Number of disk I/Os required: B(R) + B(S).

 Number of memory buffers required: min(B(R), B(S)) ≤ M − 1.

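Here is a sketch of the one-pass product. It assumes a hypothetical toy model in which a relation is a list of blocks of tuples, and that S is the smaller relation. S is held in memory as a plain list because no search structure is needed. Concatenation is Python tuple `+`. The function name is illustrative.

```python
def one_pass_product(R_blocks, S_blocks):
    """R × S, assuming B(S) <= B(R): S is held in memory as a plain list."""
    s_tuples = [t for block in S_blocks for t in block]
    for block in R_blocks:          # one buffer holds the current block of R
        for r in block:
            for s in s_tuples:
                yield r + s         # concatenate and send to the output

print(list(one_pass_product([[(1,), (2,)]], [[("x",), ("y",)]])))
# [(1, 'x'), (1, 'y'), (2, 'x'), (2, 'y')]
```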
Binary operations: Natural Join
 Natural join: R(X, Y) ⋈ S(Y, Z), where Y is the set of attributes common to R and S.
 Algorithm:
 Assume B(S) ≤ B(R).
 Read S into M − 1 main-memory buffers and build a search data structure on S with search
key = Y.
 Read each block of R into the one remaining memory buffer. For each tuple t of R:
 Find the tuples of S that agree with t on all attributes of Y.
 Concatenate each matching tuple of S with t in main memory.
 Send each concatenated tuple to the output.

 Number of disk I/Os required: B(R) + B(S).

 Number of memory buffers required: min(B(R), B(S)) ≤ M − 1.

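Here is a sketch of the one-pass natural join. It makes three assumptions. First, a relation is a list of blocks of plain tuples (a hypothetical toy model). Second, S is the smaller relation. Third, the positions of the common attributes Y are passed in as the illustrative parameters `r_y` and `s_y`. A `dict` keyed on the Y-values plays the role of the main-memory search structure. Each output tuple is t from R followed by the Z-part of a matching S tuple.

```python
from collections import defaultdict

def one_pass_natural_join(R_blocks, S_blocks, r_y, s_y):
    """R(X, Y) ⋈ S(Y, Z), assuming B(S) <= B(R)."""
    index = defaultdict(list)       # search key = values of Y
    for block in S_blocks:          # read S into memory
        for s in block:
            key = tuple(s[i] for i in s_y)
            z = tuple(v for i, v in enumerate(s) if i not in s_y)  # Z-part of s
            index[key].append(z)
    out = []
    for block in R_blocks:          # one buffer for the current block of R
        for r in block:
            for z in index.get(tuple(r[i] for i in r_y), []):
                out.append(r + z)   # (X, Y) concatenated with Z
    return out

# R(x, y) and S(y, z): the common attribute y is position 1 in R and 0 in S
R = [[("r1", 1), ("r2", 2)], [("r3", 3)]]
S = [[(1, "s1"), (1, "s2"), (3, "s3")]]
print(one_pass_natural_join(R, S, r_y=[1], s_y=[0]))
# [('r1', 1, 's1'), ('r1', 1, 's2'), ('r3', 3, 's3')]
```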
