Query Processing, Optimization, and Indexing Techniques
From here:
SELECT C.name AS Course, count(S.students) AS Cnt
FROM courses C, subscription S
WHERE
C.lecturer = 'Calders'
AND C.courseID = S.courseID
GROUP BY C.name
To there:
Course Cnt
“Advanced Databases” 67
“Data mining en kennissystemen” 19
What’s in between?
How does a relational DBMS get there efficiently?
Based upon slides for: Database System Concepts - 5th Edition, Aug 27, 2005.
Physical Reality
Basic Steps in Query Processing
Pictorial Depiction of Equivalence Rules
Left Deep Join Trees
Physical Query Plan
Optimization
Indexing Structures
Basic Concepts
Ordered Indices
B+-Tree Index Files
B-Tree Index Files
Multiple-Key Access
Basic Concepts
An index file consists of index entries of the form (search-key, pointer).
Index files are typically much smaller than the original file.
Two basic kinds of indices:
Ordered indices: search keys are stored in sorted order
Hash indices: search keys are distributed uniformly across
“buckets” using a “hash function”.
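As a rough illustration of the two kinds of index, the sketch below builds both over a tiny in-memory list standing in for a data file. The record list, `NUM_BUCKETS`, and all function names are hypothetical; Python's built-in `hash` plays the role of the hash function, and list positions stand in for record pointers.

```python
import bisect

# Hypothetical "data file": (search_key, payload) records in arbitrary order.
records = [(42, "rec-a"), (7, "rec-b"), (19, "rec-c"), (7, "rec-d")]

# Ordered index: (search_key, position) entries kept sorted on the search key.
entries = sorted((key, pos) for pos, (key, _) in enumerate(records))
index_keys = [key for key, _ in entries]  # parallel key list for bisect

def ordered_lookup(key):
    """Binary-search the ordered index; return positions of matching records."""
    lo = bisect.bisect_left(index_keys, key)
    hi = bisect.bisect_right(index_keys, key)
    return [pos for _, pos in entries[lo:hi]]

def ordered_range(low, high):
    """Range query low <= key <= high: matching entries are contiguous."""
    lo = bisect.bisect_left(index_keys, low)
    hi = bisect.bisect_right(index_keys, high)
    return [pos for _, pos in entries[lo:hi]]

# Hash index: records spread over a fixed number of buckets by a hash function.
NUM_BUCKETS = 4  # assumption: tiny bucket count, for illustration only
buckets = [[] for _ in range(NUM_BUCKETS)]
for pos, (key, _) in enumerate(records):
    buckets[hash(key) % NUM_BUCKETS].append((key, pos))

def hash_lookup(key):
    """Hash to a single bucket, then scan only that bucket for the key."""
    return [pos for k, pos in buckets[hash(key) % NUM_BUCKETS] if k == key]

print(ordered_lookup(7))     # [1, 3]
print(ordered_range(7, 19))  # [1, 3, 2]
print(hash_lookup(7))        # [1, 3]
```

The hash index answers an equality lookup by touching one bucket, but a range query like `ordered_range` has no such shortcut there: matching keys are scattered across buckets, whereas in the ordered index they sit next to each other.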
Ordered Indices
In an ordered index, index entries are stored sorted on the search key
value. E.g., author catalog in library.
Primary index: in a sequentially ordered file, the index whose search
key specifies the sequential order of the file.
Also called clustering index.
The search key of a primary index is usually but not necessarily the
primary key.
Secondary index: an index whose search key specifies an order
different from the sequential order of the file. Also called
non-clustering index.
Index-sequential file: ordered sequential file with a primary index.
Sparse Index Files
Multilevel Index
If the primary index does not fit in memory, access becomes
expensive.
Solution: treat the primary index kept on disk as a sequential file
and construct a sparse index on it.
outer index – a sparse index of primary index
inner index – the primary index file
If even the outer index is too large to fit in main memory, yet
another level of index can be created, and so on.
Indices at all levels must be updated on insertion or deletion
from the file.
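A minimal sketch of a two-level lookup, assuming (hypothetically) that every "disk block" holds three entries (`BLOCK`) and that the data file is already sorted on the search key. Each list-indexing step in `lookup` stands in for one block read; all names are illustrative only.

```python
import bisect

BLOCK = 3  # hypothetical number of entries per disk block

# Sequential data file sorted on the search key, stored as blocks of (key, record).
data = [(k, f"rec{k}") for k in range(0, 60, 2)]  # keys 0, 2, ..., 58
data_blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def sparse(blocks):
    """Sparse index: one (first key, block number) entry per block."""
    return [(blk[0][0], b) for b, blk in enumerate(blocks)]

# Inner index: sparse primary index on the data file, itself stored in blocks.
inner = sparse(data_blocks)
inner_blocks = [inner[i:i + BLOCK] for i in range(0, len(inner), BLOCK)]
# Outer index: sparse index on the inner index (assumed small enough for memory).
outer = sparse(inner_blocks)

def floor_pointer(entries, key):
    """Pointer of the entry with the largest key <= `key` (first entry if none)."""
    i = bisect.bisect_right([k for k, _ in entries], key) - 1
    return entries[max(i, 0)][1]

def lookup(key):
    """Outer index -> inner index block -> data block -> sequential scan."""
    inner_block = inner_blocks[floor_pointer(outer, key)]      # one block read
    data_block = data_blocks[floor_pointer(inner_block, key)]  # one block read
    return next((rec for k, rec in data_block if k == key), None)

print(lookup(34))  # rec34
print(lookup(35))  # None
```

With 30 records and three entries per block, the outer index has only 4 entries; adding a third level would follow the same `sparse` construction once more.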
Secondary Indices
Primary and Secondary Indices
B+-Tree Index Files (Cont.)
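Typical node: P1, K1, P2, K2, …, Pn–1, Kn–1, Pn. The Ki are search-key values stored in order (K1 < K2 < … < Kn–1); the Pi are pointers to children (non-leaf nodes) or to records or buckets of records (leaf nodes).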
Leaf Nodes in B+-Trees
Non-leaf nodes form a multi-level sparse index on the leaf nodes. For
a non-leaf node with m pointers:
All the search-keys in the subtree to which P1 points are less than
K1.
For 2 ≤ i ≤ m – 1, all the search-keys in the subtree to which Pi
points have values greater than or equal to Ki–1 and less than Ki.
All the search-keys in the subtree to which Pm points have values
greater than or equal to Km–1.
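The three key-range rules above amount to choosing the pointer just after the last key that is ≤ k. A small sketch using Python's standard `bisect` module; the function name `child_index` and the sample keys are hypothetical.

```python
import bisect

def child_index(keys, k):
    """Return the 0-based position of the pointer to follow in a non-leaf node.

    keys holds K1 < K2 < ... < K(m-1). Following the rules above:
      k < K1                 -> P1     (position 0)
      K(i-1) <= k < K(i)     -> Pi     (position i-1)
      k >= K(m-1)            -> Pm     (position m-1)
    bisect_right returns how many keys are <= k, which is exactly that position.
    """
    return bisect.bisect_right(keys, k)

keys = [10, 20, 30]  # hypothetical node with m = 4 pointers
print([child_index(keys, k) for k in (5, 10, 25, 30, 99)])  # [0, 1, 2, 3, 3]
```

Note that a key equal to some Ki goes to the right of Ki (k = 10 follows P2), matching the "greater than or equal to Ki–1" rule.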
Example of a B+-tree
Example of B+-tree
Queries on B+-Trees
Find all records with a search-key value of k.
1. N = root
2. Repeat
   1. Examine N for the smallest search-key value > k.
   2. If such a value exists, assume it is Ki. Then set N = Pi.
   3. Otherwise k ≥ Kn–1. Set N = Pn.
   Until N is a leaf node.
3. If for some i, key Ki = k, follow pointer Pi to the desired record or bucket.
4. Else no record with search-key value k exists.
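The lookup steps above can be sketched in Python as follows. This is a simplified, hypothetical in-memory model: nodes are a `Node` dataclass, leaf "pointers" are plain record labels, and the leaf's next-leaf pointer is omitted because an equality lookup does not follow it.

```python
from dataclasses import dataclass

@dataclass
class Node:
    keys: list            # K1 < K2 < ... in ascending order
    pointers: list        # children (non-leaf) or record/bucket pointers (leaf)
    is_leaf: bool = False

def find(root, k):
    """Return the pointer stored with search key k, or None if k is absent."""
    n = root
    while not n.is_leaf:
        # Step 2: smallest key Ki > k -> follow Pi; if none exists, follow Pn.
        idx = next((i for i, key in enumerate(n.keys) if key > k), None)
        n = n.pointers[idx] if idx is not None else n.pointers[-1]
    # Step 3: in a leaf, Pi is the pointer that belongs to Ki.
    for key, ptr in zip(n.keys, n.pointers):
        if key == k:
            return ptr
    return None  # Step 4: no record with search-key value k

# Hypothetical tiny tree: leaf "pointers" are just record labels here.
leaves = [
    Node([2, 3, 5], ["r2", "r3", "r5"], is_leaf=True),
    Node([7, 11], ["r7", "r11"], is_leaf=True),
    Node([13, 17, 19], ["r13", "r17", "r19"], is_leaf=True),
]
root = Node([7, 13], leaves)

print(find(root, 11))  # r11
print(find(root, 13))  # r13
print(find(root, 4))   # None
```

A real implementation would binary-search the keys inside each node; the linear scan here simply mirrors the wording of step 2.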
If there are K search-key values in the file, the height of the tree is no
more than $\lceil \log_{\lceil n/2 \rceil}(K) \rceil$.
A node is generally the same size as a disk block, typically 4 kilobytes,
and n is typically around 100 (40 bytes per index entry).
With 1 million search-key values and n = 100,
at most $\lceil \log_{50}(1{,}000{,}000) \rceil = 4$ nodes are accessed in a lookup.
Contrast this with a balanced binary tree with 1 million search-key
values: around 20 nodes are accessed in a lookup.
The above difference is significant, since every node access may need
a disk I/O, costing around 20 milliseconds.
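The arithmetic behind these figures, worked out with n = 100 (so ⌈n/2⌉ = 50) and K = 10^6, under the worst-case assumption that every node access is a disk I/O of about 20 milliseconds:

```latex
\begin{aligned}
\text{B}^+\text{-tree } (n = 100):\quad
  & \lceil \log_{50} 10^{6} \rceil
    = \left\lceil \tfrac{6}{\log_{10} 50} \right\rceil
    = \lceil 3.53 \rceil = 4 \text{ nodes}
    \;\Rightarrow\; 4 \times 20\,\text{ms} = 80\,\text{ms} \\
\text{balanced binary tree}:\quad
  & \lceil \log_{2} 10^{6} \rceil = \lceil 19.93 \rceil = 20 \text{ nodes}
    \;\Rightarrow\; 20 \times 20\,\text{ms} = 400\,\text{ms}
\end{aligned}
```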
Updates on B+-Trees: Insertion (Cont.)
Examples of B+-Tree Deletion (Cont.)
B+-Tree File Organization
B-Tree Index Files
Multiple-Key Access
Composite search keys are search keys containing more than one
attribute.
E.g., (branch_name, balance)
Lexicographic ordering: (a1, a2) < (b1, b2) if either
a1 < b1, or
a1 = b1 and a2 < b2
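Python tuples compare exactly this way, so a sketch of the ordering on hypothetical (branch_name, balance) keys needs little code. `lex_less` (a hypothetical helper) spells out the rule explicitly and is checked against the built-in tuple comparison.

```python
# Hypothetical composite search keys of the form (branch_name, balance).
keys = [("Perryridge", 900), ("Brighton", 750), ("Perryridge", 400), ("Downtown", 500)]

def lex_less(a, b):
    """(a1, a2) < (b1, b2) iff a1 < b1, or a1 == b1 and a2 < b2."""
    return a[0] < b[0] or (a[0] == b[0] and a[1] < b[1])

# Python's tuple comparison is lexicographic, so it agrees with lex_less.
assert all(lex_less(a, b) == (a < b) for a in keys for b in keys)

print(sorted(keys))
# [('Brighton', 750), ('Downtown', 500), ('Perryridge', 400), ('Perryridge', 900)]
```

In an index sorted this way, all entries for one branch_name are contiguous and ordered by balance within that branch.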
Implementations of Relational Algebra Expressions
Selection Operation
File scan – search algorithms that locate and retrieve records that
fulfill a selection condition.
Algorithm A1 (linear search). Scan each file block and test all
records to see whether they satisfy the selection condition.
A2 (binary search). Applicable if the selection is an equality
comparison on the attribute on which the file is ordered.
A3 (primary index on candidate key, equality). Retrieve a single
record that satisfies the corresponding equality condition.
A4 (primary index on nonkey, equality). Retrieve multiple records.
A5 (equality on search-key of secondary index).
A6 (primary index, comparison). (Relation is sorted on A)
A7 (secondary index, comparison)
…
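Algorithms A1 and A2 can be contrasted in a short sketch. It assumes, hypothetically, that the file is a list of blocks of (key, payload) records sorted on the key, with `BLOCK` records per block; a real system would probe about log2(b) blocks on disk rather than collect every block's first key in memory as `first_keys` does here.

```python
import bisect

BLOCK = 4  # hypothetical number of records per block

def to_blocks(records):
    """Chop a list of records into fixed-size blocks."""
    return [records[i:i + BLOCK] for i in range(0, len(records), BLOCK)]

def linear_search(blocks, condition):
    """A1: read every block and test every record against the condition."""
    return [r for block in blocks for r in block if condition(r)]

def binary_search_eq(blocks, key):
    """A2: equality on the attribute the file is sorted on; records are (key, payload).

    Binary-search on each block's first key, then scan forward while keys match.
    """
    first_keys = [block[0][0] for block in blocks]
    # Matches may begin in the block before the first block whose first key
    # is >= key (duplicates can start mid-block), so start one block earlier.
    start = max(bisect.bisect_left(first_keys, key) - 1, 0)
    result = []
    for block in blocks[start:]:
        for k, payload in block:
            if k == key:
                result.append((k, payload))
            elif k > key:
                return result  # sorted file: no later record can match
    return result

file_blocks = to_blocks([(k, f"r{i}") for i, k in enumerate([1, 2, 2, 3, 5, 5, 5, 8, 9])])
print(binary_search_eq(file_blocks, 5))  # [(5, 'r4'), (5, 'r5'), (5, 'r6')]
```

A1 works for any condition and any file order; A2 only pays off because the sort order lets it skip blocks and stop at the first larger key.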
Sorting
We may build an index on the relation and then use the index to read
the relation in sorted order. This may lead to one disk block access for
each tuple.
For relations that fit in memory, techniques like quicksort can be used.
For relations that don’t fit in memory, external
sort-merge is a good choice.
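A compact sketch of external sort-merge under simplifying assumptions: `memory_size` (a hypothetical parameter) is the number of records that fit in memory, runs are kept as in-memory lists instead of being written to disk, and all runs are merged in a single pass with `heapq.merge`. A real implementation merges at most M − 1 runs per pass, where M is the number of memory blocks, and may need several passes.

```python
import heapq
import itertools

def external_sort(records, memory_size):
    """External sort-merge sketch (in-memory lists stand in for disk runs)."""
    # Phase 1: read memory_size records at a time, sort them in memory,
    # and keep each sorted chunk as a "run".
    it = iter(records)
    runs = []
    while chunk := list(itertools.islice(it, memory_size)):
        chunk.sort()
        runs.append(chunk)
    # Phase 2: single merge pass; heapq.merge holds only the current head of each run.
    return list(heapq.merge(*runs))

print(external_sort([24, 19, 31, 33, 14, 16, 21, 3, 2, 7], memory_size=3))
# [2, 3, 7, 14, 16, 19, 21, 24, 31, 33]
```

The walrus operator requires Python 3.8 or later.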
Join Operation
Nested-Loop Join
Block Nested-Loop Join
Merge-Join
1. Sort both relations on their join attribute (if not already sorted on the join
attributes).
2. Merge the sorted relations to join them.
   1. The join step is similar to the merge stage of the sort-merge algorithm.
   2. The main difference is the handling of duplicate values in the join attribute:
   every pair with the same value on the join attribute must be matched.
   3. Detailed algorithm in book.
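The duplicate handling is the subtle part of the merge step. The sketch below (a hypothetical `merge_join` over plain Python lists of tuples, not the book's detailed algorithm) pairs every tuple in a group of equal join values on one side with every tuple in the matching group on the other side.

```python
def merge_join(r, s, key_r, key_s):
    """Merge-join r and s; key_r / key_s extract each side's join attribute."""
    # Step 1: sort both inputs on the join attribute.
    r = sorted(r, key=key_r)
    s = sorted(s, key=key_s)
    i = j = 0
    out = []
    # Step 2: advance whichever side has the smaller join value.
    while i < len(r) and j < len(s):
        kr, ks = key_r(r[i]), key_s(s[j])
        if kr < ks:
            i += 1
        elif kr > ks:
            j += 1
        else:
            # Find the run of s-tuples sharing this join value.
            j_end = j
            while j_end < len(s) and key_s(s[j_end]) == kr:
                j_end += 1
            # Pair every r-tuple with this value against the whole s run.
            while i < len(r) and key_r(r[i]) == kr:
                out.extend((r[i], t) for t in s[j:j_end])
                i += 1
            j = j_end
    return out

# Hypothetical (courseID, name) and (courseID, student) tuples.
courses = [(2, "Data mining"), (1, "Advanced Databases")]
subscriptions = [(1, "ann"), (3, "dan"), (1, "bob"), (2, "cid")]
for c, s in merge_join(courses, subscriptions, key_r=lambda t: t[0], key_s=lambda t: t[0]):
    print(c[1], s[1])
```

If both inputs have several tuples with the same join value, the inner loop emits their full cross product for that value, which is exactly the "every pair must be matched" requirement.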
Hash-Join (Cont.)
Hash-Join Algorithm
Evaluation of Expressions
Materialization
Pipelining