ADB Course Chapter_3.2 Indexes
CS 245 14
Tradeoffs in Indexing
Improved query performance
Some Types of Indexes
Conventional indexes
B-trees
Hash indexes
Multi-key indexing
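Dense Index
[Figure: a dense index has one entry per record. Index blocks hold keys 10, 20, 30, 40 | 50, 60, 70, 80 | 90, 100, 110, 120, and each entry points to its record in a sequential file stored two records per block: 10, 20 | 30, 40 | 50, 60 | 70, 80 | 90, 100 | ...]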
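A minimal sketch of a dense-index lookup, assuming a hypothetical in-memory model in which the sequential file is a list of blocks (sorted key lists) and the index is a sorted list of (key, block_no) pairs. It binary-searches the index with Python's `bisect` module and only then names the block to read.

```python
from bisect import bisect_left

# Hypothetical in-memory model (not from the slides): the sequential file is a
# list of blocks, each a sorted list of keys, two records per block.
FILE = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]


def build_dense_index(blocks):
    """Return one (key, block_no) entry per record, in key order."""
    return [(key, b) for b, block in enumerate(blocks) for key in block]


def dense_lookup(index, key):
    """Binary-search the dense index; return the block holding key, or None.

    Because every key has an entry, a miss is detected without reading the file.
    """
    keys = [k for k, _ in index]
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return index[i][1]
    return None


index = build_dense_index(FILE)
print(dense_lookup(index, 70))  # 3
print(dense_lookup(index, 35))  # None
```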
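Sparse Index
[Figure: a sparse index has one entry per data block, holding that block's first key. Index blocks hold 10, 30, 50, 70 | 90, 110, 130, 150 | 170, 190, 210, 230, pointing into a sequential file with blocks 10, 20 | 30, 40 | 50, 60 | 70, 80 | 90, 100 | ...]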
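A sketch of a sparse-index lookup under the same hypothetical model (blocks of sorted keys, one (first_key, block_no) entry per block). The search finds the last entry whose key is ≤ the search key, then scans that one block; this is why a sparse index needs the file to be sorted on the key.

```python
from bisect import bisect_right

# Hypothetical in-memory model: sorted key lists, two records per block.
FILE = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]


def build_sparse_index(blocks):
    """Return one (first_key, block_no) entry per non-empty block."""
    return [(blk[0], b) for b, blk in enumerate(blocks) if blk]


def sparse_lookup(index, blocks, key):
    """Follow the last entry whose key is <= key, then search that block."""
    firsts = [k for k, _ in index]
    i = bisect_right(firsts, key) - 1
    if i < 0:
        return None                      # smaller than every key in the file
    b = index[i][1]
    return b if key in blocks[b] else None


index = build_sparse_index(FILE)
print(index)                           # [(10, 0), (30, 1), (50, 2), (70, 3), (90, 4)]
print(sparse_lookup(index, FILE, 40))  # 1
print(sparse_lookup(index, FILE, 45))  # None
```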
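2-level sparse index
[Figure: a 2nd-level sparse index 10, 90, 170, 250 | 330, 410, 490, 570 over a 1st-level sparse index 10, 30, 50, 70 | 90, 110, 130, 150 | 170, 190, 210, 230, which points into a sequential file with blocks 10, 20 | 30, 40 | 50, 60 | 70, 80 | 90, 100 | ...]
2nd-level index blocks need not be contiguous on disk.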
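A sketch of a two-level lookup, assuming a hypothetical layout in which the 1st-level sparse index is packed 4 entries per index block and the 2nd level stores the first key of each 1st-level block (matching the 10, 90, 170 pattern in the figure). Each level is one "floor" search: find the last entry with key ≤ the search key.

```python
from bisect import bisect_right

# Hypothetical layout: 12 data blocks of two keys (10, 20 | 30, 40 | ... | 230, 240),
# a 1st-level sparse index packed 4 entries per index block, and a 2nd level
# holding the first key of each 1st-level index block.
FILE = [[10 + 20 * i, 20 + 20 * i] for i in range(12)]
LEVEL1 = [(blk[0], b) for b, blk in enumerate(FILE)]
LEVEL1_BLOCKS = [LEVEL1[i:i + 4] for i in range(0, len(LEVEL1), 4)]
LEVEL2 = [(blk[0][0], j) for j, blk in enumerate(LEVEL1_BLOCKS)]


def floor_ptr(entries, key):
    """Pointer of the last (k, ptr) entry with k <= key, or None."""
    i = bisect_right([k for k, _ in entries], key) - 1
    return entries[i][1] if i >= 0 else None


def two_level_lookup(key):
    """Return the data block holding key, or None (one floor search per level)."""
    j = floor_ptr(LEVEL2, key)            # which 1st-level index block
    if j is None:
        return None
    b = floor_ptr(LEVEL1_BLOCKS[j], key)  # which data block
    return b if key in FILE[b] else None


print([k for k, _ in LEVEL2])  # [10, 90, 170]
print(two_level_lookup(130))   # 6
```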
Sparse vs Dense Tradeoff
Sparse: Less space usage, can keep more of index in memory
Terms
Search key of an index
Primary index (on primary key of ordered files)
Secondary index
Dense index (contains all search key values)
Sparse index
Multi-level index
Handling Duplicate Keys
For a primary index, can point to 1st instance of each item (assuming blocks are linked)
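A sketch of the "point to the 1st instance" idea, assuming a hypothetical file sorted on a key with duplicates, whose blocks are "linked" simply by list order (block b is followed by block b + 1). A dict stands in for the sorted index file; retrieval starts at the first instance and follows the chain while keys still match.

```python
# Hypothetical model: file sorted on a key with duplicates; block b links to b + 1.
FILE = [[10, 10], [10, 20], [20, 30], [30, 30], [40, 45]]


def build_first_instance_index(blocks):
    """One entry per distinct key -> (block_no, slot) of its first occurrence."""
    index = {}
    for b, block in enumerate(blocks):
        for s, key in enumerate(block):
            index.setdefault(key, (b, s))
    return index


def all_matches(index, blocks, key):
    """Start at the first instance and follow the block chain while keys match."""
    if key not in index:
        return []
    b, s = index[key]
    out = []
    while b < len(blocks):
        while s < len(blocks[b]):
            if blocks[b][s] != key:
                return out
            out.append((b, s))
            s += 1
        b, s = b + 1, 0          # follow the link to the next block
    return out


index = build_first_instance_index(FILE)
print(all_matches(index, FILE, 10))  # [(0, 0), (0, 1), (1, 0)]
```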
Deletion: Sparse Index
[Figure: sparse index 10, 30, 50, 70, 90, 110, 130, 150 over a sequential file with blocks 10, 20 | 30, 40 | 50, 60 | 70, 80 | ...]
– delete record 40: 40 is not the first record of its block, so only the record is removed; the index is unchanged.
– delete record 30: 30 was the first record of its block, so 40 moves to the front of the block and the index entry 30 is replaced by 40.
– delete records 30 & 40: the block becomes empty, so its index entry is dropped and the later entries shift up (10, 50, 70, 90, ...).
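A sketch of the three sparse-index deletion cases, assuming a hypothetical in-memory model: `blocks` is a list of sorted key lists and `index` a list of [first_key, block_no] entries, one per non-empty block. The record is found by a linear scan to keep the example short.

```python
def sparse_delete(blocks, index, key):
    """Delete key from a blocked sequential file and patch its sparse index.

    blocks: list of sorted key lists; index: list of [first_key, block_no]
    entries, one per non-empty block (a hypothetical in-memory model).
    """
    b = next((i for i, blk in enumerate(blocks) if key in blk), None)
    if b is None:
        raise KeyError(key)
    block = blocks[b]
    was_first = block[0] == key
    block.remove(key)                       # later records slide up in the block
    if not was_first:
        return                              # e.g. delete 40: index unchanged
    pos = next(i for i, (_, ptr) in enumerate(index) if ptr == b)
    if block:
        index[pos][0] = block[0]            # e.g. delete 30: entry 30 -> 40
    else:
        del index[pos]                      # e.g. 30 & 40 gone: drop the entry


blocks = [[10, 20], [30, 40], [50, 60], [70, 80]]
index = [[10, 0], [30, 1], [50, 2], [70, 3]]
sparse_delete(blocks, index, 30)
print(index)  # [[10, 0], [40, 1], [50, 2], [70, 3]]
```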
Deletion: Dense Index
[Figure: dense index 10, 20, 30, 40 | 50, 60, 70, 80 over a sequential file with blocks 10, 20 | 30, 40 | 50, 60 | 70, 80]
– delete record 30: the record is removed and 40 moves to the front of its block; index entry 30 is removed and the entries after it shift up (10, 20, 40, 50, ...).
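A sketch of dense-index deletion under the same hypothetical model (a sorted list of (key, block_no) entries, one per record). Unlike the sparse case, every deletion removes exactly one index entry.

```python
from bisect import bisect_left


def dense_delete(blocks, index, key):
    """Delete key from the file and from its dense index.

    index: sorted list of (key, block_no), one entry per record;
    blocks: sorted key lists (a hypothetical in-memory model).
    """
    keys = [k for k, _ in index]
    i = bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        raise KeyError(key)
    _, b = index.pop(i)        # entries after position i shift up one slot
    blocks[b].remove(key)      # remaining records in the block slide up


blocks = [[10, 20], [30, 40], [50, 60], [70, 80]]
index = [(k, b) for b, blk in enumerate(blocks) for k in blk]
dense_delete(blocks, index, 30)
print([k for k, _ in index])  # [10, 20, 40, 50, 60, 70, 80]
print(blocks[1])              # [40]
```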
Insertion: Sparse Index
[Figure: sparse index 10, 30, 40, 60 over a sequential file with blocks 10, 20 | 30, _ | 40, 50 | 60]
– insert record 34: the block starting with 30 has a free slot, so 34 goes there and the index is unchanged. Our lucky day! We have free space where we need it!
Insertion: Sparse Index
– insert record 15: the first block (10, 20) is full, so 15 is placed after 10 and 20 shifts into the next block, which becomes 20, 30; the index entry 30 changes to 20 (index: 10, 20, 40, 60).
• Illustrated: Immediate reorganization
• Variation:
– insert new block (chained file)
– update index
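A sketch of insertion with immediate reorganization, assuming a hypothetical capacity of two records per block as drawn in the figures. If the target block overflows, its largest record is pushed to the front of the next block, cascading as needed, and the sparse index is rebuilt from the first key of each block. It covers both the "lucky" case (34) and the reorganizing case (15).

```python
from bisect import insort

CAPACITY = 2  # records per block, as drawn in the figures (assumed)


def insert_with_reorg(blocks, key):
    """Insert key into a blocked sequential file by immediate reorganization.

    blocks: sorted key lists, one per block. If the target block overflows,
    its largest record is pushed to the front of the next block, cascading
    as needed. Returns the rebuilt sparse index (first key of each block).
    """
    # Target block: the last one whose first key is <= key (block 0 otherwise).
    b = max((i for i, blk in enumerate(blocks) if blk and blk[0] <= key), default=0)
    insort(blocks[b], key)
    while len(blocks[b]) > CAPACITY:
        spill = blocks[b].pop()
        if b + 1 == len(blocks):
            blocks.append([])           # grow the file by one block
        blocks[b + 1].insert(0, spill)
        b += 1
    return [blk[0] for blk in blocks if blk]


blocks = [[10, 20], [30], [40, 50], [60]]
print(insert_with_reorg(blocks, 15))  # [10, 20, 40, 60]
print(blocks)                         # [[10, 15], [20, 30], [40, 50], [60]]
```

The chained-file variation would instead splice a new block in after the full one and add a single index entry for it, avoiding the cascade.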
Insertion: Sparse Index
– insert record 25: the block holding 10, 20 is full, so 25 goes into an overflow block chained to it and the index is unchanged (overflow blocks: reorganize later...).
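A sketch of the overflow-block alternative, assuming a hypothetical model where `overflow` maps a main block number to its chain of overflow blocks and each block holds two records. Main blocks never move, so the sparse index stays as it was until a later reorganization.

```python
from bisect import insort

CAPACITY = 2  # records per block, as drawn in the figures (assumed)


def insert_with_overflow(blocks, overflow, key):
    """Insert key into a sparse-indexed file, using overflow blocks when full.

    blocks: main blocks (sorted key lists); overflow: dict mapping a main
    block number to its chain of overflow blocks. Assumes key is not smaller
    than the file's first key, so the sparse index (first key of each main
    block) is left unchanged.
    """
    b = max((i for i, blk in enumerate(blocks) if blk and blk[0] <= key), default=0)
    if len(blocks[b]) < CAPACITY:
        insort(blocks[b], key)
        return
    chain = overflow.setdefault(b, [])
    if not chain or len(chain[-1]) == CAPACITY:
        chain.append([])                 # chain on a fresh overflow block
    insort(chain[-1], key)


blocks = [[10, 20], [30], [40, 50], [60]]
overflow = {}
insert_with_overflow(blocks, overflow, 25)
print(overflow)  # {0: [[25]]}
```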
Secondary Indexes
[Figure: a file that is not sorted on the search key (figure label: "Ordering field"), stored two records per block: 30, 50 | 20, 70 | 80, 40 | 100, 10 | 90, 60 | ...]
Sparse index: 30, 20, 80, 100, 90, ... (the first key of each block; because the file is not sorted on this key, these entries are out of order)
Dense index: 10, 20, 30, 40, 50, 60, 70, 80, ... (one entry per record, sorted, each pointing to its record in the file)
[Figure: dense index blocks 10, 20, 30, 40 | 50, 60, 70, 80 | ..., with a sparse higher level 10, 50, 90, ... built over the dense index]
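A sketch of the structure in the last figure, assuming hypothetical data that mirrors it and 4 dense-index entries per index block: a sorted dense level with one (key, (block_no, slot)) entry per record, plus a sparse higher level holding the first key of each dense-index block.

```python
from bisect import bisect_right

# Hypothetical data mirroring the figure: the file is not sorted on this key.
FILE = [[30, 50], [20, 70], [80, 40], [100, 10], [90, 60]]
INDEX_BLOCK = 4  # dense-index entries per index block (assumed)

# Dense level: one (key, (block_no, slot)) entry per record, sorted by key.
dense = sorted((key, (b, s)) for b, blk in enumerate(FILE) for s, key in enumerate(blk))
dense_blocks = [dense[i:i + INDEX_BLOCK] for i in range(0, len(dense), INDEX_BLOCK)]
# Sparse higher level: the first key of each dense-index block.
upper = [blk[0][0] for blk in dense_blocks]


def secondary_lookup(key):
    """Return (block_no, slot) of the record with key, or None."""
    j = bisect_right(upper, key) - 1      # dense-index block that could hold key
    if j < 0:
        return None
    for k, ptr in dense_blocks[j]:
        if k == key:
            return ptr
    return None


print(upper)                  # [10, 50, 90]
print(secondary_lookup(40))   # (2, 1)
```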
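Duplicate Values in Secondary Indexes
[Figure: an index with keys 10, 20, 30, 40 | 50, 60, ... whose entries point into buckets of record pointers; the file holds 20, 10 | 20, 40 | 10, 40 | 10, 40 | 30, 40 | ..., and each bucket lists the records with that key]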
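A sketch of the bucket approach, assuming hypothetical data shaped like the figure. Each distinct key appears once in the sorted index, and its entry leads to a bucket holding a (block_no, slot) pointer for every matching record.

```python
from collections import defaultdict

# Hypothetical file with duplicate search-key values, not sorted on the key.
FILE = [[20, 10], [20, 40], [10, 40], [10, 40], [30, 40]]


def build_bucketed_index(blocks):
    """Return (keys, buckets): each distinct key appears once in the sorted
    index, and its entry leads to a bucket listing every matching record
    as a (block_no, slot) pointer."""
    buckets = defaultdict(list)
    for b, blk in enumerate(blocks):
        for s, key in enumerate(blk):
            buckets[key].append((b, s))
    return sorted(buckets), dict(buckets)


keys, buckets = build_bucketed_index(FILE)
print(keys)         # [10, 20, 30, 40]
print(buckets[10])  # [(0, 1), (2, 0), (3, 0)]
```

Keeping the index entries fixed-size (one per key) is the point of the indirection: the variable-length list of duplicates lives in the bucket, not in the index.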
Conventional Indexes
Pros:
- Simple
- Index is sequential file (good for scans)
Cons:
- Inserts expensive, and/or
- Lose sequentiality & balance
B-Trees
Another type of index
» Give up on sequentiality of index
» Try to get “balance”
B+ Tree Example (n = 3)
[Figure: root 100. Left child: key 30, with leaves 3, 5, 11 | 30, 35. Right child: keys 120, 150, 180, with leaves 100, 101, 110 | 120, 130 | 150, 156, 179 | 180, 200]
Sample Non-Leaf
[Figure: non-leaf node with keys 57, 81, 95 and four pointers: to keys < 57, to keys 57 ≤ k < 81, to keys 81 ≤ k < 95, to keys ≥ 95]
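The routing rule in a non-leaf node is a single "bisect right" over its keys. A minimal sketch, where `child_slot` is a hypothetical helper name:

```python
from bisect import bisect_right


def child_slot(keys, k):
    """Return which child pointer (0-based) of a non-leaf node to follow.

    keys [57, 81, 95] -> 0 for k < 57, 1 for 57 <= k < 81,
    2 for 81 <= k < 95, 3 for k >= 95.
    """
    return bisect_right(keys, k)


print([child_slot([57, 81, 95], k) for k in (10, 57, 80, 81, 95, 200)])  # [0, 1, 1, 2, 3, 3]
```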
Sample Leaf Node
[Figure: a leaf reached from a non-leaf node, with keys 57, 81, 95; one pointer each to the record with key 57, 81 and 95, plus a final pointer to the next leaf in sequence]
Size of Nodes on Disk
n + 1 pointers
n keys
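Since a node holds n keys and n + 1 pointers, n is usually chosen so that one node fills one disk block. A sketch of that arithmetic, assuming hypothetical sizes (4096-byte blocks, 4-byte keys, 8-byte pointers) and ignoring any per-node header:

```python
def max_order(block_bytes, key_bytes, ptr_bytes):
    """Largest n with n keys and n + 1 pointers fitting in one disk block.

    Solves n * key_bytes + (n + 1) * ptr_bytes <= block_bytes for n.
    """
    return (block_bytes - ptr_bytes) // (key_bytes + ptr_bytes)


print(max_order(4096, 4, 8))  # 340
```

Check: 340 * 4 + 341 * 8 = 4088 bytes fits, while n = 341 would need 4100 bytes.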
Don’t Want Nodes to be Too Empty
Use at least:
Non-leaf: ⌈(n+1)/2⌉ pointers
Leaf: ⌊(n+1)/2⌋ pointers to data
Example: n = 3
[Figure: full vs. minimum nodes. Non-leaf: full node 120, 150, 180; min. node 30. Leaf: full node 3, 5, 11; min. node 30, 35]
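A sketch of the minimum-occupancy bounds for a non-root node, assuming the usual rule of ⌈(n+1)/2⌉ child pointers for a non-leaf node and ⌊(n+1)/2⌋ data pointers for a leaf. For n = 3 these give 1 key and 2 keys, which agree with the min. nodes 30 and 30, 35 in the example.

```python
import math


def min_pointers(n, is_leaf):
    """Minimum pointers in a non-root B+ tree node of order n (assumed bounds).

    Non-leaf: ceil((n + 1) / 2) child pointers, i.e. one more than its keys.
    Leaf: floor((n + 1) / 2) pointers to data, i.e. one per key.
    """
    if is_leaf:
        return (n + 1) // 2
    return math.ceil((n + 1) / 2)


def min_keys(n, is_leaf):
    """Minimum keys implied by min_pointers."""
    p = min_pointers(n, is_leaf)
    return p if is_leaf else p - 1


print(min_keys(3, False), min_keys(3, True))  # 1 2
```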
B+ Tree Rules (tree of order n)
(a) Insert key = 32 (n = 3)
[Figure: subtree under root 100 with node 30 over leaves 3, 5, 11 | 30, 31. The leaf 30, 31 has room, so it simply becomes 30, 31, 32.]
(b) Insert key = 7 (n = 3)
[Figure: same subtree. Leaf 3, 5, 11 is full, so it splits into 3, 5 and 7, 11, and 7 is inserted into the parent, which becomes 7, 30.]
(c) Insert key = 160 (n = 3)
[Figure: root 100 with right child 120, 150, 180 over leaves ... | 150, 156, 179 | 180, 200. Leaf 150, 156, 179 splits into 150, 156 and 160, 179; the parent would need keys 120, 150, 160, 180, so it splits into 120, 150 and 180, and 160 moves up into the root, which becomes 100, 160.]
(d) New root, insert 45 (n = 3)
[Figure: root 10, 20, 30 over leaves 1, 2, 3 | 10, 12 | 20, 25 | 30, 32, 40. Leaf 30, 32, 40 splits into 30, 32 and 40, 45; the root would need keys 10, 20, 30, 40, so it splits into 10, 20 and 40, and 30 becomes the new root.]
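A compact in-memory sketch of B+ tree insertion for order n (at most n keys per node). It assumes a leaf split leaves ⌈(n+1)/2⌉ keys on the left and copies the smallest right key up, while a non-leaf split moves its middle key up; with n = 3 these choices reproduce the splits in the examples above. `Node` and `BPlusTree` are hypothetical names, not a library API.

```python
from bisect import bisect_right, insort


class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []   # child nodes (non-leaf only)
        self.next = None     # next leaf in sequence (leaf only)


class BPlusTree:
    """In-memory B+ tree of order n: at most n keys per node (sketch)."""

    def __init__(self, n):
        self.n = n
        self.root = Node(leaf=True)

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:                  # root split: grow a new root
            sep, right = split
            root = Node(leaf=False)
            root.keys, root.children = [sep], [self.root, right]
            self.root = root

    def _insert(self, node, key):
        """Insert below node; return (separator, new right node) if node split."""
        if node.leaf:
            insort(node.keys, key)
            if len(node.keys) <= self.n:
                return None                    # room in the leaf
            mid = (len(node.keys) + 1) // 2    # left keeps ceil((n+1)/2) keys
            right = Node(leaf=True)
            right.keys, node.keys = node.keys[mid:], node.keys[:mid]
            right.next, node.next = node.next, right
            return right.keys[0], right        # copy smallest right key up
        i = bisect_right(node.keys, key)
        split = self._insert(node.children[i], key)
        if split is None:
            return None
        sep, child = split
        node.keys.insert(i, sep)
        node.children.insert(i + 1, child)
        if len(node.keys) <= self.n:
            return None
        mid = len(node.keys) // 2              # middle key moves up, not copied
        right = Node(leaf=False)
        up = node.keys[mid]
        right.keys, node.keys = node.keys[mid + 1:], node.keys[:mid]
        right.children, node.children = node.children[mid + 1:], node.children[:mid + 1]
        return up, right

    def contains(self, key):
        node = self.root
        while not node.leaf:
            node = node.children[bisect_right(node.keys, key)]
        return key in node.keys

    def leaves(self):
        """Keys of every leaf, following the next-leaf pointers."""
        node = self.root
        while not node.leaf:
            node = node.children[0]
        out = []
        while node is not None:
            out.append(node.keys)
            node = node.next
        return out


t = BPlusTree(3)
for k in range(10, 101, 10):
    t.insert(k)
print(t.root.keys)   # [70]
print(t.leaves())    # [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]
```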
Deletion from B+tree
(a) Simple case: no example
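(b) Coalesce with sibling
» Delete 50 (n = 4)
[Figure: node with keys 10, 40 (below key 100) over leaves 10, 20, 30 | 40, 50. After deleting 50 the leaf holds only 40, so it is merged with its sibling into 10, 20, 30, 40, and key 40 is removed from the parent.]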
(c) Redistribute keys
» Delete 50 (n = 4)
[Figure: leaves 10, 20, 30, 35 | 40, 50 under parent key 40. After deleting 50, the leaf holding 40 borrows 35 from its sibling (10, 20, 30 | 35, 40) and the parent key changes from 40 to 35.]
(d) Non-leaf coalesce
– Delete 37 (n = 4)
[Figure: root 25 over non-leaf nodes 10, 20 and 30, 40, with leaves 1, 3 | 10, 14 | 20, 22 | 25, 26 | 30, 37 | 40, 45. After deleting 37, leaf 30 merges with 25, 26 into 25, 26, 30, and key 30 leaves the parent, which is left with only 40. That non-leaf merges with its sibling 10, 20, pulling down the root key 25, into 10, 20, 25, 40, which becomes the new root.]
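A sketch of leaf-level deletion for leaves that share one parent. It assumes leaves must keep at least ⌊(n+1)/2⌋ keys, and a policy that appears consistent with both n = 4 examples: merge an underfull leaf with a sibling when their keys fit in one node, otherwise borrow one key from that sibling and fix the separator. Propagating an underflow to the parent (the non-leaf coalesce case) is not handled; `delete_at_leaf_level` is a hypothetical name.

```python
def delete_at_leaf_level(parent_keys, leaves, key, n):
    """Delete key from leaves that share one parent, fixing any underflow.

    parent_keys[i] separates leaves[i] and leaves[i + 1]. Assumed rules:
    a leaf needs at least floor((n + 1) / 2) keys; an underfull leaf is
    merged with a sibling when both fit in one node (coalesce), otherwise
    it borrows one key from that sibling (redistribute).
    """
    min_keys = (n + 1) // 2
    i = next(j for j, leaf in enumerate(leaves) if key in leaf)
    leaves[i].remove(key)
    if len(leaves[i]) >= min_keys or len(leaves) == 1:
        return                                      # simple case
    s = i - 1 if i > 0 else i + 1                   # prefer the left sibling
    left, right = min(i, s), max(i, s)
    if len(leaves[left]) + len(leaves[right]) <= n:
        leaves[left].extend(leaves.pop(right))      # coalesce
        del parent_keys[left]
    else:
        if s < i:
            leaves[i].insert(0, leaves[s].pop())    # borrow sibling's largest key
        else:
            leaves[i].append(leaves[s].pop(0))      # or its smallest key
        parent_keys[left] = leaves[right][0]        # new separator


keys, leaves = [40], [[10, 20, 30, 35], [40, 50]]
delete_at_leaf_level(keys, leaves, 50, 4)
print(keys, leaves)  # [35] [[10, 20, 30], [35, 40]]
```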
B+ Tree Deletion in Practice
Often, coalescing is not implemented
» Too hard and not worth it! Most datasets only tend to grow in size over time.
Interesting Problem:
For B+ tree, how large should n be?