ADB Course Chapter_3.2 Indexes

Uploaded by

chebl6001

Data Storage & Indexes

Key Operations on an Index


Find all records with a given value for a key
» Key can be one field or a tuple of fields
(e.g. country=“US” AND state=“CA”)
» In some cases, only one matching record

Find all records with key in a given range

Find nearest neighbor to a data point?
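
The first two operations can be sketched as follows, assuming the index is just a sorted in-memory list of (key, record id) pairs. The names INDEX, find_equal and find_range and the sample data are illustrative and not from the course; tuple keys compare field by field, which is what makes a multi-field key work here.

```python
from bisect import bisect_left, bisect_right

# Hypothetical index: (key, record id) pairs sorted by key.
# Each key is a (country, state) tuple.
INDEX = sorted([
    (("CA", "ON"), 1),
    (("US", "CA"), 2),
    (("US", "CA"), 3),
    (("US", "NY"), 4),
    (("US", "WA"), 5),
])
KEYS = [key for key, _ in INDEX]


def find_equal(key):
    """Record ids of all entries whose key equals `key`."""
    lo, hi = bisect_left(KEYS, key), bisect_right(KEYS, key)
    return [rid for _, rid in INDEX[lo:hi]]


def find_range(low, high):
    """Record ids of all entries with low <= key <= high."""
    lo, hi = bisect_left(KEYS, low), bisect_right(KEYS, high)
    return [rid for _, rid in INDEX[lo:hi]]
```

Both lookups are two binary searches plus a scan of the matching slice, which is why keeping the index sorted pays off for range queries.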

CS 245 14
Tradeoffs in Indexing

Improved query performance

Cost to update indexes

Size of indexes

Some Types of Indexes
Conventional indexes

B-trees

Hash indexes

Multi-key indexing

Many standard data structures, but adapted to work well on disk

Sequential File
[Figure: a sequential file with records stored in key order: 10, 20, 30, ..., 100]

Dense Index on a Sequential File
[Figure: the dense index holds one entry per record (10, 20, 30, ..., 120), each pointing to its record in the sequential file (10, 20, ..., 100 shown)]

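Sparse Index on a Sequential File
[Figure: the sparse index holds one entry per file block (10, 30, 50, ..., 230), each pointing to the block that starts with that key; file blocks shown: (10, 20), (30, 40), (50, 60), (70, 80), (90, 100)]
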
2-level Sparse Index on a Sequential File
[Figure: a top-level sparse index (10, 90, 170, 250, ..., 570) points to blocks of a second-level sparse index (10, 30, 50, ..., 230), which points to the blocks of the sequential file (10, 20, ..., 100 shown)]

File and 2nd level index blocks need not be contiguous on disk

Sparse vs Dense Tradeoff
Sparse: Less space usage, can keep more of index in memory

Dense: Can tell whether a key is present without accessing file

(Later: sparse better for insertions, dense needed for secondary indexes)

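A minimal sketch of the tradeoff, assuming a toy file of two-record blocks held in Python lists. BLOCKS, DENSE, SPARSE and the two functions are illustrative names, not part of the course material. The dense index answers a membership test on its own; the sparse index is smaller but has to read a file block to be sure.

```python
from bisect import bisect_right

# Sequential file: blocks of records sorted by key (two records per block).
BLOCKS = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Dense index: one (key, block number) entry per record.
DENSE = [(key, b) for b, block in enumerate(BLOCKS) for key in block]

# Sparse index: one (first key, block number) entry per block.
SPARSE = [(block[0], b) for b, block in enumerate(BLOCKS)]


def dense_contains(key):
    """Membership test answered from the index alone; no file block is read."""
    keys = [k for k, _ in DENSE]
    i = bisect_right(keys, key) - 1
    return i >= 0 and keys[i] == key


def sparse_lookup(key):
    """Pick the last entry whose first key is <= key, then read that block."""
    firsts = [k for k, _ in SPARSE]
    i = bisect_right(firsts, key) - 1
    if i < 0:
        return None  # key is smaller than every key in the file
    block_no = SPARSE[i][1]
    return block_no if key in BLOCKS[block_no] else None
```

Here the sparse index has 5 entries against 10 for the dense one, at the price of one block read per lookup.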
Terms
Search key of an index
Primary index (on primary key of ordered files)
Secondary index
Dense index (contains all search key values)
Sparse index (contains only some search key values, e.g. one per block)
Multi-level index (an index built on another index)

Handling Duplicate Keys
For a primary index, can point to 1st instance of each item (assuming blocks are linked)

For a secondary index, need to point to a list of records since they can be anywhere

Deletion: Sparse Index
[Figure sequence: sparse index (10, 30, 50, 70, 90, 110, 130, 150) over file blocks (10, 20), (30, 40), (50, 60), (70, 80)]
– delete record 40: the record is removed from its block; the index is unchanged, since 40 is not an index key
– delete record 30: record 40 moves to the front of its block and index entry 30 becomes 40
– delete records 30 & 40: the block becomes empty; index entry 30 is removed and the following entries (50, 70) move up

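The three deletion cases above can be sketched as follows, assuming the file is a list of sorted blocks and the sparse index is a parallel list holding the first key of each block. The function name and this in-memory layout are assumptions made for illustration.

```python
from bisect import bisect_right


def delete_record(blocks, index, key):
    """Delete `key` from a sequential file with a sparse index.

    blocks: list of sorted key lists (one per file block)
    index:  first key of each block, in the same order as `blocks`
    Returns True if the key was found and removed.
    """
    i = bisect_right(index, key) - 1      # block that could hold the key
    if i < 0 or key not in blocks[i]:
        return False
    blocks[i].remove(key)
    if not blocks[i]:
        # Block is now empty: drop it together with its index entry.
        del blocks[i]
        del index[i]
    elif index[i] != blocks[i][0]:
        # The first key of the block was deleted: update the index entry.
        index[i] = blocks[i][0]
    return True


def fresh():
    """The starting state used in the slides (first four blocks only)."""
    return [[10, 20], [30, 40], [50, 60], [70, 80]], [10, 30, 50, 70]
```

Deleting 40 leaves the index untouched, deleting 30 rewrites one entry, and deleting both removes an entry, matching the three figure sequences.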
Deletion: Dense Index
[Figure sequence: dense index (10, 20, 30, 40, 50, 60, 70, 80) over file blocks (10, 20), (30, 40), (50, 60), (70, 80)]
– delete record 30: record 40 moves to the front of its block; index entry 30 is removed and entry 40 moves up into its place

Insertion: Sparse Index
[Figure sequence: sparse index (10, 30, 40, 60) over file blocks (10, 20), (30, free), (40, 50), (60, free)]
– insert record 34: our lucky day! we have free space where we need it! 34 goes into the block that holds 30
– insert record 15: block (10, 20) is full, so 20 moves to the next block; the blocks become (10, 15) and (20, 30), and index entry 30 becomes 20
• Illustrated: Immediate reorganization
• Variation:
– insert new block (chained file)
– update index
– insert record 25: block (10, 20) is full, so 25 goes into an overflow block attached to it (reorganize later...)

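A rough sketch of the overflow-block strategy from the last case, assuming two records per block and a dictionary that maps a block number to its overflow records. It does not implement the immediate reorganization illustrated for record 15; BLOCK_CAPACITY and insert_record are hypothetical names.

```python
from bisect import bisect_right

BLOCK_CAPACITY = 2  # assumed number of records per block


def insert_record(blocks, overflow, index, key):
    """Insert `key`, using an overflow block when the target block is full.

    blocks:   list of sorted key lists (one per file block)
    overflow: dict mapping block number -> list of overflow keys
    index:    sparse index, first key of each block
    Returns "in place" or "overflow".
    """
    # Target block: last block whose first key is <= key (or block 0).
    i = max(bisect_right(index, key) - 1, 0)
    block = blocks[i]
    if len(block) < BLOCK_CAPACITY:
        block.append(key)
        block.sort()
        index[i] = block[0]   # only changes if key is the new first key
        return "in place"
    # No free space: park the record in an overflow block, reorganize later.
    overflow.setdefault(i, []).append(key)
    return "overflow"
```

With the file from the figure, inserting 34 lands in place next to 30, while inserting 25 ends up in the overflow block of (10, 20) and leaves the index unchanged.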
Secondary Indexes
[Figure: the file is sequenced on a different ordering field, so the indexed values appear out of order in its blocks: (30, 50), (20, 70), (80, 40), (100, 10), (90, 60)]
Sparse index: entries 30, 20, 80, 100, 90, ... (first value of each block) are not in sorted order, so they cannot guide a search
Dense index: entries 10, 20, 30, 40, 50, 60, 70, ... in sorted order, each pointing to its record
Dense index with a sparse higher level: a sparse index (10, 50, 90, ...) over the blocks of the dense index

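Duplicate Values in Secondary Indexes
[Figure: each index entry (10, 20, 30, 40, 50, 60, ...) points to a bucket of record pointers; the bucket for a value points to every record with that value in the file blocks (20, 10), (20, 40), (10, 40), (10, 40), (30, 40)]
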
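The bucket idea can be sketched as follows, assuming records are identified by their position in the file and the index is a plain dictionary. FILE mirrors the values in the figure; the record ids and build_bucket_index are illustrative.

```python
from collections import defaultdict

# Unordered file: (record id, value of the indexed field).
FILE = [(0, 20), (1, 10), (2, 20), (3, 40), (4, 10),
        (5, 40), (6, 10), (7, 40), (8, 30), (9, 40)]


def build_bucket_index(records):
    """Map each distinct value to a bucket (list) of record ids."""
    buckets = defaultdict(list)
    for rid, value in records:
        buckets[value].append(rid)
    # Keep the index entries in key order, one entry per distinct value.
    return dict(sorted(buckets.items()))


index = build_bucket_index(FILE)
```

The index stores each value once, however many records share it, and a lookup returns the whole bucket.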
Conventional Indexes
Pros:
- Simple
- Index is sequential file (good for scans)

Cons:
- Inserts expensive, and/or
- Lose sequentiality & balance

B-Trees
Another type of index
» Give up on sequentiality of index
» Try to get “balance”

Note: the exact data structure we’ll look at is a B+ tree, but plain old “B-trees” are similar

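B+ Tree Example (n = 3)
[Figure: root with key 100; two non-leaf children with keys (30) and (120, 150, 180); leaves (3, 5, 11), (30, 35), (100, 101, 110), (120, 130), (150, 156, 179), (180, 200), linked left to right]
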
Sample Non-Leaf
Keys: 57, 81, 95
Pointers: to keys k < 57 | to keys 57 ≤ k < 81 | to keys 81 ≤ k < 95 | to keys k ≥ 95

Sample Leaf Node
Keys: 57, 81, 95 (reached from a non-leaf node)
Pointers: to record with key 57 | to record with key 81 | to record with key 95 | to next leaf in sequence

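A minimal lookup sketch over the example tree, assuming nodes are small Python objects and records are placeholder strings. Node, leaf, search and range_scan are illustrative names. `bisect_right` selects child i so that keys[i-1] ≤ k < keys[i], which is the pointer rule of the sample non-leaf node.

```python
from bisect import bisect_right


class Node:
    def __init__(self, keys, children=None, records=None):
        self.keys = keys
        self.children = children    # non-leaf: len(keys) + 1 child nodes
        self.records = records      # leaf: one record per key
        self.next_leaf = None       # leaf: sequence pointer

    @property
    def is_leaf(self):
        return self.children is None


def search(node, key):
    """Return the record stored under `key`, or None if it is absent."""
    while not node.is_leaf:
        node = node.children[bisect_right(node.keys, key)]
    if key in node.keys:
        return node.records[node.keys.index(key)]
    return None


def range_scan(root, low, high):
    """Keys with low <= key <= high, found by walking the leaf chain."""
    node = root
    while not node.is_leaf:
        node = node.children[bisect_right(node.keys, low)]
    found = []
    while node is not None:
        for k in node.keys:
            if k > high:
                return found
            if k >= low:
                found.append(k)
        node = node.next_leaf
    return found


def leaf(*keys):
    # Placeholder records; a real leaf would hold record pointers.
    return Node(list(keys), records=[f"rec-{k}" for k in keys])


# The B+ tree example with n = 3.
leaves = [leaf(3, 5, 11), leaf(30, 35), leaf(100, 101, 110),
          leaf(120, 130), leaf(150, 156, 179), leaf(180, 200)]
for left, right in zip(leaves, leaves[1:]):
    left.next_leaf = right
root = Node([100], children=[
    Node([30], children=leaves[:2]),
    Node([120, 150, 180], children=leaves[2:]),
])
```

A point lookup touches one node per level, and a range scan descends once and then follows sequence pointers.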
Size of Nodes on Disk
n + 1 pointers
n keys

(Fixed size nodes)

Don’t Want Nodes to be Too Empty

Use at least

Non-leaf: ⌈(n+1)/2⌉ pointers

Leaf: ⌊(n+1)/2⌋ pointers to data

Example: n = 3

            Full node        Min. node
Non-leaf    120, 150, 180    30
Leaf        3, 5, 11         30, 35

B+ Tree Rules (tree of order n)

1. All leaves are at same lowest level (balanced tree)

2. Pointers in leaves point to records, except for “sequence pointer”

B+ Tree Rules (tree of order n)

3. Number of pointers/keys for B+ tree:

                      Max ptrs   Max keys   Min ptrs→data   Min keys
Non-leaf (non-root)   n+1        n          ⌈(n+1)/2⌉       ⌈(n+1)/2⌉ - 1
Leaf (non-root)       n+1        n          ⌊(n+1)/2⌋       ⌊(n+1)/2⌋
Root                  n+1        n          2*              1

* When there is only one record in the B+ tree, min pointers in the root is 1 (the other pointers are null)

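The table can be turned into a small helper for checking the later examples. This is a sketch using integer arithmetic for the ceiling and floor; occupancy_limits and the dictionary keys are hypothetical names.

```python
def occupancy_limits(n):
    """Pointer and key limits for a B+ tree of order n (n = max keys per node)."""
    half_up = (n + 2) // 2      # ceil((n + 1) / 2)
    half_down = (n + 1) // 2    # floor((n + 1) / 2)
    return {
        "non-leaf": {"max_ptrs": n + 1, "max_keys": n,
                     "min_ptrs": half_up, "min_keys": half_up - 1},
        # For leaves, min_ptrs counts pointers to data (not the sequence pointer).
        "leaf": {"max_ptrs": n + 1, "max_keys": n,
                 "min_ptrs": half_down, "min_keys": half_down},
        # The root minimum assumes the tree holds more than one record.
        "root": {"max_ptrs": n + 1, "max_keys": n,
                 "min_ptrs": 2, "min_keys": 1},
    }
```

For n = 3 this gives a non-leaf minimum of 2 pointers and 1 key and a leaf minimum of 2 keys, matching the full and minimum nodes in the n = 3 example.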
Insert Into B+ Tree
(a) simple case
» space available in leaf

(b) leaf overflow

(c) non-leaf overflow

(d) new root

(a) Insert key = 32 (n = 3)
[Figure sequence: root key 100; its left child has key 30 and leaves (3, 5, 11) and (30, 31)]
Leaf (30, 31) has space, so 32 is added there: (30, 31, 32)

(b) Insert key = 7 (n = 3)
[Figure sequence: root key 100; its left child has key 30 and leaves (3, 5, 11) and (30, 31)]
Leaf (3, 5, 11) is full: it overflows and splits into (3, 5) and (7, 11)
Key 7 is inserted into the parent, which becomes (7, 30)

(c) Insert key = 160 (n = 3)
[Figure sequence: root key 100; its right child has keys (120, 150, 180); leaves shown: (150, 156, 179) and (180, 200)]
Leaf (150, 156, 179) overflows and splits into (150, 156) and (160, 179)
Key 160 must go into the parent (120, 150, 180), which is full: it splits into (120, 150) and (180)
Key 160 moves up into the root, which becomes (100, 160)

(d) New root, insert 45 (n = 3)
[Figure sequence: root (10, 20, 30) with leaves (1, 2, 3), (10, 12), (20, 25), (30, 32, 40)]
Leaf (30, 32, 40) overflows and splits into (30, 32) and (40, 45)
Key 40 must go into the root (10, 20, 30), which is full: it splits into (10, 20) and (40)
Key 30 moves up into a new root

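The four insertion cases can be sketched in one recursive routine. This is a simplified illustration, not the course's reference implementation: it stores keys only (no record pointers), assumes keys are unique, and the class and method names are made up for the example. The split points are chosen to reproduce the n = 3 examples above.

```python
from bisect import bisect_right, insort


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children    # None for a leaf
        self.next_leaf = None       # sequence pointer (leaves only)

    @property
    def is_leaf(self):
        return self.children is None


class BPlusTree:
    def __init__(self, n):
        self.n = n                  # maximum number of keys per node
        self.root = Node([])

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:
            # (d) the root itself split: add a new root above both halves
            separator, right = split
            self.root = Node([separator], children=[self.root, right])

    def _insert(self, node, key):
        """Insert below `node`; return (separator, new right node) on a split."""
        if node.is_leaf:
            insort(node.keys, key)              # (a) space available in leaf
            if len(node.keys) <= self.n:
                return None
            mid = (self.n + 2) // 2             # (b) left keeps ceil((n+1)/2) keys
            right = Node(node.keys[mid:])
            node.keys = node.keys[:mid]
            right.next_leaf, node.next_leaf = node.next_leaf, right
            return right.keys[0], right         # first right key is copied up
        i = bisect_right(node.keys, key)
        split = self._insert(node.children[i], key)
        if split is None:
            return None
        separator, right = split
        node.keys.insert(i, separator)
        node.children.insert(i + 1, right)
        if len(node.keys) <= self.n:
            return None
        mid = (self.n + 1) // 2                 # (c) middle key moves up
        up = node.keys[mid]
        new_right = Node(node.keys[mid + 1:], children=node.children[mid + 1:])
        node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
        return up, new_right


# Usage: case (d), starting from the tree in the figure.
tree = BPlusTree(3)
leaves = [Node([1, 2, 3]), Node([10, 12]), Node([20, 25]), Node([30, 32, 40])]
for left, right in zip(leaves, leaves[1:]):
    left.next_leaf = right
tree.root = Node([10, 20, 30], children=leaves)
tree.insert(45)
print(tree.root.keys)                                 # [30]
print([child.keys for child in tree.root.children])  # [[10, 20], [40]]
```

Note the asymmetry: a leaf split copies the separator up and keeps it in the right leaf, while a non-leaf split moves the middle key up and keeps it in neither half.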
Deletion from B+tree
(a) Simple case: no example

(b) Coalesce with neighbor (sibling)

(c) Re-distribute keys

(d) Cases (b) or (c) at non-leaf

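(b) Coalesce with sibling
» Delete 50 (n = 4)
[Figure sequence: parent (10, 40, 100) with leaves (10, 20, 30) and (40, 50)]
After deleting 50, leaf (40) is below the minimum; 40 moves into the sibling, giving (10, 20, 30, 40), and key 40 is removed from the parent
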
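(c) Redistribute keys
» Delete 50 (n = 4)
[Figure sequence: parent (10, 40, 100) with leaves (10, 20, 30, 35) and (40, 50)]
After deleting 50, leaf (40) is below the minimum and the sibling is full, so 35 moves over: the leaves become (10, 20, 30) and (35, 40), and parent key 40 becomes 35
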
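The choice between cases (b) and (c) at the leaf level can be sketched as a small decision function. It assumes the underfull leaf is the right one of two adjacent leaves and works on plain key lists; delete_from_right_leaf is a hypothetical helper and the parent update is left to the caller.

```python
def delete_from_right_leaf(left, right, key, n):
    """Delete `key` from leaf `right` and repair it using left sibling `left`.

    Returns (action, new_left_keys, new_right_keys), where action is one of
    "simple", "coalesce" or "redistribute".
    """
    right = [k for k in right if k != key]
    min_keys = (n + 1) // 2             # floor((n + 1) / 2) keys per leaf
    if len(right) >= min_keys:
        return "simple", left, right
    if len(left) + len(right) <= n:
        # (b) everything fits in one leaf: merge, the right leaf disappears
        return "coalesce", left + right, []
    # (c) sibling cannot absorb the leaf: borrow its largest key instead
    return "redistribute", left[:-1], [left[-1]] + right
```

After a redistribution the parent's separator becomes the first key of the new right leaf (35 in the example); after a coalesce the separator is removed from the parent.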
(d) Non-leaf coalesce
– Delete 37 (n = 4)
[Figure sequence: root (25); non-leaf nodes (10, 20) and (30, 40); leaves (1, 3), (10, 14), (20, 22), (25, 26), (30, 37), (40, 45)]
After deleting 37, leaf (30) is below the minimum and is coalesced with sibling (25, 26); key 30 is removed from its parent
Non-leaf node (40) is now below the minimum, so it is coalesced with sibling (10, 20), pulling key 25 down from the root
The merged node (10, 20, 25, 40) becomes the new root

B+ Tree Deletion in Practice
Often, coalescing is not implemented
» Too hard and not worth it! Most datasets only tend to grow in size over time.

Interesting Problem:
For B+ tree, how large should n be?

n is number of keys / node

With modern hardware, get n = 1000 or more
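
A back-of-the-envelope sketch of where such a value of n can come from, assuming a node is exactly one disk block and ignoring any per-node header. The block, key and pointer sizes in the usage note are illustrative assumptions, not figures from the course.

```python
def max_keys_per_node(block_bytes, key_bytes, pointer_bytes):
    """Largest n such that n keys and n + 1 pointers fit in one block."""
    return (block_bytes - pointer_bytes) // (key_bytes + pointer_bytes)


def max_records(n, levels):
    """Records reachable through a completely full tree with `levels` levels."""
    return (n + 1) ** (levels - 1) * n
```

With 8-byte keys and 8-byte pointers, `max_keys_per_node(4096, 8, 8)` is 255 and `max_keys_per_node(16384, 8, 8)` is 1023, and a full three-level tree with n = 1023 can address over a billion records.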


Summary
Wide range of indexes for different data types and queries (e.g. range vs exact)

Key concerns: query time, cost to update, and size of index

Next: given all these storage data structures, how do we run our queries?
