ADB Course Chapter_3.2 Indexes

Uploaded by chebl6001


Data Storage & Indexes

Key Operations on an Index

Find all records with a given value for a key
» Key can be one field or a tuple of fields
(e.g. country=“US” AND state=“CA”)
» In some cases, only one matching record

Find all records with key in a given range

Find nearest neighbor to a data point?
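A minimal sketch of the first two operations, assuming the index is a Python list of (key, record id) pairs kept sorted by key. The record ids are hypothetical, and a composite key such as (country, state) is modelled as a tuple, which Python compares field by field.

```python
from bisect import bisect_left, bisect_right

# Hypothetical index: (key, record_id) pairs sorted by key.
# Composite keys are tuples, e.g. (country, state).
index = sorted([
    (("US", "CA"), "r1"),
    (("US", "NY"), "r2"),
    (("US", "CA"), "r3"),
    (("FR", "IDF"), "r4"),
])
keys = [k for k, _ in index]


def find_equal(key):
    """All record ids whose key equals `key`."""
    lo, hi = bisect_left(keys, key), bisect_right(keys, key)
    return [rid for _, rid in index[lo:hi]]


def find_range(low, high):
    """All record ids with low <= key <= high, in key order."""
    lo, hi = bisect_left(keys, low), bisect_right(keys, high)
    return [rid for _, rid in index[lo:hi]]


print(find_equal(("US", "CA")))                 # ['r1', 'r3']
print(find_range(("FR", "IDF"), ("US", "CA")))  # ['r4', 'r1', 'r3']
```

Both lookups are two binary searches (`bisect_left` for the first match, `bisect_right` for one past the last), so an equality lookup is simply a range whose two endpoints coincide.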

CS 245 14
Tradeoffs in Indexing

Improved query performance
Cost to update indexes
Size of indexes

Some Types of Indexes
Conventional indexes

B-trees

Hash indexes

Multi-key indexing

Many standard data structures, but adapted to work well on disk

Sequential File
[Figure: records with keys 10, 20, ..., 100 stored in key order, two records per block: 10 20 | 30 40 | 50 60 | 70 80 | 90 100]

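Dense Index
[Figure: a dense index with one entry per record, four entries per index block (10 20 30 40 | 50 60 70 80 | 90 100 110 120), each entry pointing to its record in the sequential file (10 20 | 30 40 | 50 60 | 70 80 | 90 100 | ...)]
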
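Sparse Index
[Figure: a sparse index with one entry per data block, holding that block's first key (10 30 50 70 | 90 110 130 150 | 170 190 210 230), pointing into the sequential file (10 20 | 30 40 | 50 60 | 70 80 | 90 100 | ...)]
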
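2-Level Sparse Index
[Figure: a 2nd-level sparse index (10 90 170 250 | 330 410 490 570) with one entry per block of the 1st-level sparse index (10 30 50 70 | 90 110 130 150 | 170 190 210 230), which in turn points into the sequential file (10 20 | 30 40 | 50 60 | 70 80 | 90 100 | ...)]
2nd-level index blocks need not be contiguous on disk
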
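A lookup through the two levels can be sketched as follows. The model is an assumption built to match the figure: data blocks hold two records, each 1st-level index block holds four entries, and the file is truncated at key 240, so the 2nd level here has only the entries 10 90 170. All names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical model of the figure: 2 records per data block, 4 entries per
# 1st-level index block, file truncated at key 240.
data_blocks = [[k, k + 10] for k in range(10, 240, 20)]  # 10 20 | 30 40 | ... | 230 240
level1 = [block[0] for block in data_blocks]             # 10 30 50 ... 230
level1_blocks = [level1[i:i + 4] for i in range(0, len(level1), 4)]
level2 = [blk[0] for blk in level1_blocks]               # 10 90 170


def floor_slot(sorted_keys, key):
    """Index of the last entry <= key, or -1 if key precedes every entry."""
    return bisect_right(sorted_keys, key) - 1


def lookup(key):
    """True if `key` is in the file; touches one block per level."""
    i = floor_slot(level2, key)              # which 1st-level index block
    if i < 0:
        return False
    j = floor_slot(level1_blocks[i], key)    # entry within it (>= 0, since level2[i] <= key)
    return key in data_blocks[4 * i + j]     # read the single candidate data block
```

Each level narrows the search to one block, so a lookup reads one 2nd-level block, one 1st-level block and one data block, however large the file grows below them.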
Sparse vs Dense Tradeoff
Sparse: Less space usage, can keep more of index in memory
Dense: Can tell whether a key is present without accessing file
(Later: sparse better for insertions, dense needed for secondary indexes)
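To make the tradeoff concrete, this small sketch (hypothetical names, the same two-records-per-block file as the figures) counts data-block reads: the dense index answers membership on its own, while the sparse index can only name the block that would hold the key, so that block has to be read.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sequential file: sorted blocks of two records each.
blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]
dense = [k for block in blocks for k in block]   # one entry per record
sparse = [block[0] for block in blocks]          # one entry per block
reads = {"data_blocks": 0}                       # data-block accesses so far


def read_block(i):
    """Simulated disk read of data block i."""
    reads["data_blocks"] += 1
    return blocks[i]


def present_dense(key):
    """Membership from the index alone; no data block is read."""
    i = bisect_left(dense, key)
    return i < len(dense) and dense[i] == key


def present_sparse(key):
    """The index only names the candidate block, which must then be read."""
    i = bisect_right(sparse, key) - 1
    return i >= 0 and key in read_block(i)
```

The price of the dense answer is size: here the dense index has ten entries against the sparse index's five.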

Terms
Search key of an index
Primary index (on primary key of ordered files)
Secondary index
Dense index (contains all search key values)
Sparse index
Multi-level index

Handling Duplicate Keys
For a primary index, can point to 1st instance of each item (assuming blocks are linked)
For a secondary index, need to point to a list of records since they can be anywhere

Deletion: Sparse Index
[Figure: sparse index 10 30 50 70 | 90 110 130 150 over the sequential file 10 20 | 30 40 | 50 60 | 70 80]

Deletion: Sparse Index
– delete record 40
[Figure: 40 is removed from block 30 40. Because 40 is not the first key of its block, the sparse index is unchanged.]

Deletion: Sparse Index
– delete record 30
[Figure: 30 is removed from block 30 40 and 40 moves up to the front of the block. Since 30 was the block's first key, its index entry is replaced by 40, giving 10 40 50 70 | 90 110 130 150.]

Deletion: Sparse Index
– delete records 30 & 40
[Figure: removing both records leaves block 30 40 empty, so its index entry 30 is deleted and the remaining entries in that index block shift up: 10 50 70 | 90 110 130 150.]

Deletion: Dense Index
[Figure: dense index 10 20 30 40 | 50 60 70 80 over the sequential file 10 20 | 30 40 | 50 60 | 70 80]

Deletion: Dense Index
– delete record 30
[Figure: 30 is removed from block 30 40 and 40 moves up within the block. The dense index entry 30 is deleted and 40 shifts up to take its place: 10 20 40 | 50 60 70 80.]

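Both deletion walkthroughs can be reproduced with a small in-memory model. This is a sketch under assumptions: blocks are Python lists, the sparse index stores [first_key, block_no] pairs, empty blocks are left in place rather than reclaimed, and all names are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical model: blocks are sorted key lists; `dense` lists every key;
# `sparse` holds [first_key, block_no] pairs for non-empty blocks.


def delete_key(blocks, dense, sparse, key):
    """Delete `key` and repair both indexes (empty blocks are not reclaimed)."""
    pos = bisect_right([k for k, _ in sparse], key) - 1  # last entry <= key
    if pos < 0 or key not in blocks[sparse[pos][1]]:
        raise KeyError(key)
    block = blocks[sparse[pos][1]]
    block.remove(key)                   # later records in the block move up
    del dense[bisect_left(dense, key)]  # dense: the key's own entry always goes
    if sparse[pos][0] == key:           # sparse: only a block's first key is indexed
        if block:
            sparse[pos][0] = block[0]   # delete 30 from 30 40: entry becomes 40
        else:
            del sparse[pos]             # block now empty: drop its entry


blocks = [[10, 20], [30, 40], [50, 60], [70, 80]]
dense = [10, 20, 30, 40, 50, 60, 70, 80]
sparse = [[10, 0], [30, 1], [50, 2], [70, 3]]
delete_key(blocks, dense, sparse, 40)   # sparse index unchanged
delete_key(blocks, dense, sparse, 30)   # block 1 now empty, entry 30 dropped
```

The asymmetry matches the slides: the dense index loses an entry on every deletion, while the sparse index changes only when a block's first key is removed.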
Insertion: Sparse Index
– insert record 34
[Figure: sparse index 10 30 40 60 over the file 10 20 | 30 _ | 40 50 | 60 ...; 34 goes into the free slot of block 30 _, and the index is unchanged.]
Our lucky day! We have free space where we need it!

Insertion: Sparse Index
– insert record 15
[Figure: sparse index 10 30 40 60 over the file 10 20 | 30 _ | 40 50 | 60 .... Block 10 20 is full, so 15 is placed after 10 and 20 shifts into the free slot of the next block: the file becomes 10 15 | 20 30 | 40 50 | 60 ..., and the index entry 30 becomes 20 (index: 10 20 40 60).]
• Illustrated: Immediate reorganization
• Variation:
– insert new block (chained file)
– update index
Insertion: Sparse Index
– insert record 25
[Figure: sparse index 10 30 40 60 over the file 10 20 | 30 _ | 40 50 | 60 .... 25 belongs in the full block 10 20, so it is placed in an overflow block chained to that block; the index is unchanged.]
Overflow blocks (reorganize later...)

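The overflow policy from the last example can be sketched as below, assuming two record slots per block and keeping each block's overflow chain as a sorted list in a dict (all names hypothetical). It does not implement the immediate-reorganization variant illustrated for record 15.

```python
from bisect import bisect_right, insort

BLOCK_CAPACITY = 2  # assumption: two record slots per block, as in the figures


def insert_key(blocks, overflow, sparse, key):
    """Insert `key` using free space in its block, else that block's overflow chain.

    blocks:   sorted key lists, one per data block (may have free slots)
    overflow: dict block_no -> sorted overflow keys (reorganized later)
    sparse:   sorted first keys; sparse[i] indexes blocks[i]
    """
    i = max(bisect_right(sparse, key) - 1, 0)   # block whose key range covers `key`
    block = blocks[i]
    if len(block) < BLOCK_CAPACITY:
        insort(block, key)                       # lucky case: free space right here
        sparse[i] = block[0]                     # changes only if `key` is the new minimum
    else:
        insort(overflow.setdefault(i, []), key)  # index untouched until reorganization


blocks = [[10, 20], [30], [40, 50], [60]]
overflow = {}
sparse = [10, 30, 40, 60]
insert_key(blocks, overflow, sparse, 34)  # lands in block 30 _
insert_key(blocks, overflow, sparse, 25)  # block 10 20 is full: overflow
```

Deferring the work keeps each insert cheap, but lookups for keys in block 10 20 must now also scan its overflow chain until the file is reorganized.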
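Secondary Indexes
[Figure: a file sequenced on some other ordering field, so the values of the field being indexed appear in no particular order: 30 50 | 20 70 | 80 40 | 100 10 | 90 60]
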
Secondary Indexes
[Figure: a sparse index built from the first value in each block (30 20 80 100 90 ...) is itself unordered, so it cannot direct a search to the right block.]

Secondary Indexes
[Figure: a dense index with one sorted entry per record (10 20 30 40 | 50 60 70 80 | ...), each pointing to its record wherever it sits in the file]

Secondary Indexes
[Figure: a sparse higher-level index (10 50 90 ...) built on top of the dense index (10 20 30 40 | 50 60 70 80 | ...), which points into the unordered file]

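A sketch of this two-level structure for a secondary index: the dense level is built by sorting (key, record pointer) pairs taken from the unordered file, and the sparse level records the first key of each dense-index block. The (block, slot) pointers, the four-entries-per-block size and the names are assumptions for illustration.

```python
from bisect import bisect_right

# Hypothetical file sequenced on another field: two records per block,
# shown only by the value of the field being indexed.
blocks = [[30, 50], [20, 70], [80, 40], [100, 10], [90, 60]]
first_keys = [block[0] for block in blocks]  # 30 20 80 100 90: unsorted, no use as a sparse index

# Dense lowest level: one (key, (block_no, slot)) entry per record, sorted by key.
dense = sorted((key, (b, s)) for b, block in enumerate(blocks)
               for s, key in enumerate(block))

# Sparse higher level: first key of each dense-index block (assumed 4 entries per block).
ENTRIES_PER_BLOCK = 4
dense_blocks = [dense[i:i + ENTRIES_PER_BLOCK]
                for i in range(0, len(dense), ENTRIES_PER_BLOCK)]
upper = [blk[0][0] for blk in dense_blocks]  # 10 50 90


def locate(key):
    """Return the (block_no, slot) of the record with `key`, or None."""
    i = bisect_right(upper, key) - 1
    if i < 0:
        return None
    for k, ptr in dense_blocks[i]:
        if k == key:
            return ptr
    return None
```

The lowest level has to be dense because the file order says nothing about the indexed field; only once the entries are sorted can a sparse level be layered on top.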
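Duplicate Values in Secondary Indexes
[Figure: a dense index with one entry per distinct value (10 20 30 40 50 60 ...); each entry points to a bucket, and the bucket lists pointers to every record with that value, wherever it sits in the file]
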
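One way to sketch the bucket idea in memory: the index keeps each distinct key once, and its bucket lists every record id carrying that key. The records and ids are hypothetical, and a dict of lists stands in for bucket blocks on disk.

```python
from collections import defaultdict

# Hypothetical records as (record_id, key) in file order; keys repeat.
records = [("r0", 20), ("r1", 10), ("r2", 20), ("r3", 40),
           ("r4", 10), ("r5", 40), ("r6", 30), ("r7", 40)]


def build_bucketed_index(recs):
    """Map each distinct key to a bucket listing every record id holding it."""
    buckets = defaultdict(list)
    for rid, key in recs:
        buckets[key].append(rid)
    # The index proper: one entry per distinct key, kept in key order.
    return sorted(buckets.items())


index = build_bucketed_index(records)
```

Storing each key once keeps the index itself small even when one value (here 40) is shared by many records.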
Conventional Indexes
Pros:
- Simple
- Index is sequential file (good for scans)

Cons:
- Inserts expensive, and/or
- Lose sequentiality & balance

Some Types of Indexes
Conventional indexes

B-trees

Hash indexes

Multi-key indexing

B-Trees
Another type of index
» Give up on sequentiality of index
» Try to get “balance”

Note: the exact data structure we’ll look at is a B+ tree, but plain old “B-trees” are similar

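B+ Tree Example (n = 3)
[Figure: Root 100; its left child holds key 30 and its right child holds keys 120 150 180. The leaves, left to right, are 3 5 11 | 30 35 | 100 101 110 | 120 130 | 150 156 179 | 180 200.]
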
Sample Non-Leaf
[Figure: a non-leaf node with keys 57 81 95 and four child pointers]
to keys < 57 | to keys 57 ≤ k < 81 | to keys 81 ≤ k < 95 | to keys ≥ 95

Sample Leaf Node
[Figure: a leaf with keys 57 81 95, reached from a non-leaf node. Its pointers lead to the record with key 57, the record with key 81, the record with key 95, and finally to the next leaf in sequence.]

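Search follows the non-leaf rule above: the pointer between keys a and b leads to keys a ≤ k < b, which is exactly the position `bisect_right` returns. The sketch rebuilds the n = 3 tree from the B+ Tree Example slide (class and function names are hypothetical; leaves store keys in place of record pointers) and also shows a range scan along the sequence pointers.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None  # sequence pointer to the next leaf


class Inner:
    def __init__(self, keys, children):
        self.keys = keys          # up to n keys
        self.children = children  # len(keys) + 1 child pointers


def find_leaf(node, key):
    """Descend from the root: child i covers keys[i-1] <= k < keys[i]."""
    while isinstance(node, Inner):
        node = node.children[bisect_right(node.keys, key)]
    return node


def lookup(root, key):
    return key in find_leaf(root, key).keys


def range_scan(root, lo, hi):
    """All keys with lo <= k <= hi, following sequence pointers."""
    leaf, out = find_leaf(root, lo), []
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return out
            if k >= lo:
                out.append(k)
        leaf = leaf.next
    return out


# The n = 3 example tree from the B+ Tree Example slide.
leaves = [Leaf(k) for k in ([3, 5, 11], [30, 35], [100, 101, 110],
                            [120, 130], [150, 156, 179], [180, 200])]
for a, b in zip(leaves, leaves[1:]):
    a.next = b
root = Inner([100], [Inner([30], leaves[:2]), Inner([120, 150, 180], leaves[2:])])
```

A range query descends once to the leaf holding `lo` and then never revisits the non-leaf levels, which is what the sequence pointers are for.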
Size of Nodes on Disk
n + 1 pointers
n keys

(Fixed size nodes)

Don’t Want Nodes to be Too Empty

Use at least:
Non-leaf: ⌈(n+1)/2⌉ pointers
Leaf: ⌊(n+1)/2⌋ pointers to data

Example: n = 3
[Figure: full and minimal nodes. Non-leaf: full node 120 150 180; minimal node 30. Leaf: full node 3 5 11; minimal node 30 35.]

B+ Tree Rules (tree of order n)

1. All leaves are at same lowest level (balanced tree)
2. Pointers in leaves point to records, except for “sequence pointer”

B+ Tree Rules (tree of order n)

(3) Number of pointers/keys for B+ tree:

                     Max ptrs   Max keys   Min ptrs→data   Min keys
Non-leaf (non-root)  n+1        n          ⌈(n+1)/2⌉       ⌈(n+1)/2⌉-1
Leaf (non-root)      n+1        n          ⌊(n+1)/2⌋       ⌊(n+1)/2⌋
Root                 n+1        n          2*              1

* When there is only one record in the B+ tree, the minimum number of pointers in the root is 1 (the other pointers are null)

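The table translates directly into a helper that returns the limits for each node kind for a given n. The function name is hypothetical; the ceiling is computed as `(n + 2) // 2` to stay in integer arithmetic.

```python
def occupancy_bounds(n):
    """Limits for a B+ tree node holding at most n keys and n + 1 pointers.

    Returns {kind: (max_ptrs, max_keys, min_ptrs, min_keys)} following the
    rules table; for leaves, min_ptrs counts pointers to data only.
    """
    ceil_half = (n + 2) // 2   # ⌈(n+1)/2⌉ in integer arithmetic
    floor_half = (n + 1) // 2  # ⌊(n+1)/2⌋
    return {
        "non-leaf": (n + 1, n, ceil_half, ceil_half - 1),
        "leaf": (n + 1, n, floor_half, floor_half),
        "root": (n + 1, n, 2, 1),  # min pointers drop to 1 when the tree holds one record
    }
```

For n = 3 this gives a minimal non-leaf with one key and a minimal leaf with two, matching the Example: n = 3 slide. For n = 4 the leaf minimum is also 2 keys, which is why a leaf left holding a single key must be repaired in the deletion examples with n = 4.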
Insert Into B+ Tree
(a) simple case
» space available in leaf

(b) leaf overflow

(c) non-leaf overflow

(d) new root

(a) Insert key = 32 (n = 3)
[Figure: part of a tree with root 100 and a child node 30 over leaves 3 5 11 | 30 31. Leaf 30 31 has room, so 32 is simply added: 30 31 32.]

(b) Insert key = 7 (n = 3)
[Figure: root 100 with a child node 30 over leaves 3 5 11 | 30 31. Leaf 3 5 11 has no room for 7, so it splits into 3 5 | 7 11, and key 7 is added to the parent, which becomes 7 30.]

(c) Insert key = 160 (n = 3)
[Figure: root 100 with right child 120 150 180, whose last two leaves are 150 156 179 | 180 200. Inserting 160 splits leaf 150 156 179 into 150 156 | 160 179. Key 160 must then go into the parent 120 150 180, which is full, so the parent splits into 120 150 and 180, and the middle key 160 moves up to the root, which becomes 100 160.]

(d) New root, insert 45 (n = 3)
[Figure: root 10 20 30 over leaves 1 2 3 | 10 12 | 20 25 | 30 32 40. Inserting 45 splits the last leaf into 30 32 | 40 45, and key 40 must be added to the root. The full root splits into 10 20 and 40, and the middle key 30 becomes a new root, so the tree grows by one level.]

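The four insertion cases fit in one recursive routine. This is a sketch, not a production design: it handles insertion only, leaves hold bare keys instead of record pointers, and the class and method names are hypothetical. Split sizes follow the examples above: an overflowing leaf keeps ⌈(n+1)/2⌉ keys on the left and copies the first right-hand key up, while an overflowing non-leaf moves its middle key up.

```python
from bisect import bisect_right, insort


class Node:
    def __init__(self, leaf, keys=None, children=None):
        self.leaf = leaf
        self.keys = keys if keys is not None else []
        self.children = children if children is not None else []  # non-leaf only
        self.next = None  # sequence pointer to the next leaf (leaf only)


class BPlusTree:
    """Insertion-only B+ tree sketch of order n (at most n keys per node)."""

    def __init__(self, n=3):
        self.n = n
        self.root = Node(leaf=True)

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:  # case (d): the root split, so add a new root
            sep, right = split
            self.root = Node(False, [sep], [self.root, right])

    def _insert(self, node, key):
        """Insert below `node`; return (separator, new right sibling) if it split."""
        if node.leaf:
            insort(node.keys, key)
            if len(node.keys) <= self.n:  # case (a): space available in leaf
                return None
            # case (b): leaf overflow; the left half keeps ⌈(n+1)/2⌉ keys
            mid = (len(node.keys) + 1) // 2
            right = Node(True, node.keys[mid:])
            node.keys = node.keys[:mid]
            right.next, node.next = node.next, right
            return right.keys[0], right  # smallest right key is copied up
        i = bisect_right(node.keys, key)
        split = self._insert(node.children[i], key)
        if split is None:
            return None
        sep, new_child = split
        node.keys.insert(i, sep)
        node.children.insert(i + 1, new_child)
        if len(node.keys) <= self.n:
            return None
        # case (c): non-leaf overflow; the middle key moves up (it is not copied)
        mid = len(node.keys) // 2
        up = node.keys[mid]
        right = Node(False, node.keys[mid + 1:], node.children[mid + 1:])
        node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
        return up, right

    def leaf_keys(self):
        """All keys in order, following sequence pointers from the leftmost leaf."""
        node = self.root
        while not node.leaf:
            node = node.children[0]
        out = []
        while node is not None:
            out.extend(node.keys)
            node = node.next
        return out
```

For example, inserting 10, 20, 30 and then 40 with n = 3 overflows the single leaf, which splits into 10 20 | 30 40 under a new root holding 30. Because the tree only ever grows at the root, every leaf stays at the same depth.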
Deletion from B+tree
(a) Simple case: no example

(b) Coalesce with neighbor (sibling)

(c) Re-distribute keys

(d) Cases (b) or (c) at non-leaf

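(b) Coalesce with sibling
» Delete 50 n=4
[Figure: a non-leaf node (keys 10, 40, 100) over leaves including 10 20 30 | 40 50. Deleting 50 leaves 40 alone, below the leaf minimum of ⌊(n+1)/2⌋ = 2 keys, so it is merged into its sibling, giving 10 20 30 40, and key 40 is removed from the parent.]
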
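(c) Redistribute keys
» Delete 50 n=4
[Figure: a non-leaf node (keys 10, 40, 100) over leaves including 10 20 30 35 | 40 50. Deleting 50 leaves 40 alone, but the sibling has a key to spare: 35 moves over, giving leaves 10 20 30 | 35 40, and the parent key 40 is replaced by 35.]
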
(d) Non-leaf coalesce
n=4
– Delete 37
[Figure: root 25 over non-leaf nodes 10 20 and 30 40, with leaves 1 3 | 10 14 | 20 22 | 25 26 | 30 37 | 40 45. Deleting 37 leaves 30 alone, so that leaf is coalesced with its sibling 40 45 and key 40 is removed from their parent. The parent now holds only 30, below the non-leaf minimum of ⌈(n+1)/2⌉-1 = 2 keys, so it is coalesced with its sibling 10 20, pulling down the root key 25. The merged node 10 20 25 30 becomes the new root.]

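The leaf-level repairs in cases (b) and (c) can be sketched as one function. The policy is an assumption chosen to reproduce the n = 4 examples: merge with the left sibling when both leaves fit in one node, otherwise borrow the sibling's largest key. Updating the parent, and the non-leaf case (d), are left out, and the function name is hypothetical.

```python
def fix_leaf_underflow(left_sibling, leaf, n):
    """Repair a leaf left underfull by a deletion, using its left sibling.

    Both arguments are sorted key lists, modified in place. Returns
    ("ok", None), ("coalesce", None) or ("redistribute", new_separator).
    """
    min_keys = (n + 1) // 2               # leaf minimum ⌊(n+1)/2⌋
    if len(leaf) >= min_keys:
        return "ok", None
    if len(left_sibling) + len(leaf) <= n:
        left_sibling.extend(leaf)         # (b) coalesce; parent drops this leaf's key
        leaf.clear()
        return "coalesce", None
    leaf.insert(0, left_sibling.pop())    # (c) redistribute: borrow the largest key
    return "redistribute", leaf[0]        # parent's separator becomes this key
```

With n = 4, `fix_leaf_underflow([10, 20, 30], [40], 4)` merges into 10 20 30 40, while `fix_leaf_underflow([10, 20, 30, 35], [40], 4)` moves 35 over and reports 35 as the new separator, as on the two slides. Textbooks often try redistribution first instead; either choice keeps every leaf at or above the minimum.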
B+ Tree Deletion in Practice
Often, coalescing is not implemented
» Too hard and not worth it! Most datasets only tend to grow in size over time.

Interesting Problem:
For B+ tree, how large should n be?

n is number of keys / node

With modern hardware, get n = 1000 or more
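A rough way to see where such values come from: a node must fit n keys and n + 1 pointers into one disk block. The sketch below assumes a 16 KiB block with 8-byte keys and pointers (illustrative numbers, not taken from any particular system) and also estimates the minimum tree height for a given number of keys. The function names are hypothetical.

```python
def max_keys_per_node(block_bytes, key_bytes, ptr_bytes):
    """Largest n such that n keys and n + 1 pointers fit in one block:
    n * key_bytes + (n + 1) * ptr_bytes <= block_bytes."""
    return (block_bytes - ptr_bytes) // (key_bytes + ptr_bytes)


def min_height(num_keys, n):
    """Fewest levels that can index num_keys when every node is completely full
    (n keys per leaf, n + 1 children per non-leaf); partly filled trees may need more."""
    levels, capacity = 1, n
    while capacity < num_keys:
        capacity *= n + 1
        levels += 1
    return levels


n = max_keys_per_node(16384, 8, 8)  # 1023 under these assumptions
print(n, min_height(10**9, n))      # 1023 3
```

With those assumptions n = 1023, and three levels of full nodes already reach about 10^9 keys, so a lookup touches only a handful of blocks.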

Summary
Wide range of indexes for different data types and queries (e.g. range vs exact)
Key concerns: query time, cost to update, and size of index
Next: given all these storage data structures, how do we run our queries?
