IT3031-L06-Indexing
Data-Driven Applications
Indexes
Files of Records
Three types
Heap File Organization
Sequential File Organization
Hashing File Organization
Alternative File Organizations
Many alternatives exist, each ideal for some situations and not so good in others:
Heap files: Suitable when typical access is a file scan
[Figure: DIRECTORY pointing to data pages, up to Data Page N]
Indexes
Three alternatives:
1. Data record with key value k (Alt. 1)
2. <k, rid of data record with search key value k> (Alt. 2)
3. <k, list of rids of data records with search key k> (Alt. 3)
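The three alternatives can be sketched side by side. This is only an illustration: the records, the rids (101 to 104), and the choice of age as the search key are made-up assumptions, not part of the lecture.

```python
from collections import defaultdict

# Hypothetical data file: rid -> (name, age). All values are invented.
records = {
    101: ("Ashby", 25),
    102: ("Basu", 33),
    103: ("Cass", 50),
    104: ("Daniels", 33),
}

# Alt. 1: the data entry is the data record itself, kept in key order.
alt1 = sorted(records.values(), key=lambda rec: rec[1])

# Alt. 2: one <k, rid> pair per data record.
alt2 = sorted((age, rid) for rid, (_, age) in records.items())

# Alt. 3: one <k, list of rids> entry per distinct key value.
groups = defaultdict(list)
for rid, (_, age) in sorted(records.items()):
    groups[age].append(rid)
alt3 = sorted(groups.items())

print(alt2)  # [(25, 101), (33, 102), (33, 104), (50, 103)]
print(alt3)  # [(25, [101]), (33, [102, 104]), (50, [103])]
```

Alt. 3 stores the duplicate key 33 once, so it has fewer entries than Alt. 2, at the cost of variable-length entries.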
Terminology
File of records containing index entries = index file
[Figure: CLUSTERED vs. UNCLUSTERED index; index entries direct the search for data entries]
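... some data record), then dense.
Every sparse index is ...
[Figure: dense vs. sparse index example. Index values shown: Ashby, Cass, Smith and 30, 33, 50; data records shown: (Cass, 50, 5004), (Daniels, 22, 6003), (Tracy, 44, 5004)]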
[Figure: Index File with keys k1, k2, ..., kN above the Data Entries ("Sequence set")]
B+ Trees in Practice
Typical order: 100. Typical fill-factor: 67%.
Average fanout = 133.
Typical capacities:
Height 4: 133^4 = 312,900,721 records
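The arithmetic behind these numbers can be checked with a short sketch. It assumes that a node of order d holds at most 2d entries and that the 67% fill factor is read as two thirds, which is what yields a fanout of 133; the function names are illustrative.

```python
def average_fanout(order, fill_num=2, fill_den=3):
    """Average entries per node: at most 2 * order, scaled by the fill factor."""
    return (2 * order) * fill_num // fill_den


def capacity(fanout, height):
    """Records reachable through `height` levels, following the slide's convention."""
    return fanout ** height


fanout = average_fanout(100)
print(fanout)               # 133
print(capacity(fanout, 4))  # 312900721
```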
[Figure: example B+ tree before the insert; leaf entries 2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*]
Minimum occupancy is guaranteed in both leaf and index page splits.
Note the difference between copy-up and push-up; be sure you understand the reasons for this.
[Figure: leaf page holding 2* 3* 5* 7* 8* being split]
[Figure: index page split. Entry to be inserted in parent node: 17. (Note that 17 is pushed up and only appears once in the index. Contrast this with a leaf split.) The two resulting index pages hold 5 13 and 24 30.]
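The difference between the two kinds of split can be sketched as follows. This is a simplified illustration that works on sorted key lists only and ignores child pointers and rids; the function names are assumptions.

```python
def split_leaf(entries):
    """Split an overfull leaf. The middle key is COPIED up: it stays in the
    right leaf, because every data entry must remain at the leaf level."""
    mid = len(entries) // 2
    left, right = entries[:mid], entries[mid:]
    return left, right, right[0]


def split_index(keys):
    """Split an overfull index node. The middle key is PUSHED up: it moves to
    the parent and no longer appears at this level."""
    mid = len(keys) // 2
    return keys[:mid], keys[mid + 1:], keys[mid]


print(split_leaf([2, 3, 5, 7, 8]))       # ([2, 3], [5, 7, 8], 5)
print(split_index([5, 13, 17, 24, 30]))  # ([5, 13], [24, 30], 17)
```

After the leaf split, 5 appears both in the parent and in the right leaf. After the index split, 17 appears only in the parent.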
Example B+ Tree After Inserting 8*
Root: 17
Index level: 5 13 | 24 30
Leaf level: 2* 3* | 5* 7* 8* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*
Try to re-distribute, borrowing from sibling
(adjacent node with same parent as L).
If re-distribution fails, merge L and sibling.
If merge occurred, must delete entry (pointing to L or
sibling) from parent of L.
Merge could propagate to root, decreasing height.
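A minimal sketch of the leaf-level decision described above. It assumes leaves are sorted Python lists, that the sibling is the right-hand neighbour under the same parent, and that the minimum occupancy is 2 entries (as in an order-2 tree); updating the parent is left to the caller.

```python
def delete_from_leaf(leaf, sibling, key, min_entries=2):
    """Remove `key` from leaf L and report (action, new_separator_key)."""
    leaf.remove(key)
    if len(leaf) >= min_entries:
        return ("ok", None)
    if len(sibling) > min_entries:
        # Re-distribute: borrow the smallest entry from the right sibling.
        leaf.append(sibling.pop(0))
        # The parent's separator must become the sibling's new lowest key.
        return ("redistribute", sibling[0])
    # Re-distribution fails: merge L and sibling. The caller must then
    # delete the parent's entry that pointed to the sibling.
    leaf.extend(sibling)
    sibling.clear()
    return ("merge", None)


leaf, sibling = [19, 20, 22], [24, 27, 29]
print(delete_from_leaf(leaf, sibling, 19))  # ('ok', None)
print(delete_from_leaf(leaf, sibling, 20))  # ('redistribute', 27)
print(leaf, sibling)                        # [22, 24] [27, 29]
```

Deleting 24 next would leave one entry in the leaf while the sibling sits at minimum occupancy, so the sketch would return a merge.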
Example Tree After (Inserting 8*, Then) Deleting 19* and 20* ...
Root: 17
Index level: 5 13 | 27 30
Leaf level: 2* 3* | 5* 7* 8* | 14* 16* | 22* 24* | 27* 29* | 33* 34* 38* 39*
Observe `toss' of index entry (on ...
[Figure: leaf entries 22* 27* 29* 33* 34* 38* 39*]
Alternatives…
Overflow leaf pages
[Figure: static hashing; primary bucket pages numbered up to N-1, with overflow pages chained to them]
Static Hashing… (contd.)
Example
Directory is array of size 4.
To find bucket for r, take last `global depth' # bits of h(r); we denote r by h(r).
If h(r) = 5 = binary 101, it is in bucket pointed to by 01.
[Figure: DIRECTORY with GLOBAL DEPTH 2 and entries 00, 01, 10, 11 pointing to DATA PAGES, each bucket labeled with depth 2: Bucket B (1* 5* 21* 13*) under 01, Bucket C (10*) under 10, Bucket D (15* 7* 19*) under 11]
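The lookup rule can be sketched in a few lines. The directory below is a plain dict used only for illustration, and the mapping of slot 00 to a Bucket A is an assumption, since that bucket's contents are not listed here.

```python
def directory_slot(hash_value, global_depth):
    """Return the directory slot: the last `global_depth` bits of h(r)."""
    return hash_value & ((1 << global_depth) - 1)


# Slot -> bucket name; "A" for slot 00 is assumed, B, C and D follow the example.
directory = {0b00: "A", 0b01: "B", 0b10: "C", 0b11: "D"}

slot = directory_slot(5, 2)                  # 5 = binary 101, last two bits are 01
print(format(slot, "02b"), directory[slot])  # 01 B
```

Masking with `(1 << global_depth) - 1` keeps only the low-order bits, so doubling the directory only requires looking at one more bit of the same hash value.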
[Figure: directory doubling, before and after. Before: global depth 2, directory entries 00, 01, 10, 11. After: global depth 3, directory entries 000 through 111. Both panels show Bucket B (1* 5* 21* 13*), Bucket C (10*), Bucket D (15* 7* 19*), and Bucket A2 (4* 12* 20*), the `split image' of Bucket A.]
Points to Note
problems!
Comments on Extendible Hashing (contd.)
Delete: If removal of data entry makes bucket empty, can be merged with `split image'. If each directory element points to same bucket as its split image, can halve directory.
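The halving condition can be sketched as a check over the directory. The sketch assumes the directory is a list indexed by the last `global depth' bits, so slot i and slot i + half differ only in the top bit and are split images of each other; bucket names are illustrative strings.

```python
def can_halve(directory):
    """True if every slot points to the same bucket as its split image."""
    half = len(directory) // 2
    return half >= 1 and all(
        directory[i] == directory[i + half] for i in range(half)
    )


def halve(directory):
    """Drop the upper half of the directory (global depth decreases by 1)."""
    return directory[: len(directory) // 2]


# Slots 000..111 after a doubling: only A was split, into A and A2.
directory = ["A", "B", "C", "D", "A2", "B", "C", "D"]
print(can_halve(directory))  # False

directory[4] = "A"           # A2 became empty and was merged back into A
print(can_halve(directory))  # True
print(halve(directory))      # ['A', 'B', 'C', 'D']
```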
Summary
File Organizations
Indexes
B+ Tree
Hashing (Extendible)