5 Data Storage and Indexing
Structure of Disks
Disk
several platters stacked on a rotating spindle
one read/write head per surface, for fast access
each platter has several tracks (~10,000 per inch)
each track: several sectors
each sector: blocks
unit of data transfer: block
cylinder i: track i on all platters
Speed: 7000 to 10000 rpm
[Figure: disk structure showing the platters, a track, a sector, and the read/write head]
Prof P Sreenivasa Kumar
Department of CS&E, IITM
File blocks:
sequence of blocks containing all the records of the file
Operations on Files
Insertion of a new record: may involve searching for the appropriate location for the new record
Deletion of a record: locating the record may involve a search; deleting the record may involve moving other records
Update of a record field/fields: equivalent to a delete followed by an insert
Search for a record: given the value of a key field / non-key field
Range search: given a range of values for a key / non-key field
How efficiently we can carry out these operations depends on the organization of the file and the availability of indexes
Hashed Files
Very useful file organization if quick access to a data record is needed, given the value of a single attribute.
Hashing field: the attribute on which quick access is needed and on which hashing is performed
Data file: organized as M buckets numbered 0, 1, ..., M-1
(bucket: a block or a few consecutive blocks)
Hash function h: maps values from the domain of the hashing attribute to bucket numbers
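The bucket organization described above can be sketched in a few lines of Python. This is only an illustrative sketch: the class name `HashedFile`, the choice h(x) = x mod M, the bucket capacity of 2, and the use of in-memory lists in place of disk blocks are all assumptions, not part of the slides.

```python
class HashedFile:
    """Toy hashed file: M main buckets, each with its own overflow chain."""

    def __init__(self, M=4, bucket_capacity=2):
        self.M = M
        self.bucket_capacity = bucket_capacity
        self.main = [[] for _ in range(M)]       # main buckets 0 .. M-1
        self.overflow = [[] for _ in range(M)]   # overflow chain per main bucket

    def h(self, value):
        # Assumed hash function for an integer hashing field
        return value % self.M

    def insert(self, value):
        b = self.h(value)
        if len(self.main[b]) < self.bucket_capacity:
            self.main[b].append(value)
        else:
            self.overflow[b].append(value)       # main bucket full: use the chain
        return b

    def search(self, value):
        # Only one main bucket and its overflow chain are examined
        b = self.h(value)
        return value in self.main[b] or value in self.overflow[b]


f = HashedFile(M=4, bucket_capacity=2)
for v in [12, 45, 22, 11, 15, 10, 8, 16]:
    f.insert(v)
print(f.main, f.overflow)
```

A search touches only the bucket selected by h and its overflow chain, which is why access on the hashing field is fast while searches on other fields still require a scan.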
[Figure: main buckets 0 to M-1, with overflow chains linking main buckets to overflow buckets]
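[Figure: extendible hashing structure with a directory and buckets. Annotations: the local depth of each bucket (2, 3, 2, 3, 3); the # of trailing bits used in the directory]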
Locating a record
Match the trailing d-bit sequence of the hashed value with an entry in the directory, and go to the corresponding bucket to find the record
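As a rough illustration of the lookup step, the trailing d bits can be extracted with a bit mask and used as the directory index. The directory contents and bucket names below are hypothetical, and the key itself is assumed to serve as the hash value.

```python
def trailing_bits(hash_value, d):
    """Return the d low-order (trailing) bits of hash_value as an integer."""
    return hash_value & ((1 << d) - 1)


# Hypothetical directory with d = 2: entries 00, 01, 10, 11
d = 2
directory = ["B00", "B01", "B10", "B11"]


def locate(key):
    # Assumption: the key is used directly as its own hash value
    return directory[trailing_bits(key, d)]


print(locate(45))   # 45 = 101101, trailing bits 01
print(locate(22))   # 22 = 10110, trailing bits 10
```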
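[Figure: directory with d = 2 (entries 00, 01, 10, 11) and buckets b0, b1, b2; bucket b0 has local depth 1. After b0 is split, the buckets are b0, b1, b2, b3 and all local depths are 2.]
Inserting a record into b0:
b0 full: bucket b0 is split. All records whose 2-bit sequence is 10 are sent to a new bucket b3; the others are retained in b0. The directory is modified.
b0 not full: the new record is placed in b0. No changes in the directory.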
[Figure: directory doubling. Before: d = 2, directory entries 00 to 11, buckets b0, b1 (full), b2, b3, all local depths 2. After: d = 3, directory entries 000 to 111, buckets b0, b1, b2, b3, b4.]
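Key values in binary: 45 = 101101, 22 = 10110, 12 = 1100, 11 = 1011
Initial buckets = 1, global depth = 0
Insert 12, 45, 22
Bucket overflows; local depth = global depth
Directory doubles and split image is created
Result: global depth 1; entry 0 -> bucket {22, 12} (local depth 1); entry 1 -> bucket {45} (local depth 1)
Insert 11
Result: global depth 1; entry 0 -> bucket {22, 12} (local depth 1); entry 1 -> bucket {45, 11} (local depth 1)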
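Key values in binary: 45 = 101101, 22 = 10110, 12 = 1100, 11 = 1011, 15 = 1111, 10 = 1010
Insert 15
Overflow occurs. Global depth = local depth
Directory doubles and split occurs
Result: global depth 2; entries 00 and 10 -> bucket {22, 12} (local depth 1); entry 01 -> bucket {45} (local depth 2); entry 11 -> bucket {11, 15} (local depth 2)
Insert 10
Overflow occurs. Since local depth < global depth, the split image is created and the directory is not doubled
Result: global depth 2; entry 00 -> bucket {12}; entry 01 -> bucket {45}; entry 10 -> bucket {10, 22}; entry 11 -> bucket {11, 15}; all local depths 2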
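The insertion sequence of this example can be replayed with a small simulation. This is a sketch under stated assumptions: the bucket capacity of 2 is inferred from where the overflows occur in the example, the key is used as its own hash value, and the names `Bucket` and `ExtendibleHash` are made up for illustration.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.keys = []


class ExtendibleHash:
    def __init__(self, capacity=2):          # capacity 2 is an assumption
        self.capacity = capacity
        self.global_depth = 0
        self.directory = [Bucket(0)]         # always 2**global_depth entries

    def _index(self, key):
        # Directory index = trailing global_depth bits of the key
        return key & ((1 << self.global_depth) - 1)

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(key)
            return
        # Overflow: double the directory only if local depth == global depth
        if bucket.local_depth == self.global_depth:
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # Split the bucket on its next trailing bit and create the split image
        bit = 1 << bucket.local_depth
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth)
        old_keys, bucket.keys = bucket.keys, []
        for i, b in enumerate(self.directory):
            if b is bucket and i & bit:
                self.directory[i] = image
        for k in old_keys:
            self.directory[self._index(k)].keys.append(k)
        self.insert(key)                     # retry; may trigger another split


table = ExtendibleHash(capacity=2)
for k in [12, 45, 22, 11, 15, 10]:
    table.insert(k)
print(table.global_depth, [b.keys for b in table.directory])
```

The sketch does not handle duplicate keys or keys that agree on all trailing bits, which would make the retry loop forever; a real implementation would fall back to overflow buckets in that case.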
Linear Hashing
Does not require a separate directory structure
Uses a family of hash functions $h_0, h_1, h_2, \ldots$
the range of $h_i$ is double the range of $h_{i-1}$
$h_i(x) = x \bmod 2^i M$
M: the initial no. of buckets
(Assume that the hashing field is an integer)
Initial hash functions
$h_0(x) = x \bmod M$
$h_1(x) = x \bmod 2M$
Insertion (1/3)
Initially the structure has M main buckets (0, ..., M-1) and a few overflow buckets
[Figure: main buckets 0, 1, ... with overflow buckets, and the split image of bucket 0]
Insertion (2/3)
On first overflow, irrespective of where it occurs, bucket 0 is split
On subsequent overflows, buckets 1, 2, 3, ... are split in that order
(This is why the scheme is called linear hashing)
N: the next bucket to be split
After M overflows, all the original M buckets are split.
We switch to hash functions $h_1, h_2$ and set N = 0.
Hash functions in use: $h_0, h_1$, then $h_1, h_2$, and in general $h_i, h_{i+1}$
[Figure: main buckets 0, 1, 2, ..., M-1 followed by their split images M, M+1, ...]
Insertion (3/3)
Say the hash functions in use are $h_i$, $h_{i+1}$
To insert a record with hash field value x:
Compute $h_i(x)$
if $h_i(x) < N$, the original bucket is already split;
    place the record in bucket $h_{i+1}(x)$
else place the record in bucket $h_i(x)$
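The insertion rule and the splitting policy can be combined into a short simulation. This is a sketch, not the slides' own code: M = 1 and a bucket capacity of 2 are assumptions chosen for a small run, overflow records are simply kept in the same Python list as the main bucket, and the round counter `i` is absolute rather than being relabeled as h0, h1 after each round.

```python
class LinearHashFile:
    def __init__(self, M=1, capacity=2):      # both values are assumptions
        self.M = M
        self.capacity = capacity
        self.i = 0                             # h_i and h_{i+1} are in use
        self.N = 0                             # next bucket to be split
        self.buckets = [[] for _ in range(M)]

    def h(self, i, x):
        return x % (2 ** i * self.M)

    def address(self, x):
        b = self.h(self.i, x)
        if b < self.N:                         # bucket b was already split
            b = self.h(self.i + 1, x)
        return b

    def insert(self, x):
        b = self.address(x)
        self.buckets[b].append(x)              # extra records model the overflow chain
        if len(self.buckets[b]) > self.capacity:
            self._split()                      # split bucket N, wherever the overflow was

    def _split(self):
        old = self.buckets[self.N]
        self.buckets[self.N] = []
        self.buckets.append([])                # split image, numbered N + 2**i * M
        for x in old:
            self.buckets[self.h(self.i + 1, x)].append(x)
        self.N += 1
        if self.N == 2 ** self.i * self.M:     # every bucket of this round is split
            self.i += 1
            self.N = 0


f = LinearHashFile(M=1, capacity=2)
for x in [12, 11, 14, 13, 9, 10, 18]:
    f.insert(x)
print(f.buckets, f.N, f.i)
```

Note that the bucket that overflows is not necessarily the one that is split, so a bucket may keep its overflow records until the split pointer reaches it.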
Split pointer N is at bucket 0
Insert 12, 11: both are placed in bucket 0
Insert 14: B0 overflows. The bucket pointed to by N is split. Hash functions are changed to
$h_0(x) = x \bmod 2$
$h_1(x) = x \bmod 4$
Result: bucket 0 holds 12, 14; bucket 1 holds 11
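Hash functions in use: $h_0(x) = x \bmod 2$, $h_1(x) = x \bmod 4$
Insert 13: placed in bucket 1. Buckets: 0: 12, 14; 1: 11, 13. N = 0
Insert 9: B1 overflows. B0 is split using $h_1$ and its split image is created. Buckets: 0: 12; 1: 11, 13, 9 (one record in an overflow bucket); 2: 14. N = 1
Insert 10: $h_0(10) = 0 < N$, so $h_1$ is applied here; 10 is placed in bucket 2
Insert 18: $h_1$ is applied here; overflow at B2; split B1. Buckets: 0: 12; 1: 9, 13; 2: 14, 10, 18 (one record in an overflow bucket); 3: 11
All original buckets are now split, so N = 0 and the hash functions become
$h_0(x) = x \bmod 4$
$h_1(x) = x \bmod 8$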
Index Structures
Index: a disk data structure that enables efficient retrieval of a record, given the value(s) of certain attributes (the indexing attributes)
Primary Index:
Index built on the ordering key field of a file
Clustering Index:
Index built on the ordering non-key field of a file
Secondary Index:
Index built on any non-ordering field of a file
Primary Index
Can be built on ordered / sorted files
Index attribute: ordering key field (OKF)
Index entry: (value of the OKF for the first record of a block Bj, disk address of Bj)
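[Figure: index file with entries 101, 121, 129, ..., 240, each pointing to the data file block whose first record has that OKF value (blocks starting 101, 104, ...; 121, 123, ...; 129, 130, ...; 240, 244, ...)]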
An Example
Data file:
No. of blocks b = 9500
Block size B = 4 KB
OKF length V = 15 bytes
Block pointer length P = 6 bytes
Index file:
No. of records ri = 9500
Size of entry V + P = 21 bytes
Blocking factor BFi = $\lfloor 4096/21 \rfloor$ = 195
No. of blocks bi = $\lceil r_i / BF_i \rceil$ = 49
Max no. of block accesses for getting a record using the primary index: $1 + \lceil \log_2 b_i \rceil = 7$
Max no. of block accesses for getting a record without using the primary index: $\lceil \log_2 b \rceil = 14$
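The arithmetic of this example can be checked with a few lines of Python. The variable names are chosen here for illustration; the numbers are the ones given in the example, and binary search over blocks is assumed in both cases.

```python
import math

B = 4096     # block size in bytes (4 KB)
V = 15       # OKF length in bytes
P = 6        # block pointer length in bytes
b = 9500     # number of data file blocks

r_i = b                                   # one index entry per data block
bf_i = B // (V + P)                       # index blocking factor
b_i = math.ceil(r_i / bf_i)               # number of index blocks

# Binary search on the index, plus one access for the data block
with_index = 1 + math.ceil(math.log2(b_i))
# Binary search directly on the ordered data file
without_index = math.ceil(math.log2(b))

print(bf_i, b_i, with_index, without_index)
```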
[Figure: two-level index on the data file of 9500 blocks. First level index: 9500 entries in 49 blocks. Second level index: 49 entries.]
Clustering Index
Built on ordered files where the ordering field is not a key
Index attribute: ordering field (OF)
Index entry: (distinct value Vi of the OF, address of the first block that has a record with OF value Vi)
Secondary Index
Built on any non-ordering field (NOF) of a data file.
Case I: NOF is also a key (secondary key)
Index entry: (value of the NOF Vi, pointer to the record with Vi as the NOF value)
Case II: NOF is not a key
(1) Index entry: (value of the NOF Vi, pointer(s) to the record(s) with Vi as the NOF value)
(2) Index entry: (value of the NOF Vi, pointer to a block that has pointer(s) to the record(s) with Vi as the NOF value)
Remarks:
(1) index entry is a variable length record
(2) index entry is fixed length; one more level of indirection
An Example
Data file:
No. of records r = 90,000
Block size B = 4 KB
Record length R = 100 bytes
BF = $\lfloor 4096/100 \rfloor$ = 40
b = 90000/40 = 2250
NOF length V = 15 bytes
Length of a record pointer Pr = 7 bytes
Index file:
No. of records ri = 90,000
Record length = V + Pr = 22 bytes
BFi = $\lfloor 4096/22 \rfloor$ = 186
No. of blocks bi = $\lceil 90000/186 \rceil$ = 484
Max no. of block accesses to get a record using the secondary index: $1 + \lceil \log_2 b_i \rceil = 10$
Avg no. of block accesses to get a record without using the secondary index: b/2 = 1125
A very significant improvement
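The same figures can be reproduced programmatically. The variable names below are illustrative; the values come from the example, and the case assumed is a dense secondary index on a key NOF with one entry per record.

```python
import math

B, R = 4096, 100          # block size and record length in bytes
V, Pr = 15, 7             # NOF length and record pointer length in bytes
r = 90_000                # number of data records

bf = B // R                               # records per data block
b = math.ceil(r / bf)                     # data file blocks
bf_i = B // (V + Pr)                      # index entries per block
b_i = math.ceil(r / bf_i)                 # index blocks (one entry per record)

with_index = 1 + math.ceil(math.log2(b_i))   # binary search on index + data block
without_index = b // 2                        # average linear scan of the data file

print(bf, b, bf_i, b_i, with_index, without_index)
```

The file is not ordered on the NOF, so without the index a linear scan is the only option; that is why the comparison is against b/2 rather than a binary search.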
[Figure: multi-level index on the data file (90000 records, 2250 blocks). First level index: 90000 entries in 484 blocks. Second level index: 3 blocks. Top level: 1 block.]
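The level sizes shown in the figure (484, 3, and 1 blocks) follow from repeatedly building an index on the previous index level until a single block remains. A minimal sketch, assuming every level has the same fan-out of 186 entries per block as the first level (the function name `index_levels` is hypothetical):

```python
import math


def index_levels(first_level_blocks, fan_out):
    """Return the number of blocks at each index level, bottom level first."""
    levels = [first_level_blocks]
    while levels[-1] > 1:
        # One entry per block of the level below
        levels.append(math.ceil(levels[-1] / fan_out))
    return levels


levels = index_levels(484, 186)
accesses = len(levels) + 1      # one block per index level, plus the data block
print(levels, accesses)
```

With one block read per level, a lookup costs 4 block accesses here, compared with 10 when the first level alone is searched by binary search.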
B+-trees
Balanced search trees: all leaves are at the same level
Leaf node entries point to the actual data records; all leaf nodes are linked up as a list
Internal node entries carry only index information
In B-trees, internal nodes carry data records also, so the fan-out in B-trees is smaller
Makes sure that blocks are always at least half filled
Supports both random and sequential access of records
Order
Order (m) of an Internal Node
Order of an internal node is the maximum number of tree pointers held in it.
A maximum of (m-1) keys can be present in an internal node
Order (mleaf) of a Leaf Node
Order of a leaf node is the maximum number of record pointers held in it. It is equal to the maximum number of keys in a leaf node.
Internal Nodes
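An internal node of a B+-tree of order m:
It contains at least $\lceil m/2 \rceil$ pointers, except when it is the root node
It contains at most m pointers.
If it has pointers $P_1, P_2, \ldots, P_j$ with keys $K_1 < K_2 < \cdots < K_{j-1}$, where $\lceil m/2 \rceil \le j \le m$, then
$P_1$ points to the subtree with records having key value $x \le K_1$
$P_i$ ($1 < i < j$) points to the subtree with records having key value $K_{i-1} < x \le K_i$
$P_j$ points to the subtree with records having key value $x > K_{j-1}$
Node layout: $P_1\ K_1\ P_2\ K_2\ \ldots\ K_{i-1}\ P_i\ K_i\ \ldots\ K_{j-1}\ P_j$
Example: an internal node with keys 2, 5, 12 has four pointers, leading to subtrees with $x \le 2$, $2 < x \le 5$, $5 < x \le 12$, and $x > 12$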
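The rule for choosing which pointer to follow maps directly onto a binary search over the keys of the node. A small sketch, using Python's standard `bisect_left` and the example keys 2, 5, 12; the function name `child_index` is made up here, and it returns a 0-based position (0 for $P_1$, 1 for $P_2$, and so on).

```python
from bisect import bisect_left


def child_index(keys, x):
    """0-based index of the pointer to follow for search value x.

    bisect_left returns the first position i with keys[i] >= x, which is
    exactly the pointer whose range is K_{i-1} < x <= K_i.
    """
    return bisect_left(keys, x)


keys = [2, 5, 12]
for x in [1, 2, 3, 5, 12, 13]:
    print(x, "-> pointer", child_index(keys, x) + 1)
```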
[Figure: leaf node layout: K1 Pr1, K2 Pr2, ..., Kj Prj, followed by a block pointer to the next leaf]
Order Calculation
Block size: B, Size of indexing field: V
Size of block pointer: P, Size of record pointer: Pr
Order of internal node (m):
As there can be at most m block pointers and (m-1) keys,
$(m \cdot P) + ((m-1) \cdot V) \le B$
m can be calculated by solving the above inequality.
Order of leaf node:
As there can be at most mleaf record pointers and keys, with one block pointer in a leaf node, mleaf can be calculated by solving
$(m_{leaf} \cdot (P_r + V)) + P \le B$
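Solving the two inequalities for the largest integer order gives m = floor((B + V) / (P + V)) and mleaf = floor((B - P) / (Pr + V)). The sketch below evaluates them for B = 4096, V = 15, P = 6, Pr = 7; these sample values are borrowed from the earlier index examples and are an assumption here, as are the function names.

```python
def internal_order(B, V, P):
    # Largest m with m*P + (m-1)*V <= B, i.e. m <= (B + V) / (P + V)
    return (B + V) // (P + V)


def leaf_order(B, V, P, Pr):
    # Largest mleaf with mleaf*(Pr + V) + P <= B
    return (B - P) // (Pr + V)


B, V, P, Pr = 4096, 15, 6, 7      # assumed sample sizes in bytes
m = internal_order(B, V, P)
m_leaf = leaf_order(B, V, P, Pr)
print(m, m_leaf)
```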
Example of Insertions
m = 3, mleaf = 2
Insert 20, 11
Leaf: [11, 20]
Insert 14
Root: [14]
Leaves: [11, 14] [20]
Insert 25
Inserted at leaf level
Root: [14]
Leaves: [11, 14] [20, 25]
Insert 30
Overflow. Split at 25. 25 is moved up
Root: [14, 25]
Leaves: [11, 14] [20, 25] [30]
Insert 12
Root: [14]
Internal nodes: [12] [25]
Leaves: [11, 12] [14] [20, 25] [30]
Insert 22
Root: [14]
Internal nodes: [12] [22, 25]
Leaves: [11, 12] [14] [20, 22] [25] [30]
Insert 23, 24
Root: [14, 24]
Internal nodes: [12] [22] [25]
Leaves: [11, 12] [14] [20, 22] [23, 24] [25] [30]
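The insertion sequence above can be replayed with a compact in-memory sketch. Several simplifications are assumptions of this sketch rather than the slides' method: record pointers and the leaf-to-leaf links are omitted, an overflowing leaf keeps the larger half of the keys on the left and copies the largest left key up, and an overflowing internal node moves its middle key up. These rules were chosen because they reproduce the trees shown in the example.

```python
from bisect import bisect_left


class Node:
    def __init__(self, leaf, keys=None, children=None):
        self.leaf = leaf
        self.keys = keys or []
        self.children = children or []      # subtrees; empty for leaves


class BPlusTree:
    def __init__(self, m=3, mleaf=2):
        self.m, self.mleaf = m, mleaf
        self.root = Node(leaf=True)

    def insert(self, key):
        split = self._insert(self.root, key)
        if split:                            # root was split: tree grows a level
            sep, right = split
            self.root = Node(False, [sep], [self.root, right])

    def _insert(self, node, key):
        if node.leaf:
            node.keys.insert(bisect_left(node.keys, key), key)
            if len(node.keys) <= self.mleaf:
                return None
            half = (len(node.keys) + 1) // 2
            right = Node(True, node.keys[half:])
            node.keys = node.keys[:half]
            return node.keys[-1], right      # separator is copied up
        i = bisect_left(node.keys, key)      # follow pointer with K[i-1] < key <= K[i]
        split = self._insert(node.children[i], key)
        if split is None:
            return None
        sep, right = split
        node.keys.insert(i, sep)
        node.children.insert(i + 1, right)
        if len(node.children) <= self.m:
            return None
        mid = len(node.keys) // 2            # middle key is moved up
        up = node.keys[mid]
        new = Node(False, node.keys[mid + 1:], node.children[mid + 1:])
        node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
        return up, new


def leaves(node):
    """Leaf key lists from left to right."""
    if node.leaf:
        return [node.keys]
    return [ks for child in node.children for ks in leaves(child)]


tree = BPlusTree(m=3, mleaf=2)
for k in [20, 11, 14, 25, 30, 12, 22, 23, 24]:
    tree.insert(k)
print(tree.root.keys, [c.keys for c in tree.root.children], leaves(tree.root))
```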
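Example of Deletions
Initial tree
Root: [14, 24]
Internal nodes: [12] [22] [25]
Leaves: [11, 12] [14] [20, 22] [23, 24] [25] [30]
Delete 20
Removed entry from leaf here
Root: [14, 24]
Internal nodes: [12] [22] [25]
Leaves: [11, 12] [14] [22] [23, 24] [25] [30]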
Delete 22
Entry 22 is removed from leaf and internal node
Entries from right sibling are distributed to left
Root: [14, 24]
Internal nodes: [12] [23] [25]
Leaves: [11, 12] [14] [23] [24] [25] [30]
Delete 24
Root: [14]
Internal nodes: [12] [23, 25]
Leaves: [11, 12] [14] [23] [25] [30]
Delete 14
Root: [12]
Internal nodes: [11] [23, 25]
Leaves: [11] [12] [23] [25] [30]
Delete 12
Level drop has occurred
Root: [23, 25]
Leaf entries: 11, 23, 25, 30