Ch4-Data Storage and Indexing
Deletion of record i: alternatives
• move records i + 1, ..., n to i, ..., n - 1
• move record n to i
• do not move records, but link all free records on a free list
Deleting record 3 and compacting
• When a record is deleted, we could move the record that came after it
into the space formerly occupied by the deleted record, and so on, until
every record following the deleted record has been moved ahead.
• Such an approach requires moving a large number of records.
Deleting record 3 and moving last record
• It might be easier simply to move the final record of the file into the space
occupied by the deleted record.
• It is undesirable to move records to occupy the space freed by a deleted
record, since doing so requires additional block accesses.
Free Lists
Store the address of the first deleted record in the file header.
Use this first record to store the address of the second deleted record,
and so on.
These stored addresses can be thought of as pointers, since they “point” to
the location of a record.
More space efficient representation: reuse space for normal attributes
of free records to store pointers. (No pointers stored in in-use records.)
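The free-list idea above can be sketched as follows. This is a minimal in-memory illustration, not a real storage engine: the names `FixedLengthFile`, `FreeSlot`, and `free_head` are hypothetical, a Python list stands in for the file, and slot numbers stand in for record addresses. The sketch pushes each newly freed slot onto the front of the list, which is one simple way to maintain the chain.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FreeSlot:
    """A deleted slot; its space is reused to hold the next free-list pointer."""
    next_free: Optional[int]


class FixedLengthFile:
    """Toy file of fixed-length record slots with a free list (illustrative only)."""

    def __init__(self) -> None:
        self.free_head: Optional[int] = None  # plays the role of the file header
        self.slots: List[object] = []         # slot i holds a record or a FreeSlot

    def insert(self, record: dict) -> int:
        """Reuse the first free slot if there is one, otherwise append."""
        if self.free_head is None:
            self.slots.append(record)
            return len(self.slots) - 1
        slot = self.free_head
        self.free_head = self.slots[slot].next_free  # unlink the slot from the list
        self.slots[slot] = record
        return slot

    def delete(self, slot: int) -> None:
        """Mark the slot free and link it at the head of the free list."""
        if isinstance(self.slots[slot], FreeSlot):
            raise ValueError(f"slot {slot} is already free")
        self.slots[slot] = FreeSlot(next_free=self.free_head)
        self.free_head = slot
```

No record is moved on deletion, and in-use records carry no pointer: only freed slots store a link.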
Variable-Length Records
Variable-length records arise in database systems in several ways:
Storage of multiple record types in a file.
Record types that allow variable lengths for one or more fields.
Record types that allow repeating fields, such as arrays or multisets.
Two different problems must be solved by any such technique:
How to represent a single record in such a way that individual
attributes can be extracted easily.
How to store variable-length records within a block, such that records
in a block can be extracted easily.
Variable-Length Records (Cont.)
The representation of a record with variable-length attributes typically
has two parts:
An initial part with fixed length attributes, followed by data for variable
length attributes.
Fixed-length attributes, such as numeric values, dates, or fixed length
character strings are allocated as many bytes as required to store
their value.
Variable-length attributes, such as varchar types, are represented by a
pair (offset, length), where offset denotes where the data for that
attribute begins within the record, and length is the length in bytes of
the variable-sized attribute.
Variable-Length Records (Cont.)
The figure shows an instructor record, whose first three attributes ID,
name, and dept_name are variable-length strings, and whose fourth
attribute salary is a fixed-sized number.
We assume that the offset and length values are stored in two bytes
each, for a total of 4 bytes per attribute.
The salary attribute is assumed to be stored in 8 bytes, and each string
takes as many bytes as it has characters.
The figure also illustrates the use of a null bitmap, which indicates which
attributes of the record have a null value.
In this particular record, if the salary were null, the fourth bit of the bitmap
would be set to 1, and the salary value stored in bytes 12 through 19
would be ignored.
Since the record has four attributes, the null bitmap for this record fits in 1
byte.
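The layout described above can be sketched with Python's `struct` module. Several details are assumptions made for illustration: little-endian byte order, salary stored as an 8-byte float, the null bitmap placed in byte 20 right after the salary, attribute i mapped to bit i counting from the least significant bit, and null strings stored as the pair (0, 0). The function names and the sample values are hypothetical.

```python
import struct

# Three (offset, length) pairs of 2 bytes each, an 8-byte salary, a 1-byte null bitmap.
# "<" selects standard sizes with no padding, so the fixed part is 12 + 8 + 1 = 21 bytes.
HEADER = struct.Struct("<HHHHHHdB")


def encode(id_, name, dept_name, salary):
    """Pack one instructor record; None marks a null attribute."""
    bitmap = 0
    pairs = []
    data = b""
    pos = HEADER.size                      # variable-length data starts after the fixed part
    for i, value in enumerate((id_, name, dept_name)):
        if value is None:
            bitmap |= 1 << i               # assumed bit order: attribute i uses bit i
            pairs += [0, 0]
        else:
            raw = value.encode("utf-8")
            pairs += [pos, len(raw)]
            data += raw
            pos += len(raw)
    if salary is None:
        bitmap |= 1 << 3                   # fourth attribute
        salary = 0.0                       # stored bytes are ignored when the bit is set
    return HEADER.pack(*pairs, salary, bitmap) + data


def decode(buf):
    """Unpack a record produced by encode()."""
    *pairs, salary, bitmap = HEADER.unpack_from(buf, 0)
    values = []
    for i in range(3):
        offset, length = pairs[2 * i], pairs[2 * i + 1]
        is_null = bitmap & (1 << i)
        values.append(None if is_null else buf[offset:offset + length].decode("utf-8"))
    values.append(None if bitmap & (1 << 3) else salary)
    return tuple(values)
```

With this layout any attribute can be located from the fixed-length part alone, without scanning the preceding strings.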
Variable-Length Records: Slotted Page Structure
[Figure: the department and instructor relations, and a multitable clustering of department and instructor]
Multitable Clustering File Organization (cont.)
Good for queries involving department ⋈ instructor, and for queries
involving one single department and its instructors
Bad for queries involving only department or only instructor
For example,
    select *
    from department;
requires more block accesses
Can add pointer chains to link records of a particular relation
Results in variable size records
Data Dictionary Storage
❑ A relational database system needs to maintain the data about
relations (metadata), such as the schema of the relations.
❑ Relational schemas and other metadata about relations are stored in a
structure called the data dictionary or system catalog.
❑ Information about relations
  • Names of relations
  • Names of attributes of each relation
  • Domains and lengths of attributes
  • Names and definitions of views
  • Integrity constraints
❑ Data on users of the system
  • Names of authorized users
  • Accounting information about users
  • Passwords or other information used to authenticate users
❑ Statistical and descriptive data
  • Number of tuples in each relation
  • Method of storage for each relation
❑ Physical file organization information
  • How the relation is stored (sequential/hash/…)
  • Physical location of the relation: operating system file name, or
    disk addresses of blocks containing records of the relation
❑ Information about indices
  • Name of the index
  • Name of the relation being indexed
  • Attribute on which the index is defined
  • Type of index formed
Data Dictionary Storage (Cont.)
System designers decide how to represent system data using relations.
The catalog can be kept in two forms:
• a relational representation on disk
• specialized data structures designed for efficient access, in memory
[Figure: one possible relational representation of the system metadata]
Storage Access
A database file is partitioned into fixed-length storage units called
blocks. Blocks are units of both storage allocation and data
transfer.
A major goal of the database system is to minimize the number of block
transfers between the disk and memory. We can reduce the
number of disk accesses by keeping as many blocks as possible
in main memory.
Buffer – portion of main memory available to store copies of disk
blocks.
Buffer manager – subsystem responsible for allocating buffer
space in main memory.
Buffer Manager
Programs call on the buffer manager when they need a block
from disk.
1. If the block is already in the buffer, the buffer manager returns
   the address of the block in main memory to the requester.
2. If the block is not in the buffer, the buffer manager:
   a. Allocates space in the buffer for the block,
      • replacing (throwing out) some other block, if required,
        to make space for the new block;
      • writing the replaced block back to disk only if it was
        modified since the most recent time that it was written
        to or fetched from the disk.
   b. Reads the block from the disk into the buffer, and returns
      the address of the block in main memory to the requester.
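The request-handling steps above can be sketched as a small class. This is an illustration under stated assumptions, not a production design: a Python dict stands in for the disk, the returned `bytearray` stands in for the block's address in main memory, the victim is chosen by least recent use, and callers are assumed to report modifications through the hypothetical `mark_dirty` method.

```python
from collections import OrderedDict


class BufferManager:
    """Toy buffer manager: a dict plays the disk, an OrderedDict plays the buffer."""

    def __init__(self, disk: dict, capacity: int) -> None:
        self.disk = disk                # block_id -> bytes
        self.capacity = capacity        # number of blocks the buffer can hold
        self.frames = OrderedDict()     # block_id -> [bytearray, dirty]; oldest use first
        self.reads = 0                  # blocks read from disk
        self.writes = 0                 # blocks written back to disk

    def get_block(self, block_id):
        # Step 1: the block is already buffered, so return it directly.
        if block_id in self.frames:
            self.frames.move_to_end(block_id)
            return self.frames[block_id][0]
        # Step 2a: make room, writing the victim back only if it was modified.
        if len(self.frames) >= self.capacity:
            victim, (data, dirty) = self.frames.popitem(last=False)
            if dirty:
                self.disk[victim] = bytes(data)
                self.writes += 1
        # Step 2b: read the block from disk into the buffer and return it.
        data = bytearray(self.disk[block_id])
        self.reads += 1
        self.frames[block_id] = [data, False]
        return data

    def mark_dirty(self, block_id) -> None:
        """Record that the buffered copy differs from the copy on disk."""
        self.frames[block_id][1] = True
```

Evicting a clean block costs no disk write, which is why the sketch tracks a dirty flag per buffered block.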
Buffer Manager (Cont.)
The buffer manager must use techniques more sophisticated than typical
virtual-memory management schemes:
• Buffer replacement strategy: When there is no room left in the buffer, a
block must be removed from the buffer before a new one can be read in. Most
operating systems use a least recently used (LRU) scheme, in which the
block that was referenced least recently is written back to disk and is removed
from the buffer.
• Pinned blocks: Most recovery systems require that a block should not be
written to disk while an update on the block is in progress. A block that is not
allowed to be written back to disk is said to be pinned.
• Forced output of blocks: There are situations in which it is necessary to
write back the block to disk, even though the buffer space that it occupies is
not needed. This write is called the forced output of a block.
Buffer-Replacement Policies
Most operating systems replace the block least recently used (LRU
strategy)
However, a database system is able to predict the pattern of future
references more accurately than an operating system.
Queries have well-defined access patterns (such as sequential
scans), and a database system can use the information in a user’s
query to predict future references
LRU can be a bad strategy for certain access patterns involving
repeated scans of data
For example, when computing the join of two relations r and s
by nested loops:
    for each tuple tr of r do
        for each tuple ts of s do
            if the tuples tr and ts match …
Mixed strategy with hints on replacement strategy provided
by the query optimizer is preferable
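To see why LRU can do poorly on repeated scans, the following sketch counts buffer misses for a reference string under two policies. It is a simplified simulation: the function name and the policy labels are hypothetical, and most recently used (MRU) replacement is used here only as one contrasting policy, not as the strategy the text prescribes.

```python
def count_misses(references, capacity, policy):
    """Count buffer misses for a sequence of block references.

    policy is "LRU" (evict the least recently used block) or
    "MRU" (evict the most recently used block).
    """
    buffer = []                     # buffered blocks, least recently used first
    misses = 0
    for block in references:
        if block in buffer:
            buffer.remove(block)    # hit: the block is re-appended as most recent
        else:
            misses += 1
            if len(buffer) == capacity:
                buffer.pop(0 if policy == "LRU" else -1)
        buffer.append(block)
    return misses


if __name__ == "__main__":
    # The inner relation s occupies 4 blocks and is scanned 3 times
    # (once per outer tuple), with room for only 3 blocks in the buffer.
    scans = [block for _ in range(3) for block in range(4)]
    print("LRU misses:", count_misses(scans, 3, "LRU"))
    print("MRU misses:", count_misses(scans, 3, "MRU"))
```

When the scanned relation is larger than the buffer, LRU evicts each block shortly before the next scan needs it, so every reference misses; a policy informed by the access pattern can keep part of the relation resident.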
Buffer-Replacement Policies (Cont.)
Ex: select *
from instructor natural join department;
Indexing
[Figure: indexing techniques (ordered indices, hashing, B+-tree, B-tree); an index entry consists of a search key and a pointer]
Dense Index Files
Dense index — Index record appears for every search-key
value in the file.
E.g. index on ID attribute of instructor relation
Dense Index Files (Cont.)
In a dense clustering index, the index record contains the search-key
value and a pointer to the first data record with that search-key value.
The rest of the records with the same search-key value would be stored
sequentially after the first record.
Dense index on dept_name, with instructor file sorted on dept_name
Dense Index Files (Cont.)
In a dense nonclustering index, the index must store a list of
pointers to all records with the same search-key value.
Dense index on dept_name, with instructor file sorted on dept_name
Sparse Index Files
Sparse Index: contains index records for only some search-key values.
Applicable when records are sequentially ordered on search-key
To locate a record with search-key value K we:
Find index record with largest search-key value <= K
Search file sequentially starting at the record to which the index
record points
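The two lookup steps above can be sketched with a binary search over the index followed by a sequential scan of the file. The data layout is an assumption for illustration: the file is a list of (key, record) pairs sorted by key, each index entry is a (search_key, position) pair, and the first index entry is assumed to point at the first record of the file. The function name and sample data are hypothetical.

```python
from bisect import bisect_right


def sparse_lookup(index, file, key):
    """Return all records whose search-key value equals key.

    index: sorted list of (search_key, position_in_file) pairs
    file:  list of (search_key, record) pairs sorted by search_key
    """
    index_keys = [k for k, _ in index]
    # Find the index record with the largest search-key value <= key.
    i = bisect_right(index_keys, key) - 1
    if i < 0:
        return []                   # key precedes every indexed key
    start = index[i][1]
    # Search the file sequentially from the record the index entry points to.
    matches = []
    for k, record in file[start:]:
        if k > key:
            break                   # the file is sorted, so no later record can match
        if k == key:
            matches.append(record)
    return matches


if __name__ == "__main__":
    instructor = [(10101, "Srinivasan"), (12121, "Wu"), (15151, "Mozart"),
                  (22222, "Einstein"), (32343, "El Said"), (33456, "Gold")]
    sparse = [(10101, 0), (32343, 4)]
    print(sparse_lookup(sparse, instructor, 22222))
```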
Dense and Sparse Index
Sparse Index Files (Cont.)
Compared to dense indices:
Less space and less maintenance overhead for insertions and
deletions.
Generally slower than dense index for locating records.
Good tradeoff: sparse index with an index entry for every block in file,
corresponding to least search-key value in the block.
Index Update
Index Update: Insertion
Single-level index insertion:
➢ Perform a lookup using the search-key value appearing in the record to
be inserted.
➢ Dense indices –
❖ if the search-key value does not appear in the index, insert it at
appropriate position.
❖ Otherwise the following actions are taken:
a. If the index record stores pointers to all records with the same search-
key value, the system adds a pointer to the new record to the index
record.
b. Otherwise, if the index record stores a pointer to only the first record with
the search-key value, the system then places the record being inserted
after the other records with the same search-key value.
➢ Sparse indices – if index stores an entry for each block of the file, no
change needs to be made to the index unless a new block is created.
In this case, the first search-key value appearing in the new block is
inserted into the index.
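The dense-index insertion rule can be sketched for the variant in which each index record stores pointers to all records with the same search-key value (case a above). The class name, the parallel-list layout, and the use of integers as record addresses are assumptions made for illustration.

```python
from bisect import bisect_left


class DenseIndex:
    """Toy dense index; each entry lists every record address for its key."""

    def __init__(self) -> None:
        self.keys = []        # search-key values, kept sorted
        self.pointers = []    # pointers[i] holds the record addresses for keys[i]

    def insert(self, key, record_address) -> None:
        i = bisect_left(self.keys, key)              # the lookup step
        if i < len(self.keys) and self.keys[i] == key:
            # The key is already indexed: add a pointer to the new record.
            self.pointers[i].append(record_address)
        else:
            # The key is new: insert an index record at the appropriate position.
            self.keys.insert(i, key)
            self.pointers.insert(i, [record_address])
```

For a sparse index with one entry per block, the corresponding operation would touch the index only when the insertion creates a new block.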
Index Update: Deletion
Sparse indices –
❖ If the index does not contain an index record with the search-
key value of the deleted record, nothing needs to be done to
the index.
❖ Otherwise the system takes the following actions:
a. If the deleted record was the only record with its search-key
value, the system replaces the corresponding index record
with an index record for the next search-key value (in
search-key order). If the next search-key value already has
an index entry, the entry is deleted instead of being
replaced.
b. Otherwise, if the index record for the search-key value
points to the record being deleted, the system updates the
index record to point to the next record with the same
search-key value.
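The sparse-index deletion cases above can be sketched as one function. The representation is hypothetical: the file is a list of (key, record_id) pairs sorted by key, and the index is a dict mapping an indexed search-key value to the id of the record it points to (assumed to be the first record with that key). The case where the deleted record is the last one in the file is not covered by the rules above; the sketch simply drops the index entry.

```python
def delete_record(file, index, record_id):
    """Delete one record and update a sparse index accordingly.

    file:  list of (search_key, record_id) pairs sorted by search_key
    index: dict mapping search_key -> record_id that the index entry points to
    """
    pos = next(i for i, (_, rid) in enumerate(file) if rid == record_id)
    key = file[pos][0]
    del file[pos]

    if key not in index:
        return                              # no index record for this key: nothing to do

    remaining = [rid for k, rid in file if k == key]
    if not remaining:
        # Case a: it was the only record with this key. Replace the entry with one
        # for the next search-key value, unless that value is already indexed.
        del index[key]
        if pos < len(file):
            next_key, next_rid = file[pos]
            index.setdefault(next_key, next_rid)
    elif index[key] == record_id:
        # Case b: the entry pointed at the deleted record. Point it at the
        # next record with the same search-key value.
        index[key] = remaining[0]
```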
Multilevel Index
If an index is small enough to be kept entirely in main memory,
the search time to find an entry is low.
However, if the index is so large that it cannot be kept in main
memory, index blocks must be fetched from disk when
required.
Solution: treat primary index kept on disk as a sequential file
and construct a sparse index on it.
inner index – the primary index file
outer index – a sparse index of primary index
If even outer index is too large to fit in main memory, yet
another level of index can be created, and so on.
Indices at all levels must be updated on insertion or deletion
from the file.
B+-Tree Index Files
A B+-tree whose nodes hold up to n pointers satisfies the following properties:
All paths from the root of the tree to a leaf are
of the same length (balanced tree).
Each node that is not a root or a leaf has between ⌈n/2⌉ and
n children.
A leaf node has between ⌈(n–1)/2⌉ and n–1 values.
Special cases:
If the root is not a leaf, it has at least 2 children.
If the root is a leaf (that is, there are no other nodes in
the tree), it can have between 0 and (n–1) values.
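The occupancy bounds above can be checked numerically with a small helper. It assumes the usual B+-tree convention that the lower bounds are rounded up (ceilings) and that n is the maximum number of pointers in a node; the function name and the dictionary keys are hypothetical.

```python
from math import ceil


def bplus_tree_bounds(n: int) -> dict:
    """Return (min, max) occupancy for each kind of node in a B+-tree of order n."""
    return {
        "nonleaf_children": (ceil(n / 2), n),         # neither root nor leaf
        "leaf_values": (ceil((n - 1) / 2), n - 1),
        "root_leaf_values": (0, n - 1),               # root is the only node
        "root_nonleaf_min_children": 2,
    }


if __name__ == "__main__":
    for order in (4, 6):
        print(order, bplus_tree_bounds(order))
```

For example, with n = 6 a leaf holds between 3 and 5 values and an internal node has between 3 and 6 children.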
B+-Tree Node Structure
Typical node