File Organization
The cost of page I/O (input from disk to main memory and
output from memory to disk) dominates the cost of typical
database operations, and database systems are carefully
optimized to minimize this cost.
FILE ORGANIZATIONS AND INDEXING
The file of records is an important abstraction in a DBMS.
A file can be created, destroyed, and have records inserted into and
deleted from it. It also supports scans; a scan operation allows us to
step through all the records in the file one at a time.
A relation is typically stored as a file of records.
The file layer stores the records in a file in a collection of disk
pages. It keeps track of pages allocated to each file, and as records
are inserted into and deleted from the file, it also tracks available
space within pages allocated to the file.
The simplest file structure is an unordered file, or heap file.
Records in a heap file are stored in random order across the pages
of the file.
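The file operations described above (insert, delete, scan, and tracking free space within pages) can be sketched as a toy in-memory heap file. This is only an illustration: the class name `HeapFile`, the `page_capacity` parameter, and the use of a `(page_no, slot_no)` pair as a record id are assumptions made for this sketch, not part of any particular DBMS.

```python
class HeapFile:
    """Toy unordered file: a record goes wherever there is free space."""

    def __init__(self, page_capacity=4):
        self.page_capacity = page_capacity  # assumed number of record slots per page
        self.pages = []                     # each page is a list of slots (record or None)

    def insert(self, record):
        # Reuse a freed slot or spare room on an existing page, else allocate a page.
        for page_no, page in enumerate(self.pages):
            for slot_no, slot in enumerate(page):
                if slot is None:
                    page[slot_no] = record
                    return (page_no, slot_no)
            if len(page) < self.page_capacity:
                page.append(record)
                return (page_no, len(page) - 1)
        self.pages.append([record])
        return (len(self.pages) - 1, 0)

    def delete(self, rid):
        # Mark the slot as free; the page stays allocated to the file.
        page_no, slot_no = rid
        self.pages[page_no][slot_no] = None

    def scan(self):
        # Step through every live record, one page at a time.
        for page in self.pages:
            for record in page:
                if record is not None:
                    yield record
```

Because placement depends only on where free space happens to be, the scan order carries no meaning, which is why a heap file offers no help for selective lookups.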
An index is a data structure that organizes data records on
disk to optimize certain kinds of retrieval operations.
In a dense index, there is an index record for every search key value in
the data file. This record contains the search key and also a reference
to the first data record with that search key value.
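A minimal sketch of this kind of index, assuming the data file is a Python list already sorted on the search key and that a "reference" is simply a list position. The function names `build_dense_index` and `dense_lookup` are hypothetical.

```python
from bisect import bisect_left

def build_dense_index(sorted_records, key):
    """One (search_key, position_of_first_record) entry per distinct key value."""
    index = []
    for pos, rec in enumerate(sorted_records):
        k = key(rec)
        if not index or index[-1][0] != k:
            index.append((k, pos))
    return index

def dense_lookup(index, sorted_records, key, target):
    """Binary-search the index, then read the run of records with that key."""
    keys = [k for k, _ in index]
    i = bisect_left(keys, target)
    if i == len(keys) or keys[i] != target:
        return []  # no index entry means no such record
    pos = index[i][1]
    matches = []
    while pos < len(sorted_records) and key(sorted_records[pos]) == target:
        matches.append(sorted_records[pos])
        pos += 1
    return matches
```

Note that a missing key is detected from the index alone, without touching the data file.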
Sparse Index
The index record appears only for a few items in the data file.
Each item points to a block as shown.
To locate a record, we find the index record with the largest
search key value less than or equal to the search key value we
are looking for.
We start at the record pointed to by the index record and
follow the pointers in the file (that is, proceed
sequentially) until we find the desired record.
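The lookup procedure above can be sketched as follows, assuming the data file is a list of blocks, each block is a sorted list of unique keys, and the index holds one entry per block (the first key in that block). The names `build_sparse_index` and `sparse_lookup` are hypothetical.

```python
from bisect import bisect_right

def build_sparse_index(blocks):
    """One (first_key_in_block, block_no) entry per block of the sorted file."""
    return [(block[0], block_no) for block_no, block in enumerate(blocks)]

def sparse_lookup(index, blocks, target):
    """Return (block_no, position) of target, or None if it is absent."""
    keys = [k for k, _ in index]
    # Largest index key that is <= target.
    i = bisect_right(keys, target) - 1
    if i < 0:
        return None  # target is smaller than every key in the file
    block_no = index[i][1]
    # Proceed sequentially from the block the index entry points to.
    while block_no < len(blocks):
        for pos, k in enumerate(blocks[block_no]):
            if k == target:
                return (block_no, pos)
            if k > target:
                return None  # passed the place where target would be
        block_no += 1
    return None
```

Unlike the dense case, the index alone cannot tell us whether a key exists; the data block has to be read to find out.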
Dense versus Sparse Index
Dense indexes require more maintenance than sparse indexes at
write-time. Since every row must have an entry, the database must
maintain the index on inserts, updates, and deletes.
Having an entry for every row also means that dense indexes will
require more memory.
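The benefit of a dense index is that, because every search key value
has an entry, a value can be found with just a binary search of the
index, with no sequential scan of the data file afterwards.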
The lowest level of the tree, called the leaf level, contains the
data entries; in our example, these are employee records.
Tree-Based Indexing
This structure allows us to efficiently locate all data entries with
a particular search key value.
All searches begin at the topmost node, called the root, and the
contents of pages in non-leaf levels direct searches to the correct
leaf page.
Non-leaf pages contain node pointers separated by search key
values.
The node pointer to the left of a key value k points to a
subtree that contains only data entries less than k.
The node pointer to the right of a key value k points to a
subtree that contains only data entries greater than or equal
to k.
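The descent from the root to a leaf can be sketched as below. This is a simplified illustration, not a full B+ tree: the `Node` class is hypothetical, a non-leaf node with n keys is assumed to hold n + 1 child pointers, and leaves simply hold their data entries in `keys`.

```python
from bisect import bisect_right

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys          # separator keys (non-leaf) or data entries (leaf)
        self.children = children  # None for a leaf page

def find_leaf(root, target):
    """Follow node pointers from the root to the leaf that may hold target."""
    node = root
    while node.children is not None:
        # bisect_right counts the keys <= target, so a target equal to k
        # follows the pointer to the right of k, and a smaller target
        # follows the pointer to the left of k.
        node = node.children[bisect_right(node.keys, target)]
    return node
```

For example, with separator keys 20 and 33 in the root, a search for 19 goes to the first child, a search for 20 goes to the second, and a search for 40 goes to the third.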
Example
Finding the correct leaf page is faster than binary search of the
pages in a sorted file because each non-leaf node can
accommodate a very large number of node-pointers, and the
height of the tree is rarely more than three or four in practice.
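A rough back-of-the-envelope check of this claim, assuming a fan-out of 100 node pointers per non-leaf page and one million leaf pages. Both numbers are illustrative assumptions, and the helper names are hypothetical.

```python
def levels_needed(num_pages, fanout):
    """Smallest number of levels such that fanout ** levels >= num_pages."""
    levels = 0
    reach = 1
    while reach < num_pages:
        reach *= fanout
        levels += 1
    return levels

def binary_search_steps(num_pages):
    """Halving steps to narrow num_pages down to one page (fan-out of 2)."""
    return levels_needed(num_pages, 2)

leaf_pages = 1_000_000                      # assumed file size in pages
print(levels_needed(leaf_pages, 100))       # non-leaf levels with fan-out 100 -> 3
print(binary_search_steps(leaf_pages))      # binary search steps -> 20
```

Under these assumptions the tree needs three non-leaf levels to reach any leaf, while binary search over the same pages needs about twenty probes, each of which may be a separate page I/O.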
Applying the hash function to the age field identifies the page
that the record belongs to.
If we do not have the search key value for the record, for
example, the file is hashed on age and we want records with a
given value of some other field, we have to scan all pages in the file.
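The difference between a lookup that can use the hash function and one that cannot is sketched below with a toy file hashed on age. The class `HashedFile`, the modulo hash, and the convention of returning the number of pages read alongside the result are all assumptions made for illustration.

```python
class HashedFile:
    """Toy file whose records are placed in pages (buckets) by hashing age."""

    def __init__(self, num_buckets=4):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket_of(self, age):
        return age % len(self.buckets)  # toy hash function on the age field

    def insert(self, record):
        self.buckets[self._bucket_of(record["age"])].append(record)

    def find_by_age(self, age):
        # The hash function identifies the single page to read.
        page = self.buckets[self._bucket_of(age)]
        return [r for r in page if r["age"] == age], 1

    def find_by_sal(self, sal):
        # The hash on age says nothing about sal, so every page is read.
        matches = []
        for page in self.buckets:
            matches.extend(r for r in page if r["sal"] == sal)
        return matches, len(self.buckets)
```

Both methods return a `(records, pages_read)` pair so the cost difference is visible: one page for the hashed field versus every page in the file otherwise.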
Figure 8.2 also shows an index with search key sal that contains
(sal, rid) pairs as data entries.
The rid (short for record id) component of a data entry in this
second index is a pointer to a record with search key value sal.
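A small sketch of an index whose data entries are (sal, rid) pairs, assuming a rid is a `(page_no, slot_no)` pair and the data file is a list of pages of dictionary records. For simplicity the entries are kept in a sorted list and searched with binary search; the index in the figure may be organized differently. The sample records and function names are hypothetical.

```python
from bisect import bisect_left

def build_sal_index(pages):
    """Collect one (sal, rid) data entry per record, sorted by sal."""
    entries = []
    for page_no, page in enumerate(pages):
        for slot_no, rec in enumerate(page):
            entries.append((rec["sal"], (page_no, slot_no)))
    entries.sort()
    return entries

def lookup_sal(entries, pages, sal):
    """Find matching data entries, then follow each rid to its record."""
    # (sal,) sorts before every (sal, rid), so this finds the first match.
    i = bisect_left(entries, (sal,))
    records = []
    while i < len(entries) and entries[i][0] == sal:
        page_no, slot_no = entries[i][1]
        records.append(pages[page_no][slot_no])
        i += 1
    return records
```

The index stores only keys and rids, so the records themselves stay wherever the primary organization of the file placed them.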