File Organization (1)
2
Introduction
Databases are stored physically as files of records, which are
typically stored on magnetic disks
The collection of data that makes up a computerized database must
be stored physically on some computer storage medium
DBMS software can then retrieve, update, and process this data as
needed
Two main categories of storage media
Primary storage
Secondary storage
3
Primary vs Secondary Storage
4
Storage Hierarchy
Figure: the storage hierarchy, from volatile cache and main memory (primary storage) at the top to cheaper, higher-capacity media below; unit price per byte decreases down the hierarchy
5
Storage Hierarchy
At the primary storage level
Cache memory
Cache memory is typically used by the CPU to speed up execution of
program instructions using techniques such as prefetching and pipelining
It stores the program segments that are frequently accessed by the
processor
Main memory
Provides the main work area for the CPU for keeping program
instructions and data
It is less expensive than cache memory and therefore larger in size
The drawback is its volatility and lower speed compared with cache
memory
6
Storage Hierarchy
At the secondary and tertiary storage level
The hierarchy includes magnetic disks, as well as mass storage in the form
of CD-ROM (Compact Disk–Read-Only Memory) and DVD (Digital Video
Disk or Digital Versatile Disk) devices, and finally tapes at the least
expensive end of the hierarchy
The storage capacity is measured in kilobytes (Kbyte or 1000 bytes),
megabytes (MB or 1 million bytes), gigabytes (GB or 1 billion bytes), and
even terabytes (1000 GB)
The word petabyte is now becoming relevant in the context of very large
repositories of data
Magnetic tapes are used for archiving and backup storage of data
7
Storage of Databases
DBMS stores information on (‘hard’) disks
This has major implications for DBMS design!
– READ: transfer data from disk to main memory (RAM)
– WRITE: transfer data from RAM to disk
– Both are high-cost operations, relative to in-memory operations,
so must be planned carefully!
Why not store everything in main memory?
Costs too much
Main memory is volatile. We want data to be saved between runs
Typical storage hierarchy
Main memory (RAM) for currently used data
Disk for the main database (secondary storage)
Tapes for archiving older versions of the data (tertiary storage)
8
Secondary Storage Devices
Magnetic disks are used for storing large amounts of data
The most basic unit of data on the disk is a single bit of information.
To code information, bits are grouped into bytes
Capacity of a disk is the number of bytes it can store, which is
usually very large
9
Disk Storage Devices
A track is divided into smaller blocks or sectors
because it usually contains a large amount of information
Whole blocks are transferred between disk and main memory for
processing
10
Disk Storage Devices
11
Disk Storage Devices
A read-write head moves to the track that contains the block to be
transferred
Disk rotation moves the block under the read-write head for reading or
writing
A physical disk block (hardware) address consists of
a cylinder number (an imaginary collection of tracks of the same radius from all recorded surfaces)
the track number or surface number (within the cylinder)
and the block number (within the track)
Figure
(a) A single-sided disk with read/write hardware
(b) A disk pack with read/write hardware
13
Components of a Disk
The platters spin (say, 90 rps)
Read-write head
Positioned very close to the platter surface (almost touching it)
Reads or writes magnetically encoded information
Only one head reads/writes at any one time
14
Physical Characteristics of Disks
Track
an information storage circle on the surface of a disk.
Over 16,000 tracks per platter
each track can store between 4KB and 50KB of data.
Each track is divided into sectors.
Tracks under heads make a cylinder (imaginary!)
Cylinder
the tracks with the same diameter on all surfaces of a disk pack.
Cylinder i consists of the i-th track of all the platters
Sector
a part of a track with fixed size
separated by fixed-size interblock gaps
Typical sectors per track
200 (on inner tracks) to 400 (on outer tracks)
15
Pages and Blocks
Data files are decomposed into pages (blocks)
a fixed-size piece of contiguous information in the file
sizes range from 512 bytes to several kilobytes
16
Pages and Blocks
Figure: a track divided into sectors (blocks) separated by interblock gaps
Page I/O --- one page I/O is the cost (or time needed) to transfer
one page of data between the memory and the disk.
The cost of a (random) page I/O =
seek time + rotational delay + block transfer time
Seek time
time needed to position the read/write head on the correct track
Rotational delay
time needed for the desired block to rotate under the read/write head
Block transfer time
time needed to transfer the data in the page/block
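As a rough illustration of this formula, the sketch below simply adds up the three components; the 8 ms seek, 4 ms rotational delay, and 0.1 ms transfer time are assumed example values, not figures from the slides.

#include <stdio.h>

int main(void) {
    /* Example parameters (assumed values for illustration only) */
    double seek_ms     = 8.0;   /* average seek time */
    double rotation_ms = 4.0;   /* average rotational delay (about half a revolution) */
    double transfer_ms = 0.1;   /* time to transfer one block */

    double page_io_ms = seek_ms + rotation_ms + transfer_ms;
    printf("One random page I/O takes about %.1f ms\n", page_io_ms);
    return 0;
}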
18
19
Magnetic Tape Storage Devices
Disks are random access secondary storage devices because an
arbitrary disk block may be accessed at random once we specify
its address
Magnetic tapes are sequential access devices; to access the nth
block on tape, first we must scan the preceding n–1 blocks
Data is stored on reels of high-capacity magnetic tape, somewhat
similar to audiotapes or videotapes
A read/write head is used to read or write data on tape.
Data records on tape are also stored in blocks—although the blocks
may be substantially larger than those for disks
Tapes serve a very important function: backing up the database
One reason for backup is to keep copies of disk files in case the
data is lost due to a disk crash
20
Buffering of Blocks
When several blocks need to be transferred from disk to main
memory and all the block addresses are known, several buffers can
be reserved in main memory to speed up the transfer
While one buffer is being read or written, the CPU can process data
in the other buffer because an independent disk I/O processor
(controller) exists that, once started, can proceed to transfer a data
block between memory and disk independent of and in parallel
to CPU processing
21
Buffering of Blocks
22
Placing File Records on Disk
Records
Data is usually stored in the form of records
Each record consists of a collection of related data values or items, where each value corresponds to a particular field of the record
For example, an EMPLOYEE record represents an employee entity, and each field value specifies some attribute of that employee, such as Name, Ssn, or Salary
23
Placing File Records on Disk
Record Types
A collection of field names and their corresponding data types
constitutes a record type or record format definition.
A data type, associated with each field, specifies the types of
values a field can take
For example, an EMPLOYEE record type may be defined, using C programming language notation, as the following structure:
struct employee {
    char name[30];
    char ssn[9];
    int salary;
    int job_code;
    char department[20];
};
24
Placing File Records on Disk
Files, Fixed-Length Records, and Variable-Length Records
File - sequence of records
Fixed-Length Records - If every record in the file has exactly the
same size (in bytes)
Variable-length records - If different records in the file have different
sizes
26
Placing File Records on Disk
Space is wasted when certain records do not have values for all the
physical spaces provided in each record
27
Placing File Records on Disk
Variable-Length Records
For variable-length fields, each record has a value for each field, but
we do not know the exact length of some field values.
To determine the bytes within a particular record that represent each field, we can use special separator characters (such as ?, %, or $), which do not appear in any field value, to terminate variable-length fields
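A minimal sketch of how such separator characters could be used to split one variable-length record back into its fields; the '$' separator, the record contents, and the use of strtok are illustrative assumptions, not the slides' own format.

#include <stdio.h>
#include <string.h>

int main(void) {
    /* One variable-length record with '$' terminating each field (assumed format) */
    char record[] = "Smith, John$123456789$Research$";

    char *field = strtok(record, "$");     /* cut at each separator */
    int i = 1;
    while (field != NULL) {
        printf("field %d: %s\n", i++, field);
        field = strtok(NULL, "$");
    }
    return 0;
}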
28
Placing File Records on Disk
Variable-Length Records
A file of records with optional fields can be formatted in different ways.
If the total number of fields for the record type is large, but the number
of fields that actually appear in a typical record is small, we can
include in each record a sequence of
<field-name, field-value> pairs
rather than just the field values
29
Record Blocking and Spanned versus Unspanned Records
The records of a file must be allocated to disk blocks because a block is the unit of data transfer between disk and memory
When the block size > the record size, each block will contain
numerous records, although some files may have unusually large
records that cannot fit in one block
Example:
Record size R = 100 bytes
Block Size B = 2,000 bytes
Thus the blocking factor bfr = floor(2000/100) = 20
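The same blocking-factor arithmetic in a short C sketch, including the space left unused in each block and the number of blocks needed; the number of records r = 30,000 is an assumed example value added for illustration.

#include <stdio.h>

int main(void) {
    int B = 2000;            /* block size in bytes */
    int R = 100;             /* record size in bytes */
    long r = 30000;          /* number of records (assumed example) */

    int bfr = B / R;                     /* blocking factor: floor(B/R) = 20 */
    int unused = B - bfr * R;            /* bytes wasted per block (0 here) */
    long b = (r + bfr - 1) / bfr;        /* number of blocks: ceil(r/bfr) */

    printf("bfr = %d, unused bytes per block = %d, blocks needed = %ld\n",
           bfr, unused, b);
    return 0;
}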
31
Record Blocking and Spanned versus Unspanned Records
Spanned organization of records
To utilize the space left unused at the end of each block, we can store part of a record on one block and the rest on another.
A pointer at the end of the first block points to the block containing the
remainder of the record in case it is not the next consecutive block on
disk
Spanned - records can span more than one block
Whenever a record is larger than a block - use spanned organization
33
Allocating File Blocks on Disk
Several standard techniques for allocating the blocks of a file on
disk
Contiguous allocation
Linked allocation
Indexed allocation
35
Allocating File Blocks on Disk
Linked Allocation
Disk files can be stored as linked lists, at the expense of the storage space consumed by each link
Linked allocation involves no wasted space, does not require the file size to be known in advance, and allows files to grow dynamically at any time
36
Allocating File Blocks on Disk
Indexed Allocation
One or more index blocks contain pointers to the actual file blocks
Supports direct access to the blocks occupied by the file and therefore
provides fast access to the file blocks
37
Operations on Files
DBMS software programs access records by using the following commands
Open - Prepares the file for reading or writing. Allocates
appropriate buffers (typically at least two) to hold file blocks from
disk, and retrieves the file header. Sets the file pointer to the
beginning of the file
Reset - Sets the file pointer of an open file to the beginning of the
file
Find (or Locate)- Searches for the first record that satisfies a
search condition. Transfers the block containing that record into a
main memory buffer (if it is not already there). The file pointer points
to the record in the buffer and it becomes the current record
38
Operations on Files
Read (or Get) - Copies the current record from the buffer to a program variable in
the user program. This command may also advance the current record pointer to
the next record in the file, which may necessitate reading the next file block from
disk
FindNext - Searches for the next record in the file that satisfies
the search condition. Transfers the block containing that record into a main
memory buffer (if it is not already there)
Delete - Deletes the current record and (eventually) updates the file on disk to
reflect the deletion
Modify - Modifies some field values for the current record and (eventually) updates
the file on disk to reflect the modification
Insert - Inserts a new record in the file by locating the block where the record is to
be inserted, transferring that block into a main memory buffer (if it is not already
there), writing the record into the buffer, and (eventually) writing the buffer to disk
to reflect the insertion
Close - Completes the file access by releasing the buffers and performing any
other needed cleanup operations
39
Operations on Files
The preceding (except for Open and Close) are called record-at-a-
time operations because each operation applies to a single record
40
File Organization vs. Access Method
File organization
Refers to the organization of the data of a file into records,
blocks, and access structures; this includes the way records and
blocks are placed on the storage medium and interlinked
Access method
On the other hand, provides a group of operations that can be
applied to a file
41
Methods for Organizing Records of a File on Disk
Heap file
Sorted file
Hash file
RAID
42
Heap Files
Files of Unordered Records (Heap Files)
Also called a heap or a pile file
New records are inserted at the end of the file
Record insertion is quite efficient
A linear search through the file records is necessary to search for a
record
This requires reading and searching half the file blocks on the average, and
is hence quite expensive
For a file of b blocks, this requires searching (b/2) blocks, on average (see the sketch after this list). If no record or several records satisfy the search condition, the program must read and search all b blocks in the file
Reading the records in order of a particular field requires sorting the file
records
This organization is often used with additional access paths, such as secondary indexes
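A sketch of the linear search described above, simulated with an in-memory array standing in for the file blocks; the record layout, blocking factor, and number of blocks are assumed example values, and the block counter stands in for actual disk transfers.

#include <stdio.h>

struct record { int ssn; };

#define BFR 4                      /* blocking factor (records per block), assumed */
#define NBLOCKS 5                  /* b = number of blocks in the file, assumed */

struct record file[NBLOCKS][BFR];  /* stand-in for the blocks on disk */

int heap_search(int key, int *blocks_read) {
    for (int blk = 0; blk < NBLOCKS; blk++) {
        (*blocks_read)++;                       /* one block transfer from "disk" */
        for (int i = 0; i < BFR; i++)
            if (file[blk][i].ssn == key)
                return blk;                     /* found in this block */
    }
    return -1;                                  /* all b blocks were scanned */
}

int main(void) {
    for (int blk = 0; blk < NBLOCKS; blk++)     /* fill with sample records */
        for (int i = 0; i < BFR; i++)
            file[blk][i].ssn = blk * BFR + i;

    int reads = 0;
    int blk = heap_search(13, &reads);
    printf("record found in block %d after reading %d blocks\n", blk, reads);
    return 0;
}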
43
Heap Files
To delete a record, a program must first find its block, copy the
block into a buffer, delete the record from the buffer, and finally
rewrite the block back to the disk. This leaves unused space in
the disk block.
Deleting a large number of records in this way results in wasted
storage space.
Another technique used for record deletion is to have an extra
byte or bit, called a deletion marker, stored with each record
Spanned or unspanned organization can be used, with either fixed-length or variable-length records
Modifying a variable-length record may require deleting the old record and inserting a modified record, because the modified record may not fit in its old space on disk
44
Heap File Organization
Records are placed in the file in the order in which they
are inserted. Such an organization is called a heap file
Insertion is at the end
takes constant time O(1) (very efficient)
Searching
requires a linear search (expensive)
Deleting
requires a search, then delete
46
Sorted Files
Files of Ordered Records (Sorted Files)
We can physically order the records of a file on disk based on the
values of one of their fields—called the ordering field
This leads to an ordered or sequential file
If the ordering field is also a key field of the file (a field guaranteed to have a unique value in each record), it is called the ordering key for the file
Some advantages of ordered files over unordered files
Reading the records in order of the ordering key values becomes
extremely efficient because no sorting is required
Finding the next record from the current one in order of the ordering key usually requires no additional block accesses, because the next record is in the same block as the current one
Using a search condition based on the value of an ordering key field
results in faster access when the binary search technique is used
47
Sorted Files
Figure
Some blocks of an ordered
(sequential) file of EMPLOYEE
records with Name as the
ordering key field
49
Sorted Files
50
Sorted Files
Inserting and deleting records are expensive operations for an
ordered file because the records must remain physically
ordered
Insert
To insert a record, we must find its correct position in the file,
based on its ordering field value, and then make space in the file
to insert the record in that position.
For a large file this can be very time consuming because, on average, half the records of the file must be moved to make space for the new record, which means that half the file blocks must be read and rewritten
51
Sorted Files
Modification
Modifying a field value of a record depends on two factors: the search
condition to locate the record and the field to be modified
Search Condition
If the search condition involves the ordering key field, binary search can be used to locate the record; otherwise, a linear search is needed
Field to be Modified
Modifying the ordering key field means that the record can change its position in the file. This requires deletion of the old record followed by insertion of the modified record
52
Sequential File Organization
Insertion is expensive
records must be inserted in the correct order
locate the position where the record is to be inserted
if there is free space insert there
if no free space insert the record in an overflow block
In either case, pointer chain must be updated
Insert takes log2(b) block accesses to locate the position, plus the time to reorganize records
b is the number of blocks
Deletion
use pointer chains
Searching
very efficient (binary search)
This requires log2(b) block accesses on average
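A sketch of binary search over the blocks of a sorted file, again simulated with an in-memory array; the key layout, blocking factor, and block count are assumed example values.

#include <stdio.h>

#define BFR 4                      /* records per block (assumed) */
#define NBLOCKS 8                  /* b = number of blocks (assumed) */

int file[NBLOCKS][BFR];            /* sorted file: keys ordered across all blocks */

/* Returns the block holding the key, reading about log2(b) blocks. */
int binary_search_blocks(int key) {
    int lo = 0, hi = NBLOCKS - 1;
    while (lo <= hi) {
        int mid = (lo + hi) / 2;               /* "read" block mid */
        if (key < file[mid][0])
            hi = mid - 1;                      /* key is in an earlier block */
        else if (key > file[mid][BFR - 1])
            lo = mid + 1;                      /* key is in a later block */
        else
            return mid;                        /* key falls inside this block */
    }
    return -1;
}

int main(void) {
    for (int b = 0; b < NBLOCKS; b++)
        for (int i = 0; i < BFR; i++)
            file[b][i] = b * BFR + i;          /* keys 0..31 in sorted order */
    printf("key 21 is in block %d\n", binary_search_blocks(21));
    return 0;
}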
53
Sorted Files
54
Hash Files
Another type of primary file organization is based on hashing -
provides very fast access to records under certain search
conditions
The search condition must be an equality condition on a single field,
called the hash field
In most cases, the hash field is also a key field of the file, in which
case it is called the hash key
Idea - to provide a function h, called a hash function or
randomizing function, which is applied to the hash field value of a
record and yields the address of the disk block in which the record is
stored
A search for the record within the block can be carried out in a main
memory buffer. For most records, we need only a single-block
access to retrieve that record
55
Hash Files
Internal Hashing
For internal files, hashing is typically implemented as a hash
table through the use of an array of records. Suppose that the
array index range is from 0 to M – 1
we have M slots whose addresses correspond to the array
indexes.
Choose a hash function that transforms the hash field value into
an integer between 0 and M − 1.
One common hash function is the h(K) = K mod M function -
which returns the remainder of an integer hash field value K after
division by M; this value is then used for the record address
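A minimal sketch of the h(K) = K mod M hash function; the table size M and the sample key values are assumed examples.

#include <stdio.h>

#define M 7                        /* number of slots (assumed example) */

int h(int K) { return K % M; }     /* remainder after division by M */

int main(void) {
    int keys[] = {2901, 1055, 9634};
    for (int i = 0; i < 3; i++)
        printf("h(%d) = %d\n", keys[i], h(keys[i]));
    return 0;
}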
56
Hash Files
57
Hash Files
Other hashing functions can be used
Folding - applies an arithmetic function (such as addition) or a logical function (such as exclusive OR) to different portions of the hash field value
58
Hash Files
A problem with most hashing functions is that they do not guarantee that distinct values will hash to distinct addresses
Hash collision
Occurs when the hash field value of a record that is being
inserted hashes to an address that already contains a different
record
In this situation, we must insert the new record in some other
position, since its hash address is occupied
The process of finding another position is called collision
resolution
59
Hash Files
Methods for collision resolution
Open addressing
Proceeding from the occupied position specified by the hash address, the program checks the subsequent positions in order until an unused (empty) position is found
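A sketch of open addressing with linear probing under the assumptions of a small in-memory table: M slots, a sentinel value marking empty slots, and made-up keys chosen so that they all collide.

#include <stdio.h>

#define M 7
#define EMPTY -1

int table[M];

int h(int K) { return K % M; }

/* Insert K using linear probing; returns the slot used, or -1 if the table is full. */
int insert_open_addressing(int K) {
    int pos = h(K);
    for (int i = 0; i < M; i++) {
        int slot = (pos + i) % M;              /* probe the next slot in order */
        if (table[slot] == EMPTY) {
            table[slot] = K;
            return slot;
        }
    }
    return -1;                                 /* no free slot: table is full */
}

int main(void) {
    for (int i = 0; i < M; i++) table[i] = EMPTY;
    int keys[] = {10, 17, 24};                 /* all hash to slot 3, forcing collisions */
    for (int i = 0; i < 3; i++)
        printf("key %d placed in slot %d\n", keys[i], insert_open_addressing(keys[i]));
    return 0;
}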
60
Hash Files
Hash Collision - Open addressing
61
Hash Files
Hashing with Chains
When a collision occurs, elements with the same hash key will be chained together.
A chain is simply a linked list of all the elements with the same hash key.
62
Hash Files
63
Figure: Collision resolution by chaining records
Hash Files
External Hashing for Disk Files
Hashing for disk files is called external hashing. The target address space consists of buckets, each of which holds multiple records; a bucket is typically one disk block or a cluster of contiguous blocks
64
Hash Files
66
Hash Files
67
Hashing Techniques
The hashing scheme is called static hashing if a fixed
number of buckets is allocated
68
Hashing for Dynamic File Organization
Dynamic Files
Files where record insertions and deletions take place frequently
69
Dynamic and Extendible Hashed Files
Dynamic and Extendible Hashing Techniques
Hashing techniques are adapted to allow the dynamic growth and
shrinking of the number of file records
These techniques include the following: dynamic hashing, extendible
hashing, and linear hashing
Both dynamic and extendible hashing use the binary
representation of the hash value h(K) in order to access a
directory
In dynamic hashing, the directory is a binary tree
In extendible hashing, the directory is an array of size 2^d, where d is called the global depth
The value of d can be increased or decreased by one at a time, thus
doubling or halving the number of entries in the directory array
Doubling is needed if a bucket whose local depth is equal to the global depth d overflows
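A small sketch of how the leading d bits of h(K) can be used to index a directory of size 2^d; the hash function, bit width, and keys here are illustrative assumptions rather than a full extendible-hashing implementation.

#include <stdio.h>

#define HASH_BITS 8                /* pretend h(K) produces an 8-bit value */

unsigned h(unsigned K) { return K % 256; }   /* toy hash function (assumed) */

/* Directory index = the d leading bits of the hash value (global depth d). */
unsigned dir_index(unsigned K, int d) {
    return h(K) >> (HASH_BITS - d);
}

int main(void) {
    int d = 3;                                /* global depth: directory has 2^3 = 8 entries */
    unsigned keys[] = {25, 130, 200};
    for (int i = 0; i < 3; i++)
        printf("h(%u) = %u -> directory entry %u\n",
               keys[i], h(keys[i]), dir_index(keys[i], d));
    return 0;
}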
70
Dynamic and Extendible Hashed Files
The directories can be stored on disk, and they expand or shrink
dynamically
Directory entries point to the disk blocks that contain the
stored records
An insertion in a disk block that is full causes the block to split into
two blocks and the records are redistributed among the two blocks
The directory is updated appropriately
Dynamic and extendible hashing do not require an overflow area.
Linear hashing does require an overflow area but does not use a
directory
Blocks are split in linear order as the file expands
71
Insertion in Extendible Hashing Scheme
2-bit sequence for the record to be inserted
72
Insertion in Extendible Hashing Scheme
73
Deletion in Extendible Hashing Scheme
74
Extendible Hashing
75
Dynamic Hashing
A precursor to extendible hashing was dynamic hashing
The storage of records in buckets for dynamic hashing is somewhat
similar to extendible hashing.
The major difference is in the organization of the directory
Dynamic hashing maintains a tree-structured directory with two
types of nodes:
Internal nodes that have two pointers: the left pointer corresponding to a hash bit value of 0 and the right pointer corresponding to a hash bit value of 1
Leaf nodes that hold a pointer to a bucket of records
76
Dynamic Hashing
77
Linear Hashing
Idea - is to allow a hash file to expand and shrink its number of
buckets dynamically without needing a directory
78
Linear Hashing
A key property of the two hash functions h_i and h_{i+1} is that any records that hashed to bucket 0 based on h_i will hash to either bucket 0 or bucket M based on h_{i+1}; this is necessary for linear hashing to work
As further collisions lead to overflow records, additional buckets are split in the linear order 1, 2, 3, .... If enough overflows occur, all the original file buckets 0, 1, ..., M − 1 will have been split, so the file now has 2M instead of M buckets, and all buckets use the hash function h_{i+1}
Hence, the records in overflow are eventually redistributed into regular buckets, using the function h_{i+1} via a delayed split of their buckets
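A sketch of how linear hashing chooses a bucket using h_i(K) = K mod M, h_{i+1}(K) = K mod 2M, and a split pointer n that records how many of the original buckets have already been split; M, n, and the keys are assumed example values.

#include <stdio.h>

#define M 4                        /* initial number of buckets (assumed) */

int h_i(int K)       { return K % M; }        /* h_i(K)     = K mod M  */
int h_i_plus1(int K) { return K % (2 * M); }  /* h_{i+1}(K) = K mod 2M */

/* n = split pointer: buckets 0..n-1 have already been split. */
int bucket_for(int K, int n) {
    int b = h_i(K);
    if (b < n)                     /* this bucket was split: rehash with h_{i+1} */
        b = h_i_plus1(K);          /* yields either b or b + M */
    return b;
}

int main(void) {
    int n = 2;                     /* buckets 0 and 1 already split (assumed) */
    int keys[] = {8, 9, 6, 13};
    for (int i = 0; i < 4; i++)
        printf("key %d -> bucket %d\n", keys[i], bucket_for(keys[i], n));
    return 0;
}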
79
Insertion
80
Linear Hashing
Advantages
Directory is not needed
Simple to implement
http://queper.in/drupal/blogs/dbsys/linear_hashing
81
Parallelizing Disk Access Using RAID Technology
Secondary storage technology must take steps to keep up in
performance and reliability with processor technology
The main goal of RAID is to even out the widely different rates
of performance improvement of disks against those in
memory and microprocessors
82
RAID Technology
A natural solution is a large array of small independent
(inexpensive) disks acting as a single higher-performance
logical disk
A concept called data striping is used, which utilizes
parallelism to improve disk performance
Data striping distributes data transparently over multiple disks
to make them appear as a single large, fast disk
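A short sketch of block-level striping: under the usual round-robin layout, logical block i goes to disk (i mod N) at stripe (i / N). The number of disks and blocks are assumed example values.

#include <stdio.h>

int main(void) {
    int num_disks = 4;                         /* disks in the array (assumed) */
    for (int block = 0; block < 8; block++) {
        int disk   = block % num_disks;        /* round-robin across the disks */
        int stripe = block / num_disks;        /* position of the block on that disk */
        printf("logical block %d -> disk %d, stripe %d\n", block, disk, stripe);
    }
    return 0;
}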
83
RAID Technology
Provides
Increased performance
Fault Tolerance
Redundancy
RAID Levels
Level 0
Level 1
Level 2
Level 3
Level 4
Level 5
Level 6
Level 10 (1+0)
84
RAID Technology
RAID Level 0
Minimum number of drives required - 2
A RAID Level 0 system uses data striping - dividing data
evenly across two or more storage devices
No redundant information is maintained
Purpose - speed up performance as organizing data in such a
way allows faster reading and writing of files
Not fault-tolerant; should not be used for critical data
Simple and easy to implement
85
RAID Technology
Data striping means breaking up contiguous data that would
normally go on a single disk
The data is distributed to many disks, either by byte (a) or by
block (b)
86
RAID Technology
RAID Level 1 – minimum no. of drives required - 2
Disk Mirroring - is fault-tolerant as it duplicates data by
simultaneously writing on two storage devices
Therefore, each disk has an exact copy on another disk
RAID 1 - ensures protection against data loss. If a problem arises
with one disk, the copy provides the data needed
Writing takes more time because every write has to be performed twice, once on each disk
Disadvantages
Uses only half of the storage capacity
More expensive
87
RAID Technology
RAID Level 2
Bit-level striping means that the file is broken into “bit-sized
pieces”.
It uses a Hamming code for error correction
Theoretical performance is very high, but it is too expensive to implement
88
RAID Technology
RAID Level 3
Requires a minimum of 3 drives to implement
Byte-level striping means that the file is broken into "byte-sized pieces".
Written in parallel on two or more drives
An additional drive stores parity information
89
RAID Technology
RAID Level 4
Minimum number of drives required: 3 (2 disks for data and 1 for parity)
Level 4 provides block-level striping (like Level 0) with a parity
disk
If a data disk fails, the parity data is used to create a replacement
disk
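A sketch of how block parity (as in RAID levels 4 and 5) lets a failed disk be rebuilt: the parity byte is the XOR of the corresponding data bytes, so XOR-ing the parity with the surviving disks recovers the missing byte. The byte values are made up for illustration.

#include <stdio.h>

int main(void) {
    unsigned char d0 = 0x3C, d1 = 0xA5, d2 = 0x0F;   /* data bytes on three data disks */
    unsigned char parity = d0 ^ d1 ^ d2;             /* parity byte on the parity disk */

    /* Suppose disk 1 fails: rebuild d1 from the parity and the surviving disks. */
    unsigned char rebuilt = parity ^ d0 ^ d2;
    printf("original d1 = 0x%02X, rebuilt d1 = 0x%02X\n", d1, rebuilt);
    return 0;
}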
90
RAID Technology
RAID Level 5
Most common secure RAID level
Instead of a dedicated parity disk, parity information is spread
across all the drives
91
RAID Technology
RAID Level 6
The parity data are written to two drives
The chances that two drives break down at exactly the same
moment are of course very small
Advantages
Read data transactions are very fast
RAID 6 is more secure than RAID 5
92
RAID Technology
RAID level 10 – combining RAID 1 & RAID 0
Combine the advantages of RAID 0 and RAID 1 in one single system
Provides security by mirroring all data on secondary drives while using
striping across each set of drives to speed up data transfers
Advantage
If something goes wrong with one of the disks, the rebuild time is very fast since
all that is needed is copying all the data from the surviving mirror to a new drive
Disadvantage
Half of the storage capacity goes to mirroring, which is an expensive way to have redundancy.
93