File Organizations
(Chapters 13 and 14 of the textbook)
V Kumar
School of Computing and Engineering
University of Missouri-Kansas City
File Organizations
A file is a collection or set (ordered or unordered) of data elements stored on a storage medium.
A system software module is responsible for managing (reading, processing, deleting, etc.) a file.
Let us see how much work goes into a simple file operation. The system performs the following
steps to write a character to a sequential file.
Example: write (f1, c); where f1 is a text file name and c is a char variable containing the value 'P'.
Record: A set of logically related fields. The cardinality of this set may be fixed or variable,
i.e., a record size may be fixed or variable. A file, therefore, is a collection of
logically related records.
[Figure: a field (e.g., SSN), a record (SSN, Name, Age, Phone #), and a file of such records.]
A fixed record size may be larger than, equal to, or smaller than a disk block. For storage
purposes a unit called the blocking factor (Bfr) is defined, which is the number of records in a block
and is computed as follows:
Blocking factor Bfr (number of records in a disk block) = ⌊B/R⌋, where B is the disk block size in
bytes and R is the file record size in bytes. Thus, unused space in a block = B − (Bfr × R) bytes.
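The blocking-factor arithmetic above can be sketched in a few lines of Python (a minimal sketch; the function names are mine):

```python
# Blocking factor: how many whole fixed-size records fit in one disk block.
def blocking_factor(block_size: int, record_size: int) -> int:
    return block_size // record_size   # floor of B/R

# Unused (wasted) space per block: B - (Bfr * R) bytes.
def unused_space(block_size: int, record_size: int) -> int:
    return block_size - blocking_factor(block_size, record_size) * record_size

print(blocking_factor(2400, 400))  # 6 records per 2400-byte block
print(unused_space(2400, 350))     # 300 bytes wasted with 350-byte records
```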
Disk parameters: Seek time, rotational latency time and block transfer time.
Seek time: The time it takes to move the read/write arm to the correct cylinder. It is the largest
cost component. The average seek time is the time it takes to traverse one third of the cylinders.
Rotational latency time: The time the disk unit takes to position the read/write head at the
beginning of the sector where the file records are stored.
Block transfer time: Time for the read/write head to pass over a disk block. During the block
transfer time, the bytes of data can be transferred between the disk and main memory.
Average rotational latency (r) = 1/2 of one disk revolution. Let rpm = number of rotations per minute. Then:
Time for 1 revolution = 1/rpm min = (60 × 1000)/rpm ms. Time for half a revolution = (0.5 × 60 × 1000)/rpm ms.
A common speed for disk rotation is 3600 rpm, so the average latency time = (0.5 × 60 × 1000)/3600 ms = 8.3 ms.
Let data transfer speed = t bytes/ms and block size = B, then the Block transfer time (btt) = B/t
ms.
In formatted data on the disk there is a gap between two consecutive blocks. Some control
information (information about the following block) is stored in this gap. When reading a set of
blocks these gaps are also read, so we need to account for the time to read them in the
computation of block transfer time.
Suppose t' = data transfer speed for formatted data (block and the gap). So the effective
block transfer time Ebt = B/t'. We use IBM 3380 disk in our example to compute various
parameter values. The total capacity of the disk is over 2 billion bytes. Table 1 lists the values
of some useful parameters of this disk.
Table 1
Parameter   Definition                               Value
B           Block size                               2400 bytes
t           Data transfer speed                      3000 bytes/ms
btt         Block transfer time (B/t)                0.8 ms
t'          Formatted-data (interblock gap) speed    2857 bytes/ms
ebt         Effective block transfer time (B/t')     0.84 ms
N           No. of cylinders per spindle             885
r           Average rotational latency               8.3 ms
s           Average seek time                        16 ms
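The timing values in Table 1 can be rederived from the formulas above (a sketch; the variable names are mine):

```python
# Rederive Table 1's timing values (all times in ms).
rpm = 3600        # rotational speed
B = 2400          # block size in bytes
t = 3000          # raw data transfer speed, bytes/ms
t_prime = 2857    # transfer speed over formatted data (block + gap), bytes/ms

r = 0.5 * 60 * 1000 / rpm   # average rotational latency: half a revolution
btt = B / t                 # block transfer time
ebt = B / t_prime           # effective block transfer time

print(round(r, 1), round(btt, 2), round(ebt, 2))  # 8.3 0.8 0.84
```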
Consider reading 10 consecutive blocks. The total time is the sum of the average seek time, the
average rotational latency time, and 10 effective block transfer times. The effective block
transfer time must be used, since the head must also move over the interblock gaps. So reading
time = s + r + 10 × ebt = 16 + 8.3 + 8.4 = 32.7 ms. Note that the seek time and the rotational
latency time are significant with respect to the transfer time.
For reading 10 random blocks, an individual seek and rotational latency are required for each
block. So reading time = 10 × (s + r + ebt) = 10 × (16 + 8.3 + 0.84) = 10 × 25.14 = 251.4 ms. The
random time for reading 10 blocks is higher than the sequential time by a factor of about 8.
Likewise, the time for reading 100 sequential blocks = s + r + 100 × ebt = 108.3 ms ≈ 0.108 sec,
while the time for reading 100 random blocks = 100 × 25.14 ms ≈ 2.514 sec.
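The sequential-versus-random comparison generalizes to any block count; a small sketch using the IBM 3380 values from Table 1 (the helper names are mine):

```python
s, r, ebt = 16.0, 8.3, 0.84   # avg seek, avg latency, effective transfer (ms)

def sequential_read_ms(n_blocks: int) -> float:
    # One seek and one latency, then n contiguous effective block transfers.
    return s + r + n_blocks * ebt

def random_read_ms(n_blocks: int) -> float:
    # Every block pays its own seek, latency, and transfer.
    return n_blocks * (s + r + ebt)

print(round(sequential_read_ms(10), 1))   # 32.7 ms
print(round(random_read_ms(10), 1))       # 251.4 ms, about 8x slower
print(round(sequential_read_ms(100), 1))  # 108.3 ms
print(round(random_read_ms(100), 1))      # 2514.0 ms
```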
File Organizations
One of the main objectives of file organization is to speed up the file retrieval time, that is, to
reduce the I/O time. This can be done by improving disk access speed but that has a limit
because of the hardware constraints. The other way is to store file records in such a way that
records can be accessed concurrently or in parallel. A hardware solution to this is using RAID
structure. A RAID is an array of several disks which can be accessed in parallel. A software
solution is through some file organization. We discuss a number of file organizations. Note that
no single file organization is the most efficient for all types of data access. For this reason a
database management system uses a number of different file organizations for storing the same
set of data.
There are basically three categories of file organizations: (a) sequential organization, (b)
organization using an index, and (c) random organization. We begin with the sequential category;
files in this category are called sequential files.
We proceed as follows: (a) understand the file organization, (b) understand how a set of
operations is performed, and (c) analyze the time required to perform each operation. This
will help us identify which file is suitable for what kind of data processing.
Sequential file
In this organization records are written consecutively when the file is created. Records in a
sequential file can be stored in two ways. Each way identifies a file organization.
• Pile file: Records are placed one after another as they arrive (no sorting of any kind).
• Sorted file: Records are placed in ascending or descending values of the primary key.
Pile file: The total time to fetch (read) a record from a pile file requires seek time (s), rotational
latency time (r), Block transfer time (btt), and number of blocks in the file (B). We also need a
key field of the record (K). In pile file we do not know the address of a record, so we search
sequentially. The possibilities are (a) the desired record may be in the first block or (b) it may be
in some middle block or (c) it may be in the last block.
So the average number of blocks read = (1/B) × Σ(i = 1 to B) i = (1/B) × (B(B + 1)/2) ≈ B/2.
Thus the time to find (read) a record in a pile file is approximately TF = (B/2) × ebt. The
fetching (searching) process is shown in the flow diagram (Figure 1) and the example
illustrates the time involved.
File Reorganization: In file reorganization all records that are marked as deleted are
removed, and all inserted records are moved to their correct (sorted) place. The file
reorganization steps are:
• read the entire file (all blocks) in RAM.
• remove all the deleted records.
• write all the modified blocks at a different place on the disk.
Time to read the entire old file = B (number of blocks in the file) × ebt.
Time to write out the new file of n records = (n/Bfr) × ebt.
Total reorganization time TR = (B + n/Bfr) × ebt.
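A quick sketch of the reorganization-time formula (the names are mine; ebt defaults to the IBM 3380 value):

```python
import math

def reorganization_time_ms(old_blocks: int, live_records: int,
                           bfr: int, ebt: float = 0.84) -> float:
    """TR = (B + n/Bfr) * ebt: read every old block, then write back
    only the blocks needed for the n surviving records."""
    new_blocks = math.ceil(live_records / bfr)
    return (old_blocks + new_blocks) * ebt

# A 16,667-block file with 90,000 surviving records, 6 records per block:
print(round(reorganization_time_ms(16667, 90000, 6)))  # 26600 ms, about 27 s
```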
Example
Environment: Hospital
Record size: 400 bytes.
File size: 100,000 records (40 megabytes).
Storage disk: IBM 3380.
Block size: 2400 bytes.
Number of blocks in the file (B): 40 MB/2400 = 16,667.
The average time to fetch a record is TF = (B/2) × ebt = 8333 × 0.84 ms ≈ 7 secs.
This time is tolerable for infrequent reads. However, if the frequency of reads is large, then this
time is not acceptable. Suppose we want to look up 10^4 names of patients. The total search
time is 10^4 × 7000 ms ≈ 19.4 hrs, and the time to read through the entire file with independent
fetches is TX = 100,000 × 7000 ms > 8 days.
Missouri has about 5 million personal income tax records of 400 bytes each. This will fit on
one IBM 3380 drive. The time to read this file with independent searches is:
TX = (5,000,000 × 7000) / (1000 × 60 × 60 × 24 × 365) ≈ 1.1 years.
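The hospital-file numbers above can be reproduced directly (a sketch; the variable names are mine):

```python
import math

record_size = 400          # bytes
n_records = 100_000
block_size = 2400
ebt = 0.84                 # ms, effective block transfer time

bfr = block_size // record_size        # 6 records per block
n_blocks = math.ceil(n_records / bfr)  # 16,667 blocks (~40 MB / 2400)

tf_ms = (n_blocks / 2) * ebt           # average fetch reads half the blocks
print(round(tf_ms / 1000))             # ~7 seconds per record fetch

hours_for_10k = 10_000 * tf_ms / (1000 * 3600)
print(round(hours_for_10k, 1))         # ~19.4 hours for 10^4 independent fetches

days_for_all = n_records * tf_ms / (1000 * 3600 * 24)
print(round(days_for_all, 1))          # ~8.1 days to fetch every record independently
```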
Retrieval: Records are retrieved in a consecutive order. The order of record storage determines
order of retrieval. During retrieval several required operations (partial result output etc.) can be
performed simultaneously.
Insert, delete and modify (Update): Sequential files are usually not updated in place. Each
operation regenerates a new version of the file. The new version contains the up-to-date
information and the old version is usually saved for recovery. The new version becomes the old
version and the next update to the file uses this old version to generate the next new version. The
intensity or frequency of use of a sequential file is characterized by a parameter called the “hit
ratio”, which is defined as follows:
Hit ratio = (number of records accessed in responding to a query) / (total number of records in the file).
Desirable: a high hit ratio value. This means a large fraction of the records is accessed in
responding to a query. Interactive transactions have a very low hit ratio.
Advantages of sequential file
• Good for batch transactions.
• Simple to implement.
Indexed Sequential File (ISAM)
Example file (four key/record pairs per track):
Track 1: 10 rec10   14 rec14   18 rec18   20 rec20
Track 2: 26 rec26   34 rec34   41 rec41   60 rec60
Track 3: 72 rec72   73 rec73   74 rec74   75 rec75
Track 4: 77 rec77   78 rec78   79 rec79   82 rec82
Track 5: 89 rec89   91 rec91   93 rec93   98 rec98
Track Index
Track No. Highest key on the track
1 20
2 60
3 75
4 82
5 98
Operation on the file: Read: record 79.
Sequential access: Search sequentially from track 1, record 10. Very slow.
Semi-random access: (a) Search the track index; (b) find the first track whose highest key is
greater than or equal to 79. This takes us to track 4. Search track 4 sequentially to find record 79.
Problem: For a large file track index would be large. A large index is not convenient to manage.
Solution: Create Cylinder index to manage track index.
Cylinder index
Cylinder Highest key
1 98
2 184
3 278
Track index for cylinder 1    Track index for cylinder 2    Track index for cylinder 3
Track  Highest key            Track  Highest key            Track  Highest key
1      20                     1      107                    1      201
2      60                     2      122                    2      210
3      75                     3      148                    3      223
4      82                     4      163                    4      259
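The two-level lookup (cylinder index in memory, then a track index) can be sketched with the cylinder-1 data above (the function and variable names are mine; only cylinder 1's full five-track index is reproduced here):

```python
import bisect

cylinder_index = [98, 184, 278]           # highest key on cylinders 1..3
track_index_cyl1 = [20, 60, 75, 82, 98]   # highest key on tracks 1..5 of cylinder 1

def locate(key: int):
    """Return the (cylinder, track) whose key range must contain `key`."""
    # First cylinder whose highest key is >= key:
    cyl = bisect.bisect_left(cylinder_index, key) + 1
    if cyl != 1:
        raise NotImplementedError("only cylinder 1's track index is shown here")
    track = bisect.bisect_left(track_index_cyl1, key) + 1
    return cyl, track

print(locate(79))  # (1, 4): search track 4 of cylinder 1 sequentially
```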
File Processing
Deletion: Simple. Find the desired record. Put a special marker into the record to indicate that it
has been deleted.
Insertion: Cumbersome.
Example: Insert record 55.
Destination: Track 2. Track 2 is full. Move record 60 to the overflow area. Modify the track
index.
Prime area                   Track index (track, highest key in prime area,
                             highest key in overflow, overflow pointer)
Track 1: 10 14 18 20         1  20  20  Null
Track 2: 26 34 41 55         2  55  60  Address of rec 60
Track 3: 72 73 74 75         3  75  75  Null
Track 4: 77 78 79 82         4  82  82  Null
Track 5: 89 91 93 98
Overflow area: 60
Now add record 42: record 55 is moved to the overflow area and the track index is modified
again, giving the prime area and track index below. Other records can be inserted similarly.
When the cylinder's overflow area is full, records are inserted in an independent overflow
area elsewhere on the disk; however, this seldom happens.
Prime area                   Track index
Track 1: 10 14 18 20         1  20  20  Null
Track 2: 26 34 41 42         2  42  60  Address of rec 55
Track 3: 72 73 74 75         3  75  75  Null
Track 4: 77 78 79 82         4  82  82  Null
Track 5: 89 91 93 98
Overflow area: 55 → 60
Implementation: The cylinder index is usually kept in the memory. The cylinder index gives the
address of the correct track index, which is located on the disk. The track index is fetched and the
address of the record is found.
As new records are added, an ISAM file degrades in performance: it becomes slow and often
has to be reorganized at high cost. This is why ISAM is not the indexing method of choice for
new systems. Suppose that all overflow records are on the same cylinder. Then to look up a
record, we consult the cylinder and track indexes and search the prime area. If the record is
not there, we must follow the chain of pointers through the records in the overflow area of the
cylinder. If this overflow area does not contain the record, then a second overflow area on
another cylinder (not necessarily an adjacent one) must be searched. So a search for a record
may require several seeks. Similarly, reading the file in order after many insertions becomes
quite complicated: each linked list must be followed, record by record, to read sequentially,
which can require a large number of seeks. ISAM is not used any more; it has been replaced by
the B+-tree structure.
In an index set (block) the index records are maintained physically in ascending sequence by
primary key value. The index records in the index set are nondense, meaning that there is only
one index record for each lower-level index block.
Index Sequence Set: Nondense. Each record in this set points to one Control Interval.
Control Interval: Set of data records, which are physically stored in ascending key
values.
Control Area: An ordered set of control intervals and Free control intervals for a file.
Distributed free space: Set of free control intervals in a control area.
The sequence set contains index records that point to every control interval allocated in the
file. The following diagram illustrates Index sequence set, Control intervals, Distributed free
space and Control area. The size of control area and the number of control intervals may be
predefined. At the end of file initialization the unfilled control intervals are set aside as
distributed free space.
[Figure: inserting C into a control interval with room. Before: A D F. After: A C D F.]
Not enough room in the target control interval: The available room in the control area is
insufficient to accommodate the new record. In order to insert C we must get a new control
interval. If there is a free control interval in the control area then the records with higher key
values from the control interval are copied into the new control interval. If there is no free
control interval in a control area then a new control area is allocated and initialized. The
initialization process copies all the control intervals with records having value smaller than the
new record. The new record is then copied into the proper place and the rest of the records are
then copied. The old control area is discarded, but this seldom happens.
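A simplified, list-based sketch of a control-interval split (not VSAM's actual on-disk format; the function names are mine, and the split point is chosen to match the A B D F example below):

```python
def insert_with_split(intervals, free_intervals, record, capacity):
    """Insert `record` into the sorted control interval that should hold it;
    split into a free control interval when the target is full."""
    # Target: first interval whose highest key >= record, else the last one.
    target = next((i for i, ci in enumerate(intervals)
                   if ci and record <= ci[-1]), len(intervals) - 1)
    ci = intervals[target]
    ci.append(record)
    ci.sort()
    if len(ci) <= capacity:
        return
    if not free_intervals:
        raise RuntimeError("control area full: a new control area is needed")
    # Move the records with higher key values into a free control interval.
    new_ci = free_intervals.pop()
    split_at = (len(ci) + 1) // 2
    new_ci.extend(ci[split_at:])
    del ci[split_at:]
    intervals.insert(target + 1, new_ci)

control_area = [list("ABDF")]   # one full control interval (capacity 4)
free_space = [[]]               # one distributed-free-space interval
insert_with_split(control_area, free_space, "C", capacity=4)
print(control_area)             # [['A', 'B', 'C'], ['D', 'F']]
```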
[Figure: control-interval split. Before: the index sequence set points to one control interval
A B D F. After inserting C: the modified index sequence set points to interval A B C and
interval D F.]
The control area is completely full: In this situation an entire new control area is created,
added to the file, and then populated.
File Size and File creation: User may specify the number and size of control intervals and
control areas. If not specified, VSAM will use some default value. The size of control interval
dictates the size of the unit of I/O data transfer. It is important for performance reasons to make
the control interval a multiple of the actual physical blocks on the disk itself.
File name: Given by the user. This name is stored in the master catalogue.
Freespace: The user has the capability of allocating distributed free space throughout the file.
This space will not be used when the file is populated; however, it will be available for control
interval splitting.
Record size: Greater flexibility. The user must specify not only the maximum record length but
also the average record size. VSAM can use the average size to perform calculations as to how
large, for example, a control interval must be.
File population: One of the main ways is to sort all data records by their key field and present
this to the system. This is called mass insert and can optimize the process of how to move
records within control intervals.
The following diagram illustrates this process:
Empty file:         (no records)
Original file:      A D K Z
Add record L:       A D K L Z
After mass insert:  A D K L M N O P
Direct File
Indexing provides the most powerful and flexible tools for organizing files. However, (a)
indexes take up space and time to keep them organized and (b) the amount of I/O increases with
[Figure: collision repair by progressive overflow. H(K) gives home address 5 (full, holding N);
successive tries at 6 (R), 7 (J), and 8 (M) are also full, so K is placed on the fourth try.
H(T) gives a home address near the end of the file, and the probe wraps around past the end of
the file.]
Problems
• What happens if there is a search for a record, but the record was never placed in the file?
• Access record Green: (1) load the bucket into memory; (2) search the bucket serially.
Solution: Mark deleted records with a tombstone. A search for a record will not stop at a
tombstone. This solution makes insertion difficult.
The program gets the home address (5) and probes for an empty slot. During the search it
encounters a tombstone and the record Smith is inserted there; but Smith may already exist
further along the probe chain. Solution: the program must first search the entire file and only
then go back to the first tombstone and insert the record.
Another problem with tombstones is that one should not be inserted when the last record of a
probe chain, or any record that has no successor, is deleted.
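The tombstone rules above can be sketched as a small progressive-overflow file (the class and method names are mine; keys are hashed with Python's built-in hash, and probes wrap past the end of the file):

```python
TOMBSTONE = object()   # marker left in a slot by a deletion

class HashFile:
    """Progressive-overflow (linear-probe) file with tombstones."""
    def __init__(self, n_slots: int):
        self.slots = [None] * n_slots

    def _home(self, key):
        return hash(key) % len(self.slots)

    def insert(self, key):
        # Probe the whole chain first: a tombstone may be reused, but only
        # after confirming the key is not already stored further along.
        first_grave = None
        i = self._home(key)
        for _ in range(len(self.slots)):
            s = self.slots[i]
            if s is None:                     # a never-used slot ends the chain
                self.slots[i if first_grave is None else first_grave] = key
                return
            if s is TOMBSTONE:
                if first_grave is None:
                    first_grave = i
            elif s == key:                    # already present
                return
            i = (i + 1) % len(self.slots)     # wrap around past the end
        if first_grave is None:
            raise RuntimeError("file full")
        self.slots[first_grave] = key

    def search(self, key):
        i = self._home(key)
        for _ in range(len(self.slots)):
            s = self.slots[i]
            if s is None:
                return None                   # search stops at an empty slot...
            if s is not TOMBSTONE and s == key:
                return i
            i = (i + 1) % len(self.slots)     # ...but not at a tombstone
        return None

    def delete(self, key):
        i = self.search(key)
        if i is not None:
            self.slots[i] = TOMBSTONE
```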
The following diagram shows the state of the file after the distribution of the four old records
and the insertion of a new record with key 1208.
Buckets after rehashing (address = key mod 2N = key mod 200):
Bucket 8:   208, 408, 1208
Bucket 108: 508, 17308
Suppose the record 1608 is now inserted into bucket 8 and then record 5808 arrives. We need to
split bucket 8 a second time, and on this occasion we use: address = key mod 4N (key mod 400).
Bucket 208 is allocated to the file. The file after the second split:
Bucket 8:   408, 1208, 1608
Bucket 108: 508, 17308
Bucket 208: 208, 5808
In general, if we are splitting a bucket that has been involved as the destination of s previous
splits, we use the function: address = key mod (2^(s+1) × N).
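With N = 100 initial buckets (matching the mod 200 / mod 400 example above), the split-aware address function can be sketched as follows (the function name is mine):

```python
def split_address(key: int, s: int, N: int = 100) -> int:
    """Address of `key` in a bucket that has been the destination
    of s previous splits: key mod (2^(s+1) * N)."""
    return key % (2 ** (s + 1) * N)

print(split_address(17308, 0))  # 108: first split of bucket 8, key mod 200
print(split_address(5808, 1))   # 208: second split, key mod 400
print(split_address(1208, 1))   # 8:   1208 stays in bucket 8 after the second split
```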
Disadvantages
• Waste of space (unused buckets after each split).
• If two buckets n and n+j are in use then all buckets between n and n+j must be available
to the algorithm for inserting records.
• The history of insertion must be saved to access a record. This is because many related
hashing functions are used in the search operation.
Dynamic Hashing
In this scheme there is no relationship between the buckets: the position of one bucket is not
related to the position of any other bucket, which was the case in virtual hashing. Initially the
file is organized into N buckets. These buckets reside on the disk, but each of them is pointed to
by a cell in memory, so the bucket address is not as important as in virtual hashing. Suppose N = 3:
[Figure: three level-0 index cells in memory, each pointing to one of the buckets Buc. 1,
Buc. 2, Buc. 3 on secondary storage.]
A hash function transforms a key into an address of a level 0 index element (1 through 3 in this
case). The appropriate bucket is accessed by the pointer.
Insertion: The hashing function gives the address of an index element and the record is inserted
in the bucket pointed to by that node. If the target bucket (of capacity C) is full, a new bucket
is allocated to the file and the C + 1 records are redistributed between the two buckets. These
two buckets are then pointed to by the children of the original index node, as shown in the
following diagram:
[Figure: hash file after a bucket split. Level-0 index nodes 1, 2, 3 in RAM; the split node now
has two level-1 children, each pointing to its own bucket on disk.]
Two questions
• When redistributing records between two buckets (bucket split), how do we decide which
record goes to which bucket?
• A record search begins from the Index level 0. When searching a record, how do we find
which of the two buckets pointed to by the index block contains the record?
Solution: To solve the above problems, a second function is used. This function, given a key
(usually the primary key), returns a binary string (1s and 0s) of a predefined fixed length, as
shown below. This string is referred to here as B(Key).
Key    H(Key)   B(Key)
157    2        10100...
95     1        00011...
88     1        01100...
205    2        10010...
13     1        10111...
125    1        10001...
6      1        01000...
301    1        00110...
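Using the B(Key) values from the table (truncated to five bits here), the routing rule can be checked in a few lines (the function name is mine):

```python
# B(Key) values copied from the table above, truncated to 5 bits.
B = {157: "10100", 95: "00011", 88: "01100", 205: "10010",
     13: "10111", 125: "10001", 6: "01000", 301: "00110"}

def destination(key: int, level: int) -> str:
    """When a bucket pointed to from index level `level` splits, the
    (level+1)-st bit of B(key) picks the bucket: 0 = left (old), 1 = right (new)."""
    return "left" if B[key][level] == "0" else "right"

# Level-0 split triggered by inserting 125 into the bucket holding 95, 88, 13:
for k in (95, 88, 13, 125):
    print(k, destination(k, 0))   # 95 and 88 stay left; 13 and 125 go right
```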
How is B(Key) used?: The bits of B are used to identify the right bucket when inserting and when
searching for the desired record. If the bucket being split is pointed to from a node at level I in
the tree, then we use the value of the (I + 1)st digit of B(Key) to determine whether a record
goes to the left (old) or to the right (new) bucket. The following diagram shows the insertions
of the first five records.
[Figure: file after 5 insertions. Level-0 index nodes 1 and 2 in RAM point to two buckets on
disk. Bucket 1: 95, 88, 13. Bucket 2: 157, 205.]
Insert 125 and 6
The hashing on key 125 gives 1, so it should be inserted in the bucket pointed to by node 1 of
level 0. That bucket is full, so it splits and a new bucket is allocated to the file. The full
bucket was pointed to by a node at level 0, so we use the first digit of the B(Key) string to
decide the destination bucket of each record. Records 95 and 88 go to the left bucket, and
records 13 and 125 go to the right bucket. Record 6 then goes to bucket 1.
[Figure: split after inserting 125; no split when inserting 6. Level-0 index nodes 1 and 2 in
RAM; node 1 now has level-1 children 0 and 1. Bucket 1: 95, 88, 6. Bucket 2: 157, 205.
Bucket 3: 13, 125.]
Insert record 301
H(301) = 1, so its B(Key) takes it to bucket 1. Bucket 1 is full, so a split occurs. The bucket
to be split is pointed to by a leaf at index level 1, so in distributing the old records and
inserting the new record (301), the 2nd bit (index level 1 + 1 = 2) of B(Key) is used. The
structure of the file after inserting 301 is:
[Figure: file after inserting 301. Node 1 of level 0 has level-1 children 0 and 1; the left
level-1 node has level-2 children 0 and 1. Bucket 1: 95, 301. Bucket 2: 157, 205.
Bucket 3: 88, 6. Bucket 4: 13, 125.]
[Figure: disk blocks 10–19, with logical blocks LB1, LB2, LB4 of a file scattered among them
(noncontiguous allocation).]
Noncontiguous allocation: Indexing, based on the index sequential file. The first disk block of
the file is reserved as an index block, and the address of the index block is stored in the file
directory. The following figure illustrates the allocation.