0% found this document useful (0 votes)
84 views

File Management

The document discusses different methods for organizing data files. It begins by explaining that data files must balance fast access with efficient storage, and that the main operations on data files are reading and updating records. It then describes several key file organization methods: - Pile/serial files store records in the order they are received with no structure, requiring exhaustive searching. - Sequential files store fixed-length records in key order, allowing efficient batch processing but slow random access. - Indexed sequential files add an index and overflow area to sequential files, speeding random access while maintaining sequential processing. - Indexed files fully index the records, allowing very fast random access at the cost of sequential processing abilities.

Uploaded by

Solomon Godwin
Copyright
© © All Rights Reserved
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
84 views

File Management

The document discusses different methods for organizing data files. It begins by explaining that data files must balance fast access with efficient storage, and that the main operations on data files are reading and updating records. It then describes several key file organization methods: - Pile/serial files store records in the order they are received with no structure, requiring exhaustive searching. - Sequential files store fixed-length records in key order, allowing efficient batch processing but slow random access. - Indexed sequential files add an index and overflow area to sequential files, speeding random access while maintaining sequential processing. - Indexed files fully index the records, allowing very fast random access at the cost of sequential processing abilities.

Uploaded by

Solomon Godwin
Copyright
© © All Rights Reserved
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 5

INTRODUCTION

Data files are organised so as to facilitate access to records and to ensure their efficient
storage. A trade-off between these two requirements generally exists: if rapid access is
required, more storage must be expended to make it possible (for example, by providing
indexes to the data records). Access to a record for reading it (and sometimes updating
it) is the essential operation on data. On secondary storage devices where files are kept,
these are two types of access: sequential and direct.

File Organisation and Access Methods


In this unit, we use the term file organisation to refer to the structure of a file (especially
a data file) defined in terms of its components and how they are mapped onto backing
store. Any given file organisation supports one or more file access methods. Organisation
is thus closely related to but conceptually distinct from access methods. Access method
is any algorithm used for the storage and retrieval of records from a data file by
determining the structural characteristics of the file on which it is used.

File Organisation Criteria


In choosing a file organisation, several criteria are important:
 Short access time
 Ease of update
 Economy of storage
 Simple maintenance
 Reliability.

The relative priority of these criteria will depend on the applications that will use the file.
For example, if a file is only to be processed in batch mode, with all of the records
accessed every time, then rapid access for retrieval of a single record is of minimal
concern. A file stored on CD-ROM will never be updated, and so ease of update is not an
issue. These criteria may conflict. For example, for economy of storage, there should
be minimum redundancy in the data. On the other hand, redundancy is a primary means
of increasing the speed of access to data. An example of this is the use of indexes.

File Organisation Methods

The number of alternative file organisations that have been implemented or just proposed
is unmanageably large. In this brief survey, we will outline five fundamental rganisations.
Most structures used in actual systems either fall into one of these categories or can be
implemented or a combination of these organisations. The five organisations, the first
four of which are depicted in

 The pile/serial
 The sequential file
 The indexed sequential file
 The indexed file
 The direct, or hashed, file

The Pile/Serial
The least-complicated form of file organisation may be termed the pile/serial. Data are
collected in the order in which they arrive. Each record consists of one burst of data. The
purpose of the pile/serial is simply to accumulate the mass of data and save it. Records
may have different fields, or similar fields in different orders. Thus, each field should be
self-describing, including a field name as well as a value. The length of each field must
be implicitly indicated by delimiters, explicitly included as a subfield, or known as
default for that field type. Because there is no structure to the pile/serial file, record
access is by exhaustive search. That is, if we wish to find a record that contains a
particular field with a particular value, it is necessary to examine each record in the pile
until the desired record is found or the entire file has been searched. If we wish to find all
records that contain a particular field or contain that field with a particular value, then the
entire file must be searched.
Pile/serial files are encountered when data are collected and stored prior to processing or
when data are not easy to organise. This type of file uses space well when the stored data
vary in size and structure; is perfectly adequate for exhaustive searches, and is easy to
update. However, beyond these limited uses, this type of file is unsuitable for most
applications.

The Sequential File


The most common form of file structure is the sequential file. In this file organisation, a
fixed format is used for records. All records are of the same length, consisting of the
same number of fixed-length fields in a particular order. Because the length and position
of each field are known, only the values of fields need to be stored; the field name and
length for each field are attributes of the file structure. One particular field, usually the
first field in each record, is referred to as the key field.

The key field uniquely identifies the record; thus key values for different records are
always different. Further, the records are stored in key sequence: alphabetical order for a
text key, and numerical order for a numerical key. Sequential files are typically used in
batch applications and are generally optimum for such applications if they involve the
processing of all the records (e.g., a billing or payroll application).The sequential file
organisation is the only one that is easily stored on tape as well as disk. For interactive
applications that involve queries and/or updates of individual records, the sequential file
provides poor performance.

Access requires the sequential search of the file for a key match. If the entire file, or a
large portion of the file, can be brought into main memory at one time, more efficient
search techniques are possible. Nevertheless, considerable processing and delay are
encountered to access a record in a large sequential file. Additions to the file also
present problems.

Typically, a sequential file is stored in simple sequential ordering of the records within
blocks. That is, the physical organisation of the file on tape or disk directly matches the
logical organisation of the file. In this case, the usual procedure is to place new records in
a separate pile file, called a log file or transaction file.

Periodically, a batch update is performed that merges the log file with the master file to
produce a new file in correct key sequence. An alternative is to organize the sequential
file physically as a linked list. One or more records are stored in each physical block.
Each block on disk contains a pointer to the next block. The insertion of new records
involves pointer manipulation but does not require that the new records occupy a
particular physical block position. Thus, some added convenience is obtained at the cost
of additional processing and overhead.
The Indexed Sequential File
A popular approach to overcoming the disadvantages of the sequential file is the indexed
sequential file. The indexed sequential file maintains the key characteristic of the
equential file: records are organised in sequence based on a key field. Two features are
added:
 an index to the file to support random access, and
 an overflow file.
The index provides a lookup capability to quickly reach the vicinity of a desired record.
The overflow file is similar to the log file used with a sequential file but is integrated so
that a record in the overflow file is located by following a pointer from its predecessor
record.
In the simplest indexed sequential structure, a single level of indexing is used. The index
in this case is a simple sequential file. Each record in the index file consists of two fields:
a key field, which is the same as the key field in the main file, and a pointer into the main
file. To find a specific record, the index is searched to find the highest key value that is
equal to or precedes the desired key value. The search continues in the main file at the
location indicated by the pointer.

To see the effectiveness of this approach, consider a sequential file with 1 million
records. To search for a particular key value will require on average one-half million
record accesses. Now suppose that an index containing 1000 entries is constructed, with
the keys in the index more or less evenly distributed over the main file. Now it will take
on average 500 accesses to the index file followed by 500 accesses to the main file
to find the record. The average search length is reduced from 500,000 to 1000.

Additions to the file are handled in the following manner: Each record in the main file
contains an additional field not visible to the application, which is a pointer to the
overflow file. When a new record is to be inserted into the file, it is added to the overflow
file. The record in the main file that immediately precedes the new record in logical
sequence is updated to contain a pointer to the new record in the overflow file.
the immediately preceding record is itself in the overflow file, then the pointer in that
record is updated. As with the sequential file, the indexed sequential file is occasionally
merged with the overflow file in batch mode.

The indexed sequential file greatly reduces the time required to access a single record,
without sacrificing the sequential nature of the file. To process the entire file sequentially,
the records of the main file are processed in sequence until a pointer to the overflow file
is found, then accessing continues in the overflow file until a null pointer is encountered,
at which time accessing of the main file is resumed where it left off.
To provide even greater efficiency in access, multiple levels of indexing can be used.
Thus the lowest level of index file is treated as a sequential file and a higher-level index
file is created for that file. Consider again a file with 1 million records. A lower-level
index with 10,000 entries is constructed. A higher-level index into the lower level index
of 100 entries can then be constructed. The search begins at the higher-level index
(average length = 50 accesses) to find an entry point into the lower-level index. This
index is then searched (average length = 50) to find an entry point into the main file,
which is then searched (average length = 50). Thus the average length of search has been
reduced from 500,000 to 1000 to 150.

The Indexed File

The indexed sequential file retains one limitation of the sequential file: effective
processing is limited to that which is based on a single field of the file. For example,
when it is necessary to search for a record on the basis of some other attributes than the
key field, both forms of sequential file are inadequate. In some applications, the
flexibility of efficiently searching by various attributes is desirable.

To achieve this flexibility, a structure is needed that employs multiple indexes, one for
each type of field that may be the subject of a search. In the general indexed file, the
concept of sequentiality and a single key are abandoned. Records are accessed only
through their indexes. The result is that there is now no restriction on the placement of
records as long as a pointer in at least one index refers to that record. Furthermore,
variable-length records can be employed.

Two types of indexes are used. An exhaustive index contains one entry for every record
in the main file. The index itself is organized as a sequential file for ease of searching. A
partial index contains entries to records where the field of interest exists. With variable-
length records, some records will not contain all fields. When a new record is added to
the main file, all of the index files must be updated. Indexed files are used mostly in
applications where timeliness of information is critical
and where data are rarely processed exhaustively. Examples are airline reservation
systems and inventory control systems.

The Direct or Hashed File

The direct or hashed file exploits the capability found on disks to access directly any
block of a known address. As with sequential and indexed sequential files, a key field is
required in each record. However, there is no concept of sequential ordering here. The
direct file makes use of hashing on the key value. Direct files are often used where very
rapid access is required, where fixed length records are used, and where records are
always accessed one at a time. Examples are directories, pricing tables, schedules, and
name lists.

You might also like