File Management

The document discusses different methods for organizing data files. It begins by explaining that data files must balance fast access with efficient storage, and that the main operations on data files are reading and updating records. It then describes several key file organization methods:
- Pile/serial files store records in the order they are received with no structure, requiring exhaustive searching.
- Sequential files store fixed-length records in key order, allowing efficient batch processing but slow random access.
- Indexed sequential files add an index and overflow area to sequential files, speeding random access while maintaining sequential processing.
- Indexed files fully index the records, allowing very fast random access at the cost of sequential processing abilities.

Uploaded by

Solomon Godwin

INTRODUCTION

Data files are organised so as to facilitate access to records and to ensure their efficient
storage. A trade-off between these two requirements generally exists: if rapid access is
required, more storage must be expended to make it possible (for example, by providing
indexes to the data records). Access to a record for reading it (and sometimes updating
it) is the essential operation on data. On secondary storage devices where files are kept,
there are two types of access: sequential and direct.

File Organisation and Access Methods

In this unit, we use the term file organisation to refer to the structure of a file (especially
a data file) defined in terms of its components and how they are mapped onto backing
store. Any given file organisation supports one or more file access methods. Organisation
is thus closely related to, but conceptually distinct from, access methods. An access
method is any algorithm used for the storage and retrieval of records from a data file
and that determines the structural characteristics of the file on which it is used.

File Organisation Criteria

In choosing a file organisation, several criteria are important:
 Short access time
 Ease of update
 Economy of storage
 Simple maintenance
 Reliability.

The relative priority of these criteria will depend on the applications that will use the file.
For example, if a file is only to be processed in batch mode, with all of the records
accessed every time, then rapid access for retrieval of a single record is of minimal
concern. A file stored on CD-ROM will never be updated, and so ease of update is not an
issue. These criteria may conflict. For example, for economy of storage, there should
be minimum redundancy in the data. On the other hand, redundancy is a primary means
of increasing the speed of access to data. An example of this is the use of indexes.

File Organisation Methods

The number of alternative file organisations that have been implemented or just proposed
is unmanageably large. In this brief survey, we will outline five fundamental organisations.
Most structures used in actual systems either fall into one of these categories or can be
implemented with a combination of these organisations. The five organisations, the first
four of which are depicted in

 The pile/serial
 The sequential file
 The indexed sequential file
 The indexed file
 The direct, or hashed, file

The Pile/Serial
The least-complicated form of file organisation may be termed the pile/serial. Data are
collected in the order in which they arrive. Each record consists of one burst of data. The
purpose of the pile/serial is simply to accumulate the mass of data and save it. Records
may have different fields, or similar fields in different orders. Thus, each field should be
self-describing, including a field name as well as a value. The length of each field must
be implicitly indicated by delimiters, explicitly included as a subfield, or known as
default for that field type. Because there is no structure to the pile/serial file, record
access is by exhaustive search. That is, if we wish to find a record that contains a
particular field with a particular value, it is necessary to examine each record in the pile
until the desired record is found or the entire file has been searched. If we wish to find all
records that contain a particular field or contain that field with a particular value, then the
entire file must be searched.
Pile/serial files are encountered when data are collected and stored prior to processing or
when data are not easy to organise. This type of file uses space well when the stored data
vary in size and structure, is perfectly adequate for exhaustive searches, and is easy to
update. However, beyond these limited uses, this type of file is unsuitable for most
applications.
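
The idea of self-describing fields and exhaustive search can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the delimiters (";" between fields, "=" between a name and its value), the in-memory io.StringIO standing in for a disk file, and the helper names append_record, parse_record and find_all are all hypothetical, and values are assumed not to contain the delimiter characters.

```python
import io

FIELD_SEP = ";"   # assumed delimiter between fields
KV_SEP = "="      # assumed delimiter between a field name and its value


def append_record(pile, fields):
    """Append one self-describing record (a dict of name -> value) as a line."""
    pile.seek(0, io.SEEK_END)
    pile.write(FIELD_SEP.join(f"{k}{KV_SEP}{v}" for k, v in fields.items()) + "\n")


def parse_record(line):
    """Turn a line such as 'name=Ada;dept=R&D' back into a dict."""
    items = (item for item in line.rstrip("\n").split(FIELD_SEP) if item)
    pairs = (item.split(KV_SEP, 1) for item in items)
    return {name: value for name, value in pairs}


def find_all(pile, name, value=None):
    """Exhaustive search: examine every record in arrival order."""
    pile.seek(0)
    matches = []
    for line in pile:
        record = parse_record(line)
        if name in record and (value is None or record[name] == value):
            matches.append(record)
    return matches


pile = io.StringIO()  # stands in for a file on disk
append_record(pile, {"name": "Ada", "dept": "R&D"})
append_record(pile, {"dept": "Sales", "name": "Bola", "phone": "555-0100"})
append_record(pile, {"name": "Chidi"})
```

Calling find_all(pile, "dept", "Sales") reads every line of the pile before returning, which is the cost the text describes: appending is cheap, but every query touches the whole file.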

The Sequential File

The most common form of file structure is the sequential file. In this file organisation, a
fixed format is used for records. All records are of the same length, consisting of the
same number of fixed-length fields in a particular order. Because the length and position
of each field are known, only the values of fields need to be stored; the field name and
length for each field are attributes of the file structure. One particular field, usually the
first field in each record, is referred to as the key field.
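
A fixed record format can be illustrated with Python's struct module. The layout below (a 4-byte key, a 20-byte name and an 8-byte salary) is an assumption chosen for the example, as are the helper names; the point is only that field names and lengths live in the format description, not in the stored data.

```python
import struct

# Assumed layout: 4-byte unsigned key, 20-byte name, 8-byte float salary.
RECORD = struct.Struct("<I20sd")


def pack_record(key, name, salary):
    """Encode one record; struct pads the name with NUL bytes to 20 bytes."""
    return RECORD.pack(key, name.encode("ascii"), salary)


def unpack_record(raw):
    """Decode one record and strip the NUL padding from the name."""
    key, name, salary = RECORD.unpack(raw)
    return key, name.rstrip(b"\x00").decode("ascii"), salary


def read_nth(data, n):
    """Record n starts at byte n * RECORD.size because all records have the same length."""
    start = n * RECORD.size
    return unpack_record(data[start:start + RECORD.size])


# Records are stored in ascending key order, key field first.
rows = [(101, "Ada", 5200.0), (205, "Bola", 4100.5), (350, "Chidi", 3900.0)]
data = b"".join(pack_record(*row) for row in rows)
```

Because every record occupies RECORD.size bytes, the position of any record follows from its ordinal number alone; only the field values are stored.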

The key field uniquely identifies the record; thus key values for different records are
always different. Further, the records are stored in key sequence: alphabetical order for a
text key, and numerical order for a numerical key. Sequential files are typically used in
batch applications and are generally optimum for such applications if they involve the
processing of all the records (e.g., a billing or payroll application). The sequential file
organisation is the only one that is easily stored on tape as well as disk. For interactive
applications that involve queries and/or updates of individual records, the sequential file
provides poor performance.

Access requires the sequential search of the file for a key match. If the entire file, or a
large portion of the file, can be brought into main memory at one time, more efficient
search techniques are possible. Nevertheless, considerable processing and delay are
encountered to access a record in a large sequential file. Additions to the file also
present problems.
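
The difference between the two kinds of search can be sketched as follows. The text does not name the "more efficient search techniques"; binary search is used here as one well-known example, and the list of keys stands in for a file whose keys have been loaded into main memory.

```python
from bisect import bisect_left


def sequential_search(keys, target):
    """Scan in key order; return (position or None, number of keys examined)."""
    for examined, key in enumerate(keys, start=1):
        if key == target:
            return examined - 1, examined
        if key > target:          # keys are sorted, so the target cannot appear later
            return None, examined
    return None, len(keys)


def binary_search(keys, target):
    """Possible only when keys can be addressed directly, e.g. in main memory."""
    pos = bisect_left(keys, target)
    return pos if pos < len(keys) and keys[pos] == target else None


keys = list(range(0, 2000, 2))   # 1000 sorted, unique keys: 0, 2, 4, ...
```

With these 1000 keys, sequential_search(keys, 1000) examines 501 keys before it finds the match, whereas a binary search needs at most about ten comparisons.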

Typically, a sequential file is stored in simple sequential ordering of the records within
blocks. That is, the physical organisation of the file on tape or disk directly matches the
logical organisation of the file. In this case, the usual procedure is to place new records in
a separate pile file, called a log file or transaction file.

Periodically, a batch update is performed that merges the log file with the master file to
produce a new file in correct key sequence. An alternative is to organise the sequential
file physically as a linked list. One or more records are stored in each physical block.
Each block on disk contains a pointer to the next block. The insertion of new records
involves pointer manipulation but does not require that the new records occupy a
particular physical block position. Thus, some added convenience is obtained at the cost
of additional processing and overhead.
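
The batch update can be sketched as a sort followed by a merge. In this illustration the master file and the log file are plain Python lists of (key, data) tuples, and the function name batch_update is hypothetical; rejecting duplicate keys is an assumption based on the requirement that keys be unique.

```python
import heapq


def batch_update(master, log):
    """Merge a key-ordered master file with an unordered log of new records.

    Both arguments are lists of (key, data) tuples standing in for files.
    Returns a new master list in key sequence.
    """
    sorted_log = sorted(log, key=lambda rec: rec[0])     # the log is a pile, so sort it first
    merged = heapq.merge(master, sorted_log, key=lambda rec: rec[0])
    new_master = []
    for rec in merged:
        if new_master and new_master[-1][0] == rec[0]:
            raise ValueError(f"duplicate key {rec[0]}")  # keys must stay unique
        new_master.append(rec)
    return new_master


master = [(10, "a"), (30, "c"), (50, "e")]
log = [(40, "d"), (20, "b")]
```

Here batch_update(master, log) returns the five records in key order. The old master is read once from start to finish, which is why this style of update suits tape as well as disk.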

The Indexed Sequential File

A popular approach to overcoming the disadvantages of the sequential file is the indexed
sequential file. The indexed sequential file maintains the key characteristic of the
sequential file: records are organised in sequence based on a key field. Two features are
added:
 an index to the file to support random access, and
 an overflow file.
The index provides a lookup capability to quickly reach the vicinity of a desired record.
The overflow file is similar to the log file used with a sequential file but is integrated so
that a record in the overflow file is located by following a pointer from its predecessor
record.
In the simplest indexed sequential structure, a single level of indexing is used. The index
in this case is a simple sequential file. Each record in the index file consists of two fields:
a key field, which is the same as the key field in the main file, and a pointer into the main
file. To find a specific record, the index is searched to find the highest key value that is
equal to or precedes the desired key value. The search continues in the main file at the
location indicated by the pointer.
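
The lookup just described can be sketched as below. The data, the choice of one index entry per four records, and the function name lookup are assumptions for illustration; list positions stand in for disk addresses. The sketch also searches the index with bisect (a binary search) for brevity, where the text describes a sequential scan of the index.

```python
from bisect import bisect_right

# Main file: records in key sequence (list position stands in for a disk address).
main_file = [(k, f"record-{k}") for k in range(0, 100, 5)]   # keys 0, 5, ..., 95

# Index: one (key, pointer) entry for every 4th record of the main file.
index = [(main_file[pos][0], pos) for pos in range(0, len(main_file), 4)]
index_keys = [key for key, _ in index]


def lookup(target):
    """Return the record with the given key, or None if it is absent."""
    # Highest index key that is equal to or precedes the target.
    i = bisect_right(index_keys, target) - 1
    if i < 0:
        return None                      # target precedes every key in the file
    _, pos = index[i]
    # Continue sequentially in the main file from the pointed-to position.
    while pos < len(main_file) and main_file[pos][0] <= target:
        if main_file[pos][0] == target:
            return main_file[pos]
        pos += 1
    return None
```

For lookup(35) the index yields the entry with key 20, and the scan of the main file then visits keys 20, 25, 30 and 35 instead of starting from the first record.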

To see the effectiveness of this approach, consider a sequential file with 1 million
records. To search for a particular key value will require on average one-half million
record accesses. Now suppose that an index containing 1000 entries is constructed, with
the keys in the index more or less evenly distributed over the main file. Now it will take
on average 500 accesses to the index file followed by 500 accesses to the main file
to find the record. The average search length is reduced from 500,000 to 1000.
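
The arithmetic can be written out in general form. The symbols are introduced here for illustration: N is the number of records in the main file, k is the number of evenly spaced index entries, and both the index and the selected portion of the main file are assumed to be searched sequentially.

```latex
\[
  \bar{L}(k) \approx \frac{k}{2} + \frac{N}{2k},
  \qquad
  \bar{L}(1000) \approx \frac{1000}{2} + \frac{10^{6}}{2 \cdot 1000}
  = 500 + 500 = 1000
  \quad \text{for } N = 10^{6}.
\]
```

Under this simple model the two terms are equal when k is the square root of N, and that is also where their sum is smallest, so for 1 million records a single-level index of about 1000 entries is the balanced choice.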

Additions to the file are handled in the following manner: Each record in the main file
contains an additional field not visible to the application, which is a pointer to the
overflow file. When a new record is to be inserted into the file, it is added to the overflow
file. The record in the main file that immediately precedes the new record in logical
sequence is updated to contain a pointer to the new record in the overflow file. If
the immediately preceding record is itself in the overflow file, then the pointer in that
record is updated. As with the sequential file, the indexed sequential file is occasionally
merged with the overflow file in batch mode.

The indexed sequential file greatly reduces the time required to access a single record,
without sacrificing the sequential nature of the file. To process the entire file sequentially,
the records of the main file are processed in sequence until a pointer to the overflow file
is found, then accessing continues in the overflow file until a null pointer is encountered,
at which time accessing of the main file is resumed where it left off.
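
The overflow mechanism and the sequential scan can be sketched together. This is a minimal in-memory model, not a disk implementation: the Record class, the insert and scan functions and the sample keys are hypothetical, list positions stand in for addresses, and each new key is assumed to follow at least one key in the main file.

```python
class Record:
    def __init__(self, key, data):
        self.key = key
        self.data = data
        self.overflow = None   # hidden pointer: position in the overflow file, or None


main_file = [Record(10, "a"), Record(20, "b"), Record(30, "c")]   # in key sequence
overflow_file = []                                                 # records in arrival order


def insert(key, data):
    """Add a record to the overflow file and link it from its logical predecessor."""
    # Last main-file record whose key precedes the new key (assumes one exists).
    pred = max((r for r in main_file if r.key < key), key=lambda r: r.key)
    # Follow the overflow chain while the next record still precedes the new key.
    while pred.overflow is not None and overflow_file[pred.overflow].key < key:
        pred = overflow_file[pred.overflow]
    new = Record(key, data)
    new.overflow = pred.overflow          # the new record inherits the rest of the chain
    overflow_file.append(new)
    pred.overflow = len(overflow_file) - 1


def scan():
    """Yield all keys in key sequence, detouring through overflow chains."""
    for rec in main_file:
        yield rec.key
        ptr = rec.overflow
        while ptr is not None:            # a None pointer sends us back to the main file
            yield overflow_file[ptr].key
            ptr = overflow_file[ptr].overflow


insert(25, "x")
insert(22, "y")
insert(27, "z")
insert(15, "w")
```

After these four insertions the main file still holds only its original three records, yet scan() visits all seven keys in key order by following the pointers.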
To provide even greater efficiency in access, multiple levels of indexing can be used.
Thus the lowest level of index file is treated as a sequential file and a higher-level index
file is created for that file. Consider again a file with 1 million records. A lower-level
index with 10,000 entries is constructed. A higher-level index into the lower level index
of 100 entries can then be constructed. The search begins at the higher-level index
(average length = 50 accesses) to find an entry point into the lower-level index. This
index is then searched (average length = 50) to find an entry point into the main file,
which is then searched (average length = 50). Thus the average length of search has been
reduced from 500,000 to 1000 to 150.
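
The figures quoted above can be reproduced with a small helper. The function name average_accesses is hypothetical, and the model assumes that every level is searched sequentially and that index entries are spread evenly over the level below.

```python
def average_accesses(n_records, index_sizes):
    """Average sequential-search cost through a hierarchy of indexes.

    index_sizes lists the number of entries in each index, highest level first;
    an empty list means a plain sequential file.
    """
    levels = list(index_sizes) + [n_records]   # the main file is the last level
    total = levels[0] / 2                      # scan half of the top level on average
    for upper, lower in zip(levels, levels[1:]):
        total += (lower / upper) / 2           # half of the segment one entry points to
    return total
```

With 1 million records, average_accesses(1_000_000, []) gives 500,000, a single index of 1000 entries gives 1000, and the two-level arrangement [100, 10_000] gives 150, matching the numbers in the text.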

The Indexed File

The indexed sequential file retains one limitation of the sequential file: effective
processing is limited to that which is based on a single field of the file. For example,
when it is necessary to search for a record on the basis of some attribute other than the
key field, both forms of sequential file are inadequate. In some applications, the
flexibility of efficiently searching by various attributes is desirable.

To achieve this flexibility, a structure is needed that employs multiple indexes, one for
each type of field that may be the subject of a search. In the general indexed file, the
concept of sequentiality and a single key are abandoned. Records are accessed only
through their indexes. The result is that there is now no restriction on the placement of
records as long as a pointer in at least one index refers to that record. Furthermore,
variable-length records can be employed.

Two types of indexes are used. An exhaustive index contains one entry for every record
in the main file. The index itself is organised as a sequential file for ease of searching. A
partial index contains entries only for records where the field of interest exists. With
variable-length records, some records will not contain all fields. When a new record is
added to the main file, all of the index files must be updated. Indexed files are used
mostly in applications where timeliness of information is critical and where data are
rarely processed exhaustively. Examples are airline reservation systems and inventory
control systems.
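
A small sketch may help to show how exhaustive and partial indexes coexist. Everything here is illustrative: the sample records, the field names and the helper functions are hypothetical, list positions stand in for record addresses, and each index is a Python dictionary, not a sequential index file.

```python
from collections import defaultdict

# Variable-length records in no particular order; list position stands in for an address.
records = [
    {"id": 7, "name": "Bola", "city": "Lagos"},
    {"id": 3, "name": "Ada"},                      # no "city" field
    {"id": 9, "name": "Chidi", "city": "Abuja"},
    {"id": 5, "name": "Dayo", "city": "Lagos"},
]


def build_index(field):
    """Map each value of `field` to the addresses of the records that contain it."""
    index = defaultdict(list)
    for address, record in enumerate(records):
        if field in record:                        # records lacking the field get no entry
            index[record[field]].append(address)
    return index


# "id" is exhaustive (every record has one); "city" is partial.
indexes = {"id": build_index("id"), "city": build_index("city")}


def add_record(record):
    """Place the record anywhere, then update every index that applies to it."""
    records.append(record)
    address = len(records) - 1
    for field, index in indexes.items():
        if field in record:
            index[record[field]].append(address)


def find(field, value):
    """Access records only through the index for `field`."""
    return [records[a] for a in indexes[field].get(value, [])]
```

A query such as find("city", "Lagos") never scans the records themselves. The price is paid in add_record, which has to touch every index that covers the new record.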

The Direct or Hashed File

The direct or hashed file exploits the capability found on disks to access directly any
block of a known address. As with sequential and indexed sequential files, a key field is
required in each record. However, there is no concept of sequential ordering here. The
direct file makes use of hashing on the key value. Direct files are often used where very
rapid access is required, where fixed length records are used, and where records are
always accessed one at a time. Examples are directories, pricing tables, schedules, and
name lists.
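
A minimal sketch of a hashed file follows, under several assumptions that go beyond the text: the slot layout, the file capacity, the use of CRC-32 as the hash function, and linear probing to resolve collisions are all choices made for the example. A bytearray stands in for a preallocated disk file, keys are assumed to be at most 12 ASCII characters, and deletion is not handled.

```python
import struct
import zlib

SLOT = struct.Struct("<B12s8s")   # assumed layout: used flag, 12-byte key, 8-byte value
N_SLOTS = 11                      # assumed file capacity in records

# A zero-filled bytearray stands in for a preallocated disk file of fixed-length slots.
store = bytearray(SLOT.size * N_SLOTS)


def home_slot(key):
    """Hash the key to a slot number; CRC-32 is used only because it is deterministic."""
    return zlib.crc32(key.encode("ascii")) % N_SLOTS


def read_slot(n):
    """Decode slot n into (used flag, key, value)."""
    used, key, value = SLOT.unpack_from(store, n * SLOT.size)
    return used, key.rstrip(b"\x00").decode("ascii"), value.rstrip(b"\x00").decode("ascii")


def put(key, value):
    """Store a record at its hashed address, probing linearly on collision."""
    for step in range(N_SLOTS):
        n = (home_slot(key) + step) % N_SLOTS
        used, existing, _ = read_slot(n)
        if not used or existing == key:
            SLOT.pack_into(store, n * SLOT.size, 1, key.encode("ascii"), value.encode("ascii"))
            return n
    raise RuntimeError("file is full")


def get(key):
    """Fetch a record directly by key, without scanning the file in order."""
    for step in range(N_SLOTS):
        n = (home_slot(key) + step) % N_SLOTS
        used, existing, value = read_slot(n)
        if not used:
            return None                 # an empty slot ends the probe sequence
        if existing == key:
            return value
    return None
```

The slot number is computed from the key itself, so a lookup normally reads one slot, or a few when keys collide. There is no key ordering to exploit, which is why this organisation suits one-at-a-time access and not sequential processing.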
