MODULE 2
FILE NAMING
Files are an abstraction mechanism. They provide a way to store information and read it back later,
in a way that shields the user from the details of how and where the information is stored and how
the disks actually work. When a process creates a file, it gives the file a name. When the process
terminates, the file continues to exist and can be accessed by other processes using its name.
The exact rules for file naming vary somewhat from system to system, but all operating systems
allow strings of one to eight letters as legal file names. The file name is chosen by the person
creating the file, usually to reflect its contents. There are few constraints on the format of the
file name: it can comprise the letters A-Z, the digits 0-9 and the special characters
$ # & + @ ! ( ) - { } ' ` _ ~ as well as the space. The only symbols that cannot be used in a file
name are * | < > \ ^ = ? / [ ] " ; , plus control characters. The main caveat when choosing a file
name is that different operating systems have different rules, which can cause problems when files
are moved from one computer to another. For example,
Microsoft Windows is case insensitive, so the names MYEBOOKS, myebooks and MyEbooks all refer to
the same file under Microsoft Windows.
Under the UNIX operating system, however, these would be three different files, because UNIX file
names are case sensitive.
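The naming rules above can be checked mechanically. The sketch below is a minimal illustration, not the rule set of any particular operating system: the hypothetical `is_legal_name` rejects the forbidden and control characters listed in the text (treating the double quote as one of them, which is an assumption), and the hypothetical `case_collisions` finds names that a case-insensitive system such as Windows would treat as one file, the usual problem when files are copied over from UNIX.

```python
from __future__ import annotations

from collections import defaultdict

# Characters the text lists as unusable in file names. Including the
# double quote is an assumption made for this sketch.
FORBIDDEN = set('*|<>\\^=?/[]";,')


def is_legal_name(name: str) -> bool:
    """Return True if name is non-empty and has no forbidden or control characters."""
    if not name:
        return False
    for ch in name:
        # Control characters are code points 0-31 and 127 (DEL).
        if ch in FORBIDDEN or ord(ch) < 32 or ord(ch) == 127:
            return False
    return True


def case_collisions(names: list[str]) -> list[list[str]]:
    """Group names that would refer to one file on a case-insensitive system."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [group for group in groups.values() if len(group) > 1]


print(is_legal_name("course.doc"))  # True
print(is_legal_name("what?.doc"))   # False
print(case_collisions(["MYEBOOKS", "myebooks", "MyEbooks", "notes.txt"]))
# [['MYEBOOKS', 'myebooks', 'MyEbooks']]
```

The three names that are distinct files on UNIX collapse into a single group, which is exactly the set that would clash if copied to a Windows disk.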
Naming Convention
A file name usually has two parts separated by a period (“.”). The part to the left of the period
is called the main name, and the part to the right is called the extension. For example, in the
file name “course.doc” the main name is course and the extension is doc. The extension
distinguishes between different types of files. We generally refer to a file by its main name
together with its extension, which forms the complete file name.
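Splitting a complete file name into its main name and extension can be sketched as follows. This is a minimal illustration assuming the split is made at the last period, so a name such as archive.tar.gz keeps archive.tar as its main name; `split_file_name` is a hypothetical helper. Python's standard `os.path.splitext` follows a similar rule but returns the extension with its leading period (for example `.doc`).

```python
from __future__ import annotations


def split_file_name(name: str) -> tuple[str, str]:
    """Split a complete file name into (main name, extension).

    The split is made at the last period. A name with no period, or whose
    only period is its first character (e.g. ".profile"), is treated as
    having no extension.
    """
    main, dot, ext = name.rpartition(".")
    if not dot or not main:
        return name, ""
    return main, ext


print(split_file_name("course.doc"))      # ('course', 'doc')
print(split_file_name("archive.tar.gz"))  # ('archive.tar', 'gz')
print(split_file_name("README"))          # ('README', '')
```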
File Name Extension
A filename extension is a suffix added to the name of a computer file to indicate the encoding
convention or file format of its contents. In some operating systems (for example UNIX) the
extension is optional, while in others (such as DOS) it is required. Some operating systems limit
the length of the extension (DOS and OS/2, for example, limit it to three characters), while others
(such as UNIX) do not. Some operating systems (for example RISC OS) do not use file extensions at
all.
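As a rough illustration of such a length limit, the check below tests whether a name fits the DOS-style pattern of a main name of one to eight characters and an extension of at most three characters. It is a simplified sketch: `fits_dos_8_3` is a hypothetical helper, and it ignores the other character restrictions DOS imposed.

```python
def fits_dos_8_3(name: str) -> bool:
    """Rough check: at most one period, a 1-8 character main name, a 0-3 character extension."""
    if name.count(".") > 1:
        return False
    main, _, ext = name.partition(".")
    return 1 <= len(main) <= 8 and len(ext) <= 3


print(fits_dos_8_3("COURSE.DOC"))        # True
print(fits_dos_8_3("presentation.ppt"))  # False
```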
The following tables, which are extracted from Microsoft® Encarta (2007), show examples of
some common filename extensions:
IMAGES
FILE TYPE     CONTENT                                       APPLICATION
.gif          Graphics Interchange Format; though not the   Lview and others
              most economical, the most common graphics
              format found on the internet
.jpg, .jpeg   Joint Photographic Experts Group; a 24-bit    Lview and many others
              graphic format
.mpg, .mpeg   Moving Picture Experts Group; a standard      Sparkle, Windows Media Player,
              internet movie platform                       QuickTime and many others
Table 3. File name extensions of sound files.
SOUND
FILE TYPE     CONTENT                                       APPLICATION
.mp3          Audio files on both PC and Mac                Windows Media Player
.wav          Audio files on PC                             RealPlayer
.ra           Real Audio, a proprietary system for
              delivering and playing streaming audio
              on the web
.aiff         Audio files on Mac
UTILITIES
FILE TYPE     CONTENT                                       APPLICATION
.ppt          A presentation file (for slide shows)         Microsoft PowerPoint
.xls, .123    Spreadsheet files                             Microsoft Excel, Lotus 1-2-3
.mdb          A database file                               Microsoft Access
File Attributes
The particular information kept for each file varies from one operating system to another, but
whatever operating system is in use, files always have certain attributes or characteristics. The
main file attributes are discussed below.
File Name
The symbolic file name is the only information kept in human-readable form. As the name suggests,
a file name helps users to distinguish between files.
File Type
A file type is needed on systems that support different types of files. As discussed earlier, the
file type is part of the complete file name. For example, “cit381.doc” and “cit381.txt” are two
different files that share the main name cit381.
The file type is therefore an important attribute for distinguishing between files by type, and it
indicates which application should be used to open a particular file.
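Because the file type indicates which application should open a file, a system can choose the application from the extension. The sketch below assumes a simple lookup table built from a few rows of the extension tables earlier in this module; `APPLICATIONS` and `application_for` are hypothetical names, and real systems keep such associations in configurable settings rather than a fixed table.

```python
from __future__ import annotations

# Hypothetical association table built from some rows of the tables above.
APPLICATIONS = {
    "gif": "Lview",
    "jpg": "Lview",
    "jpeg": "Lview",
    "ppt": "Microsoft PowerPoint",
    "xls": "Microsoft Excel",
    "123": "Lotus 1-2-3",
    "mdb": "Microsoft Access",
}


def application_for(file_name: str) -> str | None:
    """Return the application associated with the file's extension, if any."""
    _, dot, ext = file_name.rpartition(".")
    if not dot:
        return None  # no extension, so the type is unknown
    return APPLICATIONS.get(ext.lower())  # extensions compared case-insensitively


print(application_for("cit381.xls"))  # Microsoft Excel
print(application_for("cit381.txt"))  # None
```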
Location
This is a pointer to the device on which the file is stored and to the location of the file on
that device.
Size
The size attribute keeps track of the current size of the file in bytes, words or blocks; file
sizes are most commonly quoted in bytes. For comparison, a floppy disk holds about 1.44 MB; a Zip
disk holds 100 MB or 250 MB; a CD holds about 650 to 700 MB; a single-layer DVD holds about 4.7 GB.
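Since space is allocated in blocks, a file's size in bytes is usually rounded up to a whole number of blocks, and the unused tail of the last block is wasted. The hypothetical helpers below sketch that arithmetic; the default block size of 4096 bytes is an assumption, not a value given in the text.

```python
def blocks_needed(size_bytes: int, block_size: int = 4096) -> int:
    """Number of whole blocks needed to hold size_bytes (rounded up)."""
    if size_bytes < 0 or block_size <= 0:
        raise ValueError("size must be non-negative and block size positive")
    return -(-size_bytes // block_size)  # ceiling division on integers


def unused_bytes(size_bytes: int, block_size: int = 4096) -> int:
    """Bytes allocated in the last block but not used by the file."""
    return blocks_needed(size_bytes, block_size) * block_size - size_bytes


print(blocks_needed(10_000))  # 3
print(unused_bytes(10_000))   # 2288
```

A 10,000-byte file therefore occupies three 4096-byte blocks (12,288 bytes), leaving 2,288 bytes of the last block unused.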
Protection
The protection attribute of a file holds the access-control information that determines who can
read, write or execute the file, and so on.
Usage Count
This value indicates the number of processes that are currently using (have opened) a particular
file.
Time, Date and Process Identification
This information may be kept for the creation, last modification and last use of the file. Data
provided by this attribute is often helpful for protection and usage monitoring. The identification
of the user or process that created the file may also be recorded.
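The attributes described above are often gathered into a single per-file record. The dataclass below is a hypothetical sketch of such a record, not the layout used by any particular file system; it also shows how the usage count could be kept up to date as processes open and close the file.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class FileAttributes:
    """Hypothetical per-file record holding the attributes described above."""

    name: str         # symbolic, human-readable file name
    file_type: str    # e.g. "doc" or "txt"
    device: str       # device on which the file is stored
    start_block: int  # location of the file on that device
    size_bytes: int = 0
    protection: dict[str, set[str]] = field(default_factory=dict)  # user -> rights
    usage_count: int = 0  # number of processes that currently have the file open
    owner: str = ""       # identification of the creating user
    created: float = field(default_factory=time.time)
    modified: float = field(default_factory=time.time)
    last_used: float = field(default_factory=time.time)

    def record_open(self) -> None:
        """Called when a process opens the file."""
        self.usage_count += 1
        self.last_used = time.time()

    def record_close(self) -> None:
        """Called when a process closes the file."""
        if self.usage_count == 0:
            raise RuntimeError("close without a matching open")
        self.usage_count -= 1


f = FileAttributes(name="cit381.doc", file_type="doc", device="disk0", start_block=120)
f.record_open()
f.record_open()
f.record_close()
print(f.usage_count)  # 1
```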
FILE OPERATIONS
File Sharing
In a multiuser system, there is almost always a requirement for allowing files to be shared among
a number of users. Two issues arise: access rights and the management of simultaneous access.
Access Rights
The file system should provide a flexible tool for allowing extensive file sharing among users.
It should provide a number of options so that the way in which a particular file is accessed can
be controlled. Typically, users or groups of users are granted certain access rights to a file. A
wide range of access rights is in use. The following list is representative of the access rights
that can be assigned to a particular user for a particular file:
a. None: The user may not even know of the existence of the file, let alone access it.
To enforce this restriction, the user would not be allowed to read the user directory that
contains this file.
b. Knowledge: The user can determine that the file exists and who its owner is. The user is
then able to petition the owner for additional access rights.
c. Execution: The user can load and execute a program but cannot copy it. Proprietary
programs are often made accessible with this restriction.
d. Reading: The user can read the file for any purpose, including copying and execution.
Some systems are able to enforce a distinction between viewing and copying. In the
former case, the contents of the file can be displayed to the user, but the user has no
means for making a copy.
e. Appending: The user can add data to the file, often only at the end, but cannot modify or
delete any of the file’s contents. This right is useful in collecting data from a number of
sources.
f. Updating: The user can modify, delete, and add to the file’s data.
This normally includes writing the file initially, rewriting it completely or in part, and
removing all or a portion of the data. Some systems distinguish among different degrees
of updating.
g. Changing protection: The user can change the access rights granted to other users.
Typically, this right is held only by the owner of the file. In some systems, the owner can
extend this right to others. To prevent abuse of this mechanism, the file
owner will typically be able to specify which rights can be changed by the holder of this
right.
h. Deletion: The user can delete the file from the file system.
These rights can be considered to constitute a hierarchy, with each right implying those
that precede it. Thus, if a particular user is granted the updating right for a particular file,
then that user is also granted the following rights: knowledge, execution, reading, and
appending. One user is designated as owner of a given file, usually the person who
initially created the file. The owner has all of the access rights listed previously and may
grant rights to others. Access can be provided to different classes of users:
· Specific user: Individual users who are designated by user ID.
· User groups: A set of users who are not individually defined.
The system must have some way of keeping track of the membership of user groups.
· All: All users who have access to this system. These are public files.
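The hierarchy of rights, in which each right implies those listed before it, maps naturally onto ordered values. The sketch below encodes the rights None to Deletion as an `IntEnum` so that checking a right is a single comparison. How the classes of users combine is an assumption made here, not something the text specifies: the owner receives every right, and otherwise the highest right granted to the specific user, to any of the user's groups, or to all users applies. All names are hypothetical.

```python
from __future__ import annotations

from enum import IntEnum


class Right(IntEnum):
    """Access rights in the order listed; each implies all rights before it."""
    NONE = 0
    KNOWLEDGE = 1
    EXECUTION = 2
    READING = 3
    APPENDING = 4
    UPDATING = 5
    CHANGING_PROTECTION = 6
    DELETION = 7


def implies(granted: Right, requested: Right) -> bool:
    """A granted right covers the requested one if it is at least as high."""
    return granted >= requested


def effective_right(
    user: str,
    user_groups: set[str],
    owner: str,
    user_rights: dict[str, Right],
    group_rights: dict[str, Right],
    public_right: Right,
) -> Right:
    """Highest right available to user (the combination rule is an assumption)."""
    if user == owner:
        return Right.DELETION  # the owner holds every right
    candidates = [public_right]  # right granted to all users
    if user in user_rights:
        candidates.append(user_rights[user])  # specific user
    for group in user_groups:
        if group in group_rights:
            candidates.append(group_rights[group])  # user groups
    return max(candidates)


print(implies(Right.UPDATING, Right.READING))   # True
print(implies(Right.READING, Right.APPENDING))  # False
```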
Simultaneous Access
When more than one user is granted access to append to or update a file, the operating system or
file management system must enforce discipline. A brute-force approach is to allow a user to lock
the entire file while it is being updated. A finer grain of control is to lock individual records
during an update.
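The two locking disciplines can be compared with a small in-memory simulation. The sketch below uses Python threads rather than any operating system locking call, and `SharedFile` is a hypothetical file made of numbered records: `update_whole_file_locked` takes one lock for the whole file (the brute-force approach), while `update_record_locked` locks only the record being changed, so updates to different records can proceed at the same time.

```python
from __future__ import annotations

import threading


class SharedFile:
    """Hypothetical shared file made of numbered records, held in memory."""

    def __init__(self, n_records: int) -> None:
        self.records = [0] * n_records
        self.file_lock = threading.Lock()  # one lock for the entire file
        self.record_locks = [threading.Lock() for _ in range(n_records)]

    def update_whole_file_locked(self, i: int, delta: int) -> None:
        # Brute force: every updater waits, even if it wants another record.
        with self.file_lock:
            self.records[i] += delta

    def update_record_locked(self, i: int, delta: int) -> None:
        # Finer grain: only updaters of record i wait for each other.
        with self.record_locks[i]:
            self.records[i] += delta


def run_updaters(update, n_records: int, n_threads: int = 4, n_updates: int = 1000) -> None:
    """Start n_threads threads; thread t adds 1 to record t % n_records, n_updates times."""

    def worker(t: int) -> None:
        for _ in range(n_updates):
            update(t % n_records, 1)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


f = SharedFile(2)
run_updaters(f.update_record_locked, n_records=2)
print(f.records)  # [2000, 2000]
```

A real system would apply one discipline consistently to a given file; mixing the two update methods on the same object would not be safe, because they use different locks.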
File Allocation
In allocating disk space, several issues are involved:
1. When a new file is created, is the maximum space required for the file allocated at once?
2. Is space allocated to a file as one or more contiguous units? We shall refer to these units
as portions. That is, a portion is a contiguous set of allocated blocks. The size of a
portion can range from a single block to the entire file. What size of portion
should be used for file allocation?
3. What sort of data structure or table is used to keep track of the portions assigned to a file?
An example of such a structure is a file allocation table (FAT), found on DOS and some
other systems.
Let us examine these issues in turn.
Preallocation versus Dynamic Allocation
A pre-allocation policy requires that the maximum size of a file be declared at the time of the file
creation request. In a number of cases, such as program compilations, the production of summary
data files, or the transfer of a file from another system over a communications network, this
value can be reliably estimated. However, for many
applications, it is difficult if not impossible to estimate reliably the maximum potential size of a
file. In those cases, users and application programmers would tend to overestimate file size so as
not to run out of space. This clearly is wasteful from the point of view of secondary storage
allocation. Dynamic allocation, which allocates space to a file in portions as needed, is therefore
often preferable.
Portion Size
The second issue listed is that of the size of the portion allocated to a file. At one extreme, a
portion large enough to hold the entire file is allocated. At the other extreme, space on the disk is
allocated one block at a time. In choosing a portion size, there is a tradeoff between efficiency
from the point of view of a single file versus overall system
efficiency.
Below is a list of four items to be considered in the tradeoff:
· Contiguity of space increases performance, especially for Retrieve_Next operations, and
greatly for transactions running in a transaction-oriented operating system.
· Having a large number of small portions increases the size of tables needed to manage the
allocation information.
· Having fixed-size portions (for example, blocks) simplifies the reallocation of space.
· Having variable-size or small fixed-size portions minimizes waste of unused storage due to
over-allocation.
Of course, these listed items interact and must be considered together.
The result is that there are two major alternatives: variable, large contiguous portions and blocks.
a. Variable, Large Contiguous Portions
This will provide better performance. The variable size avoids waste, and the file allocation
tables are small. However, space is hard to reuse.
b. Blocks
Small fixed portions provide greater flexibility. They may require large tables or complex
structures for their allocation. Contiguity has been abandoned as a primary goal; blocks are
allocated as needed.
Both options are compatible with pre-allocation and with dynamic allocation. In the case of
variable, large contiguous portions, a pre-allocated file is given one contiguous group of blocks.
This eliminates the need for a file allocation table; all that is required is a pointer to the
first block and the number of blocks allocated. In the case of blocks with pre-allocation, all of
the portions required are allocated at one time. This means that the file allocation table for the
file remains of fixed size, because the number of blocks allocated is fixed.
With variable-size portions, we need to be concerned with the
fragmentation of free space. The following are possible alternative
strategies:
· First fit: Choose the first unused contiguous group of blocks of sufficient size from a free
block list.
· Best fit: Choose the smallest unused group that is of sufficient size.
· Nearest fit: Choose the unused group of sufficient size that is closest to the previous allocation
for the file to increase locality.
It is not clear which strategy is best. The difficulty in modeling alternative strategies is that so
many factors interact, including types of files, pattern of file access, degree of
multiprogramming, other performance factors in the system, disk caching, disk scheduling, and
so on.
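The three placement strategies can be sketched over a free list of (start block, length) pairs. This is a minimal illustration: the list representation and the function names are hypothetical, and for nearest fit the distance is assumed to be measured from the start of each free group to a block of the file's previous allocation.

```python
from __future__ import annotations


def first_fit(free: list[tuple[int, int]], n: int) -> int | None:
    """Start of the first free group, in list order, with at least n blocks."""
    for start, length in free:
        if length >= n:
            return start
    return None


def best_fit(free: list[tuple[int, int]], n: int) -> int | None:
    """Start of the smallest free group that still holds n blocks."""
    fits = [(length, start) for start, length in free if length >= n]
    return min(fits)[1] if fits else None


def nearest_fit(free: list[tuple[int, int]], n: int, previous_block: int) -> int | None:
    """Start of the sufficient free group closest to the previous allocation."""
    fits = [(abs(start - previous_block), start) for start, length in free if length >= n]
    return min(fits)[1] if fits else None


free = [(0, 5), (10, 3), (20, 8), (40, 4)]
print(first_fit(free, 4))                       # 0
print(best_fit(free, 4))                        # 40
print(nearest_fit(free, 4, previous_block=18))  # 20
```

The same request lands in three different places, which illustrates why the choice interacts with file sizes, access patterns and locality.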
File Allocation Methods
The direct-access nature of disks gives us flexibility in the implementation of files. In almost
every case, many files are stored on the same disk. The main problem is how to allocate space to
these files so that disk space is utilised effectively and files can be accessed quickly.
Three major methods of allocating disk space are in wide use: contiguous, linked and indexed.
Contiguous Allocation
With contiguous allocation, a single contiguous set of blocks is allocated to a file at the time of
file creation (Figure 11). Thus, this is a pre-allocation strategy, using variable-size portions. The
file allocation table needs just a single entry for each file, showing the starting block and the
length of the file. Contiguous allocation is the best from the
point of view of the individual sequential file. Multiple blocks can be read in at a time to improve
I/O performance for sequential processing. It is also easy to retrieve a single block.
For example, if a file starts at block b, and the ith block of the file is wanted, its location on
secondary storage is simply b + i - 1. Contiguous allocation presents some problems. External
fragmentation will occur, making it difficult to find contiguous blocks of space of sufficient
length. From time to time, it will be necessary to perform a compaction algorithm to free up
additional space on the disk (Figure 12). Also, with preallocation, it is necessary to declare the
size of the file at the time of
creation, with the problems mentioned earlier.
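With contiguous allocation, the file allocation table entry (start block, length) is all that is needed to locate any block, using the b + i - 1 rule above. The sketch below shows that lookup and a simplified compaction that slides files towards block 0 to merge the free space. The names are hypothetical, and the compaction only recomputes the table; a real system must also copy each file's data to its new location.

```python
from __future__ import annotations


def block_address(start: int, length: int, i: int) -> int:
    """Disk block holding block i (counting from 1) of a contiguous file."""
    if not 1 <= i <= length:
        raise IndexError("block number outside the file")
    return start + i - 1


def compact(table: dict[str, tuple[int, int]]) -> dict[str, tuple[int, int]]:
    """Return a new table with files packed from block 0, keeping their order."""
    new_table: dict[str, tuple[int, int]] = {}
    next_free = 0
    # Visit files in order of their current start block.
    for name, (_, length) in sorted(table.items(), key=lambda item: item[1][0]):
        new_table[name] = (next_free, length)
        next_free += length
    return new_table


table = {"a": (0, 2), "b": (14, 3), "c": (19, 6)}
print(block_address(*table["b"], 3))  # 16
print(compact(table))  # {'a': (0, 2), 'b': (2, 3), 'c': (5, 6)}
```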
Linked/Chained Allocation
At the opposite extreme from contiguous allocation is chained allocation (Figure 13). Typically,
allocation is on an individual block basis. Each block contains a pointer to the next block in the
chain.
Again, the file allocation table needs just a single entry for each file, showing the starting block
and the length of the file.
Although pre-allocation is possible, it is more common simply to allocate blocks as needed. The
selection of blocks is now a simple matter: any free block can be added to a chain. There is no
external fragmentation to worry about because only one block at a time is needed. This type of
physical organisation is best suited to sequential files that are to be processed sequentially.
Selecting an individual block of a file requires tracing through the chain to the desired block.
One consequence of linking/chaining, as described so far, is that there is no accommodation of
the principle of locality. Thus, if it is necessary to bring in several blocks of a file at a time, as in
sequential processing, then a series of accesses to different parts of the disk are required. This is
perhaps a more significant effect on a single-user system but may also be of concern on a shared
system. To overcome this problem, some
systems periodically consolidate files.
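Chained allocation can be sketched by storing, for each block, the number of the next block in the file. In this minimal sketch a dictionary stands in for the pointers held inside the blocks, and `END` is a hypothetical end-of-chain marker. Reaching block i requires following i - 1 pointers, which is why direct access is slow, while extending a file needs only any free block.

```python
from __future__ import annotations

END = -1  # hypothetical marker stored in the last block of a chain


def read_chain(next_block: dict[int, int], start: int) -> list[int]:
    """Return the file's blocks in order by following the pointers from start."""
    blocks = []
    b = start
    while b != END:
        blocks.append(b)
        b = next_block[b]
    return blocks


def ith_block(next_block: dict[int, int], start: int, i: int) -> int:
    """Find block i (counting from 1) by tracing i - 1 pointers along the chain."""
    if i < 1:
        raise IndexError("block numbers start at 1")
    b = start
    for _ in range(i - 1):
        b = next_block[b]
        if b == END:
            raise IndexError("block number outside the file")
    return b


def append_block(next_block: dict[int, int], start: int, free_block: int) -> None:
    """Add any free block to the end of the chain."""
    b = start
    while next_block[b] != END:
        b = next_block[b]
    next_block[b] = free_block
    next_block[free_block] = END


chain = {9: 16, 16: 1, 1: 10, 10: 25, 25: END}
print(read_chain(chain, 9))    # [9, 16, 1, 10, 25]
print(ith_block(chain, 9, 4))  # 10
append_block(chain, 9, 3)
print(read_chain(chain, 9))    # [9, 16, 1, 10, 25, 3]
```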
Indexed Allocation
Indexed allocation addresses many of the problems of contiguous and linked/chained allocation.
In this case, the file allocation table contains a separate one-level index for each file; the index
has one entry for each portion allocated to the file. Typically, the file indexes are not physically
stored as part of the file allocation table. Rather, the file index for a file is kept in a separate
block and the entry for the file in the file allocation
table points to that block. Allocation may be on the basis of either fixed-size blocks (Figure 15) or
variable-size portions (Figure 16). Allocation by blocks eliminates external fragmentation,
whereas allocation by variable-size portions improves locality. In either case, file consolidation
may be done from time to time. File consolidation reduces the size of the index in the case of
variable-size portions, but not in the case of block allocation. Indexed allocation supports both
sequential and direct access to the file and thus is the most popular form of file allocation.
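Indexed allocation replaces chain-following with a lookup in the file's index. The sketch below shows both variants described above: with fixed-size blocks the index lists one physical block per logical block, while with variable-size portions each index entry is a (start block, length) pair that must be scanned. The function names and representations are hypothetical.

```python
from __future__ import annotations


def block_from_index(index: list[int], i: int) -> int:
    """Fixed-size blocks: entry i - 1 of the index is the disk block for block i."""
    if not 1 <= i <= len(index):
        raise IndexError("block number outside the file")
    return index[i - 1]


def block_from_portions(portions: list[tuple[int, int]], i: int) -> int:
    """Variable-size portions: scan (start, length) entries until block i is covered."""
    if i < 1:
        raise IndexError("block numbers start at 1")
    remaining = i
    for start, length in portions:
        if remaining <= length:
            return start + remaining - 1
        remaining -= length
    raise IndexError("block number outside the file")


print(block_from_index([1, 8, 3, 14, 28], 4))              # 14
print(block_from_portions([(1, 3), (28, 4), (14, 1)], 7))  # 31
```

Both variants reach any block without touching the blocks before it, which is what gives indexed allocation its direct-access support.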