L-2.3.1 File System Management
The file system is the most visible aspect of an operating system. It provides the mechanism
for on-line storage of and access to both data and programs of the operating system and all
the users of the computer system. The file system consists of two distinct parts: a collection
of files, each storing related data, and a directory structure, which organizes and provides
information about all the files in the system.
File Concept
Computers can store information on various storage media, such as magnetic disks, magnetic
tapes, and optical disks. So that the computer system will be convenient to use, the operating
system provides a uniform logical view of stored information. The operating system abstracts
from the physical properties of its storage devices to define a logical storage unit, the file.
Files are mapped by the operating system onto physical devices. These storage devices are
usually non-volatile, so the contents are persistent between system reboots.
From a user’s perspective, a file is the smallest allotment of logical secondary storage; that
is, data cannot be written to secondary storage unless they are within a file. Commonly, files
represent programs (both source and object forms) and data. Data files may be numeric,
alphabetic, alphanumeric, or binary. Files may be free form, such as text files, or may be
formatted rigidly. In general, a file is a sequence of bits, bytes, lines, or records, the meaning
of which is defined by the file’s creator and user.
Many different types of information may be stored in a file — source or executable programs,
numeric or text data, photos, music, video, and so on. A file has a certain defined structure,
which depends on its type.
File Attributes
A file’s attributes vary from one operating system to another but typically consist of these:
Name: The symbolic file name is the only information kept in human-readable form.
Identifier: This unique tag, usually a number, identifies the file within the file
system; it is the non-human-readable name for the file.
Type: This information is needed for systems that support different types of files.
Location: This information is a pointer to a device and to the location of the file on
that device.
Size: The current size of the file (in bytes, words, or blocks) and possibly the
maximum allowed size are included in this attribute.
Protection: Access-control information determines who can do reading, writing,
executing, and so on.
Time, date, and user identification: This information may be kept for creation, last
modification, and last use. These data can be useful for protection, security, and usage
monitoring.
Some newer file systems also support extended file attributes, including character encoding
of the file and security features such as a file checksum.
The information about all files is kept in the directory structure, which also resides on
secondary storage. Typically, a directory entry consists of the file’s name and its unique
identifier. The identifier in turn locates the other file attributes.
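As a rough illustration of the attributes listed above, the following Python sketch reads them back from the operating system with `os.stat`. The helper name `describe` and the file name `notes.txt` are made up for this example, and the exact fields available (for instance, whether the identifier is an inode number) depend on the platform.

```python
import os
import stat
import tempfile
import time

def describe(path):
    """Collect common file attributes for `path` (hypothetical helper)."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "identifier": info.st_ino,                  # inode number on Unix-like systems
        "type": "directory" if stat.S_ISDIR(info.st_mode) else "regular file",
        "size_bytes": info.st_size,
        "protection": stat.filemode(info.st_mode),  # permission string such as '-rw-------'
        "owner_uid": info.st_uid,
        "modified": time.ctime(info.st_mtime),
        "accessed": time.ctime(info.st_atime),
    }

# Create a small scratch file and print what the system records about it.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "notes.txt")
    with open(path, "w") as f:
        f.write("hello")
    attrs = describe(path)
    for key, value in attrs.items():
        print(f"{key:>12}: {value}")
```

Note that the name comes from the path (the directory entry), while the remaining attributes come from the record that the identifier locates.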
File Operations
Creating a file: Two steps are necessary to create a file. First, space in the file system
must be found for the file. Second, an entry for the new file must be made in the
directory.
Writing a file: To write a file, we make a system call specifying both the name of the
file and the information to be written to the file. Given the name of the file, the system
searches the directory to find the file’s location. The system must keep a write pointer
to the location in the file where the next write is to take place. The write pointer must
be updated whenever a write occurs.
Reading a file: To read from a file, we use a system call that specifies the name of the
file and where (in memory) the next block of the file should be put. Again, the
directory is searched for the associated entry, and the system needs to keep a read
pointer to the location in the file where the next read is to take place. Once the read
has taken place, the read pointer is updated. Because a process is usually either
reading from or writing to a file, the current operation location can be kept as a per-
process current-file-position pointer. Both the read and write operations use this same
pointer, saving space and reducing system complexity.
Repositioning within a file: The directory is searched for the appropriate entry, and
the current-file-position pointer is repositioned to a given value. Repositioning within
a file need not involve any actual I/O. This file operation is also known as a file seek.
Deleting a file: To delete a file, we search the directory for the named file. Having
found the associated directory entry, we release all file space, so that it can be reused
by other files, and erase the directory entry.
Truncating a file: The user may want to erase the contents of a file but keep its
attributes. Rather than forcing the user to delete the file and then recreate it, this
function allows all attributes to remain unchanged — except for file length — but lets
the file be reset to length zero and its file space released.
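The six operations above can be tried out with Python's low-level `os` functions, which are thin wrappers over the corresponding system calls. This is only a sketch: the file name `demo.txt` is arbitrary, and, unlike the description above, most real systems have the program open the file once and then pass a file descriptor rather than the file name to each call.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "demo.txt")

# Create: space is found for the file and a directory entry is made.
fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)

# Write: the current-file-position pointer advances by the bytes written.
os.write(fd, b"hello world")
after_write = os.lseek(fd, 0, os.SEEK_CUR)  # query the pointer without moving it

# Reposition (seek): only the pointer changes, no data is transferred.
os.lseek(fd, 6, os.SEEK_SET)

# Read: uses the same pointer as write and advances it.
chunk = os.read(fd, 5)

# Truncate: the file and its attributes remain, the length becomes zero.
os.ftruncate(fd, 0)
size_after_truncate = os.fstat(fd).st_size

os.close(fd)

# Delete: the directory entry is erased and the space can be reused.
os.remove(path)
exists_after_delete = os.path.exists(path)
os.rmdir(workdir)

print(after_write, chunk, size_after_truncate, exists_after_delete)
# prints: 11 b'world' 0 False
```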
File Types
When we design a file system — indeed, an entire operating system — we always consider
whether the operating system should recognize and support file types. If an operating system
recognizes the type of a file, it can then operate on the file in reasonable ways.
A common technique for implementing file types is to include the type as part of the file
name. The name is split into two parts — a name and an extension, usually separated by a
period.
In this way, the user and the operating system can tell from the name alone what the type of a
file is. Most operating systems allow users to specify a file name as a sequence of characters
followed by a period and terminated by an extension made up of additional characters.
Examples include resume.docx, server.c and ReaderThread.cpp.
The system uses the extension to indicate the type of the file and the type of operations that
can be done on that file. Only a file with a .com, .exe or .sh extension can be executed, for
instance. The .com and .exe files are two forms of binary executable files, whereas the .sh file
is a shell script containing, in ASCII format, commands to the operating system.
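A minimal sketch of extension-based typing follows. The table `KNOWN_TYPES` and the function `file_type` are hypothetical and cover only the extensions mentioned in the text; a real system would have a much larger table or use other mechanisms as well.

```python
import os.path

# Hypothetical mapping from extension to file type, limited to the
# extensions used as examples in the text.
KNOWN_TYPES = {
    ".c": "C source",
    ".cpp": "C++ source",
    ".docx": "word-processor document",
    ".com": "binary executable",
    ".exe": "binary executable",
    ".sh": "shell script",
}

def file_type(name):
    """Guess a file's type from the extension part of its name."""
    _base, extension = os.path.splitext(name)
    return KNOWN_TYPES.get(extension.lower(), "unknown")

for name in ["resume.docx", "server.c", "ReaderThread.cpp", "README"]:
    print(name, "->", file_type(name))
```

A name without an extension, such as `README`, falls through to "unknown", which shows the limit of relying on the name alone.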
File Access
Files store information. When it is used, this information must be accessed and read into
computer memory. The information in the file can be accessed in several ways.
Sequential Access
The simplest access method is sequential access. Information in the file is processed in order,
one record after the other. This mode of access is by far the most common; for example,
editors and compilers usually access files in this fashion.
Reads and writes make up the bulk of the operations on a file. A read operation, readnext(),
reads the next portion of the file and automatically advances a file pointer, which tracks
the I/O location. Similarly, the write operation, writenext(), appends to the end of the file
and advances the pointer to the end of the newly written material.
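The behaviour of the two sequential operations can be modelled in a few lines of Python. The class `SequentialFile` and its methods `read_next`, `write_next` and `reset` are illustrative stand-ins for readnext() and writenext(); they keep the records in memory and are not an actual file-system interface.

```python
class SequentialFile:
    """In-memory model of a sequentially accessed file (illustrative only)."""

    def __init__(self, records=None):
        self.records = list(records or [])
        self.position = 0  # file pointer: index of the next record to read

    def read_next(self):
        """Return the next record and advance the pointer, or None at end of file."""
        if self.position >= len(self.records):
            return None
        record = self.records[self.position]
        self.position += 1
        return record

    def write_next(self, record):
        """Append a record and move the pointer past the newly written material."""
        self.records.append(record)
        self.position = len(self.records)

    def reset(self):
        """Rewind the pointer to the beginning of the file."""
        self.position = 0


f = SequentialFile(["line 1", "line 2"])
first = f.read_next()          # "line 1"
f.write_next("line 3")         # appended, pointer now at end of file
at_end = f.read_next()         # None, because nothing follows the new record
f.reset()
all_records = [f.read_next() for _ in range(3)]
print(first, at_end, all_records)
```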
Direct Access
Another method is direct access (or relative access). Here, a file is made up of fixed-length
logical records that allow programs to read and write records rapidly in no particular order.
The direct-access method is based on a disk model of a file, since disks allow random access
to any file block. For direct access, the file is viewed as a numbered sequence of blocks or
records. Thus, we may read block 14, then read block 53, and then write block 7. There are
no restrictions on the order of reading or writing for a direct-access file.
Direct-access files are of great use for immediate access to large amounts of information.
Databases are often of this type. When a query concerning a particular subject arrives, we
compute which block contains the answer and then read that block directly to provide the
desired information.
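With fixed-length records, the byte offset of record n is simply n times the record size, so any record can be reached with a single seek. The sketch below assumes a record size of 16 bytes and uses an in-memory `io.BytesIO` object in place of a disk file; the helper names `write_record` and `read_record` are made up for the example.

```python
import io

RECORD_SIZE = 16  # bytes per logical record (arbitrary choice for this sketch)

def write_record(f, n, data):
    """Write `data` into record number n, padding it to the fixed length."""
    if len(data) > RECORD_SIZE:
        raise ValueError("record too large")
    f.seek(n * RECORD_SIZE)                 # jump straight to the record
    f.write(data.ljust(RECORD_SIZE, b"\x00"))

def read_record(f, n):
    """Read record number n and strip the padding."""
    f.seek(n * RECORD_SIZE)
    return f.read(RECORD_SIZE).rstrip(b"\x00")

# A "file" of 64 empty records, accessed in no particular order.
f = io.BytesIO(bytes(RECORD_SIZE * 64))
write_record(f, 53, b"record 53")
write_record(f, 14, b"record 14")
write_record(f, 7, b"record 7")
print(read_record(f, 14), read_record(f, 53), read_record(f, 7))
# prints: b'record 14' b'record 53' b'record 7'
```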
The indexed sequential access method is a modification of the direct-access method that
combines sequential and direct access: the file is first accessed directly, through an index,
and then read sequentially from that point.
This method requires maintaining an index, which contains pointers to the blocks of the file.
To find a record, we first search the index and then use the pointer it yields to access the
file. For large files the index itself may become too large to search efficiently, so a
hierarchy of indexes is built, in which a lookup in one index leads to a block of the
next-level index. The main advantage of this method is that the same file supports both direct
and sequential access.
IBM’s indexed sequential-access method (ISAM) uses a small master index that points to
disk blocks of a secondary index. The secondary index blocks point to the actual file blocks.
The file is kept sorted on a defined key. To find a particular item, we first make a binary
search of the master index, which provides the block number of the secondary index. This
block is read in, and again a binary search is used to find the block containing the desired
record. Finally, this block is searched sequentially. In this way, any record can be located
from its key by at most two direct-access reads. Figure 11.6 shows a similar situation as
implemented by VMS index and relative files.
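The two-level lookup described above can be sketched in Python with the standard `bisect` module. This is a simplified in-memory model, not IBM's actual ISAM format: the block size, the index fan-out, and the function names `build` and `lookup` are assumptions chosen for the example, and each list access stands in for one disk read.

```python
from bisect import bisect_left

def build(records, block_size=3, index_fanout=2):
    """Build data blocks, a secondary index and a master index from (key, value) pairs."""
    records = sorted(records)  # the file is kept sorted on the key
    data_blocks = [records[i:i + block_size]
                   for i in range(0, len(records), block_size)]
    # Each secondary entry: (highest key in a data block, data block number).
    entries = [(block[-1][0], n) for n, block in enumerate(data_blocks)]
    secondary = [entries[i:i + index_fanout]
                 for i in range(0, len(entries), index_fanout)]
    # Each master entry: (highest key in a secondary block, secondary block number).
    master = [(block[-1][0], n) for n, block in enumerate(secondary)]
    return master, secondary, data_blocks

def lookup(key, master, secondary, data_blocks):
    """Find the value for `key` using two index searches and one sequential scan."""
    i = bisect_left([k for k, _ in master], key)      # binary search of master index
    if i == len(master):
        return None                                   # key is beyond the last record
    index_block = secondary[master[i][1]]             # first direct-access read
    j = bisect_left([k for k, _ in index_block], key) # binary search of secondary block
    data_block = data_blocks[index_block[j][1]]       # second direct-access read
    for k, value in data_block:                       # sequential search of the block
        if k == key:
            return value
    return None

master, secondary, data_blocks = build([(k, f"rec{k}") for k in range(10, 100, 10)])
print(lookup(50, master, secondary, data_blocks))  # rec50
print(lookup(55, master, secondary, data_blocks))  # None
```

Storing the highest key of each block in the index means a binary search for the first entry not less than the target key always lands on the only block that could contain it.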
Next, we consider how to store files. Certainly, no general-purpose computer stores just one
file. There are typically thousands, millions, even billions of files within a computer. Files are
stored on random-access storage devices, including hard disks, optical disks, and solid-state
(memory-based) disks.
A storage device can be used in its entirety for a file system. It can also be subdivided for
finer-grained control. For example, a disk can be partitioned into quarters, and each quarter
can hold a separate file system. Storage devices can also be collected together into RAID sets
that provide protection from the failure of a single disk.
A file system can be created on each of these parts of the disk. Any entity containing a file
system is generally known as a volume.
Volumes can also store multiple operating systems, allowing a system to boot and run more
than one operating system.
Each volume that contains a file system must also contain information about the files in the
system. This information is kept in entries in a device directory or volume table of contents.
The device directory (more commonly known simply as the directory) records information —
such as name, location, size and type — for all files on that volume. Figure 11.7 shows a
typical file-system organization.
Directory Overview
The directory can be viewed as a symbol table that translates file names into their directory
entries. If we take such a view, we see that the directory itself can be organized in many
ways. The organization must allow us to insert entries, to delete entries, to search for a named
entry, and to list all the entries in the directory. In this section, we examine several schemes
for defining the logical structure of the directory system.
Single-Level Directory
The simplest directory structure is the single-level directory. All files are contained in the
same directory, which is easy to support and understand (Figure 11.9).
A single-level directory has significant limitations, however, when the number of files
increases or when the system has more than one user. Since all files are in the same directory,
they must have unique names. If two users call their data file test.txt, then the unique-name
rule is violated.
Even a single user on a single-level directory may find it difficult to remember the names of
all the files as the number of files increases. It is not uncommon for a user to have hundreds
of files on one computer system and an equal number of additional files on another system.
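Viewed as a symbol table, a single-level directory is essentially one dictionary from file names to directory entries. The class below is a minimal in-memory sketch; the class and method names and the integer identifiers are hypothetical. It supports the four required operations and shows the unique-name rule being violated.

```python
class SingleLevelDirectory:
    """One flat table that maps every file name to its identifier."""

    def __init__(self):
        self._entries = {}  # file name -> file identifier

    def insert(self, name, identifier):
        if name in self._entries:
            # All files share one directory, so names must be unique.
            raise FileExistsError(name)
        self._entries[name] = identifier

    def delete(self, name):
        del self._entries[name]

    def search(self, name):
        return self._entries.get(name)

    def list_entries(self):
        return sorted(self._entries)


directory = SingleLevelDirectory()
directory.insert("test.txt", 101)       # first user's data file
try:
    directory.insert("test.txt", 102)   # second user picks the same name
    collision = False
except FileExistsError:
    collision = True
print(collision, directory.search("test.txt"), directory.list_entries())
```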
Two-Level Directory
As we have seen, a single-level directory often leads to confusion of file names among
different users. The standard solution is to create a separate directory for each user. In the
two-level directory structure, each user has their own user file directory (UFD). The UFDs have
similar structures, but each lists only the files of a single user. When a user job starts or a user
logs in, the system’s master file directory (MFD) is searched. The MFD is indexed by user
name or account number, and each entry points to the UFD for that user.
Although the two-level directory structure solves the name-collision problem, it still has
disadvantages. This structure effectively isolates one user from another. Isolation is an
advantage when the users are completely independent but is a disadvantage when the users
want to cooperate on some task and to access one another’s files. Some systems simply do
not allow local user files to be accessed by other users.
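A two-level directory can be modelled as a dictionary of dictionaries: the MFD maps each user name to that user's UFD, and each UFD maps file names to identifiers. The sketch below uses hypothetical names (`TwoLevelDirectory`, `add_user`, `create`, `lookup`) and made-up identifiers.

```python
class TwoLevelDirectory:
    """Master file directory (MFD) whose entries point to per-user UFDs."""

    def __init__(self):
        self.mfd = {}  # user name -> UFD, where a UFD maps file name -> identifier

    def add_user(self, user):
        self.mfd.setdefault(user, {})

    def create(self, user, name, identifier):
        ufd = self.mfd[user]            # search the MFD for this user's UFD
        if name in ufd:
            raise FileExistsError(name) # names must be unique only within one UFD
        ufd[name] = identifier

    def lookup(self, user, name):
        # Only the named user's UFD is searched, so users are isolated.
        return self.mfd[user].get(name)


fs = TwoLevelDirectory()
fs.add_user("alice")
fs.add_user("bob")
fs.create("alice", "test.txt", 101)
fs.create("bob", "test.txt", 202)       # same name, no collision
print(fs.lookup("alice", "test.txt"), fs.lookup("bob", "test.txt"))
print(fs.lookup("bob", "report.txt"))   # None: bob cannot see files he does not own
```

The same isolation that removes name collisions also means a lookup never reaches another user's files unless the system adds a way to name the other user's directory explicitly.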
Tree-Structured Directories
Once we have seen how to view a two-level directory as a two-level tree, the natural
generalization is to extend the directory structure to a tree of arbitrary height. This
generalization allows users to create their own subdirectories and to organize their files
accordingly. A tree is the most common directory structure. The tree has a root directory, and
every file in the system has a unique path name.
In normal use, each process has a current directory. It should contain most of the files that are of current interest to the
process. When reference is made to a file, the current directory is searched. If a file is needed
that is not in the current directory, then the user usually must either specify a path name or
change the current directory to be the directory holding that file.
Path names can be of two types: absolute and relative. An absolute path name begins at the
root and follows a path down to the specified file, giving the directory names on the path.
A relative path name defines a path from the current directory. For example, in the tree-
structured file system of Figure 11.11, if the current directory is root/spell/mail, then the
relative path name prt/first refers to the same file as does the absolute path name
root/spell/mail/prt/first.
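The relationship between the two kinds of path name can be sketched with Python's `posixpath` module. The example assumes Unix-style notation, in which the root directory is written as `/`, so the text's root/spell/mail becomes `/spell/mail`; the function name `resolve` is made up for the illustration.

```python
import posixpath

def resolve(path, current_directory):
    """Turn a relative or absolute path name into a normalized absolute path name."""
    if posixpath.isabs(path):
        # An absolute path name already starts at the root.
        return posixpath.normpath(path)
    # A relative path name is interpreted starting from the current directory.
    return posixpath.normpath(posixpath.join(current_directory, path))

current = "/spell/mail"
print(resolve("prt/first", current))               # /spell/mail/prt/first
print(resolve("/spell/mail/prt/first", current))   # /spell/mail/prt/first
```

Both calls name the same file, which is the equivalence described in the example above.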
Acyclic-Graph Directories
Consider two programmers who are working on a joint project. The files associated with that
project can be stored in a subdirectory, separating them from other projects and files of the
two programmers. But since both programmers are equally responsible for the project, both
want the subdirectory to be in their own directories. In this situation, the common
subdirectory should be shared. A shared directory or file exists in the file system in two (or
more) places at once.
An acyclic graph — that is, a graph with no cycles — allows directories to share
subdirectories and files.
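Sharing can be illustrated with a small in-memory model in which two directories hold a reference to the same subdirectory object. The `Directory` class, its `link` and `reaches` methods, and the use of plain strings to stand for files are all assumptions of this sketch; real systems implement sharing with mechanisms such as links. The check in `link` refuses any entry that would create a cycle, which keeps the graph acyclic.

```python
class Directory:
    """A directory whose entries may be files (strings here) or other directories."""

    def __init__(self, name):
        self.name = name
        self.entries = {}

    def reaches(self, other):
        """Return True if `other` is this directory or can be reached from it."""
        if self is other:
            return True
        return any(isinstance(entry, Directory) and entry.reaches(other)
                   for entry in self.entries.values())

    def link(self, name, target):
        """Add an entry for `target`, refusing links that would form a cycle."""
        if isinstance(target, Directory) and target.reaches(self):
            raise ValueError("link would create a cycle")
        self.entries[name] = target


alice = Directory("alice")
bob = Directory("bob")
project = Directory("project")

alice.link("project", project)          # the same subdirectory appears
bob.link("project", project)            # in both programmers' directories
project.link("plan.txt", "file #17")    # a change made through one path...

shared = alice.entries["project"] is bob.entries["project"]
seen_by_bob = bob.entries["project"].entries["plan.txt"]  # ...is visible through the other
print(shared, seen_by_bob)
```

Because there is only one copy of the shared subdirectory, there is nothing to keep synchronized between the two programmers.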
Video Links:
https://www.youtube.com/watch?v=vqdTDdHyU5U
https://www.youtube.com/watch?v=xl8a2n63D5s