
File Management

1. A file can be defined as a data structure that stores a sequence of records and is a logical collection of information stored in a file system on a disk or in memory.
2. Files have attributes like name, identifier, type, location, size, permissions, and timestamps that provide metadata about the file.
3. Common file operations include create, open, write, read, seek, delete, truncate, close, append, and rename that allow manipulating the file contents and attributes.

Uploaded by

Yamini
Copyright
© All Rights Reserved

What is a File?

A file can be defined as a data structure that stores a sequence of
records. A file may also be defined as a logical collection of related
information. Files are stored in a file system, which may reside on a disk
or in main memory. Files can be simple (plain text) or complex
(specially formatted).
A collection of files is known as a directory. The directories at their
different levels, together with the files they hold, make up the file system.

Attributes of the File


1.Name
Every file carries a name by which it is recognized in the file system.
A directory cannot contain two files with the same name.
2.Identifier
Along with the name, each file has a unique identifier, usually a number,
by which the file system recognizes the file internally. Unlike the name,
it is not meant to be read by users.
3.Type
In a file system, files are classified into different types such as video
files, audio files, text files, executable files, etc. The type is often
indicated by the extension of the file name. For example, a text file has
the extension .txt and a video file can have the extension .mp4.
4.Location
In the file system, there are several locations in which files can be
stored. Each file carries its location as an attribute.
5.Size
The size of the file is one of its most important attributes. By the size
of the file, we mean the number of bytes the file occupies on storage.
6.Protection
The administrator of the computer may want different protection for
different files. Therefore each file carries its own set of permissions for
the different groups of users.
7.Time and Date
Every file carries a time stamp which contains the time and date at which
the file was last modified.
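The attributes above can be inspected from a program. The following is a minimal sketch, assuming a POSIX-like system where the identifier corresponds to the inode number; the helper name describe and the file notes.txt are hypothetical, and the path is used here only as a stand-in for the location attribute.

```python
import os
import stat
import tempfile
import time

def describe(path):
    """Collect a few of the attributes listed above for one file."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "identifier": info.st_ino,              # inode number on POSIX systems
        "type": "directory" if stat.S_ISDIR(info.st_mode) else "regular file",
        "location": os.path.abspath(path),      # path, as a stand-in for location
        "size": info.st_size,                   # size in bytes
        "protection": stat.filemode(info.st_mode),
        "modified": time.ctime(info.st_mtime),  # last modification time
    }

# Hypothetical example file, created only for this demonstration.
with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "notes.txt")
    with open(sample, "w") as handle:
        handle.write("hello")
    attributes = describe(sample)
    for key, value in attributes.items():
        print(f"{key:>10}: {value}")
```

The exact identifier, protection string and time stamp depend on the system on which the sketch is run.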

Operations on the File


A file is a collection of logically related data that is recorded on
secondary storage as a sequence of records. The content of a file is
defined by its creator. The various operations that can be performed on a
file, such as read, write, open and close, are called file operations.
The user performs these operations through the commands and system calls
provided by the operating system.
Some common operations are as follows:

1. Create operation:
This operation is used to create a file in the file system. To create a
new file of a particular type, the associated application program calls
the file system. The file system allocates space to the file. Since the
file system knows the format of the directory structure, it makes an entry
for the new file in the appropriate directory.
2. Open operation:
Once a file is created, it must be opened before any file processing
operation can be performed on it. To open a file, the user provides the
file name. The open system call is invoked and the file name is passed to
the file system, which locates the file.
3. Write operation:
This operation is used to write information into a file. A write system
call is issued that specifies the file and the data (with its length) to
be written. The file pointer is repositioned after the last byte written,
and the file length increases whenever the write extends past the previous
end of the file.
4. Read operation:
This operation reads the contents of a file. A read pointer is
maintained by the OS, pointing to the position up to which the data has
been read.
5. Re-position or Seek operation:
The seek system call re-positions the file pointer from the current
position to a specific place in the file, i.e. forward or backward
depending upon the user's requirement. This operation is generally used
with file management systems that support direct access files.
6. Delete operation:
Deleting a file removes the data stored inside it and also frees the disk
space it occupied. In order to delete the specified file, the directory is
searched. When the directory entry is located, all the associated file
space and the directory entry are released.
7. Truncate operation:
Truncating deletes the contents of a file but keeps its attributes. The
file itself is not deleted: its length is reset and the space holding the
old contents is released, so new information can be written in its place.
8. Close operation:
When the processing of the file is complete, it should be closed so that
all the changes are made permanent and all the resources occupied are
released. Closing deallocates all the internal descriptors that were
created when the file was opened.
9. Append operation:
This operation adds data to the end of the file.
10. Rename operation:
This operation is used to rename an existing file.
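The operations listed above map closely onto calls available in most languages. The following is a minimal sketch in Python, assuming a temporary directory and the hypothetical file names demo.txt and renamed.txt; it is one possible way to exercise each operation, not the only one.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.txt")       # hypothetical file name

    # Create + open + write: mode "wb" creates the file if it is missing.
    with open(path, "wb") as f:
        f.write(b"0123456789")                    # pointer ends after the last byte

    # Open + seek + read: move the file pointer, then read from there.
    with open(path, "rb") as f:
        f.seek(4)                                 # reposition to byte offset 4
        middle = f.read(3)                        # reads bytes 4, 5 and 6
        position = f.tell()                       # read pointer is now at 7

    # Append: mode "ab" always writes at the end of the file.
    with open(path, "ab") as f:
        f.write(b"AB")
    size_after_append = os.path.getsize(path)

    # Truncate: drop the contents but keep the file and its directory entry.
    os.truncate(path, 0)
    size_after_truncate = os.path.getsize(path)

    # Rename, then delete (the directory entry and the space are released).
    renamed = os.path.join(folder, "renamed.txt")
    os.rename(path, renamed)
    os.remove(renamed)
    still_exists = os.path.exists(renamed)

print(middle, position, size_after_append, size_after_truncate, still_exists)
```

Closing happens implicitly at the end of each with block, which is where buffered changes are flushed and the descriptor is released.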

File Type
File type refers to the ability of the operating system to distinguish
different types of files, such as text files, source files and binary files.
Many operating systems support many types of files. Operating systems
like MS-DOS and UNIX have the following types of files:
Ordinary files
• These are the files that contain user information.
• These may contain text, databases or executable programs.
• The user can apply various operations to such files, such as adding
to, modifying or deleting their contents, or removing the entire file.

Simple file types

• Executable files
• Object files
• Source code files
• Text files
• Database files
• Library files
• Backup files
• Multimedia and other files
Directory files
• These files contain a list of file names and other information
related to these files.
Special files
• These files are also known as device files.
• These files represent physical devices like disks, terminals,
printers, networks, tape drives, etc.
These files are of two types:
• Character special files: data is handled character by
character, as in the case of terminals or printers.
• Block special files: data is handled in blocks, as in the case
of disks and tapes.

File Access Methods


Let's look at various ways to access files stored in secondary memory.
Sequential Access

Sequential access is the most common access method: most files are read
or written in order, from the beginning to the end.
In sequential access, the OS reads the file word by word. A pointer is
maintained which initially points to the base address of the file. If the
user wants to read the first word of the file, the pointer provides that
word to the user and then advances by one word. This process
continues till the end of the file.
Modern systems do provide direct access and indexed access, but the most
used method is sequential access, because most files, such as text files,
audio files and video files, are naturally processed in sequence.

Direct Access
Direct access is mostly required in the case of database systems.
In most cases, we need filtered information from the database, and
sequential access can be very slow and inefficient in such cases.
Suppose every block of the storage stores 4 records and we know
that the record we need is stored in the 10th block. In that case,
sequential access is a poor choice because it would traverse all the
preceding blocks in order to reach the needed record.
Direct access gives the required result immediately, although the
operating system has to perform some extra work, such as determining the
desired block number. It is generally used in database applications.
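The example above (4 records per block, target record in the 10th block) can be sketched as follows. This is only an illustration: the 16-byte record size, the in-memory file and the helper names block_of and read_direct are assumptions, not part of the original notes.

```python
import io

RECORD_SIZE = 16          # bytes per record (assumed for illustration)
RECORDS_PER_BLOCK = 4     # taken from the example in the text

def block_of(record_index):
    """1-based block number that holds the given 0-based record."""
    return record_index // RECORDS_PER_BLOCK + 1

def read_direct(f, record_index):
    """Jump straight to the record instead of scanning from the start."""
    f.seek(record_index * RECORD_SIZE)
    return f.read(RECORD_SIZE)

# Build an in-memory "file" of 48 fixed-size records.
storage = io.BytesIO()
for i in range(48):
    storage.write(f"record-{i:04d}".encode().ljust(RECORD_SIZE, b"."))

target = 37                                   # this record lives in the 10th block
record = read_direct(storage, target)
print(block_of(target), record)
```

Because every record has the same size, the position is found by arithmetic alone, which is what makes direct access cheap compared with reading the 36 records that precede the target.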
Indexed Access
An index can be built over the records of a file. A particular
record can then be accessed through its index entry, which is essentially
a pointer to (the address of) that record in the file.
With indexed access, searching in a large database becomes very quick and
easy, but some extra space is needed in memory to store the index.
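A minimal sketch of indexed access follows, assuming variable-length records and a hypothetical in-memory index that maps each key to the offset and length of its record. The sample keys and payloads are invented for illustration.

```python
import io

# Variable-length records, so positions cannot be computed by arithmetic.
records = {
    "alice": b"Alice,HR",
    "bob": b"Bob,Engineering",
    "carol": b"Carol,Ops",
}

data = io.BytesIO()
index = {}                                    # key -> (offset, length)
for key, payload in records.items():
    index[key] = (data.tell(), len(payload))  # remember where the record starts
    data.write(payload)

def lookup(key):
    """Use the index to seek directly to one record."""
    offset, length = index[key]
    data.seek(offset)
    return data.read(length)

print(index["bob"], lookup("bob"))
```

The dictionary is the extra space mentioned above: it grows with the number of records, in exchange for avoiding a scan of the data.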

File Allocation Methods


The allocation methods define how files are stored in the disk blocks.
There are three main disk space or file allocation methods:
• Contiguous Allocation
• Linked Allocation
• Indexed Allocation
The main idea behind these methods is to provide:
• Efficient disk space utilization.
• Fast access to the file blocks.
All three methods have their own advantages and disadvantages, as
discussed below:
1. Contiguous Allocation
In this scheme, each file occupies a contiguous set of blocks on the disk.
For example, if a file requires n blocks and is given block b as the
starting location, then the blocks assigned to the file will be b,
b+1, b+2, ..., b+n-1. This means that given the starting block address
and the length of the file (in terms of blocks required), we can determine
the blocks occupied by the file.
The directory entry for a file with contiguous allocation contains:
• Address of the starting block
• Length of the allocated portion
The file ‘mail’ in the following figure starts at block 19 with length
= 6 blocks. Therefore, it occupies blocks 19, 20, 21, 22, 23 and 24.
Advantages:
• Both sequential and direct access are supported.
For direct access, the address of the kth block of a file which
starts at block b (counting k from 0) can easily be obtained as b+k.
• Access is extremely fast, since the number of seeks is minimal
because the file blocks are contiguous.
Disadvantages:
• This method suffers from both internal and external
fragmentation. This makes it inefficient in terms of disk space
utilization.
• Increasing the file size is difficult because it depends on
contiguous free space being available at that particular moment.
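The arithmetic behind contiguous allocation, and the external fragmentation problem, can be sketched as follows. The function names and the small free map are hypothetical; only the ‘mail’ example (start 19, length 6) comes from the text.

```python
def contiguous_blocks(start, length):
    """All disk blocks of a file stored contiguously."""
    return list(range(start, start + length))

def kth_block(start, length, k):
    """Disk address of logical block k (counted from 0)."""
    if not 0 <= k < length:
        raise IndexError("block outside the file")
    return start + k

def first_fit(free, n):
    """Start of the first run of n free blocks, or None if no run is long enough."""
    run = 0
    for i, is_free in enumerate(free):
        run = run + 1 if is_free else 0
        if run == n:
            return i - n + 1
    return None

mail = contiguous_blocks(19, 6)               # the 'mail' file from the text
print(mail, kth_block(19, 6, 3))

# Assumed free map: five blocks are free in total, but never three in a row.
free_map = [True, True, False, True, True, False, True]
print(first_fit(free_map, 2), first_fit(free_map, 3))
```

The last line illustrates external fragmentation: a request for 3 blocks fails even though 5 blocks are free, because the free blocks are not adjacent.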

2. Linked List Allocation


In this scheme, each file is a linked list of disk blocks which need not
be contiguous. The disk blocks can be scattered anywhere on the disk.
The directory entry contains a pointer to the starting and the ending file
block. Each block contains a pointer to the next block occupied by
the file.
The file ‘jeep’ in the following image shows how the blocks are randomly
distributed. The last block (25) contains -1, indicating a null pointer:
it does not point to any other block.

Advantages:
• This is very flexible in terms of file size. File size can be
increased easily since the system does not have to look for a
contiguous chunk of disk space.
• This method does not suffer from external fragmentation. This
makes it relatively better in terms of disk space utilization.
Disadvantages:
• Because the file blocks are distributed randomly on the disk, a
large number of seeks are needed to access every block
individually. This makes linked allocation slower.
• It does not support random or direct access. We cannot directly
access the blocks of a file. Block k of a file can be accessed only by
traversing k blocks sequentially (sequential access) from the
starting block of the file via the block pointers.
• The pointers required in linked allocation incur some extra
overhead.
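A minimal sketch of following the block pointers is shown below. Only the fact that block 25 is the last block and holds -1 comes from the text; the other block numbers in the chain are assumed for illustration, as are the function names.

```python
# next_block[b] is the pointer stored in disk block b; -1 marks the end.
# Assumed chain for illustration; only "25 is last" is given in the text.
next_block = {9: 16, 16: 1, 1: 10, 10: 25, 25: -1}

def chain(start):
    """List every block of the file by following the pointers."""
    blocks = []
    current = start
    while current != -1:
        blocks.append(current)
        current = next_block[current]
    return blocks

def kth_block(start, k):
    """Reach logical block k (counted from 0) by following k pointers."""
    current = start
    for _ in range(k):
        current = next_block[current]
        if current == -1:
            raise IndexError("block outside the file")
    return current

print(chain(9), kth_block(9, 3))
```

Reaching block k costs k pointer reads, each of which may be a separate disk access, which is why direct access is impractical with this scheme.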
3. Indexed Allocation
In this scheme, a special block known as the index block contains the
pointers to all the blocks occupied by a file. Each file has its own index
block. The ith entry in the index block contains the disk address of the ith
file block. The directory entry contains the address of the index block as
shown in the image:

Advantages:
• This supports direct access to the blocks occupied by the file and
therefore provides fast access to the file blocks.
• It overcomes the problem of external fragmentation.
Disadvantages:
• The pointer overhead for indexed allocation is greater than for linked
allocation.
• For very small files, say files that span only 2-3 blocks,
indexed allocation keeps one entire block (the index block) for
the pointers, which is inefficient in terms of disk space utilization.
In linked allocation, by contrast, we lose the space of only 1 pointer
per block.
For files that are very large, a single index block may not be able to hold
all the pointers.
The following mechanisms can be used to resolve this:
1. Linked scheme: This scheme links two or more index blocks
together for holding the pointers. Every index block then
contains a pointer to (the address of) the next index block.
2. Multilevel index: In this policy, a first-level index block is used
to point to second-level index blocks, which in turn point to
the disk blocks occupied by the file. This can be extended to 3 or
more levels depending on the maximum file size.
3. Combined scheme: In this scheme, a special block called
the inode (index node) contains the information
about the file, such as its size, owner and permissions (in UNIX the
file name itself is kept in the directory entry), and the
remaining space of the inode is used to store the disk block
addresses which contain the actual file, as shown in the image
below. The first few of these pointers point to direct
blocks, i.e. the pointers contain the addresses of the disk blocks
that contain data of the file. The next few pointers point to
indirect blocks. Indirect blocks may be single indirect, double
indirect or triple indirect. A single indirect block is a disk block
that does not contain file data but the disk addresses of the
blocks that contain the file data. Similarly, double indirect
blocks do not contain file data but the disk addresses of the
blocks that contain the addresses of the blocks containing the file
data.
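The combined scheme can be sketched as a lookup that decides which kind of inode pointer covers a given logical block. The values of 12 direct pointers and 256 pointers per index block are assumptions chosen for illustration (real file systems differ), and the function names are hypothetical.

```python
def locate(logical_block, n_direct=12, ptrs_per_block=256):
    """Name the pointer level that covers a logical block (counted from 0).

    n_direct and ptrs_per_block are assumed values, not taken from the text.
    """
    single = ptrs_per_block
    double = ptrs_per_block ** 2
    triple = ptrs_per_block ** 3
    if logical_block < n_direct:
        return "direct"
    logical_block -= n_direct
    if logical_block < single:
        return "single indirect"
    logical_block -= single
    if logical_block < double:
        return "double indirect"
    logical_block -= double
    if logical_block < triple:
        return "triple indirect"
    raise ValueError("file too large for this inode layout")

def max_blocks(n_direct=12, ptrs_per_block=256):
    """Largest file, in blocks, that this inode layout can address."""
    return n_direct + ptrs_per_block + ptrs_per_block ** 2 + ptrs_per_block ** 3

for block in (0, 12, 268, 65804):
    print(block, locate(block))
print(max_blocks())
```

Small files are served entirely by the direct pointers, so they pay no index-block overhead, while large files fall through to the indirect levels.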

Free space management in Operating System


Free space management is a critical aspect of operating systems, as it
involves managing the available storage space on the hard disk or other
secondary storage devices. How free space is consumed depends on the
allocation method in use, so the commonly used allocation techniques are
recalled here first:
1. Linked Allocation: In this technique, each file is represented by a
linked list of disk blocks. When a file is created, the operating
system takes free blocks from anywhere on the disk and links the
blocks of the file to form a chain. This method is simple to
implement, but the blocks of a file become scattered and part of
every block is spent on a pointer.
2. Contiguous Allocation: In this technique, each file is stored as a
contiguous run of disk blocks. When a file is created, the
operating system finds a contiguous run of free blocks and
assigns it to the file. This method gives fast access because the
blocks are adjacent, but it suffers from the problem of external
fragmentation.
3. Indexed Allocation: In this technique, a separate index block is
used to store the addresses of all the disk blocks that make up a
file. When a file is created, the operating system creates an
index block and stores the addresses of all the blocks in the file.
This method avoids external fragmentation at the cost of the space
taken by the index block.
4. File Allocation Table (FAT): In this technique, the operating
system uses a file allocation table to keep track of the location
of each file on the disk. When a file is created, the operating
system updates the file allocation table with the addresses of the
disk blocks that make up the file. This method was used by MS-DOS
and early versions of Microsoft Windows and is still common on
removable media.

Overall, free space management is a crucial function of operating
systems, as it ensures that storage devices are utilized efficiently and
effectively.
The system keeps track of the free disk blocks so that it can allocate
space to files when they are created. Free space management is also
needed to reuse the space released when files are deleted.
The system maintains a free space list which keeps track of the disk
blocks that are not allocated to any file or directory. The free space
list can be implemented mainly as:
1. Bitmap or Bit vector: A bitmap or bit vector is a series or
collection of bits where each bit corresponds to a disk block.
The bit can take two values, 0 and 1: 0 indicates that the block
is allocated and 1 indicates a free block. The given instance of
disk blocks on the disk in Figure 1 (where green blocks are
allocated) can be represented by a bitmap of 16 bits
as: 0000111000000110.
Advantages:
• Simple to understand.
• Finding the first free block is efficient. It requires
scanning the words (a group of 8 bits) in a bitmap for a
non-zero word. (A 0-valued word has all bits 0.) The
first free block is then found by scanning for the first 1
bit in the non-zero word.
2. Linked List: In this approach, the free disk blocks are linked
together, i.e. a free block contains a pointer to the next free
block. The block number of the very first free disk block is stored
at a separate location on disk and is also cached in memory.
In Figure 2, the free space list head points to Block 5, which
points to Block 6, the next free block, and so on. The last free
block contains a null pointer indicating the end of the free list.
A drawback of this method is the I/O required for free space list
traversal.
3. Grouping: This approach stores the addresses of free
blocks in the first free block. The first free block stores the
addresses of some, say n, free blocks. Out of these n blocks, the
first n-1 blocks are actually free and the last block contains the
addresses of the next n free blocks. An advantage of this approach
is that the addresses of a group of free disk blocks can be found
easily.
4. Counting: This approach stores the address of the first free
disk block and the number n of free contiguous disk blocks
that follow the first block. Every entry in the list
contains:
• Address of the first free disk block
• A number n
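The bitmap and counting representations can be sketched as follows, using the 16-bit example from the text (1 = free, 0 = allocated). The function names are hypothetical, and the bitmap is held as a string of characters purely for readability; a real implementation would use machine words.

```python
bitmap = "0000111000000110"       # 1 = free, 0 = allocated, as in the text

def first_free(bits, word_size=8):
    """Scan word by word, skipping all-zero words, then find the first 1 bit."""
    for start in range(0, len(bits), word_size):
        word = bits[start:start + word_size]
        if int(word, 2) != 0:                 # a non-zero word has a free block
            return start + word.index("1")
    return None                               # no free block at all

def free_runs(bits):
    """Counting representation: (first free block, run length) pairs."""
    runs = []
    start = None
    for i, bit in enumerate(bits):
        if bit == "1" and start is None:
            start = i                         # a run of free blocks begins
        elif bit == "0" and start is not None:
            runs.append((start, i - start))   # the run just ended
            start = None
    if start is not None:                     # run reaching the end of the disk
        runs.append((start, len(bits) - start))
    return runs

print(first_free(bitmap), free_runs(bitmap))
```

For this bitmap the counting list needs only two entries to describe five free blocks, which shows why counting is compact when free space comes in contiguous runs.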
Here are some advantages and disadvantages of the space management
techniques discussed above:
Advantages:
1. Efficient use of storage space: Free space management
techniques help to optimize the use of storage space on the hard
disk or other secondary storage devices.
2. Easy to implement: Some techniques, such as linked allocation,
are simple to implement and require little overhead in terms of
processing and memory resources.
3. Faster access to files: Techniques such as contiguous allocation
keep the blocks of a file together, which reduces seeks and
improves access time.
Disadvantages:
1. Fragmentation: Techniques such as linked allocation scatter the
blocks of a file across the disk, which can decrease the efficiency
of storage devices.
2. Overhead: Some techniques, such as indexed allocation, require
additional overhead in terms of memory and processing
resources to maintain index blocks.
3. Limited scalability: Some techniques, such as FAT, have limited
scalability in terms of the number and size of files that can be
stored on the disk.
4. Risk of data loss: In some cases, such as with linked
allocation, a damaged pointer can make the rest of a file
difficult to recover.
Overall, the choice of technique depends
on the specific requirements of the operating system and the
storage devices being used. While some techniques may offer
advantages in terms of efficiency and speed, they may also have
limitations and drawbacks that need to be considered.

Directory Structure
What is a directory?
A directory can be defined as a listing of related files on the disk. The
directory may store some or all of the file attributes.
To get the benefit of different file systems on different operating
systems, a hard disk can be divided into a number of partitions of
different sizes. The partitions are also called volumes or mini disks.
Each partition must have at least one directory in which all the files of
the partition can be listed. A directory entry is maintained for each file
in the directory, and it stores all the information related to that file.

A directory can be viewed as a file which contains the metadata of a
group of files.
Every directory supports a number of common operations on its files:
1. File Creation
2. Search for the file
3. File deletion
4. Renaming the file
5. Traversing Files
6. Listing of files

Single Level Directory


The simplest method is to have one big list of all the files on the disk. The
entire system contains only one directory, which lists
all the files present in the file system. The directory contains one
entry for each file present on the file system.

This type of directory can be used for a simple system.


Advantages
1. Implementation is very simple.
2. If the number of files is small, searching is fast.
3. File creation, searching and deletion are very simple since we have only
one directory.
Disadvantages
1. We cannot have two files with the same name.
2. The directory may be very big, so searching for a file may
take a long time.
3. Protection cannot be implemented for multiple users.
4. There is no way to group files of the same kind.
5. Choosing a unique name for every file is difficult and limits
the number of files in the system, because most operating
systems limit the number of characters used to construct the file
name.

Two Level Directory


In two level directory systems, we can create a separate directory for
each user. There is one master directory which contains separate
directories dedicated to each user. For each user, there is a different
directory present at the second level, containing that user's files. The
system doesn't let a user enter another user's directory without
permission.

Characteristics of two level directory system


1. Each file has a path name of the form /user-name/file-name.
2. Different users can have files with the same name.
3. Searching becomes more efficient as only one user's list needs to be
traversed.
4. Files of the same kind cannot be grouped into a single directory for
a particular user.
The operating system keeps track of the present directory name (the
present user name), for example in a variable such as PWD, so that
searching can be done in the appropriate directory.
Tree Structured Directory
In a tree structured directory system, any directory entry can be either a
file or a sub directory. The tree structured directory system overcomes the
drawbacks of the two level directory system. Files of a similar kind can now
be grouped in one directory.
Each user has their own directory and cannot enter another user's
directory. A user has permission to read the root's data
but cannot write or modify it. Only the administrator of the system has
complete access to the root directory.
Searching is more efficient in this directory structure. The concept of
a current working directory is used. A file can be accessed by two types of
path, either relative or absolute.
An absolute path is the path of the file with respect to the root directory
of the system, while a relative path is the path with respect to the current
working directory. In tree structured directory systems, the
user is given the privilege to create files as well as directories.
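The relation between absolute and relative paths can be sketched with Python's posixpath module, which manipulates UNIX-style path strings without touching the disk. The directory /home/alice and the file names are hypothetical.

```python
import posixpath

cwd = "/home/alice"                      # hypothetical current working directory
relative = "docs/report.txt"             # hypothetical relative path

# Relative -> absolute: resolve the path against the working directory.
absolute = posixpath.normpath(posixpath.join(cwd, relative))

# Absolute -> relative: express the path with respect to the working directory.
back = posixpath.relpath(absolute, start=cwd)

# ".." climbs one level toward the root before descending again.
sibling = posixpath.normpath(posixpath.join(cwd, "../bob/todo.txt"))

print(absolute, back, sibling)
print(posixpath.isabs(absolute), posixpath.isabs(relative))
```

An absolute path always starts at the root, so it means the same thing from anywhere, whereas the relative path is only meaningful together with the current working directory.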

Permissions on the file and directory


A tree structured directory system may consist of various levels; therefore
a set of permissions is assigned to each file and directory.
The permissions are R, W and X, which stand for reading, writing and
executing the file or directory. The permissions are assigned to three
types of users: owner, group and others.
An identification character differentiates between a directory and a file.
In a UNIX-style listing, it is d for a directory and a hyphen (-) for a
regular file.
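The permission string described above can be reproduced with Python's standard stat module. The octal modes 754 and 755 are assumed example values, not taken from the text.

```python
import stat

# Assumed example modes: a regular file with 754 and a directory with 755.
file_mode = stat.S_IFREG | 0o754
dir_mode = stat.S_IFDIR | 0o755

file_string = stat.filemode(file_mode)   # type character, then owner/group/others
dir_string = stat.filemode(dir_mode)

# Individual permission bits can be tested with the S_I* masks.
group_can_write = bool(file_mode & stat.S_IWGRP)
others_can_read = bool(file_mode & stat.S_IROTH)

print(file_string, dir_string, group_can_write, others_can_read)
```

The first character is the type indicator, and the remaining nine characters are the R, W and X permissions for the owner, the group and others, in that order.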

Acyclic-Graph Structured Directories


The tree structured directory system doesn't allow the same file to exist in
multiple directories, so sharing is a major concern in a tree structured
directory system. We can provide sharing by making the directory an
acyclic graph. In this system, two or more directory entries can point to
the same file or sub directory. That file or sub directory is shared
between the directory entries.
These kinds of directory graphs can be made using links or aliases. We
can have multiple paths to the same file. Links can be either symbolic
(logical, also called soft) links or hard (physical) links.
If a file gets deleted in an acyclic graph structured directory system, then
1. In the case of a soft link, the file just gets deleted and we are left
with a dangling pointer.
2. In the case of a hard link, the actual file is deleted only when all the
references to it have been deleted.
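The difference between the two kinds of link on deletion can be observed with a short experiment. This is a sketch that assumes a POSIX-like system where the process is allowed to create hard and symbolic links; the file names are hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    original = os.path.join(folder, "data.txt")   # hypothetical file names
    hard = os.path.join(folder, "hard.txt")
    soft = os.path.join(folder, "soft.txt")

    with open(original, "w") as f:
        f.write("shared")

    os.link(original, hard)        # hard link: second directory entry, same file
    os.symlink(original, soft)     # symbolic link: stores the path of the target
    links_before = os.stat(original).st_nlink     # reference count of the file

    os.remove(original)            # delete one reference only

    with open(hard) as f:
        hard_content = f.read()    # data survives through the hard link
    soft_is_link = os.path.islink(soft)           # the link itself still exists
    soft_resolves = os.path.exists(soft)          # but its target is gone

print(links_before, hard_content, soft_is_link, soft_resolves)
```

After the original name is removed, the hard link still reaches the data because the reference count has not dropped to zero, while the symbolic link is left dangling.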
General-graph directory
This is an extension of the acyclic-graph directory. In the general-graph
directory, cycles are allowed: following links from a directory can lead
back to that same directory.

In the accompanying image, a cycle is formed in the user 2
directory. Although this provides greater flexibility, the structure is
complex to implement.
Advantages of General-graph directory
• Compared to the others, the general-graph directory structure
is more flexible.
• Cycles are allowed in the directory structure.
Disadvantages of General-graph directory
• It costs more than the alternatives: traversal and search must
take care not to loop forever around a cycle.
• Garbage collection is needed to find and reclaim files that can no
longer be reached, since a cycle can keep a reference count above zero.

Directory Implementation in Operating System


Directory implementation in the operating system can be done using
a singly linked list or a hash table. The efficiency, reliability, and
performance of a file system are greatly affected by the selection of
directory-allocation and directory-management algorithms. There are
numerous ways in which directories can be implemented, so we need
to choose an appropriate directory implementation algorithm that
enhances the performance of the system.
Directory Implementation using Singly Linked List
The implementation of directories using a singly linked list is easy to
program but time-consuming to execute. Here we implement a
directory by using a linear list of file names with pointers to the data
blocks.

(Figure: Directory implementation using a singly linked list)

• To create a new file, the entire list has to be checked to make sure
that no file with the new name already exists.
• The new entry can then be added at the end of the list or at
the beginning of the list.
• In order to delete a file, we first search the directory for the
name of the file to be deleted. After finding it, we delete the
file by releasing the space allocated to it.
• To reuse the directory entry, we can mark the entry as unused or
attach it to a list of free directory entries.
• Using a linked list also reduces the time needed to delete a file,
since an entry can be unlinked without moving the other entries.
Disadvantage
The main disadvantage of using a linked list is that finding a file
requires a linear search. Directory
information is used quite frequently, and a linked list implementation
results in slow access to a file. Many operating systems therefore maintain
a cache to store the most recently used directory information.
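The steps above can be sketched as a small singly linked list of directory entries. The class names, the first_block field and the sample file names are hypothetical; the point is only to show the linear search that every operation depends on.

```python
class Entry:
    """One directory entry: a file name plus a pointer to its data blocks."""
    def __init__(self, name, first_block, next_entry=None):
        self.name = name
        self.first_block = first_block
        self.next = next_entry

class LinkedDirectory:
    def __init__(self):
        self.head = None

    def find(self, name):
        """Linear search; also report how many entries were inspected."""
        node, steps = self.head, 0
        while node is not None:
            steps += 1
            if node.name == name:
                return node, steps
            node = node.next
        return None, steps

    def create(self, name, first_block):
        node, _ = self.find(name)             # whole list checked for duplicates
        if node is not None:
            raise FileExistsError(name)
        self.head = Entry(name, first_block, self.head)   # add at the beginning

    def delete(self, name):
        prev, node = None, self.head
        while node is not None and node.name != name:
            prev, node = node, node.next
        if node is None:
            raise FileNotFoundError(name)
        if prev is None:
            self.head = node.next             # unlink the first entry
        else:
            prev.next = node.next             # unlink without moving others

directory = LinkedDirectory()
for i, name in enumerate(["a.txt", "b.txt", "c.txt"]):
    directory.create(name, first_block=10 + i)
found, steps = directory.find("a.txt")
print(found.first_block, steps)
```

Because new entries are added at the head, the oldest file is found last: the search cost grows with the number of entries, which is the disadvantage noted above.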
Directory Implementation using Hash Table
An alternative data structure that can be used for directory
implementation is a hash table. It overcomes the major drawbacks of
directory implementation using a linked list. In this method, the linked
list still stores the directory entries, but a hash table is used in
combination with it to locate them.
For each file in the directory, a key-value pair is stored in the hash
table. The hash function applied to the file name determines the key, and
this key points to the corresponding entry stored in the directory. This
method greatly decreases the directory search time, as the entire list
is not searched on every operation. The hash table entries are checked
using the key, and when the file is found it is fetched.
(Figure: Directory implementation using a hash table)
Disadvantage:
The major drawback of using a hash table is that it generally has a
fixed size and that the hash function depends on that size. Collisions,
where two file names hash to the same location, must also be handled.
Even so, this method is usually faster than a linear search through an
entire directory using a linked list.
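A minimal sketch of a hashed directory follows, assuming a fixed number of buckets and collision handling by chaining. The hash function (sum of the name's bytes modulo the table size) is a deliberately simple assumption for illustration, and the class and file names are hypothetical.

```python
class HashedDirectory:
    def __init__(self, buckets=8):
        # Fixed-size table: each bucket chains the entries that hash to it.
        self.table = [[] for _ in range(buckets)]

    def _bucket(self, name):
        # Assumed toy hash function; it depends on the table size.
        return sum(name.encode()) % len(self.table)

    def create(self, name, first_block):
        bucket = self.table[self._bucket(name)]
        if any(existing == name for existing, _ in bucket):
            raise FileExistsError(name)
        bucket.append((name, first_block))

    def lookup(self, name):
        # Only one bucket is searched, not the whole directory.
        for existing, first_block in self.table[self._bucket(name)]:
            if existing == name:
                return first_block
        return None

directory = HashedDirectory()
directory.create("ab", 19)
directory.create("ba", 42)        # same byte sum as "ab", so a collision
directory.create("notes.txt", 7)
print(directory.lookup("ab"), directory.lookup("ba"), directory.lookup("missing"))
```

The two colliding names share a bucket but remain retrievable through the chain. Changing the number of buckets changes where every name lands, which is the size dependency mentioned above.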
