File System Implementation
File Concept
File Attributes
Different OSes keep track of different file attributes, including:
• Name - Some systems give special significance to names, and particularly to extensions ( .exe, .txt, etc. ), and some do not. Some extensions are significant to the OS ( .exe ), and others only to certain applications ( .jpg ).
• Identifier ( e.g. inode number )
• Type - Text, executable, other binary, etc.
• Location - Where the file is stored on the hard drive.
• Size
• Protection
• Time & Date
• User ID
File Operations
The file ADT supports many common operations:
• Creating a file
• Writing a file
• Reading a file
• Repositioning within a file
• Deleting a file
• Truncating a file.
Most OSes require that files be opened before access and closed after all access is complete. Normally the
programmer must open and close files explicitly, but some rare systems open the file automatically at first
access. Information about currently open files is stored in an open file table, containing for example:
• File pointer - Records the current position in the file, for the next read or write access.
• File-open count - How many times has the file been opened ( simultaneously by different
processes ) and not yet closed? When this counter reaches zero the file can be removed from the table.
• Disk location of the file.
• Access rights
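The role of the file-open count can be illustrated with a small sketch. This is only a toy model under stated assumptions: the class names `OpenFileEntry` and `OpenFileTable`, the file name, and the block number are all hypothetical, and a real kernel keeps far more state per entry.

```python
from dataclasses import dataclass


@dataclass
class OpenFileEntry:
    """One row of a hypothetical open file table."""
    disk_location: int     # where the file lives on disk (illustrative value)
    access_rights: str     # e.g. "r" or "rw"
    file_pointer: int = 0  # current position for the next read or write
    open_count: int = 0    # opens that have not yet been matched by a close


class OpenFileTable:
    def __init__(self):
        self.entries = {}  # file name -> OpenFileEntry

    def open(self, name, disk_location, access_rights="r"):
        entry = self.entries.get(name)
        if entry is None:
            # First open: create the table entry.
            entry = OpenFileEntry(disk_location, access_rights)
            self.entries[name] = entry
        entry.open_count += 1
        return entry

    def close(self, name):
        entry = self.entries[name]
        entry.open_count -= 1
        if entry.open_count == 0:
            # Last close: the entry can be removed from the table.
            del self.entries[name]


table = OpenFileTable()
table.open("notes.txt", disk_location=19)
table.open("notes.txt", disk_location=19)  # a second process opens the same file
table.close("notes.txt")
still_open = "notes.txt" in table.entries  # one open remains outstanding
table.close("notes.txt")
removed = "notes.txt" not in table.entries  # count reached zero
```

The entry survives the first close because another process still has the file open; only the last close removes it.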
File Types
Windows ( and some other systems ) use special file extensions to indicate the type of each file.
File-System Implementation
File-System Structure
Hard disks have two important properties that make them suitable for secondary storage of files
in file systems: (1) Blocks of data can be rewritten in place, and (2) they are direct access,
allowing any block of data to be accessed with only (relatively) minor movements of the disk
heads and rotational latency.
Disks are usually accessed in physical blocks, rather than a byte at a time. Block sizes may range
from 512 bytes to 4K or larger.
File systems provide efficient and convenient access to the disk by allowing data to be stored,
located, and retrieved easily. A file system poses two quite different design problems. The first
problem is defining how the file system should look to the user. This task involves defining a file
and its attributes, the operations allowed on a file, and the directory structure for organizing files.
The second problem is creating algorithms and data structures to map the logical file system onto
the physical secondary-storage devices.
File systems organize storage on disk drives, and can be viewed as
a layered design:
• At the lowest layer are the physical devices, consisting of the magnetic media, motors & controls,
and the electronics connected to them and controlling them. Modern disks put more and more of
the electronic controls directly on the disk drive itself, leaving relatively little work for the
disk controller card to perform.
• I/O control consists of device drivers, special software programs (often written in assembly)
which communicate with the devices by reading and writing special codes directly to and from
memory addresses corresponding to the controller card's registers. Each controller card (device)
on a system has a different set of addresses (registers, a.k.a. ports) that it listens to, and a
unique set of command codes and result codes that it understands.
• The basic file system level works directly with the device drivers in terms of
retrieving and storing raw blocks of data, without any consideration for what is in
each block. Depending on the system, blocks may be referred to with a single block
number, ( e.g. block # 234234 ), or with head-sector-cylinder combinations.
• The file organization module knows about files and their logical blocks, and how
they map to physical blocks on the disk. In addition to translating from logical to
physical blocks, the file organization module also maintains the list of free blocks,
and allocates free blocks to files as needed.
• The logical file system deals with all of the metadata associated with a file ( UID,
GID, mode, dates, etc. ), i.e. everything about the file except the data itself. This level
manages the directory structure and the mapping of file names to file control blocks,
FCBs, which contain all of the metadata as well as block number information for
finding the data on the disk.
• The layered approach to file systems means that much of the code can be used uniformly for a
wide variety of different file systems, and only certain layers need to be file-system specific.
• Examples of file systems include FAT (FAT12, FAT16, FAT32), exFAT, NTFS, HFS and HFS+, HPFS, APFS,
UFS, ext2, ext3, ext4, XFS, btrfs, ISO 9660, Files-11, Veritas File System, VMFS, ZFS, ReiserFS and UDF.
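The basic file system layer described above may name a block either by a single number or by a cylinder/head/sector combination. The two forms are related by the conventional logical block addressing (LBA) formula, sketched below. The geometry constants are assumptions chosen only for illustration; real drives report their own geometry.

```python
HEADS_PER_CYLINDER = 16  # assumed geometry, for illustration only
SECTORS_PER_TRACK = 63   # assumed geometry, for illustration only


def chs_to_lba(cylinder, head, sector):
    """Map a cylinder/head/sector triple to a single block number.

    Cylinders and heads are numbered from 0, sectors from 1.
    """
    return (cylinder * HEADS_PER_CYLINDER + head) * SECTORS_PER_TRACK + (sector - 1)


def lba_to_chs(lba):
    """Inverse mapping: recover (cylinder, head, sector) from a block number."""
    cylinder = lba // (HEADS_PER_CYLINDER * SECTORS_PER_TRACK)
    head = (lba // SECTORS_PER_TRACK) % HEADS_PER_CYLINDER
    sector = lba % SECTORS_PER_TRACK + 1
    return cylinder, head, sector


first_block = chs_to_lba(0, 0, 1)         # very first sector of the disk
next_cylinder = chs_to_lba(1, 0, 1)       # first sector of cylinder 1
round_trip = lba_to_chs(next_cylinder)
```

With this geometry one cylinder holds 16 × 63 = 1008 sectors, so the first sector of cylinder 1 is block 1008.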
File-System Implementation
File systems store several important data structures on the disk:
A boot-control block (per volume), a.k.a. the boot block in UNIX or the partition boot sector in
Windows, contains the information needed to boot the system from this volume. It is generally the
first sector of the volume if a bootable system is loaded on that volume; otherwise the block is left
vacant.
A volume control block (per volume), a.k.a. the superblock in UNIX or the master file table in
NTFS ( Windows ), contains volume ( partition ) details such as the number of blocks in the volume
and pointers to free blocks and free FCBs.
A directory structure (per file system), containing file names and pointers to the corresponding FCBs.
UNIX uses inode numbers, and NTFS uses a master file table.
The File Control Block, FCB, (per file) containing details about ownership, size, permissions, dates,
etc. UNIX stores this information in inodes, and NTFS in the master file table as a relational database
structure.
There are also several key data structures stored in memory:
An in-memory mount table, which contains information about each mounted volume.
An in-memory directory cache of recently accessed directory information.
A system-wide open-file table, containing a copy of the FCB for every currently open file in the
system, as well as some other related information.
A per-process open-file table, containing pointers to the corresponding entries in the system-wide
open-file table, as well as some other information.
Buffers that hold file-system blocks while they are being read from or written to disk.
Contiguous Allocation
Advantages:
Contiguous allocation supports both sequential and direct access. For direct access, the
address of the kth block of a file that starts at block b ( counting from k = 0 ) is simply (b+k).
Access is extremely fast, since the number of seeks is minimal when a file's blocks are
stored contiguously.
The file ‘mail’ in the following figure starts at block 19 with length = 6
blocks. Therefore, it occupies blocks 19, 20, 21, 22, 23 and 24.
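The (b+k) address computation for contiguous allocation can be checked against the ‘mail’ example. This is a minimal sketch; the function name `contiguous_block` is hypothetical.

```python
def contiguous_block(start, length, k):
    """Return the disk block holding logical block k (k counted from 0)
    of a contiguously allocated file that starts at block `start`."""
    if not 0 <= k < length:
        raise IndexError("logical block is outside the file")
    return start + k


# The 'mail' file: starts at block 19 and is 6 blocks long.
mail_blocks = [contiguous_block(19, 6, k) for k in range(6)]
```

Here `mail_blocks` is `[19, 20, 21, 22, 23, 24]`, and any single block is found with one addition rather than a traversal.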
Disadvantages:
• This method suffers from both internal and external fragmentation, which makes it
inefficient in terms of disk space utilization.
• Increasing a file's size is difficult, because it depends on contiguous free blocks being
available next to the file at that particular moment.
Linked Allocation
• Linked allocation avoids the external-fragmentation and file-growth problems of contiguous
allocation. With linked allocation, each file is a linked list of disk blocks; the disk blocks may be
scattered anywhere on the disk.
• The directory contains pointers to the first and last blocks of the file. Each block contains a pointer
to the next block occupied by the file.
Every block gives up a little space to its pointer. Allocating clusters of blocks reduces the space wasted
by pointers, at the cost of internal fragmentation.
Another big problem with linked allocation is reliability if a pointer is lost or damaged. Doubly linked
lists provide some protection, at the cost of additional overhead and wasted space.
The File Allocation Table, FAT, used by DOS is a variation of linked allocation, where all the links are
stored in a separate table at the beginning of the disk. The benefit of this approach is that the FAT
can be cached in memory, greatly improving random access speeds.
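A FAT-style chain can be sketched as a table that maps each block to the next block of the same file. The block numbers and the `END_OF_FILE` marker below are made up for illustration; real FAT variants use reserved table values to mark the end of a chain.

```python
END_OF_FILE = -1  # hypothetical end-of-chain marker


def file_blocks(fat, first_block):
    """Follow the chain from first_block and list every block of the file."""
    blocks = []
    block = first_block
    while block != END_OF_FILE:
        blocks.append(block)
        block = fat[block]
    return blocks


def kth_block(fat, first_block, k):
    """Find logical block k (from 0) by walking k links in the table."""
    block = first_block
    for _ in range(k):
        block = fat[block]
        if block == END_OF_FILE:
            raise IndexError("file has fewer than k + 1 blocks")
    return block


# Illustrative table: a file whose blocks are scattered across the disk.
fat = {9: 16, 16: 1, 1: 10, 10: 25, 25: END_OF_FILE}
chain = file_blocks(fat, 9)
third = kth_block(fat, 9, 2)
```

Reaching block k still takes k steps, but when the table is cached in memory those steps are memory lookups rather than disk reads, which is where the random access improvement comes from.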
Indexed Allocation
• Linked allocation solves the external-fragmentation and size-declaration problems of contiguous
allocation. However, in the absence of a FAT, linked allocation cannot support efficient direct
access, since the pointers to the blocks are scattered with the blocks themselves all over the disk and
must be retrieved in order.
• Indexed allocation solves this problem by bringing all of a file's pointers together into one
location, the index block.
Disadvantages:
The pointer overhead of indexed allocation is generally greater than that of linked allocation.
For very small files, say files that span only 2-3 blocks, indexed allocation still dedicates
one entire block (the index block) to pointers, which is inefficient in terms of disk space
utilization. Linked allocation, by contrast, loses the space of only 1 pointer per block.
Because an entire index block must be allocated for each file, regardless of how many
data blocks the file contains, some disk space is wasted ( relative to linked lists or FAT
tables ). This leads to questions of how big the index block should be, and how it should be
implemented.
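The trade-off can be made concrete with a rough sketch. With indexed allocation, direct access is a single lookup in the index block, but the whole index block is consumed even by a tiny file. The block size, pointer size, and block numbers below are assumptions for illustration, and the sketch covers only files that fit in a single index block.

```python
BLOCK_SIZE = 512   # bytes per block (assumed)
POINTER_SIZE = 4   # bytes per block pointer (assumed)
POINTERS_PER_INDEX_BLOCK = BLOCK_SIZE // POINTER_SIZE


def indexed_lookup(index_block, k):
    """Direct access: logical block k is read straight from the index block."""
    return index_block[k]


def pointer_overhead(num_data_blocks):
    """Bytes spent on pointers: (linked allocation, indexed allocation)."""
    if num_data_blocks > POINTERS_PER_INDEX_BLOCK:
        raise ValueError("sketch only covers files that fit in one index block")
    linked = num_data_blocks * POINTER_SIZE  # one pointer inside each data block
    indexed = BLOCK_SIZE                     # one whole index block per file
    return linked, indexed


index_block = [9, 16, 1]                     # a 3-block file, blocks scattered
second_block = indexed_lookup(index_block, 1)
small_file_cost = pointer_overhead(3)
```

Under these assumptions a 3-block file spends 12 bytes on pointers with linked allocation but a full 512-byte block with indexed allocation.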
Performance
The optimal allocation method is different for sequential access files than for
random access files, and is also different for small files than for large files.
Some systems support more than one allocation method, which may require
specifying how the file is to be used (sequential or random access) at the time
it is allocated. Such systems also provide conversion utilities.
Some systems have been known to use contiguous allocation for small files, and
automatically switch to an indexed scheme when file sizes surpass a certain
threshold.
And of course some systems adjust their allocation schemes (e.g. block sizes)
to best match the characteristics of the hardware for optimum performance.
Free Space Management
To keep track of free disk space, the system maintains a free-space list. The free-space list
records all free disk blocks, i.e. those not allocated to some file or directory. To create a file, the
free-space list is searched for the required amount of space, and that space is allocated to the
new file.
Bit Vector
Frequently, the free-space list is implemented as a bit map or bit vector. Each block is
represented by 1 bit. If the block is free, the bit is 1; if the block is allocated, the bit is 0.
For example, consider a disk where blocks 2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17, 18, 25, 26,
and 27 are free, and the rest of the blocks are allocated. The free-space bit map would be
00111100111111000110000001110000...
The main advantage of this approach is its relative simplicity and its efficiency in finding
the first free block, or n consecutive free blocks, on the disk. The bit-vector method is
inefficient unless the entire vector is kept in main memory.
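The example bit map can be reproduced, and searched, with a short sketch. It assumes a 32-block disk to match the 32 bits shown above; the helper name `first_run` is hypothetical.

```python
NUM_BLOCKS = 32  # assumed disk size, matching the 32 bits in the example
free_blocks = {2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17, 18, 25, 26, 27}

# Bit is 1 if the block is free, 0 if it is allocated.
bitmap = [1 if block in free_blocks else 0 for block in range(NUM_BLOCKS)]
bit_string = "".join(str(bit) for bit in bitmap)

first_free = bitmap.index(1)  # position of the first free block


def first_run(bitmap, n):
    """Return the start of the first run of n consecutive free blocks, or None."""
    run_start, run_length = None, 0
    for block, bit in enumerate(bitmap):
        if bit == 1:
            if run_length == 0:
                run_start = block
            run_length += 1
            if run_length == n:
                return run_start
        else:
            run_length = 0
    return None


six_in_a_row = first_run(bitmap, 6)
```

`bit_string` matches the bit map in the text, the first free block is block 2, and the first run of six free blocks starts at block 8.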
Linked List
Another approach to free-space management is to link together all the free disk blocks,
keeping a pointer to the first free block in a special location on the disk and caching it in
memory.
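A linked free list can be modelled as follows. This is a toy in-memory sketch: the dictionary `next_free` stands in for the pointer that would be stored inside each free block on disk, and the class name `FreeList` is hypothetical.

```python
class FreeList:
    def __init__(self, free_blocks):
        self.next_free = {}  # block -> next free block (simulates on-disk pointers)
        self.head = None     # pointer to the first free block, cached in memory
        for block in reversed(free_blocks):
            self.next_free[block] = self.head
            self.head = block

    def allocate(self):
        """Take the first free block off the front of the list."""
        if self.head is None:
            raise RuntimeError("no free blocks left")
        block = self.head
        self.head = self.next_free.pop(block)
        return block

    def release(self, block):
        """Return a block to the front of the list."""
        self.next_free[block] = self.head
        self.head = block


free_list = FreeList([2, 3, 4, 5, 8])
first = free_list.allocate()
second = free_list.allocate()
free_list.release(first)
reused = free_list.allocate()
```

Allocating or freeing a single block touches only the head of the list, but walking the whole list would require reading every free block in turn.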
Counting
When there are multiple contiguous blocks of free space, the system can keep track of the
starting address of each group and the number of contiguous free blocks in it. As long as the average
length of a contiguous group of free blocks is greater than two, this offers a savings in the space
needed for the free list. The free-space list can then contain pairs (block number, count).
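The counting scheme can be sketched by collapsing a set of free blocks into (block number, count) pairs. The function name `to_runs` is hypothetical, and the free blocks reuse the bit-vector example from earlier in this section.

```python
def to_runs(free_blocks):
    """Collapse free block numbers into (start, count) pairs of contiguous runs."""
    runs = []
    for block in sorted(free_blocks):
        if runs and runs[-1][0] + runs[-1][1] == block:
            # Block directly follows the current run: extend it.
            start, count = runs[-1]
            runs[-1] = (start, count + 1)
        else:
            # Gap before this block: start a new run.
            runs.append((block, 1))
    return runs


free_blocks = [2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17, 18, 25, 26, 27]
runs = to_runs(free_blocks)
```

The 15 free blocks collapse into four pairs: (2, 4), (8, 6), (17, 2) and (25, 3).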