File System Implementation

OS
STORAGE MANAGEMENT
File System Implementation
File Concept
File Attributes
Different OSes keep track of different file attributes, including:
 Name - Some systems give special significance to names, and particularly extensions
( .exe, .txt, etc. ), and some do not. Some extensions may be of significance to the OS ( .exe
), and others only to certain applications ( .jpg )
 Identifier ( e.g. inode number )
 Type - Text, executable, other binary, etc.
 Location - on the hard drive.
 Size
 Protection
 Time & Date
 User ID
File Operations
The file ADT supports many common operations:
• Creating a file
• Writing a file
• Reading a file
• Repositioning within a file
• Deleting a file
• Truncating a file.
Most OSes require that files be opened before access and closed after all access is complete. Normally the
programmer must open and close files explicitly, but some rare systems open the file automatically at first
access. Information about currently open files is stored in an open file table, containing for example:
 File pointer - records the current position in the file, for the next read or write access.
 File-open count - How many times has the current file been opened ( simultaneously by different
processes ) and not yet closed? When this counter reaches zero the file can be removed from the table.
 Disk location of the file.
 Access rights
File Types
Windows ( and some other systems ) use special file extensions to indicate the type of each file.
File-System Implementation
File-System Structure
 Hard disks have two important properties that make them suitable for secondary storage of files
in file systems: (1) Blocks of data can be rewritten in place, and (2) they are direct access,
allowing any block of data to be accessed with only (relatively) minor movements of the disk
heads and rotational latency.
 Disks are usually accessed in physical blocks, rather than a byte at a time. Block sizes may range
from 512 bytes to 4K or larger.
 File systems provide efficient and convenient access to the disk by allowing data to be stored,
located, and retrieved easily. A file system poses two quite different design problems. The first
problem is defining how the file system should look to the user. This task involves defining a file
and its attributes, the operations allowed on a file, and the directory structure for organizing files.
The second problem is creating algorithms and data structures to map the logical file system onto
the physical secondary-storage devices.
File systems organize storage on disk drives, and can be viewed as
a layered design:
 At the lowest layer are the physical devices, consisting of the
magnetic media, motors & controls, and the electronics connected
to them and controlling them. Modern disks put more and more of
the electronic controls directly on the disk drive itself, leaving
relatively little work for the disk controller card to perform.
 I/O Control consists of device drivers, special software programs
(often written in assembly) which communicate with the devices
by reading and writing special codes directly to and from memory
addresses corresponding to the controller card's registers. Each
controller card (device) on a system has a different set of
addresses (registers, a.k.a. ports) that it listens to, and a unique set
of command codes and result codes that it understands.
• The basic file system level works directly with the device drivers in terms of
retrieving and storing raw blocks of data, without any consideration for what is in
each block. Depending on the system, blocks may be referred to with a single block
number ( e.g. block # 234234 ), or with cylinder-head-sector (CHS) combinations.
• The file organization module knows about files and their logical blocks, and how
they map to physical blocks on the disk. In addition to translating from logical to
physical blocks, the file organization module also maintains the list of free blocks,
and allocates free blocks to files as needed.
• The logical file system deals with all of the metadata associated with a file ( UID,
GID, mode, dates, etc. ), i.e. everything about the file except the data itself. This level
manages the directory structure and the mapping of file names to file control blocks
(FCBs), which contain all of the metadata as well as the block number information for
finding the data on the disk.
• The layered approach to file systems means that much of the code can be used uniformly for a
wide variety of different file systems, and only certain layers need to be filesystem specific.
• Examples include FAT (FAT12, FAT16, FAT32), exFAT, NTFS, HFS and HFS+, HPFS, APFS, UFS, ext2,
ext3, ext4, XFS, btrfs, ISO 9660, Files-11, Veritas File System, VMFS, ZFS, ReiserFS and UDF.
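The two block-addressing styles mentioned for the basic file system layer can be related with a short sketch. The geometry constants below (16 heads per cylinder, 63 sectors per track) are assumptions chosen for illustration, not values taken from these notes; sectors are numbered from 1, as is conventional for CHS addressing.

```python
# Assumed, illustrative disk geometry (not from the notes).
HEADS_PER_CYLINDER = 16
SECTORS_PER_TRACK = 63


def chs_to_lba(cylinder, head, sector):
    """Convert a cylinder-head-sector triple to a single linear block number."""
    return (cylinder * HEADS_PER_CYLINDER + head) * SECTORS_PER_TRACK + (sector - 1)


def lba_to_chs(lba):
    """Convert a linear block number back to (cylinder, head, sector)."""
    cylinder, rest = divmod(lba, HEADS_PER_CYLINDER * SECTORS_PER_TRACK)
    head, sector0 = divmod(rest, SECTORS_PER_TRACK)
    return cylinder, head, sector0 + 1  # sectors are 1-based


print(lba_to_chs(234234))  # block # 234234 under the assumed geometry
```

With a different geometry the same block number maps to a different triple, which is one reason higher layers prefer the single block number.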
File-System Implementation
File systems store several important data structures on the disk:
 A boot-control block, (per volume) a.k.a. the boot block in UNIX or the partition boot sector in
Windows contains information about how to boot the system off of this disk. This will generally be the
first sector of the volume if there is a bootable system loaded on that volume, or the block will be left
vacant otherwise.
 A volume control block, (per volume) a.k.a. the superblock in UNIX or the master file table in
NTFS ( Windows ), which contains information such as the number of blocks on the volume, the block
size, and pointers to free blocks and free FCB blocks.
 A directory structure (per file system), containing file names and pointers to corresponding FCBs.
UNIX uses inode numbers, and NTFS uses a master file table.
 The File Control Block, FCB, (per file) containing details about ownership, size, permissions, dates,
etc. UNIX stores this information in inodes, and NTFS in the master file table as a relational database
structure.
There are also several key data structures stored in memory:
 An in-memory mount table, which contains information about each mounted volume
 An in-memory directory cache of recently accessed directory information.
 A system-wide open file table, containing a copy of the FCB for every currently open file in the
system, as well as some other related information.
 A per-process open file table, containing a pointer to the system open file table as well as some
other information.
 Buffers hold file-system blocks when they are being read from or written to disk.

A typical file-control block.


• When a new file is created, a new FCB is allocated and filled out with important information
regarding the new file. The appropriate directory is modified with the new file name and FCB
information.
• When a file is accessed during a program, the open( ) system call reads in the FCB
information from disk, and stores it in the system-wide open file table. An entry is added to
the per-process open file table referencing the system-wide table, and an index into the per-
process table is returned by the open( ) system call. UNIX refers to this index as a file
descriptor, and Windows refers to it as a file handle.
• If another process already has a file open when a new request comes in for the same file, and
it is sharable, then a counter in the system-wide table is incremented and the per-process table
is adjusted to point to the existing entry in the system-wide table.
• When a file is closed, the per-process table entry is freed, and the counter in the system-wide
table is decremented. If that counter reaches zero, then the system-wide table entry is also freed.
Any data currently stored in memory cache for this file is written out to disk if necessary.
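The open and close steps above can be sketched as a small simulation. This is a minimal model under stated assumptions, not how any particular kernel does it: the names SystemEntry, Process, open_file and close_file are hypothetical, the FCB is reduced to a file name, and disk I/O and cache flushing are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class SystemEntry:
    """One system-wide open-file table entry (stands in for a copy of the FCB)."""
    name: str
    open_count: int = 0


@dataclass
class Process:
    """Per-process table: descriptor -> [system-wide entry, file pointer]."""
    fd_table: dict = field(default_factory=dict)
    next_fd: int = 3  # 0-2 are conventionally stdin, stdout, stderr on UNIX


system_table = {}  # file name -> SystemEntry


def open_file(proc, name):
    """Return a descriptor, sharing the system-wide entry if one exists."""
    entry = system_table.get(name)
    if entry is None:
        entry = SystemEntry(name)  # a real system would read the FCB from disk here
        system_table[name] = entry
    entry.open_count += 1
    fd = proc.next_fd
    proc.next_fd += 1
    proc.fd_table[fd] = [entry, 0]  # file pointer starts at offset 0
    return fd


def close_file(proc, fd):
    """Free the per-process entry; free the system-wide entry at count zero."""
    entry, _position = proc.fd_table.pop(fd)
    entry.open_count -= 1
    if entry.open_count == 0:
        del system_table[entry.name]


p1, p2 = Process(), Process()
fd1 = open_file(p1, "notes.txt")
fd2 = open_file(p2, "notes.txt")
print(system_table["notes.txt"].open_count)  # two processes share one entry
```

Keeping the file pointer in the per-process entry lets two processes read the same file at different positions while sharing one copy of the FCB.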
In-memory file-system structures. (a) File open. (b) File read.
Directory Implementation
 Directories need to be fast to search, insert, and delete, with a minimum of wasted disk space.
Linear List
• A linear list is the simplest and easiest directory structure to set up, but it does have some
drawbacks.
• Finding a file (or verifying one does not already exist upon creation) requires a linear search.
• Deletions can be done by moving all entries, flagging an entry as deleted, or by moving the
last entry into the newly vacant position.
• Sorting the list makes searches faster, at the expense of more complex insertions and
deletions.
• A linked list makes insertions and deletions into a sorted list easier, with overhead for the
links.
• More complex data structures, such as B-trees, could also be considered.
Hash Table
• A hash table can also be used to speed up searches.
• Hash tables are generally implemented in addition to a linear or other structure.
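A minimal sketch of the two ideas together: a linear list of directory entries, with a hash table kept alongside it to speed up lookups. The class and method names are hypothetical, a Python dict stands in for the hash table, and deletion uses the "move the last entry into the vacant position" strategy listed above.

```python
class Directory:
    """Linear list of (name, FCB number) entries plus a hash index over it."""

    def __init__(self):
        self.entries = []  # the linear list, in slot order
        self.index = {}    # hash table: file name -> slot in self.entries

    def lookup_linear(self, name):
        """Linear search: cost grows with the number of entries."""
        for entry_name, fcb in self.entries:
            if entry_name == name:
                return fcb
        return None

    def lookup_hashed(self, name):
        """Hash lookup: goes straight to the slot."""
        slot = self.index.get(name)
        return None if slot is None else self.entries[slot][1]

    def create(self, name, fcb):
        if name in self.index:  # verify the name does not already exist
            raise FileExistsError(name)
        self.index[name] = len(self.entries)
        self.entries.append((name, fcb))

    def delete(self, name):
        """Fill the vacated slot with the last entry so the list stays dense."""
        slot = self.index.pop(name)
        last = self.entries.pop()
        if slot < len(self.entries):  # the deleted entry was not the last one
            self.entries[slot] = last
            self.index[last[0]] = slot


d = Directory()
d.create("a.txt", 10)
d.create("b.txt", 11)
d.create("c.txt", 12)
d.delete("a.txt")
print(d.entries)
```

Both structures must be updated on every create and delete, which is the extra cost of keeping a hash table in addition to the list.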
Allocation Methods
There are various methods which can be used to allocate disk space to the files. There are
three main disk space or file allocation methods.
1. Contiguous Allocation.
2. Linked Allocation.
3. Indexed Allocation.
• Selection of an appropriate allocation method will significantly affect the performance
and efficiency of the system.
• Allocation method provides a way in which the disk will be utilized and the files will be
accessed.
• The main idea behind these methods is to provide:
Efficient disk space utilization.
Fast access to the file blocks.
Contiguous Allocation
• Contiguous allocation requires that each file occupy a set of contiguous blocks on the disk.
Disk addresses define a linear ordering on the disk.
• For example, if a file requires n blocks and is given a block b as the starting location, then
the blocks assigned to the file will be: b, b+1, b+2,……b+n-1.
• When head movement is needed, the head need only move from one track to the next.
Thus, the number of disk seeks required for accessing contiguously allocated files is
minimal, as is seek time when a seek is finally needed.

Advantages:
 Both sequential and direct access are supported. For direct access, the
address of the kth block of a file which starts at block b can easily be obtained as (b+k).
 Access is extremely fast, since the number of seeks is minimal because of the contiguous
allocation of file blocks.
For example, the file ‘mail’ starts at block 19 with a length of 6 blocks.
Therefore, it occupies blocks 19, 20, 21, 22, 23, and 24.

Disadvantages:
• This method suffers from both internal and external fragmentation. This makes it
inefficient in terms of disk space utilization.
• Increasing file size is difficult because it depends on the availability of contiguous
free disk space at that particular moment.
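The address arithmetic for contiguous allocation, and the reason external fragmentation hurts it, can be sketched as follows. The function names are hypothetical, and the free map is modelled as a plain list of booleans for illustration.

```python
def contiguous_blocks(start, length):
    """Blocks b, b+1, ..., b+n-1 occupied by a file of `length` blocks."""
    return list(range(start, start + length))


def block_address(start, length, k):
    """Direct access: physical address of logical block k (0-based)."""
    if not 0 <= k < length:
        raise IndexError("logical block out of range")
    return start + k


def find_first_fit(free, n):
    """Start of the first run of n >= 1 consecutive free blocks, or None.

    `free` is a list of booleans, True meaning the block is free.
    """
    run = 0
    for i, is_free in enumerate(free):
        run = run + 1 if is_free else 0
        if run == n:
            return i - n + 1
    return None


print(contiguous_blocks(19, 6))   # the 'mail' example
print(block_address(19, 6, 3))    # logical block 3 of 'mail'
# Two blocks are free in total, but not next to each other:
print(find_first_fit([True, False, True], 2))
```

The last call returns None even though enough free blocks exist in total, which is external fragmentation in miniature.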
Linked Allocation
• Linked allocation solves the external-fragmentation and file-growth problems of contiguous allocation.
With linked allocation, each file is a linked list of disk blocks; the disk blocks may be scattered anywhere
on the disk.
• The directory contains a pointer to the first and last blocks of the file. Each block contains a pointer
to the next block occupied by the file.

The figure for the file ‘jeep’ shows how the blocks are scattered across the disk. The last block (25)
contains -1, indicating a null pointer, so it does not point to any other block.
Advantages:
 This is very flexible in terms of file size. File size can be increased easily since the system does not have
to look for a contiguous chunk of disk space.
 This method does not suffer from external fragmentation. This makes it relatively better in terms of
disk space utilization.
Disadvantages:
 Because the file blocks are distributed randomly on the disk, a large number of seeks are needed to
access every block individually. This makes linked allocation slower.
 It does not support random or direct access. We cannot directly access the blocks of a file. Block k of a
file can only be reached by traversing k blocks sequentially (sequential access) from the starting block of the
file via block pointers.
 The pointers required in linked allocation incur some extra space overhead.

 Allocating clusters of blocks reduces the space wasted by pointers, at the cost of internal fragmentation.
 Another big problem with linked allocation is reliability if a pointer is lost or damaged. Doubly linked
lists provide some protection, at the cost of additional overhead and wasted space.
 The File Allocation Table, FAT, used by DOS is a variation of linked allocation, where all the links are
stored in a separate table at the beginning of the disk. The benefit of this approach is that the FAT table
can be cached in memory, greatly improving random access speeds.
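A minimal sketch of following a chain of links in a FAT-style table held in memory. The table contents are invented for illustration (a chain that ends at block 25 with -1, as in the 'jeep' description); the starting block and intermediate blocks are assumptions, and a dict stands in for the on-disk table.

```python
END_OF_CHAIN = -1  # null pointer marking the last block of a file

# Hypothetical table: entry i holds the number of the block that follows block i.
fat = {9: 16, 16: 1, 1: 10, 10: 25, 25: END_OF_CHAIN}


def file_blocks(table, start):
    """Sequential access: follow the links from the first block to the end."""
    blocks = []
    block = start
    while block != END_OF_CHAIN:
        blocks.append(block)
        block = table[block]
    return blocks


def kth_block(table, start, k):
    """Reaching logical block k (0-based) still takes k link lookups,
    but with the table cached in memory none of them touches the disk."""
    block = start
    for _ in range(k):
        block = table[block]
        if block == END_OF_CHAIN:
            raise IndexError("file has fewer blocks than requested")
    return block


print(file_blocks(fat, 9))
print(kth_block(fat, 9, 3))
```

Without the separate table, each step of the loop in kth_block would require reading a data block from disk just to find the next pointer.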
Indexed Allocation
• Linked allocation solves the external-fragmentation and size-declaration problems of contiguous
allocation. However, in the absence of a FAT, linked allocation cannot support efficient direct
access, since the pointers to the blocks are scattered with the blocks themselves all over the disk and
must be retrieved in order.

• Indexed allocation solves this problem by bringing all the pointers together into one location: the
index block.
• Each file has its own index block, which is an array of disk-block addresses. The ith entry in the
index block points to the ith block of the file.
Advantages:
 This supports direct access to the blocks occupied by the file and therefore provides fast
access to the file blocks.
 It overcomes the problem of external fragmentation.

Disadvantages:
 The pointer overhead for indexed allocation is greater than for linked allocation.
 For very small files, say files that span only 2-3 blocks, indexed allocation still uses
one entire block (the index block) for the pointers, which is inefficient in terms of disk space
utilization. With linked allocation, by contrast, we lose the space of only 1 pointer per block.

Some disk space is wasted ( relative to linked lists or FAT tables ) because an
entire index block must be allocated for each file, regardless of how many
data blocks the file contains. This leads to questions of how big the index
block should be, and how it should be implemented.
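The index-block lookup and the wasted-space concern can be sketched as follows. The block size (512 bytes), pointer size (4 bytes), and the contents of the example index block are assumptions for illustration only.

```python
BLOCK_SIZE = 512    # assumed block size in bytes
POINTER_SIZE = 4    # assumed size of one disk-block address in bytes


def locate(index_block, byte_offset, block_size=BLOCK_SIZE):
    """Map a byte offset in the file to (physical block, offset within block).

    Entry i of the index block is the address of logical block i, so direct
    access is a single array lookup.
    """
    if byte_offset < 0:
        raise IndexError("negative offset")
    i, offset = divmod(byte_offset, block_size)
    if i >= len(index_block):
        raise IndexError("offset beyond end of file")
    return index_block[i], offset


def unused_index_bytes(num_data_blocks, block_size=BLOCK_SIZE,
                       pointer_size=POINTER_SIZE):
    """Bytes of the index block left unused by a file with this many blocks."""
    return block_size - num_data_blocks * pointer_size


index_block = [9, 16, 1, 10, 25]  # hypothetical file occupying five blocks
print(locate(index_block, 1300))  # byte 1300 lies in logical block 2
print(unused_index_bytes(3))      # a 3-block file leaves most of the index empty
```

Under these assumptions one index block holds 128 pointers, so a single index block also caps the file at 128 data blocks, which is why the size and structure of the index block is a design question.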
Performance

 The optimal allocation method is different for sequential access files than for
random access files, and is also different for small files than for large files.
 Some systems support more than one allocation method, which may require
specifying how the file is to be used (sequential or random access) at the time
it is allocated. Such systems also provide conversion utilities.
 Some systems have been known to use contiguous allocation for small files, and
automatically switch to an indexed scheme when file sizes surpass a certain
threshold.
 And of course some systems adjust their allocation schemes (e.g. block sizes)
to best match the characteristics of the hardware for optimum performance.
Free Space Management
To keep track of free disk space, the system maintains a free-space list. The free-space list
records all free disk blocks, i.e. those not allocated to some file or directory. To create a file, the
free-space list is searched for the required amount of space, and that space is allocated to the
new file.

Bit Vector
Frequently, the free-space list is implemented as a bit map or bit vector. Each block is
represented by 1 bit. If the block is free, the bit is 1; if the block is allocated, the bit is 0.
For example, consider a disk where blocks 2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17, 18, 25, 26,
and 27 are free, and the rest of the blocks are allocated. The free-space bit map would be
00111100111111000110000001110000…
The main advantage of this approach is its relative simplicity and its efficiency in finding
the first free block, or n consecutive free blocks, on the disk. The bit-vector method is
inefficient unless the entire vector is kept in main memory.
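The bit map in the example can be rebuilt and searched with a few lines. This is a sketch for illustration: a string of '0' and '1' characters stands in for the packed bit array a real file system would use, and the function names are hypothetical.

```python
def build_bitmap(free_blocks, total_blocks):
    """Bit i is 1 if block i is free and 0 if it is allocated."""
    bits = ["0"] * total_blocks
    for block in free_blocks:
        bits[block] = "1"
    return "".join(bits)


def first_free(bitmap):
    """Number of the first free block, or None if the disk is full."""
    i = bitmap.find("1")
    return None if i == -1 else i


def first_free_run(bitmap, n):
    """Start of the first run of n consecutive free blocks, or None."""
    i = bitmap.find("1" * n)
    return None if i == -1 else i


free = [2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17, 18, 25, 26, 27]
bitmap = build_bitmap(free, 32)
print(bitmap)
print(first_free(bitmap))
print(first_free_run(bitmap, 5))  # the run starting at block 2 is only 4 long
```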
Linked List
Another approach to free-space management is to link together all the free disk blocks,
keeping a pointer to the first free block in a special location on the disk and caching it in
memory. This first block contains a pointer to the next free disk block, and so on. Traversing
the list and/or finding a contiguous block of a given size are not easy, but fortunately are not
frequently needed operations. Generally the system just adds and removes single blocks from
the beginning of the list. The FAT method incorporates free-block accounting into the
allocation data structure, so no separate method is needed.
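A minimal sketch of a linked free list in which blocks are only added and removed at the head. The class name is hypothetical, and a dict stands in for the "next free block" pointer that would be stored inside each free block on disk.

```python
class FreeList:
    """Free blocks chained together; only the head pointer is cached."""

    def __init__(self, free_blocks):
        self.next = {}    # free block -> next free block (None ends the list)
        self.head = None
        for block in reversed(free_blocks):
            self.next[block] = self.head
            self.head = block

    def allocate(self):
        """Remove and return the first free block."""
        if self.head is None:
            raise RuntimeError("no free blocks")
        block = self.head
        self.head = self.next.pop(block)
        return block

    def release(self, block):
        """Put a freed block back at the beginning of the list."""
        self.next[block] = self.head
        self.head = block


free_list = FreeList([2, 3, 4, 5, 8])
first = free_list.allocate()
second = free_list.allocate()
free_list.release(first)
print(first, second, free_list.head)
```

Both operations touch only the head of the list, which is why they are cheap, while finding several contiguous free blocks would require walking the whole chain.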
Grouping
A modification of the free-list approach is to store the addresses of n free blocks in the first
free block. The first n-1 of these blocks are actually free. The last block contains the
addresses of another n free blocks, and so on. The advantage of this implementation is that
the addresses of a large number of free blocks can be found quickly, unlike in the standard
linked-list approach.

Counting
When there are multiple contiguous blocks of free space, the system can keep track of the
starting address of the group and the number of contiguous free blocks. As long as the average
length of a contiguous group of free blocks is greater than two, this offers a savings in the space
needed for the free list. The free-space list can then contain pairs (block number, count).
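The counting scheme amounts to run-length encoding the free blocks. A short sketch, reusing the free blocks from the bit-vector example; the function name is hypothetical.

```python
def to_runs(free_blocks):
    """Collapse free block numbers into (starting block, count) pairs."""
    runs = []
    for block in sorted(free_blocks):
        if runs and runs[-1][0] + runs[-1][1] == block:
            start, count = runs[-1]
            runs[-1] = (start, count + 1)  # block extends the current run
        else:
            runs.append((block, 1))        # block starts a new run
    return runs


free = [2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17, 18, 25, 26, 27]
print(to_runs(free))
```

Here 15 free blocks are described by 4 pairs, so the list is shorter than one entry per free block.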
