
File Systems

1. File Concept
2. Access Methods
3. Directory Structures
4. File System Structure
5. Allocation Methods (Contiguous, Linked, Indexed)
6. Free-space Management (Bit Vector, Linked List, Grouping, Counting)
7. Directory Implementation (Linear List, Hash Table)
8. Efficiency & Performance

File Concept
• The OS abstracts the physical properties of storage devices to define a logical storage unit (the file)
• A file is a named collection of related information stored on secondary storage
• From the user's perspective, a file is the smallest allotment of logical secondary storage
• Types of files
– Data
• numeric
• character
• binary
– Program
– Free form
• Text file
– Formatted
• Executable
• Database

File Attributes

• Name – the only information kept in human-readable form


• Type – needed for systems that support different types
• Location – pointer to file location on device
• Size – current file size (bytes / words / blocks)
• Protection – controls who can do reading, writing, executing
• Time, date, and user identification
– For creation, last modification, last use
– Data for protection, security, and usage monitoring
• Information about files is kept in the directory structure, which is maintained on the disk

File Operations

• Create
– Find space in the file system
– Make an entry in the directory
• Write
– Search the directory to find location
– Keep a write pointer to the location where the next write is to take place
• Read
– Search the directory to find location
– Keep a read pointer to the location where the next read is to take place
• Reposition within file
– File seek
• Delete
– Search the directory to find location
– Release all file space
• Truncate
– All attributes remain unchanged
– File length reset to zero, file space released
• Open(Fi)
– search the directory structure on disk for entry Fi, and move the content of entry
to memory
• Close (Fi)
– move the content of entry Fi in memory to directory structure on disk
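
The following minimal C sketch walks these operations through the POSIX system-call interface; the file name and buffer sizes are illustrative choices, not anything prescribed by the notes.

/* Sketch of the basic file operations via POSIX calls (assumes a Unix-like OS). */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    /* Create + open: the OS finds space and makes a directory entry. */
    int fd = open("demo.txt", O_CREAT | O_RDWR | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Write: advances the per-open-file write pointer. */
    const char *msg = "hello, file system\n";
    write(fd, msg, strlen(msg));

    /* Reposition (seek) back to the beginning of the file. */
    lseek(fd, 0, SEEK_SET);

    /* Read: advances the read pointer past the data just read. */
    char buf[64];
    ssize_t n = read(fd, buf, sizeof(buf) - 1);
    if (n >= 0) { buf[n] = '\0'; printf("%s", buf); }

    /* Truncate: length reset to zero, other attributes unchanged. */
    ftruncate(fd, 0);

    /* Close: the in-memory entry is released; unlink() would delete the file. */
    close(fd);
    return 0;
}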

Access Methods

• Defines how the information within a file is accessed


• Sequential Access
– Based on tape model
– Information within a file is processed in order, one record after another
– A read operation reads the next portion of the file, advances the file pointer
automatically
– A write appends to the end of a file and advances to the newly written material
• Direct Access
– Based on disk model
– Allows records/blocks within a file to be read or written rapidly, in no particular order
– Each file operation must include the record/block number (relative to the beginning of the file) as a parameter
– Relative record/block numbers allow the OS to decide where to place the file
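
As an illustration of direct access, the sketch below reads record k of a file of fixed-size records by turning the relative record number into a byte offset and seeking to it; the record size and file name are assumed for the example.

/* Direct-access sketch: read record `rec` of a file of fixed-size records. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define RECORD_SIZE 64  /* bytes per record (assumed) */

/* Read record number `rec` (relative to the start of the file) into buf. */
ssize_t read_record(int fd, long rec, char *buf) {
    off_t pos = (off_t)rec * RECORD_SIZE;   /* record number -> byte offset */
    if (lseek(fd, pos, SEEK_SET) < 0)       /* jump directly, no sequential scan */
        return -1;
    return read(fd, buf, RECORD_SIZE);
}

int main(void) {
    int fd = open("records.dat", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }
    char buf[RECORD_SIZE];
    if (read_record(fd, 7, buf) > 0)        /* fetch record 7 directly */
        printf("read record 7\n");
    close(fd);
    return 0;
}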

Other Access Methods

• Built on top of direct access method


• Construct an index for the file
– Contains pointers to various blocks within the file
• To find a record in the file
– Search the index
– Use the pointers to access the file directly
• Example
– A retail price-list file contains 16-byte records
• 10-digit universal product code (UPC)
• 6-digit price
– A file of 120,000 records occupies 1,875 blocks (1,024 bytes per block, 64 records per block)
– By keeping the file sorted by UPC, we can define an index consisting of the first UPC in each block
– For very large files, the index itself may become large
• Create an index for the index file
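
One possible shape for such an index, following the price-list example (16-byte records, 1,024-byte blocks, 64 records per block): keep the first UPC of each block in a sorted in-memory array and binary-search it to find which block to read. The struct layout and search routine below are illustrative assumptions, not a prescribed design.

/* Index-search sketch: one index entry per block, holding that block's first UPC. */
#include <string.h>

#define UPC_LEN 10

struct index_entry { char first_upc[UPC_LEN + 1]; };   /* kept sorted by UPC */

/* Binary-search the index for the block that may contain `upc`. */
long find_block(const struct index_entry *idx, long nblocks, const char *upc) {
    long lo = 0, hi = nblocks - 1, ans = 0;
    while (lo <= hi) {
        long mid = (lo + hi) / 2;
        if (strncmp(idx[mid].first_upc, upc, UPC_LEN) <= 0) {
            ans = mid;          /* last block seen whose first UPC <= target */
            lo = mid + 1;
        } else {
            hi = mid - 1;
        }
    }
    return ans;   /* the caller reads this block directly and scans its 64 records */
}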
Directory Structure

• Disks are split into one or more partitions known as minidisks or volumes
• Partitions are low level structures in which files and directories reside
• A device directory or volume table of contents contains information about files within a
partition
• A directory can be viewed as a symbol table that translates file names into their directory entries

Single-Level Directory

• Simplest directory structure


• All files are contained in the same directory
• Limitations
– Becomes unmanageable as the number of users and files increases
– All files must have unique names
– Difficult to remember file names as the number of files increases

Two-Level Directory
• Each user has their own User File Directory (UFD)
• When a user logs in, the system's Master File Directory (MFD) is searched
• The MFD is indexed by user name/account number; each entry points to that user's UFD
• To create a file, the OS searches the user's UFD to check for a duplicate file name
• To delete a file, the OS confines its search to the local UFD
• Users cannot share files
• A user name and a file name together define a path name

Tree-Structured Directories
• Allows users to create their own subdirectories and organize their files accordingly
• A directory is simply another file, treated in a special way
• Path names: absolute, relative
• Deleting a non-empty directory is a policy decision: delete only if it is empty, or delete all of its contents recursively
• Does not allow sharing of files and directories between users

Directory Implementation

• Linear List
– Use a linear list of file names with pointers to data blocks
– Reuse directory entry (when files deleted)
• Mark the entry as unused
• Attach the entry to a list of free directory entries
• Copy the last entry in the directory into the freed location and decrease the
length of the directory
– Disadvantage
• Requires linear search to find an entry
– Many OS use a software cache to store the most recently used directory information
• Sorted list allows binary search and decreases search time
• Sorting may complicate creating and deleting file
• Hash Table
– A linear list stores the directory entries
– A hash table takes a value computed from the file name and returns a pointer
to the file name in the linear list
– Greatly decreases directory search time
– Insertion and deletion are simple
– Disadvantage
• Hash table is fixed size
• Depends on the hash function
– A chained-overflow hash table can be used
• Each hash entry can be a linked list
• Collisions are resolved by adding the new entry to the linked list
• Lookups may be somewhat slower
• Still much faster than a linear search through the entire directory
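
A sketch of a chained-overflow hash-table directory in C; the bucket count, hash function, and entry fields are illustrative choices rather than any particular operating system's layout.

/* Each bucket is a linked list of entries mapping a file name to its data location. */
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 128   /* fixed table size (assumed) */

struct dir_entry {
    char name[32];          /* file name */
    long first_block;       /* location of the file's data (e.g., first block) */
    struct dir_entry *next; /* chaining resolves collisions */
};

static struct dir_entry *table[NBUCKETS];

static unsigned hash(const char *name) {   /* simple string hash */
    unsigned h = 5381;
    while (*name) h = h * 33 + (unsigned char)*name++;
    return h % NBUCKETS;
}

/* Insertion: hash the name and prepend to the bucket's chain. */
void dir_add(const char *name, long first_block) {
    struct dir_entry *e = malloc(sizeof *e);
    strncpy(e->name, name, sizeof e->name - 1);
    e->name[sizeof e->name - 1] = '\0';
    e->first_block = first_block;
    unsigned b = hash(name);
    e->next = table[b];
    table[b] = e;
}

/* Lookup: hash the name, then walk the (short) chain. */
struct dir_entry *dir_lookup(const char *name) {
    for (struct dir_entry *e = table[hash(name)]; e; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}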

Allocation Methods
• How to allocate space to files so that disk space is used effectively and files can be accessed quickly

Contiguous Allocation
• Each file occupies a set of contiguous blocks on the disk
• Number of disk seeks required for accessing contiguously allocated files is
minimal
• Directory entry contains starting location (block #) and length (number of blocks)
• Both sequential and direct access are supported
• This is an application of the dynamic storage-allocation problem: how to satisfy a request of size n from a list of free holes (a sketch of first-fit and best-fit follows this section)
– First-fit: allocate the first hole that is big enough
– Best-fit: allocate the smallest hole that is big enough; must search the entire list unless it is ordered by size. Produces the smallest leftover hole.
– Worst-fit: allocate the largest hole; must also search the entire list. Produces the largest leftover hole.
– First-fit and best-fit are better than worst-fit in terms of speed and storage utilization
– Worst-fit can sometimes be preferable to best-fit, because its larger leftover hole is more likely to be big enough to hold another file
• All of the above algorithms suffer from external fragmentation. As files are allocated and deleted, the free disk space is broken into little pieces. External fragmentation exists whenever free space is broken into chunks; it becomes a problem when the largest contiguous chunk is too small for a request, i.e. there is enough total free space but no single hole is large enough to store a file of the required size.
• Solution: Defragmentation (compaction)
To avoid losing a significant amount of disk space to external fragmentation, the user can run a compaction routine that copies the entire file system onto another disk; the original disk is then completely freed, creating one large contiguous free hole, and the files are copied back. All free blocks are thus compacted into a single hole. The overhead of this scheme is time, so it is not done frequently.
• Another problem: a contiguously allocated file cannot easily grow
• Solution:
• Allocate space in contiguous chunks (extents)
• The location of a file's blocks is then recorded as
• Location (starting block)
• Block count
• Link to the first block of the next chunk
• Problems with contiguous allocation
– Suffers from external fragmentation
– The size of the file must be known in advance, and in most cases it is very difficult to estimate a file's final size
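
The first-fit and best-fit strategies above can be sketched over an in-memory list of free holes; the hole representation is an assumption made for illustration.

/* Dynamic storage-allocation sketch for contiguous allocation. */
struct hole { long start; long length; };   /* starting block and size in blocks */

/* First-fit: return the index of the first hole with length >= n, or -1. */
long first_fit(const struct hole *holes, long nholes, long n) {
    for (long i = 0; i < nholes; i++)
        if (holes[i].length >= n)
            return i;
    return -1;
}

/* Best-fit: return the index of the smallest hole with length >= n, or -1.
 * Must scan the whole list unless it is kept ordered by size. */
long best_fit(const struct hole *holes, long nholes, long n) {
    long best = -1;
    for (long i = 0; i < nholes; i++)
        if (holes[i].length >= n &&
            (best < 0 || holes[i].length < holes[best].length))
            best = i;
    return best;   /* leaves the smallest leftover hole */
}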

Linked Allocation
• Linked allocation solves all problems of contiguous allocation.
• Each file is a linked list of disk blocks; the blocks may be scattered anywhere on the disk
• Directory entry contains a pointer to the first and last blocks of the file
• No external fragmentation
• File size need not be declared during creation. A file can continue to grow as long as free
blocks are available.
• It is never necessary to compact disk space.
• Problems with linked allocation
• Can be used effectively only for sequential-access files
• Disk space is consumed by the pointers
• Solutions
– Allocate space in clusters (e.g., 4 blocks at a time)
– This increases internal fragmentation
• Reliability
– A bug in the OS software or a disk hardware failure may cause a pointer to be lost or damaged
• Solutions
– Use a doubly linked list
– Store the file name and relative block number in each block
• An important variation on linked allocation is the use of a file-allocation table (FAT)

File Allocation Table (FAT):


• A section of the disk at the beginning of each partition is set aside to contain the FAT
• The FAT contains
– One entry for each disk block, indexed by block number
– The directory entry contains the block number of the first block of the file
– The FAT entry for the last block of a file contains a special end-of-file marker
– Unused blocks have a FAT entry of 0
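
A small sketch of following a file's chain of blocks through an in-memory FAT; the end-of-file marker, table size, and example chain are illustrative, and real FAT variants encode these details differently.

/* fat[i] holds the number of the block that follows block i in its file. */
#include <stdio.h>

#define FAT_EOF  -1   /* special end-of-file marker (assumed encoding) */
#define FAT_FREE  0   /* unused blocks hold 0 */

/* Count the blocks of a file by walking its FAT chain from the first block. */
long count_file_blocks(const long *fat, long first_block) {
    long n = 0;
    for (long b = first_block; b != FAT_EOF; b = fat[b])
        n++;                      /* visit each block of the file in order */
    return n;
}

int main(void) {
    /* Tiny example: a file occupying blocks 2 -> 5 -> 3. */
    long fat[8] = {0, 0, 5, FAT_EOF, 0, 3, 0, 0};
    printf("file has %ld blocks\n", count_file_blocks(fat, 2));  /* prints 3 */
    return 0;
}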

Indexed Allocation
• Brings all pointers together into the index block
• Each file has its own index block, which is an array of disk-block addresses.
• Directory entry contains the address of the index block
• All pointers in the index block are NULL initially
• ith entry in the index block points to the ith block within the file
• Supports direct access, without suffering from external fragmentation.
• Pointer overhead is greater than that of linked allocation
• How large should the index block be?
• Linked scheme:
– An index block is normally one disk block
– For large files, link together several index blocks
• Multilevel index:
– First level index block points to a set of second level index blocks which contain
pointers to the file blocks
• Combined scheme
– Example: the UNIX file system inode (direct blocks plus single-, double-, and triple-indirect index blocks)
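
A sketch of the logical-to-physical mapping done through a single index block; the block size, pointer size, and the use of block 0 as a "NULL" value are assumptions made for the example.

/* The i-th entry of a file's index block gives the disk address of its i-th data block. */
#define BLOCK_SIZE     1024
#define PTR_SIZE       4
#define PTRS_PER_BLOCK (BLOCK_SIZE / PTR_SIZE)   /* 256 pointers per index block */
#define NO_BLOCK       0                         /* entry still NULL / unallocated */

/* Map a logical block number within the file to a physical disk block. */
long index_lookup(const long index_block[PTRS_PER_BLOCK], long logical) {
    if (logical < 0 || logical >= PTRS_PER_BLOCK)
        return NO_BLOCK;            /* beyond what one index block covers; a linked
                                       or multilevel index scheme is needed */
    return index_block[logical];    /* direct access: at most one extra block read */
}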

Free-Space Management/Free-Space Allocation

• Bit vector: Frequently, the free-space list is implemented as a bit map or bit vector, with one bit for each block on the disk. If the block is free, the bit is 1; if the block is allocated, the bit is 0.
For example, consider a disk where blocks 2,3,4,5,8,9,10,11,12,13,17,18,25,26 and 27
are free and the rest of the blocks are allocated. The free-space bit map would be
001111001111110001100000011100000…
• Space required for the bit map: with a block size of 2^12 bytes (4 KB) and a disk size of 2^30 bytes (1 GB), the number of blocks is n = 2^30 / 2^12 = 2^18, so the bit map needs 2^18 bits (32 KB)
• Easy to find
– The first free block
– n consecutive free blocks
• Processors support bit-manipulation instructions that can be used effectively to find free blocks
– An instruction returns the offset of the first 1 bit in a word
• Calculation of the free block number:
block number = (number of bits per word) * (number of 0-valued words) + offset of first 1 bit
• Inefficient unless the entire bit vector is kept in memory
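
A sketch of the first-free-block calculation described above: skip whole words whose value is 0 (all blocks allocated), then find the offset of the first 1 bit. The word width and the in-memory bit map are illustrative.

/* Bit i of the vector is 1 if block i is free, 0 if it is allocated. */
#include <limits.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

long first_free_block(const unsigned long *bitmap, long nwords) {
    for (long w = 0; w < nwords; w++) {
        if (bitmap[w] == 0)
            continue;                     /* whole word is 0: every block allocated */
        unsigned long word = bitmap[w];
        long offset = 0;
        while ((word & 1UL) == 0) {       /* offset of the first 1 bit in the word */
            word >>= 1;
            offset++;
        }
        return (long)(w * BITS_PER_WORD) + offset;   /* bits per word * 0-valued words + offset */
    }
    return -1;                            /* no free block */
}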
Free-Space Management (contd.)

• Linked list (free list)
– Keep a pointer to the first free block in a special location on the disk and cache it in memory
– Each free block contains a pointer to the next free block
– To traverse the list, each free block has to be read, requiring substantial I/O
– No extra space is wasted, since the pointers are stored in the free blocks themselves
• Grouping
– Store the addresses of n free blocks in the first free block
– The first n-1 of these blocks are actually free
– The last block contains the addresses of another n free blocks, and so on
– The addresses of a large number of free blocks can be found quickly
• Counting
– Generally, several contiguous blocks are allocated or freed simultaneously
– Keep the address of the first free block and the number n of free contiguous blocks that follow it (a sketch follows this list)
– Each entry in the free list then contains
• A disk address
• A count
– The overall free list is shorter, as long as each count is generally greater than 1
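
A sketch of a counting-based free list as an array of (disk address, count) pairs; the representation is an illustrative assumption.

/* Each entry describes a run of contiguous free blocks. */
struct free_run {
    long start;   /* disk address of the first free block in the run */
    long count;   /* number of contiguous free blocks starting there */
};

/* The total number of free blocks is simply the sum of the counts. */
long total_free(const struct free_run *runs, long nruns) {
    long total = 0;
    for (long i = 0; i < nruns; i++)
        total += runs[i].count;
    return total;
}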
Efficiency and Performance

• Disks tend to be the bottleneck in system performance, since they are the slowest component
• Efficiency depends on
– Disk allocation and directory algorithms
• UNIX: pre-allocating inodes and spreading them across the partition improves performance
• Using clusters as the lowest allocation unit improves performance at the cost of internal fragmentation
– The type of data kept in a file's directory entry
• Every time a file is opened, its directory entry must be read and may be written as well; keeping too much data there is inefficient for frequently accessed files
• The size of the pointer used to access data is also important
– The length of the data structures (process table, open-file table, etc.) maintained by the OS
• Fixed length, allocated at system start-up: when full, the system fails to provide service
• Allocated dynamically: the algorithms become more complicated and the OS a little slower
Efficiency and Performance (contd.)

• Performance
– Once the basic file-system algorithms are selected, performance can still be improved
– Most disk controllers have an on-board cache large enough to hold an entire track at a time
– Some OSs maintain a portion of main memory as a disk cache to hold disk blocks, assuming that they will be used again
– Some OSs use a page cache, caching file data as pages rather than as file-system blocks
– Unified virtual memory: the same page caching is used for both process pages and file data
– LRU is used for block/page replacement
– When data are written to a disk file, pages are buffered in the cache, and the disk driver sorts its output queue by disk address, minimizing disk-head seeks
Efficiency and Performance (contd.)

– Synchronous writes
• Occur in the order in which the disk subsystem receives them
• Writes are not buffered
• The calling routine must wait for the data to reach the disk before it can proceed
• Metadata writes, for example, are often made synchronous
– Asynchronous writes
• Data are stored in the cache and control returns to the caller immediately (a sketch contrasting the two follows this section)
– RAM disk
• A section of memory is set aside and treated as a virtual disk, or RAM disk
• The RAM-disk driver accepts all standard disk operations but performs them in memory
• RAM disks are useful mainly for temporary storage (e.g., intermediate compiler files)
• The contents of a RAM disk are entirely under user control
• The contents of a disk cache are under the control of the OS
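
The difference between asynchronous (cache-buffered) writes and synchronous writes can be illustrated with the standard POSIX open(2) flags; the file names below are made up for the example.

/* Asynchronous vs. synchronous writes (assumes a POSIX system). */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    const char *msg = "important data\n";

    /* Asynchronous (default): data lands in the OS cache and control returns at once. */
    int fd1 = open("async.log", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    write(fd1, msg, strlen(msg));
    close(fd1);

    /* Synchronous: with O_SYNC, each write blocks until the data reaches the disk. */
    int fd2 = open("sync.log", O_CREAT | O_WRONLY | O_TRUNC | O_SYNC, 0644);
    write(fd2, msg, strlen(msg));
    close(fd2);

    /* Alternatively, fsync(fd) forces a file's buffered data out to the disk. */
    return 0;
}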
