Unit 5 OS
STORAGE MANAGEMENT
FILE CONCEPT
Computers can store information on several different storage media, such as
magnetic disks, magnetic tapes and optical disks. Each file has a unique name,
making it easy to find and use.
A file has a certain defined structure according to its type,
Text File
A Text file is a collection of characters arranged in lines or rows. Each line
contains a sequence of characters.
Source File
A Source file is a collection of subroutines and functions. Subroutines and
functions are sections of code that perform specific tasks.
Example: In the source code for a calculator program, you might have an “add”
function that adds two numbers and a “subtract” function that subtracts one
number from another. These functions are organized within the source file to
make the program work.
Object File
An Object file is a file containing a sequence of bytes that are structured in a
way the system’s linker can understand and use.
Example: Imagine you are writing a program and you have several source code
files, each containing different functions and variables. When you compile these
source files, they are transformed into object files. The linker’s job is to
combine those object files into a single executable program.
Executable File
An Executable file is a file containing instructions that a loader can load into
the computer’s memory and then execute.
Example: Imagine you have a computer game called “my_game.exe”. When
you double-click on this file, the game starts, so it is considered an executable
file. Inside “my_game.exe” there are instructions that tell your computer how to
play the game. These instructions need to be loaded into your computer’s
memory before the game can run.
FILE ATTRIBUTES
A file is named for the convenience of its human users. A name is usually a string
of characters such as “example.c”.
A file's attributes typically include,
NAME – The symbolic file name. This is the only information kept in human
readable format.
IDENTIFIER – A unique tag, usually a number, that identifies the file within the
file system.
TYPE – Needed for systems that support different types of files.
LOCATION – A pointer to the location of the file on a device.
SIZE – The current file size.
PROTECTION – The access control information which determines who can
read, write & execute the file.
TIME, DATE & USER IDENTIFICATION – This information may be kept for
creation, last modification and last use. It is useful for security, protection
and usage monitoring.
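As a rough illustration of the attributes listed above, the sketch below asks the operating system for the metadata of a small file through Python's os.stat. The helper name describe and the demo file are made up for this example, and the field meanings noted in the comments (for instance, st_ino as the unique identifier) hold on UNIX-like systems; other systems may report them differently.

```python
import os
import stat
import tempfile
import time

def describe(path):
    """Return a dict of attributes the OS records for `path` (illustrative helper)."""
    info = os.stat(path)
    mode = info.st_mode
    if stat.S_ISDIR(mode):
        kind = "directory"
    elif stat.S_ISREG(mode):
        kind = "regular file"
    else:
        kind = "other"
    return {
        "name": os.path.basename(path),             # human-readable name
        "identifier": info.st_ino,                  # unique tag (inode number on UNIX-like systems)
        "type": kind,
        "size": info.st_size,                       # current size in bytes
        "protection": oct(stat.S_IMODE(mode)),      # access-control bits
        "modified": time.ctime(info.st_mtime),      # time of last modification
        "owner_uid": info.st_uid,                   # user identification
    }

# Hypothetical demo file, created only for this example.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "example.c")
    with open(demo, "wb") as f:
        f.write(b"int main(void) { return 0; }\n")
    attrs = describe(demo)
    for key, value in attrs.items():
        print(f"{key:12} {value}")
```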
FILE OPERATIONS
A file is an abstract data type. System calls are the way programs ask the
operating system to perform tasks such as creating, reading, writing,
repositioning, deleting and truncating files. To define a file properly,
we need to consider the operations that can be performed on files,
Creating a file
Writing a file
Reading a file
Repositioning within a file
Deleting a file
Truncating a file.
Creating A File
Creating a file involves two fundamental steps.
1. Find space in the file system.
To create a file, the system must first find free space for it.
2. Create an entry in the directory.
Once space has been found, an entry for the new file is made in the directory.
This directory entry associates the file’s name with its location.
Writing A File
To write a file, we make a system call specifying both the name of the file and
the information to be written to the file.
Reading A File
To read from a file, we use a system call that specifies the name of the file and
where (in memory) the next block of the file should be put.
Repositioning Within A File
The file pointer is moved to a given position (a file seek). This allows you to
read or write data from a particular point within the file.
Deleting A File
To delete a file, we search the directory for the named file and erase the entry.
Truncating A File
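The user may want to erase the contents of a file but keep the attributes.
Truncation resets the file length to zero and releases its space, leaving the
other attributes unchanged.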
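The six operations above can be sketched with the low-level calls in Python's os module, which map closely onto the underlying system calls. This is only an illustration under a few assumptions: the file name notes.txt is made up, and POSIX-style calls identify an open file by a descriptor returned from open rather than by passing the file name on every call.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.txt")        # hypothetical file name

    # Create: the OS finds space and adds a directory entry for the name.
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)

    # Write: data goes at the current file pointer, which then advances.
    os.write(fd, b"hello, file system")

    # Reposition (seek): move the file pointer to byte offset 7.
    os.lseek(fd, 7, os.SEEK_SET)

    # Read: fetch the next 4 bytes starting at the file pointer.
    chunk = os.read(fd, 4)

    # Truncate: erase the contents but keep the file and its attributes.
    os.ftruncate(fd, 0)
    size_after_truncate = os.fstat(fd).st_size
    os.close(fd)

    # Delete: remove the directory entry for the file.
    os.unlink(path)
    still_exists = os.path.exists(path)

print(chunk, size_after_truncate, still_exists)
```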
ACCESS METHODS
Information in a file can be accessed in several ways,
Sequential Access
Direct or Relative Access
Other Access Methods
SEQUENTIAL ACCESS
It is the simplest access method. Information in the file is processed in order,
one record after the other. This is most commonly used by editors and
compilers. Sequential access involves reading or writing data in a sequential,
linear manner from beginning of the file to the end. It is similar to reading a
book from the first page to the last page one page at a time. This method is
efficient for tasks like processing logs.
Most common operations on files are,
Read
Write
Whenever a file is opened for a read or write operation, a file pointer is
maintained to keep track of the current position in the file. A read operation
reads the next portion of the file and advances the file pointer. A write
operation appends data to the end of the file and moves the pointer to the new
end.
DIRECT ACCESS
Direct access is also known as relative access. The file is made up of fixed
length logical records, which can be read and written in no particular order. It
is most suitable for database applications because it is easy to read, write and
delete an individual record. For direct access, the file is viewed as a numbered
sequence of blocks or records. A block number provided by the user to the
operating system is normally a relative block number: the first relative block
of the file is 0, the next is 1 and so on.
Direct or relative access methods therefore allow you to read or write data at
specific locations within the file. Consider a database file where each record
represents information about a customer. If you want to retrieve data for a
specific customer, you can go straight to that customer's record instead of
reading every record before it.
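A minimal sketch of direct access with fixed-length records is shown below. The record layout (a 4-byte customer id followed by a 20-byte name), the helper names and the sample customers are all assumptions made for illustration. The key point is that relative record number n lives at byte offset n multiplied by the record size, so any record can be reached with a single seek.

```python
import io
import struct

# Assumed layout: 4-byte unsigned id + 20-byte name = 24 bytes per record.
RECORD = struct.Struct("<I20s")

def write_record(f, n, cust_id, name):
    f.seek(n * RECORD.size)                  # relative record number -> byte offset
    f.write(RECORD.pack(cust_id, name.encode()))

def read_record(f, n):
    f.seek(n * RECORD.size)                  # jump directly, no sequential scan
    cust_id, raw = RECORD.unpack(f.read(RECORD.size))
    return cust_id, raw.rstrip(b"\0").decode()

f = io.BytesIO()                             # in-memory stand-in for a disk file
customers = [(101, "Asha"), (102, "Ravi"), (103, "Meena")]   # made-up data
for n, (cust_id, name) in enumerate(customers):
    write_record(f, n, cust_id, name)

print(read_record(f, 2))                     # read the third record first
print(read_record(f, 0))                     # then the first one
```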
INDEXED SEQUENTIAL ACCESS
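This is another method of accessing a file, built on top of the sequential
access method. An index is a small table, usually kept in memory, that contains
pointers to the various blocks of the file. To find a record, we first search
the index and then use the pointer to go directly to the block that holds it.
For large files, indexes can be arranged in a hierarchy that points to the
records in the file.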
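The idea can be sketched as follows, assuming records are kept sorted by key and the index stores the first key of each block. The block contents, the three-records-per-block size and the function name lookup are hypothetical. The index search picks one block, and only that block is then scanned sequentially.

```python
import bisect

# Hypothetical file of sorted (key, value) records, three records per block.
blocks = [
    [("adams", 1), ("baker", 2), ("clark", 3)],
    [("davis", 4), ("evans", 5), ("frank", 6)],
    [("green", 7), ("hall", 8), ("irwin", 9)],
]

# The index holds the first key of each block; its position is the block number.
index = [block[0][0] for block in blocks]

def lookup(key):
    b = bisect.bisect_right(index, key) - 1   # last block whose first key <= key
    if b < 0:
        return None                           # key sorts before every block
    for k, value in blocks[b]:                # sequential scan inside one block only
        if k == key:
            return value
    return None

print(lookup("evans"))
```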
DIRECTORY STRUCTURE
The directory structure is a collection of nodes containing information about
all files. Both the directory structure and the files reside on disk. A
directory can be defined as a listing of the related files on the disk.
Directory operations are listed below,
Search for a file
Create a file
Delete a file
List a directory
Rename a file
Traverse the file system
SINGLE LEVEL DIRECTORY
There is a single directory for all users.
Since all the files are in the same directory, they must have unique names. Two
files cannot have the same name.
Drawback: Naming files becomes difficult as the number of files or users grows,
because every name must be unique.
The solution to this problem is to create a separate directory for each user.
ABSOLUTE PATH
An absolute path always starts from the root directory (/ on UNIX-like systems,
or a drive root such as C:\ on Windows).
Example: C:\Users\HP\Desktop\Notes\OS-unit5.txt
RELATIVE PATH
A relative path starts from the current directory.
Example: Notes\OS-unit5.txt (when the current directory is C:\Users\HP\Desktop)
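The difference between the two kinds of path can be checked with Python's pathlib, as in the sketch below. PureWindowsPath only manipulates path strings, so it runs on any operating system. The current directory used here is an assumption taken from the absolute path example.

```python
from pathlib import PureWindowsPath

current_dir = PureWindowsPath(r"C:\Users\HP\Desktop")    # assumed current directory
relative = PureWindowsPath(r"Notes\OS-unit5.txt")

# A relative path is resolved by joining it to the current directory.
absolute = current_dir / relative
print(relative.is_absolute(), absolute.is_absolute())
print(absolute)

# A path with a leading backslash is anchored at the root of the current
# drive, so joining it replaces the directory part instead of extending it.
rooted = current_dir / PureWindowsPath(r"\Notes\OS-unit5.txt")
print(rooted)
```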
ACYCLIC GRAPH DIRECTORIES
Here a file in one directory can be accessed from multiple directories. In this
way, files can be shared between users.
In the given diagram, a file is shared between multiple users. If any user
makes a change, it is visible to all the users who share the file.
Two or more directory entries can point to the same file or subdirectory. That
file or subdirectory is shared between those directory entries. Links can either
be symbolic (soft or logical links) or hard (physical links).
If a file gets deleted in an acyclic graph structured directory system, then
1. In the case of a soft link, the file is simply deleted and the link is left
as a dangling pointer.
2. In the case of a hard link, the actual file is deleted only when all the
references to it have been deleted.
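The two deletion cases above can be observed with a short experiment. The file names are made up, and the sketch assumes a UNIX-like system. On Windows, creating symbolic links may need extra privileges.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "report.txt")   # hypothetical file names
    hard = os.path.join(tmp, "hard_link.txt")
    soft = os.path.join(tmp, "soft_link.txt")

    with open(original, "w") as f:
        f.write("shared data")

    os.link(original, hard)       # hard link: a second directory entry for the same file
    os.symlink(original, soft)    # soft link: a small file that stores the target path
    links_before = os.stat(original).st_nlink    # reference count kept by the file system

    os.unlink(original)           # delete the original name

    with open(hard) as f:
        hard_content = f.read()   # data survives because one reference remains
    soft_is_link = os.path.islink(soft)      # the link itself still exists
    soft_resolves = os.path.exists(soft)     # but its target is gone (dangling)

print(links_before, hard_content, soft_is_link, soft_resolves)
```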
Advantages
Sharing of files and directories is allowed between multiple users.
Disadvantages
Since it has a complex structure, it is difficult to implement this directory
structure.
If we need to delete the file, then we need to delete all the references of
the file in order to delete it permanently.
HOW DO WE GUARANTEE NO CYCLE?
Allow links only to files, not to subdirectories.
Every time a new link is added, use a cycle detection algorithm to
determine whether it is OK.
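One possible form of such a check is sketched below. It is a plain depth-first search, not a specific algorithm prescribed by these notes. Adding a link from directory parent to directory target creates a cycle exactly when parent can already be reached from target. The directory names and the function name creates_cycle are hypothetical.

```python
def creates_cycle(children, parent, target):
    """Return True if adding the link parent -> target would create a cycle.

    `children` maps each directory name to the directories it links to.
    """
    stack, seen = [target], set()
    while stack:
        node = stack.pop()
        if node == parent:          # parent is reachable from target: cycle
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children.get(node, []))
    return False

# Hypothetical directory graph: root -> a -> b, and root -> c
tree = {"root": ["a", "c"], "a": ["b"], "b": [], "c": []}

print(creates_cycle(tree, "b", "root"))   # linking b back to root
print(creates_cycle(tree, "c", "b"))      # sharing b from c
```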
DISK STRUCTURE
A hard disk can be divided into a number of partitions of different sizes. The
partitions are also called volumes or mini disks.
The technique that the operating system uses to decide which pending request is
to be satisfied next is called disk scheduling.
Seek Time
Seek time is the time taken to move the disk arm to the specified track where
the read/write request will be satisfied.
Rotational Latency
It is the time taken for the desired sector to rotate into position under the
read/write head.
Transfer Time
It is the time taken to transfer the data.
Disk Access Time
Disk access time is given as,
Disk Access Time = Rotational Latency + Seek Time + Transfer Time
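As a rough worked example of this formula, the sketch below adds up the three components for one request. The drive figures (5 ms seek, 6000 RPM, 1 ms transfer) are assumed values chosen for easy arithmetic. Taking the average rotational latency as half a revolution is a common simplifying assumption, not something stated in these notes.

```python
def disk_access_time_ms(seek_ms, rpm, transfer_ms):
    rotation_ms = 60_000 / rpm                  # time for one full revolution
    rotational_latency_ms = rotation_ms / 2     # assumed average: half a revolution
    return rotational_latency_ms + seek_ms + transfer_ms

# Assumed drive: 5 ms seek, 6000 RPM (10 ms per revolution), 1 ms transfer.
access_ms = disk_access_time_ms(seek_ms=5, rpm=6000, transfer_ms=1)
print(access_ms)   # 5 ms latency + 5 ms seek + 1 ms transfer = 11.0
```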
Disk Response Time
It is the average time a request spends waiting for its I/O operation to be
performed.
DISK SCHEDULING ALGORITHM
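FCFS (First Come First Served)
In FCFS, requests are served in the order in which they arrive.
Example: Suppose a disk has 200 tracks (0-199). The request sequence is
(82,170,43,140,24,16,190) and the head starts at track 50.
"Seek time" (counted in tracks of head movement) is calculated by adding the
head movements between consecutive requests. Each term below is one sweep of
the head in a single direction: from 50 up to 170, down to 43, up to 140, down
to 16 and up to 190.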
Seek time
= (170-50) + (170-43) + (140-43) + (140-16) + (190-16)
= 120 + 127 + 97 + 124 + 174
= 642
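The same total can be reproduced with a few lines of code. This sketch serves the requests strictly in arrival order and adds up the absolute head movement for each one. The function name fcfs_seek is made up for illustration.

```python
def fcfs_seek(head, requests):
    """Total head movement when requests are served in arrival order."""
    total = 0
    for track in requests:
        total += abs(track - head)   # distance from current position to next request
        head = track
    return total

requests = [82, 170, 43, 140, 24, 16, 190]
print(fcfs_seek(50, requests))   # 642
```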
SSTF (Shortest Seek Time First)
In SSTF (Shortest Seek Time First), the request with the shortest seek time
from the current head position is executed first. The seek time of every
pending request in the queue is calculated and the closest request is chosen.
The calculation is repeated from the new head position after each request.
Example: Suppose a disk has 200 tracks (0-199). The request sequence is
(82,170,43,140,24,16,190) and the head starts at track 50.
The head first moves down from 50 to 16 (serving 43, 24 and 16) and then up to
190 (serving 82, 140, 170 and 190).
Seek Time
= (50-16) + (190-16)
= 34 + 174
= 208
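A small simulation, given here as a sketch with a made-up function name, confirms both the service order and the total. At every step it picks the pending request closest to the current head position.

```python
def sstf(head, requests):
    """Return (service order, total head movement) for Shortest Seek Time First."""
    pending = list(requests)
    order, total = [], 0
    while pending:
        nearest = min(pending, key=lambda track: abs(track - head))
        total += abs(nearest - head)
        head = nearest
        pending.remove(nearest)
        order.append(nearest)
    return order, total

order, total = sstf(50, [82, 170, 43, 140, 24, 16, 190])
print(order)   # [43, 24, 16, 82, 140, 170, 190]
print(total)   # 208
```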
SCAN (Elevator Algorithm)
In this algorithm, the head moves in one direction, serving all the requests in
its path, until it reaches the end of the disk. After that, it reverses its
direction and serves the requests in its path on the way back. Due to this
feature, this algorithm is also known as the "Elevator Algorithm". It works the
way an elevator works: an elevator moves in one direction all the way to the
last floor in that direction and then turns back.
Example: Suppose a disk has 200 tracks (0-199). The request sequence is
(82,170,43,140,24,16,190) and the head starts at track 50. The disk arm first
moves towards the larger values.
The head travels from 50 up to track 199 and then back down to 16, the lowest
request.
Seek time
= (199-50) + (199-16)
= 149 + 183
= 332
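The calculation above can be sketched in code as follows. The function name is hypothetical, and the sketch assumes, as in the example, that the arm first moves towards the larger track numbers and always travels to the last track before reversing.

```python
def scan_seek(head, requests, max_track=199):
    """Total head movement for SCAN, moving towards larger tracks first."""
    lower = [t for t in requests if t < head]
    total = max_track - head              # sweep up to the end of the disk
    if lower:
        total += max_track - min(lower)   # sweep back down to the lowest request
    return total

print(scan_seek(50, [82, 170, 43, 140, 24, 16, 190]))   # 332
```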
C-SCAN (Circular SCAN)
In the C-SCAN algorithm, the arm of the disk moves in a particular direction
servicing requests until it reaches the end of the disk. It then jumps to the
opposite end of the disk without servicing any request on the way, and starts
moving in the same direction again, servicing the remaining requests.
Example: Suppose a disk has 200 tracks (0-199). The request sequence is
(82,170,43,140,24,16,190) and the head starts at track 50.
The head travels from 50 up to 199, jumps from 199 back to 0, and then moves
from 0 up to 43. The return jump is counted in the total here.
Seek time
= (199−50) + (199−0) + (43−0)
= 149 + 199 + 43
= 391
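A sketch of the same calculation in code is given below. The function name is made up. Like the worked example, it assumes the arm moves towards the larger tracks and counts the return jump from the last track to track 0 as head movement.

```python
def cscan_seek(head, requests, min_track=0, max_track=199):
    """Total head movement for C-SCAN, moving towards larger tracks."""
    lower = [t for t in requests if t < head]
    total = max_track - head               # sweep up to the end of the disk
    if lower:
        total += max_track - min_track     # jump back to the first track
        total += max(lower) - min_track    # sweep up to the last remaining request
    return total

print(cscan_seek(50, [82, 170, 43, 140, 24, 16, 190]))   # 391
```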
LOOK
It is similar to the SCAN algorithm. In this algorithm, the disk arm moves only
as far as the last request in its direction of travel, servicing requests on the
way. After reaching the last request, it reverses its direction and services the
requests on the way back. It does not go to the end of the disk. Instead, it
goes only to the last request in each direction.
Example: Suppose a disk has 200 tracks (0-199). The request sequence is
(82,170,43,140,24,16,190) and the head starts at track 50.
The head travels from 50 up to 190, the highest request, and then back down to
16, the lowest request.
Seek time
= (190-50) + (190-16)
= 140 + 174
= 314
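The LOOK total can be checked with the short sketch below. The function name is hypothetical, and the arm is assumed to move towards the larger tracks first, as in the example.

```python
def look_seek(head, requests):
    """Total head movement for LOOK, moving towards larger tracks first."""
    upper = [t for t in requests if t >= head]
    lower = [t for t in requests if t < head]
    total = 0
    if upper:
        total += max(upper) - head    # go only as far as the highest request
        head = max(upper)
    if lower:
        total += head - min(lower)    # reverse and go down to the lowest request
    return total

print(look_seek(50, [82, 170, 43, 140, 24, 16, 190]))   # 314
```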
CLOOK (Circular Look)
In this algorithm, the arm of the disk moves outwards servicing requests until
it reaches the highest request. It then jumps to the lowest requested cylinder
without servicing any request on the way, and again starts moving outwards
servicing the remaining requests.
Example: Suppose a disk has 200 tracks (0-199). The request sequence is
(82,170,43,140,24,16,190) and the head starts at track 50.
The head travels from 50 up to 190, jumps from 190 down to 16, and then moves
from 16 up to 43.
Seek Time
= (190−50) + (190−16) + (43−16)
= 140 + 174 + 27
= 341
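The C-LOOK figure can be reproduced with the sketch below. The function name is made up. As in the worked example, the jump from the highest to the lowest request is counted as head movement.

```python
def clook_seek(head, requests):
    """Total head movement for C-LOOK, moving towards larger tracks."""
    upper = [t for t in requests if t >= head]
    lower = [t for t in requests if t < head]
    total = 0
    if upper:
        total += max(upper) - head         # sweep up to the highest request
        head = max(upper)
    if lower:
        total += head - min(lower)         # jump to the lowest request
        total += max(lower) - min(lower)   # sweep up through the remaining requests
    return total

print(clook_seek(50, [82, 170, 43, 140, 24, 16, 190]))   # 341
```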
FILE SHARING
Sharing of files in a multi user system is desirable.
Sharing may be done through a protection scheme.
On distributed systems, files may be shared across a network.
Network File System (NFS) is a common distributed file-sharing method.
MULTIPLE USERS
User IDs identify users, allowing per-user permissions.
Group IDs allow users to be in groups, permitting group access rights.
REMOTE FILE SYSTEM
Uses networking to allow file system access between systems.
o Manually via programs like FTP.
o Automatically, seamlessly using distributed file systems.
o Semi automatically via the world wide web.
Client-server model allows clients to mount remote file systems from
servers.
o Server can serve multiple clients.
o Client and user-on-client identification is insecure or complicated.
o NFS is the standard UNIX client-server file sharing protocol.
o CIFS is the standard Windows protocol.
o Standard operating system file calls are translated into remote calls.
Distributed Information Systems (distributed naming services) such as
LDAP, DNS, NIS, Active Directory implement unified access to information
needed for remote computing.
PROTECTION
File owner should be able to control-
What can be done
By whom
Types of access
Read
Write
Execute
Append
Delete
List
MODES OF ACCESS
Read (R) => 4
Write (W) => 2
Execute (X) => 1
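4, 2 and 1 are the constant values for read, write and execute. The access
digit for a class of users is the sum of the values of the permissions granted.
3 Classes of users
                  Digit   R W X
Owner access        7     1 1 1
Group access        6     1 1 0
Public access       1     0 0 1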
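The arithmetic behind the table can be sketched in a few lines. The helper names digit and to_rwx are made up for illustration. The three resulting digits (7, 6, 1) are in the same form that a UNIX command such as chmod 761 expects.

```python
READ, WRITE, EXECUTE = 4, 2, 1

def digit(r, w, x):
    """Combine three 0/1 permission flags into one access digit."""
    return READ * r + WRITE * w + EXECUTE * x

def to_rwx(d):
    """Expand an access digit back into an rwx string."""
    return (("r" if d & READ else "-")
            + ("w" if d & WRITE else "-")
            + ("x" if d & EXECUTE else "-"))

# Owner, group and public rows from the table above.
mode = [digit(1, 1, 1), digit(1, 1, 0), digit(0, 0, 1)]
symbolic = "".join(to_rwx(d) for d in mode)
print(mode, symbolic)   # [7, 6, 1] rwxrw---x
```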
ALLOCATION METHODS
The allocation methods define how the files are stored in the disk blocks. There
are three main disk space or file allocation methods.
Contiguous Allocation
Linked Allocation
Indexed Allocation
CONTIGUOUS ALLOCATION
In this method, each file occupies a contiguous set of blocks on the disk. For
example, if a file requires n blocks and is given a block b as the starting
location, then the blocks assigned to the file will be: b, b+1, b+2,…b+n-1.
The directory entry for a file with contiguous allocation contains
Address of starting block
Length of the allocated portion.
The file ‘tr’ in the following figure starts at block 14 and has a length of 3
blocks. Therefore, it occupies blocks 14, 15 and 16.
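A minimal sketch of contiguous allocation is given below, using the ‘tr’ entry as data. The dictionary layout and function names are assumptions made for illustration. Because the blocks are consecutive, the physical block for any logical block is found with a single addition.

```python
# Directory entry per file: name -> (starting block, length in blocks)
directory = {"tr": (14, 3)}

def contiguous_blocks(start, length):
    """All disk blocks occupied by a file: b, b+1, ..., b+n-1."""
    return list(range(start, start + length))

def physical_block(start, length, logical):
    """Map a logical block number within the file to a disk block number."""
    if not 0 <= logical < length:
        raise IndexError("logical block lies outside the file")
    return start + logical

start, length = directory["tr"]
print(contiguous_blocks(start, length))   # [14, 15, 16]
print(physical_block(start, length, 2))   # 16
```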
LINKED ALLOCATION
In this method, each file is a linked list of disk blocks which need not be
contiguous. The disk blocks can be scattered anywhere on the disk.
The directory entry contains a pointer to the starting and the ending file block.
Each block contains a pointer to the next block occupied by the file.
For the file ‘jeep’ in the following image, the blocks are scattered across the
disk. The last block (25) contains -1, indicating a null pointer: it does not
point to any other block.
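Following the chain of pointers can be sketched as below. Only the final block (25) and its -1 marker come from the text. The starting block 9 and the intermediate blocks are assumed values, since the figure is not reproduced here. Note that reaching the i-th block of the file requires following i pointers, which is why linked allocation suits sequential access better than direct access.

```python
END = -1

# Assumed chain for illustration: each block stores the number of the next block.
next_block = {9: 16, 16: 1, 1: 10, 10: 25, 25: END}

def file_blocks(start):
    """Walk the linked list of blocks from the starting block to the null pointer."""
    blocks = []
    block = start
    while block != END:
        blocks.append(block)
        block = next_block[block]   # read the pointer stored in this block
    return blocks

print(file_blocks(9))   # [9, 16, 1, 10, 25]
```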
INDEXED ALLOCATION
In this method, a special block known as the Index block contains the pointers
to all the blocks occupied by a file. Each file has its own index block.
The directory entry contains the address of the index block as shown in the
image.
Linked scheme:
This scheme links two or more index blocks together for holding the pointers.
Every index block would then contain a pointer or the address to the next index
block.
Multilevel index:
In this scheme, a first level index block points to the second level index
blocks, which in turn point to the disk blocks occupied by the file. This can be
extended to 3 or more levels depending on the maximum file size.
Combined Scheme:
In this scheme, a special block called the inode (index node) contains the
information about the file, such as its size, owner and access permissions. The
remaining space in the inode is used to store the addresses of the disk blocks
that contain the actual file data.
Direct Blocks
The pointers contain the addresses of the disk blocks that contain data of the
file.
Indirect Blocks
Indirect blocks may be single indirect, double indirect or triple indirect.
A single indirect block is a disk block that does not contain file data but the
disk addresses of the blocks that contain the file data.
Similarly, a double indirect block does not contain file data but the disk
addresses of the blocks that contain the addresses of the blocks containing the
file data.
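To make the combined scheme concrete, the sketch below works out which kind of pointer covers a given logical block of a file. All the sizes are assumptions chosen for illustration (12 direct pointers, 4096-byte blocks, 4-byte block addresses, and one pointer each for single, double and triple indirection). Real file systems differ in these numbers.

```python
BLOCK_SIZE = 4096                          # assumed block size in bytes
POINTER_SIZE = 4                           # assumed size of one block address
N_DIRECT = 12                              # assumed number of direct pointers
PER_BLOCK = BLOCK_SIZE // POINTER_SIZE     # addresses that fit in one index block

def pointer_level(logical_block):
    """Tell which inode pointer is used to reach a logical block of the file."""
    n = logical_block
    if n < N_DIRECT:
        return "direct"
    n -= N_DIRECT
    if n < PER_BLOCK:
        return "single indirect"
    n -= PER_BLOCK
    if n < PER_BLOCK ** 2:
        return "double indirect"
    n -= PER_BLOCK ** 2
    if n < PER_BLOCK ** 3:
        return "triple indirect"
    raise ValueError("block lies beyond the maximum file size")

# Largest file, in blocks, that this assumed layout can address.
max_blocks = N_DIRECT + PER_BLOCK + PER_BLOCK ** 2 + PER_BLOCK ** 3
print(pointer_level(5), pointer_level(500), pointer_level(5000))
```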
FREE SPACE MANAGEMENT
To keep track of free disk space the system maintains a free space list.
The free space list records all free disk blocks.
BIT VECTOR
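The free space list is implemented as a bit map or bit vector. Each block is
represented by 1 bit. If the block is free, the bit is 1. If the block is
allocated, the bit is 0.
For example, on a disk whose bit vector begins as follows, blocks 2, 3, 4, 5, 8,
9, 10, 11, 12, 13, 17, 18, 25, 26 and 27 are free,
0,0,1,1,1,1,0,0,1,1,1,1,1,1,0,0,0,1,1,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,….
The block number of the first free block is calculated as,
(Number of bits per word) x (Number of 0-value words) + Offset of first 1 bit.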
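The formula can be sketched in code as follows, using the bit vector shown above. The 8-bit word size and the function name are assumptions. A real implementation would test whole machine words against zero instead of slicing a Python list.

```python
def first_free_block(bits, bits_per_word=8):
    """Apply: bits per word x number of 0-value words + offset of first 1 bit."""
    words = [bits[i:i + bits_per_word] for i in range(0, len(bits), bits_per_word)]
    for zero_words, word in enumerate(words):
        if any(word):                        # first word that is not all zeros
            offset = word.index(1)           # position of its first 1 bit
            return bits_per_word * zero_words + offset
    return None                              # no free block at all

bit_vector = [0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0,
              1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0]
print(first_free_block(bit_vector))          # 2

# Two all-zero words come first here, so the answer is 8 x 2 + 3.
print(first_free_block([0] * 16 + [0, 0, 0, 1]))   # 19
```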
LINKED LIST
Another approach to free space management is to link together all the free disk
blocks, keeping a pointer to the first free block. The first block contains a
pointer to the next free disk block and so on.
EFFICIENCY & PERFORMANCE
EFFICIENCY
The efficient use of disk space depends heavily on the disk-allocation and
directory algorithms in use. For instance, UNIX inodes are pre-allocated on a
volume. Even an empty disk has a percentage of its space lost to inodes.
However, by pre-allocating the inodes and spreading them across the volume,
we improve the file system’s performance. This improved performance results
from the UNIX allocation and free-space algorithms, which try to keep a file’s
data blocks near that file’s inode block to reduce seek time.
PERFORMANCE
Some systems maintain a separate section of main memory for a buffer cache,
where blocks are kept under the assumption that they will be used again shortly.
Other systems cache file data using a page cache. The page cache uses virtual
memory techniques to cache file data as pages rather than as file-system-
oriented blocks. Caching file data using virtual addresses is far more efficient
than caching through physical disk blocks, as accesses interface with virtual
memory rather than the file system. Several systems, including Solaris, Linux,
and Windows, use page caching to cache both process pages and file data. This
is known as unified virtual memory.
RECOVERY TECHNIQUES
Consistency checking - compares data in the directory structure with data
blocks on disk, and tries to fix inconsistencies.
Use system programs to back up data from disk to another storage device
(floppy disk, magnetic tape, optical disk).
Recover a lost file or disk by restoring data from backup.
NETWORK FILE SYSTEM (NFS)
NFS is both an implementation and a specification of a software system for
accessing remote files across LANs or WANs.