UNIT 2 OS
A file is a collection of related information recorded on secondary or non-volatile storage such as magnetic disks, optical disks, and tapes. It is the medium through which a program is given input and through which it delivers output. In general, a file is a sequence of bits, bytes, or records whose meaning is defined by the file's creator and user. Every file has a logical location where it is kept for storage and retrieval.
Field:
A field is the basic element of data; it stores a single value, which can be of fixed or variable length.
DATABASE:
A collection of related data is called a database. Relationships among elements of data are explicit.
RECORD:
A record type is a complex data type that allows the programmer to create a new data type with the desired column structure. It groups one or more columns to form a new data type; these columns have their own names and data types.
FILES:
● Files are stored on disk or other storage and do not disappear when a user logs off.
● Files have names and are associated with access permission that permits controlled sharing.
● Files can be arranged in simple or more complex structures to reflect the relationships between them.
File Attributes
A file has a name and data. Moreover, it also stores meta information such as file creation date and time, current size, and last modified date. All this information is called the attributes of the file.
Functions of File
● Create file: find space on disk and make an entry in the directory.
● Write to file: requires positioning within the file.
● Read from file: involves positioning within the file.
● Delete file: remove the directory entry and reclaim the disk space.
SINGLE-LEVEL DIRECTORY
In this scheme, a single directory is maintained for all users.
● Naming problem: users cannot have the same name for two files.
● Grouping problem: users cannot group files according to their needs.
TWO-LEVEL DIRECTORY
In this scheme, a separate directory is maintained for each user.
● Path name: due to the two levels, every file has a path name used to locate it.
● Different users can now have files with the same name.
● Searching is efficient in this method.
TREE-STRUCTURED DIRECTORY:
The directory is maintained in the form of a tree. Searching is efficient and grouping capability is also provided. Every file has an absolute or relative path name.
File Allocation Methods
The main file allocation methods are:
● Contiguous Allocation
● Linked Allocation
● Indexed Allocation
The main idea behind these methods is to utilize the disk space efficiently and to provide fast access to the file blocks.
1. Contiguous Allocation
In this scheme, each file occupies a contiguous set of blocks on the disk. For example, if a file requires n blocks and is given block b as the starting location, then the blocks assigned to the file will be b, b+1, b+2, …, b+n-1. This means that given the starting block address and the length of the file (in blocks), we can determine the blocks occupied by the file.
The directory entry for a file with contiguous allocation contains
● Address of starting block
● Length of the allocated portion.
The file ‘mail’ in the following figure starts from the block 19 with length = 6 blocks. Therefore, it occupies 19,
20, 21, 22, 23, 24 blocks.
Advantages:
● Both sequential and direct access are supported. For direct access, the address of the kth block of a file which starts at block b can easily be obtained as (b+k).
● This is extremely fast since the number of seeks is minimal, because of the contiguous allocation of file blocks.
Disadvantages:
● This method suffers from both internal and external fragmentation. This makes it inefficient in
terms of memory utilization.
● Increasing file size is difficult because it depends on the availability of contiguous memory at a
particular instance.
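The address arithmetic for contiguous allocation can be sketched in a few lines; this is a minimal illustration, reusing the block numbers from the 'mail' example above:

```python
# Sketch of contiguous-allocation address arithmetic.
def contiguous_blocks(start, length):
    """A file starting at block `start` and spanning `length` blocks
    occupies start, start+1, ..., start+length-1."""
    return list(range(start, start + length))

def direct_access(start, k):
    """Direct access: the address of the k-th block (0-indexed) is start + k."""
    return start + k

# The file 'mail' starts at block 19 with a length of 6 blocks:
print(contiguous_blocks(19, 6))  # [19, 20, 21, 22, 23, 24]
print(direct_access(19, 3))      # 22
```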
2. Linked List Allocation
In this scheme, each file is a linked list of disk blocks which need not be contiguous. The disk blocks can be
scattered anywhere on the disk.
The directory entry contains a pointer to the starting and the ending file block. Each block contains a pointer to
the next block occupied by the file.
The file ‘jeep’ in the following image shows how the blocks can be randomly distributed. The last block (25) contains -1, indicating a null pointer; it does not point to any other block.
Advantages:
● This is very flexible in terms of file size. File size can be increased easily since the system does
not have to look for a contiguous chunk of memory.
● This method does not suffer from external fragmentation. This makes it relatively better in terms
of memory utilization.
Disadvantages:
● Because the file blocks are distributed randomly on the disk, a large number of seeks are needed
to access every block individually. This makes linked allocation slower.
● It does not support random or direct access. We cannot directly access the blocks of a file: block k of a file can be accessed only by traversing k blocks sequentially (sequential access) from the starting block of the file via the block pointers.
● Pointers required in the linked allocation incur some extra overhead.
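Linked allocation can be sketched with a small dictionary standing in for the disk. The chain of blocks below is a hypothetical layout for the 'jeep' file; only the detail that the last block (25) holds -1 comes from the text:

```python
# Illustrative sketch: each disk block stores (data, next_block); -1 is the null pointer.
disk = {9: ("j", 16), 16: ("e", 1), 1: ("e", 10), 10: ("p", 25), 25: ("!", -1)}

def read_file(start):
    """Follow block pointers sequentially from the starting block."""
    blocks, b = [], start
    while b != -1:
        _data, nxt = disk[b]
        blocks.append(b)
        b = nxt
    return blocks

print(read_file(9))  # [9, 16, 1, 10, 25] -- reaching block k costs k pointer hops
```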
3. Indexed Allocation
In this scheme, a special block known as the Index block contains the pointers to all the blocks occupied by a
file. Each file has its own index block. The ith entry in the index block contains the disk address of the ith file
block. The directory entry contains the address of the index block as shown in the image:
Advantages:
● This supports direct access to the blocks occupied by the file and therefore provides fast access to
the file blocks.
● It overcomes the problem of external fragmentation.
Disadvantages:
● The pointer overhead for indexed allocation is greater than linked allocation.
● For very small files, say files that span only 2-3 blocks, indexed allocation would keep one entire block (the index block) for the pointers, which is inefficient in terms of memory utilization. In linked allocation, by contrast, we lose the space of only one pointer per block.
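Index-block lookup, including the multilevel variant described below, can be sketched as follows; the block numbers and the pointers-per-block value are illustrative assumptions:

```python
# Single-level index: the i-th entry holds the disk address of the i-th file block.
index_block = [19, 7, 2, 10, 25]           # illustrative disk addresses

def block_address(i):
    return index_block[i]                   # one lookup => direct access

# Two-level index: a first-level block points to second-level index blocks.
PTRS_PER_BLOCK = 4                          # assumed pointers per index block
second_level = [[19, 7, 2, 10], [25, 31, 40, 44]]

def multilevel_address(i):
    outer, inner = divmod(i, PTRS_PER_BLOCK)
    return second_level[outer][inner]

print(block_address(3))       # 10
print(multilevel_address(5))  # 31
```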
For very large files, a single index block may not be able to hold all the pointers.
The following mechanisms can be used to resolve this:
1. Linked scheme: This scheme links two or more index blocks together to hold the pointers. Every index block then contains a pointer or the address of the next index block.
2. Multilevel index: In this policy, a first-level index block is used to point to second-level index blocks, which in turn point to the disk blocks occupied by the file. This can be extended to three or more levels depending on the maximum file size.
File Access Methods in Operating System
1. Sequential Access –
It is the simplest access method. Information in the file is processed in order, one record after the
other. This mode of access is by far the most common; for example, editors and compilers usually
access files in this fashion.
Reads and writes make up the bulk of the operations on a file. A read operation (read next) reads
the next portion of the file and automatically advances the file pointer, which keeps track of the
I/O location. Similarly, a write operation (write next) appends to the end of the file and advances
the pointer to the newly written material.
Key points:
● When we use the write command, the system allocates space and moves the pointer to the
end of the file
2. Direct Access –
Another method is the direct access method, also known as the relative access method. A fixed-length
logical record allows the program to read and write records rapidly in no particular order. Direct
access is based on the disk model of a file, since a disk allows random access to any file block.
For direct access, the file is viewed as a numbered sequence of block or record. Thus, we may
read block 14 then block 59 and then we can write block 17. There is no restriction on the order of
reading and writing for a direct access file.
A block number provided by the user to the operating system is normally a relative block number,
the first relative block of the file is 0 and then 1 and so on.
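The relative-block numbering above maps directly to byte offsets; a minimal sketch, using an in-memory buffer in place of a real disk file and an assumed 512-byte block size:

```python
import io

BLOCK_SIZE = 512  # traditional block/sector size in bytes

def block_offset(relative_block):
    """Byte position of a relative block; block 0 is the first block of the file."""
    return relative_block * BLOCK_SIZE

# A 60-block in-memory "file" stands in for a disk file here.
f = io.BytesIO(bytes(60 * BLOCK_SIZE))
f.seek(block_offset(14))             # read block 14 first...
block14 = f.read(BLOCK_SIZE)
f.seek(block_offset(59))             # ...then jump straight to block 59
block59 = f.read(BLOCK_SIZE)
print(block_offset(14), len(block14), len(block59))  # 7168 512 512
```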
Magnetic Disks
Traditional magnetic disks have the following basic structure:
One or more platters in the form of disks covered with magnetic media. Hard disk platters are made of rigid metal, while "floppy" disks are
made of more flexible plastic.
Each platter has two working surfaces. Older hard disk drives would sometimes not use the very top or bottom surface of a stack of
platters, as these surfaces were more susceptible to potential damage.
Each working surface is divided into a number of concentric rings called tracks. The collection of all tracks that are the same distance from
the edge of the platter, ( i.e. all tracks immediately above one another in the following diagram ) is called a cylinder.
Each track is further divided into sectors, traditionally containing 512 bytes of data each, although some modern disks occasionally use
larger sector sizes. ( Sectors also include a header and a trailer, including checksum information among other things. Larger sector sizes
reduce the fraction of the disk consumed by headers and trailers, but increase internal fragmentation and the amount of disk that must be
marked bad in the case of errors. )
The data on a hard drive is read by read-write heads. The standard configuration ( shown below ) uses one head per surface, each on a
separate arm, and controlled by a common arm assembly which moves all heads simultaneously from one cylinder to another. ( Other
configurations, including independent read-write heads, may speed up disk access, but involve serious technical difficulties. )
The storage capacity of a traditional disk drive is equal to the number of heads ( i.e. the number of working surfaces ), times the number of
tracks per surface, times the number of sectors per track, times the number of bytes per sector. A particular physical block of data is
specified by providing the head-sector-cylinder number at which it is located.
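The capacity formula above can be checked with a worked example; the geometry numbers below are illustrative assumptions, not values from the text:

```python
# Assumed, illustrative disk geometry.
heads = 16                  # working surfaces
tracks_per_surface = 1024   # tracks per surface (= cylinders)
sectors_per_track = 63
bytes_per_sector = 512

# capacity = heads x tracks/surface x sectors/track x bytes/sector
capacity = heads * tracks_per_surface * sectors_per_track * bytes_per_sector
print(capacity)             # 528482304 bytes
print(capacity / 2**20)     # 504.0 MiB
```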
Disk Scheduling Algorithms
Disk scheduling is done by operating systems to schedule I/O requests arriving for the disk. Disk scheduling is
also known as I/O scheduling.
Disk scheduling is important because:
● Multiple I/O requests may arrive from different processes, and only one I/O request can be served at
a time by the disk controller. Thus the other I/O requests need to wait in the waiting queue and must
be scheduled.
● Two or more requests may be far from each other, which can result in greater disk arm movement.
● Hard drives are one of the slowest parts of the computer system and thus need to be accessed in
an efficient manner.
There are many Disk Scheduling Algorithms but before discussing them let’s have a quick look at some of the
important terms:
● Seek Time: Seek time is the time taken to move the disk arm to the specified track where the data
is to be read or written. The disk scheduling algorithm that gives the minimum average seek time is
better.
● Rotational Latency: Rotational latency is the time taken by the desired sector of the disk to rotate
into a position under the read/write head. The disk scheduling algorithm that gives the minimum
rotational latency is better.
● Transfer Time: Transfer time is the time to transfer the data. It depends on the rotating speed of
the disk and number of bytes to be transferred.
● Disk Access Time: Disk Access Time = Seek Time + Rotational Latency + Transfer Time
● Disk Response Time: Response time is the average time a request spends waiting to perform its
I/O operation. Average response time is the mean response time of all requests. Variance of
response time is a measure of how individual requests are serviced with respect to the average
response time. The disk scheduling algorithm that gives the minimum variance of response time is
better.
1. FCFS: FCFS (First Come First Serve) is the simplest disk scheduling algorithm. The requests are
addressed in the order in which they arrive in the disk queue.
Example:
Suppose the order of requests is (82,170,43,140,24,16,190)
and the current position of the Read/Write head is 50.
Total head movement
=(82-50)+(170-82)+(170-43)+(140-43)+(140-24)+(24-16)+(190-16)
=642
Advantages:
● Every request gets a fair chance
● No indefinite postponement
Disadvantages:
● Does not try to optimize seek time
● May not provide the best possible service
2. SSTF: In SSTF (Shortest Seek Time First), requests having shortest seek time are executed first.
So, the seek time of every request is calculated in advance in the queue and then they are
scheduled according to their calculated seek time. As a result, the request near the disk arm will
get executed first. SSTF is certainly an improvement over FCFS as it decreases the average
response time and increases the throughput of the system. Let us understand this with the help of an
example.
Example:
Suppose the order of request is- (82,170,43,140,24,16,190)
And current position of Read/Write head is : 50
=(50-43)+(43-24)+(24-16)+(82-16)+(140-82)+(170-140)+(190-170)
=208
Advantages:
● Average response time decreases
● Throughput increases
Disadvantages:
● Overhead of calculating seek times in advance
● A request can starve if it has a higher seek time than incoming requests
3. SCAN: In the SCAN algorithm, the disk arm moves in a particular direction and services the
requests coming in its path; after reaching the end of the disk, it reverses its direction and again
services the requests arriving in its path. It works like an elevator, so it is also known as the
elevator algorithm.
Example:
Suppose the requests to be addressed are 82,170,43,140,24,16,190. The Read/Write arm is at
50, and it is also given that the disk arm should move towards the larger value.
=(199-50)+(199-16)
=332
Advantages:
● High throughput
● Low variance of response time
● Average response time
Disadvantages:
● Long waiting time for requests for locations just visited by disk arm
4. CSCAN: In the SCAN algorithm, the disk arm re-scans the path it has already scanned after
reversing its direction. So it may be possible that too many requests are waiting at the other end,
or that there are zero or few requests pending in the area just scanned.
These situations are avoided in the CSCAN algorithm, in which the disk arm, instead of reversing its
direction, goes to the other end of the disk and starts servicing the requests from there. The disk arm
thus moves in a circular fashion; since this algorithm is otherwise similar to SCAN, it is known as C-SCAN (Circular SCAN).
Example:
Suppose the requests to be addressed are-82,170,43,140,24,16,190. And the Read/Write arm is at 50, and it is
also given that the disk arm should move “towards the larger value”.
=(199-50)+(199-0)+(43-0)
=391
Advantages:
● Provides a more uniform wait time compared to SCAN
5. LOOK: LOOK is similar to the SCAN disk scheduling algorithm except that the disk arm, instead
of going all the way to the end of the disk, goes only as far as the last request to be serviced in
front of the head and then reverses its direction from there. It thus prevents the extra delay caused
by unnecessary traversal to the end of the disk.
Example:
Suppose the requests to be addressed are 82,170,43,140,24,16,190. The Read/Write arm is at
50, and it is also given that the disk arm should move towards the larger value.
=(190-50)+(190-16)
=314
6. CLOOK: Just as LOOK is similar to the SCAN algorithm, CLOOK is similar to the CSCAN disk
scheduling algorithm. In CLOOK, the disk arm, instead of going to the end of the disk, goes only
as far as the last request to be serviced in front of the head and then, from there, goes to the other
end's last request. It thus also prevents the extra delay caused by unnecessary traversal to the end
of the disk.
Example:
Suppose the requests to be addressed are-82,170,43,140,24,16,190. And the Read/Write arm is at
50, and it is also given that the disk arm should move “towards the larger value”
=(190-50)+(190-16)+(43-16)
=341
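The head-movement totals for these algorithms can be computed programmatically; a short sketch for FCFS and SSTF on the same request queue used in the examples above:

```python
# Total head movement for FCFS and SSTF (same queue as the examples above).
requests = [82, 170, 43, 140, 24, 16, 190]
head = 50

def fcfs(reqs, pos):
    total = 0
    for r in reqs:                        # serve strictly in arrival order
        total += abs(r - pos)
        pos = r
    return total

def sstf(reqs, pos):
    pending, total = list(reqs), 0
    while pending:                        # always pick the nearest pending track
        r = min(pending, key=lambda t: abs(t - pos))
        total += abs(r - pos)
        pos = r
        pending.remove(r)
    return total

print(fcfs(requests, head))  # 642
print(sstf(requests, head))  # 208
```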
Types of Access :
Files that any user can directly access need protection. Files that are not accessible to other users do
not require any protection. The protection mechanism provides controlled access by limiting the types
of access that can be made to a file. Whether or not access is granted to a user depends on several
factors, one of which is the type of access required. Several different types of operations can be
controlled:
● Read –
Reading from a file.
● Write –
Writing or rewriting the file.
● Execute –
Loading the file into memory and executing it.
● Append –
Writing new information at the end of an already existing file; writing must be restricted to the end
of the existing file.
● Delete –
Deleting a file which is of no use and freeing its space for other data.
● List –
List the name and attributes of the file.
Operations such as renaming, editing, and copying an existing file can also be controlled. There are
many protection mechanisms; each has different advantages and disadvantages, and the mechanism
chosen must be appropriate for the intended application.
Access Control :
Different users may access a file in different ways. The general way of protection is to associate
identity-dependent access with all files and directories via a list called the access-control list
(ACL), which specifies the names of users and the types of access associated with each user. The main
problem with access lists is their length. If we want to allow everyone to read a file, we must list all
users with read access. This technique has two undesirable consequences:
Constructing such a list may be a tedious and unrewarding task, especially if we do not know in advance
the list of users in the system.
The directory entry, previously of fixed size, now needs to be of variable size, resulting in complicated
space management. These problems can be resolved by using a condensed version of the access list.
To condense the length of the access-control list, many systems recognize three classifications of
users in connection with each file:
● Owner –
Owner is the user who has created the file.
● Group –
A group is a set of users who have similar needs and share the same file.
● Universe –
In the system, all other users are under the category called universe.
The most common recent approach is to combine access-control lists with the normal general owner, group,
and universe access control scheme. For example: Solaris uses the three categories of access by default but
allows access-control lists to be added to specific files and directories when more fine-grained access control is
desired.
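The owner/group/universe classification can be sketched as a small access check. The user names, group names, and permission encoding below are illustrative assumptions, loosely following Unix rwx semantics:

```python
# Hypothetical file metadata: owner, owning group, and per-class permissions.
file_meta = {"owner": "alice", "group": "staff",
             "perms": {"owner": "rw", "group": "r", "universe": ""}}

def may(user, user_groups, op):
    """Classify the user as owner/group/universe, then check that class's permissions."""
    if user == file_meta["owner"]:
        cls = "owner"
    elif file_meta["group"] in user_groups:
        cls = "group"
    else:
        cls = "universe"
    return op in file_meta["perms"][cls]

print(may("alice", ["staff"], "w"))  # True  -- owner may write
print(may("bob",   ["staff"], "r"))  # True  -- group member may read
print(may("eve",   [],        "r"))  # False -- universe has no access
```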
File system
A file system is a structure used by an operating system to organise and manage files on a storage device such as
a hard drive, solid state drive (SSD), or USB flash drive. It defines how data is stored, accessed, and organised
on the storage device. Different file systems have varying characteristics and are often specific to certain
operating systems or devices.
FAT is one of the oldest and simplest file systems. It was initially developed for MS-DOS and is still used in
many removable storage devices. The two major versions of this system are FAT16 and FAT32. FAT uses a file
allocation table to keep track of file locations on the disk. However, it lacks some advanced features like file
permissions and journaling, making it less suitable for modern operating systems. FAT16 was introduced in
1987 with DOS 3.31, while FAT32 was introduced with Windows 95 OSR2 (MS-DOS 7.1) in 1996.
Advantages:
Simplicity: FAT's simple structure makes it easy to implement and use, and suitable for devices with
limited resources or strict compatibility requirements.
Data recovery: Due to its simple structure, FAT file systems are relatively easy to recover in case of data
corruption or accidental deletion.
Compatibility: It can natively be read from and written to by Windows, MacOS and Linux operating
systems without the need for third-party software.
Disadvantages:
Fragmentation: Fragmentation occurs when file data is scattered across different parts of the disk, resulting in
reduced performance. Regular defragmentation is required to optimise disk performance.
Lack of advanced features: The newest version, FAT32, lacks several advanced features found in other file
systems. It does not support file-level security permissions, journaling, encryption, or compression.
Volume name limitations: The volume names for FAT16 and FAT32 cannot exceed 11 characters and cannot
include most non-alphanumeric characters.
File name limitations: File names on a FAT16 file system cannot exceed the 8.3 format: 8 characters
plus a 3-character file extension.
exFAT is a file system introduced by Microsoft as an improved version of FAT32. It addresses some of the
limitations of FAT32, allowing for larger file sizes and better performance. exFAT is commonly used for
removable storage devices, such as external SSDs, hard drives and SD cards as it provides compatibility across
multiple operating systems. It was first introduced in 2006 as part of Windows CE 6.0.
Advantages:
Large file and partition size support: exFAT supports much larger file sizes and partition sizes
compared to FAT file systems. It can handle files bigger than 4 GB, making it suitable for storing large
media files or disk images.
Efficient disk space utilisation: exFAT improves disk space utilisation compared to older FAT file
systems. It uses smaller cluster sizes, which reduces the amount of wasted disk space for smaller files.
Compatibility: It can natively be read from and written to by Windows and MacOS operating systems
without the need for third-party software.
Disadvantages:
Limited metadata support: exFAT lacks some advanced features found in other modern file systems. It doesn’t
support file-level security permissions, journaling, or file system-level encryption.
Fragmentation: Like FAT file systems, exFAT is still susceptible to fragmentation. As files are created, modified,
and deleted, fragmentation can occur leading to decreased performance over time.
NTFS is the default file system used by Windows NT-based operating systems, starting in 1993 with Windows
NT 3.1, all the way up to and including Windows 11. It offers advanced features like file permissions,
encryption, compression, and journaling. NTFS supports large file and partition sizes, making it suitable for
modern storage devices. However, it has limited compatibility with non-Windows operating systems.
Advantages:
Security and permissions: NTFS provides a solid security model with file-level permissions. It allows
you to set permissions for individual files and folders, controlling access rights for users and groups.
Trim support on solid-state drives (SSDs): TRIM informs the drive about unused data, which allows the
SSD to erase and prepare the space for future writes. TRIM is enabled by default, when NTFS file system
is chosen to maintain its performance.
Disadvantages:
Disk errors and repairs: Although NTFS is designed to be reliable, disk errors can still occur. When encountering
disk errors, NTFS repairs can be time-consuming and may require special tools.
Fragmentation: Over time, NTFS file systems can become fragmented, especially as files are created, modified
and deleted. Fragmentation can lead to decreased performance as the system needs to access scattered file
fragments.
RAID 0
o RAID level 0 provides data striping, i.e., data can be placed across multiple disks. Because it
relies on striping alone, if one disk fails then all the data in the array is lost.
o This level doesn't provide fault tolerance but increases system performance.
Example:
20 21 22 23
24 25 26 27
28 29 30 31
32 33 34 35
In this level, instead of placing just one block onto a disk at a time, we can place two or more blocks
onto a disk before moving on to the next one.
20 22 24 26
21 23 25 27
28 30 32 34
29 31 33 35
In the above figure, there is no duplication of data. Hence, a block once lost cannot be recovered.
Pros of RAID 0:
o In this level, throughput is increased because multiple data requests are probably not on the same disk.
o This level fully utilizes the disk space and provides high performance.
o It requires a minimum of 2 drives.
Cons of RAID 0:
o Failure of either disk results in complete data loss in the array.
o This level provides no fault tolerance.
RAID 1
This level is called mirroring of data as it copies the data from drive 1 to drive 2. It provides 100% redundancy
in case of a failure.
Example:
A A B B
C C D D
E E F F
G G H H
Only half space of the drive is used to store the data. The other half of drive is just a mirror to the already
stored data.
Pros of RAID 1:
o The main advantage of RAID 1 is fault tolerance. In this level, if one disk fails, then the other
automatically takes over.
o In this level, the array will function even if any one of the drives fails.
Cons of RAID 1:
o In this level, one extra drive is required per drive for mirroring, so the expense is higher.
RAID 2
o RAID 2 consists of bit-level striping using Hamming-code parity. In this level, each data bit in a
word is recorded on a separate disk, and the ECC codes of the data words are stored on a different
set of disks.
o Due to its high cost and complex structure, this level is not commercially used. This same performance
can be achieved by RAID 3 at a lower cost.
Pros of RAID 2:
o This level uses one designated drive to store parity.
o It uses the hamming code for error detection.
Cons of RAID 2:
o It requires an additional drive for error detection.
o Its high cost and complexity make it impractical for commercial use.
RAID 3
o RAID 3 consists of byte-level striping with dedicated parity. In this level, the parity information is
stored for each disk section and written to a dedicated parity drive.
o In case of drive failure, the parity drive is accessed, and data is reconstructed from the remaining
devices. Once the failed drive is replaced, the missing data can be restored on the new drive.
o In this level, data can be transferred in bulk. Thus high-speed data transmission is possible.
A B C P(A, B, C)
D E F P(D, E, F)
G H I P(G, H, I)
J K L P(J, K, L)
Pros of RAID 3:
o Data can be reconstructed from the parity drive when a single drive fails.
o High-speed bulk data transfer is possible.
Cons of RAID 3:
o It requires an additional drive for parity.
o Performance is slower for small files.
RAID 4
o RAID 4 consists of block-level striping with a parity disk. Instead of duplicating data, RAID 4
adopts a parity-based approach.
o This level allows recovery from at most one disk failure, due to the way parity works. If more
than one disk fails, there is no way to recover the data.
o Levels 3 and 4 both require at least three disks to implement RAID.
A B C P0
D E F P1
G H I P2
J K L P3
In this figure, we can observe one disk dedicated to parity.
In this level, parity is calculated using the XOR function. If the data bits are 0,0,0,1 then the parity
bit is XOR(0,0,0,1) = 1. If the data bits are 0,0,1,1 then the parity bit is XOR(0,0,1,1) = 0. That
means an even number of ones results in parity 0 and an odd number of ones results in parity 1.
C1 C2 C3 C4 Parity
0 1 0 0 1
0 0 1 1 0
Suppose that in the above figure, C2 is lost due to some disk failure. Then using the values of all the other
columns and the parity bit, we can recompute the data bit stored in C2. This level allows us to recover lost
data.
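The XOR parity computation and the recovery of the lost column can be sketched directly:

```python
# XOR parity over one stripe, then rebuilding a lost column from the survivors.
from functools import reduce

def parity(bits):
    return reduce(lambda a, b: a ^ b, bits)

row = [0, 1, 0, 0]                          # C1..C4 from the first table row
p = parity(row)
print(p)                                    # 1 (odd number of ones)

# Suppose C2 is lost: XOR the surviving data bits with the parity bit.
surviving = [row[0], row[2], row[3], p]     # C1, C3, C4, parity
rebuilt_c2 = parity(surviving)
print(rebuilt_c2)                           # 1 -- matches the lost bit
```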
RAID 5
o RAID 5 is a slight modification of the RAID 4 system. The only difference is that in RAID 5, the parity
rotates among the drives.
o It consists of block-level striping with distributed parity.
o Same as RAID 4, this level allows recovery of at most 1 disk failure. If more than one disk fails, then
there is no way for data recovery.
0 1 2 3 P0
5 6 7 P1 4
10 11 P2 8 9
15 P3 12 13 14
P4 16 17 18 19
This level was introduced to make the random write performance better.
Pros of RAID 5:
o Lost data can be reconstructed using the parity information.
o Read performance is good because data is spread across all drives.
Cons of RAID 5:
o In this level, disk failure recovery takes a longer time, as parity has to be calculated from all
the available drives.
o This level cannot survive concurrent drive failures.
RAID 6
o This level is an extension of RAID 5. It contains block-level striping with two parity blocks.
o RAID 6 can survive two concurrent disk failures. Suppose you are using RAID 5: when a disk
fails, you need to replace it promptly, because if another disk fails simultaneously you won't
be able to recover any of the data. This is where RAID 6 plays its part: you can survive two
concurrent disk failures before you run out of options.
A0 B0 Q0 P0
A1 Q1 P1 D1
Q2 P2 C2 D2
P3 B3 C3 Q3
Pros of RAID 6:
o This level can survive two concurrent disk failures, giving very high fault tolerance.
o Read performance is good, as in RAID 5.
Cons of RAID 6:
o Due to the double parity, write operations are slower than in RAID 5.
o Rebuilding the array takes longer because of the extra parity computation.
o It requires a minimum of 4 drives.