Module-1 Introduction To File Structures
Overview of File Structure Design
I. General Goals
o Get the information we need with one access to the disk.
o If that’s not possible, then get the information with as few accesses as possible.
o Group information so that we are likely to get everything we need with only one trip to the disk.
II. Fixed versus Dynamic Files
o It is relatively easy to come up with file structure designs that meet the general goals when the
files never change.
o When files grow or shrink as information is added and deleted, it is much more difficult.
Difference between physical and logical files:
a. Physical files
A physical file is a collection of bytes stored on a disk or tape; it resides on a secondary storage device and is managed by the operating system.
A file, when the word is used in this sense, physically exists.
A disk drive might contain even thousands of these physical files.
The physical file is very much hardware and OS dependent.
The computer considers all kinds of files as streams of bytes.
The operating system acts as the manager of these files.
b. Logical files
A logical file is a channel that connects the program to a physical file. Programs read and write data through the logical file. Before a logical file can be used, it must be associated with a physical file; this act of connection is called opening the file. Data in a physical file is persistent, while data in a logical file is temporary.
A logical file is identified by a program variable or constant. The program sends (or receives) bytes to (or from) a file through the logical file.
The program knows nothing about where the bytes go or where they came from. The OS is responsible for associating a logical file in a program with a physical file on disk or tape.
Writing to or reading from a file in a program is done through the OS.
Opening Files
To associate a logical program file with a physical system file we have two options:
1) Open an existing file
2) Create a new file, deleting any existing contents in the physical file.
Opening a file makes it ready for use by the program. The open function is used to open a file.
Function to open a file:
fd = open(filename, flags, [pmode]);
The arguments are:
fd (int): The file descriptor, used to refer to the file within the program. If there is an error opening the file, this value is negative.
filename (char*): A character string containing the physical file name.
flags (int): The flags argument controls the operation of the open function, determining whether it opens an existing file for reading or writing. The value of flags is set by performing a bitwise OR of the following:
O_RDONLY : Open the file for read only
O_WRONLY : Open the file for write only
O_RDWR : Open the file for read or write
O_CREAT : Create the file if it does not exist
O_APPEND : Append every write operation to the end of the file
O_TRUNC : Delete any prior file contents
pmode (int, protection mode): If O_CREAT is specified, pmode is required. This integer argument specifies the protection mode for the file. The pmode is a 3-digit octal number, e.g. 0751, that indicates how the file can be used by the owner (1st digit), by members of the owner's group (2nd digit), and by everyone else (3rd digit). Within each octal digit, the first bit indicates read permission, the second bit write permission, and the third bit execute permission.
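As a hedged illustration of the call just described, here is a minimal sketch (the file name is a made-up example, and error handling is kept to the bare minimum):

#include <fcntl.h>    // open and the O_* flags
#include <unistd.h>   // close
#include <cstdio>     // perror

int main() {
    // open (or create) the file for reading and writing, mode 0751
    int fd = open("myfile.txt", O_RDWR | O_CREAT, 0751);
    if (fd < 0) {             // a negative descriptor signals an error
        perror("open");
        return 1;
    }
    // ... read from / write to the file here ...
    close(fd);                // free the descriptor for reuse
    return 0;
}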
How to do it in c++?
Standard C++ stream classes are defined in iostream.h and fstream.h
We can create a file in one statement and open it in another using the open( ) function, which is a member of the fstream class. In the open( ) function we include several mode bits to specify certain aspects of the file object.
Ex: fstream file;
file.open("myfile.txt", ios::out);
Mode bits for open( ) function
ios::in (input): File open for reading; the internal stream buffer supports input operations.
ios::out (output): File open for writing; the internal stream buffer supports output operations.
ios::binary (binary): Operations are performed in binary mode rather than text.
ios::app (append): All output operations happen at the end of the file, appending to its existing contents.
ios::trunc (truncate): Any contents that existed in the file before it is opened are discarded.
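The mode bits combine with a bitwise OR, just like the flags of the C-level open function. A minimal sketch (the file name is illustrative):

#include <fstream>
using namespace std;

int main() {
    fstream file;
    // open for writing in binary mode, discarding any prior contents
    file.open("myfile.dat", ios::out | ios::binary | ios::trunc);
    if (!file)            // open failed
        return 1;
    file.close();
    return 0;
}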
Closing Files
close(fd); (fd : file descriptor): Closing a file is like hanging up a phone.
When you hang up the phone, the phone line is available for taking or placing another call
When you close a file the logical file name or file descriptor is available for use with another file
Files are usually closed automatically by the OS when programs terminate normally
The execution of a close statement within a program is needed only to protect against data loss if the program is interrupted, and to free up logical file names for reuse. In C++: file.close( );
Reading and Writing
Reading and writing are fundamental to file processing. They are the actions that make file
processing an input/output operation.
a. Read function
It requires 3 pieces of information: Read(Source_file, Destination_address, Size);
Source_file: The read call must know where it is to read from. We specify the source by the logical file name through which data is received.
Destination_address: Read must know where to place the information it reads from the input file. We specify the destination by giving the first address of the memory block where we want to store the data.
Size: Read must know how much information to bring in from the file. This argument is supplied as a byte count.
b. Write function
The write function is used to write data from a variable inside the program into the file.
It is similar to the read function but moves data in the other direction.
Write(Destination_file, Source_address, Size);
Destination_file: The logical file name that is used for sending the data.
Source_address: Write must know where to find the information it will send. We provide this specification as the first address of the memory block where the data is stored.
Size: The number of bytes to be written must be supplied.
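A hedged sketch exercising both calls through the C-level interface (the file name and buffer size are illustrative assumptions):

#include <fcntl.h>
#include <unistd.h>

int main() {
    char buf[512];                        // destination address in memory
    int fd = open("data.txt", O_RDWR);
    if (fd < 0) return 1;
    long n = read(fd, buf, sizeof(buf));  // bring in up to 512 bytes
    if (n > 0)
        write(fd, buf, n);                // send the same bytes back out
    close(fd);
    return 0;
}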
Seeking
In the previous samples we read the file sequentially, reading one byte after another until we reach the end of the file. Sometimes we want to read or write without going through every byte sequentially.
Perhaps we know that the next piece of information resides ten thousand bytes away, so we want to jump there. Or perhaps we need to jump to the end of the file so we can add new information there.
To satisfy these needs we must be able to control the movements of the read write pointer
The action of moving directly to a certain position in a file is often called seeking
A seek requires 2 pieces of information: Seek( Source_file, Offset);
Source_file is the logical filename in which the seek will occur
Offset is the number of positions in the file, the pointer is to be moved from the start of the file
Ex: Seek(data, 373);
This moves the read/write pointer directly from the origin to the 373rd position in a file called data.
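In POSIX terms the corresponding call is lseek (the C++ stream classes use seekg and seekp). A hedged sketch:

#include <fcntl.h>
#include <unistd.h>

int main() {
    int fd = open("data", O_RDONLY);
    if (fd < 0) return 1;
    lseek(fd, 373, SEEK_SET);   // jump to byte 373, counted from the start
    char ch;
    read(fd, &ch, 1);           // the next read now begins at offset 373
    close(fd);
    return 0;
}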
No matter what computer system you have, even if it is a small PC, there may be thousands of files you have access to. To manage them, the computer has some method for organizing its files. In Unix it is called the file system.
The Unix file system is a tree-structured organization of directories, with the root of the tree signified by the character '/'. All directories, including the root, contain two kinds of files: regular files with programs and data, and directories.
The above diagram shows a sample Unix directory structure. Since every file in a Unix system is part of the file system that begins with the root, any file can be uniquely identified by giving its absolute pathname.
Ex: the true, unambiguous name for the file "addr" is /usr6/mydir/addr.
Physical and Logical files in UNIX
It is easy to think of magnetic disk as a source of file because we are used to the idea of storing such
things on disks
But, in unix, devices like keyboard and console are also files (as shown in the above diagram)
/dev/kbd and /dev/console respectively
The keyboard produces a sequence of bytes that are sent to the computer when keys are pressed.
The console accepts a sequence of bytes and displays the symbols on screen. A Unix file is
represented logically by an integer: the file descriptor.
This integer is an index to an array of more complete information about the file.
A keyboard, a disk file, and a magnetic tape are all represented by integers.
Once the integer that describes the file is identified, a program can access that file
If it knows the logical name of a file, a program can access that file without knowing whether the file comes from a disk, a tape, or a connection to another computer.
This view of a file in Unix makes it possible to handle all of these sources with very few operations, compared with other operating systems.
The no. of cylinders is the same as the no. of tracks on a single surface, and each track has the same capacity.
No. of cylinders = no. of tracks on a single surface
The amount of data that can be held on a track, and the no. of tracks on a surface, depend on how densely bits can be stored on the disk surface.
A cylinder contains a group of tracks. A track contains a group of sectors. A sector contains a group of bytes.
Track capacity = no. of sectors per track X bytes per sector
Cylinder capacity = no. of tracks per cylinder X track capacity
Drive capacity = no. of cylinders X cylinder capacity
Ex: we want to store a file with 50,000 fixed-length data records on a typical 2.1-gigabyte small computer disk with the following characteristics: No. of bytes per sector = 512, No. of sectors per track = 63, No. of tracks per cylinder = 16, No. of cylinders = 4092. How many cylinders are needed?
Soln:
There will be 2 records per sector
So no. of records per track = 2 X no. of sectors per track
= 2 X 63 = 126 records
Given that No. of tracks per cylinder = 16
So, no. of records per cylinder = 16 X 126 = 2016 records
The file contains 50,000 fixed-length data records.
So, no. of cylinders = 50000 / 2016 ≈ 24.8 cylinders. Since space is allocated in whole cylinders, about 25 cylinders are needed.
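The capacity formulas above are easy to check in code; a small sketch using the numbers from this example:

#include <iostream>
using namespace std;

int main() {
    const int sectorsPerTrack = 63;
    const int tracksPerCylinder = 16;
    const int recordsPerSector = 2;      // given: two records fit per sector
    const long records = 50000;

    long recordsPerTrack = recordsPerSector * sectorsPerTrack;       // 126
    long recordsPerCylinder = recordsPerTrack * tracksPerCylinder;   // 2016
    double cylinders = double(records) / recordsPerCylinder;         // ~24.8
    cout << "cylinders needed: " << cylinders << endl;
    return 0;
}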
So there are 2 cases to study
o Data stored in consecutive or adjacent sectors
o Data spread across non adjacent sectors
If data is stored in adjacent sectors, it would seem easy to access the sectors in sequence.
In the above figure, adjacent sectors are used to store data. Initially the R/W head is positioned over sector 1.
There is a delay in transferring data from a sector to main memory: by the time the data from sector 1 has been transferred, the disk has moved some distance in its circular motion.
Therefore, when the R/W head is ready to read again, the sector under it is not sector 2 but some other sector, say sector 4.
So the disk has to make nearly a full rotation before sector 2 can be read. This delay is called rotational delay.
Together with the transfer delay, the total time taken to read the track will be much longer.
The figure below uses non-adjacent sectors to store data.
Here an interleaving factor of 2 is used. While the R/W head is reading sector 1, the disk rotates, and by the time the transfer is complete, the R/W head is ready to read the next logical sector, sector 2.
Thus, when the sector interleaving factor is chosen intelligently, the entire track can be read in only a few rotations.
b. Clusters
Another view of sector organization, also designed to improve performance, is the view maintained by the part of the computer's OS that we call the "file manager".
When a program accesses a file, it is the file manager's job to map the logical parts of the file to their corresponding physical locations. It does this by viewing the file as a series of clusters of sectors.
A cluster is a fixed number of contiguous sectors. Once a given cluster has been found on a disk, all
sectors in that cluster can be accessed without requiring an additional seek
To view a file as series of clusters and still maintain the sectored view, the file manager ties logical
sectors to the physical clusters they belong to by using “file allocation table” (FAT)
The FAT contains the list of all clusters in a file
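As a hedged illustration of the idea, the file manager can be imagined as keeping, for each file, the ordered list of clusters that hold it (the structure and names below are illustrative, not an actual operating system's FAT layout):

#include <string>
#include <vector>
using namespace std;

// One entry ties a file's logical cluster sequence to physical clusters.
struct FatEntry {
    string fileName;          // which file this chain belongs to
    vector<long> clusters;    // physical cluster numbers, in logical order
};

// Find the physical cluster holding a given logical cluster of the file.
long physicalCluster(const FatEntry& entry, int logicalCluster) {
    return entry.clusters.at(logicalCluster);
}

int main() {
    FatEntry addr{"addr", {17, 44, 45}};      // illustrative cluster chain
    return (int)physicalCluster(addr, 1);     // -> physical cluster 44
}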
c. Extents
Clusters may or may not be contiguous (share a common boundary) on a disk.
Cluster sizes may range from 1 to 65,535 blocks.
Generally, a system manager assigns a small cluster size to a disk with a relatively small number of
blocks. Relatively larger disks are assigned a larger cluster size to minimize the overhead for disk space
allocation.
An extent is one or more adjacent clusters allocated to a file or to a portion of a file.
If enough contiguous disk space is available, the entire file is allocated as a single extent.
Conversely, if there is not enough contiguous disk space, the file is allocated using several extents,
which may be scattered physically on the disk.
Figure below shows how a single file (File A) may be stored as a single extent or as multiple extents.
d. Fragmentation
If, for example, the size of a sector is 512 bytes and the size of all records in the file is 300 bytes, there is no convenient fit between records and sectors.
There are 2 ways to deal with this situation:
o Store only one record per sector
o Allow records to span sectors so the beginning of a record might be found in one sector and the
end of it in another
The 1st option has the advantage that any record can be retrieved by retrieving just one sector.
But it has the disadvantage that it might leave an enormous amount of unused space within each sector. This loss of space is called internal fragmentation.
The 2nd option has the advantage that it loses no space to internal fragmentation, but it has the disadvantage that some records can be retrieved only by accessing two sectors.
A block organization does not present the sector-spanning and fragmentation problems, because blocks can vary in size to fit the logical organization of the data. A block is usually organized to hold an integral number of logical records.
The term "blocking factor" indicates the number of records that are to be stored in each block of a file.
With this block organization, no space is lost to internal fragmentation and there is no need to load two blocks to retrieve one record.
In block addressing schemes, each block of data is usually accompanied by one or more subblocks
containing extra information about the data block
There is a count subblock that contains the number of bytes in the accompanying data block.
There is a key subblock that contains the key of the last record in the data block. Using this key subblock, a program can ask its disk drive to search among all the blocks on a track for a block with the desired key.
Nondata overhead
Both blocks and sectors require that a certain amount of space be taken up on the disk in the form of
nondata overhead
Some of the overhead consists of information that is stored on the disk during preformatting, which is done before the disk can be used.
On a sector-addressable disk, preformatting involves:
o Storing, at the beginning of each sector, information such as the sector address, the track address, and the sector's condition (whether it is usable or defective)
o Placing gaps and synchronization marks between fields of information to help the read/write mechanism
Suppose we have a block-addressable disk drive with 20,000 bytes per track, where the amount of space taken up by subblocks and interblock gaps is equivalent to 300 bytes per block. We want to store a file containing 100-byte records on the disk. How many records can be stored per track if the blocking factor is 10? If it is 60?
o There are 100 bytes per record.
With blocking factor = 10:
Data in each block = 100 X 10 = 1000 bytes
Space taken up by subblocks and interblock gaps = 300 bytes per block
So the total space per block = 1000 + 300 = 1300 bytes
So, no. of blocks that can be stored on a 20,000-byte track = 20000/1300 = 15 blocks
So, no. of records that can be stored per track if the blocking factor is 10 = 15 X 10 = 150 records
o If blocking factor = 60:
Data in each block = 100 X 60 = 6000 bytes
So the total space per block, including overhead, = 6000 + 300 = 6300 bytes
So, no. of blocks that can be stored on a 20,000-byte track = 20000/6300 = 3 blocks
So, no. of records that can be stored per track if the blocking factor is 60 = 3 X 60 = 180 records
So a larger blocking factor leads to more efficient use of storage:
when blocks are larger, fewer blocks are required to hold a file.
The cost of a disk access
Disk access can be divided into 3 physical operations:
o Seek time
o Rotational delay
o Transfer time
a. Seek time
It is the time required to move the access arm to the correct cylinder
The amount of time spent during a disk access depends on how far the arm has to move
If we are accessing a file sequentially and the file is packed into several consecutive cylinders,
seeking needs to be done only after all the tracks on a cylinder have been processed
If we are alternately accessing sectors from 2 files that are stored at opposite extremes on a disk (one at the innermost cylinder, the other at the outermost cylinder), seeking is very expensive.
So system designers go to great lengths to minimize seeking. We usually try to determine the average seek time required for a particular operation. Most hard disks available today have an average seek time of less than 10 milliseconds.
High-performance disks have an average seek time of less than 7.5 milliseconds.
b. Rotational delay
It is the time it takes for the disk to rotate so the sector we want is under the R/W head
It is also referred to as latency.
Hard disks usually rotate at about 5000 rpm, i.e. one rotation per 12 milliseconds.
Floppy disks rotate at about 360 rpm, i.e. one rotation per 166.7 milliseconds.
In many cases the rotational delay can be much less than the average.
The rotational delay is inversely proportional to the rotational speed of the drive.
The average rotational delay is the time for the disk to rotate 180° (half a revolution).
c. Transfer time
Once the data we want is under the R/W head, it can be transferred.
Transfer time is given by:
Transfer time = (no. of bytes transferred / no. of bytes on a track) X rotation time
If the drive is sectored, the transfer time for one sector depends on the no. of sectors on a track.
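Putting the three components together for a single-sector read, a hedged sketch (the drive parameters are illustrative, not from a specific drive):

#include <iostream>
using namespace std;

int main() {
    double avgSeekMs = 8.0;                 // average seek time
    double rotationMs = 60000.0 / 5000.0;   // 5000 rpm -> 12 ms per rotation
    double avgRotDelayMs = rotationMs / 2;  // half a rotation on average
    double bytesPerTrack = 63.0 * 512;      // sectors per track x sector size
    double bytesMoved = 512;                // one sector
    double transferMs = bytesMoved / bytesPerTrack * rotationMs;
    cout << "access time = "
         << avgSeekMs + avgRotDelayMs + transferMs << " ms" << endl;
    return 0;
}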
Disk as bottleneck
Disk performance is increasing, but disks are still slow!
Even a high-performance network is faster than a disk.
A process is said to be "disk bound" when the network and the computer's CPU sometimes have to wait for the disk to transmit data. A number of techniques are used to solve this problem.
One is multiprogramming: the CPU works on other jobs while waiting for the data to arrive.
Another is striping: splitting parts of a file across several different drives, then letting the separate drives deliver their parts of the file to the network simultaneously.
Striping embodies an important concept called parallelism: whenever there is a bottleneck at some point in the system, duplicate the source of the bottleneck and configure the system so that several of them operate in parallel.
Magnetic tape
It belongs to a class of devices that provide no direct-access facility but can provide very rapid sequential access to data.
Tapes are compact, stand up well under different environmental conditions, are easy to store and
transport, and are less expensive than disk
Years ago tape systems were widely used to store application data
An application that needed data from a specific tape would issue a request for the tape, which would
be mounted by an operator onto a tape drive.
The application could then directly read and write on the tape
The tremendous reduction in the cost of disk system has changed the way tapes are used
Data is recorded on tape in one-byte-wide frames that span the tape's parallel tracks, and each frame carries an extra parity bit. In odd parity, this bit is set to make the number of 1 bits in the frame odd.
This is done to check the validity of the data.
Frames are organized into data blocks of variable size, separated by interblock gaps, which contain no information and are long enough to permit stopping and starting.
Tape drives come in many shapes, sizes and speeds
Performance is measured using 3 quantities
o Tape density – commonly 800, 1600, 6250 bits per inch (bpi) per track [ recently 30000bpi]
o Tape speed – commonly 30 to 200 inches per second (ips)
o Size of interblock gap – commonly between 0.3inch and 0.75 inch
Estimating tape length requirements
(i) Suppose we want to store a backup copy of a large mailing-list file with one million 100-byte records. If we want to store the file on a 6250-bpi tape that has an interblock gap of 0.3 inches, how much tape is needed?
Ans:
There are mainly 2 things that take up space on a tape:
o Interblock gap
o Data blocks
Let b = physical length of data block
g = length of interblock gap
n = no. of data block
then space required for storing the file
s = n X (b + g )
We have g = 0.3 inch
n = 1 million = 1,000,000
Bytes per block = 100
Bytes per inches = 6250
b = block size (bytes per block) / tape density (bytes per inch)
So, b = 100 / 6250 = 0.016 inch
So s = 1000000 X (0.016 + 0.3)
= 316,000 inches
= 316000/12 feet ≈ 26,333 feet of tape is needed to store the records
(ii) If for the same problem, blocking factor is 50, show that only one tape is required to back up the file
Ans:
Blocking factor = 50
So the no. of blocks n = 1000000/50 = 20,000
Each block now holds 50 X 100 = 5000 bytes of data, so
b = 5000 / 6250 = 0.8 inch
And s = 20000 X (0.8 + 0.3) = 22,000 inches = 22000/12 ≈ 1833 feet
Since a standard tape reel holds 2400 feet, only one tape is required to back up the file.
(iii)Suppose we want to store a backup copy of a large mailing list file with 350000 records of 80 byte. If we
want to store the file on a 6250bpi tape that has an interblock gap of 0.3 inches, and blocking factor is
50, how much tape is needed to store these records?
Ans:
n = 350000
Bytes per record = 80
Tape density = 6250 bpi, g = 0.3 inch
If the records were stored unblocked (one record per block):
b = 80 / 6250 = 0.0128 inch
s = 350000 X (0.0128 + 0.3) = 109,480 inches = 109480/12 ≈ 9123 feet of tape
With blocking factor = 50:
no. of blocks n = 350000/50 = 7000
each block holds 50 X 80 = 4000 bytes, so b = 4000 / 6250 = 0.64 inch
And s = 7000 X (0.64 + 0.3) = 6580 inches, i.e. about 548 feet of tape is required
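The same arithmetic in code, a hedged sketch whose parameters follow the examples above:

#include <iostream>
using namespace std;

// Space needed to store the records on tape, in inches.
double tapeInches(long records, int bytesPerRecord, int blockingFactor,
                  double bpi, double gapInches) {
    long blocks = records / blockingFactor;
    double blockLen = double(blockingFactor) * bytesPerRecord / bpi;
    return blocks * (blockLen + gapInches);
}

int main() {
    cout << tapeInches(1000000, 100, 1, 6250, 0.3) << " in\n";  // ~316,000
    cout << tapeInches(1000000, 100, 50, 6250, 0.3) << " in\n"; // ~22,000
    cout << tapeInches(350000, 80, 50, 6250, 0.3) << " in\n";   // ~6,580
    return 0;
}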
Estimating data transmission time
Nominal data transmission rate = tape density (bpi) X tape speed (ips)
Hence 6250bpi, 200 ips tape has a nominal transmission rate of
nominal transmission rate = 6250 X 200 = 1250000 bytes / Sec
= 1250 kilobytes/Sec
Once our data gets dispersed by interblock gaps, the effective transmission rate suffers. Suppose, as in the earlier problem, the blocking factor is 1, with one million 100-byte records and an interblock gap of 0.3 inches. Then:
Effective recording density = no. of bytes per block / no. of inches required to store a block
= 100 bytes / 0.316 inches = 316.4 bpi
If the tape is moving at the rate of 200ips,
then effective transmission rate = 316.4 X 200
= 63280 bytes/sec
= 63.3 kilobyte/sec
which is far less than the nominal rate. A larger blocking factor improves the result.
So, some factors that influence the performance are:
Block size
Gap size
Tape speed
Recording density
Time it takes to start and stop the tape
Disk versus Tape
Disk
Excellent for random access and for storage of files for which immediate access is desired.
Can be shared by several processes at once.
Tape
Ideal for processing data sequentially and for long-term storage of files. Dedicated to one process.
Introduction to CD-ROM
CD-ROM – Compact Disk Read Only Memory
It can hold a lot of data and can be reproduced cheaply. A single disc can hold more than 600 megabytes of data.
CD-ROM is read only (or write once) in the same sense as a CD audio disc: once it has been
recorded, it cannot be changed
It is a publishing medium used for distributing information to many users rather than a data storage
and retrieval medium like magnetic disks
It is used for distribution of
o All types of software
o Codes
o Textual data
o Digitalized images
o Video information
o Digital audio etc
A Short history of CD-ROM
The story of CD-ROM begins in the late 1960s and early 1970s, when the goal was to store movies on disc.
The consumer products industry spent a great deal of money developing competing technologies, and then spent years fighting over which approach should become the standard.
The surviving format is one called LaserVision. The competitors of this format lost money and important market opportunities. These hard lessons shaped the later development of CD audio and CD-ROM.
One reason the LaserVision technology prevailed is that it supports recording in both a Constant Linear Velocity (CLV) format, which maximizes storage capacity, and a Constant Angular Velocity (CAV) format, which enables fast seek performance.
In the early 1980s, a number of firms began looking at the possibility of storing digital, textual information on LaserVision discs. LaserVision stores data in an analog form; it is, after all, storing an analog video signal.
Philips and Sony began work on a way to store music on optical discs. Rather than storing the music in the kind of analog form used on video discs, they developed a digital data format.
They had learned hard lessons from the expensive standards battles over video discs
This time they worked with the other players in the consumer products industry to develop a licensing system, which resulted in the emergence of CD audio as a broadly accepted standard format as soon as the first discs and players were introduced. CD audio appeared in the US in early 1984.
CD-ROM is a digital data format built on top of the CD audio standard.
The first commercially available CD-ROM drives appeared in 1985.
Various large and small firms worked out the main features of a file system standard by the early summer of 1986. That work has become an official international standard for organizing files on CD-ROM.
The latest new technology for CDs is the DVD, which stands for Digital Video Disc. The Sony Corporation developed DVD for the video market, especially for the new high-definition TVs, but DVD is also available for storing files.
Data is recorded on the disc surface as a pattern of pits (depressions) and lands (flat areas). The pits scatter the light, but the lands reflect most of it back to the pickup.
This alternating pattern of high- and low-intensity reflected light is the signal used to reconstruct the original digital information.
The encoding scheme used for this signal is not simply a matter of calling a pit a 1 and a land a 0
Instead the 1s are represented by the transitions from pit to land and back again
Every time the light intensity changes, we get a 1. The 0s are represented by the amount of time between transitions: the longer between transitions, the more 0s we have.
Given this scheme, it is not possible to have 2 adjacent 1s; 1s are always separated by 0s.
In fact, due to the limits of the resolution of the optical pickup, there must be at least two 0s between any pair of 1s. This means that the raw pattern of 1s and 0s has to be translated to get the 8-bit patterns of 1s and 0s that form the bytes of the original data.
This translation scheme, which is done through a lookup table, turns the original 8 bits of data into 14 expanded bits that can be represented as pits and lands on the disc (a scheme known as eight-to-fourteen modulation, EFM).
The reading process reverses this translation
CLV instead of CAV
Data on a CD-ROM is stored in a single spiral track that winds for almost 3 miles from the centre to the outer edge of the disc.
A sector towards the outer edge of the disc takes the same amount of space as a sector towards the centre of the disc.
This means that we can write all of the sectors at the maximum density permitted by the storage medium.
Since reading the data requires that it pass under the optical pickup device at a constant rate, the constant data density implies that the disc has to spin more slowly when we are reading near the outer edge than when we are reading near the centre.
This is why the spiral is a constant linear velocity (CLV) format: as we seek from the centre to the edge, we change the rate of rotation of the disc so the linear speed of the spiral past the pickup device stays the same.
CAV, with its concentric tracks and pie-shaped sectors, writes data less densely in the outer tracks than in the tracks toward the centre.
We waste storage capacity in the outer tracks but have the advantage of being able to spin the disc at the same speed for all positions of the read head.
Given the sector arrangement shown in the figure, one rotation reads 8 sectors, no matter where we are on the disc. A timing mark placed on the disc makes it easy to find the start of a sector.
The CLV format is responsible for the poor seeking performance of CD-ROM drives: the CAV format provides definite track boundaries and a timing mark to find the start of a sector, but the CLV format provides no straightforward way to jump to a specific location.
Addressing
CD-ROM uses a sector-addressing scheme that is related to the CD-ROM's roots as an audio playback device.
Each second of playing time on a CD is divided into 75 sectors, each of which holds 2 KB of data.
According to the original Philips and Sony standards, a CD, whether used for audio or CD-ROM, contains at least one hour of playing time. That means the disc is capable of holding at least 60 minutes X 60 seconds X 75 sectors = 270,000 sectors, i.e. 540,000 KB of data.
CD ROM Strength and weakness
a. Seek performance
The chief weakness of CD-ROM is its random access performance.
Current magnetic disk technology is such that the average time for random data access, combining seek time and rotational delay, is about 30 msec. On a CD-ROM this average access takes about 500 msec.
Our file design strategies must therefore avoid seeks to an even greater extent than on magnetic disk.
b. Data Transfer rate
A CD-ROM drive reads 75 sectors, or 150 KB of data, per second.
This data transfer rate is part of the fundamental definition of CD ROM
It can't be changed without leaving behind the commercial advantages of the CD audio standard.
It is a modest transfer rate, about 5 times faster than the transfer rate of a floppy disk.
c. Storage capacity
A CD-ROM holds more than 600 MB of data.
Although it is possible to use up this storage area very quickly, particularly if you are storing raster images, 600 MB is big when it comes to text applications.
d. Read only access
From a design standpoint, the fact that CD-ROM is a publishing medium, a storage device that cannot be changed after manufacture, provides significant advantages. We never have to worry about updating.
This not only simplifies some of the file structures but also means that it is worthwhile to optimize our index structures and other aspects of file organization.
e. Asymmetric writing and reading
For most media, files are written and read using the same computer system. Often reading and writing are both interactive and are therefore constrained by the need to provide quick response to the user. CD-ROM is different:
we create the files to be placed on the disc once; then we distribute the disc, and it is accessed thousands of times.
A Journey of a Byte
What happens when a program writes a byte to a file on a disk? What happens between the program and the disk? Let us follow the journey of one byte as an example.
Suppose we want to append a byte representing the character P, stored in a character variable ch, to a file named in the variable textfile, stored somewhere on a disk.
From the programmer's point of view, the entire journey of the byte is represented by the statement write(textfile, ch, 1), but the actual journey is much longer than this.
The write statement results in a call to the computer’s OS which has the task of seeing that the rest of
journey is completed successfully
The write statement tells the operating system to send one character to disk and gives the OS the
location of the character.
The OS takes over the job of writing and then returns control to calling program
Once the OS has taken over the job, the rest of the journey is largely beyond the program’s control
a. File manager
The OS is not a single program but a collection of programs, each one designed to manage a different part of the computer's resources. Among these programs, the one that deals with file-related matters is the file manager. Its work proceeds through several layers, as below:
1. The program asks the OS to write the contents of the variable ch to the next available logical position in TEXT.
2. The OS passes the job on to the file manager.
3. The file manager looks up TEXT in a table containing information about it, such as whether the file is open and available for use, what types of access are allowed, if any, and what physical file the logical name TEXT corresponds to.
4. The file manager searches a file allocation table for the physical location of the sector that is to contain the byte.
5. The file manager makes sure that the last sector in the file has been stored in a system I/O buffer in RAM, then deposits the 'P' into its proper position in the buffer.
6. The file manager gives instructions to the I/O processor about where the byte is stored in RAM and where it needs to be sent on the disk.
7. The I/O processor finds a time when the drive is available to receive the data and puts the data in the proper format for the disk. It may also buffer the data to send it out in chunks of the proper size for the disk.
8. The I/O processor sends the data to the disk controller.
9. The controller instructs the drive to move the R/W head to the proper track, waits for the desired sector to come under the R/W head, then sends the byte to the drive to be deposited, bit by bit, on the surface of the disk.
b. I/O Buffer
Next, the file manager determines whether the sector that is to contain P is already in memory or needs to be loaded into memory. If the sector needs to be loaded, the file manager must find available system I/O buffer space for it and then read it from the disk.
Once it has the sector in a buffer in memory, the file manager can deposit the P into its proper position in the buffer. The file manager moves P from the program's data area to a system output buffer, where it may join other bytes headed for the same place on the disk.
If necessary, the file manager may have to load the corresponding sector from the disk into the system output buffer first. The system I/O buffers allow the file manager to read and write data in sector-sized or block-sized units.
c. The byte leaves memory: the I/O processor and disk controller
Until now, the byte has travelled along data paths that are designed to be very fast and are relatively expensive. Now it is time for the byte to travel along a data path that is likely to be narrower than the one in primary memory.
Because of the bottlenecks created by these differences in speed and data path widths, our byte and its companions might have to wait for an external path to become available.
The process of disassembling and assembling groups of bytes for transmission to and from external devices is so specialized that it is unreasonable to ask an expensive general-purpose CPU to spend its valuable time doing I/O when a simpler device could do the job and free the CPU for its other work.
Such a special-purpose device is called an I/O processor.
An I/O processor may be anything from a simple chip capable of taking a byte and passing it along, to a powerful small computer capable of executing very sophisticated programs and communicating with many devices simultaneously.
The I/O processor takes its instructions from the OS, but once it begins processing I/O, it runs independently, relieving the OS.
In a typical computer, the file manager might now tell the I/O processor that there is data in the buffer to be transmitted to the disk, how much data there is, and where it is to go on the disk.
This information might come in the form of a little program that the OS constructs and the I/O
processor executes
The job of controlling the operations of the disk is done by a device called disk controller
The I/O processor asks the disk controller if the disk drive is available for writing
If there is much I/O processing, there is a good chance that the drive will not be available and our byte
will have to wait in its buffer until the drive becomes available
Then the disk drive is instructed to move its R/W head to the track and sector on the drive where our
byte and its companions have to be stored
The R/W head must seek to the proper track and then wait until the disk has spun around so the desired
sector is under head
Once the track and sector are located, the I/O processor can send out the bytes, one at a time, to the drive, where each byte is probably stored in a little one-byte buffer while it waits to be deposited on the disk.
Finally, as the disk spins under the R/W head, the 8 bits of our byte are deposited, one at a time, on the surface of the disk.
Buffer management
Buffer is the part of main memory available for storage of copies of disk blocks
Buffering involves working with large chunks of data in memory
In this way the number of accesses to secondary storage can be reduced. But the use of buffers within programs can also affect performance.
a. Buffer bottlenecks
The file manager allocates I/O buffers that are big enough to hold incoming data.
It is common for the file manager to allocate several buffers for performing I/O.
Consider what happens if a program is performing both input and output on one character at a time and only one I/O buffer is available:
When the program asks for its first character, the I/O buffer is loaded with the sector containing that character, and the character is transmitted to the program.
If the program then decides to output a character, the I/O buffer is filled with the sector into which the output character needs to go, destroying its original contents.
Then, when the next input character is needed, the buffer contents have to be written to disk to make room for the sector containing the second input character, and so on.
That is why I/O systems usually use at least 2 buffers: one for input and one for output.
A program that reads many sectors from a file might have to spend much of its time waiting for the I/O system to fill its buffer every time a read operation is performed, before it can begin processing.
When this happens, the program is said to be I/O bound: the CPU spends much of its time just waiting for I/O to be performed. The solution to this problem is to use more than one buffer.
b. Buffering strategies -Multiple buffering
Suppose that a program is only writing to a disk and that it is I/O bound
The CPU wants to be filling a buffer at the same time that I/O is being performed
If 2 buffers are used and I/O-CPU overlapping is permitted, the CPU can be filling one buffer
while the contents of the other are being transmitted to disk
When both tasks are finished, the roles of the buffers can be exchanged
This method is called double buffering.
Normally any number of buffers can be used and they can be organized in a variety of ways
Some file systems use a buffering scheme called buffer pooling
When a system buffer is needed it is taken from a pool of available buffers and used
When the system receives a request to read a certain sector or block, it looks to see if one of its buffers already contains that sector or block.
If no buffer contains it, the system finds, from its pool of buffers, one that is not currently in use and loads the sector or block into it. Different schemes are used to decide which buffer to take from the buffer pool. One general strategy is to take the buffer that is least recently used (LRU).
When a buffer is accessed, it is moved to the most-recently-used end of the LRU queue, so it is allowed to retain its data until all the other buffers in the queue have been accessed (and reused) first.
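A hedged sketch of a tiny LRU buffer pool; the structure and names are illustrative, not an actual file manager's code:

#include <list>
#include <map>
#include <vector>
using namespace std;

const int BLOCK_SIZE = 512;
const int POOL_SIZE = 4;

struct BufferPool {
    list<long> lru;                      // front = most recently used block
    map<long, vector<char> > buffers;    // block number -> cached contents

    // Return the buffer for a block, loading (and evicting) as needed.
    vector<char>& fetch(long block) {
        lru.remove(block);               // if queued, pull it out first
        if (buffers.count(block) == 0) { // not cached: may need to evict
            if ((int)buffers.size() >= POOL_SIZE) {
                long victim = lru.back();    // least recently used buffer
                lru.pop_back();
                buffers.erase(victim);   // a real system writes it back first
            }
            buffers[block] = vector<char>(BLOCK_SIZE); // load from disk here
        }
        lru.push_front(block);           // now the most recently used
        return buffers[block];
    }
};

int main() {
    BufferPool pool;
    pool.fetch(7);    // loads block 7 into a buffer
    pool.fetch(7);    // second access is served from the pool
    return 0;
}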
c. Move mode and locate mode
Move mode is a way of handling buffered data in which chunks of data are moved from one place in memory to another before they can be accessed; this is time consuming.
There are 2 ways to avoid move mode:
o If the file manager can perform I/O directly between secondary storage and the program's data area, no extra move is necessary
o The file manager could use system buffers to handle the I/O and provide the program with the locations of the data, using pointers
Both techniques are examples of locate mode.
Sometimes a single block contains several parts that belong in different places in memory. To handle this, a program would normally:
o Read the whole block into a single big buffer
o Move the different parts to their own positions
Sometimes we can avoid this 2-step process using a technique called scatter input: a single read call identifies not one but a collection of buffers into which data from a single block is to be scattered.
The converse of scatter input is gather output: several buffers can be gathered and written with a single write call. This avoids the need to copy them into a single output buffer.
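POSIX systems expose scatter/gather I/O through the readv and writev calls; a hedged sketch (the file name and buffer sizes are illustrative):

#include <sys/uio.h>   // readv, struct iovec
#include <fcntl.h>
#include <unistd.h>

int main() {
    char header[16], body[496];          // two separate destination buffers
    struct iovec parts[2];
    parts[0].iov_base = header; parts[0].iov_len = sizeof(header);
    parts[1].iov_base = body;   parts[1].iov_len = sizeof(body);

    int fd = open("data.bin", O_RDONLY);
    if (fd < 0) return 1;
    // one call scatters a single 512-byte block into both buffers
    readv(fd, parts, 2);
    close(fd);
    return 0;
}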
I/O in UNIX
a. The kernel
The process of transmitting data from a program to an external device can be described as proceeding through a series of layers.
The topmost layer deals with data in logical, structural terms: we store in a file a name, a body of text, an image, an array of numbers, or some other logical entity.
The layers that follow collectively carry out the task of turning the logical object into a collection of bits on a physical device.
The topmost I/O layer in Unix consists of processes that impose certain logical views on files.
These processes include shell routines like cat and tail, user programs that operate on files, and library routines like scanf and fread that are called from programs to read strings, numbers, and so on.
Below this layer is the Unix kernel; its components are shown in the above diagram.
The kernel views all I/O as operating on a sequence of bytes. Once control passes to the kernel, all assumptions about the logical view of the file are gone.
This makes all operations below the top layer independent of an application's logical view of a file.
Journey of byte through kernel
When a program executes a system call such as
write(fd, &ch, 1);
the kernel is invoked immediately.
The routines that let processes communicate directly with the kernel make up the system call interface.
Here, the system call instructs the kernel to write a character to a file.
The kernel I/O system begins by connecting the file descriptor in the program to some file or device in the file system.
It does this by proceeding through a series of 4 tables that enable the kernel to find its way from the process to the places on the disk that hold the file it refers to.
The 4 tables are:
o File descriptor table
o Open file table – with information about open files
o File allocation table – part of structure called index node
o Table of index nodes – one entry for each file in use
These tables are managed by I/O system and owned by different parts of the system
o File descriptor table – owned by the process or your program
o Open file table and index node table are owned by kernel
o The index node is part of file system
The kernel consults these 4 tables to get the information it needs.
File descriptor table
It is a simple table that associates each of the file descriptors used by a process with an entry in the open file table.
Every process has its own descriptor table, which includes entries for all the files it has opened.
Open file table
The entries in the open file table are called file structures, and they contain important information about:
o How the corresponding file is to be used, such as the read/write mode used when it was opened
o The no. of processes currently using it
o The offset within the file to be used for the next read or write
o An array of pointers to generic functions that can be used to operate on the file
In general, the open file table tells the kernel what it can do with a file that has been opened in a certain way and provides information on how it can operate on the file.
The kernel still needs more information about the file itself, such as where the file is stored on disk, how big the file is, and who owns it. This information is found in the index node (or inode) table.
Index node (inode) table
It is a more permanent structure than the open file table's file structures.
An inode exists as long as its corresponding file exists. When a file is opened, a copy of its inode is usually loaded into memory, where it is added to the inode table for rapid access.
The most important component of the inode is a list, or index, of the disk blocks that make up the file.
Once the kernel I/O system has the inode information, it knows all it needs to know about the file.
It then invokes an I/O processor program that is appropriate for the
o type of data
o type of operation
o type of device that is to be written
In Unix this program is called a device driver; it sees that your data is moved from its buffer to its proper place on disk.
b. Linking file names to files
All references to files begin with a directory, for it is in directories that file names are kept.
A directory is just a small file that contains, for each file, the file name together with a pointer to the file's inode on disk. This pointer from a directory to the inode of a file is called a hard link.
It provides a direct reference from the file name to all other information about the file.
When a file is opened, this hard link is used to bring the inode into memory and to set up the corresponding entry in the open file table. A field in the inode tells how many hard links there are to the inode.
There is another kind of link: the soft link, or symbolic link. It links a file name to another file name rather than to an actual file, so a soft link is simply the pathname of some file.
Soft links are not supported by all Unix systems.
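Both kinds of link can also be created programmatically; a hedged sketch using the POSIX calls (the second and third paths are made up for illustration):

#include <unistd.h>

int main() {
    // hard link: a second directory entry pointing at the same inode
    link("/usr6/mydir/addr", "/usr6/mydir/addr2");
    // soft link: a new file whose contents are just the target's pathname
    symlink("/usr6/mydir/addr", "/usr6/mydir/addr.soft");
    return 0;
}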
c. Normal files, special files, and sockets
The kernel distinguishes 3 categories of files:
o Normal files: files holding data, such as text files
o Special files: represent streams of characters and control signals that drive devices, like printers
o Sockets: abstractions that serve as endpoints for interprocess communication
Fundamentals of file structure concepts
Field and Record organization
When we build file structures, we are making it possible to make data persistent; that is, one program can create data in memory and store it in a file, and another program can read the file and re-create the data in its memory.
The basic unit of data is the field, which holds a single data value.
Fields are organized into aggregates, either as an array or as a record. When a record is stored in memory, we refer to it as an object and refer to its fields as members.
When that object is stored in a file, we call it a record.
A stream file
Suppose we need to store name and address information about a collection of people. We will use objects of class Person in C++ to store information about individuals.
Ex: a C++ function can write the fields of a Person to a file as a stream of bytes.
Here all the fundamental units, like Mary Ames, 123 Maple, etc., are called fields, where a field is the smallest logically meaningful unit of information in a file.
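The function itself is not reproduced in these notes; a hedged sketch of what such a stream write looks like (the class layout is an assumption, trimmed to three fields):

#include <fstream>
#include <cstring>
using namespace std;

class Person {
public:
    char LastName[11], FirstName[11], Address[16];
};

// Write the fields as one undifferentiated run of bytes:
// no field boundaries survive in the file.
void WritePerson(ostream& stream, const Person& p) {
    stream << p.LastName << p.FirstName << p.Address;
}

int main() {
    Person p;
    strcpy(p.LastName, "Ames");
    strcpy(p.FirstName, "Mary");
    strcpy(p.Address, "123 Maple");
    ofstream out("person.txt");
    WritePerson(out, p);
    return 0;
}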
Field structure
There are many ways of adding structure to files to maintain the identity of fields. Four of the most common methods are:
Force the fields into a predictable length.
Begin each field with a length indicator.
Place a delimiter at the end of each field to separate it from the next field.
Use a "keyword = value" expression to identify each field and its contents.
Method 1: Fix the Length of Fields
The fields in our sample file vary in their length. If we force the fields into predictable lengths, then
we can pull them back out of the file simply by counting our way to the end of the field.
We can define a struct in C or a class in C++ to hold these fixed-length fields. The size of each array is one larger than the longest string it can hold, because strings in C and C++ are stored with a terminating 0 byte.
But a fixed-size field in a file doesn't need this extra character, so an object of class Person can be stored in 10+10+15+15+2+9 = 61 bytes.
Using this structure, our output looks as follows:
The drawback is wastage of space (instead of using 4 bytes to store Ames, we use 10 bytes, and so on).
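A hedged sketch of this first method; the field widths follow the 61-byte figure above, and the sample values follow the Mary Ames record used in this module:

#include <fstream>
#include <cstring>
using namespace std;

// Pad each field with blanks so it always occupies exactly 'width' bytes.
void writeFixed(ostream& out, const char* value, int width) {
    int len = (int)strlen(value);
    if (len > width) len = width;            // truncate overlong values
    out.write(value, len);
    for (int i = len; i < width; i++) out.put(' ');
}

int main() {
    ofstream out("fixed.txt");
    writeFixed(out, "Ames", 10);             // 10 bytes used, not 4
    writeFixed(out, "Mary", 10);
    writeFixed(out, "123 Maple", 15);
    writeFixed(out, "Stillwater", 15);
    writeFixed(out, "OK", 2);
    writeFixed(out, "74075", 9);             // one 61-byte record in all
    return 0;
}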
Method 2: Begin Each Field with a Length Indicator
Another way to make it possible to count to the end of a field involves storing the field length just
ahead of the field, as illustrated in Fig below.
If the fields are not too long (length less than 256 bytes), it is possible to store the length in a single
byte at the start of each field.
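A hedged sketch of this second method, writing each field behind a one-byte length (per the under-256-bytes note above; the file name is illustrative):

#include <fstream>
#include <cstring>
using namespace std;

// Prefix each field with its length, stored in a single byte.
void writeLengthPrefixed(ostream& out, const char* field) {
    unsigned char len = (unsigned char)strlen(field);
    out.put(len);            // 1-byte length indicator
    out.write(field, len);   // then the field itself
}

int main() {
    ofstream out("fields.dat", ios::binary);
    writeLengthPrefixed(out, "Ames");
    writeLengthPrefixed(out, "Mary");
    return 0;
}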
Method 3: Separate the Fields with Delimiters
We can also preserve the identity of fields by separating them with delimiters.
All we need to do is choose some special character or sequence of characters that will not appear
within a field and then insert that delimiter into the file after writing each field.
The choice of a delimiter character can be very important, since it must be a character that does not get in the way of processing.
In many instances white-space characters (blank, new line, tab) make excellent delimiters because
they provide a clean separation between fields when we list them on the console.
Also, most programming languages include I/O statements that, by default, assume that fields are
separated by white space.
Unfortunately, white space would be a poor choice for our file since blanks often occur as legitimate
characters within an address field.
Therefore, instead of white space we use the vertical bar character as our delimiter, so our file appears as in the figure below.
Method 4: Use a "Keyword = Value" Expression to Identify Fields
Such self-describing structures can be very useful tools for organizing files in many applications. It is easy to tell what fields are contained in a file, even if we don't know ahead of time what fields the file is supposed to contain.
It is also a good format for dealing with missing fields: if a field is missing, this format makes it obvious, because the keyword is simply not there.
This format is usually used in combination with another format, a delimiter, to separate fields.
Extensive use is made of the istream method getline. The arguments to getline are a character array to hold the string, a maximum length, and a delimiter.
getline reads up to the first occurrence of the delimiter or the end of line, whichever comes first.
Record structure
A record can be defined as a set of fields that belong together when the file is viewed in terms of a higher level of organization.
Like the notion of a field, a record is another conceptual tool: another level of organization that we impose on the data to preserve meaning.
Records do not necessarily exist in the file in any physical sense, yet they are an important logical notion included in the file structure.
Here are some of the most often used methods for organizing a file into records:
Require that the records be a predictable number of bytes in length.
Require that the records be a predictable number of fields in length.
Begin each record with a length indicator consisting of a count of the number of bytes that the record contains.
Use a second file to keep track of the beginning byte address for each record.
Place a delimiter at the end of each record to separate it from the next record.
Method 1: Make Records a Predictable Number of Bytes (Fixed-length Records)
A fixed-length record file is one in which each record contains the same number of bytes.
We have a fixed number of fields, each with a predetermined length, which combine to make a fixed-length record. This kind of field and record structure is illustrated in the figure below:
It is important to realize, however, that fixing the number of bytes in a record does not imply that the
sizes or number of fields in the record must be fixed.
Fixed-length records are frequently used as containers to hold variable numbers of variable length
fields. It is also possible to mix fixed and variable-length fields within a record.
Figure below illustrates how variable-length fields might be placed in a fixed-length record.
Method 3: Begin Each Record with a Length Indicator
We can communicate the length of records by beginning each record with a field containing an
integer that indicates how many bytes there are in the rest of the record.
Representing the Record Length
Option-1
Write the length in the form of a two-byte binary integer before each record. This is a natural solution in
C, since it does not require us to go to the trouble of converting the record length into character form.
Option-2
Convert the length into character string using formatted output.
With C stream, we use fprintf. With C++ stream class we use overloaded insertion operator (<<).
Example
Each of these lines inserts the length as a decimal string followed by a single blank that functions as
delimiter
Output from an implementation with a text length field is given below.
Each record has its record length preceding the data fields, delimited by a blank.
The 1st record contains the characters starting from Ames to the final delimiter after 74075. So the length 36, followed by a blank, is placed before the record.
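A hedged sketch of writing a record with a text length field, as just described:

#include <fstream>
#include <string>
using namespace std;

// Write a variable-length record preceded by its length as a decimal
// string, followed by a blank that functions as a delimiter.
void writeRecord(ostream& out, const string& record) {
    out << record.size() << ' ' << record;
}

int main() {
    ofstream out("people.txt");
    writeRecord(out, "Ames|Mary|123 Maple|74075|");
    return 0;
}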
Reading variable length record from the file
The program must read the length of the record, move the characters of the record into a buffer, and then break the record into fields, as below.
The code declares an object of class Person; class DelimTextBuf packs the Person into the buffer and writes the buffer to the file.
It uses the variable-length strategy: a binary value is used to represent the length of the record.
Write inserts the current buffer size, then the characters of the buffer.
Read clears the current buffer contents, extracts the record size, reads the proper number of bytes into the buffer, and sets the buffer size.
Extending class person with buffer operations
Buffer classes have the capacity to pack any number and type of values, but they do not record how these values are combined to make objects.
Pack operation
The method AddField is included to support the specification of fields and their sizes.
A buffer for objects of class Person is initialized by the new method InitBuffer of class Person.
Using Inheritance for Record Buffer Classes
Inheritance in C++ stream classes
C++ incorporates inheritance to allow multiple classes to share members and methods. One or more base classes define members and methods, which are then used by subclasses.
The stream classes are defined in such a hierarchy.
fstream is embedded in a class hierarchy that contains many other classes: read operations, including the extraction operators, are defined in class istream, and write operations are defined in class ostream.
Class fstream inherits these operations from its parent class iostream, which in turn inherits from classes istream and ostream.
There are 2 further base classes, ios and fstreambase, that provide common declarations and basic stream operations (ios) and access to OS file operations (fstreambase).
There is use of multiple inheritance in these classes (classes have more than one base class).
The keyword virtual is used to ensure that class ios is included only once in the ancestry of any of these classes.
Objects of a class are also objects of their base classes, and include the members and methods of those base classes.
Ex: an object of class fstream is also an object of classes fstreambase, iostream, istream, ostream, and ios, and includes all of the members and methods of those base classes.
Hence the read method and extraction operators (>>) defined in istream are also available in iostream, ifstream, and fstream.
The open and close operations of class fstreambase are also members of class fstream.
A benefit of inheritance is that operations that work on base class objects also work on derived class objects.
Class hierarchy for record buffer object
Characteristics of 3 buffer classes can be combined into a single class hierarchy as in the figure
below.
The members and methods common to all three buffer classes are included in the base class IOBuffer.
Other methods are in the classes VariableLengthBuffer and FixedLengthBuffer, which support the read
and write operations for the two types of records.
LengthFieldBuffer, DelimFieldBuffer, and FixedFieldBuffer have the Pack and Unpack methods for their
specific field representations.
The members common to all of the buffer classes (Buffer, BufferSize, and MaxBytes) are declared as
protected members.
Protected members of a class can be used by methods of that class and by methods of classes derived
from it.
Hence the protected members of IOBuffer can be used by methods in all of the classes in this
hierarchy.
Protected members of VariableLengthBuffer can be used in its subclasses, but not in classes IOBuffer
and FixedLengthBuffer.
The constructor for IOBuffer has a single parameter that specifies the maximum number of bytes in the
buffer.
Methods are declared for reading, writing, packing, and unpacking.
IOBuffer declares these methods as virtual, to allow each subclass to define its own implementation.
The '= 0' marks a pure virtual method, which makes IOBuffer an abstract class, as in the declaration
sketched below.
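An abridged sketch of such a declaration (the parameter defaults are assumptions):

#include <iostream>
using namespace std;

class IOBuffer
{
  public:
    IOBuffer(int maxBytes = 1000);                 // maximum size of the buffer
    virtual int Read(istream &) = 0;               // read a buffer from the stream
    virtual int Write(ostream &) const = 0;        // write a buffer to the stream
    virtual int Pack(const void * field, int size = -1) = 0;  // set next field
    virtual int Unpack(void * field, int maxBytes = -1) = 0;  // get next field
  protected:
    char * Buffer;       // character array holding the field values
    int BufferSize;      // current number of bytes in the buffer
    int MaxBytes;        // maximum number of bytes the buffer can hold
};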
The write method for VariableLengthBuffer is sketched below.
!stream and !stream.good() are the two tests used to detect whether the stream has experienced an
error.
Write returns the address in the stream at which the record was written; this address is captured by
calling stream.tellp() before the record is written.
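A sketch of that Write, assuming a two-byte binary size prefix and the protected members Buffer and
BufferSize:

int VariableLengthBuffer::Write(ostream & stream) const
{
    int recaddr = (int) stream.tellp();        // address where the record begins
    unsigned short bufferSize = BufferSize;    // two-byte size prefix
    stream.write((char *)&bufferSize, sizeof(bufferSize));
    if (!stream) return -1;                    // first error test
    stream.write(Buffer, BufferSize);          // the characters of the buffer
    if (!stream.good()) return -1;             // second error test
    return recaddr;                            // success: record address
}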
Classes VariableLengthBuffer and FixedLengthBuffer have the functions Read, Write, and SizeOfBuffer.
Classes DelimFieldBuffer, LengthFieldBuffer, and FixedFieldBuffer have the functions Pack and Unpack.
FixedFieldBuffer is the subclass of IOBuffer that supports reading and writing of fixed-length
records.
The Write method writes a fixed-size record, and the Read method must know that size in order to read
the record properly.
AddField is used to specify the field sizes, and the InitBuffer method is used to initialize the
buffer, as sketched below:
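A sketch of InitBuffer; the field widths are illustrative assumptions and must match the declared
sizes of the Person members:

int Person::InitBuffer(FixedFieldBuffer & buffer)
{
    int result;
    result = buffer.AddField(10);              // LastName
    result = result && buffer.AddField(10);    // FirstName
    result = result && buffer.AddField(15);    // Address
    result = result && buffer.AddField(15);    // City
    result = result && buffer.AddField(2);     // State
    result = result && buffer.AddField(9);     // ZipCode
    return result;
}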
The Unpack function extracts the value of the next field from the buffer, as sketched below:
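A sketch of Unpack; NextField, NumFields, FieldSize, and NextByte are bookkeeping members assumed to
be maintained by AddField and Pack:

#include <cstring>      // for memcpy

int FixedFieldBuffer::Unpack(void * field, int maxBytes)
{
    if (NextField == NumFields) return -1;     // no fields left to unpack
    int start = NextByte;                      // first byte of this field
    int packSize = FieldSize[NextField];       // size fixed by AddField
    if (maxBytes != -1 && packSize > maxBytes) return -1;  // caller's space too small
    memcpy(field, &Buffer[start], packSize);   // copy the field value out
    NextByte += packSize;                      // advance past this field
    NextField++;
    if (NextField == NumFields) Clear();       // all fields consumed
    return packSize;                           // number of bytes unpacked
}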
Record Access by Key
When we want a specific record, it is convenient to identify the record by a key based on its
contents, such as a name.
A primary key is a key that uniquely identifies a record. A primary key should be unchanging, should
be dataless (the key carries no real data, only identification), and should have a canonical form
(i.e., there are restrictions on the form the key values may take).
Records can also be searched based on a secondary key; secondary keys do not typically identify a
record uniquely.
In general, not every field is a key. Keys correspond to fields, or combinations of fields, that may
be used in a search.
Sequential search
Evaluating performance
Sequential search is one of the simplest forms of file searching
The file is searched one record at a time, until a record is found with a particular key
Sequential search is slow:
o If there are n records in the file, you may have to look at all of them before you find the one you
want
o If the key you are looking for is in the file, on average you will need to look through n/2 records
before finding it
Sequential search is said to be O(n), because the time it takes is proportional to n.
Although sequential search is slow, its performance is not as bad as it might first appear, because
sequential search always looks at the adjacent record in the file next.
It therefore makes good use of the fact that not every read of a file results in a disk access: a big
chunk of the file is read into a buffer in main memory, so most reads of the file will not actually
cause disk accesses. A simple sequential search is sketched below.
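A minimal sketch of such a search; Person, DelimFieldBuffer, the LastName field, and Read returning a
negative value at end of file are assumptions carried over from the buffer classes above:

#include <cstring>

int FindByLastName(istream & stream, const char * key, Person & p)
{
    DelimFieldBuffer buffer;
    stream.seekg(0, ios::beg);              // start at the first record
    while (buffer.Read(stream) >= 0) {      // one record per iteration: O(n)
        p.Unpack(buffer);                   // rebuild the object from the buffer
        if (strcmp(p.LastName, key) == 0)
            return 1;                       // found: p now holds the record
    }
    return 0;                               // key is not in the file
}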
Improving Sequential search performance with record blocking
We grouped bytes into fields, fields into records, and now records into blocks. Blocking is done
strictly as a performance measure.
Although blocking can result in substantial performance improvement, it does not change the order of
the sequential search operation: the cost of searching is still O(n).
Blocking exploits the difference between memory access speed and the cost of accessing secondary
storage.
Blocking saves time because it decreases the amount of seeking.
Blocking does not change the number of comparisons that must be done in memory, and it probably
increases the amount of data transferred between disk and memory (we always read a whole block, even
if the record we are seeking is the first one in the block).
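As an illustrative example, suppose a file holds 16,000 records of 250 bytes each, blocked 16 records
(4,000 bytes) to a block. Scanning the whole file record by record costs 16,000 reads, whereas
scanning it block by block costs only 1,000 reads, cutting the number of trips to the disk by a
factor of 16; the 16,000 in-memory key comparisons remain unchanged.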
When is sequential search good?
Sequential search is usually considered an expensive method, but it is extremely easy to program and
requires the simplest of file structures.
There are many situations in which you can use sequential search:
o Your collection of elements is not sorted/cannot be sorted.
o Your collection of elements is very small
o When the number of searches you will perform on the data is low. (That binary search requires
sorted data is a drawback only if the data does not need to be searched many times. If you have to
perform multiple searches, it is worth sorting it once and using binary search rather than
searching in a linear fashion every time.)
Unix Tools for Sequential Processing
The most common file structure in Unix is an ASCII file with the new-line character as the record
delimiter and white space as the field delimiter. Such files are simple and easy to process.
Since records in this kind of file structure are variable in length, the files are processed
sequentially.
Some of the Unix tools for sequential processing are cat (print the contents of a file), wc (count
the lines, words, and characters in a file), and grep (print the records that match a pattern).
We can also combine tools to create, on the fly, some very powerful file-processing software. For
example, to find the number of words in all records containing the word Ada:
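A plausible reconstruction of that example, assuming the records are stored in a file named myfile
(the file name is illustrative):

grep Ada myfile | wc -w

grep writes only the records containing Ada to the pipe, and wc -w counts the words it receives.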
Direct access
The most radical alternative to searching sequentially through a file for a record is a retrieval
mechanism known as direct access.
We have direct access to a record when we can seek directly to the beginning of the record and read
it in. Whereas sequential searching is an O(n) operation, direct access is O(1): no matter how large
the file is, we can still get to the record we want with a single seek.
Class IOBuffer includes DRead and DWrite functions for direct read and direct write operations.
DRead begins by seeking to the requested spot.
If the request is beyond the end of the file, the function fails.
When the seek succeeds, the virtual Read method is called, and the virtual function mechanism selects
the Read implementation of the correct buffer subclass, as sketched below.
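A sketch of DRead along those lines:

int IOBuffer::DRead(istream & stream, int recref)
{
    stream.seekg(recref, ios::beg);            // seek to the requested spot
    if (stream.tellg() != recref) return -1;   // seek beyond end of file: fail
    return Read(stream);                       // virtual call: the Read of the
                                               // concrete subclass is selected
}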
Direct access is predicated on knowing where the beginning of the required record is.
Sometimes this information about record location is carried in a separate index file. But, for the
moment, we assume that we do not have an index.
We assume, instead, that we know the relative record number (RRN) of the record that we want.
The idea of an RRN is an important concept that emerges from viewing a file as a collection of
records rather than a collection of bytes.
If a file is a sequence of records, then the RRN of a record gives its position relative to the beginning
of the file. The first record in a file has RRN 0, the next has RRN 1, and so forth.
With variable-length records, the RRN tells us the relative position of the record we want in the
sequence of records, but we still have to read sequentially through the file, counting records as we
go, to get to that record; looking up a particular RRN is therefore still an O(n) process.
With fixed-length records, however, an RRN can be converted directly into a byte offset.
For instance, if we are interested in the record with an RRN of 546 and our file has a fixed-length
record size of 128 bytes per record, we can calculate the byte offset as follows:
byte offset = 546 × 128 = 69,888
In general, given a fixed-length record file where the record size is r, the byte offset of a record
with an RRN of n is:
byte offset = n × r
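A minimal illustration of the computed seek (the file name is illustrative):

#include <fstream>
using namespace std;

int main() {
    const int recordSize = 128;                // r: bytes per record
    const int rrn = 546;                       // n: relative record number
    long byteOffset = long(rrn) * recordSize;  // n * r = 69,888 bytes
    ifstream file("records.dat", ios::binary);
    file.seekg(byteOffset, ios::beg);          // one seek: O(1) direct access
    char record[recordSize];
    file.read(record, recordSize);             // read the record at RRN 546
    return 0;
}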
Header records
o The Write method adds a header to the file and returns the number of bytes in the header.
o The Read method reads the header and checks it for consistency.
Sequential Access
In this access method, the information/data stored on a device is accessed in the exact order in
which it was stored.
Sequential access methods are seen in older storage devices such as magnetic tape.
File Organization Method
The process that determines how data/information is stored so that file access can be as easy and
quick as possible. There are three main ways of organizing files:
1. Sequential
2. Index-Sequential
3. Random
Sequential file organization
All records are stored in some sort of order (ascending, descending, alphabetical).
The order is based on a field in the record.
For example, consider a file holding records with the fields employeeID, date of birth, and address.
The employee ID is used as the ordering field, and the records are grouped accordingly
(ascending/descending).
Can be used with both direct and sequential access.
Index-Sequential organization
The records are stored in some order, but there is a second file, called the index file, that
indicates exactly where certain key records are located.
Cannot be used with the sequential access method alone, since following the index requires direct
access.
Random file organization
The records are stored randomly, but each record has its own specific position on the disk (its
address).
With this method, no time is wasted searching through the file; instead, we jump to the exact
position and access the data/information.
Can only be used with the direct access method.
Question Bank
1. Define file structures. Why to study file structures design? What is the driving force behind FS
design?
2. Explain overview of file structure design / explain goals of good FS design
3. Explain history of file structure design
4. Explain the functions of READ and WRITE with parameters
5. With a neat sketch, explain UNIX directory structure
6. Differentiate between physical and logical files
7. Discuss about fundamental file processing operations
8. What is a file? Explain briefly the evolution of file structure design.
9. Discuss about the fundamental File processing Operations.
10. Explain briefly about I/O Redirection and pipes
11. List and explain different Unix file system commands
12. Explain seeking with C and C++ streams.
13. Explain sector based data organization in magnetic disk.
14. Explain the different costs of disk access
15. How the data is physically stored on a CDROM?
16. Differentiate between CLV and CAV
17. What are the different buffering strategies? Explain briefly.
18. Write a note on buffer management.
19. What is seeking and how it is supported in C++ streams.
20. What do you mean by file structure? Explain in brief a short history of file structure design.
21. Briefly discuss the evolution of file structure.
22. What are file structures? What is the driving force behind the file structure design?
23. What is seeking and how it is supported in C Streams and C++ Streams.
24. Explain the following :
Physical file,
Logical file,
Open function,
Close function, and
Reading and writing file.
25. Bring out the differences between physical files and logical files.
26. Describe the relation between physical file and the logical file.
27. With a neat sketch, explain UNIX directory structure.
28. Discuss about the Fundamental File processing operations.
29. Explain the functions OPEN, READ, and WRITE with parameters.
30. Explain the following functions:
Open a file, and
Close a file
31. Explain the strengths & weaknesses of CD-ROM .
32. Define the following terms :
Seek time,
Rotational Delay, and
Transfer time
33. What are the two basic ways to address data on disks?
34. What are the different buffering strategies? Explain briefly.
35. Write a note on organization of CD-ROM.
36. How the data is physically stored on a CD-ROM? List the major strengths and weaknesses of
CDROMs.
37. Write a note on disk organisation.
38. Explain sector based data organisation in magnetic disk with a neat diagram.
39. Suppose that we want to store a file with 60,000 fixed-length data records, where each record
requires 80 bytes and records are not allowed to span two sectors. Given sectors per track = 63,
bytes per sector = 512, tracks per cylinder = 16, and average rotational delay = 6 ms, how many
cylinders are required for the file?
40. Briefly explain the different basic ways to organize the data on a disk.
41. Explain the organization of data on Tapes with neat diagram. With an example estimate the tape
length requirements.
42. Write short notes on Magnetic tapes.
43. Explain the organization of data on tapes, with a neat diagram. Estimate the tape length
requirements, with a suitable example.
44. Calculate the space required on tape if we want to store 1 million 100-byte records on a 7250
bpi tape that has an interblock gap of 0.2 inches, with a blocking factor of 60.
45. Explain the different costs of the disk access.
46. What are the three distinct operations that contribute to the total cost of access on disk?
47. Briefly explain the organization of data on Nine-Track tapes with a neat diagram.