Introduction To File Structures: CENG 351 1

This document provides an introduction to file structures. It discusses where file structures fit within computer science between applications, databases, file systems, and operating systems. The goal of file structures is to minimize trips to slow secondary storage like disks. Files are organized into records, records into fields. Disks use tracks, sectors, and cylinders to organize where data is stored. Accessing data requires finding the correct cylinder, track, and sector, which has seek time, rotational delay, and transfer time costs.

Introduction to File Structures

CENG 351 1
File Structures: What are they about?

• So far we have talked about database tables.
• From tables to file structures:
– Storage of data
– Organization of data
– Access to data
– Processing of data

Where do File Structures fit in Computer Science?

Layers, from top to bottom:
Application
DBMS
File system
Operating System
Hardware

Computer Architecture
• Main memory (RAM): data is manipulated here.
– Semiconductors
– Fast, expensive, volatile, small
• Secondary storage: data is stored here.
– Disks, tape
– Slow, cheap, stable, large
• Data is transferred between main memory and secondary storage.

Primary vs. Secondary Storage
Primary:
• Fast (+)
• Small capacity (-) (many databases are too large to fit in main memory)
• Volatile (-)
Secondary:
• Slow (-)
• Large capacity (+) (cheaper)
• Non-volatile (+)

How fast is main memory?

• Typical time for getting info from:
Main memory: ~120 nanosec = 120 x 10^-9 sec
Magnetic disks: ~30 millisec = 30 x 10^-3 sec

Normal Arrangement
• Secondary storage (SS) provides reliable, long-
term storage for large volumes of data
• At any given time, we are usually interested in
only a small portion of the data
• This data is loaded temporarily into main
memory (MM), where it can be rapidly manipulated
and processed.
• As our interests shift, data is transferred
automatically between MM and SS, so the data
we are focused on is always in MM.

Goal of File Structures
• Minimize the number of trips to the disk needed to
get the desired information.
• Group related information so that we are
likely to get everything we need with only
one trip to the disk.

Physical Files and Logical Files
• physical file: a collection of bytes stored on a disk or
tape
• logical file: a "channel" (like a telephone line) that
connects the program to a physical file
• The program (application) sends bytes to (or receives
bytes from) a file through the logical file. The program
knows nothing about where the bytes go (or come from).
• The operating system is responsible for associating a
logical file in a program with a physical file on disk or
tape. Writing to or reading from a file in a program is
done through the operating system.

Files
• The physical file has a name, for instance
myfile.txt
• The logical file has a logical name (a
variable) inside the program.
– In C :
FILE * outfile;
– In C++:
fstream outfile;
Basic File Processing Operations
• Opening
• Closing
• Reading
• Writing
• Seeking
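
The five operations can be sketched in C++ with std::fstream. This is only an illustrative sketch: the helper name write_then_read and its parameters are made up for this example, and error handling is kept minimal.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: writes `text` to the physical file `path`, then reads
// back up to `count` bytes starting at byte offset `offset`.
std::string write_then_read(const std::string& path, const std::string& text,
                            std::streamoff offset, std::size_t count) {
    // Opening: associate the logical file `file` with the physical file.
    std::fstream file(path, std::ios::in | std::ios::out | std::ios::trunc |
                                std::ios::binary);
    if (!file) return "";

    // Writing: bytes go through the logical file to the physical file.
    file.write(text.data(), static_cast<std::streamsize>(text.size()));
    file.flush();

    // Seeking: move the read position to `offset` bytes from the start.
    file.seekg(offset, std::ios::beg);

    // Reading: fetch up to `count` bytes from the current position.
    std::string result(count, '\0');
    file.read(&result[0], static_cast<std::streamsize>(count));
    result.resize(static_cast<std::size_t>(file.gcount()));

    // Closing: release the logical file (the destructor would also do this).
    file.close();
    return result;
}
```

For example, writing "hello file structures" and then reading 4 bytes from offset 6 returns "file".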

File Systems

• Stored data is organized into files.


• Files are organized into records.
• Records are organized into fields.

Example
• A student file may be a collection of student
records, one record for each student
• Each student record may have several fields, such
as
– Name
– Address
– Student number
– Gender
– Age
– GPA
• Typically, each record in a file has the same fields.
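
As a rough illustration, the student record could be declared in C++ as a struct with one member per field. The member types and the helper average_gpa are assumptions made for this sketch, not part of the course material.

```cpp
#include <string>
#include <vector>

// One record of the student file; each member is a field.
struct Student {
    std::string name;
    std::string address;
    int student_number;
    char gender;
    int age;
    double gpa;
};

// A file viewed as a collection of records that all share the same fields.
using StudentFile = std::vector<Student>;

// Example of processing the file record by record: average the GPA field.
double average_gpa(const StudentFile& file) {
    if (file.empty()) return 0.0;
    double sum = 0.0;
    for (const Student& s : file) sum += s.gpa;
    return sum / static_cast<double>(file.size());
}
```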
Properties of Files

1) Persistence: Data written into a file
persists after the program stops, so the
data can be used later.
2) Shareability: Data stored in files can be
shared by many programs and users
simultaneously.
3) Size: Data files can be very large.
Typically, they cannot fit into main
memory.

Secondary Storage Devices

• Two major types of storage devices:
1. Direct Access Storage Devices (DASDs)
– Magnetic disks
  Hard disks (high capacity, low cost per bit)
  Floppy disks (low capacity, slow, cheap)
– Optical disks
  CD-ROM (compact disc, read-only memory)
  DVD
2. Serial Devices
– Magnetic tapes (very fast sequential access)

Magnetic Disks

• Bits of data (0’s and 1’s) are stored on
circular magnetic platters called disks.
• A disk rotates rapidly (and never stops).
• A disk head reads and writes bits of data as
they pass under the head.
• Often, several platters are organized into a
disk pack (or disk drive).

[Figure: top view of a 36 GB, 10,000 RPM IBM SCSI server hard disk (the IBM Ultrastar 36ZX) with its top cover removed. Note the height of the drive and the 10 stacked platters.]
Components of a Disk
[Figure: a disk drive with labels for the spindle, platters, tracks, sectors, disk head, arm assembly, and arm movement.]
Looking at a surface
[Figure: surface of a disk showing tracks and sectors.]
Organization of Disks
• Disk contains concentric tracks.
• Tracks are divided into sectors
• A sector is the smallest addressable unit in a disk.
• Sectors are addressed by:
surface #
cylinder (track) #
sector #

Accessing Data
• When a program reads a byte from the disk, the
operating system locates the surface, track and
sector containing that byte, and reads the entire
sector into a special area in main memory called
a buffer.
• The bottleneck of a disk access is moving the
read/write arm. So it makes sense to store a file in
tracks that lie directly above or below each other on
different surfaces, rather than in several tracks on the
same surface.
Cylinders
• A cylinder is the set of tracks at a given
radius of a disk pack.
– i.e. a cylinder is the set of tracks that can be
accessed without moving the disk arm.
• All the information on a cylinder can be
accessed without moving the read/write
arm.

Estimating Capacities

• Track capacity = # of sectors/track * bytes/sector
• Cylinder capacity = # of tracks/cylinder * track capacity
• Drive capacity = # of cylinders * cylinder capacity
• Number of cylinders = # of tracks on a surface

Knowing these relationships allows us to compute the amount of
disk space a file is likely to require.

Exercise
• Store a file of 20000 records on a disk with
the following characteristics:
# of bytes per sector = 512
# of sectors per track = 40
# of tracks per cylinder = 12
# of cylinders = 1331
Q1. How many cylinders does the file require
if each data record requires 256 bytes?
Q2. What is the total capacity of the disk?
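
One way to work the exercise is to code the capacity relationships directly. The sketch below is illustrative only: the struct and function names are hypothetical, and cylinders_needed assumes records are packed with no wasted space (true here, since two 256-byte records fill a 512-byte sector exactly).

```cpp
#include <cstdint>

struct DiskGeometry {
    std::int64_t bytes_per_sector;
    std::int64_t sectors_per_track;
    std::int64_t tracks_per_cylinder;
    std::int64_t cylinders;
};

// Track capacity = # of sectors/track * bytes/sector
constexpr std::int64_t track_capacity(const DiskGeometry& d) {
    return d.sectors_per_track * d.bytes_per_sector;
}

// Cylinder capacity = # of tracks/cylinder * track capacity
constexpr std::int64_t cylinder_capacity(const DiskGeometry& d) {
    return d.tracks_per_cylinder * track_capacity(d);
}

// Drive capacity = # of cylinders * cylinder capacity
constexpr std::int64_t drive_capacity(const DiskGeometry& d) {
    return d.cylinders * cylinder_capacity(d);
}

// Whole cylinders needed for `records` records of `record_bytes` each,
// assuming tightly packed records (no internal fragmentation).
constexpr std::int64_t cylinders_needed(const DiskGeometry& d,
                                        std::int64_t records,
                                        std::int64_t record_bytes) {
    const std::int64_t file_bytes = records * record_bytes;
    const std::int64_t per_cylinder = cylinder_capacity(d);
    return (file_bytes + per_cylinder - 1) / per_cylinder;  // round up
}
```

With the exercise's figures, a cylinder holds 245,760 bytes, so the 5,120,000-byte file needs about 20.8 cylinders (21 whole cylinders), and the drive holds 327,106,560 bytes (about 327 MB).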
Clusters
• Another view of sector organization is the
one maintained by the O.S.’s file manager.
• It views the file as a series of clusters of
sectors.
• File manager uses a file allocation table
(FAT) to map logical sectors of the file to
the physical clusters.

Extents
• If there is a lot of room on a disk, it may be
possible to make a file consist entirely of
contiguous clusters. Then we say that the
file is one extent. (very good for sequential
processing)
• If there isn’t enough contiguous space
available to contain an entire file, the file is
divided into two or more noncontiguous
parts. Each part is an extent.
Fragmentation
• Internal fragmentation: loss of space
within a sector or a cluster.
1) Due to records not fitting exactly in a sector:
e.g. sector size is 512 bytes and record size is 300
bytes. Either
– store one record per sector (wasting 212 bytes per sector), or
– allow records to span sectors.
2) Due to the use of clusters: If the file size is not a
multiple of the cluster size, then the last cluster
will be only partially used.
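
Both kinds of loss are simple remainders. A minimal sketch with hypothetical function names; wasted_per_sector assumes records are not allowed to span sectors and that a record is no larger than a sector.

```cpp
#include <cstdint>

// Bytes left unused in each sector when only whole records are stored in it.
constexpr std::int64_t wasted_per_sector(std::int64_t sector_bytes,
                                         std::int64_t record_bytes) {
    return sector_bytes % record_bytes;
}

// Bytes left unused in the last cluster of a file of `file_bytes` bytes.
constexpr std::int64_t wasted_in_last_cluster(std::int64_t file_bytes,
                                              std::int64_t cluster_bytes) {
    return (cluster_bytes - file_bytes % cluster_bytes) % cluster_bytes;
}
```

For the 512-byte sector and 300-byte record above, wasted_per_sector gives 212 bytes; a 10,000-byte file in 4096-byte clusters leaves 2288 bytes unused in its last cluster.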

The Cost of a Disk Access
• The time to access a sector in a track on a surface is
divided into 3 components:
– Seek time: time to move the read/write arm to
the correct cylinder.
– Rotational delay (or latency): time it takes for the disk
to rotate so that the desired sector is under the
read/write head.
– Transfer time: once the read/write head is
positioned over the data, the time it takes to
transfer the data.
Seek time
• Seek time is the time required to move the arm to
the correct cylinder.
• Usually the largest of the three costs.
Typically:
– 5 ms (milliseconds) to move from one track to the next
(track-to-track)
– 50 ms maximum (from inside track to outside track)
– 30 ms average (from one random track to another
random track)

Average Seek Time (s)
• Since it is usually impossible to know exactly how many
tracks will be traversed in every seek, we usually try to
determine the average seek time (s) required for a
particular file operation.
• If the starting and ending positions for each access are
random, it turns out that the average seek traverses one
third of the total number of cylinders.
• Manufacturer’s specifications for disk drives often list this
figure as the average seek time for the drives.
• Most hard disks today have s of less than 10 ms, and high-
performance disks have s as low as 7.5 ms.

Latency (rotational delay)
• Latency is the time needed for the disk to rotate so
the sector we want is under the read/write head.
• Hard disks usually rotate at about 5000 rpm, which
is one revolution per 12 msec.
• Note:
– Min latency = 0
– Max latency = time for one disk revolution
– Average latency (r) = (min + max) / 2 = max / 2
= time for ½ disk revolution
• Typically 6 – 8 ms average
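
The latency figures follow directly from the rotation speed. A minimal sketch (the function names are hypothetical):

```cpp
// Time for one full revolution, in milliseconds (1 min = 60000 ms).
constexpr double revolution_time_ms(double rpm) { return 60000.0 / rpm; }

// Average latency r = time for half a revolution.
constexpr double average_latency_ms(double rpm) {
    return revolution_time_ms(rpm) / 2.0;
}
```

At 5000 rpm this gives a 12 ms revolution and a 6 ms average latency.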
Transfer Time
• Transfer time is the time for the read/write head
to pass over a block.
• The transfer time is given by the formula:
Transfer time = (number of bytes transferred / number of bytes on a track) x rotation time
• e.g. if there are 63 sectors per track, the time to
transfer one sector would be 1/63 of a
revolution.
Exercise
Given the following disk:
– 20 surfaces
– 800 tracks/surface
– 25 sectors/track
– 512 bytes/sector
– 3600 rpm (revolutions per minute)
– 7 ms track-to-track seek time
– 28 ms avg. seek time
– 50 ms max seek time
Find:
a) Average latency
b) Disk capacity
c) Time to read the entire disk, one cylinder at a time
Solution
a) Average latency:
3600 rev/min and 1 min = 60000 msec
=> one revolution = 60000 / 3600 = 16.7 ms
=> Average latency = ½ * 16.7 = 8.3 ms
b) Disk capacity:
25 * 512 * 800 * 20 = 204,800,000 bytes = 204.8 MB
c) Time to read the disk:
Track read time = 1 revolution time = 16.7 ms
Cylinder read time = 20 * 16.7 = 334 ms
Total read time = 800 cylinder reads + 799 cylinder switches
= 800 * 334 ms + 799 * 7 ms
= 267.2 sec + 5.59 sec = 272.8 sec
Exercise
• Disk characteristics:
– Average seek time = 8 msec.
– Spindle speed = 10,000 rpm
– Sectors per track = 170
– Sector size = 512 bytes

• Q) What is the average time to read one sector?

Solution
• Average time to read one sector:
s + r + btt
• What is btt?
btt : block transfer time = revolution time/ #of
sectors per track
Revolution time = 60000/10000 = 6 msec
btt = 6/170 = 0.035 ms
• s + r + btt = 8+3+0.035 =11.035 ms
Sequential Reading
• Given the following disk:
– s = 16 ms
– r = 8.3 ms
– Block transfer time = 0.84 ms
a) Calculate the time to read 10 sequential
blocks
b) Calculate the time to read 100 sequential
blocks
Solution
a) Reading 10 sequential blocks:
= s + r+ 10 * btt
= 16 + 8.3 + 10 * 0.84 = 32.7 ms

b) 100 blocks:
= 16 + 8.3 + 100 * 0.84 = 108.3 ms

Random Reading
Given the same disk,
a) Calculate the time to read 10 blocks
randomly
b) Calculate the time to read 100 blocks
randomly

Solution
a) Reading 10 blocks randomly:
= 10 * (s + r + btt)
= 10 * (16 + 8.3 + 0.84) = 251.4 ms

b) 100 blocks:
= 100 *(16 + 8.3 + 0.84) = 2514 ms

Fast Sequential Reading
• We assume that blocks are arranged so that there
is no rotational delay in transferring from one
track to another within the same cylinder. This is
possible if consecutive track beginnings are
staggered (like running races on circular race
tracks)
• We also assume that the consecutive blocks are
arranged so that when the next block is on an
adjacent cylinder, there is no rotational delay after
the arm is moved to the new cylinder.
• Fast sequential reading: no rotational delay after
finding the first block.
Consequently …
Reading b blocks:
i. Sequentially:
s + r + b * btt ≈ b * btt
(s + r is insignificant for large files)
ii. Randomly:
b * (s + r + btt)

Exercise
• Given a file of 30000 records, 1600 bytes
each, and block size 2400 bytes, how does
record placement affect sequential reading
time?
i) Empty space in blocks.
ii) Records overlap block boundaries.

Solution
(Using btt = 0.84 ms, as in the previous exercises.)
i) Empty space in blocks (one record per block):
b = # of blocks = n = # of records = 30000
Time = 30000 * 0.84 ms = 25.2 sec
ii) Records overlap block boundaries:
Bfr = blocking factor = 2400/1600 = 3/2
b = 30000/1.5 = 20000 blocks
Time = 20000 * 0.84 ms = 16.8 sec (1/3 less time)

Exercise
• Specifications of a 300MB disk drive:
– Min seek time = 6ms.
– Average seek time = 18ms
– Rotational delay = 8.3ms
– transfer rate = 16.7 ms/track or 1229 bytes/ms
– Bytes per sector = 512
– Sectors per track = 40
– Tracks per cylinder = 12
– Tracks per surface = 1331
– Interleave factor = 1
– Cluster size= 8 sectors
– Extent size = 5 clusters
Q) How long will it take to read a 2048 KB file that is
divided into 8000 records of 256 bytes each?
i) Access the file sequentially
ii) Access the file randomly
Solution
First find the # of extents:
1 cluster = 8 sectors = 8 * 512 = 4096 bytes
=> 16 records per cluster
=> File contains 8000/16 = 500 clusters
Extent size = 5 clusters = 40 sectors = 1 track
=> File contains 100 extents => 100 tracks
i) Access the file sequentially:
For 1 track = s + r + track transfer time
= 18 + 8.3 + 16.7 = 43 ms
100 tracks = 4300 ms = 4.3 sec
ii) Access the file randomly (8000 records):
For each record: s + r + read 1 cluster
= 18 + 8.3 + 1/5 * 16.7 = 29.6 ms
8000 records => 8000 * 29.6 ms = 236.8 sec
Secondary Storage Devices: Floppy Disks

Floppy Disks
• A floppy disk is a storage medium made of a thin,
flexible magnetic disk.
• Developed by IBM.
• Available in 3.5-inch, 5.25-inch and 8-inch forms.

Internal parts of a 3½-inch floppy disk:
1) A hole that indicates a high-capacity disk.
2) The hub that engages with the drive motor.
3) A shutter that protects the surface when the disk is removed from the drive.
4) The plastic housing.
5) A polyester sheet that reduces friction against the disk media as it rotates within the housing.
6) The magnetic-coated plastic disk.
7) A schematic representation of one sector of data on the disk; the tracks and sectors are not visible on actual disks.

Floppy Disks
• A spindle motor in the drive rotates the
magnetic medium at a certain speed.
• A stepper motor-operated mechanism moves
the magnetic read/write head(s) along the
surface of the disk.

Secondary Storage Devices: Magnetic Tapes

Characteristics
• No direct access, but very fast sequential
access.
• Resistant to different environmental
conditions.
• Easy to transport and store; cheaper than disk.
• It was once widely used to store
application data; nowadays, it is mostly used
for backups or archives.
Magnetic tapes
• A sequence of bits is stored on magnetic
tape.
• For storage, the tape is wound on a reel.
• To access the data, the tape is unwound
from one reel to another.
• As the tape passes the head, bits of data are
read from or written onto the tape.

[Figure: tape unwinding from reel 1 onto reel 2, passing the read/write head between them.]
Tracks
• Typically data on tape is stored in 9
separate bit streams, or tracks.
• Each track is a sequence of bits.
• Recording density = # of bits per inch (bpi).
Typically 800 or 1600 bpi.
30000 bpi on some recent devices.

In detail
[Figure: a ½” wide tape with parallel tracks. Each column of bits across the tape holds 8 data bits (1 byte) plus a parity bit.]
Tape Organization
[Figure: a 2400’ tape laid out as BOT marker, header block, data blocks separated by interblock gaps, and EOT marker.]

BOT = beginning of tape; EOT = end of tape
Header block: describes data blocks
Interblock gap: for acceleration and deceleration of the tape
Blocking factor: # of records per block
Spring 2006 by Li Ma, TSU - cs344
Data Blocks and Records
• Each data block is a sequence of contiguous
records.
• A record is the unit of data that a user’s
program deals with.
• The tape drive reads an entire block of
records at once.
• Unlike a disk, a tape starts and stops.
• When stopped, the read/write head is over
an interblock gap.
Example: tape capacity
• Given the following tape:
– Recording density = 1600 bpi
– Tape length = 2400’ (feet)
– Interblock gap = ½” (inch)
– 512 bytes per record
– Blocking factor = 25
• How many records can we write on the
tape? (ignoring BOT and EOT markers and
the header block for simplicity)
Solution
• #bytes/block = (512 bytes/record) * (25 records/block)
= 12,800 bytes/block
• Block length = (#bytes/block) / (#bytes/inch)
= 12,800/1600 inches = 8 inches
• Block + gap = 8” + ½” = 8.5”
• Tape length = 2400 ft * 12 in/ft = 28,800 in
• #blocks = (tape length) / (block + gap)
= 28,800/8.5 = 3388 blocks (rounded down)
• #records = (#blocks) * (#records/block)
= 3388 * 25 = 84,700 records
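
The same steps can be sketched as a function. The name tape_records and its parameters are hypothetical; like the solution, it ignores the BOT and EOT markers and the header block, and treats the recording density as bytes per inch along the tape.

```cpp
#include <cstdint>

// Number of records that fit on a tape, counting whole blocks only.
constexpr std::int64_t tape_records(double tape_feet, double bytes_per_inch,
                                    double gap_inches,
                                    std::int64_t record_bytes,
                                    std::int64_t blocking_factor) {
    const double block_inches =
        static_cast<double>(record_bytes * blocking_factor) / bytes_per_inch;
    const double tape_inches = tape_feet * 12.0;
    // Each block is followed by one interblock gap; drop any partial block.
    const std::int64_t blocks =
        static_cast<std::int64_t>(tape_inches / (block_inches + gap_inches));
    return blocks * blocking_factor;
}
```

tape_records(2400, 1600, 0.5, 512, 25) returns 84700. A larger blocking factor spends less tape on gaps: with 50 records per block the same tape holds 1745 blocks, or 87250 records.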

Secondary Storage Devices: CD-ROM

Physical Organization of CD-ROM
• Compact Disk – read only memory (write once)
• Data is encoded and read optically with a laser
• Can store around 600MB data
• Digital data is represented as a series of Pits and
Lands:
– Pit = a little depression, forming a lower level in the
track
– Land = the flat part between pits, or the upper levels
in the track

Organization of data
• Reading a CD is done by shining a laser at the disc
and detecting changes in the reflection pattern.
– 1 = change in height (land to pit or pit to land)
– 0 = a “fixed” amount of time between 1’s
LAND PIT LAND PIT LAND
...------+ +-------------+ +---...
|_____| |_______|
..0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 ..

• Note: we cannot have two 1’s in a row!
=> uses an Eight-to-Fourteen Modulation (EFM) encoding table.
Properties
• Note that since 0's are represented by the length
of time between transitions, the track must be read at
constant linear velocity (CLV).
• Sectors are organized along a spiral
• Sectors have same linear length
• Advantage: takes advantage of all storage space
available.
• Disadvantage: has to change rotational speed
when seeking (slower towards the outside)
Addressing
• 1 second of play time is divided up into 75
sectors.
• Each sector holds 2KB
• 60 min CD:
60min * 60 sec/min * 75 sectors/sec =
270,000 sectors = 540,000 KB ~ 540 MB
• A sector is addressed by:
Minute:Second:Sector
e.g. 16:22:34
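
The addressing scheme maps directly to a linear sector number. This sketch uses hypothetical names and assumes zero-based numbering with no offset at the start of the disc, which the slide does not specify.

```cpp
#include <cstdint>

constexpr std::int64_t kSectorsPerSecond = 75;
constexpr std::int64_t kSecondsPerMinute = 60;
constexpr std::int64_t kKilobytesPerSector = 2;

// Linear sector number of the address minute:second:sector
// (assumption: counting starts at 0:0:0 with no offset).
constexpr std::int64_t sector_index(std::int64_t minute, std::int64_t second,
                                    std::int64_t sector) {
    return (minute * kSecondsPerMinute + second) * kSectorsPerSecond + sector;
}

// Capacity in KB of a CD with the given play time in minutes.
constexpr std::int64_t capacity_kb(std::int64_t minutes) {
    return minutes * kSecondsPerMinute * kSectorsPerSecond *
           kKilobytesPerSector;
}
```

Under these assumptions the address 16:22:34 is sector 73,684, and capacity_kb(60) gives the 540,000 KB computed above.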
DVD (Digital Video Disc) Characteristics
• A DVD has the same physical size as a CD, but it can store from
4.7 to 17 GB of data.

• Like a CD, a DVD records data in a spiral trail of tiny pits
separated by lands.

• The DVD’s larger capacity is achieved by making the pits smaller and the
spiral tighter, and by recording the data on as many as four layers, two on each
side of the disc.

• To read these tightly packed discs, lasers that produce a shorter-wavelength
beam of light are required, which allows more accurate aiming and focusing.
In fact, the focusing mechanism is the technology that allows data
to be recorded on two layers. To read the second layer, the reader simply
focuses the laser a little deeper into the disc, where the second layer of data is
recorded.
Secondary Storage: Flash Memory

Flash Memory
• Non-volatile computer storage chip that can be electrically
erased and reprogrammed.
• It was developed from EEPROM (electrically erasable
programmable read-only memory)
• The NAND type: primarily used in memory cards, USB
flash drives, for general storage and transfer of data.
• The NOR type: used as a replacement for the older
EPROM and as an alternative to certain kinds of ROM
applications.

Flash Memory
Replacement for hard disks:
• Advantage: flash memory does not have the mechanical
limitations and latencies of hard drives.
• Disadvantage: the cost per gigabyte of flash memory remains
significantly higher than that of hard disks.

Buffer Management


• Buffering means working with large chunks of
data in main memory so the number of accesses to
secondary storage is reduced.
• System I/O buffers: These are beyond the control
of application programs and are manipulated by
the O.S.

System I/O Buffer
[Figure: secondary storage <-> buffer <-> program. Data is transferred between secondary storage and the buffer by blocks, and between the buffer and the program by records. The buffer is temporary storage in MM for one block of data.]

Buffer Bottlenecks
• Consider the following program segment:
while (1) {
infile >> ch;
if (infile.fail()) break;
outfile << ch;
}
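• What happens if the O.S. used only one I/O buffer?
=> Buffer bottleneck: the single buffer would have to alternate
between holding input data and output data, so nearly every
character read or written could cost a disk access.
• Most operating systems have an input buffer and an
output buffer.
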
Buffering Strategies
• Double Buffering: Two buffers can be used to
allow processing and I/O to overlap.
– Suppose that a program is only writing to a disk.
– CPU wants to fill a buffer at the same time that I/O is
being performed.
– If two buffers are used and I/O-CPU overlapping is
permitted, CPU can be filling one buffer while the other
buffer is being transmitted to disk.
– When both tasks are finished, the roles of the buffers
can be exchanged.
• The actual management is done by the O.S.
Other Buffering Strategies
• Multiple Buffering: instead of two buffers any
number of buffers can be used to allow processing
and I/O to overlap.
• Buffer pooling:
– There is a pool of buffers.
– When a request for a sector is received, the O.S. first checks
whether that sector is already in some buffer.
– If not, it brings the sector into some free buffer. If
no free buffer exists, it must choose an occupied buffer to replace
(usually with an LRU strategy).
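
Buffer pooling with LRU replacement can be sketched as a small class. This is a toy model under stated assumptions, not how any particular O.S. implements it: the class name BufferPool is hypothetical, and it tracks only which sector numbers are buffered rather than holding the sector data.

```cpp
#include <cstddef>
#include <list>
#include <unordered_map>

// Toy buffer pool: remembers which sectors are buffered, with LRU replacement.
class BufferPool {
public:
    explicit BufferPool(std::size_t buffer_count) : capacity_(buffer_count) {}

    // Requests a sector. Returns true on a hit (sector already buffered) and
    // false on a miss (a real system would read the sector from disk here).
    bool request(long sector) {
        auto it = index_.find(sector);
        if (it != index_.end()) {
            // Hit: mark this buffer as the most recently used.
            lru_.splice(lru_.begin(), lru_, it->second);
            return true;
        }
        if (capacity_ == 0) return false;  // no buffers to fill
        if (lru_.size() == capacity_) {
            // No free buffer: replace the least recently used sector.
            index_.erase(lru_.back());
            lru_.pop_back();
        }
        lru_.push_front(sector);
        index_[sector] = lru_.begin();
        return false;
    }

    bool contains(long sector) const { return index_.count(sector) != 0; }

private:
    std::size_t capacity_;
    std::list<long> lru_;  // front = most recently used
    std::unordered_map<long, std::list<long>::iterator> index_;
};
```

With two buffers, requesting sectors 1, 2, 1, 3 in that order ends with sectors 1 and 3 buffered: sector 2 is the least recently used when sector 3 arrives, so it is the one replaced.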
