File Organization
A file system manages and organizes all computer files, stores them and makes them
available when they are needed. Without a file system, information kept in a storage
area would be one large body of data with no way to tell where one piece of information stops
and the next begins.
File organization refers to the logical relationships among the various records that constitute
the file, particularly with respect to the means of identification and access to any specific
record. In simple terms, storing files in a certain order is called file organization.
Record - a record represents a collection of attributes that describe a real-world entity. A record
consists of fields, with each field describing an attribute of the entity.
A character is any letter, number, space, punctuation mark, or symbol typed or entered on a
computer.
In most computer systems, a byte is a unit of data that is eight binary digits long. A byte is the
unit most computers use to represent a character such as a letter, number or typographic symbol.
Bit - a bit is the smallest unit of data representation (the value of a bit may be 0 or 1).
Eight bits make a byte, which can represent a character or a special symbol in a character code.
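The relationship between records, fields, characters, bytes and bits can be sketched in Python. This is only an illustration: the employee layout (a 4-byte number, a 10-byte name and a 2-byte grade) and the function names are assumptions, not something the notes define.

```python
import struct

# Hypothetical record layout: 4-byte number, 10-byte name, 2-byte grade.
# "<" means no padding between fields, so the record is 4 + 10 + 2 bytes.
RECORD_FORMAT = "<I10sH"

def pack_employee(number, name, grade):
    """Turn the fields of one record into a fixed-length run of bytes."""
    return struct.pack(RECORD_FORMAT, number, name.encode("ascii"), grade)

def unpack_employee(raw):
    """Recover the fields from the bytes of one record."""
    number, name, grade = struct.unpack(RECORD_FORMAT, raw)
    return number, name.rstrip(b"\x00").decode("ascii"), grade

record = pack_employee(1042, "ADA", 7)
print(len(record))               # bytes in the record
print(len(record) * 8)           # bits in the record
print(format(ord("A"), "08b"))   # the eight bits of the character "A"
```

Each character of the name occupies one byte, and unused name positions are filled with zero bytes so that every record has the same length.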
Read: The term read describes how information is obtained from a drive or another
storage device. For example, before this page was shown in your browser, it had to be read from
the server's hard drive and then downloaded.
Write: When saving new information or changes, you write to the drive. If write protection is
enabled or you lack the proper permissions, you may have read-only access, which
means you can read information but not write or change anything.
If you lack read access or the drive is inaccessible, the information is not readable.
Read means you can open the file and view the content. Write means you can make changes to
the file.
Fetch is a File Transfer Protocol (FTP) application that provides a graphical user interface to
download (copying from a remote server to a local computer) and upload (copying from a local
computer to a remote server) files.
1. Sequential
A sequentially organized file consists of records arranged in the sequence in which they are
written to the file (the first record written is the first record in the file, the second record written
is the second record in the file, and so on). As a result, records can be added only at the end of
the file. Attempting to add records at some place other than the end of the file will result in the
file being truncated at the end of the record just written.
Sequential files are usually read sequentially, starting with the first record in the file. Sequential
files with a fixed-length record type that are stored on disk can also be accessed by relative
record number (direct access).
Records in sequential files can be read or written only sequentially.
After you have placed a record into a sequential file, you cannot shorten, lengthen, or delete the
record. However, you can update (REWRITE) a record if the length does not change. New
records are added at the end of the file.
If the order in which you keep records in a file is not important, sequential organization is a good
choice whether there are many records or only a few. Sequential output is also useful for printing
reports.
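The rules above (append at the end, read from the start, rewrite only at the same length, reach a fixed-length record by its relative number) can be sketched as follows. The 8-byte record size, the file name and the function names are assumptions chosen for the illustration.

```python
import os
import tempfile

RECORD_SIZE = 8  # hypothetical fixed record length, in bytes

def _encode(text):
    # Pad with spaces (or cut) so every record is exactly RECORD_SIZE bytes.
    return text.encode("ascii").ljust(RECORD_SIZE)[:RECORD_SIZE]

def append_record(path, text):
    """Add a record at the end of the file, the only place new records go."""
    with open(path, "ab") as f:
        f.write(_encode(text))

def read_all(path):
    """Read records sequentially, starting with the first record."""
    records = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(RECORD_SIZE)
            if len(chunk) < RECORD_SIZE:
                break
            records.append(chunk.decode("ascii").rstrip())
    return records

def read_relative(path, number):
    """Fetch record `number` (counting from 1) by computing its byte offset."""
    with open(path, "rb") as f:
        f.seek((number - 1) * RECORD_SIZE)
        return f.read(RECORD_SIZE).decode("ascii").rstrip()

def rewrite_record(path, number, text):
    """Update a record in place; its length in the file does not change."""
    with open(path, "r+b") as f:
        f.seek((number - 1) * RECORD_SIZE)
        f.write(_encode(text))

path = os.path.join(tempfile.mkdtemp(), "demo.dat")  # hypothetical file name
for name in ("ALPHA", "BRAVO", "CHARLIE"):
    append_record(path, name)
rewrite_record(path, 2, "BETA")
print(read_all(path))
print(read_relative(path, 3))
```

Access by relative record number works only because every record has the same length, so the offset of record n is simply (n - 1) times the record size.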
2. Random File Organization
This is the type of file design where the records are stored with no regard to any
specific sequence. In random file organisation, records are stored in random order within the file.
Although there is no sequencing to the placement of the records, each record can be accessed
directly without reading the records before it. As such, random files are also
known in some literature as direct access files.
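One common way to reach a record directly is to compute its position from its key. The notes do not say how positions are chosen, so the sketch below assumes a simple hash (key modulo the number of slots) with linear probing for collisions. The slot count and all names are hypothetical.

```python
NUM_SLOTS = 7  # hypothetical number of record slots in the file

def slot_for(key):
    """Map a numeric key to a slot; key % NUM_SLOTS is an illustrative hash."""
    return key % NUM_SLOTS

def store(slots, key, record):
    """Place a record in its home slot, or the next free slot after it."""
    i = slot_for(key)
    for _ in range(NUM_SLOTS):
        if slots[i] is None or slots[i][0] == key:
            slots[i] = (key, record)
            return i
        i = (i + 1) % NUM_SLOTS  # linear probing on collision
    raise RuntimeError("file is full")

def fetch(slots, key):
    """Go straight to the home slot and probe forward until the key is found."""
    i = slot_for(key)
    for _ in range(NUM_SLOTS):
        if slots[i] is None:
            return None
        if slots[i][0] == key:
            return slots[i][1]
        i = (i + 1) % NUM_SLOTS
    return None

slots = [None] * NUM_SLOTS
positions = [store(slots, 1042, "Ada"), store(slots, 23, "Grace"), store(slots, 30, "Alan")]
print(positions)
print(fetch(slots, 30))
```

Keys 23 and 30 hash to the same slot, so the second of them is placed in the next free slot; the records end up in no particular order, yet each can be found without scanning the whole file.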
3. Serial
Serial file organization is the simplest file organization method. In serial files, records are
entered in the order of their creation. As such, the file is unordered, and is at best in
chronological order. Serial files are primarily used as transaction files in which the transactions
are recorded in the order that they occur.
This type of organization is normally used with magnetic tape.
4. Indexed-Sequential
An indexed file contains records ordered by a record key. Each record contains a field that contains
the record key. The record key uniquely identifies the record and determines the sequence in
which it is accessed with respect to other records. A record key for a record might be, for
example, an employee number or an invoice number.
An indexed file can also use alternate indexes, that is, record keys that let you access the file
using a different logical arrangement of the records. For example, you could access the file
through employee department rather than through employee number.
There are three major types of indexes used:
Basic Index: This provides a location for each record (key) that exists in
the system.
Implicit Index: This type of index gives a location of all possible records
(keys) whether they exist or not.
Limit Index: This index groups the records (keys) and only provides the
location of the highest key in the group. Generally, they form a hierarchical
index. Data records are blocked before being written to disk. An index may
consist of the highest key in each block, (or on each track).
[Diagram: data records stored three to a block; Block 3 holds keys A0064, A0073 and A0075.]
In the above example, data records are shown as being 3 to a block. The index,
then, holds the key of the highest record in each block. (An essential element of
the index, which has been omitted from the diagram for simplicity, is the physical
address of the block of data records). Should we wish to access record 5, whose
key is A0038, we can quickly determine from the index that the record is held in
block 2, since this key is greater than the highest key in block 1, A0025, and less
than the highest key in block 2, A0053. By way of the index we can go directly to
the record we wish to retrieve, hence the term "direct access".
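The lookup just described can be sketched with a binary search over the limit index. The three highest keys are those of the example; the function name is an assumption, and, as in the example, the physical block addresses are left out.

```python
from bisect import bisect_left

# Highest key in each block, from the example: blocks 1, 2 and 3.
LIMIT_INDEX = ["A0025", "A0053", "A0075"]

def find_block(key):
    """Return the 1-based number of the block that would hold `key`, or None."""
    position = bisect_left(LIMIT_INDEX, key)
    if position == len(LIMIT_INDEX):
        return None  # key is above the highest key in the file
    return position + 1

print(find_block("A0038"))
```

The first index entry that is greater than or equal to the wanted key identifies the block, after which only that block needs to be read and searched.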
Old Master File: The old master file used during the first updating period is termed the
Father, while the new master file produced is termed the Son.
New Master File (the Son): The new master file (Son) used during the second
updating period becomes the Father, while the master file produced during the second
updating forms the Son.
Grandfather-Father-Son Analogy: Considering the two updating periods, the
old master file used in the first period now becomes the Grandfather, the new master
file used in the second updating becomes the Father, while the new master file produced
in the second period forms the Son.
Therefore, we have Grandfather – Father – Son forming a generation. On the next
updating run, the Grandfather file is overwritten and its storage can then be used again, as a
great-grandfather is not kept.
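The rotation of generations can be sketched with a container that holds at most three master files and drops the oldest when a new one arrives. The file names are hypothetical and the sketch only tracks names, not file contents.

```python
from collections import deque

ROLES = ["Son", "Father", "Grandfather"]

def update_run(generations, new_master):
    """Record a newly produced master file and return the current roles."""
    generations.append(new_master)  # with maxlen=3 the oldest file drops out
    return dict(zip(ROLES, reversed(generations)))

generations = deque(maxlen=3)  # no great-grandfather is kept
update_run(generations, "master_v1")
update_run(generations, "master_v2")
third = update_run(generations, "master_v3")
fourth = update_run(generations, "master_v4")
print(third)
print(fourth)
```

After the fourth run the first master file has left the set, which corresponds to the Grandfather being overwritten.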
Master file: A master file consists of data fields which are of a permanent
nature. The values of these fields must continually be brought up to date so that
the file always reflects the most recent transactions or state of affairs in the
organization. For instance, an employee file is made of records whose fields may
include: Employee Number, Name, Date of birth, Qualification, Salary grade, etc.
These fields are permanent, although Qualification and Salary grade might need
to be updated at a future date.
File characteristics
Data files can be described by the following characteristics:
i. Hit rate: This is the frequency with which active records in the file are
accessed. File records that are accessed more often are said to have a
high hit rate, while records which are rarely accessed are said to
have a low hit rate.
ii. Volatility: The rate at which records are added to or deleted from a
file is described as file volatility. A volatile file calls for careful handling
to avoid sudden loss of its contents (records).
iii. Growth: This refers to the increase in size of the file. When a file is
created, it grows as new records are added to it.
iv. Size: The file size is expressed in terms of characters in the field of the
record and the number of records. When a file is created, the size is
determined by the amount of data stored in the file.
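These characteristics can be put into numbers. The sketch below assumes fixed-length records, expresses hit rate as the percentage of records accessed in one processing run (one common convention, not stated in the notes), and uses made-up figures.

```python
def file_size(record_length, record_count):
    """Size in characters: characters per record times number of records."""
    return record_length * record_count

def hit_rate(records_accessed, record_count):
    """Percentage of the file's records accessed in one run (assumed convention)."""
    return 100.0 * records_accessed / record_count

def growth(old_count, new_count):
    """Percentage increase in the number of records."""
    return 100.0 * (new_count - old_count) / old_count

print(file_size(80, 5000))
print(hit_rate(250, 5000))
print(growth(5000, 5500))
```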
Types of storage devices and media
The characteristics of magnetic storage media, tape, disk, cartridge, bubble,
hard disk, CD-ROM, floppy disks, zip disk, tape streamer, flash memory,
optical disk.
Storage media/devices
A device that can receive data and retain it for subsequent retrieval is called a
storage medium. File storage and retrieval describes the organization,
storage, location and retrieval of coded information in a computer system.
Important factors in storing and retrieving information are the type of media or
storage device used to store information, the media’s storage capacity, the speed
of access and information transfer to and from the storage media, the number of
times new information can be written to the media and how the media interacts
with the computer.
File storage can be classified as permanent storage or temporary storage. Files
can also be classified as being stored in or retrieved from primary or
secondary memory. Primary memory, also known as main memory, is the
computer’s main random access memory (RAM). All information that is
processed by the computer must first pass through main memory. Secondary memory is
any form of memory/storage other than the main computer memory. Such
memories (devices) cover a wide range of capacities and speeds of access.
Examples are magnetic disk storage, which includes floppies, hard disks,
cartridges, exchangeable multi-platter disks and fixed disks, as well as CD-ROMs,
flash disks and magnetic tape.
a. Permanent storage
b. Temporary storage
Storage media:
Information is stored on many different types of media, the most common being
floppy disks, hard drives, CD-ROMs, magnetic tapes and flash disks.
Floppy disks
Floppy disks are most often used to store information, such as application
programs, that is frequently accessed by users. A floppy disk is a thin piece
of magnetic material inside a protective envelope. The size of the disk is usually
given as the diameter of the magnetic media. The two most common sizes are 3.5
inch and 5.25 inch. Both sizes of floppies are removable disks – that is, they must
be inserted into a compatible disk drive to be read from or written to. This drive is
usually internal to, or part of, a computer. Most floppy drives today are double
sided, with one head on each side of the disk. This doubles the storage capacity
of the disk, allowing it to be written to on both sides. Information is organized on
the disk by dividing the disk into tracks and sectors. Tracks are concentric
circular regions on the surface of the disk, and each track is divided into sectors.
Before a floppy can be used, the computer has to format it by placing special
information on the disk that enables the computer to find each track and sector.
Hard drive
Hard drives consist of rigid circular platters of magnetic material sealed in a metal
box with associated read/write heads. They are usually internal to a computer. Most
hard drives have multiple platters stacked on top of one another, each with its
own read/write head. The media in a hard drive is generally not removable from
the drive assembly, although external hard drives do exist with removable hard
disks. The read/write heads in a hard drive are precisely aligned with the
surfaces of the hard disks, allowing thousands of tracks and dozens of sectors
per track. The combination of more heads and more tracks allows hard drives to
store more data and to transfer data at a higher rate than floppy disks.
Accessing information on a hard disk involves moving the heads to the right track
and then waiting for the correct sector to revolve underneath the head. Seek time
is the average time required to move the heads from one track to some
other desired track on the disk. The time needed to move from one track to a
neighbouring track is often in the 1 millisecond (i.e. one thousandth of a second)
range, and the average seek time to reach an arbitrary track anywhere on the disk is
in the 6 to 15 millisecond range.
Rotational latency is the average time required for the correct sector to come under
the head once it is positioned on the correct track. This time depends on
how fast the disk is revolving. Today, many drives run at 120 to 180 or more
revolutions per second, yielding average rotational latencies of a few
milliseconds.
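The figure of a few milliseconds can be checked with a short calculation. The sketch assumes that, on average, the wanted sector is half a revolution away from the head; the function name is illustrative.

```python
def average_rotational_latency_ms(revs_per_second):
    """Average wait, taken as half of one full revolution, in milliseconds."""
    revolution_ms = 1000.0 / revs_per_second
    return revolution_ms / 2

for speed in (120, 180):
    print(speed, round(average_rotational_latency_ms(speed), 2))
```

At 120 revolutions per second one revolution takes about 8.33 ms, so the average wait is about 4.17 ms; at 180 revolutions per second it is about 2.78 ms.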
If a file requires more than one sector for storage, the positions of the sectors on
the individual tracks can greatly affect the average access time. Typically, it takes
the disk controller a small amount of time to finish reading a sector. If the next
sector to be read is the neighboring sector on the track, the electronics may not
have enough time to get ready to read it before it rotates under the read/write
head. If this is the case, the drive must wait until the sector comes all the way
round again. This access time can be reduced by interleaving, that is, alternately
placing the sectors on the tracks so that sequential sectors for the same file are
separated from each other by one or two sectors. When information is distributed
optimally, the device controller is ready to start reading just as the appropriate
sector comes under the read/write head.
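Interleaving can be illustrated by working out where each logical sector lands on a track. In this sketch `step` is the distance in physical slots between consecutive logical sectors, so a step of 2 leaves one other sector in between. The function name and the 8-sector track are assumptions.

```python
def interleaved_layout(sectors_per_track, step):
    """Return a list whose index is the physical slot and whose value is the
    logical sector stored there."""
    layout = [None] * sectors_per_track
    position = 0
    for logical in range(sectors_per_track):
        # If the target slot is already taken, move on to the next free one.
        while layout[position] is not None:
            position = (position + 1) % sectors_per_track
        layout[position] = logical
        position = (position + step) % sectors_per_track
    return layout

print(interleaved_layout(8, 1))  # no interleaving
print(interleaved_layout(8, 2))  # one sector between consecutive logical sectors
```

With a step of 2 the track reads 0, 4, 1, 5, 2, 6, 3, 7, so after finishing logical sector 0 the controller has one sector's worth of rotation to get ready before logical sector 1 arrives.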
After many files have been written to and erased from a disk, fragmentation can
occur. Fragmentation happens when pieces of single files are inefficiently
distributed in many locations on a disk. The result is an increase in average file
access time. This problem can be fixed by running a defragmentation program,
which goes through the drive track by track and rearranges the sectors for each
file so that they can be accessed more quickly.
Disk structure
Unlike floppy drives, in which the read/write heads actually touch the surface of
the material, the heads in most hard disks float slightly off the surface. When the
heads accidentally touch the media, either because the drive is dropped or
bumped hard or because of an electrical malfunction, the surface becomes
scratched. Any data stored where the head has touched the disk is lost. This is
called a head crash. To help reduce the possibility of a head crash, most disk
controllers park the heads over an unused track on the disk when the drive is not
being used by the CPU.
It is important to have a large hard disk drive to undertake some tasks like video
and sound editing. However, the latest desktop computers come with a minimum
of 60 Gbytes of capacity, which is enough for most standard tasks. Currently, hard
disk drives have capacities up to 120 Gbytes and data transfer rates of 160 Mbits
per second.
CD-ROMs
Magnetic tape
Magnetic tape has served as an efficient and reliable information storage medium
since the 1950s. Most magnetic tapes are made of mylar, a type of strong plastic, in
which metallic particles have been embedded. A read/write head identical to
those used for audio tape reads and writes binary information to the tape.
Reel-to-reel magnetic tape is commonly used to store information for large mainframes
or supercomputers. High-density cassette tapes, resembling audio cassette
tapes, are used to store information for personal computers and mainframes.
Magnetic tape storage has the advantage of being able to hold enormous amounts of
data. For this reason, it is used to store information on the largest computer
systems. Magnetic tape has two major shortcomings: it has a very slow data
access time when compared to other forms of storage media, and access to
information on magnetic tape is sequential. In sequential data storage, data are
stored with the first bit at the beginning of the tape and the last bit at the end of
the tape, in a linear fashion. To access a random bit of information, the tape drive
has to forward or reverse through the tape until it finds the location of the bit. The
bit closest to the location of the read/write head can be accessed relatively quickly,
but bits far away take a considerable time to access. RAM, on the other hand, is
random access, meaning that it can locate any one bit as easily as any other.
Flash memory
Another type of storage device, called flash memory, traps small amounts of electric charge in
‘wells’ on the surface of a chip. Side effects of this trapped charge, such as the electric field it
creates, are later used to read the stored value. To rewrite to flash memory, the charges in the wells
must first be drained. Such drives are useful for storing information that changes infrequently.
Most flash memories are of large capacity and hence are used to store large volumes of data.
Future technology
• Formatting
– Physical: divide the blank slate into sectors identified by headers containing
such information as sector number; sector interleaving.
– Logical: marking bad blocks; partitioning (optional) and writing a blank directory
on disk; installing file allocation tables, and other relevant information (file system
initialization)
• Reliability
• Controller caches
– newer disks have on-disk caches (128KB—512KB)
Indexed sequential: This is an access method for a sequentially organized file whose records are
indexed with their corresponding addresses. It supports two types of access:
sequential access and random (indexed) access. Note that this method is only
possible with disk files.
Seek Time
Seek time is one of the three delays associated with reading or writing data on a
computer’s disk drive, and is somewhat similar for CD or DVD drives. The others are
rotational delay and transfer time, and their sum is the access time. In order to
read or write data in a particular place on the disk, the read/write head of the disk
needs to be physically moved to the correct place. This process is known as
seeking, and the time it takes for the head to move to the right place is the seek
time. Seek time for a given disk varies depending on how far the head’s
destination is from its origin at the time of each read/write instruction.
Rotational Delay
The rotational delay is the time required for the address area of the disk to rotate
into a position where it is accessible by the read/write head.
Transfer Time
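Transfer time is the time taken to transfer the data once the head is in position. It
depends on the transfer rate, which is the number of bits transferred per unit of
time and is therefore measured in bits per second (bps).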
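Since access time is the sum of seek time, rotational delay and transfer time, it can be estimated with a short calculation. The sketch assumes the transfer takes the amount of data divided by a transfer rate in bits per second; the figures in the example are illustrative only.

```python
def access_time_ms(seek_ms, rotational_delay_ms, data_bits, transfer_rate_bps):
    """Access time = seek time + rotational delay + transfer time (all in ms)."""
    transfer_ms = 1000.0 * data_bits / transfer_rate_bps
    return seek_ms + rotational_delay_ms + transfer_ms

# Illustrative figures: 9 ms seek, 4 ms rotational delay,
# one 512-byte sector (4096 bits) at 160 Mbit per second.
print(access_time_ms(9.0, 4.0, 4096, 160_000_000))
```

With figures like these the mechanical delays (seek and rotation) dominate, and the transfer of a single sector adds only a small fraction of a millisecond.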
Concept of buffer
Functions of Buffer
A buffer synchronizes the speed of data transfer among the systems/devices that
are under the control of the CPU. It allows individual devices (that is, input and
output devices) to perform their functions independently. This is because the
rate at which the CPU sends or receives its data is very high compared to
the rate at which the I/O devices receive or send data. A buffer is equally used to
accommodate the differences in the rate at which two devices can handle data
during data communication/transfer.
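The role of a buffer between a fast sender and a slow receiver can be sketched with a bounded queue shared by two threads. The buffer size, the 1 ms delay standing in for a slow output device, and the function names are assumptions.

```python
import queue
import threading
import time

def run_transfer(items, buffer_size=4):
    """Pass items from a fast producer to a slow consumer through a buffer."""
    buffer = queue.Queue(maxsize=buffer_size)
    received = []

    def fast_producer():
        for item in items:
            buffer.put(item)   # waits only while the buffer is full
        buffer.put(None)       # sentinel: no more data

    def slow_consumer():
        while True:
            item = buffer.get()
            if item is None:
                break
            time.sleep(0.001)  # stands in for a slow output device
            received.append(item)

    threads = [threading.Thread(target=fast_producer),
               threading.Thread(target=slow_consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received

print(run_transfer(list(range(10))))
```

The producer can run ahead by up to `buffer_size` items instead of waiting for every single item to be consumed, and the data still arrives in the order it was sent.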
File Processing
A file is a body of stored data or information in an electronic format. Almost all information stored
on computers is in the form of files. Files reside on mass storage devices such as hard drives,
CD-ROMs, magnetic tape, and floppy disks. When the central processing unit (CPU) of a
computer needs data from a file, or needs to write data to a file, it temporarily stores the file in its
main memory, or Random Access Memory (RAM), while it works on the data.
Information in computers is usually classified into two different types of files: data files (those
containing data) and program files (those that manipulate data). Within each of these categories,
many different types of files exist that store various kinds of information.
Different computer operating systems have unique rules for the naming of files. Windows 95
(Win95) and disk operating systems (DOS), for instance, make use of an extension attached to
the end of each filename in order to indicate the type of file. Extensions begin with a period (.),
and then have one or more letters. An example of a file extension used in Win95 and DOS is
.bak, which indicates that the file is a backup file.
When saving a file, a user can give it any name within the rules of the operating system. In
addition, the name must be unique. Two files in the same directory may not have the same name,
but some operating systems allow the same name for one file to be placed in more than one
location. These additional names are called aliases.
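The naming rules can be illustrated with a short sketch that extracts a file's extension and checks that a new name is unique within a directory. The function names and the file names are hypothetical, and real operating systems apply further rules (for example about case and permitted characters) that are not modelled here.

```python
import os.path

def extension_of(filename):
    """Return the extension, including its leading period, in lower case."""
    return os.path.splitext(filename)[1].lower()

def can_save(names_in_directory, new_name):
    """A name may be used only if no file in the same directory has it."""
    return new_name not in names_in_directory

print(extension_of("REPORT.BAK"))
print(can_save({"report.bak", "notes.txt"}, "notes.txt"))
```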
Directory files contain information used to organize other files into a hierarchical structure. In
the Macintosh operating system, directory files are called folders. The topmost directory in any
file system is the root directory. A directory contained within another directory is called a
subdirectory. Directories containing one or more subdirectories are called parent directories.
Executable files contain programs or commands that are executable by the computer.
Executable files have a .exe suffix at the end of their names and are often called EXE
(pronounced EX-ee) files. Text files contain characters represented by their ASCII (American
Standard Code for Information Interchange) codes. These files are often called ASCII
(pronounced ASK-ee) files. Files that contain words, sentences, and bodies of paragraphs are
frequently referred to as text files. The diagram below shows the root directory, subdirectory and
file.
File processing operations are the various activities which are performed on
a file. These operations are briefly described below:
File creation: The process of bringing a file into existence is called file creation.
Searching: Searching is locating data in a file by reference to a special field of
each record called the key. The key is a unique field used to identify a particular
record in a file. If a record is to be inserted into a file, it must be given a unique
key value.
Retrieving/reading: This involves reading existing data from a form of storage
or input medium.
Writing: Writing is the act of recording data onto some form of storage.
Deleting: This means removing a record or item of data from a storage medium
such as disk/tape.
File updating: This is the act of changing values in one or more records of a file
without changing the organization of the file, that is, bringing the file up to date
with the most recent data.
File merging: Combining multiple sets of data files or records to produce only one
set, usually in an ordered sequence, is referred to as file merging.
Reporting: Reporting is a file processing operation that deals with the production
(printing) of a report from the file in a specified format.
File display: The contents of a data file can be displayed either on the computer
screen as soft copy or printed on paper as hard copy.
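Several of these operations can be sketched on a small in-memory "file", represented here as a list of (key, data) records. The representation, the function names and the sample records are assumptions made for the illustration.

```python
from heapq import merge

def search(file, key):
    """Searching: locate a record's data by its unique key."""
    for record_key, data in file:
        if record_key == key:
            return data
    return None

def write_record(file, key, data):
    """Writing: add a record, insisting on a unique key value."""
    if search(file, key) is not None:
        raise ValueError(f"key {key!r} already exists")
    file.append((key, data))

def update(file, key, data):
    """Updating: change a record's values without reorganizing the file."""
    for i, (record_key, _) in enumerate(file):
        if record_key == key:
            file[i] = (key, data)
            return True
    return False

def delete(file, key):
    """Deleting: remove the record with the given key, if present."""
    before = len(file)
    file[:] = [record for record in file if record[0] != key]
    return len(file) < before

def merge_files(first, second):
    """Merging: combine two files into one set ordered by key."""
    by_key = lambda record: record[0]
    return list(merge(sorted(first, key=by_key), sorted(second, key=by_key), key=by_key))

employees = []                      # file creation
write_record(employees, 102, "Grace")
write_record(employees, 101, "Ada")
update(employees, 102, "Grace H.")
print(search(employees, 102))
print(merge_files(employees, [(100, "Alan")]))
```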
Table
A table is a collection of records. Each record stores information associated with a
key by which specific records are found or the records may be arranged in an
array so that the index is the key. In commercial applications the word table is
often used as a synonym for matrix or array.
Array
An array is an ordered collection of a number of elements of the same type, the
number being fixed unless the array is flexible. The elements of one array may
be of type integer, those of another array may be of type real, while the elements
of a third array may be of type character string. Each element has a unique list of
index values that determine its position in the ordered collection. Each index is of
a discrete type and the number of dimensions in the ordering is fixed.
List
A list is a finite ordered sequence of items (x1, x2, …, xn), where n ≥ 0. If n = 0,
the list has no elements and is called the null list (or empty list). If n > 0, the list
has at least one element, x1, which is called the head of the list. The list
consisting of the remaining items is called the tail of the original list. The tail of
the null list is the null list, as is the tail of a list containing only one element.
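The head and tail of a list can be sketched with Python lists, where slicing conveniently gives an empty tail for both the null list and a one-element list. The function names are illustrative.

```python
def head(items):
    """Return the first element x1; the null list has no head."""
    if not items:
        raise ValueError("the null list has no head")
    return items[0]

def tail(items):
    """Return the list of the remaining items after the head."""
    return items[1:]

print(head([10, 20, 30]))
print(tail([10, 20, 30]))
print(tail([10]))
print(tail([]))
```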
Stack
A stack is a linear list that can be accessed for either input or output at just one of its
two ends. In stack operations, all accesses involving insertions and removals are made
at one end of the list, called top. This implies access on a last in first out (LIFO) basis
where the most recently inserted item on the list is the first to be removed.
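Last in, first out behaviour can be sketched with a small class built on a Python list, where insertion and removal both happen at the end of the list, which plays the role of the top. The class and method names are illustrative.

```python
class Stack:
    """A linear list accessed at one end only (the top)."""

    def __init__(self):
        self._items = []

    def push(self, item):
        """Insert an item at the top of the stack."""
        self._items.append(item)

    def pop(self):
        """Remove and return the most recently inserted item."""
        return self._items.pop()

    def is_empty(self):
        return not self._items

stack = Stack()
for value in (1, 2, 3):
    stack.push(value)
print(stack.pop())  # the last item pushed comes out first
print(stack.pop())
```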
The operations push and pop refer respectively to the insertion and removal of items at
the top of the stack. Stacks occur frequently in computing and in particular are closely
QUEUE
Queueing theory is concerned with the most useful handling of various waiting line
situations, from airplanes waiting to land to computer programs
filed for processing. The field arose from telephone networking studies
in the early 20th century, and deals with such factors as the pattern of
arrivals in a queue, the varying needs of each arrival, and the resulting