Lesson 5: File Management: IT 311: Applied Operating System
File Structure
Four terms are in common use when discussing files:
A field is the basic element of data. An individual field contains a single value, such as
an employee’s last name, a date, or the value of a sensor reading. It is characterized
by its length and data type (e.g., ASCII string, decimal).
A record is a collection of related fields that can be treated as a unit by some
application program.
A file is a collection of similar records. The file is treated as a single entity by users
and applications and may be referenced by name. Files have file names and may be
created and deleted.
Some file systems are structured only in terms of fields, not records. In that case, a file
is a collection of fields.
A database is a collection of related data. The essential aspects of a database are
that the relationships that exist among elements of data are explicit, and that the
database is designed for use by a number of different applications. A database may
contain all of the information related to an organization or a project, such as a
business or a scientific study. The database itself consists of one or more types of
files.
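The relationship between fields, records, and files can be sketched in a few lines of Python. This is only an illustration under assumed details: the record layout (a 20-byte ASCII name, a 4-byte integer ID, an 8-byte decimal reading) and the helper names pack_record and unpack_record are hypothetical and do not come from any particular file system.

```python
import struct

# Hypothetical layout: each item is one field with a fixed length and type.
# "<" = no padding, "20s" = 20-byte ASCII string, "i" = 4-byte int, "d" = 8-byte float.
RECORD_FORMAT = "<20sid"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)


def pack_record(last_name, emp_id, reading):
    """Pack three fields into one fixed-length record."""
    return struct.pack(RECORD_FORMAT, last_name.encode("ascii"), emp_id, reading)


def unpack_record(raw):
    """Split one fixed-length record back into its fields."""
    name, emp_id, reading = struct.unpack(RECORD_FORMAT, raw)
    return name.rstrip(b"\x00").decode("ascii"), emp_id, reading


# A "file" here is simply similar records stored one after another.
file_bytes = pack_record("Santos", 101, 36.5) + pack_record("Reyes", 102, 37.25)
records = [
    unpack_record(file_bytes[i:i + RECORD_SIZE])
    for i in range(0, len(file_bytes), RECORD_SIZE)
]
```

Because every record has the same length, the position of any record can be computed from its number, which is one reason fixed-length fields are convenient.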
The Pile
The least complicated form of file organization may be
termed the pile. Data are collected in the order in which they
arrive. Each record consists of one burst of data. The purpose
of the pile is simply to accumulate the mass of data and save
it. Records may have different fields, or similar fields in
different orders.
Pile files are encountered when data are collected and stored
prior to processing or when data are not easy to organize. This
type of file uses space well when the stored data vary in size
and structure, is perfectly adequate for exhaustive searches,
and is easy to update. However, beyond these limited uses,
this type of file is unsuitable for most applications.
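A pile can be imitated with a plain list to which records are appended in arrival order. The sketch below is an assumption-laden illustration, not a real file format: the names pile, append_record, and exhaustive_search are made up, and each record is a dictionary so that it can carry its own field names.

```python
# Each record is a self-describing burst of data: field name -> value.
pile = []


def append_record(record):
    """Add a record at the end; arrival order is the only organization."""
    pile.append(dict(record))


def exhaustive_search(field, value):
    """Scan every record, because a pile has no key order or index."""
    return [r for r in pile if r.get(field) == value]


# Records may have different fields, or similar fields in different orders.
append_record({"name": "Santos", "dept": "IT"})
append_record({"dept": "HR", "name": "Reyes", "phone": "555-0100"})
append_record({"sensor": "T1", "reading": 36.5})

matches = exhaustive_search("dept", "HR")
```

Adding a record is cheap, but every search has to examine the whole pile, which is why this organization suits collection and exhaustive searches rather than frequent lookups.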
Access Methods
Files store information. When it is used, this information
must be accessed and read into computer memory. The
information in the file can be accessed in several ways.
Some systems provide only one access method for files.
Others (such as mainframe operating systems) support many
access methods, and choosing the right one for a particular
application is a major design problem.
Contents
Associated with any file management system and collection
of files is a file directory. The directory contains information
about the files, including attributes, location, and ownership.
Much of this information, especially that concerned with
storage, is managed by the operating system. The directory
is itself a file, accessible by various file management
routines. Although some of the information in directories is
available to users and applications, this is generally provided
indirectly by system routines.
Contents cont…
The table on the next slide suggests the information typically stored in the
directory for each file in the system. From the user’s point of view,
the directory provides a mapping between file names, known to
users and applications, and the files themselves. Thus, each file
entry includes the name of the file. Virtually all systems deal with
different types of files and different file organizations, and this
information is also provided. An important category of information
about each file concerns its storage, including its location and size.
In shared systems, it is also important to provide information that is
used to control access to the file. Typically, one user is the owner
of the file and may grant certain access privileges to other users.
Finally, usage information is needed to manage the current use of
the file and to record the history of its usage.
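On many systems, part of this per-file information can be inspected through a system routine. The sketch below uses Python's os.stat on a temporary file; the describe helper and the dictionary keys are hypothetical names chosen for this example, and the exact values reported (for instance the owner ID) depend on the operating system.

```python
import os
import stat
import tempfile


def describe(path):
    """Collect a few directory-style attributes for one file."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),                 # basic information
        "size_bytes": info.st_size,                     # storage information
        "is_regular_file": stat.S_ISREG(info.st_mode),  # file type
        "permissions": stat.filemode(info.st_mode),     # access control
        "owner_id": info.st_uid,                        # ownership
        "last_modified": info.st_mtime,                 # usage information
    }


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "notes.txt")
    with open(path, "w") as handle:
        handle.write("hello")
    entry = describe(path)
```

The program never reads the directory structure itself; it asks the operating system, which matches the point that this information is provided indirectly by system routines.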
Structure
The way in which the information in the table on the previous slide is
stored differs widely among various systems. Some of the
information may be stored in a header record associated with
the file; this reduces the amount of storage required for the
directory, making it easier to keep all or much of the directory
in main memory to improve speed.
The simplest form of structure for a directory is that of a list of
entries, one for each file. This structure could be represented
by a simple sequential file, with the name of the file serving as
the key. In some earlier single-user systems, this technique
has been used. However, it is inadequate when multiple users
share a system and even for single users with many files.
Structure cont…
To understand the requirements for a file structure, it is helpful to
consider the types of operations that may be performed on the directory:
Search: When a user or application references a file, the directory must be
searched to find the entry corresponding to that file.
Create file: When a new file is created, an entry must be added to the
directory.
Delete file: When a file is deleted, an entry must be removed from the
directory.
List directory: All or a portion of the directory may be requested.
Generally, this request is made by a user and results in a listing of all files
owned by that user, plus some of the attributes of each file (e.g., type,
access control information, usage information).
Update directory: Because some file attributes are stored in the directory,
a change in one of these attributes requires a change in the corresponding
directory entry.
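The five operations above can be sketched with a toy single-level directory. This is a minimal illustration only: the class FlatDirectory, its method names, and the attributes it stores are assumptions made for the example, not the interface of any real operating system.

```python
class FlatDirectory:
    """A hypothetical single-level directory with one entry per file."""

    def __init__(self):
        self.entries = {}  # file name -> dictionary of attributes

    def search(self, name):
        """Search: find the entry for a file, or None if there is none."""
        return self.entries.get(name)

    def create_file(self, name, owner, file_type):
        """Create file: add a new entry; names must be unique."""
        if name in self.entries:
            raise FileExistsError(name)
        self.entries[name] = {"owner": owner, "type": file_type, "size": 0}

    def delete_file(self, name):
        """Delete file: remove the entry."""
        del self.entries[name]

    def list_directory(self, owner):
        """List directory: names of all files owned by one user."""
        return sorted(n for n, a in self.entries.items() if a["owner"] == owner)

    def update(self, name, **changes):
        """Update directory: change attributes stored in the entry."""
        self.entries[name].update(changes)


directory = FlatDirectory()
directory.create_file("report", owner="B", file_type="text")
directory.create_file("chart", owner="B", file_type="image")
directory.create_file("notes", owner="A", file_type="text")
directory.update("report", size=2048)
directory.delete_file("chart")
```

Note that create_file must reject a second file called "report" even if a different user asks for it, which hints at why a flat list becomes awkward on a shared system.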
Structure cont…
A more powerful and flexible
approach, and one that is almost
universally adopted, is the
hierarchical, or tree-structure,
approach (see Figure). As before,
there is a master directory, which
has under it a number of user
directories. Each of these user
directories, in turn, may have
subdirectories and files as entries.
This is true at any level: That is, at
any level, a directory may consist of
entries for subdirectories and/or
entries for files.
Naming
Users need to be able to refer to a file by a symbolic name.
Clearly, each file in the system must have a unique name in
order that file references be unambiguous. On the other
hand, it is an unacceptable burden on users to require they
provide unique names, especially in a shared system.
Naming cont…
The use of a tree-structured directory minimizes the difficulty in
assigning unique names. Any file in the system can be located by
following a path from the root or master directory down various branches
until the file is reached. The series of directory names, culminating in the
file name itself, constitutes a pathname for the file.
As an example, the file in the lower left-hand corner of the figure on the next slide
has the pathname /User_B/Word/Unit_A/ABC. The slash is used to delimit
names in the sequence. The name of the master directory is implicit,
because all paths start at that directory. Note it is perfectly acceptable to
have several files with the same file name, as long as they have unique
pathnames, which is equivalent to saying that the same file name may be
used in different directories. In our example, there is another file in the
system with the file name ABC, but that file has the pathname
/User_B/Draw/ABC.
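Following a pathname through a tree-structured directory can be sketched with nested dictionaries. The tree below mirrors the names in the example; the variable master, the function resolve, and the placeholder file contents are assumptions made for this illustration.

```python
# Directories are dictionaries; files are strings standing in for contents.
master = {
    "User_B": {
        "Draw": {"ABC": "drawing data"},
        "Word": {"Unit_A": {"ABC": "document data"}},
    },
}


def resolve(pathname, root=master):
    """Follow a pathname from the master directory down to its target."""
    node = root
    for name in pathname.strip("/").split("/"):
        # Each name must be an entry in the directory reached so far.
        if not isinstance(node, dict) or name not in node:
            raise FileNotFoundError(pathname)
        node = node[name]
    return node


word_file = resolve("/User_B/Word/Unit_A/ABC")
draw_file = resolve("/User_B/Draw/ABC")
```

Both files are named ABC, yet the two lookups return different contents because the pathnames differ.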
Naming cont…
Description
The master directory includes entries for
the system and users A, B, and C.
Users A, B, and C each have their own
directories. The directory for user B has
entries for Draw and Word. Draw and
Word, in turn, have their own directories.
The directory for Word contains an entry
for Unit_A, and the directory for Unit_A
contains file ABC, with pathname
/User_B/Word/Unit_A/ABC. The
directory for Draw contains an entry ABC,
which yields file ABC, with pathname
/User_B/Draw/ABC.
Naming cont…
Although the pathname facilitates the selection of file names, it would be
awkward for a user to have to spell out the entire pathname every time a
reference is made to a file. Typically, an interactive user or a process has
associated with it a current directory, often referred to as the working
directory.
Files are then referenced relative to the working directory. For example,
if the working directory for user B is “Word,” then the pathname
Unit_A/ABC is sufficient to identify the file in the lower left-hand corner of
the figure on the previous slide. When an interactive user logs on, or when a
process is created, the default for the working directory is the user home
directory. During execution, the user can navigate up or down in the tree
to change to a different working directory.
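Relative references can be sketched with Python's pathlib, which combines a working directory with a relative pathname without touching any real disk. The helper name absolute_pathname is an assumption for this example; the directory names are the ones used above.

```python
from pathlib import PurePosixPath


def absolute_pathname(working_directory, pathname):
    """Return the full pathname for a name given relative to the working directory."""
    return str(PurePosixPath(working_directory) / pathname)


# User B is working in "Word", so the short name is enough.
full = absolute_pathname("/User_B/Word", "Unit_A/ABC")

# Navigating up one level in the tree gives the parent directory.
moved_up = str(PurePosixPath("/User_B/Word").parent)
```

Here full is "/User_B/Word/Unit_A/ABC" and moved_up is "/User_B".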
Lossy Compression
Lossy compression can be applied to files where losing
data may not be detected, or if it is, the file is still usable.
This generally means that images, video, and sound use
lossy compression techniques (but not always!).
When you are watching a streamed video, you will often be
given the choice of watching in HD or SD. Whilst we know
the Standard Definition video is of inferior quality, it does
have the benefit of being a much smaller file so it will work
on slower connections. Even though the file is of a lower
quality (due to lossy compression), it remains usable.
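The idea of discarding detail that may not be noticed can be sketched with simple quantization. This is a toy illustration only, not how real image, video, or audio formats work: the function names and the choice to keep 4 of every 8 bits are assumptions.

```python
def compress_lossy(samples, keep_bits=4):
    """Keep only the top `keep_bits` bits of each 8-bit sample."""
    shift = 8 - keep_bits
    return [s >> shift for s in samples]


def decompress(quantized, keep_bits=4):
    """Rebuild approximate 8-bit samples; the discarded bits are gone for good."""
    shift = 8 - keep_bits
    return [q << shift for q in quantized]


original = [12, 130, 131, 200, 255]
restored = decompress(compress_lossy(original))
```

Each sample now needs half as many bits, and restored is [0, 128, 128, 192, 240]: close to the original values but not identical, and the original cannot be recovered from it.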
Lossless Compression
Lossless compression is employed when we need to
reduce the size of the file without losing any data. This may
be because removing data would make the file unusable as
is the case for text files or program code, or to allow us to
uncompress the file back to its original state for editing.
Zipped files make use of lossless compression, allowing the
files and folders to be uncompressed after sending without
losing data. Compressing data can be achieved using a
variety of algorithms, one being Huffman encoding, which
gives the most frequent characters the shortest bit patterns
so that fewer bits are needed overall.
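A small Huffman coder shows how a lossless method can shrink data and still restore it exactly. This is a teaching sketch under stated assumptions: the function names build_codes, encode, and decode are made up, the bits are kept as a text string of "0" and "1" for readability, and the code table would also have to be stored alongside real compressed data.

```python
import heapq
from collections import Counter


def build_codes(text):
    """Return a dict mapping each character to its Huffman bit string."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap items: (frequency, unique tie-breaker, {char: code so far}).
    heap = [(freq, i, {ch: ""}) for i, (ch, freq) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent groups, prefixing their codes.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text, codes):
    """Replace every character with its bit string."""
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    """Read bits until they match a code, then emit that character."""
    lookup = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    return "".join(out)


message = "aaaaaaaabbbc"
codes = build_codes(message)
bits = encode(message, codes)
```

For this message the frequent character "a" receives a 1-bit code, so the 12 characters need 16 bits instead of 96 at 8 bits per character, and decode(bits, codes) returns the original message unchanged.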