
Chapter 5 File Management

This document discusses file management and file systems. It describes the need for long-term storage of information that persists beyond individual processes. Files provide this long-term storage on disks or other media in a way that is accessible to multiple processes concurrently. The document outlines key aspects of file systems including file naming, structure, attributes, operations and organization.

Uploaded by

Brian Mutuku
Copyright
© All Rights Reserved

Chapter 5

File Management
Introduction
• Computer applications store and retrieve information.
• When a process is running, it can store information within its
address space.
• A problem with keeping information within the address space is
that when a process terminates, that information is lost.
• For some applications like database systems, we need a
permanent form of storage.
• Furthermore, it is necessary to have multiple processes
accessing the same information or parts of it at the same time.
• To solve this problem, the information must be made
independent of any one process.
Long Term Storage

• Requirements for long term storage:


– It must be possible to store a very large amount of information
– The information must survive the termination of the process using it
– Multiple processes must be able to access the information
concurrently 

• The solution is to store the information on disks (or other
media) in units called files.
• Processes can then read them and write new ones if need
be.
• Information stored in files must be persistent (not affected
by process creation and termination).
Long Term Storage (cont)
• A file should only disappear when its owner explicitly
deletes it.
• Files are managed by the operating system.
• Part of the OS that deals with file naming, structure,
implementation, access, storage and protection is referred
to as the file system.
• To the users the most important aspect is the interface of
the file system:
– what constitutes a file,
– how a file is named and protected and
– operations allowed on files.
File Structure
Structure Terms
• Field
– basic element of data
– contains a single value
– fixed or variable length
• Record
– collection of related fields that can be treated as a unit by some application program
– one field is the key, a unique identifier
• File
– collection of similar records
– treated as a single entity
– may be referenced by name
– access control restrictions usually apply at the file level
• Database
– collection of related data
– relationships among elements of data are explicit
– designed for use by a number of different applications
– consists of one or more types of files
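The terms above can be sketched in a few lines of Python. This is only an illustration: the names `Record`, `RecordFile`, `add` and `find` are hypothetical, and the choice of an account number as the key field is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A record: a collection of related fields treated as a unit."""
    account_no: int   # key field: uniquely identifies the record
    owner: str        # variable-length field
    balance: float    # field holding a single value


@dataclass
class RecordFile:
    """A file: a named collection of similar records."""
    name: str
    records: list = field(default_factory=list)

    def add(self, record: Record) -> None:
        # The key must stay unique within the file.
        if any(r.account_no == record.account_no for r in self.records):
            raise ValueError("duplicate key")
        self.records.append(record)

    def find(self, account_no: int):
        # Look a record up by its key field.
        for r in self.records:
            if r.account_no == account_no:
                return r
        return None
```

A database, in these terms, would hold several such files together with explicit relationships among them.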
File Naming

• Files are an abstraction mechanism.


• They provide a way to store information on disk and read
it back later.
• This is done in a way that shields the user from
the details of how and where the information is stored
and how the disk actually works.
• When a process creates a file, it gives it a name.
• When the process terminates, the file continues to
exist and can be accessed by other processes using
its name.
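As a small illustration of a file outliving the process that created it, the sketch below starts a separate Python process that creates a file and exits, then reads the file back from the calling process using only its name. The helper names and the file name `notes.txt` are hypothetical.

```python
import os
import subprocess
import sys
import tempfile

# Program text run by the child process: create the named file and exit.
CHILD_CODE = """
import sys
with open(sys.argv[1], "w") as f:
    f.write(sys.argv[2])
"""


def create_in_child(path, text):
    """Run a separate process that creates the file and then terminates."""
    subprocess.run([sys.executable, "-c", CHILD_CODE, path, text], check=True)


def read_by_name(path):
    """Another process (this one) accesses the file through its name."""
    with open(path) as f:
        return f.read()


def demo_persistence():
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "notes.txt")
        create_in_child(path, "still here")
        # The child has terminated by now; the file continues to exist.
        return read_by_name(path)
```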
File Naming (cont)
• The rules for naming files vary from system to system,
– but all operating systems allow strings of one to eight letters as
legal file names.
• Many file systems support names as long as 255 characters.
• Some are case-sensitive (UNIX), while others (MS-DOS) are not.
• Many operating systems support two-part file names, with the two parts
separated by a period.
• The part following the period is called the file extension and usually
indicates something about the file.
– For example: html for Hypertext Markup Language (WWW) documents,
pas for Pascal files, doc for MS Word documents, and txt for text files.
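The two-part naming convention can be illustrated with Python's standard `os.path.splitext`. The helpers `split_name` and `same_name` are hypothetical names used only for this sketch; `same_name` models the case-sensitivity difference in a simplified way.

```python
import os.path


def split_name(filename):
    """Split a two-part file name into (base name, extension)."""
    base, ext = os.path.splitext(filename)
    return base, ext.lstrip(".")


def same_name(a, b, case_sensitive=True):
    """Compare two names the way a case-sensitive or case-insensitive
    file system would (a simplified model)."""
    if case_sensitive:
        return a == b
    return a.lower() == b.lower()
```

For instance, `split_name("index.html")` yields the base `index` and the extension `html`, while a name without a period has an empty extension.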
File Attributes

• The OS associates
other information with
each file, for example,
the date and time of
creation and file size.
• This extra information
is referred to as file
attributes.
• The list of attributes
varies from one
system to another.
File Attributes (cont)
• The first four relate to file protection and tell who may access it
and who may not.
• In some systems, the user must supply a password to access a
file, in which case the password must be one of the attributes.
• Flags are bits or short fields that control some specific
property.
• Hidden files, for example, do not appear in the listing of all files.
• The archive flag keeps track of whether a file has been backed
up.
• The temporary flag allows a file to be marked for automatic deletion
when the process that created it terminates.
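A minimal sketch of reading a few attributes with Python's `os.stat` follows. The function names are hypothetical, and treating a leading dot as the "hidden" flag is an assumption that follows the UNIX convention; other systems store the flag differently.

```python
import os
import tempfile
import time


def get_attributes(path):
    """Return a few of the attributes the OS keeps for a file."""
    st = os.stat(path)
    return {
        "size_bytes": st.st_size,
        "last_modified": time.ctime(st.st_mtime),
        "protection_bits": oct(st.st_mode & 0o777),
        # Assumption: UNIX-style hidden files start with a dot.
        "hidden": os.path.basename(path).startswith("."),
    }


def demo_attributes():
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, ".notes")
        with open(path, "wb") as f:
            f.write(b"hello")
        return get_attributes(path)
```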
File Operations
• Files store information and allow it to be retrieved later on.
• Different systems provide different operations to allow storage and
retrieval.
• CREATE – Creates a file with no data. The purpose is to announce
that a new file is being created and to set some of its attributes.
• DELETE – Deletes a file that is no longer needed, to free disk
space.
• OPEN – Opens a file before a process uses it. The purpose is to
allow the operating system to fetch the attributes and the list of
disk addresses into memory for rapid access later on.
• CLOSE – When a process is through with a file, it should close it to
free up internal table space. A disk is written in blocks, and closing a file
forces the file’s last block to be written even though the block may not be full.
File Operations (cont)
• READ – Data are read from a file. The caller must specify how much data is needed and provide a
buffer to put them in.
• WRITE – Data are usually written to the file at the current position. If the current position is at
the end of the file, the file grows. If it is in the middle, existing data are overwritten and
lost forever.
• APPEND – Adds data at the end of a file. Some systems do not provide it,
since it can be derived from other system calls.
• SEEK – For random access files, a method is needed to specify where to read from. The
SEEK system call repositions the file pointer to a specific position in the
file. After this, data can be read from, or written to, that position.
• GET ATTRIBUTES – Processes often need to read file attributes to do their work. This
system call returns the requested attributes.
• SET ATTRIBUTES – Some file attributes are user-settable and can be changed after
the file has been created. This system call makes that possible.
• RENAME – Users often want to change the name of an existing file. This system call does
that, but it is not strictly necessary, since a file can usually be copied to a new file with the
new name and the old one then deleted.
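The operations listed above map fairly directly onto the low-level calls in Python's `os` module. The sketch below is one possible walk-through, not a definitive mapping; the function name `demo_operations` and the file names are hypothetical.

```python
import os
import tempfile


def demo_operations():
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "data.bin")
        # CREATE + OPEN: make an empty file and get a descriptor for it.
        fd = os.open(path, os.O_CREAT | os.O_RDWR | os.O_TRUNC, 0o600)
        try:
            os.write(fd, b"hello world")      # WRITE at the current position
            os.lseek(fd, 6, os.SEEK_SET)      # SEEK to byte offset 6
            os.write(fd, b"W")                # overwrites the old byte
            os.lseek(fd, 0, os.SEEK_END)      # APPEND derived from SEEK + WRITE
            os.write(fd, b"!")
            os.lseek(fd, 0, os.SEEK_SET)
            data = os.read(fd, 100)           # READ up to 100 bytes
            size = os.fstat(fd).st_size       # GET ATTRIBUTES
        finally:
            os.close(fd)                      # CLOSE
        renamed = os.path.join(directory, "final.bin")
        os.rename(path, renamed)              # RENAME
        os.remove(renamed)                    # DELETE
        return data, size
```

The append step shows why some systems omit APPEND: seeking to the end and writing has the same effect.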
File Structures
• Files can be structured in several ways. The three most common
possibilities are:
– Unstructured sequence of bytes
– Fixed length records
– Tree of variable records

• (a) Unstructured sequence of bytes – The operating system does not
care what is in the file.
– All it sees are bytes.
– Any meaning must be imposed by the user
programs.
• Both UNIX and MS-DOS use this approach.
• This approach gives maximum flexibility.
• User programs can put whatever they want in the files.


File Structures (cont)
• (b) Fixed-length records – A file is a sequence of fixed-length
records, each with some internal structure.
– A read operation reads one record, and a write operation
appends or overwrites one record.
– An example of an operating system that viewed files as a sequence of
fixed-length records was CP/M.
– It used a 128-character record.
• (c) Tree of variable records – A file consists of a tree of records,
not necessarily all of the same length, each containing a key field in
a fixed position in the record.
– The tree is sorted on the key field, to allow rapid searching for a
particular key.
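A fixed-length record layout can be sketched with Python's `struct` module. The 32-byte layout below (a 4-byte key followed by a 28-byte name) is an assumption chosen for the example, not CP/M's actual format, and the function names are hypothetical.

```python
import io
import struct

# Assumed layout: 4-byte unsigned key + 28-byte name = 32 bytes per record.
RECORD = struct.Struct("<I28s")


def write_record(f, index, key, name):
    """Overwrite (or append) record number `index`."""
    f.seek(index * RECORD.size)
    f.write(RECORD.pack(key, name.encode("ascii")))


def read_record(f, index):
    """Read exactly one record; its position follows from its number."""
    f.seek(index * RECORD.size)
    key, raw = RECORD.unpack(f.read(RECORD.size))
    return key, raw.rstrip(b"\0").decode("ascii")


def demo_fixed_records():
    f = io.BytesIO()   # stands in for a file on disk
    write_record(f, 0, 10, "alpha")
    write_record(f, 1, 20, "beta")
    write_record(f, 2, 30, "gamma")
    return read_record(f, 1), len(f.getvalue())
```

Because every record has the same size, record n always starts at byte n times the record size, so no search is needed to locate it.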
File Types
• Many operating systems support several types of files.
– UNIX and DOS, for example, have regular files and directories.
– UNIX also has character and block special files.

• Regular files are those that contain user information.
• Directories are system files for maintaining the structure of the file system.
• Character special files are related to input/output and are used to model serial I/O
devices such as terminals, printers and networks.
• Block special files are used to model disks.
• Regular files are generally either ASCII files or binary files.
– ASCII files consist of lines of text that need not all be the same length; each line is
terminated by a carriage return or line feed character.
– ASCII files can be displayed and printed as is, and can be edited with an ordinary text
editor.
• Listing a binary file on the printer gives an incomprehensible listing that looks like random junk.
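On UNIX-like systems the file type is recorded in the mode word returned by `stat`. The sketch below classifies a path with the standard `stat` helpers; the function names are hypothetical.

```python
import os
import stat
import tempfile


def file_type(path):
    """Classify a path using the type bits of its mode word."""
    mode = os.stat(path).st_mode
    if stat.S_ISREG(mode):
        return "regular"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISCHR(mode):
        return "character special"
    if stat.S_ISBLK(mode):
        return "block special"
    return "other"


def demo_file_types():
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "plain.txt")
        with open(path, "w") as f:
            f.write("text\n")
        return file_type(directory), file_type(path)
```

On a typical UNIX system a terminal device would be reported as character special and a disk device as block special.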
File Organization

• How files are stored in storage media


• Criteria for File Organization
– Short access time
• Needed when accessing a single record
– Ease of update
• A file on CD-ROM will not be updated, so this is not a concern there
– Economy of storage
• Should be minimum redundancy in the data
• Redundancy can be used to speed access such as an index
– Simple maintenance
– Reliability
File Organization Methods

• Various ways of storing files exist


– Pile
– Sequential File
– Indexed Sequential File
– Indexed File
The Pile
• Data are collected in
the order they arrive
• Purpose is to
accumulate a mass of
data and save it
• Records may have
different fields
• No structure
• Record access is by
exhaustive search
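A pile can be pictured as a list of records in arrival order, each carrying its own field names. The sketch below uses Python dictionaries as records, which is an assumption made for illustration; `pile_search` is a hypothetical name.

```python
def pile_search(pile, field_name, value):
    """Exhaustive search: every record is examined, because nothing
    about the pile's order or layout can narrow the search."""
    matches = []
    examined = 0
    for record in pile:
        examined += 1
        # Records may have different fields, so the field can be absent.
        if record.get(field_name) == value:
            matches.append(record)
    return matches, examined


# Records are kept in the order they arrived, with varying fields.
PILE = [
    {"name": "Ann", "dept": "Sales"},
    {"name": "Bob", "phone": "555-0100"},
    {"name": "Cy", "dept": "Sales", "phone": "555-0101"},
]
```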
Sequential File

• Fixed format used for records


• Records are the same length
• All fields the same (order and
length)
• Field names and lengths are
attributes of the file
• One field is the key field
– Uniquely identifies the record
– Records are stored in key
sequence
Indexed Sequential File

• Index provides a lookup capability to quickly reach the
vicinity of the desired record
– Contains key field and a pointer to the main file
– Index is searched to find the highest key value that is equal to or
precedes the desired key value
– Search continues in the main file at the location indicated by the pointer

• Comparison of sequential and indexed sequential


– Example: a file contains 1 million records
– On average 500,000 accesses are required to find a record in a
sequential file
– If an index contains 1000 entries, it takes on average 500 accesses
to find the index entry, followed by on average 500 accesses in the main
file (each index entry covers 1000 records). The average drops to
1000 accesses
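The comparison above can be checked with a small sketch. It assumes the main file is a key-sorted list of (key, data) pairs and that both the index and the main file are scanned linearly, as in the example; all function names are hypothetical.

```python
def build_index(main_file, entries):
    """Pick evenly spaced (key, position) pairs from a key-sorted main file."""
    step = max(1, len(main_file) // entries)
    return [(main_file[pos][0], pos) for pos in range(0, len(main_file), step)]


def indexed_search(main_file, index, key):
    """Return (record, accesses); record is None if the key is absent."""
    accesses = 0
    start = 0
    # Scan the index for the highest key that is <= the desired key.
    for index_key, pointer in index:
        accesses += 1
        if index_key > key:
            break
        start = pointer
    # Continue sequentially in the main file from the indicated position.
    for pos in range(start, len(main_file)):
        accesses += 1
        record_key = main_file[pos][0]
        if record_key == key:
            return main_file[pos], accesses
        if record_key > key:
            break
    return None, accesses


def average_accesses(records, index_entries):
    """Average-case estimates: half the file, versus half the index
    plus half of the stretch of main file covered by one index entry."""
    sequential = records / 2
    indexed = index_entries / 2 + (records / index_entries) / 2
    return sequential, indexed
```

With 1,000,000 records and 1000 index entries, `average_accesses` reproduces the figures in the example: 500,000 accesses without the index and 1000 with it.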
Indexed Sequential File (cont)

• New records are added to an
overflow file
• Record in main file that
precedes it is updated to
contain a pointer to the new
record
• The overflow is merged with
the main file during a batch
update
• Multiple indexes for the
same key field can be set up
to increase efficiency
Indexed File
• Uses multiple indexes for
different key fields
• May contain an exhaustive
index that contains one entry
for every record in the main
file
• May contain a partial index
• The Direct or Hashed File
– Directly access a block at a
known address
– Key field required for each
record
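A direct (hashed) file computes a block address from the key instead of searching. The sketch below assumes integer keys and a simple modulo hash, with each block holding a short list of records; these are illustrative choices, and the class and function names are hypothetical.

```python
def block_address(key, num_blocks):
    """Map a key straight to a block number (assumed hash: modulo)."""
    return key % num_blocks


class HashedFile:
    def __init__(self, num_blocks=8):
        # Each block holds the records whose keys hash to its address.
        self.blocks = [[] for _ in range(num_blocks)]

    def insert(self, key, data):
        address = block_address(key, len(self.blocks))
        self.blocks[address].append((key, data))

    def lookup(self, key):
        # Go directly to the one block that can contain the key.
        address = block_address(key, len(self.blocks))
        for stored_key, data in self.blocks[address]:
            if stored_key == key:
                return data
        return None
```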
File Access
• Sequential access – A process reads all the bytes or
records in a file in order, starting at the beginning; it
cannot skip around and read them out of order.
• Sequential files can be rewound, however, so they can
be read as often as needed.
• Sequential access is convenient when the storage medium
is magnetic tape rather than disk.
• Random access – A process can read the bytes or records of a
file out of order, or access records by key rather than
by position.
File Access (cont)
• Two methods are used for specifying where to start
reading.
• In the first method, every read operation gives the
position in the file to start reading at.
• In the second, a special operation, SEEK, is provided
to set the current position.
– After a SEEK, the file can be read sequentially from the now-
current position.

• In most modern operating systems, there is no distinction
between random access and sequential access files.
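Both ways of specifying the read position exist in Python's `os` module. In the sketch below `os.pread` (available on UNIX-like systems) carries the position in the read call itself, while `os.lseek` followed by `os.read` sets the current position first. The helper names are hypothetical.

```python
import os
import tempfile


def read_at(fd, length, position):
    """Method 1: the read operation itself gives the position."""
    return os.pread(fd, length, position)


def seek_then_read(fd, length, position):
    """Method 2: SEEK sets the current position, then a plain read."""
    os.lseek(fd, position, os.SEEK_SET)
    return os.read(fd, length)


def demo_positions():
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"0123456789")
        first = read_at(fd, 3, 4)
        second = seek_then_read(fd, 3, 4)
        # After a seek, reading simply continues from the current position.
        following = os.read(fd, 2)
        return first, second, following
    finally:
        os.close(fd)
        os.remove(path)
```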
File Directory

• Contains information about files


– Attributes
– Location
– Ownership

• Directory itself is a file owned by the
operating system
Directory Elements

• Basic Information
– File name: must be unique
– File type: e.g., text, binary
– File organization

• Address Information
– Volume: device on which file is stored
– Starting address: e.g., cylinder, track on disk
– Size used: in bytes, words or blocks
– Size allocated: maximum size of the file
Directory Elements (cont)

• Access Control Information


– Owner: able to grant/deny access to other users and
to change these privileges
– Access information: e.g., user’s name and password
for each authorized user
– Permitted actions: controls reading, writing, executing,
transmitting over a network

• Usage Information
– Date Created, Identity of Creator, Date Last Read
Access, Identity of Last Reader, Date Last Modified
Directory Operations
• CREATE – Used for creating a directory.
• DELETE – Deletes a directory. Only an empty directory can be deleted.
• OPENDIR – Directories can be read. For example, to list all files in a directory, a
listing program opens the directory to read out the names of all the files it contains.
• CLOSEDIR – When a directory has been read, it should be closed to free up
internal table space.
• READDIR – Returns the next entry in an open directory. Formerly, it was possible
to read directories using the normal READ system call, but this has the
disadvantage of forcing the programmer to know and deal with the internal structure
of directories.
• RENAME – Renames a directory, just as for files.
• LINK – Linking is a technique that allows a file to appear in more than one
directory. This system call specifies an existing file and a path name, and creates a
link from the existing file to the name specified by the path.
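The directory operations above have close counterparts in Python's `os` module. The sketch below is one possible walk-through; the directory and file names are hypothetical, and `os.link` assumes a file system that supports hard links.

```python
import os
import tempfile


def demo_directory_operations():
    with tempfile.TemporaryDirectory() as root:
        docs = os.path.join(root, "docs")
        shared = os.path.join(root, "shared")
        os.mkdir(docs)                          # CREATE
        os.mkdir(shared)
        original = os.path.join(docs, "report.txt")
        with open(original, "w") as f:
            f.write("draft")
        linked = os.path.join(shared, "report.txt")
        os.link(original, linked)               # LINK: same file, second directory
        # OPENDIR / READDIR / CLOSEDIR: scandir opens the directory, yields
        # entries one by one, and closes it when the with-block ends.
        with os.scandir(docs) as entries:
            names = sorted(entry.name for entry in entries)
        link_count = os.stat(original).st_nlink
        os.remove(linked)
        os.remove(original)
        os.rmdir(shared)                        # DELETE: directory must be empty
        os.rename(docs, os.path.join(root, "papers"))   # RENAME
        return names, link_count
```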
Hierarchical, or
Tree-Structured Directory
• Master directory with
user directories
underneath it
• Each user directory may
have subdirectories and
files as entries
• Each directory and
subdirectory can be
organized as a
sequential file
Hierarchical, or
Tree-Structured Directory (cont)
• Easily enforce access restrictions on
directories.
• Easily organize collections of files.
• Minimize the difficulty in assigning
unique names.
Naming

• The tree structure allows users to find a file by
following a path from the root or master
directory down various branches until the file is
reached
• The series of directory names, culminating in
the file name itself, constitutes a pathname for
the file
• Duplicate filenames are possible if they have
different pathnames
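Python's `pathlib` can be used to illustrate pathnames. The sketch below assumes UNIX-style paths with `/` as the separator; the function names and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def components(pathname):
    """The series of directory names, ending in the file name itself."""
    return PurePosixPath(pathname).parts


def same_filename_different_file(a, b):
    """Two pathnames may end in the same file name and still be distinct."""
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa.name == pb.name and pa != pb
```

For example, `/users/ann/notes.txt` and `/users/bob/notes.txt` share the file name `notes.txt` but are different pathnames.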
File Sharing

• In a multiuser system, there is almost
always a requirement for allowing files to
be shared among a number of users

• Two issues
– Access rights
– Management of simultaneous access
Access Rights

• A wide variety of access rights have been
used by various systems
– often as a hierarchy, with each right implying
those that precede it.
• None
– The user may not even learn of the file’s existence;
this is enforced by not allowing the user to read
the directory that includes the file
• Knowledge
– User can only determine that the file exists
and who its owner is
Access Rights (cont)

• Execution
– The user can load and execute a program but
cannot copy it, e.g., proprietary programs
• Reading
– The user can read the file for any purpose,
including copying and execution
• Appending
– The user can add data to the file but cannot
modify or delete any of the file’s contents
Access Rights (cont)

• Updating
– The user can modify, delete, and add to the
file’s data.
• Changing protection
– User can change access rights granted to
other users
• Deletion
– User can delete the file
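When the rights form a hierarchy, with each right implying those that precede it, a check reduces to a comparison. The sketch below assumes a strict linear ordering in the order the rights are listed above, which is a simplification; real systems often treat some rights independently. The names `Right` and `permits` are hypothetical.

```python
from enum import IntEnum


class Right(IntEnum):
    # Assumed strict ordering: each right implies all lower-numbered ones.
    NONE = 0
    KNOWLEDGE = 1
    EXECUTION = 2
    READING = 3
    APPENDING = 4
    UPDATING = 5
    CHANGING_PROTECTION = 6
    DELETION = 7


def permits(granted, requested):
    """A granted right covers a request at the same or a lower level."""
    return granted >= requested
```

Under this ordering a user granted READING may also execute the program, but a user granted only APPENDING may not update existing contents.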
User Classes

• Access can be provided to different classes
of users
– Owner: usually the file’s creator, has full rights
and may grant rights to others
– Specific users: individual users who are
designated by user ID
– User groups: a set of users identified as a
group
– All: all users who have access to this system
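The user classes can be sketched as a small access-control structure. The policy below (the owner gets every right, everyone else gets the union of the rights granted to "all", to them individually and to their groups) is an assumption made for illustration, as are the names `FileACL` and `rights_for` and the right names.

```python
from dataclasses import dataclass, field

FULL_RIGHTS = {"read", "append", "update", "change_protection", "delete"}


@dataclass
class FileACL:
    owner: str
    specific_users: dict = field(default_factory=dict)  # user ID -> rights
    groups: dict = field(default_factory=dict)           # group -> rights
    all_users: set = field(default_factory=set)          # rights for everyone


def rights_for(acl, user, user_groups=()):
    """Collect the rights a user holds through every class they belong to."""
    if user == acl.owner:
        return set(FULL_RIGHTS)
    rights = set(acl.all_users)
    rights |= acl.specific_users.get(user, set())
    for group in user_groups:
        rights |= acl.groups.get(group, set())
    return rights
```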
Simultaneous Access

• When access to append to or update a file is granted
to more than one user, the OS or file
management system must enforce discipline
• User may lock the entire file or individual
records during update
• Mutual exclusion and deadlock are issues for
shared access (compare the readers/writers problem)
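Record-level locking can be sketched with threads standing in for users. The example below keeps one lock per record in an in-memory store, which is an assumption made for illustration; a real file system would lock records on disk. The names `RecordStore` and `run_updates` are hypothetical.

```python
import threading


class RecordStore:
    def __init__(self):
        self.records = {}
        self.locks = {}
        self.table_lock = threading.Lock()   # protects the lock table itself

    def _lock_for(self, key):
        with self.table_lock:
            return self.locks.setdefault(key, threading.Lock())

    def update(self, key, delta):
        # Mutual exclusion: only one updater at a time per record.
        with self._lock_for(key):
            current = self.records.get(key, 0)
            self.records[key] = current + delta


def run_updates(n_threads=4, per_thread=1000):
    store = RecordStore()

    def worker():
        for _ in range(per_thread):
            store.update("balance", 1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store.records["balance"]
```

Each update holds only one record lock at a time, which sidesteps deadlock in this sketch; an update that needed several records at once would have to acquire their locks in a consistent order.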
