
File Management

The document discusses computer file management. It describes how files can be created, edited, opened, closed, and organized into directories on a storage device. It explains that files are stored physically in a disorganized way due to fragmentation, but are logically organized through a file allocation table or master file table that tracks each file's location and position in the directory hierarchy. It provides an example directory structure and discusses the typical operations performed on files like create, open, edit, etc.

Uploaded by

Jun Mendoza
Copyright
© All Rights Reserved

File Management

The term computer file management refers to the manipulation of documents and data
in files on a computer. Specifically, one may create a new file or edit an existing file and
save it; open or load a pre-existing file into memory; or close a file without saving it.
Additionally, one may group related files in directories. These tasks are accomplished in
different ways in different operating systems and depend on the user interface design
and, to some extent, the storage medium being used.

Concept of the hierarchy of files

Files can also be managed based on their location on a storage device. They are stored
on a storage medium in binary form. Physically, the data is often not laid out in an
orderly way, because of fragmentation. Logically, however, files are grouped into
directories (in operating systems such as DOS, Unix and Linux) or folders (in the Mac OS
and Windows) by updating an index of file information: the File Allocation Table (FAT)
used by DOS and older versions of Windows, or the Master File Table (MFT) used by NTFS in
recent versions of Windows, depending on the file system in use. This index stores the
physical location of each file on the storage medium, as well as its position in the
hierarchy of directories (as we see it using commands such as DIR and ls, and programs
such as Explorer and Finder).
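To make the role of a File Allocation Table more concrete, here is a minimal Python sketch of how a FAT-style index links a file's clusters into a chain. It is a toy in-memory model, not the on-disk layout: the list `fat`, the function `cluster_chain` and the `END_OF_CHAIN` marker are hypothetical names, and real FAT variants mark the end of a chain with reserved high values rather than -1.

```python
# Toy File Allocation Table: entry i holds the number of the next cluster
# in the same file, or END_OF_CHAIN if cluster i is the file's last one.
END_OF_CHAIN = -1  # hypothetical marker; real FAT variants reserve high values instead

def cluster_chain(fat: list[int], first_cluster: int) -> list[int]:
    """Follow the chain from first_cluster and return every cluster in file order."""
    chain = []
    cluster = first_cluster
    while cluster != END_OF_CHAIN:
        if cluster in chain:  # guard against a corrupted table that loops
            raise ValueError(f"loop detected at cluster {cluster}")
        chain.append(cluster)
        cluster = fat[cluster]
    return chain

# The directory entry only records the first cluster (2); the table says the
# file continues in cluster 5, then cluster 6, where it ends.
fat = [0, 0, 5, 0, 0, 6, END_OF_CHAIN, 0]
print(cluster_chain(fat, 2))  # [2, 5, 6]
```

Because the chain can jump anywhere on the disk, the same structure that records a file's location also explains how a physically scattered file still reads as one logical unit.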

For DOS/Windows the hierarchy (along with examples):

Drive (C:)
Directory/Folder (C:\My Documents)
Sub-directory/Sub-folder (C:\My Documents\My Pictures)
File (C:\My Documents\My Pictures\VacationPhoto.jpg)
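The same hierarchy can be taken apart programmatically. This short sketch uses Python's standard `pathlib.PureWindowsPath`, which parses Windows-style paths on any operating system without touching the disk; the example path is the one from the list above.

```python
from pathlib import PureWindowsPath

# Parse the example path purely as text; no drive C: needs to exist.
photo = PureWindowsPath(r"C:\My Documents\My Pictures\VacationPhoto.jpg")

print(photo.drive)   # C:
print(photo.name)    # VacationPhoto.jpg
# Walk up the hierarchy: sub-folder, folder, then the root of the drive.
for parent in photo.parents:
    print(parent)
# C:\My Documents\My Pictures
# C:\My Documents
# C:\
```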

A file manager or file browser is a computer program that provides a user interface to
work with file systems. The most common operations performed on files or groups of
files are: create, open, edit, view, print, play, rename, move, copy, delete, search/find,
and modify file attributes, properties and file permissions. Files are typically displayed in
a hierarchy. Some file managers contain features inspired by web browsers, including
forward and back navigational buttons.

A file system is a means to organize data expected to be retained after a program
terminates by providing procedures to store, retrieve and update data, as well as
manage the available space on the device(s) which contain it. A file system organizes
data in an efficient manner and is tuned to the specific characteristics of the device.
There is usually a tight coupling between the operating system and the file system.
Some file systems provide mechanisms to control access to the data and metadata.
Ensuring reliability is a major responsibility of a file system. Some file systems provide a
means for multiple programs to update data in the same file at nearly the same time.

Without a file system, programs would not be able to access data by file name or
directory and would need to access data regions on a storage device directly.
File systems are used on data storage devices such as hard disk drives, floppy disks,
optical discs, or flash memory storage devices to maintain the physical location of the
computer files. They may provide access to data on a file server by acting as clients for
a network protocol, or they may be virtual and exist only as an access method for virtual
data. A file system is distinct from a directory service or a registry.

Space management

File systems allocate space in a granular manner, usually multiple physical units on the
device. The file system is responsible for organizing files and directories, and keeping
track of which areas of the media belong to which file and which are not being used. For
example, Apple DOS of the early 1980s used a track/sector map to record which 256-byte
sectors of a 140-kilobyte floppy disk belonged to each file.

This results in unused space when a file is not an exact multiple of the allocation unit,
sometimes referred to as slack space. On average, about half of a file's last allocation
unit goes unused: roughly 256 bytes for 512-byte allocation units and roughly 32 KB for
64 KB clusters.
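The slack-space arithmetic can be checked with a short Python sketch. `slack` is a hypothetical helper; it assumes a file always occupies whole allocation units and, for the average, that file sizes are spread evenly, which is what makes the waste come out at about half a unit.

```python
def slack(file_size: int, unit: int) -> int:
    """Bytes left unused in the last allocation unit of a file."""
    units_needed = -(-file_size // unit)  # integer ceiling division
    return units_needed * unit - file_size

print(slack(1000, 512))     # 24: two 512-byte units hold 1024 bytes
print(slack(1000, 65536))   # 64536: a whole 64 KB cluster for 1000 bytes

# Averaged over every file size from 1 byte up to 4 units, the waste is about half a unit.
sizes = range(1, 4 * 512 + 1)
print(sum(slack(s, 512) for s in sizes) / len(sizes))  # 255.5
```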

The size of the allocation unit is chosen when the file system is created. Choosing the
allocation size based on the average size of the files expected to be in the file system
can minimize the amount of unusable space. The default allocation size often gives
reasonable results. If a file system can be expected to contain mostly small files, a
small cluster size should be chosen. Choosing an allocation size that is too
small results in excessive overhead if the file system will contain mostly very large files.
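The trade-off can be sketched by counting, for a hypothetical set of files, how many allocation units the file system must track and how much slack results. The file sizes below are invented, and treating the unit count as a stand-in for bookkeeping overhead is a simplifying assumption.

```python
def usage(file_sizes: list[int], cluster: int) -> tuple[int, int]:
    """Return (allocation units to track, bytes of slack) for a set of files."""
    clusters = sum(-(-size // cluster) for size in file_sizes)  # ceiling per file
    slack = clusters * cluster - sum(file_sizes)
    return clusters, slack

small_files = [700] * 1000         # hypothetical: a thousand 700-byte files
large_files = [50_000_000] * 10    # hypothetical: ten 50 MB files

for cluster in (512, 4096, 65536):
    print(cluster, usage(small_files, cluster), usage(large_files, cluster))
```

With the small files, large clusters multiply the slack; with the large files, slack stays tiny at every size, but 512-byte units mean tracking over a hundred times more units than 64 KB clusters.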

File system fragmentation occurs when unused space or single files are not contiguous.
As a file system is used, files are created, modified and deleted. When a file is created
the file system allocates space for the data. Some file systems permit or require
specifying an initial space allocation and subsequent incremental allocations as the file
grows. As files are deleted the space they were allocated eventually is considered
available for use by other files. This creates alternating used and unused areas of
various sizes.

This is free space fragmentation. When a file is created and there is not an area of
contiguous space available for its initial allocation the space must be assigned in
fragments. When a file grows beyond the space initially allocated to it, another
allocation must be assigned elsewhere, and the file becomes fragmented.
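Both kinds of fragmentation can be reproduced with a toy block allocator. This is a minimal Python sketch under simplifying assumptions: the disk is a list of blocks, and `allocate` (a hypothetical helper) simply hands out the first free blocks it finds, whereas real file systems usually try to find a contiguous run first.

```python
FREE = None

def allocate(disk: list, name: str, blocks: int) -> list[int]:
    """Give `name` the first `blocks` free blocks, contiguous or not."""
    free = [i for i, owner in enumerate(disk) if owner is FREE]
    if len(free) < blocks:
        raise OSError("disk full")
    chosen = free[:blocks]
    for i in chosen:
        disk[i] = name
    return chosen

def delete(disk: list, name: str) -> None:
    for i, owner in enumerate(disk):
        if owner == name:
            disk[i] = FREE

def extents(disk: list, name: str) -> list[tuple[int, int]]:
    """Describe a file as (start block, length) runs; more than one run means fragmented."""
    runs = []
    for i, owner in enumerate(disk):
        if owner != name:
            continue
        if runs and runs[-1][0] + runs[-1][1] == i:
            start, length = runs[-1]
            runs[-1] = (start, length + 1)  # extend the current run
        else:
            runs.append((i, 1))             # start a new run
    return runs

disk = [FREE] * 10
allocate(disk, "A", 3)      # blocks 0-2
allocate(disk, "B", 2)      # blocks 3-4
allocate(disk, "C", 3)      # blocks 5-7
delete(disk, "B")           # free-space fragmentation: a 2-block hole
allocate(disk, "D", 4)      # no 4-block gap exists, so D is split
print(extents(disk, "D"))   # [(3, 2), (8, 2)]
```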
Directories

File systems typically have directories (also called folders) which allow the user to group
files. This may be implemented by connecting the file name to an index in a table of
contents. Directory structures may be flat (i.e. linear), or allow hierarchies where
directories may contain subdirectories. The first file system to support arbitrary
hierarchies of directories was the file system in the Multics operating system. The native
file systems of Unix-like systems also support arbitrary directory hierarchies, as do the
FAT file system in MS-DOS 2.0 and later and Microsoft Windows, the NTFS file system
in the Windows NT family of operating systems, and the ODS-2 and higher levels of the
Files-11 file system in OpenVMS.
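The idea of connecting a file name to an index, with directories nested inside directories, can be sketched in a few lines of Python. The nested dictionaries and the `lookup` function are a hypothetical in-memory model: each directory maps names either to a subdirectory or to an integer standing for the file's entry in a separate file table.

```python
# Hypothetical model: a directory maps names to a nested directory (dict)
# or to a file's index number in a separate file table (int).
root = {
    "home": {
        "alice": {"notes.txt": 12, "photos": {"beach.jpg": 40}},
    },
    "etc": {"hosts": 7},
}

def lookup(root: dict, path: str) -> int:
    """Resolve a '/'-separated path one directory level at a time."""
    node = root
    for name in path.strip("/").split("/"):
        if not isinstance(node, dict) or name not in node:
            raise FileNotFoundError(path)
        node = node[name]
    if isinstance(node, dict):
        raise IsADirectoryError(path)
    return node

print(lookup(root, "/home/alice/photos/beach.jpg"))  # 40
```

A flat directory structure would be the special case where `root` contains only files and no nested dictionaries.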

Metadata

Other bookkeeping information is typically associated with each file within a file system.
The length of the data contained in a file may be stored as the number of blocks
allocated for the file or as a byte count. The time that the file was last modified may be
stored as the file's timestamp. File systems might store the file creation time, the time it
was last accessed, the time the file's meta-data was changed, or the time the file was
last backed up. Other information can include the file's device type (e.g. block,
character, socket, subdirectory, etc.), its owner user ID and group ID, and its access
permission settings (e.g. whether the file is read-only, executable, etc.).

Some file systems, such as NTFS, can associate additional attributes with files through
extended file attributes. Some file systems provide for user-defined attributes such as
the author of the document, the character encoding of a document or the size of an image.
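Much of this bookkeeping information is visible to programs. The sketch below reads it with Python's standard `os.stat` and `stat` modules for a temporary file; `describe` is a hypothetical helper, and which fields are meaningful varies by platform (for example, `st_uid` is always 0 on Windows).

```python
import os
import stat
import tempfile
import time

def describe(path: str) -> dict:
    """Collect some of the metadata a file system keeps alongside a file's data."""
    st = os.stat(path)
    if stat.S_ISDIR(st.st_mode):
        kind = "directory"
    elif stat.S_ISREG(st.st_mode):
        kind = "regular file"
    else:
        kind = "other"
    return {
        "size_bytes": st.st_size,                  # length stored as a byte count
        "modified": time.ctime(st.st_mtime),       # last-modification timestamp
        "kind": kind,
        "permissions": stat.filemode(st.st_mode),  # e.g. -rw-r--r--
        "owner_uid": st.st_uid,                    # 0 on Windows, which relies on ACLs
    }

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")
info = describe(f.name)
os.remove(f.name)
print(info)
```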

Some file systems allow for different data collections to be associated with one file
name. These separate collections may be referred to as streams or forks. Apple has
long used a forked file system on the Macintosh, and Microsoft supports streams in
NTFS. Some file systems maintain multiple past revisions of a file under a single file
name; the file name by itself retrieves the most recent version, while prior saved versions
can be accessed using a special naming convention.

Utilities

File systems include utilities to initialize, alter parameters of and remove an instance of
the file system. Some include the ability to extend or truncate the space allocated to the
file system.

Directory utilities create, rename and delete directory entries and alter metadata
associated with a directory. They may include a means to create additional links to a
directory (hard links in Unix), rename parent links (".." in Unix-like OS), and create
bidirectional links to files.

File utilities create, list, copy, move and delete files, and alter metadata. They may be
able to truncate data, truncate or extend space allocation, append to, move, and modify
files in place. Depending on the underlying structure of the file system, they may
provide a mechanism to prepend to, or truncate from, the beginning of a file, insert
entries into the middle of a file, or delete entries from a file.
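Appending and truncating are usually supported directly, while prepending often is not. This Python sketch appends with file mode `"ab"` and truncates with `os.truncate`; the `prepend` helper is hypothetical and works on the assumption, true of many common file systems, that there is no in-place insert at the start of a file, so it rewrites the whole file instead.

```python
import os
import tempfile

def prepend(path: str, data: bytes) -> None:
    """Emulate prepending by rewriting: read the old contents, write the new bytes first."""
    with open(path, "rb") as f:
        rest = f.read()
    with open(path, "wb") as f:
        f.write(data + rest)

fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"middle")
with open(path, "ab") as f:    # append: the file system extends the file at its end
    f.write(b"-end")
prepend(path, b"start-")        # prepend: done by copying, not in place
os.truncate(path, 12)           # truncate: keep only the first 12 bytes
with open(path, "rb") as f:
    result = f.read()
os.remove(path)
print(result)  # b'start-middle'
```

Rewriting costs time proportional to the file's size, which is why a native prepend or insert, where a file system offers one, matters for large files.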

Also in this category are utilities to free space for deleted files if the file system provides
an undelete function. Some file systems defer reorganization of free space, secure
erasing of free space and rebuilding of hierarchical structures, and provide utilities to
perform these functions at times of minimal activity. Included in this category is the
infamous defragmentation utility. Some of the most important features of file system
utilities involve supervisory activities, which may involve bypassing ownership checks or
accessing the underlying device directly. These include high
performance backup and recovery, data replication and reorganization of various data
structures and allocation tables within the file system.

Restricting and permitting access

There are several mechanisms used by file systems to control access to data. Usually
the intent is to prevent reading or modifying files by a user or group of users. Another
reason is to ensure data is modified in a controlled way so access may be restricted to a
specific program.

Examples include passwords stored in the metadata of the file or elsewhere and file
permissions in the form of permission bits, access control lists, or capabilities. The need
for file system utilities to be able to access the data at the media level to reorganize the
structures and provide efficient backup usually means that these are only effective for
polite users but are not effective against intruders.
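Permission bits are simply flags stored in a file's metadata. This sketch decodes a Unix-style mode with Python's standard `stat` constants; `can` is a hypothetical helper, and it only inspects the bits, whereas real enforcement happens in the operating system and may also involve access control lists.

```python
import stat

def can(mode: int, who: str, action: str) -> bool:
    """Check one permission bit; who is 'user', 'group' or 'other', action is 'r', 'w' or 'x'."""
    bits = {
        ("user", "r"): stat.S_IRUSR, ("user", "w"): stat.S_IWUSR, ("user", "x"): stat.S_IXUSR,
        ("group", "r"): stat.S_IRGRP, ("group", "w"): stat.S_IWGRP, ("group", "x"): stat.S_IXGRP,
        ("other", "r"): stat.S_IROTH, ("other", "w"): stat.S_IWOTH, ("other", "x"): stat.S_IXOTH,
    }
    return bool(mode & bits[(who, action)])

mode = stat.S_IFREG | 0o640     # regular file: owner read/write, group read, others nothing
print(stat.filemode(mode))      # -rw-r-----
print(can(mode, "group", "r"), can(mode, "other", "r"))  # True False
```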

Methods for encrypting file data are sometimes included in the file system. This is very
effective since there is no need for file system utilities to know the encryption key to
manage the data effectively. The risks of relying on encryption include the fact that an
attacker can copy the data and use brute force to decrypt it. Losing the key means
losing the data.

Maintaining integrity

One of a file system's most significant responsibilities is to ensure that, regardless of the
actions by programs accessing the data, the structure remains consistent. This includes
actions taken if a program modifying data terminates abnormally or neglects to inform
the file system that it has completed its activities. This may include updating the
metadata, the directory entry and handling any data that was buffered but not yet
updated on the physical storage media.
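Programs can lean on these guarantees too. A common application-level pattern is to write new contents to a temporary file, flush buffered data to the device, and then rename it over the original, so a crash leaves either the old file or the new one. This is a sketch with a hypothetical `atomic_write` helper; it assumes the rename is atomic, which POSIX guarantees within one file system, and fully durable setups on POSIX also fsync the containing directory.

```python
import os
import tempfile

def atomic_write(path: str, data: bytes) -> None:
    """Replace path's contents so readers see the old or the new file, never half of one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)  # same directory, hence same file system
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())               # push buffered data to the device
        os.replace(tmp, path)                  # rename over the old file
    except BaseException:
        os.remove(tmp)                         # never leave a half-written temporary file
        raise

workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "settings.cfg")
atomic_write(target, b"v1")
atomic_write(target, b"v2")
with open(target, "rb") as f:
    final = f.read()
print(final, os.listdir(workdir))  # b'v2' ['settings.cfg']
```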

Other failures which the file system must deal with include media failures or loss of
connection to remote systems. In the event of an operating system failure or "soft"
power failure, special routines in the file system must be invoked similar to when an
individual program fails.

The file system must also be able to correct damaged structures. These may occur as a
result of an operating system failure for which the OS was unable to notify the file
system, power failure or reset.

The file system must also record events to allow analysis of systemic issues as well as
problems with specific files or directories.

User data

The most important purpose of a file system is to manage user data. This includes
storing, retrieving and updating data. Some file systems accept data for storage as a
stream of bytes which are collected and stored in a manner efficient for the media.

When a program retrieves the data, it specifies the size of a memory buffer, and the file
system transfers data from the media to the buffer. Sometimes a runtime library routine
may allow the user program to define a record based on a library call specifying a
length. When the user program reads the data, the library retrieves data via the file
system and returns a record.

Some file systems allow the specification of a fixed record length, which is used for all
writes and reads. This facilitates updating records. An identification for each record, also
known as a key, makes for a more sophisticated file system. The user program can
read, write and update records without regard to their location. This requires
complicated management of blocks of media, usually separating key blocks and data
blocks. Very efficient algorithms can be developed with pyramid structures for locating
records.
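Fixed-length records make updates simple because a record's byte offset follows directly from its number. This Python sketch stores records in a byte stream with `struct` and keeps a key index in a plain dictionary. The record layout and the `RecordFile` class are hypothetical, and the dictionary merely stands in for the tree-like index structures a real keyed file system would keep on the media.

```python
import io
import struct

# Hypothetical layout: 8-byte ASCII key (longer keys are truncated) + 4-byte integer.
RECORD = struct.Struct("<8si")

class RecordFile:
    """Fixed-length records in a byte stream, located by key through an in-memory index."""

    def __init__(self, stream):
        self.stream = stream
        self.index = {}  # key -> record number

    def write(self, key: str, value: int) -> None:
        packed = RECORD.pack(key.encode("ascii"), value)
        if key in self.index:
            number = self.index[key]          # update in place
        else:
            number = len(self.index)          # append a new record
            self.index[key] = number
        self.stream.seek(number * RECORD.size)  # offset follows from the fixed length
        self.stream.write(packed)

    def read(self, key: str) -> int:
        self.stream.seek(self.index[key] * RECORD.size)
        _, value = RECORD.unpack(self.stream.read(RECORD.size))
        return value

records = RecordFile(io.BytesIO())
records.write("alice", 10)
records.write("bob", 20)
records.write("alice", 15)   # same length, so it overwrites record 0
print(records.read("alice"), records.read("bob"))  # 15 20
```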

Types of file systems

File system types can be classified into disk/tape file systems, network file systems and
special purpose file systems.

Disk file systems. A disk file system takes advantage of the ability of disk storage
media to randomly address data in a short amount of time. Additional considerations
include the speed of accessing data following that initially requested and the anticipation
that the following data may also be requested. This permits multiple users (or
processes) access to various data on the disk without regard to the sequential location
of the data. Examples include FAT (FAT12, FAT16, FAT32, exFAT) and NTFS. Some disk
file systems are journaling file systems or versioning file systems.

Optical discs. ISO 9660 and Universal Disk Format (UDF) are two common formats that
target Compact Discs, DVDs and Blu-ray discs. Mount Rainier is an extension to UDF,
supported by the Linux 2.6 series and Windows Vista, that facilitates rewriting to DVDs.

A flash file system considers the special abilities, performance and restrictions of flash
memory devices. Frequently a disk file system can use a flash memory device as the
underlying storage media but it is much better to use a file system specifically designed
for a flash device.

A tape file system is a file system and tape format designed to store files on tape in a
self-describing form. Magnetic tapes are sequential storage media with significantly
longer random data access times than disks, posing challenges to the creation and
efficient management of a general-purpose file system.

In a disk file system there is typically a master file directory, and a map of used and free
data regions. Any file additions, changes, or removals require updating the directory and
the used/free maps. Random access to data regions is measured in milliseconds so this
system works well for disks.

Tape requires linear motion to wind and unwind potentially very long reels of media.
This tape motion may take several seconds to several minutes to move the read/write
head from one end of the tape to the other.

Consequently, a master file directory and usage map can be extremely slow and
inefficient with tape. Writing typically involves reading the block usage map to find free
blocks for writing, updating the usage map and directory to add the data, and then
advancing the tape to write the data in the correct spot. Each additional file write
requires updating the map and directory and writing the data, which may take several
seconds to occur for each file.

Tape file systems instead typically allow for the file directory to be spread across the
tape intermixed with the data, referred to as streaming, so that time-consuming and
repeated tape motions are not required to write new data.

However, a side effect of this design is that reading the file directory of a tape usually
requires scanning the entire tape to read all the scattered directory entries. Most data
archiving software that works with tape storage will store a local copy of the tape
catalog on a disk file system, so that adding files to a tape can be done quickly without
having to rescan the tape media. The local tape catalog copy is usually discarded if not
used for a specified period of time, at which point the tape must be re-scanned if it is to
be used in the future.
IBM has developed a file system for tape called the Linear Tape File System. The IBM
implementation of this file system has been released as the open-source IBM Linear
Tape File System, which uses a separate partition on the tape to record the index metadata,
thereby avoiding the problems associated with scattering directory entries across
the entire tape.

Writing data to a tape is often a significantly time-consuming process that may take
several hours. Similarly, completely erasing or formatting a tape can also take several
hours. With many data tape technologies it is not necessary to format the tape before
over-writing new data to the tape. This is due to the inherently destructive nature of
overwriting data on sequential media.

Because of the time it can take to format a tape, typically tapes are pre-formatted so
that the tape user does not need to spend time preparing each new tape for use. All that
is usually necessary is to write an identifying media label to the tape before use, and
even this can be automatically written by software when a new tape is used for the first
time.

Another concept for file management is the idea of a database-based file system.
Instead of, or in addition to, hierarchical structured management, files are identified by
their characteristics, like type of file, topic, author, or similar rich metadata.

Very large file systems, embodied by applications like Apache Hadoop and Google File
System, use some database file system concepts.
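Identifying files by their characteristics rather than by their place in a hierarchy can be sketched as a query over metadata records. The `FileRecord` type, the catalog contents and the `find` function are all hypothetical; a real database-based file system would keep such attributes in an indexed store rather than a Python list.

```python
from dataclasses import dataclass, field

@dataclass
class FileRecord:
    name: str
    kind: str
    author: str
    topics: set[str] = field(default_factory=set)

# Hypothetical catalog: files described by rich metadata instead of folders.
catalog = [
    FileRecord("q3-report.docx", "document", "alice", {"finance", "quarterly"}),
    FileRecord("beach.jpg", "image", "bob", {"vacation"}),
    FileRecord("budget.xlsx", "spreadsheet", "alice", {"finance"}),
]

def find(catalog: list[FileRecord], **criteria) -> list[str]:
    """Return names of files whose attributes match; 'topic' is checked against the topic set."""
    results = []
    for record in catalog:
        topic = criteria.get("topic")
        if topic is not None and topic not in record.topics:
            continue
        if all(getattr(record, key) == value for key, value in criteria.items() if key != "topic"):
            results.append(record.name)
    return results

print(find(catalog, author="alice", topic="finance"))  # ['q3-report.docx', 'budget.xlsx']
```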

Transactional file systems

Some programs need to update multiple files "all at once". For example, a software
installation may write program binaries, libraries, and configuration files. If the software
installation fails, the program may be unusable. If the installation is upgrading a key
system utility, such as the command shell, the entire system may be left in an unusable
state.

Transaction processing introduces the isolation guarantee, which states that operations
within a transaction are hidden from other threads on the system until the transaction
commits, and that interfering operations on the system will be properly serialized with
the transaction.

Transactions also provide the atomicity guarantee, that operations inside of a
transaction are either all committed, or the transaction can be aborted and the system
discards all of its partial results.

This means that if there is a crash or power failure, after recovery, the stored state will
be consistent. Either the software will be completely installed or the failed installation
will be completely rolled back, but an unusable partial install will not be left on the
system.
Windows, beginning with Vista, added transaction support to NTFS, abbreviated TxF.
TxF is the only commercial implementation of a transactional file system, as
transactional file systems are difficult to implement correctly in practice. There are a
number of research prototypes of transactional file systems for UNIX systems.

Ensuring consistency across multiple file system operations is difficult, if not impossible,
without file system transactions. File locking can be used as a concurrency control
mechanism for individual files, but it typically does not protect the directory structure or
file metadata. File locking also cannot automatically roll back a failed operation, such as
a software upgrade; this requires atomicity.

Journaling file systems are one technique used to introduce transaction-level
consistency to file system structures. Journal transactions are not exposed to programs
as part of the OS API; they are only used internally to ensure consistency at the
granularity of a single system call.
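The core idea of a journal can be shown with a toy write-ahead log: describe a whole transaction in the journal, write a commit record, and only then apply the changes to their home locations. During recovery, only transactions that reached their commit record are replayed. `JournaledDisk` is a hypothetical in-memory model; real journaling file systems differ in what they log (often metadata only) and how they lay the journal out on disk.

```python
class JournaledDisk:
    """Toy write-ahead journal: log a whole transaction, then apply it to the blocks."""

    def __init__(self, size: int):
        self.blocks = [0] * size
        self.journal = []   # in this model the journal survives a simulated crash
        self.next_tx = 1

    def write_transaction(self, updates: dict[int, int], crash_before_commit: bool = False) -> None:
        tx = self.next_tx
        self.next_tx += 1
        self.journal.append(("begin", tx))
        for block, value in updates.items():
            self.journal.append(("write", tx, block, value))
        if crash_before_commit:
            return          # simulated power loss: the commit record is never written
        self.journal.append(("commit", tx))
        self.checkpoint()

    def checkpoint(self) -> None:
        """Replay only committed transactions, then clear the journal (also used for recovery)."""
        committed = {entry[1] for entry in self.journal if entry[0] == "commit"}
        for entry in self.journal:
            if entry[0] == "write" and entry[1] in committed:
                _, _, block, value = entry
                self.blocks[block] = value
        self.journal.clear()

disk = JournaledDisk(4)
disk.write_transaction({0: 7, 1: 8})
disk.write_transaction({2: 9, 3: 9}, crash_before_commit=True)
disk.checkpoint()    # recovery discards the incomplete transaction
print(disk.blocks)   # [7, 8, 0, 0]
```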

Data backup systems typically do not provide support for direct backup of data stored in
a transactional manner, which makes recovery of reliable and consistent data sets
difficult. Most backup software simply notes what files have changed since a certain
time, regardless of the transactional state shared across multiple files in the overall
dataset. As a workaround, some database systems simply produce an archived state
file containing all data up to that point, and the backup software only backs that up and
does not interact directly with the active transactional databases at all. Recovery
requires separate recreation of the database from the state file, after the file has been
restored by the backup software.

Network file systems

A network file system is a file system that acts as a client for a remote file access
protocol, providing access to files on a server. Examples of network file systems include
clients for the NFS, AFS, SMB protocols, and file-system-like clients for FTP and
WebDAV.

A shared disk file system is one in which a number of machines (usually servers) all
have access to the same external disk subsystem (usually a SAN). The file system
arbitrates access to that subsystem, preventing write collisions.

A special file system presents non-file elements of an operating system as files so they
can be acted on using file system APIs. This is most commonly done in Unix-like
operating systems, but devices are given file names in some non-Unix-like operating
systems as well.

FAT
The family of FAT file systems is supported by almost all operating systems for personal
computers, including all versions of Windows and MS-DOS/PC DOS and DR-DOS. (PC
DOS is an OEM version of MS-DOS, MS-DOS was originally based on SCP's 86-DOS.
DR-DOS was based on Digital Research's Concurrent DOS.) The FAT file systems are
therefore well suited as a universal exchange format between computers and devices of
almost any type and age.

The FAT file system traces its roots back to an (incompatible) 8-bit FAT precursor in the
short-lived M-DOS project and Standalone disk BASIC. Over the years, the file system
has been expanded from FAT12 to FAT16 and FAT32.

Various features have been added to the file system including sub-directories,
codepage support, extended attributes, and long filenames. Third parties such as Digital
Research have incorporated optional support for deletion tracking, and
volume/directory/file-based multi-user security schemes to support file and directory
passwords and permissions such as read/write/delete/execute access rights. Most of
these extensions are not supported by Windows.

The FAT12 and FAT16 file systems had a limit on the number of entries in the root
directory of the file system and had restrictions on the maximum size of FAT-formatted
disks or partitions.

FAT32 addresses the limitations in FAT12 and FAT16, except for the file size limit of
close to 4 GB, but it remains limited compared to NTFS.
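The "close to 4 GB" figure comes from the 32-bit field in which a FAT32 directory entry stores a file's size, so the largest file is 2^32 - 1 bytes. A small sketch; `fits_on_fat32` is a hypothetical helper name.

```python
FAT32_MAX_FILE_SIZE = 2**32 - 1   # file size is stored in a 32-bit directory-entry field

def fits_on_fat32(size_bytes: int) -> bool:
    return size_bytes <= FAT32_MAX_FILE_SIZE

print(FAT32_MAX_FILE_SIZE)            # 4294967295
print(fits_on_fat32(4 * 1024**3))     # False: exactly 4 GiB is one byte too many
print(fits_on_fat32(700 * 1024**2))   # True: a 700 MiB file
```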

FAT12, FAT16 and FAT32 also have a limit of 8 characters for the file name, and 3
characters for the extension (such as .exe). This is commonly referred to as the 8.3
filename limit. VFAT, an optional extension to FAT12, FAT16 and FAT32, introduced in
Windows 95 and Windows NT 3.5, allowed long file names (LFN) to be stored in the
FAT file system in a backwards compatible fashion.
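The 8.3 rule can be expressed as a short check. This is a simplified sketch: `is_valid_83` is a hypothetical helper that accepts upper-case letters, digits and a set of punctuation as stored on disk, and it ignores extended code-page characters and reserved device names such as CON or NUL.

```python
import string

# Simplified set of characters allowed in short (8.3) names as stored on disk.
ALLOWED = set(string.ascii_uppercase + string.digits + "!#$%&'()-@^_`{}~")

def is_valid_83(name: str) -> bool:
    base, _, ext = name.partition(".")
    if not 1 <= len(base) <= 8 or len(ext) > 3 or "." in ext:
        return False   # too long, or more than one dot
    return set(base + ext) <= ALLOWED

for candidate in ["AUTOEXEC.BAT", "README", "VacationPhoto.jpg", "PHOTO.JPEG"]:
    print(candidate, is_valid_83(candidate))
```

Names such as VacationPhoto.jpg fail the check, which is why VFAT stores a long file name alongside a generated short name.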

NTFS

NTFS, introduced with the Windows NT operating system, allowed ACL-based
permission control. Other features also supported by NTFS include hard links, multiple
file streams, attribute indexing, quota tracking, sparse files, encryption, compression,
and reparse points (directories working as mount-points for other file systems, symlinks,
junctions, remote storage links), though not all these features are well-documented.

Long file paths and long file names


In hierarchical file systems, files are accessed by means of a path that is a branching
list of directories containing the file. Different file systems have different limits on the
depth of the path. File systems also have a limit on the length of an individual filename.

Common File Types


Compressed Type
File Type Description
AAC Advanced Audio Coding
CAB Microsoft Cabinet
RAR RAR archive (.rar); multi-volume archives continue as .r01 to .r99, then .s01 and so on
ZIP ZIP archive

Physical recordable media archiving


File Type Description
ISO The generic disc-image format for most optical media, including CD-ROM, DVD-ROM, Blu-ray Disc, HD DVD and UMD. An ISO image holds the disc's data itself; the pairing of a file of write directives with an accompanying .bin file that contains the actual data is the CUE/BIN format.
IMG For archiving MS-DOS formatted floppy disks.

Database
File Type Description
ACCDB Microsoft Access database
MDF Microsoft SQL Server Database
MYD MySQL, MyISAM table data
MYI MySQL, MyISAM table index

Desktop Publishing
File Type Description
PUB Microsoft Publisher
PMD Adobe PageMaker

Document
File Type Description
DOC, DOCX Microsoft Word document
ODT OpenDocument Text document
PDF Portable Document Format
RTF Rich Text document
TXT ASCII or Unicode plaintext

Font File
File Type Description
TTF (.ttf, .ttc) TrueType Font

Graphics
File Type Description
Exif Exchangeable image file format, a specification for the image file format used by digital cameras
GIF CompuServe's Graphics Interchange Format
ICO A file format used for icons in Microsoft Windows; contains small bitmap images at multiple resolutions and sizes
JPEG, JFIF (.jpg or .jpeg) Joint Photographic Experts Group; a lossy image format widely used to display photographic images
PNG Portable Network Graphics (lossless; recommended for displaying and editing graphic images)
PSD, PDD Adobe Photoshop Drawing
TIFF (.tif or .tiff) Tagged Image File Format (usually lossless, but many variants exist, including lossy ones)

Vector Graphics
File Type Description
AI Adobe Illustrator Document
CDR CorelDRAW Document

Object code, executable files, shared and dynamically-linked libraries


File Type Description
8BF Plug-ins for some photo editing programs, including Adobe Photoshop, Paint Shop Pro, GIMP and Helicon Filter
CLASS Compiled class file used in Java
COM Command files used in DOS
XPI PKZIP archive that can be run by Mozilla web browsers to install software
EXE New Executable (used in DOS 4.0 and later, 16-bit Microsoft Windows, and OS/2); also Portable Executable (.EXE, .DLL; used in Microsoft Windows and some other systems)
WAR Archives of Java web applications
VBX Visual Basic extensions

Presentation
File Type Description
ODP Open Document Presentation
OTP Open Document Presentation template
POT Microsoft PowerPoint template
PPS Microsoft PowerPoint Show
PPT, PPTX Microsoft PowerPoint

Script Type
File Type Description
AS Adobe Flash ActionScript File
BAT Batch file
BAS QBasic & QuickBASIC
HTA HTML Application
JS JavaScript and JScript
JSFL Adobe JavaScript language
PHP PHP
PL Perl
PM Perl module
PS1XML Windows PowerShell format and type definitions
PSC1 Windows PowerShell console file
PSD1 Windows PowerShell data file
PSM1 Windows PowerShell module file
VBS Visual Basic Script

Lossless Audio
File Type Description
FLAC Free lossless compressed codec of the Ogg project
WMA Windows Media Audio 9 Lossless

Lossy audio
File Type Description
WMA Windows Media Audio (earlier versions)
MP3 MPEG Layer 3
GSM GSM Full Rate, originally developed for use in mobile phones
AAC (.m4a, .mp4, .m4p, .aac) Advanced Audio Coding (usually in an MPEG-4 container)
RA, RM RealAudio

Other music
File Type Description
MID Standard MIDI file; most often just notes and controls, but occasionally also sample dumps
MUS Finale notation file

Playlist
File Type Description
M3U Playlist file

Source code for computer programs


File Type Description
C C source
CLS Visual Basic class
COB, CBL COBOL source
CPP, CC, CXX C++ source
CS C# source
CSPROJ C# project (Visual Studio .NET)
FRM Visual Basic form
FRX Visual Basic form stash file (binary form file)
PL, PM Perl
VAP Visual Studio Analyzer project
VB Visual Basic .NET source
VBP, VIP Visual Basic project
VBPROJ Visual Basic .NET project (Visual Studio)
VCPROJ Visual C++ project
VDPROJ Visual Studio deployment project

Spreadsheet
File Type Description
ODS Open Document spreadsheet
WKS Lotus 1-2-3
XLSX Microsoft Excel workbook

Video
File Type Description
3GP The most common video format for cell phones
GIF Animated GIF (simple animation; until recently often avoided because of patent problems)
AVI Container (a shell that allows any form of compression to be used)
DAT Video standard data file (created automatically when burning a video CD)
FLV Flash video (encoded to run in a Flash animation)
FLA Macromedia Flash (for producing)
MPEG (.mpeg, .mpg, .mpe), MP4 MPEG video; MP4 (MPEG-4 Part 14) is a multimedia container, most often used for Sony's PlayStation Portable and Apple's iPod
OGG Multimedia container
RM RealMedia
SWF Shockwave File/Macromedia Flash (for viewing)
WMV Windows Media Video

Webpage
File Type Description
XML Extensible Markup Language
HTML (.html, .htm) HyperText Markup Language
XHTML (.xhtml, .xht) Extensible HyperText Markup Language
MHTML (.mht, .mhtml) Archived HTML; stores all data of a web page (text, images, etc.) in one big file
ASP (.asp) Microsoft Active Server Page
ASPX (.aspx) Microsoft Active Server Page .NET
JSP (.jsp) JavaServer Pages
PL (.pl) Perl
PHP (.php, .php?, .phtml) PHP: Hypertext Preprocessor (originally Personal Home Page); in .php? the ? is a version number

Other Types
File Type Description
BAK backup file
INI used by many applications to store configuration
INF similar file format to INI; used to install device drivers under Windows
TMP, $$$ Temporary file
CDA Compact Disc Audio
CSS cascading style sheet
DAT Data file in special format or ASCII
WAV Sound format (Microsoft Windows Resource Interchange File Format WAVE)
