File Systems2023Part2

File systems

• PART 1:
• Secondary storage devices
• Types of storage devices
• Disk structure
• Disk formatting
• Files
• File systems
• Definition of “file system”
• Address mapping
• Strategies for allocating disk space to files
• PART 2:
• Windows file systems
• FAT 12,16, 32
• NTFS
• Linux file systems
• Linux file structure on disk
• ext2
• Mounting a file system in Linux and Windows
• The boot sequence

• This section of the course is based in part on chapters 11, 13, 14
and 15 of the book Operating System Concepts, tenth ed., and
chapter 4 of the book Modern Operating Systems, third ed.
Windows file systems

• Directory structure
• Different Windows file systems
• FAT 12
• FAT 16, 32
• The NTFS file system
• The Master File Table (MFT)
• MFT entries
• Structure of MFT
• NTFS file storage allocation
Structure of directory entries for MS-DOS
• Directory entries are 32 bytes long
• A directory entry has the format below:
• Attributes: a collection of bit flags:
• read-only,
• hidden,
• system file,
• volume label (the entry names the volume rather than a file),
• sub-directory,
• archive (set when the file has changed since it was last archived),
• last 2 unused
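The attribute byte can be decoded with simple bit masks. The sketch below assumes the flags occupy bits 0 to 5 in the order listed above, with bit 3 (0x08) marking the volume-label entry; the flag names and the function `decode_attributes` are illustrative, not part of any real API.

```python
# Masks follow the order of the flags in the list above (bit 0 first).
# The names are illustrative labels only.
ATTRIBUTE_FLAGS = [
    (0x01, "read-only"),
    (0x02, "hidden"),
    (0x04, "system"),
    (0x08, "volume label"),
    (0x10, "sub-directory"),
    (0x20, "archive"),
]


def decode_attributes(attr_byte):
    """Return the names of the flags set in a directory entry's attribute byte."""
    return [name for mask, name in ATTRIBUTE_FLAGS if attr_byte & mask]


# 0x12 = 0x10 | 0x02: a hidden sub-directory
print(decode_attributes(0x12))  # ['hidden', 'sub-directory']
```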
File Allocation Table (FAT)
• FAT is a linked list allocation of blocks to files using tables
• FAT was developed in 1977 for use on floppy disks and was used by MS-DOS;
it was later adapted for use on hard drives and other devices, as
well as for the first Windows versions
• The increase in disk drive capacity required three major
variants: FAT12, FAT16, FAT32
• FAT was replaced with NTFS as the default file system on
Windows operating systems starting with Windows XP
(2001)
• Nevertheless, FAT continues to be used on flash
memory and other solid-state memory cards (including
USB drives), as well as digital cameras, etc.
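Linked-list allocation through a table can be sketched as follows. This is a toy model, not the on-disk FAT format: the table is a Python list, and `END_OF_CHAIN` is a hypothetical marker standing in for the special end-of-file values that real FAT variants reserve.

```python
END_OF_CHAIN = -1  # hypothetical marker; real FAT variants reserve special values


def file_blocks(fat, first_block):
    """Follow the linked list stored in the FAT, starting at a file's first block."""
    blocks = []
    block = first_block
    while block != END_OF_CHAIN:
        blocks.append(block)
        block = fat[block]  # the table entry names the next block of the file
    return blocks


# Toy table with 8 entries: a file starting at block 2 occupies blocks 2 -> 5 -> 3.
fat = [END_OF_CHAIN] * 8
fat[2] = 5
fat[5] = 3
fat[3] = END_OF_CHAIN

print(file_blocks(fat, 2))  # [2, 5, 3]
```

The directory entry only needs to store the first block number; every other block of the file is found by walking the table.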
File Allocation Table (FAT)

• FAT12, FAT16, FAT32


• The number after FAT indicates the
number of bits used to represent an
entry in the table
• The original FAT was 8 bits
• Implicitly this number defines the
size of the FAT table as well as the
size of the file system (partition)
• A FAT has as many entries as can
be indexed with 12, 16 or 32 bits
• FAT12 means the table has
2^{12} = 4096 entries
• Thus, the number of blocks in the
partition is at most 4096
FAT structure in a partition
• The boot block, the first blocks of the partition, contains
information about the file system, including how many copies
of the FAT tables are present, how big a sector is, how many
sectors in a block. It may also contain the code necessary
to start the operating system (boot the OS)
• There are usually two copies of the FAT table
• The root directory, unlike directories in the data area, has a
fixed size
• for FAT12, uses 14 sectors
• Each sector is 512 bytes, and each directory entry is 32 bytes
• 16 directory entries per sector, 14 * 16 = 224 entries in the root
directory
• The other directories have variable lengths, but each entry is
32 bytes long as well
[Figure: partition layout: Boot Block | FAT-x 1 | FAT-x 2 | Root directory | Data area (data blocks)]
• Data area: once only for file data, later expanded to include subdirectories
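The root directory arithmetic above can be checked in a few lines. The constants are the values quoted on this slide; the variable names are illustrative.

```python
SECTOR_SIZE = 512       # bytes per sector
DIR_ENTRY_SIZE = 32     # bytes per directory entry
ROOT_DIR_SECTORS = 14   # size of the FAT12 root directory, in sectors

entries_per_sector = SECTOR_SIZE // DIR_ENTRY_SIZE
root_dir_entries = ROOT_DIR_SECTORS * entries_per_sector

print(entries_per_sector, root_dir_entries)  # 16 224
```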
FAT12

• Designed for floppy disks:
1.44 MB capacity, block =
sector = 512 bytes, 2880
sectors
• Need 12 bits to address any
block
• If FAT12 is used with larger
drives, the block size must
increase
• FAT12 was allowed to have
new block sizes of 1 KB, 2 KB
and 4 KB
• 2^{12} = 4096 is the number of
distinct values that can be
represented with 12 bits
• If the block size is 4 KB,
4 KB * 2^{12} = 16 MB, so each
partition can be at most 16 MB
FAT16, FAT32
• FAT16 was introduced to handle
larger disk drives
• Was allowed block sizes of 8 KB, 16 KB
and 32 KB
• FAT16 uses 128 KB of main memory per
partition (the FAT must be loaded in main memory)
• 2^{16} = 65536 entries in the FAT, each
entry is 2 bytes long
• The largest possible disk partition was 2 GB
• 65536 blocks * 32 KB per block = 2,097,152 KB
= 2 GB
• 2 GB is just enough for 8 minutes of
video!
• With at most four partitions per disk, FAT16
could address hard drives of 8 GB
• Starting with the second release of Windows 95,
the FAT32 file system was introduced
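The size limits of FAT12 and FAT16 follow directly from the entry width and the block size. The sketch below is a simplified upper bound that assumes one block per table entry (real FAT variants reserve a few entries); the function name is illustrative.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def max_partition_bytes(fat_bits, block_size):
    """Upper bound on partition size: one block per addressable FAT entry."""
    return (2 ** fat_bits) * block_size


fat12_limit = max_partition_bytes(12, 4 * KB)    # largest FAT12 block size
fat16_limit = max_partition_bytes(16, 32 * KB)   # largest FAT16 block size
fat16_table_bytes = (2 ** 16) * 2                # 2 bytes per FAT16 entry

print(fat12_limit // MB, "MB")         # 16 MB
print(fat16_limit // GB, "GB")         # 2 GB
print(fat16_table_bytes // KB, "KB")   # 128 KB
```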
Windows NTFS
• NTFS is the file system currently used by Windows
• The block size ranges from 512 bytes to 64 KB, depending on the
partition size. Most NTFS disks use 4-KB blocks.
• Blocks are referred to by their offset from the start of the
partition using 64-bit numbers.
• Using this scheme, the system can calculate a physical storage
offset (in bytes) by multiplying the block number by the block
size.
Windows NTFS
• An NTFS partition only has boot blocks, an allocation table and
storage for files
• There is no need to reserve specific physical addresses on the disk
for specific types of data such as file allocation tables or partition
tables
• NTFS stores all system and administration data of the file system
in files.
• This is the same information that other file systems keep in hidden
areas normally located at the beginning of the disk with fixed
physical addresses.
• This information is stored as ordinary files that can be physically
located anywhere on the NTFS partition
• Thus, each partition is structured as below:
NTFS Master File
Table (MFT)
• The MFT is the principal
data structure in an
NTFS partition
• It stores information
about the files and
directories, and
allocates storage to
files
• This file table contains
information about every
file and directory listed
in the file system.
• Each file or directory
has at least one entry
in the MFT.
NTFS Master File
Table
• Each entry is exactly
1 KB (1024 bytes) in
size.
• It contains the file’s
attributes, such as its
name and
timestamps, and the
list of disk addresses
where its blocks are
located.
• The MFT is itself a
file and can grow
as needed, up to a
maximum size of
2^{48} entries.
NTFS Master file
• The first 16 MFT entries
are reserved for NTFS
metadata files
• Each of these files has a
name that begins with a
dollar sign to indicate
that it is a metadata file.
• Entry 0 describes the
MFT file itself. In
particular, it tells where
the blocks of the MFT file
are located so that the
system can find the MFT
file.
• The address of the first
MFT block is in the boot
sector
Some metadata Master File Table entries
• MFT entry 1 is a duplicate of
entry 0.
• MFT entry 3 contains
information about the
partition, such as its size,
label, and version.
• MFT entry 5: the root directory,
which itself is a file and can
grow to an arbitrary length.
• Free space on the partition is
kept track of with a bitmap.
The bitmap is itself a file, and
its attributes and disk
addresses are given in MFT
entry 6.
• MFT entry 11 is a directory
containing miscellaneous files
for things like disk quotas.
Structure of master file entries

• Each entry is 1024 bytes long. The first 42 bytes store the header. The other 982 bytes do not have a
fixed structure, and are used to keep attributes.

• The MFT record header contains 12 data fields, including:
• flags that tell whether the entry relates to a file or a directory,
• flags that tell whether the entry is in use or can be reused (MFT entries are only marked
as deleted and are eventually reused by the file system),
• the MFT entry number.
Structure of master file entries

• The other 982 bytes consist of a sequence of (attribute header, value) pairs.
• Each attribute begins with a header telling which attribute this is and
how long the value is.
• If the attribute value is short enough to fit in the MFT entry, it is placed
there.
• If it is too long, it is placed elsewhere on the disk and a pointer to it is
placed in the MFT entry
Structure of master file entries

• NTFS defines 13 attributes that can appear in MFT entries. Each
attribute header identifies the attribute and gives the length and
location of the value field
Structure of master file records

• The standard information field contains the file owner, security
information, and the timestamps. This field is always present.
• The value of the file name attribute is the name of the file
• NTFS files have an ID associated with them that is like the i-node number
in UNIX. Files can be opened by ID
• An NTFS file has one or more data streams associated with it. For each
stream, the stream name, if present, goes in this attribute header. Following
the header is either a list of disk addresses telling which blocks the stream
contains, or, for streams of only a few hundred bytes (and there are many of
these), the stream itself.
Structure of master file entries

• Usually, attribute values follow their attribute headers directly, but if a
value is too long to fit in the MFT entry (like data), it may be put in
separate disk blocks.
• Such an attribute is said to be a nonresident attribute.
• The headers for resident
attributes are 24 bytes
long
• The headers of
nonresident attributes
are longer because they
contain information about
where to find the
attribute on disk.
Storage allocation
• The blocks in a stream are described by a sequence of records,
each one describing a sequence of logically contiguous blocks
• Each record begins with a header giving the offset of the first
block within the stream. Next is the offset of the first block not in
the record
• Each record header is followed by one or more pairs, each giving
a disk address and run length. The disk address is the offset of
the disk block from the start of its partition; the run length is the
number of blocks in the run
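The run-based description above can be sketched as a lookup from a block offset within the stream to a disk block. This is a simplified model under stated assumptions: a record is represented as the offset of its first block plus a list of (disk address, run length) pairs, and the function name and sample values are hypothetical.

```python
def disk_block_for(runs, first_block_in_record, stream_block):
    """Map a block offset within the stream to a disk block.

    runs is a list of (disk address, run length) pairs, in stream order.
    """
    offset = stream_block - first_block_in_record
    if offset < 0:
        raise ValueError("block precedes this record")
    for disk_address, run_length in runs:
        if offset < run_length:
            return disk_address + offset
        offset -= run_length  # skip this run and continue with the next one
    raise ValueError("block is not covered by this record")


# A 9-block stream stored in three runs: disk blocks 20-23, 64-65 and 80-82.
runs = [(20, 4), (64, 2), (80, 3)]

print(disk_block_for(runs, 0, 0))  # 20
print(disk_block_for(runs, 0, 5))  # 65
print(disk_block_for(runs, 0, 8))  # 82
```

A file stored in a few long runs needs only a few pairs, however large it is, which is why this representation is compact for mostly contiguous files.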
Directory structure and record number

• A directory entry consists of a filename and a “file ID”,
which is the record number representing the file in the
Master File Table
• A record number consists of a 48-bit index into the MFT
and a 16-bit sequence number used to detect stale
references
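A 64-bit record number can be split into its two parts with shifts and masks. The sketch assumes the 48-bit index sits in the low bits and the 16-bit sequence number in the high bits; the function names are illustrative.

```python
INDEX_BITS = 48
INDEX_MASK = (1 << INDEX_BITS) - 1


def make_record_number(index, sequence):
    """Pack a 48-bit MFT index and a 16-bit sequence number into 64 bits."""
    return (sequence << INDEX_BITS) | (index & INDEX_MASK)


def split_record_number(record_number):
    """Return (MFT index, sequence number)."""
    return record_number & INDEX_MASK, record_number >> INDEX_BITS


def is_stale(record_number, current_sequence):
    """A reference is stale if the MFT entry has been reused since it was made."""
    return split_record_number(record_number)[1] != current_sequence


ref = make_record_number(index=5, sequence=3)
print(hex(ref))                   # 0x3000000000005
print(split_record_number(ref))   # (5, 3)
print(is_stale(ref, 4))           # True: entry 5 was reused after ref was made
```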
Addressing a file

• All MFT entries are numbered by their index in the MFT
• MFT entries are addressed with a 48-bit index.
• The first entry has the address zero.
• The address of the last entry changes as the MFT grows.
• The number of entries can be found by dividing
the size of the $MFT file by the size of each record; the
last entry has that number minus one as its address.
• Since each record is exactly 1 KB
in all existing versions of Windows, this task is trivial.
Unix/Linux file systems

• Unix/Linux structure on disk


• Linux “extended file system” (ext2, ext3, ext4,…)
• Linux directory entries
• Structure of i-nodes
• Linux hierarchical file naming
The Linux file systems
• The initial file system of Linux was based on the
Minix operating system file system (early Linux was developed on
Minix).
• The Minix file system used 16-bit offsets internally and thus had a
maximum size limit of only 64 MB; there was also a filename
length limit of 14 characters
• The extended file system (ext), released in 1992, solved the two
major problems of the Minix file system (maximum partition size
and the 14-character filename limit), allowing 2
GB of data and filenames of up to 255 characters.
• The second extended file system (ext2), released in 1993, was
an overhaul of the extended file system with many ideas from
Unix file systems.
• ext2 is the most consequential design of all Linux file systems;
subsequent versions of the Linux file system have not changed the
design of ext2
• Now there are over 130 Linux file systems, but the standard is
the versions of the extended file system, mostly ext2, ext3, and ext4
General view of the Linux structure on disk
• Boot Block
• Super Block
• i-node List
• Data Block

• The boot block occupies the beginning of a
file system, the first sector
• May contain bootstrap code that is read into
the machine at boot time
• Only one boot block is required to boot the
system, but every file system may contain a
boot block

[Figure: disk layout: Boot Block | Super Block | i-node List | Data Blocks]


General view of Linux: Super block
• The super block describes the file
system:
- Number of bytes in the file system
- Number of files it can store
- Where to find free space in the file
system
- Additional data to manage the file
system



General view of Linux: i-node list/table
• i-node:
• Each i-node is the internal
representation of a file, i.e. the mapping
of the file to disk blocks
• It also contains the attributes of the file:
owner, permissions, date of creation, etc
• The i-node list is a pre-determined
sequence of i-nodes available to a
“block group” in the file system
General view of Linux: Data blocks
• Data Blocks:
• Storage available to a file system
• An allocated data block can belong to one
and only one file in the file system



The Linux ext2 file system
• ext2 divides the logical partition that it occupies into block groups; the file
system has the disk layout shown below
• Each group includes data blocks and i-nodes stored in adjacent tracks.
• This structure allows files stored in a single block group to be accessed with a
lower average disk seek time
The Linux ext2 file system
• Block 0 (boot block) contains code to boot the computer
• All block groups in the filesystem have the same size and
are stored sequentially
• The block groups have the layout as below:
• Superblock: Contains a copy of the filesystem’s
superblock for the partition
• Group descriptor: Contains information about the location
of the bitmaps, the number of free blocks and i-nodes in
the group, the number of directories in the group
The Linux ext2 file system
• Two bitmaps are used to keep track of the free blocks and free
i-nodes
• Block bitmap identifies the free blocks inside the group
• i-node bitmap identifies the free i-nodes inside the group
• Each map is one block long. With a 1-KB block (8192 bits), this design
limits a block group to 8192 blocks and 8192 i-nodes
The Linux ext2 file system
• Following the bitmaps are the i-nodes themselves.
• They are numbered from 1 up to some maximum.
• Each i-node is 128 bytes long and describes exactly one file.
• An i-node contains enough information to locate all the disk
blocks that hold the file’s data.
• Following the i-nodes are the data blocks. All the files and
directories are stored here.
• Note that i-nodes are numbered at the partition level; thus, to
find which group an i-node is in, subtract 1 from the i-node number
(numbering starts at 1) and divide by the number of i-nodes per group
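Locating an i-node from its number can be sketched as an integer division. Because i-nodes are numbered from 1, the sketch subtracts 1 before dividing; the function name and the value of 8192 i-nodes per group are assumptions for illustration.

```python
def locate_inode(inode_number, inodes_per_group):
    """Return (block group, index within that group's i-node table)."""
    if inode_number < 1:
        raise ValueError("i-node numbers start at 1")
    zero_based = inode_number - 1
    return zero_based // inodes_per_group, zero_based % inodes_per_group


print(locate_inode(1, 8192))      # (0, 0): first i-node of group 0
print(locate_inode(8192, 8192))   # (0, 8191): last i-node of group 0
print(locate_inode(8193, 8192))   # (1, 0): first i-node of group 1
```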
Structure of i-nodes

• i-nodes are 128 bytes long.
• They contain the same
information about a file that
you will find in the directory
entry of an MS-DOS file system,
such as:
• Type (directory or file)
• Access rights
• Owners
• Timestamps
• Size
• Pointers to data blocks
Structure of i-nodes

• The i-node contains the disk
addresses of the first 12 blocks
of a file.
• For files longer than 12 blocks, a
field in the i-node contains the
disk address of a single indirect
block. This block contains the
disk addresses of more disk
blocks.
• For example, if a block is 1 KB
and a disk address is 4 bytes,
the single indirect block can
hold 256 disk addresses. Thus,
this scheme works for files of up
to 268 KB.
Structure of i-nodes

• Beyond that, a double indirect
block is used.
• It contains the addresses of
256 single indirect blocks,
each of which holds the
addresses of 256 data
blocks. This mechanism is
sufficient to handle files up
to 12 + 256 + 2^{16} blocks
(67,383,296 bytes)
• A triple indirect block can handle
file sizes of 16 GB for 1-KB
blocks. For 8-KB block sizes, the
addressing scheme can support
file sizes up to 64 TB.
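The file size limits for each level of indirection can be recomputed from the block size. The sketch assumes 12 direct pointers and 4-byte disk addresses, as on the earlier slide; the function name is illustrative.

```python
def max_file_blocks(block_size, address_size=4, direct=12):
    """Largest file, in blocks, reachable with each level of indirection."""
    per_block = block_size // address_size  # addresses held by one indirect block
    single = direct + per_block
    double = single + per_block ** 2
    triple = double + per_block ** 3
    return {"direct": direct, "single": single, "double": double, "triple": triple}


limits_1k = max_file_blocks(1024)
limits_8k = max_file_blocks(8192)

print(limits_1k["single"])                     # 268 blocks, i.e. 268 KB
print(limits_1k["double"] * 1024)              # 67383296 bytes
print(limits_1k["triple"] * 1024 // 2 ** 30)   # 16 (GB, rounded down)
print(limits_8k["triple"] * 8192 // 2 ** 40)   # 64 (TB, rounded down)
```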
Mapping directory entry to i-nodes

• The i-node number of a file is used as an index into the i-node
table (on disk) to locate the corresponding i-node and bring it
into memory.

[Figure: the directory entry for /foo holds i-node number 30; the i-node for /foo lists the pointers direct (100), direct (201), single indirect (40), double indirect (45) and triple indirect, which lead to the file's data blocks.]
Linux file systems ext3 and ext4
• In ext2 writes are delayed, and changes may not be committed to
disk for up to 30 sec, which is a very long time interval
• Like Windows NTFS, ext3 and ext4 are “journaling file systems”
• The basic idea behind a journaling file system is to maintain a
journal, which describes all file-system operations in sequential
order.
• By sequentially writing out changes to the file-system data, the
operations do not suffer from the overheads of disk-head movement
during random disk accesses.
• Eventually, the changes will be written out, committed, to the
appropriate disk location, and the corresponding journal entries can
be discarded.
Structure of Linux directory entries
• Each entry of a Linux directory consists of four fixed-length fields and
one variable-length field.
• The fixed fields are:
• The first field is the i-node number.
• The next field tells how big the entry is (in bytes),
• Next is the type field: file, directory, and so on.
• The last fixed field is the length of the actual file name in bytes
• Finally, comes the file name itself, terminated by a 0 byte
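The entry format described above can be sketched with Python's `struct` module. The field order follows the list above, but the field widths (4, 2, 1 and 1 bytes), the padding of each entry to a 4-byte boundary, and the type codes are assumptions made for illustration; this is not a faithful reader for real ext2 directories.

```python
import struct

# i-node number, entry length, type, name length (assumed widths: 4, 2, 1, 1 bytes)
HEADER = struct.Struct("<IHBB")


def pack_entry(inode, file_type, name):
    """Build one directory entry: fixed header, name, terminating 0 byte, padding."""
    raw_name = name.encode("utf-8")
    length = HEADER.size + len(raw_name) + 1
    length += -length % 4  # pad to a 4-byte boundary (assumption)
    body = HEADER.pack(inode, length, file_type, len(raw_name)) + raw_name
    return body.ljust(length, b"\0")


def parse_entries(data):
    """Walk a directory block, using each entry's length to find the next one."""
    entries = []
    offset = 0
    while offset < len(data):
        inode, length, file_type, name_len = HEADER.unpack_from(data, offset)
        if length == 0:
            break  # malformed entry; stop rather than loop forever
        start = offset + HEADER.size
        name = data[start:start + name_len].decode("utf-8")
        entries.append((inode, file_type, name))
        offset += length
    return entries


FILE, DIRECTORY = 1, 2  # hypothetical type codes
block = pack_entry(30, FILE, "foo") + pack_entry(45, DIRECTORY, "users")
print(parse_entries(block))  # [(30, 1, 'foo'), (45, 2, 'users')]
```

Storing the entry length explicitly is what lets a reader skip from one variable-length entry to the next without scanning the name.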
Linux hierarchical file naming

• Linux hierarchical
naming:
• /users/faculty/rama/foo
• Each part of the file
name corresponds to
an i-node; the i-nodes form
a tree-like
structure in which all but
the leaf nodes are
directory files (which
are also described by i-nodes)
• Each directory entry
contains a type which
indicates if it is a
directory or a data file
The big picture

• File descriptors and i-nodes


• File system mounting
• The OS boot sequence
Relation between file descriptors and i-nodes: Linux

• Before a file can be read, it must be opened:



int fd = open("foo.txt", O_RDONLY | O_CREAT, 0644); /* O_CREAT requires a mode argument */
• The open() system call returns an integer, a “file descriptor”, which
uniquely identifies the opened file

• The operating system
maintains two data
structures representing the
state of open files:
• the per-process file
descriptor table and
• the system-wide open file
table.
Relation between file descriptors and i-nodes: Linux

• When a file is opened:



int fd = open("foo.txt", O_RDONLY | O_CREAT, 0644); /* O_CREAT requires a mode argument */
a new entry is created in the open file table.

• A pointer to this entry is
stored in the process's file
descriptor table.
• The file descriptor table is a
simple array of pointers into
the open file table.
• The index into the file
descriptor table is the file
descriptor (fd)
• Each entry in the open file
table has a pointer to the
i-node of the corresponding
file.
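The relation between the two tables can be modelled in a few lines. This is a toy simulation, not kernel code: the class and function names are hypothetical, i-nodes are represented by plain strings, and descriptors are handed out in order rather than by the lowest-free-slot rule a real system uses.

```python
class OpenFileEntry:
    """One entry of the system-wide open file table."""

    def __init__(self, inode):
        self.inode = inode   # pointer to the in-memory i-node
        self.offset = 0      # current read/write position


class Process:
    def __init__(self):
        # Index = file descriptor, value = pointer into the open file table.
        self.fd_table = []


open_file_table = []  # system-wide


def sim_open(process, inode):
    """Create an open file table entry and return the new file descriptor."""
    entry = OpenFileEntry(inode)
    open_file_table.append(entry)
    process.fd_table.append(entry)
    return len(process.fd_table) - 1


shell = Process()
fd_a = sim_open(shell, "i-node of foo.txt")
fd_b = sim_open(shell, "i-node of bar.txt")

print(fd_a, fd_b)                    # 0 1
print(shell.fd_table[fd_b].inode)    # i-node of bar.txt
```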
Relation between file descriptors and i-nodes: Linux
Opening a file
• The open() system call passes a file name to the file system.
• The open() system call first searches the open-file table to see if the file
is already in use by another process.
• If it is, an entry is created into the process file descriptor table pointing
to the entry in the open-file table corresponding to the open file
• If the file is not already open, the directory structure is searched for the
given file name.
• Once the file is found, its i-node is copied into main memory, and an
entry in the open-file table is created pointing to the new i-node
• Next, an entry is made in the process file descriptor table, with a pointer to the
newly created entry in the open-file table
• The open() call returns the index of the appropriate entry in the file
descriptor table of the process (the file descriptor).
• All other file operations (read, write, etc.) are then performed via this
file descriptor.
Opening a file: example
• open("/usr/me/mailbox"), file not already open
• 1-find the entry “usr” in / and get the i-node number of the usr directory
• 2-load the i-node of the usr directory into main memory
• 3-find the blocks of the directory file usr on disk
• 4-load the directory usr into main memory
• 5-read usr until the entry of directory “me” is found, and read the number of the
corresponding i-node
• 6-find the i-node of the directory file “me” on disk and load it into main memory
• 6.1-read the directory file “me” into main memory
• 7-read the directory “me” until the string “mailbox” is found
• 8-read the i-node number of “mailbox” in the directory “me”
• 9-find the i-node on disk and load it into main memory
• 10-create an entry in the open file table and make it point to the i-node of
the mailbox file
• 11-create an entry in the file descriptor table and make it point to the
corresponding entry in the open file table
• 12-return the entry number in the file descriptor table to the calling process
• The system usually doesn’t need the absolute path, rather the relative
path from the current working directory is enough
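The lookup steps above amount to walking the path one component at a time. The sketch below models the disk as a dictionary from i-node numbers to i-nodes; the i-node numbers, the `INODES` table and the function `resolve` are hypothetical, chosen only to illustrate the walk.

```python
# Toy "disk": i-node number -> i-node. Directories map names to i-node numbers.
INODES = {
    2: {"type": "dir", "entries": {"usr": 6}},
    6: {"type": "dir", "entries": {"me": 26}},
    26: {"type": "dir", "entries": {"mailbox": 60}},
    60: {"type": "file", "data": b"hello"},
}
ROOT_INODE = 2  # assumed number of the root directory's i-node


def resolve(path):
    """Return the i-node number reached by following an absolute path."""
    inode_number = ROOT_INODE
    for component in [c for c in path.split("/") if c]:
        inode = INODES[inode_number]            # "load the i-node into memory"
        if inode["type"] != "dir":
            raise NotADirectoryError(component)
        if component not in inode["entries"]:   # search the directory file
            raise FileNotFoundError(path)
        inode_number = inode["entries"][component]
    return inode_number


print(resolve("/usr/me/mailbox"))  # 60
```

Starting the walk from the current working directory's i-node instead of `ROOT_INODE` gives relative path lookup.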
Multiple file systems
• A computer or an operating system could use different file
systems
• Windows may have a main NTFS file system, but also a legacy
FAT-32 or FAT-16 drive or partition that contains old, but still
needed, data, and from time to time a flash drive, an old CD-
ROM or a DVD
• Windows handles these disparate file systems by identifying
each one with a different drive letter, as in C:, D:, etc.
• When a process opens a file, the drive letter is explicitly or
implicitly present so Windows knows which file system to
pass the request to.
File system mounting
• Whether the OS is Unix/Linux or Windows, a file system must be
mounted before it can be available to processes
• The mount procedure is straightforward.
• The operating system is given the name of the device and the mount
point, i.e. the location within the file structure where the file system
is to be attached
• Typically, a mount point is an empty directory.
Mounting file systems
• The OS must be told of the
existence of file systems; this
operation is called mounting
• File systems are on a “device”,
for example a hard disk or a
USB key
• Mounting a file system
attaches that file system to a
directory (mount point) and
makes it available to the
system.
• The root (/) file system is
always mounted. Any other file
system can be connected to
the root
Mounting file
systems
• When mounting a file system, any
files or directories in the underlying
mount point directory become
unavailable as long as the file
system is mounted.
• Thus, mount directories are typically
empty, so as not to obscure
existing files.
• Here the file system of partition c is
mounted in /home/user1
• Note, the subdirectory Docs1 has
disappeared, this is what happens
when a file system is mounted in a
directory that is not empty
Mounting file systems
• Assume you want to access a
local file system from the /opt file
system, where the local file system
contains a set of unbundled items
• First, you must create a directory
to use as a mount point for the
file system you want to mount, for
example, /opt/unbundled

• Once the mount point is created,


you can mount the file system
using the command “mount”
Mounting:
another example

• Assume we want to mount
the file system on one of the
disk partitions.
• First a mount point must be
selected, i.e. a directory in
the file system.
• The side picture describes
the file systems on 3
partitions
Linux: mounting
a file system

• The file system of partition b
has been mounted in the
directory /home
Linux: mounting
a file system

• Here the file system of
partition c has been
mounted in /home/user1
• Note, the subdirectory Docs1
has disappeared, this is what
happens when a file system
is mounted in a directory
that is not empty
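Why the underlying Docs1 disappears can be seen from how a path is matched against the mount table. The sketch below is a simplified model: the table maps mount points to device names, the root device name "partition a" is an assumption, and `resolve_mount` is a hypothetical helper that picks the longest matching mount point.

```python
def resolve_mount(mount_table, path):
    """Return (device, path inside that file system) for an absolute path."""
    best = "/"
    for mount_point in mount_table:
        prefix = mount_point.rstrip("/") + "/"
        if (path == mount_point or path.startswith(prefix)) and len(mount_point) > len(best):
            best = mount_point  # keep the longest matching mount point
    relative = path[len(best.rstrip("/")):] or "/"
    return mount_table[best], relative


mount_table = {
    "/": "partition a",            # assumed root device
    "/home": "partition b",
    "/home/user1": "partition c",
}

print(resolve_mount(mount_table, "/home/user1/Docs1"))  # ('partition c', '/Docs1')
print(resolve_mount(mount_table, "/home/user2"))        # ('partition b', '/user2')
print(resolve_mount(mount_table, "/etc"))               # ('partition a', '/etc')
```

Every path under /home/user1 is now answered by partition c, so the Docs1 directory stored on partition b is no longer reachable until partition c is unmounted.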
Mounting a Linux file system: the device side
• Besides the mount point, the operating system must be given
the name of the device (partition, USB key, etc.)
• In Linux, the names of the devices are in /dev
• sda for the disk; sda1, sda2 for its partitions 1 and 2,
• sdb for a USB key
• cdrom
• fd for floppy disks
• To mount a USB key, the device /dev/sdb holding the file system is
combined with the mount point directory /mnt/media using:
sudo mount /dev/sdb /mnt/media
Automounting file systems
• File systems are not usually mounted manually. Rather, the OS kernel has
a table, the Virtual File System Table, that contains all the information needed to mount
file systems at boot time
• A system file is created when the OS is installed (in Linux, /etc/fstab) that
contains a list of file systems and how to mount them.
• The table has columns like
• Device: usually the given name of the mounted device (sda,usb,floppy)
• File System Type: shows the type of filesystem in use (ufs (unix file system), ntfs, FAT)
• Mount point: Directory where to mount
• Mount at boot: yes or no
Windows: mounting file system
• Windows–based systems mount each partition with a separate name,
denoted by a letter and a colon such as C:
• To record that a file system is mounted at C:, for example, the
operating system places a pointer to the file system in a field of the
mount table corresponding to C:.
• When a process specifies the drive letter, the operating system finds
the appropriate file-system pointer and traverses the directory
structures on that device to find the specified file or directory.
• One can find all the Windows partitions by right-clicking the Start button and
selecting Disk Management.
• There are 2 hidden partitions that hold information about the system boot files and the disk boot
sector in case of system failure. A hidden partition is a part of the hard disk that is not displayed or used
directly under normal conditions.
The booting sequence
• How does a computer
boot?
• Once you turn on
your computer, the
first thing the CPU
does is read and
execute a sequence
of instructions from
the BIOS, a program
on a small memory
chip sitting on the
motherboard
• Among these
instructions, some
request the CPU to
load the first sector
of the hard disk, the
master boot record
The booting sequence
• The end of the MBR contains the partition table. This table
gives the starting and ending addresses of each partition.
• One of the partitions in the table is marked as active.

• The first thing the MBR program does is locate the active
partition, read in its first block, which is called the boot block,
and execute it.
• The program in the boot block loads the operating system
contained in that partition
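Locating the active partition can be sketched by parsing the last part of the MBR sector. The sketch assumes the classic MBR layout: four 16-byte entries starting at byte 446, an entry status of 0x80 for the active partition, the first sector and sector count stored as little-endian 32-bit integers at bytes 8 to 15 of each entry, and the signature 0x55 0xAA at the end of the sector. The sample sector is synthetic.

```python
import struct

PARTITION_TABLE_OFFSET = 446
ENTRY_SIZE = 16
ACTIVE_FLAG = 0x80


def find_active_partition(mbr):
    """Return (entry index, first sector, sector count) of the active partition."""
    if len(mbr) != 512 or bytes(mbr[510:512]) != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    for i in range(4):
        start = PARTITION_TABLE_OFFSET + i * ENTRY_SIZE
        status = mbr[start]
        first_sector, sector_count = struct.unpack_from("<II", mbr, start + 8)
        if status == ACTIVE_FLAG:
            return i, first_sector, sector_count
    return None  # no partition is marked active


# Synthetic MBR: the second entry is active, starts at sector 2048, 409600 sectors long.
mbr = bytearray(512)
mbr[510:512] = b"\x55\xaa"
mbr[446 + 16:446 + 32] = struct.pack("<B3sB3sII", 0x80, b"\0\0\0", 0x83, b"\0\0\0", 2048, 409600)

active = find_active_partition(bytes(mbr))
print(active)  # (1, 2048, 409600)
```

The end of the partition follows from the first sector plus the sector count; the MBR program would then read the partition's first block (its boot block) and jump to it.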
Why disk partitioning
• On DOS and Microsoft Windows, a common practice is to use one
primary partition for the active file system that will contain the
operating system, the page/swap file, all utilities, applications, and
user data.
• On most Windows computers, the drive letter C: is routinely
assigned to this primary partition.
• On Unix-like operating systems it is possible to use multiple
partitions on a disk device. Multiple partitions allow directories such
as /boot, /tmp, /usr, /var, or /home to be allocated their own
filesystems. Such a scheme has a number of advantages:
• If one file system gets corrupted, the data outside that filesystem/partition
may stay intact, minimizing data loss.
• Specific file systems can be mounted with different parameters, e.g.,
read-only, or with the execution of setuid files disabled.
• Keeping user data such as documents separate from system files allows the
system to be updated with lessened risk of disturbing the data.
• A common minimal configuration for Linux systems is to use three
partitions: one holding the system files mounted on "/" (the
root directory), one holding user configuration files and data mounted
on /home (home directory), and a swap partition.
