SGI XFS Guide
007–4273–007
COPYRIGHT
© 2003-2004, 2012–2015 Silicon Graphics International Corp. All rights reserved; provided portions may be copyright in third parties,
as indicated elsewhere herein. No permission is granted to copy, distribute, or create derivative works from the contents of this
electronic documentation in any manner, in whole or in part, without the prior written permission of SGI.
Linux is a registered trademark of Linus Torvalds in the U.S. and other countries. All other trademarks mentioned herein are the
property of their respective owners.
New Features in This Guide
Record of Revision
Version Description
Contents
mkfs.xfs for a Directory Block Size Larger than Filesystem Block Size
Growing a Filesystem
4. Filesystem Maintenance
Filesystem Reorganization
Filesystem Corruption
Checking Filesystem Consistency
Overview of the Commands to Check Filesystem Consistency
xfs_repair -n Command Line
xfs_check Command Line
Repairing XFS Filesystem Problems
Repairing Inconsistent Filesystems with xfs_repair
Common xfs_repair Error Messages
xfs_repair Error Messages When Files Are in lost+found
What to Do If xfs_repair Cannot Repair a Filesystem
Mounting a Filesystem Without Log Recovery
Remounting an XFS Filesystem
5. Disk Quotas
Overview of Disk Quotas
Enabling Quotas
Enabling Quotas for Users
Enabling Quotas for Groups
Enabling Quotas for Projects
Setting Quota Limits
Setting Quota Limits for Users
Setting Quota Limits for Groups
XFS® Administrator Guide
panic_mask
Index
About This Guide
This guide tells you how to plan, create, and maintain XFS® filesystems on a system
running the Linux operating system.
Related Publications
For information about this release, see the SGI InfiniteStorage Software
Platform (ISSP) README.txt release note.
The following documents contain additional information:
• DMF 6 Administrator Guide
• CXFS 7 Client-Only Guide for SGI InfiniteStorage
• XVM Volume Manager Administrator Guide
• Linux Configuration and Operations Guide
• The user guide and quick start guide for your hardware
• NIS Administrator’s Guide
• Personal System Administration Guide
• Performance Co-Pilot for Linux User’s and Administrator’s Guide
• SGI L1 and L2 Controller Software User’s Guide
Obtaining Publications
You can obtain SGI documentation as follows:
• Log in to the SGI Customer Portal at http://support.sgi.com. Click the following:
Support by Product > productname > Documentation
If you do not find what you are looking for, click Search Knowledgebase, enter a
document-title keyword, select the category Documentation, and click Search.
• The /docs directory on the ISSP DVD or in the online download page contains
information about the release, such as the following:
– The ISSP release note: /docs/README.txt
– Other release notes: /docs/README_NAME.txt
– A complete list of the packages and their location on the media:
/docs/RPMS.txt
– The packages and their respective licenses: /docs/PACKAGE_LICENSES.txt
• The /docs directory on the SGI XFS & XVM media kit for RHEL CD or in the
online download page contains information about the release, such as the
following:
– The XFS & XVM media kit release note:
/docs/xfs_xvm-VERSION-reademe.txt
– A complete list of the packages and their location on the media:
/docs/xfs_xvm-VERSION-rpms.txt
– The packages and their respective licenses: /docs/PACKAGE_LICENSES.txt
• The ISSP release notes and manuals are provided in the noarch/sgi-isspdocs
RPM and will be installed on the system into the following location:
/usr/share/doc/packages/sgi-issp-VERSION/TITLE
• You can view man pages by typing man title at a command line.
Note: The external websites referred to in this guide were correct at the time of
publication, but are subject to change.
Conventions
The following conventions are used throughout this document:
Convention      Meaning
command         This fixed-space font denotes literal items such as
                commands, files, routines, path names, signals,
                messages, and programming language structures.
variable        Italic typeface denotes variable entries and words or
                concepts being defined.
user input      This bold, fixed-space font denotes literal items that the
                user enters in interactive sessions. (Output is shown in
                nonbold, fixed-space font.)
[]              Brackets enclose optional portions of a command or
                directive line.
Reader Comments
If you have comments about the technical accuracy, content, or organization of this
publication, contact SGI. Be sure to include the title and document number of the
publication with your comments. (Online, the document number is located in the
front matter of the publication. In printed publications, the document number is
located at the bottom of each page.)
You can contact SGI in any of the following ways:
• Send e-mail to the following address:
[email protected]
• Contact your customer service representative and ask that an incident be filed in
the SGI incident tracking system.
Chapter 1
Chapter 2
For XFS filesystems on disk partitions and logical volumes and for the data
subvolume of filesystems on logical volumes, the block size guidelines are as follows:
• The minimum block size is 512 bytes. Small block sizes increase allocation
overhead, which decreases filesystem performance. In general, the recommended
block size for filesystems under 100 MB and for filesystems with many small files
is 512 bytes. The filesystem block size must be a power of two.
• The default block size is 4096 bytes (4 KB). This is the recommended block size for
filesystems over 100 MB.
• The maximum block size is the page size of the kernel, which is 4 KB on x86
systems (both 32-bit and 64-bit) and is configurable on ia64 systems. Because large
block sizes can waste space, in general block sizes should not be larger than 4096
bytes (4 KB).
Block sizes are specified in bytes as follows:
• Decimal (default)
• Octal (prefixed by 0)
• Hexadecimal (prefixed by 0x or 0X)
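The guidelines above can be checked before a block size is passed to mkfs.xfs. The following shell sketch is illustrative only: the function name check_block_size is hypothetical, and the 4096-byte upper bound assumes the 4-KB kernel page size of x86 systems. It relies on shell arithmetic, which accepts the same decimal, octal (0 prefix), and hexadecimal (0x prefix) notations.

```shell
# check_block_size SIZE
# Prints "ok" if SIZE (decimal, octal with a leading 0, or hexadecimal with
# a leading 0x) is a power of two between 512 and 4096 bytes; otherwise
# prints "invalid". The 4096-byte maximum assumes a 4-KB kernel page size.
check_block_size() {
    size=$(( $1 ))
    if [ "$size" -ge 512 ] && [ "$size" -le 4096 ] &&
       [ $(( size & (size - 1) )) -eq 0 ]; then
        echo ok
    else
        echo invalid
    fi
}

check_block_size 4096      # decimal
check_block_size 010000    # octal notation for 4096
check_block_size 0x1000    # hexadecimal notation for 4096
check_block_size 3000      # not a power of two
```

The first three calls describe the same 4096-byte block size in the three notations; the last call is rejected because 3000 is not a power of two.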
2: Planning an XFS Filesystem
Log Size
The maximum log size for either an internal log or an external log is 2,136,997,888
bytes (that is, 10 MB less than 2 GB), which equates to 521728 4-KB blocks. In
addition, the size of an internal log cannot be larger than the AG size.
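The arithmetic behind these figures can be reproduced with a short shell sketch. The variable and function names are illustrative, not part of any XFS tool, and the sketch assumes a shell with 64-bit integer arithmetic. It covers only the byte limit and the 4-KB block case stated above; other block sizes may be subject to additional limits that are not modeled here.

```shell
# Maximum log size stated above: 2 GB minus 10 MB, in bytes.
max_log_bytes=$(( 2 * 1024 * 1024 * 1024 - 10 * 1024 * 1024 ))

# log_bytes_to_blocks BYTES BLOCKSIZE
# Prints how many whole filesystem blocks of BLOCKSIZE bytes fit in BYTES.
log_bytes_to_blocks() {
    echo $(( $1 / $2 ))
}

echo "$max_log_bytes"                        # 2136997888
log_bytes_to_blocks "$max_log_bytes" 4096    # 521728
```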
Note: Although it is possible to explicitly set the size by using the mkfs
command, it is much less reliable.
For a filesystem with very high transaction activity, SGI recommends using the
maximum log size.
Note the following:
• The larger the log, the more outstanding transactions that XFS can support.
• Using the maximum log size can increase the filesystem mount time after a crash.
• The amount of disk space required for log records is proportional to the
transaction rate and the size of transactions on the filesystem, not the size of the
filesystem. Larger block sizes result in larger transactions.
• Transactions from directory updates (for example, the mkdir and rmdir
commands and the create() and unlink() system calls) cause more log data
to be generated.
• The disk space dedicated to the log does not show up in listings from the df
command, nor can you access it with a filename.
– Maximum size:
-l size=521728b
• External log, where device is the location of the external log subvolume:
-l logdev=device
filesystem is larger than 64 GB, the default number of AGs is still greater than 8, but
the AG size is 4 GB.
XFS lets you select the stripe unit for a RAID device or stripe volume. This ensures
that data allocations, inode allocations, and the internal log will be aligned along
stripe units when the end-of-file is extended and the file size is larger than 512 KB.
You specify stripe units in 512-byte block units or in bytes. See the mkfs.xfs(8)
man page for information on specifying stripe units.
When you specify a stripe unit, you also specify a stripe width in 512-byte block units
or in bytes. The stripe width must be a multiple of the stripe unit. The stripe width
will be the preferred I/O size returned in the stat() system call. See the
mkfs.xfs(8) man page for information on specifying stripe width.
When used in conjunction with the -b (block size) option of the mkfs.xfs
command, you can use the -d su= and -d sw= options to specify the stripe unit and
stripe width, respectively, in filesystem blocks.
For a RAID device, the default stripe unit is 0, indicating that the feature is disabled.
You should configure the stripe unit and width sizes of RAID devices in order to
avoid unexpected performance anomalies caused by the filesystem doing non-optimal
I/O operations to the RAID unit. For example, if a block write is not aligned on a
RAID stripe unit boundary and is not a full stripe unit, the RAID will be forced to do
a read/modify/write cycle to write the data. This can have a significant performance
impact. By setting the stripe unit size properly, XFS will avoid unaligned accesses.
For a striped volume, the stripe unit that was specified when the volume was created
is provided by default.
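As a worked example of the stripe unit and stripe width relationship, the following shell sketch converts a RAID geometry into 512-byte units. It is only a sketch: the function name stripe_opts and the sample geometry (a 64-KB stripe unit across 4 data disks) are hypothetical, and the sunit and swidth suboptions of -d should be confirmed against the mkfs.xfs(8) man page before use.

```shell
# stripe_opts STRIPE_UNIT_BYTES DATA_DISKS
# Prints a "-d sunit=...,swidth=..." option string with both values in
# 512-byte units. The stripe width is the stripe unit times the number of
# data disks, so it is always a multiple of the stripe unit.
stripe_opts() {
    su_bytes=$1
    disks=$2
    if [ $(( su_bytes % 512 )) -ne 0 ]; then
        echo "stripe unit must be a multiple of 512 bytes" >&2
        return 1
    fi
    sunit=$(( su_bytes / 512 ))
    swidth=$(( sunit * disks ))
    echo "-d sunit=$sunit,swidth=$swidth"
}

stripe_opts 65536 4    # prints: -d sunit=128,swidth=512
```

Deriving the width from the unit, rather than entering both by hand, keeps the width a multiple of the unit as required.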
• If you plan to use logical volumes, you may want to repartition to create disk
partitions of equal size that can be striped or plexed.
Chapter 3
Caution: When you create a filesystem, all files already on the disk partition or
logical volume are destroyed.
Making a Filesystem
This section discusses the following:
• "Procedure to Make a Filesystem" on page 11
• "mkfs.xfs Using the Defaults" on page 13
3: Creating XFS Filesystems
4. Use the mkfs.xfs(8) command to make the filesystem. See the following
examples:
• "mkfs.xfs Using the Defaults" on page 13
• "mkfs.xfs Specifying Block and Log Size of Internal Log" on page 13
• "mkfs.xfs for a Logical Volume with a Log Subvolume" on page 14
• "mkfs.xfs for a Directory Block Size Larger than Filesystem Block Size" on
page 15
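5. Make a mount directory:
# mkdir -p mountdir
6. Mount the filesystem:
# mount partition mountdir
For example:
# mount /dev/sdc1 /mnt/scratch_space
7. To mount the filesystem automatically at boot, add an entry for it to /etc/fstab.
For example:
/dev/sdc1 /mnt/scratch_space xfs defaults 0 0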
Note: Do not run fsck for XFS filesystems listed in /etc/fstab that use XVM
devices (that is, you should set the fsck flag to 0), because XVM devices are not
always available. If an fsck is run on an XFS filesystem when XVM devices are not
available, the system may suspend the system boot sequence and require input from
the administrator. XVM includes a helper service that mounts all filesystems listed in
/etc/fstab that use XVM devices at the time XVM is started during the boot
sequence.
The following example shows the command line to create an XFS filesystem using the
defaults and system output:
# mkfs.xfs /dev/sdc1
meta-data=/dev/sdc1 isize=256 agcount=18, agsize=1048576 blks
data = bsize=4096 blocks=17921788, imaxpct=25
= sunit=0 swidth=0 blks, unwritten=0
naming =version 2 bsize=4096
log =internal log bsize=4096 blocks=2187, version=1
= sunit=0 blks
realtime =none extsz=65536 blocks=0, rtextents=0
blocksize is the filesystem block size (see "Choosing the Filesystem Block Size" on page
3), logsize is the size of the area dedicated to log records (see "Choosing the Log Type
and Size" on page 4), and partition is the device name or logical volume. The default
values are 4-KB blocks and a 1000-block log.
The following example shows the command line used to create an XFS filesystem and
the system output. The filesystem has a 10-MB internal log and a block size of 1 KB
and is on the partition /dev/sdc1.
# mkfs.xfs -b size=1k -l size=10m /dev/sdc1
meta-data=/dev/sdc1 isize=256 agcount=18, agsize=4194304 blks
data = bsize=1024 blocks=71687152, imaxpct=25
= sunit=0 swidth=0 blks, unwritten=0
naming =version 2 bsize=4096
log =internal log bsize=1024 blocks=10240, version=1
= sunit=0 blks
realtime =none extsz=65536 blocks=0, rtextents=0
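The log line in the output above reports blocks=10240, which is the 10-MB log size divided by the 1-KB block size. The following shell sketch reproduces that calculation. The helper names to_bytes and log_blocks are illustrative, and the sketch handles only the k, m, and g suffixes (as powers of 1024), not every suffix that mkfs.xfs accepts.

```shell
# to_bytes VALUE
# Converts a size with an optional k, m, or g suffix (powers of 1024) to bytes.
to_bytes() {
    case $1 in
        *k) echo $(( ${1%k} * 1024 )) ;;
        *m) echo $(( ${1%m} * 1024 * 1024 )) ;;
        *g) echo $(( ${1%g} * 1024 * 1024 * 1024 )) ;;
        *)  echo $(( $1 )) ;;
    esac
}

# log_blocks LOGSIZE BLOCKSIZE
# Prints the log size expressed in filesystem blocks.
log_blocks() {
    echo $(( $(to_bytes "$1") / $(to_bytes "$2") ))
}

log_blocks 10m 1k    # prints 10240
```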
mkfs.xfs for a Directory Block Size Larger than Filesystem Block Size
If you are making a filesystem with a directory block size that is larger than the
filesystem block size, use the following mkfs.xfs command to create the new XFS
filesystem:
# mkfs.xfs -b size=blocksize -n size=dirblocksize partition
dirblocksize is the directory block size (see "Choosing the Filesystem Directory Block
Size" on page 4).
For example:
# mkfs.xfs -b size=2k -n size=4k /dev/sdc1
meta-data=/dev/sdc1 isize=256 agcount=4, agsize=152867832 blks
= sectsz=512 attr=2
data = bsize=2048 blocks=611471327, imaxpct=5
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal log bsize=2048 blocks=298569, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
Growing a Filesystem
To grow an existing XFS filesystem, increase the available disk space and use the
xfs_growfs(8) command. The filesystem must be mounted to be grown. The
existing contents of the filesystem are undisturbed, and the added space becomes
available for additional file storage.
Growing an XFS filesystem is supported on XVM volumes. You must first grow the
XVM volume before growing the XFS filesystem. For information on XVM volumes,
see the XVM Volume Manager Administrator’s Guide.
The following example grows a filesystem mounted at /mnt:
# xfs_growfs /mnt
meta-data=/mnt isize=256 agcount=30, agsize=262144 blks
data = bsize=4096 blocks=7680000, imaxpct=25
= sunit=0 swidth=0 blks, unwritten=0
Chapter 4
Filesystem Maintenance
Filesystem Reorganization
Filesystems can become fragmented over time. When a filesystem is fragmented,
blocks of free space are small and files have many extents. The xfs_fsr command
reorganizes filesystems so that the layout of the extents is improved. This improves
overall performance. See the xfs_fsr(8) man page for more information.
Filesystem Corruption
Most often, a filesystem is corrupted because the system experienced a panic. This
can be caused by system software failure, hardware failure, or human error (for
example, pulling the plug). Another possible source of filesystem corruption is
overlapping partitions.
There is no foolproof way to predict hardware failure. The best way to avoid
hardware failures is to conscientiously follow recommended diagnostic and
maintenance procedures.
Human error is probably the greatest single cause of filesystem corruption. To avoid
problems, follow these rules closely:
• Always shut down the system properly. Do not simply turn off power to the
system. Use a standard system shutdown tool, such as the shutdown(8) command.
• Never remove a filesystem physically (never pull out a hard disk) without first
turning off power.
Note: The xfs_repair command without the -n option makes modifications and
should be used with caution; see "Repairing XFS Filesystem Problems" on page 21.
• xfs_check
The xfs_check command calls the checking routines of the general-purpose XFS
filesystem debugger xfs_db, which requires more memory and time to check a
filesystem than does xfs_repair -n. You can use xfs_check on filesystems
with extended attributes. (For more information about extended attributes, see the
attr(1) man page.)
The filesystem to be checked must have been unmounted cleanly using normal system
administration procedures (the umount command or system shutdown), not as a
result of a crash or system reset. If the filesystem has not been unmounted cleanly,
mount it and unmount it cleanly before running xfs_check or xfs_repair -n.
Unlike fsck, xfs_check and xfs_repair -n are not invoked automatically on
system startup. You should use these commands if you suspect a filesystem
consistency problem.
Caution: If you suspect problems with the root filesystem, you should use a boot disk
or an alternate root to run xfs_repair.
device is the device file for a disk partition or logical volume that contains an XFS
filesystem, such as /dev/xscsi/pci02.02.0-1/target3/lun0/part1
- agno = 0
- agno = 1
...
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
- traversing filesystem starting at / ...
- traversal finished ...
- traversing all unattached subtrees ...
- traversals finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
For information about potential errors, see "Common xfs_repair Error Messages"
on page 23.
For more details, see the xfs_repair(8) man page.
The xfs_repair command (without the -n option) checks XFS filesystem consistency and, if
problems are detected, also corrects them if possible. The filesystem to be checked
and repaired must have been unmounted cleanly using normal system administration
procedures (the umount command or system shutdown), not as a result of a crash or
system reset. If the filesystem has not been unmounted cleanly, mount it and
unmount it cleanly before running xfs_repair.
The command line for xfs_repair when you want it to repair any inconsistencies it
finds is:
# xfs_repair device
device is the disk or volume device for the filesystem. It must not be mounted.
The following example shows the output you see from running xfs_repair on a
clean filesystem:
# xfs_repair /dev/xscsi/pci02.02.0-1/target3/lun0/part1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
...
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- clear lost+found (if it exists) ...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
...
Phase 5 - rebuild AG headers and trees...
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- ensuring existence of lost+found directory
- traversing filesystem starting at / ...
- traversal finished ...
- traversing all unattached subtrees ...
- traversals finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done
There was something wrong with the inode that was not correctable,
so xfs_repair turned it into a zero-length free inode. This usually
happens because the inode claims blocks that are used by something
else or the inode itself is badly corrupted. Typically, the cleared
In this example, inode 242000 was an inode that was moved to lost+found during
a previous xfs_repair run. This run of xfs_repair found that the filesystem is
consistent. If the lost+found directory had been empty, in phase 4 only the
messages about clearing and deleting the lost+found directory would have
appeared. The imap claims and disconnected inode messages appear (one pair
of messages per inode) if there are inodes in the lost+found directory.
mounting the filesystem with the -o norecover option of the mount command.
This option mounts the filesystem without running log recovery. You must mount the
filesystem as read-only when you use this option.
Option      Description
ro          Read only
rw          Read-write
barrier     Barrier on
nobarrier   Barrier off
swalloc     Stripe allocation on
noalign     Stripe allocation off
For more information, see the mount(8) man page.
Chapter 5
Disk Quotas
You can also impose limits according to user ID, group ID, or project ID. You can
associate a directory in the filesystem hierarchy with a project ID by including it in
the /etc/projects file. (You can use /etc/projid to map each project name to
its number.) With project quotas in effect, such a directory and all files and directories
below it can be subjected to a quota, meaning that the aggregate resource used
thereunder is limited. For more information, see the xfs_quota(8) man page.
Note: Group quotas and project quotas are mutually exclusive per filesystem because
XFS records either the project ID or the group ID of a file in the same physical
location; how the number is interpreted depends upon whether project or group
quotas are in force.
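As an illustration of the two project files described above, the following shell sketch prints one entry for each. The project name projectA, the ID 10, and the directory /mnt/myxfs/projA are hypothetical values, and the helper function names are not part of any XFS tool. The entry layouts (ID:directory for /etc/projects and name:ID for /etc/projid) follow the xfs_quota(8) man page; check that page on your system before relying on them.

```shell
# Hypothetical values: project "projectA" with ID 10, covering /mnt/myxfs/projA.
project_name=projectA
project_id=10
project_dir=/mnt/myxfs/projA

# projects_entry ID DIR: prints a line in /etc/projects form (ID:directory).
projects_entry() {
    echo "$1:$2"
}

# projid_entry NAME ID: prints a line in /etc/projid form (name:ID).
projid_entry() {
    echo "$1:$2"
}

projects_entry "$project_id" "$project_dir"    # 10:/mnt/myxfs/projA
projid_entry "$project_name" "$project_id"     # projectA:10

# As superuser, the printed lines would be appended to the two files, e.g.:
#   projects_entry "$project_id" "$project_dir" >> /etc/projects
#   projid_entry "$project_name" "$project_id" >> /etc/projid
```

The sketch only prints the entries so that they can be reviewed first; the commented lines show how they might then be appended.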
Disk quotas can be used to do disk usage accounting. Disk usage accounting
monitors disk usage, but does not enforce disk usage limits. See "Monitoring Disk
Space Usage with Quota Accounting" on page 34 for more information.
You must first turn on disk quotas on a filesystem; you can then set quotas on that
filesystem for individual users and for projects or groups.
For more details about disk quotas, see the quotas(4) man page.
Enabling Quotas
This section discusses the following:
• "Enabling Quotas for Users" on page 29
• "Enabling Quotas for Groups" on page 29
• "Enabling Quotas for Projects" on page 30
• To turn on disk quotas manually for users on a non-root filesystem, mount the
filesystem with this command:
# mount -o quota fsname rootdir
fsname is the device name of the filesystem, and rootdir is the directory where the
filesystem is mounted.
• To turn on disk quotas for users on the root filesystem, you must pass the quota
mount options into the kernel at boot time through the Linux rootflags boot
option. The following example adds the rootflags=quota option to the append
line in elilo.conf:
append="root=/dev/xscsi/pci00.01.0-1/target0/lun0/part3 rootflags=quota"
• To turn on disk quotas manually for groups on a non-root filesystem, mount the
filesystem with this command:
# mount -o gquota fsname rootdir
fsname is the device name of the filesystem, and rootdir is the directory where the
filesystem is mounted.
• To turn on disk quotas for groups on the root filesystem, you must pass the quota
mount options into the kernel at boot time through the Linux rootflags boot
option. The following example adds the rootflags=gquota option to the
append line in elilo.conf:
append="root=/dev/xscsi/pci00.01.0-1/target0/lun0/part3 rootflags=gquota"
Note: Group and project quotas are mutually exclusive per filesystem.
• To turn on disk quotas manually for projects on a non-root filesystem, mount the
filesystem with this command:
# mount -o prjquota fsname rootdir
fsname is the device name of the filesystem, and rootdir is the directory where the
filesystem is mounted.
• To turn on disk quotas for projects on the root filesystem, you must pass the quota
mount options into the kernel at boot time through the Linux rootflags boot
option. The following example adds the rootflags=prjquota option to the
append line in elilo.conf:
append="root=/dev/xscsi/pci00.01.0-1/target0/lun0/part3 rootflags=prjquota"
Note: Group and project quotas are mutually exclusive per filesystem.
where:
• N is a soft or hard limit for disk usage in blocks of the specified unit: k (kilobytes),
m (megabytes), g (gigabytes), or t (terabytes)
• user is a user name or numeric user ID
• rootdir is the mount point of the XFS filesystem.
For example, to set limits for user userA on /mnt/myxfs using a soft limit of 5
Mbytes and a hard limit of 6 Mbytes:
# xfs_quota -x -c 'limit -u bsoft=5m bhard=6m userA' /mnt/myxfs
where:
• N is a soft or hard limit for disk usage in blocks of the specified unit: k (kilobytes),
m (megabytes), g (gigabytes), or t (terabytes)
• group is a group name or numeric group ID
• rootdir is the mount point of the XFS filesystem.
For example, to set limits for group groupA on /mnt/myxfs using a soft limit of 5
Mbytes and a hard limit of 6 Mbytes:
# xfs_quota -x -c 'limit -g bsoft=5m bhard=6m groupA' /mnt/myxfs
Note: Group and project quotas are mutually exclusive per filesystem.
where:
• N is a soft or hard limit for disk usage in blocks of the specified unit: k (kilobytes),
m (megabytes), g (gigabytes), or t (terabytes)
• project is a project name or numeric project ID
• rootdir is the mount point of the XFS filesystem.
For example, to set limits for project projectA on /mnt/myxfs using a soft limit of
5 Mbytes and a hard limit of 6 Mbytes:
# xfs_quota -x -c 'limit -p bsoft=5m bhard=6m projectA' /mnt/myxfs
For more information about projects, see the xfs_quota(8) man page.
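The user, group, and project examples in this section differ only in the type flag (-u, -g, or -p) and the name. The following shell sketch builds the command line for any of the three without running it. The function name quota_limit_cmd is hypothetical, and the sketch covers only the bsoft and bhard limits used in the examples.

```shell
# quota_limit_cmd TYPE SOFT HARD NAME MOUNTPOINT
# TYPE is u (user), g (group), or p (project). Prints the xfs_quota command
# line without running it.
quota_limit_cmd() {
    case $1 in
        u|g|p) ;;
        *) echo "type must be u, g, or p" >&2; return 1 ;;
    esac
    echo "xfs_quota -x -c 'limit -$1 bsoft=$2 bhard=$3 $4' $5"
}

quota_limit_cmd u 5m 6m userA /mnt/myxfs
quota_limit_cmd p 5m 6m projectA /mnt/myxfs
```

Printing the command first makes it easy to review the limits before running the command as superuser.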
Administering Quotas
If the filesystem being dumped contains quotas, xfsdump will use xfs_quota(8) to
store the quotas in the following files in the root of the filesystem to be dumped:
xfsdump_quotas User quotas
xfsdump_quotas_group Group quotas
These files will then be included in the dump. These files will appear only for those
quotas that are enabled on the filesystem being dumped. Upon restoration, you can
use xfs_quota to reactivate the quotas for the filesystem.
Note: The xfsdump_quotas file will probably require modification to change the
filesystem or UIDs if the filesystem has been restored to a different partition or system.
• To create a file that lists the current quota limits of all the filesystems for groups,
enter this command as superuser:
# xfs_quota -x -c 'report -g -f quotafile'
fsname is the device name of the filesystem, and rootdir is the directory where the
filesystem is mounted.
• To turn on disk usage accounting manually on the root filesystem for user quotas,
execute the following commands. The quotaon command turns on disk
accounting with enforcement, and the quotaoff -o command turns off the
enforcement:
# quotaon -v /
# quotaoff -v -o enforce /
# reboot
• To turn on disk usage accounting manually on the root filesystem (/) for group
quotas:
# quotaon -v -o gquota /
# quotaoff -v -o gqenforce /
# reboot
• To get information about disk usage, use the commands described in "Checking
Disk Space Usage" on page 35.
Chapter 6
6: Backup and Recovery Procedures
• File types:
Regular
Directory
Symbolic link
Block and character special
FIFO
Socket
xfsdump and xfsrestore retain hard links. xfsdump does not affect the state of
the filesystem being dumped (for example, access times are retained). xfsrestore
detects and bypasses media errors and recovers rapidly after encountering them.
xfsdump does not cross mount points, local or remote.
xfsdump optionally prompts for additional media when the end of the current media
is reached. Operator estimates of media capacity are not required and xfsdump also
supports automated backups. xfsdump maintains an extensive online inventory of all
dumps performed. Inventory contents can be viewed through various filters to
quickly locate specific dump information. xfsrestore supports interactive
operation, allowing selection of individual files or directories for recovery. It also
permits selection from among backups performed at different times when multiple
dumps are available. Dump contents may also be viewed noninteractively.
Note: If you are using disk quotas on XFS filesystems, see Chapter 5, "Disk Quotas".
[Figure: media files of a dump. Labels: media files, data, inventory, terminator.]
You can also dump data streams that are larger than a single media object. The data
stream can be broken between any two media files including data segment boundaries.
(The inventory is never broken into segments.) In addition, if you specify multiple
drives, the dump is automatically broken into multiple streams. The xfsdump utility
prompts for a new media object when the end of the current media object is reached.
Figure 6-2 illustrates the data layout of a single dump session that requires two media
objects on each of two devices.
[Figure 6-2: a single dump session that requires two media objects. Labels: data segments, inventory, terminator, media object 1, media object 2.]
The xfsdump utility also accommodates multiple dumps on a single media object.
When dumping to tape, for example, the tape is automatically advanced past the
existing dump sessions and the existing stream terminator is erased. The new dump
data is then written, followed by the new stream terminator. (For drives that do not
permit termination to operate in this way, other means are used to achieve the same
effective result.)
Figure 6-3 illustrates the layout of media files for two dumps on a single media object.
Figure 6-4 illustrates a case in which multiple dumps use multiple media objects. If
media files already exist on the additional media objects, the xfsdump utility finds the
existing stream terminator, erases it, and begins writing the new dump data stream.
[Figure 6-3: media files for two dumps on a single media object. Labels: first dump (data segments, inventory), former terminator location, second dump (data segments, inventory), terminator.]
[Figure 6-4: multiple dumps on multiple media objects. Labels: first dump (data segments, inventory), former terminator location, second dump (data segments, inventory), terminator, media object 1, media object 2.]
xfsdump Syntax
You must be the superuser to use xfsdump. To display a summary of xfsdump
syntax, use the -h option:
# xfsdump -h
xfsdump: version X.X
xfsdump: usage: xfsdump [ -b <blocksize> (with minimal rmt option) ]
[ -c <media change alert program> ]
[ -f <destination> ... ]
[ -h (help) ]
[ -l <level> ]
[ -m <force usage of minimal rmt> ]
[ -o <overwrite tape > ]
[ -p <seconds between progress reports> ]
[ -s <subtree> ... ]
[ -v <verbosity {silent, verbose, trace}> ]
[ -A (don’t dump extended file attributes) ]
[ -B <base dump session id> ]
[ -E (pre-erase media) ]
[ -F (don’t prompt) ]
[ -I (display dump inventory) ]
Note: The dump level does not need to be specified for a level–0 dump. For a
discussion of dump levels, see "About Incremental and Resumed Dumps" on page 50.
In this case, a session label (-L option) and a media label (-M option) are supplied,
and the entire filesystem is dumped. Since no verbosity option is supplied, the default
of verbose is used, resulting in the detailed screen output. The dump inventory is
updated with the record of this backup because the -J option is not specified.
Following is an example of a backup of a subdirectory of a filesystem. In the
following example, the verbosity is set to silent, and the dump inventory is not
updated (-J option):
# xfsdump -f /dev/tape -v silent -J -s people/fred /usr
Note: For remote backups, use the variable block size tape device if the device
supports variable block size operation; otherwise, use the fixed block size device. For
more information, see intro(7).
In this case, /disk2/engr is backed up to the variable block size tape device on the
remote system magnolia. Existing dumps on the tape mounted on magnolia were
skipped before recording the new data.
Note: The superuser account on the local system must be able to rsh to the remote
system without a password. For more information, see hosts.equiv(4).
Backing Up to a File
You can back up data to a file instead of a device. In the following example, a file
(Makefile) and a directory (Source) are backed up to a dump file
(monday_backup) in /usr/tmp on the local system:
# xfsdump -f /usr/tmp/monday_backup -v silent -J -s \
people/fred/Makefile -s people/fred/Source /usr
You may also dump to a file on a remote system, but the file must be in the remote
system’s /dev directory. For example, the following command backs up the
/usr/people/fred subdirectory on the local system to the regular
file /dev/fred_mon_12-2 on the remote system theduke:
Alternatively, you could dump to any remote file if that file is on an NFS-mounted
filesystem. In any case, permission settings on the remote system must allow you to
write to the file.
For information on using the standard input and standard output capabilities of
xfsdump and xfsrestore to pipe data between filesystems or across the network,
see "Using xfsdump and xfsrestore to Copy Filesystems" on page 68.
Reusing Tapes
When you use a new tape as the media object of a dump session, xfsdump begins
writing dump data at the beginning of the tape without prompting. If the tape
already has dump data on it, xfsdump begins writing data after the last dump
stream, again without prompting.
If, however, the tape contains data that is not from a dump session, xfsdump
prompts you before continuing:
# xfsdump -f /dev/tape /test
xfsdump: version X.X - type ^C for status and control
xfsdump: dump date: Fri Dec 2 11:25:19 1994
xfsdump: level 0 dump
xfsdump: session id: d23cc072-b21d-1001-8f97-080069068eeb
xfsdump: preparing tape drive
xfsdump: this tape contains data that is not part of an XFS dump
xfsdump: do you want to overwrite this tape?
type y to overwrite, n to change tapes or abort (y/n):
You must answer y if you want to continue with the dump session, or n to quit. If
you answer y, the dump session resumes and the tape is overwritten. If you do not
respond to the prompt, the session eventually times out.
Note: This means that an automatic backup, for example one initiated by a crontab
entry, will not succeed unless you specified the -F option with the xfsdump
command, which forces it to overwrite the tape rather than prompt for approval.
Caution: This erases all data on the tape, including any dump sessions.
The tape can now be used by xfsdump without prompting for approval.
always backs up the complete filesystem. A dump level of any other number backs
up all files that have changed since a dump with a lower dump level number.
For example, if you perform a level–2 backup on a filesystem one day and your next
dump is a level–3 backup, only those files that have changed since the level–2 backup
are dumped with the level–3 backup. In this case, the level–2 backup is called the base
dump for the level–3 backup. The base dump is the most recent backup of that
filesystem with a lower dump level number.
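The base-dump rule above can be sketched in a few lines of shell. This is an illustration only: the function name base_dump_index is hypothetical, and the sketch works on a plain list of dump levels rather than on the real dump inventory.

```shell
# base_dump_index NEW_LEVEL LEVEL...
# LEVEL... is the dump history for one filesystem, oldest first.
# Prints the 1-based position of the most recent earlier dump whose level is
# lower than NEW_LEVEL, or 0 if there is none (for example, a level-0 dump).
base_dump_index() (
    new_level=$1; shift
    pos=0 base=0
    for level in "$@"; do
        pos=$((pos + 1))
        # A later dump with a lower level replaces an earlier candidate.
        if [ "$level" -lt "$new_level" ]; then base=$pos; fi
    done
    echo "$base"
)

# History: a level-0 dump, then a level-2 dump; the new dump is level 3.
base_dump_index 3 0 2
```

In the example call, the level–2 dump (position 2) is reported as the base dump for the level–3 dump, matching the case described above.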
Resumed dumps work in much the same way. When a dump is resumed after it has
been interrupted, the remaining files that had been scheduled to be backed up during
the interrupted dump session are backed up, and any files that changed during the
interruption are also backed up.
A week later, a level–1 dump of the filesystem is performed on the same tape:
# xfsdump -f /dev/tape -l 1 -L week_2 /usr
The tape is forwarded past the existing dump data and the new data from the level–1
dump is written after it. (Note that it is not necessary to specify the media label for
each successive dump on a media object.)
A week later, a level–2 dump is taken, and so on for the four weeks of the month in
this example, with the fourth week being a level–3 dump (dump levels 0 through 9 are
supported):
# xfsdump -f /dev/tape -l 2 -L week_3 /usr
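The weekly schedule in this example can be expressed as a small helper that maps the week of the month to a dump level and session label. This is a sketch under stated assumptions: the function name weekly_dump_command is hypothetical, the week_N label pattern is taken from the commands shown here, and the function only prints the command instead of running it.

```shell
# weekly_dump_command WEEK DEVICE FILESYSTEM
# WEEK is 1 through 4. Week 1 maps to level 0, week 4 to level 3.
weekly_dump_command() (
    week=$1 device=$2 filesystem=$3
    level=$((week - 1))
    echo "xfsdump -f $device -l $level -L week_$week $filesystem"
)

weekly_dump_command 2 /dev/tape /usr
weekly_dump_command 3 /dev/tape /usr
```

The two example calls print the same level–1 and level–2 command lines used above. A first-week (level–0) dump would normally also supply a media label, which this sketch omits.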
You can later continue the dump by including the -R option and a different session
label:
# xfsdump -f /dev/tape -R -L week_1.contd -v silent /disk2p
Any files that were not backed up before the interruption, and any file changes that
were made during the interruption, are backed up after the dump is resumed.
Note: Use of the -R option requires that the dump was made with a dump inventory
taken, that is, the -J option was not used with xfsdump.
The dump inventory records are presented sequentially and are indented to illustrate
the hierarchical order of the dump information.
You can view a subset of the dump inventory by specifying the level of depth (1, 2, or
3) that you want to view. For example, specifying depth=2 filters out a lot of the
specific dump information, as you can see by comparing the previous output with the
following:
# xfsdump -I depth=2
file system 0:
fs id: d23cb450-b21d-1001-8f97-080069068eeb
session 0:
mount point: magnolia.abc.xyz.com:/test
device: magnolia.abc.xyz.com:/dev/rdsk/dks0d3s2
time: Mon Nov 28 11:44:04 1994
session label: ""
session id: d23cbf44-b21d-1001-8f97-080069068eeb
level: 0
resumed: NO
subtree: NO
streams: 1
session 1:
mount point: magnolia.abc.xyz.com:/test
device: magnolia.abc.xyz.com:/dev/rdsk/dks0d3s2
...
You can also view a filesystem-specific inventory by specifying the filesystem mount
point with the mnt option. The following output shows an example of a dump
inventory display in which the depth is set to 1, and only a single filesystem is
displayed:
# xfsdump -I depth=1,mnt=magnolia.abc.xyz.com:/test
filesystem 0:
fs id: d23cb450-b21d-1001-8f97-080069068eeb
You can also look at a list of contents on the dump media itself by using the -t option
with xfsrestore. See "Displaying the Contents of the Dump Media with
xfsrestore" on page 57.
About xfsrestore
This section discusses the following:
• "xfsrestore Syntax" on page 55
• "Displaying the Contents of the Dump Media with xfsrestore" on page 57
• "Performing Simple Restores with xfsrestore" on page 58
• "Restoring Individual Files with xfsrestore" on page 60
xfsrestore Syntax
You can use the xfsrestore command to view and extract data from the dump data
created by xfsdump.
You can get a summary of xfsrestore syntax with the -h option:
# xfsrestore -h
xfsrestore: version X.X
xfsrestore: usage: xfsrestore [ -a <alt. workspace dir> ... ]
[ -e (don’t overwrite existing files) ]
[ -f <source> ... ]
[ -h (help) ]
[ -i (interactive) ]
[ -n <file> (restore only if newer than) ]
[ -o (restore owner/group even if not root) ]
[ -p <seconds between progress reports> ]
[ -r (cumulative restore) ]
[ -s <subtree> ... ]
[ -t (contents only) ]
[ -v <verbosity {silent, verbose, trace}> ]
[ -A (don’t restore extended file attributes) ]
[ -C (check tape record checksums) ]
[ -D (restore DMAPI event settings) ]
[ -E (don’t overwrite if changed) ]
[ -F (don’t prompt) ]
[ -I (display dump inventory) ]
[ -J (inhibit inventory update) ]
[ -L <session label> ]
[ -N (timestamp messages) ]
[ -O <options file> ]
Use xfsrestore to restore data backed up with xfsdump. You can restore files,
subdirectories, and filesystems regardless of the way they were backed up. For
example, if you back up an entire filesystem in a single dump, you can select
individual files and subdirectories from within that filesystem to restore.
You can use xfsrestore interactively or noninteractively. With interactive mode,
you can peruse the filesystem or files backed up, selecting those you want to restore.
In noninteractive operation, a single command line can restore selected files and
subdirectories, or an entire filesystem. You can restore data to its original filesystem
location or any other location in an XFS filesystem.
By using successive invocations of xfsrestore, you can restore incremental dumps
on a base dump. This restores data in the same sequence it was dumped.
hostname: cumulus
mount point: /disk2
volume: /dev/rdsk/dks0d2s0
session time: Wed Oct 25 16:59:00 1995
level: 0
session label: "tape1"
media label: "media1"
file system id: d2a602fc-b21d-1001-8938-08006906dc5c
session id: d2a61284-b21d-1001-8938-08006906dc5c
media id: d2a61285-b21d-1001-8938-08006906dc5c
In this case, xfsrestore went to the first dump on the tape and asked if this was
the dump to restore. If you had entered 1 for “skip,” xfsrestore would have
proceeded to the next dump on the tape (if there was one) and asked if this was the
dump you wanted to restore.
You can request a specific dump if you used xfsdump with a session label. For
example:
# xfsrestore -f /dev/tape -L Wed_11_23 /usr
xfsrestore: version X.X - type ^C for status and control
xfsrestore: preparing tape drive
xfsrestore: dump session found
xfsrestore: advancing tape to next media file
xfsrestore: dump session found
xfsrestore: restore of level 0 dump of magnolia.abc.xyz.com:/usr created Wed Nov 23 11:17:54 1994
xfsrestore: beginning media file
xfsrestore: reading ino map
xfsrestore: initializing the map tree
xfsrestore: reading the directory hierarchy
xfsrestore: restoring non-directory files
xfsrestore: ending media file
xfsrestore: restoring directory attributes
xfsrestore: restore complete: 200 seconds elapsed
In this way you recover a dump with a single command line and do not have to
answer y or n to the prompts asking you if the dump session found is the correct
one. To be even more exact, use the -S option and specify the unique session ID of
the particular dump session:
# xfsrestore -f /dev/tape -S \
d23cbf47-b21d-1001-8f97-080069068eeb /usr2/tmp
xfsrestore: version X.X - type ^C for status and control
xfsrestore: preparing tape drive
xfsrestore: dump session found
xfsrestore: advancing tape to next media file
xfsrestore: advancing tape to next media file
xfsrestore: dump session found
xfsrestore: restore of level 0 dump of magnolia.abc.xyz.com:/test resumed Mon Nov 28 11:50:41 1994
xfsrestore: beginning media file
xfsrestore: media file 0 (media 0, file 2)
xfsrestore: reading ino map
xfsrestore: initializing the map tree
xfsrestore: reading the directory hierarchy
You can find the session ID by viewing the dump inventory (see "Examining xfsdump
Archives" on page 53). Session labels might be duplicated, but session IDs never are.
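Because session IDs are unique, it can be convenient to pull them out of the inventory text before choosing a value for the -S option. The following is a minimal sketch, assuming the inventory layout shown in the xfsdump -I examples in this chapter; the function name list_sessions is hypothetical.

```shell
# list_sessions: read "xfsdump -I" style text on standard input and print one
# line per session: the session ID, a tab, and the session label.
list_sessions() {
    awk '
        /session label:/ { sub(/^[^:]*:[ \t]*/, ""); label = $0 }
        /session id:/    { sub(/^[^:]*:[ \t]*/, ""); print $0 "\t" label }
    '
}

# Example input copied from the depth=2 inventory display.
list_sessions <<'EOF'
session 0:
    mount point: magnolia.abc.xyz.com:/test
    session label: ""
    session id: d23cbf44-b21d-1001-8f97-080069068eeb
EOF
```

On a system with a dump inventory, the same function could read from a pipe, for example xfsdump -I depth=2 | list_sessions.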
You can also restore a file "in place"; that is, restore it directly to where it came
from in the original backup.
Note: If you do not use the -e, -E, or -n option, you will overwrite any existing
files of the same name.
xfsrestore: restore of level 0 dump of magnolia.abc.xyz.com:/usr2 created Tue Dec 6 10:55:17 1994
xfsrestore: beginning media file
xfsrestore: media file 0 (media 0, file 1)
xfsrestore: reading ino map
xfsrestore: initializing the map tree
xfsrestore: reading the directory hierarchy
xfsrestore: restoring non-directory files
xfsrestore: ending media file
xfsrestore: restoring directory attributes
xfsrestore: restore complete: 203 seconds elapsed
In this case, the dump data is extracted from the tape on magnolia, and the
destination is the directory /usr2 on the local system. For an example of using the
standard input option of xfsrestore, see "Using xfsdump and xfsrestore to
Copy Filesystems" on page 68.
Note: Interactive restore is not allowed when the xfsrestore source is standard
input (stdin).
In the interactive restore session above, the subdirectory people/fred and the file
two were restored relative to the current working directory (“.”). An asterisk (*) in
your ls output indicates your selections.
In the following example, the level–0 base dump and succeeding higher-level dumps
are on /dev/tape. First the level-0 dump is restored, then each higher-level dump
in succession:
# /usr/tmp/xfsrestore -f /dev/tape -r -v silent .
hostname: cumulus
mount point: /disk2
volume: /dev/rdsk/dks0d2s0
session time: Wed Oct 25 14:37:47 1995
level: 0
session label: "week_1"
media label: "Jun_94"
file system id: d2a602fc-b21d-1001-8938-08006906dc5c
session id: d2a60b26-b21d-1001-8938-08006906dc5c
media id: d2a60b27-b21d-1001-8938-08006906dc5c
Next, enter the same command again. The program goes to the next dump and again
you select the default:
# xfsrestore -f /dev/tape -r -v silent .
hostname: cumulus
mount point: /disk2
volume: /dev/rdsk/dks0d2s0
session time: Wed Oct 25 14:40:54 1995
level: 1
session label: "week_2"
media label: "Jun_94"
file system id: d2a602fc-b21d-1001-8938-08006906dc5c
session id: d2a60b2b-b21d-1001-8938-08006906dc5c
media id: d2a60b27-b21d-1001-8938-08006906dc5c
You then repeat this process until you have recovered the entire sequence of
incremental dumps. The full and latest copy of the filesystem will then have been
restored. In this case, it is restored relative to “.”, that is, in the directory you are in
when the sequence of xfsrestore commands is issued.
Restore an interrupted dump just as if it were an incremental dump. Use the -r
option to inform xfsrestore that you are performing an incremental restore, and
answer y and n appropriately to select the proper “increments” to restore (see
"Performing Cumulative Restores with xfsrestore" on page 62).
From this it can be determined that session 0 was interrupted and then resumed and
completed in session 1.
To restore the interrupted dump session in the example above, use the following
sequence of commands:
# xfsrestore -f /dev/tape -r -L 180894usr .
# xfsrestore -f /dev/tape -r -L Resumed180894usr .
This restores the entire /usr backup relative to the current directory. (You should
remove the housekeeping directory from the destination directory when you are
finished.)
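When several sessions must be restored in order, the sequence of commands above can be generated from a list of session labels. The following is a sketch only: the function name restore_sequence is hypothetical, and it prints the commands rather than running them so that the order can be reviewed first.

```shell
# restore_sequence DEVICE DESTINATION LABEL...
# Prints one cumulative (-r) xfsrestore command per session label, in the
# order given. The labels must be listed in the order the dumps were made.
restore_sequence() (
    device=$1 dest=$2; shift 2
    for label in "$@"; do
        echo "xfsrestore -f $device -r -L $label $dest"
    done
)

restore_sequence /dev/tape . 180894usr Resumed180894usr
```

The example call prints the same two commands shown above for the interrupted and resumed sessions.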
Interrupting xfsrestore
In a manner similar to xfsdump interruptions, you can interrupt an xfsrestore
session. This allows you to interrupt a restore session and then resume it later. To
interrupt a restore session, type the interrupt character (typically <CTRL-C>). You
receive a list of options, which include interrupting the session or continuing.
# xfsrestore -f /dev/tape -v silent /disk2
hostname: cumulus
mount point: /disk2
volume: /dev/rdsk/dks0d2s0
session time: Wed Oct 25 17:20:16 1995
level: 0
session label: "week1"
media label: "newtape"
file system id: d2a602fc-b21d-1001-8938-08006906dc5c
session id: d2a6129e-b21d-1001-8938-08006906dc5c
media id: d2a6129f-b21d-1001-8938-08006906dc5c
please confirm
1: interrupt this session
2: continue (default) (timeout in 60 sec)
-> 1
interrupt request accepted
removed during the process of performing the cumulative recovery but should be
removed after the cumulative recovery is completed.
• orphanage is created if a file or subdirectory is restored that is not referenced in
the filesystem structure of the dump. For example, if you dump a very active
filesystem, it is possible for new files to be in the non-directory portion of the
dump, yet none of the directories dumped reference that file. A warning message
is displayed, and the file is placed in the orphanage directory, named with its
original inode number and generation count (for example, 123479.14).
Note: The superuser account on the local system must be able to rsh to the remote
system without a password. For more information, see hosts.equiv(4).
Chapter 7
For example, if you have six AGs and two concats, you would use a value of 4:
(6/2) + 1 = 4
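The arithmetic in this example can be written as a one-line shell function. This is a sketch that assumes the example generalizes as (number of AGs / number of concats) + 1 with integer division; the function name agskip_value is hypothetical.

```shell
# agskip_value AG_COUNT CONCAT_COUNT
# Prints (AG_COUNT / CONCAT_COUNT) + 1 using integer division.
agskip_value() (
    ag_count=$1 concat_count=$2
    echo $(( ag_count / concat_count + 1 ))
)

agskip_value 6 2
```

For six AGs and two concats, the call prints 4, the value used in the example.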
Note: The agskip mount option disables the rotorstep system tunable parameter.
7: Enhanced XFS Extensions
ibound Purpose
The ibound mount option specifies the location of the metadata region, which holds
the metadata for extended attributes, directory entries, and inodes. The remainder
of the filesystem is known as the user-extents region.
If you first create a filesystem on a volume that concatenates a slice of solid-state
drive (SSD) media with rotating hard-disk drive (HDD) media, you can then use the
ibound mount option to restrict metadata to the SSD media at the beginning of that
filesystem. The result will be operations that take place on media with the
appropriate characteristics:
• Small latency-sensitive metadata operations in the metadata region on fast SSD
media
• Large bandwidth-demanding and capacity-intensive user data operations in the
user-extents region on HDD media
region (for practical purposes, you will normally want more AGs in the user-extents
region). If this requirement is not met, the option is ignored. See "When ibound is
Ignored" on page 75.
There should be at least 8 AGs in the metadata region on the SSD. For example, to
create 8 AGs, you would set the AG size using the mkfs.xfs agsize option so that
it is 1/8 the size of the SSD.
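The AG size for this layout is a simple division. The sketch below assumes that the SSD size is known in 512-byte sectors and that no additional alignment is required; the function name agsize_sectors and the SSD size in the example are hypothetical.

```shell
# agsize_sectors SSD_SECTORS [AG_COUNT]
# Prints the AG size, in sectors, that divides the SSD into AG_COUNT
# allocation groups (8 by default).
agsize_sectors() (
    ssd_sectors=$1 ag_count=${2:-8}
    echo $(( ssd_sectors / ag_count ))
)

# Hypothetical SSD of 156022912 sectors.
agsize_sectors 156022912
```

The result could then be passed to mkfs.xfs with a sector suffix, in the form -d agsize=<value>s.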
Note: The configuration rules for filesystems using ibound may result in an XFS
filesystem with thousands of AGs. With this many AGs, XFS will consume more CPU
resources searching for free space in a nearly full filesystem. For best performance,
ensure that the filesystem is less than ~90% full as reported by the df(1) command.
To maximize performance of the filesystem with an SSD drive, you should use an
external log. You can use a partition of the SSD media or separate HDD media.
Note: The argument you supply to ibound is the address of the physical disk block,
not the filesystem block.
This value is then rounded up to the end of the AG that holds the specified block.
Ideally, the address that you specify will be at the end of the AG, and that AG will
consist of SSD disk. The resulting region, from the beginning of the first AG (that is,
block 0 of AG0) through the end of the AG that contains the specified physical address, is
the metadata region. The remainder of the filesystem is the user-extents region.
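The rounding rule can be illustrated by computing which AG holds a given ibound address. This is a sketch under two assumptions: the address and the AG size are both expressed in 512-byte sectors, and AG0 begins at sector 0 of the volume. The function name ibound_region is hypothetical.

```shell
# ibound_region IBOUND_SECTOR AGSIZE_SECTORS
# Prints the index of the AG that contains IBOUND_SECTOR and the number of
# AGs in the resulting metadata region (AG0 through that AG).
ibound_region() (
    ibound=$1 agsize=$2
    ag=$(( ibound / agsize ))
    echo "last_metadata_ag=$ag metadata_ag_count=$(( ag + 1 ))"
)

ibound_region 1 19502864
ibound_region 156022911 19502864
```

With an AG size of 19502864 sectors, an address of 1 falls in AG0, and the last sector of the eighth AG (156022911) falls in AG7, giving a metadata region of AG0 through AG7.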
XFS will use as many AGs within the user-extents region as required to contain the
user data for a file. By default, it will start the user-data allocation at a specific AG, if
that AG is available. If the desired AG has become too full or fragmented, the next
AG will be used in order, wrapping around to the first AG in the user-extents region.
The specific AG that XFS selects for the beginning of user data for a file is calculated
based upon the AG used for the corresponding inodes:
• For inodes located in AG0 (the first AG in the metadata region), XFS will attempt
to begin to allocate space starting in the first AG of the user-extents region
• For inodes allocated in successive AGs within the metadata region, XFS will
attempt to begin to allocate space in proportionally indexed AGs within the
user-extents region
For example, Figure 7-1 shows a conceptual diagram using a value of
ibound=15022944, which is located in AG7 (for best performance, the value should
represent the final physical block in the AG). This designates that the metadata region
is AG0 through AG7.
Figure 7-1 ibound Value Specifying the End of the Metadata Region (diagram regions labeled SSD and HDD)
Note: The metadata region always consists of complete AGs. If you specify a value
that is not the final block, the metadata region end-point will be rounded up to the
final block of the AG.
In this case, the filesystem has 8 AGs in the metadata region (AG0–AG7) and 24 AGs
in the user-extents region (AG8–AG31). For each AG within the metadata region,
Table 7-1 shows the default selection preference for the corresponding user-extents
region. Figure 7-2 represents this graphically.
Figure 7-2 Mapping Metadata-Region AGs to the Beginning User-Extents Region AGs (diagram of AG0 through AG31)
For example, using the above situation, suppose the inode for file myfile is located
in AG1. XFS would therefore by default prefer to start allocating user-extents for the
file in AG11; however, if AG11 is busy, XFS will start allocation of space at AG12,
allocating space in as many AGs as necessary. When XFS reaches the end of the
user-extents region at AG31, it will wrap around to the beginning of the user-extents
region at AG8.
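The default selection and the wrap-around can be sketched as follows. The proportional formula is an assumption inferred from the single example above (8 metadata AGs, 24 user-extents AGs, inode in AG1, preferred start AG11); it is not taken from Table 7-1. The function names are hypothetical.

```shell
# default_start_ag INODE_AG META_AGS TOTAL_AGS
# Prints the preferred starting AG in the user-extents region, assuming the
# inode AG index is scaled by (user-extents AG count / metadata AG count).
default_start_ag() (
    inode_ag=$1 meta_ags=$2 total_ags=$3
    user_ags=$(( total_ags - meta_ags ))
    echo $(( meta_ags + inode_ag * user_ags / meta_ags ))
)

# next_user_ag CURRENT_AG META_AGS TOTAL_AGS
# Prints the next AG in order, wrapping from the last AG back to the first
# AG of the user-extents region.
next_user_ag() (
    current=$1 meta_ags=$2 total_ags=$3
    next=$(( current + 1 ))
    if [ "$next" -ge "$total_ags" ]; then next=$meta_ags; fi
    echo "$next"
)

default_start_ag 1 8 32
next_user_ag 11 8 32
next_user_ag 31 8 32
```

For the situation above, the three calls print 11, 12, and 8: the preferred AG for an inode in AG1, the fallback when AG11 is busy, and the wrap from AG31 to AG8.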
To override the default ibound extent allocation policy, see "agskip Mount Option
for Allocation Group Specification" on page 69.
Note: If agskip is specified, its value is used instead of the default proportional
indexing. For example, if you specified agskip=2 for the above situation, the start of
user data for the first new file written will be in AG8 because it is the first AG in the
user-extents area and the start of user data for the second new file written will be in
AG10.
The size of other metadata is highly variable and depends heavily upon how the
filesystem is used. If there are large extended attributes in the filesystem or if there
are long filenames, more overhead space will be required.
Questions to consider:
• What is the typical size of a filename? Include in the overhead the average
filename length times the number of inodes.
• What is the typical size of a directory name? Include in the overhead the average
directory name length times the number of directories.
• What percentage of the inodes will be directories? Include in the overhead 4096
bytes for each directory with a minimum of 32 bytes times the total number of
inodes.
• What is the size and number of extended attributes? Include in the overhead the
number of extended attributes times the average extended-attribute size.
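The questions above can be combined into a rough estimate. This is a sketch only: it reads the directory item as "4096 bytes per directory, but not less than 32 bytes per inode", which is one interpretation of the wording, and the function name and sample figures are hypothetical.

```shell
# estimate_overhead_bytes INODES AVG_NAME DIRS AVG_DIRNAME XATTRS AVG_XATTR
# Prints an estimate, in bytes, of name, directory, and extended-attribute
# overhead. It does not include the space used by the inodes themselves.
estimate_overhead_bytes() (
    inodes=$1 avg_name=$2 dirs=$3 avg_dirname=$4 xattrs=$5 avg_xattr=$6
    dir_bytes=$(( dirs * 4096 ))
    dir_min=$(( inodes * 32 ))
    # Apply the 32-bytes-per-inode minimum to the directory component.
    if [ "$dir_bytes" -lt "$dir_min" ]; then dir_bytes=$dir_min; fi
    echo $(( inodes * avg_name + dirs * avg_dirname + dir_bytes + xattrs * avg_xattr ))
)

# 1,000,000 inodes, 20-byte names, 10,000 directories with 10-byte names,
# and no extended attributes.
estimate_overhead_bytes 1000000 20 10000 10 0 0
```

For the sample figures, the estimate is 61060000 bytes (about 58 MiB).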
Note: If you specify a block that is not at the end of the AG, the value will be
rounded up to the end of the AG that contains the specified value.
5. Verify that the mount was successful by examining the XFS kernel messages.
Note: The ibound and inode64 mount options are mutually exclusive. If you
issue both options, an error will be logged.
When the ibound mount option is used successfully, the XFS kernel module will log
an INFO message, indicating the maximum possible inode identification number that
results given the effective metadata region.
Note: This number is the inode identification number, not the count of inodes.
For example:
XFS: filesystem filesystem_name maximum new inode number is new_inode_ID_number
If the ibound value that you specify points to a block that does not allow for a
sufficient number of inodes, the XFS kernel module will log a WARN message to
indicate that it will instead use an appropriate value. For example:
XFS: filesystem filesystem_name ibound is too small, using new_inode_ID_number
If there are insufficient AGs in the user-extents area, the XFS kernel module will log a
WARN message, indicating that it is reverting to either inode32 or inode64 behavior, as
appropriate for the filesystem size. For example:
XFS: filesystem filesystem_name ibound is too small, using inode32|inode64
Message Indicating that the Filesystem Has Grown and ibound is Reinstated
If the filesystem grows so that there are sufficient AGs in the user-extents area, then
ibound will be reinstated and the following message will be logged:
XFS: filesystem filesystem_name maximum new inode number is new_inode_ID_number
This example describes how to create an XVM volume using both SSD and HDD so
that the SSD is used for storing as many inodes as possible. The volume is
constructed so that the first 8 allocation groups (AGs) and external log are placed on
the SSD. The external log is the maximum size of 1 GiB. The remainder of the volume
is a two-disk stripe.
1. Partition the SSD disk and HDD disks similarly, using a GPT label and primary
partition that starts at sector 34:
• SSD disk sdb:
cxfsxe4:~ # parted /dev/sdb
GNU Parted 2.3
Using /dev/sdb
Welcome to GNU Parted! Type ’help’ to view a list of commands.
(parted) mklabel gpt
Warning: The existing disk label on /dev/sdb will be destroyed and all data on
this disk will be lost. Do you want to continue?
Yes/No? yes
(parted) unit s
(parted) mkpart primary xfs 34 -34
Warning: The resulting partition is not properly aligned for best performance.
Ignore/Cancel? ignore
(parted) quit
Information: You may need to update /etc/fstab.
Warning: The existing disk label on /dev/sdd will be destroyed and all data on
this disk will be lost. Do you want to continue?
Yes/No? yes
(parted) unit s
(parted) mkpart primary xfs 34 -34
Warning: The resulting partition is not properly aligned for best performance.
Ignore/Cancel? ignore
(parted) quit
Information: You may need to update /etc/fstab.
xvm:local> quit
6. Make the filesystem, specifying the largest disk address (sector) allowed to be
used for storing an inode (19502864 in this case, as determined in step 5c) for
the agsize value:
cxfsxe4:~ # mkfs.xfs -f -d agsize=19502864s -l logdev=/dev/lxvm/hybridvol_log -l size=128m /dev/lxvm/hybridvol
warning: unable to probe device topology for device /dev/lxvm/hybridvol
meta-data=/dev/lxvm/hybridvol isize=256 agcount=109, agsize=2437858 blks
= sectsz=512 attr=2, projid32bit=0
data = bsize=4096 blocks=263692052, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0
log =/dev/lxvm/hybridvol_log bsize=4096 blocks=32768, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
8. Display the kernel messages to verify that the filesystem was correctly mounted
with the ibound option, as described in "Message Indicating a Successful Mount
with ibound" on page 76. For example:
cxfsxe4:~ # dmesg | grep XFS
XFS (xvm-46): XFS: filesystem xvm-46 maximum new inode number is 508767775
XFS (xvm-46): Mounting Filesystem
XFS (xvm-46): Ending clean mount
(In the case of error messages, see "ibound and Kernel Messages" on page 76.)
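The figures in the mkfs.xfs output of step 6 can be cross-checked with shell arithmetic. The sketch below converts the agsize value from sectors to filesystem blocks and derives the AG count from the total block count; the function names are hypothetical, and the sector size (512) and block size (4096) are the sectsz and bsize values reported in that output.

```shell
# sectors_to_blocks SECTORS SECTOR_SIZE BLOCK_SIZE
sectors_to_blocks() (
    sectors=$1 sector_size=$2 block_size=$3
    echo $(( sectors * sector_size / block_size ))
)

# ag_count TOTAL_BLOCKS AG_BLOCKS
# Rounds up, because a final partial AG still counts as an AG.
ag_count() (
    total_blocks=$1 ag_blocks=$2
    echo $(( (total_blocks + ag_blocks - 1) / ag_blocks ))
)

sectors_to_blocks 19502864 512 4096
ag_count 263692052 2437858
```

The two calls print 2437858 and 109, which agree with the agsize=2437858 blks and agcount=109 fields in the output.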
If you use a value for ibound that is smaller than the size of the first AG, the
filesystem will determine an appropriate value to use instead. To illustrate this,
carrying on from the example in "Example of Successfully Maximizing SSD Storage of
Inodes for an SSD/HDD Filesystem":
1. Unmount the filesystem:
cxfsxe4:~ # umount /mnt
2. Mount the filesystem with an ibound value that is obviously too small, such as 1:
cxfsxe4:~ # mount -o ibound=1,logdev=/dev/lxvm/hybridvol_log /dev/lxvm/hybridvol /mnt
3. Display the kernel messages to determine if the filesystem was correctly mounted
with the ibound option. In this case, the output shows that the improper value
specified in the previous step is overridden with an appropriate value:
cxfsxe4:~ # dmesg | grep XFS
XFS (xvm-46): XFS: filesystem xvm-46 ibound is too small, using 19502856
XFS (xvm-46): XFS: filesystem xvm-46 maximum new inode number is 39005727
XFS (xvm-46): Mounting Filesystem
XFS (xvm-46): Ending clean mount
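The maximum new inode number in these messages can be reproduced from the filesystem geometry. The following is a sketch that assumes the usual XFS inode-number layout, in which the AG number occupies the bits above the block-within-AG and inode-within-block fields; the function names are hypothetical, and the geometry values (agsize=2437858 blks, bsize=4096, isize=256) come from the mkfs.xfs output earlier in this example.

```shell
# ceil_log2 N: smallest number of bits b such that 2^b >= N (N >= 1).
ceil_log2() (
    n=$1 bits=0
    while [ $(( 1 << bits )) -lt "$n" ]; do bits=$(( bits + 1 )); done
    echo "$bits"
)

# max_inode_number LAST_METADATA_AG AG_BLOCKS BLOCK_SIZE INODE_SIZE
# Prints the number of the last possible inode in LAST_METADATA_AG.
max_inode_number() (
    agno=$1 agblocks=$2 blocksize=$3 inodesize=$4
    inodes_per_block=$(( blocksize / inodesize ))
    ag_inode_bits=$(( $(ceil_log2 "$agblocks") + $(ceil_log2 "$inodes_per_block") ))
    last_in_ag=$(( agblocks * inodes_per_block - 1 ))
    echo $(( (agno << ag_inode_bits) | last_in_ag ))
)

max_inode_number 0 2437858 4096 256
max_inode_number 7 2437858 4096 256
```

The first call prints 39005727, the value logged when the metadata region is limited to AG0. The second prints 508767775, the value logged for the successful mount earlier in this example, which is consistent with a metadata region of AG0 through AG7.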
If there are more AGs in the metadata region than in the user-extents region, the
ibound option will be ignored. To illustrate this, carrying on from the previous
example that has an AG count of 109:
1. Unmount the filesystem:
cxfsxe4:~ # umount /mnt
2. Mount the filesystem with an ibound value that specifies a block within AG55
(which would result in 56 AGs in the metadata region, AG0 through AG55, and 53 AGs
in the user-extents region, given a total of 109 AGs):
3. Display the kernel messages to determine if the filesystem was correctly mounted
with the ibound option. In this case, the output shows that the ibound option
has been ignored, and inode32 behavior will be used instead:
cxfsxe4:~ # dmesg | grep XFS
XFS (xvm-46): filesystem xvm-46 ibound is too small, using inode32
XFS (xvm-46): Mounting Filesystem
XFS (xvm-46): Ending clean mount
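The condition under which ibound is ignored can be checked before mounting. This sketch assumes the rule exactly as stated in this section (the option is ignored when the metadata region has more AGs than the user-extents region); the function name ibound_ag_split and the 32-AG filesystem in the example are hypothetical.

```shell
# ibound_ag_split LAST_METADATA_AG TOTAL_AGS
# LAST_METADATA_AG is the zero-based index of the AG that contains the
# ibound address. Prints the AG counts of both regions and the outcome.
ibound_ag_split() (
    last_meta_ag=$1 total_ags=$2
    meta=$(( last_meta_ag + 1 ))
    user=$(( total_ags - meta ))
    if [ "$meta" -gt "$user" ]; then verdict=ignored; else verdict=honored; fi
    echo "metadata=$meta user_extents=$user ibound=$verdict"
)

ibound_ag_split 7 32
ibound_ag_split 20 32
```

For a 32-AG filesystem, an ibound address in AG7 leaves 8 metadata AGs and 24 user-extents AGs, so the option is honored; an address in AG20 leaves 21 and 11, so it would be ignored.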
Appendix A
A: XFS System-Tunable Kernel Parameters
Prefix
Each of the parameters uses a prefix of fs.xfs. For example, the full name of the
stats_clear parameter is fs.xfs.stats_clear.
where:
• value is the value you want to set
Note: This is the recommended method to permanently set an XFS system tunable
parameter. Setting the parameter in the /etc/sysctl.conf file is not recommended
because the file may be parsed at boot time before the xfs module is loaded.
where:
• systune is the parameter name
• value is the value you want to set for the parameter
For example, to temporarily set the rotorstep parameter (which has the fs.xfs
prefix) to 255, enter the following:
# sysctl fs.xfs.rotorstep=255
fs.xfs.rotorstep = 255
where:
• systune is the parameter name
For example, to query the current setting of the rotorstep parameter (which has the
fs.xfs prefix):
# sysctl fs.xfs.rotorstep
fs.xfs.rotorstep = 255
Parameter Types
This section discusses the following:
• "Parameters to Set at Initial Configuration" on page 86
• "Mount-Time Parameter for Initial Configuration" on page 88
• "Parameters for Special-Case Performance Tuning" on page 88
• "Mount-Time Parameter for Special-Case Performance Tuning" on page 91
• "Debugging Parameters Restricted to SGI Support" on page 91
inherit_noatim
Specifies whether the noatim flag set by the xfs_io(8) chattr command will be
inherited by files in a given directory.
Range of values:
• 0 prevents inheritance
• 1 causes the noatim flag to be inherited
inherit_nodfrg
Specifies whether the nodfrg flag set by the xfs_io(8) chattr command will be
inherited by files in a given directory.
Range of values:
• 0 prevents inheritance
• 1 causes the nodfrg flag to be inherited
inherit_nodump
Specifies whether the nodump flag set by the xfs_io(8) chattr command will be
inherited by files in a given directory.
Range of values:
• 0 prevents inheritance
• 1 causes the nodump flag to be inherited
inherit_nosym
Specifies whether the nosymlinks flag set by the xfs_io(8) chattr command will
be inherited by files in a given directory.
Range of values:
• 0 prevents inheritance
• 1 causes the nosymlinks flag to be inherited
inherit_sync
Specifies whether the sync flag set by the xfs_io(8) chattr command will be
inherited by files in a given directory.
Range of values:
• 0 prevents inheritance
• 1 causes the sync flag to be inherited
sgid_inherit
Controls the action taken for a file created in a set group ID (SGID) directory if the
group ID of the new file does not match the effective group ID or one of the
supplementary group IDs of the parent directory.
Range of values:
• 0 does not clear the S_ISGID bit
• 1 clears the S_ISGID bit
stats_clear
symlink_mode
probe_dmapi
Determines whether or not XFS attempts to load the xfs_dmapi module and enable
the dmi/dmapi/xdsm mount option when mounting a filesystem.
Range of values:
• 0 does not load the module or enable the mount option
• 1 loads the module and enables the mount option
• "probe_limit" on page 89
• "rotorstep" on page 89
• "syncd_timer" on page 90
• "xfs_buf_age" on page 90
• "xfs_buf_timer" on page 90
probe_limit
Specifies the maximum number of pages that XFS will cluster together when probing,
in order to optimize the conversion of delayed allocation or unwritten extents into
real extents.
Range of values:
• Default: 4096 (0x1000)
• Minimum: 0
• Maximum: 2097151 (0x1fffff)
rotorstep
In inode32 allocation mode, determines how many files the allocator attempts to
allocate before moving to the next allocation group. The intent is to control the rate at
which the allocator moves between allocation groups when allocating extents for new
files.
Range of values:
• Default: 1
• Minimum: 1
• Maximum: 255
See also:
• "agskip Mount Option for Allocation Group Specification" on page 69
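Because rotorstep accepts only values from 1 through 255, a script that changes it may want to check the value first. The following is a minimal sketch of such a check; rotorstep_valid is a hypothetical helper name, and the sysctl command is printed rather than executed so that the example runs without root privileges.

```shell
#!/bin/sh
# Hypothetical helper: validate a proposed rotorstep value against the
# range given in this appendix (minimum 1, maximum 255).
rotorstep_valid() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    [ "$1" -ge 1 ] && [ "$1" -le 255 ]
}

new_value=255
if rotorstep_valid "$new_value"; then
    # Printed instead of run, so the sketch is safe for unprivileged users.
    echo "sysctl fs.xfs.rotorstep=$new_value"
else
    echo "rotorstep must be between 1 and 255" >&2
fi
```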
syncd_timer
Specifies the interval (in centiseconds) at which the xfssyncd thread flushes
metadata such as log activity out to disk and does some processing on unlinked
inodes.
Range of values:
• Default: 3000
• Minimum: 100
• Maximum: 720000
xfs_buf_age
Specifies the age (in centiseconds) at which xfsbufd flushes dirty metadata buffers
to disk.
Range of values:
• Default: 1500
• Minimum: 100
• Maximum: 720000
xfs_buf_timer
Specifies the interval (in centiseconds) at which xfsbufd scans the dirty metadata
buffers list.
Range of values:
• Default: 100
• Minimum: 50
• Maximum: 3000
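The timer parameters above are expressed in centiseconds (hundredths of a second), which are easy to misread as seconds or milliseconds. As an illustration, the sketch below converts the default values listed above into seconds; cs_to_seconds is a hypothetical helper name.

```shell
#!/bin/sh
# Convert centiseconds (hundredths of a second) to seconds, printed
# with two decimal places. Uses integer arithmetic only.
cs_to_seconds() {
    printf '%d.%02d\n' $(( $1 / 100 )) $(( $1 % 100 ))
}

# Default values as listed in this appendix.
for entry in syncd_timer:3000 xfs_buf_age:1500 xfs_buf_timer:100; do
    name=${entry%%:*}   # text before the colon
    cs=${entry##*:}     # text after the colon
    echo "$name default: $cs centiseconds = $(cs_to_seconds "$cs") seconds"
done
```

By this conversion the default syncd_timer interval is 30 seconds, the default xfs_buf_age is 15 seconds, and the default xfs_buf_timer interval is 1 second.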
fstrm_timer
Specifies the filestream timer, which is the required time interval (in centiseconds)
between file creates in a directory to maintain a stream of files.
Range of values:
• Default: 3000
• Minimum: 1
• Maximum: 360000
error_level
Specifies the reporting volume when internal errors occur, such as the number of
detailed messages and backtraces for filesystem shutdowns. XFS macros use the
following threshold values:
XFS_ERRLEVEL_OFF is 0
XFS_ERRLEVEL_LOW is 1
XFS_ERRLEVEL_HIGH is 5
Range of values:
• Default: 3
• Minimum: 0 (turns off error reporting)
• Maximum: 11
panic_mask
Specifies a bitmask that causes certain error conditions to call BUG(). The value is the
bitwise OR of the following tags representing errors that should cause panics:
XFS_NO_PTAG 0
XFS_PTAG_IFLUSH 0x00000001
XFS_PTAG_LOGRES 0x00000002
XFS_PTAG_AILDELETE 0x00000004
XFS_PTAG_ERROR_REPORT 0x00000008
XFS_PTAG_SHUTDOWN_CORRUPT 0x00000010
XFS_PTAG_SHUTDOWN_IOERROR 0x00000020
XFS_PTAG_SHUTDOWN_LOGERROR 0x00000040
Range of values:
• Default: 0
• Minimum: 0
• Maximum: 255
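Since panic_mask is a bitmask, the value that selects several error conditions is the bitwise OR of their tags. The sketch below works through that arithmetic for two of the tags listed above; panic_mask_of is a hypothetical helper name, and the example only computes the number without changing any setting, because this parameter is restricted to SGI Support.

```shell
#!/bin/sh
# Tag values as listed in this appendix.
XFS_PTAG_SHUTDOWN_CORRUPT=0x00000010
XFS_PTAG_SHUTDOWN_IOERROR=0x00000020

# Hypothetical helper: combine any number of tags with bitwise OR and
# print the result in decimal. With no arguments it prints 0.
panic_mask_of() {
    mask=0
    for tag in "$@"; do
        mask=$(( mask | $tag ))
    done
    echo "$mask"
}

# 0x10 | 0x20 = 0x30, which is 48 in decimal.
value=$(panic_mask_of "$XFS_PTAG_SHUTDOWN_CORRUPT" "$XFS_PTAG_SHUTDOWN_IOERROR")
printf 'panic_mask value: %d (0x%x)\n' "$value" "$value"
```

Combining all seven nonzero tags in the list gives 0x7f (127), which lies within the documented maximum of 255.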