IBM Spectrum Scale
Version 4 Release 2.0

Problem Determination Guide

IBM
GA76-0443-06
Note
Before using this information and the product it supports, read the information in “Notices” on page 319.
This edition applies to version 4 release 2 of the following products, and to all subsequent releases and
modifications until otherwise indicated in new editions:
v IBM Spectrum Scale ordered through Passport Advantage® (product number 5725-Q01)
v IBM Spectrum Scale ordered through AAS/eConfig (product number 5641-GPF)
v IBM Spectrum Scale for Linux on z Systems (product number 5725-S28)
Significant changes or additions to the text and illustrations are indicated by a vertical line (|) to the left of the
change.
IBM welcomes your comments; see the topic “How to send your comments” on page xii. When you send
information to IBM, you grant IBM a nonexclusive right to use or distribute the information in any way it believes
appropriate without incurring any obligation to you.
© Copyright IBM Corporation 2014, 2016.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract
with IBM Corp.
Contents

Tables   vii

About this information   ix
  Prerequisite and related information   xi
  Conventions used in this information   xi
  How to send your comments   xii

Summary of changes   xiii

Chapter 1. Logs, dumps, and traces   1
  GPFS logs   1
    Creating a master GPFS log file   2
  Protocol services logs   3
    SMB logs   3
    NFS logs   4
    Object logs   6
    The IBM Spectrum Scale HDFS transparency log   8
    Protocol authentication log files   9
    CES monitoring and troubleshooting   11
    CES tracing and debug data collection   13
  The operating system error log facility   19
    MMFS_ABNORMAL_SHUTDOWN   20
    MMFS_DISKFAIL   20
    MMFS_ENVIRON   20
    MMFS_FSSTRUCT   20
    MMFS_GENERIC   20
    MMFS_LONGDISKIO   21
    MMFS_QUOTA   21
    MMFS_SYSTEM_UNMOUNT   22
    MMFS_SYSTEM_WARNING   22
    Error log entry example   22
  Using the gpfs.snap command   23
    Data gathered by gpfs.snap on all platforms   23
    Data gathered by gpfs.snap on AIX   24
    Data gathered by gpfs.snap on Linux   25
    Data gathered by gpfs.snap on Windows   25
    Data gathered by gpfs.snap for a master snapshot   25
    Data gathered by gpfs.snap on Linux for protocols   26
  mmdumpperfdata command   31
  mmfsadm command   33
  Trace facility   34
    Generating GPFS trace reports   34
  Best practices for setting up core dumps on a client system   38

Chapter 2. Troubleshooting options available in GUI   41

Chapter 3. GPFS cluster state information   43
  The mmafmctl Device getstate command   43
  The mmdiag command   43
  The mmgetstate command   43
  The mmlscluster command   44
  The mmlsconfig command   45
  The mmrefresh command   45
  The mmsdrrestore command   46
  The mmexpelnode command   46

Chapter 4. GPFS file system and disk information   49
  Restricted mode mount   49
  Read-only mode mount   49
  The lsof command   50
  The mmlsmount command   50
  The mmapplypolicy -L command   51
    mmapplypolicy -L 0   52
    mmapplypolicy -L 1   52
    mmapplypolicy -L 2   53
    mmapplypolicy -L 3   54
    mmapplypolicy -L 4   55
    mmapplypolicy -L 5   55
    mmapplypolicy -L 6   56
  The mmcheckquota command   57
  The mmlsnsd command   57
  The mmwindisk command   58
  The mmfileid command   59
  The SHA digest   61

Chapter 5. Resolving deadlocks   63
  Automated deadlock detection   63
  Automated deadlock data collection   65
  Automated deadlock breakup   66
  Deadlock breakup on demand   67
  Cluster overload detection   68

Chapter 6. Other problem determination tools   71

Chapter 7. Installation and configuration issues   73
  Installation and configuration problems   73
    What to do after a node of a GPFS cluster crashes and has been reinstalled   74
    Problems with the /etc/hosts file   74
    Linux configuration considerations   74
    Protocol authentication problem determination   75
    Problems with running commands on other nodes   75
    GPFS cluster configuration data files are locked   76
    Recovery from loss of GPFS cluster configuration data file   77
    Automatic backup of the GPFS cluster data   78
  Error numbers specific to GPFS applications calls   78
  GPFS modules cannot be loaded on Linux   79
  GPFS daemon will not come up   79
Tables

1. IBM Spectrum Scale library information units   ix
2. Conventions   xii
3. Core object log files in /var/log/swift   7
4. Additional object log files in /var/log/swift   8
5. General system log files in /var/adm/ras   8
6. Authentication log files   9
7. Events for the AUTH component   151
8. Events for the GPFS component   153
9. Events for the KEYSTONE component   154
10. Events for the NFS component   155
11. Events for the Network component   159
12. Events for the Object component   160
13. Events for the SMB component   165
14. Message severity tags ordered by priority   171
IBM Spectrum Scale is a file management infrastructure, based on IBM® General Parallel File System
(GPFS™) technology, that provides unmatched performance and reliability with scalable access to critical
file data.
To find out which version of IBM Spectrum Scale is running on a particular AIX node, enter:
lslpp -l gpfs\*
To find out which version of IBM Spectrum Scale is running on a particular Linux node, enter:
rpm -qa | grep gpfs
To find out which version of IBM Spectrum Scale is running on a particular Windows node, open the
Programs and Features control panel. The IBM Spectrum Scale installed program name includes the
version number.
Which IBM Spectrum Scale information unit provides the information you need?
The IBM Spectrum Scale library consists of the information units listed in Table 1.
To use these information units effectively, you must be familiar with IBM Spectrum Scale and the AIX,
Linux, or Windows operating system, or all of them, depending on which operating systems are in use at
your installation. Where necessary, these information units provide some background information relating
to AIX, Linux, or Windows; however, more commonly they refer to the appropriate operating system
documentation.
Note: Throughout this documentation, the term “Linux” refers to all supported distributions of Linux,
unless otherwise specified.
Table 1. IBM Spectrum Scale library information units

Information unit: IBM Spectrum Scale: Administration and Programming Reference
Type of information: This information unit explains how to do the following:
v Use the commands, programming interfaces, and user exits unique to GPFS
v Manage clusters, file systems, disks, and quotas
v Export a GPFS file system using the Network File System (NFS) protocol
Intended users: System administrators or programmers of GPFS systems
For the latest support information, see the IBM Spectrum Scale FAQ in IBM Knowledge Center
(www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html).
Note: Users of IBM Spectrum Scale for Windows must be aware that on Windows, UNIX-style file names need to be converted appropriately. For example, the GPFS cluster configuration data is stored in the /var/mmfs/gen/mmsdrfs file. On Windows, the UNIX namespace starts under the %SystemDrive%\cygwin64 directory, so the GPFS cluster configuration data is stored in the C:\cygwin64\var\mmfs\gen\mmsdrfs file.
Table 2. Conventions

bold            Depending on the context, bold typeface sometimes represents path names, directories, or file names.
bold underlined  Bold underlined keywords are defaults. These take effect if you do not specify a different keyword.
constant width   Examples and information that the system displays appear in constant-width typeface.
italic          Italics are also used for information unit titles, for the first use of a glossary term, and for general emphasis in text.
<key>           Angle brackets (less-than and greater-than) enclose the name of a key on the keyboard. For example, <Enter> refers to the key on your terminal or workstation that is labeled with the word Enter.
\               In command examples, a backslash indicates that the command or coding example continues on the next line. For example:
                mkcondition -r IBM.FileSystem -e "PercentTotUsed > 90" \
                -E "PercentTotUsed < 85" -m p "FileSystem space used"
{item}          Braces enclose a list from which you must choose an item in format and syntax descriptions.
[item]          Brackets enclose optional items in format and syntax descriptions.
<Ctrl-x>        The notation <Ctrl-x> indicates a control character sequence. For example, <Ctrl-c> means that you hold down the control key while pressing <c>.
item...         Ellipses indicate that you can repeat the preceding item one or more times.
|               In synopsis statements, vertical lines separate a list of choices. In other words, a vertical line means Or. In the left margin of the document, vertical lines indicate technical changes to the information.
Include the publication title and order number, and, if applicable, the specific location of the information
about which you have comments (for example, a page number or a table number).
To contact the IBM Spectrum Scale development organization, send your comments to the following
e-mail address:
Summary of changes
for IBM Spectrum Scale version 4 release 2
as updated, November 2015
Changes to this release of the IBM Spectrum Scale licensed program and the IBM Spectrum Scale library
include the following:
Cluster Configuration Repository (CCR): Backup and restore
You can back up and restore a cluster that has the Cluster Configuration Repository (CCR) enabled. In the mmsdrbackup user exit, the type of backup that is created depends on the configuration of the cluster: if CCR is enabled, a CCR backup is created; otherwise, an mmsdrfs backup is created. In the mmsdrrestore command, if the configuration file is a CCR backup file, you must specify the -a option. All the nodes in the cluster are restored.
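For example, restoring all nodes from a CCR backup file might look like the following; the backup file path shown is hypothetical:
mmsdrrestore -F /root/CCR.backup.tar -a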
Changes in IBM Spectrum Scale for object storage
Object capabilities
Object capabilities describe the object protocol features that are configured in the IBM
Spectrum Scale cluster such as unified file and object access, multi-region object
deployment, and S3 API emulation. For more information, see the following topics:
v Object capabilities in IBM Spectrum Scale: Concepts, Planning, and Installation Guide
v Managing object capabilities in IBM Spectrum Scale: Administration and Programming
Reference
Storage policies for object storage
Storage policies enable segmenting of the object storage within a single cluster for various
use cases. Currently, OpenStack Swift supports storage policies that allow you to define
the replication settings and location of objects in a cluster. IBM Spectrum Scale enhances
storage policies to add compression and unified file and object access functions for object
storage. For more information, see the following topics:
v Storage policies for object storage in IBM Spectrum Scale: Concepts, Planning, and Installation
Guide
v Mapping of storage policies to filesets in IBM Spectrum Scale: Administration and
Programming Reference
v Administering storage policies for object storage in IBM Spectrum Scale: Administration and
Programming Reference
Multi-region object deployment
The main purpose of the object protocol is to enable the upload and download of object
data. When clients have a fast connection to the cluster, the network delay is minimal.
However, when client access to object data is over a WAN or a high-latency network, the
network can introduce an unacceptable delay and affect quality-of-service metrics. To
improve that response time, you can create a replica of the data in a cluster closer to the
clients using the active-active multi-region replication support in OpenStack Swift.
Multi-region can also be used to distribute the object load over several clusters to reduce
contention in the file system. For more information, see IBM Spectrum Scale: Concepts, Planning, and Installation Guide.
Protocols cluster disaster recovery (DR)
You can use the mmcesdr command to perform DR setup, failover, failback, backup, and restore
actions. Protocols cluster DR uses the capabilities of Active File Management based Async Disaster Recovery (AFM DR) to allow an IBM Spectrum Scale cluster to fail over to another cluster and fail back, and to back up and restore the protocol configuration information when a secondary cluster is not available. For more information, see
Protocols cluster disaster recovery in IBM Spectrum Scale: Advanced Administration Guide.
Quality of Service for I/O operations (QoS)
You can use the QoS capability to prevent I/O-intensive, long-running GPFS commands, called
maintenance commands, from dominating file system performance and significantly delaying
normal tasks that also compete for I/O resources. Determine the maximum capacity of your file
system in I/O operations per second (IOPS) with the new mmlsqos command. With the new
mmchqos command, assign a smaller share of IOPS to the QoS maintenance class, which
includes all the maintenance commands. Maintenance command instances that are running at the
same time compete for the IOPS allocated to the maintenance class, and are not allowed to
exceed that limit.
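For example, a sketch of checking capacity and then limiting the maintenance class; the file system name gpfs0 and the IOPS value are hypothetical, and the option syntax should be verified against the mmchqos documentation:
mmlsqos gpfs0
mmchqos gpfs0 --enable pool=system,maintenance=300IOPS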
Security mode for new clusters
Starting with IBM Spectrum Scale V4.2, the default security mode for new clusters is
AUTHONLY. The mmcrcluster command sets the security mode to AUTHONLY when it creates
the cluster and automatically generates a public/private key pair for authenticating the cluster. In
the AUTHONLY security mode, the sending and receiving nodes authenticate each other with a
TLS handshake and then close the TLS connection. Communication continues in the clear. The
nodes do not encrypt transmitted data and do not check data integrity.
In IBM Spectrum Scale V4.1 or earlier, the default security mode is EMPTY. If you update a
cluster from IBM Spectrum Scale V4.1 to V4.2 or later by running mmchconfig release=LATEST, the
command checks the security mode. If the mode is EMPTY, the command issues a warning
message but does not change the security mode of the cluster.
Snapshots
You can display information about a global snapshot without displaying information about fileset
snapshots with the same name. You can display information about a fileset snapshot without
displaying information about other snapshots that have the same name but are snapshots of other
filesets.
spectrumscale Options
The spectrumscale command options for installing GPFS and deploying protocols have changed
to remove config enable and to add config perf. For more information, see IBM Spectrum Scale:
Concepts, Planning, and Installation Guide.
New options have been added to spectrumscale setup and spectrumscale deploy to disable
prompting for the encryption/decryption secret. Note that if spectrumscale setup --storesecret is
used, passwords will not be secure. New properties have been added to spectrumscale config
object for setting password data instead of doing so through enable object. For more
information, see IBM Spectrum Scale: Administration and Programming Reference.
Options for managing share ACLs have been added to the spectrumscale command. For more information, see
IBM Spectrum Scale: Administration and Programming Reference.
ssh and scp wrapper scripts
Starting with IBM Spectrum Scale V4.2, a cluster can be configured to use ssh and scp wrappers.
The wrappers allow GPFS to run on clusters where remote root login through ssh is disabled. For
more information, see the help topic "Running IBM Spectrum Scale without remote root login" in
the IBM Spectrum Scale: Administration and Programming Reference.
Documented commands, structures, and subroutines
The following lists the modifications to the documented commands, structures, and subroutines:
You can collect various types of logs such as GPFS logs, protocol service logs, operating system logs, and
transparent cloud tiering logs. The GPFS™ log is a repository of error conditions that are detected on each
node, as well as operational events such as file system mounts. The operating system error log is also
useful because it contains information about hardware failures and operating system or other software
failures that can affect the IBM Spectrum Scale system.
Note: The GPFS error logs and messages contain the MMFS prefix to distinguish them from the components
of the IBM Multi-Media LAN Server, a related licensed program.
The IBM Spectrum Scale system also provides a system snapshot dump, trace, and other utilities that can
be used to obtain detailed information about specific problems.
GPFS logs
The GPFS log is a repository of error conditions that are detected on each node, as well as operational
events such as file system mounts. The GPFS log is the first place to look when you start debugging the
abnormal events. As GPFS is a cluster file system, events that occur on one node might affect system
behavior on other nodes, and all GPFS logs can have relevant data.
The GPFS log can be found in the /var/adm/ras directory on each node. The GPFS log file is named
mmfs.log.date.nodeName, where date is the time stamp when the instance of GPFS started on the node and
nodeName is the name of the node. The latest GPFS log file can be found by using the symbolic file name
/var/adm/ras/mmfs.log.latest.
The GPFS log from the prior startup of GPFS can be found by using the symbolic file name
/var/adm/ras/mmfs.log.previous. All other files have a time stamp and node name appended to the file
name.
At GPFS startup, log files that are not accessed during the last 10 days are deleted. If you want to save
old log files, copy them elsewhere.
Many GPFS log messages can be sent to syslog on Linux. The systemLogLevel attribute of the
mmchconfig command determines the GPFS log messages to be sent to the syslog. For more information,
see the mmchconfig command in the IBM Spectrum Scale: Administration and Programming Reference.
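For example, the following command sends GPFS messages at the specified level and higher to syslog; the level value shown is illustrative:
mmchconfig systemLogLevel=notice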
Normal operational messages, such as GPFS startup and file system mount messages, appear in the GPFS log file on each Linux node.
The mmcommon logRotate command can be used to rotate the GPFS log without shutting down and
restarting the daemon. After the mmcommon logRotate command is issued, /var/adm/ras/
mmfs.log.previous will contain the messages that occurred since the previous startup of GPFS or the last
run of mmcommon logRotate. The /var/adm/ras/mmfs.log.latest file starts over at the point in time that
mmcommon logRotate was run.
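For example, to rotate the log and then confirm the symbolic links described above:
mmcommon logRotate
ls -l /var/adm/ras/mmfs.log.latest /var/adm/ras/mmfs.log.previous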
Depending on the size and complexity of your system configuration, the amount of time to start GPFS
varies. If you cannot access a file system that is mounted, examine the log file for error messages.
GPFS is a file system that runs on multiple nodes of a cluster. This means that problems originating on
one node of a cluster often have effects that are visible on other nodes. It is often valuable to merge the
GPFS logs in pursuit of a problem. Having accurate time stamps aids the analysis of the sequence of
events.
Before following any of the debug steps, IBM suggests that you:
1. Synchronize all clocks of all nodes in the GPFS cluster. If this is not done, and clocks on different
nodes are out of sync, there is no way to establish the real time line of events occurring on multiple
nodes. Therefore, a merged error log is less useful for determining the origin of a problem and
tracking its effects.
2. Merge and chronologically sort all of the GPFS log entries from each node in the cluster. The
--gather-logs option of the gpfs.snap command can be used to achieve this:
gpfs.snap --gather-logs -d /tmp/logs -N all
By default, the NFS, SMB, and Object protocol logs are stored at: /var/log/messages.
For more information on logs for the spectrumscale installation toolkit, see the “Logging and debugging”
topic in the IBM Spectrum Scale: Concepts, Planning, and Installation Guide
SMB logs
The SMB services write the most important messages to syslog.
With the standard syslog configuration, you can search for terms such as ctdbd or smbd in the
/var/log/messages file to see the relevant log entries.
When the log.smbd file reaches 100 MB in size, the system renames it to log.smbd.old. To
capture more detailed traces for problem determination, use the mmprotocoltrace command.
Note: By default, the mmprotocoltrace command enables tracing for all connections, which can negatively
impact the cluster when the number of connections is high. It is recommended to limit the trace to
specific client IP addresses by using the -c parameter.
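For example, to limit an SMB trace to a single client (the IP address shown is hypothetical):
mmprotocoltrace start smb -c 192.0.2.10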
When using Active Directory, the most important messages are written to syslog, similar to the logs in
SMB protocol. For example:
To capture debug traces for Active Directory authentication, use the following command to enable
tracing:
To disable tracing for Active Directory authentication, use the following command:
NFS logs
The clustered export services (CES) NFS server writes log messages in the /var/log/ganesha.log file at
runtime.
The operating system's log rotation facility is used to manage the NFS logs. The NFS logs are configured and
enabled during the installation of the NFS server packages.
Log levels can be displayed by using the mmnfs configuration list | grep LOG_LEVEL command. For
example:
mmnfs configuration list | grep LOG_LEVEL
By default, the log level is EVENT. Additionally, the following NFS log levels can also be used, ordered
from lowest to highest verbosity: NULL, FATAL, MAJ, CRIT, WARN, INFO, DEBUG, MID_DEBUG, and FULL_DEBUG.
Note: The FULL_DEBUG level increases the size of the log file. Use it in production only if instructed
to do so by IBM Support.
Increasing the verbosity of the NFS server log impacts the overall NFS I/O performance.
To change the logging to the verbose log level INFO, use the following command:
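A sketch of the command, following the mmnfs syntax used for FULL_DEBUG later in this chapter:
mmnfs configuration change LOG_LEVEL=INFO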
This change is cluster-wide and restarts all NFS instances to activate this setting. The log file now
displays more informational messages, for example:
2015-06-03 12:49:31 : epoch 556edba9 : cluster1.ibm.com : ganesha.nfsd-21582[main] nfs_rpc_dispatch_threads
:THREAD :INFO :5 rpc dispatcher threads were started successfully
2015-06-03 12:49:31 : epoch 556edba9 : cluster1.ibm.com : ganesha.nfsd-21582[disp] rpc_dispatcher_thread
:DISP :INFO :Entering nfs/rpc dispatcher
2015-06-03 12:49:31 : epoch 556edba9 : cluster1.ibm.com : ganesha.nfsd-21582[disp] rpc_dispatcher_thread
:DISP :INFO :Entering nfs/rpc dispatcher
2015-06-03 12:49:31 : epoch 556edba9 : cluster1.ibm.com : ganesha.nfsd-21582[disp] rpc_dispatcher_thread
:DISP :INFO :Entering nfs/rpc dispatcher
2015-06-03 12:49:31 : epoch 556edba9 : cluster1.ibm.com : ganesha.nfsd-21582[disp] rpc_dispatcher_thread
:DISP :INFO :Entering nfs/rpc dispatcher
2015-06-03 12:49:31 : epoch 556edba9 : cluster1.ibm.com : ganesha.nfsd-21582[main] nfs_Start_threads
:THREAD :EVENT :gsh_dbusthread was started successfully
2015-06-03 12:49:31 : epoch 556edba9 : cluster1.ibm.com : ganesha.nfsd-21582[main] nfs_Start_threads
:THREAD :EVENT :admin thread was started successfully
2015-06-03 12:49:31 : epoch 556edba9 : cluster1.ibm.com : ganesha.nfsd-21582[main] nfs_Start_threads
:THREAD :EVENT :reaper thread was started successfully
2015-06-03 12:49:31 : epoch 556edba9 : cluster1.ibm.com : ganesha.nfsd-21582[main] nfs_Start_threads
:THREAD :EVENT :General fridge was started successfully
2015-06-03 12:49:31 : epoch 556edba9 : cluster1.ibm.com : ganesha.nfsd-21582[reaper] nfs_in_grace
:STATE :EVENT :NFS Server Now IN GRACE
2015-06-03 12:49:32 : epoch 556edba9 : cluster1.ibm.com : ganesha.nfsd-21582[main] nfs_start
:NFS STARTUP :EVENT :-------------------------------------------------
2015-06-03 12:49:32 : epoch 556edba9 : cluster1.ibm.com : ganesha.nfsd-21582[main] nfs_start
:NFS STARTUP :EVENT : NFS SERVER INITIALIZED
2015-06-03 12:49:32 : epoch 556edba9 : cluster1.ibm.com : ganesha.nfsd-21582[main] nfs_start
:NFS STARTUP :EVENT :-------------------------------------------------
2015-06-03 12:50:32 : epoch 556edba9 : cluster1.ibm.com : ganesha.nfsd-21582[reaper] nfs_in_grace
:STATE :EVENT :NFS Server Now NOT IN GRACE
The currently configured CES log level can be displayed with the mmlscluster --ces command.
NFS-related log information is written to the standard GPFS log files as part of the overall CES
infrastructure. This information relates to the NFS service management and recovery orchestration within
CES.
Object logs
There are a number of locations where messages are logged with the Object protocol.
The core Object services (proxy, account, container, and object servers) have their own logging levels, which are
set in their respective configuration files. By default, Swift logging is set to show messages at or above the
ERROR level, but can be changed to INFO or DEBUG levels if more detailed logging information is required.
By default, the messages logged by these services are saved in the /var/log/swift directory.
You can also configure these services to use separate syslog facilities by setting the log_facility parameter in
one or all of the Object service configuration files and by updating the rsyslog configuration. These
parameters are described in the Swift Deployment Guide (docs.openstack.org/developer/swift/
deployment_guide.html) that is available in the OpenStack documentation.
An example of how to set up this configuration can be found in the SAIO - Swift All In One
documentation (docs.openstack.org/developer/swift/development_saio.html#optional-setting-up-rsyslog-
for-individual-logging) that is available in the OpenStack documentation.
Note: To configure rsyslog for unique log facilities in the protocol nodes, the administrator needs to
ensure that the manual steps mentioned in the preceding link are carried out on each of those protocol
nodes.
The Keystone authentication service writes its logging messages to the /var/log/keystone/keystone.log file.
By default, Keystone logging is set to show messages at or above the WARNING level.
For information on how to view or change log levels on any of the Object related services, see the “CES
collection and tracing” section in the IBM Spectrum Scale: Advanced Administration Guide.
The following commands can be used to determine the health of Object services:
v To see whether there are any nodes in an active (failed) state, run the following command:
mmces state cluster OBJ
The system displays output similar to this:
NODE COMPONENT STATE EVENTS
prt001st001 OBJECT HEALTHY
prt002st001 OBJECT HEALTHY
prt003st001 OBJECT HEALTHY
prt004st001 OBJECT HEALTHY
prt005st001 OBJECT HEALTHY
prt006st001 OBJECT HEALTHY
prt007st001 OBJECT HEALTHY
In this example, all nodes are healthy so no active events are shown.
v To display the history of events generated by the monitoring framework, run the following command:
mmces events list OBJ
The system displays output similar to this:
The following tables list the IBM Spectrum Scale for object storage log files.
Table 3. Core object log files in /var/log/swift

Log file                                               Component                             Configuration file
account-auditor.log, account-auditor.error             Account auditor Swift service         account-server.conf
account-reaper.log, account-reaper.error               Account reaper Swift service          account-server.conf
account-replicator.log, account-replicator.error       Account replicator Swift service      account-server.conf
account-server.log, account-server.error               Account server Swift service          account-server.conf
container-auditor.log, container-auditor.error         Container auditor Swift service       container-server.conf
container-replicator.log, container-replicator.error   Container replicator Swift service    container-server.conf
container-server.log, container-server.error           Container server Swift service        container-server.conf
container-updater.log, container-updater.error         Container updater Swift service       container-server.conf
object-auditor.log, object-auditor.error               Object auditor Swift service          object-server.conf
object-expirer.log, object-expirer.error               Object expirer Swift service          object-expirer.conf
object-replicator.log, object-replicator.error         Object replicator Swift service       object-server.conf
object-server.log, object-server.error                 Object server Swift service           object-server.conf, object-server-sof.conf
object-updater.log, object-updater.error               Object updater Swift service          object-server.conf
proxy-server.log, proxy-server.error                   Proxy server Swift service            proxy-server.conf
Table 4. Additional object log files in /var/log/swift

Log file                                       Component                                              Configuration file
ibmobjectizer.log, ibmobjectizer.error         Unified file and object access objectizer service      spectrum-scale-objectizer.conf, spectrum-scale-object.conf
policyscheduler.log, policyscheduler.error     Object storage policies                                spectrum-scale-object-policies.conf
swift.log, swift.error                         Performance metric collector (pmswift)
Table 5. General system log files in /var/adm/ras

Log file            Component
mmcesmonitor.log    CES framework services monitor
mmfs.log            Various IBM Spectrum Scale command logging
Additional authentication log files include:
/var/adm/ras/log.winbindd-dc-connect
/var/adm/ras/log.winbindd-idmap
/var/adm/ras/log.winbindd
Note: Some of the authentication modules, such as the Keystone services, also log information in
/var/log/messages.
If you change the log levels, the respective authentication service must be restarted manually on each
protocol node. Restarting authentication services might result in disruption of protocol I/O.
Each CES node runs a separate GPFS process that monitors the network address configuration of the
node. If a conflict between the network interface configuration of the node and the current assignments of
the CES address pool is found, corrective action is taken. If the node is unable to detect an address that is
assigned to it, the address is reassigned to another node.
Additional monitors check the state of the services that are implementing the enabled protocols on the
node. These monitors cover NFS, SMB, Object, and Authentication services that monitor, for example,
daemon liveliness and port responsiveness. If it is determined that any enabled service is not functioning
correctly, the node is marked as failed and its CES addresses are reassigned. When the node returns to
normal operation, it returns to the normal (healthy) state and is available to host addresses in the CES
address pool.
An additional monitor runs on each protocol node if Microsoft Active Directory (AD), Lightweight
Directory Access Protocol (LDAP), or Network Information Service (NIS) user authentication is
configured. If a configured authentication server does not respond to test requests, GPFS marks the
affected node as failed.
Aside from the automatic failover and recovery of CES addresses, the monitoring provides two additional
outputs that can be queried: events and state.
State can be queried by entering the mmces state show command, which shows you the state of each of
the CES components. The possible states for a component follow:
HEALTHY
The component is working as expected.
DISABLED
The component has not been enabled.
SUSPENDED
When a CES node is in the suspended state, most components also report suspended.
STARTING
The component (or monitor) recently started. This state is a transient state that is updated after
the startup is complete.
UNKNOWN
Something is preventing the monitoring from determining the state of the component.
STOPPED
The component was intentionally stopped. This situation might happen briefly if a service is
being restarted due to a configuration change. It might also happen because a user ran the mmces
service stop protocol command for a node.
DEGRADED
There is a problem with the component but not a complete failure. This state does not cause the
CES addresses to be reassigned.
FAILED
The monitoring detected a significant problem with the component that means it is unable to
function correctly. This state causes the CES addresses of the node to be reassigned.
Looking at the states themselves can be useful to find out which component is causing a node to fail and
have its CES addresses reassigned. To find out why the component is being reported as failed, you can
look at events.
The mmces events command can be used to show you either events that are currently causing a
component to be unhealthy or a list of historical events for the node. If you want to know why a
component on a node is in a failed state, use the mmces events active invocation. This command gives
you a list of any currently active events that are affecting the state of a component, along with a message
that describes the problem. This information should provide a place to start when you are trying to find
and fix the problem that is causing the failure.
If you want to get a complete idea of what is happening with a node over a longer time period, use the
mmces events list invocation. By default, this command prints a list of all events that occurred on this
node, with a time stamp. This information can be narrowed down by component, time period, and
severity. As well as being viewable with the command, all events are also pushed to the syslog.
A CES node can be marked as unavailable by the monitoring process. The mmces node list command
can be used to show the nodes and the current state flags that are associated with them. When a node is
unavailable (one of the following node flags is set), it does not accept CES address assignments. The
following node states can be displayed:
Suspended
Indicates that the node is suspended with the mmces node suspend command. When suspended,
health monitoring on the node is discontinued. The node remains in the suspended state until it
is resumed with the mmces node resume command.
Network-down
Indicates that monitoring found a problem that prevents the node from bringing up the CES
addresses in the address pool. The state reverts to normal when the problem is corrected. Possible
causes for this state are missing or non-functioning network interfaces and network interfaces
that are reconfigured so that the node can no longer host the addresses in the CES address pool.
No-shared-root
Indicates that the CES shared root directory cannot be accessed by the node. The state reverts to
normal when the shared root directory becomes available. Possible cause for this state is that the
file system that contains the CES shared root directory is not mounted.
Failed Indicates that monitoring found a problem with one of the enabled protocol servers. The state
reverts to normal when the server returns to normal operation or when the service is disabled.
Starting up
Indicates that the node is starting the processes that are required to implement the CES services
that are enabled in the cluster. The state reverts to normal when the protocol servers are
functioning.
Additionally, events that affect the availability and configuration of CES nodes are logged in the GPFS
log file /var/adm/ras/mmfs.log.latest. The verbosity of the CES logging can be changed with the mmces
log level n command, where n is a number from 0 (less logging) to 4 (more logging). The current log
level can be viewed with the mmlscluster --ces command.
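For example, to raise the CES log verbosity and then confirm the current level with the commands named above:
mmces log level 2
mmlscluster --ces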
For more information about CES troubleshooting, see the IBM Spectrum Scale Wiki (www.ibm.com/
developerworks/community/wikis/home/wiki/General Parallel File System (GPFS)).
Debugging information, such as configuration files and logs, can be gathered by using the
gpfs.snap command. This command gathers data about GPFS, operating system information, and
information for each of the protocols.
GPFS + OS
GPFS configuration and logs plus operating system information such as network configuration or
connected drives.
CES Generic protocol information such as configured CES nodes.
NFS NFS Ganesha configuration and logs.
SMB SMB and CTDB configuration and logs.
OBJECT
Openstack Swift and Keystone configuration and logs.
AUTHENTICATION
Authentication configuration and logs.
PERFORMANCE
Dump of the performance monitor database.
Information for each of the enabled protocols is gathered automatically when the gpfs.snap command is
run. If any protocol is enabled, then information for CES and authentication is gathered.
To gather performance data, add the --performance option. The --performance option causes gpfs.snap
to try to collect performance information.
Note: Because this process can take up to 30 minutes to run, gather performance data only if necessary.
If data is only required for one protocol or area, the automatic collection can be bypassed. Provide one
or more of the following values to the --protocol argument: smb, nfs, object, ces, auth, none.
If the --protocol argument is provided, automatic data collection is disabled. If --protocol smb,nfs is
provided to gpfs.snap, only NFS and SMB information is gathered and no CES or Authentication data is
collected. To disable all protocol data collection, use the argument --protocol none.
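For example, using the options described above:
gpfs.snap --protocol smb,nfs
gpfs.snap --protocol none
gpfs.snap --performance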
Types of tracing
Tracing is logging at a high level. The command for starting and stopping tracing (mmprotocoltrace)
supports SMB tracing. NFS and Object tracing can be done with a combination of commands.
SMB To start SMB tracing, use the mmprotocoltrace start smb command. The output looks similar to
this example:
Starting traces
Trace ’d83235aa-0589-4866-aaf0-2e285aad6f92’ created successfully
Note: Running the mmprotocoltrace start smb command without the -c option enables tracing
for all SMB connections. This configuration can slow performance. Therefore, consider adding the
-c option to trace connections for specific client IP addresses.
To see the status of the trace command, use the mmprotocoltrace status smb command. The
output looks similar to this example:
The tar file then includes the log files that contain top-level logs for the time period during which the
trace was running.
Traces time out after a certain amount of time. By default, this time is 10 minutes. The timeout
can be changed by using the -d argument when you start the trace. When a trace times out, the
first node to reach the timeout ends the trace and writes the location of the collected data into the
mmprotocoltrace logs. Every other node writes an informational message that states that another
node ended the trace.
A full usage message for the mmprotocoltrace command can be printed by using the -h argument.
NFS NFS tracing is achieved by increasing the log level, repeating the issue, capturing the log file, and
then restoring the log level.
To increase the log level, use the command mmnfs configuration change LOG_LEVEL=FULL_DEBUG.
You can set the log level to the following values: NULL, FATAL, MAJ, CRIT, WARN, EVENT,
INFO, DEBUG, MID_DEBUG, and FULL_DEBUG.
FULL_DEBUG is the most useful for debugging purposes.
After the issue is recreated, running the gpfs.snap command either with no arguments or with
the --protocol nfs argument captures the NFS logs. The logs can then be used to diagnose
any issues.
To return the log level to normal, use the same command but with a lower logging level (the
default is EVENT).
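A typical sequence, using the commands described above (reproduce the problem between the first and second steps):
mmnfs configuration change LOG_LEVEL=FULL_DEBUG
gpfs.snap --protocol nfs
mmnfs configuration change LOG_LEVEL=EVENT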
Object
The process for tracing the object protocol is similar to NFS. The Object service consists of
multiple processes that can be controlled individually.
The Object services use these logging levels, at increasing severity: DEBUG, INFO, AUDIT,
WARNING, ERROR, CRITICAL, and TRACE.
Keystone and Authentication
mmobj config change --ccrfile keystone.conf --section DEFAULT --property debug
--value True
Finer-grained control of Keystone logging levels can be achieved by updating Keystone's
logging.conf file. For information on the logging levels in the logging.conf
file, see the OpenStack logging.conf documentation (docs.openstack.org/kilo/config-
reference/content/section_keystone-logging.conf.html).
These commands increase the log level for the particular process to the debug level. After you
have re-created the problem, run the gpfs.snap command with no arguments or with the
--protocol object argument.
Then, decrease the log levels again by using the commands that are shown previously but with
--value ERROR instead of --value DEBUG.
The following steps describe how to run a typical trace. It is assumed that the trace system is reset for the
type of trace that you want to run: SMB, Network, or Object. The examples use the SMB trace.
1. Before you start the trace, you can check the configuration settings for the type of trace that you plan
to run:
mmprotocoltrace config smb
The response to this command displays the current settings from the trace configuration file. For more
information about this file, see the “Trace configuration file” on page 17 subtopic.
2. Clear the trace records from the previous trace of the same type:
mmprotocoltrace clear smb
This command responds with an error message if the previous state of a trace node is something
other than DONE or FAILED. If this error occurs, follow the instructions in the “Resetting the trace
system ” on page 18 subtopic.
3. Start the new trace:
mmprotocoltrace start smb
When you later stop the trace, the response is similar to the following example. The last line gives the location of the trace log file:
Stopping traces
Trace ’01239483-be84-wev9-a2d390i9ow02’ stopped for smb
Waiting for traces to complete
Waiting for node ’node1’
Waiting for node ’node2’
Finishing trace ’01239483-be84-wev9-a2d390i9ow02’
Trace tar file has been written to ’/tmp/mmfs/smb.20150513_162322.trc/smb.trace.20150513_162542.tar.gz’
If you do not stop the trace, it continues until the trace duration expires. For more information, see
the “Trace timeout” subtopic.
7. Look in the trace log files for the results of the trace. For more information, see the “Trace log files”
on page 17 subtopic.
Trace timeout
If you do not stop a trace manually, the trace runs until its trace duration expires. The default trace
duration is 10 minutes, but you can set a different value in the mmprotocoltrace command. Each node
that participates in a trace starts a timeout process that is set to the trace duration. When a timeout
occurs, the process checks the trace status. If the trace is active, the process stops the trace, writes the file
location to the log file, and exits. If the trace is not active, the timeout process exits.
If a trace stops because of a timeout, look in the log file of each node to find the location of the trace log
file. The log entry is similar to the following entry:
2015-08-26T16:53:35.885 W:14150:MainThread:TIMEOUT:
Trace ’d4643ccf-96c1-467d-93f8-9c71db7333b2’ tar file located at
’/tmp/mmfs/smb.20150826_164328.trc/smb.trace.20150826_165334.tar.gz’
Trace log files are compressed files in the /var/adm/ras directory. The contents of a trace log file depend
on the type of trace. The product supports three types of tracing: SMB, Network, and Object.
SMB SMB tracing captures System Message Block information. The resulting trace log file contains an
smbd.log file for each node for which information has been collected. A global trace captures
information for all the clients that are connected to the SMB server. A targeted trace captures
information for the specified IP address.
Network
Network tracing calls Wireshark's dumpcap utility to capture network packets. The resulting trace
log file contains a pcapng file that is readable by Wireshark and other programs. The file name is
similar to bfn22-10g_all_00001_20150907125015.pcap.
If the mmprotocoltrace command specifies a client IP address, the trace captures traffic between
that client and the server. If no IP address is specified, the trace captures traffic across all network
interfaces of each participating node.
Object
The trace log file contains log files for each node, one for each of the object services.
Object tracing sets the log location in the rsyslog configuration file. For more information about
this file, see the description of the rsyslogconflocation configuration parameter in the “Trace
configuration file” subtopic.
It is not possible to restrict an Object trace to specific clients; information for all connections is
recorded.
Each node in the cluster has its own trace configuration file, which is stored in the /var/mmfs/ces
directory. The configuration file contains settings for logging and for each type of tracing:
[logging]
filename
The name of the log file.
level The current logging level, which can be debug, info, warning, error, or critical.
[smb]
defaultloglocation
The default log location that is used by the reset command or when current information
is not retrievable.
defaultloglevel
The default log level that is used by the reset command or when current information is
not retrievable.
traceloglevel
The log level for tracing.
maxlogsize
The maximum size of the log file in kilobytes.
esttracesize
The estimated trace size in kilobytes.
[network]
numoflogfiles
The maximum number of log files.
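A minimal sketch of such a configuration file; the values shown are hypothetical and for illustration only:
[logging]
filename = /var/adm/ras/mmprotocoltrace.log
level = info
[smb]
defaultloglocation = /var/adm/ras/log.smbd
defaultloglevel = 1
traceloglevel = 10
maxlogsize = 100000
esttracesize = 500000
[network]
numoflogfiles = 10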
Before you run a new trace, verify that the trace system is reset for the type of trace that you want to
run: SMB, Network, or Object. The examples in the following instructions use the SMB trace system. To
reset the trace system, follow these steps:
1. Stop the trace if it is still running.
a. Check the trace status to see whether the current trace is stopped on all the nodes:
mmprotocoltrace status smb
If the command is successful, then you have successfully reset the trace system. Skip to the last step
in these instructions.
If the command returns an error message, go to the next step.
Note: The command responds with an error message if the trace state of a node is something other
than DONE or FAILED. You can verify the trace state of the nodes by running the status command:
mmprotocoltrace status smb
3. Run the clear command again with the -f (force) option.
mmprotocoltrace -f clear smb
4. After a forced clear, the trace system might still be in an invalid state. Run the reset command. For
more information about the command, see "Using advanced options" on page 19.
mmprotocoltrace reset smb
5. Check the default values in the trace configuration file to verify that they are correct. To display the
values in the trace configuration file, run the config command. For more information about the file,
see the “Trace configuration file” on page 17 subtopic.
mmprotocoltrace config smb
The reset command restores the trace system to the default values that are set in the trace configuration
file. The command also performs special actions for each type of trace:
v For an SMB trace, the reset removes any IP-specific configuration files and sets the log level and log
location to the default values.
v For a Network trace, the reset stops all dumpcap processes.
v For an Object trace, the reset sets the log level to the default value. It then sets the log location to the
default location in the rsyslog configuration file, and restarts the rsyslog service.
The following command resets the SMB trace:
mmprotocoltrace reset smb
The status command with the -v (verbose) option provides more trace information, including the values
of trace variables. The following command returns verbose trace information for the SMB trace:
mmprotocoltrace -v status smb
The error logging facility is referred to as the error log regardless of operating-system specific error log
facility naming conventions.
Failures in the error log can be viewed by issuing this command on an AIX node:
errpt -a
On Windows, use the Event Viewer and look for events with a source label of GPFS in the Application
event category.
On Linux, syslog may include GPFS log messages and the error logs described in this section. The
systemLogLevel attribute of the mmchconfig command controls which GPFS log messages are sent to
syslog. For more information, see the mmchconfig command in the IBM Spectrum Scale: Administration
and Programming Reference.
The error log contains information about several classes of events or errors. These classes are:
v “MMFS_ABNORMAL_SHUTDOWN” on page 20
v “MMFS_DISKFAIL” on page 20
v “MMFS_ENVIRON” on page 20
v “MMFS_FSSTRUCT” on page 20
v “MMFS_GENERIC” on page 20
v “MMFS_LONGDISKIO” on page 21
v “MMFS_QUOTA” on page 21
v “MMFS_SYSTEM_UNMOUNT” on page 22
v “MMFS_SYSTEM_WARNING” on page 22
MMFS_DISKFAIL
This topic describes the MMFS_DISKFAIL error log available in IBM Spectrum Scale.
The MMFS_DISKFAIL error log entry indicates that GPFS has detected the failure of a disk and forced
the disk to the stopped state. This is ordinarily not a GPFS error but a failure in the disk subsystem or
the path to the disk subsystem.
MMFS_ENVIRON
This topic describes the MMFS_ENVIRON error log available in IBM Spectrum Scale.
MMFS_ENVIRON error log entry records are associated with other records of the MMFS_GENERIC or
MMFS_SYSTEM_UNMOUNT types. They indicate that the root cause of the error is external to GPFS
and usually in the network that supports GPFS. Check the network and its physical connections. The
data portion of this record supplies the return code provided by the communications code.
MMFS_FSSTRUCT
This topic describes the MMFS_FSSTRUCT error log available in IBM Spectrum Scale.
The MMFS_FSSTRUCT error log entry indicates that GPFS has detected a problem with the on-disk
structure of the file system. The severity of these errors depends on the exact nature of the inconsistent
data structure. If it is limited to a single file, EIO errors will be reported to the application and operation
will continue. If the inconsistency affects vital metadata structures, operation will cease on this file
system. These errors are often associated with an MMFS_SYSTEM_UNMOUNT error log entry and will
probably occur on all nodes. If the error occurs on all nodes, some critical piece of the file system is
inconsistent. This can occur as a result of a GPFS error or an error in the disk system.
If the file system is severely damaged, the best course of action is to follow the procedures in “Additional
information to collect for file system corruption or MMFS_FSSTRUCT errors” on page 168, and then
contact the IBM Support Center.
MMFS_GENERIC
This topic describes the MMFS_GENERIC error logs available in IBM Spectrum Scale.
The MMFS_GENERIC error log entry means that GPFS self diagnostics have detected an internal error,
or that additional information is being provided with an MMFS_SYSTEM_UNMOUNT report. If the
record is associated with an MMFS_SYSTEM_UNMOUNT report, the event code fields in the records
will be the same. The error code and return code fields might describe the error. See Chapter 15,
“Messages,” on page 173 for a listing of codes generated by GPFS.
If the error is generated by the self diagnostic routines, service personnel should interpret the return and
error code fields since the use of these fields varies by the specific error. Errors caused by the self
checking logic will result in the shutdown of GPFS on this node.
MMFS_GENERIC errors can result from an inability to reach a critical disk resource. These errors might
look different depending on the specific disk resource that has become unavailable, like logs and
allocation maps. This type of error will usually be associated with other error indications. Other errors
generated by disk subsystems, high availability components, and communications components at the
same time as the MMFS_GENERIC error point to the root cause of the problem.
MMFS_LONGDISKIO
This topic describes the MMFS_LONGDISKIO error log available in IBM Spectrum Scale.
The MMFS_LONGDISKIO error log entry indicates that GPFS is experiencing very long response time
for disk requests. This is a warning message and can indicate that your disk system is overloaded or that
a failing disk is requiring many I/O retries. Follow your operating system's instructions for monitoring
the performance of your I/O subsystem on this node and on any disk server nodes that might be
involved. The data portion of this error record specifies the disk involved. There might be related error
log entries from the disk subsystems that will pinpoint the actual cause of the problem. If the disk is
attached to an AIX node, refer to AIX in IBM Knowledge Center (www.ibm.com/support/
knowledgecenter/ssw_aix/welcome) and search for performance management. To enable or disable, use the
mmchfs -w command. For more details, contact the IBM Support Center.
The mmpmon command can be used to analyze I/O performance on a per-node basis. See Failures using
the mmpmon command and the Monitoring GPFS I/O performance with the mmpmon command topic in the
IBM Spectrum Scale: Advanced Administration Guide.
MMFS_QUOTA
This topic describes the MMFS_QUOTA error log available in IBM Spectrum Scale.
The MMFS_QUOTA error log entry is used when GPFS detects a problem in the handling of quota
information. This entry is created when the quota manager has a problem reading or writing the quota
file. If the quota manager cannot read all entries in the quota file when mounting a file system with
quotas enabled, the quota manager shuts down but file system manager initialization continues. Mounts
will not succeed and will return an appropriate error message (see “File system forced unmount” on page
105).
Quota accounting depends on a consistent mapping between user names and their numeric identifiers.
This means that a single user accessing a quota enabled file system from different nodes should map to
the same numeric user identifier from each node. Within a local cluster this is usually achieved by
ensuring that /etc/passwd and /etc/group are identical across the cluster.
When accessing quota enabled file systems from other clusters, you need to either ensure individual
accessing users have equivalent entries in /etc/passwd and /etc/group, or use the user identity mapping
facility as outlined in the IBM white paper entitled UID Mapping for GPFS in a Multi-cluster Environment
in IBM Knowledge Center (www.ibm.com/support/knowledgecenter/SSFKCN/
com.ibm.cluster.gpfs.doc/gpfs_uid/uid_gpfs.html).
It might be necessary to run an offline quota check (mmcheckquota) to repair or recreate the quota file. If
the quota file is corrupted, mmcheckquota will not restore it. The file must be restored from the backup
copy. If there is no backup copy, an empty file can be set as the new quota file. This is equivalent to
recreating the quota file. To set an empty file or use the backup file, issue the mmcheckquota command
with the appropriate operand:
v -u UserQuotaFilename for the user quota file
v -g GroupQuotaFilename for the group quota file
v -j FilesetQuotaFilename for the fileset quota file
After replacing the appropriate quota file, reissue the mmcheckquota command to check the file system
inode and space usage.
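For example, to set a replacement user quota file and then recheck usage; the file name and file system name are hypothetical:
mmcheckquota -u user.quota fs1
mmcheckquota fs1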
MMFS_SYSTEM_UNMOUNT
This topic describes the MMFS_SYSTEM_UNMOUNT error log available in IBM Spectrum Scale.
The MMFS_SYSTEM_UNMOUNT error log entry means that GPFS has discovered a condition that
might result in data corruption if operation with this file system continues from this node. GPFS has
marked the file system as disconnected and applications accessing files within the file system will receive
ESTALE errors. This can be the result of:
v The loss of a path to all disks containing a critical data structure.
If you are using SAN attachment of your storage, consult the problem determination guides provided
by your SAN switch vendor and your storage subsystem vendor.
v An internal processing error within the file system.
See “File system forced unmount” on page 105. Follow the problem determination and repair actions
specified.
MMFS_SYSTEM_WARNING
This topic describes the MMFS_SYSTEM_WARNING error log available in IBM Spectrum Scale.
The MMFS_SYSTEM_WARNING error log entry means that GPFS has detected a system level value
approaching its maximum limit. This might occur as a result of the number of inodes (files) reaching its
limit. If so, issue the mmchfs command to increase the number of inodes for the file system so there is at
least a minimum of 5% free.
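For example, inode usage could be checked and the maximum number of inodes raised with commands
similar to the following (the device name fs1 and the new limit are illustrative; choose a limit that leaves
at least 5% of the inodes free):
mmdf fs1 -F
mmchfs fs1 --inode-limit 3000000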
This is an example of an error log entry that indicates a failure in either the storage subsystem or
communication subsystem:
LABEL: MMFS_SYSTEM_UNMOUNT
IDENTIFIER: C954F85D
Description
STORAGE SUBSYSTEM FAILURE
Probable Causes
STORAGE SUBSYSTEM
COMMUNICATIONS SUBSYSTEM
Failure Causes
STORAGE SUBSYSTEM
COMMUNICATIONS SUBSYSTEM
Recommended Actions
CONTACT APPROPRIATE SERVICE REPRESENTATIVE
Detail Data
EVENT CODE
Running the gpfs.snap command with no options is similar to running gpfs.snap -a. It collects data from
all nodes in the cluster. This invocation creates a file that is made up of multiple gpfs.snap snapshots.
The file that is created includes a master snapshot of the node from which the gpfs.snap command was
invoked and non-master snapshots of each of the other nodes in the cluster.
If the node on which the gpfs.snap command is run is not a file system manager node, gpfs.snap creates
a non-master snapshot on the file system manager nodes.
The difference between a master snapshot and a non-master snapshot is the data that is gathered. A
master snapshot gathers information from nodes in the cluster. A master snapshot contains all data that a
non-master snapshot has. There are two categories of data that is collected:
1. Data that is always gathered by gpfs.snap (for master snapshots and non-master snapshots):
v “Data gathered by gpfs.snap on all platforms”
v “Data gathered by gpfs.snap on AIX” on page 24
v “Data gathered by gpfs.snap on Linux” on page 25
v “Data gathered by gpfs.snap on Windows” on page 25
2. Data that is gathered by gpfs.snap only in the case of a master snapshot. See “Data gathered by
gpfs.snap for a master snapshot” on page 25.
When the gpfs.snap command runs with no options, data is collected for each of the enabled protocols.
You can turn off the collection of all protocol data and specify the type of protocol information to be
collected using the --protocol option. For more information, see gpfs.snap command in IBM Spectrum
Scale: Administration and Programming Reference.
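For example, to limit protocol data collection to specific protocols, an invocation similar to the following
could be used (the protocol names shown are illustrative; see the gpfs.snap command reference for the
accepted values):
gpfs.snap --protocol smb,nfs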
These items are always obtained by the gpfs.snap command when gathering data for an AIX node:
1. The output of these commands:
v errpt -a
v lssrc -a
v lslpp -hac
v no -a
These items are always obtained by the gpfs.snap command when gathering data for a Linux node:
1. The output of these commands:
v dmesg
v fdisk -l
v lsmod
v lspci
v rpm -qa
v rpm --verify gpfs.base
v rpm --verify gpfs.docs
v rpm --verify gpfs.gpl
v rpm --verify gpfs.msg.en_US
2. The contents of these files:
v /etc/filesystems
v /etc/fstab
v /etc/*release
v /proc/cpuinfo
v /proc/version
v /usr/lpp/mmfs/src/config/site.mcr
v /var/log/messages*
These items are always obtained by the gpfs.snap command when gathering data for a Windows node:
1. The output from systeminfo.exe
2. Any raw trace files *.tmf and mmfs.trc*
3. The *.pdb symbols from /usr/lpp/mmfs/bin/symbols
When the gpfs.snap command is specified with no options, a master snapshot is taken on the node
where the command was issued. All of the information from “Data gathered by gpfs.snap on all
platforms” on page 23, “Data gathered by gpfs.snap on AIX” on page 24, “Data gathered by gpfs.snap on
Linux,” and “Data gathered by gpfs.snap on Windows” is obtained, as well as this data:
1. The output of these commands:
v mmauth
v mmgetstate -a
You can turn off the collection of all protocol data and specify the type of protocol information to be
collected using the --protocol option.
Synopsis
mmdumpperfdata [--remove-tree] [StartTime EndTime | Duration]
Availability
Available with IBM Spectrum Scale Standard Edition or higher.
Description
The mmdumpperfdata command runs all named queries and computed metrics used in the mmperfmon
query command for each cluster node, writes the output into CSV files, and archives all the files in a
single .tgz file. The file name is in the iss_perfdump_YYYYMMDD_hhmmss.tgz format.
The TAR archive file contains a folder for each cluster node and within that folder there is a text file with
the output of each named query and computed metric.
If the start and end times, or the duration, are not given, then by default the last four hours of metrics
information is collected and archived.
Parameters
--remove-tree or -r
Removes the folder structure that was created for the TAR archive file.
StartTime
Specifies the start timestamp for query in the YYYY-MM-DD[-hh:mm:ss] format.
EndTime
Specifies the end timestamp for query in the YYYY-MM-DD[-hh:mm:ss] format.
Duration
Specifies the duration in seconds.
Exit status
0 Successful completion.
nonzero
A failure has occurred.
Security
The node on which the command is issued must be able to execute remote shell commands on any other
node in the cluster without the use of a password and without producing any extraneous messages. See
the following IBM Spectrum Scale: Administration and Programming Reference topic: “Requirements for
administering a GPFS file system”.
Examples
1. To archive the performance metric information collected for the default time period of last four hours
and also delete the folder structure that the command creates, issue this command:
mmdumpperfdata --remove-tree
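2. To archive the performance metric information for a specific time window, issue a command similar to
the following (the timestamps shown are illustrative):
mmdumpperfdata 2016-02-01-00:00:00 2016-02-01-04:00:00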
See also the following IBM Spectrum Scale: Administration and Programming Reference topic:
v “mmperfmon command”.
Location
/usr/lpp/mmfs/bin
mmfsadm command
The mmfsadm command is intended for use by trained service personnel. IBM suggests you do not run
this command except under the direction of such personnel.
Note: The contents of mmfsadm output might vary from release to release, which could obsolete any
user programs that depend on that output. Therefore, we suggest that you do not create user programs
that invoke mmfsadm.
The mmfsadm command extracts data from GPFS without using locking, so that it can collect the data in
the event of locking errors. In certain rare cases, this can cause GPFS or the node to fail. Several options
of this command exist and might be required for use:
cleanup
Delete shared segments left by a previously failed GPFS daemon without actually restarting the
daemon.
dump what
Dumps the state of a large number of internal state values that might be useful in determining
the sequence of events. The what parameter can be set to all, indicating that all available data
should be collected, or to another value, indicating more restricted collection of data. The output
is presented to STDOUT and should be collected by redirecting STDOUT.
showtrace
Shows the current level for each subclass of tracing available in GPFS. Trace level 14 provides the
highest level of tracing for the class and trace level 0 provides no tracing. Intermediate values
exist for most classes. More tracing requires more storage and results in a higher probability of
overlaying the required event.
trace class n
Sets the trace class to the value specified by n. Actual trace gathering only occurs when the
mmtracectl command has been issued.
Other options provide interactive GPFS debugging, but are not described here. Output from the
mmfsadm command will be required in almost all cases where a GPFS problem is being reported. The
mmfsadm command collects data only on the node where it is issued. Depending on the nature of the
problem, mmfsadm output might be required from several or all nodes. The mmfsadm output from the
file system manager is often required.
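For example, when the IBM Support Center requests a full state dump from a node, the output can be
collected by redirecting STDOUT to a file (the file name is illustrative; run this only under the direction
of service personnel):
mmfsadm dump all > /tmp/node01.mmfsadm.dump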
To determine where the file system manager is, issue the mmlsmgr command:
mmlsmgr
Trace facility
The IBM Spectrum Scale system includes many different trace points to facilitate rapid problem
determination of failures.
IBM Spectrum Scale tracing is based on the kernel trace facility on AIX, embedded GPFS trace subsystem
on Linux, and the Windows ETL subsystem on Windows. The level of detail that is gathered by the trace
facility is controlled by setting the trace levels using the mmtracectl command.
The mmtracectl command sets up and enables tracing using default settings for various common problem
situations. Using this command improves the probability of gathering accurate and reliable problem
determination information. For more information about the mmtracectl command, see the IBM Spectrum
Scale: Administration and Programming Reference.
If the problem requires more detailed tracing, the IBM Support Center might ask you to modify the GPFS
trace levels. Use the mmtracectl command to establish the required trace classes and levels of tracing. The
syntax to modify trace classes and levels is as follows:
mmtracectl --set --trace={io | all | def | "Class Level [Class Level ...]"}
For example, to tailor the trace level for I/O, issue the following command:
mmtracectl --set --trace=io
Once the trace levels are established, start the tracing by issuing:
mmtracectl --start
After the trace data has been gathered, stop the tracing by issuing:
mmtracectl --stop
To clear the trace settings and make sure tracing is turned off, issue:
mmtracectl --off
Other possible values that can be specified for the trace Class include:
afm
active file management
alloc
disk space allocation
allocmgr
allocation manager
basic
'basic' classes
brl
byte range locks
cksum
checksum services
cleanup
cleanup routines
cmd
ts commands
defrag
defragmentation
dentry
dentry operations
dentryexit
daemon routine entry/exit
disk
physical disk I/O
disklease
disk lease
dmapi
Data Management API
ds
data shipping
errlog
error logging
eventsExporter
events exporter
file
file operations
fs
file system
Values that can be specified for the trace Class, relating to vdisks, include:
vdb
vdisk debugger
vdisk
vdisk
vhosp
vdisk hospital
For more information about vdisks and GPFS Native RAID, see IBM Spectrum Scale RAID: Administration.
The trace Level can be set to a value from 0 through 14, which represents an increasing level of detail. A
value of 0 turns tracing off. To display the trace level in use, issue the mmfsadm showtrace command.
On AIX, the --aix-trace-buffer-size option can be used to control the size of the trace buffer in memory.
On Linux nodes only, use the mmtracectl command to change the following:
v The trace buffer size in blocking mode.
For example, to set the trace buffer size in blocking mode to 8K, issue:
mmtracectl --set --tracedev-buffer-size=8K
For more information about the mmtracectl command, see the IBM Spectrum Scale: Administration and
Programming Reference.
core_pattern + ulimit
The simplest way is to change the core_pattern file at /proc/sys/kernel/core_pattern and to enable core
dumps using the command 'ulimit -c unlimited'. Setting it to something like /var/log/cores/core.%e.%t.
%h.%p will produce core dumps similar to core.bash.1236975953.node01.2344 in /var/log/cores. This
will create core dumps for Linux binaries but will not produce information for Java™ or Python
exceptions.
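A minimal sketch of this setup, assuming the /var/log/cores directory described above is used:
mkdir -p /var/log/cores
echo '/var/log/cores/core.%e.%t.%h.%p' > /proc/sys/kernel/core_pattern
ulimit -c unlimited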
ABRT
ABRT can be used to produce more detailed output as well as output for Java and Python exceptions.
This overwrites the values stored in core_pattern to pass core dumps to abrt. It then writes this
information to the abrt directory configured in /etc/abrt/abrt.conf. Python exceptions are caught by the
Python interpreter automatically importing the abrt.pth file installed in /usr/lib64/python2.7/site-packages/.
If some custom configuration has changed this behavior, Python dumps may not be created.
To get Java runtimes to report unhandled exceptions through abrt, they must be executed with the
command line argument '-agentpath=/usr/lib64/libabrt-java-connector.so'.
The ability to collect core dumps has been added to gpfs.snap using the '--protocol core' option.
Samba can dump to the directory '/var/adm/ras/cores/'. Any files in this directory will be gathered.
Events
Use Monitoring > Events page in the GUI to monitor the events that are reported in the system. The
Events page displays events and you can monitor and troubleshoot errors on your system.
The status icons help to quickly determine whether the event is informational, a warning, or an error.
Click an event and select Properties from the Action menu to see detailed information on the event. The
event table displays the most recent events first.
You can mark certain events as read to change the status of the event in the events view. The status icons
become gray in case an error or warning is fixed or if it is marked as read.
There are events on states that start with "MS*". These events can be errors, warnings, or information
messages. They cannot be marked as read; they automatically change their status from current to historic
when the problem is resolved or the information condition changes. The user must either fix the problem
or change the state of the affected component to make such a current event a historical event. There are
also message events that start with "MM*". These events never become historic by themselves. The user
must use the Mark as Read action on those events to make them historical, because the system cannot
detect by itself whether the problem or information is still valid.
Some issues can be resolved by running a fix procedure. Use action Run Fix Procedure to do so. The
Events page provides a recommendation for which fix procedure to run next.
Logs
IBM Support might ask you to collect trace files and dump files from the system to help them resolve a
problem. Typically, you perform this task from the management GUI. Use Settings > Download Logs
page to download logs through GUI.
The GUI log files contain only the issues that are related to the GUI and are therefore smaller. The full log
files give details of all kinds of IBM Spectrum Scale issues. The GUI log consists of the following types of
information:
v Traces from the GUI that contain information about errors that occurred inside the GUI code
The full GUI and IBM Spectrum Scale log files help to analyze all kinds of IBM Spectrum Scale issues.
These files can be large (gigabytes) and might take an hour to download. You must select the number of
days for which you want to download the log files. The log files are collected from each individual node,
so in a cluster with hundreds of nodes downloading them can take a long time and the downloaded file
can be large. Limit the number of days to reduce the size of the log file; a smaller file is also easier to
send to IBM Support to help fix the issues.
The issues that are reported in the GUI logs are enough to understand the problem in most cases, so try
the GUI log files first before you download the full log files.
When this command displays a NeedsResync target/fileset state, inconsistencies between home and cache
are being fixed automatically; however, unmount and mount operations are required to return the state to
Active.
The mmafmctl Device getstate command is fully described in the GPFS Commands chapter in the IBM
Spectrum Scale: Administration and Programming Reference.
Use the mmdiag command to query various aspects of the GPFS internal state for troubleshooting and
tuning purposes. The mmdiag command displays information about the state of GPFS on the node where
it is executed. The command obtains the required information by querying the GPFS daemon process
(mmfsd), and thus will only function when the GPFS daemon is running.
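For example, to display the current waiters and the status of the daemon's network connections on the
local node, you might issue the following commands (both options are documented for mmdiag; the exact
option set depends on your release):
mmdiag --waiters
mmdiag --network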
The mmdiag command is fully described in the GPFS Commands chapter in IBM Spectrum Scale:
Administration and Programming Reference.
The remaining flags have the same meaning as in the mmshutdown command. They can be used to
specify the nodes on which to get the state of the GPFS daemon.
For example, to display the quorum, the number of nodes up, and the total number of nodes, issue:
mmgetstate -L -a
The mmgetstate command is fully described in the GPFS Commands chapter in the IBM Spectrum Scale:
Administration and Programming Reference.
The mmlscluster command is fully described in the GPFS Commands chapter in the IBM Spectrum Scale:
Administration and Programming Reference.
Depending on your configuration, additional information not documented in either the mmcrcluster
command or the mmchconfig command may be displayed to assist in problem determination.
If a configuration parameter is not shown in the output of this command, the default value for that
parameter, as documented in the mmchconfig command, is in effect.
The mmlsconfig command is fully described in the GPFS Commands chapter in the IBM Spectrum Scale:
Administration and Programming Reference.
Use the mmrefresh command only when you suspect that something is not working as expected and the
reason for the malfunction is a problem with the GPFS configuration data. For example, a mount
command fails with a device not found error, and you know that the file system exists. Another example
is if any of the files in the /var/mmfs/gen directory were accidentally erased. Under normal
circumstances, the GPFS command infrastructure maintains the cluster data files automatically and there
is no need for user intervention.
The -f flag can be used to force the GPFS cluster configuration data files to be rebuilt whether they
appear to be at the most current level or not. If no other option is specified, the command affects only the
node on which it is run. The remaining flags have the same meaning as in the mmshutdown command,
and are used to specify the nodes on which the refresh is to be performed.
For example, to place the GPFS cluster configuration data files at the latest level, on all nodes in the
cluster, issue:
mmrefresh -a
The mmsdrrestore command restores the latest GPFS system files on the specified nodes. If no nodes are
specified, the command restores the configuration information only on the node where it is invoked. If
the local GPFS configuration file is missing, the file specified with the -F option from the node specified
with the -p option is used instead.
This command works best when used in conjunction with the mmsdrbackup user exit, which is
described in the GPFS user exits topic in the IBM Spectrum Scale: Administration and Programming Reference.
For more information, see mmsdrrestore command in IBM Spectrum Scale: Administration and Programming
Reference.
The cluster manager keeps a list of the expelled nodes. Expelled nodes will not be allowed to rejoin the
cluster until they are removed from the list using the -r or --reset option on the mmexpelnode command.
The expelled nodes information will also be reset if the cluster manager node goes down or is changed
with mmchmgr -c.
Or,
mmexpelnode {-l | --list}
Or,
mmexpelnode {-r | --reset} -N {all | Node[,Node...]}
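For example, to display the list of currently expelled nodes and then clear it so that all nodes can rejoin
the cluster, issue:
mmexpelnode --list
mmexpelnode --reset -N all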
Restricted mode mount is not intended for normal operation, but may allow the recovery of some user
data. Only data which is referenced by intact directories and metadata structures would be available.
Attention:
1. Follow the procedures in “Information to be collected before contacting the IBM Support Center” on
page 167, and then contact the IBM Support Center before using this capability.
2. Attempt this only after you have tried to repair the file system with the mmfsck command. (See
“Why does the offline mmfsck command fail with "Error creating internal storage"?” on page 147.)
3. Use this procedure only if the failing disk is attached to an AIX or Linux node.
Some disk failures can result in the loss of enough metadata to render the entire file system unable to
mount. In that event it might be possible to preserve some user data through a restricted mode mount. This
facility should only be used if a normal mount does not succeed, and should be considered a last resort
to save some data after a fatal disk failure.
Restricted mode mount is invoked by using the mmmount command with the -o rs flags. After a
restricted mode mount is done, some data may be sufficiently accessible to allow copying to another file
system. The success of this technique depends on the actual disk structures damaged.
Attention: Attempt this only after you have tried to repair the file system with the mmfsck command.
Read-only mode mount is invoked by using the mmmount command with the -o ro flags. After a
read-only mode mount is done, some data may be sufficiently accessible to allow copying to another file
system. The success of this technique depends on the actual disk structures damaged.
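For example, assuming a file system device named fs1 (the name is illustrative), a restricted mode mount
or a read-only mode mount could be attempted with:
mmmount fs1 -o rs
mmmount fs1 -o ro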
The lsof command is available in Linux distributions or by using anonymous ftp from
lsof.itap.purdue.edu (cd to /pub/tools/unix/lsof). The inventor of the lsof command is Victor A. Abell
([email protected]), Purdue University Computing Center.
Use the -L option to see the node name and IP address of each node that has the file system in use. This
command can be used for all file systems, all remotely mounted file systems, or file systems mounted on
nodes of certain clusters.
While not specifically intended as a service aid, the mmlsmount command is useful in these situations:
1. When writing and debugging new file system administrative procedures, to determine which nodes
have a file system mounted and which do not.
2. When mounting a file system on multiple nodes, to determine which nodes have successfully
completed the mount and which have not.
3. When a file system is mounted, but appears to be inaccessible to some nodes but accessible to others,
to determine the extent of the problem.
4. When a normal (not force) unmount has not completed, to determine the affected nodes.
5. When a file system has force unmounted on some nodes but not others, to determine the affected
nodes.
For example, to list the nodes having all file systems mounted:
mmlsmount all -L
The mmlsmount command is fully described in the GPFS Commands chapter in the IBM Spectrum Scale:
Administration and Programming Reference.
The -L flag, used in conjunction with the -I test flag, allows you to display the actions that would be
performed by a policy file without actually applying it. This way, potential errors and misunderstandings
can be detected and corrected without actually making these mistakes.
mmapplypolicy -L 0
Use this option to display only serious errors.
mmapplypolicy -L 1
Use this option to display all of the information (if any) from the previous level, plus some information
as the command runs, but not for each file. This option also displays total numbers for file migration and
deletion.
This command:
mmapplypolicy fs1 -P policyfile -I test -L 1
mmapplypolicy -L 2
Use this option to display all of the information from the previous levels, plus each chosen file and the
scheduled migration or deletion action.
This command:
mmapplypolicy fs1 -P policyfile -I test -L 2
mmapplypolicy -L 3
Use this option to display all of the information from the previous levels, plus each candidate file and the
applicable rule.
This command:
mmapplypolicy fs1 -P policyfile -I test -L 3
mmapplypolicy -L 4
Use this option to display all of the information from the previous levels, plus the name of each explicitly
excluded file, and the applicable rule.
This command:
mmapplypolicy fs1 -P policyfile -I test -L 4
indicate that there are two excluded files, /fs1/file1.save and /fs1/file2.save.
mmapplypolicy -L 5
Use this option to display all of the information from the previous levels, plus the attributes of candidate
and excluded files.
This command:
mmapplypolicy fs1 -P policyfile -I test -L 5
mmapplypolicy -L 6
Use this option to display all of the information from the previous levels, plus files that are not candidate
files, and their attributes.
This command:
mmapplypolicy fs1 -P policyfile -I test -L 6
contains information about the data1 file, which is not a candidate file.
Indications leading you to the conclusion that you should run the mmcheckquota command include:
v MMFS_QUOTA error log entries. This error log entry is created when the quota manager has a
problem reading or writing the quota file.
v Quota information is lost due to node failure. Node failure could leave users unable to open files or
deny them disk space that their quotas should allow.
v The in doubt value is approaching the quota limit. The sum of the in doubt value and the current usage
may not exceed the hard limit. Consequently, the actual block space and number of files available to
the user or the group may be constrained by the in doubt value. Should the in doubt value approach a
significant percentage of the quota, use the mmcheckquota command to account for the lost space and
files.
v User, group, or fileset quota files are corrupted.
During the normal operation of file systems with quotas enabled (not running mmcheckquota online),
the usage data reflects the actual usage of the blocks and inodes in the sense that if you delete files you
should see the usage amount decrease. The in doubt value does not reflect how much the user has used
already, it is just the amount of quotas that the quota server has assigned to its clients. The quota server
does not know whether the assigned amount has been used or not. The only situation where the in doubt
value is important to the user is when the sum of the usage and the in doubt value is greater than the
user's quota hard limit. In this case, the user is not allowed to allocate more blocks or inodes unless the
usage is brought down.
The mmcheckquota command is fully described in the GPFS Commands chapter in the IBM Spectrum
Scale: Administration and Programming Reference.
To find out the local device names for these disks, use the mmlsnsd command with the -m option. For
example, issuing mmlsnsd -m produces output similar to this:
Disk name NSD volume ID Device Node name Remarks
------------------------------------------------------------------------------------
hd2n97 0972846145C8E924 /dev/hdisk2 c5n97g.ppd.pok.ibm.com server node
hd2n97 0972846145C8E924 /dev/hdisk2 c5n98g.ppd.pok.ibm.com server node
hd3n97 0972846145C8E927 /dev/hdisk3 c5n97g.ppd.pok.ibm.com server node
hd3n97 0972846145C8E927 /dev/hdisk3 c5n98g.ppd.pok.ibm.com server node
hd4n97 0972846145C8E92A /dev/hdisk4 c5n97g.ppd.pok.ibm.com server node
hd4n97 0972846145C8E92A /dev/hdisk4 c5n98g.ppd.pok.ibm.com server node
hd5n98 0972846245EB501C /dev/hdisk5 c5n97g.ppd.pok.ibm.com server node
hd5n98 0972846245EB501C /dev/hdisk5 c5n98g.ppd.pok.ibm.com server node
hd6n98 0972846245DB3AD8 /dev/hdisk6 c5n97g.ppd.pok.ibm.com server node
hd6n98 0972846245DB3AD8 /dev/hdisk6 c5n98g.ppd.pok.ibm.com server node
hd7n97 0972846145C8E934 /dev/hd7n97 c5n97g.ppd.pok.ibm.com server node
To obtain extended information for NSDs, use the mmlsnsd command with the -X option. For example,
issuing mmlsnsd -X produces output similar to this:
Disk name NSD volume ID Device Devtype Node name Remarks
---------------------------------------------------------------------------------------------------
hd3n97 0972846145C8E927 /dev/hdisk3 hdisk c5n97g.ppd.pok.ibm.com server node,pr=no
hd3n97 0972846145C8E927 /dev/hdisk3 hdisk c5n98g.ppd.pok.ibm.com server node,pr=no
hd5n98 0972846245EB501C /dev/hdisk5 hdisk c5n97g.ppd.pok.ibm.com server node,pr=no
hd5n98 0972846245EB501C /dev/hdisk5 hdisk c5n98g.ppd.pok.ibm.com server node,pr=no
sdfnsd 0972845E45F02E81 /dev/sdf generic c5n94g.ppd.pok.ibm.com server node
sdfnsd 0972845E45F02E81 /dev/sdm generic c5n96g.ppd.pok.ibm.com server node
The mmlsnsd command is fully described in the GPFS Commands chapter in the IBM Spectrum Scale:
Administration and Programming Reference.
For example, if you issue mmwindisk list, your output is similar to this:
Disk Avail Type Status Size GPFS Partition ID
---- ----- ------- --------- -------- ------------------------------------
0 BASIC ONLINE 137 GiB
1 GPFS ONLINE 55 GiB 362DD84E-3D2E-4A59-B96B-BDE64E31ACCF
2 GPFS ONLINE 200 GiB BD5E64E4-32C8-44CE-8687-B14982848AD2
3 GPFS ONLINE 55 GiB B3EC846C-9C41-4EFD-940D-1AFA6E2D08FB
4 GPFS ONLINE 55 GiB 6023455C-353D-40D1-BCEB-FF8E73BF6C0F
5 GPFS ONLINE 55 GiB 2886391A-BB2D-4BDF-BE59-F33860441262
6 GPFS ONLINE 55 GiB 00845DCC-058B-4DEB-BD0A-17BAD5A54530
7 GPFS ONLINE 55 GiB 260BCAEB-6E8A-4504-874D-7E07E02E1817
8 GPFS ONLINE 55 GiB 863B6D80-2E15-457E-B2D5-FEA0BC41A5AC
9 YES UNALLOC OFFLINE 55 GiB
10 YES UNALLOC OFFLINE 200 GiB
The mmwindisk command does not provide the NSD volume ID. You can use mmlsnsd -m to find the
relationship between NSDs and devices, which are disk numbers on Windows.
Attention: Use this command only when the IBM Support Center directs you to do so.
Before you run mmfileid, you must run a disk analysis utility and obtain the disk sector numbers that
are damaged or suspect. These sectors are input to the mmfileid command.
For more information, see the help topic on setting the Quality of Service for I/O operations (QoS) in
the IBM Spectrum Scale: Administration and Programming Reference.
You can redirect the output to a file with the -o flag and sort the output on the inode number with the
sort command.
The mmfileid command output contains one line for each inode found to be on a corrupted disk sector.
Each line of the command output has this format:
InodeNumber LogicalDiskAddress SnapshotId Filename
Assume that a disk analysis tool reports that disks hdisk6, hdisk7, hdisk8, and hdisk9 contain bad
sectors, and that the file addr.in has the following contents:
k148n07:hdisk9:2206310-2206810
k148n07:hdisk8:2211038-2211042
k148n07:hdisk8:2201800-2202800
k148n01:hdisk6:2921879-2926880
k148n09:hdisk7:1076208-1076610
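Assuming the file system device is gpfsB (the device and file names are illustrative), the damaged sector
list could then be passed to mmfileid, with the result redirected to a file and sorted on the inode number,
as follows:
mmfileid gpfsB -F addr.in -o addr.out
sort -n addr.out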
The lines that begin with the word Address represent GPFS system metadata files or reserved disk areas.
If your output contains any lines like these, do not attempt to replace or repair the indicated files. If you
suspect that any of the special files are damaged, call the IBM Support Center for assistance.
The following line of output indicates that inode number 14336, disk address 1072256 contains file
/gpfsB/tesDir/testFile.out. The 0 to the left of the name indicates that the file does not belong to a
snapshot. This file is on a potentially bad disk sector area:
14336 1072256 0 /gpfsB/tesDir/testFile.out
The following line of output indicates that inode number 14344, disk address 2922528 contains file
/gpfsB/x.img. The 1 to the left of the name indicates that the file belongs to snapshot number 1. This file
is on a potentially bad disk sector area:
14344 2922528 1 /gpfsB/x.img
The SHA digest is a short and convenient way to identify a key registered with either the mmauth show
or mmremotecluster command. In theory, two keys may have the same SHA digest. In practice, this is
extremely unlikely. The SHA digest can be used by the administrators of two GPFS clusters to determine
if they each have received (and registered) the right key file from the other administrator.
An example is the situation of two administrators named Admin1 and Admin2 who have each registered
the other's key file, but find that mount attempts by Admin1 for file systems owned by Admin2 fail.
Comparing the SHA digests displayed by the mmauth show and mmremotecluster show commands on
the two clusters reveals whether the registered keys match.
If Admin1 finds that the SHA digests do not match, Admin1 runs the mmremotecluster update
command, passing the correct key file as input.
If Admin2 finds that the SHA digests do not match, Admin2 runs the mmauth update command,
passing the correct key file as input.
This is an example of the output produced by the mmauth show all command:
Cluster name: fksdcm.pok.ibm.com
Cipher list: EXP1024-RC2-CBC-MD5
SHA digest: d5eb5241eda7d3ec345ece906bfcef0b6cd343bd
File system access: fs1 (rw, root allowed)
The distributed nature of GPFS, the complexity of the locking infrastructure, the dependency on the
proper operation of disks and networks, and the overall complexity of operating in a clustered
environment all contribute to increasing the probability of a deadlock.
Deadlocks can be disruptive in certain situations, more so than other types of failures. A deadlock
effectively represents a single point of failure that can render the entire cluster inoperable. When a
deadlock is encountered on a production system, it can take a long time to debug. The typical approach
to recovering from a deadlock involves rebooting all of the nodes in the cluster. Thus, deadlocks can lead
to prolonged and complete outages of clusters.
To troubleshoot deadlocks, you must have specific types of debug data that must be collected while the
deadlock is in progress. Data collection commands must be run manually before the deadlock is broken.
Otherwise, determining the root cause of the deadlock after that is difficult. Also, deadlock detection
requires some form of external action, for example, a complaint from a user. Waiting for a user complaint
means that detecting a deadlock in progress might take many hours.
| In GPFS V4.1 and later, automated deadlock detection, automated deadlock data collection, deadlock
| breakup options, and cluster overload detection are provided to make it easier to handle a deadlock
| situation.
| v “Automated deadlock detection”
| v “Automated deadlock data collection” on page 65
| v “Automated deadlock breakup” on page 66
| v “Deadlock breakup on demand” on page 67
| v “Cluster overload detection” on page 68
Automated deadlock detection monitors waiters. Deadlock detection relies on a configurable threshold to
determine if a deadlock is in progress. When a deadlock is detected, an alert is issued in the mmfs.log,
the operating system log, and the deadlockDetected callback is triggered.
To simplify the process of monitoring for deadlocks, a user callback program can be registered with
mmaddcallback for the deadlockDetected event. This program can be used for recording and notification
purposes. When a suspected deadlock is detected, the deadlockDetected event is triggered, and the user
callback program is run. See the /usr/lpp/mmfs/samples/deadlockdetected.sample file for an example of
using the deadlockDetected event.
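A minimal sketch of such a registration, using the sample script mentioned above and a hypothetical
callback identifier deadlockNotify (in practice the command must be available on all nodes):
mmaddcallback deadlockNotify --command /usr/lpp/mmfs/samples/deadlockdetected.sample --event deadlockDetected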
The following messages, related to deadlock detection, might be found in the mmfs.log files:
Enabled automated deadlock detection.
[A] Deadlock detected: 2015-03-04 02:06:21: waiting 301.291 seconds on node c937f3n04-40g:
PutACLHandlerThread 2449: on MsgRecordCondvar, reason 'RPC wait' for tmMsgTellAcquire1
When a Deadlock detected message is found, it means that a long waiter exceeded the deadlock
detection threshold and is suspected to be a deadlock. It takes time to know with certainty if a long
waiter is an actual deadlock or not. A real deadlock will not disappear after waiting for a longer period,
but a false-positive deadlock can disappear. When selecting a deadlockDetectionThreshold value, there is
a trade-off between waiting too long and not having timely detection of deadlocks and not waiting long
enough causing false-positive deadlock detection. If a false-positive deadlock is detected, a message
similar to the following might be found in the mmfs.log files:
Wed Mar 4 02:11:53.220 2015: [N] Long waiters have disappeared.
In addition to the messages found in mmfs.log files, the mmdiag --deadlock command can be used to
query the suspected deadlock waiters currently on a node. Only the longest waiters that are suspected
deadlocks are shown. Legitimately long waiters that are ignored by deadlock detection are not shown,
but those waiters are shown in the mmdiag --waiters section. Other waiters, which are much shorter than
the longest deadlock waiters, are not shown because they are typically not relevant (even if their waiter
length exceeds the deadlock detection threshold).
The /var/log/messages files on Linux and the error report on AIX also have information relevant for
deadlock detection, but most details are only shown in the mmfs.log files.
While deadlockDetectionThreshold is for medium length waiters that can grow to moderate lengths,
deadlockDetectionThresholdForShortWaiters is for short waiters that should never be long. Waiters that
can be legitimately long under normal operating conditions are ignored by automated deadlock detection,
for example:
TSDELDISKCmdThread: on ThCond 0x1127916B8 (0x1127916B8) (InodeScanStatCondvar),
reason 'Waiting for PIT worker threads to finish'
When you adjust the deadlock detection threshold, you can disable automated deadlock data collection to
avoid collecting debug data unnecessarily. Run the workload for a while to determine the longest waiter
length detected as a false-positive deadlock. Use that length to determine a better value for
deadlockDetectionThreshold. You can also try increasing the deadlockDetectionThreshold a few times
until no more false-positive deadlocks are detected. If you disabled automated deadlock data collection
while you were adjusting the threshold, enable it again after the adjustments are complete.
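For example, to raise the threshold to a value determined from your workload, issue a command similar
to the following (the value of 600 seconds is illustrative):
mmchconfig deadlockDetectionThreshold=600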
Deadlock amelioration functions should only be used on a stable GPFS cluster to avoid extraneous
messages in the mmfs.log files and unnecessary debug data collection. If a cluster is not stable, deadlock
detection should be disabled.
All deadlock amelioration functions, not just deadlock detection, are disabled by specifying 0 for
deadlockDetectionThreshold. A positive value must be specified for deadlockDetectionThreshold to
enable any part of the deadlock amelioration functions.
Automated deadlock data collection can be used to help gather this crucial debug data on detection of a
potential deadlock.
Automated deadlock data collection is enabled by default and controlled with the mmchconfig attribute
deadlockDataCollectionDailyLimit. The deadlockDataCollectionDailyLimit attribute specifies the
maximum number of times debug data can be collected in a 24-hour period. To view the current data
collection interval, enter the following command:
mmlsconfig deadlockDataCollectionDailyLimit
Note: The 24-hour period for deadlockDataCollectionDailyLimit is enforced passively. When there is a
need to collect debug data, the deadlockDataCollectionDailyLimit is examined to determine whether 24
hours passed since the beginning of this period and whether a new period for
deadlockDataCollectionDailyLimit needs to be started or not. If the number of debug data collections
exceeds the deadlockDataCollectionDailyLimit value before the period reaches 24 hours, then no debug
data will be collected until the next period starts. Sometimes exceptions are made to help capture the
The following messages, related to deadlock data collection, might be found in the mmfs.log files:
[I] Enabled automated deadlock debug data collection.
[N] Debug data has not been collected. It was collected recently at 2014-01-29 12:58:00.
Trace data is part of the debug data that is collected when a suspected deadlock is detected. However, on
a typical customer system, GPFS tracing is not routinely turned on. In this case, the automated debug
data collection turns on tracing, waits for 20 seconds, collects the trace, and turns off tracing. The 20
seconds of trace will not cover the formation of the deadlock, but it might still provide some helpful
debug data.
If a system administrator prefers to control the deadlock breakup process, the deadlockDetected callback
can be used to notify system administrators that a potential deadlock was detected. The information from
the mmdiag --deadlock section can then be used to help determine what steps to take to resolve the
deadlock.
Automated deadlock breakup is disabled by default and controlled with the mmchconfig attribute
deadlockBreakupDelay. The deadlockBreakupDelay attribute specifies how long to wait after a
deadlock is detected before attempting to break up the deadlock. Enough time must be provided to allow
the debug data collection to complete. To view the current breakup delay, enter the following command:
mmlsconfig deadlockBreakupDelay
The value of 0 shows that automated deadlock breakup is disabled. To enable automated deadlock
breakup, specify a positive value for deadlockBreakupDelay. If automated deadlock breakup is to be
enabled, a delay of 300 seconds or longer is recommended.
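For example, to enable automated deadlock breakup with the recommended minimum delay, issue:
mmchconfig deadlockBreakupDelay=300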
If your goal is to break up a deadlock as soon as possible, and your workload can afford an interruption
at any time, then enable automated deadlock breakup from the beginning. Otherwise, keep automated
deadlock breakup disabled to avoid unexpected interruptions to your workload. In this case, you can
choose to break the deadlock manually, or use the function that is described in the “Deadlock breakup on
demand” topic.
| Due to the complexity of the GPFS code, asserts or segmentation faults might happen during a deadlock
| breakup action. That might cause unwanted disruptions to a customer workload still running normally
| on the cluster. A good reason to use deadlock breakup on demand is to not disturb a partially working
| cluster until it is safe to do so. Try not to break up a suspected deadlock prematurely to avoid
| unnecessary disruptions. If automated deadlock breakup is enabled all of the time, it is good to set
| deadlockBreakupDelay to a large value such as 3600 seconds. If using mmcommon breakDeadlock, it is
| better to wait until the longest deadlock waiter is an hour or longer. Much shorter times can be used if a
| customer prefers fast action in breaking a deadlock over assurance that a deadlock is real.
The following messages, related to deadlock breakup, might be found in the mmfs.log files:
[I] Enabled automated deadlock breakup.
A deadlock can be localized, for example, it might involve only one of many file systems in a cluster. The
other file systems in the cluster can still be used, and a mission critical workload might need to continue
uninterrupted. In these cases, the best time to break up the deadlock is after the mission critical workload
ends.
The mmcommon command can be used to break up an existing deadlock in a cluster when the deadlock
was previously detected by deadlock amelioration. To start the breakup on demand, use the following
syntax:
mmcommon breakDeadlock [-N {Node[,Node...] | NodeFile | NodeClass}]
If the mmcommon breakDeadlock command is issued without the -N parameter, then every node in the
cluster receives a request to take action on any long waiter that is a suspected deadlock.
If the mmcommon breakDeadlock command is issued with the -N parameter, then only the nodes that
are specified receive a request to take action on any long waiter that is a suspected deadlock. For
example, assume that there are two nodes, called node3 and node6, that require a deadlock breakup. To
send the breakup request to just these nodes, issue the following command:
mmcommon breakDeadlock -N node3,node6
The output of the mmdsh command can be used to determine if any deadlock waiters still exist and if
any additional actions are needed.
The effect of the mmcommon breakDeadlock command only persists on a node until the longest
deadlock waiter that was detected disappears. All actions that are taken by mmcommon breakDeadlock
are recorded in the mmfs.log file. When mmcommon breakDeadlock is issued for a node that did not
have a deadlock, no action is taken except for recording the following message in the mmfs.log file:
[N] Received deadlock breakup request from 192.168.40.72: No deadlock to break up.
The mmcommon breakDeadlock command provides more control over breaking up deadlocks, but
multiple breakup requests might be required to achieve satisfactory results. All waiters that exceeded the
deadlockDetectionThreshold might not disappear when mmcommon breakDeadlock completes on a
node. In complicated deadlock scenarios, some long waiters can persist after the longest waiters
disappear. Waiter length can grow to exceed the deadlockDetectionThreshold at any point, and waiters
can disappear at any point as well. Examine the waiter situation after mmcommon breakDeadlock
completes to determine whether the command must be repeated to break up the deadlock.
Another way to break up a deadlock on demand is to enable automated deadlock breakup by changing
deadlockBreakupDelay to a positive value. By enabling automated deadlock breakup, breakup actions
are initiated on existing deadlock waiters. The breakup actions repeat automatically if deadlock waiters
are detected. Change deadlockBreakupDelay back to 0 when the results are satisfactory, or when you
want to control the timing of deadlock breakup actions again. If automated deadlock breakup remains
enabled, breakup actions start on any newly detected deadlocks without any intervention.
A cluster overload condition does not affect how GPFS works outside of the deadlock amelioration
functions. However, cluster overload detection and notification can be used for monitoring hardware,
network, or workload conditions to help maintain a healthy production cluster.
Cluster overload detection is enabled by default and controlled with the mmchconfig attribute
deadlockOverloadThreshold. The deadlockOverloadThreshold attribute can be adjusted to ensure that
overload conditions are detected according to the criteria you set, instead of reporting overload
conditions that you can tolerate. To view the current threshold for cluster overload detection, enter the
following command:
mmlsconfig deadlockOverloadThreshold
To disable cluster overload detection, specify a value of 0 for the deadlockOverloadThreshold attribute.
To simplify the process of monitoring for a cluster overload condition, a user callback program can be
registered with mmaddcallback for the deadlockOverload event. This program can be used for recording
and notification purposes. Whenever a node detects an overload condition, the deadlockOverload event
is triggered, and the user callback program is run.
When a node detects an overload condition, it notifies all nodes in the cluster that the cluster is now
overloaded. The notification process uses the cluster manager and the gpfsNotifyOverload event.
Overload is a cluster-wide condition because all the nodes in a cluster work together, and long waiters on
one node can affect other nodes in the cluster. To reduce network traffic, each node checks whether the
overload condition should be cleared or not. After a node does not detect an overload condition and is
not informed that the cluster is still overloaded, each node will mark the cluster as no longer overloaded
after a short period.
The following messages, related to cluster overload, might be found in the mmfs.log files:
[W] Warning: cluster myCluster may become overloaded soon.
[I] This node is the cluster manager of Cluster myCluster, sending 'overloaded' status to the entire cluster
If automated deadlock breakup is enabled, it is disabled temporarily until the overload condition is
cleared. This process avoids unnecessary breakup actions when a false-positive deadlock is detected.
If your problem occurs on the AIX operating system, see AIX in IBM Knowledge Center
(www.ibm.com/support/knowledgecenter/ssw_aix/welcome) and search for the appropriate kernel
debugging documentation for information about the AIX kdb command.
If your problem occurs on the Linux operating system, see the documentation for your distribution
vendor.
If your problem occurs on the Windows operating system, the following tools that are available from the
Microsoft website (www.microsoft.com), might be useful in troubleshooting:
v Debugging Tools for Windows
v Process Monitor
v Process Explorer
v Microsoft Windows Driver Kit
v Microsoft Windows Software Development Kit
The mmpmon command is intended for system administrators to analyze their I/O on the node on
which it is run. It is not primarily a diagnostic tool, but may be used as one for certain problems. For
example, running mmpmon on several nodes may be used to detect nodes that are experiencing poor
performance or connectivity problems.
The syntax of the mmpmon command is fully described in the GPFS Commands chapter in the IBM
Spectrum Scale: Administration and Programming Reference. For details on the mmpmon command, see the
Monitoring GPFS I/O performance with the mmpmon command topic in the IBM Spectrum Scale:
Administration and Programming Reference.
An IBM Spectrum Scale installation problem should be suspected when GPFS modules are not loaded
successfully, commands do not work, either on the node that you are working on or on other nodes, new
command operands added with a new release of IBM Spectrum Scale are not recognized, or there are
problems with the kernel extension.
A GPFS configuration problem should be suspected when the GPFS daemon will not activate, it will not
remain active, or it fails on some nodes but not on others. Suspect a configuration problem also if
quorum is lost, certain nodes appear to hang or do not communicate properly with GPFS, nodes cannot
be added to the cluster or are expelled, or GPFS performance is very noticeably degraded once a new
release of GPFS is installed or configuration parameters have been changed.
These are some of the errors encountered with GPFS installation, configuration and operation:
v “Installation and configuration problems”
v “GPFS modules cannot be loaded on Linux” on page 79
v “GPFS daemon will not come up” on page 79
v “GPFS daemon went down” on page 83
v “IBM Spectrum Scale failures due to a network failure” on page 84
v “Kernel panics with a 'GPFS dead man switch timer has expired, and there's still outstanding I/O
requests' message” on page 85
v “Quorum loss” on page 85
v “Delays and deadlocks” on page 86
v “Node cannot be added to the GPFS cluster” on page 87
v “Remote node expelled after remote file system successfully mounted” on page 87
v “Disaster recovery issues” on page 88
v “GPFS commands are unsuccessful” on page 89
v “Application program errors” on page 91
v “Troubleshooting Windows problems” on page 92
v “OpenSSH connection delays” on page 93
The IBM Spectrum Scale: Concepts, Planning, and Installation Guide provides the step-by-step procedure for
installing and migrating IBM Spectrum Scale, however, some problems might occur if the procedures
were not properly followed.
After reinstalling IBM Spectrum Scale code, check whether the /var/mmfs/gen/mmsdrfs file was lost. If it
was lost, and an up-to-date version of the file is present on the primary GPFS cluster configuration
server, restore the file by issuing this command from the node on which it is missing:
mmsdrrestore -p primaryServer
where primaryServer is the name of the primary GPFS cluster configuration server.
If the /var/mmfs/gen/mmsdrfs file is not present on the primary GPFS cluster configuration server, but it
is present on some other node in the cluster, restore the file by issuing these commands:
mmsdrrestore -p remoteNode -F remoteFile
mmchcluster -p LATEST
where remoteNode is the node that has an up-to-date version of the /var/mmfs/gen/mmsdrfs file, and
remoteFile is the full path name of that file on that node.
One way to ensure that the latest version of the /var/mmfs/gen/mmsdrfs file is always available is to use
the mmsdrbackup user exit.
If you have made modifications to any of the user exits in /var/mmfs/etc, you will have to restore them
before starting GPFS.
For additional information, see “Recovery from loss of GPFS cluster configuration data file” on page 77.
The /etc/hosts file must have a unique node name for each node interface to be used by GPFS. Violation
of this requirement results in the message:
6027-1941
Cannot handle multiple interfaces for host hostName.
If you receive this message, correct the /etc/hosts file so that each node interface to be used by GPFS
appears only once in the file.
Many of the GPFS administration commands perform operations on nodes other than the node on which
the command was issued. This is achieved by utilizing a remote invocation shell and a remote file copy
command. By default these items are /usr/bin/ssh and /usr/bin/scp. You also have the option of
specifying your own remote shell and remote file copy commands to be used instead of the default ssh
and scp. The remote shell and copy commands must adhere to the same syntax forms as ssh and scp but
may implement an alternate authentication mechanism. For details, see the mmcrcluster and
mmchcluster commands. These are problems you may encounter with the use of remote commands.
Authorization problems
This topic describes issues with running remote commands due to authorization problems in IBM
Spectrum Scale.
| Note: Use the ssh and scp commands that are shipped with the OpenSSH package supported by GPFS.
| Refer to the IBM Spectrum Scale FAQ in IBM Knowledge Center (www.ibm.com/support/
| knowledgecenter/STXKQY/gpfsclustersfaq.html) for the latest OpenSSH information.
For the ssh and scp commands issued by GPFS administration commands to succeed, each node in the
cluster must have an .rhosts file in the home directory for the root user, with file permission set to 600.
This .rhosts file must list each of the nodes and the root user. If such an .rhosts file does not exist on each
node in the cluster, the ssh and scp commands issued by GPFS commands will fail with permission
errors, causing the GPFS commands to fail in turn.
If you elected to use installation specific remote invocation shell and remote file copy commands, you
must ensure:
1. Proper authorization is granted to all nodes in the GPFS cluster.
2. The nodes in the GPFS cluster can communicate without the use of a password, and without any
extraneous messages.
Connectivity problems
This topic describes issues with running GPFS commands on remote nodes due to connectivity
problems.
Another reason why ssh may fail is that connectivity to a needed node has been lost. Error messages
from mmdsh may indicate that connectivity to such a node has been lost. Here is an example:
mmdelnode -N k145n04
Verifying GPFS is stopped on all affected nodes ...
mmdsh: 6027-1617 There are no available nodes on which to run the command.
mmdelnode: 6027-1271 Unexpected error from verifyDaemonInactive: mmcommon onall.
Return code: 1
If error messages indicate that connectivity to a node has been lost, use the ping command to verify
whether the node can still be reached:
ping k145n04
PING k145n04: (119.114.68.69): 56 data bytes
<Ctrl- C>
----k145n04 PING Statistics----
3 packets transmitted, 0 packets received, 100% packet loss
If connectivity has been lost, restore it, then reissue the GPFS command.
When rsh problems arise, the system may display information similar to these error messages:
6027-1615
nodeName remote shell process had return code value.
6027-1617
There are no available nodes on which to run the command.
The mmcommon showLocks command displays information about the lock server, lock name, lock
holder, PID, and extended information. If a GPFS administration command is not responding,
stopping the command will free the lock. If another process now has the PID that is shown, it means
that the original GPFS command failed and died without freeing the lock, and that an unrelated
process has since been assigned the same PID. If this is the case, do not kill the process.
2. If any locks are held and you want to release them manually, from any node in the GPFS cluster issue
the command:
mmcommon freeLocks <lockName>
When GPFS commands are unable to retrieve or update the GPFS cluster configuration data files, the
system may display information similar to these error messages:
6027-1628
Cannot determine basic environment information. Not enough nodes are available.
6027-1630
The GPFS cluster data on nodeName is back level.
6027-1631
The commit process failed.
6027-1632
The GPFS cluster configuration data on nodeName is different than the data on nodeName.
6027-1633
Failed to create a backup copy of the GPFS cluster data on nodeName.
A copy of the IBM Spectrum Scale cluster configuration data files is stored in the /var/mmfs/gen/mmsdrfs
file on each node. For proper operation, this file must exist on each node in the IBM Spectrum Scale
cluster. The latest level of this file is guaranteed to be on the primary, and secondary if specified, GPFS
cluster configuration server nodes that were defined when the IBM Spectrum Scale cluster was first
created with the mmcrcluster command.
If the /var/mmfs/gen/mmsdrfs file is missing and an up-to-date version of the file is present on the
primary GPFS cluster configuration server, restore the file by issuing this command from the node on
which it is missing:
mmsdrrestore -p primaryServer
where primaryServer is the name of the primary GPFS cluster configuration server.
If the /var/mmfs/gen/mmsdrfs file is not present on the primary GPFS cluster configuration server, but is
present on some other node in the cluster, restore the file by issuing these commands:
mmsdrrestore -p remoteNode -F remoteFile
mmchcluster -p LATEST
where remoteNode is the node that has an up-to-date version of the /var/mmfs/gen/mmsdrfs file and
remoteFile is the full path name of that file on that node.
One way to ensure that the latest version of the /var/mmfs/gen/mmsdrfs file is always available is to use
the mmsdrbackup user exit.
GPFS provides an exit, mmsdrbackup, that can be used to automatically back up the GPFS configuration
data every time it changes. To activate this facility, follow these steps:
1. Modify the GPFS-provided version of mmsdrbackup as described in its prologue, to accomplish the
backup of the mmsdrfs file however the user desires. This file is /usr/lpp/mmfs/samples/
mmsdrbackup.sample.
2. Copy this modified mmsdrbackup.sample file to /var/mmfs/etc/mmsdrbackup on all of the nodes in
the GPFS cluster. Make sure that the permission bits for /var/mmfs/etc/mmsdrbackup are set to
permit execution by root.
GPFS will invoke the user-modified version of mmsdrbackup in /var/mmfs/etc every time a change is
made to the mmsdrfs file. This will perform the backup of the mmsdrfs file according to the user's
specifications. See the GPFS user exits topic in the IBM Spectrum Scale: Administration and Programming
Reference.
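For example, a minimal sequence for installing the exit on one node might look like the following,
assuming the default sample location and a mode that permits execution by root (adjust the mode to your
site standards):
cp /usr/lpp/mmfs/samples/mmsdrbackup.sample /var/mmfs/etc/mmsdrbackup
chmod 555 /var/mmfs/etc/mmsdrbackup
Repeat these steps, or distribute the modified file with a tool such as scp, so that it is present on every
node in the cluster.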
When experiencing installation and configuration problems, GPFS may report these error numbers in the
operating system error log facility, or return them to an application:
ECONFIG = 215, Configuration invalid or inconsistent between different nodes.
This error is returned when the levels of software on different nodes cannot coexist. For
information about which levels may coexist, see the IBM Spectrum Scale FAQ in IBM Knowledge
Center (www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html).
ENO_QUOTA_INST = 237, No Quota management enabled.
To enable quotas for the file system issue the mmchfs -Q yes command. To disable quotas for the
file system issue the mmchfs -Q no command.
EOFFLINE = 208, Operation failed because a disk is offline
This is most commonly returned when an open of a disk fails. Since GPFS will attempt to
continue operation with failed disks, this will be returned when the disk is first needed to
complete a command or application request. If this return code occurs, check your disk subsystem
for failed or stopped disks and restore access to them.
Some of the more common problems that you may encounter are:
1. If the portability layer is not built, you may see messages similar to:
Mon Mar 26 20:56:30 EDT 2012: runmmfs starting
Removing old /var/adm/ras/mmfs.log.* files:
Unloading modules from /lib/modules/2.6.32.12-0.6-ppc64/extra
runmmfs: The /lib/modules/2.6.32.12-0.6-ppc64/extra/mmfslinux.ko kernel extension does not exist.
runmmfs: Unable to verify kernel/module configuration.
Loading modules from /lib/modules/2.6.32.12-0.6-ppc64/extra
runmmfs: The /lib/modules/2.6.32.12-0.6-ppc64/extra/mmfslinux.ko kernel extension does not exist.
runmmfs: Unable to verify kernel/module configuration.
Mon Mar 26 20:56:30 EDT 2012 runmmfs: error in loading or unloading the mmfs kernel extension
Mon Mar 26 20:56:30 EDT 2012 runmmfs: stopping GPFS
2. The GPFS kernel modules, mmfslinux and tracedev, are built with a kernel version that differs from
that of the currently running Linux kernel. This situation can occur if the modules are built on
another node with a different kernel version and copied to this node, or if the node is rebooted using
a kernel with a different version.
3. If the mmfslinux module is incompatible with your system, you may experience a kernel panic on
GPFS startup. Ensure that the site.mcr has been configured properly from the site.mcr.proto, and
GPFS has been built and installed properly.
For more information about the mmfslinux module, see the Building the GPFS portability layer topic in the
IBM Spectrum Scale: Concepts, Planning, and Installation Guide.
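If the portability layer must be rebuilt to match the currently running kernel, one way to do this on the
affected node is to run the mmbuildgpl command, assuming the GPFS GPL source packages and the
kernel development packages are installed:
/usr/lpp/mmfs/bin/mmbuildgpl
Alternatively, rebuild the modules manually as described in the topic referenced above.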
Verify that the GPFS daemon is active by issuing the ps -e | grep mmfsd command. The output of this
command should list mmfsd as operational. For example:
12230 pts/8 00:00:00 mmfsd
If the output does not show this, the GPFS daemon needs to be started with the mmstartup
command.
3. If you did not specify the autoload option on the mmcrcluster or the mmchconfig command, you
need to manually start the daemon by issuing the mmstartup command.
If you specified the autoload option, someone may have issued the mmshutdown command. In this
case, issue the mmstartup command. When using autoload for the first time, mmstartup must be run
manually. The autoload takes effect on the next reboot.
4. Verify that the network upon which your GPFS cluster depends is up by issuing:
ping nodename
to each node in the cluster. A properly working network and node will correctly reply to the ping
with no lost packets.
Query the network interface that GPFS is using with:
netstat -i
Determine the problem with accessing node nodeName and correct it.
6. Verify that the GPFS environment is properly initialized by issuing these commands and ensuring that
the output is as expected.
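For example, the following commands are typical checks of the cluster and node configuration (the exact
output varies by installation):
mmlscluster
mmlsconfig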
If one or more nodes in the cluster will not start GPFS, these are the possible causes:
v If message:
6027-2700 [E]
A node join was rejected. This could be due to incompatible daemon versions, failure to find
the node in the configuration database, or no configuration manager found.
is written to the GPFS log, incompatible versions of GPFS code exist on nodes within the same cluster.
For shared segment problems, follow the problem determination and repair actions specified with the
following messages:
6027-319
Could not create shared segment.
6027-320
Could not map shared segment.
6027-321
Shared segment mapped at wrong address (is value, should be value).
6027-322
Could not map shared segment in kernel extension.
For network problems, follow the problem determination and repair actions specified with the following
message:
6027-306 [E]
Could not initialize inter-node communication
When the daemon is unable to come up, GPFS may report these error numbers in the operating system
error log, or return them to an application:
ECONFIG = 215, Configuration invalid or inconsistent between different nodes.
This error is returned when the levels of software on different nodes cannot coexist. For
information about which levels may coexist, see the IBM Spectrum Scale FAQ in IBM Knowledge
Center (www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html).
6027-341 [D]
Node nodeName is incompatible because its maximum compatible version (number) is less than the
version of this node (number).
6027-342 [E]
Node nodeName is incompatible because its minimum compatible version is greater than the
version of this node (number).
6027-343 [E]
Node nodeName is incompatible because its version (number) is less than the minimum compatible
version of this node (number).
These are all conditions where the GPFS internal checking has determined that continued operation
would be dangerous to the consistency of your data. Some of these conditions are errors within GPFS
processing but most represent a failure of the surrounding environment.
In most cases, the daemon will exit and restart after recovery. If it is not safe to simply force the
unmounted file systems to recover, the GPFS daemon will exit.
Indications leading you to the conclusion that the daemon went down:
v Applications running at the time of the failure will see either ENODEV or ESTALE errors. The ENODEV
errors are generated by the operating system until the daemon has restarted. The ESTALE error is
generated by GPFS as soon as it restarts.
When quorum is lost, applications with open files receive an ESTALE error return code until the files are
closed and reopened. New file open operations will fail until quorum is restored and the file system is
remounted. Applications accessing these files before GPFS recovers may receive an ENODEV return code
from the operating system.
v The GPFS log contains the message:
6027-650 [X]
The mmfs daemon is shutting down abnormally.
Most GPFS daemon down error messages are in the mmfs.log.previous log for the instance that failed.
If the daemon restarted, it generates a new mmfs.log.latest. Begin problem determination for these
errors by examining the operating system error log.
If an existing quorum is lost, GPFS stops all processing within the cluster to protect the integrity of
your data. GPFS will attempt to rebuild a quorum of nodes and will remount the file system if
automatic mounts are specified.
v Open requests are rejected with no such file or no such directory errors.
When quorum has been lost, requests are rejected until the node has rejoined a valid quorum and
mounted its file systems. If messages indicate lack of quorum, follow the procedures in “GPFS daemon
will not come up” on page 79.
v Removing the setuid bit from the permissions of these commands may produce errors for non-root
users:
mmdf
mmgetacl
mmlsdisk
mmlsfs
mmlsmgr
mmlspolicy
mmlsquota
mmlssnapshot
mmputacl
mmsnapdir
mmsnaplatest
The GPFS system-level versions of these commands (prefixed by ts) may need to be checked for how
permissions are set if non-root users see the following message:
6027-1209
GPFS is down on this node.
Note: The mode bits for all listed commands are 4555 or -r-sr-xr-x. To restore the default (shipped)
permission, enter:
chmod 4555 tscommand
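For example, to check the current mode bits of the ts commands in the default installation directory,
issue:
ls -l /usr/lpp/mmfs/bin/ts*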
This dependency is direct because various IBM Spectrum Scale internal messages flow on the network,
and may be indirect if the underlying disk technology is dependent on the network. Symptoms of an
indirect failure include an inability to complete I/O or GPFS moving disks to the down state.
The problem can also be first detected by the GPFS network communication layer. If network
connectivity is lost between nodes or GPFS heart beating services cannot sustain communication to a
node, GPFS will declare the node dead and perform recovery procedures. This problem will manifest
itself by messages appearing in the GPFS log such as:
Mon Jun 25 22:23:36.298 2007: Close connection to 192.168.10.109 c5n109. Attempting reconnect.
Mon Jun 25 22:23:37.300 2007: Connecting to 192.168.10.109 c5n109
Mon Jun 25 22:23:37.398 2007: Close connection to 192.168.10.109 c5n109
Mon Jun 25 22:23:38.338 2007: Recovering nodes: 9.114.132.109
Mon Jun 25 22:23:38.722 2007: Recovered 1 nodes.
Nodes mounting file systems owned and served by other clusters may receive error messages similar to
this:
Mon Jun 25 16:11:16 2007: Close connection to 89.116.94.81 k155n01
Mon Jun 25 16:11:21 2007: Lost membership in cluster remote.cluster. Unmounting file systems.
If a sufficient number of nodes fail, GPFS will lose the quorum of nodes, which exhibits itself by
messages appearing in the GPFS log, similar to this:
Mon Jun 25 11:08:10 2007: Close connection to 179.32.65.4 gpfs2
Mon Jun 25 11:08:10 2007: Lost membership in cluster gpfsxx.kgn.ibm.com. Unmounting file system.
When either of these cases occur, perform problem determination on your network connectivity. Failing
components could be network hardware such as switches or host bus adapters.
For example:
GPFS Deadman Switch timer has expired, and there’s still outstanding I/O requests
GPFS is designed to tolerate node failures through per-node metadata logging (journaling). The log file is
called the recovery log. In the event of a node failure, GPFS performs recovery by replaying the recovery
log for the failed node, thus restoring the file system to a consistent state and allowing other nodes to
continue working. Prior to replaying the recovery log, it is critical to ensure that the failed node has
indeed failed, as opposed to being active but unable to communicate with the rest of the cluster.
In the latter case, if the failed node has direct access (as opposed to accessing the disk with an NSD
server) to any disks that are a part of the GPFS file system, it is necessary to ensure that no I/O requests
submitted from this node complete once the recovery log replay has started. To accomplish this, GPFS
uses the disk lease mechanism. The disk leasing mechanism guarantees that a node does not submit any
more I/O requests once its disk lease has expired, and the surviving nodes use disk lease time out as a
guideline for starting recovery.
This situation is complicated by the possibility of 'hung I/O'. If an I/O request is submitted prior to the
disk lease expiration, but for some reason (for example, device driver malfunction) the I/O takes a long
time to complete, it is possible that it may complete after the start of the recovery log replay during
recovery. This situation would present a risk of file system corruption. In order to guard against such a
contingency, when I/O requests are being issued directly to the underlying disk device, GPFS initiates a
kernel timer, referred to as the dead man switch. The dead man switch timer goes off in the event of disk
lease expiration and checks whether there are any outstanding I/O requests. If any I/O is pending, a
kernel panic is initiated to prevent possible file system corruption.
Such a kernel panic is not an indication of a software defect in GPFS or the operating system kernel, but
rather it is a sign of
1. Network problems (the node is unable to renew its disk lease).
2. Problems accessing the disk device (I/O requests take an abnormally long time to complete). See
“MMFS_LONGDISKIO” on page 21.
Quorum loss
Each GPFS cluster has a set of quorum nodes explicitly set by the cluster administrator.
These quorum nodes and the selected quorum algorithm determine the availability of file systems owned
by the cluster. See the IBM Spectrum Scale: Concepts, Planning, and Installation Guide and search for quorum.
When quorum loss or loss of connectivity occurs, any nodes still running GPFS suspend the use of file
systems owned by the cluster experiencing the problem. This may result in GPFS access within the
suspended file system receiving ESTALE errnos. Nodes continuing to function after suspending file
system access will start contacting other nodes in the cluster in an attempt to rejoin or reform the
quorum. If they succeed in forming a quorum, access to the file system is restarted.
Normally, quorum loss or loss of connectivity occurs if a node goes down or becomes isolated from its
peers by a network failure. The expected response is to address the failing condition.
If file system processes appear to stop making progress, there may be a system resource problem or an
internal deadlock within GPFS.
Note: A deadlock can occur if user exit scripts that will be called by the mmaddcallback facility are
placed in a GPFS file system. The scripts should be placed in a local file system so they are accessible
even when the networks fail.
Inode Information
-----------------
Number of used inodes: 4244
Number of free inodes: 157036
Number of allocated inodes: 161280
Maximum number of inodes: 512000
GPFS operations that involve allocation of data and metadata blocks (that is, file creation and writes)
will slow down significantly if the number of free blocks drops below 5% of the total number. Free up
some space by deleting some files or snapshots (keeping in mind that deleting a file will not
necessarily result in any disk space being freed up when snapshots are present). Another possible
cause of a performance loss is the lack of free inodes. Issue the mmchfs command to increase the
number of inodes for the file system so there is at least a minimum of 5% free. If the file system is
approaching these limits, you may notice the following error messages:
6027-533 [W]
Inode space inodeSpace in file system fileSystem is approaching the limit for the maximum
number of inodes.
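For example, assuming a file system named fs1 and a new limit of 600000 inodes (both values are
illustrative), the maximum number of inodes can be increased by issuing:
mmchfs fs1 --inode-limit 600000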
This example shows that deadlock debug data was automatically collected in /tmp/mmfs. If deadlock
debug data was not automatically collected, it would need to be manually collected.
To determine which nodes have the longest waiting threads, issue this command on each node:
/usr/lpp/mmfs/bin/mmdiag --waiters waitTimeInSeconds
For all nodes that have threads waiting longer than waitTimeInSeconds seconds, issue:
mmfsadm dump all
Notes:
a. Each node can potentially dump more than 200 MB of data.
b. Run the mmfsadm dump all command only on nodes that you are sure the threads are really
hung. An mmfsadm dump all command can follow pointers that are changing and cause the node
to crash.
3. If the deadlock situation cannot be corrected, follow the instructions in “Additional information to
collect for delays and deadlocks” on page 168, then contact the IBM Support Center.
The following two messages might appear in the GPFS log for active/active disaster recovery scenarios
with GPFS replication. The purpose of these messages is to record quorum override decisions that are
made after the loss of most of the disks:
6027-435 [N]
The file system descriptor quorum has been overridden.
6027-490 [N]
The descriptor replica on disk diskName has been excluded.
A message similar to these appears in the log on the file system manager node every time it reads the
file system descriptor with an overridden quorum:
...
6027-435 [N] The file system descriptor quorum has been overridden.
6027-490 [N] The descriptor replica on disk gpfs23nsd has been excluded.
6027-490 [N] The descriptor replica on disk gpfs24nsd has been excluded.
...
For more information on quorum override, see the IBM Spectrum Scale: Concepts, Planning, and Installation
Guide and search for quorum.
For PPRC and FlashCopy-based configurations, more problem determination information can be collected
from the ESS log file. This information and the appropriate ESS documentation must be consulted when
working with various types of disk subsystem-related failures. For instance, if users are unable to perform a
PPRC failover (or failback) task successfully or unable to generate a FlashCopy® of a disk volume, they
should consult the subsystem log and the appropriate ESS documentation. For more information, see the
following topics:
v IBM Enterprise Storage Server® (www.redbooks.ibm.com/redbooks/pdfs/sg245465.pdf)
v IBM TotalStorage Enterprise Storage Server Web Interface User's Guide (publibfp.boulder.ibm.com/epubs/
pdf/f2bui05.pdf).
These messages indicate that ssh is not working properly on nodes k145n01 and k145n02.
If you encounter this type of failure, determine why ssh is not working on the identified node. Then
fix the problem.
4. Most problems encountered during file system creation fall into three classes:
v You did not create network shared disks which are required to build the file system.
v The creation operation cannot access the disk.
Follow the procedures for checking access to the disk. This can result from a number of factors
including those described in “NSD and underlying disk subsystem failures” on page 127.
v Unsuccessful attempt to communicate with the file system manager.
The file system creation runs on the file system manager node. If that node goes down, the mmcrfs
command may not succeed.
5. If the mmdelnode command was unsuccessful and you plan to permanently de-install GPFS from a
node, you should first remove the node from the cluster. If this is not done and you run the
mmdelnode command after the mmfs code is removed, the command will fail and display a message
similar to this example:
Verifying GPFS is stopped on all affected nodes ...
k145n05: ksh: /usr/lpp/mmfs/bin/mmremote: not found.
If this happens, power off the node and run the mmdelnode command again.
6. If you have successfully installed and are operating with the latest level of GPFS, but cannot run the
new functions available, it is probable that you have not issued the mmchfs -V full or mmchfs -V
compat command to change the version of the file system. This command must be issued for each of
your file systems.
In addition to mmchfs -V, you may need to run the mmmigratefs command. See the File system
format changes between versions of GPFS topic in the IBM Spectrum Scale: Administration and Programming
Reference.
Note: Before issuing the -V option (with full or compat), see the Migration, coexistence and compatibility
topic in the IBM Spectrum Scale: Concepts, Planning, and Installation Guide. You must ensure that all
nodes in the cluster have been migrated to the latest level of GPFS code and that you have
successfully run the mmchconfig release=LATEST command.
Make sure you have operated with the new level of code for some time and are certain you want to
migrate to the latest level of GPFS. Issue the mmchfs -V full command only after you have definitely
decided to accept the latest level, as this will cause disk changes that are incompatible with previous
levels of GPFS.
For more information about the mmchfs command, see the IBM Spectrum Scale: Administration and
Programming Reference.
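For example, assuming all nodes have been migrated and a file system named fs1 (the file system name
is illustrative), the typical sequence is:
mmchconfig release=LATEST
mmchfs fs1 -V full
Repeat the mmchfs -V command for each file system in the cluster.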
If message 6027-538 is returned from the mmcrfs command, verify that the disk descriptors are specified
correctly and that all named disks exist and are online. Issue the mmlsnsd command to check the disks.
6027-538
Error accessing disks.
If the daemon was not running when you issued the command, you will see message 6027-665. Follow
the procedures in “GPFS daemon will not come up” on page 79.
6027-665
Failed to connect to file system daemon: errorString.
When GPFS commands are unsuccessful, the system may display information similar to these error
messages:
6027-1627
The following nodes are not aware of the configuration server change: nodeList. Do not start GPFS
on the preceding nodes until the problem is resolved.
Note: There is no way to force GPFS nodes to relinquish all their local shares in order to check for
lost quotas. This can only be determined by running the mmcheckquota command immediately after
mounting the file system, and before any allocations are made. In this case, the value in doubt is the
amount lost.
To display the latest quota usage information, use the -e option on either the mmlsquota or the
mmrepquota commands. Remember that the mmquotaon and mmquotaoff commands do not enable
and disable quota management. These commands merely control enforcement of quota limits. Usage
continues to be counted and recorded in the quota files regardless of enforcement.
Reduce quota usage by deleting or compressing files or moving them out of the file system. Consider
increasing the quota limit.
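For example, to display the latest quota usage, limits, and in-doubt values for a file system named fs1
(the file system name is illustrative), issue:
mmrepquota -e fs1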
Application program errors can be associated with these GPFS message numbers:
6027-506
program: loadFile is already loaded at address.
6027-695 [E]
File system is read-only.
Make sure users own their home directories, which is not normally the case on Windows. They should
also own ~/.ssh and the files it contains. Here is an example of file attributes that work:
bash-3.00$ ls -l -d ~
drwx------ 1 demyn Domain Users 0 Dec 5 11:53 /dev/fs/D/Users/demyn
bash-3.00$ ls -l -d ~/.ssh
drwx------ 1 demyn Domain Users 0 Oct 26 13:37 /dev/fs/D/Users/demyn/.ssh
bash-3.00$ ls -l ~/.ssh
total 11
drwx------ 1 demyn Domain Users 0 Oct 26 13:37 .
drwx------ 1 demyn Domain Users 0 Dec 5 11:53 ..
-rw-r--r-- 1 demyn Domain Users 603 Oct 26 13:37 authorized_keys2
-rw------- 1 demyn Domain Users 672 Oct 26 13:33 id_dsa
-rw-r--r-- 1 demyn Domain Users 603 Oct 26 13:33 id_dsa.pub
-rw-r--r-- 1 demyn Domain Users 2230 Nov 11 07:57 known_hosts
bash-3.00$
The SMB2 protocol is negotiated between a client and the server during the establishment of the SMB
connection, and it becomes active only if both the client and the server are SMB2 capable. If either side is
not SMB2 capable, the default SMB (version 1) protocol gets used.
The SMB2 protocol does active metadata caching on the client redirector side, and it relies on Directory
Change Notification on the server to invalidate and refresh the client cache. However, GPFS on Windows
currently does not support Directory Change Notification. As a result, if SMB2 is used for serving out an
IBM Spectrum Scale file system, the SMB2 redirector cache on the client will not see any cache-invalidate
operations if the actual metadata is changed, either directly on the server or via another CIFS client. In
such a case, the SMB2 client will continue to see its cached version of the directory contents until the
redirector cache expires. Therefore, the use of SMB2 protocol for CIFS sharing of GPFS file systems can
result in the CIFS clients seeing an inconsistent view of the actual GPFS namespace.
A workaround is to disable the SMB2 protocol on the CIFS server (that is, the GPFS compute node). This
will ensure that SMB2 never gets negotiated for file transfer even if any CIFS client is SMB2 capable.
To disable SMB2 on the GPFS compute node, follow the instructions under the “MORE INFORMATION”
section at the Microsoft Support website (support.microsoft.com/kb/974103).
If you are using OpenSSH and experiencing an SSH connection delay (and if IPv6 is not supported in
your environment), try disabling IPv6 on your Windows nodes and remove or comment out any IPv6
addresses from the /etc/resolv.conf file.
You can also suspect a file system problem if a file system unmounts unexpectedly, or you receive an
error message indicating that file system activity can no longer continue due to an error, and the file
system is being unmounted to preserve its integrity. Record all error messages and log entries that you
receive relative to the problem, making sure that you look on all affected nodes for this data.
These are some of the errors encountered with GPFS file systems:
v “File system will not mount”
v “File system will not unmount” on page 104
v “File system forced unmount” on page 105
v “Unable to determine whether a file system is mounted” on page 108
v “Multiple file system manager failures” on page 108
v “Discrepancy between GPFS configuration data and the on-disk data for a file system” on page 109
v “Errors associated with storage pools, filesets and policies” on page 109
v “Failures using the mmbackup command” on page 116
v “Snapshot problems” on page 116
v “Failures using the mmpmon command” on page 119
v “NFS issues” on page 121
v “Problems working with Samba” on page 123
v “Data integrity” on page 124
v “Messages requeuing in AFM” on page 124
For more information about mounting a file system that is owned and served by another GPFS
cluster, see the IBM Spectrum Scale: Advanced Administration Guide.
To start the automount daemon, issue the mmcommon startAutomounter command, or stop and
restart GPFS using the mmshutdown and mmstartup commands.
Note: If automountdir is mounted (as in step 2) and the mmcommon startAutomounter command is
not able to bring up the automount daemon, manually umount the automountdir before issuing the
mmcommon startAutomounter again.
4. Verify that the mount command was issued to GPFS by examining the GPFS log. You should see
something like this:
Mon Jun 25 11:33:03 2004: Command: mount gpfsx2.kgn.ibm.com:gpfs55 5182
5. Examine /var/log/messages for autofs error messages.
This is an example of what you might see if the remote file system name does not exist.
Jun 25 11:33:03 linux automount[20331]: attempting to mount entry /gpfs/automountdir/gpfs55
Jun 25 11:33:04 linux automount[28911]: >> Failed to open gpfs55.
Jun 25 11:33:04 linux automount[28911]: >> No such device
Jun 25 11:33:04 linux automount[28911]: >> mount: fs type gpfs not supported by kernel
Jun 25 11:33:04 linux automount[28911]: mount(generic): failed to mount /dev/gpfs55 (type gpfs)
on /gpfs/automountdir/gpfs55
6. After you have established that GPFS has received a mount request from autofs (Step 4) and that
mount request failed (Step 5), issue a mount command for the GPFS file system and follow the
directions in “File system will not mount” on page 95.
These are some of the errors encountered when mounting remote file systems:
v “Remote file system I/O fails with the “Function not implemented” error message when UID mapping
is enabled”
v “Remote file system will not mount due to differing GPFS cluster security configurations” on page 101
v “Cannot resolve contact node address” on page 101
v “The remote cluster name does not match the cluster name supplied by the mmremotecluster
command” on page 101
v “Contact nodes down or GPFS down on contact nodes” on page 102
v “GPFS is not running on the local node” on page 102
v “The NSD disk does not have an NSD server specified and the mounting cluster does not have direct
access to the disks” on page 102
v “The cipherList option has not been set properly” on page 103
v “Remote mounts fail with the “permission denied” error message” on page 103
Remote file system I/O fails with the “Function not implemented” error message
when UID mapping is enabled
When user ID (UID) mapping in a multi-cluster environment is enabled, certain kinds of mapping
infrastructure configuration problems might result in I/O requests on a remote file system failing:
ls -l /fs1/testfile
ls: /fs1/testfile: Function not implemented
For more information about configuring UID mapping, see the IBM white paper entitled UID Mapping for
GPFS in a Multi-cluster Environment in IBM Knowledge Center (www.ibm.com/support/
knowledgecenter/SSFKCN/com.ibm.cluster.gpfs.doc/gpfs_uid/uid_gpfs.html).
The GPFS log on the cluster issuing the mount command should have entries similar to these:
There is more information in the log file /var/adm/ras/mmfs.log.latest
Mon Jun 25 16:39:27 2007: Waiting to join remote cluster gpfsxx2.ibm.com
Mon Jun 25 16:39:27 2007: Command: mount gpfsxx2.ibm.com:gpfs66 30291
Mon Jun 25 16:39:27 2007: The administrator of 199.13.68.12 gpfslx2 requires
secure connections. Contact the administrator to obtain the target clusters
key and register the key using "mmremotecluster update".
Mon Jun 25 16:39:27 2007: A node join was rejected. This could be due to
incompatible daemon versions, failure to find the node
in the configuration database, or no configuration manager found.
Mon Jun 25 16:39:27 2007: Failed to join remote cluster gpfsxx2.ibm.com
Mon Jun 25 16:39:27 2007: Command err 693: mount gpfsxx2.ibm.com:gpfs66 30291
The GPFS log file on the cluster that owns and serves the file system will have an entry indicating the
problem as well, similar to this:
Mon Jun 25 16:32:21 2007: Kill accepted connection from 199.13.68.12 because security is required, err 74
To resolve this problem, contact the administrator of the cluster that owns and serves the file system to
obtain the key and register it using the mmremotecluster update command.
The SHA digest field of the mmauth show and mmremotecluster commands may be used to determine if
there is a key mismatch, and on which cluster the key should be updated. For more information on the
SHA digest, see “The SHA digest” on page 61.
To resolve the problem, correct the contact list and try the mount again.
The remote cluster name does not match the cluster name supplied by the
mmremotecluster command
A mount command fails with a message similar to this:
Cannot mount gpfslx2:gpfs66: Network is unreachable
In this example, the correct cluster name is gpfslx2.ibm.com and not gpfslx2; the mmlscluster command
issued on the cluster that owns the file system displays its full cluster name.
To resolve the problem, use the mmremotecluster show command and verify that the cluster name
matches the remote cluster and the contact nodes are valid nodes in the remote cluster. Verify that GPFS
is active on the contact nodes in the remote cluster. Another way to resolve this problem is to change the
contact nodes using the mmremotecluster update command.
The NSD disk does not have an NSD server specified and the mounting cluster
does not have direct access to the disks
A file system mount fails with a message similar to this:
Failed to open gpfs66.
No such device
mount: Stale NFS file handle
Some file system data are inaccessible at this time.
Check error log for additional information.
Cannot mount gpfslx2.ibm.com:gpfs66: Stale NFS file handle
To resolve the problem, the cluster that owns and serves the file system must define one or more NSD
servers.
The mmchconfig cipherList=AUTHONLY command must be run on both the cluster that owns and
controls the file system, and the cluster that is attempting to mount the file system.
See the IBM Spectrum Scale: Administration and Programming Reference for detailed information about the
mmauth command and the mmremotefs command.
Mount failure due to client nodes joining before NSD servers are
online
If a client node joins the GPFS cluster and attempts file system access prior to the file system's NSD
servers being active, the mount fails. This is especially true when automount is used. This situation can
occur during cluster startup, or any time that an NSD server is brought online with client nodes already
active and attempting to mount a file system served by the NSD server.
Two mmchconfig command options are used to specify the amount of time for GPFS mount requests to
wait for an NSD server to join the cluster:
nsdServerWaitTimeForMount
Specifies the number of seconds to wait for an NSD server to come up at GPFS cluster startup
time, after a quorum loss, or after an NSD server failure.
Valid values are between 0 and 1200 seconds. The default is 300. The interval for checking is 10
seconds. If nsdServerWaitTimeForMount is 0, nsdServerWaitTimeWindowOnMount has no
effect.
nsdServerWaitTimeWindowOnMount
Specifies a time window to determine if quorum is to be considered recently formed.
Valid values are between 1 and 1200 seconds. The default is 600. If nsdServerWaitTimeForMount
is 0, nsdServerWaitTimeWindowOnMount has no effect.
The GPFS daemon need not be restarted in order to change these values. The scope of these two
operands is the GPFS cluster. The -N flag can be used to set different values on different nodes. In this
case, the settings on the file system manager node take precedence over the settings of nodes trying to
access the file system.
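For example, to lengthen both wait times on all nodes (the values shown are illustrative; valid ranges are
given above), issue:
mmchconfig nsdServerWaitTimeForMount=600,nsdServerWaitTimeWindowOnMount=900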
When a node rejoins the cluster (after it was expelled, experienced a communications problem, lost
quorum, or other reason for which it dropped connection and rejoined), that node resets all the failure
times that it knows about. Therefore, when a node rejoins it sees the NSD servers as never having failed.
From the node's point of view, it has rejoined the cluster and old failure information is no longer
relevant.
GPFS checks the cluster formation criteria first. If that check falls outside the window, GPFS then checks
for NSD server fail times being within the window.
The file system will not unmount until all processes are finished accessing it. If mmfsd is up, the
processes accessing the file system can be determined. See “The lsof command” on page 50. These
processes can be killed with the command:
lsof filesystem | grep -v COMMAND | awk '{print $2}' | xargs kill -9
If mmfsd is not operational, the lsof command will not be able to determine which processes are still
accessing the file system.
For Linux nodes it is possible to use the /proc pseudo file system to determine current file access. For
each process currently running on the system, there is a subdirectory /proc/pid/fd, where pid is the
numeric process ID number. This subdirectory is populated with symbolic links pointing to the files
that this process has open. You can examine the contents of the fd subdirectory for all running
processes, manually or with the help of a simple script (see the sketch after this list), to identify the processes that have open files
in GPFS file systems. Terminating all of these processes may allow the file system to unmount
successfully.
2. Verify that there are no disk media failures.
Look on the NSD server node for error log entries. Identify any NSD server node that has generated
an error log entry. See “Disk media failure” on page 132 for problem determination and repair actions
to follow.
3. If the file system must be unmounted, you can force the unmount by issuing the mmumount -f
command:
mmumount fileSystem -f
Note:
a. See “File system forced unmount” for the consequences of doing this.
b. Before forcing the unmount of the file system, issue the lsof command and close any files that are
open.
c. On Linux, you might encounter a situation where a GPFS file system cannot be unmounted, even
if you issue the mmumount -f command. In this case, you must reboot the node to clear the
condition. You can also try the system umount command before you reboot. For example:
umount -f /fileSystem
4. If a file system that is mounted by a remote cluster needs to be unmounted, you can force the
unmount by issuing the command:
mmumount fileSystem -f -C RemoteClusterName
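The following is a minimal sketch of the script mentioned in step 1, assuming the file system is mounted
at /gpfs/fs1 (the mount point is illustrative):
# List the processes that have files open under /gpfs/fs1
for pid in /proc/[0-9]*; do
  if ls -l "$pid"/fd 2>/dev/null | grep -q "/gpfs/fs1"; then
    echo "process ${pid##*/} has open files in /gpfs/fs1"
  fi
done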
If your file system has been forced to unmount, follow these steps:
1. With the failure of a single disk, if you have not specified multiple failure groups and replication of
metadata, GPFS will not be able to continue because it cannot write logs or other critical metadata. If
you have specified multiple failure groups and replication of metadata, the failure of multiple disks in
different failure groups will put you in the same position. In either of these situations, GPFS will
forcibly unmount the file system. This will be indicated in the error log by records indicating exactly
which access failed, with an MMFS_SYSTEM_UNMOUNT record indicating the forced unmount.
The user response to this is to take the needed actions to restore the disk access and issue the
mmchdisk command for the disks that are shown as down in the information displayed by the mmlsdisk
command.
2. Internal errors in processing data on a single file system may cause loss of file system access. These
errors may clear with the invocation of the umount command, followed by a remount of the file
system, but they should be reported as problems to the IBM Support Center.
3. If an MMFS_QUOTA error log entry containing Error writing quota file... is generated, the quota
manager continues operation if the next write for the user, group, or fileset is successful. If not,
further allocations to the file system will fail. Check the error code in the log and make sure that the
disks containing the quota file are accessible. Run the mmcheckquota command. For more
information, see “The mmcheckquota command” on page 57.
If the file system must be repaired without quotas:
a. Disable quota management by issuing the command:
mmchfs Device -Q no
b. Issue the mmmount command for the file system.
c. Make any necessary repairs and install the backup quota files.
d. Issue the mmumount -a command for the file system.
e. Restore quota management by issuing the mmchfs Device -Q yes command.
f. Run the mmcheckquota command with the -u, -g, and -j options. For more information, see “The
mmcheckquota command” on page 57.
g. Issue the mmmount command for the file system.
4. If errors indicate that too many disks are unavailable, see “Additional failure group considerations.”
Once it is decided how many replicas to create, GPFS picks disks to hold the replicas, so that all replicas
will be in different failure groups, if possible, to reduce the risk of multiple failures. GPFS requires a
majority of the replicas on the subset of disks to remain available to sustain file system operations:
v If there are at least five different failure groups, GPFS will be able to tolerate a loss of two of the five
groups. If disks out of three different failure groups are lost, the file system descriptor may become
inaccessible due to the loss of the majority of the replicas.
v If there are at least three different failure groups, GPFS will be able to tolerate a loss of one of the three
groups. If disks out of two different failure groups are lost, the file system descriptor may become
inaccessible due to the loss of the majority of the replicas.
v If there are fewer than three failure groups, a loss of one failure group may make the descriptor
inaccessible.
If the subset consists of three disks and there are only two failure groups, one failure group must have
two disks and the other failure group has one. In a scenario that causes one entire failure group to
disappear all at once, if the half of the disks that are unavailable contain the single disk that is part of
the subset, everything stays up. The file system descriptor is moved to a new subset by updating the
remaining two copies and writing the update to a new disk added to the subset. But if the downed
failure group contains a majority of the subset, the file system descriptor cannot be updated and the
file system has to be force unmounted.
Introducing a third failure group consisting of a single disk that is used solely for the purpose of
maintaining a copy of the file system descriptor can help prevent such a scenario. You can designate
this disk by using the descOnly designation for disk usage on the disk descriptor. See the NSD creation
considerations topic in the IBM Spectrum Scale: Concepts, Planning, and Installation Guide and the
Establishing disaster recovery for your GPFS cluster topic in the IBM Spectrum Scale: Advanced
Administration Guide.
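For example, an NSD stanza for such a tiebreaker descriptor disk might look like the following (the
device, NSD, server, and failure group values are illustrative):
%nsd: device=/dev/sdx
  nsd=descnsd1
  servers=nodeA
  usage=descOnly
  failureGroup=3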
In certain failure situations, GPFS cannot determine whether the file system in question is mounted or
not, and so cannot perform the requested command. In such cases, message 6027-1996 (Command was
unable to determine whether file system fileSystem is mounted) is issued.
If you encounter this message, perform problem determination, resolve the problem, and reissue the
command. If you cannot determine or resolve the problem, you may be able to successfully run the
command by first shutting down the GPFS daemon on all nodes of the cluster (using mmshutdown -a),
thus ensuring that the file system is not mounted.
When the file system manager node fails, another file system manager is appointed in a manner that is
not visible to applications except for the time required to switch over.
There are situations where it may be impossible to appoint a file system manager. Such situations involve
the failure of paths to disk resources from many, if not all, nodes. In this event, the cluster manager
nominates several host names to successively try to become the file system manager. If none succeed, the
cluster manager unmounts the file system everywhere. See “NSD and underlying disk subsystem
failures” on page 127.
The required action here is to address the underlying condition that caused the forced unmounts and
then remount the file system. In most cases, this means correcting the path to the disks required by GPFS.
If NSD disk servers are being used, the most common failure is the loss of access through the
communications network. If SAN access is being used to all disks, the most common failure is the loss of
connectivity through the SAN.
You issue a disk command (for example, mmadddisk, mmdeldisk, or mmrpldisk) and receive the
message:
6027-1290
GPFS configuration data for file system fileSystem may not be in agreement with the on-disk data
for the file system. Issue the command:
mmcommon recoverfs fileSystem
Before a disk is added to or removed from a file system, a check is made that the GPFS configuration
data for the file system is in agreement with the on-disk data for the file system. The preceding message
is issued if this check was not successful. This may occur if an earlier GPFS disk command was unable to
complete successfully for some reason. Issue the mmcommon recoverfs command to bring the GPFS
configuration data into agreement with the on-disk data for the file system.
If running mmcommon recoverfs does not resolve the problem, follow the procedures in “Information to
be collected before contacting the IBM Support Center” on page 167, and then contact the IBM Support
Center.
When you are sure that your setup is correct, see if your problem falls into one of these categories:
v “A NO_SPACE error occurs when a file system is known to have adequate free space” on page 110
v “Negative values occur in the 'predicted pool utilizations', when some files are 'ill-placed'” on page 111
The user might have a policy that writes data into a specific storage pool. When the user tries to create a
file in that storage pool, it returns the ENOSPC error if the storage pool is full. The user next issues the
df command, which indicates that the file system is not full, because the problem is limited to the one
storage pool in the user's policy. In order to see if a particular storage pool is full, the user must issue the
mmdf command.
This output indicates that the file system is only 51% full.
4. To query the storage usage for an individual storage pool, the user must issue the mmdf command.
mmdf fs1
Inode Information
------------------
Number of used inodes: 74
Number of free inodes: 137142
Number of allocated inodes: 137216
Maximum number of inodes: 150016
In this case, the user sees that storage pool sp1 has 0% free space left and that is the reason for the
NO_SPACE error message.
5. To resolve the problem, the user must change the placement policy file to avoid putting data in a full
storage pool, delete some files in storage pool sp1, or add more space to the storage pool.
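For example, a placement rule that directs newly created files to a pool with free space could be added
to the policy file and installed with the mmchpolicy command (the rule name, pool name, file system
name, and policy file path are illustrative):
RULE 'placeNewFiles' SET POOL 'sp2'
mmchpolicy fs1 /tmp/policyfile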
Suppose that 2 GB of data from a 5 GB file named abc, that is supposed to be in the system storage pool,
are actually located in another pool. This 2 GB of data is said to be 'ill-placed'. Also, suppose that 3 GB of
this file are in the system storage pool, and no other file is assigned to the system storage pool.
If you run the mmapplypolicy command to schedule file abc to be moved from the system storage pool
to a storage pool named YYY, the mmapplypolicy command does the following:
1. Starts with the 'Current pool utilization' for the system storage pool, which is 3 GB.
2. Subtracts 5 GB, the size of file abc.
3. Arrives at a 'Predicted Pool Utilization' of negative 2 GB.
The mmapplypolicy command does not know how much of an 'ill-placed' file is currently in the wrong
storage pool and how much is in the correct storage pool.
When there are ill-placed files in the system storage pool, the 'Predicted Pool Utilization' can be any
positive or negative value. The positive value can be capped by the LIMIT clause of the MIGRATE rule.
The 'Current Pool Utilizations' should always be between 0% and 100%.
Note: I/O errors while migrating files indicate failing storage devices and must be addressed like any
other I/O error. The same is true for any file system error or panic encountered while migrating files.
The mmlsfileset command identifies filesets in this state by displaying a status of 'Deleting'.
5. If you unlink a fileset that has other filesets linked below it, any filesets linked to it (that is, child
filesets) become inaccessible. The child filesets remain linked to the parent and will become accessible
again when the parent is re-linked.
6. By default, the mmdelfileset command will not delete a fileset that is not empty.
To empty a fileset, first unlink all its immediate child filesets, to remove their junctions from the
fileset to be deleted. Then, while the fileset itself is still linked, use rm -rf or a similar command, to
remove the rest of the contents of the fileset. Now the fileset may be unlinked and deleted.
Alternatively, the fileset to be deleted can be unlinked first and then mmdelfileset can be used with
the -f (force) option. This will unlink its child filesets, then destroy the files and directories contained
in the fileset.
7. When deleting a small dependent fileset, it may be faster to use the rm -rf command instead of the
mmdelfileset command with the -f option.
When the mmafmctl Device getstate command displays a NeedsResync target/fileset state, inconsistencies
exist between the home and cache. To ensure that the cached data is synchronized with the home and the
fileset is returned to Active state, either the file system must be unmounted and mounted or the fileset
must be unlinked and linked. Once this is done, the next update to fileset data will trigger an automatic
synchronization of data from the cache to the home.
Snapshot problems
Use the mmlssnapshot command as a general hint for snapshot-related problems, to find out what
snapshots exist, and what state they are in. Use the mmsnapdir command to find the snapshot directory
name used to permit access.
The mmlssnapshot command displays the list of all snapshots of a file system. This command lists the
snapshot name, some attributes of the snapshot, as well as the snapshot's status. The mmlssnapshot
command does not require the file system to be mounted.
An example of a snapshot restriction error is exceeding the maximum number of snapshots allowed at
one time. For simple errors of these types, you can determine the source of the error by reading the error
message or by reading the description of the command. You can also run the mmlssnapshot command to
see the complete list of existing snapshots.
Examples of incorrect snapshot name errors are trying to delete a snapshot that does not exist or trying to
create a snapshot using the same name as an existing snapshot. The rules for naming global and fileset
snapshots are designed to minimize conflicts between the file system administrator and the fileset
owners. These rules can result in errors when fileset snapshot names are duplicated across different
filesets or when the snapshot command -j option (specifying a qualifying fileset name) is provided or
omitted incorrectly. To resolve name problems review the mmlssnapshot output with careful attention to
the Fileset column. You can also specify the -s or -j options of the mmlssnapshot command to limit the
output. For snapshot deletion, the -j option must exactly match the Fileset column.
For more information about snapshot naming conventions, see the mmcrsnapshot command in the IBM
Spectrum Scale: Administration and Programming Reference.
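For example, to list only the snapshots of a fileset named fset1 in file system fs1 (both names are
illustrative), issue:
mmlssnapshot fs1 -j fset1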
In this case, the file system that contains the snapshot to restore should be mounted, and then the
fileset of the snapshot should be linked.
If you encounter additional errors that cannot be resolved, contact the IBM Support Center.
It is also possible to get a name conflict as a result of issuing the mmrestorefs command. Since
mmsnapdir allows changing the name of the dynamically-generated snapshot directory, it is possible that
an older snapshot contains a normal file or directory that conflicts with the current name of the snapshot
directory. When this older snapshot is restored, the mmrestorefs command will recreate the old, normal
file or directory in the file system root directory. The mmrestorefs command will not fail in this case, but
the restored file or directory will conflict with the current name of the snapshot directory.
The fix is similar to the one mentioned before. Perform one of these two steps:
1. After the mmrestorefs command completes, rename the conflicting file or directory that was restored
in the root directory.
2. Run the mmsnapdir command to select a different name for the dynamically-generated snapshot
directory.
Finally, the mmsnapdir -a option enables a dynamically-generated snapshot directory in every directory,
not just the file system root. This allows each user quick access to snapshots of their own files by going
into .snapshots in their home directory or any other of their directories.
Unlike .snapshots in the file system root, .snapshots in other directories is invisible, that is, an ls -a
command will not list .snapshots. This is intentional because recursive file system utilities such as find,
du or ls -R would otherwise either fail or produce incorrect or undesirable results. To access snapshots,
the user must explicitly specify the name of the snapshot directory, for example: ls ~/.snapshots. If there
is a name conflict (that is, a normal file or directory named .snapshots already exists in the user's home
directory), the user must rename the existing file or directory.
The inode numbers that are used for and within these special .snapshots directories are constructed
dynamically and do not follow the standard rules. These inode numbers are visible to applications
through standard commands, such as stat, readdir, or ls. The inode numbers reported for these
directories can also be reported differently on different operating systems. Applications should not expect
consistent numbering for such inodes.
The mmpmon command is thoroughly documented in the Monitoring GPFS I/O performance with the
mmpmon command topic in the IBM Spectrum Scale: Advanced Administration Guide, and the GPFS
Commands chapter in the IBM Spectrum Scale: Administration and Programming Reference. Before proceeding
with mmpmon problem determination, review all of this material to ensure that you are using mmpmon
correctly.
Note: Do not use the perfmon trace class of the GPFS trace to diagnose mmpmon problems. This trace
event does not provide the necessary data.
For details on how GPFS and NFS interact, see the NFS and GPFS topic in the IBM Spectrum Scale:
Administration and Programming Reference.
These are some of the problems encountered when GPFS interacts with NFS:
v “NFS client with stale inode data”
v “NFS V4 problems”
Turning off NFS caching will result in extra file system operations to GPFS, and negatively affect its
performance.
The clocks of all nodes in the GPFS cluster must be synchronized. If this is not done, NFS access to the
data, as well as other GPFS file system operations, may be disrupted. NFS relies on metadata timestamps
to validate the local operating system cache. If the same directory is either NFS-exported from more than
one node, or is accessed with both the NFS and GPFS mount point, it is critical that clocks on all nodes
that access the file system (GPFS nodes and NFS clients) are constantly synchronized using appropriate
software (for example, NTP). Failure to do so may result in stale information seen on the NFS clients.
NFS V4 problems
Before analyzing an NFS V4 problem, review this documentation to determine if you are using NFS V4
ACLs and GPFS correctly:
1. The NFS Version 4 Protocol paper and other information found in the Network File System Version 4
(nfsv4) section of the IETF Datatracker website (datatracker.ietf.org/wg/nfsv4/documents).
2. The Managing GPFS access control lists and NFS export topic in the IBM Spectrum Scale: Administration
and Programming Reference.
3. The GPFS exceptions and limitations to NFS V4 ACLs topic in the IBM Spectrum Scale: Administration and
Programming Reference.
The commands mmdelacl and mmputacl can be used to revert an NFS V4 ACL to a traditional ACL. Use
the mmdelacl command to remove the ACL, leaving access controlled entirely by the permission bits in
the mode. Then use the chmod command to modify the permissions, or the mmputacl and mmeditacl
commands to assign a new ACL.
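For example, to revert a file to traditional permission-bit control (the path and mode shown are
illustrative), issue:
mmdelacl /gpfs/fs1/dir1/file1
chmod 640 /gpfs/fs1/dir1/file1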
For files, the mmputacl and mmeditacl commands can be used at any time (without first issuing the
mmdelacl command) to assign any type of ACL. The command mmeditacl -k posix provides a
translation of the current ACL into traditional POSIX form and can be used to more easily create an ACL
to edit, instead of having to create one from scratch.
File systems being exported with Samba may (depending on which version of Samba you are using)
require the -D nfs4 flag on the mmchfs or mmcrfs commands. This setting enables NFS V4 and CIFS
(Samba) sharing rules. Some versions of Samba will fail share requests if the file system has not been
configured to support them.
GPFS performs extensive checking to validate metadata and ceases using the file system if metadata
becomes inconsistent. This can appear in two ways:
1. The file system will be unmounted and applications will begin seeing ESTALE return codes to file
operations.
2. Error log entries indicating an MMFS_SYSTEM_UNMOUNT and a corruption error are generated.
If actual disk data corruption occurs, this error will appear on each node in succession. Before proceeding
with the following steps, follow the procedures in “Information to be collected before contacting the IBM
Support Center” on page 167, and then contact the IBM Support Center.
1. Examine the error logs on the NSD servers for any indication of a disk error that has been reported.
2. Take appropriate disk problem determination and repair actions prior to continuing.
3. After completing any required disk repair actions, run the offline version of the mmfsck command on
the file system (see the example after this list).
4. If your error log or disk analysis tool indicates that specific disk blocks are in error, use the mmfileid
command to determine which files are located on damaged areas of the disk, and then restore these
files. See “The mmfileid command” on page 59 for more information.
5. If data corruption errors occur in only one node, it is probable that memory structures within the
node have been corrupted. In this case, the file system is probably good but a program error exists in
GPFS or another authorized program with access to GPFS data structures.
Follow the directions in “Data integrity” and then reboot the node. This should clear the problem. If
the problem repeats on one node without affecting other nodes check the programming specifications
code levels to determine that they are current and compatible and that no hardware errors were
reported. Refer to the IBM Spectrum Scale: Concepts, Planning, and Installation Guide for correct software
levels.
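The following is a minimal sketch of running the offline mmfsck referenced in step 3, assuming a file
system named fs1 that can be unmounted on all nodes (the name is illustrative). The -y option repairs
inconsistencies automatically; use -n first to report without repairing:
mmumount fs1 -a
mmfsck fs1 -y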
NSDs, for example, might be defined on top of Fibre Channel SAN connected disks. This information
provides detail on the creation, use, and failure of NSDs and their underlying disk technologies.
These are some of the errors encountered with GPFS disks and NSDs:
v “NSD and underlying disk subsystem failures”
v “GPFS has declared NSDs built on top of AIX logical volumes as down” on page 136
v “Disk accessing commands fail to complete due to problems with some non-IBM disks” on page 138
v “Persistent Reserve errors” on page 138
v “GPFS is not using the underlying multipath device” on page 141
Note: If you are reinstalling the operating system on one node and erasing all partitions from the system,
GPFS descriptors will be removed from any NSD this node can access locally. The results of this action
might require recreating the file system and restoring from backup. If you experience this problem, do
not unmount the file system on any node that is currently mounting the file system. Contact the IBM
Support Center immediately to see if the problem can be corrected.
For disks that are SAN-attached to all nodes in the cluster, device=DiskName should refer to the disk
device name in /dev on the node where the mmcrnsd command is issued. If a server list is specified,
device=DiskName must refer to the name of the disk on the first server node. The same disk can have
different local names on different nodes.
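As an illustration only, an NSD stanza file passed to the mmcrnsd command might contain an entry similar to the following; the device, NSD name, server names, and usage shown here are assumptions to be replaced with your own values:
%nsd:
  device=/dev/sdb
  nsd=gpfs1nsd
  servers=nsdserver1,nsdserver2
  usage=dataAndMetadata
  failureGroup=1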
When the mmcrnsd command encounters an error condition, one of these messages is displayed:
6027-2108
Error found while processing stanza
or
6027-1636
Error found while checking disk descriptor descriptor
Usually, this message is preceded by one or more messages describing the error more specifically.
or
6027-1661
Failed while processing disk descriptor descriptor on node nodeName.
One of these errors can occur if an NSD server node does not have read and write access to the disk. The
NSD server node needs to write an NSD volume ID to the raw disk. If an additional NSD server node is
specified, that NSD server node will scan its disks to find this NSD volume ID string. If the disk is
SAN-attached to all nodes in the cluster, the NSD volume ID is written to the disk by the node on which
the mmcrnsd command is running.
If you need to find out the local device names for these disks, use the -m option of the mmlsnsd
command. For example, issue:
mmlsnsd -m
To find the nodes to which disk t65nsd4b is attached and the corresponding local devices for that disk,
issue:
mmlsnsd -d t65nsd4b -M
To display extended information about a node's view of its NSDs, the mmlsnsd -X command can be
used:
mmlsnsd -X -d "hd3n97;sdfnsd;hd5n98"
Note: The -m, -M and -X options of the mmlsnsd command can be very time consuming, especially on
large clusters. Use these options judiciously.
If for some reason the second step fails, for example because the disk is damaged and cannot be written
to, the mmdelnsd command issues a message describing the error and then another message stating the
exact command to issue to complete the deletion of the NSD. If these instructions are not successfully
completed, a subsequent mmcrnsd command can fail with
6027-1662
Disk device deviceName refers to an existing NSD name.
This error message indicates that the disk is either an existing NSD, or that the disk was previously an
NSD that had been removed from the GPFS cluster using the mmdelnsd -p command, and had not been
marked as available.
If the GPFS data structures are not removed from the disk, it might be unusable for other purposes. For
example, if you are trying to create an AIX volume group on the disk, the mkvg command might fail
with messages similar to:
0516-1339 /usr/sbin/mkvg: Physical volume contains some 3rd party volume group.
0516-1397 /usr/sbin/mkvg: The physical volume hdisk5, will not be added to the volume group.
0516-862 /usr/sbin/mkvg: Unable to create volume group.
The easiest way to recover such a disk is to temporarily define it as an NSD again (using the -v no
option) and then delete the just-created NSD. For example:
mmcrnsd -F filename -v no
mmdelnsd -F filename
GPFS will stop using a disk that is determined to have failed. This event is marked as MMFS_DISKFAIL
in an error log entry (see “The operating system error log facility” on page 19). The state of a disk can be
checked by issuing the mmlsdisk command.
The consequences of stopping disk usage depend on what is stored on the disk:
v Certain data blocks may be unavailable because the data residing on a stopped disk is not replicated.
v Certain data blocks may be unavailable because the controlling metadata resides on a stopped disk.
v In conjunction with other disks that have failed, all copies of critical data structures may be unavailable
resulting in the unavailability of the entire file system.
The disk will remain unavailable until its status is explicitly changed through the mmchdisk command.
After that command is issued, any replicas that exist on the failed disk are updated before the disk is
used.
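For example, after the underlying disk has been repaired, an assumed disk gpfs2nsd in file system fs1 might be brought back into use with:
mmchdisk fs1 start -d gpfs2nsd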
On AIX, consult “The operating system error log facility” on page 19 for hardware configuration error log
entries.
Accessible disk devices will generate error log entries similar to this example for an SSA device:
--------------------------------------------------------------------------
LABEL: SSA_DEVICE_ERROR
IDENTIFIER: FE9E9357
Description
DISK OPERATION ERROR
Probable Causes
DASD DEVICE
Failure Causes
DISK DRIVE
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
ERROR CODE
2310 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
---------------------------------------------------------------------------
Description
DISK FAILURE
Probable Causes
STORAGE SUBSYSTEM
DISK
Failure Causes
STORAGE SUBSYSTEM
DISK
Recommended Actions
CHECK POWER
RUN DIAGNOSTICS AGAINST THE FAILING DEVICE
Detail Data
EVENT CODE
1027755
VOLUME
fs3
RETURN CODE
19
PHYSICAL VOLUME
vp31n05
-----------------------------------------------------------------
GPFS offers a method of protection called replication, which overcomes disk failure at the expense of
additional disk space. GPFS allows replication of data and metadata. This means that three instances of
data, metadata, or both can be automatically created and maintained for any file in a GPFS file system. If
one instance becomes unavailable due to disk failure, another instance is used instead. You can set
different replication specifications for each file, or apply default settings specified at file system creation.
Refer to the File system replication parameters topic in the IBM Spectrum Scale: Concepts, Planning, and
Installation Guide.
GPFS will mark disks down if there have been problems accessing the disk.
2. To prevent any I/O from going to the down disk, issue these commands immediately:
mmchdisk fs1 suspend -d gpfs1nsd
mmchdisk fs1 stop -d gpfs1nsd
Note: If there are any GPFS file systems with pending I/O to the down disk, the I/O will time out if
the system administrator does not stop it.
To see if there are any threads that have been waiting a long time for I/O to complete, on all nodes
issue:
mmfsadm dump waiters 10 | grep "I/O completion"
3. The next step is irreversible! Do not run this command unless data and metadata have been replicated.
This command scans file system metadata for disk addresses belonging to the disk in question, then
replaces them with a special “broken disk address” value, which may take a while.
CAUTION:
Be extremely careful with using the -p option of mmdeldisk, because by design it destroys
references to data blocks, making affected blocks unavailable. This is a last-resort tool, to be used
when data loss may have already occurred, to salvage the remaining data–which means it cannot
take any precautions. If you are not absolutely certain about the state of the file system and the
impact of running this command, do not attempt to run it without first contacting the IBM Support
Center.
mmdeldisk fs1 gpfs1n12 -p
4. Invoke the mmfileid command with the operand :BROKEN:
mmfileid :BROKEN
For more information, see “The mmfileid command” on page 59.
5. After the disk is properly repaired and available for use, you can add it back to the file system.
You can rebalance the file system at the same time by issuing:
mmadddisk fs1 gpfs12nsd -r
Note: Rebalancing of files is an I/O intensive and time consuming operation, and is important only
for file systems with large files that are mostly invariant. In many cases, normal file update and
creation will rebalance your file system over time, without the cost of the rebalancing.
2. To re-replicate data that has only a single copy, issue:
mmrestripefs fs1 -r
Optionally, use the -b flag instead of the -r flag to rebalance across all disks.
Note: Rebalancing of files is an I/O intensive and time consuming operation, and is important only
for file systems with large files that are mostly invariant. In many cases, normal file update and
creation will rebalance your file system over time, without the cost of the rebalancing.
Strict replication
If data or metadata replication is enabled, and the status of an existing disk changes so that the disk is no
longer available for block allocation (if strict replication is enforced), you may receive an errno of
ENOSPC when you create or append data to an existing file. A disk becomes unavailable for new block
allocation if it is being deleted, replaced, or it has been suspended. If you need to delete, replace, or
suspend a disk, and you need to write new data while the disk is offline, you can disable strict
replication by issuing the mmchfs -K no command before you perform the disk action. However, data
written while replication is disabled will not be replicated properly. Therefore, after you perform the disk
action, you must re-enable strict replication by issuing the mmchfs -K command with the original value
of the -K option (always or whenpossible) and then run the mmrestripefs -r command. To determine if a
disk has strict replication enforced, issue the mmlsfs -K command.
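As a sketch only (fs1 and an original -K value of whenpossible are assumptions), the sequence might look like this:
mmlsfs fs1 -K
mmchfs fs1 -K no
(perform the disk action, for example suspending or replacing the disk)
mmchfs fs1 -K whenpossible
mmrestripefs fs1 -r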
Note: A disk in a down state that has not been explicitly suspended is still available for block allocation,
and thus a spontaneous disk failure will not result in application I/O requests failing with ENOSPC.
While new blocks will be allocated on such a disk, nothing will actually be written to the disk until its
availability changes to up following an mmchdisk start command. Missing replica updates that took
place while the disk was down will be performed when mmchdisk start runs.
No replication
When there is no replication, the system metadata has been lost and the file system is basically
irrecoverable. You may be able to salvage some of the user data, but it will take work and time. A forced
unmount of the file system will probably already have occurred. If not, it probably will very soon if you
try to do any recovery work. You can manually force the unmount yourself. To salvage as much data as possible:
1. Mount the file system in read-only mode (see “Read-only mode mount” on page 49). This will bypass
recovery errors and let you read whatever you can find. Directories may be lost and give errors, and
parts of files will be missing. Get what you can now, for all will soon be gone. On a single node,
issue:
mount -o ro /dev/fs1
2. If you read a file in block-size chunks and get an EIO return code, that block of the file has been lost.
The rest of the file may have useful data to recover, or it can be erased. To save the file system
parameters for recreation of the file system, issue:
mmlsfs fs1 > fs1.saveparms
Error numbers specific to GPFS application calls when disk failure occurs
When a disk failure has occurred, GPFS may report these error numbers in the operating system error
log, or return them to an application:
EOFFLINE = 208, Operation failed because a disk is offline
This error is most commonly returned when an attempt to open a disk fails. Since GPFS will
attempt to continue operation with failed disks, this will be returned when the disk is first
needed to complete a command or application request. If this return code occurs, check your disk
for stopped states, and check to determine if the network path exists.
To repair the disks, see your disk vendor problem determination guide. Follow the problem
determination and repair actions specified.
ENO_MGR = 212, The current file system manager failed and no new manager could be appointed.
This error usually occurs when a large number of disks are unavailable or when there has been a
major network failure. Run the mmlsdisk command to determine whether disks have failed. If
disks have failed, check the operating system error log on all nodes for indications of errors. Take
corrective action by issuing the mmchdisk command.
To repair the disks, see your disk vendor problem determination guide. Follow the problem
determination and repair actions specified.
This is the default behavior, and can be changed with the useNSDserver file system mount option. See
the NSD server considerations topic in the IBM Spectrum Scale: Concepts, Planning, and Installation Guide.
Note: In general, after fixing the path to a disk, you must run the mmnsddiscover command on the
server that lost the path to the NSD. (Until the mmnsddiscover command is run, the reconnected node
will see its local disks and start using them by itself, but it will not act as the NSD server.)
After that, you must run the command on all client nodes that need to access the NSD on that server; or
you can achieve the same effect with a single mmnsddiscover invocation if you utilize the -N option to
specify a node list that contains all the NSD servers and clients that need to rediscover paths.
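For example (the disk and node names are assumptions), paths for a single NSD might be rediscovered on its server and two client nodes with:
mmnsddiscover -d gpfs1nsd -N nsdserver1,client1,client2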
If both your data and metadata have been replicated, implement these recovery actions:
1. Unmount the file system:
mmumount fs1 -a
2. Delete the disk from the file system:
mmdeldisk fs1 gpfs10nsd -c
3. If you are replacing the disk, add the new disk to the file system:
mmadddisk fs1 gpfs11nsd
4. Then restripe the file system:
mmrestripefs fs1 -b
Note: Ensure there is sufficient space elsewhere in your file system for the data to be stored by using
the mmdf command.
GPFS has declared NSDs built on top of AIX logical volumes as down
Earlier releases of GPFS allowed AIX logical volumes to be used in GPFS file systems. Using AIX logical
volumes in GPFS file systems is now discouraged as they are limited with regard to their clustering
ability and cross platform support.
Existing file systems using AIX logical volumes are however still supported, and this information might
be of use when working with those configurations.
which will display any underlying physical device present on this node that is backing the NSD. If the
underlying device is a logical volume, perform a mapping from the logical volume to the volume group.
For example, to verify the volume group gpfs1vg on the five nodes in the GPFS cluster, for each node in
the cluster issue:
lspv | grep gpfs1vg
Here the output shows that on each of the five nodes the volume group gpfs1vg is the same physical
disk (has the same pvid). The hdisk numbers vary, but the fact that they may be called different hdisk
names on different nodes has been accounted for in the GPFS product. This is an example of a properly
defined volume group.
If any of the pvids were different for the same volume group, this would indicate that the same volume
group name has been used when creating volume groups on different physical volumes. This will not
work for GPFS. A volume group name can be used only for the same physical volume shared among
nodes in a cluster. For more information, refer to AIX in IBM Knowledge Center (www.ibm.com/
support/knowledgecenter/ssw_aix/welcome) and search for operating system and device management.
For some non-IBM disks, when many varyonvg -u commands are issued in parallel, some of the AIX
varyonvg -u invocations do not complete, causing the disk command to hang.
This situation is recognized by the GPFS disk command not completing after a long period of time, and
the persistence of the varyonvg processes as shown by the output of the ps -ef command on some of the
nodes of the cluster. In these cases, kill the varyonvg processes that were issued by the GPFS disk
command on the nodes of the cluster. This allows the GPFS disk command to complete. Before mounting
the affected file system on any node where a varyonvg process was killed, issue the varyonvg -u
command (varyonvg -u vgname) on the node to make the disk available to GPFS. Do this on each of the
nodes in question, one by one, until all of the GPFS volume groups are varied online.
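As an illustration (gpfs1vg is an assumed volume group name), the recovery on an affected node might look like:
ps -ef | grep varyonvg
kill <pid_of_hung_varyonvg>
varyonvg -u gpfs1vg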
GPFS allows file systems to have a mix of PR and non-PR disks. In this configuration, GPFS will fence PR
disks for node failures and recovery and non-PR disk will use disk leasing. If all of the disks are PR
disks, disk leasing is not used, so recovery times improve.
GPFS uses the mmchconfig command to enable PR. Issuing this command with the appropriate
usePersistentReserve option configures disks automatically. If this command fails, the most likely cause
is either a hardware or device driver problem. Other PR-related errors will probably be seen as file
system unmounts that are related to disk reservation problems. This type of problem should be debugged
with existing trace tools.
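For example, Persistent Reserve might be enabled cluster-wide with the following command (GPFS typically must be stopped on all nodes before this setting can be changed):
mmchconfig usePersistentReserve=yes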
Persistent Reserve refers to a set of Small Computer Systems Interface-3 (SCSI-3) standard commands and
command options. These PR commands and command options give SCSI initiators the ability to establish,
preempt, query, and reset a reservation policy with a specified target disk. The functions provided by PR
commands are a superset of current reserve and release mechanisms. These functions are not compatible
with legacy reserve and release mechanisms. Target disks can only support reservations from either the
legacy mechanisms or the current mechanisms.
Note: Attempting to mix Persistent Reserve commands with legacy reserve and release commands will
result in the target disk returning a reservation conflict error.
Persistent Reserve establishes an interface through a reserve_policy attribute for SCSI disks. You can
optionally use this attribute to specify the type of reservation that the device driver will establish before
accessing data on the disk. For devices that do not support the reserve_policy attribute, the drivers will use
the value of the reserve_lock attribute to determine the type of reservation to use for the disk. GPFS
supports four values for the reserve_policy attribute: no_reserve, single_path, PR_exclusive, and PR_shared.
Persistent Reserve support affects both the parallel (scdisk) and SCSI-3 (scsidisk) disk device drivers and
configuration methods. When a device is opened (for example, when the varyonvg command opens the
underlying hdisks), the device driver checks the ODM for reserve_policy and PR_key_value and then opens
the device appropriately. For PR, each host attached to the shared disk must use unique registration key
values for reserve_policy and PR_key_value. On AIX, you can display the values assigned to reserve_policy
and PR_key_value by issuing:
lsattr -El hdiskx -a reserve_policy,PR_key_value
If needed, use the AIX chdev command to set reserve_policy and PR_key_value.
Note: GPFS manages reserve_policy and PR_key_value using reserve_policy=PR_shared when Persistent
Reserve support is enabled and reserve_policy=no_reserve when Persistent Reserve is disabled.
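As an illustration of the chdev syntax only (hdisk5 and the key value are assumptions, and GPFS normally sets these attributes itself when Persistent Reserve is enabled or disabled):
chdev -l hdisk5 -a reserve_policy=PR_shared -a PR_key_value=0x6d0000000001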
Notes:
1. To view the keys that are currently registered on a disk, issue the following command from a node
that has access to the disk:
/usr/lpp/mmfs/bin/tsprreadkeys hdiskx
2. To check the AIX ODM status of a single disk on a node, issue the following command from a node
that has access to the disk:
lsattr -El hdiskx -a reserve_policy,PR_key_value
Before trying to clear the PR reservation, use the following instructions to verify that the disk is really
intended for GPFS use. Note that in this example, the device name is specified without a prefix (/dev/sdp
is specified as sdp).
1. Display the registered key values on the disk:
/usr/lpp/mmfs/bin/tsprreadkeys sdp
If the registered key values all start with 0x00006d, which indicates that the PR registration was issued
by GPFS, proceed to the next step to verify the SCSI-3 PR reservation type. Otherwise, contact your
system administrator for information about clearing the disk state.
2. Display the reservation type on the disk:
/usr/lpp/mmfs/bin/tsprreadres sdp
If the output does not indicate a PR reservation with this type, contact your system administrator for
information about clearing the disk state.
The mmlsdisk command output might show unexpected results for multipath I/O devices. For example
if you issue this command:
mmlsdisk dmfs2 -M
The mmlsdisk output shows that I/O for NSD m0001 is being performed on disk /dev/sdb, but it should
show that I/O is being performed on the device-mapper multipath (DMM) /dev/dm-30. Disk /dev/sdb is
one of eight paths of the DMM /dev/dm-30 as shown from the multipath command.
To change the NSD device type to a known device type, create a file that contains the NSD name and
device type pair (one per line) and issue this command:
mmchconfig updateNsdType=/tmp/filename
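For example, a file /tmp/nsdtypes (an assumed path) might pair each NSD name with its device type, one per line:
m0001 dmm
and then be applied with:
mmchconfig updateNsdType=/tmp/nsdtypes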
Existing file systems using AIX logical volumes are, however, still supported. This information might be
of use when working with those configurations.
If an error report contains a reference to a logical volume pertaining to GPFS, you can use the lslv -l
command to list the physical volume name. For example, if you want to find the physical disk associated
with logical volume gpfs44lv, issue:
lslv -l gpfs44lv
Output is similar to this, with the physical volume name in column one.
gpfs44lv:N/A
PV COPIES IN BAND DISTRIBUTION
hdisk8 537:000:000 100% 108:107:107:107:108
In this example, k164n04 and k164n05 are quorum nodes and k164n06 is a nonquorum node.
To change the quorum status of a node, use the mmchnode command. To change one quorum node to
nonquorum, GPFS does not have to be stopped. If you are changing more than one node at the same
time, GPFS needs to be down on all the affected nodes. GPFS does not have to be stopped when
changing nonquorum nodes to quorum nodes, nor does it need to be stopped on nodes that are not
affected.
For example, to make k164n05 a nonquorum node, and k164n06 a quorum node, issue these commands:
mmchnode --nonquorum -N k164n05
mmchnode --quorum -N k164n06
To set a node's quorum designation at the time that it is added to the cluster, see the mmcrcluster or
mmaddnode commands.
The default dump directory for GPFS is /tmp/mmfs. This directory might disappear on Linux if cron is
set to run the /etc/cron.daily/tmpwatch script. The tmpwatch script removes files and directories in /tmp
that have not been accessed recently. Administrators who want to use a different directory for GPFS
dumps can change the directory by issuing this command:
mmchconfig dataStructureDump=/name_of_some_other_big_file_system
Note: This state information (possibly large amounts of data in the form of GPFS dumps and traces) can
be dumped automatically as part of the first failure data capture mechanisms of GPFS, and can accumulate
in the directory that is defined by the dataStructureDump configuration parameter (/tmp/mmfs by default). It
is recommended that a cron job (such as /etc/cron.daily/tmpwatch) be used to remove
dataStructureDump directory data that is older than two weeks, and that such data be collected (for
example, via gpfs.snap) within two weeks of encountering any problem that requires investigation.
You can exclude all GPFS file systems by adding gpfs to the excludeFileSystemType list in this script, or
exclude specific GPFS file systems in the excludeFileSystem list.
/usr/bin/updatedb -f "excludeFileSystemType" -e "excludeFileSystem"
If indexing GPFS file systems is desired, only one node should run the updatedb command and build the
database in a GPFS file system. If the database is built within a GPFS file system it will be visible on all
nodes after one node finishes building it.
Once you start a new session (by logging out and logging back in), the use of the GPFS drive letter will
supersede any of your settings for the same drive letter. This is standard behavior for all local file
systems on Windows.
Why does the offline mmfsck command fail with "Error creating
internal storage"?
The mmfsck command requires some temporary space on the file system manager for storing internal
data during a file system scan. The internal data will be placed in the directory specified by the mmfsck
-t command line parameter (/tmp by default). The amount of temporary space that is needed is
proportional to the number of inodes (used and unused) in the file system that is being scanned. If GPFS
is unable to create a temporary file of the required size, the mmfsck command will fail with the
following error message:
Error creating internal storage
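For example, if /tmp is too small, the temporary files might be redirected to a larger scratch directory (the device name fs1 and the directory are assumptions):
mmfsck fs1 -t /largetmp/mmfsck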
The mode of an AFM client cache fileset cannot be changed from local-update mode to any other mode;
however, it can be changed from read-only to single-writer (and vice versa), and from either read-only or
single-writer to local-update.
Why are setuid/setgid bits in a single-writer cache reset at home after data is
appended?
The setuid/setgid bits in a single-writer cache are reset at home after data is appended to files on which
those bits were previously set and synced. This is because over NFS, a write operation to a setuid file
resets the setuid bit.
On a fileset whose metadata in all subdirectories is not cached, any application that optimizes by
assuming that directories contain two fewer subdirectories than their hard link count will not traverse the
last subdirectory. One such example is find; on Linux, a workaround for this is to use find -noleaf to
correctly traverse a directory that has not been cached.
For an operating system in the gateway whose Linux kernel version is below 2.6.32, the NFS max rsize is
32K, so AFM would not support an extended attribute size of more than 32K on that gateway.
The .ptrash directory is present in cache and home. In some cases, where there is a conflict that AFM
cannot resolve automatically, the file is moved to .ptrash at cache or home. In cache, .ptrash is
cleaned up when eviction is triggered. At home, it is not cleared automatically. When the administrator
needs to reclaim space, the .ptrash directory should be cleaned up first.
Why is my data not read from the network locally when I have an FPO pool
(write-affinity enabled storage pool) created?
When you create a storage pool that is to contain files that make use of FPO features, you must specify
allowWriteAffinity=yes in the storage pool stanza.
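A minimal pool stanza illustrating this setting might look like the following; the pool name and values are assumptions, not recommendations:
%pool:
  pool=fpodata
  blockSize=1M
  layoutMap=cluster
  allowWriteAffinity=yes
  writeAffinityDepth=1
  blockGroupFactor=128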
To change the failure group in a write-affinity–enabled storage pool, you must use the mmdeldisk and
mmadddisk commands; you cannot use mmchdisk to change it directly.
Why does Hadoop receive a fixed value for the block group factor instead of the
GPFS default value?
When a customer does not define the dfs.block.size property in the configuration file, the GPFS
connector will use a fixed block size to initialize Hadoop. The reason for this is that Hadoop has only one
block size per file system, whereas GPFS allows different chunk sizes (block-group-factor × data block
size) for different data pools because block size is a per-pool property. To avoid a mismatch when using
Hadoop with FPO, define dfs.block.size and dfs.replication in the configuration file.
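For example, if the FPO data pool uses a chunk size of 256 MiB with three replicas (illustrative values only), the Hadoop configuration would define:
dfs.block.size=268435456
dfs.replication=3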
How can I retain the original data placement when I restore data from a TSM
server?
When data in an FPO pool is backed up in a TSM server and then restored, the original placement map
will be broken unless you set the write affinity failure group for each file before backup.
For AFM home or cache, an FPO pool file written on the local side will be placed according to the write
affinity depth and write affinity failure group definitions of the local side. When a file is synced from
home to cache, it follows the same FPO placement rule as when written from the gateway node in the
cache cluster. When a file is synced from cache to home, it follows the same FPO data placement rule as
when written from the NFS server in the home cluster.
To retain the same file placement at both home and cache, ensure that each has the same cluster
configuration, and set the write affinity failure group for each file.
Note: The recorded events are stored in a local database on each node. The user can get a list of recorded
events by using the mmces events list command. The recorded events can also be displayed through the GUI.
Table 7. Events for the AUTH component
Each entry below lists the event name with its EventType and Severity in parentheses, followed by the Message, Description, Cause, and User Action columns of the table.

ads_down (STATE_CHANGE, ERROR)
Message: External ADS server is unresponsive
Description: External ADS server is unresponsive.
Cause: The local node is unable to connect to any ADS server.
User Action: Local node is unable to connect to any Active Directory Service server. Verify network connection and check that Active Directory Service server(s) are operational.

ads_failed (STATE_CHANGE, ERROR)
Message: local winbindd is unresponsive
Description: local winbindd is unresponsive.
Cause: The local winbindd does not respond to ping requests. It is needed for Active Directory Service.
User Action: Local winbindd does not respond to ping requests. Try to restart winbindd, and if not successful, perform winbindd troubleshooting.

ads_up (STATE_CHANGE, INFO)
Message: external ADS server is up
Description: External ADS server is up.
User Action: External Active Directory Service server is operational, no user action required.

ads_warn (INFO, WARNING)
Message: external ADS server monitoring returned unknown result
Description: External ADS server monitoring returned unknown result.
Cause: An internal error occurred while monitoring the external ADS server.
User Action: An internal error occurred while monitoring the external Active Directory Service server. Perform trouble check.

ldap_down (STATE_CHANGE, ERROR)
Message: external LDAP server {0} is unresponsive
Description: External LDAP server <LDAP server> is unresponsive.
Cause: The local node is unable to connect to the LDAP server.
User Action: Local node is unable to connect to LDAP server. Verify network connection and check that LDAP server is operational.

ldap_up (STATE_CHANGE, INFO)
Message: external LDAP server {0} is up
Description: The external LDAP server is operational.
User Action: NA
Obtain this information as quickly as you can after a problem is detected, so that error logs do not wrap
and system parameters, which are always changing, are captured as close to the point of failure as
possible. When a serious problem is detected, collect this information and then call IBM. For more
information, see:
information, see:
v “Information to be collected before contacting the IBM Support Center”
v “How to contact the IBM Support Center” on page 169.
Regardless of the problem encountered with GPFS, the following data should be available when you
contact the IBM Support Center:
1. A description of the problem.
2. Output of the failing application, command, and so forth.
3. A tar file generated by the gpfs.snap command that contains data from the nodes in the cluster. In
large clusters, the gpfs.snap command can collect data from certain nodes (for example, the affected
nodes, NSD servers, or manager nodes) using the -N option.
If the gpfs.snap command cannot be run, collect these items:
a. Any error log entries relating to the event:
v On an AIX node, issue this command:
errpt -a
v On a Linux node, create a tar file of all the entries in the /var/log/messages file from all nodes in
the cluster or the nodes that experienced the failure. For example, issue the following command
to create a tar file that includes all nodes in the cluster:
mmdsh -v -N all "cat /var/log/messages" > all.messages
v On a Windows node, use the Export List... dialog in the Event Viewer to save the event log to a
file.
b. A master GPFS log file that is merged and chronologically sorted for the date of the failure (see
“Creating a master GPFS log file” on page 2).
c. If the cluster was configured to store dumps, collect any internal GPFS dumps written to that
directory relating to the time of the failure. The default directory is /tmp/mmfs.
d. On a failing Linux node, gather the installed software packages and the versions of each package
by issuing this command:
rpm -qa
e. On a failing AIX node, gather the name, most recent level, state, and description of all installed
software packages by issuing this command:
lslpp -l
f. File system attributes for all of the failing file systems. To obtain them, issue:
mmlsfs Device
When a delay or deadlock situation is suspected, the IBM Support Center will need additional
information to assist with problem diagnosis. If you have not done so already, ensure you have the
following information available before contacting the IBM Support Center:
1. Everything that is listed in “Information to be collected for all problems related to GPFS” on page 167.
2. The deadlock debug data collected automatically.
3. If the cluster size is relatively small and the maxFilesToCache setting is not high (less than 10,000),
issue the following command:
gpfs.snap --deadlock
If the cluster size is large or the maxFilesToCache setting is high (greater than 1M), issue the
following command:
gpfs.snap --deadlock --quick
When file system corruption or MMFS_FSSTRUCT errors are encountered, the IBM Support Center will
need additional information to assist with problem diagnosis. If you have not done so already, ensure
you have the following information available before contacting the IBM Support Center:
1. Everything that is listed in “Information to be collected for all problems related to GPFS” on page 167.
2. Unmount the file system everywhere, then run mmfsck -n in offline mode and redirect it to an output
file.
The IBM Support Center will determine when and if you should run the mmfsck -y command.
When the GPFS daemon is repeatedly crashing, the IBM Support Center will need additional information
to assist with problem diagnosis. If you have not done so already, ensure you have the following
information available before contacting the IBM Support Center:
1. Everything that is listed in “Information to be collected for all problems related to GPFS” on page 167.
2. Ensure the /tmp/mmfs directory exists on all nodes. If this directory does not exist, the GPFS daemon
will not generate internal dumps.
3. Set the traces on this cluster and all clusters that mount any file system from this cluster:
mmtracectl --set --trace=def --trace-recycle=global
4. Start the trace facility by issuing:
mmtracectl --start
When you contact the IBM Support Center, the following will occur:
1. You will be asked for the information you collected in “Information to be collected before
contacting the IBM Support Center” on page 167.
2. You will be given a time period during which an IBM representative will return your call. Be
sure that the person you identified as your contact can be reached at the phone number you
provided in the PMR.
3. An online Problem Management Record (PMR) will be created to track the problem you are
reporting, and you will be advised to record the PMR number for future reference.
4. You may be requested to send data related to the problem you are reporting, using the PMR
number to identify it.
5. Should you need to make subsequent calls to discuss the problem, you will also use the PMR
number to identify the problem.
If you do not have an IBM Software Maintenance service contract
If you do not have an IBM Software Maintenance service contract, contact your IBM sales
representative to find out how to proceed. Be prepared to provide the information you collected
in “Information to be collected before contacting the IBM Support Center” on page 167.
For failures in non-IBM software, follow the problem-reporting procedures provided with that product.
A severity tag is a one-character alphabetic code (A through Z), optionally followed by a colon (:) and a
number, and surrounded by an opening and closing bracket ([ ]). For example:
[E] or [E:nnn]
If more than one substring within a message matches this pattern (for example, [A] or [A:nnn]), the
severity tag is the first such matching string.
When the severity tag includes a numeric code (nnn), this is an error code associated with the message. If
this were the only problem encountered by the command, the command return code would be nnn.
If a message does not have a severity tag, the message does not conform to this specification. You can
determine the message severity by examining the text or any supplemental information provided in the
message catalog, or by contacting the IBM Support Center.
Each message severity tag has an assigned priority that can be used to filter the messages that are sent to
the error log on Linux. Filtering is controlled with the mmchconfig attribute systemLogLevel. The default
for systemLogLevel is error, which means GPFS will send all error [E], critical [X], and alert [A]
messages to the error log. The values allowed for systemLogLevel are: alert, critical, error, warning,
notice, configuration, informational, detail, or debug. Additionally, the value none can be specified so
no messages are sent to the error log.
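For example, to send warning and higher-priority messages to the error log, you might issue:
mmchconfig systemLogLevel=warning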
Alert [A] messages have the highest priority, and debug [B] messages have the lowest priority. If the
systemLogLevel default of error is changed, only messages with the specified severity and all those with
a higher priority are sent to the error log. The following table lists the message severity tags in order of
priority:
Table 14. Message severity tags ordered by priority
Severity tag (systemLogLevel attribute): Meaning
A (alert): Indicates a problem where action must be taken immediately. Notify the appropriate person to correct the problem.
X (critical): Indicates a critical condition that should be corrected immediately. The system discovered an internal inconsistency of some kind. Command execution might be halted or the system might attempt to continue despite the inconsistency. Report these errors to the IBM Support Center.
E (error): Indicates an error condition. Command execution might or might not continue, but this error was likely caused by a persistent condition and will remain until corrected by some other program or administrative action. For example, a command operating on a single file or other GPFS object might terminate upon encountering any condition of severity E. As another example, a command operating on a list of files, finding that one of the files has permission bits set that disallow the operation, might continue to operate on all other files within the specified list of files.
Messages for GPFS Native RAID in the ranges 6027-1850 – 6027-1899 and 6027-3000 – 6027-3099 are
documented in IBM Spectrum Scale RAID: Administration.
6027-329 Cannot pin the main shared segment: 6027-339 [E] Nonnumeric trace value 'value' after class
name 'class'.
Explanation: Trying to pin the shared segment during Explanation: The specified trace value is not
initialization. recognized.
User response: Check the mmfs.cfg file. The pagepool User response: Specify a valid trace integer value.
size may be too large. It cannot be more than 80% of
real memory. If a previous mmfsd crashed, check for
6027-340 Child process file failed to start due to
processes that begin with the name mmfs that may be
error rc: errStr.
holding on to an old pinned shared segment. Issue
mmchconfig command to change the pagepool size. Explanation: A failure occurred when GPFS attempted
to start a program.
6027-334 [E] Error initializing internal User response: If the program was a user exit script,
communications. verify the script file exists and has appropriate
permissions assigned. If the program was not a user
Explanation: The mailbox system used by the daemon
exit script, then this is an internal GPFS error or the
for communication with the kernel cannot be
GPFS installation was altered.
initialized.
User response: Increase the size of available memory
6027-341 [D] Node nodeName is incompatible because
using the mmchconfig command.
its maximum compatible version
(number) is less than the version of this
6027-335 [E] Configuration error: check fileName. node (number). [value/value]
Explanation: A configuration error is found. Explanation: The GPFS daemon tried to make a
connection with another GPFS daemon. However, the
User response: Check the mmfs.cfg file and other
other daemon is not compatible. Its maximum
error messages.
compatible version is less than the version of the
daemon running on this node. The numbers in square
6027-336 [E] Value 'value' for configuration parameter brackets are for use by the IBM Support Center.
'parameter' is not valid. Check fileName.
User response: Verify your GPFS daemon version.
Explanation: A configuration error was found.
User response: Check the mmfs.cfg file. 6027-342 [E] Node nodeName is incompatible because
its minimum compatible version is
greater than the version of this node
6027-337 [N] Waiting for resources to be reclaimed (number). [value/value]
before exiting.
Explanation: The GPFS daemon tried to make a
Explanation: The mmfsd daemon is attempting to connection with another GPFS daemon. However, the
terminate, but cannot because data structures in the other daemon is not compatible. Its minimum
daemon shared segment may still be referenced by compatible version is greater than the version of the
kernel code. This message may be accompanied by daemon running on this node. The numbers in square
other messages that show which disks still have I/O in brackets are for use by the IBM Support Center.
progress.
User response: Verify your GPFS daemon version.
User response: None. Informational message only.
6027-344 [E] Node nodeName is incompatible because 6027-349 [E] Bad "subnets" configuration: invalid
its version is greater than the maximum cluster name pattern
compatible version of this node "clusterNamePattern".
(number). [value/value]
Explanation: A cluster name pattern specified by the
Explanation: The GPFS daemon tried to make a subnets configuration parameter could not be parsed.
connection with another GPFS daemon. However, the
User response: Run the mmlsconfig command and
other daemon is not compatible. Its version is greater
check the value of the subnets parameter. The optional
than the maximum compatible version of the daemon
cluster name pattern following subnet address must be
running on this node. The numbers in square brackets
a shell-style pattern allowing '*', '/' and '[...]' as wild
are for use by the IBM Support Center.
cards. Run the mmchconfig subnets command to
User response: Verify your GPFS daemon version. correct the value.
6027-345 Network error on ipAddress, check 6027-350 [E] Bad "subnets" configuration: primary IP
connectivity. address ipAddress is on a private subnet.
Use a public IP address instead.
Explanation: A TCP error has caused GPFS to exit due
to a bad return code from an error. Exiting allows Explanation: GPFS is configured to allow multiple IP
recovery to proceed on another node and resources are addresses per node (subnets configuration parameter),
not tied up on this node. but the primary IP address of the node (the one
specified when the cluster was created or when the
User response: Follow network problem
node was added to the cluster) was found to be on a
determination procedures.
private subnet. If multiple IP addresses are used, the
primary address must be a public IP address.
6027-346 [E] Incompatible daemon version. My
User response: Remove the node from the cluster;
version = number, repl.my_version =
then add it back using a public IP address.
number
Explanation: The GPFS daemon tried to make a
6027-358 Communication with mmspsecserver
connection with another GPFS daemon. However, the
through socket name failed, err value:
other GPFS daemon is not the same version and it sent
errorString, msgType messageType.
a reply indicating its version number is incompatible.
Explanation: Communication failed between
User response: Verify your GPFS daemon version.
spsecClient (the daemon) and spsecServer.
User response: Verify both the communication socket
6027-347 [E] Remote host ipAddress refused
and the mmspsecserver process.
connection because IP address ipAddress
was not in the node list file
6027-359 The mmspsecserver process is shutting
Explanation: The GPFS daemon tried to make a
down. Reason: explanation.
connection with another GPFS daemon. However, the
other GPFS daemon sent a reply indicating it did not Explanation: The mmspsecserver process received a
recognize the IP address of the connector. signal from the mmfsd daemon or encountered an
error on execution.
User response: Add the IP address of the local host to
the node list file on the remote host. User response: Verify the reason for shutdown.
6027-348 [E] Bad "subnets" configuration: invalid 6027-360 Disk name must be removed from the
subnet "ipAddress". /etc/filesystems stanza before it can be
deleted.
Explanation: A subnet specified by the subnets
configuration parameter could not be parsed. Explanation: A disk being deleted is found listed in
the disks= list for a file system.
User response: Run the mmlsconfig command and
check the value of the subnets parameter. Each subnet User response: Remove the disk from list.
must be specified as a dotted-decimal IP address. Run
the mmchconfig subnets command to correct the
6027-361 [E] Local access to disk failed with EIO,
value.
switching to access the disk remotely.
Explanation: Local access to the disk failed. To avoid
unmounting of the file system, the disk will now be
accessed remotely.
User response: Wait until work continuing on the inaccessible for writing and reissue the mmadddisk
local node completes. Then determine why local access command.
to the disk failed, correct the problem and restart the
daemon. This will cause GPFS to begin accessing the
6027-370 mmdeldisk completed.
disk locally again.
Explanation: The mmdeldisk command has
completed.
6027-362 Attention: No disks were deleted, but
some data was migrated. The file system User response: None. Informational message only.
may no longer be properly balanced.
Explanation: The mmdeldisk command did not 6027-371 Cannot delete all disks in the file
complete migrating data off the disks being deleted. system
The disks were restored to normal ready, status, but
the migration has left the file system unbalanced. This Explanation: An attempt was made to delete all the
may be caused by having too many disks unavailable disks in a file system.
or insufficient space to migrate all of the data to other User response: Either reduce the number of disks to
disks. be deleted or use the mmdelfs command to delete the
User response: Check disk availability and space file system.
requirements. Determine the reason that caused the
command to end before successfully completing the 6027-372 Replacement disk must be in the same
migration and disk deletion. Reissue the mmdeldisk failure group as the disk being replaced.
command.
Explanation: An improper failure group was specified
for mmrpldisk.
6027-363 I/O error writing disk descriptor for
disk name. User response: Specify a failure group in the disk
descriptor for the replacement disk that is the same as
Explanation: An I/O error occurred when the the failure group of the disk being replaced.
mmadddisk command was writing a disk descriptor on
a disk. This could have been caused by either a
configuration error or an error in the path to the disk. 6027-373 Disk diskName is being replaced, so
status of disk diskName must be
User response: Determine the reason the disk is replacement.
inaccessible for writing and reissue the mmadddisk
command. Explanation: The mmrpldisk command failed when
retrying a replace operation because the new disk does
not have the correct status.
6027-364 Error processing disks.
User response: Issue the mmlsdisk command to
Explanation: An error occurred when the mmadddisk display disk status. Then either issue the mmchdisk
command was reading disks in the file system. command to change the status of the disk to
User response: Determine the reason why the disks replacement or specify a new disk that has a status of
are inaccessible for reading, then reissue the replacement.
mmadddisk command.
6027-374 Disk name may not be replaced.
6027-365 [I] Rediscovered local access to disk. Explanation: A disk being replaced with mmrpldisk
Explanation: Rediscovered local access to disk, which does not have a status of ready or suspended.
failed earlier with EIO. For good performance, the disk User response: Use the mmlsdisk command to
will now be accessed locally. display disk status. Issue the mmchdisk command to
User response: Wait until work continuing on the change the status of the disk to be replaced to either
local node completes. This will cause GPFS to begin ready or suspended.
accessing the disk locally again.
6027-375 Disk name diskName already in file
6027-369 I/O error writing file system descriptor system.
for disk name. Explanation: The replacement disk name specified in
Explanation: mmadddisk detected an I/O error while the mmrpldisk command already exists in the file
writing a file system descriptor on a disk. system.
User response: Determine the reason the disk is User response: Specify a different disk as the
replacement disk.
6027-376 Previous replace command must be 6027-382 Value value for the 'sector size' option
completed before starting a new one. for disk disk is not a multiple of value.
Explanation: The mmrpldisk command failed because Explanation: When parsing disk lists, the sector size
the status of other disks shows that a replace command given is not a multiple of the default sector size.
did not complete.
User response: Specify a correct sector size.
User response: Issue the mmlsdisk command to
display disk status. Retry the failed mmrpldisk
6027-383 Disk name name appears more than
command or issue the mmchdisk command to change
once.
the status of the disks that have a status of replacing or
replacement. Explanation: When parsing disk lists, a duplicate
name is found.
6027-377 Cannot replace a disk that is in use. User response: Remove the duplicate name.
Explanation: Attempting to replace a disk in place,
but the disk specified in the mmrpldisk command is 6027-384 Disk name name already in file system.
still available for use.
Explanation: When parsing disk lists, a disk name
User response: Use the mmchdisk command to stop already exists in the file system.
GPFS's use of the disk.
User response: Rename or remove the duplicate disk.
User response: Specify a correct 'has metadata' value. 3. Disks are not correctly defined on all active nodes.
4. Disks, logical volumes, network shared disks, or
6027-390 Value value for the 'has metadata' option virtual shared disks were incorrectly re-configured
for disk name is invalid. after creating a file system.
Explanation: When parsing disk lists, the 'has User response: Verify:
metadata' value given is not valid. 1. The disks are correctly defined on all nodes.
User response: Specify a correct 'has metadata' value. 2. The paths to the disks are correctly defined and
operational.
6027-394 Too many disks specified for file User response: Start any disks that have been stopped
system. Maximum = number. by the mmchdisk command or by hardware failures.
Verify that paths to all disks are correctly defined and
Explanation: Too many disk names were passed in the operational.
disk descriptor list.
User response: Check the disk descriptor list or the 6027-420 Inode size must be greater than zero.
file containing the list.
Explanation: An internal consistency check has found
a problem with file system parameters.
6027-399 Not enough items in disk descriptor list
entry, need fields. User response: Record the above information. Contact
the IBM Support Center.
Explanation: When parsing a disk descriptor, not
enough fields were specified for one disk.
6027-421 Inode size must be a multiple of logical
User response: Correct the disk descriptor to use the sector size.
correct disk descriptor syntax.
Explanation: An internal consistency check has found
a problem with file system parameters.
6027-416 Incompatible file system descriptor
version or not formatted. User response: Record the above information. Contact
the IBM Support Center.
Explanation: Possible reasons for the error are:
1. A file system descriptor version that is not valid
was encountered.
2. No file system descriptor can be found.
6027-422 Inode size must be at least as large as 6027-428 Indirect block size must be a multiple
the logical sector size. of the minimum fragment size.
Explanation: An internal consistency check has found Explanation: An internal consistency check has found
a problem with file system parameters. a problem with file system parameters.
User response: Record the above information. Contact User response: Record the above information. Contact
the IBM Support Center. the IBM Support Center.
6027-423 Minimum fragment size must be a 6027-429 Indirect block size must be less than
multiple of logical sector size. full data block size.
Explanation: An internal consistency check has found Explanation: An internal consistency check has found
a problem with file system parameters. a problem with file system parameters.
User response: Record the above information. Contact User response: Record the above information. Contact
the IBM Support Center. the IBM Support Center.
6027-424 Minimum fragment size must be greater 6027-430 Default metadata replicas must be less
than zero. than or equal to default maximum
number of metadata replicas.
Explanation: An internal consistency check has found
a problem with file system parameters. Explanation: An internal consistency check has found
a problem with file system parameters.
User response: Record the above information. Contact
the IBM Support Center. User response: Record the above information. Contact
the IBM Support Center.
6027-425 File system block size of blockSize is
larger than maxblocksize parameter. 6027-431 Default data replicas must be less than
or equal to default maximum number of
Explanation: An attempt is being made to mount a
data replicas.
file system whose block size is larger than the
maxblocksize parameter as set by mmchconfig. Explanation: An internal consistency check has found
a problem with file system parameters.
User response: Use the mmchconfig
maxblocksize=xxx command to increase the maximum User response: Record the above information. Contact
allowable block size. the IBM Support Center.
6027-426 Warning: mount detected unavailable 6027-432 Default maximum metadata replicas
disks. Use mmlsdisk fileSystem to see must be less than or equal to value.
details.
Explanation: An internal consistency check has found
Explanation: The mount command detected that some a problem with file system parameters.
disks needed for the file system are unavailable.
User response: Record the above information. Contact
User response: Without file system replication the IBM Support Center.
enabled, the mount will fail. If it has replication, the
mount may succeed depending on which disks are
6027-433 Default maximum data replicas must be
unavailable. Use mmlsdisk to see details of the disk
less than or equal to value.
status.
Explanation: An internal consistency check has found
a problem with file system parameters.
6027-427 Indirect block size must be at least as
large as the minimum fragment size. User response: Record the above information. Contact
the IBM Support Center.
Explanation: An internal consistency check has found
a problem with file system parameters.
6027-434 Indirect blocks must be at least as big as
User response: Record the above information. Contact
inodes.
the IBM Support Center.
Explanation: An internal consistency check has found
a problem with file system parameters.
User response: Record the above information. Contact
the IBM Support Center.
6027-468 Disk name listed in fileName or local mmsdrfs file, not found in device name. Run: mmcommon recoverfs name.
Explanation: Tried to access a file system but the disks listed in the operating system's file system database or the local mmsdrfs file for the device do not exist in the file system.
User response: Check the configuration and availability of disks. Run the mmcommon recoverfs device command. If this does not resolve the problem, configuration data in the SDR may be incorrect. If no user modifications have been made to the SDR, contact the IBM Support Center. If user modifications have been made, correct these modifications.
6027-469 File system name does not match descriptor.
Explanation: The file system name found in the descriptor on disk does not match the corresponding device name in /etc/filesystems.
User response: Check the operating system's file system database.
6027-470 Disk name may still belong to file system filesystem. Created on IPandTime.
Explanation: The disk being added by the mmcrfs, mmadddisk, or mmrpldisk command appears to still belong to some file system.
User response: Verify that the disks you are adding do not belong to an active file system, and use the -v no option to bypass this check. Use this option only if you are sure that no other file system has this disk configured because you may cause data corruption in both file systems if this is not the case.
6027-471 Disk diskName: Incompatible file system descriptor version or not formatted.
Explanation: Possible reasons for the error are:
1. A file system descriptor version that is not valid was encountered.
2. No file system descriptor can be found.
3. Disks are not correctly defined on all active nodes.
4. Disks, logical volumes, network shared disks, or virtual shared disks were incorrectly reconfigured after creating a file system.
User response: Verify:
1. The disks are correctly defined on all nodes.
2. The paths to the disks are correctly defined and operative.
6027-472 [E] File system format version versionString is not supported.
Explanation: The current file system format version is not supported.
User response: Verify:
1. The disks are correctly defined on all nodes.
2. The paths to the disks are correctly defined and operative.
6027-473 [X] File System fileSystem unmounted by the system with return code value reason code value
Explanation: Console log entry caused by a forced unmount due to disk or communication failure.
User response: Correct the underlying problem and remount the file system.
6027-474 [X] Recovery Log I/O failed, unmounting file system fileSystem
Explanation: I/O to the recovery log failed.
User response: Check the paths to all disks making up the file system. Run the mmlsdisk command to determine if GPFS has declared any disks unavailable. Repair any paths to disks that have failed. Remount the file system.
6027-475 The option '--inode-limit' is not enabled. Use option '-V' to enable most recent features.
Explanation: mmchfs --inode-limit is not enabled under the current file system format version.
User response: Run mmchfs -V, this will change the file system format to the latest format supported.
6027-476 Restricted mount using only available file system descriptor.
Explanation: Fewer than the necessary number of file system descriptors were successfully read. Using the best available descriptor to allow the restricted mount to continue.
User response: Informational message only.
6027-477 The option -z is not enabled. Use the -V option to enable most recent features.
Explanation: The file system format version does not support the -z option on the mmchfs command.
User response: Change the file system format version by issuing mmchfs -V.
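For messages 6027-468, 6027-475, and 6027-477, the commands named in the user responses can be run directly against the affected device. The sequence below is an illustrative sketch only; the device name gpfs0 is a placeholder:
  mmcommon recoverfs gpfs0    # rebuild the operating system's file system entry from the GPFS configuration data
  mmchfs gpfs0 -V full        # move the file system to the latest supported format so that newer options are accepted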
6027-478 The option -z could not be changed. fileSystem is still in use.
Explanation: The file system is still mounted or another GPFS administration command (mm...) is running against the file system.
User response: Unmount the file system if it is mounted, and wait for any command that is running to complete before reissuing the mmchfs -z command.
6027-479 [N] Mount of fsName was blocked by fileName
Explanation: The internal or external mount of the file system was blocked by the existence of the specified file.
User response: If the file system needs to be mounted, remove the specified file.
6027-480 Cannot enable DMAPI in a file system with existing snapshots.
Explanation: The user is not allowed to enable DMAPI for a file system with existing snapshots.
User response: Delete all existing snapshots in the file system and repeat the mmchfs command.
6027-481 [E] Remount failed for mountid id: errnoDescription
Explanation: mmfsd restarted and tried to remount any file systems that the VFS layer thinks are still mounted.
User response: Check the errors displayed and the errno description.
6027-482 [E] Remount failed for device name: errnoDescription
Explanation: mmfsd restarted and tried to remount any file systems that the VFS layer thinks are still mounted.
User response: Check the errors displayed and the errno description.
6027-483 [N] Remounted name
Explanation: mmfsd restarted and remounted the specified file system because it was in the kernel's list of previously mounted file systems.
User response: Informational message only.
6027-484 Remount failed for device after daemon restart.
Explanation: A remount failed after daemon restart. This ordinarily occurs because one or more disks are unavailable. Other possibilities include loss of connectivity to one or more disks.
User response: Issue the mmlsdisk command and check for down disks. Issue the mmchdisk command to start any down disks, then remount the file system. If there is another problem with the disks or the connections to the disks, take necessary corrective actions and remount the file system.
6027-485 Perform mmchdisk for any disk failures and re-mount.
Explanation: Occurs in conjunction with 6027-484.
User response: Follow the User response for 6027-484.
6027-486 No local device specified for fileSystemName in clusterName.
Explanation: While attempting to mount a remote file system from another cluster, GPFS was unable to determine the local device name for this file system.
User response: There must be a /dev/sgname special device defined. Check the error code. This is probably a configuration error in the specification of a remote file system. Run mmremotefs show to check that the remote file system is properly configured.
6027-487 Failed to write the file system descriptor to disk diskName.
Explanation: An error occurred when mmfsctl include was writing a copy of the file system descriptor to one of the disks specified on the command line. This could have been caused by a failure of the corresponding disk device, or an error in the path to the disk.
User response: Verify that the disks are correctly defined on all nodes. Verify that paths to all disks are correctly defined and operational.
6027-488 Error opening the exclusion disk file fileName.
Explanation: Unable to retrieve the list of excluded disks from an internal configuration file.
User response: Ensure that GPFS executable files have been properly installed on all nodes. Perform required configuration steps prior to starting GPFS.
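For messages 6027-484 and 6027-485, the usual recovery after a daemon restart is to start any down disks and then remount. An illustrative sketch, with gpfs0 as a placeholder device name:
  mmlsdisk gpfs0              # look for disks whose availability is down
  mmchdisk gpfs0 start -a     # try to start all down disks in the file system
  mmmount gpfs0               # remount the file system once the disks are available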
6027-489 Attention: The desired replication factor exceeds the number of available dataOrMetadata failure groups. This is allowed, but the files will not be replicated and will therefore be at risk.
Explanation: You specified a number of replicas that exceeds the number of failure groups available.
User response: Reissue the command with a smaller replication factor, or increase the number of failure groups.
6027-490 [N] The descriptor replica on disk diskName has been excluded.
Explanation: The file system descriptor quorum has been overridden and, as a result, the specified disk was excluded from all operations on the file system descriptor quorum.
User response: None. Informational message only.
6027-492 The file system is already at file system version number
Explanation: The user tried to upgrade the file system format using mmchfs -V --version=v, but the specified version is smaller than the current version of the file system.
User response: Specify a different value for the --version option.
6027-493 File system version number is not supported on nodeName nodes in the cluster.
Explanation: The user tried to upgrade the file system format using mmchfs -V, but some nodes in the local cluster are still running an older GPFS release that does not support the new format version.
User response: Install a newer version of GPFS on those nodes.
6027-494 File system version number is not supported on the following nodeName remote nodes mounting the file system:
Explanation: The user tried to upgrade the file system format using mmchfs -V, but the file system is still mounted on some nodes in remote clusters that do not support the new format version.
User response: Unmount the file system on the nodes that do not support the new format version.
6027-495 You have requested that the file system be upgraded to version number. This will enable new functionality but will prevent you from using the file system with earlier releases of GPFS. Do you want to continue?
Explanation: Verification request in response to the mmchfs -V full command. This is a request to upgrade the file system and activate functions that are incompatible with a previous release of GPFS.
User response: Enter yes if you want the conversion to take place.
6027-496 You have requested that the file system version for local access be upgraded to version number. This will enable some new functionality but will prevent local nodes from using the file system with earlier releases of GPFS. Remote nodes are not affected by this change. Do you want to continue?
Explanation: Verification request in response to the mmchfs -V command. This is a request to upgrade the file system and activate functions that are incompatible with a previous release of GPFS.
User response: Enter yes if you want the conversion to take place.
6027-497 The file system has already been upgraded to number using -V full. It is not possible to revert back.
Explanation: The user tried to upgrade the file system format using mmchfs -V compat, but the file system has already been fully upgraded.
User response: Informational message only.
6027-498 Incompatible file system format. Only file systems formatted with GPFS 3.2.1.5 or later can be mounted on this platform.
Explanation: A user running GPFS on Microsoft Windows tried to mount a file system that was formatted with a version of GPFS that did not have Windows support.
User response: Create a new file system using current GPFS code.
6027-499 [X] An unexpected Device Mapper path dmDevice (nsdId) has been detected. The new path does not have a Persistent Reserve set up. File system fileSystem will be internally unmounted.
Explanation: A new device mapper path is detected or a previously failed path is activated after the local
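Messages 6027-495 through 6027-497 describe the two forms of the format upgrade. As an illustrative sketch (gpfs0 is a placeholder device name):
  mmchfs gpfs0 -V compat      # enable only features that remain compatible with earlier releases
  mmchfs gpfs0 -V full        # enable all features of the installed level; this cannot be reverted (see 6027-497)
  mmlsfs gpfs0 -V             # display the resulting file system format version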
6027-507 program: loadFile is not loaded.
Explanation: The program could not be loaded.
User response: None. Informational message only.
6027-510 Cannot mount fileSystem on mountPoint: errorString
Explanation: There was an error mounting the GPFS file system.
User response: Determine action indicated by the error messages and error log entries. Errors in the disk path often cause this problem.
User response: Take the action indicated by other error messages and error log entries.
6027-518 Cannot mount fileSystem: Already mounted.
Explanation: An attempt has been made to mount a file system that is already mounted.
User response: None. Informational message only.
6027-519 Cannot mount fileSystem on mountPoint: File system table full.
Explanation: An attempt has been made to mount a file system when the file system table is full.
User response: None. Informational message only.
6027-520 Cannot mount fileSystem: File system table full.
Explanation: An attempt has been made to mount a file system when the file system table is full.
User response: None. Informational message only.
6027-531 The following disks of name will be formatted on node nodeName: list.
Explanation: Output showing which disks will be formatted by the mmcrfs command.
User response: None. Informational message only.
6027-532 [E] The quota record recordNumber in file fileName is not valid.
Explanation: A quota entry contained a checksum that is not valid.
User response: Remount the file system with quotas disabled. Restore the quota file from backup, and run mmcheckquota.
6027-533 [W] Inode space inodeSpace in file system fileSystem is approaching the limit for the maximum number of inodes.
Explanation: The number of files created is approaching the file system limit.
User response: Use the mmchfileset command to increase the maximum number of files to avoid reaching the inode limit and possible performance degradation.
6027-534 Cannot create a snapshot in a DMAPI-enabled file system, rc=returnCode.
Explanation: You cannot create a snapshot in a DMAPI-enabled file system.
User response: Use the mmchfs command to disable DMAPI, and reissue the command.
6027-535 Disks up to size size can be added to storage pool pool.
Explanation: Based on the parameters given to mmcrfs and the size and number of disks being formatted, GPFS has formatted its allocation maps to allow disks up to the given size to be added to this storage pool by the mmadddisk command.
User response: None. Informational message only. If the reported maximum disk size is smaller than necessary, delete the file system with mmdelfs and rerun mmcrfs with either larger disks or a larger value for the -n parameter.
Explanation: Insufficient memory for GPFS internal data structures with current system and GPFS configuration.
User response: Reduce page pool usage with the mmchconfig command, or add additional RAM to system.
6027-537 Disks up to size size can be added to this file system.
Explanation: Based on the parameters given to the mmcrfs command and the size and number of disks being formatted, GPFS has formatted its allocation maps to allow disks up to the given size to be added to this file system by the mmadddisk command.
User response: None, informational message only. If the reported maximum disk size is smaller than necessary, delete the file system with mmdelfs and reissue the mmcrfs command with larger disks or a larger value for the -n parameter.
6027-538 Error accessing disks.
Explanation: The mmcrfs command encountered an error accessing one or more of the disks.
User response: Verify that the disk descriptors are coded correctly and that all named disks exist and are online.
6027-539 Unable to clear descriptor areas for fileSystem.
Explanation: The mmdelfs command encountered an error while invalidating the file system control structures on one or more disks in the file system being deleted.
User response: If the problem persists, specify the -p option on the mmdelfs command.
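Messages 6027-535 and 6027-537 reflect how mmcrfs sized the allocation maps. If the reported maximum disk size is too small, the file system has to be recreated with larger disks or a larger -n estimate. An illustrative sketch only; gpfs0, the stanza file name, and the value 512 are placeholders:
  mmdelfs gpfs0
  mmcrfs gpfs0 -F diskdefs.txt -n 512   # -n is the estimated number of nodes that will mount the file system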
6027-544 Could not invalidate disk of fileSystem.
Explanation: A disk could not be written to invalidate its contents. Check the subsystems in the path to the disk. This is often an I/O error.
User response: Ensure the indicated logical volume is writable.
6027-545 Error processing fileset metadata file.
Explanation: There is no I/O path to critical metadata or metadata has been corrupted.
User response: Verify that the I/O paths to all disks are valid and that all disks are either in the 'recovering' or 'up' availability states. If all disks are available and the problem persists, issue the mmfsck command to repair damaged metadata.
6027-546 Error processing allocation map for storage pool poolName.
Explanation: There is no I/O path to critical metadata, or metadata has been corrupted.
User response: Verify that the I/O paths to all disks are valid, and that all disks are either in the 'recovering'
6027-551 fileSystem is still in use.
Explanation: The mmdelfs or mmcrfs command found that the named file system is still mounted or that another GPFS command is running against the file system.
User response: Unmount the file system if it is mounted, or wait for GPFS commands in progress to terminate before retrying the command.
6027-552 Scan completed successfully.
Explanation: The scan function has completed without error.
User response: None. Informational message only.
6027-553 Scan failed on number user or system files.
Explanation: Data may be lost as a result of pointers that are not valid or unavailable disks.
User response: Some files may have to be restored from backup copies. Issue the mmlsdisk command to check the availability of all the disks that make up the file system.
6027-554 Scan failed on number out of number user or system files.
Explanation: Data may be lost as a result of pointers that are not valid or unavailable disks.
User response: Some files may have to be restored from backup copies. Issue the mmlsdisk command to check the availability of all the disks that make up the file system.
6027-555 The desired replication factor exceeds the number of available failure groups.
Explanation: You have specified a number of replicas that exceeds the number of failure groups available.
User response: Reissue the command with a smaller replication factor or increase the number of failure groups.
6027-556 Not enough space for the desired number of replicas.
Explanation: In attempting to restore the correct replication, GPFS ran out of space in the file system. The operation can continue but some data is not fully replicated.
User response: Make additional space available and reissue the command.
6027-557 Not enough space or available disks to properly balance the file.
Explanation: In attempting to stripe data within the file system, data was placed on a disk other than the desired one. This is normally not a problem.
User response: Run mmrestripefs to rebalance all files.
6027-558 Some data are unavailable.
Explanation: An I/O error has occurred or some disks are in the stopped state.
User response: Check the availability of all disks by issuing the mmlsdisk command and check the path to all disks. Reissue the command.
6027-559 Some data could not be read or written.
Explanation: An I/O error has occurred or some disks are in the stopped state.
User response: Check the availability of all disks and the path to all disks, and reissue the command.
6027-560 File system is already suspended.
Explanation: The tsfsctl command was asked to suspend a suspended file system.
User response: None. Informational message only.
6027-561 Error migrating log.
Explanation: There are insufficient available disks to continue operation.
User response: Restore the unavailable disks and reissue the command.
6027-562 Error processing inodes.
Explanation: There is no I/O path to critical metadata or metadata has been corrupted.
User response: Verify that the I/O paths to all disks are valid and that all disks are either in the recovering or up availability. Issue the mmlsdisk command.
6027-563 File system is already running.
Explanation: The tsfsctl command was asked to resume a file system that is already running.
User response: None. Informational message only.
6027-564 Error processing inode allocation map.
Explanation: There is no I/O path to critical metadata or metadata has been corrupted.
User response: Verify that the I/O paths to all disks are valid and that all disks are either in the recovering or up availability. Issue the mmlsdisk command.
6027-565 Scanning user file metadata ...
Explanation: Progress information.
User response: None. Informational message only.
6027-566 Error processing user file metadata.
Explanation: Error encountered while processing user file metadata.
User response: None. Informational message only.
6027-567 Waiting for pending file system scan to finish ...
Explanation: Progress information.
User response: None. Informational message only.
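Several of the messages above (for example 6027-545, 6027-546, 6027-562, and 6027-564) point to damaged or unreachable metadata and recommend an offline check. A possible sequence, shown only as a sketch with gpfs0 as a placeholder device name:
  mmlsdisk gpfs0       # confirm that all disks are in the up and ready state before checking
  mmumount gpfs0 -a    # unmount the file system on all nodes
  mmfsck gpfs0         # report inconsistencies; add -y only when you are ready to let mmfsck repair them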
6027-568 Waiting for number pending file system scans to finish ...
Explanation: Progress information.
User response: None. Informational message only.
6027-569 Incompatible parameters. Unable to allocate space for file system metadata. Change one or more of the following as suggested and try again:
Explanation: Incompatible file system parameters were detected.
User response: Refer to the details given and correct the file system parameters.
6027-570 Incompatible parameters. Unable to create file system. Change one or more of the following as suggested and try again:
Explanation: Incompatible file system parameters were detected.
User response: Refer to the details given and correct the file system parameters.
6027-571 Logical sector size value must be the same as disk sector size.
Explanation: This message is produced by the mmcrfs command if the sector size given by the -l option is not the same as the sector size given for disks in the -d option.
User response: Correct the options and reissue the command.
6027-572 Completed creation of file system fileSystem.
Explanation: The mmcrfs command has successfully completed.
User response: None. Informational message only.
6027-573 All data on the following disks of fileSystem will be destroyed:
Explanation: Produced by the mmdelfs command to list the disks in the file system that is about to be destroyed. Data stored on the disks will be lost.
User response: None. Informational message only.
User response: Determine why a disk is unavailable.
6027-575 Unable to complete low level format for fileSystem. Failed with error errorCode
Explanation: The mmcrfs command was unable to create the low level file structures for the file system.
User response: Check other error messages and the error log. This is usually an error accessing disks.
6027-576 Storage pools have not been enabled for file system fileSystem.
Explanation: User invoked a command with a storage pool option (-p or -P) before storage pools were enabled.
User response: Enable storage pools with the mmchfs -V command, or correct the command invocation and reissue the command.
6027-577 Attention: number user or system files are not properly replicated.
Explanation: GPFS has detected files that are not replicated correctly due to a previous failure.
User response: Issue the mmrestripefs command at the first opportunity.
6027-578 Attention: number out of number user or system files are not properly replicated:
Explanation: GPFS has detected files that are not replicated correctly.
6027-579 Some unreplicated file system metadata has been lost. File system usable only in restricted mode.
Explanation: A disk was deleted that contained vital file system metadata that was not replicated.
User response: Mount the file system in restricted mode (-o rs) and copy any user data that may be left on the file system. Then delete the file system.
6027-580 Unable to access vital system metadata. Too many disks are unavailable.
Explanation: Metadata is unavailable because the disks on which the data reside are stopped, or an attempt was made to delete them.
User response: Either start the stopped disks, try to delete the disks again, or recreate the file system.
command must be run with the file system unmounted.
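For messages 6027-577 and 6027-578, replication can usually be restored once the failed disks are back in service, as in this illustrative sketch (gpfs0 is a placeholder device name):
  mmrestripefs gpfs0 -r    # re-replicate files whose replicas were affected by the earlier failure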
6027-582 Some data has been lost.
Explanation: An I/O error has occurred or some disks are in the stopped state.
User response: Check the availability of all disks by issuing the mmlsdisk command and check the path to all disks. Reissue the command.
6027-584 Incompatible parameters. Unable to allocate space for root directory. Change one or more of the following as suggested and try again:
Explanation: Inconsistent parameters have been passed to the mmcrfs command, which would result in the creation of an inconsistent file system. Suggested parameter changes are given.
User response: Reissue the mmcrfs command with the suggested parameter changes.
6027-585 Incompatible parameters. Unable to allocate space for ACL data. Change one or more of the following as suggested and try again:
Explanation: Inconsistent parameters have been passed to the mmcrfs command, which would result in the creation of an inconsistent file system. The parameters entered require more space than is available. Suggested parameter changes are given.
User response: Reissue the mmcrfs command with the suggested parameter changes.
6027-586 Quota server initialization failed.
Explanation: Quota server initialization has failed. This message may appear as part of the detail data in the quota error log.
User response: Check status and availability of the disks. If quota files have been corrupted, restore them from the last available backup. Finally, reissue the command.
6027-588 No more than number nodes can mount a file system.
Explanation: The limit of the number of nodes that can mount a file system was exceeded.
User response: Observe the stated limit for how many nodes can mount a file system.
6027-589 Scanning file system metadata, phase number ...
Explanation: Progress information.
User response: None. Informational message only.
6027-590 [W] GPFS is experiencing a shortage of pagepool. This message will not be repeated for at least one hour.
Explanation: Pool starvation occurs, buffers have to be continually stolen at high aggressiveness levels.
User response: Issue the mmchconfig command to increase the size of pagepool.
6027-591 Unable to allocate sufficient inodes for file system metadata. Increase the value for option and try again.
Explanation: Too few inodes have been specified on the -N option of the mmcrfs command.
User response: Increase the size of the -N option and reissue the mmcrfs command.
6027-592 Mount of fileSystem is waiting for the mount disposition to be set by some data management application.
Explanation: Data management utilizing DMAPI is enabled for the file system, but no data management application has set a disposition for the mount event.
User response: Start the data management application and verify that the application sets the mount disposition.
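For message 6027-590, pagepool is raised with mmchconfig. The values below are placeholders and the right size depends on the workload and on the memory available on each node; this is a sketch, not a recommendation:
  mmlsconfig pagepool         # display the current pagepool size
  mmchconfig pagepool=4G -i   # increase pagepool; -i applies the change immediately and makes it permanent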
6027-594 Disk diskName cannot be added to storage pool poolName. Allocation map cannot accommodate disks larger than size MB.
Explanation: The specified disk is too large compared to the disks that were initially used to create the storage pool.
User response: Specify a smaller disk or add the disk to a new storage pool.
6027-595 [E] While creating quota files, file fileName, with no valid quota information was found in the root directory. Remove files with reserved quota file names (for example, user.quota) without valid quota information from the root directory by: - mounting the file system without quotas, - removing the files, and - remounting the file system with quotas to recreate new quota files. To use quota file names other than the reserved names, use the mmcheckquota command.
Explanation: While mounting a file system, the state of the file system descriptor indicates that quota files do not exist. However, files that do not contain quota information but have one of the reserved names: user.quota, group.quota, or fileset.quota exist in the root directory.
User response: To mount the file system so that new quota files will be created, perform these steps:
1. Mount the file system without quotas.
2. Verify that there are no files in the root directory with the reserved names: user.quota, group.quota, or fileset.quota.
3. Remount the file system with quotas. To mount the file system with other files used as quota files, issue the mmcheckquota command.
6027-596 [I] While creating quota files, file fileName containing quota information was found in the root directory. This file will be used as quotaType quota file.
Explanation: While mounting a file system, the state of the file system descriptor indicates that quota files do not exist. However, files that have one of the reserved names user.quota, group.quota, or fileset.quota and contain quota information, exist in the root directory. The file with the reserved name will be used as the quota file.
User response: None. Informational message.
6027-597 [E] The quota command was requested to process quotas for a type (user, group, or fileset), which is not enabled.
Explanation: A quota command was requested to process quotas for a user, group, or fileset quota type, which is not enabled.
User response: Verify that the user, group, or fileset quota type is enabled and reissue the command.
6027-598 [E] The supplied file does not contain quota information.
Explanation: A file supplied as a quota file does not contain quota information.
User response: Change the file so it contains valid quota information and reissue the command.
To mount the file system so that new quota files are created:
1. Mount the file system without quotas.
2. Verify there are no files in the root directory with the reserved user.quota or group.quota name.
3. Remount the file system with quotas.
6027-599 [E] File supplied to the command does not exist in the root directory.
Explanation: The user-supplied name of a new quota file has not been found.
User response: Ensure that a file with the supplied name exists. Then reissue the command.
6027-600 On node nodeName an earlier error may have caused some file system data to be inaccessible at this time. Check error log for additional information. After correcting the problem, the file system can be mounted again to restore normal data access.
Explanation: An earlier error may have caused some file system data to be inaccessible at this time.
User response: Check the error log for additional information. After correcting the problem, the file system can be mounted again.
6027-601 Error changing pool size.
Explanation: The mmchconfig command failed to change the pool size to the requested value.
User response: Follow the suggested actions in the other messages that occur with this one.
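After repairing or replacing quota files (messages 6027-595 through 6027-598), the quota usage data can be rebuilt and verified. An illustrative sketch with gpfs0 as a placeholder device name:
  mmcheckquota gpfs0   # recount inode and block usage and correct the quota files
  mmrepquota gpfs0     # report the resulting user, group, and fileset quota usage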
User response: Check the return code. This is usually due to network or disk connectivity problems. Issue the mmlsdisk command to determine if the paths to the
6027-614 Value value for option name is out of range. Valid values are number through number.
Explanation: The value for an option in the command line arguments is out of range.
User response: Correct the command line and reissue the command.
6027-615 mmcommon getContactNodes clusterName failed. Return code value.
Explanation: mmcommon getContactNodes failed while looking up contact nodes for a remote cluster, usually while attempting to mount a file system from a remote cluster.
User response: Check the preceding messages, and consult the earlier chapters of this document. A frequent cause for such errors is lack of space in /var.
6027-616 [X] Duplicate address ipAddress in node list
Explanation: The IP address appears more than once in the node list file.
User response: Check the node list shown by the mmlscluster command.
6027-617 [I] Recovered number nodes for cluster clusterName.
Explanation: The asynchronous part (phase 2) of node failure recovery has completed.
User response: None. Informational message only.
6027-618 [X] Local host not found in node list (local ip interfaces: interfaceList)
Explanation: The local host specified in the node list file could not be found.
User response: Check the node list shown by the mmlscluster command.
6027-619 Negative grace times are not allowed.
Explanation: The mmedquota command received a negative value for the -t option.
User response: Reissue the mmedquota command with a nonnegative value for grace time.
6027-620 Hard quota limit must not be less than soft limit.
Explanation: The hard quota limit must be greater than or equal to the soft quota limit.
User response: Reissue the mmedquota command and enter valid values when editing the information.
User response: Check that the communications paths are available between the two nodes.
6027-621 Negative quota limits are not allowed.
Explanation: The quota value must be positive.
User response: Reissue the mmedquota command and enter valid values when editing the information.
6027-622 [E] Failed to join remote cluster clusterName
Explanation: The node was not able to establish communication with another cluster, usually while attempting to mount a file system from a remote cluster.
User response: Check other console messages for additional information. Verify that contact nodes for the remote cluster are set correctly. Run mmremotefs show and mmremotecluster show to display information about the remote cluster.
6027-623 All disks up and ready
Explanation: Self-explanatory.
User response: None. Informational message only.
6027-624 No disks
Explanation: Self-explanatory.
User response: None. Informational message only.
6027-625 File system manager takeover already pending.
Explanation: A request to migrate the file system manager failed because a previous migrate request has not yet completed.
User response: None. Informational message only.
6027-626 Migrate to node nodeName already pending.
Explanation: A request to migrate the file system manager failed because a previous migrate request has not yet completed.
User response: None. Informational message only.
6027-627 Node nodeName is already manager for fileSystem.
Explanation: A request has been made to change the file system manager node to the node that is already the manager.
User response: None. Informational message only.
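For message 6027-622, the commands named in the user response show how the remote cluster and its contact nodes are currently defined, for example:
  mmremotecluster show all   # list the remote clusters this cluster knows about, with their contact nodes
  mmremotefs show all        # list the remote file systems and the local device names assigned to them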
6027-628 Sending migrate request to current manager node nodeName.
Explanation: A request has been made to change the file system manager node.
User response: None. Informational message only.
6027-629 [N] Node nodeName resigned as manager for fileSystem.
Explanation: Progress report produced by the mmchmgr command.
User response: None. Informational message only.
6027-630 [N] Node nodeName appointed as manager for fileSystem.
Explanation: The mmchmgr command successfully changed the node designated as the file system manager.
User response: None. Informational message only.
6027-631 Failed to appoint node nodeName as manager for fileSystem.
Explanation: A request to change the file system manager node has failed.
User response: Accompanying messages will describe the reason for the failure. Also, see the mmfs.log file on the target node.
6027-632 Failed to appoint new manager for fileSystem.
Explanation: An attempt to change the file system manager node has failed.
6027-633 The best choice node nodeName is already the manager for fileSystem.
Explanation: Informational message about the progress and outcome of a migrate request.
User response: None. Informational message only.
6027-635 [E] The current file system manager failed and no new manager will be appointed.
Explanation: The file system manager node could not be replaced. This is usually caused by other system errors, such as disk or communication errors.
User response: See accompanying messages for the base failure.
6027-636 [E] Disk marked as stopped or offline.
Explanation: A disk continues to be marked down due to a previous error and was not opened again.
User response: Check the disk status by issuing the mmlsdisk command, then issue the mmchdisk start command to restart the disk.
6027-637 [E] RVSD is not active.
Explanation: The RVSD subsystem needs to be activated.
User response: See the appropriate IBM Reliable Scalable Cluster Technology (RSCT) document (www.ibm.com/support/knowledgecenter/SGVKBA/welcome) and search on diagnosing IBM Virtual Shared Disk problems.
6027-638 [E] File system fileSystem unmounted by node nodeName
Explanation: Produced in the console log on a forced unmount of the file system caused by disk or communication failures.
User response: Check the error log on the indicated node. Correct the underlying problem and remount the file system.
Explanation: There has been an attempt to concurrently mount a file system on separate nodes in both a normal mode and in 'restricted' mode.
User response: Decide which mount mode you want to use, and use that mount mode on both nodes.
User response: None. Informational message only.
6027-641 [E] Unable to access vital system metadata. 6027-646 [E] File system unmounted due to loss of
Too many disks are unavailable or the cluster membership.
file system is corrupted.
Explanation: Quorum was lost, causing file systems to
Explanation: An attempt has been made to access a be unmounted.
file system, but the metadata is unavailable. This can be
User response: Get enough nodes running the GPFS
caused by:
daemon to form a quorum.
1. The disks on which the metadata resides are either
stopped or there was an unsuccessful attempt to
delete them. 6027-647 [E] File fileName could not be run with err
errno.
2. The file system is corrupted.
Explanation: The specified shell script could not be
User response: To access the file system:
run. This message is followed by the error string that is
1. If the disks are the problem either start the stopped returned by the exec.
disks or try to delete them.
User response: Check file existence and access
2. If the file system has been corrupted, you will have
permissions.
to recreate it from backup medium.
6027-662 mmfsd timed out waiting for primary node nodeName.
Explanation: The mmfsd server is about to terminate.
User response: Ensure that the mmfs.cfg configuration file contains the correct host name or IP address of the primary node. Check mmfsd on the primary node.
6027-663 Lost connection to file system daemon.
Explanation: The connection between a GPFS command and the mmfsd daemon has broken. The daemon has probably crashed.
User response: Ensure that the mmfsd daemon is running. Check the error log.
6027-664 Unexpected message from file system daemon.
Explanation: The version of the mmfsd daemon does not match the version of the GPFS command.
User response: Ensure that all GPFS software components are at the same version.
6027-665 Failed to connect to file system daemon: errorString
Explanation: An error occurred while trying to create a session with mmfsd.
User response: Ensure that the mmfsd daemon is running. Also, only root can run most GPFS commands. The mode bits of the commands must be set-user-id to root.
6027-668 Could not send message to file system daemon
Explanation: Attempt to send a message to the file system failed.
User response: Check if the file system daemon is up and running.
6027-669 Could not connect to file system daemon.
Explanation: The TCP connection between the command and the daemon could not be established.
User response: Check additional error messages.
6027-670 Value for 'option' is not valid. Valid values are list.
Explanation: The specified value for the given command option was not valid. The remainder of the line will list the valid keywords.
User response: Correct the command line.
6027-671 Keyword missing or incorrect.
Explanation: A missing or incorrect keyword was encountered while parsing command line arguments.
User response: Correct the command line.
6027-672 Too few arguments specified.
Explanation: Too few arguments were specified on the command line.
User response: Correct the command line.
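For messages 6027-663, 6027-665, 6027-668, and 6027-669, the first step is usually to confirm that the daemon is active on the nodes involved and then review its log. An illustrative sketch:
  mmgetstate -a   # show the GPFS daemon state (active, down, arbitrating) on every node in the cluster
Then review the most recent GPFS log on the affected node, for example /var/adm/ras/mmfs.log.latest.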
6027-676 Option option specified more than once. 6027-684 Value value for option is incorrect.
Explanation: The named option was specified more Explanation: An incorrect value was specified for the
than once on the command line. named option.
User response: Correct the command line. User response: Correct the command line.
6027-677 Option option is incorrect. 6027-685 Value value for option option is out of
range. Valid values are number through
Explanation: An incorrect option was specified on the
number.
command line.
Explanation: An out of range value was specified for
User response: Correct the command line.
the named option.
User response: Correct the command line.
6027-678 Misplaced or incorrect parameter name.
Explanation: A misplaced or incorrect parameter was
6027-686 option (value) exceeds option (value).
specified on the command line.
Explanation: The value of the first option exceeds the
User response: Correct the command line.
value of the second option. This is not permitted.
User response: Correct the command line.
6027-679 Device name is not valid.
Explanation: An incorrect device name was specified
6027-687 Disk name is specified more than once.
on the command line.
Explanation: The named disk was specified more than
User response: Correct the command line.
once on the command line.
User response: Correct the command line.
6027-680 [E] Disk failure. Volume name. rc = value.
Physical volume name.
6027-688 Failed to read file system descriptor.
Explanation: An I/O request to a disk or a request to
fence a disk has failed in such a manner that GPFS can Explanation: The disk block containing critical
no longer use the disk. information about the file system could not be read
from disk.
User response: Check the disk hardware and the
software subsystems in the path to the disk. User response: This is usually an error in the path to
the disks. If there are associated messages indicating an
I/O error such as ENODEV or EIO, correct that error
6027-681 Required option name was not specified.
and retry the operation. If there are no associated I/O
Explanation: A required option was not specified on errors, then run the mmfsck command with the file
the command line. system unmounted.
User response: Correct the command line.
6027-689 Failed to update file system descriptor.
6027-682 Device argument is missing. Explanation: The disk block containing critical
information about the file system could not be written
Explanation: The device argument was not specified to disk.
on the command line.
User response: This is a serious error, which may
User response: Correct the command line. leave the file system in an unusable state. Correct any
I/O errors, then run the mmfsck command with the
6027-683 Disk name is invalid. file system unmounted to make repairs.
User response: Correct the command line. Explanation: Could not obtain enough memory
(RAM) to perform an operation.
User response: Either retry the operation when the
mmfsd daemon is less heavily loaded, or increase the
size of one or more of the memory pool parameters by
issuing the mmchconfig command.
6027-691 Failed to send message to node 6027-698 [E] Not enough memory to allocate internal
nodeName. data structure.
Explanation: A message to another file system node Explanation: A file system operation failed because no
could not be sent. memory is available for allocating internal data
structures.
User response: Check additional error message and
the internode communication configuration. User response: Stop other processes that may have
main memory pinned for their use.
6027-692 Value for option is not valid. Valid
values are yes, no. 6027-699 [E] Inconsistency in file system metadata.
Explanation: An option that is required to be yes or Explanation: File system metadata on disk has been
no is neither. corrupted.
User response: Correct the command line. User response: This is an extremely serious error that
may cause loss of data. Issue the mmfsck command
with the file system unmounted to make repairs. There
6027-693 Cannot open disk name.
will be a POSSIBLE FILE CORRUPTION entry in the
Explanation: Could not access the given disk. system error log that should be forwarded to the IBM
Support Center.
User response: Check the disk hardware and the path
to the disk.
6027-700 [E] Log recovery failed.
6027-694 Disk not started; disk name has a bad Explanation: An error was encountered while
volume label. restoring file system metadata from the log.
Explanation: The volume label on the disk does not User response: Check additional error message. A
match that expected by GPFS. likely reason for this error is that none of the replicas of
the log could be accessed because too many disks are
User response: Check the disk hardware. For currently unavailable. If the problem persists, issue the
hot-pluggable drives, ensure that the proper drive has mmfsck command with the file system unmounted.
been plugged in.
unavailable disks or insufficient memory for file system system database (the given file) for a valid device entry.
control structures. Check other error messages as well
as the error log for additional information. Unmount
6027-707 Unable to open file fileName.
the file system and correct any I/O errors. Then
remount the file system and try the operation again. If Explanation: The named file cannot be opened.
the problem persists, issue the mmfsck command with
the file system unmounted to make repairs. User response: Check that the file exists and has the
correct permissions.
Explanation: The file system has encountered an error Explanation: An incorrect keyword was encountered.
that is serious enough to make some or all data User response: Correct the command line.
inaccessible. This message indicates that an error
occurred that left the file system in an unusable state.
6027-709 Incorrect response. Valid responses are
User response: Possible reasons include too many "yes", "no", or "noall"
unavailable disks or insufficient memory for file system
control structures. Check other error messages as well Explanation: A question was asked that requires a yes
as the error log for additional information. Unmount or no answer. The answer entered was neither yes, no,
the file system and correct any I/O errors. Then nor noall.
remount the file system and try the operation again. If User response: Enter a valid response.
the problem persists, issue the mmfsck command with
the file system unmounted to make repairs.
6027-710 Attention:
Explanation: The mmfsd daemon is not accepting Explanation: The file system has encountered an error
messages because it is restarting or stopping. that is serious enough to make some or all data
inaccessible. This message indicates that an error
User response: None. Informational message only. occurred that left the file system in an unusable state.
Possible reasons include too many unavailable disks or
6027-719 [E] Device type not supported. insufficient memory for file system control structures.
Explanation: A disk being added to a file system with User response: Check other error messages as well as
the mmadddisk or mmcrfs command is not a character the error log for additional information. Correct any
mode special file, or has characteristics not recognized I/O errors. Then, remount the file system and try the
by GPFS. operation again. If the problem persists, issue the
mmfsck command with the file system unmounted to
User response: Check the characteristics of the disk make repairs.
being added to the file system.
6027-725 The mmfsd daemon is not ready to 6027-731 Error number while performing command
handle commands yet. Waiting for for name quota on fileSystem
quorum.
Explanation: An error occurred when switching
Explanation: The GPFS mmfsd daemon is not quotas of a certain type on or off. If errors were
accepting messages because it is waiting for quorum. returned for multiple file systems, only the error code
is shown.
User response: Determine why insufficient nodes have
joined the group to achieve quorum and rectify the User response: Check the error code shown by the
problem. message to determine the reason.
6027-726 [E] Quota initialization/start-up failed. 6027-732 Error while performing command on
fileSystem.
Explanation: Quota manager initialization was
unsuccessful. The file system manager finished without Explanation: An error occurred while performing the
quotas. Subsequent client mount requests will fail. stated command when listing or reporting quotas.
User response: Check the error log and correct I/O User response: None. Informational message only.
errors. It may be necessary to issue the mmcheckquota
command with the file system unmounted.
6027-733 Edit quota: Incorrect format!
Explanation: The format of one or more edited quota
6027-727 Specified driver type type does not
limit entries was not correct.
match disk name driver type type.
User response: Reissue the mmedquota command.
Explanation: The driver type specified on the
Change only the values for the limits and follow the
mmchdisk command does not match the current driver
instructions given.
type of the disk.
User response: Verify the driver type and reissue the
6027-734 [W] Quota check for 'fileSystem' ended
command.
prematurely.
Explanation: The user interrupted and terminated the
6027-728 Specified sector size value does not
command.
match disk name sector size value.
User response: If ending the command was not
Explanation: The sector size specified on the
intended, reissue the mmcheckquota command.
mmchdisk command does not match the current sector
size of the disk.
6027-735 Error editing string from mmfsd.
User response: Verify the sector size and reissue the
command. Explanation: An internal error occurred in the mmfsd
when editing a string.
6027-729 Attention: No changes for disk name User response: None. Informational message only.
were specified.
Explanation: The disk descriptor in the mmchdisk 6027-736 Attention: Due to an earlier error
command does not specify that any changes are to be normal access to this file system has
made to the disk. been disabled. Check error log for
additional information. The file system
User response: Check the disk descriptor to determine
must be unmounted and then mounted
if changes are needed.
again to restore normal data access.
Explanation: The file system has encountered an error
6027-730 command on fileSystem.
that is serious enough to make some or all data
Explanation: Quota was activated or deactivated as inaccessible. This message indicates that an error
stated as a result of the mmquotaon, mmquotaoff, occurred that left the file system in an unusable state.
mmdefquotaon, or mmdefquotaoff commands. Possible reasons include too many unavailable disks or
insufficient memory for file system control structures.
User response: None, informational only. This
message is enabled with the -v option on the User response: Check other error messages as well as
mmquotaon, mmquotaoff, mmdefquotaon, or the error log for additional information. Unmount the
mmdefquotaoff commands. file system and correct any I/O errors. Then, remount
the file system and try the operation again. If the
problem persists, issue the mmfsck command with the
file system unmounted to make repairs. already recorded in the file system configuration. The
most likely reason for this problem is that too many
disks have become unavailable or are still unavailable
6027-737 Attention: No metadata disks remain.
after the disk state change.
Explanation: The mmchdisk command has been
User response: Issue an mmchdisk start command
issued, but no metadata disks remain.
when more disks are available.
User response: None. Informational message only.
6027-744 Unable to run command while the file
6027-738 Attention: No data disks remain. system is mounted in restricted mode.
Explanation: The mmchdisk command has been Explanation: A command that can alter the data in a
issued, but no data disks remain. file system was issued while the file system was
mounted in restricted mode.
User response: None. Informational message only.
User response: Mount the file system in read-only or
read-write mode or unmount the file system and then
6027-739 Attention: Due to an earlier reissue the command.
configuration change the file system is
no longer properly balanced.
6027-745 fileSystem: no quotaType quota
Explanation: The mmlsdisk command found that the management enabled.
file system is not properly balanced.
Explanation: A quota command of the cited type was
User response: Issue the mmrestripefs -b command at issued for the cited file system when no quota
your convenience. management was enabled.
User response: Enable quota management and reissue
6027-740 Attention: Due to an earlier the command.
configuration change the file system is
no longer properly replicated.
6027-746 Editing quota limits for this user or
Explanation: The mmlsdisk command found that the group not permitted.
file system is not properly replicated.
Explanation: The root user or system group was
User response: Issue the mmrestripefs -r command at specified for quota limit editing in the mmedquota
your convenience command.
User response: Specify a valid user or group in the
6027-741 Attention: Due to an earlier mmedquota command. Editing quota limits for the root
configuration change the file system user or system group is prohibited.
may contain data that is at risk of being
lost.
6027-747 [E] Too many nodes in cluster (max number)
Explanation: The mmlsdisk command found that or file system (max number).
critical data resides on disks that are suspended or
being deleted. Explanation: The operation cannot succeed because
too many nodes are involved.
User response: Issue the mmrestripefs -m command
as soon as possible. User response: Reduce the number of nodes to the
applicable stated limit.
6027-749 Pool size changed to number K = number 6027-756 [E] Configuration invalid or inconsistent
M. between different nodes.
Explanation: Pool size successfully changed. Explanation: Self-explanatory.
User response: None. Informational message only. User response: Check cluster and file system
configuration.
6027-750 [E] The node address ipAddress is not
defined in the node list 6027-757 name is not an excluded disk.
Explanation: An address does not exist in the GPFS Explanation: Some of the disks passed to the mmfsctl
configuration file. include command are not marked as excluded in the
mmsdrfs file.
User response: Perform required configuration steps
prior to starting GPFS on the node. User response: Verify the list of disks supplied to this
command.
6027-751 [E] Error code value
6027-758 Disk(s) not started; disk name has a bad
Explanation: Provides additional information about an
volume label.
error.
Explanation: The volume label on the disk does not
User response: See accompanying error messages.
match that expected by GPFS.
User response: Check the disk hardware. For
6027-752 [E] Lost membership in cluster clusterName.
hot-pluggable drives, make sure the proper drive has
Unmounting file systems.
been plugged in.
Explanation: This node has lost membership in the
cluster. Either GPFS is no longer available on enough
6027-759 fileSystem is still in use.
nodes to maintain quorum, or this node could not
communicate with other members of the quorum. This Explanation: The mmfsctl include command found
could be caused by a communications failure between that the named file system is still mounted, or another
nodes, or multiple GPFS failures. GPFS command is running against the file system.
User response: See associated error logs on the failed User response: Unmount the file system if it is
nodes for additional problem determination mounted, or wait for GPFS commands in progress to
information. terminate before retrying the command.
6027-753 [E] Could not run command command 6027-760 [E] Unable to perform i/o to the disk. This
node is either fenced from accessing the
Explanation: The GPFS daemon failed to run the
specified command.
User response: Verify correct installation.

6027-754 Error reading string for mmfsd.
Explanation: GPFS could not properly read an input string.
User response: Check that GPFS is properly installed.

disk or this node's disk lease has expired.
Explanation: A read or write to the disk failed due to either being fenced from the disk or no longer having a disk lease.
User response: Verify that the disk hardware fencing setup is correct if it is being used. Ensure network connectivity between this node and other nodes is operational.

6027-762 No quota enabled file system found.
Explanation: There is no quota-enabled file system in this cluster.
User response: None. Informational message only.

6027-763 uidInvalidate: Incorrect option option.
Explanation: An incorrect option was passed to the uidinvalidate command.
User response: Correct the command invocation.

6027-764 Error invalidating UID remapping cache for domain.
Explanation: An incorrect domain name was passed to the uidinvalidate command.
User response: Correct the command invocation.

6027-765 [W] Tick value hasn't changed for nearly number seconds
Explanation: Clock ticks that should be incremented by AIX have not been incremented.
User response: Check the error log for hardware or device driver problems that might cause timer interrupts to be lost.

6027-766 [N] This node will be expelled from cluster cluster due to expel msg from node
Explanation: This node is being expelled from the cluster.
User response: Check the network connection between this node and the node specified above.

6027-767 [N] Request sent to node to expel node from cluster cluster
Explanation: This node sent an expel request to the cluster manager node to expel another node.
User response: Check the network connection between this node and the node specified above.

6027-768 Wrong number of operands for mmpmon command 'command'.
Explanation: The command read from the input file has the wrong number of operands.
User response: Correct the command invocation and reissue the command.

6027-769 Malformed mmpmon command 'command'.
Explanation: The command read from the input file is malformed, perhaps with an unknown keyword.
User response: Correct the command invocation and reissue the command.

6027-770 Error writing user.quota file.
Explanation: An error occurred while writing the cited quota file.
User response: Check the status and availability of the disks and reissue the command.

6027-771 Error writing group.quota file.
Explanation: An error occurred while writing the cited quota file.
User response: Check the status and availability of the disks and reissue the command.

6027-772 Error writing fileset.quota file.
Explanation: An error occurred while writing the cited quota file.
User response: Check the status and availability of the disks and reissue the command.

6027-774 fileSystem: quota management is not enabled, or one or more quota clients are not available.
Explanation: An attempt was made to perform quota commands without quota management enabled, or one or more quota clients failed during quota check.
User response: Correct the cause of the problem, and then reissue the quota command.

6027-775 During mmcheckquota processing, number node(s) failed. It is recommended that mmcheckquota be repeated.
Explanation: Nodes failed while an online quota check was running.
User response: Reissue the quota check command.

6027-776 fileSystem: There was not enough space for the report. Please repeat quota check!
Explanation: The vflag is set in the tscheckquota command, but either no space or not enough space could be allocated for the differences to be printed.
User response: Correct the space problem and reissue the quota check.
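As an illustration of the responses for 6027-774 through 6027-776, quota management is normally enabled and the quota check rerun with the mm-level commands. A minimal sketch, assuming a file system named gpfs1 (a placeholder device name):
  mmlsfs gpfs1 -Q      # check whether quota enforcement is enabled
  mmchfs gpfs1 -Q yes  # enable quota management if it is not
  mmcheckquota gpfs1   # rerun the quota check once the underlying problem is corrected
This assumes the file system is mounted on the nodes that act as quota clients; adjust the device name to your environment.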
6027-777 [I] Recovering nodes: nodeList
Explanation: Recovery for one or more nodes has begun.
User response: No response is needed if this message is followed by 'recovered nodes' entries specifying the nodes. If this message is not followed by such a message, determine why recovery did not complete.

6027-778 [I] Recovering nodes in cluster cluster: nodeList
Explanation: Recovery for one or more nodes in the cited cluster has begun.
User response: No response is needed if this message is followed by 'recovered nodes' entries on the cited cluster specifying the nodes. If this message is not followed by such a message, determine why recovery did not complete.

6027-779 Incorrect fileset name filesetName.
Explanation: The fileset name provided on the command line is incorrect.
User response: Correct the fileset name and reissue the command.

6027-780 Incorrect path to fileset junction junctionName.
Explanation: The path to the fileset junction is incorrect.
User response: Correct the junction path and reissue the command.

6027-781 Storage pools have not been enabled for file system fileSystem.
Explanation: The user invoked a command with a storage pool option (-p or -P) before storage pools were enabled.
User response: Enable storage pools with the mmchfs -V command, or correct the command invocation and reissue the command.

6027-784 [E] Device not ready.
Explanation: A device is not ready for operation.
User response: Check previous messages for further information.

6027-785 [E] Cannot establish connection.
Explanation: This node cannot establish a connection to another node.
User response: Check previous messages for further information.

6027-786 [E] Message failed because the destination node refused the connection.
Explanation: This node sent a message to a node that refuses to establish a connection.
User response: Check previous messages for further information.

6027-787 [E] Security configuration data is inconsistent or unavailable.
Explanation: There was an error configuring security on this node.
User response: Check previous messages for further information.

6027-788 [E] Failed to load or initialize security library.
Explanation: There was an error loading or initializing the security library on this node.
User response: Check previous messages for further information.

6027-789 Unable to read offsets offset to offset for inode inode snap snap, from disk diskName, sector sector.
Explanation: The mmdeldisk -c command found that the cited addresses on the cited disk represent data that is no longer readable.
User response: Save this output for later use in cleaning up failing disks.

6027-790 Specified storage pool poolName does not match disk diskName storage pool poolName. Use mmdeldisk and mmadddisk to change a disk's storage pool.
Explanation: An attempt was made to change a disk's storage pool assignment using the mmchdisk command. This can only be done by deleting the disk from its current storage pool and then adding it to the new pool.
User response: Delete the disk from its current storage pool and then add it to the new pool.

6027-792 Policies have not been enabled for file system fileSystem.
Explanation: The cited file system must be upgraded to use policies.
User response: Upgrade the file system via the mmchfs -V command.
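For the responses to 6027-781 and 6027-792, the file system format is brought up to the level of the installed GPFS code before storage pools or policies are used. A hedged sketch, again using gpfs1 as a placeholder device:
  mmlsfs gpfs1 -V       # display the current file system format version
  mmchfs gpfs1 -V full  # enable all new functionality, including storage pools and policies
Use -V compat instead of -V full if clusters running older GPFS levels still need to mount the file system.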
6027-793 No policy file was installed for file system fileSystem.
Explanation: No policy file was installed for this file system.
User response: Install a policy file.

6027-794 Failed to read policy file for file system fileSystem.
Explanation: Failed to read the policy file for the requested file system.
User response: Reinstall the policy file.

6027-795 Failed to open fileName: errorCode.
Explanation: An incorrect file name was specified to tschpolicy.
User response: Correct the command invocation and reissue the command.

6027-796 Failed to read fileName: errorCode.
Explanation: An incorrect file name was specified to tschpolicy.
User response: Correct the command invocation and reissue the command.

6027-797 Failed to stat fileName: errorCode.
Explanation: An incorrect file name was specified to tschpolicy.
User response: Correct the command invocation and reissue the command.

6027-798 Policy files are limited to number bytes.
Explanation: A user-specified policy file exceeded the maximum-allowed length.
User response: Install a smaller policy file.

6027-850 Unable to issue this command from a non-root user.
Explanation: tsiostat requires root privileges to run.
User response: Get the system administrator to change the executable to set the UID to 0.

6027-851 Unable to process interrupt received.
Explanation: An interrupt occurred that tsiostat cannot process.
User response: Contact the IBM Support Center.

6027-852 interval and count must be positive integers.
Explanation: Incorrect values were supplied for tsiostat parameters.
User response: Correct the command invocation and reissue the command.

6027-853 interval must be less than 1024.
Explanation: An incorrect value was supplied for the interval parameter.
User response: Correct the command invocation and reissue the command.

6027-854 count must be less than 1024.
Explanation: An incorrect value was supplied for the count parameter.
User response: Correct the command invocation and reissue the command.

6027-855 Unable to connect to server, mmfsd is not started.
Explanation: The tsiostat command was issued but the file system is not started.
User response: Contact your system administrator.

6027-856 No information to report.
Explanation: The tsiostat command was issued but no file systems are mounted.
User response: Contact your system administrator.

6027-858 File system not mounted.
Explanation: The requested file system is not mounted.
User response: Mount the file system and reattempt the failing operation.
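The policy messages 6027-793 through 6027-798 are normally addressed by installing a policy file with the mm-level commands rather than by calling tschpolicy directly. A hedged sketch, assuming a file system gpfs1 and a rules file /tmp/policy.rules (both placeholders):
  mmchpolicy gpfs1 /tmp/policy.rules -I yes  # validate and install the policy rules
  mmlspolicy gpfs1 -L                        # display the currently installed policy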
command invocation and reissue the command.

6027-873 [W] Error on gpfs_stat_inode([pathName/fileName],inodeNumber.genNumber): errorString
Explanation: An error occurred during a gpfs_stat_inode operation.
User response: Reissue the command. If the problem persists, contact the IBM Support Center.

6027-874 [E] Error: incorrect Date@Time (YYYY-MM-DD@HH:MM:SS) specification: specification
Explanation: The Date@Time command invocation argument could not be parsed.
User response: Correct the command invocation and try again. The syntax should look similar to: 2005-12-25@07:30:00.

6027-875 [E] Error on gpfs_stat(pathName): errorString
Explanation: An error occurred while attempting to stat() the cited path name.
User response: Determine whether the cited path name exists and is accessible. Correct the command arguments as necessary and reissue the command.

6027-876 [E] Error starting directory scan(pathName): errorString
Explanation: The specified path name is not a directory.
User response: Determine whether the specified path name exists and is an accessible directory. Correct the command arguments as necessary and reissue the command.

6027-877 [E] Error opening pathName: errorString
Explanation: An error occurred while attempting to open the named file. Its pool and replication attributes remain unchanged.
User response: Investigate the file and possibly reissue the command. The file may have been removed or locked by another application.

6027-878 [E] Error on gpfs_fcntl(pathName): errorString (offset=offset)
Explanation: An error occurred while attempting fcntl on the named file. Its pool or replication attributes may not have been adjusted.
User response: Investigate the file and possibly reissue the command. Use the mmlsattr and mmchattr commands to examine and change the pool and replication attributes of the named file.

6027-879 [E] Error deleting pathName: errorString
Explanation: An error occurred while attempting to delete the named file.
User response: Investigate the file and possibly reissue the command. The file may have been removed or locked by another application.

6027-880 Error on gpfs_seek_inode(inodeNumber): errorString
Explanation: An error occurred during a gpfs_seek_inode operation.
User response: Reissue the command. If the problem persists, contact the IBM Support Center.

6027-881 [E] Error on gpfs_iopen([rootPath/pathName],inodeNumber): errorString
Explanation: An error occurred during a gpfs_iopen operation.
User response: Reissue the command. If the problem persists, contact the IBM Support Center.

6027-882 [E] Error on gpfs_ireaddir(rootPath/pathName): errorString
Explanation: An error occurred during a gpfs_ireaddir() operation.
User response: Reissue the command. If the problem persists, contact the IBM Support Center.

6027-883 Error on gpfs_next_inode(maxInodeNumber): errorString
Explanation: An error occurred during a gpfs_next_inode operation.
User response: Reissue the command. If the problem persists, contact the IBM Support Center.

6027-884 [E:nnn] Error during directory scan
Explanation: A terminal error occurred during the directory scan phase of the command.
User response: Verify the command arguments. Reissue the command. If the problem persists, contact the IBM Support Center.

6027-885 [E:nnn] Error during inode scan: errorString
Explanation: A terminal error occurred during the inode scan phase of the command.
User response: Verify the command arguments.
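Several of the preceding user responses refer to examining and changing a file's pool and replication attributes. A hedged example, using /gpfs1/data/file1 as a placeholder path:
  mmlsattr -L /gpfs1/data/file1                   # show the storage pool and replication settings
  mmchattr -P system -r 2 -m 2 /gpfs1/data/file1  # change the data pool and set data and metadata replication to 2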
User response: Contact the IBM Support Center.

Explanation: An error occurred during a pthread_cond_wait operation.
User response: Contact the IBM Support Center.

6027-900 [E] Error opening work file fileName: errorString
Explanation: An error occurred while attempting to open the named work file.
User response: Investigate the file and possibly reissue the command. Check that the path name is defined and accessible.

6027-901 [E] Error writing to work file fileName: errorString
Explanation: An error occurred while attempting to write to the named work file.
User response: Investigate the file and possibly reissue the command. Check that there is sufficient free space in the file system.

6027-902 [E] Error parsing work file fileName. Service index: number
Explanation: An error occurred while attempting to read the specified work file.
User response: Investigate the file and possibly reissue the command. Make sure that there is enough free space in the file system. If the error persists, contact the IBM Support Center.

6027-903 [E:nnn] Error while loading policy rules.
Explanation: An error occurred while attempting to read or parse the policy file, which may contain syntax errors. Subsequent messages include more information about the error.
User response: Read all of the related error messages and try to correct the problem.

6027-904 [E] Error returnCode from PD writer for inode=inodeNumber pathname=pathName
Explanation: An error occurred while writing the policy decision for the candidate file with the indicated inode number and path name to a work file. There probably will be related error messages.
User response: Read all the related error messages. Attempt to correct the problems.

6027-905 [E] Error: Out of memory. Service index: number
Explanation: The command has exhausted virtual memory.
User response: Consider some of the command parameters that might affect memory usage. For further assistance, contact the IBM Support Center.

6027-906 [E:nnn] Error on system(command)
Explanation: An error occurred during the system call with the specified argument string.
User response: Read and investigate related error messages.

6027-907 [E:nnn] Error from sort_file(inodeListname,sortCommand,sortInodeOptions,tempDir)
Explanation: An error occurred while sorting the named work file using the named sort command with the given options and working directory.
User response: Check these:
v The sort command is installed on your system.
v The sort command supports the given options.
v The working directory is accessible.
v The file system has sufficient free space.

6027-908 [W] Attention: In RULE 'ruleName' (ruleNumber), the pool named by "poolName 'poolType'" is not defined in the file system.
Explanation: The cited pool is not defined in the file system.
User response: Correct the rule and reissue the command.
This is not an irrecoverable error; the command will continue to run. Of course it will not find any files in an incorrect FROM POOL and it will not be able to migrate any files to an incorrect TO POOL.

6027-909 [E] Error on pthread_join: where #threadNumber: errorString
Explanation: An error occurred while reaping the thread during a pthread_join operation.
User response: Contact the IBM Support Center.

6027-910 [E:nnn] Error during policy execution
Explanation: A terminating error occurred during the policy execution phase of the command.
User response: Verify the command arguments and reissue the command. If the problem persists, contact the IBM Support Center.

6027-911 [E] Error on changeSpecification change for pathName. errorString
Explanation: This message provides more details about a gpfs_fcntl() error.
User response: Use the mmlsattr and mmchattr commands to examine the file, and then reissue the change command.
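Many of the messages in this range are produced while mmapplypolicy loads, sorts, and executes policy rules. A hedged example run that keeps its work files in a directory with ample free space (all names are placeholders):
  mmapplypolicy gpfs1 -P /tmp/policy.rules -s /gpfs1/tmp -I test -L 2
The -I test option evaluates the rules without migrating or deleting any files, which is a useful first step when messages such as 6027-908 report pools that are not defined in the file system.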
6027-912 [E] Error on restriping of pathName. errorString
Explanation: This provides more details on a gpfs_fcntl() error.
User response: Use the mmlsattr and mmchattr commands to examine the file and then reissue the restriping command.

6027-913 Desired replication exceeds number of failure groups.
Explanation: While restriping a file, the tschattr or tsrestripefile command found that the desired replication exceeded the number of failure groups.
User response: Reissue the command after adding or restarting file system disks.

6027-914 Insufficient space in one of the replica failure groups.
Explanation: While restriping a file, the tschattr or tsrestripefile command found there was insufficient space in one of the replica failure groups.
User response: Upgrade the command software on all nodes and reissue the command.

6027-918 Cannot make this change to a nonzero length file.
Explanation: GPFS does not support the requested change to the replication attributes.
User response: You may want to create a new file with the desired attributes and then copy your data to that file and rename it appropriately. Be sure that there are sufficient disks assigned to the pool with different failure groups to support the desired replication attributes.

6027-919 Replication parameter range error (value, value).
Explanation: Similar to message 6027-918. The (a,b) numbers are the allowable range of the replication attributes.
User response: Correct the command line and reissue the command. You may want to create a new file with the desired attributes and then copy your data to that file and rename it appropriately. Be sure that there are sufficient disks assigned to the pool with different failure groups to support the desired replication attributes.
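When 6027-913 or 6027-914 is reported, it can help to confirm how many failure groups actually back the storage pool before requesting a replication level. A hedged check, with gpfs1 as a placeholder device:
  mmlsdisk gpfs1 -L  # the failure group column shows how many distinct groups exist
  mmdf gpfs1         # shows free space per disk and per failure group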
6027-937 [E] Error creating shared temporary sub-directory subDirName: subDirPath
Explanation: The mkdir command failed on the named subdirectory path.
User response: Specify an existing writable shared directory as the shared temporary directory argument to the policy command. The policy command will create a subdirectory within that.

6027-938 [E] Error closing work file fileName: errorString
Explanation: An error occurred while attempting to close the named work file or socket.
User response: Record the above information. Contact the IBM Support Center.

6027-939 [E] Error on gpfs_quotactl(pathName,commandCode,resourceId): errorString
Explanation: An error occurred while attempting gpfs_quotactl().
User response: Correct the policy rules and/or enable GPFS quota tracking. If the problem persists, contact the IBM Support Center.

6027-945 -r value exceeds number of failure groups for data.
Explanation: The mmchattr command received command line arguments that were not valid.
User response: Correct the command line and reissue the command.

6027-946 Not a regular file or directory.
Explanation: An mmlsattr or mmchattr command error occurred.
User response: Correct the problem and reissue the command.

6027-947 Stat failed: A file or directory in the path name does not exist.
Explanation: A file or directory in the path name does not exist.
User response: Correct the problem and reissue the command.

6027-948 [E:nnn] fileName: get clone attributes failed: errorString
Explanation: The tsfattr call failed.
User response: Check for additional error messages.
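Message 6027-948 and the clone-related messages that follow are typically seen from the mmclone command. A hedged example of inspecting clone attributes (the path is a placeholder):
  mmclone show /gpfs1/data/file1  # reports whether the file is a clone parent or a clone copy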
6027-951 [E] Error on operationName to work file fileName: errorString
Explanation: An error occurred while attempting to do a (write-like) operation on the named work file.
User response: Investigate the file and possibly reissue the command. Check that there is sufficient free space in the file system.

6027-953 Failed to get a handle for fileset filesetName, snapshot snapshotName in file system fileSystem. errorMessage.
Explanation: Failed to get a handle for a specific fileset snapshot in the file system.
User response: Correct the command line and reissue the command. If the problem persists, contact the IBM Support Center.

6027-954 Failed to get the maximum inode number in the active file system. errorMessage.
Explanation: Failed to get the maximum inode number in the current active file system.
User response: Correct the command line and reissue the command. If the problem persists, contact the IBM Support Center.

6027-955 Failed to set the maximum allowed memory for the specified fileSystem command.
Explanation: Failed to set the maximum allowed memory for the specified command.
User response: Correct the command line and reissue the command. If the problem persists, contact the IBM Support Center.

6027-959 'fileName' is not a regular file.
Explanation: Only regular files are allowed to be clone parents.
User response: This file is not a valid target for mmclone operations.

6027-960 cannot access 'fileName': errorString.
Explanation: This message provides more details about a stat() error.
User response: Correct the problem and reissue the command.

6027-961 Cannot execute command.
Explanation: The mmeditacl command cannot invoke the mmgetacl or mmputacl command.
User response: Contact your system administrator.

6027-962 Failed to list fileset filesetName.
Explanation: Failed to list specific fileset.
User response: None.

6027-963 EDITOR environment variable not set
Explanation: Self-explanatory.
User response: Set the EDITOR environment variable and reissue the command.

6027-964 EDITOR environment variable must be an absolute path name
Explanation: Self-explanatory.
User response: Set the EDITOR environment variable correctly and reissue the command.
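Messages 6027-961 through 6027-964 concern the ACL editing commands. A hedged example of preparing the environment before running mmeditacl (the file path is a placeholder):
  export EDITOR=/usr/bin/vi    # must be an absolute path
  mmeditacl /gpfs1/data/file1  # edit the ACL with the specified editor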
6027-987 name is not a valid special name.
Explanation: Produced by the mmputacl command when the NFS V4 'special' identifier is followed by an unknown special id string. name is one of the following: 'owner@', 'group@', 'everyone@'.
User response: Specify a valid NFS V4 special name and reissue the command.

6027-988 type is not a valid NFS V4 type.
Explanation: Produced by the mmputacl command when the type field in an ACL entry is not one of the supported NFS Version 4 type values. type is one of the following: 'allow' or 'deny'.
User response: Specify a valid NFS V4 type and reissue the command.

6027-989 name is not a valid NFS V4 flag.
Explanation: A flag specified in an ACL entry is not one of the supported values, or is not valid for the type of object (inherit flags are valid for directories only). Valid values are FileInherit, DirInherit, and InheritOnly.
User response: Specify a valid NFS V4 option and reissue the command.

6027-990 Missing permissions (value found, value are required).
Explanation: The permissions listed are less than the number required.
User response: Add the missing permissions and reissue the command.

6027-991 Combining FileInherit and DirInherit makes the mask ambiguous.
Explanation: Produced by the mmputacl command when WRITE/CREATE is specified without MKDIR (or the other way around), and both the FILE_INHERIT and DIR_INHERIT flags are specified.
User response: Make separate FileInherit and DirInherit entries and reissue the command.

6027-992 Subdirectory name already exists. Unable to create snapshot.
Explanation: tsbackup was unable to create a snapshot because the snapshot subdirectory already exists. This condition sometimes is caused by issuing a

6027-993 Keyword aclType is incorrect. Valid values are: 'posix', 'nfs4', 'native'.
Explanation: One of the mm*acl commands specified an incorrect value with the -k option.
User response: Correct the aclType value and reissue the command.

6027-994 ACL permissions cannot be denied to the file owner.
Explanation: The mmputacl command found that the READ_ACL, WRITE_ACL, READ_ATTR, or WRITE_ATTR permissions are explicitly being denied to the file owner. This is not permitted, in order to prevent the file being left with an ACL that cannot be modified.
User response: Do not select the READ_ACL, WRITE_ACL, READ_ATTR, or WRITE_ATTR permissions on deny ACL entries for the OWNER.

6027-995 This command will run on a remote node, nodeName.
Explanation: The mmputacl command was invoked for a file that resides on a file system in a remote cluster, and UID remapping is enabled. To parse the user and group names from the ACL file correctly, the command will be run transparently on a node in the remote cluster.
User response: None. Informational message only.

6027-996 [E:nnn] Error reading policy text from: fileName
Explanation: An error occurred while attempting to open or read the specified policy file. The policy file may be missing or inaccessible.
User response: Read all of the related error messages and try to correct the problem.

6027-997 [W] Attention: RULE 'ruleName' attempts to redefine EXTERNAL POOLorLISTliteral 'poolName', ignored.
Explanation: Execution continues as if the specified rule was not present.
User response: Correct or remove the policy rule.
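The NFS V4 ACL messages above are produced while applying an ACL file with mmputacl. A hedged round-trip example (file names and path are placeholders):
  mmgetacl -k nfs4 -o /tmp/acl.txt /gpfs1/data/file1  # capture the current ACL in NFS V4 format
  vi /tmp/acl.txt                                     # edit the entries; do not deny READ_ACL or WRITE_ACL to special:owner@
  mmputacl -i /tmp/acl.txt /gpfs1/data/file1          # apply the modified ACL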
6027-998 [E] Error in FLR/PDR serving for client clientHostNameAndPortNumber: FLRs=numOfFileListRecords PDRs=numOfPolicyDecisionResponses pdrs=numOfPolicyDecisionResponseRecords
Explanation: A protocol error has been detected among cooperating mmapplypolicy processes.
User response: Reissue the command. If the problem persists, contact the IBM Support Center.

6027-999 [E] Authentication failed: myNumericNetworkAddress with partnersNumericNetworkAddress (code=codeIndicatingProtocolStepSequence rc=errnoStyleErrorCode)
Explanation: Two processes at the specified network addresses failed to authenticate. The cooperating processes should be on the same network; they should not be separated by a firewall.
User response: Correct the configuration and try the operation again. If the problem persists, contact the IBM Support Center.

6027-1004 Incorrect [nodelist] format in file: nodeListLine
Explanation: A [nodelist] line in the input stream is not a comma-separated list of nodes.
User response: Fix the format of the [nodelist] line in the mmfs.cfg input file. This is usually the NodeFile specified on the mmchconfig command.
If no user-specified [nodelist] lines are in error, contact the IBM Support Center.
If user-specified [nodelist] lines are in error, correct these lines.

6027-1005 Common is not sole item on [] line number.
Explanation: A [nodelist] line in the input stream contains common plus any other names.
User response: Fix the format of the [nodelist] line in the mmfs.cfg input file. This is usually the NodeFile specified on the mmchconfig command.
If no user-specified [nodelist] lines are in error, contact the IBM Support Center.
If user-specified [nodelist] lines are in error, correct these lines.

6027-1006 Incorrect custom [ ] line number.
Explanation: A [nodelist] line in the input stream is not of the format: [nodelist]. This covers syntax errors not covered by messages 6027-1004 and 6027-1005.
User response: Fix the format of the list of nodes in the mmfs.cfg input file. This is usually the NodeFile specified on the mmchconfig command.
If no user-specified lines are in error, contact the IBM Support Center.
If user-specified lines are in error, correct these lines.

6027-1007 attribute found in common multiple times: attribute.
Explanation: The attribute specified on the command line is in the main input stream multiple times. This is occasionally legal, such as with the trace attribute. These attributes, however, are not meant to be repaired by mmfixcfg.
User response: Fix the configuration file (mmfs.cfg or mmfscfg1 in the SDR). All attributes modified by GPFS configuration commands may appear only once in common sections of the configuration file.

6027-1008 Attribute found in custom multiple times: attribute.
Explanation: The attribute specified on the command line is in a custom section multiple times. This is occasionally legal. These attributes are not meant to be repaired by mmfixcfg.
User response: Fix the configuration file (mmfs.cfg or mmfscfg1 in the SDR). All attributes modified by GPFS configuration commands may appear only once in custom sections of the configuration file.

6027-1022 Missing mandatory arguments on command line.
Explanation: Some, but not enough, arguments were specified to the mmcrfsc command.
User response: Specify all arguments as per the usage statement that follows.

6027-1023 File system size must be an integer: value
Explanation: The first two arguments specified to the mmcrfsc command are not integers.
User response: File system size is an internal argument. The mmcrfs command should never call the mmcrfsc command without a valid file system size argument. Contact the IBM Support Center.
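Messages 6027-1004 through 6027-1008 usually surface while mmchconfig rewrites mmfs.cfg. Rather than editing mmfs.cfg by hand, attributes are normally changed for a node list with mmchconfig; a hedged example (node names are placeholders):
  mmchconfig pagepool=2G -N c40f01n01,c40f01n02
  mmlsconfig  # confirm the resulting common and per-node sections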
6027-1028 Incorrect value for -name flag.
Explanation: An incorrect argument was specified with an option that requires one of a limited number of allowable options (for example, -s or any of the yes | no options).
User response: Use one of the valid values for the specified option.

6027-1029 Incorrect characters in integer field for -name option.
Explanation: An incorrect character was specified with the indicated option.
User response: Use a valid integer for the indicated option.

6027-1030 Value below minimum for -optionLetter option. Valid range is from value to value
Explanation: The value specified with an option was below the minimum.
User response: Use an integer in the valid range for the indicated option.

6027-1031 Value above maximum for option -optionLetter. Valid range is from value to value.
Explanation: The value specified with an option was above the maximum.
User response: Use an integer in the valid range for the indicated option.

6027-1032 Incorrect option optionName.
Explanation: An unknown option was specified.
User response: Use only the options shown in the syntax.

6027-1033 Option optionName specified twice.
Explanation: An option was specified more than once on the command line.
User response: Use options only once.

6027-1034 Missing argument after optionName option.
Explanation: An option was not followed by an argument.
User response: All options need an argument. Specify one.

6027-1035 Option -optionName is mandatory.
Explanation: A mandatory input option was not specified.
User response: Specify all mandatory options.

6027-1036 Option expected at string.
Explanation: Something other than an expected option was encountered on the latter portion of the command line.
User response: Follow the syntax shown. Options may not have multiple values. Extra arguments are not allowed.

6027-1038 IndirectSize must be <= BlockSize and must be a multiple of LogicalSectorSize (512).
Explanation: The IndirectSize specified was not a multiple of 512 or the IndirectSize specified was larger than BlockSize.
User response: Use valid values for IndirectSize and BlockSize.

6027-1039 InodeSize must be a multiple of LocalSectorSize (512).
Explanation: The specified InodeSize was not a multiple of 512.
User response: Use a valid value for InodeSize.

6027-1040 InodeSize must be less than or equal to Blocksize.
Explanation: The specified InodeSize was not less than or equal to Blocksize.
User response: Use a valid value for InodeSize.

6027-1042 DefaultMetadataReplicas must be less than or equal to MaxMetadataReplicas.
Explanation: The specified DefaultMetadataReplicas was greater than MaxMetadataReplicas.
User response: Specify a valid value for DefaultMetadataReplicas.

6027-1043 DefaultDataReplicas must be less than or equal MaxDataReplicas.
Explanation: The specified DefaultDataReplicas was greater than MaxDataReplicas.
User response: Specify a valid value for DefaultDataReplicas.
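The parameter-checking messages above are issued while mmcrfs validates the arguments it passes to mmcrfsc. A hedged example of creating a file system with consistent replication settings (device, stanza file, and mount point are placeholders):
  mmcrfs gpfs1 -F /tmp/nsd.stanza -B 1M -m 2 -M 2 -r 2 -R 2 -T /gpfs1
Here -m/-M are the default and maximum metadata replicas and -r/-R the default and maximum data replicas, so the defaults never exceed the maximums.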
6027-1143 Cannot open fileName. 6027-1150 Error encountered while importing disk
diskName.
Explanation: A file could not be opened.
Explanation: The mmimportfs command encountered
User response: Verify that the specified file exists and problems while processing the disk.
that you have the proper authorizations.
User response: Check the preceding messages for
more information.
6027-1144 Incompatible cluster types. You cannot
move file systems that were created by
GPFS cluster type sourceCluster into 6027-1151 Disk diskName already exists in the
GPFS cluster type targetCluster. cluster.
Explanation: The source and target cluster types are Explanation: You are trying to import a file system
incompatible. that has a disk with the same name as some disk from
a file system that is already in the cluster.
User response: Remove or replace the disk with the finishes, use the mmchnsd command to assign NSD
conflicting name. server nodes to the disks as needed.
6027-1152 Block size must be 64K, 128K, 256K, 6027-1159 The following file systems were not
512K, 1M, 2M, 4M, 8M or 16M. imported: fileSystemList
Explanation: The specified block size value is not Explanation: The mmimportfs command was not able
valid. to import the specified file systems. Check the
preceding messages for error information.
User response: Specify a valid block size value.
User response: Correct the problems and reissue the
mmimportfs command.
6027-1153 At least one node in the cluster must be
defined as a quorum node.
6027-1160 The drive letters for the following file
Explanation: All nodes were explicitly designated or
systems have been reset: fileSystemList.
allowed to default to be nonquorum.
Explanation: The drive letters associated with the
User response: Specify which of the nodes should be
specified file systems are already in use by existing file
considered quorum nodes and reissue the command.
systems and have been reset.
User response: After the mmimportfs command
6027-1154 Incorrect node node specified for
finishes, use the -t option of the mmchfs command to
command.
assign new drive letters as needed.
Explanation: The user specified a node that is not
valid.
6027-1161 Use the dash character (-) to separate
User response: Specify a valid node. multiple node designations.
Explanation: A command detected an incorrect
6027-1155 The NSD servers for the following disks character used as a separator in a list of node
from file system fileSystem were reset or designations.
not defined: diskList
User response: Correct the command line and reissue
Explanation: Either the mmimportfs command the command.
encountered disks with no NSD servers, or was forced
to reset the NSD server information for one or more
6027-1162 Use the semicolon character (;) to
disks.
separate the disk names.
User response: After the mmimportfs command
Explanation: A command detected an incorrect
finishes, use the mmchnsd command to assign NSD
character used as a separator in a list of disk names.
server nodes to the disks as needed.
User response: Correct the command line and reissue
the command.
6027-1156 The NSD servers for the following free
disks were reset or not defined: diskList
6027-1163 GPFS is still active on nodeName.
Explanation: Either the mmimportfs command
encountered disks with no NSD servers, or was forced Explanation: The GPFS daemon was discovered to be
to reset the NSD server information for one or more active on the specified node during an operation that
disks. requires the daemon to be stopped.
User response: After the mmimportfs command User response: Stop the daemon on the specified node
finishes, use the mmchnsd command to assign NSD and rerun the command.
server nodes to the disks as needed.
6027-1164 Use mmchfs -t to assign drive letters as
6027-1157 Use the mmchnsd command to assign needed.
NSD servers as needed.
Explanation: The mmimportfs command was forced
Explanation: Either the mmimportfs command to reset the drive letters associated with one or more
encountered disks with no NSD servers, or was forced file systems. Check the preceding messages for detailed
to reset the NSD server information for one or more information.
disks. Check the preceding messages for detailed
User response: After the mmimportfs command
information.
finishes, use the -t option of the mmchfs command to
User response: After the mmimportfs command assign new drive letters as needed.
6027-1165 The PR attributes for the following 6027-1189 You cannot delete all the disks.
disks from file system fileSystem were
Explanation: The number of disks to delete is greater
reset or not yet established: diskList
than or equal to the number of disks in the file system.
Explanation: The mmimportfs command disabled the
User response: Delete only some of the disks. If you
Persistent Reserve attribute for one or more disks.
want to delete them all, use the mmdelfs command.
User response: After the mmimportfs command
finishes, use the mmchconfig command to enable
6027-1197 parameter must be greater than value:
Persistent Reserve in the cluster as needed.
value.
Explanation: An incorrect value was specified for the
6027-1166 The PR attributes for the following free
named parameter.
disks were reset or not yet established:
diskList User response: Correct the input and reissue the
command.
Explanation: The mmimportfs command disabled the
Persistent Reserve attribute for one or more disks.
6027-1200 tscrfs failed. Cannot create device
User response: After the mmimportfs command
finishes, use the mmchconfig command to enable Explanation: The internal tscrfs command failed.
Persistent Reserve in the cluster as needed.
User response: Check the error message from the
command that failed.
6027-1167 Use mmchconfig to enable Persistent
Reserve in the cluster as needed.
6027-1201 Disk diskName does not belong to file
Explanation: The mmimportfs command disabled the system fileSystem.
Persistent Reserve attribute for one or more disks.
Explanation: The specified disk was not found to be
User response: After the mmimportfs command part of the cited file system.
finishes, use the mmchconfig command to enable
Persistent Reserve in the cluster as needed. User response: If the disk and file system were
specified as part of a GPFS command, reissue the
command with a disk that belongs to the specified file
6027-1168 Inode size must be 512, 1K or 4K. system.
Explanation: The specified inode size is not valid.
6027-1203 Attention: File system fileSystem may
User response: Specify a valid inode size.
have some disks that are in a non-ready
state. Issue the command: mmcommon
6027-1169 attribute must be value. recoverfs fileSystem
Explanation: The specified value of the given attribute Explanation: The specified file system may have some
is not valid. disks that are in a non-ready state.
User response: Specify a valid value. User response: Run mmcommon recoverfs fileSystem
to ensure that the GPFS configuration data for the file
system is current, and then display the states of the
6027-1178 parameter must be from value to value:
disks in the file system using the mmlsdisk command.
valueSpecified
If any disks are in a non-ready state, steps should be
Explanation: A parameter value specified was out of
taken to bring these disks into the ready state, or to
range.
remove them from the file system. This can be done by
User response: Keep the specified value within the mounting the file system, or by using the mmchdisk
range shown. command for a mounted or unmounted file system.
When maintenance is complete or the failure has been
repaired, use the mmchdisk command with the start
6027-1188 Duplicate disk specified: disk option. If the failure cannot be repaired without loss of
Explanation: A disk was specified more than once on data, you can use the mmdeldisk command to delete
the command line. the disks.
User response: Choose an unused name or path. User response: Examine the error code and other
messages to determine the reason for the failure.
Correct the problem and reissue the command.
6027-1208 File system fileSystem not found in
cluster clusterName.
6027-1214 Unable to enable Persistent Reserve on
Explanation: The specified file system does not belong the following disks: diskList
to the cited remote cluster. The local information about
the file system is not current. The file system may have Explanation: The command was unable to set up all
been deleted, renamed, or moved to a different cluster. of the disks to use Persistent Reserve.
User response: Contact the administrator of the User response: Examine the disks and the additional
remote cluster that owns the file system and verify the error information to determine if the disks should have
accuracy of the local information. Use the mmremotefs supported Persistent Reserve. Correct the problem and
show command to display the local information about reissue the command.
the file system. Use the mmremotefs update command
to make the necessary changes. 6027-1215 Unable to reset the Persistent Reserve
attributes on one or more disks on the
following nodes: nodeList
Explanation: The command could not reset Persistent
Reserve on at least one disk on the specified nodes.
User response: Examine the additional error
6027-1221 The number of NSD servers exceeds the 6027-1227 The main GPFS cluster configuration
maximum (value) allowed. file is locked. Retrying ...
Explanation: The number of NSD servers in the disk Explanation: Another GPFS administration command
descriptor exceeds the maximum allowed. has locked the cluster configuration file. The current
process will try to obtain the lock a few times before
User response: Change the disk descriptor to specify giving up.
no more NSD servers than the maximum allowed.
User response: None. Informational message only.
6027-1228 Lock creation successful. 6027-1234 Adding node node to the cluster will
exceed the quorum node limit.
Explanation: The holder of the lock has released it
and the current process was able to obtain it. Explanation: An attempt to add the cited node to the
cluster resulted in the quorum node limit being
User response: None. Informational message only. The
exceeded.
command will now continue.
User response: Change the command invocation to
not exceed the node quorum limit, and reissue the
6027-1229 Timed out waiting for lock. Try again
command.
later.
Explanation: Another GPFS administration command
6027-1235 The fileName kernel extension does not
kept the main GPFS cluster configuration file locked for
exist.
over a minute.
Explanation: The cited kernel extension does not exist.
User response: Try again later. If no other GPFS
administration command is presently running, see User response: Create the needed kernel extension by
“GPFS cluster configuration data files are locked” on compiling a custom mmfslinux module for your kernel
page 76. (see steps in /usr/lpp/mmfs/src/README), or copy the
binaries from another node with the identical
environment.
6027-1230 diskName is a tiebreaker disk and cannot
be deleted.
6027-1236 Unable to verify kernel/module
Explanation: A request was made to GPFS to delete a
configuration.
node quorum tiebreaker disk.
Explanation: The mmfslinux kernel extension does
User response: Specify a different disk for deletion.
not exist.
User response: Create the needed kernel extension by
6027-1231 GPFS detected more than eight quorum
compiling a custom mmfslinux module for your kernel
nodes while node quorum with
(see steps in /usr/lpp/mmfs/src/README), or copy the
tiebreaker disks is in use.
binaries from another node with the identical
Explanation: A GPFS command detected more than environment.
eight quorum nodes, but this is not allowed while node
quorum with tiebreaker disks is in use.
6027-1237 The GPFS daemon is still running; use
User response: Reduce the number of quorum nodes the mmshutdown command.
to a maximum of eight, or use the normal node
Explanation: An attempt was made to unload the
quorum algorithm.
GPFS kernel extensions while the GPFS daemon was
still running.
6027-1232 GPFS failed to initialize the tiebreaker
User response: Use the mmshutdown command to
disks.
shut down the daemon.
Explanation: A GPFS command unsuccessfully
attempted to initialize the node quorum tiebreaker
6027-1238 Module fileName is still in use. Unmount
disks.
all GPFS file systems and issue the
User response: Examine prior messages to determine command: mmfsadm cleanup
why GPFS was unable to initialize the tiebreaker disks
Explanation: An attempt was made to unload the
and correct the problem. After that, reissue the
cited module while it was still in use.
command.
User response: Unmount all GPFS file systems and
issue the command mmfsadm cleanup. If this does not
6027-1233 Incorrect keyword: value.
solve the problem, reboot the machine.
Explanation: A command received a keyword that is
not valid.
6027-1239 Error unloading module moduleName.
User response: Correct the command line and reissue
Explanation: GPFS was unable to unload the cited
the command.
module.
User response: Unmount all GPFS file systems and
issue the command mmfsadm cleanup. If this does not
solve the problem, reboot the machine.
6027-1253 Incorrect value for option option. 6027-1259 command not found. Ensure the
OpenSSL code is properly installed.
Explanation: The provided value for the specified
option is not valid. Explanation: The specified command was not found.
User response: Correct the error and reissue the User response: Ensure the OpenSSL code is properly
command. installed and reissue the command.
6027-1254 Warning: Not all nodes have proper 6027-1260 File fileName does not contain any
GPFS license designations. Use the typeOfStanza stanzas.
mmchlicense command to designate
Explanation: The input file should contain at least one
licenses as needed.
specified stanza.
Explanation: Not all nodes in the cluster have valid
User response: Correct the input file and reissue the
license designations.
command.
User response: Use mmlslicense to see the current
license designations. Use mmchlicense to assign valid
6027-1261 descriptorField must be specified in
GPFS licenses to all nodes as needed.
descriptorType descriptor.
Explanation: A required field of the descriptor was
6027-1255 There is nothing to commit. You must
empty. The incorrect descriptor is displayed following
first run: command.
this message.
Explanation: You are attempting to commit an SSL
User response: Correct the input and reissue the
private key but such a key has not been generated yet.
command.
User response: Run the specified command to
generate the public/private key pair.
6027-1262 Unable to obtain the GPFS
configuration file lock. Retrying ...
6027-1256 The current authentication files are
Explanation: A command requires the lock for the
already committed.
GPFS system data but was not able to obtain it.
Explanation: You are attempting to commit
User response: None. Informational message only.
public/private key files that were previously generated
with the mmauth command. The files have already
been committed. 6027-1263 Unable to obtain the GPFS
configuration file lock.
User response: None. Informational message.
Explanation: A command requires the lock for the
GPFS system data but was not able to obtain it.
6027-1257 There are uncommitted authentication
files. You must first run: command. User response: Check the preceding messages, if any.
Follow the procedure in “GPFS cluster configuration
Explanation: You are attempting to generate new
data files are locked” on page 76, and then reissue the
public/private key files but previously generated files
command.
have not been committed yet.
User response: Run the specified command to commit
6027-1268 Missing arguments.
the current public/private key pair.
Explanation: A GPFS administration command
received an insufficient number of arguments.
6027-1258 You must establish a cipher list first.
Run: command. User response: Correct the command line and reissue
the command.
Explanation: You are attempting to commit an SSL
private key but a cipher list has not been established
yet. 6027-1269 The device name device starts with a
slash, but not /dev/.
User response: Run the specified command to specify
a cipher list. Explanation: The device name does not start with
/dev/.
User response: Correct the device name.
6027-1270 The device name device contains a slash, 6027-1277 No contact nodes were provided for
but not as its first character. cluster clusterName.
Explanation: The specified device name contains a Explanation: A GPFS command found that no contact
slash, but the first character is not a slash. nodes have been specified for the cited cluster.
User response: The device name must be an User response: Use the mmremotecluster command to
unqualified device name or an absolute device path specify some contact nodes for the cited cluster.
name, for example: fs0 or /dev/fs0.
6027-1278 None of the contact nodes in cluster
6027-1271 Unexpected error from command. Return clusterName can be reached.
code: value
Explanation: A GPFS command was unable to reach
Explanation: A GPFS administration command (mm...) any of the contact nodes for the cited cluster.
received an unexpected error code from an internally
User response: Determine why the contact nodes for
called command.
the cited cluster cannot be reached and correct the
User response: Perform problem determination. See problem, or use the mmremotecluster command to
“GPFS commands are unsuccessful” on page 89. specify some additional contact nodes that can be
reached.
6027-1272 Unknown user name userName.
6027-1287 Node nodeName returned ENODEV for
Explanation: The specified value cannot be resolved to
disk diskName.
a valid user ID (UID).
Explanation: The specified node returned ENODEV
User response: Reissue the command with a valid
for the specified disk.
user name.
User response: Determine the cause of the ENODEV
error for the specified disk and rectify it. The ENODEV
6027-1273 Unknown group name groupName.
may be due to disk fencing or the removal of a device
Explanation: The specified value cannot be resolved to that previously was present.
a valid group ID (GID).
User response: Reissue the command with a valid 6027-1288 Remote cluster clusterName was not
group name. found.
Explanation: A GPFS command found that the cited
6027-1274 Unexpected error obtaining the lockName cluster has not yet been identified to GPFS as a remote
lock. cluster.
Explanation: GPFS cannot obtain the specified lock. User response: Specify a remote cluster known to
GPFS, or use the mmremotecluster command to make
User response: Examine any previous error messages. the cited cluster known to GPFS.
Correct any problems and reissue the command. If the
problem persists, perform problem determination and
contact the IBM Support Center. 6027-1289 Name name is not allowed. It contains
the following invalid special character:
char
6027-1275 Daemon node adapter Node was not
found on admin node Node. Explanation: The cited name is not allowed because it
contains the cited invalid special character.
Explanation: An input node descriptor was found to
be incorrect. The node adapter specified for GPFS User response: Specify a name that does not contain
daemon communications was not found to exist on the an invalid special character, and reissue the command.
cited GPFS administrative node.
User response: Correct the input node descriptor and 6027-1290 GPFS configuration data for file system
reissue the command. fileSystem may not be in agreement with
the on-disk data for the file system.
Issue the command: mmcommon
6027-1276 Command failed for disks: diskList. recoverfs fileSystem
Explanation: A GPFS command was unable to Explanation: GPFS detected that the GPFS
complete successfully on the listed disks. configuration database data for the specified file system
User response: Correct the problems and reissue the may not be in agreement with the on-disk data for the
command. file system. This may be caused by a GPFS disk
6027-1292 The -N option cannot be used with Explanation: All disk descriptors specify dataOnly for
attribute name. disk usage.
Explanation: The specified configuration attribute User response: Change at least one disk descriptor in
cannot be changed on only a subset of nodes. This the file system to indicate a usage of metadataOnly or
attribute must be the same on all nodes in the cluster. dataAndMetadata.
User response: Certain attributes, such as autoload,
may not be customized from node to node. Change the 6027-1299 Incorrect value value specified for failure
attribute for the entire cluster. group.
Explanation: The specified failure group is not valid.
6027-1293 There are no remote file systems.
User response: Correct the problem and reissue the
Explanation: A value of all was specified for the command.
remote file system operand of a GPFS command, but
no remote file systems are defined.
6027-1300 No file systems were found.
User response: None. There are no remote file systems
Explanation: A GPFS command searched for file
on which to operate.
systems, but none were found.
User response: Create a GPFS file system before
6027-1294 Remote file system fileSystem is not
reissuing the command.
defined.
Explanation: The specified file system was used for
6027-1301 The NSD servers specified in the disk
the remote file system operand of a GPFS command,
descriptor do not match the NSD servers
but the file system is not known to GPFS.
currently in effect.
User response: Specify a remote file system known to
Explanation: The set of NSD servers specified in the
GPFS.
disk descriptor does not match the set that is currently
in effect.
6027-1295 The GPFS configuration information is
User response: Specify the same set of NSD servers in
incorrect or not available.
the disk descriptor as is currently in effect or omit it
Explanation: A problem has been encountered while from the disk descriptor and then reissue the
verifying the configuration information and the command. Use the mmchnsd command to change the
execution environment. NSD servers as needed.
6027-1303 This function is not available in the 6027-1309 Storage pools are not available in the
GPFS Express Edition. GPFS Express Edition.
Explanation: The requested function is not part of the Explanation: Support for multiple storage pools is not
GPFS Express Edition. part of the GPFS Express Edition.
User response: Install the GPFS Standard Edition on User response: Install the GPFS Standard Edition on
all nodes in the cluster, and then reissue the command. all nodes in the cluster, and then reissue the command.
6027-1304 Missing argument after option option. 6027-1332 Cannot find disk with command.
Explanation: The specified command option requires a Explanation: The specified disk cannot be found.
value.
User response: Specify a correct disk name.
User response: Specify a value and reissue the
command.
6027-1333 The following nodes could not be
restored: nodeList. Correct the problems
6027-1305 Prerequisite libraries not found or and use the mmsdrrestore command to
correct version not installed. Ensure recover these nodes.
productName is properly installed.
Explanation: The mmsdrrestore command was unable
Explanation: The specified software product is to restore the configuration information for the listed
missing or is not properly installed. nodes.
User response: Verify that the product is installed User response: Correct the problems and reissue the
properly. mmsdrrestore command for these nodes.
6027-1306 Command command failed with return 6027-1334 Incorrect value for option option. Valid
code value. values are: validValues.
Explanation: A command was not successfully Explanation: An incorrect argument was specified
processed. with an option requiring one of a limited number of
legal options.
User response: Correct the failure specified by the
command and reissue the command. User response: Use one of the legal values for the
indicated option.
6027-1307 Disk disk on node nodeName already has
a volume group vgName that does not 6027-1335 Command completed: Not all required
appear to have been created by this changes were made.
program in a prior invocation. Correct
Explanation: Some, but not all, of the required
the descriptor file or remove the volume
changes were made.
group and retry.
User response: Examine the preceding messages,
Explanation: The specified disk already belongs to a
correct the problems, and reissue the command.
volume group.
User response: Either remove the volume group or
6027-1338 Command is not allowed for remote file
remove the disk descriptor and retry.
systems.
Explanation: A command for which a remote file
6027-1308 feature is not available in the GPFS
system is not allowed was issued against a remote file
Express Edition.
system.
Explanation: The specified function or feature is not
User response: Choose a local file system, or issue the
part of the GPFS Express Edition.
command on a node in the cluster that owns the file
User response: Install the GPFS Standard Edition on system.
all nodes in the cluster, and then reissue the command.
6027-1339 Disk usage value is incompatible with
storage pool name.
Explanation: A disk descriptor specified a disk usage
involving metadata and a storage pool other than
system.
6027-1347 Disk with NSD volume id NSD volume User response: This is an informational message.
id no longer exists in the GPFS cluster
configuration data but the NSD volume 6027-1361 Attention: There are no available valid
id was not erased from the disk. To VFS type values for mmfs in /etc/vfs.
remove the NSD volume id, issue:
mmdelnsd -p NSD volume id Explanation: An out of range number was used as the
vfs number for GPFS.
Explanation: A GPFS administration command (mm...)
successfully removed the disk with the specified NSD User response: The valid range is 8 through 32. Check
volume id from the GPFS cluster configuration data but /etc/vfs and remove unneeded entries.
was unable to erase the NSD volume id from the disk.
User response: Issue the specified command to
remove the NSD volume id from the disk.
6027-1393 Incorrect node designation specified: 6027-1503 Completed adding disks to file system
type. fileSystem.
Explanation: A node designation that is not valid was Explanation: The mmadddisk command successfully
specified. Valid values are client or manager. completed.
User response: Correct the command line and reissue User response: None. Informational message only.
the command.
6027-1504 File name could not be run with err error.
6027-1394 Operation not allowed for the local
Explanation: A failure occurred while trying to run an
cluster.
external program.
Explanation: The requested operation cannot be
User response: Make sure the file exists. If it does,
performed for the local cluster.
check its access permissions.
User response: Specify the name of a remote cluster.
6027-1505 Could not get minor number for name.
6027-1450 Could not allocate storage.
Explanation: Could not obtain a minor number for the
Explanation: Sufficient memory cannot be allocated to specified block or character device.
run the mmsanrepairfs command.
User response: Problem diagnosis will depend on the
User response: Increase the amount of memory subsystem that the device belongs to. For example,
available. device /dev/VSD0 belongs to the IBM Virtual Shared
Disk subsystem and problem determination should
follow guidelines in that subsystem's documentation.
6027-1500 [E] Open devicetype device failed with error:
Explanation: The "open" of a device failed. Operation of the file system may continue unless this device is needed for operation. If this is a replicated disk device, it will often not be needed. If this is a block or character device for another subsystem (such as /dev/VSD0) then GPFS will discontinue operation.
User response: Problem diagnosis will depend on the subsystem that the device belongs to. For instance device "/dev/VSD0" belongs to the IBM Virtual Shared Disk subsystem and problem determination should follow guidelines in that subsystem's documentation. If this is a normal disk device then take needed repair action on the specified disk.

6027-1501 [X] Volume label of disk name is name, should be uid.
Explanation: The UID in the disk descriptor does not match the expected value from the file system descriptor. This could occur if a disk was overwritten by another application or if the IBM Virtual Shared Disk subsystem incorrectly identified the disk.
User response: Check the disk configuration.

6027-1502 [X] Volume label of disk diskName is corrupt.
Explanation: The disk descriptor has a bad magic number, version, or checksum. This could occur if a disk was overwritten by another application or if the IBM Virtual Shared Disk subsystem incorrectly identified the disk.
User response: Check the disk configuration.

6027-1507 READ_KEYS ioctl failed with errno=returnCode, tried timesTried times. Related values are scsi_status=scsiStatusValue, sense_key=senseKeyValue, scsi_asc=scsiAscValue, scsi_ascq=scsiAscqValue.
Explanation: A READ_KEYS ioctl call failed with the errno= and related values shown.
User response: Check the reported errno= value and try to correct the problem. If the problem persists, contact the IBM Support Center.

6027-1508 Registration failed with errno=returnCode, tried timesTried times. Related values are scsi_status=scsiStatusValue, sense_key=senseKeyValue, scsi_asc=scsiAscValue, scsi_ascq=scsiAscqValue.
Explanation: A REGISTER ioctl call failed with the errno= and related values shown.
User response: Check the reported errno= value and try to correct the problem. If the problem persists, contact the IBM Support Center.
6027-1509 READRES ioctl failed with errno=returnCode, tried timesTried times. Related values are scsi_status=scsiStatusValue, sense_key=senseKeyValue, scsi_asc=scsiAscValue, scsi_ascq=scsiAscqValue.
Explanation: A READRES ioctl call failed with the errno= and related values shown.
User response: Check the reported errno= value and try to correct the problem. If the problem persists, contact the IBM Support Center.

6027-1510 [E] Error mounting file system stripeGroup on mountPoint; errorQualifier (gpfsErrno)
Explanation: An error occurred while attempting to mount a GPFS file system on Windows.
User response: Examine the error details, previous errors, and the GPFS message log to identify the cause.

6027-1511 [E] Error unmounting file system stripeGroup; errorQualifier (gpfsErrno)
Explanation: An error occurred while attempting to unmount a GPFS file system on Windows.
User response: Examine the error details, previous errors, and the GPFS message log to identify the cause.

6027-1512 [E] WMI query for queryType failed; errorQualifier (gpfsErrno)
Explanation: An error occurred while running a WMI query on Windows.
User response: Examine the error details, previous errors, and the GPFS message log to identify the cause.

6027-1513 DiskName is not an sg device, or sg driver is older than sg3
Explanation: The disk is not a SCSI disk, or supports a SCSI standard older than SCSI 3.
User response: Correct the command invocation and try again.

6027-1514 ioctl failed with rc=returnCode. Related values are SCSI status=scsiStatusValue, host_status=hostStatusValue, driver_status=driverStatsValue.
Explanation: An ioctl call failed with stated return code, errno value, and related values.
User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center.

6027-1515 READ KEY ioctl failed with rc=returnCode. Related values are SCSI status=scsiStatusValue, host_status=hostStatusValue, driver_status=driverStatsValue.
Explanation: An ioctl call failed with stated return code, errno value, and related values.
User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center.

6027-1516 REGISTER ioctl failed with rc=returnCode. Related values are SCSI status=scsiStatusValue, host_status=hostStatusValue, driver_status=driverStatsValue.
Explanation: An ioctl call failed with stated return code, errno value, and related values.
User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center.

6027-1517 READ RESERVE ioctl failed with rc=returnCode. Related values are SCSI status=scsiStatusValue, host_status=hostStatusValue, driver_status=driverStatsValue.
Explanation: An ioctl call failed with stated return code, errno value, and related values.
User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center.

6027-1518 RESERVE ioctl failed with rc=returnCode. Related values are SCSI status=scsiStatusValue, host_status=hostStatusValue, driver_status=driverStatsValue.
Explanation: An ioctl call failed with stated return code, errno value, and related values.
User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center.

6027-1519 INQUIRY ioctl failed with rc=returnCode. Related values are SCSI status=scsiStatusValue, host_status=hostStatusValue, driver_status=driverStatsValue.
Explanation: An ioctl call failed with stated return code, errno value, and related values.
User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center.
6027-1537 [E] Connect failed to ipAddress: reason
Explanation: An attempt to connect sockets between nodes failed.
User response: Check the reason listed and the connection to the indicated IP address.

6027-1543 error propagating parameter.
Explanation: mmfsd could not propagate a configuration parameter value to one or more nodes in the cluster.
User response: Contact the IBM Support Center.

6027-1548 [A] Error: daemon and kernel extension do not match.
Explanation: The GPFS kernel extension loaded in memory and the daemon currently starting do not appear to have come from the same build.
User response: Ensure that the kernel extension was reloaded after upgrading GPFS. See “GPFS modules cannot be loaded on Linux” on page 79 for details.

6027-1549 [A] Attention: custom-built kernel extension; the daemon and kernel extension do not match.
Explanation: The GPFS kernel extension loaded in memory does not come from the same build as the starting daemon. The kernel extension appears to have been built from the kernel open source package.
User response: None.

6027-1550 [W] Error: Unable to establish a session with an Active Directory server. ID remapping via Microsoft Identity Management for Unix will be unavailable.
Explanation: GPFS tried to establish an LDAP session with an Active Directory server (normally the domain controller host), and has been unable to do so.
User response: Ensure the domain controller is available.

6027-1555 Mount point and device name cannot be equal: name
Explanation: The specified mount point is the same as the absolute device name.
User response: Enter a new device name or absolute mount point path name.

6027-1556 Interrupt received.
Explanation: A GPFS administration command received an interrupt.
User response: None. Informational message only.

6027-1559 The -i option failed. Changes will take effect after GPFS is restarted.
Explanation: The -i option on the mmchconfig command failed. The changes were processed successfully, but will take effect only after the GPFS daemons are restarted.
User response: Check for additional error messages. Correct the problem and reissue the command.

6027-1560 This GPFS cluster contains file systems. You cannot delete the last node.
Explanation: An attempt has been made to delete a GPFS cluster that still has one or more file systems associated with it.
User response: Before deleting the last node of a GPFS cluster, delete all file systems that are associated with it. This applies to both local and remote file systems.

6027-1561 Attention: Failed to remove node-specific changes.
Explanation: The internal mmfixcfg routine failed to remove node-specific configuration settings, if any, for one or more of the nodes being deleted. This is of consequence only if the mmchconfig command was indeed used to establish node specific settings and these nodes are later added back into the cluster.
User response: If you add the nodes back later, ensure that the configuration parameters for the nodes are set as desired.

6027-1562 command command cannot be executed. Either none of the nodes in the cluster are reachable, or GPFS is down on all of the nodes.
Explanation: The command that was issued needed to perform an operation on a remote node, but none of the nodes in the cluster were reachable, or GPFS was not accepting commands on any of the nodes.
User response: Ensure that the affected nodes are available and all authorization requirements are met. Correct any problems and reissue the command.
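For reference, the -i option that message 6027-1559 describes makes an mmchconfig change take effect immediately. A hedged example, using the pagepool attribute and a size chosen purely for illustration:

  mmchconfig pagepool=4G -i

If the attribute does not permit -i or -I (see message 6027-1625), omit the option and restart GPFS on the affected nodes so that the new value is picked up.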
6027-1564 To change the authentication key for the local cluster, run: mmauth genkey.
Explanation: The authentication keys for the local cluster must be created only with the specified command.
User response: Run the specified command to establish a new authentication key for the nodes in the cluster.

6027-1565 disk not found in file system fileSystem.
Explanation: A disk specified for deletion or replacement does not exist.
User response: Specify existing disks for the indicated file system.

6027-1566 Remote cluster clusterName is already defined.
Explanation: A request was made to add the cited cluster, but the cluster is already known to GPFS.
User response: None. The cluster is already known to GPFS.

6027-1567 fileSystem from cluster clusterName is already defined.
Explanation: A request was made to add the cited file system from the cited cluster, but the file system is already known to GPFS.
User response: None. The file system is already known to GPFS.

6027-1568 command command failed. Only parameterList changed.
Explanation: The mmchfs command failed while making the requested changes. Any changes to the attributes in the indicated parameter list were successfully completed. No other file system attributes were changed.
User response: Reissue the command if you want to change additional attributes of the file system. Changes can be undone by issuing the mmchfs command with the original value for the affected attribute.

6027-1570 virtual shared disk support is not installed.
Explanation: The command detected that IBM Virtual Shared Disk support is not installed on the node on which it is running.
User response: Install IBM Virtual Shared Disk support.

6027-1571 commandName does not exist or failed; automount mounting may not work.
Explanation: One or more of the GPFS file systems were defined with the automount attribute but the requisite automount command is missing or failed.
User response: Correct the problem and restart GPFS. Or use the mount command to explicitly mount the file system.

6027-1572 The command must run on a node that is part of the cluster.
Explanation: The node running the mmcrcluster command (this node) must be a member of the GPFS cluster.
User response: Issue the command from a node that will belong to the cluster.

6027-1573 Command completed: No changes made.
Explanation: Informational message.
User response: Check the preceding messages, correct any problems, and reissue the command.

6027-1574 Permission failure. The command requires root authority to execute.
Explanation: The command, or the specified command option, requires root authority.
User response: Log on as root and reissue the command.

6027-1578 File fileName does not contain node names.
Explanation: The specified file does not contain valid node names.
User response: Node names must be specified one per line. The name localhost and lines that start with '#' character are ignored.

6027-1579 File fileName does not contain data.
Explanation: The specified file does not contain data.
User response: Verify that you are specifying the correct file name and reissue the command.

6027-1587 Unable to determine the local device name for disk nsdName on node nodeName.
Explanation: GPFS was unable to determine the local device name for the specified GPFS disk.
User response: Determine why the specified disk on the specified node could not be accessed and correct the problem. Possible reasons include: connectivity
User response: Change the invocation of the mmdsh command to use the -F or -L options, or set the WCOLL environment variable before invoking the mmdsh command.

Explanation: This message contains progress information about the mmmount command.
User response: None. Informational message only.

6027-1625 option cannot be used with attribute name.
Explanation: An attempt was made to change a configuration attribute and requested the change to take effect immediately (-i or -I option). However, the specified attribute does not allow the operation.
User response: If the change must be made now, leave off the -i or -I option. Then recycle the nodes to pick up the new value.

6027-1626 Command is not supported in the type environment.
Explanation: A GPFS administration command (mm...) is not supported in the specified environment.
User response: Verify if the task is needed in this environment, and if it is, use a different command.

6027-1627 The following nodes are not aware of the configuration server change: nodeList. Do not start GPFS on the above nodes until the problem is resolved.
Explanation: The mmchcluster command could not propagate the new cluster configuration servers to the specified nodes.
User response: Correct the problems and run the mmchcluster -p LATEST command before starting GPFS on the specified nodes.

6027-1628 Cannot determine basic environment information. Not enough nodes are available.
Explanation: The mmchcluster command was unable to retrieve the GPFS cluster data files. Usually, this is due to too few nodes being available.

6027-1630 The GPFS cluster data on nodeName is back level.
Explanation: A GPFS command attempted to commit changes to the GPFS cluster configuration data, but the data on the server is already at a higher level. This can happen if the GPFS cluster configuration files were altered outside the GPFS environment, or if the mmchcluster command did not complete successfully.
User response: Correct any problems and reissue the command. If the problem persists, issue the mmrefresh -f -a command.

6027-1631 The commit process failed.
Explanation: A GPFS administration command (mm...) cannot commit its changes to the GPFS cluster configuration data.
User response: Examine the preceding messages, correct the problem, and reissue the command. If the problem persists, perform problem determination and contact the IBM Support Center.

6027-1632 The GPFS cluster configuration data on nodeName is different than the data on nodeName.
Explanation: The GPFS cluster configuration data on the primary cluster configuration server node is different than the data on the secondary cluster configuration server node. This can happen if the GPFS cluster configuration files were altered outside the GPFS environment or if the mmchcluster command did not complete successfully.
User response: Correct any problems and issue the mmrefresh -f -a command. If the problem persists, perform problem determination and contact the IBM Support Center.
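The recovery step named in messages 6027-1630 and 6027-1632 rebuilds the GPFS configuration files from the configuration server data. A typical invocation, run from a node that can reach the configuration servers, is:

  mmrefresh -f -a

Broadly, -f forces the configuration files to be rebuilt and -a applies the refresh to all nodes; see “The mmrefresh command” earlier in this information for details.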
6027-1637 command quitting. None of the specified nodes are valid.
Explanation: A GPFS command found that none of the specified nodes passed the required tests.
User response: Determine why the nodes were not accepted, fix the problems, and reissue the command.

6027-1638 Command: There are no unassigned nodes in the cluster.
Explanation: A GPFS command in a cluster environment needs unassigned nodes, but found there are none.
User response: Verify whether there are any unassigned nodes in the cluster. If there are none, either add more nodes to the cluster using the mmaddnode command, or delete some nodes from the cluster using the mmdelnode command, and then reissue the command.

6027-1639 Command failed. Examine previous error messages to determine cause.
Explanation: A GPFS command failed due to previously-reported errors.
User response: Check the previous error messages, fix the problems, and then reissue the command. If no other messages are shown, examine the GPFS log files in the /var/adm/ras directory on each node.

6027-1642 command: Starting GPFS ...
Explanation: Progress information for the mmstartup command.
User response: None. Informational message only.

6027-1643 The number of quorum nodes exceeds the maximum (number) allowed.
Explanation: An attempt was made to add more quorum nodes to a cluster than the maximum number allowed.
User response: Reduce the number of quorum nodes, and reissue the command.

6027-1645 Node nodeName is fenced out from disk diskName.
Explanation: A GPFS command attempted to access the specified disk, but found that the node attempting the operation was fenced out from the disk.
User response: Check whether there is a valid reason why the node should be fenced out from the disk. If there is no such reason, unfence the disk and reissue the command.

6027-1647 Unable to find disk with NSD volume id NSD volume id.
Explanation: A disk with the specified NSD volume id cannot be found.
User response: Specify a correct disk NSD volume id.

6027-1648 GPFS was unable to obtain a lock from node nodeName.
Explanation: GPFS failed in its attempt to get a lock from another node in the cluster.
User response: Verify that the reported node is reachable. Examine previous error messages, if any. Fix the problems and then reissue the command.

6027-1661 Failed while processing disk descriptor descriptor on node nodeName.
Explanation: A disk descriptor was found to be unsatisfactory in some way.
User response: Check the preceding messages, if any, and correct the condition that caused the disk descriptor to be rejected.

6027-1662 Disk device deviceName refers to an existing NSD name
Explanation: The specified disk device refers to an existing NSD.
User response: Specify another disk that is not an existing NSD.
6027-1663 Disk descriptor descriptor should refer to an existing NSD. Use mmcrnsd to create the NSD.
Explanation: An NSD disk given as input is not known to GPFS.
User response: Create the NSD. Then rerun the command.

6027-1664 command: Processing node nodeName
Explanation: Progress information.
User response: None. Informational message only.

6027-1665 Issue the command from a node that remains in the cluster.
Explanation: The nature of the requested change requires the command be issued from a node that will remain in the cluster.
User response: Run the command from a node that will remain in the cluster.

6027-1666 [I] No disks were found.
Explanation: A command searched for disks but found none.
User response: If disks are desired, create some using the mmcrnsd command.

6027-1670 Incorrect or missing remote shell command: name
Explanation: The specified remote command does not exist or is not executable.
User response: Specify a valid command.

6027-1671 Incorrect or missing remote file copy command: name
Explanation: The specified remote command does not exist or is not executable.
User response: Specify a valid command.

6027-1672 option value parameter must be an absolute path name.
Explanation: The mount point does not begin with '/'.
User response: Specify the full path for the mount point.

6027-1674 command: Unmounting file systems ...
Explanation: This message contains progress information about the mmumount command.
User response: None. Informational message only.

6027-1677 Disk diskName is of an unknown type.
Explanation: The specified disk is of an unknown type.
User response: Specify a disk whose type is recognized by GPFS.

6027-1680 Disk name diskName is already registered for use by GPFS.
Explanation: The cited disk name was specified for use by GPFS, but there is already a disk by that name registered for use by GPFS.
User response: Specify a different disk name for use by GPFS and reissue the command.

6027-1681 Node nodeName is being used as an NSD server.
Explanation: The specified node is defined as a server node for some disk.
User response: If you are trying to delete the node from the GPFS cluster, you must either delete the disk or define another node as its server.

6027-1685 Processing continues without lock protection.
Explanation: The command will continue processing although it was not able to obtain the lock that prevents other GPFS commands from running simultaneously.
User response: Ensure that no other GPFS command is running. See the command documentation for additional details.

6027-1688 Command was unable to obtain the lock for the GPFS system data. Unable to reach the holder of the lock nodeName. Check the preceding messages, if any. Follow the procedure outlined in the GPFS: Problem Determination Guide.
Explanation: A command requires the lock for the GPFS system data but was not able to obtain it.
User response: Check the preceding messages, if any. Follow the procedure in the IBM Spectrum Scale: Problem Determination Guide for what to do when the GPFS system data is locked. Then reissue the command.

6027-1689 vpath disk diskName is not recognized as an IBM SDD device.
Explanation: The mmvsdhelper command found that the specified disk is a vpath disk, but it is not recognized as an IBM SDD device.
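Messages 6027-1663 and 6027-1666 both point to the mmcrnsd command. A minimal sketch of creating an NSD from a stanza file, using hypothetical device, NSD, and server names, is a file such as /tmp/nsd.stanza containing:

  %nsd: device=/dev/sdb
    nsd=nsd1
    servers=nodeA
    usage=dataAndMetadata

followed by:

  mmcrnsd -F /tmp/nsd.stanza

Check the mmcrnsd documentation for the stanza attributes that apply to your configuration.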
User response: If this happens frequently, check IP connections.

6027-1715 EINVAL trap from connect call to ipAddress (socket name)
Explanation: The connect call back to the requesting node failed.
User response: This is caused by a bug in AIX socket support. Upgrade AIX kernel and TCP client support.

6027-1716 [N] Close connection to ipAddress
Explanation: Connection socket closed.
User response: None. Informational message only.

6027-1717 [E] Error initializing the configuration server, err value
Explanation: The configuration server module could not be initialized due to lack of system resources.
User response: Check system memory.

6027-1718 [E] Could not run command name, err value
Explanation: The GPFS daemon failed to run the specified command.
User response: Verify correct installation.

6027-1724 [E] The key used by the cluster named clusterName has changed. Contact the administrator to obtain the new key and register it using "mmremotecluster update".
Explanation: The administrator of the cluster has changed the key used for authentication.
User response: Contact the administrator to obtain the new key and register it using mmremotecluster update.

6027-1725 [E] The key used by the cluster named clusterName has changed. Contact the administrator to obtain the new key and register it using "mmauth update".
Explanation: The administrator of the cluster has changed the key used for authentication.
User response: Contact the administrator to obtain the new key and register it using mmauth update.

6027-1726 [E] The administrator of the cluster named clusterName requires authentication. Contact the administrator to obtain the clusters key and register the key using "mmremotecluster update".
Explanation: The administrator of the cluster requires authentication.
User response: Contact the administrator to obtain the cluster's key and register it using: mmremotecluster update.

6027-1727 [E] The administrator of the cluster named clusterName does not require authentication. Unregister the clusters key using "mmremotecluster update".
Explanation: The administrator of the cluster does not require authentication.
User response: Unregister the clusters key using: mmremotecluster update.

6027-1728 [E] Remote mounts are not enabled within the cluster named clusterName. Contact the administrator and request that they enable remote mounts.
Explanation: The administrator of the cluster has not enabled remote mounts.
User response: Contact the administrator and request remote mount access.

6027-1729 [E] The cluster named clusterName has not authorized this cluster to mount file systems. Contact the cluster administrator and request access.
Explanation: The administrator of the cluster has not authorized this cluster to mount file systems.
User response: Contact the administrator and request access.

6027-1730 [E] Unsupported cipherList cipherList requested.
Explanation: The target cluster requested a cipherList not supported by the installed version of OpenSSL.
User response: Install a version of OpenSSL that supports the required cipherList or contact the administrator of the target cluster and request that a supported cipherList be assigned to this remote cluster.

6027-1731 [E] Unsupported cipherList cipherList requested.
Explanation: The target cluster requested a cipherList that is not supported by the installed version of OpenSSL.
User response: Either install a version of OpenSSL that supports the required cipherList or contact the administrator of the target cluster and request that a supported cipherList be assigned to this remote cluster.
6027-1732 [X] Remote mounts are not enabled within this cluster.
Explanation: Remote mounts cannot be performed in this cluster.
User response: See the IBM Spectrum Scale: Advanced Administration Guide for instructions about enabling remote mounts. In particular, make sure the keys have been generated and a cipherlist has been set.

6027-1733 OpenSSL dynamic lock support could not be loaded.
Explanation: One of the functions required for dynamic lock support was not included in the version of the OpenSSL library that GPFS is configured to use.
User response: If this functionality is required, shut down the daemon, install a version of OpenSSL with the desired functionality, and configure GPFS to use it. Then restart the daemon.

6027-1736 [N] Reconnected to ipAddress
Explanation: The local mmfsd daemon has successfully reconnected to a remote daemon following an unexpected connection break.
User response: None. Informational message only.

6027-1737 [N] Close connection to ipAddress (errorString).
Explanation: Connection socket closed.
User response: None. Informational message only.

6027-1738 [E] Close connection to ipAddress (errorString). Attempting reconnect.
Explanation: Connection socket closed.
User response: None. Informational message only.

6027-1739 [X] Accept socket connection failed: err value.
Explanation: The Accept socket connection received an unexpected error.
User response: None. Informational message only.

6027-1740 [E] Timed out waiting for a reply from node ipAddress.
Explanation: A message that was sent to the specified node did not receive a response within the expected time limit.
User response: None. Informational message only.

6027-1743 [W] Failed to load GSKit library path: (dlerror) errorMessage
Explanation: The GPFS daemon could not load the library required to secure the node-to-node communications.
User response: Verify that the gpfs.gskit package was properly installed.

6027-1744 [I] GSKit library loaded and initialized.
Explanation: The GPFS daemon successfully loaded the library required to secure the node-to-node communications.
User response: None. Informational message only.
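As message 6027-1732 notes, remote mounts require that the keys have been generated and a cipherlist has been set on the owning cluster. A minimal sketch of those two steps, assuming the commonly used AUTHONLY cipherlist, is:

  mmauth genkey new
  mmauth update . -l AUTHONLY

The complete procedure, including granting access to the remote cluster, is in the IBM Spectrum Scale: Advanced Administration Guide cited in the message.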
6027-1805 [N] Rediscovered nsd server access to name.
Explanation: A server rediscovered access to the specified disk.
User response: None.

6027-1806 [X] A Persistent Reserve could not be established on device name (deviceName): errorLine.
Explanation: GPFS is using Persistent Reserve on this disk, but was unable to establish a reserve for this node.
User response: Perform disk diagnostics.

6027-1807 [E] NSD nsdName is using Persistent Reserve, this will require an NSD server on an osName node.
Explanation: A client tried to open a globally-attached NSD disk, but the disk is using Persistent Reserve. An osName NSD server is needed. GPFS only supports Persistent Reserve on certain operating systems.
User response: Use the mmchnsd command to add an osName NSD server for the NSD.

6027-1808 [A] Unable to reserve space for NSD buffers. Increase pagepool size to at least requiredPagePoolSize MB. Refer to the GPFS: Administration and Programming Reference for more information on selecting an appropriate pagepool size.
Explanation: The pagepool usage for an NSD buffer (4*maxblocksize) is limited by factor nsdBufSpace. The value of nsdBufSpace can be in the range of 10-70. The default value is 30.
User response: Use the mmchconfig command to decrease the value of maxblocksize or to increase the value of pagepool or nsdBufSpace.

6027-1809 [E] The defined server serverName for NSD NsdName couldn't be resolved.
Explanation: The host name of the NSD server could not be resolved by gethostbyName().
User response: Fix the host name resolution.

6027-1810 [I] Vdisk server recovery: delay number sec. for safe recovery.
Explanation: Wait for the existing disk lease to expire before performing vdisk server recovery.
User response: None.

6027-1811 [I] Vdisk server recovery: delay complete.
Explanation: Done waiting for existing disk lease to expire before performing vdisk server recovery.
User response: None.

6027-1812 [E] Rediscovery failed for name.
Explanation: A server failed to rediscover access to the specified disk.
User response: Check the disk access issues and run the command again.

6027-1813 [A] Error reading volume identifier (for objectName name) from configuration file.
Explanation: The volume identifier for the named recovery group or vdisk could not be read from the mmsdrfs file. This should never occur.
User response: Check for damage to the mmsdrfs file.

6027-1814 [E] Vdisk vdiskName cannot be associated with its recovery group recoveryGroupName. This vdisk will be ignored.
Explanation: The named vdisk cannot be associated with its recovery group.
User response: Check for damage to the mmsdrfs file.

6027-1815 [A] Error reading volume identifier (for NSD name) from configuration file.
Explanation: The volume identifier for the named NSD could not be read from the mmsdrfs file. This should never occur.
User response: Check for damage to the mmsdrfs file.

6027-1816 [E] The defined server serverName for recovery group recoveryGroupName could not be resolved.
Explanation: The hostname of the NSD server could not be resolved by gethostbyName().
User response: Fix hostname resolution.

6027-1817 [E] Vdisks are defined, but no recovery groups are defined.
Explanation: There are vdisks defined in the mmsdrfs file, but no recovery groups are defined. This should never occur.
User response: Check for damage to the mmsdrfs file.
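For message 6027-1808 the corrective action is a configuration change with mmchconfig. Two hedged examples, with values chosen only for illustration and nsdNodes standing in for a node class or node list of the NSD server nodes:

  mmchconfig pagepool=8G -N nsdNodes
  mmchconfig nsdBufSpace=40 -N nsdNodes

The nsdBufSpace value must stay within the documented range of 10 to 70.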
6027-1820 Disk descriptor for name refers to an existing NSD.
Explanation: The mmcrrecoverygroup command or mmaddpdisk command found an existing NSD.
User response: Correct the input file, or use the -v option.

6027-1821 Error errno writing disk descriptor on name.
Explanation: The mmcrrecoverygroup command or mmaddpdisk command got an error writing the disk descriptor.
User response: Perform disk diagnostics.

6027-1822 Error errno reading disk descriptor on name.
Explanation: The tspreparedpdisk command got an error reading the disk descriptor.
User response: Perform disk diagnostics.

6027-1823 Path error, name and name are the same disk.
Explanation: The tspreparedpdisk command got an error during path verification. The pdisk descriptor file is miscoded.
User response: Correct the pdisk descriptor file and reissue the command.

6027-1824 [X] An unexpected Device Mapper path dmDevice (nsdId) has been detected. The new path does not have a Persistent Reserve set up. Server disk diskName will be put offline
Explanation: A new device mapper path is detected or a previously failed path is activated after the local device discovery has finished. This path lacks a Persistent Reserve, and cannot be used. All device

6027-1900 Failed to stat pathName.
Explanation: A stat() call failed for the specified object.
User response: Correct the problem and reissue the command.

6027-1901 pathName is not a GPFS file system object.
Explanation: The specified path name does not resolve to an object within a mounted GPFS file system.
User response: Correct the problem and reissue the command.

6027-1902 The policy file cannot be determined.
Explanation: The command was not able to retrieve the policy rules associated with the file system.
User response: Examine the preceding messages and correct the reported problems. Establish a valid policy file with the mmchpolicy command or specify a valid policy file on the command line.

6027-1903 path must be an absolute path name.
Explanation: The path name did not begin with a /.
User response: Specify the absolute path name for the object.

6027-1904 Device with major/minor numbers number and number already exists.
Explanation: A device with the cited major and minor numbers already exists.
User response: Check the preceding messages for detailed information.
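Message 6027-1902 is typically resolved by installing a policy for the file system with mmchpolicy. A minimal sketch, assuming a file system named fs1 and a one-rule policy file /tmp/policy.txt containing RULE 'default' SET POOL 'system':

  mmchpolicy fs1 /tmp/policy.txt

The single rule shown only places new files in the system storage pool; production policies are usually richer.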
6027-1933 Disk diskName has been removed from the GPFS cluster configuration data but the NSD volume id was not erased from the disk. To remove the NSD volume id, issue mmdelnsd -p NSDvolumeid.
Explanation: A GPFS administration command (mm...) successfully removed the specified disk from the GPFS cluster configuration data, but was unable to erase the NSD volume id from the disk.
User response: Issue the specified command to remove the NSD volume id from the disk.

6027-1934 Disk diskName has been removed from the GPFS cluster configuration data but the NSD volume id was not erased from the disk. To remove the NSD volume id, issue: mmdelnsd -p NSDvolumeid -N nodeList.
Explanation: A GPFS administration command (mm...) successfully removed the specified disk from the GPFS cluster configuration data but was unable to erase the NSD volume id from the disk.
User response: Issue the specified command to remove the NSD volume id from the disk.

6027-1936 Node nodeName cannot support Persistent Reserve on disk diskName because it is not an AIX node. The disk will be used as a non-PR disk.
Explanation: A non-AIX node was specified as an NSD server for the disk. The disk will be used as a non-PR disk.
User response: None. Informational message only.

6027-1937 A node was specified more than once as an NSD server in disk descriptor descriptor.
Explanation: A node was specified more than once as an NSD server in the disk descriptor shown.
User response: Change the disk descriptor to eliminate any redundancies in the list of NSD servers.

6027-1938 configParameter is an incorrect parameter. Line in error: configLine. The line is ignored; processing continues.
Explanation: The specified parameter is not valid and will be ignored.
User response: None. Informational message only.

6027-1939 Line in error: line.
Explanation: The specified line from a user-provided input file contains errors.
User response: Check the preceding messages for more information. Correct the problems and reissue the command.

6027-1940 Unable to set reserve policy policy on disk diskName on node nodeName.
Explanation: The specified disk should be able to support Persistent Reserve, but an attempt to set up the registration key failed.
User response: Correct the problem and reissue the command.

6027-1941 Cannot handle multiple interfaces for host hostName.
Explanation: Multiple entries were found for the given hostname or IP address either in /etc/hosts or by the host command.
User response: Make corrections to /etc/hosts and reissue the command.

6027-1942 Unexpected output from the 'host -t a name' command:
Explanation: A GPFS administration command (mm...) received unexpected output from the host -t a command for the given host.
User response: Issue the host -t a command interactively and carefully review the output, as well as any error messages.

6027-1943 Host name not found.
Explanation: A GPFS administration command (mm...) could not resolve a host from /etc/hosts or by using the host command.
User response: Make corrections to /etc/hosts and reissue the command.

6027-1945 Disk name diskName is not allowed. Names beginning with gpfs are reserved for use by GPFS.
Explanation: The cited disk name is not allowed because it begins with gpfs.
User response: Specify a disk name that does not begin with gpfs and reissue the command.
6027-1947 Use mmauth genkey to recover the file fileName, or to generate and commit a new key.
Explanation: The specified file was not found.
User response: Recover the file, or generate a new key by running: mmauth genkey propagate or generate a new key by running mmauth genkey new, followed by the mmauth genkey commit command.

6027-1948 Disk diskName is too large.
Explanation: The specified disk is too large.
User response: Specify a smaller disk and reissue the command.

6027-1949 Propagating the cluster configuration data to all affected nodes.
Explanation: The cluster configuration data is being sent to the rest of the nodes in the cluster.
User response: This is an informational message.

6027-1950 Local update lock is busy.
Explanation: More than one process is attempting to update the GPFS environment at the same time.
User response: Repeat the command. If the problem persists, verify that there are no blocked processes.

6027-1951 Failed to obtain the local environment update lock.
Explanation: GPFS was unable to obtain the local environment update lock for more than 30 seconds.
User response: Examine previous error messages, if any. Correct any problems and reissue the command. If the problem persists, perform problem determination and contact the IBM Support Center.

6027-1962 Permission denied for disk diskName
Explanation: The user does not have permission to access disk diskName.
User response: Correct the permissions and reissue the command.

6027-1963 Disk diskName was not found.
Explanation: The specified disk was not found.
User response: Specify an existing disk and reissue the command.

6027-1964 I/O error on diskName
Explanation: An I/O error occurred on the specified disk.
User response: Check for additional error messages. Check the error log for disk hardware problems.

6027-1967 Disk diskName belongs to back-level file system fileSystem or the state of the disk is not ready. Use mmchfs -V to convert the file system to the latest format. Use mmchdisk to change the state of a disk.
Explanation: The specified disk cannot be initialized for use as a tiebreaker disk. Possible reasons are suggested in the message text.
User response: Use the mmlsfs and mmlsdisk commands to determine what action is needed to correct the problem.

6027-1968 Failed while processing disk diskName.
Explanation: An error was detected while processing the specified disk.
User response: Examine prior messages to determine the reason for the failure. Correct the problem and reissue the command.

6027-1969 Device device already exists on node nodeName
Explanation: This device already exists on the specified node.
User response: None.

6027-1970 Disk diskName has no space for the quorum data structures. Specify a different disk as tiebreaker disk.
Explanation: There is not enough free space in the file system descriptor for the tiebreaker disk data structures.
User response: Specify a different disk as a tiebreaker disk.

6027-1974 None of the quorum nodes can be reached.
Explanation: Ensure that the quorum nodes in the cluster can be reached. At least one of these nodes is required for the command to succeed.
User response: Ensure that the quorum nodes are available and reissue the command.
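The recovery commands named in message 6027-1947 are issued on the cluster whose key file is missing. A hedged sequence for generating and committing a new key:

  mmauth genkey new
  mmauth genkey commit

Running mmauth genkey propagate instead redistributes the current committed key files, which is the lighter-weight recovery when the key itself is still valid.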
6027-1975 The descriptor file contains more than one descriptor.
Explanation: The descriptor file must contain only one descriptor.
User response: Correct the descriptor file.

6027-1976 The descriptor file contains no descriptor.
Explanation: The descriptor file must contain only one descriptor.
User response: Correct the descriptor file.

6027-1977 Failed validating disk diskName. Error code errorCode.
Explanation: GPFS control structures are not as expected.
User response: Contact the IBM Support Center.

6027-1984 Name name is not allowed. It is longer than the maximum allowable length (length).
Explanation: The cited name is not allowed because it is longer than the cited maximum allowable length.
User response: Specify a name whose length does not exceed the maximum allowable length, and reissue the command.

6027-1985 mmfskxload: The format of the GPFS kernel extension is not correct for this version of AIX.
Explanation: This version of AIX is incompatible with the current format of the GPFS kernel extension.
User response: Contact your system administrator to check the AIX version and GPFS kernel extension.

6027-1986 junctionName does not resolve to a directory in deviceName. The junction must be within the specified file system.
Explanation: The cited junction path name does not belong to the specified file system.
User response: Correct the junction path name and reissue the command.

6027-1987 Name name is not allowed.
Explanation: The cited name is not allowed because it is a reserved word or a prohibited character.
User response: Specify a different name and reissue the command.

6027-1988 File system fileSystem is not mounted.
Explanation: The cited file system is not currently mounted on this node.
User response: Ensure that the file system is mounted and reissue the command.

6027-1993 File fileName either does not exist or has an incorrect format.
Explanation: The specified file does not exist or has an incorrect format.
User response: Check whether the input file specified actually exists.

6027-1994 Did not find any match with the input disk address.
Explanation: The mmfileid command returned without finding any disk addresses that match the given input.
User response: None. Informational message only.

6027-1995 Device deviceName is not mounted on node nodeName.
Explanation: The specified device is not mounted on the specified node.
User response: Mount the specified device on the specified node and reissue the command.

6027-1996 Command was unable to determine whether file system fileSystem is mounted.
Explanation: The command was unable to determine whether the cited file system is mounted.
User response: Examine any prior error messages to determine why the command could not determine whether the file system was mounted, resolve the problem if possible, and then reissue the command. If you cannot resolve the problem, reissue the command with the daemon down on all nodes of the cluster. This will ensure that the file system is not mounted, which may allow the command to proceed.

6027-1997 Backup control file fileName from a previous backup does not exist.
Explanation: The mmbackup command was asked to do an incremental or a resume backup, but the control file from a previous backup could not be found.
User response: Restore the named file to the file system being backed up and reissue the command, or else do a full backup.
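When messages such as 6027-1988 or 6027-1996 are issued, the mount state can be checked with the mmlsmount command described earlier in this information. For example, with fs1 as a placeholder file system name:

  mmlsmount fs1 -L

lists the nodes on which the file system is currently mounted.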
6027-2008 For the logical volume specification -l lvName to be valid lvName must be the only logical volume in the volume group. However, volume group vgName contains logical volumes.
Explanation: The command is being run on a logical

6027-2014 Node node does not have access to disk physicalDiskName.
Explanation: The specified node is not able to access the specified disk.
User response: Choose a different node or disk (or both), and retry the command. If both the node and disk name are correct, make sure that the node has access to the disk.

6027-2015 Node node does not hold a reservation for disk physicalDiskName.
Explanation: The node on which this command is run does not have access to the disk.
User response: Run this command from another node that has access to the disk.

6027-2016 SSA fencing support is not present on this node.
Explanation: This node does not support SSA fencing.
User response: None.

6027-2017 Node ID nodeId is not a valid SSA node ID. SSA node IDs must be a number in the range of 1 to 128.
Explanation: You specified a node ID outside of the acceptable range.
User response: Choose a correct node ID and retry the command.

6027-2018 The SSA node id is not set.
Explanation: The SSA node ID has not been set.

6027-2022 Could not open disk physicalDiskName, errno value.
Explanation: The specified disk cannot be opened.
User response: Examine the errno value and other messages to determine the reason for the failure. Correct the problem and reissue the command.

6027-2023 retVal = value, errno = value for key value.
Explanation: An ioctl call failed with stated return code, errno value, and related values.
User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center.

6027-2024 ioctl failed with rc=returnCode, errno=errnoValue. Related values are scsi_status=scsiStatusValue, sense_key=senseKeyValue, scsi_asc=scsiAscValue, scsi_ascq=scsiAscqValue.
Explanation: An ioctl call failed with stated return code, errno value, and related values.
User response: Check the reported errno and correct the problem if possible. Otherwise, contact the IBM Support Center.
6027-2112 Permission failure. Option option requires root authority to run.
Explanation: The specified command option requires root authority.

6027-2119 [E] Recovery group name not found.
Explanation: The specified recovery group was not found.
User response: Correct the input and reissue the command.
6027-2134 Node nodeName cannot be used as an NSD server for Persistent Reserve disk diskName because it is not a Linux node.
Explanation: There was an attempt to enable Persistent Reserve for a disk, but not all of the NSD server nodes are running Linux.
User response: Correct the configuration and enter the command again.

6027-2135 All nodes in the cluster must be running AIX to enable Persistent Reserve for SAN attached disk diskName.
Explanation: There was an attempt to enable Persistent Reserve for a SAN-attached disk, but not all nodes in the cluster are running AIX.
User response: Correct the configuration and run the command again.

6027-2136 All NSD server nodes must be running AIX to enable Persistent Reserve for disk diskName.
Explanation: There was an attempt to enable Persistent Reserve for the specified disk, but not all NSD servers are running AIX.
User response: Correct the configuration and enter the command again.

6027-2137 An attempt to clear the Persistent Reserve reservations on disk diskName failed.
Explanation: You are importing a disk into a cluster in which Persistent Reserve is disabled. An attempt to clear the Persistent Reserve reservations on the disk failed.
User response: Correct the configuration and enter the command again.

6027-2138 The cluster must be running either all AIX or all Linux nodes to change Persistent Reserve disk diskName to a SAN-attached disk.
Explanation: There was an attempt to redefine a Persistent Reserve disk as a SAN attached disk, but not all nodes in the cluster were running either all AIX or all Linux nodes.
User response: Correct the configuration and enter the command again.

6027-2139 NSD server nodes must be running either all AIX or all Linux to enable Persistent Reserve for disk diskName.
Explanation: There was an attempt to enable Persistent Reserve for a disk, but not all NSD server nodes were running all AIX or all Linux nodes.
User response: Correct the configuration and enter the command again.

6027-2140 All NSD server nodes must be running AIX or all running Linux to enable Persistent Reserve for disk diskName.
Explanation: Attempt to enable Persistent Reserve for a disk while not all NSD server nodes are running AIX or all running Linux.
User response: Correct the configuration first.

6027-2141 Disk diskName is not configured as a regular hdisk.
Explanation: In an AIX only cluster, Persistent Reserve is supported for regular hdisks only.
User response: Correct the configuration and enter the command again.

6027-2142 Disk diskName is not configured as a regular generic disk.
Explanation: In a Linux only cluster, Persistent Reserve is supported for regular generic or device mapper virtual disks only.
User response: Correct the configuration and enter the command again.

6027-2143 Mount point mountPoint can not be part of automount directory automountDir.
Explanation: The mount point cannot be the parent directory of the automount directory.
User response: Specify a mount point that is not the parent of the automount directory.

6027-2144 [E] The lockName lock for file system fileSystem is busy.
Explanation: More than one process is attempting to obtain the specified lock.
User response: Repeat the command. If the problem persists, verify that there are no blocked processes.
6027-2145 [E] Internal remote command 'mmremote command' no longer supported.
Explanation: A GPFS administration command invoked an internal remote command which is no longer supported. Backward compatibility for remote commands is only supported for release 3.4 and newer.
User response: All nodes within the cluster must be at release 3.4 or newer. If all the cluster nodes meet this requirement, contact the IBM Support Center.

6027-2147 [E] BlockSize must be specified in disk descriptor.
Explanation: The blockSize positional parameter in a vdisk descriptor was empty. The bad disk descriptor is displayed following this message.
User response: Correct the input and reissue the command.

6027-2148 [E] nodeName is not a valid recovery group server for recoveryGroupName.
Explanation: The server name specified is not one of the defined recovery group servers.
User response: Correct the input and reissue the command.

6027-2149 [E] Could not get recovery group information from an active server.
Explanation: A command that needed recovery group information failed; the GPFS daemons may have become inactive or the recovery group is temporarily unavailable.
User response: Reissue the command.

6027-2150 The archive system client backupProgram could not be found or is not executable.
Explanation: TSM dsmc or other specified backup or archive system client could not be found.
User response: Verify that TSM is installed, dsmc can be found in the installation location or that the archiver client specified is executable.

6027-2151 The path directoryPath is not contained in the snapshot snapshotName.
Explanation: The directory path supplied is not contained in the snapshot named with the -S parameter.
User response: Correct the directory path or snapshot name supplied, or omit -S and the snapshot name in the command.

6027-2152 The path directoryPath containing image archives was not found.
Explanation: The directory path supplied does not contain the expected image files to archive into TSM.
User response: Correct the directory path name supplied.

6027-2153 The archiving system backupProgram exited with status return code. Image backup files have been preserved in globalWorkDir
Explanation: Archiving system executed and returned a non-zero exit status due to some error.
User response: Examine archiver log files to discern the cause of the archiver's failure. Archive the preserved image files from the indicated path.

6027-2154 Unable to create a policy file for image backup in policyFilePath.
Explanation: A temporary file could not be created in the global shared directory path.
User response: Check or correct the directory path name supplied.

6027-2155 File system fileSystem must be mounted read only for restore.
Explanation: The empty file system targeted for restoration must be mounted in read only mode during restoration.
User response: Unmount the file system on all nodes and remount it read only, then try the command again.

6027-2156 The image archive index ImagePath could not be found.
Explanation: The archive image index could not be found in the specified path.
User response: Check command arguments for correct specification of image path, then try the command again.

6027-2157 The image archive index ImagePath is corrupt or incomplete.
Explanation: The archive image index specified is damaged.
User response: Check the archive image index file for corruption and remedy.
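For message 6027-2155 the file system must be remounted read only everywhere before the restore is retried. A hedged example, assuming a file system named fs1:

  mmumount fs1 -a
  mmmount fs1 -o ro -a

The -a option applies the unmount and mount on all nodes; -o ro passes the read-only mount option.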
6027-2158 Disk usage must be dataOnly, metadataOnly, descOnly, dataAndMetadata, vdiskLog, vdiskLogTip, vdiskLogTipBackup, or vdiskLogReserved.
Explanation: The disk usage positional parameter in a vdisk descriptor has a value that is not valid. The bad disk descriptor is displayed following this message.
User response: Correct the input and reissue the command.

6027-2159 [E] parameter is not valid or missing in the vdisk descriptor.
Explanation: The vdisk descriptor is not valid. The bad descriptor is displayed following this message.
User response: Correct the input and reissue the command.

6027-2160 [E] Vdisk vdiskName is already mapped to NSD nsdName.
Explanation: The command cannot create the specified NSD because the underlying vdisk is already mapped to a different NSD.
User response: Correct the input and reissue the command.

6027-2161 [E] NAS servers cannot be specified when creating an NSD on a vdisk.
Explanation: The command cannot create the specified NSD because servers were specified and the underlying disk is a vdisk.
User response: Correct the input and reissue the command.

6027-2162 [E] Cannot set nsdRAIDTracks to zero; nodeName is a recovery group server.
Explanation: nsdRAIDTracks cannot be set to zero while the node is still a recovery group server.
User response: Modify or delete the recovery group and reissue the command.

6027-2163 [E] Vdisk name not found in the daemon. Recovery may be occurring. The disk will not be deleted.
Explanation: GPFS cannot find the specified vdisk. This can happen if recovery is taking place and the recovery group is temporarily inactive.
User response: Reissue the command. If the recovery group is damaged, specify the -p option.

6027-2164 [E] Disk descriptor for name refers to an existing pdisk.
Explanation: The specified pdisk already exists.
User response: Correct the command invocation and try again.

6027-2165 [E] Node nodeName cannot be used as a server of both vdisks and non-vdisk NSDs.
Explanation: The command specified an action that would have caused vdisks and non-vdisk NSDs to be defined on the same server. This is not a supported configuration.
User response: Correct the command invocation and try again.

6027-2166 [E] GPFS Native RAID is not configured.
Explanation: GPFS Native RAID is not configured on this node.
User response: Reissue the command on the appropriate node.

6027-2167 [E] Device deviceName does not exist or is not active on this node.
Explanation: The specified device does not exist or is not active on the node.
User response: Reissue the command on the appropriate node.

6027-2168 [E] The GPFS cluster must be shut down before downloading firmware to port cards.
Explanation: The GPFS daemon must be down on all nodes in the cluster before attempting to download firmware to a port card.
User response: Stop GPFS on all nodes and reissue the command.

6027-2169 Unable to disable Persistent Reserve on the following disks: diskList
Explanation: The command was unable to disable Persistent Reserve on the specified disks.
User response: Examine the disks and additional error information to determine if the disks should support Persistent Reserve. Correct the problem and reissue the command.
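Message 6027-2168 requires the GPFS daemon to be down cluster-wide before firmware is downloaded. The usual way to stop and later restart GPFS on all nodes is:

  mmshutdown -a
  mmstartup -a

with the firmware download performed between the two commands.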
6027-2170 [E] Recovery group recoveryGroupName does not exist or is not active.
Explanation: A command was issued to a recovery group that does not exist or is not in the active state.
User response: Reissue the command with a valid recovery group name or wait for the recovery group to become active.

6027-2171 [E] objectType objectName already exists in the cluster.
Explanation: The file system being imported contains an object with a name that conflicts with the name of an existing object in the cluster.
User response: If possible, remove the object with the conflicting name.

6027-2172 [E] Errors encountered while importing GPFS Native RAID objects.
Explanation: Errors were encountered while trying to import a GPFS Native RAID based file system. No file systems will be imported.
User response: Check the previous error messages and if possible, correct the problems.

6027-2173 [I] Use mmchrecoverygroup to assign and activate servers for the following recovery groups (automatically assigns NSD servers as well): recoveryGroupList
Explanation: The mmimportfs command imported the specified recovery groups. These must have servers assigned and activated.
User response: After the mmimportfs command finishes, use the mmchrecoverygroup command to assign NSD server nodes as needed.

6027-2176 [E] mmchattr for fileName failed.
Explanation: The command to change the attributes of the file failed.
User response: Check the previous error messages and correct the problems.

6027-2177 [E] Cannot create file fileName.
Explanation: The command to create the specified file failed.
User response: Check the previous error messages and correct the problems.

6027-2178 File fileName does not contain any NSD descriptors or stanzas.
Explanation: The input file should contain at least one NSD descriptor or stanza.
User response: Correct the input file and reissue the command.

6027-2181 [E] Failover is allowed only for single-writer, independent-writer filesets.
Explanation: The fileset AFM mode is not compatible with the requested operation.
User response: Check the previous error messages and correct the problems.

6027-2182 [E] Resync is allowed only for single-writer filesets.
Explanation: The fileset AFM mode is not compatible with the requested operation.
User response: Check the previous error messages and correct the problems.
6027-2199 [E] No enclosures were found.
Explanation: A command searched for disk enclosures but none were found.
User response: None.

6027-2200 [E] Cannot have multiple nodes updating firmware for the same enclosure. Enclosure serialNumber is already being updated by node nodeName.
Explanation: The mmchenclosure command was called with multiple nodes updating the same firmware.
User response: Correct the node list and reissue the command.

6027-2201 [E] The mmafmctl flushPending command completed with errors.
Explanation: An error occurred while flushing the queue.
User response: Examine the GPFS log to identify the cause.

6027-2202 [E] There is a SCSI-3 PR reservation on disk diskname. mmcrnsd cannot format the disk because the cluster is not configured as PR enabled.
Explanation: The specified disk has a SCSI-3 PR reservation, which prevents the mmcrnsd command from formatting it.
User response: Clear the PR reservation by following the instructions in “Clearing a leftover Persistent Reserve reservation” on page 139.

6027-2203 Node nodeName is not a gateway node.
Explanation: The specified node is not a gateway node.
User response: Designate the node as a gateway node or specify a different node on the command line.

6027-2204 AFM target map mapName is already defined.
Explanation: A request was made to create an AFM target map with the cited name, but that map name is already defined.
User response: Specify a different name for the new AFM target map or first delete the current map definition and then recreate it.

6027-2205 There are no AFM target map definitions.
Explanation: A command searched for AFM target map definitions but found none.
User response: None. Informational message only.

6027-2206 AFM target map mapName is not defined.
Explanation: The cited AFM target map name is not known to GPFS.
User response: Specify an AFM target map known to GPFS.

6027-2207 Node nodeName is being used as a gateway node for the AFM cluster clusterName.
Explanation: The specified node is defined as a gateway node for the specified AFM cluster.
User response: If you are trying to delete the node from the GPFS cluster or delete the gateway node role, you must remove it from the export server map.

6027-2208 [E] commandName is already running in the cluster.
Explanation: Only one instance of the specified command is allowed to run.
User response: None.

6027-2209 [E] Unable to list objectName on node nodeName.
Explanation: A command was unable to list the specific object that was requested.
User response: None.

6027-2210 [E] Unable to build a storage enclosure inventory file on node nodeName.
Explanation: A command was unable to build a storage enclosure inventory file. This is a temporary file that is required to complete the requested command.
User response: None.

6027-2211 [E] Error collecting firmware information on node nodeName.
Explanation: A command was unable to gather firmware information from the specified node.
User response: Ensure the node is active and retry the command.
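As an illustration of the user response for 6027-2203, a node can typically be designated as an AFM gateway node with the mmchnode command. This is a sketch only; the node name is a placeholder and the option should be verified against the mmchnode documentation for your release:
   mmchnode --gateway -N node1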
Examine the preceding messages and the GPFS log for additional details.
User response: Correct the problems and reissue the command.

6027-2225 [E] Peer snapshot successfully deleted at cache. The delete snapshot operation failed at home. Error code errorCode.
Explanation: For an active fileset, check the AFM target configuration for peer snapshots. Ensure there is at least one gateway node configured for the cluster. Examine the preceding messages and the GPFS log for additional details.
User response: Correct the problems and reissue the command.

6027-2226 [E] Invalid firmware update file.
Explanation: An invalid firmware update file was specified for the mmchfirmware command.
User response: Reissue the command with a valid update file.

6027-2227 [E] Failback is allowed only for independent-writer filesets.
Explanation: Failback operation is allowed only for independent-writer filesets.
User response: Check the fileset mode.

6027-2228 [E] The daemon version (daemonVersion) on node nodeName is lower than the daemon version (daemonVersion) on node nodeName.
Explanation: A command was issued that requires nodes to be at specific levels, but the affected GPFS servers are not at compatible levels to support this operation.
User response: Update the GPFS code on the specified servers and retry the command.

6027-2229 [E] Cache Eviction/Prefetch is not allowed for Primary and Secondary mode filesets.
Explanation: Cache eviction/prefetch is not allowed for primary and secondary mode filesets.
User response: None.

6027-2230 [E] afmTarget=newTargetString is not allowed. To change the AFM target, use mmafmctl failover with the --target-only option. For primary filesets, use mmafmctl changeSecondary.
Explanation: The mmchfileset command cannot be used to change the NFS server or IP address of the home cluster.
User response: To change the AFM target, use the mmafmctl failover command and specify the --target-only option. To change the AFM target for primary filesets, use the mmafmctl changeSecondary command.

6027-2231 [E] The specified block size blockSize is smaller than the system page size pageSize.
Explanation: The file system block size cannot be smaller than the system memory page size.
User response: Specify a block size greater than or equal to the system memory page size.

6027-2232 [E] Peer snapshots are allowed only for targets using the NFS protocol.
Explanation: The mmpsnap command can be used to create snapshots only for filesets that are configured to use the NFS protocol.
User response: Specify a valid fileset target.

6027-2233 [E] Fileset filesetName in file system filesystemName does not contain peer snapshot snapshotName. The delete snapshot operation failed at cache. Error code errorCode.
Explanation: The specified snapshot name was not found. The command expects the name of an existing peer snapshot of the active fileset in the specified file system.
User response: Reissue the command with a valid peer snapshot name.

6027-2234 [E] Use the mmafmctl converttoprimary command for converting to primary fileset.
Explanation: Converting to a primary fileset is not allowed directly.
User response: Check the previous error messages and correct the problems.

6027-2235 [E] Only independent filesets can be converted to secondary filesets.
Explanation: Converting to secondary filesets is allowed only for independent filesets.
User response: None.
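As a sketch of the user response for 6027-2230, an AFM target change might look like the following; fs1, fileset1, and the target path are placeholders, and the option syntax should be confirmed with the mmafmctl documentation for your release:
   mmafmctl fs1 failover -j fileset1 --new-target nfs://homeserver/gpfs/home/fileset1 --target-only
For a primary fileset, the equivalent change is made with the mmafmctl changeSecondary subcommand instead.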
6027-2236 [E] The CPU architecture on this node does not support tracing in traceMode mode. Switching to traceMode mode.
Explanation: The CPU does not have constant time stamp counter capability, which is required for overwrite trace mode. The trace has been enabled in blocking mode.
User response: Update the configuration parameters to use the trace facility in blocking mode or replace this node with modern CPU architecture.

6027-2237 [W] An image backup made from the live file system may not be usable for image restore. Specify a valid global snapshot for image backup.
Explanation: The mmimgbackup command should always be used with a global snapshot to make a consistent image backup of the file system.
User response: Correct the command invocation to include the -S option to specify either a global snapshot name or a directory path that includes the snapshot root directory for the file system and a valid global snapshot name.

6027-2238 [E] Use the mmafmctl convertToSecondary command for converting to secondary.
Explanation: Converting to secondary is allowed by using the mmafmctl convertToSecondary command.
User response: None.

6027-2239 [E] Drive serialNumber serialNumber is being managed by server nodeName. Reissue the mmchfirmware command for server nodeName.
Explanation: The mmchfirmware command was issued to update a specific disk drive which is not currently being managed by this node.
User response: Reissue the command specifying the active server.

6027-2240 [E] Option is not supported for a secondary fileset.
Explanation: This option cannot be set for a secondary fileset.
User response: None.

6027-2241 [E] Node nodeName is not a CES node.
Explanation: A Cluster Export Service command specified a node that is not defined as a CES node.

6027-2242 [E] Error in configuration file.
Explanation: The mmnfs export load loadCfgFile command found an error in the NFS configuration files.
User response: Correct the configuration file error.

6027-2245 [E] To change the AFM target, use mmafmctl changeSecondary for the primary.
Explanation: Failover with the targetonly option can be run on a primary fileset.
User response: None.

6027-2246 [E] Timeout executing function: functionName (return code=returnCode).
Explanation: The executeCommandWithTimeout function was called but it timed out.
User response: Correct the problem and issue the command again.

6027-2247 [E] Creation of exchangeDir failed.
Explanation: A Cluster Export Service command was unable to create the CCR exchange directory.
User response: Correct the problem and issue the command again.

6027-2248 [E] CCR command failed: command
Explanation: A CCR update command failed.
User response: Correct the problem and issue the command again.

6027-2249 [E] Error getting next nextName from CCR.
Explanation: An expected value from CCR was not obtained.
User response: Issue the command again.

6027-2250 [E] Error putting next nextName to CCR, new ID: newExpid version: version
Explanation: A CCR value update failed.
User response: Issue the command again.

6027-2251 [E] Error retrieving configuration file: configFile
Explanation: Error retrieving configuration file from CCR.
User response: Issue the command again.
6027-2252 [E] Error reading export configuration file (return code: returnCode).
Explanation: A CES command was unable to read the export configuration file.
User response: Correct the problem and issue the command again.

6027-2253 [E] Error creating the internal export data objects (return code returnCode).
Explanation: A CES command was unable to create an export data object.
User response: Correct the problem and issue the command again.

6027-2254 [E] Error creating single export output, export exportPath not found (return code returnCode).
Explanation: A CES command was unable to create a single export print output.
User response: Correct the problem and reissue the command.

6027-2255 [E] Error creating export output (return code: returnCode).
Explanation: A CES command was unable to create the export print output.
User response: Correct the problem and issue the command again.

6027-2256 [E] Error creating the internal export output file string array (return code: returnCode).
Explanation: A CES command was unable to create the array for print output.
User response: Correct the problem and issue the command again.

6027-2257 [E] Error deleting export, export exportPath not found (return code: returnCode).
Explanation: A CES command was unable to delete an export. The exportPath was not found.
User response: Correct the problem and issue the command again.

6027-2258 [E] Error writing export configuration file to CCR (return code: returnCode).
Explanation: A CES command was unable to write configuration file to CCR.
User response: Correct the problem and issue the command again.

6027-2259 [E] The path exportPath to create the export does not exist (return code: returnCode).
Explanation: A CES command was unable to create an export because the path does not exist.
User response: Correct the problem and issue the command again.

6027-2260 [E] The path exportPath to create the export is invalid (return code: returnCode).
Explanation: A CES command was unable to create an export because the path is invalid.
User response: Correct the problem and issue the command again.

6027-2261 [E] Error creating new export object, invalid data entered (return code: returnCode).
Explanation: A CES command was unable to add an export because the input data is invalid.
User response: Correct the problem and issue the command again.

6027-2262 [E] Error creating new export object; getting new export ID (return code: returnCode).
Explanation: A CES command was unable to add an export. A new export ID was not obtained.
User response: Correct the problem and issue the command again.

6027-2263 [E] Error adding export; new export path exportPath already exists.
Explanation: A CES command was unable to add an export because the path already exists.
User response: Correct the problem and issue the command again.

6027-2264 [E] The --servers option is only used to provide names for primary and backup server configurations. Provide a maximum of two server names.
Explanation: An input node list has too many nodes specified.
User response: Verify the list of nodes and shorten the list to the supported number.

6027-2265 [E] Cannot convert fileset to secondary fileset.
Explanation: Fileset cannot be converted to a secondary fileset.
User response: None.
6027-2266 [E] The snapshot names that start with psnap-rpo or psnap0-rpo are reserved for RPO.
Explanation: The specified snapshot name starts with psnap-rpo or psnap0-rpo, which are reserved for RPO snapshots.
User response: Use a different snapshot name for the mmcrsnapshot command.

6027-2267 [I] Fileset filesetName in file system fileSystem is either unlinked or being deleted. Home delete-snapshot operation was not queued.
Explanation: The command expects that the peer snapshot at home is not deleted because the fileset at cache is either unlinked or being deleted.
User response: Delete the snapshot at home manually.

6027-2268 [E] This is already a secondary fileset.
Explanation: The fileset is already a secondary fileset.
User response: None.

6027-2269 [E] Adapter adapterIdentifier was not found.
Explanation: The specified adapter was not found.
User response: Specify an existing adapter and reissue the command.

6027-2270 [E] Error errno updating firmware for adapter adapterIdentifier.
Explanation: The firmware load failed for the specified adapter.
User response: None.

6027-2271 [E] Error locating the reference client IP ipAddress, return code: returnCode
Explanation: The reference IP address for reordering a client could not be found for the given export path.
User response: Correct the problem and try again.

6027-2272 [E] Error removing the requested IP address ipAddress from a client declaration, return code: returnCode
Explanation: One of the specified IP addresses to remove could not be found in any client declaration for the given export path.
User response: Correct the problem and try again.

6027-2273 [E] Error adding the requested IP address ipAddress to a client declaration, return code: returnCode
Explanation: One of the specified IP addresses to add could not be applied for the given export path.
User response: Correct the problem and try again.

6027-2274 [E] Error changing the requested IP address ipAddress of a client declaration, return code: returnCode
Explanation: The client change could not be applied for the given export path.
User response: Correct the problem and try again.

6027-2275 [E] Unable to determine the status of DASD device dasdDevice
Explanation: The dasdview command failed.
User response: Examine the preceding messages, correct the problem, and reissue the command.

6027-2276 [E] The specified DASD device dasdDevice is not properly formatted. It is not an ECKD-type device, or it has a format other than CDL or LDL, or it has a block size other than 4096.
Explanation: The specified device is not properly formatted.
User response: Correct the problem and reissue the command.

6027-2277 [E] Unable to determine if DASD device dasdDevice is partitioned.
Explanation: The fdasd command failed.
User response: Examine the preceding messages, correct the problem, and reissue the command.

6027-2278 [E] Cannot partition DASD device dasdDevice; it is already partitioned.
Explanation: The specified DASD device is already partitioned.
User response: Remove the existing partitions, or reissue the command using the desired partition name.

6027-2279 [E] Unable to partition DASD device dasdDevice
Explanation: The fdasd command failed.
User response: Examine the preceding messages, correct the problem, and reissue the command.
6027-2280 [E] The DASD device with bus ID busID cannot be found or it is in use.
Explanation: The chccwdev command failed.
User response: Examine the preceding messages, correct the problem, and reissue the command.

6027-2281 [E] Error errno updating firmware for enclosure enclosureIdentifier.
Explanation: The firmware load failed for the specified enclosure.
User response: None.

6027-2282 [E] Action action is not allowed for secondary filesets.
Explanation: The specified action is not allowed for secondary filesets.
User response: None.

6027-2283 [E] Node nodeName is already a CES node.
Explanation: An mmchnode command attempted to enable CES services on a node that is already part of the CES cluster.
User response: Reissue the command specifying a node that is not a CES node.

6027-2284 [E] The fileset afmshowhomesnapshot value is 'yes'. The fileset mode cannot be changed.
Explanation: The fileset afmshowhomesnapshot attribute value is yes. The fileset mode change is not allowed.
User response: First change the attribute afmshowhomesnapshot value to no, and then issue the command again to change the mode.

6027-2285 [E] Deletion of initial snapshot snapshotName of fileset filesetName in file system fileSystem failed. The delete fileset operation failed at cache. Error code errorCode.
Explanation: The deletion of the initial snapshot psnap0 of filesetName failed. The primary and secondary filesets cannot be deleted without deleting the initial snapshot.
User response: None.

6027-2286 [E] RPO peer snapshots using mmpsnap are allowed only for primary filesets.
Explanation: RPO snapshots can be created only for primary filesets.
User response: Reissue the command with a valid primary fileset or without the --rpo option.

6027-2287 The fileset needs to be linked to change afmShowHomeSnapshot to 'no'.
Explanation: The afmShowHomeSnapshot value cannot be changed to no if the fileset is unlinked.
User response: Link the fileset and reissue the command.

6027-2288 [E] Option optionName is not supported for AFM filesets.
Explanation: IAM modes are not supported for AFM filesets.
User response: None.

6027-2289 [E] Peer snapshot creation failed while running subCommand. Error code errorCode
Explanation: For an active fileset, check the AFM target configuration for peer snapshots. Ensure there is at least one gateway node configured for the cluster. Examine the preceding messages and the GPFS log for additional details.
User response: Correct the problems and reissue the command.

6027-2290 [E] The comment string should be less than 50 characters long.
Explanation: The comment/prefix string of the snapshot is longer than 50 characters.
User response: Reduce the comment string size and reissue the command.

6027-2291 [E] Peer snapshot creation failed while generating snapshot name. Error code errorCode
Explanation: For an active fileset, check the AFM target configuration for peer snapshots. Ensure there is at least one gateway node configured for the cluster. Examine the preceding messages and the GPFS log for additional details.
User response: Correct the problems and reissue the command.
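As a sketch of the user response for 6027-2287 (fs1, fileset1, and the junction path are placeholders; confirm the AFM attribute syntax with the mmchfileset and mmlinkfileset documentation for your release):
   mmlinkfileset fs1 fileset1 -J /gpfs/fs1/fileset1
   mmchfileset fs1 fileset1 -p afmShowHomeSnapshot=no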
6027-2293 [E] The peer snapshot creation failed because fileset filesetName is in filesetState state.
Explanation: For an active fileset, check the AFM target configuration for peer snapshots. Ensure there is at least one gateway node configured for the cluster. Examine the preceding messages and the GPFS log for additional details.
User response: None. The fileset needs to be in active or dirty state.

6027-2294 [E] Removing older peer snapshots failed while obtaining snap IDs. Error code errorCode
Explanation: Ensure the fileset exists. Examine the preceding messages and the GPFS log for additional details.
User response: Verify that snapshots exist for the given fileset.

6027-2295 [E] Removing older peer snapshots failed while obtaining old snap IDs. Error code errorCode
Explanation: Ensure the fileset exists. Examine the preceding messages and the GPFS log for additional details.
User response: Verify that snapshots exist for the given fileset.

6027-2296 [E] Need a target to convert to the primary fileset.
Explanation: Need a target to convert to the primary fileset.
User response: Specify a target to convert to the primary fileset.

6027-2297 [E] The check-metadata and nocheck-metadata options are not supported for a non-AFM fileset.
Explanation: The check-metadata and nocheck-metadata options are not supported for a non-AFM fileset.

6027-2299 [E] Issue the mmafmctl getstate command to check fileset state and if required issue mmafmctl convertToPrimary.
Explanation: Issue the mmafmctl getstate command to check fileset state and if required issue mmafmctl convertToPrimary.
User response: Issue the mmafmctl getstate command to check fileset state and if required issue mmafmctl convertToPrimary.

6027-2300 [E] The check-metadata and nocheck-metadata options are not supported for the primary fileset.
Explanation: The check-metadata and nocheck-metadata options are not supported for the primary fileset.
User response: None.

6027-2301 [E] The inband option is not supported for the primary fileset.
Explanation: The inband option is not supported for the primary fileset.
User response: None.

6027-2302 [E] AFM target cannot be changed for the primary fileset.
Explanation: AFM target cannot be changed for the primary fileset.
User response: None.

6027-2303 [E] The inband option is not supported for an AFM fileset.
Explanation: The inband option is not supported for an AFM fileset.
User response: None.
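For example, the checks suggested in 6027-2299 might look like the following; fs1 and fileset1 are placeholders and the subcommand syntax should be verified against the mmafmctl documentation for your release:
   mmafmctl fs1 getstate -j fileset1
   mmafmctl fs1 convertToPrimary -j fileset1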
6027-2306 [E] Failed to check for cached files while doing primary conversion from filesetMode mode.
Explanation: Failed to check for cached files while doing primary conversion.
User response: None.

6027-2307 [E] Uncached files present, run prefetch first.
Explanation: Uncached files present.
User response: Run prefetch and then do the conversion.

6027-2308 [E] Uncached files present, run prefetch first using policy output: nodeDirFileOut.
Explanation: Uncached files present.
User response: Run prefetch first using policy output.

6027-2309 [E] Conversion to primary not allowed for filesetMode mode.
Explanation: Conversion to primary not allowed for this mode.
User response: None.

6027-2310 [E] This option is available only for a primary fileset.
Explanation: This option is available only for a primary fileset.
User response: None.

6027-2311 [E] The target-only option is not allowed for a promoted primary without a target.
Explanation: The target-only option is not allowed for a promoted primary without a target.

6027-2314 [E] Could not run commandName. Verify that the Object protocol was installed.
Explanation: The mmcesobjlscfg command cannot find a prerequisite command on the system.
User response: Install the missing command and try again.

6027-2315 [E] Could not determine CCR file for service serviceName
Explanation: For the given service name, there is not a corresponding file in the CCR.
User response: None.

6027-2316 [E] Unable to retrieve file fileName from CCR using command command. Verify that the Object protocol is correctly installed.
Explanation: There was an error downloading a file from the CCR repository.
User response: Correct the error and try again.

6027-2317 [E] Unable to parse version number of file fileName from mmccr output
Explanation: The current version should be printed by mmccr when a file is extracted. The command could not read the version number from the output and failed.
User response: Investigate the failure in the CCR and fix the problem.

6027-2318 [E] Could not put localFilePath into the CCR as ccrName
Explanation: There was an error when trying to do an fput of a file into the CCR.
User response: Investigate the error and fix the problem.
6027-2321 [E] AFM primary or secondary filesets cannot be created for file system fileSystem because version is less than supportedVersion.
Explanation: The AFM primary or secondary filesets are not supported for a file system version that is less than 14.20.

Explanation: The node could not enable the CES OBJ service because of a missing binary or configuration file.
User response: Install the required software and retry the command.

6027-2323 [E] The OBJ service cannot be enabled because the number of CES IPs below the minimum of minValue expected.
Explanation: The value of CES IPs was below the minimum.
User response: Add at least minValue CES IPs to the cluster.

6027-2324 [E] The object store for serviceName is either not a GPFS type or mountPoint does not exist.
Explanation: The object store is not available at this time.
User response: Verify that serviceName is a GPFS type.

6027-2327 The snapshot snapshotName is the wrong scope for use in targetType backup
Explanation: The snapshot specified is the wrong scope.
User response: Please provide a valid snapshot name for this backup type.

6027-2330 [E] The outband option is not supported for AFM filesets.
Explanation: The outband option is not supported for AFM filesets.
User response: None.

6027-2331 [E] CCR value ccrValue not defined. The OBJ service cannot be enabled if identity authentication is not configured.
Explanation: Object authentication type was not found.
User response: Configure identity authentication and try again.
6027-2332 [E] Only regular independent filesets are converted to secondary filesets.
Explanation: Only regular independent filesets can be converted to secondary filesets.
User response: Specify a regular independent fileset and run the command again.

6027-2333 [E] Failed to disable serviceName service. Ensure authType authentication is removed.
Explanation: Disable CES service failed because authentication was not removed.
User response: Remove authentication and retry.

6027-2334 [E] Fileset indFileset cannot be changed because it has a dependent fileset depFileset
Explanation: Filesets with dependent filesets cannot be converted to primary or secondary.
User response: This operation cannot proceed until all the dependent filesets are unlinked.

6027-2335 [E] Failed to convert fileset, because the policy to detect special files is failing.
Explanation: The policy to detect special files is failing.
User response: Retry the command later.

6027-2336 [E] Immutable/append-only files or clones copied from a snapshot are present, hence conversion is disallowed
Explanation: Conversion is disallowed if immutable/append-only files or clones copied from a snapshot are present.
User response: Files should not be immutable/append-only.

6027-2337 [E] Conversion to primary is not allowed at this time. Retry the command later.
Explanation: Conversion to primary is not allowed at this time.
User response: Retry the command later.

6027-2338 [E] Conversion to primary is not allowed because the state of the fileset is filesetState.
Explanation: Conversion to primary is not allowed with the current state of the fileset.
User response: Retry the command later.

6027-2339 [E] Orphans are present, run prefetch first.
Explanation: Orphans are present.
User response: Run prefetch on the fileset and then do the conversion.

6027-2340 [E] Fileset was left in PrimInitFail state. Take the necessary actions.
Explanation: The fileset was left in PrimInitFail state.
User response: Take the necessary actions.

6027-2341 [E] This operation can be done only on a primary fileset
Explanation: This is not a primary fileset.
User response: None.

6027-2342 [E] Failover/resync is currently running so conversion is not allowed
Explanation: Failover/resync is currently running so conversion is not allowed.
User response: Retry the command later after failover/resync completes.

6027-2343 [E] DR Setup cannot be done on a fileset with mode filesetMode.
Explanation: Setup cannot be done on a fileset with this mode.
User response: None.

6027-2344 [E] The GPFS daemon must be active on the node from which the mmcmd is executed with option --inode-criteria or -o.
Explanation: The GPFS daemon needs to be active on the node where the command is issued with --inode-criteria or -o options.
User response: Run the command where the daemon is active.

6027-2345 [E] The provided snapshot name must be unique to list filesets in a specific snapshot
Explanation: The mmlsfileset command received a snapshot name that is not unique.
User response: Correct the command invocation or remove the duplicate named snapshots and try again.
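For example, before reissuing a command that fails with 6027-2344, the daemon state on each node can be checked so the command is run where GPFS is active (shown as an illustration only):
   mmgetstate -a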
6027-2346 [E] The local node is not a CES node.
Explanation: A local Cluster Export Service command was invoked on a node that is not defined as a Cluster Export Service node.
User response: Reissue the command on a CES node.

6027-2347 [E] Error changing export, export exportPath not found.
Explanation: A CES command was unable to change an export. The exportPath was not found.
User response: Correct the problem and issue the command again.

6027-2348 [E] A device for directoryName does not exist or is not active on this node.
Explanation: The device containing the specified directory does not exist or is not active on the node.
User response: Reissue the command with a correct directory or on an appropriate node.

6027-2349 [E] The fileset for junctionName does not exist in the targetType specified.
Explanation: The fileset to back up cannot be found in the file system or snapshot specified.
User response: Reissue the command with a correct name for the fileset, snapshot, or file system.

6027-2350 [E] The fileset for junctionName is not linked in the targetType specified.
Explanation: The fileset to back up is not linked in the file system or snapshot specified.
User response: Relink the fileset in the file system. Optionally create a snapshot and reissue the command with a correct name for the fileset, snapshot, and file system.

6027-2351 [E] One or more unlinked filesets (filesetNames) exist in the targetType specified. Check your filesets and try again.
Explanation: The file system to back up contains one or more filesets that are unlinked in the file system or snapshot specified.
User response: Relink the fileset in the file system. Optionally create a snapshot and reissue the command with a correct name for the fileset, snapshot, and file system.

6027-2352 The snapshot snapshotName could not be found for use by commandName
Explanation: The snapshot specified could not be located.
User response: Please provide a valid snapshot name.

6027-2353 [E] The snapshot name cannot be generated.
Explanation: The snapshot name cannot be generated.
User response: None.

6027-2354 Node nodeName must be disabled as a CES node before trying to remove it from the GPFS cluster.
Explanation: The specified node is defined as a CES node.
User response: Disable the CES node and try again.

6027-2355 [E] Unable to reload moduleName. Node hostname should be rebooted.
Explanation: Host adapter firmware was updated so the specified module needs to be unloaded and reloaded. Linux does not display the new firmware level until the module is reloaded.
User response: Reboot the node.

6027-2356 [E] Node nodeName is being used as a recovery group server.
Explanation: The specified node is defined as a server node for some disk.
User response: If you are trying to delete the node from the GPFS cluster, you must either delete the disk or define another node as its server.

6027-2357 [E] Root fileset cannot be converted to primary fileset.
Explanation: Root fileset cannot be converted to the primary fileset.
User response: None.

6027-2358 [E] Root fileset cannot be converted to secondary fileset.
Explanation: Root fileset cannot be converted to the secondary fileset.
User response: None.
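A sketch of the sequence implied by 6027-2354; the node name is a placeholder and the options should be verified against the mmchnode and mmdelnode documentation for your release:
   mmchnode --ces-disable -N node3
   mmdelnode -N node3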
6027-2359 [I] Attention: command is now enabled. This attribute can no longer be modified.
Explanation: Indefinite retention protection is enabled. This value cannot be changed in the future.
User response: None.

6027-2360 [E] The current value of command is attrName. This value cannot be changed.
Explanation: Indefinite retention protection is enabled for this cluster and this attribute cannot be changed.
User response: None.

6027-2361 [E] command is enabled. File systems cannot be deleted.
Explanation: When indefinite retention protection is enabled the file systems cannot be deleted.
User response: None.

6027-2362 [E] The current value of command is attrName. No changes made.
Explanation: The current value and the request value are the same. No changes made.
User response: None.

6027-2500 mmsanrepairfs already in progress for "name"
Explanation: This is an output from mmsanrepairfs when another mmsanrepairfs command is already running.
User response: Wait for the currently running command to complete and reissue the command.

6027-2501 Could not allocate storage.
Explanation: Sufficient memory could not be allocated to run the mmsanrepairfs command.
User response: Increase the amount of memory available.

6027-2576 [E] Error: Daemon value kernel value PAGE_SIZE mismatch.
Explanation: The GPFS kernel extension loaded in memory does not have the same PAGE_SIZE value as the GPFS daemon PAGE_SIZE value that was returned from the POSIX sysconf API.
User response: Verify that the kernel header files used to build the GPFS portability layer are the same kernel header files used to build the running kernel.

6027-2600 Cannot create a new snapshot until an existing one is deleted. File system fileSystem has a limit of number online snapshots.
Explanation: The file system has reached its limit of online snapshots.
User response: Delete an existing snapshot, then issue the create snapshot command again.

6027-2601 Snapshot name dirName already exists.
Explanation: This message is issued by the tscrsnapshot command.
User response: Delete existing file/directory and reissue the command.

6027-2602 Unable to delete snapshot snapshotName from file system fileSystem. rc=returnCode.
Explanation: This message is issued by the tscrsnapshot command.
User response: Delete the snapshot using the tsdelsnapshot command.

6027-2603 Unable to get permission to create snapshot, rc=returnCode.
Explanation: This message is issued by the tscrsnapshot command.
User response: Reissue the command.

6027-2604 Unable to quiesce all nodes, rc=returnCode.
Explanation: This message is issued by the tscrsnapshot command.
User response: Restart failing nodes or switches and reissue the command.

6027-2605 Unable to resume all nodes, rc=returnCode.
Explanation: This message is issued by the tscrsnapshot command.
User response: Restart failing nodes or switches.

6027-2606 Unable to sync all nodes, rc=returnCode.
Explanation: This message is issued by the tscrsnapshot command.
User response: Restart failing nodes or switches and reissue the command.
6027-2607 Cannot create new snapshot until an existing one is deleted. Fileset filesetName has a limit of number snapshots.
Explanation: The fileset has reached its limit of snapshots.
User response: Delete an existing snapshot, then issue the create snapshot command again.

6027-2608 Cannot create new snapshot: state of fileset filesetName is inconsistent (badState).
Explanation: An operation on the cited fileset is incomplete.
User response: Complete pending fileset actions, then issue the create snapshot command again.

6027-2609 Fileset named filesetName does not exist.
Explanation: One of the filesets listed does not exist.
User response: Specify only existing fileset names.

6027-2610 File system fileSystem does not contain snapshot snapshotName err = number.
Explanation: An incorrect snapshot name was specified.
User response: Select a valid snapshot and issue the command again.

6027-2611 Cannot delete snapshot snapshotName which is in state snapshotState.
Explanation: The snapshot cannot be deleted while it is in the cited transition state because of an in-progress snapshot operation.
User response: Wait for the in-progress operation to complete and then reissue the command.

6027-2612 Snapshot named snapshotName does not exist.
Explanation: A snapshot to be listed does not exist.
User response: Specify only existing snapshot names.

6027-2613 Cannot restore snapshot. fileSystem is mounted on number node(s) and in use on number node(s).
Explanation: This message is issued by the tsressnapshot command.
User response: Unmount the file system and reissue the restore command.

6027-2614 File system fileSystem does not contain snapshot snapshotName err = number.
Explanation: An incorrect snapshot name was specified.
User response: Specify a valid snapshot and issue the command again.

6027-2615 Cannot restore snapshot snapshotName which is snapshotState, err = number.
Explanation: The specified snapshot is not in a valid state.
User response: Specify a snapshot that is in a valid state and issue the command again.

6027-2616 Restoring snapshot snapshotName requires quotaTypes quotas to be enabled.
Explanation: The snapshot being restored requires quotas to be enabled, since they were enabled when the snapshot was created.
User response: Issue the recommended mmchfs command to enable quotas.

6027-2617 You must run: mmchfs fileSystem -Q yes.
Explanation: The snapshot being restored requires quotas to be enabled, since they were enabled when the snapshot was created.
User response: Issue the cited mmchfs command to enable quotas.

6027-2618 [N] Restoring snapshot snapshotName in file system fileSystem requires quotaTypes quotas to be enabled.
Explanation: The snapshot being restored in the cited file system requires quotas to be enabled, since they were enabled when the snapshot was created.
User response: Issue the mmchfs command to enable quotas.

6027-2619 Restoring snapshot snapshotName requires quotaTypes quotas to be disabled.
Explanation: The snapshot being restored requires quotas to be disabled, since they were not enabled when the snapshot was created.
User response: Issue the cited mmchfs command to disable quotas.
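For example, for messages 6027-2616 through 6027-2618 quotas are enabled with the command cited in the message text; fs1 is a placeholder for the file system name:
   mmchfs fs1 -Q yes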
6027-2620 You must run: mmchfs fileSystem -Q no.
Explanation: The snapshot being restored requires quotas to be disabled, since they were not enabled when the snapshot was created.
User response: Issue the cited mmchfs command to disable quotas.

6027-2621 [N] Restoring snapshot snapshotName in file system fileSystem requires quotaTypes quotas to be disabled.
Explanation: The snapshot being restored in the cited file system requires quotas to be disabled, since they were disabled when the snapshot was created.
User response: Issue the mmchfs command to disable quotas.

6027-2623 [E] Error deleting snapshot snapshotName in file system fileSystem err number
Explanation: The cited snapshot could not be deleted during file system recovery.
User response: Run the mmfsck command to recover any lost data blocks.

6027-2624 Previous snapshot snapshotName is not valid and must be deleted before a new snapshot may be created.
Explanation: The cited previous snapshot is not valid and must be deleted before a new snapshot may be created.
User response: Delete the previous snapshot using the mmdelsnapshot command, and then reissue the original snapshot command.

6027-2625 Previous snapshot snapshotName must be restored before a new snapshot may be created.
Explanation: The cited previous snapshot must be restored before a new snapshot may be created.
User response: Run mmrestorefs on the previous snapshot, and then reissue the original snapshot command.

6027-2626 Previous snapshot snapshotName is not valid and must be deleted before another snapshot may be deleted.
Explanation: The cited previous snapshot is not valid and must be deleted before another snapshot may be deleted.
User response: Delete the previous snapshot using the mmdelsnapshot command, and then reissue the original snapshot command.

6027-2627 Previous snapshot snapshotName is not valid and must be deleted before another snapshot may be restored.
Explanation: The cited previous snapshot is not valid and must be deleted before another snapshot may be restored.
User response: Delete the previous snapshot using the mmdelsnapshot command, and then reissue the original snapshot command.

6027-2628 More than one snapshot is marked for restore.
Explanation: More than one snapshot is marked for restore.
User response: Restore the previous snapshot and then reissue the original snapshot command.

6027-2629 Offline snapshot being restored.
Explanation: An offline snapshot is being restored.
User response: When the restore of the offline snapshot completes, reissue the original snapshot command.

6027-2630 Program failed, error number.
Explanation: The tssnaplatest command encountered an error and printErrnoMsg failed.
User response: Correct the problem shown and reissue the command.

6027-2631 Attention: Snapshot snapshotName was being restored to fileSystem.
Explanation: A file system in the process of a snapshot restore cannot be mounted except under a restricted mount.
User response: None. Informational message only.

6027-2633 Attention: Disk configuration for fileSystem has changed while tsdf was running.
Explanation: The disk configuration for the cited file system changed while the tsdf command was running.
User response: Reissue the mmdf command.

6027-2634 Attention: number of number regions in fileSystem were unavailable for free space.
Explanation: Some regions could not be accessed during the tsdf run. Typically, this is due to utilities such as mmdefragfs or mmfsck running concurrently.
User response: Reissue the mmdf command.
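For example, the delete-then-reissue sequence suggested for 6027-2624 through 6027-2627 might look like the following; fs1 and the snapshot names are placeholders:
   mmdelsnapshot fs1 snap_old
   mmcrsnapshot fs1 snap_new
For the restore case (6027-2625), the previous snapshot is first restored with mmrestorefs fs1 snapshotName before the original command is reissued.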
6027-2635 The free space data is not available. Reissue the command without the -q option to collect it.
Explanation: The existing free space information for the file system is currently unavailable.
User response: Reissue the mmdf command.

6027-2636 Disks in storage pool storagePool must have disk usage type dataOnly.
Explanation: A non-system storage pool cannot hold metadata or descriptors.
User response: Modify the command's disk descriptors and reissue the command.

6027-2637 The file system must contain at least one disk for metadata.
Explanation: The disk descriptors for this command must include one and only one storage pool that is allowed to contain metadata.
User response: Modify the command's disk descriptors and reissue the command.

6027-2638 Maximum of number storage pools allowed.
Explanation: The cited limit on the number of storage pools that may be defined has been exceeded.
User response: Modify the command's disk descriptors and reissue the command.

6027-2639 Incorrect fileset name filesetName.
Explanation: The fileset name provided in the command invocation is incorrect.
User response: Correct the fileset name and reissue the command.

6027-2640 Incorrect path to fileset junction filesetJunction.
Explanation: The path to the cited fileset junction is incorrect.
User response: Correct the junction path and reissue the command.

6027-2641 Incorrect fileset junction name filesetJunction.
Explanation: The cited junction name is incorrect.
User response: Correct the junction name and reissue the command.

6027-2642 Specify one and only one of FilesetName or -J JunctionPath.
Explanation: The change fileset and unlink fileset commands accept either a fileset name or the fileset's junction path to uniquely identify the fileset. The user failed to provide either of these, or has tried to provide both.
User response: Correct the command invocation and reissue the command.

6027-2643 Cannot create a new fileset until an existing one is deleted. File system fileSystem has a limit of maxNumber filesets.
Explanation: An attempt to create a fileset for the cited file system failed because it would exceed the cited limit.
User response: Remove unneeded filesets and reissue the command.

6027-2644 Comment exceeds maximum length of maxNumber characters.
Explanation: The user-provided comment for the new fileset exceeds the maximum allowed length.
User response: Shorten the comment and reissue the command.

6027-2645 Fileset filesetName already exists.
Explanation: An attempt to create a fileset failed because the specified fileset name already exists.
User response: Select a unique name for the fileset and reissue the command.

6027-2646 Unable to sync all nodes while quiesced, rc=returnCode
Explanation: This message is issued by the tscrsnapshot command.
User response: Restart failing nodes or switches and reissue the command.

6027-2647 Fileset filesetName must be unlinked to be deleted.
Explanation: The cited fileset must be unlinked before it can be deleted.
User response: Unlink the fileset, and then reissue the delete command.
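For example, the unlink-then-delete sequence required by 6027-2647 might look like the following; fs1 and fset1 are placeholders:
   mmunlinkfileset fs1 fset1
   mmdelfileset fs1 fset1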
6027-2648 Filesets have not been enabled for file system fileSystem.
Explanation: The current file system format version does not support filesets.
User response: Change the file system format version by issuing mmchfs -V.

6027-2649 Fileset filesetName contains user files and cannot be deleted unless the -f option is specified.
Explanation: An attempt was made to delete a non-empty fileset.
User response: Remove all files and directories from the fileset, or specify the -f option to the mmdelfileset command.

6027-2650 Fileset information is not available.
Explanation: A fileset command failed to read file system metadata file. The file system may be corrupted.
User response: Run the mmfsck command to recover the file system.

6027-2651 Fileset filesetName cannot be unlinked.
Explanation: The user tried to unlink the root fileset, or is not authorized to unlink the selected fileset.
User response: None. The fileset cannot be unlinked.

6027-2653 Failed to unlink fileset filesetName from filesetName.
Explanation: An attempt was made to unlink a fileset that is linked to a parent fileset that is being deleted.
User response: Delete or unlink the children, and then delete the parent fileset.

6027-2654 Fileset filesetName cannot be deleted while other filesets are linked to it.
Explanation: The fileset to be deleted has other filesets linked to it, and cannot be deleted without using the -f flag, or unlinking the child filesets.
User response: Delete or unlink the children, and then delete the parent fileset.

6027-2655 Fileset filesetName cannot be deleted.
Explanation: The user is not allowed to delete the root fileset.
User response: None. The fileset cannot be deleted.

6027-2656 Unable to quiesce fileset at all nodes.
Explanation: An attempt to quiesce the fileset at all nodes failed.
User response: Check communication hardware and reissue the command.

6027-2657 Fileset filesetName has open files. Specify -f to force unlink.
Explanation: An attempt was made to unlink a fileset that has open files.
User response: Close the open files and then reissue command, or use the -f option on the unlink command to force the open files to close.

6027-2658 Fileset filesetName cannot be linked into a snapshot at pathName.
Explanation: The user specified a directory within a snapshot for the junction to a fileset, but snapshots cannot be modified.
User response: Select a directory within the active file system, and reissue the command.

6027-2660 Fileset filesetName cannot be linked.
Explanation: The fileset could not be linked. This typically happens when the fileset is in the process of being deleted.
User response: None.

6027-2661 Fileset junction pathName already exists.
Explanation: A file or directory already exists at the specified junction.
User response: Select a new junction name or a new directory for the link and reissue the link command.
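For example, 6027-2648 is normally resolved by raising the file system format version with mmchfs -V; -V full enables all new features, while -V compat enables only backward-compatible ones. The file system name is a placeholder:
   mmchfs fs1 -V full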
6027-2662 Directory pathName for junction has too many links.
Explanation: The directory specified for the junction has too many links.
User response: Select a new directory for the link and reissue the command.

6027-2663 Fileset filesetName cannot be changed.
Explanation: The user specified a fileset to tschfileset that cannot be changed.
User response: None. You cannot change the attributes of the root fileset.

6027-2664 Fileset at pathName cannot be changed.
Explanation: The user specified a fileset to tschfileset that cannot be changed.
User response: None. You cannot change the attributes of the root fileset.

6027-2665 mmfileid already in progress for name.
Explanation: An mmfileid command is already running.
User response: Wait for the currently running command to complete, and issue the new command again.

6027-2666 mmfileid can only handle a maximum of diskAddresses disk addresses.
Explanation: Too many disk addresses specified.
User response: Provide fewer than 256 disk addresses to the command.

6027-2667 [I] Allowing block allocation for file system fileSystem that makes a file ill-replicated due to insufficient resource and puts data at risk.
Explanation: The partialReplicaAllocation file system option allows allocation to succeed even when all replica blocks cannot be allocated. The file was marked as not replicated correctly and the data may be at risk if one of the remaining disks fails.
User response: None. Informational message only.

6027-2670 Fileset name filesetName not found.
Explanation: The fileset name that was specified with the command invocation was not found.
User response: Correct the fileset name and reissue the command.

6027-2671 Fileset command on fileSystem failed; snapshot snapshotName must be restored first.
Explanation: The file system is being restored either from an offline backup or a snapshot, and the restore operation has not finished. Fileset commands cannot be run.
User response: Run the mmrestorefs command to complete the snapshot restore operation or to finish the offline restore, then reissue the fileset command.

6027-2672 Junction parent directory inode number inodeNumber is not valid.
Explanation: An inode number passed to tslinkfileset is not valid.
User response: Check the mmlinkfileset command arguments for correctness. If a valid junction path was provided, contact the IBM Support Center.

6027-2673 [X] Duplicate owners of an allocation region (index indexNumber, region regionNumber, pool poolNumber) were detected for file system fileSystem: nodes nodeName and nodeName.
Explanation: The allocation region should not have duplicate owners.
User response: Contact the IBM Support Center.

6027-2674 [X] The owner of an allocation region (index indexNumber, region regionNumber, pool poolNumber) that was detected for file system fileSystem: node nodeName is not valid.
Explanation: The file system had detected a problem with the ownership of an allocation region. This may result in a corrupted file system and loss of data. One or more nodes may be terminated to prevent any further damage to the file system.
User response: Unmount the file system and run the mmfsck command to repair the file system.

6027-2675 Only file systems with NFSv4 ACL semantics enabled can be mounted on this platform.
Explanation: A user is trying to mount a file system on Microsoft Windows, but the ACL semantics disallow NFSv4 ACLs.
User response: Enable NFSv4 ACL semantics using the mmchfs command (-k option).
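For example, the NFSv4 ACL semantics referred to in 6027-2675 are enabled with the -k option of mmchfs; the file system name is a placeholder:
   mmchfs fs1 -k nfs4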
6027-2689 The value for --block-size must be the keyword auto or the value must be of the form [n]K, [n]M, [n]G or [n]T, where n is an optional integer in the range 1 to 1023.
Explanation: An invalid value was specified with the --block-size option.
User response: Reissue the command with a valid option.

6027-2690 Fileset filesetName can only be linked within its own inode space.
Explanation: A dependent fileset can only be linked within its own inode space.
User response: Correct the junction path and reissue the command.

6027-2691 The fastea feature needs to be enabled for file system fileSystem before creating AFM filesets.
Explanation: The current file system on-disk format does not support storing of extended attributes in the file's inode. This is required for AFM-enabled filesets.
User response: Use the mmmigratefs command to enable the fast extended-attributes feature.

6027-2692 Error encountered while processing the input file.
Explanation: The tscrsnapshot command encountered an error while processing the input file.
User response: Check and validate the fileset names listed in the input file.

6027-2693 Fileset junction name junctionName conflicts with the current setting of mmsnapdir.
Explanation: The fileset junction name conflicts with the current setting of mmsnapdir.
User response: Select a new junction name or a new directory for the link and reissue the mmlinkfileset command.

6027-2694 [I] The requested maximum number of inodes is already at number.
Explanation: The specified number of inodes is already in effect.
User response: This is an informational message.

6027-2695 [E] The number of inodes to preallocate cannot be higher than the maximum number of inodes.
Explanation: The specified number of inodes to preallocate is not valid.
User response: Correct the --inode-limit argument then retry the command.

6027-2696 [E] The number of inodes to preallocate cannot be lower than the number of inodes already allocated.
Explanation: The specified number of inodes to preallocate is not valid.
User response: Correct the --inode-limit argument then retry the command.

6027-2697 Fileset at junctionPath has pending changes that need to be synced.
Explanation: A user is trying to change a caching option for a fileset while it has local changes that are not yet synced with the home server.
User response: Perform AFM recovery before reissuing the command.

6027-2698 File system fileSystem is mounted on nodes nodes or fileset at junctionPath is not unlinked.
Explanation: A user is trying to change a caching feature for a fileset while the file system is still mounted or the fileset is still linked.
User response: Unmount the file system from all nodes or unlink the fileset before reissuing the command.

6027-2699 Cannot create a new independent fileset until an existing one is deleted. File system fileSystem has a limit of maxNumber independent filesets.
Explanation: An attempt to create an independent fileset for the cited file system failed because it would exceed the cited limit.
User response: Remove unneeded independent filesets and reissue the command.

6027-2700 [E] A node join was rejected. This could be due to incompatible daemon versions, failure to find the node in the configuration database, or no configuration manager found.
Explanation: A request to join nodes was explicitly rejected.
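A sketch of the user response for 6027-2691; the file system name is a placeholder, and the --fastea option name and any unmount or preparation requirements should be verified against the mmmigratefs documentation for your release:
   mmmigratefs fs1 --fastea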
6027-2704 Permission failure. The command requires root authority to execute.
Explanation: The mmpmon command was issued with a nonzero UID.
User response: Log on as root and reissue the command.

6027-2705 Could not establish connection to file system daemon.
Explanation: The connection between a GPFS command and the mmfsd daemon could not be established. The daemon may have crashed, or never been started, or (for mmpmon) the allowed number of simultaneous connections has been exceeded.
User response: Ensure that the mmfsd daemon is running. Check the error log. For mmpmon, ensure that the allowed number of simultaneous connections has not been exceeded.

6027-2706 [I] Recovered number nodes.
Explanation: The asynchronous part (phase 2) of node failure recovery has completed.
User response: None. Informational message only.

6027-2710 [E] Node nodeName is being expelled due to expired lease.
Explanation: The nodes listed did not renew their lease in a timely fashion and will be expelled from the cluster.
User response: Check the network connection between this node and the node specified above.

6027-2711 [E] File system table full.
Explanation: The mmfsd daemon cannot add any more file systems to the table because it is full.
User response: None. Informational message only.

6027-2712 Option 'optionName' has been deprecated.
Explanation: The option that was specified with the command is no longer supported. A warning message is generated to indicate that the option has no effect.
User response: Correct the command line and then reissue the command.

6027-2713 Permission failure. The command requires SuperuserName authority to execute.
Explanation: The command, or the specified command option, requires administrative authority.
User response: Log on as a user with administrative privileges and reissue the command.
6027-2714 Could not appoint node nodeName as cluster manager. errorString
Explanation: The mmchmgr -c command generates this message if the specified node cannot be appointed as a new cluster manager.
User response: Make sure that the specified node is a quorum node and that GPFS is running on that node.

6027-2715 Could not appoint a new cluster manager. errorString
Explanation: The mmchmgr -c command generates this message when a node is not available as a cluster manager.
User response: Make sure that GPFS is running on a sufficient number of quorum nodes.

6027-2716 [I] Challenge response received; canceling disk election.
Explanation: The node has challenged another node, which won the previous election, and detected a response to the challenge.
User response: None. Informational message only.

6027-2717 Node nodeName is already a cluster manager or another node is taking over as the cluster manager.
Explanation: The mmchmgr -c command generates this message if the specified node is already the cluster manager.
User response: None. Informational message only.

6027-2718 Incorrect port range: GPFSCMDPORTRANGE='range'. Using default.
Explanation: The GPFS command port range format is lllll[-hhhhh], where lllll is the low port value and hhhhh is the high port value. The valid range is 1 to 65535.
User response: None. Informational message only.

6027-2719 The files provided do not contain valid quota entries.
Explanation: The quota file provided does not have valid quota entries.
User response: Check that the file being restored is a valid GPFS quota file.

6027-2722 [E] Node limit of number has been reached. Ignoring nodeName.
Explanation: The number of nodes that have been added to the cluster is greater than some cluster members can handle.
User response: Delete some nodes from the cluster using the mmdelnode command, or shut down GPFS on nodes that are running older versions of the code with lower limits.

6027-2723 [N] This node (nodeName) is now Cluster Manager for clusterName.
Explanation: This is an informational message when a new cluster manager takes over.
User response: None. Informational message only.

6027-2724 [I] reasonString. Probing cluster clusterName
Explanation: This is an informational message when a lease request has not been renewed.
User response: None. Informational message only.

6027-2725 [N] Node nodeName lease renewal is overdue. Pinging to check if it is alive
Explanation: This is an informational message on the cluster manager when a lease request has not been renewed.
User response: None. Informational message only.

6027-2726 [I] Recovered number nodes for file system fileSystem.
Explanation: The asynchronous part (phase 2) of node failure recovery has completed.
User response: None. Informational message only.

6027-2727 fileSystem: quota manager is not available.
Explanation: An attempt was made to perform a quota command without a quota manager running. This could be caused by a conflicting offline mmfsck command.
User response: Reissue the command once the conflicting program has ended.

6027-2728 [N] Connection from node rejected because it does not support IPv6
Explanation: A connection request was received from a node that does not support Internet Protocol Version 6 (IPv6), and at least one node in the cluster is configured with an IPv6 address (not an IPv4-mapped one) as its primary address. Since the connecting node
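As an illustration of the port range format described in message 6027-2718, the following sketch assumes that GPFSCMDPORTRANGE is set as an environment variable before GPFS administration commands are run; the port numbers are arbitrary examples, not recommendations:

  export GPFSCMDPORTRANGE=30000-30100   # lllll=30000 (low port), hhhhh=30100 (high port), both within 1 to 65535
  mmgetstate -a                         # subsequent GPFS commands then use ports from this range

A malformed value, or a port outside 1 to 65535, produces message 6027-2718 and the default range is used instead.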
6027-2741 [W] This node can not continue to be cluster manager
Explanation: This node invoked the user-specified callback handler for event tiebreakerCheck and it returned a non-zero value. This node cannot continue to be the cluster manager.
User response: None. Informational message only.

6027-2742 [I] CallExitScript: exit script exitScript on event eventName returned code returnCode, quorumloss.
Explanation: This node invoked the user-specified callback handler for the tiebreakerCheck event and it returned a non-zero value. The user-specified action with the error is quorumloss.
User response: None. Informational message only.

6027-2743 Permission denied.
Explanation: The command is invoked by an unauthorized user.
User response: Retry the command with an authorized user.

6027-2744 [D] Invoking tiebreaker callback script
Explanation: The node is invoking the callback script due to change in quorum membership.
User response: None. Informational message only.

6027-2745 [E] File system is not mounted.
Explanation: A command was issued, which requires that the file system be mounted.
User response: Mount the file system and reissue the command.

6027-2746 [E] Too many disks unavailable for this server to continue serving a RecoveryGroup.
Explanation: RecoveryGroup panic: Too many disks unavailable to continue serving this RecoveryGroup. This server will resign, and failover to an alternate server will be attempted.
User response: Ensure the alternate server took over. Determine what caused this event and address the situation. Prior messages may help determine the cause of the event.

6027-2747 [E] Inconsistency detected between the local node number retrieved from 'mmsdrfs' (nodeNumber) and the node number retrieved from 'mmfs.cfg' (nodeNumber).
Explanation: The node number retrieved by obtaining the list of nodes in the mmsdrfs file did not match the node number contained in mmfs.cfg. There may have been a recent change in the IP addresses being used by network interfaces configured at the node.
User response: Stop and restart GPFS daemon.

6027-2748 Terminating because a conflicting program on the same inode space inodeSpace is running.
Explanation: A program detected that it must terminate because a conflicting program is running.
User response: Reissue the command after the conflicting program ends.

6027-2749 Specified locality group 'number' does not match disk 'name' locality group 'number'. To change locality groups in an SNC environment, please use the mmdeldisk and mmadddisk commands.
Explanation: The locality group specified on the mmchdisk command does not match the current locality group of the disk.
User response: To change locality groups in an SNC environment, use the mmdeldisk and mmadddisk commands.

6027-2750 [I] Node NodeName is now the Group Leader.
Explanation: A new cluster Group Leader has been assigned.
User response: None. Informational message only.

6027-2751 [I] Starting new election: Last elected: NodeNumber Sequence: SequenceNumber
Explanation: A new disk election will be started. The disk challenge will be skipped since the last elected node was either none or the local node.
User response: None. Informational message only.

6027-2752 [I] This node got elected. Sequence: SequenceNumber
Explanation: Local node got elected in the disk election. This node will become the cluster manager.
User response: None. Informational message only.
6027-2753 [N] Responding to disk challenge: response: ResponseValue. Error code: ErrorCode.
Explanation: A disk challenge has been received, indicating that another node is attempting to become a Cluster Manager. Issuing a challenge response, to confirm the local node is still alive and will remain the Cluster Manager.
User response: None. Informational message only.

6027-2754 [X] Challenge thread did not respond to challenge in time: took TimeIntervalSecs seconds.
Explanation: Challenge thread took too long to respond to a disk challenge. Challenge thread will exit, which will result in the local node losing quorum.
User response: None. Informational message only.

6027-2755 [N] Another node committed disk election with sequence CommittedSequenceNumber (our sequence was OurSequenceNumber).
Explanation: Another node committed a disk election with a sequence number higher than the one used when this node used to commit an election in the past. This means that the other node has become, or is becoming, a Cluster Manager. To avoid having two Cluster Managers, this node will lose quorum.
User response: None. Informational message only.

6027-2756 Attention: In file system FileSystemName, FileSetName (Default) QuotaLimitType(QuotaLimit) for QuotaType UserName/GroupName/FilesetName is too small. Suggest setting it higher than minQuotaLimit.
Explanation: The quota limit that was set is too low and can cause unexpected quota behavior. MinQuotaLimit is computed as follows:
1. for block: QUOTA_THRESHOLD * MIN_SHARE_BLOCKS * subblocksize
2. for inode: QUOTA_THRESHOLD * MIN_SHARE_INODES
User response: Reset the quota limits so that they are more than MinQuotaLimit. This is just a warning; the quota limits will be set anyway.

6027-2757 [E] The peer snapshot is in progress. Queue cannot be flushed now.
Explanation: The Peer Snapshot is in progress. Queue cannot be flushed now.
User response: Reissue the command once the peer snapshot has ended.

6027-2758 [E] The AFM target does not support this operation. Run mmafmconfig on the AFM target cluster.
Explanation: The .afmctl file is probably not present on the AFM target cluster.
User response: Run mmafmconfig on the AFM target cluster to configure the AFM target cluster.

6027-2759 [N] Disk lease period expired in cluster ClusterName. Attempting to reacquire lease.
Explanation: The disk lease period expired, which will prevent the local node from being able to perform disk I/O. This can be caused by a temporary communication outage.
User response: If the message is repeated, the communication outage should be investigated.

6027-2760 [N] Disk lease reacquired in cluster ClusterName.
Explanation: The disk lease has been reacquired, and disk I/O will be resumed.
User response: None. Informational message only.

6027-2761 Unable to run command on 'fileSystem' while the file system is mounted in restricted mode.
Explanation: A command that can alter data in a file system was issued while the file system was mounted in restricted mode.
User response: Mount the file system in read-only or read-write mode or unmount the file system and then reissue the command.

6027-2762 Unable to run command on 'fileSystem' while the file system is suspended.
Explanation: A command that can alter data in a file system was issued while the file system was suspended.
User response: Resume the file system and reissue the command.

6027-2763 Unable to start command on 'fileSystem' because conflicting program name is running. Waiting until it completes.
Explanation: A program detected that it cannot start because a conflicting program is running. The program will automatically start once the conflicting program has ended as long as there are no other conflicting programs running at that time.
User response: None. Informational message only.
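As a worked illustration of the MinQuotaLimit formula in message 6027-2756, with purely hypothetical values QUOTA_THRESHOLD = 0.5, MIN_SHARE_BLOCKS = 8, and a subblock size of 8 KiB, the block MinQuotaLimit would be 0.5 * 8 * 8 KiB = 32 KiB, so any block quota limit set below 32 KiB would trigger this warning; the inode limit is computed the same way with MIN_SHARE_INODES in place of MIN_SHARE_BLOCKS * subblocksize.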
6027-2770 Disk diskName belongs to a write-affinity enabled storage pool. Its failure group cannot be changed.
Explanation: The failure group specified on the mmchdisk command does not match the current failure group of the disk.
User response: Use the mmdeldisk and mmadddisk commands to change failure groups in a write-affinity enabled storage pool.

6027-2771 fileSystem: Default per-fileset quotas are disabled for quotaType.
Explanation: A command was issued to modify default fileset-level quota, but default quotas are not enabled.
User response: Ensure the --perfileset-quota option is in effect for the file system, then use the

6027-2776 Attention: A disk being stopped reduces the degree of system metadata replication (value) or data replication (value) to lower than tolerable.
Explanation: The mmchdisk stop command was issued, but the disk cannot be stopped because of the current file system metadata and data replication factors.
User response: Make more disks available, delete unavailable disks, or change the file system metadata replication factor. Also check the current value of the unmountOnDiskFail configuration parameter.

6027-2777 [E] Node nodeName is being expelled because of an expired lease. Pings sent: pingsSent. Replies received: pingRepliesReceived.
Explanation: The node listed did not renew its lease in a timely fashion and is being expelled from the cluster.
User response: Check the network connection between this node and the node listed in the message.

6027-2778 [I] Node nodeName: ping timed out. Pings sent: pingsSent. Replies received: pingRepliesReceived.
Explanation: Ping timed out for the node listed, which should be the cluster manager. A new cluster manager will be chosen while the current cluster manager is expelled from the cluster.
User response: Check the network connection between this node and the node listed in the message.

6027-2779 [E] Challenge thread stopped.
Explanation: A tiebreaker challenge thread stopped because of an error. Cluster membership will be lost.
User response: Check for additional error messages. File systems will be unmounted, then the node will rejoin the cluster.

6027-2780 [E] Not enough quorum nodes reachable: reachableNodes.
Explanation: The cluster manager cannot reach a sufficient number of quorum nodes, and therefore must resign to prevent cluster partitioning.
User response: Determine if there is a network outage or if too many nodes have failed.

6027-2781 [E] Lease expired for numSecs seconds (shutdownOnLeaseExpiry).
Explanation: Disk lease expired for too long, which results in the node losing cluster membership.
User response: None. The node will attempt to rejoin the cluster.

6027-2782 [E] This node is being expelled from the cluster.
Explanation: This node received a message instructing it to leave the cluster, which might indicate communication problems between this node and some other node in the cluster.
User response: None. The node will attempt to rejoin the cluster.

6027-2783 [E] New leader elected with a higher ballot number.
Explanation: A new group leader was elected with a higher ballot number, and this node is no longer the leader. Therefore, this node must leave the cluster and rejoin.
User response: None. The node will attempt to rejoin the cluster.

6027-2784 [E] No longer a cluster manager or lost quorum while running a group protocol.
Explanation: Cluster manager no longer maintains quorum after attempting to run a group protocol, which might indicate a network outage or node failures.
User response: None. The node will attempt to rejoin the cluster.

6027-2785 [X] A severe error was encountered during cluster probe.
Explanation: A severe error was encountered while running the cluster probe to determine the state of the nodes in the cluster.
User response: Examine additional error messages. The node will attempt to rejoin the cluster.

6027-2786 [E] Unable to contact any quorum nodes during cluster probe.
Explanation: This node has been unable to contact any quorum nodes during cluster probe, which might indicate a network outage or too many quorum node failures.
User response: Determine whether there was a network outage or whether quorum nodes failed.

6027-2787 [E] Unable to contact enough other quorum nodes during cluster probe.
Explanation: This node, a quorum node, was unable to contact a sufficient number of quorum nodes during cluster probe, which might indicate a network outage or too many quorum node failures.
User response: Determine whether there was a network outage or whether quorum nodes failed.

6027-2788 [E] Attempt to run leader election failed with error errorNumber.
Explanation: This node attempted to run a group leader election but failed to get elected. This failure might indicate that two or more quorum nodes attempted to run the election at the same time. As a result, this node will lose cluster membership and then attempt to rejoin the cluster.
User response: None. The node will attempt to rejoin the cluster.
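For message 6027-2771, the --perfileset-quota state named in the user response can usually be inspected and changed per file system; the sketch below assumes a device name of fs1 and that your code level accepts --perfileset-quota as an attribute flag on mmlsfs and mmchfs:

  mmlsfs fs1 --perfileset-quota    # check whether per-fileset quotas are in effect
  mmchfs fs1 --perfileset-quota    # enable per-fileset quotas before setting default fileset-level quotas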
6027-2803 Policy set must start with VERSION.

6027-2804 Unexpected SQL result code - sqlResultCode.
Explanation: This could be an IBM programming error.
User response: Check that your SQL expressions are correct and supported by the current release of GPFS. If the error recurs, contact the IBM Support Center.

6027-2805 [I] Loaded policy 'policyFileName or filesystemName': summaryOfPolicyRules

User response: Correct or remove the rule.

User response: None. Informational message only.

6027-2810 [W] There are numberOfPools storage pools but the policy file is missing or empty.
Explanation: The cited number of storage pools are defined, but the policy file is missing or empty.
User response: You should probably install a policy with placement rules using the mmchpolicy command, so that at least some of your data will be stored in your nonsystem storage pools.
Or:
Correct the macro definitions in your policy rules file.
If the problem persists, contact the IBM Support Center.

6027-2812 Keyword 'keywordValue' begins a second clauseName clause - only one is allowed.
Explanation: The policy rule should only have one clause of the indicated type.
User response: Correct the rule and reissue the policy command.

6027-2813 This 'ruleName' rule is missing a clauseType required clause.
Explanation: The policy rule must have a clause of the indicated type.
User response: Correct the rule and reissue the policy command.

6027-2814 This 'ruleName' rule is of unknown type or not supported.
Explanation: The policy rule set seems to have a rule of an unknown type or a rule that is unsupported by the current release of GPFS.
User response: Correct the rule and reissue the policy command.

6027-2815 The value 'value' is not supported in a 'clauseType' clause.
Explanation: The policy rule clause seems to specify an unsupported argument or value that is not supported by the current release of GPFS.
User response: Correct the rule and reissue the policy command.

6027-2816 Policy rules employ features that would require a file system upgrade.
Explanation: One or more policy rules have been written to use new features that cannot be installed on a back-level file system.
User response: Install the latest GPFS software on all nodes and upgrade the file system or change your rules. (Note that LIMIT was introduced in GPFS Release 3.2.)

6027-2818 A problem occurred during m4 processing of policy rules. rc = return_code_from_popen_pclose_or_m4
Explanation: An attempt to expand the policy rules with an m4 subprocess yielded some warnings or errors or the m4 macro wrote some output to standard error. Details or related messages may follow this message.
User response: To correct the error, do one or more of the following:
Check that the standard m4 macro processing command is installed on your system as /usr/bin/m4.
Or:
Set the MM_M4_CMD environment variable.
Or:
Correct the macro definitions in your policy rules file.
If the problem persists, contact the IBM Support Center.

6027-2819 Error opening temp file temp_file_name: errorString
Explanation: An error occurred while attempting to open the specified temporary work file.
User response: Check that the path name is defined and accessible. Check the file and then reissue the command.

6027-2820 Error reading temp file temp_file_name: errorString
Explanation: An error occurred while attempting to read the specified temporary work file.
User response: Check that the path name is defined and accessible. Check the file and then reissue the command.
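For message 6027-2818, the three corrective options can be checked quickly from a shell; the alternative m4 path and the device and policy file names below are placeholders only, and -I test assumes that a validate-only policy run is acceptable in your environment:

  ls -l /usr/bin/m4                     # option 1: confirm the standard m4 macro processor is present
  export MM_M4_CMD=/opt/local/bin/m4    # option 2: point GPFS at an alternative m4 command
  mmchpolicy fs1 policy.rules -I test   # re-run policy processing to confirm the rules now expand cleanly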
6027-2825 Duplicate encryption set name 'setName'.

6027-2826 The encryption set 'setName' requested by rule 'rule' could not be found.
Explanation: The given set name used in the rule cannot be found.
User response: Verify if the set name is correct. Add the given set if it is missing from the policy.

6027-2827 [E] Error in evaluation of encryption policy for file fileName: %s
Explanation: An error occurred while evaluating the encryption rules in the given policy file.
User response: Examine the other error messages produced while evaluating the policy file.

6027-2828 [E] Encryption not supported on Windows. Encrypted file systems are not allowed when Windows nodes are present in the cluster.
Explanation: Self-explanatory.
User response: To activate encryption, ensure there are no Windows nodes in the cluster.

User response: Specify a valid assert integer value.

6027-2955 [W] Time-of-day may have jumped back. Late by delaySeconds seconds to wake certain threads.
Explanation: Time-of-day may have jumped back, which has resulted in some threads being awakened later than expected. It is also possible that some other factor has caused a delay in waking up the threads.
User response: Verify if there is any problem with network time synchronization, or if time-of-day is being incorrectly set.

6027-2956 [E] Invalid crypto engine type (encryptionCryptoEngineType): cryptoEngineType.
Explanation: The specified value for encryptionCryptoEngineType is incorrect.
User response: Specify a valid value for encryptionCryptoEngineType.
6027-2957 [E] Invalid cluster manager selection choice (clusterManagerSelection): clusterManagerSelection.
Explanation: The specified value for clusterManagerSelection is incorrect.
User response: Specify a valid value for clusterManagerSelection.

6027-2958 [E] Invalid NIST compliance type (nistCompliance): nistComplianceValue.
Explanation: The specified value for nistCompliance is incorrect.
User response: Specify a valid value for nistCompliance.

6027-2959 [E] The CPU architecture on this node does not support tracing in traceMode mode. Switching to traceMode mode.
Explanation: The CPU does not have constant time stamp counter capability required for overwrite trace mode. The trace has been enabled in blocking mode.
User response: Update configuration parameters to use trace facility in blocking mode or replace this node with modern CPU architecture.

6027-3101 Pdisk rotation rate invalid in option 'option'.
Explanation: When parsing disk lists, the pdisk rotation rate is not valid.
User response: Specify a valid rotation rate (SSD, NVRAM, or 1025 through 65535).

6027-3102 Pdisk FRU number too long in option 'option', maximum length length.
Explanation: When parsing disk lists, the pdisk FRU number is too long.
User response: Specify a valid FRU number that is shorter than or equal to the maximum length.

6027-3103 Pdisk location too long in option 'option', maximum length length.
Explanation: When parsing disk lists, the pdisk location is too long.
User response: Specify a valid location that is shorter than or equal to the maximum length.

6027-3105 Pdisk nPathActive invalid in option 'option'.
Explanation: When parsing disk lists, the nPathActive value is not valid.
User response: Specify a valid nPathActive value (0 to 255).

6027-3106 Pdisk nPathTotal invalid in option 'option'.
Explanation: When parsing disk lists, the nPathTotal value is not valid.
User response: Specify a valid nPathTotal value (0 to 255).

6027-3107 Pdisk nsdFormatVersion invalid in option 'name1name2'.
Explanation: The nsdFormatVersion that is entered while parsing the disk is invalid.
User response: Specify valid nsdFormatVersion, 1 or 2.

6027-3200 AFM ERROR: command pCacheCmd fileset filesetName fileids [parentId.childId.tParentId.targetId,ReqCmd] original error oerr application error aerr remote error remoteError
Explanation: AFM operations on a particular file failed.
User response: For asynchronous operations that are requeued, run the mmafmctl command with the resumeRequeued option after fixing the problem at the home cluster.

6027-3201 AFM ERROR DETAILS: type: remoteCmdType snapshot name snapshotName snapshot ID snapshotId
Explanation: Peer snapshot creation or deletion failed.
User response: Fix snapshot creation or deletion error.

6027-3204 AFM: Failed to set xattr on inode inodeNum error err, ignoring.
Explanation: Setting extended attributes on an inode failed.
User response: None.

6027-3205 AFM: Failed to get xattrs for inode inodeNum, ignoring.
Explanation: Getting extended attributes on an inode failed.
User response: None.
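For message 6027-3200, requeued asynchronous operations are resumed with the mmafmctl resumeRequeued option named in the user response; the device and fileset names below are placeholders only:

  mmafmctl fs1 resumeRequeued -j cacheFileset   # resume requeued AFM operations for the fileset after fixing the home-cluster problem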
6027-3209 Home NFS mount of host:path failed with error err
Explanation: NFS mounting of path from the home cluster failed.
User response: Make sure the exported path can be mounted over NFSv3.

6027-3210 Cannot find AFM control file for fileset filesetName in the exported file system at home. ACLs and extended attributes will not be synchronized. Sparse files will have zeros written for holes.
Explanation: Either home path does not belong to GPFS, or the AFM control file is not present in the exported path.
User response: If the exported path belongs to a GPFS file system, run the mmafmconfig command with the enable option on the export path at home.

6027-3211 Change in home export detected. Caching will be disabled.
Explanation: A change in home export was detected or the home path is stale.
User response: Ensure the exported path is accessible.

6027-3212 AFM ERROR: Cannot enable AFM for fileset filesetName (error err)
Explanation: AFM was not enabled for the fileset because the root file handle was modified, or the remote path is stale.
User response: Ensure the remote export path is accessible for NFS mount.

6027-3213 Cannot find snapshot link directory name for exported file system at home for fileset filesetName. Snapshot directory at home will be cached.
Explanation: Unable to determine the snapshot directory at the home cluster.
User response: None.

6027-3214 [E] AFM: Unexpiration of fileset filesetName failed with error err. Use mmafmctl to manually unexpire the fileset.
Explanation: Unexpiration of fileset failed after a home reconnect.
User response: Run the mmafmctl command with the unexpire option on the fileset.

6027-3215 [W] AFM: Peer snapshot delayed due to long running execution of operation to remote cluster for fileset filesetName. Peer snapshot continuing to wait.
Explanation: Peer snapshot command timed out waiting to flush messages.
User response: None.

6027-3216 Fileset filesetName encountered an error synchronizing with the remote cluster. Cannot synchronize with the remote cluster until AFM recovery is executed.
Explanation: Cache failed to synchronize with home because of an out of memory or conflict error. Recovery, resynchronization, or both will be performed by GPFS to synchronize cache with the home.
User response: None.

6027-3217 AFM ERROR Unable to unmount NFS export for fileset filesetName
Explanation: NFS unmount of the path failed.
User response: None.

6027-3220 AFM: Home NFS mount of host:path failed with error err for file system fileSystem fileset id filesetName. Caching will be disabled and the mount will be tried again after mountRetryTime seconds, on next request to gateway
Explanation: NFS mount of the home cluster failed. The mount will be tried again after mountRetryTime seconds.
User response: Make sure the exported path can be mounted over NFSv3.

6027-3221 AFM: Home NFS mount of host:path succeeded for file system fileSystem fileset filesetName. Caching is enabled.
Explanation: NFS mount of the path from the home cluster succeeded. Caching is enabled.
User response: None.

6027-3224 [I] AFM: Failed to set extended attributes on file system fileSystem inode inodeNum error err, ignoring.
Explanation: Setting extended attributes on an inode failed.
User response: None.
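For messages 6027-3210 and 6027-3212, the AFM control file is created at the home cluster by running mmafmconfig with the enable option against the exported path; the path below is only an illustration:

  mmafmconfig enable /gpfs/fs1/export-home   # run at the home cluster on the NFS-exported GPFS path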
6027-3227 [E] AFM: Cannot enable AFM for file system fileSystem fileset filesetName (error err)
Explanation: AFM was not enabled for the fileset because the root file handle was modified, or the remote path is stale.
User response: Ensure the remote export path is accessible for NFS mount.

6027-3228 [E] AFM: Unable to unmount NFS export for file system fileSystem fileset filesetName

6027-3234 [E] AFM: Unable to start thread to unexpire filesets.
Explanation: Failed to start thread for unexpiration of fileset.
User response: None.

6027-3235 [I] AFM: Stopping recovery for the file system fileSystem fileset filesetName
Explanation: AFM recovery terminated because the current node is no longer MDS for the fileset.
User response: None.
6027-3239 [E] AFM: Remote command remoteCmdType on file system fileSystem snapshot snapshotName snapshot ID snapshotId failed.
Explanation: A failure occurred when creating or deleting a peer snapshot.
User response: Examine the error details and retry the operation.

6027-3240 [E] AFM: pCacheCmd file system fileSystem fileset filesetName file IDs [parentId.childId.tParentId.targetId,flag] error err
Explanation: Operation failed to execute on home in independent-writer mode.
User response: None.

6027-3241 [I] AFM: GW queue transfer started for file system fileSystem fileset filesetName. Transferring to nodeAddress.
Explanation: An old GW initiated the queue transfer because a new GW node joined the cluster, and the fileset now belongs to the new GW node.
User response: None.

6027-3242 [I] AFM: GW queue transfer started for file system fileSystem fileset filesetName. Receiving from nodeAddress.
Explanation: An old MDS initiated the queue transfer because this node joined the cluster as GW and the fileset now belongs to this node.
User response: None.

6027-3243 [I] AFM: GW queue transfer completed for file system fileSystem fileset filesetName. error error
Explanation: A GW queue transfer completed.
User response: None.

6027-3244 [I] AFM: Home mount of afmTarget succeeded for file system fileSystem fileset filesetName. Caching is enabled.
Explanation: A mount of the path from the home cluster succeeded. Caching is enabled.
User response: None.

6027-3245 [E] AFM: Home mount of afmTarget failed with error error for file system fileSystem fileset ID filesetName. Caching will be disabled and the mount will be tried again after mountRetryTime seconds, on the next request to the gateway.
Explanation: A mount of the home cluster failed. The mount will be tried again after mountRetryTime seconds.
User response: Verify that the afmTarget can be mounted using the specified protocol.

6027-3246 [I] AFM: Prefetch recovery started for the file system fileSystem fileset filesetName.
Explanation: Prefetch recovery started.
User response: None.

6027-3247 [I] AFM: Prefetch recovery completed for the file system fileSystem fileset filesetName. error error
Explanation: Prefetch recovery completed.
User response: None.

6027-3248 [E] AFM: Cannot find the control file for fileset filesetName in the exported file system at home. This file is required to operate in primary mode. The fileset will be disabled.
Explanation: Either the home path does not belong to GPFS, or the AFM control file is not present in the exported path.
User response: If the exported path belongs to a GPFS file system, run the mmafmconfig command with the enable option on the export path at home.

6027-3249 [E] AFM: Target for fileset filesetName is not a secondary-mode fileset or file system. This is required to operate in primary mode. The fileset will be disabled.
Explanation: The AFM target is not a secondary fileset or file system.
User response: The AFM target fileset or file system should be converted to secondary mode.

6027-3250 [E] AFM: Refresh intervals cannot be set for fileset.
Explanation: Refresh intervals are not supported on primary and secondary-mode filesets.
User response: None.
6027-3257 [E] AFM: Unable to start thread to verify primary filesets for RPO.
Explanation: Failed to start thread for verification of primary filesets for RPO.

6027-3305 AFM Fileset filesetName cannot be changed as it is in beingDeleted state
Explanation: The user specified a fileset to tschfileset that cannot be changed.

6027-3312 No inode was found matching the criteria.
Explanation: No inode was found matching the criteria.
User response: None.

6027-3318 Fileset filesetName cannot be deleted as it is in compliant mode and it contains user files.
Explanation: An attempt was made to delete a non-empty fileset that is in compliant mode.
User response: None.
6027-3457 [E] Unable to rewrap key with name Keyname (inode inodeNumber, fileset filesetNumber, file system fileSystem).
Explanation: Unable to rewrap the key for a specified file because of an error with the key name.
User response: Examine the error message following this message for information on the specific failure.

6027-3458 [E] Invalid length for the Keyname string.
Explanation: The Keyname string has an incorrect length. The length of the specified string was either zero or it was larger than the maximum allowed length.
User response: Verify the Keyname string.

6027-3459 [E] Not enough memory.
Explanation: Unable to allocate memory for the Keyname string.
User response: Restart GPFS. Contact the IBM Support Center.

6027-3460 [E] Incorrect format for the Keyname string.
Explanation: An incorrect format was used when specifying the Keyname string.
User response: Verify the format of the Keyname string.

6027-3461 [E] Error code: errorNumber.
Explanation: An error occurred when processing a key ID.
User response: Contact the IBM Support Center.

6027-3462 [E] Unable to rewrap key: original key name: originalKeyname, new key name: newKeyname (inode inodeNumber, fileset filesetNumber, file system fileSystem).
Explanation: Unable to rewrap the key for a specified file, possibly because the existing key or the new key cannot be retrieved from the key server.
User response: Examine the error message following this message for information on the specific failure.

6027-3463 [E] Rewrap error.
Explanation: An internal error occurred during key rewrap.
User response: Examine the error messages surrounding this message. Contact the IBM Support Center.

6027-3464 [E] New key is already in use.
Explanation: The new key specified in a key rewrap is already being used.
User response: Ensure that the new key specified in the key rewrap is not being used by the file.

6027-3465 [E] Cannot retrieve original key.
Explanation: Original key being used by the file cannot be retrieved from the key server.
User response: Verify that the key server is available, the credentials to access the key server are correct, and that the key is defined on the key server.

6027-3466 [E] Cannot retrieve new key.
Explanation: Unable to retrieve the new key specified in the rewrap from the key server.
User response: Verify that the key server is available, the credentials to access the key server are correct, and that the key is defined on the key server.

6027-3468 [E] Rewrap error code errorNumber.
Explanation: Key rewrap failed.
User response: Record the error code and contact the IBM Support Center.

6027-3469 [E] Encryption is enabled but the crypto module could not be initialized. Error code: number. Ensure that the GPFS crypto package was installed.
Explanation: Encryption is enabled, but the cryptographic module required for encryption could not be loaded.
User response: Ensure that the packages required for encryption are installed on each node in the cluster.

6027-3470 [E] Cannot create file fileName: extended attribute is too large: numBytesRequired bytes (numBytesAvailable available) (fileset filesetNumber, file system fileSystem).
Explanation: Unable to create an encryption file because the extended attribute required for encryption is too large.
User response: Change the encryption policy so that the file key is wrapped fewer times, reduce the number of keys used to wrap a file key, or create a file system with a larger inode size.
6027-3471 [E] At least one key must be specified.
Explanation: No key name was specified.
User response: Specify at least one key name.

6027-3472 [E] Could not combine the keys.
Explanation: Unable to combine the keys used to wrap a file key.
User response: Examine the keys being used. Contact the IBM Support Center.

6027-3473 [E] Could not locate the RKM.conf file.
Explanation: Unable to locate the RKM.conf configuration file.
User response: Contact the IBM Support Center.

6027-3474 [E] Could not open fileType file ('fileName' was specified).
Explanation: Unable to open the specified configuration file. Encryption files will not be accessible.
User response: Ensure that the specified configuration file is present on all nodes.

6027-3475 [E] Could not read file 'fileName'.
Explanation: Unable to read the specified file.
User response: Ensure that the specified file is accessible from the node.

6027-3476 [E] Could not seek through file 'fileName'.
Explanation: Unable to seek through the specified file. Possible inconsistency in the local file system where the file is stored.
User response: Ensure that the specified file can be read from the local node.

6027-3477 [E] Could not wrap the FEK.
Explanation: Unable to wrap the file encryption key.
User response: Examine other error messages. Verify that the encryption policies being used are correct.

6027-3478 [E] Insufficient memory.
Explanation: Internal error: unable to allocate memory.
User response: Restart GPFS. Contact the IBM Support Center.

6027-3479 [E] Missing combine parameter string.
Explanation: The combine parameter string was not specified in the encryption policy.
User response: Verify the syntax of the encryption policy.

6027-3480 [E] Missing encryption parameter string.
Explanation: The encryption parameter string was not specified in the encryption policy.
User response: Verify the syntax of the encryption policy.

6027-3481 [E] Missing wrapping parameter string.
Explanation: The wrapping parameter string was not specified in the encryption policy.
User response: Verify the syntax of the encryption policy.

6027-3482 [E] 'combineParameter' could not be parsed as a valid combine parameter string.
Explanation: Unable to parse the combine parameter string.
User response: Verify the syntax of the encryption policy.

6027-3483 [E] 'encryptionParameter' could not be parsed as a valid encryption parameter string.
Explanation: Unable to parse the encryption parameter string.
User response: Verify the syntax of the encryption policy.

6027-3484 [E] 'wrappingParameter' could not be parsed as a valid wrapping parameter string.
Explanation: Unable to parse the wrapping parameter string.
User response: Verify the syntax of the encryption policy.

6027-3485 [E] The Keyname string cannot be longer than number characters.
Explanation: The specified Keyname string has too many characters.
User response: Verify that the specified Keyname string is correct.
6027-3486 [E] The KMIP library could not be initialized.
Explanation: The KMIP library used to communicate with the key server could not be initialized.
User response: Restart GPFS. Contact the IBM Support Center.

6027-3487 [E] The RKM ID cannot be longer than number characters.
Explanation: The remote key manager ID cannot be longer than the specified length.
User response: Use a shorter remote key manager ID.

6027-3488 [E] The length of the key ID cannot be zero.
Explanation: The length of the specified key ID string cannot be zero.
User response: Specify a key ID string with a valid length.

6027-3489 [E] The length of the RKM ID cannot be zero.
Explanation: The length of the specified RKM ID string cannot be zero.
User response: Specify an RKM ID string with a valid length.

Explanation: The RKM.conf file is larger than the size that is currently supported.
User response: Use a smaller RKM.conf configuration file.

6027-3491 [E] The string 'Keyname' could not be parsed as a valid key name.
Explanation: The specified string could not be parsed as a valid key name.
User response: Specify a valid Keyname string.

6027-3493 [E] numKeys keys were specified but a maximum of numKeysMax is supported.
Explanation: The maximum number of specified key IDs was exceeded.
User response: Change the encryption policy to use fewer keys.

6027-3494 [E] Unrecognized cipher mode.
Explanation: Unable to recognize the specified cipher mode.
User response: Specify one of the valid cipher modes.

6027-3495 [E] Unrecognized cipher.
Explanation: Unable to recognize the specified cipher.
User response: Specify one of the valid ciphers.

6027-3496 [E] Unrecognized combine mode.
Explanation: Unable to recognize the specified combine mode.
User response: Specify one of the valid combine modes.

6027-3497 [E] Unrecognized encryption mode.
Explanation: Unable to recognize the specified encryption mode.
User response: Specify one of the valid encryption modes.

6027-3498 [E] Invalid key length.
Explanation: An invalid key length was specified.
User response: Specify a valid key length for the chosen cipher mode.

Explanation: Unable to recognize the specified wrapping mode.
User response: Specify one of the valid wrapping modes.

6027-3500 [E] Duplicate Keyname string 'keyIdentifier'.
Explanation: A given Keyname string has been specified twice.
User response: Change the encryption policy to eliminate the duplicate.

6027-3501 [E] Unrecognized combine mode ('combineMode').
Explanation: The specified combine mode was not recognized.
User response: Specify a valid combine mode.
6027-3502 [E] Unrecognized cipher mode ('cipherMode').
Explanation: The specified cipher mode was not recognized.
User response: Specify a valid cipher mode.

6027-3503 [E] Unrecognized cipher ('cipher').
Explanation: The specified cipher was not recognized.
User response: Specify a valid cipher.

6027-3504 [E] Unrecognized encryption mode ('mode').
Explanation: The specified encryption mode was not recognized.
User response: Specify a valid encryption mode.

6027-3505 [E] Invalid key length ('keyLength').
Explanation: The specified key length was incorrect.
User response: Specify a valid key length.

6027-3506 [E] Mode 'mode1' is not compatible with mode 'mode2', aborting.
Explanation: The two specified encryption parameters are not compatible.
User response: Change the encryption policy and specify compatible encryption parameters.

6027-3509 [E] Key 'keyID:RKMID' could not be fetched (RKM reported error errorNumber).
Explanation: The key with the specified name cannot be fetched from the key server.
User response: Examine the error messages to obtain information about the failure. Verify connectivity to the key server and that the specified key is present at the server.

6027-3510 [E] Could not bind symbol symbolName (errorDescription).
Explanation: Unable to find the location of a symbol in the library.

6027-3513 [E] Duplicate backend 'backend'.
Explanation: A duplicate backend name was specified in RKM.conf.
User response: Specify unique RKM backends in RKM.conf.

6027-3517 [E] Could not open library (libName).
Explanation: Unable to open the specified library.
User response: Verify that all required packages are installed for encryption. Contact the IBM Support Center.

6027-3518 [E] The length of the RKM ID string is invalid (must be between 0 and length characters).
Explanation: The length of the RKM backend ID is invalid.
User response: Specify an RKM backend ID with a valid length.

6027-3519 [E] 'numAttempts' is not a valid number of connection attempts.
Explanation: The value specified for the number of connection attempts is incorrect.
User response: Specify a valid number of connection attempts.

6027-3520 [E] 'sleepInterval' is not a valid sleep interval.
Explanation: The value specified for the sleep interval is incorrect.
User response: Specify a valid sleep interval value (in microseconds).

6027-3521 [E] 'timeout' is not a valid connection timeout.
Explanation: The value specified for the connection timeout is incorrect.
User response: Specify a valid connection timeout (in seconds).
6027-3524 [E] 'tenantName' is not a valid tenantName.
Explanation: An incorrect value was specified for the tenant name.
User response: Specify a valid tenant name.

6027-3527 [E] Backend 'backend' could not be initialized (error errorNumber).
Explanation: Key server backend could not be initialized.
User response: Examine the error messages. Verify connectivity to the server. Contact the IBM Support Center.

6027-3528 [E] Unrecognized wrapping mode ('wrapMode').
Explanation: The specified key wrapping mode was not recognized.
User response: Specify a valid key wrapping mode.

6027-3529 [E] An error was encountered while processing file 'fileName':
Explanation: An error was encountered while processing the specified configuration file.
User response: Examine the error messages that follow and correct the corresponding conditions.

6027-3530 [E] Unable to open encrypted file: key retrieval not initialized (inode inodeNumber, fileset filesetNumber, file system fileSystem).
Explanation: File is encrypted but the infrastructure required to retrieve encryption keys was not initialized, likely because processing of RKM.conf failed.
User response: Examine error messages at the time the file system was mounted.

6027-3533 [E] Invalid encryption key derivation function.
Explanation: An incorrect key derivation function was specified.
User response: Specify a valid key derivation function.

6027-3534 [E] Unrecognized encryption key derivation function ('keyDerivation').
Explanation: The specified key derivation function was not recognized.
User response: Specify a valid key derivation function.

6027-3535 [E] Incorrect client certificate label 'clientCertLabel' for backend 'backend'.
Explanation: The specified client keypair certificate label is incorrect for the backend.
User response: Ensure that the correct client certificate label is used in RKM.conf.

6027-3537 [E] Setting default encryption parameters requires empty combine and wrapping parameter strings.
Explanation: A non-empty combine or wrapping parameter string was used in an encryption policy rule that also uses the default parameter string.
User response: Ensure that neither the combine nor the wrapping parameter is set when the default parameter string is used in the encryption rule.

6027-3540 [E] The specified RKM backend type (rkmType) is invalid.
Explanation: The specified RKM type in RKM.conf is incorrect.
User response: Ensure that only supported RKM types are specified in RKM.conf.

6027-3541 [E] Encryption is not supported on Windows.
Explanation: Encryption cannot be activated if there are Windows nodes in the cluster.
User response: Ensure that encryption is not activated if there are Windows nodes in the cluster.

6027-3543 [E] The integrity of the file encrypting key could not be verified after unwrapping; the operation was cancelled.
Explanation: When opening an existing encrypted file, the integrity of the file encrypting key could not be verified. Either the cryptographic extended attributes were damaged, or the master key(s) used to unwrap the FEK have changed.
User response: Check for other symptoms of data corruption, and verify that the configuration of the key server has not changed.

6027-3545 [E] Encryption is enabled but there is no valid license. Ensure that the GPFS crypto package was installed properly.
Explanation: The required license is missing for the GPFS encryption package.
User response: Ensure that the GPFS encryption package was installed properly.
6027-3546 [E] Key 'keyID:rkmID' could not be fetched. The specified RKM ID does not exist; check the RKM.conf settings.
Explanation: The specified RKM ID part of the key name does not exist, and therefore the key cannot be retrieved. The corresponding RKM might have been removed from RKM.conf.
User response: Check the set of RKMs specified in RKM.conf.

6027-3547 [E] Key 'keyID:rkmID' could not be fetched. The connection was reset by the peer while performing the TLS handshake.
Explanation: The specified key could not be retrieved from the server, because the connection with the server was reset while performing the TLS handshake.

6027-3549 [E] Key 'keyID:rkmID' could not be fetched. The TCP connection with the RKM could not be established.

6027-3550 Error when retrieving encryption attribute: errorDescription.
Explanation: Unable to retrieve or decode the encryption attribute for a given file.
User response: File could be damaged and may need to be removed if it cannot be read.

6027-3551 Error flushing work file fileName: errorString
Explanation: An error occurred while attempting to flush the named work file or socket.
User response: None.

6027-3552 Failed to fork a new process to operationString file system.
Explanation: Failed to fork a new process to suspend/resume file system.
User response: None.

6027-3553 Failed to sync fileset filesetName.
Explanation: Failed to sync fileset.
User response: None.

6027-3554 The restore command encountered an out-of-memory error.
Explanation: The fileset snapshot restore command encountered an out-of-memory error.
User response: Consider some of the command parameters that might affect memory usage. Contact the IBM Support Center.

User response: Ensure that the file system is not full and that files can be created. Contact the IBM Support Center.

6027-3558 cmdName error: could not initialize the key management subsystem (error returnCode).
Explanation: An internal component of the cryptographic library could not be properly initialized.
User response: Ensure that the gpfs.gskit package was installed properly. Contact the IBM Support Center.
6027-3559 cmdName error: could not create the key database (error returnCode).
Explanation: The key database file could not be created.
User response: Ensure that the file system is not full and that files can be created. Contact the IBM Support Center.

6027-3560 cmdName error: could not create the new self-signed certificate (error returnCode).
Explanation: A new certificate could not be successfully created.
User response: Ensure that the supplied canonical name is valid. Contact the IBM Support Center.

6027-3561 cmdName error: could not extract the key item (error returnCode).
Explanation: The public key item could not be extracted successfully.
User response: Contact the IBM Support Center.

6027-3562 cmdName error: base64 conversion failed (error returnCode).
Explanation: The conversion from or to the BASE64 encoding could not be performed successfully.
User response: Contact the IBM Support Center.

6027-3563 cmdName error: could not extract the private key (error returnCode).
Explanation: The private key could not be extracted successfully.
User response: Contact the IBM Support Center.

6027-3564 cmdName error: could not initialize the ICC subsystem (error returnCode returnCode).
Explanation: An internal component of the cryptographic library could not be properly initialized.
User response: Ensure that the gpfs.gskit package was installed properly. Contact the IBM Support Center.

6027-3565 cmdName error: I/O error.
Explanation: A terminal failure occurred while performing I/O.
User response: Contact the IBM Support Center.

6027-3566 cmdName error: could not open file 'fileName'.
Explanation: The specified file could not be opened.
User response: Ensure that the specified path and file name are correct and that you have sufficient permissions to access the file.

6027-3567 cmdName error: could not convert the private key.
Explanation: The private key material could not be converted successfully.
User response: Contact the IBM Support Center.

6027-3568 cmdName error: could not extract the private key information structure.
Explanation: The private key could not be extracted successfully.
User response: Contact the IBM Support Center.

6027-3569 cmdName error: could not convert the private key information to DER format.
Explanation: The private key material could not be converted successfully.
User response: Contact the IBM Support Center.

6027-3570 cmdName error: could not encrypt the private key information structure (error returnCode).
Explanation: The private key material could not be encrypted successfully.
User response: Contact the IBM Support Center.

6027-3571 cmdName error: could not insert the key in the keystore, check your system's clock (error returnCode).
Explanation: Insertion of the new keypair into the keystore failed because the local date and time are not properly set on your system.
User response: Synchronize the local date and time on your system and try this command again.

6027-3572 cmdName error: could not insert the key in the keystore (error returnCode).
Explanation: Insertion of the new keypair into the keystore failed.
User response: Contact the IBM Support Center.
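Several of the messages above (for example 6027-3558 and 6027-3564) point at the gpfs.gskit package; on an RPM-based Linux node a quick check is sketched below, on the assumption that the package is delivered as an RPM there (the query differs on AIX, Windows, or Debian-based nodes):

  rpm -q gpfs.gskit    # confirm the GSKit cryptographic package is installed on this node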
Explanation: The cluster is configured to operate in FIPS mode but the cryptographic library could not be initialized in that mode.
User response: Verify that the gpfs.gskit package has been installed properly and that GPFS supports FIPS

User response: Refer to the error message below this line for the cause of the compression failure.
6027-3587 [E] Aborting compression as the file is opened in hyper allocation mode.
Explanation: Compression operation is not performed because the file is opened in hyper allocation mode.
User response: Compress this file after the file is closed.

6027-3588 [E] Aborting compression as the file is currently memory mapped, opened in direct I/O mode, or stored in a horizontal storage pool.
Explanation: Compression operation is not performed because it is inefficient or unsafe to compress the file at this time.
User response: Compress this file after the file is no longer memory mapped, opened in direct I/O mode, or stored in a horizontal storage pool.

6027-3589 cmdName error: Cannot set the password twice.
Explanation: An attempt was made to set the password by using different available options.
User response: Set the password either through the CLI or by specifying a file that contains it.

6027-3590 cmdName error: Could not access file fileName (error errorCode).
Explanation: The specified file could not be accessed.
User response: Check whether the file name is correct and verify whether you have required access privileges to access the file.

6027-3591 cmdName error: The password specified in file fileName exceeds the maximum length of length characters.
Explanation: The password stored in the specified file is too long.
User response: Pick a shorter password and retry the operation.

6027-3592 cmdName error: Could not read the password from file fileName.
Explanation: The password could not be read from the specified file.
User response: Ensure that the file can be read.

6027-3593 [E] Compression is supported only for regular files.
Explanation: The file is not compressed because compression is supported only for regular files.
User response: None.

6027-3700 [E] Key 'keyID' was not found on RKM ID 'rkmID'.
Explanation: The specified key could not be retrieved from the key server.
User response: Verify that the key is present at the server. Verify that the name of the keys used in the encryption policy is correct.

6027-3701 [E] Key 'keyID:rkmID' could not be fetched. The authentication with the RKM was not successful.
Explanation: Unable to authenticate with the key server.
User response: Verify that the credentials used to authenticate with the key server are correct.

6027-3702 [E] Key 'keyID:rkmID' could not be fetched. Permission denied.
Explanation: Unable to authenticate with the key server.
User response: Verify that the credentials used to authenticate with the key server are correct.

6027-3703 [E] I/O error while accessing the keystore file 'keystoreFileName'.
Explanation: An error occurred while accessing the keystore file.
User response: Verify that the name of the keystore file in RKM.conf is correct. Verify that the keystore file can be read on each node.

6027-3704 [E] The keystore file 'keystoreFileName' has an invalid format.
Explanation: The specified keystore file has an invalid format.
User response: Verify that the format of the keystore file is correct.

6027-3705 [E] Incorrect FEK length after unwrapping; the operation was cancelled.
Explanation: When opening an existing encrypted file, the size of the FEK that was unwrapped did not correspond to the one recorded in the file's extended attributes. Either the cryptographic extended attributes
User response: Ensure that the RKM server trusts the client certificate that was used for this request. If this does not resolve the issue, contact the IBM Support Center.

6027-3719 [W] Wrapping parameter string 'oldWrappingParameter' is not safe and will be replaced with 'newWrappingParameter'.

6027-3900 Invalid flag 'flagName' in the criteria file.
Explanation: An invalid flag was found in the criteria file.
User response: None.

User response: Copy quota files directly.

6027-3905 [E] Specified directory does not exist or is invalid.
Explanation: The specified directory does not exist or is invalid.
User response: Check the spelling or validity of the directory.

Explanation: The node could not renew its disk lease and there was no other quorum node available to contact.
User response: Determine whether there was a network outage, and also ensure the cluster is configured with enough quorum nodes. The node will attempt to rejoin the cluster.
6027-3908 Check file 'fileName' on fileSystem for inodes with broken disk addresses or failures.
Explanation: The named file contains the inodes generated by parallel inode traversal (PIT) with interesting flags; for example, dataUpdateMiss or BROKEN.
User response: None.

6027-3909 The file (backupQuotaFile) is a quota file in fileSystem already.
Explanation: The file is a quota file already. An incorrect file name might have been specified.
User response: None.

6027-3910 [I] Delay number seconds for safe recovery.
Explanation: When disk lease is in use, wait for the existing lease to expire before performing log and token manager recovery.
User response: None.

6027-3911 Error reading message from the file system daemon: errorString : The system ran out of memory buffers or memory to expand the memory buffer pool.
Explanation: The system ran out of memory buffers or memory to expand the memory buffer pool. This prevented the client from receiving a message from the file system daemon.
User response: Try again later.

6027-3914 [E] Current file system version does not support compression.
Explanation: The file system version is not recent enough for file compression support.
User response: Upgrade the file system to the latest version, then retry the command.

6027-4000 [I] descriptorType descriptor on this NSD can be updated by running the following command from the node physically connected to NSD nsdName:
Explanation: This message is displayed when a descriptor validation thread finds a valid NSD, disk, or stripe group descriptor but with a different ID. This can happen if a device is reused for another NSD.
User response: None. After this message, another message is displayed with a command to fix the problem.

6027-4001 [I] 'mmfsadm writeDesc <device> descriptorType descriptorId:descriptorId nsdFormatVersion pdiskStatus', where device is the device name of that NSD.
Explanation: This message displays the command that must be run to fix the NSD or disk descriptor on that device. The deviceName must be supplied by the system administrator or obtained from the mmlsnsd -m command. The descriptorId is a hexadecimal value.
User response: Run the command that is displayed on that NSD server node and replace deviceName with the device name of that NSD.
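To make the relationship between messages 6027-4000 and 6027-4001 concrete: assuming 6027-4001 has already displayed the exact arguments to use (they appear above only as placeholders), and assuming the hypothetical device name /dev/hdisk12 is what mmlsnsd -m reports for that NSD, the repair sequence on the NSD server node would look like this sketch:

   mmlsnsd -m
   mmfsadm writeDesc /dev/hdisk12 descriptorType descriptorId:descriptorId nsdFormatVersion pdiskStatus

Only the device name is substituted; every other argument is copied unchanged from the displayed 6027-4001 message.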
6027-4004 [D] On-disk NSD descriptor: nsdId nsdId nsdMagic nsdMagic nsdFormatVersion nsdFormatVersion on disk nsdChecksum nsdChecksum calculated checksum calculatedChecksum nsdDescSize nsdDescSize firstPaxosSector firstPaxosSector nPaxosSectors nPaxosSectors nsdIsPdisk nsdIsPdisk
Explanation: Description of an on-disk NSD descriptor.
User response: None.

6027-4005 [D] Local copy of NSD descriptor: nsdId nsdId nsdMagic nsdMagic formatVersion formatVersion nsdDescSize nsdDescSize firstPaxosSector firstPaxosSector nPaxosSectors nPaxosSectors
Explanation: Description of the cached NSD descriptor.
User response: None.

6027-4006 [I] Writing NSD descriptor of nsdName with local copy: nsdId nsdId nsdFormatVersion formatVersion firstPaxosSector firstPaxosSector nPaxosSectors nPaxosSectors nsdDescSize nsdDescSize nsdIsPdisk nsdIsPdisk nsdChecksum nsdChecksum
Explanation: Description of the NSD descriptor that was written.
User response: None.

6027-4007 errorType descriptor on descriptorType nsdId nsdId:nsdId error error
Explanation: This message is displayed after reading and writing NSD, disk and stripe group descriptors.
User response: None.

6027-4008 [E] On-disk descriptorType descriptor of nsdName is valid but has a different UID: uid descriptorId:descriptorId on-disk uid descriptorId:descriptorId nsdId nsdId:nsdId
Explanation: While verifying an on-disk descriptor, a valid descriptor was found but with a different ID. This can happen if a device is reused for another NSD with the mmcrnsd -v no command.
User response: After this message there are more messages displayed that describe the actions to follow.

6027-4009 [E] On-disk NSD descriptor of nsdName is valid but has a different ID. ID in cache is cachedId and ID on-disk is ondiskId
Explanation: While verifying an on-disk NSD descriptor, a valid descriptor was found but with a different ID. This can happen if a device is reused for another NSD with the mmcrnsd -v no command.
User response: After this message, there are more messages displayed that describe the actions to follow.

6027-4010 [I] This corruption can happen if the device is reused by another NSD with the -v option and a file system is created with that reused NSD.
Explanation: Description of a corruption that can happen when an NSD is reused.
User response: Verify that the NSD was not reused to create another NSD with the -v option and that the NSD was not used for another file system.

6027-4011 [D] On-disk disk descriptor: uid descriptorID:descriptorID magic descMagic formatVersion formatVersion descSize descSize checksum on disk diskChecksum calculated checksum calculatedChecksum firstSGDescSector firstSGDescSector nSGDescSectors nSGDescSectors lastUpdateTime lastUpdateTime
Explanation: Description of the on-disk disk descriptor.
User response: None.

6027-4012 [D] Local copy of disk descriptor: uid descriptorID:descriptorID firstSGDescSector firstSGDescSector nSGDescSectors nSGDescSectors
Explanation: Description of the cached disk descriptor.
User response: None.

6027-4013 [I] Writing disk descriptor of nsdName with local copy: uid descriptorID:descriptorID, magic magic, formatVersion formatVersion firstSGDescSector firstSGDescSector nSGDescSectors nSGDescSectors descSize descSize
Explanation: Writing disk descriptor to disk with local information.
User response: None.

6027-4014 [D] Local copy of StripeGroup descriptor: uid descriptorID:descriptorID curFmtVersion curFmtVersion configVersion configVersion
Explanation: Description of the cached stripe group descriptor.
User response: None.

6027-4015 [D] On-disk StripeGroup descriptor: uid sgUid:sgUid magic magic curFmtVersion curFmtVersion descSize descSize on-disk checksum diskChecksum calculated checksum calculatedChecksum configVersion configVersion lastUpdateTime lastUpdateTime
Explanation: Description of the on-disk stripe group descriptor.
User response: None.

6027-4016 [E] Data buffer checksum mismatch during write. File system fileSystem tag tag1 tag2 nBytes nBytes diskAddresses
Explanation: GPFS detected a mismatch in the checksum of the data buffer content, which means the content of the data buffer was changing while a direct I/O write operation was in progress.
User response: None.
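Several of the preceding messages (6027-4008 through 6027-4010) describe what can happen when descriptor verification is bypassed while a device is reused. As an illustration only (the stanza file name is a placeholder), the bypass they refer to looks like this:

   mmcrnsd -F newnsd.stanza -v no

The -v no option tells mmcrnsd not to verify that the device is free of an existing NSD or file system descriptor, so reusing a device this way can leave on-disk descriptors whose IDs no longer match the copies cached by the cluster, which is the condition these messages report.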
Accessibility features
The following list includes the major accessibility features in IBM Spectrum Scale:
v Keyboard-only operation
v Interfaces that are commonly used by screen readers
v Keys that are discernible by touch but do not activate just by touching them
v Industry-standard devices for ports and connectors
v The attachment of alternative input and output devices
IBM Knowledge Center and its related publications are accessibility-enabled. The accessibility features
are described in IBM Knowledge Center (www.ibm.com/support/knowledgecenter).
Keyboard navigation
This product uses standard Microsoft Windows navigation keys.
Notices
IBM may not offer the products, services, or features discussed in this document in other countries.
Consult your local IBM representative for information on the products and services currently available in
your area. Any reference to an IBM product, program, or service is not intended to state or imply that
only that IBM product, program, or service may be used. Any functionally equivalent product, program,
or service that does not infringe any IBM intellectual property right may be used instead. However, it is
the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or
service.
IBM may have patents or pending patent applications covering subject matter described in this
document. The furnishing of this document does not grant you any license to these patents. You can send
license inquiries, in writing, to:
For license inquiries regarding double-byte character set (DBCS) information, contact the IBM Intellectual
Property Department in your country or send inquiries, in writing, to:
The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION
PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR
IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some
states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this
statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically
made to the information herein; these changes will be incorporated in new editions of the publication.
IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this
publication at any time without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in
any manner serve as an endorsement of those websites. The materials at those websites are not part of
the materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without
incurring any obligation to you.
IBM Corporation
Dept. H6MA/Building 707
Mail Station P300
2455 South Road
Poughkeepsie, NY 12601-5400
USA
Such information may be available, subject to appropriate terms and conditions, including in some cases,
payment of a fee.
The licensed program described in this document and all licensed material available for it are provided
by IBM under terms of the IBM Customer Agreement, IBM International Program License Agreement or
any equivalent agreement between us.
Any performance data contained herein was determined in a controlled environment. Therefore, the
results obtained in other operating environments may vary significantly. Some measurements may have
been made on development-level systems and there is no guarantee that these measurements will be the
same on generally available systems. Furthermore, some measurements may have been estimated through
extrapolation. Actual results may vary. Users of this document should verify the applicable data for their
specific environment.
Information concerning non-IBM products was obtained from the suppliers of those products, their
published announcements or other publicly available sources. IBM has not tested those products and
cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM
products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of
those products.
All statements regarding IBM's future direction or intent are subject to change or withdrawal without
notice, and represent goals and objectives only.
This information is for planning purposes only. The information herein is subject to change before the
products described become available.
This information contains examples of data and reports used in daily business operations. To illustrate
them as completely as possible, the examples include the names of individuals, companies, brands, and
products. All of these names are fictitious and any similarity to the names and addresses used by an
actual business enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs
in any form without payment to IBM, for the purposes of developing, using, marketing or distributing
application programs conforming to the application programming interface for the operating platform for
which the sample programs are written. These examples have not been thoroughly tested under all
conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these
programs. The sample programs are provided "AS IS", without warranty of any kind. IBM shall not be
liable for any damages arising out of your use of the sample programs.
Each copy or any portion of these sample programs or any derivative work, must include a copyright
notice as follows:
© Copyright IBM Corp. _enter the year or years_. All rights reserved.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business
Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be
trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at
Copyright and trademark information at www.ibm.com/legal/copytrade.shtml.
Intel is a trademark of Intel Corporation or its subsidiaries in the United States and other countries.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or
its affiliates.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other countries, or
both.
UNIX is a registered trademark of the Open Group in the United States and other countries.
Applicability
These terms and conditions are in addition to any terms of use for the IBM website.
Personal use
You may reproduce these publications for your personal, noncommercial use provided that all
proprietary notices are preserved. You may not distribute, display or make derivative work of these
publications, or any portion thereof, without the express consent of IBM.
Commercial use
You may reproduce, distribute and display these publications solely within your enterprise provided that
all proprietary notices are preserved. You may not make derivative works of these publications, or
reproduce, distribute or display these publications or any portion thereof outside your enterprise, without
the express consent of IBM.
Rights
Except as expressly granted in this permission, no other permissions, licenses or rights are granted, either
express or implied, to the publications or any information, data, software or other intellectual property
contained therein.
IBM reserves the right to withdraw the permissions granted herein whenever, in its discretion, the use of
the publications is detrimental to its interest or, as determined by IBM, the above instructions are not
being properly followed.
You may not download, export or re-export this information except in full compliance with all applicable
laws and regulations, including all United States export laws and regulations.
IBM MAKES NO GUARANTEE ABOUT THE CONTENT OF THESE PUBLICATIONS. THE
PUBLICATIONS ARE PROVIDED "AS-IS" AND WITHOUT WARRANTY OF ANY KIND, EITHER
EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO IMPLIED WARRANTIES OF
MERCHANTABILITY, NON-INFRINGEMENT, AND FITNESS FOR A PARTICULAR PURPOSE.
This Software Offering does not use cookies or other technologies to collect personally identifiable
information.
If the configurations deployed for this Software Offering provide you as customer the ability to collect
personally identifiable information from end users via cookies and other technologies, you should seek
your own legal advice about any laws applicable to such data collection, including any requirements for
notice and consent.
For more information about the use of various technologies, including cookies, for these purposes, see
IBM’s Privacy Policy at http://www.ibm.com/privacy and IBM’s Online Privacy Statement at
http://www.ibm.com/privacy/details, in the section entitled “Cookies, Web Beacons and Other
Technologies,” and the “IBM Software Products and Software-as-a-Service Privacy Statement” at
http://www.ibm.com/software/info/product-privacy.
Glossary
N

namespace
Space reserved by a file system to contain the names of its objects.

Network File System (NFS)
A protocol, developed by Sun Microsystems, Incorporated, that allows any host in a network to gain access to another host or netgroup and their file directories.

Network Shared Disk (NSD)
A component for cluster-wide disk naming and access.

NSD volume ID
A unique 16-digit hexadecimal number that is used to identify and access all NSDs.

node
An individual operating-system image within a cluster. Depending on the way in which the computer system is partitioned, it may contain one or more nodes.

node descriptor
A definition that indicates how GPFS uses a node. Possible functions include: manager node, client node, quorum node, and nonquorum node.

node number
A number that is generated and maintained by GPFS as the cluster is created, and as nodes are added to or deleted from the cluster.

node quorum
The minimum number of nodes that must be running in order for the daemon to start.

node quorum with tiebreaker disks
A form of quorum that allows GPFS to run with as little as one quorum node available, as long as there is access to a majority of the quorum disks.

non-quorum node
A node in a cluster that is not counted for the purposes of quorum determination.

P

policy
A list of file-placement, service-class, and encryption rules that define characteristics and placement of files. Several policies can be defined within the configuration, but only one policy set is active at one time.

policy rule
A programming statement within a policy that defines a specific action to be performed.

pool
A group of resources with similar characteristics and attributes.

portability
The ability of a programming language to compile successfully on different operating systems without requiring changes to the source code.

primary GPFS cluster configuration server
In a GPFS cluster, the node chosen to maintain the GPFS cluster configuration data.

private IP address
An IP address used to communicate on a private network.

public IP address
An IP address used to communicate on a public network.

Q

quorum node
A node in the cluster that is counted to determine whether a quorum exists.

quota
The amount of disk space and number of inodes assigned as upper limits for a specified user, group of users, or fileset.

quota management
The allocation of disk blocks to the other nodes writing to the file system, and comparison of the allocated space to quota limits at regular intervals.

R

Redundant Array of Independent Disks (RAID)
A collection of two or more disk physical drives that present to the host an image of one or more logical disk drives. In the event of a single physical device failure, the data can be read or regenerated from the other disk drives in the array due to data redundancy.

recovery
The process of restoring access to file system data when a failure has occurred. Recovery can involve reconstructing data or providing alternative routing through a different server.
token management
Token management provides data consistency and controls conflicts. Token management has two components: the token management server, and the token management function.
token management function
A component of token management that
requests tokens from the token
management server. The token
management function is located on each
cluster node.
token management server
A component of token management that
controls tokens relating to the operation
of the file system. The token management
server is located at the file system
manager node.
twin-tailed
A disk connected to two nodes.
U
user storage pool
A storage pool containing the blocks of
data that make up user files.
V
VFS See virtual file system.
virtual file system (VFS)
A remote file system that has been
mounted so that it is accessible to the
local user.
virtual node (vnode)
The structure that contains information
about a file system object in a virtual file
system (VFS).
Index
error messages (continued) File Placement Optimizer (FPO), questions related to 148
mmbackup 116 file placement policy 112
mmfsd ready 80 file system 95
multiple file system manager failures 108 mount status 108
network problems 82 space 110
quorum 88 file system descriptor 106, 107
rsh problems 76 failure groups 106
shared segment problems 81, 82 inaccessible 107
snapshot 116, 117, 118 file system manager
TSM 116 cannot appoint 105
error numbers contact problems
application calls 98 communication paths unavailable 96
configuration problems 78 multiple failures 108
data corruption 124 file system mount failure 143
EALL_UNAVAIL = 218 108 file system or fileset getting full 148
ECONFIG = 208 78 file systems
ECONFIG = 215 78, 82 cannot be unmounted 50
ECONFIG = 218 79 creation failure 90
ECONFIG = 237 78 determining if mounted 108
ENO_MGR = 212 109, 135 discrepancy between configuration data and on-disk
ENO_QUOTA_INST = 237 98 data 109
EOFFLINE = 208 135 do not mount 95
EPANIC = 666 107 does not mount 95
EVALIDATE = 214 124 does not unmount 104
file system forced unmount 107 forced unmount 22, 105, 108
GPFS application calls 135 free space shortage 118
GPFS daemon will not come up 82 listing mounted 50
installation problems 78 loss of access 91
multiple file system manager failures 109 remote 100
errors, application program 92 state after restore 118
errors, Persistent Reserve 138 unable to determine if mounted 108
errpt command 167 will not mount 49
events FILE_SIZE attribute 55, 56
Availability 151 files
Reliability 151 /etc/filesystems 96
Serviceability 151 /etc/fstab 96
example /etc/group 21
error logs 22 /etc/hosts 74
EXCLUDE rule 55 /etc/passwd 21
excluded file 55 /etc/resolv.conf 93
attributes 55 /usr/lpp/mmfs/bin/runmmfs 34
extended attribute size supported by AFM 148 /usr/lpp/mmfs/samples/gatherlogs.samples.sh 3
/var/adm/ras/mmfs.log.previous 89
/var/mmfs/etc/mmlock 77
F /var/mmfs/gen/mmsdrfs 77
.rhosts 76
facility
detecting damage 59
Linux kernel crash dump (LKCD) 71
mmfs.log 2, 80, 81, 83, 95, 99, 100, 101, 102, 103, 104, 105,
failure
167
disk 130
mmsdrbackup 78
mmccr command 148
mmsdrfs 78
mmfsck command 147
protocol authentication log 9
of disk media 132
FILESET_NAME attribute 55, 56
snapshot 116
filesets
failure creating a file 143
child 113
failure group 106
deleting 113
failure groups
emptying 113
loss of 107
errors 114
use of 106
lost+found 114
failure, key rewrap 144
moving contents 113
failure, mount 143
performance 113
failures
problems 109
mmbackup 116
snapshots 113
File Authentication
unlinking 113
setup problems 93
usage errors 113
file creation failure 143
FPO 148
file migration
FSDesc structure 106
problems 113
GPFS (continued) GPFS (continued)
mount 98, 100, 147 snapshot usage errors 117
mount failure 103, 143 some files are 'ill-placed' 111
mounting cluster 102 stale inode data 121
mounting cluster does not have direct access to the storage pools 114, 115
disks 102 strict replication 134
multipath device 141 system load increase in night 146
multiple file system manager failures 108 timeout executing function error message 148
negative values in the 'predicted pool utilizations', 111 trace facility 34
NFS client 121 tracing the mmpmon command 120
NFS problems 121 TSM error messages 116
NFS V4 121 UID mapping 100
NFS V4 issues 121 unable to access disks 131
NFS V4 problem 121 unable to determine if a file system is mounted 108
no replication 134 unable to start 73
NO_SPACE error 110 underlying disk subsystem failures 127
nodes will not start 81 understanding Persistent Reserve 138
NSD creation failure 130 unmount failure 104
NSD disk does not have an NSD server specified 102 unused underlying multipath device 141
NSD information 128 usage errors 111, 114
NSD is down 130 using mmpmon 119
NSD server 103 value to large failure 143
NSD subsystem failures 127 value to large failure while creating a file 143
NSDs built on top of AIX logical volume is down 136 varyon problems 137
offline mmfsck command failure 147 volume group 137
old inode data 121 volume group on each node 137
on-disk data 109 Windows file system 147
Operating system error logs 19 Windows issues 92, 93
partial disk failure 136 working with Samba 123
permission denied error message 103 GPFS cluster
permission denied failure 144 problems adding nodes 87
Persistent Reserve errors 138 recovery from loss of GPFS cluster configuration data
physical disk association 145 files 77
physical disk association with logical volume 145 GPFS cluster data
policies 111, 112 backup 78
predicted pool utilizations 111 locked 77
problem determination hints 145 GPFS cluster data files storage 77
problem determination tips 145 GPFS command
problems not directly related to snapshots 116 failed 89
problems while working with Samba in 123 return code 89
problems with locating a snapshot 116 unsuccessful 89
problems with non-IBM disks 138 GPFS commands
protocol service logs 3, 6, 8, 11, 13 unsuccessful 89
quorum nodes in cluster 145 GPFS configuration data 109
RAS events 151 GPFS daemon 75, 79, 80, 95, 105
Reliability 151 crash 83
remote cluster name 101 fails to start 80
remote command issues 75, 76 went down 20, 83
remote file system 100, 101 will not start 79
remote file system does not mount 100, 101 GPFS daemon went down 83
remote file system I/O failure 100 GPFS failure
remote mount failure 103 network failure 84
replicated data 133 GPFS GUI logs 41
replicated metadata 133, 134 GPFS is not using the underlying multipath device 141
replication 132, 134 GPFS kernel extension 79
Requeing message 124 GPFS local node failure 102
requeuing of messages in AFM 124 GPFS log 1, 2, 80, 81, 83, 95, 99, 100, 101, 102, 103, 104, 105,
restoring a snapshot 118 167
Samba 123 GPFS messages 173
security issues 75 GPFS modules
Serviceability 151 cannot be loaded 79
set up 38 unable to load on Linux 79
setup issues 119 GPFS problems 73, 95, 127
SMB server health 122 GPFS startup time 2
snapshot directory name conflict 118 GPFS trace facility 34
snapshot problems 116 GPFS Windows SMB2 protocol (CIFS serving) 93
snapshot status errors 117 gpfs.snap 26
IBM Spectrum Scale (continued) IBM Spectrum Scale (continued)
file system is known to have adequate free space 110 mmapplypolicy -L 5 command 55
file system is mounted 108 mmapplypolicy -L 6 command 56
file system manager appointment fails 109 mmapplypolicy -L command 52, 53, 54, 55, 56
file system manager failures 109 mmapplypolicy command 51
file system mount problems 97, 98 mmdumpperfdata command 31
file system mount status 108 mmfileid command 59
file system mounting on wrong drive 147 MMFS_DISKFAIL 20
file systems manager failure 108 MMFS_ENVIRON
filesets usage errors 113 error log 20
GPFS cluster security configurations 101 MMFS_FSSTRUCT error log 20
GPFS commands unsuccessful 90 MMFS_GENERIC error log 20
GPFS daemon does not start 82 MMFS_LONGDISKIO 21
GPFS daemon issues 79, 80, 81, 82, 83 mmfsadm command 33
GPFS declared NSD is down 130 mmlscluster command 44
GPFS disk issues 85, 127 mmlsconfig command 45
GPFS down on contact nodes 102 mmlsmount command 50
GPFS error message 97 mmrefresh command 45
GPFS error messages 108, 117 mmremotecluster command 101
GPFS error messages for disk media failures 135 mmsdrrestore command 46
GPFS error messages for file system forced unmount mmwindisk command 58
problems 107 mount 98, 100
GPFS error messages for file system mount status 108 mount failure 103
GPFS error messages for mmbackup errors 116 mount failure as the client nodes joined before NSD
GPFS failure servers 103
network issues 84 mount failure for a file system 143
GPFS file system issues 95 mounting cluster does not have direct access to the
GPFS has declared NSDs built on top of AIX logical disks 102
volume as down 136 multiple file system manager failures 108
GPFS is not running on the local node 102 negative values occur in the 'predicted pool
GPFS modules utilizations', 111
unable to load on Linux 79 newly mounted windows file system is not displayed 147
gpfs.snap 23, 24, 25 NFS client 121
gpfs.snap command 25 NFS on Linux 27
Linux platform 25 NFS problems 121
gpfs.snap command NFS V4 issues 121
usage 23 no replication 134
guarding against disk failures 132 NO_SPACE error 110
GUI logs 41 NSD and underlying disk subsystem failures 127
hang in mmpmon 120 NSD creation fails 130
HDFS transparency log 8 NSD disk does not have an NSD server specified 102
hints and tips for problem determination 145 NSD server 103
hosts file issue 74 Object logs 6
incorrect output from mmpmon 120, 151 offline mmfsck command failure 147
installation and configuration issues 73, 74, 77, 78, 79, 80, old NFS inode data 121
81, 82, 83, 85, 87, 88, 89, 90, 91, 92 operating system error logs 19, 20, 21, 22
key rewrap 144 operating system logs 19, 20, 21, 22
log 2 other problem determination tools 71
logical volume 145 partial disk failure 136
logical volumes are properly defined for GPFS use 136 performance issues 86
logs 1 permission denied error message 103
lsof command 50 permission denied failure 144
manually disabling Persistent Reserve 140 Persistent Reserve errors 138
manually enabling Persistent Reserve 140 physical disk association 145
master log file 2 policies 111
master snapshot 25 problem determination 145
message 6027-648 147 problems while working with Samba 123
message referring to an existing NSD 130 problems with locating a snapshot 116
message requeuing in AFM 124 problems with non-IBM disks 138
message severity tags 171 protocol service logs 3, 6, 8, 11, 13
messages 173 quorum loss 85
mmafmctl Device getstate 43 quorum nodes 145
mmapplypolicy -L 0 command 52 quorum nodes in cluster 145
mmapplypolicy -L 1 command 52 remote cluster name 101
mmapplypolicy -L 2 command 53 remote cluster name does not match with the cluster
mmapplypolicy -L 3 command 54 name 101
mmapplypolicy -L 4 command 55 remote command issues 75, 76
mmccr command mmlsdisk command 90, 95, 96, 106, 109, 127, 130, 132, 135,
failure 148 168
mmchcluster command 75 mmlsfileset command 113
mmchconfig command 45, 80, 88, 103 mmlsfs command 97, 133, 134, 167
mmchdisk command 96, 106, 109, 114, 115, 127, 130, 131, 133, mmlsmgr command 33, 96
135 mmlsmount command 50, 80, 91, 95, 105, 106, 127
mmcheckquota command 21, 57, 92, 106 mmlsnsd command 57, 128, 136
mmchfs command 22, 78, 86, 90, 96, 98, 106, 123 mmlspolicy command 112
mmchnode command 146 mmlsquota command 91, 92
mmchnsd command 127 mmlssnapshot command 116, 117, 118
mmchpolicy mmmount command 49, 95, 106, 138
issues with adding encryption policy 143 mmpmon
mmcommon 98, 99 abend 120
mmcommon breakDeadlock 67 altering input file 119
mmcommon recoverfs command 109 concurrent usage 119
mmcommon showLocks command 77 counters wrap 120
mmcrcluster command 45, 75, 80, 87, 146 dump 120
mmcrfs command 90, 123, 127, 138 hang 120
mmcrnsd command 127, 130 incorrect input 119
mmcrsnapshot command 117, 118 incorrect output 120
mmdefedquota command fails 147 restrictions 119
mmdeldisk command 109, 114, 133, 136 setup problems 119
mmdelfileset command 113 unsupported features 119
mmdelfs command 134, 135 mmpmon command 71, 119
mmdelnode command 87, 90 trace 120
mmdelnsd command 130, 134 mmquotaoff command 92
mmdelsnapshot command 117 mmquotaon command 92
mmdf command 86, 110, 136 mmrefresh command 45, 96, 98
mmdiag command 43 mmremotecluster command 61, 101, 102
mmdsh command 76 mmremotefs command 98, 101
mmdumpperfdata 31 mmrepquota command 92
mmedquota command fails 147 mmrestorefs command 117, 118, 119
mmexpelnode command 46 mmrestripefile command 112, 115
mmfileid command 59, 124, 133 mmrestripefs command 115, 133, 136
MMFS_ABNORMAL_SHUTDOWN mmrpldisk command 109, 114, 138
error logs 20 mmsdrbackup 78
MMFS_DISKFAIL mmsdrfs 78
error logs 20 mmsdrrestore command 46
MMFS_ENVIRON mmshutdown command 44, 46, 80, 81, 83, 98, 99
error logs 20 mmsnapdir command 116, 118, 119
MMFS_FSSTRUCT mmstartup command 80, 98, 99
error logs 20 mmtracectl command
MMFS_GENERIC generating GPFS trace reports 34
error logs 20 mmumount command 105, 106, 136
MMFS_LONGDISKIO mmunlinkfileset command 113
error logs 21 mmwindisk command 58
MMFS_QUOTA mode of AFM fileset, changing 148
error log 21 MODIFICATION_TIME attribute 55, 56
error logs 21, 57 module is incompatible 79
MMFS_SYSTEM_UNMOUNT mount
error logs 22 problems 103
MMFS_SYSTEM_WARNING mount command 95, 96, 98, 134, 138
error logs 22 mount failure 143
mmfs.log 2, 80, 81, 83, 95, 99, 100, 101, 102, 103, 104, 105, 167 mounting cluster 102
mmfsadm command 33, 37, 81, 87, 124, 133 Mounting file system
mmfsck command 49, 95, 96, 114, 124, 134, 136 error messages 97
failure 147 Multi-Media LAN Server 1
mmfsd 79, 80, 95, 105
will not start 79
mmfslinux
kernel module 79
N
network failure 84
mmgetstate command 43, 81, 89
network problems 20
mmlock directory 77
NFS 26, 121
mmlsattr command 113
problems 121
mmlscluster command 44, 87, 101, 145
NFS client
mmlsconfig command 34, 45, 98
with stale inode data 121
P Q
quorum 81, 145
partitioning information, viewing 58
disk 85
performance 26, 75
loss 85
permission denied
quorum node 145
remote mounts failure 103
quota
permission denied failure (key rewrap) 144
cannot write to quota file 106
Persistent Reserve
denied 91
checking 139
error number 78
clearing a leftover reservation 139
quota files 57
errors 138
quota problems 21
manually enabling or disabling 140
understanding 138
ping command 76
PMR 169 R
policies RAID controller 132
DEFAULT clause 112 rcp command 75
deleting referenced objects 112 read-only mode mount 49
errors 112 recovery
file placement 112 cluster configuration data 77
incorrect file placement 112 recovery log 85
LIMIT clause 111 recreation of GPFS storage file
long runtime 112 mmchcluster -p LATEST 77
Reliability 151 storage pools
remote command problems 75 deleting 112, 115
remote file copy command errors 115
default 75 failure groups 114
remote file system problems 109
mount 101 slow access time 115
remote file system I/O fails with "Function not implemented" usage errors 114
error 100 strict replication 134
remote mounts fail with permission denied 103 subnets attribute 88
remote node support for troubleshooting
expelled 88 contacting IBM support center 167
remote node expelled 88 how to contact IBM support center 169
remote shell information to be collected before contacting IBM support
default 75 center 167
removing the setuid bit 83 syslog facility
replicated Linux 19
metadata 134 syslogd 100
replicated data 133 system load 146
replicated metadata 133 system snapshots 23
replication 114 system storage pool 111, 114
of data 132 System z 74, 168
replication, none 134
reporting a problem to IBM 33
resetting of setuid/setgits at AFM home 148
restricted mode mount 49
T
threads
rpm command 167
tuning 75
rsh
waiting 87
problems using 75
Tivoli Storage Manager server 116
rsh command 75, 89
trace
active file management 35
allocation manager 35
S basic classes 35
Samba behaviorals 37
client failure 123 byte range locks 35
scp command 76 call to routines in SharkMsg.h 36
Secure Hash Algorithm digest 61 checksum services 35
Serviceability 151 cleanup routines 35
serving (CIFS), Windows SMB2 protocol 93 cluster security 37
set up concise vnop description 37
core dumps 38 daemon routine entry/exit 35
setuid bit, removing 83 daemon specific code 37
setuid/setgid bits at AFM home, resetting of 148 data shipping 35
severity tags defragmentation 35
messages 171 dentry operations 35
SHA digest 61, 101 disk lease 35
shared segments 81 disk space allocation 35
problems 82 DMAPI 35
SMB 26 error logging 35
SMB on Linux 26 events exporter 35
SMB server 122 file operations 35
SMB service file system 35
log 3 generic kernel vfs information 36
logs 4 inode allocation 36
SMB2 protocol (CIFS serving), Windows 93 interprocess locking 36
snapshot kernel operations 36
directory name conflict 118 kernel routine entry/exit 36
error messages 116, 117 low-level vfs locking 36
invalid state 117 mailbox message handling 36
restoring 118 malloc/free in shared segment 36
status error 117 miscellaneous tracing and debugging 37
usage error 117 mmpmon 36
valid 116 mnode operations 36
snapshot problems 116 mutexes and condition variables 36
ssh command 76 network shared disk 36
steps to follow online multinode fsck 36
GPFS daemon does not come up 80 operations in Thread class 37
page allocator 36
U
UID mapping 100
umount command 105, 106
unable to start GPFS 81
underlying multipath device 141
understanding, Persistent Reserve 138
unsuccessful GPFS commands 89
usage errors
policies 111
useNSDserver attribute 135
USER_ID attribute 55, 56
using the gpfs.snap command 23
V
v 75
value too large failure 143
varyon problems 137
varyonvg command 138
viewing disks and partitioning information 58
volume group 137
IBM®
Printed in USA
GA76-0443-06