V7.1.0 Troubleshooting Guide for IBM System Storage SAN Volume Controller
Troubleshooting Guide
GC27-2284-04
Note
Before using this information and the product it supports, read the information in “Notices” on page 319.
| This edition applies to IBM System Storage SAN Volume Controller, Version 7.1, and to all subsequent releases and
| modifications until otherwise indicated in new editions.
This edition replaces GC27-2284-03.
© Copyright IBM Corporation 2003, 2013.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract
with IBM Corp.
Contents

Figures . . . vii
Tables . . . ix
About this guide . . . xi
Who should use this guide . . . xi
Emphasis . . . xi
SAN Volume Controller library and related publications . . . xi
How to order IBM publications . . . xiv
Related websites . . . xv
Sending your comments . . . xv
How to get information, help, and technical assistance . . . xv
Summary of changes for GC27-2284-04 SAN Volume Controller Troubleshooting Guide . . . xvii
Summary of changes for GC27-2284-03 SAN Volume Controller Troubleshooting Guide . . . xviii

Chapter 1. SAN Volume Controller overview . . . 1
Systems . . . 5
Configuration node . . . 5
Configuration node addressing . . . 5
Management IP failover . . . 6
SAN fabric overview . . . 7

Chapter 2. Introducing the SAN Volume Controller hardware components . . . 9
SAN Volume Controller nodes . . . 9
SAN Volume Controller front panel controls and indicators . . . 9
SAN Volume Controller operator-information panel . . . 14
SAN Volume Controller rear-panel indicators and connectors . . . 19
Fibre Channel port numbers and worldwide port names . . . 35
Requirements for the SAN Volume Controller environment . . . 36
Redundant ac-power switch . . . 47
Redundant ac-power environment requirements . . . 48
Cabling of redundant ac-power switch (example) . . . 49
Uninterruptible power supply . . . 52
2145 UPS-1U . . . 53
Uninterruptible power-supply environment requirements . . . 57
Defining the SAN Volume Controller FRUs . . . 58
SAN Volume Controller FRUs . . . 58
Redundant ac-power switch FRUs . . . 67

Chapter 3. SAN Volume Controller user interfaces for servicing your system . . . 69
Management GUI interface . . . 69
When to use the management GUI . . . 70
Accessing the management GUI . . . 70
Deleting a node from a clustered system using the management GUI . . . 71
Adding nodes to a clustered system . . . 73
Service assistant interface . . . 76
When to use the service assistant . . . 76
Accessing the service assistant . . . 77
Cluster (system) command-line interface . . . 77
When to use the cluster (system) CLI . . . 77
Accessing the cluster (system) CLI . . . 78
Service command-line interface . . . 78
When to use the service CLI . . . 78
Accessing the service CLI . . . 78
Front panel interface . . . 78

Chapter 4. Performing recovery actions using the SAN Volume Controller CLI . . . 79
Validating and repairing mirrored volume copies using the CLI . . . 79
Repairing a space-efficient volume using the CLI . . . 80
Recovering from offline volumes using the CLI . . . 81
Replacing nodes nondisruptively . . . 82

Chapter 5. Viewing the vital product data . . . 89
Viewing the vital product data using the management GUI . . . 89
Displaying the vital product data using the CLI . . . 89
Displaying node properties using the CLI . . . 89
Displaying clustered system properties using the CLI . . . 90
Fields for the node VPD . . . 92
Fields for the system VPD . . . 96

Chapter 6. Using the front panel of the SAN Volume Controller . . . 99
Boot progress indicator . . . 99
Boot failed . . . 99
Charging . . . 100
Error codes . . . 100
Hardware boot . . . 100
Node rescue request . . . 100
Power failure . . . 101
Powering off . . . 101
Recovering . . . 102
Restarting . . . 102
Shutting down . . . 102
Validate WWNN? option . . . 103
SAN Volume Controller menu options . . . 104
Cluster (system) options . . . 106
Node options . . . 108
Version options . . . 108
Ethernet options . . . 108
Figures

1. SAN Volume Controller system in a fabric . . . 2
2. Data flow in a SAN Volume Controller system . . . 3
3. SAN Volume Controller nodes with internal SSDs . . . 4
4. Configuration node . . . 5
5. SAN Volume Controller 2145-CG8 front panel . . . 10
6. SAN Volume Controller 2145-CF8 front panel . . . 10
7. SAN Volume Controller 2145-8A4 front-panel assembly . . . 11
8. SAN Volume Controller 2145-8G4 front-panel assembly . . . 11
9. SAN Volume Controller 2145-8F2 and SAN Volume Controller 2145-8F4 front-panel assembly . . . 12
10. SAN Volume Controller 2145-CG8 or 2145-CF8 operator-information panel . . . 14
11. SAN Volume Controller 2145-CG8 or 2145-CF8 operator-information panel . . . 15
12. SAN Volume Controller 2145-8A4 operator-information panel . . . 16
13. SAN Volume Controller 2145-8G4 operator-information panel . . . 16
14. SAN Volume Controller 2145-8F2 and SAN Volume Controller 2145-8F4 operator-information panel . . . 17
15. SAN Volume Controller 2145-CG8 rear-panel indicators . . . 19
16. SAN Volume Controller 2145-CG8 rear-panel indicators for the 10 Gbps Ethernet feature . . . 19
17. Connectors on the rear of the SAN Volume Controller 2145-CG8 . . . 20
18. 10 Gbps Ethernet ports on the rear of the SAN Volume Controller 2145-CG8 . . . 20
19. Power connector . . . 21
20. Service ports of the SAN Volume Controller 2145-CG8 . . . 21
21. SAN Volume Controller 2145-CG8 port not used . . . 21
22. SAN Volume Controller 2145-CF8 rear-panel indicators . . . 22
23. Connectors on the rear of the SAN Volume Controller 2145-CG8 or 2145-CF8 . . . 22
24. Power connector . . . 23
25. Service ports of the SAN Volume Controller 2145-CF8 . . . 23
26. SAN Volume Controller 2145-CF8 port not used . . . 24
27. SAN Volume Controller 2145-8A4 rear-panel indicators . . . 24
28. SAN Volume Controller 2145-8A4 external connectors . . . 25
29. Power connector . . . 25
30. Service ports of the SAN Volume Controller 2145-8A4 . . . 25
31. SAN Volume Controller 2145-8G4 rear-panel indicators . . . 26
32. SAN Volume Controller 2145-8G4 external connectors . . . 26
33. Power connector . . . 27
34. Service ports of the SAN Volume Controller 2145-8G4 . . . 27
35. SAN Volume Controller 2145-8F4 rear-panel indicators . . . 28
36. SAN Volume Controller 2145-8F4 external connectors . . . 28
37. Power connector . . . 29
38. Service ports of the SAN Volume Controller 2145-8F4 . . . 29
39. Ports not used during normal operation by the SAN Volume Controller 2145-8F4 . . . 30
40. Ports not used on the front panel of the SAN Volume Controller 2145-8F4 . . . 30
41. SAN Volume Controller 2145-8F2 rear-panel indicators . . . 30
42. SAN Volume Controller 2145-8F2 external connectors . . . 31
43. Power connector . . . 31
44. SAN Volume Controller 2145-CG8 or 2145-CF8 ac, dc, and power-error LEDs . . . 34
45. SAN Volume Controller 2145-8G4 ac and dc LEDs . . . 34
46. SAN Volume Controller 2145-8F4 and SAN Volume Controller 2145-8F2 ac and dc LEDs . . . 35
47. Photo of the redundant ac-power switch . . . 48
48. A four-node SAN Volume Controller system with the redundant ac-power switch feature . . . 50
49. Rack cabling example . . . 52
50. 2145 UPS-1U front-panel assembly . . . 54
51. 2145 UPS-1U connectors and switches . . . 56
52. 2145 UPS-1U dip switches . . . 57
53. Ports not used by the 2145 UPS-1U . . . 57
54. Power connector . . . 57
55. SAN Volume Controller front-panel assembly . . . 99
56. Example of a boot progress display . . . 99
57. Example of an error code for a clustered system . . . 100
58. Example of a node error code . . . 100
59. Node rescue display . . . 101
60. Validate WWNN? navigation . . . 103
61. SAN Volume Controller options on the front-panel display . . . 105
62. Viewing the IPv6 address on the front-panel display . . . 108
63. Upper options of the actions menu on the front panel . . . 112
64. Middle options of the actions menu on the front panel . . . 113
65. Lower options of the actions menu on the front panel . . . 114
66. Language? navigation . . . 124
67. Example of a boot error code . . . 153
68. Example of a boot progress display . . . 153
The chapters that follow introduce you to the SAN Volume Controller, the
redundant ac-power switch, and the uninterruptible power supply. They describe
how you can configure and check the status of one SAN Volume Controller node
or a clustered system of nodes through the front panel or with the management
GUI.
The vital product data (VPD) chapter provides information about the VPD that
uniquely defines each hardware and microcode element that is in the SAN Volume
Controller. You can also learn how to diagnose problems using the SAN Volume
Controller.
The maintenance analysis procedures (MAPs) can help you analyze failures that
occur in a SAN Volume Controller. With the MAPs, you can isolate the
field-replaceable units (FRUs) of the SAN Volume Controller that fail. Begin all
problem determination and repair procedures from “MAP 5000: Start” on page 235.
Emphasis
Different typefaces are used in this guide to show emphasis.
The IBM System Storage SAN Volume Controller Information Center contains all of
the information that is required to install, configure, and manage the SAN Volume
Controller. The information center is updated between SAN Volume Controller
product releases to provide the most current documentation. The information
center is available at the following website:
publib.boulder.ibm.com/infocenter/svc/ic/index.jsp
Unless otherwise noted, the publications in the SAN Volume Controller library are
available in Adobe portable document format (PDF) from the following website:
www.ibm.com/e-business/linkweb/publications/servlet/pbi.wss
The following table lists websites where you can find help, services, and more
information:
Table 1. IBM websites for help, services, and information

Website                                                        Address
Directory of worldwide contacts                                http://www.ibm.com/planetwide
Support for SAN Volume Controller (2145)                       www.ibm.com/storage/support/2145
Support for IBM System Storage and IBM TotalStorage products   www.ibm.com/storage/support/
Each of the PDF publications in Table 2 is also available in the information
center by clicking the number in the “Order number” column:
Table 2. SAN Volume Controller library

Title: IBM System Storage SAN Volume Controller Model 2145-CG8 Hardware Installation Guide
Description: This guide provides the instructions that the IBM service representative uses to install the hardware for SAN Volume Controller model 2145-CG8.
Order number: GC27-3923

Title: IBM System Storage SAN Volume Controller Hardware Maintenance Guide
Description: This guide provides the instructions that the IBM service representative uses to service the SAN Volume Controller hardware, including the removal and replacement of parts.
Order number: GC27-2283

Title: IBM System Storage SAN Volume Controller Troubleshooting Guide
Description: This guide describes the features of each SAN Volume Controller model, explains how to use the front panel, and provides maintenance analysis procedures to help you diagnose and solve problems with the SAN Volume Controller.
Order number: GC27-2284
Table 3 lists IBM publications that contain information related to the SAN Volume
Controller.
Table 3. Other IBM publications

Title: IBM System Storage Productivity Center Introduction and Planning Guide
Description: This guide introduces the IBM System Storage Productivity Center hardware and software.
Order number: SC23-8824

Title: Read This First: Installing the IBM System Storage Productivity Center
Description: This guide describes how to install the IBM System Storage Productivity Center hardware.
Order number: GI11-8938

Title: IBM System Storage Productivity Center User's Guide
Description: This guide describes how to configure the IBM System Storage Productivity Center software.
Order number: SC27-2336
Table 4 lists websites that provide publications and other information about the
SAN Volume Controller or related products or technologies.
Table 4. IBM documentation and related websites

Website: IBM Storage Management Pack for Microsoft System Center Operations Manager (SCOM)
Address: The IBM Storage Host Software Solutions Information Center describes how to install, configure, and use the IBM Storage Management Pack for Microsoft System Center Operations Manager.

Website: IBM Storage Management Console for VMware vCenter
Address: The IBM Storage Host Software Solutions Information Center describes how to install, configure, and use the IBM Storage Management Console for VMware vCenter, which enables SAN Volume Controller and other IBM storage systems to be integrated in VMware vCenter environments.

Website: IBM Storage Device Driver for VMware VAAI
Address: The IBM Storage Host Software Solutions Information Center describes how to install, configure, and use the IBM Storage Device Driver for VMware VAAI.

Website: IBM Storage Management Console for VMware vCenter Site Recovery Manager (SRM)
Address: The IBM Storage Host Software Solutions Information Center describes how to install, configure, and use the IBM Storage Management Console for VMware vCenter Site Recovery Manager.

Website: IBM Publications Center
Address: www.ibm.com/e-business/linkweb/publications/servlet/pbi.wss

Website: IBM Redbooks® publications
Address: www.redbooks.ibm.com/
To view a PDF file, you need Adobe Acrobat Reader, which can be downloaded
from the Adobe website:
www.adobe.com/support/downloads/main.html
The IBM Publications Center offers customized search functions to help you find
the publications that you need. Some publications are available for you to view or
download at no charge from the IBM Publications Center:
www.ibm.com/e-business/linkweb/publications/servlet/pbi.wss
Related websites
The following websites provide information about SAN Volume Controller or
related products or technologies:
To submit any comments about this book or any other SAN Volume Controller
documentation:
v Go to the feedback page on the website for the SAN Volume Controller
Information Center at publib.boulder.ibm.com/infocenter/svc/ic/
index.jsp?topic=/com.ibm.storage.svc.console.doc/feedback.htm. There you can
use the feedback page to enter and submit comments or browse to the topic and
use the feedback link in the running footer of that page to identify the topic for
which you have a comment.
v Send your comments by email to [email protected]. Include the following
information for this publication or use suitable replacements for the publication
title and form number for the publication on which you are commenting:
– Publication title: IBM System Storage SAN Volume Controller Troubleshooting
Guide
– Publication form number: GC27-2284-04
– Page, table, or illustration numbers that you are commenting on
– A detailed description of any information that should be changed
Information
IBM maintains pages on the web where you can get information about IBM
products and fee services, product implementation and usage assistance, break and
fix service support, and the latest technical information. For more information,
refer to Table 5 on page xvi.
Note: Available services, telephone numbers, and web links are subject to change
without notice.
Before calling for support, be sure to have your IBM Customer Number available.
If you are in the US or Canada, you can call 1 (800) IBM SERV for help and
service. From other parts of the world, see http://www.ibm.com/planetwide for
the number that you can call.
When calling from the US or Canada, choose the storage option. The agent decides
where to route your call, to either storage software or storage hardware, depending
on the nature of your problem.
If you call from somewhere other than the US or Canada, you must choose the
software or hardware option when calling for assistance. Choose the software
option if you are uncertain if the problem involves the SAN Volume Controller
software or hardware. Choose the hardware option only if you are certain the
problem solely involves the SAN Volume Controller hardware. When calling IBM
for service regarding the product, follow these guidelines for the software and
hardware options:
Software option
Identify the SAN Volume Controller product as your product and supply
your customer number as proof of purchase. The customer number is a
7-digit number (0000000 to 9999999) assigned by IBM when the product is
purchased. Your customer number should be located on the customer
information worksheet or on the invoice from your storage purchase. If
asked for an operating system, use Storage.
Hardware option
Provide the serial number and appropriate 4-digit machine type. For the
SAN Volume Controller, the machine type is 2145.
In the US and Canada, hardware service and support can be extended to 24x7 on
the same day. The base warranty is 9x5 on the next business day.
You can find information about products, solutions, partners, and support on the
IBM website.
To find up-to-date information about products, services, and partners, visit the IBM
website at www.ibm.com/storage/support/2145.
Make sure that you have taken steps to try to solve the problem yourself before
you call.
Some suggestions for resolving the problem before calling IBM Support include:
v Check all cables to make sure that they are connected.
v Check all power switches to make sure that the system and optional devices are
turned on.
v Use the troubleshooting information in your system documentation. The
troubleshooting section of the information center contains procedures to help
you diagnose problems.
v Go to the IBM Support website at www.ibm.com/storage/support/2145 to check
for technical information, hints, tips, and new device drivers or to submit a
request for information.
If you have questions about how to use the machine and how to configure the
machine, sign up for the IBM Support Line offering to get a professional answer.
The maintenance supplied with the system provides support when there is a
problem with a hardware component or a fault in the system machine code. At
times, you might need expert advice about using a function provided by the
system or about how to configure the system. Purchasing the IBM Support Line
offering gives you access to this professional advice. Acting on this advice while
deploying your system can prevent issues later.
Contact your local IBM sales office or IBM Support to determine whether this
offering is available in your country and to purchase it.
Summary of changes for GC27-2284-04 SAN Volume Controller Troubleshooting Guide

This topic describes the changes to this guide since the previous edition,
GC27-2284-03.

Changed information

New information

Summary of changes for GC27-2284-03 SAN Volume Controller Troubleshooting Guide

This topic describes the changes to this guide since the previous edition,
GC27-2284-02. The following sections summarize the changes that have been
implemented since the previous version.

Changed information
A SAN is a high-speed Fibre Channel network that connects host systems and
storage devices. In a SAN, a host system can be connected to a storage device
across the network. The connections are made through units such as routers and
switches. The area of the network that contains these units is known as the fabric of
the network.
The SAN Volume Controller software performs the following functions for the host
systems that attach to SAN Volume Controller:
v Creates a single pool of storage
v Provides logical unit virtualization
v Manages logical volumes
v Mirrors logical volumes
The SAN Volume Controller system also provides the following functions:
v Large scalable cache
v Copy Services
– IBM FlashCopy® (point-in-time copy) function, including thin-provisioned
FlashCopy to make multiple targets affordable
– Metro Mirror (synchronous copy)
– Global Mirror (asynchronous copy)
– Data migration
v Space management
– IBM System Storage Easy Tier® to migrate the most frequently used data to
higher performing storage
– Metering of service quality when combined with IBM Tivoli® Storage
Productivity Center
– Thin-provisioned logical volumes
– Compressed volumes to consolidate storage
Figure 1 on page 2 shows hosts, SAN Volume Controller nodes, and RAID storage
systems connected to a SAN fabric. The redundant SAN fabric comprises a
fault-tolerant arrangement of two or more counterpart SANs that provide alternate
paths for each SAN-attached device.
Figure 1. SAN Volume Controller system in a fabric (hosts in a host zone and RAID storage systems in a storage system zone, connected to the nodes through a redundant SAN fabric)
Volumes
A system of SAN Volume Controller nodes presents volumes to the hosts. Most of
the advanced functions that SAN Volume Controller provides are defined on
volumes. These volumes are created from managed disks (MDisks) that are
presented by the RAID storage systems. All data transfer occurs through the SAN
Volume Controller nodes, which is described as symmetric virtualization.
Figure 2. Data flow in a SAN Volume Controller system (I/O is sent through the redundant SAN fabric to the nodes and on to the managed disks of the RAID storage systems)
Data transfer
The nodes in a system are arranged into pairs known as I/O groups. A single pair is
responsible for serving I/O on a given volume. Because a volume is served by two
nodes, there is no loss of availability if one node fails or is taken offline.
System management
The SAN Volume Controller nodes in a clustered system operate as a single system
and present a single point of control for system management and service. System
management and error reporting are provided through an Ethernet interface to one
of the nodes in the system, which is called the configuration node. The configuration
node runs a web server and provides a command-line interface (CLI). The
configuration node is a role that any node can take. If the current configuration
node fails, a new configuration node is selected from the remaining nodes. Each
node also provides a command-line interface and web interface for performing
hardware service actions.
Fabric types
I/O operations between hosts and SAN Volume Controller nodes and between
SAN Volume Controller nodes and RAID storage systems are performed by using
the SCSI standard. The SAN Volume Controller nodes communicate with each
other by using private SCSI commands.
Table 6 on page 4 shows the fabric type that can be used for communicating
between hosts, nodes, and RAID storage systems. These fabric types can be used at
the same time.
Solid-state drives
Some SAN Volume Controller nodes contain solid-state drives (SSDs). These
internal SSDs can be used to create RAID-managed disks (MDisks) that in turn can
be used to create volumes. SSDs provide host servers with a pool of
high-performance storage for critical applications.
Figure 3 shows this configuration. Internal SSD MDisks can also be placed in a
storage pool with MDisks from regular RAID storage systems, and IBM System
Storage Easy Tier performs automatic data placement within that storage pool by
moving high-activity data onto better performing storage.
Figure 3. SAN Volume Controller nodes with internal SSDs
The nodes are always installed in pairs, with a minimum of one and a maximum
of four pairs of nodes constituting a system. Each pair of nodes is known as an I/O
group. All I/O operations that are managed by the nodes in an I/O group are
cached on both nodes.
I/O groups take the storage that is presented to the SAN by the storage systems as
MDisks and translate the storage into logical disks (volumes) that are used by
applications on the hosts.
Systems
All your configuration, monitoring, and service tasks are performed at the system
level. Therefore, after configuring your system, you can take advantage of the
virtualization and the advanced features of the SAN Volume Controller system.
A system can consist of between two and eight SAN Volume Controller nodes.
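The sizing rules stated above (nodes are always installed in pairs, and a system comprises one to four pairs, each pair forming an I/O group) can be sketched as a small validation helper. This is an illustrative sketch only; the function name is invented and not part of any SAN Volume Controller interface:

```python
def io_groups(node_count: int) -> int:
    """Return the number of I/O groups for a given node count.

    Nodes are always installed in pairs, and a system consists of one to
    four pairs (I/O groups), so only even node counts from 2 to 8 are valid.
    """
    if node_count % 2 != 0 or not 2 <= node_count <= 8:
        raise ValueError(f"invalid node count: {node_count}")
    return node_count // 2
```

For example, a fully populated system of 8 nodes has 4 I/O groups, while the minimum system of 2 nodes has 1.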
All configuration settings are replicated across all nodes in the system. Because
configuration is performed at the system level, management IP addresses are
assigned to the system. Each interface accesses the system remotely through the
Ethernet system-management address.
Configuration node
A configuration node is a single node that manages configuration activity of the
system.
If the configuration node fails, the system chooses a new configuration node. This
action is called configuration node failover. The new configuration node takes over
the management IP addresses. Thus, you can access the system through the same
IP addresses even though the original configuration node has failed. During the
failover, there is a short period when you cannot use the command-line tools or
management GUI.
Figure 4 shows an example clustered system that contains four nodes. Node 1 has
been designated the configuration node. User requests (1) are handled by node 1.
Figure 4. Configuration node
This node then acts as the focal point for all configuration and other requests that
are made from the management GUI application or the CLI. This node is known as
the configuration node.
If the configuration node is stopped or fails, the remaining nodes in the system
determine which node will take on the role of configuration node. The new
configuration node broadcasts the new IP address mapping by using the
Address Resolution Protocol (ARP). You must configure some switches to forward
the ARP packet on to other devices on the subnetwork. Ensure that all Ethernet
devices are configured to pass on unsolicited ARP packets. Otherwise, if the ARP
packet is not forwarded, a device loses its connection to the SAN Volume
Controller system.
If a device loses its connection to the SAN Volume Controller system, it can
regenerate the address quickly if the device is on the same subnetwork as the
system. However, if the device is not on the same subnetwork, it might take hours
for the address resolution cache of the gateway to refresh. In this case, you can
restore the connection by establishing a command line connection to the system
from a terminal that is on the same subnetwork, and then by starting a secure copy
to the device that has lost its connection.
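The broadcast described above is commonly known as a gratuitous ARP: the new configuration node announces that the management IP address now maps to its own MAC address, so that devices on the subnetwork update their ARP caches. The following sketch builds such a frame byte for byte to show the mechanism; the addresses are illustrative, and a real node constructs this inside its network stack rather than in application code:

```python
import struct

def gratuitous_arp(sender_mac: bytes, sender_ip: bytes) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP announcement.

    In a gratuitous ARP the sender advertises its own IP-to-MAC mapping:
    the target protocol address equals the sender protocol address, and the
    frame is broadcast so that every device on the subnetwork can update
    its ARP cache -- the mechanism by which the new configuration node
    reclaims the management IP address.
    """
    broadcast = b"\xff" * 6
    # Ethernet header: destination, source, EtherType 0x0806 (ARP)
    eth_header = broadcast + sender_mac + struct.pack("!H", 0x0806)
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1,             # hardware type: Ethernet
        0x0800,        # protocol type: IPv4
        6, 4,          # hardware / protocol address lengths
        2,             # opcode: reply (announcement)
        sender_mac, sender_ip,
        broadcast, sender_ip,  # target IP == sender IP (gratuitous)
    )
    return eth_header + arp_payload
```

The frame answers no request; it exists purely to refresh the caches of listening devices, which is why switches must be configured to pass such unsolicited ARP packets along.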
Management IP failover
If the configuration node fails, the IP addresses for the clustered system are
transferred to a new node. The system services are used to manage the transfer of
the management IP addresses from the failed configuration node to the new
configuration node.
Note: Some Ethernet devices might not forward ARP packets. If the ARP
packets are not forwarded, connectivity to the new configuration node cannot be
established automatically. To avoid this problem, configure all Ethernet devices
to pass unsolicited ARP packets. You can restore lost connectivity by logging in
to the SAN Volume Controller and starting a secure copy to the affected system.
Starting a secure copy forces an update to the ARP cache for all systems
connected to the same switch as the affected system.
If the Ethernet link to the SAN Volume Controller system fails because of an event
unrelated to the SAN Volume Controller, such as a cable being disconnected or an
Ethernet router failure, the SAN Volume Controller does not attempt to fail over
the configuration node to restore management IP access.
Note: IP addresses that are used by hosts to access the system over an Ethernet
connection are different from management IP addresses.
SAN Volume Controller supports the following protocols that make outbound
connections from the system:
v Email
v Simple Network Management Protocol (SNMP)
v Syslog
v Network Time Protocol (NTP)
These protocols operate only on a port configured with a management IP address.
When making outbound connections, the SAN Volume Controller uses the
following routing decisions:
v If the destination IP address is in the same subnet as one of the management IP
addresses, the SAN Volume Controller system sends the packet immediately.
v If the destination IP address is not in the same subnet as either of the
management IP addresses, the system sends the packet to the default gateway
for Ethernet port 1.
v If the destination IP address is not in the same subnet as either of the
management IP addresses and Ethernet port 1 is not connected to the Ethernet
network, the system sends the packet to the default gateway for Ethernet port 2.
When configuring any of these protocols for event notifications, use these routing
decisions to ensure that error notification works correctly in the event of a network
failure.
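The three routing decisions above can be sketched as a small function. The port configuration shown is hypothetical (the IP addresses, prefix lengths, and gateways are invented for illustration); the real system applies these rules inside its network stack:

```python
import ipaddress

# Hypothetical management-port configuration; values are illustrative only.
MGMT_PORTS = {
    1: {"ip": "192.168.1.10", "prefix": 24, "gateway": "192.168.1.1", "link_up": True},
    2: {"ip": "192.168.2.10", "prefix": 24, "gateway": "192.168.2.1", "link_up": True},
}

def route_outbound(dest_ip, ports=MGMT_PORTS):
    """Return the next hop for an outbound notification packet.

    Mirrors the three routing decisions described above:
    1. Destination in the same subnet as a management IP -> send directly.
    2. Otherwise -> default gateway of Ethernet port 1.
    3. If port 1 is not connected -> default gateway of Ethernet port 2.
    """
    dest = ipaddress.ip_address(dest_ip)
    for port in ports.values():
        subnet = ipaddress.ip_network(f"{port['ip']}/{port['prefix']}", strict=False)
        if dest in subnet:
            return ("direct", dest_ip)
    if ports[1]["link_up"]:
        return ("gateway", ports[1]["gateway"])
    return ("gateway", ports[2]["gateway"])
```

For example, a syslog server at 10.0.0.5 is reached through the default gateway of Ethernet port 1 unless that port is disconnected, in which case the gateway of port 2 is used, which is why notification targets should be reachable from both gateways.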
In the host zone, the host systems can identify and address the nodes. You can
have more than one host zone and more than one disk zone. Unless you are using
a dual-core fabric design, the system zone contains all ports from all nodes in the
system. Create one zone for each host Fibre Channel port. In a disk zone, the
nodes identify the storage systems. Generally, create one zone for each external
storage system. If you are using the Metro Mirror and Global Mirror feature, create
a zone with at least one port from each node in each system; up to four systems
are supported.
Note: Some operating systems cannot tolerate other operating systems in the same
host zone, although you might have more than one host type in the SAN fabric.
A label on the front of the node indicates the SAN Volume Controller node type,
hardware revision (if appropriate), and serial number.
Figure 5 on page 10 shows the controls and indicators on the front panel of the
SAN Volume Controller 2145-CG8.
Figure 5. SAN Volume Controller 2145-CG8 front panel
Figure 6 shows the controls and indicators on the front panel of the SAN Volume
Controller 2145-CF8.
Figure 6. SAN Volume Controller 2145-CF8 front panel
Figure 7. SAN Volume Controller 2145-8A4 front-panel assembly
Figure 8 shows the controls and indicators on the front panel of the SAN Volume
Controller 2145-8G4.
Figure 8. SAN Volume Controller 2145-8G4 front-panel assembly
Figure 9 shows the controls and indicators on the front panel of the SAN Volume
Controller 2145-8F4 and SAN Volume Controller 2145-8F2.
Figure 9. SAN Volume Controller 2145-8F2 and SAN Volume Controller 2145-8F4 front-panel
assembly
The node status LED provides the following system activity indicators:
Off The node is not operating as a member of a system.
On The node is operating as a member of a system.
Flashing
The node is dumping cache and state data to the local disk in anticipation
of a system reboot from a pending power-off action or other controlled
restart sequence.
Front-panel display
The front-panel display shows service, configuration, and navigation information.
The front-panel display shows configuration and service information about the
node and the system, including the following items:
v Boot progress indicator
v Boot failed
v Charging
v Hardware boot
v Node rescue request
v Power failure
v Powering off
v Recovering
v Restarting
v Shutting down
v Error codes
v Validate WWNN?
Navigation buttons
You can use the navigation buttons to move through menus.
There are four navigation buttons that you can use to move through a menu: up,
down, left, and right.
Each button corresponds to the direction that you can move in a menu. For
example, to move right in a menu, press the navigation button that is located on
the right side. If you want to move down in a menu, press the navigation button
that is located on the bottom.
Note: The select button is used in tandem with the navigation buttons.
This number is used for warranty and service entitlement checking and is included
in the data sent with error reports. It is essential that this number is not changed
during the life of the product. If the system board is replaced, you must follow the
system board replacement instructions carefully and rewrite the serial number on
the system board.
Select button
Use the select button to select an item from a menu.
The select button and navigation buttons help you to navigate and select menu
and boot options, and start a service panel test. The select button is located on the
front panel of the SAN Volume Controller, near the navigation buttons.
If the service controller assembly front panel is replaced, the configuration and
service software displays the number that is printed on the front of the
replacement panel. Future error reports contain the new number. No system
reconfiguration is necessary when the front panel is replaced.
Error LED
Critical faults on the service controller are indicated through the amber error LED.
Figure 10 shows the operator-information panel for the SAN Volume Controller
2145-CG8.
Note: If you install the 10 Gbps Ethernet feature, the port activity is not reflected
on the activity LEDs.
Figure 11 shows the operator-information panel for the SAN Volume Controller
2145-CF8.
Figure 12 on page 16 shows the operator-information panel for the SAN Volume
Controller 2145-8A4.
Figure 13 shows the operator-information panel for the SAN Volume Controller
2145-8G4.
Figure 14. SAN Volume Controller 2145-8F2 and SAN Volume Controller 2145-8F4
operator-information panel
System-error LED
When it is lit, the system-error LED indicates that a system-board error has
occurred.
This amber LED lights up if the hardware detects a fatal error that requires a new
field-replaceable unit (FRU). To help you isolate the faulty FRU, see MAP 5800:
Light path.
A system-error LED is also at the rear of the SAN Volume Controller models
2145-CG8, 2145-CF8, 2145-8G4, 2145-8F4, and 2145-8F2.
Reset button
A reset button is available on the SAN Volume Controller 2145-8A4 node, but do
not use it.
Attention: If you use the reset button, the node restarts immediately without the
SAN Volume Controller control data being written to disk. Service actions are then
required to make the node operational again.
Power button
The power button turns main power on or off for the SAN Volume Controller.
To turn on the power, press and release the power button. You must have a
pointed device, such as a pen, to press the button.
Attention: When the node is operational and you press and immediately release
the power button, the SAN Volume Controller indicates on its front panel that it is
turning off and writes its control data to its internal disk. This can take up to five
minutes. If you press the power button but do not release it, the node turns off
immediately without the SAN Volume Controller control data being written to
disk. Service actions are then required to make the SAN Volume Controller
operational again. Therefore, during a power-off operation, do not press and hold
the power button for more than two seconds.
Note: The 2145 UPS-1U does not turn off when the SAN Volume Controller is shut
down from the power button.
Power LED
The green power LED indicates the power status of the system.
Note: A power LED is also at the rear of the SAN Volume Controller 2145-CG8,
2145-CF8, 2145-8F2, 2145-8F4, and 2145-8G4 nodes.
Release latch
The release latch on the SAN Volume Controller models 2145-8G4, 2145-8F4, and
2145-8F2 gives you access to the light path diagnostics panel, which provides a
method for determining the location of a problem.
After pressing the release latch on the operator-information panel, you can slide
the light path diagnostics panel out to view the lit LEDs. The LEDs indicate the
type of error that has occurred. See MAP 5800: Light path for more detail.
To retract the panel, push it back into the node and snap it into place.
System-information LED
When the system-information LED is lit, a noncritical event has occurred.
Check the light path diagnostics panel and the event log. Light path diagnostics
are described in more detail in the light path maintenance analysis procedure
(MAP).
Locator LED
The SAN Volume Controller does not use the locator LED.
The operator-information panel LEDs refer to the Ethernet ports that are mounted
on the system board. If you install the 10 Gbps Ethernet card on a SAN Volume
Controller 2145-CG8, the port activity is not reflected on the activity LEDs.
Figure 15 shows the rear-panel indicators on the SAN Volume Controller 2145-CG8
back-panel assembly.
Figure 16 shows the rear-panel indicators on the SAN Volume Controller 2145-CG8
back-panel assembly that has the 10 Gbps Ethernet feature.
Figure 16. SAN Volume Controller 2145-CG8 rear-panel indicators for the 10 Gbps Ethernet
feature
1 10 Gbps Ethernet-link LEDs. The amber link LED is on when this port is
connected to a 10 Gbps Ethernet switch and the link is online.
These figures show the external connectors on the SAN Volume Controller
2145-CG8 back panel assembly.
Figure 17. Connectors on the rear of the SAN Volume Controller 2145-CG8
Figure 18. 10 Gbps Ethernet ports on the rear of the SAN Volume Controller 2145-CG8
The SAN Volume Controller 2145-CG8 contains a number of ports that are only
used during service procedures.
Figure 20 shows ports that are used only during service procedures.
Figure 20. Service ports of the SAN Volume Controller 2145-CG8
During normal operation, none of these ports are used. Connect a device to any of
these ports only when you are directed to do so by a service procedure or by an
IBM service representative.
The SAN Volume Controller 2145-CG8 can contain one port that is not used.
Figure 21 shows the one port that is not used during service procedures or normal
use.
When present, this port is disabled in software to make the port inactive.
The SAS port is present when the optional high-speed SAS adapter is installed
with one or more solid-state drives (SSDs).
Figure 22 shows the rear-panel indicators on the SAN Volume Controller 2145-CF8
back-panel assembly.
Figure 23 shows the external connectors on the SAN Volume Controller 2145-CF8
back panel assembly.
Figure 23. Connectors on the rear of the SAN Volume Controller 2145-CG8 or 2145-CF8
The SAN Volume Controller 2145-CF8 contains a number of ports that are only
used during service procedures.
Figure 25 shows ports that are used only during service procedures.
During normal operation, none of these ports are used. Connect a device to any of
these ports only when you are directed to do so by a service procedure or by an
IBM service representative.
The SAN Volume Controller 2145-CF8 can contain one port that is not used.
Figure 26. SAN Volume Controller 2145-CF8 port not used
When present, this port is disabled in software to make the port inactive.
The SAS port is present when the optional high-speed SAS adapter is installed
with one or more solid-state drives (SSDs).
Figure 27 shows the rear-panel indicators on the SAN Volume Controller 2145-8A4
back-panel assembly.
Figure 28 on page 25 shows the external connectors on the SAN Volume Controller
2145-8A4 back-panel assembly.
Figure 29 shows the type of connector that is located on the power supply
assembly. The connector enables you to connect the SAN Volume Controller
2145-8A4 to the power source from the uninterruptible power supply.
The SAN Volume Controller 2145-8A4 contains a number of ports that are used
only during service procedures. These ports are shown in Figure 30.
During normal operation, none of these ports are used. Connect a device to any of
these ports only when you are directed to do so by a service procedure or by your
IBM service representative.
Figure 31 shows the rear-panel indicators on the SAN Volume Controller 2145-8G4
back-panel assembly.
Figure 32 shows the external connectors on the SAN Volume Controller 2145-8G4
back panel assembly.
Figure 33 shows the type of connector that is located on the power supply
assembly. The connector enables you to connect the SAN Volume Controller
2145-8G4 to the power source from the uninterruptible power supply.
The connector pins are labeled Neutral, Ground, and Live.
The SAN Volume Controller 2145-8G4 contains a number of ports that are only
used during service procedures. These ports are shown in Figure 34.
During normal operation, none of these ports are used. Connect a device to any of
these ports only when you are directed to do so by a service procedure or by your
IBM service representative.
Figure 35 shows the rear-panel indicators on the SAN Volume Controller 2145-8F4
back-panel assembly.
Figure 36 shows the external connectors on the SAN Volume Controller 2145-8F4
back panel assembly.
Figure 37 shows the type of connector that is located on the power supply
assembly. The connector enables you to connect the SAN Volume Controller
2145-8F4 to the power source from the uninterruptible power supply.
The connector pins are labeled Neutral, Ground, and Live.
The SAN Volume Controller 2145-8F4 contains the keyboard service port and the
monitor service port. These ports are used only during service procedures.
Figure 38 provides the locations of the service ports.
The SAN Volume Controller 2145-8F4 is equipped with several ports that are not
used by the SAN Volume Controller during normal operation. Figure 39 on page
30 and Figure 40 on page 30 show the ports that are not used by the SAN Volume
Controller.
Figure 39. Ports not used during normal operation by the SAN Volume Controller 2145-8F4
Figure 40. Ports not used on the front panel of the SAN Volume Controller 2145-8F4
Figure 41 shows the rear-panel indicators on the SAN Volume Controller 2145-8F2
back-panel assembly.
Figure 42 shows the external connectors on the SAN Volume Controller 2145-8F2
back panel assembly.
Figure 43 shows the type of connector that is located on the power supply
assembly. The connector enables you to connect the SAN Volume Controller
2145-8F2 to the power source from the uninterruptible power supply.
The connector pins are labeled Neutral, Ground, and Live.
Two LEDs are used to indicate the state and speed of the operation of each Fibre
Channel port. The bottom LED indicates the link state and activity.
Table 7. Link state and activity for the bottom Fibre Channel LED
LED state Link state and activity indicated
Off Link inactive
On Link active, no I/O
Each Fibre Channel port can operate at one of three speeds. The top LED indicates
the relative link speed. The link speed is defined only if the link state is active.
Table 8. Link speed for the top Fibre Channel LED
LED state Link speed indicated
Off SLOW
On FAST
Blinking MEDIUM
Table 9 shows the actual link speeds for the SAN Volume Controller models
2145-8A4, 2145-8G4, and 2145-8F4.
Table 9. Actual link speeds
Link speed Actual link speeds
Slow 1 Gbps
Fast 4 Gbps
Medium 2 Gbps
Table 10 shows the actual link speeds for the SAN Volume Controller 2145-CF8 and
for the SAN Volume Controller 2145-CG8.
Table 10. Actual link speeds
Link speed Actual link speeds
Slow 2 Gbps
Fast 8 Gbps
Medium 4 Gbps
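The two-level decode implied by Tables 8, 9, and 10 — LED state to relative speed, then model to actual speed — can be sketched as follows. This is an illustrative sketch only; the model-group keys and function name are my own, not names from this guide:

```python
# Decode the top Fibre Channel LED into an actual link speed (Tables 8-10).
# The link speed is defined only while the link state (bottom LED) is active.

RELATIVE_SPEED = {"off": "slow", "blinking": "medium", "on": "fast"}

ACTUAL_GBPS = {
    # 2145-8A4, 2145-8G4, and 2145-8F4 (Table 9)
    "8A4/8G4/8F4": {"slow": 1, "medium": 2, "fast": 4},
    # 2145-CF8 and 2145-CG8 (Table 10)
    "CF8/CG8": {"slow": 2, "medium": 4, "fast": 8},
}

def link_speed_gbps(model_group: str, top_led_state: str) -> int:
    """Map a top-LED state to the port's link speed in Gbps."""
    return ACTUAL_GBPS[model_group][RELATIVE_SPEED[top_led_state]]
```

For example, a blinking top LED on a 2145-CG8 port indicates the medium speed, 4 Gbps.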
There is a set of LEDs for each Ethernet connector. The top LED is the Ethernet
link LED. When it is lit, it indicates that there is an active connection on the
Ethernet port. The bottom LED is the Ethernet activity LED. When it flashes, it
indicates that data is being transmitted or received between the server and a
network device.
The following terms describe the power, location, and system-error LEDs:
Power LED
This is the top of the three LEDs and indicates the following states:
Off One or more of the following are true:
v No power is present at the power supply input
v The power supply has failed
v The LED has failed
On The SAN Volume Controller is powered on.
| Blinking
The SAN Volume Controller is turned off but is still connected to a
power source.
Location LED
This is the middle of the three LEDs and is not used by the SAN Volume
Controller.
System-error LED
This is the bottom of the three LEDs that indicates that a system board
error has occurred. The light path diagnostics provide more information.
Ac and dc LEDs
The ac and dc LEDs indicate whether the node is receiving electrical current.
Ac LED
The upper LED indicates that ac current is present on the node.
Dc LED
The lower LED indicates that dc current is present on the node.
Ac, dc, and power-supply error LEDs on the SAN Volume Controller 2145-CF8
and SAN Volume Controller 2145-CG8:
The ac, dc, and power-supply error LEDs indicate whether the node is receiving
electrical current.
Figure 44 on page 34 shows the location of the SAN Volume Controller 2145-CF8
ac, dc, and power-supply error LEDs.
Figure 44. SAN Volume Controller 2145-CG8 or 2145-CF8 ac, dc, and power-error LEDs
Each of the two power supplies has its own set of LEDs.
Ac LED
The upper LED (1) on the left side of the power supply indicates that ac
current is present on the node.
Dc LED
The middle LED (2) on the left side of the power supply indicates that
dc current is present on the node.
Power-supply error LED
The lower LED (3) on the left side of the power supply indicates a
problem with the power supply.
The ac LED and dc LED are located on the rear of the SAN Volume Controller
2145-8G4.
Ac and dc LEDs on the SAN Volume Controller 2145-8F4 and the SAN Volume
Controller 2145-8F2:
The ac LED and dc LED are located on the rear of the SAN Volume Controller
2145-8F4 and the SAN Volume Controller 2145-8F2.
Figure 46. SAN Volume Controller 2145-8F4 and SAN Volume Controller 2145-8F2 ac and dc
LEDs
Ac LED
The upper LED (1) indicates that ac current is present on the node.
Dc LED
The lower LED (2) indicates that dc current is present on the node.
The physical port numbers identify Fibre Channel cards and cable connections
when you perform service tasks. The physical port numbers are 1 - 4, counting
from left to right when you view the rear panel of the node. The WWPNs are used
for tasks such as Fibre Channel switch configuration and to uniquely identify the
devices on the SAN.
The WWPNs are derived from the worldwide node name (WWNN) of the SAN
Volume Controller node in which the ports are installed.
Port    Value of Q
1       4
2       3
3       1
4       2
| 5       5
| 6       6
| 7       7
| 8       8
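The derivation can be illustrated in code. The sketch below assumes that the port's Q value replaces a single hexadecimal digit of the WWNN; the sample WWNN and the digit position are hypothetical illustrations, not values taken from this guide, so consult the product documentation for the exact scheme:

```python
# Illustrative sketch: derive a port WWPN from the node WWNN by substituting
# the port's Q value (table above) for one hex digit of the 16-digit WWNN.
# ASSUMPTION: the sample WWNN and the substituted digit position (index 10)
# are hypothetical.

PORT_Q = {1: 4, 2: 3, 3: 1, 4: 2, 5: 5, 6: 6, 7: 7, 8: 8}

def wwpn_for_port(wwnn: str, port: int, digit_index: int = 10) -> str:
    """Replace one hex digit of the WWNN with the port's Q value."""
    q = PORT_Q[port]
    return wwnn[:digit_index] + format(q, "x") + wwnn[digit_index + 1:]

# Example with a hypothetical WWNN:
# wwpn_for_port("5005076801000001", 1) -> "5005076801400001"
```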
Input-voltage requirements
Voltage Frequency
200 V to 240 V single phase ac 50 Hz or 60 Hz
Attention:
v If the uninterruptible power supply is cascaded from another uninterruptible
power supply, the source uninterruptible power supply must have at least three
times the capacity per phase and the total harmonic distortion must be less than
5%.
v The uninterruptible power supply also must have input voltage capture that has
a slew rate of no more than 3 Hz per second.
The maximum power that is required depends on the node type and the optional
features that are installed.
Table 11. Maximum power consumption
Components                                         Power requirements
SAN Volume Controller 2145-CG8 and 2145 UPS-1U     200 W
For the high-speed SAS adapter with from one to four solid-state drives, add 50 W
to the power requirements.
The 2145 UPS-1U has an integrated circuit breaker and does not require additional
protection.
Ensure that your environment falls within the following ranges if you are not
using redundant ac power.
Table 12. Physical specifications
Environment      Temperature      Altitude      Relative humidity      Maximum wet bulb temperature
Operating in 10°C to 35°C 0 m to 914 m 8% to 80% 23°C (73°F)
lower altitudes (50°F to 95°F) (0 ft to 3000 ft) noncondensing
Operating in 10°C to 32°C 914 m to 2133 m 8% to 80% 23°C (73°F)
higher altitudes (50°F to 90°F) (3000 ft to 7000 noncondensing
ft)
Turned off 10°C to 43°C 0 m to 2133 m 8% to 80% 27°C (81°F)
(50°F to 109°F) (0 ft to 7000 ft) noncondensing
Storing 1°C to 60°C 0 m to 2133 m 5% to 80% 29°C (84°F)
(34°F to 140°F) (0 ft to 7000 ft) noncondensing
Shipping -20°C to 60°C 0 m to 10668 m 5% to 100% 29°C (84°F)
(-4°F to 140°F) (0 ft to 34991 ft) condensing, but
no precipitation
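The operating rows of Table 12 amount to an altitude-dependent envelope: above 914 m the maximum operating temperature drops from 35°C to 32°C. A sketch of that check, assuming the non-redundant-power ranges above (the function name is illustrative):

```python
# Validate site conditions against the Table 12 operating rows
# (environment without redundant ac power). Values transcribed from the table.

def operating_env_ok(temp_c: float, altitude_m: float, humidity_pct: float) -> bool:
    """Return True if conditions fall inside the operating envelope."""
    if not (0 <= altitude_m <= 2133):
        return False
    # Maximum temperature drops from 35°C to 32°C above 914 m (3000 ft).
    max_temp = 35 if altitude_m <= 914 else 32
    if not (10 <= temp_c <= max_temp):
        return False
    return 8 <= humidity_pct <= 80
```

For example, 33°C is acceptable at sea level but out of range at 1000 m.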
Ensure that your environment falls within the following ranges if you are using
redundant ac power.
Table 13. Environment requirements with redundant ac power
Environment      Temperature      Altitude      Relative humidity      Maximum wet bulb temperature
Operating in 15°C to 32°C 0 m to 914 m 20% to 80% 23°C (73°F)
lower altitudes (59°F to 90°F) (0 ft to 3000 ft) noncondensing
Operating in 15°C to 32°C 914 m to 2133 m 20% to 80% 23°C (73°F)
higher altitudes (59°F to 90°F) (3000 ft to 7000 noncondensing
ft)
Turned off 10°C to 43°C 0 m to 2133 m 20% to 80% 27°C (81°F)
(50°F to 109°F) (0 ft to 7000 ft) noncondensing
Storing 1°C to 60°C 0 m to 2133 m 5% to 80% 29°C (84°F)
(34°F to 140°F) (0 ft to 7000 ft) noncondensing
Shipping -20°C to 60°C 0 m to 10668 m 5% to 100% 29°C (84°F)
(-4°F to 140°F) (0 ft to 34991 ft) condensing, but
no precipitation
The following tables list the physical characteristics of the SAN Volume Controller
2145-CG8 node.
Ensure that space is available in a rack that is capable of supporting the node.
Table 14. Dimensions and weight
Height Width Depth Maximum weight
4.3 cm 44 cm 73.7 cm 15 kg
(1.7 in.) (17.3 in.) (29 in.) (33 lb)
Ensure that space is also available in the rack for the following additional space
requirements around the node.
Table 15. Additional space requirements
Location                  Additional space requirements    Reason
Left side and right side Minimum: 50 mm (2 in.) Cooling air flow
Back Minimum: 100 mm (4 in.) Cable exit
Voltage Frequency
200 to 240 V single phase ac 50 or 60 Hz
Attention:
v If the uninterruptible power supply is cascaded from another uninterruptible
power supply, the source uninterruptible power supply must have at least three
times the capacity per phase and the total harmonic distortion must be less than
5%.
v The uninterruptible power supply also must have input voltage capture that has
a slew rate of no more than 3 Hz per second.
The power capacity that is required depends on the node type and which optional
features are installed.
Notes:
v SAN Volume Controller 2145-CF8 nodes will not connect to all revisions of the
2145 UPS-1U power supply unit. The SAN Volume Controller 2145-CF8 nodes
require the 2145 UPS-1U power supply unit part number 31P1318. This unit has
two power outlets that are accessible. Earlier revisions of the 2145 UPS-1U
power supply unit have only one power outlet that is accessible and are not
suitable.
v For each redundant ac-power switch, add 20 W to the power requirements.
v For each high-speed SAS adapter with one to four solid-state drives (SSDs), add
50 W to the power requirements.
The 2145 UPS-1U has an integrated circuit breaker and does not require additional
protection.
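The power additions in these notes are simple arithmetic: a base figure for the node type, plus 20 W per redundant ac-power switch and 50 W per high-speed SAS adapter with SSDs. A minimal sketch (the function name is illustrative, and the base wattage is a parameter because it depends on the node type; 200 W is the 2145-CG8 value from Table 11):

```python
# Estimate node power draw: base consumption for the node type, plus
# 20 W per redundant ac-power switch and 50 W per high-speed SAS adapter
# (with one to four SSDs), per the notes above.

def node_power_watts(base_w: int, redundant_switches: int = 0,
                     sas_adapters: int = 0) -> int:
    """Sum the base power and the documented per-feature additions."""
    return base_w + 20 * redundant_switches + 50 * sas_adapters

# Example: a 200 W base with one redundant ac-power switch and one
# SAS adapter: node_power_watts(200, 1, 1) -> 270
```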
Ensure that your environment falls within the following ranges if you are not
using redundant ac power.
Environment      Temperature      Altitude      Relative humidity      Maximum wet bulb temperature
Operating in 10°C to 35°C 0 to 914 m 8% to 80% 23°C (73°F)
lower altitudes (50°F to 95°F) (0 to 2998 ft) noncondensing
Ensure that your environment falls within the following ranges if you are using
redundant ac power.
Environment      Temperature      Altitude      Relative humidity      Maximum wet bulb temperature
Operating in 15°C to 32°C 0 to 914 m 20% to 80% 23°C (73°F)
lower altitudes (59°F to 90°F) (0 to 2998 ft) noncondensing
Operating in 15°C to 32°C 914 to 2133 m 20% to 80% 23°C (73°F)
higher altitudes (59°F to 90°F) (2998 to 6988 ft) noncondensing
Turned off 10°C to 43°C 0 to 2133 m 20% to 80% 27°C (81°F)
(50°F to 110°F) (0 to 6988 ft) noncondensing
Storing 1°C to 60°C 0 to 2133 m 5% to 80% 29°C (84°F)
(34°F to 140°F) (0 to 6988 ft) noncondensing
Shipping -20°C to 60°C 0 to 10668 m 5% to 100% 29°C (84°F)
(-4°F to 140°F) (0 to 34991 ft) condensing, but
no precipitation
The following tables list the physical characteristics of the SAN Volume Controller
2145-CF8 node.
Ensure that space is available in a rack that is capable of supporting the node.
Ensure that space is also available in the rack for the following additional space
requirements around the node.
Input-voltage requirements
Voltage Frequency
200 to 240 V single phase ac 50 or 60 Hz
Attention:
v If the uninterruptible power supply is cascaded from another uninterruptible
power supply, the source uninterruptible power supply must have at least three
times the capacity per phase and the total harmonic distortion must be less than
5%.
v The uninterruptible power supply also must have input voltage capture that has
a slew rate of no more than 3 Hz per second.
The power that is required depends on the node type and whether the redundant
ac power feature is used.
The 2145 UPS-1U has an integrated circuit breaker and does not require additional
protection.
Ensure that your environment falls within the following ranges if you are not
using redundant ac power.
Environment      Temperature      Altitude      Relative humidity      Maximum wet bulb temperature
Operating in 10°C to 35°C 0 to 914 m 8% to 80% 23°C (73°F)
lower altitudes (50°F to 95°F) (0 to 3000 ft) noncondensing
Operating in 10°C to 32°C 914 to 2133 m 8% to 80% 23°C (73°F)
higher altitudes (50°F to 90°F) (3000 to 7000 ft) noncondensing
Turned off 10°C to 43°C 0 to 2133 m 8% to 80% 27°C (81°F)
(50°F to 109°F) (0 to 7000 ft) noncondensing
Storing 1°C to 60°C 0 to 2133 m 5% to 80% 29°C (84°F)
(34°F to 140°F) (0 to 7000 ft) noncondensing
Shipping -20°C to 60°C 0 to 10668 m 5% to 100% 29°C (84°F)
(-4°F to 140°F) (0 to 34991 ft) condensing, but
no precipitation
Ensure that your environment falls within the following ranges if you are using
redundant ac power.
Environment      Temperature      Altitude      Relative humidity      Maximum wet bulb temperature
Operating in 15°C to 32°C 0 to 914 m 20% to 80% 23°C (73°F)
lower altitudes (59°F to 90°F) (0 to 3000 ft) noncondensing
Operating in 15°C to 32°C 914 to 2133 m 20% to 80% 23°C (73°F)
higher altitudes (59°F to 90°F) (3000 to 7000 ft) noncondensing
Turned off 10°C to 43°C 0 to 2133 m 20% to 80% 27°C (81°F)
(50°F to 109°F) (0 to 7000 ft) noncondensing
Storing 1°C to 60°C 0 to 2133 m 5% to 80% 29°C (84°F)
(34°F to 140°F) (0 to 7000 ft) noncondensing
Shipping -20°C to 60°C 0 to 10668 m 5% to 100% 29°C (84°F)
(-4°F to 140°F) (0 to 34991 ft) condensing, but
no precipitation
The following tables list the physical characteristics of the SAN Volume Controller
2145-8A4 node.
Ensure that space is available in a rack that is capable of supporting the node.
Ensure that space is also available in the rack for the following additional space
requirements around the node.
Location                  Additional space requirements    Reason
Left and right sides Minimum: 50 mm (2 in.) Cooling air flow
Back Minimum: 100 mm (4 in.) Cable exit
Input-voltage requirements
Voltage Frequency
200 to 240 V single phase ac 50 or 60 Hz
Attention:
v If the uninterruptible power supply is cascaded from another uninterruptible
power supply, the source uninterruptible power supply must have at least three
times the capacity per phase and the total harmonic distortion must be less than
5%.
v The uninterruptible power supply also must have input voltage capture that has
a slew rate of no more than 3 Hz per second.
The power that is required depends on the node type and whether the redundant
ac power feature is used.
The 2145 UPS-1U has an integrated circuit breaker and does not require additional
protection.
Ensure that your environment falls within the following ranges if you are not
using redundant ac power.
Environment      Temperature      Altitude      Relative humidity      Maximum wet bulb temperature
Operating in 10°C to 35°C 0 to 914 m 8% to 80% 23°C (73°F)
lower altitudes (50°F to 95°F) (0 to 2998 ft) noncondensing
Operating in 10°C to 32°C 914 to 2133 m 8% to 80% 23°C (73°F)
higher altitudes (50°F to 90°F) (2998 to 6988 ft) noncondensing
Turned off 10°C to 43°C 0 to 2133 m 8% to 80% 27°C (81°F)
(50°F to 110°F) (0 to 6988 ft) noncondensing
Storing 1°C to 60°C 0 to 2133 m 5% to 80% 29°C (84°F)
(34°F to 140°F) (0 to 6988 ft) noncondensing
Shipping -20°C to 60°C 0 to 10668 m 5% to 100% 29°C (84°F)
(-4°F to 140°F) (0 to 34991 ft) condensing, but
no precipitation
Ensure that your environment falls within the following ranges if you are using
redundant ac power.
Environment      Temperature      Altitude      Relative humidity      Maximum wet bulb temperature
Operating in 15°C to 32°C 0 to 914 m 20% to 80% 23°C (73°F)
lower altitudes (59°F to 90°F) (0 to 2998 ft) noncondensing
Operating in 15°C to 32°C 914 to 2133 m 20% to 80% 23°C (73°F)
higher altitudes (59°F to 90°F) (2998 to 6988 ft) noncondensing
Turned off 10°C to 43°C 0 to 2133 m 20% to 80% 27°C (81°F)
(50°F to 110°F) (0 to 6988 ft) noncondensing
Storing 1°C to 60°C 0 to 2133 m 5% to 80% 29°C (84°F)
(34°F to 140°F) (0 to 6988 ft) noncondensing
Shipping -20°C to 60°C 0 to 10668 m 5% to 100% 29°C (84°F)
(-4°F to 140°F) (0 to 34991 ft) condensing, but
no precipitation
The following tables list the physical characteristics of the SAN Volume Controller
2145-8G4 node.
Ensure that space is available in a rack that is capable of supporting the node.
Ensure that space is also available in the rack for the following additional space
requirements around the node.
Location                  Additional space requirements    Reason
Left and right sides      50 mm (2 in.)                    Cooling air flow
Back                      Minimum: 100 mm (4 in.)          Cable exit
Input-voltage requirements
Voltage Frequency
200 to 240 V single phase ac 50 or 60 Hz
The power that is required depends on the node type and whether the redundant
ac power feature is used.
The 2145 UPS-1U has an integrated circuit breaker and does not require additional
protection.
Ensure that your environment falls within the following ranges if you are not
using redundant ac power.
Environment      Temperature      Altitude      Relative humidity      Maximum wet bulb temperature
Operating in 10°C to 35°C 0 to 914.4 m 8% to 80% 23°C (74°F)
lower altitudes (50°F to 95°F) (0 to 3000 ft) noncondensing
Operating in 10°C to 32°C 914.4 to 2133.6 m 8% to 80% 23°C (74°F)
higher altitudes (50°F to 88°F) (3000 to 7000 ft) noncondensing
Turned off 10°C to 43°C 0 to 2133.6 m 8% to 80% 27°C (81°F)
(50°F to 110°F) (0 to 7000 ft) noncondensing
Storing 1°C to 60°C 0 to 2133.6 m 5% to 80% 29°C (84°F)
(34°F to 140°F) (0 to 7000 ft) noncondensing
Shipping -20°C to 60°C 0 to 10668 m 5% to 100% 29°C (84°F)
(-4°F to 140°F) (0 to 34991 ft) condensing, but
no precipitation
Ensure that your environment falls within the following ranges if you are using
redundant ac power.
Environment      Temperature      Altitude      Relative humidity      Maximum wet bulb temperature
Operating in 15°C to 32°C 0 to 914.4 m 20% to 80% 23°C (74°F)
lower altitudes (59°F to 89°F) (0 to 3000 ft) noncondensing
Operating in 15°C to 32°C 914.4 to 2133.6 m 20% to 80% 23°C (74°F)
higher altitudes (59°F to 90°F) (3000 to 7000 ft) noncondensing
Turned off 10°C to 43°C 0 to 2133.6 m 20% to 80% 27°C (81°F)
(50°F to 110°F) (0 to 7000 ft) noncondensing
Storing 1°C to 60°C 0 to 2133.6 m 5% to 80% 29°C (84°F)
(34°F to 140°F) (0 to 7000 ft) noncondensing
The following tables list the physical characteristics of the SAN Volume Controller
2145-8F4 and SAN Volume Controller 2145-8F2 nodes.
Ensure that space is available in a rack that is capable of supporting the node.
Ensure that space is also available in the rack for the following additional space
requirements around the node.
Location                  Additional space requirements    Reason
Left and right sides      50 mm (2 in.)                    Cooling air flow
Back                      Minimum: 100 mm (4 in.)          Cable exit
You must connect the redundant ac-power switch to two independent power
circuits. One power circuit connects to the main power input port and the other
power circuit connects to the backup power-input port. If the main power to the
SAN Volume Controller node fails for any reason, the redundant ac-power switch
automatically switches to the backup power circuit.
Place the redundant ac-power switch in the same rack as the SAN Volume
Controller node. The redundant ac-power switch logically sits between the rack
power distribution unit and the 2145 UPS-1U.
You can use a single redundant ac-power switch to power one or two SAN Volume
Controller nodes. If you use the redundant ac-power switch to power two nodes,
the nodes must be in different I/O groups. In the event that the redundant
ac-power switch fails or requires maintenance, both nodes turn off. Because the
nodes are in two different I/O groups, the hosts do not lose access to the back-end
disk data.
For maximum resilience to failure, use one redundant ac-power switch to power
each SAN Volume Controller node.
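The placement rule above can be expressed as a quick validation: a redundant ac-power switch may feed at most two nodes, and never two nodes of the same I/O group. A sketch with illustrative data structures (the function and variable names are my own):

```python
# Check redundant ac-power switch wiring. A single switch may power at most
# two nodes, and never two nodes from the same I/O group, because a switch
# failure turns off every node it powers.

def placement_ok(switch_to_nodes: dict, node_io_group: dict) -> bool:
    """Return True if every switch obeys the placement rule."""
    for nodes in switch_to_nodes.values():
        if len(nodes) > 2:
            return False
        groups = [node_io_group[n] for n in nodes]
        if len(groups) != len(set(groups)):  # duplicate I/O group
            return False
    return True

# Example: one switch powering node A (I/O group 0) and node C (I/O group 1)
# is valid; powering nodes A and B (both I/O group 0) is not.
```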
| You must properly cable the redundant ac-power switch units in your
| environment. Refer to the Planning section in the Information Center for more
| information.
The redundant ac-power switch requires two independent power sources that are
provided through two rack-mounted power distribution units (PDUs). The PDUs
must have IEC320-C13 outlets.
The redundant ac-power switch comes with two IEC 320-C19 to C14 power cables
to connect to rack PDUs. There are no country-specific cables for the redundant
ac-power switch.
The power cable between the redundant ac-power switch and the 2145 UPS-1U is
rated at 10 A.
The following tables list the physical characteristics of the redundant ac-power
switch.
Ensure that space is available in a rack that is capable of supporting the redundant
ac-power switch.
Ensure that space is also available in the rack for the side mounting plates on
either side of the redundant ac-power switch.
The maximum heat output that is dissipated inside the redundant ac-power switch
is approximately 20 watts (70 Btu per hour).
Note: While this topic provides an example of the cable connections, it does not
indicate a preferred physical location for the components.
Figure 48 on page 50 shows an example of the main wiring for a SAN Volume
Controller clustered system with the redundant ac-power switch feature. The
four-node clustered system consists of two I/O groups:
v I/O group 0 contains nodes A and B
v I/O group 1 contains nodes C and D
Figure 48. A four-node SAN Volume Controller system with the redundant ac-power switch
feature
In this example, only two redundant ac-power switch units are used, and each
power switch powers one node in each I/O group. However, for maximum
redundancy, use one redundant ac-power switch to power each node in the
system.
Some SAN Volume Controller node types have two power supply units. Both
power supplies must be connected to the same 2145 UPS-1U, as shown by node A
and node B. The SAN Volume Controller 2145-CG8 is an example of a node that
has two power supplies. The SAN Volume Controller 2145-8A4 is an example of a
node that has a single power supply.
[Figure: example rack layout showing SAN Volume Controller nodes for I/O groups 0 - 3 separated by 1U filler panels, slots for an optional 1U monitor and an optional SSPC server, redundant ac-power switch outlet groups labeled "ATTENTION: CONNECT ONLY IBM SAN VOLUME CONTROLLERS TO THESE OUTLETS. SEE SAN VOLUME CONTROLLER INSTALLATION GUIDE." (12 A maximum per group), branch A and branch B 20 A circuit breakers, and the main and backup power inputs.]
With a 2145 UPS-1U, data is saved to the internal disk of the SAN Volume
Controller node. The uninterruptible power supply units are required to power the
SAN Volume Controller nodes even when the input power source is considered
uninterruptible.
If the 2145 UPS-1U reports a loss of input power, the SAN Volume Controller node
stops all I/O operations and dumps the contents of its dynamic random access
memory (DRAM) to the internal disk drive. When input power to the 2145 UPS-1U
is restored, the SAN Volume Controller node restarts and restores the original
contents of the DRAM from the data saved on the disk drive.
A SAN Volume Controller node is not fully operational until the 2145 UPS-1U
battery state indicates that it has sufficient charge to power the SAN Volume
Controller node long enough to save all of its memory to the disk drive. In the
event of a power loss, the 2145 UPS-1U has sufficient capacity for the SAN Volume
Controller to save all its memory to disk at least twice. For a fully charged 2145
UPS-1U, even after battery charge has been used to power the SAN Volume
Controller node while it saves dynamic random access memory (DRAM) data,
sufficient battery charge remains so that the SAN Volume Controller node can
become fully operational as soon as input power is restored.
Important: Do not shut down a 2145 UPS-1U without first shutting down the SAN
Volume Controller node that it supports. Pushing the 2145 UPS-1U on/off button
while the node is still operating can compromise data integrity. However, in an
emergency, you can manually shut down the 2145 UPS-1U by pushing its on/off
button while the node is still operating. Service actions must then be performed
before the node can resume normal operations. If multiple uninterruptible power
supply units are shut down before the nodes that they support, data can be
corrupted.
For connection to the 2145 UPS-1U, each SAN Volume Controller node of a pair
must be connected to only one 2145 UPS-1U.
SAN Volume Controller provides a cable bundle for connecting the uninterruptible
power supply to a node. For SAN Volume Controller 2145-8F2, SAN Volume
Controller 2145-8F4, SAN Volume Controller 2145-8G4, and SAN Volume
Controller 2145-8A4, this is a single power cable plus a serial cable. For SAN
Volume Controller 2145-CF8 and SAN Volume Controller 2145-CG8, this is a
dual-power cable plus serial cable. This cable is used to connect both power
supplies of a node to the same uninterruptible power supply.
The SAN Volume Controller software determines whether the input voltage to the
uninterruptible power supply is within range and sets an appropriate voltage
alarm range on the uninterruptible power supply. The software continues to
recheck the input voltage every few minutes. If it changes substantially but
remains within the permitted range, the alarm limits are readjusted.
Note: The 2145 UPS-1U is equipped with a cable retention bracket that keeps the
power cable from disengaging from the rear panel. See the related documentation
for more information.
The load segment 2 indicator on the 2145 UPS-1U is lit (green) when power is
available to load segment 2.
The load segment 1 indicator on the 2145 UPS-1U is not currently used by the
SAN Volume Controller.
Note: When the 2145 UPS-1U is configured by the SAN Volume Controller, this
load segment is disabled. During normal operation, the load segment 1 indicator is
off. A “Do not use” label covers the receptacles.
Alarm indicator:
If the alarm is on, go to the 2145 UPS-1U MAP to resolve the problem.
On-battery indicator:
The amber on-battery indicator is on when the 2145 UPS-1U is powered by the
battery. This indicates that the main power source has failed.
If the on-battery indicator is on, go to the 2145 UPS-1U MAP to resolve the
problem.
Overload indicator:
The overload indicator lights up when the capacity of the 2145 UPS-1U is
exceeded.
If the overload indicator is on, go to MAP 5250: 2145 UPS-1U repair verification to
resolve the problem.
Power-on indicator:
When the power-on indicator is a steady green, the 2145 UPS-1U is active.
On or off button:
The on or off button turns the power on or off for the 2145 UPS-1U.
After you connect the 2145 UPS-1U to the outlet, it remains in standby mode until
you turn it on. Press and hold the on or off button until the power-on indicator is
illuminated (approximately five seconds). On some versions of the 2145 UPS-1U,
you might need a pointed device, such as a screwdriver, to press the on or off
button. A self-test is initiated that takes approximately 10 seconds, during which
time the indicators are turned on and off several times. The 2145 UPS-1U then
enters normal mode.
Press and hold the on or off button until the power-on light is extinguished
(approximately five seconds). On some versions of the 2145 UPS-1U, you might
need a pointed device, such as a screwdriver, to press the on or off button. This
places the 2145 UPS-1U in standby mode. You must then unplug the 2145 UPS-1U
to turn off the unit.
Attention: Do not turn off the uninterruptible power supply before you shut
down the SAN Volume Controller node that it is connected to. Always follow the
instructions that are provided in MAP 5350 to perform an orderly shutdown of a
SAN Volume Controller node.
Use the test and alarm reset button to start the self-test.
To start the self-test, press and hold the test and alarm reset button for three
seconds. This button also resets the alarm.
Figure 51 shows the location of the connectors and switches on the 2145 UPS-1U.
Figure 52 on page 57 shows the dip switches, which can be used to configure the
input and output voltage ranges. Because this function is performed by the SAN
Volume Controller software, both switches must be left in the OFF position.
Figure 52. 2145 UPS-1U dip switches
The 2145 UPS-1U is equipped with ports that are not used by the SAN Volume
Controller and have not been tested. Use of these ports, in conjunction with the
SAN Volume Controller or any other application that might be used with the SAN
Volume Controller, is not supported. Figure 53 shows the 2145 UPS-1U ports that
are not used.
The following tables describe the physical characteristics of the 2145 UPS-1U.
Ensure that space is available in a rack that is capable of supporting the 2145
UPS-1U.
Heat output
The 2145 UPS-1U unit produces the following approximate heat output.
Table 25 provides a brief description of each SAN Volume Controller 2145-8F4 FRU.
Table 25. SAN Volume Controller 2145-8F4 FRU descriptions
Frame assembly: A complete SAN Volume Controller 2145-8F4 with the exception of the Fibre Channel cards and the service controller.
4-port Fibre Channel host bus adapter (HBA): The SAN Volume Controller 2145-8F4 is connected to the Fibre Channel fabric through the Fibre Channel HBA. The card assembly is located in PCI slot 2. A Fibre Channel card must not be installed in PCI slot 1 when this card is installed.
Fibre Channel small form-factor pluggable (SFP) transceiver: A compact optical transceiver that provides the optical interface to a Fibre Channel cable. It is capable of operating at up to 4 Gbps.
Riser card, PCI Express: An interconnection card that provides the interface between the system board and the 4-port Fibre Channel adapter.
Service controller: The FRU that provides the service functions and the front panel display and buttons.
Disk drive assembly: A SATA (serial advanced technology attachment) disk drive assembly for the SAN Volume Controller 2145-8F4.
Memory module: A 1 GB ECC DDR2 memory module.
Microprocessor: The microprocessor on the system board.
Voltage regulator module (VRM): The VRM of the microprocessor.
Power supply assembly: An assembly that provides dc power to the SAN Volume Controller 2145-8F4.
Power backplane: An assembly that provides a power interface between the system board and the power supply assembly.
CMOS battery: A 3.0 V battery on the system board that maintains power to back up the system BIOS settings.
Fan power cable: A kit that provides the cables for connecting the fan backplanes to the system board.
Front panel signal cable: A ribbon cable that connects the operator-information panel to the system board.
Fan backplane: A kit that provides all fan holder and fan backplane assemblies.
Operator-information panel: The information panel that includes the power-control button and the light path diagnostics LEDs.
Fan, 40×40×28: The single fan assemblies located in fan positions 1 - 3.
Fan, 40×40×56: The double fan assemblies located in fan positions 4 - 7.
Table 26 provides a brief description of each SAN Volume Controller 2145-8F2 FRU.
Table 26. SAN Volume Controller 2145-8F2 FRU descriptions
Frame assembly: A complete SAN Volume Controller 2145-8F2 with the exception of the Fibre Channel cards and the service controller.
Fibre Channel host bus adapter (HBA) (full height): The SAN Volume Controller 2145-8F2 is connected to the Fibre Channel fabric through the Fibre Channel HBA. The full-height card assembly is located in PCI slot 2.
Fibre Channel small form-factor pluggable (SFP) transceiver: A compact optical transceiver that provides the optical interface to a Fibre Channel cable. Its maximum speed is limited to 2 Gbps by the Fibre Channel adapter.
Riser card, PCI (full height): An interconnection card that provides the interface between the system board and the PCI card in slot 2.
Fibre Channel HBA (low profile): The SAN Volume Controller 2145-8F2 is connected to the Fibre Channel fabric through the Fibre Channel HBA. The low-profile card assembly is located in PCI slot 1.
Riser card, PCI (low profile): An interconnection card that provides the interface between the system board and the PCI card in slot 1.
Service controller: The FRU that provides the service functions and the front panel display and buttons.
Disk drive assembly: A SATA (serial advanced technology attachment) disk drive assembly for the SAN Volume Controller 2145-8F2.
Memory module: A 1 GB ECC DDR2 memory module.
Microprocessor: The microprocessor on the system board.
Voltage regulator module (VRM): The VRM of the microprocessor.
Power supply assembly: An assembly that provides DC power to the SAN Volume Controller 2145-8F2.
Power backplane: An assembly that provides a power interface between the system board and the power supply assembly.
CMOS battery: A 3.0 V battery on the system board that maintains power to back up the system BIOS settings.
Redundant ac-power switch assembly: The redundant ac-power switch and its input power cables.
You use the management GUI to manage and service your system. The Monitoring
> Events panel provides access to problems that must be fixed and maintenance
procedures that step you through the process of correcting the problem.
You can also sort events by time or error code. When you sort by error code, the
most serious events, those with the lowest numbers, are displayed first. You can
select any event that is listed and select Actions > Properties to view details about
the event.
v Recommended Actions. For each problem that is selected, you can:
– Run a fix procedure.
– View the properties.
v Event log. For each entry that is selected, you can:
– Run a fix procedure.
– Mark an event as fixed.
– Filter the entries to show them by specific minutes, hours, or dates.
– Reset the date filter.
– View the properties.
Regularly monitor the status of the system using the management GUI. If you
suspect a problem, use the management GUI first to diagnose and resolve the
problem.
Use the views that are available in the management GUI to verify the status of the
system, the hardware devices, the physical storage, and the available volumes. The
Monitoring > Events panel provides access to all problems that exist on the
system. Use the Recommended Actions filter to display the most important events
that need to be resolved.
If there is a service error code for the alert, you can run a fix procedure that assists
you in resolving the problem. These fix procedures analyze the system and provide
more information about the problem. They suggest actions to take and step you
through the actions that automatically manage the system where necessary. Finally,
they check that the problem is resolved.
If there is an error that is reported, always use the fix procedures within the
management GUI to resolve the problem. Always use the fix procedures for both
system configuration problems and hardware failures. The fix procedures analyze
the system to ensure that the required changes do not cause volumes to be
inaccessible to the hosts. The fix procedures automatically perform configuration
changes that are required to return the system to its optimum state.
You must use a supported web browser. Verify that you are using a supported web
browser from the following website:
You can use the management GUI to manage your system as soon as you have
created a clustered system.
Procedure
1. Start a supported web browser and point the browser to the management IP
address of your system.
The management IP address is set when the clustered system is created. Up to
four addresses can be configured for your use. There are two addresses for
IPv4 access and two addresses for IPv6 access.
2. When the connection is successful, you see a login panel.
3. Log on by using your user name and password.
4. When you have logged on, select Monitoring > Events.
5. Ensure that the events log is filtered using Recommended actions.
6. Select the recommended action and run the fix procedure.
7. Continue to work through the alerts in the order suggested, if possible.
Results
After all the alerts are fixed, check the status of your system to ensure that it is
operating as intended.
The cache on the selected node is flushed before the node is taken offline. In some
circumstances, such as when the system is already degraded (for example, when
both nodes in the I/O group are online and the volumes within the I/O group are
degraded), the system ensures that data loss does not occur as a result of deleting
the only node with the cache data. If a failure occurs on the other node in the I/O
group, the cache is flushed before the node is removed to prevent data loss.
Before deleting a node from the system, record the node serial number, worldwide
node name (WWNN), all worldwide port names (WWPNs), and the I/O group
that the node is currently part of. If the node is re-added to the system at a later
time, recording this node information can avoid data corruption.
Chapter 3. SAN Volume Controller user interfaces for servicing your system 71
Attention:
v If you are removing a single node and the remaining node in the I/O group is
online, the data on the remaining node goes into write-through mode. This data
can be exposed to a single point of failure if the remaining node fails.
v If the volumes are already degraded before you remove a node, redundancy to
the volumes is degraded. Removing a node might result in a loss of access to
data and data loss.
v Removing the last node in the system destroys the system. Before you remove
the last node in the system, ensure that you want to destroy the system.
v When you remove a node, you remove all redundancy from the I/O group. As a
result, new or existing failures can cause I/O errors on the hosts. The following
failures can occur:
– Host configuration errors
– Zoning errors
– Multipathing-software configuration errors
v If you are deleting the last node in an I/O group and there are volumes that are
assigned to the I/O group, you cannot remove the node from the system if the
node is online. You must back up or migrate all data that you want to save
before you remove the node. If the node is offline, you can remove the node.
v When you remove the configuration node, the configuration function moves to a
different node within the system. This process can take a short time, typically
less than a minute. The management GUI reattaches to the new configuration
node transparently.
v If you turn the power on to the node that has been removed and it is still
connected to the same fabric or zone, it attempts to rejoin the system. The
system tells the node to remove itself from the system and the node becomes a
candidate for addition to this system or another system.
v If you are adding this node into the system, ensure that you add it to the same
I/O group that it was previously a member of. Failure to do so can result in
data corruption.
This task assumes that you have already accessed the management GUI.
Procedure
1. Select Monitoring > System.
2. Find the node that you want to remove.
If the node that you want to remove is shown as Offline, then the node is not
participating in the system.
If the node that you want to remove is shown as Online, deleting the node can
cause dependent volumes to go offline. Verify whether the node has any
dependent volumes.
3. To check for dependent volumes before attempting to remove the node, click
Manage, and then click Show Dependent Volumes.
If any volumes are listed, determine why, and whether access to the volumes is
assigned from MDisk groups that contain solid-state drives (SSDs) that are
located in the node, check why the volume mirror, if it is configured, is not
synchronized. There can also be dependent volumes because the partner node
Before you add a node to a system, you must make sure that the switch zoning is
configured such that the node being added is in the same zone as all other nodes
in the system. If you are replacing a node and the switch is zoned by worldwide
port name (WWPN) rather than by switch port, make sure that the switch is
configured such that the node being added is in the same VSAN or zone.
If you are adding a node that has been used previously, either within a different
I/O group within this system or within a different system, consider the following
situations before adding the node. If you add a node to the system without
changing its worldwide node name (WWNN), hosts might detect the node and use
it as if it were in its old location. This action might cause the hosts to access the
wrong volumes.
v If the new node requires a level of software that is higher than the software level
that is available on the system, the entire clustered system must be upgraded
before the new node can be added.
v If you are re-adding a node back to the same I/O group after a service action
required the node to be deleted from the system and the physical node has not
changed, no special procedures are required and the node can be added back to
the system.
v If you are replacing a node in a system either because of a node failure or an
upgrade, you must change the WWNN of the new node to match that of the
original node before you connect the node to the Fibre Channel network and
add the node to the system.
v If you are creating an I/O group in the system and are adding a new node,
there are no special procedures because this node was never added to a system
and the WWNN for the node did not exist.
v If you are creating an I/O group in the system and are adding a new node that
has been added to a system before, the host system might still be configured to
the node WWPNs and the node might still be zoned in the fabric. Because you
cannot change the WWNN for the node, you must ensure that other components
in your fabric are configured correctly. Verify that any host that was previously
configured to use the node has been correctly updated.
v If the node that you are adding was previously replaced, either for a node repair
or upgrade, you might have used the WWNN of that node for the replacement
node. Ensure that the WWNN of this node was updated so that you do not have
two nodes with the same WWNN attached to your fabric. Also ensure that the
WWNN of the node that you are adding is not 00000. If it is 00000, contact your
IBM representative.
Attention:
1. If you are adding a node to the SAN again, ensure that you are adding the
node to the same I/O group from which it was removed. Failure to do this
action can result in data corruption. You must use the information that was
recorded when the node was originally added to the system. If you do not
have access to this information, call the IBM Support Center to add the node
back into the system without corrupting the data.
2. For each external storage system, the LUNs that are presented to the ports on
the new node must be the same as the LUNs that are presented to the nodes
that currently exist in the system. You must ensure that the LUNs are the same
before you add the new node to the system.
3. For each external storage system, LUN masking for each LUN must be
identical for all nodes in a system. You must ensure that the LUN masking for
each LUN is identical before you add the new node to the system.
4. You must ensure that the model type of the new node is supported by the SAN
Volume Controller software level that is currently installed on the system. If the
model type is not supported by the SAN Volume Controller software level,
upgrade the system to a software level that supports the model type of the new
node. See the following website for the latest supported software levels:
www.ibm.com/storage/support/2145
Note: Whenever possible, provide a meaningful name for objects to make them
easier to identify in the future.
This task assumes that you have already accessed the management GUI.
Important: You need this information to avoid possible data corruption if you
must remove and add the node to the system again.
If a node shows node error 578 or node error 690, the node is in service state.
Perform the following steps from the front panel to exit service state:
1. Press and release the up or down button until the Actions? option displays.
2. Press the select button.
3. Press and release the up or down button until the Exit Service? option
displays.
4. Press the select button.
5. Press and release the left or right button until the Confirm Exit? option
displays.
6. Press the select button.
For any other node errors, follow the appropriate service procedures to fix the
errors. After the errors are resolved and the node is in candidate state, you can try
to add the node to the system again.
The node might be in service state because it has a hardware issue, has corrupted
data, or has lost its configuration data.
The management GUI operates only when there is an online clustered system. Use
the service assistant if you are unable to create a clustered system.
The service assistant provides detailed status and error summaries, and the ability
to modify the worldwide node name (WWNN) for each node.
You must use a supported web browser. Verify that you are using a supported and
an appropriately configured web browser from the following website:
www.ibm.com/storage/support/2145
Procedure
1. Start a supported web browser and point your web browser to
<serviceaddress>/service for the node that you want to work on.
2. Log on to the service assistant using the superuser password.
If you do not know the current superuser password, reset the password.
Results
For a full description of the commands and how to start an SSH command-line
session, see the “Command-line interface” topic in the “Reference” section of the
SAN Volume Controller Information Center.
Nearly all of the flexibility that is offered by the CLI is available through the
management GUI. However, the CLI does not provide the fix procedures that are
available in the management GUI. Therefore, use the fix procedures in the
management GUI to resolve the problems. Use the CLI when you require a
configuration setting that is unavailable in the management GUI.
You might also find it useful to create command scripts using the CLI commands
to monitor for certain conditions or to automate configuration changes that you
make on a regular basis.
Accessing the cluster (system) CLI
Follow the steps that are described in the “Command-line interface” topic in the
“Reference” section of the SAN Volume Controller Information Center to initialize
and use a CLI session.
For a full description of the commands and how to start an SSH command-line
session, see the “Command-line interface” topic in the “Reference” section of the
SAN Volume Controller Information Center.
To access a node directly, it is normally easier to use the service assistant with its
graphical interface and extensive help facilities.
| Use the front panel when you are physically next to the system and are unable to
| access one of the system GUIs.
Attention: Run the repairvdiskcopy command only if all volume copies are
synchronized.
When you issue the repairvdiskcopy command, you must use only one of the
-validate, -medium, or -resync parameters. You must also specify the name or ID
of the volume to be validated and repaired as the last entry on the command line.
After you issue the command, no output is displayed.
-validate
Use this parameter if you only want to verify that the mirrored volume copies
are identical. If any difference is found, the command stops and logs an error
that includes the logical block address (LBA) and the length of the first
difference. You can use this parameter, starting at a different LBA each time to
count the number of differences on a volume.
-medium
Use this parameter to convert sectors on all volume copies that contain
different contents into virtual medium errors. Upon completion, the command
logs an event, which indicates the number of differences that were found, the
number that were converted into medium errors, and the number that were
not converted. Use this option if you are unsure what the correct data is, and
you do not want an incorrect version of the data to be used.
-resync
Use this parameter to overwrite contents from the specified primary volume
copy to the other volume copy. The command corrects any differing sectors by
copying the sectors from the primary copy to the copies being compared. Upon
completion, the command process logs an event, which indicates the number
of differences that were corrected. Use this action if you are sure that either the
primary volume copy data is correct or that your host applications can handle
incorrect data.
-startlba lba
Optionally, use this parameter to specify the starting logical block address
(LBA) from which to start the validation and repair. If you previously used the
-validate parameter, an error was logged with the LBA where the first
difference, if any, was found. Reissue repairvdiskcopy with that LBA to avoid
reprocessing the initial sectors that compared identically. Continue to reissue
repairvdiskcopy using this parameter to list all the differences.
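The parameter rules above (exactly one of -validate, -medium, or -resync; the volume name or ID last on the command line; an optional -startlba to resume) can be sketched as a small command builder. This is an illustrative Python sketch, not IBM code; the build_repairvdiskcopy_command helper is a hypothetical name used only for this example.

```python
def build_repairvdiskcopy_command(volume, mode, startlba=None):
    """Build a repairvdiskcopy invocation string.

    mode must be exactly one of 'validate', 'medium', or 'resync';
    the volume name or ID is always the last entry on the command line.
    This helper is illustrative only; it is not part of the SVC CLI.
    """
    if mode not in ("validate", "medium", "resync"):
        raise ValueError("use exactly one of -validate, -medium, or -resync")
    parts = ["repairvdiskcopy", "-" + mode]
    if startlba is not None:
        # Resume from the LBA that a previous -validate run logged, so that
        # sectors that already compared identically are not reprocessed.
        parts += ["-startlba", str(startlba)]
    parts.append(str(volume))
    return " ".join(parts)
```

For example, build_repairvdiskcopy_command("vdisk8", "validate", startlba=16) returns the command line "repairvdiskcopy -validate -startlba 16 vdisk8", which could then be issued over an SSH session to the system.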
Notes:
1. Only one repairvdiskcopy command can run on a volume at a time.
2. Once you start the repairvdiskcopy command, you cannot use the command to
stop processing.
3. The primary copy of a mirrored volume cannot be changed while the
repairvdiskcopy -resync command is running.
4. If there is only one mirrored copy, the command returns immediately with an
error.
5. If a copy being compared goes offline, the command is halted with an error.
The command is not automatically resumed when the copy is brought back
online.
6. In the case where one copy is readable but the other copy has a medium error,
the command process automatically attempts to fix the medium error by
writing the read data from the other copy.
7. If no differing sectors are found during repairvdiskcopy processing, an
informational error is logged at the end of the process.
To check the progress of validation and repair of mirrored volumes, issue the
following command:
lsrepairvdiskcopyprogress -delim :
If a repair operation completes successfully and the volume was previously offline
because of corrupted metadata, the command brings the volume back online. The
only limit on the number of concurrent repair operations is the number of virtual
disk copies in the configuration.
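Because -delim : produces one colon-separated record per line with a header row, the output can be parsed mechanically, for example when scripting progress monitoring. The sketch below is illustrative Python, not part of the product, and the column names in the sample output are assumptions for this example; check the header row that your own system returns rather than relying on them.

```python
def parse_progress(output):
    """Parse colon-delimited CLI output into a list of dicts.

    Assumes the first non-empty line is the header row and that every
    data line has the same number of colon-separated fields.
    """
    lines = [ln for ln in output.strip().splitlines() if ln]
    header = lines[0].split(":")
    return [dict(zip(header, ln.split(":"))) for ln in lines[1:]]

# Sample output with assumed column names, for illustration only.
sample = (
    "vdisk_id:vdisk_name:copy_id:task:progress:estimated_completion_time\n"
    "0:vdisk0:0:medium:50:070301120000\n"
    "0:vdisk0:1:medium:50:070301120000"
)
rows = parse_progress(sample)
```

Each entry in rows maps a column name to its value, so a monitoring script could, for instance, poll until every copy reports 100 in its progress field.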
Notes:
1. Because the volume is offline to the host, any I/O that is submitted to the
volume while it is being repaired fails.
2. When the repair operation completes successfully, the corrupted metadata error
is marked as fixed.
3. If the repair operation fails, the volume is held offline and an error is logged.
Note: Only run this command after you run the repairsevdiskcopy command,
which you must only run as required by the fix procedures or by the IBM Support
Center.
If you have lost both nodes in an I/O group and have, therefore, lost access to all
the volumes that are associated with the I/O group, you must perform one of the
following procedures to regain access to your volumes. Depending on the failure
type, you might have lost data that was cached for these volumes and the volumes
are now offline.
One node in an I/O group has failed and failover has started on the second node.
During the failover process, the second node in the I/O group fails before the data
in the write cache is written to hard disk. The first node is successfully repaired
but its hardened data is not the most recent version that is committed to the data
store; therefore, it cannot be used. The second node is repaired or replaced and has
lost its hardened data, therefore, the node has no way of recognizing that it is part
of the clustered system.
Chapter 4. Performing recovery actions using the SAN Volume Controller CLI 81
Perform the following steps to recover from an offline volume when one node has
down-level hardened data and the other node has lost hardened data:
Procedure
1. Recover the node and add it back into the system.
2. Delete all IBM FlashCopy mappings and Metro Mirror or Global Mirror
relationships that use the offline volumes.
3. Run the recovervdisk, recovervdiskbyiogrp or recovervdiskbysystem
command.
4. Re-create all FlashCopy mappings and Metro Mirror or Global Mirror
relationships that use the volumes.
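The three recovery commands in step 3 differ only in scope: a single volume, all volumes in an I/O group, or all volumes in the system. This is a minimal sketch of that choice; the helper name and argument forms are illustrative only, not part of the product CLI documentation:

```python
# Hypothetical helper: build the recovery command named in step 3 for a
# given scope. Argument forms are illustrative.
def recover_command(scope, target=None):
    if scope == "vdisk":
        return f"recovervdisk {target}"         # recover one volume
    if scope == "iogrp":
        return f"recovervdiskbyiogrp {target}"  # recover an I/O group
    if scope == "system":
        return "recovervdiskbysystem"           # recover every volume
    raise ValueError("scope must be vdisk, iogrp, or system")

print(recover_command("iogrp", "io_grp0"))
```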
Example
Both nodes in the I/O group have failed and have been repaired. The nodes have
lost their hardened data; therefore, the nodes have no way of recognizing that they
are part of the system.
Perform the following steps to recover from an offline volume when both nodes
have lost their hardened data and cannot be recognized by the system:
1. Delete all FlashCopy mappings and Metro Mirror or Global Mirror
relationships that use the offline volumes.
2. Run the recovervdisk, recovervdiskbyiogrp, or recovervdiskbysystem
command.
3. Re-create all FlashCopy mappings and Metro Mirror or Global Mirror
relationships that use the volumes.
These procedures are nondisruptive because changes to your SAN environment are
not required. The replacement (new) node uses the same worldwide node name
(WWNN) as the node that you are replacing. An alternative to this procedure is to
replace nodes disruptively either by moving volumes to a new I/O group or by
rezoning the SAN. The disruptive procedures, however, require additional work on
the hosts.
Note: For nodes that contain solid-state drives (SSDs): if the existing SSDs are
being moved to the new node, the new node must contain the necessary
serial-attached SCSI (SAS) adapter to support SSDs.
v All nodes that are configured in the system are present and online.
Important:
1. Do not continue this task if any of the conditions listed are not met unless you
are instructed to do so by the IBM Support Center.
2. Review all of the steps that follow before you perform this task.
3. Do not perform this task if you are not familiar with SAN Volume Controller
environments or the procedures described in this task.
4. If you plan to reuse the node that you are replacing, ensure that the WWNN of
the node is set to a unique number on your SAN. If you do not ensure that the
WWNN is unique, the WWNN and WWPN are duplicated in the SAN
environment and can cause problems.
Tip: You can change the WWNN of the node you are replacing to the factory
default WWNN of the replacement node to ensure that the number is unique.
5. The node ID and possibly the node name change during this task. After the
system assigns the node ID, the ID cannot be changed. However, you can
change the node name after this task is complete.
Procedure
1. (If the system software version is at 5.1 or later, complete this step.)
Confirm that no hosts have dependencies on the node.
When shutting down a node that is part of a system or when deleting the
node from a system, you can use either the management GUI or a
command-line interface (CLI) command. In the management GUI, select
Monitoring > System > Manage. Click Show Dependent Volumes to display
all the volumes that are dependent on a node. You can also use the node
parameter with the lsdependentvdisks CLI command to view dependent
volumes.
If dependent volumes exist, determine if the volumes are being used. If the
volumes are being used, either restore the redundant configuration or suspend
the host application. If a dependent quorum disk is reported, repair the access
to the quorum disk or modify the quorum disk configuration.
2. Use these steps to determine the system configuration node, and the ID,
name, I/O group ID, and I/O group name for the node that you want to
replace. If you already know the physical location of the node that you want
to replace, you can skip this step and proceed to step 3.
Tip: If one of the nodes that you want to replace is the system configuration
node, replace it last.
a. Issue this command from the command-line interface (CLI):
lsnode -delim :
This output is an example of the output that is displayed for this
command:
id:name:UPS_serial_number:WWNN:status:IO_group_id:IO_group_name:
config_node:UPS_unique_id:hardware:iscsi_name:iscsi_alias
3:dvt113294:100089J137:5005076801005A07:online:0:io_grp0:yes:
20400002096810C7:8A4:iqn.1986-03.com.ibm:2145.ldcluster-80.dvt113294:
14:des113004:10006BR010:5005076801004F0F:online:0:io_grp0:no:
2040000192880040:8G4:iqn.1986-03.com.ibm:2145.ldcluster-80.des113004:
b. In the config_node column, find the value yes and record the values in the
id and name columns.
c. Record the values in the id and the name columns for each node in the
system.
d. Record the values in the IO_group_id and the IO_group_name columns for
each node in the system.
e. Issue this command from the CLI for each node in the system to
determine the front panel ID:
lsnodevpd node_name or node_id
where node_name or node_id is the name or ID of the node for which you
want to determine the front panel ID.
f. Record the value in the front_panel_id column. The front panel ID is
displayed on the front of each node. You can use this ID to determine the
physical location of the node that matches the node ID or node name that
you want to replace.
3. Perform these steps to record the WWNN or iSCSI name of the node that you
want to replace:
a. Issue this command from the CLI:
lsnode -delim : node_name or node_id
where node_name or node_id is the name or ID of the node for which you
want to determine the WWNN or iSCSI name.
b. Record the WWNN or iSCSI name of the node that you want to replace.
Also record the order of the Fibre Channel and Ethernet ports.
4. Issue this command from the CLI to power off the node:
stopsystem -node node_name
Important:
a. Record and mark the order of the Fibre Channel or Ethernet cables with
the node port number (port 1 to 4 for Fibre Channel, or port 1 to 2 for
Ethernet) before you remove the cables from the back of the node. The
Fibre Channel ports on the back of the node are numbered 1 to 4 from left
to right. You must reconnect the cables in the exact order on the
replacement node to avoid issues when the replacement node is added to
the system. If the cables are not connected in the same order, the port IDs
A list of nodes is displayed. Wait until the removed node is not listed in the
command output.
7. Perform these steps to change the WWNN or iSCSI name of the node that you
just deleted from the system to FFFFF:
For SAN Volume Controller V6.1.0 or later:
a. Power on the node. With the Cluster panel displayed, press the up or
down button until the Actions option is displayed.
b. Press and release the select button.
c. Press the up or down button until Change WWNN? is displayed.
d. Press and release the select button to display the current WWNN.
e. Press and release the select button to switch into edit mode. The Edit
WWNN? panel is displayed.
f. Change the WWNN to FFFFF.
g. Press and release the select button to exit edit mode.
h. Press the right button to confirm your selection. The Confirm WWNN? panel
is displayed.
i. Press and release the select button to confirm.
8. Install the replacement node and the uninterruptible power supply in the rack
and connect the uninterruptible power supply cables. See the IBM System
Storage SAN Volume Controller Model 2145-XXX Hardware Installation Guide to
determine how to connect the node and the uninterruptible power supply.
Important: Do not connect the Fibre Channel or Ethernet cables during this
step.
9. If you are removing SSDs from an old node and inserting them into a new
node, see the IBM System Storage SAN Volume Controller Hardware Maintenance
Guide for specific instructions.
10. Power on the replacement node.
11. Record the WWNN of the replacement node. You can use this name if you
plan to reuse the node that you are replacing.
12. Perform these steps to change the WWNN name of the replacement node to
match the name that you recorded in step 3 on page 84:
For SAN Volume Controller V6.1.0 or later:
a. With the Cluster panel displayed, press the up or down button until the
Actions option is displayed.
b. Press and release the select button.
c. Press the up or down button until Change WWNN? is displayed.
d. Press and release the select button to display the current WWNN.
e. Press the select button to switch into edit mode. The Edit WWNN? panel is
displayed.
f. Change the WWNN to the numbers that you recorded in step 3 on page
84.
g. Press and release the select button to exit edit mode.
h. Press the right button to confirm your selection. The Confirm WWNN? panel
is displayed.
i. Press the select button to confirm.
Wait one minute. If Cluster: is displayed on the front panel, this indicates
that the node is ready to be added to the system. If Cluster: is not displayed,
see the troubleshooting information to determine how to address this problem
or contact the IBM Support Center before you continue with the next step.
13. Connect the Fibre Channel or Ethernet cables to the same port numbers that
you recorded for the original node in step 4 on page 84.
14. Issue this CLI command to verify that the last five characters of the WWNN
are correct:
lsnodecandidate
Important: If the WWNN is not what you recorded in step 3 on page 84, you
must repeat step 12 on page 85.
15. Issue this CLI command to add the node to the system and ensure that the
node has the same name as the original node and is in the same I/O group as
the original node. See the addnode CLI command documentation for more
information.
addnode -wwnodename WWNN -iogrp iogroupname/id
WWNN and iogroupname/id are the values that you recorded for the original
node.
SAN Volume Controller V5.1 and later automatically reassigns the node the
name that was used originally. For versions before V5.1, use the name
parameter with the svctask addnode command to assign a name. If the
original node name was automatically assigned by SAN Volume Controller, it
is not possible to reuse the same name. A name was automatically assigned if
it starts with node. In this case, either specify a different name that does not
start with node or do not use the name parameter so that SAN Volume
Controller automatically assigns a new name to the node.
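The naming rule above can be captured in a few lines. This is an illustrative sketch only; the helper is hypothetical and simply mirrors the rule that automatically assigned names start with node and cannot be reused:

```python
# Hypothetical helper: build an addnode command, omitting -name when the
# original name was automatically assigned (that is, starts with "node").
def addnode_command(wwnn, iogrp, original_name):
    cmd = ["addnode", "-wwnodename", wwnn, "-iogrp", iogrp]
    if not original_name.startswith("node"):
        cmd += ["-name", original_name]  # user-assigned name: reusable
    return " ".join(cmd)

print(addnode_command("5005076801005A07", "io_grp0", "dvt113294"))
print(addnode_command("5005076801004F0F", "io_grp0", "node7"))
```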
If necessary, the new node is updated to the same SAN Volume Controller
software version as the system. This update can take up to 20 minutes.
Important:
a. Both nodes in the I/O group cache data; however, the cache sizes are
asymmetric. The replacement node is limited by the cache size of the
partner node in the I/O group. Therefore, it is possible that the
replacement node does not use the full cache size until you replace the
other node in the I/O group.
b. You do not have to reconfigure the host multipathing device drivers
because the replacement node uses the same WWNN and WWPNs as the
previous node.
88 SAN Volume Controller: Troubleshooting Guide
Chapter 5. Viewing the vital product data
Vital product data (VPD) is information that uniquely identifies each element in
the SAN Volume Controller. The data is updated automatically by the system
when the configuration is changed.
Using different sets of commands, you can view the system VPD and the node
VPD. You can also view the VPD through the management GUI.
Perform the following steps to view the vital product data for a node:
Procedure
1. From Home, click System Status.
2. Select the node for which you want to display the details.
3. Click VPD to view the data.
Note: For the SAN Volume Controller 2145-8A4, 2145-8G4, and 2145-8F4 nodes,
the lsnodevpd nodename command displays the device serial number of the Fibre
Channel card as “N/A.”
Procedure
1. Issue the lsnode CLI command to display a concise list of nodes in the system.
The following is an example of the CLI command you can issue to list the
nodes in the system:
lsnode -delim :
The following is an example of the output that is displayed:
id:name:UPS_serial_number:WWNN:status:IO_group_id:IO_group_name:config_node:UPS_unique_id:hardware:iscsi_name:iscsi_alias:
panel_name:enclosure_id:canister_id:enclosure_serial_number
1:node1:UPS_Fake_SN:50050768010050B1:online:0:io_grp0:yes:10000000000050B1:8G4:iqn.1986-03.com.ibm:2145.cluster0.node1:000368:::
2. Issue the lsnode CLI command and specify the node ID or name of the node
that you want to receive detailed output.
The following is an example of the CLI command you can issue to list detailed
output for a node in the system:
lsnode -delim : group1node1
where group1node1 is the name of the node for which you want to view
detailed output.
The following is an example of the output that is displayed:
id:1
name:group1node1
UPS_serial_number:10L3ASH
WWNN:500507680100002C
status:online
IO_group_id:0
IO_group_name:io_grp0
partner_node_id:2
partner_node_name:group1node2
config_node:yes
UPS_unique_id:202378101C0D18D8
port_id:500507680110002C
port_status:active
port_speed:2GB
port_id:500507680120002C
port_status:active
port_speed:2GB
port_id:500507680130002C
port_status:active
port_speed:2GB
port_id:500507680140003C
port_status:active
port_speed:2GB
hardware:8A4
iscsi_name:iqn.1986-03.com.ibm:2145.ndihill.node2
iscsi_alias
failover_active:no
failover_name:node1
failover_iscsi_name:iqn.1986-03.com.ibm:2145.ndihill.node1
failover_iscsi_alias
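In the detailed output, keys such as port_id, port_status, and port_speed repeat once per Fibre Channel port, so a flat dictionary would silently drop ports. A sketch that collects every value into a list (sample lines abbreviated from the output above):

```python
# Parse detailed `lsnode` output; repeated keys (one per FC port) are
# collected into lists. Sample lines abbreviated from the output above.
sample = """\
id:1
name:group1node1
WWNN:500507680100002C
status:online
port_id:500507680110002C
port_status:active
port_id:500507680120002C
port_status:active
"""

node = {}
for line in sample.strip().splitlines():
    key, _, value = line.partition(":")  # split at the first colon only
    node.setdefault(key, []).append(value)

print(node["name"][0])   # group1node1
print(node["port_id"])   # one WWPN per Fibre Channel port
```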
Procedure
Issue the lssystem command to display the properties for a clustered system.
The following is an example of the command you can issue:
lssystem -delim : build1
Results
id:000002007A00A0FE
name:build1
location:local
partnership:
bandwidth:
total_mdisk_capacity:90.7GB
space_in_mdisk_grps:90.7GB
space_allocated_to_vdisks:14.99GB
total_free_space:75.7GB
statistics_status:on
statistics_frequency:15
required_memory:0
cluster_locale:en_US
time_zone:522 UTC
code_level:6.1.0.0 (build 47.3.1009031000)
FC_port_speed:2Gb
console_IP:9.71.46.186:443
id_alias:000002007A00A0FE
gm_link_tolerance:300
gm_inter_cluster_delay_simulation:0
gm_intra_cluster_delay_simulation:0
email_reply:
email_contact:
email_contact_primary:
email_contact_alternate:
email_contact_location:
email_state:stopped
inventory_mail_interval:0
total_vdiskcopy_capacity:15.71GB
total_used_capacity:13.78GB
total_overallocation:17
total_vdisk_capacity:11.72GB
cluster_ntp_IP_address:
cluster_isns_IP_address:
iscsi_auth_method:none
iscsi_chap_secret:
auth_service_configured:no
auth_service_enabled:no
auth_service_url:
auth_service_user_name:
auth_service_pwd_set:no
auth_service_cert_set:no
relationship_bandwidth_limit:25
gm_max_host_delay:5
tier:generic_ssd
tier_capacity:0.00MB
tier_free_capacity:0.00MB
tier:generic_hdd
tier_capacity:90.67GB
tier_free_capacity:75.34GB
email_contact2:
email_contact2_primary:
email_contact2_alternate:
total_allocated_extent_capacity:16.12GB
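As a rough consistency check on the capacity figures above — assuming that total_overallocation is total_vdiskcopy_capacity expressed as a percentage of total_mdisk_capacity, truncated to a whole number (an interpretation that matches this example, not a documented formula):

```python
# Capacity figures copied from the lssystem output above.
total_mdisk_capacity_gb = 90.7        # total_mdisk_capacity:90.7GB
total_vdiskcopy_capacity_gb = 15.71   # total_vdiskcopy_capacity:15.71GB

# Assumed relationship (see lead-in): overallocation as a percentage.
overallocation_pct = int(total_vdiskcopy_capacity_gb
                         / total_mdisk_capacity_gb * 100)
print(overallocation_pct)  # 17, matching total_overallocation:17
```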
Table 27 shows the fields you see for the system board.
Table 27. Fields for the system board
Item: System board
Field names:
v Part number
v System serial number
v Number of processors
v Number of memory slots
v Number of fans
v Number of Fibre Channel adapters
v Number of SCSI, IDE, SATA, or SAS devices (Note: The service controller is a device.)
v Number of power supplies
v Number of high-speed SAS adapters
v BIOS manufacturer
v BIOS version
v BIOS release date
v System manufacturer
v System product
v Planar manufacturer
v Power supply part number
v CMOS battery part number
v Power cable assembly part number
v Service processor firmware
v SAS controller part number
Table 28 shows the fields you see for each processor that is installed.
Table 28. Fields for the processors
Item: Processor
Field names:
v Part number
v Processor location
v Manufacturer
v Version
v Speed
v Status
v Processor serial number
Table 30 shows the fields that are repeated for each installed memory module.
Table 30. Fields that are repeated for each installed memory module
Item: Memory module
Field names:
v Part number
v Device location
v Bank location
v Size (MB)
v Manufacturer (if available)
v Serial number (if available)
Table 31 shows the fields that are repeated for each installed adapter card.
Table 31. Fields that are repeated for each adapter that is installed
Item: Adapter
Field names:
v Adapter type
v Part number
v Port numbers
v Location
v Device serial number
v Manufacturer
v Device
v Card revision
v Chip revision
Table 32 on page 94 shows the fields that are repeated for each device that is
installed.
Table 33 shows the fields that are specific to the node software.
Table 33. Fields that are specific to the node software
Item: Software
Field names:
v Code level
v Node name
v Worldwide node name
v ID
v Unique string that is used in dump file names for this node
Table 34 shows the fields that are provided for the front panel assembly.
Table 34. Fields that are provided for the front panel assembly
Item: Front panel
Field names:
v Part number
v Front panel ID
v Front panel locale
Table 35 shows the fields that are provided for the Ethernet port.
Table 35. Fields that are provided for the Ethernet port
Item: Ethernet port
Field names:
v Port number
v Ethernet port status
v MAC address
v Supported speeds
Table 36 on page 95 shows the fields that are provided for the power supplies in
the node.
Table 37 shows the fields that are provided for the uninterruptible power supply
assembly that is powering the node.
Table 37. Fields that are provided for the uninterruptible power supply assembly that is
powering the node
Item: Uninterruptible power supply
Field names:
v Electronics assembly part number
v Battery part number
v Frame assembly part number
v Input power cable part number
v UPS serial number
v UPS type
v UPS internal part number
v UPS unique ID
v UPS main firmware
v UPS communications firmware
Table 38 shows the fields that are provided for the SAS host bus adapter (HBA).
Table 38. Fields that are provided for the SAS host bus adapter (HBA)
Item: SAS HBA
Field names:
v Part number
v Port numbers
v Device serial number
v Manufacturer
v Device
v Card revision
v Chip revision
Table 39 on page 96 shows the fields that are provided for the SAS solid-state drive
(SSD).
Table 40 shows the fields that are provided for the small form factor pluggable
(SFP) transceiver.
Table 40. Fields that are provided for the small form factor pluggable (SFP) transceiver
Item: Small form factor pluggable (SFP) transceiver
Field names:
v Part number
v Manufacturer
v Device
v Serial number
v Supported speeds
v Connector type
v Transmitter type
v Wavelength
v Maximum distance by cable type
v Hardware revision
v Port number
v Worldwide port name
Table 41 on page 97 shows the fields that are provided for the system properties as
shown by the management GUI.
Figure 55 shows where the front-panel display is located on the SAN Volume
Controller node.
Figure 55. SAN Volume Controller front-panel assembly
The Boot progress display on the front panel shows that the node is starting.
Booting 130
During the boot operation, boot progress codes are displayed and the progress bar
moves to the right while the boot operation proceeds.
Boot failed
If the boot operation fails, boot code 120 is displayed.
Failed 120
See the "Error code reference" topic where you can find a description of the failure
and the appropriate steps that you must perform to correct the failure.
Charging
The front panel indicates that the uninterruptible power supply battery is charging.
Charging
A node does not start and join a system if there is insufficient charge in the
uninterruptible power supply battery to handle a power failure. Charging is
displayed until it is safe to start the node. This might take up to two hours.
Error codes
Error codes are displayed on the front panel display.
Figure 57 and Figure 58 show how error codes are displayed on the front panel.
For descriptions of the error codes that are displayed on the front panel display,
see the various error code topics for a full description of the failure and the actions
that you must perform to correct the failure.
Hardware boot
The hardware boot display shows system data when power is first applied to the
node as the node searches for a disk drive to boot.
If this display remains active for longer than 3 minutes, there might be a problem.
The cause might be a hardware failure or the software on the hard disk drive
might be missing or damaged.
Power failure
The SAN Volume Controller node uses battery power from the uninterruptible
power supply to shut itself down.
The Power failure display shows that the SAN Volume Controller is running on
battery power because main power has been lost. All I/O operations have stopped.
The node is saving system metadata and node cache data to the internal disk
drive. When the progress bar reaches zero, the node powers off.
Note: When input power is restored to the uninterruptible power supply, the SAN
Volume Controller turns on without the front panel power button being pressed.
Powering off
The progress bar on the display shows the progress of the power-off operation.
Powering Off is displayed after the power button has been pressed and while the
node is powering off. Powering off might take several minutes.
The progress bar moves to the left when the power is removed.
Chapter 6. Using the front panel of the SAN Volume Controller 101
Recovering
The front panel indicates that the uninterruptible power supply battery is not fully
charged.
Recovering
When a node is active in a system but the uninterruptible power supply battery is
not fully charged, Recovering is displayed. If the power fails while this message is
displayed, the node does not restart until the uninterruptible power supply has
charged to a level where it can sustain a second power failure.
Restarting
The front panel indicates when the software on a node is restarting.
Restarting
If you press the power button while powering off, the panel display changes to
indicate that the button press was detected; however, the power off continues until
the node finishes saving its data. After the data is saved, the node powers off and
then automatically restarts. The progress bar moves to the right while the node is
restarting.
Shutting down
The front-panel indicator tracks shutdown operations.
The Shutting Down display is shown when you issue a shutdown command to a
SAN Volume Controller clustered system or a SAN Volume Controller node. The
progress bar continues to move to the left until the node turns off.
When the shutdown operation is complete, the node turns off. When you power
off a node that is connected to a 2145 UPS-1U, only the node shuts down; the 2145
UPS-1U does not shut down.
Shutting Down
Typically, this panel is displayed when the service controller has been replaced.
The SAN Volume Controller uses the WWNN that is stored on the service
controller. Usually, when the service controller is replaced, you modify the WWNN
that is stored on it to match the WWNN of the service controller that it replaced.
By doing this, the node maintains its WWNN address, and you do not need to
modify the SAN zoning or host configurations. The WWNN that is stored on disk
is the same as the WWNN that was stored on the old service controller.
After it is in this mode, the front panel display does not revert to its normal
displays, such as node or cluster (system) options or operational status, until the
WWNN is validated. Navigate to the Validate WWNN? option (shown in Figure 60)
to choose which WWNN you want to use.
Figure 60. The Validate WWNN? navigation sequence
To choose which stored WWNN you want this node to use, perform the
following steps:
1. From the Validate WWNN? panel, press and release the select button. The Disk
WWNN: panel is displayed and shows the last five digits of the WWNN that is
stored on the disk.
2. To view the WWNN that is stored on the service controller, press and release
the right button. The Panel WWNN: panel is displayed and shows the last five
numbers of the WWNN that is stored on the service controller.
3. Determine which WWNN you want to use.
a. To use the WWNN that is stored on the disk, perform the following steps:
1) From the Disk WWNN: panel, press and release the down button. The
Use Disk WWNN? panel is displayed.
2) Press and release the select button.
b. To use the WWNN that is stored on the service controller, perform the
following steps:
1) From the Panel WWNN: panel, press and release the down button. The
Use Panel WWNN? panel is displayed.
2) Press and release the select button.
The node is now using the selected WWNN. The Node WWNN: panel is displayed
and shows the last five numbers of the WWNN that you selected.
If neither the WWNN that is stored on the service controller nor the WWNN that
is stored on the disk is suitable, you must wait until the node restarts before you
can change it. After the node restarts, select Change WWNN to change the WWNN
to the value that you want.
Menu options enable you to review the operational status of the clustered system,
node, and external interfaces. They also provide access to the tools and operations
that you use to service the node.
Figure 61 on page 105 shows the sequence of the menu options. Only one option at
a time is displayed on the front panel display. For some options, additional data is
displayed on line 2. The first option that is displayed is the Cluster: option.
Figure 61. The front-panel menu sequence: cluster status and IPv4 and IPv6
address, subnet or prefix, and gateway options; node status and WWNN options;
Ethernet port, speed, and MAC address options for ports 1 through 4; Fibre
Channel port status and speed options; the Actions option; and the Language
option.
Use the left and right buttons to navigate through the secondary fields that are
associated with some of the main fields.
Note: Messages might not display fully on the screen. You might see a right angle
bracket (>) on the right side of the display screen. If you see a right angle bracket,
press the right button to scroll through the display. When there is no more text to
display, you can move to the next item in the menu by pressing the right button.
Similarly, you might see a left angle bracket (<) on the left side of the display
screen. If you see a left angle bracket, press the left button to scroll through the
display. When there is no more text to display, you can move to the previous item
in the menu by pressing the left button.
The main cluster (system) option displays the system name that the user has
assigned. If a clustered system is in the process of being created on the node, and
no system name has been assigned, a temporary name that is based on the IP
address of the system is displayed. If this node is not assigned to a system, the
field is blank.
Status option
Status is indicated on the front panel.
This field is blank if the node is not a member of a clustered system. If this node is
a member of a clustered system, the field indicates the operational status of the
system, as follows:
Active
Indicates that this node is an active member of the system.
Inactive
Indicates that the node is a member of a system, but is not currently operational.
It is not operational because the other nodes that are in the system cannot be
accessed or because this node was excluded from the system.
Degraded
Indicates that the system is operational, but one or more of the member nodes
are missing or have failed.
These fields contain the IPv4 addresses of the system. If this node is not a member
of a system or if the IPv4 address has not been assigned, these fields are blank.
The IPv4 subnet mask addresses are set when the IPv4 addresses are assigned to
the system.
The IPv4 subnet options display the subnet mask addresses when the system has
IPv4 addresses. If the node is not a member of a system or if the IPv4 addresses
have not been assigned, this field is blank.
The IPv4 gateway addresses are set when the system is created.
The IPv4 gateway options display the gateway addresses for the system. If the
node is not a member of a system, or if the IPv4 addresses have not been assigned,
this field is blank.
These fields contain the IPv6 addresses of the system. If the node is not a member
of a system, or if the IPv6 address has not been assigned, these fields are blank.
The IPv6 prefix option displays the network prefix of the system and the service
IPv6 addresses. The prefix has a value of 0 - 127. If the node is not a member of a
system, or if the IPv6 addresses have not been assigned, a blank line displays.
The IPv6 gateway addresses are set when the system is created.
This option displays the IPv6 gateway addresses for the system. If the node is not
a member of a system, or if the IPv6 addresses have not been assigned, a blank
line displays.
The IPv6 addresses and the IPv6 gateway addresses consist of eight (4-digit)
hexadecimal values that are shown across four panels, as shown in Figure 62 on
page 108. Each panel displays two 4-digit values that are separated by a colon, the
address field position (such as 2/4) within the total address, and scroll indicators.
Move between the address panels by using the left button or right button.
Figure 62. Viewing the IPv6 address on the front-panel display
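The four-panel layout can be sketched as follows. The page formatting is an approximation of the front-panel text (the exact rendering is shown in Figure 62), and the helper assumes the full eight-group form of the address:

```python
# Split a full IPv6 address into four front-panel pages, two 4-digit
# hexadecimal groups per page, with the field position (such as 2/4).
def ipv6_panels(address):
    groups = address.split(":")
    assert len(groups) == 8, "expects the full eight-group form"
    pages = []
    for i in range(4):
        pair = ":".join(g.zfill(4) for g in groups[2 * i:2 * i + 2])
        pages.append(f"{pair} {i + 1}/4")
    return pages

for page in ipv6_panels("2001:0db8:0000:0000:0000:0000:0000:0001"):
    print(page)
```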
Node options
The main node option displays the identification number or the name of the node
if the user has assigned a name.
Status option
The node status is indicated on the front panel. The status can be one of the
following states:
Active The node is operational, assigned to a system, and ready to perform I/O.
Service
There is an error that is preventing the node from operating as part of a
system. It is safe to shut down the node in this state.
Candidate
The node is not assigned to a system and is not in service. It is safe to shut
down the node in this state.
Starting
The node is part of a system and is attempting to join the system. It cannot
perform I/O.
Version options
The version option displays the version of the SAN Volume Controller software
that is active on the node. The version consists of four fields that are separated by
full stops. The fields are the version, release, modification, and fix level; for
example, 6.1.0.0.
Build option
The Build: panel displays the level of the SAN Volume Controller software that is
currently active on this node.
The Cluster Build: panel displays the level of the software that is currently active
on the system that this node is operating in.
Ethernet options
The Ethernet options display the operational state of the Ethernet ports, the speed
and duplex information, and their media access control (MAC) addresses.
The Ethernet panel shows one of the following states:
Config - Yes
This node is the configuration node.
Config - No
This node is not the configuration node.
No Cluster
This node is not a member of a system.
Press the right button to view the details of the individual Ethernet ports.
The Ethernet port options Port-1 through Port-4 display the state of the links and
indicate whether there is an active link with an Ethernet network.
Link Online
An Ethernet cable is attached to this port.
Link Offline
No Ethernet cable is attached to this port or the link has failed.
Speed options
The speed options Speed-1 through Speed-4 display the speed and duplex
information for the Ethernet port. The speed information can be one of the
following values:
10 The speed is 10 Mbps.
100 The speed is 100 Mbps.
1 The speed is 1 Gbps.
10 The speed is 10 Gbps.
The MAC address options MAC Address-1 through MAC Address-4 display the
media access control (MAC) address of the Ethernet port.
Failed The port is not operational because of a hardware failure.
Not installed
This port is not installed.
For the SAN Volume Controller 2145-8F2, you can use the Set FC Speed action
option to change the Fibre Channel port speed of a node that is not participating in
a system.
Actions options
During normal operations, action menu options are available on the front panel
display of the node. Only use the front panel actions when directed to do so by a
service procedure. Inappropriate use can lead to loss of access to data or loss of
data.
Figure 63 on page 112, Figure 64 on page 113, and Figure 65 on page 114 show the
sequence of the actions options. In the figures, bold lines indicate that the select
button was pressed. The lighter lines indicate the navigational path (up or down
and left or right). The circled X indicates that if the select button is pressed, an
action occurs using the data entered.
Only one action menu option at a time is displayed on the front-panel display.
Note: Options are displayed in the menu only if they are valid for the current
state of the node. See Table 42 for a list of when each option is valid.
Figure 63. Upper options of the actions menu on the front panel. The sequence
covers the Cluster IPv4?, Cluster IPv6?, Service IPv4?, Service IPv6?, Service
DHCPv4?, Service DHCPv6?, and Change WWNN? options, each with its parameter
panels (address, subnet or prefix, and gateway, where applicable) and its
Confirm? and Cancel? panels.
Figure 64. Middle options of the actions menu on the front panel. The sequence
covers the Exit Service?, Recover Cluster?, Remove Cluster?, and Paced Upgrade?
options, each with its Confirm? and Cancel? panels.
Figure 65. Lower options of the actions menu on the front panel. The sequence
covers the Set FC Speed?, Reset Password?, and Rescue Node? options, each with
its Confirm? and Cancel? panels, followed by Exit Actions?.
To perform an action, navigate to the Actions option and press the select button.
The action is initiated. Available parameters for the action are displayed. Use the
left or right buttons to move between the parameters. The current setting is
displayed on the second display line.
To set or change a parameter value, press the select button when the parameter is
displayed. The value changes to edit mode. Use the left or right buttons to move
between subfields, and use the up or down buttons to change the value of a
subfield. When the value is correct, press select to leave edit mode.
Each action also has a Confirm? and a Cancel? panel. Pressing select on the
Confirm? panel initiates the action using the current parameter value setting.
Pressing select on the Cancel? panel returns to the Action option panel without
changing the node.
Note: Messages might not display fully on the screen. You might see a right angle
bracket (>) on the right side of the display screen. If you see a right angle bracket,
press the right button to scroll through the display. When there is no more text to
display, you can move to the next item in the menu by pressing the right button.
Similarly, you might see a left angle bracket (<) on the left side of the display
screen. If you see a left angle bracket, press the left button to scroll through the
display. When there is no more text to display, you can move to the previous item
in the menu by pressing the left button.
The Cluster IPv4 or Cluster IPv6 option allows you to create a clustered system.
From the front panel, when you create a clustered system, you can set either the
IPv4 or the IPv6 address for Ethernet port 1. If required, you can add more
management IP addresses by using the management GUI or the CLI.
Press the up and down buttons to navigate through the parameters that are
associated with the Cluster option. When you have navigated to the desired
parameter, press the select button.
If you are creating the clustered system with an IPv4 address, complete the
following steps:
1. Press and release the up or down button until Actions? is displayed. Press and
release the select button.
2. Press and release the up or down button until Cluster IPv4? is displayed.
Press and release the select button.
3. Edit the IPv4 address, the IPv4 subnet, and the IPv4 gateway.
4. Press and release the left or right button until IPv4 Confirm Create? is
displayed.
5. Press and release the select button to confirm.
If you are creating the clustered system with an IPv6 address, complete the
following steps:
1. Press and release the up or down button until Actions? is displayed. Press and
release the select button.
2. Press and release the up or down button until Cluster IPv6? is displayed.
Press and release the select button.
3. Edit the IPv6 address, the IPv6 prefix, and the IPv6 gateway.
4. Press and release the left or right button until IPv6 Confirm Create? is
displayed.
5. Press and release the select button to confirm.
Using the IPv4 address, you can set the IP address for Ethernet port 1 of the
clustered system that you are going to create. The system can have either an IPv4
or an IPv6 address, or both at the same time. You can set either the IPv4 or IPv6
management address for Ethernet port 1 from the front panel when you are
creating the system. If required, you can add more management IP addresses from
the CLI.
Attention: When you set the IPv4 address, ensure that you type the correct
address. Otherwise, you might not be able to access the system using the
command-line tools or the management GUI.
Note: If you want to disable the fast increase or decrease function, press and
hold the down button, press and release the select button, and then release the
down button. The disabling of the fast increase or decrease function lasts until
the creation is completed or until the feature is again enabled. If the up button
or down button is pressed and held while the function is disabled, the value
increases or decreases once every two seconds. To again enable the fast increase
or decrease function, press and hold the up button, press and release the select
button, and then release the up button.
4. Press the right button or left button to move to the number field that you want
to set.
5. Repeat steps 3 and 4 for each number field that you want to set.
6. Press the select button to confirm the settings. Otherwise, press the right button
to display the next secondary option or press the left button to display the
previous options.
Using this option, you can set the IPv4 subnet mask for Ethernet port 1.
Attention: When you set the IPv4 subnet mask address, ensure that you type the
correct address. Otherwise, you might not be able to access the system using the
command-line tools or the management GUI.
Note: If you want to disable the fast increase or decrease function, press and
hold the down button, press and release the select button, and then release the
down button. The disabling of the fast increase or decrease function lasts until
the creation is completed or until the feature is again enabled. If the up button
or down button is pressed and held while the function is disabled, the value
increases or decreases once every two seconds. To again enable the fast increase
or decrease function, press and hold the up button, press and release the select
button, and then release the up button.
Using this option, you can set the IPv4 gateway address for Ethernet port 1.
Attention: When you set the IPv4 gateway address, ensure that you type the
correct address. Otherwise, you might not be able to access the system using the
command-line tools or the management GUI.
Note: If you want to disable the fast increase or decrease function, press and
hold the down button, press and release the select button, and then release the
down button. The disabling of the fast increase or decrease function lasts until
the creation is completed or until the feature is again enabled. If the up button
or down button is pressed and held while the function is disabled, the value
increases or decreases once every two seconds. To again enable the fast increase
or decrease function, press and hold the up button, press and release the select
button, and then release the up button.
4. Press the right button or left button to move to the number field that you want
to set.
5. Repeat steps 3 and 4 for each number field that you want to set.
6. Press the select button to confirm the settings. Otherwise, press the right button
to display the next secondary option or press the left button to display the
previous options.
Using this option, you can start an operation to create a clustered system with an
IPv4 address.
1. Press and release the left or right button until IPv4 Confirm Create? is
displayed.
2. Press the select button to start the operation.
If the create operation is successful, Password is displayed on line 1. The
password that you can use to access the system is displayed on line 2. Be sure
to immediately record the password; it is required on the first attempt to
manage the system from the management GUI.
Attention: The password displays for only 60 seconds, or until a front panel
button is pressed. The clustered system is created only after the password
display is cleared.
If the create operation fails, Create Failed: is displayed on line 1 of the
front-panel display screen. Line 2 displays one of two possible error codes that
you can use to isolate the cause of the failure.
Using this option, you can set the IPv6 address for Ethernet port 1 of the system
that you are going to create. The system can have either an IPv4 or an IPv6
address, or both at the same time. You can set either the IPv4 or IPv6 management
address for Ethernet port 1 from the front panel when you are creating the system.
If required, you can add more management IP addresses from the CLI.
Attention: When you set the IPv6 address, ensure that you type the correct
address. Otherwise, you might not be able to access the system using the
command-line tools or the management GUI.
Using this option, you can set the IPv6 prefix for Ethernet port 1.
Attention: When you set the IPv6 prefix, ensure that you type the correct
network prefix. Otherwise, you might not be able to access the system using the
command-line tools or the management GUI.
Using this option, you can set the IPv6 gateway for Ethernet port 1.
Attention: When you set the IPv6 gateway address, ensure that you type the
correct address. Otherwise, you might not be able to access the system using the
command-line tools or the management GUI.
Using this option, you can start an operation to create a clustered system with an
IPv6 address.
1. Press and release the left or right button until IPv6 Confirm Create? is
displayed.
2. Press the select button to start the operation.
If the create operation is successful, Password is displayed on line 1. The
password that you can use to access the system is displayed on line 2. Be sure
to immediately record the password; it is required on the first attempt to
manage the system from the management GUI.
Attention: The password displays for only 60 seconds, or until a front panel
button is pressed. The clustered system is created only after the password
display is cleared.
If the create operation fails, Create Failed: is displayed on line 1 of the
front-panel display screen. Line 2 displays one of two possible error codes that
you can use to isolate the cause of the failure.
Service IPv4 or Service IPv6 options
You can use the front panel to change a service IPv4 address or a service IPv6
address.
The IPv4 Address panels show one of the following items for the selected Ethernet
port:
v The active service address if the system has an IPv4 address. This address can be
either a configured or fixed address, or it can be an address obtained through
DHCP.
v DHCP Failed if the IPv4 service address is configured for DHCP but the node
was unable to obtain an IP address.
v DHCP Configuring if the IPv4 service address is configured for DHCP while the
node attempts to obtain an IP address. This address changes to the IPv4 address
automatically if a DHCP address is allocated and activated.
v A blank line if the system does not have an IPv4 address.
If the service IPv4 address was not set correctly or a DHCP address was not
allocated, you have the option of correcting the IPv4 address from this panel. The
service IP address must be in the same subnet as the management IP address.
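The same-subnet requirement can be verified before you enter the address on the panel. This is an illustrative check using Python's standard ipaddress module, not an SVC command:

```python
import ipaddress

# Illustrative check (same_subnet is a hypothetical helper): verify that a
# proposed service IPv4 address falls in the same subnet as the management
# IP address, given the subnet mask.
def same_subnet(service_ip: str, management_ip: str, subnet_mask: str) -> bool:
    net = ipaddress.IPv4Network(f"{management_ip}/{subnet_mask}", strict=False)
    return ipaddress.IPv4Address(service_ip) in net

print(same_subnet("192.168.1.20", "192.168.1.10", "255.255.255.0"))  # True
print(same_subnet("192.168.2.20", "192.168.1.10", "255.255.255.0"))  # False
```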
To set a fixed service IPv4 address from the IPv4 Address: panel, perform the
following steps:
1. Press and release the select button to put the panel in edit mode.
2. Press the right button or left button to move to the number field that you want
to set.
3. Press the up button if you want to increase the value that is highlighted; press
the down button if you want to decrease that value. If you want to quickly
increase the highlighted value, hold the up button. If you want to quickly
decrease the highlighted value, hold the down button.
Note: If you want to disable the fast increase or decrease function, press and
hold the down button, press and release the select button, and then release the
down button. The disabling of the fast increase or decrease function lasts until
the creation is completed or until the feature is again enabled. If the up button
or down button is pressed and held while the function is disabled, the value
increases or decreases once every two seconds. To again enable the fast increase
or decrease function, press and hold the up button, press and release the select
button, and then release the up button.
4. When all the fields are set as required, press and release the select button to
activate the new IPv4 address.
The IPv4 Address: panel is displayed. The new service IPv4 address is not
displayed until it has become active. If the new address has not been displayed
after 2 minutes, check that the selected address is valid on the subnetwork and
that the Ethernet switch is working correctly.
The IPv6 Address panels show one of the following conditions for the selected
Ethernet port:
If the service IPv6 address was not set correctly or a DHCP address was not
allocated, you have the option of correcting the IPv6 address from this panel. The
service IP address must be in the same subnet as the management IP address.
To set a fixed service IPv6 address from the IPv6 Address: panel, perform the
following steps:
1. Press and release the select button to put the panel in edit mode. When the
panel is in edit mode, the full address is still shown across four panels as eight
(four-digit) hexadecimal values. You edit each digit of the hexadecimal values
independently. The current digit is highlighted.
2. Press the right button or left button to move to the number field that you want
to set.
3. Press the up button if you want to increase the value that is highlighted; press
the down button if you want to decrease that value.
4. When all the fields are set as required, press and release the select button to
activate the new IPv6 address.
The IPv6 Address: panel is displayed. The new service IPv6 address is not
displayed until it has become active. If the new address has not been displayed
after 2 minutes, check that the selected address is valid on the subnetwork and
that the Ethernet switch is working correctly.
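The eight four-digit hexadecimal values shown in edit mode correspond to the fully expanded form of an IPv6 address. Python's standard ipaddress module produces exactly that form, which can help you verify the address you intend to enter:

```python
import ipaddress

# The front panel shows an IPv6 address as eight four-digit hexadecimal
# groups; the "exploded" form is the same representation.
addr = ipaddress.IPv6Address("2001:db8::1")
print(addr.exploded)  # 2001:0db8:0000:0000:0000:0000:0000:0001

groups = addr.exploded.split(":")
# Eight groups of four hex digits, edited one digit at a time on the panel.
assert len(groups) == 8 and all(len(g) == 4 for g in groups)
```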
If a service IP address does not exist, you must assign a service IP address or use
DHCP with this action.
To set the service IPv4 address to use DHCP, perform the following steps:
1. Press and release the up or down button until Service DHCPv4? is displayed.
2. Press and release the down button. Confirm DHCPv4? is displayed.
3. Press and release the select button to activate DHCP, or you can press and
release the up button to keep the existing address.
4. If you activate DHCP, DHCP Configuring is displayed while the node attempts
to obtain a DHCP address. It changes automatically to show the allocated
address if a DHCP address is allocated and activated, or it changes to DHCP
Failed if a DHCP address is not allocated.
To set the service IPv6 address to use DHCP, perform the following steps:
1. Press and release the up or down button until Service DHCPv6? is displayed.
2. Press and release the down button. Confirm DHCPv6? is displayed.
3. Press and release the select button to activate DHCP, or you can press and
release the up button to keep the existing address.
4. If you activate DHCP, DHCP Configuring is displayed while the node attempts
to obtain a DHCP address. It changes automatically to show the allocated
address if a DHCP address is allocated and activated, or it changes to DHCP
Failed if a DHCP address is not allocated.
Note: If an IPv6 router is present on the local network, SAN Volume Controller
does not differentiate between an autoconfigured address and a DHCP address.
Therefore, SAN Volume Controller uses the first address that is detected.
Important: Only change the WWNN when you are instructed to do so by a service
procedure. Nodes must always have a unique WWNN. If you change the WWNN,
you might have to reconfigure hosts and the SAN zoning.
1. Press and release the up or down button until Actions? is displayed.
2. Press and release the select button.
3. Press and release the up or down button until Change WWNN? is displayed on
line 1. Line 2 of the display shows the last five numbers of the WWNN that is
currently set. The first number is highlighted.
4. Edit the highlighted number to match the number that is required. Use the up
and down buttons to increase or decrease the numbers. The numbers wrap F to
0 or 0 to F. Use the left and right buttons to move between the numbers.
5. When the highlighted value matches the required number, press and release the
select button to activate the change. The Node WWNN: panel displays and the
second line shows the last five characters of the changed WWNN.
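The wrap-around editing in step 4 (F wraps to 0, and 0 wraps to F) can be sketched as follows; step_digit is an illustrative name, and this is not SVC code:

```python
# Sketch of the front-panel edit behavior for one hex digit of the WWNN:
# the up button increments and the down button decrements, wrapping F to 0
# and 0 to F.
HEX = "0123456789ABCDEF"

def step_digit(digit: str, up: bool) -> str:
    i = HEX.index(digit.upper())
    return HEX[(i + 1) % 16] if up else HEX[(i - 1) % 16]

print(step_digit("F", up=True))   # 0  (wraps F -> 0)
print(step_digit("0", up=False))  # F  (wraps 0 -> F)
```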
If the node is active, entering service state can cause disruption to hosts if other
faults exist in the system. While in service state, the node cannot join or run as part
of a clustered system.
To exit service state, ensure that all errors are resolved. You can exit service state
by using the Exit Service? option or by restarting the node.
If there are no noncritical errors, the node enters candidate state. If possible, the
node then becomes active in a clustered system.
To exit service state, ensure that all errors are resolved. You can exit service state
by using this option or by restarting the node.
Perform service actions on nodes only when directed by the service procedures. If
used inappropriately, service actions can cause loss of access to data or data loss.
For information about the recover system procedure, see “Recover system
procedure” on page 215.
Use this option as the final step in decommissioning a clustered system after the
other nodes have been removed from the system using the command-line interface
(CLI) or the management GUI.
Attention: Use the front panel to remove state data from a single node system. To
remove a node from a multi-node system, always use the CLI or the remove node
options from the management GUI.
From the Remove Cluster? panel, perform the following steps to delete the state
data from the node:
1. Press and hold the up button.
2. Press and release the select button.
3. Release the up button.
After the option is run, the node shows Cluster: with no system name. If this
option is performed on a node that is still a member of a system, the system shows
error 1195, Node missing, and the node is displayed in the list of nodes in the
system. Remove the node by using the management GUI or CLI.
Note: This action can be used only when the following conditions exist for the
node:
v The node is in service state.
v The node has no errors.
v The node has been removed from the clustered system.
For additional information, see the “Upgrading the software manually” topic in the
information center.
Note: This option is available only on SAN Volume Controller 2145-8F2 nodes.
Use the Reset password? option if the user has lost the system superuser password
or if the user is unable to access the system. If it is permitted by the user's
password security policy, use this selection to reset the system superuser password.
If your password security policy permits password recovery, and if the node is
currently a member of a clustered system, the system superuser password is reset
and a new password is displayed for 60 seconds. If your password security policy
does not permit password recovery or the node is not a member of a system,
completing these steps has no effect.
If the node is in active state when the password is reset, the reset applies to all
nodes in the system. If the node is in candidate or service state when the password
is reset, the reset applies only to the single node.
Note: Another way to rescue a node is to force a node rescue when the node
boots. It is the preferred method. Forcing a node rescue when a node boots works
by booting the operating system from the service controller and running a program
that copies all the SAN Volume Controller software from any other node that can
be found on the Fibre Channel fabric. See “Performing the node rescue when the
node boots” on page 230.
Language? option
You can change the language that displays on the front panel.
The Language? option allows you to change the language that is displayed on the
menu. Figure 66 shows the Language? option sequence.
Figure 66. The Language? option sequence, showing the English? and Japanese?
selections.
To select the language that you want to be used on the front panel, perform the
following steps:
Results
If the selected language uses the Latin alphabet, the front panel display shows two
lines. The panel text is displayed on the first line and additional data is displayed
on the second line.
If the selected language does not use the Latin alphabet, the display shows only
one line at a time to clearly display the character font. For those languages, you
can switch between the panel text and the additional data by pressing and
releasing the select button.
Additional data is unavailable when the front panel displays a menu option, which
ends with a question mark (?). In this case, press and release the select button to
choose the menu option.
Note: You cannot select another language when the node is displaying a boot
error.
Using the power control for the SAN Volume Controller node
SAN Volume Controller nodes are powered by an uninterruptible power supply
that is located in the same rack as the nodes.
The power state of the SAN Volume Controller is displayed by a power indicator
on the front panel. If the uninterruptible power supply battery is not sufficiently
charged to enable the SAN Volume Controller to become fully operational, its
charge state is displayed on the front panel display of the node.
If the SAN Volume Controller software is running and you request it to power off
from the management GUI, CLI, or power button, the node starts its power off
processing. During this time, the node indicates the progress of the power-off
operation on the front panel display. After the power-off processing is complete,
the front panel becomes blank and the front panel power light flashes. It is safe for
you to remove the power cable from the rear of the node. If the power button on
the front panel is pressed during power-off processing, the front panel display
changes to indicate that the node is being restarted, but the power-off process
completes before the restart is performed.
If the SAN Volume Controller software is not running when the front panel power
button is pressed, the node immediately powers off.
Note: The 2145 UPS-1U does not power off when the node is shut down from the
power button.
If you turn off a node using the power button or by a command, the node is put
into a power-off state. The SAN Volume Controller remains in this state until the
power cable is connected to the rear of the node and the power button is pressed.
During the startup sequence, the SAN Volume Controller tries to detect the status
of the uninterruptible power supply through the uninterruptible power supply
signal cable. If an uninterruptible power supply is not detected, the node pauses
and an error is shown on the front panel display. If the uninterruptible power
supply is detected, the software monitors the operational state of the
uninterruptible power supply. If no uninterruptible power supply errors are
reported and the uninterruptible power supply battery is sufficiently charged, the
SAN Volume Controller becomes operational. If the uninterruptible power supply
battery is not sufficiently charged, the charge state is indicated by a progress bar
on the front panel display. When an uninterruptible power supply is first turned
on, it might take up to two hours before the battery is sufficiently charged for the
SAN Volume Controller node to become operational.
If input power to the uninterruptible power supply is lost, the node immediately
stops all I/O operations and saves the contents of its dynamic random access
memory (DRAM) to the internal disk drive. While data is being saved to the disk
drive, a Power Failure message is shown on the front panel and is accompanied
by a descending progress bar that indicates the quantity of data that remains to be
saved. After all the data is saved, the node is turned off and the power light on the
front panel turns off.
Note: The node is now in standby state. If the input power to the uninterruptible
power supply unit is restored, the node restarts. If the uninterruptible power
supply battery was fully discharged, Charging is displayed and the boot process
waits for the battery to charge. When the battery is sufficiently charged, Booting is
displayed, the node is tested, and the software is loaded. When the boot process is
complete, Recovering is displayed while the uninterruptible power supply finalizes
its charge. While Recovering is displayed, the system can function normally.
However, when the power is restored after a second power failure, there is a delay
(with Charging displayed) before the node can complete its boot process.
Event logs
Error codes
The following topics provide information to help you understand and process the
error codes:
v Event reporting
v Understanding the events
v Understanding the error codes
v Determining a hardware boot failure
If the node is showing a boot message, failure message, or node error message,
and you determined that the problem was caused by a software or firmware
failure, you can restart the node to see if that might resolve the problem. Perform
the following steps to properly shut down and restart the node:
1. Follow the instructions in “MAP 5350: Powering off a SAN Volume Controller
node” on page 262.
2. Restart only one node at a time.
3. Do not shut down the second node in an I/O group for at least 30 minutes
after you shut down and restart the first node.
Event reporting
Events that are detected are saved in an event log. As soon as an entry is made in
this event log, the condition is analyzed. If any service activity is required, a
notification is sent.
The following methods are used to notify you and the IBM Support Center of a
new event:
v The most serious system error code is displayed on the front panel of each node
in the system.
v If you enabled Simple Network Management Protocol (SNMP), an SNMP trap is
sent to an SNMP manager that is configured by the customer.
The SNMP manager might be IBM Systems Director, if it is installed, or another
SNMP manager.
Power-on self-test
When you turn on the SAN Volume Controller, the system board performs
self-tests. During the initial tests, the hardware boot symbol is displayed.
All models perform a series of tests to check the operation of components and
some of the options that have been installed when the units are first turned on.
This series of tests is called the power-on self-test (POST).
If a critical failure is detected during the POST, the software is not loaded and the
system error LED on the operator information panel is illuminated. If this failure
occurs, use “MAP 5000: Start” on page 235 to help isolate the cause of the failure.
When the software is loaded, additional testing takes place, which ensures that all
of the required hardware and software components are installed and functioning
correctly. During the additional testing, the word Booting is displayed on the front
panel along with a boot progress code and a progress bar. If a test failure occurs,
the word Failed is displayed on the front panel.
The service controller performs internal checks and is vital to the operation of the
SAN Volume Controller. If the error (check) LED is illuminated on the service
controller front panel, the front-panel display might not be functioning correctly
and you can ignore any message displayed.
Understanding events
When a significant change in status is detected, an event is logged in the event log.
Error data
To avoid having a repeated event that fills the event log, some records in the event
log refer to multiple occurrences of the same event. When event log entries are
coalesced in this way, the time stamp of the first occurrence and the last occurrence
of the problem is saved in the log entry. A count of the number of times that the
error condition has occurred is also saved in the log entry. Other data refers to the
last occurrence of the event.
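The coalescing behavior can be sketched as follows. The record fields loosely follow Table 43; the code is an illustration, not the product implementation:

```python
# Sketch of event-log coalescing: repeated occurrences of the same event are
# folded into one record that keeps the first and last time stamps and a
# count of occurrences.
def coalesce(log: dict, event_id: str, timestamp: str) -> None:
    rec = log.get(event_id)
    if rec is None:
        log[event_id] = {"first_time": timestamp, "last_time": timestamp,
                         "event_count": 1}
    else:
        rec["last_time"] = timestamp
        rec["event_count"] += 1

log = {}
for ts in ("10:00", "10:05", "10:30"):
    coalesce(log, "981001", ts)
print(log["981001"])
# {'first_time': '10:00', 'last_time': '10:30', 'event_count': 3}
```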
You can view the event log by using the Monitoring > Events options in the
management GUI. The event log contains many entries. You can, however, select
only the type of information that you need.
You can also view the event log by using the command-line interface (lseventlog).
See the “Command-line interface” topic for the command details.
Table 43 describes some of the fields that are available to assist you in diagnosing
problems.
Table 43. Description of data fields for the event log
Data field Description
Event ID This number precisely identifies why the event was logged.
Error code This number describes the service action that should be followed to resolve an error condition. Not all events have error codes that are associated with them. Many event IDs can have the same error code because the service action is the same for all the events.
Sequence number A number that identifies the event.
Event count The number of events coalesced into this event log record.
Object type The object type to which the event log relates.
Object ID A number that uniquely identifies the instance of the object.
Fixed When an alert is shown for an error condition, it indicates if the reason for the event was resolved. In many cases, the system automatically marks the events fixed when appropriate. There are some events that must be manually marked as fixed. If the event is a message, this field indicates that you have read and performed the action. The message must be marked as read.
First time The time when this error event was reported. If events of a similar type are being coalesced together, so that one event log record represents more than one event, this field is the time the first error event was logged.
Event notifications
The system can use Simple Network Management Protocol (SNMP) traps, syslog
messages, and Call Home email to notify you and the IBM Support Center when
significant events are detected. Any combination of these notification methods can
be used simultaneously. Notifications are normally sent immediately after an event
is raised. However, there are some events that might occur because of service
actions that are being performed. If a recommended service action is active,
notifications for these events are sent only if the events are still unfixed when
the service action completes.
Each event that the system detects is assigned a notification type of Error, Warning,
or Information. When you configure notifications, you specify where the
notifications should be sent and which notification types are sent to that recipient.
SNMP traps
You can use the Management Information Base (MIB) file for SNMP to configure a
network management program to receive SNMP messages that are sent by the
system. This file can be used with SNMP messages from all versions of the
software. More information about the MIB file for SNMP is available at this
website:
www.ibm.com/storage/support/2145
Syslog messages
The syslog protocol is a standard protocol for forwarding log messages from a
sender to a receiver on an IP network. The IP network can be either IPv4 or IPv6.
The system can send syslog messages that notify personnel about an event. The
system can transmit syslog messages in either expanded or concise format. You can
use a syslog manager to view the syslog messages that the system sends. The
system uses the User Datagram Protocol (UDP) to transmit the syslog message.
You can specify a maximum of six syslog servers. You can use the
management GUI or the command-line interface to configure and modify your
syslog settings.
Table 45 shows how SAN Volume Controller notification codes map to syslog
security-level codes.
Table 45. SAN Volume Controller notification types and corresponding syslog level codes
ERROR: syslog level LOG_ALERT. Fault that might require hardware replacement
that needs immediate attention.
WARNING: syslog level LOG_ERROR. Fault that needs immediate attention.
Hardware replacement is not expected.
INFORMATIONAL: syslog level LOG_INFO. Information message used, for example,
when a configuration change takes place or an operation completes.
TEST: syslog level LOG_DEBUG. Test message.
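The mapping in Table 45 can be expressed with the standard POSIX syslog severity constants, as in this sketch (note that the standard constant is named LOG_ERR, which the table calls LOG_ERROR):

```python
import syslog  # POSIX syslog bindings from the Python standard library

# Table 45: SAN Volume Controller notification type -> syslog level code
NOTIFICATION_TO_SYSLOG = {
    "ERROR": syslog.LOG_ALERT,          # may require hardware replacement
    "WARNING": syslog.LOG_ERR,          # immediate attention, no replacement expected
    "INFORMATIONAL": syslog.LOG_INFO,   # configuration change or completed operation
    "TEST": syslog.LOG_DEBUG,           # test message
}

def syslog_level(notification_type: str) -> int:
    """Return the syslog level code for an SVC notification type."""
    return NOTIFICATION_TO_SYSLOG[notification_type.upper()]
```

In the standard numbering a lower level code is more severe, so LOG_ALERT sorts ahead of LOG_INFO.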
Table 46 on page 132 shows how SAN Volume Controller values of user-defined
message origin identifiers map to syslog facility codes.
The Call Home feature transmits operational and event-related data to you and
service personnel through a Simple Mail Transfer Protocol (SMTP) server
connection in the form of an event notification email. When configured, this
function alerts service personnel about hardware failures and potentially serious
configuration or environmental issues.
To send email, you must configure at least one SMTP server. You can specify as
many as five additional SMTP servers for backup purposes. The SMTP server must
accept the relaying of email from the management IP address. You can then use the
management GUI or the command-line interface to configure the email settings,
including contact information and email recipients. Set the reply address to a valid
email address. Send a test email to check that all connections and infrastructure are
set up correctly. You can disable the Call Home function at any time using the
management GUI or the command-line interface.
Notifications can be sent using email, SNMP, or syslog. The data sent for each type
of notification is the same. It includes:
v Record type
v Machine type
v Machine serial number
v Error ID
v Error code
v Software version
v FRU part number
v Cluster (system) name
v Node ID
v Error sequence number
v Time stamp
v Object type
v Object ID
v Problem data
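Because the payload is identical across the three transports, it can be sketched as one flat record (the field names below are illustrative, not the wire format):

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class Notification:
    record_type: str
    machine_type: str
    machine_serial: str
    error_id: int
    error_code: int
    software_version: str
    fru_part_number: str
    system_name: str            # cluster (system) name
    node_id: int
    error_sequence_number: int
    timestamp: str
    object_type: str
    object_id: int
    problem_data: str

def as_payload(n: Notification) -> dict:
    """Flatten the record for any of the three transports (email, SNMP, syslog)."""
    return asdict(n)
```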
To send data and notifications to service personnel, use one of the following email
addresses:
v For systems that are located in North America, Latin America, South America or
the Caribbean Islands, use [email protected]
v For systems that are located anywhere else in the world, use
[email protected]
Because inventory information is sent using the Call Home email function, you
must meet the Call Home function requirements and enable the Call Home email
function before you can attempt to send inventory information email. You can
adjust the contact information, adjust the frequency of inventory email, or
manually send an inventory email using the management GUI or the
command-line interface.
Inventory information that is sent to IBM includes the following information about
the clustered system on which the Call Home function is enabled. Sensitive
information such as IP addresses is not included.
v Licensing information
v Details about the following objects and functions:
Drives
External storage systems
Hosts
MDisks
Volumes
RAID types
Easy Tier
FlashCopy
Metro Mirror and Global Mirror
For detailed information about what is included in the Call Home inventory
information, configure the system to send an inventory email to yourself.
Note: If more than one error occurs during an operation, the highest priority error
code displays on the front panel. The lower the number for the error code, the
higher the priority. For example, error code 1020 has a higher priority than error
code 1370.
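Because a lower code number means a higher priority, choosing which code to display is a minimum over the outstanding codes, as in this sketch:

```python
def front_panel_code(error_codes):
    """Return the error code to display: the lowest number has the highest priority."""
    if not error_codes:
        return None
    return min(error_codes)
```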
Procedure
1. Locate the error code in one of the tables. If you cannot find a particular code
in any table, call IBM Support Center for assistance.
2. Read about the action you must perform to correct the problem. Do not
exchange field replaceable units (FRUs) unless you are instructed to do so.
3. Normally, exchange only one FRU at a time, starting from the top of the FRU
list for that error code.
Event IDs
The SAN Volume Controller software generates events, such as informational
events and error events. An event ID or number is associated with the event and
indicates the reason for the event.
Error events are generated when a service action is required. An error event maps
to an alert with an associated error code. Depending on the configuration, error
events can be notified through email, SNMP, or syslog.
Informational events
The informational events provide information about the status of an operation.
Informational events are recorded in the event log and, based on notification type,
can be notified through email, SNMP, or syslog.
SCSI status
Some events are part of the SCSI architecture and are handled by the host
application or device drivers without reporting an event. Some events, such as
read and write I/O events and events that are associated with the loss of nodes or
loss of access to backend devices, cause application I/O to fail. To help
troubleshoot these events, SCSI commands are returned with the Check Condition
status and a 32-bit event identifier is included with the sense information. The
identifier relates to a specific event in the event log.
If the host application or device driver captures and stores this information, you
can relate the application failure to the event log.
SCSI Sense
Nodes notify the hosts of events on SCSI commands. Table 49 defines the SCSI
sense keys, codes and qualifiers that are returned by the nodes.
Table 49. SCSI sense keys, codes, and qualifiers
Key 2h, Code 04h, Qualifier 01h
Definition: Not Ready. The logical unit is in the process of becoming ready.
Description: The node lost sight of the system and cannot perform I/O
operations. The additional sense does not have additional information.
Key 2h, Code 04h, Qualifier 0Ch
Definition: Not Ready. The target port is in the state of unavailable.
Description: The following conditions are possible:
v The node lost sight of the system and cannot perform I/O operations. The
additional sense does not have additional information.
v The node is in contact with the system but cannot perform I/O operations to
the specified logical unit because of either a loss of connectivity to the
backend controller or some algorithmic problem. This sense is returned for
offline volumes.
Reason codes
The reason code appears in bytes 20-23 of the sense data. The reason code provides
the node with a specific log entry. The field is a 32-bit unsigned number that is
presented with the most significant byte first. Table 50 lists the reason codes and
their definitions.
If the reason code is not listed in Table 50, the code refers to a specific event in the
event log that corresponds to the sequence number of the relevant event log entry.
Table 50. Reason codes
Reason code
(decimal) Description
40 The resource is part of a stopped FlashCopy mapping.
50 The resource is part of a Metro Mirror or Global Mirror relationship
and the secondary LUN is offline.
51 The resource is part of a Metro Mirror or Global Mirror relationship
and the secondary LUN is read only.
60 The node is offline.
71 The resource is not bound to any domain.
72 The resource is bound to a domain that has been recreated.
73 Running on a node that has been contracted out for some reason
that is not attributable to any path going offline.
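As a sketch of consuming this field, the reason code can be extracted from bytes 20-23 of the sense data as a big-endian 32-bit value and looked up in an abbreviated copy of Table 50 (the dictionary below holds only a few rows):

```python
# Abbreviated from Table 50; codes not listed refer to an event log entry.
REASON_CODES = {
    40: "The resource is part of a stopped FlashCopy mapping.",
    50: "Remote-copy secondary LUN is offline.",
    60: "The node is offline.",
}

def reason_code(sense: bytes) -> int:
    """Extract bytes 20-23 of the sense data as a big-endian 32-bit value."""
    if len(sense) < 24:
        raise ValueError("sense data too short")
    return int.from_bytes(sense[20:24], byteorder="big")

def describe(sense: bytes) -> str:
    """Map a reason code to Table 50, else treat it as an event log sequence number."""
    code = reason_code(sense)
    return REASON_CODES.get(code, f"event log sequence number {code}")
```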
Object types
You can use the object code to determine the object type.
Line 1 of the front panel displays the message Booting that is followed by the boot
code. Line 2 of the display shows a boot progress indicator. If the boot code detects
an error that makes it impossible to continue, Failed is displayed. You can use the
code to isolate the fault.
Failed 120
Procedure
1. Attempt to restore the software by using the node rescue procedure.
2. If node rescue fails, perform the actions that are described for any failing node
rescue code or procedure.
The codes indicate the progress of the boot operation. Line 1 of the front panel
displays the message Booting that is followed by the boot code. Line 2 of the
display shows a boot progress indicator. Figure 68 provides a view of the boot
progress display.
Booting 130
Because node errors are specific to a node, for example, memory has failed, the
errors are only reported on that node.
Line 2 contains either the error code or the error code and additional data. In
errors that involve a node with more than one power supply, the error code is
followed by two numbers. The first number indicates the power supply that has a
problem (either a 1 or a 2). The second number indicates the problem that has been
detected.
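For errors that involve a node with more than one power supply, Line 2 carries the error code followed by those two numbers. A sketch of splitting such a display string (the exact on-screen spacing is an assumption):

```python
def parse_node_error(line2: str):
    """Parse front-panel Line 2: '<error code>' or '<error code> <supply> <reason>'."""
    parts = line2.split()
    code = int(parts[0])
    if len(parts) == 3:                 # e.g. power-supply error 530
        supply, reason = int(parts[1]), int(parts[2])
        assert supply in (1, 2), "first number identifies power supply 1 or 2"
        return {"code": code, "power_supply": supply, "reason": reason}
    return {"code": code}
```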
Figure 69 provides an example of a node error code. This data might exceed the
maximum width of the menu screen. You can press the right navigation button to
scroll the display.
The additional data is unique for any error code. It provides necessary information
that enables you to isolate the problem in an offline environment. Examples of
additional data are disk serial numbers and field replaceable unit (FRU) location
codes. When these codes are displayed, you can do additional fault isolation by
navigating the default menu to determine the node and Fibre Channel port status.
There are two types of node errors: critical node errors and noncritical node errors.
Critical errors
A critical error means that the node is not able to participate in a clustered system
until the issue that is preventing it from joining a clustered system is resolved. This
error occurs because part of the hardware has failed or the system detects that the
software is corrupt. If a node has a critical node error, it is in service state, and the
fault LED on the node is on. The exception is when the node cannot connect to
enough resources to form a clustered system. It shows a critical node error but is
in the starting state. Resolve the errors in priority order. The range of codes
reserved for critical node errors is 500 - 699.
Noncritical errors
A noncritical error code is logged when there is a hardware or code failure that is
related to just one specific node. These errors do not stop the node from entering
active state and joining a clustered system. If the node is part of a clustered
system, there is also an alert that describes the error condition. The range of
codes reserved for noncritical node errors is 800 - 899.
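These two reserved ranges make classification mechanical, as in this sketch:

```python
def classify_node_error(code: int) -> str:
    """Critical node errors are 500-699; noncritical node errors are 800-899."""
    if 500 <= code <= 699:
        return "critical"      # node stays in service (or starting) state
    if 800 <= code <= 899:
        return "noncritical"   # node can still join a clustered system
    return "unknown"
```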
To start node rescue, press and hold the left and right buttons on the front panel
during a power-on cycle. The menu screen displays the Node rescue request. See
the node rescue request topic. The hard disk is formatted and, if the format
completes without error, the software image is downloaded from any available
node. During node recovery, Line 1 of the menu screen displays the message
Booting followed by one of the node rescue codes. Line 2 of the menu screen
shows a progress indicator.
Booting 300
The three-digit code that is shown in Figure 70 represents a node rescue code.
Note: The 2145 UPS-1U will not power off following a node rescue failure.
Line 1 of the menu screen contains the message Create Failed. Line 2 shows the
error code and, where necessary, additional data.
You must perform software problem analysis before you can perform further
operations to avoid the possibility of corrupting your configuration.
Error codes for clustered systems describe errors other than recovery errors.
Figure 73. Example of an error code for a clustered system
300 The 2145 is running node rescue.
User response: Exchange the FRU for a new FRU.
Possible Cause-FRUs or other:
2145-CG8 or 2145-CF8
v Disk drive (50%)
v Disk cable assembly (10%)
v Fibre Channel adapter (100%)
345 The 2145 is searching for a donor node from which to copy the software.
Explanation: The node is searching at 1 Gb/s for a donor node.
User response: If the progress bar has stopped for more than two minutes,
exchange the FRU for a new FRU.
Possible Cause-FRUs or other:
v Fibre Channel adapter (100%)
350 The 2145 cannot find a donor node.
Explanation: The 2145 cannot find a donor node.
User response: If the progress bar has stopped for more than two minutes,
perform the following steps:
1. Ensure that all of the Fibre Channel cables are connected correctly and
securely to the cluster.
2. Ensure that at least one other node is operational, is connected to the same
Fibre Channel network, and is a donor node candidate. A node is a donor node
candidate if the version of software that is installed on that node supports
the model type of the node that is being rescued.
3. Ensure that the Fibre Channel zoning allows a connection between the node
that is being rescued and the donor node candidate.
4. Perform the problem determination procedures for the network.
Possible Cause-FRUs or other:
v None
Other:
v Fibre Channel network problem
360 The 2145 is loading software from the donor.
Explanation: The 2145 is loading software from the donor.
User response: If the progress bar has been stopped for at least two minutes,
restart the node rescue procedure.
Possible Cause-FRUs or other:
v None
365 Cannot load SW from donor
Explanation: None.
User response: None.
370 Installing software
Explanation: The 2145 is installing software.
User response:
1. If this code is displayed and the progress bar has been stopped for at least
ten minutes, the software install process has failed with an unexpected
software error.
2. Power off the 2145 and wait for 60 seconds.
3. Power on the 2145. The software upgrade operation continues.
4. Report this problem immediately to your Software Support Center.
Possible Cause-FRUs or other:
v None
510 The detected memory size does not match the expected memory size.
Explanation: The amount of memory detected in the node is less than the amount
required for the node to operate as an active member of a system. The error
code data shows the detected memory, in MB, followed by the minimum required
memory, in MB. There is then a series of values indicating the amount of
memory, in GB, detected in each memory slot.
Data:
v Detected memory in MB
v Minimum required memory in MB
v Memory in slot 1 in GB
v Memory in slot 2 in GB
v ... etc.
User response: Check the memory size of another 2145 that is in the same
cluster. For the 2145-8F2, 2145-8F4, 2145-8G4, 2145-8A4, 2145-CF8, and
2145-CG8, if you have just replaced a memory module, check that the module
that you have installed is the correct size, then go to the light path MAP to
isolate any possible failed memory modules.
Possible Cause-FRUs or other:
v Memory module (100%)
517 The WWNNs of the service controller and the disk do not match.
Explanation: The node is unable to determine the WWNN that it should use. This
is because the service controller or the node's internal drive has been
replaced.
User response: Follow troubleshooting procedures to configure the WWNN of the
node.
1. Continue to follow the hardware remove and replace procedure for the
service controller or disk; these explain the service actions.
2. If you have not followed the hardware remove and replace procedures, you
should determine the correct WWNN. If you do not have this information
recorded, examine your Fibre Channel switch configuration to see whether it is
listed there. Follow the procedures to change the WWNN of a node.
Possible Cause-FRUs or other:
v None
521 Unable to detect a Fibre Channel adapter
Explanation: The 2145 cannot detect any Fibre Channel adapter cards.
User response: Ensure that a Fibre Channel adapter card has been installed.
Ensure that the Fibre Channel card is seated correctly in the riser card.
Ensure that the riser card is seated correctly on the system board. If the
problem persists, exchange FRUs for new FRUs in the order shown.
Possible Cause-FRUs or other:
2145-CG8 or 2145-CF8
v 4-port Fibre Channel host bus adapter assembly (95%)
v System board assembly (5%)
2145-8G4 or 2145-8A4
v 4-port Fibre Channel host bus adapter (80%)
v Riser card (19%)
v System board (1%)
2145-8G4, 2145-8A4, 2145-CF8, or 2145-CG8
v System board assembly (100%)
2145-8F2 or 2145-8F4
v Frame assembly (100%)
523 The internal disk file system is damaged.
Explanation: The node startup procedures have found problems with the file
system on the internal disk of the node.
User response: Follow troubleshooting procedures to reload the software.
1. Follow Procedure: Rescuing node canister machine code from another node
(node rescue).
2. If the node rescue does not succeed, use the hardware remove and replace
procedures.
Possible Cause-FRUs or other:
v Node canister (80%)
v Other (20%)
524 Unable to update BIOS settings.
Explanation: Unable to update BIOS settings.
User response: Power off the node, wait 30 seconds, and then power on again.
If the error code is still reported, replace the system board.
Possible Cause-FRUs or other:
v System board (100%)
525 Unable to update system board service processor firmware.
530 A problem with one of the node's power supplies has been detected.
Explanation: The 530 error code is followed by two numbers. The first number
is either 1 or 2 to indicate which power supply has the problem.
The second number is either 1, 2 or 3 to indicate the reason. 1 indicates that
the power supply is not detected. 2 indicates that the power supply has
failed. 3 indicates that there is no input power to the power supply.
If the node is a member of a cluster, the cluster will report error code 1096
or 1097, depending on the error reason.
The error will automatically clear when the problem is fixed.
User response:
1. Ensure that the power supply is seated correctly and that the power cable
is attached correctly to both the node and to the 2145 UPS-1U.
2. If the error has not been automatically marked fixed after two minutes,
note the status of the three LEDs on the back of the power supply. For the
2145-CG8 or 2145-CF8, the AC LED is the top green LED, the DC LED is the
middle green LED, and the error LED is the bottom amber LED.
3. If the power supply error LED is off and the AC and DC power LEDs are both
on, this is the normal condition. If the error has not been automatically
fixed after two minutes, replace the system board.
4. Follow the action specified for the LED states noted in the table below.
5. If the error has not been automatically fixed after two minutes, contact
support.
Possible Cause-FRUs or other:
Reason 1: A power supply is not detected.
v Power supply (19%)
v System board (1%)
v Other: Power supply is not installed correctly (80%)
Reason 2: The power supply has failed.
v Power supply (90%)
v Power cable assembly (5%)
v System board (5%)
Reason 3: There is no input power to the power supply.
v Power cable assembly (25%)
v UPS-1U assembly (4%)
v System board (1%)
v Other: Power supply is not installed correctly (70%)
541 Multiple, undetermined, hardware errors
Explanation: Multiple hardware failures have been reported on the data paths
within the node canister, and the threshold of the number of acceptable errors
within a given time frame has been reached. It has not been possible to
isolate the errors to a single component.
After this node error has been raised, all ports on the node will be
deactivated. The reason for this is that the node canister is considered
unstable, and has the potential to corrupt data.
User response:
1. Follow the procedure for collecting information for support, and contact
your support organization.
2. A software [code] upgrade may resolve the issue.
Possible Cause-FRUs or other:
v 2145-CG8 or 2145-CF8
– Disk drive (50%)
– Disk controller (30%)
– Disk backplane (10%)
– Disk signal cable (8%)
– Disk power cable (1%)
– System board (1%)
v 2145-8A4
– Disk drive assembly (80%)
– Disk cable assembly (15%)
– System board (5%)
v 2145-8G4
– Disk drive assembly (80%)
– Disk drive cables (10%)
Explanation: The internal drive within the node is reporting too many errors.
It is no longer safe to rely on the integrity of the drive. Replacement is
recommended.
User response: Follow troubleshooting procedures to fix the hardware:
1. View hardware information.
2. Replace parts (canister or disk).
Possible Cause-FRUs or other:
v 2145-CG8 or 2145-CF8
– Disk drive (50%)
– Disk controller (30%)
– Disk backplane (10%)
– Disk signal cable (8%)
– Disk power cable (1%)
578 The state data was not saved following a power loss.
Explanation: On startup, the node was unable to read its state data. When this
happens, it expects to be automatically added back into a cluster. However, if
it has not joined a cluster in 60 sec, it raises this node error. This is a
critical node error, and user action is required before the node can become a
candidate to join a cluster.
User response: Follow troubleshooting procedures to correct connectivity
issues between the cluster nodes and the quorum devices.
1. Manual intervention is required once the node reports this error.
2. Attempt to reestablish the cluster using other nodes. This may involve
fixing hardware issues on other nodes or fixing connectivity issues between
nodes.
3. If you are able to reestablish the cluster, remove the cluster data from
the node showing 578 so it goes to candidate state; it will then be
automatically added back to the cluster.
a. To remove the cluster data from the node, either go to the service
assistant, select the radio button for the node with a 578, click Manage
System, then choose Remove System Data.
b. Or use the CLI to satask leavecluster.
If the node does not automatically add back to the cluster, note the name and
I/O group of the node, then delete the node from the cluster configuration (if
this has not already happened) and then add the node back to the cluster using
the same name and I/O group.
4. If all nodes have either node error 578 or 550, follow the cluster recovery
procedures.
5. Attempt to determine what caused the nodes to shut down.
Possible Cause-FRUs or other:
v None
580 The service controller ID could not be read.
Explanation: The 2145 cannot read the unique ID from the service controller,
so the Fibre Channel adapters cannot be started.
User response: In the sequence shown, exchange the following FRUs for new
FRUs.
Possible Cause-FRUs or other:
v Service controller (100%)
Other:
v None
581 A serial link error in the 2145 UPS-1U has occurred.
Explanation: There is a fault in the communications cable, the serial
interface in the uninterruptible power supply 2145 UPS-1U, or 2145.
User response: Check that the communications cable is correctly plugged in to
the 2145 and the 2145 UPS-1U. If the cable is plugged in correctly, replace
the FRUs in the order shown.
Possible Cause-FRUs or other:
2145-8G4, 2145-8A4, 2145-CF8, or 2145-CG8
v 2145 power cable assembly (40%)
v 2145 UPS-1U assembly (30%)
v 2145 system board (30%)
2145-8F2 or 2145-8F4
v 2145 power cable assembly (40%)
v 2145 UPS-1U assembly (30%)
v 2145 frame assembly (30%)
582 A battery error in the 2145 UPS-1U has occurred.
Explanation: A problem has occurred with the uninterruptible power supply 2145
UPS-1U battery.
User response: Exchange the FRU for a new FRU. After replacing the battery
assembly, if the 2145 UPS-1U service indicator is on, press and hold the 2145
UPS-1U Test button for three seconds to start the self-test and verify the
repair. During the self-test, the rightmost four LEDs on the 2145 UPS-1U
front-panel assembly flash in sequence.
Possible Cause-FRUs or other:
v UPS-1U battery assembly (50%)
v UPS-1U assembly (50%)
583 An electronics error in the 2145 UPS-1U has occurred.
Explanation: A problem has occurred with the 2145 UPS-1U electronics.
User response: Exchange the FRU for a new FRU.
584 The 2145 UPS-1U is overloaded.
Explanation: A problem with output overload has been reported by the
uninterruptible power supply 2145 UPS-1U. The Overload Indicator on the 2145
UPS-1U front panel is illuminated red.
User response:
1. Ensure that only one 2145 is receiving power from the 2145 UPS-1U. Also
ensure that no other devices are connected to the 2145 UPS-1U.
2. Disconnect the 2145 from the 2145 UPS-1U. If the Overload Indicator is
still illuminated, on the disconnected 2145 replace the 2145 UPS-1U.
3. If the Overload Indicator is now off, and the node is a 2145-8F2, 2145-8F4,
2145-8G4 or 2145-8A4, on the disconnected 2145, with all outputs disconnected,
in the sequence shown, exchange the FRUs for new FRUs.
4. If the Overload Indicator is now off, and the node is a 2145-CG8 or
2145-CF8, on the disconnected 2145, with all outputs disconnected, determine
whether it is one of the two power supplies or the power cable assembly that
must be replaced. Plug just one power cable into the left hand power supply
and start the node and see whether the error is reported. Then shut down the
node and connect the other power cable into the left hand power supply and
start the node and see whether the error is repeated. Then repeat the two
tests for the right hand power supply. If the error is repeated for both
cables on one power supply but not the other, replace the power supply that
showed the error; otherwise, replace the power cable assembly.
588 The 2145 UPS-1U is not cabled correctly.
Explanation: The signal cable or the 2145 power cables are probably not
connected correctly. The power cable and signal cable might be connected to
different 2145 UPS-1U assemblies.
User response:
1. Connect the cables correctly.
2. Restart the node.
Possible Cause-FRUs or other:
v None.
Other:
v Cabling error (100%)
589 The 2145 UPS-1U ambient temperature limit has been exceeded.
Explanation: The ambient temperature threshold for the 2145 UPS-1U has been
exceeded.
User response: Reduce the temperature around the system:
1. Turn off the 2145 UPS-1U and unplug it from the power source.
2. Clear the vents and remove any heat sources.
3. Ensure that the air flow around the 2145 UPS-1U is not restricted.
4. Wait at least five minutes, and then restart the 2145 UPS-1U. If the
problem remains, exchange the 2145 UPS-1U assembly.
The node will not start until a sufficient charge exists to store the state
and configuration data held in the node memory if power were to fail. The
front panel of the node will show "charging".
User response: Wait for sufficient battery charge for the enclosure to start:
1. Wait for the node to automatically fix the error when there is sufficient
charge.
2. Ensure that no error conditions are indicated on the uninterruptible power
supply.
690 The node is held in the service state.
Explanation: The node is in service state and has been instructed to remain in
service state. While in service state, the node will not run as part of a
cluster. A node must not be in service state for longer than necessary while
the cluster is online because a loss of redundancy will result. A node can be
set to remain in service state either because of a service assistant user
action or because the node was deleted from the cluster.
User response: When it is no longer necessary to hold the node in the service
state, exit the service state to allow the node to run:
1. Use the service assistant action to release the service state.
Possible Cause-FRUs or other:
v none
700 The Fibre Channel adapter that was previously present has not been
detected.
Explanation: A Fibre Channel adapter that was previously present has not been
detected. The adapter might not be correctly installed, or it might have
failed.
This node error does not, in itself, stop the node canister from becoming
active in the system; however, the Fibre Channel network might be being used
to communicate between the node canisters in a clustered system. It is
possible that this node error indicates why the critical node error 550 A
cluster cannot be formed because of a lack of cluster resources is reported on
the node canister.
Data:
v Location: a number indicating the adapter location. The location indicates
an adapter slot; see the node canister description for the definition of the
adapter slot locations.
User response:
1. If possible, this noncritical node error should be serviced using the
management GUI and running the recommended actions for the service error code.
2. There are a number of possibilities.
a. If you have deliberately removed the adapter (possibly replacing it with a
different adapter type), you will need to follow the management GUI
recommended actions to mark the hardware change as intentional.
b. If the previous steps have not isolated the problem, use the remove and
replace procedures to replace the adapter. If this does not fix the problem,
replace the system board.
Possible Cause-FRUs or other cause:
v Fibre Channel adapter
v System board
701 A Fibre Channel adapter has failed.
Explanation: A Fibre Channel adapter has failed.
This node error does not, in itself, stop the node becoming active in the
system. However, the Fibre Channel network might be being used to communicate
between the nodes in a clustered system. Therefore, it is possible that this
node error indicates the reason why the critical node error 550 A cluster
cannot be formed because of a lack of cluster resources is reported on the
node.
Data:
v A number indicating the adapter location. The location indicates an adapter
slot. See the node description for the definition of the adapter slot
locations.
User response:
1. If possible, use the management GUI to run the recommended actions for the
associated service error code.
2. Use the remove and replace procedures to replace the adapter. If this does
not fix the problem, replace the system board.
Possible Cause-FRUs or other cause:
v Fibre Channel adapter
v System board
702 A Fibre Channel adapter has a PCI error.
Explanation: A Fibre Channel adapter has a PCI error.
This node error does not, in itself, stop the node becoming active in the
system. However, the Fibre Channel network might be being used to communicate
between the nodes in a clustered system. Therefore, it is possible that this
node error indicates the reason why the critical node error 550 A cluster
cannot be formed because of a lack of cluster resources is reported on the
node.
Data:
v A number indicating the adapter location. The location indicates an adapter
slot. See the node description for the definition of the adapter slot
locations.
User response:
1. If possible, use the management GUI to run the recommended actions for the
associated service error code.
2. Use the remove and replace procedures to replace the adapter. If this does
not fix the problem, replace the system board.
Possible Cause-FRUs or other cause:
v Fibre Channel adapter
v System board
703 A Fibre Channel adapter is degraded.
Explanation: A Fibre Channel adapter is degraded.
This node error does not, in itself, stop the node becoming active in the
system. However, the Fibre Channel network might be being used to communicate
between the nodes in a clustered system. Therefore, it is possible that this
node error indicates the reason why the critical node error 550 A cluster
cannot be formed because of a lack of cluster resources is reported on the
node.
Data:
v A number indicating the adapter location. The location indicates an adapter
slot. See the node description for the definition of the adapter slot
locations.
User response:
1. If possible, use the management GUI to run the recommended actions for the
associated service error code.
2. Use the remove and replace procedures to replace the adapter. If this does
not fix the problem, replace the system board.
Possible Cause-FRUs or other cause:
v Fibre Channel adapter
v System board
because of a lack of cluster resources is reported on the node.
Data:
Three numeric values are listed:
v The ID of the first unexpected inactive port. This ID is a decimal number.
v The ports that are expected to be active, which is a hexadecimal number.
Each bit position represents a port, with the least significant bit
representing port 1. The bit is 1 if the port is expected to be active.
v The ports that are actually active, which is a hexadecimal number. Each bit
position represents a port, with the least significant bit representing port
1. The bit is 1 if the port is active.
User response:
1. If possible, use the management GUI to run the recommended actions for the
associated service error code.
2. Possibilities:
v If the port has been intentionally disconnected, use the management GUI
recommended action for the service error code and acknowledge the intended
change.
v Check that the Fibre Channel cable is connected at both ends and is not
damaged. If necessary, replace the cable.
v Check the switch port or other device that the cable is connected to is
powered and enabled in a compatible mode. Rectify any issue. The device
service interface might indicate the issue.
v Use the remove and replace procedures to replace the SFP transceiver in the
2145 node and the SFP transceiver in the connected switch or device.
v Use the remove and replace procedures to replace the adapter.
Possible Cause-FRUs or other cause:
v Fibre Channel cable
v SFP transceiver
v Fibre Channel adapter
705 Fewer Fibre Channel I/O ports
operational.
704 Fewer Fibre Channel ports operational. Explanation: One or more Fibre Channel I/O ports
that have previously been active are now inactive. This
Explanation: A Fibre Channel port that was situation has continued for one minute.
previously operational is no longer operational. The
physical link is down. A Fibre Channel I/O port might be established on
either a Fibre Channel platform port or an Ethernet
This node error does not, in itself, stop the node platform port using FCoE. This error is expected if the
becoming active in the system. However, the Fibre associated Fibre Channel or Ethernet port is not
Channel network might be being used to communicate operational.
between the nodes in a clustered system. Therefore, it
is possible that this node error indicates the reason why Data:
the critical node error 550 A cluster cannot be formed
Three numeric values are listed:
v The ID of the first unexpected inactive port. This ID representing port 1. The bit is 1 if the port is
is a decimal number. expected to have a connection to all online nodes.
v The ports that are expected to be active, which is a v The ports that actually have connections. This is a
hexadecimal number. Each bit position represents a hexadecimal number, each bit position represents a
port, with the least significant bit representing port 1. port, with the least significant bit representing port 1.
The bit is 1 if the port is expected to be active. The bit is 1 if the port has a connection to all online
v The ports that are actually active, which is a nodes.
hexadecimal number. Each bit position represents a User response:
port, with the least significant bit representing port 1.
1. If possible, this noncritical node error should be
The bit is 1 if the port is active.
serviced using the management GUI and running
User response: the recommended actions for the service error code.
1. If possible, use the management GUI to run the 2. Follow the procedure: Mapping I/O ports to
recommended actions for the associated service platform ports to determine which platform port
error code. does not have connectivity.
2. Follow the procedure for mapping I/O ports to 3. There are a number of possibilities.
platform ports to determine which platform port is v If the port’s connectivity has been intentionally
providing this I/O port. reconfigured, use the management GUI
3. Check for any 704 (Fibre channel platform port recommended action for the service error code
not operational) or 724 (Ethernet platform port and acknowledge the intended change. You must
not operational) node errors reported for the have at least two I/O ports with connections to
platform port. all other nodes.
4. Possibilities: v Resolve other node errors relating to this
v If the port has been intentionally disconnected, platform port or I/O port.
use the management GUI recommended action v Check that the SAN zoning is correct.
for the service error code and acknowledge the
intended change. Possible Cause: FRUs or other cause:
v Resolve the 704 or 724 error. v None.
v If this is an FCoE connection, use the information
the view gives about the Fibre Channel forwarder 710 The SAS adapter that was previously
(FCF) to troubleshoot the connection between the present has not been detected.
port and the FCF.
Explanation: A SAS adapter that was previously
Possible Cause-FRUs or other cause: present has not been detected. The adapter might not
be correctly installed or it might have failed.
v None
Data:
706 Fibre Channel clustered system path v A number indicating the adapter location. The
failure. location indicates an adapter slot. See the node
description for the definition of the adapter slot
Explanation: One or more Fibre Channel (FC) locations.
input/output (I/O) ports that have previously been
able to see all required online nodes can no longer see User response:
them. This situation has continued for 5 minutes. This 1. If possible, use the management GUI to run the
error is not reported unless a node is active in a recommended actions for the associated service
clustered system. error code.
A Fibre Channel I/O port might be established on 2. Possibilities:
either a FC platform port or an Ethernet platform port v If the adapter has been intentionally removed,
using Fiber Channel over Ethernet (FCoE). use the management GUI recommended actions
for the service error code, to acknowledge the
Data:
change.
Three numeric values are listed: v Use the remove and replace procedures to
v The ID of the first FC I/O port that does not have remove and open the node and check the adapter
connectivity. This is a decimal number. is fully installed.
v The ports that are expected to have connections. This v If the previous steps have not isolated the
is a hexadecimal number, and each bit position problem, use the remove and replace procedures
represents a port - with the least significant bit to replace the adapter. If this does not fix the
problem, replace the system board.
Possible Cause-FRUs or other cause: 1. If possible, use the management GUI to run the
v High-speed SAS adapter recommended actions for the associated service
error code.
v System board
2. Use the remove and replace procedures to replace
the adapter. If this does not fix the problem, replace
711 A SAS adapter has failed. the system board.
Explanation: A SAS adapter has failed.
Possible Cause-FRUs or other cause:
Data:
v High-speed SAS adapter
v A number indicating the adapter location. The
v System board
location indicates an adapter slot. See the node
description for the definition of the adapter slot
locations. 720 Ethernet adapter that was previously
present has not been detected.
User response:
1. If possible, use the management GUI to run the Explanation: An Ethernet adapter that was previously
recommended actions for the associated service present has not been detected. The adapter might not
error code. be correctly installed or it might have failed.
2. Use the remove and replace procedures to replace Data:
the adapter. If this does not fix the problem, replace v A number indicating the adapter location. The
the system board. location indicates an adapter slot. See the node
description for the definition of the adapter slot
Possible Cause-FRUs or other cause: locations. If the location is 0, the adapter integrated
v High-speed SAS adapter into the system board is being reported.
v System board User response:
1. If possible, use the management GUI to run the
712 A SAS adapter has a PCI error. recommended actions for the associated service
error code.
Explanation: A SAS adapter has a PCI error.
2. If the adapter location is 0, use the remove and
Data: replace procedures to replace the system board.
v A number indicating the adapter location. The 3. If the location is not 0, there are a number of
location indicates an adapter slot. See the node possibilities:
description for the definition of the adapter slot a. Use the remove and replace procedures to
locations. remove and open the node and check that the
User response: adapter is fully installed.
1. If possible, use the management GUI to run the b. If the previous steps have not located and
recommended actions for the associated service isolated the problem, use the remove and
error code. replace procedures to replace the adapter. If this
does not fix the problem, replace the system
2. Replace the adapter using the remove and replace
board.
procedures. If this does not fix the problem, replace
the system board.
Possible Cause-FRUs or other cause:
Possible Cause-FRUs or other cause: v Ethernet adapter
v SAS adapter v System board
v System board
721 An Ethernet adapter has failed.
713 A SAS adapter is degraded. Explanation: An Ethernet adapter has failed.
Explanation: A SAS adapter is degraded. Data:
Data: v A number indicating the adapter location. The
location indicates an adapter slot. See the node
v A number indicating the adapter location. The
description for the definition of the adapter slot
location indicates an adapter slot. See the node
locations. If the location is 0, the adapter integrated
description for the definition of the adapter slot
into the system board is being reported.
locations.
User response:
User response:
1. If possible, use the management GUI to run the Possible Cause—FRUs or other cause:
recommended actions for the associated service v Ethernet adapter
error code.
v System board
2. If the adapter location is 0, use the remove and
replace procedures to replace the system board.
724 Fewer Ethernet ports active.
3. If the adapter location is not 0, use the remove and
replace procedures to replace the adapter. If this Explanation: An Ethernet port that was previously
does not fix the problem, replace the system board. operational is no longer operational. The physical link
is down.
Possible Cause—FRUs or other cause:
Data:
v Ethernet adapter
Three numeric values are listed:
v System board
v The ID of the first unexpected inactive port. This is a
decimal number.
722 An Ethernet adapter has a PCI error.
v The ports that are expected to be active. This is a
Explanation: An Ethernet adapter has a PCI error. hexadecimal number. Each bit position represents a
port, with the least significant bit representing port 1.
Data:
The bit is 1 if the port is expected to be active.
v A number indicating the adapter location. The
v The ports that are actually active. This is a
location indicates an adapter slot. See the node
hexadecimal number. Each bit position represents a
description for the definition of the adapter slot
port, with the least significant bit representing port 1.
locations. If the location is 0, the adapter integrated
The bit is 1 if the port is active.
into the system board is being reported.
User response:
User response:
1. If possible, use the management GUI to run the
1. If possible, use the management GUI to run the
recommended actions for the associated service
recommended actions for the associated service
error code.
error code.
2. Possibilities:
2. If the adapter location is 0, use the remove and
replace procedures to replace the system board. a. If the port has been intentionally disconnected,
use the management GUI recommended action
3. If the adapter location is not 0, use the remove and
for the service error code and acknowledge the
replace procedures to replace the adapter. If this
intended change.
does not fix the problem, replace the system board.
b. Make sure the Ethernet cable is connected at
Possible Cause—FRUs or other cause: both ends and is undamaged. If necessary,
replace the cable.
v Ethernet adapter
c. Check that the switch port, or other device the
v System board cable is connected to, is powered and enabled in
a compatible mode. Rectify any issue. The
723 An Ethernet adapter is degraded. device service interface might indicate the issue.
d. If this is a 1 Gbps port, use the remove and
Explanation: An Ethernet adapter is degraded.
replace procedures to replace the SFP transceiver
Data: in the SAN Volume Controller and the SFP
v A number indicating the adapter location. The transceiver in the connected switch or device.
location indicates an adapter slot. See the node e.
description for the definition of the adapter slot Replace the adapter or the system board
locations. If the location is 0, the adapter integrated (depending on the port location) by using the
into the system board is being reported. remove and replace procedures.
User response:
Possible Cause—FRUs or other cause:
1. If possible, use the management GUI to run the
recommended actions for the associated service v Ethernet cable
error code. v Ethernet SFP transceiver
2. If the adapter location is 0, use the remove and v Ethernet adapter
replace procedures to replace the system board. v System board
3. If the adapter location is not 0, use the remove and
replace procedures to replace the adapter. If this
does not fix the problem, replace the system board.
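The hexadecimal port masks reported in the data for node errors 704, 705, 706, and 724 can be decoded bit by bit to identify the affected ports. A minimal sketch of that decoding; the helper name and the sample mask values are illustrative only, not part of the product:

```python
def decode_port_mask(mask_hex):
    """Return the 1-based port numbers whose bits are set in a hex port mask.

    As described in the data for node errors 704, 705, 706, and 724, the
    least significant bit represents port 1.
    """
    mask = int(mask_hex, 16)
    ports = []
    port = 1
    while mask:
        if mask & 1:
            ports.append(port)
        mask >>= 1
        port += 1
    return ports

# Illustrative values: expected-active mask 0xF, actually-active mask 0xD.
expected = decode_port_mask("F")   # ports 1-4 are expected to be active
actual = decode_port_mask("D")     # ports 1, 3, and 4 are active
missing = sorted(set(expected) - set(actual))
print(missing)  # → [2]: port 2 is unexpectedly inactive
```

Comparing the two decoded lists identifies the unexpectedly inactive ports, which is the same comparison the error data invites you to make by hand.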
| The error is resolved by either re-configuring the
| system to change which type of connection is allowed
| on a port, or by changing the SAN fabric configuration
| so ports are not in the same zone. A combination of
| both options may be used.
| The system reconfiguration is to change the Fibre
| Channel ports mask to reduce which ports can be used
| for internode communication.
| The local Fibre Channel port mask should be modified
| if the cluster id reported matches the cluster id of the
| node logging the error.
| The partner Fibre Channel port mask should be
| modified if the cluster id reported does not match the
| cluster id of the node logging the error. The partner
| Fibre Channel port mask may need to be changed for
| one or both clusters.

Explanation: Event log full.

User response: To fix the errors in the event log, go to
the start MAP.

Possible Cause-FRUs or other:
v Unfixed errors in the log.

1011 Fibre Channel adapter (4 port) in slot 1 is missing.

Explanation: Fibre Channel adapter (4 port) in slot 1
is missing.

User response:
1. In the sequence shown, exchange the FRUs for new
FRUs.
2. Check node status. If all nodes show a status of
"online", mark the error that you have just repaired
"fixed". If any nodes do not show a status of
"online", go to start MAP. If you return to this step,
contact your support center to resolve the problem
with the 2145.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145-8F4
v Dual port Fibre Channel host bus adapter - full
height (90%)
v PCI riser card (8%)
v Frame assembly (2%)

2145-8F2
v Dual port Fibre Channel host bus adapter - low
profile (80%)
v PCI riser card (10%)
v Frame assembly (10%)

2145-8F2
v Dual port Fibre Channel host bus adapter - full
height (80%)
v PCI riser card (10%)
v Frame assembly (10%)
1020 The system board service processor has failed.

Explanation: The cluster is reporting that a node is
not operational because of critical node error 522. See
the details of node error 522 for more information.

User response: See node error 522.

1022 The detected memory size does not match the
expected memory size.

Explanation: The cluster is reporting that a node is
not operational because of critical node error 510. See
the details of node error 510 for more information.

User response: See node error 510.

1025 The 2145 system assembly is failing.

Explanation: The 2145 system assembly is failing.

User response:
1. Go to the light path diagnostic MAP and perform
the light path diagnostic procedures.
2. If the light path diagnostic procedure isolates the
FRU, mark this error as "fixed" and go to the repair
verification MAP. If you have just replaced a FRU

1030 The internal disk of a node has failed.

Explanation: An error has occurred while attempting
to read or write data to the internal disk of one of the
nodes in the cluster. The disk has failed.

User response: Determine which node's internal disk
has failed using the node information in the error.
Replace the FRUs in the order shown. Mark the error
as fixed.

Possible Cause-FRUs or other:

2145-CG8 or 2145-CF8
v disk drive (50%)
v Disk controller (30%)
v Disk backplane (10%)
v Disk signal cable (8%)
v Disk power cable (1%)
v System board (1%)

2145-8A4
v disk drive (90%)
v disk cable assembly (10%)

2145-8G4

2145-8F4
N/A

2145-8F2
N/A
1056 Fibre Channel adapter in slot 2 adapter present
but failed.

Explanation: Fibre Channel adapter in slot 2 adapter
present but failed.

User response:
1. Replace the Fibre Channel adapter.
2. Check node status. If all nodes show a status of
"online", mark the error that you have just repaired
"fixed". If any nodes do not show a status of
"online", go to start MAP. If you return to this step,
contact your support center to resolve the problem
with the 2145.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145-8F2
Dual port Fibre Channel host bus adapter - full height
(100%)

2145-8G4
N/A

2145-8F4
N/A

1060 One or more Fibre Channel ports on the 2145
are not operational.

Explanation: One or more Fibre Channel ports on the
2145 are not operational.

User response:
1. Go to MAP 5600: Fibre Channel to isolate and
repair the problem.
2. Go to the repair verification MAP.

Possible Cause-FRUs or other:

2145-8F4, 2145-8G4, 2145-CF8, or 2145-CG8
v Fibre Channel cable (80%)
v Small Form-factor Pluggable (SFP) connector (5%)
v 4-port Fibre Channel host bus adapter (5%)

2145-8F2
v Fibre Channel cable (80%)
v Small Form-factor Pluggable (SFP) connector (5%)
v Dual port Fibre Channel host bus adapter (Fibre
Channel MAP isolates to the correct type) (5%)

Other:
v Fibre Channel network fabric (10%)
1089 One or more fans are failing.

Explanation: One or more fans are failing.

User response:
1. Determine the failing fan(s) from the fan indicator
on the system board or from the text of the error
data in the log. The reported fan for the 2145-8A4,
2145-CF8, or 2145-CG8 matches the fan assembly
position. For the 2145-8G4, if you have determined
the failing fan number from the error data in the
log, use the following list to determine the position
of the fan assembly to replace. Each fan assembly
contains two fans.
2. Exchange the FRU for a new FRU.
3. Go to repair verification MAP.

Fan number : Fan assembly position
v 1 or 2 : 1
v 3 or 4 : 2
v 5 or 6 : 3
v 7 or 8 : 4
v 9 or 10 : 5
v 11 or 12 : 6

1091 One or more fans (40x40x56) are failing.

Explanation: One or more fans (40x40x56) are failing.

User response:
1. Determine the failing fan(s) from the fan indicator
on the system board or from the text of the error
data in the log.
2. If all fans on the fan backplane are failing or if no
fan fault lights are illuminated, verify that the cable
between the fan backplane and the system board is
connected.
3. Exchange the FRU for a new FRU.
4. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145-8F2 or 2145-8F4
v Fan 40x40x56 (98%)
v Fan power cable assembly (2%)

2145-8G4
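Because each 2145-8G4 fan assembly contains two fans, the fan number to fan assembly position mapping listed under error 1089 can also be computed directly. A small illustrative sketch (the function name is not part of the product):

```python
def fan_assembly_position(fan_number):
    """Map a 2145-8G4 fan number (1-12) to its fan assembly position (1-6).

    Each fan assembly contains two fans: fans 1 and 2 are in assembly 1,
    fans 3 and 4 in assembly 2, and so on, per the table for error 1089.
    """
    if not 1 <= fan_number <= 12:
        raise ValueError("2145-8G4 fan numbers run from 1 to 12")
    return (fan_number + 1) // 2
```

For example, a reported fan number of 9 or 10 yields assembly position 5, matching the table.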
6. Go to the repair verification MAP.

Possible Cause-FRUs or other:

2145-8G4, 2145-8A4, 2145-CF8, or 2145-CG8
v The FRU that is indicated by the Light path
diagnostics (25%)
v System board (5%)

2145-8F2 or 2145-8F4
v The FRU that is indicated by the Light path
diagnostics (25%)
v Frame assembly (5%)

Other:

1094 The ambient temperature threshold has been
exceeded.

Explanation: The ambient temperature threshold has
been exceeded.

User response:
1. Check that the room temperature is within the
limits allowed.
2. Check for obstructions in the air flow.
3. Mark the errors as fixed.
4. Go to repair verification MAP.

Possible Cause-FRUs or other:
None

Other:
System environment (100%)

Error,AC,DC:Action
OFF,OFF,OFF:There is no power detected. Ensure that
the power cable is connected at the node and 2145
UPS-1U. If the AC LED does not light, check the status
of the 2145 UPS-1U to which the power supply is
connected. Follow MAP 5150 2145 UPS-1U if the
UPS-1U is showing no power or an error; otherwise,
replace the power cable. If the AC LED still does not
light, replace the power supply.
OFF,OFF,ON:The power supply has a fault. Replace the
power supply.
OFF,ON,OFF:Ensure that the power supply is installed
correctly. If the DC LED does not light, replace the
power supply.
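The Error,AC,DC LED combinations above can be treated as a simple lookup from LED state to service action. A sketch of that lookup; the table keys summarize only the rows listed in this excerpt, and the names are illustrative, not part of the product:

```python
# Keyed by (error_led, ac_led, dc_led), following the Error,AC,DC column
# order of the table above. Actions are abbreviated summaries of the text.
LED_ACTIONS = {
    ("OFF", "OFF", "OFF"): "No power detected: check the power cable, "
                           "then the 2145 UPS-1U, then the power supply.",
    ("OFF", "OFF", "ON"): "Power supply fault: replace the power supply.",
    ("OFF", "ON", "OFF"): "Check the power supply is installed correctly; "
                          "if the DC LED stays off, replace the power supply.",
}

def led_action(error_led, ac_led, dc_led):
    """Return the summarized service action for a power-supply LED combination."""
    key = (error_led.upper(), ac_led.upper(), dc_led.upper())
    return LED_ACTIONS.get(key, "Combination not listed in this excerpt")
```

Combinations outside the three rows shown here fall through to the default message rather than guessing an action.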
Possible Cause-FRUs or other:

2145-8G4, 2145-8A4, 2145-CF8, or 2145-CG8
v Light path diagnostic MAP FRUs (98%)
v System board (2%)

1121 A high speed SAS adapter has failed.

Explanation: A fault has been detected on a high
speed SAS adapter.

User response: In the sequence shown, exchange the
FRUs for new FRUs. Go to the repair verification MAP.

Possible Cause-FRUs or other:
1. High speed SAS adapter (90%)
2. System board (10%)

1122 A high speed SAS adapter error has occurred.

Explanation: The high speed SAS adapter has
detected a PCI bus error and requires service before it
can be restarted. The high speed SAS adapter failure
has caused all of the solid-state drives that were being
accessed through this adapter to go Offline.

User response: If this is the first time that this error
has occurred on this node, complete the following
steps:
1. Power off the node.
2. Reseat the high speed SAS adapter card.
3. Power on the node.
4. Submit the lsmdisk task and ensure that all of the
solid-state drive managed disks that are located in
this node have a status of Online.

If the sequence of actions above has not resolved the
problem or the error occurs again on the same node,
complete the following steps:
1. In the sequence shown, exchange the FRUs for new
FRUs.
2. Submit the lsmdisk task and ensure that all of the
solid-state drive managed disks that are located in
this node have a status of Online.
3. Go to the repair verification MAP.

Possible Cause-FRUs or other:
1. High speed SAS adapter (90%)
2. System board (10%)

1133 A duplicate WWNN has been detected.

Explanation: The cluster is reporting that a node is
not operational because of critical node error 556. See
the details of node error 556 for more information.

User response: See node error 556.

1135 The 2145 UPS has reported an ambient over
temperature.

Explanation: The 2145 UPS has reported an ambient
over temperature. The uninterruptible power supply
switches to Bypass mode to allow the 2145 UPS to cool.

User response:
1. Power off the nodes attached to the 2145 UPS.
2. Turn off the 2145 UPS, and then unplug the 2145
UPS from the main power source.
3. Ensure that the air vents of the 2145 UPS are not
obstructed.
4. Ensure that the air flow around the 2145 UPS is not
restricted.
5. Wait for at least five minutes, and then restart the
2145 UPS. If the problem remains, check the
ambient temperature. Correct the problem.
Otherwise, exchange the FRU for a new FRU.
6. Check node status. If all nodes show a status of
"online", mark the error that you have just repaired
"fixed". If any nodes do not show a status of
"online", go to start MAP. If you return to this step,
contact your support center to resolve the problem
with the uninterruptible power supply.
7. Go to repair verification MAP.

Possible Cause-FRUs or other:
2145 UPS electronics unit (50%)

Other:
The system ambient temperature is outside the
specification (50%)

1136 The 2145 UPS-1U has reported an ambient over
temperature.

Explanation: The 2145 UPS-1U has reported an
ambient over temperature.

User response:
1. Power off the node attached to the 2145 UPS-1U.
2. Turn off the 2145 UPS-1U, and then unplug the 2145
UPS-1U from the main power source.
3. Ensure that the air vents of the 2145 UPS-1U are not
obstructed.
4. Ensure that the air flow around the 2145 UPS-1U is
not restricted.
5. Wait for at least five minutes, and then restart the
2145 UPS-1U. If the problem remains, check the
ambient temperature. Correct the problem.
Otherwise, exchange the FRU for a new FRU.
6. Check node status. If all nodes show a status of
"online", mark the error that you have just repaired
"fixed". If any nodes do not show a status of
"online", go to start MAP. If you return to this step,
contact your support center to resolve the problem
with the uninterruptible power supply.
7. Go to repair verification MAP.

Possible Cause-FRUs or other:
2145 UPS-1U assembly (50%)

Other:
Other:
v The input AC power is missing (40%)
v The input AC power is not in specification (40%)

1141 The 2145 UPS-1U has reported that it has a
problem with the input AC power.

Explanation: The 2145 UPS-1U has reported that it has
a problem with the input AC power.

User response:
1. Check the input AC power, whether it is missing or
out of specification. Correct if necessary. Otherwise,
exchange the FRU for a new FRU.
2. Check node status. If all nodes show a status of
"online", mark the error that you have just repaired
"fixed". If any nodes do not show a status of
"online", go to start MAP. If you return to this step,
contact your support center to resolve the problem
with the uninterruptible power supply.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:
v 2145 UPS-1U input power cable (10%)
v 2145 UPS-1U assembly (10%)

Other:
v The input AC power is missing (40%)
v The input AC power is not in specification (40%)

1146 The signal connection between a 2145 and its
2145 UPS-1U is failing.

Explanation: The signal connection between a 2145
and its 2145 UPS-1U is failing.

User response:
1. Exchange the FRUs for new FRUs in the sequence
shown.
2. Check node status. If all nodes show a status of
"online", mark the error that you have just repaired
as "fixed". If any nodes do not show a status of
"online", go to start MAP. If you return to this step,
contact your support center to resolve the problem
with the uninterruptible power supply.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145-8G4
v Power cable assembly (40%)
v 2145 UPS-1U assembly (30%)
v System board (30%)

2145-8F2 or 2145-8F4
v Power cable assembly (40%)
v 2145 UPS-1U assembly (30%)
v Frame assembly (30%)
2. Check node status. If all nodes show a status of
"online", mark the error that you have just repaired
"fixed". If any nodes do not show a status of
"online", go to start MAP. If you return to this step,
contact your support center to resolve the problem
with the uninterruptible power supply.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

Other:
v Configuration error

1152 Incorrect type of uninterruptible power supply
detected.

Explanation: The cluster is reporting that a node is
not operational because of critical node error 587. See
the details of node error 587 for more information.

User response: See node error 587.

3. Ensure that only 2145s are receiving power from the
uninterruptible power supply. Ensure that there are
no switches or disk controllers that are connected to
the 2145 UPS.
4. Remove each connected 2145 input power in turn,
until the output overload is removed.
5. Exchange the FRUs for new FRUs in the sequence
shown, on the overcurrent 2145.
6. Check node status. If all nodes show a status of
"online", mark the error that you have just repaired
"fixed". If any nodes do not show a status of
"online", go to start MAP. If you return to this step,
contact your support center to resolve the problem
with the 2145 UPS.
7. Go to repair verification MAP.

Possible Cause-FRUs or other:
v Power cable assembly (50%)
v Power supply assembly (40%)
v 2145 UPS electronics assembly (10%)

1161 The output load on the 2145 UPS-1U exceeds
the specifications (reported by 2145 UPS-1U alarm
bits).

Explanation: The output load on the 2145 UPS-1U
exceeds the specifications (reported by 2145 UPS-1U
alarm bits).

User response:
1. Ensure that only 2145s are receiving power from the
uninterruptible power supply. Also, ensure that no
other devices are connected to the 2145 UPS-1U.
2. Exchange, in the sequence shown, the FRUs for new
FRUs. If the Overload Indicator is still illuminated
with all outputs disconnected, replace the 2145
UPS-1U.
3. Check node status. If all nodes show a status of
"online", mark the error that you have just repaired
"fixed". If any nodes do not show a status of
"online", go to start MAP. If you return to this step,
contact your support center to resolve the problem
with the 2145 UPS-1U.
4. Go to repair verification MAP.

Possible Cause-FRUs or other:
v Power supply assembly (40%)
v 2145 UPS-1U assembly (10%)

1165 The 2145 UPS output load is unexpectedly high.
The 2145 UPS output is possibly connected to an
extra non-2145 load.

Explanation: The 2145 UPS output load is
unexpectedly high. The 2145 UPS output is possibly
connected to an extra non-2145 load.

User response:
1. Ensure that only 2145s are receiving power from the
uninterruptible power supply. Ensure that there are
no switches or disk controllers that are connected to
the 2145 UPS.
2. Check node status. If all nodes show a status of
"online", the problem no longer exists. Mark the
error that you have just repaired "fixed" and go to
the repair verification MAP.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:
None

Other:
v Configuration error

1166 The 2145 UPS-1U output load is unexpectedly
high.

Explanation: The uninterruptible power supply output
is possibly connected to an extra non-2145 load.

User response:
1. Ensure that there are no other devices that are
connected to the 2145 UPS-1U.
2. Check node status. If all nodes show a status of
"online", mark the error that you have just repaired
"fixed". If any nodes do not show a status of
"online", go to start MAP. If you return to this step,
contact your support center to resolve the problem
with the 2145 UPS-1U.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:
v 2145 UPS-1U assembly (5%)

Other:
v Configuration error (95%)

1170 2145 UPS electronics fault (reported by the 2145
UPS alarm bits).

Explanation: 2145 UPS electronics fault (reported by
the 2145 UPS alarm bits).

User response:
1. Replace the uninterruptible power supply
electronics assembly.
2. Check node status. If all nodes show a status of
"online", mark the error that you have just repaired
"fixed". If any nodes do not show a status of
"online", go to start MAP. If you return to this step,
contact your support center to resolve the problem
with the UPS.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:
2145 UPS electronics assembly (100%)

1171 2145 UPS-1U electronics fault (reported by the
2145 UPS-1U alarm bits).

Explanation: 2145 UPS-1U electronics fault (reported
by the 2145 UPS-1U alarm bits).

User response:
1. Replace the uninterruptible power supply assembly.
2. Check node status. If all nodes show a status of
"online", mark the error that you have just repaired
"fixed". If any nodes do not show a status of
"online", go to start MAP. If you return to this step,
contact your support center to resolve the problem
with the uninterruptible power supply.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:
2145 UPS-1U assembly (100%)
Two possible scenarios when this error event is logged are:
1. The node has failed without saving all of its state data. The node has restarted, possibly after a repair, and shows node error 578 and is a candidate node for joining the cluster. The cluster attempts to add the node into the cluster but does not succeed. After 15 minutes, the cluster makes a second attempt to add the node into the cluster and again does not succeed. After another 15 minutes, the cluster makes a third attempt to add the node into the cluster and again does not succeed. After another 15 minutes, the cluster logs error code 1194. The node never came online during the attempts to add it to the cluster.
2. The node has failed without saving all of its state data. The node has restarted, possibly after a repair, and shows node error 578 and is a candidate node for joining the cluster. The cluster attempts to add the node into the cluster and succeeds and the node becomes online. Within 24 hours the node fails again without saving its state data. The node restarts and shows node error 578 and is a candidate node for joining the cluster. The cluster again attempts to add the node into the cluster, succeeds, and the node becomes online; however, the node again fails within 24 hours. The cluster attempts a third time to add the node into the cluster, succeeds, and the node becomes online; however, the node again fails within 24 hours. After another 15 minutes, the cluster logs error code 1194.

A combination of these scenarios is also possible.

Note: If the node is manually removed from the cluster, the count of automatic recovery attempts is reset to zero.

User response:
1. If the node has been continuously online in the cluster for more than 24 hours, mark the error as fixed and go to the Repair Verification MAP.
2. Determine the history of events for this node by locating events for this node name in the event log. Note that the node ID will change, so match on the WWNN and node name. Also, check the service records. Specifically, note entries indicating one of three events: 1) the node is missing from the cluster (cluster error 1195 event 009052), 2) an attempt to automatically recover the offline node is starting (event 980352), 3) the node has been added to the cluster (event 980349).
3. If the node has not been added to the cluster since the recovery process started, there is probably a hardware problem. The node's internal disk might be failing in a manner that it is unable to modify its software level to match the software level of the cluster. If you have not yet determined the root cause of the problem, you can attempt to manually remove the node from the cluster and add the node back into the cluster. Continuously monitor the status of the nodes in the cluster while the cluster is attempting to add the node. Note: If the node type is not supported by the software version of the cluster, the node will not appear as a candidate node. Therefore, incompatible hardware is not a potential root cause of this error.
4. If the node was added to the cluster but failed again before it has been online for 24 hours, investigate the root cause of the failure. If no events in the event log indicate the reason for the node failure, collect dumps and contact IBM technical support for assistance.
5. When you have fixed the problem with the node, you must use either the cluster console or the command line interface to manually remove the node from the cluster and add the node into the cluster.
6. Mark the error as fixed and go to the verification MAP.

Possible Cause-FRUs or other:

None, although investigation might indicate a hardware failure.

1195 A 2145 is missing from the cluster.

Explanation: You can resolve this problem by repairing the failure on the missing 2145.

User response:
1. If it is not obvious which node in the cluster has failed, check the status of the nodes and find the 2145 with a status of offline.
2. Go to the Start MAP and perform the repair on the failing node.
3. When the repair has been completed, this error is automatically marked as fixed.
4. Check node status. If all nodes show a status of “online”, but the error in the log has not been marked as fixed, manually mark the error that you have just repaired “fixed”. If any nodes do not show a status of “online”, go to start MAP. If you return to this step, contact your support center to resolve the problem with the 2145.
5. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

1200 The configuration is not valid. Too many devices, MDisks, or targets have been presented to the system.

Explanation: The configuration is not valid. Too many devices, MDisks, or targets have been presented to the system.
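Step 2 of the 1194 user response asks you to reconstruct the node's history from the event log by matching on WWNN and node name and picking out events 009052, 980352, and 980349. As a minimal illustration only (this is not SVC code; the dictionary field names are assumptions about a parsed event-log export), that filtering step might look like:

```python
# Illustrative sketch: collect the 1194-relevant history for one node from
# event-log entries exported as dicts. Field names ('wwnn', 'node_name',
# 'event_id', 'seq') are assumptions for this example.
INTERESTING = {
    "009052": "node missing from cluster (error 1195)",
    "980352": "automatic recovery of offline node starting",
    "980349": "node added to the cluster",
}

def node_history(events, wwnn, node_name):
    """Return (event_id, description) pairs for one node, oldest first.

    Matches on WWNN or node name, because the node ID changes each time
    the node rejoins the cluster.
    """
    hits = [e for e in events
            if e["wwnn"] == wwnn or e["node_name"] == node_name]
    hits.sort(key=lambda e: e["seq"])
    return [(e["event_id"], INTERESTING[e["event_id"]])
            for e in hits if e["event_id"] in INTERESTING]
```

A history that ends with 980352 but never reaches 980349 suggests the recovery never completed, pointing at the hardware problem described in step 3.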
User response:
1. Remove unwanted devices from the Fibre Channel network fabric.
2. Start a cluster discovery operation to find devices/disks by rescanning the Fibre Channel network.
3. List all connected managed disks. Check with the customer that the configuration is as expected. Mark the error that you have just repaired fixed.
4. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

Other:

Fibre Channel network fabric fault (100%)

1201 A solid-state drive requires a recovery.

Explanation: The solid-state drive that is identified by this error needs to be recovered.

User response: To recover this SSD drive, submit the following command: chdrive -task recover drive_id where drive_id is the identity of the drive that needs to be recovered.

Possible Cause-FRUs or other:

1202 A solid-state drive is missing from the configuration.

User response:
1. Use the transmitting and receiving WWPNs indicated in the error data to determine the section of the Fibre Channel fabric that has generated the duplicate frame. Search for the cause of the problem by using fabric monitoring tools. The duplicate frame might be caused by a design error in the topology of the fabric, by a configuration error, or by a software or hardware fault in one of the components of the Fibre Channel fabric, including inter-switch links.
2. When you are satisfied that the problem has been corrected, mark the error that you have just repaired “fixed”.
3. Go to MAP 5700: Repair verification.

Possible Cause-FRUs or other:
v Fibre Channel cable assembly (1%)
v Fibre Channel adapter (1%)

Other:
v Fibre Channel network fabric fault (98%)

1210 A local Fibre Channel port has been excluded.

Explanation: A local Fibre Channel port has been excluded.

User response:
1. Repair faults in the order shown.
2. Check the status of the disk controllers. If all disk controllers show a “good” status, mark the error that you just repaired as “fixed”.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:
v Fibre Channel cable assembly (75%)
v Small Form-factor Pluggable (SFP) connector (10%)
v Fibre Channel adapter (5%)

1216 SAS errors have exceeded thresholds.

Explanation: The cluster has experienced a large number of SAS communication errors, which indicates a faulty SAS component that must be replaced.

User response: In the sequence shown, exchange the FRUs for new FRUs.

Go to the repair verification MAP.

Possible Cause-FRUs or other:
1. SAS Cable (70%)
1217 A solid-state drive has exceeded the temperature warning threshold.

Explanation: The solid-state drive identified by this error has reported that its temperature is higher than the warning threshold.

User response: Take steps to reduce the temperature of the drive.
1. Determine the temperature of the room, and reduce the room temperature if this action is appropriate.
2. Replace any failed fans.
3. Ensure that there are no obstructions to air flow for the node.
4. Mark the error as fixed. If the error recurs, contact hardware support for further investigation.

Possible Cause-FRUs or other:
v Solid-state drive (10%)

Other:
v System environment or airflow blockage (90%)

1220 A remote Fibre Channel port has been excluded.

Explanation: A remote Fibre Channel port has been excluded.

User response:
1. View the event log. Note the MDisk ID associated with the error code.
2. From the MDisk, determine the failing disk controller ID.
3. Refer to the service documentation for the disk controller and the Fibre Channel network to resolve the reported problem.

Other:

Enclosure/controller fault (100%)

1230 A login has been excluded.

Explanation: A port to port fabric connection, or login, between the cluster node and either a controller or another cluster has had excessive errors. The login has therefore been excluded, and will not be used for I/O operations.

User response: Determine the remote system, which might be either a controller or a SAN Volume Controller cluster. Check the event log for other 1230 errors. Ensure that all higher priority errors are fixed.

This error event is usually caused by a fabric problem. If possible, use the fabric switch or other fabric diagnostic tools to determine which link or port is reporting the errors. If there are error events for links from this node to a number of different controllers or clusters, then it is probably the node to switch link that is causing the errors. Unless there are other contrary indications, first replace the cable between the switch and the remote system.
1. From the fabric analysis, determine the FRU that is most likely causing the error. If this FRU has recently been replaced while resolving a 1230 error, choose the next most likely FRU that has not been replaced recently. Exchange the FRU for a new FRU.
2. Mark the error as fixed. If the FRU replacement has not fixed the problem, the error will be logged again; however, depending on the severity of the problem, the error might not be logged again immediately.
3. Start a cluster discovery operation to recover the login by re-scanning the Fibre Channel network.
4. Check the status of the disk controller or remote cluster. If the status is not “good”, go to the Start MAP.
5. Go to repair verification MAP.

1311 A solid-state drive is offline due to excessive errors.

Explanation: The drive that is reporting excessive errors has been taken offline.
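The 1230 guidance distinguishes a faulty node-to-switch link (errors logged against many different remote systems) from a faulty link to a single remote system. A small sketch of that heuristic, assuming the 1230 error records have been reduced to (local port, remote system) pairs — this is an illustration, not SVC code:

```python
# Illustrative sketch: if one local port logs 1230 errors against several
# different remote systems, the shared node-to-switch link is the suspect;
# otherwise suspect the link to the single remote system involved.
from collections import defaultdict

def suspect_link(error_records):
    """error_records: iterable of (local_port, remote_system) pairs."""
    remotes_per_port = defaultdict(set)
    for local_port, remote in error_records:
        remotes_per_port[local_port].add(remote)
    suspects = {}
    for port, remotes in remotes_per_port.items():
        if len(remotes) > 1:
            # Many remote systems behind one local port: common element is
            # the cable/SFP between this node port and the switch.
            suspects[port] = "node-to-switch link"
        else:
            suspects[port] = "link to " + next(iter(remotes))
    return suspects
```

As the text notes, this is only the default suspicion; contrary indications from fabric diagnostics take precedence.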
User response: In the management GUI, click Troubleshooting > Recommended Actions to run the recommended action for this error. If this does not resolve the issue, contact your next level of support.

1320 A disk I/O medium error has occurred.

Explanation: A disk I/O medium error has occurred.

User response:
1. Check whether the volume the error is reported against is mirrored. If it is, check if there is a “1870 Mirrored volume offline because a hardware read error has occurred” error relating to this volume in the event log. Also check if one of the mirror copies is synchronizing. If all these tests are true then you must delete the volume copy that is not synchronized from the volume. Check that the volume is online before continuing with the following actions. Wait until the medium error is corrected before trying to re-create the volume mirror.
2. If the medium error was detected by a read from a host, ask the customer to rewrite the incorrect data to the logical block address (LBA) that is reported in the host system's SCSI sense data. If an individual block cannot be recovered it will be necessary to restore the volume from backup. (If this error has occurred during a migration, the host system does not notice the error until the target device is accessed.)

1330 A suitable managed disk (MDisk) or drive for use as a quorum disk was not found.

Explanation: A quorum disk is needed to enable a tie-break when some cluster members are missing. Three quorum disks are usually defined. By default, the cluster automatically allocates quorum disks when managed disks are created; however, the option exists to manually assign quorum disks. This error is reported when there are managed disks or image mode disks but no quorum disks.

To become a quorum disk:
v The MDisk must be accessible by all nodes in the cluster.
v The MDisk must be managed; that is, it must be a member of a storage pool.
v The MDisk must have free extents.
v The MDisk must be associated with a controller that is enabled for quorum support. If the controller has multiple WWNNs, all of the controller components must be enabled for quorum support.

A quorum disk might not be available because of a Fibre Channel network failure or because of a Fibre Channel switch zoning problem.

User response:
1. Resolve any known Fibre Channel network problems.
2. Ask the customer to confirm that MDisks have been added to storage pools and that those MDisks have free extents and are on a controller that is enabled for use as a provider of quorum disks. Ensure that any controller with multiple WWNNs has all of its components enabled to provide quorum disks. Either create a suitable MDisk or if possible enable quorum support on controllers with which existing MDisks are associated. If at least one managed disk shows a mode of managed and has a non-zero quorum index, mark the error that you have just repaired as “fixed”.
3. If the customer is unable to make the appropriate changes, ask your software support center for assistance.
4. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

Other:

Configuration error (100%)

1335 Quorum disk not available.

Explanation: Quorum disk not available.

User response:
1. View the event log entry to identify the managed disk (MDisk) being used as a quorum disk, that is no longer available.
2. Perform the disk controller problem determination and repair procedures for the MDisk identified in step 1.
3. Include the MDisks into the cluster.
4. Check the managed disk status. If the managed disk identified in step 1 shows a status of “online”, mark the error that you have just repaired as “fixed”. If the managed disk does not show a status of “online”, go to start MAP. If you return to this step, contact your support center to resolve the problem with the disk controller.
5. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

Other:

Fibre Channel network fabric fault (100%)

User response:
1. Repair problems on all enclosures/controllers and switches on the same SAN as this 2145 cluster.
2. If problems are found, mark this error as “fixed”.
3. If no switch or disk controller failures can be found, take an event log dump and call your hardware support center.
4. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

Other:
v Enclosure/controller fault
v Fibre Channel switch

1360 A SAN transport error occurred.

Explanation: This error has been reported because the 2145 performed error recovery procedures in response to SAN component associated transport errors. The problem is probably caused by a failure of a component of the SAN.

User response:
1. View the event log entry to determine the node that logged the problem. Determine the 2145 node or controller that the problem was logged against.
2. Perform Fibre Channel switch problem determination and repair procedures for the switches connected to the 2145 node or controller.
3. Perform Fibre Channel cabling problem determination and repair procedures for the cables connected to the 2145 node or controller.
4. If any problems are found and resolved in steps 2 and 3, mark this error as “fixed”.
5. If no switch or cable failures were found in steps 2 and 3, take an event log dump. Call your hardware support center.
6. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

Other:
v Fibre Channel switch
v Fibre Channel cabling
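The four conditions that error 1330 lists for quorum-disk eligibility can be captured in a single predicate. This is an illustrative model only; the field names are assumptions for the example, not the SVC object model:

```python
# Illustrative sketch of the 1330 eligibility rules for a quorum disk.
# 'mdisk' and 'controller' are plain dicts; their keys are assumptions.
def quorum_eligible(mdisk, controller, node_count):
    return (
        # The MDisk must be accessible by all nodes in the cluster.
        len(mdisk["accessible_from_nodes"]) == node_count
        # The MDisk must be managed (a member of a storage pool).
        and mdisk["mode"] == "managed"
        # The MDisk must have free extents.
        and mdisk["free_extents"] > 0
        # Every component (WWNN) of the controller must allow quorum.
        and all(c["quorum_enabled"] for c in controller["components"])
    )
```

A single component with quorum support disabled makes the whole controller ineligible, which is why step 2 of the response tells you to check every WWNN of a multi-WWNN controller.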
1600 Mirrored disk repair halted because of difference.

Explanation: During the repair of a mirrored volume two copy disks were found to contain different data for the same logical block address (LBA). The validate option was used, so the repair process has halted.

Read operations to the LBAs that differ might return the data of either volume copy. Therefore it is important not to use the volume unless you are sure that the host applications will not read the LBAs that differ or can manage the different data that potentially can be returned.

User response: Perform one of the following actions:
v Continue the repair starting with the next LBA after the difference to see how many differences there are for the whole mirrored volume. This can help you decide which of the following actions to take.
v Choose a primary disk and run repair resynchronizing differences.
v Run a repair and create medium errors for differences.
v Restore all or part of the volume from a backup.
v Decide which disk has correct data, then delete the copy that is different and re-create it allowing it to be synchronized.

Then mark the error as “fixed”.

Possible Cause-FRUs or other:

Normally zero, or very few, differences are expected; however, if the copies have been marked as synchronized inappropriately, then a large number of virtual medium errors could be created.

User response: Ensure that all higher priority errors are fixed before you attempt to resolve this error.

Determine whether the excessive number of virtual medium errors occurred because of a mirrored disk validate and repair operation that created errors for differences, or whether the errors were created because of a copy operation. Follow the corresponding option shown below.
1. If the virtual medium errors occurred because of a mirrored disk validate and repair operation that created medium errors for differences, then also ensure that the volume copies had been fully synchronized prior to starting the operation. If the copies had been synchronized, there should be only a few virtual medium errors created by the validate and repair operation. In this case, it might be possible to rewrite only the data that was not consistent on the copies using the local data recovery process. If the copies had not been synchronized, it is likely that there are now a large number of medium errors on all of the volume copies. Even if the virtual medium errors are expected to be only for blocks that have never been written, it is important to clear the virtual medium errors to avoid inhibition of other operations. To recover the data for all of these virtual medium errors it is likely that the volume will have to be recovered from a backup using a process that rewrites all sectors of the volume.
logged again by the managed disk discovery that automatically runs at this time; this could take a few minutes.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

Other:
v Enclosure/controller fault

1627 The cluster has insufficient redundancy in its controller connectivity.

Explanation: The cluster has detected that it does not have sufficient redundancy in its connections to the disk controllers. This means that another failure in the SAN could result in loss of access to the application data. The cluster SAN environment should have redundant connections to every disk controller. This redundancy allows for continued operation when there is a failure in one of the SAN components.

To provide recommended redundancy, a cluster should be configured so that:
v each node can access each disk controller through two or more different initiator ports on the node.
v each node can access each disk controller through two or more different controller target ports. Note: Some disk controllers only provide a single target port.
v each node can access each disk controller target port through at least one initiator port on the node.

If there are no higher-priority errors being reported, this error usually indicates a problem with the SAN design, a problem with the SAN zoning or a problem with the disk controller.

If there are unfixed higher-priority errors that relate to the SAN or to disk controllers, those errors should be fixed before resolving this error because they might indicate the reason for the lack of redundancy. Error codes that must be fixed first are:
v 1210 Local FC port excluded
v 1230 Login has been excluded

Note: This error can be reported if the required action, to rescan the Fibre Channel network for new MDisks, has not been performed after a deliberate reconfiguration of a disk controller or after SAN rezoning.

The 1627 error code is reported for a number of different error IDs. The error ID indicates the area where there is a lack of redundancy. The data reported in an event log entry indicates where the condition was found.

The meaning of the error IDs is shown below. For each error ID the most likely reason for the condition is given. If the problem is not found in the suggested areas, check the configuration and state of all of the SAN components (switches, controllers, disks, cables and cluster) to determine where there is a single point of failure.

010040 A disk controller is only accessible from a single node port.
v A node has detected that it only has a connection to the disk controller through exactly one initiator port, and more than one initiator port is operational.
v The error data indicates the device WWNN and the WWPN of the connected port.
v A zoning issue or a Fibre Channel connection hardware fault might cause this condition.

010041 A disk controller is only accessible from a single port on the controller.
v A node has detected that it is only connected to exactly one target port on a disk controller, and more than one target port connection is expected.
v The error data indicates the WWPN of the disk controller port that is connected.
v A zoning issue or a Fibre Channel connection hardware fault might cause this condition.

010042 Only a single port on a disk controller is accessible from every node in the cluster.
v Only a single port on a disk controller is accessible to every node when there are multiple ports on the controller that could be connected.
v The error data indicates the WWPN of the disk controller port that is connected.
v A zoning issue or a Fibre Channel connection hardware fault might cause this condition.

010043 A disk controller is accessible through only half, or less, of the previously configured controller ports.
v Although there might still be multiple ports that are accessible on the disk controller, a hardware component of the controller might have failed or one of the SAN fabrics has failed such that the operational system configuration has been reduced to a single point of failure.
v The error data indicates a port on the disk controller that is still connected, and also lists controller ports that are expected but that are not connected.
v A disk controller issue, switch hardware issue, zoning issue or cable fault might cause this condition.

010044 A disk controller is not accessible from a node.
v A node has detected that it has no access to a disk controller. The controller is still accessible from the partner node in the I/O group, so its data is still accessible to the host applications.
v The error data indicates the WWPN of the missing disk controller.
v A zoning issue or a cabling error might cause this condition.

User response:
1. Check the error ID and data for a more detailed description of the error.
2. Determine if there has been an intentional change to the SAN zoning or to a disk controller configuration that reduces the cluster's access to the indicated disk controller. If either action has occurred, continue with step 8.
3. Use the GUI or the CLI command lsfabric to ensure that all disk controller WWPNs are reported as expected.
4. Ensure that all disk controller WWPNs are zoned appropriately for use by the cluster.
5. Check for any unfixed errors on the disk controllers.
6. Ensure that all of the Fibre Channel cables are connected to the correct ports at each end.
7. Check for failures in the Fibre Channel cables and connectors.
8. When you have resolved the issues, use the GUI or the CLI command detectmdisk to rescan the Fibre Channel network for changes to the MDisks. Note: Do not attempt to detect MDisks unless you are sure that all problems have been fixed. Detecting MDisks prematurely might mask an issue.
9. Mark the error that you have just repaired as fixed. The cluster will revalidate the redundancy and will report another error if there is still not sufficient redundancy.
10. Go to MAP 5700: Repair verification.

Possible Cause-FRUs or other:
v None

1630 The number of device logins was reduced.

Explanation: The number of port to port fabric connections, or logins, between the node and a storage controller has decreased. This might be caused by a problem on the SAN or by a deliberate reconfiguration of the SAN.

User response:
1. Check the error in the cluster event log to identify the object ID associated with the error.
2. Check the availability of the failing device using the following command line: lscontroller object_ID. If the command fails with the message “CMMVC6014E The command failed because the requested object is either unavailable or does not exist,” ask the customer if this device was removed from the system.
v If “yes”, mark the error as fixed in the cluster event log and continue with the repair verification MAP.
v If “no” or if the command lists details of the failing controller, continue with the next step.
3. Check whether the device has regained connectivity. If it has not, check the cable connection to the remote-device port.
4. If all attempts to log in to a remote-device port have failed and you cannot solve the problem by changing cables, check the condition of the remote-device port and the condition of the remote device.
5. Start a cluster discovery operation by rescanning the Fibre Channel network.
6. Check the status of the disk controller. If all disk controllers show a “good” status, mark the error that you have just repaired as “fixed”. If any disk controllers do not show “good” status, go to start MAP. If you return to this step, contact the support center to resolve the problem with the disk controller.
7. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

Other:
v Fibre Channel network fabric fault (50%)
v Enclosure/controller fault (50%)

1660 The initialization of the managed disk has failed.

Explanation: The initialization of the managed disk has failed.

User response:
1. View the event log entry to identify the managed disk (MDisk) that was being accessed when the problem was detected.
2. Perform the disk controller problem determination and repair procedures for the MDisk identified in step 1.
3. Include the MDisk into the cluster.
4. Check the managed disk status. If all managed disks show a status of “online”, mark the error that you have just repaired as “fixed”. If any managed disks do not show a status of “online”, go to the start MAP. If you return to this step, contact your support center to resolve the problem with the disk controller.
5. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

Other:

Enclosure/controller fault (100%)

1670 The CMOS battery on the 2145 system board failed.

Explanation: The CMOS battery on the 2145 system board failed.

User response:
1. Replace the CMOS battery.
2. Mark the error that you have just repaired as “fixed”.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:

CMOS battery (100%)

1695 Persistent unsupported disk controller configuration.

Explanation: A disk controller configuration that might prevent failover for the cluster has persisted for more than four hours. The problem was originally logged through a 010032 event, service error code 1625.

User response:
1. Fix any higher priority error. In particular, follow the service actions to fix the 1625 error indicated by this error's root event. This error will be marked as “fixed” when the root event is marked as “fixed”.
2. If the root event cannot be found, or is marked as “fixed”, perform an MDisk discovery and mark this error as “fixed”.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

Other:
v Enclosure/controller fault

1700 Unrecovered Metro Mirror or Global Mirror relationship

Explanation: This error might be reported after the recovery action for a cluster failure or a complete I/O group failure. The error is reported because some Metro Mirror or Global Mirror relationships, whose control data is stored by the I/O group, were active at the time of the failure and the current state of the relationship could not be recovered.

User response: To fix this error it is necessary to delete all of the relationships that could not be recovered and then re-create the relationships.
1. Note the I/O group index against which the error is logged.
2. List all of the Metro Mirror and Global Mirror relationships that have either a master or an auxiliary volume in this I/O group. Use the volume view to determine which volumes in the I/O group you noted have a relationship defined.
3. Note the details of the Metro Mirror and Global Mirror relationships that are listed so that they can be re-created.
4. Delete all of the Metro Mirror and Global Mirror relationships that are listed. Note: The error will automatically be marked as “fixed” once the last relationship on the I/O group is deleted. New relationships should not be created until the error is fixed.
5. Using the details noted in step 3, re-create all of the Metro Mirror and Global Mirror relationships that you just deleted. Note: You are able to delete a Metro Mirror or Global Mirror relationship from either the master or auxiliary cluster; however, you must re-create the relationship on the master cluster. Therefore, it might be necessary to go to another cluster to complete this service action.

Possible Cause-FRUs or other:
v None

1710 There are too many cluster partnerships. The number of cluster partnerships has been reduced.

Explanation: A cluster can have a Metro Mirror and Global Mirror cluster partnership with one or more other clusters. Partnership sets consist of clusters that are either in direct partnership with each other or are in indirect partnership by having a partnership with the same intermediate cluster. The topology of the partnership set is not fixed; the topology might be a star, a loop, a chain or a mesh. The maximum supported number of clusters in a partnership set is four. A cluster is a member of a partnership set if it has a partnership with another cluster in the set, regardless of whether that partnership has any defined consistency groups or relationships.

The following are examples of valid partnership sets for five unique clusters labelled A, B, C, D, and E where a partnership is indicated by a dash between two cluster names:
v A-B, A-C, A-D. E has no partnerships defined and therefore is not a member of the set.
v A-B, A-D, B-C, C-D. E has no partnerships defined and therefore is not a member of the set.
v A-B, B-C, C-D. E has no partnerships defined and therefore is not a member of the set.
v A-B, A-C, A-D, B-C, B-D, C-D. E has no partnerships defined and therefore is not a member of the set.
v A-B, A-C, B-C. D-E. There are two partnership sets. One contains clusters A, B, and C. The other contains clusters D and E.

The following are examples of unsupported configurations because the number of clusters in the set is five, which exceeds the supported maximum of four clusters:
v A-B, A-C, A-D, A-E.
v A-B, A-D, B-C, C-D, C-E.
v A-B, B-C, C-D, D-E.

The cluster prevents you from creating a new Metro Mirror and Global Mirror cluster partnership if a resulting partnership set would exceed the maximum of four clusters. However, if you restore a broken link between two clusters that have a partnership, the number of clusters in the set might exceed four. If this occurs, Metro Mirror and Global Mirror cluster partnerships are excluded from the set until only four clusters remain in the set. A cluster partnership that is excluded from a set has all of its Metro Mirror and Global Mirror cluster partnerships excluded.

Event ID 0x050030 is reported if the cluster is retained in the partnership set. Event ID 0x050031 is reported if the cluster is excluded from the partnership set. All clusters that were in the partnership set report error 1710.

6. Restart all relationships and consistency groups that were stopped.
7. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

| 1720 Metro Mirror (Remote copy) - Relationship has stopped and lost synchronization, for reason other than a persistent I/O error (LSYNC)

| Explanation: A remote copy relationship or consistency group needs to be restarted. In a Metro Mirror (remote copy) or Global Mirror operation, the relationship has stopped and lost synchronization, for a reason other than a persistent I/O error.

| User response: The administrator must examine the state of the system to validate that everything is online to allow a restart to work. Examining the state of the system also requires checking the partner Fibre Channel (FC) port masks on both clusters.
| 1. If the partner FC port mask was changed recently, check that the correct mask was selected.
| 2. Perform whatever steps are needed to maintain a consistent secondary, if desired.
| 3. The administrator must issue a start command.

| Possible Cause-FRUs or other:
| v None
| See Non-critical node error “888” on page 173 for Possible Cause-FRUs or other:
| details.
v None
| Use the lsfabric command to view the current number
| of logins between nodes.
1860 Thin-provisioned volume copy offline
| Possible Cause-FRUs or other cause: because of failed repair.
| v None Explanation: The attempt to repair the metadata of a
thin-provisioned volume that describes the disk
1840 The managed disk has bad blocks. contents has failed because of problems with the
automatically maintained backup copy of this data. The
Explanation: These are "virtual" medium errors which error event data describes the problem.
are created when copying a volume where the source
has medium errors. During data moves or duplication, User response: Delete the thin-provisioned volume
such as during a flash copy, an attempt is made to and reconstruct a new one from a backup or mirror
move medium errors; to achieve this, virtual medium copy. Mark the error as “fixed”. Also mark the original
errors called “bad blocks” are created. Once a bad 1862 error as “fixed”.
block has been created, no attempt will be made to Possible Cause-FRUs or other:
read the underlying data, as there is no guarantee that
v None
the old data still exists once the “bad block” is created.
Therefore, it is possible to have “bad blocks”, and thus
medium errors, reported on a target vdisk, without 1862 Thin-provisioned volume copy offline
medium errors actually existing on the underlying because of corrupt metadata.
storage. The “bad block” records are removed when the
data is overwritten by a host. Explanation: A thin-provisioned volume has been
taken offline because there is an inconsistency in the
cluster metadata that describes the disk contents. This
Note: On an external controller, this error can only
might occur because of corruption of data on the
result from a copied medium error.
physical disk (e.g., medium error or data miscompare),
User response: the loss of cached metadata (because of a cluster
1. The support center will direct the user to restore the recovery) or because of a software error. The event data
data on the affected volumes. gives information on the reason.
2. When the volume data has been restored, or the The cluster maintains backup copies of the metadata
user has chosen not to restore the data, mark the and it might be possible to repair the thin-provisioned
error as “fixed”. volume using this data.
3. Go to repair verification MAP. User response: The cluster is able to repair the
inconsistency in some circumstances. Run the repair
Possible Cause-FRUs or other: volume option to start the repair process. This repair
v None process, however, can take some time. In some
situations it might be more appropriate to delete the
thin-provisioned volume and reconstruct a new one
1850 Compressed volume copy has bad from a backup or mirror copy.
blocks
If you run the repair procedure and it completes, this
Explanation: A system recovery operation was error is automatically marked as “fixed”; otherwise,
performed, but data on one or more volumes was not another error event (error code 1860) is logged to
recovered; this is normally caused by a combination of indicate that the repair action has failed.
hardware faults. If data containing a medium error is
copied or migrated to another volume, bad blocks will Possible Cause-FRUs or other:
be recorded. If a host attempts to read the data in any v None
of the bad block regions, the read will fail with a
medium error.
User response:
1. The support center will direct the user to restore the
data on the affected volumes.
1865 Thin-provisioned volume copy offline because of
insufficient space.

Explanation: A thin-provisioned volume has been
taken offline because there is insufficient allocated real
capacity available on the volume for the used space to
increase further. If the thin-provisioned volume is
auto-expand enabled, then the storage pool it is in also
has no free space.

User response: The service action differs depending
on whether the thin-provisioned volume copy is
auto-expand enabled or not. Whether the disk is
auto-expand enabled or not is indicated in the error
event data.

If the volume copy is auto-expand enabled, perform
one or more of the following actions. When you have
performed all of the actions that you intend to perform,
mark the error as “fixed”; the volume copy will then
return online.
v Determine why the storage pool free space has been
  depleted. Any of the thin-provisioned volume copies,
  with auto-expand enabled, in this storage pool might
  have expanded at an unexpected rate; this could
  indicate an application error. New volume copies
  might have been created in, or migrated to, the
  storage pool.
v Increase the capacity of the storage pool that is
  associated with the thin-provisioned volume copy by
  adding more MDisks to the group.
v Provide some free capacity in the storage pool by
  reducing the used space. Volume copies that are no
  longer required can be deleted, the size of volume
  copies can be reduced, or volume copies can be
  migrated to a different storage pool.
v Migrate the thin-provisioned volume copy to a
  storage pool that has sufficient unused capacity.
v Consider reducing the value of the storage pool
  warning threshold to give more time to allocate extra
  space.

If the volume copy is not auto-expand enabled,
perform one or more of the following actions. In this
case the error will automatically be marked as “fixed”,
and the volume copy will return online when space is
available.
v Determine why the thin-provisioned volume copy
  used space has grown at the rate that it has. There
  might be an application error.
v Increase the real capacity of the volume copy.
v Enable auto-expand for the thin-provisioned volume
  copy.

1870 Mirrored volume offline because a hardware read
error has occurred.

Explanation: While attempting to maintain the volume
mirror, a hardware read error occurred on all of the
synchronized volume copies.

The volume copies might be inconsistent, so the
volume is now offline.

User response:
v Fix all higher priority errors. In particular, fix any
  read errors that are listed in the sense data. This
  error event will automatically be fixed when the root
  event is marked as “fixed”.
v If you cannot fix the root error, but the read errors
  on some of the volume copies have been fixed, mark
  this error as “fixed” to run without the mirror. You
  can then delete the volume copy that cannot read
  data and re-create it on different MDisks.

Possible Cause-FRUs or other:
v None

1895 Unrecovered FlashCopy mappings

Explanation: This error might be reported after the
recovery action for a cluster failure or a complete I/O
group failure. The error is reported because some
FlashCopies, whose control data is stored by the I/O
group, were active at the time of the failure and the
current state of the mapping could not be recovered.

User response: To fix this error it is necessary to
delete all of the FlashCopy mappings on the I/O group
that failed.
1. Note the I/O group index against which the error is
   logged.
2. List all of the FlashCopy mappings that are using
   this I/O group for their bitmaps. You should get the
   detailed view of every possible FlashCopy ID. Note
   the IDs of the mappings whose IO_group_id
   matches the ID of the I/O group against which this
   error is logged.
3. Note the details of the FlashCopy mappings that are
   listed so that they can be re-created.
4. Delete all of the FlashCopy mappings that are
   listed. Note: The error will automatically be marked
   as “fixed” once the last mapping on the I/O group
   is deleted. New mappings cannot be created until
   the error is fixed.
5. Using the details noted in step 3, re-create all of the
   FlashCopy mappings that you just deleted.
Other:
2145 software (100%)

2500 A secure shell (SSH) session limit for the cluster
has been reached.

Explanation: Secure Shell (SSH) sessions are used by
applications that manage the cluster. An example of
such an application is the command-line interface
(CLI). An application must initially log in to the cluster
to create an SSH session. The cluster imposes a limit on
the number of SSH sessions that can be open at one
time. This error indicates that the limit on the number
of SSH sessions has been reached and that no more
logins can be accepted until a current session logs out.

The limit on the number of SSH sessions is usually
reached because multiple users have opened an SSH
session but have forgotten to close the SSH session
when they are no longer using the application.

User response:
v Because this error indicates a problem with the
  number of sessions that are attempting external
  access to the cluster, determine the reason that so
  many SSH sessions have been opened.
v Run the Fix Procedure for this error on the panel at
  Management GUI Troubleshooting >
  Recommended Actions to view and manage the
  open SSH sessions.

2600 The cluster was unable to send an email.

Explanation: The cluster has attempted to send an
email in response to an event, but there was no
acknowledgement that it was successfully received by
the SMTP mail server. It might have failed because the
cluster was unable to connect to the configured SMTP
server, the email might have been rejected by the
server, or a timeout might have occurred. The SMTP
server might not be running or might not be correctly
configured, or the cluster might not be correctly
configured. This error is not logged by the test email
function because it responds immediately with a result
code.

User response:
v Ensure that the SMTP email server is active.
v Ensure that the SMTP server TCP/IP address and
  port are correctly configured in the cluster email
  configuration.
v Send a test email and validate that the change has
  corrected the issue.
v Mark the error that you have just repaired as fixed.
v Go to MAP 5700: Repair verification.

2601 Error detected while sending an email.

Explanation: An error has occurred while the cluster
was attempting to send an email in response to an
event. The cluster is unable to determine if the email
has been sent and will attempt to resend it. The
problem might be with the SMTP server or with the
cluster email configuration. The problem might also be
caused by a failover of the configuration node. This
error is not logged by the test email function because it
responds immediately with a result code.

User response:
v If there are higher-priority unfixed errors in the log,
  fix those errors first.
v Ensure that the SMTP email server is active.
v Ensure that the SMTP server TCP/IP address and
  port are correctly configured in the cluster email
  configuration.
v Send a test email and validate that the change has
  corrected the issue.
v Mark the error that you have just repaired as fixed.
v Go to MAP 5700: Repair verification.

Possible Cause-FRUs or other:
v None

2700 Unable to access NTP network time server

Explanation: Cluster time cannot be synchronized
with the NTP network time server that is configured.

User response: There are three main causes to
examine:
v The cluster NTP network time server configuration is
  incorrect. Ensure that the configured IP address
  matches that of the NTP network time server.
v The NTP network time server is not operational.
  Check the status of the NTP network time server.
v The TCP/IP network is not configured correctly.
  Check the configuration of the routers, gateways and
  firewalls. Ensure that the cluster can access the NTP
  network time server and that the NTP protocol is
  permitted.

The error will automatically fix when the cluster is able
to synchronize its time with the NTP network time
server.

Possible Cause-FRUs or other:
v None
For each of these consistency groups, also note the
cluster ID of the remote cluster.
v Determine how many unique remote cluster IDs
  there are among all of the Global Mirror and Metro
  Mirror relationships and consistency groups that you
  have identified in the first two steps. For each of
  these remote clusters, decide if you want to
  re-establish the partnership with that cluster. Ensure
  that the total number of partnerships that you want
  to have with remote clusters does not exceed the
  cluster limit. In version 4.3.1 this limit is 1. If you
  re-establish a partnership, you will not have to delete
  the Global Mirror and Metro Mirror relationships
  and consistency groups that use the partnership.
v Re-establish any selected partnerships.
v Delete all of the Global Mirror and Metro Mirror
  relationships and consistency groups that you listed
  in either of the first two steps whose remote cluster
  partnership has not been re-established.
v Check that the error has been marked as fixed by the
  system. If it has not, return to the first step and
  determine which Global Mirror or Metro Mirror
  relationships or consistency groups are still causing
  the issue.

Possible Cause-FRUs or other:
v None

3081 Unable to send email to any of the configured
email servers.

Explanation: Either the system was not able to
connect to any of the SMTP email servers, or the email
transmission has failed. A maximum of six email
servers can be configured. Error event 2600 or 2601 is
raised when an individual email server is found to be
not working. This error indicates that all of the email
servers were found to be not working.

User response:
v Check the event log for all unresolved 2600 and 2601
  errors and fix those problems.
v If this error has not already been automatically
  marked fixed, mark this error as fixed.
v Perform the check email function to test that an
  email server is operating properly.

Possible Cause-FRUs or other:
v None
The following list identifies some of the hardware that might cause failures:
v Power, fan, or cooling switch
v Application-specific integrated circuits
v Installed small form-factor pluggable (SFP) transceiver
v Fiber-optic cables
Perform the following steps if you were sent here from either the maintenance
analysis procedures or the error codes:
Procedure
1. If the customer has changed the SAN configuration by changing the Fibre
Channel cable connections or switch zoning, ask the customer to verify that the
changes were correct and, if necessary, reverse those changes.
2. Verify that the power is turned on to all switches and storage controllers that
the SAN Volume Controller system uses, and that they are not reporting any
hardware failures. If problems are found, resolve those problems before
proceeding further.
3. Verify that the Fibre Channel cables that connect the systems to the switches
are securely connected.
The following items can indicate that a single Fibre Channel or 10G Ethernet link
has failed:
v The Fibre Channel port status on the front panel of the node
v The Fibre Channel status LEDs at the rear of the node
v An error that indicates that a single port has failed (703, 723)
Attempt each of these actions, in the following order, until the failure is fixed:
1. Ensure that the Fibre Channel or 10G Ethernet cable is securely connected at
each end.
2. Replace the Fibre Channel or 10G Ethernet cable.
3. Replace the SFP transceiver for the failing port on the SAN Volume Controller
node.
Note: SAN Volume Controller nodes are supported with both longwave SFP
transceivers and shortwave SFP transceivers. You must replace an SFP
transceiver with the same type of SFP transceiver. If the SFP transceiver to
replace is a longwave SFP transceiver, for example, you must provide a suitable
replacement. Removing the wrong SFP transceiver could result in loss of data
access.
4. Perform the Fibre Channel switch or FCF service procedures for a failing Fibre
Channel link, or for a failing 10G Ethernet link with the Fibre Channel over
Ethernet personality enabled. This might involve replacing the SFP transceiver at
the switch.
5. Replace the Fibre Channel adapter or Fibre Channel over Ethernet adapter on
the node.
For network problems, you can attempt any of the following actions:
v Test your connectivity between the host and SAN Volume Controller ports.
v Try to ping the SAN Volume Controller system from the host.
v Ask the Ethernet network administrator to check the firewall and router settings.
v Check that the subnet mask and gateway are correct for the SAN Volume
Controller host configuration.
You can also use the management GUI to investigate and resolve SAN Volume
Controller problems.
For host problems, you can attempt any of the following actions:
v Verify that the host iSCSI qualified name (IQN) is correctly configured.
v Use operating system utilities (such as Windows device manager) to verify that
the device driver is installed, loaded, and operating correctly.
If error code 705 is displayed on the node, the FC I/O port is inactive.
Fibre Channel over Ethernet uses Fibre Channel as a protocol and Ethernet as an
interconnect.
Note: For a Fibre Channel over Ethernet enabled port, either the Fibre Channel
forwarder (FCF) is not seen, or the Fibre Channel over Ethernet feature is not
configured on the switch.
v Verify that the Fibre Channel over Ethernet feature is enabled on the FCF.
v Verify the remote port (switch port) properties on the FCF.
Run lsfabric, and verify that the host is seen as a remote port in the output. If the
host is not seen, complete the following steps in order:
v Verify that the SAN Volume Controller and the host get a Fibre Channel ID
  (FCID) on the FCF. If you are unable to verify this, check the VLAN
  configuration.
v Verify that the SAN Volume Controller and the host port are part of a zone and
  that the zone is currently in force.
v Verify that the volumes are mapped to the host and are online. For more
  information, see the lshostvdiskmap and lsvdisk command descriptions in the
  SAN Volume Controller Information Center.
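The login check above can be sketched as a small script. This is a minimal illustration only: the lsfabric column layout and the WWPN values below are assumed for the example, and the sample output is embedded so the sketch is self-contained. On a live system you would pipe the real lsfabric output instead.

```shell
# Hypothetical check: is the host's WWPN visible as a remote port?
# Column layout and values below are illustrative, not real lsfabric output.
sample_output=$(cat <<'EOF'
remote_wwpn      remote_nportid state  name    cluster_name type
2100000E1E09B068 010400         active host_1               host
5005076801101234 010500         active node_1  cluster_A    node
EOF
)
host_wwpn=2100000E1E09B068   # hypothetical host port WWPN
if printf '%s\n' "$sample_output" | grep -q "^$host_wwpn"; then
  echo "host port $host_wwpn is logged in to the fabric"
else
  echo "host port $host_wwpn not seen; check zoning and VLAN/FCID"
fi
```

If the host port does not appear, work through the zoning and FCID checks listed above before examining the volume mappings.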
What to do next
If the problem is not resolved, verify the state of the host adapter.
v Unload and load the device driver
v Use the operating system utilities (for example, Windows Device Manager) to
verify the device driver is installed, loaded, and operating correctly.
The following categories represent the types of service actions for storage systems:
v Controller code upgrade
v Field replaceable unit (FRU) replacement
Ensure that you are familiar with the following guidelines for upgrading controller
code:
v Check to see if the SAN Volume Controller supports concurrent maintenance for
your storage system.
v Allow the storage system to coordinate the entire upgrade process.
v If it is not possible to allow the storage system to coordinate the entire upgrade
process, perform the following steps:
1. Reduce the storage system workload by 50%.
2. Use the configuration tools for the storage system to manually failover all
logical units (LUs) from the controller that you want to upgrade.
3. Upgrade the controller code.
4. Restart the controller.
5. Manually failback the LUs to their original controller.
6. Repeat for all controllers.
FRU replacement
Ensure that you are familiar with the following guidelines for replacing FRUs:
v If the component that you want to replace is directly in the host-side data path
(for example, cable, Fibre Channel port, or controller), disable the external data
paths to prepare for upgrade. To disable external data paths, disconnect or
disable the appropriate ports on the fabric switch. The SAN Volume Controller
ERPs reroute access over the alternate path.
v If the component that you want to replace is in the internal data path (for
example, cache, or drive) and did not completely fail, ensure that the data is
backed up before you attempt to replace the component.
v If the component that you want to replace is not in the data path, for example,
uninterruptible power supply units, fans, or batteries, the component is
generally dual-redundant and can be replaced without additional steps.
Attention: Perform service actions only when directed by the fix procedures. If
used inappropriately, service actions can cause loss of access to data or even data
loss. Before attempting to recover a storage system, investigate the cause of the
failure and attempt to resolve those issues by using other fix procedures. Read and
understand all of the instructions before performing any action.
Attention: Do not attempt the recovery procedure unless the following conditions
are met:
v All hardware errors are fixed.
v All nodes have candidate status.
The system recovery procedure is one of several tasks that must be performed. The
following list is an overview of the tasks and the order in which they must be
performed:
1. Preparing for system recovery
a. Review the information regarding when to run the recover system
procedure
b. Fix your hardware errors
c. Remove the system information for node canisters with error code 550 or
error code 578 by using the service assistant.
2. Performing the system recovery. After you prepared the system for recovery
and met all the pre-conditions, run the system recovery.
Note: Run the procedure on one system in a fabric at a time. Do not perform
the procedure on different nodes in the same system. This restriction also
applies to remote systems.
3. Performing actions to get your environment operational
v Recovering from offline VDisks (volumes) by using the CLI
v Checking your system, for example, to ensure that hosts can access all mapped
volumes.
You can run the recovery procedure by using the front panel or the service
assistant.
Attention: If you experience failures at any time while running the recover
system procedure, call the IBM Support Center. Do not attempt to do further
recovery actions, because these actions might prevent IBM Support from restoring
the system to an operational status.
Certain conditions must be met before you run the recovery procedure. Use the
following items to help you determine when to run the recovery procedure:
1. Check that no node in the cluster is active, and that the management IP is not
   accessible from any other node. If any node is still active or the management IP
   is accessible, there is no need to recover the cluster.
2. Resolve all hardware errors in the nodes so that only node errors 550 or 578 are
   present. If this is not the case, go to “Fix hardware errors.”
3. Ensure that all back-end storage that is administered by the cluster is present
   before you run the recovery.
4. If any nodes have been replaced, ensure that the WWNN of the replacement
   node matches that of the replaced node, and that no prior cluster data remains
   on this node (see Procedure: Removing system data from a node canister).
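The node-error precondition in step 2 can be checked mechanically. The sketch below is illustrative only: the tabular node-status listing is an assumed, embedded sample (on a live system you would substitute the output of the service status query for your code level), and it simply verifies that every node reports error 550 or 578 before recovery is attempted.

```shell
# Hypothetical pre-check: every node must report node error 550 or 578
# before the recovery procedure is run. The listing below is an assumed
# example, not real command output.
node_errors=$(cat <<'EOF'
panel_id error
1234567  578
1234568  550
1234569  578
EOF
)
# Count nodes whose error is neither 550 nor 578 (header line skipped).
bad=$(printf '%s\n' "$node_errors" | tail -n +2 | awk '$2 != 550 && $2 != 578' | wc -l)
if [ "$bad" -eq 0 ]; then
  echo "all nodes report 550/578; recovery preconditions met"
else
  echo "$bad node(s) report other errors; fix these before recovery"
fi
```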
Obtain a basic understanding about the hardware failure. In most situations when
there is no clustered system, a power issue is the cause.
v The node has been powered off or the power cords were unplugged.
v A 2145 UPS-1U might have failed and shut down one or more nodes because of
  the failure. In general, this does not happen because of the redundancy
  provided by the second 2145 UPS-1U.
v Check the node status of every node that is a member of the system. Resolve all
  errors.
  – All nodes must be reporting either a node error 578, or no cluster name is
    shown on the Cluster: display. These error codes indicate that the system has
    lost its configuration data. If any nodes report anything other than these error
    codes, do not perform a recovery. You can encounter situations where
    non-configuration nodes report other node errors, such as a node error 550.
    The 550 error can also indicate that a node is not able to join a system.
    Note: If any of the buttons on the front panel have been pressed after these
    two error codes are reported, the report for the node returns to the 578 node
    error. The change in the report happens after approximately 60 seconds. Also,
    if the node was rebooted or if hardware service actions were taken, the node
    might show no cluster name on the Cluster: display.
  – If any nodes show Node Error: 550, record the data from the second line of
    the display. If the last character on the second line of the display is >, use the
    right button to scroll the display to the right.
    - In addition to the Node Error: 550, the second line of the display can show
      a list of node front panel IDs (seven digits) that are separated by spaces.
    Note: If, after resolving all these scenarios, half or more of the nodes are
    reporting Node Error: 578, it is appropriate to run the recovery procedure.
  – For any nodes that are reporting a node error 550, ensure that all the missing
    hardware that is identified by these errors is powered on and connected
    without faults.
  – If you have not been able to restart the system, and if any node other than
    the current node is reporting node error 550 or 578, you must remove system
    data from those nodes. This action acknowledges the data loss and puts the
    nodes into the required candidate state.
Procedure
1. Press and release the up or down button until the Actions menu option is
displayed.
2. Press and release the select button.
3. Press and release the up or down button until Remove Cluster? option is
displayed.
4. Press and release the select button.
5. The node displays Confirm Remove?.
6. Press and release the select button.
7. The node displays Cluster:.
Results
When all nodes show Cluster: on the top line and blank on the second line, the
nodes are in candidate status. The 550 or 578 error has been removed. You can
now run the recovery procedure.
Before performing this task, ensure that you have read the introductory
information in the overall recover system procedure.
To remove system information from a node with an error 550 or 578, follow this
procedure using the service assistant:
Procedure
1. Point your browser to the service IP address of one of the nodes, for example,
https://node_service_ip_address/service/.
If you do not know the IP address or if it has not been configured, use the
front panel menu to configure a service address on the node.
2. Log on to the service assistant.
3. Select Manage System.
4. Click Remove System Data.
5. Confirm that you want to remove the system data when prompted.
6. Remove the system data for the other nodes that display a 550 or a 578 error.
All nodes previously in this system must have a node status of Candidate and
have no errors listed against them.
7. Resolve any hardware errors until the error condition for all nodes in the
system is None.
8. Ensure that all nodes in the system display a status of candidate.
Results
When all nodes display a status of candidate and all error conditions are None,
you can run the recovery procedure.
Attention: This service action has serious implications if not performed properly.
If at any time you encounter an error that is not covered by this procedure, stop
and call IBM Support.
One or more of the volumes is offline because there was fast write data in the
cache. Further actions are required to bring the volumes online; see “Recovering
from offline VDisks using the CLI” on page 221, specifically the task concerning
recovery from offline VDisks using the command-line interface (CLI).
v T3 failed
Start the recovery procedure from any node in the system; the node must not have
participated in any other system. To receive optimal results in maintaining the I/O
group ordering, run the recovery from a node that was in I/O group 0.
Note: Each individual stage of the recovery procedure might take significant time
to complete, dependent upon the specific configuration.
Procedure
1. Click the up or down button until the Actions menu option is displayed; then
click Select.
2. Click the up or down button until the Recover Cluster? option is displayed,
and then click Select; the node displays Confirm Recover?.
3. Click Select; the node displays Retrieving.
After a short delay, the second line displays a sequence of progress messages
indicating the actions are taking place; for example, Finding qdisks. The
backup files are scanned to find the most recent configuration backup data.
After the file and quorum data retrieval is complete, the node displays T3
data: on the top line.
4. Verify the date and time on the second line of the display. The time stamp
shown is the date and time of the last quorum update and must be less than 30
minutes before the failure. The time stamp format is YYYYMMDD hh:mm,
where YYYY is the year, MM is the month, DD is the day, hh is the hour, and
mm is the minute.
Attention: If the time stamp is not less than 30 minutes before the failure, call
IBM support.
5. After verifying the time stamp is correct, press and hold the UP ARROW and
click Select.
The node displays Backup file on the top line.
6. Verify the date and time on the second line of the display. The time stamp
shown is the date and time of the last configuration backup and must be less
than 24 hours before the failure. The time stamp format is YYYYMMDD hh:mm,
where YYYY is the year, MM is the month, DD is the day, hh is the hour, and
mm is the minute.
Attention: If the time stamp is not less than 24 hours before the failure, call
IBM support.
Note: Changes made after the time of this configuration backup might not be
restored.
7. After verifying the time stamp is correct, press and hold the UP ARROW and
click Select.
Note: Any system errors logged at this time might temporarily overwrite the
display; ignore the message: Cluster Error: 3025. After a short delay, the
second line displays a sequence of progress messages indicating the actions
taking place.
When each node is added to the system, the display shows Cluster: on the top
line, and the cluster (system) name on the second line.
Attention: After the last node is added to the system, there is a short delay to
allow the system to stabilize. Do not attempt to use the system. The recovery is
still in progress. Once recovery is complete, the node displays T3 Succeeded on
the top line.
8. Click Select to return the node to normal display.
Results
Note: The web browser must not block pop-up windows, otherwise progress
windows cannot open.
Run the recovery from any node in the system; the node must not have
participated in any other system.
Note: Each individual stage of the recovery procedure might take significant time
to complete, dependent upon the specific configuration.
Before performing this procedure, read the recover system procedure introductory
information; see “Recover system procedure” on page 215.
Procedure
1. Point your browser to the service IP address of one of the nodes.
If the IP address is unknown or has not been configured, assign an IP address
using the initialization tool.
2. Log on to the service assistant.
Attention: If the time stamp is not less than 24 hours before the failure, call
IBM Support.
Changes made after the time of this backup date might not be restored.
Results
The volumes are back online. Use the final checks to get your environment
operational again.
v T3 recovery completed with errors: One or more of the volumes are offline
because there was fast write data in the cache. To bring the volumes online, see
“Recovering from offline VDisks using the CLI” for details.
v T3 failed
If any errors are logged in the error log after the system recovery procedure
completes, use the fix procedures to resolve these errors, especially the errors
related to offline arrays.
If you have performed the recovery procedure and it has completed successfully
but there are offline volumes, you can perform the following steps to bring the
volumes back online. Any volumes that are offline and are not thin-provisioned
(or compressed) are offline because of write-cache data that was lost when the
system failed.
Note: If you encounter errors in the error log after running the recovery procedure
that are related to offline arrays, use the fix procedures to resolve the offline array
errors before fixing the offline volume errors.
Example
Perform the following steps to recover an offline volume after the recovery
procedure has completed:
1. Delete all IBM FlashCopy function mappings and Metro Mirror or Global
Mirror relationships that use the offline volumes.
2. Run the recovervdisk, recovervdiskbyiogrp, or recovervdiskbysystem
command. (This only brings the volume back online so that you can
attempt to deal with the data loss.)
3. Refer to “What to check after running the system recovery” for what to do
with volumes that have been corrupted by the loss of data from the
write-cache.
4. Recreate all FlashCopy mappings and Metro Mirror or Global Mirror
relationships that use the volumes.
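Step 2 above can be scripted when several volumes are offline. The following is a hedged sketch only: SVC_RUN is an assumed wrapper for reaching the cluster CLI (the plink invocation, key file name, and cluster address are placeholders), and the list of offline volume names would typically come from the lsvdisk command filtered on offline status.

```shell
# Sketch: run recovervdisk against each offline volume after a T3 recovery.
# SVC_RUN is an assumption about how you reach the cluster CLI; override it
# for your environment. Volume names are passed as arguments.
SVC_RUN=${SVC_RUN:-"plink -i ssh_private_key_file superuser@cluster_ip"}

recover_offline_volumes() {
    for vdisk in "$@"; do
        # recovervdisk brings the volume back online only; any data loss
        # from the write cache still needs to be reviewed afterward.
        $SVC_RUN recovervdisk "$vdisk"
    done
}
```

For example, recover_offline_volumes vdisk0 vdisk1 issues one recovervdisk command per named volume.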
The recovery procedure recreates the old system from the quorum data.
However, some things cannot be restored, such as cached data or system data
managing in-flight I/O. This latter loss of state affects RAID arrays managing
internal storage. The detailed map of where data is out of synchronization is
lost, meaning that all parity information must be restored and mirrored pairs
must be brought back into synchronization. Normally this results in either old or
stale data being used, so only writes that were in flight are affected. However, if
the array had lost redundancy (such as being in a syncing, degraded, or critical
RAID state) before the error that required system recovery, the situation is more
severe. In that situation you need to check the internal storage:
v Parity arrays will likely be syncing to restore parity; they do not have
redundancy while this operation is in progress.
v Because there is no redundancy in this process, bad blocks may have been
created where data is not accessible.
v Parity arrays could be marked as corrupt. This indicates that the extent of lost
data is wider than in-flight IO, and in order to bring the array online, the data
loss must be acknowledged.
v RAID-6 arrays that were degraded prior to the system recovery may require
a full restore from backup. For this reason, it is important to have at least a
capacity-matched spare available.
Note: Any data that was in the SAN Volume Controller write cache at the time
of the failure is lost.
v Run the application consistency checks.
Configuration data for the system provides information about your system and the
objects that are defined in it. The backup and restore functions of the svcconfig
command can back up and restore only your configuration data for the SAN
Volume Controller system. You must regularly back up your application data by
using the appropriate backup methods.
You can maintain your configuration data for the system by completing the
following tasks:
v Backing up the configuration data
v Restoring the configuration data
v Deleting unwanted backup configuration data files
Before you back up your configuration data, the following prerequisites must be
met:
v No independent operations that change the configuration for the system can be
running while the backup command is running.
v No object name can begin with an underscore character (_).
Note:
v The default object names for controllers, I/O groups, and managed disks
(MDisks) do not restore correctly if the ID of the object is different from what is
recorded in the current configuration data file.
v All other objects with default names are renamed during the restore process. The
new names appear in the format name_r where name is the name of the object in
your system.
Note: You can add new hardware, but you must not remove any hardware
because the removal can cause the restore process to fail.
v No zoning changes were made on the Fibre Channel fabric which would prevent
communication between the SAN Volume Controller and any storage controllers
which are present in the configuration.
Note: It is not currently possible to determine which canister within the identified
enclosure was previously used for cluster creation. Typically the restoration should
be performed via canister 1.
The SAN Volume Controller analyzes the backup configuration data file and the
system to verify that the required disk controller system nodes are available.
Before you begin, hardware recovery must be complete. The following hardware
must be operational: hosts, SAN Volume Controller, drives, the Ethernet network,
and the SAN fabric.
Before you back up your configuration data, the following prerequisites must be
met:
v No independent operations that change the configuration can be running while
the backup command is running.
v No object name can begin with an underscore character (_).
You must regularly back up your configuration data and your application data to
avoid data loss. It is recommended that you do this after any significant
configuration change. Note that the system automatically creates a backup of the
configuration data each day at 1 AM. This is known as a cron backup and is
written to /dumps/svc.config.cron.xml_<serial#> on the configuration node. A
manual backup can be generated at any time using the instructions in this task.
If a severe failure occurs, both the configuration of the system and the application
data might be lost. The backup of the configuration data can be used to restore
the system configuration to the exact state it was in before the failure. In some
cases it might be possible to recover the application data automatically. This can
be attempted through the recover system procedure, also known as a Tier 3 (T3)
procedure. Restoring the system configuration without attempting to recover the
application data is performed through the restoring the system configuration
procedure, also known as a Tier 4 (T4) recovery. Both of these procedures require
a recent backup of the configuration data.
This task requires the use of the system command-line interface (CLI). Refer to
the command-line interface reference information if you are unsure how to access
the system or how to copy files to and from the system using the CLI.
Procedure
1. Back up all of the application data that you stored on your volumes using
your preferred backup method.
2. Open a command prompt.
3. Using the command-line interface, issue the following command to log on to
the system:
plink -i ssh_private_key_file superuser@cluster_ip
where ssh_private_key_file is the name of the SSH private key file for the
superuser and cluster_ip is the IP address or DNS name of the clustered
system for which you want to back up the configuration.
4. Issue the following CLI command to remove all of the existing configuration
backup and restore files that are on your configuration node in the /tmp
directory.
svcconfig clear -all
5. Issue the following CLI command to back up your configuration:
svcconfig backup
The svcconfig backup CLI command creates three files that provide
information about the backup process and the configuration. These files are
created in the /tmp directory of the configuration node.
The following table describes the three files that are created by the backup
process:
6. Check that the backup completed without errors. If the process fails, resolve
the errors, and run the command again.
7. Issue the following command to exit the system:
exit
8. Issue the following command to copy the backup files to a location that is not
in your system:
pscp -i ssh_private_key_file superuser@cluster_ip:/tmp/svc.config.backup.*
/offclusterstorage/
where cluster_ip is the IP address or DNS name of the system and
offclusterstorage is the location where you want to store the backup files.
If the configuration node changes, you must copy these files to a location
outside of your system because the /tmp directory on this node becomes
inaccessible. The configuration node might change in response to an error
recovery action or to a user maintenance activity.
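The command sequence in steps 3 through 8 can be collected into a small script. This is a sketch only: KEY and HOST are placeholders for your SSH private key file and the cluster IP or DNS name, SVC_RUN and SVC_COPY are assumed wrappers around the plink and pscp commands shown above, and the destination directory is an example.

```shell
# Sketch of the backup sequence (steps 3-8). KEY and HOST are placeholders;
# SVC_RUN/SVC_COPY are assumptions that can be overridden for your setup.
KEY=ssh_private_key_file
HOST=superuser@cluster_ip
SVC_RUN=${SVC_RUN:-"plink -i $KEY $HOST"}
SVC_COPY=${SVC_COPY:-"pscp -i $KEY"}

backup_config() {
    dest=$1
    # Remove stale backup and restore files from /tmp on the configuration node.
    $SVC_RUN svcconfig clear -all || return 1
    # Create a fresh backup (three files in /tmp on the configuration node).
    $SVC_RUN svcconfig backup || return 1
    # Copy the backup files to storage outside the system.
    $SVC_COPY "$HOST:/tmp/svc.config.backup.*" "$dest/"
}
```

For example, backup_config /offclusterstorage clears old files, creates the backup, and copies it off-cluster in one pass.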
What to do next
You can rename the backup files to include the configuration node name either at
the start or end of the file names so that you can easily identify these files when
you are ready to restore your configuration.
Issue the following command to rename the backup files that are stored on a Linux
or IBM AIX host:
mv /offclusterstorage/svc.config.backup.xml
/offclusterstorage/svc.config.backup.xml_myconfignode
where offclusterstorage is the name of the directory where the backup files are
stored and myconfignode is the name of your configuration node.
To rename the backup files that are stored on a Windows host, right-click the name
of the file and select Rename.
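On a Linux or AIX host, the rename above can be applied to every copied backup file in one pass. This is a sketch only; the function name is illustrative, and the directory and configuration node name are arguments you supply.

```shell
# Sketch: tag every copied backup file with the configuration node name so
# that backups taken from different configuration nodes can be told apart.
tag_backup_files() {
    dir=$1
    confignode=$2
    for f in "$dir"/svc.config.backup.*; do
        [ -e "$f" ] || continue       # glob matched nothing; skip
        mv "$f" "${f}_${confignode}"
    done
}
```

For example, tag_backup_files /offclusterstorage myconfignode renames svc.config.backup.xml to svc.config.backup.xml_myconfignode.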
You must regularly back up your configuration data and your application data to
avoid data loss. If a system is lost after a severe failure occurs, both configuration
for the system and application data is lost. You must reinstate the system to the
exact state it was in before the failure, and then recover the application data.
Important:
1. There are two phases during the restore process: prepare and execute. You must
not change the fabric or system between these two phases.
2. For a SAN Volume Controller with internal solid-state drives (SSDs), all nodes
must be added into the system before restoring your data. See step 10 on page
229.
If you do not understand the instructions to run the CLI commands, see the
command-line interface reference information.
Note: Because the RSA host key has changed, a warning message might
display when you connect to the system using SSH.
6. If the clustered system was previously configured as a replication layer,
use the chsystem command to change the layer setting.
7. Identify the configuration backup file from which you want to restore.
The file can be either a local copy of the configuration backup XML file that
you saved when backing up the configuration or an up-to-date file on one of
the nodes.
Configuration data is automatically backed up daily at 01:00 system time on
the configuration node.
Download and check the configuration backup files on all nodes that were
previously in the system to identify the one containing the most recent
complete backup.
For each node in the system:
a. From the management GUI, click Settings > Support.
b. Click Show full log listing.
c. Select the node to operate on from the selection box at the top of the table.
d. Find the file name that begins with svc.config.cron.xml.
e. Double-click the file to download the file to your computer.
f. If a recent configuration file is not present on this node, configure service
IP addresses for other nodes and connect to the service assistant to look for
configuration files on other nodes. For details on how to do this, see the
information regarding service IPv4 or service IPv6 at “Service IPv4 or
Service IPv6 options” on page 120.
The XML files contain a date and time that can be used to identify the most
recent backup. After you identify the backup XML file that is to be used when
you restore the system, rename the file to svc.config.backup.xml.
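After downloading the cron backup files from each node, the selection and rename described above can be sketched as follows. Note the assumption: the authoritative check is the date and time recorded inside each XML file, while this sketch uses file modification time as an approximation, and the function name is illustrative.

```shell
# Sketch: among downloaded cron backup files, pick the most recently
# modified one and stage it as svc.config.backup.xml for the restore.
pick_latest_backup() {
    dir=$1
    latest=$(ls -t "$dir"/svc.config.cron.xml_* 2>/dev/null | head -n 1)
    [ -n "$latest" ] || return 1      # no cron backup files were found
    cp "$latest" "$dir/svc.config.backup.xml"
}
```

Confirm the choice by opening the staged XML file and checking the embedded date and time before continuing with the restore.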
Note: Issuing this CLI command on a single node system adds the other
nodes to the system.
This CLI command creates a log file in the /tmp directory of the configuration
node. The name of the log file is svc.config.restore.execute.log.
16. Issue the following command to copy the log file to another server that is
accessible to the system:
pscp superuser@cluster_ip:/tmp/svc.config.restore.execute.log
full_path_for_where_to_copy_log_files
17. Open the log file from the server where the copy is now stored.
18. Check the log file to ensure that no errors or warnings have occurred.
What to do next
You can remove any unwanted configuration backup and restore files from the
/tmp directory on your configuration node by issuing the following CLI command:
Procedure
1. Issue the following command to log on to the system:
plink -i ssh_private_key_file superuser@cluster_ip
where ssh_private_key_file is the name of the SSH private key file for the
superuser and cluster_ip is the IP address or DNS name of the clustered system
from which you want to delete the configuration.
2. Issue the following CLI command to erase all of the files that are stored in the
/tmp directory:
svcconfig clear -all
Similarly, if you have replaced the service controller, use the node rescue procedure
to ensure that the service controller has the correct software.
Attention: If you recently replaced both the service controller and the disk drive
as part of the same repair operation, node rescue fails.
Node rescue works by booting the operating system from the service controller
and running a program that copies all the SAN Volume Controller software from
any other node that can be found on the Fibre Channel fabric.
Procedure
1. Ensure that the Fibre Channel cables are connected.
2. Ensure that at least one other node is connected to the Fibre Channel fabric.
3. Ensure that the SAN zoning allows a connection between at least one port of
this node and one port of another node. It is better if multiple ports can
connect. This is particularly important if the zoning is by worldwide port name
(WWPN) and you are using a new service controller. In this case, you might
need to use SAN monitoring tools to determine the WWPNs of the node. If you
need to change the zoning, remember to set it back when the service procedure
is complete.
4. Turn off the node.
5. Press and hold the left and right buttons on the front panel.
6. Press the power button.
7. Continue to hold the left and right buttons until the node-rescue-request
symbol is displayed on the front panel (Figure 74).
Results
The node rescue request symbol displays on the front panel display until the node
starts to boot from the service controller. If the node rescue request symbol
displays for more than two minutes, go to the hardware boot MAP to resolve the
problem. When the node rescue starts, the service display shows the progress or
failure of the node rescue operation.
Note: If the recovered node was part of a clustered system, the node is now
offline. Delete the offline node from the system and then add the node back into
the system. If node recovery was used to recover a node that failed during a
software upgrade process, it is not possible to add the node back into the system
until the upgrade or downgrade process has completed. This can take up to four
hours for an eight-node clustered system.
The volume virtualization that is provided extends the time when a medium error
is returned to a host. Because of this difference to non-virtualized systems, the
SAN Volume Controller uses the term bad blocks rather than medium errors.
The SAN Volume Controller allocates volumes from the extents that are on the
managed disks (MDisks). The MDisk can be a volume on an external storage
controller or a RAID array that is created from internal drives. In either case,
depending on the RAID level used, there is normally protection against a read
error on a single drive. However, it is still possible to get a medium error on a
read request if multiple drives have errors or if the drives are rebuilding or are
offline due to other issues.
The SAN Volume Controller provides migration facilities to move a volume from
one underlying set of physical storage to another or to replicate a volume that uses
FlashCopy or Metro Mirror or Global Mirror. In all these cases, the migrated
volume or the replicated volume returns a medium error to the host when the
logical block address on the original volume is read. The system maintains tables
of bad blocks to record where the logical block addresses that cannot be read are.
These tables are associated with the MDisks that are providing storage for the
volumes.
Important: The dumpmdiskbadblocks command outputs only the virtual medium
errors that have been created, not a list of the actual medium errors on MDisks
or drives.
It is possible that the tables that are used to record bad block locations can fill up.
The table can fill either on an MDisk or on the system as a whole. If a table does
fill up, the migration or replication that was creating the bad block fails because it
was not possible to create an exact image of the source volume.
The system creates alerts in the event log for the following situations:
v When it detects medium errors and creates a bad block
v When the bad block tables fill up
The recommended actions for these alerts guide you in correcting the situation.
Clear bad blocks by deallocating the volume disk extent, by deleting the volume,
or by issuing write I/O to the block. It is good practice to correct bad blocks as soon
as they are detected. This action prevents the bad block from being propagated
when the volume is replicated or migrated. It is possible, however, for the bad
block to be on part of the volume that is not used by the application. For example,
it can be in part of a database that has not been initialized. These bad blocks are
corrected when the application writes data to these areas. Before the correction
happens, the bad block records continue to use up the available bad block space.
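The bad-block records described above can be inspected per MDisk with the dumpmdiskbadblocks command mentioned earlier. This is a sketch only: SVC_RUN is an assumed wrapper for reaching the cluster CLI, and the MDisk ID would come from the lsmdisk command.

```shell
# Sketch: list the recorded (virtual) bad-block locations for one MDisk.
# SVC_RUN is an assumption about how you reach the cluster CLI.
SVC_RUN=${SVC_RUN:-"plink -i ssh_private_key_file superuser@cluster_ip"}

show_bad_blocks() {
    mdisk_id=$1
    # Outputs only the virtual medium errors recorded for this MDisk,
    # not the actual medium errors on the underlying drives.
    $SVC_RUN dumpmdiskbadblocks "$mdisk_id"
}
```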
SAN Volume Controller nodes must be configured in pairs so you can perform
concurrent maintenance.
When you service one node, the other node keeps the storage area network (SAN)
operational. With concurrent maintenance, you can remove, replace, and test all
field replaceable units (FRUs) on one node while the SAN and host systems are
powered on and doing productive work.
Note: Do not remove the power from both nodes unless instructed to do so.
When you need to remove power, see “MAP 5350: Powering off a SAN Volume
Controller node” on page 262.
Procedure
v To isolate the FRUs in the failing node, complete the actions and answer the
questions given in these maintenance analysis procedures (MAPs).
v When instructed to exchange two or more FRUs in sequence:
1. Exchange the first FRU in the list for a new one.
2. Verify that the problem is solved.
3. If the problem remains:
a. Reinstall the original FRU.
b. Exchange the next FRU in the list for a new one.
4. Repeat steps 2 and 3 until either the problem is solved, or all the related
FRUs have been exchanged.
5. Complete the next action indicated by the MAP.
6. If you are using one or more MAPs because of a system error code, mark the
error as fixed in the event log after the repair, but before you verify the
repair.
Note: Start all problem determination procedures and repair procedures with
“MAP 5000: Start.”
If you are not familiar with these maintenance analysis procedures (MAPs), first
read Chapter 10, “Using the maintenance analysis procedures.”
| This MAP applies to all SAN Volume Controller models. However, the front
| panel displays the status of only the first four Fibre Channel ports; use the
| service assistant to view the status of any additional ports.
You might have been sent here for one of the following reasons:
v The fix procedures sent you here
v A problem occurred during the installation of a SAN Volume Controller
v Another MAP sent you here
v A user observed a problem that was not detected by the system
SAN Volume Controller nodes are configured in pairs. While you service one node,
you can access all the storage managed by the pair from the other node. With
concurrent maintenance, you can remove, replace, and test all FRUs on one SAN
Volume Controller while the SAN and host systems are powered on and doing
productive work.
Notes:
v Do not remove the power from both nodes unless instructed to do so.
v If a recommended action in these procedures involves removing or replacing a
part, use the applicable procedure.
v If the problem persists after performing the recommended actions in this
procedure, return to step 1 of the MAP to try again to fix the problem.
Procedure
1. Were you sent here from a fix procedure?
NO Go to step 2
YES Go to step 8 on page 237
2. (from step 1)
Find the IBM System Storage Productivity Center (SSPC) that is close to and is
set up to manage the SAN Volume Controller system. The SSPC is normally
located in the same rack as the SAN Volume Controller system.
3. (from step 2)
Log in to the SSPC using the user ID and password that is provided by the
user.
4. (from step 3)
Log into the management GUI using the user ID and password that is
provided by the user and launch the management GUI for the system that
you are repairing.
5. (from step 4)
Does the management GUI start?
NO Go to step 8 on page 237.
YES Go to step 6.
6. (from step 5)
NO Go to step 10.
YES The service controller for the SAN Volume Controller has failed.
a. Check that the service controller that is indicating an error is
correctly installed. If it is, replace the service controller.
b. Go to “MAP 5700: Repair verification” on page 282.
10. (from step 9)
Is the operator-information panel error LED 1 that you see in Figure 76
on page 238 illuminated or flashing?
(Figure 76 shows the operator-information panel for models 2145-CG8, 2145-CF8,
2145-8A4, 2145-8G4, 2145-8F4, and 2145-8F2.)
NO Go to step 11.
YES Go to “MAP 5800: Light path” on page 284.
11. (from step 10 on page 237)
Is the hardware boot display that you see in Figure 77 displayed on the
node?
NO Go to step 13.
YES Go to step 12.
12. (from step 11)
Has the hardware boot display that you see in Figure 77 displayed for more
than three minutes?
NO Go to step 13.
YES Perform the following:
a. Go to “MAP 5900: Hardware boot” on page 306.
b. Go to “MAP 5700: Repair verification” on page 282.
13. (from step 11)
Is Failed displayed on the top line of the front-panel display of the node?
NO Go to step 14.
YES Perform the following:
a. Note the failure code and go to “Boot code reference” on page 153
to perform the repair actions.
b. Go to “MAP 5700: Repair verification” on page 282.
14. (from step 13)
Is Booting displayed on the top line of the front-panel display of the node?
NO Go to step 16 on page 239.
YES Go to step 15.
15. (from step 14)
A progress bar and a boot code are displayed. If the progress bar does not
advance for more than three minutes, it has stalled.
Note: The 2145 UPS-1U turns off only when its power button is
pressed, input power has been lost for more than five minutes, or the
SAN Volume Controller node has shut it down following a reported
loss of input power.
20. (from step 19 on page 239)
Is Charging or Recovering displayed in the top line of the front-panel
display of the node?
NO Go to step 21.
YES
v If Charging is displayed, the uninterruptible power supply battery is
not yet charged sufficiently to support the node. If Charging is
displayed for more than two hours, go to “MAP 5150: 2145
UPS-1U” on page 252.
v If Recovering is displayed, the uninterruptible power supply battery
is not yet charged sufficiently to be able to support the node
immediately following a power supply failure. However, if
Recovering is displayed, the node can be used normally.
v If Recovering is displayed for more than two hours, go to “MAP
5150: 2145 UPS-1U” on page 252.
21. (from step 20)
Is Validate WWNN? displayed on the front-panel display of the node?
NO Go to step 22 on page 241.
YES The node is indicating that its WWNN might need changing. It enters
this mode when the node service controller or disk has been changed
but the required service procedures have not been followed.
Note: Do not validate the WWNN until you read the following
information to ensure that you choose the correct value. If you choose
an incorrect value, you might find that the SAN zoning for the node is
also not correct and more than one node is using the same WWNN.
Therefore, it is important to establish the correct WWNN before you
continue.
a. Determine which WWNN that you want to use.
v If the service controller has been replaced, the correct value is
probably the WWNN that is stored on disk (the disk WWNN).
v If the disk has been replaced, perhaps as part of a frame
replacement procedure, but has not been re-initialized, the
correct value is probably the WWNN that is stored on the
service controller (the panel WWNN).
b. Select the stored WWNN that you want this node to use:
v To use the WWNN that is stored on the disk, perform the
following steps:
1) From the Validate WWNN? panel, press and release the
select button. The Disk WWNN: panel is displayed and
shows the last five digits of the WWNN that is stored on the
disk.
Results
If you suspect that the problem is a software problem, see “Upgrading the system”
documentation for details about how to upgrade your entire SAN Volume
Controller environment.
If the problem is still not fixed, collect diagnostic information and contact the IBM
support center.
If you are not familiar with these maintenance analysis procedures (MAPs), first
read Chapter 10, “Using the maintenance analysis procedures,” on page 235.
You might have been sent here for one of the following reasons:
v A problem occurred during the installation of a SAN Volume Controller SAN
Volume Controller 2145-CG8, 2145-CF8, 2145-8G4, 2145-8F4, or 2145-8F2 node.
v The power switch failed to turn the node on
v The power switch failed to turn the node off
v Another MAP sent you here
Procedure
1. Are you here because the node is not powered on?
NO Go to step 11 on page 248.
YES Go to step 2.
2. (from step 1)
Is the power LED on the operator-information panel continuously
illuminated? Figure 78 on page 243 shows the location of the power LED 1
on the operator-information panel.
Figure 78. Power LED on the SAN Volume Controller models 2145-CG8, 2145-CF8,
2145-8G4, and 2145-8F4 or 2145-8F2 operator-information panel
NO Go to step 3.
YES The node is powered on correctly. Reassess the symptoms and return
to “MAP 5000: Start” on page 235 or go to “MAP 5700: Repair
verification” on page 282 to verify the correct operation.
3. (from step 2 on page 242)
Is the power LED on the operator-information panel flashing approximately
four times per second?
NO Go to step 4.
YES The node is turned off and is not ready to be turned on. Wait until the
power LED flashes at a rate of approximately once per second, then
go to step 5.
If this behavior persists for more than three minutes, perform the
following procedure:
a. Remove all input power from the SAN Volume Controller node by
removing the power retention brackets and the power cords from
the back of the node. See “Removing the cable-retention brackets” for how
to remove the cable-retention brackets when removing the power cords from
the node.
b. Wait one minute and then verify that all power LEDs on the node
are extinguished.
c. Reinsert the power cords and power retention brackets.
d. Wait for the flashing rate of the power LED to slow down to one
flash per second. Go to step 5.
e. If the power LED keeps flashing at a rate of four flashes per
second for a second time, replace the parts in the following
sequence:
v System board
Verify the repair by continuing with “MAP 5700: Repair verification”
on page 282.
4. (from step 3)
Is the Power LED on the operator-information panel flashing approximately
once per second?
YES The node is in standby mode. Input power is present. Go to step 5.
NO Go to step 6 on page 244.
5. (from step 3 and step 4)
Press the power-on button on the operator-information panel of the node.
Figure 80. Power LED indicator on the rear panel of the SAN Volume Controller 2145-CG8 or
2145-CF8
Figure 81. SAN Volume Controller models 2145-8G4 and 2145-8F4 or 2145-8F2 ac and dc
LED indicators on the rear panel
Figure 82. Power LED indicator and ac and dc indicators on the rear panel of the SAN
Volume Controller 2145-CG8 or 2145-CF8
NO Verify that the input power cable or cables are securely connected at
both ends and show no sign of damage; otherwise, if the cable or
cables are faulty or damaged, replace them. If the node still fails to
power on, replace the specified parts based on the SAN Volume
Controller model type.
Attention: Be sure that you are turning off the correct 2145 UPS-1U.
If necessary, trace the cables back to the 2145 UPS-1U assembly.
Turning off the wrong 2145 UPS-1U might cause customer data loss.
Go to step 13.
YES Go to step 13.
13. (from step 12)
If necessary, turn on the 2145 UPS-1U that is connected to this node and then
press the power button to turn the node on.
Did the node turn on and boot correctly?
NO Go to “MAP 5000: Start” on page 235 to resolve the problem.
YES Go to step 14.
14. (from step 13)
The node has probably suffered a software failure. Dump data might have
been captured that will help resolve the problem. Call your support center for
assistance.
If you are not familiar with these maintenance analysis procedures (MAPs), first
read Chapter 10, “Using the maintenance analysis procedures,” on page 235.
You might have been sent here for one of the following reasons:
v A problem occurred during the installation of a 2145-8A4 node.
v The power switch failed to turn the node on.
v The power switch failed to turn the node off.
v Another MAP sent you here.
Procedure
1. Are you here because the node is not turned on?
NO Go to step 9 on page 252.
YES Go to step 2.
2. (from step 1)
Is the power LED on the operator-information panel continuously
illuminated? Figure 83 shows the location of the power LED 1 on the
operator-information panel.
Figure 83. Power LED on the SAN Volume Controller 2145-8A4 operator-information panel
NO Go to step 3.
YES The node turned on correctly. Reassess the symptoms and return to
“MAP 5000: Start” on page 235 or go to “MAP 5700: Repair
verification” on page 282 to verify the correct operation.
3. (from step 2)
Is the power LED on the operator-information panel flashing?
NO Go to step 5 on page 250.
YES The node is in standby mode. Input power is present. Go to step 4.
4. (from step 3)
Attention: Be sure that you are turning off the correct 2145 UPS-1U.
If necessary, trace the cables back to the 2145 UPS-1U assembly.
Turning off the wrong 2145 UPS-1U might cause customer data loss.
Go to step 11.
YES Go to step 11.
11. (from step 8 on page 251)
If necessary, turn on the 2145 UPS-1U that is connected to this node and then
press the power button to turn on the node.
Did the node turn on and boot correctly?
NO Go to “MAP 5000: Start” on page 235 to resolve the problem.
YES Go to step 12.
12. (from step 11)
The node has probably suffered a software failure. Dump data might have
been captured that will help resolve the problem. Contact your IBM service
representative for assistance.
If you are not familiar with these maintenance analysis procedures (MAPs), first
read Chapter 10, “Using the maintenance analysis procedures,” on page 235.
You may have been sent here for one of the following reasons:
v The system problem determination procedures sent you here
v A problem occurred during the installation of a SAN Volume Controller
v Another MAP sent you here
v A customer observed a problem that was not detected by the system problem
determination procedures
Figure 85 shows an illustration of the front of the panel for the 2145 UPS-1U.
Table 55 identifies which status and error LEDs that display on the 2145 UPS-1U
front-panel assembly relate to the specified error conditions. It also lists the
uninterruptible power supply alert-buzzer behavior.
Table 55. 2145 UPS-1U error indicators
[1] Load2 | [2] Load1 | [3] Alarm | [4] Battery | [5] Overload | [6] Power-on | Buzzer | Error condition
Green (see Note 1) | | | | | Green (see Note 3) | | No errors; the 2145 UPS-1U was configured by the SAN Volume Controller
Procedure
1. Is the power-on indicator for the 2145 UPS-1U that is connected to the failing
SAN Volume Controller off?
NO Go to step 3.
YES Go to step 2.
2. (from step 1)
Are other 2145 UPS-1U units showing the power-on indicator as off?
NO The 2145 UPS-1U might be in standby mode. This can be because the
on or off button on this 2145 UPS-1U was pressed, input power has
been missing for more than five minutes, or the SAN Volume
Controller shut it down following a reported loss of input power. Press
and hold the on or off button until the 2145 UPS-1U power-on indicator
is illuminated (approximately five seconds). On some versions of the
2145 UPS-1U, you need a pointed device, such as a screwdriver, to
press the on or off button.
Go to step 3.
YES Either main power is missing from the installation or a redundant
ac-power switch has failed. If the 2145 UPS-1U units are connected to a
redundant ac-power switch, go to “MAP 5320: Redundant ac power”
on page 259. Otherwise, complete these steps:
a. Restore main power to the installation.
b. Verify the repair by continuing with “MAP 5250: 2145 UPS-1U
repair verification” on page 258.
3. (from step 1 and step 2)
Are the power-on and load segment 2 indicators for the 2145 UPS-1U
illuminated solid green, with service, on-battery, and overload indicators off?
NO Go to step 4.
YES The 2145 UPS-1U is no longer showing a fault. Verify the repair by
continuing with “MAP 5250: 2145 UPS-1U repair verification” on page
258.
4. (from step 3)
Is the 2145 UPS-1U on-battery indicator illuminated yellow (solid or
flashing), with service and overload indicators off?
NO Go to step 5 on page 256.
YES The input power supply to this 2145 UPS-1U is not working or is not
If you are not familiar with these maintenance analysis procedures (MAPs), first
read Chapter 10, “Using the maintenance analysis procedures,” on page 235.
You might have been sent here because you have performed a repair and want to
confirm that no other problems exist on the machine.
Procedure
1. Are the power-on and load segment 2 indicators for the repaired 2145
UPS-1U illuminated solid green, with service, on-battery, and overload
indicators off?
NO Continue with “MAP 5000: Start” on page 235.
YES Go to step 2.
2. (from step 1)
Is the SAN Volume Controller node that is powered by this 2145 UPS-1U
powered on?
NO Press power-on on the SAN Volume Controller node that is connected
to this 2145 UPS-1U and is powered off. Go to step 3.
YES Go to step 3.
3. (from step 2)
Is the node that is connected to this 2145 UPS-1U still not powered on or
showing error codes in the front panel display?
NO Go to step 4.
YES Continue with “MAP 5000: Start” on page 235.
4. (from step 3)
Does the SAN Volume Controller node that is connected to this 2145 UPS-1U
show “Charging” on the front panel display?
NO Go to step 5.
YES Wait for the “Charging” display to finish (this might take up to two
hours). Go to step 5.
5. (from step 4)
Press and hold the test/alarm reset button on the repaired 2145 UPS-1U for
three seconds to initiate a self-test. During the test, individual indicators
illuminate as various parts of the 2145 UPS-1U are checked.
Does the 2145 UPS-1U service, on-battery, or overload indicator stay on?
NO 2145 UPS-1U repair verification has completed successfully. Continue
with “MAP 5700: Repair verification” on page 282.
If you are not familiar with these maintenance analysis procedures (MAPs), first
read Chapter 10, “Using the maintenance analysis procedures,” on page 235.
You might have been sent here for one of the following reasons:
v A problem occurred during the installation of a SAN Volume Controller.
v “MAP 5150: 2145 UPS-1U” on page 252 sent you here.
Perform the following steps to solve problems that have occurred in the redundant
ac-power switches:
Procedure
1. One or two 2145 UPS-1Us might be connected to the redundant ac-power
switch. Is the power-on indicator on any of the connected 2145 UPS-1Us on?
NO Go to step 3.
YES The redundant ac-power switch is powered. Go to step 2.
2. (from step 1)
Measure the voltage at the redundant ac-power switch output socket connected
to the 2145 UPS-1U that is not showing power-on.
CAUTION:
Ensure that you do not remove the power cable of any powered
uninterruptible power supply units.
Is there power at the output socket?
NO One redundant ac-power switch output is working while the other is
not. Replace the redundant ac-power switch.
CAUTION:
You might need to power-off an operational node to replace the
redundant ac-power switch assembly. If this is the case, consult with
the customer to determine a suitable time to perform the
replacement. See “MAP 5350: Powering off a SAN Volume Controller
node” on page 262. After you replace the redundant ac-power switch,
continue with “MAP 5340: Redundant ac power verification” on page
260.
YES The redundant ac-power switch is working. There is a problem with
the 2145 UPS-1U power cord or the 2145 UPS-1U. Return to the
procedure that called this MAP and continue from where you were
within that procedure. It will help you analyze the problem with the
2145 UPS-1U power cord or the 2145 UPS-1U.
3. (from step 1)
If you are not familiar with these maintenance analysis procedures (MAPs), first
read Chapter 10, “Using the maintenance analysis procedures,” on page 235.
You might have been sent here because you have replaced a redundant ac-power
switch or corrected the cabling of a redundant ac-power switch. You can also use
this MAP if you think a redundant ac-power switch might not be working
correctly, because it is connected to nodes that have lost power when only one ac
power circuit lost power.
In this MAP, you will be asked to confirm that power is available at the redundant
ac-power switch output sockets 1 and 2. If the redundant ac-power switch is
connected to nodes that are not powered on, use a voltage meter to confirm that
power is available.
If the redundant ac-power switch is powering nodes that are powered on (so the
nodes are operational), take some precautions before continuing with these tests.
Although you do not have to power off the nodes to conduct the test, the nodes
will power off if the redundant ac-power switch is not functioning correctly.
For each of the powered-on nodes connected to this redundant ac-power switch,
perform the following steps:
1. Use the management GUI or the command-line interface (CLI) to confirm that
the other node in the same I/O group as this node is online.
If any of these tests fail, correct any failures before continuing with this MAP. If
you are performing the verification using powered-on nodes, understand that
power is no longer available if either of the following is true:
v The on-battery indicator on the 2145 UPS-1U that connects the redundant
ac-power switch to the node lights for more than five seconds.
v The SAN Volume Controller node display shows Power Failure.
When the instructions say “remove power,” you can switch the power off if the
site power distribution unit has outputs that are individually switched; otherwise,
remove the specified redundant ac-power switch power cable from the site power
distribution unit's outlet.
Procedure
1. Are the two site power distribution units providing power to this redundant
ac-power switch connected to different power circuits?
NO Correct the problem and then return to this MAP.
YES Go to step 2.
2. (from step 1)
Are both of the site power distribution units providing power to this redundant
ac-power switch powered?
NO Correct the problem and then return to the start of this MAP.
YES Go to step 3.
3. (from step 2)
Are the two cables that are connecting the site power distribution units to the
redundant ac-power switch connected?
NO Correct the problem and then return to the start of this MAP.
YES Go to step 4.
4. (from step 3)
Is there power at the redundant ac-power switch output socket 2?
NO Go to step 8 on page 262.
YES Go to step 5.
5. (from step 4)
Is there power at the redundant ac-power switch output socket 1?
NO Go to step 8 on page 262.
YES Go to step 6.
6. (from step 5)
Remove power from the Main power cable to the redundant ac-power switch.
Is there power at the redundant ac-power switch output socket 1?
NO Go to step 8 on page 262.
Results
Powering off a single node will not normally disrupt the operation of a clustered
system. This is because, within a SAN Volume Controller system, nodes operate in
pairs called I/O groups. An I/O group will continue to handle I/O to the disks it
manages with only a single node powered on. There will, however, be degraded
performance and reduced resilience to error.
Care must be taken when powering off a node to ensure that the system is not
impacted more than it needs to be. If the procedures outlined here are not
followed, it is possible that your application hosts will lose access to their data or,
in the worst case, that data will be lost.
You can use the following preferred methods to power off a node that is a member
of a system and not offline:
1. Use the Shut Down a Node option on the management GUI
2. Use the CLI command stopcluster -node <name>.
If a node is offline or not a member of a system, it must be powered off using the
power button.
To provide the least disruption when powering off a node, the following should all
apply:
v The other node in the I/O group should be powered on and active in the
system.
In some circumstances, the reason you are powering off the node might make
meeting these conditions impossible; for instance, if you are replacing a broken
Fibre Channel card, the volumes will not be showing an online status. You should
use your judgment to decide when it is safe to proceed when a condition has not
been met. Always check with the system administrator before proceeding with a
power off that you know will disrupt I/O access, as they might prefer to either
wait until a more suitable time or suspend the host applications.
To ensure a smooth restart, a node must save the data structures it cannot recreate
to its local, internal, disk drive. The amount of data it saves to local disk can be
high, so this operation might take several minutes. Do not attempt to interrupt the
controlled power off.
Attention: The following actions do not allow the node to save data to its local
disk. Therefore, you should not power off a node using these methods:
v Removing the power cable between the node and the uninterruptible power
supply. Normally the uninterruptible power supply provides sufficient power to
allow the write to local disk in the event of a power failure, but obviously it is
unable to provide power in this case.
v Holding down the power button on the node. When the power button is pressed
and released, the node indicates this to the software and the node can write its
data to local disk before it powers off. If the power button is held down, the
hardware interprets this as an emergency power off and shuts down
immediately without giving you the opportunity to save the data to a local disk.
The emergency power off occurs approximately four seconds after the power
button is pressed and held down.
v Pressing the reset button on the light path diagnostics panel.
Perform the following steps to use the management GUI to power off a system:
Procedure
1. Sign on to the IBM System Storage Productivity Center as an administrator and
then launch the management GUI for the system that you are servicing.
2. Find the system that you are about to shut down.
If the nodes that you want to power off are shown as Offline, then the nodes
are not participating in the system. In these circumstances, you must use the
power button on the nodes to power off the nodes.
If the nodes that you want to power off are shown as Online, powering off the
nodes can cause their dependent volumes to also go offline. Verify whether or
not the nodes have any dependent volumes.
3. Select the node and click Show Dependent Volumes.
4. Make sure that the status of each volume in the I/O group is Online. You
might need to view more than one page.
Note: If, after waiting 30 minutes, you have a degraded volume and all of the
associated nodes and MDisks are online, contact the IBM Support Center for
assistance.
Ensure that all volumes that are being used by hosts are online before you
continue.
5. If possible, check that all the hosts that access the volumes that are managed by
this I/O group are able to fail over to use paths that are provided by the other
node in the group.
Perform this check using the multipathing device driver software of the host
system. The commands to use differ, depending on the multipathing device
driver being used. If you are using the System Storage Multipath Subsystem
Device Driver (SDD), the command to query paths is datapath query device. It
can take some time for the multipathing device drivers to rediscover paths after
a node is powered on. If you are unable to check on the host that all paths to
both nodes in the I/O group are available, do not power off a node within 30
minutes of the partner node being powered on or you might lose access to the
volume.
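As a sketch of the path check described in step 5, the fragment below counts open paths in SDD-style `datapath query device` output. The sample output format is simplified and assumed for illustration; real SDD output varies by platform and driver version.

```python
# Hedged sketch: count OPEN paths in simplified, SDD-style
# "datapath query device" output. The exact text layout is an assumption.
SAMPLE_OUTPUT = """\
DEV#: 0  DEVICE NAME: vpath0
Path#  Adapter/Hard Disk  State  Mode
0      fscsi0/hdisk2      OPEN   NORMAL
1      fscsi1/hdisk6      OPEN   NORMAL
2      fscsi0/hdisk10     CLOSE  NORMAL
"""

def count_open_paths(text):
    """Return the number of path rows whose State column reads OPEN."""
    count = 0
    for line in text.splitlines():
        fields = line.split()
        # Path rows start with a numeric path index; State is the third field.
        if fields and fields[0].isdigit() and len(fields) >= 3:
            if fields[2] == "OPEN":
                count += 1
    return count

# Before powering off a node, you would want open paths through both
# nodes of the I/O group (at least two open paths in this simplified view).
assert count_open_paths(SAMPLE_OUTPUT) >= 2
```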
6. If you have decided it is okay to continue and power off the nodes, select the
system that you want to power off, and then click Shut Down System.
7. Click OK. If you have selected a node that is the last remaining node that
provides access to a volume (for example, a node that contains solid-state drives
(SSDs) with unmirrored volumes), the Shutting Down a Node-Force panel is
displayed with a list of volumes that will go offline if this node is shut down.
8. Check that no host applications are accessing the volumes that will go offline;
only continue with the shut down if the loss of access to these volumes is
acceptable. To continue with shutting down the node, click Force Shutdown.
What to do next
During the shut down, the node saves its data structures to its local disk and
destages all the write data held in cache to the SAN disks; this processing can take
several minutes.
Procedure
1. Issue the lsnode CLI command to display a list of nodes in the system and
their properties. Find the node that you are about to shut down and write
down the name of the I/O group it belongs to. Confirm that the other node in
the I/O group is online.
lsnode -delim :
id:name:UPS_serial_number:WWNN:status:IO_group_id:IO_group_name:config_node:
vdisk_id vdisk_name
0 vdisk0
1 vdisk1
If the node goes offline or is removed from the system, the dependent volumes
also go offline. Before taking a node offline or removing it from the system, you
can use the command to ensure that you do not lose access to any volumes.
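The partner-node check in step 1 can also be scripted against the colon-delimited `lsnode` output. This is a minimal sketch that parses simplified sample rows; the sample values and the field subset shown are assumptions based on the header above, not real command output.

```python
# Hedged sketch: parse colon-delimited "lsnode -delim :" output to find a
# node's I/O group and confirm another node in that group is online.
# The sample rows below are illustrative, not real command output.
SAMPLE = """\
id:name:UPS_serial_number:WWNN:status:IO_group_id:IO_group_name:config_node
1:group1node1:10L3ASH:500507680100002C:online:0:io_grp0:yes
2:group1node2:10L3ANF:5005076801000009:online:0:io_grp0:no
"""

def partner_is_online(text, node_name):
    """Return True if another node in node_name's I/O group is online."""
    lines = text.strip().splitlines()
    header = lines[0].split(":")
    rows = [dict(zip(header, line.split(":"))) for line in lines[1:]]
    group = next(r["IO_group_id"] for r in rows if r["name"] == node_name)
    return any(r["IO_group_id"] == group and r["name"] != node_name
               and r["status"] == "online" for r in rows)
```

In practice you would feed this the captured command output and refuse to power off the node when the function returns False.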
3. If you have decided that it is okay to continue and that you can power off the
node, issue the stopcluster -node <name> CLI command to power off the
node. Ensure that you use the -node parameter, because you do not want to
power off the whole system:
stopcluster -node group1node1
Are you sure that you want to continue with the shut down? yes
Note: If there are dependent volumes and you want to shut down the node
anyway, add the -force parameter to the stopcluster command. The force
parameter forces continuation of the command even though any
node-dependent volumes will be taken offline. Use the force parameter with
caution; access to data on node-dependent volumes will be lost.
During the shut down, the node saves its data structures to its local disk and
destages all the write data held in the cache to the SAN disks; this process can
take several minutes.
At the end of this process, the node powers off.
With this method, you cannot check the system status from the front panel, so you
cannot tell if the power off is liable to cause excessive disruption to the system.
Instead, use the management GUI or the CLI commands, described in the previous
topics, to power off an active node.
If you must use this method, notice in Figure 86 that each model type has a power
control button 1 on the front.
Figure 86. Power control button on the SAN Volume Controller models
When you have determined it is safe to do so, press and immediately release the
power button. The front panel display changes to display Powering Off, and a
progress bar is displayed.
The 2145-CG8 or the 2145-CF8 requires that you remove a power button cover
before you can press the power button. The 2145-8A4, the 2145-8G4, the 2145-8F4,
or the 2145-8F2 might require you to use a pointed device to press the power button.
If you press the power button for too long, the node cannot write all the data to its
local disk. An extended service procedure is required to restart the node, which
involves deleting the node from the system and adding it back into the system.
Results
The node saves its data structures to disk while powering off. The power off
process can take up to five minutes.
When a node is powered off by using the power button (or because of a power
failure), the partner node in its I/O group immediately stops using its cache for
new write data and destages any write data already in its cache to the SAN
attached disks. The time taken by this destage depends on the speed and
utilization of the disk controllers; it should complete in less than 15 minutes, but it
could be longer, and it cannot complete if there is data waiting to be written to a
disk that is offline.
If a node powers off and restarts while its partner node continues to process I/O,
it might not be able to become an active member of the I/O group immediately. It
has to wait until the partner node completes its destage of the cache. If the partner
node is powered off during this period, access to the SAN storage that is managed
by this I/O group is lost. If one of the nodes in the I/O group is unable to service
any I/O, for example, because the partner node in the I/O group is still flushing
its write cache, the volumes that are managed by that I/O group will have a status
of Degraded.
If you are not familiar with these maintenance analysis procedures (MAPs), first
read Chapter 10, “Using the maintenance analysis procedures,” on page 235.
| This MAP applies to all SAN Volume Controller models. However, the front panel
| will only display the status of the first four Fibre Channel ports; the service
| assistant GUI should be used if the node has more than four Fibre Channel
| ports. Be sure that you know which model you are using before you start this
procedure. To determine which model you are working with, look for the label that
identifies the model type on the front of the node.
Procedure
1. Is the power LED on the operator-information panel illuminated and
showing a solid green?
NO Continue with the power MAP. See “MAP 5050: Power 2145-CG8,
2145-CF8, 2145-8G4, 2145-8F4, and 2145-8F2” on page 242 or “MAP
5060: Power 2145-8A4” on page 249.
YES Go to step 2.
2. (from step 1)
Is the service controller error light 1 that you see in Figure 87 illuminated
and showing a solid amber?
NO Start the front panel tests by pressing and holding the select button for
five seconds. Go to step 3 on page 268.
Check each switch in turn. Did the service panel switches and display operate
as described in Figure 88?
NO The SAN Volume Controller front panel has failed its switch test.
v Replace the service controller.
v Verify the repair by continuing with “MAP 5700: Repair verification”
on page 282.
YES Press and hold the select button for five seconds to exit the test. Go to
step 5.
5. Is the front-panel display now showing Cluster:?
NO Continue with “MAP 5000: Start” on page 235.
If you are not familiar with these maintenance analysis procedures (MAPs), first
read Chapter 10, “Using the maintenance analysis procedures,” on page 235.
| This MAP applies to all SAN Volume Controller models. However, the front panel
| will only display the status of the first four Fibre Channel ports; the service
| assistant GUI should be used if the node has more than four Fibre Channel
| ports. Be sure that you know which model you are using before you start this
procedure. To determine which model you are working with, look for the label that
identifies the model type on the front of the node.
If you encounter problems with the 10 Gbps Ethernet feature on the SAN Volume
Controller 2145-CG8, see “MAP 5550: 10G Ethernet and Fibre Channel over
Ethernet personality enabled Adapter port” on page 273.
You might have been sent here for one of the following reasons:
v A problem occurred during the installation of a SAN Volume Controller system
and the Ethernet checks failed
v Another MAP sent you here
v The customer needs immediate access to the system by using an alternate
configuration node. See “Defining an alternate configuration node” on page 272
Figure 89. Port 2 Ethernet link LED on the SAN Volume Controller rear panel
If all Ethernet connections to the configuration node have failed, the system is
unable to report failure conditions, and the management GUI is unable to access
the system to perform administrative or service tasks. If this is the case and the
customer needs immediate access to the system, you can make the system use an
alternate configuration node.
If only one node is displaying Node Error 805 on the front panel, perform the
following steps:
Procedure
1. Press and release the power button on the node that is displaying Node Error
805.
2. When Powering off is displayed on the front panel display, press the power
button again.
3. Restarting is displayed.
Results
The system will select a new configuration node. The management GUI is able to
access the system again.
If you are not familiar with these maintenance analysis procedures (MAPs), first
read Chapter 10, “Using the maintenance analysis procedures,” on page 235.
This MAP applies to the SAN Volume Controller 2145-CG8 model with the 10G
Ethernet feature installed. Be sure that you know which model you are using
before you start this procedure. To determine which model you are working with,
look for the label that identifies the model type on the front of the node. Check
that the 10G Ethernet adapter is installed and that an optical cable is attached to
each port. Figure 18 on page 20 shows the rear panel of the 2145-CG8 with the 10G
Ethernet ports.
If you experience a problem with error code 805, go to “MAP 5500: Ethernet” on
page 269.
If you experience a problem with error code 703 or 723, go to “Fibre Channel and
10G Ethernet link failures” on page 211.
You might have been sent here for one of the following reasons:
v A problem occurred during the installation of a SAN Volume Controller system
and the Ethernet checks failed
v Another MAP sent you here
Procedure
1. Is node error 720 or 721 displayed on the front panel of the affected node or
is service error code 1072 shown in the event log?
YES Go to step 11 on page 275.
NO Go to step 2.
2. (from step 1) Perform the following actions from the front panel of the
affected node:
a. Press and release the up or down button until Ethernet is shown.
b. Press and release the left or right button until Ethernet port 3 is shown.
Was Ethernet port 3 found?
NO Go to step 11 on page 275.
YES Go to step 3.
3. (from step 2) Perform the following actions from the front panel of the
affected node:
a. Press and release the up or down button until Ethernet is shown.
b. Press and release the up or down button until Ethernet port 3 is shown.
If you are not familiar with these maintenance analysis procedures (MAPs), first
read Chapter 10, “Using the maintenance analysis procedures,” on page 235.
| This MAP applies to all SAN Volume Controller models. However, the front panel
| will only display the status of the first four Fibre Channel ports; the service
| assistant GUI should be used if the node has more than four Fibre Channel
| ports. Be sure that you know which model you are using before you start this
procedure. To determine which model you are working with, look for the label that
identifies the model type on the front of the node.
You might have been sent here for one of the following reasons:
v A problem occurred during the installation of a SAN Volume Controller system
and the Fibre Channel checks failed
v Another MAP sent you here
Perform the following steps to solve problems caused by the Fibre Channel ports:
Note: After the node joins the system, the node's Fibre Channel port
speed will be changed to match the system setting. Check the setting
before changing the node.
a. Press and hold the down button.
b. Press and release the select button.
c. Release the down button.
The Fibre Channel speed setting is shown on the display. If this
value does not match the speed of the SAN, use the down and up
buttons to set it correctly.
d. Press the select button to accept any changes and return to the
Fibre Channel status display.
e. If the status shows active, continue with “MAP 5700: Repair
verification” on page 282. Otherwise, go to step 9.
9. (from step 8 on page 278)
The noted port on the SAN Volume Controller displays a status of inactive. If
the noted port still displays a status of inactive, replace the parts that are
associated with the noted port, in the following order, until the problem is
fixed:
a. Fibre Channel cables from the SAN Volume Controller to Fibre Channel
network.
b. Faulty Fibre Channel fabric connections, particularly the SFP transceiver at
the Fibre Channel switch. Use the SAN problem determination procedure
to resolve any Fibre Channel fabric connection problem.
c. SAN Volume Controller Fibre Channel SFP transceiver.
Note: SAN Volume Controller nodes are supported with both longwave
SFP transceivers and shortwave SFP transceivers. You must replace an SFP
transceiver with the same type of SFP transceiver. If the SFP transceiver to
replace is a longwave SFP transceiver, for example, you must provide a
suitable replacement. Removing the wrong SFP transceiver could result in
loss of data access. See the “Removing and replacing the Fibre Channel SFP
transceiver on a SAN Volume Controller node” documentation to find out
how to replace an SFP transceiver.
d. Replace the Fibre Channel adapter assembly shown in the following table:
Note: SAN Volume Controller nodes are supported with both longwave
SFP transceivers and shortwave SFP transceivers. You must replace an SFP
transceiver with the same type of SFP transceiver. If the SFP transceiver to
replace is a longwave SFP transceiver, for example, you must provide a
suitable replacement. Removing the wrong SFP transceiver could result in
loss of data access. See the “Removing and replacing the Fibre Channel
SFP transceiver on a SAN Volume Controller node” documentation to find
out how to replace an SFP transceiver.
b. Replace the Fibre Channel adapter assembly shown in the following table:
If you are not familiar with these maintenance analysis procedures (MAPs), first
read Chapter 10, “Using the maintenance analysis procedures,” on page 235.
You might have been sent here because you performed a repair and want to
confirm that no other problems exist on the machine.
Procedure
1. Are the Power LEDs on all the nodes on? For more information about this
LED, see “Power LED” on page 18.
NO Go to “MAP 5000: Start” on page 235.
YES Go to step 2.
2. (from step 1)
Are all the nodes displaying Cluster: on the top line of the front panel
display with the second line blank or displaying a system name?
NO Go to “MAP 5000: Start” on page 235.
YES Go to step 3.
3. (from step 2)
Using the SAN Volume Controller application for the system you have just
repaired, check the status of all configured managed disks (MDisks).
Do all MDisks have a status of online?
NO If any MDisks have a status of offline, repair the MDisks. Use the
problem determination procedure for the disk controller to repair the
MDisk faults before returning to this MAP.
If any MDisks have a status of degraded paths or degraded ports,
repair any storage area network (SAN) and MDisk faults before
returning to this MAP.
If any MDisks show a status of excluded, include MDisks before
returning to this MAP.
Go to “MAP 5000: Start” on page 235.
YES Go to step 4.
4. (from step 3)
Using the SAN Volume Controller application for the system you have just
repaired, check the status of all configured volumes. Do all volumes have a
status of online?
NO Go to step 5.
YES Go to step 6 on page 284.
5. (from step 4)
Following a repair of the SAN Volume Controller, a number of volumes are
showing a status of offline. Volumes will be held offline if SAN Volume
Controller cannot confirm the integrity of the data. The volumes might be the
target of a copy that did not complete, or cache write data that was not written
back to disk might have been lost. Determine why the volume is offline. If the
volume was the target of a copy that did not complete, you can start the copy
again. Otherwise, write data might not have been written to the disk, so its
state cannot be verified. Your site procedures will determine how data is
restored to a known state.
To bring the volume online, you must move all the offline disks to the recovery
I/O group and then move them back to an active I/O group.
Go to “MAP 5000: Start” on page 235.
If you are not familiar with these maintenance analysis procedures (MAPs), first
read Chapter 10, “Using the maintenance analysis procedures,” on page 235.
Ensure that the node is turned on, and then perform the following steps to resolve
any hardware errors that are indicated by the Error LED and light path LEDs:
Procedure
1. Is the Error LED, shown in Figure 90, on the SAN Volume Controller
2145-CG8 operator-information panel on or flashing?
Figure 91. SAN Volume Controller 2145-CG8 or 2145-CF8 light path diagnostics panel
Figure 92. SAN Volume Controller 2145-CG8 system board LEDs diagnostics panel
If an SSD has been deliberately removed from a slot, the system error
LED and the DASD diagnostics panel LED will light. The error is
maintained even if the SSD is replaced in a different slot. If an SSD has
been removed or moved, the error is cleared by powering off the node
using MAP 5350, removing both the power cables, replacing the power
cables, and then restarting the node.
Resolve any node or system errors that relate to SSDs or the system disk
drive.
If an error is still shown, power off the node and reseat all the drives.
If the error remains, replace the following components in the order listed:
1. The system disk drive
2. The disk backplane
RAID This is not used on the SAN Volume Controller 2145-CG8.
BRD An error occurred on the system board. Perform the following actions to
resolve the problem:
1. Check the LEDs on the system board to identify the component that
caused the error. The BRD LED can be lit because of any of the
following reasons:
v Battery
v Missing PCI riser-card assembly. There must be a riser card in PCI
slot 2 even if the optional adapter is not present.
v Failed voltage regulator
2. Replace any failed or missing replacement components, such as the
battery or PCI riser-card assembly.
3. If a voltage regulator fails, replace the system board.
Ensure that the node is turned on, and then perform the following steps to resolve
any hardware errors that are indicated by the Error LED and light path LEDs:
Procedure
1. Is the Error LED, shown in Figure 93, on the SAN Volume Controller
2145-CF8 operator-information panel on or flashing?
Figure 94. SAN Volume Controller 2145-CG8 or 2145-CF8 light path diagnostics panel
Figure 95. SAN Volume Controller 2145-CF8 system board LEDs diagnostics panel
If an SSD has been deliberately removed from a slot, the system error
LED and the DASD diagnostics panel LED will light. The error is
maintained even if the SSD is replaced in a different slot. If an SSD has
been removed or moved, clear the error by powering off the node using
MAP 5350, removing both power cables, reconnecting the power cables,
and then restarting the node.
Resolve any node or system errors that relate to SSDs or the system disk
drive.
If an error is still shown, power off the node and reseat all the drives.
If the error remains, replace the following components in the order listed:
1. The system disk drive
2. The disk backplane
RAID This is not used on the SAN Volume Controller 2145-CF8.
BRD An error occurred on the system board. Perform the following actions to
resolve the problem:
1. Check the LEDs on the system board to identify the component that
caused the error. The BRD LED can be lit because of any of the
following reasons:
v Battery
v Missing PCI riser-card assembly. There must be a riser card in PCI
slot 2 even if the optional adapter is not present.
v Failed voltage regulator
2. Replace any failed or missing replacement components, such as the
battery or PCI riser-card assembly.
3. If a voltage regulator fails, replace the system board.
Ensure that the node is turned on and then perform the following steps to resolve
any hardware errors that are indicated by the Error LED and light path LEDs:
Procedure
1. Is the Error LED, shown in Figure 96, on the SAN Volume Controller
2145-8A4 operator-information panel on or flashing?
Ensure that the node is turned on and then perform the following steps to resolve
any hardware errors indicated by the Error LED and light path LEDs:
Procedure
1. Is the Error LED, shown in Figure 98 on page 299, on the SAN Volume
Controller 2145-8G4 operator-information panel illuminated or flashing?
Figure 98. SAN Volume Controller 2145-8G4 operator-information panel
The SAN Volume Controller 2145-8G4 light path diagnostics panel provides the SP,
DASD, RAID, and PCI LEDs.
Figure 99. SAN Volume Controller 2145-8G4 light path diagnostics panel
2. Are one or more LEDs on the light path diagnostics panel on or flashing?
NO Verify that the operator-information panel cable is correctly seated at
both ends. If the error LED is still illuminated but no LEDs are
illuminated on the light path diagnostics panel, replace parts in the
following sequence:
a. Operator-information panel
b. System board
Verify the repair by continuing with “MAP 5700: Repair verification”
on page 282.
YES See Table 61 on page 301 and perform the action specified for the
specific light path diagnostics LEDs. Then go to step 3 on page 302.
Some actions require that you observe the state of LEDs on the
system board. Figure 100 on page 300 shows the location of the system
board LEDs. The fan LEDs are located adjacent to each fan. To view
the LEDs, do the following:
3. Continue with “MAP 5700: Repair verification” on page 282 to verify the
correct operation.
Ensure that the node is turned on and then perform the following steps to resolve
any hardware errors indicated by the Error LED and light path LEDs:
Procedure
1. Is the Error LED, shown in Figure 101 on page 303, on the SAN Volume
Controller 2145-8F2 or the SAN Volume Controller 2145-8F4
operator-information panel illuminated or flashing?
Figure 101. SAN Volume Controller 2145-8F4 operator-information panel
The SAN Volume Controller 2145-8F2 and 2145-8F4 light path diagnostics panel
provides the SP and DASD LEDs.
Figure 102. SAN Volume Controller 2145-8F2 and SAN Volume Controller 2145-8F4 light
path diagnostics panel
2. Are one or more LEDs on the light path diagnostics panel on or flashing?
NO Verify that the operator-information panel cable is correctly seated at
both ends. If the error LED is still illuminated but no LEDs are
illuminated on the light path diagnostics panel, replace parts in the
following sequence:
a. Operator-information panel
b. Cable, signal, front panel
c. Frame assembly
Verify the repair by continuing with “MAP 5700: Repair verification”
on page 282.
YES See Table 62 on page 305 and perform the action specified for the
specific light path diagnostics LEDs, then go to step 3 on page 306.
Some actions require that you observe the state of LEDs on the
system board or on the fan backplanes. The locations of the system
board LEDs are shown in Figure 103 on page 304. The fan LEDs are
located adjacent to each fan. To view the LEDs, do the following:
Figure 103. SAN Volume Controller 2145-8F2 and SAN Volume Controller 2145-8F4 system
board LEDs
3. Continue with “MAP 5700: Repair verification” on page 282 to verify the
correct operation.
If you are not familiar with these maintenance analysis procedures (MAPs), first
read Chapter 10, “Using the maintenance analysis procedures,” on page 235.
| This MAP applies to all SAN Volume Controller models. However, the front panel
| displays the status of only the first four Fibre Channel ports; use the service
| assistant GUI if the node has more than four Fibre Channel ports. Be sure that
| you know which model you are using before you start this
procedure. To determine which model you are working with, look for the label that
identifies the model type on the front of the node.
You might have been sent here for one of the following reasons:
v The hardware boot display, shown in Figure 104, is displayed continuously.
v The boot progress is hung and an error is displayed on the front panel.
v Another MAP sent you here
Perform the following steps to allow the node to start its boot sequence:
Procedure
1. Is the Error LED on the operator-information panel illuminated or flashing?
NO Go to step 2.
YES Go to “MAP 5800: Light path” on page 284 to resolve the problem.
2. (From step 1)
If you have just installed the SAN Volume Controller node or have just
replaced a field replaceable unit (FRU) inside the node, perform the
following steps:
a. Ensure that the correct power cable assembly from the 2145 UPS-1U to the
node is installed. The correct power cable assembly has tape that binds the
cables together.
b. Identify and label all the cables that are attached to the node so that they
can be replaced in the same port. Remove the node from the rack and place
it on a flat, static-protective surface. See the “Removing the node from a
rack” information to find out how to perform the procedure.
Figure 106. Keyboard and monitor ports on the SAN Volume Controller models 2145-8G4,
2145-8A4, 2145-8F4 and 2145-8F2
Figure 107. Keyboard and monitor ports on the SAN Volume Controller 2145-CF8
Figure 108. Keyboard and monitor ports on the SAN Volume Controller 2145-CG8
Note: With the FRUs removed, the boot will hang with a different boot failure
code.
NO Go to step 6 to replace the FRUs, one at a time, until the failing FRU is
isolated.
YES Go to step 7.
6. (From step 5)
Remove all hardware except the hardware that is necessary to power up.
Continue to add in the FRUs one at a time and power on each time until the
original failure is introduced.
Does the boot operation still hang?
NO Verify the repair by continuing with “MAP 5700: Repair verification”
on page 282.
YES Go to step 7.
7. (From steps 4 and 6)
If you are not familiar with these maintenance analysis procedures (MAPs), first
read Chapter 10, “Using the maintenance analysis procedures,” on page 235.
This MAP applies to models with internal solid-state drives (SSDs). Be sure that
you know which model you are using before you start this procedure. To
determine which model you are working on, look for the label that identifies the
model type on the front of the node.
Use this MAP to determine which detailed MAP to use for replacing an offline
SSD.
Attention: If the drive use property is member and the drive must be replaced,
contact IBM support before taking any actions.
Procedure
Are you using an SSD in a RAID 0 array and using volume mirroring to provide
redundancy?
Yes Go to “MAP 6001: Replace offline SSD in a RAID 0 array.”
No Go to “MAP 6002: Replace offline SSD in RAID 1 array or RAID 10 array”
on page 314.
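If you are not sure which RAID level the array uses, one way to check (an assumption; this MAP does not name a command) is the lsarray CLI command, which lists each array together with its raid_level property:
lsarray
Find the array that contains the offline drive and note whether its raid_level is raid0, raid1, or raid10 before choosing a MAP.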
If you are not familiar with these maintenance analysis procedures (MAPs), first
read Chapter 10, “Using the maintenance analysis procedures,” on page 235.
This MAP applies to models with internal solid-state drives (SSDs). Be sure that
you know which model you are using before you start this procedure. To
determine which model you are working on, look for the label that identifies the
model type on the front of the node.
Attention:
1. Back up your SAN Volume Controller configuration before you begin these
steps.
2. If the drive use property is member and the drive must be replaced, contact IBM
support before taking any actions.
Perform the following steps only if a drive in a RAID 0 (striped) array has failed:
Note: If a listed volume has a mirrored, online, and in-sync copy, you can
recover the copied volume data from the copy. All the data on the unmirrored
volumes will be lost and will need to be restored from backup.
2. Delete the storage pool using the rmmdiskgrp -force <mdiskgrp id> CLI
command.
All MDisks and volume copies in the storage pool are also deleted. If any of
the volume copies were the last in-sync copy of a volume, all the copies that
are not in sync are also deleted, even if they are not in the storage pool.
3. Using the drive ID that you recorded in substep 1e, set the use property of the
drive to unused using the chdrive command.
chdrive -use unused <id of offline drive>
The drive is removed from the drive listing.
4. Follow the physical instructions to replace or remove a drive. See the
“Replacing a SAN Volume Controller 2145-CG8 solid-state drive (SSD)”
If you are not familiar with these maintenance analysis procedures (MAPs), first
read Chapter 10, “Using the maintenance analysis procedures,” on page 235.
This MAP applies to models with internal solid-state drives (SSDs). Be sure that
you know which model you are using before you start this procedure. To
determine which model you are working on, look for the label that identifies the
model type on the front of the node.
Procedure
1. Make sure that the drive use property is not member.
Use the lsdrive CLI command to determine the use.
2. Record the drive property values of the node ID and the slot ID for use in step
4. These values identify which physical drive to remove.
3. Record the error sequence number for use in step 11.
4. Use the drive ID that you recorded in step 2 to set the use property of the
drive first to failed and then to unused with the chdrive command.
chdrive -use failed <id of offline drive>
chdrive -use unused <id of offline drive>
The drive is removed from the drive listing.
5. Follow the physical instructions to replace or remove a drive. See the
“Replacing a SAN Volume Controller 2145-CG8 solid-state drive (SSD)”
documentation or the “Removing a SAN Volume Controller 2145-CG8
solid-state drive (SSD)” documentation to find out how to perform the
procedures.
6. A new drive object is created with the use property set to unused.
7. Change the use property for the drive to candidate.
chdrive -use candidate <id of new drive>
8. Change the use property for the drive to spare.
chdrive -use spare <id of new drive>
v If you are using spare drives, perform a member exchange. Move data from
the spare to the newly inserted device.
v If you do not have a spare, when you mark the drive object as spare, the
array starts to build on the newly inserted device.
9. If the spare is not a perfect match for the replaced drive, then the array is
considered unbalanced, and error code 1692 is recorded in the error log.
10. Follow the fix procedure to complete the procedure.
11. Mark the drive error as fixed using the error sequence number from step 3.
cherrstate -sequencenumber <error_sequence_number>
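Collected in one place, the CLI sequence for this MAP is as follows. The IDs are hypothetical values: 7 for the offline drive, 8 for the replacement drive, and 120 for the error sequence number recorded in step 3.
lsdrive
chdrive -use failed 7
chdrive -use unused 7
chdrive -use candidate 8
chdrive -use spare 8
cherrstate -sequencenumber 120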
Accessibility features
The SAN Volume Controller Information Center and its related publications are
accessibility-enabled. The accessibility features of the Information Center are
described in Viewing information in the information center in the Information
Center.
Keyboard navigation
You can use keys or key combinations to perform operations and initiate menu
actions that can also be done through mouse actions. You can navigate the SAN
Volume Controller Information Center from the keyboard by using the shortcut
keys for your browser or screen-reader software. See your browser or screen-reader
software Help for a list of shortcut keys that it supports.
See the IBM Human Ability and Accessibility Center for more information about
the commitment that IBM has to accessibility.
IBM may not offer the products, services, or features discussed in this document in
other countries. Consult your local IBM representative for information on the
products and services currently available in your area. Any reference to an IBM
product, program, or service is not intended to state or imply that only that IBM
product, program, or service may be used. Any functionally equivalent product,
program, or service that does not infringe any IBM intellectual property right may
be used instead. However, it is the user's responsibility to evaluate and verify the
operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter
described in this document. The furnishing of this document does not grant you
any license to these patents. You can send license inquiries, in writing, to:
The following paragraph does not apply to the United Kingdom or any other
country where such provisions are inconsistent with local law:
INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS
PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER
EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS
FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or
implied warranties in certain transactions, therefore, this statement may not apply
to you.
Any references in this information to non-IBM Web sites are provided for
convenience only and do not in any manner serve as an endorsement of those Web
sites. The materials at those Web sites are not part of the materials for this IBM
product and use of those Web sites is at your own risk.
Licensees of this program who wish to have information about it for the purpose
of enabling: (i) the exchange of information between independently created
programs and other programs (including this one) and (ii) the mutual use of the
information which has been exchanged, should contact:
IBM Corporation
Almaden Research
650 Harry Road
Bldg 80, D3-304, Department 277
San Jose, CA 95120-6099
U.S.A.
The licensed program described in this document and all licensed material
available for it are provided by IBM under terms of the IBM Customer Agreement,
IBM International Program License Agreement or any equivalent agreement
between us.
All statements regarding IBM's future direction or intent are subject to change or
withdrawal without notice, and represent goals and objectives only.
All IBM prices shown are IBM's suggested retail prices, are current and are subject
to change without notice. Dealer prices may vary.
This information is for planning purposes only. The information herein is subject to
change before the products described become available.
This information contains examples of data and reports used in daily business
operations. To illustrate them as completely as possible, the examples include the
names of individuals, companies, brands, and products. All of these names are
fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE:
If you are viewing this information softcopy, the photographs and color
illustrations may not appear.
Trademarks
IBM, the IBM logo, and ibm.com® are trademarks or registered trademarks of
International Business Machines Corp., registered in many jurisdictions worldwide.
Other product and service names might be trademarks of IBM or other companies.
A current list of IBM trademarks is available on the web at Copyright and
trademark information at www.ibm.com/legal/copytrade.shtml.
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered
trademarks or trademarks of Adobe Systems Incorporated in the United States,
and/or other countries.
Linux and the Linux logo are registered trademarks of Linus Torvalds in the United
States, other countries, or both.
This equipment has been tested and found to comply with the limits for a Class A
digital device, pursuant to Part 15 of the FCC Rules. These limits are designed to
provide reasonable protection against harmful interference when the equipment is
operated in a commercial environment. This equipment generates, uses, and can
radiate radio frequency energy and, if not installed and used in accordance with
the instruction manual, might cause harmful interference to radio communications.
Operation of this equipment in a residential area is likely to cause harmful
interference, in which case the user will be required to correct the interference at
his own expense.
Properly shielded and grounded cables and connectors must be used in order to
meet FCC emission limits. IBM is not responsible for any radio or television
interference caused by using other than recommended cables and connectors, or by
unauthorized changes or modifications to this equipment. Unauthorized changes
or modifications could void the user's authority to operate the equipment.
This device complies with Part 15 of the FCC Rules. Operation is subject to the
following two conditions: (1) this device may not cause harmful interference, and
(2) this device must accept any interference received, including interference that
may cause undesired operation.
Responsible Manufacturer:
"Warning: This is a Class A device. This device can cause radio interference in
residential areas; in that case, the operator may be required to take appropriate
measures and to bear the cost of doing so."
This device is authorized to carry the EC conformity mark (CE) in accordance with
the German EMVG.
The manufacturer is responsible for compliance with the EMC regulations:
General information:
The device meets the protection requirements of EN 55024 and EN 55022 Class
A.
Taiwan Class A compliance statement
Korean Communications Commission Class A Statement
This explains the Korean Communications Commission (KCC) statement.