IBM PowerHA SystemMirror for AIX
Standard Edition
Version 7.2
IBM
Note
Before using this information and the product it supports, read the information in “Notices” on page 107.
This edition applies to IBM PowerHA SystemMirror 7.2 Standard Edition for AIX and to all subsequent releases and
modifications until otherwise indicated in new editions.
© Copyright IBM Corporation 2017, 2018.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract
with IBM Corp.
Contents
About this document
  Highlighting
  Case-sensitivity in AIX
  ISO 9000
  Related information
Planning PowerHA SystemMirror
  What's new in Planning for PowerHA SystemMirror
  PowerHA SystemMirror maximum limits
  Overview of planning process
  Planning guidelines
  Eliminating single points of failure: Configuring redundant components supported by PowerHA SystemMirror
  Overview of the planning process
  Initial cluster planning
  Planning cluster nodes
  Planning for repository disk and cluster multicast IP address
  Planning for disk fencing
  Planning cluster sites
  Planning cluster security
  Application planning
  Drawing a cluster diagram
  Host name requirements
  Planning cluster network connectivity
  General network considerations for PowerHA SystemMirror
  Monitoring in PowerHA SystemMirror
  Designing the network topology
  Planning for IP address takeover via IP aliases
  Planning for other network conditions
  Avoiding network conflicts
  Adding the network topology to the cluster diagram
  Planning shared disk and tape devices
  Overview of shared disk and tape devices
  Choosing a shared disk technology
  Disk power supply considerations
  Planning for nonshared disk storage
  Planning a shared disk installation
  Adding the disk configuration to the cluster diagram
  Planning for tape drives as cluster resources
  Planning shared LVM components
  Planning for LVM components
  Planning LVM mirroring
  Planning for disk access
  Using fast disk takeover
  Using quorum and varyon to increase data availability
  Using NFS with PowerHA SystemMirror
  Planning resource groups
  Overview for resource groups
  General rules for resources and resource groups
  Types of resource groups: Concurrent and nonconcurrent
  Resource group policies for startup, fallover, and fallback
  Resource group attributes
  Moving resource groups to another node
  Planning cluster networks and resource groups
  Planning parallel or serial order for processing resource groups
  Planning resource groups in clusters that have sites
  Planning for replicated resources
  Planning for Workload Manager
  Planning for cluster events
  Planning site and node events
  Planning node_up and node_down events
  Network events
  Network interface events
  Clusterwide status events
  Resource group event handling and recovery
  Customizing cluster event processing
  Custom remote notification of events
  Customizing event duration time until warning
  User-defined events
  Event summaries and preamble
  Planning for PowerHA SystemMirror clients
  Clients running Clinfo
  Clients not running Clinfo
  Network components
  Applications and PowerHA SystemMirror
  Overview of applications and PowerHA SystemMirror
  Application automation: Minimizing manual intervention
  Application dependencies
  Application interference
  Robustness of application
  Application implementation strategies
Notices
  Privacy policy considerations
  Trademarks
Index
Highlighting
The following highlighting conventions are used in this document:
Bold
Identifies commands, subroutines, keywords, files, structures, directories, and other items whose names are
predefined by the system. Also identifies graphical objects such as buttons, labels, and icons that the user
selects.
Italics
Identifies parameters whose actual names or values are to be supplied by the user.
Monospace
Identifies examples of specific data values, examples of text similar to what you might see displayed,
examples of portions of program code similar to what you might write as a programmer, messages from
the system, or information you should actually type.
Case-sensitivity in AIX
Everything in the AIX operating system is case-sensitive, which means that it distinguishes between
uppercase and lowercase letters. For example, you can use the ls command to list files. If you type LS, the
system responds that the command is not found. Likewise, FILEA, FiLea, and filea are three distinct file
names, even if they reside in the same directory. To avoid causing undesirable actions to be performed,
always ensure that you use the correct case.
ISO 9000
ISO 9000 registered quality systems were used in the development and manufacturing of this product.
Related information
• The PowerHA SystemMirror Version 7.2 for AIX PDF documents are available in the PowerHA
  SystemMirror 7.2 PDFs topic.
• The PowerHA SystemMirror Version 7.2 for AIX release notes are available in the PowerHA
  SystemMirror 7.2 release notes topic.
In this PDF file, you might see revision bars (|) in the left margin that identify new and changed
information.
December 2018
Updated information in the “Pre-event and post-event scripts” topic on page 90.
The following table displays the PowerHA SystemMirror components and their corresponding maximum
values.
Table 1. PowerHA SystemMirror maximum limits
Component                                       Maximum limits
Nodes in a cluster                              16
Resources available in a resource group         128
Backup repository disks in a cluster            6
Resource groups in a cluster                    64
Number of networks in a cluster                 48
Cluster IP addresses in a cluster               256
Application servers in a resource group         128
Application monitors in a resource group        128
Service IP labels in a resource group           128
Interfaces per node per network                 7
Persistent IP addresses per network per node    1
Volume groups in a resource group               128
Cluster name length                             63 characters
Node name length                                64 characters
Network interface name length                   64 characters
Service IP name length                          63 characters
Persistent IP name length                       64 characters
Resource group name length                      64 characters
Application server name length                  64 characters
Application monitor name length                 64 characters
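A planning worksheet can be checked against these limits before any configuration is attempted. The following is a minimal sketch in Python; the dictionary layout, key names, and the check_plan function are illustrative assumptions and are not part of any PowerHA SystemMirror interface. Only a few of the limits from Table 1 are included.

```python
# A few limits copied from Table 1. The key names are hypothetical labels
# chosen for this sketch, not PowerHA SystemMirror identifiers.
MAX_COUNTS = {
    "nodes": 16,
    "resource_groups": 64,
    "networks": 48,
    "backup_repository_disks": 6,
}
MAX_NAME_LENGTH = {
    "cluster": 63,
    "node": 64,
    "service_ip": 63,
    "resource_group": 64,
}


def check_plan(counts, names):
    """Return a list of limit violations for a planned cluster.

    counts: mapping of a MAX_COUNTS key to the planned quantity.
    names:  iterable of (kind, name) pairs, where kind is a MAX_NAME_LENGTH key.
    """
    problems = []
    for key, value in counts.items():
        limit = MAX_COUNTS[key]
        if value > limit:
            problems.append(f"{key}: {value} exceeds the maximum of {limit}")
    for kind, name in names:
        limit = MAX_NAME_LENGTH[kind]
        if len(name) > limit:
            problems.append(
                f"{kind} name is {len(name)} characters; the maximum is {limit}"
            )
    return problems
```

An empty list means that the planned values fit within the limits that the sketch knows about.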
Your major goal throughout the planning process is to eliminate single points of failure. A single point of
failure exists when a critical cluster function is provided by a single component. If that component fails,
the cluster has no other way of providing that function, and the application or service dependent on that
component becomes unavailable.
For example, if all the data for a critical application resides on a single disk, and that disk fails, that disk
is a single point of failure for the entire cluster. Clients cannot access that application until the data on
the disk is restored. Likewise, if dynamic application data is stored on internal disks rather than on
external disks, it is not possible to recover an application by having another cluster node take over the
disks. Therefore, identifying necessary logical components required by an application, such as file systems
and directories (which could contain application data and configuration variables), is an important
prerequisite for planning a successful cluster.
Realize that, while your goal is to eliminate all single points of failure, you might have to make some
compromises. Eliminating a single point of failure usually has a cost. For example,
purchasing an additional hardware device to serve as backup for the primary device increases cost.
Compare the cost of eliminating a single point of failure with the cost of losing services if that
component fails. The purpose of PowerHA SystemMirror is to provide a cost-effective, highly
available computing platform that can meet future processing demands.
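The cost comparison described above can be written down as simple arithmetic. The following Python sketch assumes a deliberately simple model in which the expected loss is the hourly cost of an outage multiplied by the expected outage hours; the function name and all input figures are hypothetical and must come from your own estimates.

```python
def compare_redundancy_cost(redundancy_cost, outage_cost_per_hour,
                            expected_outage_hours):
    """Compare the cost of a redundant component with the expected loss.

    Returns a tuple (justified, expected_loss). The model is an assumption
    of this sketch: expected loss = hourly outage cost * expected hours.
    """
    expected_loss = outage_cost_per_hour * expected_outage_hours
    return expected_loss > redundancy_cost, expected_loss
```

With these assumptions, a backup device is worth purchasing when the expected loss over the planning period exceeds its price.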
Note: It is important that failures of cluster components be remedied as soon as possible. Depending on
your configuration, PowerHA SystemMirror might not be able to handle a second failure, due to lack of
resources.
Planning guidelines
Designing the cluster that provides the best solution for your organization requires careful and thoughtful
planning. In fact, adequate planning is the key to building a successful PowerHA SystemMirror cluster. A
well-planned cluster is easier to install, provides higher application availability, performs better, and
requires less maintenance than a poorly planned cluster.
You might need to plan for additional processes within your environment. For example, patch
management and process management processes are critical if you want your environment to handle
various types of failures.
For a critical application to be highly available, none of the associated resources should be a single point
of failure. As you design a PowerHA SystemMirror cluster, your goal is to identify and address all
potential single points of failure. Questions to ask include:
• What application services are required to be highly available? What is the priority of these services?
• What is the cost of a failure compared to the necessary hardware to eliminate the possibility of this
  failure?
• What is the maximum number of redundant hardware and software components that PowerHA
  SystemMirror can support?
• What is the required availability of these services? Do they need to be available 24 hours a day, seven
  days a week, or is eight hours a day, five days a week sufficient?
• What could happen to disrupt the availability of these services?
• What is the allotted time for replacing a failed resource? What is an acceptable degree of performance
  degradation while operating after a failure?
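The availability question in the list above determines how much time per week is left for planned maintenance. As a rough illustration, the following Python sketch computes that window from the required service hours; the function names are hypothetical.

```python
HOURS_PER_WEEK = 24 * 7


def weekly_service_hours(hours_per_day, days_per_week):
    """Hours per week during which the service must be available."""
    return hours_per_day * days_per_week


def weekly_maintenance_window(hours_per_day, days_per_week):
    """Hours per week during which the service is not required."""
    return HOURS_PER_WEEK - weekly_service_hours(hours_per_day, days_per_week)
```

A service that is required 24 hours a day, seven days a week leaves no window at all, whereas a service that is required eight hours a day, five days a week leaves 128 hours per week for maintenance.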
To plan, implement, and maintain a successful PowerHA SystemMirror cluster requires continuing
communication among many groups within your organization. Ideally, you should assemble the
following representatives (as applicable) to aid in PowerHA SystemMirror planning sessions:
• Network administrator
• System administrator
• Database administrator
• Application programmer
• Support personnel
• End users
PowerHA SystemMirror supports a variety of configurations, providing you with a great deal of
flexibility. For information about designing for the highest level of availability for your cluster, see the
IBM white paper High Availability Cluster Multiprocessing Best Practices.
Related reference:
“Eliminating single points of failure: Configuring redundant components supported by PowerHA
SystemMirror”
The PowerHA SystemMirror software provides numerous options to avoid single points of failure.
Related information:
High Availability Cluster Multiprocessing Best Practices
The following table summarizes potential single points of failure and describes how to eliminate them by
configuring redundant hardware and software cluster components.
Related reference:
“Planning guidelines” on page 2
In this step, you plan the core of the cluster: the applications to be made highly available, the types of
resources they require, the number of nodes, shared IP addresses, and a mode for sharing disks
(nonconcurrent or concurrent access). Your goal is to develop a high-level view of the system that serves
as a starting point for the cluster design. After making these initial decisions, start to draw a diagram of
the cluster. Initial cluster planning describes this step of the planning process.
In this step, you decide on names for the cluster and the nodes. Optionally, you also decide on names for
sites and decide which nodes belong to which site. Initial cluster planning describes this step of the
planning process.
In this step, you decide whether the site will use a stretched cluster or a linked cluster. A stretched cluster
contains nodes from sites at the same geographical locations. Stretched clusters must share a repository
disk. A linked cluster contains nodes from sites that are located at different geographical locations. Linked
clusters use a separate repository disk.
In this step, you plan the networks that connect the nodes in your system. You first examine issues
relating to TCP/IP and point-to-point networks in a PowerHA SystemMirror environment. Planning
cluster network connectivity describes this step of the planning process.
In this step, you plan the shared disk devices for the cluster. You decide which disk storage technologies
you will use in your cluster, and examine issues relating to those technologies in the PowerHA
SystemMirror environment. Planning shared disk and tape devices describes this step of the planning
process.
In this step, you plan the shared volume groups for the cluster. You first examine issues relating to LVM
components in a PowerHA SystemMirror environment. Planning shared LVM components describes this
step of the planning process.
In this step, you examine issues relating to PowerHA SystemMirror clients. Planning for PowerHA
SystemMirror clients describes this step of the planning process.
Related reference:
“Initial cluster planning”
This section describes the initial steps you take to plan a PowerHA SystemMirror cluster to make
applications highly available.
“Planning cluster network connectivity” on page 18
This section describes planning the network support for a PowerHA SystemMirror cluster.
“Planning resource groups” on page 58
These topics describe how to plan resource groups within a PowerHA SystemMirror cluster.
“Planning shared LVM components” on page 39
This section describes planning shared volume groups for a PowerHA SystemMirror cluster.
“Planning shared disk and tape devices” on page 33
This section discusses information to consider before configuring shared external disks in a PowerHA
SystemMirror cluster and provides information about planning and configuring tape drives as cluster
resources.
“Planning for cluster events” on page 79
These topics describe the PowerHA SystemMirror cluster events.
“Planning for PowerHA SystemMirror clients” on page 97
These topics discuss planning considerations for PowerHA SystemMirror clients. This is the last step
before proceeding to the installation of your PowerHA SystemMirror software.
“Planning cluster sites” on page 10
PowerHA SystemMirror clusters can be used within a single site or multiple sites for disaster recovery.
Related information:
Resource group behavior during cluster events
Before you start PowerHA SystemMirror planning, make sure that you understand the concepts and
terminology relevant to PowerHA SystemMirror.
For example, when you plan the size of your cluster, include enough nodes to handle the processing
requirements of your application after a node fails.
Keep in mind the following considerations when determining the number of cluster nodes:
• A PowerHA SystemMirror cluster can be made up of any combination of IBM® Power Systems™
  servers. Ensure that cluster nodes do not share components that could be a single point of failure
  (for example, a power supply). Similarly, do not place all nodes in a single rack.
• Create small clusters that consist of nodes that perform similar functions or share resources. Smaller,
  simple clusters are easier to design, implement, and maintain.
• For performance reasons, it might be desirable to use multiple nodes to support the same application. To
  provide mutual takeover services, the application must be designed in a manner that allows multiple
  instances of the application to run on the same node.
  For example, if an application requires that the dynamic data resides in a directory called /data, the
  application probably cannot support multiple instances on the same processor. For such an application
  (running in a nonconcurrent environment), try to partition the data so that multiple instances of the
  application can run, each accessing a unique database.
  Furthermore, if the application supports configuration files that enable the administrator to specify that
  the dynamic data for instance1 of the application resides in the data1 directory, instance2 resides in the
  data2 directory, and so on, then multiple instances of the application are probably supported.
• Certain configurations, including additional nodes in the cluster design, can increase the level of
  availability provided by the cluster. Certain configurations also give you more flexibility in planning
  node fallover and reintegration.
  The most reliable cluster node configuration is to have at least one standby node.
• Choose cluster nodes that have enough I/O slots to support redundant network interface cards and
  disk adapters.
  Remember that a cluster composed of multiple nodes is more expensive than a single node, but
  unless you plan for redundant hardware (such as enough I/O slots for network and disk
  adapters), the cluster will have no better availability.
• Use nodes with similar processing speed.
• Use nodes with sufficient CPU cycles and I/O bandwidth to allow the production application to
  run at peak load. Remember, nodes should have enough capacity to allow PowerHA SystemMirror to
  operate.
  To plan for this, benchmark or model your production application, and list the parameters of the
  heaviest expected loads. Then choose nodes for a PowerHA SystemMirror cluster that will not exceed
  85% busy when running your production application.
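The 85% guideline can be checked with simple arithmetic once peak loads have been benchmarked. The following Python sketch assumes that nodes have similar processing speed, so that the peak utilization of a workload taken over after a fallover can simply be added to the node's own peak. That additive model and the function names are assumptions of this sketch, not a PowerHA SystemMirror sizing method.

```python
BUSY_LIMIT_PERCENT = 85  # threshold taken from the guideline above


def peak_utilization(own_peak_percent, taken_over_peak_percent=0):
    """Estimated peak utilization of a node, in percent.

    Assumes similar node speeds, so utilizations are additive.
    """
    return own_peak_percent + taken_over_peak_percent


def within_busy_limit(own_peak_percent, taken_over_peak_percent=0):
    """True if the estimated peak stays at or below the 85% guideline."""
    total = peak_utilization(own_peak_percent, taken_over_peak_percent)
    return total <= BUSY_LIMIT_PERCENT
```

For example, a node that peaks at 45% and takes over a workload that peaks at 35% stays within the guideline, whereas two workloads that peak at 60% and 40% do not fit on one node.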
When you create a cluster, you assign a name to it. PowerHA SystemMirror associates this name with
the PowerHA SystemMirror-assigned cluster ID.
PowerHA SystemMirror uses a shared disk to store Cluster Aware AIX (CAA) cluster configuration
information. You must have at least 512 MB and no more than 460 GB of disk space allocated for the
cluster repository disk. This configuration is automatically kept highly available on the disk that is
provided. This feature requires that a dedicated shared disk be available to all nodes that are part of the
cluster. This disk cannot be used for application storage or any other purpose.
When planning the disks that you want to use as repository disks, you must plan for backup or
replacement disks, which can be used if the primary repository disk fails. The backup disk must be
the same size and type as the primary disk, but it can reside on different physical storage. Update
your administrative procedures and documentation with the backup disk information. You can also
replace a working repository disk with a new one to increase the size or to change to a different storage
subsystem. To replace a repository disk, you can use the SMIT interface.
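The size and matching rules for repository disks can be expressed as a short check. The following Python sketch is illustrative only: the dictionary keys size_mb and type are hypothetical, and the upper limit is converted on the assumption that 1 GB equals 1024 MB.

```python
MIN_REPOSITORY_MB = 512
MAX_REPOSITORY_MB = 460 * 1024  # 460 GB, assuming 1 GB = 1024 MB


def check_repository_disks(primary, backup):
    """Return a list of problems with a planned repository disk pair.

    primary and backup are dicts with the hypothetical keys
    'size_mb' (int) and 'type' (str).
    """
    problems = []
    if not MIN_REPOSITORY_MB <= primary["size_mb"] <= MAX_REPOSITORY_MB:
        problems.append("primary disk size is outside 512 MB to 460 GB")
    if backup["size_mb"] != primary["size_mb"]:
        problems.append("backup disk size differs from the primary disk")
    if backup["type"] != primary["type"]:
        problems.append("backup disk type differs from the primary disk")
    return problems
```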
Note: If the shared disk that is used as a repository disk is a mapped virtual SCSI (vSCSI) disk, you must
map the disk as a vSCSI disk to all nodes in the cluster. The mapping of the vSCSI disk must be
identical across all nodes in the cluster. For example, you cannot map the repository disk using the vSCSI
method to one node in the cluster and map the same disk using the N-Port ID Virtualization (NPIV)
method to another node in the cluster.
You can use a multicast IP address for cluster monitoring and communication. You can specify this
address when you create the cluster, or you can have one generated automatically when you
synchronize the initial cluster configuration.
Note: The default mechanism uses unicast communications and requires no extra configuration.
However, if you want to use multicast communication, ensure that your network devices are enabled
for multicast communications.
If you decide to use multicast, PowerHA SystemMirror uses multicast-based communication between
hosts in the cluster. Your network must allow multicast IP packets to flow between hosts
in the cluster. To verify whether nodes in your environment support multicast-based communication, run
the mping command before you start using PowerHA SystemMirror in your
environment.
Note: Some network switches allow multicast packets to flow for a while before stopping them.
Therefore, it is critical to run the mping test for at least 5 minutes and to make sure that the network fabric
allows multicast packet flow without any issues. Also, when switches are cascaded, the switches typically
need additional configuration to route multicast packets. To configure multicast packet flow, see the
documentation that is provided by the switch vendor.
Note: The range 224.0.0.0-224.0.0.255 is reserved for local purposes, such as administrative and
maintenance tasks, and data that they receive is never forwarded by multicast routers. Similarly, the
range 239.0.0.0 - 239.255.255.255 is reserved for administrative scoping. These special multicast groups are
regularly published in the Assigned Numbers RFC.
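When you choose a multicast address, it can help to check whether a candidate falls into one of the two ranges named in the preceding note. The following Python sketch uses the standard ipaddress module; the classify_multicast function and its return strings are illustrative assumptions.

```python
import ipaddress

# The two ranges named in the note above, written in CIDR notation.
LOCAL_RESERVED = ipaddress.ip_network("224.0.0.0/24")  # 224.0.0.0 - 224.0.0.255
ADMIN_SCOPED = ipaddress.ip_network("239.0.0.0/8")     # 239.0.0.0 - 239.255.255.255


def classify_multicast(address):
    """Classify an IP address string against the ranges named above."""
    ip = ipaddress.ip_address(address)
    if not ip.is_multicast:
        return "not multicast"
    if ip in LOCAL_RESERVED:
        return "reserved for local purposes"
    if ip in ADMIN_SCOPED:
        return "reserved for administrative scoping"
    return "outside the two ranges named above"
```

The sketch covers only the two ranges that this note mentions; it does not consult the full list of assigned multicast groups.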
PowerHA SystemMirror 7.1.2, or later, supports IP version 6 (IPv6); however, you cannot explicitly
specify the IPv6 multicast address. CAA uses an IPv6 multicast address that is derived from the IP
version 4 (IPv4) multicast address. To determine the IPv6 multicast address, a standard prefix of 0xFF05 is
combined, by using the logical OR operator, with the hexadecimal equivalent of the IPv4 address. For
example, the IPv4 multicast address is 228.8.16.129, or 0xE4081081. The transformation by the logical OR
operation with the standard prefix is 0xFF05:: | 0xE4081081. Thus, the resulting IPv6 multicast address is
FF05::E408:1081.
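The derivation described above can be reproduced with the standard Python ipaddress module. This is a minimal sketch of the arithmetic only; the function name is hypothetical, and the sketch does not query CAA.

```python
import ipaddress


def derive_ipv6_multicast(ipv4_multicast):
    """OR the FF05:: prefix with the 32-bit value of an IPv4 multicast address."""
    v4_value = int(ipaddress.IPv4Address(ipv4_multicast))
    prefix_value = int(ipaddress.IPv6Address("ff05::"))
    return ipaddress.IPv6Address(prefix_value | v4_value)


# The example from the text: 228.8.16.129 is 0xE4081081.
print(hex(int(ipaddress.IPv4Address("228.8.16.129"))))  # 0xe4081081
print(derive_ipv6_multicast("228.8.16.129"))             # ff05::e408:1081
```

Because the IPv4 value occupies only the low 32 bits and the prefix occupies only the high 16 bits, the OR operation places the IPv4 address unchanged into the last two groups of the IPv6 address.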
Related information:
Repository disk failure
Replacing a repository disk with SMIT
Troubleshooting multicast
Testing multicast in a network
To use disk fencing in PowerHA SystemMirror, your disks must meet the following requirements:
• All disks that are managed by the storage systems must be enabled for SCSI-3 Persistent Reservation
  (PR). Some storage systems do not enable SCSI-3 PR capabilities by default.
• The disks must not be in use when disk fencing is enabled (the volume groups must be offline).
• The disks must not be reserved before you start PowerHA SystemMirror. You can use the storage
  system's software to release any disk reservations.
To use the disk fencing option, you must identify the critical resource group, and the storage subsystem
must support SCSI-3 PR and the ODM reserve_policy of PR_shared. This policy is applied to all disks that
are part of the volume group and resource group.
Reservation type
The disk must have the required reservation type of Write Exclusive All Registrant (WEAR). You can
run the following clmgr commands to check whether your disk has the SCSI-3 capability (WEAR,
type 7h) that PowerHA SystemMirror disk fencing requires:
clmgr scsipr_capability query physical_volume <disk>
clmgr scsipr_capability query volume_group <vg>
where disk is the name of the disk and vg is the name of the volume group.
reserve_policy
Defines whether a reservation method is running on the disk. The reserve_policy of PR_shared is
the required policy that applies the shared host method for the disk. To view the attributes for
the disk, run the lsattr -Rl <diskname> -a reserve_policy command.
To use the disk fencing feature, you must specify a critical resource group. The critical resource group
that you specify must meet the following criteria:
• The critical resource group cannot be added as a child in any parent_child, start_after, or stop_after
  dependency relationship.
• The critical resource group must have all nodes in the cluster as participating nodes.
• The critical resource group cannot use the Online on Home Node Only startup policy. The critical
  resource group can use any other startup policy.
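The three criteria above can be checked against a planning worksheet before you configure disk fencing. The following Python sketch is illustrative: the dictionary keys (name, nodes, startup_policy, type, child) describe a hypothetical worksheet format and do not correspond to clmgr output or to any PowerHA SystemMirror data structure.

```python
HOME_NODE_ONLY = "Online on Home Node Only"
RESTRICTED_DEPENDENCIES = {"parent_child", "start_after", "stop_after"}


def check_critical_resource_group(group, cluster_nodes, dependencies):
    """Return a list of reasons why a group cannot be the critical resource group.

    group:         dict with 'name', 'nodes' (list), and 'startup_policy'.
    cluster_nodes: list of all node names in the cluster.
    dependencies:  list of dicts with 'type' and 'child' (a group name).
    """
    problems = []
    for dep in dependencies:
        if dep["type"] in RESTRICTED_DEPENDENCIES and dep["child"] == group["name"]:
            problems.append(f"is a child in a {dep['type']} dependency")
    missing = sorted(set(cluster_nodes) - set(group["nodes"]))
    if missing:
        problems.append("does not include nodes: " + ", ".join(missing))
    if group["startup_policy"] == HOME_NODE_ONLY:
        problems.append("uses the Online on Home Node Only startup policy")
    return problems
```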
When cluster services are running and disk fencing is enabled, the reserve_policy for each disk in the
volume group must use the following settings:
Table 2. Disk settings
Option                       Value
Configured Reserve Policy    PR_shared
Effective Reserve Policy     PR_shared
Reservation Status           SCSI PR reservation (Write Exclusive All Registrant)
EMC disks do not support SCSI-3 PR capabilities by default. If you attempt to configure disk fencing on
EMC storage without enabling SCSI-3 PR, an error occurs.
The following is an example of steps to enable SCSI-3 PR on some EMC storage devices:
Note: Before you complete the following steps, each disk must not be in use on any nodes in the cluster
and the volume groups must not be in a varyon state. The following commands are part of the EMC
software packages that are installed on an AIX LPAR.
1. Identify the IDs for the storage devices and disks in the EMC storage subsystem by running the
following command:
powermt display dev=hdiskpowerX
Pseudo name=hdiskpowerX
Symmetrix ID=000194900568
Logical device ID=0036
Device WWN=6000097000019490056853303030xxxx
state=alive; policy=SymmOpt; queued-IOs=0
For more information about configuring EMC for SCSI-3 PR, see the EMC Support website.
Hitachi disks do not support SCSI-3 PR capabilities by default. You must manually enable SCSI-3 PR
capabilities for each disk that is part of a volume group that is managed by PowerHA SystemMirror. To
enable the SCSI-3 PR capabilities, you must enable Host Mode Options (HMO) 2 and 72 in the Hitachi
Storage Navigator.
Related information:
Configuring a quarantine policy
Troubleshooting disk fencing
If you have multiple sites, you can use the following PowerHA SystemMirror Enterprise Edition features
for disaster recovery:
• PowerHA SystemMirror Enterprise Edition for AIX includes options for supporting replication
  technologies that are available from various storage subsystems.
• PowerHA SystemMirror Enterprise Edition for AIX for GLVM provides host-based replication over
  TCP/IP networks.
PowerHA SystemMirror 7.1.2, or later, supports different types of definitions for sites and site-specific
policies for high availability and disaster recovery (HADR). You can define multiple sites in both
PowerHA SystemMirror Standard Edition for AIX and PowerHA SystemMirror Enterprise Edition for
AIX.
PowerHA SystemMirror uses Cluster Aware AIX (CAA) for cluster communication and cluster health
management.
You can use PowerHA SystemMirror management interfaces to create the following multiple-site
solutions:
Stretched clusters
Contain nodes from sites that are located at the same geographical location. Stretched clusters
must share a repository disk across all nodes in the site. Stretched clusters do not support HADR
with Storage Replication Management. To use stretched clusters, your network environment must
support multicast-based communication.
Linked clusters
Contain nodes from sites that are located at different geographical locations. Linked clusters use
Note: Sites are supported only in PowerHA SystemMirror 7.1.2, or later, in both the Enterprise Edition
and the Standard Edition. Replication management is supported only in PowerHA SystemMirror
Enterprise Edition.
All resources defined to PowerHA SystemMirror must have unique names. The service IP labels, volume
groups, and resource group names must be both unique within the cluster and distinct from each other.
The name of a resource should relate to the application it serves, as well as to any corresponding device.
For example, a service address for a resource group that runs a WebSphere® instance could be named
websphere_service_address.
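The naming rule above means that a name must not be reused either within one resource type or across resource types. The following Python sketch shows one way to detect conflicts in a list of planned names; the function name and inputs are hypothetical. Names are compared exactly as written because AIX is case-sensitive.

```python
from collections import Counter


def find_name_conflicts(service_ip_labels, volume_groups, resource_groups):
    """Return the sorted names that appear more than once across all inputs."""
    all_names = list(service_ip_labels) + list(volume_groups) + list(resource_groups)
    counts = Counter(all_names)
    return sorted(name for name, count in counts.items() if count > 1)
```

An empty result means that every planned name is unique within the cluster and distinct across the three resource types.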
Connection authentication
You can also configure a virtual private network (VPN) for internode communications. If you use a VPN,
use persistent IP labels for VPN tunnels.
Security Configuration
PowerHA SystemMirror uses the Cluster Aware AIX (CAA) function to create a secure communication path
for heartbeats and synchronization between nodes in a cluster.
You can use the following CAA methods to create cluster security credentials for nodes in the cluster.
Self-signed
PowerHA SystemMirror generates the security credentials.
Security certificate and private key pairs
PowerHA SystemMirror uses existing security certificate and private key pairs that you provide.
Secure Shell (SSH)
PowerHA SystemMirror uses the keys already configured for SSH communication in your
environment.
PowerHA SystemMirror provides security for PowerHA SystemMirror messages sent between cluster
nodes as follows:
• Message authentication ensures the origination and integrity of a message.
• Message encryption changes the appearance of the data while it is transmitted and returns it to its
  original form when received by a node that authenticates the message.
• Messages are either encrypted or hashed depending on a security level of Low, Medium, or High. A
  Low security level hashes only a few of the messages as compared to a High security level where
  messages are encrypted.
PowerHA SystemMirror supports the following types of encryption keys for message authentication and
encryption:
• Message Digest 5 (MD5) with Data Encryption Standard (DES)
• MD5 with Triple DES
• MD5 with Advanced Encryption Standard (AES)
Select an encryption algorithm that is compatible with the security methodology used by your
organization.
When the PATH variable is initialized for the cluster executables, the default PATH in the
/etc/environment file is scanned before paths to the cluster executables are added. Any directory in the
default PATH that meets either of the following conditions is skipped and is not included in the
resulting PATH that is used for the cluster executables:
• The stat() subroutine fails to return information about the directory.
• The directory is world writable.
Note: You should remove any such directories from the default PATH.
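To see in advance which entries of a PATH value would be skipped under these two conditions, you can apply the same tests yourself. The following Python sketch is an approximation written for illustration; it is not the PowerHA SystemMirror implementation, and the filter_path function name is hypothetical.

```python
import os
import stat


def filter_path(path_value):
    """Return path_value without directories that fail the two conditions.

    An entry is dropped if os.stat() cannot return information about it
    or if its mode has the world-writable bit set.
    """
    kept = []
    for directory in path_value.split(":"):
        try:
            info = os.stat(directory)
        except OSError:
            continue  # stat() failed, so the entry is skipped
        if info.st_mode & stat.S_IWOTH:
            continue  # world writable, so the entry is skipped
        kept.append(directory)
    return ":".join(kept)
```

Comparing the output of filter_path with the original value shows which directories to remove from the default PATH.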
Application planning
Before you start planning for an application, be sure you understand the data resources for your
application and the location of these resources within the cluster in order to provide a solution that
enables them to be handled correctly if a node fails.
To prevent a failure, you must thoroughly understand how the application behaves in a single-node and
multinode environment. Do not make assumptions about the application's performance under adverse
conditions.
Use nodes with sufficient CPU cycles and I/O bandwidth to allow the production application to run
at peak load. Remember, nodes should have enough capacity to allow PowerHA SystemMirror to operate.
To plan for this, benchmark or model your production application, and list the parameters of the heaviest
expected loads. Then choose nodes for a PowerHA SystemMirror cluster that will not exceed 85% busy
when running your production application.
You can configure multiple application monitors for an application and direct PowerHA SystemMirror to
both:
v Monitor the termination of a process or more subtle problems affecting an application
v Automatically attempt to restart the application and take appropriate action (notification or fallover) if
restart attempts fail.
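The restart-then-act behavior described above can be sketched as follows. This is a simplified illustration under stated assumptions, not PowerHA SystemMirror code: the function recover_application, its parameters, and the action names "notify" and "fallover" are hypothetical.

```python
def recover_application(restart, restart_count, failure_action):
    """Try to restart an application, then fall back to a failure action.

    restart        -- callable that returns True when the restart succeeds
    restart_count  -- maximum number of restart attempts
    failure_action -- "notify" or "fallover" (hypothetical action names)
    """
    if failure_action not in ("notify", "fallover"):
        raise ValueError("failure_action must be 'notify' or 'fallover'")
    for attempt in range(1, restart_count + 1):
        if restart():
            return ("restarted", attempt)
    # Every restart attempt failed: report the configured action.
    return (failure_action, restart_count)
```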
Keep in mind the following guidelines to ensure that your applications are serviced correctly within a
PowerHA SystemMirror cluster environment:
v Lay out the application and its data so that only the data resides on shared external disks. This
arrangement not only prevents software license violations, but it also simplifies failure recovery.
v If you are planning to include multitiered applications in parent-child dependent resource groups in
your cluster, see the section Planning considerations for multitiered applications. If you are planning to
use location dependencies to keep certain applications on the same node, or on different nodes, see the
section Resource group dependencies.
v Write robust scripts to start and stop the application on the cluster nodes. The startup script especially
must be able to recover the application from an abnormal end, such as a power failure. Ensure that it
runs properly in a single-node environment before including the PowerHA SystemMirror software.
v Confirm application licensing requirements. Some vendors require a unique license for each processor
that runs an application, which means that you must protect the license for the application by
incorporating processor-specific information into the application when it is installed. As a result, even
though the PowerHA SystemMirror software processes a node failure correctly, it might be unable to
restart the application on the fallover node because of a restriction on the number of licenses for that
application available within the cluster. To avoid this problem, be sure that you have a license for each
system unit in the cluster that might potentially run an application.
v Ensure that the application runs successfully in a single-node environment. Debugging an application
in a cluster is more difficult than debugging it on a single processor.
v Verify that the application uses a proprietary locking mechanism if you need concurrent access.
Related reference:
“Planning considerations for multitiered applications” on page 16
Business configurations that use multitiered applications can use parent and child dependent resource
groups. For example, the database must be online before the application controller. In this case, if the
database goes down and is moved to a different node, the resource group containing the application
controller would have to be brought down and back up on any node in the cluster.
“Resource group dependencies” on page 63
PowerHA SystemMirror offers a wide variety of configurations where you can specify the relationships
between resource groups that you want to maintain at startup, fallover, and fallback.
CoD resources are composed of On/Off CoD resources and Enterprise Pool CoD (EPCoD) resources. Both
of these resources can dynamically deliver supplementary resources to your environment that are used
through normal DLPAR management (allocation or release of resources to an LPAR).
The additional processors and memory, while physically present, are not used until PowerHA
SystemMirror decides that the additional capacity that is required is worth the cost. You can use the
ROHA function to quickly and easily acquire extra resources to meet peak or unexpected workloads in
your environment.
PowerHA SystemMirror integrates with the DLPAR, On/Off CoD, and EPCoD functions. The collection
of these functions is called ROHA. The active node is hosted by an LPAR on a frame with sufficient
permanent resources. The standby node is hosted by an LPAR on a frame with minimal permanent
resources and relies on ROHA to dynamically add extra resources.
When you configure PowerHA SystemMirror to use resources through the ROHA function, the LPAR
nodes in the cluster do not use more resources until the resources are required by the application.
PowerHA SystemMirror does not activate any CoD resources until the free pool is exhausted. After the
free pool is exhausted, more hardware resources are activated by PowerHA SystemMirror. CoD hardware
resources are activated and allocated to the LPAR dynamically until the requirements of the application
are met. When an application no longer requires the allocated hardware resources, they are released to
the free pool, at which point PowerHA SystemMirror deactivates the hardware resources. These hardware
resources return to the pool from which they originated (On/Off CoD pool or EPCoD pool).
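The acquisition and release order described above can be sketched as a small model. This is an illustration only, not the ROHA implementation: the Frame class and the acquire and release functions are hypothetical, resources are counted in abstract units, and the On/Off CoD pool and the EPCoD pool are merged into a single CoD pool for brevity.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    free_pool: int       # units available without activating CoD
    cod_available: int   # units that can still be activated through CoD
    cod_active: int = 0  # units currently activated through CoD


def acquire(frame, needed):
    """Take units from the free pool first, then activate CoD units."""
    from_free = min(needed, frame.free_pool)
    frame.free_pool -= from_free
    from_cod = min(needed - from_free, frame.cod_available)
    frame.cod_available -= from_cod
    frame.cod_active += from_cod
    return from_free, from_cod


def release(frame, from_free, from_cod):
    """Return units to the free pool, then deactivate the CoD share."""
    frame.free_pool += from_free + from_cod  # released to the free pool
    frame.free_pool -= from_cod              # CoD units are deactivated ...
    frame.cod_active -= from_cod
    frame.cod_available += from_cod          # ... and go back to the CoD pool
```

In this model, a request for 5 units on a frame with 2 free units and 4 CoD units takes 2 units from the free pool and activates 3 CoD units. Releasing them restores the original state.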
Related reference:
“Monitoring in PowerHA SystemMirror” on page 22
The primary task of PowerHA SystemMirror is to recognize and respond to failures. PowerHA
SystemMirror uses the Cluster Aware AIX infrastructure to monitor the activity of its network interfaces,
devices, and IP labels.
Related information:
Administering PowerHA SystemMirror
Application controllers
To put the application under PowerHA SystemMirror control, you create an application controller
resource that associates a user-defined name with the names of specially written scripts to start and stop
the application.
By defining an application controller, PowerHA SystemMirror can start another instance of the
application on the takeover node when a fallover occurs. This protects your application so that it does
not become a single point of failure.
After you define the application controller, you can add it to a resource group. A resource group is a set
of resources that you define so that the PowerHA SystemMirror software can treat them as a single unit.
Related reference:
“Planning resource groups” on page 58
These topics describe how to plan resource groups within a PowerHA SystemMirror cluster.
PowerHA SystemMirror offers the following PowerHA SystemMirror Smart Assist applications to help
you integrate the application into a PowerHA SystemMirror cluster:
Smart Assist for WebSphere
Extends an existing PowerHA SystemMirror configuration to include monitoring and recovery
support for various WebSphere components.
Smart Assist for DB2®
Extends an existing PowerHA SystemMirror configuration to include monitoring and recovery
support for DB2 Universal Database™ (UDB) Enterprise Server Edition.
Smart Assist for Oracle
Provides assistance to those involved with the installation of Oracle Application Server 10g (9.0.4)
(AS10g) Cold Failover Cluster (CFC) solution on an IBM AIX operating system.
Smart Assist for FileNet® P8
Offers you enterprise-level scalability and flexibility to handle the most demanding content
challenges, the most complex business processes, and integration with existing systems in your
environment.
Smart Assist for SAP MaxDB
Sets up MaxDB and liveCache database instances for high availability.
Smart Assist for Lotus® Domino® Server
Automatically configures PowerHA SystemMirror for environments that already have Lotus
Domino configured.
Smart Assist for Tivoli® Storage Manager
Uses the three different areas of Tivoli Storage Manager, server, client, and admin center to
provide a highly available solution for your environment.
Smart Assist for SAP
Sets up SAP NetWeaver 2004s for high availability by protecting its single points of failure.
Smart Assist for Tivoli Directory Server
Automatically configures PowerHA SystemMirror where the Tivoli Directory Server is already
installed.
Smart Assist for SAP liveCache Hot Standby
Provides management interfaces that assist you in configuring a PowerHA SystemMirror policy
and deploying start methods, stop methods, and monitor methods for the workload stacks in
your environment.
Smart Assist for WebSphere MQSeries®
Enables programs to communicate with each other across a network of components that are not
similar, such as processors, subsystems, operating systems, and communication protocols.
Related information:
Smart Assist applications for PowerHA SystemMirror
Application monitoring
PowerHA SystemMirror can monitor applications that are defined to application controllers.
You can configure multiple application monitors and associate them with one or more application
controllers. You can assign each monitor a unique name in SMIT. By supporting multiple monitors per
application, PowerHA SystemMirror can support more complex configurations. For example, you can
configure one monitor for each instance of an Oracle parallel server in use. Alternatively, you can configure
a custom monitor to check the health of the database along with a process stop monitor to instantly
detect the end of the database process.
You can use the Application Availability Analysis tool to measure the exact amount of time that any of
your PowerHA SystemMirror-defined applications is available. The PowerHA SystemMirror software
collects, time stamps, and logs the following information:
v An application monitor is defined, changed, or removed
v An application starts, stops, or fails
v A node fails, is shut down, or starts up
v A resource group is taken offline or moved
v Application monitoring via multiple monitors is suspended or resumed.
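To illustrate how an availability figure can be derived from time-stamped events such as those listed above, consider the following sketch. It is not the Application Availability Analysis tool and does not read its log format. The function availability_percent and the "up" and "down" event names are hypothetical, and the sketch assumes that the events are sorted by time, all fall inside the measurement window, and that the application is down before the first event.

```python
from datetime import datetime


def availability_percent(events, window_start, window_end):
    """Return the percentage of the window during which the application was up.

    events -- list of (timestamp, state) tuples, state is "up" or "down"
    """
    up_seconds = 0.0
    up_since = None
    for timestamp, state in events:
        if state == "up" and up_since is None:
            up_since = timestamp
        elif state == "down" and up_since is not None:
            up_seconds += (timestamp - up_since).total_seconds()
            up_since = None
    if up_since is not None:
        # The application is still up at the end of the window.
        up_seconds += (window_end - up_since).total_seconds()
    total_seconds = (window_end - window_start).total_seconds()
    return 100.0 * up_seconds / total_seconds
```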
Related information:
Configuring PowerHA SystemMirror cluster topology and resources (extended)
Monitoring a PowerHA SystemMirror cluster
Environments such as SAP require applications to be cycled (stopped and then
started again) whenever a database fails. Many application services are provided by an environment like
SAP, and the individual application components often need to be controlled in a specific order.
Establishing interdependencies between resource groups is also useful when system services are required
to support application environments. Services such as cron jobs for pruning log files or for initiating
backups need to move from one node to another along with an application, but typically are not initiated
until the application is established. These services can be built into application controller start and stop
scripts, or they can be controlled through pre-event and post-event processing. However, dependent
resource groups simplify the way you configure system services to be dependent upon applications they
serve.
Note: To minimize the chance of data loss during the application stop and restart process, customize
your application controller scripts to ensure that any uncommitted data is stored to a shared disk
temporarily during the application stop process and read back to the application during the application
restart process. It is important to use a shared disk because the application might be restarted on a node
other than the one on which it was stopped.
You can also configure resource groups with location dependencies so that certain resource groups are
kept online on the same node, or on different nodes at startup, fallover, and fallback.
Related reference:
“Planning resource groups” on page 58
These topics describe how to plan resource groups within a PowerHA SystemMirror cluster.
The following illustration shows a mixed cluster that includes a rack-mounted system and standalone
systems. The diagram uses rectangular boxes to represent the slots supported by the nodes. If your
cluster uses thin nodes, darken the outline of the nodes and include two nodes to a drawer. For wide
nodes, use the entire drawer. For high nodes, use the equivalent of two wide nodes. Keep in mind that
each thin node contains an integrated Ethernet connection.
Begin drawing this diagram by identifying the cluster name and the applications that are being made
highly available. Next, darken the outline of the nodes that will make up the cluster. Include the name of
each node.
Keep in mind the following requirements when you select the host name:
v The host name cannot be an alias in the /etc/hosts file.
v The name resolution for the host name must work both ways. Therefore, only a limited set of
characters can be used.
v The IP address that belongs to the host name must be reachable on the server, even when PowerHA is
in the DOWN state.
v The host name cannot be a service address.
v The host name cannot be an address that is located on a network, which is defined as private in
PowerHA.
v The host name, the CAA node name, and the COMMUNICATION_PATH (that is, the communication
path to the node) must be the same.
v By default, the PowerHA node name, the CAA node name, and the COMMUNICATION_PATH (that
is, the communication path to the node) are set to be the same.
v The host name and the PowerHA node name can be different.
v The host name cannot be changed after the cluster configuration is completed.
Note: These requirements leave the base addresses and the persistent address as candidates for the host
name. You can use the persistent address as the host name only if you set up the persistent alias
manually before you configure the cluster topology.
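The first requirement in the list above can be checked with a short script. The following is a sketch under one assumption: a name counts as an alias if it appears on a line of the /etc/hosts file only after the first name that follows the address. The function is_alias_only is hypothetical and operates on the text of the file, so you can test it without touching a live system.

```python
def is_alias_only(hosts_text, name):
    """Return True if name appears in hosts_text only as an alias."""
    found_as_first_name = False
    found_as_alias = False
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split fields
        if len(fields) < 2:
            continue  # blank line, comment, or address without a name
        if fields[1] == name:
            found_as_first_name = True
        elif name in fields[2:]:
            found_as_alias = True
    return found_as_alias and not found_as_first_name
```

A name for which this function returns True would not be a suitable host name under that assumption.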
Prerequisites
In the Initial cluster planning topic, you began planning your cluster, identifying the number of nodes
and the key applications you want to make highly available. You started drawing the cluster diagram.
This diagram is the starting point for the planning you will do in this section.
Also, by now you should have decided whether or not you will use IP address takeover (IPAT) to
maintain specific service IP addresses.
Overview
Your primary goal is to use redundancy to design a cluster topology that eliminates network components
as potential single points of failure.
IP aliases
An IP alias is an IP label or address that is configured onto a network interface controller (NIC) in
addition to the normally configured IP label or address on the NIC. The use of IP aliases is an AIX
function that PowerHA SystemMirror supports. AIX supports multiple IP aliases on a NIC. Each IP alias
on a NIC can be on a separate subnet. AIX also allows IP aliases with different subnet masks to be
configured for an interface. PowerHA SystemMirror does not yet support this function.
IP aliases are used in PowerHA SystemMirror as service addresses for IP address takeover.
Network connections
PowerHA SystemMirror requires that each node in the cluster have at least one direct, nonrouted
network connection with every other node. The software uses these network connections to pass
heartbeat messages among the cluster nodes to determine the state of all cluster nodes, networks, and
network interfaces.
PowerHA SystemMirror requires all of the communication interfaces for a given cluster network to be
defined on the same physical network and route packets to each other. By default, PowerHA
SystemMirror uses unicast communications for heartbeat. If you choose to use multicast heartbeat
instead, you need to ensure that your network supports multicast. The communication interfaces must
also be able to receive responses from each other without interference by any network equipment.
PowerHA SystemMirror also requires that all nodes within a site must have at least one direct network
connection with every other node in the same site.
Unicast communications are always used between sites in a linked cluster. Within a site, you can select
unicast (the default) or multicast communications.
Between cluster nodes, place only intelligent switches, routers, or other network equipment that
transparently passes through multicast and other packets to all cluster nodes. This requirement includes
equipment that optimizes protocols.
If such equipment is placed in the paths between cluster nodes and clients, you might need to configure
a ping client list in the clinfo.rc file to help inform clients of IP address movements. The specific network
topology might require other solutions to ensure clients can continue to access the server after IP address
takeover.
Bridges, hubs, and other passive devices that do not modify the packet flow can be safely placed between
cluster nodes, and between nodes and clients.
Related information:
Programming client applications for the Clinfo API
In PowerHA SystemMirror, all node host names must be resolved locally with the /etc/hosts file. When
defining nodes to the cluster, you must specify an IP address or label that resolves locally to the host
name, and after you have synchronized the initial cluster configuration, the host name of the node cannot
be changed.
For TCP/IP networks, an IP label and its associated IP address must appear in the /etc/hosts file.
The name of the service IP label or address must be unique within the cluster and distinct from the
volume group and resource group names. It should relate to the application it serves, as well as to any
corresponding device, such as websphere_service_address.
When you assign a service IP label to an interface, use a naming convention that helps identify the
interface's role in the cluster. The related entries in the /etc/hosts file would be similar to the following:
100.100.50.1 net1_en0
100.100.60.1 net2_en1
You configure the network interface controller (NIC) by following the instructions in the relevant AIX
documentation. AIX assigns an interface name to the NIC when it is configured. The interface name is
made up of 2 or 3 characters that indicate the type of NIC, followed by a number that AIX assigns in
sequence for each adapter of a certain type. For example, AIX assigns an interface name such as en0 for
the first Ethernet NIC it configures, en1 for the second, and so on.
Related information:
Configuring cluster events
Cluster partitioning
Partitioning, also called node isolation, occurs when a network or network interface controller (NIC)
failure isolates cluster nodes from each other.
When a PowerHA SystemMirror node stops receiving network traffic from another node, it assumes that
the other node has failed. Depending on your PowerHA SystemMirror configuration, the node might
begin acquiring disks from the failed node and making applications and IP labels available. If the failed
node is actually still up, data corruption might occur when the disks are taken from it. If the network
becomes available again, PowerHA SystemMirror stops one of the nodes to prevent further disk
contention and duplicate IP addresses on the network.
The PowerHA SystemMirror heartbeat mechanism relies on the IP subsystem and the network infrastructure.
Therefore, if the network is congested or a node is congested, the IP subsystem can silently discard the
heartbeats. Attempts are made to adjust monitoring characteristics to take network congestion into
account and prevent cluster partitioning.
Related reference:
“Monitoring clusters” on page 30
The Cluster Aware AIX infrastructure monitors all available and supported network and storage
interfaces. The cluster managers on cluster nodes also send messages to each other through the
connections between these interfaces.
Two routers connect the networks, and they route packets between the cluster and clients, but not
between the two networks. A clinfo.rc file is installed on each node in the cluster, containing the IP
addresses of several client systems.
The PCI Hot Plug utility in PowerHA SystemMirror is not applicable to interfaces on a virtual Ethernet.
This utility only handles physical interface cards. Because virtual Ethernet uses virtual I/O adapters, you
cannot use the utility.
The following list contains additional considerations for PowerHA SystemMirror with virtual Ethernet:
v If VIOS has multiple physical interfaces defined on the same network, or if there are two or more
PowerHA SystemMirror nodes using VIOS in the same frame, PowerHA SystemMirror will not be
informed of (and therefore will not react to) single physical interface failures. This does not limit the
availability of the entire cluster because VIOS routes traffic around the failure. VIOS support is
analogous to EtherChannel in this regard. Use methods that are not based on VIOS to provide
notification of individual physical interface failures.
v If VIOS has only a single physical interface on a network, then PowerHA SystemMirror detects a
failure of that physical interface. However, the failure will isolate the node from the network.
Note: In VIOS 2.2.0.11, or later, you can use storage area network (SAN) communication between
logical partitions by establishing a virtual local area network through a virtual Ethernet adapter on
each VIOS client. You can set up SAN communication through VIOS for both NPIV and vSCSI
environments.
v In a VIOS environment, failure of the physical network adapter and network components outside the
virtualized network might not be detected reliably. To detect external network failures, you must
configure the netmon.cf file with one or more addresses outside of the virtualized network.
To troubleshoot virtual Ethernet interfaces defined to PowerHA SystemMirror and to detect an interface
failure, treat these interfaces as interfaces defined on single adapter networks.
Note: For Ethernet, PowerHA supports any combination of virtual and physical adapters on the same
network name.
Monitoring connections are necessary because they enable PowerHA SystemMirror to recognize the
difference between a network failure and a node failure. For instance, if connectivity on the PowerHA
SystemMirror network (this network's IP labels are used in a resource group) is lost, and you have
another TCP/IP based network, PowerHA SystemMirror recognizes the failure of its cluster network and
takes recovery actions that prevent the cluster from becoming partitioned.
To avoid cluster partitioning, you should configure redundant networks in the PowerHA SystemMirror
cluster.
When designing your network topology, ensure that clients have highly available network access to their
applications. This requires that none of the following components is a single point of failure:
v The IP subsystem
v A single network
v A single NIC
To eliminate the network as a single point of failure, configure multiple networks so that PowerHA
SystemMirror has multiple paths among cluster nodes. Keep in mind that if a client is connected to only
one network, that network is a single point of failure for the client. In a multiple-network setup, if one
network fails, the remaining networks can still function to connect nodes and provide access for clients.
The more networks you can configure to carry heartbeats and other information among cluster nodes, the
greater the degree of system availability.
The following diagram illustrates a dual-network setup with more than one path to each cluster node.
Note: Hot replacement of the dual-port Ethernet adapter used to configure two interfaces for one
PowerHA SystemMirror IP network is currently not supported.
Related information:
Resource group behavior during cluster events
When a node is configured with multiple connections to a single network, the network interfaces serve
different functions in PowerHA SystemMirror.
Service interface
A service interface is a network interface configured with a PowerHA SystemMirror service IP label. The
service IP label is used by clients to access application programs. The service IP is only available when
the corresponding resource group is online.
A persistent node IP label is an IP alias that can be assigned to a specific node on a cluster network. A
persistent node IP label always stays on the same node (node-bound), and coexists on a NIC that
already has a service or boot IP label defined. A persistent node IP label does not require installing an
additional physical NIC on that node, and is not part of any resource group.
Assigning a persistent node IP label provides a node-bound address that you can use for administrative
purposes, because a connection to a persistent node IP label always goes to a specific node in the cluster.
You can have one persistent node IP label per network per node.
As one of the best practices for PowerHA SystemMirror, you must configure a persistent IP label for each
cluster node. This is useful, for instance, if you must access a particular node in a PowerHA
SystemMirror cluster for purposes of running reports or for diagnostics. Having a persistent IP label
configured has the advantage that PowerHA SystemMirror can access the persistent IP label on the node
despite individual NIC failures, assuming that there are spare NICs on the network.
After a persistent node IP label is configured on a specified network node, it becomes available at boot
time and remains configured even if PowerHA SystemMirror is shut down on that node.
The following list describes how PowerHA SystemMirror responds to a failure when a persistent node IP
label is configured:
v If a NIC that has a service IP label configured fails and there is also a persistent label defined on this
NIC, the persistent label falls over to the same boot interface to which the service IP label falls over.
v If all NICs on the cluster network on a specified node fail, the persistent node IP label becomes
unavailable. A persistent node IP label always remains on the same network and on the same node. It
does not move between the nodes in the cluster.
Related information:
Configuring PowerHA SystemMirror cluster topology and resources (extended)
PowerHA SystemMirror uses IPAT via IP aliases to keep service IP addresses highly available.
When PowerHA SystemMirror is started on the node, the service IP label is aliased onto one of the boot
interfaces that is defined to PowerHA SystemMirror. If that interface fails, the service IP label is aliased
onto another interface if one is available on the same network. To use IPAT via IP aliases, the network
must support gratuitous ARP.
When you configure a persistent node IP label on a cluster network, the IP address associated with a
persistent IP label must be on a different subnet than any service address with which it might share an
interface. Networks with one interface per node do not require separate subnets.
In some situations you might need to configure a persistent IP label on the same subnet as the service IP
label. In this case, to avoid problems with network packets sent from either of the addresses, consider
configuring the distribution preference for service IP aliases. This preference lets you choose the type
of distribution that suits the VPN firewall external connectivity requirements.
Note: The subnet considerations are different if you are planning to configure a Network File System
(NFS).
Related reference:
“NFS cross-mounting and IP labels” on page 54
To enable NFS cross-mounting, each cluster node might act as an NFS client. Each of these nodes must
have a valid route to the service IP label of the NFS server node. That is, to enable NFS cross-mounting,
an IP label must exist on the client nodes, and this IP label must be configured on the same subnet as the
service IP label of the NFS server node.
“Types of distribution for service IP label aliases” on page 28
You can specify in SMIT different distribution preferences for the placement of service IP label aliases.
“Planning for IP address takeover via IP aliases”
Assigning IP aliases to NICs allows you to create more than one IP label on the same network interface.
During IP address takeover via IP aliases, when an IP label moves from one NIC to another, the target
NIC receives the new IP label as an IP alias and keeps the original IP label and hardware address.
Review the following information when planning for IP address takeover via IP aliases:
v Each network interface must have a boot IP label defined to PowerHA SystemMirror. The interfaces
that are defined to PowerHA SystemMirror are used to keep the service IP addresses highly available.
v Hardware Address Takeover (HWAT) cannot be configured for networks that are using IP address
takeover via IP aliasing.
v The following subnet requirements apply when there are multiple interfaces on a node attached to the
same network:
– All boot addresses must be defined on different subnets.
– Service addresses must be on a different subnet from all boot addresses and persistent addresses.
Note: These subnet requirements avoid the IP route striping function of the AIX operating system,
which allows multiple routes to the same subnet and can cause application traffic to be sent to a failed
interface. These requirements do not apply to multiple interfaces that are combined into a single logical
interface using Ethernet Aggregation or EtherChannel.
v Service address labels configured for IP address takeover via IP aliases can be included in all
nonconcurrent resource groups.
v Multiple service labels can coexist as aliases on a given interface.
v The netmask for all IP labels in a PowerHA SystemMirror network must be the same.
v If there are multiple service and persistent labels, PowerHA SystemMirror attempts to distribute them
evenly across all available network interfaces. You can specify a location preference such that the
persistent and service aliases are always mapped to the same interface. For more information about
service labels, see Distribution preference for service IP label aliases.
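The subnet requirements in the list above can be validated before you configure the network. The following sketch uses the Python ipaddress module and covers a single node with multiple interfaces on one network. The function check_ipat_subnets and its parameters are hypothetical. Passing one netmask for all addresses reflects the requirement that all IP labels in a PowerHA SystemMirror network use the same netmask.

```python
import ipaddress


def check_ipat_subnets(boot, service, persistent, netmask):
    """Return a list of violations of the subnet rules (empty if none)."""

    def subnet(address):
        # strict=False lets a host address stand in for its network
        return ipaddress.ip_network(f"{address}/{netmask}", strict=False)

    problems = []
    boot_subnets = [subnet(address) for address in boot]
    if len(set(boot_subnets)) != len(boot_subnets):
        problems.append("two or more boot addresses share a subnet")
    reserved = set(boot_subnets) | {subnet(address) for address in persistent}
    for address in service:
        if subnet(address) in reserved:
            problems.append(
                f"service address {address} shares a subnet with a boot "
                "or persistent address"
            )
    return problems
```

For example, boot addresses 100.100.50.1 and 100.100.60.1 with a service address 100.100.70.1 and a netmask of 255.255.255.0 produce no violations, whereas a service address of 100.100.50.9 is reported.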
The boot addresses on a node (the base address assigned by the AIX operating system after a system
reboot and before the PowerHA SystemMirror software is started) are defined to PowerHA SystemMirror
as boot addresses. PowerHA SystemMirror automatically discovers and configures boot addresses when
you first configure your cluster. When the PowerHA SystemMirror software is started on a node, the
node's service IP label is added as an alias onto one of the NICs that is defined as a boot address. If the
NIC that hosts the service IP fails, PowerHA SystemMirror attempts to move it to another active boot
NIC on the same node.
During a node fallover event, the service IP label that is moved is placed as an alias on the target node's
NIC in addition to any other service labels that might already be configured on that NIC.
For example, if Node A fails, PowerHA SystemMirror tries to move all resource groups to Node B. If the
resource groups contain service IP addresses, PowerHA SystemMirror places the service IP as an alias
onto the appropriate NIC on Node B, and any other existing labels remain intact on Node B's NIC. Thus,
a NIC on Node B can now receive client traffic that formerly was directed to the service address that was
on Node A. Later, when Node A is restarted, it starts on its boot addresses and is available to host the
service IP if Node B fails. When Node B releases the requested service IP label, the alias for the service IP
label is deleted on Node B. Node A again puts the service IP label as an alias onto one of its boot
address interfaces on the appropriate network.
During IPAT, PowerHA SystemMirror attempts to recover the service IP address on the same node by
using a different adapter on the same subnet. If there are no adapters available on the same subnet that is
defined to PowerHA SystemMirror, then PowerHA SystemMirror moves the service IP address to the
backup node that is defined in the resource group policy. While releasing the service IP address,
If your environment has multiple adapters on the same subnet, all the adapters must have the same
network configuration and the adapters must be part of the PowerHA SystemMirror configuration.
When using IPAT via IP aliases, service IP labels are acquired by using all available and appropriate
interfaces. If multiple interfaces are available to host the service IP label, the interface is chosen according
to the configured distribution policy. By default, an anti-collocation policy is used, and an attempt is
made to evenly distribute service IP labels across all the available interfaces.
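The idea behind an even distribution can be illustrated with a simple least-loaded placement. This sketch is not the PowerHA SystemMirror algorithm. The function distribute_anti_collocation is hypothetical and assumes that each new service IP label is placed on the interface that currently hosts the fewest aliases, with ties resolved in the order in which the interfaces are listed.

```python
def distribute_anti_collocation(labels, interfaces):
    """Place each label on the interface that hosts the fewest aliases."""
    placement = {interface: [] for interface in interfaces}
    for label in labels:
        # min() returns the first interface with the smallest alias count
        target = min(interfaces, key=lambda name: len(placement[name]))
        placement[target].append(label)
    return placement
```

With three labels and two interfaces, the first interface hosts two aliases and the second hosts one, which is the most even split possible.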
In PowerHA SystemMirror, there is no impact to any service IP labels aliased on an interface if you
remove the boot-time address for the interface with the ifconfig command. However, if you use the chdev
command (or the chinet fastpath) to replace a boot address on an interface, this command deletes any
service IP aliases and there is no indication that this happened in PowerHA SystemMirror. If you need to
change or temporarily unassign the boot address of an interface that is actively hosting service addresses,
it is best to move service addresses to another interface first. If no other interfaces are available, use the
ifconfig command alias to replace a boot address on an interface without disrupting application access.
Related reference:
“NFS cross-mounting in PowerHA SystemMirror” on page 53
An NFS cross-mounting is a specific PowerHA SystemMirror NFS configuration where each node in the
cluster can act as both the NFS server and the NFS client. While a file system is being exported from one
node, the file system is mounted with NFS on all the nodes for the resource group, including the one that
is exporting it. Another file system can also be exported from another node, and be mounted with NFS
on all nodes.
PowerHA SystemMirror lets you specify the distribution preference for the service IP label aliases. These
are the service IP labels that are part of PowerHA SystemMirror resource groups and that belong to IPAT
via IP aliases networks.
A distribution preference for service IP label aliases is a networkwide attribute used to control the
placement of the service IP label aliases on the physical network interface cards on the nodes in the
cluster.
Use the distribution preference for IP aliases to address the following cluster requirements:
v Firewall considerations
v Cluster configurations that use VLANs (when applications are expecting to receive packets from a
specific network interface)
v Specific requirements for the placement of IP labels in the cluster
Configuring a distribution preference for service IP label aliases does the following:
v Lets you customize the load balancing for service IP labels in the cluster, taking into account the
persistent IP labels previously assigned on the nodes.
v Enables PowerHA SystemMirror to redistribute the alias service IP labels according to the preference
you specify.
v Allows you to configure the type of the distribution preference suitable for the VPN firewall external
connectivity requirements.
The distribution preference is exercised in the cluster as long as acceptable network interfaces are
available. PowerHA SystemMirror always keeps service IP labels active, even if the preference cannot be
satisfied.
Type of distribution preference, followed by its description:
Anti-collocation
This is the default. PowerHA SystemMirror distributes all service IP label aliases across all boot IP
labels by using a least-loaded selection process.
Collocation
PowerHA SystemMirror allocates all service IP label aliases on the same network interface card
(NIC).
Anti-collocation with persistent labels
PowerHA SystemMirror distributes all service IP label aliases across all active physical interfaces
that are not hosting the persistent IP label. PowerHA SystemMirror places the service IP label alias
on the interface that is hosting the persistent label only if no other network interface is available.
Note: If you did not configure persistent IP labels, PowerHA SystemMirror lets you select the
anti-collocation with persistent distribution preference, but it issues a warning and uses the
regular anti-collocation preference by default.
Collocation with persistent labels
All service IP label aliases are allocated on the same NIC that is hosting the persistent IP label.
This option might be useful in VPN firewall configurations where only one interface is granted
external connectivity and all IP labels (persistent and service) must be allocated on the same
interface card.
Note: If you did not configure persistent IP labels, you can use PowerHA SystemMirror to select
the collocation with persistent distribution preference, but it issues a warning and uses the regular
collocation preference by default.
Anti-collocation with source
Service labels are mapped by using the anti-collocation preference. If there are not enough
adapters, more than one service label can be placed on one adapter. With this choice one label is
chosen as the source address for outgoing communication. The interface label chosen in the
Source IP Label for outgoing packets field is the source address.
Collocation with source
Service labels are mapped by using the collocation preference. With this choice one service label is
chosen as the source address for outgoing communication. The interface label chosen in the
Source IP Label for outgoing packets field is the source address.
Anti-collocation with persistent label and source
Service labels are mapped by using the anti-collocation with persistent preference. One service
address can be chosen as a source address for the case when there are more service addresses
than boot adapters.
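The first four distribution preferences can be illustrated with a short Python sketch. This is not the
PowerHA SystemMirror implementation. The function name, the preference strings, and the rule that ties
go to the first listed interface are assumptions made for illustration only. The sketch models
"least-loaded" as "fewest service labels already placed".

```python
from typing import Dict, List, Optional


def place_service_labels(
    service_labels: List[str],
    interfaces: List[str],
    preference: str = "anti-collocation",
    persistent_interface: Optional[str] = None,
) -> Dict[str, List[str]]:
    """Return a mapping of interface name to the service labels placed on it.

    Hypothetical model of the distribution preferences; not product code.
    """
    if not interfaces:
        raise ValueError("at least one interface is required")
    if persistent_interface is not None and persistent_interface not in interfaces:
        raise ValueError("persistent_interface must be one of the interfaces")

    placement: Dict[str, List[str]] = {nic: [] for nic in interfaces}

    # Without a persistent label, the "with persistent" variants fall back
    # to the regular preference, as the notes in the table describe.
    if persistent_interface is None:
        preference = preference.replace("-with-persistent", "")

    if preference == "anti-collocation":
        candidates = list(interfaces)
    elif preference == "collocation":
        candidates = [interfaces[0]]  # assumption: the first interface is chosen
    elif preference == "anti-collocation-with-persistent":
        others = [nic for nic in interfaces if nic != persistent_interface]
        # Use the persistent interface only if no other interface is available.
        candidates = others or [persistent_interface]
    elif preference == "collocation-with-persistent":
        candidates = [persistent_interface]
    else:
        raise ValueError(f"unknown preference: {preference}")

    for label in service_labels:
        # Least-loaded selection: pick the candidate with the fewest labels.
        target = min(candidates, key=lambda nic: len(placement[nic]))
        placement[target].append(label)
    return placement
```

For example, calling place_service_labels(["svc1", "svc2", "svc3"], ["en0", "en1"]) spreads the three
labels over both interfaces, while the collocation preference places all of them on one interface.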
Note: Sites are supported only in PowerHA SystemMirror 7.1.2, or later, in both the Enterprise Edition
and the Standard Edition. Replication management is supported only in PowerHA SystemMirror
Enterprise Edition.
An IP address that is valid at one site might not be valid at the other site because of subnet issues.
Therefore, you can associate a service IP label that is configurable on multiple nodes with a specific site. Site-specific
service IP labels are configured in PowerHA SystemMirror and can be used with or without PowerHA
SystemMirror Enterprise Edition for AIX networks. This label is associated with a resource group and is
active only when the resource group is in an Online Primary state at the associated site.
If network information service (NIS) or domain name server (DNS) is in operation, IP lookup defaults to
a name server system for name and address resolution. However, if the name server was accessed
through an interface that has failed, the request does not complete, and eventually times out. This time
out can significantly slow down PowerHA SystemMirror event processing.
To ensure that cluster events complete successfully and quickly, PowerHA SystemMirror disables NIS or
DNS host name resolution by setting the following AIX environment variable during service IP label
swapping:
NSORDER = local
As a result, the /etc/hosts file of each cluster node must contain all PowerHA SystemMirror defined IP
labels for all cluster nodes.
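Because name resolution is local during these events, it can be useful to check that a node's
/etc/hosts file lists every IP label that is defined to the cluster. The following Python sketch shows one
way to do that check. The function name and the sample labels and addresses are hypothetical, and the
list of required labels must come from your own planning worksheets.

```python
from typing import Iterable, Set


def missing_host_labels(hosts_text: str, required_labels: Iterable[str]) -> Set[str]:
    """Return the required IP labels that do not appear in the hosts file text."""
    known: Set[str] = set()
    for line in hosts_text.splitlines():
        # Drop comments, then split the entry into whitespace-separated fields.
        fields = line.split("#", 1)[0].split()
        # fields[0] is the address; the remaining fields are host names and aliases.
        known.update(fields[1:])
    return set(required_labels) - known


if __name__ == "__main__":
    sample = """
    # hypothetical entries, using documentation addresses
    192.0.2.10  nodea_boot
    192.0.2.11  nodeb_boot
    192.0.2.50  app_svc   app_svc.example.com
    """
    print(missing_host_labels(sample, ["nodea_boot", "nodeb_boot", "app_svc", "db_svc"]))
```

On a cluster node, you would pass the contents of /etc/hosts as hosts_text. An empty result means that
every listed label can be resolved locally.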
Disabling NIS or DNS host name resolution is specific to the PowerHA SystemMirror event script
environment. PowerHA SystemMirror sets the NSORDER variable to local when it attaches a service IP
label and when it swaps IP labels on an interface.
Other processes continue to use the default system name resolution settings (for example, applications
outside of PowerHA SystemMirror that require DNS IP address resolution). If these processes request IP
lookup, then during the network interface reconfiguration events in PowerHA SystemMirror the
processes still may not be able to contact an external name server. The request to the DNS will succeed
after PowerHA SystemMirror completes the network interface reconfiguration event.
All available and supported network and storage interfaces in the cluster are used to monitor interfaces,
ensure connectivity to cluster peers, and report when a connection fails.
Contact your IBM representative for the current list of supported adapters, disks and multipath drivers.
PowerHA SystemMirror supports monitoring and communication over the following types of interfaces:
v Ethernet
v 4 GB and 8 GB Emulex Fibre Channel adapters
v The repository disk (SAN and SAS disks are supported)
v FCoE (Fibre Channel over Ethernet)
Related information:
PowerHA Hardware Support Matrix
Certain VPN firewall configurations allow external connectivity to only one NIC at a time. If your
firewall is configured this way, allocate all PowerHA SystemMirror service and persistent IP labels on the
same interface.
To have PowerHA SystemMirror manage the IP labels to satisfy the requirements of such a VPN firewall:
v Specify the persistent IP label for each node in the cluster. The persistent IP label is mapped to an
available interface on the selected network.
v Specify the collocation with persistent distribution preference for the network containing the service IP
labels. This ensures that all service IP label aliases are allocated on the same physical interface that is
hosting the persistent IP label.
Related reference:
“Types of distribution for service IP label aliases” on page 28
You can specify in SMIT different distribution preferences for the placement of service IP label aliases.
Related information:
Administering PowerHA SystemMirror
Before you implement IPv6 in your environment, you must consider the following areas of PowerHA
SystemMirror:
v Cluster Aware AIX (CAA) automatically uses heartbeats for any IP address configured in a node. To
prevent CAA from heartbeating any IPv6 address on a particular interface, your network that uses
PowerHA SystemMirror must be identified as a private network.
v IPv6 uses dynamic configuration of adapter addresses and other network attributes. By default, IPv6
addresses do not persist across a system reboot operation. However, you can configure your system to
run the autoconf6 command during startup. You need to plan for how and where the autoconf6
command is run in your system environment.
v PowerHA SystemMirror uses link local addresses as boot addresses. You can configure a second alias
address to be used as the PowerHA SystemMirror boot address. Link local addresses are convenient to
Changing the network attribute to private makes the network Oracle-compatible by changing all
interfaces to service.
After creating your cluster networks (either manually or using discovery), you can change the network
attribute by following this SMIT path:
Cluster Nodes and Networks > Manage Networks and Network Interfaces > Networks > Change/Show
a Network
Select the network to be changed, and then change the Network Attribute setting to private. Synchronize
the cluster after making this change.
For example, if a network cable is unplugged from a Virtual I/O Server (VIOS), it cannot communicate
with the external network. Thus, the VIOS partitions might report their individual virtual interfaces as
available, when they cannot reach any external LAN beyond the virtual network. You can fix this
problem with APAR IV14422. This problem is similar to APAR IZ01331, which is for HACMP 6.1. If you
are migrating from HACMP 6.1, or earlier, and you applied APAR IZ01331, you need not change your
existing netmon.cf file. However, you must apply APAR IV14422 after the migration.
The following list describes the variables that are used in the netmon.cf file.
!REQD
An explicit string that must be at the beginning of the line without any leading spaces.
owner The interface whose online or offline status is determined by whether it can ping any of the
specified targets. The owner can be specified as a hostname, IP address, or interface name. If you
use a hostname, it must resolve to an IP address or the line is ignored. You can specify the !ALL
string to indicate that all adapters use the specified target.
target The IP address or hostname you want the owner to try to ping. To use a hostname, the target
must be resolvable to an IP address.
When you are creating or changing the netmon.cf file, consider the following information:
v When you change the netmon.cf file in PowerHA SystemMirror Version 7.1, or later, you do not have
to restart cluster services to apply the changes. The cthags subsystem automatically re-reads the
netmon.cf file approximately every minute.
v You must select targets that are outside the virtual network environment.
v Targets that you identify must be maintained through changes in your network environment.
v You can provide only one target per line. However, in IBM AIX 7.1 with Technology Level 4, or earlier,
you can specify the same owner entry on up to 32 different lines in the netmon.cf file. In IBM AIX 7.1 with
Technology Level 4, or later, and AIX Version 7.2, or later, only the last five entries for an owner entry
are considered. For an owning adapter listed on more than one line, the adapter is considered available
if it can ping any of the provided targets.
v Do not use targets that are all on the same physical system. Also, do not choose targets that are all
adapters in the same PowerHA SystemMirror cluster. Otherwise, a node in that cluster cannot
keep its adapters available when it is the only node online.
v Each virtual adapter must have at least one line inside the netmon.cf file that specifies a target that can
be pinged from the boot IP address on that interface, or a persistent IP alias if one is configured.
v Network hardware that can be pinged, such as gateways and routers, is useful as a target
because PowerHA SystemMirror nodes already use it.
If some adapters on the same network are virtual and others are not, it is acceptable to use the
!REQD format. For virtual adapter en0, use the !REQD format in the netmon.cf file. For physical adapter en1,
an entry in the netmon.cf file is optional, and so is the !REQD format.
In PowerHA SystemMirror, only the !REQD entries are used. Any other entries in the netmon.cf file are
ignored. The way that a !REQD entry is used is unchanged: the owner must be able to ping at least one
target (if there is more than one line for the same adapter).
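The rules above can be sketched in Python. The sketch assumes that each entry has three
whitespace-separated fields in the order in which the variables are described (!REQD, owner, target). It
applies the last-five rule that is described for newer AIX levels. How !ALL targets combine with
owner-specific targets is an assumption here, and the can_ping callable is a hypothetical stand-in for a
real reachability test. This is an illustration, not the logic of the cthags subsystem.

```python
from collections import defaultdict
from typing import Callable, Dict, List

MAX_ENTRIES_PER_OWNER = 5  # only the last five entries for an owner are considered


def parse_netmon_cf(text: str) -> Dict[str, List[str]]:
    """Map each owner to its targets, keeping only well-formed !REQD lines."""
    targets: Dict[str, List[str]] = defaultdict(list)
    for line in text.splitlines():
        # !REQD must be at the beginning of the line without leading spaces.
        if not line.startswith("!REQD"):
            continue  # other entries are ignored
        fields = line.split()
        if len(fields) != 3 or fields[0] != "!REQD":
            continue  # only one owner and one target per line
        _, owner, target = fields
        targets[owner].append(target)
    return {owner: found[-MAX_ENTRIES_PER_OWNER:] for owner, found in targets.items()}


def owner_is_available(
    owner: str,
    targets: Dict[str, List[str]],
    can_ping: Callable[[str], bool],
) -> bool:
    """An owner is available if it can ping any one of its targets."""
    # Assumption: !ALL targets are tried in addition to owner-specific targets.
    candidates = targets.get(owner, []) + targets.get("!ALL", [])
    return any(can_ping(target) for target in candidates)
```

A planning script could use parse_netmon_cf to confirm that every virtual adapter has at least one
target line before the file is distributed to the cluster nodes.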
Note: This format also applies to IVE networks. However, you cannot use a target that is a member of
the IVE network in the same physical system.
Examples
You can now add networking to the sample cluster diagram started in Overview of the planning process.
Related reference:
“Overview of the planning process” on page 4
This topic describes the steps for planning a PowerHA SystemMirror cluster.
Prerequisites
You have completed the planning steps in the planning cluster network connectivity and planning
application and application controllers sections.
Refer to AIX documentation for the general hardware and software setup for your disk and tape devices.
Typically, takeover occurs within 30 - 300 seconds. This range depends on the number and types of disks
used, the number of volume groups, the file systems (whether shared or Network File System (NFS)
cross-mounted), and the number of critical applications in the cluster configuration.
When planning the shared external disk for your cluster, the objective is to eliminate single points of
failure in the disk storage subsystem. The following table lists the disk storage subsystem components,
with suggested ways to eliminate them as single points of failure.
For specific information on what disk technologies are supported on specific versions of PowerHA
SystemMirror and the AIX operating system, see PowerHA hardware support matrix.
Related information:
OEM disk, volume group, and file systems accommodation
The IBM DS4000® series are less prone to power supply problems because they have redundant power
supplies.
The executable modules of the highly available applications should be on the internal disks and not on
the shared external disks, for the following reasons:
v Licensing
v Application startup
Related information:
Configuring AIX for PowerHA SystemMirror
Licensing
Vendors might require that you purchase a separate copy of each application for each processor or
multiprocessor that might run it, and protect the application by incorporating processor-specific
information into the application when it is installed.
Thus, if you are running your application executable from a shared disk, it is possible that after a fallover,
PowerHA SystemMirror will be unable to restart the application on another node, because, for example,
the processor ID on the new node does not match the ID of the node on which the application was
installed.
The application might also require that you purchase what is called a node-bound license, that is, a
license file on each node that contains information specific to the node.
There might also be a restriction on the number of floating licenses (available to any cluster node)
available within the cluster for that application. To avoid this problem, be sure that there are enough
licenses for all processors in the cluster that might potentially run an application at the same time.
You might need to customize your configuration files if your configuration requires both of the following:
v You plan to store these configuration files on a shared file system.
v The application cannot use the same configuration on every fallover node.
For example, in a two-node mutual takeover configuration, both nodes might be running different
instances of the same application, and standing by for one another. Each node must be aware of the
location of configuration files for both instances of the application, and must be able to access them after
a fallover. Otherwise, the fallover will fail, leaving critical applications unavailable to clients.
To decrease how much you will need to customize your configuration files, place slightly different
startup files for critical applications on local file systems on either node. This allows the initial application
parameters to remain static. The application will not need to recalculate the parameters each time it is
called.
Your cluster requirements depend on the configuration you specify. To ensure that you account for all
required components, complete a diagram for your system. In addition, consult the hardware information
for detailed information about cabling and attachment for the particular devices you are configuring.
Disk adapters
Remove any SAS terminators on the adapter. Use external terminators in a PowerHA SystemMirror
cluster. If you terminate the shared SAS bus on the adapter, you lose termination when the cluster node
that contains the adapter fails.
Cables
The cables required to connect nodes in your cluster depend on the type of SCSI bus you are configuring.
Select cables that are compatible with your disk adapters and controllers. For information on the type and
length of SCSI cable required, see the hardware documentation that accompanies each device you want to
include on the SCSI bus.
For the cluster diagram, draw a box representing each shared disk. Then label each box with a shared
disk name.
Related reference:
“Initial cluster planning” on page 5
This section describes the initial steps you take to plan a PowerHA SystemMirror cluster to make
applications highly available.
Direct fibre channel tape unit attachments are supported. Management of shared tape drives is simplified
by the following PowerHA SystemMirror functions:
v Configuration of tape drives using SMIT
v Verification of proper configuration of tape drives
v Automatic management of tape drives during resource group start and stop operations
v Reallocation of tape drives on node failure and node recovery
v Controlled reallocation of tape drives on cluster shutdown
v Controlled reallocation of tape drives during dynamic reconfiguration
When you plan to include tape drives as cluster resources, remember the following:
v A tape loader or stacker is treated like a simple tape drive by PowerHA SystemMirror.
v No more than two cluster nodes can share the tape resource.
v Tape resources cannot be part of concurrent resource groups.
v The tape drive must have the same name (for example, /dev/rmt0) on both nodes that share the tape
device.
v When a tape special file is closed, the default action is to release the tape drive. PowerHA
SystemMirror is not responsible for the state of the tape drive after an application has opened the tape.
v No means of synchronizing tape operations and application controllers is provided. If you decide that
a tape reserve and release operation should be done asynchronously, provide a way to notify the
application controller to wait until the reserve and release operation is complete.
This reservation is held until an application releases it, or the node is removed from the cluster:
v When the special file for the tape is closed, the default action is to release the tape drive. An
application can open a tape drive with a do-not-release-on-close flag. PowerHA SystemMirror will not
be responsible for maintaining the reservation after an application is started.
v Upon stopping cluster services on a node and bringing resource groups offline, the tape drive is
released, allowing access from other nodes.
v Upon unexpected node failure, a forced release is done on the takeover node. The tape drive is then
reserved as part of resource group activation.
If a tape operation is in progress when a tape reserve or release is initiated, it might take many minutes
before the reserve or release operation completes. PowerHA SystemMirror allows synchronous or
asynchronous reserve and release operations. Synchronous and asynchronous operation is specified
separately for reserve and release.
Synchronous operation
With synchronous operation (the default value), PowerHA SystemMirror waits for the reserve or release
operation, including the execution of a user-defined recovery procedure, to complete before continuing.
Asynchronous operation
With asynchronous operation, PowerHA SystemMirror creates a child process to perform the reserve or
release operation, including the execution of a user-defined recovery procedure, and immediately
continues.
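The difference between the two modes can be sketched in Python with a child process. The function
name and the command that is passed to it are hypothetical placeholders. The sketch shows only the
waiting behavior, not how PowerHA SystemMirror performs the reserve or release, and it starts a child
process in both modes for simplicity.

```python
import subprocess
import sys
from typing import List


def run_tape_operation(command: List[str], synchronous: bool = True) -> subprocess.Popen:
    """Start a reserve or release command (placeholder) as a child process."""
    child = subprocess.Popen(command)
    if synchronous:
        # Synchronous: wait for the operation to complete before continuing.
        child.wait()
    # Asynchronous: return immediately while the child keeps running.
    return child


if __name__ == "__main__":
    # A do-nothing placeholder command stands in for the real operation.
    placeholder = [sys.executable, "-c", "pass"]
    finished = run_tape_operation(placeholder, synchronous=True)
    print("synchronous exit status:", finished.returncode)
    pending = run_tape_operation(placeholder, synchronous=False)
    print("asynchronous exit status after wait:", pending.wait())
```

With asynchronous operation the caller must check later whether the operation finished, which is why
the application controller needs a way to be notified.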
Recovery procedures
Recovery procedures are highly dependent on the application accessing the tape drive.
Rather than trying to predict likely scenarios and develop recovery procedures, PowerHA SystemMirror
provides for the execution of user-defined recovery scripts for the following operations:
v Tape start
v Tape stop
Tape start and stop operations occur during node start and stop, node fallover and reintegration, and
dynamic reconfiguration. These scripts are called when a resource group is activated (tape start) or when
a resource group is deactivated (tape stop). Sample start and stop scripts can be found in the
/usr/es/sbin/cluster/samples/tape directory:
tape_resource_stop_example
v During tape start, PowerHA SystemMirror reserves the tape drive, forcing a release if necessary, and
then calls the user-provided tape start script.
v During tape stop, PowerHA SystemMirror calls the user-provided tape stop script, and then releases
the tape drive.
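The ordering of these two operations can be sketched in Python. All of the callables are hypothetical
placeholders for the reserve, release, and user-provided script steps. The sketch shows only the
sequence that is described above.

```python
from typing import Callable


def tape_start(
    reserve: Callable[[], bool],
    force_release: Callable[[], None],
    start_script: Callable[[], None],
) -> None:
    """Reserve the drive, forcing a release if necessary, then run the start script."""
    if not reserve():
        # The drive is held elsewhere: force a release and try again.
        force_release()
        if not reserve():
            raise RuntimeError("tape drive could not be reserved")
    start_script()


def tape_stop(stop_script: Callable[[], None], release: Callable[[], None]) -> None:
    """Run the user-provided stop script first, then release the drive."""
    stop_script()
    release()
```

The order matters: the stop script runs while the drive is still reserved, so it can position the tape
and write end-of-tape marks before another node gains access.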
Note: You are responsible for correctly positioning the tape, terminating processes or applications,
writing to the tape drive, and writing end-of-tape marks within these scripts.
Other application-specific procedures should be included as part of the start server and stop server
scripts.
Tape drives with more than one SCSI interface are not supported. Therefore, only one connection exists
between a node and a tape drive. The usual notion of adapter fallover does not apply.
If a node that has tape resources that are part of a PowerHA SystemMirror resource group fails, the
takeover node reserves the tape drive, forcing a release if necessary, and then calls the user-provided tape
start script.
On reintegration of a node, the takeover node runs the tape stop script and then releases the tape drive.
The node being reintegrated reserves the tape drive and calls the user-provided tape start script.
PowerHA SystemMirror does not provide tape fallover and recovery procedures for network failure.
Prerequisites
You should also be familiar with how to use the Logical Volume Manager (LVM).
Overview
Planning shared logical volume manager (LVM) components for a PowerHA SystemMirror cluster
depends on the type of shared disk device and the method of shared disk access.
To avoid a single point of failure for data storage, use data redundancy as supported by LVM or your
storage system.
Related information:
OEM disk, volume group, and file systems accommodation
Operating system and device management
Physical storage refers to the actual location of data on a disk. Logical storage controls how data is made
available to the user. Logical storage can be discontiguous, expanded, and replicated, and can span
multiple physical disks. These facilities provide improved availability of data.
Physical volumes
A physical volume is a single physical disk or a logical unit presented by a storage array.
The physical volume is partitioned to provide the AIX operating system with a way of managing how
data is mapped to the volume. The following figure shows a conventional use of physical partitions
within a physical volume.
Volume groups
A volume group is a set of physical volumes that the AIX operating system treats as a contiguous,
addressable disk region. You can place multiple physical volumes in the same volume group. The actual
number depends on how the volume group is created.
In the PowerHA SystemMirror environment, a shared volume group is a volume group that resides entirely
on the external disks that are shared by the cluster nodes. A nonconcurrent shared volume group can be
varied on by only one node at a time.
Logical volumes
A logical volume is a set of logical partitions that the AIX operating system makes available as a single
storage unit, that is, the logical view of a disk.
A logical partition is the logical view of a physical partition. Logical partitions might be mapped to one,
two, or three physical partitions to implement mirroring.
In the PowerHA SystemMirror environment, logical volumes can be used to support a journaled file
system or a raw device.
File systems
A file system is written to a single logical volume.
Ordinarily, you organize a set of files as a file system for convenience and speed in managing data.
In the PowerHA SystemMirror system, a shared file system is a journaled file system that resides entirely
in a shared logical volume.
Plan to place shared file systems on external disks that are shared by cluster nodes. Data
resides in file systems on these external shared disks in order to be made highly available.
The order in which file systems are mounted is usually not important. However, if mount order is
important to your cluster, plan for the following:
v File systems that exist within a single resource group are mounted in alphanumeric order when the
resource group comes online. They are also unmounted in reverse alphanumeric order when the
resource group is taken offline.
v If you have shared, nested file systems, additional care is needed. If you have shared, nested file
systems within a single resource group, then you must set the filesystems recovery method for the
resource group to sequential to guarantee the correct mount order.
v If you have nested file systems that reside in different resource groups, you must additionally plan a
parent-child relationship for those resource groups to guarantee the correct mount order.
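To preview the order that results from your mount point names, you can sort them as in the following
Python sketch. The function name and the mount points are hypothetical. Python's default string
ordering is used here as a stand-in for alphanumeric order, so treat the result as a planning aid rather
than a guarantee of what the cluster does.

```python
from typing import List, Tuple


def mount_and_unmount_order(mount_points: List[str]) -> Tuple[List[str], List[str]]:
    """Return (mount order, unmount order) for the file systems of one resource group."""
    mount_order = sorted(mount_points)            # mounted in ascending order
    unmount_order = list(reversed(mount_order))   # unmounted in reverse order
    return mount_order, unmount_order


if __name__ == "__main__":
    mounts, unmounts = mount_and_unmount_order(
        ["/sharedfs2", "/sharedfs1/logs", "/sharedfs1"]
    )
    print("mount:  ", mounts)
    print("unmount:", unmounts)
```

In this ordering a parent mount point sorts before the file systems nested under it, because the parent
path is a prefix of the child path.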
If a copy is lost due to an error, the other undamaged copies are accessed, and the AIX operating system
continues processing with an accurate copy. After access is restored to the failed physical partition, AIX
resynchronizes the contents (data) of the physical partition with the contents (data) of a consistent mirror
copy.
The following figure shows a logical volume composed of two logical partitions with three mirrored
copies. In the diagram, each logical partition maps to three physical partitions. Each physical partition
should be designated to reside on a separate physical volume within a single volume group. This
configuration provides the maximum number of alternative paths to the mirror copies and, therefore, the
greatest availability.
Figure 7. Logical volume of two logical partitions with three mirrored copies
The mirrored copies are transparent, meaning that you cannot isolate one of these copies. For example, if
you delete a file from a logical volume with multiple copies, the deleted file is removed from all copies of
the logical volume.
Although using mirrored copies spanning multiple disks (on separate power supplies) together with
multiple disk adapters ensures that no disk is a single point of failure for your cluster, these
configurations might increase the time for write operations.
Specify the superstrict disk allocation policy for the logical volumes in volume groups for which forced
varyon is specified. This configuration:
v Guarantees that copies of a logical volume always reside on separate disks
v Increases the chances that forced varyon will be successful after a failure of one or more disks.
If you plan to use forced varyon for the logical volume, apply the superstrict disk allocation policy for
disk enclosures in the cluster.
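The first guarantee, that copies of a logical volume reside on separate disks, can be expressed as a
simple check. The following Python sketch assumes that you have already collected, for each mirror
copy, the set of disks that hold its partitions. The function name and the disk names are hypothetical,
and the sketch does not query LVM.

```python
from itertools import combinations
from typing import Dict, Set


def copies_on_separate_disks(copy_to_disks: Dict[str, Set[str]]) -> bool:
    """Return True if no disk holds partitions from more than one mirror copy."""
    return all(
        first.isdisjoint(second)
        for first, second in combinations(copy_to_disks.values(), 2)
    )


if __name__ == "__main__":
    layout = {"copy1": {"hdisk1"}, "copy2": {"hdisk2"}, "copy3": {"hdisk3"}}
    print(copies_on_separate_disks(layout))
```

A layout in which two copies share a disk fails the check, because the loss of that one disk would
damage both copies.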
For more information about forced varyon, see the section Using quorum and varyon to increase data
availability.
Related reference:
“Using quorum and varyon to increase data availability” on page 48
How you configure quorum and varyon for volume groups can increase the availability of mirrored data.
The AIX operating system uses journaling for its file systems. In general, this means that the internal state
of a file system at startup (in terms of the block list and free list) is the same state as at shutdown. In
practical terms, this means that when AIX starts up, the extent of any file corruption can be no worse
than at shutdown.
Each volume group contains a jfslog or jfs2log log, which is itself a logical volume. This log typically
resides on a different physical disk in the volume group than the journaled file system. However, if
access to that disk is lost, changes to file systems after that point are in jeopardy.
To avoid the possibility of that physical disk being a single point of failure, you can specify mirrored
copies of each jfslog or jfs2log log. Place these copies on separate physical volumes.
A SAN is a high-speed network that allows your environment to establish direct connections between
storage devices and systems (nodes). Thus, two or more systems located at different locations can access
the same physical disks by using a SAN network connection. You can combine remote disks into a
volume group using LVM. You can import this volume group to the nodes located at different locations.
The logical volumes in the volume group that contain the remote disks can have up to three remote
mirrors. You can set up at least one remote mirror at each location. The data stored in the logical volume
PowerHA SystemMirror automatically synchronizes all remote mirrors after a disk or node failure occurs,
and the nodes are brought back online. Automatic synchronization happens even if one of the disks is in
the PVREMOVED state or the PVMISSING state. Automatic synchronization is not available for all cases
of LVM split-site mirroring. If it is not available, you can use C-SPOC to synchronize the data.
When planning for an LVM split-site mirroring configuration, you must also plan for the repository disk
used in the cluster. You must verify that there is a second disk ready to be used as a repository disk
when the primary disk fails.
Note: Cluster Aware AIX supports live repository replacement without impacting critical cluster
functions.
Example
The following figure is a configuration example of LVM split-site mirroring using a SAN.
You can mirror the disks that are connected to at least one node at each of the two locations. In this
example, PV4 is available for Node A and Node B on Location 1 and Node C on Location 2 using the
Fibre Channel Switch 1 and Fibre Channel Switch 2 connection. You can have a mirror of PV4 on
Location 1. The disks that are connected to the nodes on only one location (PV5 and PV6) cannot be
mirrored across locations.
You can use the AIX LVM mirrored pools function to ensure that data is correctly and completely
mirrored between the two locations. If PV1 and PV2 are in one mirrored pool, and PV3 and PV4 are in a
separate mirrored pool, then LVM allows one complete copy of the data to be present at each location.
You must use superstrict mirrored pools to guarantee that a complete copy of the data is at each location.
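The mirrored pool layout described above can be sanity-checked with a short script. The following Python sketch is an illustration only: the pool names and the location assigned to each physical volume are assumptions, and the check models the planning rule, not the AIX LVM implementation.

```python
def pools_span_locations(pv_location, mirror_pools):
    """Return True if every mirror pool sits entirely at one location
    and the pools together cover at least two distinct locations."""
    seen = set()
    for pool, pvs in mirror_pools.items():
        locations = {pv_location[pv] for pv in pvs}
        if len(locations) != 1:
            return False  # this pool mixes disks from different locations
        location = locations.pop()
        if location in seen:
            return False  # two pools at the same location add no site redundancy
        seen.add(location)
    return len(seen) >= 2


# Hypothetical placement of the disks; the real placement comes from your SAN layout.
pv_location = {
    "PV1": "Location 1",
    "PV2": "Location 1",
    "PV3": "Location 2",
    "PV4": "Location 2",
}
good_pools = {"pool_loc1": ["PV1", "PV2"], "pool_loc2": ["PV3", "PV4"]}
bad_pools = {"pool_a": ["PV1", "PV3"], "pool_b": ["PV2", "PV4"]}

print(pools_span_locations(pv_location, good_pools))
print(pools_span_locations(pv_location, bad_pools))
```

A layout that fails this check can leave a location without a complete copy of the data, even when every logical volume is mirrored.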
A SAN is a high-speed network that allows the establishment of direct connections between storage
devices and processors. Thus, two or more nodes located at different sites can access the same physical
disks, which can be separated by some distance, through the common SAN. These remote disks can be
combined into a volume group by using LVM, and this volume group can be imported to the nodes that
are located at different sites. The logical volumes in this volume group can have up to three mirrors.
Thus, you can set up at least one mirror at each site. The information stored on this logical volume is
kept highly available, and in case of certain failures, the remote mirror at another site will still have the
latest information so that the operations can be continued on the other site.
PowerHA SystemMirror automatically synchronizes mirrors after a disk or node failure and subsequent
reintegration. PowerHA SystemMirror handles the automatic mirror synchronization even if one of the
disks is in the PVREMOVED or PVMISSING state. The automatic synchronization is not possible for all
cases, but you can use C-SPOC to synchronize the data from the surviving mirrors to stale mirrors after a
disk or site failure, and subsequent reintegration.
Note: In PowerHA SystemMirror Enterprise Edition, you can also use mirroring in a cluster that spans
two sites by using the Geographic Logical Volume Manager (GLVM) mirroring function.
When you varyon the volume group in enhanced concurrent mode on all nodes that own the resource
group in a cluster, the LVM allows access to the volume group on all nodes. However, it restricts the
higher-level connections, such as NFS mounts and JFS mounts, on all nodes, and allows them only on the
node that currently owns the volume group in PowerHA SystemMirror.
You can use the AIX MPIO function to access disk subsystems through multiple paths. Multiple paths
provide more throughput and higher availability than the use of a single path. In particular, when
multiple paths are used, the loss of a single path because of an adapter, cable, or switch failure will not
cause applications to lose access to data. While PowerHA SystemMirror will attempt to recover from
complete loss of access to a volume group, that loss itself is going to be temporarily disruptive. The AIX
MPIO function can prevent a single component failure from causing an application outage.
When fast disk takeover is used, the disk reservation function is not used. If the cluster becomes
partitioned, nodes in each partition could accidentally varyon the volume group in active state. Because
active state varyon of the volume group allows mounting of file systems and changing physical volumes,
this situation can result in different copies of the same volume group. For more information about fast
disk takeover and using multiple networks, see the section Using fast disk takeover.
Enhanced concurrent mode is the only option for creating concurrent volume groups. In PowerHA
SystemMirror, enhanced concurrent mode volume groups do not use disk reserves. The concurrent access
that is required for MPIO accessed disks is automatically provided in PowerHA SystemMirror.
All concurrent volume groups are created as enhanced concurrent mode volume groups by default. For
enhanced concurrent volume groups, the Concurrent Logical Volume Manager (CLVM) coordinates
changes between nodes through the Group Services component of the Reliable Scalable Cluster
Technology (RSCT) function in the AIX operating system. Group Services protocols flow over the
communications links between the cluster nodes.
Related tasks:
Converting volume groups to enhanced concurrent mode
Related reference:
“Using fast disk takeover” on page 47
PowerHA SystemMirror automatically detects failed volume groups and initiates a fast disk takeover for
enhanced concurrent mode volume groups that are included as resources in nonconcurrent resource
groups.
Fast disk takeover is especially useful for fallover of enhanced concurrent mode volume groups made up
of a large number of disks. This disk takeover mechanism is faster than disk takeover used for standard
volume groups included in nonconcurrent resource groups. During fast disk takeover, PowerHA
SystemMirror skips the extra processing needed to break the disk reserves, or update and synchronize the
logical volume manager (LVM) information by running lazy update.
Fast disk takeover has been observed to take no more than 10 seconds for a volume group with two
disks. This time is expected to increase very slowly for larger numbers of disks and volume groups. The
actual time observed in any configuration depends on factors outside of PowerHA SystemMirror control,
such as the processing power of the nodes and the amount of unrelated activity at the time of the
fallover. The actual time observed for completion of fallover processing depends on additional factors,
such as whether or not a file system check is required, and the amount of time needed to restart the
application.
Note: Enhanced concurrent mode volume groups are not concurrently accessed. They are only accessed
by one node at any given time. The fast disk takeover mechanism works at the volume group level, and
is thus independent of the number of disks used.
To enable fast disk takeover, PowerHA SystemMirror activates enhanced concurrent volume groups in the
active and passive states.
Active varyon
Active varyon behaves the same as ordinary varyon, and makes the logical volumes available. When an
enhanced concurrent volume group is varied on in active state on a node, the following operations are allowed:
v Operations on file systems, such as file system mounts
v Operations on applications
v Operations on logical volumes, such as creating logical volumes
v Synchronizing volume groups
Passive varyon
When an enhanced concurrent volume group is varied on in passive state, the LVM provides the
equivalent of disk fencing for the volume group at the LVM level.
Passive state varyon allows only a limited number of read-only operations on the volume group:
v LVM read-only access to the volume group's special file
v LVM read-only access to the first 4 KB of all logical volumes that are owned by the volume group.
The following operations are not allowed when a volume group is varied on in passive state:
v Operations on file systems, such as mounting file systems
v Any operations on logical volumes, such as having logical volumes open
v Synchronizing volume groups
PowerHA SystemMirror correctly varies on the volume group in active state on the node that owns the
resource group, and changes active and passive states appropriately as the state and location of the
resource group changes.
v Upon cluster startup:
– On the node that owns the resource group, PowerHA SystemMirror activates the volume group in
active state. PowerHA SystemMirror activates a volume group in active state only on one node at a
time.
– PowerHA SystemMirror activates the volume group in passive state on all other nodes in the cluster.
v Upon fallover:
– If a node releases a resource group or if the resource group is being moved to another node for any
other reason, PowerHA SystemMirror switches the varyon state for the volume group from active to
passive on the node that releases the resource group. PowerHA SystemMirror then activates the
volume group in active state on the node that acquires the resource group.
– The volume group remains in passive state on all other nodes in the cluster.
v PowerHA SystemMirror does the following processes when node reintegration occurs:
– Changes the varyon state of the volume group from active to passive on the node that releases the
resource group.
– Varies on the volume group in active state on the joining node.
– Activates this volume group in passive state on all other nodes in the cluster.
Note: The switch between active and passive states is necessary to prevent mounting file systems on
more than one node at a time.
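The active and passive transitions described above can be summarized as a small state model. This Python sketch is a simplified illustration, not PowerHA SystemMirror code; the node names and the function name are assumptions.

```python
def varyon_states(nodes, owner):
    """Return the varyon state of the volume group on each node.

    The node that owns the resource group holds the volume group in
    active state; every other node in the cluster holds it in passive state.
    """
    if owner not in nodes:
        raise ValueError(f"{owner} is not a cluster node")
    return {node: ("active" if node == owner else "passive") for node in nodes}


nodes = ["NodeA", "NodeB", "NodeC"]  # hypothetical cluster

# Cluster startup: NodeA owns the resource group.
startup = varyon_states(nodes, "NodeA")

# Fallover: NodeA releases the resource group and NodeB acquires it.
after_fallover = varyon_states(nodes, "NodeB")

# Reintegration: the resource group falls back to the joining NodeA.
after_reintegration = varyon_states(nodes, "NodeA")

for label, states in [("startup", startup),
                      ("fallover", after_fallover),
                      ("reintegration", after_reintegration)]:
    print(label, states)
```

In every step exactly one node is in active state, which is the property that prevents file systems from being mounted on more than one node at a time.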
Using quorum
Quorum ensures that more than half of the physical disks in a volume group are available.
Quorum does not keep track of logical volume mirrors, and is therefore not a useful way to ensure data
availability. You can lose quorum when you still have all your data. Conversely, you can lose access to
some of your data, and not lose quorum.
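The following Python sketch illustrates why quorum and data availability are independent. It is a deliberately simplified model that follows the "more than half of the physical disks" description above; the disk names and the mirror layouts are hypothetical.

```python
def has_quorum(available, all_disks):
    """Quorum holds when more than half of the disks are available."""
    return len(available & all_disks) > len(all_disks) / 2


def data_reachable(copies, available):
    """Data is reachable when at least one mirror copy has all its disks."""
    return any(copy <= available for copy in copies)


all_disks = {"hdisk1", "hdisk2", "hdisk3"}

# Case 1: one copy on hdisk1, the other copy spread over hdisk2 and hdisk3.
# Losing hdisk2 and hdisk3 loses quorum, yet hdisk1 holds a complete copy.
mirrored = [{"hdisk1"}, {"hdisk2", "hdisk3"}]
survivors = {"hdisk1"}
case1 = (has_quorum(survivors, all_disks), data_reachable(mirrored, survivors))

# Case 2: an unmirrored logical volume lives only on hdisk3.
# Losing hdisk3 keeps quorum, but that data is no longer reachable.
unmirrored = [{"hdisk3"}]
survivors = {"hdisk1", "hdisk2"}
case2 = (has_quorum(survivors, all_disks), data_reachable(unmirrored, survivors))

print("case 1 (quorum, data):", case1)
print("case 2 (quorum, data):", case2)
```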
Quorum is beneficial for volume groups on RAID arrays, such as the ESS and IBM TotalStorage DS
Series. The RAID device provides data availability and recovery from loss of a single disk. Mirroring is
typically not used for volume groups contained entirely within a single RAID device. If a volume group
is mirrored between RAID devices, forced varyon can bring a volume group online despite loss of one of
the RAID devices.
Decide whether to enable or disable quorum for each volume group. The following table shows how
quorum affects when volume groups varyon and off:
Quorum checking is enabled by default. You can disable quorum by using the chvg -Qn vgname
command, or by using the smit chvg fastpath.
Related information:
chvg command
Quorum must be enabled for a PowerHA SystemMirror concurrent access configuration. Disabling
quorum could result in data corruption. Any concurrent access configuration where multiple failures
could result in no common shared disk between cluster nodes has the potential for data corruption or
inconsistency.
The following figure shows a cluster with two sets of IBM disk subsystems configured for no single point
of failure. The logical volumes are mirrored across subsystems and each disk subsystem is connected to
each node with separate NICs.
Suppose that multiple failures result in a communications loss between each node and one set of disks in such a way
that node A can access subsystem 1 but not subsystem 2, and node B can access subsystem 2 but not
subsystem 1. Both nodes continue to operate on the same baseline of data from the mirrored copy they
can access. However, each node does not see modifications made by the other node to data on disk. As a
result, the data becomes inconsistent between nodes.
With quorum protection enabled, the communications failure results in one or both nodes varying off the
volume group. Although an application does not have access to data on the volume group that is varied
off, data consistency is preserved.
PowerHA SystemMirror selectively provides recovery for nonconcurrent resource groups (with the
startup policy not Online on All Available Nodes) that are affected by failures of specific resources.
PowerHA SystemMirror automatically reacts to an LVM_SA_QUORCLOSE loss-of-quorum error
If the AIX Logical Volume Manager takes a volume group in the resource group offline due to a loss of
quorum for the volume group on the node, PowerHA SystemMirror selectively moves the resource group
to another node. You can change this default behavior by customizing resource recovery to use a notify
method instead of fallover.
PowerHA SystemMirror launches selective fallover and recovers the affected resource groups in response
to an LVM_SA_QUORCLOSE error. This error is generated by AIX LVM for specific error conditions,
even if the volume group is not defined as quorum enabled. AIX LVM might also generate other types of
error notifications; however, PowerHA SystemMirror does not react to these by default. In these cases, you
still need to configure customized error notification methods, or use AIX automatic error notification
methods to react to volume group failures.
You can use the rootvg system event to monitor the loss of access to rootvg. If the system loses access,
PowerHA SystemMirror logs an event in the system error log and reboots the system by default. You can
change this setting using SMIT to log an event but not reboot the system.
You can monitor the rootvg events only when the rootvg disk is using the native AIX Multipath I/O
(MPIO) driver and the rootvg disk is not an internal parallel SCSI disk. To verify whether the rootvg disk
is using the MPIO driver, on the command line, type lspath -l hdiskname, where hdiskname is the name
of the rootvg disk. If the rootvg disk is not using the MPIO driver, the following error message is
displayed:
lspath: 0514-538 Cannot perform the requested function because the
specified device does not support multiple paths.
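The lspath check described above can be scripted. The following Python sketch is a hedged illustration: the function names are hypothetical, and it assumes only that the 0514-538 message shown above appears somewhere in the command output when the disk does not use the MPIO driver.

```python
import subprocess

MPIO_UNSUPPORTED_CODE = "0514-538"  # message number quoted in the text above


def supports_mpio(lspath_output):
    """Return True unless the output carries the 'does not support
    multiple paths' message."""
    return MPIO_UNSUPPORTED_CODE not in lspath_output


def check_rootvg_disk(hdiskname):
    """Run 'lspath -l hdiskname' on an AIX node and interpret the result.

    Both output streams are inspected because this sketch does not
    assume which one carries the message.
    """
    result = subprocess.run(
        ["lspath", "-l", hdiskname],
        capture_output=True,
        text=True,
    )
    return supports_mpio(result.stdout + result.stderr)


sample_error = (
    "lspath: 0514-538 Cannot perform the requested function because the\n"
    "specified device does not support multiple paths."
)
print(supports_mpio(sample_error))
```

On a cluster node you would call check_rootvg_disk with the name of the rootvg disk, for example the disk that lspv reports for rootvg.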
Related information:
Error notification method used for volume group loss
PowerHA SystemMirror monitoring system events
Selective fallover for handling resource groups
Forcing a varyon of a volume group lets you keep a volume group online as long as one valid
copy of the data is available. Use a forced varyon only for volume groups that have mirrored logical
volumes.
Note: Use caution when using this function to avoid creating a partitioned cluster.
You can use SMIT to force a varyon of a volume group on a node if the normal varyon command fails on
that volume group due to a lack of quorum but with one valid copy of the data available. Using SMIT to
force a varyon is useful for local disaster recovery, when data is mirrored between two disk enclosures
and one of the disk enclosures becomes unavailable.
Note: You can specify a forced varyon attribute for volume groups on SCSI disks that use logical volume
manager (LVM) mirroring, and for volume groups that are mirrored between separate RAID or ESS
devices.
If you want to force the volume group to varyon when disks are unavailable, use varyonvg -f, which
forces the volume group to varyon, whether or not there are copies of your data. You can specify forced
varyon in SMIT for volume groups in a resource group.
If you are using forced varyon, it is important that multiple network connections exist between the nodes
that share the storage. Multiple network connections help to ensure that each node always has a
communication path to the other nodes, even if one network fails. Having multiple network connections
prevents your cluster from becoming partitioned. Otherwise, a network failure might cause nodes to
attempt to take over resource groups that are still active on other nodes. In this situation, if you have set
a forced varyon setting, you might experience data loss or divergence.
For NFS to work as expected on a PowerHA SystemMirror cluster, there are specific configuration
requirements. Therefore, you must plan for the following tasks:
v Creating shared volume groups
v Exporting NFS file systems
v NFS mounting and fallover
The PowerHA SystemMirror scripts handle default NFS behavior. You might need to modify the scripts
to handle your particular configuration.
You can configure NFS in all resource groups that behave as nonconcurrent; that is, they do not have an
Online on All Available Nodes startup policy.
After NFS file systems become part of resource groups that belong to an active PowerHA SystemMirror
cluster, PowerHA SystemMirror takes care of cross-mounting and unmounting the file systems during
cluster events (such as fallover of a resource group that contains the file system to another node in the
cluster).
If for some reason you stop the cluster services and must manage the NFS file systems manually, the file
systems must be unmounted before you restart the cluster services. This enables management of NFS file
systems by PowerHA SystemMirror after the nodes join the cluster.
When NFS clients use NFS locking to arbitrate access to the shared NFS file system, there is a limit of
two nodes per resource group. Each resource group that uses reliable NFS contains one pair of PowerHA
SystemMirror nodes.
Independent pairs of nodes in the cluster can provide Reliable NFS services. For example, in a four-node
cluster, you can set up two NFS client and server pairs: Node A and Node B can provide one
set of Reliable NFS services, and Node C and Node D can provide another set of Reliable NFS services.
Pair 1 can provide reliable NFS services for one set of NFS file systems, and pair 2 can provide reliable
NFS services to another set of NFS file systems. This is true whether or not NFS cross-mounting is
configured. PowerHA SystemMirror does not impose a limit to the number of resource groups or NFS file
systems as long as the nodes participating in the resource groups follow the constraints outlined in this
example.
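The two-node limit for resource groups whose clients rely on NFS locking can be checked against a planning worksheet. This Python sketch is illustrative only; the resource group names and the data layout are assumptions.

```python
def oversized_locking_groups(resource_groups):
    """Return the names of resource groups that use NFS locking
    but list more than two nodes.

    resource_groups maps a group name to (node_list, uses_nfs_locking).
    """
    return [
        name
        for name, (nodes, uses_locking) in resource_groups.items()
        if uses_locking and len(nodes) > 2
    ]


# Hypothetical four-node plan with two independent Reliable NFS pairs.
plan = {
    "rg_nfs1": (["NodeA", "NodeB"], True),
    "rg_nfs2": (["NodeC", "NodeD"], True),
    "rg_bad": (["NodeA", "NodeB", "NodeC"], True),
    "rg_app": (["NodeA", "NodeB", "NodeC", "NodeD"], False),
}
print(oversized_locking_groups(plan))
```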
To ensure that the IP address that is going to be used by NFS always resides on the node, you can:
v Use an IP address that is associated with a persistent label
v For an IPAT via aliases configuration, use the IP address used at boot time
v Use an IP address that resides on an interface that is not controlled by PowerHA SystemMirror
In the event of node failure, NFS clients attached to a PowerHA SystemMirror cluster operate the same
way as when a standard NFS server fails and reboots. Accesses to the file systems hang and then recover
when the file systems become available again. However, if the major numbers are not the same, when
another cluster node takes over the file system and re-exports the file system, the client application will
not recover. The client application will not recover because the file system exported by the node will
appear to be different from the one exported by the failed node.
Keep in mind the following points when planning for exporting NFS file systems and directories in
PowerHA SystemMirror:
v NFS file systems and directories to export:
In AIX, you specify the NFS file systems and directories to export by using the smit mknfsexp
command (which creates the /etc/exports file). In PowerHA SystemMirror, you specify NFS file systems
and directories to export by including them in a resource group in PowerHA SystemMirror.
v Export options for NFS exported file systems and directories:
If you want to specify special options for exporting NFS in PowerHA SystemMirror, you can create a
/usr/es/sbin/cluster/etc/exports file. This file has the same format as the regular AIX /etc/exports file.
To ensure the best NFS performance, NFS file systems used by PowerHA SystemMirror should include
the entry vers = <version number> in the options field in the /etc/filesystems file.
Related reference:
“NFS cross-mounting and IP labels” on page 54
To enable NFS cross-mounting, each cluster node might act as an NFS client. Each of these nodes must
have a valid route to the service IP label of the NFS server node. That is, to enable NFS cross-mounting,
an IP label must exist on the client nodes, and this IP label must be configured on the same subnet as the
service IP label of the NFS server node.
When your environment uses NFS version 2 and version 3 to export in resource groups, the cross-mount
capability is restricted to two-node resource groups. If the resource group contains only NFS version
4 (or later) exports, the cross-mount capability is extended up to any number of nodes that are
supporting a resource group.
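The version rule for cross-mounting can be expressed as a small predicate. This is a planning aid sketched in Python under the rule stated above; the function name and arguments are hypothetical.

```python
def cross_mount_supported(export_versions, node_count):
    """Apply the cross-mount rule for a resource group.

    export_versions: NFS protocol versions of the exports in the group.
    node_count: number of nodes that support the resource group.
    """
    if not export_versions:
        raise ValueError("the resource group has no NFS exports")
    if all(version >= 4 for version in export_versions):
        return True  # only version 4 (or later) exports: any number of nodes
    return node_count == 2  # version 2 or 3 exports: two-node groups only


print(cross_mount_supported([3], 2))
print(cross_mount_supported([3, 4], 3))
print(cross_mount_supported([4], 4))
```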
Each node in the resource group is part of a mutual takeover (or active-active) cluster configuration,
providing and mounting an NFS file system.
Applications access the NFS file systems on any node that is part of the resource group.
When a fallover occurs for a resource group that is configured with IP address takeover, the NFS file
system is locally mounted by the takeover node and re-exported. All other nodes in the resource group
maintain their NFS file system mount.
In an NFS cross-mount configuration, any NFS mount that is defined in a resource group must have a
corresponding NFS export in a resource group. If your NFS cross-mounts do not follow this
configuration, the following message is displayed:
claddres: WARNING: NFS mounts were specified for the resource group
’<RG name>’;however no NFS exports have been specified.
If an NFS cross-mount field contains a value, the corresponding NFS exported file system must also
contain a value.
To enable NFS cross-mounting, each cluster node might act as an NFS client. Each of these nodes must
have a valid route to the service IP label of the NFS server node. That is, to enable NFS cross-mounting,
an IP label must exist on the client nodes, and this IP label must be configured on the same subnet as the
service IP label of the NFS server node.
If the NFS client nodes have service IP labels on the same network, this is not an issue. However, in
certain cluster configurations, you need to create a valid route.
The easiest way to ensure access to the NFS server is to have an IP label on the client node that is on the
same subnet as the service IP label of the NFS server node.
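Whether a client IP label and the service IP label share a subnet can be verified with the Python standard library. The addresses and prefix lengths below are made-up examples, and the function name is hypothetical.

```python
import ipaddress


def on_same_subnet(client_interface, service_interface):
    """Compare the networks of two addresses given in address/prefix form."""
    client = ipaddress.ip_interface(client_interface)
    service = ipaddress.ip_interface(service_interface)
    return client.network == service.network


# Hypothetical addresses: the service IP label of the NFS server node
# and two candidate IP labels on the client node.
service_label = "192.168.10.100/24"
print(on_same_subnet("192.168.10.21/24", service_label))
print(on_same_subnet("192.168.20.21/24", service_label))
```

If the check fails for every IP label on the client node, one of the two configurations listed below (a separate NIC or a persistent node IP label on the service IP subnet) provides the route.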
To create a valid route between the NFS client node and the node that is exporting the file system, you
can configure your environment in either of the following ways:
v A separate NIC with an IP label configured on the service IP network and subnet
v A persistent node IP label on the service IP network and subnet.
Be aware that these solutions do not provide automatic root permissions to the file systems because of
the export options for NFS file systems that are set in PowerHA SystemMirror by default.
To enable root level access to NFS-mounted file systems on the client node, add all of the node's IP labels
or addresses to the root = option in the cluster exports file: /usr/es/sbin/cluster/etc/exports. You can do
this on one node, because synchronizing the cluster resources propagates this information to the other
cluster nodes.
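As an illustration of the root = option, the following Python sketch builds one line for the cluster exports file. It assumes the usual AIX /etc/exports layout of a directory followed by a hyphen-prefixed option list with colon-separated host names; the directory and the IP labels are hypothetical, so verify the result against the exports file documentation before you use it.

```python
def root_export_line(directory, ip_labels):
    """Build an exports entry that grants root access to the given labels."""
    if not ip_labels:
        raise ValueError("at least one IP label or address is required")
    return f"{directory} -root={':'.join(ip_labels)}"


# Hypothetical IP labels of the client node, including its persistent label.
labels = ["nodeb_boot", "nodeb_persistent", "nodeb_svc"]
print(root_export_line("/fs1", labels))
```

The resulting line would go into /usr/es/sbin/cluster/etc/exports on one node, after which synchronizing the cluster resources propagates it to the other nodes.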
Related reference:
“Exporting NFS file systems and directories” on page 52
The process of exporting NFS file systems and directories in PowerHA SystemMirror is different from
that in the AIX operating system.
Stable storage is a file system space that is used to save the state information by the NFS version 4 server.
This information is crucial for maintaining the state of NFS version 4 clients and for facilitating a smooth and
transparent fallover, fallback, or move of the resource group from one node to another.
In this example, Node A currently hosts a nonconcurrent resource group, RG1, which includes /fs1 as
an exported NFS file system and service1 as a service IP label.
In this example, Node B currently hosts a nonconcurrent resource group, RG2, which includes /fs2 as an
exported NFS file system and service2 as a service IP label. On reintegration, /fs1 is passed back to
Node A, locally mounted, and exported. Node B mounts it over NFS again.
In this scenario:
v Node A locally mounts and exports /fs1, then over-mounts on /mnt1.
v Node B NFS-mounts /fs1, on /mnt1 from Node A.
Setting up a resource group like this ensures the expected default node-to-node NFS behavior.
When Node A fails, Node B closes any open files in Node A: /fs1, unmounts it, mounts it locally, and
re-exports it to waiting clients.
Both resource groups contain both nodes as possible owners of the resource groups.
PowerHA SystemMirror handles NFS mounting in nonconcurrent resource groups in a couple of ways.
These include:
v The node that currently owns the resource group mounts the file system over the file system's local
mount point, and this node exports the NFS file system.
v All the nodes in the resource group (including the current owner of the group) mount the NFS file
system over a different mount point.
Therefore, the owner of the group has the file system mounted twice. One file system is mounted as a
local mount and the other is an NFS mount.
Note: The NFS mount point must be outside the directory tree of the local mount point.
Because IPAT is used in resource groups that have NFS-mounted file systems, the nodes will not
unmount and remount NFS file systems during a fallover. When the resource group falls over to a new
node, the acquiring node locally mounts the file system and NFS exports it. (The NFS-mounted file
system is temporarily unavailable to cluster nodes during fallover.) As soon as the new node acquires the
IPAT label, access to the NFS file system is restored.
All applications must refer to the file system through the NFS-mounted file system. If the applications
used must always reference the file system by the same mount point name, you can change the mount
point for the local file system mount (for example, change it to mount point_local and use the previous
local mount point as the new NFS mount point).
The default options used by PowerHA SystemMirror when performing NFS mounts are hard, intr.
An NFS mount point is required to mount a file system using NFS. In a nonconcurrent resource group all
the nodes in the resource group mount the NFS file system. You create an NFS mount point on each node
in the resource group. The NFS mount point must be outside the directory tree of the local mount point.
To create NFS mount points and to configure the resource group for the NFS mount:
1. On each node in the resource group, create an NFS mount point by executing the following
command:
mkdir /mount point
where mount point is the name of the local NFS mount point over which the remote file system is
mounted.
2. In the Change/Show Resources and Attributes for a Resource Group SMIT panel, the Filesystem to
NFS Mount field must specify both mount points.
Specify the NFS mount point and the local mount point, separating the two with a semicolon. For
example:
/nfspoint;/localpoint
If there are more entries, separate them with a space:
/nfspoint1;/local1 /nfspoint2;/local2
3. Optional: If there are nested mount points, nest the NFS mount points in the same manner as the
local mount points so that they match correctly.
4. Optional: When cross-mounting NFS file systems, set the Filesystems Mounted before IP Configured
field in SMIT for the resource group to true.
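The format of the Filesystem to NFS Mount field in step 2 can be checked before you enter it in SMIT. The following Python sketch parses the field and verifies that each NFS mount point is outside the directory tree of its local mount point. The function names are hypothetical, and the check is a planning aid, not part of PowerHA SystemMirror.

```python
import posixpath


def parse_nfs_mount_field(value):
    """Split 'nfs;local nfs;local ...' into (nfs_point, local_point) pairs."""
    pairs = []
    for entry in value.split():
        nfs_point, local_point = entry.split(";")
        pairs.append((nfs_point, local_point))
    return pairs


def nfs_point_outside_local(nfs_point, local_point):
    """Return True if the NFS mount point is not the local mount point
    and does not lie below it."""
    nfs = posixpath.normpath(nfs_point)
    local = posixpath.normpath(local_point)
    return nfs != local and not nfs.startswith(local.rstrip("/") + "/")


field = "/nfspoint1;/local1 /nfspoint2;/local2"
for nfs_point, local_point in parse_nfs_mount_field(field):
    print(nfs_point, local_point, nfs_point_outside_local(nfs_point, local_point))
```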
In Initial cluster planning, you made preliminary choices about the resource group policies and the
takeover priority for each node in the resource group node lists. In this section you do the following:
v Identify the individual resources that constitute each resource group.
v For each resource group, identify which type of group it is: concurrent or nonconcurrent.
v Define the participating node list for the resource groups. The node list consists of the nodes assigned
to participate in the takeover of a given resource group.
v Identify the resource group startup, fallover, and fallback policy.
v Identify applications and their resource groups for which you want to set up location dependencies,
parent-child dependencies, or both.
v Identify the intersite management policies of the resource groups. Are there replicated resources to
consider?
v Identify other attributes and runtime policies to refine resource group behavior.
The following rules and restrictions apply to resources and resource groups:
v In order for PowerHA SystemMirror to keep a cluster resource highly available, it must be part of a
resource group. If you want a resource to be kept separate, define a group for that resource alone. A
resource group can have one or more resources defined.
v A resource cannot be included in more than one resource group.
v The components of a resource group must be unique. Put the application along with the resources it
requires in the same resource group.
v The service IP labels, volume groups, and resource group names must be both unique within the
cluster and distinct from each other. The name of a resource should relate to the application it serves,
as well as to any corresponding device, such as websphere_service_address.
v If you include the same node in participating node lists for more than one resource group, make sure
that the node has the memory and network interfaces necessary to manage all resource groups
simultaneously.
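The rule that a resource cannot be included in more than one resource group is easy to check against a planning worksheet. This Python sketch is illustrative; the resource and group names are hypothetical.

```python
from collections import defaultdict


def shared_resources(resource_groups):
    """Return resources that appear in more than one resource group,
    mapped to the groups that contain them."""
    owners = defaultdict(list)
    for group, resources in resource_groups.items():
        for resource in resources:
            owners[resource].append(group)
    return {res: groups for res, groups in owners.items() if len(groups) > 1}


# Hypothetical plan in which datavg1 is listed twice by mistake.
plan = {
    "rg_web": ["websphere_service_address", "datavg1", "web_app"],
    "rg_db": ["db_service_address", "datavg1", "db_app"],
}
print(shared_resources(plan))
```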
A concurrent resource group can be online on multiple nodes. All nodes in the node list of the resource
group acquire that resource group when they join the cluster. There are no priorities among nodes.
Concurrent resource groups can be configured to run on all nodes in the cluster.
The only resources included in a concurrent resource group are volume groups with raw logical volumes,
raw disks, and application controllers that use the disks. The device on which these logical storage
entities are defined must support concurrent access.
Concurrent resource groups have the Online on All Available Nodes startup policy and do not fallover or
fallback from one node to another.
Nonconcurrent resource groups cannot be online on multiple nodes. You can define a variety of startup,
fallover, and fallback policies for these resource groups.
You can fine-tune the nonconcurrent resource group behavior for node preferences during a node startup,
resource group fallover to another node in the case of a node failure, or when the resource group falls
back to the reintegrating node.
PowerHA SystemMirror allows you to configure only valid combinations of startup, fallover, and fallback
behaviors for resource groups. The following table summarizes the basic startup, fallover, and fallback
behaviors you can configure for resource groups in PowerHA SystemMirror.
In addition to the node policies described in the previous table, other issues might determine the resource
groups that a node acquires.
Related reference:
“Planning for cluster events” on page 79
These topics describe the PowerHA SystemMirror cluster events.
The following table summarizes which resource group startup, fallover or fallback policies are affected by
a given attribute or run-time policy. Not every available resource group attribute is listed below.
Related reference:
“Parent and child dependent resource groups” on page 64
Related applications in different resource groups are configured to be processed in logical order.
“Resource-group location dependencies” on page 65
Certain applications in different resource groups stay online together on a node or stay online on
different nodes.
Related information:
PowerHA SystemMirror resources and resource groups
When a settling time is specified, you can avoid having a resource group activated on the first available
node when multiple nodes are starting cluster services at the same time. Also, a higher priority node for
the resource group might join the cluster during this time period.
Settling time lets the cluster manager wait for a specified amount of time before activating a resource
group. Use this attribute to ensure that a resource group does not bounce among nodes, while nodes with
increasing priority for the resource group are brought online.
If the node that is starting is the first node in the node list for this resource group, the settling time
period is skipped and PowerHA SystemMirror immediately attempts to acquire the resource group on
this node.
Note: If a settling time period is specified for a resource group and a resource group is currently in the
ERROR state, the cluster manager waits for the settling time period before attempting to bring the
resource group online during a node_up event.
In general, when a node joins the cluster, it can acquire resource groups. The following list describes the
role of the settling time in this process:
v If the node is the highest priority node for a specific resource group, the node immediately acquires
that resource group and the settling time is ignored. This is the only circumstance under which
PowerHA SystemMirror ignores the setting.
v If the node is able to acquire some resource groups, but is not the highest priority node for those
groups, the resource groups are not acquired on that node. Instead, they wait during the settling time
interval to see whether a higher priority node joins the cluster.
When the settling time interval ends, PowerHA SystemMirror moves the resource group to the highest
priority node that is currently available and that can take the resource group. If PowerHA SystemMirror
does not find appropriate nodes, the resource group remains offline.
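The settling time behavior described above can be sketched as two decisions: what happens when a node joins, and where the resource group goes when the interval ends. This Python model is a simplification with hypothetical names; it ignores other conditions that affect acquisition, such as network availability.

```python
def on_node_join(joining_node, node_list):
    """Decide what happens when a node joins while a settling time is set.

    node_list is ordered from highest to lowest priority.
    """
    if joining_node == node_list[0]:
        return "acquire_now"  # highest priority node: settling time is ignored
    return "wait_for_settling_time"


def after_settling_time(node_list, available_nodes):
    """Pick the highest priority node that is currently available,
    or None if the resource group must remain offline."""
    for node in node_list:
        if node in available_nodes:
            return node
    return None


node_list = ["NodeA", "NodeB", "NodeC"]  # hypothetical priority order

print(on_node_join("NodeC", node_list))
print(on_node_join("NodeA", node_list))
print(after_settling_time(node_list, {"NodeB", "NodeC"}))
print(after_settling_time(node_list, set()))
```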
You can use node distribution policy for cluster startup to ensure that PowerHA SystemMirror activates
only one resource group that has this policy enabled on each node. This policy helps you distribute your
CPU-intensive applications on different nodes.
Setting a dynamic node priority policy allows you to use a predefined Resource Monitoring and Control
(RMC) resource variable, such as lowest CPU load, to select the takeover node. With a dynamic priority
policy enabled, the order of the takeover node list is determined by the state of the cluster at the time of
the event, as measured by the selected RMC resource variable. You can set different policies for different
groups or the same policy for several groups.
Another option is to use a user-defined dynamic node priority variable such as cl_highest_udscript_rc.
If you use this option, you must provide a script and an execution timeout value. The script is invoked on all
candidate fallover nodes at the time of the event. Return values are collected from each node, and the takeover
node is selected based on the return values of the script and the selected dynamic node priority variable.
Remember that selecting a takeover node also depends on such conditions as the availability of a
network interface on that node.
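The selection step of a dynamic node priority policy can be pictured as choosing the candidate with the best measured value. The sketch below is a simplified model under stated assumptions: the function name and the prefer parameter are hypothetical, nodes without a measurement (for example, a script that timed out) are excluded, and ties fall back to the order of the candidate list.

```python
from typing import Dict, List, Optional


def select_takeover_node(candidates: List[str],
                         metric: Dict[str, float],
                         prefer: str = "lowest") -> Optional[str]:
    """Pick the takeover node from candidates by a measured value.

    candidates: eligible nodes in node list order.
    metric: node -> value of the chosen variable (CPU load, script return value).
    prefer: "lowest" (for example, lowest CPU load) or "highest".
    """
    if prefer not in ("lowest", "highest"):
        raise ValueError("prefer must be 'lowest' or 'highest'")
    usable = [node for node in candidates if node in metric]
    if not usable:
        return None
    chooser = min if prefer == "lowest" else max
    # min and max return the first best element, so ties follow candidate order.
    return chooser(usable, key=lambda node: metric[node])
```

For example, with CPU loads of 70.0 on nodeB and 20.0 on nodeC, the lowest-load policy selects nodeC. With script return values of 3 and 1 and prefer set to "highest", it selects nodeB.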
You can use a delayed fallback timer to set the time for a resource group to fall back to a higher priority
node. You can configure the fallback behavior for a resource group to occur at a predefined recurring
time (daily, weekly, monthly, or a specific date).
The resource group does not immediately fall back to its higher priority node when both of the following
conditions are true:
v You have configured a delayed fallback timer for a resource group.
v A higher priority node joins the cluster.
At the time specified in the Delayed Fallback Timer attribute, one of two scenarios takes place:
v A higher priority node is found. If a higher priority node is available for the resource group, PowerHA
SystemMirror attempts to move the resource group to this node when the fallback timer expires. If the
acquisition is successful, the resource group is acquired on that node.
However, if the acquisition of the resource group on the node fails, PowerHA SystemMirror attempts
to move the resource group to the next higher priority node in the group node list, and so on. If the
acquisition of the resource group on the last node that is available fails, the resource group goes into
an error state. You must take action to fix the error and bring such a resource group back online.
v A higher priority node is not found. If there are no higher priority nodes available for a resource group,
the resource group remains online on the same node until the fallback timer expires again. For example,
if a daily fallback timer expires at 11:00 p.m. and there are no higher priority nodes available for the
resource group to fall back to, the fallback timer recurs the next night at 11:00 p.m.
A fallback timer that is set to a specific date does not recur.
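The two scenarios that occur when the fallback timer expires can be modeled as follows. This is a hedged sketch, not product logic: the function name, the try_acquire callback, and the returned state strings are hypothetical, and the model assumes that after a failed acquisition every available node in the node list is tried in priority order.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def on_fallback_timer(node_list: List[str],
                      current_node: str,
                      available: Iterable[str],
                      try_acquire: Callable[[str], bool]) -> Tuple[Optional[str], str]:
    """Return (hosting node, state) after the fallback timer expires.

    node_list is ordered from highest to lowest priority and must
    contain current_node.
    """
    up = set(available)
    candidates = [n for n in node_list if n in up or n == current_node]
    if candidates[0] == current_node:
        # No higher priority node is available: stay put until the timer recurs.
        return current_node, "ONLINE"
    for node in candidates:
        if try_acquire(node):
            return node, "ONLINE"
    # Acquisition failed on the last available node.
    return None, "ERROR"
```

A daily timer would call this function once per day. A timer that is set to a specific date would call it once.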
Keep the following points in mind when planning how to configure these dependencies:
v Although by default all resource groups are processed in parallel, PowerHA SystemMirror processes
dependent resource groups according to the order dictated by the dependency, and not necessarily in
parallel. Resource group dependencies are honored cluster wide and override any customization for
serial order of processing of any resource groups included in the dependency.
v Dependencies between resource groups offer a predictable and reliable way of building clusters with
multitiered applications.
The following limitations apply to configurations that combine dependencies. Verification fails if more
than one resource group belongs to both an Online on Same Node dependency set and an Online On
Different Nodes dependency set at the same time.
Related applications in different resource groups are configured to be processed in logical order.
Configuring a resource group dependency allows for better control for clusters with multitiered
applications where one application depends on the successful startup of another application, and both
applications are required to be kept highly available with PowerHA SystemMirror.
Consider the following when planning for parent-child dependent resource groups:
v Plan applications you need to keep highly available and consider whether your business environment
requires one application to be running before another application can be started.
Certain applications in different resource groups stay online together on a node or stay online on
different nodes.
If failures do occur over the course of time, PowerHA SystemMirror distributes resource groups so that
they remain available, but not necessarily on the nodes you originally specified, unless they have the
same home node and the same fallover and fallback policies.
Resource-group location dependency offers you an explicit way to specify that certain resource groups
will always be online on the same node, or that certain resource groups will always be online on different
nodes. You can combine these location policies with parent-child dependencies or start after dependencies
and stop after dependencies to have all child or source resource groups online on the same node while
the parent or target is online on a different node. You can also have all child or source resource groups
online on different nodes for better performance.
Note: Sites are supported only in PowerHA SystemMirror 7.1.2, or later, in both the Enterprise Edition
and the Standard Edition. Replication management is supported only in PowerHA SystemMirror
Enterprise Edition.
If you have replicated resources, you can combine resource groups into a site dependency to keep them
online at the same site.
PowerHA SystemMirror supports the following types of resource-group location dependencies between
resource groups:
v Online on same node
The following rules and restrictions apply to the Online on same node dependency set of resource
groups. Verification will fail if you do not follow these guidelines:
– All resource groups configured as part of a given same node dependency set must have the same
node list (the same nodes in the same order).
– All nonconcurrent resource groups in the same node dependency set must have the same startup,
fallover, and fallback policies.
– Online Using Node Distribution Policy is not allowed for startup.
– If a Dynamic Node Priority Policy is configured as a Fallover Policy, all resource groups in the set
must have the same policy.
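The four rules for an Online on same node dependency set lend themselves to a simple pre-check before you run cluster verification. The sketch below is an assumption-laden illustration: the RG class, its field names, and the message texts are hypothetical, and the policy values are plain strings rather than anything read from a real cluster configuration.

```python
from dataclasses import dataclass
from typing import List, Optional

NODE_DISTRIBUTION = "Online Using Node Distribution Policy"


@dataclass
class RG:
    name: str
    node_list: List[str]
    startup: str
    fallover: str
    fallback: str
    concurrent: bool = False
    dnp_policy: Optional[str] = None  # dynamic node priority policy, if any


def verify_same_node_set(groups: List[RG]) -> List[str]:
    """Return a list of rule violations for one same node dependency set."""
    errors = []
    # Rule 1: same nodes in the same order.
    if len({tuple(g.node_list) for g in groups}) > 1:
        errors.append("node lists differ")
    # Rule 2: nonconcurrent groups share startup, fallover, and fallback policies.
    nonconcurrent = [g for g in groups if not g.concurrent]
    if len({(g.startup, g.fallover, g.fallback) for g in nonconcurrent}) > 1:
        errors.append("startup, fallover, or fallback policies differ")
    # Rule 3: node distribution policy is not allowed for startup.
    if any(g.startup == NODE_DISTRIBUTION for g in groups):
        errors.append("node distribution startup policy is not allowed")
    # Rule 4: if any group uses dynamic node priority, all use the same policy.
    policies = {g.dnp_policy for g in groups}
    if policies != {None} and len(policies) > 1:
        errors.append("dynamic node priority policies differ")
    return errors
```

An empty result means that the set passes these four checks. It does not replace the product's own verification.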
In start after dependencies, the target resource group must be online on any node in the cluster before a
source (dependent) resource group can be activated on a node. There is no dependency when releasing
resource groups and the groups are released in parallel.
The following are guidelines and limitations for start after dependencies.
v A resource group can serve as both a target and a source resource group, depending on which end of a
given dependency link it is placed.
v You can specify three levels of dependencies for resource groups.
v You cannot specify circular dependencies between resource groups.
v This dependency applies only at the time of resource group acquisition. There is no dependency
between these resource groups during resource group release.
v A source resource group cannot be acquired on a node until its target resource group is fully
functional. If the target resource group does not become fully functional, the source resource group
goes into an offline due to target offline state. If you notice that a resource group is in this state, you
might need to troubleshoot which resources might need to be brought online manually to resolve the
resource group dependency.
v When a resource group in a target role falls over from one node to another, there will be no effect on
the resource groups that depend on it.
v After the source resource group is online, any operation (bring offline, move resource group) on the
target resource group does not affect the source resource group.
v If the target resource group is offline, you cannot manually move the source resource group or bring it
online.
Note: You should configure several application monitors, especially a monitor that checks the application
startup for the application that is included in the target resource groups. This process verifies that the
application in the target resource group starts successfully.
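Two of the start after guidelines, the ban on circular dependencies and the limit of three levels, can be checked mechanically. The following sketch is illustrative only. The function names are hypothetical, and it interprets "three levels" as at most three resource groups in the longest dependency chain, which is an assumption about how levels are counted.

```python
from typing import Dict, Set


def dependency_depth(deps: Dict[str, Set[str]]) -> int:
    """Return the number of resource groups in the longest dependency chain.

    deps maps a source resource group to the set of its target resource groups.
    Raises ValueError if the dependencies are circular.
    """
    visiting: Set[str] = set()
    depth: Dict[str, int] = {}

    def visit(rg: str) -> int:
        if rg in depth:
            return depth[rg]
        if rg in visiting:
            raise ValueError(f"circular dependency involving {rg}")
        visiting.add(rg)
        level = 1 + max((visit(t) for t in deps.get(rg, ())), default=0)
        visiting.discard(rg)
        depth[rg] = level
        return level

    return max((visit(rg) for rg in deps), default=0)


def verify_start_after(deps: Dict[str, Set[str]], max_levels: int = 3) -> bool:
    """True if the dependencies are acyclic and within max_levels (assumed 3)."""
    return dependency_depth(deps) <= max_levels
```

A chain in which web starts after app and app starts after db has a depth of three in this model. Adding a fourth group to the chain would exceed the assumed limit.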
In stop after dependencies, the target resource group must be offline on any node in the cluster before a
source (dependent) resource group can be brought offline on a node. There is no dependency when
acquiring resource groups and the groups are acquired in parallel.
The following are limitations and guidelines for stop after dependencies.
v A resource group can serve as both a target and a source resource group, depending on which end of a
given dependency link it is placed.
v You can specify three levels of dependencies for resource groups.
v You cannot specify circular dependencies between resource groups.
v This dependency applies only at the time a resource group is released. There is no dependency
between these resource groups during resource group acquisition.
v A source resource group cannot be released on a node until its target resource group is offline.
v When a resource group in a source role falls over from one node to another, first the target resource
group is released and then the source resource group is released. After that, both resource groups are
acquired in parallel, assuming that there is no start after or parent-child dependency between these
resource groups.
v If the target resource group is offline, you cannot manually move the source resource group or bring it
offline.
If you use clRGmove with resource groups that have the Never Fallback fallback policy, the resource
group remains on that node until you move it elsewhere.
The following paragraphs describe the rules which apply when using clRGmove to manage resource
groups with different policies.
In this type of dependency, the target resource group must be online on any node in the cluster before a
source (dependent) resource group can be activated on a node. The following rules apply to resource
groups with a start after dependency:
v If the target resource groups are offline due to your request made through the clRGmove command,
PowerHA SystemMirror rejects manual attempts to bring the source resource groups that depend on
these resource groups online. The error message lists the target resource groups that must be brought
online first.
In this type of dependency, the target resource group must be offline on any node in the cluster before a
source (dependent) resource group can be brought offline on a node. The following rules apply to
resource groups with a stop after dependency:
v If you have a target and a source resource group online, and would like to move the source resource
group to another node or take it offline, PowerHA SystemMirror prevents you from doing so before a
target resource group is taken offline.
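The two clRGmove rules above amount to a check on the state of the target resource groups before a manual request is accepted. The sketch below models that check. It does not call clRGmove, and the function name, the action strings, and the data layout are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def check_manual_request(action: str,
                         rg: str,
                         start_after: Dict[str, Set[str]],
                         stop_after: Dict[str, Set[str]],
                         online: Set[str]) -> Tuple[bool, List[str]]:
    """Return (allowed, blocking targets) for a manual request on rg.

    action is "online", "offline", or "move".
    start_after and stop_after map a source group to its target groups.
    online is the set of resource groups that are currently online.
    """
    if action == "online":
        # Start after: every target must already be online.
        blocking = sorted(t for t in start_after.get(rg, ()) if t not in online)
    elif action in ("offline", "move"):
        # Stop after: every target must already be offline.
        blocking = sorted(t for t in stop_after.get(rg, ()) if t in online)
    else:
        raise ValueError(f"unknown action: {action}")
    return (not blocking, blocking)
```

The list of blocking targets corresponds to the resource groups that an error message would ask you to handle first.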
IPAT does not apply to concurrent resource groups or online on all available nodes resource groups.
It is important to consider how clients reach application addresses over the cluster network, when there
are firewalls or VPNs that are configured in the environment.
Related reference:
“Planning for IP address takeover via IP aliases” on page 25
Assigning IP aliases to NICs allows you to create more than one IP label on the same network interface.
Note: Even if you specify the order of resource group processing on a single node, the actual fallover
of the resource groups may be triggered by different policies. Therefore, it is not guaranteed that
Although by default PowerHA SystemMirror processes resource groups in parallel, if you establish
dependencies between some of the resource groups in the cluster, processing may take longer than it does
for clusters without dependent resource groups as there may be more processing to do to handle one or
more rg_move events.
Upon acquisition, first the parent or higher priority resource groups are acquired, then the child resource
groups are acquired. Upon release, the order is reversed. The remaining resource groups in the cluster
(those that do not have dependencies themselves) are processed in parallel.
Also, if you specify serial order of processing and have dependent resource groups configured, make sure
that the serial order does not contradict the dependency specified. The resource group dependency
overrides any serial order in the cluster.
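The acquisition and release order for dependent resource groups is a topological ordering of the dependency graph. The sketch below uses the graphlib module from the Python standard library (Python 3.9 or later) to compute batches of groups that could be processed in parallel. The function names and the batch representation are hypothetical and only illustrate the ordering rule.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def acquisition_order(parents: Dict[str, Set[str]]) -> List[List[str]]:
    """Return batches of resource groups in acquisition order.

    parents maps each resource group to the set of its parent groups
    (an empty set for groups without dependencies). Groups in the same
    batch have no unmet dependencies and can be processed in parallel.
    """
    sorter = TopologicalSorter(parents)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


def release_order(parents: Dict[str, Set[str]]) -> List[List[str]]:
    """Release reverses acquisition: children first, then parents."""
    return list(reversed(acquisition_order(parents)))
```

For a configuration in which app depends on db, web depends on app, and batch has no dependencies, the first batch contains batch and db, followed by app, and then web.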
Related reference:
“Planning for cluster events” on page 79
These topics describe the PowerHA SystemMirror cluster events.
Related information:
Configuring processing order for resource groups
Upgrading a PowerHA SystemMirror cluster
Note: Sites are supported only in PowerHA SystemMirror 7.1.2, or later, in both the Enterprise Edition
and the Standard Edition. Replication management is supported only in PowerHA SystemMirror
Enterprise Edition.
You can use the following policies for concurrent resource groups:
Ignore
Startup Policy Online on all available nodes
Fallover Policy Bring offline (on Error node only)
Fallback Policy Never fallback
For nonconcurrent resource groups, you can use the following policies:
Ignore
Startup Policy Online on home node
Never fallback
Concurrent resource groups (online on all nodes) with an intersite management policy of online on both
sites have multiple ONLINE instances and no ONLINE SECONDARY instances when the cluster is
running at both locations.
Concurrent resource groups with an intersite management policy of prefer primary site or online on
either site have primary instances on each node at the primary site and secondary instances on nodes at
the secondary site.
If the secondary instance cannot be brought to the ONLINE SECONDARY state, the primary instance will
still be brought ONLINE, if possible.
Note: Sites are supported only in PowerHA SystemMirror 7.1.2, or later, in both the Enterprise Edition
and the Standard Edition. Replication management is supported only in PowerHA SystemMirror
Enterprise Edition.
You can specify a dependency between two or more resource groups that reside on nodes that are
located at different sites. In this case, if either the parent or child moves to the other site, the dependent
group moves also. If the parent group cannot be activated on the fallover site, the child resource group
will also remain inactive.
The dependency applies only to the state of the primary instance of the resource group. If the parent
group's primary instance is OFFLINE, and the secondary instance is ONLINE SECONDARY on a node,
the child group's primary instance will be OFFLINE.
During resource group recovery, resource groups can fall over to nodes on either site. The sequence for
acquiring dependent resource groups is the same as that of clusters without sites, in which the parent
resource group is acquired first, and then the child resource group is acquired. The release logic is
reversed: the child resource group is released before the parent resource group is released.
As in clusters without sites, if you have sites defined, you must configure application monitoring for
applications included in resource groups with dependencies.
Related reference:
“Resource-group location dependencies” on page 65
Certain applications in different resource groups stay online together on a node or stay online on
different nodes.
Note: Sites are supported only in PowerHA SystemMirror 7.1.2, or later, in both the Enterprise Edition
and the Standard Edition. Replication management is supported only in PowerHA SystemMirror
Enterprise Edition.
The intersite management policy for a resource group determines the fallback behavior of the ONLINE
instances of the resource groups between sites, which governs the location of the secondary instance.
The ONLINE SECONDARY instance is located at the site that does not have the ONLINE instance. The
following table shows the expected behavior of resource groups during site events according to the
startup and intersite management policies.
Secondary site
The first node that joins this site acquires the resource
group in ONLINE SECONDARY state.
Intersite fallover
The ONLINE instance falls between sites when no nodes at the local
site can acquire the resource group. The secondary instance moves
to the other site and is brought to the ONLINE SECONDARY state
on the highest priority node that is available, if possible.
Intersite fallback
The ONLINE instance falls back to the primary site when a node
from the primary site joins. The secondary instance moves to the
other site and is brought to the ONLINE SECONDARY state on the
highest priority node that is available, if possible.
Startup policy: Online on first available node or Online using node distribution policy
Intersite management policy: Prefer primary site
Cluster Startup
Primary site
The node that joins first from the primary site, and meets
the criteria, acquires the resource group in the ONLINE
state. The resource group is OFFLINE on all other nodes
at the primary site. The node distribution policy applies
only to the primary instance of the resource group.
Secondary site
The first node to join the cluster in this site acquires all
secondary instances of resource groups with this startup
policy in the ONLINE_SECONDARY state (no
distribution).
Intersite fallover
The ONLINE instance falls between sites when no nodes at the local
site can acquire the resource group. The secondary instance moves
to the other site and is brought to the ONLINE SECONDARY state
on the highest priority node that is available, if possible.
Intersite fallback
The ONLINE instance falls back to the primary site when a node
from the primary site joins. The secondary instance moves to the
other site and is brought to the ONLINE SECONDARY state on the
highest priority node that is available, if possible.
Intersite fallover
The ONLINE instances fall between sites when all nodes at the local
site go OFFLINE or fail to start the resource group. The secondary
instances move to the other site and are brought to the ONLINE
SECONDARY state where possible.
Intersite fallback
ONLINE instances fall back to the primary site when a node on the
primary site rejoins. Nodes at the secondary site acquire the resource
group in the ONLINE_SECONDARY state.
Startup policy: Online on home node only
Intersite management policy: Online on either site
Cluster Startup
Primary site
The home node that joins the cluster (from either site)
acquires the resource group in the ONLINE state.
Nonhome nodes leave the resource group OFFLINE.
Secondary site
The first node to join from the other site acquires the
resource group in the ONLINE_SECONDARY state.
Intersite fallover
The ONLINE instance falls between sites when no nodes at the local
site can acquire the resource group. The secondary instance moves
to the other site and is brought to the ONLINE_SECONDARY state
on the highest priority node available, if possible.
Intersite fallback
The ONLINE instance does not fall back to the primary site when a
node on the primary site rejoins. The highest priority rejoining node
acquires the resource group in the ONLINE_SECONDARY state.
Startup policy: Online on first available node or Online using node distribution policy
Intersite management policy: Online on either site
Cluster Startup
Primary site
The node that joins first from either site, that meets the
distribution criteria, acquires the resource group in the
ONLINE state.
Secondary site
After the resource group is ONLINE, the first joining node
from the other site acquires the resource group in the
ONLINE_SECONDARY state.
Intersite fallover
The ONLINE instance falls between sites when no nodes at the local
site can acquire the resource group.
Intersite fallback
The ONLINE instance does not fall back to the primary site when
the primary site joins. A rejoining node acquires the resource group
in the ONLINE_SECONDARY state.
Intersite fallover
The ONLINE instance falls between sites when all nodes at the local
site go OFFLINE or fail to start the resource group.
Intersite fallback
The ONLINE instance does not fall back to the primary site when
the primary site joins. Rejoining nodes acquire the resource group in
the ONLINE_SECONDARY state.
Startup policy: Online on all available nodes
Intersite management policy: Online at both sites
Cluster Startup
All nodes at both sites activate the resource group in the ONLINE
state.
Intersite fallover
No fallover occurs. Resource group is either in the OFFLINE state or
in the ERROR state.
Intersite fallback
No fallback occurs.
A particular instance of a resource group can fall over within one site, but it cannot move between sites.
If no nodes are available on the site where the affected instance resides, that instance goes into the
ERROR or ERROR_SECONDARY state. It does not stay on the node where it failed. This behavior
applies to both primary and secondary instances.
The Cluster Manager moves the resource group if a node_down or node_up event occurs, even if fallover
between sites is disabled. You can also manually move a resource group between sites.
If you migrated from a previous release of PowerHA SystemMirror, you can change the resource group
recovery policy to allow the Cluster Manager to move a resource group to another site to avoid having
the resource group go into the ERROR state.
When fallover across sites is enabled, PowerHA SystemMirror tries to recover the primary instance of a
resource group in situations where an interface that is connected to an intersite network fails or becomes
available.
When fallover across sites is enabled, PowerHA SystemMirror tries to recover the secondary instance and
the primary instance of a resource group in these situations:
To enable or disable intersite resource group recovery, complete the following steps:
1. From the command line, enter smit sysmirror.
2. From the SMIT interface, select Custom Cluster Configurations > Resources > PowerHA
SystemMirror Extended Resources Configuration > Customize Inter-site Resource Group Recovery,
and press Enter.
PowerHA SystemMirror support for replicated resources provides the following functions:
v Enables you to dynamically reconfigure resource groups that contain replicated resources with the
PowerHA SystemMirror Enterprise Edition for AIX and the PowerHA SystemMirror site configurations.
v Consolidates PowerHA SystemMirror Enterprise Edition for AIX verification into standard cluster
verification by automatically detecting and calling the installed PowerHA SystemMirror Enterprise
Edition for AIX product's verification utilities.
Note: Sites are supported only in PowerHA SystemMirror 7.1.2, or later, in both the Enterprise Edition
and the Standard Edition. Replication management is supported only in PowerHA SystemMirror
Enterprise Edition.
The following configurations are supported for PowerHA SystemMirror Enterprise Edition for AIX
replicated resources:
v Resource groups with concurrent node policy can have nonconcurrent site management policy.
v Intersite recovery of resource groups containing PowerHA SystemMirror Enterprise Edition for AIX
replicated resources is allowed by default for new installations of PowerHA SystemMirror.
Configurations that are updated and migrated from previous releases maintain the preexisting
behavior. You can configure this behavior to be a fallover or notify option on cluster-initiated resource
group movement. If you select the notify option, you need to configure a pre-event or post-event
script, or a remote notification method.
v Parent, child, and location dependency configurations for replicated resource groups.
v Node-based resource group distribution startup policy for resource groups with PowerHA
SystemMirror sites.
You cannot configure a resource group to use a nonconcurrent node policy and a concurrent inter-site
management policy.
Note: Sites are supported only in PowerHA SystemMirror 7.1.2, or later, in both the Enterprise Edition
and the Standard Edition. Replication management is supported only in PowerHA SystemMirror
Enterprise Edition.
The Cluster Manager uses dynamic event phasing in that it moves the secondary instance from that site
to the other site if a node is available to host it. Every attempt is made to maintain the secondary
instance in the SECONDARY_ONLINE state. Even if a node at a given site is configured so that it cannot
host more than one primary instance, it might host more than one secondary instance to keep them all in
the SECONDARY_ONLINE state.
WLM allows you to set targets for and to set limits on CPU, physical memory use, and disk I/O
bandwidth for different processes and applications. This provides better control over the use of critical
system resources at peak loads. PowerHA SystemMirror allows you to configure WLM classes into
PowerHA SystemMirror resource groups so that the starting and stopping of WLM and the active WLM
configuration can be under cluster control.
PowerHA SystemMirror does not verify every aspect of your WLM configuration; therefore, it is your
responsibility to ensure the integrity of the WLM configuration files. After you add the WLM classes to a
PowerHA SystemMirror resource group, the verification utility checks only whether the required WLM
For complete information on how to set up and use Workload Manager, see the IBM AIX Workload
Manager (WLM) Redbooks® publication.
Workload Manager distributes system resources among processes that request them according to the class
they are in. Processes are assigned to specific classes according to class assignment rules. Planning for
WLM integration with PowerHA SystemMirror includes two basic steps:
1. Using AIX SMIT panels to define the WLM classes and class assignment rules related to highly
available applications.
2. Using PowerHA SystemMirror SMIT panels to establish the association between the WLM
configuration and the PowerHA SystemMirror resource groups.
Related information:
AIX Workload Manager (WLM) Redbooks
You set up class assignment rules that tell WLM how to classify all new processes (as well as those
already running at the time of WLM startup) according to their group ID (GID), user ID (UID), and the
full path name.
After WLM classes are added to a PowerHA SystemMirror resource group, then at the time of cluster
synchronization on the node, PowerHA SystemMirror reconfigures WLM to use the rules required by the
classes associated with the node. In the event of dynamic resource reconfiguration on the node, WLM is
reconfigured in accordance with any changes made to WLM classes associated with a resource group.
WLM startup occurs either when the node joins the cluster or when a dynamic reconfiguration of the
WLM configuration takes place.
The configuration is node-specific and depends on the resource groups in which the node participates. If the
node cannot acquire any resource groups associated with WLM classes, WLM is not started.
Finally, if WLM is currently running and was not started by PowerHA SystemMirror, the startup script
restarts WLM from the user-specified configuration, saving the previous configuration. When PowerHA
SystemMirror is stopped, it returns WLM back to its previous configuration.
Failure to start up WLM generates an error message that is logged in the hacmp.out log file, but node startup
and the resource reconfiguration proceed.
WLM shutdown occurs either when the node leaves the cluster or on dynamic cluster reconfiguration. If
WLM is currently running, the shutdown script determines whether WLM was running before being
started by PowerHA SystemMirror and what configuration it was using. It then either does nothing
(if WLM is not currently running), stops WLM (if it was not running before PowerHA SystemMirror
startup), or stops it and restarts it in the previous configuration (if WLM was previously running).
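The WLM startup and shutdown rules described above reduce to two small decision functions. The sketch below only models the decisions. It does not start or stop WLM, and the function names, parameters, and action strings are hypothetical.

```python
from typing import Optional, Tuple


def wlm_startup_action(node_has_wlm_groups: bool, wlm_running: bool) -> str:
    """Decide what the startup script does when a node joins or is reconfigured."""
    if not node_has_wlm_groups:
        # The node cannot acquire any resource group with WLM classes.
        return "do_not_start"
    if wlm_running:
        # WLM was started outside the cluster software: keep its
        # configuration so that it can be restored later.
        return "save_previous_config_and_restart"
    return "start"


def wlm_shutdown_action(wlm_running: bool,
                        was_running_before: bool,
                        previous_config: Optional[str] = None
                        ) -> Tuple[str, Optional[str]]:
    """Decide what the shutdown script does; returns (action, config to restore)."""
    if not wlm_running:
        return ("nothing", None)
    if not was_running_before:
        return ("stop", None)
    return ("stop_and_restart", previous_config)
```

In this model, the previous_config value stands for whatever configuration was saved at startup, so a node that was running WLM before it joined the cluster returns to that configuration when cluster services stop.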
Overview
In PowerHA SystemMirror, resource groups are processed in parallel by default, if possible, unless you
specify a customized serial processing order for all or some of the resource groups in the cluster.
The logic and sequence of events as described in examples might not list all the events.
Note: Sites are supported only in PowerHA SystemMirror 7.1.2, or later, in both the Enterprise Edition
and the Standard Edition. Replication management is supported only in PowerHA SystemMirror
Enterprise Edition.
Site event scripts are included in the PowerHA SystemMirror software. If sites are not defined, no site
events are generated. The PowerHA SystemMirror site_event scripts run as follows if sites are defined:
v The first node in a site runs the site_up event before it completes node_up event processing. The
site_up_complete event runs after the node_up_complete event.
v When the last node in a site goes down, the site_down event runs before the node_down event, and
the site_down_complete event runs after the node_down_complete event.
Without installing PowerHA SystemMirror Enterprise Edition, you can define pre-events and post-events
to run when a site changes state. In this case, you can define all site-related processes.
Site events (including the check_for_site_up event and the check_for_site_down event) are logged in the
hacmp.out log file.
If sites are defined, the site_up event runs when the first node in the site comes up and the site_down
event runs when the last node in the site goes down. The event script sequence for handling resource
groups in general is:
site_up
site_up_remote
node_up
rg_move events to process resource group actions
node_up_complete
site_up_complete
site_up_remote_complete
site_down
site_down_remote
node_down
rg_move events to process resource group actions
node_down_complete
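The ordering rules for site events relative to node events can be expressed as a short sequence builder. This is a simplified sketch: the function names are hypothetical, the remote site events (such as site_up_remote) are omitted, and a single rg_move entry stands in for however many rg_move events the cluster manager actually runs.

```python
from typing import List


def node_up_events(sites_defined: bool, first_node_in_site: bool) -> List[str]:
    """Events when a node joins; site events wrap the node events."""
    events = ["node_up", "rg_move", "node_up_complete"]
    if sites_defined and first_node_in_site:
        events = ["site_up"] + events + ["site_up_complete"]
    return events


def node_down_events(sites_defined: bool, last_node_in_site: bool) -> List[str]:
    """Events when a node leaves; site events wrap the node events."""
    events = ["node_down", "rg_move", "node_down_complete"]
    if sites_defined and last_node_in_site:
        events = ["site_down"] + events + ["site_down_complete"]
    return events
```

If sites are not defined, both functions return only the node events, which matches the statement that no site events are generated in that case.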
This processing is necessary to ensure the proper balance of cluster resources. As long as the existing
cluster managers first acknowledge a node rejoining the cluster, they can release any resource groups
belonging to that node if necessary. Whether or not the resource groups are actually released in this
situation depends on how the resource groups are configured for takeover (or dependencies). The new
node can then start its operations.
The cluster manager then takes into account all node policies, especially the configuration of
dependencies for resource groups, and the current distribution and state of resource groups on all nodes
in order to properly handle any acquiring, releasing, bringing online or taking offline of resource groups
before a node_up_complete event can run.
Parent and child or location dependencies between resource groups offer a predictable and reliable way
of building clusters with multitiered applications. However, node_up processing in clusters with
dependencies could take more time than the parallel processing in clusters without resource group
dependencies. You might need to adjust the config_too_long warning timer for node_up events.
node_down events
When all network interfaces are down, or a node does not respond to heartbeats, the cluster managers
then run a node_down event. Depending on the cluster configuration, the peer nodes then take the
necessary actions to get critical applications up and running and to ensure that data remains available.
When you stop cluster services and bring resource groups offline, PowerHA SystemMirror stops on the
local node after the node_down_complete event releases the stopped node's resources. The other nodes
run the node_down_complete event and do not take over the resources of the stopped node.
When you stop cluster services and move the resource groups to another node, PowerHA SystemMirror
stops after the node_down_complete event on the local node releases its resource groups. The surviving
nodes in the resource group node list take over these resource groups.
When you stop cluster services and place resource groups in an unmanaged state, PowerHA
SystemMirror software stops immediately on the local node. The node_down_complete event is run on
the stopped node. The cluster managers on remote nodes process node_down events, but do not take
over any resource groups. The stopped node does not release its resource groups.
Node failure
When a node fails, the cluster manager on that node does not have time to generate a node_down event.
In this case, the cluster managers on the surviving nodes recognize that a node_down event has occurred
(when they realize the failed node is no longer communicating), and they trigger node_down events.
The following list describes the default parallel sequence of node_down events:
1. node_down
   This event occurs when a node intentionally leaves the cluster or fails.
   In some cases, the node_down event receives the forced parameter.
   All nodes run the node_down event.
   All nodes run the process_resources script. After the cluster manager evaluates the status of affected
   resource groups and the configuration, it initiates a series of sub events to redistribute resources as
   configured for fallover or fallback.
   All nodes run the process_resources_complete script.
2. node_down_complete
Network events
PowerHA SystemMirror distinguishes between two types of network failure, local and global, and uses
different network failure events for each type of failure. The network failure event script is often
customized to send mail.
The cluster manager takes selective recovery action to move affected resource groups to other
nodes if Service IP is configured as part of the resource group. The results of the recovery
actions are logged to hacmp.out.
network_down (global) This event occurs when all of the nodes connected to a network have lost contact with a
network. It is assumed in this case that a network-related failure has occurred rather than a
node-related failure. This event has the following format:
network_down -1 network_name
Note: The -1 argument is the number minus one. This argument indicates that the network_down event is
global.
The global network failure event mails a notification to the system administrator, but takes no
further action since appropriate actions depend on the local network configuration.
network_down_complete (local) This event occurs after a local network failure event has completed. It has the following
format:
network_down_complete node_name network_name
When a local network failure event occurs, the cluster manager takes selective recovery
actions for resource groups containing a service network interface card (NIC) connected to
that network.
The default processing for this event takes no actions because appropriate actions depend on
the network configuration.
network_unstable This event occurs when PowerHA is receiving continuous state changes for the network. The
event has the following format:
network_unstable network_name
This event only runs after a network_down_complete for the corresponding network. No
additional network events will run for this network until the stability is restored. When
PowerHA is no longer receiving multiple, continuous state changes for the network, and the
network is up, a “network_up” will run.
network_stable This event occurs when PowerHA is no longer receiving continuous state changes for the
network. The event has the following format:
network_stable network_name
This event only ever runs after a network_unstable event has been run for the corresponding
network. Once a network has stabilized, any future state changes will result in normal event
processing for those changes. If the network is currently “up” when stability is restored, a
“network_up” event will be run.
network_up This event occurs when the cluster manager determines a network has become available for
use. Whenever a network becomes available again, PowerHA SystemMirror attempts to bring
resource groups containing service IP labels on that network back online.
network_up_complete This event occurs only after a network_up event has successfully completed. This event is
often customized to notify the system administrator that an event demands manual attention.
Whenever a network becomes available again, PowerHA SystemMirror attempts to bring
resource groups containing service IP labels on that network back online.
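A customized notification script can tell a global failure from a local one by inspecting the first event argument. The following is a minimal sketch, assuming the script is handed the event arguments in the formats shown above (a -1 for a global event, a node name for a local one). The function names and the message wording are illustrative and are not part of PowerHA SystemMirror.

```shell
#!/bin/ksh
# Illustrative helpers (hypothetical names): classify a network failure event
# from its first argument. Per the formats above, a global network_down event
# passes -1 where a local event passes a node name.
network_failure_scope() {
    if [ "$1" = "-1" ]; then
        echo "global"
    else
        echo "local"
    fi
}

# Build a one-line notification message.
#   $1 = node name, or -1 for a global event
#   $2 = network name
network_failure_message() {
    echo "network_down ($(network_failure_scope "$1")) on network $2"
}
```

A notify command could pass the resulting message to mail or to a logging facility of your choice.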
For example, starting a cluster with all service or boot interfaces disconnected produces the following
results:
v First node active: No failure events are generated.
v Second node active: One failure event is generated.
v Third node active: One failure event is generated.
For more information about how resource groups are handled in PowerHA SystemMirror, see Resource
group behavior during cluster events. This topic contains information about the following PowerHA
SystemMirror functions:
v Selective fallover for handling resource groups
v Handling of resource group acquisition failures
v Handling of resource groups configured with service IP resources
v Handling of PowerHA SystemMirror Enterprise Edition resource groups
Related information:
Resource group behavior during cluster events
Notes:
v If dependencies between resource groups or sites are specified, PowerHA SystemMirror processes
events in a different sequence than usual.
v The lists in the following table do not include all possible resource group states. Also, the resource
group instances could be in the process of acquiring or releasing. The corresponding resource group
states are not listed here, but have descriptive names that explain which actions take place.
Table 6. Resource group events
Resource group event name Event description
rg_move This event moves a specified resource group from one node to another.
rg_move_complete This action indicates that the rg_move event has successfully completed.
resource_state_change This trigger event is used for resource group recovery if resource group
dependencies are configured in the cluster. This action indicates that the cluster
manager needs to change the state of one or more resource groups, or there is a
change in the state of a resource managed by the cluster manager. This event runs
on all nodes if one of the following situations occurs:
v Application monitoring failure
v Selective fallover for loss of volume group
v Local network down
v WAN failure
v Resource group acquisition failure
v Resource group recovery on IP interface availability
v Expiration of settling timer for a resource group
v Expiration of fallback timer for a resource group.
resource_state_change_complete This event runs when the resource_state_change event completes successfully. You
can add pre-event or post-events here if necessary. You might want to be notified
about resource state changes, for example.
external_resource_state_change This event runs when you move a resource group and PowerHA SystemMirror
uses the dynamic processing path to handle the request because the resource
group dependencies are configured in the cluster.
external_resource_state_change_complete This event runs when the external_resource_state_change event completes
successfully.
The following table includes some but not all possible states of resource groups:
Table 7. Resource group subevents
Resource group subevents Event description
releasing This action indicates that a resource group is being released either to be brought offline or
to be acquired on another node.
acquiring This action is used when a resource group is being acquired on a node.
rg_up This action indicates that the resource group is online.
rg_down This action indicates that the resource group is offline.
rg_error This action indicates that the resource group is in error state.
rg_acquiring_secondary This action indicates that the resource group is coming online at the target site (only the
replicated resources are online).
rg_up_secondary This action indicates that the resource group is online in the secondary role at the target
site (only replicated resources are online).
rg_error_secondary This action indicates that the resource group at the site receiving the mirror data is in
error state.
rg_temp_error_state This action indicates that the resource group is in a temporary error state. For example, it
occurs due to a local network or an application failure. This state informs the cluster
manager to initiate an rg_move event for this resource group. Resource groups should not
be in this state when the cluster is stable.
After the completion of an event, the cluster manager has the state of resources and resource groups
involved in the event. The cluster manager then analyzes the resource group information that it maintains
internally and determines whether recovery events need to be queued for any of the resource groups. The
cluster manager also uses the status of individual resources in resource groups to print out a
comprehensive event summary to the hacmp.out log file.
For each resource group, the cluster manager keeps track of the nodes on which the resource group has
tried to come online and failed. This information is updated when recovery events are processed. The
cluster manager resets the node list for a resource group as soon as the resource group moves to the
online or error states.
In PowerHA SystemMirror, the resource group ERROR states are displayed with detail:
Table 8. Resource groups in an ERROR state
Causes for resource group ERROR states                                            PowerHA SystemMirror displays this message
Parent group is NOT ONLINE; as a result, the child resource group is unavailable  OFFLINE due to parent offline
Higher-priority different-node dependency group is ONLINE                         OFFLINE due to lack of available node
Another distributed group was acquired                                            OFFLINE
Group is falling over and in the OFFLINE state temporarily                        OFFLINE
Manual intervention is only required when a resource group remains in ERROR state after the event
processing finishes.
Related information:
Resource group behavior during cluster events
As part of the planning process, you need to decide whether to customize event processing. If the actions
taken by the default scripts are sufficient for your purposes, you do not need to do anything further to
configure events during the configuration process.
If you do decide to customize event processing to your environment, use the PowerHA SystemMirror
event customization facility described in this section. If you customize event processing, register these
user-defined scripts with PowerHA SystemMirror during the configuration process.
Complete customization of an event includes a notification to the system administrator (before and after
event processing), and user-defined commands or scripts that run before and after event processing, as
shown in the following example:
Notify sysadmin of event to be processed
Pre-event script or command
PowerHA SystemMirror event script
Post-event script or command
Notify sysadmin that event processing is complete
Event notification
You can specify a notify command that sends mail to indicate that an event is about to happen (or has
just occurred), and that an event script succeeded or failed.
You configure notification methods for cluster events in SMIT under the Custom Cluster Configuration >
Events > Cluster Events > Change/Show Pre-Defined Events menu. For example, a cluster might want
to use a network failure notification event to inform system administrators that traffic might have to be
rerouted. Afterwards, you can use a network_up notification event to tell system administrators that
traffic can again be serviced through the restored network.
Event notification in a PowerHA SystemMirror cluster can also be done using pre-event and post-event
scripts.
For example, you can specify one or more pre-event scripts that run before the node_down event script is
processed. When the cluster manager recognizes that a remote node is down, it first processes these
user-defined scripts. One such script might designate that a message be sent to all users to indicate that
performance might be affected (when adapters are swapped and when application controllers are stopped
and restarted). Following the node_down event script, a post processing event script for network_up
notification might be included to broadcast a message to all users that a certain system is now available
at another network address.
The following scenarios are other examples of where pre-event and post-event processing is useful:
v If a node_down event occurs, a script could notify users on the server that is about to take over for the
downed application controller that performance might vary, or that they should seek alternate systems
for certain applications.
v Due to a network being down, a custom installation might be able to reroute traffic through other
machines by creating new IP routes. The network_up and network_up_complete event scripts could
reverse the procedure, ensuring that the correct routes exist after all networks are functioning.
v You can stop cluster services and move resource groups to another node as a post-event script if a
network failed on the local node (but otherwise the network is functioning).
Note that when writing your PowerHA SystemMirror pre-event or post-event scripts, none of the shell
environment variables defined in /etc/environment are available to your program. If you need to use any
of these variables, explicitly source them by including this line in your script:
". /etc/environment"
If you plan to create pre-event or post-event scripts for your cluster, be aware that your scripts will be
passed the same parameters used by the PowerHA SystemMirror event script you specify. For pre-event
and post-event scripts, the arguments passed to the event command are the event name, event exit status,
and the trailing arguments passed to the event command.
All PowerHA SystemMirror event scripts are maintained in the /usr/es/sbin/cluster/events directory. The
parameters passed to your script are listed in the event script headers.
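The points above can be combined in a small post-event script. The following is a sketch only: it assumes the argument order described in the text (event name, event exit status, then the trailing event arguments), and the log location POST_EVENT_LOG and the function names are hypothetical. Check the event script headers for the exact parameters of the event you customize.

```shell
#!/bin/ksh
# Sketch of a post-event script. POST_EVENT_LOG and the function names are
# hypothetical; confirm the argument order against the event script headers
# in /usr/es/sbin/cluster/events before relying on it.

POST_EVENT_LOG=${POST_EVENT_LOG:-/tmp/post_event.log}   # assumed log location

# Format one record. Assumed layout, following the text above:
#   $1 = event name, $2 = event exit status, $3... = trailing event arguments
format_event_record() {
    rec_event=$1
    rec_status=$2
    shift 2
    echo "event=$rec_event status=$rec_status args=$*"
}

run_post_event() {
    # Variables from /etc/environment are not inherited, so source them explicitly.
    if [ -r /etc/environment ]; then
        . /etc/environment
    fi
    format_event_record "$@" >> "$POST_EVENT_LOG"
}

# Only act when called with at least an event name and an exit status.
if [ $# -ge 2 ]; then
    run_post_event "$@"
fi
```

Keeping the formatting in its own function makes it possible to test the script outside the cluster before registering it with PowerHA SystemMirror.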
CAUTION:
Be careful not to kill any PowerHA SystemMirror processes as part of your script. If you are using the
output of the ps command and using a grep to search for a certain pattern, make sure the pattern does
not match any of the PowerHA SystemMirror, Cluster Aware AIX (CAA), or Reliable Scalable Cluster
Technology (RSCT) processes.
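One way to reduce this risk is to compare the command name exactly instead of matching a loose pattern. The following sketch assumes ps output restricted to the PID and command name columns. The application name myappd is hypothetical, and the exact content of the command name column can vary between systems, so verify the output on your nodes first.

```shell
#!/bin/ksh
# Sketch: select PIDs by exact command name instead of a loose grep pattern.
# Reads "PID COMMAND" lines on stdin, such as the output of:
#   ps -e -o pid= -o comm=
# The application name used in the example (myappd) is hypothetical.
pids_for_exact_command() {
    awk -v want="$1" '$2 == want { print $1 }'
}

# Example use inside a script (shown as a comment so that nothing is signalled here):
#   ps -e -o pid= -o comm= | pids_for_exact_command myappd
```

Because the comparison is on the whole command name, a process named myappd_helper, or any cluster process whose name merely contains the pattern, is not selected.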
If the forced varyon attribute is specified for a volume group, special scripts to force a varyon operation
are no longer required.
Historically, nonrecoverable event script failures resulted in the event_error event being run on the cluster
node where the failure occurred. The remaining cluster nodes did not indicate the failure. With PowerHA
SystemMirror, all cluster nodes run the event_error event if any node has an unrecoverable error. All
nodes log the error and record the failing node name in the hacmp.out log file.
If you have added pre-event or post-event scripts for the event_error event, be aware that those event methods
are called on every node, not just the failing node.
A Korn shell environment variable indicates the node where the event script failed:
EVENT_FAILED_NODE is set to the name of the node where the event failed. Use this variable in your
pre-event or post-event script to determine where the failure occurred.
The variable LOCALNODENAME identifies the local node. If LOCALNODENAME is not the same as
EVENT_FAILED_NODE, the failure occurred on a remote node.
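The comparison described above can be sketched as follows for a pre-event or post-event script attached to event_error. EVENT_FAILED_NODE and LOCALNODENAME are the variables named in the text; the function names and the message wording are illustrative assumptions.

```shell
#!/bin/ksh
# Sketch for a pre-event or post-event script attached to event_error.
# EVENT_FAILED_NODE and LOCALNODENAME are the variables described above;
# the function names and message text are illustrative.
failure_location() {
    if [ "$EVENT_FAILED_NODE" = "$LOCALNODENAME" ]; then
        echo "local"
    else
        echo "remote"
    fi
}

describe_failure() {
    echo "event_error: failure on $EVENT_FAILED_NODE ($(failure_location) to $LOCALNODENAME)"
}
```

Because the event_error methods run on every node, a script can use this check to take heavier actions, such as collecting logs, only on the node where the failure occurred.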
Resource groups processed in parallel and using pre-event and post-event scripts
Resource groups are processed in parallel by default in PowerHA SystemMirror unless you specify a
customized serial processing order for all or some of the resource groups in the cluster.
When resource groups are processed in parallel, fewer cluster events occur in the cluster and appear in
the event summaries.
The use of parallel processing reduces the number of particular cluster events for which you can create
customized pre-event or post-event scripts. If you start using parallel processing for a list of resource
groups in your configuration, be aware that some of your existing pre-event and post-event scripts might
not work for these resource groups.
In particular, only the following events take place during parallel processing of resource groups:
acquire_svc_addr
acquire_takeover_addr
node_down
node_up
release_svc_addr
release_takeover_addr
start_server
stop_server
Note: In parallel processing, these events apply to an entire list of resource groups that are being
processed in parallel, and not to a single resource group, as in serial processing. If you have pre-event
and post-event scripts configured for these events, then after migration, these event scripts are launched
not for a single resource group but for a list of resource groups, and might not work as expected.
Consider these events that do not occur in parallel processing if you have pre-event and post-event
scripts and plan to upgrade to the current version.
If you want to continue using pre-event and post-event scripts, you could have one of the following
cases.
In this case, if you have resources in resource groups that require handling by
pre-event and post-event scripts written for specific cluster events, include these
resource groups in the serial processing lists in SMIT to ensure that specific pre-event
and post-event scripts can be used for these resources.
For information about specifying serial or parallel processing of resource groups, see
the section Configuring processing order for resource groups.
You upgrade to PowerHA SystemMirror 4.5 or later and choose parallel processing for some of the
pre-existing resource groups in your configuration.
If, before migration, you had configured customized pre-event or post-event scripts in your cluster,
then now that these resource groups are processed in parallel after migration, the event scripts for a
number of events cannot be used for these resource groups, since these events do not occur in parallel
processing.
If you want existing event scripts to continue working for the resource groups, include
these resource groups in the serial ordering lists in SMIT, to ensure that the pre-event
and post-event scripts can be used for these resources.
For information about specifying serial or parallel processing of resource groups, see
Configuring processing order for resource groups.
Related reference:
“Using forced varyon” on page 50
PowerHA SystemMirror provides a forced varyon function to use in conjunction with AIX automatic
error notification methods. The forced varyon function enables you to have the highest possible data
availability.
Such scripts could become all-encompassing case statements. For instance, if you want to take an action
for a specific event on a specific node, you need to edit that individual case, add the required code for
pre-event and post-event scripts, and also ensure that the scripts are the same across all nodes.
To summarize, even though the logic of such scripts captures the desired behavior of the cluster, they can
be difficult to customize and even more difficult to maintain later on, when the cluster configuration
changes.
If you have applications included in dependent resource groups and still plan to use pre-event and
post-event scripts in addition to the dependencies, additional customization of pre-event and post-event
scripts might be needed. To minimize the chance of data loss during the application stop and restart
process, customize your application controller scripts to ensure that any uncommitted data is stored to a
shared disk temporarily during the application stop process and read back to the application during the
application restart process. It is important to use a shared disk because the application might be restarted
on a node other than the one on which it was stopped.
Related reference:
“Resource group dependencies” on page 63
PowerHA SystemMirror offers a wide variety of configurations where you can specify the relationships
between resource groups that you want to maintain at startup, fallover, and fallback.
You can use the verification automatic monitoring cluster_notify event to configure a PowerHA
SystemMirror remote notification method to send out a message in case of detected errors in cluster
configuration. The output of this event is logged in the hacmp.out file throughout the cluster on each
node that is running cluster services.
You can configure any number of notification methods, for different events and with different text or
numeric messages and telephone numbers to dial. The same notification method can be used for several
different events, as long as the associated text message conveys enough information to respond to all of
the possible events that trigger the notification.
After configuring the notification method, you can send a test message to make sure everything is
configured correctly and that the expected message will be sent for a given event.
Note: PowerHA SystemMirror checks the availability of the port when the notification method is
configured and before a page is issued. Modem status is not checked.
v Each node that can send email messages from the SMIT panel using AIX mail must have a TCP/IP
connection to the Internet.
v Each node that can send text messages to a cell phone must have an appropriate Hayes-compatible
dialer modem installed and enabled.
Cluster events that include acquiring and releasing resource groups take a longer time to complete. The
following cluster events are considered slow events:
v node_up
v node_down
v reconfig_resource
v rg_move
Customize event duration time for slow cluster events to avoid getting unnecessary system warnings
during normal cluster operation.
All other cluster events are considered fast events. These events typically take a shorter time to complete
and do not involve acquiring or releasing resources. Examples of fast events include:
v swap_adapter
v Events that do not handle resource groups
You can customize event duration time before receiving a warning for fast events to take corrective action
faster.
Consider customizing Event Duration Time Until Warning if, in the case of slow cluster events,
PowerHA SystemMirror issues warning messages too frequently, or if, in the case of fast events, you want
to speed up detection of a possible problem event.
Note: Dependencies between resource groups offer a predictable and reliable way of building clusters
with multitier applications. However, processing of some cluster events (such as node_up) in clusters
with dependencies could take more time than processing of those events where all resource groups are
processed in parallel. Whenever resource group dependencies allow, PowerHA SystemMirror processes
multiple nonconcurrent resource groups in parallel, and processes multiple concurrent resource groups on
all nodes at once. However, a resource group that is dependent on other resource groups cannot be
started until the others have been started first. The config_too_long warning timer for node_up events
should be set large enough to allow for this.
User-defined events
You can define your own events for which PowerHA SystemMirror can run your specified recovery
programs. This process adds a new dimension to the predefined PowerHA SystemMirror pre-event and
post-event script customization facility.
You specify the mapping between events that you define and the recovery programs that define the event
recovery actions through the SMIT interface. With this mapping, you control both the scope of each
recovery action and the number of event steps synchronized across all nodes.
An RMC resource refers to an instance of a physical or logical entity that provides services to some other
component of the system. The term resource is used very broadly to refer to software and hardware
entities. For example, a resource could be a particular file system or a particular host machine. A resource
class refers to all resources of the same type, such as processors or host machines.
A resource manager (daemon) maps actual entities to RMC's abstractions. Each resource manager
represents a specific set of administrative tasks or system functions. The resource manager identifies the
key physical or logical entity types related to that set of administrative tasks or system functions, and
defines resource classes to represent those entity types.
The AIX resource monitor generates events for OS-related resource conditions such as the percentage of
CPU that is idle (IBM.Host.PctTotalTimeIdle) or the percentage of disk space in use
(IBM.PhysicalVolume.PctBusy). The program resource monitor generates events for process-related
occurrences such as the unexpected end of a process. The program resource monitor uses the resource
attribute IBM.Program.ProgramName.
where:
v node_set is a set of nodes on which the recovery program is to run
v recovery_command is a quote-delimited string specifying a full path to the executable program. The
command cannot include any arguments. Any executable program that requires arguments must be a
separate script. The recovery program must be in this path on all nodes in the cluster. The program
must specify an exit status.
v expected_status is an integer status to be returned when the recovery command completes successfully.
The cluster manager compares the actual status returned to the expected status. A mismatch indicates
unsuccessful recovery. If you specify the character X in the expected status field, the cluster manager
omits the comparison.
v NULL is currently not used.
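The status comparison described in this list can be illustrated with a short sketch. It only mimics the rule stated above and is not the cluster manager's implementation; the function names are hypothetical.

```shell
#!/bin/ksh
# Illustration only: mimics the comparison described above. It is not the
# cluster manager's implementation.
#   $1 = expected_status field (an integer, or X to skip the comparison)
#   $2 = actual exit status returned by the recovery command
recovery_succeeded() {
    if [ "$1" = "X" ]; then
        return 0          # X means the comparison is omitted
    fi
    [ "$1" -eq "$2" ]     # a mismatch indicates unsuccessful recovery
}

# Print the outcome as text for the same two arguments.
recovery_result() {
    if recovery_succeeded "$1" "$2"; then
        echo "successful"
    else
        echo "unsuccessful"
    fi
}
```

For example, with an expected status of 0, a recovery command that exits with status 1 counts as unsuccessful, while the same exit status is accepted when the expected status field is X.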
You specify node sets by dynamic relationships. PowerHA SystemMirror supports the following dynamic
relationships:
All The recovery command runs on all nodes in the current membership.
Event The node on which the event occurred.
Other All nodes except the one on which the event occurred.
The specified dynamic relationship generates a set of recovery commands identical to the original, except
that a node ID replaces node_set in each set of commands.
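The expansion of a dynamic relationship into a concrete set of nodes can be sketched as follows. This is an illustration of the three relationships listed above, not PowerHA SystemMirror code; the function name and the node names in the usage note are hypothetical.

```shell
#!/bin/ksh
# Illustration only: expand a dynamic relationship into node names.
# Usage: expand_node_set RELATIONSHIP EVENT_NODE MEMBER...
#   RELATIONSHIP = all, event, or other
#   EVENT_NODE   = node on which the event occurred
#   MEMBER...    = nodes in the current membership
# Prints the selected nodes, one per line.
expand_node_set() {
    ens_rel=$1
    ens_event_node=$2
    shift 2
    for ens_node in "$@"; do
        case "$ens_rel" in
            [Aa]ll)   echo "$ens_node" ;;
            [Ee]vent) [ "$ens_node" = "$ens_event_node" ] && echo "$ens_node" ;;
            [Oo]ther) [ "$ens_node" != "$ens_event_node" ] && echo "$ens_node" ;;
        esac
    done
    return 0
}
```

With a membership of nodeA, nodeB, and nodeC and an event on nodeB, the relationship other selects nodeA and nodeC, so the recovery command is generated once for each of those two nodes.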
The command string for user-defined event commands must start with a slash (/). The clcallev command
runs commands that do not start with a slash.
To list all persistent attribute definitions for the IBM.Host RMC resource (selection string field):
lsrsrcdef -e -A p IBM.Host
A sample program sends a message to /tmp/r1.out that paging space is low on the node where the event
occurred. For recovery program r1.rp, the SMIT fields would be filled in as follows.
Table 10. Example: Recovery program fields
Field Value
Event Name E_page_space (User-defined name)
Recovery program path /r1.rp
Resource name IBM.Host (cluster node)
Selection string Name = ?" (name of node)
Expression TotalPgSpFree < 256000
(VMM is within 200 MB of paging space warning level).
The recovery program does not execute a command with arguments itself. Instead, it points to a shell
script, /tmp/checkpagingspace, which contains:
#!/bin/ksh
/usr/bin/echo "Paging Space LOW!" > /tmp/r1.out
exit 0
Barrier commands
You can put any number of barrier commands in the recovery program. All recovery commands before a
barrier start in parallel. After a node encounters a barrier command, all nodes must reach it before the
recovery program continues.
If multiple events are outstanding simultaneously, you only see the highest priority event. Node events
are higher priority than network events. But user-defined events, the lowest priority, do not roll up at all,
so you see all of them.
You can view a compilation of just the event summary portions of the past seven days of hacmp.out log
files by using the View Event Summaries option in the Problem Determination Tools SMIT panel. The
event summaries can be compiled even if you have redirected the hacmp.out file to a nondefault location.
The Display Event Summaries report also includes resource group information generated by the
clRGinfo command. You can also save the event summaries to a specified file instead of viewing them
through SMIT.
When events handle resource groups with dependencies, a preamble is written to the hacmp.out log file
listing the plan of sub events for handling the resource groups.
PowerHA SystemMirror clients are end-user devices that can access the nodes in a PowerHA
SystemMirror cluster. For planning purposes, it is important that you evaluate the cluster from the point
of view of the clients.
Clients running the Clinfo daemon can reconnect to the cluster quickly after a cluster event. If you have
hardware other than IBM System p between the cluster and the clients, make sure that you can update
the ARP cache of those network components after a cluster event occurs.
If you configure the cluster to swap hardware addresses as well as IP addresses, you do not need to be
concerned about updating the ARP cache. However, be aware that this option causes a longer delay.
If you are using IPAT via IP aliases, make sure all your clients support TCP/IP gratuitous ARP.
For clients running the Clinfo daemon, decide whether to customize the /usr/es/sbin/cluster/etc/clinfo.rc
script to do more than update the ARP cache when a cluster event occurs.
This assumes the client is connected directly to one of the cluster networks.
Network components
If you configured the network so that clients attach to networks on the other side of a router, bridge, or
gateway rather than to the cluster's local networks, be sure that you can update the ARP cache of those
network components after a cluster event occurs.
With PowerHA SystemMirror you can configure clusters with multitiered applications by establishing
dependencies between resource groups containing different applications. These topics describe resource
group dependencies and how they can help with keeping dependent applications highly available.
Related reference:
“Initial cluster planning” on page 5
This section describes the initial steps you take to plan a PowerHA SystemMirror cluster to make
applications highly available.
There are few requirements that an application must meet to recover well under PowerHA SystemMirror.
Some required characteristics, as well as a number of suggestions, are discussed here. These are grouped
according to key points that apply to all PowerHA SystemMirror environments. This topic covers the
following application considerations:
v Automation. Making sure your applications start and stop without user intervention
v Dependencies. Knowing what factors outside PowerHA SystemMirror affect the applications
v Interference. Knowing that applications themselves can hinder PowerHA SystemMirror functioning
v Robustness. Choosing strong, stable applications
v Implementation. Using appropriate scripts, file locations, and cron schedules.
You should add an application monitor to detect a problem with application startup. An application
monitor in startup monitoring mode checks an application controller’s successful startup within the
specified stabilization interval and exits after the stabilization period ends.
You can start the PowerHA SystemMirror cluster services on the nodes without stopping your
applications, by selecting an option from a SMIT panel PowerHA SystemMirror Services > Start Cluster
Services. When starting, PowerHA SystemMirror relies on the application startup scripts and configured
application monitors to ensure that PowerHA SystemMirror is aware of the running application and does
not start a second instance of the application.
Create a start script that starts the application. The start script should perform any clean-up or
preparation necessary to ensure proper startup of the application, and also to properly manage the
number of instances of the application that need to be started. When the application controller is added
to a resource group, PowerHA SystemMirror calls this script to bring the application online as part of
processing the resource group. Because the cluster daemons call the start script, there is no option for
interaction. Additionally, upon a PowerHA SystemMirror fallover, the recovery process calls this script to
bring the application online on a standby node. This allows for a fully automated recovery, and is why
any necessary cleanup or preparation should be included in this script.
PowerHA SystemMirror calls the start script as the root user. It might be necessary to change to a
different user in order to start the application. The su command can accomplish this. Also, it might be
necessary to run the nohup command on commands that are started in the background and have the
potential to be ended upon exit of the shell.
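The following is a minimal sketch of a start script fragment that combines the su and nohup commands described above. The user name appuser, the command path /apps/myapp/bin/start, and the log path are hypothetical placeholders, not names supplied by PowerHA SystemMirror.

```sh
#!/bin/sh
# Sketch only: APP_USER, APP_START_CMD, and APP_LOG are hypothetical placeholders.
APP_USER="${APP_USER:-appuser}"
APP_START_CMD="${APP_START_CMD:-/apps/myapp/bin/start}"
APP_LOG="${APP_LOG:-/tmp/myapp_start.log}"

# Start a command in the background as the given user, detached from the
# calling shell so that it is not ended when the start script exits.
start_as_user() {
    sau_user=$1
    sau_cmd=$2
    sau_log=$3
    if [ "$(id -un)" = "$sau_user" ]; then
        # Already running as the target user: su is not needed.
        nohup sh -c "$sau_cmd" >>"$sau_log" 2>&1 </dev/null &
    else
        # The dash gives the command the target user's login environment.
        su - "$sau_user" -c "nohup $sau_cmd >>$sau_log 2>&1 </dev/null &"
    fi
}

# Typical call from the start script:
# start_as_user "$APP_USER" "$APP_START_CMD" "$APP_LOG"
```

Redirecting standard input and output away from the terminal is a design choice that keeps the background command from depending on the shell that the cluster daemons used to call the script.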
For example, a PowerHA SystemMirror cluster node might be a client in a Network Information Service
(NIS) environment. If this is the case and you need to use the su command to change the user ID, there
must be a route to the NIS server at all times. In the event that a route does not exist and the su
command is attempted, the application script hangs. You can avoid this situation by enabling the
PowerHA SystemMirror cluster node to be an NIS slave server. That way, a cluster node has the ability to
access its own NIS map files to validate a user ID.
The start script should also check for the presence of required resources or processes. This will ensure an
application can start successfully. If the necessary resources are not available, a message can be sent to the
administration team to correct this and restart the application.
Start scripts should be written so that they determine whether one instance of the application is already
running and not start another instance unless multiple instances are desired. Keep in mind that the start
script might be run after a primary node has failed. There might be recovery actions necessary on the
backup node in order to restart an application. This is common in database applications. Again, the
recovery must be able to run without any interaction from administrators.
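One way to avoid starting a second instance can be sketched as follows. The sketch assumes that the application records its process ID in a PID file, which is an assumption about the application and not a PowerHA SystemMirror feature; the function names are hypothetical.

```sh
#!/bin/sh
# Sketch only: assumes the application is tracked through a PID file.

# Return 0 if the PID file names a process that still exists.
is_running() {
    [ -s "$1" ] || return 1
    ir_pid=$(cat "$1")
    kill -0 "$ir_pid" 2>/dev/null
}

# Start the given command in the background unless an instance is already running.
# Usage: start_once /path/to/pidfile command [args...]
start_once() {
    so_pidfile=$1
    shift
    if is_running "$so_pidfile"; then
        echo "instance already running; not starting another"
        return 0
    fi
    "$@" &
    echo $! >"$so_pidfile"
}
```

If multiple instances are desired, a separate PID file per instance would allow the same check to be applied to each one.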
The application stop script should use a phased approach. The first phase should be an attempt to stop
the cluster services and bring resource groups offline. If processes refuse to end, the second phase should
be used to forcefully ensure that all processing is stopped. Finally, a third phase can use a loop to repeat
any steps necessary to ensure that the application has ended completely.
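The phased approach can be sketched as a single function. In this sketch the orderly first step is represented by a SIGTERM sent to the application process, which is an assumption; substitute whatever orderly shutdown your application provides. The function name and the default timings are hypothetical.

```sh
#!/bin/sh
# Sketch only: stops one process in three phases.
# Usage: stop_phased PID [seconds_to_wait_for_orderly_stop]
stop_phased() {
    sp_pid=$1
    sp_wait=${2:-30}

    # Phase 1: request an orderly stop and wait for it to take effect.
    kill -TERM "$sp_pid" 2>/dev/null
    sp_i=0
    while kill -0 "$sp_pid" 2>/dev/null && [ "$sp_i" -lt "$sp_wait" ]; do
        sleep 1
        sp_i=$((sp_i + 1))
    done

    # Phase 2: force the process to end if it refused to stop.
    if kill -0 "$sp_pid" 2>/dev/null; then
        kill -KILL "$sp_pid" 2>/dev/null
    fi

    # Phase 3: loop until the process is gone, giving up after 10 more seconds.
    sp_i=0
    while kill -0 "$sp_pid" 2>/dev/null; do
        [ "$sp_i" -ge 10 ] && return 1
        sleep 1
        sp_i=$((sp_i + 1))
    done
    return 0
}
```

Keeping every wait bounded means that the script returns a status instead of hanging when a process cannot be stopped.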
Keep in mind that PowerHA SystemMirror allows 360 seconds by default for events to complete
processing. A message indicating that the cluster has been in reconfiguration too long appears until the
cluster completes its reconfiguration and returns to a stable state. This warning might be an indication
that a script is hung and requires manual intervention. If this is a possibility, you might want to consider
stopping an application manually before stopping PowerHA SystemMirror.
You can change the time period before the config_too_long event is called.
In PowerHA SystemMirror, support for dependent resource groups allows you to configure the following
options:
v Three levels of dependencies between resource groups, for example a configuration in which resource
group A depends on resource group B, and resource group B depends on resource group C. PowerHA
SystemMirror prevents you from configuring circular dependencies.
v A type of dependency in which a parent resource group must be online on any node in the cluster
before a child (dependent) resource group can be activated on a node.
If two applications must run on the same node, both applications must reside in the same resource
group.
If a child resource group contains an application that depends on resources in the parent resource group
and the parent resource group falls over to another node, the child
resource group is temporarily stopped and automatically restarted. Similarly, if the child resource group
is concurrent, PowerHA SystemMirror takes it offline temporarily on all nodes, and brings it back online
on all available nodes. If the fallover of the parent resource group is not successful, both the parent and
the child resource groups go into an ERROR state.
Note that when the child resource group is temporarily stopped and restarted, the application that
belongs to it is also stopped and restarted. Therefore, to minimize the chance of data loss during the
application stop and restart process, customize your application controller scripts to ensure that any
uncommitted data is stored to a shared disk temporarily during the application stop process and read
back to the application during the application restart process. It is important to use a shared disk because
the application might be restarted on a node other than the one on which it was stopped.
For example, if the database is made highly available and a fallover occurs, consider whether actions
should be taken at the higher tiers in order to automatically return the application to service. If so, it
might be necessary to stop and restart application or client tiers. This can be facilitated in one of two
ways. One way is to run the cli_on_node command on the tiers, and the other is to use a remote
execution command such as rsh, rexec, or ssh.
Note: Certain methods, such as the use of ~/.rhosts files, pose a security risk.
To configure complex clusters with multitiered applications, you can use parent-child dependent resource
groups. You might also want to consider using location dependencies.
Clinfo is the cluster information daemon. You can write a program using the Clinfo API to run on
any tiers that would stop and restart an application after a fallover has completed successfully. In this
sense, the tier, or application, becomes cluster aware, responding to events that take place in the cluster.
Another way to address the issue of multitiered architectures is to use pre-event and post-event scripts
around a cluster event. These scripts would call a remote execution command, such as rsh, rexec, or ssh,
to stop and restart the application.
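A post-event script of this kind might look like the following minimal sketch. The node names appnode1 and appnode2 and the restart command path are hypothetical, and the sketch assumes that key-based ssh access from the cluster node to the application tier has already been set up.

```sh
#!/bin/sh
# Sketch only: node names and the restart command are hypothetical placeholders.
APP_TIER_NODES="${APP_TIER_NODES:-appnode1 appnode2}"
RESTART_CMD="${RESTART_CMD:-/apps/myapp/bin/restart}"
# BatchMode makes ssh fail instead of prompting, because nobody can answer a prompt.
REMOTE_SHELL="${REMOTE_SHELL:-ssh -o BatchMode=yes}"

# Run the restart command on every application tier node.
# Returns nonzero if the command failed on at least one node.
restart_app_tier() {
    rat_failed=0
    for rat_node in $APP_TIER_NODES; do
        if ! $REMOTE_SHELL "$rat_node" "$RESTART_CMD" </dev/null; then
            echo "restart failed on $rat_node" >&2
            rat_failed=1
        fi
    done
    return $rat_failed
}
```

The loop continues after a failure so that one unreachable node does not prevent the remaining nodes from being restarted.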
Related concepts:
“Applications and PowerHA SystemMirror” on page 98
This topic addresses some of the key issues to consider when making your applications highly available
under PowerHA SystemMirror.
Related reference:
“Writing effective scripts” on page 103
Writing smart application start scripts can also help reduce the likelihood of problems when you bring
applications online.
“Planning resource groups” on page 58
These topics describe how to plan resource groups within a PowerHA SystemMirror cluster.
“Application dependencies”
Historically, to achieve resource group and application sequencing, system administrators had to build
the application recovery logic in their pre-event and post-event processing scripts. Every cluster would be
configured with a pre-event script for all cluster events, and with a post-event script for all cluster events.
Application dependencies
Historically, to achieve resource group and application sequencing, system administrators had to build
the application recovery logic in their pre-event and post-event processing scripts. Every cluster would be
configured with a pre-event script for all cluster events, and with a post-event script for all cluster events.
Such scripts could become all-encompassing case statements. For example, if you want to take an action
for a specific event on a specific node, you need to edit that individual case, add the required code for
pre-event and post-event scripts, and also ensure that the scripts are the same across all nodes.
To summarize, even though the logic of such scripts captures the desired behavior of the cluster, the
scripts can be difficult to customize and even more difficult to maintain later on when the cluster
configuration changes.
If you are using pre-event and post-event scripts or other methods, such as resource group processing
ordering to establish dependencies between applications that are supported by your cluster, then these
methods might no longer be needed or can be significantly simplified. Instead, you can specify
dependencies between resource groups in a cluster.
Note: In many cases, applications depend on more than data and an IP address. For the success of any
application under PowerHA SystemMirror, it is important to know what the application should not
depend upon in order to function properly. This topic outlines many of the major dependency issues. Keep in
mind that these dependencies might come from outside the PowerHA SystemMirror and application
environment.
Locally attached devices can pose a clear dependency problem. In the event of a fallover, if these devices
are not attached and accessible to the standby node, an application might fail to run properly. These
might include a CD-ROM device, a tape device, or an optical juke box. Consider whether your
application depends on any of these and if they can be shared between cluster nodes.
Hard coding
Hard coding an application to a particular device in a particular location creates a potential dependency
issue. For example, the console is typically assigned as /dev/tty0. Although this assigned name is
common, it is by no means guaranteed. If your application assumes the name /dev/tty0, ensure that all
possible standby nodes have the same configuration.
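A start script can confirm that the device names an application assumes are present on the node before the application is started. The following is a minimal sketch; the function name is hypothetical, and the device list is whatever your application expects.

```sh
#!/bin/sh
# Sketch only: report every expected device path that is missing on this node.
# Usage: check_devices /dev/tty0 [more device paths...]
check_devices() {
    cd_missing=0
    for cd_dev in "$@"; do
        if [ ! -e "$cd_dev" ]; then
            echo "missing device: $cd_dev" >&2
            cd_missing=1
        fi
    done
    return $cd_missing
}
```

Running the same check on each possible standby node during planning shows whether the configurations match before a fallover depends on them.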
Software licensing
Another possible problem is software licensing. Software can be licensed to a particular CPU ID. If this is
the case with your application, the software will not restart successfully after a fallover. You might be able
to avoid this problem by having a copy of the software on all cluster nodes. Know whether your
application uses software that is licensed to a particular CPU ID.
Related reference:
“Planning considerations for multitiered applications” on page 16
Business configurations that use multitiered applications can use parent and child dependent resource
groups. For example, the database must be online before the application controller. In this case, if the
database goes down and is moved to a different node, the resource group containing the application
controller would have to be brought down and back up on any node in the cluster.
Application interference
Sometimes an application or an application environment might interfere with the proper functioning of
PowerHA SystemMirror. An application might run properly on both the primary and standby nodes.
However, when PowerHA SystemMirror is started, a conflict with the application or environment could
arise that prevents PowerHA SystemMirror from functioning successfully.
Additionally, products that manipulate network routes can keep PowerHA SystemMirror from
functioning as it was designed. These products can find a secondary path through a network that has had
an initial failure. This routing might prevent PowerHA SystemMirror from properly diagnosing a failure
and taking appropriate recovery actions.
Related reference:
“Initial cluster planning” on page 5
This section describes the initial steps you take to plan a PowerHA SystemMirror cluster to make
applications highly available.
Beyond basic stability, an application under PowerHA SystemMirror should meet other robustness
characteristics.
A good application candidate for PowerHA SystemMirror should be able to restart successfully after a
hardware failure. Run a test on an application before managing it with PowerHA SystemMirror. Run the
application under a heavy load and fail the node. What does it take to recover after the node is back
online? Can this recovery be completely automated? If not, the application might not be a good candidate
for high availability.
Applications should regularly save to disk any information necessary to restart. If a failure occurs, the
application can pick up from where it was before, rather than completely starting over.
Consider characteristics such as time to start, time to restart after failure, and time to stop. Your decisions
in a number of areas, such as script writing, file storage, /etc/inittab file and cron schedule issues, can
improve the probability of successful application implementation.
A good practice for start scripts is to check prerequisite conditions before you start an application. The
prerequisite conditions might include access to a file system, adequate paging space, and free file system
space. The start script should exit and run a command to notify system administrators if requirements are
not met.
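A prerequisite check of this kind can be sketched as follows. The sketch covers file system access and free file system space only; a paging space check is omitted. The function name and the use of the logger command for notification are assumptions, so replace the notification command with whatever your administrators use.

```sh
#!/bin/sh
# Sketch only: NOTIFY_CMD is a placeholder for your own notification command.
NOTIFY_CMD="${NOTIFY_CMD:-logger -t app_start}"

# Usage: check_prereqs DIRECTORY MINIMUM_FREE_KB
# Returns 0 only if the directory is writable and has enough free space.
check_prereqs() {
    cp_dir=$1
    cp_min_kb=$2
    if [ ! -d "$cp_dir" ] || [ ! -w "$cp_dir" ]; then
        $NOTIFY_CMD "start aborted: $cp_dir is not accessible"
        return 1
    fi
    # df -P prints one line per file system; column 4 is the available space in KB.
    cp_free=$(df -Pk "$cp_dir" | awk 'NR == 2 { print $4 }')
    case $cp_free in
        '' | *[!0-9]*)
            $NOTIFY_CMD "start aborted: cannot read free space for $cp_dir"
            return 1
            ;;
    esac
    if [ "$cp_free" -lt "$cp_min_kb" ]; then
        $NOTIFY_CMD "start aborted: $cp_dir has ${cp_free} KB free, need $cp_min_kb KB"
        return 1
    fi
    return 0
}
```

A start script would call the function first and exit with a nonzero status when it fails, for example: check_prereqs /apps/myapp/data 102400 || exit 1 (the path and threshold are hypothetical).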
In pre-event and post-event scripts, on the first line you must specify the shell environment. For example,
if you are using the Korn shell environment, the first line in the event script must be #!/bin/ksh93.
When you start a database, it is important to consider whether there are multiple instances within the
same cluster. In this scenario, you want to start only the instances applicable for each node. Certain
database startup commands read a configuration file and start all known databases at the same time. This
behavior might not be an ideal configuration for all environments.
Be careful not to kill any PowerHA SystemMirror processes as part of your script. If you are using the
output of the ps command and you are using the grep command to search for a certain pattern, verify
that the pattern does not match any of the PowerHA SystemMirror or Reliable Scalable Cluster
Technology (RSCT) processes.
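One way to reduce the risk of an overly broad match is to compare the whole command name instead of searching for a substring. The following is a minimal sketch; the function name and the process name myappd are hypothetical.

```sh
#!/bin/sh
# Sketch only: read "PID COMMAND" lines on standard input and print the PIDs
# whose command name is exactly the name given as the first argument.
# A substring search with grep would also match longer names that merely
# contain the pattern; this comparison does not.
match_exact() {
    awk -v name="$1" '$2 == name { print $1 }'
}

# Typical use, listing every process with its PID and command name:
# ps -e -o pid= -o comm= | match_exact myappd
```

Reviewing the resulting PID list before passing it to the kill command is still advisable the first time a script is tested on a cluster node.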
There are advantages and disadvantages to storing optional files on either shared disks or each node's internal disks. Having files stored on
each node's internal disks implies that you have multiple copies of, and potentially multiple licenses for,
the application. This could require additional cost as well as maintenance in keeping these files
synchronized. However, in the event that an application needs to be upgraded, the entire cluster need not
be taken out of production. One node could be upgraded while the other remains in production. The best
solution is the one that works best for a particular environment.
The inittab file starts applications when you start the system. If cluster resources are needed for an
application to function, they will not become available until after PowerHA SystemMirror is started. It is
better to use the PowerHA SystemMirror application controller facility that allows the application to be a
resource that is started only after all dependent resources are online.
Note: It is important that the following settings are correct in the /etc/inittab file:
hacmp:2:once:/usr/es/sbin/cluster/etc/rc.init
v The clinit and pst_clinit entries must be the last entries of run level "2".
v The clinit entry must precede the pst_clinit entry.
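Part of this note can be checked with a short script. The following sketch confirms only that the hacmp, clinit, and pst_clinit entries exist and that clinit comes before pst_clinit; it does not check that they are the last entries of the run level. The function name is hypothetical.

```sh
#!/bin/sh
# Sketch only: check the presence and relative order of the cluster entries.
# Usage: check_inittab_order [file]   (defaults to /etc/inittab)
check_inittab_order() {
    cio_file=${1:-/etc/inittab}
    awk -F: '
        $1 == "hacmp"      { hacmp = NR }
        $1 == "clinit"     { clinit = NR }
        $1 == "pst_clinit" { pst = NR }
        END {
            if (!hacmp) { print "missing hacmp entry"; bad = 1 }
            if (!clinit || !pst) {
                print "missing clinit or pst_clinit entry"; bad = 1
            } else if (clinit > pst) {
                print "clinit entry does not precede pst_clinit entry"; bad = 1
            }
            exit (bad ? 1 : 0)
        }' "$cio_file"
}
```

The function only reads the file, so it can be run on every node without changing the configuration.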
In the cron table, jobs are started according to a schedule set in the table and the date setting on a node.
This information is maintained on internal disks and thus cannot be shared by a standby node.
Synchronize these cron tables so that a standby node can perform the necessary action at the appropriate
time. Also, ensure that the date is set the same on the primary node and any of its standby nodes.
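A periodic comparison can show when the cron tables have drifted apart. The following is a minimal sketch that compares the current user's cron table with the same user's table on another node; the node name standby1 is hypothetical, and the sketch assumes that key-based ssh access between the nodes is already in place.

```sh
#!/bin/sh
# Sketch only: the remote command and node names are placeholders.
REMOTE_SHELL="${REMOTE_SHELL:-ssh -o BatchMode=yes}"

# Return 0 if the local cron table is identical to the one on the given node.
# Two empty or unreadable tables also compare as equal.
crontab_matches_node() {
    cmn_node=$1
    cmn_local=$(crontab -l 2>/dev/null)
    cmn_remote=$($REMOTE_SHELL "$cmn_node" crontab -l 2>/dev/null </dev/null)
    [ "$cmn_local" = "$cmn_remote" ]
}

# Typical use:
# crontab_matches_node standby1 || echo "cron tables differ on standby1"
```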
The Oracle Database, like many databases, functions well under PowerHA SystemMirror. It is a robust
application that handles failures well. It can roll back uncommitted transactions after a fallover and
return to service in a timely manner. However, there are a few things to keep in mind when using Oracle
Database under PowerHA SystemMirror.
Starting Oracle
Oracle must be started by the Oracle user ID. Thus, the start script should contain the command
su - oracleuser. The dash (-) is important because the su command needs to take on all characteristics of the
Oracle user and reside in the Oracle user's home directory. The command would look something like this:
su - oracleuser -c /apps/oracle/startup/dbstart
The dbstart command and the dbshut command read the /etc/oratab file for instructions on which
database instances are known and should be started. In certain cases it is inappropriate to start all of the
instances, because they might be owned by another node. This would be the case in the mutual takeover
of two Oracle instances. The oratab file typically resides on the internal disk and thus cannot be shared.
If appropriate, consider other ways of starting different Oracle instances.
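One possible approach is to keep a list of the instances that belong to each resource group and start only those. The following sketch selects instance names from the file that the dbstart command reads, assuming the usual SID:ORACLE_HOME:Y|N line layout. The function name is hypothetical, and the sketch only prints the selected names; how each instance is then started is left to your own start script.

```sh
#!/bin/sh
# Sketch only: print the instance names that are flagged Y in the given file
# and that also appear in the space-separated list of wanted instances.
# Usage: select_instances FILE "SID1 SID2 ..."
select_instances() {
    si_file=$1
    si_wanted=$2
    awk -F: -v wanted="$si_wanted" '
        BEGIN {
            n = split(wanted, w, " ")
            for (i = 1; i <= n; i++) want[w[i]] = 1
        }
        /^#/ || NF < 3 { next }            # skip comments and short lines
        ($1 in want) && $3 == "Y" { print $1 }
    ' "$si_file"
}
```

Because the list of wanted instances is passed in by the caller, the same file can remain identical on every node while each start script names only the instances that its resource group owns.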
The stopping of Oracle is a process of special interest. There are several different ways to ensure Oracle
has completely stopped. The suggested sequence is this: first, implement a graceful shutdown; second,
call a shutdown immediate, which is a somewhat more forceful method; finally, create a loop to check the
process table to ensure all Oracle processes have exited.
The Oracle product database contains several files as well as data. It is necessary that the data and redo
logs be stored on shared disk so that both nodes might have access to the information. However, the
Oracle binaries and configuration files could reside on either internal or shared disks. Consider what
solution is best for your environment.
SAP R/3 is an example of a three-tiered application. It has a database tier, an application tier, and a client
tier. Most frequently, it is the database tier that is made highly available. In such a case, when a fallover
occurs and the database is restarted, it is necessary to stop and restart the SAP application tier. You can
do this in one of two ways:
v Using a remote execution command, such as rsh, rexec, or ssh
Note: Certain methods, such as the use of ~/.rhosts files, pose a security risk.
v Making the application tier nodes cluster aware.
The first way to stop and start the SAP application tier is to create a script that performs remote
command execution on the application nodes. The application tier of SAP is stopped and then restarted.
This is done for every node in the application tier. Using a remote execution command requires a method
of allowing the database node access to the application node.
Note: Certain methods, such as the use of ~/.rhosts files, pose a security risk.
A second method for stopping and starting the application tier is to make the application tier nodes
cluster aware. This means that the application tier nodes are aware of the clustered database and know
when a fallover occurs. You can implement this by making the application tier nodes either PowerHA
SystemMirror servers or clients. If the application node is a server, it runs the same cluster events as the
database nodes to indicate a failure. Pre-event and post-event scripts could then be written to stop and
restart the SAP application tier. If the application node is a PowerHA SystemMirror client, it is notified of
the database fallover using SNMP through the cluster information daemon (Clinfo). A program could be
written using the Clinfo API to stop and restart the SAP application tier.
Related information:
Programming client applications for the Clinfo API
IBM may not offer the products, services, or features discussed in this document in other countries.
Consult your local IBM representative for information on the products and services currently available in
your area. Any reference to an IBM product, program, or service is not intended to state or imply that
only that IBM product, program, or service may be used. Any functionally equivalent product, program,
or service that does not infringe any IBM intellectual property right may be used instead. However, it is
the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or
service.
IBM may have patents or pending patent applications covering subject matter described in this
document. The furnishing of this document does not grant you any license to these patents. You can send
license inquiries, in writing, to:
For license inquiries regarding double-byte character set (DBCS) information, contact the IBM Intellectual
Property Department in your country or send inquiries, in writing, to:
This information could include technical inaccuracies or typographical errors. Changes are periodically
made to the information herein; these changes will be incorporated in new editions of the publication.
IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this
publication at any time without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in
any manner serve as an endorsement of those websites. The materials at those websites are not part of
the materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you provide in any way it believes appropriate without
incurring any obligation to you.
Licensees of this program who wish to have information about it for the purpose of enabling: (i) the
exchange of information between independently created programs and other programs (including this
one) and (ii) the mutual use of the information which has been exchanged, should contact:
Such information may be available, subject to appropriate terms and conditions, including in some cases,
payment of a fee.
The licensed program described in this document and all licensed material available for it are provided
by IBM under terms of the IBM Customer Agreement, IBM International Program License Agreement or
any equivalent agreement between us.
The performance data and client examples cited are presented for illustrative purposes only. Actual
performance results may vary depending on specific configurations and operating conditions.
Information concerning non-IBM products was obtained from the suppliers of those products, their
published announcements or other publicly available sources. IBM has not tested those products and
cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM
products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of
those products.
Statements regarding IBM's future direction or intent are subject to change or withdrawal without notice,
and represent goals and objectives only.
All IBM prices shown are IBM's suggested retail prices, are current and are subject to change without
notice. Dealer prices may vary.
This information is for planning purposes only. The information herein is subject to change before the
products described become available.
This information contains examples of data and reports used in daily business operations. To illustrate
them as completely as possible, the examples include the names of individuals, companies, brands, and
products. All of these names are fictitious and any similarity to actual people or business enterprises is
entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs
in any form without payment to IBM, for the purposes of developing, using, marketing or distributing
application programs conforming to the application programming interface for the operating platform for
which the sample programs are written. These examples have not been thoroughly tested under all
conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these
programs. The sample programs are provided "AS IS", without warranty of any kind. IBM shall not be
liable for any damages arising out of your use of the sample programs.
Each copy or any portion of these sample programs or any derivative work must include a copyright
notice as follows:
Portions of this code are derived from IBM Corp. Sample Programs.
This Software Offering does not use cookies or other technologies to collect personally identifiable
information.
If the configurations deployed for this Software Offering provide you as the customer the ability to collect
personally identifiable information from end users via cookies and other technologies, you should seek
your own legal advice about any laws applicable to such data collection, including any requirements for
notice and consent.
For more information about the use of various technologies, including cookies, for these purposes, see
IBM’s Privacy Policy at http://www.ibm.com/privacy and IBM’s Online Privacy Statement at
http://www.ibm.com/privacy/details, in the section entitled “Cookies, Web Beacons and Other
Technologies”, and the “IBM Software Products and Software-as-a-Service Privacy Statement” at
http://www.ibm.com/software/info/product-privacy.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business
Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be
trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at
Copyright and trademark information at www.ibm.com/legal/copytrade.shtml.
Index
A
adding
  disk configuration 37
  network topology 33
AIX Workload Manager
  overview 77
application 12, 98
  dependencies 101
  interference 102
  multitiered 16
  overview 98
  writing scripts 103
application controller 14
application monitoring 15
E
event
  cluster-wide status 86
  network 84
  network interface 85
  node 80
  notification 89
  overview 79
  pre-event and post-event scripts 90
  resource group 87
  site 80
  summary 97
  user-defined 94
example
  network connection 21
F
fast disk takeover 47
file system 41
M
mirroring
  journal log 43
  physical partition 42
P
physical partition
  mirroring 42
physical volume 40
planning
  LVM split-site mirroring 43
Planning
  IPv6 30
V
varyon 48
  forced 50
virtual adapters 31
virtual Ethernet 21
virtual networks 31
virtual SCSI 36
volume group 40
VPN firewall 30