Clustered Data ONTAP 8.3
High-Availability Configuration Guide
NetApp, Inc.
495 East Java Drive
Sunnyvale, CA 94089
U.S.
Contents
Understanding HA pairs .............................................................................. 7
What an HA pair is ...................................................................................................... 7
How HA pairs support nondisruptive operations and fault tolerance ......................... 7
How the HA pair improves fault tolerance ...................................................... 9
Connections and components of an HA pair ............................................................. 11
Comparison of HA pair types .................................................................................... 13
How HA pairs relate to the cluster ............................................................................ 13
How HA pairs relate to MetroCluster configurations ............................................... 16
If you have a two-node switchless cluster ................................................................. 17
Cabling the HA interconnect (all systems except 32xx or FAS80xx in
separate chassis) ...................................................................................... 69
Cabling the HA interconnect (32xx systems in separate chassis) ................. 70
Cabling the HA interconnect (FAS80xx systems in separate chassis) .......... 70
Required connections for using uninterruptible power supplies with standard or
mirrored HA pairs ................................................................................................ 71
Understanding HA pairs
HA pairs provide the hardware redundancy that is required for nondisruptive operations and fault
tolerance. They also give each node in the pair the software functionality to take over its partner's
storage and subsequently give back that storage.
What an HA pair is
An HA pair is two storage systems (nodes) whose controllers are connected to each other directly. In
this configuration, one node can take over its partner's storage to provide continued data service if the
partner goes down.
You can configure the HA pair so that each node in the pair shares access to a common set of storage,
subnets, and tape drives, or each node can own its own distinct set of storage.
The controllers are connected to each other through an HA interconnect. This allows one node to
serve data that resides on the disks of its failed partner node. Each node continually monitors its
partner, mirroring the data for each other's nonvolatile memory (NVRAM or NVMEM). The
interconnect is internal and requires no external cabling if both controllers are in the same chassis.
Takeover is the process in which a node takes over the storage of its partner. Giveback is the process
in which that storage is returned to the partner. Both processes can be initiated manually or
configured for automatic initiation.
Fault tolerance
When one node fails or becomes impaired and a takeover occurs, the partner node continues
to serve the failed node's data.
During nondisruptive upgrades of Data ONTAP, the user manually enters the storage
failover takeover command to take over the partner node to allow the software upgrade
to occur. The takeover node continues to serve data for both nodes during this operation.
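As a minimal sketch (the node name node2 is a placeholder), a planned takeover and the later
giveback use commands like the following, which are described in detail later in this guide:

cluster::> storage failover takeover -ofnode node2
cluster::> storage failover show
cluster::> storage failover giveback -ofnode node2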
The HA pair supplies nondisruptive operation and fault tolerance due to the following aspects of its
configuration:
The controllers in the HA pair are connected to each other either through an HA interconnect
consisting of adapters and cables, or, in systems with two controllers in the same chassis, through
an internal interconnect
The nodes use the interconnect to perform the following tasks:
The nodes use two or more disk shelf loops, or storage arrays, in which the following conditions
apply:
In case of takeover, the surviving node provides read/write access to the partner's disks or
array LUNs until the failed node becomes available again.
Note: Disk ownership is established by Data ONTAP or the administrator; it is not based on
which node the storage is physically attached to.
They own their spare disks, spare array LUNs, or both, and do not share them with the other
node.
They each have mailbox disks or array LUNs on the root volume that perform the following
tasks:
Continually check whether the other node is running or whether it has performed a takeover
Related concepts
If one of the following hardware components fails, whether it is a single point of failure depends on
the configuration:

Hardware component                  Single point of failure (stand-alone)   Single point of failure (HA pair)
Controller                          Yes                                     No
NVRAM                               Yes                                     No
CPU fan                             Yes                                     No
NICs                                Maybe, if all NICs fail                 No
FC-AL adapter or SAS HBA            Yes                                     No
FC-AL loop or SAS stack             No, if dual-path cabling is used        No
Disk shelf module                   No, if dual-path cabling is used        No
Disk drive                          No                                      No
Power supply                        Maybe, if both power supplies fail      No
Fan (disk shelves or controller)    Maybe, if both fans fail                No
HA interconnect adapter             Not applicable                          No
HA interconnect cable               Not applicable                          No
[Figure: An HA pair. Node1 and Node2 are connected to the network and to each other through the
HA interconnect; each node connects to its own storage and to its partner's storage. The legend
distinguishes primary connections, redundant primary connections, standby connections, and
redundant standby connections.]
Related information
HA pair type        Data duplication?   Failover possible after loss of entire node (including storage)?
Standard HA pair    No                  No
Mirrored HA pair    Yes                 No
MetroCluster        Yes                 Yes
Related information
Clustered Data ONTAP 8.3 System Administration Guide for Cluster Administrators
The following diagram shows two HA pairs. The multipath HA storage connections between the
nodes and their storage are shown for each HA pair. For simplicity, only the primary connections to
the data and cluster networks are shown.
[Figure: Two HA pairs in a cluster. Node1 and Node2 form one HA pair and Node3 and Node4 form
another; each pair is joined by its own HA interconnect, each node is cabled to its own storage and
to its partner's storage, and all nodes connect to the data network and the cluster network.]
If Node1 and Node2 both fail, the storage owned by Node1 and Node2 becomes unavailable to the
data network. Although Node3 and Node4 are clustered with Node1 and Node2, they do not have
direct connections to Node1 and Node2's storage and cannot take over their storage.
In a standard or mirrored HA pair, failover is not possible if an entire node, including its storage,
fails disastrously or is disabled. For example, if an entire node loses power, including its storage,
you cannot fail over to the partner node. For that capability, you must have a MetroCluster
configuration.
If the root aggregate is mirrored, storage failover takeover will fail unless all current mailbox disks
are accessible. When all mailbox disks are accessible, storage failover takeover succeeds with the
surviving plex.
Mirrored HA pairs use SyncMirror, implemented through the storage aggregate mirror
command.
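For example, assuming an existing unmirrored aggregate named aggr1 (a hypothetical name) and
sufficient spare disks or array LUNs in the opposite pool, the mirror could be created along these
lines:

cluster::> storage aggregate mirror -aggregate aggr1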
Related information
The failure or loss of three or more disks in a RAID-DP (RAID double-parity) aggregate
The failure of an array LUN; for example, because of a double-disk failure on the storage array
The failure of a SAS HBA, FC-AL adapter, disk shelf loop or stack, or disk shelf module does not
require a failover in a mirrored HA pair.
When you manually initiate takeover with the storage failover takeover command
When a node in an HA pair with the default configuration for immediate takeover on panic
undergoes a software or system failure that leads to a panic
By default, the node automatically performs a giveback, returning the partner to normal operation
after the partner has recovered from the panic and booted up.
When a node in an HA pair undergoes a system failure (for example, a loss of power) and cannot
reboot
Note: If the storage for a node also loses power at the same time, a standard takeover is not
possible.
When a node does not receive heartbeat messages from its partner
This could happen if the partner experienced a hardware or software failure that did not result in a
panic but still prevented it from functioning correctly.
When you halt one of the nodes without using the -f or -inhibit-takeover true parameter
Note: In a two-node cluster with cluster HA enabled, halting or rebooting a node using the
-inhibit-takeover true parameter causes both nodes to stop serving data unless you first
disable cluster HA and then assign epsilon to the node that you want to remain online.
When you reboot one of the nodes without using the -inhibit-takeover true parameter
The -onreboot parameter of the storage failover command is enabled by default.
When hardware-assisted takeover is enabled and it triggers a takeover when the remote
management device (Service Processor) detects failure of the partner node
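Hardware-assisted takeover status can be checked with the storage failover hwassist show command
referenced later in this guide, and it is configured through storage failover modify. A minimal
sketch, assuming the -hwassist parameter and a placeholder node name:

cluster::> storage failover hwassist show
cluster::> storage failover modify -node node1 -hwassist true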
The following list summarizes how hardware failures affect a standard or mirrored HA pair. For
each event, it indicates whether the event triggers a failover, whether it prevents a future failover
from occurring (or from occurring successfully), and whether data remains available on the affected
volume afterward, both for a single (stand-alone) storage system and for a standard or mirrored HA
pair.

Single disk failure
- Triggers failover? No
- Prevents future failover? No
- Data available afterward (single storage system)? Yes
- Data available afterward (standard or mirrored HA pair)? Yes

Double disk failure (2 disks fail in the same RAID group)
- Prevents future failover? Maybe; if the root volume has a double disk failure, or if the mailbox disks are affected, no failover is possible

Triple disk failure (3 disks fail in the same RAID group)
- Triggers failover? Maybe; if SyncMirror is being used, no takeover occurs; otherwise, yes
- Prevents future failover? Maybe; if the root volume has a triple disk failure, no failover is possible
- Data available afterward (single storage system)? No
- Data available afterward (standard or mirrored HA pair)? No

Single HBA (initiator) failure, Loop A
- Triggers failover? Maybe; if SyncMirror or multipath HA is in use, then no; otherwise, yes
- Prevents future failover? Maybe; if the root volume has a double disk failure, no failover is possible
- Data available afterward (single storage system)? Yes, if multipath HA or SyncMirror is being used
- Data available afterward (standard or mirrored HA pair)? Yes, if multipath HA or SyncMirror is being used, or if failover succeeds

Single HBA (initiator) failure, Loop B
- Triggers failover? No
- Data available afterward (single storage system)? Yes, if multipath HA or SyncMirror is being used
- Data available afterward (standard or mirrored HA pair)? Yes, if multipath HA or SyncMirror is being used, or if failover succeeds

Single HBA (initiator) failure, both loops at the same time
- Triggers failover? Maybe; if the data is mirrored or multipath HA is being used and the mailbox disks are not affected, then no; otherwise, yes
- Data available afterward (standard or mirrored HA pair)? No failover needed if data is mirrored or multipath HA is in use

AT-FCX failure (Loop A)
- Triggers failover? Only if a multidisk volume failure or an open loop condition occurs, and neither SyncMirror nor multipath HA is in use
- Prevents future failover? Maybe; if the root volume has a double disk failure, no failover is possible
- Data available afterward (single storage system)? No
- Data available afterward (standard or mirrored HA pair)? Yes, if failover succeeds

AT-FCX failure (Loop B)
- Triggers failover? No
- Prevents future failover? Maybe; if SyncMirror or multipath HA is in use, then no; otherwise, yes
- Data available afterward (single storage system)? Yes, if multipath HA or SyncMirror is in use
- Data available afterward (standard or mirrored HA pair)? Yes

IOM failure (Loop A)
- Triggers failover? Only if a multidisk volume failure or an open loop condition occurs, and neither SyncMirror nor multipath HA is in use
- Prevents future failover? Maybe; if the root volume has a double disk failure, no failover is possible
- Data available afterward (single storage system)? No
- Data available afterward (standard or mirrored HA pair)? Yes, if failover succeeds

IOM failure (Loop B)
- Triggers failover? No
- Prevents future failover? Maybe; if SyncMirror or multipath HA is in use, then no; otherwise, yes
- Data available afterward (single storage system)? Yes, if multipath HA or SyncMirror is in use
- Data available afterward (standard or mirrored HA pair)? Yes

Shelf (backplane) failure
- Triggers failover? Only if a multidisk volume failure or an open loop condition occurs, and data is not mirrored
- Prevents future failover? Maybe; if the root volume has a double disk failure or if the mailboxes are affected, no failover is possible
- Data available afterward (single storage system)? Maybe; if data is mirrored, then yes; otherwise, no
- Data available afterward (standard or mirrored HA pair)? Maybe; if data is mirrored, then yes; otherwise, no

Shelf, single power failure
- Triggers failover? No
- Prevents future failover? No
- Data available afterward (single storage system)? Yes
- Data available afterward (standard or mirrored HA pair)? Yes

Shelf, dual power failure
- Triggers failover? Only if a multidisk volume failure or an open loop condition occurs and data is not mirrored
- Prevents future failover? Maybe; if the root volume has a double disk failure, or if the mailbox disks are affected, no failover is possible
- Data available afterward (single storage system)? Maybe; if data is mirrored, then yes; otherwise, no
- Data available afterward (standard or mirrored HA pair)? Maybe; if data is mirrored, then yes; otherwise, no

Controller, single power failure
- Triggers failover? No
- Prevents future failover? No
- Data available afterward (single storage system)? Yes
- Data available afterward (standard or mirrored HA pair)? Yes

Controller, dual power failure
- Triggers failover? Yes
- Data available afterward (single storage system)? No
- Data available afterward (standard or mirrored HA pair)? Yes, if failover succeeds

HA interconnect failure (1 port)
- Triggers failover? No
- Prevents future failover? No
- Data available afterward (single storage system)? Not applicable
- Data available afterward (standard or mirrored HA pair)? Yes

HA interconnect failure (both ports)
- Triggers failover? No
- Prevents future failover? Yes
- Data available afterward (single storage system)? Not applicable
- Data available afterward (standard or mirrored HA pair)? Yes

Tape interface failure
- Triggers failover? No
- Prevents future failover? No
- Data available afterward (single storage system)? Yes
- Data available afterward (standard or mirrored HA pair)? Yes

Heat exceeds permissible amount
- Triggers failover? Yes
- Prevents future failover? No
- Data available afterward (single storage system)? No
- Data available afterward (standard or mirrored HA pair)? No

Fan failures (disk shelves or controller)
- Triggers failover? No
- Prevents future failover? No
- Data available afterward (single storage system)? Yes
- Data available afterward (standard or mirrored HA pair)? Yes

Reboot
- Triggers failover? Yes
- Prevents future failover? No
- Data available afterward (single storage system)? No
- Data available afterward (standard or mirrored HA pair)? Yes, if failover occurs

Panic
- Triggers failover? Yes
- Prevents future failover? No
- Data available afterward (single storage system)? No
- Data available afterward (standard or mirrored HA pair)? Yes, if failover occurs
Related information
SAS Disk Shelves Installation and Service Guide for DS4243, DS2246, DS4486, and DS4246
You can monitor the progress using the storage failover show-takeover command.
The aggregate relocation can be avoided during this takeover instance by using the
-bypass-optimization parameter with the storage failover takeover command. To
bypass aggregate relocation during all future planned takeovers, set the
-bypass-takeover-optimization parameter of the storage failover modify
command to true. (A command sketch appears after this list.)
2. If the user-initiated takeover is a negotiated takeover, the target node gracefully shuts down,
followed by takeover of the target node's root aggregate and any aggregates that were not
relocated in Step 1.
3. Before the storage takeover begins, data LIFs migrate from the target node to the node performing
the takeover or to any other node in the cluster based on LIF failover rules.
The LIF migration can be avoided by using the -skip-lif-migration parameter with the
storage failover takeover command.
Clustered Data ONTAP 8.3 File Access Management Guide for CIFS
Clustered Data ONTAP 8.3 File Access Management Guide for NFS
Clustered Data ONTAP 8.3 Network Management Guide
4. Existing SMB (CIFS) sessions are disconnected when takeover occurs.
Attention: Due to the nature of the SMB protocol, all SMB sessions, except for SMB 3.0
sessions connected to shares with the Continuous Availability property set, are
disrupted. SMB 1.0 and SMB 2.x sessions cannot reconnect after a takeover event. Therefore,
takeover is disruptive and some data loss could occur.
5. SMB 3.0 sessions established to shares with the Continuous Availability property set can
reconnect to the disconnected shares after a takeover event.
If your site uses SMB 3.0 connections to Microsoft Hyper-V and the Continuous
Availability property is set on the associated shares, takeover will be nondisruptive for those
sessions.
Clustered Data ONTAP 8.3 File Access Management Guide for CIFS
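The following minimal sketch shows how the options described above might be combined for a
planned takeover; the node name is a placeholder, and the parameter spellings are taken from the
descriptions in this section:

cluster::> storage failover takeover -ofnode node2 -bypass-optimization true
cluster::> storage failover show-takeover
cluster::> storage failover modify -node node2 -bypass-takeover-optimization true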
If the node doing the takeover panics
If the node that is performing the takeover panics within 60 seconds of initiating takeover, the
following events occur:
After it reboots, the node performs self-recovery operations and is no longer in takeover mode.
Failover is disabled.
If the node still owns some of the partner's aggregates, after enabling storage failover, return these
aggregates to the partner using the storage failover giveback command.
Clustered Data ONTAP 8.3 man page: storage failover takeover - Take over the storage of a node's
partner
Clustered Data ONTAP 8.3 man page: storage failover show-takeover - Display takeover status
Clustered Data ONTAP 8.3 man page: storage failover giveback - Return failed-over storage to its
home node
Clustered Data ONTAP 8.3 man page: storage failover show-giveback - Display giveback status
Aggregates created on clustered Data ONTAP systems (except for the root aggregate containing
the root volume) have an HA policy of SFO. Manually initiated takeover is optimized for
performance by relocating SFO (non-root) aggregates serially to the partner prior to takeover.
During the giveback process, aggregates are given back serially after the taken-over system boots
and the management applications come online, enabling the node to receive its aggregates.
Because aggregate relocation operations entail reassigning aggregate disk ownership and shifting
control from a node to its partner, only aggregates with an HA policy of SFO are eligible for
aggregate relocation.
The root aggregate always has an HA policy of CFO and is given back at the start of the giveback
operation since this is necessary to allow the taken-over system to boot. All other aggregates are
given back serially after the taken-over system completes the boot process and the management
applications come online, enabling the node to receive its aggregates.
Note: Changing the HA policy of an aggregate from SFO to CFO is a Maintenance mode
operation. Do not modify this setting unless directed to do so by a customer support representative.
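To check which HA policy each aggregate uses, the storage aggregate show command can display
the policy as a field; a minimal sketch, assuming ha-policy is the field name:

cluster::> storage aggregate show -fields ha-policy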
Related information
Clustered Data ONTAP 8.3 man page: storage failover show-giveback - Display giveback status
Clustered Data ONTAP 8.3 man page: storage failover show-takeover - Display takeover status
Benefits of root-data partitioning for entry-level and All Flash FAS storage systems on page 37
HA policy and how it affects takeover and giveback operations on page 28
Related information
If a background disk firmware update is occurring on a disk on either node, manually initiated
takeover operations are delayed until the disk firmware update finishes on that disk. If the
background disk firmware update takes longer than 120 seconds, takeover operations are aborted
and must be restarted manually after the disk firmware update finishes. If the takeover was
initiated with the -bypass-optimization parameter of the storage failover takeover
command set to true, the background disk firmware update occurring on the destination node
does not affect the takeover.
If a background disk firmware update is occurring on a disk on the source (or takeover) node and
the takeover was initiated manually with the -option parameter of the storage failover
takeover command set to immediate, takeover operations start immediately.
If a background disk firmware update is occurring on a disk on a node and it panics, takeover of
the panicked node begins immediately.
If a background disk firmware update is occurring on a disk on either node, giveback of data
aggregates is delayed until the disk firmware update finishes on that disk. If the background disk
firmware update takes longer than 120 seconds, giveback operations are aborted and must be
restarted manually after the disk firmware update completes.
If a background disk firmware update is occurring on a disk on either node, aggregate relocation
operations are delayed until the disk firmware update finishes on that disk. If the background disk
firmware update takes longer than 120 seconds, aggregate relocation operations are aborted and
must be restarted manually after the disk firmware update finishes. If aggregate relocation was
initiated with the -override-destination-checks parameter of the storage aggregate
relocation start command set to true, a background disk firmware update occurring on the
destination node does not affect aggregate relocation.
DR home owner
If the system is in a MetroCluster switchover, DR home owner reflects the value of the Home
owner field before the switchover occurred.
Related information
1. Display the ownership of physical disks using the storage disk show -ownership
command:
Example
cluster::> storage disk show -ownership
Disk     Aggregate Home  Owner DR Home Home ID    Owner ID   DR Home ID Reserver   Pool
-------- --------- ----- ----- ------- ---------- ---------- ---------- ---------- -----
1.0.0    aggr0_2   node2 node2 -       2014941509 2014941509 -          2014941509 Pool0
1.0.1    aggr0_2   node2 node2 -       2014941509 2014941509 -          2014941509 Pool0
1.0.2    aggr0_1   node1 node1 -       2014941219 2014941219 -          2014941219 Pool0
1.0.3    -         node1 node1 -       2014941219 2014941219 -          2014941219 Pool0
...
2. If you have a system that uses shared disks, display the partition ownership using the storage
disk show -partition-ownership command:
Example
cluster::> storage disk show -partition-ownership
                   Root                 Data                 Container
Disk     Aggregate Owner   Owner ID     Owner   Owner ID     Owner   Owner ID
-------- --------- ------- ------------ ------- ------------ ------- ------------
1.0.0    -         node1   1886742616   node1   1886742616   node1   1886742616
1.0.1    -         node1   1886742616   node1   1886742616   node1   1886742616
1.0.2    -         node2   1886742657   node2   1886742657   node2   1886742657
1.0.3    -         node2   1886742657   node2   1886742657   node2   1886742657
...
Do not use the root aggregate for storing data. Storing user data in the root aggregate adversely
affects system stability and increases the storage failover time between nodes in an HA pair.
Make sure that each power supply unit in the storage system is on a different power grid so that a
single power outage does not affect all power supply units.
Use LIFs (logical interfaces) with defined failover policies to provide redundancy and improve
availability of network communication.
Keep both nodes in the HA pair on the same version of Data ONTAP. A node cannot be upgraded
directly from Data ONTAP 8.2 to Data ONTAP 8.3 or higher. You must first upgrade the node
running Data ONTAP 8.2 to 8.2.1 or a higher version within the 8.2 release family.
Test the failover capability routinely (for example, during planned maintenance) to ensure proper
configuration.
Make sure that each node has sufficient resources to adequately support the workload of both
nodes during takeover mode.
Use the Config Advisor tool to help ensure that failovers are successful.
If your system supports remote management (through a Service Processor), make sure that you
configure it properly.
Clustered Data ONTAP 8.3 System Administration Guide for Cluster Administrators
Follow recommended limits for FlexVol volumes, dense volumes, Snapshot copies, and LUNs to
reduce the takeover or giveback time.
For systems using disks, check for failed disks regularly and remove them as soon as possible.
Failed disks can extend the duration of takeover operations or prevent giveback operations.
Multipath HA is required on all HA pairs except for some FAS22xx and FAS25xx system
configurations, which use single-path HA and lack the redundant standby connections.
To ensure that you receive prompt notification if takeover capability becomes disabled, configure
your system to enable automatic email notification for the following takeover impossible EMS
messages (a command sketch appears after this list):
ha.takeoverImpVersion
ha.takeoverImpLowMem
ha.takeoverImpDegraded
ha.takeoverImpUnsync
ha.takeoverImpIC
ha.takeoverImpHotShelf
ha.takeoverImpNotDef
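A minimal sketch of such a notification setup, assuming the event destination and event route
commands accept these parameters, that a wildcard message name is allowed, and that the
destination name and e-mail address are placeholders:

cluster::> event destination create -name ha-alerts -mail admin@example.com
cluster::> event route add-destinations -messagename ha.takeoverImp* -destinations ha-alerts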
Avoid using the -only-cfo-aggregates parameter with the storage failover giveback
command.
Related tasks
Architecture compatibility
Both nodes must have the same system model and be running the same Data ONTAP software
and system firmware versions. The Data ONTAP release notes list the supported storage systems.
Storage capacity
The number of disks or array LUNs must not exceed the maximum configuration capacity.
The total storage attached to each node must not exceed the capacity for a single node.
If your system uses both native disks and array LUNs, the combined total of disks and array
LUNs cannot exceed the maximum configuration capacity.
To determine the maximum capacity for a system using disks, array LUNs, or both, see the
Hardware Universe at hwu.netapp.com.
Note: After a failover, the takeover node temporarily serves data from all the storage in the HA
pair.
Different types of storage can be used on separate stacks or loops on the same node. You can
also dedicate a node to one type of storage and the partner node to a different type, if needed.
Multipath HA is required on all HA pairs except for some FAS22xx and FAS25xx system
configurations, which use single-path HA and lack the redundant standby connections.
Network connectivity
Both nodes must be attached to the same network and the Network Interface Cards (NICs) or
onboard Ethernet ports must be configured correctly.
System software
Related references
Disks or array LUNs in the same plex must be from the same pool, with those in the opposite
plex from the opposite pool.
There must be sufficient spares in each pool to account for a disk or array LUN failure.
Both plexes of a mirror should not reside on the same disk shelf because it might result in a
single point of failure.
If you are using array LUNs, paths to an array LUN must be redundant.
Related references
Clustered Data ONTAP 8.3 System Administration Guide for Cluster Administrators
Clustered Data ONTAP 8.3 System Administration Guide for Cluster Administrators
When you revert a system using root-data partitioning to a previous version of Data ONTAP
When you transfer partitioned storage to a system running a previous version of Data ONTAP
The smaller partition is used to compose the root aggregate. The larger partition is used in data
aggregates. The size of the partitions is set by Data ONTAP, and depends on the number of disks
used to compose the root aggregate when the system is initialized. (The more disks used to compose
the root aggregate, the smaller the root partition.) After system initialization, the partition sizes are
fixed; adding partitions or disks to the root aggregate after system initialization increases the size of
the root aggregate, but does not change the root partition size.
The partitions are used by RAID in the same manner as physical disks are; all of the same
requirements apply. For example, if you add an unpartitioned drive to a RAID group consisting of
partitioned drives, the unpartitioned drive is partitioned to match the partition size of the drives in the
RAID group and the rest of the disk is unused.
If a partitioned disk is moved to another node or used in another aggregate, the partitioning persists;
you can use the disk only in RAID groups composed of partitioned disks.
The following diagram shows one way to configure the partitions for an active-active configuration
with 12 partitioned disks. In this case, there are two RAID-DP data aggregates, each with their own
data partitions, parity partitions, and spares. Note that each disk is allocated to only one node. This is
a best practice that prevents the loss of a single disk from affecting both nodes.
The disks used for data, parity, and spare partitions might not be exactly as shown in these diagrams.
For example, the parity partitions might not always align on the same disk.
Array LUNs
HDD types that are not available as internal drives: ATA, FCAL, and MSATA
100-GB SSDs
MetroCluster
RAID4
Aggregates composed of partitioned drives must have a RAID type of RAID-DP.
Related information
In a single-chassis HA pair, both controller modules are in the same chassis, and the HA interconnect
is provided by the internal backplane of the HA pair.
In a dual-chassis HA pair, the controllers are in separate chassis. The HA interconnect is provided by
external cabling.
Verifying and setting the HA state on the controller modules and chassis on page 72
The following list shows, for each supported storage system, whether the controllers can be placed
in a single chassis or in separate chassis, how the HA interconnect is provided, and whether the
system uses the HA state PROM value:

FAS8080: Single-chassis or dual-chassis. Single-chassis: internal InfiniBand. Dual-chassis: external
InfiniBand using the ports on the I/O expansion modules. Uses HA state PROM value: Yes.
FAS8060: Single-chassis. Internal InfiniBand. Uses HA state PROM value: Yes.
FAS8040: Single-chassis. Internal InfiniBand. Uses HA state PROM value: Yes.
FAS8020: Single-chassis. Internal InfiniBand. Uses HA state PROM value: Yes.
6290: Single-chassis or dual-chassis. Single-chassis: internal InfiniBand. Dual-chassis: external
InfiniBand using an NVRAM adapter. Uses HA state PROM value: Yes.
6280: Single-chassis or dual-chassis. Single-chassis: internal InfiniBand. Dual-chassis: external
InfiniBand using an NVRAM adapter. Uses HA state PROM value: Yes.
6250: Single-chassis or dual-chassis. Single-chassis: internal InfiniBand. Dual-chassis: external
InfiniBand using an NVRAM adapter. Uses HA state PROM value: Yes.
6240: Single-chassis or dual-chassis. Single-chassis: internal InfiniBand. Dual-chassis: external
InfiniBand using an NVRAM adapter. Uses HA state PROM value: Yes.
6220: Single-chassis or dual-chassis. Single-chassis: internal InfiniBand. Dual-chassis: external
InfiniBand using an NVRAM adapter. Uses HA state PROM value: Yes.
6210: Single-chassis or dual-chassis. Single-chassis: internal InfiniBand. Dual-chassis: external
InfiniBand using an NVRAM adapter. Uses HA state PROM value: Yes.
3250: Single-chassis or dual-chassis. Single-chassis: internal InfiniBand. Dual-chassis: external
10-GbE using the onboard c0a and c0b ports. Uses HA state PROM value: Yes.
3220: Single-chassis or dual-chassis. Single-chassis: internal InfiniBand. Dual-chassis: external
10-GbE using the onboard c0a and c0b ports. Uses HA state PROM value: Yes.
FAS25xx: Single-chassis. Internal InfiniBand. Uses HA state PROM value: Yes.
FAS22xx: Single-chassis. Internal InfiniBand. Uses HA state PROM value: Yes.
Refer to the disk shelf documentation on the NetApp Support Site for the shelf cabling procedures if
your HA pair configuration includes SAS disk shelves. For cabling the HA interconnect between the
nodes, use the procedures in this guide.
Required documentation
Installing an HA pair requires the correct documentation.
The following table lists and briefly describes the documentation you might need to refer to when
preparing a new HA pair, or converting two stand-alone systems into an HA pair:
Manual name          Description
Diagnostics Guide
Related information
Required tools
You must have the correct tools to install the HA pair.
The following list specifies the tools you need to install the HA pair:
Hand level
Marker
Required equipment
When you receive your HA pair, you should receive the equipment listed in the following table. See
the Hardware Universe at hwu.netapp.com to confirm your storage system type, storage capacity, and
so on.
Required equipment       Details
Storage system           See the Hardware Universe at hwu.netapp.com
Storage                  See the Hardware Universe at hwu.netapp.com

Note: When 32xx systems are in a dual-chassis HA pair, the c0a and c0b 10-GbE ports are the HA
interconnect ports. They do not require an HA interconnect adapter. Regardless of configuration,
the 32xx system's c0a and c0b ports cannot be used for data; they are only for the HA interconnect.
1. Install the nodes in the equipment rack as described in the guide for your disk shelf, hardware
documentation, or the Installation and Setup Instructions that came with your equipment.
2. Install the disk shelves in the equipment rack as described in the appropriate disk shelf guide.
3. Label the interfaces, where appropriate.
4. Connect the nodes to the network as described in the setup instructions for your system.
The nodes are now in place and connected to the network; power is available.
After you finish
1. Install the system cabinets, nodes, and disk shelves as described in the System Cabinet Guide.
If you have multiple system cabinets, remove the front and rear doors and any side panels that
need to be removed, and connect the system cabinets together.
2. Connect the nodes to the network, as described in the Installation and Setup Instructions for your
system.
3. Connect the system cabinets to an appropriate power source and apply power to the cabinets.
Result
The nodes are now in place and connected to the network, and power is available.
After you finish
This procedure explains how to cable a configuration using DS14mk2 AT or DS14mk4 FC disk
shelves.
Refer to the NetApp Support Site for additional documentation if your HA pair configuration
includes SAS disk shelves.
Note: If you are installing an HA pair that uses array LUNs, there are specific procedures you
must follow when cabling Data ONTAP systems to storage arrays.
1. Determining which Fibre Channel ports to use for Fibre Channel disk shelf connections on page
51
2. Cabling Node A to DS14mk2 AT or DS14mk4 FC disk shelves on page 52
3. Cabling Node B to DS14mk2 AT or DS14mk4 FC disk shelves on page 54
4. Cabling the HA interconnect (all systems except 32xx or FAS80xx in separate chassis) on page
56
5. Cabling the HA interconnect (32xx systems in separate chassis) on page 57
6. Cabling the HA interconnect (FAS80xx systems in separate chassis) on page 58
Related information
Determining which Fibre Channel ports to use for Fibre Channel disk shelf
connections
Before cabling your HA pair, you need to identify which Fibre Channel ports to use to connect your
disk shelves to each storage system, and in what order to connect them.
You must keep the following guidelines in mind when identifying which ports to use:
Every disk shelf loop in the HA pair requires two ports on the node, one for the primary
connection and one for the redundant multipath HA connection.
A standard HA pair with one loop for each node uses four ports on each node.
Onboard Fibre Channel ports should be used before using ports on expansion adapters.
See the Hardware Universe at hwu.netapp.com to obtain the correct expansion slot assignment
information for the various adapters you use to cable your HA pair.
If using Fibre Channel HBAs, insert the adapters in the same slots on both systems.
After identifying the ports, you should have a numbered list of Fibre Channel ports for both nodes,
starting with Port 1.
Disk shelf loops using ESH4 modules must be cabled to the quad-port HBA first.
Disk shelf loops using AT-FCX modules must be cabled to dual-port HBA ports or onboard ports
before using ports on the quad-port HBA.
Port A of the HBA must be cabled to the In port of Channel A of the first disk shelf in the loop.
Port A of the partner node's HBA must be cabled to the In port of Channel B of the first disk shelf
in the loop. This ensures that disk names are the same for both nodes.
Additional disk shelf loops must be cabled sequentially with the HBA's ports.
Port A is used for the first loop, port B for the second loop, and so on.
If available, ports C or D must be used for the redundant multipath HA connection after cabling
all remaining disk shelf loops.
All other cabling rules described in the documentation for the HBA and the Hardware Universe
must be observed.
The circled numbers in the diagram correspond to the step numbers in the procedure.
The location of the Input and Output ports on the disk shelves varies depending on the disk
shelf models.
Make sure that you refer to the labeling on the disk shelf rather than to the location of the port
shown in the diagram.
The location of the Fibre Channel ports on the controllers is not representative of any
particular storage system model; determine the locations of the ports you are using in your
configuration by inspection or by using the Installation and Setup Instructions for your model.
The port numbers refer to the list of Fibre Channel ports you created.
The diagram only shows one loop per node and one disk shelf per loop.
Your installation might have more loops, more disk shelves, or different numbers of disk
shelves between nodes.
[Figure: Cabling Node A to the disk shelves. The circled numbers correspond to the step numbers in
the procedure. Fibre Channel ports A1 and A2 on the Node A controller connect to the Channel A
Input port of the first Node A disk shelf and the Channel B Input port of the first Node B disk shelf;
the loops continue to any additional shelves and terminate at ports B3 and B4 on the Node B
controller.]
2. Cable Fibre Channel port A1 of Node A to the Channel A Input port of the first disk shelf of
Node A loop 1.
3. Cable the Node A disk shelf Channel A Output port to the Channel A Input port of the next disk
shelf in loop 1.
4. Repeat Step 3 for any remaining disk shelves in loop 1.
5. Cable the Channel A Output port of the last disk shelf in the loop to Fibre Channel port B4 of
Node B.
This provides the redundant multipath HA connection for Channel A.
6. Cable Fibre Channel port A2 of Node A to the Channel B Input port of the first disk shelf of
Node B loop 1.
7. Cable the Node B disk shelf Channel B Output port to the Channel B Input port of the next disk
shelf in loop 1.
Cable Node B.
The circled numbers in the diagram correspond to the step numbers in the procedure.
The location of the Input and Output ports on the disk shelves varies depending on the disk
shelf models.
Make sure that you refer to the labeling on the disk shelf rather than to the location of the port
shown in the diagram.
The location of the Fibre Channel ports on the controllers is not representative of any
particular storage system model; determine the locations of the ports you are using in your
configuration by inspection or by using the Installation and Setup Instructions for your model.
The port numbers refer to the list of Fibre Channel ports you created.
The diagram only shows one loop per node and one disk shelf per loop.
Your installation might have more loops, more disk shelves, or different numbers of disk
shelves between nodes.
[Figure: Cabling Node B to the disk shelves. The circled numbers correspond to the step numbers in
the procedure. Fibre Channel ports B1 and B2 on the Node B controller connect to the Channel B
Input port of the first Node A disk shelf and the Channel A Input port of the first Node B disk shelf;
the loops continue to any additional shelves and terminate at ports A3 and A4 on the Node A
controller.]
2. Cable Port B1 of Node B to the Channel B Input port of the first disk shelf of Node A loop 1.
Both channels of this disk shelf are connected to the same port on each node. This is not required,
but it makes your HA pair easier to administer because the disks have the same ID on each node.
This is true for Step 5 also.
3. Cable the disk shelf Channel B Output port to the Channel B Input port of the next disk shelf in
loop 1.
4. Repeat Step 3 for any remaining disk shelves in loop 1.
5. Cable the Channel B Output port of the last disk shelf in the loop to Fibre Channel port A4 of
Node A.
This provides the redundant multipath HA connection for Channel B.
6. Cable Fibre Channel port B2 of Node B to the Channel A Input port of the first disk shelf of Node
B loop 1.
This procedure applies to all dual-chassis HA pairs (HA pairs in which the two controller modules
reside in separate chassis) except the 32xx or FAS80xx in separate chassis, regardless of disk shelf
type.
Steps
1. See the Hardware Universe at hwu.netapp.com to ensure that your interconnect adapter is in the
correct slot for your system in an HA pair.
For systems that use an NVRAM adapter, the NVRAM adapter functions as the HA interconnect
adapter.
2. Plug one end of the optical cable into one of the local node's HA adapter ports, then plug the
other end into the partner node's corresponding adapter port.
You must not cross-cable the HA interconnect adapter. Cable the local node ports only to the
identical ports on the partner node.
If the system detects a cross-cabled HA interconnect, the following message appears on the
system console and in the event log (accessible using the event log show command):
HA interconnect port <port> of this appliance seems to be connected to
port <port> on the partner appliance.
This procedure applies to 32xx systems regardless of the type of attached disk shelves.
Steps
1. Plug one end of the 10-GbE cable to the c0a port on one controller module.
2. Plug the other end of the 10-GbE cable to the c0a port on the partner controller module.
3. Repeat the preceding steps to connect the c0b ports.
Do not cross-cable the HA interconnect adapter; cable the local node ports only to the identical
ports on the partner node.
If the system detects a cross-cabled HA interconnect, the following message appears on the
system console and in the event log (accessible using the event log show command):
HA interconnect port <port> of this appliance seems to be connected to
port <port> on the partner appliance.
Result
Because the FAS80xx storage controller modules do not include external HA interconnect ports, you
must use the HA interconnect ports on the I/O expansion modules to deploy these controller models
in separate chassis.
This procedure applies to FAS80xx systems, regardless of the type of attached disk shelves.
Steps
1. Plug one end of the QSFP InfiniBand cable to the ib0a port on one I/O expansion module.
2. Plug the other end of the QSFP InfiniBand cable to the ib0a port on the partner's I/O expansion
module.
3. Repeat the preceding steps to connect the ib0b ports.
Do not cross-cable the HA interconnect ports; cable the local node ports only to the identical
ports on the partner node.
If the system detects a cross-cabled HA interconnect, the following message appears on the
system console and in the event log (accessible using the event log show command):
HA interconnect port <port> of this appliance seems to be connected to
port <port> on the partner appliance.
Result
This procedure explains how to cable a configuration using DS14mk2 AT or DS14mk4 FC disk
shelves.
Refer to the NetApp Support Site for additional documentation if your HA pair configuration
includes SAS disk shelves.
Note: If you are installing an HA pair that uses array LUNs, there are specific procedures you
must follow when cabling Data ONTAP systems to storage arrays.
Determining which Fibre Channel ports to use for Fibre Channel disk shelf
connections
Before cabling your HA pair, you need to identify which Fibre Channel ports to use to connect your
disk shelves to each storage system, and in what order to connect them.
You must keep the following guidelines in mind when identifying which ports to use:
Every disk shelf loop in the HA pair requires two ports on the node, one for the primary
connection and one for the redundant multipath HA connection.
A standard HA pair with one loop for each node uses four ports on each node.
Onboard Fibre Channel ports should be used before using ports on expansion adapters.
See the Hardware Universe at hwu.netapp.com to obtain the correct expansion slot assignment
information for the various adapters you use to cable your HA pair.
If using Fibre Channel HBAs, insert the adapters in the same slots on both systems.
Disk shelf loops using ESH4 modules must be cabled to the quad-port HBA first.
Disk shelf loops using AT-FCX modules must be cabled to dual-port HBA ports or onboard ports
before using ports on the quad-port HBA.
Port A of the HBA must be cabled to the In port of Channel A of the first disk shelf in the loop.
Port A of the partner node's HBA must be cabled to the In port of Channel B of the first disk shelf
in the loop. This ensures that disk names are the same for both nodes.
Additional disk shelf loops must be cabled sequentially with the HBA's ports.
Port A is used for the first loop, port B for the second loop, and so on.
If available, ports C or D must be used for the redundant multipath HA connection after cabling
all remaining disk shelf loops.
All other cabling rules described in the documentation for the HBA and the Hardware Universe
must be observed.
Mirrored HA pairs, regardless of disk shelf type, use SyncMirror to separate each aggregate into two
plexes that mirror each other. One plex uses disks in pool 0 and the other plex uses disks in pool 1.
You must assign disks to the pools appropriately.
Follow the documented guidelines for software-based disk ownership.
1. Create a table that specifies the port usage; the cabling diagrams in this document use the notation
P1-3 (the third port for pool 1).
For a 32xx HA pair that has two mirrored loops, the port list might look like the following
example, with one column of ports for Pool 0 and one for Pool 1.
The circled numbers in the diagram correspond to the step numbers in the procedure.
The location of the Input and Output ports on the disk shelves varies depending on the disk
shelf models.
Make sure that you refer to the labeling on the disk shelf rather than to the location of the port
shown in the diagram.
The location of the Fibre Channel ports on the controllers is not representative of any
particular storage system model; determine the locations of the ports you are using in your
configuration by inspection or by using the Installation and Setup Instructions for your model.
The port numbers refer to the list of Fibre Channel ports you created.
The diagram only shows one loop per node and one disk shelf per loop.
Your installation might have more loops, more disk shelves, or different numbers of disk
shelves between nodes.
d. Repeat Substep 3c, connecting the next Channel A output to the next disk shelf Channel A
Input port for any remaining disk shelves in this loop for each disk pool.
e. Repeat Substep 3a through Substep 3d for any additional loops for Channel A, Node A, using
the odd-numbered ports (P0-3 and P1-3, P0-5, and P1-5, and so on).
4. Cable Channel A for Node B.
a. Cable the second port for pool 0 (P0-2) of Node B to the first Node B disk shelf Channel A
Input port of disk shelf pool 0.
b. Cable the second port for pool 1 (P1-2) of Node B to the first Node B disk shelf Channel A
Input port of disk shelf pool 1.
c. Cable the disk shelf Channel A Output port to the next disk shelf Channel A Input port in the
loop for both disk pools.
d. Repeat Substep 4c, connecting Channel A output to input, for any remaining disk shelves in
each disk pool.
e. Repeat Substep 4a through Substep 4d for any additional loops on Channel A, Node B, using
the even-numbered ports (P0-4 and P1-4, P0-6, and P1-6, and so on).
After you finish
The circled numbers in the diagram correspond to the step numbers in the procedure.
The location of the Input and Output ports on the disk shelves varies depending on the disk
shelf models.
Make sure that you refer to the labeling on the disk shelf rather than to the location of the port
shown in the diagram.
The location of the Fibre Channel ports on the controllers is not representative of any
particular storage system model; determine the locations of the ports you are using in your
configuration by inspection or by using the Installation and Setup Instructions for your model.
The port numbers refer to the list of Fibre Channel ports you created.
The diagram only shows one loop per node and one disk shelf per loop.
Your installation might have more loops, more disk shelves, or different numbers of disk
shelves between nodes.
d. Repeat Substep 2c, connecting Channel B output to input, for any remaining disk shelves in
each disk pool.
e. Repeat Substep 2a through Substep 2d for any additional loops on Channel B, Node A, using
the even-numbered ports (P0-4 and P1-4, P0-6, and P1-6, and so on).
3. Cable Channel B for Node B.
a. Cable the first port for pool 0 (P0-1) of Node B to the first Node A disk shelf Channel B Input
port of disk shelf pool 0.
b. Cable the first port for pool 1 (P1-1) of Node B to the first Node A disk shelf Channel B Input
port of disk shelf pool 1.
c. Cable the disk shelf Channel B Output port to the next disk shelf Channel B Input port in the
loop for both disk pools.
d. Repeat Substep 3c, connecting Channel B output to input, for any remaining disk shelves in
each disk pool.
e. Repeat Substep 3a through Substep 3d for any additional loops for Channel B, Node B, using
the odd-numbered ports (P0-3 and P1-3, P0-5, and P1-5, and so on).
After you finish
The circled numbers in the diagram correspond to the step numbers in the procedure.
The location of the Input and Output ports on the disk shelves varies depending on the disk
shelf models.
Make sure that you refer to the labeling on the disk shelf rather than to the location of the port
shown in the diagram.
The location of the Fibre Channel ports on the controllers is not representative of any
particular storage system model; determine the locations of the ports you are using in your
configuration by inspection or by using the Installation and Setup Instructions for your model.
The port numbers refer to the list of Fibre Channel ports you created.
The diagram only shows one loop per node and one disk shelf per loop.
Your installation might have more loops, more disk shelves, or different numbers of disk
shelves between nodes.
2. Connect the Channel A output port on the last disk shelf for each loop belonging to Node A to an
available port on Node B in the same pool.
3. Connect the Channel B output port on the last disk shelf for each loop belonging to Node A to an
available port on Node B in the same pool.
4. Connect the Channel A output port on the last disk shelf for each loop belonging to Node B to an
available port on Node A in the same pool.
5. Connect the Channel B output port on the last disk shelf for each loop belonging to Node B to an
available port on Node A in the same pool.
This procedure applies to all dual-chassis HA pairs (HA pairs in which the two controller modules
reside in separate chassis) except the 32xx or FAS80xx in separate chassis, regardless of disk shelf
type.
Steps
1. See the Hardware Universe at hwu.netapp.com to ensure that your interconnect adapter is in the
correct slot for your system in an HA pair.
For systems that use an NVRAM adapter, the NVRAM adapter functions as the HA interconnect
adapter.
2. Plug one end of the optical cable into one of the local node's HA adapter ports, then plug the
other end into the partner node's corresponding adapter port.
You must not cross-cable the HA interconnect adapter. Cable the local node ports only to the
identical ports on the partner node.
If the system detects a cross-cabled HA interconnect, the following message appears on the
system console and in the event log (accessible using the event log show command):
HA interconnect port <port> of this appliance seems to be connected to
port <port> on the partner appliance.
This procedure applies to 32xx systems regardless of the type of attached disk shelves.
Steps
1. Plug one end of the 10-GbE cable to the c0a port on one controller module.
2. Plug the other end of the 10-GbE cable to the c0a port on the partner controller module.
3. Repeat the preceding steps to connect the c0b ports.
Do not cross-cable the HA interconnect adapter; cable the local node ports only to the identical
ports on the partner node.
If the system detects a cross-cabled HA interconnect, the following message appears on the
system console and in the event log (accessible using the event log show command):
HA interconnect port <port> of this appliance seems to be connected to
port <port> on the partner appliance.
Result
Because the FAS80xx storage controller modules do not include external HA interconnect ports, you
must use the HA interconnect ports on the I/O expansion modules to deploy these controller models
in separate chassis.
This procedure applies to FAS80xx systems, regardless of the type of attached disk shelves.
1. Plug one end of the QSFP InfiniBand cable to the ib0a port on one I/O expansion module.
2. Plug the other end of the QSFP InfiniBand cable to the ib0a port on the partner's I/O expansion
module.
3. Repeat the preceding steps to connect the ib0b ports.
Do not cross-cable the HA interconnect ports; cable the local node ports only to the identical
ports on the partner node.
If the system detects a cross-cabled HA interconnect, the following message appears on the
system console and in the event log (accessible using the event log show command):
HA interconnect port <port> of this appliance seems to be connected to
port <port> on the partner appliance.
Result
Configuring an HA pair
Bringing up and configuring a standard or mirrored HA pair for the first time can require enabling
HA mode capability and failover, setting options, configuring network connections, and testing the
configuration.
These tasks apply to all HA pairs regardless of disk shelf type.
Steps
1.
2.
3.
4.
5.
6.
7.
8.
Verifying and setting the HA state on the controller modules and chassis on page 72
Setting the HA mode and enabling storage failover on page 74
Enabling cluster HA and switchless-cluster in a two-node cluster on page 75
Verifying the HA pair configuration on page 77
Configuring hardware-assisted takeover on page 77
Configuring automatic takeover on page 79
Configuring automatic giveback on page 80
Testing takeover and giveback on page 84
The ha-config command only applies to the local controller module and, in the case of a
dual-chassis HA pair, the local chassis. To ensure consistent HA state information throughout the
system, you must also run these commands on the partner controller module and chassis, if necessary.
Note: When you boot a node for the first time, the HA state value for both controller and chassis is
default.
The HA state is recorded in the hardware PROM in the chassis and in the controller module. It must
be consistent across all components of the system, as shown in the following table:
- Stand-alone configuration (not in an HA pair): the HA state is recorded on the chassis and
controller module A; the value should be non-ha.
- A single-chassis HA pair: the HA state is recorded on the chassis, controller module A, and
controller module B; the value should be ha.
- A dual-chassis HA pair: the HA state is recorded on chassis A, controller module A, chassis B,
and controller module B; the value should be ha.
- A single-chassis HA pair in a MetroCluster configuration: the HA state is recorded on the chassis,
controller module A, and controller module B; the value should be mcc.
- A dual-chassis HA pair in a MetroCluster configuration: the HA state is recorded on chassis A,
controller module A, chassis B, and controller module B; the value should be mcc.
Use the following steps to verify the HA state is appropriate and, if not, to change it:
Steps
1. Reboot or halt the current controller module and use either of the following two options to boot
into Maintenance mode:
a. If you rebooted the controller, press Ctrl-C when prompted to display the boot menu and then
select the option for Maintenance mode boot.
b. If you halted the controller, enter the following command from the LOADER prompt:
boot_ontap maint
2. After the system boots into Maintenance mode, enter the following command to display the HA
state of the local controller module and chassis:
ha-config show
4. If necessary, enter the following command to set the HA state of the chassis:
ha-config modify chassis ha-state
6. Boot the system by entering the following command at the boot loader prompt:
boot_ontap
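A minimal sketch of the Maintenance mode commands for a single-chassis HA pair, assuming the
desired HA state is ha on both the controller module and the chassis:

*> ha-config show
*> ha-config modify controller ha
*> ha-config modify chassis ha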
To enable or disable takeover, use the storage failover modify command.
Related information
Clustered Data ONTAP 8.3 man page: storage failover modify - Modify storage failover attributes
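A minimal sketch, assuming the -enabled parameter of the storage failover modify command and a
placeholder node name:

cluster::> storage failover modify -node node1 -enabled true
cluster::> storage failover modify -node node1 -enabled false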
In a two-node cluster, cluster HA ensures that the failure of one node does not disable the cluster. If
your cluster contains only two nodes:
Enabling cluster HA requires and automatically enables storage failover and auto-giveback.
A two-node cluster can be configured using direct-cable connections between the nodes instead of a
cluster interconnect switch. If you have a two-node switchless configuration, the switchlesscluster network option must be enabled to ensure proper cluster communication between the
nodes.
Steps
1. Enable cluster HA.
If storage failover is not already enabled, you are prompted to confirm enabling of both storage
failover and auto-giveback.
2. If you have a two-node switchless cluster, enter the following commands to verify that the
switchless-cluster option is set:
a. Enter the following command to change to the advanced privilege level:
set -privilege advanced
Confirm when prompted to continue into advanced mode. The advanced mode prompt appears
(*>).
b. Enter the following command:
network options switchless-cluster show
If the output shows that the value is false, you must issue the following command:
network options switchless-cluster modify true
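A minimal sketch of this procedure, assuming the -configured parameter of the cluster ha modify
command and the -enabled parameter of the switchless-cluster option:

cluster::> cluster ha modify -configured true
cluster::> set -privilege advanced
cluster::*> network options switchless-cluster show
cluster::*> network options switchless-cluster modify -enabled true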
Config Advisor is a configuration validation and health check tool for NetApp systems. It can be
deployed at both secure sites and non-secure sites for data collection and system analysis.
Note: Support for Config Advisor is limited and available only online.
Steps
1. Log in to the NetApp Support Site at mysupport.netapp.com and go to Downloads > Software >
ToolChest.
2. Click Config Advisor.
3. Follow the directions on the web page for downloading, installing, and running the utility.
4. After running Config Advisor, review the tool's output and follow the recommendations provided
to address any issues discovered.
Related information
Clustered Data ONTAP 8.3 man page: storage failover modify - Modify storage failover attributes
Clustered Data ONTAP 8.3 Command Map for 7-Mode Administrators
The following table lists the hardware-assisted takeover alerts and whether receipt of the alert
initiates a takeover:

Alert                 Takeover initiated upon receipt?
power_loss            Yes
l2_watchdog_reset     Yes
power_off_via_sp      Yes
power_cycle_via_sp    Yes
reset_via_sp          Yes
abnormal_reboot       No
loss_of_heartbeat     No
periodic_message      No
test                  No
Reboots
Panics
Related information
Clustered Data ONTAP 8.3 man page: storage failover modify - Modify storage failover attributes
Clustered Data ONTAP 8.3 Command Map for 7-Mode Administrators
The node cannot send heartbeat messages to its partner due to events such as loss of power or
watchdog reset.
The automatic giveback causes a second unscheduled interruption (after the automatic takeover).
Depending on your client configurations, you might want to initiate the giveback manually to
plan when this second interruption occurs.
The takeover might have been due to a hardware problem that can recur without additional
diagnosis, leading to additional takeovers and givebacks.
Note: Automatic giveback is enabled by default if the cluster contains only a single HA pair.
Automatic giveback is disabled by default during nondisruptive Data ONTAP upgrades.
Before performing the automatic giveback (regardless of what triggered it), the partner node waits for
a fixed amount of time, as controlled by the -delay-seconds parameter of the storage failover
modify command. The default delay is 600 seconds. By delaying the giveback, the process results in
two brief outages instead of one longer outage.
Note: Setting the -auto-giveback parameter to false does not disable automatic giveback after
takeover on panic; that behavior is controlled separately by the -auto-giveback-after-panic
parameter.
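A minimal sketch of adjusting these settings with the parameters described above (node1 is a
placeholder node name):

cluster::> storage failover modify -node node1 -auto-giveback true -delay-seconds 600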
Related information
Clustered Data ONTAP 8.3 man page: storage failover modify - Modify storage failover attributes
Clustered Data ONTAP 8.3 Command Map for 7-Mode Administrators
Parameter                       Default setting
-auto-giveback true|false       (enabled by default only if the cluster contains a single HA pair; see the note above)
-delay-seconds <seconds>        600
-onreboot true|false            true
The following table describes how combinations of the -onreboot and -auto-giveback
parameters affect automatic giveback for takeover events not caused by a panic:

storage failover modify parameters used     Cause of takeover   Does takeover occur?   Does automatic giveback occur?
-onreboot true, -auto-giveback true         reboot command      Yes                    Yes
-onreboot true, -auto-giveback false        reboot command      Yes                    No
-onreboot false, -auto-giveback true        reboot command      No                     No
-onreboot false, -auto-giveback false       reboot command      No                     No
When the -onreboot parameter is set to false, a takeover does not occur in the case of a node
reboot. Therefore, automatic giveback cannot occur, regardless of whether the -auto-giveback
parameter is set to true. A client disruption occurs.
Parameter                                                      Default setting
-onpanic true|false                                            true
-auto-giveback-after-panic true|false (privilege: advanced)    true
The following table describes how parameter combinations of the storage failover modify
command affect automatic giveback in panic situations:

storage failover modify parameters used              Does automatic giveback occur after the panic?
-onpanic true, -auto-giveback-after-panic true       Yes
-onpanic true, -auto-giveback-after-panic false      Yes
-onpanic false, -auto-giveback-after-panic true      No
-onpanic false, -auto-giveback-after-panic false     No
Note: If the -onpanic parameter is set to true, automatic giveback is always performed if a
panic occurs.
If the -onpanic parameter is set to false, takeover does not occur in the event of a panic. Therefore,
automatic giveback cannot occur, even if the -auto-giveback-after-panic parameter is set to true. A
client disruption occurs.
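For example, to keep takeover on panic enabled while preventing the automatic giveback that follows it, you might change the advanced-privilege parameter as follows (the node name is illustrative, and set -privilege advanced is assumed to be the command used to enter advanced privilege):
cluster::> set -privilege advanced
cluster::*> storage failover modify -node node1 -auto-giveback-after-panic false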
Throughout the takeover process, the takeover node should continue to serve the data
normally provided by the partner node. During giveback, control and delivery of the partner's storage
should return to the partner node.
Steps
1. Check the cabling on the HA interconnect cables to make sure that they are secure.
2. Verify that you can create and retrieve files on both nodes for each licensed protocol.
3. Enter the following command:
storage failover takeover -ofnode partner_node
Example
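For instance, with the node names used later in this procedure (Node1 taking over its partner Node2), the command might look like this:
cluster::> storage failover takeover -ofnode node2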
5. Enter the following command to display all the disks that belong to the partner node (Node2) that
the takeover node (Node1) can detect:
storage disk show -home node2 -ownership
The following command displays all disks belonging to Node2 that Node1 can detect:
Disk      Home    Owner   Home ID      Owner ID     DR Home ID
--------  ------  ------  -----------  -----------  ----------
...       node2   node2   4078312453   4078312453   -
...       node2   node2   4078312453   4078312453   -
6. Enter the following command to confirm that the takeover node (Node1) controls the partner
node's (Node2) aggregates:
aggr show -fields home-id,home-name,is-home

cluster::> aggr show -fields home-id,home-name,is-home
aggregate home-id    home-name is-home
--------- ---------- --------- -------
aggr0_1   2014942045 node1     true
aggr0_2   4078312453 node2     false
aggr1_1   2014942045 node1     true
aggr1_2   4078312453 node2     false
4 entries were displayed.
During takeover, the is-home value of the partner node's aggregates is false.
7. Give back the partner node's data service after it displays the Waiting for giveback message
by entering the following command:
storage failover giveback -ofnode partner_node
8. Enter either of the following commands to observe the progress of the giveback operation:
storage failover show-giveback
storage failover show
9. Proceed depending on whether you saw the message that giveback was completed successfully:
If takeover and giveback...    Then...
Is completed successfully      Repeat Steps 2 through 8 on the partner node.
Fails                          Correct the takeover or giveback failure and then repeat this procedure.
Monitoring an HA pair
You can use a variety of commands to monitor the status of the HA pair. If a takeover occurs, you
can also determine what caused the takeover.
cluster ha show
ha-config show
Note: This is a Maintenance mode command.
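For example, a quick health check of the HA relationship might combine the following commands (output is omitted here because it varies by configuration):
cluster::> storage failover show
cluster::> cluster ha show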
Clustered Data ONTAP 8.3 man page: storage failover show - Display storage failover status
Clustered Data ONTAP 8.3 man page: storage failover hwassist show - Display hwassist status
Clustered Data ONTAP 8.3 man page: storage failover hwassist stats show - Display hwassist
statistics
Clustered Data ONTAP 8.3 man page: storage failover show-takeover - Display takeover status
Clustered Data ONTAP 8.3 man page: storage failover show-giveback - Display giveback status
Clustered Data ONTAP 8.3 man page: cluster ha show - Show high-availability configuration
status for the cluster
Clustered Data ONTAP 8.3 man page: storage aggregate show - Display a list of aggregates
The storage failover show and related commands report a state for each node, along with its meaning;
for some states, the partner's firmware-status is also reported. States include the following:
Connected to partner_name
Connected to partner_name, Automatic takeover disabled
Pending shutdown
In takeover
Takeover in progress
Node unreachable
Possible values for reason include the following:
Local node has encountered errors while reading the storage failover partner's mailbox disks
Possible values for the partner's firmware-status include the following:
Boot failed
Booting
Dumping core
Halted
In takeover
Initializing
Operator completed
Rebooting
Takeover disabled
Unknown
Up
Waiting
Related references
Clustered Data ONTAP 8.3 man page: storage failover show - Display storage failover status
Clustered Data ONTAP 8.3 man page: storage failover show-takeover - Display takeover status
Clustered Data ONTAP 8.3 man page: storage failover show-giveback - Display giveback status
-inhibit-takeover parameter
takeover-on-reboot (-onreboot) setting
Halting or rebooting a node without initiating takeover in a two-node cluster on page 100
Related information
Clustered Data ONTAP 8.3 man page: system node reboot - Reboot a node
Clustered Data ONTAP 8.3 man page: storage failover modify - Modify storage failover attributes
Before a node in a cluster configured for cluster HA is rebooted or halted using the
-inhibit-takeover true parameter, you must first disable cluster HA and then assign epsilon to
the node that will remain online.
Steps
1. Disable cluster HA by using the following command:
cluster ha modify -configured false
2. Determine whether the node you want to halt or reboot holds epsilon:
a. Set the privilege level to advanced:
set -privilege advanced
Confirm when prompted to continue into advanced mode. The advanced mode prompt appears
(*>).
b. Determine which node holds epsilon by using the following command:
cluster show
Node      Eligibility  Epsilon
--------- -----------  -------
Node1     true         true
Node2     true         false
If the node you wish to halt or reboot does not hold epsilon, proceed to step 3.
c. If the node you wish to halt or reboot holds epsilon, you must remove it from the node by
using the following command:
cluster modify -node Node1 -epsilon false
3. Halt or reboot and inhibit takeover of the node that does not hold epsilon (in this example, Node2)
by using either of the following commands as appropriate:
system node halt -node Node2 -inhibit-takeover true
system node reboot -node Node2 -inhibit-takeover true
4. After the halted or rebooted node is back online, you must enable cluster HA by using the
following command:
cluster ha modify -configured true
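Putting these steps together for a hypothetical two-node cluster in which Node1 holds epsilon and Node2 is the node being rebooted, the sequence might look like the following sketch (the final command is entered only after the rebooted node is back online, as described in Step 4):
cluster::> cluster ha modify -configured false
cluster::> set -privilege advanced
cluster::*> cluster show
cluster::*> system node reboot -node Node2 -inhibit-takeover true
cluster::*> cluster ha modify -configured true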
Take over the partner node even if there is a disk inventory mismatch:
storage failover takeover -allow-disk-inventory-mismatch
Take over the partner node even if there is a Data ONTAP version mismatch
If you specify the storage failover takeover option immediate command without
first migrating the data LIFs, data LIF migration from the node is significantly delayed even if
the skiplifmigrationbeforetakeover option is not specified.
Similarly, if you specify the immediate option, negotiated takeover optimization is bypassed
even if the bypassoptimization option is set to false.
Attention: For All-Flash Optimized FAS80xx series systems, both nodes in the HA pair must have
the All-Flash Optimized personality enabled.
When the HA interconnect is inactive, or the contents of the failover partner's NVRAM cards are
unsynchronized, takeover is normally disabled. Using the force option enables a node to take over its
partner's storage despite the unsynchronized NVRAM, which can contain client data that can
be lost upon storage failover takeover.
Clustered Data ONTAP 8.3 man page: storage failover takeover - Take over the storage of a node's
partner
Clustered Data ONTAP 8.3 man page: storage failover show-takeover - Display takeover status
Clustered Data ONTAP 8.3 man page: storage failover show - Display storage failover status
Clustered Data ONTAP 8.3 man page: network interface migrate-all - Migrate all data and cluster
management logical interfaces away from the specified node
Clustered Data ONTAP 8.3 Physical Storage Management Guide
To perform planned maintenance, you must take over one of the nodes in an HA pair. Cluster-wide
quorum must be maintained to prevent unplanned client data disruptions for the remaining nodes. In
some instances, performing the takeover can result in a cluster that is one unexpected node failure
away from cluster-wide loss of quorum.
This can occur if the node being taken over holds epsilon or if the node with epsilon is not healthy.
To maintain a more resilient cluster, you can transfer epsilon to a healthy node that is not being taken
over. Typically, this would be the HA partner.
Only healthy and eligible nodes participate in quorum voting. To maintain cluster-wide quorum,
more than N/2 votes are required (where N represents the sum of healthy, eligible, online nodes). In
clusters with an even number of online nodes, epsilon adds additional voting weight toward
maintaining quorum for the node to which it is assigned.
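One way to picture this (a simplified illustration; quorum behavior is described fully in the system administration documentation referenced below): in a four-node cluster, more than 4/2 = 2 votes are required to maintain quorum. If one node is taken over for planned maintenance and a second node then fails unexpectedly, the two surviving nodes alone cannot supply that majority; quorum is preserved only if epsilon has been assigned to one of those surviving nodes, which is why you transfer epsilon to a healthy node that is not being taken over.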
Note: Although cluster formation voting can be modified by using the cluster modify
-eligibility false command, you should avoid this except for situations such as restoring the
node configuration or prolonged node maintenance. If you set a node as ineligible, it stops serving
SAN data until the node is reset to eligible and rebooted. NAS data access to the node might also
be affected when the node is ineligible.
For further information on cluster administration, quorum and epsilon, see the document library on
the NetApp support site at mysupport.netapp.com/documentation/productsatoz/index.html.
Clustered Data ONTAP 8.3 System Administration Guide for Cluster Administrators
Steps
1. Verify the cluster state and confirm that epsilon is held by a healthy node that is not being taken
over.
a. Set the privilege level to advanced:
set -privilege advanced
Confirm you want to continue when the advanced mode prompt appears (*>).
b. Determine which node holds epsilon by using the following command:
cluster show
Node      Eligibility  Epsilon
--------- -----------  -------
Node1     true         true
Node2     true         false
If the node you want to take over does not hold epsilon, proceed to Step 4.
2. Enter the following command to remove epsilon from the node that you want to take over:
cluster modify -node Node1 -epsilon false
3. Assign epsilon to the partner node (in this example, Node2) by using the following command:
cluster modify -node Node2 -epsilon true
Halting or rebooting a node without initiating takeover in a two-node cluster on page 100
If giveback is interrupted
If the takeover node experiences a failure or a power outage during the giveback process, that process
stops and the takeover node returns to takeover mode until the failure is repaired or the power is
restored.
However, this depends upon the stage of giveback in which the failure occurred. If the node
encountered failure or a power outage during partial giveback state (after it has given back the root
aggregate), it will not return to takeover mode. Instead, the node returns to partial-giveback mode. If
this occurs, complete the process by repeating the giveback operation.
If giveback is vetoed
If giveback is vetoed, you must check the EMS messages to determine the cause. Depending on the
reason or reasons, you can decide whether you can safely override the vetoes.
The storage failover show-giveback command displays the giveback progress and shows
which subsystem vetoed the giveback, if any. Soft vetoes can be overridden, while hard vetoes cannot
be, even if forced. The following tables summarize the soft vetoes that should not be overridden,
along with recommended workarounds.
You can review the EMS details for any giveback vetoes by using the following command:
event log show -node * -event gb*
Vetoing subsystem module    Workaround
vfiler_low_level            Terminate the CIFS sessions causing the veto, or shut down the CIFS
                            application that established the open sessions. Overriding this veto
                            might cause the application using CIFS to disconnect abruptly and lose
                            data.
Other subsystem modules that can veto a giveback include Disk Check, Lock Manager, RAID, Disk
Inventory, and SnapMirror.
Related references
Clustered Data ONTAP 8.3 man page: storage failover show-giveback - Display giveback status
Clustered Data ONTAP 8.3 man page: storage failover giveback - Return failed-over storage to its
home node
Clustered Data ONTAP 8.3 man page: storage failover takeover - Take over the storage of a node's
partner
Clustered Data ONTAP 8.3 man page: storage failover show-takeover - Display takeover status
Clustered Data ONTAP 8.3 Command Map for 7-Mode Administrators
1. Confirm that there are two paths to every disk by entering the following command:
storage disk show -port
Note: If two paths are not listed for every disk, this procedure could result in a data service
outage. Before proceeding, address any issues so that all paths are redundant. If you do not
have redundant paths to every disk, you can use the nondisruptive upgrade method (failover) to
add your storage.
2. Install the new disk shelf in your cabinet or equipment rack, as described in the DiskShelf14mk2
and DiskShelf14mk4 FC, or DiskShelf14mk2 AT Hardware Service Guide.
3. When you disconnect the disk shelf from an active adapter, the system console displays messages
about adapter resets and eventually indicates that the loop is down. These
messages are normal within the context of this procedure. However, to avoid them, you can
optionally disable the adapter prior to disconnecting the disk shelf.
If you choose to, disable the adapter attached to the Channel A Output port of the last disk
shelf by entering the following command:
run -node nodename fcadmin config -d adapter
adapter identifies the adapter by name. For example: 0a.
4. Disconnect the SFP and cable coming from the Channel A Output port of the last disk shelf.
Note: Leave the other ends of the cable connected to the controller.
5. Using the correct cable for a shelf-to-shelf connection, connect the Channel A Output port of the
last disk shelf to the Channel A Input port of the new disk shelf.
6. Connect the cable and SFP you removed in Step 4 to the Channel A Output port of the new disk
shelf.
7. If you disabled the adapter in Step 3, reenable the adapter by entering the following command:
run -node nodename fcadmin config -e adapter
9. Confirm that there are two paths to every disk by entering the following command:
storage disk show -port
SAS Disk Shelves Installation and Service Guide for DS4243, DS2246, DS4486, and DS4246
Whenever you remove a module from an HA pair, you need to know whether the path you will
disrupt is redundant.
If it is, you can remove the module without interfering with the storage system's ability to serve
data. However, if that module provides the only path to any disk in your HA pair, you must take
action to ensure that you do not incur system downtime.
When you replace a module, make sure that the replacement module's termination switch is in the
same position as the module it is replacing.
Note: ESH4 modules are self-terminating; this guideline does not apply to ESH4 modules.
If you replace a module with a different type of module, make sure that you also change the
cables, if necessary.
For more information about supported cable types, see the hardware documentation for your disk
shelf.
Always wait 30 seconds after inserting any module before reattaching any cables in that loop.
1. Verify that all disk shelves are functioning properly by entering the following command:
run -node nodename environ shelf
2. Verify that there are no missing disks by entering the following command:
run -node nodename aggr status -r
Local disks displayed on the local node should be displayed as partner disks on the partner node,
and vice-versa.
3. Verify that you can create and retrieve files on both nodes for each licensed protocol.
If the disks have redundant paths, you can remove the module without interfering with the storage
system's ability to serve data. However, if that module provides the only path to any of the disks in
your HA pair, you must take action to ensure that you do not incur system downtime.
Step
1. Enter the storage disk show -port command on your system console.
This command displays the following information for every disk in the HA pair:
Primary port
Secondary port
Disk type
Disk shelf
Bay
Notice that every disk has two active ports: one for A and one for B. The presence of the
redundant path means that you do not need to fail over a node before removing modules from
the system.
Even when all disks normally have redundant paths, a hardware or configuration problem can cause
one or more disks to have only one path. If any disk in your HA pair has only one path, you must
treat that loop as if it were in a single-path HA pair when removing modules.
The following example shows what the storage disk show -port command output might
look like for an HA pair consisting of Data ONTAP systems that do not use redundant paths:
cluster::> storage disk show -port
Primary         Port Secondary       Port Type   Shelf Bay
--------------- ---- --------------- ---- ------ ----- ---
1.0.0           A
1.0.1           A
1.0.2           A
1.0.3           B
1.0.4           B
...
For this HA pair, there is only one path to each disk. This means that you cannot remove a
module from the configuration and disable that path without first performing a takeover.
Hot-swapping a module
You can hot-swap a faulty disk shelf module, removing the faulty module and replacing it without
disrupting data availability.
About this task
When you hot-swap a disk shelf module, you must ensure that you never disable the only path to a
disk; disabling that single path results in a system outage.
Attention: If there is newer firmware in the /mroot/etc/shelf_fw directory than that on the
replacement module, the system automatically runs a firmware update. This firmware update
causes a service interruption on non-multipath HA AT-FCX installations, multipath HA
configurations running versions of Data ONTAP prior to 7.3.1, and systems with non-RoHS
AT-FCX modules.
Steps
1. Verify that your storage system meets the minimum software requirements to support the disk
shelf modules that you are hot-swapping.
See the DiskShelf14mk2 and DiskShelf14mk4 FC, or DiskShelf14mk2 AT Hardware Service
Guide for more information.
c. Wait for takeover to be complete and make sure that the partner node, or NodeA, reboots and
is waiting for giveback.
Any module in the loop that is attached to NodeA can now be replaced.
4. Put on the antistatic wrist strap and grounding leash.
5. Disconnect the module that you are removing from the Fibre Channel cabling.
6. Using the thumb and index fingers of both hands, press the levers on the CAM mechanism on the
module to release it and pull it out of the disk shelf.
7. Slide the replacement module into the slot at the rear of the disk shelf and push the levers of the
cam mechanism into place.
Attention: Do not use excessive force when sliding the module into the disk shelf; you might
damage the connector.
b. Wait for the giveback to be completed before proceeding to the next step.
11. Test the replacement module.
12. Test the configuration.
See the SAS Disk Shelves Installation and Service Guide for DS4243, DS2246, DS4486, and DS4246.
For FAS2240 configurations, the external storage must be cabled as multipath HA.
You must have already removed all aggregates from the disk drives in the disk shelves you are
removing.
Attention: If you attempt this procedure with aggregates on the disk shelf you are removing,
you could fail the aggregates and lose data.
As a best practice, you should remove disk drive ownership after you remove the aggregates from
the disk drives in the disk shelves you are removing.
Note: This procedure follows the best practice of removing disk drive ownership; therefore,
steps are written with the assumption that you have removed disk drive ownership.
The Clustered Data ONTAP Physical Storage Management Guide includes the Removing
ownership from a disk procedure for removing disk drive ownership. This document is available
on the NetApp Support Site at mysupport.netapp.com.
Note: The procedure for removing ownership from disk drives requires you to disable disk
autoassignment. You reenable disk autoassignment when prompted at the end of this shelf hot-remove procedure.
If you are removing one or more disk shelves from within a loop, you must have factored the
distance to bypass the disk shelves you are removing; therefore, if the current cables are not long
enough, you need to have longer cables available.
The Hardware Universe at hwu.netapp.com contains information about supported cables.
This procedure follows cabling best practices; therefore, references to modules and module input
and output ports align with the best practices. If your storage system is cabled differently from
these best practices, the module and port references in this procedure might not match your configuration.
Path A refers to the A-side disk shelf module (module A) located in the top of the disk shelf.
Path B refers to the B-side disk shelf module (module B) located in the bottom of the disk shelf.
The first disk shelf in the loop is the disk shelf with the input ports directly connected to the
controllers.
The interim disk shelf in the loop is the disk shelf directly connected to other disk shelves in the
loop.
The last disk shelf in the loop is the disk shelf with output ports directly connected to the
controllers.
The next disk shelf is the disk shelf downstream of the disk shelf being removed, in depth order.
The previous disk shelf is the disk shelf upstream of the disk shelf being removed, in depth order.
Clustered Data ONTAP commands and 7-Mode commands are used; therefore, you will be
entering commands from the clustershell and from the nodeshell.
Steps
1. Verify that your system configuration is Multi-Path HA by entering the following command
from the nodeshell of either controller:
sysconfig
Note: FAS2240 systems are not reported as Multi-Path HA, because the internal storage is cabled
as single-path HA and the external storage is cabled as multipath HA.
Attention: If your non-FAS2240 system is shown as something other than Multi-Path HA,
you cannot continue with this procedure. Your system must meet the prerequisites stated in the
Before you begin section of this procedure.
2. Verify that the disk drives in the disk shelves you are removing have no aggregates (are spares)
and ownership is removed, by completing the following substeps:
a. Enter the following command from the clustershell of either controller:
storage disk show -shelf shelf_number
b. Check the output to verify there are no aggregates on the disk drives in the disk shelves you
are removing.
Disk drives with no aggregates have a dash in the Aggregate column.
Attention: If the disk drives in the disk shelves you are removing still have aggregates on them,
you cannot continue with this procedure. Your system must meet the prerequisites stated in the
Before you begin section of this procedure.
c. Check the output to verify that ownership is removed from the disk drives on the disk shelves
you are removing or that the disk drives are failed.
If the output shows...                                                 Then...
unassigned or broken for all disk drives                               Go to the next step.
Any disk drives in the disk shelves you are removing have ownership    Remove ownership from those disk drives
                                                                       before continuing with this procedure.
Example
The following output for the storage disk show -shelf 3 command shows disk drives on
the disk shelf being removed (disk shelf 3). All of the disk drives in disk shelf 3 have a dash in the
Aggregate column. Two disk drives have the ownership removed; therefore, unassigned
appears in the Container Type column. And two disk drives are failed; therefore, broken
appears in the Container Type column:
cluster::> storage disk show -shelf 3
           Usable                      Container   Container
Disk       Size      Shelf Bay Type    Type        Name       Owner
--------   --------  ----- --- -----   ----------- ---------- --------
...
1.3.4      -         3     4   SAS     unassigned  -          -
1.3.5      -         3     5   SAS     unassigned  -          -
1.3.6      -         3     6   SAS     broken      -          -
1.3.7      -         3     7   SAS     broken      -          -
...
3. Turn on the LEDs for each disk drive in the disk shelves you are removing so that the disk shelves
are physically identifiable by completing the following substeps:
You need to be certain of which disk shelves you are removing so that you can correctly recable
path A and path B later in this procedure.
You enter the commands from the nodeshell of either controller.
a. Identify the disk drives in each disk shelf you are removing:
fcadmin device_map
In this output, the shelf mapping shows three disk shelves in a loop and their respective 14
disk drives. If disk shelf 3 is being removed, disk drives 45 44 43 42 41 40 39 38 37 36 35 34
33 32 are applicable.
fas6200> fcadmin device_map
Loop Map for channel 0c:
...
Shelf mapping:
Shelf 3: 45 44 43 42 41 40 39 38 37 36 35 34 33 32
Shelf 4: 77 76 75 74 73 72 71 70 69 68 67 66 65 64
Shelf 5: 93 92 91 90 89 88 87 86 85 84 83 82 81 80
...
b. Turn on the LEDs for the disk drives you identified in Substep a:
led_on disk_name
To turn on the fault LED for disk drive 0c.45 in disk shelf 3 identified in Substep a, you enter
led_on 0c.45
4. If you are removing an entire loop of disk shelves, complete the following substeps; otherwise, go
to the next step:
a. Remove all cables on path A and path B.
This includes controller-to-shelf cables and shelf-to-shelf cables for all disk shelves in the
loop you are removing.
b. Go to Step 8.
5. If you are removing one or more disk shelves from a loop (but keeping the loop), recable the
applicable path A loop connections to bypass the disk shelves you are removing by completing
the applicable set of substeps:
If you are removing more than one disk shelf, complete the applicable set of substeps one disk
shelf at a time.
If you are removing the first disk shelf in the loop, then:
a. Remove the cable connecting the module A output port of the first disk shelf and the module A
input port of the second disk shelf in the loop, and set it aside.
b. Move the cable connecting the controller to the module A input port of the first disk shelf to the
module A input port of the second disk shelf in the loop.
If you are removing an interim disk shelf in the loop, then:
a. Remove the cable connecting the module A output port of the disk shelf being removed and the
module A input port of the next disk shelf in the loop, and set it aside.
b. Move the cable connecting the module A input port of the disk shelf being removed to the
module A input port of the next disk shelf in the loop.
If you are removing the last disk shelf in the loop, then:
a. Remove the cable connecting the module A input port of the last disk shelf and the module A
output port of the previous disk shelf in the loop, and set it aside.
b. Move the cable connecting the controller to the module A output port of the last disk shelf to
the module A output port of the previous disk shelf in the loop.
6. Verify that the cabling on path A has successfully bypassed the disk shelves you are removing
and all disk drives on the disk shelves you are removing are still connected through path B, by
entering the following command from the nodeshell of either controller:
storage show disk -p
In this example of how the output should appear, the disk shelf being removed is disk shelf 3. One
line item appears for each disk drive connected through path B (now the primary path); therefore,
the disk drives are listed in the PRIMARY column and B appears in the first PORT column. There is
no connectivity through path A for any of the disk drives in the disk shelf being removed;
therefore, no information is shown in the SECONDARY or second PORT columns.
Attention: If the output shows anything other than all the disk drives connected only through
path B, you must correct the cabling before continuing with this procedure.
Figure: Aggregate aggr_1, consisting of 8 disks on shelf sas_1, is owned by Node1 before relocation
and owned by Node2 after relocation.
Benefits of root-data partitioning for entry-level and All Flash FAS storage systems on page 37
Related information
Because volume count limits are validated programmatically during aggregate relocation
operations, it is not necessary to check for this manually.
If the volume count exceeds the supported limit, the aggregate relocation operation fails with a
relevant error message.
You should not initiate aggregate relocation when system-level operations are in progress on
either the source or the destination node; likewise, you should not start these operations during
the aggregate relocation.
These operations can include the following:
Takeover
Giveback
Shutdown
If you have a MetroCluster configuration, you should not initiate aggregate relocation while
disaster recovery operations (switchover, healing, or switchback) are in progress.
You should not initiate aggregate relocation on aggregates that are corrupt or undergoing
maintenance.
For All-Flash Optimized FAS80xx-series systems, both nodes in the HA pair must have the All-Flash
Optimized personality enabled.
Because the All-Flash Optimized configuration supports only SSDs, if one node in the HA pair
has HDDs or array LUNs (and therefore is not configured with the All-Flash Optimized
personality), aggregate relocation between the two nodes is not supported.
If the source node is used by an Infinite Volume with SnapDiff enabled, you must perform
additional steps before initiating the aggregate relocation and then perform the relocation in a
specific manner.
You must ensure that the destination node has a namespace mirror constituent and make decisions
about relocating aggregates that include namespace constituents.
Before initiating the aggregate relocation, you should save any core dumps on the source and
destination nodes.
Steps
1. View the aggregates on the node to confirm which aggregates to move and ensure they are online
and in good condition:
storage aggregate show -node source-node
Example
The following command shows six aggregates on the four nodes in the cluster. All aggregates are
online. Node1 and Node3 form an HA pair and Node2 and Node4 form an HA pair.
cluster::> storage aggregate show
Aggregate     Size Available Used% State   #Vols Nodes  RAID Status
--------- -------- --------- ----- ------- ----- ------ -----------
aggr_0     239.0GB   11.13GB   95% online      1 node1  raid_dp,
                                                        normal
aggr_1     239.0GB   11.13GB   95% online      1 node1  raid_dp,
                                                        normal
aggr_2     239.0GB   11.13GB   95% online      1 node2  raid_dp,
                                                        normal
aggr_3     239.0GB   11.13GB   95% online      1 node2  raid_dp,
                                                        normal
aggr_4     239.0GB  238.9GB     0% online      5 node3  raid_dp,
                                                        normal
aggr_5     239.0GB  239.0GB     0% online      4 node4  raid_dp,
                                                        normal
6 entries were displayed.
2. Issue the command to start the aggregate relocation.
The following command moves the aggregates aggr_1 and aggr_2 from Node1 to Node3. Node3
is Node1's HA partner. The aggregates can be moved only within the HA pair.
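A sketch of the command for this example, under the assumption that the aggregates are specified with an -aggregate-list parameter (check the storage aggregate relocation start man page for the exact parameter names and syntax):
cluster::> storage aggregate relocation start -node node1 -destination node3 -aggregate-list aggr_1,aggr_2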
3. Monitor the progress of the aggregate relocation with the storage aggregate relocation
show command:
storage aggregate relocation show -node source-node
Example
The following command shows the progress of the aggregates that are being moved to Node3:
cluster::> storage aggregate relocation show -node node1
Source Aggregate   Destination   Relocation Status
------ ----------- ------------- -------------------------
node1  aggr_1      node3         In progress, module: wafl
       aggr_2      node3         Not attempted yet
2 entries were displayed.
node1::storage aggregate>
When the relocation is complete, the output of this command shows each aggregate with a
relocation status of Done.
Related concepts
Benefits of root-data partitioning for entry-level and All Flash FAS storage systems on page 37
Background disk firmware update and takeover, giveback, and aggregate relocation on page 29
Related information
Parameter                                   Meaning
-node nodename                              The node that currently owns the aggregates to be relocated
-destination nodename                       The destination node for the relocated aggregates
-override-vetoes true|false                 Whether to override veto checks during the relocation operation
-relocate-to-higher-version true|false      Whether to relocate the aggregates to a node running a higher
                                            version of Data ONTAP than the source node
-override-destination-checks true|false     Whether to override checks performed on the destination node
Related information
Clustered Data ONTAP 8.3 man page: storage aggregate relocation start - Relocate aggregates to
the specified destination
Clustered Data ONTAP 8.3 Upgrade and Revert/Downgrade Guide
You can review the EMS details for aggregate relocation by using the following command:
event log show -node * -event arl*
The following tables summarize the soft and hard vetoes, along with recommended workarounds:
Veto checks during aggregate relocation
Vetoing subsystem module    Workaround
Lock Manager                To resolve the issue, gracefully shut down the CIFS applications that
                            have open files, or move those volumes to a different aggregate.
                            Overriding this veto results in loss of CIFS lock state, causing
                            disruption and data loss.
Other subsystem modules that can veto an aggregate relocation include Vol Move, Backup, RAID,
Disk Inventory, and WAFL.
Clustered Data ONTAP 8.3 man page: storage aggregate relocation show - Display relocation
status of an aggregate
Copyright information
Copyright © 1994-2015 NetApp, Inc. All rights reserved. Printed in the U.S.
No part of this document covered by copyright may be reproduced in any form or by any means
(graphic, electronic, or mechanical, including photocopying, recording, taping, or storage in an
electronic retrieval system) without prior written permission of the copyright owner.
Software derived from copyrighted NetApp material is subject to the following license and
disclaimer:
THIS SOFTWARE IS PROVIDED BY NETAPP "AS IS" AND WITHOUT ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE,
WHICH ARE HEREBY DISCLAIMED. IN NO EVENT SHALL NETAPP BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
NetApp reserves the right to change any products described herein at any time, and without notice.
NetApp assumes no responsibility or liability arising from the use of products described herein,
except as expressly agreed to in writing by NetApp. The use or purchase of this product does not
convey a license under any patent rights, trademark rights, or any other intellectual property rights of
NetApp.
The product described in this manual may be protected by one or more U.S. patents, foreign patents,
or pending applications.
RESTRICTED RIGHTS LEGEND: Use, duplication, or disclosure by the government is subject to
restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer
Software clause at DFARS 252.277-7103 (October 1988) and FAR 52-227-19 (June 1987).
Trademark information
NetApp, the NetApp logo, Go Further, Faster, ASUP, AutoSupport, Campaign Express, Cloud
ONTAP, clustered Data ONTAP, Customer Fitness, Data ONTAP, DataMotion, Fitness, Flash Accel,
Flash Cache, Flash Pool, FlashRay, FlexArray, FlexCache, FlexClone, FlexPod, FlexScale,
FlexShare, FlexVol, FPolicy, GetSuccessful, LockVault, Manage ONTAP, Mars, MetroCluster,
MultiStore, NetApp Insight, OnCommand, ONTAP, ONTAPI, RAID DP, SANtricity, SecureShare,
Simplicity, Simulate ONTAP, Snap Creator, SnapCopy, SnapDrive, SnapIntegrator, SnapLock,
SnapManager, SnapMirror, SnapMover, SnapProtect, SnapRestore, Snapshot, SnapValidator,
SnapVault, StorageGRID, Tech OnTap, Unbound Cloud, and WAFL are trademarks or registered
trademarks of NetApp, Inc., in the United States, and/or other countries. A current list of NetApp
trademarks is available on the web at http://www.netapp.com/us/legal/netapptmlist.aspx.
Cisco and the Cisco logo are trademarks of Cisco in the U.S. and other countries. All other brands or
products are trademarks or registered trademarks of their respective holders and should be treated as
such.
Index
HA configuration 32
A
active/active
storage configuration 37
active/passive
storage configuration 37
adapters
NVRAM 48
quad-port Fibre Channel HBA 51, 59
aggregate ownership
relocation of 125
aggregate relocation
benefits of 125
commands for 129
effect on root-data partitioning 126
effect on shared disks 126
how it works 125
monitoring progress of 132
overriding a veto of 132
aggregates
CFO 28
HA policy of 28
ownership change 27, 28, 127
relocation of 28, 125, 127
root 28, 32
SFO 28
All-Flash Optimized personality systems
conditions of manual takeovers with configuration
mismatches 102
asymmetrical
storage configuration 37
automatic giveback
commands for configuring 81
how it works 80
parameters and how they affect giveback 82
automatic takeover
triggers for 80
automatic takeovers
commands for changing policy 79
B
background disk firmware update
giveback 29
takeover 29
best practices
C
cabinets
preparing for cabling 50
cabling
Channel A, for mirrored HA pairs 61
Channel A, for standard HA pairs 52
Channel B, for mirrored HA pairs 63
Channel B, for standard HA pairs 54
error message, cross-cabled HA interconnect 56-58,
69, 70
HA interconnect for standard HA pair 56, 69
HA interconnect for standard HA pair, 32xx systems
57, 70
HA interconnect for standard HA pair, 80xx systems
58, 70
HA interconnect, cross-cabled 56-58, 69, 70
HA pairs 45
preparing equipment racks for 49
preparing system cabinets for 50
requirements 48
CFO
definition of 28
HA policy 28
Channel A
cabling 52, 61
defined 35
Channel B
cabling 54, 63
chassis configurations, single or dual 40
CIFS sessions
effect of takeover on 25
cluster HA
configuring in two-node clusters 75
disabling, when halting or rebooting a node in a two-node cluster 100
cluster high availability
configuring in two-node clusters 75
cluster network 13
clusters
configuring cluster HA in two-node 75
configuring switchless-cluster in two-node 75
special configuration settings for two-node 75
clusters and HA pairs 13
commands
aggregate home status 87
cf giveback (enables giveback) 84
cf takeover (initiates takeover) 84
cluster ha status 87
disabling storage failover 75
enabling HA mode 74
enabling storage failover 75
for automatic giveback configuration 81
for changing automatic takeover policy 79
for checking node states 88
for configuring hardware-assisted takeover 77
ha-config modify 72
ha-config show 72
ha-config status 87
halting a node without initiating takeover 99
rebooting a node without initiating takeover 99
storage disk show -port (displays paths) 114
storage failover giveback (enables giveback) 84
storage failover status 87
storage failover takeover (initiates takeover) 84
takeover (description of all status commands) 87
comments
how to send feedback about documentation 137
comparison
HA pair types 13
Config Advisor
checking for common configuration errors with 77
downloading and running 77
configuration variations
mirrored HA pairs 19
configurations
HA differences between supported system 42
testing takeover and giveback 84
controller failover
benefits of 9
controller failovers
events that trigger 21
current owner
disk ownership type, defined 30
failover
benefits of controller 9
failovers
events that trigger 21
failures
table of failover triggering 21
fault tolerance
how HA pairs support 7
data network 13
Data ONTAP
upgrading nondisruptively 118
upgrading nondisruptively, documentation for 118
disk shelves
about modules for 112
documentation
how to send feedback about 137
required 46
DR home owner
disk ownership type, defined 30
dual-chassis HA configurations
diagram of 40
interconnect 41
E
eliminating
single point of failure 9
EMS message
takeover impossible 32
entry-level platforms
benefits of root-data partitioning for 37
epsilon
moving during manually initiated takeover 104
equipment racks
installation in 45
preparation of 49
events
table of failover triggering 21
G
giveback
CFO (root) aggregates only 108
commands for 108
commands for configuring automatic 81
definition of 20
effect on root-data partitioning 29
effect on shared disks 29
interrupted 106
manual 108
monitoring progress of 106, 108
overriding vetoes 108
partial-giveback 106
performing a 106
testing 84
veto 106
what happens during 27
giveback after reboot
automatic 80
H
HA
configuring in two-node clusters 75
HA configurations
benefits of 7
best practices 32
definition of 7
differences between supported system 42
single- and dual-chassis 40
HA interconnect
cabling 56, 69
cabling, 32xx dual-chassis HA configurations 57, 70
cabling, 80xx dual-chassis HA configurations 58, 70
in the HA pair 7
single-chassis and dual-chassis HA configurations
41
HA mode
enabling 74
HA pairs
cabling 45, 50
cabling mirrored 59
events that trigger failover in 21
in a two-node switchless cluster 17
installation 45
managing disk shelves in 110
MetroCluster, compared with 16
required connections for using UPSs with 71
setup requirements 33
setup restrictions 33
storage configuration variations 37
types of
installed in equipment racks 45
installed in system cabinets 46
mirrored 18
types of, compared 13
HA pairs and clusters 13
HA policy
CFO 28
SFO 28
HA state
chassis 72
controller modules 72
ha-config modify command
modifying the HA state 72
ha-config show command
verifying the HA state 72
hardware
components described 11
HA components described 11
single point of failure 9
hardware replacement, nondisruptive
documentation for 118
hardware-assisted takeover
commands for configuring 77
events that trigger 78
how it speeds up takeover 25
requirements for 36
HDDs
shared, how they work 38
standard layouts for shared 38
high availability
configuring in two-node clusters 75
home owner
disk ownership type, defined 30
hot-removing
disk shelves 118
information
how to send feedback about improving
documentation 137
installation
equipment rack 45
HA pairs 45
system cabinet 46
node states
description of 88
Nondisruptive aggregate relocation 7
nondisruptive hardware replacement
documentation for 118
shelf modules 112
nondisruptive operations
how HA pairs support 7
nondisruptive storage controller upgrade using aggregate
relocation
documentation for 118
storage controller upgrade using aggregate
relocation, nondisruptive 118
nondisruptive upgrades
Data ONTAP 118
Data ONTAP, documentation for 118
NVRAM
adapter 48
L
layouts
standard shared HDD 38
licenses
cf 74
not required 74
LIF configuration
best practice 32
M
mailbox disks
in the HA pair 7
manual takeovers
commands for performing 102
effects of in mismatched All-Flash Optimized
personality systems 102
MetroCluster
HA pairs, compared with 16
mirrored HA pairs
about 18
advantages of 18
cabling 59
cabling Channel A 61
cabling Channel B 63
restrictions 35
setup requirements for 35
variations 19
mirroring
NVMEM log 7
NVRAM log 7
modules, disk shelf
about 112
best practices for changing types 113
hot-swapping 115
restrictions for changing types 112
testing 113
multipath HA loop
adding disk shelves to 110
O
original owner
disk ownership type, defined 30
overriding vetoes
giveback 106
owner
disk ownership type, defined 30
ownership
disk, types of 30
displaying disk ownership 31
displaying partition 31
P
panic
leading to takeover and giveback 80
parameters
of the storage failover modify command used for
configuring automatic giveback 82
partitioning
root-data, benefits of 37
root-data, how it works 38
root-data, requirements for using 40
root-data, standard layouts for 38
partitions
viewing ownership for 31
platforms
R
racking the HA pair
in a system cabinet 46
in telco-style racks 45
reboot
leading to takeover and giveback 80
relocation
aggregate ownership 125, 127
of aggregates 125, 127
removing
disk shelves 118
requirements
documentation 46
equipment 48
for using root-data partitioning 40
HA pair setup 33
hot-swapping a disk shelf module 115
tools 47
restrictions
HA pair setup 33
in mirrored HA pairs 35
root aggregate
CFO HA policy 28
data storage on 32
giveback of 28
root-data partitioning
benefits of 37
effect on aggregate relocation 126
effect on giveback 29
effect on takeover 29
how it works 38
requirements for using 40
standard layouts for 38
S
SFO
definition of 28
HA policy 28
shared drives
benefits of 37
shared HDDs
how they work 38
shared layouts
standard HDD 38
sharing storage loops or stacks
within HA pairs 37
shelf modules
upgrading or replacing 112
shelves
hot-removing 118
managing in an HA pair 110
single point of failure
analysis 9
definition of 9
eliminating 9
single-chassis HA configurations
diagram of 40
interconnect 41
SMB 3.0 sessions on Microsoft Hyper-V
effect of takeover on 25
SMB sessions
effect of takeover on 25
spare disks
in the HA pair 7, 35
standard HA pair
cabling Channel A 52
cabling Channel B 54
cabling HA interconnect for 56, 69
cabling HA interconnect for, 32xx systems 57, 70
cabling HA interconnect for, 80xx systems 58, 70
standard layouts
shared HDD 38
states
description of node 88
status messages
description of node state 88
storage aggregate relocation start command
key parameters of 130
storage configuration variations
standard HA pairs 37
storage controller upgrade using aggregate relocation,
nondisruptive
documentation for 118
storage failover
commands for disabling 75
commands for enabling 75
testing takeover and giveback 84
suggestions
how to send feedback about documentation 137
switchless-cluster
enabling in two-node clusters 75
symmetrical
storage configuration 37
system cabinets
installation in 46
preparing for cabling 50
system configurations
HA differences between supported 42
takeover
automatic 20
configuring when it occurs 79
definition of 20
effect on CIFS sessions 25
effect on root-data partitioning 29
effect on shared disks 29
effect on SMB 3.0 sessions 25
effect on SMB sessions 25
hardware-assisted 25, 36
hardware-assisted takeover 78
manual 20
moving epsilon during manually initiated 104
reasons for 79
testing 84
what happens during 25
when it occurs 20
takeover impossible
EMS message 32
takeovers
V
verifying
takeover and giveback 84
veto
giveback 106
override 106
vetoes
of an aggregate relocation 132
overriding 132