
Clustered Data ONTAP 8.3
High-Availability Configuration Guide

NetApp, Inc.
495 East Java Drive
Sunnyvale, CA 94089
U.S.

Telephone: +1 (408) 822-6000


Fax: +1 (408) 822-4501
Support telephone: +1 (888) 463-8277
Web: www.netapp.com
Feedback: [email protected]

Part number: 215-09149_B0


March 2015


Contents
Understanding HA pairs .............................................................................. 7
What an HA pair is ...................................................................................................... 7
How HA pairs support nondisruptive operations and fault tolerance ......................... 7
How the HA pair improves fault tolerance ...................................................... 9
Connections and components of an HA pair ............................................................. 11
Comparison of HA pair types .................................................................................... 13
How HA pairs relate to the cluster ............................................................................ 13
How HA pairs relate to MetroCluster configurations ............................................... 16
If you have a two-node switchless cluster ................................................................. 17

Understanding mirrored HA pairs ........................................................... 18


Advantages of mirrored HA pairs ............................................................................. 18
Asymmetrically mirrored HA pairs ........................................................................... 19

Understanding takeover and giveback ..................................................... 20


When takeovers occur ............................................................................................... 20
Failover event cause-and-effect table ............................................................ 21
How hardware-assisted takeover speeds up takeover ................................................ 25
What happens during takeover .................................................................................. 25
What happens during giveback ................................................................................. 27
HA policy and how it affects takeover and giveback operations .............................. 28
How root-data partitioning affects takeover and giveback ........................................ 29
Background disk firmware update and takeover, giveback, and aggregate
relocation ............................................................................................................. 29
Types of disk ownership ............................................................................................ 30
Displaying disk and partition ownership ....................................................... 31

Planning your HA pair configuration ....................................................... 32


Best practices for HA pairs ....................................................................................... 32
Setup requirements and restrictions for HA pairs ..................................................... 33
Setup requirements and restrictions for mirrored HA pairs ...................................... 35
Requirements for hardware-assisted takeover ........................................................... 36
If your cluster consists of a single HA pair ............................................................... 36

Storage configuration variations for HA pairs .......................................................... 37



Benefits of root-data partitioning for entry-level and All Flash FAS storage
systems ................................................................................................................ 37
How root-data partitioning works ................................................................. 38
Standard root-data partitioning layouts ......................................................... 38
Requirements for using root-data partitioning .............................................. 40
HA pairs and storage system model types ................................................................ 40
Single-chassis and dual-chassis HA pairs ..................................................... 40
Interconnect cabling for systems with variable HA configurations .............. 41
HA configuration and the HA state PROM value ......................................... 41
Table of storage system models and HA configuration differences .............. 42

Installing and cabling an HA pair ............................................................. 45


System cabinet or equipment rack installation .......................................................... 45
HA pairs in an equipment rack ...................................................................... 45
HA pairs in a system cabinet ......................................................................... 46
Required documentation ........................................................................................... 46
Required tools ........................................................................................................... 47
Required equipment .................................................................................................. 48
Preparing your equipment ......................................................................................... 49
Installing the nodes in equipment racks ........................................................ 49
Installing the nodes in a system cabinet ........................................................ 50
Cabling a standard HA pair ....................................................................................... 50
Determining which Fibre Channel ports to use for Fibre Channel disk
shelf connections ..................................................................................... 51
Cabling Node A to DS14mk2 AT or DS14mk4 FC disk shelves .................. 52
Cabling Node B to DS14mk2 AT or DS14mk4 FC disk shelves .................. 54
Cabling the HA interconnect (all systems except 32xx or FAS80xx in
separate chassis) ...................................................................................... 56
Cabling the HA interconnect (32xx systems in separate chassis) ................. 57
Cabling the HA interconnect (FAS80xx systems in separate chassis) .......... 58
Cabling a mirrored HA pair ...................................................................................... 59
Determining which Fibre Channel ports to use for Fibre Channel disk
shelf connections ..................................................................................... 59
Creating your port list for mirrored HA pairs ............................................... 60
Cabling the Channel A DS14mk2 AT or DS14mk4 FC disk shelf loops ..... 61
Cabling the Channel B DS14mk2 AT or DS14mk4 FC disk shelf loops ...... 63
Cabling the redundant multipath HA connection for each loop .................... 66

Cabling the HA interconnect (all systems except 32xx or FAS80xx in
separate chassis) ...................................................................................... 69
Cabling the HA interconnect (32xx systems in separate chassis) ................. 70
Cabling the HA interconnect (FAS80xx systems in separate chassis) .......... 70
Required connections for using uninterruptible power supplies with standard or
mirrored HA pairs ................................................................................................ 71

Configuring an HA pair ............................................................................. 72


Verifying and setting the HA state on the controller modules and chassis ............... 72
Setting the HA mode and enabling storage failover .................................................. 74
Commands for setting the HA mode ............................................................. 74
Commands for enabling and disabling storage failover ................................ 75
Enabling cluster HA and switchless-cluster in a two-node cluster ........................... 75
Verifying the HA pair configuration ......................................................................... 77
Configuring hardware-assisted takeover ................................................................... 77
Commands for configuring hardware-assisted takeover ............................... 77
System events that trigger hardware-assisted takeover ................................. 78
Configuring automatic takeover ................................................................................ 79
Commands for controlling automatic takeover ............................................. 79
System events that always result in an automatic takeover ........................... 80
Configuring automatic giveback ............................................................................... 80
How automatic giveback works .................................................................... 80
Commands for configuring automatic giveback ........................................... 81
How variations of the storage failover modify command affect automatic
giveback ................................................................................................... 82
Testing takeover and giveback .................................................................................. 84

Monitoring an HA pair .............................................................................. 87


Commands for monitoring an HA pair ..................................................................... 87
Description of node states displayed by storage failover show-type commands ...... 88

Halting or rebooting a node without initiating takeover ........................ 99


Commands for halting or rebooting a node without initiating takeover ................... 99
Halting or rebooting a node without initiating takeover in a two-node cluster ....... 100

About manual takeover ............................................................................ 102


Commands for performing and monitoring manual takeovers ............................... 102
Moving epsilon for certain manually initiated takeovers ........................................ 104

About manual giveback ........................................................................... 106


If giveback is interrupted ......................................................................................... 106



If giveback is vetoed ................................................................................................ 106
Commands for performing a manual giveback ....................................................... 108

Managing DS14mk2 AT or DS14mk4 FC disk shelves in an HA pair ...... 110
Adding DS14mk2 AT or DS14mk4 FC disk shelves to a multipath HA loop ........ 110
Upgrading or replacing modules in an HA pair ...................................................... 112
About the disk shelf modules .................................................................................. 112
Restrictions for changing module types .................................................................. 112
Best practices for changing module types ............................................................... 113
Testing the modules ................................................................................................. 113
Determining the path status for your HA pair ......................................................... 114
Hot-swapping a module .......................................................................................... 115

Nondisruptive operations with HA pairs ............................................... 118


Where to find procedures for nondisruptive operations with HA pairs .................. 118
Hot-removing disk shelves or loops in systems running Data ONTAP 8.2.1 or
later .................................................................................................................... 118

Relocating aggregate ownership within an HA pair ............................. 125


How aggregate relocation works ............................................................................. 125
How root-data partitioning affects aggregate relocation ......................................... 126
Relocating aggregate ownership ............................................................................. 127
Commands for aggregate relocation ........................................................................ 129
Key parameters of the storage aggregate relocation start command ....................... 130
Veto and destination checks during aggregate relocation ....................................... 132

Copyright information ............................................................................. 135


Trademark information ........................................................................... 136
How to send comments about documentation and receive update
notification ............................................................................................ 137
Index ........................................................................................................... 138

Understanding HA pairs
HA pairs provide hardware redundancy that is required for nondisruptive operations and fault
tolerance and give each node in the pair the software functionality to take over its partner's storage
and subsequently give back the storage.

What an HA pair is
An HA pair is two storage systems (nodes) whose controllers are connected to each other directly. In
this configuration, one node can take over its partner's storage to provide continued data service if the
partner goes down.
You can configure the HA pair so that each node in the pair shares access to a common set of storage,
subnets, and tape drives, or each node can own its own distinct set of storage.
The controllers are connected to each other through an HA interconnect. This allows one node to
serve data that resides on the disks of its failed partner node. Each node continually monitors its
partner, mirroring the data for each other's nonvolatile memory (NVRAM or NVMEM). The
interconnect is internal and requires no external cabling if both controllers are in the same chassis.

Takeover is the process in which a node takes over the storage of its partner. Giveback is the process
in which that storage is returned to the partner. Both processes can be initiated manually or
configured for automatic initiation.

How HA pairs support nondisruptive operations and fault tolerance
HA pairs provide fault tolerance and let you perform nondisruptive operations, including hardware
and software upgrades, relocation of aggregate ownership, and hardware maintenance.

Fault tolerance

When one node fails or becomes impaired and a takeover occurs, the partner node continues
to serve the failed node's data.

Nondisruptive software upgrades or hardware maintenance

During hardware maintenance or upgrades, you can perform a storage failover
takeover operation of one node and then, if necessary, power off that controller. The partner
node continues to serve data for both nodes while you perform the upgrade or maintenance.
When the upgrade or maintenance is finished, you perform a storage failover
giveback to return the data service to the original node.

During nondisruptive upgrades of Data ONTAP, the user manually enters the storage
failover takeover command to take over the partner node to allow the software upgrade
to occur. The takeover node continues to serve data for both nodes during this operation.

Clustered Data ONTAP 8.3 Upgrade and Revert/Downgrade Guide

Nondisruptive aggregate ownership relocation can be performed without a takeover and
giveback.

The HA pair supplies nondisruptive operation and fault tolerance due to the following aspects of its
configuration:

The controllers in the HA pair are connected to each other either through an HA interconnect
consisting of adapters and cables, or, in systems with two controllers in the same chassis, through
an internal interconnect
The nodes use the interconnect to perform the following tasks:

Continually check if the other node is functioning.

Mirror log data for each other's NVRAM or NVMEM.

The nodes use two or more disk shelf loops, or storage arrays, in which the following conditions
apply:

Each node manages its own disks or array LUNs.

In case of takeover, the surviving node provides read/write access to the partner's disks or
array LUNs until the failed node becomes available again.
Note: Disk ownership is established by Data ONTAP or the administrator; it is not based on

which disk shelf the disk is attached to.

Clustered Data ONTAP 8.3 Physical Storage Management Guide

They own their spare disks, spare array LUNs, or both, and do not share them with the other
node.

They each have mailbox disks or array LUNs on the root volume that perform the following
tasks:

Maintain consistency between the pair

Continually check whether the other node is running or whether it has performed a takeover

Store configuration information

Related concepts

Nondisruptive operations with HA pairs on page 118


Where to find procedures for nondisruptive operations with HA pairs on page 118
Types of disk ownership on page 30


How the HA pair improves fault tolerance


A storage system has a variety of single points of failure, such as certain cables or hardware
components. An HA pair greatly reduces the number of single points of failure because if a failure
occurs, the partner can take over and continue serving data for the affected system until the failure is
fixed.
Single point of failure definition
A single point of failure represents the failure of a single hardware component that can lead to loss of
data access or potential loss of data.
Single point of failure does not include multiple/rolling hardware errors, such as triple disk failure,
dual disk shelf module failure, and so on.
All hardware components included with your storage system have demonstrated very good reliability
with low failure rates. If a hardware component such as a controller or adapter fails, you can use the
controller failover function to provide continuous data availability and preserve data integrity for
client applications and users.
Single point of failure analysis for HA pairs
Different hardware components and cables in the storage system can be single points of failure, but
an HA configuration can eliminate these points to improve data availability.
For each hardware component, the following list shows whether the component is a single point
of failure in a stand-alone system and in an HA pair, and how storage failover eliminates the
single point of failure.

Controller
Single point of failure: stand-alone: Yes; HA pair: No.
If a controller fails, the node automatically fails over to its partner node. The partner
(takeover) node serves data for both of the nodes.

NVRAM
Single point of failure: stand-alone: Yes; HA pair: No.
If an NVRAM adapter fails, the node automatically fails over to its partner node. The partner
(takeover) node serves data for both of the nodes.

CPU fan
Single point of failure: stand-alone: Yes; HA pair: No.
If the CPU fan fails, the node automatically fails over to its partner node. The partner
(takeover) node serves data for both of the nodes.

Multiple NICs with interface groups (virtual interfaces)
Single point of failure: stand-alone: Maybe, if all NICs fail; HA pair: No.
If one of the networking links within an interface group fails, the networking traffic is
automatically sent over the remaining networking links on the same node. No failover is
needed in this situation.

SAS or FC-AL HBA (host bus adapter)
Single point of failure: stand-alone: Yes; HA pair: No.
If an HBA for the primary loop fails for a configuration without multipath HA, the partner
node attempts a takeover at the time of failure. With multipath HA, no takeover is required.
If the HBA for the secondary loop fails for a configuration without multipath HA, the
failover capability is disabled, but both nodes continue to serve data to their respective
applications and users, with no impact or delay. With multipath HA, failover capability is
not affected.

SAS or FC-AL cable (controller-to-shelf, shelf-to-shelf)
Single point of failure: stand-alone: No, if dual-path cabling is used; HA pair: No.
If a SAS stack or an FC-AL loop breaks in a configuration that does not have multipath HA,
the break could lead to a failover, depending on the shelf type. The partnered nodes invoke
the negotiated failover feature to determine which node is best for serving data, based on the
disk shelf count. When multipath HA is used, no failover is required.

Disk shelf module
Single point of failure: stand-alone: No, if dual-path cabling is used; HA pair: No.
If a disk shelf module fails in a configuration that does not have multipath HA, the failure
could lead to a failover. The partnered nodes invoke the negotiated failover feature to
determine which node is best for serving data, based on the disk shelf count. When multipath
HA is used, there is no impact.

Disk drive
Single point of failure: stand-alone: No; HA pair: No.
If a disk fails, the node can reconstruct data from the RAID4 parity disk. No failover is
needed in this situation.

Power supply
Single point of failure: stand-alone: Maybe, if both power supplies fail; HA pair: No.
Both the controller and disk shelf have dual power supplies. If one power supply fails, the
second power supply automatically activates. No failover is needed in this situation. If both
power supplies fail, the node automatically fails over to its partner node, which serves data
for both nodes.

Fan (controller or disk shelf)
Single point of failure: stand-alone: Maybe, if both fans fail; HA pair: No.
Both the controller and disk shelf have multiple fans. If one fan fails, the second fan
automatically provides cooling. No failover is needed in this situation. If both fans fail, the
node automatically fails over to its partner node, which serves data for both nodes.

HA interconnect adapter
Single point of failure: stand-alone: Not applicable; HA pair: No.
If an HA interconnect adapter fails, the failover capability is disabled but both nodes
continue to serve data to their respective applications and users.

HA interconnect cable
Single point of failure: stand-alone: Not applicable; HA pair: No.
The HA interconnect adapter supports dual HA interconnect cables. If one cable fails, the
heartbeat and NVRAM data are automatically sent over the second cable with no delay or
interruption. If both cables fail, the failover capability is disabled but both nodes continue
to serve data to their respective applications and users.

Connections and components of an HA pair


Each node in an HA pair requires a network connection, an HA interconnect between the controllers,
and connections to both its own disk shelves and its partner node's shelves.
The following diagram shows a standard HA pair with native DS4243 disk shelves and multipath
HA:


[Figure: A standard HA pair. Node1 and Node2 are connected to the network and to each other
through the HA interconnect. Each node has primary, redundant primary, standby, and redundant
standby connections to its own storage and to its partner's storage.]

Related information

SAS Disk Shelves Universal SAS and ACP Cabling Guide


SAS Disk Shelves Installation and Service Guide for DS4243, DS2246, DS4486, and DS4246


Comparison of HA pair types


The different types of HA pairs support different capabilities for data duplication and failover.
For each HA pair type, the following list shows whether it provides data duplication, whether
failover is possible after loss of an entire node (including its storage), and notes on when to use
that configuration.

Standard HA pair
Data duplication: No. Failover possible after loss of entire node (including storage): No.
Use this configuration to provide higher availability by protecting against many hardware
single points of failure.

Mirrored HA pair
Data duplication: Yes. Failover possible after loss of entire node (including storage): No.
Use this configuration to add increased data protection to the benefits of a standard HA pair.

MetroCluster
Data duplication: Yes. Failover possible after loss of entire node (including storage): Yes.
Use this configuration to provide data and hardware duplication to protect against a
large-scale disaster, such as the loss of an entire site.

Related information

Clustered Data ONTAP 8.3 MetroCluster Installation and Configuration Guide


Clustered Data ONTAP 8.3 MetroCluster Management and Disaster Recovery Guide

How HA pairs relate to the cluster


HA pairs are components of the cluster, and both nodes in the HA pair are connected to other nodes
in the cluster through the data and cluster networks. But only the nodes in the HA pair can take over
each other's storage.
Although the controllers in an HA pair are connected to other controllers in the cluster through the
cluster network, the HA interconnect and disk-shelf connections are found only between the node
and its partner and their disk shelves or array LUNs.
The HA interconnect and each node's connections to the partner's storage provide physical support
for high-availability functionality. The high-availability storage failover capability does not extend to
other nodes in the cluster.
Note: Network failover does not rely on the HA interconnect and allows data network interfaces to
failover to different nodes in the cluster outside the HA pair. Network failover is different than
storage failover since it enables network resiliency across all nodes in the cluster.



Non-HA (or stand-alone) nodes are not supported in a cluster containing two or more nodes.
Although single-node clusters are supported, joining two separate single-node clusters to create one
cluster is not supported, unless you wipe clean one of the single-node clusters and join it to the other
to create a two-node cluster that consists of an HA pair.

Clustered Data ONTAP 8.3 System Administration Guide for Cluster Administrators
The following diagram shows two HA pairs. The multipath HA storage connections between the
nodes and their storage are shown for each HA pair. For simplicity, only the primary connections to
the data and cluster networks are shown.


[Figure: Two HA pairs in a cluster. Node1 and Node2 form one HA pair and Node3 and Node4 form
another. Each pair is joined by an HA interconnect, and each node connects to its own storage and
to its partner's storage through primary, redundant primary, standby, and redundant standby
connections. All four nodes connect to the data and cluster networks.]

Possible storage failover scenarios in this cluster are as follows:


Node1 fails and Node2 takes over Node1's storage.

Node2 fails and Node1 takes over Node2's storage.

Node3 fails and Node4 takes over Node3's storage.

Node4 fails and Node3 takes over Node4's storage.

If Node1 and Node2 both fail, the storage owned by Node1 and Node2 becomes unavailable to the
data network. Although Node3 and Node4 are clustered with Node1 and Node2, they do not have
direct connections to Node1 and Node2's storage and cannot take over their storage.

How HA pairs relate to MetroCluster configurations


Except for single node clusters, an HA pair is the basic unit of a Data ONTAP cluster configuration.
MetroCluster builds upon the HA pair foundation by providing comprehensive disaster recovery
capabilities.
With the exception of single-node clusters, clustered Data ONTAP configurations consist of one or
more HA pairs where the nodes are the same model of storage controller (platform) hardware. The
cluster can grow in two-node increments to a maximum of 24 nodes for specific protocols and
platform combinations.

Clustered Data ONTAP Storage Platform Mixing Rules


This matched platform HA pair configuration is the fundamental building block that delivers a high-availability storage environment.
A MetroCluster configuration extends the resiliency of the HA pair by using mirroring to recover
from disasters and protect the data in the configuration. This configuration provides disaster recovery
through a single MetroCluster command that activates a secondary Storage Virtual Machine (SVM)
on the survivor site to serve the mirrored data originally owned by the disaster-affected primary site.
The MetroCluster configuration protects data by implementing two physically separate, mirrored
clusters. Each cluster synchronously mirrors the data and Storage Virtual Machine (SVM)
configuration of the other. In event of a disaster at one site, an administrator can activate the mirrored
SVM and begin serving the mirrored data from the surviving site. Additionally, the nodes in each
cluster are configured as an HA pair, providing a level of local failover.
Related information

Clustered Data ONTAP 8.3 MetroCluster Installation and Configuration Guide


Clustered Data ONTAP 8.3 MetroCluster Management and Disaster Recovery Guide


If you have a two-node switchless cluster


In a two-node switchless cluster configuration, you do not need to connect the nodes in the HA pair
to cluster network switches. Instead, you install cluster network connections directly between the two
storage controllers.
In a two-node switchless cluster, the two nodes can only be an HA pair. For cabling details, see the
Hardware Universe at hwu.netapp.com and the Installation and Setup Instructions for your system.
The switchless cluster feature cannot be used with more than two nodes. If you plan to add more
nodes, you must connect each node in the cluster to cluster network switches.
Related concepts

If your cluster consists of a single HA pair on page 36


Related tasks

Enabling cluster HA and switchless-cluster in a two-node cluster on page 75


Related references

Halting or rebooting a node without initiating takeover on page 99


Understanding mirrored HA pairs


Mirrored HA pairs provide high availability through failover just as standard HA pairs do.
Additionally, mirrored HA pairs maintain two complete copies of all mirrored data. These copies are
called plexes and are continually, synchronously updated every time Data ONTAP writes to a
mirrored aggregate. The plexes can be physically separated to protect against the loss of one set of
disks or array LUNs.
Note: Mirrored HA pairs do not provide the capability to fail over to the partner node if one node

fails disastrously or is disabled. For example, if an entire node loses power, including its storage,
you cannot fail over to the partner node. For that capability you must have a MetroCluster
configuration.
If the root aggregate is mirrored, storage failover takeover will fail unless all current mailbox disks
are accessible. When all mailbox disks are accessible, storage failover takeover succeeds with the
surviving plex.
Mirrored HA pairs use SyncMirror, implemented through the storage aggregate mirror
command.
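For example, a minimal sketch of mirroring an existing aggregate with SyncMirror; the aggregate
name here is a placeholder:

cluster::> storage aggregate mirror -aggregate aggr1_node1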
Related information

Clustered Data ONTAP 8.3 Data Protection Guide


Clustered Data ONTAP 8.3 MetroCluster Installation and Configuration Guide
Clustered Data ONTAP 8.3 MetroCluster Management and Disaster Recovery Guide
Clustered Data ONTAP 8.3 man page: storage aggregate mirror - Mirror an existing aggregate

Advantages of mirrored HA pairs


Data mirroring provides additional data protection in the event of disk failures and reduces the need
for failover if problems occur with other components.
Mirroring your data protects it from the following problems that can cause data loss:

The failure or loss of three or more disks in a RAID-DP (RAID double-parity) aggregate

The failure of an array LUN; for example, because of a double-disk failure on the storage array

The failure of a storage array

The failure of a SAS HBA, FC-AL adapter, disk shelf loop or stack, or disk shelf module does not
require a failover in a mirrored HA pair.



Similar to standard HA pairs, if either node in a mirrored HA pair becomes impaired or cannot access
its data, the other node can automatically serve the impaired node's data until the problem is
corrected.

Asymmetrically mirrored HA pairs


You can selectively mirror your storage. For example, you can mirror all the storage on one node and
none of the storage on the other node. Takeover will function normally. However, any unmirrored
data is lost if the storage that contains that data is damaged or destroyed.
Note: You must connect the unmirrored storage to both nodes, just as you must for mirrored
storage. You cannot have storage that is connected to only one node in an HA pair.


Understanding takeover and giveback


Takeover and giveback are the operations that let you take advantage of the HA configuration to
perform nondisruptive operations and avoid service interruptions. Takeover is the process in which a
node takes over the storage of its partner. Giveback is the process in which the storage is returned to
the partner. You can initiate the processes in different ways.

When takeovers occur


You can initiate takeovers manually or they can occur automatically when a failover event happens,
depending on how you configure the HA pair. In some cases, takeovers occur automatically,
regardless of configuration.
Takeovers can occur under the following conditions:

When you manually initiate takeover with the storage failover takeover command (an
example follows this list)

When a node in an HA pair with the default configuration for immediate takeover on panic
undergoes a software or system failure that leads to a panic
By default, the node automatically performs a giveback, returning the partner to normal operation
after the partner has recovered from the panic and booted up.

When a node in an HA pair undergoes a system failure (for example, a loss of power) and cannot
reboot
Note: If the storage for a node also loses power at the same time, a standard takeover is not

possible.

When a node does not receive heartbeat messages from its partner
This could happen if the partner experienced a hardware or software failure that did not result in a
panic but still prevented it from functioning correctly.

When you halt one of the nodes without using the -f or -inhibit-takeover true parameter
Note: In a two-node cluster with cluster HA enabled, halting or rebooting a node using the
-inhibit-takeover true parameter causes both nodes to stop serving data unless you first
disable cluster HA and then assign epsilon to the node that you want to remain online.

When you reboot one of the nodes without using the -inhibit-takeover true parameter
The -onreboot parameter of the storage failover command is enabled by default.

When hardware-assisted takeover is enabled and it triggers a takeover when the remote
management device (Service Processor) detects failure of the partner node
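For reference, a manually initiated takeover and a status check might look like the following
sketch; the node name is a placeholder:

cluster::> storage failover takeover -ofnode node2
cluster::> storage failover show-takeover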



Related tasks

Moving epsilon for certain manually initiated takeovers on page 104


Related references

Halting or rebooting a node without initiating takeover on page 99

Failover event cause-and-effect table


Certain events can cause a controller failover in HA pairs. The storage system responds differently
depending on the event and the type of HA pair.
For each event, the following list shows whether the event triggers a failover, whether it prevents
a future failover from occurring (or a failover from occurring successfully), and whether data is
still available on the affected volume after the event, first for a single-node cluster and then for a
standard or mirrored HA pair.

Single disk failure
Triggers failover: No. Prevents future or successful failover: No.
Data available afterward: single-node cluster: Yes; standard or mirrored HA pair: Yes.

Double disk failure (2 disks fail in same RAID group)
Triggers failover: Yes, unless you are using SyncMirror or RAID-DP, then no.
Prevents future or successful failover: Maybe; if the root volume has a double disk failure, or if
the mailbox disks are affected, no failover is possible.
Data available afterward: single-node cluster: No, unless you are using RAID-DP or SyncMirror,
then yes; standard or mirrored HA pair: No, unless you are using RAID-DP or SyncMirror, then yes.

Triple disk failure (3 disks fail in same RAID group)
Triggers failover: Maybe; if SyncMirror is being used, no takeover occurs; otherwise, yes.
Prevents future or successful failover: Maybe; if the root volume has a triple disk failure, no
failover is possible.
Data available afterward: single-node cluster: No; standard or mirrored HA pair: No.

Single HBA (initiator) failure, Loop A
Triggers failover: Maybe; if SyncMirror or multipath HA is in use, then no; otherwise, yes.
Prevents future or successful failover: Maybe; if the root volume has a double disk failure, no
failover is possible.
Data available afterward: single-node cluster: Yes, if multipath HA or SyncMirror is being used;
standard or mirrored HA pair: Yes, if multipath HA or SyncMirror is being used, or if failover
succeeds.

Single HBA (initiator) failure, Loop B
Triggers failover: No.
Prevents future or successful failover: Yes, unless you are using SyncMirror or multipath HA and
the mailbox disks are not affected, then no.
Data available afterward: single-node cluster: Yes, if multipath HA or SyncMirror is being used;
standard or mirrored HA pair: Yes, if multipath HA or SyncMirror is being used, or if failover
succeeds.

Single HBA initiator failure (both loops at the same time)
Triggers failover: Yes, unless the data is mirrored on a different (up) loop or multipath HA is in
use, then no takeover is needed.
Prevents future or successful failover: Maybe; if the data is mirrored or multipath HA is being
used and the mailbox disks are not affected, then no; otherwise, yes.
Data available afterward: single-node cluster: No, unless the data is mirrored or multipath HA is
in use, then yes; standard or mirrored HA pair: No failover is needed if data is mirrored or
multipath HA is in use.

AT-FCX failure (Loop A)
Triggers failover: Only if a multidisk volume failure or an open loop condition occurs, and
neither SyncMirror nor multipath HA is in use.
Prevents future or successful failover: Maybe; if the root volume has a double disk failure, no
failover is possible.
Data available afterward: single-node cluster: No; standard or mirrored HA pair: Yes, if failover
succeeds.

AT-FCX failure (Loop B)
Triggers failover: No.
Prevents future or successful failover: Maybe; if SyncMirror or multipath HA is in use, then no;
otherwise, yes.
Data available afterward: single-node cluster: Yes, if multipath HA or SyncMirror is in use;
standard or mirrored HA pair: Yes.

IOM failure (Loop A)
Triggers failover: Only if a multidisk volume failure or an open loop condition occurs, and
neither SyncMirror nor multipath HA is in use.
Prevents future or successful failover: Maybe; if the root volume has a double disk failure, no
failover is possible.
Data available afterward: single-node cluster: No; standard or mirrored HA pair: Yes, if failover
succeeds.

IOM failure (Loop B)
Triggers failover: No.
Prevents future or successful failover: Maybe; if SyncMirror or multipath HA is in use, then no;
otherwise, yes.
Data available afterward: single-node cluster: Yes, if multipath HA or SyncMirror is in use;
standard or mirrored HA pair: Yes.

Shelf (backplane) failure
Triggers failover: Only if a multidisk volume failure or an open loop condition occurs, and data
is not mirrored.
Prevents future or successful failover: Maybe; if the root volume has a double disk failure or if
the mailboxes are affected, no failover is possible.
Data available afterward: single-node cluster: Maybe; if data is mirrored, then yes; otherwise,
no; standard or mirrored HA pair: Maybe; if data is mirrored, then yes; otherwise, no.

Shelf, single power failure
Triggers failover: No. Prevents future or successful failover: No.
Data available afterward: single-node cluster: Yes; standard or mirrored HA pair: Yes.

Shelf, dual power failure
Triggers failover: Only if a multidisk volume failure or an open loop condition occurs and data
is not mirrored.
Prevents future or successful failover: Maybe; if the root volume has a double disk failure, or
if the mailbox disks are affected, no failover is possible.
Data available afterward: single-node cluster: Maybe; if data is mirrored, then yes; otherwise,
no; standard or mirrored HA pair: Maybe; if data is mirrored, then yes; otherwise, no.

Controller, single power failure
Triggers failover: No. Prevents future or successful failover: No.
Data available afterward: single-node cluster: Yes; standard or mirrored HA pair: Yes.

Controller, dual power failure
Triggers failover: Yes. Prevents future or successful failover: Yes, until power is restored.
Data available afterward: single-node cluster: No; standard or mirrored HA pair: Yes, if failover
succeeds.

HA interconnect failure (1 port)
Triggers failover: No. Prevents future or successful failover: No.
Data available afterward: single-node cluster: Not applicable; standard or mirrored HA pair: Yes.

HA interconnect failure (both ports)
Triggers failover: No. Prevents future or successful failover: Yes.
Data available afterward: single-node cluster: Not applicable; standard or mirrored HA pair: Yes.

Tape interface failure
Triggers failover: No. Prevents future or successful failover: No.
Data available afterward: single-node cluster: Yes; standard or mirrored HA pair: Yes.

Heat exceeds permissible amount
Triggers failover: Yes. Prevents future or successful failover: No.
Data available afterward: single-node cluster: No; standard or mirrored HA pair: No.

Fan failures (disk shelves or controller)
Triggers failover: No. Prevents future or successful failover: No.
Data available afterward: single-node cluster: Yes; standard or mirrored HA pair: Yes.

Reboot
Triggers failover: Yes. Prevents future or successful failover: No.
Data available afterward: single-node cluster: No; standard or mirrored HA pair: Yes, if failover
occurs.

Panic
Triggers failover: Yes. Prevents future or successful failover: No.
Data available afterward: single-node cluster: No; standard or mirrored HA pair: Yes, if failover
occurs.

Related information

NetApp Hardware Universe


Clustered Data ONTAP 8.3 System Administration Guide for Cluster Administrators
SAS Disk Shelves Universal SAS and ACP Cabling Guide


SAS Disk Shelves Installation and Service Guide for DS4243, DS2246, DS4486, and DS4246

How hardware-assisted takeover speeds up takeover


Hardware-assisted takeover speeds up the takeover process by using a node's remote management
device (Service Processor) to detect failures and quickly initiate the takeover rather than waiting for
Data ONTAP to recognize that the partner's heartbeat has stopped.
Without hardware-assisted takeover, if a failure occurs, the partner waits until it notices that the node
is no longer giving a heartbeat, confirms the loss of heartbeat, and then initiates the takeover.
The hardware-assisted takeover feature uses the following process to take advantage of the remote
management device and avoid that wait:
1. The remote management device monitors the local system for certain types of failures.
2. If a failure is detected, the remote management device immediately sends an alert to the partner
node.
3. Upon receiving the alert, the partner initiates takeover.
Hardware-assisted takeover is enabled by default.
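As a sketch, assuming the hwassist subcommands available in this release, you can verify and
adjust hardware-assisted takeover as follows; the node name is a placeholder:

cluster::> storage failover hwassist show
cluster::> storage failover modify -node node1 -hwassist true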

What happens during takeover


When a node takes over its partner, it continues to serve and update data in the partner's aggregates
and volumes. To do this, the node takes ownership of the partner's aggregates, and the partner's LIFs
migrate according to network interface failover rules. Except for specific SMB 3.0 connections,
existing SMB (CIFS) sessions are disconnected when the takeover occurs.
The following steps occur when a node takes over its partner:
1. If the negotiated takeover is user-initiated, aggregate relocation is performed to move data
aggregates one at a time from the partner node to the node that is performing the takeover.
The current owner of each aggregate (except for the root aggregate) is changed from the target
node to the node that is performing the takeover. There is a brief outage for each aggregate as
ownership changes. This outage is briefer than an outage that occurs during a takeover without
aggregate relocation.

You can monitor the progress using the storage failover show-takeover command.

The aggregate relocation can be avoided during this takeover instance by using the
-bypass-optimization parameter with the storage failover takeover command. To
bypass aggregate relocation during all future planned takeovers, set the
-bypass-takeover-optimization parameter of the storage failover modify
command to true. (An example of these parameters follows this procedure.)



Note: Aggregates are relocated serially during planned takeover operations to reduce client
outage. If aggregate relocation is bypassed, longer client outage occurs during planned takeover
events. Setting the -bypass-takeover-optimization parameter of the storage
failover modify command to true is not recommended in environments that have
stringent outage requirements.

2. If the user-initiated takeover is a negotiated takeover, the target node gracefully shuts down,
followed by takeover of the target node's root aggregate and any aggregates that were not
relocated in Step 1.
3. Before the storage takeover begins, data LIFs migrate from the target node to the node performing
the takeover or to any other node in the cluster based on LIF failover rules.
The LIF migration can be avoided by using the -skip-lif-migration parameter with the
storage failover takeover command.

Clustered Data ONTAP 8.3 File Access Management Guide for CIFS
Clustered Data ONTAP 8.3 File Access Management Guide for NFS
Clustered Data ONTAP 8.3 Network Management Guide
4. Existing SMB (CIFS) sessions are disconnected when takeover occurs.
Attention: Due to the nature of the SMB protocol, all SMB sessions, except for SMB 3.0
sessions connected to shares with the Continuous Availability property set, are
disrupted. SMB 1.0 and SMB 2.x sessions cannot reconnect after a takeover event. Therefore,
takeover is disruptive and some data loss could occur.
5. SMB 3.0 sessions established to shares with the Continuous Availability property set can
reconnect to the disconnected shares after a takeover event.
If your site uses SMB 3.0 connections to Microsoft Hyper-V and the Continuous
Availability property is set on the associated shares, takeover will be nondisruptive for those
sessions.

Clustered Data ONTAP 8.3 File Access Management Guide for CIFS
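The following sketch shows the optional parameters described in the preceding steps; the node
names are placeholders:

cluster::> storage failover takeover -ofnode node2 -bypass-optimization true
cluster::> storage failover takeover -ofnode node2 -skip-lif-migration true
cluster::> storage failover modify -node node1 -bypass-takeover-optimization true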
If the node doing the takeover panics
If the node that is performing the takeover panics within 60 seconds of initiating takeover, the
following events occur:

The node that panicked reboots.

After it reboots, the node performs self-recovery operations and is no longer in takeover mode.

Failover is disabled.

If the node still owns some of the partner's aggregates, after enabling storage failover, return these
aggregates to the partner using the storage failover giveback command.
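For example, a minimal sketch of re-enabling storage failover and then returning the partner's
aggregates; the node names are placeholders:

cluster::> storage failover modify -node node1 -enabled true
cluster::> storage failover giveback -ofnode node2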



Related concepts

HA policy and how it affects takeover and giveback operations on page 28


How automatic giveback works on page 80
Related information

Clustered Data ONTAP 8.3 man page: storage failover takeover - Take over the storage of a node's
partner
Clustered Data ONTAP 8.3 man page: storage failover show-takeover - Display takeover status

What happens during giveback


The local node returns ownership of the aggregates and volumes to the partner node after you resolve
any issues on the partner node or complete maintenance operations. In addition, the local node
returns ownership when the partner node has booted up and giveback is initiated either manually or
automatically.
The following process takes place in a normal giveback. In this discussion, Node A has taken over
Node B. Any issues on Node B have been resolved and it is ready to resume serving data.
1. Any issues on Node B have been resolved and it displays the following message:
Waiting for giveback

2. The giveback is initiated by the storage failover giveback command or by automatic


giveback if the system is configured for it.
This initiates the process of returning ownership of Node B's aggregates and volumes from Node
A back to Node B.
3. Node A returns control of the root aggregate first.
4. Node B completes the process of booting up to its normal operating state.
5. As soon as Node B reaches the point in the boot process where it can accept the non-root
aggregates, Node A returns ownership of the other aggregates, one at a time, until giveback is
complete.
You can monitor the progress of the giveback with the storage failover show-giveback
command.
I/O resumes for each aggregate once giveback is complete for that aggregate; this reduces the overall
outage window for each aggregate.
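For reference, you can watch the per-aggregate giveback progress and the overall failover state
with the following sketch of commands:

cluster::> storage failover show-giveback
cluster::> storage failover show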
Related concepts

HA policy and how it affects takeover and giveback operations on page 28



Related information

Clustered Data ONTAP 8.3 man page: storage failover giveback - Return failed-over storage to its
home node
Clustered Data ONTAP 8.3 man page: storage failover show-giveback - Display giveback status

HA policy and how it affects takeover and giveback operations
Data ONTAP automatically assigns an HA policy of CFO or SFO to an aggregate that determines
how storage failover operations (takeover and giveback) occur for the aggregate and its volumes.
HA policy is assigned to and required by each aggregate on the system. The two options, CFO
(controller failover), and SFO (storage failover), determine the aggregate control sequence Data
ONTAP uses during storage failover and giveback operations.
Although the terms CFO and SFO are sometimes used informally to refer to storage failover
(takeover and giveback) operations, they actually represent the HA policy assigned to the aggregates.
For example, the terms SFO aggregate or CFO aggregate simply refer to the aggregate's HA policy
assignment.

Aggregates created on clustered Data ONTAP systems (except for the root aggregate containing
the root volume) have an HA policy of SFO. Manually initiated takeover is optimized for
performance by relocating SFO (non-root) aggregates serially to the partner prior to takeover.
During the giveback process, aggregates are given back serially after the taken-over system boots
and the management applications come online, enabling the node to receive its aggregates.

Because aggregate relocation operations entail reassigning aggregate disk ownership and shifting
control from a node to its partner, only aggregates with an HA policy of SFO are eligible for
aggregate relocation.

The root aggregate always has an HA policy of CFO and is given back at the start of the giveback
operation since this is necessary to allow the taken-over system to boot. All other aggregates are
given back serially after the taken-over system completes the boot process and the management
applications come online, enabling the node to receive its aggregates.
Note: Changing the HA policy of an aggregate from SFO to CFO is a Maintenance mode

operation. Do not modify this setting unless directed to do so by a customer support representative.
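As a sketch, assuming the ha-policy field is available in the storage aggregate show output, you
can display the HA policy assigned to each aggregate:

cluster::> storage aggregate show -fields ha-policy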
Related information

Clustered Data ONTAP 8.3 man page: storage failover show-giveback - Display giveback status
Clustered Data ONTAP 8.3 man page: storage failover show-takeover - Display takeover status


How root-data partitioning affects takeover and giveback


If you have an entry-level or All Flash FAS (AFF) platform model that uses root-data partitioning,
also called shared disks, storage failover takeover and giveback processing occurs just as with
nonshared disks.
For storage systems that use root-data partitioning, storage failover takeover and giveback operations
occur according to the aggregate's HA policy. Ownership of the container disk does not change
during takeover or giveback operations.
Ownership changes that occur during negotiated storage failover takeover or giveback events are
temporary.
Related concepts

Benefits of root-data partitioning for entry-level and All Flash FAS storage systems on page 37
HA policy and how it affects takeover and giveback operations on page 28
Related information

Clustered Data ONTAP 8.3 Physical Storage Management Guide

Background disk firmware update and takeover, giveback, and aggregate relocation
Background disk firmware updates affect HA pair takeover, giveback, and aggregate relocation
operations differently, depending on how those operations are initiated.
The following list describes how background disk firmware update affects takeover, giveback, and
aggregate relocation:

If a background disk firmware update is occurring on a disk on either node, manually initiated
takeover operations are delayed until the disk firmware update finishes on that disk. If the
background disk firmware update takes longer than 120 seconds, takeover operations are aborted
and must be restarted manually after the disk firmware update finishes. If the takeover was
initiated with the -bypass-optimization parameter of the storage failover takeover
command set to true, the background disk firmware update occurring on the destination node
does not affect the takeover.

If a background disk firmware update is occurring on a disk on the source (or takeover) node and
the takeover was initiated manually with the -option parameter of the storage failover
takeover command set to immediate, takeover operations start immediately.

If a background disk firmware update is occurring on a disk on a node and it panics, takeover of
the panicked node begins immediately.


If a background disk firmware update is occurring on a disk on either node, giveback of data
aggregates is delayed until the disk firmware update finishes on that disk. If the background disk
firmware update takes longer than 120 seconds, giveback operations are aborted and must be
restarted manually after the disk firmware update completes.

If a background disk firmware update is occurring on a disk on either node, aggregate relocation
operations are delayed until the disk firmware update finishes on that disk. If the background disk
firmware update takes longer than 120 seconds, aggregate relocation operations are aborted and
must be restarted manually after the disk firmware update finishes. If aggregate relocation was
initiated with the -override-destination-checks parameter of the storage aggregate
relocation start command set to true, background disk firmware update occurring on the
destination node does not affect aggregate relocation.
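As a sketch of the parameters mentioned in this list, an immediate takeover and an aggregate
relocation that overrides destination checks might look like the following; the node and
aggregate names are placeholders:

cluster::> storage failover takeover -ofnode node2 -option immediate
cluster::> storage aggregate relocation start -node node1 -destination node2
           -aggregate-list aggr1 -override-destination-checks true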

Types of disk ownership


The HA or Disaster Recovery (DR) state of the system that owns a disk can affect which system has
access to the disk. This means that there are several types of ownership for disks.
Disk ownership information is set either by Data ONTAP or by the administrator, and recorded on the
disk, in the form of the controller module's unique system ID (obtained from a node's NVRAM card
or NVMEM board).
Disk ownership information displayed by Data ONTAP can take one or more of the following forms.
Note that the names used vary slightly depending on the context.

Owner (or Current owner)


This is the system that can currently access the disk.

Original owner (or Home owner)


If the system is in HA takeover, then Owner is changed to the system that took over the node, and
Original owner or Home owner reflects the system that owned the disk before the takeover.

DR home owner
If the system is in a MetroCluster switchover, DR home owner reflects the value of the Home
owner field before the switchover occurred.

Related information

Clustered Data ONTAP 8.3 MetroCluster Installation and Configuration Guide


Clustered Data ONTAP 8.3 MetroCluster Management and Disaster Recovery Guide
Clustered Data ONTAP 8.3 Physical Storage Management Guide


Displaying disk and partition ownership


You can view disk ownership to determine which node controls the storage. Beginning with Data
ONTAP 8.3, you can also view the partition ownership on systems that use shared disks.
Steps

1. Display the ownership of physical disks using the storage disk show -ownership
command:
Example
cluster::> storage disk show -ownership
Disk   Aggregate Home   Owner  DR Home Home ID    Owner ID   DR Home ID Reserver   Pool
------ --------- ------ ------ ------- ---------- ---------- ---------- ---------- -----
1.0.0  aggr0_2   node2  node2  -       2014941509 2014941509 -          2014941509 Pool0
1.0.1  aggr0_2   node2  node2  -       2014941509 2014941509 -          2014941509 Pool0
1.0.2  aggr0_1   node1  node1  -       2014941219 2014941219 -          2014941219 Pool0
1.0.3  -         node1  node1  -       2014941219 2014941219 -          2014941219 Pool0
...

2. If you have a system that uses shared disks, display the partition ownership using the storage
disk show -partition-ownership command:
Example
cluster::> storage disk show -partition-ownership
                  Root       Root        Data       Data        Container  Container
Disk   Aggregate  Owner      Owner ID    Owner      Owner ID    Owner      Owner ID
------ ---------  ---------- ----------  ---------- ----------  ---------- ----------
1.0.0  -          node1      1886742616  node1      1886742616  node1      1886742616
1.0.1  -          node1      1886742616  node1      1886742616  node1      1886742616
1.0.2  -          node2      1886742657  node2      1886742657  node2      1886742657
1.0.3  -          node2      1886742657  node2      1886742657  node2      1886742657
...


Planning your HA pair configuration


As you plan your HA pair, you must consider recommended best practices, the requirements, and the
possible variations.

Best practices for HA pairs


To ensure that your HA pair is robust and operational, you need to be familiar with configuration best
practices.

Do not use the root aggregate for storing data. Storing user data in the root aggregate adversely
affects system stability and increases the storage failover time between nodes in an HA pair.

Make sure that each power supply unit in the storage system is on a different power grid so that a
single power outage does not affect all power supply units.

Use LIFs (logical interfaces) with defined failover policies to provide redundancy and improve
availability of network communication.

Keep both nodes in the HA pair on the same version of Data ONTAP.

Follow the documented procedures when upgrading your HA pair.

Clustered Data ONTAP 8.3 Upgrade and Revert/Downgrade Guide


Note: You cannot directly upgrade a node running Data ONTAP 8.2 to Data ONTAP 8.3 or

higher. You must first upgrade the node running Data ONTAP 8.2 to 8.2.1 or a higher version
within the 8.2 release family.

Maintain consistent configuration between the two nodes.


An inconsistent configuration is often the cause of failover problems.

Test the failover capability routinely (for example, during planned maintenance) to ensure proper configuration; a sample test sequence is sketched at the end of this list.

Make sure that each node has sufficient resources to adequately support the workload of both
nodes during takeover mode.

Use the Config Advisor tool to help ensure that failovers are successful.

If your system supports remote management (through a Service Processor), make sure that you
configure it properly.

Clustered Data ONTAP 8.3 System Administration Guide for Cluster Administrators

Follow recommended limits for FlexVol volumes, dense volumes, Snapshot copies, and LUNs to
reduce the takeover or giveback time.



When adding traditional or FlexVol volumes to an HA pair, consider testing the takeover and
giveback times to ensure that they fall within your requirements.

For systems using disks, check for failed disks regularly and remove them as soon as possible.
Failed disks can extend the duration of takeover operations or prevent giveback operations.

Clustered Data ONTAP 8.3 Physical Storage Management Guide

Multipath HA is required on all HA pairs except for some FAS22xx and FAS25xx system
configurations, which use single-path HA and lack the redundant standby connections.

To ensure that you receive prompt notification if takeover capability becomes disabled, configure your system to enable automatic email notification for the following takeover impossible EMS messages (a sample configuration is sketched at the end of this list):

ha.takeoverImpVersion

ha.takeoverImpLowMem

ha.takeoverImpDegraded

ha.takeoverImpUnsync

ha.takeoverImpIC

ha.takeoverImpHotShelf

ha.takeoverImpNotDef

Avoid using the -only-cfo-aggregates parameter with the storage failover giveback
command.
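The following example sketches how you might implement two of these recommendations from the clustershell; the destination name, email address, and node names are placeholders, and the exact syntax can vary by release. To route the takeover impossible EMS messages to an email destination:

cluster::> event destination create -name ha_alerts -mail [email protected]
cluster::> event route add-destinations -messagename ha.takeoverImp* -destinations ha_alerts

To test failover during planned maintenance, take over one node, confirm the HA state, and then give back:

cluster::> storage failover takeover -ofnode node2
cluster::> storage failover show
cluster::> storage failover giveback -ofnode node2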

Related tasks

Verifying the HA pair configuration on page 77

Setup requirements and restrictions for HA pairs


You must follow certain requirements and restrictions when setting up a new HA pair. These
requirements help provide the data availability benefits of the HA pair design.
The following list specifies the requirements and restrictions you should be aware of when setting up
a new HA pair:

Architecture compatibility
Both nodes must have the same system model and be running the same Data ONTAP software
and system firmware versions. The Data ONTAP release notes list the supported storage systems.

Clustered Data ONTAP 8.3 Release Notes


Clustered Data ONTAP Storage Platform Mixing Rules
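As a quick check (a sketch; output is abbreviated and varies by platform), you can confirm that both nodes report the same model and the same Data ONTAP image version:

cluster::> system node show -fields model
node   model
------ -------
node1  FAS8040
node2  FAS8040
2 entries were displayed.

cluster::> system node image show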

Nonvolatile memory (NVRAM or NVMEM) size and version compatibility

The size and version of the system's nonvolatile memory must be identical on both nodes in an HA pair.

Storage capacity

The number of disks or array LUNs must not exceed the maximum configuration capacity.

The total storage attached to each node must not exceed the capacity for a single node.

If your system uses both native disks and array LUNs, the combined total of disks and array
LUNs cannot exceed the maximum configuration capacity.


To determine the maximum capacity for a system using disks, array LUNs, or both, see the
Hardware Universe at hwu.netapp.com.
Note: After a failover, the takeover node temporarily serves data from all the storage in the HA

pair.

Disks and disk shelf compatibility

FC, SATA, and SAS storage are supported in HA pairs.

FC disks cannot be mixed on the same loop as SATA or SAS disks.

Different connection types cannot be combined in the same loop or stack.

Different types of storage can be used on separate stacks or loops on the same node. You can
also dedicate a node to one type of storage and the partner node to a different type, if needed.

NetApp Hardware Universe


Clustered Data ONTAP 8.3 Physical Storage Management Guide

Multipath HA is required on all HA pairs except for some FAS22xx and FAS25xx system
configurations, which use single-path HA and lack the redundant standby connections.

Mailbox disks or array LUNs on the root volume

Two disks are required if the root volume is on a disk shelf.

One array LUN is required if the root volume is on a storage array.

Interconnect adapters and cables


HA interconnect adapters and cables must be installed unless the system has two controllers in
the chassis and an internal interconnect.

Network connectivity
Both nodes must be attached to the same network and the Network Interface Cards (NICs) or
onboard Ethernet ports must be configured correctly.

System software



The same system software, such as SyncMirror, Server Message Block (SMB)/Common Internet
File System (CIFS), or Network File System (NFS), must be licensed and enabled on both nodes.
Note: If a takeover occurs, the takeover node can provide only the functionality for the licenses
installed on it. If the takeover node does not have a license that was being used by the partner
node to serve data, your HA pair loses functionality after a takeover.
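For example, you can list the licensed packages with the system license show command and confirm that any feature the partner node uses to serve data is also licensed for the local node (a sketch; output omitted):

cluster::> system license show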

Systems using array LUNs


For an HA pair using array LUNs, both nodes in the pair must be able to detect the same array
LUNs.
Note: Only the node that is the configured owner of a LUN has read-and-write access to that
LUN. During takeover operations, the emulated storage system maintains read-and-write
access to the LUN.

All-Flash Optimized systems


For All-Flash Optimized FAS80xx series systems, both nodes in the HA pair must have the All-Flash Optimized personality enabled.
Note: If both nodes in the HA pair do not have the All-Flash Optimized personality enabled,
Data ONTAP automatically generates an AutoSupport message warning of the mismatch.

Related references

Commands for performing and monitoring manual takeovers on page 102

Setup requirements and restrictions for mirrored HA pairs


The restrictions and requirements for mirrored HA pairs include those for a standard HA pair with
these additional requirements for disk pool assignments and cabling.

You must ensure that your pools are configured correctly:

Disks or array LUNs in the same plex must be from the same pool, with those in the opposite
plex from the opposite pool.

There must be sufficient spares in each pool to account for a disk or array LUN failure.

Both plexes of a mirror should not reside on the same disk shelf because it might result in a
single point of failure.

The storage failover command's -mode option must be set to ha (see the example after this list).

If you are using array LUNs, paths to an array LUN must be redundant.
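A minimal sketch of setting and verifying the HA mode follows; the node names are placeholders:

cluster::> storage failover modify -mode ha -node node1
cluster::> storage failover modify -mode ha -node node2
cluster::> storage failover show -fields mode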

Related references

Commands for setting the HA mode on page 74



Related information

Clustered Data ONTAP 8.3 Data Protection Guide

Requirements for hardware-assisted takeover


The hardware-assisted takeover feature is available on systems where the Service Processor module
is configured for remote management. Remote management provides remote platform management
capabilities, including remote access, monitoring, troubleshooting, logging, and alerting features.
Although a system with remote management on both nodes provides hardware-assisted takeover for
both, hardware-assisted takeover is also supported on HA pairs in which only one of the two systems
has remote management configured. Remote management does not have to be configured on both
nodes in the HA pair. Remote management can detect failures on the system in which it is installed
and provide faster takeover times if a failure occurs on the system.
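As a brief sketch, you can check the hardware-assisted takeover status and, if necessary, enable it for a node (the node name is a placeholder):

cluster::> storage failover hwassist show
cluster::> storage failover modify -hwassist true -node node1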
Related information

Clustered Data ONTAP 8.3 System Administration Guide for Cluster Administrators

If your cluster consists of a single HA pair


Cluster high availability (HA) is activated automatically when you enable storage failover on clusters
that consist of two nodes, and you should be aware that automatic giveback is enabled by default. On
clusters that consist of more than two nodes, automatic giveback is disabled by default, and cluster
HA is disabled automatically.
A cluster with only two nodes presents unique challenges in maintaining a quorum, the state in which
a majority of nodes in the cluster have good connectivity. In a two-node cluster, neither node holds
epsilon, the value that designates one of the nodes as the master. Epsilon is required in clusters with
more than two nodes. Instead, both nodes are polled continuously to ensure that if takeover occurs,
the node that is still up and running has full read-write access to data as well as access to logical
interfaces and management functions. This continuous polling function is referred to as cluster high
availability or cluster HA.
Cluster HA is different than and separate from the high availability provided by HA pairs and the
storage failover commands. While crucial to full functional operation of the cluster after a
failover, cluster HA does not provide the failover capability of the storage failover functionality.
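As a sketch, you can verify whether cluster HA is configured and, if necessary, enable it; on two-node clusters this is normally handled automatically when storage failover is enabled, and the full procedure is referenced below:

cluster::> cluster ha show
High Availability Configured: true

cluster::> cluster ha modify -configured true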
Related concepts

If you have a two-node switchless cluster on page 17


Related tasks

Enabling cluster HA and switchless-cluster in a two-node cluster on page 75



Related references

Halting or rebooting a node without initiating takeover on page 99


Related information

Clustered Data ONTAP 8.3 System Administration Guide for Cluster Administrators

Storage configuration variations for HA pairs


Because your storage management and performance requirements can vary, you can configure HA
pairs symmetrically, asymmetrically, as an active/passive pair, or with shared disk shelf stacks.
Symmetrical (active/active) configurations
In a symmetrical HA pair, each node has the same amount of storage.
Asymmetrical configurations
In an asymmetrical standard HA pair, one node has more storage than the other. This is
supported as long as neither node exceeds the maximum capacity limit for the node.
Active/passive configurations
In this configuration, the passive node has only a root volume, and the active node has all
the remaining storage, in addition to serving all data requests during normal operation.
The passive node responds to data requests only if it has taken over the active node.
Shared loops or stacks
In this configuration, the two nodes share one or more disk shelf loops or stacks. Shared loops or stacks are particularly useful for active/passive configurations, as described in the preceding bullet.

Benefits of root-data partitioning for entry-level and All Flash FAS storage systems
Starting with Data ONTAP 8.3, you can use root-data partitioning, also called shared drives, to
significantly increase usable system capacity on entry-level and All Flash FAS (AFF) platform
models.
Root-data partitioning enables the root aggregate to use less space, leaving more space for the data
aggregate and improving storage utilization.
Note: Because support for root-data partitioning begins with Data ONTAP 8.3, you must first unpartition any partitioned drives before performing the following operations:

Reverting a system using root-data partitioning to a previous version of Data ONTAP

Transferring partitioned storage to a system running a previous version of Data ONTAP



Related information

Clustered Data ONTAP 8.3 Physical Storage Management Guide

How root-data partitioning works


For entry-level and All Flash FAS (AFF) platform models, aggregates can be composed of parts of a
drive rather than the entire drive.
Root-data partitioning is usually enabled and configured by the factory. It can also be established by
initiating system initialization using option 4 from the boot menu. Note that system initialization
erases all data on the disks of the node and resets the node configuration to the factory default
settings.
When a node has been configured to use root-data partitioning, partitioned disks have two partitions:

The smaller partition is used to compose the root aggregate. The larger partition is used in data
aggregates. The size of the partitions is set by Data ONTAP, and depends on the number of disks
used to compose the root aggregate when the system is initialized. (The more disks used to compose
the root aggregate, the smaller the root partition.) After system initialization, the partition sizes are
fixed; adding partitions or disks to the root aggregate after system initialization increases the size of
the root aggregate, but does not change the root partition size.
The partitions are used by RAID in the same manner as physical disks are; all of the same
requirements apply. For example, if you add an unpartitioned drive to a RAID group consisting of
partitioned drives, the unpartitioned drive is partitioned to match the partition size of the drives in the
RAID group and the rest of the disk is unused.
If a partitioned disk is moved to another node or used in another aggregate, the partitioning persists;
you can use the disk only in RAID groups composed of partitioned disks.
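As a sketch, you can list the drives that have been partitioned for root-data partitioning (partitioned drives report a container type of shared):

cluster::> storage disk show -container-type shared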

Standard root-data partitioning layouts


The root aggregate is configured by the factory; you should not change it. However, you can use the
data partitions in a few different configurations, depending on your requirements.
The following diagram shows one way to configure the partitions for an active-passive configuration
with 12 partitioned disks. There are two root aggregates, one for each node, composed of the small
partitions. Each root aggregate has a spare partition. There is just one RAID-DP data aggregate, with
two parity disk partitions and one spare partition.


The following diagram shows one way to configure the partitions for an active-active configuration
with 12 partitioned disks. In this case, there are two RAID-DP data aggregates, each with their own
data partitions, parity partitions, and spares. Note that each disk is allocated to only one node. This is
a best practice that prevents the loss of a single disk from affecting both nodes.

The disks used for data, parity, and spare partitions might not be exactly as shown in these diagrams.
For example, the parity partitions might not always align on the same disk.


Requirements for using root-data partitioning


In most cases, you can use drives that are partitioned for root-data partitioning exactly as you would
use a physical, unshared drive. However, you cannot use root-data partitioning in certain
configurations.
The following storage devices cannot be partitioned:

Array LUNs

HDD types that are not available as internal drives: ATA, FCAL, and MSATA

100-GB SSDs

You cannot use root-data partitioning with the following technologies:

MetroCluster

RAID4
Aggregates composed of partitioned drives must have a RAID type of RAID-DP; you can check an aggregate's RAID type as shown in the example after this list.
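As a sketch, you can confirm an aggregate's RAID type before adding partitioned drives to it; the aggregate name is a placeholder and output is abbreviated:

cluster::> storage aggregate show -aggregate aggr1_node1 -fields raidtype
aggregate    raidtype
------------ --------
aggr1_node1  raid_dp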

Related information

Clustered Data ONTAP 8.3 Physical Storage Management Guide

HA pairs and storage system model types


Different model storage systems support some different HA configurations. This includes the
physical configuration of the HA pair and the manner in which the system recognizes that it is in an
HA pair.
Note: The physical configuration of the HA pair does not affect the cluster cabling of the nodes in

the HA pair.

Single-chassis and dual-chassis HA pairs


Depending on the model of the storage system, an HA pair can consist of two controllers in a single
chassis, or two controllers in two separate chassis. Some models can be configured either way, while
other models can be configured only as a single-chassis HA pair or dual-chassis HA pair.
The following example shows a single-chassis HA pair:



In a single-chassis HA pair, both controllers are in the same chassis. The HA interconnect is provided
by the internal backplane. No external HA interconnect cabling is required.
The following example shows a dual-chassis HA pair and the HA interconnect cables:

In a dual-chassis HA pair, the controllers are in separate chassis. The HA interconnect is provided by
external cabling.

Interconnect cabling for systems with variable HA configurations


In systems that can be configured either as a single-chassis or dual-chassis HA pair, the interconnect
cabling is different depending on the configuration.
The following table describes the interconnect cabling for 32xx and 62xx systems:
If the controller modules in the HA pair are...   The HA interconnect cabling is...
Both in the same chassis                          Not required, since an internal interconnect is used
Each in a separate chassis                        Required

HA configuration and the HA state PROM value


Some controller modules and chassis automatically record in a PROM whether they are in an HA
pair or stand-alone. This record is the HA state and must be the same on all components within the
stand-alone system or HA pair. The HA state can be manually configured if necessary.
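As a hedged sketch, the HA state is displayed and set from Maintenance mode on each controller module; the full procedure is referenced below:

*> ha-config show
*> ha-config modify controller ha
*> ha-config modify chassis ha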
Related tasks

Verifying and setting the HA state on the controller modules and chassis on page 72


Table of storage system models and HA configuration differences


Supported storage systems can have significant differences in their HA configurations, depending
upon the model.
The following table lists the supported storage systems and their HA configuration differences:
Storage         HA configuration                 Interconnect type (internal InfiniBand,                Uses HA state
system model    (single-chassis, dual-chassis,   external InfiniBand, or external 10-Gb Ethernet)       PROM value?
                or either)
--------------  -------------------------------  -----------------------------------------------------  -------------
FAS8080         Single-chassis or dual-chassis   Dual-chassis: External InfiniBand using the ports on   Yes
                                                 the I/O expansion modules
                                                 Single-chassis: Internal InfiniBand
FAS8060         Single-chassis                   Internal InfiniBand                                    Yes
FAS8040         Single-chassis                   Internal InfiniBand                                    Yes
FAS8020         Single-chassis                   Internal InfiniBand                                    Yes
6290            Single-chassis or dual-chassis   Dual-chassis: External InfiniBand using the NVRAM      Yes
                                                 adapter
                                                 Single-chassis: Internal InfiniBand
6280            Single-chassis or dual-chassis   Dual-chassis: External InfiniBand using the NVRAM      Yes
                                                 adapter
                                                 Single-chassis: Internal InfiniBand
6250            Single-chassis or dual-chassis   Dual-chassis: External InfiniBand using the NVRAM      Yes
                                                 adapter
                                                 Single-chassis: Internal InfiniBand
6240            Single-chassis or dual-chassis   Dual-chassis: External InfiniBand using the NVRAM      Yes
                                                 adapter
                                                 Single-chassis: Internal InfiniBand
6220            Single-chassis or dual-chassis   Dual-chassis: External InfiniBand using the NVRAM      Yes
                                                 adapter
                                                 Single-chassis: Internal InfiniBand
6210            Single-chassis or dual-chassis   Dual-chassis: External InfiniBand using the NVRAM      Yes
                                                 adapter
                                                 Single-chassis: Internal InfiniBand
3250            Single-chassis or dual-chassis   Dual-chassis: External 10-Gb Ethernet using onboard    Yes
                                                 ports c0a and c0b (these are dedicated HA
                                                 interconnect ports and cannot be used for data or
                                                 other purposes, regardless of the system
                                                 configuration)
                                                 Single-chassis: Internal InfiniBand
3220            Single-chassis or dual-chassis   Dual-chassis: External 10-Gb Ethernet using onboard    Yes
                                                 ports c0a and c0b (these are dedicated HA
                                                 interconnect ports and cannot be used for data or
                                                 other purposes, regardless of the system
                                                 configuration)
                                                 Single-chassis: Internal InfiniBand
FAS25xx         Single-chassis                   Internal InfiniBand                                    Yes
FAS22xx         Single-chassis                   Internal InfiniBand                                    Yes


Installing and cabling an HA pair


To install and cable a new standard or mirrored HA pair, you must have the correct tools and
equipment and you must connect the controllers to the disk shelves. If it is a dual-chassis HA pair,
you must also cable the HA interconnect between the nodes. HA pairs can be installed in either
NetApp system cabinets or in equipment racks.
The term V-Series system refers to the storage systems released prior to Data ONTAP 8.2.1 that can
use array LUNs. The FAS systems released in Data ONTAP 8.2.1 and later can use array LUNs if the
proper license is installed.
The specific procedure you use depends on the following aspects of your configuration:

Whether you have a standard or mirrored HA pair

Whether you are using FC or SAS disk shelves


Note: Refer to the NetApp Support Site for additional documentation if your HA pair

configuration includes SAS disk shelves. For cabling the HA interconnect between the nodes,
use the procedures in this guide.

SAS Disk Shelves Universal SAS and ACP Cabling Guide


SAS Disk Shelves Installation and Service Guide for DS4243, DS2246, DS4486, and DS4246
Multipath HA is required on all HA pairs except for some FAS22xx and FAS25xx system
configurations, which use single-path HA and lack the redundant standby connections.
Related information

Clustered Data ONTAP 8.3 Data Protection Guide

System cabinet or equipment rack installation


You need to install your HA pair in one or more NetApp system cabinets or in standard telco
equipment racks. Each of these options has different requirements.

HA pairs in an equipment rack


Depending on the amount of storage you ordered, you need to install the equipment in one or more
telco-style equipment racks.
The equipment racks can hold one or two nodes on the bottom as well as eight or more disk shelves.
For information about how to install the disk shelves and nodes into equipment racks, see the
appropriate documentation that came with your equipment.


NetApp Documentation: Product Library A-Z

HA pairs in a system cabinet


Depending on the number of disk shelves, the HA pair you ordered arrives in a single system cabinet
or multiple system cabinets.
The number of system cabinets you receive depends on how much storage you ordered. All internal
adapters such as networking adapters, Fibre Channel adapters, and other adapters arrive preinstalled
in the nodes.
If it comes in a single system cabinet, both the Channel A and Channel B disk shelves are cabled, and
the HA adapters are also precabled.
If the HA pair you ordered has more than one cabinet, you must complete the cabling by cabling the local node to the partner node's disk shelves and the partner node to the local node's disk shelves.
You must also cable the nodes together by cabling the NVRAM HA interconnects. If the HA pair
uses switches, you must install the switches as described in the accompanying switch documentation.
The system cabinets might also need to be connected to each other. See your System Cabinet Guide
for information about connecting your system cabinets together.

Required documentation
Installing an HA pair requires the correct documentation.
The following table lists and briefly describes the documentation you might need to refer to when
preparing a new HA pair, or converting two stand-alone systems into an HA pair:
Manual name                                      Description
NetApp Hardware Universe                         This utility describes the physical requirements that
                                                 your site must meet to install NetApp equipment.
The appropriate system cabinet guide             This guide describes how to install NetApp equipment
                                                 into a system cabinet.
The appropriate disk shelf guide                 These guides describe how to cable a disk shelf to a
                                                 storage system.
The appropriate hardware documentation for       These guides describe how to install the storage
your storage system model                        system, connect it to a network, and bring it up for
                                                 the first time.
Diagnostics Guide                                This guide describes the diagnostics tests that you
                                                 can run on the storage system.
Clustered Data ONTAP 8.3 Network Management      This guide describes how to perform network
Guide                                            configuration for the storage system.
Clustered Data ONTAP 8.3 Upgrade and             This guide describes how to upgrade storage system and
Revert/Downgrade Guide                           disk firmware, and how to upgrade storage system
                                                 software.
Clustered Data ONTAP 8.3 Data Protection Guide   This guide describes, among other topics, SyncMirror
                                                 technology, which is used for mirrored HA pairs.
Clustered Data ONTAP 8.3 System Administration   This guide describes general storage system
Guide for Cluster Administrators                 administration, including tasks such as adding nodes
                                                 to a cluster.
FlexArray Virtualization Installation            If you are installing a Data ONTAP system HA pair,
Requirements and Reference Guide                 this guide provides information about cabling Data
                                                 ONTAP systems to storage arrays. You can also refer
                                                 to the Data ONTAP system implementation guides for
                                                 information about configuring storage arrays to work
                                                 with Data ONTAP systems.
FlexArray Virtualization Implementation Guide    If you are installing a Data ONTAP system HA pair,
for Third-Party Storage                          this guide provides information about configuring
                                                 storage arrays to work with Data ONTAP systems.

Related information

NetApp Documentation: Product Library A-Z

Required tools
You must have the correct tools to install the HA pair.
The following list specifies the tools you need to install the HA pair:

#1 and #2 Phillips screwdrivers

Hand level

Marker


Required equipment
When you receive your HA pair, you should receive the equipment listed in the following table. See
the Hardware Universe at hwu.netapp.com to confirm your storage system type, storage capacity, and
so on.
Required equipment                                 Details
Storage system                                     Two of the same type of storage system
Storage                                            See the Hardware Universe at hwu.netapp.com
HA interconnect adapter card (for applicable       InfiniBand (IB) HA adapter
controller modules that do not share a chassis)    (The NVRAM adapter card functions as the HA
                                                   interconnect adapter on applicable storage systems.)
                                                   See the Hardware Universe at hwu.netapp.com
                                                   Note: When 32xx systems are in a dual-chassis HA
                                                   pair, the c0a and c0b 10-GbE ports are the HA
                                                   interconnect ports. They do not require an HA
                                                   interconnect adapter. Regardless of configuration,
                                                   the 32xx system's c0a and c0b ports cannot be used
                                                   for data. They are only for the HA interconnect.
For SAS disk shelves: SAS HBAs, if applicable      Minimum of two SAS HBAs, two FC-AL adapters, or
For DS14 disk shelves: FC-AL or FC HBA (FC HBA     their equivalent in onboard ports
for disk) adapters, if applicable
Fibre Channel switches, if applicable              Not applicable
SFP (small form-factor pluggable) modules, if      Not applicable
applicable
NVRAM HA adapter media converter                   Only if using fiber cabling
Cables (provided with shipment unless otherwise    For systems using FC disk shelf connections, two
noted)                                             optical controller-to-disk shelf cables per loop
                                                   For systems using SAS disk shelf connections, two
                                                   SAS controller-to-disk shelf cables per stack
                                                   Multiple disk shelf-to-disk shelf cables, if
                                                   applicable
                                                   For systems using the IB HA interconnect adapter,
                                                   two 4xIB copper cables, two 4xIB optical cables, or
                                                   two optical cables with media converters
                                                   Note: You must purchase longer optical cables
                                                   separately for cabling distances greater than 30
                                                   meters.
                                                   The 32xx systems, when in a dual-chassis HA pair,
                                                   require 10-GbE cables (Twinax or SR optical) for the
                                                   HA interconnect

Preparing your equipment


You must install your nodes in your system cabinets or equipment racks, depending on your
installation type.

Installing the nodes in equipment racks


Before you cable your nodes together, you must install the nodes and disk shelves in the equipment
rack, label the disk shelves, and connect the nodes to the network.
Steps

1. Install the nodes in the equipment rack as described in the guide for your disk shelf, hardware
documentation, or the Installation and Setup Instructions that came with your equipment.
2. Install the disk shelves in the equipment rack as described in the appropriate disk shelf guide.
3. Label the interfaces, where appropriate.
4. Connect the nodes to the network as described in the setup instructions for your system.



Result

The nodes are now in place and connected to the network; power is available.
After you finish

Cable the HA pair.

Installing the nodes in a system cabinet


Before you cable your nodes together, you must install the system cabinet, nodes, and any disk
shelves, and connect the nodes to the network. If you have two cabinets, the cabinets must be
connected together.
Steps

1. Install the system cabinets, nodes, and disk shelves as described in the System Cabinet Guide.
If you have multiple system cabinets, remove the front and rear doors and any side panels that
need to be removed, and connect the system cabinets together.
2. Connect the nodes to the network, as described in the Installation and Setup Instructions for your
system.
3. Connect the system cabinets to an appropriate power source and apply power to the cabinets.
Result

The nodes are now in place and connected to the network, and power is available.
After you finish

Proceed to cable the HA pair.

Cabling a standard HA pair


To cable a standard HA pair, you identify the ports you need to use on each node. You then cable the
ports and cable the HA interconnect.
About this task

This procedure explains how to cable a configuration using DS14mk2 AT or DS14mk4 FC disk
shelves.
Refer to the NetApp Support Site for additional documentation if your HA pair configuration includes SAS disk shelves.
Note: If you are installing an HA pair that uses array LUNs, there are specific procedures you
must follow when cabling Data ONTAP systems to storage arrays.


FlexArray Virtualization Installation Requirements and Reference Guide


Refer to the NetApp Support Site for additional documentation about configuring storage arrays to
work with Data ONTAP.

FlexArray Virtualization Implementation Guide for Third-Party Storage


The sections for cabling the HA interconnect apply to all systems regardless of disk shelf type.
Steps

1. Determining which Fibre Channel ports to use for Fibre Channel disk shelf connections on page
51
2. Cabling Node A to DS14mk2 AT or DS14mk4 FC disk shelves on page 52
3. Cabling Node B to DS14mk2 AT or DS14mk4 FC disk shelves on page 54
4. Cabling the HA interconnect (all systems except 32xx or FAS80xx in separate chassis) on page
56
5. Cabling the HA interconnect (32xx systems in separate chassis) on page 57
6. Cabling the HA interconnect (FAS80xx systems in separate chassis) on page 58
Related information

SAS Disk Shelves Universal SAS and ACP Cabling Guide


SAS Disk Shelves Installation and Service Guide for DS4243, DS2246, DS4486, and DS4246

Determining which Fibre Channel ports to use for Fibre Channel disk shelf
connections
Before cabling your HA pair, you need to identify which Fibre Channel ports to use to connect your
disk shelves to each storage system, and in what order to connect them.
You must keep the following guidelines in mind when identifying which ports to use:

Every disk shelf loop in the HA pair requires two ports on the node, one for the primary
connection and one for the redundant multipath HA connection.
A standard HA pair with one loop for each node uses four ports on each node.

Onboard Fibre Channel ports should be used before using ports on expansion adapters.

See the Hardware Universe at hwu.netapp.com to obtain the correct expansion slot assignment
information for the various adapters you use to cable your HA pair.

If using Fibre Channel HBAs, insert the adapters in the same slots on both systems.

After identifying the ports, you should have a numbered list of Fibre Channel ports for both nodes,
starting with Port 1.



Cabling guidelines for a quad-port Fibre Channel HBA
If using ports on the quad-port, 4-Gb Fibre Channel HBAs, use the procedures in the following
sections, with the following additional guidelines:

Disk shelf loops using ESH4 modules must be cabled to the quad-port HBA first.

Disk shelf loops using AT-FCX modules must be cabled to dual-port HBA ports or onboard ports
before using ports on the quad-port HBA.

Port A of the HBA must be cabled to the In port of Channel A of the first disk shelf in the loop.
Port A of the partner node's HBA must be cabled to the In port of Channel B of the first disk shelf
in the loop. This ensures that disk names are the same for both nodes.

Additional disk shelf loops must be cabled sequentially with the HBA's ports.
Port A is used for the first loop, port B for the second loop, and so on.

If available, ports C or D must be used for the redundant multipath HA connection after cabling
all remaining disk shelf loops.

All other cabling rules described in the documentation for the HBA and the Hardware Universe
must be observed.

Cabling Node A to DS14mk2 AT or DS14mk4 FC disk shelves


To cable Node A, you must use the Fibre Channel ports you previously identified and cable the disk
shelf loops owned by the node to these ports.
About this task

This procedure uses multipath HA, which is required on all systems.

This procedure does not apply to SAS disk shelves.

SAS Disk Shelves Universal SAS and ACP Cabling Guide


SAS Disk Shelves Installation and Service Guide for DS4243, DS2246, DS4486, and DS4246
Note: You can find additional cabling diagrams in your system's Installation and Setup Instructions

on the NetApp Support Site.


Steps

1. Review the cabling diagram before proceeding to the cabling steps.

The circled numbers in the diagram correspond to the step numbers in the procedure.

The location of the Input and Output ports on the disk shelves vary depending on the disk
shelf models.
Make sure that you refer to the labeling on the disk shelf rather than to the location of the port
shown in the diagram.


The location of the Fibre Channel ports on the controllers is not representative of any
particular storage system model; determine the locations of the ports you are using in your
configuration by inspection or by using the Installation and Setup Instructions for your model.

The port numbers refer to the list of Fibre Channel ports you created.

The diagram only shows one loop per node and one disk shelf per loop.
Your installation might have more loops, more disk shelves, or different numbers of disk
shelves between nodes.

(Cabling diagram for Node A to DS14mk2 AT or DS14mk4 FC disk shelves: Node A ports A1 and A2 connect to the first disk shelf of each loop, and the last disk shelf in each loop connects to Node B ports B4 and B3 to provide the redundant multipath HA connections. The circled numbers in the diagram correspond to the step numbers that follow.)

2. Cable Fibre Channel port A1 of Node A to the Channel A Input port of the first disk shelf of
Node A loop 1.
3. Cable the Node A disk shelf Channel A Output port to the Channel A Input port of the next disk
shelf in loop 1.
4. Repeat Step 3 for any remaining disk shelves in loop 1.
5. Cable the Channel A Output port of the last disk shelf in the loop to Fibre Channel port B4 of
Node B.
This provides the redundant multipath HA connection for Channel A.
6. Cable Fibre Channel port A2 of Node A to the Channel B Input port of the first disk shelf of
Node B loop 1.
7. Cable the Node B disk shelf Channel B Output port to the Channel B Input port of the next disk
shelf in loop 1.



8. Repeat Step 7 for any remaining disk shelves in loop 1.
9. Cable the Channel B Output port of the last disk shelf in the loop to Fibre Channel port B3 of
Node B.
This provides the redundant multipath HA connection for Channel B.
10. Repeat Steps 2 through 9 for each pair of loops in the HA pair, using ports 3 and 4 for the next
loop, ports 5 and 6 for the next one, and so on.
Result

Node A is completely cabled.


After you finish

Cable Node B.

Cabling Node B to DS14mk2 AT or DS14mk4 FC disk shelves


To cable Node B, you must use the Fibre Channel ports you previously identified and cable the disk
shelf loops owned by the node to these ports.
About this task

This procedure uses multipath HA, which is required on all systems.

This procedure does not apply to SAS disk shelves.

SAS Disk Shelves Universal SAS and ACP Cabling Guide


SAS Disk Shelves Installation and Service Guide for DS4243, DS2246, DS4486, and DS4246
Note: You can find additional cabling diagrams in your system's Installation and Setup Instructions
on the NetApp Support Site at mysupport.netapp.com.
Steps

1. Review the cabling diagram before proceeding to the cabling steps.

The circled numbers in the diagram correspond to the step numbers in the procedure.

The location of the Input and Output ports on the disk shelves vary depending on the disk
shelf models.
Make sure that you refer to the labeling on the disk shelf rather than to the location of the port
shown in the diagram.

The location of the Fibre Channel ports on the controllers is not representative of any
particular storage system model; determine the locations of the ports you are using in your
configuration by inspection or by using the Installation and Setup Instructions for your model.


The port numbers refer to the list of Fibre Channel ports you created.

The diagram only shows one loop per node and one disk shelf per loop.
Your installation might have more loops, more disk shelves, or different numbers of disk
shelves between nodes.

(Cabling diagram for Node B to DS14mk2 AT or DS14mk4 FC disk shelves: Node B ports B1 and B2 connect to the first disk shelf of each loop, and the last disk shelf in each loop connects to Node A ports A4 and A3 to provide the redundant multipath HA connections. The circled numbers in the diagram correspond to the step numbers that follow.)

2. Cable Port B1 of Node B to the Channel B Input port of the first disk shelf of Node A loop 1.
Both channels of this disk shelf are connected to the same port on each node. This is not required,
but it makes your HA pair easier to administer because the disks have the same ID on each node.
This is true for Step 5 also.
3. Cable the disk shelf Channel B Output port to the Channel B Input port of the next disk shelf in
loop 1.
4. Repeat Step 3 for any remaining disk shelves in loop 1.
5. Cable the Channel B Output port of the last disk shelf in the loop to Fibre Channel port A4 of
Node A.
This provides the redundant multipath HA connection for Channel B.
6. Cable Fibre Channel port B2 of Node B to the Channel A Input port of the first disk shelf of Node
B loop 1.



7. Cable the disk shelf Channel A Output port to the Channel A Input port of the next disk shelf in
loop 1.
8. Repeat Step 7 for any remaining disk shelves in loop 1.
9. Cable the Channel A Output port of the last disk shelf in the loop to Fibre Channel port A3 of
Node A.
This provides the redundant multipath HA connection for Channel A.
10. Repeat Steps 2 to 9 for each pair of loops in the HA pair, using ports 3 and 4 for the next loop,
ports 5 and 6 for the next one, and so on.
Result

Node B is completely cabled.


After you finish

Cable the HA interconnect.

Cabling the HA interconnect (all systems except 32xx or FAS80xx in separate chassis)
To cable the HA interconnect between the HA pair nodes, you must make sure that your interconnect
adapter is in the correct slot. You must also connect the adapters on each node with the optical cable.
About this task

This procedure applies to all dual-chassis HA pairs (HA pairs in which the two controller modules
reside in separate chassis) except the 32xx or FAS80xx in separate chassis, regardless of disk shelf
type.
Steps

1. See the Hardware Universe at hwu.netapp.com to ensure that your interconnect adapter is in the
correct slot for your system in an HA pair.
For systems that use an NVRAM adapter, the NVRAM adapter functions as the HA interconnect
adapter.
2. Plug one end of the optical cable into one of the local node's HA adapter ports, then plug the
other end into the partner node's corresponding adapter port.
You must not cross-cable the HA interconnect adapter. Cable the local node ports only to the
identical ports on the partner node.
If the system detects a cross-cabled HA interconnect, the following message appears on the
system console and in the event log (accessible using the event log show command):



HA interconnect port <port> of this appliance seems to be connected to
port <port> on the partner appliance.

3. Repeat Step 2 for the two remaining ports on the HA adapters.


Result

The nodes are connected to each other.


After you finish

Configure the system.
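Once the nodes are booted and storage failover is enabled, a quick way to confirm that each node can reach its partner over the HA interconnect is the storage failover show command (a sketch; output varies):

cluster::> storage failover show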

Cabling the HA interconnect (32xx systems in separate chassis)


To enable the HA interconnect between 32xx controller modules that reside in separate chassis, you
must cable the onboard 10-GbE ports on one controller module to the onboard 10-GbE ports on the
partner.
About this task

This procedure applies to 32xx systems regardless of the type of attached disk shelves.
Steps

1. Plug one end of the 10-GbE cable to the c0a port on one controller module.
2. Plug the other end of the 10-GbE cable to the c0a port on the partner controller module.
3. Repeat the preceding steps to connect the c0b ports.
Do not cross-cable the HA interconnect adapter; cable the local node ports only to the identical
ports on the partner node.
If the system detects a cross-cabled HA interconnect, the following message appears on the
system console and in the event log (accessible using the event log show command):
HA interconnect port <port> of this appliance seems to be connected to
port <port> on the partner appliance.
Result

The nodes are connected to each other.


After you finish

Configure the system.


Cabling the HA interconnect (FAS80xx systems in separate chassis)


To enable the HA interconnect between FAS80xx controller modules that reside in separate chassis,
you must cable the QSFP InfiniBand ports on one I/O expansion module to the QSFP InfiniBand
ports on the partner's I/O expansion module.
About this task

Because the FAS80xx storage controller modules do not include external HA interconnect ports, you
must use the HA interconnect ports on the I/O expansion modules to deploy these controller models
in separate chassis.
This procedure applies to FAS80xx systems, regardless of the type of attached disk shelves.
Steps

1. Plug one end of the QSFP InfiniBand cable to the ib0a port on one I/O expansion module.
2. Plug the other end of the QSFP InfiniBand cable to the ib0a port on the partner's I/O expansion
module.
3. Repeat the preceding steps to connect the ib0b ports.
Do not cross-cable the HA interconnect ports; cable the local node ports only to the identical
ports on the partner node.
If the system detects a cross-cabled HA interconnect, the following message appears on the
system console and in the event log (accessible using the event log show command):
HA interconnect port <port> of this appliance seems to be connected to
port <port> on the partner appliance.
Result

The nodes are connected to each other.


After you finish

Configure the system.


Cabling a mirrored HA pair


To cable a mirrored HA pair, you must identify the ports you need to use on each node, cable those
ports, and then cable the HA interconnect.
About this task

This procedure explains how to cable a configuration using DS14mk2 AT or DS14mk4 FC disk
shelves.
Refer to the NetApp Support Site for additional documentation if your HA pair configuration includes SAS disk shelves.
Note: If you are installing an HA pair that uses array LUNs, there are specific procedures you
must follow when cabling Data ONTAP systems to storage arrays.

FlexArray Virtualization Installation Requirements and Reference Guide


Refer to the NetApp Support Site for additional documentation about configuring storage arrays to
work with Data ONTAP.

FlexArray Virtualization Implementation Guide for Third-Party Storage


The sections for cabling the HA interconnect apply to all systems regardless of disk shelf type.
Related information

SAS Disk Shelves Universal SAS and ACP Cabling Guide


SAS Disk Shelves Installation and Service Guide for DS4243, DS2246, DS4486, and DS4246

Determining which Fibre Channel ports to use for Fibre Channel disk shelf
connections
Before cabling your HA pair, you need to identify which Fibre Channel ports to use to connect your
disk shelves to each storage system, and in what order to connect them.
You must keep the following guidelines in mind when identifying which ports to use:

Every disk shelf loop in the HA pair requires two ports on the node, one for the primary
connection and one for the redundant multipath HA connection.
A standard HA pair with one loop for each node uses four ports on each node.

Onboard Fibre Channel ports should be used before using ports on expansion adapters.

See the Hardware Universe at hwu.netapp.com to obtain the correct expansion slot assignment
information for the various adapters you use to cable your HA pair.

If using Fibre Channel HBAs, insert the adapters in the same slots on both systems.



After identifying the ports, you should have a numbered list of Fibre Channel ports for both nodes,
starting with Port 1.
Cabling guidelines for a quad-port Fibre Channel HBA
If using ports on the quad-port, 4-Gb Fibre Channel HBAs, use the procedures in the following
sections, with the following additional guidelines:

Disk shelf loops using ESH4 modules must be cabled to the quad-port HBA first.

Disk shelf loops using AT-FCX modules must be cabled to dual-port HBA ports or onboard ports
before using ports on the quad-port HBA.

Port A of the HBA must be cabled to the In port of Channel A of the first disk shelf in the loop.
Port A of the partner node's HBA must be cabled to the In port of Channel B of the first disk shelf
in the loop. This ensures that disk names are the same for both nodes.

Additional disk shelf loops must be cabled sequentially with the HBA's ports.
Port A is used for the first loop, port B for the second loop, and so on.

If available, ports C or D must be used for the redundant multipath HA connection after cabling
all remaining disk shelf loops.

All other cabling rules described in the documentation for the HBA and the Hardware Universe
must be observed.

Creating your port list for mirrored HA pairs


After you determine which Fibre Channel ports to use, you can create a table that identifies which
ports belong to which port pool.
About this task

Mirrored HA pairs, regardless of disk shelf type, use SyncMirror to separate each aggregate into two
plexes that mirror each other. One plex uses disks in pool 0 and the other plex uses disks in pool 1.
You must assign disks to the pools appropriately.
Follow the documented guidelines for software-based disk ownership.

Clustered Data ONTAP 8.3 Physical Storage Management Guide


Step

1. Create a table that specifies the port usage; the cabling diagrams in this document use the notation
P1-3 (the third port for pool 1).
For a 32xx HA pair that has two mirrored loops, the port list might look like the following
example:



Pool 0                      Pool 1
P0-1: onboard port 0a       P1-1: onboard port 0c
P0-2: onboard port 0b       P1-2: onboard port 0d
P0-3: slot 2 port A         P1-3: slot 4 port A
P0-4: slot 2 port B         P1-4: slot 4 port B

After you finish

Cable the Channel A loops.


Related information

Clustered Data ONTAP 8.3 Data Protection Guide

Cabling the Channel A DS14mk2 AT or DS14mk4 FC disk shelf loops


To begin cabling the disk shelves, you must connect the appropriate pool ports on the node to the
Channel A modules of the disk shelf stack for the pool.
About this task

This procedure uses multipath HA, which is required on all systems.

This procedure does not apply to SAS disk shelves.

SAS Disk Shelves Universal SAS and ACP Cabling Guide


SAS Disk Shelves Installation and Service Guide for DS4243, DS2246, DS4486, and DS4246
Steps

1. Complete your port list.


2. Review the cabling diagram before proceeding to the cabling steps.

The circled numbers in the diagram correspond to the step numbers in the procedure.

The location of the Input and Output ports on the disk shelves vary depending on the disk
shelf models.
Make sure that you refer to the labeling on the disk shelf rather than to the location of the port
shown in the diagram.

The location of the Fibre Channel ports on the controllers is not representative of any
particular storage system model; determine the locations of the ports you are using in your
configuration by inspection or by using the Installation and Setup Instructions for your model.

The port numbers refer to the list of Fibre Channel ports you created.


The diagram only shows one loop per node and one disk shelf per loop.
Your installation might have more loops, more disk shelves, or different numbers of disk
shelves between nodes.

3. Cable Channel A for Node A.


a. Cable the first port for pool 0 (P0-1) of Node A to the first Node A disk shelf Channel A Input
port of disk shelf pool 0.



b. Cable the first port for pool 1 (P1-1) of Node A to the first Node A disk shelf Channel A Input
port of disk shelf pool 1.
c. Cable the disk shelf Channel A Output port to the next disk shelf Channel A Input port in the
loop for both disk pools.
Note: The illustration shows only one disk shelf per disk pool. The number of disk shelves
per pool might be different for your configuration.

d. Repeat Substep 3c, connecting the next Channel A output to the next disk shelf Channel A
Input port for any remaining disk shelves in this loop for each disk pool.
e. Repeat Substep 3a through Substep 3d for any additional loops for Channel A, Node A, using
the odd-numbered ports (P0-3 and P1-3, P0-5, and P1-5, and so on).
4. Cable Channel A for Node B.
a. Cable the second port for pool 0 (P0-2) of Node B to the first Node B disk shelf Channel A
Input port of disk shelf pool 0.
b. Cable the second port for pool 1 (P1-2) of Node B to the first Node B disk shelf Channel A
Input port of disk shelf pool 1.
c. Cable the disk shelf Channel A Output port to the next disk shelf Channel A Input port in the
loop for both disk pools.
d. Repeat Substep 4c, connecting Channel A output to input, for any remaining disk shelves in
each disk pool.
e. Repeat Substep 4a through Substep 4d for any additional loops on Channel A, Node B, using
the even-numbered ports (P0-4 and P1-4, P0-6, and P1-6, and so on).
After you finish

Cable the Channel B loops.

Cabling the Channel B DS14mk2 AT or DS14mk4 FC disk shelf loops


To provide mirrored storage, you cable the mirrored pool ports on the node to the Channel B modules
of the appropriate disk shelf stack.
About this task

This procedure uses multipath HA, which is required on all systems.

This procedure does not apply to SAS disk shelves.

SAS Disk Shelves Universal SAS and ACP Cabling Guide


SAS Disk Shelves Installation and Service Guide for DS4243, DS2246, DS4486, and DS4246



Steps

1. Review the cabling diagram before proceeding to the cabling steps.

The circled numbers in the diagram correspond to the step numbers in the procedure.

The location of the Input and Output ports on the disk shelves vary depending on the disk
shelf models.
Make sure that you refer to the labeling on the disk shelf rather than to the location of the port
shown in the diagram.

The location of the Fibre Channel ports on the controllers is not representative of any
particular storage system model; determine the locations of the ports you are using in your
configuration by inspection or by using the Installation and Setup Instructions for your model.

The port numbers refer to the list of Fibre Channel ports you created.

The diagram only shows one loop per node and one disk shelf per loop.
Your installation might have more loops, more disk shelves, or different numbers of disk
shelves between nodes.


2. Cable Channel B for Node A.


a. Cable the second port for pool 0 (P0-2) of Node A to the first Node B disk shelf Channel B
Input port of disk shelf pool 0.
Note: Both channels of this disk shelf are connected to the same port on each node. This is
not required, but it makes your HA pair easier to administer because the disks have the
same ID on each node.



b. Cable the second port for pool 1 (P1-2) of Node A to the first Node B disk shelf Channel B
Input port of disk shelf pool 1.
c. Cable the disk shelf Channel B Output port to the next disk shelf Channel B Input port in the
loop for both disk pools.
Note: The illustration shows only one disk shelf per disk pool. The number of disk shelves
per pool might be different for your configuration.

d. Repeat Substep 2c, connecting Channel B output to input, for any remaining disk shelves in
each disk pool.
e. Repeat Substep 2a through Substep 2d for any additional loops on Channel B, Node A, using
the even-numbered ports (P0-4 and P1-4, P0-6, and P1-6, and so on).
3. Cable Channel B for Node B.
a. Cable the first port for pool 0 (P0-1) of Node B to the first Node A disk shelf Channel B Input
port of disk shelf pool 0.
b. Cable the first port for pool 1 (P1-1) of Node B to the first Node A disk shelf Channel B Input
port of disk shelf pool 1.
c. Cable the disk shelf Channel B Output port to the next disk shelf Channel B Input port in the
loop for both disk pools.
d. Repeat Substep 3c, connecting Channel B output to input, for any remaining disk shelves in
each disk pool.
e. Repeat Substep 3a through Substep 3d for any additional loops for Channel B, Node B, using
the odd-numbered ports (P0-3 and P1-3, P0-5, and P1-5, and so on).
After you finish

Cable the HA interconnect.


Related tasks

Cabling the HA interconnect (32xx systems in separate chassis) on page 57


Cabling the HA interconnect (FAS80xx systems in separate chassis) on page 58

Cabling the redundant multipath HA connection for each loop


To complete the multipath HA cabling for the disk shelves, you must add the final connection for
each channel on the final disk shelf in each loop.
Steps

1. Review the cabling diagram before proceeding to the cabling steps.


The circled numbers in the diagram correspond to the step numbers in the procedure.

The location of the Input and Output ports on the disk shelves vary depending on the disk
shelf models.
Make sure that you refer to the labeling on the disk shelf rather than to the location of the port
shown in the diagram.

The location of the Fibre Channel ports on the controllers is not representative of any
particular storage system model; determine the locations of the ports you are using in your
configuration by inspection or by using the Installation and Setup Instructions for your model.

The port numbers refer to the list of Fibre Channel ports you created.

The diagram only shows one loop per node and one disk shelf per loop.
Your installation might have more loops, more disk shelves, or different numbers of disk
shelves between nodes.


2. Connect the Channel A output port on the last disk shelf for each loop belonging to Node A to an
available port on Node B in the same pool.
3. Connect the Channel B output port on the last disk shelf for each loop belonging to Node A to an
available port on Node A in the same pool.
4. Connect the Channel A output port on the last disk shelf for each loop belonging to Node B to an
available port on Node A in the same pool.
5. Connect the Channel B output port on the last disk shelf for each loop belonging to Node B to an
available port on Node B in the same pool.


Cabling the HA interconnect (all systems except 32xx or FAS80xx in separate chassis)
To cable the HA interconnect between the HA pair nodes, you must make sure that your interconnect
adapter is in the correct slot. You must also connect the adapters on each node with the optical cable.
About this task

This procedure applies to all dual-chassis HA pairs (HA pairs in which the two controller modules
reside in separate chassis) except the 32xx or FAS80xx in separate chassis, regardless of disk shelf
type.
Steps

1. See the Hardware Universe at hwu.netapp.com to ensure that your interconnect adapter is in the
correct slot for your system in an HA pair.
For systems that use an NVRAM adapter, the NVRAM adapter functions as the HA interconnect
adapter.
2. Plug one end of the optical cable into one of the local node's HA adapter ports, then plug the
other end into the partner node's corresponding adapter port.
You must not cross-cable the HA interconnect adapter. Cable the local node ports only to the
identical ports on the partner node.
If the system detects a cross-cabled HA interconnect, the following message appears on the
system console and in the event log (accessible using the event log show command):
HA interconnect port <port> of this appliance seems to be connected to
port <port> on the partner appliance.

3. Repeat Step 2 for the two remaining ports on the HA adapters.


Result

The nodes are connected to each other.


After you finish

Configure the system.


Cabling the HA interconnect (32xx systems in separate chassis)


To enable the HA interconnect between 32xx controller modules that reside in separate chassis, you
must cable the onboard 10-GbE ports on one controller module to the onboard 10-GbE ports on the
partner.
About this task

This procedure applies to 32xx systems regardless of the type of attached disk shelves.
Steps

1. Plug one end of the 10-GbE cable into the c0a port on one controller module.
2. Plug the other end of the 10-GbE cable into the c0a port on the partner controller module.
3. Repeat the preceding steps to connect the c0b ports.
Do not cross-cable the HA interconnect adapter; cable the local node ports only to the identical
ports on the partner node.
If the system detects a cross-cabled HA interconnect, the following message appears on the
system console and in the event log (accessible using the event log show command):
HA interconnect port <port> of this appliance seems to be connected to
port <port> on the partner appliance.
Result

The nodes are connected to each other.


After you finish

Configure the system.

Cabling the HA interconnect (FAS80xx systems in separate chassis)


To enable the HA interconnect between FAS80xx controller modules that reside in separate chassis,
you must cable the QSFP InfiniBand ports on one I/O expansion module to the QSFP InfiniBand
ports on the partner's I/O expansion module.
About this task

Because the FAS80xx storage controller modules do not include external HA interconnect ports, you
must use the HA interconnect ports on the I/O expansion modules to deploy these controller models
in separate chassis.
This procedure applies to FAS80xx systems, regardless of the type of attached disk shelves.



Steps

1. Plug one end of the QSFP InfiniBand cable into the ib0a port on one I/O expansion module.
2. Plug the other end of the QSFP InfiniBand cable into the ib0a port on the partner's I/O expansion
module.
3. Repeat the preceding steps to connect the ib0b ports.
Do not cross-cable the HA interconnect ports; cable the local node ports only to the identical
ports on the partner node.
If the system detects a cross-cabled HA interconnect, the following message appears on the
system console and in the event log (accessible using the event log show command):
HA interconnect port <port> of this appliance seems to be connected to
port <port> on the partner appliance.
Result

The nodes are connected to each other.


After you finish

Configure the system.

Required connections for using uninterruptible power supplies with standard or mirrored HA pairs
You can use a UPS (uninterruptible power supply) with your HA pair. The UPS enables the system to
fail over gracefully if power fails for one of the nodes, or to shut down gracefully if power fails for
both nodes. You must ensure that the correct equipment is connected to the UPS.
To gain the full benefit of the UPS, you must ensure that all the required equipment is connected to
the UPS. The equipment that needs to be connected depends on whether your configuration is a
standard or a mirrored HA pair.
For a standard HA pair, you must connect the controller, disks, and any FC switches in use.
For a mirrored HA pair, you must connect the controller and any FC switches to the UPS, as for a
standard HA pair. However, if the two sets of disk shelves have separate power sources, you do not
have to connect the disks to the UPS. If power is interrupted to the local controller and disks, the
controller can access the remote disks until it shuts down gracefully or the power supply is restored.
In this case, if power is interrupted to both sets of disks at the same time, the HA pair cannot shut
down gracefully.


Configuring an HA pair
Bringing up and configuring a standard or mirrored HA pair for the first time can require enabling
HA mode capability and failover, setting options, configuring network connections, and testing the
configuration.
These tasks apply to all HA pairs regardless of disk shelf type.
Steps

1. Verifying and setting the HA state on the controller modules and chassis on page 72
2. Setting the HA mode and enabling storage failover on page 74
3. Enabling cluster HA and switchless-cluster in a two-node cluster on page 75
4. Verifying the HA pair configuration on page 77
5. Configuring hardware-assisted takeover on page 77
6. Configuring automatic takeover on page 79
7. Configuring automatic giveback on page 80
8. Testing takeover and giveback on page 84

Verifying and setting the HA state on the controller modules and chassis
For systems that use the HA state value, the value must be consistent in all components in the HA
pair. You can use the Maintenance mode ha-config command to verify and, if necessary, set the
HA state.
About this task

The ha-config command only applies to the local controller module and, in the case of a dual-chassis HA pair, the local chassis. To ensure consistent HA state information throughout the system,
you must also run these commands on the partner controller module and chassis, if necessary.
Note: When you boot a node for the first time, the HA state value for both controller and chassis is
default.

The HA state is recorded in the hardware PROM in the chassis and in the controller module. It must
be consistent across all components of the system, as follows:

• Stand-alone configuration (not in an HA pair): the HA state is recorded on the chassis and Controller module A; the HA state on these components must be non-ha.
• A single-chassis HA pair: the HA state is recorded on the chassis, Controller module A, and Controller module B; the HA state on these components must be ha.
• A dual-chassis HA pair: the HA state is recorded on Chassis A, Controller module A, Chassis B, and Controller module B; the HA state on these components must be ha.
• Each single-chassis HA pair in a MetroCluster configuration: the HA state is recorded on the chassis, Controller module A, and Controller module B; the HA state on these components must be mcc.
• Each dual-chassis HA pair in a MetroCluster configuration: the HA state is recorded on Chassis A, Controller module A, Chassis B, and Controller module B; the HA state on these components must be mcc.

Use the following steps to verify the HA state is appropriate and, if not, to change it:
Steps

1. Reboot or halt the current controller module and use either of the following two options to boot
into Maintenance mode:
a. If you rebooted the controller, press Ctrl-C when prompted to display the boot menu and then
select the option for Maintenance mode boot.
b. If you halted the controller, enter the following command from the LOADER prompt:
boot_ontap maint



Note: This option boots directly into Maintenance mode; you do not need to press Ctrl-C.

2. After the system boots into Maintenance mode, enter the following command to display the HA
state of the local controller module and chassis:
ha-config show

The HA state should be ha for all components if the system is in an HA pair.


3. If necessary, enter the following command to set the HA state of the controller:
ha-config modify controller ha-state

4. If necessary, enter the following command to set the HA state of the chassis:
ha-config modify chassis ha-state

5. Exit Maintenance mode by entering the following command:


halt

6. Boot the system by entering the following command at the boot loader prompt:
boot_ontap

7. If necessary, repeat the preceding steps on the partner controller module.
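Example
The following is a minimal sketch of what the Maintenance mode session might look like when setting a single-chassis HA pair to the ha state; the prompt and any command output are illustrative only, and the exact wording can differ on your system:

*> ha-config show
*> ha-config modify controller ha
*> ha-config modify chassis ha
*> halt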


Related information

Clustered Data ONTAP 8.3 MetroCluster Installation and Configuration Guide


Clustered Data ONTAP 8.3 MetroCluster Management and Disaster Recovery Guide

Setting the HA mode and enabling storage failover


You need to set the HA mode and enable storage failover functionality to get the benefits of an HA
pair.

Commands for setting the HA mode


The HA license is no longer required starting with Data ONTAP 8.2, yet there are specific Data
ONTAP commands for setting the HA mode. The system must be physically configured for HA
before HA mode is selected. A reboot is required to implement the mode change.
If you want to...         Use this command...
Set the mode to HA        storage failover modify -mode ha -node nodename
Set the mode to non-HA    storage failover modify -mode non_ha -node nodename


Note: You must disable storage failover before disabling HA mode.
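Example
A brief sketch of enabling HA mode on both nodes of a pair; the node names node1 and node2 are placeholders for your own node names:

cluster::> storage failover modify -mode ha -node node1
cluster::> storage failover modify -mode ha -node node2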

Related references

Connections and components of an HA pair on page 11


Description of node states displayed by storage failover show-type commands on page 88
Related information

Clustered Data ONTAP 8.3 man page: storage failover modify - Modify storage failover attributes

Commands for enabling and disabling storage failover


There are specific Data ONTAP commands for enabling the storage failover functionality.
If you want to...     Use this command...
Enable takeover       storage failover modify -enabled true -node nodename
Disable takeover      storage failover modify -enabled false -node nodename
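Example
A minimal sketch of enabling takeover on both nodes of the pair and then checking the result; node1 and node2 are placeholder node names:

cluster::> storage failover modify -enabled true -node node1
cluster::> storage failover modify -enabled true -node node2
cluster::> storage failover show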

Related information

Clustered Data ONTAP 8.3 man page: storage failover modify - Modify storage failover attributes

Enabling cluster HA and switchless-cluster in a two-node cluster
A cluster consisting of only two nodes requires special configuration settings. Cluster high
availability (HA) differs from the HA provided by storage failover, and is required in a cluster if it
contains only two nodes. Also, if you have a switchless configuration, the switchless-cluster option
must be enabled.
About this task

In a two-node cluster, cluster HA ensures that the failure of one node does not disable the cluster. If
your cluster contains only two nodes:

• Enabling cluster HA requires and automatically enables storage failover and auto-giveback.
• Cluster HA is enabled automatically when you enable storage failover.

76 | High-Availability Configuration Guide


Note: If the cluster contains or grows to more than two nodes, cluster HA is not required and is
disabled automatically.

A two-node cluster can be configured using direct-cable connections between the nodes instead of a
cluster interconnect switch. If you have a two-node switchless configuration, the switchless-cluster
network option must be enabled to ensure proper cluster communication between the nodes.
Steps

1. Enter the following command to enable cluster HA:


cluster ha modify -configured true

If storage failover is not already enabled, you are prompted to confirm enabling of both storage
failover and auto-giveback.
2. If you have a two-node switchless cluster, enter the following commands to verify that the
switchless-cluster option is set:
a. Enter the following command to change to the advanced privilege level:
set -privilege advanced

Confirm when prompted to continue into advanced mode. The advanced mode prompt appears
(*>).
b. Enter the following command:
network options switchless-cluster show

If the output shows that the value is false, you must issue the following command:
network options switchless-cluster modify true

c. Enter the following command to return to the admin privilege level:


set -privilege admin
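Example
A condensed sketch of the full sequence on a two-node switchless cluster, using the commands from the steps above; the prompts are shown for orientation and confirmation messages are omitted:

cluster::> cluster ha modify -configured true
cluster::> set -privilege advanced
cluster::*> network options switchless-cluster show
cluster::*> network options switchless-cluster modify true
cluster::*> set -privilege admin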
Related concepts

How HA pairs relate to the cluster on page 13


If your cluster consists of a single HA pair on page 36
If you have a two-node switchless cluster on page 17
Related references

Halting or rebooting a node without initiating takeover on page 99


Verifying the HA pair configuration


You can go to the NetApp Support Site and download the Config Advisor tool to check for common
configuration errors.
About this task

Config Advisor is a configuration validation and health check tool for NetApp systems. It can be
deployed at both secure sites and non-secure sites for data collection and system analysis.
Note: Support for Config Advisor is limited and available only online.
Steps

1. Log in to the NetApp Support Site at mysupport.netapp.com and go to Downloads > Software >
ToolChest.
2. Click Config Advisor.
3. Follow the directions on the web page for downloading, installing, and running the utility.
4. After running Config Advisor, review the tool's output and follow the recommendations provided
to address any issues discovered.

Configuring hardware-assisted takeover


You can configure hardware-assisted takeover to speed up takeover times. Hardware-assisted
takeover uses the remote management device to quickly communicate local status changes to the
partner node.

Commands for configuring hardware-assisted takeover


There are specific Data ONTAP commands for configuring the hardware-assisted takeover feature.
If you want to...                                             Use this command...
Disable or enable hardware-assisted takeover                  storage failover modify -hwassist
Set the partner address                                       storage failover modify -hwassist-partner-ip
Set the partner port                                          storage failover modify -hwassist-partner-port
Specify the interval between heartbeats                       storage failover modify -hwassist-health-check-interval
Specify the number of times the hardware-assisted
takeover alerts are sent                                      storage failover modify -hwassist-retry-count
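Example
A minimal sketch of enabling hardware-assisted takeover and pointing it at the partner's remote management device; the node name node1 and the IP address 10.1.1.12 are placeholders for your own values:

cluster::> storage failover modify -hwassist true -node node1
cluster::> storage failover modify -hwassist-partner-ip 10.1.1.12 -node node1
cluster::> storage failover hwassist show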

Related information

Clustered Data ONTAP 8.3 man page: storage failover modify - Modify storage failover attributes
Clustered Data ONTAP 8.3 Command Map for 7-Mode Administrators

System events that trigger hardware-assisted takeover


The remote management device (Service Processor) can detect many events and generate alerts. The
partner node might initiate takeover, depending on the type of alert received.
Alert: power_loss
Takeover initiated upon receipt? Yes
Description: A power loss has occurred on the node. The remote management device has a power supply that maintains power for a short period after a power loss, allowing it to report the power loss to the partner.

Alert: l2_watchdog_reset
Takeover initiated upon receipt? Yes
Description: The system watchdog hardware has detected an L2 reset. The remote management device detected a lack of response from the system CPU and reset the system.

Alert: power_off_via_sp
Takeover initiated upon receipt? Yes
Description: The remote management device was used to power off the system.

Alert: power_cycle_via_sp
Takeover initiated upon receipt? Yes
Description: The remote management device was used to cycle the system power off and on.

Alert: reset_via_sp
Takeover initiated upon receipt? Yes
Description: The remote management device was used to reset the system.

Alert: abnormal_reboot
Takeover initiated upon receipt? No
Description: An abnormal reboot of the node has occurred.

Alert: loss_of_heartbeat
Takeover initiated upon receipt? No
Description: The heartbeat message from the node was no longer received by the remote management device.
Note: This alert does not refer to the heartbeat messages between the nodes in the HA pair; it refers to the heartbeat between the node and its local remote management device.

Alert: periodic_message
Takeover initiated upon receipt? No
Description: A periodic message has been sent during a normal hardware-assisted takeover operation.

Alert: test
Takeover initiated upon receipt? No
Description: A test message has been sent to verify a hardware-assisted takeover operation.

Configuring automatic takeover


Automatic takeover is enabled by default. You can control when automatic takeovers occur by using
specific commands.

Commands for controlling automatic takeover


There are specific Data ONTAP commands you can use to change the default behavior and control
when automatic takeovers occur.
If you want takeover to occur automatically when the partner node...   Use this command...
Reboots                                                                storage failover modify -node nodename -onreboot true
Panics                                                                 storage failover modify -node nodename -onpanic true
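Example
A short sketch of enabling takeover on reboot and on panic for one node; node1 is a placeholder node name:

cluster::> storage failover modify -node node1 -onreboot true
cluster::> storage failover modify -node node1 -onpanic true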

Related information

Clustered Data ONTAP 8.3 man page: storage failover modify - Modify storage failover attributes
Clustered Data ONTAP 8.3 Command Map for 7-Mode Administrators


System events that always result in an automatic takeover


Some events always lead to an automatic takeover if storage failover is enabled. These takeovers
cannot be avoided through configuration.
The following system events cause an automatic and unavoidable takeover of the node:
• The node cannot send heartbeat messages to its partner due to events such as loss of power or watchdog reset.
• You halt the node without using the -f or -inhibit-takeover parameter.
• The node panics.

Configuring automatic giveback


You can configure automatic giveback so that when a node that has been taken over boots up to the
Waiting for Giveback state, giveback automatically occurs.

How automatic giveback works


The automatic takeover and automatic giveback operations can work together to reduce and avoid
client outages. They occur by default in the case of a panic or reboot, or if the cluster contains only a
single HA pair. However, these operations require specific configuration for some other cases.
With the default settings, if one node in the HA pair panics or reboots, the partner node automatically
takes over and then automatically gives back storage when the affected node reboots. This returns the
HA pair to a normal operating state.
The automatic giveback occurs by default after a panic or a reboot. You can also configure the system
to perform an automatic giveback in cases other than a panic or a reboot. However, because each of
the options controls different aspects of automatic giveback, you must configure them independently.
Although you can also set the system to always attempt an automatic giveback (for cases other than a
panic or a reboot), you should do so with caution:

• The automatic giveback causes a second unscheduled interruption (after the automatic takeover). Depending on your client configurations, you might want to initiate the giveback manually to plan when this second interruption occurs.
• The takeover might have been due to a hardware problem that can recur without additional diagnosis, leading to additional takeovers and givebacks.
Note: Automatic giveback is enabled by default if the cluster contains only a single HA pair.
Automatic giveback is disabled by default during nondisruptive Data ONTAP upgrades.

Before performing the automatic giveback (regardless of what triggered it), the partner node waits for
a fixed amount of time as controlled by the -delay-seconds parameter of the storage failover
modify command. The default delay is 600 seconds. By delaying the giveback, the process results in
two brief outages:
1. One outage during the takeover operation
2. One outage during the giveback operation
This process avoids a single, prolonged outage that includes:
1. The time for the takeover operation
2. The time it takes for the taken-over node to boot up to the point at which it is ready for the giveback
3. The time for the giveback operation
If the automatic giveback fails for any of the non-root aggregates, the system automatically makes
two additional attempts to complete the giveback.
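Example
A minimal sketch of shortening the giveback delay to five minutes on one node; node1 is a placeholder node name:

cluster::> storage failover modify -node node1 -delay-seconds 300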

Commands for configuring automatic giveback


There are specific Data ONTAP commands for enabling or disabling automatic giveback.
If you want to...   Use this command...

Enable automatic giveback so that giveback occurs as soon as the taken-over node boots, reaches the Waiting for Giveback state, and the Delay before Auto Giveback period has expired (the default setting is false, except for two-node clusters, where the default setting is true):
    storage failover modify -node nodename -auto-giveback true

Disable automatic giveback (the default setting is false, except for two-node clusters, where the default setting is true):
    storage failover modify -node nodename -auto-giveback false
    Note: Setting this parameter to false does not disable automatic giveback after takeover on panic and takeover on reboot; automatic giveback after takeover on panic must be disabled by setting the -auto-giveback-after-panic parameter to false.

Disable automatic giveback after takeover on panic (this setting is enabled by default):
    storage failover modify -node nodename -auto-giveback-after-panic false

Delay automatic giveback for a specified number of seconds (default is 600). This option determines the minimum time that a node will remain in takeover before performing an automatic giveback:
    storage failover modify -node nodename -delay-seconds seconds

Change the number of times the automatic giveback is attempted within 60 minutes (default is two):
    storage failover modify -node nodename -attempts integer

Change the time period (in minutes) used by the -attempts parameter (default is 60 minutes):
    storage failover modify -node nodename -attempts-time integer

Change the time period (in minutes) to delay the automatic giveback before terminating CIFS clients that have open files. During the delay, the system periodically sends notices to the affected workstations. If 0 (zero) minutes are specified, then CIFS clients are terminated immediately:
    storage failover modify -node nodename -auto-giveback-cifs-terminate-minutes integer

Override any vetoes during automatic giveback operations:
    storage failover modify -node nodename -auto-giveback-override-vetoes true

Note: Some vetoes cannot be overridden.
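Example
A brief sketch of enabling automatic giveback and allowing it to override vetoes on one node; node1 is a placeholder node name:

cluster::> storage failover modify -node node1 -auto-giveback true
cluster::> storage failover modify -node node1 -auto-giveback-override-vetoes true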

Related information

Clustered Data ONTAP 8.3 man page: storage failover modify - Modify storage failover attributes
Clustered Data ONTAP 8.3 Command Map for 7-Mode Administrators

How variations of the storage failover modify command affect automatic giveback
The operation of automatic giveback depends on how you configure the parameters of the storage
failover modify command.
The effects of automatic giveback parameter combinations that apply to situations
other than panic
The following table lists the storage failover modify command parameters that apply to
takeover events not caused by a panic:

Parameter                           Default setting
-auto-giveback true|false           true for two-node clusters; false for clusters with four nodes or more
-delay-seconds integer (seconds)    600
-onreboot true|false                true

The following list describes how combinations of the -onreboot and -auto-giveback
parameters affect automatic giveback for takeover events not caused by a panic:

• -onreboot true, -auto-giveback true
  Cause of takeover: reboot command. Does automatic giveback occur? Yes
  Cause of takeover: halt command, or power cycle operation issued from the Service Processor. Does automatic giveback occur? Yes
• -onreboot true, -auto-giveback false
  Cause of takeover: reboot command. Does automatic giveback occur? Yes
  Cause of takeover: halt command, or power cycle operation issued from the Service Processor. Does automatic giveback occur? No
• -onreboot false, -auto-giveback true
  Cause of takeover: reboot command. Does automatic giveback occur? No
  Cause of takeover: halt command, or power cycle operation issued from the Service Processor. Does automatic giveback occur? Yes
• -onreboot false, -auto-giveback false
  Cause of takeover: reboot command. Does automatic giveback occur? No
  Cause of takeover: halt command, or power cycle operation issued from the Service Processor. Does automatic giveback occur? No
Note: If the -onreboot parameter is set to true and a takeover occurs due to a reboot, then
automatic giveback is always performed, regardless of whether the -auto-giveback parameter is
set to true.

When the -onreboot parameter is set to false, a takeover does not occur in the case of a node
reboot. Therefore, automatic giveback cannot occur, regardless of whether the -auto-giveback
parameter is set to true. A client disruption occurs.



The effects of automatic giveback parameter combinations that apply to panic situations
The following storage failover modify command parameters apply to panic situations:

Parameter                                                       Default setting
-onpanic true|false                                             true
-auto-giveback-after-panic true|false (Privilege: Advanced)     true

The following list describes how parameter combinations of the storage failover modify
command affect automatic giveback in panic situations:

• -onpanic true, -auto-giveback-after-panic true: automatic giveback occurs after a panic.
• -onpanic true, -auto-giveback-after-panic false: automatic giveback occurs after a panic.
• -onpanic false, -auto-giveback-after-panic true: automatic giveback does not occur.
• -onpanic false, -auto-giveback-after-panic false: automatic giveback does not occur.

Note: If the -onpanic parameter is set to true, automatic giveback is always performed if a
panic occurs.
If the -onpanic parameter is set to false, takeover does not occur. Therefore, automatic
giveback cannot occur, even if the -auto-giveback-after-panic parameter is set to true. A
client disruption occurs.
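Example
A hedged sketch of checking the relevant settings on one node with the -fields parameter; the field names shown are assumed to match the storage failover parameters discussed above, and node1 is a placeholder node name:

cluster::> storage failover show -node node1 -fields onreboot,onpanic,auto-giveback,delay-seconds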

Testing takeover and giveback


After you configure all aspects of your HA pair, you need to verify that it is operating as expected in
maintaining uninterrupted access to both nodes' storage during takeover and giveback operations.
Throughout the takeover process, the local (or takeover) node should continue serving the data
normally provided by the partner node. During giveback, control and delivery of the partner's storage
should return to the partner node.
Steps

1. Check the cabling on the HA interconnect cables to make sure that they are secure.
2. Verify that you can create and retrieve files on both nodes for each licensed protocol.
3. Enter the following command:
storage failover takeover -ofnode partner_node

See the man page for command details.


4. Enter either of the following commands to confirm that takeover occurred:
storage failover show-takeover
storage failover show
Example

If you have the storage failover command's -auto-giveback option enabled:

cluster::> storage failover show
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -----------------------------------
node1          node2          false    In takeover, Auto giveback will be
                                       initiated in number of seconds seconds
node2          node1          -        Waiting for giveback

Example

If you have the storage failover command's -auto-giveback option disabled:

cluster::> storage failover show
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -----------------------------------
node1          node2          false    In takeover.
node2          node1          -        Waiting for giveback

5. Enter the following command to display all the disks that belong to the partner node (Node2) that
the takeover node (Node1) can detect:
storage disk show -home node2 -ownership

The following command displays all disks belonging to Node2 that Node1 can detect:



cluster::> storage disk show -home node2 -ownership
Disk   Aggregate Home  Owner DR Home Home ID    Owner ID   DR Home ID Reserver   Pool
------ --------- ----- ----- ------- ---------- ---------- ---------- ---------- -----
1.0.2  -         node2 node2 -       4078312453 4078312453 -          4078312452 Pool0
1.0.3  -         node2 node2 -       4078312453 4078312453 -          4078312452 Pool0
...

6. Enter the following command to confirm that the takeover node (Node1) controls the partner
node's (Node2) aggregates:
aggr show -fields home-id,home-name,is-home

cluster::> aggr show -fields home-id,home-name,is-home
aggregate home-id    home-name is-home
--------- ---------- --------- -------
aggr0_1   2014942045 node1     true
aggr0_2   4078312453 node2     false
aggr1_1   2014942045 node1     true
aggr1_2   4078312453 node2     false
4 entries were displayed.

During takeover, the is-home value of the partner node's aggregates is false.
7. Give back the partner node's data service after it displays the Waiting for giveback message
by entering the following command:
storage failover giveback -ofnode partner_node

8. Enter either of the following commands to observe the progress of the giveback operation:
storage failover show-giveback
storage failover show

9. Proceed depending on whether you saw the message that giveback was completed successfully:
If takeover and giveback...   Then...
Is completed successfully     Repeat Step 2 through Step 8 on the partner node.
Fails                         Correct the takeover or giveback failure and then repeat this procedure.

Related references

Description of node states displayed by storage failover show-type commands on page 88


Monitoring an HA pair
You can use a variety of commands to monitor the status of the HA pair. If a takeover occurs, you
can also determine what caused the takeover.

Commands for monitoring an HA pair


There are specific Data ONTAP commands for monitoring the HA pair.
If you want to check...   Use this command...

Whether failover is enabled or has occurred, or reasons why failover is not currently possible:
    storage failover show

The nodes on which the storage failover HA-mode setting is enabled. You must set the value to ha for the node to participate in a storage failover (HA pair) configuration. The non-ha value is used only in a stand-alone or single-node cluster configuration:
    storage failover show -mode ha

Whether hardware-assisted takeover is enabled:
    storage failover hwassist show

The history of hardware-assisted takeover events that have occurred:
    storage failover hwassist stats show

The progress of a takeover operation as the partner's aggregates are moved to the node doing the takeover:
    storage failover show-takeover

The progress of a giveback operation in returning aggregates to the partner node:
    storage failover show-giveback

Whether an aggregate is home during takeover or giveback operations:
    aggregate show -fields home-id,owner-id,home-name,owner-name,is-home

Whether cluster HA is enabled (applies only to two-node clusters):
    cluster ha show

The HA state of the components of an HA pair (on systems that use the HA state):
    ha-config show
    Note: This is a Maintenance mode command.
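Example
A quick sketch of a routine health check on the HA pair using the commands above; the output is omitted here:

cluster::> storage failover show
cluster::> storage failover hwassist show
cluster::> cluster ha show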



Related tasks

Enabling cluster HA and switchless-cluster in a two-node cluster on page 75


Related information

Clustered Data ONTAP 8.3 man page: storage failover show - Display storage failover status
Clustered Data ONTAP 8.3 man page: storage failover hwassist show - Display hwassist status
Clustered Data ONTAP 8.3 man page: storage failover hwassist stats show - Display hwassist
statistics
Clustered Data ONTAP 8.3 man page: storage failover show-takeover - Display takeover status
Clustered Data ONTAP 8.3 man page: storage failover show-giveback - Display giveback status
Clustered Data ONTAP 8.3 man page: cluster ha show - Show high-availability configuration
status for the cluster
Clustered Data ONTAP 8.3 man page: storage aggregate show - Display a list of aggregates

Description of node states displayed by storage failover show-type commands
You can use the storage failover show, storage failover show-takeover, and storage
failover show-giveback commands to check the status of the HA pair and to troubleshoot
issues.
The following table shows the node states that the storage failover show command displays:
State: Connected to partner_name.
Meaning: The HA interconnect is active and can transmit data to the partner node.

State: Connected to partner_name, Partial giveback.
Meaning: The HA interconnect is active and can transmit data to the partner node. The previous giveback to the partner node was a partial giveback, or is incomplete.

State: Connected to partner_name, Takeover of partner_name is not possible due to reason(s): reason1, reason2,....
Meaning: The HA interconnect is active and can transmit data to the partner node, but takeover of the partner node is not possible. A detailed list of reasons explaining why takeover is not possible is provided in the section following this table.

State: Connected to partner_name, Partial giveback, Takeover of partner_name is not possible due to reason(s): reason1, reason2,....
Meaning: The HA interconnect is active and can transmit data to the partner node, but takeover of the partner node is not possible. The previous giveback to the partner was a partial giveback.

State: Connected to partner_name, Waiting for cluster applications to come online on the local node.
Meaning: The HA interconnect is active and can transmit data to the partner node and is waiting for cluster applications to come online. This waiting period can last several minutes.

State: Waiting for partner_name, Takeover of partner_name is not possible due to reason(s): reason1, reason2,....
Meaning: The local node cannot exchange information with the partner node over the HA interconnect. Reasons for takeover not being possible are displayed under reason1, reason2,....

State: Waiting for partner_name, Partial giveback, Takeover of partner_name is not possible due to reason(s): reason1, reason2,....
Meaning: The local node cannot exchange information with the partner node over the HA interconnect. The previous giveback to the partner was a partial giveback. Reasons for takeover not being possible are displayed under reason1, reason2,....

State: Pending shutdown.
Meaning: The local node is shutting down. Takeover and giveback operations are disabled.

State: In takeover.
Meaning: The local node is in takeover state and automatic giveback is disabled.

State: In takeover, Auto giveback will be initiated in number of seconds seconds.
Meaning: The local node is in takeover state and automatic giveback will begin in number of seconds seconds.

State: In takeover, Auto giveback deferred.
Meaning: The local node is in takeover state and an automatic giveback attempt failed because the partner node was not in waiting for giveback state.

State: Giveback in progress, module module name.
Meaning: The local node is in the process of giveback to the partner node. Module module name is being given back. Run the storage failover show-giveback command for more information.

State: Normal giveback not possible: partner missing file system disks.
Meaning: The partner node is missing some of its own file system disks.

State: Retrieving disk information. Wait a few minutes for the operation to complete, then try giveback.
Meaning: The partner and takeover nodes have not yet exchanged disk inventory information. This state clears automatically.

State: Connected to partner_name, Takeover is not possible: Local node missing partner disks
Meaning: After a takeover or giveback operation (or in the case of MetroCluster, a disaster recovery operation including switchover, healing, or switchback), you might see disk inventory mismatch messages. If this is the case, you should wait at least five minutes for the condition to resolve before retrying the operation. If the condition persists, investigate possible disk or cabling issues.

State: Connected to partner, Takeover is not possible: Storage failover mailbox disk state is invalid, Local node has encountered errors while reading the storage failover partner's mailbox disks. Local node missing partner disks
Meaning: After a takeover or giveback operation (or in the case of MetroCluster, a disaster recovery operation including switchover, healing, or switchback), you might see disk inventory mismatch messages. If this is the case, you should wait at least five minutes for the condition to resolve before retrying the operation. If the condition persists, investigate possible disk or cabling issues.

State: Previous giveback failed in module module name.
Meaning: Giveback to the partner node by the local node failed due to an issue in module name. Run the storage failover show-giveback command for more information.

State: Previous giveback failed. Auto giveback disabled due to exceeding retry counts.
Meaning: Giveback to the partner node by the local node failed. Automatic giveback is disabled because of excessive retry attempts.

State: Takeover scheduled in seconds seconds.
Meaning: Takeover of the partner node by the local node is scheduled due to the partner node shutting down or an operator-initiated takeover from the local node. The takeover will be initiated within the specified number of seconds.

State: Takeover in progress, module module name.
Meaning: The local node is in the process of taking over the partner node. Module module name is being taken over.

State: Takeover in progress.
Meaning: The local node is in the process of taking over the partner node.

State: firmware-status.
Meaning: The node is not reachable and the system is trying to determine its status from firmware updates to its partner. A detailed list of possible firmware statuses is provided after this table.

State: Node unreachable.
Meaning: The node is unreachable and its firmware status cannot be determined.

State: Takeover failed, reason: reason.
Meaning: Takeover of the partner node by the local node failed due to reason reason.

State: Previous giveback failed in module: module name. Auto giveback disabled due to exceeding retry counts.
Meaning: Previously attempted giveback failed in module module name. Automatic giveback is disabled. Run the storage failover show-giveback command for more information.

State: Previous giveback failed in module: module name.
Meaning: Previously attempted giveback failed in module module name. Automatic giveback is not enabled by the user. Run the storage failover show-giveback command for more information.

State: Connected to partner_name, Giveback of one or more SFO aggregates failed.
Meaning: The HA interconnect is active and can transmit data to the partner node. Giveback of one or more SFO aggregates failed and the node is in partial giveback state.

State: Waiting for partner_name, Partial giveback, Giveback of one or more SFO aggregates failed.
Meaning: The local node cannot exchange information with the partner node over the HA interconnect. Giveback of one or more SFO aggregates failed and the node is in partial giveback state.

State: Connected to partner_name, Giveback of SFO aggregates in progress.
Meaning: The HA interconnect is active and can transmit data to the partner node. Giveback of SFO aggregates is in progress. Run the storage failover show-giveback command for more information.

State: Waiting for partner_name, Giveback of SFO aggregates in progress.
Meaning: The local node cannot exchange information with the partner node over the HA interconnect. Giveback of SFO aggregates is in progress. Run the storage failover show-giveback command for more information.

State: Waiting for partner_name. Node owns aggregates belonging to another node in the cluster.
Meaning: The local node cannot exchange information with the partner node over the HA interconnect, and owns aggregates that belong to the partner node.

State: Connected to partner_name, Giveback of partner spare disks pending.
Meaning: The HA interconnect is active and can transmit data to the partner node. Giveback of SFO aggregates to the partner is done, but partner spare disks are still owned by the local node. Run the storage failover show-giveback command for more information.

State: Connected to partner_name, Automatic takeover disabled.
Meaning: The HA interconnect is active and can transmit data to the partner node. Automatic takeover of the partner is disabled.

State: Waiting for partner_name, Giveback of partner spare disks pending.
Meaning: The local node cannot exchange information with the partner node over the HA interconnect. Giveback of SFO aggregates to the partner is done, but partner spare disks are still owned by the local node. Run the storage failover show-giveback command for more information.

State: Waiting for partner_name. Waiting for partner lock synchronization.
Meaning: The local node cannot exchange information with the partner node over the HA interconnect, and is waiting for partner lock synchronization to occur.

State: Waiting for partner_name. Waiting for cluster applications to come online on the local node.
Meaning: The local node cannot exchange information with the partner node over the HA interconnect, and is waiting for cluster applications to come online.

State: Takeover scheduled. target node relocating its SFO aggregates in preparation of takeover.
Meaning: Takeover processing has started. The target node is relocating ownership of its SFO aggregates in preparation for takeover.

State: Takeover scheduled. target node has relocated its SFO aggregates in preparation of takeover.
Meaning: Takeover processing has started. The target node has relocated ownership of its SFO aggregates in preparation for takeover.

State: Takeover scheduled. Waiting to disable background disk firmware updates on local node. A firmware update is in progress on the node.
Meaning: Takeover processing has started. The system is waiting for background disk firmware update operations on the local node to complete.

State: Relocating SFO aggregates to taking over node in preparation of takeover.
Meaning: The local node is relocating ownership of its SFO aggregates to the taking-over node in preparation for takeover.

State: Relocated SFO aggregates to taking over node. Waiting for taking over node to takeover.
Meaning: Relocation of ownership of SFO aggregates from the local node to the taking-over node has completed. The system is waiting for takeover by the taking-over node.

State: Relocating SFO aggregates to partner_name. Waiting to disable background disk firmware updates on the local node. A firmware update is in progress on the node.
Meaning: Relocation of ownership of SFO aggregates from the local node to the taking-over node is in progress. The system is waiting for background disk firmware update operations on the local node to complete.

State: Relocating SFO aggregates to partner_name. Waiting to disable background disk firmware updates on partner_name. A firmware update is in progress on the node.
Meaning: Relocation of ownership of SFO aggregates from the local node to the taking-over node is in progress. The system is waiting for background disk firmware update operations on the partner node to complete.

State: Connected to partner_name. Previous takeover attempt was aborted because reason. Local node owns some of partner's SFO aggregates. Reissue a takeover of the partner with the "bypass-optimization" parameter set to true to takeover remaining aggregates, or issue a giveback of the partner to return the relocated aggregates.
Meaning: The HA interconnect is active and can transmit data to the partner node. The previous takeover attempt was aborted because of the reason displayed under reason. The local node owns some of its partner's SFO aggregates. Either reissue a takeover of the partner node, setting the -bypass-optimization parameter to true to take over the remaining SFO aggregates, or perform a giveback of the partner to return relocated aggregates.

State: Connected to partner_name. Previous takeover attempt was aborted. Local node owns some of partner's SFO aggregates. Reissue a takeover of the partner with the "bypass-optimization" parameter set to true to takeover remaining aggregates, or issue a giveback of the partner to return the relocated aggregates.
Meaning: The HA interconnect is active and can transmit data to the partner node. The previous takeover attempt was aborted. The local node owns some of its partner's SFO aggregates. Either reissue a takeover of the partner node, setting the -bypass-optimization parameter to true to take over the remaining SFO aggregates, or perform a giveback of the partner to return relocated aggregates.

State: Waiting for partner_name. Previous takeover attempt was aborted because reason. Local node owns some of partner's SFO aggregates. Reissue a takeover of the partner with the "bypass-optimization" parameter set to true to takeover remaining aggregates, or issue a giveback of the partner to return the relocated aggregates.
Meaning: The local node cannot exchange information with the partner node over the HA interconnect. The previous takeover attempt was aborted because of the reason displayed under reason. The local node owns some of its partner's SFO aggregates. Either reissue a takeover of the partner node, setting the -bypass-optimization parameter to true to take over the remaining SFO aggregates, or perform a giveback of the partner to return relocated aggregates.

State: Waiting for partner_name. Previous takeover attempt was aborted. Local node owns some of partner's SFO aggregates. Reissue a takeover of the partner with the "bypass-optimization" parameter set to true to takeover remaining aggregates, or issue a giveback of the partner to return the relocated aggregates.
Meaning: The local node cannot exchange information with the partner node over the HA interconnect. The previous takeover attempt was aborted. The local node owns some of its partner's SFO aggregates. Either reissue a takeover of the partner node, setting the -bypass-optimization parameter to true to take over the remaining SFO aggregates, or perform a giveback of the partner to return relocated aggregates.

State: Connected to partner_name. Previous takeover attempt was aborted because failed to disable background disk firmware update (BDFU) on local node.
Meaning: The HA interconnect is active and can transmit data to the partner node. The previous takeover attempt was aborted because the background disk firmware update on the local node was not disabled.

State: Connected to partner_name. Previous takeover attempt was aborted because reason.
Meaning: The HA interconnect is active and can transmit data to the partner node. The previous takeover attempt was aborted because of the reason displayed under reason.

State: Waiting for partner_name. Previous takeover attempt was aborted because reason.
Meaning: The local node cannot exchange information with the partner node over the HA interconnect. The previous takeover attempt was aborted because of the reason displayed under reason.

State: Connected to partner_name. Previous takeover attempt by partner_name was aborted because reason.
Meaning: The HA interconnect is active and can transmit data to the partner node. The previous takeover attempt by the partner node was aborted because of the reason displayed under reason.

State: Connected to partner_name. Previous takeover attempt by partner_name was aborted.
Meaning: The HA interconnect is active and can transmit data to the partner node. The previous takeover attempt by the partner node was aborted.

State: Waiting for partner_name. Previous takeover attempt by partner_name was aborted because reason.
Meaning: The local node cannot exchange information with the partner node over the HA interconnect. The previous takeover attempt by the partner node was aborted because of the reason displayed under reason.

State: Previous giveback failed in module: module name. Auto giveback will be initiated in number of seconds seconds.
Meaning: The previous giveback attempt failed in module module name. Auto giveback will be initiated in number of seconds seconds. Run the storage failover show-giveback command for more information.

State: Node owns partner's aggregates as part of the non-disruptive controller upgrade procedure.
Meaning: The node owns its partner's aggregates due to the non-disruptive controller upgrade procedure currently in progress.

State: Connected to partner_name. Node owns aggregates belonging to another node in the cluster.
Meaning: The HA interconnect is active and can transmit data to the partner node. The node owns aggregates belonging to another node in the cluster.

State: Connected to partner_name. Waiting for partner lock synchronization.
Meaning: The HA interconnect is active and can transmit data to the partner node. The system is waiting for partner lock synchronization to complete.

State: Connected to partner_name. Waiting for cluster applications to come online on the local node.
Meaning: The HA interconnect is active and can transmit data to the partner node. The system is waiting for cluster applications to come online on the local node.

State: Non-HA mode, reboot to use full NVRAM.
Meaning: Storage failover is not possible. The HA mode option is configured as non_ha. You must reboot the node to use all of its NVRAM.

State: Non-HA mode, remove HA interconnect card from HA slot to use full NVRAM.
Meaning: Storage failover is not possible. The HA mode option is configured as non_ha. You must move the HA interconnect card from the HA slot to use all of the node's NVRAM.

State: Non-HA mode, remove partner system to use full NVRAM.
Meaning: Storage failover is not possible. You must remove the partner controller from the chassis to use all of the node's NVRAM.

State: Non-HA mode. See documentation for procedure to activate HA.
Meaning: Storage failover is not possible. The HA mode option is configured as non_ha. You must run the storage failover modify -mode ha -node nodename command on both nodes in the HA pair and then reboot the nodes to enable HA capability.

State: Non-HA mode. Reboot node to activate HA.
Meaning: Storage failover is not possible. The HA mode option is configured as non_ha. The node must be rebooted to enable HA capability.

Possible reasons automatic takeover is not possible


If automatic takeover is not possible, the reasons are displayed in the storage failover show
command output. The output has the following form:
Takeover of partner_name is not possible due to reason(s): reason1,
reason2, ...

Possible values for reason are as follows:
• Automatic takeover is disabled
• Disk shelf is too hot
• Disk inventory not exchanged
• Failover partner node is booting
• Failover partner node is performing software revert
• Local node about to halt
• Local node has encountered errors while reading the storage failover partner's mailbox disks
• Local node is already in takeover state
• Local node is performing software revert
• Local node missing partner disks
• Low memory condition
• NVRAM log not synchronized
• Storage failover interconnect error
• Storage failover is disabled
• Storage failover is disabled on the partner node
• Storage failover is not initialized
• Storage failover mailbox disk state is invalid
• Storage failover mailbox disk state is uninitialized
• Storage failover mailbox version mismatch
• Takeover disabled by operator
• The size of NVRAM on each node of the HA pair is different
• The version of software running on each node of the HA pair is incompatible
• Partner node attempting to take over this node
• Partner node halted after disabling takeover
• Takeover disallowed due to unknown reason
• Waiting for partner node to recover



Possible firmware states
• Boot failed
• Booting
• Dumping core
• Dumping sparecore and ready to be taken-over
• Halted
• In power-on self test
• In takeover
• Initializing
• Operator completed
• Rebooting
• Takeover disabled
• Unknown
• Up
• Waiting
• Waiting for cluster applications to come online on the local node
• Waiting for giveback
• Waiting for operator input

Related references

Commands for setting the HA mode on page 74


Related information

Clustered Data ONTAP 8.3 man page: storage failover show - Display storage failover status
Clustered Data ONTAP 8.3 man page: storage failover show-takeover - Display takeover status
Clustered Data ONTAP 8.3 man page: storage failover show-giveback - Display giveback status


Halting or rebooting a node without initiating takeover
You can prevent an automatic storage failover takeover when you halt or reboot a node. This ability
enables specific maintenance and reconfiguration operations.

Commands for halting or rebooting a node without initiating takeover
Inhibiting automatic storage failover takeover when halting or rebooting a node requires specific
commands. If you have a two-node cluster, you must perform additional steps to ensure continuity of
service.
To prevent the partner from taking over when you...   Use this command...

Halt the node:
    system node halt -node node -inhibit-takeover true
    If you have a two-node cluster, this command causes all data LIFs in the cluster to go offline unless you first disable cluster HA and then assign epsilon to the node that you intend to keep online.

Reboot the node (including the -inhibit-takeover parameter overrides the takeover-on-reboot setting of the partner node to prevent it from initiating takeover):
    system node reboot -node node -inhibit-takeover true
    If you have a two-node cluster, this command causes all data LIFs in the cluster to go offline unless you first disable cluster HA and then assign epsilon to the node that you intend to keep online.

Reboot the node (by default, a node automatically takes over for its partner if the partner reboots; you can change the -onreboot parameter of the storage failover command to change this behavior):
    storage failover modify -node node -onreboot false
    Takeover can still occur if the partner exceeds the user-configurable expected time to reboot, even when the -onreboot parameter is set to false.
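Example
A minimal sketch of halting a node for maintenance without triggering a takeover by its partner; node1 is a placeholder node name:

cluster::> system node halt -node node1 -inhibit-takeover true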



Related tasks

Halting or rebooting a node without initiating takeover in a two-node cluster on page 100
Related information

Clustered Data ONTAP 8.3 man page: system node reboot - Reboot a node
Clustered Data ONTAP 8.3 man page: storage failover modify - Modify storage failover attributes

Halting or rebooting a node without initiating takeover in a two-node cluster
In a two-node cluster, cluster HA ensures that the failure of one node does not disable the cluster. If
you halt or reboot a node in a two-node cluster without takeover by using the -inhibit-takeover
true parameter, both nodes will stop serving data unless you change specific configuration settings.
About this task

Before a node in a cluster configured for cluster HA is rebooted or halted using the
-inhibit-takeover true parameter, you must first disable cluster HA and then assign epsilon to
the node that you want to remain online.


Steps

1. Enter the following command to disable cluster HA:


cluster ha modify -configured false

Note that this operation does not disable storage failover.


2. Because disabling cluster HA automatically assigns epsilon to one of the two nodes, you must
determine which node holds it, and if necessary, reassign it to the node that you wish to remain
online.
a. Enter the following command to change to the advanced privilege level:
set -privilege advanced

Confirm when prompted to continue into advanced mode. The advanced mode prompt appears
(*>).
b. Determine which node holds epsilon by using the following command:
cluster show

In the following example, Node1 holds epsilon:



cluster::*> cluster show
Node                 Health  Eligibility  Epsilon
-------------------- ------- ------------ ------------
Node1                true    true         true
Node2                true    true         false

If the node you wish to halt or reboot does not hold epsilon, proceed to step 3.
c. If the node you wish to halt or reboot holds epsilon, you must remove it from the node by
using the following command:
cluster modify -node Node1 -epsilon false

At this point, neither node holds epsilon.


d. Assign epsilon to the node that you wish to remain online (in this example, Node2) by using
the following command:
cluster modify -node Node2 -epsilon true

3. Halt or reboot and inhibit takeover of the node that does not hold epsilon (in this example, Node1)
by using either of the following commands as appropriate:
system node halt -node Node1 -inhibit-takeover true
system node reboot -node Node1 -inhibit-takeover true

4. After the halted or rebooted node is back online, you must enable cluster HA by using the
following command:
cluster ha modify -configured true

Enabling cluster HA automatically removes epsilon from both nodes.


5. Enter the following command to return to the admin privilege level:
set -privilege admin
Related tasks

Moving epsilon for certain manually initiated takeovers on page 104


About manual takeover


You can perform a takeover manually when maintenance is required on the partner, and in other
similar situations. Depending on the state of the partner, the command you use to perform the
takeover varies.

Commands for performing and monitoring manual takeovers

You can manually initiate the takeover of a node in an HA pair to perform maintenance on that node
while it is still serving the data on its disks, array LUNs, or both to users.
The following table lists and describes the commands you can use when initiating a takeover:

If you want to...                                             Use this command...

Take over the partner node
    storage failover takeover

Monitor the progress of the takeover as the partner's aggregates are moved to the node doing the takeover
    storage failover show-takeover

Display the storage failover status for all nodes in the cluster
    storage failover show

Take over the partner node without migrating LIFs
    storage failover takeover -skip-lif-migration-before-takeover true

Take over the partner node even if there is a disk mismatch
    storage failover takeover -allow-disk-inventory-mismatch

Take over the partner node even if there is a Data ONTAP version mismatch
    storage failover takeover -option allow-version-mismatch
    Note: This option is only used during the nondisruptive Data ONTAP upgrade process.

Take over the partner node without performing aggregate relocation
    storage failover takeover -bypass-optimization true

Take over the partner node before the partner has time to close its storage resources gracefully
    storage failover takeover -option immediate



Note: Before you issue the storage failover command with the immediate option, you must
migrate the data LIFs to another node by using the following command:
network interface migrate-all -node node

If you specify the storage failover takeover -option immediate command without
first migrating the data LIFs, data LIF migration from the node is significantly delayed even if
the -skip-lif-migration-before-takeover option is not specified.

Similarly, if you specify the immediate option, negotiated takeover optimization is bypassed
even if the -bypass-optimization option is set to false.

Attention: For All-Flash Optimized FAS80xx series systems, both nodes in the HA pair must have
the All-Flash Optimized personality enabled. In an HA pair with an All-Flash Optimized
personality configuration mismatch, a storage failover takeover of aggregates with HDDs will take
the HDD aggregates offline. Additionally, even forced storage failover takeover operations might
not bring HDD aggregates online.
Because the All-Flash Optimized configuration supports only SSDs, if one node in the HA pair has
HDDs or array LUNs (and therefore is not configured with the All-Flash Optimized personality),
the following conditions apply if you attempt a storage failover takeover of that node by the node
that has the All-Flash Optimized personality enabled:

• Graceful storage failover takeover fails.
  Do not attempt storage failover takeover unless you first correct the All-Flash Optimized personality mismatch.

• Storage failover takeover using the -allow-disk-inventory-mismatch true parameter might succeed, but fail to bring online aggregates with HDDs.
  If you specify this parameter, negotiated takeover optimization is bypassed even if the -bypass-optimization parameter is set to false. Using this parameter can result in client outage.

• Storage failover takeover using the immediate option succeeds.

• Storage failover takeover using the force option succeeds.
  If you specify this option, negotiated takeover optimization is bypassed even if the -bypass-optimization option is set to false.
  Attention: Using this option can result in data loss. If the HA interconnect is detached or inactive, or the contents of the failover partner's NVRAM cards are unsynchronized, takeover is normally disabled. Using the force option enables a node to take over its partner's storage despite the unsynchronized NVRAM, which can contain client data that can be lost upon storage failover takeover.
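As an illustration, a routine planned takeover for maintenance might look like the following sketch, assuming a hypothetical node node1 whose storage is being taken over by its HA partner; the actual output depends on your configuration:

cluster::> storage failover show
cluster::> storage failover takeover -ofnode node1
cluster::> storage failover show-takeover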



Related information

Clustered Data ONTAP 8.3 man page: storage failover takeover - Take over the storage of a node's
partner
Clustered Data ONTAP 8.3 man page: storage failover show-takeover - Display takeover status
Clustered Data ONTAP 8.3 man page: storage failover show - Display storage failover status
Clustered Data ONTAP 8.3 man page: network interface migrate-all - Migrate all data and cluster
management logical interfaces away from the specified node
Clustered Data ONTAP 8.3 Physical Storage Management Guide

Moving epsilon for certain manually initiated takeovers


You should move epsilon if you expect any manually initiated takeovers could result in your storage
system being one unexpected node failure away from a cluster-wide loss of quorum.
About this task

To perform planned maintenance, you must take over one of the nodes in an HA pair. Cluster-wide
quorum must be maintained to prevent unplanned client data disruptions for the remaining nodes. In
some instances, performing the takeover can result in a cluster that is one unexpected node failure
away from cluster-wide loss of quorum.
This can occur if the node being taken over holds epsilon or if the node with epsilon is not healthy.
To maintain a more resilient cluster, you can transfer epsilon to a healthy node that is not being taken
over. Typically, this would be the HA partner.
Only healthy and eligible nodes participate in quorum voting. To maintain cluster-wide quorum,
more than N/2 votes are required (where N represents the sum of healthy, eligible, online nodes). In
clusters with an even number of online nodes, epsilon adds additional voting weight toward
maintaining quorum for the node to which it is assigned.
Note: Although cluster formation voting can be modified by using the cluster modify
-eligibility false command, you should avoid this except for situations such as restoring the

node configuration or prolonged node maintenance. If you set a node as ineligible, it stops serving
SAN data until the node is reset to eligible and rebooted. NAS data access to the node might also
be affected when the node is ineligible.
For further information on cluster administration, quorum and epsilon, see the document library on
the NetApp support site at mysupport.netapp.com/documentation/productsatoz/index.html.

Clustered Data ONTAP 8.3 System Administration Guide for Cluster Administrators
Steps

1. Verify the cluster state and confirm that epsilon is held by a healthy node that is not being taken
over.



a. Enter the following command to change to the advanced privilege level:
set -privilege advanced

Confirm you want to continue when the advanced mode prompt appears (*>).
b. Determine which node holds epsilon by using the following command:
cluster show

In the following example, Node1 holds epsilon:


cluster::*> cluster show
Node                 Health  Eligibility  Epsilon
-------------------- ------- ------------ -------
Node1                true    true         true
Node2                true    true         false

If the node you want to take over does not hold epsilon, proceed to Step 4.
2. Enter the following command to remove epsilon from the node that you want to take over:
cluster modify -node Node1 -epsilon false

3. Assign epsilon to the partner node (in this example, Node2) by using the following command:
cluster modify -node Node2 -epsilon true

4. Perform the takeover operation using the following command:


storage failover takeover -ofnode node

5. Enter the following command to return to the admin privilege level:


set -privilege admin
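For reference, the complete sequence might look like the following sketch, assuming hypothetical nodes Node1 (the node being taken over) and Node2 (its HA partner, which receives epsilon):

cluster::> set -privilege advanced
cluster::*> cluster show
cluster::*> cluster modify -node Node1 -epsilon false
cluster::*> cluster modify -node Node2 -epsilon true
cluster::*> storage failover takeover -ofnode Node1
cluster::*> set -privilege admin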
Related tasks

Halting or rebooting a node without initiating takeover in a two-node cluster on page 100
Related references

Halting or rebooting a node without initiating takeover on page 99


About manual giveback


You can perform a normal giveback, a giveback in which you terminate processes on the partner
node, or a forced giveback.
Note: Prior to performing a giveback, you must remove the failed drives in the taken-over system
as described in the Clustered Data ONTAP Physical Storage Management Guide.

If giveback is interrupted
If the takeover node experiences a failure or a power outage during the giveback process, that process
stops and the takeover node returns to takeover mode until the failure is repaired or the power is
restored.
However, this depends on the stage of giveback in which the failure occurred. If the node
encounters a failure or a power outage during the partial-giveback state (after it has given back the root
aggregate), it does not return to takeover mode. Instead, the node returns to partial-giveback mode. If
this occurs, complete the process by repeating the giveback operation.

If giveback is vetoed
If giveback is vetoed, you must check the EMS messages to determine the cause. Depending on the
reason or reasons, you can decide whether you can safely override the vetoes.
The storage failover show-giveback command displays the giveback progress and shows
which subsystem vetoed the giveback, if any. Soft vetoes can be overridden, while hard vetoes cannot
be, even if forced. The following tables summarize the soft vetoes that should not be overridden,
along with recommended workarounds.
You can review the EMS details for any giveback vetoes by using the following command:
event log show -node * -event gb*
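For example, you might review the giveback progress and the related EMS events together before deciding whether a veto can safely be overridden; this is a sketch using the commands shown above:

cluster::> storage failover show-giveback
cluster::> event log show -node * -event gb*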

Giveback of the root aggregate

These vetoes do not apply to aggregate relocation operations:

Vetoing subsystem module        Workaround

vfiler_low_level
    Terminate the CIFS sessions causing the veto, or shut down the CIFS application that established the open sessions.
    Overriding this veto might cause the application using CIFS to disconnect abruptly and lose data.

Disk Check
    All failed or bypassed disks should be removed before attempting giveback.
    If disks are sanitizing, you should wait until the operation completes.
    Overriding this veto might cause an outage caused by aggregates or volumes going offline due to reservation conflicts or inaccessible disks.

Giveback of SFO aggregates

Vetoing subsystem module        Workaround

Lock Manager
    Gracefully shut down the CIFS applications that have open files, or move those volumes to a different aggregate.
    Overriding this veto results in loss of CIFS lock state, causing disruption and data loss.

Lock Manager NDO
    Wait until the locks are mirrored.
    Overriding this veto causes disruption to Microsoft Hyper-V virtual machines.

RAID
    Check the EMS messages to determine the cause of the veto:
    • If the veto is due to nvfile, bring the offline volumes and aggregates online.
    • If disk add or disk ownership reassignment operations are in progress, wait until they complete.
    • If the veto is due to an aggregate name or UUID conflict, troubleshoot and resolve the issue.
    • If the veto is due to mirror resync, mirror verify, or offline disks, the veto can be overridden and the operation restarts after giveback.

Disk Inventory
    Troubleshoot to identify and resolve the cause of the problem.
    The destination node might be unable to see disks belonging to an aggregate being migrated.
    Inaccessible disks can result in inaccessible aggregates or volumes.

SnapMirror
    Troubleshoot to identify and resolve the cause of the problem.
    This veto is due to failure to send an appropriate message to SnapMirror, preventing SnapMirror from shutting down.

Related references

Description of node states displayed by storage failover show-type commands on page 88


Related information

Clustered Data ONTAP 8.3 man page: storage failover show-giveback - Display giveback status

Commands for performing a manual giveback


You can manually initiate a giveback on a node in an HA pair to return storage to the original owner
after completing maintenance or resolving any issues that caused the takeover.
If you want to...                                             Use this command...

Give back storage to a partner node
    storage failover giveback -ofnode nodename

Give back storage even if the partner is not in the waiting for giveback mode
    storage failover giveback -ofnode nodename -require-partner-waiting false
    Do not use this option unless a longer client outage is acceptable.

Give back storage even if processes are vetoing the giveback operation (force the giveback)
    storage failover giveback -ofnode nodename -override-vetoes true
    Use of this option can potentially lead to longer client outage, or aggregates and volumes not coming online after the giveback.

Give back only the CFO aggregates (the root aggregate)
    storage failover giveback -ofnode nodename -only-cfo-aggregates true

Monitor the progress of giveback after you issue the giveback command
    storage failover show-giveback
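A typical giveback sequence after maintenance might look like the following sketch, assuming a hypothetical node node1 whose storage was taken over by its partner:

cluster::> storage failover show
cluster::> storage failover giveback -ofnode node1
cluster::> storage failover show-giveback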



Related information

Clustered Data ONTAP 8.3 man page: storage failover giveback - Return failed-over storage to its
home node
Clustered Data ONTAP 8.3 man page: storage failover show-giveback - Display giveback status
Clustered Data ONTAP 8.3 man page: storage failover takeover - Take over the storage of a node's
partner
Clustered Data ONTAP 8.3 man page: storage failover show-takeover - Display takeover status
Clustered Data ONTAP 8.3 Command Map for 7-Mode Administrators


Managing DS14mk2 AT or DS14mk4 FC disk shelves in an HA pair

You must follow specific procedures to add disk shelves to an HA pair or to upgrade or replace disk
shelf hardware in an HA pair.
This section covers DS14mk2 AT or DS14mk4 FC disk shelves in an HA pair. Refer to the
NetApp Support Site for additional documentation if your HA pair configuration includes SAS disk
shelves.
Related information

SAS Disk Shelves Universal SAS and ACP Cabling Guide


SAS Disk Shelves Installation and Service Guide for DS4243, DS2246, DS4486, and DS4246

Adding DS14mk2 AT or DS14mk4 FC disk shelves to a multipath HA loop

To add supported DS14mk2 AT or DS14mk4 FC disk shelves to an HA pair configured for multipath
HA, you must add the new disk shelf to the end of a loop, ensuring that it is connected to the previous
disk shelf and to the controller.
About this task

This procedure does not apply to SAS disk shelves.

SAS Disk Shelves Universal SAS and ACP Cabling Guide


SAS Disk Shelves Installation and Service Guide for DS4243, DS2246, DS4486, and DS4246
Steps

1. Confirm that there are two paths to every disk by entering the following command:
storage disk show -port
Note: If two paths are not listed for every disk, this procedure could result in a data service
outage. Before proceeding, address any issues so that all paths are redundant. If you do not
have redundant paths to every disk, you can use the nondisruptive upgrade method (failover) to
add your storage.

2. Install the new disk shelf in your cabinet or equipment rack, as described in the DiskShelf14mk2
and DiskShelf14mk4 FC, or DiskShelf14mk2 AT Hardware Service Guide.



3. Find the last disk shelf in the loop to which you want to add the new disk shelf. The Channel A
Output port of the last disk shelf in the loop is connected back to one of the controllers.
Note: In Step 4 you disconnect the cable from the disk shelf. When you do this, the system

displays messages about adapter resets and eventually indicates that the loop is down. These
messages are normal within the context of this procedure. However, to avoid them, you can
optionally disable the adapter prior to disconnecting the disk shelf.
If you choose to, disable the adapter attached to the Channel A Output port of the last disk
shelf by entering the following command:
run -node nodename fcadmin config -d adapter
adapter identifies the adapter by name. For example: 0a.

4. Disconnect the SFP and cable coming from the Channel A Output port of the last disk shelf.
Note: Leave the other ends of the cable connected to the controller.

5. Using the correct cable for a shelf-to-shelf connection, connect the Channel A Output port of the
last disk shelf to the Channel A Input port of the new disk shelf.
6. Connect the cable and SFP you removed in Step 4 to the Channel A Output port of the new disk
shelf.
7. If you disabled the adapter in Step 3, reenable the adapter by entering the following command:
run -node nodename fcadmin config -e adapter

8. Repeat Step 4 through Step 7 for Channel B.


Note: The Channel B Output port is connected to the other controller.

9. Confirm that there are two paths to every disk by entering the following command:
storage disk show -port

Two paths should be listed for every disk.


Related information

SAS Disk Shelves Universal SAS and ACP Cabling Guide


SAS Disk Shelves Installation and Service Guide for DS4243, DS2246, DS4486, and DS4246


Upgrading or replacing modules in an HA pair


In an HA pair with redundant pathing, you can upgrade or replace disk shelf modules without
interrupting access to storage.
About this task

These procedures are for DS14mk2 AT or DS14mk4 FC disk shelves.


If your configuration includes SAS disk shelves, refer to the NetApp support site at
mysupport.netapp.com/documentation/productsatoz/index.html for additional documentation.

SAS Disk Shelves Universal SAS and ACP Cabling Guide

SAS Disk Shelves Installation and Service Guide for DS4243, DS2246, DS4486, and DS4246

About the disk shelf modules


A disk shelf module (ESH4 or AT-FCX) in a DS14mk2 AT or DS14mk4 FC shelf includes a SCSI-3
Enclosure Services Processor that maintains the integrity of the loop when disks are swapped and
provides signal retiming for enhanced loop stability. When upgrading or replacing a module, you
must be sure to cable the modules correctly.
The DS14mk2 AT or DS14mk4 FC disk shelves support the ESH4 or AT-FCX modules.
There are two modules in the middle of the rear of the disk shelf, one for Channel A and one for
Channel B.
Note: The Input and Output ports on module B on the DS14mk2 AT/DS14mk4 FC shelf are the
reverse of module A.

Restrictions for changing module types


If you plan to change the type of any module in your HA pair, make sure that you understand the
restrictions.
You cannot mix ESH4 modules in the same loop with AT-FCX modules.


Best practices for changing module types


If you plan to change the type of any module in your HA pair, make sure that you review the best
practice guidelines.

Whenever you remove a module from an HA pair, you need to know whether the path you will
disrupt is redundant.
If it is, you can remove the module without interfering with the storage system's ability to serve
data. However, if that module provides the only path to any disk in your HA pair, you must take
action to ensure that you do not incur system downtime.

When you replace a module, make sure that the replacement module's termination switch is in the
same position as the module it is replacing.
Note: ESH4 modules are self-terminating; this guideline does not apply to ESH4 modules.

If you replace a module with a different type of module, make sure that you also change the
cables, if necessary.
For more information about supported cable types, see the hardware documentation for your disk
shelf.

Always wait 30 seconds after inserting any module before reattaching any cables in that loop.

Testing the modules


You should test your disk shelf modules after replacing or upgrading them to ensure that they are
configured correctly and operating.
Steps

1. Verify that all disk shelves are functioning properly by entering the following command:
run -node nodename environ shelf

2. Verify that there are no missing disks by entering the following command:
run -node nodename aggr status -r

Local disks displayed on the local node should be displayed as partner disks on the partner node,
and vice-versa.
3. Verify that you can create and retrieve files on both nodes for each licensed protocol.


Determining the path status for your HA pair


If you want to remove a module from your HA pair, you need to know if the path you plan to disrupt
is redundant. You can use the storage disk show -port command to indicate whether the disks
have redundant paths.
About this task

If the disks have redundant paths, you can remove the module without interfering with the storage
system's ability to serve data. However, if that module provides the only path to any of the disks in
your HA pair, you must take action to ensure that you do not incur system downtime.
Step

1. Enter the storage disk show -port command on your system console.
This command displays the following information for every disk in the HA pair:
• Primary port
• Secondary port
• Disk type
• Disk shelf
• Bay

Examples for configurations with and without redundant paths


The following example shows what the storage disk show -port command output might
look like for a redundant-path HA pair consisting of Data ONTAP systems:
cluster::> storage disk show -port
Primary         Port Secondary       Port Type   Shelf Bay
--------------- ---- --------------- ---- ------ ----- ---
1.0.0           A    node1:0b.00.0   B    SAS    0     0
1.0.1           A    node1:0b.00.1   B    SAS    0     1
1.0.2           A    node1:0b.00.2   B    SAS    0     2
1.0.3           B    node2:0b.00.3   A    SAS    0     3
1.0.4           B    node2:0b.00.4   A    SAS    0     4
...

Notice that every disk has two active ports: one for A and one for B. The presence of the
redundant path means that you do not need to fail over a node before removing modules from
the system.



Attention: Make sure that every disk has two paths. Even in an HA pair configured for

redundant paths, a hardware or configuration problem can cause one or more disks to have
only one path. If any disk in your HA pair has only one path, you must treat that loop as if it
were in a single-path HA pair when removing modules.
The following example shows what the storage disk show -port command output might
look like for an HA pair consisting of Data ONTAP systems that do not use redundant paths:
cluster::> storage disk show -port
Primary         Port Secondary       Port Type   Shelf Bay
--------------- ---- --------------- ---- ------ ----- ---
1.0.0           A                         SAS    0     0
1.0.1           A                         SAS    0     1
1.0.2           A                         SAS    0     2
1.0.3           B                         SAS    0     3
1.0.4           B                         SAS    0     4
...

For this HA pair, there is only one path to each disk. This means that you cannot remove a
module from the configuration and disable that path without first performing a takeover.

Hot-swapping a module
You can hot-swap a faulty disk shelf module, removing the faulty module and replacing it without
disrupting data availability.
About this task

When you hot-swap a disk shelf module, you must ensure that you never disable the only path to a
disk; disabling that single path results in a system outage.
Attention: If there is newer firmware in the /mroot/etc/shelf_fw directory than that on the
replacement module, the system automatically runs a firmware update. This firmware update
causes a service interruption on non-multipath HA AT-FCX installations, multipath HA
configurations running versions of Data ONTAP prior to 7.3.1, and systems with non-RoHS
AT-FCX modules.
Steps

1. Verify that your storage system meets the minimum software requirements to support the disk
shelf modules that you are hot-swapping.
See the DiskShelf14mk2 and DiskShelf14mk4 FC, or DiskShelf14mk2 AT Hardware Service
Guide for more information.



2. Determine which loop contains the module you are removing, and determine whether any disks
are single-pathed through that loop.
3. Complete the following steps if any disks use this loop as their only path to a controller:
a. Follow the cables from the module you want to replace back to one of the nodes, called
NodeA.
b. Enter the following command at the NodeB console:
storage failover takeover -ofnode NodeA

c. Wait for takeover to be complete and make sure that the partner node, or NodeA, reboots and
is waiting for giveback.
Any module in the loop that is attached to NodeA can now be replaced.
4. Put on the antistatic wrist strap and grounding leash.
5. Disconnect the module that you are removing from the Fibre Channel cabling.
6. Using the thumb and index fingers of both hands, press the levers on the CAM mechanism on the
module to release it and pull it out of the disk shelf.
7. Slide the replacement module into the slot at the rear of the disk shelf and push the levers of the
cam mechanism into place.
Attention: Do not use excessive force when sliding the module into the disk shelf; you might

damage the connector.


Wait 30 seconds after inserting the module before proceeding to the next step.
8. Recable the disk shelf to its original location.
9. Check the operation of the new module by entering the following command from the console of
the node that is still running:
run -node nodename environ shelf

The node reports the status of the modified disk shelves.


10. Complete the following steps if you performed a takeover previously:
a. Return control of NodeA's disk shelves by entering the following command at the console of
the takeover node:
storage failover giveback -ofnode NodeA

b. Wait for the giveback to be completed before proceeding to the next step.
11. Test the replacement module.
12. Test the configuration.



Related concepts

Best practices for changing module types on page 113


Related tasks

Determining the path status for your HA pair on page 114


Hot-removing disk shelves or loops in systems running Data ONTAP 8.2.1 or later on page 118


Nondisruptive operations with HA pairs


By taking advantage of an HA pair's takeover and giveback operations, you can change hardware
components and perform software upgrades in your configuration without disrupting access to the
system's storage.
You can perform nondisruptive operations on a system by having its partner take over the system's
storage, performing maintenance, and then giving back the storage. Aggregate relocation extends the
range of nondisruptive capabilities by enabling storage controller upgrade and replacement
operations.

Where to find procedures for nondisruptive operations with HA pairs

An HA pair enables you to perform nondisruptive system maintenance and upgrade operations. You
can refer to the specific documents for the required procedures.
The following table lists where you can find information about nondisruptive operations:

If you want to perform this task nondisruptively...          See the...

Upgrade Data ONTAP
    Clustered Data ONTAP 8.3 Upgrade and Revert/Downgrade Guide

Replace a hardware FRU component
    FRU procedures for your platform
    NetApp Documentation: Product Library A-Z
    You can find a list of all FRUs for your platform in the Hardware Universe.
    NetApp Hardware Universe

Hot-removing disk shelves or loops in systems running Data ONTAP 8.2.1 or later

If your system is running Data ONTAP 8.2.1 or later, you can hot-remove disk shelves (physically
remove disk shelves that have had the aggregates removed from the disk drives) in a clustered Data
ONTAP multipath HA configuration with DS14 disk shelves that is up and serving data. You can
hot-remove one or more disk shelves from anywhere within a loop of disk shelves, or remove a loop
of disk shelves.
Before you begin

• Your system must not have SAS disk shelves.
  See the appropriate guide for clustered Data ONTAP systems with SAS disk shelves:
  SAS Disk Shelves Universal SAS and ACP Cabling Guide
  SAS Disk Shelves Installation and Service Guide for DS4243, DS2246, DS4486, and DS4246

• Your storage system must be running Data ONTAP 8.2.1 or later.

• Your storage system must be a multipath HA system.
  For FAS2240 configurations, the external storage must be cabled as multipath HA.

• You must have already removed all aggregates from the disk drives in the disk shelves you are removing.
  Attention: If you attempt this procedure with aggregates on the disk shelf you are removing, you could fail the system with a multi-disk panic.
  For information about taking an aggregate offline for clustered systems, see Commands for managing aggregates in the Clustered Data ONTAP Physical Storage Management Guide. This document is available on the NetApp Support Site at mysupport.netapp.com.

• As a best practice, you should remove disk drive ownership after you remove the aggregates from the disk drives in the disk shelves you are removing.
  Note: This procedure follows the best practice of removing disk drive ownership; therefore, steps are written with the assumption that you have removed disk drive ownership.
  The Clustered Data ONTAP Physical Storage Management Guide includes the Removing ownership from a disk procedure for removing disk drive ownership. This document is available on the NetApp Support Site at mysupport.netapp.com.
  Note: The procedure for removing ownership from disk drives requires you to disable disk autoassignment. You reenable disk autoassignment when prompted at the end of this shelf hot-remove procedure.

• Multipath HA configurations cannot be in a takeover state.

• If you are removing one or more disk shelves from within a loop, you must have factored in the distance to bypass the disk shelves you are removing; therefore, if the current cables are not long enough, you need to have longer cables available.
  The Hardware Universe at hwu.netapp.com contains information about supported cables.

About this task

• This procedure follows cabling best practices; therefore, references to modules and module input and output ports align with the best practices. If your storage system is cabled differently from what is prescribed as best practice, the modules and/or module input and output ports might be different.

• Path A refers to the A-side disk shelf module (module A) located in the top of the disk shelf.

• Path B refers to the B-side disk shelf module (module B) located in the bottom of the disk shelf.

• The first disk shelf in the loop is the disk shelf with the input ports directly connected to the controllers.

• The interim disk shelf in the loop is the disk shelf directly connected to other disk shelves in the loop.

• The last disk shelf in the loop is the disk shelf with output ports directly connected to the controllers.

• The next disk shelf is the disk shelf downstream of the disk shelf being removed, in depth order.

• The previous disk shelf is the disk shelf upstream of the disk shelf being removed, in depth order.

• Clustered Data ONTAP commands and 7-Mode commands are used; therefore, you will be entering commands from the clustershell and from the nodeshell.

Steps

1. Verify that your system configuration is Multi-Path HA by entering the following command
from the nodeshell of either controller:
sysconfig

It might take up to a minute for the system to complete discovery.


The configuration is listed in the System Storage Configuration field.
Note: For FAS2240 systems with external storage, the output is displayed as Mixed-Path HA

because the internal storage is cabled as single-path HA and the external storage is cabled as
multipath HA.
Attention: If your non-FAS2240 system is shown as something other than Multi-Path HA,
you cannot continue with this procedure. Your system must meet the prerequisites stated in the
Before you begin section of this procedure.
2. Verify that the disk drives in the disk shelves you are removing have no aggregates (are spares)
and ownership is removed, by completing the following substeps:
a. Enter the following command from the clustershell of either controller:
storage disk show -shelf shelf_number

b. Check the output to verify there are no aggregates on the disk drives in the disk shelves you
are removing.
Disk drives with no aggregates have a dash in the Aggregate column.



Attention: If disk drives in the disk shelves you are removing have aggregates, you cannot

continue with this procedure. Your system must meet the prerequisites stated in the Before
you begin section of this procedure.
c. Check the output to verify that ownership is removed from the disk drives on the disk shelves
you are removing or that the disk drives are failed.
If the output shows...                                        Then...

unassigned or broken for all disk drives
    The disk drives in the disk shelves you are removing are in the correct state.
    Go to the next step.

Any disk drives in the disk shelves you are removing have ownership
    You can use the Removing ownership from a disk procedure referenced in the Before you begin section of this procedure.

Example

The following output for the storage disk show -shelf 3 command shows disk drives on
the disk shelf being removed (disk shelf 3). All of the disk drives in disk shelf 3 have a dash in the
Aggregate column. Two disk drives have the ownership removed; therefore, unassigned
appears in the Container Type column. And two disk drives are failed; therefore, broken
appears in the Container Type column:
cluster::> storage disk show -shelf 3
           Usable                    Container   Container
Disk         Size Shelf Bay Type     Type        Name       Owner
-------- -------- ----- --- -------- ----------- ---------- --------
...
1.3.4             3     4   SAS      unassigned
1.3.5             3     5   SAS      unassigned
1.3.6             3     6   SAS      broken
1.3.7             3     7   SAS      broken
...

3. Turn on the LEDs for each disk drive in the disk shelves you are removing so that the disk shelves
are physically identifiable by completing the following substeps:
You need to be certain of which disk shelves you are removing so that you can correctly recable
path A and path B later in this procedure.
You enter the commands from the nodeshell of either controller.
a. Identify the disk drives in each disk shelf you are removing:
fcadmin device_map



Example

In this output, the shelf mapping shows three disk shelves in a loop and their respective 14
disk drives. If disk shelf 3 is being removed, disk drives 45 44 43 42 41 40 39 38 37 36 35 34
33 32 are applicable.
fas6200> fcadmin device_map
Loop Map for channel 0c:
...
Shelf mapping:
  Shelf 3:  45 44 43 42 41 40 39 38 37 36 35 34 33 32
  Shelf 4:  77 76 75 74 73 72 71 70 69 68 67 66 65 64
  Shelf 5:  93 92 91 90 89 88 87 86 85 84 83 82 81 80
...

b. Turn on the LEDs for the disk drives you identified in Substep a:
led_on disk_name

You must be in advanced privilege level to enter this command.


The fault LED on the front of the disk drive illuminates solid. Additionally, if you have any
failed disk drives in the disk shelves you are removing, the activity LED on the front of those
disk drives blinks.
It is recommended that you turn on the LED for a minimum of four disk drives so that the disk
shelves you are removing can be visually identified. You must repeat the command for each
disk drive.
Example

To turn on the fault LED for disk drive 0c.45 in disk shelf 3 identified in Substep a, you enter
led_on 0c.45

4. If you are removing an entire loop of disk shelves, complete the following substeps; otherwise, go
to the next step:
a. Remove all cables on path A and path B.
This includes controller-to-shelf cables and shelf-to-shelf cables for all disk shelves in the
loop you are removing.
b. Go to Step 8.
5. If you are removing one or more disk shelves from a loop (but keeping the loop), recable the
applicable path A loop connections to bypass the disk shelves you are removing by completing
the applicable set of substeps:
If you are removing more than one disk shelf, complete the applicable set of substeps one disk
shelf at a time.



If you need a graphical system cabling reference, use the platform specific Installation and Setup
Instructions document that ships with each platform, or access these documents on the NetApp
Support Site at mysupport.netapp.com by searching on your specific platform. For example, to
find the Installation and Setup Instructions document for FAS3200 systems, search on FAS3200
series.
If you are removing...            Then...

The first disk shelf in a loop
    a. Remove the cable connecting the module A output port of the first disk shelf and the module A input port of the second disk shelf in the loop, and set it aside.
    b. Move the cable connecting the controller to the module A input port of the first disk shelf to the module A input port of the second disk shelf in the loop.

An interim disk shelf in a loop
    a. Remove the cable connecting the module A output port of the disk shelf being removed and the module A input port of the next disk shelf in the loop, and set it aside.
    b. Move the cable connecting the module A input port of the disk shelf being removed to the module A input port of the next disk shelf in the loop.

The last disk shelf in a loop
    a. Remove the cable connecting the module A input port of the last disk shelf and the module A output port of the previous disk shelf in the loop, and set it aside.
    b. Move the cable connecting the controller to the module A output port of the last disk shelf to the module A output port of the previous disk shelf in the loop.

6. Verify that the cabling on path A has successfully bypassed the disk shelves you are removing
and all disk drives on the disk shelves you are removing are still connected through path B, by
entering the following command from the nodeshell of either controller:
storage show disk -p

It might take up to a minute for the system to complete discovery.


Example

In this example of how the output should appear, the disk shelf being removed is disk shelf 3. One
line item appears for each disk drive connected through path B (now the primary path); therefore,
the disk drives are listed in the PRIMARY column and B appears in the first PORT column. There is
no connectivity through path A for any of the disk drives in the disk shelf being removed;
therefore, no information is shown in the SECONDARY or second PORT columns:



fas6200> storage show disk -p
PRIMARY    PORT SECONDARY  PORT SHELF BAY
---------- ---- ---------- ---- ----- ---
...
0d.64      B                    3     0
0d.65      B                    3     1
0d.66      B                    3     2
0d.67      B                    3     3
0d.68      B                    3     4
0d.69      B                    3     5
0d.70      B                    3     6
0d.71      B                    3     7
...

Attention: If the output shows anything other than all the disk drives connected only through

path B, you must correct the cabling by repeating Step 5.


7. Complete the following substeps:
a. Repeat Step 5 and Step 6 for path B.
b. Repeat Step 1 to confirm that your system configuration is the same as before you began this
procedure.
c. Go to the next step.
8. If, when you removed ownership from the disk drives as part of the preparation for this
procedure, you disabled disk autoassignment, then reenable disk autoassignment by entering the
following command; otherwise, go to the next step:
storage disk option modify -autoassign on

Enter the applicable command from the clustershell of each controller.
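For example, on a two-node cluster you might run the following from the clustershell, assuming hypothetical node names node1 and node2; the -node parameter (an assumption here, not shown in the command above) scopes the option to each controller:

cluster::> storage disk option modify -node node1 -autoassign on
cluster::> storage disk option modify -node node2 -autoassign on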


9. Power off the disk shelves you disconnected and unplug the power cords from the disk shelves.
10. Remove the disk shelves from the rack or cabinet.
To make disk shelves lighter and easier to maneuver, remove the power supplies and modules.
Avoid removing the disk drives or carriers if possible, because excessive handling can lead to
internal damage.
Related concepts

Best practices for changing module types on page 113


Related tasks

Hot-swapping a module on page 115


Determining the path status for your HA pair on page 114


Relocating aggregate ownership within an HA pair


You can change the ownership of aggregates among the nodes in an HA pair without interrupting
service from the aggregates.
Both nodes in an HA pair are physically connected to each other's disks or array LUNs. Each disk or
array LUN is owned by one of the nodes. While ownership of disks temporarily changes when a
takeover occurs, the aggregate relocation operations either permanently (for example, if done for load
balancing) or temporarily (for example, if done as part of takeover) change the ownership of all disks
or array LUNs within an aggregate from one node to the other. The ownership changes without any
data-copy processes or physical movement of the disks or array LUNs.

How aggregate relocation works


Aggregate relocation takes advantage of the HA configuration to move the ownership of storage
aggregates within the HA pair. Aggregate relocation enables storage management flexibility not only
by optimizing performance during failover events, but also facilitating system operational and
maintenance capabilities that previously required controller failover.
Aggregate relocation occurs automatically during manually initiated takeovers to reduce downtime
during planned failover events such as nondisruptive software upgrades. You can manually initiate
aggregate relocation independent of failover for performance load balancing, system maintenance,
and nondisruptive controller upgrades. However, you cannot use the aggregate relocation operation to
move ownership of the root aggregate.
The following illustration shows the relocation of the ownership of aggregate aggr_1 from Node1 to
Node2 in the HA pair:
[Illustration: aggregate aggr_1 (8 disks on shelf sas_1, shaded grey) is owned by Node1 before relocation and by Node2 after relocation.]



The aggregate relocation operation can relocate the ownership of one or more SFO aggregates if the
destination node can support the number of volumes in the aggregates. There is only a brief
interruption of access to each aggregate. Ownership information is changed one by one for the
aggregates.
During takeover, aggregate relocation happens automatically after you manually initiate takeover.
Before the target controller is taken over, ownership of each of the controller's aggregates is moved,
one at a time, to the partner controller. When giveback is initiated, ownership is automatically moved
back to the original node. The -bypass-optimization parameter can be used with the storage
failover takeover command to suppress aggregate relocation during the takeover.
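For instance, to take over a partner without the automatic aggregate relocation phase, you might enter the following, assuming a hypothetical node node1:

cluster::> storage failover takeover -ofnode node1 -bypass-optimization true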
Aggregate relocation and Infinite Volumes with SnapDiff enabled
The aggregate relocation requires additional steps if the aggregate is currently used by an Infinite
Volume with SnapDiff enabled. You must ensure that the destination node has a namespace mirror
constituent, and make decisions about relocating aggregates that include namespace constituents.

Clustered Data ONTAP 8.3 Infinite Volumes Management Guide


Related concepts

HA policy and how it affects takeover and giveback operations on page 28

How root-data partitioning affects aggregate relocation


If you have a platform model that uses root-data partitioning, also called shared disks, aggregate
relocation processing occurs just as with physical (nonshared) disks.
The container disk ownership changes to the destination node during aggregate relocation only if the
operation transfers ownership of all partitions on that physical disk to the destination node. This
ownership change occurs only with permanent aggregate relocation operations.
Ownership changes that occur during negotiated storage failover takeover or giveback events are
temporary.
Related concepts

Benefits of root-data partitioning for entry-level and All Flash FAS storage systems on page 37
Related information

Clustered Data ONTAP 8.3 Physical Storage Management Guide


Relocating aggregate ownership


You can change the ownership of an aggregate only between the nodes within an HA pair.
About this task

• Because volume count limits are validated programmatically during aggregate relocation operations, it is not necessary to check for this manually.
  If the volume count exceeds the supported limit, the aggregate relocation operation fails with a relevant error message.

• You should not initiate aggregate relocation when system-level operations are in progress on either the source or the destination node; likewise, you should not start these operations during the aggregate relocation.
  These operations can include the following:
  • Takeover
  • Giveback
  • Shutdown
  • Another aggregate relocation operation
  • Disk ownership changes
  • Aggregate or volume configuration operations
  • Storage controller replacement
  • Data ONTAP upgrade
  • Data ONTAP revert

• If you have a MetroCluster configuration, you should not initiate aggregate relocation while disaster recovery operations (switchover, healing, or switchback) are in progress.

• If you have a MetroCluster configuration and initiate aggregate relocation on a switched-over aggregate, the operation might fail because it exceeds the DR partner's volume limit count.

• You should not initiate aggregate relocation on aggregates that are corrupt or undergoing maintenance.

• For All-Flash Optimized FAS80xx-series systems, both nodes in the HA pair must have the All-Flash Optimized personality enabled.
  Because the All-Flash Optimized configuration supports only SSDs, if one node in the HA pair has HDDs or array LUNs (and therefore is not configured with the All-Flash Optimized personality), you cannot perform an aggregate relocation from that node to the node with the All-Flash Optimized personality enabled.

• If the source node is used by an Infinite Volume with SnapDiff enabled, you must perform additional steps before initiating the aggregate relocation and then perform the relocation in a specific manner.
  You must ensure that the destination node has a namespace mirror constituent and make decisions about relocating aggregates that include namespace constituents.
  Clustered Data ONTAP 8.3 Infinite Volumes Management Guide

• Before initiating the aggregate relocation, you should save any core dumps on the source and destination nodes.

Steps

1. View the aggregates on the node to confirm which aggregates to move and ensure they are online
and in good condition:
storage aggregate show -node source-node
Example

The following command shows six aggregates on the four nodes in the cluster. All aggregates are
online. Node1 and Node3 form an HA pair and Node2 and Node4 form an HA pair.
cluster::> storage aggregate show
Aggregate     Size Available Used% State   #Vols Nodes  RAID Status
--------- -------- --------- ----- ------- ----- ------ -----------
aggr_0     239.0GB   11.13GB   95% online      1 node1  raid_dp,
                                                        normal
aggr_1     239.0GB   11.13GB   95% online      1 node1  raid_dp,
                                                        normal
aggr_2     239.0GB   11.13GB   95% online      1 node2  raid_dp,
                                                        normal
aggr_3     239.0GB   11.13GB   95% online      1 node2  raid_dp,
                                                        normal
aggr_4     239.0GB   238.9GB    0% online      5 node3  raid_dp,
                                                        normal
aggr_5     239.0GB   239.0GB    0% online      4 node4  raid_dp,
                                                        normal
6 entries were displayed.

2. Issue the command to start the aggregate relocation:


storage aggregate relocation start -aggregate-list aggregate-1,
aggregate-2... -node source-node -destination destination-node

The following command moves the aggregates aggr_1 and aggr_2 from Node1 to Node3. Node3
is Node1's HA partner. The aggregates can be moved only within the HA pair.



cluster::> storage aggregate relocation start -aggregate-list aggr_1,
aggr_2 -node node1 -destination node3
Run the storage aggregate relocation show command to check relocation
status.
node1::storage aggregate>

3. Monitor the progress of the aggregate relocation with the storage aggregate relocation
show command:
storage aggregate relocation show -node source-node
Example

The following command shows the progress of the aggregates that are being moved to Node3:
cluster::> storage aggregate relocation show -node node1
Source Aggregate   Destination   Relocation Status
------ ----------- ------------- --------------------------
node1  aggr_1      node3         In progress, module: wafl
       aggr_2      node3         Not attempted yet
2 entries were displayed.
node1::storage aggregate>

When the relocation is complete, the output of this command shows each aggregate with a
relocation status of Done.
Related concepts

Benefits of root-data partitioning for entry-level and All Flash FAS storage systems on page 37
Background disk firmware update and takeover, giveback, and aggregate relocation on page 29

Commands for aggregate relocation


There are specific Data ONTAP commands for relocating aggregate ownership within an HA pair.
If you want to...                                Use this command...

Start the aggregate relocation process
    storage aggregate relocation start

Monitor the aggregate relocation process
    storage aggregate relocation show

Related information

Clustered Data ONTAP 8.3 Commands: Manual Page Reference


Key parameters of the storage aggregate relocation start command

The storage aggregate relocation start command includes several key parameters used
when relocating aggregate ownership within an HA pair.

Parameter                                  Meaning

-node nodename
    Specifies the name of the node that currently owns the aggregate.

-destination nodename
    Specifies the destination node where aggregates are to be relocated.

-aggregate-list aggregate name
    Specifies the list of aggregate names to be relocated from the source node to the destination node. This parameter accepts wildcards.

-override-vetoes true|false
    Specifies whether to override any veto checks during the relocation operation.
    Use of this option can potentially lead to longer client outage, or aggregates and volumes not coming online after the operation.

-relocate-to-higher-version true|false
    Specifies whether the aggregates are to be relocated to a node that is running a higher version of Data ONTAP than the source node.
    • You cannot perform an aggregate relocation from a node running Data ONTAP 8.2 to a node running Data ONTAP 8.3 or higher using the -relocate-to-higher-version true parameter. You must first upgrade the source node to Data ONTAP 8.2.1 or higher before you can perform an aggregate relocation operation using this parameter. Similarly, you must upgrade to Data ONTAP 8.2.1 or higher before you can upgrade to Data ONTAP 8.3.
    • Although you can perform aggregate relocation between nodes running different minor versions of Data ONTAP (for example, 8.2.1 to 8.2.2, or 8.2.2 to 8.2.1), you cannot perform an aggregate relocation operation from a higher major version to a lower major version (for example, 8.3 to 8.2.2).

-override-destination-checks true|false
    Specifies if the aggregate relocation operation should override the check performed on the destination node.
    Use of this option can potentially lead to longer client outage, or aggregates and volumes not coming online after the operation.
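Putting these parameters together, a relocation that overrides soft vetoes might look like the following sketch, assuming hypothetical nodes node1 and node3 and a hypothetical aggregate aggr_1; use such overrides only after reviewing the veto reasons:

cluster::> storage aggregate relocation start -node node1 -destination node3 -aggregate-list aggr_1 -override-vetoes true
cluster::> storage aggregate relocation show -node node1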

Related information

Clustered Data ONTAP 8.3 man page: storage aggregate relocation start - Relocate aggregates to
the specified destination
Clustered Data ONTAP 8.3 Upgrade and Revert/Downgrade Guide


Veto and destination checks during aggregate relocation


In aggregate relocation operations, Data ONTAP determines whether aggregate relocation can be
completed safely. If aggregate relocation is vetoed, you must check the EMS messages to determine
the cause. Depending on the reason or reasons, you can decide whether you can safely override the
vetoes.
The storage aggregate relocation show command displays the aggregate relocation
progress and shows which subsystem, if any, vetoed the relocation. Soft vetoes can be overridden, but
hard vetoes cannot be, even if forced.
You can review the EMS details for any giveback vetoes by using the following command:
event log show -node * -event gb*

You can review the EMS details for aggregate relocation by using the following command:
event log show -node * -event arl*

The following tables summarize the soft and hard vetoes, along with recommended workarounds:
Veto checks during aggregate relocation

Vetoing subsystem module        Workaround

Vol Move
    Relocation of an aggregate is vetoed if any volumes hosted by the aggregate are participating in a volume move that has entered the cutover state.
    Wait for the volume move to complete.
    If this veto is overridden, cutover resumes automatically once the aggregate relocation completes. If aggregate relocation causes the move operation to exceed the number of retries (the default is 3), then the user needs to manually initiate cutover using the volume move trigger-cutover command.

Backup
    Relocation of an aggregate is vetoed if a dump or restore job is in progress on a volume hosted by the aggregate.
    Wait until the dump or restore operation in progress is complete.
    If this veto is overridden, the backup or restore operation is aborted and must be restarted by the backup application.

Lock Manager
    To resolve the issue, gracefully shut down the CIFS applications that have open files, or move those volumes to a different aggregate.
    Overriding this veto results in loss of CIFS lock state, causing disruption and data loss.

Lock Manager NDO
    Wait until the locks are mirrored.
    This veto cannot be overridden; doing so disrupts Microsoft Hyper-V virtual machines.

RAID
    Check the EMS messages to determine the cause of the veto:
    • If disk add or disk ownership reassignment operations are in progress, wait until they complete.
    • If the veto is due to a mirror resync, a mirror verify, or offline disks, the veto can be overridden and the operation restarts after giveback.

Destination checks during aggregate relocation

Vetoing subsystem module        Workaround

Disk Inventory
    Relocation of an aggregate fails if the destination node is unable to see one or more disks belonging to the aggregate.
    Check storage for loose cables and verify that the destination can access disks belonging to the aggregate being relocated.
    This check cannot be overridden.

WAFL
    Relocation of an aggregate fails if the relocation would cause the destination to exceed its limits for maximum volume count or maximum volume size.
    This check cannot be overridden.

Lock Manager NDO
    Relocation of an aggregate fails if:
    • The destination does not have sufficient lock manager resources to reconstruct locks for the relocating aggregate.
    • The destination node is reconstructing locks.
    Retry aggregate relocation after a few minutes.
    This check cannot be overridden.

Lock Manager
    Permanent relocation of an aggregate fails if the destination does not have sufficient lock manager resources to reconstruct locks for the relocating aggregate.
    Retry aggregate relocation after a few minutes.
    This check cannot be overridden.



RAID
    Check the EMS messages to determine the cause of the failure:
    • If the failure is due to an aggregate name or UUID conflict, troubleshoot and resolve the issue. This check cannot be overridden.
    • Relocation of an aggregate fails if the relocation would cause the destination to exceed its limits for maximum aggregate count, system capacity, or aggregate capacity. You should avoid overriding this check.
Related information

Clustered Data ONTAP 8.3 man page: storage aggregate relocation show - Display relocation
status of an aggregate


Copyright information
Copyright © 1994–2015 NetApp, Inc. All rights reserved. Printed in the U.S.
No part of this document covered by copyright may be reproduced in any form or by any means
(graphic, electronic, or mechanical, including photocopying, recording, taping, or storage in an
electronic retrieval system) without prior written permission of the copyright owner.
Software derived from copyrighted NetApp material is subject to the following license and
disclaimer:
THIS SOFTWARE IS PROVIDED BY NETAPP "AS IS" AND WITHOUT ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE,
WHICH ARE HEREBY DISCLAIMED. IN NO EVENT SHALL NETAPP BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
NetApp reserves the right to change any products described herein at any time, and without notice.
NetApp assumes no responsibility or liability arising from the use of products described herein,
except as expressly agreed to in writing by NetApp. The use or purchase of this product does not
convey a license under any patent rights, trademark rights, or any other intellectual property rights of
NetApp.
The product described in this manual may be protected by one or more U.S. patents, foreign patents,
or pending applications.
RESTRICTED RIGHTS LEGEND: Use, duplication, or disclosure by the government is subject to
restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer
Software clause at DFARS 252.277-7103 (October 1988) and FAR 52-227-19 (June 1987).


Trademark information
NetApp, the NetApp logo, Go Further, Faster, ASUP, AutoSupport, Campaign Express, Cloud
ONTAP, clustered Data ONTAP, Customer Fitness, Data ONTAP, DataMotion, Fitness, Flash Accel,
Flash Cache, Flash Pool, FlashRay, FlexArray, FlexCache, FlexClone, FlexPod, FlexScale,
FlexShare, FlexVol, FPolicy, GetSuccessful, LockVault, Manage ONTAP, Mars, MetroCluster,
MultiStore, NetApp Insight, OnCommand, ONTAP, ONTAPI, RAID DP, SANtricity, SecureShare,
Simplicity, Simulate ONTAP, Snap Creator, SnapCopy, SnapDrive, SnapIntegrator, SnapLock,
SnapManager, SnapMirror, SnapMover, SnapProtect, SnapRestore, Snapshot, SnapValidator,
SnapVault, StorageGRID, Tech OnTap, Unbound Cloud, and WAFL are trademarks or registered
trademarks of NetApp, Inc., in the United States, and/or other countries. A current list of NetApp
trademarks is available on the web at http://www.netapp.com/us/legal/netapptmlist.aspx.
Cisco and the Cisco logo are trademarks of Cisco in the U.S. and other countries. All other brands or
products are trademarks or registered trademarks of their respective holders and should be treated as
such.


How to send comments about documentation and receive update notification
You can help us to improve the quality of our documentation by sending us your feedback. You can
receive automatic notification when production-level (GA/FCS) documentation is initially released or
important changes are made to existing production-level documents.
If you have suggestions for improving this document, send us your comments by email to
[email protected]. To help us direct your comments to the correct division, include in the
subject line the product name, version, and operating system.
If you want to be notified automatically when production-level documentation is released or
important changes are made to existing production-level documents, follow Twitter account
@NetAppDoc.
You can also contact us in the following ways:

NetApp, Inc., 495 East Java Drive, Sunnyvale, CA 94089 U.S.

Telephone: +1 (408) 822-6000

Fax: +1 (408) 822-4501

Support telephone: +1 (888) 463-8277


Index

A
active/active
storage configuration 37
active/passive
storage configuration 37
adapters
NVRAM 48
quad-port Fibre Channel HBA 51, 59
aggregate ownership
relocation of 125
aggregate relocation
benefits of 125
commands for 129
effect on root-data partitioning 126
effect on shared disks 126
how it works 125
monitoring progress of 132
overriding a veto of 132
aggregates
CFO 28
HA policy of 28
ownership change 27, 28, 127
relocation of 28, 125, 127
root 28, 32
SFO 28
All-Flash Optimized personality systems
conditions of manual takeovers with configuration
mismatches 102
asymmetrical
storage configuration 37
automatic giveback
commands for configuring 81
how it works 80
parameters and how they affect giveback 82
automatic takeover
triggers for 80
automatic takeovers
commands for changing policy 79

B
background disk firmware update
giveback 29
takeover 29
best practices
HA configuration 32

C
cabinets
preparing for cabling 50
cabling
Channel A, for mirrored HA pairs 61
Channel A, for standard HA pairs 52
Channel B, for mirrored HA pairs 63
Channel B, for standard HA pairs 54
error message, cross-cabled HA interconnect 56–58, 69, 70
HA interconnect for standard HA pair 56, 69
HA interconnect for standard HA pair, 32xx systems
57, 70
HA interconnect for standard HA pair, 80xx systems
58, 70
HA interconnect, cross-cabled 56–58, 69, 70
HA pairs 45
preparing equipment racks for 49
preparing system cabinets for 50
requirements 48
CFO
definition of 28
HA policy 28
Channel A
cabling 52, 61
defined 35
Channel B
cabling 54, 63
chassis configurations, single or dual 40
CIFS sessions
effect of takeover on 25
cluster HA
configuring in two-node clusters 75
disabling, when halting or rebooting a node in a two-node cluster 100
cluster high availability
configuring in two-node clusters 75
cluster network 13
clusters
configuring cluster HA in two-node 75
configuring switchless-cluster in two-node 75
special configuration settings for two-node 75

clusters and HA pairs 13
commands
aggregate home status 87
cf giveback (enables giveback) 84
cf takeover (initiates takeover) 84
cluster ha status 87
disabling storage failover 75
enabling HA mode 74
enabling storage failover 75
for automatic giveback configuration 81
for changing automatic takeover policy 79
for checking node states 88
for configuring hardware-assisted takeover 77
ha-config modify 72
ha-config show 72
ha-config status 87
halting a node without initiating takeover 99
rebooting a node without initiating takeover 99
storage disk show -port (displays paths) 114
storage failover giveback (enables giveback) 84
storage failover status 87
storage failover takeover (initiates takeover) 84
takeover (description of all status commands) 87
comments
how to send feedback about documentation 137
comparison
HA pair types 13
Config Advisor
checking for common configuration errors with 77
downloading and running 77
configuration variations
mirrored HA pairs 19
configurations
HA differences between supported system 42
testing takeover and giveback 84
controller failover
benefits of 9
controller failovers
events that trigger 21
current owner
disk ownership type, defined 30

D
data network 13
Data ONTAP
upgrading nondisruptively 118
upgrading nondisruptively, documentation for 118
disk shelves
about modules for 112
adding to an HA pair with multipath HA 110
hot swapping modules in 115
hot-removing 118
managing in an HA pair 110
disk slicing
benefits of for entry-level platforms 37
disks
how shared HDDs work 38
requirements for using partitioned 40
slicing, benefits of for entry level and All Flash FAS platforms 37
types of ownership for 30
viewing ownership for 31
documentation
how to receive automatic notification of changes to 137
how to send feedback about 137
required 46
DR home owner
disk ownership type, defined 30
dual-chassis HA configurations
diagram of 40
interconnect 41

E
eliminating
single point of failure 9
EMS message
takeover impossible 32
entry-level platforms
benefits of root-data partitioning for 37
epsilon
moving during manually initiated takeover 104
equipment racks
installation in 45
preparation of 49
events
table of failover triggering 21

F
failover
benefits of controller 9
failovers
events that trigger 21
failures
table of failover triggering 21
fault tolerance
how HA pairs support 7
feedback
how to send comments about documentation 137
Fibre Channel ports
identifying for HA pair 51, 59
forcing takeover
commands for 102
effects of using the immediate option 102
FRU replacement, nondisruptive
documentation for 118

G
giveback
CFO (root) aggregates only 108
commands for 108
commands for configuring automatic 81
definition of 20
effect on root-data partitioning 29
effect on shared disks 29
interrupted 106
manual 108
monitoring progress of 106, 108
overriding vetoes 108
partial-giveback 106
performing a 106
testing 84
veto 106
what happens during 27
giveback after reboot
automatic 80

H
HA
configuring in two-node clusters 75
HA configurations
benefits of 7
best practices 32
definition of 7
differences between supported system 42
single- and dual-chassis 40
HA interconnect
cabling 56, 69
cabling, 32xx dual-chassis HA configurations 57, 70
cabling, 80xx dual-chassis HA configurations 58, 70
in the HA pair 7
single-chassis and dual-chassis HA configurations 41
HA mode
enabling 74

HA pairs
cabling 45, 50
cabling mirrored 59
events that trigger failover in 21
in a two-node switchless cluster 17
installation 45
managing disk shelves in 110
MetroCluster, compared with 16
required connections for using UPSs with 71
setup requirements 33
setup restrictions 33
storage configuration variations 37
types of
installed in equipment racks 45
installed in system cabinets 46
mirrored 18
types of, compared 13
HA pairs and clusters 13
HA policy
CFO 28
SFO 28
HA state
chassis 72
controller modules 72
ha-config modify command
modifying the HA state 72
ha-config show command
verifying the HA state 72
hardware
components described 11
HA components described 11
single point of failure 9
hardware replacement, nondisruptive
documentation for 118
hardware-assisted takeover
commands for configuring 77
events that trigger 78
how it speeds up takeover 25
requirements for 36
HDDs
shared, how they work 38
standard layouts for shared 38
high availability
configuring in two-node clusters 75
home owner
disk ownership type, defined 30
hot-removing
disk shelves 118


I
information
how to send feedback about improving documentation 137
installation
equipment rack 45
HA pairs 45
system cabinet 46

L
layouts
standard shared HDD 38
licenses
cf 74
not required 74
LIF configuration
best practice 32

M
mailbox disks
in the HA pair 7
manual takeovers
commands for performing 102
effects of in mismatched All-Flash Optimized personality systems 102
MetroCluster
HA pairs, compared with 16
mirrored HA pairs
about 18
advantages of 18
cabling 59
cabling Channel A 61
cabling Channel B 63
restrictions 35
setup requirements for 35
variations 19
mirroring
NVMEM log 7
NVRAM log 7
modules, disk shelf
about 112
best practices for changing types 113
hot-swapping 115
restrictions for changing types 112
testing 113
multipath HA loop
adding disk shelves to 110

N
node states
description of 88
Nondisruptive aggregate relocation 7
nondisruptive hardware replacement
documentation for 118
shelf modules 112
nondisruptive operations
how HA pairs support 7
nondisruptive storage controller upgrade using aggregate relocation
documentation for 118
storage controller upgrade using aggregate relocation, nondisruptive 118
nondisruptive upgrades
Data ONTAP 118
Data ONTAP, documentation for 118
NVRAM
adapter 48

O
original owner
disk ownership type, defined 30
overriding vetoes
giveback 106
owner
disk ownership type, defined 30
ownership
disk, types of 30
displaying disk ownership 31
displaying partition 31

P
panic
leading to takeover and giveback 80
parameters
of the storage failover modify command used for
configuring automatic giveback 82
partitioning
root-data, benefits of 37
root-data, how it works 38
root-data, requirements for using 40
root-data, standard layouts for 38
partitions
viewing ownership for 31
platforms
benefits of root-data partitioning for entry-level and All Flash FAS 37
plexes
requirements for, in the HA pair 35
port list
creating for mirrored HA pairs 60
ports
identifying which ones to use 51, 59
power supply
best practices 32
preparing equipment racks 49

R
racking the HA pair
in a system cabinet 46
in telco-style racks 45
reboot
leading to takeover and giveback 80
relocation
aggregate ownership 125, 127
of aggregates 125, 127
removing
disk shelves 118
requirements
documentation 46
equipment 48
for using root-data partitioning 40
HA pair setup 33
hot-swapping a disk shelf module 115
tools 47
restrictions
HA pair setup 33
in mirrored HA pairs 35
root aggregate
CFO HA policy 28
data storage on 32
giveback of 28
root-data partitioning
benefits of 37
effect on aggregate relocation 126
effect on giveback 29
effect on takeover 29
how it works 38
requirements for using 40
standard layouts for 38

S
SFO

definition of 28
HA policy 28
shared drives
benefits of 37
shared HDDs
how they work 38
shared layouts
standard HDD 38
sharing storage loops or stacks
within HA pairs 37
shelf modules
upgrading or replacing 112
shelves
hot-removing 118
managing in an HA pair 110
single point of failure
analysis 9
definition of 9
eliminating 9
single-chassis HA configurations
diagram of 40
interconnect 41
SMB 3.0 sessions on Microsoft Hyper-V
effect of takeover on 25
SMB sessions
effect of takeover on 25
spare disks
in the HA pair 7, 35
standard HA pair
cabling Channel A 52
cabling Channel B 54
cabling HA interconnect for 56, 69
cabling HA interconnect for, 32xx systems 57, 70
cabling HA interconnect for, 80xx systems 58, 70
standard layouts
shared HDD 38
states
description of node 88
status messages
description of node state 88
storage aggregate relocation start command
key parameters of 130
storage configuration variations
standard HA pairs 37
storage controller upgrade using aggregate relocation,
nondisruptive
documentation for 118
storage failover
commands for disabling 75
commands for enabling 75
testing takeover and giveback 84
suggestions
how to send feedback about documentation 137
switchless-cluster
enabling in two-node clusters 75
symmetrical
storage configuration 37
system cabinets
installation in 46
preparing for cabling 50
system configurations
HA differences between supported 42

T
takeover
automatic 20
configuring when it occurs 79
definition of 20
effect on CIFS sessions 25
effect on root-data partitioning 29
effect on shared disks 29
effect on SMB 3.0 sessions 25
effect on SMB sessions 25
hardware-assisted 25, 36
hardware-assisted takeover 78
manual 20
moving epsilon during manually initiated 104
reasons for 79
testing 84
what happens during 25
when it occurs 20
takeover impossible
EMS message 32
takeovers
commands for configuring hardware-assisted 77
commands for forcing 102
commands to change policy for 79
effects of using the immediate option 102
testing
takeover and giveback 84
tools
required 47
twitter
how to receive automatic notification of documentation changes 137
two-node switchless cluster 17

U
uninterruptible power supplies
See UPSs
UPSs
required connections with HA pairs 71
utilities
checking for common configuration errors with Config Advisor 77
downloading and running Config Advisor 77

V
verifying
takeover and giveback 84
veto
giveback 106
override 106
vetoes
of an aggregate relocation 132
overriding 132
