Comprehensive NetApp Node Replacement Guide
Replacement process
1 Preparing the system for the replacement
Pre-replacement tasks for SAN configurations in an HA pair
Shutting down a node operating in 7-Mode or clustered Data ONTAP
Verifying that the new controller module has no content in NVMEM
Resetting storage encryption disk authentication keys to their MSID (default security ID set by the manufacturer)
215-07372_H0 June 2019 Copyright © 2019 NetApp, Inc. All rights reserved.
Web: www.netapp.com
Related information
https://library.netapp.com/ecm/ecm_download_file/ECMP12475945
Steps
1. Preparing the system for controller replacement
2. Replacing the controller module hardware
3. Restoring and verifying the system configuration after hardware replacement
4. Running diagnostics tests after replacing a controller module
5. Completing the recabling and final restoration of operations
6. Completing the replacement process
Figure: Workflow for preparing the system for controller replacement. If Storage Encryption is used, reset the authentication key to MSID; then shut down the impaired controller module (HA pair) or shut down the power through the SP (stand-alone).
Steps
1. Determining your controller CNA port configuration
2. Checking quorum on the SCSI blade
3. Preparing for Storage or Volume Encryption configurations
4. Shutting down the target controller
5. Verifying the new controller module has no content in NVRAM
Steps
b. Run the following Cluster-Mode command on the console of the impaired node:
system node hardware unified-connect show
2. Copy and save the information displayed on the screen to a safe location for later reuse.
Steps
1. Verify that the internal SCSI blade is operational and in quorum on the impaired node:
event log show -node impaired-node-name -messagename scsiblade.*
You should see messages similar to the following, indicating that the SCSI-blade process is in quorum with the other nodes
in the cluster:
2. If you do not see the quorum messages, check the health of the SAN processes and resolve any issues before proceeding
with the replacement.
Steps
2. Display the key ID for each self-encrypting disk on the original system:
disk encrypt show
Example
The first disk in the example is associated with an MSID key; the other disks are associated with a non-MSID key.
3. Examine the output of the disk encrypt show command, and if any disks are associated with a non-MSID key, rekey the
disks to an MSID key by taking one of the following actions:
4. Verify that all of the self-encrypting disks are associated with an MSID key:
disk encrypt show
Example
The following example shows the output of the disk encrypt show command when all self-encrypting disks are
associated with an MSID key:
6. Repeat steps 1 through 5 for each individual node or HA pair.
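On systems with many self-encrypting disks, the comparison in steps 3 and 4 can be scripted. The following is a minimal Python sketch, not a NetApp tool: it assumes you have copied each disk's key ID from the disk encrypt show output into a dictionary, and that you know which key ID value your system reports for the MSID key. This guide does not show that value, so MSID_KEY_ID and the disk names below are hypothetical placeholders.

```python
# Hypothetical placeholder: replace with the key ID that disk encrypt show
# reports for the manufacturer's MSID on your system.
MSID_KEY_ID = "MSID-PLACEHOLDER"


def disks_needing_rekey(key_ids: dict, msid_key_id: str = MSID_KEY_ID) -> list:
    """Return the disks whose key ID is not the MSID key ID, sorted by name.

    key_ids maps a disk name to the key ID transcribed from disk encrypt show.
    """
    return sorted(disk for disk, key_id in key_ids.items() if key_id != msid_key_id)


if __name__ == "__main__":
    # Hypothetical transcription: one disk already uses the MSID key, two do not.
    transcribed = {
        "0a.10.0": MSID_KEY_ID,
        "0a.10.1": "NON-MSID-KEY",
        "0a.10.2": "NON-MSID-KEY",
    }
    pending = disks_needing_rekey(transcribed)
    if pending:
        print("Rekey to MSID before continuing:", ", ".join(pending))
    else:
        print("All self-encrypting disks use the MSID key.")
```

An empty result corresponds to the condition checked in step 4: every self-encrypting disk is associated with an MSID key.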
Choices
• Shutting down a node running ONTAP
Steps
1. If the system is running ONTAP, check the status of the nodes in the cluster.
a. Change the privilege level to advanced, entering y when prompted to continue: set -privilege advanced
The advanced prompt (*>) appears.
b. Verify the status of the node members in the cluster: cluster show -epsilon *
Example
The following example displays information about the health and eligibility of the nodes in the cluster:
Note: You must not assign epsilon to a node that has to be replaced.
Note: In a cluster with a single HA pair, you must not assign epsilon to either node.
c. Perform one of the following actions, depending on the result of the command:
If... Then...
All nodes show true for both health and eligibility, and epsilon is not assigned to the impaired node:
a. Exit advanced mode: set -privilege admin
b. Proceed to Step 3.
d. Go to Step 3.
2. If the impaired node is part of an HA pair, disable the auto-giveback option from the console of the healthy node:
storage failover modify -node local -auto-giveback false
• If the impaired node shows the ONTAP prompt, then take over the impaired node from the
healthy node and be prepared to interrupt the reboot:
storage failover takeover -ofnode impaired_node_name
When prompted to interrupt the reboot, you must press Ctrl-C to go to the LOADER
prompt.
• If the impaired node displays the Waiting for giveback message, then
press Ctrl-C and respond y to take the node to the LOADER prompt.
• If the impaired node does not show either the Waiting for giveback message or an
ONTAP prompt, then power-cycle the node.
You must contact technical support if the node does not respond to the power cycle.
The method that you use to shut down the node depends on whether you use remote management through the node's Service
Processor (SP):
6. If the system is in a stand-alone configuration, shut down the power supplies, and then unplug both of the power cords from
the power source.
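The epsilon check in step 1 reduces to a few rules: every node must show true for health and eligibility, the impaired node must not hold epsilon, and in a cluster with a single HA pair neither node may hold epsilon. The following minimal Python sketch encodes those rules. The NodeStatus structure is hypothetical (you fill it in by hand from the cluster show -epsilon * output), and treating any two-node cluster as a single HA pair is an assumption.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    """One row transcribed from cluster show -epsilon * (hypothetical structure)."""
    name: str
    health: bool
    eligibility: bool
    epsilon: bool


def epsilon_check(nodes: list, impaired_node: str) -> list:
    """Return a list of problems; an empty list means you can exit advanced
    mode and proceed to Step 3."""
    problems = []
    for node in nodes:
        if not (node.health and node.eligibility):
            problems.append(f"{node.name}: health or eligibility is not true")
        if node.epsilon and node.name == impaired_node:
            problems.append(f"{node.name}: epsilon is assigned to the impaired node")
    # Assumption: a two-node cluster is a single HA pair, where neither node
    # may hold epsilon.
    if len(nodes) == 2 and any(node.epsilon for node in nodes):
        problems.append("single HA pair: epsilon must not be assigned to either node")
    return problems


if __name__ == "__main__":
    cluster = [
        NodeStatus("node1", True, True, False),
        NodeStatus("node2", True, True, False),
        NodeStatus("node3", True, True, True),
        NodeStatus("node4", True, True, False),
    ]
    print(epsilon_check(cluster, impaired_node="node1") or "OK to proceed to Step 3")
```

Returning a list of reasons, rather than a single True or False, makes it clear which condition blocks the shutdown.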
Steps
2. If the NVRAM LED is not flashing, there is no content in the NVRAM; you can skip the following steps and proceed to the
next task in this procedure.
3. If the NVRAM LED is flashing, there is data in the NVRAM and you must disconnect the battery to clear the memory:
b. Locate the battery, press the clip on the face of the battery plug to release the lock clip from the plug socket, and unplug
the battery cable from the socket.
Callout 1: NVMEM battery
Figure: Workflow for replacing the controller module hardware (one controller in the chassis or two controllers in the chassis).
Steps
1. Removing the controller module from the system
2. Moving the boot device
3. Moving the NVRAM battery
4. Moving the DIMMs to the new controller module
Steps
2. Loosen the hook and loop strap binding the cables to the cable management arm, and then unplug the system cables and
SFPs (if needed) from the controller module, and keep track of where the cables were connected.
Leave the cables in the cable management arm so that when you reinstall the cable management arm, the cables are
organized.
3. Remove the cable management arms from the left and right sides of the controller module and set them aside.
The illustration shows the cable management arms on a FAS2552 system. The procedure is the same for all FAS25xx
systems.
4. Squeeze the latch on the cam handle until it releases, as shown in the following illustration. Open the cam handle fully to
release the controller module from the midplane, and then, using two hands, pull the controller module out of the chassis.
Callout 1: Button to release controller module cover
Steps
1. Locate the boot device using the following illustration or the FRU map on the controller module:
Callout 1: Boot device
Callout 2: Boot device holder; not removable
2. Open the boot device cover, hold the boot device by its edges at the notches in the boot device housing, and then gently lift it
straight up and out of the housing.
Attention: Always lift the boot device straight up out of the housing. Lifting it out at an angle can bend or break the
connector pins in the boot device.
4. Align the boot device with the boot device socket or connector, and then firmly push the boot device straight down into the
socket or connector.
Important: Always install the boot device by aligning the front of the boot device squarely over the pins in the socket at
the front of the boot device housing. Installing the boot device at an angle or over the rear plastic pin first can bend or
damage the pins in the boot device connector.
5. Verify that the boot device is seated squarely and completely in the socket or connector.
If necessary, remove the boot device and reseat it into the socket.
Steps
1. Locate the battery, press the clip on the face of the battery plug to release the lock clip from the plug socket, and then unplug
the battery cable from the socket.
Callout 1: NVMEM battery
2. Grasp the battery and press the tab marked PUSH, and then lift the battery out of the holder and controller module.
Steps
1. Verify that the NVMEM battery cable connector is not plugged into the socket.
Callout 1: System DIMM
Callout 2: NVMEM DIMM. The NVMEM DIMM has an NVMEM label on one of the chips.
Callout 3: System DIMM slot
Callout 4: NVMEM DIMM slot. The NVMEM DIMM slot has white ejector tabs.
3. Note the location and orientation of the DIMM in the socket so that you can insert it in the new controller module in the
proper orientation.
4. Slowly press down on the two DIMM ejector tabs, one at a time, to eject the DIMM from its slot, and then lift it out of the
slot.
5. Locate the corresponding slot for the DIMM in the new controller module, align the DIMM over the slot, and then insert the
DIMM into the slot.
The notch among the pins on the DIMM should align with the tab in the socket. The DIMM fits tightly in the slot but should
go in easily. If not, you should realign the DIMM with the slot and reinsert it.
Important: You must install the NVMEM DIMM only in the NVMEM DIMM slot.
6. Visually inspect the DIMM to verify that it is evenly aligned and fully inserted into the slot.
The edge connector on the DIMM must make complete contact with the slot.
7. Push carefully, but firmly, on the top edge of the DIMM until the latches snap into place over the notches at the ends of the
DIMM.
9. In the new controller module, orient the NVMEM battery cable connector to the socket on the controller module and plug
the cable into the socket.
You must ensure that the plug locks down onto the socket on the controller module.
Steps
1. Align the end of the controller module with the opening in the chassis, and then gently push the controller module halfway
into the system.
Note: You must not completely insert the controller module in the chassis until instructed to do so.
2. Recable the management port or serial console port so that you can access the system to perform the tasks in the following
sections.
b. Enter one of the following commands from the healthy node’s console and wait for the
giveback to complete:
c. If you have not already done so, reinstall the cable management arm, and then tighten the
thumbscrew on the cam handle on the back of the controller module.
d. Bind the cables to the cable management device with the hook and loop strap.
If you have a stand-alone configuration:
a. With the cam handle in the open position, firmly push the controller module in until it
meets the midplane and is fully seated, and then close the cam handle to the locked
position.
Attention: You must not use excessive force when sliding the controller module into
the chassis; you might damage the connectors.
b. Reconnect the power cables to the power supplies and to the power sources, and then turn on the
power to start the boot process.
c. If you have not already done so, reinstall the cable management arm, and then tighten the
thumbscrew on the cam handle on the back of the controller module.
d. Bind the cables to the cable management device with the hook and loop strap.
Important: During the boot process, you might see the following prompts:
• A prompt warning of a system ID mismatch and asking to override the system ID.
• A prompt warning that when entering Maintenance mode in an HA configuration you must confirm that the healthy
node remains down.
Figure: Workflow for restoring and verifying the system configuration. Verify that the HA state (ha-config show) matches your configuration; if you use Fibre Channel, restore the FC configuration.
Steps
1. In Maintenance mode, display the HA state of the new controller module and chassis:
ha-config show
If your system is...    The HA state for all components should be...
In an HA pair           ha
Stand-alone             non-ha
2. If the displayed system state of the controller does not match your system configuration, set the HA state for the controller
module:
ha-config modify controller ha-state
3. If the displayed system state of the chassis does not match your system configuration, set the HA state for the chassis:
ha-config modify chassis ha-state
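The decision in steps 2 and 3 can be summarized as: look up the HA state expected for your configuration, and issue an ha-config modify command only for the component (controller or chassis) that does not match. The following minimal Python sketch builds those commands from the values you read in the ha-config show output. The dictionary keys "ha_pair" and "stand_alone" are hypothetical labels; the states "ha" and "non-ha" come from the table in step 1.

```python
# Expected HA state per the table in step 1; the dictionary keys are
# hypothetical labels for the two system configurations.
EXPECTED_HA_STATE = {"ha_pair": "ha", "stand_alone": "non-ha"}


def ha_config_fixes(system_type: str, controller_state: str, chassis_state: str) -> list:
    """Return the ha-config modify commands needed so that both the controller
    and the chassis report the HA state expected for system_type."""
    expected = EXPECTED_HA_STATE[system_type]
    commands = []
    if controller_state != expected:
        commands.append(f"ha-config modify controller {expected}")
    if chassis_state != expected:
        commands.append(f"ha-config modify chassis {expected}")
    return commands


if __name__ == "__main__":
    # Example: a stand-alone system whose new controller module still reports ha.
    for command in ha_config_fixes("stand_alone", controller_state="ha", chassis_state="non-ha"):
        print(command)
```

An empty list means that ha-config show already matches your configuration and no change is needed.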
Steps
After you issue the command, wait until the system stops at the LOADER prompt.
5. Boot the node back into Maintenance mode for the configuration changes to take effect.
Steps
2. Because modifying one port in a port pair modifies the other port, answer y when prompted by the system.
After you issue the command, wait until the system stops at the LOADER prompt.
4. Boot the node back into Maintenance mode for the configuration changes to take effect.
When setting the date and time at the LOADER prompt, verify that all times are set to GMT.
Steps
1. If you have not already done so, halt the replacement node to display the LOADER prompt.
2. Determine the system time by using the date command on the healthy node (if the system is in an HA pair) or another
reliable time source.
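Because the LOADER expects GMT, a local reading from the healthy node or another time source must be converted before you enter it. The following minimal Python sketch does that conversion with the standard datetime module. It assumes the LOADER takes the date as mm/dd/yyyy and the time as hh:mm:ss, and that the commands are set date and set time; these formats and command names are not shown in this guide, so confirm them in the procedure for your platform.

```python
from datetime import datetime, timedelta, timezone


def loader_date_time(moment: datetime) -> tuple:
    """Convert a timezone-aware datetime to GMT and return (date, time)
    strings as mm/dd/yyyy and hh:mm:ss (assumed LOADER formats)."""
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime so the GMT conversion is unambiguous")
    gmt = moment.astimezone(timezone.utc)
    return gmt.strftime("%m/%d/%Y"), gmt.strftime("%H:%M:%S")


if __name__ == "__main__":
    # Example: 8:15 PM on June 30 in a UTC-5 time zone is 01:15 GMT on July 1.
    local = datetime(2019, 6, 30, 20, 15, tzinfo=timezone(timedelta(hours=-5)))
    date_text, time_text = loader_date_time(local)
    # Assumption: LOADER command names; verify them for your platform.
    print(f"set date {date_text}")
    print(f"set time {time_text}")
```

Rejecting naive datetimes is deliberate: a reading without a time zone is exactly the case where the date can silently end up one day off.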
Steps
To obtain the latest release of SP firmware, log in to the NetApp Support Site at mysupport.netapp.com; the following steps
describe how to update the firmware, if needed.
3. Download and install the most current version of firmware for your system by following the provided instructions.
NetApp Downloads: System Firmware and Diagnostics
Related information
Find a System Administration Guide for your version of ONTAP 9
Find a System Administration Guide for your version of Data ONTAP 8
Figure: Workflow for running diagnostics (choose the test method).
• For ONTAP 8.2 and later, you do not require loopback plugs to run tests on storage interfaces.
Steps
1. If the node to be serviced is not at the LOADER prompt, bring it to the LOADER prompt.
Important: During the boot_diags process, you might see a prompt warning that when entering Maintenance mode in
an HA configuration, you must confirm that the partner remains down. To continue to Maintenance mode, enter y.
4. Display and note the available devices on the controller module: sldiag device show -dev mb
The controller module devices and ports that are displayed can be any one or more of the following:
• cna is a Converged Network Adapter or interface that is not connected to a network or storage device.
• sas is a Serial Attached SCSI device that is not connected to a disk shelf.
5. How you proceed depends on how you want to run diagnostics on your system.
Choices
• Running diagnostics tests concurrently after replacing the controller module
• Running diagnostics tests individually after replacing the controller module
Steps
1. Display and note the available devices on the controller module: sldiag device show -dev mb
The controller module devices and ports that are displayed can be any one or more of the following:
• cna is a Converged Network Adapter or interface that is not connected to a network or storage device.
2. Review the enabled and disabled devices in the output from step 1 and then determine which tests you want to run
concurrently.
*> <SLDIAG:_ALL_TESTS_COMPLETED>
7. After the tests are complete, verify that there are no hardware problems on your storage system:
sldiag device status -long -state failed
8. Correct any issues that are found, and repeat this procedure.
Steps
3. Examine the output and, if applicable, enable the tests that you want to run for the device:
sldiag device modify -dev dev_name -index test_index_number -selection enable
test_index_number can be an individual number, a series of numbers separated by commas, or a range of numbers.
<SLDIAG:_ALL_TESTS_COMPLETED>
For the boot media: sldiag device status -dev bootmedia -long -state failed
b. Turn off or leave on the power supplies, depending on how many controller modules are in
the chassis:
• If you have two controller modules in the chassis, leave the power supplies turned on
to provide power to the other controller module.
• If you have one controller module in the chassis, turn off the power supplies, and then
unplug them from the power sources.
c. Check the controller module you are servicing to verify that you have observed all of the
considerations identified for running system-level diagnostics, that cables are securely
connected, and that hardware components are properly installed in the storage system.
d. Boot the controller module you are servicing, interrupting the boot by pressing Ctrl-C
when prompted.
This takes you to the Boot menu:
• If you have two controller modules in the chassis, fully seat the controller module you
are servicing in the chassis.
The controller module boots up when fully seated.
• If you have one controller module in the chassis, connect the power supplies, and then
turn them on.
g. Enter boot_diags at the prompt, and then rerun the system-level diagnostic test.
8. Exit system-level diagnostics, and continue with recabling and restoration of the storage system.
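When running tests individually, the test_index_number argument of sldiag device modify can be an individual number, a comma-separated series, or a range. The following minimal Python sketch expands and validates such a selector before building the enable command, so that a typo is caught before it reaches the console. The hyphenated low-high range syntax (for example, 5-7) is an assumption, because this guide does not show how a range is written.

```python
def parse_test_indexes(selector: str) -> list:
    """Expand a selector such as "1,3,5-7" into [1, 3, 5, 6, 7].

    Assumption: a range is written low-high with a hyphen.
    """
    indexes = set()
    for part in selector.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty item in selector {selector!r}")
        if "-" in part:
            low_text, high_text = part.split("-", 1)
            low, high = int(low_text), int(high_text)
            if low > high:
                raise ValueError(f"range {part!r} runs backwards")
            indexes.update(range(low, high + 1))
        else:
            indexes.add(int(part))
    return sorted(indexes)


def enable_command(dev_name: str, selector: str) -> str:
    """Validate the selector, then build the sldiag command that enables those tests."""
    selector = selector.replace(" ", "")
    parse_test_indexes(selector)  # raises ValueError on a malformed selector
    return f"sldiag device modify -dev {dev_name} -index {selector} -selection enable"


if __name__ == "__main__":
    print(parse_test_indexes("1,3,5-7"))
    print(enable_command("mb", "1,3,5-7"))
```

The selector is passed to sldiag unchanged (apart from removing spaces); the expansion is used only to check it.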
Figure: Workflow for recabling and final restoration (HA or stand-alone). If Storage Encryption is used, restore Storage Encryption; then install licenses on the replacement node.
Steps
1. Reinstall the cable management arms and recable the controller module, as needed.
If you removed the media converters (SFPs), remember to reinstall them if you are using fiber optic cables.
Related information
Disk Shelves Documentation
Reassigning disks
If the storage system is in an HA pair, the system ID of the new controller module is automatically assigned to the disks when
the giveback occurs at the end of the procedure. In a stand-alone system, you must manually reassign the ID to the disks.
1. If the replacement node is in Maintenance mode (showing the *> prompt), exit Maintenance mode:
halt
After you issue the command, you must wait until the system stops at the LOADER prompt.
2. If the replacement node is running Data ONTAP 8.2.2 or earlier, set the following environment variable at its LOADER prompt:
a. Set the new controller module to boot in clustered Data ONTAP: setenv bootarg.init.boot_clustered true
3. From the LOADER prompt on the replacement node, boot the node:
If you are prompted to override the system ID due to a system ID mismatch, enter y.
4. Wait until the Waiting for giveback... message is displayed on the replacement node console and then, on the healthy
node, verify that the controller module replacement has been detected and the new partner system ID has been automatically
assigned.
Example
5. From the healthy node, verify that any coredumps are saved:
6. Your next step depends on the version of ONTAP your system is running.
b. After the node displays Waiting for Giveback..., give back the node:
storage failover giveback -ofnode replacement_node_name
As the replacement node boots up, it might again display the prompt warning of a system
ID mismatch and asking to override the system ID. You can respond Y.
The replacement node takes back its storage and finishes booting up to the ONTAP
prompt.
Note: If the giveback is vetoed, you can consider overriding the vetoes.
ONTAP 9 High-Availability Configuration Guide
d. Wait until the storage failover show-giveback command output indicates that
the giveback operation is complete.
e. Confirm that the HA pair is healthy and that takeover is possible by using the storage
failover show command.
The output from the storage failover show command should not include the
System ID changed on partner message.
d. Monitor the progress of the giveback operation by using the storage failover
show-giveback command.
e. Wait until the storage failover show-giveback command output indicates that
the giveback operation is complete.
f. Confirm that the HA pair is healthy and that takeover is possible by using the storage
failover show command.
The output from the storage failover show command should not include the
System ID changed on partner message.
8. Verify that the disks or FlexArray Virtualization LUNs were assigned correctly: storage disk show -ownership
Example
The disks belonging to the replacement node should show the new system ID for the replacement node. In the following
example, the disks owned by node1 now show the new system ID, 1873775277:
Disk   Aggregate  Home   Owner  DR Home  Home ID     Owner ID    DR Home ID  Reserver    Pool
-----  ---------  -----  -----  -------  ----------  ----------  ----------  ----------  -----
1.0.0  aggr0_1    node1  node1  -        1873775277  1873775277  -           1873775277  Pool0
1.0.1  aggr0_1    node1  node1  -        1873775277  1873775277  -           1873775277  Pool0
.
.
.
9. Verify that the expected volumes are present for each node: vol show -node node-name
10. If you disabled automatic takeover on reboot, reenable it on the healthy node console:
storage failover modify -node replacement-node-name -onreboot true
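On nodes with many disks, the ownership check in step 8 is easier to do with a script than by eye. The following minimal Python sketch scans storage disk show -ownership output and lists any disk owned by the replacement node whose Home ID or Owner ID is not the new system ID. It assumes the 10-column layout of the example in step 8, with "-" for empty values; the function name is hypothetical.

```python
# Sample in the layout of the example in step 8.
SAMPLE = """\
Disk Aggregate Home Owner DR Home Home ID Owner ID DR Home ID Reserver Pool
----- ------ ----- ------ -------- ------- ------- ------- --------- ---
1.0.0 aggr0_1 node1 node1 - 1873775277 1873775277 - 1873775277 Pool0
1.0.1 aggr0_1 node1 node1 - 1873775277 1873775277 - 1873775277 Pool0
"""


def disks_with_stale_ids(table_text: str, node: str, new_system_id: str) -> list:
    """Return disks owned by node whose Home ID or Owner ID is not new_system_id.

    Assumes the 10-column layout of the example: Disk, Aggregate, Home, Owner,
    DR Home, Home ID, Owner ID, DR Home ID, Reserver, Pool, with "-" for
    empty values.
    """
    stale = []
    for line in table_text.splitlines():
        fields = line.split()
        # Skip the header, the dashed separator, and anything that is not a data row.
        if len(fields) != 10 or set(fields[0]) == {"-"}:
            continue
        disk, _aggregate, _home, owner, _dr_home, home_id, owner_id = fields[:7]
        if owner == node and (home_id != new_system_id or owner_id != new_system_id):
            stale.append(disk)
    return stale


if __name__ == "__main__":
    stale = disks_with_stale_ids(SAMPLE, "node1", "1873775277")
    print(stale or "All disks owned by node1 show the new system ID.")
```

An empty result means every disk owned by the replacement node already carries the new system ID.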
Steps
1. If you have not already done so, reboot the replacement node, interrupt the boot process by entering Ctrl-C, and then select
the option to boot to Maintenance mode from the displayed menu.
You must enter Y when prompted to override the system ID due to a system ID mismatch.
Note: Make a note of the old system ID, which is displayed as part of the disk owner column.
Example
The following example shows the old system ID of 118073209:
3. Reassign disk ownership by using the old system ID obtained from the disk show command:
disk reassign -s old_system_ID
In the case of the preceding example, the command is disk reassign -s 118073209.
You can respond Y when prompted to continue.
Example
After you issue the command, you must wait until the system stops at the LOADER prompt.
6. If the replacement node is running Data ONTAP 8.2.2 or earlier, set the following environment variable at its LOADER prompt:
a. Set the new controller module to boot in clustered Data ONTAP: setenv bootarg.init.boot_clustered true
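In a stand-alone system, every disk should show the same old system ID in the owner column, so the disk reassign command in step 3 can be derived mechanically. The following minimal Python sketch extracts the ID and refuses to build a command if the entries disagree. The owner-column format "name (system ID)" is an assumption, because this guide does not reproduce the disk show output; the sample owner entries are hypothetical, and only the ID 118073209 comes from the example in this section.

```python
import re

# Assumption: the disk owner column shows the owner as "name (system ID)",
# for example "node1 (118073209)"; adjust the pattern if your output differs.
SYSTEM_ID_PATTERN = re.compile(r"\((\d+)\)")


def old_system_ids(owner_entries: list) -> list:
    """Return the distinct system IDs found in the owner entries, sorted."""
    ids = set()
    for entry in owner_entries:
        match = SYSTEM_ID_PATTERN.search(entry)
        if match:
            ids.add(match.group(1))
    return sorted(ids)


def reassign_command(owner_entries: list) -> str:
    """Build the disk reassign command for a stand-alone system, refusing to
    guess if the owner entries do not agree on a single old system ID."""
    ids = old_system_ids(owner_entries)
    if len(ids) != 1:
        raise ValueError(f"expected exactly one old system ID, found {ids}")
    return f"disk reassign -s {ids[0]}"


if __name__ == "__main__":
    owners = ["node1 (118073209)", "node1 (118073209)"]
    print(reassign_command(owners))
```

Raising an error when more than one ID appears is a safety choice: reassigning with the wrong ID is harder to undo than rechecking the disk show output.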
Steps
1. If you need new license keys in the Data ONTAP 8.2 format, obtain replacement license keys on the NetApp Support Site in
the My Support section under Software licenses.
Note: The new license keys that you require are auto-generated and sent to the email address on file. If you fail to receive
the email with the license keys within 30 days, contact technical support.
3. If you want to remove the old licenses, complete the following substeps:
Related information
Find a System Administration Guide for your version of ONTAP 9
Find a System Administration Guide for your version of Data ONTAP 8
NetApp Knowledgebase Answer 1002749: Data ONTAP 8.2 and 8.3 Licensing Overview and References
Step
1. Restore Storage or Volume Encryption functionality by using the appropriate procedure in the NetApp Encryption Power
Guide.
ONTAP 9 NetApp Encryption Power Guide
Use one of the following procedures, depending on whether you are using onboard or external key management:
Steps
1. Verify that the logical interfaces are reporting to their home servers and ports:
network interface show -is-home false
If any LIFs are listed (that is, they are not on their home ports), revert them to their home ports:
network interface revert *
If...                         Then...
AutoSupport is enabled        Send an AutoSupport message to register the serial number.
AutoSupport is not enabled    Call NetApp Support to register the serial number.
Related information
NetApp Support
Disposing of batteries
You must dispose of batteries according to the local regulations regarding battery recycling or disposal. If you cannot properly
dispose of batteries, you must return the batteries to NetApp, as described in the RMA instructions that are shipped with the kit.
Copyright
Copyright © 2019 NetApp, Inc. All rights reserved. Printed in the U.S.
No part of this document covered by copyright may be reproduced in any form or by any means—graphic, electronic, or
mechanical, including photocopying, recording, taping, or storage in an electronic retrieval system—without prior written
permission of the copyright owner.
Software derived from copyrighted NetApp material is subject to the following license and disclaimer:
THIS SOFTWARE IS PROVIDED BY NETAPP "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE, WHICH ARE HEREBY DISCLAIMED. IN NO EVENT SHALL NETAPP BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
Trademark
NETAPP, the NETAPP logo, and the marks listed on the NetApp Trademarks page are trademarks of NetApp, Inc. Other
company and product names may be trademarks of their respective owners.
http://www.netapp.com/us/legal/netapptmlist.aspx