How To Replace A Failed Storage Controller (XtremIO 6.x Only)
Notice
Introduction
Various hardware or software failures can necessitate replacing a Storage Controller (SC)
in an XtremIO cluster.
Table of Contents
• Introduction
• Table of Contents
• References used in this procedure
• SC replacement with active cluster
• Part 1 - Prerequisites prior to scheduling SC FRU
• Part 2 - SC-FRU Known issues
• Part 3 - Storage Controller Re-Image
• Part 4 - SC replace preparation steps
• Part 5 - SC replacement steps
• Part 6 - Post SC replacement steps
• Appendix A - How to verify TCP ports before SC FRU
References used in this procedure
• XtremIO FRU Replacement Procedures (FRU replacement guide) for the version
running on the XtremIO cluster - The FRU guide can be downloaded using the
XtremIO SolVe Generator or from the support page for XtremIO
• Storage Controller Rescue Image - XtremIO Storage Controller 6.0.0-59 Rescue
Image
• Win32DiskImager tool to burn USB drive - Win32DiskImager-0.9.5-binary.zip
Please review "How to create, use and manage the Screen user 'scruser'" if the upgrade is
being performed remotely (e.g., over ESRS) to avoid process failure due to network
disconnections.
This procedure should be followed when the cluster is active.
1. IMPORTANT: To ensure the XtremIO cluster host environment is ready, review the
SC-FRU pre-checks with the customer ahead of the replacement.
For details on the SC-FRU pre-checks, refer to EMC KB# 486531 - Preparing for an
XtremIO Storage Controller replacement (SC FRU).
2. IMPORTANT: If the current XMS is at 6.2.0-81, XtremIO Engineering advises upgrading
the XMS to 6.2.0-85 or above to avoid several known issues with SC FRU.
If the customer cannot upgrade to XMS 6.2.0-85, refer to the following link for a workaround.
3. Obtain the XtremIO FRU Replacement Procedures (FRU replacement guide) for the
version running on the XtremIO cluster
The FRU guide can be downloaded using the XtremIO SolVe Generator or from the XtremIO
support page.
4. Inform the CE to download the XtremIO SC Rescue Image matching the version
running on the XtremIO cluster
For clusters running XtremIO version 6.0.0-55 and above, download the XtremIO
Storage Controller 6.0.0-59 Rescue Image.
5. Inform the CE to create a bootable USB drive with the XtremIO SC Rescue Image
according to the SC software re-installation procedure in the FRU replacement guide.
The free tool Win32DiskImager-0.9.5-binary.zip can be used to create the USB drive.
6. Inform the customer to get a KVM/USB Keyboard ready.
7. Upload the latest version of the XtremIO Health Check Script (HCS).
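The version conditions in the prerequisites above can be checked mechanically. The following is a minimal sketch, assuming version strings always have the form major.minor.patch-build (for example 6.2.0-85); the function names are hypothetical and are not part of any XtremIO tooling.

```python
import re

# Version thresholds quoted in the prerequisites above.
XMS_AFFECTED = (6, 2, 0, 81)              # XMS build for which an upgrade is advised
RESCUE_IMAGE_MIN_CLUSTER = (6, 0, 0, 55)  # clusters at or above this use the 6.0.0-59 image


def parse_version(text):
    """Turn 'major.minor.patch-build' (e.g. '6.2.0-85') into a tuple of ints."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)-(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def xms_upgrade_advised(xms_version):
    """True when the XMS is at the build the procedure singles out (6.2.0-81)."""
    return parse_version(xms_version) == XMS_AFFECTED


def uses_6_0_0_59_rescue_image(cluster_version):
    """True when the cluster version is 6.0.0-55 or above."""
    return parse_version(cluster_version) >= RESCUE_IMAGE_MIN_CLUSTER
```

Comparing tuples of integers avoids the pitfalls of comparing version strings lexically, where "6.2.0-9" would sort after "6.2.0-85".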
Warning
For a single-brick cluster: Ask the CE to confirm that the InfiniBand (IB) cable is 2 m long
(any other IB cable may cause the SC FRU procedure to fail).
1. Log in to the XMS and start an XMCLI session as tech
2. Collect a new log bundle before proceeding with the rest of this procedure
3. Ask the CE to label all the cables connected to the failed SC (if that has not been done
already)
4. Deactivate the failed SC
xmcli (tech)> replace-storage-controller-prepare sc-id=<ID of the failed SC> cluster-id=<ID of the cluster> <force>
11:20:57 - Storage-Controller-Name Index Cluster-Name X-Brick-Index Mgr-Addr Mgr-Addr-Subnet MGMT-GW-IP
11:20:57 - X1-SC1 1 xbrick718 1 10.82.78.50 10.82.78.50/24 10.82.78.1
11:20:57 - Running validations
11:20:57 - Disabling Notifiers
11:20:57 - Deactivating Storage Controller
11:21:17 - Powering-off Storage Controller
11:21:33 - Removing old Storage Controller
11:21:33 - Please disconnect Storage Controller X1-SC1 [1]
11:21:33 - Proceed when Storage Controller is physically removed
Please Enter "Done" (to proceed with) or "Abort" (to cancel) the command. (Done/Abort):
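When reviewing a saved copy of this output, it can help to see how long each stage took. The sketch below is only an illustration: it assumes each progress line has the shape "HH:MM:SS - message" as in the sample above and that all timestamps fall on the same day. The function names are hypothetical.

```python
from datetime import datetime


def parse_progress(lines):
    """Return (time, message) pairs for lines shaped like 'HH:MM:SS - message'."""
    steps = []
    for line in lines:
        stamp, sep, message = line.partition(" - ")
        if not sep:
            continue  # no separator: not a progress line (e.g. the Done/Abort prompt)
        try:
            when = datetime.strptime(stamp.strip(), "%H:%M:%S")
        except ValueError:
            continue  # the text before the separator is not a timestamp
        steps.append((when, message.strip()))
    return steps


def step_durations(steps):
    """Seconds between each step and the next one (same-day timestamps assumed)."""
    return [
        (message, int((nxt - when).total_seconds()))
        for (when, message), (nxt, _) in zip(steps, steps[1:])
    ]


# Progress lines copied from the sample output above.
sample = [
    "11:20:57 - Running validations",
    "11:20:57 - Disabling Notifiers",
    "11:20:57 - Deactivating Storage Controller",
    "11:21:17 - Powering-off Storage Controller",
    "11:21:33 - Removing old Storage Controller",
]

for message, seconds in step_durations(parse_progress(sample)):
    print(f"{seconds:>3}s  {message}")
```

For the sample above, this reports 20 seconds for deactivating the Storage Controller and 16 seconds for powering it off.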
Warning
Due to an issue discovered with VPLEX and a zeroed WWNN, do not connect the FC cables at this stage.
Warning
For large-scale environments (with many volumes) that have Native Replication (NR) configured, NR may be
automatically suspended and resumed during the SC replacement.
1. Ask the CE to physically insert the replacement SC.
2. Attach the following cables to the new SC:
01. IPMI
02. SAS
03. InfiniBand
04. Management (only if it is X1-SC1 or X1-SC2)
05. Tech Dongle (only if it is X1-SC1 or X1-SC2)
06. Power
3. Ask the CE to press the power button to power on the new SC, then wait 10 minutes to
allow the new SC to boot.
4. Run the attached script to fix the IB rules issues:
a. Upload the signed script using xmsupload to the XMS
b. Run the script with the following options:
--tech-password <password> - optional
--cluster-id <cluster-id>
--sc-id <id> - an ID between 1 and 8
Example:
xmcli (tech)> run-script script="sc_ib_rules_fix-v1.0-s4.0.0.py" arguments="--cluster-id=1 --sc-id=1"
11:58:05 - 2019-01-01 11:58:02,650 INFO Starting execution of sc_ib_rules_fix version 1.0
11:58:10 - 2019-01-01 11:58:03,845 INFO Cluster name: xbrick742-744 Cluster PSNT: XIO00182211064
11:58:10 - 2019-01-01 11:58:04,872 INFO Storage controller: X1-SC1
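Because the script only accepts an sc-id between 1 and 8, it can be worth validating the value before pasting the command into XMCLI. The helper below is a hypothetical sketch that assembles the argument string in the same form as the example above. The "=" form for --tech-password is an assumption, since the example does not show that option in use.

```python
def build_ib_fix_arguments(cluster_id, sc_id, tech_password=None):
    """Build the arguments string for the IB rules fix script.

    Raises ValueError when sc_id is outside the documented 1-8 range.
    """
    sc_id = int(sc_id)
    if not 1 <= sc_id <= 8:
        raise ValueError("sc-id must be between 1 and 8")
    parts = [f"--cluster-id={cluster_id}", f"--sc-id={sc_id}"]
    if tech_password is not None:
        # Assumption: the optional password takes the same '=' form as the other options.
        parts.append(f"--tech-password={tech_password}")
    return " ".join(parts)


def build_run_script_command(script_name, arguments):
    """Format the run-script line as it is typed at the xmcli (tech)> prompt."""
    return f'run-script script="{script_name}" arguments="{arguments}"'


print(build_run_script_command(
    "sc_ib_rules_fix-v1.0-s4.0.0.py",
    build_ib_fix_arguments(cluster_id=1, sc_id=1),
))
```

This only formats text; it does not contact the XMS or run anything on the cluster.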
Note: The CE must remain next to the cluster in case the cabling needs to be fixed while the
command is running.
Note: The process should take approximately 30 minutes.
11004-11005 - X2-SC1
3. Before running the replacement command, check TCP port connectivity with the peer node. For example,
to test connectivity with X4-SC1:
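As a generic illustration of a TCP reachability test, the following Python sketch tries to open a connection to each port and reports whether it succeeded. It is not the command used by this procedure. The host address must be supplied by the operator, the port numbers depend on the peer node, and the function names are hypothetical.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port to True (reachable) or False (not reachable)."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}


# Usage sketch (placeholder address; the port range 11004-11005 is the one
# listed for X2-SC1 above and is shown only as an example):
#   check_ports("<peer SC address>", range(11004, 11006))
```

A result of False only shows that the connection attempt failed from the machine running the check; it does not identify whether the cause is cabling, a firewall, or a service that is not listening.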