4.8 Basic Troubleshooting Tips For Fibre Channel (FC) SAN Issues - Mycloudwiki
Anil K Y Ommi
Errors can creep in at many points in a SAN, and misconfigured settings are a common
source of problems. A thorough understanding of the SAN configuration is needed to
troubleshoot any storage-related issue. Even a small mistake can lead to significant
data loss, with serious consequences for the organisation. Whatever the situation,
follow these tips as a starting point before moving on to advanced troubleshooting.
Other tools exist, but these basic first steps can save you time.
Latency
Zoning between two devices
fcping is available on most switch platforms, and also as a CLI tool for most
operating systems and some HBAs. It works by sending Extended Link Service (ELS)
echo request frames to a destination, which replies with ELS echo response frames.
For example:
# fcping 50:01:43:80:05:6c:22:ae
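WWPNs are often copied from host tools or spreadsheets in different notations, so it can help to normalise them before passing them to a command such as fcping. The following is a minimal sketch, not part of any vendor tool; the helper name normalize_wwpn is hypothetical, and it assumes the target is a 64-bit WWPN written as 16 hex digits with optional separators.

```python
import re


def normalize_wwpn(raw: str) -> str:
    """Return a WWPN as lowercase colon-separated hex pairs.

    Hypothetical helper for illustration; raises ValueError if the input
    does not contain exactly 16 hex digits (64 bits).
    """
    s = raw.strip().lower()
    if s.startswith("0x"):
        s = s[2:]
    # Remove common separators: colons, dashes, dots, whitespace
    digits = re.sub(r"[:\-.\s]", "", s)
    if not re.fullmatch(r"[0-9a-f]{16}", digits):
        raise ValueError(f"not a 64-bit WWPN: {raw!r}")
    # Re-join as eight two-digit groups
    return ":".join(digits[i:i + 2] for i in range(0, 16, 2))


print(normalize_wwpn("50014380056C22AE"))  # prints 50:01:43:80:05:6c:22:ae
```

A malformed address (for example, one with a missing digit) raises ValueError instead of being sent to the switch.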
fctrace
Another tool modeled on a popular IP networking tool (traceroute) is fctrace. It
traces the route/path to an N_Port. The following shows an example fctrace
command:
# fctrace fcid 0xef0010 vsan 1
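The FCID passed to fctrace is a 24-bit fabric address made up of three bytes: Domain ID, Area ID, and Port ID. As a rough sketch (the function name split_fcid is hypothetical), the address from the example above can be split into those fields to see which switch domain the target sits behind:

```python
def split_fcid(fcid: int) -> tuple[int, int, int]:
    """Split a 24-bit FCID into (domain, area, port) bytes."""
    if not 0 <= fcid <= 0xFFFFFF:
        raise ValueError(f"FCID must fit in 24 bits: {fcid:#x}")
    domain = (fcid >> 16) & 0xFF  # top byte: switch Domain ID
    area = (fcid >> 8) & 0xFF     # middle byte: Area ID
    port = fcid & 0xFF            # low byte: Port ID
    return domain, area, port


domain, area, port = split_fcid(0xEF0010)
print(f"domain=0x{domain:02x} area=0x{area:02x} port=0x{port:02x}")
# prints domain=0xef area=0x00 port=0x10
```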
If you know that your LUN masking and zoning are correct but the server still does not see
the device, it may be necessary to reboot the host.
5) Understanding Switch Configuration Dumps
Each switch vendor also tends to have a built-in command or script that gathers
configs and logs to send to the vendor's tech support group for analysis. The
output of these commands can also be useful to you as a storage administrator.
Each vendor has its own version:
Cisco – show tech-support
Brocade – supportshow or supportsave
QLogic – create support
The following example shows the error counters for a physical port on a switch:
admin> portshow 4/15
These port counters can sometimes be misleading. It is perfectly normal to see high
counts against some of the values, and it is common to see values increase when a server is
rebooted and when similar changes occur. If you are not sure what to look for, check your
switch documentation, but also compare the counters to some of your known good ports.
If some counters are increasing on a given port that you are concerned with, but they are
not increasing on some known good ports, then you know that you have a problem on that
port.
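The comparison described above can be sketched in a few lines: take two snapshots of the counters on the suspect port and on a known good port, then flag any counter that grew on the suspect port while staying flat on the good one. This is only an illustration; the function name, the counter names, and the sample values are made up, and in practice the numbers would be copied from two runs of your switch's port counter command.

```python
def suspicious_counters(suspect_before, suspect_after, good_before, good_after):
    """Return {counter: increase} for counters that rose on the suspect
    port but did not rise on the known good port."""
    flagged = {}
    for name, after in suspect_after.items():
        suspect_delta = after - suspect_before.get(name, 0)
        good_delta = good_after.get(name, 0) - good_before.get(name, 0)
        if suspect_delta > 0 and good_delta == 0:
            flagged[name] = suspect_delta
    return flagged


# Illustrative snapshots only, not real switch output
suspect_before = {"enc_out": 10, "disc_c3": 5, "frames_rx": 1000}
suspect_after = {"enc_out": 250, "disc_c3": 5, "frames_rx": 4000}
good_before = {"enc_out": 3, "disc_c3": 0, "frames_rx": 2000}
good_after = {"enc_out": 3, "disc_c3": 0, "frames_rx": 5000}

print(suspicious_counters(suspect_before, suspect_after, good_before, good_after))
# prints {'enc_out': 240}
```

Traffic counters such as frames_rx rise on both ports, so they are not flagged; only enc_out, which grew on the suspect port alone, is reported.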
Other commands show similar error counters as well as port throughput. The
following porterrshow command shows some encoding errors outside of frames (enc out)
and class 3 discard (disc c3) errors on port 0. This may indicate a bad cable, a bad
port, or another hardware problem:
admin> porterrshow