PowerStore 3.0 Administration - Troubleshooting - Participant Guide
Introduction
There are LEDs on the front of the base enclosure. The drives are in 2.5-inch (6.35
cm) carriers and are hot-swappable. Enable drive identification from the CLI or
PowerStore Manager to locate a specific drive.
To identify a particular drive in the rack, select BLINK LED from Hardware >
APPLIANCES > Appliance > COMPONENTS > DRIVES in PowerStore Manager.
Once the button is activated, it is relabeled STOP BLINK LED.
The node LED callouts include: Module Fault, Ethernet Link, Ethernet Activity, SAS Activity, Port Link, Node Fault, Node Power, and Unsafe to Remove.
Add I/O modules in pairs: One module in node A and one module in node B. Both
nodes must have the same type of I/O modules in the same slots.
The port link LEDs are below the ports. The power fault LED is on the right, below
the ejector handle.
The image below shows the AC Power Supplies from the back.
The node includes a Lithium-ion (Li-ion) internal battery that powers the associated
node during a power event.
The PowerStore 500T has an AC model and a DC model. This page only applies to
PowerStore 500T DC models.
The image below shows the DC Power Supply from the back.
The enclosure power and fault LEDs on expansion enclosures are slightly different
from the LEDs on the base enclosure. The drives are in 2.5-inch carriers and are
hot-swappable.
1: Enclosure power/fault LED (Amber: Fault; Blue: No fault)
2: Drive status/activity
3: Drive fault (Amber: Fault; Off: No fault)
The Dell Service Tag (DST) is a serialized label that allows Support to track
hardware in the field.
The DST is a black pull-out tag between the drives in slots 16 and 17.
The left side of the tag contains the service tag and the Express Service Code
information. The QRL code is on the right:
QRL pull-out
Info: An event has occurred that does not impact system functions. No action is required.
Minor: An error has occurred to be aware of, but it does not have a significant impact on the system. For example, a component is working, but its performance is not optimum.
Major: May have an impact on the system and should be remedied as soon as possible. For example, the last synchronization time for a resource does not match the time that its protection policy indicates.
Critical: An error has occurred that has a significant impact on the system and should be remedied immediately. For example, a component is missing or has failed, and recovery may not be possible.
Alerts about the overall cluster are visible on the Dashboard Overview tab or
under Monitoring > Alerts. Select alert headings to show only that type of alert as
shown here:
Alert Details
On the Dashboard Overview tab, click an alert title to see its details and the
actions to take. Focus on critical alerts first, and then major alerts. Minor and Info
alerts do not necessarily require any action. On the alert details slide-out panel,
acknowledge that the issue which triggered the alert has been addressed.
Copy and paste the event code into the filters on the Monitoring page's events view
to find all timestamps where this event has occurred, for further troubleshooting.
For example, filter on event code 0x00a00301.
The alert details slide-out panel includes Alert Information and Associated Events sections.
Monitoring
All alerts are listed on the Monitoring page. There are two ways to get to this
screen. Select Monitoring from the menu bar. Or, from the Dashboard Overview,
click VIEW ALL ALERTS.
1. Click any heading on this page to sort the table by that column. Click again to
reverse the sort order. Sort by Severity, Code, Description, Resource Type
or Name, Updated Date, or Timestamp.
a. In this example, the list has been sorted by severity to show the critical alerts
at the top.
2. Click the Add Filters button to add filters to the columns. Filtering limits the list
to show only those alerts that match the filters.
a. In this example, Last Updated has been filtered to show alerts that were
updated in the last 30 days.
b. Acknowledged alerts are not displayed on the list by default. To show
acknowledged alerts, use the Acknowledged filter.
3. Select one or more check boxes to acknowledge multiple alerts at once.
4. Click the description of any alert to view its details.
Component-Specific Alerts
The Dashboard Overview shows alerts for the entire cluster. Alerts can also be
found when they are specific to compute, storage, protection, and hardware. If
these objects have generated alerts, see the severity icon in the Alerts column.
Compute Alerts
From the Compute menu, select Virtual Machines to see alerts related to
each vVol-hosted Virtual Machine.
Storage Alerts
From the Storage menu, select Volumes, Volume Groups, Storage Containers, File Systems, or NAS Servers to
see alerts related to each of those specific storage objects.
Protection Alerts
From the Protection menu, select Remote Systems to see alerts related to each of the
PowerStore cluster's remote replication partner clusters.
Appliance Alerts
Select the Hardware menu to see alerts related to each of the PowerStore
cluster's Appliances and Front End Ports.
Map, unmap, clone, modify, delete, restore, refresh, and create snapshot.
Fault information
A: Most likely, there is a cabling or hardware fault causing the issue. The
installation of the ENS24 expansion enclosure occurs after the cluster is created
successfully.
The addition of the ENS24 expansion enclosure is blocked if:
4. Fan packs
5. Clock Distribution Boards (CDBs)
The ENS24 chassis can also be replaced, but this requires downtime for the
enclosure as well as the connected appliance if the enclosure is already in use.
Hardware that requires replacing at the SAM level usually results in a failover to the
other SAM in the enclosure.
Follow the normal process for replacing components for the NVMe expansion
enclosure components.
PSU temperature
Three or more SSDs exceeding temperature thresholds
Three or more fan rotor failures
The following components generate thermal threshold alerts, but only the SAM shuts
down when the threshold is surpassed:
SAMs
DIMMs
QSFPs
PCI switches
Traffic is redirected to the other SAM while the SAM is shut down.
PowerStore Alerts:
May or may not indicate a problem requiring user intervention. Focus on Critical
and Major alerts.
Can be sorted and filtered.
Includes details helpful in solving the problem:
Install and launch the PowerStore CLI client on a Microsoft Windows, Linux, or
UNIX system. PowerStore CLI sends commands to the system through the secure
HTTPS protocol using the PowerStore REST API.
On Windows, double-click the installer and follow the prompts. The default
installation location is:
– 64-bit systems: C:\Program Files\Dell EMC\PowerStore CLI
Overview
To run PowerStore CLI from an SSH session on the primary PowerStore node, first
connect to the service account. Once the connection is made, the steps are the
same for an SSH session or running from a workstation. Logging directly into the
admin account does not work.
Run PowerStore CLI against the PowerStore primary node (the node running the
management stack). Connecting to a nonprimary node causes the system to
display an error message: The system was unable to establish a secure
connection to the storage server. If the connection fails on all nodes,
the management stack may be down or unresponsive. Contact the support provider
for assistance.
If the PATH variable includes the installation directory, invoke PowerStore CLI from
Windows Command Prompt or UNIX or Linux shell with the pstcli command. If
the PATH variable does not include the installation directory, include the directory in
the command.
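For example, a minimal sketch from a Windows Command Prompt, assuming the default 64-bit installation path shown above (the management address is a placeholder):

cd "C:\Program Files\Dell EMC\PowerStore CLI"
pstcli -d 192.0.2.10 -u admin -session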
The PowerStore CLI client contacts the server that issued the certificate—the
trusted certificate authority (CA)—and confirms the validity of the certificate before
proceeding. When the certificate is verified, PowerStore CLI and its backend server
establish the connection, and begin to exchange data.
Command Syntax
Entering only pstcli, without any options or commands, lists the options and their
descriptions.
Syntax options:
-u username
The username on the PowerStore system, such as admin.
-d hostname or ipaddress
Name or address of the primary node. Use localhost for the primary node,
such as when using an SSH session to the primary PowerStore node.
[-p password]
Optional: password for the specified account. If the password is not on the
command line, the system prompts for it, as shown in the examples.
[-session]
Optional: Opens a persistent session with the cli> prompt. Without it, enter a
single command to run; control then returns to the workstation shell.
[-ssl value] Optional. Where value is one of the following:
Command Examples
For example:
Starting a session
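A sketch of starting a session, with a placeholder address; because -p is omitted, the system prompts for the password:

pstcli -d 192.0.2.10 -u admin -session
Password:
cli>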
The PowerStore CLI (pstcli) filtering feature filters fields with different
datatypes and array values. Filtering is supported in both session mode and
single-action mode. The syntax is similar to a SQL WHERE clause.
Limitations:
Basic Help
Get help on any command using the help command. Commands are divided into
categories. Get a list of the commands in a category, or a list of all the commands
with the help all command. For example, help all shows that there are
commands named alert and event under Monitoring.
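A minimal sketch of these help commands at the session prompt:

cli> help
cli> help all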
Command Help
The command help alert shows that there is an alert show command, with
an option of -id. Get more information by using help alert show. Note the
syntax that is shown for the highlighted show subcommand. The -id option goes
before show. All other options follow show.
Note the highlighted -limit option. The default limit is 100 records.
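A sketch consistent with the syntax described above (the alert ID is a placeholder; -id goes before show, while options such as -limit follow it):

cli> help alert show
cli> alert -id <alert_id> show
cli> alert show -limit 100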
Tab Key
When in "session mode," use the keyboard Tab key to display the options.
For example, when connected in "session mode," press the Tab key to display all
the available commands. Type volume and press Tab it displays all the options.
Select one option and press Tab to display the sub-options.
Alert Help
To show all alerts on the system, use the alert show command. For example,
Windows clients may be having problems authenticating. An alert states that a
node has changed life cycles. To show the details about this specific alert, select
and copy the id field from the display.
The show subcommand is after the -id option. This output shows that the node is
now healthy. If there were a problem to correct, the repair_flow_l10n would
show a suggestion of how to correct the problem. Since there is no problem, the
repair_flow_l10n and system_impact_l10n fields are empty.
Sorting by Fields
The default format for show commands is table. The width of the columns is limited
due to the width of your window. Note the hyphen (-) after -sort
NVP Format
Instead of the default table, show events or alerts are in name-value pair (NVP),
comma-separated values (CSV), or JavaScript Object Notation (JSON). NVP
format shows the full value of each field name and its value, since it uses one
field per line. To display specific fields and severities of events, use the options
that are available for alerts and events.
Here are four events showing specific fields in NVP format, sorted by
severity in descending order:
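A hedged sketch of a similar command, shown with a plain ascending sort for simplicity; -select, -sort, and -limit are documented above, while the -output flag for choosing NVP format and the exact field names are assumptions:

cli> event show -select id,severity,description_l10n -sort severity -limit 4 -output nvp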
CSV Format
Output the same data to CSV format for use with a spreadsheet, such as Microsoft
Excel. PSTCLI does not allow redirection of output the way Windows command
prompt and UNIX or Linux shells do. To output to a CSV file on your client system:
1. Exit from session mode back to the Windows command prompt or UNIX or
Linux shell.
2. Run PSTCLI from the prompt, providing destination, username, password, and
selection fields. Omit the -session option. Redirect the output to a file. Note: If
the password is not on the command line, the system displays a blank line and
waits. Enter the password to continue.
PSTCLI exits after the command runs.
3. View the .csv file in the client system directory.
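A hedged sketch of step 2 from the OS shell; the address, credentials, and field names are placeholders, and the -output flag for CSV format is an assumption:

pstcli -d 192.0.2.10 -u admin -p <password> event show -select id,severity -output csv > events.csv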
Events are similar to alerts. An alert is a summation of one or more events that
need or needed attention. Many events do not generate an alert, so there are more
events than alerts. The same system could have 100,000 events, so scrolling
through them may be impractical.
Use the help event show command to list the syntax of the command. It uses
most of the same options as the alert show command. The highlighted options,
-sort and -select, can be used with the fields listed. Use -limit <value> to
control the number of events that are displayed.
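For example, a minimal sketch using the options named above:

cli> help event show
cli> event show -limit 20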
NVP Format
Like alerts, events can be displayed in the more readable NVP format.
Here are the first four events, showing the most important fields in NVP format
sorted by severity:
Hardware Help
To display the list of available hardware command options, use the help
hardware command. As with other commands, select which fields to display and
sort by most fields. Here is a partial list of the help for that command:
...
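A minimal sketch, assuming hardware follows the same show pattern as alert and event:

cli> help hardware
cli> hardware show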
If there appears to be a problem with the network, show a listing of the network
status. As with other commands, there are many options available. Use the help
network command to see the list of options.
Network Help
There are options to change and show network settings. Note the multiple options
for ip_pool_addresses highlighted below.
Network help
Network show
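A minimal sketch, matching the captions above:

cli> help network
cli> network show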
license show
For example:
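At the cli> prompt (output not shown):

cli> license show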
When a trial license expires, PowerStore switches into Read-Only mode, but
allows a one-time extension of the trial period for an additional seven days.
When a second extension is requested, a dialog indicates that no additional trial
extension is possible. Install a permanent license.
Deep Dive: Certificates can also be managed using REST API. For
more information, see the REST API Reference Guide on
PowerStore: Info Hub
To view the status of SMB or NFS volumes set up on a PowerStore T system, use
the commands that are shown here.
If there are NAS servers running on the PowerStore T system, list them with the
nas_server show command.
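For example, at the session prompt (the output and server names vary by system):

cli> nas_server show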
One server is intended for NFS while the other is intended for SMB. Each server
uses its own IP address.
A NAS server is degraded when a node on the NAS appliance is unavailable and
all NAS servers are hosted on the remaining node. A degraded NAS server is
displayed as follows:
The smb_server show command displays any SMB servers and their
corresponding domain or workgroup and NAS servers.
The smb_share show command displays SMB shares. In this example, only one
share exists on the system. Add -help to either command for a complete list of
options.
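A minimal sketch of the commands described above:

cli> smb_server show
cli> smb_share show
cli> smb_share show -help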
To list all the file systems, use the file_system show command:
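For example:

cli> file_system show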
To see what other fields are available to select with this command, type
file_system show -help.
Selecting option [1] opens the session. The system prompts again on the next
login.
Selecting option [2] closes the session.
Selecting option [3] opens the session. The system does not prompt for the
certificate on subsequent logins.
PowerStore CLI:
Sends commands to the system through the secure HTTPS protocol using the PowerStore REST API.
Is downloaded from the documentation website.
Can be used to perform many functions similar to the functions in the
PowerStore Manager.
Can show hardware, network, and NAS properties.
Can show alerts and events.
Service Container
Use commands to get basic information about the system from the Service
Container CLI. The CLI uses a restricted version of the Linux Bash shell. Only
certain commands are allowed using the service account.
There are two CLI programs available for managing PowerStore systems:
Service Container
PSTCLI
To access the Service Container, use a terminal emulator to connect to one of the
management addresses or hostname using SSH on port 22. This example uses
PuTTY:
1. Open PuTTY.
2. Connect to one of the management addresses or hostname using SSH on port
22:
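Equivalently, from a command-line SSH client; the address is a placeholder, and the account name service is assumed from the service credentials mentioned in this module:

ssh service@192.0.2.10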
svc_diag Command
PowerStore Service Container commands run scripts of the same name. The svc_diag
command (script) displays many different pieces of system information, depending
on which subcommands and options are used.
Command Syntax
Add an action subcommand, such as list, from basic help to get more specific
help.
To get the Node ID, Appliance Name, Service Tag, Model, IP address, and other
information, use the svc_diag list --info command. This command only
displays information about the current node.
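For example:

svc_diag list --info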
In the example, the cluster has one model 1000T appliance, which has an ID of A1.
The service tag for the appliance, 3BP42W2, is the same as that for the cluster
displayed on the previous tab.
To list fault status for various hardware components, use the command: svc_diag
list --hardware --sub_options fault_status.
Script Help
svc_networkcheck -h parameters
Connectivity - Ping
Validate DNS
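As the captions above indicate, help for the script is available with the -h parameter:

svc_networkcheck -h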
The svc_tcpdump script attempts to open a given interface and runs a tcpdump
on it. The tcpdump command can be configured to save outputs to files.
Script Help
Help is available for the svc_tcpdump service script with the -h parameter.
svc_tcpdump parameters
svc_tcpdump -D
The svc_tcpdump -D command prints to the screen the list of available network
interfaces on the system on which tcpdump can capture packets.
Advanced uses of svc_tcpdump
Advanced uses of svc_tcpdump include writing a dump file and filtering collected
tcpdump traffic.
If using values other than the default 20 MB and five rotations, the dump file size
(-C ##) and dump file rotations (-W #) must be >= 1.
Secure Copy (SCP) can be used to copy dump files to an external host for
analysis.
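A hedged sketch of an advanced capture; -D, -C, and -W come from the text above, while -i and -w are assumed to follow standard tcpdump conventions (the interface and file names are placeholders):

svc_tcpdump -D
svc_tcpdump -i <interface> -w <dump_file> -C 20 -W 5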
Filter collected tcpdump traffic:
Limitations
Scripts only operate within a service shell session on the local node.
SSH into a peer node or a node of a different appliance to run scripts there.
Limits apply to some options.
For example, ping only supports MTU 1500 or 9000.
NAS and File use their own scripts.
For example, svc_nas run nas_svc_tcpdump <options> to run
tcpdump on a NAS container.
There are some limitations when running in Service Mode (due to a failure or
svc_rescue_state being set).
For example:
o svc_networkcheck: tcp_port_check does not allow access to
some ports.
o svc_networkcheck: tracert could be inconsistent.
If the PowerStore system is running, but the UI is inaccessible, here are some
troubleshooting tips.
Service commands are available using SSH and logging in with the service
credentials.
This status shows that the Control Path is running, but it may be unresponsive
or in an error state. Go to the next step to dig deeper.
2. Check the journal logs with the svc_journalctl command with the -f (or
--follow) option, which keeps the command active so that new log entries are
displayed as they occur. Use the -g (or --grep) option to filter the command to
see only st_io_monitor status:
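For example, combining the options described above:

svc_journalctl -f -g st_io_monitor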
Restarting CP
Use the commands shown here to collect PowerStore log files. For example, use
them to check the status of the PowerStore T NAS service.
To generate a specific log file, use the svc_dc run -p profile command,
where profile is one of the following:
All normal and detailed bundles include the NAS logs. If you only need the NAS
logs, you can use the NAS profile:
Important: Since the job may take a long time to run, add an
ampersand (&) to the end of the command to run the job in the
background. This action allows you to run other commands while
the job runs. The system notifies you when the job completes, or
you can use the jobs command to check the status.
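For example, combining the nas profile with the background ampersand described in the note:

svc_dc run -p nas &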
2. Since the svc_dc run command specified a profile of nas, look for a bundle
with that profile. To list and find your bundle, use the svc_dc list command:
3. Use the ls -l command to list the files in the directory in a long format, and
look for a file with a pointer to 191cc8a6-baf7-4ac0-861d-fe408ddc1642:
ls -l command
The l in the first column of each row means that the file is a symbolic link, or
pointer, to a file in the /cyc_var/cyc_service/data_collection folder,
indicated by the -> sign.
Note that each line wraps to the line below, so the id spans two lines.
Use an application, such as WinSCP, to download the file to a system where you
can examine the contents. If the application does not accept symbolic links,
copy the original file from the subfolder of
/cyc_var/cyc_service/data_collection to the home directory. Give
the new file a different name.
SDNAS Container
PowerStore has a docker based architecture with containers like Control Path,
Data Path, Serviceability, SDNAS. Access the SDNAS container to collect logs and
run scripts in SDNAS container. Root privileges must be gained in order to access
the SDNAS container.
SDNAS scripts are made available to run within the Serviceability container using
SSH tunneling, which provides:
More flexibility
Simplified workflow
Easier troubleshooting
Service users can run a specific set of NAS scripts without root access.
To view the list of commands that can be run as a service user, use the svc_nas
list command.
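For example:

svc_nas list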
When a node is removed from an appliance, the power is removed from half of the
NVMe NVRAM drives.
The NVRAM drives connected to the battery backup module of the removed
node lose battery backup.
If the peer node is also removed, data loss can occur.
Power is removed from the NVMe NVRAM drives, which blocks their ability
to persistently store data.
All writes in DRAM cache memory within the node are lost.
PowerStore creates virtual drives to provide redundancy to the NVMe NVRAM
drives while a node is removed.
Either one or two virtual drives are created, depending on the number of
NVRAM drives in that model.
PowerStore 1000-3200 models have two NVRAM drives per system: one
virtual drive is created.
PowerStore 5000-9200 models have four NVRAM drives per system: two
virtual drives are created.
The virtual drives have reserved space, equal to the size of the NVRAM drives,
within the fsck reserved space on the data drives.
There is no reduction in the amount of space for data.
Data is mirrored between the NVMe NVRAM drives and the virtual drives.
The process that creates the virtual drives and mirrors the data finishes within
30 seconds.
The diagram shows Cluster 1 with Appliance 1 (Node A and Node B) and a NAS server hosting file systems FS1 and FS2 on NAS volumes V1 and V2.
Since File System 01 is offline, SMB shares and NFS exports related to File
System 01 are also offline.
Serviceability:
Troubleshoot PowerStore by connecting to the system over SSH or PowerStore
Manager over the Service LAN Ports.
Use SSH connection to connect to the PowerStore T nodes when the system is
in an unknown or bad state to troubleshoot issues.
While troubleshooting the PowerStore X, go to the service container with nc
localhost 50000.
Service containers:
Access PowerStore Service Container using SSH to the management node.
Run PSTCLI from Service Container.
Collect many types of configuration and log information using Service
Container.
In PowerStoreOS 3.0, network service scripts start earlier in the startup
process.
The svc_networkcheck service script validates network connectivity and
other network services.
The svc_tcpdump script attempts to open a given interface and runs a
tcpdump on it.
Scripts only operate within a service shell session on the local node.
Certain commands are restricted from the service account and can only be
accessed through the root account. Accessing the root account is referred to as
inject root.
Root privileges must be gained in order to access the SDNAS container.
Service users can run only a specific set of NAS scripts without root access.
When a file system goes offline, the following items are also placed offline:
Snapshots, clones, SMB shares, and NFS exports.
The data collections are critical to Support personnel, who can diagnose and repair
issues on PowerStore systems.
Data collections can be run manually using PowerStore Manager or the svc_dc
commands.
To collect journal logs from PowerStore Manager, select Settings > Support >
Gather Support Materials:
The Support Materials Library displays the existing compressed collections of ~220
MB log files.
Data is collected at the appliance level. You may include additional information in
the data collection by selecting the Advanced support materials collection
options. The Advanced option collects additional system information and requires
additional space on the node. Advanced collections can take longer than the
default data collection. The system runs one collection job at a time. The video
below describes how to Gather Support Materials.
Procedure:
Select Send to Support if you want the system to automatically send the
collection to Support when the job completes. This option is only enabled when
remote support through Secure Remote Services is enabled on the system. The
collection may be sent to Support later.
Select Recent Jobs to monitor the support collection job. When the collection
job completes, the system posts the job information, including its status, on the
Support Materials page.
Alerts - Remediation
Procedures:
In addition to the Settings > Support Materials page, you can run Gather
Support Materials from the Hardware page. The example shows a data collection
being done on PowerStoreDemo-appliance-2.
Procedure:
From the Hardware page, select the appliance for the data collection.
From the MORE ACTIONS drop-down, select Gather Support Materials.
From the Gather Support Materials page, provide a Description.
Select the icon to expand Advanced support materials collection options, if
needed.
Check the box to Include additional information.
Select the START button.
Once data collection completes, you can DOWNLOAD the file to a host, SEND TO
SUPPORT or DELETE it.
Select the checkbox for the data collection files, and then select the option that you
want. The SEND TO SUPPORT option is only available if the SupportAssist page
is configured. The video below demonstrates how to download support materials to
a Windows host.
The example shows the download process. The Initial Configuration data
collection includes files from both appliances. Once the files are selected, they are
downloaded to the default Downloads folder on the localhost. The output is a
.tgz file. Optionally, the file can be exported as either a .csv or .xlsm file to the
same Downloads folder.
1. Select the file.
2. Select DOWNLOAD.
3. Select the file to download if both service and dump files are available. There
could be dump files or regular system collect files.
Three journal files, each approximately 220 MB, cover the periods 1:31 to 4:23, 4:24 to 7:51, and 7:52 to 9:16.
PowerStore automatic daily log collections only contain data since the last
successful automatic collection. For example, if day one's data is 1.8 GB, day
two's is 1.6 GB, and day three's is 1.9 GB, the three incremental log files consume
5.3 GB over three days. If each collection instead contained everything since day
one, the three files would consume 10.5 GB (1.8 GB + 3.4 GB + 5.3 GB).
The svc_dc (data collect) command set is used to manage data collections on the
PowerStore. All commands can be found in the PowerStore Service Scripts Guide.
Service commands are available from the Service shell using SSH, Port 22, and
logging in with the service credentials. The service shell is restricted and limits the
ability to run certain commands.
svc_dc run
The svc_dc run command generates a new data collection on the local
appliance using the default profile. Each appliance in a cluster generates its own
data collection archive and stores it locally on that appliance.
svc_dc list
The svc_dc list command lists all data collections, or details for one collection.
Each data collection is identified by an ID. Use the ID in the command to get details
of the collection. For example, svc_dc list 949af06e-7896-4b36-9f29-d3871a13ad6.
svc_dc download
The svc_dc download command downloads a data collection file to a Linux
host. The command requires the IP address, path, username, and password of
the remote Linux host. If the PowerStore data collection contains several files,
select the file by entering the Index number of the file.
The file transfer uses the SCP protocol on TCP port 22. While the file transfer is
underway, there is no indication of progress; the transfer could take several
minutes to an hour to complete.
In the example, the file that is indicated by Index 10 is downloaded to the Linux
host into the /opt directory.
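A minimal sketch; the command then requires the remote host details and the Index number described above, which are not shown here:

svc_dc download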
svc_dc delete
Since data collection requires space, delete files that are no longer needed. The
svc_dc delete command displays a list of files on the PowerStore. Delete any or
all of them.
The example shows the file with Index 1 being deleted from the PowerStore
database.
This interaction performs data collection from the service account CLI.