Information Collection and Basic Troubleshooting Procedure EN V1.0

Information Collection and Basic Troubleshooting Procedure for Customer Issues

1. Basic information collection
2. Device auto reboot
3. Service interruption
4. Function module failure
4.1 NAT issue
4.2 policy issue
4.3 Route issue
4.4 DHCP server issue
4.5 IPSEC VPN issue
4.5.1 Multi-Entry network IPSEC fail in establishment
4.6 AAA, WEBAUTH, SNMP and LDAP issue
4.7 HA issue
4.8 AV signature database cannot update
4.9 Scvpn data disconnection
4.10 App recognition errors issue
4.11 Application layer function failure (ALG, URL filtering, etc.) issue
4.12 SMS modem does not work
4.13 L2TP cannot dial-in
5. Performance Issue
5.1 CPU with high utilization
5.2 Slow to login to device administration
5.2.1 Network Quality
5.2.2 Core0 with high utilization
5.2.3 Hardware Issue
5.3 Device out of memory
6. Slow to access some service, service abnormal
6.1 Cannot access some websites
6.2 Slow to access some services
7. HSA/HSM typical problems
7.1 HSA reject log
7.2 HSA disk filled
7.3 HSM registration is abnormal
8. Hardware Failure
8.1 Power supply failure
8.2 Fan failure
8.3 Interface or Switch chip failure
8.4 Memory failure
8.5 Device Crystal Oscillator failure

1. Basic Information Collection


1. When a device fails, collect its basic information with the following commands:
show tech-support tofile // Information collection has been enriched since versions
5.0R3P4 and 5.0R4. After running it, export the result from “Export System
Debugging Information” in the WebUI
show cpu detail
show session generic
show cpu-cntr
show dp-r memory, show dp-r packet-buffer, show dp-r packet-descriptor
show snat resource
show log event | include SNAT
show logging alarm file
show log security
show environment
Notice: These are all frequently used “show” commands. The information to collect
varies from case to case, and examples are given in the later sections. The general
principle is to collect the overall resource state of the failed device and the detailed
information of the function module involved.
2. Collect the user's network topology, marked with the Hillstone device model, the
upstream and downstream devices, and the IP and MAC information.
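
The collection list above can be scripted so that nothing is skipped under pressure. The following is a minimal sketch, not a Hillstone tool: `collect_basic_info` and `run_command` are hypothetical names, and `run_command` stands for whatever the engineer already uses to send one CLI command to the device and return its output (an SSH or console wrapper, for example).

```python
from typing import Callable, Dict, List

# Commands taken from the collection list in this section.
BASIC_COMMANDS: List[str] = [
    "show tech-support tofile",
    "show cpu detail",
    "show session generic",
    "show cpu-cntr",
    "show dp-r memory",
    "show dp-r packet-buffer",
    "show dp-r packet-descriptor",
    "show snat resource",
    "show log event | include SNAT",
    "show logging alarm file",
    "show log security",
    "show environment",
]


def collect_basic_info(
    run_command: Callable[[str], str],
    commands: List[str] = BASIC_COMMANDS,
) -> Dict[str, str]:
    """Run every command through the caller-supplied transport.

    A failure on one command is recorded and does not stop the rest,
    so a partial collection is still saved.
    """
    collected: Dict[str, str] = {}
    for command in commands:
        try:
            collected[command] = run_command(command)
        except Exception as error:  # keep going; note what failed
            collected[command] = f"<collection failed: {error}>"
    return collected
```

The returned dictionary keeps the commands in the order they were run, so it can be written to one file per case and attached to the ticket together with the topology information.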

2. Device Auto Reboot


Device auto reboot means that the device suddenly reboots on its own, without any
manual operation, after it has been connected to the network and running for a while.
It interrupts service, but service usually recovers automatically after the reboot.
1. Export the tech-support files: log in to the WebUI of the firewall and click [System] ->
[Tool] -> [Export System Debugging Information]. On an SX system, select the
corresponding sub-card to export.
2. If the dump file cannot be exported through step 1, use “show tech-support ex” to
check whether the device has a dump file. If it does, apply for the shell password,
enter the shell, and export the file to FTP. If it does not, first rule out human
factors, power voltage, and temperature as the cause of the reboot, then collect the
following information.
3. show logging alarm file (“alarm” became an undocumented command in version 5.0R4)
show logging alarm
show tech-support dmes
show tech-support console/fconsole
show tech-support controller
show tech-support tofile, then export the corresponding information from
“Export System Debugging Information” in the WebUI
4. Collect the output of “show tech-support” separately and save it.

3. Service interruption
A service interruption can have many causes, so it must be analyzed from several
angles. The basic procedure is to check network connectivity first, then the firewall
configuration, then device and function status, and finally to debug.
1. Ping the upstream gateway and the downstream device from HS
2. Ping the client from HS
3. Ping the server from HS
4. Ping the HS interface IP from the client
5. Ping the server from the client
6. Run traceroute from the client to judge whether the route is correct
7. Check whether the basic network configuration of the firewall has changed, such as
policy, NAT, route, ARP, and MAC.
8. Check for a wrong configuration or a false positive of a function with show session
limit and show qos
9. Check whether resources are normal with show cpu detail / show cpu-cntr / show dp-r
packet-buffer / show session gen / show snat resource
10. Check whether the session is established correctly with show session src-ip x.x.x.x
or show session dst-ip x.x.x.x. Also check whether the address after NAT and the
egress interface are correct (use show ip route interface to check the interface ID),
and check show arp
11. debug dp filter src-ip x.x.x.x or debug dp filter dst-ip x.x.x.x
12. debug dp basic, debug dp drop, debug dp snoop
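
Steps 11 and 12 are typed repeatedly during an outage, and a mistyped address silently filters nothing. The sketch below only builds the command strings named in those steps and validates the address first. The function name `debug_filter_commands` is hypothetical, and restricting the filter to IPv4 is an assumption based on the x.x.x.x notation used here.

```python
import ipaddress
from typing import List


def debug_filter_commands(src_ip: str = "", dst_ip: str = "") -> List[str]:
    """Build the debug commands from steps 11 and 12 for one flow.

    Raises ValueError if no address is given or an address is malformed,
    so that a typo is caught before debugging is enabled on the device.
    """
    commands: List[str] = []
    for keyword, value in (("src-ip", src_ip), ("dst-ip", dst_ip)):
        if value:
            # IPv4Address raises a ValueError subclass on bad input.
            address = ipaddress.IPv4Address(value)
            commands.append(f"debug dp filter {keyword} {address}")
    if not commands:
        raise ValueError("give at least one of src_ip or dst_ip")
    # The filter goes first so that the debug output is limited to this flow.
    commands += ["debug dp basic", "debug dp drop", "debug dp snoop"]
    return commands
```

Setting the filter before enabling the other debug switches keeps the output limited to the flow under investigation.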

4. Function module failure


4.1 NAT issue
1. Use “show snat resource” and “show snat id x resource detail” to check the current
NAT resource utilization
2. Use “show log event | include SNAT” to check whether SNAT resources have been
exhausted. On SX devices, check with step 1 instead.
3. Use “show snat” to check the current SNAT configuration
4. show dp-config nat
5. Use “show dnat rule” to check the current DNAT configuration
6. Use “show dnat server” to check the status of the monitored servers
7. Use “show session src-ip x.x.x.x” or “show session dst-ip x.x.x.x” to check whether
the NAT-translated address is correct
8. debug dp filter src-ip x.x.x.x or debug dp filter dst-ip x.x.x.x
9. debug dp basic, debug dp nat, debug dp drop

4.2 policy issue


1. Use “show policy” to show all policies configured through the CLI
2. Use “show policy all” to show all security policies on the device (including the NBC
policies added in the WebUI)
3. Check whether application identification and ALG are enabled
4. debug dp filter src-ip x.x.x.x or debug dp filter dst-ip x.x.x.x
5. Use “debug dp basic” to check the in and out zones, the IP addresses, and the next
interface of the flow
6. Use “debug dp policy lookup” to check the policy ID matched by the current flow
7. Use “debug dp drop” to check the dropped packets
8. Use “debug dp snoop” to check the layer 2-4 information of the packets
9. Use “show dp-r me” and “show dp-config policy” to check device memory utilization
10. A problem of mismatched PPS has occurred before for an unknown reason; upgrading
to 5.0R3P8 or a higher version is suggested

4.3 Route issue


1. Use “show ip route” to show routing table
2. Use “show fib” to show fib routing table
3. Use “show interface” to show interface information
4. Check whether the corresponding interface is a backup interface, and check the track
status (failed or not)
5. Use “show ip route x.x.x.x” to check the route status of the destination address
6. debug dp filter src-ip x.x.x.x or debug dp filter dst-ip x.x.x.x
7. Use “debug dp basic” to check the in and out zones, the IP addresses, and the next
interface of the flow
8. Use “debug dp route” to check the detail route debug information

4.4 DHCP server issue


1. Use “show dhcp-server pool” to check the DHCP ip address pool
2. Use “show dhcp-server statistics” to check the statistics of packets received and
sent by the DHCP server
3. Use “show dhcp-server binding” to check the IP address, MAC address, and lease
assigned to each DHCP client
4. Use “show logging event” to check the DHCP address assignment logs
5. Use “debug dhcpd serve” to check the debug information of DHCP server
6. Use “debug dhcpd relay” to check the debug information of DHCP-relay

4.5 IPSEC VPN issue


1. Use “show isakmp sa” and “show ipsec sa” to check the SA negotiation status of the
two phases
2. Use “show isakmp peer” and “show tunnel ipsec auto” to check the ISAKMP and tunnel
configuration
3. Use “show interface bind-tunnels tunnelx” to check the tunnel binding status of the VPN
4. Use “show log event” to check the logs of the VPN negotiation
5. Use “debug filter” together with “debug dp basic” and “debug self” to verify whether
there is a NAT device between the two peers and whether their IP addresses are changed
6. Enable the NAT-T function if a NAT device is on the path
7. Use “debug vpn” to analyze the IPsec negotiation procedure; use “debug dp basic” and
“debug self” to check whether the negotiation messages are sent out from the correct
interface
8. Use “debug dp basic” and “debug dp ipsec packet” to check the encryption and
decryption of packets
9. Check whether the route and policy have been added correctly.

4.5.1 Multi-Entry network IPSEC fail in establishment


1. Check whether the basic IPsec VPN configuration is correct
2. Use “debug vpn” to find out why the negotiation messages from the peer are not
received. Ping the peer address to test basic network connectivity, and check
whether there is a NAT device between the VPN devices
3. Use “show ip route” to check whether the route to the peer is correct.
4. Use “show session tunnel” or debug to check whether the messages are sent out from
the correct interface
5. If the current route selection is correct, clear the tunnel session so that negotiation
starts again. An interface going down and up usually causes a wrong route selection,
and the session is not rematched after the route recovers, so the messages keep
being sent to the wrong interface.
6. Between different ISPs, VPN negotiation messages may be blocked in transit. If
configuration and route selection problems have been ruled out, suggest that the
user contact the ISP to resolve the problem. Capturing packets through port
mirroring can also verify whether the messages reach the peer.
7. Upgrading a multi-entry VPN environment to 5.0R3P4 or a higher version is suggested

4.6 AAA, WEBAUTH, SNMP and LDAP issue


1. Use “show auth-user” to check the authenticated users
2. Use “show aaa-server” to check the configured AAA servers
3. Use “show user aaa-server” to check the users synchronized from or configured on the
AAA server
4. Use “show webauth” to check the webauth configuration
5. Use “show snmp-server” to check the SNMP configuration
6. Use “debug aaa” to check the debug information of local, RADIUS, and LDAP
authentication
7. Use “debug webauth” to check the webauth debug information
8. Use “debug snmpd” to check how the SNMP process receives and transmits packets
9. If the network management software cannot monitor the device via SNMP, try
connecting to the firewall with a MIB browser and loading the information
10. If the device cannot read Windows AD users through LDAP, try an LDAP browser
first, and pay special attention to how the member attribute, group class, and
naming attribute are written
11. If user information cannot be synchronized from AD, try switching to plaintext.
Also check whether the inter-domain authentication of the AD server is configured
as certificate authentication. Only password authentication is supported for now, so
the inter-domain authentication of the AD server needs to be disabled.

4.7 HA issue
1. Use “show version” to check whether the two HA devices have the same version,
hardware model, license, and board cards, and whether the same functions are
active, such as VRouter and policy mode.
2. Compare the configuration files of the two devices and check for configuration such
as mac-cone.
3. Use “show ha group 0” to verify the basic HA group information on both HA devices,
and check whether the HA priority, monitoring, preemption, and peer information are
correct and complete.
4. Use “show ha link status” to check the HA heartbeat link information
5. Use “show ha flow statistics” to check the transmission status of main packets and
sub-packets.
6. Use “show ha protocol statistic” to check the sending and receiving of heartbeat
packets between the two HA devices.
7. Ping the heartbeat IP of the peer to check whether the heartbeat is normal; also
check for an IP conflict on the heartbeat interface.
8. Use “show ha sync statistic” to check the synchronization statistics of the HA devices
9. Use “show ha protocol statistic” to check the hello messages and other related
information of the HA devices
10. Use “show interface” to check whether the MAC address of the HA interface is correct
11. Use “show ha cluster” to check whether the two devices have the same HA cluster
12. Check whether the configured HA track interval is too short or the threshold is too
small. (The recommended interval is longer than 3 seconds.)
13. Use “debug ha” to check the HA state, negotiation status, and file synchronization.
14. Check the MAC and port learning on the upstream and downstream switches or routers
15. Capture packets to analyze the sending of gratuitous ARP
16. If device management is abnormal after an HA switchover, enable gratuitous ARP
sending for the corresponding zone, then force an HA switchover again so that the
upstream and downstream devices update their ARP entries
17. Check whether the preemption and hello timers are appropriate; setting them
somewhat longer is suggested
18. If the downstream device is old and cannot send gratuitous ARP after an HA
switchover, enable arp-passive-learning under flow mode on the Hillstone side.

4.8 AV signature database cannot update


1. If the update cannot be done from the web page, try updating from the command line
first.
2. If the command line still cannot update the signature database, download the
signature database from the online server to a local machine and update via FTP.
3. If the FTP update fails, check the device SN to verify whether the device platform
has been switched
4. If it has not, delete the AV signature files on the device first and then update. The
deletion method is as follows:
Enter the shell
~ # ls -lh /flash/av/sig
total 102662
-rwxr-xr-x 1 root 0 140 Jul 30 14:51 av_sig_desc.xml
-rw-r--r-- 1 root 0 2.6M Jul 30 14:51 compiled_fishing_sig
-rw-r--r-- 1 root 0 532.4K Feb 18 15:47 compiled_malip_sig
-rw-r--r-- 1 root 0 57.4M Jul 30 14:51 compiled_malware_sig
-rw-r--r-- 1 root 0 27.6M Sep 29 2013 compiled_mdb_sig
-rw-r--r-- 1 root 0 12.1M Jul 30 14:51 compiled_sig
~#
~ # cd flash/av/sig
/wr_point/etc/mnt/mtdblock2/av/sig # ls
av_sig_desc.xml compiled_malip_sig compiled_mdb_sig
compiled_fishing_sig compiled_malware_sig compiled_sig
/wr_point/etc/mnt/mtdblock2/av/sig # rm av_sig_desc.xml
... (Delete all 6 files)

5. If the update via FTP still fails after deleting the AV signature files, describe the
operation process and submit it to the Research & Development team for analysis.

4.9 Scvpn data disconnection


1. If SCVPN login fails, refer to the documents “scvpn Debugging instructions” and
“The General Troubleshooting process for SCVPN client login issue”


2. If data still does not pass after a successful SCVPN login, collect the following
information:
- The debug log on the client side
- The output of “debug dp basic”, “debug dp drop”, “debug dp ipsec”, and “debug dp
tunnel” on the device, with packet capture running on the computer at the same time.
3. Check the captured packets and verify whether the checksum of the UDP packets sent
from source port 4433 of the device is 0. If it is not, the packet has been modified
by a NAT device on the path, so suggest that the user connect to the public network
directly.
4. If the checksum is correct, submit the debug output and the packet captures to the
Research & Development group for analysis
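
The checksum test in step 3 can be applied to the raw bytes of a captured UDP header. A minimal sketch follows, assuming the 8 header bytes have already been extracted from the capture; the function name `udp_checksum_was_rewritten` is hypothetical. The UDP header layout (source port, destination port, length, checksum, each 16 bits in network byte order) is standard, and over IPv4 a checksum field of 0 means that no checksum was computed.

```python
import struct

SCVPN_SOURCE_PORT = 4433  # source port named in step 3


def udp_checksum_was_rewritten(udp_header: bytes) -> bool:
    """Return True if a packet from port 4433 carries a non-zero checksum.

    According to step 3, the device sends these packets with checksum 0,
    so a non-zero value on the client side points at a NAT device on the path.
    """
    if len(udp_header) < 8:
        raise ValueError("a UDP header is 8 bytes long")
    src_port, _dst_port, _length, checksum = struct.unpack("!HHHH", udp_header[:8])
    if src_port != SCVPN_SOURCE_PORT:
        raise ValueError(f"packet is from port {src_port}, not {SCVPN_SOURCE_PORT}")
    return checksum != 0
```

Running this over a handful of packets from the client-side capture is enough; one rewritten packet already indicates a NAT device in between.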

4.10 App recognition errors issue


1. Verify whether the application signature database is the latest
2. Verify whether the corresponding policy or QoS configuration is correct, whether the
policy order is correct, and whether the policy is enabled.
3. If application identification still does not work, collect the following information:
show app info
show app cache table
Use “debug dp basic” on the device while the function is not taking effect, and capture
packets on the computer at the same time. (For a mobile phone application, provide the
phone's system version and the application's software version.)

4.11 Application layer function failure (ALG, URL filtering, etc.) issue

1. Verify whether the ALG function is enabled, whether application ALG auto for
non-standard ports is enabled, and whether application identification is enabled.
2. Use “show dp-r packet-buffer” to verify whether the remaining value is normal, and use
“show dp-r packet-descriptor skip” to verify whether the value keeps rising. If these
values are abnormal, the buffer is leaking; upgrading the software version is
suggested.
3. If video is delayed, check the corresponding device configuration and whether the
delay is caused by a configured limiting function, such as QoS, session-limit, or AD.
4. When video or audio fails, check whether both ends of the communication pass
through NAT. If they do not, disable the corresponding ALG function and clear the
session.
5. The protocol packets of some IP phone vendors are irregular; they may require the
“icmp-unreachable-session-keep” command to be enabled under flow mode on the
Hillstone device.
6. Filter on the source and destination addresses and debug the packet forwarding
procedure to check for packet drops.
7. Upgrade to the latest P version.
8. Capture mirrored packets on the inbound and outbound interfaces of the device.

4.12 SMS modem does not work


1. Check whether the event log warns “Failed to send testing message to xxxx, the root
cause is equipment exception”. Also check whether the device system language is
English; if it is Chinese, change it to English.
2. Verify whether the SMS modem itself works normally.
3. Replace the USB cable of the SMS modem
4. Use a USB drive to verify whether the USB port of the device works
5. If all of the above are fine, use “debug sms” to capture the information about the
failed SMS sending and submit it to the Research & Development group for analysis.

4.13 L2TP Cannot dial-in


4.13.1 L2TP over IPSec
1. Use “debug vpn” to verify whether p1 and p2 are correct. If the client is Windows 7,
the “life size” of p2 must be enabled with the value “250000”.
2. If it still does not work, run “debug vpn” and “debug l2tp” and collect the information

4.13.2 L2TP
1. If the client is a Windows 7 system, first make sure the relevant registry value has
been changed to 0.
2. If the client is a LAC, the “l2tp-include-ppp-acf” command must be enabled on the
Hillstone device.
3. With AD authentication, L2TP must use “PAP” authentication in the PPP
configuration.
4. Upgrade to 5.0R3P7 or a higher version
5. Collect the “debug l2tp” output from the device
6. If the client is a computer, capture packets; if it is a LAC, collect the relevant
debug information.

5. Performance Issue
5.1 CPU with high Utilization
When the device's management interface becomes slow, throughput drops, and delay
increases, the device has a performance problem. Collect the device model,
configuration, throughput, concurrent and new connections, and CPU utilization.
1. Use “show cpu detail” to check the utilization of each CPU core
2. Use “show debug” to check whether debugging has been turned off
3. Use “show session generic”
4. Add a new-session statistics set in the WebUI/CLI to check the number of new
sessions on the system and to judge whether any IP is behaving abnormally.
5. Add an interface-bandwidth statistics set in the WebUI/CLI to check the throughput
of the system
6. Use “show controller slot 0 port 1 statistic” (0: slot number; 1: port number) to
check the large and small packet statistics of each interface. Also use “show
controller slot 0 port 1 statistic clear” to count over a specific period, to confirm
whether a performance limit has been reached
7. Use “show process” to check which process has high utilization
8. Enable attack defense on both the trust and untrust zones, use “show log security” to
check whether there are many flood attack logs, and check the statistics for abnormal
events in per-IP traffic, session count, and new sessions.
9. If the above problems have been ruled out, performance-intensive functions such as
AV or IPS can be disabled temporarily.
10. If the device uses UTM functions, 5.0R3F5, 5.0R4, or a higher version is suggested.
11. In an urgent case, configure “cpu-busy-drop-packet input-queue-threshold 20000”
under flow mode. Function description: when the input queue exceeds 20000, new
packets that are neither DNS nor HTTP are discarded. Data flows of existing sessions
are not affected, and new DNS and HTTP packets are not affected. The effect in the
field is better for HTTP applications.
12. Finally, while CPU utilization is high, use profiling to collect the cycle, dmiss, and
imiss figures and show the highest ones, and collect “show version detail” at the
same time.
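
The function description in step 11 amounts to a three-part decision. The sketch below restates that description as code to make it easy to explain to a customer which traffic is affected. It is an illustrative model only, not the device's implementation: the `Packet` type, the application labels, and the reading of "exceeds" as strictly greater than the threshold are all assumptions.

```python
from dataclasses import dataclass

PROTECTED_APPLICATIONS = {"dns", "http"}  # exempt according to step 11


@dataclass
class Packet:
    creates_new_session: bool  # False for packets of an existing session
    application: str           # illustrative label such as "dns", "http", "ftp"


def should_drop(packet: Packet, input_queue_length: int, threshold: int = 20000) -> bool:
    """Model of the cpu-busy-drop-packet behaviour described in step 11."""
    if input_queue_length <= threshold:
        return False  # queue is not over the threshold: nothing is dropped
    if not packet.creates_new_session:
        return False  # existing session flows are not affected
    # Only new packets that are neither DNS nor HTTP are discarded.
    return packet.application.lower() not in PROTECTED_APPLICATIONS
```

For example, with the queue at 25000 a new FTP connection would be dropped in this model, while a new HTTP request and any packet of an established session would pass.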

5.2 Device login and management is slow


“show cpu” shows that CPU utilization is not high and data forwarding is normal, but
logging in to the device is slow

5.2.1 Network Quality


1. Check whether logging in to the device is slow from both the Internet and the intranet.
2. Ping the interface IP address of the device to check for delay or packet loss.
3. Check the session-limit and QoS functions to see whether the IP has been limited

5.2.2 Core0 with high utilization


1. Use “show cpu detail” to check the statistics of core0
2. Use “show flow-on-core0” to check whether its status is off
3. Use “show process” to check the resource usage of each process
4. If the httpauth process has high utilization and webauth or ARP authentication is
enabled on the device, extend the webauth timeout and the re-authentication time.
5. If the monitord process has high utilization, disable duplicate and unnecessary
statistics.
6. If the vpnd process has high utilization, use “show ipsec sa” to check the VPN
tunnels established by the device, and adjust the DPD packet sending interval of
each tunnel and of the peer device. In general, the recommended sending interval
is more than 3 seconds.
7. If the logd process has high utilization, use “show log” to check the log records and
disable all “log to console” output. If logs are also sent to syslog, disable “log traffic
to buffer”. Use “show log syslog” to check the syslog configuration of the device; if
there are multiple logs, combine them into one, and set the log format to binary if
the syslog server is a Hillstone device. If the user agrees, also disable duplicate
logs, such as NAT logs and policy logs, to reduce log output.
8. If the snmp process has high utilization, use “show snmp-server” to check whether
unnecessary SNMP server monitoring can be removed.
9. Change the default management port to an uncommon port, so that attacks on the
common port do not slow down management.

5.2.3 Hardware Issue


1. show controller slot 0 port 1 statistic (1: port number): check whether the physical
interface is losing packets
2. show controller slot 0 port 5 statistic: if the internal interface has packet loss
(most commonly seen on the M2105), it might be a hardware issue.
3. Use “show interface” to check duplex and rate.
4. Use “show cpu-cntr” to check whether the controller drops packets and whether the
count keeps increasing. If it does while CPU utilization is not high and there is no
burst traffic, it might be a hardware issue.
5. Use “show controller slot 0 bridge statistic” to check whether the bridge drops
packets; check whether the device bandwidth is mismatched between inbound and
outbound.

5.3 Device out of memory


show dp-r memory
show dp-r memory detail
show dp-r memory stat
show memory detail
show mem-dbg
show session gen
When a low-end device has UTM functions enabled, 5.0R3 or a higher version is
suggested, and the number of sessions should be reduced at the same time.

6. Slow to access services, some services are abnormal


6.1 Cannot visit some website
1. Check whether you can telnet to port 80 of the site and whether DNS resolution
works; also check the device configuration and basic resource status: policy, NAT,
routing, session, session-limit, and QoS.
2. Check the NBC, AV, and IPS configuration, and check the logs generated by these
functions to see whether an application layer function blocked the access.
3. Check whether there are multiple links, and try changing to another link.
4. Change the DNS configuration on the computer
5. Use “debug dp filter” with “debug dp basic”, “debug dp policy lookup”, and “debug dp
drop” to check whether any packet is dropped in the device's software processing
6. Check whether the website restricts some of the IP addresses in the NAT address
pool; try setting a separate NAT address for the test computer.
7. Consider the effect of the TCP MSS value. When the website cannot be accessed,
capture packets on this PC to analyze whether large packets are being transmitted
or discarded (also capture packets on another PC that can access the site, and
compare the normal and abnormal captures). Also try reducing the TCP MSS value
(if the traffic goes through a VPN tunnel, change the DF configuration of IPsec and
SCVPN to clear), and clear the corresponding session before testing.
8. If all of the above have been ruled out, suggest that the user connect a PC directly
to the device for testing (to rule out problems in the user's internal network). In an
urgent situation, the user may connect to the external network directly, bypassing
the device if possible, to narrow down the investigation.
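
For step 7, the relation between MTU, tunnel overhead, and TCP MSS is simple arithmetic: without IP or TCP options, the MSS is the MTU minus 20 bytes of IPv4 header and 20 bytes of TCP header, and any tunnel encapsulation reduces it further. The sketch below works this out; the function names are hypothetical, and the tunnel overhead is left as a parameter because it depends on the VPN type and algorithms in use.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
TCP_HEADER_BYTES = 20   # TCP header without options


def tcp_mss(link_mtu: int, tunnel_overhead: int = 0) -> int:
    """Largest TCP payload that fits in one packet without fragmentation."""
    mss = link_mtu - tunnel_overhead - IPV4_HEADER_BYTES - TCP_HEADER_BYTES
    if mss <= 0:
        raise ValueError("overhead leaves no room for TCP payload")
    return mss


def needs_fragmentation(ip_packet_bytes: int, link_mtu: int, tunnel_overhead: int = 0) -> bool:
    """True if the packet, once encapsulated, no longer fits the link MTU."""
    return ip_packet_bytes + tunnel_overhead > link_mtu


# A plain Ethernet path with MTU 1500 gives the familiar MSS of 1460.
print(tcp_mss(1500))
# With an illustrative 60 bytes of tunnel overhead the MSS must drop to 1400.
print(tcp_mss(1500, tunnel_overhead=60))
```

If the capture shows full-size packets with DF set that never get an answer, comparing their size with `needs_fragmentation` for the path in question is a quick way to confirm that a lower MSS is worth testing.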

6.2 Slow to access some services


1. Check the device configuration and basic resource status: policy, NAT resources
(non-X devices: show log event | in SNAT), show session gen (check whether the
device failed in loading sessions), session-limit, QoS, bandwidth usage, repeated
track failures, show dp-r packet-buffer, show dp-r packet-descriptor, show cpu-cntr, etc.
2. System resource usage: interface bandwidth, high CPU utilization, and the duplex
and rate status of the interfaces.
3. Check whether the outbound bandwidth of the device is fully occupied
4. Check the user's network for delay or packet loss toward the destination address
5. Check the security log to see whether packets are dropped by attack defense, for
example during downloads over SCVPN or L2TP
6. If the service uses a non-standard port, check whether application identification is
enabled on the corresponding zone, and check the session to confirm that the
system identifies the application correctly and that ALG auto is enabled.
7. If the AV function is enabled, check whether the AV action is fill magic; if it is,
change it to reset and test again. This usually concerns FTP and HTTP applications.
8. Use “debug dp basic” to check the software processing status
9. Capture packets on the accessing PC and check whether many retransmissions cause
the slowness; use “debug dp snoop” and “debug dp basic” to check the forwarding
status of the TCP packets with the same sequence number.
10. Check whether the server initiates connections to the client side on its own, or
whether packets arrive out of order because of poor network quality. For example, if
RST or FIN packets arrive earlier than normal data packets, the session is deleted,
and the later data packets cannot find the session and are discarded by the system.
Try enabling the “full cone nat” function and test again. Typical applications
include games, FTP, HTTP, and some client-based applications.
11. For asymmetric routing, out-of-order packets, and high-speed file backup or
download in the intranet, disable the corresponding TCP checks under flow mode,
such as “tcp-seq-check-disable”, “no tcp-syn-bit-check”, or “no tcp-rst-bit-check”, then
test again.
12. Compare the packets captured in the normal and abnormal situations to find the
root cause of the abnormal service
13. If all of the above have been ruled out, suggest that the user connect a PC directly
to the device for testing (to rule out problems in the user's internal network). In an
urgent situation, the user may connect to the external network directly, bypassing
the device if possible, to narrow down the investigation.

7. HSA/HSM Typical problems


For the detailed HSA troubleshooting procedure, refer to “How to handle common
issues in HSA” (How to handle common issues in HSA.docx).
Examples of typical problems:
7.1 HSA does not receive logs
1. Check whether the HSA address information is configured correctly
2. Check whether the system time of the HSA is correct; if it is not, synchronize it
through an NTP server
3. Check whether the StoneOS version matches the HSA version.
4. Check whether the corresponding log function is enabled on StoneOS
5. Check whether the HSA IP address can be pinged from the firewall, and use “show
arp” to check whether the corresponding ARP entry is correct.
6. Use “show ip route” to check whether the route from the system to the HSA is correct.
7. Use “show session” to check the session that sends the logs, and check whether the
logs are sent to the syslog server from the correct port. If they are not, use “clear
session dst-ip xxxx” and then check the HSA logs again. Usually an interface going
down and up caused a wrong route selection, and the session was not rematched
after the route recovered.
8. Check whether the HSA is directly connected; if it is not, check whether an
intermediate device imposes any restriction.
7.2 HSA Disk Filled (Version 1.0)


1. With too many logs, the disk cannot delete logs automatically in time because of a
problem in the automatic deletion algorithm.
2. Delete logs:
(1) Use “df -h” to check the hard disk utilization
(2) Enter the directory: cd /opt/my_cluster/mysqld_data
(3) Use “ls” to list all the database files
(4) Delete the earliest database: use “rm -rf *20131106” to delete all the databases
from 2013-11-06.
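
Picking the earliest date by eye from a long “ls” listing is error-prone. The sketch below finds the earliest date suffix in a list of names and shows which entries a pattern such as *20131106 would match, without deleting anything. It assumes, as in the example above, that database names end in an 8-digit YYYYMMDD date; the function names and the sample names in the usage lines are hypothetical.

```python
import re
from datetime import datetime
from typing import Iterable, List, Optional

DATE_SUFFIX = re.compile(r"(\d{8})$")  # assumed YYYYMMDD at the end of the name


def earliest_date_suffix(names: Iterable[str]) -> Optional[str]:
    """Return the oldest valid YYYYMMDD suffix found, or None if there is none."""
    dates = []
    for name in names:
        match = DATE_SUFFIX.search(name)
        if not match:
            continue  # not a dated database, leave it alone
        try:
            datetime.strptime(match.group(1), "%Y%m%d")
        except ValueError:
            continue  # eight digits that are not a real date
        dates.append(match.group(1))
    # YYYYMMDD strings sort chronologically, so min() is the oldest date.
    return min(dates) if dates else None


def names_for_date(names: Iterable[str], suffix: str) -> List[str]:
    """List the entries that a shell pattern *<suffix> would match."""
    return sorted(name for name in names if name.endswith(suffix))


listing = ["log_20131107", "log_20131106", "nat_20131106", "ibdata1"]
oldest = earliest_date_suffix(listing)
print(oldest, names_for_date(listing, oldest))
```

Reviewing the matched names before running the “rm -rf” command in step (4) avoids removing a database that only happens to share the suffix.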

7.3 HSM Registration is abnormal


For the detailed troubleshooting procedure, refer to “HSM B&S Common Issues and
Solutions” (HSM B&S Common Issues and Solutions.docx).
1. Check whether the location information of the HSM is correct
2. Telnet to the corresponding port of the HSM address to check whether the port is
open.
3. Check whether the IP address of the HSA/HSM can be pinged; use “show arp” to check
whether the ARP entry is correct.
4. Use “show ip route” to check whether the route from the system to the HSA is correct.
5. Use “show session” to check the session that sends the logs, and check whether the
logs are sent to the syslog server from the correct port. If they are not, use “clear
session dst-ip xxxx” and then check the HSA logs again. Usually an interface going
down and up caused a wrong route selection, and the session was not rematched
after the route recovered.
6. Use debug to check the packet exchange between the device and the HSM; also check
whether the HSM is directly connected, and if it is not, check whether an
intermediate device imposes any restriction.
7. Check whether the HSM license is correct.
8. Check whether the number of devices has reached the upper limit. Although the user
can install multiple device licenses to increase the number of manageable devices,
the HSM-200 can manage at most 200 devices.
9. Check whether the registration failure is caused by a certificate installation failure.
Check the SSL certificate in StoneOS with debug. StoneOS registers at the HSM-200
with the network manager CA certificate that the system generates automatically;
use “show pki trust-domain” to check it. If the system time of the security appliance
is wrong, certificate installation may fail, and the device then cannot register at
the HSM
10. Enable debugging on the StoneOS side and run “debug smagent” / “debug nmagent” to
check the relevant debug information.

8. Hardware Failure
If the system cannot start or the LED indicators of device panel does not work properly
after device power on or firmware upgrade, it may consider the device has hardware
failure or firmware version issue, we need to collect the relative information according
to the following steps:
1. Check output of console,show en,show log alarm and show log alarm file
2. Check whether StoneOS is compatible with the platform, and whether the corresponding
board card is compatible with StoneOS. For example: the IOC-2XFP card has an old type
and a new type, and the new card is supported by versions 3.5R6p16, 4.0R6, 4.5R2 and
higher; FEC-AV-30/60 is not supported from version 5.0 onward; for the updated COPPER,
use the version with the ‘2’ tag.
3. Record the status and color of the PWR, ALM, and interface LED indicators on the
device panel.
4. If the interfaces cannot work with each other, check the duplex and rate
configuration of the interfaces on both ends, or use the loopback method to test the
interface and module.
5. Plugging a board card in or out while the system is running causes the LEDs to display
abnormally; confirm whether the board card supports hot-plugging.

8.1 Power Failure


1. PWR LED stays orange: the power supply has a problem.
2. PWR LED stays red: the power supply has a problem and the system is down.
3. PWR LED is off: no power to the system, an internal fault in the battery module, a power
switch failure, or a power supply LED failure.

8.2 Fan Failure


1. FAN LED stays orange: fan system failure. The system can still work, but the fan needs to be
replaced as soon as possible.
2. FAN LED stays red: serious fan system failure, or the fan has been unplugged. The system
will shut down automatically within 15 seconds.
3. In CLI mode, use “show environment” to check the fan status.

8.3 Interface or Switch chip failure


1. The LED indicator does not light up, or its brightness is abnormal, when a cable is
connected to the Ethernet interface.
2. Check the console output to judge whether the internal switch chip has failed.
During device startup, if the console output contains key fields like the following,
the internal switch chip has failed, which may cause the device to reboot again
and again:
Unknown switch identifier
Unknown Device. Initialization failed
qdLoadDriver return Failed
LS NIC: slot 0 ls driver init failed
NIC: init failed for slot 0
3. The GE Ethernet interface port is broken, or the interface pins are bent.
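
When a console capture has been saved to a file, the key fields listed in step 2 can be searched for automatically. The following is a minimal sketch, not a vendor tool: the marker strings are copied from the console output above, while the function name find_switch_failure_lines and the idea of scanning a saved capture are assumptions made for illustration.

```python
from typing import List

# Key fields taken from the console output quoted in this section.
SWITCH_FAILURE_MARKERS = (
    "Unknown switch identifier",
    "Unknown Device. Initialization failed",
    "qdLoadDriver return Failed",
    "ls driver init failed",
    "NIC: init failed for slot",
)


def find_switch_failure_lines(console_text: str) -> List[str]:
    """Return the console lines that contain any switch chip failure marker."""
    hits = []
    for line in console_text.splitlines():
        if any(marker in line for marker in SWITCH_FAILURE_MARKERS):
            hits.append(line.strip())
    return hits


if __name__ == "__main__":
    sample = "Unknown switch identifier\nqdLoadDriver return Failed\n"
    for hit in find_switch_failure_lines(sample):
        print("possible switch chip failure:", hit)
```

An empty result does not prove that the chip is healthy; it only means that none of the listed key fields appeared in the captured text.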

8.4 Memory Failure


Memory failure or poor memory contact:
HILLSTONE NETWORKS
Hillstone Bootloader 2.1.3 Apr 23 2010-13:29:59
Product S/N: 0802042080021835 Assembly number: B023
Platform: SG-6000-M2100
DRAM: 512 MB
The PWR LED indicator stays green, and the STA and ALM LED indicators do not light up. All
the interface LED indicators stay lit all the time. The console output stops
at DRAM. The problem usually occurs after shipping or abnormal vibration of the device.
Memory detection method: connect to the Hillstone security appliance through the
console, then power the device off and on again, and enter the command
“hs_test_inflash” repeatedly while the device boots.
If “Memory integrity test” appears, the device has entered the memory test page; 02%
indicates the memory test progress.
HILLSTONE NETWORKS
Hillstone Bootloader 2.2.5 Sep 28 2011-15:35:28
Product S/N: 0802042080021835 Assembly number: B023
Platform: SG-6000-M2105
DRAM: 512 MB
Memory integrity test from 0000000000000000 to 0000000002000000, step 8
02%
After the test, the screen shows “FAILURE XXX” if there is any memory problem.
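
The two symptoms described above (console output stopping at DRAM, and a “FAILURE” message after the memory test) can be checked in a saved console capture. The following is a rough sketch under stated assumptions: the function name classify_memory_console and the three result labels are hypothetical, and the check relies only on the strings shown in this section.

```python
def classify_memory_console(console_text: str) -> str:
    """Classify a console capture using the memory symptoms in this section."""
    lines = [ln.strip() for ln in console_text.splitlines() if ln.strip()]
    # The memory integrity test reports problems with a line containing FAILURE.
    if any("FAILURE" in ln for ln in lines):
        return "memory-test-failure"
    # A boot that prints nothing after the DRAM line has stalled at DRAM.
    if lines and lines[-1].startswith("DRAM:"):
        return "stalled-at-dram"
    return "no-memory-fault-detected"
```

The result is only a hint for the engineer. A capture that was cut off early also ends at the DRAM line, so confirm the LED status on the panel before concluding that the memory has failed.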

8.5 Device Crystal Oscillator Failure


Update the firmware to the latest version. Failure description: the outbound bandwidth
is not overloaded, and CPU and memory work normally, but packets are dropped on the
internal interface of the firewall and web pages cannot be opened normally. Key words:
InCRCAlignErrors, InBadPkts. This usually happens on low-end models,
such as SA-2001, SA-2001A, MS-10, VR5600T, SG6000-M2105 and NAV20. Most
cases occur in products that were manufactured before 2010. The year of
manufacture can be found in the S/N: use “show ver” to find the S/N, and the number eight
is the year of manufacture.
CLI information collection (for example, SA-2001 with port 5 as the internal interface port):
SA-2001# show controller slot 0 p 5 statistic clear
ethernet0/5:
InGoodOctets: 10253 InBadOctets: 8818
InGoodPkts: 57 InBadPkts: 13
InUnicastPkts: 57 InBroadcastPkts: 0
InMulticastPkts: 0 InControlPkts: 0
InUndersizePkts: 0 InOversizePkts: 0
InFragments: 0 InJabbers: 0
InMACRcvErrors: 0 InCRCAlignErrors: 13
DropEvents: 0
OutGoodOctets: 19192 OutGoodPkts: 86
OutUnicastPkts: 78 OutBroadcastPkts: 1
OutMulticastPkts: 7 OutControlPkts: 0
Collisions: 0 OutDropDeferrals: 0
SingleCollisions: 0 MultipleCollisions: 0
ExcessiveCollisions: 0 LateCollisions: 0
Pkts64Octets: 3 Pkts65to127Octets: 77
Pkts128to255Octets: 43 Pkts256to511Octets: 14
Pkts512to1023Octets: 12 Pkts1024toMaxOctets: 7
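
The counters in the output above can be extracted from a saved capture so that InBadPkts and InCRCAlignErrors are easy to compare across several collections. The following is a minimal sketch: the function names are hypothetical, and treating InGoodPkts + InBadPkts as the total number of received packets is an assumption, not something stated by the vendor.

```python
import re
from typing import Dict

# Matches "CounterName: 123" pairs; several pairs may share one line.
COUNTER_RE = re.compile(r"([A-Za-z]\w*):[ \t]*(\d+)")


def parse_counters(text: str) -> Dict[str, int]:
    """Return a mapping of counter name to value from the statistics output."""
    return {name: int(value) for name, value in COUNTER_RE.findall(text)}


def bad_packet_ratio(counters: Dict[str, int]) -> float:
    """Share of received packets counted as bad (assumes total = good + bad)."""
    good = counters.get("InGoodPkts", 0)
    bad = counters.get("InBadPkts", 0)
    total = good + bad
    return bad / total if total else 0.0


if __name__ == "__main__":
    sample = (
        "InGoodPkts: 57 InBadPkts: 13\n"
        "InMACRcvErrors: 0 InCRCAlignErrors: 13\n"
    )
    counters = parse_counters(sample)
    print("InCRCAlignErrors:", counters["InCRCAlignErrors"])
    print("bad packet ratio: {:.1%}".format(bad_packet_ratio(counters)))
```

Because the command clears the statistics, collecting the output several times under normal traffic and comparing the ratios shows whether the CRC errors keep accumulating.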
