Information Collection and Basic Troubleshooting Procedure EN V1.0
3. Service interruption
A service interruption can have many causes, so several checks are needed when one
occurs. The basic procedure is to check network connectivity first, then the firewall
configuration, then device and function status, and to use debugging as the last step.
1. Ping upstream gateway and downstream from HS
2. Ping client from HS
3. Ping server from HS
4. Ping HS interface IP from client
5. Ping server from client
6. Run traceroute from the client to judge whether the route is correct
7. Check whether the basic network configuration of the firewall has changed, such as
policy, NAT, route, ARP and MAC entries.
8. Check whether a function is misconfigured or producing false positives by using
show session limit and show qos
9. Check whether resources are sufficient by using show cpu detail / show cpu-cntr /
show dp-r packet-buffer / show session gen / show snat resource
10. Check whether the session is established correctly by using show session src-ip
x.x.x.x or show session dst-ip; also check whether the address after NAT and the
egress interface are correct (use show ip route interface to check the interface ID),
and check show arp
11. debug dp filter src-ip x.x.x.x or debug dp filter dst-ip x.x.x.x
12. debug dp basic, debug dp drop, debug dp snoop
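The ping checks in steps 1 to 5 can be read as an ordered triage: the first check that fails narrows down where the fault sits. The following Python sketch illustrates that reading only. The check labels and the `first_failure` helper are hypothetical, and the results are entered by hand rather than collected from a device.

```python
# Ordered connectivity checks taken from steps 1-5 (labels are illustrative).
CHECKS = [
    "HS -> upstream gateway",
    "HS -> downstream device",
    "HS -> client",
    "HS -> server",
    "client -> HS interface IP",
    "client -> server",
]


def first_failure(results):
    """Return the first check in CHECKS whose result is False, or None.

    `results` maps a check label to True (ping answered) or False (no reply).
    Checks that are missing from `results` are treated as not yet run.
    """
    for label in CHECKS:
        if results.get(label) is False:
            return label
    return None


if __name__ == "__main__":
    observed = {
        "HS -> upstream gateway": True,
        "HS -> downstream device": True,
        "HS -> client": True,
        "HS -> server": False,
        "client -> server": False,
    }
    print("First failing check:", first_failure(observed))
```

If every ping succeeds, the helper returns None and the investigation moves on to the configuration, resource and session checks in steps 7 to 12.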
4.7 HA issue
1. Use “show version” to check whether the two HA devices have the same version,
hardware model and license, the same board cards, and the same active functions,
such as VRouter and policy mode.
2. Compare the configuration files of the two devices to check whether there is any
mac-cone configuration etc.
3. Use “show ha group 0” to verify the basic HA group information on both HA devices,
and check whether the HA priority, monitoring, preemption and peer information are
correct and complete.
4. Use “show ha link status” to check the HA heartbeat link information
5. Use “show ha flow statistics” to check the transmission status of main packets and
sub-packets.
6. Use “show ha protocol statistic” to check the heartbeat packets received and
sent between the two HA devices.
7. Ping the heartbeat link IP of the peer to check whether the heartbeat is normal; also
check whether there is any IP conflict on the heartbeat interface.
8. Use “show ha sync statistic” to check the synchronization statistics of the HA devices
9. Use “show ha protocol statistic” to check the “hello” messages and other related
information of the HA devices
10. Use “show interface” to check whether the MAC of the HA interface is correct
11. Use “show ha cluster” to check whether the HA cluster of the two devices is the same
12. Check whether the configured HA track time interval is too short or the configured
value is too small. (The recommended time interval is longer than 3 seconds.)
13. Use “debug ha” to check the HA state, negotiation status and file synchronization.
14. Check the MAC and port learning status on the upstream and downstream switches
or routers
15. Capture packets to analyze whether gratuitous ARP is being sent
16. If device management is abnormal after an HA switchover, enable the gratuitous ARP
sending function of the corresponding zone, and update the ARP entries of the upstream
and downstream devices after performing the HA switchover again
17. Check whether the preemption and hello time settings are appropriate; setting them
a little longer is suggested
18. If the downstream device is old and cannot send gratuitous ARP after an HA
switchover, enabling arp-passive-learning under flow mode on the Hillstone device is suggested.
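Steps 1 and 12 are both comparisons and are easy to make mechanical once the values have been read off the two devices. The sketch below assumes the values were copied by hand from “show version” into plain dictionaries. The key names, the sample values and both helper functions are hypothetical, and no StoneOS output is parsed.

```python
def ha_mismatches(device_a, device_b, keys=("version", "model", "license")):
    """Return {key: (value_a, value_b)} for every key where the devices differ."""
    return {
        key: (device_a.get(key), device_b.get(key))
        for key in keys
        if device_a.get(key) != device_b.get(key)
    }


def track_interval_ok(seconds, minimum=3):
    """Step 12 recommends a track interval longer than 3 seconds."""
    return seconds > minimum


if __name__ == "__main__":
    # Hand-entered sample values, not real device output.
    master = {"version": "5.0R4", "model": "model-x", "license": "platform"}
    backup = {"version": "5.0R3P7", "model": "model-x", "license": "platform"}
    print(ha_mismatches(master, backup))
    print(track_interval_ok(2))
```

An empty result from `ha_mismatches` means the compared attributes match, and the remaining steps (heartbeat, synchronization, gratuitous ARP) become the more likely suspects.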
2. If data still cannot pass after logging in to SCVPN successfully, collect the
following information:
The debugging log on the client side
Use “debug dp basic, debug dp drop, debug dp ipsec, debug dp tunnel” on the
device, and capture packets on the computer at the same time.
3. Check the captured packets and verify whether the checksum of the UDP packets sent
from source port 4433 of the device is 0. If it is not, the checksum has been changed by
an intermediate NAT device, so suggest that the user connect to the public network directly.
4. If the checksum is correct, submit the debug output and the packet capture to the
Research & Development group for analysis
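The checksum check in step 3 can be done on the raw bytes of a captured UDP segment. The UDP header is four 16-bit big-endian fields: source port, destination port, length and checksum. The sketch below is an illustration only; the function names are hypothetical, and the caller is assumed to have already extracted the UDP segment (header plus payload) from the capture.

```python
import struct
from collections import namedtuple

SCVPN_SOURCE_PORT = 4433  # source port named in step 3

UdpHeader = namedtuple("UdpHeader", "src_port dst_port length checksum")


def parse_udp_header(segment: bytes) -> UdpHeader:
    """Decode the 8-byte UDP header at the start of `segment`."""
    if len(segment) < 8:
        raise ValueError("UDP header needs at least 8 bytes")
    return UdpHeader(*struct.unpack("!HHHH", segment[:8]))


def checksum_rewritten(header: UdpHeader) -> bool:
    """True if a packet from port 4433 carries a non-zero checksum.

    Per step 3, the device sends these packets with checksum 0, so a
    non-zero value points to modification by an intermediate NAT device.
    """
    if header.src_port != SCVPN_SOURCE_PORT:
        raise ValueError("not an SCVPN packet from source port 4433")
    return header.checksum != 0


if __name__ == "__main__":
    # Hand-built sample header: 4433 -> 50000, length 8, checksum 0.
    sample = struct.pack("!HHHH", 4433, 50000, 8, 0)
    print(checksum_rewritten(parse_udp_header(sample)))
```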
4.13.2 L2TP
1. If the client runs Windows 7, first make sure the relevant registry value has been
changed to 0.
2. If the client is a LAC, the “l2tp-include-ppp-acf” command needs to be enabled
on the Hillstone device.
3. If AD authentication is used, L2TP needs to use “PAP” authentication in the PPP
configuration.
4. Upgrade to 5.0R3P7 or a higher version
5. Collect the “debug l2tp” output from the device
6. If the client is a computer, capture packets; if it is a LAC, collect the relevant debug
information.
5. Performance Issue
5.1 CPU with high Utilization
When the device’s management system becomes slow, throughput drops and delay
increases, the device has a performance problem. In that case, collect the device model,
configuration, throughput, concurrent and new connection counts, and the CPU utilization.
1. Use “show cpu detail” to check the utilization of each CPU core
2. Use “show debug” to check whether any debug function is still enabled
3. Use “show session generic”
4. Add the new-session statistics set in WebUI/CLI to check the number of new sessions
on the current system and judge whether there is any abnormal IP.
5. Add the interface-bandwidth statistics set in WebUI/CLI to check the
throughput of the current system
6. Use “show controller slot 0 port 1 statistic” (0: slot number; 1: port number) to
check the large- and small-packet statistics of each interface port; also use
“show controller slot 0 port 1 statistic clear” to collect the counts over a
specific period, to confirm whether a performance limit has been reached
7. Use “show process” to check which process has high utilization
8. Enable the attack defense function on both the trust and untrust zones, use “show log
security” to check whether there are many flood attack logs, and check the statistics to
see whether there are abnormal events in IP flow, session count and new sessions.
9. If the above problems are excluded, functions that demand high performance, such as
AV or IPS, can be disabled temporarily.
10. If the device uses UTM functions, using 5.0R3F5, 5.0R4 or a higher version is suggested.
11. In urgent cases, configure “cpu-busy-drop-packet input-queue-threshold 20000” under
flow mode. Function description: new packets that are neither DNS nor HTTP are discarded
when the input queue exceeds 20000. Existing session data flows are not affected, and
new DNS and HTTP packets are not affected. The result is better in scenarios where
HTTP applications dominate.
12. Finally, use profiling to collect the cycle, dmiss and imiss CPU figures while the
CPU utilization is high, show the highest one, and use “show version detail”
to collect information at the same time.
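The behavior described for “cpu-busy-drop-packet input-queue-threshold 20000” in step 11 amounts to a three-part decision. The Python sketch below models that description only; it is not the StoneOS implementation, and the function name, the parameter names and the lowercase application labels are assumptions made for illustration.

```python
INPUT_QUEUE_THRESHOLD = 20000    # value from the command in step 11
EXEMPT_APPLICATIONS = {"dns", "http"}


def should_drop(queue_depth, is_new_session, application,
                threshold=INPUT_QUEUE_THRESHOLD):
    """Model of the drop rule described in step 11.

    A packet is dropped only when all three conditions hold:
    the input queue is over the threshold, the packet would create a
    new session, and the application is neither DNS nor HTTP.
    """
    if queue_depth <= threshold:
        return False   # queue is not congested
    if not is_new_session:
        return False   # existing sessions are never affected
    return application.lower() not in EXEMPT_APPLICATIONS


if __name__ == "__main__":
    print(should_drop(25000, True, "ftp"))
    print(should_drop(25000, True, "http"))
    print(should_drop(25000, False, "ftp"))
```

Because only new non-DNS, non-HTTP packets are candidates, the rule sheds load while keeping established traffic and ordinary web access working, which matches the note that HTTP-heavy scenarios benefit most.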
Reference document: “How to handle common issues in HSA.docx”
Examples for the typical problems:
7.1 HSA does not receive logs
1. Check whether the HSA address information is configured correctly
2. Check whether the system time of HSA is correct; if it is not, synchronize the system
time through an NTP server
3. Check whether the StoneOS version matches the HSA version.
4. Check whether the corresponding log function is enabled on StoneOS
5. Check whether the firewall can ping the HSA IP address, and use “show arp” to check
whether the corresponding ARP information is correct.
6. Use “show ip route” to check whether the route from the system to HSA is correct.
7. Use “show session” to check the session used for sending logs, and check whether
the logs are sent from the correct port to the syslog server. If they are not, use “clear
session dst-ip xxxx” and then check the HSA log again. Usually the wrong route was
selected because an interface went down and up, and the session was not rematched
after the route recovered.
8. Check whether HSA is directly connected; if it is not, check whether any
intermediate device imposes restrictions.
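Step 7 describes a stale session: the log session still uses the egress interface chosen while the correct route was down. The sketch below expresses that comparison in Python. The interface names and the address are made-up sample values entered by hand, the helper name is hypothetical, and the returned text only repeats the “clear session dst-ip” command from step 7.

```python
def log_session_action(session_egress_if, route_egress_if, hsa_ip):
    """Suggest the next action for the log session described in step 7.

    `session_egress_if` is the egress interface seen in "show session";
    `route_egress_if` is the interface the current route to HSA uses.
    """
    if session_egress_if == route_egress_if:
        return "session matches route; continue with step 8"
    # The session was built on the old route and was never rematched.
    return f"clear session dst-ip {hsa_ip}"


if __name__ == "__main__":
    # Sample values only (192.0.2.10 is a documentation address).
    print(log_session_action("ethernet0/2", "ethernet0/1", "192.0.2.10"))
```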
Reference document: “HSM B&S Common Issues and Solutions.docx”
1. Check whether the location information of HSM is correct
2. Telnet to the corresponding port of the HSM address to check whether the port is
open.
3. Check whether you can ping the IP address of HSA/HSM; use “show arp” to check
whether the ARP information is correct.
4. Use “show ip route” to check whether the route from the system to HSA is correct.
5. Use “show session” to check the session used for sending logs, and check whether
the logs are sent from the correct port to the syslog server. If they are not, use “clear
session dst-ip xxxx” and then check the HSA log again. Usually the wrong route was
selected because an interface went down and up, and the session was not rematched
after the route recovered.
6. Use debug to check the packet exchange between the device and HSM; also check
whether HSM is directly connected, and if it is not, check whether any intermediate
device imposes restrictions.
7. Check whether the HSM license is correct.
8. Check whether the number of devices has reached the upper limit. Although users can
install multiple device licenses to expand the number of manageable devices, HSM-200
can manage at most 200 devices.
9. Check whether the registration failure is caused by a certificate installation failure.
Check the SSL certificate in StoneOS by debug. StoneOS registers with HSM-200 using the
network manager CA certificate, which is generated automatically by the system; use
“show pki trust-domain” to check it. If the system time of the security appliance is
wrong, certificate installation may fail, and the device then cannot register with HSM
10. Enable the debug function on the StoneOS side, and execute debug smagent / debug
nmagent to check the relevant debug information.
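Steps 8 and 9 both reduce to simple range checks. The sketch below shows them in Python under two assumptions that the text above does not state: that installed device licenses simply add up, and that a wrong system time breaks certificate installation because the time falls outside the certificate validity period. All function names and sample values are hypothetical.

```python
from datetime import datetime

HSM_200_DEVICE_CAP = 200  # hard limit stated in step 8


def manageable_limit(licensed_total, hard_cap=HSM_200_DEVICE_CAP):
    """Effective device limit: licenses cannot raise it above the cap."""
    return min(licensed_total, hard_cap)


def limit_reached(registered, licensed_total, hard_cap=HSM_200_DEVICE_CAP):
    """True if no further device can register (step 8)."""
    return registered >= manageable_limit(licensed_total, hard_cap)


def time_within_validity(system_time, not_before, not_after):
    """True if the appliance clock lies inside the certificate validity window.

    Assumption: this is the reason a wrong system time causes the
    certificate installation failure mentioned in step 9.
    """
    return not_before <= system_time <= not_after


if __name__ == "__main__":
    print(limit_reached(registered=200, licensed_total=300))
    print(time_within_validity(datetime(2000, 1, 1),
                               datetime(2014, 1, 1), datetime(2024, 1, 1)))
```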
8. Hardware Failure
If the system cannot start, or the LED indicators on the device panel do not work properly
after the device is powered on or the firmware is upgraded, the device may have a hardware
failure or a firmware version issue. Collect the relevant information according
to the following steps:
1. Check the console output, show en, show log alarm and show log alarm file
2. Check whether StoneOS is compatible with the platform, and whether the corresponding
board card is compatible with StoneOS. For example: the IOC-2XFP card has an old type
and a new type, and the new card is supported by 3.5R6p16, 4.0R6, 4.5R2 and higher
versions; FEC-AV-30/60 is not supported from version 5.0 onward; to update COPPER,
use the version with the ‘2’ tag.
3. Record the status and color of the PWR, ALM and interface LED indicators on the
device panel.
4. If the interfaces cannot negotiate a link, check the duplex and rate configuration of
the interfaces on both ends, or use a loopback test to check the interface and
module.
5. Plugging a board card in or out while the system is running causes abnormal LED
display; confirm whether the board card supports hot-plugging.
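The duplex and rate comparison in step 4 can be written down as a small check once the settings of both ends are known. In the sketch below the settings are entered by hand as dictionaries; the key names, the sample values and the `link_mismatches` helper are hypothetical.

```python
def link_mismatches(local, remote, keys=("rate", "duplex")):
    """Return a list of human-readable differences between two link ends.

    A fixed setting on one end and "auto" on the other is reported as a
    mismatch too, since the two ends are then not configured alike.
    """
    issues = []
    for key in keys:
        if local[key] != remote[key]:
            issues.append(f"{key}: local={local[key]} remote={remote[key]}")
    return issues


if __name__ == "__main__":
    # Hand-entered sample settings, not real device output.
    firewall_port = {"rate": "1000", "duplex": "full"}
    switch_port = {"rate": "1000", "duplex": "auto"}
    for issue in link_mismatches(firewall_port, switch_port):
        print(issue)
```

An empty list means both ends are configured alike, in which case the loopback test from step 4 is the next way to isolate a faulty interface or module.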