iManager U2000 Troubleshooting - (V100R001C00 - 01)
V100R001C00
Troubleshooting
Issue 01
Date 2009-09-25
Website: http://www.huawei.com
and other Huawei trademarks are the property of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of their respective holders.
Notice
The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but the statements, information, and
recommendations in this document do not constitute a warranty of any kind, express or implied.
Purpose
This document describes the U2000 troubleshooting process, including fault data collection,
fault identification, fault handling, and troubleshooting suggestions.
Related Versions
The following table lists the product versions related to this document.
Intended Audience
This document is intended for:
• U2000 system administrators
• Technical support engineers
Organization
This document describes the operations that are performed by the NMS administrators on the
U2000.
Chapter                                   Description
1 Basic Principles of Troubleshooting     You need to locate and clear a fault by observing the
                                          troubleshooting principles and cautions.
3 Fault Data Collection                   In the case of a system fault, you need to collect the
                                          related data in a timely manner, to locate and handle the
                                          fault.
5 Faults of the Operating System          This topic describes how to troubleshoot the faults of the
                                          operating system.
6 Faults of the Database                  This topic describes how to troubleshoot the faults of the
                                          database.
7 U2000 Server Troubleshooting            This topic describes how to troubleshoot the U2000
                                          server.
8 Faults of the U2000 Client              This topic describes how to troubleshoot the faults of the
                                          U2000 client.
11 NMS System Maintenance Tool            This topic describes how to troubleshoot the NMS
Troubleshooting                           system maintenance tool.
A Obtaining the Technical Support         This topic describes how to obtain the technical support
                                          in the case of any problems encountered during routine
                                          maintenance.
Conventions
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Symbol Description
General Conventions
The general conventions that may be found in this document are defined as follows.
Convention Description
Command Conventions
The command conventions that may be found in this document are defined as follows.
Convention Description
GUI Conventions
The GUI conventions that may be found in this document are defined as follows.
Convention Description
Keyboard Operations
The keyboard operations that may be found in this document are defined as follows.
Format          Description
Key             Press the key. For example, press Enter and press Tab.
Key 1+Key 2     Press the keys concurrently. For example, pressing Ctrl+Alt+A means the three
                keys should be pressed concurrently.
Key 1, Key 2    Press the keys in turn. For example, pressing Alt, A means the two keys should
                be pressed in turn.
Mouse Operations
The mouse operations that may be found in this document are defined as follows.
Action          Description
Click           Select and release the primary mouse button without moving the pointer.
Drag            Press and hold the primary mouse button and move the pointer to a certain
                position.
Update History
Updates between document versions are cumulative. Therefore, the latest document version
contains all updates made to previous versions.
1 Basic Principles of Troubleshooting
You need to locate and clear a fault by observing the troubleshooting principles and cautions.
Troubleshooting Principles
To analyze, locate, and clear a fault, observe the following principles:
• Restore the system monitoring as soon as possible.
• Before locating a fault, collect the fault data in a timely manner, and save the collected data
  to a mobile storage medium or another computer in the network.
• When determining the troubleshooting scheme, evaluate the impact first, to ensure the
  normal transmission of services.
• If the fault point cannot be located or the fault cannot be cleared, contact Huawei to obtain
  technical support. Cooperate with engineers from Huawei for the troubleshooting, to
  minimize the period of service interruption.
Troubleshooting Cautions
• Analyze the fault symptom, and handle the fault after locating the cause. If the cause is
  unknown, do not perform operations blindly, to avoid making the problem worse.
  The repairing of faults on the U2000 does not affect the NE running.
• Before handling a fault, keep all onsite records concerning the fault and do not delete any
  data or log arbitrarily.
• Before any modification, back up the data of the U2000 by exporting the script or backing
  up the NMS data.
• After the system recovers, observe the running status, to make sure that the fault is cleared.
  Complete the related handling report in a timely manner.
2 Troubleshooting Process
When the U2000 is abnormal because of misoperations, external causes such as a power failure,
or software and hardware faults of the U2000, the network may fail to be monitored. In this
case, you can locate the fault and repair the system by referring to the troubleshooting process
and observing the troubleshooting principles and cautions. If the problem persists, contact the
local office or customer service center of Huawei.
Figure 2-1 shows the troubleshooting process.
[Figure 2-1 Troubleshooting process: flowchart with the nodes Start, Generate an alarm?,
Process the alarm, Fault removed?, Collect fault information, Emergency?, and End, connected
by Yes/No branches]
NOTE
• Normally, the troubleshooting consists of three stages: locating the fault, collecting the information,
  and clearing the fault.
• If an alarm or abnormal event occurs on the U2000, clear the fault according to the prompt.
3 Fault Data Collection
In the case of a system fault, you need to collect the related data in a timely manner, to locate
and handle the fault.
When a fault occurs on the U2000, see Table 3-1 to collect the fault data.
NOTE
It is recommended that you use the Quick Step tool to collect the related data. For details, refer to the
Huawei iManager U2000 User Guide (Quick Step).
Time and place
    Collect the information about the time and place of the fault. The time should be accurate
    to the minute.
Symptom description
    Describe the symptom when the fault occurs. The more specific the description, the more
    easily the fault can be located.
Measures taken and result
    Preliminary troubleshooting measures taken in the field may cause new problems.
    Therefore, record in detail the measures taken and their results.
IP information
    Run the following commands to view the IP address and MAC address:
    • On Solaris and Linux, log in as user root and run the ifconfig -a command.
    • On Windows, open the command prompt window and run the ipconfig /all command.
Alarm information
    Collect the alarm information, especially the U2000 alarms or abnormal events.
Log information
    On Solaris and Linux, collect the log information about the OS, database, and U2000 as
    follows:
    • Use the Quick Step tool to collect the information about the OS and database. For
      details, refer to the Huawei iManager U2000 User Guide (Quick Step).
    • For the details about collecting the log information about the U2000, refer to Log
      Management in the Huawei iManager U2000 Administrator Guide.
    On Windows, collect the log information about the operating system, database, and U2000
    as follows:
    • Choose Start > Run from the desktop. Enter eventvwr.msc and then press Enter. In
      Event Viewer, select the corresponding event name, and right-click to save the log
      information of the operating system.
    • In the MSSQLServer_installation_directory\MSSQL\LOG directory, collect all the logs.
    • Collect the U2000 information. For details, refer to Log Management in the Huawei
      iManager U2000 Administrator Guide.
Networking diagram
    If the fault is caused by networking problems, you need to view the networking diagram.
ICMR-related files
    If the server runs on Solaris or Linux, you need to collect the ICMR-related files:
    • All files in the /etc/ICMR directory
    • Files in the /var/ICMR directory
4 NE Management Troubleshooting
Possible Cause
The possible causes are:
• The DCN between the NMS and the NE is faulty.
• The communication parameters of the NMS or the NE are incorrectly set.
• The NE is being restarted and does not respond.
Procedure
• Check the DCN between the U2000 and the NE.
  1. Check that the U2000 and the NE are reachable. You can use the ping command to
     check the network connectivity between the NMS and the NE and the packet loss ratio.
  2. Rectify the fault according to the onsite condition.
• Check the settings of the parameters on the NMS and the NE.
  1. Check the settings of the NMS communication parameters, including the IP address
     and the parameters related to the gateway.
  2. Check the settings of the NE parameters, including the IP address, ID, extension ID,
     and the parameters related to the gateway.
  3. Check whether the name and password of the user logging in to the NE are correct.
  4. Make sure that the settings of the parameters for the creation of the NE are the same
     as those on the device side.
• If the NE is being restarted and does not respond, add the NE after the restart is complete.
----End
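The packet loss ratio mentioned in the connectivity check can be read from the ping summary. The following is a minimal sketch, assuming a Bourne-compatible shell (bash or ksh) and a summary line that contains "<number>% packet loss", which is the usual wording on Solaris and Linux but may differ on other platforms. The function name packet_loss_percent is hypothetical and is not part of the U2000.

```shell
# Print the packet loss percentage found in ping output read from stdin.
# Assumption: the summary line contains "<number>% packet loss".
packet_loss_percent() {
    sed -n 's/.* \([0-9][0-9.]*\)% packet loss.*/\1/p'
}
```

To use it, pipe the output of a ping run that prints statistics into the function, for example `ping -c 4 NE_IP | packet_loss_percent` on Linux, where NE_IP stands for the IP address of the NE. A value other than 0 suggests that the DCN should be checked further.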
Possible Cause
• The number of NEs exceeds the maximum management capability of the NMS.
• The disk space is insufficient.
Procedure
Step 1 Check whether the number of NEs exceeds the maximum management capability of the NMS.
For the performance indicators, refer to the Huawei iManager U2000 Product Description.
Step 2 Check the disk space of the server. In normal situations, the disk usage should not exceed
80%. If the disk usage exceeds 80%, free up disk space by backing up the related files and then
deleting them.
----End
Possible Cause
Too many non-gateway NEs are connected to a gateway NE. As a result, the subnets are too
large and an ECC storm occurs.
Procedure
Step 1 Run the ping command to check whether the IP addresses of the disconnected gateway NEs are
available.
Step 2 Check whether the number of non-gateway NEs connected to a gateway NE exceeds the
maximum.
For the maximum number of non-gateway NEs connected to a gateway NE, refer to the product
description of the related version. If the actual number exceeds the maximum, adjust it
according to the network plan.
----End
Possible Cause
The NMS database is abnormal.
Procedure
Step 1 Initialize the database. For details, refer to Backing Up and Restoring the U2000 Database in
the Huawei iManager U2000 Administrator Guide.
Step 2 Manually recover the U2000 data. For details, refer to Backing Up and Restoring the U2000
Database in the Huawei iManager U2000 Administrator Guide.
----End
5 Faults of the Operating System
This topic describes how to troubleshoot the faults of the operating system.
5.1 Solaris OS Troubleshooting
This topic describes how to troubleshoot the Solaris OS.
5.2 Linux OS Troubleshooting
This topic describes how to troubleshoot the Linux OS.
Symptom
The operating system enters the single-user mode after restart. A message is displayed indicating
"WARNING - Unable to repair the / filesystem. Run fsck manually (fsck -F ufs /dev/rdsk/
c*t*d*s*)."
NOTE
In the warning prompt "Unable to repair the / filesystem", the / may indicate another directory.
Possible Cause
The server was shut down improperly or lost power, which damaged the mounted file system.
After the power supply is restored, the system performs a self-check during the startup of the
server. If the file system is found to be damaged, the self-check fails and the system enters the
single-user mode.
Procedure
Step 1 Log in to the operating system as user root.
CAUTION
• If the disk capacity is large and the file system is severely damaged, it may take a long time
  to restore the file system by using the fsck -y command. During the restoration, do not
  perform any operation on the server. Otherwise, the operating system cannot recover.
• The fsck command can rectify only common file system faults. It cannot rectify faults in the
  Solaris startup parameters or kernel damage caused by an abnormal power failure.
Step 3 Observe the information displayed on the screen. Check whether the file systems of all partitions
are correct and whether the file system of the damaged partition is restored.
If the error information or the information that requires restoration is displayed again, run the
fsck -y command repeatedly until such information is not displayed again.
Step 4 To synchronize the files and restart the operating system, run the following commands:
# sync;sync;sync;sync;sync;sync
# init 6
----End
Symptom
During startup, a message is displayed indicating "Cannot open '/etc/path_to_inst' Program
terminated." The operating system then restarts repeatedly.
Possible Cause
The server was powered off abnormally or other abnormal operations were performed. As a
result, the operating system is damaged and the path_to_inst system file cannot be opened.
Therefore, the operating system cannot be started.
Procedure
Step 1 During the self-check of the operating system (before the operating system starts), press
STOP+A to interrupt the startup. The ok prompt is displayed.
Step 2 Insert the installation CD-ROM of Solaris 10. To start from the CD-ROM and enter the single-
user mode, run the following command:
ok boot cdrom -s
NOTE
Wait for 5 minutes. When SINGLE USER MODE and # are displayed, the system has entered the
single-user mode.
Step 3 To find the device name that corresponds to the system root directory, run the following
command:
# cat /etc/vfstab
NOTE
The displayed message changes according to different actual conditions.
#device             device              mount             FS      fsck    mount    mount
#to mount           to fsck             point             type    pass    at boot  options
#
fd                  -                   /dev/fd           fd      -       no       -
/proc               -                   /proc             proc    -       no       -
/dev/dsk/c1t0d0s1   -                   -                 swap    -       no       -
/dev/dsk/c1t0d0s0   /dev/rdsk/c1t0d0s0  /                 ufs     1       no       -
/dev/dsk/c1t0d0s7   /dev/rdsk/c1t0d0s7  /T2000            ufs     2       yes      -
/dev/dsk/c1t0d0s6   /dev/rdsk/c1t0d0s6  /opt              ufs     2       yes      -
/devices            -                   /devices          devfs   -       no       -
ctfs                -                   /system/contract  ctfs    -       no       -
objfs               -                   /system/object    objfs   -       no       -
swap                -                   /tmp              tmpfs   -       yes      -
/dev/dsk/c1t1d0s0   /dev/rdsk/c1t1d0s0  /version          ufs     2       yes      -
In the preceding message, the device that corresponds to the root directory (/) is
/dev/dsk/c1t0d0s0.
Step 4 Mount the device that corresponds to the root directory on the /mnt directory to restore the
damaged operating system.
# mount device name /mnt
For example, run the following command to mount /dev/dsk/c1t0d0s0 on /mnt:
# mount /dev/dsk/c1t0d0s0 /mnt
Step 5 If /etc/path_to_inst is lost, run the following commands to restore it by using the
path_to_inst-INSTALL template that the system keeps in the /etc directory.
# cd /mnt/etc
# cp path_to_inst-INSTALL path_to_inst
Step 6 Run the following commands to synchronize the file and restart the operating system:
# sync;sync;sync;sync;sync;sync
# init 6
Step 7 After the system restarts normally, run the fsck -y command to repair the file system.
----End
Symptom
After the workstation is started, a message is displayed indicating that the display is not
adapted, and errors are recorded in the /var/dt/Xerrors file.
Possible Cause
The peripherals of the workstation are incorrectly connected. For example, the mouse or
keyboard is not connected or connected improperly.
Procedure
Step 1 Repair the connection of the peripherals (such as the mouse, keyboard, and display) according
to the information displayed on the screen.
----End
Symptom
After the Solaris OS is started, the user cannot log in to the GUI.
Possible Cause
Abnormal shutdown may damage the file system. Consequently, the user cannot log in to the
GUI after the Solaris OS is started. In this case, you can use the fsck command to restore the
file system.
Procedure
Step 1 After the Solaris OS is started, enter the password of the root user according to the prompt to
access the CLI.
Step 2 Run the following command several times to automatically rectify the fault:
# fsck -y
NOTE
The fsck command can rectify only common file system faults. It cannot rectify faults in the Solaris
startup parameters or kernel damage caused by an abnormal power failure.
----End
Symptom
When graphical tools such as smc are used on Solaris, a message is displayed indicating
"can’t open to display."
Possible Cause
The DISPLAY environment variable may not be set in GUI mode.
Procedure
Step 1 Log in to the OS in GUI mode.
Step 2 To query the terminal number, run the following commands as user root:
# set | grep DISPLAY
# xhost +
Step 3 To set the DISPLAY environment variable, run the following commands:
# DISPLAY=local host name (or IP address):local terminal No.
# export DISPLAY
For example:
# set | grep DISPLAY
DISPLAY=10.70.77.62:0.0
# xhost +
# DISPLAY=10.70.77.62:0.0
# export DISPLAY
----End
Symptom
A CD-ROM is in the CD-ROM drive. When you use the eject command to open the drive, the
system prompts Device busy and the CD-ROM cannot be ejected.
Possible Cause
The data in the CD-ROM is in use.
Procedure
Step 1 Check that the data in the current CD-ROM is not in use.
Step 3 Press the eject button on the drive panel to take the disc out of the CD-ROM drive.
----End
Symptom
Certain operations are abnormal. For example, users cannot log in to the operating system, the
operating system runs at a low speed, the database cannot be started, or the U2000 cannot be
started.
Possible Cause
The disk space is insufficient. Normally, the disk usage should be 80% or below.
Procedure
Step 1 Check the disk space. Do as follows:
(1) Log in to the Solaris OS as the root user.
(2) Run the following command to check the disk usage:
# df -k
(3) View the usage of the directories including the / directory, /opt directory, and /opt/
U2000 directory in the displayed information.
Step 2 If the disk usage exceeds the normal value, you need to manually clear the disk. For
details, refer to Managing U2000 Files and Disks in the Huawei iManager U2000
Administrator Guide.
----End
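The check of the df -k output against the 80% limit can be automated. The following is a minimal sketch, assuming a Bourne-compatible shell and an awk that supports the -v option (on Solaris this may mean using nawk or /usr/xpg4/bin/awk instead of awk). The function name over_limit_filesystems is hypothetical.

```shell
# Read "df -k" output on stdin and print "<usage>% <mount point>" for every
# file system whose usage exceeds the limit (default: 80, as stated in this guide).
# Assumption: the usage is the only field of the form "<digits>%" and the
# mount point is the last field on the line.
over_limit_filesystems() {
    limit="${1:-80}"
    awk -v limit="$limit" 'NR > 1 {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^[0-9]+%$/) {
                pct = $i
                sub(/%/, "", pct)
                if (pct + 0 > limit + 0) print pct "% " $NF
            }
        }
    }'
}
```

For example, `df -k | over_limit_filesystems 80` lists only the file systems that need to be cleared. No output means that all file systems are within the limit.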
Symptom
The U2000 runs at a low speed.
Possible Cause
The memory may be insufficient.
Procedure
Step 1 To check the memory occupancy status, run the following command as user root:
# vmstat 2
If the value of the sr column remains at a value from 200 to 300 page/sec, it indicates that the
physical memory may be insufficient.
Step 2 Close unnecessary applications.
Step 3 If the memory occupancy remains high, you need to replace the physical memory.
----End
Possible Cause
The CPU usage may be too high.
Procedure
Step 1 To check the CPU usage, run the following command as user root:
# vmstat 2
In the last column, id indicates the percentage of idle CPU time. If the idle CPU ratio remains
below 10% for a long time, the CPU is the main performance bottleneck.
Step 2 Close unnecessary applications.
----End
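The vmstat checks in this chapter (the sr column for memory and the id column for CPU) both read one named column over several samples. The following is a minimal sketch of extracting such a column, assuming a Bourne-compatible shell, an awk that supports the -v option, and a vmstat header line that names the columns. The function name vmstat_column is hypothetical.

```shell
# Read vmstat output on stdin and print the values of one column (for example
# sr or id), located by its name in the header line.
# Note: a name that appears twice in the header (such as sy) resolves to the
# last occurrence, so this sketch is intended for unique names only.
vmstat_column() {
    awk -v name="$1" '
        col == 0 {
            # Still looking for the header line that contains the column name.
            for (i = 1; i <= NF; i++) if ($i == name) col = i
            next
        }
        # Data lines start with a number; repeated header lines are skipped.
        $1 ~ /^[0-9]+$/ { print $col }
    '
}
```

For example, `vmstat 2 5 | vmstat_column sr` prints five samples of the scan rate, and `vmstat 2 5 | vmstat_column id` prints five samples of the idle CPU percentage, which can then be compared with the thresholds given in the procedures.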
Possible Cause
The parameter settings in the SaX2 tool do not match the related parameters of the video card
driver of the OS.
Procedure
Step 1 Log in to the system as the root user. Run the following commands to open the GUI for
configuring the SaX2 tool:
# init 3
# sax2
Step 2 Set the resolution of the monitor to VESA 1024*768@60HZ. Click OK.
----End
Symptom
The backup file does not exist in the path specified in the backup task of the database backup
tool.
Possible Cause
The possible causes of the database backup failure are as follows:
• The database is not started.
• The disk space is full.
• The permissions on the backup path may be incorrect.
Procedure
Step 1 Check that the database is normally started.
Step 2 Check the disk space. For details, see 5.1.5 Operation Anomaly Caused by Insufficient Disk
Space.
----End
1  Check whether the disk usage         Rectify the fault with reference to 5.1.5 Operation
   exceeds the limit.                   Anomaly Caused by Insufficient Disk Space.
2  Check whether the configuration      Rectify the fault with reference to 6.1.2.5 Incorrect
   file for user sybase is incorrect.   Configuration File for the sybase User.
3  Check whether there is any error     Rectify the fault according to the following error
   message in logs.                     messages:
                                        • 6.1.2.1 Prompting Permission denied in Logs
                                        • 6.1.2.2 Prompting Shared memory segment *.krg
                                          is in use in Logs
                                        • 6.1.2.3 Prompting the Incorrect Setting of the
                                          Shared Memory in Logs
                                        • 6.1.2.4 Prompting the Failure of Opening
                                          lv_master in Logs
Symptom
In the single-node cluster, the Sybase database cannot be started.
The following message is displayed in the $SYBASE/$SYBASE_ASE/install/DBSVR.log:
00:00000:00000:2004/10/10 00:03:16.63 kernel dopen: open '/dev/rdsk/c1t1d0s3', Permission denied
00:00000:00000:2004/10/10 00:03:16.63 kernel kdconfig: unable to read primary master device
00:00000:00000:2004/10/10 00:03:16.65 kernel kiconfig: read of config block failed
Possible Cause
In the preceding message, Permission denied indicates that the user does not have sufficient
permissions on the device, so the device cannot be read. Therefore, the database server cannot
be started.
CAUTION
The following operations of rectifying the fault are specific only to the single server system. If
similar faults occur to the HA system, contact the local office or customer service center of
Huawei for troubleshooting.
Procedure
Step 1 Determine the user (nmsuser, sybase, root, or another user) that is used to start the Sybase
database. The correct user is sybase.
Step 2 Check the raw partition or the file for which Permission denied is reported in the log, and
check whether the user that starts the database has permissions to access the file or raw partition
(a disk partition without a file system on it). If the user does not have the permissions, grant
them to the user.
NOTE
The device files are placed in the $SYBASE/data directory. You can change the permissions on a
device file by running the chmod 755 device file name command.
----End
Symptom
In the single-node cluster, the Sybase database cannot be started.
Possible Cause
The Sybase database server is shut down improperly. Therefore, the DBSVR.krg and
DBSVR.srg junk files exist in the $SYBASE or $SYBASE/$SYBASE_ASE directory.
CAUTION
The following operations of rectifying the fault are specific only to the single server system. If
similar faults occur to the HA system, contact the local office or customer service center of
Huawei for troubleshooting.
Procedure
Step 1 Log in to the operating system as user sybase.
Step 2 Run the following commands, and check whether the DBSVR.krg and DBSVR.srg files exist
in the $SYBASE or $SYBASE/$SYBASE_ASE directory.
$ cd $SYBASE
$ ls -al
$ cd $SYBASE/$SYBASE_ASE
$ ls -al
Step 3 If the DBSVR.krg and DBSVR.srg files exist, run the following commands to delete the files.
$ rm -rf DBSVR.krg
$ rm -rf DBSVR.srg
----End
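The check in Step 2 can be wrapped in a small helper so that both directories are inspected in one pass. The following is a minimal sketch, assuming a Bourne-compatible shell. The function name find_stale_sybase_files is hypothetical, and the function only lists files, it does not delete anything.

```shell
# Print the DBSVR.krg and DBSVR.srg files that exist in the given directories.
find_stale_sybase_files() {
    for dir in "$@"; do
        for name in DBSVR.krg DBSVR.srg; do
            if [ -f "$dir/$name" ]; then
                echo "$dir/$name"
            fi
        done
    done
}
```

For example, as user sybase, `find_stale_sybase_files "$SYBASE" "$SYBASE/$SYBASE_ASE"` prints the files that Step 3 then deletes. Make sure that the database server is really stopped before deleting them.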
Symptom
In the single-node cluster, the Sybase database cannot be started.
The following message is displayed in the $SYBASE/$SYBASE_ASE/install/DBSVR.log:
00:00000:00000:2005/07/20 17:07:15.41 kernel Using config area from primary master device.
00:00000:00000:2005/07/20 17:07:16.65 kernel Warning: Using default file '/opt/sybase1192/DBSVR.cfg' since a configuration file was not specified. Specify a configuration file name in the RUNSERVER file to avoid this message.
00:00000:00000:2005/07/20 17:07:17.39 kernel os_create_region: can't allocate 260775936 bytes
00:00000:00000:2005/07/20 17:07:17.42 kernel kbcreate: couldn't create kernel region.
00:00000:00000:2005/07/20 17:07:17.42 kernel kistartup: could not create shared memory
Possible Cause
The /etc/system file is not configured with correct shared memory.
CAUTION
The following operations of rectifying the fault are specific only to the single server system. If
similar faults occur to the HA system, contact the local office or customer service center of
Huawei for troubleshooting.
Procedure
Step 1 Add set shmsys:shminfo_shmmax=memory (MB) x 1024 x 1024/2 at the end of the /etc/
system file.
(1) To check the memory, run the following command as user root:
# prtdiag
Memory size:2GB
(2) Add set shmsys:shminfo_shmmax=memory (MB) x 1024 x 1024/2 at the end of the /etc/
system file.
For example, if the memory is 2 GB, the value of the memory (2048MB) x 1024 x 1024/2
is 1073741824.
Then, add the following contents at the end of the /etc/system file:
set shmsys:shminfo_shmmax=1073741824
TIP
• In the case of GUI, see the methods of opening and editing a file in the Solaris Online Help.
• In the case of CLI, edit the file by running the vi command. For the specific method, see the
  commands that are commonly used on Solaris.
----End
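The shminfo_shmmax value is half of the physical memory expressed in bytes. The following is a minimal sketch of the calculation, assuming a shell with 64-bit $(( )) arithmetic such as bash or ksh93 (the traditional Solaris /bin/sh does not support this syntax). The function name shmmax_line is hypothetical.

```shell
# Print the shminfo_shmmax line for /etc/system, given the physical memory in MB.
# The value is memory (MB) x 1024 x 1024 / 2, as described in the procedure.
shmmax_line() {
    memory_mb="$1"
    echo "set shmsys:shminfo_shmmax=$(( memory_mb * 1024 * 1024 / 2 ))"
}
```

For example, `shmmax_line 2048` prints `set shmsys:shminfo_shmmax=1073741824`, which matches the 2 GB example in the procedure. The printed line still has to be added to the /etc/system file manually.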
Symptom
In the single-node cluster, the Sybase database cannot be started.
Possible Cause
The device file of the master database is lost.
CAUTION
The following operations of rectifying the fault are specific only to the single server system. If
similar faults occur to the HA system, contact the local office or customer service center of
Huawei for troubleshooting.
Procedure
Step 1 Back up the U2000 data to the local server. For details, see the Huawei iManager U2000
Administrator Guide.
Step 2 Reinstall the Sybase database. For details, see the Huawei iManager U2000 Installation Guide.
CAUTION
The U2000 monitoring may be interrupted during the database reinstallation. Therefore, ensure
that the database data is backed up for data restoration.
Step 3 Initialize the U2000 database. For details, see the administrator guide for the corresponding
version and solution.
CAUTION
Data may be lost during the database initialization. Therefore, ensure that the database data is
backed up before the initialization.
Step 4 Restore the U2000 database data. For details, see the administrator guide for the corresponding
version and solution.
Step 5 Restart the database.
----End
Symptom
In the single-node cluster, the Sybase database cannot be started.
After switching to the sybase user by running the su - sybase command, run the showserver
command. The query result does not contain the dataserver and backupserver processes.
Possible Cause
The configuration of the sybase user may be faulty in one of the following ways:
• The sybase user group does not exist.
• The sybase user does not exist.
• The .profile file does not exist in the home directory of the sybase user.
• The .profile file of the sybase user is incorrect.
CAUTION
The following operations of rectifying the fault are specific only to the single server system. If
similar faults occur to the HA system, contact the local office or customer service center of
Huawei for troubleshooting.
Procedure
Step 1 To check whether the sybase user group exists, run the following command as the root user:
# cat /etc/group
If sybase is displayed before the first : in the preceding message, it indicates that the sybase
user group exists. Otherwise, run the following command as the root user to create the sybase
user group manually:
# groupadd sybase
Step 2 To check whether the sybase user exists, run the following command as the root user:
# cat /etc/passwd
If sybase is displayed before the first : in the preceding message, it indicates that the sybase
user exists. Otherwise, run the following command as the root user to create the sybase user
manually:
# useradd -d /opt/sybase -g sybase -s /usr/bin/sh sybase
Step 3 To check whether the .profile file exists in the home directory of the sybase user, run the
following command as the root user:
# su - sybase
$ cd $HOME
$ ls -a
If the .profile file is displayed, it indicates that the .profile file exists. Otherwise, run the
following command as the root user to create the file manually:
# touch /opt/sybase/.profile
Step 4 To check whether the .profile file is correct, run the following command as the sybase user:
$ cat .profile
If the following information is displayed, it indicates that the .profile file is correct. Otherwise,
add the following information to the .profile file in the /opt/sybase/ directory as the root user:
#!/usr/bin/sh
PS1=$
export PS1
. /opt/sybase/SYBASE.sh
LANG=C
export LANG
Step 5 Set the owner and permissions of the /opt/sybase/ directory to the correct values.
# chmod -R 755 /opt/sybase
# chown -R sybase:sybase /opt/sybase
----End
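Steps 1 and 2 both look for a name before the first colon in /etc/group or /etc/passwd. Reading the full cat output by eye is error-prone when similar names exist, so the check can be made exact. The following is a minimal sketch, assuming a Bourne-compatible shell and an awk that supports the -v option (nawk on Solaris). The function name has_entry is hypothetical.

```shell
# Succeed (exit status 0) if NAME is the first colon-separated field of a line
# in FILE, for example /etc/group or /etc/passwd. Usage: has_entry NAME FILE
has_entry() {
    awk -F: -v name="$1" '$1 == name { found = 1 } END { exit !found }' "$2"
}
```

For example, `has_entry sybase /etc/group || echo "sybase group is missing"` reports a missing group, and the same call with /etc/passwd reports a missing user. Creating the group or user still follows the commands in the procedure.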
Symptom
In the single server system, the database cannot be started automatically after the Solaris or
SUSE Linux server is started.
Possible Cause
The Sybase database is manually started by different users, which causes the number of
devices configuration item in the DBSVR.cfg file to restore the default value. As a result, the
database process cannot automatically restart.
Procedure
Step 1 Check whether the DBSVR.krg file exists in the /opt/sybase/ASE-* directory. If the file exists,
delete the file.
Step 2 Modify the DBSVR.cfg file in the /opt/sybase/ASE-* directory. Change the value of the
number of devices configuration item to 255.
Step 3 Log in to the operating system as user sybase.
Step 4 To start the database manually, run the following commands:
$ cd $SYBASE/$SYBASE_ASE/install
$ ./startserver -f ./RUN_DBSVR
$ ./startserver -f ./RUN_DBSVR_back
----End
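Before and after Step 2, it can help to read back the current value of the number of devices item. The following is a minimal sketch, assuming a Bourne-compatible shell, a sed that supports POSIX character classes, and a configuration line of the form "number of devices = 255". The exact layout of the DBSVR.cfg file is an assumption here and should be checked against the actual file. The function name number_of_devices is hypothetical.

```shell
# Print the value of the "number of devices" item in a configuration file.
# Assumption: the item is written as "number of devices = <value>" on one line.
number_of_devices() {
    sed -n 's/^[[:space:]]*number of devices[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*/\1/p' "$1"
}
```

For example, `number_of_devices DBSVR.cfg`, run in the directory that contains the file, should print 255 after the change in Step 2.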
The log indicates that the device        Rectify the fault with reference to 6.1.4.1 Prompting dopen:
file cannot be opened.                   open '/opt/sybase/data/lv_LogDB_dev' in Logs.
The log indicates suspect.               Rectify the fault with reference to 6.1.4.2 Prompt suspect in
                                         Logs.
The log indicates the disk allocated     Rectify the fault with reference to 6.1.4.3 Disk of the
for the database logs is full.           Database Logs Is Full.
Symptom
A message is displayed in the $SYBASE/$SYBASE_ASE/install/DBSVR.log indicating that
the device file cannot be opened. The message displayed is as follows:
NOTE
The contents in () are explanations of the message.
00:00000:00001:2005/07/20 17:18:29.57 server Activating disk 'LogDB_dev'.
00:00000:00001:2005/07/20 17:18:29.57 kernel Initializing virtual device 13, '/opt/sybase1192/data/lv_LogDBR6'
00:00000:00001:2005/07/20 17:18:29.57 kernel dopen: open '/opt/sybase/data/lv_LogDB_dev', No such file or directory (The device file does not exist.)
00:00000:00001:2005/07/20 17:18:29.57 kernel udactivate: error starting virtual disk 13 (The device cannot be activated because the device file does not exist.)
......
00:00000:00001:2005/07/20 17:18:46.38 kernel udstartio: vdn 13 has not been set up (Device 13 is not activated.)
00:00000:00001:2005/07/20 17:18:46.40 server Error: 840, Severity: 17, State: 1 (Error code)
00:00000:00001:2005/07/20 17:18:46.40 server Device 'LogDB_dev' (with physical name '/opt/sybase1192/data/lv_LogDB_dev', and virtual device number 13) has not been correctly activated at startup time. Please contact a user with System Administrator (SA) role. (The device cannot be started.)
00:00000:00001:2005/07/20 17:18:46.40 server Unable to proceed with the recovery of dbid <8> because of previous errors. Continuing with the next database. (The database cannot be restored because the device cannot be started.)
Possible Cause
The device file of the database is lost. The file may have been deleted by mistake or lost due to
a power failure.
Fault Diagnosis
To find the name of the database where the fault occurs, run the following commands as user
root:
# su - sybase
$ isql -Usa -P<sa password> -SDBSVR
1> select name,status from sysdatabases
2> go
NOTE
Assume that the physical file of LogDB is deleted by mistake.
name status
------------------------------ ------
Eml_multinesvrDB 12
FaultDB 12
LogDB 76
master 0
model 0
sybsystemdb 0
sybsystemprocs 8
tempdb 12
The status value of LogDB is 76, which indicates that the physical file of LogDB has been
deleted by mistake.
Procedure
Step 1 To start the database, run the following commands as user sybase:
$ cd /opt/sybase/ASE-*/install
$ ./startserver -f ./RUN_DBSVR &
$ ./startserver -f ./RUN_DBSVR_back &
In the message displayed, if the status value of database name to be restored is 320, it indicates
that the setting is successful.
Step 4 Run the following commands:
1> shutdown
2> go
Step 5 To start the database, run the following commands as user sybase:
$ cd /opt/sybase/ASE-*/install
$ ./startserver -f ./RUN_DBSVR &
$ ./startserver -f ./RUN_DBSVR_back &
NOTE
The following takes the unexpected deletion of the physical file of LogDB as an example.
name
------------------------------
FaultDB_dev
FaultDBlog_dev
LogDB_dev
LogDBlog_dev
NAWdmNemgrDB_994_dev
NAWdmNemgrDB_994log_dev
NgwdmaNemgrDB_6154_dev
NgwdmaNemgrDB_6154log_dev
OAMSDB_dev
OAMSDBlog_dev
SchdDB_dev
SchdDBlog_dev
SecurityDB_dev
SecurityDBlog_dev
TNCOMMONDB_dev
TNCOMMONDBlog_dev
TNOTNDB_dev
TNOTNDBlog_dev
TopoDB_dev
TopoDBlog_dev
TransPerfDB_dev
TransPerfDBlog_dev
master
mcdb_dev
mcdblog_dev
sysprocsdev
tapedump1
tapedump2
tempdb_dev
tempdblog_dev
(2) Find the names of the database devices to be deleted according to the message displayed.
The prefixes of the names of the database devices to be deleted are consistent with the name
of the database to be restored. For example, the name of the database to be restored in this
case is LogDB. Then, the names of the database devices to be deleted are LogDB_dev and
LogDBlog_dev.
(3) To delete the database devices, run the following commands:
1> sp_dropdevice database device name
2> go
For example, the names of the database devices to be deleted in this case are
LogDB_dev and LogDBlog_dev. Run the following commands:
1> sp_dropdevice LogDB_dev
2> go
1> sp_dropdevice LogDBlog_dev
2> go
Step 9 Initialize the database. For the specific method, see the administrator guide for the corresponding
version and solution.
Step 10 Restore the database data. For the specific method, see the administrator guide for the
corresponding version and solution.
----End
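The device selection rule in sub-step (2) of the preceding procedure can be sketched in Python as follows. This is an illustration only. It assumes that each database has exactly two devices named after the pattern seen in the example (the database name followed by _dev and by log_dev). The function names are hypothetical, and the generated lines are only the text to be typed in isql.

```python
def devices_for_database(db_name, device_names):
    """Pick the devices that belong to db_name, assuming the <db>_dev / <db>log_dev pattern."""
    wanted = {db_name + "_dev", db_name + "log_dev"}
    return [name for name in device_names if name in wanted]


def drop_device_statements(db_name, device_names):
    """Build the isql input lines that drop the devices of db_name."""
    lines = []
    for device in devices_for_database(db_name, device_names):
        lines.append("sp_dropdevice " + device)
        lines.append("go")
    return lines


if __name__ == "__main__":
    devices = ["FaultDB_dev", "FaultDBlog_dev", "LogDB_dev", "LogDBlog_dev", "master"]
    for line in drop_device_statements("LogDB", devices):
        print(line)
```

Matching the two full device names, rather than any name that starts with the database name, avoids selecting the devices of a different database whose name happens to share the same prefix.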
Symptom
A message is displayed in the $SYBASE/$SYBASE_ASE/install/DBSVR.log indicating that
the equipment file cannot be opened. The message displayed is as follows:
00:00000:00001:2005/07/20 17:33:25.71 server Error: 926, Severity: 14, State: 1
00:00000:00001:2005/07/20 17:33:25.71 server Database 'database name' cannot be
opened.
An earlier attempt at recovery marked it 'suspect'.
Check the SQL Server errorlog for information as to the cause.
Possible Cause
The database is marked as suspect. Generally, this fault occurs because of an abnormal power failure
of the server, because the equipment file of the database is damaged, or because the database log is
full and was not cleared in a timely manner. You need to rectify the fault manually.
CAUTION
If the master database is marked as suspect, you need to re-install the database or seek advice from
Sybase engineers.
Procedure
Step 1 Log in to the operating system as user root.
Step 2 To log in to the database as user sa , run the following commands:
# su - sybase
$ isql -Usa -P<sa password> -SDBSVR
Step 3 To update the suspended database in the log, run the following commands:
1> sp_configure 'allow update', 1
2> go
1> update master..sysdatabases set status = -32768 where name = 'database name'
2> go
1> shutdown SYB_BACKUP
2> go
1> shutdown
2> go
Step 10 Run the following commands to restart the database server. Then you can restore the database.
$ cd /opt/sybase/ASE-*/install
$ ./startserver -f ./RUN_DBSVR &
$ ./startserver -f ./RUN_DBSVR_back &
----End
Symptom
The database is started abnormally.
Possible Cause
The possible causes that result in full log space of the database are as follows:
l The log truncation is not set.
l The database is set to a small size.
Fault Diagnosis
To find the name of the database with full log space, do as follows:
1. Ensure that the U2000 application is closed and the database is started.
2. To search for the names of all the databases, run the following commands as user root:
# su - sybase
$ isql -Usa -P<sa password> -SDBSVR
1> sp_helpdb
2> go
3. To search for the name of the database with full log space, run the following commands:
# su - sybase
$ isql -Usa -P<sa password> -SDBSVR
1> sp_helpdb database name
2> go
In the message displayed, the number behind only log free kbytes indicates the remaining
space of the database log.
4. Find the name of the database with full log space according to the message displayed.
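If the sp_helpdb output is saved as text, the remaining log space can be extracted with a short script. The following Python sketch is an assumption-laden illustration: the label is taken from the description above, the exact layout of the sp_helpdb output (for example, whether an equals sign follows the label) is assumed, and the function name is hypothetical.

```python
import re


def log_free_kbytes(sp_helpdb_output, label="only log free kbytes"):
    """Return the number that follows the label, or None if the label is not found.

    Assumption: the label is followed by optional whitespace, an optional
    '=' or ':' sign, and then an integer number of kilobytes.
    """
    pattern = re.escape(label) + r"\s*[=:]?\s*(\d+)"
    match = re.search(pattern, sp_helpdb_output)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Hypothetical sample line, not copied from a real server.
    print(log_free_kbytes("only log free kbytes = 20384"))
```

If the label on the actual server differs, pass the text that is really displayed as the label argument.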
Procedure
Step 1 Log in to the operating system as user root.
Step 3 To update the suspended database in the log, run the following commands:
1> sp_configure 'allow update', 1
2> go
1> update master..sysdatabases set status = -32768 where name = 'database name'
2> go
1> shutdown SYB_BACKUP
2> go
1> shutdown
2> go
Step 10 Run the following commands to restart the database server. Then you can restore the database.
$ cd /opt/sybase/ASE-*/install
$ ./startserver -f ./RUN_DBSVR &
$ ./startserver -f ./RUN_DBSVR_back &
----End
Possible Cause
The possible causes that result in the database re-installation failure are as follows:
l The path where the installation software package is located contains spaces, punctuation,
or Chinese characters.
l The path where the database is to be installed contains spaces, punctuation, or
Chinese characters.
l The database is uninstalled incompletely. Therefore, junk files exist.
l The registry information is faulty or deleted incompletely.
l The computer is infected by viruses.
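The two path-related causes above can be checked before the installation starts. The following Python sketch is an illustration under stated assumptions: it treats every non-ASCII character as a problem (which is stricter than checking for Chinese characters only), and it assumes that the backslash, slash, colon, underscore, hyphen, and dot are acceptable because they are normal parts of a path. The names are hypothetical.

```python
import string

# Assumption: these punctuation characters are legitimate in a Windows or UNIX path.
ALLOWED_PUNCTUATION = set("\\/:_-.")


def path_problems(path):
    """Return a list describing why a path may cause the installation to fail."""
    problems = []
    if " " in path:
        problems.append("space")
    if any(ord(ch) > 127 for ch in path):
        problems.append("non-ASCII character")
    if any(ch in string.punctuation and ch not in ALLOWED_PUNCTUATION for ch in path):
        problems.append("punctuation")
    return problems


if __name__ == "__main__":
    for candidate in [r"D:\MSSQL2000", r"D:\Program Files\SQL", r"D:\sql(new)"]:
        print(candidate, path_problems(candidate))
```

An empty list means that none of the three conditions was detected. It does not rule out the other causes in the list.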
Procedure
Step 1 Ensure that the following paths do not contain any Chinese character:
l The path where the installation software package is located
l The path where the database to be installed is located
Step 2 Ensure that the database is installed correctly according to the following method:
NOTE
The Microsoft SQL Server 2000 is considered as an example.
(1) You need to stop the database server and exit the database service manager before
uninstalling the Microsoft SQL Server 2000.
(2) Click Start and choose Control Panel. The Control Panel window is displayed.
(3) Double-click the Add or Remove Programs icon. The Add or Remove Programs
window is displayed.
(4) Select Microsoft SQL Server 2000, and then click Change/Remove.
(5) Click Yes. A progress bar is displayed.
(6) Perform the remaining operations according to the prompts.
(7) Delete the MSSQL2000 folder in the installation directory of the database.
(8) Delete the Microsoft SQL Server folder in the Program Files folder that is placed in the
installation directory of the operating system.
(9) Delete the MSDesigners7 and MSDesigners98 folders in the Program Files\Common
Files\Microsoft Shared directory that is in the installation directory of the operating
system.
(10) Delete the following registry information.
TIP
For the method of opening the registries, see the Windows Online Help.
Step 3 After the preceding operations are performed, restart the operating system.
Step 4 Ensure that the registries do not contain the PendingFileRenameOperations key value.
TIP
For the method of opening the registries, see the Windows Online Help.
Step 6 If the database re-installation fails, the computer may be infected with viruses. Check for and
remove the viruses by using the anti-virus software.
Step 7 If the preceding procedure does not work, contact Huawei technical support personnel.
----End
Symptom
After the Windows password is changed, an attempt to log in to the SQL Server fails. How to
solve this problem?
Possible Cause
Procedure
Step 1 Choose Start > Administrative Tools > Services.
Step 2 In the SQL Server services automatically started by Windows, right-click MSSQLSERVER ,
and then choose Properties. Click the Log On tab, and change the password to the new one.
Step 4 In the service manager of SQL Server, start the SQL Server and SQL Server Agent services.
----End
If prompts are displayed in the DOS window, locate the fault according to the prompts.
If the following information is displayed, rectify the fault with reference to the corresponding
solutions:
l 6.2.3.1 System Prompts login database failure
l 6.2.3.4 System Prompts Incorrect Parameter of Java Virtual Machine
If no prompt is displayed, locate the fault by querying the log information in the
nms\server\database\log file.
If the following information is displayed, rectify the fault with reference to the corresponding
solutions:
l 6.2.3.2 Prompt Failed to open the database 'U2000DB'Failed to open the database 'VSMDB' in Logs
l 6.2.3.3 Prompt Cannot insert duplicate key in object 'TrailServiceType' in Logs
Symptom
On Windows, when the U2000 database is initialized, a message is displayed indicating login
database failure.
Possible Cause
The possible causes that result in the database login failure are as follows:
l The alias of the database server is set incorrectly or is not set.
Procedure
Step 1 Check whether the database is started. If not, start it manually.
(1) Double-click the database icon on the taskbar of Windows. The SQL Server Service
Manager window is displayed.
(2) Check whether the database server is started.
If Start/Continue is grayed out, it indicates that the database is already started. Otherwise,
click Start/Continue to start the database server.
Step 2 Check for and rectify the alias of the database server.
(1) Click Start and then choose Programs > Microsoft SQL Server > Client Network
Utility. On the Alias tab page, view the alias of the database server.
The Server alias should be DBSVR.
(2) Initialize the database again.
If the message indicating login database failure is displayed again, the ODBC data source
may not be configured or may be configured incorrectly.
Step 3 Check for and restore the configuration of the ODBC data source.
(1) Choose Control Panel > Administrative Tools > Data Sources (ODBC).
(2) On the System DSN tab page, view the configuration of U2000DBServer.
(4) Click Finish. In the Microsoft SQL Server Configuration dialog box displayed, enter the
following information:
(5) Click Next. In the Microsoft SQL Server Configuration dialog box displayed, set the
parameters as follows:
l Select the With Windows NT authentication using the network login ID. and
Connect to SQL Server to obtain default setting for the additional configuration
options. check boxes.
l In the Login ID field, enter the database user name sa. The Password is null. If a
password is set, enter the password.
(6) Click Next. In the dialog box displayed, select Change the default database to: and then
select master from the drop-down list.
(7) Click Next. In the dialog box displayed, the default settings are recommended.
(9) Click Test Data Source.... Then, observe the information displayed on the screen. If TEST
COMPLETED SUCCESSFULLY! is displayed, the U2000 application and the database
server are connected.
(10) Initialize the database again.
----End
6.2.3.2 Prompt Failed to open the database 'U2000DB'Failed to open the database
'VSMDB' in Logs
Symptom
Database initialization fails. Check the logs in the nms\server\database\log directory and the
following message is found:
2008-08-06_10:27:51(DBConnectionManager.getSingleConnection) finish to
getSingleConnection
2008-08-06_10:27:51(CMSSQLConfig.mssqlSetDBOwner) Begin to set database
U2000DB's owner to U2000user
2008-08-06_10:27:51(CMSSQLConfig.mssqlSetDBOwner) ERROR:Set database U2000DB's
owner to U2000user failed
2008-08-06_10:27:51(CMSSQLConfig.mssqlSetDBOwner) ERROR:java.sql.SQLException:
[Microsoft][ODBC SQL Server Driver][SQL Server] Failed to open the database
'U2000DB', because the file cannot be accessed, or the memory or the disk space is
Possible Cause
Certain database files were deleted or the disk space is insufficient.
Procedure
Step 1 Check the disk space. You can locate and rectify the fault with reference to 5.1.5 Operation
Anomaly Caused by Insufficient Disk Space.
Step 2 To delete the database manually, run the following commands:
> isql -Usa -P<sa password> -SDBSVR
1> drop database database name
2> go
Symptom
Database initialization fails. Check the logs in the U2000\server\database\log directory and the
following message is found:
2008-04-02_18:20:11(CServerConfig.RunCommand) ERROR:Execute command failed
2008-04-02_18:20:11(CServerConfig.RunCommand) ERROR:java.lang.Exception: MSSQL
bcp executes failed
2008-04-02_18:20:11(CServerConfig.LoadDataTable) ERROR:Load data to
U2000DB.TrailServiceType from D:\U2000\server\database/staticdata/chinese
\TrailServiceType.dat failed
2008-04-02_18:20:11(CServerConfig.LoadDataTable) ERROR:java.lang.Exception:
Failed to import the static data.
2008-04-02_18:20:11(CServerConfigManagement.loadAllStaticDatatable) ERROR:load
static data failed
2008-04-02_18:20:11(CServerConfigManagement.loadAllStaticDatatable)
ERROR:java.lang.Exception: Failed to import the static data .
2008-04-02_18:20:11(CServerConfigManagement.InitializeDatabase)
ERROR:Initialize database failed
2008-04-02_18:20:11(CServerConfigManagement.InitializeDatabase)
ERROR:java.lang.Exception: Failed to import the static data.
2008-04-02_18:20:11(CServerConfigManagement.InitializeDatabase) ERROR:Error
Message is Starting copy...
SQLState = 23000, NativeError = 2627
Error = [Microsoft][ODBC SQL Server Driver][SQL Server]Violation of UNIQUE KEY
constraint 'UQ__TrailServiceType__114A936A'. Cannot insert duplicate key in object
'TrailServiceType'.
SQLState = 01000, NativeError = 3621
Warning = [Microsoft][ODBC SQL Server Driver][SQL Server]The statement has been
terminated.
BCP copy in failed
Possible Cause
The character set used by the Microsoft SQL server database is not Chinese, while that used by
the U2000 is Chinese.
Procedure
Step 1 Run the following commands according to the command prompts:
> isql -Usa -P<sa password> -SDBSVR
1> sp_helpsort
2> go
Server default collation
NOTE
If Chinese-PRC is displayed, it indicates that the character set used by the database is Chinese. Otherwise, the
database needs to be installed again.
----End
Symptom
Database initialization fails. Check the logs in the vsm\server\database\log directory and the
following message is found:
Possible Cause
The symbol \ exists at the end of the value of the IMAP environment variable.
Procedure
Step 1 Check for and restore the IMAP environment variable. For details, see 7.1.4 U2000
Environment Variable Is Set Incorrectly.
Step 2 Initialize the database again.
----End
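The trailing backslash described in the possible cause can be detected with a short check. The following Python sketch is an illustration only. It reads the IMAP environment variable named above and does not modify it. The function name and the returned messages are hypothetical.

```python
import os


def check_imap(environ=None):
    """Report whether the IMAP environment variable is missing or ends with a backslash."""
    environ = os.environ if environ is None else environ
    value = environ.get("IMAP")
    if value is None:
        return "IMAP is not set"
    if value.endswith("\\"):
        return "IMAP ends with a backslash"
    return "ok"


if __name__ == "__main__":
    print(check_imap())
```

Passing a dictionary instead of the real environment makes it possible to try the check with sample values before running it on the server.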
Possible Cause
The possible causes that result in the database backup failure are as follows:
l The database is not started.
l The disk space is full.
Procedure
Step 1 Ensure that the database is started.
If the database icon in the Windows taskbar is displayed as , it indicates that the database is
started.
Step 2 Check the disk space. For details, see 5.1.5 Operation Anomaly Caused by Insufficient Disk
Space.
----End
1 Judge whether the fault is caused by the U2000 coredump.
Handling: Rectify the fault with reference to 7.1.1 Abnormal Termination of the Server Application.
2 Locate and rectify the fault according to the following system prompts.
Handling: Locate and rectify the fault according to the following system prompts:
l 7.1.2 System Prompting Connection Failure to the Database
l 7.1.3 Prompting Invalid License
l 7.1.4 U2000 Environment Variable Is Set Incorrectly
3 Restarting the U2000 server fails.
Handling: Contact Huawei engineers for troubleshooting.
Symptom
The U2000 server application is terminated abnormally.
Possible Cause
The problem may be caused by the U2000 core dump.
Procedure
Step 1 Check whether any file whose name starts with core. exists in the following directories.
On UNIX:
l /opt/U2000
l /opt/U2000/server
l /opt/U2000/server/bin
On Windows:
l D:\U2000
l D:\U2000\server
l D:\U2000\server\bin
NOTE
l In the case of the Unix OS, the installation of the U2000 in the /opt/U2000 path is taken as an example.
l In the case of the Windows OS, the installation of the U2000 in the D:\U2000 path is taken as an
example.
----End
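The search in Step 1 can be automated. The following Python sketch is an illustration only. It looks for files whose names start with core. directly in the listed directories, without descending into subdirectories. The function name is hypothetical, and the directory list must be adapted to the actual installation path.

```python
from pathlib import Path


def find_core_files(directories):
    """Return the files named core.* that lie directly in the given directories."""
    found = []
    for directory in directories:
        base = Path(directory)
        if not base.is_dir():
            # Skip directories that do not exist on this host.
            continue
        found.extend(sorted(p for p in base.glob("core.*") if p.is_file()))
    return found


if __name__ == "__main__":
    # Example paths from this topic; they assume installation in /opt/U2000.
    for core_file in find_core_files(["/opt/U2000", "/opt/U2000/server", "/opt/U2000/server/bin"]):
        print(core_file)
```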
Possible Cause
l The database is not started.
l The communication connection between the database and the server is set improperly.
l The database password was modified improperly, which damaged the configuration
file.
l Other problems regarding the database occur.
Procedure
l Check whether the database is started. If the database is not started, start the database
manually.
Check and start the database on Windows according to the following procedure:
1. Double-click the database icon on the Windows taskbar.
The SQL Server Service Manager dialog box is displayed.
2. Check whether the database server is started.
– If the Start/Continue option is grayed, it indicates that the database is started.
– If the database is not started, click Start/Continue to start the database server.
NOTE
In the dialog box that is displayed, select the Auto-start service when OS starts option.
Check and start the database on Solaris according to the following procedure:
----End
Possible Cause
l If the U2000 cannot start or certain functions cannot be used, the possible cause is that the
license item is incorrect.
l If the time setting of the OS is incorrect, the license may also be invalid.
Procedure
l Check for and rectify the fault on Solaris according to the following precautions:
1. Ensure that the date of the OS is the current date.
2. A unique license file exists in the /opt/U2000/server/license directory.
If more than one license file exists in the directory, you need to delete the redundant license
files manually.
3. The MAC address in the license file must be the same as the MAC address of the NIC
that is actually used on the server.
If the MAC addresses are different, you need to apply for a new license.
4. The license file must be transferred in the ASCII format.
TIP
You can check the license file by running the vi command. If each line of the license file ends
with the ^M symbol, it indicates that the license file is uploaded in binary mode. You need to
re-upload the license file.
5. The permissions of the U2000 must be correct.
6. The license file must comply with the U2000 version.
l Check for and rectify the fault on Windows according to the following precautions:
NOTE
Suppose that the U2000 is installed in the D:\U2000 directory.
1. Ensure that the date of the OS is the current date.
2. A unique license file exists in the D:\U2000\server\license directory.
If more than one license file exists in the directory, you need to delete the redundant license
files manually.
3. The MAC address in the license file must be the same as the MAC address of the NIC
that is actually used on the server.
If the MAC addresses are different, you need to apply for a new license.
4. The license file must comply with the U2000 version.
----End
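Two of the preceding license checks lend themselves to a script: the uniqueness of the license file and, on Solaris, the ^M symbols that indicate a transfer in binary mode. The following Python sketch is an illustration only. It assumes that every regular file in the license directory is a license file, and it treats a carriage return followed by a line feed as the ^M case. The function names are hypothetical.

```python
from pathlib import Path


def has_crlf_line_endings(data):
    """Return True if the bytes contain CR LF line endings (shown as ^M in vi)."""
    return b"\r\n" in data


def check_license_dir(directory):
    """Return a list of problems found in the license directory."""
    files = sorted(p for p in Path(directory).iterdir() if p.is_file())
    problems = []
    if len(files) != 1:
        problems.append("expected exactly one license file, found %d" % len(files))
    for path in files:
        if has_crlf_line_endings(path.read_bytes()):
            problems.append("%s: lines end with carriage returns (^M)" % path.name)
    return problems


if __name__ == "__main__":
    # Example path from this topic; it assumes installation in /opt/U2000.
    license_dir = Path("/opt/U2000/server/license")
    if license_dir.is_dir():
        print(check_license_dir(license_dir))
```

The sketch does not verify the MAC address, the date, or the version match, which still need to be checked manually.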
Symptom
A message is displayed indicating that the environment variable of the U2000 is set incorrectly.
Possible Cause
The environment variable is lost or modified.
Procedure
Step 1 Check the environment variable of the U2000.
l On Windows, right-click My Computer on the desktop and choose Properties from the
shortcut menu. On the Advanced tab page, click Environment variable to query the value
of IMAP.
l On Solaris, run the echo $IMAP command as user nmsuser to query the value of IMAP.
----End
Possible Cause
This is caused by a permission problem with the U2000 installation path. You can change the
owner of the U2000 installation path to solve this problem.
Procedure
Step 1 Log in to the Unix OS as the root user.
Step 2 To change the owner of the U2000 installation path to nmsuser, run the following commands
in the CLI:
# cd /opt
# chown -R nmsuser U2000
Step 3 To change the owner of the EmfSecuDm process to root, run the following commands:
# cd /opt/U2000/server/bin
# chown -R root EmfSecuDm
----End
Possible Cause
The cause of this problem is that these processes were once started as the root user and then
exited abnormally before the current startup attempt.
Procedure
Step 1 Start the processes normally as the root user, and then exit them normally.
----End
Possible Cause
If the system time of the server is modified while the NMS is running, the whole system looks
normal. Some functions based on timer principles, however, may be affected, such as the
scheduled dump function of the security Daemon.
Procedure
l Shut down the NMS and the database, and then restart the server.
NOTE
Set the correct system time of the server when installing the NMS. Never modify it while the NMS
is running. If needed, first exit the NMS server, then modify the system time and restart the NMS
server.
----End
1 Check whether the number of non-gateway NEs managed by the gateway NE exceeds the limit.
Generally, each gateway NE is recommended to support a maximum of 50 non-gateway NEs (including
the non-gateway NEs that use the extended ECC to connect to the gateway NE). If the number of
non-gateway NEs exceeds 60, it is recommended that the number of gateway NEs be increased.
Otherwise, ECC congestion may occur easily, which causes slow response to operations in the user
interface.
Handling: Contact Huawei engineers for network division, ECC reconstruction, and DCN
reconstruction.
2 Check whether a large number of abnormal events are reported to the U2000.
Handling: Rectify the fault according to the abnormal events.
4 Check whether the operating system is normal.
If the operating system runs at a low speed or crashes or is restarted frequently, the problem may
be caused by exceptions of the operating system.
Handling: If the operating system runs abnormally, rectify the fault with reference to 5.1.1
Starting the Operating System Fails.
5 Check whether the disk usage exceeds the limit.
Normally, the disk space occupancy should be 80% or below.
Handling: If the disk space exceeds the normal value, rectify the fault with reference to 5.1.5
Operation Anomaly Caused by Insufficient Disk Space.
6 Check the hardware performance of the U2000 server.
Handling: Rectify the fault with reference to 5.1.6 Slow Running of the System Caused by
Insufficient Memory and 5.1.7 Slow Running of the System Caused by High CPU Usage.
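The disk usage check in the preceding list (occupancy of 80% or below) can be sketched in Python as follows. This is an illustration only. It uses the standard shutil.disk_usage call, whose figures may differ slightly from those reported by operating system tools such as df. The function names are hypothetical.

```python
import shutil


def usage_percent(total, used):
    """Return the used space as a percentage of the total space."""
    return 100.0 * used / total


def disk_usage_exceeds(path, limit_percent=80.0):
    """Return True if the file system that holds path is fuller than limit_percent."""
    usage = shutil.disk_usage(path)
    return usage_percent(usage.total, usage.used) > limit_percent


if __name__ == "__main__":
    print(disk_usage_exceeds("."))
```

The default limit of 80 comes from the list above. Pass a different limit_percent to apply a site-specific threshold.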
This topic describes how to troubleshoot the faults of the U2000 client.
8.1 Starting the U2000 Client Fails
8.2 U2000 Client Login Failure
8.3 U2000 Client Runs Abnormally
8.4 Main Menu or Icons Cannot Be Loaded in the U2000 Client Window
8.5 The NE Manager GUI of Certain Equipment Is Displayed Abnormally on the U2000 Client
Possible Cause
The possible causes that result in the U2000 client startup failure are as follows:
l The files of the operating system and client are abnormal.
l The shortcut icon on the desktop is not updated after upgrade.
l The virtual memory is not set. This may be caused by illegal installation of the U2000
client.
Procedure
Step 1 If a prompt is displayed, locate and rectify the fault according to the prompt information.
Step 2 Uninstall the U2000 client and then install it again. For details, see the Huawei iManager
U2000 Client Installation Guide.
----End
Possible Cause
The possible causes that result in the U2000 client login failure are as follows:
l The U2000 server is faulty.
l When the server is installed in the Windows OS, the ODBC data source is configured
incorrectly or not configured on the U2000 server.
l When the server is installed in the Windows OS, the database dynamic port setting on the
U2000 is incorrect.
l The network between the client and server is faulty.
l The version of the client is inconsistent with that of the server.
l The communication protocol used by the client is inconsistent with that used by the server.
l The user that logs in to the client is locked. This may be caused by a number of failed login
attempts.
l The number of clients allowed in the license is restricted.
Procedure
Step 1 If a prompt is displayed, locate and rectify the fault according to the prompt information.
Step 2 Choose Help > About on the U2000 server to check the number of clients allowed in the license.
If the number of clients to log in exceeds the maximum number of clients allowed in the license,
apply for a new license and update the U2000 license. For details, see the method in the
installation guide for the corresponding version and solution.
Step 3 If the U2000 server is installed in the Windows OS, check and restore the ODBC data source
settings on the U2000. For details, see Step 3 in 6.2.3.1 System Prompts login database
failure .
Step 4 If the U2000 server is installed in the Windows OS, do as follows to modify the dynamic port
number on the U2000:
(1) Choose Start > All Programs > Microsoft SQL Server > Client Network Utility.
(2) Check whether the dynamic port number is 1433 on the Alias tab page. If not, change the
value to 1433.
Step 5 Check whether the versions of the client and server are consistent. If the versions are inconsistent,
replace the client with a version that is consistent with the server version, and then log in to the
client again.
Step 6 Check whether the communication protocols used by the client and the server are consistent. If
the protocols are inconsistent, modify the protocols so that the protocols are consistent.
TIP
Log in to the Sysmonitor Client on the server, and choose System > Communication Settings.... In the
dialog box displayed, view the communication mode of the server.
l To check the network between the client and server, run the following command on Solaris:
# ping -s IP address of the NMS
Step 8 Check whether the client access control is set on the server.
On the U2000 server, you can set the client IP addresses that can be accessed. If the IP address
of a client is not in the permitted range, the client cannot access the server. For details, see the
administrator guide for the corresponding version and solution.
Step 9 If the number of failed login attempts by using the same user exceeds 3, the login authority of
the user is locked.
You can log in to the client again in 30 minutes (default) or unlock the user as another user that
has the authority, such as user admin.
Step 10 Check whether the system time is the current time. If not, modify the system time.
----End
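As a supplement to Step 4, the following Python sketch tests whether a TCP connection to port 1433 can be opened. It is an illustration only and assumes that the SQL Server instance is expected to listen on TCP port 1433 of the given host. A successful connection shows only that the port accepts connections, not that the alias or the login is correct. The function name and the host value are hypothetical.

```python
import socket


def tcp_port_open(host, port=1433, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    # Hypothetical host: run the check on the U2000 server itself.
    print(tcp_port_open("127.0.0.1"))
```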
Possible Cause
The computer may be infected with viruses.
Fault Diagnosis
Check for and remove the viruses.
Possible Cause
Operations are performed improperly or an abnormality occurs during the installation or upgrade
of the U2000 client. As a result, the index files of earlier versions are not normally cleared.
Procedure
Step 1 Shut down the U2000 client.
Step 2 Delete all the files in the following path on the U2000 client.
After you restart the U2000 client, the user configuration file is automatically generated.
----End
Possible Cause
For the NE manager of certain equipment such as the equipment of the PTN series, RTN series,
NG WDM series, and SLM 3160 series, the browser settings result in abnormal display of the
GUI.
Procedure
Step 1 Check whether the browser settings comply with the standards. For the Windows OS, the default
browser needs to be Microsoft Internet Explorer; for the Solaris OS, the default browser needs to
be the Mozilla browser.
Step 2 Check the version of Internet Explorer in the Windows OS. If the security level of Internet
Explorer is set to high, the running of scripts is affected and the GUI becomes grayed out. To
make the GUI display normally, you need to set the security level of the Internet Explorer to
Medium or a lower level. In the Windows 2003 OS, the function of Internet Explorer enhanced
security settings is installed by default. This function causes the security level to remain
high. Therefore, you need to cancel the function as follows:
(1) Choose Start > Control Panel. The Control Panel dialog box is displayed.
(2) Double-click the Add or Remove Programs icon. The Add or Remove Programs dialog
box is displayed.
(3) Click the Add/Remove Windows Components icon. The Windows Components
Wizard is displayed.
(4) Clear the selection of the check box to the left of Internet Explorer Enhanced Security
Configuration.
NOTE
By default, the check box is selected, which indicates that the security level of the Internet Explorer
is high.
(5) Click Next.
(6) Click Finish.
(7) Double-click the Internet Explorer icon on the desktop to open the Internet Explorer.
(8) Choose Tools > Internet Options.
(9) In the Internet Options dialog box, select Security. Then, move the slider to set the security
level of Internet Explorer to Medium or a lower level.
(10) Click Apply.
(11) Click OK.
Step 3 Check whether Internet Explorer is configured with the proxy server. If Internet Explorer is
configured with the proxy server, cancel the proxy server or disable the connection to the
U2000 server through the proxy server.
Step 4 Check the installation directory of the U2000 client. The directory name can contain only
letters, numbers, and underscores (_). It cannot contain spaces or brackets.
----End
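The naming rule in Step 4 can be expressed as a regular expression. The following Python sketch is an illustration only. It assumes that "letters" means the ASCII letters A to Z and a to z, and it checks a single directory name rather than a full path. The function name is hypothetical.

```python
import re

# Assumption: only ASCII letters, digits, and underscores are allowed.
_VALID_NAME = re.compile(r"[A-Za-z0-9_]+")


def is_valid_directory_name(name):
    """Return True if the name consists only of letters, digits, and underscores."""
    return _VALID_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["U2000_client", "U2000 client", "U2000(1)"]:
        print(candidate, is_valid_directory_name(candidate))
```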
NOTE
l If the server is configured with one network card, the Host name is the Host IP address of the master
server. In this example, the Host names of the master servers are 129.9.1.1 and 129.9.1.2.
l If the server is configured with two network cards and has the IPMP feature enabled, the Host name
is the IP address (floating IP address) of the master server, that is, the IP address of the network card
on the U2000 that is used for external services.
l If the server is configured with two network cards and has the IPMP feature disabled, the Host
name is the Data replication IP address of the master server.
In a Dual-Host State
Run the following command on the master server of primary site to check the system status:
# vradmin -g datadg repstatus datarvg
Replicated Data Set: datarvg
Primary:
Host name: 129.9.1.1
RVG name: datarvg
DG name: datadg
RVG state: disabled for I/O
Data volumes: 4
SRL name: srl_vol
SRL size: 3.00 G
Total secondaries: 1
Secondary:
Host name: 129.9.1.2<unreacheable>
RVG name: datarvg
DG name: datadg
Replication status: paused due to network disconnection
Current mode: asynchronous
Logging to: SRL
Timestamp Information: N/A
Config Errors:
129.9.1.2: Pri or Sec IP not available or vradmind not running
Run the following command on the master server of secondary site to check the system status:
# vradmin -g datadg repstatus datarvg
Replicated Data Set: datarvg
Primary:
Host name: 129.9.1.2
RVG name: datarvg
DG name: datadg
RVG state: enabled for I/O
Data volumes: 4
SRL name: srl_vol
SRL size: 3.00 G
Total secondaries: 1
Config Errors:
129.9.1.1: Pri or Sec IP not available or vradmind not running
l In the dual-host state, if the U2000 client connects to the secondary site, perform incremental or full
synchronization on the secondary site.
l In the dual-host state, if the U2000 client is still running on the primary site, perform incremental or
full synchronization on the primary site.
In a Healing State
Run the following command on the master server of primary and the secondary site to check the
system status:
# vradmin -g datadg repstatus datarvg
If the terminal output contains the acting secondary information as follows, it can be confirmed
that the system is running in a healing state. This usually occurs after the secondary site has
taken over forcibly and the network between the primary site and the secondary site has returned
to normal.
Replicated Data Set: datarvg
Primary:
Host name: 129.9.1.2
RVG name: datarvg
DG name: datadg
RVG state: enabled for I/O
Data volumes: 4
SRL name: srl_vol
SRL size: 3.00 G
Total secondaries: 1
Config Errors:
129.9.1.1: Primary-Primary configuration
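When the repstatus output is collected from several servers, it can help to turn it into structured data. The following Python sketch is an illustration only and is not part of the Veritas tooling. It assumes that the output has the layout of the listings shown above (section lines Primary:, Secondary:, and Config Errors:, followed by name: value lines). The function names are hypothetical.

```python
SECTION_HEADERS = {
    "Primary:": "primary",
    "Secondary:": "secondary",
    "Config Errors:": "config_errors",
}


def parse_repstatus(text):
    """Split saved 'vradmin ... repstatus' output into its three sections."""
    result = {"primary": {}, "secondary": {}, "config_errors": {}}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the echoed command line
        if line in SECTION_HEADERS:
            section = SECTION_HEADERS[line]
            continue
        key, sep, value = line.partition(":")
        if sep and section is not None:
            result[section][key.strip()] = value.strip()
    return result


def has_primary_primary_conflict(parsed):
    """Return True if a config error reports a Primary-Primary configuration."""
    return any("Primary-Primary" in msg for msg in parsed["config_errors"].values())
```

For the last listing above, has_primary_primary_conflict returns True, because the Config Errors section reports the Primary-Primary configuration for 129.9.1.1. The line that precedes the first section (Replicated Data Set) is ignored by this sketch.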
2 Check whether the resources are abnormal.
Handling: Rectify the fault with reference to 9.2.5 Resource in the Frozen State and 9.2.6 Resource
in the Fault State.
4 Check whether the data on the primary node is consistent with the data on the secondary node.
Handling: Rectify the fault with reference to 9.2.3 Data Replication Cannot Be Performed Between
Primary and Secondary Nodes.
1 Check whether the files of the operating system are normal.
Handling: Rectify the fault with reference to 5.1.1 Starting the Operating System Fails.
2 Check whether the VCS is normal.
Run the hastatus -sum command to query the status of the VCS. If the reported status of the VCS is
ADMIN, it indicates that the VCS fails to be started.
Handling: Rectify the fault with reference to 9.2.10 Failed to Start the VCS Because of the Errors
in the Configuration File.
Symptom
Data replication and switching cannot be performed between the primary and secondary nodes.
Possible Cause
Communication between the primary and secondary nodes can fail for the following reasons:
l The network between the primary and secondary nodes is unstable or is blocked by a firewall.
l The IP addresses or gateways of the primary and secondary nodes are set incorrectly.
Procedure
Step 1 To check the communication status between the primary and secondary nodes, run the following
commands as user root on the primary node:
# ping -s IP address of the Master NIC on the secondary node
# ping -s IP address of the replication NIC on the secondary node
TIP
To query the IP address of the Master NIC on the secondary node, run cat /etc/hosts | grep loghost
as user root on the secondary node.
Generally, the bandwidth between the primary and secondary nodes must be at least 2 Mbit/s and the
packet loss ratio must be lower than 0.1%.
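The 0.1% packet loss guideline can be checked against the ping summary line. The following is a minimal sketch: loss_pct and loss_verdict are hypothetical helper names, and the summary format ("... N% packet loss") is an assumption about the ping output on your system. Because one lost packet out of 1000 already equals 0.1%, a run of at least 1000 packets is needed for the comparison to be meaningful.

```shell
# Extract the packet loss percentage from a ping summary read on stdin.
# Assumes a summary line of the form "... N% packet loss".
loss_pct() {
    sed -n 's/.* \([0-9][0-9.]*\)% packet loss.*/\1/p'
}

# Compare a loss percentage ($1) with the 0.1% guideline.
# Prints "ok", "too-high", or "unknown" if $1 is not a number.
loss_verdict() {
    case "$1" in
        ''|*[!0-9.]*) echo "unknown"; return 0 ;;
    esac
    awk -v p="$1" 'BEGIN { if (p + 0 < 0.1) print "ok"; else print "too-high" }'
}

# Example, assuming the Solaris form "ping -s host data_size npackets":
#   loss_verdict "$(ping -s <IP address> 56 1000 | loss_pct)"
```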
Step 2 Check whether all the ports used by the HA system are enabled.
To query the service ports that are enabled in the system, run the following command as user
root:
# netstat -an
----End
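The netstat -an output in Step 2 can be long, so a filter for a single port may help. The following is a minimal sketch: port_state is a hypothetical helper name, and the port numbers to check are not listed in this section, so they must be taken from the HA system documentation. The sketch assumes that the local address is the first field of a LISTEN line (Solaris style) or the fourth field when the line starts with tcp or udp (Linux style).

```shell
# Report whether port $1 appears as a listening local port in
# `netstat -an` output read on stdin.
port_state() {
    awk -v port="$1" '
        /LISTEN/ {
            # Pick the local address column for the two assumed layouts.
            addr = ($1 ~ /^(tcp|udp)/) ? $4 : $1
            n = split(addr, parts, "[.:]")
            if (n > 1 && parts[n] == port) { found = 1; exit }
        }
        END { print (found ? "listening" : "not-listening") }
    '
}

# Example (as user root), with <port> taken from the HA documentation:
#   netstat -an | port_state <port>
```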
Symptom
A red lock icon is displayed on a resource or resource group in the VCS Explorer.
Possible Cause
The resource group was frozen manually and was not unfrozen afterward.
Procedure
Step 1 In the VCS Explorer interface, right-click the resource group that is in the frozen state, and then
choose Unfreeze.
----End
Symptom
In the VCS Explorer, a red cross is displayed on a resource. The resource is in the
Fault state.
Possible Cause
The resource is faulty. For example, a U2000 core dump occurs.
Procedure
Step 1 Right-click the name of the resource that is in the Fault state, and then choose Clear Fault to
rectify the fault.
Step 2 On the primary server, right-click AppService, and then choose Online. The
AppService resource group changes to the Online state.
----End
Possible Cause
The DCN between the primary and secondary nodes is unstable.
Procedure
Step 1 To modify the heartbeat detection timeout, run the following commands as user root respectively
on the primary and secondary nodes:
# haconf -makerw
# /opt/VRTSvcs/bin/hahb -local Icmp AYATimeout
# /opt/VRTSvcs/bin/hahb -modify Icmp AYATimeout heartbeat detection timeout -clus Cluster name of the opposite node
# haconf -dump -makero
NOTE
l The heartbeat detection timeout is 300 seconds by default. You can set a longer timeout, such as
600 seconds, according to the duration of network interruption between the primary and
secondary nodes.
l If you use one or two NICs but do not enable the IPMP feature, the Cluster name of the opposite node
is the host name of the opposite node followed by Cluster, such as SecondaryCluster.
l If you use two NICs and enable the IPMP feature, the Cluster name of the opposite node is the host
name of the opposite node, such as Secondary.
Step 2 After the DCN becomes stable, you need to run the following commands as user root on the
primary and secondary nodes to restore the heartbeat detection timeout to the default value.
# haconf -makerw
# /opt/VRTSvcs/bin/hahb -local Icmp AYATimeout
# /opt/VRTSvcs/bin/hahb -modify Icmp AYATimeout 300 -clus Cluster name of the opposite node
# haconf -dump -makero
----End
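Because the same four commands are run on both nodes with only the timeout and the cluster name changing, it may help to generate them first and review them before execution. The following is a minimal sketch: print_hb_timeout_cmds is a hypothetical helper name, and it only prints the command sequence from the procedure above; it does not run anything.

```shell
# Print (do not execute) the heartbeat timeout command sequence.
# $1 = timeout in seconds, $2 = cluster name of the opposite node.
print_hb_timeout_cmds() {
    case "$1" in
        ''|*[!0-9]*)
            echo "timeout must be a whole number of seconds" >&2
            return 1 ;;
    esac
    if [ -z "$2" ]; then
        echo "cluster name of the opposite node is required" >&2
        return 1
    fi
    printf '%s\n' \
        "haconf -makerw" \
        "/opt/VRTSvcs/bin/hahb -local Icmp AYATimeout" \
        "/opt/VRTSvcs/bin/hahb -modify Icmp AYATimeout $1 -clus $2" \
        "haconf -dump -makero"
}

# Example: raise the timeout to 600 seconds, then later restore the default.
#   print_hb_timeout_cmds 600 SecondaryCluster
#   print_hb_timeout_cmds 300 SecondaryCluster
```

After reviewing the printed lines, run them as user root on each node.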
2. To start the U2000 and Sybase on the node to be monitored, run the following command
as user root:
3. To freeze the U2000 and Sybase on the primary node, run the following commands as user
root:
# hagrp -freeze AppService
# hagrp -freeze VVRService
4. To freeze the U2000 and Sybase on the secondary node, run the following commands as
user root:
# hagrp -freeze AppService
# hagrp -freeze VVRService
5. To restore the protection for the primary and secondary nodes after the network recovers,
run the following commands:
# hagrp -unfreeze AppService
# hagrp -unfreeze VVRService
9.2.8 Connection Failure Between the Rlink and the Remote Host
Symptom
In the console window, the following error message is displayed:
vxvm:vxrlink: ERROR: Unable to establish connection with remote host
<remote_host>
Possible Cause
l The network connection between the primary site and the secondary site is torn down.
l The vradmind service process is stopped.
Procedure
l Check the network connection between the primary and secondary nodes.
Run the following command:
# ping host
IP address of the master server on the secondary site
If each host can be pinged successfully, the network connection is normal.
Otherwise, clear the network fault first.
l Check whether the vradmind process of the primary/secondary site is running.
Run the following command:
# ps -ef | grep vradmind
----End
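The ps -ef | grep vradmind command can also list the grep process itself, which makes the result easy to misread. The following is a minimal sketch of a filter that avoids this: vradmind_state is a hypothetical helper name, and the bracketed pattern [v]radmind is a common shell idiom that keeps the filter's own command line from matching.

```shell
# Report whether a vradmind process appears in `ps -ef` output read on stdin.
# The pattern [v]radmind matches "vradmind" but not the text "[v]radmind",
# so the grep process started here is never counted.
vradmind_state() {
    if grep '[v]radmind' >/dev/null 2>&1; then
        echo "running"
    else
        echo "not-running"
    fi
}

# Example (run on the primary site and on the secondary site):
#   ps -ef | vradmind_state
```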
Possible Cause
The server is powered off abnormally or other abnormal operations are performed.
Procedure
l Open a terminal window.
l Run the following commands on the node on which the disk volume is abnormal:
# cd /opt/HWICMR/bin
# ./runtaskflow.sh recover_rvg.tf
l Check whether the disk volume status and the data replication status are correct. If so, the
recovery is successful.
----End
Possible Cause
The VCS startup failure may be caused by a power failure.
Procedure
Step 1 To restore the VCS on the primary site, run the following command on the primary site as the
root user:
# hasys -force host name of the primary site
Step 2 If starting the VCS on the secondary site fails, run the following command on the secondary site
as the root user:
# hasys -force host name of the secondary site
----End
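Before running hasys -force, it can help to confirm which system reports an ADMIN status in the hastatus -sum output, which this document treats as the sign that the VCS failed to start. The following is a minimal sketch: admin_systems is a hypothetical helper name, and the layout of the summary (system lines that start with the letter A, followed by the system name and its state) is an assumption that should be verified against the actual output on your servers.

```shell
# List the systems whose state contains ADMIN in `hastatus -sum` output
# read on stdin. Assumed line layout: "A  <system>  <state>  <frozen>".
admin_systems() {
    awk '$1 == "A" && $3 ~ /ADMIN/ { print $2 }'
}

# Example (as the root user):
#   hastatus -sum | admin_systems
```

Each name that is printed is a candidate for the host name argument of the hasys -force command in the procedure above.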
Possible Cause
The NMS cannot be used because of the fault on the primary site.
Procedure
l The connection between the client and server is torn down. In this case, the primary site is
unavailable. The NMS application processes are automatically switched to the server on
the secondary site. Do as follows:
1. Log in to the U2000 server on the secondary site through the client.
2. Manage NEs through the U2000 server on the secondary site.
l On the client, the NEs on the NMS preempt each other's resources. The server is in the
dual-host state. Do as follows:
1. Shut down the U2000 server on the primary site. For details, refer to the Huawei
iManager U2000 High Availability System (Veritas) Administrator Guide.
2. Log in to the U2000 server on the secondary site through the client.
3. Manage NEs through the U2000 server on the secondary site.
l Damaged NMS data causes the server to fail. In this case, the primary
and secondary sites are both unavailable. Do as follows:
1. Recover the backup data of the U2000. For details, refer to the Huawei iManager
U2000 High Availability System (Veritas) Administrator Guide.
2. If there is no backup data, recover the data by using the script. For details, refer to the
Huawei iManager U2000 High Availability System (Veritas) Administrator Guide.
----End
Symptom
The instability of the data communication network (DCN) between the primary and secondary
nodes leads to the frequent interruption of heartbeat between the two nodes. As a result, the
U2000 cannot work normally.
Possible Cause
The heartbeat detection timeout is shorter than the network interruptions between the two nodes.
You can rectify the fault by increasing the heartbeat detection timeout.
Procedure
l To modify the heartbeat detection timeout, run the following commands respectively on
the primary and secondary nodes:
# haconf -makerw
# /opt/VRTSvcs/bin/hahb -local Icmp AYATimeout
# /opt/VRTSvcs/bin/hahb -modify Icmp AYATimeout Heartbeat Detection Timeout -clus Cluster name of the opposite node
# haconf -dump -makero
NOTE
l The default Heartbeat Detection Timeout is 300 seconds. You can set the heartbeat detection timeout
according to the duration of network interruption between the primary and secondary nodes. For
example, set the value to 600.
l If you use one or two network adapters but do not enable the IPMP feature, the Cluster name of
the opposite node is the host name of the opposite node followed by Cluster, such as SecondaryCluster.
l If you use two network adapters and enable the IPMP feature, the Cluster name of the opposite
node is the host name of the opposite node, such as Secondary.
l After the DCN becomes stable, you need to run the following commands on the primary
and secondary nodes, to restore the heartbeat detection timeout to the default value.
# haconf -makerw
# /opt/VRTSvcs/bin/hahb -local Icmp AYATimeout
# /opt/VRTSvcs/bin/hahb -modify Icmp AYATimeout 300 -clus Cluster name of the opposite node
# haconf -dump -makero
----End
Possible Cause
l The slave server is not started. Possible reasons include manual shutdown, abnormal
power-off, and a hardware fault.
l The MSuite server of the slave server is not started or is started abnormally.
l The IP address used for connecting the slave server to the master server changes.
l The network between the slave server and the master server is faulty or the NIC of the slave
server is faulty.
Procedure
Step 1 Check whether the slave server is started successfully.
If the slave server does not start normally, check the server hardware, such as the hard disk, CPU,
memory, and cards.
Step 2 To check whether the MSuite server is started successfully, run the following commands as user
root on the slave server:
# cd /opt/HWENGR/engineering
# ./startclient.sh
If the login window of the NMS maintenance tool is displayed, it indicates that the tool is
normally started. Otherwise, run the ./startserver.sh command to start the server of the NMS
maintenance tool.
Step 3 Check whether the IP address used for connecting the slave server to the master server changes.
Run the ifconfig -a command as user root to check whether the displayed IP address is the same
as the IP address in the server list of the MSuite. If the IP addresses are different, select the slave
server, and then choose System > Synchronize the IP and hostname of the slaveserver.
Step 4 Run the ping Floating IP address of the slave server command as user root on the master server
to check whether the network between the master and slave servers is normal.
If the output shows that the floating IP address of the slave server is alive, the network
between the master and slave servers is normal. Otherwise, troubleshoot the network fault.
----End
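The comparison in Step 3 between the ifconfig -a output and the IP address recorded in the MSuite server list can be made less error-prone with a small filter. The following is a minimal sketch: ip_configured is a hypothetical helper name, and it assumes that each IPv4 address appears on a line whose first field is inet, optionally written as addr:x.x.x.x or with a /prefix suffix.

```shell
# Report whether IPv4 address $1 is configured, based on `ifconfig -a`
# output read on stdin.
ip_configured() {
    awk -v ip="$1" '
        $1 == "inet" {
            addr = $2
            sub(/^addr:/, "", addr)   # older Linux style: "inet addr:1.2.3.4"
            sub(/\/.*/, "", addr)     # strip a "/prefix" suffix if present
            if (addr == ip) found = 1
        }
        END { print (found ? "present" : "absent") }
    '
}

# Example (as user root on the slave server), with the address taken from
# the server list of the MSuite:
#   ifconfig -a | ip_configured <IP address in the server list>
```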
Possible Cause
l The slave server is in non-monitored mode.
l The status of the U2000 of the slave server is not refreshed.
l The U2000 on the slave server is abnormal.
Procedure
Step 1 Check whether the slave server is in monitored mode.
Log in to the MSuite client, and check the monitoring status of the slave server in the lower pane
of the main interface. If the system displays Monitoring, the slave server is in
monitored mode. Otherwise, choose System > Start monitoring slave server to restore the
monitoring function of the master server for the slave server.
Step 2 Check whether the status of the U2000 on the slave server is inconsistent with that on the master
server for a long time (more than 10 minutes).
After the master server starts, the slave server starts only after synchronization, which takes 2 to
10 minutes. Therefore, it is normal for the status of the U2000 on the slave server to be
inconsistent with that on the master server for a short time.
Step 3 Check whether the configuration of the slave server is correct.
l To check whether the following configuration items are correct, run the following command
as user root on the slave server:
# cat /opt/sybase/interfaces
SYSDBServer
master tcp ether Floating IP address of the master server 5100
query tcp ether Floating IP address of the master server 5100
SYSDBServer_back
master tcp ether Floating IP address of the master server 5200
query tcp ether Floating IP address of the master server 5200
# cat /opt/U2000/server/conf/imap.cfg | grep MDPAddress
MDPAddress=Floating IP address of the master server
# cat /opt/U2000/server/conf/emfmoni.cfg | grep MONI_DISTRIBUTE_MODE
MONI_DISTRIBUTE_MODE=1
If the displayed floating IP address of the master server is inconsistent with the actual IP
address or the value of MONI_DISTRIBUTE_MODE is not 1, correct the configuration file
manually by using the vi command.
l Run the ls /opt/U2000/server/conf/sysmoni command to view the files in the directory.
If the following configuration files exist in the directory, delete them manually.
– moniemffault.cfg
– moniperfsrv.cfg
– moniweblct.cfg
– moniemfsecu.cfg
– monipubsvr.cfg
– monizip.cfg
– moniemftopo.cfg
– monisvhdsvr.cfg
– monicau.cfg
– moniiNBXmlFramework.cfg
– monitomcat.cfg
– moniemfalmagent.cfg
– monin2100dcsrv.cfg
– monitoolkit.cfg
----End
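The checks in Step 3 can be run read-only before anything is edited or deleted. The following is a minimal sketch: cfg_value and stale_sysmoni_files are hypothetical helper names, the KEY=value file format is assumed from the sample output above, and the file list is copied from this procedure. The sketch only reports findings; it does not modify or delete any file.

```shell
# Print the value of a KEY=value entry ($1 = file, $2 = key).
cfg_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | head -n 1
}

# List which of the files named in this procedure exist in directory $1.
stale_sysmoni_files() {
    for f in moniemffault.cfg moniperfsrv.cfg moniweblct.cfg moniemfsecu.cfg \
             monipubsvr.cfg monizip.cfg moniemftopo.cfg monisvhdsvr.cfg \
             monicau.cfg moniiNBXmlFramework.cfg monitomcat.cfg \
             moniemfalmagent.cfg monin2100dcsrv.cfg monitoolkit.cfg
    do
        [ -f "$1/$f" ] && echo "$f"
    done
    return 0
}

# Example (as user root on the slave server):
#   cfg_value /opt/U2000/server/conf/imap.cfg MDPAddress
#   cfg_value /opt/U2000/server/conf/emfmoni.cfg MONI_DISTRIBUTE_MODE
#   stale_sysmoni_files /opt/U2000/server/conf/sysmoni
```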
Possible Cause
l The hard disk of the master server is faulty.
l The OS of the master server is faulty.
l A severe fault occurs on the file system of the master server. Consequently, the files on the
master server are lost and reinstalling the NMS is required.
Procedure
l Reinstall the master server where the faults occur.
For details, refer to the Huawei iManager U2000 Installation Guide.
NOTE
During the installation, make sure that the IP address and host name of the reinstalled server are the
same as those of the faulty master server.
l Configure the IP Network Multipathing (IPMP) on the master server. Run the following
commands on the master server as the root user:
# cd /opt/HWICMR/bin
# ./runtaskflow.sh config_distributed_ipmp.tf
Configure the IPMP according to the prompts. Note that the settings of the IPMP parameters
must be the same as those for the master server before the faults occurred.
l Log in to the system maintenance tool. Choose System > Mounting a slave server to add
the original slave servers again.
l Choose System > Restoring the NMS information to select the up-to-date backup data.
Then, click OK.
----End
Possible Cause
l The hard disk of the slave server is faulty.
l The OS of the slave server is faulty.
l A severe fault occurs on the file system of the slave server. Consequently, the files on the
slave server are lost and reinstalling the NMS is required.
Procedure
Step 1 Reinstall the slave server where the faults occur.
For details, refer to the Huawei iManager U2000 Installation Guide.
NOTE
During the installation, make sure that the IP address and host name of the reinstalled server are the same
as those of the faulty slave server.
Step 2 Configure the IPMP of the slave server. Run the following commands on the slave server as the
root user:
# cd /opt/HWICMR/bin
# ./runtaskflow.sh config_distributed_ipmp.tf
Configure the IPMP according to the prompts. Note that the settings of the IPMP parameters
must be the same as those for the slave server before the faults occurred.
Step 3 If the slave protection server exists in the distributed system, switch the services on the slave
protection server to the slave server.
(1) On the client of the NMS maintenance tool, click the Server tab. Right-click the server
where the subsystem is to be added and choose Switch Nodes from the shortcut menu. The
Switch Nodes dialog box is displayed.
(2) Click OK to start switchover. Wait until the Switch nodes successfully dialog box is
displayed.
(3) Click OK to complete switchover.
Step 4 If the slave protection server does not exist in the distributed system, log in to the NMS
maintenance tool. Choose System > Restoring the NMS information to select the up-to-date
backup data. Then, click OK.
----End
This topic describes how to troubleshoot the NMS system maintenance tool.
Possible Cause
The client of the network management system maintenance suite refreshes the instance status
every 15 seconds. Therefore, the instance status on the client of the network management
system maintenance suite and on the system monitoring client may be inconsistent for a short time.
Procedure
l On the client of the network management system maintenance suite, click the Instance tab.
l Click the shortcut icon to refresh the information on the network management system.
----End
Possible Cause
The /startClient.sh file was run as the root user before it was run as the nmsuser user.
As a result, a permission error occurs.
Procedure
Step 1 Log in to the OS as the root user.
Step 2 Change the owner of the file for which the permission error is reported by running the
following command:
# chown nmsuser /opt/U2000/engineering/conf/launch/client/org.eclipse.osgi
----End
This topic describes how to obtain the technical support in the case of any problems encountered
during routine maintenance.
During the routine maintenance of the U2000, if there is any problem that is uncertain or hard
to solve, or if you cannot find the solution to a problem from this manual, contact the customer
service center of Huawei or send an email to [email protected]. You can also go to
http://support.huawei.com to obtain the latest technical materials of Huawei.
Before seeking the technical support, collect the relevant information.