5. Routine Maintenance and Troubleshooting Cases (V4)

The document discusses routine maintenance and troubleshooting cases for ZTE's NetNumen M31 (RAN) mobile network element management system. It provides guidance on daily, weekly, monthly, and quarterly routine maintenance items including checking equipment environment, running status of servers and clients, alarms, logs, backups, and performing troubleshooting. The document aims to ensure proper operation of the management system through regular preventative maintenance.

Uploaded by

Tri Nguyen
Copyright
© All Rights Reserved
Routine Maintenance and Troubleshooting Cases
ZTE University

Objectives

Learn about Daily Routine Maintenance Items
Learn about Weekly Routine Maintenance Items
Learn about Monthly Routine Maintenance Items
Learn about Quarterly Routine Maintenance Items
Learn about Common Alarms and Fault Troubleshooting
Learn about the Troubleshooting Flow for Major Accidents

Content
Daily Routine Maintenance Items
Weekly Routine Maintenance Items
Monthly Routine Maintenance Items
Quarterly Routine Maintenance Items
Common Alarms and Fault Troubleshooting
Troubleshooting Flow for Major Accidents

Overview of Daily Routine Maintenance Items

Daily Routine Maintenance Items

Type                                   Check Item
Environment monitoring & maintenance   Temperature and humidity in the equipment room
                                       Operational status of the air conditioner
Running Status Maintenance of          Communication Between Server and Client
Main Equipment                         CPU/MEM Utilization of Server and Client
                                       The Server Running Status
                                       Current Alarms
                                       NetNumen Logs
                                       The Running Status of the Disk Array
                                       Backup of the Data Modified Every Day
                                       System Load

Checking Temperature in Equipment Room

Device Type: NetNumen M31 (RAN) mobile NE management system

Temperature
Long-term Working Condition    Short-term Working Condition
10°C ~ 30°C                    0°C ~ 45°C

Checking Humidity in Equipment Room

Device Type: NetNumen M31 (RAN) mobile NE management system

Relative Humidity
Long-term Working Condition    Short-term Working Condition
30% ~ 85%                      5% ~ 95%

Checking Operational Status of the Air Conditioner

Checking Communication Between Server and Client

Purpose:
- Make sure that the communication between the NetNumen M31 (RAN) server and the client is normal.

Operation Procedure:
- Select Start > Run on the NetNumen M31 (RAN) client. Enter the ping command followed by the server address, and then press Enter to check whether the client is communicating normally with the server.
- Start the server console and check whether the processes have started successfully.
- Start the client and check whether you can log on normally.
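The ping step above can also be scripted. The following is a minimal sketch, not part of the NetNumen tooling: the function names are hypothetical, and it assumes the standard ping utility is available on the client, which takes -n for the packet count on Windows and -c on UNIX-like systems.

```python
import platform
import subprocess


def build_ping_command(host, count=4, system=None):
    """Return the ping argument list for the given operating system."""
    system = system or platform.system()
    # Windows ping takes -n for the packet count; UNIX-like systems take -c.
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, str(count), host]


def server_is_reachable(host, count=4, timeout_s=30):
    """Run ping and report whether it exited successfully."""
    try:
        result = subprocess.run(
            build_ping_command(host, count),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # ping is missing, could not be started, or did not finish in time.
        return False
    return result.returncode == 0
```

For example, server_is_reachable("192.0.2.10") would ping a placeholder address; substitute the real server address. A successful ping shows only network reachability, so the console and logon checks in the procedure are still needed.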

Checking Communication Between the NetNumen Server and the Lower-level EMS

Purpose:
- Make sure that the communication between the NetNumen M31 (RAN) server and the lower-level EMS is normal.

Operation Procedure:
- On the server, ping the lower-level EMS.
- Log in to the client and enter the topology management view.

Inspection Criteria:
- The communication link between the server and the lower-level EMS is green.
- There is no link-broken alarm among the current alarms.
- You can FTP to the lower-level EMS from the server.

Checking CPU/MEM Utilization of Server and Client

Check Purpose:
- To check whether the CPU utilization and memory (MEM) utilization meet the requirements.

Operation Procedure:
- Click AppServer on the System Monitor tree in the left pane of the System Monitor view.
- Then click the View button in the Server Performance area. The application server performance dialog box appears, in which you can view CPU and memory utilization.

Checking Hardware

Checking the Server Running Status

Checking Current Alarms

Checking NetNumen Logs

Checking the Running Status of the Disk Array

Purpose:
- To ensure that the disk array has enough space and is running properly.

Operation Procedure:
- Check the hard disk space of the disk array. Insufficient space in the disk array may affect alarm and performance data collection and report generation in the EMS.
- Check whether any fault occurs in the hard disk, for example, a read or write error. If so, contact the local ZTE office.

Check Criteria:
- The disk array should have sufficient space.
- The hard disk should not have read/write problems.
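The space criterion above could be checked with a short script. This is a sketch under stated assumptions: the slides do not define what "sufficient space" means, so the 20% minimum free ratio is a placeholder, and the function names are hypothetical.

```python
import shutil


def free_ratio(total_bytes, free_bytes):
    """Return the fraction of the volume that is still free."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes


def has_sufficient_space(total_bytes, free_bytes, min_free_ratio=0.2):
    """Compare the free fraction with an assumed minimum (20% by default)."""
    return free_ratio(total_bytes, free_bytes) >= min_free_ratio


def check_mount(path, min_free_ratio=0.2):
    """Apply the check to the file system that contains path."""
    usage = shutil.disk_usage(path)
    return has_sufficient_space(usage.total, usage.free, min_free_ratio)
```

Calling check_mount on the mount point of the disk array returns False when free space drops below the chosen ratio. Read/write errors are not covered here and still need to be checked on the array itself.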

Checking Backup of the Data Modified Every Day

Operation Procedure:
- In the \ums-server\utils\usf-backup directory, the EMS provides run.sh, a backup and restoration script for an Oracle database running on UNIX. The script must be run on a UNIX operating system on which the Oracle database is installed. Switch to an Oracle user on the computer/workstation where the database is installed.
- Before running the script, configure its contents according to the actual application.

The following precautions apply when running the run.sh script:
- Do not configure the IP address of the Oracle database as 127.0.0.1.
- Create the service name before configuring the Oracle database.
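The first precaution can be enforced before the script is configured. The sketch below is illustrative only: is_valid_database_address is a hypothetical helper, and it generalizes the rule from 127.0.0.1 to any loopback address, which is an assumption beyond what the slides state.

```python
import ipaddress


def is_valid_database_address(address):
    """Return True if address is a literal IP address that is not loopback."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        # Not a literal IPv4 or IPv6 address (for example, a host name).
        return False
    # 127.0.0.1 and the rest of the loopback range are rejected.
    return not ip.is_loopback
```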

Overview of Weekly Routine Maintenance

Weekly Maintenance Items

Type                               Check Item
Running Status of Main Equipment   Checking History Alarms
                                   Checking History Alarms that are Backed Up
                                   Backing up EMS Configuration and System Logs
                                   Calibrating the System Clock
                                   Checking Database Space
                                   Checking Hard Disk Space
                                   Checking Operation Log in the Operating System of Server
                                   Checking Running State of Dual Hosts
                                   Checking Virus for Server and Clients
                                   Updating Client Virus Definitions

Checking History Alarms

Backing up History Alarms

Backing up EMS Configuration and System Logs

Purpose:
- To back up the configuration data of the EMS periodically.
- In a UNIX environment, the crontab utility can be used to schedule and automate tasks. Users can schedule the database backup and restoration script to run at defined times, so as to achieve automatic, periodic backup of configuration data.

Operation Procedure:
- Switch to an Oracle user on the computer/workstation where the database is installed.
- Execute the crontab -e command to set a scheduled task in the text editor. Enter * * * * * \ums-server\utils\usf-backup\run.sh A \ums-server\Backup. This entry can be divided into two parts: the first part sets the date and time, and the second part sets the command(s) to be executed.

Backing up EMS Configuration and System Logs (Continued)

The five "*" characters in the first part stand for five numeric fields. The unit and value range of each field are as follows:
- Minute: 0 to 59
- Hour: 0 to 23
- Date: 1 to 31
- Month: 1 to 12
- Week: 0 to 6 ("0" indicates Sunday.)

Besides these five numerals, you may encounter some symbols with special meanings:
- "*" stands for all the numerals in the value range.
- "/" means "per"; for example, "*/5" indicates every five units.
- "-" is used between two numerals to indicate a number range.
- "," is used to separate several numerals.

The use of crontab varies between UNIX versions. For more details, refer to the help of the corresponding UNIX version. Some commonly used crontab commands are:
- crontab -l: lists the contents of the crontab
- crontab -r: deletes the contents of the crontab
- crontab -e: edits the contents of the crontab
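To make the five schedule fields and the special symbols concrete, the following sketch expands a schedule into the values each field matches. It is a simplified illustration, not the cron implementation: the names expand_field and expand_schedule are hypothetical, and month or weekday names and other version-specific extensions are not handled.

```python
# Field names and value ranges, in the order they appear in a crontab entry.
FIELD_RANGES = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("date", 1, 31),
    ("month", 1, 12),
    ("week", 0, 6),
]


def expand_field(text, low, high):
    """Expand one schedule field into the sorted list of values it matches."""
    values = set()
    for part in text.split(","):  # "," separates several numerals
        part, _, step_text = part.partition("/")  # "/" means "per"
        step = int(step_text) if step_text else 1
        if step < 1:
            raise ValueError("step must be at least 1")
        if part == "*":  # "*" covers the whole value range
            start, end = low, high
        elif "-" in part:  # "-" gives a number range
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if not (low <= start <= end <= high):
            raise ValueError(f"{text!r} is outside {low}-{high}")
        values.update(range(start, end + 1, step))
    return sorted(values)


def expand_schedule(schedule):
    """Expand the five time fields of a crontab entry into matching values."""
    fields = schedule.split()
    if len(fields) != len(FIELD_RANGES):
        raise ValueError("expected five schedule fields")
    return {
        name: expand_field(text, low, high)
        for text, (name, low, high) in zip(fields, FIELD_RANGES)
    }
```

Under these rules, "* * * * *" matches every minute, while a schedule such as "0 2 * * 0" matches only 02:00 on Sundays, which is closer to what a periodic backup usually needs.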

Calibrating System Clock

Purpose:
- To guarantee time synchronization between the EMS server and the lower-level network element management systems.

Operation Procedure:
- Run the date command to view the system time of the server.
- View the system time of the lower-level network element management system.
- Check whether the system time of the lower-level network element management system is consistent with the system time of the NetNumen M31 (RAN) server.
- Check whether the clock service is enabled on the NetNumen M31 (RAN) server that serves as the upper-level EMS.
- Check whether the upper-level network element management system is configured as the clock source of the lower-level network element management system.
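The consistency check in the procedure above amounts to comparing two timestamps. The sketch below assumes both times have already been read and converted to datetime values in the same time zone; the 5-second tolerance is a placeholder, since the slides do not specify an acceptable offset, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def clock_offset(server_time, ems_time):
    """Return the absolute difference between the two system times."""
    return abs(server_time - ems_time)


def clocks_consistent(server_time, ems_time, tolerance=timedelta(seconds=5)):
    """Treat the clocks as consistent when the offset is within tolerance."""
    return clock_offset(server_time, ems_time) <= tolerance
```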

Checking Database Space

Checking Hard Disk Space

Checking Operation Log in the Operating


System of Server

Checking Running State of Dual Hosts

Checking Virus for Server and Clients

Updating Client Virus Definitions

Overview of Monthly Routine Maintenance

Monthly Maintenance Items

Type                                   Check Item
Environment monitoring & maintenance   Check power voltage
Running Status of Main Equipment       Analyzing System Performance
                                       Cleaning the Equipment
                                       Checking the Remote Maintenance Tool

Check power voltage

Purpose:
- To ensure that the power equipment works normally.

Operation Procedure:
- Check the voltage and frequency of the power equipment.

Check Criteria:

Parameter                  Index
Operational Power Supply   220 V, 50 Hz
Voltage range              176 V ~ 264 V
Voltage frequency range    45 Hz ~ 65 Hz
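The check criteria in the table translate directly into a range test. In this sketch the limits are taken from the table above, the function name power_supply_ok is hypothetical, and treating the limits as inclusive is an assumption.

```python
# Limits taken from the check criteria table.
VOLTAGE_RANGE_V = (176.0, 264.0)
FREQUENCY_RANGE_HZ = (45.0, 65.0)


def power_supply_ok(voltage_v, frequency_hz):
    """Return True when both measurements fall inside the allowed ranges."""
    voltage_ok = VOLTAGE_RANGE_V[0] <= voltage_v <= VOLTAGE_RANGE_V[1]
    frequency_ok = FREQUENCY_RANGE_HZ[0] <= frequency_hz <= FREQUENCY_RANGE_HZ[1]
    return voltage_ok and frequency_ok
```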

Cleaning the Equipment

Analyzing System Performance

Purpose:
- To check whether the statistical performance indexes are within their required ranges.

Check Criteria:
- Each performance index is within the required range and does not differ greatly from its recent values.
- If an index varied greatly on some day, query the details of that day's performance data (using the smallest query granularity) and analyze the query result to find the cause of the unexpected change.

Operation Procedure:
- In the NetNumen M31 (RAN), select Performance > template task management to enter the template task management interface.
- Check whether any scheduled task has expired. If a task has expired, reset its running cycle.
- Analyze the performance reports to check the running state of the system.
- Create a directory on the hard disk of the client and back up the performance statistics under it every month. If necessary, back up the performance data on another storage medium, such as a magneto-optical disk. (optional)
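The criterion that an index should not differ greatly from its recent values can be approximated by comparing today's value with a recent average. This is only a sketch: the slides do not define "great difference", so the 30% relative-change threshold and the function name deviates_from_recent are assumptions.

```python
from statistics import mean


def deviates_from_recent(today, recent_values, max_relative_change=0.3):
    """Flag today's value when it differs too much from the recent average."""
    if not recent_values:
        raise ValueError("recent_values must not be empty")
    baseline = mean(recent_values)
    if baseline == 0:
        # No relative change can be computed; any non-zero value stands out.
        return today != 0
    return abs(today - baseline) / abs(baseline) > max_relative_change
```

An index flagged this way is a candidate for the detailed query at the smallest granularity described in the check criteria.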

Checking the Remote Maintenance Tool

Overview of Quarterly Routine Maintenance

Quarterly Maintenance Items

Type                                   Check Item
Environment monitoring & maintenance   Checking the Grounding Resistance
Running Status of Main Equipment       Check the dual-server changeover
                                       Clearing Databases
                                       Whether there is any unauthorized access in the client
                                       Whether there is any unauthorized access in the server
                                       Checking Devices on the LAN
                                       Checking the Settings of Firewall

Check the dual-server changeover

Checking the Ground Resistance

Clearing Databases

Whether there is any unauthorized access in the client

Whether there is any unauthorized access in the server

Checking the Settings of Firewall

Checking Devices on the LAN

Common Alarms and Fault Troubleshooting

Active/Standby Switchover Abnormal

Meaning
- If a failure occurs in the active control board, the standby control board immediately takes over all operations, so that operations are not interrupted.

Cause
- The active board is down and the back plug-in card is damaged.
- The OMM client becomes faulty when the standby board, which holds the data, is switched to the active board.

Handling Method:
1. Start the switchover:
If the switchover cannot be completed automatically, for example because the active board is in a poor state or the OMM memory overflows, use CLI commands to perform the switchover manually.
Run the following commands on the active and standby boards:
#cd /home/zte/OmmHost/bin/
#./startsys

2. View the switchover result:
Use telnet to log in to the active board, and then, as the root user, telnet to the OmmHost controller:
#telnet localhost 1234
SBCX->SCS_TestChgOver
If the switchover is performed on the active board, the system outputs no prompt information.
If the switchover is performed on the standby board, the system prompts: board status is not master, changeover forbidden.

Alarm Information Asynchronous

Meaning
- The alarm information between the OMM and the EMS is inconsistent.

Cause
- Network instability.

Handling Method
- By default, the network management system synchronizes alarms automatically. When the alarm information is out of sync, you can synchronize it manually.
- In the main interface of the network management system, select Fault > Synchronizing Active Alarms/Synchronizing History Alarms, set the synchronization conditions, and then click OK to start the alarm synchronization.

Cell Blocking

Meaning
- When a cell block operation is performed, the CRNC shall prohibit the use of the indicated logical resources according to the Blocking Priority Indicator IE.

Cause
- Low Priority: the CRNC shall prohibit the use of the logical resources when the resources become idle. New traffic shall not be allowed to use the logical resources while the CRNC waits for the resources to become idle and once the resources are blocked.
- Normal Priority: the CRNC shall prohibit the use of the logical resources if the resources are idle or immediately upon expiry of the shutdown timer specified by the Shutdown Timer IE in the BLOCK RESOURCE REQUEST message. New traffic shall not be allowed to use the logical resources while the CRNC waits for the resources to become idle and once the resources are blocked.
- High Priority: the CRNC shall prohibit the use of the logical resources immediately.

Handling Method
- Select the blocking priority according to the requirements.
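The three blocking priorities described above differ only in when the resources become blocked. The sketch below models that decision as stated in the text; it is an illustration, not CRNC code, and the names BlockingPriority and should_block_now are hypothetical.

```python
from enum import Enum


class BlockingPriority(Enum):
    HIGH = "high"
    NORMAL = "normal"
    LOW = "low"


def should_block_now(priority, resources_idle, shutdown_timer_expired):
    """Decide whether the logical resources are blocked at this moment.

    In every case, new traffic is refused from the moment the block is
    requested; this function covers only when existing use is cut off.
    """
    if priority is BlockingPriority.HIGH:
        # High priority: block immediately.
        return True
    if priority is BlockingPriority.NORMAL:
        # Normal priority: block when idle or when the shutdown timer expires.
        return resources_idle or shutdown_timer_expired
    # Low priority: wait until the resources become idle.
    return resources_idle
```

A low-priority block therefore never interrupts ongoing traffic, while a normal-priority block interrupts it only after the shutdown timer runs out.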

Cell Blocking(Continued)

Broken Access Managed Object(AMO) Link

Hard Disk Overload of Database Server

CPU Overload of Application Server

RAM Overload of Application Server

Hard Disk Overload of Application Server

Log File Space Threshold Crossing

Directory Space Threshold Crossing

Access Managed Object (AMO) Startup Error

Alarm Forwarding Failure

Insufficient Disk Space

A Client Fails to Connect to the Server

Server Startup Failure

Database Startup Failure

Database Startup Failure (Continued)

Full Data Tablespace

Failure of Alarm Box to Indicate Audio/Visual Alarms

Performance Report Problem Caused by Incorrect Time Zone Settings

Failure of Reporting Lower-Level EMS's Configuration Data

Failure of Reporting Alarm Data from Lower-Level EMS

Failure of Reporting Performance Data from Lower-Level EMS

Broken Link to Northbound Interface

Failure of Reporting Configuration Data to the NMS

Failure of Reporting Configuration Data to the NMS (Continued)

Failure of Reporting Alarms to the NMS

Failure of Reporting Performance Data to the NMS

Failure of Reporting Performance Data to the NMS (Continued)

Overview of Major Accidents

Troubleshooting Flow for Major Accidents

1. A fault occurs: on-duty personnel find a system failure or service interruption.
2. Inform the technical leader of the department and the upper-level leader.
3. Is it a fault of the transmission equipment or the power supply?
   Yes: Inform the related department to troubleshoot the fault.
   No: Do not perform any operation and observe the result of auto recovery.
4. Is the system recovering automatically?
   Yes: Go to step 8.
   No: Go to step 5.
5. Troubleshoot on site:
   1. Cooperate with the technical leader or expert to troubleshoot the fault;
   2. Back up related data or fault information;
   3. Troubleshoot the fault according to the fault analysis flow on site.
6. Is the recovery successful?
   Yes: Go to step 8.
   No: Inform the local office, and have the engineers in the local office troubleshoot the fault by phone.
7. Recovery?
   Yes: Go to step 8.
   No: The local office sends engineers to the site, who continue until the system recovers.
8. Check whether the fault is cleared via the fault management system and the failure observation system. Check whether the equipment is running normally.
9. Back up the system.
10. Analyze the reason and report to the department and related leaders.
11. Report the information from the two hours before the system was interrupted, together with the troubleshooting process, to the equipment producer.
12. The producer analyzes the fault reasons according to the information received from the site by mobile phone and gives solutions and precautions.
13. End.