
Internal

OptiX OSN 1500/2500/3500/7500


Troubleshooting

www.huawei.com

HUAWEI TECHNOLOGIES CO., LTD.

All rights reserved

Upon completion of this course, you will be able to:
List the common analysis methods for fault locating.
Outline the fault handling flow.
Analyze typical faults: traffic interruption, bit errors, etc.


Chapter 1 Troubleshooting Preparation


Chapter 2 Troubleshooting Idea and Methods
Chapter 3 Classified Troubleshooting Examples


Requirements for Maintenance Personnel


Be familiar with the hardware system and basic SDH principles, particularly the alarm signal flow and the alarm/performance generation principle
Master the basic operations of the transmission equipment: NMS, testing devices, loopback, board replacement


Requirements for Maintenance Personnel


Be familiar with the network under maintenance: network topology, network protection, traffic configuration
Collect and save on-site data: system alarms, performance event data, configurations, NMS operation records


Flow Chart

Start
Record the fault symptom
External cause? Yes: other handling flows. No: analyze the fault to locate it.
Fault removed? No: report the fault to Huawei.
Related flow: OptiX OSN emergency handling flow
The chart continues at the connectors Continue 1 and Continue 2.


Flow Chart

Connectors from the previous chart: Continue 1, Continue 2
Make a solution together
Try the solution
Service recovered? Yes: observe the service running. No: return to an earlier step.
Fault removed? Yes: compile the fault handling report. No: return to an earlier step.
End


Chapter 2 Troubleshooting Idea and Methods


One Question
What is the key to troubleshooting?

To locate a fault ACCURATELY to a single station


Basic Principles for Locating Faults


External first, then internal
Exclude external problems first: broken fiber, switch failure, power failure, grounding
Station first, then boards
Try your best to narrow the fault down to one node


Basic Principles for Locating Faults


LU first, then TU
LU (line unit) alarms can lead to TU (tributary unit) alarms
Higher-severity alarms first, then lower-severity alarms
First analyze critical/major alarms, then move on to minor/warning alarms

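The two ordering principles above can be sketched as a sort key. This is only an illustration: the Alarm record, the rank tables, and the severities assigned to the sample alarms are assumptions made for the example, and treating severity as the primary key with "LU before TU" as the tie-breaker is one possible reading of the slide.

```python
from dataclasses import dataclass

# Assumed ranks for illustration: a lower rank is analyzed first.
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2, "warning": 3}
BOARD_RANK = {"LU": 0, "TU": 1}  # line unit alarms before tributary unit alarms


@dataclass(frozen=True)
class Alarm:
    name: str      # alarm name, e.g. "R-LOS"
    severity: str  # one of the SEVERITY_RANK keys
    board: str     # "LU" or "TU"
    node: int      # node that reports the alarm


def analysis_order(alarms):
    """Sort alarms by severity first, then LU before TU."""
    return sorted(
        alarms,
        key=lambda a: (SEVERITY_RANK[a.severity], BOARD_RANK[a.board]),
    )


# Hypothetical alarm list; the severities are sample values only.
alarms = [
    Alarm("LP-RDI", "minor", "TU", 1),
    Alarm("TU-AIS", "major", "TU", 4),
    Alarm("R-LOS", "critical", "LU", 3),
    Alarm("MS-RDI", "minor", "LU", 2),
]
for alarm in analysis_order(alarms):
    print(alarm.name, alarm.severity, alarm.board, alarm.node)
```

With this sample data the line-side R-LOS is examined first and the tributary-side LP-RDI last.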

Common Methods of Fault Locating

Alarm and performance analysis


Loopback
Replacement
Configuration data analysis
Configuration modification
Test with instruments
Rule of thumb


Alarm and Performance Analysis


How to obtain alarms and performance events?

Use the NMS
Comprehensive: all alarms and performance events from the whole network
Accurate: current alarms, history alarms, occurrence times, and performance event data can be queried

Observe indicators on boards and cabinets
Not detailed
No history alarms


Alarm and Performance Analysis


1. Obtain alarms and performance events
2. Select the key alarms or performance events
3. Analyze the causes
4. Narrow the fault down to a certain range or a node


Alarm and Performance Analysis


[Figure: chain of nodes 1 to 4 annotated with the alarms R-LOS, MS-RDI, LP-RDI, and TU-AIS]


Loopback
What is loopback?

Loopback is the most common and most efficient troubleshooting method.

[Figure: SDH equipment with inloop and outloop shown on the line side and on the tributary side]


Loopback
Board involved | Loopback options | Loopback tools | Loopback level | Application
Tributary board | Inloop/outloop | Loopback cable, NMS | Loopback at path level | Separate switching faults from transmission faults. Roughly determine a tributary board failure. No need to modify the service configuration.
Line board | Inloop/outloop | Patch fiber, NMS | Loopback by optical interface | Locate single-station faults. Roughly determine a line board failure. No need to modify the service configuration.

Notes
May interrupt the traffic and ECC
Software loopback is not a thorough method
Will automatically be removed in 5 minutes (provisionable)


Loopback

Procedures

1. Draw the traffic flow diagram
2. Loop back section after section to locate the faulty NE
3. Locate the fault to specific boards


Loopback
Sample the traffic path
Select one NE from the faulty NEs
Choose one affected traffic path from the selected NE

Notes
During loopback, avoid selecting VC4 #1 if at all possible


Loopback
Draw the traffic flow diagram
Traffic source and sink
Pass-through nodes and timeslots

[Figure: traffic flow diagram of one path with the timeslot labels t2:1, w2:17, and e2:17]


Loopback
Loopback section after section
Connect testing device
Check alarms

Locate the fault to specific boards

[Figure: traffic flow diagram of one path with the timeslot labels t2:1, w2:17, and e2:17]

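The section-after-section procedure can be sketched as a simple scan along the traffic path. The function name, the loopback point labels, and the test_ok callback (standing in for "set the loopback, then read the tester") are hypothetical; real loopbacks are set through the NMS or with a cable or fiber.

```python
def locate_by_loopback(loop_points, test_ok):
    """Loop back at each point in order, moving away from the tester.

    loop_points: loopback points ordered from nearest to farthest from the tester.
    test_ok:     callable that returns True if the tester reads normal while the
                 loopback is set at the given point.
    Returns (last_good, first_bad); the fault lies between the two points.
    last_good is None if the nearest loopback already fails, and first_bad is
    None if every loopback tests normal.
    """
    last_good = None
    for point in loop_points:
        if not test_ok(point):
            return last_good, point
        last_good = point
    return last_good, None


# Hypothetical path: every loopback from "node 4 line" onward fails.
points = ["node 1 line", "node 2 line", "node 3 line", "node 4 line", "node 4 tributary"]
first_failing = points.index("node 4 line")
result = locate_by_loopback(points, lambda p: points.index(p) < first_failing)
print(result)  # ('node 3 line', 'node 4 line')
```

The returned pair brackets the faulty section, which can then be narrowed to a board by replacement or further loopbacks.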

Replacement
Objective | Application
Fiber, cable | External faults
Module, board | Board faults

Effective approaches
MSP switch
SNCP switch
Active/standby XC switch
TPS switch


Configuration Data Analysis


Query and analyze the configuration
Timeslot configuration
J1 or C2 bytes
LU and TU path loopbacks
SNCP or MSP switching conditions (e.g. MS-SD)
External commands (e.g. locked switch)
Configuration consistency between the NMS and the NEs

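The last check, consistency between the NMS and the NE, amounts to a field-by-field comparison. The sketch below assumes both configurations have already been exported as plain dictionaries; the keys and values are made up for illustration and do not correspond to any real NMS export format.

```python
def config_differences(nms_config, ne_config):
    """Return {key: (nms_value, ne_value)} for every key whose values differ."""
    keys = set(nms_config) | set(ne_config)
    return {
        key: (nms_config.get(key), ne_config.get(key))
        for key in sorted(keys)
        if nms_config.get(key) != ne_config.get(key)
    }


# Hypothetical exported data: a loopback was left on the NE.
nms = {"timeslot": "w2:17", "J1": "NODE-1", "loopback": "none"}
ne = {"timeslot": "w2:17", "J1": "NODE-1", "loopback": "inloop"}
print(config_differences(nms, ne))  # {'loopback': ('none', 'inloop')}
```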

Configuration Modification

Objective | Application examples
Port, timeslot, slot | No spare boards; restore the traffic temporarily


Testing Instrument
Instrument | Test item
Bit error testing device | Bit error/traffic
Optical power meter | Optical power
SDH analyzer | Bit error/traffic/overhead bytes
Multimeter | Voltage/current/resistance

This method is the most authoritative, but the instruments must be on hand.


Rule of Thumb
Last resort
Reset board
Power off and on
Resend the configuration

Do not consider these a panacea.
They do not help us find the cause of the failure.


Common Methods of Fault Locating


Methods | Application | Features
Alarm and performance analysis | Universal | 1. Evaluates the whole network situation. 2. Locates the faulty point preliminarily based on the collected data. 3. Causes no negative effect on normal services. 4. Depends on the NMS.
Loopback | Locate the fault to a single station or board | 1. Independent of alarm and performance event analysis. 2. Rapid and effective.
Replacement | Locate the fault to a board or isolate external faults | 1. Convenient. 2. Requires spare parts/equipment. 3. Applied together with other methods.
Configuration data analysis | Locate the fault to a single station or board | 1. Can find the fault cause. 2. Fault locating takes longer. 3. Depends on the NMS.
Configuration modification | Locate the fault to a board | 1. High risk. 2. Depends on the NMS.
Test with instruments | Isolate external faults and resolve interconnectivity problems | 1. A general method with high accuracy. 2. Places certain requirements on the instruments. 3. Applied together with other methods.
Experience | Special cases | 1. Fast fault handling. 2. High probability of mistakes. 3. Requires accumulated experience.


Common Troubleshooting Sequence


1. Exclude external faults
Check for: switching problems, fiber problems, trunk cable problems, power supply system problems, grounding problems
Methods: replacement, instrument testing, loopback, alarm/performance analysis

2. Locate the fault to one NE
Methods: loopback, alarm/performance analysis

3. Locate the fault to one board
Methods: replacement, loopback, alarm/performance analysis, configuration analysis, configuration modification, rule of thumb


Chapter 3 Classified Troubleshooting Examples


Classified Troubleshooting Examples

Traffic Interruption
Bit Errors


Traffic Interruption
Possible Causes

External causes
Power supply system: equipment power-off, undervoltage, etc.
Switching problems
Fiber or trunk cables: excessive attenuation, fiber cut, cable disconnection

Operation causes
Loopback
Data modification

Equipment failure
Faulty board
Performance degradation


Traffic Interruption
Effective Methods
Alarm/performance analysis
Loopback
Replacement


Traffic Interruption
Procedures

Equipment operator
Check the indicator status on each board
Judge alarms
Hardware loopback
Replacement

NMS operator
Check that the NMS can log in to each station
Query and analyze alarms
Loopback section by section
Configuration modification
Implement switch


Traffic Interruption
No-protection Line

[Figure: unprotected chain of nodes 1 to 4 with the timeslot labels w 2:1 and t2:1 and the alarms LP-RDI and TU-AIS]

Network Configuration
Node 1 is the centralized services node.
Each station has E1 services with node 1.

Failure Description
E1 service between nodes 1 and 4 is interrupted
Node 4: TU-AIS
Node 1: LP-RDI
Other services are normal


Traffic Interruption
Alarm analysis
Query alarms: TU-AIS at node 4 only
Node 4 cannot receive the traffic from node 1
Other traffic between nodes 1, 2, and 3 is normal
Failure location: between nodes 3 and 4


Traffic Interruption
Loopback

Connect a BER tester, then perform the loopbacks and check after each one whether the tester reads normal:
Outloop on VC4 #2 at node 4
Optical port inloop at the east LU of node 3
Inloop on VC4 #2 at the east LU of node 3

Possible conclusions: failure between nodes 3 and 4, failure in node 3, or failure in node 4

[Figure: flow chart linking each loopback result (normal: yes/no) to one of these conclusions]


Traffic Interruption
Replacement

1. Locate the failure to one node; the LU, TU, or XC may be faulty.
2. Perform a TPS switch. If the traffic becomes normal, replace the faulty TU.
3. If not, perform an active/standby XC switch. If the traffic becomes normal, replace the faulty XC.
4. If not, replace the faulty LU.

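The replacement decision sequence can be written as a short function. This is a sketch only: the two callbacks are hypothetical stand-ins for "perform the switch, then check whether traffic is normal", and the XC switch is attempted only when the TPS switch does not restore traffic.

```python
def board_to_replace(tps_switch_restores_traffic, xc_switch_restores_traffic):
    """Return the board type to replace, following the TPS -> XC -> LU order."""
    if tps_switch_restores_traffic():
        return "TU"
    if xc_switch_restores_traffic():
        return "XC"
    return "LU"


# Hypothetical outcome: the TPS switch does not help, the XC switch does.
print(board_to_replace(lambda: False, lambda: True))  # XC
```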

Traffic Interruption
SNCP Ring

[Figure: SNCP ring of nodes 1 to 4 with east (e) and west (w) ports; TU-AIS alarms are marked on the ring]

Network Configuration
Node 1 is the centralized services node.
Each station has E1 services with node 1.

Failure Description
E1 services between nodes 1 and 4 are interrupted
Nodes 1, 4: TU-AIS
Other services are normal


Traffic Interruption
[Figure: SNCP ring of nodes 1 to 4 with east (e) and west (w) ports; TU-AIS alarms are marked on the ring]

Thoughts and methods
Alarm/performance analysis
Analyze configuration correctness
Disconnect the ring and convert it to a line
Loopback
Replacement


Traffic Interruption
MSP Ring

[Figure: STM-4 MSP ring of nodes 1 to 5 with east (e) and west (w) ports; R-LOS and TU-AIS alarms are marked on the ring]

Network Configuration
Node 1 is the centralized services node.
Each station has E1 services with node 1.
Services are configured on the shortest route.

Failure Description
Fibers between NE2 and NE3 are broken (R-LOS)
E1 services between nodes 1 and 3 are interrupted
Nodes 1, 3: TU-AIS
Other services are normal


Traffic Interruption
MSP switch process

LU: SF or SD detection; K1 and K2 byte transmission
XC: processes the APS protocol normally
APS controller: started, with the correct switch state
XC: implements the switching
Protection channels: available


Traffic Interruption
[Figure: STM-4 MSP ring of nodes 1 to 5 with R-LOS and APS-INDI alarms marked on the ring]

1. Query and check alarms.
2. Check the switch status.
3. If the switch status is abnormal, the APS protocol may have stopped: restart it and check the switch status again.
4. If it is still abnormal, resend the configuration and check the switch status again.
5. If it is still abnormal, restart the APS protocol node by node to locate the faulty LU/XC.
6. If the switch status is normal, draw the switched traffic flow diagram and loop back section after section to locate the faulty LU/XC.
7. Replace the faulty LU/XC.


Traffic Interruption

[Figure: normal route and switched route of the interrupted service on the STM-4 MSP ring, with the timeslot labels t2:1, e1:17/w1:17, and e3:17/w3:17 and the alarms R-LOS and APS-INDI]

Notes
The switched route is one complex line
The dichotomy method can be used

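The dichotomy note can be illustrated with a binary search over the loopback points of the switched route. The sketch assumes a single fault, so every loopback before the fault tests normal and every loopback at or beyond it does not; the point names and the test_ok callback are hypothetical.

```python
def locate_by_dichotomy(loop_points, test_ok):
    """Binary search for the first loopback point whose test fails.

    Assumes a single fault: all points before it test normal, all points at
    or beyond it test abnormal.
    Returns (last_good, first_bad, tests_run).
    """
    lo, hi = 0, len(loop_points)  # index of the first failing point is in [lo, hi]
    tests_run = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests_run += 1
        if test_ok(loop_points[mid]):
            lo = mid + 1  # fault is farther along the route
        else:
            hi = mid      # fault is at this point or nearer
    last_good = loop_points[lo - 1] if lo > 0 else None
    first_bad = loop_points[lo] if lo < len(loop_points) else None
    return last_good, first_bad, tests_run


# Eight hypothetical loopback points; "point 6" is the first one that fails.
points = [f"point {i}" for i in range(1, 9)]
fault_index = 5
found = locate_by_dichotomy(points, lambda p: points.index(p) < fault_index)
print(found)  # ('point 5', 'point 6', 3)
```

Here three loopbacks are enough, where a section-after-section scan from the near end would need six.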

Bit Errors
Possible Causes

External causes
Performance degradation of fibers, excessive attenuation
Dirty fiber joint or incorrect connector
Poor equipment grounding
Strong interference source near the equipment
Poor ventilation, high operating temperature

Equipment failure
Transmitter or receiver failure in the LU
Poor synchronization
Poor coordination between XC and LU/TU
Fan failure
Faulty boards or poor performance


Bit Errors
Effective Methods

Alarm/Performance analysis
Loopback
Replacement


Bit Errors
Operations

Equipment operator
Measure optical power
Check cable or fiber connection and grounding
Clean fiber connector
Check ventilation and temperature
Hardware loopback
Replace board
Exclude interference source

NMS operator
Query and analyze alarms/performance events
Loopback section by section
Configuration modification
Implement switch


Bit Errors

[Figure: chain of nodes 1 to 4 annotated with the performance events RSBBE, MSBBE, HPBBE, MSFEBBE, HPFEBBE, LPBBE, and LPFEBBE]

Network Configuration
Node 1 is the centralized services node.
Each station has E1 services with node 1.

Failure Description
Too many bit errors


Bit Errors

1. Check and exclude external causes
2. Performance event analysis
RSBBE/MSBBE/HPBBE: from node 4 to node 3
LPBBE: from node 4 to node 1
LU first, then TU
The failure is located between nodes 3 and 4

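The "LU first, then TU" reading of the performance events can be sketched as a lookup that reports the outermost SDH layer showing errors at a node. The counts below are invented sample values, and the function name is hypothetical; the layer order follows the usual SDH hierarchy (regenerator section, multiplex section, higher order path, lower order path).

```python
# Performance events ordered from the outermost layer to the innermost one.
LAYERS = [
    ("RSBBE", "regenerator section"),
    ("MSBBE", "multiplex section"),
    ("HPBBE", "higher order path"),
    ("LPBBE", "lower order path"),
]


def outermost_errored_layer(counts):
    """Return (event, layer) for the first layer in LAYERS with a non-zero count."""
    for event, layer in LAYERS:
        if counts.get(event, 0) > 0:
            return event, layer
    return None


# Hypothetical counts queried from the NMS for two nodes.
node3 = {"RSBBE": 120, "MSBBE": 118, "HPBBE": 97, "LPBBE": 0}
node1 = {"RSBBE": 0, "MSBBE": 0, "HPBBE": 0, "LPBBE": 35}
print(outermost_errored_layer(node3))  # ('RSBBE', 'regenerator section')
print(outermost_errored_layer(node1))  # ('LPBBE', 'lower order path')
```

In this sample the line-level errors at node 3 are handled first, since the lower order path errors seen at node 1 may only be their consequence.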

Bit Errors

Performance event analysis (continued)
1. Check fans and temperature. If they are not normal, solve the problems.
2. If they are normal, measure or query the optical power.
3. If the optical power is not normal, check and replace the transmitter, fiber, connector, or receiver.
4. If the optical power is normal, continue.


Bit Errors

Loopback and replacement
Connect the BER tester
Loopback
Active/standby XC switch
Modify configuration
Locate and replace the faulty LU/XC


Bit Errors

Question
How to solve sporadic bit errors?
A loopback cannot be kept in place for a long time
Interchange: fiber or LU


Questions
What is the key to troubleshooting?
To locate a fault ACCURATELY to a single station

What are the principles of troubleshooting?
External first, then internal
Station first, then boards
LU first, then TU
Higher-severity alarms first, then lower-severity alarms


Summary

Which methods are used for troubleshooting?
1. Alarm and performance analysis
2. Loopback
3. Replacement
4. Configuration data analysis
5. Configuration modification
6. Test with instruments
7. Rule of thumb


Thank You
www.huawei.com