
FAULT MANAGEMENT

BY : 1. DHANNY P (55412110012) 2. SHARITA (55412110011) 3. M. ANAS MASA (554121100..) 4. M. SALSABIL (554101100..)

Introduction
Fault management is the process of locating and correcting network problems, or faults
Fault management is probably the most important task in network management

Benefit of Fault Management Process


Increased network reliability
Provides tools that allow engineers to quickly:
Detect problems
Initiate recovery procedures

Need to maintain the illusion of complete and continuous connectivity
Also provides tools to extract information about the network's current state

Benefit of Fault Management Process


Efficient fault management can:
Save repair costs through efficient fault detection, location, and correction
Improve customer care through efficient trouble administration
Improve service availability and equipment reliability through proactive maintenance and through measurement, review, and corrective action

The steps for successful fault management


Identify the problem by gathering data about the state of the network
Isolate the cause, and decide if the fault should be managed
Correct the fault

Fault Management Terminology


Divided into three:
Prime Network
An alarm represents a scenario that involves a fault in the network, a managed element, or the management system
A ticket represents an attention-worthy root cause alarm. It has a status that represents the entire correlation tree. It represents the same information as an alarm in Prime Optical

Prime Optical
An alarm represents a notification from a managed Network Element (NE) that a certain fault condition occurred
Does not use the term ticket

Prime Central
Uses alarm to mean a root cause fault condition on which the entire fault lifecycle can be performed

Step 1 : Identifying The Problem


Gathering Information (data) to identify a problem
To learn that a problem exists we need to gather data about the current state of the network

Two approaches
Log critical network events:
Transmitted by a network device when fault conditions occur
Examples: failure of a link, lack of response from a host
Reactive method
If a device fails, it cannot send an event

Step 1 : Identifying The Problem


Poll network devices:
Can help find faults in a timely manner
Tradeoff: degree of timeliness vs bandwidth consumption
Other factors:
Number of devices to poll
Bandwidth of links
Example:
Assume each query and response is 100 bytes long (including data and header information)
For a network of 30 devices: (100 + 100) * 30 = 6,000 bytes/polling interval = 48,000 bits/polling interval
Polling every minute: 48,000 bits / 60 secs = 800 bits/second, or 48,000 bits * 60 polls = 2,880,000 bits/hour = about 2.9 Megabits/hour
Polling every 10 minutes: 48,000 bits * 6 polls = 288,000 bits/hour = about 0.29 Megabits/hour
May not know about an event for 10 minutes
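The polling arithmetic can be checked with a short script. This is a minimal sketch: the function name polling_load and its parameters are illustrative assumptions, not part of any management tool. It takes the figures from the example (30 devices, a 100-byte query and a 100-byte response) and reports the load per polling interval, per second, and per hour.

```python
def polling_load(devices, query_bytes, response_bytes, interval_seconds):
    """Return (bits per polling interval, average bits/second, bits/hour)."""
    # One query plus one response per device, converted from bytes to bits.
    bits_per_interval = (query_bytes + response_bytes) * devices * 8
    # Average rate when one polling round is spread over the whole interval.
    bits_per_second = bits_per_interval / interval_seconds
    # Number of polling rounds that fit in one hour.
    polls_per_hour = 3600 / interval_seconds
    bits_per_hour = bits_per_interval * polls_per_hour
    return bits_per_interval, bits_per_second, bits_per_hour


if __name__ == "__main__":
    # Compare polling every minute with polling every 10 minutes.
    for interval in (60, 600):
        per_interval, per_second, per_hour = polling_load(30, 100, 100, interval)
        print(
            f"every {interval} s: {per_interval} bits/poll, "
            f"{per_second:.0f} bits/s, {per_hour / 1e6:.3f} Mbit/hour"
        )
```

Changing the device count or the interval shows the tradeoff directly: a shorter interval finds faults sooner but raises the hourly load in proportion.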

Step 2 : Deciding Which Faults To Manage


Need to decide which faults to manage
Need to prioritise faults
If the number of fault reports is high, the network may not handle the volume
Limiting event traffic can reduce redundant transmissions and storage

Factors to consider
Scope of control over network
Size of network
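One way to prioritise faults and limit redundant reports can be sketched as follows. Everything here is an assumption for illustration: the FaultReport fields, the four severity names, and their ranking are hypothetical and do not come from any particular management system.

```python
from dataclasses import dataclass

# Hypothetical severity ranking; a lower number is handled first.
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2, "warning": 3}


@dataclass(frozen=True)
class FaultReport:
    device: str
    fault: str
    severity: str


def prioritise(reports):
    """Drop repeated (device, fault) reports, then sort by severity."""
    seen = set()
    unique = []
    for report in reports:
        key = (report.device, report.fault)
        if key in seen:
            continue  # redundant report: do not store it again
        seen.add(key)
        unique.append(report)
    # sorted() is stable, so reports of equal severity keep arrival order.
    return sorted(unique, key=lambda r: SEVERITY_RANK[r.severity])


if __name__ == "__main__":
    incoming = [
        FaultReport("sw1", "link down", "major"),
        FaultReport("r1", "no response", "critical"),
        FaultReport("sw1", "link down", "major"),  # duplicate of the first
        FaultReport("h3", "high latency", "warning"),
    ]
    for report in prioritise(incoming):
        print(report.severity, report.device, report.fault)
```

Filtering duplicates before storing them keeps both event traffic and storage down, while the sort puts the faults most worth managing at the top of the list.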

Fault Management in a Network Management System


Simplest system
Reports existence of fault but NOT location

More complex tool


Uses capability of hosts and network devices to:
Send critical network events
Facilitate isolation of fault cause

Advanced tool
Correction of fault

Impact of Fault On The Network


A fault management tool MUST be capable of analysing how a fault can affect other areas of the network
Need to know:
What services the fault STOPS
What services the fault IMPACTS

Not only that a fault has occurred, but also how that fault affects other network communication

Data can come from performance management tools
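Impact analysis of this kind can be sketched as a reachability check on the network topology. This is only an illustration under stated assumptions: the topology is a hypothetical adjacency map, "nms" stands for the management station, and the function names are invented for the example. A real tool would also draw on performance data.

```python
from collections import deque


def reachable(topology, start, failed):
    """Breadth-first search from start that never enters the failed device."""
    if start == failed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in topology.get(node, ()):
            if neighbour != failed and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def impacted_devices(topology, start, failed):
    """Devices cut off from start when the failed device goes down."""
    all_devices = set(topology)
    return all_devices - reachable(topology, start, failed) - {failed}


# Hypothetical topology: each device maps to its directly connected neighbours.
topology = {
    "nms": ["core"],
    "core": ["nms", "sw1", "sw2"],
    "sw1": ["core", "host_a", "host_b"],
    "sw2": ["core", "host_c"],
    "host_a": ["sw1"],
    "host_b": ["sw1"],
    "host_c": ["sw2"],
}

if __name__ == "__main__":
    # Which devices lose connectivity to the management station if sw1 fails?
    print(sorted(impacted_devices(topology, "nms", "sw1")))
```

The result lists the devices whose communication stops outright; devices that remain reachable may still be impacted by rerouted traffic, which this sketch does not model.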

Form of Reporting Fault


Common forms of fault reporting
Text
Graphical
Auditory signals

Text
Will work on any type of terminal

Graphical
Considered to be very effective
Can use flashing images to gain attention
Colour can be used to indicate device status

Auditory signals
Will quickly call attention to the occurrence of a fault

Then Correct The Fault

TOM by NetCracker

Monitoring & Mediation


Monitoring of network fault alarms
Assembly of network performance reports
Transformation of network data into a single format
Mediation of network alarms and events
Data extraction adapters

Analysis & Control


Historical data storage
Root cause analysis
Problem identification
Trouble ticket initiation
Problem resolution tracking

Reporting Engine
Repository of report templates
Representation of rule-based network statistics
Generation of user-configurable templates
Data comparison and historical reporting
