
Ethernet and MPLS OAM

Operations, Administration and Maintenance

Overview
This paper describes the Ethernet and Multi-Protocol Label Switching (MPLS) tools and procedures used to accomplish Operations, Administration, and Maintenance (OAM). This functionality addresses the fault management aspects of the Fault, Configuration, Accounting, Performance, Security (FCAPS) model as defined by the ITU-T Telecommunication Management Network (TMN), as shown in Figure 1.
Objectives:
> Protecting revenue by preventing service outages and offering faster service restoration
> Maximizing revenue growth by enabling richer service offerings
> Reducing operational costs by cutting repair costs and operational overhead

Figure 2. OAM objectives


OAM Process Flow


Figure 3 describes the service provider process flow when faults appear in the network, starting with the fault and ending after verification of the repair. Each step must be optimized to protect both the service provider and the subscriber.
Fault -> Fault Detection -> Fault Notification -> Fault Verification -> Fault Isolation -> Repair -> Repair Verification
Figure 3. OAM process flow

[Figure 1: the TMN layers (BML, SML, NML, EML, NEL) shown against the FCAPS functions: Fault Management, Configuration Management, Accounting Management, Performance Management, and Security Management, with OAM labeled.]
Legend:
NEL: Network Element Layer (devices)
EML: Element Management Layer (device-level functions)
NML: Network Management Layer (topology management)
SML: Service Management Layer (Service Level Agreements (SLAs))
BML: Business Management Layer (budgeting and billing)

Figure 1. FCAPS model

Recent enhancements to Ethernet and MPLS have added carrier-class OAM features for monitoring, detecting, verifying, isolating, and repairing faults, with appropriate notifications to network administrators. These enhancements enable network operators to deploy timesaving, automated, self-healing practices, as well as on-demand diagnostics and troubleshooting techniques. The purpose of OAM is to improve revenue growth and profitability for service providers, as outlined in Figure 2. This white paper describes the OAM features in the context of these objectives, and the unique benefits of Ciena's solution.

Fault Detection
Fault detection includes mechanisms to detect faults at the device control plane or data plane level. Faults must be detected quickly enough to minimize Time to Recover (TTR). However, detection should be based on an observation window large enough to avoid false fault detections. For example, a control plane can become nonresponsive for a few microseconds while handling a burst of interrupts. As long as the control plane is restored to a normal state within an acceptable time window, the network element is not considered to have experienced a software failure.

OAM handles a wide range of failure scenarios that vary in nature and location, from a software defect to a backhoe tearing apart a fiber conduit by mistake. There are three major categories of failure:
> Link failure
> Service transport failure
> SLA failure
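The trade-off described under Fault Detection (detect quickly, but only after the condition has persisted through an observation window) can be sketched as follows. This is a minimal Python illustration, not a description of any vendor implementation; the class name `WindowedFaultDetector`, the `hold_ms` parameter, and the 10 ms window are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WindowedFaultDetector:
    """Declare a fault only after the unhealthy condition has lasted hold_ms."""
    hold_ms: int                        # hypothetical observation window, in milliseconds
    bad_since_ms: Optional[int] = None  # time at which the current unhealthy streak began

    def observe(self, now_ms: int, healthy: bool) -> bool:
        """Feed one health sample; return True once a fault should be declared."""
        if healthy:
            self.bad_since_ms = None    # transient condition cleared: reset the window
            return False
        if self.bad_since_ms is None:
            self.bad_since_ms = now_ms  # first unhealthy sample starts the window
        return (now_ms - self.bad_since_ms) >= self.hold_ms


detector = WindowedFaultDetector(hold_ms=10)

# A brief stall that clears inside the window is not reported.
transient = [detector.observe(t, ok) for t, ok in [(0, False), (2, True)]]

# A condition that persists past the window is reported.
persistent = [detector.observe(t, ok) for t, ok in [(100, False), (105, False), (111, False)]]

print(transient)   # [False, False]
print(persistent)  # [False, False, True]
```

A longer window suppresses more false detections at the cost of a longer time to detect, so the value would be tuned per failure type.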


Link Failure
Link failure represents either the complete failure of a link or the degradation of its performance below an acceptable level. The causes may include an optical transceiver failure at either end of the link, dust or other impurities in the connector, a fiber cut between the elements, or element failure at the other end of the link.

Ciena's Carrier Ethernet Service Delivery (CESD) switches are optimized to enable network reconvergence below 50 ms. These enhancements allow Ethernet service-delivery networks based on Ciena products to support critical, time-sensitive applications with the same SLAs and guarantees as SONET/SDH optical rings. This level of performance is achieved, in part, by providing high-priority, interrupt-based failure detection, shielding services from link-level failures.

Service Transport Failure
Ethernet services can be transported natively, using Virtual Local Area Networks (VLANs) (IEEE 802.1Q) or stacked VLANs (IEEE 802.1ad), or over MPLS tunnels and MPLS Virtual Circuits (VCs). Each of these transport mechanisms can fail due to software failure, memory corruption, or simple misconfiguration.

Ciena's True Carrier Ethernet™ offerings are the only access/metro edge solutions that enable service providers to deploy any mix of Ethernet and MPLS-based service transports over a common infrastructure. This allows service providers to migrate easily from Ethernet to MPLS access deployments and extend the services and capabilities of an MPLS core network directly to subscribers, with no additional capital investment required. Ciena, through the early adoption of IEEE 802.1ag Connectivity Fault Management (CFM), provides VLAN-based service transport OAM. The combination of Label Switched Path (LSP) ping, LSP traceroute, Virtual Circuit Connection Verification (VCCV), Bidirectional Forwarding Detection (BFD), and Fast ReRoute (FRR) provides comprehensive MPLS-based service transport OAM.

Service Level Agreement Failure
The SLA describes the characteristics of the services provided by carriers to their subscribers. Adherence to the SLA can be measured using one or more of the following metrics:
> Frame Delay: delay experienced by the traffic carried by the service
> Frame Delay Variation: variation in that delay
> Frame Loss: percentage of frames passed through the service that were dropped by the network
> Service Availability: percentage of time when the service is available to the subscriber

Monitoring these SLA parameters provides indications of fault or performance issues. The Metro Ethernet Forum (MEF) and the ITU-T are defining standards for performance management of Ethernet services. This white paper focuses on the fault management aspect of SLA failures. SLA failures can be caused by link failures, such as a failing optical transceiver resulting in partial packet loss, or by a service transport failure, such as a software failure leading to incorrect forwarding tables.

Ciena's CESD switches offer intelligent classification and queue servicing, which minimizes frame delay and variation. In addition, Ciena provides a unique set of self-healing techniques at the link and service transport layers, to minimize SLA failures relating to frame loss and service availability.
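As a rough illustration of the four SLA metrics listed in this section, the Python sketch below derives them from raw counters. The function name `sla_metrics`, its parameters, and the sample values are hypothetical. The delay-variation formula (mean absolute difference between consecutive frame delays) is one simple choice made for this example; the paper does not define the calculation, and the standards bodies it mentions may define it differently.

```python
from statistics import mean
from typing import Dict, List


def sla_metrics(frames_sent: int, delays_ms: List[float],
                up_seconds: int, total_seconds: int) -> Dict[str, float]:
    """Compute frame delay, delay variation, frame loss, and availability.

    delays_ms holds the measured delay of every frame that arrived, in order
    (at least two samples are needed for the variation calculation).
    """
    frames_received = len(delays_ms)
    # Assumption: variation = mean absolute difference of consecutive delays.
    variation = mean(abs(b - a) for a, b in zip(delays_ms, delays_ms[1:]))
    return {
        "frame_delay_ms": mean(delays_ms),
        "frame_delay_variation_ms": variation,
        "frame_loss_pct": 100.0 * (frames_sent - frames_received) / frames_sent,
        "availability_pct": 100.0 * up_seconds / total_seconds,
    }


# Five frames sent, four arrived; service was up 999 of 1000 seconds.
metrics = sla_metrics(frames_sent=5, delays_ms=[10.0, 12.0, 11.0, 14.0],
                      up_seconds=999, total_seconds=1000)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Comparing each value against the thresholds agreed in the SLA would then indicate whether an SLA failure has occurred.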

Fault Notification
Once detected by the network element layer, the fault needs to be conveyed to the entities that will work toward repairing it. Such entities can require either human or automated servicing, such as the manual replacement of a faulty transceiver, or a Rapid Spanning Tree Protocol (RSTP) reconvergence after a link failure, respectively. In any case, fault notification should be:
> Responsive: the time saved will protect revenue and may avoid penalties.
> Meaningful: a mere "link down" Simple Network Management Protocol (SNMP) trap sent when an optical transceiver fails is insufficient. A trap containing information regarding the faulty transceiver and the reason for the failure reduces troubleshooting cost.
> Concise: sending multiple traps with redundant failure information will obfuscate the real cause of the failure and slow down the fault isolation step.

Ciena provides a comprehensive solution for optimum fault notification, including high-priority generation of SNMP traps whose content focuses on the failure source. In addition, Ciena's Ethernet Services Manager (ESM) solution offers alarm correlation capabilities, enabling network operators to associate alarms and isolate the cause of the fault more quickly.

Fault Verification
After notification, the Network Operation Center (NOC) engineer should verify the fault and determine whether the condition persists. By the time the link-fail indication is received, the Ethernet network will have reconverged. Under most conditions, failover and restoration with Ciena's Carrier Ethernet Service Delivery devices take less than 50 ms. Fault verification using on-demand OAM techniques eliminates false failure indications. Not verifying the validity of the fault could lead the network operator to try to isolate a failure that does not exist.

Fault Isolation
Fault isolation consists of determining the exact source, location, and nature of the fault, including the specific network element(s) and network layer(s) experiencing the fault. A failure at a low level may impact higher levels and lead to additional failures. For example, a link failure can lead to broken MPLS tunnel connectivity, also impacting all of the MPLS VCs that the tunnel carries. Notification of a low-level failure can be followed or surrounded by higher-level failure notifications. This cascade makes fault isolation more difficult, time-consuming, and costly. Features such as alarm correlation help minimize the cost of isolating a fault by decreasing the number of fault notification messages.

Ciena offers a complete on-demand OAM solution, enabling the network operator to conduct layer-by-layer fault isolation (link, service transport, and SLA layers). Figure 4 shows the extent of the various OAM mechanisms useful for isolating faults.

Repair
Depending on the efficiency of the OAM process, repair and preventative maintenance can occur at different stages:
> After the fault impacts the service. Time-to-repair is most critical, as the network operator needs to remedy the problem quickly to restore the service. Ciena's True Carrier Ethernet solutions provide modularity in the network elements, enabling the network operator to change only the failed element, saving time and eliminating impacts to other services. For example, risk of error is eliminated because the failure of a hot-swappable transceiver does not require the replacement and re-cabling of the entire network element.
> Before the fault impacts the service. Redundancy enables proactive maintenance, significantly reducing service outage times. Ciena's modular solution, coupled with redundant links, control modules, power supplies, and fans, allows non-invasive repair of network components, protecting the services the components carry. For example, the failure of a redundant control module will lead only to a non-invasive switchover to the standby module.
> Before the fault leads to an element or network failure, such as a performance degradation scenario. By continuously monitoring key metrics relating to element and network health, service providers can schedule maintenance preemptively, thereby using fewer resources.

Repair Verification
After a remedy is enacted, the same on-demand OAM mechanisms used during fault verification confirm that the fault no longer exists. For example, an IP ping can be used both to verify an IP connectivity fault on the control plane and to confirm that connectivity has been restored.

[Figure 4: diagram of the Link Layer, Service Transport Layer, and Service Agreement Layer for Ethernet and MPLS.]

Figure 4. Major network fault categories
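The layering in Figure 4 also suggests how alarm correlation, mentioned under Fault Isolation, can reduce the number of notifications an operator has to examine: an alarm on a VC or tunnel is suppressed when something it depends on is already alarmed. The Python sketch below is a simplified illustration under stated assumptions and does not describe how any particular management system works. The `depends_on` map, the resource names, and the function `root_causes` are hypothetical, and the dependency map is assumed to contain no cycles.

```python
from typing import Dict, List, Set


def root_causes(alarmed: Set[str], depends_on: Dict[str, str]) -> List[str]:
    """Return alarmed resources that are not explained by a lower-level alarm.

    depends_on maps each resource to the single lower-level resource that
    carries it (for example, a VC to its tunnel, a tunnel to its link).
    """
    def explained(resource: str) -> bool:
        parent = depends_on.get(resource)
        while parent is not None:          # walk down toward the link layer
            if parent in alarmed:
                return True                # a lower-level alarm accounts for this one
            parent = depends_on.get(parent)
        return False

    return sorted(r for r in alarmed if not explained(r))


# Hypothetical topology: two VCs ride tunnel-1 over link-A; vc-C rides tunnel-2 over link-B.
depends_on = {
    "tunnel-1": "link-A", "vc-A": "tunnel-1", "vc-B": "tunnel-1",
    "tunnel-2": "link-B", "vc-C": "tunnel-2",
}
alarmed = {"link-A", "tunnel-1", "vc-A", "vc-B", "vc-C"}

roots = root_causes(alarmed, depends_on)
print(roots)  # ['link-A', 'vc-C']
```

Five raw alarms collapse to two candidates: the failed link, and one VC alarm that no lower-level failure accounts for and therefore deserves separate investigation.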

[Figure 5: matrix of OAM protocols by failure category (Link, Service Transport, Service Level Agreement) and by process step (Fault Detection, Fault Notification, Fault Verification, Fault Isolation, Repair). Protocols appearing in the matrix: IEEE 802.3ah EFM OAM, IEEE 802.1ag CFM, ITU-T Y.1731, MSTP, LSP Ping, VCCV/BFD, MEF Service OAM, and SNMP.]

Figure 5. OAM protocols matrix

OAM Protocols
With the addition of comprehensive OAM capabilities, Ethernet and MPLS offer a complete feature set that allows carriers to maximize Ethernet-based service revenue. IEEE, IETF, ITU-T, and MEF now describe mechanisms that report the status of a given end-to-end service, representing a subscriber-centric view of the network, and provide link connectivity information, representing a provider-centric view of the network. Figure 5 offers a high-level view of these mechanisms against the OAM process flow and different failure categories.

IEEE 802.3ah Ethernet in the First Mile (EFM) OAM
EFM OAM, described in Figure 6, provides link-layer mechanisms that complement applications that may reside in higher layers (such as IEEE 802.1ag or MEF Service OAM). EFM OAM, also called link OAM, encompasses a simple protocol that operates across a single link.
IEEE 802.3ah EFM OAM
Features | Benefits
Auto-discovery | Eliminates the need for operator configuration
Uni-directional Fault Signaling | Enables the detection of a one-way link failure
Remote Loopback | Provides on-demand link diagnostics, including bit-error rate approximation
Link Monitoring | Offers proactive, traffic-based threshold link monitoring
Critical Events | Supports communication of network element conditions that may cause link failure, including power and temperature
Layer 2 Variable Retrieval | Allows supplemental link statistics collection, augmenting SNMP
Organization-specific Extensions | Enables standards development organizations and vendors to expand scope

Figure 6. IEEE 802.3ah EFM OAM

Thresholds are configured to monitor signal degradation, such as frame errors. Messages are passed across the link to communicate statistics regarding link health. When a failing link is detected, SNMP communicates this to management stations. In addition, the link may be taken out of service and placed in remote loopback mode for fault isolation. Prior to placing a link in service, EFM OAM may be used to test the performance of the link. Once verified to be operational and error-free, the link is taken out of remote loopback and placed in service. Standby links may be tested continuously prior to being activated by protocols such as IEEE 802.1w RSTP or IEEE 802.1aq Shortest Path Bridging.

IEEE 802.1ag Connectivity Fault Management
Building upon IEEE 802.3ah EFM OAM, IEEE 802.1ag CFM specifies capabilities for detecting, isolating, and reporting connectivity faults for VLAN-based service transport networks. CFM operates at both the physical and logical levels, monitoring and troubleshooting faults. For instance, CFM can monitor physical links between adjacent or distant devices. In addition, fault monitoring between two end-points can be configured based on a logical network layer (such as per-VLAN). Key CFM features are shown in Figure 7.

The CFM protocol, often called Ethernet OAM, sends heartbeat-style Continuity Check Messages (CCMs). Failure to receive these messages, in order, within a certain amount of time indicates one or more possible network errors, including path or device failure or network configuration problems. Management stations monitor the status of the reception of CCMs and take appropriate action.

IEEE 802.1ag CFM
Features | Benefits
Continuity Check | Continuously verifies VLAN connectivity and may indicate network faults or misconfigurations
Loopback Request (MAC ping) | Offers on-demand or proactive indication of VLAN control-plane responsiveness
Linktrace Request (MAC traceroute) | Provides on-demand or proactive VLAN topology information

Figure 7. IEEE 802.1ag CFM
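The Continuity Check behavior summarized in Figure 7 amounts to a per-peer heartbeat timer plus a sequence check. The Python sketch below is an illustration only: the class `CcmMonitor`, its fields, and the peer names are hypothetical. The loss threshold of 3.5 CCM intervals is the value commonly cited for IEEE 802.1ag; it is kept as a parameter here because the paper itself says only "a certain amount of time."

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CcmMonitor:
    """Track CCM arrival per remote end-point and flag loss or reordering."""
    interval_ms: int
    loss_multiplier: float = 3.5   # assumed threshold, expressed in CCM intervals
    last_seen_ms: Dict[str, int] = field(default_factory=dict)
    last_seq: Dict[str, int] = field(default_factory=dict)
    sequence_errors: List[str] = field(default_factory=list)

    def receive(self, peer: str, seq: int, now_ms: int) -> None:
        """Record one CCM; note the peer if its sequence number is not the next one."""
        previous = self.last_seq.get(peer)
        if previous is not None and seq != previous + 1:
            self.sequence_errors.append(peer)
        self.last_seq[peer] = seq
        self.last_seen_ms[peer] = now_ms

    def lost_peers(self, now_ms: int) -> List[str]:
        """Return peers whose CCMs have been missing longer than the threshold."""
        limit_ms = self.interval_ms * self.loss_multiplier
        return sorted(p for p, t in self.last_seen_ms.items() if now_ms - t > limit_ms)


monitor = CcmMonitor(interval_ms=100)
monitor.receive("mep-1", seq=1, now_ms=0)
monitor.receive("mep-2", seq=1, now_ms=0)      # mep-2 then goes silent
monitor.receive("mep-1", seq=2, now_ms=100)
monitor.receive("mep-1", seq=4, now_ms=200)    # seq 3 never arrived
monitor.receive("mep-1", seq=5, now_ms=300)

print(monitor.lost_peers(now_ms=300))  # [] (300 ms is still within 3.5 intervals)
print(monitor.lost_peers(now_ms=400))  # ['mep-2']
print(monitor.sequence_errors)         # ['mep-1']
```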


Troubleshooting tools are provided in the form of Media Access Control (MAC) ping (formally known as IEEE 802.1ag Loopback Request) and MAC traceroute (formally known as IEEE 802.1ag Linktrace Request). Network operators may initiate these features, or the features may run automatically as monitoring functions in background processes.

Since CFM was developed after completion of the IEEE 802.1ad Provider Bridges protocol, a second important aspect of the project allows multiple nested Maintenance Domains (MDs) to coexist on the same physical network, each potentially managed by a different administrative organization (service provider or network operator).

ITU-T Y.1731
ITU-T Study Group 13 developed Y.1731 in cooperation with IEEE 802.1ag CFM, further defining VLAN-based service transport OAM functionality. Several additional features offer performance monitoring capabilities. ITU-T Y.1731 and CFM use an identical frame format and share the same operation code (OpCode) space. As a result, these complementary protocols are simpler to deploy in a service provider's network. Figure 8 provides a summary of the features contained in Y.1731.

ITU-T Y.1731
Features | Benefits
Alarm Indication Signal (ETH-AIS) | Provides fault notification for devices not participating in the VLAN-based Ethernet Continuity Check
Remote Defect Indication (ETH-RDI) | Offers fault indication of the other end of a VLAN-based Ethernet service
Locked Signal (ETH-LCK) | Enables maintenance actions while differentiating and isolating actual fault conditions
Test Signal (ETH-Test) | Allows a one-way, on-demand, in-service or out-of-service VLAN test, such as throughput or frame loss
Performance Monitoring (ETH-PM) | Monitors traffic performance on a point-to-point, end-to-end, VLAN-based Ethernet service
Frame Loss Measurement (ETH-LM) | Collects end-to-end frame loss information to approximate severely errored seconds, which indicate VLAN-based service transport availability
Frame Delay Measurement (ETH-DM) | Provides an on-demand Frame Delay and Frame Delay Variation measurement between two points of the VLAN-based service

Figure 8. ITU-T Y.1731

VLAN-based service transport networks configure certain network elements as Maintenance End Points (MEPs). These MEPs sit at the boundaries of Ethernet domains. Figure 9 shows the span of the different OAM mechanisms offered by Y.1731.

[Figure 9: CE devices attached through UNIs to Ethernet networks bounded by MEPs, with the spans of three groups of mechanisms marked: ETH-PM; ETH-AIS, ETH-RDI, and ETH-LCK; ETH-Test, ETH-LM, and ETH-DM.]

Figure 9. ITU-T Y.1731 architecture

MPLS
MPLS deployed to the customer premises facilitates the interconnection of the access infrastructure with the existing MPLS core network, while increasing the need for MPLS-specific OAM tools. A further description of MPLS is shown in Figures 10 and 11. Ciena offers a solution allowing transport of Ethernet services, either natively or using MPLS encapsulation.

MPLS
Features | Benefits
Label Switched Path Ping | Offers on-demand connectivity information about MPLS tunnels
LSP Traceroute | Provides MPLS switching and Maximum Transmission Unit (MTU) configuration information
Virtual Circuit Connection Verification | Enables proactive connectivity monitoring of MPLS pseudowires
Bi-directional Forwarding Detection | Allows scalable, proactive data-plane verification of MPLS LSPs
Fast ReRoute | Provides automated repair of MPLS failures

Figure 10. MPLS OAM

LSP Ping
LSP ping is an in-band, on-demand mechanism to verify the status of an MPLS tunnel. An LSP can fail because of misconfigurations such as disabled MPLS, mismatched labels, or routing into the wrong tunnel; broken Label Distribution Protocol (LDP) adjacencies; corruption of Forwarding Information Bases (FIBs); or other software/hardware failures. LSP ping sends an echo request to a target Label Switch Router (LSR) using MPLS addressing. To prevent the packet from being IP-routed to its destination if the LSP is broken, the destination IP address of the echo request packet is chosen from the 127.0.0.0/8 range. If reached, the destination LSR sends an echo reply back to the originator of the MPLS echo request.
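The role of the 127.0.0.0/8 destination in LSP ping can be checked with Python's standard `ipaddress` module. Addresses in that range are loopback addresses, which routers do not forward by IP lookup, so an echo request that leaves the LSP is not delivered by ordinary routing and the failure stays visible. The helper name `is_lsp_ping_destination` is hypothetical and used only for this sketch.

```python
import ipaddress

# The range named in the text for MPLS echo request destinations.
LSP_PING_RANGE = ipaddress.ip_network("127.0.0.0/8")


def is_lsp_ping_destination(address: str) -> bool:
    """Return True if the address falls inside 127.0.0.0/8."""
    return ipaddress.ip_address(address) in LSP_PING_RANGE


for candidate in ["127.0.0.1", "127.10.20.30", "10.1.1.1"]:
    addr = ipaddress.ip_address(candidate)
    print(candidate, is_lsp_ping_destination(candidate), addr.is_loopback)
```

For the three sample addresses, the range check and the `is_loopback` property agree: both are true for the two 127.x.x.x addresses and false for 10.1.1.1.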


[Figure 11: diagram labeling MPLS tunnels and the virtual circuits VC A and VC B.]

Figure 11. Basic MPLS constructs

LSP Traceroute
LSP traceroute determines the hop-by-hop path and destination of an LSP. Like LSP ping, traceroute is an in-band, on-demand MPLS OAM utility that uses an MPLS echo request/reply mechanism to detect MTU misconfiguration between LSRs. However, with LSP traceroute, all LSRs along the path, up to and including the destination LSR, reply to the echo request. This technique allows the operator to identify and distinguish LSRs along a path.

Virtual Circuit Connection Verification
Using LSP ping, a service provider can monitor the status of an MPLS tunnel. To diagnose a problem within the tunnel, the service provider needs a mechanism to verify the connectivity of the pseudowires (VCs). VCCV allows proactive monitoring of pseudowires within MPLS tunnels by establishing a control channel associated with each pseudowire.

Bi-directional Forwarding Detection
VCCV requires involvement of the MPLS control plane; as the number of VCs increases, so does the load on the control plane. BFD allows systematic and more scalable detection of MPLS LSP data plane failures, with less involvement from the control plane. As a result, BFD allows faster detection on a larger number of LSPs. BFD relies on a hello packet exchanged by neighbors at negotiated, regular intervals. When a hello packet is not received as expected, the neighbor is declared down.

Fast ReRoute
Fast ReRoute allows automated repair of LSP tunnels to reduce packet loss on LSPs. If there is a link or node failure, an LSP employing Fast ReRoute can redirect MPLS traffic to previously computed and established alternate paths around the failed link or node. The alternate paths are selected during the establishment of a primary LSP under hop-by-hop control. With Fast ReRoute enabled, Resource ReSerVation Protocol-Traffic Engineering (RSVP-TE) establishes local alternate LSPs for each potential point of failure along the primary path.

MEF Service OAM
The MEF is pursuing a complementary set of OAM-related functions operating at the SLA layer. The Phase 1 specification will contain performance monitoring capabilities for point-to-point services reflecting the frame loss ratio, frame delay (latency), and frame delay variation (jitter) characteristics of the service, as shown in Figure 12. In addition, per-service fault management will be supported for point-to-point, point-to-multipoint, and multipoint services. Fault detection encompasses loss of continuity between management end-points and detection of the potential for loops in the service. This fault detection/verification capability is supported proactively or on demand through operator action. MEF Service OAM, often called Service OAM, also provides fault isolation and fault notification.
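The Fast ReRoute description among the MPLS tools above centers on one idea: alternate paths are computed and established before any failure, so that repair is a lookup rather than a recomputation. The Python sketch below illustrates that idea for link failures only (node protection is not modeled). The function `reroute`, the `detours` table, and the node names are hypothetical, and nothing here represents RSVP-TE signaling.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]


def reroute(primary: List[str], detours: Dict[Link, List[str]],
            failed_links: Set[Link]) -> List[str]:
    """Return the hop sequence actually used, splicing in precomputed detours.

    detours maps each protected link (u, v) to an alternate path that starts
    at u and ends at v, established before any failure occurs.
    """
    path = [primary[0]]
    for u, v in zip(primary, primary[1:]):
        if (u, v) in failed_links:
            detour = detours.get((u, v))
            if detour is None:
                raise LookupError(f"no precomputed detour for link {u}-{v}")
            path.extend(detour[1:])   # skip u, which is already on the path
        else:
            path.append(v)
    return path


primary = ["A", "B", "C", "D"]
detours = {
    ("A", "B"): ["A", "E", "B"],
    ("B", "C"): ["B", "E", "C"],
    ("C", "D"): ["C", "F", "D"],
}

print(reroute(primary, detours, failed_links=set()))         # ['A', 'B', 'C', 'D']
print(reroute(primary, detours, failed_links={("B", "C")}))  # ['A', 'B', 'E', 'C', 'D']
```

Because the detour is spliced in at the node adjacent to the failure, traffic upstream of that node is unaffected, which is what keeps packet loss low.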

IP
Ethernet services offer the benefit of low deployment costs by not requiring IP provisioning of each individual data plane element. However, the control plane uses mostly IP-based protocols, such as Telnet, SNMP, or IGMP. In that regard, control plane failures must be detected at the IP level. Two mechanisms have been in use since the advent of IP networking: IP ping, which provides on-demand connectivity verification of the IP control plane, and IP traceroute, which offers routing and delay information for an IP destination.

MEF Service OAM
Features:
> Point-to-point Ethernet Virtual Circuit (EVC) Performance Monitoring (PM)
> Point-to-multipoint EVC PM
> Multipoint-to-multipoint EVC PM
> EVC Fault Management
Benefits:
> Enables identification and isolation of faults at the SLA layer
> Provides SLA assurance for different services

Figure 12. MEF Service OAM


IP Ping
IP ping is a basic mechanism that verifies IP connectivity through the network. It verifies that a given IP address exists, is reachable, and can accept ping requests, and calculates the latency between the control planes of two IP network elements.

IP Traceroute
IP traceroute is another OAM tool that records and displays the IP message route between two IP elements. It also calculates the latency between the control planes of each IP element of the route.

Conclusion

Ciena's Carrier Ethernet Service Delivery solution, described in Figure 13, enables service providers to operate, administer, and maintain any mix of Ethernet and MPLS-based L2 VPNs effectively. By leveraging this unique OAM capability, service providers can protect current revenue and maximize revenue growth, while reducing operational costs.

Objectives:
> Protects revenue by: preventing service outages; offering faster service restoration
> Maximizes revenue growth by: enabling richer service offerings
> Reduces operational costs by: reducing repair costs; reducing operational overhead

Carrier Ethernet Service Delivery Solution:
> Sub-50 ms automated network reconvergence
> Robust Quality of Service (QoS) architecture minimizes SLA failures
> Modular architecture enables planned non-invasive repairs
> Redundancy for mission-critical network components
> Generates precise failure information more quickly
> Service-aware OAM feature set intelligently traverses each layer as needed
> Complete OAM feature set covers each network layer (link, service transport, and SLA)
> Comprehensive Ethernet and MPLS OAM feature sets
> Intelligent classification
> Advanced alarm correlation simplifies fault isolation
> Hot-swappable solution enables shorter and less expensive repairs
> On-demand OAM techniques eliminate unnecessary investigation of false failure indications
> Modular solution reduces cost of spares
> Proactive monitoring enables cost-effective preemptive maintenance

Figure 13. Ciena's Carrier Ethernet Service Delivery solution

Specialising in transition to service-driven networks to help you change the way you compete.

1201 Winterson Road Linthicum, MD 21090 1.800.207.3714 (US and Canada) 1.410.865.8671 (outside US) +44.20.7012.5555 (international) www.ciena.com

Ciena may from time to time make changes to the products or specifications contained herein without notice. © 2009 Ciena Corporation. All rights reserved. WP062A4 2.2009
