
Fault Management in Hybrid Environment with IP and Optical Networks

Te-Lung Liu, Hui-Min Tseng, Chu-Sing Yang and C. Eugene Yeh


National Center for High-Performance Computing, Taiwan {tlliu, n00hmt00, csyang, cyeh}@nchc.org.tw

Abstract. The Internet2 community was initiated by National Research and Education Networks (NRENs) that peer with each other at the IP layer. With the explosive growth of traffic demand from e-Science applications, optical networks have been introduced since the late 1990s to provide higher bandwidth and better QoS guarantees. Problems arise when integrating the new optical infrastructure with the existing IP backbone, including data path control and network management. In this paper, we focus on fault management in hybrid networks containing DWDM, SDH and IP technologies. For instance, when a fiber is cut in the DWDM layer, tens of alarms appear concurrently, indicating network breakdowns in the DWDM, SDH and IP layers separately, which makes it hard for engineers to locate the error. Therefore, we experiment with several scenarios and propose a novel correlation method among these layers based on a correlation diagram. Our contribution would benefit NRENs operating hybrid networks in finding the root cause among alarms across different network layers.

Key words: hybrid networks, fault management, alarm analysis, NREN, Optical, IP

Introduction

With the growth of Internet business, network traffic has expanded tremendously since the last century. However, the Internet backbone lacks an effective QoS mechanism, and packets get congested during peak hours. Soon there came strong demand for building a separate information highway for the research and education community. In 1996, the Next-Generation Internet Initiative was announced in the US, which was then expanded to a new global research network known as Internet2 [1]. In this new network infrastructure, National Research and Education Networks (NRENs) from different countries can connect through direct peering or transit service from other NRENs. Internet2 has successfully driven several e-Science applications such as high-resolution medical imaging, high-definition video streaming, and large-volume transfer of scientific databases. These applications require not only big pipes but also QoS-guaranteed transfer. Optical networks can provision such dedicated channels for individual connections between remote users. These channels, called lightpaths [2], allow packets to be delivered in sequence without jitter or congestion. To meet the demands of e-Science applications, NRENs have deployed optical backbones since the late 1990s, including CA*net, NLR, Internet2, SURFnet, TWAREN, CESNET and so on. In 2003, the Global Lambda
Gabriela Krčmařová, Petr Sojka (Eds.): CESNET Conference 2008, Proceedings, pp. 85–94, 2008. © CESNET, z. s. p. o., 2008


Integrated Facility (GLIF) [3,4] was established. GLIF is a virtual organization that shares optical network resources around the globe. Participants of GLIF share their spare optical bandwidth, which is shown on the GLIF map [5]. These resources are managed and allocated at GLIF Open Lambda Exchange (GOLE) points, so that end-to-end lightpaths can be provisioned across continents and oceans.

Fig. 1. Hybrid Networks adopting DWDM, SONET/SDH and IP technology

With the global construction of optical networks, hybrid networks that employ both IP and optical technologies have become a popular infrastructure in the research community. Optical networks cover two main technologies: DWDM (Dense Wavelength-Division Multiplexing) and SONET/SDH. For NRENs that own or acquire dark fibers from carriers, DWDM technology can be used for all-optical data transmission. At least 30 client data signals can be converted to wavelengths and then transmitted simultaneously over a single fiber. These data signals could be 10GE or SONET/SDH. SONET/SDH uses time-division multiplexing, in which a specific number of time slots on SONET/SDH links can be allocated to individual circuits (also known as lightpaths in the research community). The two terminations of a lightpath are then connected to client equipment through GE or POS interfaces. Therefore, if all of the above technologies are deployed, the hybrid network consists of three layers: DWDM, SONET/SDH and IP, as shown in Fig. 1. In the bottom layer, dark fiber footprints form the topology of the DWDM network. Wavelengths in these fibers are then converted into SONET/SDH signals that form the second layer. Finally, IP devices are connected with each other through SONET/SDH circuits/lightpaths.

When managing faults in a hybrid network environment, cross-layer relationships must be analyzed carefully. For example, when a fiber is cut, all the wavelengths in that fiber fail. The SONET/SDH and IP networks on top of these wavelengths are also affected. In this paper, we discuss these cross-layer relationships and propose a novel alarm correlation method. Problem definitions are given in Section 2, where the traditional correlation process is also introduced.


In Section 3, we experiment with real scenarios and illustrate the correlation diagram. Our proposed alarm correlation method operates by following the paths on this diagram. Finally, we briefly conclude the paper in Section 4.

Alarm Correlation Across IP and Optical Layers

We depict the typical network layer hierarchy in hybrid networks in Fig. 2. We assume that f fibers run in the same conduit. Each fiber carries w wavelengths, which are converted into OC-192 (for SONET) or STM-64 (for SDH) signals. On average, x SONET/SDH circuits traverse each OC-192 or STM-64 link. End users connect their IP clients (router, switch or PC) to the termination points of lightpaths to use the end-to-end bandwidth assigned from the underlying optical infrastructure.

Fig. 2. Network Layer Hierarchy in Hybrid Networks

If a network breakdown occurs in a single layer, network alarms from all affected layers appear concurrently. Therefore, correlation can be done from the bottom layer up. To compute the complexity of alarm correlation, we discuss five cases as follows:

Case 1. Client signal loss
The complexity of this case is O(1), since there is a single alarm from the lightpath termination point.

Case 2. SONET/SDH link loss of signal
For the broken SONET/SDH link, we have to omit the alarms from the x lightpaths that traverse it. Hence the complexity of this case is O(x).


Case 3. Wavelength failure
If a transponder malfunctions, the wavelength cannot be converted to a SONET/SDH signal, which causes loss of signal on the SONET/SDH link. The complexity is the same as in Case 2, which is O(x).

Case 4. Single fiber cut
When a fiber is cut, all w wavelengths fail, together with the x lightpaths on each wavelength. To correlate these alarms, the computational complexity is O(wx).

Case 5. Conduit collapse
If the conduit collapses due to road construction or destruction, all f fibers in the conduit are cut concurrently. The complexity for each fiber is O(wx) according to Case 4. As a result, the total time complexity is O(fwx).

From the above discussion, we conclude that the worst-case complexity for cross-layer correlation is O(fwx). If we can draw a correlation diagram and find the root cause from the top layer down, the complexity can be minimized. Details are given in the next section.
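The five cases can be put side by side with a small worked example. The sketch below is only an illustration of the arithmetic above: the function name, the case labels and the sample values of f, w and x are assumptions made for this example and do not come from the measurements in this paper.

```python
def correlation_cost(case: str, f: int, w: int, x: int) -> int:
    """Number of lightpath alarms a bottom-up correlation has to handle.

    f: fibers per conduit, w: wavelengths per fiber,
    x: average SONET/SDH circuits per OC-192/STM-64 link.
    The case labels are hypothetical names for Cases 1 to 5.
    """
    costs = {
        "client_signal_loss": 1,          # Case 1: O(1)
        "sdh_link_los": x,                # Case 2: O(x)
        "wavelength_failure": x,          # Case 3: O(x)
        "fiber_cut": w * x,               # Case 4: O(wx)
        "conduit_collapse": f * w * x,    # Case 5: O(fwx)
    }
    return costs[case]


# Illustrative sizes only (assumed, not measured).
F, W, X = 2, 32, 10
for name in ("client_signal_loss", "sdh_link_los", "wavelength_failure",
             "fiber_cut", "conduit_collapse"):
    print(f"{name:20s} {correlation_cost(name, F, W, X)}")
```

With these assumed sizes, a conduit collapse already involves several hundred lightpath alarms, which is why the product fwx dominates the bottom-up approach.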

Experiment Scenarios and Correlation Diagram

In this section, we experiment with real-case scenarios following the analysis in the previous section. These tests are performed on our testbed in the TWAREN optical lab. From the test results, a novel correlation flow is proposed to simplify root-cause analysis.

3.1

TWAREN Optical Lab

TWAREN (TaiWan Advanced Research and Education Network) is the network backbone that represents Taiwan's NREN in the global Internet2 community. It is a hybrid network that employs both IP and SDH technologies. To support pilot projects for cutting-edge experiments, TWAREN deployed an optical network laboratory [6] that supports both next-generation SDH and DWDM. The laboratory is equipped with Cisco ONS 15454 optical switches, which support SDH modules and DWDM network modules in a single chassis. Therefore, a hybrid network architecture that supports IP, SDH and DWDM can be designed and deployed using a single product.

The network topology is depicted in Fig. 3. We simulate three sites called A, B and C. At site A, we have two ONS shelves that operate in the DWDM and SDH layers separately. At sites B and C, both DWDM and SDH modules can be installed in a single shelf. The DWDM equipment at site A functions as a ROADM to provide direct wavelengths between B and C. Sites B and C are simply DWDM leaf nodes. At each site, a Cisco 7609 router is connected to the SDH terminations of the ONS. In the DWDM layer, two dark fibers connect nodes A-B and A-C. In the SDH layer, we provision three STM-64 links, designated A-B, A-C, and B-C, to form a ring topology. With the Cisco 7609 routers connected to the SDH terminations, this topology acts as a hybrid architecture that serves both the IP network and the optical network.


Fig. 3. Network Topology of TWAREN Optical Lab

3.2

Test Scenarios and Results

In order to simulate link breakdown in several layers, we have to examine the device cards in the ONS 15454, which are illustrated in Fig. 4(a). The dark fiber is connected to the OSC-CSM (Optical Service Channel-Combiner/Splitter Module), where the proprietary optical service channel is dropped for management operations while the other data channels are delivered to the de-multiplexer (32-DMX) and wavelength selective switch (32-WSS) modules for wavelength separation and switching (including local add-drop). In our lab environment, 32 wavelengths can be de-multiplexed from each dark fiber. For local add-drop, each wavelength is connected to a transponder (TXP-MR-10G) for conversion to the client signal, which is STM-64 in our configuration. Through the switching backplane of the ONS, SDH circuits/lightpaths are provisioned over STM-64 links and terminated at Gigabit Ethernet cards (CE or ML cards).

During our test, a lightpath is provisioned between Site A and C. The lightpath terminations at A and C are connected to the routers at each site. We design the following four test scenarios to simulate link breakdown at each layer, as shown in Fig. 4(b). Table 1 lists the alarms generated in each test scenario. Each orange row indicates a major alarm, while each red row indicates a critical alarm (see footnote 1).

Test 1. Client signal loss
Client signal loss can be simulated simply by shutting down the router interface that connects to the lightpath termination. As a result, the CE/ML card at the near end raises a "Carrier Loss on LAN" alarm to report the problem. The CE/ML card at the far end of the lightpath simultaneously issues a "Transport Layer Failure" alarm to indicate that the client IP signal cannot be detected from this lightpath.

Test 2. SONET/SDH link loss of signal
We unplug the link between the STM-64 and transponder cards to force loss of signal on an SDH link. As Table 1 shows, both the near and far ends of the lightpaths that traverse this
1 Critical indicates a severe, service-affecting alarm that needs immediate correction. Major is still a serious alarm, but the failure has less of an impact on the network [7].


Fig. 4. (a) Device cards of ONS 15454 (b) Link Breakdown Test Scenarios

Table 1. Generated Alarms in each Test Scenario

Scenario  Alarm Description                Alarm source (near-end or far-end)  Affected card/module
Test 1    Carrier loss on LAN              near                                CE/ML
Test 1    Transport Layer Failure          far                                 CE/ML
Test 2    Transport Layer Failure          near and far                        CE/ML
Test 2    Loss of Signal                   near                                STM-64
Test 2    Loss of Signal                   near                                TXP-MR-10G
Test 3    Transport Layer Failure          near and far                        CE/ML
Test 3    Incoming Payload Signal Absent   near and far                        TXP-MR-10G
Test 3    Incoming Payload Signal Absent   near                                32-WSS
Test 4    Transport Layer Failure          near and far                        CE/ML
Test 4    Incoming Payload Signal Absent   ROADM site                          TXP-MR-10G
Test 4    Loss of Signal                   near and far                        OSC-CSM

SDH link will send "Transport Layer Failure" alarms, which reveal signal detection problems on these lightpaths. In the SDH and wavelength layers, we only receive

"Loss of Signal" alarms from the near-end STM-64 and transponder cards. No alarm is generated at the far-end site in the optical layer.

Test 3. Transponder/wavelength failure
To simulate a transponder/wavelength failure, we remove the link between the transponder and 32-WSS cards. Again, "Transport Layer Failure" alarms are issued by the terminations of all affected lightpaths. In the wavelength layer, the transponders at both the near end and the far end generate an "Incoming Payload Signal Absent" alarm to indicate the missing wavelength, but only the near-end 32-WSS card issues the same "Incoming Payload Signal Absent" alarm. Surprisingly, there is no "Loss of Signal" alarm in the SDH layer.

Test 4. Single fiber cut
In our last scenario, the dark fiber is unplugged from the OSC-CSM module to represent a single fiber cut. We obtain "Transport Layer Failure" alarms at the lightpath terminations as predicted. The OSC-CSM modules at the two ends of the dark fiber issue "Loss of Signal" alarms to warn of the fiber cut. At the wavelength layer, only the transponders at the ROADM site (site A in our configuration) send an "Incoming Payload Signal Absent" alarm to indicate the wavelength failure. As in Test 3, we do not receive any "Loss of Signal" alarm in the SDH layer.

3.3

Correlation Diagram and Alarm Correlation Process Flow

From our observation, the experimental results are quite different from the analysis in the previous section. For instance, in a fiber cut situation, the analysis expects 2wx alarms in the IP/lightpath termination layer, 2w alarms in each of the SDH and wavelength layers, and 2 alarms at the fiber layer. Our experiment gives equivalent results for the IP/lightpath termination layer ("Transport Layer Failure") and the fiber layer ("Loss of Signal"), but only the transponder at the ROADM site sends an alarm ("Incoming Payload Signal Absent") in the wavelength layer, and no alarms are generated in the SDH layer. Apparently, the device vendor has already performed part of the correlation. Interestingly, because the ONS 15454 integrates the DWDM, SDH and IP access layers within a single device, cross-layer information can be obtained easily. For example, if a transponder detects an absent signal in the wavelength layer, the STM-64 card connected to it must have the same problem, and therefore the alarm for the STM-64 card can be ignored. This is why there are no alarms in the SDH layer in test scenarios 3 and 4. In Test 2, the link between the transponder and STM-64 cards is unplugged at the near end. Because the underlying DWDM/wavelength layer remains intact, the far-end transponder and STM-64 cards do not issue loss of signal.

Although some of the alarms between the wavelength layer and the SDH layer are correlated, further analysis of alarms between the IP/lightpath layer and the optical layers is still required. Therefore, we design the alarm correlation diagram shown in Fig. 5 according to our experimental results. This diagram compensates for the vendor's partial correlation and helps network operators/engineers discover the actual location of a network fault. In the previous section, we described the traditional alarm correlation process, in which alarms are examined from the bottom layer up. Its time complexity is proportional to the product of the dimensions at each layer.
Such a correlation process is time-consuming and wastes iterations in real cases where partial correlation has already been done. We propose another alarm correlation method, which proceeds from the top layer down, as illustrated


Fig. 5. Alarm Correlation Diagram

in Fig. 6. Alarms are first categorized by layer. We then examine alarms starting from the top-most layer. If any alarm exists in this layer, we follow all of its correlation paths to the lower layers and, when a matching alarm is found there, mark the upper-layer alarm as correlated. The same iteration is repeated toward the bottom-most layer. The complexity of our correlation flow is proportional to the maximum number of correlation paths multiplied by the depth of the layers.

For example, 4 alarms are obtained in Test 2 and categorized into 3 layers: two "Transport Layer Failure" alarms at the near end and far end of the IP/lightpath layer, a "Loss of Signal" alarm at the near end of the SDH layer, and a "Loss of Signal" alarm at the near end of the wavelength layer. We first look at the top-most IP/lightpath layer. "Transport Layer Failure" alarms can be found in the correlation diagram, and there are two correlation paths to the lower layers: one looks for an "Incoming Payload Signal Absent" alarm from a transponder in the wavelength layer; the other looks for a "Loss of Signal" alarm from an STM-64 card in the SDH layer. In this case, only the "Loss of Signal" alarm from the STM-64 module is received, so we mark the two "Transport Layer Failure" alarms as correlated. We then look for a "Loss of Signal" alarm from the transponder in the wavelength layer and mark the "Loss of Signal" alarm in the SDH layer as correlated. After the correlation process ends, only the "Loss of Signal" alarm from the transponder remains, so the network engineer can easily identify it as the root cause of the event.
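The Test 2 walk-through above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the lab: the `Alarm` class, the layer labels and the `CORRELATION_PATHS` table are hypothetical names, and the table contains only the three correlation paths mentioned in the text, not the complete diagram of Fig. 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    layer: str        # "ip_lightpath", "sdh", "wavelength" or "fiber"
    description: str  # alarm text as listed in Table 1
    card: str         # affected card/module
    end: str          # "near" or "far"


# Layers ordered from top to bottom (assumed labels).
LAYERS = ["ip_lightpath", "sdh", "wavelength", "fiber"]

# Partial correlation diagram: an upper-layer alarm maps to the lower-layer
# alarms that can explain it. Only the paths described in the text are listed.
CORRELATION_PATHS = {
    ("ip_lightpath", "Transport Layer Failure", "CE/ML"): [
        ("wavelength", "Incoming Payload Signal Absent", "TXP-MR-10G"),
        ("sdh", "Loss of Signal", "STM-64"),
    ],
    ("sdh", "Loss of Signal", "STM-64"): [
        ("wavelength", "Loss of Signal", "TXP-MR-10G"),
    ],
}


def signature(alarm):
    return (alarm.layer, alarm.description, alarm.card)


def correlate(alarms):
    """Walk the layers top-down and return the alarms left uncorrelated."""
    present = {signature(a) for a in alarms}
    remaining = []
    for layer in LAYERS:
        for alarm in (a for a in alarms if a.layer == layer):
            causes = CORRELATION_PATHS.get(signature(alarm), [])
            # An alarm is correlated if a lower-layer alarm on one of its
            # correlation paths was also received; otherwise it is kept.
            if not any(cause in present for cause in causes):
                remaining.append(alarm)
    return remaining


# The four alarms observed in Test 2 (Table 1).
test2_alarms = [
    Alarm("ip_lightpath", "Transport Layer Failure", "CE/ML", "near"),
    Alarm("ip_lightpath", "Transport Layer Failure", "CE/ML", "far"),
    Alarm("sdh", "Loss of Signal", "STM-64", "near"),
    Alarm("wavelength", "Loss of Signal", "TXP-MR-10G", "near"),
]

root_causes = correlate(test2_alarms)
for alarm in root_causes:
    print(alarm.layer, alarm.description, alarm.card, alarm.end)
```

Under these assumptions, the only alarm left after the pass is the transponder's "Loss of Signal", in line with the example above. Each alarm is checked against at most the number of correlation paths leaving it, so the work grows with the number of paths and layers rather than with the product fwx.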


Fig. 6. Alarm Correlation Process Flow

Conclusions

Hybrid network infrastructure will play an important role in the future research network community. In addition to the legacy IP-based Internet2 network, a global optical lambda network is managed by GLIF, and international lightpaths can be exchanged at GOLEs. Any research and education user with a huge amount of data can choose to have their traffic delivered through optical lightpaths instead of the IP network for a better QoS guarantee. End-to-end IP connections that traverse lightpaths are arranged by cross-connecting the underlying optical synchronous channels and/or wavelengths. Therefore, there is an emerging demand to manage such hybrid infrastructures across IP and optical layers.

In this paper, we review hybrid research networks and discuss alarm analysis across the DWDM, SDH, and IP layers. Scenarios are tested in the TWAREN optical lab, and the results indicate that preliminary correlation may already have been done by the vendor. Our proposed correlation diagram and process flow compensate for such partial correlation and will help network operators and engineers locate the root cause of a network problem. Future work includes implementing this diagram in the current control and management system [8], integrating PM alarm analysis to detect signal degradation, and further incorporating alarms from layer 2 switches and layer 3 routers.


References
1. Debra Cameron: Internet2: The Future of the Internet and Next-Generation Initiatives. Computer Technology Research (1999)
2. Jing Wu, Michel Savoie, Scott Campbell, Hanxi Zhang, Gregor V. Bochmann, Bill St. Arnaud: Customer-managed end-to-end lightpath provisioning. International Journal of Network Management, Vol. 15, Issue 5, pp. 349–362. (2005)
3. L. Smarr, T.A. DeFanti, M.D. Brown, C.T.A.M. de Laat: iGrid 2005: The Global Lambda Integrated Facility. iGrid2005 special issue, Future Generation Computer Systems, volume 22 issue 8, pp. 849–851. (2006)
4. Tom DeFanti, Cees de Laat, Joe Mambretti, Kees Neggers, Bill St. Arnaud: TransLight: a global-scale LambdaGrid for e-science. Communications of ACM, vol. 46, no. 11, pp. 34–41. (2003)
5. Global Lambda Integrated Facility (GLIF). http://www.glif.is/
6. TWAREN Optical Laboratory, http://optlab1.twaren.net/optlabEN/
7. Cisco ONS 15454 SDH Troubleshooting and Maintenance Guide, Release 3.3, Cisco Press, http://www.cisco.com/en/US/products/hw/optical/ps2006/prod_troubleshooting_guide_chapter09186a00801a5420.html
8. Shu-Cheng Lin, Hui-Min Tseng, Yi-Cheng Cheng, Hui-Lan Lee, Te-Lung Liu, Chu-Sing Yang, C. Eugene Yeh: TWAREN Optical Network Laboratory and Lightpath Control System. Proceedings of IEEE AINAW, pp. 1492–1498. (2008)
