
HP ExpertOne


Building HP FlexFabric Data Centers eBook


(Exam HP2-Z34)

Hppress.com


Building HP FlexFabric Data Centers eBook (Exam HP2-Z34)


© 2014 Hewlett-Packard Development Company, L.P.
Published by:
HP Press
660 4th Street, #802
San Francisco, CA 94107
All rights reserved. No part of this book may be reproduced or transmitted in any
form or by any means, electronic or mechanical, including photocopying, recording,
or by any information storage and retrieval system, without written permission from
the publisher, except for the inclusion of brief quotations in a review.
ISBN: 978-1-937826-90-1
WARNING AND DISCLAIMER
This book provides information about the topics covered in the Building HP
FlexFabric Data Centers (HP2-Z34) certification exam. Every effort has been made
to make this book as complete and as accurate as possible, but no warranty or fitness
is implied.
The information is provided on an "as is" basis. The author, HP Press, and Hewlett-Packard Development Company, L.P., shall have neither liability nor responsibility to
any person or entity with respect to any loss or damages arising from the information
contained in this book or from the use of the discs or programs that may accompany
it.
The opinions expressed in this book belong to the author and are not necessarily
those of Hewlett-Packard Development Company, L.P.
TRADEMARK ACKNOWLEDGEMENTS
All terms mentioned in this book that are known to be trademarks or service marks
have been appropriately capitalized. HP Press or Hewlett-Packard Inc. cannot attest
to the accuracy of this information. Use of a term in this book should not be regarded
as affecting the validity of any trademark or service mark.
GOVERNMENT AND EDUCATION SALES
This publisher offers discounts on this book when ordered in quantity for bulk
purchases, which may include electronic versions. For more information, please
contact U.S. Government and Education Sales 1-855-4HPBOOK (1-855-447-2665)


or email sales@hppressbooks.com.
Feedback Information
At HP Press, our goal is to create in-depth reference books of the best quality and
value. Each book is crafted with care and precision, undergoing rigorous
development that involves the expertise of members from the professional technical
community.
Readers' feedback is a continuation of the process. If you have any comments
regarding how we could improve the quality of this book, or otherwise alter it to
better suit your needs, you can contact us through email at
feedback@hppressbooks.com. Please make sure to include the book title and ISBN in
your message.
We appreciate your feedback.
Publisher: HP Press
Contributors and Reviewers: Olaf Borowski, Gerhard Roets, Vincent Gilles,
Olivier Vallois
HP Press Program Manager: Michael Bishop

HP Headquarters
Hewlett-Packard Company
3000 Hanover Street
Palo Alto, CA
94304-1185
USA
Phone: (+1) 650-857-1501
Fax: (+1) 650-857-5518

HP, COMPAQ and any other product or service name or slogan or logo contained in
the HP Press publications or web site are trademarks of HP and its suppliers or
licensors and may not be copied, imitated, or used, in whole or in part, without the
prior written permission of HP or the applicable trademark holder. Ownership of all
such trademarks and the goodwill associated therewith remains with HP or the
applicable trademark holder.
Without limiting the generality of the foregoing:
a. Microsoft, Windows and Windows Vista are either US registered trademarks or
trademarks of Microsoft Corporation in the United States and/or other countries;
and
b. Celeron, Celeron Inside, Centrino, Centrino Inside, Core Inside, Intel, Intel Logo,
Intel Atom, Intel Atom Inside, Intel Core, Intel Core Inside, Intel Inside Logo,
Intel Viiv, Intel vPro, Itanium, Itanium Inside, Pentium, Pentium Inside, ViiV
Inside, vPro Inside, Xeon, and Xeon Inside are trademarks of Intel Corporation in
the U.S. and other countries.


Special Acknowledgments
This book is based on the Building HP FlexFabric Data Centers course (Course ID:
00908176). HP Press would like to thank the courseware developers, Peter
Debruyne, David Bombal, and Steve Sowell.
Thanks to Debi Pearson and Miriam Allred for their help preparing this eBook for
publication.

Introduction
This study guide helps you prepare for the Building HP FlexFabric Data Centers
exam (HP2-Z34). The HP2-Z34 elective exam is for candidates who want to acquire
the HP ASE-FlexNetwork Architect V2 certification, or the HP ASE-FlexNetwork
Integrator V1 certification. The exam tests you on specific Data Center topics and
technologies such as Multitenant Device Context (MDC), Datacenter Bridging
(DCB), Multiprotocol Label Switching (MPLS), Fibre Channel over Ethernet
(FCoE), Ethernet Virtual Interconnect (EVI), and Multi-Customer Edge (MCE). The
exam will also cover topics on high availability and redundancy such as Transparent
Interconnection of Lots of Links (TRILL) and Shortest Path Bridging Mac-in-Mac
mode (SPBM).

HP ExpertOne Certification
HP ExpertOne is the first end-to-end learning and expertise program that combines
comprehensive knowledge and hands-on real-world experience to help you attain the
critical skills needed to architect, design, and integrate multivendor and multiservice
converged infrastructure and cloud solutions. HP, the largest IT company in the world
and the market leader in IT training, is committed to helping you stay relevant and keep
pace with the demands of a dynamic, fast-moving industry.
The ExpertOne program takes into account your current certifications and experience,
providing the relevant courses and study materials you need to pass the certification
exams. As an ExpertOne certified member, your skills, knowledge, and real-world
experience are recognized and valued in the marketplace. To continue your
professional and career growth, you have access to a large ExpertOne community of
IT professionals and decision-makers, including the world's largest community of
cloud experts. Share ideas, best practices, business insights, and challenges as you
gain professional connections globally.


To learn more about HP ExpertOne certifications, including storage, servers, networking, converged infrastructure, cloud, and more, please visit hp.com/go/ExpertOne.

Audience
This study guide is designed for networking professionals who want to demonstrate
their expertise in implementing HP FlexNetwork solutions by passing the HP2-Z34
certification exam. It is specifically targeted at networking professionals who want to
extend their knowledge of how to design and implement HP FlexFabric solutions for
the data center.

Assumed Knowledge
To understand the technologies and protocols covered in this study guide, networking
professionals should have on-the-job experience. The associated training course,
which includes numerous hands-on lab activities, provides a good foundation for the
exam, but learners are also expected to have real-world experience.

Relevant Certifications
After you pass these exams, your achievement may be applicable toward more than
one certification. To determine which certifications can be credited with this
achievement, log in to The Learning Center and view the certifications listed on the
exam's More Details tab. You might be on your way to achieving additional HP
certifications.

Preparing for Exam HP2-Z34


This self-study guide does not guarantee that you will have all the knowledge you
need to pass the exam. It is expected that you will also draw on real-world
experience and would benefit from completing the hands-on lab activities provided
in the instructor-led training.

Recommended HP Training
Recommended training to prepare for each exam is accessible from the exam's page
in The Learning Center. See the exam attachment, "Supporting courses," to view and
register for the courses.


Obtain Hands-on Experience


You are not required to take the recommended, supported courses, and completion of
training does not guarantee that you will pass the exams. HP strongly recommends a
combination of training, thorough review of courseware and additional study
references, and sufficient on-the-job experience prior to taking an exam.

Exam Registration
To register for an exam, go to hp.com/certification/learn_more_about_exams.html.


1 Datacenter Products and Technologies Overview
EXAM OBJECTIVES
In this chapter, you learn to:
Understand the components of the HP FlexFabric network architecture.
Describe common datacenter networking requirements.
Position the HP FlexFabric products.
Describe the HP IMC VAN Modules.

INTRODUCTION
This chapter introduces HP's FlexFabric portfolio, and describes how these products
can be used to deploy simple, scalable, automated data center networking solutions.
Specific data center technologies are also introduced. These include multi-tenant
solutions such as MDC, MCE, and SPBM, along with Hypervisor integration
protocols like PBB and VEPA. Other connectivity solutions include MPLS L2VPN,
VPLS, EVI, SPBM, and TRILL.

ASSUMED KNOWLEDGE
Because this course introduces the HP FlexFabric portfolio and datacenter
technologies, learners are not expected to have prior knowledge about this topic. It is
helpful, however, to be familiar with the requirements and growing trends of modern
datacenters.

HP FlexFabric Overview

This chapter provides an overview of the components that are involved in the
FlexFabric network architecture. It describes common data center networking
requirements, positions HP FlexFabric products, and describes the HP data center
technologies.

The World is Moving to a New Style of IT


Many IT functions and systems are continuing to change at a relatively brisk pace. As
shown in Figure 1-1, new paradigms arise, such as cloud computing and networking,
big data, BYOD and new security mechanisms, to name a few. With these new
paradigms come new challenges and new requirements, influencing how we build
networks going forward.

Figure 1-1: The World is Moving to a New Style of IT

Cloud: We must understand how to build an agile, flexible and secure network
edge, especially with regards to multi-tenancy.
Security: We have to rebuild the perimeter of the network wherever a device
connects without degrading the quality of business experience.
Big Data: We have to enable the network to respond dynamically to real-time data
analytics and to deal with the volume of traffic involved.
Mobility: We need to simplify the policy model in the campus by unifying wired
and wireless networks. In the data center, we need to increase the agility and
performance of mobile VMs.


A converged infrastructure can meet these needs by providing several key features,
including:
A resilient fabric for less downtime and faster VM mobility
Network virtualization for faster data center provisioning
Software Defined Networking (SDN) to simplify deployment and security,
creating business agility and aligning the network to business priorities.

Apps Are Changing - Networks Must Change


Applications are changing, and the network's infrastructure must be capable of
handling these new application requirements. One significant trend is a massive
increase in virtualization. Almost any service will be offered as a virtualized
service, hosted inside a data center. These virtualized services can be in private
clouds, a customers local data center, or public clouds. They might even be offered
as a type of hybrid cloud service, which is a mix of private and public clouds.
Inside the data center, the bulk of data traffic is now server-to-server. This is mainly
due to the change in application behavior, since (as shown in Figure 1-2) we see
much more use of federated applications as opposed to monolithic application
models of the past.

Figure 1-2: Apps Are Changing - Networks Must Change


Previously, companies may have used a single email server that provided multiple
functions. In today's environment, companies may instead leverage a front-end
server, a business logic server, and a back-end database system. In such a
deployment, each client request towards the data center is handled by multiple
services inside the data center. This results in similar client-server interactions as in
the past, but with increased server-to-server traffic to fulfill those client requests.
Also, many storage services and protocols are now being supported by a converged
network that handles both traditional client-server traffic and disk storage-related traffic.

Multi-tier Legacy Architecture in the Data Center (DC)
Federated applications and virtualization have changed the way traffic flows through
the infrastructure. As packets must be passed between more and more servers,
increased latency can impact performance and end-user productivity. Networks must
be designed to mitigate these risks while ensuring a stable, loop-free environment
(see Figure 1-3). Network loops in a large data center environment can have serious
impacts on the business, so the ability to maintain loop-free paths is of particular
importance.

Figure 1-3: Multi-tier Legacy Architecture in the Data Center (DC)

HPN FlexFabric Value Proposition


HP's FlexFabric approach focuses on three customer benefits: the network should be
simple, scalable, and automated.

Simple: reducing operational complexity by up to 75%
  Unified virtual/physical and LAN/SAN fabrics
  Consistent OS and feature set, with no licensing complexity or cost
Scalable: double the fabric scaling, with up to 500% improved service delivery
  Non-blocking, reliable fabric for 100-10,000 hosts
  Spine-and-leaf fabric optimized for cloud and SDN
Automated: cutting network provisioning time from months to minutes
  300% faster time to service delivery with a Software-Defined Network fabric
  Open, standards-based programmability, SDN App Store and SDK

HP FlexFabric Product Overview


This product overview section begins with a discussion of core and aggregation
switches. This is followed by an overview of access switches and the IMC network
management system.

HP FlexFabric Core Switches


Figure 1-4 introduces the current portfolio of HP FlexFabric core switches. This
includes the HP FlexFabric 12900, 12500, 11900 and the 7904 Switch Series.

Figure 1-4: HP FlexFabric Core Switches


HP FlexFabric 12900 Switch Series


The HP FlexFabric 12900 Switch Series, shown in Figure 1-5, is an exceedingly
capable core data center switch. The switch includes support for OpenFlow 1.3,
laying a foundation for SDN and investment protection.

Figure 1-5: HP FlexFabric 12900 Switch Series

It provides 36 Tbps of throughput in a non-blocking fabric, and supports up to 768
10Gbps ports and up to 256 40Gbps ports. The 12900 series supports Fibre Channel
over Ethernet (FCoE) and Data Center Bridging (DCB).
The switch allows for In Service Software Upgrades (ISSU) to minimize downtime.
Additionally, protocols like TRILL and SPB can be used to provide scalable
connectivity between data center sites. All of these functions can be used in
conjunction with IRF to offer a redundant, flexible platform.

HP 12500E Switch Series


The HP 12500E Switch Series, shown in Figure 1-6, allows for up to 24Tbps
switching capacity. It is available in 8-slot and 18-slot chassis. It supports very large
Layer 2 and Layer 3 address and routing tables, and data buffers. It allows for up to
four units in an IRF system.


Figure 1-6: HP 12500E Switch Series

The HP 12500 Switch Series has been updated, and now supports high-density 10
gigabit, 40 gigabit, or 100 gigabit Ethernet modules, up to 400 gigabits per slot. It can
support traditional Layer 2 and Layer 3 functions for IPv4 and IPv6. These devices also
feature support for more modern protocols, such as MPLS, VPLS, MDC, EVI, and
more.
Wire-speed services provide a high-performance backbone, while the energy-efficient design lowers operational costs.

HP 12500 Switch Series Overview


Figure 1-7 compares the features and capabilities of the 12500C and 12500E
platforms. The 12500C is based on Comware5 while the 12500E is based on
Comware7. The use of Comware7 results in enhanced MPU performance.


Figure 1-7: HP 12500 Switch Series Overview

HP FlexFabric 11908 Switch Series


The HP FlexFabric 11900 Switch Series, shown in Figure 1-8, supports up to
7.7Tbps of throughput in a non-blocking fabric. This switch can be a good choice as a
data center aggregation switch.

Figure 1-8: HP FlexFabric 11908 Switch Series

HP FlexFabric 7900 Switch Series



The HP FlexFabric 7900 Switch Series, shown in Figure 1-9, is the next-generation
compact modular data center core switch. It is based on the same architecture and
Comware7 code as the larger chassis-based switches.

Figure 1-9: HP FlexFabric 7900 Switch Series

The feature set includes full support for IRF, TRILL, DCB, EVI, MDC, OpenFlow
and VXLAN.

HP FlexFabric Access Switches


The HP 5900 Switch Series, shown in Figure 1-10, can serve as traditional top-of-rack access switches.

Figure 1-10: HP FlexFabric Access Switches


The HP 5900AF Switch Series is available in various models, including 48-port 1Gbps
versions and 48-port 10Gbps versions, each with 4 x 40Gbps uplink ports. It
is also available with 48 1/10Gbps ports and 4 x 40Gbps uplink connections. The
1/10Gbps port version is especially convenient for data centers that are migrating
servers from 1Gbps to 10Gbps interfaces.
The 5930 is a Top-of-Rack (ToR) switch with 32 40Gbps ports. This switch
could be used to terminate 40 gigabit connections from blade server enclosures, or it
could be deployed as a distribution or aggregation layer device to concentrate a set
of HP 5900 Switch Series switches. Each of the 40Gbps ports can be split out as four
10Gbps ports with a special cable. This means that the 32 40Gbps ports could become
128 10Gbps ports, available in a 1U device.
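As a rough sketch of how such a split is typically applied on a Comware 7 switch like the 5930, the breakout is configured from the 40GbE interface view; the switch name and interface number below are illustrative assumptions, and the exact command and any required reboot should be verified against the release notes for your platform and software version:

<HP5930> system-view
[HP5930] interface FortyGigE 1/0/1
[HP5930-FortyGigE1/0/1] using tengige

After this change, the 40GbE port is presented as four 10GbE breakout interfaces; a corresponding command in a breakout interface view restores 40GbE operation.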
The "CP" in the 5900CP model stands for "Converged Ports." As the name implies,
both Fibre Channel over Ethernet (FCoE) and native Fibre Channel (FC) are
supported in a single, converged ToR access switch. All of the 5900 Switch Series
shown here support FCoE, but only the 5900CP also supports native FC connectivity.
The module installed in each port determines whether that port functions as a 10Gbps
FCoE port, or as an 8Gbps FC port. The 5900CP supports FCoE-to-FC gateway
functionality.
The HP FlexFabric 5900v is a virtual switch that can be installed as a replacement
for the VMware switch on a hypervisor. The 5900v is based on the VEPA protocol.
This means that the 5900v does not switch inter-VM traffic locally; inter-VM traffic
is sent to an external ToR switch to be serviced. This is why the 5900v
must be deployed in combination with a physical switch that also supports the
VEPA protocol. All Comware7-based 5900-series switches support VEPA.
HP blade enclosures can have interconnect modules installed. These interconnects must
match the physical form factor of the blade enclosure. The HP 6125 XLG can provide
this blade server interconnectivity.
This switch belongs to the HP 5900 Switch Series family of switches, as it provides
10Gbps access ports for blade servers, along with 4 x 40Gbps uplink ports. As a
Comware7-based product, the 6125 XLG can be configured with the same protocols
and features as traditional HP 5900 Switch Series. For example, features like FCoE
and IRF are supported. This means that multiple 6125 XLG switches in the same
blade enclosure can be grouped together as a single virtual IRF system. It also
supports VEPA, and so can work with the 5900v switch running on a Hypervisor.

HP FlexFabric 5930 Switch Series



The HP FlexFabric 5930 Switch Series, shown in Figure 1-11, is built on the latest
generation of ASICs, and so includes hardware support for VXLAN and NVGRE.
VXLAN is an overlay virtualization technology which is largely promoted by
VMware. NVGRE is an overlay technology which is largely promoted by
Microsoft and used in its Hyper-V product.

Figure 1-11: HP FlexFabric 5930 Switch Series

Since the HP FlexFabric 5930 Switch Series has hardware support for both
technologies, both overlay environments can be interconnected with traditional
VLANs, with support for OpenFlow and SDN. With 32 40Gbps ports, it is suitable as a
component in large-scale spine or leaf networks that can leverage IRF and TRILL.

HP FlexFabric 5900CP Converged Switch


The HP FlexFabric 5900CP supports 48 x 10Gbps converged ports. As shown in
Figure 1-12, support for 4/8Gbps FC or 1/10Gbps Ethernet is available on all ports.
It supports HP's universal converged optic transceivers. The hardware optics in each
port determine whether that port will function as a native FC port or as an Ethernet
port. The converged optics interface is a single device that can be configured to
operate as either of the two. This means that the network administrator can easily
change the operational mode of the physical interface via CLI configuration. This
eliminates the need to unplug and reconnect transceivers for this purpose.
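As a minimal illustration of this CLI-driven mode change (the switch name and interface number are hypothetical, and the exact syntax should be checked against the 5900CP configuration guide for your software release), an Ethernet converged port can be switched to native FC operation from its interface view:

<HP5900CP> system-view
[HP5900CP] interface Ten-GigabitEthernet 1/0/10
[HP5900CP-Ten-GigabitEthernet1/0/10] port-type fc

After the change, the port is managed as an FC interface; a corresponding command returns it to Ethernet operation.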


Figure 1-12: HP FlexFabric 5900CP Converged Switch

FlexFabric 5700 Datacenter ToR Switch


The HP FlexFabric 5700 Top-of-Rack switch is available in various combinations of
1Gbps and 10Gbps port configurations with 10Gbps or 40Gbps uplinks, as shown in
Figure 1-13. This relatively new addition to the FlexFabric family offers L2 and L3
lite support, and IRF support for up to nine switches to simplify management operations.

Figure 1-13: FlexFabric 5700 Datacenter ToR Switch

The 5700 switch series delivers 960Gbps switching capacity and is SDN-ready.

HP HSR6800 Router Series



The HP HSR6800 Router Series, shown in Figure 1-14, provides comprehensive
routing, firewall, and VPN functions. It uses a 2Tbps backplane to support 420Mpps of
routing throughput. This is a high-density WAN router that can support up to 31
10Gbps Ethernet ports and is 40/100Gbps ready.

Figure 1-14: HP HSR6800 Router Series

Two of these carrier-class devices can be grouped into an IRF team to operate as a
single, logical router entity. This eases configuration and change management, and
eliminates the need for other redundancy protocols like VRRP.

Virtual Services Router


The Virtual Services Router (VSR) can be seen as a network function virtualization
(NFV) technology. It is very easy to deploy the VSR on any branch, data center, or
cloud infrastructure (see Figure 1-15 for more information). It is based on Comware7
and can be installed on a hypervisor, such as VMware ESXi or Linux KVM.


Figure 1-15: Virtual Services Router

The VSR makes it very easy and convenient to support a multi-tenant data center.
New router instances can be quickly deployed inside the hosted environment to
provide routed functionality for a specific customer solution. VSR comes in multiple
versions, with various licensing options to provide more advanced capabilities.

IMC VAN Fabric Manager


Basic data center management of devices is handled by IMC. The VAN Fabric
Manager (VFM) is a software module that can be added to IMC. This module adds
advanced traffic management capabilities for many data center protocols, such as
SPB, TRILL, and IRF. Storage protocols such as DCB and FCoE are also supported
(see Figure 1-16).


Figure 1-16: IMC VAN Fabric Manager

It also manages data center interconnect protocols such as EVI, and provides
zoning services for converged storage management.
You can easily view and manage information about VM migrations. VM migration
records include the VM name, source and destination server, start and end times for
the migration, and the name of the EVI service to which the VM belongs. You can also
perform a migration replay, which allows you to play back the migration process and
view the source, destination, and route of a migration in a video.

HP FlexFabric Cloud: Virtualized DC Use Case


Figure 1-17 shows an example of an HP FlexFabric deployment. At the access layer,
5900vs are deployed inside a blade server hypervisor environment, in conjunction
with 5900-series switches with VEPA support.


Figure 1-17: HP FlexFabric Cloud: Virtualized DC Use Case

With a deployment of HP blade systems, the 6125 XLGs can be used for
interconnectivity.
In this scenario the access layer is directly connected to the core, which could be
comprised of 12900 or 11900-series devices. Connectivity to remote locations can
be provided by the HSR 6800 router, and the entire system can be managed from a
single pane of glass with HP's IMC. Additional insight and management for data
center specific technologies can be provided by the addition of the VFM module for
IMC.

Data Center Technologies Overview


The data center may provide support for multiple tenants. Multiple infrastructures
may co-exist in an independent way.
The data center should also have support for Ethernet fabric technologies to provide
interconnect between all the switches, as well as converged FC/FCoE support. This
fabric should integrate with Hypervisor environments.
Also, data center interconnect and network overlay technologies are used to
connect several multi-tenant data centers together in a scalable, seamless way.

Overview of DC Technologies
Figure 1-18 provides an overview of data center technologies and generalizes where
these technologies are deployed.


Figure 1-18: Overview of DC Technologies

Multi-tenant support is provided by technologies such as MDC, MCE and SPBM.


Hypervisor integration is provided by PBB and VEPA protocols, along with the
5900v switch product.
Overlay networking solutions are provided by VXLAN and SDN.
Data center interconnect technologies include MPLS L2VPN, VPLS, EVI, and
SPBM.
OpenFlow technology can be used to understand, define, and control network
behavior.
Large-scale Layer 2 Ethernet fabrics can be deployed using traditional link
aggregation along with TRILL or SPBM.
IRF or Enhanced IRF can be used to improve manageability and redundancy in the
Ethernet fabric.
Storage and Ethernet technologies can be converged with switches that support
DCB, FCoE, and native FC.

Multi-tenant Support
Multi-tenancy support involves the ability to support multiple business units,
customers, and services over a common infrastructure. This data center infrastructure
must provide techniques to isolate multiple customers from each other.


Multi-tenant Isolation
Several isolation techniques are available, in two general categories. Physical
isolation is one solution. However, this solution is less scalable due to the cost of
purchasing separate hardware for each client, as well as the space, power, and
cooling concerns. With logical isolation, isolated services and customers share a
common hardware infrastructure. This reduces initial capital expenditures and
improves return on investment.

Multi-tenant Isolation with MDC and MCE


One isolation technique is Multi-tenant Device Context (MDC). This technology
creates a virtual device inside a physical device. This ensures customer isolation at
the hardware layer, since ASICs or line cards are dedicated to each customer.
Since each MDC has its own configuration file, with separate administrative logins,
isolation at the management layer is also achieved. There is also isolation of control
planes, since each MDC has its own path selection protocol, such as TRILL, SPB,
OSPF, or STP. Isolation at the data plane is achieved through separate routing tables
and Layer 2 MAC address tables.
Another technology that provides Layer 3 routing isolation is Multi-Customer Edge
(MCE). This is also known in the market as Virtual Routing and Forwarding
(VRF). With VRF, separate virtual routing instances can be defined in a single
physical router.
This technology maintains separate routing functionality and routing tables for each
customer. However, the platform's hardware limitations still apply. For example, ten
MCEs might be configured on a device that has a hardware limit of 128,000 IPv4
routes. In this scenario, all ten customer MCE routing tables must share that 128,000-
entry maximum.
Unlike MDC, which allows for different management planes per customer, MCE
features a single management plane for all customers. In other words, a single
administrator configures and manages all customer MCE instances.
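The following minimal sketch shows the general shape of an MCE (VRF) configuration on a Comware 7 device; the instance name, route distinguisher, VLAN interface, and IP address are illustrative assumptions rather than values from this course:

<Switch> system-view
[Switch] ip vpn-instance CustomerA
[Switch-vpn-instance-CustomerA] route-distinguisher 100:1
[Switch-vpn-instance-CustomerA] quit
[Switch] interface Vlan-interface 10
[Switch-Vlan-interface10] ip binding vpn-instance CustomerA
[Switch-Vlan-interface10] ip address 10.1.10.1 24

Binding the customer-facing interface to the VPN instance places its routes in CustomerA's separate routing table, so overlapping addresses in other customers' instances do not conflict.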

Multi-tenant Isolation for Layer 2


VLANs are the traditional method used to isolate Layer 2 networks, and this remains
a prominent technology in data centers. However, the 4094 VLAN maximum can be a
limiting factor for large, multi-tenant deployments. Another difficulty is preventing
each client from using the same set of VLANs.


QinQ technology alleviates some of these concerns. Each customer has their own set
of 4094 VLANs, using a typical 802.1Q tag. An outer 802.1Q tag is added, which is
unique to each client. The data center uses this unique outer tag to move frames
between customer devices. Before the frame is handed off to the client, the outer tag
is removed.
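As a minimal sketch, basic QinQ is typically enabled on the customer-facing port of the provider switch; the switch name, VLAN number, and interface below are illustrative only:

<PE-Switch> system-view
[PE-Switch] vlan 100
[PE-Switch-vlan100] quit
[PE-Switch] interface Ten-GigabitEthernet 1/0/1
[PE-Switch-Ten-GigabitEthernet1/0/1] port access vlan 100
[PE-Switch-Ten-GigabitEthernet1/0/1] qinq enable

With this configuration, frames arriving from the customer keep their inner 802.1Q tags and receive outer tag 100 as they enter the provider network; the outer tag is stripped when frames are handed back to the customer.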
A limitation of this technique involves the MAC address table. All customer VLANs
traverse the provider network with a common outer 802.1q tag. Therefore, all client
VLANs share the same MAC address table. It is possible for this to increase the odds
of MAC address collisions, where multiple devices use the same address.
Another option is Shortest Path Bridging using MAC in MAC mode (SPBM). SPBM
can also isolate customers, similar to QinQ. Unlike QinQ, SPBM creates a new
encapsulation, with the original customer frame as the payload of the new frame. This
new outer frame includes a unique customer service identifier, providing a highly
scalable solution.
SPBM supports up to 16 million service identifiers. Each of the 16 million customers
can have their own set of 4094 VLANs. A common outer VLAN identifier tag can be
used for all client VLANs, like with QinQ. Alternatively, different customer VLANs
can use different identifiers. Compared to QinQ, SPBM provides increased
scalability while limiting the issue of MAC address collision.
Virtual eXtensible LAN (VXLAN) is another technology that provides a virtualized
VLAN for Hypervisor environments. A Virtual Machine (VM) can be assigned to a
VXLAN, and use it to communicate with other VMs in the same VXLAN.
This technology requires some integration with traditional VLANs via a hardware
gateway device. This functionality can be provided by the HP Comware 5930 switch.
VXLAN supports up to 16 million VXLAN IDs, so it is quite scalable.
VXLAN provides a single VXLAN ID space. While SPBM could be used to
encapsulate 4094 traditional VLANs into a single customer service identifier, with
VXLAN, a customer with 100 VLANs would use 100 VXLAN IDs. For this reason,
some planning is required to ensure that each client uses a unique range of VXLAN
IDs.

Network Overlay Functions


Network overlay functions provide a virtual network for a specific, typically VM-based, service.
Software Defined Networking (SDN) can be considered a network overlay function,
since it can centralize the control of traffic flows between devices, virtual or
otherwise.
VXLAN is an SDN technology that can provide overlay networks for VMs. Each VM
can be assigned to a unique VXLAN ID, as opposed to a physical, traditional VLAN
ID. HP is developing solutions to integrate SDN and VXLAN solutions. This will
enable inter-connectivity between VXLAN-assigned virtual services and physical
hosts.

SDN: Powering Your Network Today and Tomorrow
SDN can be used to control the network behavior inside the data center. As shown in
Figure 1-19, the SDN architecture consists of the infrastructure, control, and
application layers.

Figure 1-19: SDN: Powering Your Network Today and Tomorrow

The infrastructure layer consists of overlay technologies such as VXLAN or NVGRE,
or it can consist of devices that support OpenFlow.
The control layer is delivered by the HP Virtual Application Network (VAN)
SDN controller. This controller will be able to interact with VXLAN and OpenFlow-enabled devices. It will have the ability to be directly configured, or to be controlled


by an external application, such as automation, cloud management, or security tools.


The HP SDN app store will provide centralized availability for SDN-capable
applications. Load-balancing will also be provided.

Data Center Ethernet Fabric Technologies


This section will focus on Ethernet fabric technologies for the data center. An
Ethernet fabric should provide a high speed Layer 2 interconnect with efficient path
selection. It should also provide scalability to enable ample bandwidth and link
utilization.

Data Center Ethernet Fabric Technologies 2


IRF combines two or more devices into a single, logical device. IRF systems can be
deployed at each layer in the data center. For example, they are often deployed at the
core layer of a data center, and could also be used to aggregate access layer
switches. Servers could also be connected to IRF systems at the access layer.
These layers can be interconnected by traditional multi-chassis link aggregations,
which provide an active-active redundancy solution. Each IRF system is managed as
an independent entity. If a customer has 200 physical access switches, they could be
grouped into 100 IRFs, each IRF system containing two physical switches. If a new
VLAN must be defined, it must be defined on each of the 100 IRF systems.
Enhanced IRF (EIRF) is the next generation of IRF technology, allowing for the
grouping of up to 100 or more devices into a single logical device. Enhanced IRF can
combine multiple layers into a single logical system. For instance, several
aggregation and access layer switches can be combined into a single logical device.
Like traditional IRF, this provides a relatively easy active-active deployment model.
However, with Enhanced IRF a large set of physical devices will be perceived as a
single, very large switch with many line cards. If 100 physical switches were
combined into a single EIRF system, they are all managed as a single entity. If a new
VLAN must be defined, it only needs to be defined one time, as opposed to multiple
times with traditional IRF. Also, EIRF eliminates the need to configure multi-chassis
link aggregations as inter-switch links.
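As a simple illustration of the two-member IRF building block described above, the sketch below shows the IRF-specific part of the configuration on one member of a Comware 7 pair; the member IDs, priority, and port numbers are illustrative, the physical IRF links usually must be shut down before being bound to the IRF port, and the second switch needs a mirrored configuration:

<Switch1> system-view
[Switch1] irf member 1 priority 32
[Switch1] irf-port 1/1
[Switch1-irf-port1/1] port group interface FortyGigE 1/0/49
[Switch1-irf-port1/1] quit
[Switch1] irf-port-configuration active

Once both members are configured and connected, they merge into a single logical switch, and neighboring devices can connect to both members with a single multi-chassis link aggregation.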

Data Center Ethernet Fabric Technologies 3


IRF and EIRF offer a compelling, HP Comware-based solution for building an
Ethernet fabric. TRILL and SPBM offer other, standards-based technologies for data


center connectivity. HP Comware IRF or EIRF technology can provide switch and
link redundancy while connecting to a standards-based TRILL or SPBM fabric.
TRILL ensures that the shortest path for Layer 2 traffic is selected, while allowing
maximum, simultaneous utilization of all available links. For example, two server
access switches could connect to multiple aggregation switches and also be directly
connected to each other. Traffic flow between servers on the two switches can utilize
the direct connection between the two switches, while other traffic uses the access-to-aggregation switch links. This is an advantage over traditional STP-based path
selection, which would require one of the links (likely the access-access connection)
to be disabled for loop prevention.
TRILL can also take advantage of this active-active, multi-path connectivity for cases
when switches have, say, four uplinks between them. The traffic will be load-balanced
over all equal-cost links. This load balancing can be based on
source/destination MAC address pairs, or source/destination IP addressing.
A limitation of TRILL is the fact that it supports a single VLAN space only. While
TRILL provides for very efficient traffic delivery, it remains limited by the 4094
VLAN maximum.
SPBM is similar to TRILL in its ability to leverage routing-like functionality for
efficient Layer 2 path selection. Compared to TRILL, SPBM offers a more
deterministic method of providing load-sharing over multiple equal-cost paths. This
allows the administrator to engineer specific paths for specific customer traffic.
SPBM also offers the potential for greater scalability than TRILL. This is because
SPBM supports multiple VLAN spaces, since each customer's traffic is uniquely
tagged with a service identifier in the SPBM header.
RFC 7172 is a relatively recent standard that allows the use of a 24-bit identifier,
as opposed to the current 12-bit VLAN ID. This will allow greater scalability for
multiple tenants. This feature is not currently supported on HP Comware switches.

Server Access Layer Hypervisor Networking


Hypervisor networking is supported at the access layer of a data center deployment,
in the form of VEPA and EVB. These technologies enable integration between virtual
and physical environments.
For example, the HP FlexFabric 5900v provides a replacement option for
the hypervisor's own built-in software vSwitch. The 5900v sends inter-VM traffic to


an external, physical switch for processing. This external switch must support VEPA
technology to be used for this purpose.
Typically, most inter-VM traffic is handled by a physical switch anyway, since there
are typically multiple ESX hosts. Traffic between VMs hosted by different ESX
platforms is handled by an external physical switch. Only inter-VM traffic on the
same ESX host is handled by that host's internal vSwitch. The VEPA/EVB model
ensures a more consistent traffic flow, since all inter-VM traffic passes through an
external switch.
This results in greater visibility and insight into inter-VM traffic flow. Traditional
network analysis tools and port mirroring tools are thus capable of detailed traffic
inspection and analysis.

Server Access Layer Converged Storage


Storage convergence means that a single infrastructure has support for native
Fibre Channel (FC), Fibre Channel over Ethernet (FCoE), and iSCSI.
With Fibre Channel technology, a physical Host Bus Adapter (HBA) is installed in
each server to provide access to storage devices. To ensure lossless delivery of
storage frames, FC uses a buffer-to-buffer credit system for flow control. A separate
Ethernet interface is installed in the server to perform traditional Ethernet data
communications.
FCoE is a technology that provides traditional FC and 10Gbps Ethernet support over
a single Converged Network Adapter (CNA). The server's application layer
continues to perceive a separate adapter for each of these functions. Therefore, the
CNA must accept traditional FC frames, encapsulate them in Ethernet, and send them
over the converged network fabric. A suite of Data Center Bridging (DCB) protocols
enhance the Ethernet standard. This ensures the lossless frame delivery that is
required by FC.
iSCSI encapsulates traditional SCSI protocol communications inside a TCP/IP
packet, which is then encapsulated in an Ethernet frame. The iSCSI protocol does not
require that Ethernet be enhanced by DCB or any other special protocol suite.
Instead, capabilities inherent to the TCP/IP protocol stack will mitigate packet loss
issues.
However, enterprise-class iSCSI deployments should have robust QoS capabilities
and hardware switches with enhanced buffer capabilities. This will help to ensure
that iSCSI frame delivery is reliable, with minimal retransmissions.


Although DCB was originally developed to ensure lossless delivery for FCoE, it can
also be used for iSCSI deployments. This minimizes frame drop and retransmission
issues.

Server Access Layer FC/FCoE


The 5900CP provides native FC fabric services. Since it provides both FCoE and
native FC connections, it can act as a gateway between native FC and FCoE
environments.
In addition to this FC-FCoE gateway service, other deployment scenarios are
supported by the HP 5900CP. It can be used to interconnect a collection of traditional
FC storage and server devices, or to connect a collection of FCoE-based systems.
Multiple Fibre Channel device roles are supported. The 5900CP can fill the FCF
role to support full fabric services. It can also act as an NPV node to support
endpoint ID virtualization.

Data Center Interconnect Technologies


Data center interconnect technologies allow customer services to be interconnected
across multiple data center sites. Two data center locations could be deployed, or
multiple data centers could be spread over multiple locations for additional
scalability and redundancy.
These technologies typically require options for path redundancy and scalable Layer
2 connectivity between the data centers. This ensures that all customer requirements
can be met, such as the ability to move VMs to different physical hosts via
technologies such as VMware's vMotion.

Data Center Interconnect Technologies 2


Data centers can be connected using a traditional Layer 2 connection. This could
be dark fiber connectivity between two sites, or some other connectivity available
from a service provider. Once these physical connections are established, traditional
VLAN trunk links and link aggregation can be configured to connect core devices at
each site.
MPLS L2VPN is typically offered and deployed by a service provider, although
some larger enterprises may operate their own internal MPLS infrastructure. Either
way, L2VPN tunnels can be established to connect sites over the MPLS fabric.


In this way, MPLS L2VPN provides a kind of pseudowire between sites. It is
important to note that this connection lacks the intelligence to perform MAC learning or
other Layer 2 services. It is simply a dumb connection between sites.

Data Center Interconnect Technologies 3


MPLS Virtual Private LAN Service (VPLS) is another option that is typically
deployed by a service provider. Some enterprises may have their own MPLS
infrastructure, over which they may wish to deploy a VPLS solution. Unlike MPLS
L2VPN, VPLS has the intelligence to perform traditional Layer 2 functions, such as
MAC learning for each connected site. Therefore, when a device at one location
sends a unicast frame into the fabric, it can be efficiently forwarded to the correct
site. This is more efficient than having to flood the frame to all sites.
Ethernet Virtual Interconnect (EVI) is an HP proprietary technology to interconnect data
centers with Layer 2 functionality. This technology enables the transport of L2 VPN
and VPLS without the need for an underlying MPLS infrastructure. Any typical IP routed
connection between the data centers can be used to interconnect up to eight remote
sites.
The advantage of EVI is that it is very easy to configure as compared to MPLS.
MPLS requires expertise with several technologies, including IP backbone
technologies, label switching, and routing. EVI also makes it easy to optimize the
Ethernet flooding behavior.

Summary
In this chapter, you learned that HP's FlexFabric provides a simple, scalable,
automated approach to data center networking solutions.
You also learned that HP's FlexFabric product portfolio includes core switches like
the 12900, 12500, 11900, and 7904. It also includes 5900AF, 5930, 5900CP, 5900v,
and 6125XLG access switches. For routing, the HSR 6800 and VSR are available.
Improved visibility and management functions for TRILL/SPB and FCoE/FC are
available with the IMC VAN fabric manager product.
You also learned that:
Technologies that support multi-tenant solutions include MDC, MCE, and SPBM.
Hypervisor integration is provided by PBB and VEPA.
Overlay solutions include VXLAN and SDN, while data center interconnect
technologies include MPLS L2VPN, VPLS, EVI, and SPBM.


Large-scale Layer 2 fabrics can be deployed using TRILL or SPBM, with IRF and
EIRF providing improved manageability and redundancy.
The HP data center portfolio can create converged network support with DCB,
FCoE, and native FC.

Learning Check
Answer each of the questions below.
1. HP's FlexFabric includes the following components (choose all that apply)?
a. Core switches
b. Aggregation switches
c. MSM 4x0-series access points.
d. Access switches.
e. The 5900CP converged switch.
f. Both physical and virtual services routers
g. HPs IMC management platform
2. The IMC VAN fabric manager provides which three capabilities (choose three)?
a. Unified SPB, TRILL, and IRF fabric management
b. VPN connectivity and performance management.
c. VXLAN system management
d. Unified DCB, FCoE, and FC SAN management.
e. EVI protocol management for data center interconnects.
f. Switch and router ACL configuration management.
3. Which two statements are true about multi-tenant isolation for Layer 2?
a. VLANs provide a traditional method to isolate Layer 2 networks that is
limited to 4094 VLANs.
b. With QinQ technology, up to 256 customers can each have their own set of
4094 isolated VLANs.
c. DCB is an overlay technology that allows a converged infrastructure
d. Shortest Path Bridging MAC-in-MAC mode can support 16 million isolated
customers through the use of an I-SID.
4. Which technology can extend a Layer 2 VLAN across multiple data centers using a Layer 3 technology?
a. DCB.
b. EIRF.
c. SDN.
d. TRILL.
e. VXLAN.

Learning Check Answers


1. a, b, d, e, f, g
2. a, d, e
3. a, d
4. d


2 Multitenant Device Context

EXAM OBJECTIVES
In this chapter, you learn to:
Describe MDC features.
Explain MDC use cases.
Describe MDC architecture and operation.
Describe support for MDC on various hardware platforms.
Understand firmware updates and ISSU with MDC.
Describe supported IRF configurations with MDC.

INTRODUCTION
Multitenant Device Context (MDC) is a technology that can partition a physical
device or an IRF fabric into multiple logical switches called "MDCs."
Each MDC uses its own hardware and software resources, runs independently of
other MDCs, and provides services for its own customer. Creating, starting,
rebooting, or deleting an MDC does not affect any other MDC. From the user's
perspective, an MDC is a standalone device.

MDC Overview
Multitenant Device Context (MDC) can partition either a single physical device or an
IRF fabric into multiple logical switches called "MDCs."
With MDC, physical networking platforms, such as HP 11900, 12500, and 12900
switches can be virtualized to support multitenant networks. In other words, MDC


provides customers with 1:N device virtualization capability to virtualize one physical switch into multiple logical switches, as shown in Figure 2-1.

Figure 2-1: Feature overview

Other benefits of MDC include:


Complete separation of control planes, data planes and forwarding capabilities.
No additional software license required to enable MDC.
Reduced power, cooling and space requirements within the data center.
Up to 75% reduction of devices and cost when compared to deployments without
1:N device virtualization.
Modification of interface allocations without stopping MDCs.

IRF versus MDC


What is the difference between MDC and technologies like IRF?
The main difference is that in the case of IRF (N:1 virtualization), you are combining
multiple physical devices into one logical device. With MDC, on the other hand (1:N
virtualization), you are splitting either a single device or a logical IRF device into
separate, discrete logical units.


The reason for doing this is to provide network features such as VLANs, routing, IRF
and other features to different entities (customers, development network), but still use
the same hardware. Customers can also be given different feature sets inside the
same logical "big box" device. Each of the MDCs operate as a totally independent
device inside the same physical device (or IRF fabric).
Instead of buying additional core switches for different customers or business units, a
single core switch or IRF fabric can be used to provide the same hardware feature
set to multiple customers or business units.

MDC Features
Each MDC uses its own hardware and software resources, runs independently of
other MDCs, and provides services for its own environment. Creating, starting,
rebooting, or deleting an MDC does not affect the configuration or service of any
other MDC. From the user's perspective, an MDC is a standalone device.
Each MDC is isolated from the other MDCs on the same physical device and cannot
communicate with them via the switch fabric. To allow two MDCs on the same
physical device to communicate with each other, you must physically connect a port
allocated to one MDC to a port allocated to the other MDC using an external cable. It
is not possible to make a connection between MDCs over the backplane of the
switch.
Each MDC has its own management, control, and data planes, each with the same
capacity as the physical device. For example, if the device has a 64-KB space for ARP
entries, each MDC created on the device gets a separate 64-KB space for its own
ARP entries.
Management of MDCs on the same physical device is done via the default MDC
(admin MDC), or via management protocols such as SSH or telnet.
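As a minimal sketch of how a user MDC is created and managed from the default (Admin) MDC on a Comware 7 chassis, the MDC name, slot number, and interface range below are illustrative, and the exact interface allocation rules (for example, group-based allocation per ASIC) depend on the line cards installed:

<Sysname> system-view
[Sysname] mdc CustomerA
[Sysname-mdc-2-CustomerA] location slot 3
[Sysname-mdc-2-CustomerA] allocate interface Ten-GigabitEthernet 3/0/1 to Ten-GigabitEthernet 3/0/16
[Sysname-mdc-2-CustomerA] mdc start
[Sysname-mdc-2-CustomerA] quit
[Sysname] switchto mdc CustomerA

After switching to the user MDC, the administrator works in it as if it were a standalone switch, and returns to the default MDC to create, stop, or delete MDCs.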

MDC Applications
MDC can be used for applications such as the following:
Device renting
Service hosting
Staging of a new network on production equipment
Testing features such as SPB and routing that cannot be configured on a single
device


Student labs
Instead of purchasing new devices, you can configure more MDCs on existing
network devices to expand the network.
As an initial example, in Figure 2-2 a service provider provides access services to
three companies, but only deploys a single physical device (or IRF stack). The
provider configures an MDC for each company on the same hardware device to
logically create three separate devices.

Figure 2-2: MDC application example

The administrators of each of the three companies can log into their allocated MDC
to maintain their own network without affecting any other MDC. The result is the
same as deploying a separate gateway for each company.
Additional use cases will be discussed later in this chapter.

MDC Benefits Overview


MDC Benefits
Higher utilization of existing network resources and fewer hardware upgrade
costs: Instead of purchasing new devices, you can configure more MDCs on existing
network devices to expand the network. For example, when there are more user
groups, you can configure more MDCs and assign them to the user groups. When
there are more users in a group, you can assign more interfaces and other resources to
the group.


Lower management and maintenance cost: Management and maintenance of multiple MDCs occur on a single physical device.
Independence and high security: Each MDC operates like a standalone physical
device. It is isolated from other MDCs on the same physical device and cannot
directly communicate with them. To allow two MDCs on the same physical device to
communicate, you must physically connect a cable from a port allocated to one MDC
to another port allocated to the other MDC.

MDC Features
An MDC can be considered a standalone device. Creating, running, rebooting, or
deleting an MDC does not affect the configuration or service of any other MDC. This
is because of Comware v7's container-based, OS-level virtualization technology, as
shown in Figure 2-3.

Figure 2-3: Feature overview

Each MDC is a new logical device defined on the existing physical device. The
physical device could either be a single switch or an IRF fabric.
A traditional switching device has its own control, management and data planes.
When you define a new MDC, the same features and restrictions of the physical
device will apply to the new MDC, and the new MDC will have separate control
and management planes. Each MDC has a separate Telnet server process, separate
SNMP process, separate LACP process, separate OSPF process, and so on.
In addition, each MDC will also have an isolated data plane. This means that the
VLANs defined in one MDC are totally independent of the VLANs defined in a
different MDC. As an example, MDC1 can have VLANs 10, 20 and 30 configured.
MDC2 can also have VLANs 10, 20 and 30 configured, but there is no communication
between VLAN 10 on MDC1 and VLAN 10 on MDC2.
Each MDC also has its own hardware limits. This is because resources are assigned
to MDCs down to the ASIC level.
A switch configured without multiple MDCs has a limit of 4094 VLANs in the
overall chassis. However, once a new MDC is created, ASICs and line cards within
the physical device are assigned to the new MDC and can be programmed by the new
management and control plane. Each MDC is a new logical device inside the
physical device and has a separate limit of 4094 VLANs. Other features such as the
number of VRFs supported are also set per MDC, and what is configured in one MDC
does not affect other MDCs' limits.
In other words, if you have 4 MDCs on a chassis, the total chassis will support 4
times the hardware and software limits of the same chassis with a single MDC or a
traditional chassis. As an example, rather than supporting only 4094 VLANs, 4 x
4094 VLANs are supported with a total of 16,376 VLANs supported (4094 per MDC
and running 4 MDCs).
MDCs share and compete for CPU resources. If an MDC needs a lot of CPU
resources while the other MDCs are relatively idle, the MDC can access more CPU
resources. If all MDCs need a lot of CPU resources, the physical device assigns CPU
resources to MDCs according to their CPU weights.
Use the limit-resource cpu weight command to assign CPU weights to user MDCs.
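For example, a development MDC could be given a smaller CPU share than the production MDC when the system is busy; the MDC name and weight value below are illustrative only:

<Sysname> system-view
[Sysname] mdc DevMDC
[Sysname-mdc-3-DevMDC] limit-resource cpu weight 5

The weight only matters under contention: an MDC with a lower weight receives a smaller share of CPU time when all MDCs are demanding CPU at the same time, but it can still use idle CPU cycles when the other MDCs are quiet.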

Supported Platforms
Supported Products
MDC is supported on chassis-based platforms running the HP Comware 7 operating
system. MDC is not supported by the HP Comware 5 operating system. As an example, the
12500 series switches require main processing units (MPUs) running HP Comware 7
and not HP Comware 5. This also applies to the HP 10500 series switches. See
Figure 2-4 for supported platforms.

Figure 2-4: Supported platforms

MDC is only available on chassis-based switches and not on fixed-port switches. This is
due to the processing and memory requirements of running separate virtual switches
within the same physical switch. If you configured three MDCs, that would require 3
x LACP processes, 3 x BGP processes, 3 x OSPF processes, 3 x Telnet processes, and so on.
Fixed-port switches do not have enough memory to run multiple MDCs and create
separate instances of all processes.
In contrast, chassis-based switches have the HP Comware operating system installed
on the Main Processing Unit (MPU) and may also have the HP Comware operating
system running on the line cards or Line Processing Units (LPUs) with their own
memory. The chassis-based switches have more memory and can therefore run
multiple MDCs.
All MDC-capable devices have a "default MDC" or "admin MDC." The default
MDC can access and manage all hardware resources. User MDCs can be created,
managed, or deleted via the default MDC. The default MDC is system-predefined
and cannot be created or deleted. The default MDC always uses the name "Admin"
and the ID 1.
The number of MDCs available depends on the Main Processing Unit (MPU)
capabilities and switch generation. The supported number of MDCs ranges from
four to nine:
The 11900 and 12500 switch series support four MDCs.
The HP FlexFabric 12900 switch series supports nine MDCs. This is because the
switch has enhanced memory capabilities.


Note
When you configure MDCs, follow these restrictions and guidelines:
Only MPUs with 4-GB memory or 8-GB memory space support configuring
MDCs. The MDC feature and the enhanced IRF feature (4-chassis IRF) are
mutually exclusive. When using MDC, the IRF Fabric is currently limited to 2
nodes.
The number of MDCs supported per LPU differs depending on LPU memory. Refer to
Table 2-1 through Table 2-5 below for summary and SKUs with LPU memory.
Note
The product details shown below are for reference only.
Table 2-1: MDCs support per device and LPU

Table 2-2: LPUs with 512MB Memory

SKU     Description
JC068A  HP 12500 8-port 10-GbE XFP LEC Module
JC065A  HP 12500 48-port Gig-T LEC Module
JC476A  HP 12500 32-port 10-GbE SFP+ REC Module
JC069A  HP 12500 48-port GbE SFP LEC Module
JC075A  HP 12500 48-port GbE SFP LEB Module
JC073A  HP 12500 8-port 10-GbE XFP LEB Module
JC074A  HP 12500 48-port Gig-T LEB Module
JC064A  HP 12500 32-port 10-GbE SFP+ REB Module
JC070A  HP 12500 4-port 10-GbE XFP LEC Module

Table 2-3: LPUs with 1G Memory

SKU     Description
JC068B  HP 12500 8-port 10GbE XFP LEC Module
JC069B  HP 12500 48-port GbE SFP LEC Module
JC073B  HP 12500 8-port 10GbE XFP LEB Module
JC074B  HP 12500 48-port Gig-T LEB Module
JC075B  HP 12500 48-port GbE SFP LEB Module
JC064B  HP 12500 32-port 10GbE SFP+ REB Module
JC065B  HP 12500 48-port Gig-T LEC Module
JC476B  HP 12500 32-port 10-GbE SFP+ REC Module
JC659A  HP 12500 8-port 10GbE SFP+ LEF Module
JC660A  HP 12500 48-port GbE SFP LEF Module
JC780A  HP 12500 8-port 10GbE SFP+ LEB Module
JC781A  HP 12500 8-port 10GbE SFP+ LEC Module
JC782A  HP 12500 16-port 10-GbE SFP+ LEB Module
JC809A  HP 12500 48-port Gig-T LEC TAA Module
JC810A  HP 12500 8-port 10-GbE XFP LEC TAA Mod
JC811A  HP 12500 48-port GbE SFP LEC TAA Module
JC812A  HP 12500 32p 10-GbE SFP+ REC TAA Module
JC813A  HP 12500 8-port 10-GbE SFP+ LEC TAA Mod
JC814A  HP 12500 16p 10-GbE SFP+ LEC TAA Module
JC818A  HP 12500 48-port GbE SFP LEF TAA Module
Table 2-4: LPUs with 4G Memory

SKU     Description
JG792A  HP FF 12500 40p 1/10GbE SFP+ FD Mod
JG794A  HP FF 12500 40p 1/10GbE SFP+ FG Mod
JG796A  HP FF 12500 48p 1/10GbE SFP+ FD Mod
JG790A  HP FF 12500 16p 40GbE QSFP+ FD Mod
JG786A  HP FF 12500 4p 100GbE CFP FD Mod
JG788A  HP FF 12500 4p 100GbE CFP FG Mod
Refer to device release notes to determine support.


Table 2-5: Example of HP 12500-CMW710-R7328P01 support of Ethernet interface cards for ISSU and MDC


Use Case 1: Datacenter Change Management


Overview
A number of use cases are discussed in this chapter. In this first use case, MDC is
used to better handle change management procedures in a data center.
Separate MDCs are created for a production network, a quality assurance (QA)
network and a Development network. This is in line with procedures followed by
Enterprise resource planning (ERP) applications which tend to have three separate
installations.

Development Network

A separate development MDC allows for testing to be performed on a separate


logical network, but still using the same physical switches as are used in the
production network.
As an example, a customer may want to test a new load balancer for two to three
weeks. The test can be performed on a temporary basis using the development
network rather than the production network. However, as mentioned both networks
use the same physical switches.
Rather than introducing the additional risk of a new untested device in the production
network, comprehensive tests can be performed using the development network.
Features of the new device can be tested, issues resolved and updated network
configuration verified without affecting the current running network. The additional
benefit of MDC is that the test will be relevant and consistent with the production
network as the tests are being performed on the same hardware as the production
network.

Quality Assurance (QA) Network


A Quality Assurance network is an identical logical copy of the production network.
When a major change is required on the production network, the change can be
validated on the QA network. Changes such as the addition of new VLANs, new
routing protocols or new access control lists (ACLs) can be tested and validated in
advance on the QA network before deploying the change on the production network.
The advantage of using MDC in this scenario is that all the MDCs are running on the
same physical hardware. Thus the tests and configuration are validated as if they
were running on the production network. This is a much better approach than using
smaller test switches instead of actual production core switches to try to validate
changes. Using different switches does not make the QA tests 100% valid as there
could be differences in firmware or hardware capabilities between the QA network
and the production network when tested on different hardware.
Note
The QA process will validate feature configurations, but cannot be used to test
or validate firmware updates. All MDCs in a physical device or IRF are
running the same firmware version and all MDCs will be upgraded together
during a firmware update.

Use Case 2: Customer Isolation



This second use case uses MDC for customer isolation.


In a data center, multiple customers could use the same core network infrastructure,
but be isolated using traditional network isolation technologies such as VLANs and
VRFs.
A customer may however want further isolation in addition to traditional network
isolation technologies. They may want isolation of their configurations, memory and
CPU resources from other customers. MDC provides this functionality whereas
traditional technologies such as VLANs don't provide this level of isolation.
This use case is limited by the number of supported MDCs on the physical switches.
As an example, when using a 12500 series switch with 4GB MPU, this use case will
only allow for isolation of two to three customers, as shown in Figure 2-5. This is
because one MDC is used for the Admin MDC and the switch supports a maximum of
four MDCs.

Figure 2-5: Use Case 2: Customer isolation

An additional use case for MDC isolation is where different levels of security are
required within a single customer network. A customer may have a lower security
level network and a higher security level network and may want to keep these
separate from each other. These networks would be separated entirely by using
multiple MDCs. This use case is however also restricted by the number of MDCs a
switch can support.
Note


MDCs are different from VRFs, as VRFs only separate the data plane and not the
management plane of a device. In the example use case of different security
level networks, multiple network administrators are involved. A lower level
security zone administrator cannot configure or view the configuration of a
higher level security zone. When configuring VRFs however, the entire
configuration would be visible to network administrators.

Use Case 3: Infrastructure and Customer Isolation
The third MDC use case splits a switch logically into two separate devices. One
MDC is used for core infrastructure and another MDC is used for customers, as
shown in Figure 2-6. The benefit here is that the core data center infrastructure
network is isolated from all customer networks. There are separate VLANs (4094),
separate QinQ tags and separate VRFs per MDC.

Figure 2-6: Use Case 3: Infrastructure and Customer Isolation

The data center core network is logically running a totally separate management
network independent of all customer data networks. Both management and customer
networks still use the same physical equipment.

Use Case 4: Hardware Limitation Workaround


In this fourth use case, MDC provides a workaround for hardware limitations on
switches. As an example, a data center may use Shortest Path Bridging MAC mode
(SPBM) or Transparent Interconnection of Lots of Links (TRILL). The current switch
ASICs cannot provide the core SPBM service and layer 3 routing services at the
same time.


SPB is essentially a replacement for Spanning Tree. One caveat of SPB is that core
devices simply switch encapsulated packets and do not read the packet contents. This
is similar to the behavior of P devices in an MPLS environment. A core SPBM
device would therefore not be able to route packets between VLANs.
An SPB edge device is typically required for the routing. SPB encapsulated packets
would be decapsulated so that the device can view the IP frames and perform inter-VLAN routing.
If IP routing is required on the same physical core as the device configured for SPB,
two MDCs would be configured, as shown in Figure 2-7. One MDC would be
configured with SPB and be part of the SPB network. Another MDC would then be
configured that is not running SPB to provide layer 3 functionality. A physical cable
would be used to connect the two MDCs on the same chassis switch. The SPB MDC
is thus connected to the layer 3 routing core MDC via a physical back-to-back cable.

Figure 2-7: Use Case 4: Hardware limitation workaround

This scenario would apply for both SPB and TRILL.

MDC Numbering and Naming


MDC 1 is created by default with HP Comware 7 and is named Admin in the
default configuration. Non-default MDCs are allocated IDs 2 and above. Names are
assigned to these MDCs as desired, such as DevTest, DMZ and Internal as
shown in Figure 2-8.


Figure 2-8: MDC numbering and naming

Architecture
It is important to realize that even though MDCs look like two, three or even four
logical devices running on a physical device, there is still only one MPU with only
one CPU.
Only one kernel is booted. On top of this kernel, multiple MDC contexts will be
started and each MDC context will have its own processes and allocated resources.
But there is still only one kernel. This also explains why multiple MDCs need to run
the same firmware version.
A device supporting MDCs is an MDC itself, and is called the "Admin" MDC. The
default MDC always uses the name Admin and the ID 1. You cannot delete it or
change its name or ID. By default, one kernel is started and it starts one
MDC and one MDC only (the Admin MDC). The Admin MDC is used to manage any
other MDCs.
The moment a new MDC is defined, all the control plane protocols of the new MDC
will run in that MDC process group. This process group is isolated from other
process groups and they cannot interact with each other.
Processes that form part of the process group can be allocated a CPU weight to


provide more processing to specific MDCs. CPU, disk usage and memory usage of
process groups can also be restricted for any new MDC. Resource allocation will be
covered later in this chapter.
This restriction does not apply to the Admin MDC. The Admin MDC will always
have 100% access to the system. If necessary, it can take all CPU resources, or use
all memory, or use the entire flash system. The Admin MDC can also access the files
of the other MDCs, since these files are stored in a subfolder per MDC on the main
flash.
It is important to remember that there is still a physical MPU dependency. If the
physical MPU goes down, all of the MDCs running on top of the physical MPU will
also go down. That is why it is worth considering the use of an IRF fabric for high
availability.
As an example, two core physical chassis switches are configured as an IRF fabric.
In addition, three MDCs are configured.
If the first physical switch is powered off, all MDCs (three in this example), will
have a master IRF failure and will activate the slave as the new master (second
chassis).

Architecture, Control Plane


When a new MDC is defined, the MDC can be started. A new control plane is
configured for the MDC. However, the MDC only has access to the Main Processing
Unit (MPU). No line cards or interfaces are available to the MDC until they have
been assigned by an administrator to the MDC.
This is similar to booting a chassis with only the MPU and no Line Processing Units
(LPU) / line cards inserted in the chassis.
Using the display interface brief command, for example, would show no interfaces.

Architecture, ASICs
How do you assign line card interfaces to an MDC?
Because of the hardware restrictions on devices, the interfaces on some interface
cards are grouped. Interfaces therefore need to be allocated to the MDC per ASIC
(port group).


It is important to understand how ASICs are used within a chassis based switch.
In a chassis, each of the line cards has one or more local ASICs. This affects the data
plane of the switch as the data plane packet processing is done by the ASIC. When
packets are received by the switch, functions such as VLAN lookups, MAC address
lookups and so on are performed by ASICs. These ASICs also hold the VLAN table
or the IP routing table.
One ASIC can be used by multiple physical interfaces. As an example, one ASIC on
the line card can be used by 24 Gigabit Ethernet ports. Depending on the line card
models there may be up to 4 ASICs on a physical line card. Another example is a 48
Gigabit Ethernet port line card which could have only two ASICs.

Architecture, ASIC Control


Why is this important to understand? Because each of these ASICs has its own
hardware resources and limits. For each ASIC as an example, there is a limit of 4094
VLANs.
The moment you define a new VLAN at the global chassis level, that VLAN will be
programmed by the control plane into each of the ASICs on the chassis. If there are
six different ASICs on a line card, each ASIC will be programmed with all globally
configured VLANs. In a normal chassis all the ASICs are used by the MPU, so they
are programmed by the single control plane.
Each ASIC can only have one control plane or ASIC programming process. The
ASIC can have only one master and cannot be configured by other control planes.
When creating a new MDC, a new control plane is created. Two control planes
cannot modify the same ASIC.
By default, all ASICs and line cards are controlled by the Admin MDC. When
creating a new MDC, the control of an ASIC can be changed from the default Admin
MDC to that new MDC. This results in all physical interfaces that are bound to the
ASIC also being moved to the new MDC. Individual interfaces cannot be assigned to
an MDC. They are assigned indirectly to the MDC when the ASIC they use is
assigned to the MDC.
All interfaces which are managed by one ASIC must be assigned to the same MDC.
For example 10500/11900/12900 series switches only support one MDC per LPU. In
the configuration, this is enforced by the CLI through port-groups. As shown in Figure
2-9, all interfaces which are bound to the same ASIC must be assigned as a port-group to an MDC. An example of 12500/12500E LPU MDC Port Group implementation is given in Table 2-6.

Figure 2-9: Architecture, ASIC control

Table 2-6: Example 12500/12500E LPU MDC Port Group Implementation

HP Comware 7 will indicate which ports belong to a port group. The following sample
configuration shows 11900 MDC port group allocation:
[DC1-SPINE-1-mdc-2-mdc2]allocate interface FortyGigE 1/1/0/1
Configuration of the interfaces will be lost. Continue? [Y/N]:y
Group error: all interfaces of one group must be allocated to the same mdc.
FortyGigE1/1/0/1
Port list of group 5:
FortyGigE1/1/0/1  FortyGigE1/1/0/2  FortyGigE1/1/0/3  FortyGigE1/1/0/4
FortyGigE1/1/0/5  FortyGigE1/1/0/6  FortyGigE1/1/0/7  FortyGigE1/1/0/8

Architecture, Hardware Limits


In addition to a new control plane being created, hardware limits change with the
creation of a new MDC.
As an example, if 1000 VLANs were created using the Admin MDC, these VLANs
would be programmed on each ASIC that is associated with the Admin MDC.
However, ASICs associated with another MDC, such as the Development MDC, will
not have the 1000 VLANs programmed. They only have the VLANs configured by an
administrator of the Development MDC. The control plane of the Admin MDC does
not control and can therefore not program the ASICs associated with the
Development MDC.
If VLAN 10 was configured on the Admin MDC, that VLAN is not programmed onto
the ASICs of the Development MDC. VLAN 10 would only be programmed on the
ASICs if VLAN 10 was configured on the Development MDC. However, VLAN 10
on the Admin MDC is different and totally independent from VLAN 10 on the
Development MDC. MAC addresses learned in the Development MDC are different
from the MAC addresses learned in the Admin MDC.
There is no control plane synchronization between the ASICs of different MDCs. By
default there is only one MDC and all ASICs have the same VLAN information.
However, as soon as multiple MDCs are created, each ASIC in a different MDC is in
effect part of a different switch, controlled and programmed separately. This
principle applies to all the resources and features such as access lists, VRFs, VPN
instances, routing table sizes etc.
This also means that if any MDC is running out of hardware resources at the ASIC
level, the resource shortage will not impact any of the other MDCs.
This is ideal for heavy load environments. Customers could stress test a network
with many VRFs, access lists or quality of service (QoS) rules without affecting
other MDCs. A development MDC could run out of resources without affecting the
production MDC for example.
However, while there is isolation of the data plane by isolating the ASICs, this is not
the case for a number of other components. Switch hardware resources such as CPU,
physical memory and the flash file systems are shared between MDCs.

Architecture, File System



Each MDC has its own configuration files on a dedicated part of the disk. An MDC
administrator can therefore only modify or restart their own MDC.
Access to the switch CPU and physical memory by MDCs can also be restricted.
There is also good isolation and separation of MDC access to these resources.
For the file system however, there is only one file system available on the flash card.
The Admin MDC (which is the original MDC) has root access to the file system.
This MDC has total control of the flash and has the privileges to perform operations
such as formatting the file system. Any file system operations such as formatting the
flash or using fixdisk are only available from the Admin MDC.
Configurations saved from the Admin MDC are typically saved to the root of the file
system. Other MDCs only have access to a subset of the file structure. This is based
on the MDC identifier. When a new MDC is defined, a folder is created on flash with
the MDC identifier. MDC 2 for example, has a folder "2" created for it on flash. All
files saved by MDC 2 are stored in this subfolder. Additionally, any file operations
such as listing directories and files on the flash using DIR will only show files within
this subfolder. From within the MDC, it appears that root access is provided, but in
effect, only a subfolder is made available to the MDC.
The Admin MDC can view all the configuration files of other MDCs as they are
subfolders in the root of the file system. This is something to consider in specific use
cases.
Within the other MDCs, only the local MDC files are visible. MDC 2 would not be
able to view the files of Admin MDC or other MDCs (such as MDC 3).
The Admin MDC can also be used to monitor and restrict the file space made
available to other MDCs. The Admin MDC has full access and unlimited control
over the file system, but other MDCs can be restricted from the Admin MDC if
required.

Architecture, Console Ports


Console Port
Other components which are shared between the MDCs are the console port and the
Management-Ethernet ports. The console port and AUX port of the physical chassis
always belong to the Admin MDC (default MDC). Other MDCs do not have access
to the physical console or AUX ports.


To access the console of the other MDCs, first access the admin MDC console and
then use the switchto mdc command to switch to the console of a specific MDC.
This is similar to the Open Application Platform (OAP) connect functionality used to
connect to the console of subslots on other devices like the unified wireless
controllers.

Management-Ethernet ports
The management interfaces of all MDCs share the same physical Out Of Band (OOB)
management Ethernet port. Unlike the console port, it is not possible to switch to this
interface using the switchto command.
The management Ethernet interface is shared between all MDCs. When a new MDC
is created, the system automatically shows the Management Ethernet interface of the
MPU inside the MDC.
You must assign different IP addresses to the Management-Ethernet interfaces so
MDC administrators can access and manage their respective MDCs. The IP
addresses for the management Ethernet interfaces do not need to belong to the same
network segment.
The interface can be configured from within each MDC as the interface is shared
between all the MDCs. This means that the physical interface will accept
configurations from all the MDCs. Network administrators or operators of the MDCs
will need to agree on the configuration of the Management-Ethernet port.

Design Considerations
ASIC Restrictions
When designing an MDC solution, remember that ASIC binding determines the
interface grouping that will need to be allocated to an MDC. Interfaces have to be
assigned per ASIC.
Some of the line cards only have a single ASIC. This means that all the interfaces on
the line card will need to be assigned to or removed from an MDC at the same time.
Some line cards may have 2 or more ASICs. This allows for a smaller number of
interfaces to be assigned to an MDC at the same time.
The second consideration is that the number of MDCs will depend on the MPU
generation and memory size.


The interfaces in a group must be assigned to or removed from the same MDC at the
same time. You can see how the interfaces are grouped by viewing the output of the
allocate interface or undo allocate interface command:
If the interfaces you specified for the command belong to the same group or groups
and you have specified all interfaces in the group or groups for the command, the
command outputs no error information.
Otherwise, the command displays the interfaces that failed to be assigned and the
interfaces in the same group or groups.
Assigning or reclaiming a physical interface restores the settings of the interface to
the defaults. For example, if the MDC administrator configures the interface, and
later on the interfaces are assigned to a different MDC, the interface configuration
settings are lost.
To assign all physical interfaces on an LPU to a non-default MDC, you must first
reclaim the LPU from the default MDC by using the undo location and undo
allocate commands. If you do not do so, some resources might still be occupied by
the default MDC.

Platforms
The number of MDCs supported by a platform also needs to be considered. This
depends on the MPU platform as well as the MPU generation.
You can create MDCs only on MPUs with a memory space that is equal to or greater
than 4 GB. The maximum number of non-default MDCs depends on the MPU model.
Refer to earlier in this chapter for more details.

Basic Configuration Steps


Overview
The configuration steps for creating and enabling an MDC will be discussed. Basic
MDC configuration is discussed first and then advanced configuration options such
as setting resource limits will be covered.
Step 1: Define the new MDC with the new ID and a new name.
Step 2: Authorize the MDC to use specific line cards. ASICs are not assigned at this
point. Authorization is given so the next step can be used to assign interfaces.


Step 3: Allocate interfaces to the MDC. Remember to allocate per ASIC group.
Step 4: Start the MDC. This starts the new MDC control plane.
Step 5: Access the MDC console by using the switchto command.

Configuration Step 1: Define a New MDC


Step 1: Define the new MDC with the new ID and a new name.
This command needs to be entered from within the default Admin MDC. You cannot
type this command from any non-default MDCs.
From the default MDC enter system view. Next, define a new MDC by specifying a
name of your choice and ID of the MDC. This ID is used for the subfolder on the
flash file system. See Figure 2-10 for an example.

Figure 2-10: Configuration step 1: Define a new MDC

Once the MDC is configured, a new process group is defined. The process group is
not started at this point as the MDC needs to be manually started in step 4. To create
an MDC, see Table 2-7.
Table 2-7: Creating an MDC
Step 1. Enter system view.
  Command: system-view
Step 2. Create an MDC and enter MDC view.
  Command: mdc mdc-name [ id mdc-id ]
  Remarks: By default, there is a default MDC with the name Admin and the ID 1. The default MDC is system predefined. You do not need to create it, and you cannot delete it. The MDC starts to work after you execute the mdc start command. This command is mutually exclusive with the irf mode enhanced command.
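A minimal command sketch for this step, assuming the switch still has its default prompt and the new MDC is named Dev with ID 2 (both values are illustrative, following the example used later in this chapter), might look like this:

<Switch> system-view
[Switch] mdc Dev id 2
[Switch-mdc-2-Dev]

The prompt changes to MDC view, from which the remaining configuration steps are entered.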

Configuration Step 2: Authorize MDC for a Line Card
When you create an MDC, the system automatically assigns CPU, storage space, and
memory space resources to the MDC to ensure its operation. You can adjust the
resource allocations as required (this is discussed in more detail later in this
chapter).
An MDC needs interfaces to forward packets. However, the system does not
automatically assign interfaces to MDCs and you must assign them manually.
By default, a non-default MDC can access only the resources on the MPUs. All LPUs
of the device belong to the default MDC and a non-default MDC cannot access any
LPUs or resources on the LPUs. To assign physical interfaces to an MDC, you must
first authorize the MDC to use the interface cards to which the physical interfaces
belong.
Step 2 is to authorize the MDC to access interfaces of a specific line card. This
command is entered from the non-default MDC context. In Figure 2-11, MDC 2 with
the name Dev is authorized to allocate interfaces on the line card in slot 2.

Figure 2-11: Configuration step 2: Authorize MDC for a line card

This command does not assign any of the interfaces to the MDC at this point. It only
authorizes the assignment of the interfaces on that line card. Interfaces will be
assigned to the MDC in step 3, as outlined in Table 2-8.
Multiple MDCs can be authorized to use the same interface card.
Table 2-8: To authorize an MDC to use an interface card
Step 1. Enter system view.
  Command: system-view
Step 2. Enter MDC view.
  Command: mdc mdc-name [ id mdc-id ]
Step 3. Authorize the MDC to use an interface card.
  Command (standalone mode): location slot slot-number
  Command (IRF mode): location chassis chassis-number slot slot-number
  Remarks: By default, all interface cards of the device belong to the default MDC, and a non-default MDC cannot use any interface card. You can authorize multiple MDCs to use the same interface card.
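Continuing the Dev example, a sketch of authorizing the MDC to use the line card in slot 2 (the slot number is illustrative) might look like this:

[Switch] mdc Dev
[Switch-mdc-2-Dev] location slot 2

On an IRF fabric, the chassis number would also be specified, for example location chassis 1 slot 2.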

Configuration Step 3: Allocate Interfaces per ASIC
By default, all physical interfaces belong to the default MDC, and a non-default MDC
has no physical interfaces to use for packet forwarding. To enable a non-default
MDC to forward packets, you must assign it interfaces.
The console port and AUX port of the device always belong to the default MDC and
cannot be assigned to a non-default MDC.
Important
When you assign physical interfaces to MDCs on an IRF member device, make
sure the default MDC always has at least one physical IRF port in the up state.
Assigning the default MDC's last physical IRF port in the up state to a non-default MDC splits the IRF fabric. This restriction does not apply to 12900
series switches.
Only a physical interface that belongs to the default MDC can be assigned to a non-default MDC. The default MDC can use only the physical interfaces that are not
assigned to a non-default MDC.
One physical interface can belong to only one MDC. To assign a physical interface
that belongs to a non-default MDC to another non-default MDC, you must first
remove the existing assignment by using the undo allocate interface command.
Assigning a physical interface to or reclaiming a physical interface from an MDC
restores the settings of the interface to the defaults.
Remember that because of hardware restrictions, the interfaces on some interface


cards are grouped. The interfaces that form part of the ASIC group may vary
depending on the line card and the interfaces in a group must be assigned to the same
MDC at the same time.
When interfaces are allocated to the new MDC, they are removed from the default
MDC and moved to the specified non-default MDC. All current interface
configuration is reset on the interfaces when moved to the new MDC. These
interfaces appear as new interfaces in the MDC. They will thus be assigned by
default to VLAN 1. In Figure 2-12, interfaces Gigabit Ethernet 2/0/1 to 2/0/48 have
been allocated to MDC 2, named Dev. To configure parameters for a physical
interface assigned to an MDC, you must log in to the MDC.

Figure 2-12: Configuration step 3: Allocate interfaces per ASIC

In IRF mode on 12500 series switches, you must assign non-default MDCs physical
interfaces for establishing IRF connections. A non-default MDC needs to use the
physical IRF ports to forward packets between member devices. This is discussed in
more detail later in this chapter.
After you change the configuration of a physical IRF port, you must use the save
command to save the running configuration. Otherwise, after a reboot, the master and
subordinate devices in the IRF fabric have different physical IRF port configurations
and you must use the undo allocate interface command and the undo port
group interface command to restore the default and reconfigure the physical IRF
port. Table 2-9 outlines the configuration procedure.
Table 2-9: Configuration Procedure
Step 1. Enter system view.
  Command: system-view
Step 2. Enter MDC view.
  Command: mdc mdc-name [ id mdc-id ]
Step 3. Assign physical interfaces to the MDC (use either or both approaches).
  Command (approach 1, individual interfaces): allocate interface { interface-type interface-number }&<1-24>
  Command (approach 2, a range of interfaces): allocate interface interface-type interface-number1 to interface-type interface-number2
  Remarks: By default, all physical interfaces belong to the default MDC, and a non-default MDC has no physical interfaces to use. You can assign multiple physical interfaces to the same MDC.
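A sketch of this step for the Dev example, assuming the 48 Gigabit Ethernet interfaces of the line card in slot 2 form the port group being assigned (the interface numbers are illustrative), might look like this:

[Switch-mdc-2-Dev] allocate interface GigabitEthernet 2/0/1 to GigabitEthernet 2/0/48
Configuration of the interfaces will be lost. Continue? [Y/N]:y

Remember that all interfaces of a port group must be allocated together; otherwise the command reports the full port list of the group, as shown in the sample output earlier in this chapter.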

Configuration Step 4: Start MDC


Once interfaces are assigned to the MDC, the MDC can be started. The start
command starts the control plane and management plane of the MDC, as shown in
Figure 2-13. The data plane will be active for any interfaces which have been
allocated to this MDC at the moment the MDC is started.

Figure 2-13: Configuration step 4: Start MDC

At this point you may notice that the total memory utilization of the switch will
increase. This is because multiple additional processes for the MDC are being
started. To start an MDC, see Table 2-10.
Important
If you access the BootWare menus and select the Skip Current System
Configuration option while the device starts up, all MDCs will start up
without loading any configuration file.
Table 2-10: Starting an MDC
Step 1. Enter system view.
  Command: system-view
Step 2. Enter MDC view.
  Command: mdc mdc-name [ id mdc-id ]
Step 3. Start the MDC.
  Command: mdc start
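For the Dev example, starting the MDC is a single command from MDC view:

[Switch-mdc-2-Dev] mdc start

Expect overall memory utilization to rise once the new control and management plane processes for the MDC are started.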

Configuration Step 5: Access the MDC


A non-default MDC operates as if it were a standalone device. From the system view
of the default MDC, you can log in to a non-default MDC and enter MDC system
view.
In Figure 2-14, the console is switched to the Dev MDC from the Admin MDC. The
prompt will display as if you are accessing a new console session. Within the Dev
MDC, you will need to enter the system-view again to configure the switch. In this
example the host name is changed to Dev for the Dev MDC.

Figure 2-14: Configuration step 5: Access the MDC

In MDC system view, you can assign an IP address to the Management-Ethernet


interface, or create a VLAN interface on the MDC and assign an IP address to the
interface. This will allow administrators of the MDC to log in to the MDC by using
Telnet or SSH.
To return from a user MDC to the default MDC, use the switchback or quit
command. In this example the switchback command is used to return to the Admin
MDC and the output shows the switch name as switch. Table 2-11 outlines how to
log in to a non-default MDC from the system view of the default MDC.
Table 2-11: To log in to a non-default MDC from the system view of the default MDC
Step 1. Enter system view.
  Command: system-view
Step 2. Log in to an MDC.
  Command: switchto mdc mdc-name
  Remarks: You can use this command to log in only to an MDC that is in active state.
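A sketch of logging in to the Dev MDC, changing its host name and returning to the default MDC (host names and prompts are illustrative) might look like this:

[Switch] switchto mdc Dev
<Switch> system-view
[Switch] sysname Dev
[Dev] quit
<Dev> switchback
<Switch>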

MDC Advanced Configuration Topics


Once basic configuration has been completed, multiple advanced options can be
configured.
Options such as restricting MDC resource access to CPU, memory and file system
access will be discussed in this chapter. Configuration of the Management-Ethernet
interface and firmware updates will also be discussed.
Resource allocation to MDCs is explained in Table 2-12; the default values may be modified if required.
Table 2-12: The default values shown will fit most customer deployments
CPU weight
  Allocation information: Used to assign MPU and LPU CPU resources to each MDC according to its CPU weight. When MDCs need more CPU resources, the device assigns CPU resources according to their CPU weights. Specify CPU weights for MDCs using the limit-resource cpu weight command.
  Default: 10 (100 maximum). By default, the default MDC has a CPU weight of 10 (unchangeable) on each MPU and each interface card. Each non-default MDC has a CPU weight of 10 on each MPU and each interface card that it is authorized to use.
Disk space
  Allocation information: Used to limit the amount of disk space each MDC can use for configuration and log files. Specify disk space percentages for MDCs using the limit-resource disk command.
  Default: 100% (100% maximum). By default, all MDCs share the disk space in the system, and an MDC can use all free disk space in the system.
Memory space
  Allocation information: Used to limit the amount of memory space each MDC can use. Specify memory space percentages for MDCs using the limit-resource memory command.
  Default: 100% (100% maximum). By default, all MDCs share the memory space in the system, and an MDC can use all free memory space in the system.

Although fabric modules are shared by MDCs, traffic between MDCs is isolated because source/destination Packet Processors within the chassis are isolated.

Restricting MDC Resources: Limit CPU


All MDCs are authorized to use the same share of CPU resources. If one MDC takes
too many CPU resources, the other MDCs might not be able to operate. To ensure
correct operation of all MDCs, specify a CPU weight for each MDC.
The amount of CPU resources an MDC can use depends on the percentage of its CPU
weight among the CPU weights of all MDCs that share the same CPU. For example,
in Figure 2-15, three MDCs share the same CPU, setting their weights to 10, 10, and
5 is equivalent to setting their weights to 2, 2, and 1:
The two MDCs with the same weight can use the CPU for approximately the same
period of time.
The third MDC can use the CPU for about half of the time for each of the other


two MDCs.

Figure 2-15: Restricting MDC resources: Limit CPU

The CPU weight specified for an MDC takes effect on all MPUs and all LPUs that the
MDC is authorized to use. Table 2-13 outlines how to specify a CPU weight for an
MDC.
The resource limits are only used if required. If an MDC does not require any of the
CPU resources, other MDCs can use all the available CPU. In other words, there is
no hard limit on the CPU usage when CPU resources are available.
Table 2-13: How to specify a CPU weight for an MDC
Step 1. Enter system view.
  Command: system-view
Step 2. Enter MDC view.
  Command: mdc mdc-name [ id mdc-id ]
Step 3. Specify a CPU weight for the MDC.
  Command: limit-resource cpu weight weight-value
  Remarks: By default, the default MDC has a CPU weight of 10 (unchangeable) on each MPU and each interface card, and each non-default MDC has a CPU weight of 10 on each MPU and each interface card that it is authorized to use.
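For example, to give the Dev MDC half the CPU share of the other MDCs (the weight value is illustrative), the configuration might look like this:

[Switch] mdc Dev
[Switch-mdc-2-Dev] limit-resource cpu weight 5

Because the weight is relative, it only matters when the CPU is busy; on an otherwise idle system Dev can still use the available CPU.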

Restricting MDC Resources: Limit Memory



By default, MDCs on a device share and compete for the system memory space. All
MDCs share the memory space in the system, and an MDC can use all free memory
space in the system. If an MDC takes too much memory space, other MDCs may not
be able to operate normally. To ensure correct operation of all MDCs, specify a
memory space percentage for each MDC to limit the amount of memory space each
MDC can use. Table 2-14 outlines how to specify a memory space percentage for an
MDC.
The memory space to be assigned to an MDC must be greater than the memory space
that the MDC is using. Before you specify a memory space percentage for an MDC,
use the mdc start command to start the MDC and use the display mdc resource
command to view the amount of memory space that the MDC is using.
Note
An MDC cannot use more memory than the allocated value specified by the
limit-resource memory command. This is in contrast to CPU resource limit
which is a weighted value.
Table 2-14: How to specify a memory space percentage for an MDC
Step 1. Enter system view.
  Command: system-view
Step 2. Enter MDC view.
  Command: mdc mdc-name [ id mdc-id ]
Step 3. Specify a memory space percentage for the MDC.
  Command (standalone mode): limit-resource memory slot slot-number ratio limit-ratio
  Command (IRF mode): limit-resource memory chassis chassis-number slot slot-number ratio limit-ratio
  Remarks: By default, all MDCs share the memory space in the system, and an MDC can use all free memory space in the system.
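A sketch of limiting the Dev MDC to 30% of the memory on the MPU in slot 1 (the slot number and percentage are illustrative) might look like this:

[Switch] display mdc resource
[Switch] mdc Dev
[Switch-mdc-2-Dev] limit-resource memory slot 1 ratio 30

As noted above, check current usage with display mdc resource first so that the new limit is not lower than what the MDC already consumes.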

Restricting MDC Resources: Limit Storage


By default, MDCs on a device share and compete for the disk space of the device's
storage media, such as the Flash and CF cards. An MDC can use all free disk space


in the system.
If an MDC occupies too much disk space, the other MDCs might not be able to save
information such as configuration files and system logs. To prevent this, specify a
disk space percentage for each MDC to limit the amount of disk space each MDC can
use for configuration and log files. Table 2-15 outlines how to specify a disk space
percentage for an MDC.
Before you specify a disk space percentage for an MDC, use the display mdc
resource command to view the amount of disk space the MDC is using. The amount
of disk space indicated by the percentage must be greater than that the MDC is using.
Otherwise, the MDC cannot apply for more disk space and no more folders or files
can be created or saved for the MDC.
If the device has more than one storage medium, the disk space percentage specified
for an MDC takes effect on all the media.
Table 2-15: To specify a disk space percentage for an MDC
Step 1. Enter system view.
  Command: system-view
Step 2. Enter MDC view.
  Command: mdc mdc-name [ id mdc-id ]
Step 3. Specify a disk space percentage for the MDC.
  Command (standalone mode): limit-resource disk slot slot-number ratio limit-ratio
  Command (IRF mode): limit-resource disk chassis chassis-number slot slot-number ratio limit-ratio
  Remarks: By default, all MDCs share the disk space in the system, and an MDC can use all free disk space in the system.
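Similarly, a sketch of limiting the Dev MDC to 30% of the flash space on the MPU in slot 1 (the slot number and percentage are illustrative) might look like this:

[Switch] display mdc resource
[Switch] mdc Dev
[Switch-mdc-2-Dev] limit-resource disk slot 1 ratio 30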

Management Ethernet
When a non-default MDC is created, the system automatically provides access to the
Management Ethernet interface of the MPU. The Management-Ethernet interfaces of
all non-default MDCs use the same interface type and number and the same physical
port and link as the default MDC's physical Management-Ethernet interface.
However, you must assign a different IP address to the Management-Ethernet


interface so MDC administrators can access and manage their respective MDCs, see
Figure 2-16 for an example. The IP addresses for the Management-Ethernet interfaces
do not need to belong to the same network segment.

Figure 2-16: Management Ethernet
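A sketch of assigning a management IP address from within a non-default MDC might look like the following; the management interface name varies by platform, so M-GigabitEthernet0/0/0 and the address shown are illustrative only:

[Switch] switchto mdc Dev
<Switch> system-view
[Switch] interface M-GigabitEthernet 0/0/0
[Switch-M-GigabitEthernet0/0/0] ip address 10.1.1.2 24

The same physical OOB port is shared, so each MDC administrator simply configures a different address on it.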

Device Firmware Updates


To run Comware 7, MPUs must be fitted with 4GB SDRAM and also have a CF card
of at least 1 GB in size. 4 GB SDRAM is fitted as standard in the JC072B and the
JG497A, but the JC072A must be upgraded from 1 GB to 4 GB of SDRAM by using
two memory upgrade kits (2 x JC609A). If required, 1 GB CF cards (JC684A) are
available for purchase. If an upgraded JC072A needs to be returned for repair, be
sure to retain the upgrade parts for use in the replacement unit.
As shown in Figure 2-17, due to physical memory limits, interface cards with 512
MB memory do not support ISSU, and the interfaces on each of these cards can be
assigned to only one MDC. Except for these ISSU and MDC limitations, these cards
provide full support for all other features.

Figure 2-17: Device firmware updates

Refer to earlier in this chapter for more detail.

Network Virtualization Types



In this section, MDC and IRF interoperability will be discussed.

IRF
Refer to the left hand of Figure 2-18. The network virtualization shown in the figure
is the combination of multiple physical switches configured as a single logical fabric
using IRF. Distributed link aggregation could then be used to connect multiple
physical cables to the separate physical switches as a single logical link connected to
a single logical device. Multi-Chassis Link Aggregation (MLAG) could be used for
link aggregation between the IRF fabric and other switches.
IRF supports both 2 and 4 chassis configurations.

Figure 2-18: Network Virtualization Types

MDC
The middle of Figure 2-18 shows MDC on a single physical switch. This has been
discussed at length previously in this chapter. We have discussed how the MDC
technology provides multi tenant device contexts, where multiple virtual or logical
devices are created on a single physical chassis.
Each of these logical contexts provides unique VLAN and VRF resources and also
provides hardware isolation inside the same physical chassis.

MDC and IRF


Although MDC can be deployed on a single chassis with redundant power supplies,
redundant management modules (MPUs) and redundant line cards (LPUs), most


customers have MDC deployed together with HP Intelligent Resilient Framework


(IRF).
IRF N:1 device virtualization together with MDC 1:N virtualization achieves a
combined N:1 + 1:N device virtualization solution as shown in the right hand of
Figure 2-18. This achieves higher port densities together with chassis redundancy.
Currently, only 2-chassis IRF & MDC is supported.
The right hand of Figure 2-18 shows MDC and IRF combined to provide a single
virtual device with multiple device contexts. In this example, two physical switches
are virtualized using IRF to create a single logical switch. The IRF fabric is then
carved up into multiple MDCs to provide IRF resiliency for each of the MDCs
defined in the IRF fabric.
This would be used to provide a common control plane, data plane and management
plane for each MDC across 2 physical systems.

IRF-Based MDCs
When you configure MDCs, follow these guidelines (see Figure 2-19):
To configure both IRF and MDCs on a device, configure IRF first. Otherwise, the
device will reboot and load the master's configuration rather than its own when it
joins an IRF fabric as a subordinate member, and none of its settings except for
the IRF port settings take effect.
Before assigning a physical IRF port to an MDC or reclaiming a physical IRF port
from an MDC, you must use the undo port group interface command to
restore the default. After assigning or reclaiming a physical IRF port, you must
use the save command to save the running configuration.

Figure 2-19: IRF-Based MDCs

By default, when a new IRF fabric is created, only the default Admin MDC is created
on the IRF fabric. All line cards are assigned to the Admin MDC by default. Line


cards and interfaces will then need to be manually assigned to other MDCs as
required.
It is important to note that at the time of this writing only 2 chassis IRF fabrics are
currently supported in conjunction with the MDC feature. A 4 chassis IRF fabric
which provides greater IRF scalability is not currently supported with MDC.

IRF-Based MDCs
As discussed previously, any new MDCs need to be authorized to use line cards
before interfaces can be allocated to the MDC. Once authorized, port groups are used
to allocate interfaces to the MDC.
What kind of combinations would be possible with IRF and MDCs?
Figure 2-20 shows various MDC and IRF scenarios.

Figure 2-20: IRF-Based MDCs

The first scenario is the most typical. Each MDC is allowed to allocate resources on
both chassis 1 and chassis 2. This will provide redundancy for each of the configured
MDCs.
This is not a required configuration. An MDC can be created without redundancy (as
shown in the second scenario). In this example, only specific line cards on chassis 1
in the IRF fabric have been allocated to MDC 4. MDC4 does not have any IRF
redundancy on chassis 2. The other MDCs have redundancy and have line cards
allocated on both chassis 1 and chassis 2 in the IRF fabric.


In the third scenario, both MDC 3 and 4 have line cards allocated only on chassis 1,
while MDC 1 and 2 have line cards allocated from both chassis 1 and 2. MDC 1 and
2 have redundancy in case of a chassis failure, but MDC 3 and 4 do not have any
redundancy if chassis 1 fails.
In the same way, as seen in the fourth scenario, MDC 1 is only configured on chassis
1, while MDCs 2, 3 and 4 are only configured on chassis 2. This is also a supported
configuration.
Scenario 5 and 6 show other supported variations of how MDCs can be configured
within an IRF fabric.
As can be seen, various combinations are possible and the administrator can decide
where MDCs operate. There is no limitation on where the MDCs need to be
configured on the chassis devices in the IRF fabric.

MDCs and IRF Types


Overview
There are two ways to configure IRF in combination with MDC. This is dependent
on the switch generation.
As shown in Figure 2-21, the method used by the 12500 and 12500E Series Switches
has separate IRF links per MDC. The alternate method used on the 10500, 11900 and
12900 Series Switches uses a shared IRF link for all MDCs.


Figure 2-21: MDCs and IRF types

12500/12500E
When configuring IRF on the 12500/12500E Series Switches, a dedicated IRF link
per MDC is required.
For MDC 2 on chassis 1 to communicate with MDC 2 on chassis 2, a dedicated IRF
port needs to be configured on both chassis switches that are physically part of that
MDC. For example, if line card 2 was assigned to MDC 2, then you would need to
assign a physical port on line card 2 as an IRF port for MDC 2. If line card 3 was
assigned to MDC 3, then a physical port on line card 3 would need to be configured
as an IRF port for MDC 3. This would be configured for each MDC.
This configuration also results in all data packets for an MDC using the dedicated
IRF port between the two chassis switches. As an example, if data is sent between
MDC1 on chassis 1 and MDC1 on chassis 2, the data would traverse the dedicated
IRF port connecting the two MDCs and not other IRF links.
This results in isolation of the data plane as the IRF link of MDC 1 will not receive


traffic from MDC 2 or other MDCs. This also applies to other MDCs.

10500/11900/12900
The version of the IRF and MDC interoperability used on the 10500, 11900 and
12900 Series Switches uses a single shared IRF link for all MDCs rather than a
dedicated IRF link per MDC.
This results in a change of packet flow between physical switches and MDCs. On a
12500 switch, a packet sent from one MDC to another uses the dedicated link for that
MDC. There is no explicit specification of source MDC when traffic traverses the
IRF link. It is therefore important that the IRF link be correctly connected to the
appropriate MDCs on both chassis switches. If an administrator accidentally cabled
MDC2 on chassis 1 to MDC3 on chassis 2 on 12500 switches, traffic will flow
between the two MDCs using that IRF physical link. VLAN 10 traffic in MDC 2
would end up as VLAN 10 traffic on MDC 3, for example. This breaks the original
design principles of MDCs as the switch fabric is now extended from one MDC to
another, whereas MDCs should be separate logical switches. Each MDC should have
a separate VLAN space, but in this example VLANs are shared.
IRF and MDC on 10500, 11900 and 12900 switches no longer require dedicated
links per MDC. A shared IRF link is used and MDC traffic is differentiated using an
additional tag.
Using the same example, if VLAN 10 traffic is sent from MDC2 on chassis 1 to MDC
2 on chassis 2, an additional tag is added to the traffic across the IRF link. This
allows chassis 2 to differentiate between the VLAN 10 traffic of MDC 2 and the VLAN
10 traffic of MDC 3.
The IRF port is part of the Admin MDC and direct MDC connections are no longer
supported. IRF commands are not available in non-default MDCs.
Proper bandwidth provisioning is required however, as the IRF port will now be
carrying traffic for multiple MDCs.

Configuration Examples
12500/12500E
Differences in IRF approaches are reflected in the configuration commands. When
configuring IRF on 12500/12500E switches, the MDC is specified in the port group
command.


Even though the IRF configuration is completed using the Admin MDC, the IRF
configuration associates specific IRF interfaces with specific MDCs.
In Figure 2-22, the IRF configuration of IRF port 1/1 is shown. Interface Gigabit
Ethernet 1/3/0/1 is added to the IRF port, but is associated with MDC 2. The
physical interface 1/3/0/1 must therefore be assigned to MDC 2. Gigabit Ethernet 1/3/0/24
could not be used with MDC 2 for example as it has already been associated with
MDC 3 using the allocate interface command. In this example, the interface is
correctly associated with MDC 3.
Note
MDC allows IRF fabrics to use 1 Gigabit Ethernet ports rather than only 10
Gigabit Ethernet ports.

Figure 2-22: Configuration examples
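A sketch of what this 12500/12500E IRF port configuration might look like is shown below; the interface numbers and MDC IDs follow the example above, and the exact keyword placement should be verified against the release documentation for the platform:

[Switch] irf-port 1/1
[Switch-irf-port1/1] port group interface GigabitEthernet 1/3/0/1 mdc 2
[Switch-irf-port1/1] port group interface GigabitEthernet 1/3/0/24 mdc 3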

10500/11900/12900
The 10500, 11900 and 12900 Series Switches no longer use the MDC keyword when
IRF is configured. The interfaces are simply bound to the IRF port (1/1 in this
example). The main difference with these switches is that all the interfaces are part
of the Admin MDC. It is no longer possible to bind interfaces associated with non-default MDCs to the IRF port.
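A corresponding sketch for these platforms uses the standard Comware 7 IRF port syntax without any MDC keyword (the interface number is illustrative):

[Switch] irf-port 1/1
[Switch-irf-port1/1] port group interface TenGigabitEthernet 1/0/0/5

Note that on some line cards the whole port group (for example ports 5 to 8) must be made available for IRF together, as described below.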


More MDC and IRF Configuration Information
Because of port groups and ASIC limitations, it may not be possible to assign
individual interfaces to IRF ports. Multiple physical interfaces may need to be
associated with the IRF port at the same time. Groups of four interfaces are often
associated as per the example shown in Figure 2-23.

Figure 2-23: More MDC and IRF configuration information

This is similar to the behavior on 5900 switches which also require that a group of
four interfaces be configured for IRF. This doesn't mean that you have to use all four
ports for IRF to function. You could as an example only physically cable two of the
ports. But, you cannot use any of the four ports in the group for any other function
apart from IRF once the group is used for IRF.
In Figure 2-23, port TenGigabitEthernet 1/0/0/5 is added to IRF. However, an error
is displayed indicating that ports 1/0/0/5 to 1/0/0/8 need to be shut down. As the
interfaces are part of a port group, they need to be allocated for IRF use as a group
rather than individually. Once allocated, one of the interfaces could be used for the
actual IRF functionality, but the entire group needs to be activated for IRF use (this is
true for certain platforms such as the 5900 series switches but may be different on
other platforms).

10500/11900/12900 Link Failure Scenario



As per IRF best practices, multiple physical interfaces should form part of the IRF
link between switches. If one of the physical interfaces goes down, IRF continues to
use the remaining links. As long as at least one link is active between the switches,
IRF will remain active. There will be reduced bandwidth between the IRF devices,
but IRF functionality is not affected (no split brain).
However, as shown in Figure 2-24, when all physical links between the switches go
down, an IRF split will occur.

Figure 2-24: 10500/11900/12900 link failure scenario

Since the Admin MDC is used for IRF port configuration, this is also
the MDC where IRF MAD needs to be configured. There will be no MAD
configuration in other MDCs. This also implies that the IRF MAD ports have to
belong to the Admin MDC.

12500/12500E Link Failure Scenario


On the 12500/12500E switches, IRF configuration is more complicated.
There is a base IRF protocol running at the chassis level and in addition, MDCs use
the IRF physical interfaces to exchange data. The data sent by an MDC is for that
particular MDC only. As an example, an IRF link configured in MDC 1 will only
transport data between MDC1 contexts. The link between MDC 2 contexts will only
transport data for MDC 2. The links do not carry data for other MDC contexts, but
are used by the base IRF protocol.


Refer to the first scenario in Figure 2-25. If the link between MDC1 on chassis 1 and
MDC 1 on chassis 2 fails, the base IRF protocol will remain online as there are still
3 active links between chassis that can be used by the base IRF protocol.

Figure 2-25: 12500/12500E link failure scenario

However, the data plane connection for MDC 1 is down which results in a split for
MDC 1. In a traditional IRF system, that would result in a chassis split brain.
However, in this example by contrast, the base IRF protocol can determine that both
chassis are still online and are still connected because the 3 remaining links are still
active. The base IRF protocol running at the chassis level will trigger MDC 1 to shut
down all external ports on the standby chassis, but the core IRF protocol and other
MDCs continue to operate normally.
This is in effect a split brain scenario for MDC 1, but is automatically resolved by
the base IRF protocol because the remaining links are still active and can be used to
detect the failure of the single MDC. Once again, MDC 1 is lost on the standby
chassis, but MDC 2, 3 and 4 will continue to operate normally.
In the second scenario, the IRF link that is part of MDC 2 is lost. In this example, as
per the previous example, the base IRF protocol continues to function normally. This
is because 3 out of 4 links are still up for the base IRF protocol. The data connection
for MDC 2 is down in this example, and this results in a split brain for MDC 2. The
IRF protocol will shut down the external facing interfaces of MDC 2 on the standby
chassis. All other MDCs will continue to operate normally and so will the base IRF


protocol.
Another advantage of this setup is that if the IRF link for a given MDC is restored,
the MDC is not rebooted and the ports on the slave device are restored automatically.
There is no reboot of the slave device as long as there is an IRF connection between
the switches.
A similar situation occurs in the third scenario. In this example, both MDC 1 and
MDC 2 will have the external interfaces of the standby chassis shut down because of
the split brain on those MDCs. The base IRF protocol will continue to operate as
normal as there are still two remaining links up between the chassis. MDCs 3 and 4
will also continue to operate normally.
In the last example, all links between the chassis are lost. This means that there is no
communication between the chassis IRF ports. This results in a split brain scenario
for the base IRF protocol and all MDCs. This scenario requires an external multiple
active detection method such as MAD BFD to resolve the split brain.

IRF-based MDC: IRF Fabric Split


An IRF fabric is split when no physical IRF ports connecting the chassis are active.
As shown in Figure 2-26, this results in both chassis becoming active at the same
time with the same IP address and same MAC address. This results in multiple
network issues and requires a split brain protocol such as Multi Active Detection
(MAD) to resolve. One of the systems in the IRF fabric should shut down all external
ports.

Figure 2-26: IRF-based MDC: IRF Fabric Split

Previously in this chapter, we discussed the scenario of a split in a single MDC


where the standby MDC is automatically shut down. When the link recovers, the
MDC is restarted and not the entire chassis. The base kernel and other MDCs will


continue to operate normally.


However, when the entire IRF is lost like in this example, the situation is different.
When the link is recovered, the standby system will need to be rebooted when it
rejoins the fabric. This is similar to a traditional IRF system.

Multi Active Detection (MAD)


When all physical IRF ports between chassis go down, an additional mechanism is
required to resolve multiple active devices. In order to ensure that the split brain is
detected and resolved, configure traditional MAD BFD or MAD LACP.
MAD BFD may be the preferred MAD method as there is no dependency on any
other devices outside of the IRF fabric and MAD BFD is very fast at detecting the
split.
As shown in Figure 2-27, MAD BFD is configured at the base IRF level and is thus
configured using the Admin MDC. In addition, all MAD BFD links need to be
assigned to the Admin MDC.

Figure 2-27: Multi Active Detection (MAD)
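
The following is a minimal configuration sketch of MAD BFD on the Admin MDC. The VLAN interface number, IP addresses, and IRF member IDs are example values only; a real deployment would use its own dedicated interface and addressing:

<Admin-MDC> system-view
[Admin-MDC] interface vlan-interface 100
[Admin-MDC-Vlan-interface100] mad bfd enable
[Admin-MDC-Vlan-interface100] mad ip address 192.168.100.1 24 member 1
[Admin-MDC-Vlan-interface100] mad ip address 192.168.100.2 24 member 2

As noted above, the physical ports carrying these MAD BFD links must also belong to the Admin MDC.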

Summary
In this chapter, you learned about Multitenant Device Context (MDC). This is a
technology that can partition a physical device or an IRF fabric into multiple logical
switches called "MDCs."
MDC features and use cases were discussed in this chapter, including using a single
physical switch for multiple customers, which provides separation while leveraging
a single device.


The MDC architecture, supported devices and operation were discussed. Upgrade
restrictions and options were also discussed.
Lastly, support for MDC and IRF was discussed including the differences between
first and second generation switches such as the 12500 and 12900. The way IRF
ports are configured and the results of link failures including split brain scenarios
were also discussed.

Learning Check
Answer each of the questions below.
1. An administrator has configured two customer MDCs (MDC 2 and MDC 3) on a
core 12500 switch. What should an administrator configure to allow traffic
between the two MDCs?
a. Create routed ports in each MDC and configure inter-VLAN routing between the MDCs.

b. Configure VRFs in each MDC and enable route leaking between the VRFs.
c. Connect a physical cable from a port in MDC 2 to a port in MDC 3 and then
configure the ports to be in the same VLAN on each MDC.
d. Configure routing between MDC 1 and the customer MDCs. Traffic between
customer MDCs must be sent via the Admin MDC.
2. A network administrator has taken delivery of a new HP 12900 switch. How
many MDCs exist when the switch is booted?
a. Zero
b. One
c. Two
d. Four
e. Nine
3. How are interfaces allocated to MDCs?
a. By individual interface
b. By interface group
c. By interface port
d. By MDC number
4. Which device requires separate IRF ports per MDC?


a. 10500
b. 12900
c. 11900
d. 12500
5. A 12500 switch is configured with 4 IRF ports, each of which is in a different
MDC. Port 1 = MDC 1, Port 2 = MDC 2, Port 3 = MDC 3, Port 4 = MDC 4.
IRF Port 1 goes down. What is the result?
a. All MDCs go offline.
b. An IRF split occurs and MAD is required to resolve the split brain.
c. The core IRF protocol goes offline, but IRF within the MDCs continues as
normal.
d. MDC 1 goes offline, but other MDCs continue as normal. The core IRF
protocol requires MAD to resolve the split brain.
e. MDC 1 goes offline, but other MDCs continue as normal. The core IRF
protocol continues as normal.

Learning Check Answers


1. c
2. b
3. b
4. d
5. e


3 Multi-CE (MCE)

EXAM OBJECTIVES
In this chapter, you learn to:
Describe MCE Features.
Describe MCE use cases.
Configure MCE.
Describe and configure route leaking.
Configure isolated management access.

INTRODUCTION
Multi-VPN-Instance CE (MCE) enables a switch to function as a Customer Edge
(CE) device of multiple VPN instances in a BGP/MPLS VPN network, thus reducing
network equipment investment. In the remainder of this module we will use Multi-CE
or MCE when talking about Multi-VPN-Instance CE.

MPLS L3VPN Overview


MPLS L3VPN is an L3VPN technology used to interconnect geographically
dispersed VPN sites, as shown in Figure 3-1. MPLS L3VPN uses BGP to advertise
VPN routes and uses MPLS to forward VPN packets over a service provider
backbone.


Figure 3-1: MPLS L3VPN overview

MPLS L3VPN provides flexible networking modes, excellent scalability, and
convenient support for MPLS QoS and MPLS TE.
Note
MPLS basics are discussed in chapter 3 and MPLS VPNs in other study
guides. This study guide only covers the MCE feature without a detailed
discussion of MPLS L3VPNs.

Basic MPLS L3VPN Architecture


A basic MPLS L3VPN architecture has the following types of devices:
Customer edge device (CE device or CE) - A CE device resides on a customer
network and has one or more interfaces directly connected to a service provider
network. It does not support VPN or MPLS.
Provider edge device (PE device or PE) - A PE device resides at the edge of a
service provider network and connects to one or more CEs. All MPLS VPN
services are processed on PEs.
Provider device (P device or P) - A P device is a core device on a service
provider network. It is not directly connected to any CE. A P device has only
basic MPLS forwarding capability and does not handle VPN routing information.
CEs and PEs mark the boundary between the service providers and the customers. A


CE is usually a router. After a CE establishes adjacency with a directly connected


PE, it redistributes its VPN routes to the PE and learns remote VPN routes from the
PE. CEs and PEs use BGP/IGP to exchange routing information. You can also
configure static routes between them.
After a PE learns the VPN routing information of a CE, it uses BGP to exchange VPN
routing information with other PEs. A PE maintains routing information about only
VPNs that are directly connected, rather than all VPN routing information on the
provider network.
A P router maintains only routes to PEs. It does not need to know anything about VPN
routing information.
When VPN traffic is transmitted over the MPLS backbone, the ingress PE functions
as the ingress LSR, the egress PE functions as the egress LSR, while P routers
function as the transit LSRs.

Site
A site has the following features:
A site is a group of IP systems with IP connectivity that does not rely on any
service provider network.
The classification of a site depends on the topological relationship of the devices,
rather than the geographical relationships, though the devices at a site are, in most
cases, adjacent to each other geographically.
A device at a site can belong to multiple VPNs, which means that a site can belong
to multiple VPNs.
A site is connected to a provider network through one or more CEs. A site can
contain multiple CEs, but a CE can belong to only one site.
Sites connected to the same provider network can be classified into different sets by
policies. Only the sites in the same set can access each other through the provider
network. Such a set is called a VPN.

Terminology
VRF / VPN Instance
VPN instances, also called virtual routing and forwarding (VRF) instances,
implement route isolation, data independence, and data security for VPNs.


A VPN instance has the following components:


A separate Label Forwarding Information Base (LFIB).
A separate routing table.
Interfaces bound to the VPN instance.
VPN instance administration information, including route distinguishers (RDs),
route targets (RTs), and route filtering policies.
To associate a site with a VPN instance, bind the VPN instance to the PE's interface
connected to the site. A site can be associated with only one VPN instance, and
different sites can associate with the same VPN instance. A VPN instance contains
the VPN membership and routing rules of associated sites.
With MPLS VPNs, routes of different VPNs are identified by VPN instances.
A PE creates and maintains a separate VPN instance for each directly connected site.
Each VPN instance contains the VPN membership and routing rules of the
corresponding site. If a user at a site belongs to multiple VPNs, the VPN instance of
the site contains information about all the VPNs.
For independence and security of VPN data, each VPN instance on a PE has a
separate routing table and a separate label forwarding information base (LFIB).
A VPN instance contains the following information: an LFIB, an IP routing table,
interfaces bound to the VPN instance, and administration information of the VPN
instance. The administration information includes the route distinguisher (RD), route
filtering policy, and member interface list.

VPN-IPv4 Address
Each VPN independently manages its address space. The address spaces of VPNs
might overlap. For example, if both VPN 1 and VPN 2 use the addresses on subnet
10.110.10.0/24, address space overlapping occurs.
BGP cannot process overlapping VPN address spaces. For example, if both VPN 1
and VPN 2 use the subnet 10.110.10.0/24 and each advertise a route destined for the
subnet, BGP selects only one of them, resulting in the loss of the other route.
Multiprotocol BGP (MP-BGP) can solve this problem by advertising VPN-IPv4
addresses (also called VPNv4 addresses).
As shown in Figure 3-2, a VPN-IPv4 address consists of 12 bytes. The first eight


bytes represent the RD, followed by a four-byte IPv4 prefix. The RD and the IPv4
prefix form a unique VPN-IPv4 prefix.

Figure 3-2: VPN-IPv4 address

An RD can be in one of the following formats:


When the Type field is 0, the Administrator subfield occupies two bytes, the
Assigned number subfield occupies four bytes, and the RD format is 16-bit AS
number:32-bit user-defined number. For example, 100:1.
When the Type field is 1, the Administrator subfield occupies four bytes, the
Assigned number subfield occupies two bytes, and the RD format is 32-bit IPv4
address:16-bit user-defined number. For example, 172.1.1.1:1.
When the Type field is 2, the Administrator subfield occupies four bytes, the
Assigned number subfield occupies two bytes, and the RD format is 32-bit AS
number:16-bit user-defined number, where the minimum value of the AS number
is 65536. For example, 65536:1.
To guarantee global uniqueness for a VPN-IPv4 address, do not set the Administrator
subfield to any private AS number or private IP address.

Route Target Attribute


MPLS L3VPN uses route target community attributes to control the advertisement of
VPN routing information. A VPN instance on a PE supports the following types of
route target attributes:
Export target attribute: A PE sets the export target attribute for VPN-IPv4 routes
learned from directly connected sites before advertising them to other PEs.
Import target attribute: A PE checks the export target attribute of VPN-IPv4
routes received from other PEs. If the export target attribute matches the import
target attribute of a VPN instance, the PE adds the routes to the routing table of the
VPN instance.
Route target attributes define which sites can receive VPN-IPv4 routes, and from
which sites a PE can receive routes.


Like RDs, route target attributes can be one of the following formats:
16-bit AS number:32-bit user-defined number. For example, 100:1.
32-bit IPv4 address:16-bit user-defined number. For example, 172.1.1.1:1.
32-bit AS number:16-bit user-defined number, where the minimum value of the
AS number is 65536. For example, 65536:1.

MCE / VRF-Lite
Multi-CE or VRF-Lite supports multiple VPN instances in customer edge devices.
This feature provides separate routing tables or VPNs without MPLS L3VPNs and
supports overlapping IP addresses.

MCE Overview
BGP/MPLS VPN transmits private network data through MPLS tunnels over the
public network. However, the traditional MPLS L3VPN architecture requires that
each VPN instance use an exclusive CE to connect to a PE, as shown in Figure 3-3.

Figure 3-3: MCE overview

A private network is usually divided into multiple VPNs to isolate services. To meet
these requirements, you can configure a CE for each VPN, which increases device
expense and maintenance costs. Or, you can configure multiple VPNs to use the same
CE and the same routing table, which sacrifices data security.
You can use the Multi-VPN-Instance CE (MCE) function in multi-VPN networks.


MCE allows you to bind each VPN to a VLAN interface. The MCE creates and
maintains a separate routing table for each VPN.
This separates the forwarding paths of packets of different VPNs and, in conjunction
with the PE, can correctly advertise the routes of each VPN to the peer PE, ensuring
the normal transmission of VPN packets over the public network.
As shown in Figure 3-3, the MCE device creates a routing table for each VPN.
VLAN interface 2 binds to VPN 1 and VLAN-interface 3 binds to VPN 2. When
receiving a route, the MCE device determines the source of the routing information
according to the number of the receiving interface, and then adds it to the
corresponding routing table. The MCE connects to PE 1 through a trunk link that
permits packets tagged with VLAN 2 or VLAN 3. PE 1 determines the VPN that a
received packet belongs to according to the VLAN tag of the packet, and sends the
packet through the corresponding tunnel.
You can configure static routes, RIP, OSPF, IS-IS, EBGP, or IBGP between an MCE
and a VPN site and between an MCE and a PE.
Note
To implement dynamic IP assignment for DHCP clients in private networks,
you can configure DHCP server or DHCP relay agent on the MCE. When the
MCE functions as the DHCP server, the IP addresses assigned to different
private networks cannot overlap.

Feature Overview
MCE Features
MCE supports the configuration of additional routing tables within a single routing
device. As an analogy, this can be compared to VLANs configured on Layer 2 switches.
Each VLAN is a separate, isolated Layer 2 network and each VPN instance is a
separate, isolated Layer 3 network. Each VPN instance or VRF is a separate routing
table which runs independently of other routing tables on the device.
In Layer 2 VLANs, a Layer 2 access port belongs to a single VLAN. In the same way,
in VPN-instances, each Layer 3 routed interface belongs to a single VPN instance.
Examples of interfaces that belong to a single VPN instance include:
The Layer 3 interface of a VLAN. Example: interface vlan 10
Routed ports. Example: Gigabit Ethernet 1/0/2
Routed subinterfaces. Example: Gigabit Ethernet 1/0/2.10
Loopback interfaces. Example: interface loopback 1


In Figure 3-4, various interfaces have been defined in separate VPN instances. As an
example, Gigabit Ethernet 1/0 and Gigabit Ethernet 2/0.10 are configured in the RED
VPN instance, Gigabit Ethernet 2/0.20 is configured in the GREEN VPN instance, and
loopback 10 and interface VLAN 10 are configured in the BLUE VPN instance.

Figure 3-4: Feature overview

Each VPN instance configured by a network administrator has separate interfaces and
separate routing tables.

Supported Platforms
MCE is available on almost all Comware routing devices (switches and routers).
Comware 5 fixed port switches include the 3600v2, 5500, 5800 and 5820 switches.
Comware 7 fixed port switches include the 5900, 5920 and 5930 switches. Chassis
based switches running either Comware 5 or Comware 7 include the 7500 (Comware
5), 10500, 11900, 12500 and 12900 switches.
Routers that support MCE include the MSR, HSR and SR series routers.


Design Considerations
The number of VPN instances supported is hardware dependent, as shown in Figure
3-5. For software based routers, the restriction is typically a memory restriction.

Figure 3-5: Design considerations

For switches, this is typically restricted by the ASICs used in the switches.

Use Case 1: Multi-Tenant Datacenter


A number of use cases for MCE will now be discussed.
The first use case is a Multi-Tenant Data Center. This is a data center infrastructure
provided by a hosting provider offering various services to customers.
A requirement in the environment is that each customer should have a separate
routing infrastructure isolated from other customers.
Access control lists (ACLs) could be used to separate customers, but ACLs need to
be individually configured and are often very complex and are prone to errors.
Customers would still be running within the same routing table instance and a misconfigured ACL would allow access between customer networks. By default traffic
would be permitted between customers and only with careful ACL configuration are
customers blocked.
MCE in contrast creates separate routing tables and thus separates customer
resources by design. No access is permitted between VPN instances by default. Only
with explicit additional configuration (route leaking) is traffic permitted between the
separate VPN instances. The MCE feature is also much simpler to configure and
maintain than traditional ACLs.


Typically, to ensure that all of these customers can access a common internet gateway
connection, MCE is combined with a virtual firewall per customer. The firewall used
would also be VPN instance aware to ensure separation.
In Figure 3-6, the RED and GREEN customer are configured in separate VPN
instances and cannot communicate with each other, even though they are using a
shared network infrastructure. Both customers can also access the Internet via the
common Internet firewall.

Figure 3-6: Use Case 1: Multi-tenant datacenter

Use Case 2: Campus with Independent Business Units

The second use case is a campus with independent business units, or teams or
applications.
In some cases, external teams may be working at a customer site on a specific
project, but may be located throughout the campus. The owner of the infrastructure


may want to isolate the external team from the rest of the network, but allow them to
communicate across different parts of the core infrastructure. This would create a
separate isolated virtual network using the same equipment.
A second example may be the use of external application monitoring. An internal
ERP application may be monitored by an external supplier or partner. MCE could be
used to tightly control which networks are available to the external party. Only
certain internal routes would be advertised and available to the external party.
A third example of service isolation is a managed voice over IP (VoIP) infrastructure.
In this example, the entire VoIP infrastructure is managed and configured by an
external partner. The internal VoIP addressing is isolated from the normal corporate
infrastructure providing better security and separation. The external VoIP partner can
manage the VoIP network, but has no access to the rest of the network.
A fourth example is a guest network. A network may consist of multiple locations
connected via routed links. Each location may need to provide guest connectivity, but
also use a centralized Internet connection. A remote site may be connected via a
routed WAN link to the central site and in this case, configuration of separate VPN
instances may be beneficial to provide guest network isolation across routed
networks.

Use Case 3: Overlapping IP Segments


In this third use case example, support for overlapping IP networks is required. This
may occur when companies merge and the same IP address space is used by multiple
parts of the business.
In this case each business or department is separated by VPN instances to isolate the
networks and their addressing.
If connectivity between the instances is required, a VPN instance aware firewall
could be used at the Layer 3 border between instances. This device would perform
network address translation (NAT) between the VPN instances as well as provide
firewall functionality.

Use Case 4: Isolated Management Network


A fourth use case of VPN instances is an isolated management network for network
devices.
This would not be required for Layer 2 switches as these devices do not have IP


addresses in the customer network. The management subnet of a Layer 2 device is by


default isolated from the customer or user portion of the network. This is because a
Layer 2 switch only has one Layer 3 IP address which is used exclusively for device
management, but is configured in a separate management VLAN.
On Inter-VLAN routing devices or Layer 3 devices however, the IP interfaces of the
device are accessible by user or customer devices by design. Separation in this case
would be required. A dedicated VPN instance would be created for the management
interface of the device. Protocols such as SNMP, telnet, SSH and other traditional
networking management protocols would operate inside the dedicated VPN-Instance
and would not be accessible from the customer VPN instances.
Note
Several HP Provision switches have OOB Management ports. The Provision
OOB Management ports operate by default in their own IP routing space. There
is no requirement to define a new routing table for management purposes. This
is in contrast with HP Comware devices which require administrators to define
a management routing table (VPN Instance) for the OOB Management port.
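
As an illustration only, a management VPN instance on a Comware device might be created and bound to the OOB management port as follows. The instance name, RD, addressing, and management interface name are example values and vary by platform:

[Switch] ip vpn-instance mgmt
[Switch-vpn-instance-mgmt] route-distinguisher 65000:999
[Switch-vpn-instance-mgmt] quit
[Switch] interface M-GigabitEthernet 0/0/0
[Switch-M-GigabitEthernet0/0/0] ip binding vpn-instance mgmt
[Switch-M-GigabitEthernet0/0/0] ip address 192.168.99.10 24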

Use Case 5: Shared Services in Data Center


This last use case discussed is a shared services VPN instance in a data center. In the
first use case discussed, VPN instances were used to separate customer networks. In
this example, VPN instances are extended to provide shared services.
The type of shared services that a service provider may offer a customer includes
central firewall facilities, backup facilities, network monitoring, hypervisor
management and security services. All services could be provided either within a
single VPN instance or by using multiple VPN instances.
Customers could continue using their own routing protocols such as OSPF within
their customer VPN instances. The shared services instances may even use different
routing protocols. Each VPN instance is still isolated and only specific routes are
permitted between the VPN instances by using route leaking.

Basic Configuration Steps


The following is an overview of the basic configuration steps:
1. Define a new VPN instance. This creates a new routing table or virtual routing


and forwarding instance (VRF).


2. Each VPN instance is uniquely identified by a route distinguisher (RD). This is
an eight byte value used to uniquely identify routes in Multiprotocol BGP (MP-BGP). Even though MP-BGP is not used, the RD must be specified.
3. Layer 3 interfaces are then assigned to the VPN instance.
4. All existing interface configuration is removed in step 3. Any IP address or other
configuration will need to be reconfigured.
5. Optionally, dynamic or static routing can be configured.
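
Putting these steps together, a minimal configuration sketch might look as follows. The VPN instance name, RD, interface, and addresses are example values only; each step is covered in more detail in the sections that follow:

<Switch> system-view
[Switch] ip vpn-instance CustomerA
[Switch-vpn-instance-CustomerA] route-distinguisher 65000:1
[Switch-vpn-instance-CustomerA] quit
[Switch] interface vlan-interface 10
[Switch-Vlan-interface10] ip binding vpn-instance CustomerA
[Switch-Vlan-interface10] ip address 10.1.1.1 24
[Switch-Vlan-interface10] quit
[Switch] ip route-static vpn-instance CustomerA 10.2.0.0 16 10.1.1.254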

Configuration Step 1: Define VPN-Instance


A VPN instance is a collection of the VPN membership and routing rules of its
associated site. See Figure 3-7 and Table 3-1 for the first configuration steps to
create a VPN instance.

Figure 3-7: Configuration step 1: Define VPN-Instance

Table 3-1: The first configuration step is to create a VPN instance


Step 1: Enter system view.
Command: system-view

Step 2: Create a VPN instance and enter VPN instance view.
Command: ip vpn-instance vpn-instance-name
Remarks: By default, no VPN instance is created.

Once the VPN instance has been defined, a list of VPN instances can be displayed
and the routing table of the VPN instance can be displayed.


By default, no interfaces will be bound to the VPN instance apart from internal
loopback interfaces in the 127.0.0.0 range. The display ip routing-table vpn-instance <name> command will display this, as shown in Figure 3-8.

Figure 3-8: Step 1: Define VPN-Instance (continued)

Configuration Step 2: Route Distinguisher


The second step is to configure the route-distinguisher (RD) of the VPN instance, as
shown in Figure 3-9.

Figure 3-9: Configuration step 2: Route Distinguisher

BGP cannot process overlapping VPN address spaces. For example, if both VPN 1
and VPN 2 use the subnet 10.110.10.0/24 and each advertise a route destined for the
subnet, BGP selects only one of them, resulting in the loss of the other route.
Multiprotocol BGP (MP-BGP) can solve this problem by advertising VPN-IPv4
prefixes.
MCE does not require MP-BGP, but a unique RD is still required.
Use Table 3-2 to configure a Route Distinguisher and optional descriptions.
Table 3-2: How to configure an RD and optional descriptions


Step 1: Enter system view.
Command: system-view

Step 2: Create a VPN instance and enter VPN instance view.
Command: ip vpn-instance vpn-instance-name
Remarks: By default, no VPN instance is created.

Step 3: Configure an RD for the VPN instance.
Command: route-distinguisher route-distinguisher
Remarks: By default, no RD is specified for a VPN instance.

Step 4: (Optional.) Configure a description for the VPN instance.
Command: description description
Remarks: By default, no description is configured for a VPN instance.

Step 5: (Optional.) Configure a VPN ID for the VPN instance.
Command: vpn-id vpn-id
Remarks: By default, no VPN ID is configured for a VPN instance.
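
For example, the RD and an optional description could be configured as follows (the instance name, RD value, and description text are illustrative only):

[Switch] ip vpn-instance vpn1
[Switch-vpn-instance-vpn1] route-distinguisher 100:1
[Switch-vpn-instance-vpn1] description Customer vpn1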

The command display ip vpn-instance [ instance-name vpn-instance-name ] displays information about a specified or all VPN instances.

Syntax
display ip vpn-instance [ instance-name vpn-instance-name ]
instance-name vpn-instance-name

Displays information about the specified VPN instance. The vpn-instance-name is a case-sensitive string of 1 to 31 characters. If no
VPN instance is specified, the command displays brief information
about all VPN instances.

Example
Display brief information about all VPN instances, as shown in Figure 3-10.


Figure 3-10: Step 2: Route Distinguisher (continued)

Command output is shown in Table 3-3.


Table 3-3: Display VPN-instance route distinguisher command output
Field

Description

VPN-Instance Name Name of the VPN instance.


RD
RD of the VPN instance.
Create Time

Time when the VPN instance was created.

Configuration Step 3: Define L3 Interface


Optionally, Layer 3 routed interfaces can be defined in the VPN instance. This
typically applies to switches as most switches have only a single routed interface by
default - interface VLAN 1. Additional Layer 3 interfaces can be created either as
routed ports, or Layer 3 VLAN interface, or routed subinterface, or loopback
interface.
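
As a sketch, the different Layer 3 interface types could be created as follows on a Comware switch. The interface numbers, VLAN ID, and subinterface ID are examples, and on many switches a port must first be converted to a routed port with port link-mode route:

[Switch] interface GigabitEthernet 1/0/2
[Switch-GigabitEthernet1/0/2] port link-mode route
[Switch-GigabitEthernet1/0/2] quit
[Switch] interface GigabitEthernet 1/0/2.10
[Switch-GigabitEthernet1/0/2.10] quit
[Switch] interface vlan-interface 10
[Switch-Vlan-interface10] quit
[Switch] interface loopback 1
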
Use display interface brief to display brief Ethernet interface
information. In the output in Figure 3-11, multiple interface types are
shown, including a routed port, routed subinterface, loopback interface
and VLAN interface.


Figure 3-11: Step 3: Define L3 Interface (continued)

Syntax
display interface [ interface-type [ interface-number | interface-number.subnumber ] ] brief [ description ]
interface-type

Specifies an interface type.


interface-number
Specifies an interface number.
interface-number.subnumber

Specifies a subinterface number, where interface-number is a main


interface (which must be a Layer 3 Ethernet interface) number, and
subnumber is the number of a subinterface created under the interface.
The value range for the subnumber argument is 1 to 4094.
description

Displays the full description of the specified interface. If the keyword is


not specified, the command displays at most the first 27 characters of the
interface description. If the keyword is specified, the command displays
all characters of the interface description.

Usage Guidelines
If no interface type is specified, this command displays information about all
interfaces.


If an interface type is specified but no interface number or subinterface number is


specified, this command displays information about all interfaces of that type.
If both the interface type and interface number are specified, this command displays
information about the specified interface.

Examples
Display brief information about all interfaces.

The brief information of interface(s) under bridge mode:

Command output is shown in Table 3-4.


Table 3-4: Display brief information about all interfaces command output
Field: The brief information of interface(s) under route mode:
Description: Brief information about Layer 3 interfaces.

Field: Link: ADM - administratively down; Stby - standby
Description: ADM: The interface has been shut down by the network administrator. To recover its physical layer state, run the undo shutdown command. Stby: The interface is a standby interface.

Field: Protocol: (s) - spoofing
Description: If the network layer protocol of an interface is UP, but its link is an on-demand link or not present at all, this field displays UP (s), where s represents the spoofing flag. This attribute is typical of interface Null 0 and loopback interfaces.

Field: Interface
Description: Interface name.

Field: Link
Description: Physical link state of the interface: UP (the link is up), DOWN (the link is physically down), ADM (the link has been administratively shut down; to recover its physical state, run the undo shutdown command), Stby (the interface is a standby interface).

Field: Description
Description: Interface description configured by using the description command. If the description keyword is not specified in the display interface brief command, the Description field displays at most 27 characters. If the description keyword is specified, the field displays the full interface description.

Field: The brief information of interface(s) under bridge mode:
Description: Brief information about Layer 2 interfaces.

Field: Speed or Duplex: (a)/A - auto; H - half; F - full
Description: If the speed of an interface is automatically negotiated, its speed attribute includes the auto negotiation flag, indicated by the letter a in parentheses. If the duplex mode of an interface is automatically negotiated, its duplex mode attribute includes the following options: (a)/A (auto negotiation), H (half negotiation), F (full negotiation).

Field: Type: A - access; T - trunk; H - hybrid
Description: Link type options for Ethernet interfaces.

Field: Speed
Description: Interface rate, in bps.

Field: Duplex
Description: Duplex mode of the interface: A (auto negotiation), F (full duplex), F(a) (auto-negotiated full duplex), H (half duplex), H(a) (auto-negotiated half duplex).

Field: Type
Description: Link type of the interface: A (access), H (hybrid), T (trunk).

Field: PVID
Description: Port VLAN ID.

Field: Cause
Description: Causes for the physical state of an interface to be DOWN: Not connected (no physical connection exists, possibly because the network cable is disconnected or faulty); Administratively DOWN (the port was shut down with the shutdown command; to restore the physical state of the interface, use the undo shutdown command).

Configuration Step 4: Bind L3 Interface


By default all Layer 3 interfaces on a device are associated with the default VPN
instance (public VPN instance).
After creating and configuring a VPN instance, associate the VPN instance with the
MCE's interface connected to the site and the interface connected to the PE.
Any IP address configuration on the interface is lost and will need to be reconfigured,
see Figure 3-12.


Figure 3-12: Configuration Step 4: Bind L3 Interface

Use Table 3-5 to associate a VPN instance with an interface.


Table 3-5: How to associate a VPN instance with an interface
Step 1: Enter system view.
Command: system-view

Step 2: Enter interface view.
Command: interface interface-type interface-number

Step 3: Associate a VPN instance with the interface.
Command: ip binding vpn-instance vpn-instance-name
Remarks: By default, no VPN instance is associated with an interface; the interface is part of the public (default) instance. The ip binding vpn-instance command deletes the IP address of the current interface. You must reconfigure an IP address for the interface after configuring the command.
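
A minimal example of binding an interface and re-applying its IP address (the instance name, interface, and address are examples only):

[Switch] interface vlan-interface 10
[Switch-Vlan-interface10] ip binding vpn-instance vpn1
[Switch-Vlan-interface10] ip address 10.1.1.1 24

Because the ip binding vpn-instance command removes any existing IP address, the address is configured (or reconfigured) after the binding.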

Display detailed information about a specified VPN instance.


<Sysname> display ip vpn-instance instance-name vpn1
VPN-Instance Name and ID : vpn1, 1
Create time : 2000/04/26 13:29:37
Up time : 0 days, 16 hours, 45 minutes and 21 seconds
Route Distinguisher : 10:1
Export VPN Targets : 10:1
Import VPN Targets : 10:1


Description : this is vpn1


Maximum Routes Limit : 200
Interfaces : Vlan-interface2, LoopBack0

Command output is shown in Table 3-6.


Table 3-6: Display detailed VPN instance information command output
Field

Description

VPN-Instance Name and ID Name and ID of the VPN instance


Create Time
Time when the VPN instance was created
Up Time
Duration the VPN instance has been up
Route Distinguisher
Export VPN Targets
Import VPN Targets

RD of the VPN instance


Export target attribute of the VPN instance
Import target attribute of the VPN instance

Import Route Policy


Import routing policy of the VPN instance
Description
Description of the VPN instance
Maximum number of Routes Maximum number of routes of the VPN instance
Interfaces

Interfaces bound to the VPN instance

Configuration Step 5: Configure IP on L3 Address

Overview
Once the Layer 3 interface has been associated with the VPN instance, an IP address
is required. Configure the IP address on the interface in the VPN instance.
The display interface brief command does not indicate VPN instance
membership. To view the VPN instance membership, use the display ip vpn-instance or display ip routing-table commands.
In the example in Figure 3-13, an IP address is configured on Gigabit Ethernet 2/0
and this is shown in the output of the display ip vpn-instance instance-name
vpn1 command.


Figure 3-13: Configuration Step 5: Configure IP on L3 address

IP Address
Use the ip address command to assign an IPv4 address to the management Ethernet port.

Use the undo ip address command to restore the default.

Syntax
ip address ip-address { mask-length | mask }
undo ip address

ip-address: Specifies an IPv4 address in dotted decimal notation.


mask-length: Specifies the length of the subnet mask, in the range of 0 to 32.
mask: Specifies the subnet mask in dotted decimal notation.
Default: No IPv4 address is configured.

Display IP Routing-Table VPN-Instance


Use the display ip routing-table vpn-instance command to display the routing
information of a VPN instance / VRF.

Syntax
display ip routing-table vpn-instance vpn-instance-name [ verbose ]
vpn-instance-name


Name of the VPN instance, a string of 1 to 31 characters.


verbose

Displays detailed information.

Example
Display the routing information of VPN instance vpn2.

Command output is shown in Table 3-7.


Table 3-7: Display IP routing-table VPN-instance command output
Field

Description

Destinations

Number of destination addresses

Routes
Number of routes
Destination/Mask Destination address/mask length
Proto
Protocol discovering the route
Pre
Preference of the route
Cost
NextHop
Interface

Cost of the route


Address of the next hop along the route
Outbound interface for forwarding packets to the destination
segment

Configuration Step 6: Configure Routing (1 of 3)

Overview

You can configure static routing, OSPF, EBGP, or IBGP between an MCE and a VPN
site.

Static Routes
An MCE can reach a VPN site through a static route, see Figure 3-14 for an example
static route inside VPN-Instance. Static routing on a traditional CE is globally
effective and does not support address overlapping among VPNs. An MCE supports
binding a static route to a VPN instance, so that the static routes of different VPN
instances can be isolated from each other.

Figure 3-14: Configuration step 6: Configure Routing (1 of 3)

Use Table 3-8 to configure a static route to a VPN site.


Table 3-8: How to configure a static route to a VPN site
Step 1: Enter system view.
Command: system-view

Step 2: Configure a static route for a VPN instance.
Command: ip route-static vpn-instance s-vpn-instance-name dest-address { mask-length | mask } { interface-type interface-number [ next-hop-address ] | next-hop-address [ public ] | vpn-instance d-vpn-instance-name next-hop-address } [ permanent ] [ preference preference-value ] [ tag tag-value ] [ description description-text ]
Remarks: By default, no static route is configured. Perform this configuration on the MCE. On the VPN site, configure a common static route.

Step 3: (Optional.) Configure the default preference for static routes.
Command: ip route-static default-preference default-preference-value
Remarks: The default preference is 60.
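
For example, a static route toward a VPN site and a quick verification might look like this (the addresses and instance name are examples only):

[MCE] ip route-static vpn-instance vpn1 192.168.10.0 24 10.1.1.2
[MCE] display ip routing-table vpn-instance vpn1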

Configuration Step 6: Configure routing (2 of 3)

Once the static routes have been defined, the routing tables for the VPN instance can
be reviewed.
As shown in Figure 3-15, network connectivity can also be tested using the ping and
tracert tools for example. These commands require that the -vpn-instance <name>
option be specified to indicate the specific VPN instance. Otherwise traffic is sent in
the public instance.

Figure 3-15: Configuration step 6: Configure routing (2 of 3)

This also applies to other commands such as viewing the ARP cache.

ping
Use ping to verify whether the destination IP address is reachable, and display
related statistics.
To use the name of the destination host to perform the ping operation, you must first


configure the DNS on the device. Otherwise, the ping operation will fail.
To abort the ping operation during the execution of the command, press Ctrl+C.

Syntax
ping [ ip ] [ -a source-ip | -c count | -f | -h ttl | -i interfacetype interface-number | -m interval | -n | -p pad | -q | -r | -s
packet-size | -t timeout | -tos tos | -v | -vpn-instance vpninstance-name ] * host
ip:

Supports IPv4 protocol. If this keyword is not specified, IPv4 is also


supported.
-a source-ip:

Specifies the source IP address of an ICMP echo request. It


must be an IP address configured on the device. If this option is not specified,
the source IP address of an ICMP echo request is the primary IP address of
the outbound interface of the request.
-c count:

Specifies the number of times that an ICMP echo request is sent.


The count argument is in the range of 1 to 4294967295. The default value is
5.
-f:

Discards packets larger than the MTU of an outbound interface, which


means the ICMP echo request is not allowed to be fragmented.
-h ttl: Specifies the TTL value for an ICMP echo request. The ttl argument is
in the range of 1 to 255. The default value is 255.

-i interface-type interface-number: Specifies the ICMP echo request sending
interface by its type and number. If this option is not provided, the ICMP echo
request sending interface is determined by searching the routing table or
forwarding table according to the destination IP address.

-m interval:

Specifies the interval (in milliseconds) to send an ICMP echo


request. The interval argument is in the range of 1 to 65535. The default value
is 200.
-n:

Disables domain name resolution for the host argument. If the host
argument represents the host name for the destination, and this keyword is not
specified, the device translates host into an address.
-p pad:

Specifies the value of the pad field in an ICMP echo request, in


hexadecimal format. No more than 8 "pad" hexadecimal characters can be
used. The pad argument is 0 to ffffffff. If the specified value is less than 8
characters, 0s are added in front of the value to extend it to 8 characters. For
example, if pad is configured as 0x2f, then the packets are padded with


0x0000002f to make the total length of the packet meet the requirements of the
device. By default, the padded value starts from 0x01 up to 0xff, where
another round starts again if necessary, like 0x010203feff01.
-q:

Displays only statistics. If this keyword is not specified, the system


displays all information.
-r:

Records routing information. If this keyword is not specified, routes are


not recorded.
-s packet-size:

Specifies length (in bytes) of an ICMP echo request (not


including the IP packet header and the ICMP packet header). The packet-size
argument is in the range of 20 to 8100. The default value is 56.
-t timeout:

Specifies the timeout time (in milliseconds) of an ICMP echo


reply. If the source does not receive an ICMP echo reply within the timeout, it
considers the ICMP echo reply timed out. The timeout argument is in the
range of 0 to 65535. The default value is 2000.
-tos tos:

Specifies the ToS value of an ICMP echo request. The tos


argument is in the range of 0 to 255. The default value is 0.
-v:

Displays non ICMP echo reply received. If this keyword is not specified,
the system does not display non ICMP echo reply.
-vpn-instance vpn-instance-name: Specifies the MPLS L3VPN to
which the destination belongs, where the vpn-instance-name argument is a
case-sensitive string of 1 to 31 characters. If the destination is on the public
network, do not specify this option.

host: IP address or host name (a string of 1 to 20 characters) for the
destination.

Examples
Test whether the device with an IP address of 1.1.2.2 is reachable.


Test whether the device with an IP address of 1.1.2.2 in VPN 1 is reachable.

Test whether the device with an IP address of 1.1.2.2 is reachable. Only results are
displayed.

Test whether the device with an IP address of 1.1.2.2 is reachable. The route
information is displayed.
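
The command-line output for these four tests appears in the original figures; the commands themselves would be similar to the following sketch, assuming the destination 1.1.2.2 and a VPN instance named vpn1:

<Sysname> ping 1.1.2.2
<Sysname> ping -vpn-instance vpn1 1.1.2.2
<Sysname> ping -q 1.1.2.2
<Sysname> ping -r 1.1.2.2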

The output shows that:


The destination is reachable.
The route is 1.1.1.1 <-> {1.1.1.2; 1.1.2.1} <-> 1.1.2.2.


Table 3-9: Test reachable destinations command output


Field

Description

PING 1.1.2.2 (1.1.2.2):


56 data bytes, press
CTRL_C to break

Test whether the device with IP address 1.1.2.2 is


reachable. There are 56 data bytes in each ICMP echo
request. Press Ctrl+C to abort the ping operation.
Received ICMP echo replies from the device whose IP
address is 1.1.2.2. If no echo reply is received during the
timeout period, no information is displayed.
bytesNumber of data bytes in the ICMP reply.
icmp_seqPacket sequence, used to determine whether a
segment is lost, disordered or repeated.
ttlTTL value in the ICMP reply.
timeResponse time.

56 bytes from 1.1.2.2:


icmp_seq=0 ttl=254
time=4.685 ms

RR:

Routers through which the ICMP echo request passed.


They are displayed in inversed order, which means the
router with a smaller distance to the destination is
displayed first.

--- 1.1.2.2 ping statistics


Statistics on data received and sent in the ping operation.
--5 packet(s) transmitted Number of ICMP echo requests sent.
5 packet(s) received
Number of ICMP echo replies received.
0.0% packet loss

Percentage of packets not responded to the total packets


sent.

round-trip
min/avg/max/std-dev = Minimum/average/maximum/standard deviation response
4.685/4.761/4.834/0.058 time, in milliseconds.
ms

Configuration step 6: Configure routing (3 of 3)


Use the display arp vpn-instance command to display the ARP entries for a
specific VPN. As shown in Figure 3-16, the command shows information about ARP
entries including the IP address, MAC address, VLAN ID, output interface, entry
type, and aging timer.


Figure 3-16: Configuration step 6: Configure routing (3 of 3)

Syntax
display arp vpn-instance vpn-instance-name [ count ]
vpn-instance-name

Specifies the name of an MPLS L3VPN, a case-sensitive string of 1 to


31 characters.
count

Displays the number of ARP entries.

Example
Display ARP entries for the VPN instance named test.
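
The output itself appears in the original figure; the command would be, for instance:

<Sysname> display arp vpn-instance test
<Sysname> display arp vpn-instance test count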

VPN-Instance dynamic routing: OSPF example

Overview
A separate OSPF process is required for every VPN instance. In Figure 3-17, an
OSPF process of 1001 is configured for VPN instance customerA. When configuring
the OSPF process, specify a unique process number for that OSPF process and the
VPN instance that the OSPF process is associated with.


Figure 3-17: VPN-Instance dynamic routing--OSPF example

Each OSPF process configured on a device will have its own link state database and
requires its own router-id which must exist in the VPN instance.
The OSPF configuration process is very similar to traditional OSPF configuration. A
loopback address is configured in the VPN instance before configuring OSPF. If no
routed interfaces are available within the VPN instance, the OSPF process will not
start because no router-id can be allocated to the process.
In Figure 3-17 area 0 is configured within the OSPF process and OSPF is enabled on
all interfaces configured with IPv4 addresses in the VPN instance.
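
A configuration sketch matching this description might look as follows, assuming VPN instance customerA, OSPF process 1001, and example addresses:

[R1] interface loopback 0
[R1-LoopBack0] ip binding vpn-instance customerA
[R1-LoopBack0] ip address 10.0.0.1 32
[R1-LoopBack0] quit
[R1] ospf 1001 router-id 10.0.0.1 vpn-instance customerA
[R1-ospf-1001] area 0
[R1-ospf-1001-area-0.0.0.0] network 10.0.0.0 0.255.255.255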

Loopback configuration
By default all Layer 3 interfaces are associated with the default VPN instance.
After creating and configuring a VPN instance, associate the VPN instance with the
MCE's interface connected to the site and the interface connected to the PE.
Any IP address configuration on the interface is lost and will need to be reconfigured.
To associate a VPN instance with an interface, see Table 3-10.
Table 3-10: How to associate a VPN instance with an interface
Step 1: Enter system view.
Command: system-view

Step 2: Enter interface view.
Command: interface interface-type interface-number

Step 3: Associate a VPN instance with the interface.
Command: ip binding vpn-instance vpn-instance-name
Remarks: By default, no VPN instance is associated with an interface. The ip binding vpn-instance command deletes the IP address of the current interface. You must reconfigure an IP address for the interface after configuring the command.

OSPF
An OSPF process belongs to the public network or a single VPN instance. If you
create an OSPF process without binding it to a VPN instance, the process belongs to
the public network.
Binding OSPF processes to VPN instances ensures that routes learned populate the
correct VPN instance.
To configure OSPF between an MCE and a VPN site, see Table 3-11.
Table 3-11: How to configure OSPF between an MCE and a VPN site
Step 1: Enter system view.
Command: system-view

Step 2: Create an OSPF process for a VPN instance and enter OSPF view.
Command: ospf [ process-id | router-id router-id | vpn-instance vpn-instance-name ] *
Remarks: Perform this configuration on the MCE. On a VPN site, create a common OSPF process. An OSPF process bound to a VPN instance does not use the public network router ID configured in system view. Therefore, configure a router ID for the OSPF process. An OSPF process can belong to only one VPN instance, but one VPN instance can use multiple OSPF processes to advertise VPN routes.

Step 3: (Optional.) Configure the OSPF domain ID.
Command: domain-id domain-id [ secondary ]
Remarks: The default domain ID is 0. Perform this configuration on the MCE. All OSPF processes of the same VPN instance must be configured with the same OSPF domain ID to ensure correct route advertisement.

Step 4: (Optional.) Configure the type codes of OSPF extended community attributes.
Command: ext-community-type { domain-id type-code1 | router-id type-code2 | route-type type-code3 }
Remarks: The defaults are as follows: 0x0005 for Domain ID, 0x0107 for Router ID, and 0x0306 for Route Type.

Step 5: (Optional.) Configure the external route tag for imported VPN routes.
Command: route-tag tag-value

Step 6: Redistribute remote site routes advertised by the PE into OSPF.
Command: import-route protocol [ process-id | all-processes | allow-ibgp ] [ allow-direct | cost cost | route-policy route-policy-name | tag tag | type type ] *
Remarks: By default, no routes are redistributed into OSPF.

Step 7: (Optional.) Configure OSPF to redistribute the default route.
Command: default-route-advertise summary cost cost
Remarks: By default, OSPF does not redistribute the default route. This command redistributes the default route in a Type-3 LSA. The MCE advertises the default route to the site.

Step 8: Create an OSPF area and enter OSPF area view.
Command: area area-id
Remarks: By default, no OSPF area is created.

Step 9: Enable OSPF on interfaces that are configured with subnets in the range specified by the network command.
Command: network ip-address wildcard-mask
Remarks: By default, an interface neither belongs to any area nor runs OSPF.


VPN-instance dynamic routing: OSPF example


Overview
To view information for a specific OSPF process or VPN instance, specify the OSPF
process number in commands. The VPN instance keyword is not required as an
OSPF process number is associated with an individual VPN instance.
In Figure 3-18, the link state database of OSPF process number 1001 is displayed as
well as the OSPF peers for the process.

Figure 3-18: VPN-instance dynamic routing--OSPF example
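
The figure shows output similar to what the following commands would produce for process 1001 (the process number follows the example above):

<R1> display ospf 1001 lsdb
<R1> display ospf 1001 peer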

display ospf lsdb


Use the display ospf lsdb command to display OSPF LSDB information. If no
OSPF process is specified, this command displays LSDB information for all OSPF
processes.

Syntax
display ospf [ process-id ] lsdb [ brief | [ { asbr | ase | network
| nssa | opaque-area | opaque-as | opaque-link | router | summary }
[ link-state-id ] ] [ originate-router advertising-router-id |
self-originate ] ]


process-id

Specifies an OSPF process by its ID in the range of 1 to 65535.


brief

Displays brief LSDB information.


asbr

Displays Type-4 LSA (ASBR Summary LSA) information in the LSDB.


ase

Displays Type-5 LSA (AS External LSA) information in the LSDB.


network

Displays Type-2 LSA (Network LSA) information in the LSDB.


nssa

Displays Type-7 LSA (NSSA External LSA) information in the LSDB.


opaque-area

Displays Type-10 LSA (Opaque-area LSA) information in the LSDB.


opaque-as

Displays Type-11 LSA (Opaque-AS LSA) information in the LSDB.


opaque-link

Displays Type-9 LSA (Opaque-link LSA) information in the LSDB.


router

Displays Type-1 LSA (Router LSA) information in the LSDB.


summary

Displays Type-3 LSA (Network Summary LSA) information in the


LSDB.
link-state-id

Specifies a Link state ID, in the IP address format.


originate-router advertising-router-id

Displays information about LSAs originated by the specified router.


self-originate

Displays information about self-originated LSAs.

Example
Display OSPF LSDB information.

Command output is shown in Table 3-12.


Table 3-12: Display OSPF LSDB command output
Field

Description

Area
Type

LSDB information of the area.


LSA Type.

LinkState ID
AdvRouter
Age
Len

Link state ID.


Advertising router.
Age of LSA.
Length of LSA.


Sequence
Metric

Sequence number of the LSA.


Cost of the LSA.

*Opq-Link Opaque LSA generated by a virtual link.


Display Type-2 LSA (Network LSA) information in the LSDB.

Command output is shown in Table 3-13.


Table 3-13: Display Type-2 LSA (Network LSA) information in the LSDB command
output
Field

Description

Type

LSA type.

LS ID

DR IP address.


Adv Rtr
LS Age

Router that advertised the LSA.


LSA age time.

Len

Length of LSA.
LSA options:
O-Opaque LSA advertisement capability.
E-AS External LSA reception capability.
EA-External extended LSA reception capability.
DC-On-demand link support.
N-NSSA external LSA support.
P-Capability of an NSSA ABR to translate Type-7 LSAs into Type-5
LSAs.

Options

Seq#
Checksum
Net Mask

LSA sequence number.


LSA checksum.
Network mask.

Attached
Router

ID of the router that established adjacency with the DR, and ID of the
DR itself.

display ospf peer


Use the display ospf peer command to display information about OSPF neighbors.

If no OSPF process is specified, this command displays OSPF neighbor information


for all OSPF processes.
If the verbose keyword is not specified, this command displays brief OSPF neighbor
information.
If no interface is specified, this command displays the neighbor information for all
interfaces.
If no neighbor ID is specified, this command displays all neighbor information.

Syntax
display ospf [ process-id ] peer [ verbose ] [ interface-type interface-number ] [ neighbor-id ]
process-id

Specifies an OSPF process by ID in the range of 1 to 65535.


verbose

Displays detailed neighbor information.


interface-type interface-number

Specifies an interface by its type and number.


neighbor-id

Specifies a neighbor router ID.

Example
Display detailed OSPF neighbor information.

Command output is shown in Table 3-14.


Table 3-14: Display detailed OSPF neighbor information command output
Field: Area areaID interface IPAddress (InterfaceName)'s neighbors
Description: Neighbor information of the interface in the specified area: areaID (area to which the neighbor belongs), IPAddress (interface IP address), InterfaceName (interface name).

Field: Router ID
Description: Neighbor router ID.

Field: Address
Description: Neighbor router address.

Field: GR State
Description: GR state.

Field: State
Description: Neighbor state:
Down - Initial state of a neighbor conversation.
Init - The router has seen a Hello packet from the neighbor. However, the router has not established bidirectional communication with the neighbor (the router itself did not appear in the neighbor's hello packet).
Attempt - Available only in an NBMA network. Under this state, the OSPF router has not received any information from a neighbor for a period but can send Hello packets at a longer interval to keep the neighbor relationship.
2-Way - Communication between the two routers is bidirectional. The router itself appears in the neighbor's Hello packet.
Exstart - The goal of this state is to decide which router is the master, and to decide upon the initial Database Description (DD) sequence number.
Exchange - The router is sending DD packets to the neighbor, describing its entire link-state database.
Loading - The router sends LSR packets to the neighbor, requesting more recent LSAs.
Full - The neighboring routers are fully adjacent.

Field: Mode
Description: Neighbor mode for LSDB synchronization.

Field: Priority
Description: Neighboring router priority.

Field: DR
Description: DR on the interface's network segment.

Field: BDR
Description: BDR on the interface's network segment.

Field: MTU
Description: Neighboring router interface MTU.

Field: Options
Description: LSA options:
O - Opaque LSA advertisement capability.
E - AS External LSA reception capability.
EA - External extended LSA reception capability.
DC - On-demand link support.
N - NSSA external LSA support.
P - Capability of an NSSA ABR to translate Type-7 LSAs into Type-5 LSAs.

Field: Dead timer due in 33 sec
Description: This dead timer will expire in 33 seconds.

Field: Neighbor is up for 02:03:35
Description: The neighbor has been up for 02:03:35.

Field: Authentication Sequence
Description: Authentication sequence number.

Field: Neighbor state change count
Description: Count of neighbor state changes.


VPN-Instance Dynamic Routing: OSPF Example

Overview
Use the display ip routing-table vpn-instance command to display the
routing information of a VPN instance / VRF.
In Figure 3-19, the output of the routing table for VPN instance customerA is shown
on R1.

Figure 3-19: VPN-instance dynamic routing--OSPF example

Syntax
display ip routing-table vpn-instance vpn-instance-name [ verbose ]
vpn-instance-name

Name of the VPN instance, a string of 1 to 31 characters.


verbose

Displays detailed information.

Example
Display the routing information of VPN instance vpn2.
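A minimal invocation, assuming a VPN instance named vpn2 has already been defined on the device:

<Sysname> display ip routing-table vpn-instance vpn2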


Command output is shown in Table 3-15:


Table 3-15: Display the routing information of VPN instance vpn2 command output

Field - Description
Destinations - Number of destination addresses
Routes - Number of routes
Destination/Mask - Destination address/mask length
Proto - Protocol discovering the route
Pre - Preference of the route
Cost - Cost of the route
NextHop - Address of the next hop along the route
Interface - Outbound interface for forwarding packets to the destination segment

MCE: Advanced configuration


In this section, advanced MCE configuration topics are discussed, including the following:
Routing table limits are used to ensure that the VPN-instance routing tables do not
consume all the hardware resources of the underlying platform. This is done by
limiting the number of routes permitted in a VPN instance.
Route leaking is a VPN configuration option which allows routing between VPN
instances. VPN instances are by design isolated from each other. However, in
certain cases, routing is required between VPN instances and routes can therefore


be "leaked" between isolated routing tables. Routes of one VPN instance can also
be advertised into other VPN instances to provide dynamic routing exchange
between routing protocols in different VPN instances.
Management Access VPN instances are popular in data center environments and
are typically configured on core and distribution switches. These switches are
performing IP routing and IP forwarding roles for customer VPNs. To isolate the
management function of these devices from customer networks, management
protocols and management functionality are configured within a dedicated
management VPN instance.

VPN instance routing limits


Overview
VPN instance routing table limits allow a network operator to restrict the number of
active routes allowed in a VPN instance.
It is recommended that VPN instance routing limits be configured on all customer
VPNs to ensure resource protection. If this is not done, a single VPN instance could
potentially consume all hardware resources.
As an example, if OSPF was configured within a customer VPN instance, the OSPF
process within the VPN instance may be learning hundreds or thousands of routes
from an external OSPF router. However, the same underlying hardware or ASICs are
being used for all OSPF processing in all VPN instances on that device. A customer
VPN instance may be able to consume a disproportionate amount of resources, or in
the worst case scenario, consume all resources on core devices.
Underlying ASIC routing table limits apply to all the VPN instances which are
defined. If a switch can support 64 thousand routes as a maximum, one VPN instance
could consume all 64 thousand routes which would mean that there are no hardware
resources available for other VPN instances. This will not only affect that single
VPN instance, but will affect all VPN instances and therefore potentially affect all
customers.
Setting limits on the number of routes permitted in a VPN instance ensures that a
sufficient number of free resources are available on core devices. This protects both
backbone routing as well as routing for other VPN instances.
By default, the number of active routes allowed for a VPN instance is not limited.
Setting the maximum number of active routes for a VPN instance can prevent the device from learning too many routes.


Two types of limits are configurable:
Limit
Warning threshold
The routing table limit will limit the maximum number of routes accepted by the
routing table. In Figure 3-20, this value is set to 20 which limits the routes in the VPN
instance to a maximum of 20 routes. In Figure 3-20 the warning threshold is also set
to 80 percent. When the number of routes in the VPN instance reaches 16 routes
(80% of 20), SNMP traps will be generated to warn network operators that the
number of routes in the routing table is approaching the maximum. This is a type of
high-water mark alert notifying network operators before additional routes are
denied entry to the VPN instance routing table.

Figure 3-20: VPN instance routing limits
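The following sketch shows how the limit and warning threshold from Figure 3-20 might be configured, assuming the VPN instance is named customerA (the instance name is illustrative; the limit of 20 routes and the 80 percent warning threshold follow the figure):

<Sysname> system-view
[Sysname] ip vpn-instance customerA
[Sysname-vpn-instance-customerA] routing-table limit 20 80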

Warning message examples


.%May 13 11:56:13:847 2014 HP RM/4/RM_ACRT_REACH_THRESVALUE:
Threshold value 80% of max active IPv4 routes reached in URT of customerA
.%May 13 11:56:33:426 2014 HP RM/4/RM_ROUTE_REACH_LIMIT: Max active
IPv4 routes 20 reached the limit in URT of customerA

Syntax
routing-table limit number { warn-threshold | simply-alert }
undo routing-table limit
number

Specifies the maximum number of routes. The value range depends on


the system operating mode.
warn-threshold

Specifies a warning threshold, in the range of 1 to 100 in percentage.


When the percentage of the number of existing routes to the maximum number of routes exceeds the specified threshold, the system gives an alarm message but still allows new routes. If routes in the VPN instance reach the maximum, no more routes are added.
simply-alert

Specifies that when the number of routes exceeds the maximum, the system still accepts routes but generates a system log message.

Usage guidelines
A limit configured in VPN instance view applies to both the IPv4 VPN and the IPv6
VPN.
A limit configured in IPv4 VPN view or IPv6 VPN view applies to only the IPv4
VPN or the IPv6 VPN.
IPv4/IPv6 VPN prefers the limit configured in IPv4/IPv6 VPN view over the limit
configured in VPN instance view.

Examples
Specify that VPN instance vpn1 supports up to 1000 routes, and that when routes exceed the upper limit, the VPN instance can still receive new routes but generates a system log message.
<Sysname> system-view
[Sysname] ip vpn-instance vpn1
[Sysname-vpn-instance-vpn1] route-distinguisher 100:1
[Sysname-vpn-instance-vpn1] routing-table limit 1000 simply-alert
Specify that the IPv4 VPN vpn2 supports up to 1000 routes, and that when routes exceed the upper limit, the IPv4 VPN can still receive new routes but generates a system log message.
<Sysname> system-view
[Sysname] ip vpn-instance vpn2
[Sysname-vpn-instance-vpn2] route-distinguisher 100:2
[Sysname-vpn-instance-vpn2] ipv4-family
[Sysname-vpn-ipv4-vpn2] routing-table limit 1000 simply-alert

Specify that the IPv6 VPN vpn3 supports up to 1000 routes, and that when routes exceed the upper limit, the IPv6 VPN can still receive new routes but generates a system log message.
<Sysname> system-view
[Sysname] ip vpn-instance vpn3


[Sysname-vpn-instance-vpn3] route-distinguisher 100:3


[Sysname-vpn-instance-vpn3] ipv6-family
[Sysname-vpn-ipv6-vpn3] routing-table limit 1000 simply-alert

Route leaking
Route leaking allows for tightly controlled routed communication between VPN
instances. While the original purpose of VPN instances was to isolate communication
between instances, you may have scenarios where some routed communication
between VPN instances is required.
One advantage of using VPN instances is that only a limited, explicitly defined set of routes needs to be created for leaking. If the routes are not manually defined, no communication is possible between VPN instances. Access control is therefore easier to implement than using access control lists within a single routing table.
One use case for route leaking is to specify that certain subnets in VPN instance A are reachable from certain subnets in VPN instance B. As an example, 10.1.1.0/24 in VPN instance A is reachable from 10.2.2.0/24 in VPN instance B. Note that 10.2.2.0/24 needs to be reachable from VPN instance A to allow for bidirectional communication.
Another scenario is a shared service within a data center configured on a subnet such
as 10.254.0.0/16. The shared subnet could provide backup services or monitoring
services to customer VPN instances. Route leaking between a VPN instance and the
public routing table is also possible for scenarios such as a central firewall with
Internet access. The firewall could be in a dedicated VPN instance as shown in
Figure 3-21 or the public routing table.
Note
When configuring route leaking, ensure that bidirectional communication is
enabled by leaking the necessary routes into both VPN instances.


Figure 3-21: Route leaking

Network Address Translation (NAT) is not supported on some of the Layer 3


switches discussed in this study guide. Therefore, overlapping IP addresses cannot
be used if only those devices are used. Overlapping subnets in VPN instances can be
used, but require inter-VPN instance NAT which is only supported on routers.

Route leaking - Static route example


The scenario in Figure 3-22 shows two VPN instances which require communication
between them. A shared firewall is configured in the Shared-Internet (shared) VPN
instance. The Core routing device has multiple VPN instances configured with one
interface in the CustomerA VPN instance and the other in the Shared-Internet (shared)
VPN instance. CA-R1 is unaware of any configured VPN instances and is configured
as a traditional router or switch. The firewall is also unaware of VPN instances in
this example. Several firewalls can be configured to be VPN instance aware, but in
this example, the firewall is configured as a traditional firewall only with IPv4
addresses on the internal and external interfaces.


Figure 3-22: Route leaking--Static route example

The CustomerA VPN instance is configured with subnets in the 10.2.0.0/16 range and
the Shared-Internet (shared) VPN instance with subnets in the 10.3.0.0/16 range. An
Internet facing router is performing NAT (not shown in the diagram).
To enable connectivity, static routes need to be configured on the Core, CA-R1 and
Firewall devices.
The first static route command in Figure 3-22 is configured on the Core device and
enables connectivity to networks in the range 10.2.0.0/16 via the next hop 10.2.1.2
(CA-R1). The static route is added to the CustomerA VPN instance routing table on
the Core device. The second static route adds a default route to the shared (Shared-Internet) VPN instance with the next hop set as the Firewall.
CA-R1 has a default route configured with the Core device being the next hop. The
Firewall needs to be configured with subnet 10.2.0.0/16 to allow for bidirectional
traffic. The next hop on the Firewall for 10.2.0.0/16 is set to the Core device.
To enable route leaking, additional static routes are then added to each VPN instance
on the Core device. A static route is added to each VPN instance on the core device,
but in this case the next hop is set to an IP address in a different VPN instance. The
first route leaking command adds a default route (0.0.0.0) to the CustomerA VPN
instance, but sets the next hop to 10.3.1.3 in the shared VPN instance. The second
route leaking command adds network 10.2.0.0/16 to the shared VPN instance with a
next hop of 10.2.1.2 in the CustomerA VPN instance.
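A hedged sketch of the Core static route configuration described above, assuming the VPN instances are named CustomerA and shared and using the addressing from Figure 3-22 (the instance names and prompts are illustrative):

<Core> system-view
[Core] ip route-static vpn-instance CustomerA 10.2.0.0 16 10.2.1.2
[Core] ip route-static vpn-instance shared 0.0.0.0 0 10.3.1.3
[Core] ip route-static vpn-instance CustomerA 0.0.0.0 0 vpn-instance shared 10.3.1.3
[Core] ip route-static vpn-instance shared 10.2.0.0 16 vpn-instance CustomerA 10.2.1.2

The last two commands are the route leaking entries: each route is installed in one VPN instance but points to a next hop that is resolved in the other VPN instance.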

Syntax

ip route-static vpn-instance s-vpn-instance-name dest-address { mask | mask-length } { next-hop-address [ public ] [ bfd control-packet bfd-source ip-address | permanent | track track-entry-number ] | interface-type interface-number [ next-hop-address ] [ backup-interface interface-type interface-number [ backup-nexthop backup-nexthop-address ] [ permanent ] | bfd { control-packet | echo-packet } | permanent ] | vpn-instance d-vpn-instance-name next-hop-address [ bfd control-packet bfd-source ip-address | permanent | track track-entry-number ] } [ preference preference-value ] [ tag tag-value ] [ description description-text ]
undo ip route-static vpn-instance s-vpn-instance-name dest-address { mask | mask-length } [ next-hop-address [ public ] | interface-type interface-number [ next-hop-address ] | vpn-instance d-vpn-instance-name next-hop-address ] [ preference preference-value ]
vpn-instance s-vpn-instance-name

Specifies a source MPLS L3VPN by its name, a case-sensitive string of


1 to 31 characters. Each VPN has its own routing table, and the
configured static route is installed in the routing tables of the specified
VPNs.
dest-address

Specifies the destination IP address of the static route, in dotted decimal


notation.
mask

Specifies the mask of the IP address, in dotted decimal notation.


mask-length

Specifies the mask length in the range of 0 to 32.


vpn-instance d-vpn-instance-name

Specifies a destination MPLS L3VPN by its name, a case-sensitive


string of 1 to 31 characters. If a destination VPN is specified, packets
will search for the output interface in the destination VPN based on the
configured next hop address.
next-hop-address

Specifies the IP address of the next hop in the destination vpn-instance,


in dotted decimal notation.
backup-interface interface-type interface-number

Specifies a backup output interface by its type and number. If the backup


output interface is an NBMA interface or broadcast interface (such as an


Ethernet interface, a virtual template interface, or a VLAN interface),
rather than a P2P interface, you must specify the backup next hop
address.
backup-nexthop backup-nexthop-address

Specifies a backup next hop address.


bfd

Enables BFD to detect reachability of the static route's next hop. When
the next hop is unreachable, the system immediately switches to the
backup route.
control-packet

Specifies the BFD control mode.


bfd-source ip-address

Specifies the source IP address of BFD packets. H3C recommends that


you specify the loopback interface address.
permanent

Specifies the route as a permanent static route. If the output interface is


down, the permanent static route is still active.
track track-entry-number

Associates the static route with a track entry specified by its number in
the range of 1 to 1024. For more information about track, see High
Availability Configuration Guide.
echo-packet

Specifies the BFD echo mode.


public

Indicates that the specified next hop address is on the public network.
interface-type interface-number

Specifies an output interface by its type and number. If the output


interface is an NBMA interface or broadcast interface (such as an
Ethernet interface, a virtual template interface, or a VLAN interface),
rather than a P2P interface, the next hop address must be specified.


preference preference-value

Specifies a preference for the static route, in the range of 1 to 255. The
default is 60.
tag tag-value

Sets a tag value for marking the static route, in the range of 1 to
4294967295. The default is 0. Tags of routes are used for route control
in routing policies. For more information about routing policies, see
Layer 3 - IP Routing Configuration Guide.
description description-text

Configures a description for the static route, which comprises 1 to 60


characters, including special characters like the space, but excluding the
question mark (?).
The routing tables of the devices are updated with the new routes. In Figure 3-23, the
routing tables of VPN instance CustomerA and Shared-Internet (shared) are shown on
the Core device.

Figure 3-23: Route leaking: Static route example

Both VPN instances CustomerA and Shared-Internet (shared) display the two
additional static routes previously configured.
CustomerA VPN instance contains the following static routes:
10.2.0.0/16 with a next hop of 10.2.1.2 (CA-R1) in the CustomerA VPN instance.
The NextHop (CA-R1) is also in the CustomerA VPN instance.


0.0.0.0/0 with a next hop of 10.3.1.3 (Firewall) in the shared VPN instance. The
NextHop (Firewall) is in a different VPN instance.
The Shared-Internet (shared) VPN instance contains the following static routes:
10.2.0.0/16 with a next hop of 10.2.1.2 (CA-R1) in the CustomerA VPN instance.
The NextHop (CA-R1) is in a different VPN instance.
0.0.0.0/0 with a next hop of 10.3.1.3 (Firewall) in the shared VPN instance. The NextHop (Firewall) is also in the shared VPN instance.
Connectivity between the CustomerA VPN instance and the Shared-Internet (shared)
VPN instance can be verified by using ping for example, as shown in Figure 3-24.

Figure 3-24: Route leaking: Static route example (continued)

In this case the Firewall is able to ping a server with IP address 10.2.0.2 in the
CustomerA VPN instance.
Tracert shows that the path from the Firewall in the shared VPN instance traverses
the Core device (both VPN instances) to reach the server in the CustomerA VPN
instance.

Route leaking - Static route restrictions


There are restrictions on static route leaking. Static routes can only be configured for
remote IP subnets and not for directly connected subnets. This is because a next hop IP
address must be configured as part of the static route command; and that address
cannot be the local device where the static route is applied.


Multiprotocol BGP (MBGP) is required to route directly connected subnets between VPN instances. In other words, the device with interfaces in different VPN instances will need to run MBGP to advertise directly connected subnets between the VPN instances.
In the sample network in Figure 3-22, the Core device would need to run MBGP to route traffic from CustomerA to subnet 10.3.1.0/24, or to route from the Shared-Internet VPN instance to subnet 10.2.1.0/24.
MBGP configuration is out of the scope of this study guide.

Management access VPN instance


In this section isolated management access using a separate VPN instance will be
discussed.
Most data center switches have dedicated management Ethernet ports. On chassis
based devices, the management port is located on the management processing unit
(MPU). On fixed port devices, the out of band management port is located either on
the front or back of the device, as shown on Figure 3-25.

Figure 3-25: Management access VPN instance

A management Ethernet interface uses an RJ-45 connector. It can be used to connect a


PC for software loading and system debugging, or connect to a remote device, for
example, a remote network management station, for remote system management. It has
the attributes of a common Ethernet interface, but because it is located on the main
board, it provides much faster connection speed than a common Ethernet interface


when used for operations such as software loading and network management.
The display interface brief command displays this interface as an M-Ethernet Interface. The Management Ethernet Interface is defined as a routed port in the configuration, and therefore these ports cannot be used for switching operations, but only for routed operations.
To configure a management Ethernet interface, see Table 3-16.
Table 3-16: How to configure a management Ethernet interface

Step 1. Enter system view.
Command: system-view

Step 2. Enter management Ethernet interface view.
Command: interface interface-type interface-number

Step 3. Set the description string.
Command: description text
Remarks: Optional. By default, the description is M-GigabitEthernet0/0/0 Interface.

As the Management Ethernet port is a routed port, an IP address can be configured


directly on the interface. Routed ports including the Management Ethernet port can
also be bound to VPN instances.
Once the management interface is bound to a specific VPN instance, management
subnets are only available within the specific VPN instance and are no longer part of
the public routing table. Any routing configuration such as default gateway or routing
protocols would need to be configured for that VPN instance.
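A minimal sketch of binding the management Ethernet interface to a management VPN instance is shown below, assuming an instance named mgmt; the route distinguisher and IP address are placeholders, not values from this study guide:

<Sysname> system-view
[Sysname] ip vpn-instance mgmt
[Sysname-vpn-instance-mgmt] route-distinguisher 100:100
[Sysname-vpn-instance-mgmt] quit
[Sysname] interface M-GigabitEthernet0/0/0
[Sysname-M-GigabitEthernet0/0/0] ip binding vpn-instance mgmt
[Sysname-M-GigabitEthernet0/0/0] ip address 10.0.1.1 24

Note that the interface is bound to the VPN instance before the IP address is configured, because binding an interface to a VPN instance removes any IP address already configured on it.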
Once IP connectivity is established in the management VPN instance, management
protocols need to be configured to use the specified VPN instance. All management
protocols use the public routing table by default.
For example, a switch is configured to use RADIUS authentication for network
operators using server with IP address 10.1.2.100. The switch will attempt to
connect to the server using the public routing table by default. Even though the
RADIUS server IP address may be reachable via the Management Ethernet port,
RADIUS authentication would fail. The switch will not have 10.1.2.100 in the public
routing table and will thus not be able to reach the RADIUS server using RADIUS.
Management protocols including RADIUS need to be configured to use the correct


VPN instance instead of using the public routing table. This needs to be configured on
a per-protocol basis (Telnet, SSH, RADIUS, and so on).

Management access VPN-Instance (1/2)


Overview
Figure 3-26 shows examples of SNMP, Syslog and NTP configured to use the mgmt
VPN instance instead of the public routing table.

Figure 3-26: Management access VPN-Instance (1/2)

The first command is an example of SNMP trap host configuration. Any warning or
informational messages or other events displayed on the switch can be copied to an
SNMP server (NMS management system) using an SNMP trap. Host 10.0.1.100
could be an IMC server configured to receive SNMP traps, or another NMS system.
The snmp-agent target-host command specifies options such as trap, UDP domain and
host IP address. In addition, in this example, the command specifies the VPN instance
to use when sending traps. If the vpn-instance mgmt option is not specified, the
switch will attempt to contact the host using the public routing table.
The second example shows the configuration for a syslog server. Once again the VPN
instance mgmt option is used to specify that syslog messages are sent based on the
VPN instance routing table rather than the public routing table.
The third example configures NTP to use the correct VPN instance to reach the NTP
server.
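The commands in Figure 3-26 are not reproduced here, but a hedged sketch of the three configurations might look as follows, assuming a VPN instance named mgmt and a management server at 10.0.1.100; the SNMP securityname and version parameters are placeholders that depend on the SNMP version actually deployed:

[Sysname] snmp-agent target-host trap address udp-domain 10.0.1.100 vpn-instance mgmt params securityname public v2c
[Sysname] info-center loghost vpn-instance mgmt 10.0.1.100
[Sysname] ntp-service unicast-server 10.0.1.100 vpn-instance mgmt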

SNMP Agent

The SNMP Agent sends notifications (traps and informs) to inform the NMS of
significant events, such as link state changes and user logins or logouts. Unless
otherwise stated, the trap keyword in the command line includes both traps and
informs.
Enable an SNMP notification only if necessary. SNMP notifications are memory-intensive and may affect device performance.
To generate linkUp or linkDown notifications when the link state of an interface
changes, you must enable linkUp or linkDown notification globally by using the
snmp-agent trap enable standard [ linkdown | linkup ] * command and on the interface
by using the enable snmp trap updown command.
After you enable a notification for a module, whether the module generates
notifications also depends on the configuration of the module. For more information,
see the configuration guide for each module.
To enable SNMP traps, see Table 3-17.
Table 3-17: How to enable SNMP traps

Step 1. Enter system view.
Command: system-view

Step 2. Enable notifications globally.
Command: snmp-agent trap enable [ bgp | configuration | ospf [ authentication-failure | bad-packet | config-error | grhelper-status-change | grrestarter-status-change | if-state-change | lsa-maxage | lsa-originate | lsdb-approaching-overflow | lsdb-overflow | neighbor-state-change | nssatranslator-status-change | retransmit | virt-authentication-failure | virt-bad-packet | virt-config-error | virt-retransmit | virtgrhelper-status-change | virtif-state-change | virtneighbor-state-change ] * | standard [ authentication | coldstart | linkdown | linkup | warmstart ] * | system ]
Remarks: By default, all the traps are enabled globally.

You can configure the SNMP agent to send notifications as traps or informs to a host,
typically an NMS, for analysis and management. Traps are less reliable and use
fewer resources than informs, because an NMS does not send an acknowledgement
when it receives a trap.


When network congestion occurs or the destination is not reachable, the SNMP agent
buffers notifications in a queue. You can configure the queue size and the notification
lifetime (the maximum time that a notification can stay in the queue). A notification is
deleted when its lifetime expires. When the notification queue is full, the oldest
notifications are automatically deleted.
You can extend standard linkUp/linkDown notifications to include interface
description and interface type, but must make sure that the NMS supports the
extended SNMP messages.
To send informs, make sure:
The SNMP agent and the NMS use SNMPv3.
Configure the SNMP engine ID of the NMS when you configure SNMPv3 basic
settings. Also, specify the IP address of the SNMP engine when you create the
SNMPv3 user.
Configuration prerequisites
Configure the SNMP agent with the same basic SNMP settings as the NMS. You
must configure an SNMPv3 user, a MIB view, and a remote SNMP engine ID
associated with the SNMPv3 user for notifications.
The SNMP agent and the NMS can reach each other.
To configure the SNMP agent to send notifications to a host, see Table 3-18.
Table 3-18: How to configure the SNMP agent to send notifications to a host

Step 1. Enter system view.
Command: system-view

Step 2. Configure a target host. Use either approach.
(Approach 1) Send traps to the target host:
snmp-agent target-host trap address udp-domain { ip-address | ipv6 ipv6-address } [ udp-port port-number ] [ vpn-instance vpn-instance-name ] params securityname security-string [ v1 | v2c | v3 [ authentication | privacy ] ]
(Approach 2) Send informs to the target host:
snmp-agent target-host inform address udp-domain { ip-address | ipv6 ipv6-address } [ udp-port port-number ] [ vpn-instance vpn-instance-name ] params securityname security-string { v2c | v3 [ authentication | privacy ] }
Remarks: By default, no target host is configured. Current software version does not support SNMPv1 and SNMPv2c. The v1 and v2c keywords are reserved at the CLI only for future support.

Syslog
The info-center loghost command takes effect only after information center is enabled
with the info-center enable command.
The device supports up to four log hosts.
Use info-center loghost to specify a log host and to configure output parameters.
Use undo info-center loghost to restore the default.

Syntax
info-center loghost [ vpn-instance vpn-instance-name ] { ipv4-address | ipv6 ipv6-address } [ port port-number ] [ facility local-number ]
undo info-center loghost [ vpn-instance vpn-instance-name ] { ipv4-address | ipv6 ipv6-address }


vpn-instance vpn-instance-name

Specifies an MPLS L3VPN by its name, a case-sensitive string of 1 to 31 characters. If the log host is on the public network, do not specify this option.
ipv4-address

Specifies the IPv4 address of a log host within the VPN instance.
ipv6 ipv6-address

Specifies the IPv6 address of a log host within the VPN instance.
port port-number

Specifies the port number of the log host, in the range of 1 to 65535. The
default is 514. It must be the same as the value configured on the log
host. Otherwise, the log host cannot receive system information.
facility local-number

Specifies a logging facility from local0 to local7 for the log host. The default value is local7. Logging facilities are used to mark different logging sources, and to query and filter logs.

Examples
Output logs to the log host 1.1.1.1.
<Sysname> system-view
[Sysname] info-center loghost 1.1.1.1

NTP
When you specify an NTP server for the device, the device is synchronized to the
NTP server, but the NTP server is not synchronized to the device.
To synchronize the PE to a PE or CE in a VPN, provide vpn-instance vpn-instance-name in your command.
If you include the vpn-instance vpn-instance-name option in the undo ntp-service unicast-server command, the command removes the NTP server with the IP address of ip-address in the specified VPN. If you do not include the vpn-instance vpn-instance-name option in this command, the command removes the NTP server with the IP address of ip-address in the public network.
Use ntp-service unicast-server to specify an NTP server for the device.
Use undo ntp-service unicast-server to remove an NTP server specified for the device.

Syntax
ntp-service unicast-server { ip-address | server-name } [ vpn-instance vpn-instance-name ] [ authentication-keyid keyid | priority | source interface-type interface-number | version number ] *
undo ntp-service unicast-server { ip-address | server-name } [ vpn-instance vpn-instance-name ]
ip-address

Specifies an IP address of the NTP server. It must be a unicast address,


rather than a broadcast address, a multicast address or the IP address of
the local clock.
server-name

Specifies a host name of the NTP server, a case-insensitive string of 1 to 255 characters.
vpn-instance vpn-instance-name

Specifies the MPLS L3VPN to which the NTP server belongs, where
vpn-instance-name is a case-sensitive string of 1 to 31 characters. If the
NTP server is on the public network, do not specify this option.
authentication-keyid keyid

Specifies the key ID to be used for sending NTP messages to the NTP
server, where keyid is in the range of 1 to 4294967295. If the option is
not specified, the local device and NTP server do not authenticate each
other.
priority

Specifies this NTP server as the first choice under the same condition.
source interface-type interface-number

Specifies the source interface for NTP messages. For an NTP message
the local device sends to the NTP server, the source IP address is the
primary IP address of this interface. The interface-type interface-number
argument represents the interface type and number.
version number

Specifies the NTP version, where number is in the range of 1 to 4. The


default value is 4.

Examples
Specify NTP server 10.1.1.1 for the device, and configure the device to run NTP
version 4.
<Sysname> system-view
[Sysname] ntp-service unicast-server 10.1.1.1 version 4

Management access VPN-Instance (2/2)


RADIUS
A RADIUS scheme specifies the RADIUS servers that the device can communicate
with. It also defines a set of parameters that the device uses to exchange information
with the RADIUS servers, including the IP addresses of the servers, UDP port


numbers, shared keys, and server types.


Switches support the defining of multiple RADIUS or TACACS schemes. A switch
could for example be configured with one RADIUS scheme for 802.1X authentication
and a different RADIUS scheme for management authentication. The RADIUS
servers referenced in each VPN instance could also be different. The vpn-instance
command is used within the radius scheme to specify which VPN instance and
RADIUS server is used for a particular scheme.
Customers could have their own RADIUS servers which they may want to use for
802.1X authentication. By configuring the relevant vpn-instance on each customer
RADIUS scheme, RADIUS packets would be sent within that VPN instance to the
relevant customer RADIUS server rather than via the public routing table.
A separate RADIUS scheme could also be configured for management authentication
within a VPN instance. Figure 3-27 shows an example of RADIUS server
configuration within the management VPN instance. This ensures that RADIUS
authentication uses the mgmt VPN instance rather than the public routing table or a
customer VPN.

Figure 3-27: Management access VPN-Instance (2/2)

Create a RADIUS scheme before performing any other RADIUS configurations. You
can configure up to 16 RADIUS schemes. A RADIUS scheme can be referenced by
multiple ISP domains.
To create a RADIUS scheme, see Table 3-19.
Table 3-19: How to create a RADIUS scheme

Step 1. Enter system view.
Command: system-view

Step 2. Enter RADIUS scheme view.
Command: radius scheme radius-scheme-name

Step 3. Specify RADIUS authentication servers. Configure at least one of the following commands.
Specify the primary RADIUS authentication server:
primary authentication { ipv4-address | ipv6 ipv6-address } [ port-number | key { cipher | simple } string | vpn-instance vpn-instance-name ] *
Specify a secondary RADIUS authentication server:
secondary authentication { ipv4-address | ipv6 ipv6-address } [ port-number | key { cipher | simple } string | vpn-instance vpn-instance-name ] *
Remarks: By default, no authentication server is specified. Two authentication servers in a scheme, primary or secondary, cannot have the same combination of IP address, port number, and VPN.

The VPN specified for a RADIUS scheme applies to all authentication and
accounting servers in that scheme. If a VPN is also configured for an individual
RADIUS server, the VPN specified for the RADIUS scheme does not take effect on
that server.
To specify a VPN for a scheme, see Table 3-20.
Table 3-20: How to specify a VPN for a scheme

Step 1. Enter system view.
Command: system-view

Step 2. Enter RADIUS scheme view.
Command: radius scheme radius-scheme-name

Step 3. Specify a VPN for the RADIUS scheme.
Command: vpn-instance vpn-instance-name
Remarks: By default, a RADIUS scheme belongs to the public network.
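Combining Tables 3-19 and 3-20, a hedged sketch of a management RADIUS scheme bound to the mgmt VPN instance might look as follows; the scheme name, server address, port, and key shown are placeholders:

<Sysname> system-view
[Sysname] radius scheme mgmt-auth
[Sysname-radius-mgmt-auth] vpn-instance mgmt
[Sysname-radius-mgmt-auth] primary authentication 10.0.1.100 1812 key simple radiuskey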

sFlow/Netflow
The second command in Figure 3-27 shows an example of sFlow configured to use the mgmt VPN instance when communicating with sFlow collector 10.0.1.100.
To configure the sFlow agent and sFlow collector information, see Table 3-21.
Table 3-21: How to configure the sFlow agent and sFlow collector information

Step 1. Enter system view.
Command: system-view

Step 2. (Optional.) Configure an IP address for the sFlow agent.
Command: sflow agent { ip ip-address | ipv6 ipv6-address }
Remarks: By default, no IP address is configured for the sFlow agent. The device periodically checks whether the sFlow agent has an IP address. If not, the device automatically selects an IPv4 address for the sFlow agent but does not save the IPv4 address in the configuration file. It is recommended that you manually configure an IP address for the sFlow agent. Only one IP address can be configured for the sFlow agent on the device, and a newly configured IP address overwrites the existing one.

Step 3. Configure the sFlow collector information.
Command: sflow collector collector-id [ vpn-instance vpn-instance-name ] { ip ip-address | ipv6 ipv6-address } [ port port-number | datagram-size size | time-out seconds | description text ] *
Remarks: By default, no sFlow collector information is configured.

Step 4. (Optional.) Specify the source IP address of sFlow packets.
Command: sflow source { ip ip-address | ipv6 ipv6-address } *
Remarks: By default, the source IP address is determined by routing.
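A hedged sketch of the sFlow collector configuration, assuming the mgmt VPN instance and the collector address used throughout this section (the agent address and collector ID are placeholders):

[Sysname] sflow agent ip 10.0.1.1
[Sysname] sflow collector 1 vpn-instance mgmt ip 10.0.1.100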


OpenFlow
The third command in Figure 3-27 shows an example of OpenFlow configured to
communicate with an OpenFlow controller (10.0.1.100) using the mgmt VPN
instance.
The number of controllers supported by an OpenFlow switch is switch dependent. The OpenFlow channel between the OpenFlow switch and each controller can have only one main connection, and the connection must use TCP or SSL. The main connection must be reliable, and it processes control messages to complete tasks such as deploying entries, obtaining data, and sending information.
To specify a controller for an OpenFlow switch and configure the main connection to
the controller, see Table 3-22.
Table 3-22: How to specify a controller for an OpenFlow switch and configure the main connection to the controller

Step 1. Enter system view.
Command: system-view

Step 2. Enter OpenFlow instance view.
Command: openflow instance instance-id

Step 3. Specify a controller and configure the main connection to the controller.
Command: controller controller-id address { ip ip-address | ipv6 ipv6-address } [ port port-number ] [ ssl ssl-policy-name ] [ vrf vrf-name ]
Remarks: By default, an OpenFlow instance is not configured with any main connection.
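A hedged sketch of pointing an OpenFlow instance at a controller through the management VPN instance; the instance ID, controller ID, and view prompt are illustrative:

[Sysname] openflow instance 1
[Sysname-of-inst-1] controller 1 address ip 10.0.1.100 vrf mgmt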

IMC Management Access using VPN-Instance


IMC can be used for network management in conjunction with VPN instances. No
additional configuration is required within IMC to support basic device management,
status reporting and SNMP polling of devices. IMC will simply try to reach the
device via the configured IP address and the device will respond from that IP
address. IMC is unaware that it has been configured within a VPN instance.
However, when using IMC device discovery, the option "Automatically register to
receive SNMP traps from supported devices" needs to be unchecked (turned off).


This IMC option is not VPN instance aware and will configure the devices to send
SNMP traps to the IMC server using the public routing table instead of the correct
VPN instance.
Figure 3-28 shows an example of the configured target-host command as configured
by IMC. No VPN instance has been configured and traps will not reach the IMC
server at IP address 10.0.1.100 because the public routing table is used instead of the
VPN instance.

Figure 3-28: IMC Management Access using VPN-Instance

Ensure that the IMC option is unchecked and that the correct command is manually
configured on the device with the correct VPN instance.
A feature of IMC that must be configured when working with VPN instances is the
Intelligent Configuration Center.
The Intelligent Configuration Center is part of the basic IMC platform and provides
automated deployment of configurations as well as backup and restore configuration
options.
Backups of device configurations require additional setup when used with VPN instances. Failing to configure this will result in backups failing, as can be seen in Figure 3-29.


Figure 3-29: Failed backups of device configurations

The reason for the failure is that IMC is unaware of the VPN instance by default. IMC
will instruct the network device to back up the configuration to the IMC server
running a local TFTP server. IMC will use SNMP set commands to initiate the TFTP
backup from the device. Included in the SNMP set messages are the backup filename
to be used and the TFTP server IP address. IMC does not however specify any VPN
instance by default.
Since IMC has only included the TFTP server address and filename in the SNMP set
messages, when the device initiates the TFTP backup, it will use the public routing
table and thus the backup will fail (the IMC server is not reachable from the public
routing table).
As discussed previously, the TFTP upload from the network device needs to use the
management VPN instance rather than the public routing table.
IMC can be configured to include the required VPN instance name when instructing a
device to back up its configuration. The SNMP set instructions sent to the device will
thus include the VPN instance in addition to the filename and TFTP server IP
address.
This option is configured by selecting the following options (see Figure 3-30):
1. Configuration Center menu
2. Options menu
3. VPN instance tab
4. For each device, selecting the VPN Instance Name to use for that device.


Figure 3-30: IMC Management Access using VPN-Instance

When IMC instructs the device to initiate a backup, the SNMP set instruction will
include the specified VPN instance name.
Note
The specified VPN instance must be defined on the network device.
The output of a successful backup is shown in Figure 3-31.

Figure 3-31: Output of a successful backup

A core network device may have multiple interfaces configured with IP addresses in
multiple VPN instances. Any one of these IP addresses could be used for the
management of the device and this includes IP addresses configured within customer
VPN instances. That means that a customer may attempt to telnet to a core device or
use snmp to configure the device. For security reasons, it is undesirable to permit any
management access to core network devices from customer VPN instances.


Access control lists (ACLs) can be configured to only allow management access from specific VPN instances such as the management VPN instance. The vpn-instance option is available on Comware ACLs to only allow access from specified VPN instances.
In Figure 3-32, an access list 2001 is configured with an entry that only permits the IMC host (IP address 10.0.1.100) configured in the mgmt VPN instance. The access list is then bound to various management protocols such as Telnet, SSH, HTTP, and others. The management protocols are therefore restricted to only allow access from host 10.0.1.100 in the mgmt VPN instance.

Figure 3-32: IMC Management Access using VPN-Instance
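A hedged sketch of the kind of configuration shown in Figure 3-32, assuming the mgmt VPN instance and IMC host 10.0.1.100; additional management protocols would be restricted in the same way:

<Sysname> system-view
[Sysname] acl number 2001
[Sysname-acl-basic-2001] rule permit source 10.0.1.100 0 vpn-instance mgmt
[Sysname-acl-basic-2001] quit
[Sysname] telnet server acl 2001
[Sysname] ssh server acl 2001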

The ACL should be applied to all management protocols on the device to ensure customers are not able to connect to the device.
Note
Comware device ACLs have an implicit permit by default when used as packet filters. However, in this example, the ACL is used to limit management
protocols and in this case, the default action is deny. This is the opposite of the
behavior of packet filters that filter traffic passing through the device.

ACLs
An access control list (ACL) is a set of rules (or permit or deny statements) for
identifying traffic based on criteria such as source IP address, destination IP address,
and port number.
Table 3-23 is a list of ACL categories.
Table 3-23: ACL categories


Each ACL category has a unique range of ACL numbers. When creating an ACL, you
must assign it a number. In addition, you can assign the ACL a name for ease of
identification. After creating an ACL with a name, you cannot rename it or delete its
name.
For an IPv4 basic or advanced ACLs, its ACL number and name must be unique in
IPv4. For an IPv6 basic or advanced ACL, its ACL number and name must be unique
in IPv6.
The rules in an ACL are sorted in a specific order. When a packet matches a rule, the
device stops the match process and performs the action defined in the rule. If an ACL
contains overlapping or conflicting rules, the matching result and action to take
depend on the rule order.
The following ACL match orders are available:
config-Sorts ACL rules in ascending order of rule ID. A rule with a lower ID is matched before a rule with a higher ID. If you use this approach, carefully check the rules and their order.
auto-Sorts ACL rules in depth-first order. Depth-first ordering makes sure any subset of a rule is always matched before the rule. Table 3-24 lists the sequence of tie breakers that depth-first ordering uses to sort rules for each type of ACL.
The match order of user-defined ACLs can only be config.
Sort ACL rules in depth-first order, as shown in Table 3-24:
Table 3-24: Sort ACL rules in depth-first order

IPv4 basic ACL - Sequence of tie breakers:
VPN instance
More 0s in the source IP address wildcard (more 0s means a narrower IP address range)
Rule configured earlier

IPv4 advanced ACL - Sequence of tie breakers:
VPN instance
Specific protocol type rather than IP (IP represents any protocol over IP)
More 0s in the source IP address wildcard mask
More 0s in the destination IP address wildcard
Narrower TCP/UDP service port number range
Rule configured earlier
A wildcard mask, also called an inverse mask, is a 32-bit binary number represented
in dotted decimal notation. In contrast to a network mask, the 0 bits in a wildcard
mask represent "do care" bits, and the 1 bits represent "don't care" bits. If the "do
care" bits in an IP address are identical to the "do care" bits in an IP address
criterion, the IP address matches the criterion. All "don't care" bits are ignored. The
0s and 1s in a wildcard mask can be noncontiguous. For example, 0.255.0.255 is a
valid wildcard mask.

Telnet server acl

Use telnet server acl to apply an ACL to filter Telnet logins.
Use undo telnet server acl to restore the default.

Only one ACL can be used to filter Telnet logins, and only users permitted by the
ACL can Telnet to the device. This command does not take effect on existing Telnet
connections. You can specify an ACL that has not been created yet in this command.
The command takes effect after the ACL is created.

Syntax
telnet server acl acl-number
undo telnet server acl
acl-number

Specifies an ACL by its number:
Basic ACL: 2000 to 2999.
Advanced ACL: 3000 to 3999.
Ethernet frame header ACL: 4000 to 4999.


Examples
Permit only the user at 1.1.1.1 to Telnet to the device.
<Sysname> system-view
[Sysname] acl number 2001
[Sysname-acl-basic-2001] rule permit source 1.1.1.1 0
[Sysname-acl-basic-2001] quit
[Sysname] telnet server acl 2001

Summary
In this chapter, you learned about Multi-CE (MCE). MCE enables a switch to
function as a Customer Edge (CE) device of multiple VPN instances in a BGP/MPLS
VPN network, thus reducing network equipment investment.
You learned about MCE features and supported platforms. MCE use cases such as multi-tenant datacenters, overlapping IP subnets, isolated management networks, and others were discussed.
You learned the basic configuration steps for configuring MCE including:
i. Define a new VPN instance.
ii. Set Route-Distinguisher
iii. Bind an L3 interface to the VPN-instance
iv. Configure L3 interface IP address
v. Optionally, configure L3 dynamic or static routing
Advanced MCE configuration options were also discussed:
Routing table limits
Route leaking (both static and dynamic)
Management access VPN instances

Learning Check
Answer each of the questions below.
1. Is MBGP required to implement MCE on CE devices?
a. Yes, as route leaking requires MBGP.


b. Yes, otherwise routes are not advertised to PE devices.


c. No, MCE does not require MBGP except when routing for directly connected subnets in different VPN
instances.
d. No, MCE only uses static routing to route between subnets in different VPN instances.

2. Which components are part of a VPN instance (Choose four)?


a. Separate LFIB
b. Global routing table
c. VPNv4 routes
d. Public routing table
e. Separate routing table
f. Interfaces bound to the VPN instance
g. RD
3. Which interface type cannot be allocated to a VPN instance?
a. Layer 3 VLAN interfaces
b. Routed ports
c. Loopback interfaces
d. Layer 2 VLAN interfaces
e. Routed subinterfaces
4. An administrator has configured IMC in a VPN instance with the name "management". IMC is not receiving SNMP trap messages. How does the administrator resolve this?
a. Move the IMC server out of the VPN instance as this is an unsupported setup.
b. Ensure that the "Automatically register to receive SNMP traps from supported devices" option is checked within IMC.
c. Configure and select the VPN instance within the IMC GUI interface.
d. Manually configure the SNMP target host for traps on the network device.

Learning Check Answers


1. c
2. a, e, f, g
3. d


4. d


4 DCB - Datacenter Bridging

EXAM OBJECTIVES
In this chapter, you learn to:
Describe the DCB Protocols.
Understand the DCBX protocol.
Understand and configure PFC.
Understand and configure ETS.
Understand and configure APP.
Understand Congestion Notification.
Describe datacenter use cases for CEE.

INTRODUCTION
Using separate, single-purpose networks for data and storage can increase
complexity and cost, as compared to a converged network solution. Datacenter
Bridging (DCB) is a technology that enables the consolidation of IP-based LAN
traffic and block-based storage traffic onto a single converged Ethernet network. This
can help to eliminate the need to build separate infrastructures for LAN systems that
carry typical end-user data traffic, and SAN systems that carry storage-specific
communications.
You will learn about the individual standards-based protocols that enable DCB, and
how they enable communication between devices, provide for lossless Ethernet
transmissions, handle flow control, and support Quality of Service (QoS).


ASSUMED KNOWLEDGE
You should have a basic understanding of Data Center Bridging (DCB) protocols and
configuration parameters and be familiar with the features of Fibre Channel Protocol
(FCP), InfiniBand (IB), and iSCSI.

DCB Topics
This chapter will introduce the concepts related to DCB, and review DCB
configuration parameters. Priority-based Flow Control (PFC) will be explored,
along with PFC configuration.
The operation and configuration of the Application TLV (APP) will be discussed, along with an overview of ETS. You will also learn how to configure ETS.

Datacenter Bridging - Introduction
Data Center Bridging consists of a collection of standards that extend the
functionality of Ethernet. Various vendors have used different acronyms, such as
CEE, when discussing or promoting their own DCB-based solutions. However, the
IEEE standards group uses the term DCB to describe the suite of technologies that
enable FCoE to send Fibre Channel communications over Ethernet systems.
The motivation for DCB is to reduce the cost and complexity of running separate,
single-purpose networks for SANs and LANs. The consolidation of data center
infrastructure reduces the number of physical components, along with the associated
costs of rack space, device power, and cooling costs.
DCB offers advantages over previous technologies such as Fibre Channel Protocol
(FCP), InfiniBand (IB), and iSCSI, as described below and shown in Figure 4-1.


Figure 4-1: DCB vs Previous Technologies

DCB vs Previous Technologies


Fibre Channel Protocol (FCP) is a lightweight mapping of SCSI to the Fibre Channel transport protocol. Fibre Channel can carry FCP and IP traffic to create a
converged network. However, the cost of FC prevented widespread use, except for
large data center SANs.
InfiniBand (IB) provides for a converged network using SCSI Remote Direct
Memory Access Protocol (SRP) or iSCSI Extensions for RDMA (iSER).
Widespread deployment was also limited due to cost, and the complex gateway and
routers needed to translate from IB to native FC storage devices.
Internet SCSI (iSCSI) provides a direct SCSI to TCP/IP mapping layer. Due to its
lower cost, iSCSI can appeal to small-medium sized deployments. However, scaling
the systems requires more complexity and cost in the form of iSCSI to FC gateways,
and so this solution is often avoided by larger enterprises.


FC over IP (FCIP) and FC Protocol (FCP) can map FC characteristics to LANs and
WANs. Again, these protocols were not widely adopted due to complexity, lack of
scalability, and cost.
Now that 10GbE is becoming more widespread, Fibre Channel over Ethernet (FCoE)
is the next attempt to converge block storage protocols onto Ethernet. FCoE embeds
FC frames within Ethernet frames, and relies on the Ethernet infrastructure that has
been enhanced by implementing IEEE Data Center Bridging (DCB) standards. The
individual protocols and components that enable FCoE traffic to be supported over
Ethernet are described below.

DCB Components
The standards-based protocols and components of DCB are shown in Figure 4-2 and
introduced below.

Figure 4-2: DCB Components

DCBX: The Data Center Bridging eXchange protocol is used to communicate key
parameters between DCB-capable devices. The information exchanged is largely
centered on PFC, APP, and ETS functionality.
PFC: Priority-based Flow Control helps to ensure that Ethernet can provide the
lossless frame delivery that FCoE requires.
APP: Provides instructions to the CNA about application-to-CoS mapping.

ETS: Enhanced Transmission Selection enables control over how much


bandwidth LAN, SAN, and other traffic types can use over a converged Ethernet
link.

CNA: A Converged Network Adapter can support both Fibre Channel and
traditional LAN communications on a single interface.
CN: Congestion Notification supports end-to-end flow control in an attempt to
localize the effects of congestion to the device that is causing it.


DCB Feature Overview


DCBX enables devices to discover peers, detect configuration parameters and
configure peer CNAs. It is an extension to LLDP, adding new type-length-values
(TLVs) that enable the exchange PFC, APP, and ETS information.
In Figure 4-3, the server access switch sends DCBX frames to automatically configure the server's CNA. The APP TLV is used to inform the CNA that FCoE frames are to be marked with an 802.1p value of 3.

Figure 4-3: DCB Feature Overview

ETS controls bandwidth utilization. In this example, the FCoE traffic (802.1p = 3)
shall be mapped to ETS queue 1, and have 60% of the bandwidth reserved (during
times of congestion). All other traffic (802.1p = 0-2, 4-7) shall be mapped to ETS
queue 0, and have access to 40% of the bandwidth.
PFC is an enhancement to the Ethernet Pause feature, which uses Pause frames to
pause all traffic on a link. It is as if PFC is logically dividing a single physical link
into multiple virtual links, reserving one such link for FCoE. Thus, the pause
mechanism can stop all traffic other than that specified as no-drop, ensuring that
FCoE frames are not dropped due to a short-lived burst of LAN traffic.
As shown in Figure 4-3, this PFC information is also passed (via DCBX) between
the server access switch and the storage switch. This ensures that the lossless frame
requirement for FCoE is enforced all the way between the Server CNA and target
SAN.

DCB - Supported Products

The features introduced thus far are available on all HP datacenter switches running
Comware 7, including both fixed configuration access switches and chassis-based
core switches.

Access switches
At the access layer, the 5900 Switch Series support DCB-compliant features. This
also includes the HP 5920 and 5930 switches.

Core switches
Chassis-based switches suitable for deployment at the datacenter core also support
DCB. This includes the 11900/12500/12900 Switch Series.

Full HP Supported configuration limited to select products
HP has gone to great lengths to fully ensure that various product combinations support
the features you need. HP has created the Single Point of Connectivity Knowledge
(SPOCK) as the primary portal for detailed information about HP storage products. It
is highly recommended that you consult SPOCK to ensure that you are deploying
systems that have been fully tested by HP. The current URL for SPOCK is
http://h20272.www2.hp.com/.

Design Considerations
Migration from traditional storage to FCoE-based systems can be gradual. Deploy
FCoE first at the server-to-network edge. Then migrate further into aggregation/core
layers and storage devices over time. Transitioning the server-to-network edge first
to accommodate FCoE/DCB will maintain existing network architecture, management
roles, and the existing SAN and LAN topologies. This approach offers the greatest
benefit and simplification without disrupting the data center architecture.
You should also consider implementing FCoE only with those servers requiring
access to FC SAN targets. Most data center assets only need a LAN connection, as
opposed to both LAN and SAN connections. You should use CNAs only with the
servers that actually benefit from them. Don't needlessly change the entire
infrastructure.
ProLiant c-Class BladeSystem G7 and later blade servers come with HP FlexFabric
adapters (HP CNAs) as the standard LAN-on-Motherboard (LOM) devices. This


provides a very cost effective adoption of FCoE technology. FlexFabric modules


eliminate up to 95% of network sprawl at the server edge. One device converges
traffic inside enclosures and directly connects to LANs and SANs.
During design and implementation, remember that Ethernet Maximum Transmission
Unit (MTU), or maximum frame size is 1518 bytes, while the MTU for FCoE is 2240
bytes. This so-called baby jumbo frame size must be supported on all devices
between FCoE-capable servers and storage systems.
Also, FCoE uses specific MAC addresses, as listed in Figure 4-4. You must ensure
that these MACs are not blocked.

Figure 4-4: Design Considerations

DCBX - Data Center Bridging eXchange


DCBX is an extension to LLDP that facilitates connectivity between DCB-enabled
devices. As defined in the IEEE 802.1Qaz standard, DCBX accomplishes this by
adding TLVs to LLDP. This can be compared to LLDP-MED, which defines
extensions to LLDP to facilitate connectivity between Voice-over-IP (VoIP) endpoints
and switches.
Version 1.00 was the initial public version of DCBX. The main goal for version 1.00
was to enable automatic, priority-based flow control mapping. This included limited
support for L2 marking, allowing the switch to inform the CNA that FCoE frames
should be placed into a specific queue.
Version 1.01 enhances this marking capability. In addition to the L2-based
classification, L4-based classification is also supported. Thus, TCP and UDP ports
could be used as a basis for classification. This is critical for iSCSI, which is a
TCP-based storage protocol.


After v1.01, more refinements were made, and the IEEE 802.1Qaz standard was
ratified. A key enhancement concerns the CNA's operational settings. In previous
versions, a switch announced information toward the CNA, and the CNA only
announced its original settings back to the switch. This could make troubleshooting
difficult, since there is no definitive method to ensure that the settings were actually
accepted by the CNA. With the 802.1Qaz standard version, the LLDP output reveals
both the recommended settings, as announced by the switch, and the operational
settings actually in use on the CNA.
IEEE 802.1Qaz is the default version enabled in HP datacenter switches. While you
can manually configure the version in use, it is a best practice to allow the switch to
automatically detect which version to use. The switch will detect whether an attached
CNA only supports v1.00 or v1.01, and will adjust accordingly.

Configuration Steps for DCBX


Following are the steps to configure DCBX on HP Datacenter switches running
Comware 7.
1. Enable global LLDP
2. Enable the DCBX TLVs for LLDP on the interface
3. Verify
These steps will be detailed below.

DCBX Step 1: Enable Global LLDP


The first step to enabling DCBX is to ensure that LLDP is enabled globally on the
device. With many Comware-based devices, LLDP is globally enabled by default,
but some Comware-based devices require the command shown. The easiest approach
is simply to issue the command, as shown in Figure 4-5, and verify that the feature is
enabled with the display lldp status command.

Figure 4-5: DCBX Step 1: Enable Global LLDP
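The commands below are a minimal sketch of this step. On most Comware 7 datacenter switches the global command is lldp global enable (some older Comware releases use lldp enable instead), so verify the exact syntax on your platform:
  system-view
  lldp global enable
  display lldp status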

DCBX Step 2: Enable Interface LLDP DCBX TLVs

The next step involves enabling the DCBX-specific TLVs on the interface to which a
CNA is attached. Assuming LLDP has been enabled globally, LLDP is enabled by
default at the interface level. However, the TLVs for DCBX must be manually
enabled, as shown in Figure 4-6.

Figure 4-6: DCBX Step 2: Enable Interface LLDP DCBX TLVs
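As a sketch, assuming the CNA is attached to Ten-GigabitEthernet 1/0/1 (the interface number is only an example), the DCBX TLVs are enabled with:
  interface Ten-GigabitEthernet 1/0/1
   lldp tlv-enable dot1-tlv dcbx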

DCBX Step 3: Verify


Figure 4-7 shows the syntax to verify LLDP has been configured on the Comware
switch to support the DCBX TLVs. Notice that the DEFAULT column indicates
that DCBX TLVs are not enabled by default, and that YES appears in the STATUS
column for DCBX TLVs.

Figure 4-7: DCBX Step 3: Verify
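A hedged example of the verification, again assuming interface Ten-GigabitEthernet 1/0/1; the DCBX row of the output should show YES under STATUS:
  display lldp tlv-config interface Ten-GigabitEthernet 1/0/1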

Ethernet Flow Control


Priority-based Flow Control (PFC) is defined in the IEEE 802.1Qbb standard.
802.1Qbb allows the network to provide link-level flow control for different classes
of traffic. The goal is to provide lossless Ethernet, which is a strict requirement for
FCoE, because Fibre Channel assumes that frames will not be dropped under
normal circumstances.


Ethernet, however, assumes that frames can be dropped, because higher-level
protocols, such as TCP, deal with this issue. As shown in Figure 4-8, Ethernet does
include a "pause" feature which can be used for flow control. As switch buffers
begin to fill and frame drops become imminent, the switch can send a pause message
on the link. This causes the upstream device to stop sending frames.

Figure 4-8: Ethernet Flow Control

The problem with this mechanism is that there is no way to pause only certain types
of traffic. Either all frames are paused, or none are paused. This is actually fine for
FCoE traffic. It is best to wait until the switch indicates that it can once again accept
frames, instead of risking dropped frames, which is unacceptable.
However, during this time, all TCP/IP traffic on a converged network would also be
paused. TCP/IP protocol stacks are designed to handle packet loss at upper layers,
such as TCP, and in some application-layer protocols that use UDP. Therefore,
pausing TCP/IP frames creates unnecessary delays in transmission, reducing
overall performance for LAN traffic. The solution is a priority-based flow
control mechanism, as described below.

PFC - Enhancing Ethernet Flow Control


Priority-based Flow Control, as the name implies, enhances the functionality of the
original Ethernet flow control mechanism. Specifically, 802.1Qbb only pauses
frames that are tagged with a certain 802.1p CoS value, such as FCoE traffic.
Meanwhile, LAN traffic, marked with different CoS values, would be unaffected.
Beyond the lossless requirement of FCoE, consider the typical network shown in
Figure 4-9. It is possible that the downstream storage network may experience high
utilization, causing buffers to fill up. The PFC mechanism issues a PAUSE frame for
storage traffic, and so the CNA stops transmitting.


Figure 4-9: PFC Enhancing Ethernet Flow Control

Since the downstream data network is not experiencing congestion, there is no reason
to pause this traffic. As mentioned, TCP/IP stacks can handle occasional frame drops
at upper layers, and in this scenario, there is little danger of drops in the data network
anyway.
In this manner, PFC provides a lossless Ethernet medium for FCoE traffic, without
negatively affecting LAN traffic.

PFC - Configuration Modes
There are two available configuration modes for PFC. Manual mode is used for
switch-to-switch links, since switches do not use DCBX to configure each other. Because
no negotiation frames are sent between switches, each switch must be locally
configured with compatible PFC parameters.
Automatic mode is used for links connecting switches to endpoint devices, such as
servers and storage systems. For these links, only the switch needs to be configured for
PFC. This is because switches use DCBX to exchange configuration information with
the endpoint CNA, which can adopt the switch's proposed configuration.

Configuration Steps for PFC Manual Mode


Figure 4-10 introduces the steps required to configure PFC for switch-to-switch
links. This includes enabling PFC on the interface in manual mode, and specifying the
802.1p value to be used for lossless traffic.


Figure 4-10: Configuration Steps for PFC Manual Mode

PFC Manual Step 1: Enable Interface PFC Mode
Step 1, as shown in Figure 4-11, involves enabling PFC on the interfaces that connect
to other switches. This configuration would of course be repeated at the switch on the
other side of the link.

Figure 4-11: PFC Manual Step 1: Enable Interface PFC Mode
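A minimal sketch, assuming the inter-switch link is Ten-GigabitEthernet 1/0/50 (an example interface only):
  interface Ten-GigabitEthernet 1/0/50
   priority-flow-control enable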

PFC Manual Step 2: Enable Lossless for Dot1p


The second step is also configured at the interface, informing the switch which
802.1p value is to be used for lossless Ethernet. In the example in Figure 4-12, an
802.1p CoS value of 3 is specified.

Figure 4-12: PFC Manual Step 2: Enable Lossless for Dot1p
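Continuing the same example interface, the lossless 802.1p value is specified with:
  interface Ten-GigabitEthernet 1/0/50
   priority-flow-control no-drop dot1p 3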

Although the 802.1Qbb standard supports multiple lossless values, most hardware
only supports a single lossless queue. For this reason, only one 802.1p value may be
specified for lossless, no-drop service. Since FCoE-based traffic is the reason for
using PFC, this does not create a practical limitation.

Configuration Steps for PFC Auto Mode


Before configuring PFC in auto mode, DCBX must first be enabled on the interface.
This was discussed in the previous section of this chapter.
The steps to configure PFC for switch-to-endpoint links include enabling PFC on the


interface in auto mode, specifying the 802.1p value to be used for lossless traffic, and
then verifying that the configuration has been successful.

PFC Auto Step 1: Enable Interface PFC Mode


The configuration for PFC in auto mode is similar to that for configuring manual
mode. The difference is in the use of the keyword auto, as shown in Figure 4-13.

Figure 4-13: PFC Auto Step 1: Enable Interface PFC Mode
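A sketch for a server-facing port, assuming the CNA is attached to Ten-GigabitEthernet 1/0/49 (an example interface only):
  interface Ten-GigabitEthernet 1/0/49
   priority-flow-control auto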

PFC Auto Step 2: Enable Lossless for Dot1p


The second step for PFC auto is identical to that for PFC manual mode. The 802.1p
value to be used for lossless Ethernet is specified at the interface configuration level.
In the example in Figure 4-14, an 802.1p CoS value of 3 is specified.

Figure 4-14: PFC Auto Step 2: Enable Lossless for Dot1p

PFC Auto Step 3: Verify


The final step involves validating the configuration. LLDP local information is
displayed for the specific interface configured, to verify which 802.1p value is
enabled for lossless Ethernet. In Figure 4-15, much of the initial output is not
displayed, in order to focus on pertinent DCBX PFC information. In bold, you can
see that PFC lossless Ethernet is enabled for the 802.1p value of 3, as indicated by the
number 1. It is off for other 802.1p values, as indicated by a 0.


Figure 4-15: PFC Auto Step 3: Verify

In the second example in Figure 4-15, LLDP neighbor information is displayed for the
specific interface configured. Notice that the verbose keyword is necessary to see
detailed information, such as which PFC values have been accepted and are currently
configured on the endpoint CNA.
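As a sketch of the verification commands, assuming the same example interface:
  display lldp local-information interface Ten-GigabitEthernet 1/0/49
  display lldp neighbor-information interface Ten-GigabitEthernet 1/0/49 verbose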

APP - Application TLV
APP is defined by the DCBX standard, and allows the switch to program application-layer
QoS rules on the CNA. For a typical switch configuration, the admin must
define access rules to be used as match conditions for a QoS policy. The policy is
then applied to a specific interface, so when data exits the interface, it is
checked against the classifier and some action is taken. For example, the action could
be to elevate or decrease its priority, place it in a different queue, or drop the packet.
This operation happens in an ASIC on the switch. The CNA has a similar ASIC, and
so also has the capability of performing traffic selection and queuing operations.
However, some server administrators are not fluent with these types of rules.
The APP TLV allows the network administrator to configure the switch, which will
propose QoS policy to the CNA. In this way, the CNA dynamically learns QoS rules,
and uses them when it transmits frames to the network.
To implement this functionality, traditional QoS mechanisms must be defined, and
then applied to the APP TLV feature. This process starts by defining a classifier. With
APP, traffic can be classified based on the Layer 2 Ethertype field, or on the Layer 4


TCP or UDP destination port number. For example, FCoE uses Ethertype 0x8906,
and all iSCSI traffic uses TCP destination port 3260.
Traffic is classified by using advanced ACLs, which can permit or deny traffic based
on several different criteria. This includes protocol, source and destination port,
source and destination IP address, and more.
However, when used to select traffic for APP, we lack the wealth of rules of a
traditional switch. The hardware on both the switch and CNA can understand this
advanced functionality, but the APP TLV is a fairly simple, lightweight system that is
limited in its capability to deliver information. The APP TLV only accommodates the
exchange of Layer 2 protocol Ethertype, or Layer 4 TCP/UDP destination port
number.
This means that when you configure an ACL in order to classify traffic for APP, all
fields are ignored except Ethertype and destination port.
To configure QoS on an HP switch running Comware, an ACL is defined, and then
bound to a traffic classifier. The classifier is an object that describes the traffic that
should receive certain behaviors, or be treated in a specific way by queuing services.
When configuring QoS specifically for DCBX, only two types of ACLs may be used:
an Ethernet ACL or an advanced ACL.
Classifiers can have multiple conditions, and Boolean logic can be used to control
these match criteria. Should multiple criteria be specified as match conditions for a
classifier, a logical AND operator is used by default. Therefore, all criteria would
have to match in order for the traffic to be considered part of the class.
For the APP TLV, a Boolean OR operator must always be used. For example, you
may create an Ethernet ACL to specify FCoE traffic, and an advanced ACL to specify
iSCSI traffic, and then apply both of these ACLs as match conditions in a classifier
object. If you use a logical AND in this case, the condition would never be met, since
packets cannot be both iSCSI AND FCoE. In that case, the classifier will be ignored.
You define a classifier to select appropriate traffic, and then you define a behavior to
specify how that traffic is processed. A behavior defines what actions are to be taken
when the condition is matched.
In this example, we want to ensure that a certain class is marked with an 802.1p CoS
value. This remarked value is sent to the server CNA via the APP TLV, which the
CNA accepts and conforms to.
A QoS policy consists of a set of classifiers, which are bound to certain behaviors.


QoS policy classifiers can be defined for traditional usage, to locally modify the
switch's own QoS mechanisms. Classifiers may also be defined for the APP TLV, to
modify the QoS mechanism on an attached server's CNA. You must inform the switch
of this by using the mode dcbx syntax. Only the rules that have this keyword
will be sent to the CNA.
This configuration model is quite flexible. You can decide which rules are locally
significant, to modify traffic classification and behavior for the switch itself, and
which rules are to be sent toward the CNA to modify its behavior.
Once classifiers and behaviors are defined in a policy, that policy must be applied in
order to take effect. This QoS policy must be applied in the outbound direction,
either to an interface, or at the global configuration level.

Configuration Steps for APP


Figure 4-16 shows the steps to configure the APP TLV feature. DCBX configuration
is a prerequisite to configuring the APP TLV feature. Then ACLs are configured and
applied to a classifier object. Behavior objects are then defined, and the two are tied
together into a QoS policy.
Finally, the QoS policy is activated, and the configuration is verified.


Figure 4-16: Configuration Steps for APP

APP Step 1: Configure Traffic ACLs for Layer 2
The first step is to configure ACLs for Layer 2 traffic classes. Recall that Ethernet
ACLs are used to describe FCoE frames at Ethertype 0x8906.
Figure 4-17 shows an example ACL, configured to specify Ethertype 0x8906 for
FCoE. The number 4000 is used in this example, since Ethernet ACLs are specified
by the numbers 4000-4999. Optionally, you can also create named ACLs.

Figure 4-17: APP Step 1: Configure Traffic ACLs for Layer 2

Note that you must specify an exact match on this Ethertype by using an all-ones
mask of 0xFFFF. This is not an inverse or wildcard mask, so 0xFFFF specifies that
the entire pattern of 8906 must match exactly. The example also shows how
to add comments to an ACL for documentation purposes.


Remember that the ACL simply provides a description of the traffic for classification
purposes, not for security purposes. This helps to explain why the use of permit or
deny has no effect in this use case. Whether you use permit or deny, the traffic
indicated will be considered a match.
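A minimal sketch of such an ACL (the ACL number is taken from the example above; comments are omitted here):
  acl number 4000
   rule 0 permit type 8906 ffff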

APP Step 2: Configure Traffic ACLs for Layer 4
As in the previous step, this step involves defining an ACL, to be used for traffic
classification. Instead of an Ethertype ACL for Layer 2 traffic, an advanced ACL is
used, typically to select iSCSI traffic.
As before, the permit or deny keyword is not relevant. Also, for the DCBX APP TLV
feature, only destination port info is analyzed. Source port, IP source address, and the
IP destination address fields are ignored.
Figure 4-18 shows a typical example, used to specify iSCSI traffic at TCP port 3260.
The ACL number used is 3000, since advanced ACLs are in the range 3000-3999.

Figure 4-18: APP Step 2: Configure Traffic ACLs for Layer 4
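A corresponding sketch for the advanced ACL, using number 3000 as in the example:
  acl number 3000
   rule 0 permit tcp destination-port eq 3260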

APP Step 3: Configure QOS Traffic Classifier


Once an ACL is created, it must be bound to a traffic classifier. QoS traffic
classifiers can group one or more ACLs. You must be careful to use the OR operator,
since the default operator is AND. Since it is impossible for a packet to be both
FCoE and iSCSI, no traffic would match your classifier.
The top example in Figure 4-19 reveals how to create a single classifier with two
criteria. You might do this if you wanted to specify a lossless Ethernet service for
both FCoE and iSCSI traffic.


Figure 4-19: APP Step 3: Configure QOS Traffic Classifier

If you are only interested in specifying FCoE traffic, the bottom example in Figure
4-19 could be used. In this example, there is only one match criterion. However, the
OR operator must still be used. Even with a single match criterion, the DCBX module
will ignore the classifier unless the OR operator is used.
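A sketch of both variants, assuming the ACL numbers used above; the classifier names are chosen only for illustration:
  traffic classifier STORAGE operator or
   if-match acl 4000
   if-match acl 3000
  traffic classifier FCOE operator or
   if-match acl 4000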

APP Step 4: Configure QOS Traffic Behavior


Now that classifiers are configured, traffic behaviors are defined. This describes the
action to be taken on a class. The DCBX module will only parse the dot1p behavior.
While the CNA ASIC may be capable of more advanced behaviors, the APP TLV is
limited to communicating this single behavior.
You may recall from the previous section that with PFC, traffic marked with a
specific dot1p value should receive no-drop, or lossless Ethernet service. The
purpose of configuring the APP feature is to ensure that appropriate traffic is actually
marked with this 802.1p value, so PFC can do its job.
Since we configured PFC to provide lossless Ethernet for anything marked with an
802.1p value of 3, we configure APP to mark appropriate packets with that value.
Figure 4-20 shows an example behavior for storage.

Figure 4-20: APP Step 4: Configure QOS Traffic Behavior
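A minimal sketch of such a behavior (the name is an example):
  traffic behavior STORAGE
   remark dot1p 3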


APP Step 5: Configure QOS Policy


In this step, shown in Figure 4-21, the classifier and behavior are bound together in a
QoS Policy. QoS policies are processed much like many ACLs are processed, in a
top-down fashion. This means that the order of rules is critical.

Figure 4-21: APP Step 5: Configure QOS Policy

There are no rule numbers that can be used to change the order, should you
accidentally enter rules in the wrong order. You must remove the policy rules and
reapply them in the proper order. The critical element here is the integration of the
QoS policy with DCBX. Remember, only QoS policy rules that include the mode
dcbx option will be handled by DCBX, and communicated from the switch to the
CNA.
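A sketch tying the example classifier and behavior together; the mode dcbx keyword is what causes the rule to be sent to the CNA:
  qos policy DCBX
   classifier STORAGE behavior STORAGE mode dcbx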

APP Step 6: Activate the QoS Policy


The final configuration step is to activate the QoS policy. This can be done at the
global or interface level. In either case, the policy must be applied in the outbound
direction for DCBX. In the example in Figure 4-22, interface Ten-GigabitEthernet 1/0/49
connects to a server CNA, so the policy is applied outbound on this interface.

Figure 4-22: APP Step 6: Activate the QoS Policy
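A sketch of both activation options, using the example policy name:
  interface Ten-GigabitEthernet 1/0/49
   qos apply policy DCBX outbound
or, globally:
  qos apply policy DCBX global outbound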

APP Step 7: Verify


Figure 4-23 reveals how to validate your configuration efforts. The top example
shows how to verify what settings were proposed by the switch. The output has been
truncated to focus on pertinent information. You can see that frames with an Ethertype
of 0x8906 are assigned to a QoS map value of 0x8. As we'll see below, this translates
to an 802.1p value of 3.


Figure 4-23: APP Step 7: Verify

The bottom example of Figure 4-23 reveals what information the neighboring CNA
announces back to the switch. This validates whether the CNA has accepted the
proposed values.
Recall from our earlier discussion of DCBX that with the pre-standard versions v1.00
and v1.01, a CNA will only announce its originally configured settings, with no
indication of whether or not it has accepted the proposed values. In that case, you
would only be able to infer that the values have been accepted by noting successful
operation of your system. The IEEE standard version of DCBX will display the
accepted, operational values, as shown in the example.
Figure 4-24 shows the 802.1p-to-CoS map values used for Comware 7 devices. This
explains why a value of 0x8 was displayed in the previous LLDP output. Comware
switches map a CoS hex value of 0x8 to an 802.1p value of 3.


Figure 4-24: APP Step 7: Verify
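One way to read these values, based on the mapping in Figure 4-24: the map value is a bitmap of priorities, so 0x8 is binary 1000 (only bit 3 set), which corresponds to 802.1p priority 3, while 0x4 is binary 0100 (bit 2 set), corresponding to priority 2.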

APP - Other examples
Figure 4-25 shows some other pertinent examples related to APP TLV functionality.
In the top example, both iSCSI and FCoE are proposed to the CNA as being marked
with an 802.1p value of 3 (Comware CoS 0x8).


Figure 4-25: APP - Other examples

The second example in Figure 4-25 hints at additional capabilities. As before, FCoE
Ethertype 0x8906 is assigned to CoS map 0x8 (802.1p = 3). Also ports 8000 and
6600 are assigned to CoS map 0x4 (802.1p = 2).
Port 6600 is the port used for VMWare vMotion, and port 8000 is used for some
other application of interest to the network administrator. Now that these applications
will be marked as indicated, QoS policy can be implemented to control this traffic.
For example, an administrator can reserve 1Gbps for vMotion traffic, 2Gbps for the
data application, and 4Gbps for storage.

ETS - Enhanced Transmission Selection


While the APP TLV allows us to assign specific traffic classes to specific dot1p
values, it does not allow us to specify the bandwidth and queuing mechanisms used
by the CNA. So it is possible that all dot1p values would be assigned to the same
queue on the CNA. Marking traffic types with a specific 802.1p value will have no
effect if all the 802.1p values are processed by the same queue.
ETS, as defined by the 802.1Qaz standard, allows the switch to specify which
802.1p value should be processed by which queue on the CNA, and how much
bandwidth should be available for each queue.
Marking protocols, such as 802.1p at Layer 2, and DSCP at Layer 3, have been
standardized for years. However, actual queuing and scheduling mechanisms are


vendor defined. So, while we can mark packets in a standard way, how those packets
are actually processed can be unique for each vendor.
It is true that most vendors base their queue service on common mechanisms, such as
weighted fair queuing, weighted round-robin queuing, or strict priority queuing. Still,
the specifics of these mechanisms are not standardized. There was no standard for
specifying how to service packets with a specific marking.
802.1Qaz defines such a standard. It describes ASIC queue scheduling and
bandwidth allocation. The standard not only describes how scheduling should be
done, it also defines how the CNA can be programmed from the network switch for
conformance. Thus, the switch can control how the CNA processes frames outbound,
back toward the switch.
The switch controls the number of queues (maximum 8) and CoS-to-queue mapping
on the CNA. So the switch can dictate which 802.1p values map to which queues.
For example the switch can indicate that packets marked with CoS value 3 shall be
placed into queue number 2.
The scheduling algorithm can also be controlled. Weights, which essentially
translate to bandwidth, can be assigned to queues.
ETS can control the number of queues used on the CNA. While the adapter may
initially be configured to utilize two queues (one for data and one for storage traffic),
it can be configured to leverage more queues.
ETS allows us the option to assign particular dot1p values to specific queues. By
default a single dot1p class is assigned to a single queue. Most physical switches
support 8 queues per interface, so each of the eight 802.1p values gets its own queue.
This mapping can be customized by modifying the switch's dot1p-to-LP QoS map.
There is one such map for the entire switch, so you cannot have unique mappings per
interface. If this mapping is changed, it applies both to how the local switch processes
frames and to how the CNA will be instructed to process frames.
The default maps each 802.1p value to its own queue. In Comware terminology, this
means that each of the dot1p values is assigned to a unique local-precedence map.
This local-precedence map controls how the Comware switch processes frames. If
the local precedence value is 0, then the Comware switch places frames in queue 0.
In the example in Figure 4-26, the default one-to-one mapping of 802.1p value to
local precedence has been changed. In this configuration, only the 802.1p value of 3
is assigned to its own local-precedence value, and therefore its own queue. All other


802.1p values share queue 0. Essentially, this sets up a scenario where all the data
shares a single queue, and the storage traffic gets a queue of its own.

Figure 4-26: ETS Enhanced Transmission Selection

Several queuing options are provided by the ETS standard, as described here.
Strict priority can be a good option to use for voice traffic. When congestion occurs,
traffic in higher strict priority queues will be serviced first. Lower priority queues
will not be serviced unless higher priority queues are empty. The risk of this
mechanism is that the strict priority queue can starve other queues.
For this reason, a credit-based mechanism has been introduced. The intention is to
provide strict priority queues, while mitigating the risk of queue starvation by
enforcing a rate limit.
This is a very good mechanism to use when there is a mixture of traffic types. For
example, VoIP traffic requires low delay and minimal variations in delay (jitter). The
strict priority mechanism ensures that the VoIP packets, placed in the strict queue,
will be serviced preferentially. However, the credit-based rate limit prevents these
packets from starving other queues. This mechanism has not been implemented yet.
For this reason, most implementations focus on the ETS queuing mechanism.
Enhanced Transmission Selection can be seen as both a standard to exchange
information and a specific scheduling mechanism. This mechanism allows each
traffic class to have its own minimum bandwidth or service level. But if that class
isn't utilizing its bandwidth, it is available for other classes.
The generic nature of this ETS mechanism frees vendors to implement it into their
unique hardware platforms. For Comware, this definition matches the Weighted
Round Robin (WRR) scheme, and is implemented on an interface. Bandwidth
percentages are calculated based on this scheme, and those percentages are sent to
the CNA.
It is up to the CNA to receive these values, and configure its ASIC in such a way that
it respects these values.

Configuration Steps for ETS


As with PFC and the APP TLV, ETS configuration information is transported using
DCBX. Therefore, the configuration of DCBX is a prerequisite to the configuration
of ETS.
Figure 4-27 shows the three steps involved in configuring ETS. This includes
configuring CoS-to-Queue mapping, setting interface scheduling and weight
parameters, and then verifying the configuration. These steps are detailed in the
following sections.

Figure 4-27: Configuration Steps for ETS

ETS Step 1: QoS Map dot1p-lp


Modifying the QoS queue map is an optional step, since every switch already has a
mapping by default. Figure 4-28 indicates how to modify the default configuration on
a Comware switch for a two-queue configuration. You can see that an 802.1p value
of 3 is mapped to queue 1, while all other 802.1p values are mapped to queue 0.


Figure 4-28: ETS Step 1: QoS Map dot1p-lp

So, based on this mapping, only two queues will be used to process all traffic.
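A sketch of the two-queue mapping described above, entered in system view:
  qos map-table dot1p-lp
   import 3 export 1
   import 0 export 0
   import 1 export 0
   import 2 export 0
   import 4 export 0
   import 5 export 0
   import 6 export 0
   import 7 export 0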

ETS Step 2: Interface Scheduling and Weights


In the scenario in Figure 4-29, queues 0 and 1 are used to process all traffic. Based on
this, the ETS application will look at how those queues are actually configured on the
physical interface. This interface configuration will be translated to ETS values, to
be proposed to the server's CNA. With this in mind, the first task is to configure the
type of queuing. Since queue starvation is not desirable, WRR is to be configured.

Figure 4-29: ETS Step 2: Interface scheduling and weights

For WRR, two types of weights can be assigned: byte-count and weight value. For
ETS applications, specifying weights using byte-count is best, since it is a more
accurate way to specify bandwidth utilization. If a weight value is used instead,
packets are counted. Since packets are of variable length, you have less granular
control over bandwidth. Ten 500-byte packets represent much different bandwidth
utilization than ten 1500-byte packets.
In the example in Figure 4-29, queue 0 is assigned a byte-count of 4, and queue 1 is
assigned a byte-count of 6. If weight had been specified, six packets of 200 bytes


each means that 1200 bytes would be transmitted, while 4 packets at 1500 bytes
results in 6000 bytes transmitted. So the weight values may not accurately reflect
bandwidth. To have more accurate control, therefore, you should use byte count.
On the physical interface in this example, we are using byte-count weight values of
4 and 6, respectively. The configurable range is 1 to 15.
Since 4+6 = 10, queue 0 gets 4 out of every 10 bytes (40%), while queue 1 gets 6 out
of 10 (60%). This is a simple calculation when there are only two queues.
The configured values in the example will be sent to the CNA. It is up to the CNA to
program its ASIC to actually support these values.
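A sketch of this interface configuration, assuming the server-facing port Ten-GigabitEthernet 1/0/49 and the queue-ID form of the command (some Comware releases reference queues by name, such as be and af1, rather than by number):
  interface Ten-GigabitEthernet 1/0/49
   qos wrr byte-count
   qos wrr 0 group 1 byte-count 4
   qos wrr 1 group 1 byte-count 6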

ETS Step 2 Continued: A Weight Problem


The caveat for the example in Figure 4-30 is that the switch will actually calculate
the percentage that it is announcing using all the queues assigned by WRR. Since all
the queues are enabled for WRR, and they all have a default weight, we don't see the
expected percentages in the LLDP output above. We see that each queue gets 11%,
except for queue 3, which gets 17%. This is actually fairly close to our intended
targets.

Figure 4-30: ETS Step 2 Continued: A Weight Problem

As shown in Figure 4-30, 11+17 = 28, 11/28 = 39%, and 17/28 = 61%. However,
when we look at the output, it is not intuitive or obvious that we have achieved our
goal. Further, we intend to use two queues, but we see that all eight queues are in play.
To rectify this issue, we must ensure that only our intended queues are in use.


ETS Step 2 Continued: Assign Queues to SP


To ensure that only queues 0 and 1 are used, we can configure the other queues to use
the Strict Priority (SP) queuing mechanism. It is then vital that these other queues are
never actually used for anything. Otherwise, they could starve out the other two
queues on the interface. This is enforced with the local 802.1p to local-precedence
map. If this map doesn't assign any traffic to the other queues, they will remain idle.
The example in Figure 4-31 indicates how to assign these idle queues to use SP.

Figure 4-31: ETS Step 2 Continued: Assign Queues to SP
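Continuing the sketch on the same example interface, the unused queues are moved to the SP group:
  interface Ten-GigabitEthernet 1/0/49
   qos wrr 2 group sp
   qos wrr 3 group sp
   qos wrr 4 group sp
   qos wrr 5 group sp
   qos wrr 6 group sp
   qos wrr 7 group sp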

ETS Step 3: Verify Local Configuration


Now that the non-essential queues are assigned to SP, only queues 0 and 1 are
used for WRR. The LLDP output in Figure 4-32 shows that queue 0 receives
40% of the bandwidth, and queue 1 is assigned 60% of the available bandwidth.

Figure 4-32: ETS Step 3: Verify Local Configuration
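The same LLDP command used earlier also verifies the ETS values being announced, assuming the example interface:
  display lldp local-information interface Ten-GigabitEthernet 1/0/49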


Summary
In this chapter, you learned about the simplicity, cost, and feature benefits of Data
Center Bridging. You also learned about how DCB compares to previous attempts at
converging data and storage networks.
You learned about the specific protocols that define DCB, starting with DCBX. This
is the communication protocol used between converged network switches and
storage server CNA adapters.
You learned that PFC helps to ensure that a lossless Ethernet service is provided for
FCoE traffic. It does this by enhancing the standard Ethernet Pause mechanism,
enabling it to pause only frames marked with a specific 802.1p value.
The APP TLV ensures that both the switch and its attached server CNA are properly
marking frames. This ensures that both PFC and ETS can function properly.
While APP is responsible for marking frames, ETS controls how to treat frames
marked with a specific value. ETS standardizes how frames are queued for
transmission, how much bandwidth each queue receives, and the queuing mechanism
used for each queue.
Lastly, the CN protocol was discussed. You learned how CN differs from PFC in that
it is an end-to-end protocol, as opposed to a link-local protocol.

Learning Check
Answer each of the questions below.
1. Which of the following are DCB protocol components? (Choose four.)
a. DCBX the Datacenter Bridging exchange protocol.
b. PFC: Priority-based Flow Control
c. ETS: Enhanced Transmission Selection
d. EVI: Ethernet Virtual Interconnect
e. CN: Congestion Notification.
2. DCBX is an extension to LLDP that facilitates connectivity between DCB-enabled devices.
a. True.
b. False.


3. What is the name for the feature that provides a PAUSE mechanism per 802.1p
priority value to help ensure that storage traffic receives lossless service without
negatively impacting data traffic?
a. ETS
b. Congestion Notification
c. Standard Ethernet Flow Control
d. DCBX
e. PFC.
4. Which of the statements below accurately describe the Application TLV, or APP?
(Choose three.)
a. The APP TLV allows the network administrator to configure a switch, which
will then automatically propose QoS policy to the CNA.
b. To implement APP, traditional QoS mechanisms must be defined, and then
applied to the APP TLV feature.
c. A special set of QoS mechanisms are provided to deploy the APP TLV
feature.
d. The APP TLV only accommodates the exchange of Layer 2 Ethertype value,
or Layer 4 TCP/UDP destination port number.
e. The APP TLV can accommodate all of the classification mechanisms supported by a typical switch.

5. ETS allows the switch to specify which 802.1p value should be processed by
which queue on the CNA.
a. True.
b. False

Learning Check Answers


1. a, b, c, e
2. a
3. e
4. a, b, d
5. a


5 Fibre Channel over Ethernet

EXAM OBJECTIVES
In this chapter, you learn to:
Describe Fibre Channel basic operations.
Understand the roles and ports in an FC Fabric.
Configure a 5900CP for native FC connectivity.
Configure FCoE functionality for Server access.
Configure Fabric Extension.
Understand and configure Storage Area Networking (SAN) Zoning.
Describe and configure NPV mode.

INTRODUCTION
In this chapter, you will learn about the fundamental concepts surrounding native
Fibre Channel (FC) and Fibre Channel over Ethernet (FCoE) based SAN fabrics.
This includes a discussion of fabric components, connectivity and operation.
Specific topics related to fabric addressing, security, reliability, and redundancy are
to be covered, as well as how to perform initial configuration functions.
Additional concepts and configurations involve FCoE host access, Fabric expansion,
zoning for security, and N_Port Virtualization (NPV).

ASSUMED KNOWLEDGE
You should be familiar with FCoE mechanisms from Chapter 4.


What is a SAN?
A Storage Area Network (SAN), as shown in Figure 5-1, is a separate infrastructure
used for storage components. It is a network designed specifically for storage access.
Because of the critical nature of data storage, and the requirement for absolute fidelity, a
SAN design must be very resilient and redundant.

Figure 5-1: What is a SAN?

Historically, the SAN was segregated from LAN traffic. This was largely due to the
limited bandwidth of 100Mbps and 1Gbps Ethernet, along with a lack of any
standardized means to converge the two networks.
Most SANs leverage the Fibre Channel protocol to transmit data between servers
and storage systems, with additional capabilities for long-haul links, in case there
was a need for inter-site replication services.

SAN Components
The components that make up a SAN infrastructure are introduced in Figure 5-2 and
below.


Figure 5-2: SAN Components

Switches: create the fabric that interconnects SAN devices, in a similar way that
Ethernet switches enable connectivity for LAN devices. A switch fabric is
simply a group of switches that are interconnected to provide a scalable solution.
While scalability is often desired, SANs require a very low-latency system. For
this reason, as few switches as possible should be deployed to meet an
organization's SAN requirements.
Routers, bridges and gateways: Devices typically used to extend the SAN over
long distances. SAN Fibre Channel systems use flow-control mechanisms that
were designed for low-latency, short-haul networks. Specially designed routers,
bridges, and/or gateways have the ability to extend the reach of SAN technology
over long distances, while satisfying the requirement for quick responses to SAN
signaling frames. These devices can have more advanced features, such as
integrating multi-protocol systems, improving fault isolation, and more.
Storage devices: These are the disk subsystems used to actually store data,
available with a wide variety of capacities and capabilities. Often, storage
systems are deployed as a Redundant Array of Independent Disks (RAID), or in a
Just A Bunch of Drives (JBOD) configuration. Various virtualization
technologies can be leveraged with storage systems.


Servers: The devices that connect to a SAN with either a Host Bus Adapter
(HBA) or Converged Network Adapter (CNA).
Cabling and connectors: The medium over which digital signaling is transmitted
and received. As with LANs, both fibre optic and copper solutions are available.

HP Disk Storage Systems Portfolio


Figure 5-3 reveals some of the many solutions that HP provides, as it relates to SAN
systems. This includes systems for the SMB market, such as the StoreVirtual 4000. For
the midrange market, HP offers the 3PAR StoreServ 7000 and the P6000 EVA. Enterprise-class
storage solutions, such as the 3PAR StoreServ 7450, 10000, and XP P9500, are
also available.

Figure 5-3: HP Disk Storage Systems Portfolio

This list of available products and features is rapidly evolving. It is recommended
that you consult HP's Storage Single Point Of Connectivity Knowledge (SPOCK)
for detailed information about solutions, compatibility, and capability.
The current URL for HP Storage SPOCK is http://h20272.www2.hp.com/Index.aspx?lang=en&cc=us&hpappid=hppcf


Converged Networking - Cookbooks


HP storage and server groups work in a very strict configuration mode with regard to
storage systems. The goal is to minimize all possible risks with regard to platform
interoperability, firmware upgrades, version capabilities, and more. This is why the
storage and server group creates validated configurations by building a complete
system of switches, storage arrays, and servers with Converged Network Adapters
(CNAs) and Host Bus Adapters (HBAs), using specific firmware versions.
Various combinations are fully tested and validated to ensure the smoothest possible
deployment experience. As mentioned before, platforms, versions and firmware
upgrades are quickly evolving. As such, this study guide does not focus on specific
product combinations for deployment. The focus is instead placed on understanding
how these systems work, and how to configure them.

Host (Initiator)(Originator)
Consider a legacy, stand-alone server that is not connected to, nor using, a SAN, and
instead uses internally installed disk systems. Communication is initiated by the
server; it needs to either store or retrieve data from the disks, which passively wait
to receive and respond to these requests.
This aspect of data storage does not change after migration to a SAN-based solution.
Modern SAN systems simply move the storage out of the server's physical enclosure,
such that it is connected via some infrastructure, instead of being connected by a
local cable inside the server's enclosure. At a basic level, servers still request data
to be stored or retrieved from storage as before. This is why the host is referred to as
the Initiator, or originator, of SAN service requests.
Note
There are some specialized replication conditions in which the storage system
initiates communication with another storage system. Most of the time, the host
server is the initiator, as described above.
The actual file or storage system's independence from the servers is based on
Logical Unit Numbers (LUNs). A LUN is how each logical disk or volume is
identified by the SAN.
Inside the storage system target, many terabytes of raw data may be available. The
storage administrator can logically separate this physical storage array into unique


volumes, or LUNs. The administrator can now decide which LUNs are available to
be presented to specific hosts.
Figure 5-4 shows a host connected to a SAN with two Host Bus Adapters (HBAs).
This provides for redundant, multi-fabric connectivity. SAN-A and SAN-B are
completely isolated from each other, providing two separate paths between the host
initiator and the storage target. The SAN system could select a path to use, or it might
be configured to load-balance between the two paths.

Figure 5-4: Host (Initiator)(Originator)

In this scenario, SAN-A and SAN-B are physically isolated. This means that the
SANs have completely isolated control, management, and data planes. It also means
that firmware updates to the SAN-A fabric will have no effect on the other fabric,
SAN-B.
In order to take advantage of these two separate SAN fabrics, hosts must connect to
the SAN with adapters that are configured with Multi-Path I/O (MPIO) functionality.
Without MPIO, the server would not recognize the two paths to the same disk
system; thinking they were two unique disks, it would send different read/write
commands to what is actually one system. Of course, this would create serious
data corruption.

Disk Array (Target) (Responder)


In a SAN solution, a Disk Array is referred to as the target, or responder. It is the
target of host read/write requests, and responds to those requests. Typically the Disk
Array, as shown in Figure 5-5, will have multiple interfaces and controllers for


increased throughput, availability and redundancy.

Figure 5-5: Disk Array (Target) (Responder)

Like the host, the target system may have an interface connected to SAN-A and
another connected to SAN-B. The storage system most likely will have two separate
internal controllers, with each one responsible for communicating with its separate
SAN fabric.
Disk arrays are typically protected with a controller cache memory system with battery
backup. In case of a power outage, this write-back caching mechanism helps to
ensure data integrity.
Management software is typically deployed to perform replication functions, often across
multiple locations, so the disk array can replicate or make backups to remote locations
for disaster recovery purposes.

Nodes, Ports, and Links


Specific terminology is used to refer to the nodes, ports, and links that interconnect
SAN subsystems. The initiator and target interfaces to the fabric are called
N_Ports, or Node Ports. As shown in Figure 5-6, these N_Ports connect to a SAN
switch's F_Port, or Fabric Port. F_Ports are therefore at the edge of the SAN


fabric.

Figure 5-6: Nodes, Ports, and Links

If your SAN consists of a single switch, you would only have F_Ports on the switch,
connected to host and target N_Ports. If you extend your SAN fabric to include
multiple switches, E_Ports, or Expansion Ports, will be used to connect them.
While other port types are available, they are not relevant to the discussion here.
These port types will be reviewed in later chapters.

FC Frame and Addressing


As previously described, hosts initiate read/write requests to storage targets over a
SAN fabric. These request messages are carried inside a Fibre Channel (FC) frame.
This section will discuss FC frames, header format, packet flow, FC World Wide
Name (WWN), and the FC_ID.

Fibre Channel Frame


The FC frame begins with a Start of Frame (SOF) delimiter, and concludes with an
End-of-Frame (EOF) delimiter. The frame header contains addressing and other
information, as discussed in the next section.
There are optional headers that could be used to assist in things like encryption, for
example. The data payload field contains the actual data to be transmitted. Notice that
the payload field can be up to 2112 bytes, which is larger than the standard 1500-


byte maximum payload of an Ethernet frame. On a converged network, FCoE is used
to carry FC frames inside Ethernet frames. For this to be successful, jumbo frames
must be enabled on the Ethernet infrastructure.
The FC frame also includes a CRC to validate data integrity between host and target.
Figure 5-7 shows a data frame. There are also link control frames that are used to
acknowledge frame receipt, and also for link responses (Busy or Reject).

Figure 5-7: Fibre Channel Frame

Fibre Channel Frame Header


Figure 5-8 shows the individual fields of the FC frame header, as described below.

Figure 5-8: Fibre Channel Frame Header

R_CTL: Indicates frame type (data, ACK, or Link Response) and data type.
D_ID: The destination identifier indicates the destination of the frame. An initiator
must determine the D_ID of a target so it can originate a request. There are
several methods of determining the D_ID, as will be discussed below.


CS_CTL: The Class Specific Control field is used for QoS.


S_ID: The source identifier indicates the originator of the frame. It can either be
assigned by the fabric controller or administratively set.
TYPE: Indicates the upper-layer protocol being carried. In other words, it
indicates what is carried in the Data Payload.
F_CTL: Frame Control indicates various options, such as sequence information.
SEQ_ID: The sequence identifier is assigned by the sequence initiator, and is
unique within a given exchange.
DF_CTL: Indicates presence and size of optional header information.
SEQ_CNT: the Sequence Count is a 16-bit number that gets incremented on each
frame in a sequence. Storage data must be fragmented into pieces for transmission
and reassembled in the proper order upon arrival at the destination. SEQ_ID and
SEQ_CNT facilitate this process. A file being stored may be broken up into
several sequences, each with a unique SEQ_ID. That sequence is further
fragmented into 2112-byte pieces to fit into an FC frame. With these numbers the
destination can determine that it has received frame n of sequence SEQ_ID.
OX_ID: the Originator Exchange ID is filled in by the originator. This is used to
group related transmission sequences.
RX_ID: This value is set to 0xFFFF by the originator. Along with the OX_ID,
these values constitute a kind of nickname for any given exchange.
Parameter: The parameter field has multiple purposes. One of the most common is
to be used like the IP header's offset field, to indicate a relative offset location
for data, or to carry link control information.

Fibre Channel Terminology


Figure 5-9 highlights the characteristics of a frame, a sequence, and an exchange.


Figure 5-9: Fibre Channel Terminology

An exchange can be compared to a high-level SCSI read or write operation. Servers


need to read information from, or write information to a disk storage system. The
server sends information to the disk, and the disk responds back. It is possible that
the information sent to the disk subsystem is a very small request, such as "read the
10MB file named MyDoc.pdf," for example. Of course, the response is that the
entire 10MB file is then transmitted from the disk subsystem to the server, which
requires several frames.
The complete group of frames that belong to a single request is called an exchange.
So, an exchange is a bidirectional communication which can be compared to a
traditional SCSI read or write operation. It is the complete process of a server
sending a read request to a disk, and the resultant frames to deliver that file to the
server. The server confirms that successful receipt of requested data.
This is also true for a write operation. The host sends a simple write request, the disk
confirms that it is ready to perform this operation, and then server transfers the file.
An exchange consists of a number of sequences. This is a communication of one or
more frames in a single direction. So, a simple read request would be a single frame,
and the response could be, say, 50 frames. The 50 frames sent from the disk to the
server is a sequence.
At the lowest level is a single frame. This is the description of what should be read
from the disk or the actual payload that has been read from the disk and is now being
transmitted to the server. The frame carries Upper-Layer Protocol (ULP) data. For


storage traffic, this is the SCSI protocol, which is encapsulated in a Fibre Channel
frame.

SCSI (FCP) write operation


Figure 5-10 illustrates the relationship between SCSI operations and the Fibre
Channel Protocol (FCP) implementation (at the upper layer protocol level). Also
described is how those operations translate to Fibre Channel and how the FC layer
packages them for transmission.

Figure 5-10: SCSI (FCP) write operation

For a write operation, a server initiator starts with a write command. This is a
sequence that contains only a single frame. A transfer ready response confirms that
the disk is ready to fulfill this operation.
Then the server transmits five frames in a row, all part of sequence number 3. When
all data has been sent, the target system confirms the reception of all data by sending
a response frame with the sequence field set to a value of 4. This indicates that
all of sequence 3 data was received successfully, and so sequence 4 is expected next.
This scenario used five frames as an example. It is possible that fifty, one hundred, or
more frames could be transmitted in sequence, before an acknowledgement frame is
sent. The important thing is that all frames are received in order for the transmission
to be successful. Otherwise, the data would be corrupted.


This behavior explains why FC has very strict expectations for a lossless network.
The protocol has no elegant mechanism to recover from frame loss. In this scheme, if
a packet is lost, there is no selective retransmission. The entire sequence must be
retransmitted. For example, if frame 5 of 50 was lost, all 50 frames would have to be
retransmitted.

FC World Wide Name (WWN)


The Fibre Channel WWN is a unique identifier for each device in the fabric. Each
HBA, CNA, and switch port must have a unique WWN. This is akin to the BIA of an
Ethernet adapter. However, the WWN is not used for addressing in FC, or for frame
delivery. Recall that the FC_ID is used for this. The FC_ID is a 24-bit address used
to define source/destination pairs inside an FC frame.
The WWN is used to unambiguously identify systems independent of the FC_ID. The
FC_ID is assigned dynamically by the fabric when the host connects, using a kind of
first-come, first-served method. This means that when the host is rebooted, or loses
connectivity to the fabric for some time, another system could come online and
acquire the FC_ID that was originally assigned to the disconnected host. For this
reason, it is important to use some identifier that will remain constant, and the WWN
serves this purpose.
This WWN proves useful at two levels. At the fabric level, the WWN can be used
for zoning. Figure 5-11 shows two servers connected to a SAN fabric, along with
two storage systems. Zoning can key on the assigned WWNs to control which servers
can see which storage system. In this way, zoning acts as a type of access filter,
limiting storage access to only appropriate, trusted hosts. As a best practice, there
should typically be one initiator per zone.


Figure 5-11: FC World Wide Name (WWN)

The second use case for WWN is known as LUN masking. This is a feature that
enhances the security of the storage system itself, as opposed to the FC fabric. Figure
5-11 shows three logical disks (LUNS) defined inside the storage system. The system
decides which LUN is visible to which initiator. LUN A could be visible to the
server at the top of Figure 5-11, with the WWN ending in b6. Meanwhile, LUN B
may be visible only to the bottom server, with a WWN ending in b7.


To summarize, zoning controls which targets are visible, while on that target, LUN
masking controls which LUNs are visible to a specific WWN. Zoning can also
contain Registered State Change Notification (RSCN) messages, which are generated
when nodes or switches join or leave the fabric, or when a switch name is changed.

FC WWN Structure
The WWN is a 64-bit address defined by the IEEE (see Figure 5-12). A portion of this
address contains vendor-specific information, with another portion used as a type of
serial number, to distinguish between ports from the same vendor.
The WWN is used only inside the fabric to which the adapter is connected. This
means that the WWN does not need to be globally unique. It must only be unique on
the connected fabric.

Figure 5-12: FC WWN Structure

The IEEE has defined two formats for the WWN:


Original format: Addresses are assigned to manufacturers by the IEEE standards
committee, and are built into the device at build time, similar to an Ethernet MAC
address. The first 2 bytes are either hex 10:00 or 2x:xx (where the x's are
vendor-specified), followed by the 3-byte vendor identifier and 3 bytes for a
vendor-specified serial number.
New addressing schema: The most significant four bits are either 0x5 or 0x6,
followed by a 3-byte vendor identifier and four-and-a-half bytes for a
vendor-specified serial number.

Fibre Channel ID Addressing (1 of 3)



As previously stated, the WWN is not used for actual frame delivery, since the 24-bit
FC_ID serves this purpose. This FC_ID is assigned by the fabric when the node
connects, via a formal registration process.
Parallels can be drawn between FC_ID and IP addressing. Like IP addresses, FC_ID
is an end-to-end address, and does not change as the packet traverses routed systems.
It is also hierarchical. Each switch in the fabric has a unique domain ID. A switchs
domain ID serves as the first octet of the FC_ID assigned to any connected node. The
switch can also assign an area ID, and a vendor specific portion of the address.
Forwarding is based on this hierarchical address. As with IP, routing tables can be
summarized with masks, placing /8 or /16 prefix-based entries in the routing table.
Unlike IP, FC_IDs do not have an underlying Layer 2 address. With an Ethernet/IP
infrastructure, the MAC address is dynamically learned as needed, and mapped to the
IP address in an ARP cache. With FC, a node must go through a formal registration
process called fabric login or FLOGI. This process handles host registration, and
ensures that the switch fabric has correct addressing information.
Figure 5-13 shows an example of the original Brocade address structure for the
FC_ID. The first octet contains the Domain ID. This is a number from 1 to 239,
which is the maximum number of domain IDs possible in a single FC fabric.


Figure 5-13: Fibre Channel ID Addressing (1 of 3)

The next octet contains the area, which typically represents the switch's port number.
The final octet is vendor specific. It could be 0 for the first host on an interface, and
increment from there.
As an example of this schema, if a switch was assigned domain ID 1, and a host connected to port 5 on that switch, then the FC_ID could be 0x010500.
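
For clarity, that example FC_ID breaks down octet by octet as follows:
Domain ID = 0x01 (the switch's domain)
Area = 0x05 (the switch port)
Port = 0x00 (the vendor-specific portion; the first device on that port)
Resulting FC_ID = 0x010500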

Fibre Channel ID Addressing (2 of 3)


The Domain ID is a term that refers to the actual switch and all of the N_Ports that it groups together. The domain ID needn't be globally unique. It must only be unique
within a fabric. For Comware devices, there is a flat FC_ID assignment schema. The
area and ports are grouped together as Port IDs, as shown in Figure 5-14. Therefore,
16 bits are available for ports, which improves scalability for a Comware-based
solution.

Figure 5-14: Fibre Channel ID Addressing (2 of 3)

The FC_IDs are logically assigned, and are not based on connectivity to a particular
switch port. The first host that comes online, connected to a switch using domain ID
1, will be assigned an FC_ID of 0x010001. The second host to come online will be assigned an FC_ID of 0x010002, and so on, in a first-come, first-served fashion.
This grouping based on a switch's domain ID can improve and simplify FC routing functions. A switch with a domain ID of 01 can create an FC route for 0x010000/8,
grouping all FC_IDs of domain 01 in one class. Routing concepts will be explored
later in this chapter.

Fibre Channel ID Addressing (3 of 3)


In the example in Figure 5-15, there are two switches, each with a unique domain ID.
Switch 02 on the left has three servers connected. The switch on the right has been
assigned to domain 01, with two storage systems attached. All end ports (initiators or
targets) get a unique FC_ID, assigned by the switch. So the servers are all 0x02xxxx, and the storage systems are all 0x01xxxx.


Figure 5-15: Fibre Channel ID Addressing (3 of 3) for FC Auto Mode

Fabric Domain IDs


Each switch can be statically configured with a domain ID, or it can be configured to
support a dynamically assigned domain ID. To avoid unpredictable results, you
should configure all switches in a fabric to use the same method, see Figure 5-16.


Figure 5-16: Fabric Domain IDs

When you use static domain IDs, you must configure a unique domain ID per switch.
Assigning static domain IDs is currently recommended as a best practice.
With dynamically assigned domain IDs, a switch is assigned an ID by an existing
switch as it joins the fabric. One switch in the fabric, called the principal switch,
carries this responsibility. The principal switch assigns the next unused ID out of the
239 available numbers.

Principal Switch Election


Principal switch election takes place at startup, within the first 10 seconds after
connections are enabled. The election criteria are simply based on a priority value
that you can configure. As shown in Figure 5-17, the switch with the highest priority
wins the election, and becomes the principal switch. If there is a tie in priority
values, the switch with the lowest WWN wins the election.

Figure 5-17: Principal Switch Election

The principal switch is responsible for assigning a local domain ID to all other switches. During this process, a concept called a desired ID is supported. This means that all FC switches can request a preferred ID. If available, the principal switch assigns that value. If there is a conflict (perhaps because a static configuration was applied to some switch), then the other FC switch will shut down the link. This
ensures that new switches will not disrupt an existing fabric.


FC Interswitch Forwarding
This section is focused on the forwarding of Fibre Channel traffic between switches.
Native FC flow control mechanisms are explored, along with bandwidth aggregation
capabilities, the FC routing table, and other available fabric services.

FC Flow Control Overview


Fibre Channel provides a lossless network, as required by the SCSI protocol. This is
necessary due to the SCSI protocol's lack of a good recovery mechanism, as
previously discussed.
Native Fibre Channel uses a so-called buffer-to-buffer credit mechanism. This means
that during initial peer connection, the peer grants credits, which control how many
frames may be transmitted. Credits deplete as frames are transmitted, and when all
credits are depleted, transmission must cease.
The receiving peer normally sends continuous credit updates during a communication
session, as long as buffers are available. This is a very safe mechanism, since
transmitters may not send data unless the receiving peer indicates that it is capable of
processing inbound frames.
Compare this with our discussion of FCoE mechanisms in the previous chapter.
FCoE uses the DCBX protocol with PFC. PFC allows transmission until the peer
sends a PAUSE frame. FCoE assumes there is ample bandwidth, using PAUSE
frames to stop transmissions during the occasional overload. When the pause frame
expires, traffic may be sent again. So, PAUSE frames are normally not sent.
To summarize, native FC B2B assumes it should not transmit, unless it receives
credits from the receiver. The FCoE PFC mechanism assumes it is free to transmit,
unless it receives PAUSE frames.

FC Classes and Flow control


Native FC defines three classes of service to differentiate traffic. Start_of_Frame
Connect Class 1 (SOFc1) provides a dedicated, guaranteed connection, which is best
for sustained, high-throughput sessions. Class 2 provides a connectionless service,
appropriate when very short messages are sent. Class 3 is similar to Class 2, but
with a variation in flow control. This is appropriate for real-time traffic broadcasts.
Figure 5-18 summarizes the characteristics of BB and EE flow control. The different
classes leverage these mechanisms in different ways. Class 1 frames use EE flow
control (with one exception). Class 2 uses both BB and EE, while Class 3-based
sessions use B2B flow control exclusively.

Figure 5-18: FC Classes and Flow control

The BB flow control mechanism is used between an N_Port and an F_Port, and
between N_Ports that are directly connected in a point-to-point topology. Since BB
flow control lacks the overhead of sending Acknowledgement frames, it is well-suited for time-sensitive transmissions. A special type of SOFc1 initial connect
frame also uses BB flow control.
The End-to-End (EE) flow control mechanism provides another option for ensuring
reliable communications between N_Ports. Class 1 and 2 FC traffic uses EE, in
which an ACK frame from the receiver assures the transmitter that the previous frame
has been successfully received. Therefore, the next frame can be sent.
If there are insufficient buffers to receive additional frames, a busy message is sent to
the transmitter. A corrupted or otherwise malformed frame will cause a Fabric frame
Reject (F_RJT) message to be sent to the transmitter.

FC Class 2 Flow Control Scenario


Figure 5-19 depicts an FC transmission session for Class 2 traffic, which uses both
B2B flow control, based on buffer credits, and EE flow control, based on ACK
frames.


Figure 5-19: FC Class 2 Flow Control Scenario

The server's N_Port sends a data frame to the target, and decrements its F_Port credit count by one. The switch's F_Port receives this frame, and sends an R_RDY frame to the server's N_Port, thus incrementing its credit.
The switch then transmits this frame out its target-connected F_Port, decrementing its
credit count for that target by one. The disk subsystem's N_Port receives this data frame, and sends an R_RDY to the switch's transmitting F_Port to increment its buffer count back up by one. This disk responder also sends an EE ACK frame
through the fabric and on to the initiator, confirming successful receipt of the frame.
For B2B, the receiver continuously updates the transmitter by sending additional
credits, as long as it has buffers available. So the server may send only as much as its
credit allows. The storage system sends additional credits if it can handle the load.
This occurs at wire speed, so it is a very fast mechanism. The combination of both
credit counts between each F_Port-to-N_Port connection, and the EE ACK
mechanism provides a very robust, fail-safe transmission.

ISL Bandwidth Aggregation


To increase the available bandwidth between devices, physical inter-switch links can
be bundled into a single, logical medium, as shown in Figure 5-20. Different vendors
use their own terminology for this feature. Brocade refers to this as a trunked link,
Cisco calls it a port channel, while Comware calls this a SAN aggregation link.


Figure 5-20: ISL Bandwidth Aggregation

Comware supports layer 2 bridge aggregation and layer 3 route aggregation. This
allows the bundling of multiple FC ports into a single high-bandwidth logical link.

FC Forwarding
Each FC switch has an FC routing table. Each entry in this table contains the
destination FC_ID, a mask, and an outgoing port.
Each directly connected node has a host, or node, route entry in the table. Each node has
a unique 24-bit FC_ID, and so a /24 table entry indicates a route to a single device.
This can be likened to a /32 route in an IP route table.
Figure 5-21 shows a switch with domain ID 01, with two storage systems attached and online. The first target host to connect has been assigned the FC_ID of 0x010001, and has a /24 full-match entry in the FC routing table. This is a directly connected route, connected on port FC 1/0/1. The entry for the device with FC_ID
0x010002 is similarly recorded in the table.


Figure 5-21: FC Forwarding

During the initial registration of this device to the fabric, the FC_ID was assigned,
and the route was entered into the routing table. If the host goes offline, the associated
route table entry becomes unavailable.
This mechanism ensures that a switch knows how to reach local hosts. If a fabric
consists of a single switch, there is no need to update the routing table. All hosts can
find each other, being directly connected to the same switch, and host routes are
entered automatically, as targets and initiators connect.
When multiple switches are in use, their unique domain IDs describe connectivity to
remotely attached devices. The first octet of any node's FC_ID is set to its attached switch's domain ID. All nodes attached to the switch with domain ID 0x02 have an
FC_ID that begins with 0x02, and so on.
The switch with domain ID 0x01 needn't have an entry for every host connected to switch 0x02. It simply needs a single entry of 0x020000 /8, denoting the outgoing interface attached to switch 0x02. In Figure 5-22, this is interface FC1/0/41.


Figure 5-22: FC Forwarding

Similar to IP, both static and dynamic routing can be utilized. Static routes are
manually configured on each switch by the administrator. Dynamic routing uses the
Fibre Channel Shortest Path First protocol (FSPF).
FSPF is a link-state routing protocol, like OSPF or IS-IS. This protocol is on by
default, so when two FC switches connect, they automatically exchange routes.
FSPF can support link costs to accommodate complex routing scenarios. Also, the
graceful restart feature is available to support In-Service Software Update (ISSU).
This allows Comware switch firmware to be upgraded without downtime.

Fabric Services
This section will discuss Fabric login, Simple Name Service, state change
notification, and zoning.

Fabric Login (FLOGI)


When a node powers on, it activates its attached switch port link. Before
communicating, the node must login to the fabric. Unlike complex security login
mechanisms, which are based on usernames and passwords, this is a relatively
simple, yet formal registration process.


Both initiator and target nodes must register, enabling the fabric to learn about the
nodes. Thus, the fabric has no need to dynamically learn a Layer 2 address, like
Layer 3 TCP/IP systems must learn MAC addresses. This information is gleaned as
each device connects, through an explicit control-plane process called FLOGI.
During FLOGI, the Fibre Channel fabric assigns an FC_ID to the node. In the
example in Figure 5-23, the server is attached to a switch with domain ID 02, and it
is the first device to activate a port on this switch. Therefore, it gets an FC_ID of
0x020001.


Figure 5-23: Fabric Login (FLOGI)

The switch associates this address with the outgoing port that connects to the server.

Simple Name Service Database


As previously discussed, FC_IDs are assigned on a first-come, first-serve basis and
can change if devices lose connectivity. This makes the FC_ID less suitable for use
with Fibre Channel security services. This drives the need for a more stable
identifier, called the WWN. This is a number that remains constant, regardless of
reboots and outages. Since FC_IDs are still used for actual data transmission, the
fabric must maintain a map of each WWN and its associated FC_ID.
This WWN-to-FC_ID mapping service is provided by the Simple Name Service
database. This database is exchanged between switches, so every switch in the fabric
has a copy. The database shown in Figure 5-24 lists each WWN, along with its
associated FC_ID and node type. The nodes types are listed as either Target or
Initiator. When an initiator sends a query to this database, it asks to see all possible
target storage systems.

Figure 5-24: Simple Name Service Database

The Name Service can respond with the list of all FC_IDs. The server can then send
a logical request to each storage system to ask if any LUNs are available. However,
queries can also be filtered based on which device initiated a request. This allows
the fabric to show a different set of available targets to different initiator hosts.


In the example in Figure 5-24, suppose the server at the top (WWN ending in 0xb6)
is requesting all possible targets. The fabric can filter its response for this initiator,
perhaps revealing only the target with a WWN that ends with 0x10 (the target with FC_ID 0x010002, in this example).
This is called soft zoning, since it does not actually enforce security rules in hardware; it merely filters the responses to certain initiator queries. It may be technically
possible for a host to send a request to a known FC_ID, bypassing the soft-zoning
capability of the name service, and reach other storage systems.

VSANs - Virtual SAN/Fabrics
VSANs, also known as virtual fabrics, provide the ability to implement multiple
fabrics on a single physical infrastructure. This is typically used when isolation is
required between fabrics. This isolation can be at different levels.
Data isolation ensures that no unintentional data transfer can occur between VSANs.
The physical links are shared among VSANs, but kept logically separate through the
use of VSAN tagging.
Control isolation provides independent instantiations of the fabric for each VSAN.
All Fibre Channel services are isolated for each VSAN. Each VSAN has its own
name service and zone service. It is therefore impossible to see information from one
fabric in another.
If an administrator relied solely on the soft zoning feature previously described, a
simple error in zone configuration could reveal classified targets to unauthorized
initiators. The creation of VSANs eliminates this issue, since all aspects of a VSAN
are isolated. An initiating host can only access targets in the same VSAN.
Fault isolation is also achieved in the data, control, and management planes, since
misconfigurations in one VSAN should not impact other VSANs.

VSAN vs Physical SAN


Figure 5-25 compares separate physical SANs on the left to a VSAN solution on the
right. On the left, each of the three departments has its own storage infrastructure,
deployed as separate physical fabrics, denoted as red, blue, and green. Deploying
separate physical infrastructure for each department creates additional cost, and
increases rack space and power utilization due to the larger number of devices to
manage.


Figure 5-25: VSAN vs Physical SAN

Instead, these systems could share a single physical infrastructure, as shown on the
right. This infrastructure can then be logically separated into VSANs. A common
storage pool can be shared among these VSANs. This reduces the number of switches
to be managed, thereby lowering costs for initial deployment, rack space, power, and
cooling.
Another benefit is that unused ports can be easily moved to an appropriate VSAN
without disrupting a production environment.

VSANs - Virtual SAN/Fabrics on Comware


VSANs have historically been used for service isolation, but not necessarily for
redundancy. To accommodate redundancy, two physically separate fabrics (SAN A and SAN B) are deployed. In the previous example, only SAN A
is depicted, with the three departments virtually separated via VSANs. To add
redundancy, a separate physical SAN B fabric could be deployed, with identical
VSAN configuration. Each host can then be connected to both infrastructures for
redundancy.
Comware 7 switches can improve this scenario. With the Fibre Channel switch
functionality on Comware 7 devices, FC frames are moved through the internal
switch architecture using FCoE. This is because internally, Comware switches
always use Ethernet technology. This is true even for HP switches like the Comware
5900 CP, which provides native FC ports. Since native FC traffic is internally
switched via FCoE, each Fibre Channel VSAN requires an associated Ethernet
transport VLAN.
For a typical SAN A/SAN B concept, a Comware 5900CP-A and a Comware 5900CP-B are deployed, each with a single VSAN. Figure 5-26 shows a 5900CP Core1 and Core2. Core1 hosts VSAN 11, while Core2 hosts VSAN 21. These switches are not part of an IRF group, nor is there any other physical interconnection between these two switches. Thus, the fabric separation is maintained.

Figure 5-26: VSANs - Virtual SAN/Fabrics on Comware

However, the 5900AF top-of-rack switches need to use IRF for redundancy. To
maintain a logical fabric separation, VSAN 11 is configured, and the network
administrator ensures that only physical ports on unit 1 are assigned.


Similarly, VSAN 21 is defined, and is associated only with physical interfaces from
unit 2 of the IRF. This ensures that neither VSAN 11 nor VSAN 21 traffic will cross
the IRF links, and the concept of physically separated fabrics is maintained. Although
they are managed and configured on the same IRF system, each VSAN is separately
processed by individual IRF members.
Along with the dedicated FC uplinks, the Top-of-rack switches also have a bridge
aggregation group configured, with physical uplinks to traditional data core switches.
These are the HP Comware model 12900-series switches indicated in Figure 5-26.

VSANs - Tagging
As shown in Figure 5-27, VSAN tagging for an FC fabric is quite similar to VLAN
tagging for Ethernet switches. Some Ethernet switch ports can be configured as a
member of a single, untagged VLAN for endpoint connectivity, while others may be
configured to use 802.1q tagging to support multiple VLANs for switch-to-switch
links. Fibre Channel also supports two types of tagging: native FC tagging and FCoE tagging.


Figure 5-27: VSANs - Tagging

With native FC communications, Ethernet is not involved. In this case, FC frames need a special VSAN tag inside the FC frame. The native FC port can be an access port, connected to a node, and using native FC frames.
Connections to switches may need to support multiple VSANs. This is where a trunk
link is used. The VSAN ID is tagged inside the FC frame.
FCoE uses a transport VLAN, which has an 802.1q tag. Even when sending normal FC frames, FCoE uses an 802.1q tag, so for FCoE there is no access port type. All ports always carry an 802.1q tag, which implicitly identifies the transport VLAN. This is why FCoE frames are always tagged, even if only a single VSAN is permitted on the port.

Basic Configuration Steps


Figure 5-28 introduces the steps for configuring FC infrastructure.


Figure 5-28: Basic Configuration Steps

This starts with configuring the switch working mode and FCoE operating mode.
Then VSANs can be defined, along with a transport VLAN for FCoE, bound to the
VSAN.
Support for native Fibre Channel can be configured on the physical interface. The FC
port type is set, and the FC interface is assigned to a VSAN. Initially, a simple
default zone can be configured that permits everything.

Configuration Step 1: System-working-mode


The switch must be operating in advanced mode to have access to any and all FCoE
configuration syntax. This setting requires a system reboot to take effect. As depicted
in Figure 5-29, once the system working mode is set to advanced, this configuration
must be saved, and then a reboot command is issued.


Figure 5-29: Configuration Step 1: System-working-mode

After the reboot, the working mode is verified via the display system-working-mode command.
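
As an illustration, the command sequence for this step on a Comware 7 switch follows the pattern below. This is a minimal sketch based on the commands named in this section; the sysname Switch and the prompts are placeholders rather than actual device output.

<Switch> system-view
[Switch] system-working-mode advanced
[Switch] quit
<Switch> save
<Switch> reboot
<Switch> display system-working-mode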

Configuration Step 2: Define FCoE Operating Mode

The FCoE operating mode depends on how the system is deployed. The operating
mode can be configured as a Fibre Channel Forwarder (FCF), an N_Port
Virtualization (NPV), or as a transit mode switch.
Only one mode is supported per switch or IRF. Also, remember that this command is
only available after the switch has been configured to operate in advanced mode, as
previously described in Step 1.
For this discussion, an FCF is to be configured. NPVs will be covered in a later
section. Once set, as shown in Figure 5-30, the FCoE operating mode can be verified
with the display fcoe-mode command, as shown in Figure 5-31.

Figure 5-30: Configuration Step 2: Define FCoE Operating Mode


Figure 5-31: Configuration Step 2: Define FCoE Operating Mode cont.
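
A minimal sketch of this step, assuming an FCF deployment; the fcf keyword and the prompts shown are illustrative of the commands named above rather than a verbatim transcript.

<Switch> system-view
[Switch] fcoe-mode fcf
[Switch] display fcoe-mode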

Configuration Step 3: Define VSAN


The definition of a VSAN creates a virtual fabric. This virtual fabric can provide
complete isolation of services for different VSANs sharing the same physical
infrastructure, providing a logical fabric separation for IRF top-of-rack systems.
VSAN 1 is defined in the system by default, so any new Virtual FC or FC interfaces
are assigned to VSAN 1 by default.
In Figure 5-32, VSAN10 is created from global configuration mode on the switch.

Figure 5-32: Configuration Step 3: Define VSAN

Configuration Step 4: Transport VLAN and Bind VSAN

Now that a VSAN is defined, a VLAN must be created and dedicated for the purpose
of FCoE transport. No other hosts or Layer 2 functions are permitted for this VLAN,
which has a one-to-one relationship with the VSAN.
You cannot have one VLAN that services multiple VSANs. Nor can you have one
VSAN that is serviced by multiple VLANs. If the intended design requires multiple
VSANs, then a VLAN must be defined for each one.
In Figure 5-33, VLAN 10 is defined from global configuration mode. This VLAN is
defined as being dedicated to service VSAN 10 with the fcoe enable vsan 10
command.


Figure 5-33: Configuration Step 4: Transport VLAN and Bind VSAN

In the example in Figure 5-33, the VLAN and VSAN numbers match. Although this
can be a good idea to minimize confusion and ease documentation, it is not a
technical requirement. Any VLAN number can be configured to support any VSAN.

Configuration Step 5: Configure FC Interface


Fibre Channel interface functionality must be configured, since the HP 5900CP
switch supports converged ports. Each port can be a 1Gbps or 10Gbps Ethernet port, depending on whether an SFP or SFP+ device is installed. Alternatively, the port could be a 4Gbps or 8Gbps native Fibre Channel port. Again, this depends on the adapter installed in a
particular port.
In the configuration, the port can be configured as either an Ethernet or as an FC port.
However, it is important to remember that the optic interface installed for that port
must match the intended configuration.
For example, if an SFP+ 10Gbps Ethernet interface has been inserted, and that
interface is configured for FC, then the port will remain inoperative, in a down state.
Ethernet configurations require physical adapters with Ethernet optics, and FC
configurations require FC optic-based adapters.
Note
16Gbps FC interface optics can be installed in an HP 5900CP. However, this
switch only supports a maximum of 8Gbps for FC, and so the port will only
operate at 8Gbps.
HP has released converged optics that can support both 8Gbps FC and 10Gbps
Ethernet. If those are deployed, the administrative configuration determines the
operational status of the interface.
The example in Figure 5-34 shows an interface initially operating as a 10Gbps
interface. When the port-type fc command is issued, this interface becomes an FC
port.


Figure 5-34: Configuration Step 5: Configure FC Interface

The interface's operational status can be verified with the display interface brief
command. In the example in Figure 5-35, the interface is operating as a native FC
port.

Figure 5-35: Configuration Step 5: Configure FC Interface cont.

Interface FC1/0/1 is a member of VSAN 1 by default, and is currently in a non-operational, or DOWN, state.
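
A minimal sketch of the conversion, assuming interface Ten-GigabitEthernet 1/0/1 carries an FC-capable optic; the prompts are placeholders.

<Switch> system-view
[Switch] interface ten-gigabitethernet 1/0/1
[Switch-Ten-GigabitEthernet1/0/1] port-type fc
[Switch-FC1/0/1] quit
[Switch] display interface brief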

Configuration Step 6: FC Interface Port Type (1 of 2)

Now that the interface has been configured to operate as an FC port, its FC port type can be configured. The port can be set to one of the following:
E_Port: Expansion port; connects to another switch's E_Port
F_Port: Fabric port; connects to a node's N_Port
NP_Port: NPV (virtual) enabled port
In the example in Figure 5-36, interface FC1/0/1 is configured as an F_Port, since this port is to be connected to a server or storage system's N_Port.


Figure 5-36: Configuration Step 6: FC Interface Port Type (1 of 2)

Configuration Step 6: FC Interface Port Type (2 of 2)

The command display interface brief can be used to validate the configuration. In
the example in Figure 5-37, interface FC1/0/1 is still in VSAN 1, but the mode
column shows that it is configured to operate as an F_Port.

Figure 5-37: Configuration Step 6: FC Interface Port Type (2 of 2)

Configuration Step 7: Assign FC Interface to VSAN

In the scenario in Figure 5-38, the port should be a member of VSAN 10. To
configure this port to no longer be a member of the default VSAN 1, the port access
vsan 10 command is used.


Figure 5-38: Configuration Step 7: Assign FC Interface to VSAN

Only native FC interfaces can be configured as an access port. FCoE Virtual Fibre
Channel (VFC) interfaces use a transport VLAN, which serves as a VSAN trunk
protocol.
Again, the display interface brief command validates that interface FC1/0/1 has
been configured as a member of VSAN 10.
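
Steps 6 and 7 can be summarized in one short sketch. The port access vsan command comes directly from this step, while the fc mode f keyword used to set the F_Port role is an assumption based on the port types listed above; prompts are placeholders.

[Switch] interface fc 1/0/1
[Switch-FC1/0/1] fc mode f
[Switch-FC1/0/1] port access vsan 10
[Switch-FC1/0/1] quit
[Switch] display interface brief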

Configuration Step 8: Set Default Zone Permit


The default zoning configuration on a Comware switch denies everything. To change this, issue the zone default-zone permit command. This command is analogous to the permit any statement in an access list. The topic of zones and zoning will be
covered in a later section of this chapter.
For verification, the display zone status vsan 10 command reveals that the default
zone has been configured to permit all access, as shown in Figure 5-39.


Figure 5-39: Configuration Step 8: Set Default Zone Permit
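
A sketch of this step, assuming the default-zone setting is applied in the view of VSAN 10; the exact view is an assumption, and prompts are placeholders.

[Switch] vsan 10
[Switch-vsan10] zone default-zone permit
[Switch-vsan10] quit
[Switch] display zone status vsan 10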

Configuration Step 9: Status Review


Upon completion of a basic configuration, several commands are available to
validate Fibre Channel operation and configuration.
Display interface brief
Display interface FC packet pause (or drops)
Display transceiver info
Display vsan port-member
Display vsan login
Display vsan name service

Optional Debugging
Optionally, you may use the debug commands listed below.
Debug FC interface
Debug FLOGI
Debug FDISC

FCoE Overview
Topics to be covered in this section include the following:


Consolidation
Terminology
CNA
FCoE Stack compared to OSI/FC Stack
FCoE Frame Format
FIP: Fibre Channel Initialization Protocol
FPMA: Fabric Provided MAC-Address

FCoE I/O Consolidation


The main goal of FCoE is to achieve network consolidation. A typical deployment
could fill a rack with a large number of devices, cables, and adapters, due to the
separate infrastructure for LAN and SAN.
Figure 5-40 highlights the savings in equipment:
50% fewer switches in each server rack: only two CN Top-of-Rack switches, compared with four (two LAN and two FC) switches per rack with separate Ethernet and FC switches
50% fewer adapters per server
75% fewer cable connections

Figure 5-40: FCoE I/O Consolidation


FCoE Goals
FCoE is tasked with maintaining the latency, security, and traffic management
attributes of Fibre Channel, while integrating with and preserving investment in
existing FC environments. The protocol must not be disruptive to any standard Fibre
Channel functions and capabilities. FC must continue to function as always, using the
same FC_IDs, WWNs, zoning, FSPF routing, and so on.
Interoperability between native Fibre Channel and FCoE-based systems should be
very easy. This is because there is no device required to convert between native
FC and some other protocol. The native FC functionality is simply encapsulated in an
Ethernet frame, and then decapsulated prior to transmission to a native FC device.
This ability to integrate Ethernet and native FC without need for a separate protocol
simplifies the deployment of storage environments.
As explained above, all the capabilities of native FC technology are extended over
Ethernet systems through the use of FCoE. Toward this end, it is vital to ensure that
Ethernet provides the lossless transmission that Fibre Channel requires. This
capability was previously described in the chapter about DCBX and PFC.

FCoE Terminology
The following introduces the terminology surrounding FCoE concepts and
configuration.
FCoE: Fibre Channel over Ethernet carries native FC frames inside a standard
Ethernet frame.
CEE: Converged Enhanced Ethernet, also known as Data Center Bridging (DCB),
describes a suite of protocols and capabilities necessary to support FC
technology over Ethernet infrastructure.
VFC: Virtual Fibre Channel Interfaces provide an FC abstraction layer over a
traditional Ethernet connection. This enables all of the traditional port types
supported on native Fibre Channel, including:
VN_Port: This provides the virtual equivalent of an FC N_Port, for end node
connectivity.
VF_Port: Provides the virtual equivalent of an FC F_Port, for switch fabric
connectivity.
VE_Port: This is the virtual equivalent of an FC E_Port, for switch-to-switch
links.


ENode: This is an FCoE device that supports FCoE VN_Ports. This includes both server initiators and storage system targets.
FCF: An FC Forwarder is a device that provides FC fabric services with Ethernet connectivity.
FIP: the Fibre Channel Initialization Protocol acts as a type of helper protocol for
initial link setup.

Converged Network Adapters (CNA)


Each FCoE-capable server must be equipped with a Converged Network Adapter
(CNA). As the name implies, this adapter supports both Ethernet services for
standard LAN connectivity, and Fibre Channel services for SAN fabric connections,
see Figure 5-41.

Figure 5-41: Converged Network Adapters (CNA)

Traditional Host Bus Adapters (HBAs) only support native FC, while Network
Interface Cards (NICs) only support Ethernet. The CNA converges these two
functions into a single device. This adapter presents itself to the server OS as two
separate devices: an Ethernet NIC and an HBA.
Therefore, the OS is not aware that convergence is taking place, continuing to
perceive two separate fabrics for LAN and SAN. This aspect of CNAs makes it easy
to migrate from separate legacy systems to a converged solution.

HP CNA Products
Figure 5-42 shows some of the CNA products that HP supports. New products and
capabilities are being added by HP on a regular basis. Please check current
documentation.


Figure 5-42: HP CNA Products

FCoE Server Access


A hardware-based CNA provides FCoE capability that is integrated with traditional
Ethernet services. Although both services are provided by a single device, the server
OS perceives a separate HBA and NIC. This not only makes network convergence
transparent to the server OS, but also to server administrators, who can continue to
configure these separate adapters as always.
The adapter will leverage FIP and the DCB suite (DCBX, PFC, and ETS) to
facilitate SAN fabric connectivity. These protocol suites run independently of each other. If you configure FCoE to use VLAN 10, it is the network administrator's
responsibility to ensure that VLAN 10 is assigned the correct 802.1p mapping, and
that PFC and ETS are properly deployed to provide lossless service for VLAN 10.
In Figure 5-43, the server has two CNAs installed. For Fibre Channel, CNA-Port1 is
connected, and will use FLOGI to acquire an FC_ID in VSAN 11, while CNA-Port2 performs FLOGI and gets a unique FC_ID in VSAN 21.


Figure 5-43: FCoE Server Access

Meanwhile the Ethernet functionality of the two CNAs can be aggregated in a
traditional NIC teaming configuration to enhance bandwidth utilization and
redundancy for LAN communications. The network administrator may choose how
and whether to team these NICs, just as they did with separate HBAs and NICs.

FCoE Stack Overview


Figure 5-44 compares the classic OSI model's protocol concepts with FCoE and
native Fibre Channel. Notice that Upper Layer Protocol (ULP) services are identical
between FCoE and native FC, as are FC layers 2 through 4.


Figure 5-44: FCoE Stack Overview

It is only the physical and data link layer protocols of FC that have been replaced by
Ethernet. An FCoE mapping layer presents itself to FC-2 as a native FC interface
stack. It encapsulates FC frames in Ethernet for transmission, and decapsulates them before passing received traffic up through the stack.

FCoE Encapsulation
The native FC frame is encapsulated in a typical Ethernet frame. In Figure 5-45, you
can see the standard Ethernet source and destination MAC addresses, the Ether Type
field, the IEEE 802.1Q tag, and the 4-bit version field. FCoE data frames have an
Ether Type of 0x8906.


Figure 5-45: FCoE Encapsulation

The standard Ethernet FCS or Frame Check Sequence serves as a frame trailer, and
aids in detection of corrupted frames.
Contained inside this Ethernet frame is a native, unmodified FC frame.

FIP: FC Initialization Protocol


With native FC connections there is a direct physical link between HBA and fabric,
so when the physical link is down, the FC link is of course down.
FCoE uses virtual links. While the Ethernet link may be up, a logical FC connection
must be established and maintained between the CNA and the FCF switch, see Figure
5-46.


Figure 5-46: FIP: FC Initialization Protocol

For example, the Ethernet link and all associated physical connections may be up, but
the Virtual FC interface could be manually shut down. FIP notifies the peer of this
condition, ensuring that it understands the lack of connectivity. FIP provides a
mechanism to accurately reflect the logical status of the FCoE connectivity.
Other functions provided by FIP include:
FCoE VLAN Discovery: Ensures that the CNA learns from the FCF which 802.1q
VLAN tag it should use.
FCF Discovery: Enables the CNA to find its attached FCF
FLOGI: Fabric Login must occur for any FC device to acquire an FC_ID and
communicate over the fabric. Since this is FCoE, a fabric MAC address will also
be allocated, called the FPMA.


FPMA: The Fabric Provided MAC-Address enables the FCoE transmissions.
Link Keep-alive: With the above functionality complete, FCoE communications are now possible. The status of the link is continuously validated with link keep-alive messages.

FIP: VLAN and FCF Discovery


FCoE data frames are denoted by Ethertype 0x8906, while FIP frames use an
Ethertype value of 0x8916. Initial FIP frames are sent using the BIA MAC address
from the Ethernet portion of the CNA. This is the same as any typical Ethernet frame
would be sent.
The first step of the FIP protocol is to perform VLAN discovery. Since the VLAN has
yet to be discovered, the appropriate 802.1q tag is unknown. Therefore, these
discovery frames are sent as untagged, native Ethernet frames. The FCF recognizes
VLAN discovery messages, and responds with the FCoE VLAN ID, as configured for
that interface. VLAN discoveries are the only FIP frames that are sent untagged. All
other frames are tagged per the FCF VLAN discovery response.
The next step is to perform FCF discovery, in which the node sends a Discovery
Solicitation message. The FCF responds with a Discovery Advertisement, which
contains an FCF Priority value. If multiple forwarders exist on the same VLAN, they
would all respond to the solicitation message. The node selects the FCF with the
highest priority, which is the lowest numerical value.

FIP: FLOGI and FPMA


Once the FCF is selected, the host must login to that system, using FLOGI. You may
recall from a previous section that FLOGI results in the assignment of an FC_ID. For
FCoE, an FPMA is also assigned. The node's CNA will use the FPMA as its source
MAC address for all FCoE frames (Ethertype 0x8906). Prior to FLOGI, the CNA's
BIA is used.
The FPMA is constructed of two 24-bit pieces: the Fibre Channel MAC Address Prefix (FC-MAP) and the FC_ID. The default FC-MAP is 0x0EFC00, but it can be
manually configured. The second portion of the FPMA is equal to the assigned
FC_ID. Since this is unique per VLAN, there is typically little motivation to modify
the FC-MAP. The FPMA need only be unique within the VLAN, and the unique
FC_ID ensures this is the case. This is because there is a one-to-one relationship
between VLAN and VSAN.


The example in Figure 5-47 shows an FPMA of 0x0EFC00010004. This was
constructed from the default FC-MAP, and an assigned FC_ID of 0x010004.

Figure 5-47: FIP: FLOGI and FPMA

FCoE Design considerations


FCoE was designed to move data inside a single data center environment, and was
not designed for long-distance WAN communications. This is primarily due to the
timers involved in the PFC and PAUSE functions of DCBX, and associated buffer
calculations.

Configuration Steps for FCoE Host Access


As shown in Figure 5-48, the prerequisites are similar to previous configurations.
The server and storage nodes must be configured to support appropriate DCBX
functionality. Switches must support FCF mode and have a VSAN defined, along
with unique Domain ID assignment and a default zone permit, as a minimum.


Figure 5-48: Configuration Steps for FCoE Host Access

Configuration Steps for FCoE Host Access


Figure 5-49 introduces the steps to configure FCoE host access. These steps are
detailed in the following sections.

Figure 5-49: Configuration Steps for FCoE Host Access

Configuration Step 1: Create Virtual FC Interface

A new virtual FC interface is created in the top portion of the example in Figure 5-50. The second example in Figure 5-50 reveals how to verify this configuration. You
can see that VFC 2 has been created.

Figure 5-50: Configuration Step 1: Create Virtual FC Interface

Configuration Step 2: VFC FC Port Type


The switch port in this scenario is intended to connect to some end host. Therefore, it
must be configured as an F_Port. The examples in Figure 5-51 reveal the syntax to
configure and verify this requirement.


Figure 5-51: Configuration Step 2: VFC FC Port Type
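
Steps 1 and 2 together look roughly as follows. This is a sketch with placeholder prompts; the fc mode f keyword is assumed for setting the VF_Port (F_Port) role.

[Switch] interface vfc 2
[Switch-Vfc2] fc mode f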

Configuration Step 3: Bind VFC to Interface (1 of 2)

To function, the previously configured virtual interface must be associated with a
physical interface. This could be either a single physical interface, or a logical link-aggregation interface.
When a single VFC is bound to an Ethernet link-aggregation, the FCoE traffic will be
distributed over the link-aggregation member ports using the traditional hash
mechanisms. Since the FCoE frame does not contain an IP header, the hashing
algorithm will use the source/destination Ethernet MAC address for the calculation.
Since all communication between FCoE devices will be using a stable MAC
Address, the communication between any two FCoE devices is guaranteed to use a
single link of the link-aggregation. This ensures that link aggregation will not
introduce problems such as out-of-order delivery.
Out-of-order delivery is not an issue for traditional IP networks. Multiple packets of
a single flow can be sent over different physical links or paths with different latency.
Sequencing information contained in the headers allows packets to be reassembled in
the proper order, regardless of the order in which they arrived.


Binding a virtual Fibre Channel interface to a logical aggregation interface is not
applicable for a server-facing port-group. This is because servers typically have two
physical CNA adapters. The reason for having two Fibre Channel connections from a
server is to connect each one to a separate fabric, and so they cannot be aggregated.
The use of bridge aggregation is appropriate for inter-switch links, thus providing
ample bandwidth for the multiple host communications traversing these links.
In Figure 5-52, VFC 2 is bound to physical interface ten-gigabitethernet 1/0/2.

Figure 5-52: Configuration Step 3: Bind VFC to Interface (1 of 2)
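
A sketch of the binding, assuming the bind interface keyword; the physical interface name matches the example above, and prompts are placeholders.

[Switch] interface vfc 2
[Switch-Vfc2] bind interface ten-gigabitethernet 1/0/2
[Switch-Vfc2] quit
[Switch] display interface vfc brief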

Configuration Step 3: Bind VFC to Interface (2 of 2)

The binding can be verified with the display interface vfc brief command, as
shown in Figure 5-53. The example reveals that VFC2 is bound to interface
XGE1/0/2.

Figure 5-53: Configuration Step 3: Bind VFC to Interface (2 of 2)

Configuration Step 4: Assign VFC Interface to VSAN

As with native FC interfaces, the virtual interface must be assigned to a VSAN. You
have learned that the VSAN traffic is transported over a VLAN, and that FCoE uses
802.1q VLAN tagging. Since the tagging allows for multiple VLANs, multiple
VSANs are implicitly supported. To achieve this functionality, the FCoE VFC
interface must be configured as a VSAN trunk port, as in the example in Figure 5-54.

Figure 5-54: Configuration Step 4: Assign VFC Interface to VSAN

Configuration Step 5: Physical Interface VLAN Assignment

The virtual interface has been configured to use VSAN 10, which is using VLAN 10
as a transport. The virtual interface has been bound to a physical interface. This
physical interface must therefore be configured to support VLAN 10.
The example shown in Figure 5-55 completes this scenario by configuring the
physical interface to be a trunk port that allows VLAN 10.

Figure 5-55: Configuration Step 5: Physical Interface VLAN Assignment
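
Steps 4 and 5 can be sketched together as follows. The port trunk vsan keyword for the VFC interface is an assumption, while the Ethernet trunk commands are standard Comware syntax; prompts are placeholders.

[Switch] interface vfc 2
[Switch-Vfc2] port trunk vsan 10
[Switch-Vfc2] quit
[Switch] interface ten-gigabitethernet 1/0/2
[Switch-Ten-GigabitEthernet1/0/2] port link-type trunk
[Switch-Ten-GigabitEthernet1/0/2] port trunk permit vlan 10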

Fabric Expansion
In previous sections you learned about FCoE connectivity between Top-of-Rack
switches and server hosts. The example scenarios have revealed how to connect to
isolated FC switches. In these scenarios, one switch is an FCF which connects to the
server CNA, using FCoE, and another switch is a 5900CP that connects to storage
systems via native Fibre Channel.
The focus now shifts to interconnecting the FCoE Top-of-Rack and native FC
switches using a fabric expansion. This involves understanding and configuring
E_Ports, the FSPF routing protocol, and validating the resultant Fibre Channel
routing table.


Fabric Expansion: E_Port


For FC switch-to-switch connections, each side of the link is to be configured as an
expansion port, called an E_Port. Once configured, the switches discover each other,
and fabric services are thus extended across multiple switches. The E_Port in an
HP5900 or other Comware device must be facing another Comware E_Port.
This includes Fabric link services, such as the FSPF routing protocol, which
populates the FC routing table by exchanging domain information.
Name service database exchange will also occur, providing a consistent, fabric-wide
name service. All targets and initiators can be aware of all devices, regardless of
physical switch connections.
For security purposes, zone database information is exchanged between switches.
All switches have access to all zone information.
VSAN tagging support will also be configured consistently across all switches in the
fabric.
For FCoE, the E_Port is a virtual construct, and so is referred to as a VE_Port. This
has the same functionality as a native Fibre Channel E_Port. Unlike most FCoE-based functions, DCBX protocol functionality is not required for VE_Ports. Instead,
the manual configuration of simple PFC commands is sufficient.
For Server CNAs, FIP fulfills all of the initial connection requirements. However,
FIP does not serve this purpose with switch-to-switch links. Instead, it is simply
assumed that the network administrator properly configures these connections. The
FIP keep-alive mechanism is used to determine ongoing link status.

Fabric Expansion: Routing Table Exchange


Once the switches become aware that they are part of an expanded fabric, they will
exchange routing table information. These routing tables can be constructed using
static routes or via the Fibre Channel Shortest Path First protocol.
On Comware switches, FSPF is enabled by default to ensure that /8 switch domain
ID routes are automatically exchanged.

Configuration Steps for Fabric Expansion with FCoE

Prior to configuring fabric expansion, you must manually configure the physical
interface to support PFC.
Then you can create a new VFC interface, set its port type to be an E_Port, and verify
the configuration.

Configuration Step 1: Create New VFC Interface

The first step is to prepare a new VFC interface. In this scenario, interface Ten1/0/4
is to be connected to another switch, thereby serving an E_Port role. The example in
Figure 5-56 shows this interface being configured as a trunk port, with VLAN 10
enabled to traverse the link.

Figure 5-56: Configuration Step 1: Create New VFC Interface

Next, VFC 4 is created, bound to the interface, and made a member of VSAN 10.

Configuration Step 2: Set FC Port Type to E_Port

The virtual port is configured as an E_Port, and then the configuration is validated.
The example in Figure 5-57 shows that VFC 4 is created, defined as an E_Port, and
bound to interface XGE1/0/4.


Figure 5-57: Configuration Step 2: Set FC Port Type to E_Port
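
Pulling the prerequisite and steps 1-2 together, an E_Port-facing uplink could be sketched as follows. The PFC commands (priority-flow-control and the dot1p value 3), the fc mode e keyword, and the bind/trunk keywords are assumptions included only to show the general shape of the configuration; prompts are placeholders.

[Switch] interface ten-gigabitethernet 1/0/4
[Switch-Ten-GigabitEthernet1/0/4] port link-type trunk
[Switch-Ten-GigabitEthernet1/0/4] port trunk permit vlan 10
[Switch-Ten-GigabitEthernet1/0/4] priority-flow-control enable
[Switch-Ten-GigabitEthernet1/0/4] priority-flow-control no-drop dot1p 3
[Switch-Ten-GigabitEthernet1/0/4] quit
[Switch] interface vfc 4
[Switch-Vfc4] fc mode e
[Switch-Vfc4] bind interface ten-gigabitethernet 1/0/4
[Switch-Vfc4] port trunk vsan 10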

Configuration Step 3: Verify Status


As shown in Figure 5-58, several display commands are available to verify
successful FIP peering, FSPF route peering and routing table information, as well as
the name service database.


Figure 5-58: Configuration Step 3: Verify Status

Multi-path - Concepts
Multi-path deployments connect a host system to both SAN A and SAN B for
redundancy. Storage systems that support path redundancy could have dual adapters
to connect to both SANs. Figure 5-59 depicts a storage system with CTRL-1 and
CTRL-2, and each controller is connected to both SAN A and SAN B.


Figure 5-59: Multi-path - Concepts

If these controllers were configured in an Active-Active mode, then the server would
see LUN-A four times. Using HBA-P1, connected to SAN A, the server would see
the target FC_IDs for CTRL-1 and 2, and each would show LUN-A. The server
would have a similar view via HBA-P2, via SAN B.
If the server is not aware that it is seeing the same disk four times, it could write
different data to the same disk, leading to file system corruption. To prevent this
issue, a Multi-Path I/O (MPIO) driver is required for host HBAs. The MPIO feature
ensures that each of the four paths is identified and recognized as a separate connection to the same LUN. If one path fails, the MPIO automatically switches to a
different path, enabling continuous service in the face of hardware or connection
failures.


MPIO also makes load-sharing options available. Various algorithms could be used
to split the load among different paths, some based on connections, some based on
perceived load. Load balancing may require special load-balancing software
installations on the server, and must be configured by the server administrator. The
fabric has no control over these load-sharing functions.

Multi-path - Automatic Failover
MPIO facilitates automatic failover functionality. In Figure 5-60, the active link
between HBA-P1 and the SAN A switch has failed. The MPIO driver will
immediately detect this failure, and use HBA-P2 for continued service.

Figure 5-60: Multi-path - Automatic Failover

This failover feature is transparent to the server, and no special fabric configuration
is required to support this feature. The network administrator must simply ensure that
both fabrics support the same services and connections. If a certain storage target
was only connected to SAN A, there is obviously no failover capability available to
this target via SAN B.
A similar requirement relates to fabric zone configuration, which must be configured
identically on both fabrics A and B. If Fabric B's zone configuration filters server
visibility to a target, failover functionality is broken.

Fabric Zoning
Fabric zoning provides for access restrictions inside a VSAN, and so is configured
separately for each VSAN. This zoning configuration controls which nodes may
communicate with each other.
The objective is to ensure that host initiators can only discover intended targets. This
control is implemented as a set of permit and deny statements, similar to a TCP/IP-based ACL.
Figure 5-61 shows two storage systems. One is a Tier-1 production system for ESX
hosts, named 3Par. The other is a Tier-2 system named MSA. The server named
ESX-1 is placed in a zone named ESX, along with the 3Par storage system. The
archive server is placed into the archive zone with the MSA storage system.


Figure 5-61: Fabric Zoning

Zones can be configured such that only devices in the same zone may discover each
other. Although all systems share the same fabric, their access scope is limited by
zoning. It is quite easy for the network administrator to modify this behavior at will.
Existing servers can be granted additional access, or have stricter filtering controls
applied, and new servers and zones can be added or modified.

Fabric Zoning Concepts


A zone member simply refers to a node, either by FC_ID or the port WWN (pWWN).
For reasons soon to be described, it is recommended that the actual FC_ID or
pWWN be abstracted in the zone configuration through the use of Zone aliases.


Zones are defined in order to group members together. Traffic between all members
of the same zone is allowed. It is often best to avoid having a large number of
members in a single zone, as this will increase the number of access rules to be created.
Instead, consider creating small zones for point-to-point connections. It is also
recommended to have only one initiator per zone. For example, a zone may be
created to allow host ESX-1 to access the 3Par storage target. A second zone could
be created to allow host ESX-2 to access 3Par, and so on. This is often preferable to
creating a single zone with several server and storage system members.
With two hosts and a target in the same zone, the switch needs a rule to permit ESX-1
to ESX-2, and to 3Par, and rules in the other direction, from 3Par to ESX-1, along
with rules from ESX-2 to ESX-1 and 3Par, and back. A zone with 10 members
requires a rapidly growing number of rules, since you must allow each member to see
every other member. Creating zones with only two members is recommended, since it
preserves hardware resources and reduces configuration efforts.
As shown in Figure 5-62, defined zones can be grouped into a zone set. The zone
database supports multiple zone sets, but only one zone set can be active at any time.
The zone database is distributed to all FC switches, and can be configured to share
all zone sets, or only the active zone set.

Figure 5-62: Fabric Zoning Concepts

Zone Members
As shown in Figure 5-63, zone members can be identified based on FC_ID or WWN.
Remember that an FC_ID is dynamically assigned, and can change over time. This
lack of permanence makes the FC_ID a less reliable identifier for security purposes.
However, switches also support static FC_ID assignment, thereby eliminating this
concern.

Figure 5-63: Zone Members

FC_ID-based zoning is often referred to as hard zoning, since it is enforced at the
hardware level. This makes it a more secure method of zoning, especially in
conjunction with fabrics that contain untrustworthy nodes, or nodes not under your
direct administrative control.
WWN-based zoning is often called soft zoning, since it is enforced by the name
server. When servers query the name service, zoning can filter the response, so that
initiators will only learn about authorized targets.
Since FC_ID can change, WWN-based zoning is considered a more stable method of
access control. For example, if a VM is configured with a virtual HBA (vHBA),
giving it direct Fibre Channel access, the VM will have its own FC_ID and WWN. If
this VM is moved to a different host, its FC_ID would change, but the WWN is
maintained and would therefore have consistent SAN access.


Zone Aliases are logical names that the administrator can assign, and to which the
host's FC_ID or WWN can be bound. If a CNA or HBA must be replaced, only the
zone alias configuration need be updated. The rest of your zone configuration remains
valid, since it only references the alias.
Zone Aliases also ease the process of copying zone configuration from SAN-A to
SAN-B. The WWNs are unique between SAN A and B, but since your zone
configuration only references aliases, it can simply be copied from SAN A to SAN
B.

Zone Enforcement
Hard zoning is the default zoning method for Comware devices. This means
permitted source and destination address information is programmed at the ASIC
level, creating a hardware-enforced ACL, permitting and denying traffic based on
information in the transmitted frames.
Since ASICs have a limited number of resources, an overly large zone set may force
the switch to use soft zoning. This is especially true for zones that have been
configured with many members. The change from hard to soft zoning occurs automatically when hardware resource limits have been reached, see Figure 5-64.


Figure 5-64: Zone Enforcement

The switch to soft zoning means that filtering is no longer enforced at the packet
level. Instead, filtering occurs when the switch responds to name service requests.
For example, when the archive server queries the name service for targets, its
response only includes the MSA target. Since the 3Par storage target is not included,
the archive server is unaware of that target.
Access to that target is technically possible, but would require a relatively skilled
hacker to determine the FC_ID for 3Par, and reprogram the HBA to transmit frames
directly to this FC_ID, without use of standard discovery mechanisms.

Zoning Configuration Prerequisites



Prior to zone configuration, an operational VSAN must be configured, and the port
WWNs for hosts must be documented.

Configuration Steps for Zoning


Figure 5-65 introduces the steps to configure zoning, as detailed in the following
pages.

Figure 5-65: Configuration Steps for Zoning

Configuration Step 1: Prepare Zone Member Alias
Zones can directly reference FC_ID or pWWNs. However, zone member aliases can
provide ongoing administrative advantages. While multiple members can be
configured with the same alias, it is a best practice to configure a unique alias per
member to support more granular security controls in the future.
In the example shown in Figure 5-66, zone aliases are analogous to objects used in an
ACL. Two arbitrary but administratively meaningful zone alias names are configured,
and the associated pWWNs are assigned to them.


Figure 5-66: Configuration Step 1: Prepare Zone Member Alias

All zone configuration is VSAN specific. You must deploy separate zone
configurations for each one. However, with consistent zone aliases, you can simply
copy and paste the rest of your zone configuration among VSANs.
In other words, the configuration indicated in Figure 5-66 is the only portion of the
zoning deployment that will be unique between VSANs. All of the zoning
configuration described in the following pages can simply be configured once, on
VSAN 10, and then copy/pasted to your other VSAN.
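As a rough sketch of what the alias preparation in Figure 5-66 might look like on a Comware switch, assuming VSAN 10 and placeholder pWWN values (the alias names, WWNs, and view prompts shown here are illustrative, and exact keywords may vary by software release):

<Switch> system-view
[Switch] vsan 10
[Switch-vsan10] zone-alias name esx1
[Switch-vsan10-zone-alias-esx1] member pwwn 10:00:00:05:1e:aa:bb:01
[Switch-vsan10-zone-alias-esx1] quit
[Switch-vsan10] zone-alias name 3par1
[Switch-vsan10-zone-alias-3par1] member pwwn 20:01:00:02:ac:cc:dd:01
[Switch-vsan10-zone-alias-3par1] quit

Because only the pWWN values differ between fabrics, the same alias names can be reused when building the equivalent configuration for the other SAN.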

Configuration Step 2: Define Zones


Zone definitions are analogous to individual lines (ACEs) in an access list. Zone
members are allowed to communicate with each other.
In the example in Figure 5-67, a zone named esx1-3par1 is created and members are
specified, based on the aliases created previously. Since aliases were used, this and
all remaining zone configuration syntax can simply be copied and pasted into the
other VSAN. Also, this example follows the previously mentioned best-practice of
creating small, point-to-point zones, to ensure ASIC resources are not unduly taxed.


Figure 5-67: Configuration Step 2: Define Zones

Another example is also shown in Figure 5-67, to reveal the syntax used to base zone membership directly on FC_IDs and pWWNs. It is also shown to point out that mixing WWN and FC_ID members in a zone is not considered a best practice, and should be avoided.
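In CLI terms, the zone definitions might be entered roughly as in the following sketch. It assumes the aliases created in step 1, plus a second illustrative zone that uses direct pWWN and FC_ID members (which, as noted above, is not a best practice); names, addresses, and prompts are placeholders:

[Switch-vsan10] zone name esx1-3par1
[Switch-vsan10-zone-esx1-3par1] member zone-alias esx1
[Switch-vsan10-zone-esx1-3par1] member zone-alias 3par1
[Switch-vsan10-zone-esx1-3par1] quit
[Switch-vsan10] zone name archive-msa1
[Switch-vsan10-zone-archive-msa1] member pwwn 10:00:00:05:1e:aa:bb:02
[Switch-vsan10-zone-archive-msa1] member fcid 010001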

Configuration Step 3: Define a Zone Set


Now that zones have been defined, they can be grouped together into a zone set.
Similar to how an ACL groups individual ACEs into an applicable entity, a zone set groups zones together into a single entity.
The example in Figure 5-68 shows a set named Zoneset1 being created, with the
previously defined zones specified as members.

Figure 5-68: Configuration Step 3: Define a Zone Set
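A sketch of the corresponding commands, using the zone and zone set names from this example (exact view prompts may differ by release):

[Switch-vsan10] zoneset name Zoneset1
[Switch-vsan10-zoneset-Zoneset1] member esx1-3par1
[Switch-vsan10-zoneset-Zoneset1] quit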

Configuration Step 4: Distribute and Activate Zone Set
Continuing with the access list analogy, an ACL will group individual ACEs into an entity that can then be applied to an interface. Similarly, the defined zone set collects zones together into an entity, which can be distributed and activated in the fabric.
This entity will indeed be distributed to all switches in the fabric, including both
native Fibre Channel and FCoE-based switches. Recall that while multiple zone sets
can exist in the database, only one can be active in the fabric. As the network
administrator, you can configure the distribution to include all zone sets, or only the
active zone set.
Figure 5-69 shows how to configure the distribution for all zone sets, by using the full option, and then configuring which zone set is to be activated.

Figure 5-69: Configuration Step 4: Distribute and Activate Zone Set
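In CLI terms, the distribution and activation described above could look roughly like the following sketch, assuming the zone set name Zoneset1 in VSAN 10 (the keywords are based on the full option mentioned above and may vary slightly by release):

[Switch-vsan10] zoneset distribute full
[Switch-vsan10] zoneset activate name Zoneset1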

Configuration Step 5: Verify


Validation commands include the following:
display zoneset vsan
display zone name esx1-3par1
display zone-alias
display zone member fcid 010001
display zoneset active vsan 10

NPV NPIV Overview


This section provides insight into N_Port Virtualization (NPV) and N_Port ID Virtualization (NPIV). Terms will be defined and related concepts will be explored. You will then learn how this can improve multi-vendor interoperability, and how to configure the N_Port Virtualization role on a Fibre Channel device.

Server Virtualization with NPIV


The goal of N_Port ID virtualization is to enable Hypervisors such as VMware ESX
or Microsoft Hyper-V to extend physical HBA functionality to Virtual Machines
(VMs).


Figure 5-70 shows VMWare ESX server 1 with a physical HBA (pHBA). This
physical HBA will perform a traditional FLOGI to the fabric, and so the physical
ESX server gains access to LUNs.

Figure 5-70: Server Virtualization with NPIV

For a VM deployment, a virtual HBA is added to each VM's device list. On the
storage system, a LUN is defined for this virtual HBA to access. For example, the
VM might be a Microsoft Exchange Mail server, able to directly access the SAN
fabric through the virtual HBA, and use the defined LUN to store and retrieve data.
Multiple VMs run on one physical host, and each VM requires SAN access. This
means that each VM must perform an FLOGI to the fabric, and be assigned a unique
FC_ID and WWN. Since a single physical server is hosting multiple VMs, the FC
fabric perceives a single physical port performing multiple logins, with multiple
addresses assigned.


The FC switch fabric must have the ability to support this scenario, in the form of a
feature called N_Port ID Virtualization. Support for NPIV is enabled by default on
the Comware Fibre Channel switches. Both the ESX host and the physical HBA must
also support NPIV, since VMs are not aware of this concept.
The VM's virtual HBA is simply performing a traditional N_Port role, using standard FLOGI communications. Inside the ESX host, a virtual port is created towards each VM. The example in Figure 5-70 depicts two VMs deployed in a single ESX host, so virtual ports 1 and 2 are created. Each virtual port operates as an F_Port. As in physical fabrics, the VM's virtual N_Port connects to this virtual F_Port.
However, ESX hosts are not actually switches, and so are not capable of processing the virtual HBA's FLOGI request. This request is forwarded by the physical host to its upstream switch connection. The VM perceives that it is communicating with the virtual F_Port for FLOGI, while the physical server actually forwards this on to the physical switch.
With NPIV, the host server's physical HBA performs FLOGI and receives the first available FC_ID. The physical HBA will then proxy the VM's virtual HBA login toward the upstream FC switch. The VM's virtual HBA receives the next available FC_ID.
For data forwarding, it is important to understand that all storage traffic must leave
the physical HBA. For example, if VM 1 is running FC target software, it can operate as a disk storage system and accept incoming connections. VM 2 is configured as a
typical server, and so could use VM 1 as a target. However, this traffic cannot stay
inside the ESX environment because it is not a fibre channel switch. It has no
knowledge of FC routing and zoning.
This is why the physical ESX host must always forward traffic upstream to an actual
switch, which can route the traffic to an appropriate destination. This could be back
over the same physical interface, in this example.
This is a very unlikely scenario because most VMs are used as initiating hosts. Still,
the possibility does exist, and the scenario serves to highlight the relationship
between virtual and physical components.

FC Switch with NPV Mode


A Fibre Channel switch can be configured to operate in NPV mode, to take advantage
of the NPIV concept. Essentially, this moves the functionality of ESX internal virtual
ports (described previously) out of the virtual realm toward the ESX physical server


and its directly attached physical switch ports.


The NPV mode switch has an uplink to another physical switch, assigned a domain ID of 0x01 in this example. This uplink is configured as an N_Port Virtualization port (NP_Port), which indicates that it will proxy FLOGI requests to the upstream switch.
The downlink ports connected to hosts are configured as F_Ports, since the attached
ESX hosts connect with traditional HBA N_Ports. Since no fabric services are
provided by this switch, all fabric service requests received from the ESX hosts are
proxied to the upstream fibre channel switch at Domain ID 0x01. All FLOGI sessions
are proxied to this upstream switch, which will assign FC_IDs.
In Figure 5-71, the NPV switch logs into switch 0x01, receiving the first available
FC_ID of 0x010001. Assuming the ESX servers are the next to login, they will be
assigned FC_IDs of 0x010002 and 0x010003.

Figure 5-71: FC Switch with NPV Mode


When one of the physical servers performs a name service lookup, this is also
proxied through the NPV switch, on to FC switch 0x01.
Also, the same data forwarding rules apply as in the previous example. All traffic
must exit the NPV switch. If host ESX 1 requires a storage connection to ESX 2, this
request must be forwarded upstream to switch 0x01, which will then send that
request out the same physical interface, back through the NPV switch, and on to the target at 0x010003. As before, this is an unlikely scenario, since hosts are typically organized
behind the NPV switch, while storage systems would be connected to the fibre
channel switch.

FC Switch NPV Mode - Considerations


Figure 5-72 compares advantages and disadvantages of using a Fibre Channel switch configured to operate in N_Port Virtualization (NPV) mode.

Figure 5-72: FC Switch NPV Mode - Considerations

NPV can simplify fabric services because there's no need to distribute the zone
information to other switches. Switches operating in NPV mode do not take part in
zone enforcement, because all of this communication flows through the NPV mode
switch to be processed and enforced by the real fibre channel switch.
Similarly, there is a reduced number of name service and routing updates, as both of
these services are also maintained solely by the real fibre channel switches. This
means that name service databases are smaller, while routing tables and topologies
are simplified.


Meanwhile, redundancy capabilities remain, since NPV mode switches are capable
of link aggregation to the native FC switch fabric. Also, the concept of redundant
SAN A and SAN B is available for redundancy by simply configuring two NPV
mode devices for the two different fabrics.
An advantage for larger deployments is that NPV mode reduces the number of domain IDs in use; a fabric supports a maximum of 239 domain IDs. Since NPV mode switches do not consume a domain ID, greater scalability is available.
Another advantage relates to greater vendor interoperability. There is no real standardized interoperability for the fabric services. Concepts are well described, but most vendors have different features and methods of implementing these concepts. Practically speaking, there is little to no actual interoperability between the vendors. NPV switches do not take part in actual fabric services. They simply emulate a traditional node (or multiple nodes), and this is a very standardized mechanism which will work fine with other vendors.
For example, a server could be connected to a 5900 NPV switch, which in turn is
connected to a Brocade FC fabric. This effectively integrates a Comware 5900
switch into an existing Brocade fabric.
Another example involves using a Virtual Connect FlexFabric module in NPV mode. Blade servers connect to the Virtual Connect FlexFabric device, and that device, configured with an NP_Port, connects to a full Comware-based 5900 Fibre Channel fabric.
One perceived disadvantage is that the traffic must leave the NPV switch and travel
to an actual FC switch to get forwarded to a target. Practically speaking, this is not an
issue since most designs place initiators behind the NPV switch, and targets behind
the FC fabric switches. Traffic must traverse this path anyway.
Another possible disadvantage relates to link oversubscription. Since multiple servers may be connected through a reduced number of uplinks, you must ensure that sufficient bandwidth is available on the uplinks.

Prerequisites to Configure NPV Mode


Before configuring a switch to operate in NPV mode, the system working mode must
be set to advanced, which requires a system reboot. Also, you should verify that no
existing FCoE mode configurations have been applied.


Configuration Steps for NPV Mode


The steps to configure NPV mode are shown in Figure 5-73.

Figure 5-73: Configuration Steps for NPV Mode

The steps involve globally enabling this mode on the switch, and then configuring the
VFC or FC Interfaces. Uplink interfaces must be configured as NP_Ports, and
downlink interfaces must be configured as F_Ports. Finally, the configuration should
be verified.
Notice that step 2 involves configuring the port as either a virtual or native FC port.
This means that the NPV mode can act as a convenient migration and interoperability
mechanism between native Fibre Channel and FCoE systems. This is because a 5900
CP could use FCoE over Virtual FC interfaces to connect to the servers, while using
native FC interfaces to connect to a traditional Cisco, HP, or Brocade fabric.
If there are legacy servers with native Fibre Channel, they can be connected to the
downstream native FC interfaces of the NPV switch. This switch can connect
upstream to native fibre channel storage system, while simultaneously connecting via
FCoE to other storage systems.

Configuration Step 1: Configure Global NPV Mode

The first step is to enable global NPV mode. This is done with the fcoe-mode npv command, as shown in Figure 5-74.

Figure 5-74: Configuration Step 1: Configure global NPV mode

The fcoe-mode command supports a single configuration option only. A single switch cannot function in both NPV mode and Fibre Channel forwarding (FCF) mode at the same time.
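A minimal sketch of this step, assuming the system working mode has already been set to advanced and the switch has been rebooted:

<Switch> system-view
[Switch] fcoe-mode npv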

Configuration Step 2: Configure FC or VFC Interfaces
The second step is to configure FC or virtual FC interfaces, similar to previous
configurations which we have seen. You can configure a native fibre channel
interface, ensuring that the correct optics have been installed.
The top example in Figure 5-75 shows an interface that started under the assumption
that it would operate as 10Gbps Ethernet. The command port-type fc was issued,
converting this port to a native fibre channel port.

Figure 5-75: Configuration Step 2: Configure FC or VFC Interfaces


The bottom example in Figure 5-75 illustrates the configuration of a virtual Fibre Channel interface for FCoE, assuming that DCB is already configured. Interface Ten-GigabitEthernet 1/0/4 is configured as a trunk link, and configured to allow VLAN 10 to traverse this link.
Next, interface VFC 4 is created, bound to interface Ten-GigabitEthernet 1/0/4, and made a member of VSAN 10.
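Putting the two examples together, the interface portion of the configuration might look roughly like the following sketch. The interface numbers, VLAN 10, and VSAN 10 follow the figures; the VFC binding and VSAN membership keywords shown here are assumptions and may differ slightly between releases:

# Native Fibre Channel port, converted from 10GbE (requires the correct FC optics)
[Switch] interface Ten-GigabitEthernet 1/0/3
[Switch-Ten-GigabitEthernet1/0/3] port-type fc

# FCoE: Ethernet trunk carrying VLAN 10, with a VFC interface bound to it
[Switch] interface Ten-GigabitEthernet 1/0/4
[Switch-Ten-GigabitEthernet1/0/4] port link-type trunk
[Switch-Ten-GigabitEthernet1/0/4] port trunk permit vlan 10
[Switch] interface vfc 4
[Switch-Vfc4] bind interface Ten-GigabitEthernet 1/0/4
[Switch-Vfc4] port trunk vsan 10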

Configuration Step 3: Uplink Interface NP_Port
The uplink interface should be configured as the N_Port Virtualization port. This is the interface that will connect to an available port on a Fibre Channel switch. This would typically be a Fibre Channel Forwarder (i.e., a native Fibre Channel switch). However, it could be another NPV switch's F_Port.
For example, a blade server could be connected to a Virtual Connect FlexFabric module, as shown in Figure 5-76. The Virtual Connect FlexFabric module, operating in NPV mode, connects to a 5900 CP, also operating in NPV mode, which is connected to a Brocade SAN switch.

Figure 5-76: Configuration Step 3: Uplink Interface NP_Port

In this scenario, all the fabric logins will be handled by the Brocade SAN switch with a single domain ID. As long as this is a typical deployment, and storage system targets are not running on the blade server, the system will work fine.
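The uplink itself typically needs only the port mode set on the FC or VFC interface, as in this sketch (the interface number is illustrative, and the np keyword is assumed here as the counterpart of the fc mode f command used for downlinks):

[Switch] interface fc 1/0/1
[Switch-Fc1/0/1] fc mode np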

Configuration Step 4: Downlink Interfaces F_Port
The downlink interfaces will connect to the N_Port of the server's HBA or CNA. The hosts will see an F_Port and initiate FC link setup. This host FLOGI session will be proxied by the NPV switch to the upstream Fibre Channel switch.


As shown in Figure 5-77, this is accomplished with the fc mode f command.

Figure 5-77: Configuration Step 4: Downlink Interfaces F_Port
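As a brief sketch, a downlink VFC interface facing a server CNA might be configured as follows (the interface number is illustrative):

[Switch] interface vfc 4
[Switch-Vfc4] fc mode f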

Configuration Step 5: Verify Status


The NPV switch status should be verified. Using the display npv login command, you can see the actual logins. Even though the logins are not processed by the NPV mode switch, it still keeps track of them, enabling you to see which FC devices are logged in to which port.
This is important. When the NPV mode switch receives data for a specific FC_ID, it must know which downstream interface should receive this traffic.
Use the display npv status command to validate operational status.
With the display npv traffic-map command, you can see which downstream ports
are currently using which upstream ports. If multiple upstream ports are available,
the NPV switch can perform a kind of load distribution.
It does this by assigning some downstream ports to uplink 1 and other downstream
ports to uplink 2, for example. This can be seen by displaying the traffic map.
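Taken together, a verification pass on the NPV switch might simply consist of the three display commands described above:

<Switch> display npv login
<Switch> display npv status
<Switch> display npv traffic-map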

Summary
In this chapter, you learned about how various infrastructure components create an
integrated SAN fabric. This included discussions about HBAs, CNAs, native Fibre
Channel switches, and FCoE switches.
You learned that FC fabrics deploy various numbering schemas, such as FC_ID
addresses for data transmissions, static and FSPF-based routing, and WWNs with
zoning to control the targets that specific server initiators are allowed to use.


Key services provided by the SAN fabric include the Fabric Login service, which
formalizes the connection of hosts to the fabric, and the Simple Name Service used to
map FC_IDs to WWNs. VSANs enable separate logical storage fabrics to share a
common physical infrastructure, which can lower costs and improve security.
You also learned that MPIO is required to leverage the improved reliability and
performance provided by multi-path redundancy and load-sharing.
Finally, NPIV and NPV were discussed as methods of enabling hypervisors such as VMware's ESX or Microsoft's Hyper-V to make HBA or CNA functionality available to internally hosted Virtual Machines.

Learning Check
Answer each of the questions below.
1. Which statement below accurately describes an FCoE deployment consideration?
a. FCoE protocols and systems are easily deployed as a multi-vendor solution.
b. Nearly any HP switch and storage system combination can be used to
deploy an FCoE solution
c. There is no need to deploy a specific version of firmware to the switches in
an FCoE deployment
d. HP has devised a set of converged networking cookbooks to ensure you are
deploying a validated set of storage arrays, servers, CNAs, and switches,
using specific firmware versions.

2. Choose three correctly described components of a typical FCoE deployment (Choose three).
a. A host is a server system that initiates disk read or write requests
b. A disk array is a target devices that responds to disk read or write requests
from a host.
c. N_Ports are used to connect host nodes to the fabric, while T_Ports connect target disk end systems to the fabric.

d. F_ports are those fabric ports that connect to either host initiators or target
disk arrays.
e. E_Ports are used to expand the fabric to multiple switches.
3. Which four statements below accurately describe FCoE naming and forwarding conventions (Choose four)?


a. A WWN provides a unique identifier for each FCoE device to enable frame
delivery
b. The WWN is somewhat like a BIA for a Layer 2 network interface that can
identify systems independently from the FC_ID
c. The FC_ID is a dynamically assigned address that is used as the source and
destination address of a frame.
d. With Comware devices, the FC_ID uses 16 bits to identify ports, increasing
addressing scalability.
e. Each switch is assigned a unique domain ID. This ID must be manually
assigned
f. The domain ID can be dynamically assigned or manually assigned.
4. Which two statements below accurately describe FC classes and flow control
(Choose two)?
a. BE flow control uses an R_RDY message for flow control.
b. BB flow control use a frame credit mechanism to help provide lossless
frame transmission.
c. All flow control mechanisms can be used by all FC classes.
d. EE flow control uses an ACK frame to let the transmitter know that the
previous frame was successfully received.
e. BB flow control is not well suited for time-sensitive applications.
5. What are two prerequisites to configuring NPV mode on a switch (Choose two)?
a. System-working-mode must be set to advanced, which immediately takes
effect
b. No existing fcoe-mode configurations should be in place
c. System-working-mode must always be left to its default value
d. System-working-mode must be set to advanced, which takes effect after a
reboot
e. The correct fcoe-mode must be configured before NPV mode is activated

Learning Check Answers


1. d
2. a, b, d, e


3. b, c, d, f
4. b, d
5. b, d


6 Transparent Interconnection of Lots of Links (TRILL)

EXAM OBJECTIVES
In this chapter, you learn to:
Describe the goal of TRILL.
Describe the use cases of TRILL.
Understand the operation of TRILL.
Configure TRILL.

INTRODUCTION
This chapter is focused on the TRILL protocol. You will learn the motivation for its
development, and how it can provide very large-scale Layer 2 infrastructure for data
centers.
You will also explore the details of TRILL operation, and understand how TRILL
leverages Layer 3 technologies to provide Layer 2 services. TRILL design and
deployment considerations will be discussed, as well as how to configure TRILL on
Comware devices.

TRILL Introduction
TRILL is an IETF standard that stands for Transparent Interconnection of Lots of
Links. The goal of the TRILL protocol is to provide large-scale Layer 2 fabric
services. The intent is to maintain the simplicity of traditional Layer 2 systems while
adding the scalability and convergence of a Layer 3 routed network.


From the perspective of an endpoint device, a standard frame continues to transport


data from source to destination MAC address. However, traditional Spanning Tree
Protocol (STP) between switches is replaced with the routing-like functionality of
TRILL.
STP uses a single path in the network, which may not actually be the best path for a
specific source-to-destination traffic flow. With TRILL, Layer 2 forwarding is based
on best path selection, very much like that of OSPF or IS-IS. This provides actual
best path selection while supporting a redundant, active-active topology.

TRILL Standards
The TRILL protocol is documented in a set of IETF RFCs, and was developed by
Radia Perlman, developer of the original IEEE STP. Figure 6-1 shows the RFCs that
describe the actual operation of the TRILL protocol.

Figure 6-1: TRILL Standards

TRILL frames use a hop count or Time-To-Live (TTL) mechanism, which must be
processed by switch hardware. For this reason, TRILL will typically only be
supported on newer generation Data Center switches which have been designed with
ASICs that support this TTL processing.

TRILL Concepts 1
A switch that runs TRILL is called a Routing Bridge, or RBridge. This is because it
is a Layer 2 bridging device that uses routing functionality to determine optimal data
flow.
Figure 6-2 shows four RBridges that form the TRILL network. This network supports
connectivity for standard endpoints and Classic Ethernet switches. Two such
switches are indicated in the figure, as CE-Switch1 and CE-Switch2.


Figure 6-2: TRILL Concepts 1

Each RBridge is identified by a unique system ID, which is automatically generated


by the TRILL protocol. The ID is based on device MAC address by default. A system
ID can be manually configured, but automatic generation works well.
The system ID is not used to forward frames. It is used to uniquely identify each
RBridge inside the Link State Database (LSDB) of the TRILL network. This can be
compared to how each OSPF router is identified by its router ID in OSPF's LSDB,
often derived from the IP address of a loopback interface.
An RBridge forwards frames through a TRILL network based on source and
destination nicknames. Each RBridge in the figure has a unique hexadecimal value
for this purpose. RB1 has been assigned a nickname of 0x0001, RB2 has 0x0002, and
so on.
When RB1 sends a frame to RB2, TRILL adds source and destination nicknames to
the frame. In this scenario, RB1's nickname of 0x0001 is the source, and RB2's


nickname of 0x0002 is the destination. Each RBridge in the path will process these
frames based on destination nickname address. Again, this can be compared with
how a routing protocol like OSPF routes frames based on destination IP address.
Nicknames can be automatically generated by the system. Unlike IP addressing,
nicknames do not have a hierarchical structure. There is no network and host portion
for nicknames, and neither is there any sort of mask. Nicknames create a simple, flat
address space.
Nicknames are 16-bit values, so the theoretical number of available addresses is
65,536. They can be randomly chosen from this space. However, an administrator
can manually assign nicknames, making network documentation and diagnostics more
intuitive. Instead of random hexadecimal numbers, a schema can be used to
distinguish between distribution and access switches, or perhaps to indicate rack
locations inside a data center.
TRILL is based on the IS-IS link-state routing protocol. IS-IS operates between all
RBridges, exchanging link state information, and building an LSDB. The SPF
algorithm is run on this database to determine optimal paths through the TRILL
network.
With TRILL terminology, CE stands for Classic Ethernet, and is used to indicate a
traditional Ethernet switch that does not run the TRILL protocol.

TRILL Concepts 2
Like all link-state routing protocols, TRILL uses the concept of a Designated Router,
called the Designated RBridge (DRB). DRBs improve update efficiency on multi-access networks by acting as a single point of contact. Thus, every device on a multi-access broadcast domain need not communicate with every other device. Each device
need only communicate new information to the DRB, which will inform others. Each
DRB generates Link State Advertisements (LSAs) for its multi-access network.
Another use for the DRB is concerned with access links, which are used to connect
the TRILL network to endpoints or CE switches. The DRB is responsible for
selecting the Appointed VLAN Forwarder (AVF).
CE-Switch2 in Figure 6-3 has a traditional Ethernet connection to RB1 and RB2. The
TRILL protocol can detect that RB1 and RB2 are interconnected via this CE device.
Each RBridge sends HELLO packets. RB1's HELLO packet is transported by CE-Switch2, and so arrives inbound at RB2. The same thing happens when RB2 sends a HELLO packet, and so the two RBridges discover each other on this access link.


Figure 6-3: TRILL Concepts 2

Once RB1 and RB2 discover each other, they elect a single DRB for the link. In this
example RB2 has won the election for the link connected to CE-Switch2. The DRB
can then select an AVF, which ensures that each access network VLAN is only
allowed a single connection to the TRILL network. This avoids loops. It also
prevents the TRILL network from perceiving a single MAC address as being sourced
by multiple RBridges.
This feature also enhances scalability when multiple access VLANs connect to a
TRILL network. TRILL is VLAN-aware, and so the links from CE-Switch2 to RB1
and 2 can be 802.1q trunks. In this scenario, a thousand VLANs are passing over
these trunk links. To split the load, the DRB might select itself as the AVF for VLANs
1-500, and it could appoint RB1 to forward traffic for VLANs 501-1000.
In this way, the DRB controls which device forwards traffic for a particular VLAN.
The actual port which is doing the forwarding is called the Appointed Port (AP).


This is similar to how Multiple Spanning Tree (MST) instances work.


While this delegation of traffic load is defined in the TRILL standard, the HP
Comware implementation does not support such distribution. Instead, the DRB is
always the AVF, with no distribution mechanism available. However, this is not a
practical limitation. TRILL can be combined with IRF topologies, allowing an
alternative mechanism for this redundancy.

TRILL Concepts 3
RBridges peer with each other using the IS-IS link state protocol. They exchange link
state information, build an LSDB, and calculate the best path to each destination.
These destinations are identified based on System IDs, which act as the LSP
identifier. The LSP also includes the nickname of the destination RBridge.
The ingress RBridge is the TRILL device receiving an Ethernet frame from an
endpoint or a CE Switch. This ingress RBridge encapsulates the original Ethernet
frame in a new TRILL frame.
This new frame contains source and destination nicknames in the TRILL network.
The ingress RBridge sets its own nickname as the source, and determines an
appropriate destination nickname to apply. This new frame is routed through the
TRILL network via the shortest path, based on the destination nickname.
In Figure 6-4, suppose RB1 must transmit data to RB2. A traditional STP topology
might choose the path from RB1 > RB6 > RB3 > RB7 > RB2. TRILL uses IS-IS, and
so will determine a more elegant path from RB1 to any one of the switches RB6-9,
and then straight down to RB2.


Figure 6-4: TRILL Concepts 3

In this scenario multiple paths exist between RBridges. If these paths are of equal
cost, then typical Layer 3 load-balancing principles can be applied. Endpoint unicast
traffic entering RB1 can be load-shared over the four different paths to destination
RB2. This is a significant scalability improvement over the original Spanning Tree
Protocol.
The RBridge connected to the destination end point is called the egress RBridge.
This device must remove the TRILL header, and transmit a standard Ethernet frame to
the endpoint.
The operation of TRILL is not affected whether frames enter as native, untagged
Ethernet frames, or whether they include an 802.1q tag. TRILL simply processes
frames based on the source and destination MAC address. Any 802.1q VLAN tag
will be maintained over the TRILL network.

TRILL Frame Format


Figure 6-5 shows three sections of header information for the payload. This includes
an outer header, the TRILL header, and an inner header.


Figure 6-5: TRILL Frame Format

The outer header is a typical Ethernet frame. It is built with destination and source
MAC addresses to traverse a single link between two RBridges, in a hop-by-hop
fashion. The receiving RBridge strips off the outer header, determines the best
outbound interface, creates a new outer header, and sends the frame. This frame
arrives at the next-hop RBridge, and the process continues.
It is possible to use a VLAN tag in this outer header, but know that this 802.1q tag has
local significance only. It is like a routed sub-interface used to connect between two
routers.
The TRILL header operates similar to an IP header in a Layer-3 routed network. This
end-to-end header is maintained across the entire TRILL network. This header
contains the nickname of the ingress RBridge as the source, and the egress RBridge
nickname as the destination.
Each RBridge receiving a TRILL frame discards the outer header, analyzes the
TRILL headers destination nickname, and selects the outbound interface along the
best path. It then builds a new outer header, and transmits the frame to the next-hop
RBridge. Each RBridge also decrements the Hop Count field in the TRILL frame.
This is similar to how the Time-To-Live (TTL) field is utilized in a Layer 3 routing


protocol. It mitigates any loops inside the TRILL network.


Inside the TRILL header is the original Ethernet frame created by an end system. The
egress RBridge strips off all other headers, leaving only this original Ethernet frame,
before transmitting to the intended endpoint.

TRILL Frame: Outer Header 1


As a best practice, RBridges should be directly connected. In Figure 6-6, this is not
the case, as it is possible to insert an intermediate CE switch between two RBridges.
This CE switch can perform traditional L2 forwarding, since the outer header has a
valid source and destination MAC address. Specifically, this would be the source
and destination MAC address of the RBridges on the local link. Since these are
typical MAC addresses, the CE switch can learn the addresses and forward frames.


Figure 6-6: TRILL Frame: Outer Header 1

The scenario depicted can serve to clarify why any VLAN tag on the outer header is
only locally significant. It is only used to traverse the connection between two
directly-connected RBridges. Of course, there is an intermediate CE switch between
the RBridges in this case. This CE switch must be configured to support any VLANs
and tagging used between RBridges. Stated another way, TRILL should not attempt to
use VLANs not supported by intermediate CE switches.
In reality this will be automatically handled by TRILL. TRILL exchanges IS-IS
HELLO packets over multiple VLANs. These HELLO packets will be tagged with
multiple VLAN IDs, enabling the RBridges to detect which VLANs can be
successfully passed between each other and which ones cannot.


TRILL Frame: Outer Header 2


Any VLAN tag used by the outer header is referred to as the designated VLAN. It is
acting as a transport VLAN tag used to reach the RBridge on the other side of a single
link. If this link supports 802.1q tagging, then multiple VLANs could be available for
the RBridges to use. In this case, the lowest VLAN ID is selected as the designated
VLAN.
If enabled on the interface, this would be VLAN 1. Often, VLAN1 is configured as
the PVID, which is untagged. If this is the case, then the outer header requires no
VLAN tag. The frames will be transmitted by RBridges as untagged.
If an intermediate Layer 2 switch is deployed, you must ensure that it is properly
configured for VLAN tagging. This CE switch may support multiple VLANs, some of
which are used for other purposes in the traditional Ethernet environment. In this
case, you can manually configure a designated VLAN.
For example, suppose that VLANs 10 through 20 are supported between RB1 and
RB2, but VLANs 10 through 19 are used for other purposes by the CE switch. The
administrator can manually configure the designated VLAN as VLAN 20.
The designated VLAN is configured per interface. If RB1 has a different interface
connected to RB4, that interface could be configured to use VLAN 1 as the
designated VLAN. Again, this is an optional configuration. RBridges should be
directly connected, and so would normally use untagged VLAN1 for TRILL.

TRILL Frame: TRILL Header


The TRILL header functions very much like a typical layer 3 IP header. The Ingress
RBridge receives user traffic. To build the TRILL header, it sets its own nickname as
the source. It then determines the best path to reach the destination, and sets the
appropriate egress RBridge nickname as the destination. It starts this process by
comparing the destination MAC address in the original header to its own MAC
address table.
If the destination address is unknown then the packet will be forwarded to the
destination nickname All-RBridges. The All-RBridges nickname is sent to a
Multicast distribution group and processed by all RBridges.
The MAC address is known if there is a MAC table entry for that address. This MAC
table doesn't bind MAC addresses to interfaces like a classic Ethernet switch.
Instead, the table binds MAC addresses to nicknames. The switch can thus choose the


appropriate egress RBridge, and place that nickname into the destination address
field of the TRILL header. Actual traffic forwarding inside the TRILL network is
based on this destination nickname.

TRILL Frame Flow Overview


Figure 6-7 introduces the general concept of how a frame travels through a simple
TRILL fabric. Only the header fields that are most important to understanding general
concepts are shown.

Figure 6-7: TRILL Frame Flow Overview

1. The client named CL1, with MAC address C1, creates a frame destined for the
client named CL3, with a MAC address of C3. In the example, an 802.1Q tag of
10 happens to be added, which could be the case if the end node were a VM-based server. If it were an actual end client, an 802.1Q tag might not be
included. The point is that ultimately, the original frame created by the source
node will be successfully delivered to the destination node.
2. The TRILL Routing Bridge RB1 receives this frame. In this case, RB1 has
learned that CL3 is located at the site behind RB3. The process of how this is
learned will be discussed in the next section of this chapter. RB1 adds a TRILL
header, placing its nickname in the Ingress RB field, and RB3's nickname as the ultimate destination.


a. The hop count field is set to a number high enough to reach the ultimate
destination. The Multi-destination, or M bit is set to zero, indicating that
the frame is a unicast frame.
b.

RB1 adds an outer frame, with itself as the source, and RB2 as the
destination. In this example, VLAN 20 is used to move frames through the
TRILL fabric.

3. RB2 receives the frame, strips off the outer header, and adds a new one, with
itself as the source and RB3 as the destination. With the exception of the hop
count field being decremented, the TRILL header is not modified. The example
uses Ethernet as the only Layer 2 protocol. However, the connection between
Routing Bridges could be PPP, or nearly any other Layer 2 protocol. The frame is transmitted to RB3.
4. RB3 receives the frame, and strips off the outer header. The TRILL header
indicates that RB3 is the egress Routing Bridge, and so the TRILL header is also
removed.
a.

Now the original, inner header can be analyzed. RB3 sees that the
destination MAC address is for client CL3, and forwards the frame
accordingly.

TRILL Operation
This section is focused on TRILL operation. This includes both unicast and multi-destination forwarding, along with a discussion of how TRILL forms adjacencies. TRILL link types, trees, and DRB operation will also be covered, along with some design considerations.

TRILL Forwarding: Multi-Destination 1


Multi-destination traffic applies to broadcast, multicast and unknown unicast traffic.
To handle this traffic, TRILL creates a distribution tree to ensure loop-free delivery.
This is very similar to a classic Spanning Tree topology. However, while STP uses
this topology for all traffic, TRILL only uses it for broadcast, multicast, and unknown
destination unicast frames. It uses the shortest path for all other traffic.
The basic principle is that there are some tree roots inside the TRILL network. All
other RBridges calculate the shortest path to the tree root. These paths are indicated
by the thick lines in Figure 6-8. Multi-Destination traffic will be forwarded based on
that tree topology.


Figure 6-8: TRILL Forwarding: Multi-Destination 1

TRILL Forwarding: Multi-Destination 2


In the example scenario in Figure 6-9, Server2 and Server4 need to communicate. To
begin this communication, Server2 sends an ARP broadcast to discover Server4's MAC address.


Figure 6-9: TRILL Forwarding: Multi-Destination 2

RB2 is the ingress RBridge for this frame. It receives the frame and determines that
the destination address is a broadcast. It creates a TRILL header by adding its own
nickname as the source and by adding the tree name as the destination. Usage of a
unique tree name enables multiple distribution trees within a single TRILL topology.
The ability to use multiple distribution trees enables load balancing across the TRILL
fabric. It is the ingress RBridge that chooses which path shall be used.

TRILL Forwarding: Multi-Destination 3


RB2 forwards this frame into the tree. Core RBridge RB1 receives the frame and forwards it based on the tree name. Downstream access RBridges RB3 and RB4 receive this frame and analyze the endpoint user's source MAC address, along with the source nickname, as shown in Figure 6-10. Thus the sender's MAC address is now learned. It is added to the TRILL switch's MAC address table and bound to the associated source nickname, as gleaned from the TRILL frame.

Figure 6-10: TRILL Forwarding: Multi-Destination 3

Compare this to traditional Ethernet switches. A CE switch looks at the source MAC
addresses of frames to automatically bind MAC addresses to outgoing ports, and
stores this information in a MAC address table. Similarly, a TRILL switch looks at
source MAC addresses to automatically bind them to source nicknames, storing this
information in a TRILL MAC address table.
Each TRILL access switch that received this frame will remove the TRILL header
and send a traditional Ethernet broadcast frame out its access ports. Both Server3
and Server 4 receive the ARP request from Server 2.

TRILL Forwarding: Unicast 1


As described in Figure 6-11, endpoint MAC addresses are learned, and bound to a nickname. Once this occurs, the TRILL devices can perform unicast forwarding.

Figure 6-11: TRILL Forwarding: Unicast 1

TRILL Forwarding: Unicast 2


Continuing with the scenario in Figure 6-12, Server 4 has received the ARP request
from Server2. It recognizes its address in the request, and so must send an ARP
unicast reply back to Server2.


Figure 6-12: TRILL Forwarding: Unicast 2

RB4 receives this unicast frame, and looks up the destination MAC address in its
table. It finds a match, and sees that the MAC address for Server 2 is bound to the
nickname for RB2.
RB4 creates a TRILL frame, using its nickname as the source, and RB2's nickname as the destination.

TRILL Forwarding: Unicast 3


RB4 finds a match for the destination MAC address, and so determines the
destination nickname is RB2's. Next, RB4 looks up the destination interface for this
nickname.
As TRILL devices, these RBridges have been running IS-IS, and have already
determined the shortest path to RB2. RB4 thus knows which physical interface should
be used to forward data to RB2. RB4 forwards the TRILL frame toward RB2 over


this interface. RBridges in the path to RB2 receive the frame and forward it along the
best path to RB2.
RB2 receives the frame. It gleans the endpoint's source MAC address, and binds it to RB4's nickname in its MAC address table. Finally, it removes the TRILL header and
sends a standard Ethernet frame to Server 2.
In this example the direct link connecting RB2 and RB4 is the single best path
between Server 2 and Server 4, as shown in Figure 6-13. If that link fails, there are
two possible paths remaining. One path is via RB3 to RB2, and the other is via RB1
to RB2. It would be possible in that case to leverage equal cost multipath
capabilities. The load could be divided between these two paths.

Figure 6-13: TRILL Forwarding: Unicast 3

TRILL RBridge Peering and Adjacencies


As discussed, TRILL uses the IS-IS link state routing protocol, using the SPF algorithm. This IS-IS protocol runs at Layer 2 on the switches, using its own hello
frame for peer discovery. The TRILL IS-IS protocol operates as a result of basic
TRILL configuration, and does not require specific configuration of its own.
Links and link costs are described as TLVs inside LSPs. As shown in Figure 6-14,
the information from these LSPs is stored in the LSDB. The LSDB includes every
possible path to each destination. Then the SPF algorithm is performed on this
database to determine the best paths. This is based on path costs, as advertised in
LSPs.

Figure 6-14: TRILL RBridge Peering and Adjacencies

Just like with OSPF running on Layer 3 routers, the LSDB is a list of all possible
paths, and the routing table is a list of the best paths. This routing table lists every
destination nickname, along with the outgoing interface used to get there.

TRILL Port Types


TRILL uses three port types. These are the TRILL access ports, TRILL Hybrid ports,
and Trunk ports. These port types should not be confused with traditional interface
port types. They are separate and distinct from VLAN access, hybrid, and trunk ports.
Figure 6-15 represents TRILL access ports with a solid line. TRILL access ports


only forward user frames. They do not accept TRILL-encapsulated frames. TRILL
hello frames may be exchanged on these interfaces, in case a designated RBridge
election is required.

Figure 6-15: TRILL Port Types

If you are sure that a port will be the only TRILL access port for a given segment, you
can use the access alone option. This effectively makes the interface a silent
interface, since no hello frames will be exchanged. If two switches with access alone ports are connected to a common CE switch, they will both be in the forwarding state. This
causes MAC address flapping inside the TRILL network. For this reason, you should
be mindful when using this option.
In Figure 6-15, Server4 has a single link to RB4, making it safe to use the access
alone option on that interface. TRILL is fully supported in combination with IRF.
RB4 could be an IRF system of two physical switches, with Server4 connected to
two physical interfaces in the bonded or teamed configuration. This logical interface
would be perceived by TRILL as a single interface. In this way it is possible to gain
redundancy for endpoints connected to access alone ports.
Figure 6-15 also shows CE-Switch 3 connected to both RB3 and RB4 for
redundancy. In this scenario, it is vital that all the traffic entering the TRILL network
from CE-Switch3 enters via a single port. To that end, RB3 and RB4 connections to
this switch are configured as access ports. Therefore they do not support TRILL
backbone traffic, but will exchange TRILL hello frames. One of them will be elected
as the designated RBridge and will actually forward CE-Switch3 frames into the
TRILL network. The other interface will be in the blocking state.


Like an access port, a TRILL Hybrid port can forward user frames. It can also be
used to transmit TRILL frames and operate as a transit link. Figure 6-15 shows the
RB1 and RB2 connections to CE-Switch2 are configured as Hybrid ports. This is not
a recommended configuration, but does provide some additional redundancy. If the
links connecting RB1 and RB3 on the left to RB2 and RB4 on the right were to fail,
there would still be a backbone-capable path via CE-Switch2.
Most of the VLANs in a CE switch will be user VLANs. In this hybrid scenario it can make sense to manually specify the designated VLAN, which will be the actual transport VLAN used by TRILL.
TRILL trunk ports can only be used to transmit TRILL frames. They do not support the transmission or receipt of traditional Ethernet frames. Only connections with other RBridges can be made. Effectively, only a single VLAN is needed on a TRILL trunk port.

TRILL Multi-Destination Tree Calculation 1


You previously learned how TRILL multi-destination traffic is forwarded very much
like a classic STP scenario, over a spanning-tree topology with one bridge acting as
the trees root. Like classic STP, root bridge election is based on a configurable
priority value that defaults to 32768. The difference is that with STP, the lowest
priority value wins the election, while with TRILL the highest priority wins. If there
is a tie, the highest nickname wins.
In Figure 6-16, RB1 is configured with priority 65000, so it is the tree root. RB2 has
the second-highest priority of 64000, and so would become the new root if RB1 were
to fail.


Figure 6-16: TRILL Multi-Destination Tree Calculation 1

TRILL Multi-Destination Tree Calculation 2


A Spanning Tree can have only one Root Bridge. However, that Root Bridge can
request that other trees be created in the same physical topology, each with its own
Root Bridge. This can make sense if multiple equal cost paths are available towards
the tree root.
If multiple equal-cost paths are not available the different trees will simply follow
the same topology. In Figure 6-17, RB1 is the tree root, which has been configured to
request two tree calculations.


Figure 6-17: TRILL Multi-Destination Tree Calculation 2

RB3 does not have multiple equal-cost paths to RB1. Therefore, the trees for both
VLAN 11 and VLAN 12 use the same link to RB1. This is also the case for RB2.
RB4 has two equal-cost paths to RB1. This means that the tree for VLAN 11 can get
to the RB1 root via RB2, while the tree for VLAN 12 can use RB3. In this way,
traffic load can be shared among equal-cost paths.
The Ingress RBridge can map traffic to a multicast distribution tree, per VLAN. This is
not a configurable part of the TRILL definition. In the example, VLAN 11 may use the
topology indicated by the dashed lines, and VLAN 12 may use the topology indicated
by the solid lines. Or the opposite may be the case. As long as the load sharing
occurs, network performance is optimized.

TRILL Designated Routing Bridge


A DRB is selected for each link, based on the highest priority assigned to RBridge interfaces on that link. The default priority is 64. In case of a tie, the highest MAC address will win.


Figure 6-18 shows that RB1 and RB2 are connected in a TRILL backbone. RB1 is the
DRB, which means that either someone configured it with a higher priority, or the
priorities on RB1 and RB2 were the same, and RB1 had a higher MAC address.

Figure 6-18: TRILL Designated Routing Bridge

DRB responsibilities depend on link type. On backbone links, the DRB must advertise a pseudo node, used to represent that link. This is standard IS-IS terminology, and similar to how an OSPF Designated Router (DR) operates on any multi-access network, like Ethernet. This responsibility has to do with the internal operation of the link-state protocol.
For access links, the DRB must assign the AVF role, which controls actual data flow.
As previously mentioned, with Comware implementations, the DRB always assigns
itself as the AVF.
Figure 6-18 shows RB1 and RB2 connected via CE-Switch2. Since RB1 is a
Comware switch that won the DRB election, it also fills the role of AVF. All data
from CE-Switch2 enters the TRILL network via RB1.
RB2 is neither a DRB nor AVF. If the connections from both RB1 and RB2 support the same VLAN or set of VLANs, then RB2's connection to CE-Switch2 remains in a blocking state. This prevents a loop.
However, if RB1's connection to CE-Switch2 only supports VLAN 100, while RB2's connection only supports VLAN 200, then there is no need for RB2's
use of Hello frames.

Unicast ECMP
ECMP allows load-sharing across multiple equal cost paths. This can be controlled
in part by device-level maximum ECMP settings. This option functions at the ASIC
level. It controls the maximum number of paths that can be calculated for all possible ECMP protocols. This includes OSPF, BGP, TRILL, and so on. All of these protocols must respect the hardware's maximum configured capability.
ECMP is also controlled by TRILL maximum unicast ECMP path settings. This limit is set to 8 paths by default. The 5900-series supports up to 32 paths at the hardware level. This configuration can never exceed the device-level ASIC's maximum ECMP path settings.
It is the ingress RBridge that performs traffic distribution. In the example in Figure 6-19, Server4 transmits data to Servers A1 and A2. The MAC addresses for A1 and
A2 are both learned on RB1. Since RB4 has multiple equal-cost paths to RB1, it can
distribute the packet load over those paths.


Figure 6-19: Unicast ECMP

Design Considerations 1
The configuration of Layer 3 routing can impact TRILL operation. This is because a
device cannot perform both TRILL Layer 2 functions and IP Layer 3 routing on the
same interface. Therefore, the TRILL devices must be pure Layer 2 devices,
interconnected to a Layer 3-capable router.
For dedicated TRILL ports, VRRP can be used on these Layer 3 gateways to ensure
redundancy.

Design Considerations 2
TRILL does not forward Spanning Tree BPDUs and does not participate in Spanning Tree calculation. Effectively, TRILL creates Spanning Tree islands. This also means that Spanning Tree should be disabled on the TRILL interfaces.


The TRILL network is loop free due to IS-IS SPF calculations and reverse path checks for multicast traffic. This loop-free topology is enforced in the data plane by leveraging the hop count field in the TRILL header.
However, while loops are mitigated inside the TRILL network, the rest of the
network still requires sound design and implementation. CE switches must still be
configured with the same care as before.

Design Considerations 3
TRILL is fully supported in combination with IRF. This means that link aggregation can be used between IRF systems for backbone links, as well as for access links toward CE switches and endpoints.
Figure 6-20 shows a TRILL network with two stand-alone switches at the core.
Surrounding this core are four IRF systems. Each IRF system consists of two physical
switches. Each IRF system has multiple connections to the TRILL core.


Figure 6-20: Design Considerations 3

There is a direct connection between IRF3 and IRF4, enabling traffic to pass directly
to each other. When data is transmitted to other devices in the TRILL network, it must
travel through the TRILL core switches.
Between CoreA and IRF3, a link aggregation has been created. This aggregated link
set can be configured as a TRILL trunk. This provides simultaneous support for both
TRILL and IRF inside the backbone.
At the access layer, the CE-Layer2 switch is also connected to IRF3 with two
physical links, configured for standard link-aggregation. From the IRF3 perspective,


this is simply a bridge aggregation, configured as a TRILL access port. In this particular case, it could be a TRILL access alone port, but this is optional.
There are also possibilities for redundant design when a TRILL network connects to Layer 3 devices. One such method is for two physical routers to each be connected to a TRILL access port. VRRP would provide a traditional failover mechanism. Since a VRRP MAC address can only be learned on one port of an RBridge, an active-standby model must be used.
Another method is indicated in Figure 6-20. In this case the TRILL network's IRF4 is connected to IRF CE-Layer 3 via link aggregation. Since IRF presents the two physical routers as a single, logical device, an active-active model is deployed.
Many data center engineers prefer the superior bandwidth utilization provided by IRF's active-active design. Others may be attracted to the individual control planes provided by separate, physical routers running VRRP.

Graceful Restart for TRILL IS-IS


TRILL's IS-IS process includes support for graceful restart. This feature is helpful for situations where there is an MPU failover on chassis switches, or a master switch failover in an IRF.
In these cases, the TRILL control plane processes must be restarted. Although the
hardware and firmware quickly recover from these failovers, routing protocols lose
their peer relationships. The time required to reestablish these lost peer relationships
creates additional downtime.
Most routing protocols like OSPF and BGP have a graceful restart mechanism that
reduces this downtime, and TRILL is no exception. During this re-peering, the device
that failed over takes on the role of a GR Restarter. The connected neighbors support
this by becoming GR Helpers. Both devices must be configured to support this
functionality.

TRILL Configuration
Figure 6-21 introduces the basic and optional steps involved in TRILL configuration,
as detailed in this section.


Figure 6-21: TRILL Configuration

Step 1: Enable TRILL Globally


Figure 6-22 shows how to enable TRILL globally on the device. When you enable
TRILL, a nickname is automatically generated. However, manual assignment eases
troubleshooting. During diagnostics, it is often helpful to analyze the LSDB, which
lists the links available for each TRILL nickname. If randomly generated nicknames
are used, it can be difficult to know which nickname is associated to a particular
physical device.


Figure 6-22: Step 1: Enable TRILL Globally

You should be alert and maintain good documentation when assigning nicknames. If
two devices are configured with the same nickname, the highest configured priority
value will prevail. If priorities were not configured, the device with the highest
system ID will keep its name. The other unit could either be functionally disabled or
auto-assigned a new name. Either way, you will have lost the advantage of knowing
which device has a particular nickname.
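As a rough sketch of this step on a Comware switch, the sequence looks like the following. The nickname value is illustrative only, and lines beginning with # are annotations rather than commands; confirm the exact syntax in the command reference for your platform.

system-view
# Enable TRILL globally; this also creates the TRILL view
trill
# Assign a predictable nickname instead of relying on the auto-generated one
 nickname 0x9001

A simple numbering plan, such as encoding the row and rack position in the nickname, makes LSDB output far easier to map back to physical switches.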

Step 2: Configure Uplink Interfaces


All TRILL backbone links are configured as TRILL uplinks. They will use any
available VLAN on the interface for the outer encapsulation, defaulting to VLAN 1 if
no other VLANs have been configured. If multiple VLANs are configured, TRILL will automatically select the lowest VLAN ID.
In the example in Figure 6-23, interface ten1/0/2 is enabled for TRILL, and the
TRILL link type is configured as a trunk. Thus, only other TRILL devices can be
discovered on this link. It cannot process standard Ethernet data frames from
endpoints.

Figure 6-23: Step 2: Configure Uplink Interfaces

Since no VLANs are configured on this link, TRILL traffic will use VLAN 1. The outer header of the frame will be sent without an 802.1Q header and associated VLAN tag.
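A hedged sketch of the uplink configuration just described (interface name from the example; keywords follow common Comware TRILL syntax, so verify them against your software release):

interface Ten-GigabitEthernet1/0/2
# Enable TRILL processing on this port
 trill enable
# Trunk link type: only other RBridges are expected on this link
 trill link-type trunk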

Step 3: Configure Access Interfaces



TRILL ports that connect to endpoints or CE switches are configured as access ports.
TRILL port types are separate and distinct from VLAN port types. TRILL access port
frames can be tagged or untagged. Also, remember that user VLAN tags are part of the inner header of the TRILL frame. The inner header is merely considered part of the data field by the outer header, so it is not relevant to TRILL devices.
The example in Figure 6-24 shows an interface enabled for TRILL, with a link type
of access. Since this is the default link type, the command is not actually required. It
is shown here to convey the proper syntax, and for cases where a trunk link needs to be reverted to an access link.

Figure 6-24: Step 3: Configure Access Interfaces

Optionally you can use the access alone option on the interface. This makes it a silent
interface, which is appropriate for single-homed devices that have no other TRILL
connections.
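A minimal sketch of an access-port configuration based on the description above (the access alone keyword placement is an assumption):

interface Ten-GigabitEthernet1/0/10
 trill enable
# Access is the default link type; shown for completeness
 trill link-type access
# Optional silent variant for single-homed devices with no other TRILL connections
# trill link-type access alone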

Step 4: Configure Multicast Root


TRILL devices use an STP-like tree for broadcast, multicast, and unknown unicast traffic. Like STP, this spanning tree emanates from a root bridge. The primary
criterion for electing the root bridge is a configured priority value. The root can
request multiple tree calculations for ECMP forwarding of multicast traffic.
Figure 6-25 shows a TRILL configuration. The priority is set to 65535, the highest
value available. Also configured is the number of trees that should be calculated. You
should configure this value based on the number of actual paths available between
other RBridges and the root bridge.


Figure 6-25: Step 4: Configure Multicast Root
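The configuration in Figure 6-25 can be approximated by the sketch below. The exact keywords for the root priority and tree count are assumptions for illustration; only the priority value 65535 and the guidance to match the tree count to the real path count come from the text.

trill
# Highest priority so this RBridge is elected root for the multicast trees (keyword assumed)
 tree-root priority 65535
# Calculate one tree per real path between the other RBridges and the root (keyword assumed)
 trees calculate 2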

Step 5: Verify
Several display commands are available to validate your configuration. These
commands are shown in Figure 6-26.

Figure 6-26: Step 5: Verify
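The authoritative list is in Figure 6-26, but verification commands generally belong to the display trill family; the variants below are assumptions and may differ by release:

display trill                # global TRILL status and the local nickname
display trill peer           # discovered RBridge neighbors
display trill lsdb           # link-state database, listing links per nickname
display trill unicast-route  # best unicast paths computed by TRILL IS-IS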

Optional Step 6: Interface DRB Priority


The first optional step involves configuring the DRB priority on an interface, as
shown in Figure 6-27. This enables you to control which RBridge wins the election
to become the DRB for a particular Ethernet link. The highest priority wins, and if
there is a tie the highest MAC address wins. The default priority is 64.

Figure 6-27: Optional Step 6: Interface DRB Priority
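A minimal sketch of this optional step, assuming a Comware-style trill drb-priority keyword (verify the exact name in your documentation):

interface Ten-GigabitEthernet1/0/2
# Raise the priority above the default of 64 so this RBridge wins the DRB election on the link
 trill drb-priority 100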

Optional Step 7: Interface Designated VLAN


You can optionally configure the transport VLAN used by the outer header of a TRILL frame, per interface. By default, the lowest available VLAN ID is used. This is negotiated between peers on the link by exchanging Hello frames. The VLAN specified in Figure 6-28 is a proposed VLAN. If this VLAN is not a viable option, the switches will negotiate another option.

Figure 6-28: Optional Step 7: Interface Designated VLAN
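A sketch of proposing a designated VLAN on an interface; the keyword is assumed from the feature name used in the text:

interface Ten-GigabitEthernet1/0/2
# Propose VLAN 20 as the outer transport VLAN; the peers still negotiate the final choice via Hellos
 trill designated-vlan 20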

Optional Step 8: Interface Link Cost


As shown in Figure 6-29, configuring interface link cost allows manual control of
path cost calculation. This is applicable per interface. By default, auto-cost
calculation is enabled. Auto-cost divides the number 20,000,000,000,000 by the interface link speed. A 10 Gbps link will have a cost of 2,000, while a 1 Gbps link carries a cost of 20,000.

Figure 6-29: Optional Step 8: Interface Link Cost

Manual configuration of link cost can be useful if the physical interface does not
reflect the actual bandwidth available. For instance, a 10 gigabit interface may be
used for an engineered Layer 2 connection like MPLS. This MPLS link may be
limited to 2 gigabits per second of bandwidth toward a remote site. It would of
course be appropriate to manually configure the TRILL cost in these situations. Other
than situations like this, it is best to use the default auto-cost calculation.
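For the MPLS example above, the appropriate value follows directly from the auto-cost formula: 20,000,000,000,000 divided by the usable 2 Gbps (2,000,000,000 bps) gives a cost of 10,000. A hedged sketch of applying it (the trill cost keyword is assumed):

interface Ten-GigabitEthernet1/0/3
# Override auto-cost to reflect the 2 Gbps engineered bandwidth rather than the 10 Gbps physical speed
 trill cost 10000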

Summary
In this chapter, you learned that TRILL is an IETF standard that stands for
Transparent Interconnection of Lots of Links. The goal of the TRILL protocol is to provide large-scale Layer 2 fabric services. The intent is to maintain the simplicity of a traditional Layer 2 network while adding the scalability and convergence of a Layer 3 routed network.
TRILL-capable devices are called Routing Bridges (RBridges) because they use the IS-IS routing protocol to build a routing table, optimizing data flow for Layer 2 traffic. These RBridges are identified by a System ID and a Nickname.
TRILL devices elect DRBs and AVFs to help optimize TRILL operation and avoid loops.
Multi-destination traffic is forwarded through the TRILL network using a classic STP-like topology, while unicast traffic is forwarded based on best paths, as determined by TRILL's IS-IS routing protocol.
TRILL is enabled globally on a Comware device, and also on appropriate interfaces.
Interfaces that interconnect TRILL core devices are configured as trunk ports, while
connections to CE switches and endpoints are configured as access interfaces.

Learning Check
Answer each of the questions below.
1. Which of the statements below accurately describe TRILL (Choose all that
apply)?
a. An RBridge is a switch that runs TRILL. It uses routing functionality to
determine optimal Layer 2 paths.
b. Each RBridge is uniquely identified by a System ID. This ID is based on the
device's MAC address by default.
c. The nickname is used like an IP address to forward frames, along with the
system ID.
d. TRILL uses a link-state database (LSDB) to determine optimal Layer 2
paths.
e. TRILL devices can be connected to a classic Ethernet switch.
f. A TRILL deployment specifies DRBs, AVFs and appointed ports to help
move frames along an optimal path.
2. Which three are part of a TRILL-encapsulated frame (Choose three)?
a. The mezzanine header
b. Sheep header.
c. TRILL header
d. Outer header.
e. Inner header.


3. TRILL multi-destination forwarding includes which three frame types (Choose


three)?
a. All unicast frames.
b. Broadcast frames.
c. All Multicast frames
d. Unicast frames with an unknown destination.
e. Multicast frames with an unknown destination.
4. Which protocol does TRILL use to determine optimal paths?
a. A special version of OSPF that runs at Layer 2.
b. Standard OSPF.
c. A special version of NLSP.
d. A special version of IS-IS that runs at Layer 2.

Learning Check Answers


1. a, b, d, e, f
2. c, d, e
3. b, c, d
4. d


7 Shortest Path Bridging Mac-in-Mac Mode (SPBM)

EXAM OBJECTIVES
In this chapter, you learn to:
Describe the goal of SPBM.
Describe the use cases of SPBM.
Understand the operation of SPBM.
Configure SPBM.

INTRODUCTION
Shortest Path Bridging Mac-in-Mac mode (SPBM) provides Layer 2 connectivity
between data center sites. This chapter defines SPBM, and explains SPBM operation
and configuration.

SPBM Introduction
Shortest Path Bridging Mac-in-Mac mode (SPBM) enables large Layer 2
deployments. The goal is to maintain the simplicity of a Layer 2 fabric while
leveraging the scalability and convergence of Layer 3 routed services. Best-path
traffic forwarding is based on a link-state routing protocol.
With the traditional Layer 2 Spanning-Tree Protocol, some links are placed in a
blocking state to avoid loops. Since SPBM uses routing mechanisms, all links in the
network can be used for actual traffic forwarding.
SPBM is very similar to the TRILL protocol. TRILL was proposed by the IETF, while SPBM was developed by the IEEE. Each protocol has unique features that make it a compelling option for large data center deployments.

SPBM Standards
There are two standards related to SPBM. One is IEEE 802.1ah, which defines
Provider Backbone Bridging (PBB). The other is Shortest Path Bridging (SPB), as defined by IEEE 802.1aq.
Each of these standards is discussed in this chapter, starting with PBB.

PBB Overview
This section provides an overview of PBB, before delving into device roles and
MAC-in-MAC encapsulation. Frame format and terminology are described, as is PBB
operation with SPB.

PBB Introduction
PBB is a Layer 2 VPN technology based on Ethernet standards, and is quite similar
to an MPLS VPLS solution. It provides bridge functions towards customer networks
and maintains isolation between customer networks.
Compared with MPLS VPLS solutions SPBM is relatively simple to deploy, since it
is based on Ethernet standards. MPLS VPLS infrastructures require expertise with IP
routed infrastructures, label switching, and VPN overlay technologies. Engineers
must be familiar with how all of these layers interact to successfully deploy and
troubleshoot these networks.
With PBB, customer MAC frames are simply encapsulated in a service provider
MAC frame. The customer frame is transported as payload inside a provider Ethernet
frame, with new source and destination MAC addresses.
The frame header includes a service ID field to uniquely identify client networks for
multi-tenant support. This is a unique differentiator compared to TRILL, which
currently supports 4094 VLANs, with no explicit support for any kind of client
network differentiator inside the TRILL fabric.

IEEE 802.1ah PBB Encapsulation Comparison


In the early 1980s, the DIX (Digital, Intel, Xerox) consortium released the Ethernet II specification, as depicted in Figure 7-1. This was soon followed by the IEEE 802.3 standard Ethernet frame format. Most protocol stacks, including TCP/IP, continue to use the Ethernet II frame format. This frame format includes destination and source MAC addresses, an Ethertype definition, and the actual data or payload.

Figure 7-1: IEEE 802.1ah PBB Encapsulation Comparison

In 1998, the 802.1Q standard was released. This provided a tagging mechanism to
allow traffic for multiple VLANs to traverse the same physical connection. This tag
was carried in an additional header, which was inserted between the Source Address and EtherType fields of the standard Ethernet header. This tag has been referred to as a VLAN tag or C-tag (Customer tag). The customer VLAN tag must be in the range 1-4094.
In 2005, the 802.1ad standard was released, in an initial attempt at providing multi-tenant customer isolation in a common fabric. This standard is also known as QinQ because 802.1Q customer C-tags are maintained inside another 802.1Q VLAN tag called the Service tag or S-tag.
Using this technology, 4000 VLANs for one customer could have an additional outer
802.1Q tag of 11. These frames would then traverse the service provider's
infrastructure. The same range of 4000 VLANs for another client could traverse the
same provider network with an outer 802.1Q tag of 12.


This provides a good solution to allow a single infrastructure to support multiple


clients using the same VLAN numbers. However, the limitations of this solution
should be considered.
One disadvantage relates to scalability. A provider may wish to support several
clients, each with 10,000 or more MAC addresses. All of the provider's backbone
switches must learn every MAC address from every client, and maintain a MAC
address table. As new clients are added, memory and CPU utilization increases
proportionally.
Another limitation is that the original outer Ethernet header from each client is
maintained, with its original source and destination MAC addresses. Any MAC
address conflicts between clients will create problems for the provider network.
MAC address conflicts may be relatively rare, but the risk remains for one client's network to impact another client's, which is not acceptable. Also, router redundancy
protocols like VRRP greatly increase the likelihood of conflicts.
A customer might deploy VRRP with Router ID 1, in VLAN 10. In this scenario, VRRP uses Ethernet MAC address 0000-5e00-0101. Any other client with the same configuration would use the same MAC address. If both clients shared the same provider network, a conflict would occur. This would cause a MAC flapping issue, where the provider switches learn that source address as emanating from Client A, then from Client B, then Client A again, and so on.
PBB alleviates these concerns by encapsulating the entire client frame in a new
Ethernet frame. The original frame is now the payload inside this outer header. The
provider backbone need only process a few MAC addresses per client, regardless of
whether the client has 10 or 10,000 MAC addresses.
The PBB header also includes additional tags to provide unique services to each
client. For example, the I-tag carries unique client QoS information. These features
enable PBB to provide a scalable, multi-tenant environment.
In 2011, the 802.1ah standard was incorporated into the 802.1Q-2011 specification.

PBB Device Roles


PBB supports two device roles - the Backbone Edge Bridge (BEB) and the
Backbone Core Bridge (BCB). This is analogous to an MPLS deployment, which
uses Provider Edge (PE) and the Provider (P) roles.


The BEB receives the original customer frame and encapsulates it into a new MAC-in-MAC frame. The source address of this frame is the BEB's own local MAC address. The target BEB's MAC address is used as the destination in this new frame.
In Figure 7-2, BEB1 receives and encapsulates a frame from the customer. BEB1's MAC address is the source MAC address of the outer frame, and BEB2 is the
destination. Once encapsulated, the frame is forwarded over the uplink port, into the
backbone.

Figure 7-2: PBB Device Roles

The BCB receives this frame from BEB1, and forwards it based on the outer
destination MAC address. In this scenario, the BCB is using traditional Layer 2
switching to forward the frame. It does not require knowledge of customer MAC
addresses, since it is only using the outer frame's MAC addresses for learning and
forwarding purposes.
BEB2 receives the PBB frame and decapsulates it. It parses the frame's backbone
service instance identifier, or I-SID, which uniquely identifies each customer. This
enables BEB2 to be connected to several clients and deliver frames to the correct
one. In a way, the I-SID acts as a kind of a VLAN tag to differentiate customers
inside the PBB network.
There may be 200 customers supported between BEB1 and BEB2, each with a
unique I-SID assigned. Over the provider network, the source and destination MAC
addresses for all 200 customers are the same, that of BEB1 and BEB2. Therefore, the
BCB only needs to learn and maintain a MAC address table with two MAC
addresses.


PBB MAC-in-MAC Encapsulation


PBB is purely an encapsulation technology, using the MAC-in-MAC format to ensure
customer isolation over a common fabric. It does not provide any Layer 2 path
calculation. This functionality is the responsibility of other protocols, such as link
aggregation, STP, or SPB.
Figure 7-3 indicates a very basic PBB example. Since there is only one possible
path, there is no need for path calculation. The BCB can simply be a traditional Layer
2 switch with no special configuration required.

Figure 7-3: PBB MAC-in-MAC Encapsulation

The example network is simple, but lacks redundancy. Link Aggregation could be
configured between the BEBs and BCB to provide link redundancy and additional
bandwidth. Data forwarding is still based on traditional MAC learning and
forwarding, since link aggregation does not change how the BCB learns and forwards
frames.
For device redundancy, two BCBs could be deployed. They could run traditional
STP to determine a loop-free path, which would automatically be recalculated in
case of a device failure. The basic examples described here serve to illustrate the
point that PBB is only an encapsulation protocol, and is not responsible for path
forwarding and redundancy.

PBB Frame Format


The PBB frame includes the typical preamble, and then starts with a Backbone Destination Address (B-DA) and Source Address (B-SA), as shown in Figure 7-4. This is a traditional Ethernet frame, with a standard Ethernet Type field, or TPID.
The Backbone VLAN tag (B-Tag) is simply a traditional 802.1Q VLAN tag, which is
used by the BCB fabric to properly forward frames. The Canonical Format Indicator
(CFI) indicates that MAC addresses are in standard or canonical format.

Figure 7-4: PBB Frame Format

The I-Tag contains additional PBB information, including the unique I-SID value
assigned to each customer. The I-tag also includes the backbone Instance Priority
Code Point (I-PCP). This value is used for QoS marking, similar to the 802.1p
priority value used in VLAN tagging. This provides a mechanism for the service
provider to set its own priority values without modifying the customer's original 802.1p marking inside the customer's standard 802.1Q header.
The Drop Eligibility Indicator (I-DEI) is a feature which has been available in IP
networks for years, as a part of the DSCP values in an IP header. Two bits of the
DSCP value are used to indicate levels of drop probability for QoS purposes.
Likewise, PBB's I-DEI value is used to mark certain traffic with higher or lower
levels of drop probability or eligibility.
A network administrator can vary drop probability for clients based on bandwidth utilization. For example, as long as clients use less than 100 Mbps, their frames are marked with a low eligibility value, and are unlikely to be discarded. When client traffic bursts between 100 and 150 Mbps, frames can be remarked with a higher I-DEI value, and so are more likely to be discarded. Above 150 Mbps, frames are remarked with yet another value, and are very likely to be discarded. When multiple client frames arrive at a congested switch, the ones with the highest eligibility value are dropped first.


The I-Tag's UCA bit indicates that a different addressing format is carried in the I-Tag, while the RES bits are reserved for future use.
The C-DA and C-SA are the destination and source MAC addresses of the original client's Ethernet frame. These are only learned by client-facing BEBs. The Ethernet payload could be an untagged frame, a frame with a single VLAN tag, or a QinQ frame with a double tag. In the QinQ case, the S-Tag would be the outer tag and the client's C-tag would be the inner VLAN tag.

PBB Concepts: B-TAG and I-SID


It is important to understand the relationship between the B-Tag and I-SID. The B-Tag
includes the B-VLAN and priority values, enabling multiple VLAN usage in the PBB
backbone. A single B-VLAN can transport multiple customers' traffic, since each client can be uniquely identified by its I-SID. It is the I-SID that enables multi-tenant support over a single backbone. The I-SID is administratively configured,
using a number in the range of 255 to 16,777,215.

PBB With or Without SPB


For a PBB-only configuration, the backbone requires some Layer 2 path services
between BEBs. Since the backbone typically requires device redundancy, some
protocol must provide optimal paths while avoiding loops.
This redundancy could be provided by using IRF with Multi-Chassis link
aggregations. One consideration for this solution is that it requires a homogenous,
single-vendor deployment.
Another option is to leverage the Spanning Tree Protocol, however it is an unlikely
choice for a large backbone infrastructure. This is because of limitations related to
fail-over time and scalability. An additional factor relates to the active-standby
nature of STP redundancy, due to the single tree that is calculated.
With multi-vendor deployments, or when a highly redundant and scalable solution is
required, SPB is an attractive option. While PBB provides the encapsulation service,
SPB's multi-path, active-active deployment model provides distinct advantages over
STP.
SPB uses IS-IS link calculations to determine best paths through the backbone. This allows all active paths to be used for data forwarding. The combination of SPB and PBB results in SPBM, Shortest Path Bridging with MAC-in-MAC mode.

SPB Introduction
There are two flavors of SPB available. One is Shortest Path Bridging VLAN Mode
(SPBV) and the other is Shortest Path Bridging MAC-in-MAC mode (SPBM)
SPBV was originally intended as a successor to STP. SPBV does not encapsulate
packets. It is only responsible for discerning best paths using an Active-Active
model. This protocol will not be discussed further, as it is not widely implemented.
HP devices do not support this protocol.
Like SPBV, SPBM also provides multiple active paths in a Layer 2 Ethernet, based
on shortest path calculations performed by IS-IS.

SPBM Device Roles


Since SPBM is based on PBB, the same BCB and BEB device roles are used. Figure 7-5 shows four BCBs and four BEBs. As with a pure PBB-only scenario, BEBs are customer facing, and are responsible for learning customer MAC addresses. PBB still encapsulates customer frames as before, so that BCBs need not learn customer MAC addresses. BCBs only need to learn the MAC addresses of the four BEBs.


Figure 7-5: SPBM Device Roles

Unlike PBB, with SPBM, the backbone is not just a simple Layer 2 switch fabric.
Instead, switches run SPB's IS-IS protocol for best-path topology calculations.

SPBM Path Calculation


BCBs need only learn BEB MAC addresses, and so SPBM is responsible for
calculating best paths to reach these BEB MAC addresses. The IS-IS link-state protocol is used to calculate these paths, and both BEBs and BCBs must participate
in this process, as shown in Figure 7-6.


Figure 7-6: SPBM Path Calculation

Frame forwarding for SPBM differs from TRILL's hop-by-hop encapsulation method. With TRILL, each routing bridge that receives a frame strips away the outer header, and creates a new Layer 2 header to transmit the frame to the next-hop RBridge. This method requires specific hardware capabilities built into the device's ASIC circuitry.
With SPBM, the BEB receives a client frame and creates a PBB encapsulation
around it. That BEB's MAC address is the source, and the ultimate destination BEB
MAC address is the destination. This single encapsulation traverses the entire SPBM
backbone. The link-state protocol finds the best path to that destination BEB, and
BCBs all forward the frame accordingly.


SPBM: Multiple Path Selection 1/2


In a traditional IP network, traffic can be load-balanced over multiple equal-cost
paths. Some IP routers might use per-packet load-sharing, where each packet is sent
over a different equal-cost path, in a round-robin fashion. However most Layer 3
switches load-share based on a hash calculation that is performed on the source and
destination IP addresses. This means that all traffic with the same source/destination
IP address pair will traverse the same path. A different source/destination pair will
use another one of the equal-cost paths.
SPB does not use the mechanisms described above. In fact, load-sharing can be very
simple to administratively control by configuring B-VLANs. For example, you could
set up B-VLAN 100 for a certain set of tenants to be supported between BCB1-BCB3-BCB4, as shown in Figure 7-7. Similarly, B-VLAN 200 could be for a
different set of tenants, supported between BCB1-BCB2-BCB4. Although there are
multiple physical paths, a single path has been configured for Silver tenants, and a
different path for Blue tenants. This is an easy way for an administrator to split the
load across different paths.


Figure 7-7: SPBM: Multiple Path Selection 1/2

Alternatively, you could decide to enable both B-VLANs 100 and 200 on both paths,
which are of equal cost in this scenario. Therefore, there are two equal-cost paths for
both Silver and Blue tenants. In this case one of sixteen predefined ECT algorithms
can be applied to each B-VLAN.
These algorithms provide a very deterministic path mechanism. In other words, they use a specific, determined path from a source to a particular destination. The same
path will be used for return traffic, since the same algorithm is used. For example,
one of the sixteen available algorithms chooses the path based on the highest bridge
ID. Another bases it on the lowest bridge ID. The network administrator can assign
which customer's traffic uses which algorithm.
Blue customer traffic uses B-VLAN 100, which you have configured to use ECT algorithm 1. When this algorithm detects equal-cost paths, the path with the highest Bridge ID is selected. Assuming BCB3 has a higher Bridge ID than BCB2, Blue tenant traffic between BCB1 and BCB4 traverses BCB3.
Meanwhile, Silver tenants use B-VLAN 200, which is configured to use ECT
algorithm 2. This traffic would use BCB2 as their preferred path, since this algorithm
chooses the bridge with the lowest Bridge ID.
Since the return traffic uses the same algorithm, customer traffic originating from
BEB 4 and destined for BEB1 will use the same path as described above. Blue
tenants configured to use ECT algorithm one will reach BEB1 via BCB3. Silver
tenants assigned to use algorithm 2 would reach BEB1 via BCB2.
The ECT algorithms described above base their path choices on Bridge ID, and
administrators can manipulate these ID values to engineer traffic flows. An
administrator could simply assign a higher Bridge ID value to BCB2, and change
traffic flow patterns. In this way, a network administrator can pre-determine or preengineer the paths along which certain client traffic will pass.
By default all backbone VLANs use ECT algorithm 1, but you can select which
customer B-VLANs use which ECT algorithms. The deterministic nature of the ECT
mechanism is different from that used by TRILL. TRILL simply uses a hash-based
mechanism for equal-cost load-sharing. This load balancing is effective, but there is
no precise, deterministic control over which specific path a particular customer's
traffic will use.

SPBM: Multiple Path Selection 2/2


The administrator assigns each backbone VLAN to a specific ECT algorithm, thus
creating a deterministic path for each one. For example, in Figure 7-8, backbone
VLAN 100 might be used for tenants with a Blue plan, while VLAN 200 might be
used for tenants that have paid more to have a Silver plan.


Figure 7-8: SPBM: Multiple Path Selection 2/2

Traffic assignment is controlled by the ingress BEBs. These BEBs are configured
with a Virtual Switch Instance (VSI) to accept a particular client's traffic, similar to
an MPLS VPLS scenario.
From within the VSI configuration, the network administrator defines the I-SID,
which uniquely identifies the customer. The backbone VLAN tag can also be defined.
This backbone VLAN tag can be mapped to a specific ECT algorithm to control path
selection, as described above.
For example, a BEB may have an interface connected to two Silver tenants and different interfaces connected to some Blue tenants. On this switch, a VSI is defined for each Silver customer, with a unique I-SID defined. All of the VSIs for these customers are configured to use B-VLAN 200. More VSIs are defined for the Blue clients, each with a unique I-SID, and all configured to use VLAN 100. This VLAN 100 can be configured to use a different ECT algorithm, and therefore a different path in the backbone.

SPBM VSI
The VSI is defined on BEBs. This definition includes a unique I-SID to define the
tenant, as well as a backbone VLAN. There are also two types of interfaces that can be defined: tenant-facing physical interfaces and uplink interfaces, which are PBB
interfaces.
The I-SID number ranges from 255 to 16,777,215. I-SID number 255 is reserved for
the SPB fast-channel feature, and should not be assigned by network administrators
for tenant usage.
The fast-channel feature is an enhancement in how IS-IS LSPs are delivered between
BCBs. In a traditional IS-IS network an LSP is generated by a device and then
forwarded to peers. Peers add this LSP to their link-state database, and then forward it on to their peers. In a large network, LSPs may need to be processed by several switches in the path. Each switch must submit the LSP to its control plane and process the update, delaying receipt of the LSP at the remote end of the network.
This hop-by-hop processing must occur on traditional Layer 3 IS-IS networks
because each routed link is in a different Layer 2 broadcast domain.
SPB's fast-channel feature is possible because all BCBs have Layer 2 connectivity,
provided by PBB. The configuration can include reserved I-SID 255 to be used as a
kind of internal VLAN. When a topology change occurs, LSPs are generated as usual.
However, not only are they forwarded to IS-IS peers, they are also forwarded as a
data frame over I-SID 255. All BCBs therefore receive LSPs at about the same time,
and can all process them in parallel.
The fast channel feature is automatically enabled when you configure a VSI with I-SID 255.

SPBM S-VID to I-SID to B-VLAN Mappings


The BEBs and BCBs in an SPBM fabric must be aware of which traffic belongs to
which tenant. On tenant-facing interfaces, the BEB selects specific traffic as belonging to tenant 1, 2, or 3. This selection can be done at the interface or VLAN level. For VLAN-level selection, the outer VLAN is called the S-VLAN ID (see Figure 7-9 for an example). There are over 4000 S-VIDs available, each of which can be mapped to
a specific I-SID number.


Figure 7-9: SPBM S-VID to I-SID to B-VLAN Mappings

I-SIDs are assigned to tenants. Each tenant should have a unique I-SID, or unique set
of I-SIDs, different from those assigned to other tenants. Therefore, I-SIDs represent
a tenant's traffic in the SPBM fabric. Also, a single I-SID defines or represents a
single MAC address table.
If the tenant requires connectivity for ten VLANs to a remote location, you have a choice. You can use a single I-SID for all of that tenant's traffic, map each tenant VLAN to its own I-SID, or use a hybrid of the two.
If a single I-SID is used for all ten of the tenant's VLANs, then a single, flat MAC address
table is common for all VLANs. The simplicity of this option is attractive, but there
must be no MAC address conflicts across all VLANs for that tenant.
If you use a separate I-SID for each of the tenant's VLANs, then each VLAN has its own
MAC address table. This option adds a bit of complexity, but eliminates the
possibility of inter-VLAN address conflicts.
Note
Many modern switches maintain separate MAC address tables per VLAN. In
that case, the concerns about MAC address conflicts described above would
not be an issue.

SPBM Forwarding Flows


This section is focused on how various types of traffic are forwarded through an SPBM fabric. The section begins with a discussion of the MAC learning and forwarding process for unicast traffic. Then the three variations of multicast traffic flows are discussed, including broadcast, multicast, and unknown unicast.

SPBM Forwarding Process Unicast


The BEB's VSI for a client receives the frames that the client transmits, and learns MAC addresses by checking the source address field in these received frames. This is identical to the operation of traditional Ethernet switches.
Incoming tenant frames destined for a remote site over the SPBM fabric will be
encapsulated inside a PBB frame. This outer frame is marked with an I-SID and
tagged with the appropriate B-VLAN ID. The BEB places its MAC address as the
source address in the new header. The switch must also determine how to forward
this frame to the appropriate destination. This process is shown in Figure 7-10.

Figure 7-10: SPBM Forwarding Process Unicast

The switch analyzes the destination MAC address in the received tenant frame,
comparing it to its VSI MAC address table. If this MAC address was previously
learned from a remote BEB, it will be in this VSI MAC address table.
However, it will not be mapped to an outgoing interface, as with a traditional MAC address table. Instead, it is mapped to the MAC address of the remote BEB from which it was learned. The local BEB would therefore know which remote BEB to forward the frame to. That BEB's MAC address would be used as the destination MAC address in the PBB frame header. Of course, the fabric BCBs forward frames based on this destination MAC address.
If the tenant-generated frame's destination MAC address has not yet been learned, or
if it is a multicast or broadcast, then the frame is flooded inside the backbone. In this
way, every pertinent BEB will receive the frame. This flooding is based on a Layer 2
multicast process. BCBs do not use the Ethernet broadcast MAC address for this
purpose.

SPBM Multicast Forwarding


SPBM supports two multicast methods to deliver frames inside the backbone. One
method is called head-end replication, and the other is called tandem replication.
With head-end replication, tenant multicast traffic is transported as unicast traffic to
each remote BEB. In an SPBM network with six BEB switches, the ingress BEB that
receives a client frame transmits five unicast frames into the backbone, one for each
of the other five BEBs.
One advantage to head-end replication is simplicity, as there is no requirement for
multicast support inside the backbone. Another advantage is optimized path selection,
since each unicast frame can be forwarded along the shortest path to the destination
BEB.
Head-end replication need only send unicast frames to those BEBs that participate in
that tenants I-SID. This limits the number of destination BEBs that need to receive
and process the frame. Although an SPBM fabric may be comprised of fifty BEBs,
perhaps only three of them are used to support a particular tenant. Only these three
switches would be configured with that tenants VSI. The ingress BEB, therefore,
only needs to create two replicas of the frame.
Tandem replication uses multicasting as its transport mechanism. The ingress BEB is
therefore relieved of the burden of replicating and transmitting an inbound frame
multiple times. It need only transmit a single multicast frame into the backbone.
However, the backbone devices must now be capable of processing and forwarding
multicast frames.

SPBM Multicast Head-End Replication


With head-end replication, an ingress BEB receives tenant multicast frames and replicates them as unicast frames into the backbone. A unicast frame will be generated for each BEB that participates in that tenant's I-SID. Overhead is a concern when a single multicast frame must be replicated and transmitted several times as a unicast frame. This overhead increases with the size of the client's deployment. If the SPBM fabric supports a client with 20 sites, each multicast must be replicated nineteen times.
For this reason, head-end replication is most suitable for deployments with limited
multicast traffic. This is the default mode on HP Comware devices configured for
SPBM.
Figure 7-11 shows an example of BEB1 receiving an incoming multicast data frame. BEB1 must replicate this frame twice, once for each of the other BEBs
participating in the tenant deployment. Each replica must be encapsulated. One must
then be transmitted to BEB2, and the other must be transmitted to BEB3.

Figure 7-11: SPBM Multicast Head-End Replication

SPBM Multicast Tandem Replication


Tandem replication uses a special multicast address, based on a unique ingress BEB identifier and the tenant's I-SID. The ingress BEB uses this as the destination MAC address of the new frame, and sends it into the backbone. Based on the shortest path topology, certain BCBs will be forks in the path toward destination BEBs. These BCBs must replicate the frame toward each BEB that is relevant to the tenant.
In Figure 7-12, BEB1 sends a multicast frame into the backbone. The address was
fashioned as described above, based on the ID for BEB1 and the client's I-SID.
When the BCB receives this multicast traffic it knows which interfaces to use for
frame forwarding. The multicast address is client-specific, and the BCB knows
which peers are relevant to that client. Using Reverse-Path Forwarding (RPF), it knows that it received the frame on the best path back to the source, and knows that it should duplicate these frames away from the source, toward the appropriate destination BEBs.

Figure 7-12: SPBM Multicast Tandem Replication

As previously stated, the default method used to forward client multicast traffic is
head-end replication. The mode can be configured for each tenant's VSI. It is
important that the same method is configured in the VSI of every BEB for a particular
client.

SPBM Backbone Multicast VLAN



The backbone multicast VLAN feature is an option to isolate multicast traffic inside
the backbone (see Figure 7-13). Since head-end replication uses unicasting, the
backbone multicast option is only applicable for the tandem multicast method. The
feature is configured with the multicast-bvlan enable command and should be
configured with care to ensure a functional deployment.

Figure 7-13: SPBM Backbone Multicast VLAN

The B-VLAN configured by the administrator for this feature should always be an
odd-numbered VLAN ID, such as 11, 21, and 33 for example. These odd-numbered
VLANs will be used for unicast delivery. The device with the multicast backbone
VLAN feature enabled will automatically use the next even VLAN for multicast
delivery.
For example, suppose that a tenant sends multicast data toward BEB1. BEB1's VSI for this tenant is bound to backbone VLAN 99. Unicast traffic from that client will traverse the backbone with that tag. Since the BEB is configured with the backbone multicast feature, any multicast frames from that client are forwarded to the BCB tagged with B-VLAN 100.
Remember that since each B-VLAN can be configured with a specific ECT
algorithm, VLANs can use unique paths through the backbone.


SPBM Configuration Requirements


If IRF systems participate in SPBM, the IRF MAC address must always be
configured. This is configured by default on chassis-based devices, but should be
manually configured on traditional top-of-rack devices. Failure to configure this can
create problems after an IRF fail-over event.
By default, when a different physical switch becomes the new IRF master, it begins
to use a new MAC address after a few minutes. This would result in a different
system ID, and an IS-IS topology change in the SPBM fabric. Since a BEB's MAC
address is used as the source or destination for frame transport, a change in that
address breaks the PBB delivery mechanism. For these reasons, stable MAC
addresses are important.
Another important fact to remember is that the version of IS-IS used by SPBM is
exclusively for its use. This protocol is separate and distinct from the IS-IS protocol
used for IP routing. The two protocols do not interact in any way.
We know that each BEB's MAC address must be unique, since these addresses serve
as the source and destination address for frame transmission across the fabric. Of
course, you should avoid statically configuring a MAC address on these devices that
could conflict with another address.
The backbone VLAN for SPBM should be dedicated to SPBM. No other feature
should be configured on this VLAN. Also, SPBM requires that the Spanning-Tree
mode of operation be set to MST.

SPBM Support for Graceful Restart


The SPBM implementation on HP devices includes full support for graceful restart.
This feature minimizes outages during a fail-over event. Without this feature, chassis blades or IRF systems may fail over quickly, but routing protocols must re-establish
peer relationships, lengthening the effective downtime.
As with the traditional graceful restart feature used with OSPF, SPBM's IS-IS defines a graceful restarter function on the device that failed over, and a graceful restart helper function on peer devices. The configuration must be enabled on both the
restarter and helper device. The configuration of this feature is a best practice for
high-availability deployments.

Design Considerations 1

By default, LSPs are only propagated by the control plane, just as OSPF's control
plane process distributes LSPs. This means that a device must receive LSPs, process
them at the control plane, and then propagate them to other devices.
Fast channel is an additional, optional mechanism that propagates LSPs throughout
the SPBM domain using a multicast that goes out to all SPBM devices. This messaging is
processed by hardware, while the data plane simultaneously propagates the
information to other devices.
The feature is automatically enabled when a VSI is created with I-SID 255. This VSI
should not have tenant-facing interfaces, nor should this I-SID value be assigned to an
actual tenant. Even if the fast channel feature is not desired for initial deployment, I-SID 255 should be kept in reserve.

Design Considerations 2
As we have seen with TRILL, devices that do Layer 2 shortest-path bridging cannot
also perform Layer 3 routing functions. Routing must be performed by some device
that is not configured for SPBM.
Layer 3 routers must connect to the SPBM VSI as an endpoint. Chassis-based SPBM
switches can use a separate Multi-Tenant Device Context (MDC) for this purpose. In
that scenario, a cable can be connected from an SPBM-participant line card to an IP-routing MDC. Effectively, SPB and Layer 3 routing functions are being performed by
two separate devices, since the MDCs are functionally isolated features inside the
physical switch.

Design Considerations 3
SPBM is fully supported in combination with IRF, and SPBM backbone links can use
link aggregation. You can also use service instances on link aggregation interfaces for
customer access. This is a convenient way to provide redundant access into the
SPBM service.
You can also take advantage of the link-aggregation hashing algorithm for load-sharing between SPBM backbone devices. Multiple physical links using link
aggregation are perceived as a single logical connection by the BCBs.
As shown in Figure 7-14, when Layer 3 routing and IRF are combined with Layer 2
SPBM services, you can leverage IRF's active-active model for redundancy. This
active-active Layer 3 role is connected to an SPBM VSI by a link-aggregation group.


Figure 7-14: Design Considerations 3

Alternatively, you can deploy an active-standby model that uses separately managed switches, configured to use VRRP. Some network engineers may prefer the
convenience of managing multiple physical switches as a single IRF entity. Others
may be attracted to the flexibility of having separately managed control planes in
separate physical switches.

Configuration Steps for SPBM


Figure 7-15 outlines the configuration steps for SPBM.


Figure 7-15: Configuration Steps for SPBM

To begin the configuration of SPBM features, the L2VPN and SPBM settings are
globally configured, as are MSTP regions.
The B-VLAN is configured, along with customer-facing VSI and I-SID. The final
required steps are to create and bind a service instance for the customer and verify
your efforts.
Optionally, multicast replication and B-VLAN to ECT mapping can be configured.
Note
The 12500 cannot be configured for SPBM if it is operating in standard mode.
Prior to configuring SPBM on a 12500 model, it should be set to any other
mode but standard, such as route, bridge, grand, or advanced.

Prerequisite Step for Some Models


The 12500 cannot be configured for SPBM if it is operating in standard mode. It must be configured in some other mode, such as route, bridge, grand, or advanced. While most HP Comware switch models do not require this step, other models may be introduced that do. You should check the configuration manuals for your specific device to determine whether the system working mode needs to be configured.
Figure 7-16 shows a typical configuration for a 12500. This setting requires a system
reboot in order to take effect. The command display system-working-mode can be
used to validate the setting.

Figure 7-16: Prerequisite Step for Some Models

Step 1: Configure Global L2VPN and Global SPBM
Figure 7-17 shows the syntax to globally enable L2VPN and SPBM.

Figure 7-17: Step 1: Configure Global L2VPN and Global SPBM
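A minimal sketch of this step; both commands match the step description, but confirm their behavior on your platform:

system-view
# Enable the Layer 2 VPN framework that VSIs depend on
 l2vpn enable
# Enable SPBM globally and enter SPBM view
 spbm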

Step 2: Configure MSTP Region Settings for SPBM
The next step is to configure an MSTP region, as required by SPBM. Backbone VLANs
are all allocated to special MSTP instance 4092. Using this special MST instance
number, there is no need to run actual Multiple Instance Spanning Tree, but the region
configuration must be active.


In Figure 7-18, B-VLAN 4021 is used. The region name backbone is configured.
This can be any name, but should match on all devices. Instance 4092 is defined, and
B-VLAN 4021 is allocated to it. If other B-VLANs were to be used, they would also
need to be mapped to this instance. Finally, the MSTP instance is activated in the last
command shown.

Figure 7-18: Step 2: Configure MSTP Region Settings for SPBM
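The region settings described above map onto standard Comware MST region commands, roughly as in this sketch (offered as an illustration of the step, not a verified transcript of Figure 7-18):

stp region-configuration
# The region name is arbitrary, but it should match on all SPBM devices
 region-name backbone
# Allocate the backbone VLAN to the special instance 4092 used by SPBM
 instance 4092 vlan 4021
# Activate the region so the mapping takes effect
 active region-configuration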

Step 3: Configure B-VLAN


The B-VLANs must be defined and enabled on all backbone links. These uplinks
should be configured to only permit B-VLANs and no others.
In the example scenario shown in Figure 7-19, VLAN 4021 is the B-VLAN. In the
figure, the first command creates the VLAN. Then a backbone interface is configured
as a trunk port. Default VLAN 1 is removed, and VLAN 4021 is assigned to the port.

Figure 7-19: Step 3: Configure B-VLAN

The last command in Figure 7-19 enables the SPBM feature on the interface. This
command actuates two functions. It enables PBB on the interface, and so all frames in
and out of this interface shall be encapsulated using PBB. Any frames received on
this interface that are not PBB-encapsulated are ignored.
The command also enables SPB using MAC-in-MAC mode. SPBM's IS-IS protocol is therefore active on this interface. It sends IS-IS Hello packets, and forms
adjacencies with other IS-IS peers.
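Pulling the step together, a hedged sketch of the B-VLAN and uplink configuration looks like this (interface name is illustrative):

# Create the backbone VLAN
vlan 4021
quit
interface Ten-GigabitEthernet1/0/2
# Carry only the B-VLAN on this backbone link
 port link-type trunk
 undo port trunk permit vlan 1
 port trunk permit vlan 4021
# Enable PBB encapsulation and SPB IS-IS on the interface
 spbm enable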


Step 4: Create SPBM VSI and Configure I-SID


BEBs have some interfaces connected to clients, and others connected to the SPBM backbone's fabric of BCBs. A VSI is configured for each client. The VSI will have two types of active interfaces. Customer-facing interfaces will be configured with a service instance. The other interfaces are backbone links, which are learned through SPBM. There will be one logical link for each remote BEB that belongs to the same I-SID.
Figure 7-20 shows a VSI named customerA being configured. This name is locally significant only. BEB2 could name its VSI customerA1, as long as the I-SID matches. Of course, configuration consistency is a good idea.

Figure 7-20: Step 4: Create SPBM VSI and Configure I-SID

In this scenario, an I-SID of 10001 is assigned. This uniquely defines the customer
network inside the backbone. B-VLAN 4021 is associated with this I-SID. This is the VLAN that will be used to transport customer A's traffic over the SPBM fabric. It
was assigned to a physical, backbone-facing interface in the previous step.
Optionally, this VLAN can be configured to use specific ECT load-sharing
algorithms.
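A sketch of the VSI definition described above. Creating the VSI with the vsi command is standard Comware; the exact keywords for binding the I-SID and B-VLAN inside the VSI vary by release, so the indented lines are assumptions:

vsi customerA
# Uniquely identify this tenant inside the backbone (keyword assumed)
 spb i-sid 10001
# Transport this tenant's traffic over backbone VLAN 4021 (keyword assumed)
 spb b-vlan 4021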

Step 5: Create and Bind Service Instance


Customer-facing interfaces must be configured. The service instance is created with
some arbitrary ID number. The service instance ID is locally significant only. The
service instance defines which customer is connected to the interface, and which
customer VLANs can traverse the SPBM fabric.
In Figure 7-21, service instance 10 is defined on interface ten1/0/1. The
encapsulation command defines that incoming traffic tagged with VLAN 101 shall be
eligible for SPBM fabric services. Alternatively, the encapsulation default option
could be specified. This would allow all incoming traffic to use SPBM, tagged or
untagged. Whatever is specified, all of that client traffic is associated with a single
VSI, as defined in the previous step.


Figure 7-21: Step 5: Create and Bind Service Instance

By defining additional service instances, multiple VLANs can have separate MAC
address tables. For example, the BEB could be configured with service instance 11.
Then the encapsulation command for that instance can specify VLAN 102, which is then cross-connected to another VSI. You may recall that each VSI defines a separate MAC table. One VSI should be considered as one shared MAC address database.
The VSI can also be used to perform local switching operations. For example, you
could configure interfaces ten1/0/1, 1/0/5, and 1/0/7 for service instance 10, with
encapsulation VLAN ID 101, and cross-connected to the VSI for CustomerA. This
results in three interfaces inside the virtual switch instance, and this VSI can perform
local switching between them.
Interfaces are configured for Layer 2 switch port operation by default with the
command port link-mode bridge. The interface must be operating as a Layer 2 port
to deploy the configuration shown, but you typically would not need to enter the
command.
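The binding described above generally looks like the following sketch; the xconnect keyword follows the Comware VPLS-style service model and should be treated as an assumption here:

interface Ten-GigabitEthernet1/0/1
# Layer 2 operation is the default; shown only for clarity
 port link-mode bridge
# Service instance 10 selects customer traffic tagged with VLAN 101
 service-instance 10
  encapsulation s-vid 101
# Tie the selected traffic to the customerA VSI defined in Step 4
  xconnect vsi customerA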

Step 6: Verify
Several display commands are available to verify SPBM operation. These
commands are shown in Figure 7-22.

Figure 7-22: Step 6: Verify
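Figure 7-22 is the authoritative list, but the verification commands belong to the display spbm family. The keyword variants below are assumptions:

display spbm unicast-fdb    # remote BEB MAC reachability learned via SPB IS-IS
display spbm multicast-fdb  # multicast forwarding state used by tandem replication
display spbm isis peer      # SPB IS-IS adjacencies on the backbone links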


Configuration Review: BCB Configuration


Figure 7-23 collects the SPBM syntax into a single BCB configuration example. The
BCB must participate in the SPBM IS-IS topology, and so each backbone interface is configured
with the spbm enable command. Since BCBs are not customer facing, they do not
require the VSI instance configuration that a BEB needs.

Figure 7-23: Configuration Review: BCB Configuration

Globally, L2VPN and SPBM are enabled. The MST region is configured with the
special instance number 4092, the B-VLAN 4021 is assigned, and the instance is
activated. VLAN 4021 is actually created from global configuration mode.
Next, the BCB's physical interfaces are configured as trunk ports, and VLAN 1 is
removed to ensure that only the B-VLAN is supported on the interface, as required.
The VLAN is permitted on the interface, and SPBM is enabled.

Optional Step: Multicast Replication


You have previously learned that head-end replication is the default method of
handling multicast, broadcast, and unknown unicast frames. If multicast replication
mode is desired, then it should be configured on all BEB nodes participating in the
same customer I-SID.
In the example in Figure 7-24, the VSI for customer A has been assigned I-SID
10001. From within that context, multicast replication mode is set to tandem. This
means that each VSI can be configured with its own multicast replication mode.
Some VSIs can be configured for head-end replication, while others can be
configured for multicast replication.


Figure 7-24: Optional Step: Multicast Replication

It is recommended that the default head-end replication be used for I-SIDs with only
two participating BEBs. It doesn't make sense to use the tandem multicast mechanism
when only a single unicast frame needs to be sent anyway.
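A sketch of selecting tandem mode for one tenant, with the replication keyword assumed (the verified syntax is in Figure 7-24 and the command reference):

vsi customerA
# Switch this tenant's VSI from the default head-end mode to tandem (true multicast) replication
 spb replication-mode tandem

Remember to apply the same mode on every BEB that participates in this tenant's I-SID.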

Optional Step: B-VLAN to ECT mapping


Another optional configuration task is to specify the ECT algorithm to be used for a
specific B-VLAN. The ECT algorithm controls path determination through the
backbone. Sixteen ECT algorithms are available. The actual numbers assigned to
them are hexadecimal values 0x0080C201 through 0x0080C210 (0x10 = 16 in decimal).
By default, all B-VLANs use algorithm 1. This is configured as a global SPBM setting; it is not configured at the I-SID level. The B-VLAN to ECT algorithm mapping configuration should be consistent across all BCBs
in the fabric.
In the example shown in Figure 7-25, BEB1 is configured by first entering the SPBM
configuration context. From here, B-VLAN 4021 is configured to use ECT algorithm
2. When multiple paths are available, this algorithm will choose the path with the
lowest bridge ID.

Figure 7-25: Optional Step: B-VLAN to ECT mapping
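A sketch of the global mapping just described; the keyword order is an assumption, but the idea is a single SPBM-view command repeated consistently on every backbone device:

spbm
# Map B-VLAN 4021 to ECT algorithm 2, which prefers the lowest bridge ID on equal-cost paths
 ect 2 b-vlan 4021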

HP devices support an ECT migration feature, allowing the easy migration of customers to a different B-VLAN without any impact. To do this, the network administrator begins by creating a new B-VLAN, and assigning the desired ECT algorithm. This configuration must be completed on all the backbone devices.

Next, each BEB participating in the customer VSI is configured to support the new B-VLAN. This new B-VLAN assignment is automatically exchanged throughout the backbone by IS-IS. Even though the administrator executes the command to use the new B-VLAN, a device will not actually start using it until it has learned from IS-IS that the other BEBs for that I-SID are also using it.
In this way, traffic will continue to be forwarded on the original B-VLAN until the
administrator has configured every pertinent BEB, and IS-IS has announced the new
B-VLAN. Once that occurs, all devices seamlessly switch to the new B-VLAN,
which could be configured with a different ECT algorithm.

Summary
In this chapter, you learned that Shortest Path Bridging Mac-in-Mac mode (SPBM) is
defined by IEEE 802.1ah (PBB) and 802.1aq (SPB). The goal is to maintain the
simplicity of a Layer 2 fabric while leveraging the scalability and convergence of
Layer 3 routed services. Best-path traffic forwarding is based on a link-state routing
protocol.
SPBM is relatively simple to deploy, since it is based on Ethernet standards, and
provides multi-tenant capabilities.
Two SPBM device roles are defined. BEBs receive customer frames, wrap them in a
PBB encapsulation, and send them into the SPBM backbone. BCBs form the backbone fabric, simply forwarding frames based on the outer PBB MAC header.
SPBM uses the SPB protocol to determine best paths. SPB is based on the IS-IS
routing protocol.
When multiple equal-cost paths exist, one of sixteen available ECT algorithms is
used to provide deterministic path selection. Administrators can control which
algorithm is used for a particular client, and manipulate selection metrics used by that
algorithm to engineer SPBM fabric paths.
A VSI is defined on BEBs to service customer traffic. A customer's VSI definition includes a Backbone Service Instance Identifier, called an I-SID, and a B-VLAN. This definition must be consistent across all BEBs that participate in that customer's connectivity.
SPBM's processing of unicast frames is very similar to that of classic Ethernet.
Multicast forwarding can use a head-end replication or tandem replication method.


SPBM configuration includes configuring MSTP regions, B-VLANs, VSIs, and I-SIDs. Multicast replication and B-VLAN to ECT mapping configurations are optional.

Learning Check
Answer each of the questions below.
1. Which of the statements below accurately describe the goals of SPBM (Choose
three)?
a. Support a large-scale Layer 3 fabric.
b. Maintain the simplicity of Layer 2.
c. Provide the scalability and convergence of Layer 3.
d. Forward based on best path selection.
e. Forward based on advanced STP.
2. Which statements are true about PBB device roles (Choose three)?
a. The BEB adds a MAC-in-MAC encapsulation to customer frames
b. The BEB-added frame encapsulation uses its local BEB MAC address as
the source, and the target BEB's MAC address as the destination.
c. BCBs forward frames based on destination IP address
d. BCBs forward frames based on the outer MAC address
e. BCBs forward frames based on the customer's MAC address
3. PBB supports over 16 million customers, uniquely identified by an I-SID, while
the B-TAG enables multiple VLANs to be used inside the backbone.
a. True.
b. False.
4. How does SPBM load-share across multiple equal-cost paths?
a. SPBM uses a hashing algorithm based on source/destination IP addresses.
b. SPBM uses a hashing algorithm based on source/destination MAC addresses.

c. Per-packet load sharing uses a round-robin scheme to split the load among
multiple equal-cost paths.
d. A set of 16 predefined ECT algorithms are used to provide deterministic
path selection.


5. When SPBM handles multicast traffic, what are the two types of replication used
(Choose two)?
a. Head-end replication uses a BCB to send unicast frames throughout the
backbone.
b. Unicast replication uses a BEB to send unicast frames into the backbone.
c. Tandem replication uses a special multicast address to forward frames to
appropriate customer sites
d. Head-end replication uses the ingress BEB to replicate a multicast frame as
several unicast frames into the backbone.
e. I-SID replication creates a special address based on the customer I-SID and
a unique ingress BEB identifier.

Learning Check Answers


1. b, c, d
2. a, b, d
3. a
4. d
5. c, d


8 Ethernet Virtual Interconnect

EXAM OBJECTIVES
In this chapter, you learn to:
Describe EVI features.
Understand EVI basic operations.
Understand EVI redundancy options.
Configure EVI.

INTRODUCTION
This chapter is focused on the Ethernet Virtual Interconnect (EVI) protocol. The
motivation for developing EVI is explained, as are the features and functionality that
enable EVI to extend Layer 2 broadcast domains over routed IP transport networks.

EVI Introduction
Data center administrators need the flexibility to move any VM to any physical
server, at any data center site. This is only possible if each VM's VLAN can be
extended across physical data centers. EVI was developed by HP to extend Layer 2
networks across data centers. This can also be accomplished with traditional fiber
connections. However, direct fiber connections are not always available or feasible.
For example, a customer might have each data center connected via a routed service.
In that scenario EVI can use these routed connections to transport Layer 2 traffic. A
key advantage over other options is that there is no need for MPLS support between
data center connections. This is a strict requirement for client-managed services like
VPLS or L2 VPN.


EVI protocols support the grouping of up to eight data centers in the same Layer 2
topology. Several optimization and tuning techniques are available to improve
performance and resiliency.
One such feature relates to local and remote MAC learning. You will see how local
MAC learning is based on traditional data plane processes, while remote MAC
learning uses a control plane process. This provides efficiency gains over other
options.
Another feature is ARP suppression. EVI-capable devices can snoop ARP
information from the remote site, and provide proxy responses to local ARP
broadcast requests. This can prevent ARP traffic from traversing WAN connections.
Also, EVI links block the flooding of multicast traffic. This affects the relationship
between redundant VRRP routers, ensuring that each data center has a local VRRP
master. This optimizes data flow between endpoints and their default gateways.
Selective flooding is also supported. This enables the network administrator to control
which multicast and unicast traffic may traverse the WAN connection.

Supported Products
Currently, the Comware 12500 supports EVI. The Comware 12900 is supported with
the latest release, while 11900 support is planned. The MSR/VSR is supported as of
late 2014, with HSR support planned for early 2015.
Note
You cannot mix a 12500 and 12900 in the same EVI network, since they use
different EVI encapsulation. HP routers can interface with either the 12500 or
12900.

EVI Operation
This section begins with discussions about EVI terminology and concepts. This is
followed by reviewing how EVI Networks operate, including neighbor discovery, the
traffic forwarding process, tuning, and redundancy.
EVI is an IP-based Layer 2 overlay technology for interconnecting data centers. With
EVI, Layer 2 frames are encapsulated in IP packets using a format called MAC in
GRE. This encapsulated traffic can traverse routed networks, using GRE tunnels that are established between each data center's EVI nodes.


Since EVI uses standard GRE and IP encapsulation, any existing IP routing
infrastructure can be used to extend Layer 2 networks between data centers.
Traditional Ethernet switches glean MAC addresses from the source address field of
normal endpoint data frames. These MAC addresses are mapped to the interface on
which they were received. EVI learns the MAC addresses for local devices in much
the same way.
By comparison, EVI uses a control plane-based MAC learning process for remote
devices. When a local MAC address is learned, the EVI nodes in that data center use
IS-IS to announce it to remote sites. Remote sites map these MAC addresses to the link
on which they were learned.
Since remote sites proactively learn MAC addresses via this control plane process,
they are free from having to inspect each incoming frame's source address for that
information.

EVI Terminology
Figure 8-1 introduces terms related to EVI. These terms are described below.


Figure 8-1: EVI Terminology

An Edge Device is the switch or router in a data center that provides EVI services. It
performs local MAC learning, forms IS-IS peers with remote edge devices, and
announces local MAC addresses to those remote peers.
The EVI network ID is a unique identifier shared between edge devices. Each edge device
may be assigned multiple EVI network IDs to isolate multiple tenants. Each EVI
network ID defines an additional IS-IS process. This ensures that learned MAC
information for one tenant remains separate and private. A single infrastructure can
securely support multiple tenants.
The LAN side of an edge device connects to traditional VLANs, with the traditional
4094 VLAN limit. You must map selected VLANs to specific EVI network IDs, thus
creating an extended VLAN.
For example, a data center site may have 1000 traditional, local VLANs. The
network administrator wants to extend 100 of those VLANs to a remote data center.


Only these selected VLANs will become extended VLANs, by being mapped to a
specific EVI network ID. Each VLAN can only be mapped to a single EVI network
ID. It is not possible to map a VLAN to multiple EVI network IDs.
The EVI Neighbor Discovery Protocol (ENDP) provides an IP address registration
service. This is useful when several sites must be interconnected via EVIs GRE
tunnels.
Each side of a GRE tunnel must be configured with a source and destination IP
address. When only two sites are connected, this is easy. Each side merely specifies
its own local address as the GRE tunnel source, and the other side's IP address as the
destination. However, with three data centers, each site must be configured with two
separate tunnels, so that each destination can be configured. As additional sites are
added, yet another tunnel must be configured. If a data center is decommissioned, all
other data centers must be reconfigured to remove the associated tunnel.
ENDP defines server and client roles to alleviate this administrative overhead. Each data center's ENDP client registers itself with the ENDP server. It can also query the server for a list of currently active peer addresses. This enables the
dynamic setup and tear down of GRE tunnels based on the list of registered IP
addresses. This greatly simplifies the addition and removal of data centers.
EVI IS-IS is based on standard IS-IS mechanisms. Specific extensions were added to
announce Layer 2 reachability information for EVI. This reachability information
includes a list of VLANs and the MAC addresses that have been learned on those
VLANs.

EVI Concepts 1/2


Figure 8-2 shows an EVI-based scenario. Three data centers are connected by a
public transport IP network, which may not be part of the customer-managed network.
It could be a service providers network, or a service providers MPLS L3 VPN
service, for example. Each data center has a single edge device that connects to the
site-local VLANs, and provides connectivity to remote sites.


Figure 8-2: EVI Concepts 1/2

The edge device could be filling the role of a traditional core switch, with hundreds
of local interfaces connected to traditional switches inside the data center. Classic
MAC learning is performed in this local network. This means that each frame
entering a local port is examined by the edge device. Its source address is added to
the MAC address table, mapped to the interface on which it was learned.
Using ENDP, GRE transport tunnels are dynamically formed between EVI peers. The
switch perceives that logical Layer 2 EVI links are available over these Layer 3
GRE tunnels. This allows remote site MAC addresses to be learned via EVI's IS-IS
protocol.

EVI Concepts 2/2



The EVI network ID is a logical entity that can be used to isolate multiple tenants. It
allows unique logical topologies to be formed for each client, each with its own
separate MAC address table.
Figure 8-3 shows a topology that supports two separate client networks, EVI network
ID 11 and EVI network ID 12, represented in the figure as logical switching
objects.

Figure 8-3: EVI Concepts 2/2

EVI network ID 11 is configured to operate between data centers 1, 2 and 3, while EVI network ID 12 operates only between data centers 2 and 3. Each logical network
has its own ENDP server, IS-IS instance, and MAC address table. Complete tenant
isolation is achieved.
Multiple VLANs can be assigned to each logical ID 11 and 12, but each VLAN can
be mapped to one and only one ID. VLANs 1 through 99 could be mapped to EVI ID
11, while VLANs 200-250 could be mapped to EVI ID 12.


By using QinQ technology, EVI edge devices can even provide each tenant with its
own set of 4094 VLANs. This is conveyed in Figure 8-3. In DC-3, Edge3 is
connected to a local switch that is configured with QinQ. Any VLANs from the local
switch will have an additional, outer 802.1q tag of 20 added before transmission to
Edge3. Edge3 will map VLAN20 to EVI ID 11. For another tenant, Edge 3 maps
VLAN30 to EVI ID 12.
This QinQ solution carries a risk related to duplicate MAC addresses. EVI's internal MAC database perceives all MAC addresses as associated with VLAN 20, since it only sees the outer tag. If there were duplicate MAC addresses among the client's multiple VLANs, MAC flapping could occur in the EVI network. However, it
is quite rare for duplicate MAC addresses to be an issue.
VRRP could increase the odds of this issue, since VRRP uses a specific MAC range.
If VLANs 101 and 102 both used VRRP router ID 1, they could end up with the same
MAC address, and cause a conflict.

EVI Network
Each EVI network configuration is defined with a unique network ID, in the range of
1 - 16777215. The EVI network also requires an EVI tunnel interface. This tunnel
interface is defined by a unique tunnel interface ID. Multiple tunnel interfaces will
use the same physical source IP address.
Each EVI network has an EVI IS-IS process, which is automatically created when a
new EVI tunnel interface is configured. EVI's IS-IS process is responsible for link-state calculation and remote-site MAC address exchange.
The EVI process ID will match the EVI tunnel interface ID. If an administrator
configures interface tunnel 26, there will also be an EVI IS-IS process 26. The
neighbor discovery protocol must also be configured. This is unique for each
network ID to ensure that unique topologies can be formed for each network tunnel.
Finally, local VLANs are mapped to specific EVI IDs to create a set of extended
VLANs. Traffic tuning and optimization are also configured per network ID. This
includes ARP suppression and selective flooding. In this way, each EVI configuration
has a completely isolated configuration set.

EVI Process
The EVI network process can be described in three phases - neighbor discovery,
MAC address advertisement, and actual data forwarding.


ENDP neighbor discovery occurs on edge devices. Each edge device registers its IP
address with the ENDP server. Each ENDP client also queries the server for other
edge devices using the same network ID. GRE tunnels are automatically formed with
these edge devices.
So, these Layer 3 GRE tunnels are automatically established, with an overlay of
associated Layer 2 EVI links. The EVI IS-IS routing protocol will form adjacencies
over these EVI links. These adjacencies enable IS-IS to send locally learned MAC
address information to remote peers.
Once the network is fully established, endpoint data forwarding can occur. Local
traffic is received, and the destination MAC address is found in the EVI MAC table.
This table associates this destination MAC address to an EVI link. The frame is
encapsulated and sent over the appropriate tunnel. The remote edge device receives
this traffic and forwards it toward the appropriate local destination, based on the
local MAC address table.

EVI Neighbor Discovery: Introduction


ENDP eases the process of connecting multiple sites. The number of sites supported
is dependent on hardware models in use. ENDP provides an IP address registration
service that enables sites to be dynamically added and removed.
Each ENDP Client (ENDC) is configured with the IP address of the ENDP Server
(ENDS). The client will register its own local transport IP address with the ENDS.
Then it queries the server to retrieve the list of active remote IP addresses. The client
will refresh the transport IP address information at regular intervals. When clients go
offline, the IP address is removed from the ENDS.
When the client has retrieved all active remote IP addresses, it can set up EVI tunnels to each of them. Each EVI network ID maintains a separate ENDP configuration. This enables each EVI ID to maintain a unique topology.

EVI Neighbor Discovery: Configuration


The ENDP server can be enabled on any EVI edge device. For registration
redundancy purposes up to two ENDP servers can be configured for each network
ID.
An edge device configured as an ENDS is automatically its own ENDC. It registers
its own transport IP address in the local database. Two ENDP servers are often
configured to provide redundancy. This redundancy is achieved since each client registers with both ENDP servers. However, consider a scenario with eight
interconnected data centers, with ENDS configured at DC-1 and DC-2. Loss of
connectivity to both DC-1 and DC-2 breaks the EVI connectivity between all the
other data centers. There must be at least one functional, accessible ENDS.
The ENDC is manually configured with the transport IP address of up to two ENDP
servers. If an edge device has already been configured as an ENDS, it is already a
client of itself, and so can only be configured with one more ENDS.
ENDP authentication can be configured to enhance security. This is in the form of a
password, which must match between the client and the server.

EVI Neighbor Discovery Example


Figure 8-4 shows five interconnected data centers. The edge device at each data
center has a routed interface connected to the transport IP network. Edge1 has an IP
address of 10.1.0.1/24, Edge2 has 10.2.0.1/24, and so on. These addresses will be
used to establish the tunnels. Each data center requires four GRE tunnels, one to each
remote site.

Figure 8-4: EVI Neighbor Discovery Example

The two top-most data centers' edge devices are to be configured as ENDP servers.
As discussed, they will automatically register to themselves as an ENDC. Each is
also manually configured as an ENDC of the other edge device.


All of the other sites are configured as pure ENDP clients of both ENDP servers.
The ENDS member database lists all active transport IP addresses and system IDs.
The address listed is the public, provider-facing IP address. Each edge device's system ID is also listed, based on the device's MAC address. This system ID is used by IS-IS
in the same way OSPF uses a Router ID. It uniquely identifies each node in the
network.

EVI MAC Learning


MAC addresses local to each data center are learned from local interfaces. This is
classic data plane-based learning. Source MAC addresses are gleaned from normal
endpoint data frames.
These locally learned MAC addresses can then be announced by EVI IS-IS to all
remote edge devices. The remote site edge devices receive these MAC addresses
and add them to their EVI MAC address table, associated with the EVI link on which they were learned.
EVI IS-IS transmits this information in Link State Packets (LSPs). These LSPs
include the actual MAC address, and the VLAN on which it exists.
For example, if DC-1 has 10,000 local MAC addresses, Edge1 sends an LSP with
10,000 records. This is similar to how OSPF might use LSPs to advertise 10,000
available routes. Remote edge devices receive this LSP, and add the 10,000 MAC
addresses to their database.

EVI Traffic Forwarding: Unicast


Figure 8-5 highlights the process used by EVI to forward unicast frames. When an
edge device receives a unicast frame, it learns the source MAC and looks up the
destination MAC address. If the destination is local, it performs traditional local
interface switching to the correct local interface. This is classic Ethernet switching.


Figure 8-5: EVI Traffic Forwarding - Unicast

If the MAC address is for a device at the remote site, the edge switch encapsulates
the frame and sends it over the EVI tunnel to the other site.
The remote site edge device receives the inbound Unicast traffic on the EVI link. It
does not learn the source MAC address from it, since remote MAC addresses are
learned via the IS-IS control-plane process. It looks up the destination MAC address
of the incoming frame. If the MAC address is for a local device, then classic, local
interface switching will be performed.
Typically, there would be a match, since Edge Device1 learned that MAC address
from Edge Device2 in the first place. However, if there is no match, Device2 sends
an EVI IS-IS instruction to Device1 to purge that address.

EVI Traffic Forwarding: Multicast 1/2


The multicast delivery process applies to broadcast, multicast, and unknown unicast traffic. By default, the EVI protocol avoids multicast flooding by dropping multicast packets. To enable multicast traffic over an EVI network, Protocol Independent Multicast (PIM) is configured, along with IGMP snooping for IPv4 and MLD
snooping for IPv6.
Edge devices should flood Multicast control protocol traffic, such as IGMP, MLD,
and PIM packets, to remote sites. This is important because a multicast receiver and
transmitter must be aware of each other. When an additional listener comes online, it
sends an IGMP message on the network. That IGMP message must reach the remote site in order to trigger the delivery of the multicast stream.


Since IGMP and PIM protocol packets use a multicast destination, those packets
would be dropped by default. It is therefore important to configure EVI with the
selective flooding feature.
Note that IGMP, MLD, and PIM protocol packets arriving on EVI links from remote
sites will be flooded out the local interface by default. It is locally generated
multicast frames that will not be forwarded across the EVI links, by default.

EVI Traffic Forwarding: Multicast 2/2


Figure 8-6 highlights the process that EVI uses to forward multicast traffic, as Edge
Device1 receives an inbound multicast packet on a local interface. The switch checks
to see if any interfaces have registered to receive multicast traffic. Hosts register to
receive a multicast stream by sending an IGMP join message.

Figure 8-6: EVI Traffic Forwarding Multicast 2/2

If local endpoints have registered, the switch performs traditional multicast delivery.
If a remote device has joined the multicast group across an EVI link, then frames are
encapsulated and sent over the EVI tunnel to the remote site. If several clients from
several remote sites have registered, then Edge Device1 uses a technique called
head-end replication. It sends a separate packet across each EVI link, toward each
remote site.
Remote site Device2 receives the multicast traffic on the EVI link. It checks to see if
any of its local interfaces are registered. This would have happened as described above, when a local endpoint sends an IGMP join request. Of course, this should be
the case; otherwise, Device1 wouldn't have received the request over the EVI link.
Device2 performs traditional multicast delivery to get these multicast frames to
appropriate endpoints. However, it will not forward multicast traffic over any EVI
links. This split-horizon mechanism prevents loops inside the EVI network topology.

EVI Traffic Forwarding: Flooding


EVI switches flood broadcast traffic to all local interfaces and all EVI links - EVI
will not break traditional broadcast mechanisms. Broadcast traffic like ARP will be
flooded to all remote sites by default. The ARP suppression feature can be
configured to minimize ARP broadcasts over EVI links.
Unknown unicast frames are flooded out all local interfaces, but not over EVI links.
This is unnecessary, since MAC addresses for remote devices are learned by the EVI
IS-IS protocol.
This is a good approach for most normal traffic, but can create challenges for
specific protocols that rely on classic unknown unicast mechanisms. Exceptions can
be configured to ensure that these special applications continue to operate properly.
This will be covered later in this chapter.
Incoming multicast traffic is flooded out all local interfaces in a traditional fashion. It
will not be flooded out EVI links, unless IGMP, MLD, or PIM messages have been
received over that link, indicating that the traffic is required.
However, exception rules can be configured to enable selective MAC flooding. An
example could be when VRRP is configured. VRRP uses a multicast address to send
VRRP Hello packets, but never uses IGMP to join that multicast group. Since EVI
will never receive an IGMP join for VRRP, it will not send VRRP hello packets
across EVI links. The network administrator thus can configure a specific rule to
permit the appropriate multicast MAC address to traverse EVI links.

EVI Traffic Optimization: ARP Suppression 1/2


ARP suppression applies to broadcast traffic. You have learned that EVI floods
broadcast traffic to all sites by default. The ARP suppression feature reduces the
number of ARP broadcasts between sites, since the local edge device can proxy ARP
responses on behalf of remote clients. A caching mechanism is activated on the Edge
device to cache MAC-to-IP information.


EVI Traffic Optimization: ARP Suppression 2/2


Figure 8-7 highlights the operational steps used with ARP suppression. At DC-1,
Server1 sends an ARP broadcast request that arrives at Edge1. Edge1 does not yet
have an ARP record for the requested IP address, and floods the request to all remote
sites. At remote site DC-2, Server2 receives the request and sends an ARP reply,
which is forwarded by Edge2 across the EVI link.

Figure 8-7: EVI Traffic Optimization: ARP Suppression 2/2

Edge1 snoops for ARP replies on the EVI link, learns Server2's MAC address, and adds
it to an EVI ARP cache.
Therefore when another device at DC-1, Server3, broadcasts an ARP request for
Server2, the broadcast need not be forwarded over EVI links. Edge1 has the entry,
and proxies a response back to its local client.

EVI Traffic Optimization: Selective Flooding 1/2


Some protocols or applications require multicast packets, but do not perform IGMP/MLD registration. A Microsoft Network Load Balancing cluster is one example that may or may not use IGMP, depending on how it is configured. Another example is VRRP. EVI will block unregistered multicast from traversing EVI links by default. This may render the protocol or application non-functional.
As previously discussed, EVI will block VRRP Hellos by default. As a result, the VRRP routers in DC-1 and DC-2 will not hear hellos from each other. Therefore, they will
both assume the master role and actively function as the default gateway for local
clients. This ensures that local hosts always forward frames to a local VRRP
forwarder. This is preferable to using a default gateway on the other side of a WAN
link, since data paths are optimized.

EVI Traffic Optimization: Selective Flooding 2/2


Each EVI Network ID has its own selective flooding configuration. On each edge
device, specific multicast MAC addresses that should be flooded can be configured
per VLAN. This means that it is possible to allow the default configuration to block
VRRP on VLAN 20, and allow VRRP Hellos to traverse EVI links for VLAN 30.
Another example is noted in Figure 8-8, with the Microsoft Network Load Balancing
(NLB) General multicast. This protocol uses the 03-BF MAC address range as a
destination multicast address. This address range can be configured as an exception
for VLAN 10.

Figure 8-8: EVI Traffic Optimization: Selective Flooding 2/2

Once configured, this multicast MAC address would selectively be allowed to be flooded over EVI links, while all other unregistered multicast traffic will conform to the default blocking action.

EVI Redundancy 1/2



EVI allows for up to two logical devices at each local site. Since EVI is supported
by IRF systems, site redundancy can be handled at this level. Two EVI devices can
be configured as a single, logical IRF system. Other devices perceive a single logical
EVI edge device.
EVI IS-IS also supports graceful restarts. This ensures quick, nearly lossless
recovery from an IRF master failover scenario, or from an MPU failover to a different IRF chassis.
Another option is to deploy two distinct EVI edge devices in a local site, without use
of IRF. With this solution, both edge devices connect to local site switches, and so
announce the same set of MAC addresses to remote edge devices. This could cause
duplicate MAC learning at remote sites. To avoid this issue, an active/standby model
is used.
The active edge devices participate in EVI IS-IS, announcing locally learned MAC
addresses. Therefore, remote sites learn MAC addresses only from this single, active
edge device.
The standby edge will maintain an LSDB, and so is ready to transmit all appropriate
MAC address information in an LSP. However, it remains quiet unless the active
device fails. In that event, the standby unit activates, forming IS-IS adjacencies to
remote sites. System functionality is thus maintained with very minimal downtime.

EVI Redundancy 2/2


EVI must be able to detect that a site is using two edge devices. The network
administrator will physically interconnect EVI edge devices to other local switches
for redundancy, but EVI will not be aware of this. This is because EVI IS-IS only
operates on the GRE tunnels formed over the transport IP network. It does not operate
on the local site interconnections.
The solution involves EVI Site IDs, which are configured by the administrator. Each
data center is assigned a unique ID, which is configured on both edge devices at the
site. The edge devices announce this ID on their local interfaces, and so see each
other's announcement. The device with the lowest MAC address is isolated as a
standby device. The device with the highest MAC address is the active EVI IS-IS
participant.
The standby edge device is not in the data path, and ignores all data frames received
from EVI links. It still exchanges IS-IS Hello packets, but does not form adjacencies
nor exchange MAC information in any LSPs. It does accept LSPs inbound from remote sites to build and maintain a current LSDB. Therefore, it is ready to take over
should the currently active device fail.
The default site ID is 0. Site ID 0 is not blocked from forming adjacencies. So if all
sites are left to their default configuration, they will still form IS-IS adjacencies.

EVI IS-IS Maximum MAC Address Announcement


EVI IS-IS uses LSPs to announce local MAC addresses per VLAN. Thousands of
endpoints may exist at a large data center, so a single LSP might contain a large
amount of MAC address information. An update to the local MAC address table
results in an updated LSP, announced to all remote edge devices.
With EVI, a single LSP can contain a maximum of 56,320 MAC addresses,
advertised by each active edge device along with its EVI system ID. This defines the upper limit for each site in an EVI deployment.
For EVI to handle more than this default maximum, virtual system IDs can be
defined. A virtual ID must be defined for each block of 56,320 MAC addresses. The
network administrator must ensure that each virtual system ID is unique throughout
the entire EVI network.

Configuration Steps for EVI


Figure 8-9 introduces the steps to configure EVI on HP Comware switches. These
steps are detailed next.


Figure 8-9: Configuration Steps for EVI

Optional Step 1: Configure the EVI Site-ID


If two edge devices are deployed at a single site, you should configure an EVI Site
ID. Each site must be assigned a unique site ID, which is configured identically on
both Edge devices at the site.
In Figure 8-10, the switches at site 1 are named Site1-12500-1, and Site1-12500-2.
They are both assigned site ID 1. The device at another site is named Site2-12500-1,
and is assigned an EVI site ID of 2.


Figure 8-10: Optional Step 1: Configure the EVI Site-ID
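
As a rough sketch of the commands behind Figure 8-10 (the evi site-id keyword is an assumption based on the figure description, and exact syntax may vary by platform and software release):

[Site1-12500-1] evi site-id 1
[Site1-12500-2] evi site-id 1
[Site2-12500-1] evi site-id 2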

Step 2: Configure the Transport Interface


The transport interface must be configured on the edge device. This is a routed link
that is connected directly to the IP transport network. To do this, a VLAN is defined,
and the port is made a member of the VLAN. Then a Layer 3 VLAN interface is
created and an IP address is assigned to it. Finally, EVI is enabled on the physical
interface.
Note
The Layer 3 VLAN interface would also be configured to participate in some
routing protocol, such as static routing or OSPF, for example. This
configuration is not shown.
In Figure 8-11, VLAN 4001 is created and assigned to interface G3/0/1. Next, Layer
3 interface VLAN 4001 is defined with an IP address of 10.1.0.2/24. The
configuration step is completed by enabling EVI on the physical interface. This step
must be done on the physical interface, and not on logical interface VLAN 4001.

Figure 8-11: Step 2: Configure the Transport Interface
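
A rough sketch of the configuration described for Figure 8-11 follows. The VLAN and IP address commands are standard Comware syntax; the evi enable keyword on the physical interface is an assumption based on the figure description.

[Site1-12500-1] vlan 4001
[Site1-12500-1-vlan4001] port GigabitEthernet 3/0/1
[Site1-12500-1-vlan4001] quit
[Site1-12500-1] interface Vlan-interface 4001
[Site1-12500-1-Vlan-interface4001] ip address 10.1.0.2 24
[Site1-12500-1-Vlan-interface4001] quit
[Site1-12500-1] interface GigabitEthernet 3/0/1
[Site1-12500-1-GigabitEthernet3/0/1] evi enable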

Step 3: Configure the EVI Tunnel Interface



Each EVI network requires a unique EVI tunnel interface. This logical interface is
configured with its tunnel source IP address. This is the address assigned to the
interface connected to the IP transport network, as configured in the previous step.
Manual configuration of the tunnel destination is not necessary, since it will be
learned dynamically via ENDP.
The tunnel Interface ID also serves as the EVI IS-IS Process ID. In the example in
Figure 8-12, interface tunnel 1 was created with a mode setting of EVI. Since tunnel
interface 1 was created, EVI IS-IS process 1 will be created.

Figure 8-12: Step 3: Configure the EVI Tunnel Interface
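
Continuing the sketch (on some software releases the EVI mode is specified when the tunnel interface is created, on others with a separate command; the syntax below is an approximation):

[Site1-12500-1] interface Tunnel 1 mode evi
[Site1-12500-1-Tunnel1] source 10.1.0.2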

Step 4: Configure the EVI Network ID


The EVI tunnel must be mapped to an EVI Network ID. The valid range for this
number is between 1 and 16777215. In the example in Figure 8-13, tunnel 1 is
configured with EVI network ID 101.

Figure 8-13: Step 4: Configure the EVI Network ID
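
Assuming the network ID is applied under the tunnel interface, the sketch might continue as follows (the evi network-id keyword is an approximation of what the figure shows):

[Site1-12500-1-Tunnel1] evi network-id 101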

Step 5: Configure the Extended VLANs


When you map site-local VLANs to EVI, you are creating extended VLANs. These
are the VLANs that EVI will extend across data centers. As shown in Figure 8-14,
multiple VLANs can be mapped to a single EVI network ID, but each VLAN can be
assigned to one and only one EVI network ID.

Figure 8-14: Step 5: Configure the Extended VLANs
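
A sketch of the VLAN mapping, using a hypothetical range of VLANs 100 through 199 (the evi extend-vlan keyword is an assumption based on the figure description):

[Site1-12500-1-Tunnel1] evi extend-vlan 100 to 199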


All the MAC addresses learned in all VLANs for a particular network ID are
announced by EVI IS-IS. These MAC addresses are announced to remote-site peers
along with their VLAN ID. This is how EVI maintains separate MAC tables per
VLAN.
Note
In this example VLAN 4001 serves as the uplink transport. This VLAN must
never be configured as an extended VLAN.

Step 6: Configure ENDS


A previous step enabled the GRE tunnel by specifying the tunnel source. Tunnel
destinations are automatically discovered by ENDP. Figure 8-15 reveals the
configuration of an ENDP Server, or ENDS.

Figure 8-15: Step 6: Configure ENDS

Up to two servers can be configured. An EVI edge device configured as an ENDS automatically registers with itself as a client. This configuration is added on the EVI tunnel interface. Since one tunnel interface is configured for each network ID, a separate neighbor discovery topology exists for each EVI network ID.
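
A sketch of the server-side configuration on the Site 1 tunnel interface (the neighbor-discovery keyword is an approximation of the command shown in the figure):

[Site1-12500-1-Tunnel1] evi neighbor-discovery server enable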

Step 7: Configure EVI Neighbor Discovery Client


EVI neighbor discovery is enabled by configuring ENDP clients (ENDC) with the IP
address of the server, as configured in the previous step. A client can be configured
with up to two servers. A server can only be configured with one additional server, since it is already acting as one of the two possible servers.
In Figure 8-16, the top example could be for a two-site deployment. Edge device
Site2-125001-Tunnel1 is configured as an ENDC. This continues the example from
Figure 8-15, in which Site1-125001-Tunnel1 was configured as an ENDS.


Figure 8-16: Step 7: Configure EVI Neighbor Discovery Client

The next example in Figure 8-16 is for a three-site deployment. Site 1 and Site 2 edge
devices are each configured as servers, and then configured as a client of each other.
Site 3's edge device is configured purely as a client, and so both servers are
specified.
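
A sketch of the client-side configuration for the three-site example, assuming the two ENDS transport addresses are 10.1.0.1 and 10.2.0.1 as in Figure 8-4 (the device name and keywords are approximations):

[Site3-12500-1-Tunnel1] evi neighbor-discovery client enable 10.1.0.1
[Site3-12500-1-Tunnel1] evi neighbor-discovery client enable 10.2.0.1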

Step 8: Verify
As shown in Figure 8-17, there are several verification commands available to
validate your configuration efforts.


Figure 8-17: Step 8: Verify

The neighbor discovery summary and member commands reveal current EVI status
and reveal registered addresses.
The client summary options summarize client configuration and provide an overview
of all the remote IP addresses which have been learned through the neighbor
discovery protocol.
The display evi link interface tunnel command shows the status of the EVI link.
The display interface command shows information about logical Layer 2 EVI links.
The control plane protocol can be examined with display evi isis brief, allowing you to see remote and local MAC addresses.
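
Based on the command names mentioned above, a verification session might look roughly like this (exact keywords and output formats vary by platform and software release):

display evi neighbor-discovery server member
display evi neighbor-discovery client summary
display evi link interface tunnel 1
display interface tunnel 1
display evi isis brief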

Advanced Step 9: ARP Suppression


ARP flooding suppression minimizes ARP broadcast traffic over EVI links. This
must be configured separately on each edge device, since each edge device performs
local ARP caching.


As in the example in Figure 8-18, this configuration is performed under the tunnel
interface, so each tenant's EVI network ID can function separately.

Figure 8-18: Advanced Step 9: ARP Suppression
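
A sketch of what Figure 8-18 describes, configured under the tunnel interface; the exact suppression keyword is an assumption:

[Site1-12500-1-Tunnel1] evi arp-suppression enable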

Advanced Step 10: Selective Flooding


Selective flooding provides administrative control over traffic flows. By default, broadcasts are always flooded, while unknown unicast and unregistered multicast are only flooded on local interfaces. The network administrator can enable
traditional flooding for selected MAC addresses on one or more VLANs.
In Figure 8-19, the top example is for the Microsoft NLB MAC address. The MAC address to be flooded is configured on the tunnel interface. In the example, whenever local packets in VLAN 10 are received with the specified destination MAC address, they will be flooded to all remote sites. This is independent of receipt of any IGMP join or PIM register messages.

Figure 8-19: Advanced Step 10: Selective Flooding

For typical multicast traffic, it is important to ensure that IGMP and PIM protocol
messages are flooded between data centers. These protocols use Layer 3 multicast
destination addresses 224.0.0.1 and 224.0.0.13. These addresses translate to the
Layer 2 MAC addresses 0100-5e00-0001 and 0100-5e00-000d. In the case of Figure
8-19, the selective flooding is enabled for VLAN 10.
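
A sketch combining both cases discussed above, configured under the tunnel interface (the selective-flooding keyword is an approximation, and the NLB address shown is only a placeholder for an actual cluster MAC in the 03-BF range):

[Site1-12500-1-Tunnel1] evi selective-flooding mac-address 03bf-0000-0001 vlan 10
[Site1-12500-1-Tunnel1] evi selective-flooding mac-address 0100-5e00-0001 vlan 10
[Site1-12500-1-Tunnel1] evi selective-flooding mac-address 0100-5e00-000d vlan 10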

Advanced Step 11: Virtual-system IDs


You learned earlier that a single EVI LSP can handle 56,320 MAC addresses, and
that this defines the upper limit of MAC addresses per data center site. Virtual system
IDs can be defined to accommodate more than this default maximum.


Virtual system IDs are defined from within the EVI IS-IS process. The EVI IS-IS
process ID is unique for each EVI network ID, and is based on the EVI tunnel
interface number. Since Tunnel 1 was defined, the EVI IS-IS process is also 1.
Remember that the default limit of 56,320 is per data center, not per deployment. If
you have four data centers, each with 20,000 MAC addresses, there are 80,000 MAC
addresses in the deployment. The scenario in Figure 8-20 does not require virtual
system IDs. Each data center site announces its own LSP of 20,000 addresses, which
is well below the default maximum. If an individual site expanded over time to over
56,320 MAC addresses, then virtual system IDs would be required.

Figure 8-20: Advanced Step 11: Virtual-system IDs
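
If a site did grow beyond the limit, additional virtual system IDs would be added under the EVI IS-IS process; a sketch with an assumed keyword and a hypothetical system ID value:

[Site1-12500-1] evi isis 1
[Site1-12500-1-evi-isis-1] virtual-system 0011.2200.0001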

Summary
In this chapter you learned:
EVI was developed by HP to extend Layer 2 networks across Layer 3 networks to
remote data centers. EVI is based on the IS-IS link-state routing protocol, and
uses MAC-in-GRE format tunnels to form peers over any existing routed IP
infrastructure.
With support for up to eight interconnected data centers, several optimization
techniques are available, including ARP suppression, VRRP Hello blocking, and
selective flooding.
An EVI edge device provides EVI services by forming IS-IS adjacencies through
GRE tunnels. A single EVI deployment can accommodate multiple tenants, since
each tenant's GRE tunnel set is defined by a unique EVI network ID.
Tunnels are dynamically created as new sites are brought up and down through use
of the EVI Neighbor Discovery Protocol (ENDP). Once the tunnels are formed,
EVI IS-IS advertises MAC reachability information to remote sites. This enables
efficient data forwarding for endpoints.
Local MAC learning is handled in the data plane, in the same way that traditional
Ethernet switches glean addresses from the source address field of an Ethernet
frame. Learning MAC addresses for remote devices is handled at the control
plane, by the EVI IS-IS protocol.


Default EVI forwarding mechanisms efficiently handle most unicast, multicast, and
broadcast traffic. However, these defaults can be modified to handle special
cases, including ARP broadcast suppression and VRRP Hello frames.
EVI configuration includes setting up transport and tunnel interfaces, network IDs,
extended VLANs, and ENDP. ARP suppression, selective flooding, virtual system
IDs, and site IDs are optional.

Learning Check
Answer each of the questions below.
1. What are three goals of EVI (Choose three)?
a. Support connectivity between three or more data centers.
b. Extend Layer 2 networks across data centers.
c. Use Layer 3 transport mechanisms without requiring MPLS connectivity.
d. Provide several techniques to optimize and tune Layer 3 connectivity.
e. Enable VMs to be easily moved to any data center
2. Choose three correctly described components of a typical EVI deployment
(Choose three).
a. The device that provides EVI services is called the Edge device.
b. Each participating VM host is assigned a VNI.
c. The ENDP provides public transport IP address registration.
d. EVI IS-IS is used as a control protocol for EVI.
e. EVI uses an ENDC server to register MAC addresses.
3. Internal EVI interfaces perform classic MAC learning, while EVI-facing links
leverage EVI IS-IS for MAC learning at the control plane.
a. True.
b. False.
4. What are three steps for EVI neighbor discovery (Choose three)?
a. The ENDP Client registers the transport IP with the ENDP Server.
b. The ENDC queries the ENDS to retrieve active remote IP addresses.
c. The ENDS devices update each other's ARP cache.
d. Edge devices can setup EVI tunnels to active IP addresses.


e. IPSec tunnels respond to access list permit statements.


5. What are two advanced configuration options for EVI (Choose two)?
a. Multicast routing protocol configuration
b. Selective flooding
c. EVI Site-ID configuration.
d. ARP suppression

Learning Check Answers


1. b, c, e
2. a, c, d
3. a
4. a, b, d
5. b, d


9 MPLS Basics

EXAM OBJECTIVES
In this chapter, you learn to:
Describe Multiprotocol Label Switching (MPLS).
Understand the basic operation of MPLS.
Describe the MPLS encapsulation.
Clarify several MPLS misconceptions.
Describe the behavior of a Label Switching Router (LSR).
Describe a Label Switched Path (LSP).
Describe a Forwarding Equivalence Class (FEC).

INTRODUCTION
Multiprotocol Label Switching (MPLS) provides connection-oriented label
switching over connectionless IP backbone networks. It integrates both the flexibility
of IP routing and the simplicity of Layer 2 switching.

MPLS Advantages
Multiprotocol Label Switching (MPLS) delivers the following advantages:
High speed and efficiency: MPLS uses short, fixed-length labels to forward packets, avoiding complicated routing table lookups.
Multiprotocol support: MPLS resides between the link layer and the network layer. It can work over various link layer protocols (for example, PPP, ATM, frame relay, and Ethernet) to provide connection-oriented services for various network layer protocols (for example, IPv4 and IPv6).


Good scalability: The connection-oriented switching and multi-layer label stack features enable MPLS to deliver various extended services, such as VPN, traffic engineering, and QoS.

MPLS History
Multiprotocol Label Switching (MPLS) was initially proposed as a solution to
improve the forwarding speed of routers. In the past, routers performed software
based routing using a central CPU rather than using ASICs on line cards. Every
packet that arrived on an interface required a routing table lookup which was both
time consuming and processor intensive. This was exacerbated when routers had
very large routing tables.
Switches in the past provided Layer 2 hardware forwarding, but could typically not
route packets or had limited Layer 3 capabilities. Most switches also only supported
Ethernet connections and not Layer 2 encapsulations such as ATM or Frame Relay.
One of the original objectives of MPLS was to provide routing information base
(RIB) lookups close to the forwarding performance of Layer 2 switching. To enable
this, it was envisioned that MPLS core devices would use labels rather than routing tables, performing simple label swaps instead of complex routing table lookups. The premise was to program the simple forwarding table used by MPLS via a
higher level protocol. The MPLS forwarding table or label forwarding information
base (LFIB) would be derived from information contained in the routing table.
This original MPLS goal of increased speed through label switching is less relevant
today because of hardware based routing. In today's networks, devices perform Layer
3 forwarding in hardware using ASICs and can therefore route at wire speeds. The
past performance issues of slower Layer 3 forwarding are negated by the use of
hardware based forwarding information bases (FIBs) based on software routing
information bases (RIBs).
MPLS is still valid today as it abstracts the backbone network from the user network.
Customer services can be transported transparently across the MPLS backbone
network with greater ease and scalability.
MPLS benefits include improved performance (not as relevant today) and reduced
total cost of ownership because of flexible network deployments. Both Layer 2 and
Layer 3 customer connections are supported across an MPLS network. Large scale
networks are supported providing connections from multiple customers or business
units. MPLS can also utilize existing underlying network infrastructures including
Frame Relay, ATM and Ethernet.


MPLS provides better security and isolation of customer networks. It is far simpler
and more scalable to configure MPLS L3VPNs than trying to separate customer
traffic using access control lists.
MPLS supports dynamic re-convergence when a link or device in the network fails.
Traditional protocols such as OSPF and BGP are used in addition to MPLS specific
protocols to provide automatic failure and path calculation. This provides
redundancy to any services utilizing the core MPLS infrastructure including L2VPNS
or L3VPNs.
MPLS supports advanced traffic engineering allowing network operators to select
other path selection algorithms in addition to the default path selection algorithms of
bandwidth or hop count used by traditional routing protocols.

MPLS Application Uses


MPLS provides many advantages to service providers with multiple MPLS
application use cases including L2VPNs, L3VPNs and VPLS.
One major advantage of MPLS is the abstraction of the core network from customer
devices. The customer network devices are unaware of the underlying MPLS core
network. The core MPLS network devices are also protected from spurious or
malicious routing updates. Management access to the core network devices is also
restricted because of customer isolation.
Another advantage is customer separation across a shared infrastructure: multiple
customers can use the same core MPLS network, but be separated in a logical way,
similar to how VLANs separate customers in traditional Ethernet environments.
MPLS supports customer networks that use their own IP addressing scheme which
may overlap with other customers.
MPLS supports both Layer 3 IP and Layer 2 Ethernet VPN services between
customer sites across a shared backbone network. A customer Layer 2 network can
cross multiple routed core network devices which are invisible to the customer.
Customers can also implement their own MPLS networks transporting Layer 2 or
Layer 3 business unit traffic across an existing Layer 3 infrastructure. Customers can
therefore easily create isolated networks similar to VLANs, but with the added
advantage that MPLS networks can span routed links and disparate sites.
A second important use case for MPLS is advanced traffic engineering (TE). The
details of MPLS TE are out of scope of this study guide, but a brief overview is provided here for completeness.


Network congestion can degrade backbone network performance. Congestion can
occur when network resources are inadequate or when load distribution is
unbalanced. Traffic engineering (TE) is intended to avoid the latter situation where
partial congestion might occur because of improper resource allocation.
In traditional IP routing, traffic is typically forwarded based on a single best path to a
destination (or a limited number of paths with equal cost multipath selection). Route
selection is based on bandwidth and not on other dynamic factors such as traffic load.
MPLS TE can load share traffic across multiple unequal paths based on real time
network conditions.
As an example, consider a network with two WAN connections between two remote
sites. One WAN connection is a high speed, expensive connection while the other is a
low speed, inexpensive link. Configuring policy based IP routing for load sharing of
certain traffic types across one link and other traffic types across the second link is
possible, but is labor intensive and error prone. Manual policy based routing
configuration is also not scalable or dynamic. This is not a flexible or practical
method in global service provider networks.
MPLS TE provides an easier alternative for traffic path selection using scalable,
easier to configure policies that can also dynamically adjust to network conditions
such as load or link failure.
TE can make the best use of network resources and avoid uneven load distribution by
the use of real-time traffic monitoring and dynamic tuning of traffic management
attributes, routing parameters, and resource constraints.
MPLS TE combines the MPLS technology and traffic engineering. It reserves
resources by establishing LSP tunnels along the specified paths, allowing traffic to
bypass congested nodes to achieve appropriate load distribution.
MPLS TE features simplicity and good scalability. With MPLS TE, a service
provider can deploy traffic engineering on the existing MPLS backbone to provide
various services and optimize network resources management.

Supported Products
Both Comware routers and switches include support for MPLS (model dependent).


Switches
All Comware chassis based switches support MPLS with the exception of certain
models when used in combination with basic line processing units (LPUs). Please
refer to the datasheets of individual switches for more information.
MPLS support on HP Comware fixed port switches is limited to the high end and
data center models. As an example, a 5500HI switch includes full MPLS support,
whereas a 5500 switch running either the standard or enhanced images does not
support MPLS.
5800 series switches support MPLS, but be aware that the 5820 series switches do
not support MPLS. Comware 7 specific devices such as the 5900 series switches do
support MPLS.
Switches that support MPLS include the following functionality: basic MPLS,
L3VPN, L2VPN (for Ethernet only) and VPLS.

Routers
All chassis based routers support MPLS. In addition, all MSR routers support MPLS
except for the MSR900 series routers.
Routers that support MPLS include the following functionality: basic MPLS, L3VPN,
and L2VPN (for Ethernet and other media types such as ATM).
Router support for VPLS (point-to-multipoint VPN feature) is limited to high end
routers.
HP Virtual Services Routers (VSRs) also support MPLS including MPLS VPNs and
MPLS Traffic Engineering (MPLS TE).

MPLS Terminology
Table 9-1 briefly describes various MPLS terms. Some of these are discussed in
more detail later in the chapter.
Table 9-1: MPLS terms
Customer edge (CE): A CE device is a customer network device directly connected to the service provider network. It can be a network device (such as a router or a switch) or a host. It is unaware of the existence of any VPN, neither does it need to support MPLS.

Forwarding equivalence class (FEC): As a forwarding technology based on classification, MPLS groups packets to be forwarded in the same manner into a class called the forwarding equivalence class (FEC). That is, packets of the same FEC are handled in the same way.

Label: A label uniquely identifies a FEC and has local significance.

Label switch router (LSR): A router that performs MPLS forwarding is a label switching router (LSR).

Label switch path (LSP): A label switched path (LSP) is the path along which packets of a FEC travel through an MPLS network. An LSP is a unidirectional packet forwarding path. Two neighboring LSRs are called the "upstream LSR" and "downstream LSR" along the direction of an LSP.

Label Forwarding Information Base (LFIB): The LFIB on an MPLS network functions like the Forwarding Information Base (FIB) on an IP network. When an LSR receives a labeled packet, it searches the LFIB to obtain information for forwarding the packet, such as the label operation type, the outgoing label value, and the next hop.

Label block: A label block is a set of labels. It includes the following parameters:
Label base: The LB specifies the initial label value of the label block. A PE automatically selects an LB value that cannot be manually modified.
Label range: The LR specifies the number of labels that the label block contains. The LB and LR determine the labels contained in the label block. For example, if the LB is 1000 and the LR is 5, the label block contains labels 1000 through 1004.
Label-block offset: The LO specifies the offset of a label block. If the existing label block becomes insufficient as the VPN sites increase, you can add a new label block to enlarge the label range. A PE uses an LO to identify the position of the new label block. The LO value of a label block is the sum of the LRs of all previously assigned label blocks. For example, if the LR and LO of the first label block are 10 and 0, the LO of the second label block is 10. If the LR of the second label block is 20, the LO of the third label block is 30.
A label block whose LB, LO, and LR are 1000, 10, and 5 is represented as 1000/10/5.
Assume that a VPN has 10 sites, and a PE assigns the first label block LB1/0/10 to the VPN. When another 15 sites are added, the PE keeps the first label block and assigns the second label block LB2/10/15 to extend the network. LB1 and LB2 are the initial label values that are randomly selected by the PE.

Pop label: The top of stack label is removed. The packet will be forwarded based on the remaining label stack (if labels remain in the stack) or the Layer 3 header (if no labels remain).

Swap label: The top label in the label stack is removed and replaced (changed or swapped) with a new label.

Insert / Impose / Push label: Various terms used for the addition of a new label to a non-MPLS packet, or of a new label added to the top of the stack of an MPLS packet.

MPLS Traffic Engineering (MPLS-TE): MPLS-TE focuses on the optimization of overall network performance. It is intended to conveniently provide highly efficient and reliable network services. The performance objectives associated with TE are either traffic oriented, to enhance quality of service (QoS), or resource oriented, to optimize resource (especially bandwidth) utilization. TE helps optimize network resource use to reduce network administrative cost, and dynamically tune traffic when congestion or flapping occurs. In addition, it allows ISPs to provide added value services.

Provider device (P): P devices do not directly connect to CEs. They only need to forward user packets between PEs using label swapping.

Provider edge (PE): A PE device is a service provider network device connected to one or more CEs. It provides VPN access by mapping and forwarding packets from user networks to public network tunnels and from public network tunnels to user networks.

Route distinguisher (RD): An RD is added before a site ID to distinguish the sites that have the same site ID but reside in different VPNs. An RD and a site ID uniquely identify a VPN site.

Route target (RT): PEs use the BGP route target attribute (also called the "VPN target" attribute) to manage BGP L2VPN information advertisement. PEs support the following types of route target attributes:
Export target attribute: When a PE sends L2VPN information (such as site ID, RD, and label block) to the peer PE in a BGP update message, it sets the route target attribute in the update message to export target.
Import target attribute: When a PE receives an update message from the peer PE, it checks the route target attribute in the update message. If the route target value matches an import target, the PE accepts the L2VPN information in the update message.
Route target attributes determine which PEs can receive L2VPN information, and from which PEs a PE can receive L2VPN information.

MPLS Forwarding Equivalence Class (FEC)


The first MPLS term described in more detail is the MPLS Forwarding Equivalence
Class (FEC).
A FEC is a group of data packets with similar or identical parameters which use the
same MPLS labels and are forwarded in the same way through an MPLS network.
As an analogy, a FEC can be compared to an IP prefix such as 10.1.1.0/24. In a
production network, it is unlikely that multiple packets will have exactly the same
header values or packet content. Typically, the destination subnet for multiple packets
will be the same whilst other values are different (TCP port, TOS, source IP address
etc.).
As an example, when using traditional IP routing, ping or telnet traffic to the same
destination subnet will use the same routing entry in the IPv4 routing table even
though the Layer 4 protocol is different. Traffic from multiple sources to the same
destination may also use the same entry. For example, a source of 10.2.2.1/24 or
10.2.2.2/24 may send traffic to 10.1.1.1/24, but traffic from either source host will
match the same routing table entry to reach host 10.1.1.1 (destination subnet
10.1.1.0/24).
In an IP routing table, any IP packets that match the destination IP prefix will follow
the same path or paths through the network. In the same way, multiple flows from
different sources may use the same FEC to reach a destination network. A FEC
typically groups packets by destination IP address rather than grouping packets that
have identical headers.
This raises the question: what is the difference between a FEC and an IP prefix? IPv4
unicast routing is based on the destination IPv4 address in a packet. With MPLS in
contrast, the assignment of the data path is very flexible and does not have to be
based solely on destination IPv4 address. FEC selection can be based on several
options including source IP address, Layer 2 characteristics or source interface. As
an example, you could assign all data (IPv4, IPv6, ARP) arriving on an interface to a
specific FEC and then forward those packets in the same way.

MPLS Label
An MPLS label uniquely identifies an FEC and has local significance.
A label is inserted between the Layer 2 header and Layer 3 header of a packet by a
label switch router (LSR), as illustrated in Figure 9-1. The MPLS header is often
referred to as a shim header, or a Layer 2.5 header in reference to the OSI model.

Figure 9-1: MPLS Label

The MPLS header is neither the Layer 2 nor the Layer 3 header of the OSI model.
The Layer 2 header could be Ethernet, PPP, token ring or another Layer 2
encapsulation. The Layer 3 header could be IPv4 or IPv6. The MPLS header is
inserted between an Ethernet header and IPv4 header.
The MPLS header is four bytes (32 bits) long and consists of the following fields:
Label: 20-bit label value.
TC: 3-bit traffic class, used for QoS. It is also referred to as the experimental
bits (Exp).
S: 1-bit bottom of stack flag. A label stack can comprise multiple labels. The
label nearest to the Layer 2 header is called the "top label," and the label nearest
to the Layer 3 header is called the "bottom label." The S field is set to 1 if the
label is the bottom label and set to 0 if not. This is used with MPLS L2VPNs and
L3VPNs.
TTL: 8-bit time to live field used for loop prevention. This is similar to the way
TTL works in IPv4.
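As a worked illustration (the values here are chosen for the example only, not taken
from the chapter figures): a single label entry carrying label value 1000, TC 0, S set
to 1 (bottom of stack) and TTL 255 is encoded as the 32-bit value
(1000 << 12) | (0 << 9) | (1 << 8) | 255 = 0x003E81FF. In a two-label stack, only the
inner (bottom) label has S set to 1; the outer label carries S = 0.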
MPLS labels are locally significant. That means that the label is only meaningful to
the next hop LSR. No end to end single label value is assigned to a FEC. The label
typically changes on a per hop basis and is different between each router hop. It is
possible that a next hop LSR allocates the same label randomly to the same FEC.
This occurrence of both ingress and egress label being the same is a random event
and should not be expected. Labels typically change on a per hop basis per FEC.
The MPLS label field is 20 bits in length. Label numbers are therefore in the range 0
to 1,048,575.
Labels can be assigned manually or allocated by MPLS control protocols. For
scalability reasons labels are typically automatically created and allocated by using
label distribution protocols like Label Distribution Protocol (LDP) rather than being
assigned manually by administrators.
Label distribution protocols include LDP, Multiprotocol BGP (MBGP),
constraint-based OSPF, and RSVP. This study guide focuses on LDP.
MPLS supports label stacking - rather than a single label being inserted, multiple
labels are inserted in the MPLS header using a label stack. MPLS implementations
such as L2VPNs and L3VPNs require multiple MPLS labels. Typically, the outer
label identifies the peer PE device (next-hop device in BGP) and the inner label
identifies the VPN or circuit.
The S-Bit or S Field indicates bottom of stack when set to 1. This means that the
current label is the last label in the stack. If the S-Bit is set to 0, it indicates that there
are more labels in the stack.
There are several reserved label values as explained in RFC3032:
A value of 0 represents the "IPv4 Explicit NULL Label". This label value is only
legal at the bottom of the label stack. It indicates that the label stack must be
popped, and the forwarding of the packet must then be based on the IPv4 header.
This can be used for Penultimate Hop Popping (PHP) which is explained later in
the chapter.
A value of 1 represents the "Router Alert Label". This label value is legal
anywhere in the label stack except at the bottom. When a received packet contains
this label value at the top of the label stack, it is delivered to a local software
module for processing. The actual forwarding of the packet is determined by the
label beneath it in the stack. However, if the packet is forwarded further, the
Router Alert Label should be pushed back onto the label stack before forwarding.
The use of this label is analogous to the use of the "Router Alert Option" in IP
packets. Since this label cannot occur at the bottom of the stack, it is not
associated with a particular network layer protocol.
A value of 2 represents the "IPv6 Explicit NULL Label". This label value is only
legal at the bottom of the label stack. It indicates that the label stack must be
popped, and the forwarding of the packet must then be based on the IPv6 header.
A value of 3 represents the "Implicit NULL Label". This is a label that an LSR
may assign and distribute, but which never actually appears in the encapsulation.
When an LSR would otherwise replace the label at the top of the stack with a new
label, but the new label is "Implicit NULL", the LSR will pop the stack instead of
doing the replacement. Although this value may never appear in the encapsulation,
it needs to be specified in the Label Distribution Protocol, so a value is reserved.
Values 4-15 are reserved.
Note
The label range for Circuit Cross Connect (CCC) and Static Virtual Circuit
(SVC) is from 16 to 1023. These labels are reserved for static LSPs.

MPLS Label Switch Router (LSR)


The MPLS Label Switch Router (LSR) is the actual device performing the label
switching. LSRs perform packet forwarding using label switching and run the MPLS
control plane protocols required to set up the label switch path (LSP).
An MPLS network comprises the following types of LSRs:
Ingress LSR: Ingress LSR of packets. It labels packets entering into the MPLS
network.
Transit LSR: Intermediate LSRs in the MPLS network. The transit LSRs on an
LSP forward packets to the egress LSR according to labels.
Egress LSR: Egress LSR of packets. It removes labels from packets and
forwards the packets to their destination networks.
There are also two LSR roles dependent on location in a core MPLS network:
Provider device (P) - P devices do not directly connect to CEs, but are backbone
core devices. They only need to forward user packets between PEs. These
devices typically swap labels (Transit LSR)
Provider edge (PE) - A PE device is a service provider network device
connected to one or more CEs. It provides VPN access by mapping and
forwarding packets from user networks to public network tunnels and from public
network tunnels to user networks. These devices typically insert and pop labels
(Ingress or Egress LSR).
The PE device provides most MPLS features such as L2VPNs and L3VPNs and has
the most complex configuration. Incoming traffic is selected and labels inserted or
removed based on the FEC.
P devices are unaware of the additional labels used for MPLS implementations such
as L2VPNs and simply swap labels. These devices tend to have simple
configurations.
A third device type is Customer Edge (CE device). The CE does not have an LSR
role, but is rather a customer device at a customer site that is unaware of the MPLS
network. This device runs traditional routing and switching and connects to the PE
device.
In an unmanaged MPLS CE environment, the CE device is configured and managed
by the customer and the PE device is managed by the service provider. In a fully
managed environment, the service provider would manage both PE and CE
devices.
LSRs may perform the following actions on labels:
Insert / Impose / Push label: Addition of a new label to a non-MPLS packet, or to
the top of the label stack of an MPLS packet. Typically performed by a PE device
when packets enter the MPLS network. A PE will look up which FEC the packet
is assigned to and then based on the FEC insert a label on the packet. The packet
is then transmitted to the core MPLS network (typically to a P router).
Swap label: The top of stack label is removed and replaced (changed or
swapped) with a new label. Typically performed by a P device when packets
move from one interface to another in the core MPLS network. Labels are
swapped on a hop by hop basis by each LSR in the LSP.
Pop / remove label: The top of stack label is removed. The packet will be
forwarded based on remaining label stack (if labels remain in the stack) or Layer
3 header (if no labels remain). Typically performed by a PE device when packets
leave the MPLS network. Packets that exit the MPLS network are typically
unchanged from when they entered the MPLS network.

MPLS Label Switch Path (LSP)


A label switched path (LSP) is the path along which packets of a FEC travel through
an MPLS network. It is often referred to as an MPLS tunnel.
As shown in Figure 9-2, the LSP is a unidirectional packet forwarding path. Two
neighboring LSRs are called the "upstream LSR" and "downstream LSR" along the
direction of an LSP.


Figure 9-2: MPLS Label Switch Path (LSP)

As an analogy, view the LSP as a group of labels that are bound to each other through
the MPLS network forming a path through the MPLS network. In Figure 9-2, the PE1
on the left has an LSP to PE2 on the right. When PE1 pings PE2, it does not send IPv4
packets (EtherType 0x0800) to P1 in the MPLS network, but rather inserts a label
before transmission to P1. The packet sent to P1 by PE1 is therefore an MPLS packet
(EtherType 0x8847). The label inserted is based on advertisements from LDP and a
predetermined path calculation made by the IGP or other mechanisms.
Each P LSR will receive the MPLS packet (EtherType 0x8847) and will label switch
the packets between interfaces. Each P LSR will also swap labels based on the
predetermined path and advertised labels. This will continue until the packet arrives
at PE2. PE2 will pop the label and route the packet to the appropriate interface based
on the IP prefix.
The core routers are not aware of the contents of the packet, but simply swap labels.
This is why the LSP is often referred to as a tunnel. Packets are encapsulated in
MPLS for transmission across the MPLS network. Any packet that is labeled by PE1
with the same label will follow the same predetermined LSP. The packets will arrive
at PE2 from PE1 without any of the core routers being aware of the Layer 3 packet
headers or packet content.
LSPs are unidirectional. Therefore a separate LSP is created for traffic in the reverse
direction (not shown in the figure). Different labels are used and a separate LSP is
calculated for traffic from PE2 back to PE1. Two LSPs would therefore be required
for bidirectional communication.

MPLS LSP Path Selection


How is the LSP path selection calculated?
Path selection can firstly be based on a traditional interior gateway protocol (IGP)
such as OSPF. The routing protocol will calculate the best path based on its own
metric calculations such as bandwidth (OSPF) or hop count (RIP). In the case of
OSPF, the best path is based on bandwidth. The LSP in turn is based on the best path
selection as calculated by OSPF. This provides an easy and convenient LSP path
selection process.
A second option is to manually select the LSP using explicit routed path selection. An
administrator would configure the LSP on a hop by hop basis by specifying the label
to be used and the outgoing interface. This method is very flexible, but is also very
labor intensive. In addition, there is no dynamic failover mechanism and thus explicit
routed path selection is not used regularly in production environments.
In more complex MPLS networks, constraint-based routing is often used. To establish
a Constraint-based Routed Label Switched Path (CRLSP), an administrator
configures a routing protocol such as OSPF, but in addition specifies constraints,
such as explicit paths or path restrictions. Links or routers can be included or
excluded based on specified constraint criteria.
An example of this type of traffic engineering is the forwarding of customer traffic
across a high speed, more expensive link and an additional low speed, less
expensive link. The high speed link could be marked with a different color to the
low speed link. Based on a specified algorithm, traffic from only certain customers
will be transmitted across the high speed link while others use the low speed link.
The mechanics of path selection are out of the scope of this study guide, but they are
mentioned here for completeness.


MPLS Label Information Base (LIB)


An MPLS label information base (LIB) is the software table of all the possible
forward equivalence classes (FECs) and the labels allocated to them, see Figure 9-3.
This is similar in concept to the IPv4 routing information base (RIB) which is the
software routing table of an IPv4 router.

Figure 9-3: MPLS Label Information Base (LIB)

In a traditional RIB, information such as the destination network, outgoing interface


and next hop IP address are stored. In a LIB, similar information is stored including
FEC (destination), outgoing interface and associated label.
The LIB is populated via routing protocols such as OSPF, ISIS and RIP. Labels
associated to FECs are added by MPLS control plane protocols such as LDP.
The LIB contains FEC to label mappings and label to label mappings. The FEC to
label mappings associate ingress packets to MPLS labels. For example, ingress IPv4
traffic destined to IP prefix 10.1.1.0/24 may have label 1000 inserted.
However in the core MPLS network, labels are swapped. Therefore a label of 1000
could be mapped to label 1001. MPLS traffic received by a P LSR with a label 1000
on an ingress interface has the label swapped with label 1001 on egress from the P
LSR.
One difference between the LIB and the LFIB parallels the difference between the RIB
and the FIB: the RIB and the LIB contain all known routes and label mappings, whereas
the FIB and the LFIB contain only the best paths. In an MPLS environment, the LIB
therefore contains all known FECs and labels, but the LFIB contains only the best
paths.

MPLS Label Forward Information Base (LFIB)


The Label Forwarding Information Base (LFIB) is the hardware table as
programmed in the ASICs of the MPLS label switch router, see Figure 9-4.

Figure 9-4: MPLS Label Forward Information Base (LFIB)

The LFIB in an MPLS network functions like the Forwarding Information Base (FIB)
in an IP network. When an LSR receives a labeled packet, it searches the LFIB to
obtain information for forwarding the packet, such as the label operation type, the
outgoing label value, and the next hop.
A FIB (hardware) is populated with information from the RIB (software). In the same
way, a LFIB (hardware) is populated with information from the LIB (software).
Information in the LIB is used to program the LFIB. While the LIB contains all known
possible routes, only the best path for a FEC is programmed into the LFIB.
The LFIB contains a FEC to label mapping and egress interface for traffic entering
the MPLS network. In other words, when traffic is received by a PE from a CE, that
traffic may be IPv4 traffic. Before the PE transmits the traffic to a P LSR, the PE will
insert a label. An IP prefix to label mapping is therefore stored in the PE LFIB.
Traffic received by a P device from a PE device will be label swapped. Thus, a
label to label mapping and egress interface is stored in the LFIB of the P device.

MPLS Forwarding Information Base (FIB)


As shown in Figure 9-5, an LSR consists of two components:

Control plane: Implements label distribution and routing, assigns labels,


distributes FEC-label mappings to neighbor LSRs, creates the LFIB, and
establishes and removes LSPs.

Forwarding plane: Forwards packets according to the LFIB.

Figure 9-5: MPLS Forwarding Information Base (FIB)

Tables:
Routing Information Base (RIB): Software based version of the routing table.
Label Information Base (LIB): Software table of all the possible forward
equivalence classes (FECs) and the labels allocated to them.
Forwarding information bases (FIB): Hardware forwarding table based on the
software routing information base (RIB).
Label Forwarding Information Base (LFIB): Hardware label forwarding table
based on information in the software LIB.
An Edge LSR forwards both labeled packets and IP packets via the forwarding plane
and therefore uses either the LFIB (labeled packets) or the FIB (IP packets). An
ordinary LSR only needs to forward MPLS labeled packets and therefore uses only
the LFIB.

MPLS Label Processing


MPLS label processing can be compared to IPv4 routing. In IPv4, a destination
prefix needs to be learned via a dynamic routing protocol or configured manually
using static routes.
Information about a single IP prefix could be learnt in multiple ways. For example
both RIP and OSPF may have learnt about network 10.1.1.0/24 and would want to
add that route to the routing table. Based on criteria such as the administrative
distance, the best route is added to the routing table (RIB). The selected RIB entry
information is then stored in the hardware routing table (FIB).
Once the RIB and FIB are populated, traffic destined for the destination prefix can be
processed in hardware without referring to the software RIB.
In a similar way, destination prefixes or FECs are learnt via routing protocols such
as OSPF and label distribution protocols such as LDP, or manually configured in the
LIB. The best path is programmed in the LFIB (in hardware).
Once the labels have been programmed into the LFIB, traffic destined to the FEC can
then be processed in hardware.
Later in this chapter, we will discuss how the destination FECs are announced and
processed and then how traffic destined to a FEC is processed.


MPLS Label Distribution Protocol (LDP)


Destination FECs are announced by label distribution protocols such as LDP.
LDP distributes destination network to label mappings based on entries in the IPv4
routing table. The router's local IPv4 routing table contains networks learnt via
routing protocols and each IPv4 entry is a possible MPLS FEC.
For each local IPv4 prefix found in the local IPv4 routing table, a local label is
assigned. Administrators can limit label assignment to only certain IPv4 prefixes by
configuring the LSP trigger command appropriately.
LDP exchanges FEC to label mappings with LDP neighbors. A local router informs its
LDP neighbors (peers) which label to use when sending traffic to the local
router for a specific FEC. As an example, LSR1 may select label 1000 for network
10.1.1.0/24. LSR1 will then advertise label 1000 to peer LSRs using LDP. Those
neighbors should then use label 1000 when sending traffic to LSR1 destined for
network 10.1.1.0/24. All LDP neighbors will learn which labels to use for all
possible FECs advertised by the local LSR.
When traffic is received from peer LSRs, the local LSR knows which FEC the traffic
belongs to as the LSR allocated those labels locally and advertised them to the peers.

MPLS LDP
The network in Figure 9-6 will be used as a starting point to explain label
distribution. The network state is as follows:
IPv4 addresses configured.
OSPF configured.
IPv4 routes have been exchanged.


Figure 9-6: MPLS LDP

When LDP is configured for label distribution, label advertisement may take place as
follows:
1. Assume that PE-2 in Figure 9-6 has learned about a subnet of 10.1.2.0/24 from
CE-2. This advertisement is a traditional IPv4 advertisement via routing
protocols such as OSPF or RIP. MPLS and LDP are not used here, because neither
interface Gigabit Ethernet 1/0/0 on PE-2 nor CE-2 is configured for MPLS.
2. Assuming that the interface is currently up, subnet 10.1.2.0/24 will be shown as
available in PE-2's routing table. As OSPF is configured on PE-2 and is
advertising routes, network 10.1.2.0/24 will also be advertised to P-1.
3. The OSPF process on P-1 will learn about network 10.1.2.0/24 and the route
will be added to the IPv4 routing table with a next hop of PE-2 and local
outgoing interface of Gigabit Ethernet 1/0/0.
4. PE-2 in this example is configured to generate labels for all prefixes in the IPv4
routing table. PE-2 will allocate a locally significant label (2001 in this
example) to subnet 10.1.2.0/24. This information is stored in both the LIB and
LFIB tables.
5. PE-2 will advertise the label to P-1 using LDP. In other words, PE-2 is
advertising to P-1 that subnet 10.1.2.0/24 is available using label 2001.
6. P-1 adds the update to its local LIB.
7. P-1 then associates label 2001 with subnet 10.1.2.0/24 in the FIB.

MPLS Data Forwarding


As shown in Figure 9-7, assume that P-1 pings a host (10.1.2.10) on subnet
10.1.2.0/24. Traffic flows as follows:

Figure 9-7: MPLS Data Forwarding

1. The incoming traffic for data transmission is IPv4. This is because the local IPv4
stack is used when P-1 pings the IPv4 address of 10.1.2.10. The ping
application uses the ICMP protocol, which is carried over the IPv4 protocol
(Layer 3). Because this is IPv4 traffic, the FIB table is used for hardware
forwarding of the traffic.
2. In addition to the LIB containing a label associated with subnet 10.1.2.0/24, the
label was also associated previously with the FIB table by P-1. In the example,
label 2001 has been associated with subnet 10.1.2.0/24. When the packet is
transmitted to PE-2, the MPLS label is inserted and the packet is transmitted as
an MPLS labeled packet (EtherType 0x8847) to PE-2.
3. PE-2, on receipt of the packet knows that the LFIB table should be used rather
than the FIB table because of the packet header EtherType 0x8847. The entry in
the LFIB table is POP as the packet is going to an IPv4 interface that does not
have MPLS enabled on it.
4. Once the label has been removed, the IPv4 header can be processed according to
the FIB table. The FIB table has an entry for subnet 10.1.2.0/24 with an outgoing
interface of G1/0/0.
5. The packet is then transmitted to the CE device as an IPv4 packet (EtherType
0x0800).
The CE device is unaware that the packet was encapsulated using MPLS in the MPLS
core.
Figure 9-8 shows an extended topology with the addition of PE-1.


Figure 9-8: MPLS LDP

P-1 will independently allocate a local label to the subnet 10.1.2.0/24. In this
example label 3001 was allocated.
P-1 will announce the subnet and label to PE-1 using LDP.
PE-1 will update the LIB table to indicate that it can get to subnet 10.1.2.0/24
using label 3001.
PE-1 will also update the FIB table to indicate that label 3001 should be inserted
when traffic is transmitted to subnet 10.1.2.0/24.


MPLS Packet Forwarding - Part 1


Figure 9-9 shows packet processing for traffic through the extended MPLS network:

Figure 9-9: MPLS Packet Forwarding - Part 1

1. Assume that CE-1 pings 10.1.2.10 which is a host on subnet 10.1.2.0/24. Also
assume that CE-1 has PE-1 configured as its default gateway.
2. When PE-1 receives the IPv4 traffic, PE-1 will check the FIB table as the
incoming traffic is IPv4 traffic (EtherType 0x0800). The FIB table has an entry
for the destination subnet (10.1.2.0/24) and outgoing interface of G1/0/0.
3. In addition, the label 3001 is associated with this subnet in the FIB. In this
example, label 3001 has been associated with subnet 10.1.2.0/24. When PE-1
transmits the packet to P-1, the MPLS label is inserted and the packet is
transmitted as an MPLS labeled packet (EtherType 0x8847).
4. When P-1 receives the MPLS labeled traffic, P-1 will check the LFIB table
because the incoming traffic is MPLS traffic (EtherType 0x8847). The LFIB
table has an entry for the label 3001 which should be swapped with label 2001
and transmitted out of interface of G1/0/0.
5. P-1 swaps label 3001 with label 2001 and transmits the packet as an MPLS
labeled packet (EtherType 0x8847) to PE-2. It is important to note that P-1 did
not check the FIB or RIB to process the traffic. The router is routing traffic
between interfaces without the use of the IPv4 routing table.


MPLS Packet Forwarding - Part 2


Figure 9-10 shows the second part of the packet processing for traffic through the
MPLS network.

Figure 9-10: MPLS Packet Forwarding - Part 2

1. PE-2 receives the MPLS labeled traffic from P-1.


2. PE-2 will check the LFIB table as the incoming traffic is MPLS traffic
(EtherType 0x8847). The LFIB table has an entry for the label 2001 which
should be removed (popped) as the packet is going to an IPv4 interface that does
not have MPLS enabled on it.

3. Once the label has been removed, the IPv4 header can be processed according to
the FIB table. The FIB table has an entry for subnet 10.1.2.0/24 with an outgoing
interface of G1/0/0.
4. The packet is then transmitted to the CE device as an IPv4 packet (EtherType
0x0800).
Both CE devices are unaware that the packet was encapsulated using MPLS in the
MPLS core.


MPLS LDP - Part 1


In Figure 9-11, the core MPLS network is now expanded to include two P routers and
additional links between devices.

Figure 9-11: MPLS LDP - Part 1

In MPLS environments, there are two label retention modes. The label retention
mode specifies whether an LSR may maintain a label mapping for a FEC learned
from a neighbor that is not its next hop.
The two modes are:
Liberal label retention: Retains a received label mapping for a FEC regardless
of whether the advertising LSR is the next hop of the FEC. This mechanism
allows for quicker adaptation to topology changes, but it wastes system resources
because LDP has to keep unused labels. Most MPLS routers support liberal label
retention only.
Conservative label retention: Retains a received label mapping for a FEC only
when the advertising LSR is the next hop of the FEC. This mechanism saves label
resources, but it cannot quickly adapt to topology changes.
In the example in Figure 9-11, we are assuming that liberal label retention mode is
used as this is the most common implementation.
1. In Figure 9-11, PE-2 announces label 2001 to both P-1 and P-2.
2. P-1 and P-2 will update their LIB tables to include label 2001. In Figure 9-11,
P-1 and P-2 both independently allocate a local label for subnet 10.1.2.0/24. P-1
allocates label 3001 and P-2 allocates label 3002.


3. Both P-1 and P-2 advertise their allocated labels to PE-1 which in turn updates
its local LIB. In this case, PE-1 has learnt about two paths to subnet
10.1.2.0/24. PE-1 will also allocate its own local label to the subnet, in this
case 4001.


MPLS LDP - Part 2


In addition to labels being advertised to upstream LSRs, labels are also advertised
back on the paths they were received (downstream). LDP will advertise all possible
paths.
In Figure 9-12:
1. PE-1 advertises its local label back to both P-1 and P-2.
2. P-1 and P-2 advertise their local labels to each other. In addition they update
their LIB tables with all labels received.

Figure 9-12: MPLS LDP - Part 2

In Figure 9-12, P-1 contains the following labels for subnet 10.1.2.0/24:
2001 - label advertised by PE-2
3002 - label advertised by P-2
4001 - label advertised by PE-1
P-1 can therefore reach the FEC (10.1.2.0/24) via any of the three routers. LDP does
not select the best path. Another mechanism is required to determine the best path to
the destination.
The same situation applies to PE-1 and P-2. They have multiple paths to the FEC
(10.1.2.0/24).
As mentioned previously, the routers are using liberal label retention. The
advantage of this mechanism is that routers have already learned about multiple paths
to the same FEC and can react more quickly to topology changes. The disadvantage is
overhead - more label information must be maintained by the routers.

MPLS LDP - Best Path Selection


MPLS control plane protocols do not calculate the best path. By default the IGP is
used for path selection (OSPF, ISIS, static routes). An IGP such as OSPF uses a
bandwidth calculation to determine the best path to a destination network and this in
turn determines which LSP and labels are used for MPLS traffic. The LSRs will use
the labels associated with the OSPF path selection next hop and outgoing interface.
The best path LDP peer is determined from the IGP next hop IP address.
The LSR determines the label and next hop by comparing the IGP next hop IP address
in the routing table with LDP peers. Once a match is made between the IGP next hop
and LDP peer IP address, the LSR can determine which label was advertised by that
LDP peer and then use that label for label insertion or swapping. This information is
then programmed in the LFIB in addition to the outgoing interface.
In the case of a link failure, the IGP is also used to determine the new best path.
When a link fails, OSPF will for example recalculate the best path to the destination
prefix based on the bandwidth of remaining links. A new next hop IP address will
then be associated with the route in the IP routing table. This in turn will determine
the new LDP peer to use as well as the new label to use. The new label and new
outgoing interface will then be programmed into the LFIB hardware table.

MPLS LDP Best Path


In Figure 9-13, it is assumed that OSPF is the IGP used by the LSRs and all interfaces
are of equal cost.


Figure 9-13: MPLS LDP Best Path

The OSPF routing table path selections for destination subnet 10.1.2.0/24 are shown in
Figure 9-13. Both P-1 and P-2 have determined that the best path to prefix
10.1.2.0/24 is via PE-2. In the case of PE-1, the best path is via P-1.
LSRs will compare the next hop IP addresses in the IP routing tables against LDP
peers. Once a matched LDP peer is found, the label advertised by that peer and
outgoing interface to that LDP peer is programmed in the LFIB table of each LSR. In
the case of P-1, peer PE-2 and label 2001 are selected. This best route is then added
to the LFIB of P-1.
This is an important distinction - the LIB (software table) contains all possible paths
as seen in Figure 9-13. However, the LFIB (hardware table) only contains the best
route (not shown in Figure 9-13).
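On a Comware 7 LSR, this distinction can be checked with two display commands. The
command names below are believed to be part of the Comware 7 MPLS command set, but
verify them and their output format against your platform documentation:
<Sysname> display mpls ldp lsp
<Sysname> display mpls lsp
The first command lists the label mappings learned from LDP peers, including
liberally retained mappings that are not on the best path, while the second lists
only the LSPs that have actually been installed for forwarding.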

MPLS LDP Best Path After Link Failure


In Figure 9-14, it is assumed that the link between P-1 and PE-2 and the link between
P-1 and P-2 fail.


Figure 9-14: MPLS LDP Best Path After Link Failure

The only path available for P-1 to get to subnet 10.1.2.0/24 is via PE-1, P-2 and PE-2. OSPF will determine this and update the IPv4 routing tables accordingly. The IP
routing table of P-1 will reflect the new next hop of PE-1 and outgoing interface
G1/0/1 (link to PE-1). PE-1 will also update its routing table with a new next hop of
P-2 and outgoing interface of G1/0/2 (link to P-2). The P-2 router does not need to
change its routing entry as the best path is still active.
New LDP peers and labels are selected by both P-1 and PE-1. In the case of P-1, the
new LDP peer is PE-1 and the advertised label of 4001 is now used for forwarding
traffic to subnet 10.1.2.0/24. PE-1 will select a new LDP peer of P-2 and label of
3002.
Because of liberal label retention, failover is very quick as the labels were
previously learnt and retained. After link failure, LFIBs are simply updated with the
updated label.
In the case of conservative label retention, the new labels would have to be
discovered which would slow down convergence.

MPLS Data Plane Label Processing Example


An LSR can receive various types of traffic from CE devices (IPv4, IPv6, IPX, layer
2, layer 2 labeled). In this study guide only incoming IPv4 and incoming labeled
traffic are discussed.
An edge LSR (PE) typically receives IP traffic from a CE device. When the PE
receives the IP traffic destined to a subnet across an MPLS core, it will look up the
destination IP prefix in the FIB. A corresponding forwarding equivalence class
(FEC) and label may be associated with the route. This label would have previously
been announced by a LDP peer.
The PE will insert the label and forward the traffic to the neighbor (typically a P
LSR) based on the IGP selection process.
The neighbor LSR will determine that the received packet is a labeled packet
because of the EtherType. The LSR (typically a P router) will check the incoming
label against the LFIB. If there is a match, the LSR will swap the label in hardware
with a new label (LFIB). The new label is determined by the IGP next hop and in turn
the neighbor LDP LSR.

MPLS Label Processing


In summary, refer to Figure 9-15 for MPLS label processing. In this topology, it is
assumed that label announcement via LDP has been completed.

Figure 9-15: MPLS Label Processing

In this example, the Ingress PE device receives an IPv4 packet with a destination
address of 10.1.2.10. This packet will match the forward equivalence class
10.1.2.0/24 and the Ingress LSR will find the corresponding label for that forward
equivalence class. In this example, OSPF has selected a path via P2 and P3 to the
Egress PE LSR. P2 has advertised the label 1035 to the Ingress PE to use for this
FEC. The Ingress LSR will therefore insert a label of 1035 and send an MPLS
labeled packet (EtherType 0x8847) to P2.
P2 will find a matching entry in its LFIB and then swap the label with a new label of
1096. The label of 1096 was previously advertised to P2 by P3 for this FEC. P2 will
forward the packet with the new label of 1096 to P3 (EtherType 0x8847).
P3 will follow a similar process and find a matching entry in its LFIB and then swap
the label with the label advertised previously by the Egress PE. P3 will forward the
packet with the new label to the Egress LSR (EtherType 0x8847).
The Egress LSR will find a matching entry in its LFIB table. This will include the
instruction to pop the label. The packet's IPv4 destination address will then be
checked against the FIB table which indicates that the packet should be forwarded as
a normal IPv4 packet out of the MPLS network (EtherType 0x0800).

PHP
An egress node must perform two forwarding table lookups to forward a packet: two
LFIB lookups (if the packet has more than one label), or one LFIB lookup and one
FIB lookup (if the packet has only one label).
The penultimate hop popping (PHP) feature can pop the label at the penultimate node,
so the egress node only performs one table lookup.
A PHP-capable egress node sends the penultimate node an implicit null label of 3.
This label never appears in the label stack of packets. If an incoming packet matches
an LFIB entry comprising the implicit null label, the penultimate node pops the label
stack of the packet and forwards the packet to the egress LSR. The egress LSR then
forwards the packet.
Sometimes, the egress node must use the TC field in the label to perform QoS. To
keep the TC information, you can configure the egress node to send the penultimate
node an explicit null label of 0. If an incoming packet matches an LFIB entry
comprising the explicit null label, the penultimate hop replaces the value of the top
label with value 0, and forwards the packet to the egress node. The egress node gets
the TC information, pops the label of the packet, and forwards the packet.
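As a hedged configuration sketch: on Comware 7 devices, the label type that the egress
advertises to the penultimate hop (shown as Implicit-null, Explicit-null, or Non-null
in the display mpls summary output later in this chapter) is typically controlled with
the mpls label advertise command on the egress node. Treat the command name and its
default as an assumption to be verified against your platform documentation:
<Sysname> system-view
[Sysname] mpls label advertise explicit-null
Configured on the egress node, this would cause the penultimate hop to forward packets
with the explicit null label (0) so that the TC field is preserved for QoS, instead of
popping the label as with the default implicit null (PHP) behavior.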

Basic MPLS Configuration Steps


Basic MPLS configuration will now be discussed. These configuration steps, as
shown in Figure 9-16, are performed on P and PE devices and not CE devices.


Figure 9-16: Basic MPLS Configuration Steps

In the first step, IP addresses and Interior Gateway Protocols (IGPs) are configured.
In this study guide, the IGP used is OSPF, but another protocol such as ISIS could
also be used. It is important that a loopback address be configured on both P and PE
devices with a /32 mask. The loopback IP address must also be advertised by the
IGP.
In the second step, an MPLS LSR-ID is configured on each LSR device. HP
recommends that this be set to a loopback IP address of the LSR.
Thirdly, LDP is enabled globally on the LSR and IP prefixes that trigger label
announcements are specified.
Fourthly, MPLS and MPLS LDP are enabled on each backbone interface.
Lastly, the configuration is verified.
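Pulling these steps together, the following is a minimal configuration sketch for a
single LSR. The device name, the loopback address 10.0.0.1/32, the backbone subnet
10.0.12.0/24, the OSPF process and area numbers, and the interface name are
hypothetical placeholders; each command is covered in the command references later in
this chapter. The default LSP generation policy is kept, so no lsp-trigger command is
shown.
<P-1> system-view
[P-1] interface LoopBack 0
[P-1-LoopBack0] ip address 10.0.0.1 32
[P-1-LoopBack0] quit
[P-1] ospf 1 router-id 10.0.0.1
[P-1-ospf-1] area 0
[P-1-ospf-1-area-0.0.0.0] network 10.0.0.1 0.0.0.0
[P-1-ospf-1-area-0.0.0.0] network 10.0.12.0 0.0.0.255
[P-1-ospf-1-area-0.0.0.0] quit
[P-1-ospf-1] quit
[P-1] mpls lsr-id 10.0.0.1
[P-1] mpls ldp
[P-1-ldp] quit
[P-1] interface Ten-GigabitEthernet 1/0/1
[P-1-Ten-GigabitEthernet1/0/1] mpls enable
[P-1-Ten-GigabitEthernet1/0/1] mpls ldp enable
[P-1-Ten-GigabitEthernet1/0/1] quit
[P-1] display mpls summary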
Note
The configuration steps shown in this study guide are Comware 7 specific.
Comware 5 MPLS configuration is similar, but be aware that there are minor
differences. Please refer to the relevant device configuration documentation.

Step 1: Configure IP and IGP


In this first step, configure IP and OSPF. The configuration of IP addresses and basic
OSPF is not explained here as these fundamental topics are explained in other study
guides.
It is good practice to configure interfaces for optimal OSPF performance including
the following:
Configure interfaces as routed ports whenever possible.
Ensure minimal or no Layer 2 protocol impact on OSPF by disabling or tuning
spanning tree.
Configure at least one loopback address with a /32 mask. Configure OSPF to use
the loopback IP address as the router ID and advertise the loopback address via
the OSPF.
Ensure that all backbone interfaces are configured with relevant IP addresses and
advertise the networks via OSPF.
Enhance OSPF performance by using OSPF network type P2P on links where
there are only two OSPF routers. This ensures that routers do not need to wait for
the designated router election process to complete before forming peer
relationships and exchanging routing information.
Adjust OSPF interface timers by reducing the OSPF hello interval to 1 second
from the default of 10 seconds and the dead interval to 4 seconds from the default
of 40 seconds. This helps detect peer device failure more quickly and provides
quicker adjacency setup.
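An illustrative sketch of these interface-level recommendations follows (the interface
name Ten-GigabitEthernet 1/0/1, the IP address, and the device name are hypothetical
placeholders, and port link-mode route applies only to switch platforms that support
routed ports):
[P-1] interface Ten-GigabitEthernet 1/0/1
[P-1-Ten-GigabitEthernet1/0/1] port link-mode route
[P-1-Ten-GigabitEthernet1/0/1] ip address 10.0.12.1 24
[P-1-Ten-GigabitEthernet1/0/1] ospf network-type p2p
[P-1-Ten-GigabitEthernet1/0/1] ospf timer hello 1
[P-1-Ten-GigabitEthernet1/0/1] ospf timer dead 4
The ospf interface commands used here are described in the command reference that
follows.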

Command reference:
Some of the following commands may be used for initial setup:

ospf
Use the ospf command to enable OSPF and enter OSPF view.
You can enable multiple OSPF processes on a router and specify different router IDs
for them.
Enable an OSPF process before performing other tasks.

Syntax
ospf [ process-id | router-id router-id | vpn-instance vpn-instance-name ]
undo ospf [ process-id ]

process-id: Specifies an OSPF process by its ID in the range of 1 to 65535.
router-id router-id: Specifies an OSPF router ID in dotted decimal notation.
vpn-instance vpn-instance-name: Specifies an MPLS L3VPN instance by its name, a
case-sensitive string of 1 to 31 characters. If no VPN is specified, the OSPF process
runs on the public network.

Examples
Enable OSPF process 100 and specify router ID 10.10.10.1.
<Sysname> system-view
[Sysname] ospf 100 router-id 10.10.10.1
[Sysname-ospf-100]

area (OSPF view)


Use the area command to create an area and enter area view.
Use the undo area command to remove an area.
By default, no OSPF area is created.

Syntax
area area-id
undo area area-id
area-id: Specifies an area by its ID, an IP address or a decimal integer in the range
of 0 to 4294967295 that is translated into the IP address format by the system.

Examples
Create area 0 and enter area 0 view.
<Sysname> system-view
[Sysname] ospf 100
[Sysname-ospf-100] area 0
[Sysname-ospf-100-area-0.0.0.0]


network (OSPF area view)


Use the network command to enable OSPF on the interface attached to the specified
network in the area.
Use the undo network command to disable OSPF for the interface attached to the
specified network in the area.
By default OSPF is not enabled on any interface.
This command enables OSPF on the interface attached to the specified network. The
interface's primary IP address must be in the specified network. If only the interface's
secondary IP address is in the network, the interface cannot run OSPF.

Syntax
network ip-address wildcard-mask
undo network ip-address wildcard-mask
ip-address: Specifies the IP address of a network.
wildcard-mask: Specifies the wildcard mask of the IP address. For example, the
wildcard mask of mask 255.0.0.0 is 0.255.255.255.

Examples
Specify the interface whose primary IP address is on network 131.108.20.0/24 to run
OSPF in Area 2.
<Sysname> system-view
[Sysname] ospf 100
[Sysname-ospf-100] area 2
[Sysname-ospf-100-area-0.0.0.2] network 131.108.20.0 0.0.0.255

ospf network-type
Use the ospf network-type command to set the network type for an interface.
Use the undo ospf network-type command to restore the default network type for an
interface.


By default, the network type of an interface depends on its link layer protocol:
For Ethernet and FDDI, the network type is broadcast.
For ATM, FR, and X.25, the network type is NBMA.
For PPP, LAPB, HDLC, and POS, the network type is P2P.
If a router on a broadcast network does not support multicast, configure the network
type for the connected interfaces as NBMA.
If any two routers on an NBMA network are directly connected through a virtual link,
the network is fully meshed, and you can configure the network type for the connected
interfaces as NBMA. If two routers are not directly connected, configure the P2MP
network type so that the two routers can exchange routing information through another
router.
When the network type of an interface is NBMA or P2MP unicast, you must use the
peer command to specify the neighbor.
If only two routers run OSPF on a network, you can configure the network type for
the connected interfaces as P2P.
When the network type of an interface is P2MP unicast, all OSPF packets are unicast
by the interface.

Syntax
ospf network-type { broadcast | nbma | p2mp [ unicast ] | p2p [ peer-address-check ] }

broadcast: Specifies the network type as broadcast.
nbma: Specifies the network type as NBMA.
p2mp: Specifies the network type as P2MP.
unicast: Specifies the P2MP interface to unicast OSPF packets. By default, a P2MP
interface multicasts OSPF packets.
p2p: Specifies the network type as P2P.
peer-address-check: Checks whether the peer interface and the local interface are on
the same network segment. Two P2P interfaces can establish a neighbor relationship
only when they are on the same network segment.

Examples
Configure the OSPF network type for VLAN-interface 10 as NBMA.
<Sysname> system-view
[Sysname] interface vlan-interface 10
[Sysname-Vlan-interface10] ospf network-type nbma

ospf timer hello


Use the ospf timer hello command to set the hello interval on an interface.
Use the undo ospf timer hello command to restore the default.
By default, the hello interval is 10 seconds for P2P and broadcast interfaces, and is
30 seconds for P2MP and NBMA interfaces.
The shorter the hello interval, the faster the topology converges, and the more
resources are consumed. Make sure the hello interval on two neighboring interfaces
is the same.

Syntax
ospf timer hello seconds
undo ospf timer hello
seconds: Specifies the hello interval in the range of 1 to 65535 seconds.

Examples
Configure the hello interval on VLAN-interface 10 as 20 seconds.
<Sysname> system-view
[Sysname] interface vlan-interface 10
[Sysname-Vlan-interface10] ospf timer hello 20


ospf timer dead


Use the ospf timer dead command to set the neighbor dead interval.
Use the undo ospf timer dead command to restore the default.

The dead interval is 40 seconds for broadcast and P2P interfaces. The dead interval
is 120 seconds for P2MP and NBMA interfaces.
If an interface receives no hello packet from a neighbor within the dead interval, the
interface considers the neighbor down. The dead interval on an interface is at least
four times the hello interval. Routers attached to the same segment must have the
same dead interval.

Syntax
ospf timer dead seconds
undo ospf timer dead
seconds: Specifies the dead interval in the range of 1 to 2147483647 seconds.

Examples
Configure the dead interval for VLAN-interface 10 as 60 seconds.
<Sysname> system-view
[Sysname] interface vlan-interface 10
[Sysname-Vlan-interface10] ospf timer dead 60

Step 2: Configure MPLS LSR-ID


Overview
In this second step, the MPLS label switch router ID is configured. The LSR ID must
be configured on each LSR device and typically, the loopback IP address is used as
the LSR ID. In Figure 9-17, the LSR ID is configured as 10.0.0.1. The display mpls
summary command is used to show MPLS settings including the LSR ID.


Figure 9-17: Step 2: Configure MPLS LSR-ID

mpls lsr-id
Use the mpls lsr-id command to configure an LSR ID for the local LSR.
Use the undo mpls lsr-id command to delete the LSR ID of the local LSR.
By default, an LSR has no LSR ID.
HP recommends that you use the address of a loopback interface on the LSR as the
LSR ID.

Syntax
mpls lsr-id lsr-id
undo mpls lsr-id
lsr-id: Specifies an ID for identifying the LSR, in dotted decimal notation.

Examples
Configure the LSR ID as 10.0.0.1 for the local node.
<Sysname> system-view
[Sysname] mpls lsr-id 10.0.0.1

display mpls summary


Use the display mpls summary command to display MPLS summary information.

Syntax
display mpls summary


Examples
# Display MPLS summary information.

See Table 9-2 for the output description.


Table 9-2: Display MPLS summary output description

Egress Label Type: Label type that the egress assigns to the penultimate hop:
Implicit-null, Explicit-null, or Non-null.
Labels: Label information.
Range: Label range.
Idle: Number of idle labels in the label range.
Protocols: Running label distribution protocols and the related information.
Type: Protocol type: LDP, BGP, RSVP, Static, Static CRLSP, TE, or L2VPN.
State: Label distribution protocol running status: Normal, or Recover (the protocol
is in the Graceful Recovery process).

Step 3: LDP and Prefixes which Trigger LSP


Overview
In the next step, LDP is enabled globally on the device and prefixes are specified that
can trigger LSP label announcements. Labels can be announced for all IP prefixes
found in the LSR IP routing table, or they can be limited to certain IP prefixes only.
The network administrator could configure some background traffic to use
pure IP routing without label switching. This is useful where access lists or
traditional IP filters are used to filter IP traffic on the backbone network.
Additionally, selected label triggers may be used to ensure that only specific traffic is
label switched. In a L2VPN or VPLS scenario, it is possible to generate labels for
LSR loopback interfaces only and not other core interfaces. In this case, VPN traffic
would be label switched on the backbone while other internal traffic would use
traditional IP and be Layer 3 routed. This is discussed in more detail in the L2VPN
and VPLS chapters.
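As a hedged sketch of this loopback-only approach (the prefix list name and the
10.0.0.0/8 loopback range are hypothetical placeholders for this illustration):
[PE-1] ip prefix-list loopbacks-only index 10 permit 10.0.0.0 8 greater-equal 32 less-equal 32
[PE-1] mpls ldp
[PE-1-ldp] lsp-trigger prefix-list loopbacks-only
With this policy, only /32 routes inside 10.0.0.0/8 (the LSR loopbacks in this assumed
addressing plan) trigger label advertisements, so only loopback-to-loopback LSPs are
built.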
Within the MPLS LDP context, you specify which IP prefixes trigger label
announcements. In Figure 9-18, label announcements are triggered for all IP prefixes,
but this could be limited to certain IP prefixes by using access lists.

Figure 9-18: Step 3: LDP and Prefixes which Trigger LSP

mpls ldp
Use the mpls ldp command to enable LDP globally and enter LDP view.

Use the undo mpls ldp command to disable LDP globally for an LSR and delete all
LDP-VPN instances.
By default, LDP is globally disabled.
You must enable LDP globally for an LSR to run LDP.
The GR commands, the session protection command, and the targeted-peer
command are available only in LDP view. All other commands available in LDP
view are also available in LDP-VPN instance view.
Commands executed in LDP view take effect only for the public network.
Commands executed in LDP-VPN instance view take effect only for the specified
VPN instance. The GR commands are global commands and take effect for all VPN
instances and the public network.

Syntax
mpls ldp
undo mpls ldp

Examples
Enable LDP globally and enter LDP view.
<Sysname> system-view
[Sysname] mpls ldp
[Sysname-ldp]

lsp-trigger
Use the lsp-trigger command to configure an LSP generation policy.
Use undo lsp-trigger command to restore the default.
By default, LDP can only use host routes with a 32-bit mask to generate LSPs.


The default LSP generation policy depends on the label distribution control mode.
In Ordered mode, LDP can only use the Loopback interface address routes with a
32-bit mask and the routes with a 32-bit mask that match the FECs of label
mappings received from downstream LSRs to generate LSPs.
In Independent mode, LDP can use all routes with a 32-bit mask to generate LSPs.
After you configure an LSP generation policy, LDP uses all routes or the routes
permitted by the IP prefix list to generate LSPs, regardless of the label distribution
control mode.
HP recommends using the default LSP generation policy.

Syntax
lsp-trigger { all | prefix-list prefix-list-name }
undo lsp-trigger
all: Enables LDP to use all routes to generate LSPs.
prefix-list prefix-list-name: Specifies an IP prefix list by its name, a
case-sensitive string of 1 to 63 characters. LDP can only use the routes permitted by
the IP prefix list to generate LSPs.

Examples
Configure an LSP generation policy to use only routes 10.10.1.0/24 and 10.20.1.0/24
to establish LSPs for the public network.
<Switch> system-view
[Switch] ip prefix-list egress-fec-list index 1 permit 10.10.1.0 24
[Switch] ip prefix-list egress-fec-list index 2 permit 10.20.1.0 24
[Switch] mpls ldp
[Switch-ldp] lsp-trigger prefix-list egress-fec-list

Step 4: Enable MPLS and LDP on interfaces


In step 4, MPLS and LDP are enabled on each backbone interface. This applies to
physical routed interfaces or routed VLAN interfaces. Do not configure MPLS on
customer facing interfaces.


In Figure 9-19, it is assumed that the Forty Gigabit Ethernet 1/0/51 interface is a
routed port and therefore MPLS and LDP are configured directly on the interface.
The mpls enable command is used to enable label switching and the mpls ldp enable
command configures the device to attempt to form an LDP session with a peer device
and then exchange labels and FEC information.

Figure 9-19: Step 4: Enable MPLS and LDP on interfaces
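A minimal sketch for the interface shown in Figure 9-19, assuming FortyGigE 1/0/51 is
already configured as a routed port with an IP address and that LDP has been enabled
globally in the previous step:
[PE-1] interface FortyGigE 1/0/51
[PE-1-FortyGigE1/0/51] mpls enable
[PE-1-FortyGigE1/0/51] mpls ldp enable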

mpls enable command


Use the mpls enable command to enable MPLS on an interface. Execute this
command on all interfaces that need to perform MPLS forwarding.
Use the undo mpls enable command to disable MPLS on an interface.

By default, MPLS is disabled on an interface.

Syntax
mpls enable
undo mpls enable

Examples
Enable MPLS on interface VLAN-interface 2.
<Sysname> system-view
[Sysname] interface vlan-interface 2
[Sysname-Vlan-interface2] mpls enable

mpls ldp enable command


Use the mpls ldp enable command to enable LDP for an interface.
Use the undo mpls ldp enable command to disable LDP for an interface.

By default, LDP is disabled on an interface.


Before you enable LDP for an interface, use the mpls ldp command in system view
to enable LDP globally.

Disabling LDP on an interface terminates all LDP sessions on the interface, and
removes all LSPs established through the sessions.
If the interface is bound with a VPN instance, you must also use the vpn-instance
command to enable LDP for the VPN instance.

An up interface enabled with LDP and MPLS sends Link Hellos for neighbor
discovery.
An up MPLS TE tunnel interface enabled with LDP sends Targeted Hellos to the
tunnel destination and establishes a session with the tunnel peer.

Syntax
mpls ldp enable
undo mpls ldp enable

Examples
Enable LDP for VLAN-interface 2.
<Sysname> system-view
[Sysname] mpls ldp
[Sysname-ldp] quit
[Sysname] interface vlan-interface 2
[Sysname-Vlan-interface2] mpls ldp enable

Step 5: Verify
The last step is verification. Several display commands are available to verify the
MPLS global settings, MPLS interfaces, LDP global settings, LDP interfaces and
MPLS LDP LSP exchange information.

Step 5.1: Verify MPLS Global Setting


The display mpls summary command can be used to view the LSR Router ID, the
label ranges that will be used and the number of available labels in the ranges.
In Figure 9-20, various label ranges are shown that have been allocated by the
Comware device as well as the number of labels that are available. The two label
distribution protocols available on this device are LDP and Static. Protocols could
include Multiprotocol BGP, RSVP, TE and others.

Figure 9-20: Step 5.1: Verify MPLS Global Setting

Guidelines
Use the display mpls summary command to display MPLS summary information.

Syntax
display mpls summary

Examples
Display MPLS summary information.
<Sysname> display mpls summary
MPLS LSR ID : 2.2.2.2
Egress Label Type: Implicit-null
Labels:
Range            Idle
16-1023          1008
1024-9215        8192
65536-73727      8192
131072-139263    8192
Protocols:
Type      State
BGP       Normal
Static    Normal

See Table 9-3 for the output description.


Table 9-3: Display MPLS summary output description
Egress Label Type: Label type that the egress assigns to the penultimate hop (Implicit-null, Explicit-null, or Non-null).
Labels: Label information.
Range: Label range.
Idle: Number of idle labels in the label range.
Protocols: Running label distribution protocols and the related information.
Type: Protocol type (LDP, BGP, RSVP, Static, Static CRLSP, TE, or L2VPN).
State: Label distribution protocol running status (Normal, or Recover when the protocol is in the GR process).

Step 5.2: Verify MPLS Interfaces


A network administrator can verify which interfaces have MPLS enabled by using the display mpls interface command, as shown in Figure 9-21. This should list all the backbone-facing interfaces.


Figure 9-21: Step 5.2: Verify MPLS Interfaces

Guidelines
Use the display mpls interface command to display MPLS interface
information, including the interface name, interface status, and interface MPLS MTU.

Syntax
display mpls interface [ interface-type interface-number ]
interface-type interface-number

Specifies an interface by the interface type and number. If you do not


specify an interface, the command displays MPLS information for all
MPLS-enabled interfaces.

Examples
Displays all MPLS interfaces.
<Sysname> display mpls interface
Interface            Status      MPLS MTU
Vlan2                Up          1500
Vlan20               Up          1500

The MPLS MTU of an interface is in bytes.

Step 5.3: Verify MPLS global parameters


LDP is a label distribution protocol. As shown in Figure 9-22, the display mpls ldp
parameter command will show LDP global configuration settings. The output
includes nonstop routing and graceful restart information (both disabled by default).


Figure 9-22: Step 5.3: Verify MPLS global parameters

A network administrator would need to enable support for nonstop routing or


graceful restart in a situation where dual management modules are used or in IRF
based systems. This ensures seamless failover to another IRF member or management
module in a situation where the primary master fails.
The output also displays the label switch router ID.

Guidelines
Use the display mpls ldp parameter command to display LDP running parameters.

Syntax
display mpls ldp parameter [ vpn-instance vpn-instance-name ]
vpn-instance vpn-instance-name

Specifies an MPLS L3VPN instance by its name, a case-sensitive string


of 1 to 31 characters. The command displays the LDP running
parameters for the specified VPN. If you do not specify a VPN instance,
the command displays the LDP running parameters for the public
network.

Examples
Display LDP running parameters for the public network.

<Sysname> display mpls ldp parameter
Global Parameters:
 Protocol Version           : V1
 Nonstop Routing            : Off
 Graceful Restart           : Off
 Reconnect Time             : 120 sec
 Forwarding State Hold Time : 360 sec
Instance Parameters:
 Instance ID                : 0
 Instance State             : Active
 LSR ID                     : 0.0.0.0
 Loop Detection             : Off
 Hop Count Limit            : 32
 Path Vector Limit          : 32
 Label Retention Mode       : Liberal
 Label Distribution Control Mode: Ordered
 IGP Sync Delay             : 0 sec
 IGP Sync Delay on Restart  : -

See Table 9-4 for the output description.


Table 9-4: Display MPLS LDP running parameter output description
Global Parameters: Global parameters for all LDP-enabled networks.
Protocol Version: LDP protocol version.
Nonstop Routing: Whether the nonstop routing function is enabled. This field is not supported in the current software version and is reserved for future support.
Graceful Restart: Whether the GR function is enabled (On or Off).
Reconnect Time: Value of the Reconnect timer, in seconds.
Forwarding State Hold Time: Value of the MPLS Forwarding State Holding timer, in seconds.
Instance Parameters: Running parameters for a specific VPN instance or the public network.
Instance ID: VPN instance ID. For the public network, this field displays 0.
Instance State: LDP status in the VPN instance, Active or Inactive.
LSR ID: LSR ID of the local device.
Loop Detection: Whether loop detection is enabled (On or Off).
Hop Count Limit: Hop count limit specified for loop detection.
Path Vector Limit: Path vector length limit specified for loop detection.
Label Retention Mode: The device supports only the Liberal mode.
IGP Sync Delay: Delay time (in seconds) that LDP must wait before it notifies IGP of an LDP session-up event. This field is not supported in the current software version and is reserved for future support.
IGP Sync Delay on Restart: Delay time (in seconds) that LDP must wait before it notifies IGP of an LDP session-up event in case of LDP restart. This field is not supported in the current software version and is reserved for future support.

Step 5.4: Verify MPLS LDP Enabled Interfaces and Peers

As shown in Figure 9-23, interfaces enabled for MPLS LDP can be verified using the display mpls ldp interface command.


Figure 9-23: Step 5.4: Verify MPLS LDP Enabled Interfaces and Peers

An administrator can also verify that an LDP peer relationship has been established
with a peer LDP device using the display mpls ldp peer command. Information
such as support for graceful restart is shown in the output.

display mpls ldp interface


Use the display mpls ldp interface command to display LDP interface information.

Syntax
display mpls ldp interface [ interface-type interface-number ]
interface-type interface-number

Specifies an interface by its type and number. If you do not specify an


interface, this command displays information about all LDP interfaces.

Examples
Display information about all LDP interfaces.
<Sysname> display mpls ldp interface
Interface            MPLS        LDP           Auto-config
Vlan17               Enabled     Configured    -
Vlan20               Enabled     Configured    -


See Table 9-5 for the output description.


Table 9-5: Display MPLS LDP interface output description
Interface: Interface enabled with LDP.
MPLS: Whether the interface is enabled with MPLS.
LDP: Whether the interface is configured with the mpls ldp enable command.
Auto-config: LDP automatic configuration information. This field is not supported in the current software version and is reserved for future support.

display mpls ldp peer


Use the display mpls ldp peer command to display the LDP peer and session
information.

Syntax
display mpls ldp peer [ vpn-instance vpn-instance-name ] [ peer-lsr-id ] [ verbose ]
vpn-instance vpn-instance-name

Specifies an MPLS L3VPN instance by its name, a case-sensitive string


of 1 to 31 characters. The command displays LDP peer and session
information for the specified VPN. If you do not specify a VPN instance,
the command displays the LDP peer and session information for the
public network.
peer-lsr-id

Specifies an LDP peer by its LSR ID. If you do not specify this option,
the command displays all LDP peers and related session information.
verbose

Displays detailed LDP peer and session information. If you do not


specify this keyword, the command displays brief LDP peer and session
information.

Examples
Display brief information about all LDP peers and LDP sessions for the public network.
<Sysname> display mpls ldp peer
Total number of peers: 1
Peer LDP ID      State        Role     GR    MD5   KA Sent/Rcvd
2.2.2.9:0        Operational  Passive  Off   Off   39/39

See Table 9-6 for the output description.


Table 9-6: Display MPLS LDP peer output description
Peer LDP ID: LDP identifier of the peer.
State: State of the LDP session between the local LSR and the peer. Non Existent: no TCP connection is established. Initialized: a TCP connection has been established. OpenRecv: LDP has received an acceptable initialization message. OpenSent: LDP has sent an initialization message. Operational: an LDP session has been established.
Role: Role of the local LSR in the session, Active or Passive. In a session, the LSR with a higher IP address takes the Active role. The Active LSR initiates a TCP connection to the passive LSR.
GR: Whether GR is enabled on the peer (On or Off).
MD5: Whether MD5 authentication is enabled for the LDP session on the local device (On or Off).
KA Sent/Rcvd: Number of keepalive messages sent/received.

Step 5.5: Verify MPLS LDP LSP Overview



Once LDP has exchanged forwarding equivalence class (FEC) and label information,
an administrator can view the LSP table with display mpls ldp lsp command.
Figure 9-24 shows networks 10.0.0.1/32, 10.0.0.2/32 and 10.0.1.0/24. These are IP
prefixes listed in the IPV4 routing table. Each prefix has an associated label which is
announced to peer devices.

Figure 9-24: Step 5.5: Verify MPLS LDP LSP Overview

The local router will receive traffic for each FEC that is either unlabelled (displayed with a hyphen "-") or labelled (label displayed). Outgoing labels are also shown where they are used, or a hyphen "-" is shown to indicate unlabelled traffic.
An MPLS router may receive either labelled or unlabelled traffic. If labelled traffic is received, the LSR will typically either swap or pop the label. Unlabelled traffic will typically have a label imposed when the egress interface is an MPLS-enabled interface.
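As a rough sketch of the kind of output Figure 9-24 illustrates, the display mpls ldp lsp output for the prefixes mentioned above might look as follows. The label values, next hop, and interface names are assumptions for illustration, and the exact column layout varies by software version.

<PE1> display mpls ldp lsp
Status Flags: * - stale, L - liberal
FECs: 3          Ingress: 1        Transit: 1        Egress: 1
FEC                In/Out Label       Nexthop          OutInterface
10.0.0.1/32        -/1151             10.0.1.2         Vlan20
10.0.0.2/32        1152/3             10.0.1.2         Vlan20
10.0.1.0/24        1153/-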

Guidelines
Use the display mpls ldp lsp command to display information about LSPs generated
by LDP.

Syntax
display mpls ldp lsp [ vpn-instance vpn-instance-name ] [ destination-address mask-length ]

vpn-instance vpn-instance-name

Specifies an MPLS L3VPN instance by its name, a case-sensitive string of 1 to 31 characters. The command displays LDP LSP information for the specified VPN. If you do not specify a VPN instance, the command displays LDP LSP information for the public network.
destination-address mask-length

Specifies an FEC by an IP address and a mask length in the range of 0 to


32. If you do not specify a FEC, the command displays information about
LDP LSPs for all FECs.

Examples
Display LDP LSP information for the public network.

See Table 9-7 for the output description.


Table 9-7: Display MPLS LDP LSP output description
Status Flags: LSP status. An asterisk (*) means stale, indicating the LSP is under a GR process. An L means liberal, indicating the LSP is not available.
FECs, Ingress LSPs, Transit LSPs, Egress LSPs: LSP statistics. FECs is the total number of FECs. Ingress LSPs is the number of LSPs that take the local device as the ingress node. Transit LSPs is the number of LSPs that take the local device as a transit node. Egress LSPs is the number of LSPs that take the local device as the egress node.
FEC: Forwarding equivalence class identified by an IP prefix.
In/Out Label: Incoming/outgoing label.
Nexthop: Next hop address for the FEC.
OutInterface: Outgoing interface for the FEC.

Step 5.6: Verify Using Tracert LSP


Use the tracert lsp ipv4 command to locate MPLS LSP errors on the LSP for a
FEC, as shown in Figure 9-25. The command sends MPLS echo requests along the
LSP to be inspected, with the TTL increasing from 1 to the specified value.

Figure 9-25: Step 5.6: Verify Using Tracert LSP

Each hop along the LSP will return an MPLS echo reply to the ingress device
because of the TTL timeout (similar to IPv4 tracert).
The ingress device can collect information about each hop along the LSP, including an LSP failure. For example, MPLS may not be enabled on an interface in the path, which results in a failed LSP.
You can also use the MPLS LSP tracert command to collect information about each hop in the LSP, including the labels allocated.

Syntax
tracert lsp [ -a source-ip | -exp exp-value | -h ttl-value | -r
reply-mode | -t time-out ] * ipv4 dest-addr mask-length [
destination-ip-addr-header ]

-a source-ip

Specifies the source IP address for the echo request messages.


-exp exp-value

Specifies the EXP value for the echo request messages. The exp-value
argument ranges from 0 to 7 and defaults to 0.
-h ttl-value

Specifies the TTL value for the echo request messages. The ttl-value
argument ranges from 1 to 255 and defaults to 30.
-r reply-mode

Specifies the reply mode of the receiver in response to the echo request
messages. The reply-mode argument can be 1 or 2, where 1 means "Do not respond" and 2 means "Respond using a UDP packet." The default is 2.
-t time-out

Specifies the timeout interval for the response to an echo request


message. The time-out argument ranges from 0 to 65535 milliseconds
and defaults to 2000 milliseconds.
ipv4 dest-addr mask-length

Specifies a FEC by an IPv4 destination address and the mask length of


the destination address. The mask-length argument ranges from 0 to 32.
destination-ip-addr-header

Specifies the destination address in the IP header of the MPLS echo request messages. It can be any address on segment 127.0.0.0/8 (any local loopback address).

Example
Locate errors along the LSP for FEC 3.3.3.9.
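The command for this example might be entered as follows; the output is not reproduced here, and the /32 mask length is an assumption based on the FEC being a loopback address.

<Sysname> tracert lsp ipv4 3.3.3.9 32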


Tracert MPLS
Use the tracert mpls ipv4 command to trace MPLS LSPs from the ingress node to the
egress node for an IPv4 prefix.
tracert mpls [ -a source-ip | -exp exp-value | -h ttl-value | -r reply-mode | -rtos tos-value | -t time-out | -v | fec-check ] * ipv4 dest-addr mask-length [ destination start-address [ end-address [ address-increment ] ] ]
-a source-ip

Specifies the source address for MPLS echo request packets. If you do
not specify this option, the command uses the primary IP address of the
outgoing interface as the source address for MPLS echo requests.
-exp exp-value

Specifies the EXP value for MPLS echo request packets, in the range of
0 to 7. The default is 0.
-h ttl-value

Specifies the maximum TTL value for MPLS echo request packets,
namely, the maximum number of hops to be inspected. The value range
for the ttl-value argument is 1 to 255, and the default is 30.
-r reply-mode

Specifies the reply mode of the receiver in response to MPLS echo


request packets. The reply-mode argument can be 1, 2, or 3. 1 means
"Do not reply," 2 means "Reply by using a UDP packet," and 3 means
"reply by using a UDP packet that carries the Router Alert option." The
default is 2.
-rtos tos-value

Specifies the ToS value in the IP header of an MPLS echo reply packet.
The value range is 0 to 7, and the default value is 6.


-t time-out

Specifies the timeout interval for the reply to an MPLS echo request.
The value range is 0 to 65535 milliseconds, and the default is 2000
milliseconds.
-v

Displays detailed reply information. If you do not specify this keyword,


the command displays brief reply information.
fec-check

Checks the FEC stack at transit nodes.


dest-addr mask-length

Specifies an FEC by an IPv4 destination address and a mask length. The


value range for the mask-length argument is 0 to 32.
destination

Specifies the destination address in the IP header of MPLS echo


requests. The default is 127.0.0.1.
start-address

Specifies the destination address or the start destination address. This address must be an address on subnet 127.0.0.0/8 (a local loopback address). If the start-address argument is specified without the end-address argument, the start-address is the destination address in the IP header. If you specify both the start-address argument and the end-address argument, you specify a range of destination addresses, and the destination addresses increase in turn by the address-increment, starting from the start-address to the end-address. The command performs a traceroute for each of the destination addresses.
end-address

Specifies the end destination address. This address must be an address on subnet 127.0.0.0/8 (a local loopback address).
address-increment

Specifies the increment value by which the destination address in the IP


header increases in turn. The value range is 1 to 16777215 and the
default value is 1.


Examples
Trace the path that the LSP (for FEC 5.5.5.9/32) traverses from the ingress node to
the egress node. Specify the IP header destination address range as 127.1.1.1 to
127.1.1.2 and set the address increment value to 1. With these settings, the device
performs a traceroute for 127.1.1.1 and 127.1.1.2, respectively.
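Based on the syntax above, the command for this example might look like the following sketch (the original output is not reproduced here):

<Sysname> tracert mpls ipv4 5.5.5.9 32 destination 127.1.1.1 127.1.1.2 1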

Trace the path that the LSP (for FEC 5.5.5.9/32) traverses from the ingress node to
the egress node. Display detailed reply information, specify the IP header destination
address range as 127.1.1.1 to 127.1.1.2, and set the address increment value to 1.
With these settings, the device performs a traceroute for 127.1.1.1 and 127.1.1.2,
respectively.
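For the verbose variant of the same trace, the sketch simply adds the -v keyword:

<Sysname> tracert mpls -v ipv4 5.5.5.9 32 destination 127.1.1.1 127.1.1.2 1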


See Table 9-8 for the output description:


Table 9-8: Tracert MPLS output description
LSP trace route FEC: Trace the LSPs for the specified FEC.
Destination address: Destination IP address in the IP header.
TTL: Number of hops.
Replier: Address of the LSR that replied to the request.
Time: Time used to receive the reply, in milliseconds.
Type: LSR type: Ingress, Transit, or Egress.
Downstream: Address of the downstream LSR and the label assigned by the downstream LSR.
ReturnCode: Return code. The number in parentheses represents a return subcode.

Summary
In this chapter you learned Multiprotocol Label Switching (MPLS) basics including
how labels are allocated and advertised using label distribution protocols like LDP.
You learned many MPLS terms including LSR, FEC, LSP, labels, label stack, FIB,
RIB, LFIB, LIB and PHP amongst others.
You learned how MPLS inserts a 32-bit header into packets, which includes a 20-bit label. The behavior of LSRs was explained, including how labels are inserted, swapped, or popped.
Tables that are used by LSRs including RIB, LIB, FIB and LFIB were explained.
Both the advertisement of FECs and forwarding of data to a FEC were explained.
The configuration and verification of basic MPLS was then detailed.

Learning Check
Answer each of the questions below.


1. An HP Comware MPLS P LSR receives an MPLS labeled packet. Which table


will be used for reading the packet and determining the hop behavior?
a. RIB
b. LIB
c. FIB
d. LFIB
2. Which protocol determines the path LSRs use?
a. EGP
b. LDP
c. IGP
d. TDP
3. An HP Comware Ingress LSR receives a packet from a CE. Which table will be
used for packet forwarding?
a. RIB
b. FIB
c. LIB
d. LFIB
4. Which MPLS feature removes a label on a P router rather than the PE router
when traffic is destined to a network directly connected to the PE router?
a. PHP
b. PBB
c. L2VPN
d. LDP

Learning Check Answers


1. d
2. c
3. b
4. a


10 MPLS Layer 2 VPN (MPLS L2VPN)

EXAM OBJECTIVES
In this chapter, you learn to:
Describe MPLS L2VPN Features.
Understand L2VPN architecture.
Describe L2VPN implementation methods: Martini, Kompella, CCC, SVC.
Configure MPLS L2VPN.
Verify MPLS L2VPN.

INTRODUCTION
This chapter discusses MPLS L2VPN technologies which provide point-to-point
layer 2 connections across an MPLS backbone network. MPLS VPLS connections
that provide point-to-multipoint connections are discussed in Chapter 11.

ASSUMED KNOWLEDGE
You should have a basic understanding of Multiprotocol Label Switching (MPLS),
including basic operations, the behavior of a Label Switching Router (LSR), and
Label Switched Paths (LSPs). You should also be familiar with MPLS application
use cases, including L2VPNs and VPLS.

MPLS L2VPN

Traditional VPNs based on Asynchronous Transfer Mode (ATM) or Frame Relay


(FR) were popular in the past. They shared the network infrastructure of carriers, but
had some inherent disadvantages:
Dependence on dedicated media: To provide both ATM-based and FR-based VPN
services, carriers had to establish two separate infrastructures across the whole
service scope, one ATM infrastructure and one FR infrastructure. The cost was
very high and the infrastructures were not utilized efficiently.
Complicated deployment: To add a site to an existing VPN, you had to modify the
configurations of all edge nodes connected with the VPN site.
MPLS L2VPN, shown in Figure 10-1, was developed as a solution to address the
above disadvantages.

Figure 10-1: MPLS L2VPN

MPLS L2VPN provides Layer 2 VPN services over an MPLS or IP backbone. It


allows carriers to establish L2VPNs on different data link layer protocols, including
ATM, FR, VLAN, Ethernet and PPP.
MPLS L2VPN transfers user data transparently and from a user's perspective, the
MPLS network is a Layer 2 switched network that can be used to establish Layer 2
connections between nodes. For example, when two Ethernet networks are connected
through MPLS L2VPN over an MPLS backbone, Ethernet users are unaware of the
MPLS backbone. The user experience is the same as if they were connected directly
through an Ethernet connection.
MPLS L2VPN is an implementation of Pseudo Wire Emulation Edge-to-Edge (PWE3).

MPLS L2VPN is an example of an application that can make use of the MPLS core
network. L2VPNs provide point-to-point connectivity in contrast to VPLS which
provides point-to-multipoint connectivity. If a customer requires point-to-multipoint
connections to interconnect three datacenters for example, they would need to use
VPLS or other technologies. VPLS is an extension to MPLS L2VPNs and is
discussed in Chapter 11.

Comparison with MPLS L3VPN


MPLS L3VPN is not discussed in this study guide, but the question often arises as to
what the difference is between L3VPN and L2VPN.
Compared with MPLS L3VPN, MPLS L2VPN has the following advantages:
High scalability: MPLS L2VPN establishes only Layer 2 connections. It does not
involve the routing information of users. This greatly reduces the load of the PEs
and even the load of the whole service provider network, enabling carriers to
support more VPNs and to service more users.
Guaranteed reliability and private routing information security: As no routing
information of users is involved, MPLS L2VPN neither tries to obtain nor
processes the routing information of users, guaranteeing the security of the user
VPN routing information.
Support for multiple network layer protocols, such as IP, IPX, and SNA.
Please refer to the HP website for more information about L3VPN.

MTU
An MPLS label stack is inserted between the link layer header and network layer
header of a packet. With the addition of the MPLS header, an MPLS packet may
exceed the maximum transmission unit (MTU) of an interface and therefore be
dropped.
To address the issue, you can configure the MPLS MTU on an interface of an LSR.
The LSR will then compare the length of an MPLS packet against the configured
MPLS MTU on the interface. When the packet is larger than the MPLS MTU:
If fragmentation is allowed, the LSR removes the label stack from the packet,
fragments the IP packet (the length of a fragment is the MPLS MTU minus the
length of the label stack), adds the label stack back into each fragment, and then
forwards the fragments.
If fragmentation is not allowed, the LSR drops the packet directly.


To configure the MPLS MTU of an interface, see Table 10-1.


Table 10-1: Configure the MPLS MTU of an interface
1. Enter system view. Command: system-view
2. Enter interface view. Command: interface interface-type interface-number
3. Configure the MPLS MTU of the interface. Command: mpls mtu value. Remarks: By default, the MPLS MTU of an interface is not configured.

MPLS packets carrying L2VPN or IPv6 packets are always successfully forwarded,
even if they are larger than the MPLS MTU.
If the MPLS MTU of an interface is greater than the MTU of the interface, data
forwarding may fail on the interface.
If you do not configure the MPLS MTU of an interface, fragmentation of MPLS
packets will be based on the MTU of the interface, and the length of fragments does
not take the MPLS labels into account. Thus, the length of an MPLS fragment may be
larger than the interface's MTU.
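As a minimal sketch following the steps in Table 10-1, the MPLS MTU might be set as follows; the interface and the value of 1500 bytes are assumptions for illustration.

<Sysname> system-view
[Sysname] interface vlan-interface 20
[Sysname-Vlan-interface20] mpls mtu 1500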

MPLS L2VPN Label Stack


In MPLS L2VPNs, the concepts and principles of CE, PE and P are the same as those
in other MPLS technologies such as basic MPLS or MPLS L3VPNs:
Customer edge device (CE): A CE resides on a customer network and has one or
more interfaces directly connected with service provider networks. It can be a
router, a switch, or a host. It is unaware of the existence of the VPN and does not
need to support MPLS.
Provider edge router (PE): A PE resides on a service provider network and
connects one or more CEs to the network. On an MPLS network, all VPN
processing occurs on the PEs.
Provider (P) router: A P router is a backbone router on a service provider
network. It is not directly connected with any CE. It only needs to be configured
with basic MPLS forwarding capability.
MPLS L2VPN uses label stacks, as shown in Figure 10-2, to implement the
transparent transmission of user packets in the MPLS network. Layer 2 packets received by the PE from a CE are encapsulated using a VC label and tunnel label.

Figure 10-2: MPLS L2VPN Label Stack

An outer label, also called a tunnel label, is used to transfer packets from one PE
to another.
An inner label, also called a VC label, is used to identify different connections
between VPNs.
Upon receiving packets, a PE determines to which CE the packets are to be
forwarded based on the VC labels.
The label stacking is discussed in more detail later in this chapter.

MPLS L2VPN Terminology


Some basic MPLS L2VPN terms you need to understand include VLL, VC and PW.
A traditional leased line is used to physically connect two remote sites. From a
customer point of view, a virtual leased line (VLL) behaves in the same way as a
traditional leased line. When traffic is sent from one CE to another through an MPLS
backbone, Layer 2 traffic is sent to the service provider at one end by a CE device
and arrives at the other end CE device unchanged. A VLL interconnects two customer
CE devices logically across the MPLS backbone.
A virtual circuit (VC) is also called a pseudo wire (PW). It is a virtual bidirectional
connection that connects the attachment circuits (ACs) on two PEs. An MPLS VC
includes a pair of label switch paths (LSPs) in opposite directions. In other words,


each unidirectional path of the virtual circuit uses a different LSP and therefore
different labels for forwarding. Traffic from CE A to CE B will use different labels
than traffic from CE B to CE A.
Multiple virtual circuits can share the same core MPLS network in conjunction with
other MPLS technologies such as MPLS traffic engineering (MPLS-TE) and MPLS
L3VPNs.
See Table 10-2 for terminology definitions.
Table 10-2: Terminology definitions
Attachment circuit (AC): An AC is a link between a CE and a PE, such as an FR DLCI, ATM VPI/VCI, Ethernet interface, VLAN, or PPP connection.
Cross-connect: A cross-connect concatenates two physical or virtual circuits such as ACs and PWs. It switches packets between the two physical or virtual circuits. Cross-connects include AC to AC cross-connect, AC to PW cross-connect, and PW to PW cross-connect.
Customer edge (CE): A CE device is a customer network device directly connected to the service provider network. It can be a network device (such as a router or a switch) or a host. It is unaware of the existence of any VPN, neither does it need to support MPLS.
Forwarding equivalence class (FEC): As a forwarding technology based on classification, MPLS groups packets to be forwarded in the same manner into a class called the forwarding equivalence class (FEC). That is, packets of the same FEC are handled in the same way.
Label block: A label block is a set of labels. It includes the following parameters:
Label base: The LB specifies the initial label value of the label block. A PE automatically selects an LB value that cannot be manually modified.
Label range: The LR specifies the number of labels that the label block contains. The LB and LR determine the labels contained in the label block. For example, if the LB is 1000 and the LR is 5, the label block contains labels 1000 through 1004.
Label-block offset: The LO specifies the offset of a label block. If the existing label block becomes insufficient as the VPN sites increase, you can add a new label block to enlarge the label range. A PE uses an LO to identify the position of the new label block. The LO value of a label block is the sum of the LRs of all previously assigned label blocks. For example, if the LR and LO of the first label block are 10 and 0, the LO of the second label block is 10. If the LR of the second label block is 20, the LO of the third label block is 30.
A label block which has LB, LO, and LR values of 1000, 10, and 5 is represented as 1000/10/5. Assume that a VPN has 10 sites, and a PE assigns the first label block LB1/0/10 to the VPN. When another 15 sites are added, the PE keeps the first label block and assigns the second label block LB2/10/15 to extend the network. LB1 and LB2 are the initial label values that are randomly selected by the PE.
MPLS Traffic Engineering (MPLS-TE): MPLS-TE focuses on the optimization of overall network performance. It is intended to conveniently provide highly efficient and reliable network services. The performance objectives associated with TE are either traffic oriented to enhance quality of service (QoS) or resources oriented to optimize resources (especially bandwidth) utilization. TE helps optimize network resources use to reduce network administrative cost, and dynamically tune traffic when congestion or flapping occurs. In addition, it allows ISPs to provide added value services.
Provider device (P): P devices do not directly connect to CEs. They only need to forward user packets between PEs.
Provider edge (PE): A PE device is a service provider network device connected to one or more CEs. It provides VPN access by mapping and forwarding packets from user networks to public network tunnels and from public network tunnels to user networks.
Pseudo wire (PW): A virtual bidirectional connection between two PEs. An MPLS PW comprises a pair of LSPs in opposite directions.
Route distinguisher (RD): An RD is added before a site ID to distinguish the sites that have the same site ID but reside in different VPNs. An RD and a site ID uniquely identify a VPN site.
Route target (RT): PEs use the BGP route target attribute (also called "VPN target" attribute) to manage BGP L2VPN information advertisement. PEs support the following types of route target attributes:
Export target attribute: When a PE sends L2VPN information (such as site ID, RD, and label block) to the peer PE in a BGP update message, it sets the route target attribute in the update message to export target.
Import target attribute: When a PE receives an update message from the peer PE, it checks the route target attribute in the update message. If the route target value matches an import target, the PE accepts the L2VPN information in the update message.
Route target attributes determine which PEs can receive L2VPN information, and from which PEs a PE can receive L2VPN information.
Site ID: A site ID uniquely identifies a site in a VPN. Sites in different VPNs can have the same site ID.
Tunnel: A tunnel (or public tunnel) is a connection that carries one or more PWs across the MPLS or IP backbone. It can be an LSP tunnel, an MPLS TE tunnel, or a GRE tunnel.
Virtual Circuit (VC): A VC is also called a pseudo wire (PW). It is a virtual bidirectional connection that connects the ACs on two PEs. An MPLS VC includes a pair of LSPs in opposite directions.
Virtual Leased Line (VLL): A point-to-point L2 VPN service provided in the public network. It enables two sites to be connected as if they were connected via a leased line. It cannot provide switching among multiple points of the service provider.

MPLS L2VPN Configuration Methods


Overview
Multiple L2VPN implementation methods are available. Some options require
manual configuration and others use dynamic signaling protocols to advertise the
labels used by the L2VPN Virtual Circuit.
The Provider-Provisioned Virtual Private Network (PPVPN) working group of the
IETF has drafted several framework protocols. Two of the most important ones are
Martini draft and Kompella draft:
draft-martini-l2circuit-trans-mpls
draft-kompella-ppvpn-l2vpn
The Martini draft defines a method for establishing PPP links to implement MPLS
L2VPN. It uses Label Distribution Protocol (LDP) as a signaling protocol for VC
label transfer.
The Kompella draft defines a CE-to-CE mode for implementing MPLS L2VPN on the


MPLS network. It uses extended BGP as the signaling protocol to advertise Layer 2
reachability information and VC labels. Kompella uses similar protocols (MBGP /
MPLS) to those used in L3VPNs defined in RFC 2547.
L2VPNs are extended to support point-to-multipoint connections using VPLS in RFC
4761 and RFC 4762. The two MPLS L2VPN implementation methods use Label
Distribution Protocol (LDP) and Border Gateway Protocol (BGP) to carry VC labels
and establish point-to-multipoint links. RFC 4762 is defined as Virtual Private LAN
Service (VPLS) Using Label Distribution Protocol (LDP) Signaling and RFC 4761 is
defined as Virtual Private LAN Service (VPLS) Using BGP for Auto-Discovery and
Signaling.
In addition, MPLS L2VPN can also be implemented by configuring VC labels
statically. Circuit Cross Connect (CCC) and Static Virtual Circuit (SVC) are two of
the static implementation methods.
Note
The focus of this study guide is Martini as it is the most widely used
implementation. Other implementation methods are discussed here for
completeness and comparison purposes.

Comparison
See Table 10-3 for a comparison of MPLS L2VPN implementation modes.
Table 10-3: Comparison of MPLS L2VPN implementation modes


Martini MPLS L2VPN


Martini MPLS L2VPN employs two levels of labels to transfer user packets. LDP is used to advertise a destination network to create an LSP between PE devices.
This is used as the outer label in L2VPNs. Remote LDP is used as the signaling
protocol to distribute the inner VC label.
Previously LDP was used to advertise an IPv4 prefix. In this case, remote LDP
advertises a unique L2 interface to a remote PE device. PE1 in Figure 10-3 allocates
a label to the interface connected to CE1 and advertises that information to PE2.


Figure 10-3: Label distribution in Martini mode

To allow the exchange of VC labels between PEs, the Martini method extended LDP
by adding the forwarding equivalence class (FEC) type of VC FEC. Moreover, as the
two PEs exchanging VC labels may not be connected directly, a remote LDP session
must be set up to transfer the VC FEC and VC labels.
With Martini MPLS L2VPN, only PEs need to maintain a small amount of VC labels
and LSP mappings and no P device contains Layer 2 VPN information. Therefore, it
has high scalability. In addition, to add a new VC, you only need to configure a one-way VC for each of the PEs. Your configuration will not affect the operation of the
network.
The Martini method applies to scenarios with sparse Layer 2 connections, such as a
scenario with a star topology.
The VC FEC contains the following information:
VC typeEncapsulation type of the VC such as PPP, HDLC, FR, Ethernet and
ATM.
VC IDIdentifier of a VC on a PE.
The VC type and the VC ID uniquely identify a VC. On a PE, the VC ID uniquely
identifies a VC among the VCs of the same type.
As shown in Figure 10-3, the PEs send a VC FEC and VC label mapping to each


other. After the VC labels are distributed, a VC is set up between the PEs.
The key of the Martini method is to set up VCs between CEs. Martini MPLS L2VPNs
employ the VC type and VC ID to identify a VC. The VC type as discussed indicates
the encapsulation type of the VC, which can be ATM, VLAN, or PPP for example.
The VC ID uniquely identifies the VC among the VCs of the same VC type on a PE.
The PEs connecting the two CEs of a VC exchange VC labels through LDP, and bind
their respective CE by the VC ID.
Once LDP establishes an LSP between the two PEs and the label exchange and the
binding to CE are finished, a VC is set up and ready to transfer Layer 2 data.

Kompella MPLS L2VPN


Kompella MPLS L2VPN employs two levels of labels to transfer user packets, and
uses BGP as the signaling protocol to distribute the inner VC label.
Different from other MPLS L2VPN modes, Kompella introduces the concept of a
VPN. It allows CEs in the same VPN to establish a connection. CEs in different
VPNs cannot establish a connection.
Kompella MPLS L2VPN has the following basic concepts:
CE IDKompella numbers CEs inside a VPN. A CE ID uniquely identifies a CE
in a VPN. CEs in different VPNs can have the same CE ID.
Route distinguisherTo distinguish CEs with the same CE ID in different VPNs,
Kompella adds an RD before a CE ID. An RD and a CE ID uniquely identify a
CE.
Route targetKompella uses the BGP route target attribute (also called "VPN
target" attribute) to identify VPNs to make sure CEs in the same VPN can
establish a connection and CEs in different VPNs cannot.
A PE supports the following types of route target attributes:
Export target attributeWhen a PE sends L2VPN information (such as CE ID and
RD) to the peer PE through a BGP update message, it sets the route target attribute
carried in the update message to export target.
Import target attributeWhen a PE receives an update message from the peer PE,
it checks the route target attribute in the update message. If the route target value
matches an import target on the PE, the PE accepts the L2VPN information in the
update message.


In brief, route target attributes define which PEs can receive L2VPN information, and
from which PEs a PE can receive L2VPN information.
Different from Martini mode, the Kompella mode does not distribute the VC label
assigned by the local PE directly to the peer PE through the signaling protocol.
Instead, it uses label blocks to assign labels to multiple connections at a time. A PE
advertises label blocks to all PEs in the same VPN. Each PE calculates the VC labels
according to the label blocks from other PEs.
A label block includes the following parameters:
Label BaseInitial label value of the label block. A PE automatically selects the
LB value that cannot be manually modified.
Label RangeNumber of labels that the label block contains. LB and LR
determine the labels contained in the label block. For example, if the LB is 1000
and LR is 5, the label block contains labels 1000 through 1004.
Label-block OffsetOffset of the label block. When CEs increase in a VPN and
the existing label block size is not enough, you do not need to withdraw the label
block on the PEs. Instead, you can assign a new label block in addition to the
existing label block to enlarge the label range. A PE uses LO to identify a label
block among all label blocks, and to determine from which label block it assigns
labels. The LO value of a label block is the sum of LRs of all previously assigned
label blocks. For example, if the LR and LO of the first label block is 10 and 0,
the LO of the second label block is 10. If the LR of the second label block is 20,
the LO of the third label block is 30.
The following describes a label block in the format of LB/LO/LR. For example, a
label block whose LB, LO, and LR are 1000, 10, and 5 is represented as 1000/10/5.
With label blocks, you can reserve some labels for the VPN for future use. This
wastes some label resources in the short term, but can reduce the VPN deployment
and configuration workload in the case of expansion.
Assume that an enterprise VPN contains 10 CEs and the number of CEs might
increase to 20 in future. In this case, set the LR to 20. When you add a CE to the
VPN, you only need to modify the configurations of the PE to which the new CE is
connected. No change is required for the other PEs, which simplifies VPN expansion.

CCC MPLS L2VPN


The CCC mode sets up a CCC connection by establishing two static LSPs in opposite
directions and binding the static LSPs to ACs.


Unlike other MPLS L2VPN implementations, Circuit Cross Connect (CCC) employs
just one level of label to transfer user data. Therefore, it uses label switched paths
(LSPs) exclusively. That is, a CCC LSP can be used to transfer only the data of the
CCC connection; it can neither be used for other MPLS L2VPN connections, nor for
MPLS L3VPN or common IP packets.
The most significant advantage of this method is that no label signaling is required
for transferring Layer 2 VPN information. As long as MPLS forwarding is supported
and service provider networks are interconnected, this method works perfectly. In
addition, since LSPs are dedicated, this method supports QoS services.
There are two types of CCC connections:
Local connection: A local connection is established between two local CEs that
are connected to the same PE. The PE functions like a Layer 2 switch and can
directly switch packets between the CEs without any static LSP.
Remote connection: A remote connection is established between a local CE and a
remote CE, which are connected to different PEs. In this case, a static LSP is
required to transport packets from one PE to another.
Note
For each remote CCC connection, you must configure two LSPs on all P devices on the path: one for inbound traffic and one for outbound traffic.

SVC MPLS L2VPN


Static Virtual Circuit (SVC) also implements MPLS L2VPN by static configuration. It
transfers L2VPN information without using any signaling protocol.
The SVC method resembles the Martini method closely and is in fact a static
implementation of the Martini method. The difference is that it does not use LDP to
transfer Layer 2 VC and link information. You only need to configure VC label
information.
Note
The labels for CCC and SVC range from 16 to 1023, which are reserved for
static LSPs.


MPLS L2VPN Architecture - Martini


Overview
The Martini L2VPN configuration method is now discussed in more detail as it is the
focus of this study guide.
Martini is configured on the premise that a working core MPLS infrastructure is in
place providing PE to PE communication. The L2VPN is configured between two PE
devices and requires that the loopback addresses of the PE devices be advertised via
a unique label. To implement this, the loopback IP addresses must be configured
using a /32 mask, advertised with a routing protocol such as OSPF, and advertised
via a label distribution protocol such as LDP. For L2VPNs to function, active LSPs
between PE loopbacks are required across the MPLS core.
As an example, in Figure 10-4, PE1 is configured with loopback address of
10.0.0.1/32. This address is advertised using a routing protocol like OSPF to PE2.
At the same time, LDP advertises this destination prefix with an auto generated label
to PE2.

Figure 10-4: MPLS L2VPN Architecture - Martini

PE2 is thus able to reach 10.0.0.1/32 using a label rather than an IPv4 prefix (assume
penultimate hop popping is turned off in this example). Traffic from PE2 to the
loopback of PE1 will use the MPLS tunnel or LSP. In other words, traffic sent from
PE2 to PE1 uses an MPLS tagged packet with the label represented as "T" (tunnel
label) in Figure 10-4. This forms a virtual tunnel between PE1 and PE2.
In this example, the label discussed was applied to a destination IPv4 prefix
(10.0.0.1/32). However, labels can also be applied to L2 interfaces. This is the
feature that L2VPNs make use of to create Layer 2 connections between sites. The PE


devices in the figure do not have IP addresses configured on the interfaces connecting
them to the CE devices (AC interfaces). PE1 generates an additional label for the
interface to CE1 (AC). This label is advertised using remote LDP to PE2 and is
represented in Figure 10-4 as "V" (VC circuit label).
MPLS L2VPNs use label stacks to implement the transparent transmission of user
packets in the MPLS network. Layer 2 packets received by the PE from a CE, are
encapsulated using a VC label and tunnel label.
An outer label, also called a tunnel label, is used to transfer packets from one PE
to another.
An inner label, also called a VC (virtual circuit) label, identifies a Layer 2
connection to a CE. PE devices use this label to identify a specific local L2 port.
This is required because a PE may have multiple ports using the same tunnel label
for communication with another PE.
Upon receiving packets, a PE determines to which CE the packets are to be
forwarded to according to the VC labels.
As shown in Figure 10-4, MPLS L2VPN forwards packets in the following steps:
After PE 1 receives a Layer 2 packet from CE 1, it adds a VC label to the packet
according to the VC bound to the AC, searches for the public tunnel, adds a tunnel
tag to the packet, and then forwards the packet to the core MPLS network. Any P
devices will forward the packet to PE 2 according to the tunnel tag. In this
example however, PE1 forwards the traffic directly to PE2 as no P devices are
shown in the figure.
After PE 2 receives the packet from the public tunnel, it identifies the VC to which
the packet belongs according to the VC label of the packet, deletes the tunnel
label and the VC label from the packet, and then forwards the resulting packet to
CE 2 through the AC bound to the VC.

MPLS L2VPN Architecture - Martini


(continued)
To reiterate, one of the advantages of L2VPNs over L3VPNs is that multiple types of
traffic can be forwarded from CE to CE and not only IPv4. Ethernet or PPP frames
could be transported transparently across the MPLS core.
Another advantage of using MPLS virtual leased lines (VLL) rather than physical
leased lines is link recovery. If a link failed in an MPLS core with multiple paths
between PE devices, OSPF could reroute the traffic using an alternate path through


the core. A new tunnel label would be used, but the same VC label could be used for
the VLL. The VLL would therefore still be available, whereas a physical leased line
would be down.
PE devices may have multiple customers connected to them. Traffic received from a
customer by one PE should not inadvertently be transmitted to a different customer
L2VPN on another PE device.
On receipt of packets from a remote PE device, the local PE requires a label to
identify which interface to forward the packets out of. The PE will receive packets
from the tunnel and use the virtual circuit (VC) label to determine the local egress
interface. The PE will remove the tunnel and VC label from the packet, and then
forward the packet to the CE device.
This ensures that packets are transmitted out of the correct interface and ensures that
customer is unaware of any labels or VPNs.
A L2VPN typically requires bidirectional communication. However, MPLS label
switch paths (LSPs) are unidirectional and therefore two separate LSPs are used for
a L2VPN (transmit/receive). LSPs are created independently of L2VPNs and PE
devices require a mechanism to ensure that the transmitted and received packets of a
L2VPN are processed correctly.
To ensure that both transmit and receive traffic of a L2VPN is processed as part of
that L2VPN and not injected into another L2VPN, configure both PE devices with the
same pseudo wire ID.
A unique Pseudo wire ID will be assigned to each L2VPN connection. This is
configured by each PE device forming the point-to-point connection. The Pseudo
wire ID has to be set to the same value on both sides of the L2VPN and has to be
unique on each PE device. Multiple L2VPNs could be configured on a single PE
device and therefore each L2VPN must be uniquely identified on that PE with a
unique PW.
Remote LDP advertises labels between PE devices and ensures that transmit and
receive LSPs are bound to the correct L2 VPN. A PE device will receive an LDP
advertisement from another PE device indicating which label is associated with a
particular PW.
Pseudo wire IDs function as possible destination resources in the same way that
IPV4 prefixes in the routing table function as possible destination resources. Basic
LDP generates a label for IP prefixes found in the routing table and advertises that


label to peers. In this case however, remote LDP generates labels for the PW IDs
configured locally on the PE device and advertises those PW IDs and labels to peer
PE devices (which may be remote).
The remote LDP announcement contains information about the PW ID and label
which allows the remote PE to determine which label to use for a particular PW ID.
As an example, PE2 is advertising label 65679 to PE1 via remote LDP for pseudo
wire ID 3. When PE1 receives traffic from a CE device that is matched to the L2VPN
(PW 3), PE1 will encapsulate the traffic from the CE with inner label 65679 (VC
label) and an outer label (LSP tunnel label) learnt via LDP. The packet will then be
sent to the core MPLS network as an MPLS packet.
The combination of LDP (outer label advertisements) and remote LDP (inner label
advertisements) ensures that the correct LSP is used for both the transmit and receive
paths of the L2VPN. PE devices know which unidirectional LSP to use for the
bidirectional pseudo wire.

MPLS L2VPN Configuration Steps


The following configuration steps apply to Comware 7 devices. Comware 5 steps differ.
An overview of the MPLS L2VPN configuration steps is as follows:
1. Before L2VPNs can be configured, the prerequisite steps of configuring the core
MPLS network with OSPF, MPLS and LDP must be completed.
2. The PE devices are configured globally for MPLS L2VPNs.
3. A service instance on the customer facing interfaces is configured.
4. In the last step a cross connect group is defined and used to bind or cross
connect the customer facing interface and the target PE device.
Once configured, verify that the configuration is working as expected.

Step 1: Configure Basic MPLS and LDP


The first step is to configure basic MPLS and LDP. The configuration was explained in Chapter 9 and is thus not repeated here.
Before the L2VPN is configured, it is assumed that the backbone infrastructure has been
configured:

IP routing is configured with a routing protocol such as OSPF
Loopback addresses are configured and are being advertised
Basic MPLS and LDP are configured on all backbone-facing interfaces
Any target PE loopback IP addresses are reachable via an LSP. Ensure, for example, that PE1 is able to reach PE2 via a unique LSP and that PE2 can reach PE1 using a different LSP label (see the verification sketch after this list).
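The verification sketch referenced in the list above could look like the following; the device name and the peer loopback address 192.3.3.3 are assumptions for illustration.

<PE1> display mpls ldp peer
<PE1> display mpls ldp lsp 192.3.3.3 32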

Step 2: Configure Global L2VPN


Once basic MPLS has been configured and tested, L2VPNs can be configured.
Ensure for example that the following has been completed:
LSR ID for the PE has been configured using the mpls lsr-id command
MPLS has been enabled on the backbone MPLS interface using the mpls enable
command
The l2vpn enable command, as shown in Figure 10-5, is not used exclusively by
L2VPNs, but is required for any kind of advanced L2 connectivity on a Comware7
device such as SPB or VPLS.

Figure 10-5: Step 2: Configure Global L2VPN

You must enable L2VPN before configuring other L2VPN settings.


Use the l2vpn enable command to enable L2VPN.
Use the undo l2vpn enable command to disable L2VPN.
L2VPNs are disabled by default.

Syntax
l2vpn enable
undo l2vpn enable
Follow the steps in Table 10-4 to enable L2VPN.
Table 10-4: Steps to enable L2VPN

1. Enter system view. Command: system-view
2. Enable L2VPN. Command: l2vpn enable. Remarks: By default, L2VPN is disabled.
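Following Table 10-4, the configuration reduces to two commands. The device name is an assumption for illustration.

<PE1> system-view
[PE1] l2vpn enable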

Step 3: Configure a Service Instance


Overview
Once L2VPN has been enabled globally on a device, a service instance is required
for L2VPNs. A service instance is once again a generic term and is also used in
various L2 protocols such as VPLS and SPB.
The service instance is a locally significant number. This is configured on the PE
device and applied to the customer facing interface. In Figure 10-6, service-instance
10 is applied to interface ten-gigabit Ethernet 1/0/1. Instead of selecting all arriving
traffic on the interface for transmission through the L2VPN, a network administrator
can specify that only certain customer traffic types are transported via the L2VPN. In
the simplest scenario, all arriving traffic is selected for transmission. Other options
include specifying an individual VLAN or tagged traffic only. The encapsulation
command is used to specify matched traffic. In Figure 10-6, VLAN 10 traffic is
selected for transmission.

Figure 10-6: Step 3: Configure a Service Instance

802.1Q supports up to 4094 VLANs and therefore 4094 service instances could be
created on a single interface to match individual VLANs. Each service instance in
turn is mapped to a unique L2VPN. As an example, VLAN 10 traffic arriving on an
interface at SiteA could be matched and forwarded to one remote PE (SiteB) while
VLAN 11 traffic arriving on the same interface is forwarded to a different remote PE
(SiteC).
The service VLAN (s-vid) option applies to the outer VLAN tag of an incoming
packet. If 802.1Q frames are received from a customer network, the service VLAN
ID matches the 802.1Q VLAN ID.


However, QinQ frames could be transmitted to the PE. An intermediate device between the PE and CE could be adding an additional tag to the original 802.1Q frames. That
would allow the service provider to assign all VLANs that belong to a single
customer to a single outer VLAN tag. Therefore, the PE will receive frames with two
tags, the outer tag being the service VLAN ID and the inner tag, the customer VLAN
ID. The PE will transport all VLANs that belong to that customer based on the single
outer VLAN ID and send that traffic to a specific site. Other customers would be
encapsulated with a different service VLAN ID, which would allow the service
provider to scale beyond 4096 customer VLANs.

Service-instance
Use the service-instance command to create a service instance and enter service
instance view.
Use the undo service-instance command to delete an existing service instance.
By default, no service instance is created.
The service instances created on different Layer 2 Ethernet interfaces can have the
same service instance ID.
service-instance service-instance-id
undo service-instance service-instance-id
service-instance-id

Specifies the ID of the service instance, in the range of 1 to 4096.

Example
Create service instance 1 on the Layer 2 Ethernet interface Ten-GigabitEthernet 1/0/1
and enter service instance 1 view.
<Sysname> system-view
[Sysname] interface ten-gigabitethernet 1/0/1
[Sysname-Ten-GigabitEthernet1/0/1] service-instance 1
[Sysname-Ten-GigabitEthernet1/0/1-srv1]

encapsulation (service instance view)


Use the encapsulation command to configure a packet matching rule for the current
service instance.


Use the undo encapsulation command to remove the packet matching rule of the
current service instance.
By default, no packet matching rule is configured for a service instance.
You can choose only one of the following match criteria for a service instance:
Match all incoming packets.
Match incoming packets with any VLAN ID or no VLAN ID.
Match incoming packets with a specific VLAN ID.
The match criteria for different service instances configured on an interface must be
different.
You can create multiple service instances on a Layer 2 Ethernet interface, but only
one service instance can use the default match criteria (encapsulation default) to
match packets that do not match any other service instance. If only one service
instance is configured on an interface and the service instance uses the default match
criteria, all packets received on the interface match the default match criteria.
This command cannot be executed multiple times for a service instance.
Removing the match criteria for a service instance also removes the association
between the service instance and the VSI.

Syntax
encapsulation default
encapsulation { tagged | untagged }
encapsulation s-vid vlan-id [ only-tagged ]
undo encapsulation
default

Specifies the default match criteria.


s-vid vlan-id

Matches packets with a specific outer VLAN ID. The vlan-id argument
specifies a VLAN ID in the range of 1 to 4094.
only-tagged

Matches only tagged packets. If this keyword is not specified when the


matching VLAN is the default VLAN, packets with the default VLAN ID
or without any VLAN ID are all matched. If this keyword is specified
when the matching VLAN is the default VLAN, only packets with the
default VLAN ID are matched.
tagged

Matches tagged packets.


untagged

Matches untagged packets.

Example
Configure service instance 1 on Ten-GigabitEthernet 1/0/1 to match packets that have
an outer VLAN ID of 111.
<Sysname> system-view
[Sysname] interface ten-gigabitethernet 1/0/1
[Sysname-Ten-GigabitEthernet1/0/1] service-instance 1
[Sysname-Ten-GigabitEthernet1/0/1-srv1] encapsulation s-vid 111
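
As a further illustration of the match criteria rules above, a second, hypothetical service instance on the same interface could act as a catch-all for packets that do not match any other service instance (instance ID 2 is an arbitrary example):
[Sysname-Ten-GigabitEthernet1/0/1] service-instance 2
[Sysname-Ten-GigabitEthernet1/0/1-srv2] encapsulation default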

Step 4: Configure Cross Connect Group


Overview
In this step, a cross-connect group is created to bind the customer interface to the
target PE. The customer interface was configured in the previous step with a service
instance. The service instance configured previously selects traffic to transmit via the
L2VPN, and that traffic is bound in this step to the remote PE. The IP address specified by
the peer command is the remote PE's loopback IP address, which is reachable using a
unidirectional LSP through the core MPLS network. The outer label (tunnel label) is
learned via LDP, and the inner label (VC label) is exchanged via remote LDP for the
pseudo wire identified by the pw-id parameter. The pseudo wire ID (pw-id) must be
configured with the same value on both PE devices. This allows both remote PE devices
to determine through remote LDP advertisements that this connection is the same L2VPN.
Received and transmitted traffic will therefore be matched to the same L2VPN.
In Figure 10-7, a cross-connect group with the name l2vpn1 is configured. This is a
locally significant value. Within the cross-connect group, a connection with the name
ldp is created that binds the service instance configured on Ten-GigabitEthernet 1/0/1 to


PE 192.3.3.3 using pseudo wire 3. The pw-id needs to be unique and match on both
PE devices. Each pseudo wire identifies an L2VPN and thus needs to be unique on
each PE, but needs to be the same on both sides of the pseudo wire.

Figure 10-7: Step 4: Configure Cross Connect Group
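
Putting these pieces together, the following is a minimal sketch of the Step 3 and Step 4 configuration shown in Figures 10-6 and 10-7, assuming the service instance from the previous step is service-instance 10 on Ten-GigabitEthernet 1/0/1 and the remote PE loopback is 192.3.3.3 (the PE1 prompt name is arbitrary):
<PE1> system-view
[PE1] xconnect-group l2vpn1
[PE1-xcg-l2vpn1] connection ldp
[PE1-xcg-l2vpn1-ldp] ac interface ten-gigabitethernet 1/0/1 service-instance 10
[PE1-xcg-l2vpn1-ldp] peer 192.3.3.3 pw-id 3
A mirrored configuration on the remote PE would reference this PE's loopback address with the same pw-id of 3.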

xconnect-group
Use the xconnect-group command to create a cross-connect group and enter cross-connect
group view. If the specified group has been created, the command directly enters
the cross-connect group view.
Use undo xconnect-group to delete a cross-connect group.
L2VPNs can create multiple LDP, BGP, and static PWs for a cross-connect group.

Syntax
xconnect-group group-name
undo xconnect-group group-name
group-name

Specifies the name of the cross-connect group, a case-sensitive string of


1 to 31 characters excluding hyphens.

Example
Create a cross-connect group named vpn1 and enter cross-connect group view.
<Sysname> system-view
[Sysname] xconnect-group vpn1
[Sysname-xcg-vpn1]

connection
Use connection to create a cross-connect and enter cross-connect view. If the
specified cross-connect has been created, the command opens cross-connect view.


Use undo connection to remove a cross-connect.

A cross-connect is a point-to-point connection. You can perform the following
operations in cross-connect view:
Execute ac interface and peer to connect an AC to a PW, so the PE can forward
packets between the AC and the PW.
Execute peer twice to connect two PWs to form a multi-segment PW.
Execute ac interface and ccc to connect an AC to a remote CCC connection, so
the PE can forward packets between the AC and the remote CCC connection.
No cross-connect is created by default.

Syntax
connection connection-name
undo connection connection-name
connection-name

Specifies the name of the cross-connect, a case-sensitive string excluding hyphens.

Example
Create cross-connect ac2pw for cross-connect group vpn1 and enter cross-connect
view.
<Sysname> system-view
[Sysname] xconnect-group vpn1
[Sysname-xcg-vpn1] connection ac2pw
[Sysname-xcg-vpn1-ac2pw]

ac interface
Use ac interface to bind an AC to a cross-connect.
Use undo ac interface to remove the binding.
An AC can be a Layer 3 interface or a service instance on a Layer 2 Ethernet
interface.
After you bind a Layer 3 interface or a service instance on a Layer 2 interface to a


cross-connect, the cross-connect forwards packets received from the Layer 3


interface or packets that match the service instance on the Layer 2 interface to the
bound PW or another AC.
The access mode determines how the PE treats the VLAN tag in Ethernet frames
received from the AC. It also determines how the PE forwards Ethernet frames to the
AC.
VLAN access mode: Ethernet frames received from the AC must carry a VLAN
tag in the Ethernet header. The VLAN tag is called a P-tag, assigned by the service
provider. Ethernet frames sent to the AC must also carry the P-tag.
Ethernet access mode: If Ethernet frames from the AC have a VLAN tag in the
header, the VLAN tag is called a U-tag, and the PE ignores it. Ethernet frames
sent to the AC do not carry the P-tag.
The service instance specified in this command must have match criteria configured
by encapsulation.
No AC is bound to a cross-connect by default.

Syntax
ac interface interface-type interface-number [ service-instance instance-id ] [ access-mode { ethernet | vlan } ]
undo ac interface interface-type interface-number [ service-instance instance-id ]
interface-type interface-number

Specifies an interface.
service-instance instance-id

Specifies a service instance by its ID in the range of 1 to 4096.


access-mode

Specifies the access mode. By default, the access mode is VLAN.


ethernet

Specifies the Ethernet access mode.


vlan

Specifies the VLAN access mode.


Examples
Configure service instance 200 on the Layer 2 interface Ten-GigabitEthernet 1/0/1 to
match packets with an outer VLAN tag of 200, and bind the service instance to the
cross-connect actopw in the cross-connect group vpn1.
<Sysname> system-view
[Sysname] interface ten-gigabitethernet 1/0/1
[Sysname-Ten-GigabitEthernet1/0/1] service-instance 200
[Sysname-Ten-GigabitEthernet1/0/1-srv200] encapsulation s-vid 200
[Sysname-Ten-GigabitEthernet1/0/1-srv200] quit
[Sysname-Ten-GigabitEthernet1/0/1] quit
[Sysname] xconnect-group vpn1
[Sysname-xcg-vpn1] connection actopw
[Sysname-xcg-vpn1-actopw] ac interface ten-gigabitethernet 1/0/1 service-instance 200

Configure service instance 200 on Ten-GigabitEthernet 1/0/1 to match packets with
an outer VLAN tag of 200, and bind the service instance to the auto-discovery
cross-connect in the cross-connect group vpwsbgp.
<Sysname> system-view
[Sysname] interface ten-gigabitethernet 1/0/1
[Sysname-Ten-GigabitEthernet1/0/1] service-instance 200
[Sysname-Ten-GigabitEthernet1/0/1-srv200] encapsulation s-vid 200
[Sysname-Ten-GigabitEthernet1/0/1-srv200] quit
[Sysname-Ten-GigabitEthernet1/0/1] quit
[Sysname] xconnect-group vpwsbgp
[Sysname-xcg-vpwsbgp] auto-discovery bgp
[Sysname-xcg-vpwsbgp-auto] site 1 range 10 default-offset 0
[Sysname-xcg-vpwsbgp-auto-1] connection remote-site-id 2
[Sysname-xcg-vpwsbgp-auto-1-2] ac interface ten-gigabitethernet 1/0/1 service-instance 200

peer
Use peer to configure a PW for a cross-connect and enter cross-connect PW view. If
the specified PW has been created, the command opens cross-connect PW view.
Use undo peer to delete a PW.


To create a static PW, you must specify the incoming and outgoing labels. To enter the
view of an existing static PW, you do not need to specify the incoming and outgoing
labels.
If you do not specify the incoming and outgoing labels when you create a new PW,
LDP is used to create the PW.
The PW ID for a PW must be the same on the PEs at the ends of the PW.
The LSR ID of the peer PE and the PW ID uniquely identify a PW, and must not both
be the same as those of any VPLS PW or PW bound to a cross-connect.
PW redundancy is mutually exclusive with multi-segment PW function. If you have
configured two PWs by using the peer command in cross-connect view, you cannot
configure a backup PW by using the backup-peer command in cross-connect PW
view, and vice versa.
No PW is configured for a cross-connect by default.

Syntax
peer ip-address pw-id pw-id [ in-label label-value out-label label-value ] [ pw-class class-name | tunnel-policy tunnel-policy-name ]
undo peer ip-address pw-id pw-id
ip-address

Specifies the LSR ID of the peer PE.


pw-id pw-id

Specifies a PW ID for the PW, in the range of 1 to 4294967295.


in-label label-value

Specifies the incoming label of the PW, in the range of 16 to 1023.


out-label label-value

Specifies the outgoing label of the PW, in the range of 16 to 1023.


pw-class class-name

Specifies a PW class by its name, a case-sensitive string of 1 to 19


characters. You can configure the PW type and control word by
specifying a PW class. If no PW class is specified, the PW type is


determined by the interface type. The control word function is not


supported for PW types that do not require using control word.
tunnel-policy tunnel-policy-name

Specifies a tunnel policy by its name, a case-sensitive string of 1 to 19


characters. If no tunnel policy is specified, the default tunnel policy is
used.

Examples
Configure an LDP PW destined to 4.4.4.4 for the cross-connect pw2pw in the cross-connect group vpn1 and enter cross-connect PW view. The PW ID is 200.
<Sysname> system-view
[Sysname] xconnect-group vpn1
[Sysname-xcg-vpn1] connection pw2pw
[Sysname-xcg-vpn1-pw2pw] peer 4.4.4.4 pw-id 200
[Sysname-xcg-vpn1-pw2pw-4.4.4.4-200]

Configure a static PW destined to 5.5.5.5 for the cross-connect pw2pw in the cross-connect group vpn1 and enter cross-connect PW view. The static PW has an ID of
200, an incoming label of 100, and an outgoing label of 200.
<Sysname> system-view
[Sysname] xconnect-group vpn1
[Sysname-xcg-vpn1] connection pw2pw
[Sysname-xcg-vpn1-pw2pw] peer 5.5.5.5 pw-id 200 in-label 100 out-label 200
[Sysname-xcg-vpn1-pw2pw-5.5.5.5-200]

Step 5: Verify
Overview
The configured L2VPN can be verified and status reviewed.
Figure 10-8 shows output for both PE1 and PE2. The output shows that remote LDP
is advertising the labels to use (In/Out) for PW ID (pseudo wire) 3. An xconnect
group named l2vpn1 is configured on both PE devices. PE1 has a peer of 192.3.3.3
(PE2) and PE2 has a peer of 192.2.2.2 (PE1).


Figure 10-8: Step 5: Verify

PE2 is advertising label 65679 (in label) to PE1 via remote LDP. When PE1
receives traffic from a CE device that is matched to the L2VPN, PE1 will
encapsulate the traffic from the CE with inner label 65679 (VC label) and an outer
label (LSP tunnel label) learnt via LDP (not shown in output). The packet will then
be sent into the core MPLS network as an MPLS packet with a two-label stack. In the
same way, when PE2 sends packets via the L2VPN to PE1, the inner label (VC label)
is set to 65681 as learned via remote LDP and the outer label (LSP tunnel label) is
learnt locally using LDP, typically from a P device.
Once label exchange has taken place, the status of L2VPN is up.

Guidelines
Use the display l2vpn pw command to display L2VPN PW information.

Syntax
display l2vpn pw [ xconnect-group group-name ] [ protocol { bgp | ldp | static } ] [ verbose ]
xconnect-group group-name

Displays L2VPN PW information for the cross-connect group specified


by its name, a case-sensitive string of 1 to 31 characters. If no group is
specified, the command displays L2VPN PW information for all cross-connect groups.
protocol

Displays L2VPN PW information established by a specific protocol. If


no protocol is specified, the command displays L2VPN PW information


established by all protocols.
bgp

Displays BGP PW information.


ldp

Displays LDP PW information.


static

Displays static PW information, including remote CCC connections.


verbose

Displays detailed information. Without this keyword, the command


displays brief information.

display l2vpn pw
Display brief information about all L2VPN PWs.
<Sysname> display l2vpn pw
Flags: M - main, B - backup, H - hub link, S - spoke link, N - no split horizon
Total number of PWs: 2, 2 up, 0 blocked, 0 down, 0 defect
Xconnect-group Name: ldp
Peer            PW ID/Rmt Site    In/Out Label    Proto   Flag   Link ID   State
192.3.3.3       500               65699/65699     LDP     M      0         Up

Xconnect-group Name: vpnb
Peer            PW ID/Rmt Site    In/Out Label    Proto   Flag   Link ID   State
192.3.3.3       2                 65636/65663     BGP     M      1         Up
See Table 10-5 for the output description:


Table 10-5: Display l2vpn pw output description

Field             Description
Flag              PW flag: M indicates the primary PW, B indicates the backup PW.
PW ID/Rmt Site    Displays the PW ID for a static or LDP PW, and the remote site ID for a BGP PW.
Proto             Protocol used to establish the PW: LDP, Static, or BGP.
Link ID           Link ID of the PW.
State             PW state: Up, Down, Blocked, or BFD Defect. Blocked indicates that the PW is a
                  backup PW. Defect indicates BFD has detected a defect on the PW.

display l2vpn pw verbose
Display detailed information about all PWs.
<Sysname> display l2vpn pw verbose
Xconnect-group Name: ldp
Connection Name: ldp
Peer: 192.3.3.3 PW ID: 500
Signaling Protocol : LDP
Link ID : 0 PW State : Up
In Label : 65699 Out Label: 65699
MTU : 1500
PW Attributes : Main
VCCV CC :
VCCV BFD :
Tunnel Group ID : 0x1800000160000000
Tunnel NHLFE IDs : 136
Xconnect-group Name: vpnb
Connection of auto-discovery: Site 1
Peer: 192.3.3.3 Remote Site: 2
Signaling Protocol : BGP
Link ID : 1 PW State : Up
In Label : 65636 Out Label: 65663
MTU : 1500
PW Attributes : Main
VCCV CC :
VCCV BFD :
Tunnel Group ID : 0x1800000160000000
Tunnel NHLFE IDs : 136

See Table 10-6 for the output description.


Table 10-6: Display l2vpn pw verbose output description

Field                  Description
Xconnect-group Name    Cross-connect group name.
Connection             Cross-connect name, which is displayed for LDP and static PWs.
Peer                   IP address of the peer PE of the PW.
PW State               PW state: Up, Down, Blocked, or Defect. Blocked indicates that the PW is a
                       backup PW. Defect indicates BFD has detected a defect on the PW.
Wait to Restore Time   Wait time to switch traffic from the backup PW to the primary PW when the
                       primary PW recovers, in seconds. If the switchover is disabled, this field
                       displays Infinite. This field is available when both primary and backup PWs
                       exist, and is displayed only for the primary PW.
Remaining Time         Remaining wait time for traffic switchover, in seconds.
PW Attributes          PW attribute: Main indicates the primary PW, Backup indicates the backup PW.
VCCV CC                VCCV CC type: Control-Word (control word), Router-Alert (MPLS router alert
                       label), or TTL (TTL timeout).
VCCV BFD               VCCV BFD type: Fault Detection with BFD (BFD packets use IP/UDP
                       encapsulation, with IP/UDP headers), or Fault Detection with Raw-BFD (BFD
                       packets use PW-ACH encapsulation, without IP/UDP headers).
Tunnel Group ID        ID of the tunnel group for the PW.
Tunnel NHLFE IDs       NHLFE ID of the public tunnel that carries the PW. If equal-cost tunnels are
                       available, this field displays multiple NIDs. If no tunnel is available,
                       this field displays None.
Connection of          The PW is a BGP PW.
auto-discovery
Site                   Local site ID.
Remote Site            Remote site ID.

Summary
In this chapter, you learned about MPLS L2VPNs, which provide Layer 2 point-to-point VPN services over an MPLS or IP backbone.
MPLS L2VPNs transfer user data transparently. The MPLS network is a Layer 2
switched network that can be used to establish Layer 2 connections between CE
nodes.
Various implementation methods were discussed, including Martini, Kompella, CCC,
and SVC. Martini, which uses LDP to exchange VC labels, was discussed at length.
The configuration and verification of Martini implementations of L2VPNs was then
discussed.

Learning Check
Answer each of the questions below.
1. Which L2VPN implementation uses MBGP?
a. SVC
b. Martini
c. Kompella
d. CCC
2. Which L2VPN implementation uses one level of label?
a. SVC
b. Martini
c. Kompella
d. CCC
3. An administrator wants to configure an L2VPN. Which parameter must be the
same on both PE devices?


a. pw-id
b. xconnect-group
c. connection
d. service-instance
4. Which L2VPN implementation uses LDP?
a. SVC
b. Martini
c. Kompella
d. CCC
5. Which protocol advertises the outer label in a Kompella L2VPN?
a. BGP
b. LDP
c. MBGP
d. Remote LDP

Learning Check Answers


1. c
2. d
3. a
4. b
5. b


11 Virtual Private LAN Service (VPLS)

EXAM OBJECTIVES
In this chapter, you learn to:
Describe MPLS VPLS Features.
Understand VPLS architecture.
Describe VPLS Loop Prevention.
Configure MPLS VPLS.

INTRODUCTION
Virtual private LAN service (VPLS), also called transparent LAN service (TLS) or
virtual private switched network service, can deliver a point-to-multipoint L2VPN
service over public networks. With VPLS, geographically dispersed sites can
interconnect and communicate over a metropolitan area network (MAN) or wide
area network (WAN) as if they were on the same local area network (LAN).

ASSUMED KNOWLEDGE
You should have a basic knowledge of Label Distribution Protocol (LDP) and
prefixes that trigger a label switched path (LSP).

MPLS VPLS Overview


Virtual Private LAN Service (VPLS), also called transparent LAN service (TLS) or
virtual private switched network service, delivers a point-to-multipoint L2VPN


service over an MPLS or IP backbone, as shown in Figure 11-1. The provider


backbone emulates a switch to connect all geographically dispersed sites of each
customer network. The backbone is transparent to the customer sites, which can
communicate with each other as if they were on the same LAN.

Figure 11.1: MPLS VPLS

VPLS provides Layer 2 VPN services for CE devices. However, it supports


multipoint services, rather than the point-to-point services that traditional L2VPNs
support.


While L2VPN PE devices simply forward any packet received from the CE devices
on the service instance, VPLS PE devices participate in customer MAC address
learning. PE devices learn and maintain a MAC address table in a similar way to a
traditional Ethernet switch. The VPLS virtual switch instance has both local physical
interfaces as well as virtual Ethernet interfaces. Any source MAC addresses in
frames received from either a local CE device (local physical interface) or from
remote PE devices (virtual Ethernet interfaces) will be learned and maintained in a
virtual switch MAC address table. Each PE creates and maintains a virtual switch
instance (VSI) which provides transparent layer 2 forwarding for customer sites.
Note
Even though VPLS can be enabled over IP backbone networks, this study guide
focuses on MPLS based networks.
As shown in Figure 11-2, a VPLS network can be regarded as a large, multisite, layer
2 switch from the point of view of users. The VPLS network can transparently
transmit all L2/L3 packets sent by CE devices and it appears as if the VPLS network
is a simple L2 switch with no protocols enabled. However, VPLS is actually a
multipoint VPN technology that offers a service equivalent to an Ethernet switch over
an MPLS-based core network.

Figure 11.2: CE devices view VPLS network as one large L2 switch

How are packets processed in a VPLS environment? In Figure 11-3, a core MPLS


network and connected CE sites are shown. For the CE devices, a VPLS network
acts like a L2 switch with no protocols enabled, and transparently transmits all user
packets (user PDU).

Figure 11.3: Packet forwarding.

The PE devices encapsulate CE traffic from local sites destined to remote sites using
a virtual circuit (VC) label and MPLS tunnel label. Based on the destination MAC
address in the frame received from the CE device, the traffic will be forwarded to
either a single remote PE via a pseudo wire or multiple PE devices via multiple
pseudo wires.
The encapsulated MPLS packet will be label switched across the MPLS backbone by
P devices.
The remote PE device will upon receipt of the MPLS encapsulated packet, use the
VC label to select the correct VPN to which the user packet belongs. The PE will
then, based on the destination MAC address, select the correct egress physical
interface. The PE will lastly remove the VC label and forward the original user
packet to the local CE device.
Once again, from the customer point of view, the VPLS is viewed as a multisite,
layer 2 switch. Multiple customer sites can be connected to the VPLS network and
because the PE devices learn MAC addresses, traffic from site 1 to site 2 can be


transmitted directly between those two sites via a pseudo wire (PW). Unicast traffic
from site 1 to site 3 will also be transmitted directly via a PW for any discovered
MAC addresses and not flooded to all sites.
As shown in Figure 11-1, a customer may have a stretched data center network
consisting of a layer 2 network hosting virtual machines across 4 sites (datacenters).
Virtual machines across the 4 sites could be configured in the same subnet. A virtual
machine in site 1 will be able to communicate with a virtual machine in site 2
without any inter-VLAN routing.
Unicast communication between a virtual machine in site 1 and a virtual machine in
site 2 would also only traverse PE1 and PE2 and any intermediate core MPLS P
devices. The unicast traffic between the two sites is not sent to site 3 or site 4. This is
because of MAC address learning by the PE devices and the direct PW between PE1
and PE2.
As an additional example, this also applies for communication from Site 1 to Site 3.
Unicast traffic transmitted between virtual machines in Site 1 and 3 is contained to
those sites and is not visible in Site 2 or Site 4.
This behavior is consistent with the behavior of a transparent, layer 2 Ethernet switch
as represented in Figure 11-2. Logically, each site is connected to a large Ethernet
switch which performs MAC address learning, forwarding of unicast traffic to
specific ports only; and flooding of broadcast, multicast and unknown unicast traffic.
VPLS technology is implemented using two drafts (Martini and Kompella), but is
limited to Ethernet interfaces. Currently, the Martini mode is the most widely used
version.

Use Case: Interconnect for Multiple Data Centers
One of the use cases for VPLS is using a MPLS backbone to interconnect multiple
data centers at Layer 2, as shown in Figure 11-4.


Figure 11.4: Use Case: Interconnect for Multiple Data Centers

L2VPNs could be used in cases where two data centers are connected. This is
because L2VPNs provide only point-to-point connections and not point-to-multipoint
connection. If three or more data centers need to be connected in the same Layer 2
VLAN, VPLS would be required.
Technologies like VMware vMotion are only supported when the Hypervisors are
connected in the same Layer 2 VLAN (this changed in ESXi version 6). In this
example, virtual machines can be moved between any of the three remote data centers
across the MPLS backbone.
The core MPLS backbone may be using layer 3 connections, but from the point of
view of the hypervisors, they are on the same VLAN. OSPF or other protocols used
in the core determine the shortest path between data centers to provide an optimized
path.

MPLS VPLS Terminology


As shown in Figure 11-5, VPLS terminology is similar to L2VPNs, but with virtual
switch extensions.


Figure 11.5: MPLS VPLS Terminology

Pseudo wire (PW): A pseudo wire is a bidirectional virtual connection between two
PEs. An MPLS PW consists of two unidirectional MPLS LSPs in opposite
directions. In a VPLS solution, multiple pseudo wires are configured to provide
point-to-multipoint functionality.
Virtual switch instance (VSI): A virtual switch instance is a virtual switch within a
PE device operating like a traditional layer 2 switch. The VSI has physical local
interfaces on the PE device as well as virtual Ethernet interfaces (PW connections to
remote PEs). The VSI can dynamically learn MAC addresses from both the physical
and virtual interfaces. There is no central control plane MAC learning or MAC


address synchronization between PE devices. Each PE device performs local MAC


learning and acts independently of other PE devices. Broadcast, multicast and unicast
traffic are also processed independently by the local virtual switch.
VPLS instance: A VPLS instance is a grouping of local virtual switch instances
(VSIs) on multiple PE devices into one logical switch. The VPLS instance is created
per customer. The virtual switch backplane is not a crossbar or multi-bus backplane
as in traditional physical switches but is rather the MPLS core network.
From a customer point of view, the VPLS solution mimics a large layer 2 switch, but
in reality, a VPLS solution consisting of 4 PE devices actually consists of 4 virtual
switch instances working together. Each VSI is performing MAC learning
independently of other PE VSIs but together appear to be a single switch. Once again,
the logical grouping of all the VSIs is what we call a VPLS instance.
See Table 11-1 for VPLS terminology.
Table 11.1: VPLS Terminology
AC: Attachment circuit that connects the CE to the PE. It can use physical interfaces or virtual interfaces. Usually, all user packets on an AC, including Layer 2 and Layer 3 protocol messages, must be forwarded to the peer site without being changed.

CE: A customer edge device that is directly connected with the service provider network.

Encapsulation: Packets transmitted over a PW use the standard PW encapsulation formats and technologies: raw and tagged.

Forwarders: A forwarder functions as the VPLS forwarding table. Once a PE receives a packet from an AC, the forwarder selects a PW for forwarding the packet.

NPE: Network provider edge device that functions as the network core PE. An NPE resides at the edge of a VPLS network core domain and provides transparent VPLS transport services between core networks.

PE: A provider edge device connects one or more CEs to the service provider network. A PE implements VPN access by mapping and forwarding packets between private networks and public network tunnels. A PE can be a UPE or NPE in a hierarchical VPLS.

PW: A pseudo wire is a bidirectional virtual connection between two PEs. An MPLS PW consists of two unidirectional MPLS LSPs in opposite directions.

PW signaling: The PW signaling protocol is fundamental to VPLS. It is used for creating and maintaining PWs and to automatically discover the VSI peer PE. Currently, there are two PW signaling protocols: LDP and BGP.

QinQ: 802.1Q in 802.1Q, a tunneling protocol based on 802.1Q. It offers a point-to-multipoint L2VPN service mechanism. With QinQ, the private network VLAN tags of packets are encapsulated into the public network VLAN tags, allowing packets to be transmitted with two layers of tags across the service provider network. This provides a simpler Layer 2 VPN tunneling service.

QoS: Quality of service (QoS) is implemented by mapping the preference information in the packet header to the QoS preference information transferred on the public network.

Route distinguisher (RD): An RD is added before a site ID to distinguish the sites that have the same site ID but reside in different VPNs. An RD and a site ID uniquely identify a VPN site.

Route target (RT): PEs use the BGP route target attribute (also called "VPN target" attribute) to manage BGP L2VPN information advertisement. PEs support the following types of route target attributes:
Export target attribute: When a PE sends L2VPN information (such as site ID, RD, and label block) to the peer PE in a BGP update message, it sets the route target attribute in the update message to the export target.
Import target attribute: When a PE receives an update message from the peer PE, it checks the route target attribute in the update message. If the route target value matches an import target, the PE accepts the L2VPN information in the update message.
Route target attributes determine which PEs can receive L2VPN information, and from which PEs a PE can receive L2VPN information.

Tunnel: A tunnel can be an LSP tunnel or an MPLS TE tunnel. It carries one or more PWs over an IP/MPLS backbone. If a PW is carried on an LSP or MPLS TE tunnel, each packet on the PW carries an inner VC label, which makes sure the packet is forwarded to the correct VSI, and an outer label, the public LSP or MPLS TE tunnel label, which makes sure the packet is correctly forwarded to the remote PE.

UPE: User facing provider edge device that functions as the user access convergence device.

VSI: A virtual switch instance provides Layer 2 switching services for a VPLS instance on a PE. A VSI acts as a virtual switch that has all the functions of a conventional Ethernet switch, including source MAC address learning, MAC address aging, and flooding. VPLS uses VSIs to forward Layer 2 packets in VPLS instances.

VPLS instance: A customer network might include multiple geographically dispersed sites (such as site 1 and site 3 in Figure 11-1). The service provider uses VPLS to connect all the sites to create a single Layer 2 VPN, which is referred to as a "VPLS instance." Sites in different VPLS instances cannot communicate with each other at Layer 2.

MPLS VPLS Control Protocols


Overview
A pseudo wire is a bidirectional virtual connection between two PEs. PEs use PWs
to forward packets among VPN sites. PWs include static PWs, LDP PWs, BGP PWs,
and BGP auto-discovery LDP PWs.
The two dynamic VPLS signaling protocols are LDP and MP-BGP. LDP signaling is
used for transmitting VC information and conforms to RFC 4762. In LDP signaling
mode, PE peers need to be manually specified. MP-BGP signaling conforms to RFC
4761. MP-BGP can also be used as the signaling protocol for transmitting VC
information, but supports automatic topology discovery.
PWs can be established on an MPLS tunnel (a common LSP) or a GRE tunnel. For a
PW to be established, you need to complete the following:
Establish an MPLS tunnel between the local end and the remote peer PE.
Determine the address of the peer PE. If the peer PE is in the same VSI as the
local PE, you can specify the address of the peer PE manually, or let the signaling
protocol find the peer PE automatically.
Use either the LDP or BGP signaling protocols to assign multiplex distinguishing
flags (that is, VC labels) and advertise the assigned VC flags to the peer PE,
establish unidirectional VCs and further establish a PW. If a PW is established on
an MPLS tunnel, a packet transported over the PW will contain two levels of
labels. The inner label, called a VC label, identifies the VC to which the packet
belongs so that the packet is forwarded to the correct CE; while the outer label,
called the public network MPLS tunnel label, is for guaranteeing the correct
transmission of the packet on the MPLS tunnel.


This study guide focuses on the LDP implementation method. The LDP
implementation of VPLS uses extended LDP (remote LDP sessions) as the PW
signaling protocol and is called Martini VPLS. This method is easy to implement
when compared to MBGP.
However, as LDP does not provide an automatic VPLS member discovery
mechanism, each peer PE requires manual configuration, and every PE needs to be
reconfigured whenever a new PE joins.

Figure 11.6: MPLS VPLS Control Protocols

As shown in Figure 11-6, a PW is established between two PE using LDP as


follows:
1. After being associated with a VSI, each PE uses LDP in downstream unsolicited
(DU) mode to send a label mapping message to its peer PE (without
solicitation). The message contains the PW ID FEC, the VC label bound with the
PW ID FEC, and interface settings such as maximum transmission unit (MTU).
2. Upon receiving the LDP message, a PE determines whether it is associated with


the PW ID. If the association exists, the PE accepts the label mapping message
and responds with its own label mapping message.
3. After a unidirectional VC is established in each direction, the PW is formed. A
PW can be viewed as a virtual Ethernet interface of a VSI.

Implementation Methods
For reference purposes, a summary of the various PW implementation methods is
provided.

Static PW
To create a static PW, specify the address of the remote PE, the incoming label, and
the outgoing label.

LDP PW (Martini)
To create an LDP PW, specify the address of the remote PE, and use LDP to advertise
the PW-label binding to the remote PE. After the two PEs receive the PW-label
binding from each other, they establish an LDP PW. The FEC type in the LDP
message is PW ID FEC Element that includes the PW ID field (FEC 128). The PW
ID identifies the PW bound to the PW label.
Notes:
Two PEs establish a neighborhood with each other via the extended LDP. They
directly send LDP messages over TCP connections, maintain a remote LDP
session, and exchange VPN control information via the LDP session, including
PW label allocation (the PW label is equivalent to a private network label in a
L3 VPN).
A PE and a P still need to establish a common LDP neighborhood with each other
so as to allow for public network MPLS label allocation.
A PE establishes a Virtual Switch Instance (VSI) for each VPN. Each VSI has an
ID.
A bidirectional pseudo wire (PW), consisting of a pair of unidirectional VCs, is established for each VPN
between two PEs. A label is allocated to each PW via the extended LDP. This
label is encapsulated in the transmitted packet so as to distinguish VPNs.

BGP PW
To create a BGP PW, BGP advertises label block information to the remote PE. After


the two PEs receive label block information from each other, they use the label block
information to calculate the incoming and outgoing labels and create the BGP PW. A
PE also uses the received label block information to automatically find the remote
PE.
Notes:
Two PEs establish a peering relationship with each other via MBGP. They are
added with a VPLS family and exchange VC signaling via BGP. The VPLS in this
case is also called Kompella VPLS.
A PE establishes a VSI for each VPN.
The Kompella VPLS is somewhat similar to common MPLS L3 VPN. An RT and
an RD also need to be configured for each VSI.

BGP Auto-discovery LDP PW


To create a BGP auto-discovery LDP PW, a PE uses BGP to automatically find the
remote PE, and uses LDP to advertise the PW-label binding to the remote PE. After
the two PEs receive the PW-label binding from each other, they establish a BGP
auto-discovery LDP PW.
The information advertised by BGP includes the ID (for example, LSR ID) and VPLS
ID of the advertising PE. The receiving PE compares the received VPLS ID with its
own VPLS ID. If the two VPLS IDs are identical, the two PEs use LDP to establish a
PW. If not, the PEs do not establish a PW. The FEC type in the LDP message is
Generalized PW ID FEC Element (FEC 129), which contains the VPLS ID, Source
Attachment Individual Identifier (SAII), and Target Attachment Individual Identifier
(TAII). The SAII is the LSR ID of the advertising PE. The TAII identifies the remote
PE and is advertised by the remote PE. VPLS ID+SAII+TAII uniquely identifies a
PW in a VPLS instance.

MPLS VPLS Martini Architecture


VPLS uses the same principles as L2VPNs, but supports point-to-multipoint
connections using multiple L2VPN connections.
When using the Martini implementation method, manual configuration of remote PE
peers is required. VC labels are exchanged between peers using extended LDP. For
label exchange and user packet forwarding to take place correctly, all PE members of
the VPLS instance need to be configured in a full mesh.
Each L2VPN configured to a remote PE peer acts as a virtual Ethernet interface in the


virtual switch instance on the local PE.


In Figure 11-7, four PE devices are configured to provide VPLS functionality to four
customer sites.

Figure 11.7: MPLS VPLS Martini Architecture

On each PE, the following exists:


1 x virtual switch instance (VSI) which learns MAC addresses from connected
interfaces.


1 x local physical interface to CE devices (AC). This is added to the VSI using a
service instance.
3 x virtual Ethernet interfaces which are the pseudo wires (PWs) connecting the
local PE to the 3 remote PE devices. These are L2VPN connections which are
terminated on the virtual switch instance rather than on the physical interface to
the CE device. This is in contrast to L2VPNs where the L2VPN was terminated
on the physical interface.
The result is that the VSI is a 4 port layer 2 Ethernet switch. MAC addresses need to
be learnt in a similar way to a traditional Ethernet switch. Broadcast, multicast and
unknown unicast traffic arriving from either the physical interface or virtual
interfaces will also need to be processed and forwarded out of appropriate
interfaces. These mechanisms are discussed next.

MPLS VPLS Martini Architecture - MAC Learning
MAC Address Learning, Aging, and Withdrawal
VPLS provides connectivity through source MAC learning. A PE device maintains a
MAC address table for each VSI. As shown in Figure 11-8, a PE learns source MAC
addresses in the following ways:
Learning the source MAC addresses of directly connected sites: If the source
MAC address of a packet from a CE does not exist in the MAC address table, the
PE learns the source MAC address on the AC connected to the CE.
Learning the source MAC addresses of remote sites connected through PWs: A
VSI regards a PW as a logical Ethernet interface. If the source MAC address of a
packet received from a PW does not exist in the MAC address table, the PE
learns the source MAC address on the PW of the VSI.


Figure 11.8: MPLS VPLS Martini Architecture - MAC Learning

If no packet is received from a MAC address before the aging timer expires, VPLS
deletes the MAC address to save MAC address table resources.
When an AC or a PW goes down, the PE deletes MAC addresses on the AC or PW
and sends an LDP address withdrawal message to notify all other PEs in the VPLS
instance to delete those MAC addresses.

Unicast Traffic Forwarding and Flooding


After a PE receives a unicast packet from an AC, the PE searches the MAC address
table of the VSI bound to the AC to determine how to forward this packet.
If a match is found, the PE forwards the packet according to the matching entry. If
the outgoing interface in the entry is a PW, the PE inserts the PW label to the
packet, adds the public tunnel header to the packet, and then forwards the packet
to the remote PE over the PW. If the outgoing interface in the entry is a local
interface, the PE directly forwards the packet to the local interface.


If no match is found, the PE floods the packet to all other ACs and PWs in the VSI.
After a PE receives a unicast packet from a PW, the PE searches the MAC address
table of the VSI bound to the PW to determine how to forward this packet.
If a match is found, the PE forwards the packet through the egress interface in the
matching entry.
If no match is found, the PE floods the packet to all ACs in the VSI.

Multicast and Broadcast Traffic Forwarding and Flooding
After a PE receives a multicast or broadcast packet from an AC, the PE floods the
packet to all other ACs and the PWs in the VSI bound to the AC.
After a PE receives a multicast or broadcast packet from a PW, the PE floods the
packet to all ACs in the VSI bound to the PW.

Example and Warning:


In Figure 11-8, if PE1 receives broadcast traffic from the locally connected CE
device, the virtual switch instance will flood the broadcast out of both virtual
interfaces to PE2 and PE3. A single packet is replicated twice as the VSI floods
traffic in the same way a traditional layer 2 switch does.
If the topology was extended to a scenario where 20 sites are connected using VPLS, a
single broadcast packet received by PE1 would be replicated 19 times. If a multicast
video was being streamed by a CE at 1 Mbps, that would result in 19 Mbps on the
uplink interface of PE1 to the core MPLS network. This principle applies to
multicast, broadcast and unknown unicast.

MPLS VPLS Martini Architecture - Loop Prevention
Overview
Even though the customer views the VPLS network as one large switch, internally
multiple layer 2 switches are connected using layer 2 interfaces. Each PE is
configured as a L2 switch and is connected in a full mesh to other PE L2 switches.
Therefore loops need to be considered and prevented.


In the topology in Figure 11-9, four VSIs are configured with a full mesh of pseudo
wires. Each VSI is acting as a layer 2 switch. In a traditional layer 2 switched
network, this would cause loops because a broadcast received by a traditional L2
switch is flooded out of all interfaces except the interface on which it arrives. In a
looped topology, this may result in a broadcast storm. In traditional switched
networks, protocols like spanning tree (STP) are used to prevent loops. However,
enabling STP on a service provider network is not feasible because of scalability
and slow convergence issues. Therefore, VPLS uses the following methods to
prevent loops:
Full mesh - PEs and PWs are logically fully meshed. Each PE must create, for each
VPLS forwarding instance, a tree to all the other PEs of the instance.
Split horizon - Each PE must support split horizon to avoid loops, that is, a PE
cannot forward packets via PWs of the same VSI instance. In other words, a PE
does not forward packets received from a PW to any other PW in the same VSI
but only forwards those packets to ACs.
Note
If a network administrator does not configure a full mesh between PE devices,
some CE sites will not be able to communicate with each other.


Figure 11.9: MPLS VPLS Martini Architecture - Loop Prevention

MPLS VPLS Design Considerations


A few design considerations to keep in mind when configuring MPLS VPLS:

MPLS Backbone

VPLS may be a candidate solution for scenarios where multiple data centers need to
be interconnected. However, VPLS requires that the links between the data centers be
MPLS enabled as VPLS uses a virtual circuit label to identify the VPN and an LSP to
reach the destination PE device.
If the customer has routed connections with a service provider MPLS network, VPLS
also cannot be used, because VPLS requires L2 connections on the PE interfaces.
Note
Technically, it is possible to create GRE tunnels on certain CE devices and
then enable MPLS on the GRE tunnels. This however complicates the
configuration.

Layer 3 Routing
Another consideration is that VPLS PE devices can transport frames received from
the CE device, but the PE devices do not interact with the CE devices. As an
example, it is not possible to configure IP addresses on the PE customer facing
interfaces. The virtual switch configured on the PEs provides L2 switching only to
remote sites and no L3 IP services or IP gateway functionality.
If IP gateway functionality is required, or VRRP is required inside the VPLS
instance, another device has to be used to provide this L3 gateway functionality.
This is achieved by either adding a separate physical device to the PE, or by using a
local back-to-back connection to another interface on the PE configured with the
required IP addresses and routing. Even though the same device is used for the VSI
and routing functionality, logically the VSI interfaces are configured for L2 only and
see the L3 interface as an external device.

MPLS VPLS Martini Configuration Steps


VPLS relies on a working MPLS backbone. Therefore, before configuring VPLS,
ensure that a core MPLS network is operational.
The following is an overview of basic VPLS configuration steps (completed on PE
devices):

1. Enable L2VPN globally. This is the same configuration as the configuration used for L2VPNs.
2. Configure the virtual switch instances (create the virtual switch).
3. Configure remote LDP peers. An L2VPN is created for each LDP peer, which will operate as a virtual interface on the VSI.
4. Configure the local physical interface via a service instance.
5. Bind the service instance to the VSI. The virtual switch will at this point have both virtual interfaces (step 3) and physical interfaces (step 4) associated to it.
6. Verify the configuration.

Step 1: Configure Basic MPLS and LDP


The first step is to configure basic MPLS and LDP. The configuration was explained
in Chapter 9 and is thus not repeated here.
Before VPLS is configured, it is assumed that the backbone infrastructure has been
configured:
IP routing is configured with an IGP such as OSPF.
Loopback addresses are configured and are being advertised.
Basic MPLS and LDP are configured on all backbone facing interfaces.
Any target PE loopback IP addresses are reachable via an LSP. Ensure for
example that PE1 is able to reach PE2 via a unique LSP and that PE2 can reach
PE1 using a different LSP.
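
As a reminder only (the full procedure is in Chapter 9), a minimal sketch of these backbone prerequisites on a PE might look as follows; the loopback address 10.0.0.1 and interface Ten-GigabitEthernet 1/0/2 are assumed example values, and the IGP configuration that advertises the loopback is omitted:
<PE1> system-view
[PE1] interface loopback 0
[PE1-LoopBack0] ip address 10.0.0.1 32
[PE1-LoopBack0] quit
[PE1] mpls lsr-id 10.0.0.1
[PE1] mpls ldp
[PE1-ldp] quit
[PE1] interface ten-gigabitethernet 1/0/2
[PE1-Ten-GigabitEthernet1/0/2] mpls enable
[PE1-Ten-GigabitEthernet1/0/2] mpls ldp enable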

Step 2: Configure Global L2VPN


Before configuring L2VPNs, ensure that the following has been completed:
The LSR ID for the PE has been configured using the mpls lsr-id command.
MPLS has been enabled on the backbone MPLS interface using the mpls enable command.

Once basic MPLS has been configured and tested, configure L2VPNs.
You must enable L2VPN before configuring other L2VPN settings. This command
applies to MPLS L2VPNs, VPLS and SPB.
As shown in Figure 11-10, use the l2vpn enable command to enable L2VPN.


Use the undo l2vpn enable command to disable L2VPN.


L2VPNs are disabled by default.

Figure 11.10: Step 2: Configure Global L2VPN

Syntax
l2vpn enable
undo l2vpn enable

To enable L2VPN, follow the steps in Table 11-2.


Table 11.2: Steps to enable L2VPN
Step                     Command         Remarks
1. Enter system view.    system-view
2. Enable L2VPN.         l2vpn enable    By default, L2VPN is disabled.
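
For example, a minimal sketch of this step on a PE (the prompt name is arbitrary):
<PE1> system-view
[PE1] l2vpn enable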

Step 3: Configure the Virtual Switch Instance


Overview
The next step is to create a virtual switch instance (VSI). Local physical ports and
remote virtual ports will then be bound to the VSI to create the virtual switch. The
local physical ports are bound to the VSI using a service instance. The virtual ports
are represented by pseudo wires (PWs) to remote PE devices, which are configured
on the same VPLS Instance.
The VSI is a local configuration object and has no member ports by default. Multiple
control protocols are available for use with the VSI. This study guide focuses on the
LDP Martini method.
In Figure 11-11, a virtual switch instance is created for a data center interconnect
(named dcic). The VSI name is locally significant.


Figure 11.11: Step 3: Configure the Virtual Switch Instance

Guidelines
Use the vsi command to create a VSI and enter VSI view. If the specified VSI
already exists, you enter the VSI view directly.
Use the undo vsi command to remove a VSI.

You can create multiple LDP, BGP, and static PWs for a VSI.

Syntax
vsi vsi-name
undo vsi vsi-name
vsi-name

Name of the VSI instance, a case-insensitive string of 1 to 31 characters. Hyphens (-)


are not allowed.

Examples
Create a VSI named vpls1 and enter VSI view.
<Sysname> system-view
[Sysname] vsi vpls1
[Sysname-vsi-vpls1]

Step 4: Configure the VSI Remote LDP Peers


Overview
In this step, the virtual interfaces are created by specifying the remote LDP PE peers
and associated pseudo wire IDs. As mentioned previously, this study guide focuses
on the LDP configuration method and hence the pwsignal is set to use LDP. Other
valid methods to configure the pseudo wires include Kompella which uses MBGP
and static configuration.
The switch virtual ports are added to the virtual switch instance by specifying PW


IDs. The PW IDs could be the same for all peer connections within a VSI or use
unique values. The PW ID values must match on both PE devices on either end of the
L2VPN (peer command). The IP addresses specified in the peer commands are the
loopback IP addresses of the remote PE devices and must be reachable via an MPLS
LSP.
In Figure 11-12, the peer command specifies the loopback IP address of PE2 and a
pseudo wire ID of 1001. On PE2, a peer command would be configured with the IP
address of PE1 and the same PW ID of 1001.

Figure 11.12: Step 4: Configure the VSI Remote LDP Peers

In this example, the same PW ID is used for the L2VPN to 10.0.0.3, but a different
number could have been specified. On PE3, a peer command would also be
configured with the IP address of PE1 and a matching PW ID of 1001.
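
Combining Steps 3 and 4, a minimal sketch of the VSI and peer configuration on PE1 might look as follows. The VSI name dcic, the PW ID 1001, and the PE3 loopback 10.0.0.3 come from the figures; the PE2 loopback of 10.0.0.2 is an assumed value used only for illustration:
<PE1> system-view
[PE1] vsi dcic
[PE1-vsi-dcic] pwsignal ldp
[PE1-vsi-dcic-ldp] peer 10.0.0.2 pw-id 1001
[PE1-vsi-dcic-ldp] peer 10.0.0.3 pw-id 1001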

pwsignal command
Use the pwsignal command to specify the PW signaling protocol for VPLS to use,
and enter VSI LDP view (Martini mode) or VSI BGP view (Kompella mode).
pwsignal { bgp | ldp }
bgp

Specifies to use BGP signaling (Kompella mode).


ldp

Specifies to use LDP signaling (Martini mode).

Examples
Specify that VPLS instance aaa uses the connection mode of Martini and enter VSI
LDP view.


<Sysname> system-view
[Sysname] vsi aaa
[Sysname-vsi-aaa] pwsignal ldp
[Sysname-vsi-aaa-ldp]

peer command (VSI LDP view)


Use the peer command to create a peer PE for a VPLS instance.
Use the undo peer command to remove a peer PE.
With the hub-spoke feature for a VPLS instance, you can specify the connection mode
of the peer PE as hub or spoke.
peer ip-address [ { hub | spoke } | pw-class class-name | [ pw-id pw-id ] [ upe | backup-peer ip-address [ backup-pw-id pw-id ] ] ]
undo peer ip-address
ip-address

IP address of the remote VPLS peer PE.


hub

Specifies the peer PE as the hub.


spoke

Specifies the peer PE as a spoke. This is the default when the hub-spoke
feature is enabled for the instance.
pw-class class-name

References a PW class template. class-name represents the template


name, a case-insensitive string of 1 to 19 characters.
pw-id pw-id

ID of the PW to the VPLS peer PE, in the range of 1 to 4294967295.


upe

Specifies that the peer PE is a UPE in the H-VPLS model.


backup-peer ip-address

Specifies the IP address of the backup NPE. If you specify this


parameter, you create a primary NPE and a backup NPE on the UPE.


backup-pw-id pw-id

Specifies the ID of the PW to the backup NPE. The pw-id argument is in


the range of 1 to 4294967295, and the default is the VSI ID.

Examples
Create a peer PE, which is of the UPE type, with the IP address of 4.4.4.4 and the
PW ID of 200.
<Sysname> system-view
[Sysname] vsi aaa
[Sysname-vsi-aaa] pwsignal ldp
[Sysname-vsi-aaa-ldp] peer 4.4.4.4 pw-id 200 upe

Create a primary peer PE 1.1.1.1 and a backup peer PE 2.2.2.2, and set the PW ID for
the primary peer to 300 and the PW ID for the backup peer to 400.
<Sysname> system-view
[Sysname] vsi aaa
[Sysname-vsi-aaa] pwsignal ldp
[Sysname-vsi-aaa-ldp] peer 1.1.1.1 pw-id 300 backup-peer 2.2.2.2 backup-pw-id 400

Step 5: Configure a Service Instance


Overview
In this step traffic arriving on the local physical interface is matched to a virtual
service instance. The service instance is a locally significant number. This is
configured on the PE device and applied to the customer facing interface.
In Figure 11-13, service-instance 10 is applied to interface Ten-GigabitEthernet
1/0/1. Instead of selecting all arriving traffic on the interface for transmission to other
sites, a network administrator can specify that only certain customer traffic types are
selected for transmission via VPLS. Like with L2VPNs, in the simplest scenario, all
arriving traffic is selected for transmission. Other options include specifying an
individual VLAN only or tagged traffic only. The encapsulation command is used to
specify matched traffic. In Figure 11-13, VLAN 10 tagged traffic is selected for
transmission.


Figure 11.13: Step 5: Configure a Service Instance

The service instance ID is a locally significant number and can be different at each
VPLS site. The instance ID also does not need to match the VLAN ID.
Traffic from either physical interfaces or bridge aggregation interfaces can be
matched to a VSI. If the PE device is configured as part of an IRF system, the IRF
system can be configured with a bridge aggregation group on the customer-facing
interfaces (the customer device could itself also be an IRF system). Traffic selection
can be based on the bridge aggregation group rather than physical interfaces in this
example. This provides additional redundancy at the PE level.

Service-instance command
Use the service-instance command to create a service instance and enter service
instance view.
Use the undo service-instance command to delete an existing service instance.

By default, no service instance is created.


The service instances created on different Layer 2 Ethernet interfaces can have the
same service instance ID.
service-instance service-instance-id
undo service-instance service-instance-id
service-instance-id

Specifies the ID of the service instance, in the range of 1 to 4096.

Example
On Layer 2 interface Gigabit Ethernet 3/0/3, create service instance 100 and enter its
view.
<Sysname> system-view
[Sysname] interface GigabitEthernet 3/0/3
[Sysname-GigabitEthernet3/0/3] service-instance 100


[Sysname-GigabitEthernet3/0/3-srv100]

Encapsulation command (service instance view)


Use the encapsulation command to configure a packet matching rule for the current
service instance.
Use the undo encapsulation command to remove the packet matching rule of the
current service instance.
By default, no packet matching rule is configured for a service instance.
You can choose only one of the following match criteria for a service instance:
Match all incoming packets.
Match incoming packets with any VLAN ID or no VLAN ID.
Match incoming packets with a specific VLAN ID.
The match criteria for different service instances configured on an interface must be
different.
You can create multiple service instances on a Layer 2 Ethernet interface, but only
one service instance can use the default match criteria (encapsulation default) to
match packets that do not match any other service instance. If only one service
instance is configured on an interface and the service instance uses the default match
criteria, all packets received on the interface match the default match criteria.
This command cannot be executed multiple times for a service instance.
Removing the match criteria for a service instance also removes the association
between the service instance and the VSI.

Syntax
encapsulation default
encapsulation { tagged | untagged }
encapsulation s-vid vlan-id [ only-tagged ]
undo encapsulation
default

Specifies the default match criteria.


s-vid vlan-id


Matches packets with a specific outer VLAN ID. The vlan-id argument
specifies a VLAN ID in the range of 1 to 4094.
only-tagged

Matches only tagged packets. If this keyword is not specified when the
matching VLAN is the default VLAN, packets with the default VLAN ID
or without any VLAN ID are all matched. If this keyword is specified
when the matching VLAN is the default VLAN, only packets with the
default VLAN ID are matched.
tagged

Matches tagged packets.


untagged

Matches untagged packets.

Example
Configure service instance 1 on Layer 2 Ethernet interface Ten-GigabitEthernet 1/0/1
to match packets that have an outer VLAN ID of 111.
<Sysname> system-view
[Sysname] interface ten-gigabitethernet 1/0/1
[Sysname-Ten-GigabitEthernet1/0/1] service-instance 1
[Sysname-Ten-GigabitEthernet1/0/1-srv1] encapsulation s-vid 111

Step 6: Bind the Service Instance to the VSI


Overview
Once a service instance has been defined, the service instance can be bound to a
virtual switch instance (VSI). The cross connect object is not a global object within
VPLS, but is configured within the service instance.
In Figure 11-14, any traffic arriving from VLAN 10 will be cross connected to a VSI
dcic (data center interconnect). VLAN 10 traffic was previously associated with
service instance 10 in Figure 11-13. This is how a new physical port is added to the
virtual switch. The VSI will now learn MAC addresses as they arrive from VLAN 10
on interface ten-gigabitethernet1/0/1.


Figure 11.14: Step 6: Bind the Service Instance to the VSI

A network administrator could configure another service instance with a different
number, such as 11, on the same interface. The new service instance could be
configured to match VLAN 11 traffic arriving on the same interface, but cross connect
that traffic to a different VSI. In other words, traffic tagged with VLAN ID 11 would
be bound to a different virtual switch instance than the VLAN 10 traffic.
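For example, the following sketch (assuming a second VSI named dcic2 has already
been created) places two service instances on the same interface, each matching a
different customer VLAN and binding it to a different VSI:
<Sysname> system-view
[Sysname] interface ten-gigabitethernet 1/0/1
[Sysname-Ten-GigabitEthernet1/0/1] service-instance 10
[Sysname-Ten-GigabitEthernet1/0/1-srv10] encapsulation s-vid 10
[Sysname-Ten-GigabitEthernet1/0/1-srv10] xconnect vsi dcic
[Sysname-Ten-GigabitEthernet1/0/1-srv10] quit
[Sysname-Ten-GigabitEthernet1/0/1] service-instance 11
[Sysname-Ten-GigabitEthernet1/0/1-srv11] encapsulation s-vid 11
[Sysname-Ten-GigabitEthernet1/0/1-srv11] xconnect vsi dcic2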

Guidelines
Use the xconnect vsi command to bind a Layer 3 interface or a service instance to
a VSI.
Use the undo xconnect vsi command to remove the binding.
By default, a service instance is not bound to any VSI.


After you bind a Layer 3 interface to a VSI, packets received from the interface are
forwarded according to the MAC address table of the VSI. After you bind a service
instance on a Layer 2 interface to a VSI, packets received from the interface and
matching the service instance are forwarded according to the MAC address table of
the VSI.
The access mode determines how the PE considers the VLAN tag in Ethernet frames
received from the AC and how the PE forwards Ethernet frames to the AC.
VLAN access mode: Ethernet frames received from the AC must carry a VLAN
tag in the Ethernet header. The PE considers the VLAN tag as a P-tag assigned by
the service provider. Ethernet frames sent to the AC must also carry the P-tag.
Ethernet access mode: If Ethernet frames from the AC have a VLAN tag in the
header, the PE considers it as a U-tag and ignores it. Ethernet frames sent to the
AC do not carry the P-tag.
Before you configure this command for a service instance, make sure you have
configured match criteria for the service instance by using the encapsulation
command.
The xconnect vsi command is available for service instances with the ID in the
range of 1 to 4094.

Syntax
xconnect vsi vsi-name [ access-mode { ethernet | vlan } | { hub | spoke } ]
undo xconnect vsi
vsi-name

Name of a VPLS instance, a case-insensitive string of 1 to 31 characters.


access-mode

Specifies the AC access mode. By default, the access mode is VLAN.


ethernet

Specifies the access mode as Ethernet.


vlan

Specifies the access mode as VLAN.

Example
Configure service instance 200 on Layer 2 interface Ten-GigabitEthernet 1/0/1 to
match packets with an outer VLAN tag of 200, and bind the service instance to the
VSI vpn1.
<Sysname> system-view
[Sysname] vsi vpn1
[Sysname-vsi-vpn1] quit
[Sysname] interface ten-gigabitethernet 1/0/1
[Sysname-Ten-GigabitEthernet1/0/1] service-instance 200
[Sysname-Ten-GigabitEthernet1/0/1-srv200] encapsulation s-vid 200
[Sysname-Ten-GigabitEthernet1/0/1-srv200] xconnect vsi vpn1

Step 7: Verify
Overview

Display commands can be used to verify VPLS configuration. These commands are
used on the PE devices and not the P or CE devices.
The display l2vpn pw verbose command is used to verify the L2VPN status of the
virtual switch instance (VSI). In Figure 11-15 the VSI used is dcic (data center
interconnect) and two L2VPN connections are configured, one to PE 10.0.0.2 and
another to PE 10.0.0.3.

Figure 11.15: Step 7: Verify

Both Link ID 8 (Peer 10.0.0.2) and Link ID 9 (Peer 10.0.0.3) currently have status of
up (State: Up).
LDP detects the loss of a connection to a remote PE peer by using LDP keep-alives.
The state of the PW would change to Down for a lost connection. All MAC
addresses that have been learned from that remote peer are flushed because the virtual
interface is down.

Syntax
Use the display l2vpn pw command to display L2VPN PW information.
Without the ldp and static keywords, this command displays both LDP PW and static PW
information.
display l2vpn pw [ vsi vsi-name ] [ protocol { bgp | ldp | static }
] [ verbose ]
vsi vsi-name

Displays L2VPN PW information for the VSI specified by its name, a


case-sensitive string of 1 to 31 characters. If no VSI is specified, the
command displays L2VPN PW information for all VSIs.


protocol

Specifies a signaling protocol. If no protocol is specified, this command


displays PWs created by all protocols.
bgp

Displays BGP PW information.


ldp

Displays LDP PW information, including PWs for FEC 128 (LDP PWs)
and FEC 129 (BGP auto-discovery LDP PWs).
static

Displays static PW information.


verbose

Displays detailed information. Without this keyword, the command


displays brief information.

display l2vpn pw
Display brief information about all PWs.

See Table 11-3 for the output description.


Table 11.3: display l2vpn pw output description


Field            Description

PW ID/Rmt Site   This field displays:
                 The PW ID for an LDP PW (FEC 128) or a static PW.
                 "-" for a BGP auto-discovery LDP PW (FEC 129).
                 The remote site ID for a BGP PW.

Proto            Protocol used to establish the PW: LDP, Static, or BGP.

Flag             PW flag:
                 M - Primary PW.
                 B - Backup PW.
                 H - The PW is the hub link in the VPLS hub-spoke network. This
                 value is not supported in the current software version and is
                 reserved for future support.
                 S - The PW is a spoke link in the VPLS hub-spoke network. This
                 value is not supported in the current software version and is
                 reserved for future support.
                 N - Split horizon forwarding is disabled.

Link ID          Link ID of the PW in the VSI.

State            PW state: Up, Down, Blocked, or BFD Defect. Blocked indicates
                 that the PW is blocked. BFD Defect indicates BFD has detected a
                 defect on the PW.

display l2vpn pw verbose


Display detailed information about all L2VPN PWs.
<Sysname> display l2vpn pw verbose
VSI Name: aaa
  Peer: 2.2.2.9          Remote Site: 2
    Signaling Protocol  : BGP
    Link ID             : 9          PW State : Up
    In Label            : 131120     Out Label: 131119
    MTU                 : 1500
    PW Attributes       : Main
    VCCV CC             : -
    VCCV BFD            : -
    Tunnel Group ID     : 0x1800000960000000
    Tunnel NHLFE IDs    : 138
  Peer: 3.3.3.9          Remote Site: 3
    Signaling Protocol  : BGP
    Link ID             : 10         PW State : Up
    In Label            : 131121     Out Label: 131181
    MTU                 : 1500
    PW Attributes       : Main
    VCCV CC             : -
    VCCV BFD            : -
    Tunnel Group ID     : 0x1800000160000001
    Tunnel NHLFE IDs    : 130
VSI Name: bbb
  Peer: 2.2.2.9          VPLS ID: 100:100
    Signaling Protocol  : LDP
    Link ID             : 8          PW State : Up
    In Label            : 131153     Out Label: 131153
    MTU                 : 1500
    PW Attributes       : Main
    VCCV CC             : -
    VCCV BFD            : -
    Tunnel Group ID     : 0x1800000960000000
    Tunnel NHLFE IDs    : 138

See Table 11-4 for the output description:


Table 11.4: Display l2vpn pw verbose output description
Field                Description

Peer                 IP address of the peer PE to which the PW is destined.

Link ID              Link ID of the PW in the VSI.

PW State             PW state: Up, Down, Blocked, or BFD Defect. Blocked indicates
                     that the PW is blocked. BFD Defect indicates BFD has detected
                     a defect on the PW.

Wait to Restore      Wait time to switch traffic from the backup PW to the primary
Time                 PW when the primary PW recovers, in seconds. If the switchover
                     is disabled, this field displays Infinite.
                     This field is available when both primary and backup PWs
                     exist, and is displayed only for the primary PW.

Remaining Time       Remaining wait time for traffic switchover, in seconds. This
                     field is displayed after the switchover wait timer is started.

MTU                  Negotiated MTU of the PW.

PW Attributes        PW attribute:
                     Main - The PW is the primary PW.
                     Backup - The PW is the backup PW.
                     Hub link - The PW is the hub link in the VPLS hub-spoke
                     network. This value is not supported in the current software
                     version and is reserved for future support.
                     Spoke link - The PW is a spoke link in the VPLS hub-spoke
                     network. This value is not supported in the current software
                     version and is reserved for future support.
                     No-split-horizon - Split horizon forwarding is disabled.

VCCV CC              VCCV CC type:
                     Control-Word - Control word.
                     Router-Alert - MPLS router alert label.
                     TTL - TTL timeout.

VCCV BFD             VCCV BFD type:
                     Fault Detection with BFD - BFD packets use IP/UDP
                     encapsulation (with IP/UDP headers).
                     Fault Detection with Raw-BFD - BFD packets use PW-ACH
                     encapsulation (without IP/UDP headers).

Tunnel Group ID      ID of the tunnel group for the PW.

Tunnel NHLFE IDs     NHLFE ID of the public tunnel that carries the PW.
                     If equal-cost tunnels are available, this field displays
                     multiple NHLFE IDs.
                     If no tunnel is available, this field displays None.

VPLS ID              ID of the VPLS instance.

Remote Site          ID of the remote site.

Step 7: Verify (continued)


Use display l2vpn mac-address to display MAC address table information for VSIs,
as shown in Figure 11-16. This displays MAC addresses that the PE has learnt from
local interfaces and from remote sites via virtual interfaces.

Figure 11.16: Step 7: Verify (continued)

Syntax
display l2vpn mac-address [ vsi vsi-name ] [ dynamic ] [ count ]
vsi vsi-name

Displays MAC address table information for the VSI specified by its
name, a case-sensitive string of 1 to 31 characters. If no VSI is
specified, the command displays MAC address table information for all
VSIs.
dynamic

Displays dynamically generated MAC address entries. If this keyword is


not specified, the command displays all types of MAC address entries.
Currently, the device supports only dynamic MAC address entries.
count

Displays the number of the MAC address entries. If you do not specify
this keyword, the command displays detailed information about the
MAC address entries.

Example
Display MAC address table information for all VSIs.
<Sysname> display l2vpn mac-address
MAC Address       State      VSI Name      Link ID     Aging Time
0000-0000-000a    dynamic    vpn1                      Aging
0000-0000-0009    dynamic    vpn1                      Aging
--- 2 MAC address(es) found ---

Display the total number of MAC address entries of all VSIs.
<Sysname> display l2vpn mac-address count


2 MAC address(es) found

See Table 11-5 for the output description.


Table 11.5: Display l2vpn mac-address output description
Field                      Description

State                      MAC address type. Currently, the MAC address type can
                           only be dynamic, which indicates that the MAC address is
                           dynamically learned.

Link ID                    Outgoing link ID of the MAC address entry. It is the
                           link ID of the AC or PW in the VSI.

Aging Time                 Indicates whether the MAC address entry will be aged.

XX MAC address(es) found   Total number of MAC address entries of the VSI.

Summary
In this chapter you learned about VPLS, which is an extension of L2VPNs. VPLS
supports point-to-multipoint connections, whereas L2VPNs only support point-to-point
connections.
A VPLS network is perceived to be a large Layer 2 switch by CE devices.
Multiple implementation methods were discussed, including static configuration and
the use of dynamic protocols such as LDP and MBGP. The chapter focused on the
Martini method using extended LDP.
You learned how MAC addresses are learnt, aged, and withdrawn by virtual switch
instances (VSIs). Flooding and forwarding of unicast, broadcast, and multicast
traffic was discussed.
You learned about VPLS loop prevention mechanisms using a full mesh of PWs and
split horizon.
The configuration and verification of VPLS were also discussed.

Learning Check

Answer each of the questions below.


1. Which mechanism prevents loops in a VPLS core network?
a. STP
b. Split horizon
c. TTL
d. Partial mesh
e. OSPF
2. Which of the following is a bidirectional virtual connection between two PEs?
a. LSP
b. VSI
c. PW
d. NPE
e. VPLS instance
3. Which of the following is not permitted in VPLS?
a. Broadcast packet received on PW 1 transmitted out of AC.
b. Broadcast packet received on AC transmitted out of PW 1.
c. Broadcast packet received on AC transmitted out of PW 1 and PW 2.
d. Broadcast packet received on PW 1 transmitted out of PW 2.

Learning Check Answers


1. b
2. c
3. d


12 Data Center Network Design

EXAM OBJECTIVES
In this chapter, you learn to:
Describe requirements for a data center network design.
Describe different data center deployment models.
Understand various data center technologies and their impact on a design.
Describe the options for data center layers.
Understand the HP FlexFabric portfolio.

INTRODUCTION
This chapter provides an overview of data center design considerations. This
includes relating various design philosophies and objectives to specific technologies
for Layer 2 connectivity inside a data center, data center interconnects, Layer 3
services, storage protocols, and overlay technologies.

Key Drivers for a New Data Center


Infrastructure
When looking at data center topologies, there are several key business and
technology drivers that affect data center design choices.
Large scale data center consolidation is one such driver. As hosted solutions gain
popularity, data centers continue to grow in size, demanding new levels of
performance and scalability. Several smaller data centers are being consolidated into
larger facilities to improve economies of scale.


Multiple organizations are hosting business-critical applications in these large data


centers, and the expectation is for extremely reliable, continuously operating
services.
Due to high service volumes, and pressures to minimize space, power, and heating
requirements, many data centers will rely on blade servers to optimize server
deployments.
Server virtualization is a key element in all data center deployments. The ability to
host multiple Virtual Machines inside a single physical blade server further reduces
power, space, and cooling costs. These technologies can drastically increase the
flexibility and ease of initial deployments. Migrating virtual machines to new
physical servers is becoming much easier with tools like VMware's vMotion and
Microsoft's Live Migration.
New application deployment and delivery models are also driving data center
design. Where formerly one server may have performed all functions for an
application, there may now be a separate front-end server, a business logic server,
and a back-end database server.
In the single-server model, a client made a request, and a single server performed all
functions to service that request and then generate responses. Now, a similar client
request might be serviced by three servers, which must all communicate amongst
themselves before responding to that client request. These new models create
bandwidth intensive traffic flows, and require high-performance server-to-server
communications.
Meanwhile, Virtual Desktop Infrastructures (VDI) can concentrate many client
environments into a few hosted systems. This further increases the need for scalable,
reliable, high-performance data center infrastructure.

Data Center Deployment Models


Figure 12-1 provides an overview of four different data center deployment models.
Each model serves unique business requirements, with different characteristics and
objectives. This will lead to different technical priorities, which can ultimately be
met with a unique set of protocols, design choices, and implementation methods.


Figure 12-1: Data Center Deployment Models

These methods enable new cloud-based delivery models that drive a whole new set
of technology requirements across servers, storage, and networking domains. These
increasingly popular models let enterprises provision applications more flexibly
within traditional internal infrastructures, and enable hosted application and service
providers to build entire businesses based on delivering services via a public cloud
model. Given the range of use cases and options, customers often deploy a
combination of architectures to address varied requirements and to optimize
operations.

HP FlexFabric Use Cases in the Data Center


Figure 12-2 provides a high-level overview of how HP FlexFabric offerings can be
used in modern data center deployments.


Figure 12-2: HP FlexFabric Use Cases in the Data Center

Traditional 2-Tier networks use the classic, hierarchical tree paradigm that has been
in place for several decades. In this deployment, access switches like the 59xx-series
are single- or dual-homed to one or more core switches, such as the FlexFabric
12x00-series.
In a leaf/spine deployment, so-called leaf switches, such as the HP 59xx-series, form
the access layer. This access layer is fully meshed to a fabric of spine switches, such
as the FlexFabric 5930-series. Nearly everything is one hop away from everything
else. More advanced protocols replace STP in this design, so none of the links are
placed in a blocking state.
Overlay technologies create a virtual network that is built on top of another network.
The physical network could be a traditional 2-tier, or new leaf-spine designs. HP is
developing Software Defined Networking (SDN), which combines many separately
managed, physical network components into a single control plane.
VXLAN is an overlay technology that allows multiple clients to share a single data
center infrastructure. Layer 2 frames are encapsulated into an IP datagram with a 24-bit VXLAN ID, and routed over any Layer 3 infrastructure. Each of over 16 million
clients can have up to 4095 VLANs.
HP's advanced, 1-tier blade systems greatly simplify a virtualized network
deployment by incorporating traditional Top-of-Rack (ToR) switches inside the blade
enclosure. This could be in the form of a Virtual Connect module, or a 6125XLG,
which can be connected directly to the core, or to End-of-Row (EoR) Chassis
devices.
These models are based on the HP Comware operating system, which provides zero-cost licensing, IRF for redundancy, IMC management integration, and the capability to be
integrated into a software-defined fabric.

Requirements Overview
Typical data center design requirements include the following:
Virtualization improves efficiency, agility, and resiliency of data center network
operations
Multi-tenant support may not be a concern for private clouds, but is the main
reason many data centers exist: the core business is to host the infrastructure for
multiple organizations.
Multiple data centers are often required, either to add capacity or for
resiliency purposes. A natural disaster might strike a location on one continent, while
the redundant data center on another continent remains unaffected.
WAN connections are required to connect each corporate site or tenant to data
center facilities
Storage services are often centrally located for efficient server access and
streamlined communications and manageability.

Compliance with stringent, generally accepted security best practices and
procedures is vital for any type of networking environment. This is especially
true for multi-tenant data centers.

Impact of Requirements
Each requirement impacts the design and deployment choices you make. These are
summarized below:
Virtualization increases device density, which drives an increased need for
bandwidth. These virtual machines should be able to be hosted on any physical
device. This means that the associated Layer 2 domain or VLAN must be
available to any physical server. These VLANs will likely need to span across
physically separate data centers, especially to enable certain disaster recovery
scenarios.
Multi-tenant support requires a single infrastructure to support and isolate
multiple tenants. Hardware devices, software, and protocols must be deployed
that can meet this requirement.


When multiple Data centers are required for scalability and redundancy, they need
to be interconnected with links that support Layer 2 connectivity, while localizing
the impact of faults. For example, broadcast storms and STP loop errors at one
site should not affect operations at other sites.
WAN connections can be deployed using various technologies. The options used
can depend on what services are required, and whether the customer or provider
will manage the link. Perhaps one data center tenant may prefer a provider to
provision and manage some type of L2VPN service, while another tenant may
prefer to use some other technology.
Storage services are often deployed in data centers using some converged
technology. Both iSCSI and FCoE require specific service handling over Ethernet
systems. iSCSI requires special QoS configurations on the Ethernet fabric, while
FCoE requires specific lossless Ethernet services to be deployed. Of course, the
devices you deploy should be a part of a validated design, and support required
QoS or lossless Ethernet features.
Security at the network edge is challenged by virtualization. Specific technologies
may be required to maintain insight into Virtual Machine communication. This is
important for compliance monitoring and reporting.

Overview
This section will discuss data center technologies, including the following:
Customer Service models
Layer 2
Layer 3
Multi-tenant
Data center interconnect
Data center WAN connectivity
Storage
Security

Data Center Customer Service Models


Figure 12-3 provides a basic overview of three customer service models. A single
tenant model can be implemented as a traditional enterprise solution: a single data
center, using a single set of 4094 VLANs. A Layer 2 data center interconnect may be
required to access a single backup data center. A single Layer 3 routing service is
normally sufficient for a single enterprise.

Figure 12-3: Data Center Customer Service Models

When a data center need only support a limited number of tenants, it might be
possible to use a single data center VLAN space, allocating unique VLANs to each
tenant. Another alternative is to allocate a separate set of 4095 VLANs to each of the
tenants.
In this scenario, each tenant's Layer 2 DC interconnect should be isolated. A separate
Layer 3 routing service should be provided per tenant, either by the data center's
own core devices, or by separate data center routers.
For large multi-tenant data centers, a single VLAN space might suffice, if each tenant
only needs a few VLANs. A more flexible and scalable alternative is to provide each
tenant with their own VLAN space of 4094 isolated VLANs.
We will also need isolated Layer 2 DC interconnects and Layer 3 services for each
tenant. This could be provided by a physical DC routing solution, or by deploying
some type of network function virtualization.

Datacenter Customer Service Solutions


Figure 12-4 shows specific technologies that can be used for certain deployment
models. These technologies will be reviewed in the sections that follow.


Figure 12-4: Datacenter Customer Service Solutions

Intra Datacenter Layer 2 Services Overview


There are many options to provide Layer 2 connectivity inside the data center. These
include the following:
VLANs
VLANs with MDC
QinQ
VLANs with TRILL
TRILL with MDC
SPBM
SPBM with MDC
The deployment model to be implemented is a key factor in selecting which Layer 2
option to use. This chapter will review these technologies and relate their suitability
to different deployment models.

Layer 2: VLANs
Classic VLANs use a traditional Layer 2 isolation model, where there is a separate
MAC address table per VLAN. All HP data center switches provide support for the
standard 4094 concurrent VLANs. This can be sufficient for small to medium sized
data centers, especially if they are privately owned, for a single tenant. The
technology is simple and well-understood by the typical network administrator.
IRF is often deployed with traditional VLANs to improve redundancy at the access
and core layers. Link-aggregation is used between IRF systems for improved
bandwidth utilization and resiliency.
For improved scalability, VLANs can be deployed along with Multi-tenant Device
Context (MDC), as shown in Figure 12-5. This is suitable for a limited number of tenants,
since current-generation devices support a maximum of nine MDCs. Since one of
these is used for management, a maximum of eight tenants can be currently supported.
Since each MDC has exclusive access to its own set of advanced ASICs in
hardware, each tenant has a dedicated set of 4094 VLANs.

Figure 12-5: Layer 2: VLANs

Again, this solution is typically deployed in conjunction with IRF and link aggregation for improved scalability and redundancy.

Layer 2: QinQ

QinQ provides a more scalable technology for multi-tenant support. Each customer
has their own set of 4094 VLANs, using standard 802.1q tagging. An additional,
outer 802.1q tag is added to the original tenant frame, which is used to move frames
through a common data center infrastructure.
This outer tag is a standard 802.1q tag, and so also supports 4094 unique backbone
Service VLANs (S-VLANs), one for each tenant. Each S-VLAN can host an isolated
set of 4094 customer VLANs (C-VLANs). For a data center that intends to host fewer
than 4094 tenants, QinQ is a viable option that can be implemented with relative
ease. All HP data center switches support QinQ with 4094 concurrent VLANs.
Typically QinQ is deployed in combination with IRF at the server access and core
or aggregation layers. Link aggregation is used between the IRF systems to further
improve scalability and redundancy.
Although QinQ provides complete isolation between customers, shared data center
services can still be offered through the use of selective QinQ. This allows some
provider VLANs to be visible in several tenants' VLAN spaces. For example, C-VLAN
4001 can be assigned a unique, outer S-VLAN tag. This S-VLAN tag could be
associated with a tenant in order to provide some type of back-up service that is
deployed on the provider's VLAN 4001. All HP data center switches support
selective QinQ.
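As a minimal sketch only (the S-VLAN and interface numbers are hypothetical), a
customer-facing QinQ port could be configured so that all of that tenant's tagged
frames receive outer S-VLAN 100 while crossing the shared fabric:
<Sysname> system-view
[Sysname] vlan 100
[Sysname-vlan100] quit
[Sysname] interface ten-gigabitethernet 1/0/1
[Sysname-Ten-GigabitEthernet1/0/1] port access vlan 100
[Sysname-Ten-GigabitEthernet1/0/1] qinq enable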

Layer 2: TRILL
TRILL currently supports a single set of 4094 VLANs, and so does not improve
scalability over traditional VLANs in this regard. However, TRILL does provide
significant improvements in frame delivery methods. IRF and link-aggregation
provide a redundant active/active topology, but the use of STP means that most traffic
traverses the core devices. This is not optimal, especially for communications
between two access-layer switches. TRILL uses a shortest path unicast delivery
method that optimizes frame paths.
TRILL is also more efficient in its method of handling traffic over multiple equal cost
paths. This multi-device, multi-path load-balancing is based on an IP hashing
algorithm. For example, one hundred servers near one edge of the data center are in
VLAN 10. They communicate with one hundred servers at the other edge of the data
center, also in VLAN 10. This Layer 2 traffic can traverse multiple paths inside the
TRILL fabric, based on source/destination IP address pairs.
TRILL is often best when combined with the MDC feature, especially considering
that TRILL devices only provide Layer 2 services. Any Layer 3 service must be
connected through a TRILL access port. With MDC, one context can be used to host
TRILL fabric layer 2 services, while another can be used for the layer 3 IP service.
The alternative is to use separate devices for each service function.

Layer 2: SPBM
SPBM uses an I-SID number to uniquely identify up to 16 million tenants, each with
its own set of 4094 VLANs. Like TRILL, SPBM uses an efficient, shortest path
unicast delivery method. Unlike TRILL's IP hash-based load sharing method, SPBM
uses a deterministic, configuration-based multi-path technique.
SPBM only provides Layer 2 services. Layer 3 service must be connected through an
SPBM VSI Service instance. One option is to use one MDC to host the SPBM fabric,
while another MDC is used for Layer 3 services. A back-to-back cable can be used
to connect the line cards from the SPBM fabric to the line cards of the layer 3 MDC.

Data Center Layer 3 Services Overview


Several technologies are available to provide Layer 3 IP services, including the
following:
IRF
VRRP
MCE
MDC
QinQ Sub-interfaces
NFV
Path optimization

Layer 3: IRF
IRF, as shown in Figure 12-6, combines multiple physical devices into a single
virtual device, managed as a single unit. IRF can leverage the data plane of both
devices simultaneously to provide an active/active redundancy model. This means
there is no slave fail-over time. The only downtime is due to link-failure. The actual
service restore time will vary between 10 - 100 milliseconds, depending on IRF
hardware.


Figure 12-6: Layer 3: IRF

IRF provides an active/standby control plane, so local and static routes have a hitless
failover. This is because these routes need not be relearned by the standby unit. For
dynamically learned routes, a graceful restart feature maintains peer relationships
during fail-over. This feature is supported for most routing protocols, such as OSPF,
BGP, and IS-IS. It is also supported for TRILL and SPBM.
Some of the routing protocols support non-stop routing, in which full control-plane
synchronization is maintained. With graceful restart, peer relationships are
maintained while the new master builds a link-state database. Non-stop routing keeps
the link-state database synchronized in real time. Therefore, the graceful restart
feature is not required.
IRF is the recommended Layer 3 model inside the data center.
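For reference, a hedged sketch of how one member of a two-member IRF system might be
prepared is shown below (the member ID, priority, and port numbers are hypothetical,
and the exact procedure varies by platform):
<Sysname> system-view
[Sysname] irf member 1 priority 32
[Sysname] interface ten-gigabitethernet 1/0/49
[Sysname-Ten-GigabitEthernet1/0/49] shutdown
[Sysname-Ten-GigabitEthernet1/0/49] quit
[Sysname] irf-port 1/1
[Sysname-irf-port1/1] port group interface ten-gigabitethernet 1/0/49
[Sysname-irf-port1/1] quit
[Sysname] interface ten-gigabitethernet 1/0/49
[Sysname-Ten-GigabitEthernet1/0/49] undo shutdown
[Sysname-Ten-GigabitEthernet1/0/49] quit
[Sysname] irf-port-configuration active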

Layer 3: VRRP
VRRP is used between two or more independent Layer 3 switches or routers. Unlike
IRF, the devices have independent control planes, and so continue to be individually
configured as separate devices.
VRRP uses an active/standby model. The master device acts as the active forwarder,
while the backup device lays dormant, waiting to become an active forwarder should
the master fail. By default, this failover takes about 3 seconds, as opposed to 100ms
or less for IRF.


VRRP is the recommended layer 3 redundancy model to connect data centers, and is
available on all HP data center routers and switches. The isolated control planes
used by VRRP can be an advantage when connecting data centers.
VRRP can be tuned to improve overall performance. One such method is to
implement Bi-Directional Forwarding Detection (BFD). This allows the standby unit
to monitor the status of its BFD session with the master, and immediately transition to
a forwarding state if that session fails. With BFD tuning, the 3-second failover delay
can be reduced to somewhere between 100ms and 2 seconds, depending on hardware. When
VRRP is configured to use BFD, typical failover time is less than 500ms.
Another tuning feature is VRRP Hello Blocking. When hello packets are blocked
between VRRP devices, both units will assume the master role, resulting in two
masters. This creates a more desirable active/active router topology.
This would not be an acceptable condition inside a data center, since both routers
would use the same VRRP virtual MAC address. This would create a conflict and
cause a MAC-flapping condition for attached data center switches. However, for
connectivity between data centers, this is not an issue. This is because we can ensure
that the conflicting MAC address will not be learned on the link to the remote data
center. Data center interconnect technologies such as HP EVI will block them
automatically. If some other interconnect is used, then Ethernet ACLs can be
deployed.
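A minimal VRRP group on a VLAN interface might look like the following sketch (the
VLAN, addresses, and priority are hypothetical); BFD tuning or hello blocking would
be layered on top of a basic configuration like this:
<Sysname> system-view
[Sysname] interface vlan-interface 10
[Sysname-Vlan-interface10] ip address 10.1.10.2 255.255.255.0
[Sysname-Vlan-interface10] vrrp vrid 1 virtual-ip 10.1.10.1
[Sysname-Vlan-interface10] vrrp vrid 1 priority 120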

Layer 3: Multi-Customer CE (MCE)


In single-tenant environments, traditional Layer 3 routing might be sufficient. In such
a deployment, a single routing table is used. All layer 3 VLAN interfaces will be
serviced by this single routing table.
This could possibly be sufficient for multi-tenant environments, since communication
between different customers Layer 3 interfaces can be secured with ACLs.
However, great diligence is required with this deployment. A simple ACL
configuration error could expose one tenants traffic to others. Also, with dynamic
routing exchange, it is possible for one customer to advertise a route that could
conflict with another customer. For these reasons, isolated routing tables are
preferred in a multi-tenant environment.
The MCE feature, also known as VRF-Lite, can be applied to Comware devices to
create VPN Instances, as shown in Figure 12-7. This provides isolated routing tables
for tenants, perhaps created by different routing protocols.


Figure 12-7: Layer 3: Multi-Customer CE (MCE)

Data center administrators can maintain control over all tenant routing functions.
Each MCE's routing instance can have routing limits applied, and all MCEs can be
managed by a single administrator or team. The MCE feature is supported by most
HP Data Center Switches.
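As a hedged sketch (the instance name, route distinguisher, and addressing are
hypothetical), a per-tenant VPN instance is created and then bound to that tenant's
VLAN interface, giving the tenant its own isolated routing table:
<Sysname> system-view
[Sysname] ip vpn-instance TenantA
[Sysname-vpn-instance-TenantA] route-distinguisher 65000:1
[Sysname-vpn-instance-TenantA] quit
[Sysname] interface vlan-interface 100
[Sysname-Vlan-interface100] ip binding vpn-instance TenantA
[Sysname-Vlan-interface100] ip address 10.100.0.1 255.255.255.0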

Layer 3: Multi-tenant Device Context (MDC)


MDC is a technology that can partition a physical device or IRF fabric into multiple
logical switches called MDCs. For a traditional customer deployment, a single
admin MDC can host multiple L3 MCE instances. Hundreds of customers can be
supported, each with their own IP routing table. All these IP VPN Instances share the
same underlying hardware ASICs, with a single management interface.
If the design requires more isolation, multiple MDCs can be used. Each MDC will
have its own set of L3 MCE VPN instances, bound to its own set of dedicated
hardware resources. Each MDC can be separately managed by different
administrative groups, as shown in Figure 12-8. This requires dedicated line cards or
interfaces on the core devices.


Figure 12-8: Layer 3: Multi-tenant Device Context (MDC)

Up to eight customer MDCs can be created, depending on the chassis-based switch


model deployed.
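The following sketch is illustrative only (the MDC name, slot, and interface range are
hypothetical and depend on the chassis model); a customer MDC is typically created,
assigned hardware resources, and then started:
<Sysname> system-view
[Sysname] mdc TenantA id 2
[Sysname-mdc-2-TenantA] location slot 2
[Sysname-mdc-2-TenantA] allocate interface ten-gigabitethernet 2/0/1 to ten-gigabitethernet 2/0/24
[Sysname-mdc-2-TenantA] mdc start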

Layer 3: Traditional Sub-interfaces


With a traditional, single-tenant routing design, a physical routed interface can be
created on a switch by disabling the port's Layer 2 bridge function. The IP address is
configured directly on the interface, so there is no need to create a virtual VLAN
interface on the switch.
Additionally, this physical interface can be logically divided into sub-interfaces,
with an IP address assigned to each one. In this scenario, ingress packets will be
serviced by the appropriate Layer 3 sub-interface, based on traditional 802.1q VLAN
tagging.
These VLAN tags are locally significant. For example, the sub-interface configured
to service frames with an 802.1q tag for VLAN 10 will route this traffic. It will not
be bridged at Layer 2, since that functionality is disabled on the interface.
Sub-interfaces (like the one servicing VLAN 10) simply perform normal Layer 3
routing. As such, the Layer 2 frame, including the original tag, is removed. The
destination IP address is compared to the route table, and the best-path egress
interface is selected. This could be another sub-interface, perhaps servicing VLAN
20. A new frame is added, with appropriate 802.1q tag, and the frame is transmitted.
In this way, a single physical interface can be used to route traffic between multiple
VLANs.
In a multi-tenant environment, one physical interface might route between the ten
VLANs in use by a single customer, with a sub-interface for each VLAN. Each of
these sub-interfaces can be assigned to that customer's IP VPN Instance, and so
becomes a part of that client's isolated routing table. In this scenario, physical
interface functionality can be extended by using QinQ sub-interfaces.
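A minimal sketch of this approach (the interface, VLAN, and address values are
hypothetical) converts the port to a routed interface and then creates per-VLAN
sub-interfaces:
<Sysname> system-view
[Sysname] interface ten-gigabitethernet 1/0/1
[Sysname-Ten-GigabitEthernet1/0/1] port link-mode route
[Sysname-Ten-GigabitEthernet1/0/1] quit
[Sysname] interface ten-gigabitethernet 1/0/1.10
[Sysname-Ten-GigabitEthernet1/0/1.10] vlan-type dot1q vid 10
[Sysname-Ten-GigabitEthernet1/0/1.10] ip address 10.1.10.1 255.255.255.0
[Sysname-Ten-GigabitEthernet1/0/1.10] quit
[Sysname] interface ten-gigabitethernet 1/0/1.20
[Sysname-Ten-GigabitEthernet1/0/1.20] vlan-type dot1q vid 20
[Sysname-Ten-GigabitEthernet1/0/1.20] ip address 10.1.20.1 255.255.255.0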

Layer 3: QinQ Sub-interfaces


In a multi-tenant, QinQ-based environment, each customer has their own set of 4094
VLANs that use an inner 802.1q C-VLAN tag. Each customer is assigned a unique S-VLAN tag, which is added as an outer tag to maintain tenant separation over a
common fabric.
If a traditional sub-interface receives a QinQ frame, it would only consider the outer
S-VLAN tag to service inbound frames. For this reason, a separate physical interface
would be required for each customer.
However, a routed QinQ sub-interface is designed to interpret both the inner and
outer tags. It can parse the S-VLAN tag to maintain tenant separation, and interpret
inner C-VLAN tags to route for each customer VLAN space. This means that a router
with a single 10Gbps interface can provide inter-VLAN routing functions for
multiple customers.
Besides this additional functionality, QinQ sub-interfaces are very similar to
traditional sub-interfaces. It is a single physical interface with its Layer 2 bridging
function disabled, thus enabling it to operate as a Layer 3 routed interface. This
interface is then logically divided into sub-interfaces. Each sub-interface has unique
IP addressing assigned.


Layer 3: Network Function Virtualization (NFV)
Network Function Virtualization (NFV) provides each tenant with a dedicated Layer
3 routing service from within a hypervisor environment, as shown in Figure 12-9. For
example, an HP Virtual Service Router (VSR) runs inside VMware ESXi and KVM
hypervisors. Support for other hypervisor platforms is expected in future releases.

Figure 12-9: Layer 3: Network Function Virtualization (NFV)

The HP VSR provides a dedicated Layer 3 service that can be managed by the tenant,
or by the data center provider. It supports a broad Layer 3 feature set, including
VRRP, OSPF, MCE, and MPLS.
Different performance levels are available, since it can be deployed in a version that
uses a single vCPU, or 4 vCPUs. Since it runs as a VM, two or more vNICs can be
assigned. These can be connected to a traditional customer VLAN or to a customer
VXLAN, which would be terminated by the ESX host. In this way, the HP VSR can
be used to route between VXLAN services and traditional VLANs.

Layer 3: Remote Site IP Path to Data Centers



Now that various Layer 2 and Layer 3 technologies have been reviewed, it is
appropriate to discuss connectivity between remote sites.
Solution redundancy and administrative flexibility are enhanced by extending a
VLAN across two data centers. This is because any VM on that VLAN can be located
and easily moved to either of the two sites.
Meanwhile, user sites will typically have a Layer 3 IP routed path to each data
center. This scenario can lead to a condition known as traffic Tromboning.

Tromboning
In the scenario in Figure 12-10, the user site is serviced by the router at the top of the
figure, at IP address 10.3.1.1. Two redundant data centers are used, called DC1 and
DC2. A Layer 2 service connects the two data centers. The DC routers are configured
to use VRRP. The VRRP Primary router is at DC1, while the backup is at DC2.

Figure 12-10: Tromboning

User traffic for a server VM arrives at DC1, via the IP cloud. DC1 is aware that the
target server is actually at DC2. The traffic is therefore sent from DC1 to DC2 via the
Layer 2 interconnect circuit.
The server receives the user request and replies. This reply is via its default
gateway, which is across the Layer 2 circuit at DC1. Since the target VM is hosted at
DC2, while the active VRRP router is at DC1, there is a sub-optimal routing
condition. The traffic from the client doesn't go directly to DC2. Instead, the traffic
initially arrives at DC1, and is then extended (like a trombone slide) out over to
DC2. The return traffic must follow this same sub-optimal path back to the
client.
This not only affects remote communication but also local communication. Any
routed communication between two servers in DC2 must traverse the DC
interconnect to be routed by the default gateway at DC1.

Layer 3: Path Optimization using VRRP


This sub-optimal path issue can be improved by filtering VRRP hello packets, either
automatically by using EVI, or by manually configuring an Ethernet ACL. The two
VRRP devices can't see each other, so they both assume the VRRP master role. Each
data center now has a local routing service, so intra-DC traffic will always be routed
locally.
Although the request from end-user to VM server may still take a sub-optimal path,
the return traffic will be forwarded directly by DC2's local router. Although this is
an improvement, the issue of asymmetric routing remains. It is not a best practice for
return traffic to use a path different than the request traffic.
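Where the interconnect does not filter these packets automatically, one possible
approach is a Layer 2 ACL that drops frames destined to the VRRP IPv4 multicast MAC
address (0100-5e00-0012) on the interconnect-facing port. The sketch below is
hypothetical; verify the exact ACL and packet-filter syntax for your platform and
software version:
<Sysname> system-view
[Sysname] acl number 4000
[Sysname-acl-ethernetframe-4000] rule deny dest-mac 0100-5e00-0012 ffff-ffff-ffff
[Sysname-acl-ethernetframe-4000] quit
[Sysname] interface ten-gigabitethernet 1/0/48
[Sysname-Ten-GigabitEthernet1/0/48] packet-filter 4000 outbound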

Layer 3: VRRP Path Optimization Scenario


Figure 12-11 reveals how VRRP hello filtering can be used to improve packet flow.
There is still a VRRP primary at DC1, and the client may still use DC1 as its path to
reach a server VM that is actually located at DC2.


Figure 12-11: Layer 3: VRRP Path Optimization Scenario

However, when you filter VRRP hello packets, the router at DC2 also becomes a
VRRP Primary device. The return traffic is therefore optimized, as is traffic among
VMs at DC2.
Again, the asymmetric routing issue still remains. As you can see, the user traffic and
server reply traffic are using different paths.

Layer 3: Path Optimization with a Load


Balancer
In the scenario described in Figure 12-11, the target server's IP subnet is announced
by some routing protocol, such as OSPF, by both DC1 and DC2. The client site sees a
lower cost to reach this subnet via DC1, and so that is the path used for all traffic
toward that subnet.
As you have seen, this lowest-cost model can lead to sub-optimal paths when a
VLAN is extended across data centers. This behavior can be optimized using a WAN
load balancer such as the F5 BIG-IP product.

Layer 3: Load Balancer Path Optimization


Scenario
In the scenario in Figure 12-12, traffic from clients is sent to a Big-IP load balancer.
The load balancer determines the target's location. The load balancers will adjust
client perception and routing to ensure that optimal paths are achieved. Clients
connect based on the DNS names, and the Big-IP devices at each site will
dynamically update information to maintain optimal, symmetric paths.

Figure 12-12: Layer 3: Load Balancer Path Optimization Scenario

Data Center Interconnect (DCI) Overview


Data center designs can leverage any of several technologies for interconnections, as
shown in Figure 12-13. This includes Multi-site IRF, IRF with Link Aggregation,
EVI, L2VPN/VPLS and SPBM.


Figure 12-13: Data Center Interconnect (DCI) Overview

DCI: Multi-Site IRF


Multi-site IRF (or Geo-IRF) creates a single logical entity using physical devices at
different sites. Dedicated fiber links are required for this solution. You cannot use
any type of engineered link between IRF members.
One important design consideration is that each data center should be an isolated
failure domain. With multi-site IRF, an issue inside the IRF control plane would
impact both data center locations. Also, complex split-brain scenarios can result
from this deployment. This scenario can also complicate the firmware update
process.
The impact at both data centers could require local devices to have connectivity to
remote IRF members. You can see this in Figure 12-14, where each local IRF is dual
homed to its local Geo-IRF member and the IRF member at the remote data center.
For these reasons, it is generally best to avoid using a single IRF system over
multiple sites.


Figure 12-14: DCI: Multi-Site IRF

DCI: Multiple IRF Systems with Link Aggregation


You can deploy multiple IRF systems with link aggregation. Each data center core is
comprised of an IRF system, and these IRF systems can be interconnected using
traditional link aggregation. There should be at least two or more links connecting the
data centers. Since each site operates an independent IRF group, these can be
traditional 1Gbps connections, 10Gbps connections, direct fiber links, some type of
engineered circuit, or a service providers L2VPN solution.
All the links are to be bundled using link aggregation so there are multiple
active/active paths between the data centers. LACP can be used to negotiate and
control the aggregated connections.


HP Data Center switches also have support for Ethernet Operations, Administration,
and Maintenance (OAM). This can be used as a heartbeat for the aggregated links. OAM
operates at Layer 2, in a similar way to BFD operation at Layer 3. If there is a failure
inside an MPLS cloud between data centers, LACP could take up to 90 seconds to
detect it. OAM can detect this condition in less than 500 milliseconds.
A deployment that uses IRF and link aggregation is very easy to maintain and operate.
Both switch and link failures are reliably and easily managed. This method is
especially appropriate for a design that requires two data centers. If more than two
data centers are required, some Layer 2 topology protocol must be used that can
prevent loops and perform topology calculation between sites.
IRF and link aggregation is available on all HP Data Center Switches.

DCI: IRF with Link-Aggregation


Figure 12-15 shows an example of two data centers using IRF with link aggregation.
Each data center has an IRF group. The IRF systems at each site are interconnected
using multiple links. These links might be dedicated fiber optic cables, or a Dense
Wave-Division Multiplex (DWDM) circuit. Whatever the connecting technology, all
of the links are bundled into a single, logical connection using link aggregation.

Figure 12-15: DCI: IRF with Link-Aggregation
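One side of such an interconnect could be configured along the following lines (a
hedged sketch; the system name and interface numbers are hypothetical), bundling one
link from each IRF member into a dynamic (LACP) aggregation toward the remote data
center:
<DC1-Core> system-view
[DC1-Core] interface bridge-aggregation 10
[DC1-Core-Bridge-Aggregation10] link-aggregation mode dynamic
[DC1-Core-Bridge-Aggregation10] quit
[DC1-Core] interface ten-gigabitethernet 1/0/48
[DC1-Core-Ten-GigabitEthernet1/0/48] port link-aggregation group 10
[DC1-Core-Ten-GigabitEthernet1/0/48] quit
[DC1-Core] interface ten-gigabitethernet 2/0/48
[DC1-Core-Ten-GigabitEthernet2/0/48] port link-aggregation group 10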

DCI: Ethernet Virtual Interconnect (EVI)


For Ethernet Virtual Interconnect (EVI), each data center has local IRF systems which
can be interconnected by a Layer 3 network. The EVI protocol will transport local
data center VLANs and interconnect up to eight data centers. Multiple tenants can be
supported with multiple EVI networks.
Network IDs can be used to isolate traffic over the EVI system. If a single VLAN
space is used to host multiple tenants, you can allocate VLANs 100 - 199 for network
ID1, and VLANs 200 - 299 for Network ID2, and so on. In this way, you allocate
specific VLANs to each tenant, with separation maintained over the EVI network.
You could also use a QinQ device in front of the EVI device to ensure full VLAN
space isolation. Thus, all 4094 tenant C-VLANs are encapsulated in a single S-VLAN
by the QinQ device. These S-VLANs will be transported and managed by EVI.
EVI provides multicast isolation, so VRRP hello blocking is automatic. EVI is
currently available on 12500 and 12900 model HP Data Center switches.
Do not combine the 12500 and 12900 into a single EVI deployment. These products
use different EVI encapsulations and therefore are not compatible for EVI network
deployments.

DCI: L2VPN / VPLS


L2VPN and VPLS provide Layer 2 bridging services. You can use L2VPN to connect
two data centers, or VPLS if more than two data centers must be connected. These
technologies could be offered by a service provider, or self-deployed and managed
by data center staff.
When a service provider delivers these services, a simple Ethernet connection is
provisioned at the local data center. Data center core devices can provide Layer 3
services over this connection.
Alternatively, the data center team might choose to deploy their own service. This
team must be aware that the WAN Edge devices are strictly Layer 2 devices. They
only provide the L2VPN or VPLS service to core switches, and cannot provide any
Layer 3 functions. The core switches must provide the routing service, just as with a
provider-delivered service.

DCI: SPBM
SPBM can be used to connect two or more data center sites with full multi-tenant
support. However it does require a Layer 2 Ethernet connection between sites. This
connection can be a direct fiber link or an L2VPN service.


One advantage to this approach is the availability of Layer 2 multi-path services.


Layer 2 Ethernet traffic will take the shortest path to any of the several sites that may
be connected.
This pathing is purely a Layer 2 function. Some data center device must be
configured to provide any required Layer 3 functionality. This could be in the form of
a dedicated core device, a separate MDC, or an HP VSR, possibly in conjunction
with third-party load balancers.

Storage (1)
Several protocols should be considered in a data center design. These include NFS,
iSCSI, FC, and FCoE.
Network File System (NFS) binds separate storage structures into a single virtual
drive, to provide file-level access. There are no significant design requirements for
the deployment of NFS. You can use a dedicated VLAN and server NIC if you need
guaranteed service for NFS. Another option is to let NFS share a NIC with other
traffic, perhaps using QoS to mitigate delivery issues, if needed.
iSCSI provides block-level access to storage systems. It is highly recommended to
have a dedicated VLAN for the iSCSI infrastructure. It is also best to use a dedicated
switch that was designed for storage traffic. For connectivity, both data and iSCSI
traffic will share the top-of-rack switch, which will have dedicated uplinks to the
Ethernet storage switch.

Storage (2)
For FC and FCoE deployments, it is vital to consult HP's validated design
documentation. This ensures that you are using the right components, and that all
components will interoperate.
A common option for Fibre Channel is HP's 5900CP. This switch provides full FC
fabric services to interconnect server native FC HBAs, native FC storage systems,
and HP virtual connect FC solutions. It can also provide Fibre Channel N-Port
Virtualization gateways to existing HP ISS SAN switches, such as the H/B/C/ series
SAN switches. The 5900CP can easily be added to service most existing native FC
environments.
Another Fibre Channel option is to use the HP Flex Fabric Blade system. This system
provides FCoE internally to the blade server CNA, as well as an NPV gateway to
traditional SAN switches.


For FCoE, the 5900CP can be used as the gateway between FCoE and the native
Fibre Channel systems.

Data Center WAN


Data center WANs require high-performance connectivity, often at speeds of 10Gbps.
Fine-grained QoS services are needed to prioritize and control traffic over the links.
Dynamic VPN technologies ease deployments and improve scalability. For example,
DVPN is an HP proprietary technology that offers easy multi-site IPSec VPN tunnel
setup. Equipment should be able to perform high-speed encryption and hashing
functions in support of these IPSec tunnels. These features are all provided by HP's
router portfolio.

Overlay Technologies (1)


For data center designs that require an overlay technology, VXLAN should be used.
VXLAN provisions virtual Layer 2 networks for hypervisor VMs. The VXLAN
protocol is developed and promoted by VMware and other vendors.
Initially, VXLAN exists only between VMware hypervisors as a virtual construct.
Additional components are required for external Layer 3 VXLAN routing.
For single-customer deployments, the hardware VXLAN gateway function of the 5930
can be used for routing. You must bind the VXLAN to a VLAN on the physical
interface with a service instance.
Multi-tenant infrastructures can also use the 5930's hardware VXLAN gateway, using
a transport VLAN to reach the tenant's Layer 3 gateway.
Another option is to deploy the HP VSR. You can configure multiple vNICs as routed
interfaces for the VSR. Some of these routed ports will be bound to VXLAN to reach
other VMs on the same VXLAN, while other ports will be bound to traditional
VLANs. This provides connectivity between VXLAN and the classic IP network.

Overlay Technologies (2)


The hardware-based 5930 VXLAN gateway requires VTEP tunnel setup to remote
VTEP endpoints. VTEP endpoints are ESXi servers with online VMs in the VXLAN.
This tunnel setup is automatically and dynamically orchestrated using the HP VAN
SDN Controller, which interacts with the VMware NSX controller.


Server Access Options


Considerations for server access options include storage access requirements.
Server-to-storage communications can be handled with a dedicated FC network, a
converged FCoE system, or via iSCSI.
Optimized hypervisor support is another consideration. This can be achieved by
supporting multiple paths for the hypervisor, management, vMotion, VM networks,
and storage networks. This way, each role can be isolated to their own network
paths. For RAID environment this can be provided by the Flex10 network cards.
Another consideration is the need for deep integration with the hypervisor network.
This can be achieved using EVB.

Overview of Data Center Layers


This section provides an overview of data center access layers, as listed below:
Access Layer
Blade Server One-Tier design
2-Tier design
3-Tier design
Layer 2 Fabrics

Access Layer Switching: ToR Design


Access Layer switching is done with Top-of-Rack switches. In a ToR design, servers
connect to access switches via copper or fiber within the rack, while the access
switch connects to backbone switches within the data center via multi-mode fiber
(MMF).
Often, the data center backbone consists of high-capacity aggregation or distribution
switches with Layer 2/3 capabilities. Ethernet copper is generally restricted to
relatively short runs inside the rack, while connections outside the rack can be made
with smaller form factor multi-mode fiber. This reduces cabling weight and overhead,
and allows connections to interfaces of different capacities to support the
bandwidth requirements of the rack.
Figure 12-16 lists advantages and disadvantages of this solution.


Figure 12-16: Access Layer Switching: ToR Design

Access Layer Switching: EoR Design


Another option for deploying access layer switches is to use an End-of-Row (EoR)
model. With this design, a rack containing the switching equipment is typically
placed at either end of a row of cabinets or racks. Bundles of cabling provide the
connectivity from each server rack to the switching equipment rack.
Servers are usually connected to a patch panel inside each server rack. The copper
or fiber bundles are home run to another patch panel in the rack containing the access
switches. The switches are then connected to the patch panel via patch cables. EoR
switches are typically connected back to the core with a series of fiber patch cables.
EoR does not imply that the network racks have to be placed at the end of the row.
There are designs where network switch racks are placed together in the middle of
cabinet/rack rows. Placing the switch rack in the middle of a row limits the length of
cables required to connect the furthest server racks to the nearest network rack.
Unlike the ToR model, where each rack is treated as an independent module, in the
EoR placement model, each row is treated as an independent module.
Figure 12-17 lists advantages and disadvantages of this design.


Figure 12-17: Access Layer Switching: EoR Design

Blade Server One-Tier Design


In a Blade server one-tier design, the blade enclosures include internal blade
switches. As shown in Figure 12-18, these serve as ToR access switches that connect
directly to the core, as would any ToR switch.

Figure 12-18: Blade Server One-Tier Design


This deployment signifies the current pinnacle of network virtualization. Server
blades allow for substantial compute density per rack, row, and data center. HP has
optimized the BladeSystem server portfolio to maximize virtualization capabilities.
This solution simplifies high performance networking, while providing flexible VM
networking and converged I/O options.

Blade Server One-Tier Design


The network edge physically starts at the FlexConnect fabric, but logically begins
inside the VMs and their configured virtual switches. HP 12500 switches can
communicate natively with VMware virtual switches, allowing ARP tables to be
updated between physical and virtual switches (see Figure 12-19). The VMs and
virtual switches can provision VLANs, which in turn interoperate with the IRF
fabric, allowing seamless VM movement with vMotion and high-performance frame
forwarding.


Figure 12-19: Blade Server One-Tier Design

Combining LACP and IRF in this design provides high-speed link aggregation with
sub-50 ms re-convergence in the event of a link failure. It also allows links from the
converged network adapters to be aggregated across all switches for higher
bandwidth and traffic forwarding.
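A minimal sketch of the aggregation piece follows, assuming Comware 7 syntax on a two-member IRF fabric; the group and interface numbers are illustrative:

# Dynamic (LACP) aggregation group spanning both IRF members
interface Bridge-Aggregation10
 link-aggregation mode dynamic
interface Ten-GigabitEthernet1/0/1
 port link-aggregation group 10
interface Ten-GigabitEthernet2/0/1
 port link-aggregation group 10

Because both members form one logical switch under IRF, the converged network adapter sees a single LACP partner even though its links terminate on different chassis.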
This design also supports dynamic management of World Wide Names (WWNs). This feature
allows a FlexConnect module to be configured with its IP and VLAN information once.
If a device fails, the replacement FlexConnect device will gracefully provision itself
with the original configuration.


This design supports long-range vMotion connectivity, enabling VMs to be clustered
or synchronized between sites. The switches allow VPLS connectivity between
data center locations. Be aware that long-range fiber segments using MPLS can add
latency that limits the ability for VMs to use converged I/O resources between data
center locations. Long distance WAN networks generally address normal server
communications and disaster recovery efforts.
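As a hedged illustration of the VPLS piece, the following partial sketch assumes Comware 7-style MPLS L2VPN syntax; the LSR ID, peer address, PW ID, and service-instance mapping are illustrative, and the underlying IGP, LDP, and MPLS interface configuration is omitted:

mpls lsr-id 1.1.1.1
mpls ldp
l2vpn enable
# VPLS instance stretching VLAN 100 between data center sites
vsi DC-INTERCONNECT
 pwsignaling ldp
  peer 2.2.2.2 pw-id 100
# Map VLAN 100 frames from the DC-facing port into the VSI
interface Ten-GigabitEthernet1/0/5
 service-instance 100
  encapsulation s-vid 100
  xconnect vsi DC-INTERCONNECT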

Simplified Two-Tier Design Objectives


The simplified two-tier design is very similar to the one-tier blade design. The difference
is that instead of FlexFabric blade servers, traditional rack servers are used, as
shown in Figure 12-20. They are connected to Top-of-Rack switches. This solution is
best suited to designs that must use a mixture of rack and blade servers. This flatter,
two-tier design is simpler and introduces less latency than three-tier legacy models.

Figure 12-20: Simplified Two-Tier Design Objectives

Three-Tier Design Objectives


A three-tier design introduces an aggregation layer between access and core devices.
With this design, many ToR switches connect to a few aggregation layer switches.
These aggregation layer switches are then dual homed to a redundant core, as shown
in Figure 12-21.


Figure 12-21: Three-Tier Design Objectives

This type of design is suited to data center networks where added bandwidth, 10GbE
port capacity, and simplified management are paramount. It also helps ensure the
interoperability of legacy EoR and ToR switches. Although the depicted design
focuses on HP switches, IRF permits the addition of third-party switches at any level,
and will interoperate using standards-based networking.

Data Center Layer 2 Fabric


The data center Layer 2 fabric needs to provide a standards-based solution for
large-scale deployments. For a 2-tier design, traditional IRF will typically be used, along
with link aggregation.
3-tier designs would often use TRILL or SPBM. The TRILL solution provides a
standards-based, shortest path algorithm for Layer 2 forwarding, with ECMP
load-sharing. The limitation of a maximum of 4094 VLANs should be considered.
SPBM also uses a standards-based, shortest path algorithm. Its equal-cost
load-sharing algorithm is more deterministic. Each tenant's traffic can be configured to use
a single best path. SPBM is far more scalable, with support for over 16 million
tenants, each with its own unique I-SID assignment.
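As a brief illustration of the TRILL option described above, the following minimal sketch assumes Comware 7 TRILL syntax on a ToR switch; the interface numbers are illustrative:

# Enable TRILL globally
trill
# Server-facing ports inject native Ethernet frames into the TRILL region
interface Ten-GigabitEthernet1/0/1
 trill enable
 trill link-type access
# Fabric-facing ports exchange TRILL-encapsulated frames with other RBridges
interface FortyGigE1/0/53
 trill enable
 trill link-type trunk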

Example of an HP Standards-Based DC Fabric


Figure 12-22 shows an example of an HP standards-based DC fabric. The ToR
switches can be individual devices, or they could be deployed as an IRF system,
interconnected to the spine switches. The access switches provide Layer 2
functionality, with Layer 3 services provided at the spine.

Figure 12-22: Example of an HP Standards-Based DC Fabric

Variations on this theme are possible. These include:


Scale out with more spine switches for increased bandwidth: e.g., with 4 spine
switches, 4 x 10GE is available instead of the 2 x 10GE provided by 2 spine
switches
Single IRF spine switch with multiple members for chassis HA: e.g., a 4-chassis IRF
spine
Multi-homed servers to multiple leaf switches
Layer 3 devices connected to leaf switches instead of spine

HP FlexFabric Core Switches


Figure 12-23 reviews HP's FlexFabric core switch portfolio.


Figure 12-23: HP FlexFabric Core Switches

HP FlexFabric Access Switches


Figure 12-24 reviews HP's FlexFabric access switch portfolio.

Figure 12-24: HP FlexFabric Access Switches

HP FlexFabric Routers
Figure 12-25 reviews HP's HSR router series.


Figure 12-25: HP FlexFabric Routers

IMC VAN Fabric Manager


IMC VAN Fabric Manager simplifies the management of data center and SAN
fabrics. It provides a unified view of all of the network and storage devices in the
data center fabric, alongside fabric health, to enable quick troubleshooting and
proactive management.
It helps eliminate manual provisioning and allows you to easily configure Ethernet
Virtual Interconnect (EVI), Shortest Path Bridging (SPB), or Transparent
Interconnection of Lots of Links (TRILL) through the same graphical user interface
used to automate, monitor, and manage your entire network, as shown in Figure 12-26.


Figure 12-26: IMC VAN Fabric Manager

Summary
In this chapter you learned:
Key drivers for new data center designs include large-scale consolidation,
optimized blade server deployments, server virtualization technologies, and new
application and delivery models.
Data center deployment models include traditional enterprise, traditional multi-tenant, and cloud computing designs.
Layer 2 data center solutions include traditional VLANs, VLANs with MDC,
QinQ, VLANs with TRILL, TRILL and MDC, SPBM, and SPBM with MDC.
Data center interconnect solutions include IRF with link-aggregation, EVI,
L2VPN/VPLS, and SPBM.
Layer 3 data center technologies include IRF, VRRP, MCE, MDC, QinQ sub-interfaces, and the HP VSR.
Storage protocols include NFS, iSCSI, FC, and FCoE.
VXLAN can be used for data center designs that require an overlay technology to
automatically provision Layer 2 networks for hypervisor VMs.
Data center designs include ToR access solutions, EoR access solutions, blade
server one-tier designs, and 2-tier, 3-tier, and Layer 2 fabric designs.

Learning Check

Answer each of the questions below.


1. Name three use cases for HP FlexFabric in the data center (Choose three).
a. IEEE-compliant route/switch.
b. Leaf/spine.
c. SDN Overlay
d. Traditional 3-tier
e. 1-tier blade systems.
2. Which solutions are appropriate for large, multi-tenant data center deployments
(Choose three)?
a. IRF
b. Traditional VLANs
c. EVI + QinQ
d. MCE
e. SPB
3. TRILL, QinQ, and SPBM with MDC all provide possible Layer 3 services for
intra-data center connectivity
a. True.
b. False.
4. What are three possible Layer 3 solutions for data centers (Choose three)?
a. QinQ.
b. VRRP.
c. NFV.
d. SPBM.
e. MCE.
5. What are three advantages of a ToR design (Choose three)?
a. Issue isolation
b. Traffic isolation
c. Effect of physical disasters limited to ToR, instead of entire row
d. Fewer switches to manage as separate entities
e. Fewer rack-to-rack hops


Learning Check Answers


1. b, c, e
2. c, d, e
3. b
4. b, c, e
5. a, b, c


13 Practice Test
INTRODUCTION
This exam tests your skills and knowledge on how to deploy and implement the HP
FlexFabric Data Center solutions.
In this exam you will be tested on specific Data Center topics and technologies such
as Multitenant Device Context (MDC), Data Center Bridging (DCB), Multiprotocol
Label Switching (MPLS), Fibre Channel over Ethernet (FCoE), Ethernet Virtual
Interconnect (EVI), and Multi-Customer Edge (MCE). The exam will also cover topics
on high availability and redundancy such as Transparent Interconnection of Lots of
Links (TRILL) and Shortest Path Bridging Mac-in-Mac mode (SPBM).
This certification exam is designed for candidates with on-the-job experience. The
associated training course, which includes numerous hands-on lab activities,
provides a foundation, but you are expected to have real-world experience as
well.

Exam Details
The following are details about the exam:
Exam ID: HP2-Z34
Number of items: 60
Item types: Multiple choice (single response)
Exam time: 105 minutes
Passing score: 70%

HP2-Z34 Testing Objectives


5 % Fundamental HP FlexFabric Data Center architectures and technologies
Describe common data center networking requirements and options for data
center architectures.
5 % HP FlexFabric Data Center solutions, products, and warranty/service
offerings
Explain how the HP FlexFabric portfolio, including switches, routers, and IMC
modules, meets common data center needs.
15 % HP FlexFabric Data Center solution planning and design
Plan how to use data center technologies (such as MDC, MPLS, MPLS Layer 2
VPNs, VPLS, MCE, TRILL, SPBM, DCB, FCoE, and EVI) for common data
center use cases.
Explain the impact of various data center technologies on the network design.
55 % HP FlexFabric Data Center solution implementation (install, configure,
setup)
Implement various forms of virtualization on HP solutions, including HP MDC
and MCE (VRF Lite).
Configure HP solutions to extend Layer 2 connectivity between and within sites
(using appropriate technologies, such as MPLS Layer 2 VPNs, VPLS, SPBM,
and HP EVI).
Configure HP solutions to support LAN/SAN convergence using technologies
such as DCB and FCoE.
15 % HP FlexFabric Data Center solution enhancement (performance-tune,
optimize, upgrade)
Provide resiliency, efficiency, and load-balancing for data center infrastructure
solutions.
Enhance QoS for storage traffic carried in the data center LAN.
5 % HP FlexFabric Data Center solution troubleshooting, repair and replacement
Verify and troubleshoot the implementation of various data center technologies.

Test Preparation Questions and Answers


The following questions will help you measure your understanding of the material
presented in this study guide. Read all the choices carefully, as there may be more
than one correct answer.
Choose all correct answers for each question.


Questions
1. Refer to the exhibits.

Figure 13-1: Exhibit 1 for question 1

Figure 13-2: Exhibit 2 for question 1

Switch 1 is configured as shown in the second exhibit. When traffic destined to
servers on other switches arrives on interfaces Ten1/0/1 to Ten1/0/48, Switch 1
should send the traffic over the TRILL region. (Similarly, it should egress traffic
from the TRILL region on those interfaces.)
How should the administrator configure interfaces ten1/0/1 to ten1/0/48?
a. Create a service instance on these interfaces; the service instance
references a TRILL virtual switch instance (VSI).
b. Configure these interfaces as TRILL trunk ports.
c. Configure these interfaces as TRILL access ports.


d. Add VLAN 100 as a permitted VLAN on