
High Performance Networks

Chapter 6
Multi-Protocol Label Switching
Introduction

MPLS is an important tool for backbone service providers, e.g. carrier service providers and Internet Service Providers (ISPs), to solve network problems including scalability, speed, QoS management and traffic engineering.

MPLS represents the convergence of connection-oriented forwarding techniques and the Internet's routing protocols.

MPLS leveraged the high-performance cell-switching capability of ATM switch hardware and melded it into a network using existing IP routing protocols.

MPLS:
1. supports traffic engineering, i.e. putting the traffic where the bandwidth is. MPLS provides the ability to explicitly set single or multiple paths that the traffic will take through the network. This feature optimizes the bandwidth utilization of underutilized paths and relieves congestion.
2. allows service providers to create layer 3 (L3) virtual private networks (VPNs) across their backbone network for multiple customers.
3. supports QoS. Service providers can provide multiple classes of service with hard QoS guarantees to their VPN customers.

MPLS was based on and evolved from Ipsilon/Nokia's IP Switching and Cisco's Tag Switching.

MPLS Elements

Terminology:

Label Switch Path (LSP).
This is a unidirectional logical path that an MPLS frame travels through the network.

Forwarding Equivalence Class (FEC).
The FEC represents a group of packets that share the same requirements for their transportation. All packets in such a group are provided the same treatment en route to the destination. The assignment of a particular packet to a particular FEC is done once, when the packet enters the MPLS network. FECs are based on service requirements (e.g. QoS requirements) for a given set of packets or simply on an address prefix, e.g. an IP address prefix.

Label Switch Router (LSR).
An LSR:
1. is a core router in an MPLS network;
2. participates in the establishment of LSPs using label signaling protocols, e.g. RSVP-TE (RSVP traffic engineering/tunnel extension) or CR-LDP (constraint-based routing label distribution protocol); and
3. performs high-speed switching of labeled traffic on established LSPs.

Label Edge Router (LER).
1. An LER is a device at the edge of an MPLS network. It can be an Ingress LSR or an Egress LSR;
2. The Ingress LSR assigns an FEC to an incoming packet once;
3. The packets in an FEC are then assigned to an LSP based on traffic criteria; and
4. The Egress LSR removes labels from traffic coming in from an incoming LSP.
Note: An LSP extends from an Ingress LSR to an Egress LSR.

Label Information Base (LIB).
An LIB is a table created in each LSR or LER that relates the incoming label and interface to the outgoing label and interface. An LIB also contains FEC-to-label bindings.

Packet-based MPLS networks can have IP router-based LERs with either
1. ATM-based Core LSRs, or
2. IP router-based Core LSRs,
and use packet-based transport technologies to link the LSRs. ATM-based Core LSRs are ATM switches with their control plane replaced by an IP control plane (e.g. running an instance of the network's Interior Gateway Protocol (IGP)).


Figure 6.1 shows a typical MPLS network.
On each physical link, a particular label, specific within the context of that link, represents a segment of an LSP. The association between actual label values and an LSP at any hop can be created on demand by RSVP-TE or CR-LDP.
Figure 6.1 An MPLS Network Consists of Edge LERs, Core LSRs and Label Switched Paths (LSPs) across the MPLS Backbone/Domain

Label Switched Paths and Per-hop Processing

An LSP need not follow the shortest path between any two edge LSRs. External routing algorithms can be used to determine new routes (non-shortest) for LSPs. This can result in a more optimal distribution of loads around a network.

Multi-protocol Label Switching Label Encoding
o A label can be mapped to an ATM VPI/VCI or to a Frame Relay DLCI.
o For layer 2 (L2) protocols that do not offer a label-type field (e.g. Ethernet or the Point-to-Point Protocol), the 32-bit MPLS label (or label stack) forms a shim layer between layer 2 and the network layer (see figure 6.3a). Figure 6.2 shows the structure of the generic MPLS frame and the format of each label:
- The 20-bit label indicates the LSP to which the packet belongs.
- The 3-bit experimental field indicates, for example, additional queuing and scheduling disciplines independent of the LSP.
- The 8-bit Time to Live (TTL) field is defined to assist in the detection and discard of looping MPLS packets.
- The S bit is set to 1 to indicate the final stack entry before the original packet.
The stacking scheme allows LSPs to be tunneled through other LSPs (a short sketch follows figure 6.2).

Figure 6.2 MPLS Label Stack Encoding for Packet-Oriented Transport
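The label format above can be made concrete with a short sketch. The following Python fragment (illustrative only, not part of these notes) packs and unpacks a 32-bit label stack entry; the field widths follow figure 6.2, while the example label, EXP and TTL values are made up.

    # Sketch of the 32-bit MPLS label stack entry of figure 6.2:
    # 20-bit label, 3-bit experimental (EXP) field, 1-bit bottom-of-stack
    # (S) flag and 8-bit TTL. Values below are illustrative.

    def pack_entry(label, exp, s, ttl):
        """Pack one label stack entry into 4 network-order bytes."""
        assert 0 <= label < 2**20 and 0 <= exp < 8 and s in (0, 1) and 0 <= ttl < 256
        word = (label << 12) | (exp << 9) | (s << 8) | ttl
        return word.to_bytes(4, "big")

    def unpack_entry(data):
        """Recover (label, exp, s, ttl) from a 4-byte stack entry."""
        word = int.from_bytes(data[:4], "big")
        return word >> 12, (word >> 9) & 0x7, (word >> 8) & 0x1, word & 0xFF

    # Two stacked entries: the inner entry (S=1) sits just above the IP
    # packet, which is how one LSP is tunneled through another.
    stack = pack_entry(10123, 5, 0, 64) + pack_entry(204, 5, 1, 64)
    print(unpack_entry(stack[:4]))   # (10123, 5, 0, 64)
    print(unpack_entry(stack[4:]))   # (204, 5, 1, 64)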

Encoding Labels on Specific Links

For packet-based link layers, the MPLS frame is simply placed within the link's native frame format. Figure 6.3a shows the frame format for an MPLS frame carried by PoS.
For MPLS frames carried by ATM (see figure 6.3b), packet-to-cell conversion is required at the ingress node of the LSP and vice versa at the egress node.

Figure 6.3a MPLS Encoding for Point-to-Point Protocol (PPP) over SONET Links (the label stack forms a shim layer)

Figure 6.3b MPLS Encoding for ATM Links

Label Creation and Binding


For L2 protocols that do not offer a label-type field, an LSR or LER can create a label or select one from a pool of labels and then bind it to an FEC as a result of some event that indicates a need for such label creation and binding. The events that trigger label creation and FEC-to-label binding can be:
1. the reception of a signaling message, e.g. RSVP-TE messages; or
2. the reception of a data packet.

Data Packet Processing in a Core LSR.
It is not necessary for a core LSR to classify (i.e. separate) incoming packets based on their IP header contents (e.g. destination address). The MPLS label itself, together with the identity of the arrival interface, provides all the necessary context to determine a packet's next hop and the associated metering, policing (i.e. packet dropping) or marking, queuing, and scheduling rules, see figure 6.4. Note the separation of the control plane and the forwarding plane in figure 6.4. The forwarding plane is used to forward user data traffic. The control plane is responsible for the distribution of network topology and traffic engineering attributes, and for the setting up of LSPs.
The switching table (also known as the forwarding table) contains one or more LIB(s) for labels the LSR knows about, including a new label to apply when the packet is forwarded. Figure 6.5 shows an example of the format of a switching table (a lookup sketch in code follows the table). Switching table entries are modified whenever a new label needs to be activated or an old label needs to be removed.

Figure 6.4 Simplified Diagram of a Core LSR. The control plane holds the signaling (RSVP-TE, CR-LDP) and routing (OSPF-TE, IS-IS-TE) components. The forwarding plane shows the components of the forwarding engine: input ports, the switching table with its LIBs (the MPLS label and frame context influence subsequent processing), policing & marking, the switching fabric, queuing & scheduling, and output ports.


Figure 6.5 An Example of a Switching/Forwarding/LIB Table

In port | In label | Output port | Out label | Action
1       | 46       | 10          | 48        | Swap
2       | 102      | 4           | 201       | Push
11      | 111      | 5           | -         | Pop
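The lookup driven by such a table can be sketched in a few lines. The table contents below mirror figure 6.5; the function name and the representation of a packet as a label stack are assumptions made for illustration.

    # Sketch of the per-packet lookup in a core LSR: the (in port, in label)
    # pair alone selects the output port, outgoing label and action.
    SWITCHING_TABLE = {
        # (in_port, in_label): (out_port, out_label, action)
        (1, 46): (10, 48, "swap"),
        (2, 102): (4, 201, "push"),
        (11, 111): (5, None, "pop"),
    }

    def forward(in_port, stack):
        """Return (out_port, new_stack) for a labeled packet; no IP lookup needed."""
        out_port, out_label, action = SWITCHING_TABLE[(in_port, stack[0])]
        if action == "swap":
            stack = [out_label] + stack[1:]
        elif action == "push":            # tunnel this LSP inside another LSP
            stack = [out_label] + stack
        elif action == "pop":             # e.g. penultimate-hop or egress behavior
            stack = stack[1:]
        return out_port, stack

    print(forward(1, [46]))        # (10, [48])
    print(forward(11, [111, 7]))   # (5, [7])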

Data Packet Processing in an LER (both ingress & egress)
o An LER is located at the edge of an MPLS network, see figure 6.1. An LER originates and/or terminates LSPs and performs both label-based forwarding and conventional IP routing functions.
o At ingress, an LER accepts unlabeled packets and creates an initial MPLS frame by pushing one or more MPLS label entries.
o On egress, the LER terminates an LSP by popping the top MPLS stack entry and forwarding the remaining packet based on rules indicated by the popped label, e.g. if the payload is an IPv4 packet then it should be forwarded according to IP routing rules.
o Figure 6.6 shows the simplified diagram of the components in the data forwarding plane of an Ingress LSR. It shows how an incoming IP packet is labeled for transmission out of an MPLS interface.
o Conventional IP packet processing/classification determines the FEC (e.g. based on the destination IP address prefix, QoS requirements, etc.). The forwarding table provides the FEC-to-label binding, which in turn determines the LSP for this packet (a sketch follows figure 6.6 and its note).
o Once labeled, packets are transmitted into the Core along the chosen LSP.

Figure 6.6 Forwarding Plane of an Ingress Label Edge Router
Note: The MPLS output technology can be either packet or cell based. When cell based, the MPLS frame is further segmented and the VPI/VCI is set to the value of the top label in the MPLS Label Stack.
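The FEC assignment step can be illustrated with a small sketch. The prefixes and label values below are hypothetical; the longest-prefix match is the conventional IP classification the text refers to.

    # Sketch of ingress-LER processing: classify an unlabeled IP packet to
    # an FEC by longest-prefix match on the destination address, then push
    # the label bound to that FEC. Prefixes and labels are invented.
    import ipaddress

    FEC_TABLE = {
        ipaddress.ip_network("10.21.0.0/16"): 11,   # a more specific FEC
        ipaddress.ip_network("10.0.0.0/8"): 46,     # a less specific FEC
    }

    def classify_and_push(dst):
        """Map a destination address to an FEC and return the initial label stack."""
        addr = ipaddress.ip_address(dst)
        matches = [net for net in FEC_TABLE if addr in net]
        fec = max(matches, key=lambda net: net.prefixlen)   # longest match wins
        return [FEC_TABLE[fec]]                             # push one label entry

    print(classify_and_push("10.21.3.9"))   # [11]
    print(classify_and_push("10.99.1.1"))   # [46]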

An ingress LSR classifies incoming IP packets, using as much header information as necessary to map packets to the correct LSP and to correctly set the experimental bits.
In general, the number of per-hop behaviors (e.g. the number of queues at each outgoing interface port) in LERs is greater than that in core LSRs. So packets that are classified into different per-hop behaviors at the ingress LSR may end up being mapped to the same per-hop behavior at the core LSRs. This is referred to as an MPLS Behavior Aggregate. The label field, the experimental field or some combination of both fields may define an MPLS Behavior Aggregate.
Besides traffic classification, ingress LSRs are also responsible for performing other traffic conditioning functions, e.g. rate shaping and/or policing/marking of the traffic going onto particular LSPs to maintain overall service goals.
Ingress rate shaping is required on traffic that is destined to be combined with other traffic to form a behavior aggregate further downstream. For behavior aggregates encompassing multiple micro-flows, core LSRs are unable to mediate between aggressive and non-aggressive micro-flows within the behavior aggregate. Ingress rate shaping bounds the interference between the micro-flows making up a behavior aggregate.
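The token bucket described later in table 6.1 is the usual mechanism behind such shaping and policing. A minimal sketch, with an assumed class name and invented rate and burst figures, might look as follows.

    # Token-bucket sketch of ingress rate shaping: the bucket size bounds
    # the maximum burst and the token rate bounds the average rate, which
    # limits how much one micro-flow can interfere within an aggregate.
    class TokenBucket:
        def __init__(self, rate_bps, burst_bytes):
            self.rate = rate_bps / 8.0     # refill rate in bytes/sec
            self.capacity = burst_bytes    # bucket size limits the max burst
            self.tokens = burst_bytes
            self.last = 0.0

        def conforms(self, now, pkt_bytes):
            """True if the packet may be sent now; False means delay (shape) it."""
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if pkt_bytes <= self.tokens:
                self.tokens -= pkt_bytes
                return True
            return False

    shaper = TokenBucket(rate_bps=1_000_000, burst_bytes=3000)  # 1 Mbps, 3 kB
    print([shaper.conforms(0.0, 1500) for _ in range(3)])  # [True, True, False]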

Driving Per-hop Behavior for QoS Requirements
o By definition, a particular LSP is associated with a particular FEC. A packet is assigned to an FEC at the Ingress LSR when it enters an MPLS domain. The LIB provides the FEC-to-label binding, which in turn maps to an LSP.
There are 3 approaches for establishing edge-to-edge QoS requirements (a sketch follows the figure captions below):
1. Use only the label field to encode both the next hop (i.e. path information) and each distinct queuing and scheduling behavior as a new FEC (LSP), i.e. LSPs following the same path can have different QoS behavior. See figure 6.7a. This approach will create a huge number of active LSPs.
2. The experimental field encodes up to 8 additional queuing and scheduling behaviors for the same FEC (LSP). See figure 6.7b.
3. The experimental field encodes up to 8 queuing and scheduling behaviors independent of the FEC (LSP). Figure 6.7c shows that the label field encodes only the next hop (i.e. path) information. Destination IP address prefixes can be used as the only attribute to create LSPs for this approach. Moreover, this approach will create the fewest active LSPs.

Figure 6.7a The Label Alone Provides Per-Hop Behavior Context

Figure 6.7b The Label and Experimental Bits Together Provide Per-Hop Behavior Context

Figure 6.7c The Experimental Bits Alone Provide Per-Hop Behavior Context
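A rough sketch of the three approaches, with invented label and EXP value mappings, shows how the per-hop behavior (PHB) context is derived in each case.

    # Sketch: deriving the per-hop behavior from the label alone (6.7a),
    # from label plus EXP bits (6.7b), or from the EXP bits alone (6.7c).
    # All mapping values here are invented for illustration.
    PHB_BY_LABEL = {46: "EF", 48: "AF1"}        # approach 1: one LSP per behavior
    PHB_BY_EXP = {0: "BE", 1: "AF1", 5: "EF"}   # approach 3: EXP alone, any LSP

    def phb(label, exp, approach):
        if approach == "label-only":         # figure 6.7a
            return PHB_BY_LABEL[label]
        if approach == "label-and-exp":      # figure 6.7b: behaviors within one FEC
            return PHB_BY_LABEL[label] + "/exp" + str(exp)
        return PHB_BY_EXP[exp]               # figure 6.7c: label gives path only

    print(phb(46, 5, "label-only"))      # EF
    print(phb(46, 1, "label-and-exp"))   # EF/exp1
    print(phb(999, 5, "exp-only"))       # EF, whatever the label/path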

An MPLS network must provide appropriate policing and traffic rate shaping (smoothing of traffic bursts) at the edges when the core LSRs are queuing and scheduling on limited information.
An MPLS network that takes path information (the label value) as part of the packet's context can have more finely grained control over per-hop resource sharing than a DiffServ network can have. This is because the MPLS label value is essentially a compressed version of the information derived from multi-field (MF) classification (e.g. using the destination address and type of service fields in the IPv4 header for traffic separation) at the ingress to the MPLS network. If the edge LSRs classify individual flows onto their own LSPs, the label value at any hop gives a core LSR enough context to differentiate packets at the flow level.

Traffic Engineering over MPLS

The task of mapping traffic flows onto an existing physical topology is called traffic engineering. Specifically, traffic engineering provides the ability to move traffic flows away from the shortest path selected by the IGP (e.g. OSPF or IS-IS) and onto a potentially less congested physical path across the service provider's network, see figure 6.8.
Figure 6.8 Traffic Engineering Path versus IGP Shortest Path across an MPLS Network (the traffic engineering path from Hong Kong to Peking crosses different LSRs than the IGP shortest path)

Traffic engineering allows service providers to balance the traffic load on the various links, routers and switches in the network so that none of these components is over-utilized or under-utilized. In this way, service providers can exploit the economies of the bandwidth that has been provisioned across the entire network. This helps to cut down operational cost and capital investment.

Applications for Traffic Engineering


Existing IGPs can actually contribute to network congestion
because they do not take bandwidth availability and traffic
characteristics into account when building their routing tables.
Service providers understand that traffic engineering can be used to
significantly enhance the operation and performance of their
networks. They intent to use the capabilities of traffic engineering
to:


1. route primary paths around known bottlenecks or points of congestion in the network;
2. provide precise control over how traffic is rerouted when the primary path is faced with single or multiple failures;
3. provide more efficient use of available bandwidth and long-haul fiber, i.e. make sure there are no over-utilized or under-utilized components;
4. make themselves more competitive within their markets by maximizing operational efficiency, resulting in lower operational costs;
5. enhance the traffic-oriented performance characteristics of the network by minimizing packet loss, minimizing prolonged periods of congestion and maximizing throughput;
6. enhance statistically bounded performance characteristics of the network (such as loss ratio, delay and jitter) that are required to support multi-services; and
7. provide more options, lower costs and better services to their customers.

History

In the early 1990s, service providers' (e.g. ISPs') networks were composed of routers interconnected by leased-line T1 and T3 links. When the demand for bandwidth increased faster than the speed of individual network links, as in the case of the Internet growth spurt in those days, the service providers responded by simply provisioning more links to provide additional bandwidth.
The traffic-engineering tool available in those days for router-based networks was the manipulation of routing metrics. Referring to figure 6.9, assume Network A sends a large amount of traffic to Network C and Network D. With the metrics in this figure, links 1 and 2 might be congested. This is because both the Network A-to-Network C and the Network A-to-Network D traffic flows over links 1 and 2. However, if the metric for link 4 were changed to 20, then the Network A-to-Network D traffic would be moved to link 4 (a worked computation follows figure 6.9).
Figure 6.9 Metric Based Traffic Control (Network A attaches to Router A, Network B to Router B, Network C to Router C and Network D to Router D; link 1 connects Routers A and B with metric 10, link 2 connects Routers B and C with metric 10, link 3 connects Routers C and D with metric 10, and link 4 connects Routers A and D with metric 40)
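The metric arithmetic can be checked with a small shortest-path computation. The topology below follows figure 6.9; the Dijkstra helper is ordinary textbook code, not part of the notes.

    # With link 4 at metric 40, A reaches D over links 1, 2 and 3 (cost 30);
    # dropping link 4's metric to 20 moves the A-to-D traffic onto link 4.
    import heapq

    def shortest_cost(graph, src, dst):
        dist, heap = {src: 0}, [(0, src)]
        while heap:
            d, node = heapq.heappop(heap)
            if node == dst:
                return d
            for nbr, w in graph[node].items():
                if d + w < dist.get(nbr, float("inf")):
                    dist[nbr] = d + w
                    heapq.heappush(heap, (d + w, nbr))

    def topology(link4_metric):
        return {"A": {"B": 10, "D": link4_metric},
                "B": {"A": 10, "C": 10},
                "C": {"B": 10, "D": 10},
                "D": {"A": link4_metric, "C": 10}}

    print(shortest_cost(topology(40), "A", "D"))  # 30: via links 1, 2 and 3
    print(shortest_cost(topology(20), "A", "D"))  # 20: directly over link 4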

There are quite a few limitations in metric-based traffic control and router-based core networks:
1. Traffic engineering based on metric manipulation is not scalable. A metric adjustment in one part of a large network is very likely to cause problems (i.e. hot spots) in another part of the network. Metric adjustment is a trial-and-error approach, not a scientific solution;
2. Traditional software-based routers have limited packet processing power and aggregate bandwidth; and
3. IGP (e.g. OSPF, IS-IS & RIP) route calculation is topology driven and is based on a simple additive metric such as the hop count or an administrative value. Bandwidth availability and traffic characteristics, i.e. the traffic load in the network, are not taken into account when the routes are calculated. This results in an uneven distribution of traffic across the network, causing inefficient use of expensive resources as well as congestion.

Around the mid 1990s, the volume of Internet traffic reached a point where ISPs and Internet backbone service providers were required to migrate their networks to support trunks larger than T3 (45 Mbps). At that time, OC-3 (155 Mbps) ATM interfaces were available. Service providers then moved their router-based networks to the IP overlay model (i.e. IP over ATM), see figure 6.10.
Figure 6.10 The IP over ATM (IP Overlay) Model (routers at POPs 1 to 4 connect over OC-3 links to an ATM core of switches interconnected by OC-12 links)

Operation of IP over ATM:

PVCs traversing the ATM core are used as logical circuits to provide connectivity between edge routers. A set of PVCs is configured to fully connect the routers at the edge of the ATM core. This maps the physical topology in figure 6.10 to a logical topology as shown in figure 6.11. The physical paths for the PVC overlay are usually calculated by an offline configuration utility. The PVC paths and attributes are globally optimized by the offline configuration utility based on link capacity and historical traffic patterns (i.e. traffic engineered). The offline configuration utility can also calculate a set of secondary backup PVCs that is ready to respond to failure conditions. Finally, after the globally optimized PVC mesh has been calculated, the supporting configurations are downloaded to the routers and the ATM switches to implement the single or double full-mesh logical topology, see figure 6.11. When congestion occurs, a new trunk is added or a new POP is deployed.
Figure 6.11 Logical IP Topology over an ATM Core (the routers at POPs 1 to 4 are fully meshed by PVCs across the ATM core)

The edge routers have knowledge only of the individual PVCs, which appear to them as simple P2P circuits between two routers. They do not have any knowledge about the ATM infrastructure. The mapping of IP prefixes (routes) to PVCs at each edge router is also determined by and downloaded from the offline configuration utility.
Finally, ATM PVCs are integrated into the IP network by running the IGP across each of the PVCs to establish peer relationships and exchange routing information.
Advantages of the IPoATM Model:
1. ATM offered the bandwidth that service providers needed in the mid 1990s.
2. ATM supports traffic engineering via the manipulation of PVCs.
3. ATM provides deterministic performance.
4. Compared with software-based routers, ATM switches forward packets/cells much faster, and provide higher-speed interfaces and significantly greater aggregate bandwidth.

Disadvantages of the IPoATM Model:
1. It is complicated and expensive to co-ordinate, operate and manage two different networks.
2. Currently, the maximum speed supported by ATM interfaces is only up to OC-48 (2.488 Gbps). It is complicated and expensive for ATM to support interfaces beyond this speed.
3. A cell tax (i.e. overhead) of around 20% is paid when IP packets are carried over an ATM infrastructure.
4. ATM suffers from the n-squared PVC problem.
5. A large number of fully meshed routers creates IGP stress.
6. The inability to seamlessly integrate L2 and L3.

Components of MPLS Traffic Engineering

MPLS provides a router-based traffic engineering solution. There are four functional components:
1. information distribution;
2. path selection;
3. signaling; and
4. packet forwarding.

Information Distribution
Traffic engineering requires detailed knowledge about
1. the network topology; and
2. dynamic information about network loading.
Distribution of this information for traffic engineering can be achieved by simple extensions to OSPF or IS-IS to include link attributes as part of each router's link state advertisement. The extensions to OSPF and IS-IS for traffic engineering are known as OSPF-TE and IS-IS-TE respectively. IS-IS-TE is achieved by defining new Type-Length-Values (TLVs)/objects. OSPF-TE is implemented with Opaque LSAs (link state advertisements). The standard flooding algorithm used by link state IGPs ensures that link attributes are distributed to all routers in the service provider's routing domain. Some of the traffic engineering extensions added to the IGP link state advertisement are:


1. maximum link bandwidth;
2. maximum reservable link bandwidth;
3. current bandwidth reservation;
4. current bandwidth usage; and
5. link color. Links or resources can be classified into different classes. Links or resources indicated by the same color are said to belong to the same class. For example, if OC-48 links are indicated with a color, then all other types of links of OC-48 capacity have the same color. The link or resource color attribute can be used to implement policies to optimize network performance, e.g. to implement the generalized inclusion and exclusion policy to restrict the placement of traffic to a specific subset of links or resources.

Figure 6.12 shows the components in the control plane of an LSR. The components involved in information distribution are the blocks highlighted with bold lines. As shown in the figure, every LSR in an MPLS domain maintains the network link attributes and network topology in a specialized Traffic Engineering Database (TED). The TED is used exclusively for calculating explicit paths for the placement of LSPs across the physical topology. The IGP continues the calculation of the traditional shortest path based on the information contained in the router's link state database.
Figure 6.12 Information Distribution Components (in the LSR control plane, OSPF-TE/IS-IS-TE routing floods information into the Link State Database, used by IGP route selection, and into the TE Database, used by LSP route selection; the signaling component (RSVP-TE or CR-LDP) performs LSP setup; the packet forwarding plane components below carry packets in and out)

Path Selection
Based on the information stored in the TED, every ingress LSR in the MPLS domain can calculate the paths of its own set of LSPs across the MPLS backbone. The path calculated for each LSP can be either a strict explicit route or a loose explicit route. A strict explicit route specifies all the LSRs in the LSP, while a loose explicit route specifies only some of the LSRs in the LSP, see figure 6.13.
Figure 6.13 The Ingress LSR Calculates Explicit Routes (from the ingress LSR in Hong Kong across the MPLS backbone of LSRs 1 to 9 to the egress LSR in Peking; strict route: [4, 5, 6]; loose route: [4, 9])

The concept of constraint-based routing is used to calculate the physical path for an LSP. Constraint-based routing can co-exist with IGP routing. Constraint-based routing takes in additional attributes or constraints, such as traffic flow or traffic trunk attributes (to be discussed later), network technology, link attributes, etc., to calculate the path. The resultant path usually deviates from the shortest path calculated by a traditional IGP, e.g. OSPF.
With reference to figure 6.14, the Constrained Shortest Path First (CSPF) algorithm takes into account specific restrictions stored in the TED to calculate the shortest path across the MPLS backbone. Input into the CSPF algorithm includes:
1. topology link state information;
2. attributes associated with the state of network resources, e.g. maximum link bandwidth, current bandwidth reserved, etc. (i.e. link attributes);
3. attributes required to support traffic traversing the proposed LSP, e.g. bandwidth requirements (i.e. traffic flow or traffic trunk attributes); and
4. other administrative attributes, e.g. maximum hop count, administrative policy requirements such as the inclusion or exclusion of certain classes of link.
Note: 1 and 2 above are distributed by OSPF-TE or IS-IS-TE. 3 and 4 are entered by the operator.

The output of the CSPF calculation is an explicit route consisting of a sequence of LSR addresses that provides the shortest path through the network that meets the constraints. This output is then passed to the signaling component to set up the LSP, see figure 6.14.
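The prune-then-route idea behind CSPF can be sketched briefly. The TED contents, attribute names and node names below are invented; real implementations work on the full link-state topology.

    # Toy CSPF: drop every link that violates the constraints (here the
    # reservable bandwidth and an excluded link color), then run a plain
    # shortest-path search on what remains to get the explicit route.
    import heapq

    TED = {  # (from, to): link attributes flooded by OSPF-TE/IS-IS-TE
        ("in", "1"): {"metric": 1, "avail_bw": 600, "color": "gold"},
        ("in", "4"): {"metric": 1, "avail_bw": 200, "color": "silver"},
        ("1", "eg"): {"metric": 1, "avail_bw": 100, "color": "gold"},
        ("4", "eg"): {"metric": 1, "avail_bw": 200, "color": "silver"},
    }

    def cspf(src, dst, need_bw, exclude_color=None):
        graph = {}
        for (a, b), attrs in TED.items():   # constraint pruning
            if attrs["avail_bw"] >= need_bw and attrs["color"] != exclude_color:
                graph.setdefault(a, []).append((b, attrs["metric"]))
        heap, seen = [(0, src, [src])], set()
        while heap:                          # Dijkstra on the pruned topology
            cost, node, path = heapq.heappop(heap)
            if node == dst:
                return path
            if node in seen:
                continue
            seen.add(node)
            for nbr, w in graph.get(node, []):
                heapq.heappush(heap, (cost + w, nbr, path + [nbr]))
        return None

    print(cspf("in", "eg", need_bw=150))  # ['in', '4', 'eg']: link 1-eg too thin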
Figure 6.14 Path Selection Component (in the ingress LSR control plane, CSPF path selection draws on the TE Database and passes the resulting explicit route to the signaling component (RSVP-TE or CR-LDP) for LSP setup)

The online path calculation discussed above is not deterministic; that is, the physical path selected for an LSP depends on the order in which the LSPs are calculated. Therefore, an offline planning and analysis tool is usually available for global optimization. This offline tool performs the global path calculation for all the required LSPs simultaneously and selects the best solution for the network as a whole. The output of the offline calculation is a set of explicit routes for the LSPs. These LSPs can be installed in any order and the utilization of network resources is optimal.
Traffic Trunks
As indicated in IETF RFC 2430, the definition of a traffic trunk is: "A traffic trunk is an aggregation of traffic flows of the same class which are placed inside an LSP." Traffic flows that share specific attributes, e.g. ingress LSR, egress LSR, average rate, peak rate, priority, FEC, etc., belong to the same class. Traffic trunks can be mapped to a set of LSPs and can also be moved from one LSP to another LSP, either automatically or through administrative intervention. This enables the network to adapt to changing load conditions.
As described in IETF RFC 2702, the following attributes of traffic trunks are significant for traffic engineering (a configuration sketch follows the list):


1. Traffic parameter attributes. These are the resource requirements of a traffic trunk, e.g. peak rate, average rate, burst size, etc.
2. Generic path selection and maintenance attributes. These attributes define how paths are selected, i.e. via constraint-based routing signaling (e.g. RSVP-TE), IGPs or other manual means.
3. Priority attribute. This attribute defines the relative importance of traffic trunks.
4. Preemption attribute. This attribute determines whether a traffic trunk can preempt another traffic trunk from a given path.
5. Resilience attribute. This attribute indicates whether to reroute or leave the traffic trunk as is under a failure condition.
6. Policing attribute. This attribute determines the actions that should be taken by the network when a traffic trunk exceeds the traffic parameters specified in its contract.
7. Resource attributes. These attributes constrain the placement of traffic trunks. For example, use the resource color/class attribute to include or exclude a specific set of resources in the placement of traffic trunks.
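As a rough illustration only, these attributes could be collected in a configuration record like the following; all field names and default values are assumptions, not RFC 2702 syntax.

    # Sketch of a traffic trunk record holding the RFC 2702 attribute
    # groups listed above. Field names and defaults are invented.
    from dataclasses import dataclass

    @dataclass
    class TrafficTrunk:
        peak_rate_mbps: float          # 1. traffic parameters (resources
        average_rate_mbps: float       #    required by the trunk)
        burst_size_kb: float
        path_selection: str = "rsvp-te"   # 2. how the path is selected
        priority: int = 4                 # 3. relative importance
        can_preempt: bool = False         # 4. may this trunk preempt others?
        reroute_on_failure: bool = True   # 5. resilience under failure
        policing_action: str = "mark"     # 6. action when the contract is exceeded
        include_colors: tuple = ("gold",) # 7. resource/color placement constraint

    trunk = TrafficTrunk(peak_rate_mbps=100, average_rate_mbps=40, burst_size_kb=64)
    print(trunk.priority, trunk.include_colors)   # 4 ('gold',)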

Signaling Component
After the explicit route for an LSP has been calculated, the LSP can be installed by:
1. manual configuration; or
2. using a signaling protocol to establish the LSP and distribute the labels, see figure 6.15.
Manual configuration requires going into each and every LSR along the path and specifying the incoming label/interface and outgoing label/interface. This is much like provisioning ATM PVCs.
Figure 6.15 Signaling Component (in the LSR control plane, the signaling component (RSVP-TE, LDP or CR-LDP) takes the explicit route from LSP path selection and performs LSP setup with neighboring LSRs)

Using a signaling protocol is the preferred way to set up LSPs and distribute labels. There are 3 label distribution protocols used for label distribution and/or setting up LSPs in MPLS. They are:
1. LDP (Label Distribution Protocol).
LDP supports neither traffic engineering nor explicit routes. It executes hop by hop, i.e. each LSR along the path looks at the IP routing table to determine the next hop for the LSP. It uses the same path as the IGP. It is usually used to distribute labels between LER peers in a targeted LDP session in MPLS.
2. RSVP-TE (RSVP with traffic engineering).
RSVP-TE supports traffic engineering and is widely deployed in MPLS.
3. CR-LDP (Constraint-based routing LDP).
CR-LDP extends LDP to support explicit routes (i.e. traffic engineering).

More on Label Binding and Label Distribution
a. Label binding is the mapping between an FEC and a label. FEC-to-label bindings are stored in the LIB table in an LER or LSR. FEC-to-label binding can be triggered by some control events, for example when an LSR receives a label binding request from an upstream LSR (downstream-on-demand), or when an LSR discovers a next hop for a particular FEC.
b. Downstream-on-demand Label Distribution (figure 6.16)
1. Upstream LSR1 recognizes downstream LSR2 as its next hop for an FEC.
2. A request is made to LSR2 for a binding between the FEC and a label.
3. If LSR2 recognizes the FEC and has a next hop for it, it creates a binding and replies to LSR1.
4. Both LSRs then have a common understanding of this label.


An example of a protocol that distributes labels in this fashion is RSVP-TE.

Figure 6.16 Downstream-on-demand Label Distribution (upstream LSR1 sends a request for a binding for an FEC to downstream LSR2; LSR2 returns the label-FEC binding)

c. Downstream Unsolicited Label Distribution (figure 6.17)
1. LSR2 discovers a next hop further downstream for a particular FEC.
2. LSR2 generates a label for the FEC and communicates the binding to LSR1.
3. LSR1 inserts the binding into its LIB/switching table.
4. If LSR2 is the next hop for the FEC, LSR1 can use that label knowing that its meaning is understood.
An example of a protocol that distributes labels in this fashion is CR-LDP. Note: LDP also supports downstream-on-demand.
Figure 6.17 Downstream Unsolicited Label Distribution (downstream LSR2 sends the label-FEC binding to upstream LSR1 without being asked)

d. Label Control
MPLS defines modes for the distribution of labels to neighboring LSRs:
1. Ordered: in this mode, an LSR binds a label to a particular FEC and distributes this binding to its upstream peers if and only if it is the egress LSR or it has received the label binding for the FEC from its next hop (downstream) LSR. RSVP-TE works in this mode.
2. Independent: in this mode, once an LSR recognizes the next hop for a particular FEC, it makes the decision to bind a label to the FEC independently and distributes the binding to its peers.
e. Label Retention
MPLS defines the treatment for FEC-to-label bindings received from LSRs that are not the next hop for a given FEC. Two modes are defined:
1. Conservative: in this mode, these FEC-to-label bindings are discarded. This mode requires an LSR to maintain fewer labels. This mode is for ATM-LSRs.
2. Liberal: in this mode, these FEC-to-label bindings are retained. This mode allows for quicker switching of traffic to other LSPs in case of topology changes.
o

Label Distribution Protocol (LDP)
LDP is a protocol defined by the IETF for the distribution of FEC-to-label binding information to LSRs in an MPLS network. It is used to map FECs to labels, which in turn create LSPs. The physical path taken by these LSPs is the same as the routes calculated by the traditional IGP.
LDP sessions are established between LDP peers in the MPLS network (not necessarily adjacent). The peers exchange the following types of LDP messages:
1. discovery messages announce and maintain the presence of an LSR in a network;
2. session messages establish, maintain and terminate sessions between LDP peers;
3. advertisement messages create, change and delete label mappings for FECs; and
4. notification messages provide advisory information and signal error information.
RSVP Traffic Engineering/Tunnel Extension
(RSVP-TE), IETF RFC 3209
Brief Review of RSVP

Resource Reservation Protocol (RSVP), an IETF standard (RFC 2205), specifies resource reservation techniques for IP networks.
RSVP is a protocol that enables resources (e.g. link bandwidth, queuing space, switching bandwidth) to be reserved for a given session (or sessions) prior to any attempt to exchange media between the participants. Note: RSVP does not carry user data. User data is transported by RTP after the reservation procedures are performed.

RSVP provides strong QoS guarantees, significant granularity of resource allocation, and significant feedback to applications and users.
RSVP also has the ability to support protection, i.e. traffic restoration in case of failure, in a timely fashion (less than 50 msec) in MPLS.
RSVP Message Syntax
The message format is Type-Length-Value: a type field identifies the message type, followed by a length field, followed by the data itself.
RSVP messages
These messages (i.e. types) are:
1) Path, 2) Resv, 3) PathErr, 4) ResvErr, 5) PathTear, 6) ResvTear and 7) ResvConf.
Each RSVP message carries a number of objects. These will be discussed later.
Figure 6.18 shows how RSVP establishes a session between host A and host B.
1. Host A first issues a PATH message to the far end via a number of routers. This message carries the traffic specifications, e.g. the bandwidth and packet size, of the data the sender expects to send.
2. Each RSVP-enabled router along the way establishes a path state that includes the previous source address of the PATH message (i.e. the next hop back to the sender).
3. The receiver of the PATH message responds with a Reservation Request (RESV) message. The receiver indicates the type of reservation service requested, e.g. the Controlled-load service or Guaranteed service defined in Integrated Services.
4. The RESV message travels back to the sender along the same route that the PATH message took (but in reverse). At each router, the requested resources indicated in the FlowSpec object (see table 6.1 below) are allocated if they are available.
5. Finally, the RESV message reaches the sender with a confirmation that resources have been reserved.
Note: RSVP supports QoS reservation in the routers along the IGP path between a pair of hosts. This creates a scalability problem because each router along the path has to maintain per-flow state between each pair of hosts, and there can be millions of hosts requesting RSVP service from the network simultaneously.
Figure 6.18 RSVP Used for Resource Reservation (the PATH message travels from sender Host A through the routers to receiver Host B; the RESV message returns hop by hop along the reverse path)

Objects carried in RSVP messages.
The following table shows some of the objects carried in RSVP messages.

Table 6.1 RSVP Objects

Session
Function: to identify a session.
Information in the object: 1) destination IP address; 2) destination IP port number (optional); 3) class type, e.g. IPv4/UDP or IPv6/UDP.
RSVP messages: all.

RSVP_HOP
Function: to identify the previous node through which this RSVP message came.
Information in the object: the IP address and interface of a node.
RSVP messages: all.

Time_Value
Function: to indicate the time-out period of this RSVP message.
Information in the object: the time-out period in msec.
RSVP messages: all.

TSpec
Function: to indicate the traffic specifications the sender expects to send.
Information in the object: 1) specifications for metering, for example a token bucket, including (i) a token bucket size in bytes (this limits the maximum input burst size) and (ii) a token bucket rate (this limits the input average rate); 2) peak rate in bytes/sec; 3) maximum packet size in bytes; 4) minimum policing unit in bytes, e.g. m bytes (packets shorter than m bytes will be counted as m bytes long).
RSVP messages: PATH, RESV.

Sender_Template
Function: to describe the format of the data packets that a specific sender (i.e. host) will originate. This template is in the form of a FilterSpec that is typically used to select this sender's packets from others in the same session on the same link.
Information in the object: see FilterSpec.
RSVP messages: PATH.

FlowSpec
Function: 1) to specify the desired QoS; 2) to indicate the parameters of the desired QoS control service; 3) to indicate the accepted traffic specifications.
Information in the object: 1) a service number specifying either the Guaranteed or the Controlled-load service defined in IntServ; 2) an RSpec (reserve spec.) containing the parameter(s) for the specified service number, e.g. bandwidth, maximum delay, packet size, etc.; these parameters define the desired QoS; 3) a TSpec object.
RSVP messages: RESV.

FilterSpec
Function: to be used together with the Session object to define the set of data packets (the flow) that receives the service defined by the FlowSpec object. It carries information identifying the sender. It is sent as a Sender_Template in the PATH message.
Information in the object: 1) the IP address of the sender (i.e. host); 2) the IP port number of the sender (optional; it is included, for example, in video conferencing where the video stream and the audio stream request different QoS treatments defined by different FlowSpecs).
RSVP messages: PATH, RESV.

ADSpec
Function: to allow the sender and the routers along the path to advertise their QoS capabilities to the receiver(s). When the ADSpec object reaches a receiver in the PATH message, it provides a good indication of what the receiver can reasonably request in terms of QoS from the routers and sender.
Information in the object: parameters of QoS capability, e.g. link bandwidth.
RSVP messages: PATH.

The reservations that RSVP makes are soft, which means that they need to be refreshed/updated on a regular basis by the receiver(s).
RSVP is used in IntServ and DiffServ for resource reservation, and RSVP-TE (traffic engineering) is used for setting up MPLS LSPs.

Label Binding and LSP Tunnel Establishment using RSVP-TE
In the late 1990s, the IETF extended RSVP to support traffic engineering and solve the scalability problem. In RSVP-TE, RSVP sessions do not extend from host to host. They only take place between ingress and egress LSRs. Traffic from hosts connected to an ingress LSR is aggregated at the ingress LSR. The aggregated traffic, known as a traffic trunk, is then mapped to an LSP, also known as an LSP tunnel, see figure 6.19. Every LSR or LER only has to maintain the state of the LSPs, not per-flow state.
Figure 6.19 Traffic Aggregation (traffic from Host A and Host B is aggregated at the ingress LSR onto one LSP towards the egress LSR and Host C)

RSVP-TE performs downstream label allocation, distribution, and binding on demand among the LSRs in an LSP path, thus establishing path state in the network, see figure 6.20.
New objects have been defined in the extensions to support traffic engineering in MPLS. Some of these new objects are:
1. LABEL_REQUEST object;
2. EXPLICIT_ROUTE object (ERO);
3. RECORD_ROUTE object (RRO);
4. SESSION_ATTRIBUTE object;
5. LABEL object; and
6. STYLE object.
These objects are included in the PATH and/or RESV messages to set up an LSP tunnel (see figure 6.20). Other standard RSVP objects listed in table 6.1 can also be included in the PATH and RESV messages.


Figure 6.20 Establishing an LSP Tunnel (the ingress LSR sends a PATH message carrying the SESSION, LABEL_REQUEST, ERO*, RRO* and SESSION_ATTRIBUTE* objects; the egress LSR returns a RESV message carrying the SESSION, LABEL, STYLE and RRO* objects; * = optional object; the LSP is then established)

Details of objects in a PATH message to support traffic engineering & Class of Service
1. LABEL_REQUEST object. This object indicates that a label binding is requested for a specific LSP. This object also contains the Layer 3 protocol ID (e.g. IP) of the traffic that will traverse this LSP. This object can also indicate the label range for ATM-based and Frame Relay-based labels. There is no label range for regular 32-bit MPLS labels.
2. EXPLICIT_ROUTE object (ERO). This object is encoded as a series of sub-objects contained in the ERO. Each sub-object can identify a group of nodes in the explicit route or can specify an operation to be performed along the path. Each group of nodes is called an abstract node. The format of a sub-object is shown in figure 6.21.

Figure 6.21 ERO Sub-object Format (L bit | Type | Length | Sub-object Contents)

In figure 6.21:
L bit = 0 implies a strict hop in the explicit route;
L bit = 1 implies a loose hop in the explicit route.
Type can be an IPv4 prefix, an IPv6 prefix, or an Autonomous System Number (this number identifies an abstract node consisting of the set of nodes belonging to the autonomous system; it allows an LSP to traverse different autonomous systems).
Figure 6.22 shows a loose explicit route with a strict hop and a loose hop (an encoding sketch follows the figure).
Figure 6.22 Loose Explicit Route (the LSP runs from the ingress LSR at 10.10.10.1 via 10.10.20.1 and 10.10.30.1 to the egress LSR at 10.10.40.1; Explicit Route = {[L=0, IPv4, 10.10.10.1] [L=1, IPv4, 10.10.40.1]})
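Encoding the two sub-objects of figure 6.22 can be sketched as follows. This approximates the RFC 3209 IPv4 sub-object layout (the L bit on top of a 7-bit type, a length byte, the address, a prefix length and a padding byte); treat it as illustrative rather than a conformant implementation.

    # Sketch: pack the IPv4-prefix ERO sub-objects of figure 6.22. The
    # high bit of the first byte is the L (loose) bit; length 8 covers the
    # whole sub-object (2 header bytes + 4 address bytes + prefix + pad).
    import struct

    IPV4_PREFIX_TYPE = 0x01

    def ipv4_subobject(loose, addr, prefix_len=32):
        first = (0x80 if loose else 0x00) | IPV4_PREFIX_TYPE   # L bit + type
        packed = bytes(int(x) for x in addr.split("."))
        return struct.pack("!BB", first, 8) + packed + struct.pack("!BB", prefix_len, 0)

    # The loose explicit route of figure 6.22: a strict hop, then a loose hop.
    ero = ipv4_subobject(False, "10.10.10.1") + ipv4_subobject(True, "10.10.40.1")
    print(ero.hex())   # 01080a0a0a01200081080a0a28012000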

3. RECORD_ROUTE object (RRO)
This object is sent to the egress LSR via the PATH message and is returned to the ingress LSR via the RESV message. The ingress LSR can get information about the actual route that the LSP traverses from this object. The RRO can be used to:
a. detect L3 routing loops or loops inherent in the explicit route;
b. collect detailed hop-by-hop path information of an RSVP session; and
c. provide input into the ERO. After the ingress LSR receives the RRO from the RESV message, it can alter its ERO in the next PATH message. This can be used to pin down (i.e. not allow to change) a session path to prevent the path from being altered even if a better path becomes available.
When an RRO traverses the path in a PATH message, each node (including the ingress and egress LSRs) along the path inserts its IP address prefix sub-object into the RRO. When the RRO returns to the ingress LSR along the same path in the RESV message, each node along the path retrieves the RRO from the RESV message. Hence every node along the path will have the complete route of the LSP from ingress to egress.
4. SESSION_ATTRIBUTE Object
This object is used to control LSP priority, preemption, and fast reroute features.
The Setup Priority field defines the priority of this LSP. This field is used when deciding whether one LSP tunnel can preempt another.
The Holding Priority field defines the priority of the LSP tunnel with respect to holding resources that other LSPs want to consume. This field is used when deciding whether one LSP tunnel can be preempted.
The Local Repair flag indicates whether local repair by transit LSRs can violate the ERO in case of failure.
The "Ingress node may reroute" bit indicates whether the ingress LSR can reroute the LSP without tearing it down.
5. SESSION, SENDER_TEMPLATE, FLOW_SPEC and FILTER_SPEC
New C-type (Class-type) extensions have been defined for these regular RSVP objects.
SESSION: the new C-type defined is LSP_TUNNEL_IPv4, which contains the IPv4 address of the egress node and a unique 16-bit LSP_ID (i.e. LSP tunnel/traffic trunk ID) that remains constant over the life of the LSP tunnel even if the LSP is rerouted. This object uniquely identifies an LSP tunnel.
SENDER_TEMPLATE: the new C-type defined is LSP_TUNNEL_IPv4, which contains the IPv4 address of the sender node and a unique 16-bit LSP_ID that can be changed to allow a sender to share resources with itself. This LSP_ID is used when an LSP tunnel that was established with a Shared Explicit style is rerouted.
FLOW_SPEC: the new C-type defined is CLASS_OF_SERVICE (CoS). FLOW_SPEC is used to define the desired QoS, see table 6.1. When a traffic flow in a session satisfies the specifications (i.e. sender IP address and optionally sender IP port number) listed in the FILTER_SPEC object carried in the same PATH message as the FLOW_SPEC, this flow will get the desired QoS treatment defined in the FLOW_SPEC at each LSR along the path.
Usually the ingress LSR constructs a TSPEC and inserts it into the FLOW_SPEC. Based on this information, the egress LSR constructs a receiver TSPEC and RSPEC (see table 6.1), inserts them into the FLOW_SPEC and sends it back to the ingress LSR via the RESV message. LSRs along the path reserve resources based on the information in the FLOW_SPEC.
FILTER_SPEC: the new C-type defined is LSP_TUNNEL_IPv4, which contains the IPv4 address and optionally the IP port number of the sender node and a unique 16-bit LSP_ID that can be changed to allow a sender to share resources with itself. This LSP_ID is used when an LSP tunnel that was established with a Shared Explicit style is rerouted. FILTER_SPEC together with FLOW_SPEC form the FLOW DESCRIPTOR.
Figure 6.23 shows some of the objects that can be included in a PATH message.
Figure 6.23 PATH Message (SESSION, LABEL_REQUEST, ERO, RRO, SESSION_ATTRIBUTE, SENDER_TEMPLATE, FLOW_SPEC)

Details of objects in a RESV message to support traffic engineering & Class of Service
1. LABEL object: a label is provided by each sender (LSR/LER) to the LSP. Figure 6.24 shows how an LSR processes the label in a RESV message. When the LSR receives a RESV message corresponding to a previous PATH message, it binds the incoming label for the specified FEC/traffic trunk to the receiving interface (2 in this example) and updates the forwarding table. It then binds a locally allocated label to the LSP's incoming interface (1 in this example) and updates the forwarding table. The LSR then constructs a new LABEL object, replaces the old LABEL object in the received RESV message, and forwards the RESV message to the previous hop (upstream) in the LSP.
Figure 6.24 An LSR Processes the LABEL Object (the LSR receives RESV (LABEL = 22) from the downstream LSR/LER and sends RESV (LABEL = 11) to the upstream LSR/LER; the direction of the LSP, i.e. the packet flow, is the reverse)

MPLS Forwarding/Switching Table:
Input Interface | Input Label | Output Interface | Output Label | Action
1               | 11          | 2                | 22           | Swap

2. STYLE object: this object specifies the resource reservation style that can be applied to traffic trunks (i.e. aggregated flows). In MPLS, the reservation style can be either Fixed Filter (FF) or Shared Explicit (SE). It is the receiver, i.e. the egress LSR, that chooses the reservation style for an LSP, NOT the ingress LSR. However, an ingress LSR can set the "Ingress node may reroute" bit in the SESSION_ATTRIBUTE object to request that the egress LSR use the SE reservation style.
Fixed Filter: the FF reservation style specifies an explicit list of senders and a distinct reservation for each of them. Each sender is identified by the IP address of an LSR/LER and a local LSP_ID. Each sender has a distinct reservation that is not shared with other senders. A separate LSP is constructed for each sender-receiver pair. A traditional application for this style of reservation is video distribution, which requires a separate pipe for each of the individual video streams. In figure 6.25, ingress LSRs A and B create two separate point-to-point LSPs, LSP 1 and LSP 2, towards the common egress LSR D, both with the FF reservation style. The total amount of bandwidth reserved on the shared link C-D is equal to the sum of the reservations required by ingress LSR A and ingress LSR B. Egress LSR D also assigns different labels to LSP 1 and LSP 2.
Figure 6.25 Fixed Filter Reservation Style (ingress LSRs A and B establish LSP 1 and LSP 2 via LSR C to the common egress LSR D; the shared link C-D carries the sum of the two reservations)

Shared Explicit: the SE reservation style creates a single reservation over a link that is shared by an explicit list of senders. Again, a separate LSP is created for each sender-receiver pair. In figure 6.26, LSP 1 and LSP 2 are created with the SE reservation style. Link C-D is the shared link, with a bandwidth reservation equal to the larger request. For the multipoint-to-point LSP shown in figure 6.26, egress LSR D assigns the same label to LSP 1 and LSP 2. This is known as label merging or stream merging.
Figure 6.26 Shared Explicit Reservation Style (LSP 1 and LSP 2 share a single reservation on link C-D; egress LSR D assigns the same label to both LSPs)

Figure 6.27 shows some of the objects that can be included in a RESV message.

Figure 6.27 RESV Message (SESSION, LABEL, RRO, STYLE, FLOW_DESCRIPTOR list)


Establishing an LSP Tunnel using RSVP-TE

We use the example shown in figure 6.28 to explain how RSVP-TE establishes an LSP tunnel from LSR 1 to LSR 4.
1. Assume an explicit route has already been constructed and downloaded to ingress LSR 1, and the L-bit in each sub-object in the ERO of this explicit route has been cleared to specify strict hops.
2. LSR 1 creates a PATH message with objects as shown in figure 6.28. It:
- sets up the SESSION object to uniquely identify this LSP tunnel;
- indicates in the LABEL_REQUEST object that FEC-to-label binding is requested for the LSP and also identifies the L3 protocol (L3PID, e.g. IP) going to be carried by this LSP;
- inserts its own IP address prefix into the RRO object;
- sets up the priority, preemption and fast reroute in the SESSION_ATTRIBUTE object;
- enters the sender's IP address (in RSVP-TE, this can be the IP address of the ingress LSR, LSR 1) and the IP port number if necessary into the SENDER_TEMPLATE object; and
- enters the traffic characteristics (see table 6.1) of the flow that will be sent along the LSP into the TSPEC in the FLOW_SPEC object.
Note: if the LSP is intended to carry best-effort traffic and does not require that resources be allocated, then the burst size and the rate are set to 0.
3. Ingress LSR 1 sends the PATH message to LSR 2 as specified by the ERO.
4. When LSR 2 receives the PATH message, it records the ERO, the SESSION object, the LABEL_REQUEST object, the SESSION_ATTRIBUTE object, the IP address of the previous hop, and the TSPEC. It then inserts its own IP address into the RRO and forwards the PATH message to LSR 3.
5. LSR 3 processes the PATH message exactly the same as LSR 2 and forwards it to egress LSR 4. Note: LSR 3 is the so-called penultimate LSR.
6. When LSR 4 receives the PATH message, it notices from the SESSION object that it is the egress LSR for the LSP. It generates a RESV message for the session to distribute labels and establish forwarding state for the LSP tunnel. It:
- allocates a label with a value of 0 and places it in the LABEL object. Both 0 and 3 have special meaning to LSR 4; they are used to speed up the operation of the egress LSR by avoiding two table lookups;
- constructs the STYLE object for the RESV message and selects the appropriate reservation style. If the "Ingress node may reroute" bit in the SESSION_ATTRIBUTE object was set by LSR 1, then it may set the style to SE;

- based on the TSPEC in the PATH message, LSR 4 constructs an appropriate receiver TSPEC and RSPEC for the FLOW_SPEC. Together with the FILTER_SPEC, this forms the FLOW_DESCRIPTOR in the RESV message; and
- sends the RESV message back to LSR 3 based on the previous hop information. Note: the ERO is not in the RESV message.
7. When LSR 3 receives the RESV message containing the label assigned by LSR 4, it:
- stores the label 0 as part of the reservation state for the LSP;
- allocates a new label, 22, and replaces the old label 0 with this new label in the LABEL object. This is the label that LSR 3 uses to identify incoming traffic on the LSP from LSR 2;
- updates its forwarding/switching table;
- allocates the resources and installs filters based on the information in the STYLE object and the FLOW_DESCRIPTOR list; and
- forwards the RESV message upstream to LSR 2 based on the previous hop information it received in the previous PATH message.
8. LSR 2 processes the RESV message exactly the same way as LSR 3 except that the new label allocated is 11.
9. When LSR 1 receives the RESV message that contains the label 11 assigned by LSR 2, it allocates the resources, installs the filter(s), updates its forwarding table and uses label 11 for all outgoing traffic that maps to this LSP. (A small simulation of this RESV pass follows figure 6.28.)

Figure 6.28 Setting up an LSP Tunnel between Ingress LSR 1 and Egress LSR 4. The PATH message (SESSION, LABEL_REQUEST, ERO, RRO, SESSION_ATTRIBUTE, SENDER_TEMPLATE, FLOW_SPEC) travels from LSR 1 via LSR 2 and LSR 3 to LSR 4; the RESV message (SESSION, LABEL, RRO, STYLE, FLOW_DESCRIPTOR list) returns with LABEL = 0 from LSR 4, LABEL = 22 from LSR 3 and LABEL = 11 from LSR 2.
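The RESV pass in figure 6.28 can be mimicked with a few lines. The function below is illustrative: it walks the RESV message upstream from the egress, installing a swap entry at each transit LSR, with LSR 2 and LSR 3 allocating labels 11 and 22 as in the example.

    # Sketch of the RESV pass: the egress advertises the special label 0;
    # each transit LSR stores the label received from downstream as its
    # out-label and hands its own locally allocated label upstream.
    def resv_pass(path, local_labels):
        """path: LSRs from ingress to egress; local_labels: transit allocations."""
        tables = {}
        downstream_label = 0                  # label advertised by the egress
        for lsr in reversed(path[1:-1]):      # transit LSRs, egress to ingress
            in_label = local_labels[lsr]      # label this LSR hands upstream
            tables[lsr] = {"in": in_label, "out": downstream_label, "action": "swap"}
            downstream_label = in_label
        return tables, downstream_label       # the ingress pushes this label

    tables, ingress_label = resv_pass(["LSR1", "LSR2", "LSR3", "LSR4"],
                                      {"LSR2": 11, "LSR3": 22})
    print(ingress_label)    # 11: LSR 1 uses this for all traffic on the LSP
    print(tables["LSR3"])   # {'in': 22, 'out': 0, 'action': 'swap'}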

Packet Forwarding
o The components for packet forwarding in an LER and an LSR are shown in figure 6.6 and figure 6.4 respectively.
o We use the LSP established in figure 6.28 to explain packet forwarding in MPLS. With reference to figure 6.29:
1. When ingress LSR 1 receives a standard IP packet, it analyses the IP header. Based on this analysis, the packet is classified, mapped to an FEC/traffic trunk and hence assigned a label (in this case 11). LSR 1 encapsulates the IP packet in an MPLS frame, pushes the label 11 onto the label header and forwards the MPLS packet out of interface 2.
2. LSR 2 receives the MPLS packet on interface 1 with a label equal to 11. It looks up the forwarding/switching table and learns that the packet should be forwarded out of interface 2 with a label equal to 22. It swaps the label to 22 and forwards the MPLS packet out of interface 2.
3. LSR 3 (the penultimate LSR) processes the received packet exactly the same way as LSR 2.
4. LSR 4 receives the MPLS packet on interface 1 with a label equal to 0. Because the MPLS frame has a label value of 0, LSR 4 knows that it is the egress LSR for the LSP tunnel and that it must make a forwarding decision based on the destination IP address in the packet's IP header, not on the MPLS label. Therefore LSR 4 performs standard IP routing by doing a longest-match lookup in its IP routing table for the next hop.

Figure 6.29 Data Packet Forwarding (across the MPLS domain, ingress LSR 1 pushes label 11 onto the IP packet; LSR 2 swaps the label to 22; LSR 3 swaps it to 0; egress LSR 4 pops the label)

MPLS Fast Re-Route

MPLS fast re-route allows an LSP tunnel to be rerouted in less than 50 msec.

Conditions that may require re-routing an established LSP include:
1. when a resource (e.g. a link or router) along the LSP tunnel fails;
2. when the LSP does not meet its QoS requirements; and
3. when the failed resources along the original path are restored and available again, so the previously re-routed LSP can be routed back to its original path.

The make-before-break approach is adopted in MPLS for re-routing. Backup tunnels are usually pre-established and traffic is transferred to the backup tunnel before the primary tunnel is torn down.

The SE reservation style in RSVP prevents double counting of resources when the backup path and the primary path share common link(s)/hop(s).

There are two methods of using RSVP-TE to establish backup LSP tunnels. They are:
1. end-to-end protection switching.
In this method, two paths from ingress to egress, one primary and one backup, are pre-established for an LSP for redundancy. If a link along the primary path fails, the ingress node will be notified and will switch all traffic to the pre-signaled backup path, see figure 6.30.
While the backup path is idle, i.e. not carrying any traffic, its resources can be used by other LSPs. When re-routing to the backup path is triggered, it will preempt the other, lower-priority LSP(s) and reclaim the resources.
Figure 6.30 End-to-end Protection Switching (when a link fails on the primary LSP from ingress LSR 1 via LSRs 2 and 3 to egress LSR 6, failure notifications reach the ingress, which switches all traffic to the pre-signaled backup path via LSRs 4 and 5)

2. local repair.
Local repair allows the LSP to be repaired at the place of failure. This allows the existing LSP to reroute around a local point of failure. This method allows the network to converge faster than method 1 above.
In the one-to-one local repair backup scheme, at each LSR/LER along an LSP, a detour LSP is pre-signaled to protect that node against a failure of its downstream link or node. A detour LSP is a partial LSP that starts upstream of that node and intersects with the original LSP somewhere downstream of the point of link or node failure. Figure 6.31 shows the detour LSP at LSR 2 that protects it against a failure of LSR 3 or the LSR 2-LSR 3 link. The route of this detour LSP is LSR 2-LSR 4-LSR 5-LSR 6.
[Figure: the active LSP runs LSR 1-LSR 2-LSR 3-LSR 6; LSR 2 is the point of local repair, with a detour LSP via LSR 4 and LSR 5 to LSR 6 protecting against failure of the LSR 2-LSR 3 link or of LSR 3.]
Figure 6.31 Local-repair Protection Switching

MPLS VPNs

There are a number of diverse VPN models. They are:



1.

Traditional VPNs
a)
Frame Relay (Layer 2)
b)
ATM (Layer 2)
2.
Customer Premises Equipment (CPE) based VPNs
a)
L2 Tunneling Protocol (L2TP), PPTP (Layer 2)
b)
IPSec (Layer 3)
3.
Service Provider Provisioned VPNs
a)
BGP/MPLS VPNs - RFC 2547bis (Layer 3)
b)
MPLS-based Layer 2 VPNs.
We will discuss 3 a) and 3 b) here.

BGP/MPLS VPN - RFC 2547bis (Layer 3)

RFC 2547bis
1.
provides a mechanism that simplifies WAN operations
for a diverse set of customers that have little IP
routing experience/expertise; and
2.
is a way to efficiently scale the network while
delivering revenue-generating, value-added services.

Network Components
1.
RFC 2547bis defines a collection of policies to control
the connectivity among a set of customer sites. A
customer site is connected to the service provider
MPLS network by one or more ports at the Provider
Edge (PE) router where the service provider associates
each port with a VPN routing and forwarding table
(VRF) (see figure 6.32).
2.
Customer Edge (CE) Device
A CE device provides customer access to the service
provider network over a data link (e.g. ATM PVC,
Frame Relay PVC, VLAN) to one or more PE routers.
Usually the CE device is a router (can be a L2 switch
or a host) that establishes an adjacency with its
directly connected PE routers. After the adjacency is
established, the CE router advertises the sites local
VPN routes to the PE router and learns remote VPN
routes from the PE router.
3.
PE routers
PE routers exchange routing information with CE
routers using static routing, RIPv2, OSPF, IS-IS or
EBGP. A PE router only maintains VPN routes (i.e. a
VRF table) for those VPNs to which it is directly
attached (see figure 6.32), not ALL the service
provider's VPN routes. This enhances scalability.
Again, a port on a PE router, not a customer site, is
associated with a VRF. The customer connection to a
port is indirectly mapped to the specific VRF
associated with this port. A PE router can maintain
multiple VRFs, which supports the per-VPN segregation
of routing information.
After learning local VPN routes from CE routers, a PE
router exchanges VPN routing information with other
PE routers using IBGP. PE routers can maintain IBGP
sessions to route reflectors as an alternative to a full
mesh of IBGP sessions. This also enhances scalability.
When using an LSP to forward VPN data traffic, the
ingress PE router functions as the ingress LSR and
the egress PE router functions as the egress LSR.
4.
Provider (P) Routers
A P router is any router in the provider's network that
does not attach to CE devices. P routers are the
MPLS core LSRs. VPN data traffic is forwarded
across the MPLS backbone using a two-layer
label stack. P routers are not required to maintain
specific VPN routing information for each customer
site. This greatly enhances scalability.

[Figure: a service provider MPLS backbone with three PE routers; Customer A sites 1 (10.11/16), 2 (10.21/16) and 3 (10.31/16) attach via CEs to ports associated with VRF-A, and Customer B sites 1 (10.11/16) and 2 (10.21/16) attach to ports associated with VRF-B.]
Figure 6.32 RFC 2547bis Network Components

Operation Issues and Solutions


Besides scaling, RFC 2547bis also provides solutions for the
following operational issues:
1.
support overlapping customer address spaces;
2.
constrain network connectivity;
3.
maintain updated VPN routing information; and
4.
conserve backbone bandwidth and PE router packet
processing resources.

Overlapping Customer Address Spaces


Figure 6.32 shows the overlapping of customer A's
private IP address space with customer B's private IP
address space. To solve this problem and to provide
globally unique addresses, each customer's IP address
prefix is prepended with an 8-byte Route Distinguisher
(RD), see figure 6.33. The addresses so formed are
called VPN-IPv4 addresses (a packing sketch follows
the notes below). Note: the route distinguisher by
itself should be globally unique.
In figure 6.33, the TYPE field can be either 0 or 1.
For TYPE 0, the ADMINISTRATOR subfield is 2
bytes and should hold a globally unique autonomous
system number (ASN), i.e. the service provider shall
use the ASN assigned to it in this field. The
ASSIGNED NUMBER subfield holds a value from
the numbering space administered by the service
provider.
For TYPE 1, the ADMINISTRATOR subfield is 4
bytes and should hold a globally unique IPv4 address,
e.g. the global loopback address of the PE that
originates the route, i.e. the egress PE for an LSP.
Again, the ASSIGNED NUMBER subfield holds a
value from the numbering space administered by the
service provider.
[Figure: a VPN-IPv4 address is the 8-byte Route Distinguisher, consisting of a 2-byte TYPE followed by the 6-byte ADMINISTRATOR and ASSIGNED NUMBER subfields, followed by the 4-byte IPv4 ADDRESS PREFIX.]
Figure 6.33 VPN-IPv4 Address Format

When configuring RDs on PE routers, each VRF in
each PE within a VPN can have its own unique RD.
The VPN-IPv4 addresses in each PE will be
distributed to the other PE routers within the VPN by
Multi-Protocol BGP (MP-BGP). Hence, the use of
unique RDs can:
1.
create distinct routes to a common IPv4 prefix; and
2.
use policy to decide which packets use which
route.

Note:
1.
VPN-IPv4 addresses are used only within the
service provider network;
2.
VPN customers are not aware of the use of
VPN-IPv4 addresses;
3.
VPN-IPv4 addresses (i.e. routes) are carried
only by the MP-BGP routing protocol that runs
across the service provider network; and
4.
VPN-IPv4 addresses are not carried in the
packet headers of VPN data traffic as it crosses
the service provider network.
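
To make the address format of figure 6.33 concrete, here is a small hypothetical packing helper (not part of RFC 2547bis itself, just an illustration of the byte layout for TYPE 0; the ASN value is a placeholder):

import struct

def vpn_ipv4_address(asn: int, assigned_number: int, prefix: str) -> bytes:
    # TYPE 0 RD: 2-byte TYPE (= 0), 2-byte ASN, 4-byte assigned number.
    rd = struct.pack("!HHI", 0, asn, assigned_number)
    prefix_bytes = bytes(int(octet) for octet in prefix.split("."))
    return rd + prefix_bytes              # 8-byte RD + 4-byte IPv4 prefix

# Customer A's and Customer B's overlapping 10.11/16 prefixes become
# distinct 12-byte VPN-IPv4 addresses under different assigned numbers.
addr_a = vpn_ipv4_address(asn=64512, assigned_number=1, prefix="10.11.0.0")
addr_b = vpn_ipv4_address(asn=64512, assigned_number=2, prefix="10.11.0.0")
assert len(addr_a) == 12 and addr_a != addr_b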

MP-BGP (RFC 2858)


Conventional BGP4 carries routing information
only for the IPv4 address family. The IETF
standardized multiprotocol extensions for BGP4
that allow BGP4 to carry routing
information for multiple network layer
protocols, e.g. IPv6, IPX, VPN-IPv4, etc.
Therefore, every PE router in the service
provider network has to support MP-BGP in
order to support RFC 2547bis VPNs.

Constrain Network Connectivity


If the route to a specific network is not installed in a
PE router's VRF, the network is considered to be
unreachable from that PE router. Hence, service
providers can constrain the flow of customer VPN
data traffic by constraining the flow of VPN-IPv4
routing information. The MP-BGP/MPLS VPN
constrains the flow of VPN-IPv4 routing information
by:
1.
using multiple VRF tables; and
2.
using BGP extended community attributes.
Using Multiple VRF Tables
Each PE router maintains one or more per-site VRFs,
with each VRF configured to associate with one or
more ports which connect directly to customer sites,
see figure 6.32. When receiving an outbound
customer data packet from a directly attached CE
router, the PE router performs a route lookup in the
VRF that is associated with that site. The specific
VRF used is determined by the port over which the
data packet is received (see the sketch after the list
below). Support of multiple VRF tables makes it easy
for the PE router to provide the per-VPN segregation
of routing information. Figure 6.34 shows how PE 1
populates VRF-A:
1.
PE 1 learns customer A site 1's VPN A routes
from CE 1 and imports them into VRF-A;
2.
remote routes are learned via MP-IBGP from
PE 2 and PE 3, which are directly connected to
sites with hosts that are members of VPN A
(see figure 6.34). Based on the BGP extended
community route target attributes (to be
discussed later), PE 1 may import remote routes
learned for VPN A into VRF-A; and
3.
PE 1 does not import local routes from CE 5
or remote routes from CE 3 into VRF-A
because they are routes for VPN B.
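
A minimal sketch of this port-driven VRF selection, with made-up table contents, shows why overlapping customer prefixes cause no ambiguity:

# The ingress port, not the packet, selects which VRF table is consulted.
port_to_vrf = {"port-1": "VRF-A", "port-2": "VRF-B"}
vrfs = {
    "VRF-A": {"10.11.0.0/16": "local:CE 1", "10.21.0.0/16": "mp-ibgp:PE 2"},
    "VRF-B": {"10.11.0.0/16": "local:CE 5", "10.21.0.0/16": "mp-ibgp:PE 2"},
}

def lookup(ingress_port: str, prefix: str) -> str:
    vrf = port_to_vrf[ingress_port]           # port -> VRF binding
    return vrfs[vrf].get(prefix, "unreachable")

# The same 10.11/16 prefix resolves per VPN:
print(lookup("port-1", "10.11.0.0/16"))       # local:CE 1 (Customer A)
print(lookup("port-2", "10.11.0.0/16"))       # local:CE 5 (Customer B)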
[Figure: PE 1 imports local routes for Customer A site 1 (10.11/16) from CE 1 into VRF-A, and imports remote VPN A routes learned via MP-IBGP from PE 2 (site 2, 10.21/16) and PE 3 (site 3, 10.31/16); Customer B routes from CE 5 and CE 3 go into VRF-B, not VRF-A.]
Figure 6.34 A PE Router Populates a VRF Table

Using BGP extended community attributes


Extended community attributes, carried in BGP
messages as attributes of the route, are used to control
the distribution of routing information between PE
routers. These attributes identify the route as
belonging to a specific collection of routes, all of
which are treated the same with respect to routing
policy. Each BGP extended community attribute is 64
bits (8 bytes) long, globally unique (e.g. it contains either the
provider's global ASN or a global IP address) and can
be used by only one VPN. However, a customer VPN
can use multiple BGP extended communities.
RFC 2547bis VPNs can use up to 3 different types of
BGP extended community attributes:
1.
The route target attribute identifies a collection
of sites (VRFs) to which a PE router distributes
routes. A PE router uses this attribute to
constrain the import of remote routes into its
VRFs.
2.
The VPN-of-origin attribute identifies a
collection of sites and establishes the associated
route as coming from one of the sites in that set.
3.
The site-of-origin attribute identifies the
specific site from which a PE router learns a
route. It is encoded as a route origin extended
community attribute, which can be used to
prevent routing loops.


Using the route target attribute


Before distributing local routes to other PE
routers, the ingress PE router attaches a route
target attribute to each route learned from
directly connected sites. The route target
attached to the route is based on the value of the
VRFs configured export target policy. An
ingress PE router can be configured to assign a
single route target attribute to all routes or a set
of routes learned from a given site. Besides, the
directly connected CE router can specify one or
more route targets for each route.
Before importing remote routes that have been
distributed by another PE router, each VRF on
an egress PE router is configured with an
import target policy. A PE router can only
import VPN-IPv4 route into a VRF if the route
target carried with the received route matches
one of the PE router VRFs import target.
By careful configuration of export target and
import target policies, service providers can
construct different types of VPN topologies.

Example: Hub-and-spoke VPN Topology


Assume that Customer A wants its BGP/MPLS
VPN service provider to create a VPN that
supports hub-and-spoke site connectivity, see
figure 6.35. The inter-site connectivity for
Customer A can be described by the following
policies:
1.
Customer A site 1 can communicate
directly with Customer A site 3 but not
directly with Customer A site 2. If
Customer A site 1 wants to communicate
with Customer A site 2, it must send
data traffic by way of Customer A site 3.
2.
Customer A site 2 can communicate
directly with Customer A site 3 but not
directly with Customer A site 1. If
Customer A site 2 wants to communicate
with Customer A site 1, it must send
data traffic by way of Customer A site 3.
3.
Customer A site 3 can communicate
directly with Customer A site 1 and site
2.
4.
Customer A sites cannot send data traffic
to or receive data traffic from sites
belonging to other corporations.
With reference to figure 6.35, a hub-and-spoke
topology is created using 2 globally unique
route target values: Hub and Spoke.
VRF-A on the PE 3 router (the hub site) is
configured with an export target = Hub and an
import target = Spoke. With this configuration,
VRF-A on PE 3 distributes all the routes
in its VRF with a Hub attribute, which causes
the routes to be imported by the spoke sites (PE
1 and PE 2), and imports all remote routes
carrying a Spoke attribute.
Both VRF-As on the PE 1 and PE 2 routers
are configured with an export target = Spoke
and an import target = Hub. These two VRF-As
distribute their routes with a Spoke attribute and
import only routes with a Hub attribute. A
sketch of this filtering follows.
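
The export/import target mechanics behind this topology can be sketched as simple set filtering; the structures below are illustrative, not a router configuration syntax:

vrf_policies = {
    ("PE 1", "VRF-A"): {"export": {"Spoke"}, "import": {"Hub"}},
    ("PE 2", "VRF-A"): {"export": {"Spoke"}, "import": {"Hub"}},
    ("PE 3", "VRF-A"): {"export": {"Hub"},   "import": {"Spoke"}},
}

def advertise(pe, vrf, prefix):
    # Attach the VRF's export target(s) to the route before MP-IBGP export.
    return {"prefix": prefix, "route_targets": vrf_policies[(pe, vrf)]["export"]}

def should_import(pe, vrf, route):
    # Import only if a carried route target matches an import target of the VRF.
    return bool(route["route_targets"] & vrf_policies[(pe, vrf)]["import"])

route = advertise("PE 1", "VRF-A", "10.11.0.0/16")   # tagged Spoke
print(should_import("PE 3", "VRF-A", route))          # True: the hub imports spoke routes
print(should_import("PE 2", "VRF-A", route))          # False: spokes ignore each other

Because spokes import only Hub routes, spoke-to-spoke traffic necessarily transits the hub site, exactly as the policies require.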

[Figure: in the service provider network, spoke VRF-As on PE 1 (Customer A site 1, 10.11/16) and PE 2 (site 2, 10.21/16) are configured with Exp Target = Spoke, Imp Target = Hub; hub VRF-A on PE 3 (site 3, 10.31/16) has Exp Target = Hub, Imp Target = Spoke. Customer B sites attach to separate VRF-Bs.]
Figure 6.35 Hub-and-spoke VPN Connectivity

Maintain updated VPN routing information


When the configuration of a PE router is changed by
creating a new VRF or by adding one or more new
import target policies to an existing VRF, the PE
router might need to obtain VPN-IPv4 routes that
it previously discarded. However, conventional
BGP4 is a stateful protocol and does not support
re-advertisement of routes. That is, once BGP peers
synchronize their VRF tables, they do not exchange
routing information until there is a change in their
routing information.
The route refresh capability supported by MP-BGP
provides a solution to this problem. Whenever
the configuration of a PE router is changed, the PE
router sends a route refresh message to its peers or
the route reflector to trigger the re-transmission of
routing information from its MP-BGP peers, so that
it can obtain the routing information it previously
discarded.

Conserve backbone bandwidth and PE router
packet processing resources
The generation, transmission and processing of
routing updates consume backbone bandwidth and
router packet processing resources. These resources
can be conserved by eliminating the transmission of
unnecessary routing updates.
The number of BGP routing updates can be reduced
by enabling the new BGP cooperative route filtering
capability. During the establishment of the MP-IBGP
session, a BGP speaker that wants to send or receive
outbound route filters (ORFs) to or from its peer or
route reflector advertises the cooperative route
filtering capability using a BGP capabilities
advertisement. The BGP speaker sends its peer a set of
ORFs that are expressed in terms of BGP
communities; the ORF entries are carried in BGP
route refresh messages. The peer applies the received
ORFs, in addition to its locally configured export
target policy, to constrain and filter outbound routing
updates to the BGP speaker.

Operation Model
Two fundamental traffic flows occur in a
BGP/MPLS VPN:
1.
a control flow that is used for VPN route distribution
and LSP establishment.
VPN route distribution includes the exchange of
routing information between the CE and PE routers,
and between the PE routers across the provider's
network.
LSP establishment includes the RSVP-TE signaling
message exchanges between the PE and P routers
across the provider's network.
2.
a data flow that is used to forward customer data
traffic.
We will use the example shown in figure 6.36 to
explain the BGP/MPLS VPN (RFC 2547bis) service
provided by the service provider to customer B.
Exchange of Routing Information
PE 1 is configured with VRF-B, which has a globally
unique route distinguisher and a globally unique
export and import target Cust-B, and which is
associated with the port over which PE 1 learns routes
from CE 5. When CE 5 advertises the route with
prefix 10.11/16 to PE 1, PE 1 installs a local route to
10.11/16 in VRF-B. PE 1 advertises to PE 2:
1.
the VPN-IPv4 address for 10.11/16 together
with the BGP extended community route target
attribute Cust-B; and
2.
the selected MPLS label, e.g. 789, which
identifies VRF-B's association with the port
connecting to CE 5, with the loopback address
of PE 1 as the BGP next hop for the route
10.11/16.
When PE 2 receives PE 1's route advertisement, it
determines whether it should install the route to prefix
10.11/16 into VRF-B by performing route filtering
based on the BGP extended community attribute
Cust-B against its import target Cust-B. In this case,
PE 2 installs the route to prefix 10.11/16 into its
VRF-B and then advertises the route to prefix
10.11/16 to CE 3.
LSP Establishment
One or more LSPs have to be set up from PE 2 to PE 1
for Customer B site 2 to send data to Customer B site
1. RSVP-TE is usually used to set up these LSPs.
However, if best-effort LSPs are acceptable, LDP can
be used instead.
Data Flow
Assume host 10.21.1.2 at Customer B site 2 wants to
communicate with server 10.11.2.3 at Customer B site
1. Host 10.21.1.2 forwards all data packets for server
10.11.2.3 to its default gateway. When a packet arrives
at CE 3, CE 3 performs a longest-match route lookup
and forwards the packet to PE 2. When PE 2 receives
this packet, it performs a route lookup in VRF-B and
obtains the following information:
1.
the MPLS label, 789, that was advertised by
PE 1 with the route;
2.
the BGP next hop for the route (the loopback
address of PE 1);
3.
the outgoing port for the LSP from PE 2 to PE
1; and
4.
the initial MPLS label for the LSP from PE 2
to PE 1.
User traffic is forwarded from PE 2 to PE 1 using
MPLS with a two-layer label stack. PE 2 first pushes
the label 789 onto the label stack, making it the bottom
label. Then it pushes the label associated with the LSP
from PE 2 to PE 1 onto the label stack, making it the
top label, and forwards the MPLS frame out of the
output port. Assume the label at the penultimate LSR
to PE 1 for this LSP is 3. This penultimate LSR pops
the top label, 3, and forwards the MPLS frame to PE 1.
When PE 1 receives the packet, it pops the bottom
label, 789, and uses it to identify the directly attached
CE that is the next hop to 10.11/16. Finally, PE 1
forwards the packet to CE 5, which forwards the
packet to server 10.11.2.3.
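
The label-stack manipulation in this data flow can be sketched as follows; 789 and 3 are the labels from the example, while the tunnel label value and helper names are made up for illustration:

def pe2_encapsulate(vpn_label: int, tunnel_label: int) -> list:
    # Bottom label first (the VPN route label), then the LSP tunnel label on top.
    return [tunnel_label, vpn_label]

def pe1_deliver(stack: list) -> str:
    vrf_label_to_ce = {789: "CE 5"}      # 789 identifies VRF-B's port to CE 5
    return vrf_label_to_ce[stack.pop()]  # pop the bottom label and map to the CE

stack = pe2_encapsulate(vpn_label=789, tunnel_label=1001)  # [1001, 789]
stack[0] = 3          # core LSRs swap only the top (tunnel) label hop by hop
stack.pop(0)          # the penultimate LSR pops tunnel label 3
print(pe1_deliver(stack))   # 'CE 5'

The P routers only ever touch the top label, which is why they need no per-customer VPN state.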
[Figure: in the service provider network, VRF-B on PE 1 (Customer B site 1, 10.11/16, via CE 5) and VRF-B on PE 2 (Customer B site 2, 10.21/16, via CE 3) are both configured with Exp Target = Cust-B and Imp Target = Cust-B; an LSP from PE 2 to PE 1 carries Customer B's VPN traffic.]
Figure 6.36 Customer B VPN

Some Benefits of BGP/MPLS VPNs
1.
BGP/MPLS VPNs allow service providers to offer
scalable, revenue-generating, value-added services.
2.
There are no constraints on the address plan used by
each VPN customer.
3.
Customers do not have to deal with inter-site routing
issues because these are the responsibility of the
service provider.
4.
Providers do not have a separate backbone or virtual
backbone to administer for each customer VPN.
5.
The policies that determine whether a specific site is a
member of a particular VPN are the policies of the
customer. Customer policies are implemented by
the service provider alone.
6.
A VPN can span multiple service providers.
7.
Service providers can use a common infrastructure to
deliver both VPN and Internet connectivity services.
8.
Flexible and scalable QoS for customer VPN service
can be supported through the use of the experimental
bits in the MPLS shim header or by the use of
traffic-engineered LSPs.
9.
RFC 2547bis is link layer independent.

MPLS-based Layer 2 VPNs: Draft-Martini and Virtual
Private LAN Service (VPLS)
BGP/MPLS VPN mitigates most of the scalability issues and also
eliminates the BGP stress on CEs because they do not exchange
routing information directly with each other. However, it is
considered an overkill solution for providing pure layer 2 (L2)
transport services, e.g. Ethernet L2 services that
forward Ethernet frames across an IP/MPLS network between
customer sites. In the following sections, we will focus on the
mechanisms in IP/MPLS for providing L2 Ethernet services, namely:
1.
Point-to-Point (P2P) Ethernet Service delivered via
draft-martini over an MPLS network (also known as Ethernet over
MPLS (EoMPLS)) or via L2TPv3 over an IP network; and
2.
Multipoint-to-Multipoint (MP2MP) Ethernet Service
delivered via VPLS.

The Pseudo-wire (PW) Concept
The PW is the packet leased line concept standardized by the IETF.
An Ethernet PW allows Ethernet frames, not including the
preamble and FCS, to be carried over a packet switched
network (PSN), e.g. an IP/MPLS network. An Ethernet PW
emulates a single Ethernet link between exactly two
endpoints. The PW terminates at a logical port within the PE.
This port provides an Ethernet MAC service that delivers
each Ethernet frame received at the logical port to the
logical port in the corresponding PE at the other end of the
PW. An Ethernet PW can be configured manually or set up
using a signaling protocol such as BGP or LDP.
In figure 6.37, a large PSN tunnel is used to aggregate multiple
PWs across a PSN. The PSN tunnel can be created
using generic routing encapsulation (GRE), L2TP or MPLS. This
tunnel shields the internals of the network, i.e. P1
and P2, from information relating to the service provided by
PE1 and PE2. While PE1 and PE2 are involved in
creating the PWs and mapping the L2 service to the PWs, P1
and P2 are agnostic to the L2 service and simply pass
IP or MPLS packets from one edge to the other.
[Figure: CE 1 and CE 2 receive a native Ethernet or VLAN service; PE 1 and PE 2 terminate PW1 and PW2, which are aggregated in a Packet Switched Network (PSN) tunnel across P1 and P2.]
Figure 6.37 Reference Model Adopted by IETF to Support the Ethernet Pseudo-wires Emulated Services

Draft-Martini (P2P) - EoMPLS
Draft-martini is an IETF L2 encapsulation method for
carrying Ethernet, Frame Relay and ATM traffic across an
MPLS network. With draft-martini encapsulation, a PW is
constructed by building a pair of unidirectional MPLS virtual
connection (VC) LSPs between two PE endpoints. One
VC-LSP is for outgoing traffic, and the other is for incoming
traffic.
EoMPLS uses targeted LDP, which allows the LDP session
to be established between the ingress and egress PEs
irrespective of whether the PEs are adjacent (directly
connected) or nonadjacent (not directly connected).
Ethernet Encapsulation
For a PW to carry an Ethernet frame (without the
preamble and FCS), it can be configured in one of the
following modes:
1.
Raw mode. In raw mode, the assumption is that
the PW represents a virtual connection between
two Ethernet ports. What goes in on the ingress
side goes out on the egress side.
2.
Tagged mode. In tagged mode, the assumption
is that the PW represents a connection between
two VLANs. Each VLAN is represented by a
different PW.


Figure 6.38 shows the establishment of both raw mode
and tagged mode PWs between the PE 1 router and the
PE 2 router.
[Figure: tagged mode PWs carry VLAN10 and VLAN20 between CE 1/CE 2 pairs across PE 1, P1, P2 and PE 2, while a raw mode PW connects CE 3 and CE 4.]
Figure 6.38 Martini Tunnel Modes

Maximum Transmission Unit (MTU)
Both ends of a PW must agree on the MTU size to be
transported over the MPLS network, and the P routers
must be able to support the largest such size.
PWs must be able to support frame re-ordering in
order to deliver frames in sequence.
Using LDP and MPLS LSPs to Set Up a PW
We will use the example shown in figure 6.40 to
explain the setup of a PW.
Steps:
1.
A targeted LDP session is formed between the PE 1
LSR and the PE 2 LSR.
2.
PE 1 and PE 2 exchange VC information, i.e.
service information. This is achieved by
carrying the VC information in a label mapping
message sent in downstream unsolicited mode
with a new type of forwarding equivalency
class (FEC) element, as shown in figure 6.39:
a)
PW or VC Type - a value that represents
whether the VC is of type Frame Relay
DLCI, ATM cell, PPP, Ethernet tagged or
untagged frames, Circuit Emulation, and
so on. This field indicates the service
provided.

b)
PW or VC ID - a connection ID that,
together with the PW (VC) type, identifies
a particular PW (VC).
c)
Group ID - represents a group of PWs.
For example, all the PWs carried by the
same Ethernet port can belong to the same
group. The Group ID is intended to be
used as a port index or a virtual index.
d)
Interface Parameters - a field that is
used to provide interface-specific
parameters, such as the interface MTU.
[Figure: the FEC element fields are PW TYPE (VC TYPE), PW ID (VC ID), Group ID and Interface Parameters.]
Figure 6.39 LDP Forwarding Equivalency Class
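
An in-memory model of this FEC element, with illustrative field values (the numeric PW type is an assumption, not quoted from the text), might look as follows; a PW comes up only when both ends advertise compatible elements:

from dataclasses import dataclass, field
from typing import Dict

@dataclass
class PwFecElement:
    pw_type: int                    # service type, e.g. raw vs tagged Ethernet
    pw_id: int                      # (pw_type, pw_id) identifies a particular PW
    group_id: int                   # e.g. a port index grouping PWs on one port
    interface_params: Dict[str, int] = field(default_factory=dict)

local = PwFecElement(pw_type=5, pw_id=42, group_id=1, interface_params={"mtu": 1500})
remote = PwFecElement(pw_type=5, pw_id=42, group_id=7, interface_params={"mtu": 1500})

# Both directions must agree on the PW identity and the interface MTU.
assert (local.pw_type, local.pw_id) == (remote.pw_type, remote.pw_id)
assert local.interface_params["mtu"] == remote.interface_params["mtu"]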

3.
Assume the VC label PE 2 gives to PE 1 is 201
and the VC label PE 1 gives to PE 2 is 102.
MPLS RSVP-TE is used to set up the two
opposite-direction LSPs connecting the PE 1 LSR
and the PE 2 LSR.
Assume the LSP label used at the penultimate
LSR for both LSPs is 3.
After step 3, the two VC-LSPs are established and the
PW is considered operational. The following steps
show how an Ethernet frame is forwarded across the
MPLS network from CE 1 to CE 2.
4.
When PE 1 receives an Ethernet frame, it strips
off the preamble and FCS fields, then pushes the
VC label (201 in this case) and then the LSP
tunnel label (41 in this case) onto the label stack
(i.e. the shim header).
5.
The P 1 LSR and P 2 LSR use the upper LSP
tunnel label to switch the packet towards PE 2.
P 1 and P 2 have no visibility of the VC label.
6.
P 2 is the penultimate LSR for PE 2. It pops the
LSP tunnel label (3 in this case) and forwards
the packet to PE 2.
7.
PE 2, the egress LSR, receives the packet with
the inner VC label 201, which indicates to PE 2
how to process this packet.
In general, for raw mode Ethernet service, all PE
2 has to do is forward the packet to CE 2. For
tagged mode Ethernet service or other services,
PE 2 may have to carry out more complicated
processing.
[Figure: a native Ethernet or VLAN service between CE 1 and CE 2 is carried over a VC-LSP nested inside the PSN tunnel LSP; PE 1 pushes LSP label 41 above VC label 201, the tunnel label is swapped to 51 in the core, and the penultimate LSR delivers the frame to PE 2 with only VC label 201.]
Figure 6.40 LDP Session between PEs

VPLS (MP2MP)
VPLS emulates a LAN that provides full learning and
switching capabilities. Learning and switching are done by
allowing PE routers to forward Ethernet frames (without
preamble and FCS) based on learning the MAC addresses of
end stations that belong to the VPLS. VPLS allows an
enterprise customer to be in full control of its WAN routing
policies by running the routing service transparently over a
public IP/MPLS network. VPLS services are transparent to
higher layer protocols and use L2 emulated LANs to
transport any type of traffic, such as IPv4, IPv6 and IPX.
With VPLS, the CEs are connected to VPLS-enabled PEs.
The PEs can participate in one or more VPLSs/VPLS
domains. For example, PE 1 in figure 6.41 participates in
VPLS 1 and VPLS 2. To the CEs, a VPLS domain looks like
an Ethernet switch, and the CEs can exchange information
with each other as if they were connected via a LAN.
Separate L2 broadcast domains are maintained on a
per-VPLS basis by the PEs. Such domains are then mapped
into tunnels in the service provider network.


Figure 6.41 shows a typical VPLS reference model. LSPs are
created between PEs. These LSP tunnels can be shared
among different VPLS domains and with other services, such
as EoMPLS tunnels and L3 MPLS VPN tunnels. The PE
routers are configured to be part of one, many or no VPLS,
depending on whether they are participating in a VPLS
service.
[Figure: PE 1, PE 2 and PE 3 are interconnected by LSP tunnels carrying paired VC-LSPs; CEs of VPLS 1 (CE 1, CE 2, CE 3) and VPLS 2 (CE 4, CE 5) attach to the PEs, each VPLS domain receiving its own emulated LAN service.]
Figure 6.41 VPLS Reference Model

VPLS Requirements
1.
Separation between VPLS domains
A VPLS system must distinguish different
customer domains. Each customer domain
emulates its own LAN. VPLS PEs must
maintain a separate virtual switching instance
per VPN.
2.
MAC address learning
A VPLS should be capable of learning and
forwarding based on MAC addresses. The VPLS
looks exactly like a LAN switch to the CEs.
3.
Switching
A VPLS switch should be able to switch
packets between different tunnels based on
MAC addresses. The VPLS switch should also
be able to work on 802.1p/Q tagged and
untagged Ethernet packets and should support
per-VLAN functionality.
4.
Flooding
A VPLS should be able to support the flooding
of packets with unknown MAC addresses as
well as broadcast and multicast packets.
5.
Redundancy and failure recovery
A VPLS should be able to recover from
network failures to ensure high availability.
6.
Provider edge signaling
In addition to manual configuration methods,
VPLS should provide a way to signal between
PEs to auto-configure and to inform the PEs of
membership, tunneling, and other relevant
parameters. Many vendors have adopted LDP
as the signaling mechanism; however, some
prefer BGP as used in BGP/MPLS VPNs.
7.
VPLS membership discovery
The VPLS control and management plane
should provide methods to discover the PEs that
connect the CEs forming a VPLS. One method
to achieve auto-discovery is the use of BGP.
8.
Inter-provider connectivity
A VPLS domain should be able to cross
multiple providers, and the VPLS identification
should be globally unique.
9.
VPLS management and operations
VPLS configuration, management and
monitoring, including the monitoring of
customer SLAs (service level agreements), are
very important to the success of the VPLS
service.
Signaling the VPLS Service
The signaling mechanism in VPLS is the same as
that in the P2P martini model. Targeted LDP sessions
are first set up between PE peers. These peered PEs
then exchange VC information using LDP with a
forwarding equivalency class element. The main
difference in the signaling mechanism between VPLS
and the martini model is that in a P2P martini tunnel,
the VC ID is a service identifier representing a
particular service on the Ethernet port, such as a
particular P2P VLAN, whereas with VPLS, the VC ID
represents an emulated LAN segment.


Some vendors adopted BGP as the signaling
mechanism because of its scalability (through the use
of route reflectors) and its ability to support VPLS
deployment across multiple providers.
VPLS Encapsulation
VPLS encapsulation for data forwarding is derived
from the martini encapsulation used for the P2P
EoMPLS service. Service delimiters, such as the
VC/VLAN label and the LSP label, are assigned at the
ingress PE and stripped at the egress PE, see figure
6.40.
Loop-Free Topology
The problem with having a VPLS domain emulate a
LAN is that it can create the same circumstances that
create a loop in a LAN. In figure 6.42, assume PE 1
receives an Ethernet frame from CE 1 with an
unknown destination MAC address 080003009876.
PE 1 floods this packet to PE 2 and PE 3, which
participate in the same VPLS, i.e. VPLS 1. If PE 2
also does not know this destination address, it sends
the packet to both CE 2 and PE 3. In the same manner,
if PE 3 does not know this destination address, it
sends the packet to CE 3 and PE 1. Hence, a loop is
formed!
To avoid loops in a VPLS domain, the following two
conditions must be satisfied (see the sketch after this
list):
1.
all PEs participating in a VPLS domain must be
fully meshed, i.e. fully connected; AND
2.
each PE must support a split-horizon scheme
wherein a PE MUST NOT forward traffic
from one PW (i.e. one PE) to another PW (i.e.
another PE) in the same VPN. A PE may only
forward traffic received from a PW to its local
CE(s).
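
A minimal sketch of the split-horizon rule for VPLS 1 in figure 6.42 (structures are illustrative): traffic that arrived on a PW is never re-sent onto another PW, so the full mesh alone delivers each flooded frame everywhere exactly once.

local_ces = {"PE 1": ["CE 1"], "PE 2": ["CE 2"], "PE 3": ["CE 3"]}
pws = {"PE 1": ["PE 2", "PE 3"], "PE 2": ["PE 1", "PE 3"], "PE 3": ["PE 1", "PE 2"]}

def flood(pe, came_from_pw, ingress=None):
    targets = [ce for ce in local_ces[pe] if ce != ingress]  # local CEs get the flood
    if not came_from_pw:
        targets += pws[pe]    # split horizon: only CE-originated traffic enters the mesh
    return targets

print(flood("PE 1", came_from_pw=False, ingress="CE 1"))  # ['PE 2', 'PE 3']
print(flood("PE 2", came_from_pw=True))                    # ['CE 2'] only: no loop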
[Figure: in the MPLS backbone, a frame from CE 1 with unknown destination MAC 080003009876 is flooded by PE 1 to PE 2 and PE 3 of VPLS 1; without split horizon, PE 2 and PE 3 would re-flood it into the mesh and form an L2 loop.]
Figure 6.42 L2 Loops

MAC Address Learning and Withdrawal
In MP2MP, a PE is connected by multiple VC-LSPs
to different PEs that participate in multiple VPLS
domains. The PE needs to decide which LSPs to put
the traffic on. This decision is based on the destination
MAC addresses that belong to a certain VPLS. MAC
address learning allows the PE to determine from
which physical port or LSP a particular MAC address
came.
In figure 6.43:
1.
PE 1, PE 2 and PE 3 establish targeted LDP
sessions with each other;
2.
using LDP, PE 1 signals VC label 102 to PE 2
and PE 2 signals VC label 201 to PE 1. This
establishes the paired VC-LSPs between PE 1
and PE 2. The same applies to PE 1 and PE 3,
and to PE 2 and PE 3, except with different
VC labels, see figure 6.43;
3.
a station behind CE 2, say station A, with a
MAC address of 080003009876, sends a
broadcast packet. PE 2 recognizes that station A
belongs to VPLS 1 (via manual configuration or
BGP, see the Auto-discovery subsection) and
replicates this packet onto the two VC-LSPs to
PE 1 and PE 3, which also participate in VPLS 1.
PE 2 also associates MAC address
080003009876 with the local Ethernet port or
VLAN it came in on, and installs this MAC
address-port/VLAN mapping into its FIB 1;
4.
when the packet arrives at PE 1 on PE 1's
inbound VC-LSP, PE 1 associates the MAC
address of station A with the outbound VC-LSP
in the same VC-LSP pair that constitutes the
PW between PE 1 and PE 2. PE 1 then installs
this MAC address-PW mapping into its FIB 1;
and
5.
PE 3 processes this incoming broadcast packet
the same way as PE 1 and updates its own FIB 1.
From then on, if PE 1 (or PE 3) receives a
packet destined for MAC 080003009876, it
automatically sends it on its outbound VC-LSP
to PE 2 using label 201 (or 203).

Note: a PE has to maintain a separate FIB for each
VPLS domain it participates in. For example, PE 1 in
figure 6.41 has to maintain two FIBs, FIB 1 for VPLS
1 and FIB 2 for VPLS 2. A sketch of this learning
logic follows.
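
A dict-based sketch of this per-VPLS learning (table contents and names are illustrative):

fib = {"VPLS 1": {}, "VPLS 2": {}}   # one FIB per VPLS domain the PE joins

def learn(vpls, src_mac, came_from):
    fib[vpls][src_mac] = came_from            # a local port/VLAN or a PW to a peer

def next_hop(vpls, dst_mac):
    return fib[vpls].get(dst_mac, "flood")    # unknown destinations are flooded

# PE 1 sees station A's broadcast arrive on the PW from PE 2:
learn("VPLS 1", "080003009876", came_from="PW to PE 2, VC label 201")
print(next_hop("VPLS 1", "080003009876"))     # forward on the PW to PE 2
print(next_hop("VPLS 1", "0800030012AB"))     # 'flood' (unknown destination)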
[Figure: station A (source MAC 080003009876) behind CE 2 broadcasts (destination MAC FFFFFFFFFFFF); PE 2 floods the frame onto the VC-LSPs to PE 1 and PE 3 of VPLS 1. Each PE installs the MAC in its FIB 1: PE 1 records that 080003009876 comes from PE 2 and is reached via VC label 201, while PE 3 records VC label 203.]
Figure 6.43 MAC Address Learning

Unqualified Versus Qualified Learning
In unqualified learning, a customer VPLS is a
port-based service where the VPLS is considered a
single broadcast domain containing all the VLANs
that belong to the same customer. In this case, a single
customer is handled with a single VPLS. On the other
hand, qualified learning assumes a VLAN-based
VPLS where each customer VLAN can be treated as a
separate VPLS and as a separate broadcast domain.
The advantage of qualified learning is that customer
broadcasts are confined to a particular VLAN.

Auto-discovery
Auto-discovery refers to the process of finding all the
PEs that participate in a given VPLS. This can of
course be achieved via manual configuration on each
PE belonging to a certain VPLS. However, manual
configuration is time consuming and labor intensive.
A mechanism using BGP extended communities to
identify a VPLS has been adopted for auto-discovery.
This approach is similar to the route exchange in a
BGP/MPLS L3 VPN. In VPLS, the routes exchanged
in BGP carry a VPN-L2 address. A VPN-L2 address
contains a route distinguisher field that distinguishes
between different VPN-L2 addresses. Also, a BGP
route target (RT) extended community is used to
constrain route distribution between PEs; the RT is
indicative of a particular VPLS. Because a PE is fully
meshed with all other PEs, it receives BGP
information from all PEs. The PE filters out the
information based on the RT and learns only
information pertinent to the route targets (VPLSs) it
belongs to.
