
Architectures for Networks and Services

Arhitecturi pentru retele si servicii (ARS)

Routing protocols: Inter-domain routing


Managementul serviciilor si retelelor (MSR)

Routing in the Internet (review)

Routing and administrative boundaries

The Internet is a federation of autonomous IP networks, owned and operated by different organizations. Each organization manages autonomously: the connectivity and routing within its own network; the policies for interconnection with other networks.

Internet routing must take into account these administrative boundaries.



Octavian Catrina

Autonomous systems (AS)

An AS is a connected group of networks under a single technical administration, with consistent interior routing and a unified exterior routing policy.
An AS is treated as a single entity by the rest of the Internet. It is also called an (autonomous) routing domain.
Autonomous systems are distinguished by AS numbers: 16-bit numbers initially; 32-bit numbers since RFC 4893 (2007).
A routing domain needs a unique AS number only if it participates in inter-domain routing with distinct routing policies.

The range 64512 - 65534 is reserved for private use (RFC 6996); 65535 is reserved.


[Figure: ISP-1 (AS 100) and ISP-2 (AS 500), with stub networks S1 and S2 (AS 7000)]
ISP-1, ISP-2: Transit networks. Need an AS number.
S1: Single-homed stub network. Extension of ISP-1 network. Does not need its own AS number.
S2: Multi-homed stub network. Needs an AS number for typical routing policies related to multi-homing.

How many Autonomous Systems?


Evolution of the number of assigned AS numbers (found in BGP routing tables)

http://www.potaroo.net, Oct. 2012


Intra-domain vs. inter-domain routing


Intra-domain routing

Based on IP network topology and performance metrics.
All routers in the AS participate in route finding and selection.
Focus on efficiency and performance (QoS).
Interior routing (or gateway) protocols (IGP): RIP, OSPF, ...

Inter-domain routing

Based on AS interconnection topology and routing policies.
Designated pairs of AS border routers exchange routing info.
Focus on scalability, route stability, convergence.
Exterior routing (or gateway) protocols (EGP): BGP4.

Internet structure
Transit domains
Tier-1 ISPs: very large, global networks.
Tier-2 ISPs: national or regional ISPs.

Stub domains
Company networks, small ISPs, ...

Customer-provider relationship (transit): hierarchical structure.
Peer relationship (shared-cost): shortcuts.

[Figure: Internet structure with examples of paths]

Map of the Internet


Three distinct regions are apparent: an inner core of highly connected nodes, an outer periphery of isolated networks, and a mantle-like mass of peer-connected nodes. The bigger the node, the more connections it has. The picture shows the Internet as consisting of a dense core of 80 or so critical nodes surrounded by an outer shell of 5,000 sparsely connected, isolated nodes that are very much dependent upon this core. Separating the core from the outer shell are approximately 15,000 peer-connected and self-sufficient nodes. Take away the core, and an interesting thing happens: about 30 percent of the nodes from the outer shell become completely cut off. But the remaining 70 percent can continue communicating because the middle region has enough peer-connected nodes to bypass the core.

With the core connected, any node is able to communicate with any other node within about 4 links. If the core is removed, it takes about 7 or 8 links.
Source: http://www.technologyreview.com

Types of routing domains

Transit domain

Allows other routing domains to use its network infrastructure to communicate with each other (ISPs, typically).

[Figure: transit domains T1 - T3 interconnecting stub domains S1 - S5]
T1 - T3: Transit domains. S1 - S5: Stub domains. S1, S3: Multi-homed stub domains.

Stub domain

Does not allow the use of its network infrastructure for traffic between other routing domains.

A stub domain is connected to at least one transit domain. Single-homed: 1 transit domain. Dual-homed: 2 transit domains. Can also be connected to other stub domains.

Different traffic patterns: content rich; access rich.



Inter-domain relationships

Customer-provider relationship

Provider offers transit for traffic between the customer and all prefixes it can reach or part of them. Customer pays provider.

Peering relationship

Two ASes interconnect their networks to allow traffic between their customers and between their internal prefixes. Typically established by ISPs of similar size, without payment. Peering provides shortcuts and reduces transit costs.

[Figure: customer-provider and peer relationships among AS1 - AS7. Legend: customer-provider link; peer link; contribution to delivery costs (implicit)]

Physical ISP interconnection

Dedicated links

Set up between 2 ISPs that agree to interconnect their networks (mesh topology for AS interconnection).

Internet exchange

More efficient solution: rendezvous point for many ISPs (hub-and-spoke physical topology). Run by an independent organization. A typical Internet exchange offers:
Collocation for ISP routers running BGP.
High speed switched Ethernet LAN (1-10 Gbps links) for the interconnection of the ISP routers.

[Figure: AS1 - AS5 interconnected by dedicated links, compared with AS1 - AS5 interconnected through an Internet exchange]

Border Gateway Protocol (BGP)

BGP: The inter-domain routing protocol of the Internet

"BGP is running on more than 100K routers (my estimate), making it one of worlds largest and most visible distributed systems". T. Griffin, 2002.

BGP evolution ("BGP was not designed, it evolved")

1989-1995: The beginnings.
1989: BGP-1 (RFC 1105) replaces EGP.
1990, 1991: BGP-2 (RFC 1163), BGP-3 (RFC 1267).

1995: BGP-4 (RFC 1771).
Uses CIDR, the new IP addressing & routing scheme. Deployed all over the Internet during the following years. So far, it has scaled up quite well with the fast growth of the Internet.

2006: Updated BGP-4 specification, RFC 4271.
Tens of other RFCs describe various BGP-4 extensions introduced in the meantime.

Active BGP routes (2004)


[Figure: number of active BGP routes over time; about 180000 routes in 2004]
Plot annotations:
Route aggregation problems again!
ISPs improve BGP policies.
Route aggregation does not work anymore.
CIDR is deployed, route aggregation limits routing table growth.
Fast exponential growth before CIDR.

http://www.potaroo.net, Oct. 2004

Active BGP routes (2012)


About 430000 routes

http://www.potaroo.net, Oct. 2012


Outline of BGP operation

Path vector routing protocol

Similar to distance vector protocols. Routers establish a BGP session on a TCP connection, exchange the current routes in the BGP Routing Information Base (RIB), and then send incremental updates of RIB changes.

[Figure: BGP session between R1 (AS1) and R2 (AS2), carrying feasible routes and withdrawn routes]

BGP routing information (UPDATE message)

Feasible routes: Address prefixes reachable via the sender's AS and an associated set of path attributes. Withdrawn (infeasible) routes: Address prefixes of previously announced routes that are not reachable anymore via the sender's AS. An UPDATE message can indicate both feasible routes and infeasible routes.

BGP path attributes

Path attributes

BGP selects a feasible "best" route to a destination based on information provided in a set of attributes associated to each path, including: loop detection, next hop, local preference, etc.

Well-known attributes

Must be recognized by all compliant BGP implementations. Are propagated to other neighbors. Well-known mandatory: Must be specified for all paths. Well-known discretionary: May be omitted for some paths.

Optional attributes

Not required and not expected to be recognized by all BGP implementations. If recognized they may be propagated to other BGP routers, according to their meaning. Optional transitive: If not recognized, may be propagated. Optional non-transitive: If not recognized, must be discarded.
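
The handling rules for unrecognized attributes can be sketched as a small decision function. This is a minimal illustration, assuming the flag bit positions of the Attribute Flags octet defined in RFC 4271; the names `Action` and `handle_attribute` are hypothetical and not taken from any BGP implementation.

```python
from enum import Enum

# Bits of the Attribute Flags octet (RFC 4271).
OPTIONAL = 0x80     # set: optional attribute; clear: well-known attribute
TRANSITIVE = 0x40   # set: transitive
PARTIAL = 0x20      # set by a router that forwarded the attribute unrecognized


class Action(Enum):
    PROCESS = "process normally"
    ERROR = "error: unrecognized well-known attribute"
    FORWARD_PARTIAL = "forward unchanged, with the Partial bit set"
    DISCARD = "ignore and do not forward"


def handle_attribute(flags: int, recognized: bool) -> Action:
    """Decide what to do with one received path attribute."""
    if recognized:
        return Action.PROCESS
    if not flags & OPTIONAL:
        # Well-known attributes must be recognized by all implementations.
        return Action.ERROR
    if flags & TRANSITIVE:
        return Action.FORWARD_PARTIAL
    return Action.DISCARD
```

For example, an unrecognized attribute with flags `0xC0` (optional transitive) is forwarded, while one with flags `0x80` (optional non-transitive) is dropped.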

Main BGP path attributes


Attribute (category): Description

ORIGIN (well-known mandatory): Origin of the path information: 0 = intra-AS (IGP); 1 = EGP; 2 = incomplete (other means). Used to assess the trust in the source of routing info.
AS_PATH (well-known mandatory): List of all the ASes on the path: AS_SEQUENCE, AS_SET (AS_SET is needed for route aggregation). Used for loop detection, in/out filtering, best path selection.
NEXT_HOP (well-known mandatory): Address of the router to be used as next hop on the path to the specified prefix.
MULTI_EXIT_DISC (MED) (optional non-transitive): Value used in best path selection to discriminate among multiple entry points of a neighboring AS.
LOCAL_PREF (well-known): Value of the degree of preference for this route. Only used inside an AS (iBGP session, not eBGP).
COMMUNITY (RFC 1997) (optional transitive): 32-bit tag that can be associated to any group of prefixes in order to specify various policies (can be structured as 16-bit AS number and 16-bit value). Standard (e.g., NO_EXPORT) and custom policies. Often used to control inbound traffic.
. . .

BGP extensions

Multiprotocol BGP (MP-BGP)

Extends BGP in order to carry routing information for other protocols besides IPv4 (e.g., IPv6, VPN-IPv4, VPN-IPv6). RFC 4760, 2007, Multiprotocol Extensions for BGP-4.

BGP Extended Communities Attribute

Extends the range of values and adds structure (Type field). RFC 4360, 2006, BGP Extended Communities Attribute.
Attribute (category): Description

MP_REACH_NLRI (optional non-transitive): Multiprotocol Reachable NLRI. Used to advertise MP route.
MP_UNREACH_NLRI (optional non-transitive): Multiprotocol Unreachable NLRI. Used to withdraw MP route.
Two fields are used to describe the kind of route: Address Family Identifier (AFI), e.g., IPv4, IPv6, vpnv4 (VPN-IPv4), vpnv6 (VPN-IPv6); Subsequent Address Family Identifier (SAFI), e.g., unicast, multicast.
Extended Communities (optional transitive): Extension of Community attribute: 8 octets, out of which 1-2 octets for type field. E.g., Route Target (RT) Community used for MPLS-based L3 VPNs.

BGP messages
BGP message: Description

OPEN: Used to establish a BGP session between two routers (includes mutual identification and optional capabilities negotiation). Each router sends an OPEN message, which is acknowledged (if accepted) by a KEEPALIVE message.
KEEPALIVE: Used to verify the liveness of the BGP session. A KEEPALIVE message is sent if no other message was transmitted during HoldTime/3 seconds. A BGP session is canceled if no message was received during HoldTime seconds.
UPDATE: Used to exchange routing information on a BGP session. Can carry a set of feasible routes with common attribute-set (prefixes and attribute-set), and/or a set of withdrawn routes (prefixes).
NOTIFICATION: Used to inform the other BGP router of an error. After sending or receiving this message a BGP session is closed.
ROUTE_REFRESH: Message added in 2000 (RFC 2918) to ask a peer to re-send its routes without resetting the BGP session (e.g., when policies are reconfigured).

BGP session overview


[Figure: R1 (10.1.3.1) in AS 100, with prefixes 140.1.128.0/18 and 160.2.0.0/16, and R2 (10.1.3.2) in AS 200, with prefixes 190.3.192.0/18 and 170.4.0.0/16]

TCP connection establishment (to port 179)

BGP session establishment
R1 to R2: OPEN (Version = 4, My-AS = 100, Hold-Time = 180, BGP-Identifier = 10.1.3.1, Optional-Param.)
R2 to R1: KEEPALIVE
R2 to R1: OPEN (Version = 4, My-AS = 200, Hold-Time = 180, BGP-Identifier = 10.1.3.2, Optional-Param.)
R1 to R2: KEEPALIVE
Session established

Exchange of current routes in RIB
R1 to R2: UPDATE (Path-Attributes-Len = ..., Path-Attributes = ..., NLRI = 140.1.128.0/18, 160.2.0.0/16)
R2 to R1: UPDATE (Path-Attributes-Len = ..., Path-Attributes = ..., NLRI = 190.3.192.0/18, 170.4.0.0/16)

Incremental updates (RIB changes) using UPDATE:
- Announce new feasible routes.
- Withdraw infeasible routes.
Periodic KEEPALIVE messages (e.g., at Hold-Time/3)

NLRI = Network Layer Reachability Information = IP address prefix
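
As an illustration of the session setup, the OPEN and KEEPALIVE messages sent by R1 can be built byte by byte. This is a minimal sketch, assuming the fixed message layout of RFC 4271 (16-octet marker, 2-octet length, 1-octet type) and an OPEN message without optional parameters; the function names are hypothetical.

```python
import socket
import struct

MARKER = b"\xff" * 16                      # 16 octets, all ones
OPEN, UPDATE, NOTIFICATION, KEEPALIVE = 1, 2, 3, 4
HEADER_LEN = 19                            # marker + length + type


def bgp_message(msg_type: int, body: bytes = b"") -> bytes:
    """Prepend the common BGP header to a message body."""
    return MARKER + struct.pack("!HB", HEADER_LEN + len(body), msg_type) + body


def open_message(my_as: int, hold_time: int, bgp_id: str) -> bytes:
    """OPEN with version 4 and no optional parameters (16-bit AS number)."""
    body = struct.pack("!BHH4sB", 4, my_as, hold_time,
                       socket.inet_aton(bgp_id), 0)
    return bgp_message(OPEN, body)


def keepalive_message() -> bytes:
    """KEEPALIVE consists of the header only."""
    return bgp_message(KEEPALIVE)


def keepalive_interval(hold_time: int) -> int:
    """Interval suggested on the slide: Hold-Time / 3."""
    return hold_time // 3


# Values from the example session: R1 in AS 100.
open_r1 = open_message(my_as=100, hold_time=180, bgp_id="10.1.3.1")
```

With Hold-Time = 180, `keepalive_interval(180)` gives 60 seconds between KEEPALIVE messages.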



Example: BGP session setup

OPEN Message and main parameters

KEEPALIVE Message


Example: BGP updates (1)

UPDATE Message advertising a feasible route to 10.20.0.0/16

Attributes of the BGP route to the prefix specified below

Prefix of the BGP route: 10.20.0.0/16
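
The prefix in an UPDATE message is carried as a length octet followed by only as many address octets as the prefix length requires. The sketch below shows this encoding for IPv4, assuming the NLRI format of RFC 4271; `encode_prefix` and `decode_prefix` are hypothetical helper names.

```python
import ipaddress


def encode_prefix(prefix: str) -> bytes:
    """Encode an IPv4 prefix as <length in bits> + <minimal address octets>."""
    net = ipaddress.ip_network(prefix)
    nbytes = (net.prefixlen + 7) // 8
    return bytes([net.prefixlen]) + net.network_address.packed[:nbytes]


def decode_prefix(data: bytes) -> str:
    """Inverse of encode_prefix for a single IPv4 prefix."""
    length = data[0]
    nbytes = (length + 7) // 8
    padded = data[1:1 + nbytes] + bytes(4 - nbytes)
    return f"{ipaddress.IPv4Address(padded)}/{length}"
```

For the route in this example, 10.20.0.0/16 is encoded in three octets: `0x10 0x0a 0x14`.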


Example: BGP updates (2)

UPDATE Message announcing a withdrawn (infeasible) route to 10.20.0.0/16

Prefix of the withdrawn route: 10.20.0.0/16

A BGP UPDATE message can advertise feasible routes and also announce withdrawn routes (previously advertised routes which are no longer feasible).


BGP path vector routing (1)


[Figure: R1 (AS 10, 140.1.0.0/16, address 11.1.1.1) is linked to R2 (AS 20, 150.1.0.0/16, addresses 11.1.1.2 and 11.1.2.1), which is linked to R3 (AS 30, 160.1.0.0/16, address 11.1.2.2)]

Initial BGP routing tables

Router R1: BGP routing table
Prefix          AS_PATH    NEXT_HOP
140.1.0.0/16    internal   ...

Router R2: BGP routing table
Prefix          AS_PATH    NEXT_HOP
150.1.0.0/16    internal   ...

Router R3: BGP routing table
Prefix          AS_PATH    NEXT_HOP
160.1.0.0/16    internal   ...

BGP session setup: R1 and R2 exchange the current routes in their tables.

R1 to R2: UPDATE (Prefix = 140.1.0.0/16, AS_PATH = 10, NEXT_HOP = 11.1.1.1)
R2 to R1: UPDATE (Prefix = 150.1.0.0/16, AS_PATH = 20, NEXT_HOP = 11.1.1.2)

BGP routing tables after the exchange

Router R1: BGP routing table
Prefix          AS_PATH    NEXT_HOP
140.1.0.0/16    internal   ...
150.1.0.0/16    20         11.1.1.2

Router R2: BGP routing table
Prefix          AS_PATH    NEXT_HOP
150.1.0.0/16    internal   ...
140.1.0.0/16    10         11.1.1.1

The AS_PATH attribute lists the ASes on the path. It is used for loop detection, for policy-based routing, and as basic path metric (AS count). The NEXT_HOP attribute indicates the IP address of the router used to reach the prefix.

BGP path vector routing (2)


[Figure: AS 10 (R1) - AS 20 (R2) - AS 30 (R3), same topology and addresses as before]

BGP routing tables before the exchange

Router R1: BGP routing table
Prefix          AS_PATH    NEXT_HOP
140.1.0.0/16    internal   ...
150.1.0.0/16    20         11.1.1.2

Router R2: BGP routing table
Prefix          AS_PATH    NEXT_HOP
150.1.0.0/16    internal   ...
140.1.0.0/16    10         11.1.1.1

Router R3: BGP routing table
Prefix          AS_PATH    NEXT_HOP
160.1.0.0/16    internal   ...

BGP session setup: R2 and R3 exchange the current routes in their tables.

We assume that AS 20 is the provider, and AS 10 and AS 30 are its customers. AS 20 provides transit for its customers, hence it announces the routes to them.

R2 to R3: UPDATE (Prefix = 150.1.0.0/16, AS_PATH = 20, NEXT_HOP = 11.1.2.1)
R2 to R3: UPDATE (Prefix = 140.1.0.0/16, AS_PATH = 20 10, NEXT_HOP = 11.1.2.1)
R3 to R2: UPDATE (Prefix = 160.1.0.0/16, AS_PATH = 30, NEXT_HOP = 11.1.2.2)

BGP routing tables after the exchange

Router R2: BGP routing table
Prefix          AS_PATH    NEXT_HOP
150.1.0.0/16    internal   ...
140.1.0.0/16    10         11.1.1.1
160.1.0.0/16    30         11.1.2.2

Router R3: BGP routing table
Prefix          AS_PATH    NEXT_HOP
160.1.0.0/16    internal   ...
150.1.0.0/16    20         11.1.2.1
140.1.0.0/16    20 10      11.1.2.1

BGP path vector routing (3)


[Figure: AS 10 (R1) - AS 20 (R2) - AS 30 (R3), same topology and addresses as before]

BGP routing tables before the update

Router R1: BGP routing table
Prefix          AS_PATH    NEXT_HOP
140.1.0.0/16    internal   ...
150.1.0.0/16    20         11.1.1.2

Router R2: BGP routing table
Prefix          AS_PATH    NEXT_HOP
150.1.0.0/16    internal   ...
140.1.0.0/16    10         11.1.1.1
160.1.0.0/16    30         11.1.2.2

Router R3: BGP routing table
Prefix          AS_PATH    NEXT_HOP
160.1.0.0/16    internal   ...
150.1.0.0/16    20         11.1.2.1
140.1.0.0/16    20 10      11.1.2.1

R2 sends an incremental update to R1.

R2 to R1: UPDATE (Prefix = 160.1.0.0/16, AS_PATH = 20 30, NEXT_HOP = 11.1.1.2)

Router R1: BGP routing table
Prefix          AS_PATH    NEXT_HOP
140.1.0.0/16    internal   ...
150.1.0.0/16    20         11.1.1.2
160.1.0.0/16    20 30      11.1.1.2

Suppose that the BGP session between AS 20 and AS 30 is lost (link failure, R3 restart, etc). Then, R2 removes from its table the routes learned in this session and sends an update to R1.

Router R2: BGP routing table
Prefix          AS_PATH    NEXT_HOP
150.1.0.0/16    internal   ...
140.1.0.0/16    10         11.1.1.1

R2 sends an incremental update to R1.

R2 to R1: UPDATE (Withdrawn-Routes = 160.1.0.0/16)

R1's table returns to its previous state.
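
The three steps of the path vector example (session setup R1 - R2, session setup R2 - R3, loss of the R2 - R3 session) can be replayed with a toy simulation. This is only a sketch under simplifying assumptions: one router per AS, no policies, NEXT_HOP omitted, and route selection by shortest AS_PATH only; the `Router` class is hypothetical.

```python
class Router:
    """Toy path-vector speaker: one router per AS, no policies."""

    def __init__(self, name, asn, prefix):
        self.name = name
        self.asn = asn
        self.table = {prefix: []}   # prefix -> AS_PATH ([] means internal)
        self.learned_from = {}      # prefix -> neighbor that announced it
        self.neighbors = []

    def connect(self, other):
        """Session setup: both sides exchange their current routes."""
        self.neighbors.append(other)
        other.neighbors.append(self)
        for sender, receiver in ((self, other), (other, self)):
            for prefix in list(sender.table):
                receiver.receive(prefix, [sender.asn] + sender.table[prefix], sender)

    def receive(self, prefix, as_path, sender):
        if self.asn in as_path:     # loop detection using AS_PATH
            return
        current = self.table.get(prefix)
        if current is not None and len(current) <= len(as_path):
            return                  # keep the existing, not longer, path
        self.table[prefix] = as_path
        self.learned_from[prefix] = sender
        for n in self.neighbors:    # incremental update, own AS prepended
            if n is not sender:
                n.receive(prefix, [self.asn] + as_path, self)

    def withdraw(self, prefix, sender):
        if self.learned_from.get(prefix) is not sender:
            return
        del self.table[prefix]
        del self.learned_from[prefix]
        for n in self.neighbors:
            if n is not sender:
                n.withdraw(prefix, self)

    def session_down(self, other):
        """Session loss: both sides drop the routes learned on it."""
        for a, b in ((self, other), (other, self)):
            a.neighbors.remove(b)
            for prefix, src in list(a.learned_from.items()):
                if src is b:
                    a.withdraw(prefix, b)


r1 = Router("R1", 10, "140.1.0.0/16")
r2 = Router("R2", 20, "150.1.0.0/16")
r3 = Router("R3", 30, "160.1.0.0/16")
r1.connect(r2)
r2.connect(r3)
tables_after_setup = {r.name: dict(r.table) for r in (r1, r2, r3)}
r2.session_down(r3)
```

After both sessions are set up, R1 holds 160.1.0.0/16 with AS_PATH 20 30 and R3 holds 140.1.0.0/16 with AS_PATH 20 10; after `session_down`, R1 no longer has a route to 160.1.0.0/16.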

BGP policy-based routing


BGP router: Received BGP route updates -> Import policies (control outbound traffic) -> Best route table -> Export policies (control inbound traffic) -> Sent BGP route updates

Route import policies: For each BGP peer
- Import filter selects the received routes accepted from that peer.
- Path attribute manipulation influences best route selection by this BGP router.
Control of outbound traffic: Imported route participates in the selection of the best path toward that prefix.

Route export policies: For each BGP peer
- Export filter selects the routes announced to that peer.
- Path attribute manipulation influences the best route selection by that peer.
Control of inbound traffic: Announcing a route enables the other AS to route traffic to that prefix via this AS.

Example: Route filtering


[Figure: route filtering between Tier-1 ISPs (AS3, AS5), Tier-2 ISPs (AS2, AS4) and customers (AS1, AS6, AS7). Legend: customer-provider link; peer link]

Labels in the figure:
Tier-1 ISP (provider) announces: all routes.
AS2 (Tier-2 ISP) announces: customer (AS1) & internal routes (AS2).
AS4 (Tier-2 ISP) announces: customer (AS6) & internal routes (AS4).
AS6 announces: AS6 routes.
AS1, AS7: BGP not necessary (default route on the customer side, static route on the provider side).

Note: Assume that AS2, AS3, AS4, AS5 have many other interconnections, not shown in this picture. In particular, AS2 and AS4 are also connected to other Tier-1 and Tier-2 ISPs, and have many other customers.

BGP route processing


Receive BGP updates
-> Apply import policies: import filter and attribute manipulation (BGP Adj-RIB-In)
-> Best route selection: select one best route to each destination based on attributes (BGP Loc-RIB, the BGP best route table)
-> Apply export policies: export filter and attribute manipulation (BGP Adj-RIB-Out)
-> Send BGP updates

Install routes for IP forwarding: the IP routing table is built from the BGP best routes, directly connected networks, static routes and IGP routes.

Policies are typically defined using vendor specific languages. Route filtering can be done based on prefix, AS number, attributes. Import and export of internal routes is also controlled by policies. The interactions between BGP and internal routing (IGP, static) are not precisely defined.

Best route selection

Overview

The algorithm operates on the set of routes in Adj-RIBs-In after applying import policies (filter, tweak attributes, assign local preference). It selects one best route for each destination.

Sequence of selection steps for each destination

Discard all the routes for which NEXT_HOP is inaccessible. Discard all the routes for which AS_PATH contains the local AS or a loop. Let n be the number of routes left.

If n>1, select the routes with the largest LOCAL_PREF.

If n>1, select the routes with the smallest number of ASes in AS_PATH.
If n>1, select the routes with the lowest ORIGIN (IGP<EGP<INCOMPLETE).

If n>1 and routes from the same AS, select the routes with the lowest MED.
If n>1 and a route learned from eBGP, discard routes learned from iBGP.

If n>1, select the routes with shortest path (local metric) to NEXT_HOP.
If n>1, select the route from the BGP speaker with the lowest BGP Id.
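
The sequence of selection steps above can be sketched as successive filters over the candidate routes. This is a simplified illustration, not a vendor algorithm: the `Route` fields are hypothetical names, MED is compared only among routes whose AS_PATH starts with the same neighboring AS, and the BGP Id is modeled as an integer.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Route:
    as_path: List[int]
    next_hop_reachable: bool = True
    local_pref: int = 100
    origin: int = 0            # 0 = IGP, 1 = EGP, 2 = INCOMPLETE
    med: int = 0
    from_ebgp: bool = True
    igp_metric: int = 0        # local metric of the path to NEXT_HOP
    peer_bgp_id: int = 0


def _keep_lowest(routes, key):
    best = min(key(r) for r in routes)
    return [r for r in routes if key(r) == best]


def select_best(routes: List[Route], local_as: int) -> Optional[Route]:
    # Discard routes with inaccessible NEXT_HOP or with the local AS in AS_PATH.
    c = [r for r in routes if r.next_hop_reachable and local_as not in r.as_path]
    if not c:
        return None
    c = _keep_lowest(c, lambda r: -r.local_pref)     # largest LOCAL_PREF
    c = _keep_lowest(c, lambda r: len(r.as_path))    # shortest AS_PATH
    c = _keep_lowest(c, lambda r: r.origin)          # lowest ORIGIN
    # Lowest MED, compared only among routes from the same neighboring AS.
    c = [r for r in c
         if not any(o.as_path[:1] == r.as_path[:1] and o.med < r.med for o in c)]
    if any(r.from_ebgp for r in c):                  # prefer eBGP over iBGP
        c = [r for r in c if r.from_ebgp]
    c = _keep_lowest(c, lambda r: r.igp_metric)      # closest NEXT_HOP
    c = _keep_lowest(c, lambda r: r.peer_bgp_id)     # lowest BGP Id
    return c[0]
```

Because LOCAL_PREF is evaluated first, a route with a longer AS_PATH wins whenever the import policy assigned it a larger local preference.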

Route propagation in an AS

Example

AS 20 has two border routers, R2 and R3. How can R2 advertise to R3 the routes learned from R1? How can R3 advertise to R2 the routes learned from R4?
[Figure: R1 (AS 10, 140.1.0.0/16, address 11.1.1.1) - BGP - R2 (AS 20, 150.1.0.0/16, addresses 11.1.1.2 and 10.0.0.1) - IGP? - R3 (AS 20, 150.2.0.0/16, addresses 10.0.0.2 and 11.1.2.1) - BGP - R4 (AS 30, 160.1.0.0/16, address 11.1.2.2)]

AS 20 runs an IGP for interior routing. Could this IGP distribute the BGP routes as well? No. The IGP cannot carry the BGP path attributes and does not scale up to BGP routing table size. Solution: Propagate BGP routes within an AS using BGP.

External vs. internal BGP

External BGP (eBGP): BGP routers in different ASes.
Internal BGP (iBGP): BGP routers in the same AS.

[Figure: R1 (AS 10) - eBGP - R2 (AS 20) - iBGP - R3 (AS 20) - eBGP - R4 (AS 30)]

eBGP session
Directly connected routers (usually).
LOCAL_PREF is not used.
AS_PATH and NEXT_HOP attributes are updated in advertised paths.
Advertise the best route to each destination.
Usually, an export filter for each session.

iBGP session
The routers need not be directly connected.
LOCAL_PREF attribute is used.
No change to AS_PATH and NEXT_HOP attributes.
Advertise only best routes learned from eBGP (not from iBGP, to avoid loops).
Usually, no export filter.
Full mesh of iBGP sessions.

Example: eBGP + iBGP + IGP


[Figure: R1 (AS 10, 140.1.0.0/16, address 11.1.1.1) - eBGP - R2 (AS 20, 150.1.0.0/16, addresses 11.1.1.2 and 10.0.0.1) - iBGP - R3 (AS 20, 150.2.0.0/16, addresses 10.0.0.2 and 11.1.2.1) - eBGP - R4 (AS 30, 160.1.0.0/16, address 11.1.2.2)]

R1 to R2: UPDATE (eBGP) Prefix = 140.1.0.0/16, AS_PATH = 10, NEXT_HOP = 11.1.1.1
R2 to R3: UPDATE (iBGP) Prefix = 140.1.0.0/16, AS_PATH = 10, NEXT_HOP = 11.1.1.1, LOCAL_PREF = 100
R3 to R4: UPDATE (eBGP) Prefix = 140.1.0.0/16, AS_PATH = 20 10, NEXT_HOP = 11.1.2.1

Note how the path attributes change during route propagation
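
The attribute changes along the path R1, R2, R3, R4 can be reproduced with two small functions, one per session type. This is a sketch assuming the simplified rules of the eBGP/iBGP comparison (eBGP prepends the local AS and rewrites NEXT_HOP, iBGP keeps both and adds LOCAL_PREF); the `Route` type and function names are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: Tuple[int, ...]
    next_hop: str
    local_pref: Optional[int] = None   # only meaningful inside an AS


def advertise_ebgp(route: Route, local_as: int, local_addr: str) -> Route:
    """eBGP: prepend the local AS, set NEXT_HOP to our address, drop LOCAL_PREF."""
    return replace(route, as_path=(local_as,) + route.as_path,
                   next_hop=local_addr, local_pref=None)


def advertise_ibgp(route: Route, local_pref: int = 100) -> Route:
    """iBGP: AS_PATH and NEXT_HOP unchanged, LOCAL_PREF attached."""
    return replace(route, local_pref=local_pref)


# Route originated in AS 10 (empty AS_PATH, no next hop yet).
origin = Route("140.1.0.0/16", (), "")
r1_to_r2 = advertise_ebgp(origin, local_as=10, local_addr="11.1.1.1")
r2_to_r3 = advertise_ibgp(r1_to_r2, local_pref=100)
r3_to_r4 = advertise_ebgp(r2_to_r3, local_as=20, local_addr="11.1.2.1")
```

Note that R3 receives NEXT_HOP = 11.1.1.1, an address outside AS 20; it can only use the route because the IGP provides a path to 11.1.1.0/30.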

Router R2: BGP routing table
Prefix          AS_PATH   NEXT_HOP
140.1.0.0/16    10        11.1.1.1

Router R2: IP routing table
P     Prefix          INTF   NEXT_HOP
BGP   140.1.0.0/16    E      11.1.1.1
      11.1.1.0/30     E      direct
      10.0.0.0/30     S      direct
      150.1.0.0/16    W      direct
IGP   150.2.0.0/16    S      10.0.0.2
IGP   11.1.2.0/30     S      10.0.0.2

Router R3: BGP routing table
Prefix          AS_PATH   NEXT_HOP
140.1.0.0/16    10        11.1.1.1

Router R3: IP routing table
P     Prefix          INTF   NEXT_HOP
BGP   140.1.0.0/16    N      10.0.0.1
      11.1.2.0/30     W      direct
      10.0.0.0/30     N      direct
      150.2.0.0/16    E      direct
IGP   150.1.0.0/16    N      10.0.0.1
IGP   11.1.1.0/30     N      10.0.0.1

Which is the right path?

Hot potato routing

Closest exit routing (get rid of it as soon as possible). "Natural" BGP behavior.
Asymmetric routes. If the customer downloads only, then its own network carries most of the traffic!

Cold potato routing

Best exit routing (closest to destination). Must use MED or COMMUNITY attributes.
Symmetric routes. The provider carries most of the traffic in this case.

[Figure: Example: dual-homed customer connected to its provider (ISP) in SF and NY, shown once with hot potato routing and once with cold potato routing]

Multi-homing with one provider (1)

Example: Dual-homing without BGP


[Figure: customer network (150.1.0.0/24) with border routers R3 (11.1.1.2) and R4 (11.1.2.2), connected to R1 (11.1.1.1) and R2 (11.1.2.1) of AS 10, the transit provider. Both networks use hot potato routing.]

No control of inbound traffic. Asymmetric routing.

Configuration example: The customer can configure default routes on R3 (via R1) and on R4 (via R2) and distribute them internally using IGP; outbound packets follow the IGP shortest paths and exit on either R3 or R4. The provider can configure static routes to the customer's prefix on R1 (via R3) and R2 (via R4) and distribute them using BGP (possibly also IGP).

How can we control outbound traffic? Easier: We have full control of the routers in our own AS.
How can we control inbound traffic? Difficult: We need to influence routing in the provider's AS. We need BGP on R3, R4. However, solutions currently offered by BGP are not satisfactory - still an open issue.

Multi-homing with one provider (2)

Example: Backup link using BGP

Simpler goal: Primary link carries all the traffic. Backup link is used only when the primary link fails.

Control of outbound traffic


[Figure: AS 456 (customer, 150.1.0.0/24) with border routers R3 (11.1.1.2) and R4 (11.1.2.2); AS 10 (transit provider, hot potato routing) with border routers R1 (11.1.1.1) and R2 (11.1.2.1). Primary link: R3 - R1. Backup link: R4 - R2. iBGP is used inside the ASes.]

R1 to R3 (primary link): UPDATE (eBGP) Prefix = 0.0.0.0/0, AS_PATH = 10, NEXT_HOP = 11.1.1.1
R2 to R4 (backup link): UPDATE (eBGP) Prefix = 0.0.0.0/0, AS_PATH = 10, NEXT_HOP = 11.1.2.1
Higher local preference for default route via primary link.

Traffic can still return on the backup link. We have to also control inbound traffic.

Configuration example for outbound traffic: R1/R2 send default routes to R3/R4. R3 and R4 set the local preference of the received default routes such that the primary link has higher preference than the backup link. They don't need to import other routes. The default route is then distributed in the customer network by IGP. Variant: on Cisco routers we can configure a default network instead of default route - one of the networks advertised by the provider.

Multi-homing with one provider (3)

Control of inbound traffic using the MED attribute

MED: Multi-exit discriminator attribute. Optional, non-transitive. Lower MED means better route (MED can be IGP metric). Applies only to routes received from the same adjacent AS.
Does not help if multi-homed with different providers.

Example: Using MED for backup link


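[Figure: AS 456 (customer, 150.1.0.0/24) with border routers R3 (11.1.1.2) and R4 (11.1.2.2); AS 10 (transit provider, cold potato routing) with border routers R1 (11.1.1.1) and R2 (11.1.2.1). Primary link: R3 - R1. Backup link: R4 - R2. iBGP is used inside the ASes.]

R3 to R1 (primary link): UPDATE (eBGP) Prefix = 150.1.0.0/24, AS_PATH = 456, NEXT_HOP = 11.1.1.2, MED = 100
R4 to R2 (backup link): UPDATE (eBGP) Prefix = 150.1.0.0/24, AS_PATH = 456, NEXT_HOP = 11.1.2.2, MED = 200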

Configuration example for inbound traffic: R3 advertises the customer's route with a lower MED than R4. This indicates that the preferred route for inbound traffic is via R3 (the primary link). Then, R1 and R2 select the route via R3 as best route to the customer (assuming that the provider supports and accepts MED).

Community attribute

Community attributes

Versatile mechanism that allows grouping destinations into communities and applying to them various policies based on a Community attribute. The attribute is optional and transitive. RFC 1997, RFC 4360 (extended communities), etc.

Several standard (well-known) communities

NO_EXPORT (0xFFFFFF01): Do not export the route in eBGP sessions (only use it within your AS). NO_ADVERTISE (0xFFFFFF02): Do not advertise the route to any other BGP peer (only use it in that router).

Delegated communities

Operators can define communities for policies of their own choice. Attribute value is structured into a 16-bit AS number (local AS), and a 16-bit value for the policy.
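
The structure of a community value (16-bit AS number followed by a 16-bit policy value) can be illustrated with a pair of conversion helpers. This is a minimal sketch; `encode_community` and `decode_community` are hypothetical names, and the example value 99:100 is used only for illustration.

```python
NO_EXPORT = 0xFFFFFF01
NO_ADVERTISE = 0xFFFFFF02


def encode_community(text: str) -> int:
    """Convert 'AS:value' notation into the 32-bit attribute value."""
    asn, value = (int(part) for part in text.split(":"))
    if not (0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError("both fields must fit in 16 bits")
    return (asn << 16) | value


def decode_community(community: int) -> str:
    """Convert a 32-bit community value into 'AS:value' notation."""
    return f"{community >> 16}:{community & 0xFFFF}"
```

In this notation the well-known community NO_EXPORT reads 65535:65281, which shows that the standard communities use an AS field (0xFFFF) that operators do not use for their own policies.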

Route filtering based on communities


Communities defined by AS 99 for export filters:
99:100 Customer prefixes
99:200 Transit prefixes
99:300 Peer prefixes
99:400 Internal prefixes
Received routes are tagged with these communities when they are imported.

[Figure: AS 99 (our ISP) connected to AS 10 (our transit provider), AS 88 (our peer), and AS 1234 and AS 4567 (our customers); routers R1 - R8]

Session with the transit provider (AS 10): Import: set 99:200. Export: match 99:100, 99:400.
Session with the peer (AS 88): Import: set 99:300. Export: match 99:100, 99:400.
Sessions with the customers (AS 1234, AS 4567): Import: set 99:100. Export: match 99:100, 99:200, 99:300, 99:400.
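
The import tagging and export matching used by AS 99 can be written down as two lookup tables. This is a sketch of the policy in the example, not of any router configuration language; the dictionary layout, the function names, and the prefixes in the usage lines (taken from documentation address ranges) are assumptions for illustration.

```python
CUSTOMER, TRANSIT, PEER, INTERNAL = "99:100", "99:200", "99:300", "99:400"

# Community set on import, by type of neighbor the route was received from.
IMPORT_TAG = {"transit": TRANSIT, "peer": PEER, "customer": CUSTOMER}

# Communities that an export filter accepts, by type of neighbor.
EXPORT_MATCH = {
    "transit": {CUSTOMER, INTERNAL},
    "peer": {CUSTOMER, INTERNAL},
    "customer": {CUSTOMER, TRANSIT, PEER, INTERNAL},
}


def import_route(prefix: str, neighbor_kind: str) -> dict:
    """Tag a received route with the community of its source."""
    return {"prefix": prefix, "communities": {IMPORT_TAG[neighbor_kind]}}


def export_allowed(route: dict, neighbor_kind: str) -> bool:
    """True if at least one community of the route matches the export filter."""
    return bool(route["communities"] & EXPORT_MATCH[neighbor_kind])


# Hypothetical prefixes, for illustration only.
from_customer = import_route("192.0.2.0/24", "customer")
from_peer = import_route("198.51.100.0/24", "peer")
from_transit = import_route("203.0.113.0/24", "transit")
```

A route learned from the peer is announced to customers but not to the transit provider, so AS 99 does not offer transit between its peer and its provider.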

Multi-homing with different providers (1)


Multi-homing and route aggregation
[Figure: AS 456 (our AS, 150.1.0.0/24), router R1, is connected to R2 in AS 10 (transit provider, 150.1.0.0/16) and to R3 in AS 20 (transit provider, 160.1.0.0/16)]

R1 to R2: UPDATE Prefix = 150.1.0.0/24, AS_PATH = 456, NEXT_HOP = 150.1.1.2
R1 to R3: UPDATE Prefix = 150.1.0.0/24, AS_PATH = 456, NEXT_HOP = 160.1.1.2
AS 10 (R2) announces: UPDATE Prefix = 150.1.0.0/16, AS_PATH = {10 456}, NEXT_HOP = 150.1.2.1
Can aggregate with loss of path info: use AS_SET.
AS 20 (R3) announces: UPDATE Prefix = 150.1.0.0/24, AS_PATH = 20 456, NEXT_HOP = 160.1.2.1
No aggregation.

Some Router: BGP routing table
Prefix          AS_PATH         NEXT_HOP
150.1.0.0/16    ... {10 456}    ...
150.1.0.0/24    ... 20 456      ...        Best path, because it is more specific!

Increase of the number of multi-homed networks leads to less aggregation, hence fast growth of the BGP routing table, and more instability.

If AS 10 aggregates, routers in the Internet will take the path via AS 20 because the route is more specific. Should AS 10 de-aggregate its /16 to balance the traffic?!
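
The effect of aggregation on forwarding follows from longest prefix match, which can be checked with Python's standard `ipaddress` module. This is a minimal sketch; the table contents mirror the two routes of the example, and `lookup` is a hypothetical helper that scans all entries rather than using a trie.

```python
import ipaddress

# The two routes seen by a remote router in the example.
table = {
    ipaddress.ip_network("150.1.0.0/16"): "via AS 10 (aggregate)",
    ipaddress.ip_network("150.1.0.0/24"): "via AS 20 (more specific)",
}


def lookup(routes, address):
    """Return the entry of the longest prefix that contains the address."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in routes if addr in net]
    if not matches:
        return None
    return routes[max(matches, key=lambda net: net.prefixlen)]
```

A destination inside 150.1.0.0/24 always follows the /24 entry, whatever the AS_PATH length or other attributes of the /16 route, because best route selection is done per prefix.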

Multi-homing with different providers (2)


Control of inbound traffic using communities. Example: Backup link

[Figure: AS 456 (our AS, 150.1.0.0/24), router R1, has a primary link to R2 in AS 10 (transit provider, 150.1.0.0/16) and a backup link to R3 in AS 20 (transit provider, 160.1.0.0/16)]

Assume that the providers use the communities and policies listed below.

Communities defined by AS 20 for customers:
20:90 Set LOCAL_PREF = 90
20:70 Set LOCAL_PREF = 70
20:50 Set LOCAL_PREF = 50
20:30 Set LOCAL_PREF = 30

Provider policy for assigning default LOCAL_PREF to imported routes:
Customer: LOCAL_PREF = 80
Peer: LOCAL_PREF = 60
Transit: LOCAL_PREF = 40

R1 to R2 (primary link): UPDATE Prefix = 150.1.0.0/24, AS_PATH = 456, NEXT_HOP = 150.1.1.2
R1 to R3 (backup link): UPDATE Prefix = 150.1.0.0/24, AS_PATH = 456, NEXT_HOP = 160.1.1.2, COMMUNITY = 20:30

AS 10 uses primary link to 456 because it prefers a customer route to peer and provider routes.
AS 10 announces: UPDATE Prefix = 150.1.0.0/24, AS_PATH = 10 456, NEXT_HOP = ...
No aggregation of the AS 456 prefix, so that other ASes do not route via AS 20.

AS 20 routes to 456 via AS 10 due to the community attribute received. It could forward the community attribute, but this helps only if it is recognized by its peers and providers.
AS 20 announces: UPDATE Prefix = 150.1.0.0/24, AS_PATH = 20 10 456, NEXT_HOP = ...

Caveat: De-aggregation violates CIDR principles and increases BGP routing table, which will become unmanageable.
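
The reason why AS 20 ends up routing via AS 10 can be checked by computing the LOCAL_PREF that AS 20 assigns to each candidate route. This is a sketch of the import policy listed for this example; the slide does not say whether AS 10 is a peer or a transit provider of AS 20, so both cases are evaluated, and the function name is hypothetical.

```python
DEFAULT_LOCAL_PREF = {"customer": 80, "peer": 60, "transit": 40}
COMMUNITY_LOCAL_PREF = {"20:90": 90, "20:70": 70, "20:50": 50, "20:30": 30}


def import_local_pref(neighbor_kind: str, communities=()) -> int:
    """LOCAL_PREF assigned by AS 20: a recognized community overrides the default."""
    for community in communities:
        if community in COMMUNITY_LOCAL_PREF:
            return COMMUNITY_LOCAL_PREF[community]
    return DEFAULT_LOCAL_PREF[neighbor_kind]


# Route received directly from the customer AS 456 on the backup link.
direct = import_local_pref("customer", ["20:30"])
# Route to the same prefix received from AS 10 (relationship assumed).
via_as10_if_peer = import_local_pref("peer")
via_as10_if_transit = import_local_pref("transit")
```

In both cases the route via AS 10 has the larger LOCAL_PREF (60 or 40 against 30), so AS 20 uses the backup link only when the route via AS 10 disappears.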

Scaling up iBGP

How to scale up iBGP to a large domain?

Routes learned from an iBGP session cannot be advertised on iBGP sessions, to avoid routing loops. Hence we must set up a full mesh of iBGP speakers. For N routers, each router must establish N-1 sessions, for a total of N(N-1)/2 sessions. Does not scale up. ASes can have 100s of routers or more.

Solution 1: Confederations

Divide a large AS into smaller sub-domains. Use an iBGP full mesh inside each sub-domain, and eBGP between sub-domains.

Solution 2: Route reflectors

Establish one or more routers, called route reflectors, that forward the iBGP routes to a cluster of other routers.
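
The scaling problem is easy to quantify. The sketch below compares the number of iBGP sessions in a full mesh with the number needed when route reflectors are used, assuming that every client peers with every reflector and that the reflectors are fully meshed among themselves (an assumption for illustration, not a rule stated above).

```python
def full_mesh_sessions(n: int) -> int:
    """Number of iBGP sessions in a full mesh of n routers: N(N-1)/2."""
    return n * (n - 1) // 2


def route_reflector_sessions(n: int, reflectors: int = 1) -> int:
    """Sessions when n routers include `reflectors` RRs and the rest are clients."""
    clients = n - reflectors
    return full_mesh_sessions(reflectors) + clients * reflectors
```

For an AS with 100 routers, the full mesh needs 4950 sessions, while a single route reflector needs 99.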

Route reflectors (1)

From iBGP full mesh to iBGP with route reflector


[Figure: left, full mesh iBGP among R1, R2 and R3 (three iBGP sessions); right, iBGP with route reflector (RR), where R2 and R3 each have a single iBGP session with R1 (RR). All three routers also have eBGP sessions.]

Interactions between a route reflector and its clients

R2 and R3 are configured as clients of the route reflector R1. Clients establish iBGP sessions only with their route reflector. When a route reflector receives a route update from its clients or from eBGP, it computes the best route and then advertises it to its clients.

Route reflectors (2)

A more general configuration: route reflector and mesh


[Figure: route reflector R1 (RR) with clients R2, R3, R4, and a full mesh of iBGP sessions among R1, R5 and R6; the routers other than R1 also have eBGP sessions]
R1 = Route reflector (RR). R2, R3, R4 = RR clients. R5, R6 = non-clients.

R5 and R6 are not clients of the route reflector R1. They establish together with R1 a full mesh of iBGP sessions. When R1 receives a route update from a client, it computes the best path and advertises it to its clients and to non-clients. When R1 receives a route update from a non-client, it computes the best path and advertises it to its clients.
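
The two reflection rules can be captured in a few lines. This is a sketch for the configuration of this example (clients R2, R3, R4; non-clients R5, R6); not sending the route back to the router it came from is an added assumption, and `reflect_targets` is a hypothetical name.

```python
def reflect_targets(source, clients, non_clients):
    """iBGP peers to which the route reflector advertises a best path
    learned from `source` (itself an iBGP peer of the reflector)."""
    if source in clients:
        # From a client: to the other clients and to all non-clients.
        return sorted((clients | non_clients) - {source})
    if source in non_clients:
        # From a non-client: to the clients only.
        return sorted(clients)
    raise ValueError(f"{source} is not an iBGP peer of the route reflector")


clients = {"R2", "R3", "R4"}
non_clients = {"R5", "R6"}
```

A route from a non-client is not reflected to other non-clients because they already receive it directly over the full mesh.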

BGP convergence

Convergence issues

BGP cannot guarantee that a unique stable routing exists for any set of policies. Moreover, BGP cannot guarantee that any stable routing exists or that it can be found. Various scenarios with non-deterministic outcome or persistent oscillation have been presented over time.

Causes

Locally defined policies may conflict globally. Unconstrained policy languages. However, outages are caused by configuration errors ...

Convergence speed

Improved by the use of triggered, incremental updates. But with current Internet size & complexity, path vector routing is quite slow (e.g., for unreachable destination).

Stability of inter-domain routing

Challenges

Inter-domain routing in the global Internet relies on a huge distributed system of over 100K BGP routers. A router reconfiguration/restart, BGP configuration error, or local instability (link, BGP session, etc.) can flood with updates all the BGP routers in the Internet. This causes routing instability and disruption of Internet traffic.

BGP traffic analysis

Most routes do not change frequently. Most of the Internet traffic uses these routes. A small fraction of the routes are responsible for most of the BGP messages exchanged.

Approach

Penalize the unstable routes in order to preserve the more stable ones (and reduce Internet traffic disruption).

Improving BGP stability

Avoid transmitting messages too frequently

UPDATE messages sent by a BGP router and advertising the same route should be separated by at least MinRouteAdvertisementInterval (MRAI) seconds. Default value: 30 sec. Caveat: Can delay propagation of updates and increase convergence duration.

"Route flap dampening" (RFC 2439)

Routes are given a penalty for changing. The penalty is increased each time the route changes. Route updates are suppressed if the penalty exceeds suppress limit. The penalty decreases exponentially when the route does not change. The route can be announced when the penalty falls below reuse limit. This mechanism can dramatically reduce the number of BGP updates and routing instability.
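
The penalty mechanism can be sketched as a small state machine with exponential decay. The parameter values below (penalty per flap, suppress limit, reuse limit, half-life) are illustrative assumptions, not values taken from the slides, and the class name is hypothetical.

```python
class DampenedRoute:
    """Toy route flap dampening state for one route (times in seconds)."""

    def __init__(self, penalty_per_flap=1000.0, suppress_limit=2000.0,
                 reuse_limit=750.0, half_life=900.0):
        self.penalty_per_flap = penalty_per_flap
        self.suppress_limit = suppress_limit
        self.reuse_limit = reuse_limit
        self.half_life = half_life
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay(self, now):
        # The penalty halves every half_life seconds without changes.
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / self.half_life)
        self.last_update = now

    def flap(self, now):
        """Record a route change at time `now`."""
        self._decay(now)
        self.penalty += self.penalty_per_flap
        if self.penalty > self.suppress_limit:
            self.suppressed = True

    def is_suppressed(self, now):
        """True while updates for this route must not be announced."""
        self._decay(now)
        if self.suppressed and self.penalty < self.reuse_limit:
            self.suppressed = False
        return self.suppressed


route = DampenedRoute()
for t in (0, 60, 120):      # three flaps, one minute apart
    route.flap(t)
```

With these values the third flap pushes the penalty above the suppress limit, and the route becomes usable again only after the penalty has decayed below the reuse limit, roughly two half-lives later.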

Route flap dampening example


Summary

"BGP was not designed. It evolved ..."

BGP remains a quite simple protocol, and there are many robust and efficient implementations. It has managed fairly well over the last 15 years to cope with the fast growth and increasing complexity of the Internet. BGP configuration for large domains is quite complicated and error prone (especially policies).

Challenges

BGP security (so far, only TCP MAC with static keys). Multi-homing and BGP "traffic engineering". How to do them without breaking CIDR and exploding the BGP routing tables? Policy-based routing. BGP next generation?
