
High-Availability

Enterprise Network
Design

haviland@cisco.com

505
0911_04F9_c3 © 1999, Cisco Systems, Inc. 1
Staying On Target
HA Focus vs Distractions!
Focus:
✔ Five nines is job one!
Distractions:
✔ “Flat networks are easier” - beware!
✔ “Variety” of vendors, protocols, designs, etc.
✔ Inherited complexity - hard to purge
✔ “Feature rich” - let’s use all the knobs!
✔ The latest cool stuff
✔ Change is hard, older is more stable sometimes
✔ $$$
HA Features of the Catalyst 6500
Consider for Backbones & Server Farms
✔ Fabric Redundancy
  switch fabric module in CatOS 6.1
✔ Supervisor Redundancy
  HA feature in CatOS 5.4.1
  stateful recovery
  image versioning on the fly
✔ MSFC Redundancy
  config-sync feature
  IOS 12.1.3, CatOS 6.1
  HSRP pair
Thinking Outside the Box

For HA/HP design, “outside the box”:
☛ the logical design is critical
☛ network features & protocols
☛ geophysical diversity is powerful
Inside the box: “HA”, RAID, UPS, MTBF, etc.
Dramatis Personae
Our Cast of Symbols
✔ Links (symbol: GigE Channel)
  GE, DPT, SONET, etc.
✔ L2 switching (symbol: Catalyst 4000)
  L2 forwarding in hardware
✔ L3 switching (symbol: Catalyst 6500)
  L3/L2 forwarding in hardware
✔ Routing (symbols: Cisco 7500, Cisco 12000)
  L3 forwarding (SW or HW)
✔ Control plane = IOS
  routing protocols & features
✔ QoS where required
✔ Application intelligence
HA Gigabit Campus Architecture
survivable modules + survivable backbone
☛ Define the mission critical parts first!
[Diagram: client blocks (Access L2, Distribution L3) connect through an Ethernet or ATM, Layer 2 or Layer 3 backbone to the server block (Distribution L3, Access L2) and the server farm. Legend: E or FE port; GE or GEC.]
High Availability Design
Why a Modular ABC Approach

✔ Many new products, features, technologies
✔ HA and HP application operation is the goal
✔ Start with modular, structured approach (the “logical” design)
✔ Add multicast, VoIP, DPT, DWDM...
Design the Solution
Then Pick the Products
New
$350 Price per 10/100 Modules
New
Modules
$300

$250 New Catalyst 5XXX Catalyst 6XXX


Catalyst 2912G
$200 Catalyst 2948G
Catalyst 2980G New
Catalyst 4XXX

$100

10/100 Ports 32-96


24 24-500+ 24-350+
Gigabit Ports 6-12 3-38+ 8-64+
Backplane 24 Gbps 1.2-3.6 + 10Gbps 250+ Gbps
Switching Capacity 20 Mpps Up to 72 Mpps Up to 150 Mpps
HA Design Reality Check!
Assume Things Fail - Then What?

✔ Networks are complex
✔ Things break, people make mistakes
✔ What happens if a failure occurs?
✔ Simple, structured, deterministic design required for fast recovery
✔ The “tradeoffs” - your choices are important
Network Recovery
How Long? What Happens?
[Diagram: buildings and branches attach through the access layer (Layer 2) and distribution layer (Layer 3), with WAN and WAN backup links, to the L3 core; the server distribution (Layer 3) and server farm (Layer 2) hang off the core. Failure points are numbered 1 to 6.]
Network Recovery Times
If You Follow the Rules
Failure Scenario   Recovery Mode          Recovery Time
1,2 server         Server NIC             < 2 seconds
3,4 uplink         HSRP (& UplinkFast)    tune to 3 seconds
5,6 core           HSRP track             tune to 3 seconds
dual-path L3       alternate path used    < 2 seconds
EtherChannel       channel recovery       < 1 second
L3 routing         EIGRP or OSPF          depends on tuning
L2 general         L2 spanning tree       tune (up to 50 seconds)
DPT                IPS                    50 milliseconds
Design for High Availability
How to Build Boring Networks!

✔ The Concepts
✔ The Rules
✔ Design Building Block
✔ Design Backbone
✔ Notes on Tuning

HA Network Design Concepts
thinking outside the box

1) Simplicity & Determinism
2) Collapse the Sandwich
3) Spanning Tree Failure Domain
4) Map L3 to L2 to L1
5) Scaling and Hierarchy
6) ABCs of Module + Backbone Design
7) The Four Corners
1) Simplicity and Determinism
reducing the degrees of freedom

“HA Continuum”
Simple, Structured, Deterministic: Boring!
Flexible, Complex, Varied: Interesting!

✔ Every Choice Affects Availability!
✔ Determinism or Flexibility?
✔ Would you support 27 desktop environments?
✔ Would you support 13 network vendors?
✔ Would you use 57 varieties of Cisco IOS?
2) Collapse the Sandwich
route IP over glass

Traditional Model (stack, top to bottom): IP (Service); FR/ATM (Traffic Eng); SONET (Mgmt); Fiber
Optical Internetworking (stack, top to bottom): IP; Big Fat Pipe; Fiber
• Lower equipment cost
• Lower operational cost
• Simplified architecture
• Scalable capacity
3) Minimize the Failure Domain
public enemy number one
avoid highly meshed, non-deterministic large scale L2 = VLAN topology
[Diagram: one meshed L2 topology spanning Building 1, Building 2, Building 3 and Building 4.]
Questions:
✔ Where should root go?
✔ What happens when something breaks?
✔ How long to converge?
✔ Times 100 VLANs?
Symptoms:
✔ Broadcast flooding
✔ Multicast flooding
✔ Loops within loops
✔ ST from heck
✔ Many blocking links
✔ Large failure domain!
4) Map L3 to L2 to L1

✔ Easier administration & troubleshooting

Clients in subnet 10.0.55.0


VLAN 55
wiring closet “55” on floor 55
access switch “55”
interface VLAN 55
all match and life is good
go fishing with your kids

[Diagram legend: 10/100 BaseT; GE or GEC]
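The naming rule on this slide (subnet, VLAN, closet and switch all share the number 55) can be sketched in a few lines. This is only an illustration: the /24 subnet size, the 10.0.0.0/16 base block and the `access-55` / `Vlan55` name formats are assumptions, not something the slide specifies.

```python
import ipaddress


def plan_for_closet(closet_id: int, base: str = "10.0.0.0/16") -> dict:
    """Derive matching L3/L2/L1 identifiers from one closet number.

    Hypothetical scheme: the closet number is reused as the third octet,
    the VLAN ID, and the suffix of the switch and interface names.
    """
    if not 1 <= closet_id <= 255:
        raise ValueError("closet_id must fit in the third octet (1-255)")
    # Split the base /16 into /24 subnets; index N has third octet N.
    subnets = list(ipaddress.ip_network(base).subnets(new_prefix=24))
    return {
        "subnet": str(subnets[closet_id]),
        "vlan": closet_id,
        "access_switch": f"access-{closet_id}",
        "l3_interface": f"Vlan{closet_id}",
    }


plan = plan_for_closet(55)
print(plan["subnet"], plan["vlan"], plan["access_switch"], plan["l3_interface"])
```

Because every identifier is derived from one number, a trouble ticket that mentions any one of them points directly at the others.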
5) Scaling and Hierarchy

Strong hierarchies like telephone system and Internet segment addressing and therefore scale
Flat L2 Ethernet is easy but does not scale
ATM LANE is logically flat, scales as N squared
[Three graphs, one per case: complexity (C) against number of devices (N), with an unmanageable level (U) marked.]
6) Building Block &
Backbone Design ABCs
A design bb (building block)
B design BB (backbone)
C connect bb to BB

✔ Divide and conquer
✔ Cookie cutter configuration
✔ Deterministic
✔ L3 demarcation
[Diagram: LAN Access and Server Farm blocks connect through Distribution to the Core; a second Distribution layer connects WAN Access and the Ecommerce Solution to WAN, Internet and PSTN.]
7) Four Square Network Redundancy
or the Four Corners Problem

L3                One Chassis                    Two Chassis
One Supervisor    Simplest, No Redundancy        GeoPhysical, Effective
Two Supervisors   When space is limited, “HA”    Most Complex, Belt and Suspenders
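To put rough numbers on the four corners, the sketch below converts an availability figure into minutes of downtime per year and shows what two redundant units buy. The 0.999 single-unit availability is a made-up input, and the parallel formula assumes the two units fail independently, which shared power, software or location can violate.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored


def downtime_minutes_per_year(availability: float) -> float:
    """Expected minutes of downtime per year at the given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def parallel_availability(availability: float, copies: int = 2) -> float:
    """Availability of redundant copies, ASSUMING independent failures."""
    return 1.0 - (1.0 - availability) ** copies


single = 0.999  # hypothetical availability of one chassis or supervisor
pair = parallel_availability(single)

print(f"five nines  : {downtime_minutes_per_year(0.99999):.2f} min/year")
print(f"single unit : {downtime_minutes_per_year(single):.1f} min/year")
print(f"redundant   : {downtime_minutes_per_year(pair):.2f} min/year")
```

The independence assumption is the reason the two-chassis corner is labeled GeoPhysical: separating the units physically makes correlated failures less likely.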
Dos and Don’ts for HA Design

1) Eliminate STP Loops


2) L3 Dual-Path Design
3) EtherChannel Across Cards
4) Workgroup Servers
5) Use HSRP Track
6) Passive Interfaces
7) Issues with Single-Path Design
8) Oversubscription Guidelines
9) HA for single attached servers
10) Protocol Tradeoffs
11) UDLD Protection
Rule 1) Eliminate STP Loops
in the backbone and mission critical points

✔ Too many cooks spoil the broth
✔ L3 control is better
✔ No blocking links to waste bandwidth
✔ Avoids slow STP convergence
✔ Very deterministic
✔ Routed links not VLAN trunks
[Diagram: hosts X.1, X.2 and X.3; an L2 Gigabit switch in the backbone is the root for VLAN X; subnet X = VLAN X.]
Rule 2) Dual Equal-Cost Path L3

✔ Load balance - don’t waste bandwidth
  unlike L1 and L2 redundancy
✔ Fast recovery to remaining path
  detect L1 down & purge - about 1s
✔ Works with any routed fat pipes
[Diagram: equal-cost routes to X over Path A and Path B lead to destination network X.]
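The behavior described here, sharing load over two equal-cost paths and falling back to the survivor, can be mimicked with a small model. The per-flow CRC hash below is an assumption chosen for illustration; real routers and switches use their own load-sharing algorithms.

```python
import zlib


def pick_path(src: str, dst: str, paths: list[str], up: set[str]) -> str:
    """Hash a flow onto one of the equal-cost paths that is still up.

    Illustrative only: the hash and the flow key are hypothetical.
    """
    alive = [p for p in paths if p in up]
    if not alive:
        raise RuntimeError("no path to destination")
    flow_hash = zlib.crc32(f"{src}->{dst}".encode())
    return alive[flow_hash % len(alive)]


paths = ["Path A", "Path B"]
flows = [(f"10.0.1.{i}", "10.0.9.9") for i in range(1, 9)]

# Both paths up: flows are spread by hash.
print([pick_path(s, d, paths, {"Path A", "Path B"}) for s, d in flows])

# Path B's link goes down: every flow lands on the remaining path.
print([pick_path(s, d, paths, {"Path A"}) for s, d in flows])
```

Recovery in this model needs no protocol exchange at all: the failed path is simply removed from the candidate set, which mirrors the "detect L1 down & purge" step on the slide.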
Rule 3) EtherChannel Across Cards

Increased availability
✔ Sub second recovery
✔ Spans cards on 6500
✔ Up to 8 ports in channel

Small complexity increase
✔ Single L2 STP link
✔ Single L3 subnet
✔ less if channel set “on”
Rule 4a) Connect Workgroup Server
With no L2 recovery path, what happens if link CB breaks ….
VLAN X (in purple) includes clients and workgroup servers attached at different places.
[Diagram: client X.1 on access switch C; distribution switches A and B with links to the core; workgroup server X.100 attached to the distribution layer, with its L2 path to client X.1; link CB breaks.]
Rule 4b) Connect Workgroup Server
• Subnet X now discontiguous
• Incoming traffic gets dropped
• Routers A & B continue to advertise reachability of subnet X ...
• From one router X.100 is not reachable; from the other X.1 is not reachable
[Diagram: client X.1 on access switch C; workgroup server X.100 attached to the distribution layer (switches A and B); the L2 path to client X.1 is broken.]
Rule 4c) Connect Workgroup Server
• Introduce L2/STP redundancy
• Adds a loop (band-aid fix)
• VLAN trunk AB forms L2 loop
• recovery path for STP
• prevents black hole
[Diagram: client X.1 on access switch C; workgroup server X.100 attached to the distribution layer (switches A and B); L2 path to client X.1.]
Rule 4d) Connect Workgroup Server

Real Lessons:
☛ Enterprise Server Farms are better
☛ L3 demarcation is better
☛ Example of why extended L2 is difficult
Rule 5a) Use HSRP Track
• Review - Hot Standby Router Protocol
• Fast recovery can be tuned to 3s or less

Router X acts as gateway router for subnet M, IP address M.100. If link Z fails, router Y will take over as M.100 gateway with same MAC address.
[Diagram: subnet M hosts M.1, M.2, M.3. X is M.100: HSRP Primary, Priority 200. Y (becomes M.100): HSRP Backup, Priority 100. Legend: 10/100 BaseT; GE or GEC.]
Rule 5b) Use HSRP Track
• Track extends HSRP to monitor links to backbone
• Ensures shortest path - best outbound gateway

Track interface A - lower priority by 75
Track interface B - lower priority by 75
HSRP triggers if both A and B lost
[Diagram: subnet M hosts M.1, M.2, M.3. X is M.100: HSRP Primary, Priority 200, with tracked links A and B. Y (becomes M.100): HSRP Backup, Priority 100. Legend: 10/100 BaseT; GE or GEC.]
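The priority arithmetic behind tracking is easy to check with a small model. This is a sketch of the election logic only, not router configuration; it assumes each tracked link subtracts 75 from the priority while it is down, and that preemption is enabled so the higher-priority router always takes over.

```python
from dataclasses import dataclass, field


@dataclass
class HsrpRouter:
    name: str
    base_priority: int
    tracked: dict[str, int] = field(default_factory=dict)  # link -> decrement
    down: set[str] = field(default_factory=set)             # links currently down

    def priority(self) -> int:
        """Base priority minus the decrement of every tracked link that is down."""
        lost = sum(dec for link, dec in self.tracked.items() if link in self.down)
        return self.base_priority - lost


def active_gateway(routers: list[HsrpRouter]) -> str:
    """Highest effective priority wins (assumes preempt is enabled)."""
    return max(routers, key=lambda r: r.priority()).name


x = HsrpRouter("X", 200, tracked={"A": 75, "B": 75})
y = HsrpRouter("Y", 100)

print(x.priority(), active_gateway([x, y]))  # both uplinks up
x.down.add("A")
print(x.priority(), active_gateway([x, y]))  # one uplink lost: 125 still beats 100
x.down.add("B")
print(x.priority(), active_gateway([x, y]))  # both lost: 50 is below 100
```

With these numbers a single uplink failure leaves X as gateway, and only the loss of both A and B hands the gateway role to Y, which matches the slide.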
Rule 6a) Use Passive Interfaces
• L3 switches X & Y in distribution layer
• 4 VLANs per wiring closet
• 10 wiring closets
[Diagram: wiring closet switches carry four VLANs each (A B C D, E F G H, I J K L, M N O P, … ten closets total) and uplink to distribution switches X and Y.]
Rule 6b) Use Passive Interfaces

• What X and Y see is 4*10=40 routed links
• Increased protocol overhead & CPU

X side   VLAN   Y side
A.1      A      A.2
B.1      B      B.2
C.1      C      C.2
D.1      D      D.2
E.1      E      E.2
F.1      F      F.2
G.1      G      G.2
Etc.     Etc.   Etc.
Rule 6c) Use Passive Interfaces

☛ Turns off routing updates & overhead
☛ Leave two routed links for redundant paths
☛ CDP, VTP, HSRP etc. still function on all links

X side          VLAN   Y side
A.1             A      A.2
B.1 (passive)   B      B.2 (passive)
C.1 (passive)   C      C.2 (passive)
D.1 (passive)   D      D.2 (passive)
E.1             E      E.2
F.1 (passive)   F      F.2 (passive)
G.1 (passive)   G      G.2 (passive)
Etc.            Etc.   Etc.
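A quick count shows what the passive setting saves in the example of 10 wiring closets with 4 VLANs each. The helper below is a hypothetical planning aid, and the rule it uses to choose the two active VLANs (evenly spaced through the list) is an assumption; the slide only says to leave two.

```python
def plan_passive(vlans: list[str], keep_active: int = 2) -> tuple[list[str], list[str]]:
    """Split VLAN interfaces into (active, passive) for the routing protocol.

    Hypothetical selection rule: keep evenly spaced VLANs active so the
    surviving routed links sit on different wiring closets.
    """
    if not 1 <= keep_active <= len(vlans):
        raise ValueError("keep_active must be between 1 and len(vlans)")
    step = len(vlans) // keep_active
    active = vlans[::step][:keep_active]
    passive = [v for v in vlans if v not in active]
    return active, passive


closets, vlans_per_closet = 10, 4
vlans = [
    f"closet{c}-vlan{v}"
    for c in range(1, closets + 1)
    for v in range(1, vlans_per_closet + 1)
]

active, passive = plan_passive(vlans)
print(len(vlans), "routed links between X and Y without passive interfaces")
print(len(active), "left active:", active)
print(len(passive), "made passive")
```

Only routing updates stop on the passive VLANs; the subnets are still advertised and HSRP keeps running on every one of them.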
Rule 7a) Issues With Single Path
Designs
Outbound case ...
✔ L3 engine MSFC on core-X reloads
✔ Lights are on but nobody home - HSRP does not recover
✔ Remove passive interface to wiring closet subnets A, B
✔ Provide longer routed recovery path
[Diagram: subnets A and B on Access L2 uplink over GE to two distribution switches, one of them HSRP primary; each has a single path to Core L3 switches X and Y. After the failure, traffic takes new, longer outbound routes.]
Rule 7b) Issues with Single-Path
Design
Inbound case ...
✔ Recovery must take place in both directions
✔ Routing protocol recovers longer route from X to subnets A, B
✔ Therefore dual-path L3 is better & faster than single-path
[Diagram: subnets A and B on Access L2 uplink over GE to two distribution switches, one of them HSRP primary; each has a single path to Core L3 switches X and Y. After the failure, inbound traffic takes new, longer routes to A, B.]
Rule 8a) Oversubscription
Guidelines

✔ Oversubscription part of all networks - not bad
✔ Non-blocking switches do not mean a non-blocking network
✔ You determine the amount of “blocking”
[Diagram: a non-blocking design next to a blocking design 2:1, both built from GE links.]
Rule 8b) Oversubscription
Guidelines
✔ Oversubscription rules of thumb work well
✔ 20:1 at wiring closet
✔ Less in distribution and server farm
✔ QoS required IFF congestion occurs
✔ Protect real time flows at congested points
[Diagram: 200 100BaseT ports share a GE uplink (20:1); 8 uplinks reach Distribution L3 (n:1); a dual-link GEC leads to Core L3, which should use non-blocking switches.]
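The 20:1 figure is simply offered load divided by uplink capacity. A minimal helper, with a hypothetical function name, reproduces the wiring closet number and the 2:1 blocking example from Rule 8a:

```python
def oversubscription(ports: int, port_mbps: float,
                     uplinks: int, uplink_mbps: float) -> float:
    """Ratio of total edge bandwidth to total uplink bandwidth."""
    return (ports * port_mbps) / (uplinks * uplink_mbps)


# Wiring closet: 200 ports of 100BaseT behind one GE uplink.
print(oversubscription(200, 100, 1, 1000))   # 20.0, i.e. 20:1

# Blocking design: two GE links in, one GE link out.
print(oversubscription(2, 1000, 1, 1000))    # 2.0, i.e. 2:1
```

Adding a second GE uplink to the same closet halves the ratio to 10:1, which is the kind of adjustment the "you determine the amount of blocking" point refers to.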
Rule 9) Dual Supervisors
HA for Single Attached Servers

✔ Single point of failure
✔ Dual supervisors - fast stateful recovery
✔ No increase in complexity
[Diagram: a single attached server running a mission critical application connects to a Catalyst 6XXX with HA dual supervisors and redundant uplinks. Legend: 10/100 BaseT; GE or GEC.]
Rule 10) Protocol Tradeoffs
Automatic or Manual Configuration

✔ Configuration up front rather than CPU overhead later, for example:
  ➙ set VTP mode transparent
  ➙ set/clear VLANs for each trunk
  ➙ set trunks on or off
  ➙ set channel on or off
✔ Choose flexibility or determinism
Rule 11) UniDirectional Link
Detection
✔ UDLD detects mismatch when physical layer
checks out OK
✔ Prevents various failure conditions including
crossed wiring

[Diagram: a Tx fiber and an Rx fiber pair. The lights are on, BUT …..]
Building Block Means Survivable
Self-contained Backbone

✔ Autonomous Survivability Unit - ASU
✔ L3 Broadcast Multicast demarcation
✔ Cookie cutter configuration
✔ L3 Demarcation of failure domain
✔ Simple, repeatable, deterministic
✔ Redundancy adds 15% cost at mission critical points like server farm
[Diagram: an L2 building block above its L3 pair; HSRP delimits failure domain.]
Building Block Templates
Use “As Is” or Combine

1) Standard Model
  simple, structured
2) VLAN Model
  more flexible
3) Large Scale Server Farm Model
  accommodate dual NIC
4) Small Scale Server Farm Model
  accommodate dual NIC
1) Standard Building Block
no loops - no STP complexity

[Diagram: four access switches, each carrying two subnets (10/11, 12/13, 14/15, 16/17), uplink over GE/GEC VLAN trunks to two distribution switches. Labels: Access L2 root switch VLAN 10/11; Dual Path with Tracking. Legend: 10/100 BaseT; GE or GEC.]
Highly Deterministic
L1 maps L2 maps L3
No blocking links
Shortest path always
Not “flexible”

Distribution switch 1: HSRP Primary, Subnets/VLANs 10, 12, 14, 16
Distribution switch 2: HSRP Primary, Subnets/VLANs 11, 13, 15, 17
2) VLAN Building Block
make L2 design match L3 design
All VLANs terminate at L3 boundary
[Diagram: four access switches, each carrying all VLANs and all subnets, run UplinkFast. On each switch one uplink is FE/BO and the other FO/BE (FO forwarding odd, BE blocking even, etc.). GE/GEC VLAN trunks lead to two distribution switches joined by an L2 path. Labels: Dual Path with Tracking. Legend: 10/100 BaseT; GE or GEC.]
More flexible

Distribution switch 1: STP root, L2 VLANs 10 12 14 16; L3 HSRP primary, subnets 10 12 14 16
Distribution switch 2: STP root, L2 VLANs 11 13 15 17; L3 HSRP primary, subnets 11 13 15 17
3) Large-Scale Server Farm
Building Block
based on VLAN building block
Dual-NIC Server
  Example Fault Tolerant Mode (FTM)
  aggregates traffic - high BW
  Same IP Address - seamless recovery
[Diagram: Access L2 switches with UplinkFast connect over GE/GEC VLAN trunks to two distribution switches joined by an L2 path. Labels: Dual Path with Tracking. Legend: 10/100 BaseT; GE or GEC.]

Distribution switch 1: STP root, L2 VLANs EVEN; L3 HSRP primary, subnets EVEN
Distribution switch 2: STP root, L2 VLANs ODD; L3 HSRP primary, subnets ODD
4) Small-Scale Server Farm
Building Block

Simplified building block with no STP loops
Use if port density permits
Use if no oversubscription (non-blocking) is a requirement

Dual-NIC Server
  Example Fault Tolerant Mode (FTM)
  Same IP Address - seamless recovery
[Diagram: two switches joined by an L2 path; one is L3 HSRP primary for subnets EVEN, the other L3 HSRP primary for subnets ODD. Labels: Dual Path with Tracking. Legend: 10/100 BaseT; GE or GEC.]
Redundant Backbone Models
all good - increasing scale

1) Collapsed L3 Backbone
2) Full Mesh
3) Partial Mesh
4) Dual-Path L2 Switched
5) Dual-Path L3 Switched

1) Collapsed L3 Backbone
large building or small campus
[Diagram: clients on Access L2 switches uplink over GE/GEC to a collapsed L3 core backbone, which also connects the server farm. Legend: 10/100 BaseT; GE or GEC.]
Scale depends on physical plant and policy more than performance
2) Full Mesh Backbone
small campus - n squared limitation

[Diagram: client blocks (Access L2, Distribution L3) and the server block (Distribution L3, Access L2) with every distribution switch meshed to every other. Legend: E or FE port; GE or GEC.]
2 blocks - 6 peerings
3 blocks - 15 peerings
4 blocks - 28 peerings
5 blocks - 45 peerings
Note importance of passive wiring closet interfaces in meshed designs!
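The peering counts on this slide follow from the n squared rule if each block contributes two distribution switches, so n blocks give 2n routers and 2n(2n-1)/2 peerings. The two-switches-per-block figure is inferred from the numbers rather than stated on the slide. A short check:

```python
from math import comb


def full_mesh_peerings(blocks: int, switches_per_block: int = 2) -> int:
    """Peerings in a full mesh of all distribution switches.

    Assumes two L3 switches per block (inferred from the slide's numbers).
    """
    return comb(blocks * switches_per_block, 2)


for blocks in range(2, 6):
    print(f"{blocks} blocks - {full_mesh_peerings(blocks)} peerings")
```

Going from 5 to 10 blocks would raise the count from 45 to 190, which is why the full mesh is recommended only for a small campus.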
3) Partial Mesh Backbone
medium campus - traffic flow to server farm

[Diagram: client blocks (Access L2, Distribution L3) connect to a Distribution/Core L3 pair that fronts the server block (Access L2). An arrow marks the predominant traffic pattern, from the client blocks toward the server farm. Legend: E or FE port; GE or GEC.]
4) Dual-Path L2 Switched Backbone
no STP loops or VLAN trunks in core

[Diagram: client blocks North, West and South (Access L2, Distribution L3) attach to a dual L2 backbone (Core L2): a “red” core and a “blue” core, each with subnet=VLAN=ELAN. Legend: E or FE port; GE or GEC.]
5a) Benefits of a L3 Backbone

✔ Multicast PIM routing control
✔ Load balancing
✔ No blocked links
✔ Fast convergence EIGRP/OSPF
✔ Greater scalability overall
✔ Router peering reduced
✔ IOS features in the backbone
5b) Dual-Path L3 Backbone
largest scale, intelligent multicast
[Diagram: client block (Access L2, Distribution L3) and server farm block (Distribution L3, Access L2) connect through dual Core L3 switches. Legend: E or FE port; GE or GEC.]
All routed links, consider subnet count!
Restore Considerations


✔ Restoring can take longer in some cases - more complex - schedule
✔ On power up L1 may come up before L3 builds routing table - temporary black hole for HSRP
✔ Use “preempt delay” for HSRP
Campus Failover Layer 2
Recovery & Tuning
STP
  Tune ‘diameter’ on root switch
  Improves recovery time (maxage)

UplinkFast
  No tuning, 2 seconds, wiring closet only
  Only applies with forwarding & blocking link

PortFast
  Server or desktop ports only, 1 s
  Move directly from linkup into forwarding

BackboneFast
  Converges 2 sec + 2xFwd_delay for indirect link failures
  Eliminates maxage timeout
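The recovery figures quoted for Layer 2 can be reproduced from the spanning tree timers. The sketch below assumes the IEEE 802.1D default timers (max age 20 s, forward delay 15 s) and takes the 2-second detection figure for BackboneFast from the slide; the function names are made up for illustration.

```python
MAX_AGE_S = 20        # IEEE 802.1D default max age
FORWARD_DELAY_S = 15  # IEEE 802.1D default forward delay


def stp_indirect_failure(max_age: int = MAX_AGE_S,
                         fwd_delay: int = FORWARD_DELAY_S) -> int:
    """Plain STP: wait out max age, then listening and learning."""
    return max_age + 2 * fwd_delay


def backbonefast_indirect_failure(fwd_delay: int = FORWARD_DELAY_S,
                                  detect_s: int = 2) -> int:
    """BackboneFast skips the max age wait: detection + 2 x forward delay."""
    return detect_s + 2 * fwd_delay


print("STP, indirect failure         :", stp_indirect_failure(), "s")
print("BackboneFast, indirect failure:", backbonefast_indirect_failure(), "s")
```

With default timers the plain STP case is the "up to 50 seconds" quoted earlier in the recovery table. Tuning the diameter on the root switch lowers both timers, so it shortens both results.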
Campus Failover Layer 3
Recovery & Tuning

Caution with aggressive tuning
Good when network is stable, highly summarized

OSPF (fast LAN links)
  Tune hello timer 1 sec, dead timer 3 sec
  <4s to recognize problem, then converge

EIGRP (fast LAN links)
  Tune hello timer 1 sec, hold timer 3 sec
  <4s to recognize problem, then converge

HSRP (fast LAN links)
  Tune hello timer 1 sec, dead timer 3 sec
  <4s to converge
Keeping Networks Available!

✔ KISS - eliminate complex L2
✔ ASU - building blocks
✔ Redundant backbone
✔ Redundant L3 paths
✔ L3 segments failure domain