New Generation of Cisco Switching

Certification Zone - Tutorial
Page 1 of 50
Tutorial
New Generation of Cisco Switching

by Howard Berkowitz Introduction Old and New Models: Hierarchical, SAFE, and ECNM
"Is it SAFE?"
What to study -- and Not to Study New Paradigms and Metaphors Cisco's Switch Product Positioning Failover Requirements
CertificationZone Subscribers Should What does this mean in the context of switches?
Availability Terminology Paging Mr. Murphy Selecting Recovery Strategies Cost and Complexity in Selecting Strategies Recovery Time Requirements in Selecting Strategies 1:N, 1:1, and 1+1 Protection Strategies Switch Platform Architecture: A Model Practical Issues: What Are Ports? Management Hardware Software Control Forwarding Tables and Populating Them Forwarding Ingress Buffering and Processing
Pattern Recognition
Advances in Forwarding Tables: CAM and TCAM

Introducing Ternary Tables Templates For further details... Forwarding models
Fabric
Shared Bus Shared Memory Crossbar
Egress Processing QoS at the Switch Interfacing: the GBIC (Gigabit Ethernet Interface Converter) Characterizing Switch Performance Throughput Blocking Output Blocking Grandfather Switch: Catalyst 5x00 Platform Family Stacking and Clustering: 3750 and 2950 Midrange Flexibility: Catalyst 3550 Platform Family A New Interface Paradigm
Hardware Aspects of Voice Ports
Management and Control Forwarding Catalyst 4000/4500 Platform Family
http://www.certificationzone.com/cisco/studyguides/component.html?module=studyguides... 5/31/2005
Page 2 of 50
Management and Control Forwarding Catalyst 6000/6500 Platform Family Management and Control Database Manager Forwarding Switching Functions for High Availability Layer 1/2 High Availability for Links and Interfaces Layer 1 Failover
SONET and POS
Unidirectional Links: Detection Protocol (UDLD) and configuring Unidirectional Ethernet Layer 2 Aggregation Preventing Broadcast Storms Other Layer 2 Security and Management Enhancements Private VLANs 802.1x -- Port Based Authentication DHCP-related Security Features Growing Frames beyond Normal Size Single Spanning Tree High Availability
Layer 2 Traceroute
Core/Backbone Switch Failure Indirect Root Failures Root Wars Distribution Switch Failure Performance Enhancements to Individual Spanning Trees IEEE 802.1w Rapid Spanning Tree Protocol (RSTP) Port Types in 802.1d and 802.1w Port States in 802.1d and 802.1w PortFast, BPDU Guard, and 802.1w Functional Equivalence Root Wars and Root Guard STP Convergence Time Performance Enhancements to Multiple Spanning Trees MSTP: Subdividing the Spanning Tree for Faster Convergence MSTP Regions IST, CIST, and CST VLAN Tagging and VLAN Trunk Protocol (VTP)
VTP Pruning
VLAN-to-Spanning Tree Relationships PVST Conclusion References
Introduction
While most of the focus of this paper is on L2 switching, there is a significant amount on the architecture and implementation of "L3 switching". L3 switching is really routing, but the term L3 switching has tended to become associated with implementation techniques that do much of the work in specialized hardware. Please, please don't get confused by trying to see how L3 switching is somehow different, in basic principles, from routing. It isn't. At worst, it's purely a marketing term; at best, it emphasizes certain implementations. There's no accident that the Cisco 12000 is called the Gigabit Switch Router (GSR), because it makes extensive use of hardware processing. Since it's targeted at a WAN and ISP market, however, Cisco doesn't designate it a switch to avoid confusion with enterprise and server farm relays. This particular paper has many cross-references to other CertificationZone tutorials, and for good reason. The focus here is how a switch does something, while such things as QoS, high availability, and security tutorials define why something is done.
Page 3 of 50
Old and New Models: Hierarchical, SAFE, and ECNM

One thing to bear in mind is that Cisco has updated some of its design models that make use of switching. While there's been a good deal of buzz, including in exam objectives, about the SAFE and Enterprise Composite Networking Model (ECNM), the old three-level hierarchical model, with some updating, is still usefully with us. The SAFE blueprint does describe security measures for a variety of enterprise subsystems, and the hierarchical model can be applied individually to many of these subsystems. For all practical purposes, ECNM is simply the hierarchical model in new clothing, now married to the SAFE blueprint. The hierarchical model has changed the most in that Cisco emphasizes "L3 switching" in the core. Previously, the core emphasized L2 switching, either LAN or ATM. So, the "new" core is simply a place for high-performance routers. The products Cisco calls "high performance multilayer switches", such as the 6500 family discussed in this tutorial, still do IP routing as well as L2 switching. In contrast with routers such as the 12000, 10000, and 7500, the 6500s emphasize Ethernet port density and features more appropriate for LANs than WANs.
"Is it SAFE?"
Well, the quote is from the movie "Marathon Man," which is guaranteed to give nightmares about going to dentists. However, SAFE itself doesn't seem to be an acronym -- at least, it's not spelled out in the main SAFE blueprint from Cisco. Part of the confusion about SAFE and ECNM seems be that material about them is not on Cisco CCO. There is mention of ECNM in several security and design instructor-led courses, but there is no corresponding Cisco white paper. My best interpretation is that ECNM really means the overall design resulting from applying the three-layer hierarchical model to each appropriate subsystem of SAFE. Some Cisco presentations to service provider audiences introduce a fourth hierarchical layer, "collection", between access and distribution. The collection layer involves broadband aggregation (e.g., IP over cable or DSL) between the user premises and the ISP -- it's where the broadband service provider lives. In the Cisco Enterprise SAFE document, http://www.cisco.com/warp/public/cc/so/cuso/epso/sqfr/safe_wp.htm, there is one mention of an "enterprise campus module". This module is composed of the campus proper, the "campus edge", and the edge of service provider networks. Cisco has not made it clear if the "collection tier" is equivalent to the "campus edge" discussed in enterprise-oriented presentations. You may also want to look at an
What to study -- and Not to Study

Understanding the 3550 is vital for CCIE candidates because it is the only L2switching capable device announced to be in the CCIE lab. The 3550, however, also has routing/L3 switching capability as well as L4 QoS, so it could appear in lab scenarios as a L2 switch, a hybrid switch-router, a router, or as a edge traffic policy enforcer. At the same time, especially if
Page 4 of 50
you are studying for the CCNP Switching or CCIE Written examinations, you need to know about platforms that are not in the CCIE lab. The 6500 switch, for example, is Cisco's flagship product for large enterprises and internal use within ISPs. It has some unique features on which you might be tested.
Internet-Draft I coauthored, which hopefully will soon move to RFC, "Terminology for Benchmarking BGP Device Convergence in the Control Plane", http://www.ietf.org/internetdrafts/draft-ietf-bmwg-conterm-05.txt, where we draw a distinction between two functions in the Cisco "distribution tier", the "provider edge router" and the "inter-provider border router," as opposed to the "subscriber edge router". This distinction, while informal, captures For many switches, you will some of the flavor of Cisco's "campus need to recognize that there is edge". While not listed as an official a product family that includes coauthor because we weren't allowed to more than one numbered list more than five coauthors, Alvaro series. For example, the 4000 Retana of Cisco was part of the team series switches are modular, that wrote this document. but the 2948G switches are very similar devices whose configurations are fixed. Table 1. General Positioning Model for Enterprise Switches Enterprise size Wire closet Small Midrange Large Backbone
Fixed configuration Fixed configuration Modular Modular Fixed configuration Modular Modular Modular Modular
You will find switches positioned for different functions, and for the same function within organizations of different size. Fixed configuration platforms are most associated with the smaller enterprises, but they also can be quite useful as aggregation platforms inside larger enterprises.
New Paradigms and Metaphors

Many of Cisco's earlier switches are the result of acquisitions, although modern switches are designed and manufactured by Cisco. As a result, there was a confusing assortment of operating systems and human interfaces across platforms. The "Catalyst Interface", for example, came from Cisco's acquisition of Catalyst. Table 2. Switch Operating Systems and their Interfaces Operating System Interface Comments CatOS Native OS Hybrid IOS 4000, 5000, 6000 2950, 3550, 4000 Sup 3, 6000 MSFC Cat OS + IOS on MSFC (5x00) Routers, MSFC
Real consolidation and a clear picture of future trend came with the introduction with the 3550 and its IOS-based interface. This interface has considerable QoS capability, especially important for Cisco
Page 5 of 50
AVVID (Architecture for Voice, Video, and Integrated Data) use.
Cisco's Switch Product Positioning

Table 3. The View in 1999 Wire closet Server farm Core 8500
2900/4000, 5000 6000
Table 4. Qualifying the 1999 view for Enterprise Size Enterprise size Wire closet Small Midrange Large Backbone
2900XL/2948G 50xx/55xx 4000,5000 55xx 55xx 6000/85xx
Table 5. The View in 2003 Wire closet Server farm Core 6500
2900/4000, 5000 4000,6000
Failover Requirements
Selecting the appropriate level of availability is as much a business as a technical decision. In her book Planning for Survivable Networks, Annlee Hines has written extensively on the basis of these decisions. If you ever plan to recommend real network designs rather than simply pass tests, read her book! [Hines 2002]
CertificationZone Subscribers Should

See my High Availability tutorial for additional details.
My WAN Survival Guide [Berkowitz 2000] discusses some of these cost-benefit trade-offs from the enterprise standpoint, and my Building Service Provider Networks [Berkowitz 2002] looks at the tradeoffs from the service provider viewpoint. Table 6. Broad Goals for High Availability [Berkowitz 2000] Availability Level 1 2 Server "Do nothing special" Backups "Increased availability: protect the data" Full or partial disk mirroring, transaction logging "High availability: Protect the system" Clustered servers "Disaster recovery: protect the Network Locked network equipment Dial/ISDN backup
3 4
Redundant routers No single-point-of-failure local loop No single-point-of-failure national
Page 6 of 50
organization" Alternate server sites
backbone
High availability involves a great many cost trade-offs, some of which are "Layer 8" business rather than technical considerations. Table 7. Costs of High Availability Mechanisms Direct Backup equipment Additional lines/bandwidth Floor space, ventilation, and electrical power for additional resources Indirect Design Network administrator time due to additional complexity; higher salaries for higher skills Performance drops due to fault tolerance overhead
If you choose to "pay me later" and accept failures, what are some of the costs of failures when they occur? Table 8. Costs of Lack of Availability Direct Revenue loss Overtime charges for repair Indirect Lost marketing opportunities Shareholder suits
Salaries of idle production staff Staff morale Radia Perlman's doctoral thesis [Perlman 1988] was on the "Byzantine generals problem". She demonstrated that adding more network elements during certain kinds of failures not only does not increase availability but actually decreases it. The theoretical problem deals with a situation where the decision maker receives conflicting information from multiple sources, some of which is known to be untrue -- but it is not known which information is untrue. Sounds familiar from mutual redistribution problems, hmm? It applies to most routing mechanisms and related mechanisms such as Layer 2 spanning trees.
What does this mean in the context of switches?

Depending on the specific switch model, you may have any or all of the features:
z
Redundant processors/supervisors Redundant/load sharing power supplies Hot-swappable line cards
Availability Terminology
Remember that the CCIE written exam is more concerned with protocol theory and features than specific configuration of routers to use them. This section will give you a good deal of information relevant to the theory of many protocols. For more detail, see the High Availability tutorial. We often speak of single points of failure. Multiprotocol Label Switching (MPLS) has refined that definition into the shared risk group (SRG). The basic definition of an SRG is "a set of network elements that will be affected by the same fault". SRGs can apply to all sorts of network resources, and a given resource can belong to more than one SRG. A shared risk group of routers might be all of those on a common electrical power supply.
Page 7 of 50
Table 9. Basic Shared Risk Groups Layer Hardware Software
Infrastructure Commercial power Physical Data Link Network Transport Application Cable in common duct, single shared medium Cables in common multilink bundle Router Routing software session/instance TCP software Single DNS server
One of the classic SRGs is the common cable or cable duct that gets cut by construction workers. While building alternate cable runs to the telco end office historically is prohibitively expensive, new Cisco technology gives you some creative alternatives. It may not be expensive, balanced against the cost of downtime, to run a wireless LAN from your main router to a router in a nearby building. That alternate router would connect to the end office, at the very least, via a different cable, and ideally would connect to an entirely different office. The bandwidth available to you from one wireless LAN, or a small number of parallel wireless LANs, usually will be comparable to your normal WAN uplink. When the WAN bandwidth requirements are substantial, you still can get laser or wireless links from non-Cisco vendors, providing short-haul bandwidth up to OC12 (622-Mbps) rates.
Paging Mr. Murphy

Murphy's First Law states "Whatever can go wrong, will." His Second Law says "What has gone wrong will get worse." High availability measures will never be able to deal with every possible Murphy case. MPLS protocol designers do try to deal with most Murphy cases, and do a much more extensive job than in other protocols. As a result, the Framework for MPLS Recovery [RFC3469] first approaches the problem of single link (group) failures between network elements, generalizing this model to single interface and single router failures. The latter two are equivalent to SRG failures. My High Availability Tutorial goes into more detail than we can fit here. Other failure modes not considered here include congestion from broadcast storms and the like, Byzantine errors, host or host link errors, etc. Illegal protocol packets or hardware failures clearly are error events. Inopportune events impact high availability as well. A good example of an inopportune event is the arrival of one or more error notifications, or explicit restart/recovery messages, while recovery or restart is in progress.
Selecting Recovery Strategies

Approaches to recovering from failures depend on whether tight resource control in the network is needed, as, for example, where bandwidth is explicitly allocated to meet QoS actions. If overall control of this sort is needed, there may need to be a central (or distributed) network management element, usually called a head end. Before selecting a technology, know your tolerance for outages and your budget. This discussion
Page 8 of 50
assumes that the recovery technology does have sufficient resources to protect against at least a single failure without human intervention. Outside the scope of this discussion are failures where mean time to repair (MTTR) is significant because it requires human intervention, possibly at unmanned sites, and possibly where spares need to be shipped in. You must, however, always remember why you want a particular level of survivability and build against the defined requirements. Designers and, unfortunately, traditional telephony people often use the 50ms cutover goal of SONET as the gold standard. This number is derived from SS7 characteristics of large carrier networks. VoIP is much more tolerant of drops, tolerating 140 ms to 2 s.
Cost and Complexity in Selecting Strategies

Part of the cost of any recovery strategy is the cost of resources that do not routinely carry operational traffic but are devoted to backup. Such resources are assumed in 1:N, 1:1, and 1+1, and possibly local repair models. Dynamic discovery does not make this assumption. The more resources committed, the more expensive the solution. See Table 11 for a summary of recovery strategies, which are detailed in subsequent subsections of this discussion. Local restoration and reversion, also discussed below, can apply to any of the modes of this table. Another consideration is whether the recovery must consider end-to-end performance. All of these strategies provide restoration, but may or may not provide reversion. In restoration, the high availability system has done its job when the failed resource is replaced by another. In reversion, the high availability system also needs to restore the original conditions of resources after the failure is fixed. Another consideration is whether a backup resource needs to be found for the new working resource. Reversion implies, to some extent that the original resource backs up the new working resource, but the risk of the original resource being down may make that inadequate. End-to-end recovery needs to know about SRGs. It needs to know that a recovery action will either minimize the number of resources put into an SRG in which a failure occurred or completely avoid that SRG. Local repair is not aware of end-to-end recovery.
Recovery Time Requirements in Selecting Strategies

Data protocols that are extremely timing-critical are becoming uncommon, such as IBM System Network Architecture (SNA) without local acknowledgement and DEC Local Area Transport (LAT). MPLS work on availability has produced a generally useful list of timers, generalized here for IP as well as MPLS (Table 6). Table 10. Failure Detection Timers Failure or Degradation Type MPLS Definition IP Routing Definition
Path Failure (PF) Recovery mechanisms have decided the path has totally lost connectivity. Link Failure (LF)
BGP or IGP route withdrawal or loss of keepalives at a lower layer.
MPLS recovery mechanisms have been Typically implementation-specific,
Page 9 of 50
informed of a lower-layer total failure.
although OSPF does have a specific notification abstraction, especially for demand circuits. Usually associated with an SNMP trap. BGP or IGP withdrawal route. Generally considered poor practice to announce periodically.
Fault Indication Signal (FIS)
A signal repeatedly transmitted that a fault along a path has occurred, passed along the path until it reaches a network element capable of initiating recovery. Indication that a fault along a working path has been repaired.
Fault Recovery Signal (FRS)
BGP or IGP re-announcement of previously withdrawn route.
You may have real-time applications such as telepresence, telemetry, etc. that must have predictable delay. Delay may also be a commercial differentiator for competitive offerings of mission-critical business applications such as automatic teller machines, credit authorization, and transaction-based Internet commerce.
1:N, 1:1, and 1+1 Protection Strategies

In order of strength of protection (and cost), there are three basic modes for media/link protection: 1:N, 1:1, and 1+1. These modes dedicate backup resources. An additional mode, dynamic discovery, assumes resources are there but does not pre-allocate them. Do remember that to use some of these strategies, you will have to have physical topologies that make the backup resource in physical proximity to the working resource. Table 11. Protection types Type 1:N 1:1 1+1 Description one backup resource for N working resources, N > 1 dedicates a backup resource for each working resource. sends identical data on both links, so that the data is immediately available in the event of failure Relies on sufficient statistical redundancy that routing protocols can find a non-dedicated backup path. This may involve determining a new multi-hop path. Example One extra link in an EtherChannel bundle dual ring FDDI Hot standby power supplies and supervisor engines, SS7 data link protocol, SSCOP Spanning tree, L3 routing
Dynamic discovery
Both 1:N and 1:1 schemes may use the backup resource for lower-priority traffic, which can instantly be pre-empted if the working resource fails. 1+1 protection adds application complexity, because the applications need to be able to decide which copy of information should to be used. In switches, you may see it in cases where Cisco Nonstop Forwarding supports a hot-standby processor. 1+1 is very rare in networking. You will see it in SS7 telephony control networks, but it is not used extensively in enterprise networking.
Switch Platform Architecture: A Model
Page 10 of 50
You can look at a switch abstractly as a relay. Relays are devices with at least two interfaces, which accept data on one interface and send it out another. A range-extending repeater, operating at the physical layer, is the simplest type of relay, with only one input and one output. Ethernet hubs are still relays, although they copy the data onto an internal shared medium and fan the contents of that medium to all other ports. You really can't get a good sense of relays until layer 2, when the platform software has to make a decision regarding which egress interface to use. While Cisco likes to talk about frames vs. packets vs. segments vs. messages, doing so is not correct OSI terminology. OSI formalism sometimes is very pedantic, but some of its terminology can be very precise and unambiguous. OSI documents speak not of specifically named units at every layer (e.g., frame at layer 2), but of Protocol Data Units (PDU). At a specific layer, you speak of Transport PDUs or Data Link PDUs. Another useful concept, especially when dealing with protocol encapsulation, is the layer above the current layer is called (N+1) while the layer below is (N-1). From the perspective of the network layer, it receives (N+1)PDUs from Transport, and sends out (N-1)PDUs to Data Link. A relay, which is a term from the formal (yes, that's the way it's spelled), is a device (or software function) with at least two interfaces. It receives PDUs on one interface and de-encapsulates them until it has the information on which it will make forwarding decisions. Ignoring devices such as multilayer switches, devices such as bridges and LAN and WAN switches accept physical layer bits, build them into Data Link PDUs, and make forwarding decisions on information at Data Link. Routers receive bits, form frames, and extract Network PDUs from the Data Link PDUs. After examining Network Layer information, they internally forward Network PDUs to an outgoing interface, and then encapsulate these into Data Link PDUs and then Physical Layer information. To make any of these forwarding decisions, the relay must first have an association between destination (and possibly other) information in the PDU at which it makes decisions, and information about the appropriate outgoing interface. The process of learning these associations is path determination. In bridges and LAN switches, path determination involves the spanning tree protocol, VLAN protocols, and source routing. In routers, path determination involves static and dynamic routing, as well as the up/down state of hardware interfaces.
Practical Issues: What Are Ports?

Ports, in general, are the physical connectors to which you can connect clients, servers, or switches to a switch. There are virtual ports, but they are beyond the scope of this discussion. A line card can have one or more ports. Through manual configuration, autoconfiguration, and hardware mechanisms, a port can take on many roles. Table 12. Physical Port Types Port type Static Dynamic Secure Trunk Attributes No filtering and may be assigned to a VLAN based on physical port ID. Assigned to a LAN based on frame contents and the definitions in the VLAN Policy Management Server (VPMS) Has a MAC address filter Runs 802.1q, 802.1v or ISL
Page 11 of 50
Source SPAN Destination SPAN
Source of traffic to be sent to the SPAN monitoring port Port associated with SPAN analysis (e.g., RMON)
Don't confuse the physical port types in Table 12 with the spanning tree port types in Table 35. A port can have both a physical type and a spanning tree type.
Management
In a relay, the management function is concerned with building the forwarding "map", whether that is a spanning tree at OSI Layer 2, a routing table at Layer 3, or content switching tables at higher layers. Other functions include exception processing such as ICMP, running routing and spanning tree protocols, etc. Management obviously includes the automated management functions (e.g., TFTP, logging) and the human interface.
Hardware
Management functions usually are implemented in general-purpose processors. As performance requirements grew more stringent, the processor often was a Reduced Instruction Set computer (RISC) design rather than a Complex Instruction Set (CISC) design. Under some conditions, forwarding uses the same processor as is used for management.
Software
Management is primarily a software function. Clearly, this is the role of the human interface, be it textual or Web-oriented, or be it any of the different switch operating systems.
Control
Control software runs management functions, including the human interface, as well as topology learning with spanning tree and dynamic routing protocols.
Forwarding Tables and Populating Them

On router platforms, forwarding tables began with the routing table, that which you see with a show ip route. This table, more formally called the Routing Information Base, is optimized for adding and deleting routes. That optimization benefits control, but not forwarding efficiency. In contrast, the tables used in the high-speed forwarding path are optimized for fast lookup, and are populated from data in the RIB. While the generic computer science term for this fast-lookup data structure is the Forwarding Information Base (FIB), Cisco uses the term cache and FIB a bit differently. The first cache example was the fast switching cache, which is a data structure in the main RAM, which has fewer entries than in the RIB. A fast lookup algorithm, such as hashing, is used. First-generation fast lookup tables had to be rebuilt whenever an entry was added or deleted. Partial updating was not practical. You could, as a result, see drops in performance whenever there was a "cache fault", or an attempt to look up a destination not present in the cache. For fast switching and its equivalent hardware assisted variants, autonomous switching (AGS+ and early 7000) and silicon switching (7000 with RSP), cache faults could significantly affect performance. These distributed caches
Page 12 of 50
were quite small, either 512 or 1024 entries. This small number of entries worked acceptably in an enterprise, which typically has a moderate number of frequently used routes, but was a severe performance limitation in ISP routers. Distributed switching on VIPs was a major performance advance, because the VIP FIB has a one-to-one correspondence with the RIB. With this correspondence, there never will be a cache fault.
Forwarding
At a general level, let's consider the forwarding modes, also called switching paths, in Cisco platforms. Table 13. L2 switching modes Switching mode "Software" "Hardware" -- CAM for L2 "Hardware" -- TCAM for L2 and L3 Speed Slowest but most intelligent Default mode and most common at layer 2 Good compromise between speed and intelligence MIB:RIB Relationship MIB and FIB are the same. May be centralized or distributed. Uses Content Addressable Memory requiring an exact match May be centralized or distributed. Uses one or more Ternary Content Addressable Memories
Table 14. L3 switching modes Switching mode Process switching Fast switching Speed Slowest but most intelligent Default mode, faster than process MIB:RIB Relationship MIB and FIB are the same. FIB is in RAM, and is smaller than the RIB. FIB is in special hardware, and is much smaller than the RIB. FIB is a full copy of the RIB.
Autonomous, silicon, Fast, hardware-assisted and platformoptimum dependent Express Fastest, especially when distributed into multiple Versatile Interface Processors
Ingress Buffering and Processing

As long as the fabric is non-blocking, there is no need for input buffering. It is possible that buffers will be required when doing traffic shaping at the ingress. At the most basic, ingress processing looks up the destination address in the frame or packet header, selects the egress interface, and moves the frame or packet to the fabric. If the fabric is blocking, the packet may go into a buffer. In all cases where I am familiar with router or switch internals, the ingress processor prefixes the frame or packet with an internal header used by the fabric to send it to an appropriate egress interface(s). Such headers are never seen outside the platform.
Pattern Recognition
Ingress processing, in the real world, gets complicated by frequent requirements to recognize patterns in the packet or frame, patterns other than the destination. Among the most common is what we generically call an access control list (ACL), which checks certain fields, usually with a mask that
Page 13 of 50
indicates whether the value of a bit is to be checked, or if the pattern will accept any bit value in that position (i.e., wild card). When you consider wild cards as well as a bit being one or zero, you introduce ternary logic, a step beyond a simple binary on-or-off decision. Cisco now describes the individual lines in an ACL as access control entries (ACE). You can recognize patterns, at L2 and L3, for various reasons, including security filtering, special routing (e.g., source routing) or QoS recognition and marking.
Advances in Forwarding Tables: CAM and TCAM

One of the challenges to wire-speed forwarding is how quickly destination information can be retrieved from an address table. In L2 switches, this historically was the job of the Content Addressable Memory (CAM), and now the job of the Ternary Content Addressable Memory (TCAM). The TCAM has both L2 and L3 fast lookup capability, as opposed to the Forwarding Information Bases in router Versatile Interface Processors (VIP) or the forwarding part of a Route Switch Processor (RSP). Router FIBs, however, hold considerably more routes than a TCAM, a necessity for service providers. Early switches used a CAM to look up destination MAC addresses, CAM had far fewer entries than most router cache or FIB, which often was acceptable given the scope in which a switch worked. In a CAM, you must match on every bit of a MAC address, even if some of them, such as the first 24 bits of vendor ID, are not significant for the particular lookup.
Introducing Ternary Tables

TCAMs, however, can "wildcard" fields. This gives several advantages over a CAM, including longestmatch selection for ACLs and CEF (i.e., in L3 forwarding), a single lookup of fixed latency, and the ability to ignore fields. TCAMs are used in the 6500, 4000 and 3550 series. There is a platform-dependent number of templates and number of entries per template type; the TCAM is partitioned into regions of templates. In the 4000 and basic 6500, there is a single centralized forwarding table. The central forwarding engine is the limit to forwarding performance. Switches with 100-Mbps rates and above use distributed forwarding, which allows the forwarding speeds of multiple forwarding engines to be added. Distributed switching is present in the 3550 and in the 6500 with DFC.
Templates
Switch Database Management for TCAMs was introduced on the 3550. Originally, there were four templates, which would set TCAM elements to an optimal solution for:
z
access default general-purpose routing VLAN
Page 14 of 50
All the templates in assume 8 routed interfaces and 1K VLANs. Table 15. 3550 Template Assumptions TCAM unicast MAC address IGMP group Access Default Routing VLAN 1024 2048 5120 1024 1024 1 8 1 5120 1024 512 512 16384 1024 8192 1024 1024 0 0 0
QoS Access Control Element (ACE) 1024 Security ACE Unicast Routes Multicast Route 2048 2048 2048
Notice that the default template is optimized to support a large number of MAC addresses in the MAC table, and a large number of IP routes in the routing table. The trade off is fewer resources for IGMP groups, QoS, and security related access control entries (lines in access-lists): The routing template offers support for twice as many routes (16,000 versus 8,000), but far fewer access control entries and QoS entries. In contrast, the VLAN template disables routing entirely, and focuses all resources towards L2 and VLAN support. As Chuck Larrieu put it in his 3550 Tutorial, "While it is unlikely that any CCIE Lab scenario would stress any of these settings, it is possible that a Candidate might be asked to 'assure that SVI support is maximized' or 'ensure that L3 functionality is not compromised by L2 considerations'." It is equally possible that a candidate for a written exam -- CCIE or CCNP -- might be asked a similar question. It's likely that the template model will spread to platforms other than the 3550.
For further details...

Please refer to the 3550 Tutorial by Chuck Larrieu.
"Cisco has created within the 3550 platform the means of customizing and optimizing system resource allocation based on particular application or requirement. For example, if a particular Forwarding models switch was strictly Layer 2, or a series of switches had a large number of Demand-based forwarding requires that the first packet of connected stations and a large number of a flow must go through the "slow" or "software" path, VLANs as well, then one could reallocate which then populates a high-speed table. You will see this resources to favor VLAN, while disabling in the Supervisor 1A/MSFC on the 6500. routing and freeing up routing resources. On the other hand, if a particular Topology-based forwarding, on the 6500 with Supervisor 2, installation required extensive QoS or the 4000 with Supervisor 3, and the 3550, breaks the security configurations, an administrator dependence on software lookup. could optimize the switch to allocate resources for those activities."
Fabric
The fabric interconnects the input and output interfaces. There are three main types of fabric:
z
Shared bus Shared memory Crossbar
Page 15 of 50
A given switch will have one or more types of fabric. Indeed, on high-performance switches such as the 6500, the highest-speed fabric is a separate card, not just part of the backplane. Don't make the mistake I did, early in my career, and equate the backplane with the fabric. The backplane tends to be passive or nearly so. The active fabric will be on the supervisor card (or integrated equivalent), and sometimes on a separate plug-in card. Indeed, a single platform can have more than one fabric. Table 16. Fabrics by Platform Type Fabric Speeds in Gbps * Shared bus Shared Memory Crossbar 8.8 13.6 8.8, 13.6, 24 [1] 32 [2] 32 28, 64 1.2 3.6 32 256
Platform 2900 2955 3550 3750 4000 4500 5000 5500 6000 6500
* Cisco specifications are not always clear if the bandwidth stated is unidirectional, or adds together the two directions [1] Depends on platform model [2] Total bandwidth for stack
Shared Bus
Most lower-performance devices use a shared bus as the fabric. A single bus allows a connection between two interfaces, with all interfaces contending for the bus. Don't fall into salesdroid traps and assume faster is always better. Shared bus is the cheapest solution, and thus appropriate for workgroup and other small switches where cost is more important than performance. The fabric is usually built into the backplane. Some devices, such as the 5500 switch, may have several busses bridged into one, and the throughput figure is the sum of the bus speeds.
z
Each port must arbitrate for access Broadcast and multicast are easy Oversubscription is normal
To make multicast and broadcast work properly,

z
Flooded data decreases end-station performance Destination must be only those ports that need that traffic
Page 16 of 50
Multicast or VLAN mechanism must limit traffic to certain ports
Shared Memory
Shared memory systems keep the frame or packet in memory until the last egress interface is finished with it. Memory management can be simple or difficult, depending on whether or not there are requirements for QoS and/or multicast. QoS requires static buffer allocation in the shared memory. When you are multicasting, unless there is enough concurrent ports to the memory to service simultaneously all egress ports in the multicast group, the packet or frame has to stay in memory until the last egress port transmits it.
Crossbar
Crossbar designs are a full mesh, allowing concurrent communications between any pair of interfaces. Obviously, there is no contention for unicast forwarding. Crossbars are the fastest fabric technology. There may be several cooperating crossbars within a large switch or router, as the ASICs involved are typically not greatly larger than 16x16. Multicasting on crossbars can be a challenge, since the one-to-one relationship inherent to a crossbar is not a good fit to the one-to-many of multicast involving multiple egress interfaces. Crossbar works perfectly well in the middle of a multicast tree, where you have a single egress interface for a multicast group address. Shared memory fabrics may work better for multiple-egress-interface multicasting.
Egress Processing
In most switches and routers, the bulk of the processing is done at the ingress. Such functions as egress QoS, data link protocol conversion, etc., do take place in the egress card. When the egress port connects to a server that is incapable of wire-speed operation, output buffering may be needed to avoid drops. In such cases, the amount of output buffering designed into the switch involves delicate tradeoffs. Too little buffering causes data drops, but too much buffering can cause unacceptable delay.
QoS at the Switch

The discussion of QoS here is less to get into the various ways of enforcing QoS, such as shaping, policing, and queuing, and more to discuss how QoS requirements affect switch architecture. When you do not implement a QoS marking mechanism, the DSCP fields of packets and frames are trusted, and those fields used to sort the data units into appropriate queues. In switches, the default means of servicing queues is round-robin. Most switches support four queues, either in partitioned main memory or in dedicated memory You can enable QoS marking and have the option of resetting the DSCP field to a new value, or you can set up new mappings between DSCP values and queues. See Figure 1 for the default mappings from DSCP to queue.
Switch#show qos maps dscp tx-queue DSCP-TxQueue Mapping Table (dscp = d1d2) d1 : d2 0 1 2 3 4 5 6 7 8 9 ------------------------------------0 : 01 01 01 01 01 01 01 01 01 01
Page 17 of 50
1 2 3 4 5 6
: : : : : :
01 02 02 03 04 04
01 02 02 03 04 04
01 02 03 03 04 04
01 02 03 03 04 04
01 02 03 03 04
01 02 03 03 04
02 02 03 03 04
02 02 03 03 04
02 02 03 04 04
02 02 03 04 04
Figure 1. DSCP to Queue Mapping Use the
qos map dscp dscp-values to tx-queue queue-id

command to reset the mappings. In switches with four queues, transmit queue 3 can be taken out of the round robin rotation and designated to follow strict priority queuing. This function, disabled by default, is intended for lowvolume, delay-sensitive traffic such as voice and network control information. Be very conservative in assigning traffic to this queue, or you may starve the other queues. You can find the transmit queue and priority assignment for an interface with the show run
interface command.
A special bandwidth subcommand of tx-queue, not to be confused with interface bandwidth, can allocate a guaranteed minimum bandwidth to each of the four queues. At present, this is only available on non-blocking Gigabit Ethernet interfaces. For a 4000-specific example of such ports, see Table 22. If you enable global QoS without bandwidth statements, each queue will get 250 Mbps. Do be aware that the switch does not check for consistency amount the assignments, and it will let you oversubscribe (e.g., assign 250 Mbps to queues 1 and 2 and 500 Mbps to queues 3 and 4). As long as a transmit queue is below the preconfigured share and shaping values, it is considered high priority and served by the priority queuing discipline. Queues that do meet the share and shape values will be serviced after the high priority queues. Only if no high priority queues exist will strict round robin be observed. The priority discussed here is not directly associated with the DSCP
Interfacing: the GBIC (Gigabit Ethernet Interface Converter)

Cisco standardizes the Gigabit Ethernet ports on switches, and assumes you will connect a Gigabit Ethernet Interface Converter (GBIC) to the ports to interface the port to the specific GE technology. There are GBICs for short- and long-wave optical GE, for long-haul systems, for coarse and dense wavelength division multiplexing on optical transmission systems, for switch stacking, for GE over copper, and a constantly growing list of optical and electrical media.
Characterizing Switch Performance

Many confusing numbers about switch and router performance.
Throughput
For the standard definition of throughput, see RFC 2544. Figure 2 shows test configurations for the
Page 18 of 50
Device Under Test (DUT) with both integrated and separate load generators and receivers.
+------------+ | | +------------| tester |<-------------+ | | | | | +------------+ | | | | +------------+ | | | | | +----------->| DUT |--------------+ | | +------------+
+--------+ +------------+ +----------+ | | | | | | | sender |-------->| DUT |--------->| receiver | | | | | | | +--------+ +------------+ +----------+
Figure 2. Standard Throughput Measurement topology I find it amusing that the presentations by Cisco technical people at Networkers often give a reduced but practical definition of throughput. For example, you'll hear the figure 256 Gbps used to state the throughput of the fabric module on a 6500 series switch. The maximum one-way throughput, however, is 128 Gbps. The sales figure adds together the maximum speeds in each direction, doubling the throughput. It's less amusing if you are asked to answer a question on "speeds and feeds", and it's not clear whether the question is looking for unidirectional or bidirectional management. There's no simple solution here, other than to read the question carefully and see if it makes the conditions of measurement clear. I'd also be more tempted to go with a salesy answer if I were taking a sales certification exam. Somehow, we've managed to avoid widespread propagation of the idea that full-duplex Fast Ethernet has a throughput of 200 Mbps, but this "spin" of the truth still seems popular in describing the throughput of a routing or switching platform.
Blocking
A source of much fear, uncertainty, and doubt (FUD) in switch marketing is whether a forwarding system is blocking or nonblocking. The usual definition of a nonblocking switching fabric is that the fabric is fast enough to transfer all traffic, without loss, while all ports are active. This definition is somewhat flawed. A better way to speak of a nonblocking fabric is one that can keep up with a set of input ports, each of which is outputting to a unique output port of the same or greater speed. Sales presentations for nonblocking relays tout their advantage over blocking devices. In practice, this is often a theoretical rather than a practical advantage. There is an underlying assumption of how nonblocking performance is measured, as shown in Figure 3. In a blocking switch, the fabric is too slow for full noninterfering transmission. In Figure 3, input and output ports are paired, as required by RFC 2544. Every input has a dedicated output.
Page 19 of 50
Figure 3. Switching Fabric
Output Blocking
Output blocking is fairly common, and you must understand that it is a client or server problem, not a switch or network problem, unless an intermediate, blocking relay is connected to the output port. Output blocking occurs when two or more ingress ports try to send simultaneously to the same egress port. Remember that the RFC 2544 throughput specification is explicit that each ingress port relays only to a single egress port. In this situation, the fabric speed is irrelevant, because the problem is at the egress port (Figure 4). You can trade off delay against data loss by providing output buffering. When QoS must be controlled, you need to think through ingress and egress parameters so unacceptable delay will never occur at an egress port.
Page 20 of 50
Figure 4. Output Blocking --- don't blame the switch! Some vendors, though not Cisco, support a technique that buffers at the ingress when the external destination cannot accept data fast enough, or the egress interface is busy. Input buffering, unless very carefully designed, can lead to head of line blocking (Figure 5). What Cisco has done is produce a GE (Gigabit Ethernet) interface for the 4000, which has 18 ports that share a 6 Gbps path into the fabric. These numbers were chosen because many Wintel servers can't generate more than 300 Mbps of traffic. With such servers, there's still a benefit to using GE, to reduce latency in transmission, use single GE NICs rather than Fast EtherChannel, and leave room for growth
Figure 5. Head of Line Blocking
Page 21 of 50
In modern switches, head-of-line blocking cannot occur, because there can be concurrent transfers between input and output ports. Such concurrency can be virtual, but is now generally physical, because a multiported shared memory, or a crossbar, will not get "stuck" waiting for a frame to transfer. Even if there is a blocking fabric, modern switch design prevents head-of-line blocking, because it creates multiple virtual queues in the input buffer, which prevent a frame from ever preventing another frame from reaching the fabric. With a blocking fabric and a single input buffer queue, you can have the scenario in Figure 5. This scenario involves ingress interfaces that have a single first-in-first-out (FIFO) buffer. Assume that two frames destined to output port three arrive simultaneously, one on port 1 and the other on port 2. Port 3, obviously, can only send one frame at a time. Again remembering the input buffer is FIFO, assume another frame, destined for port 4, arrives at port 1. If port 2 gained control of the fabric before port 1 could do so, port 1 can't send the 4th frame, because its path to the fabric is being blocked by the "backpressure" from the output port. Head-of-line blocking means that the data unit "behind" the port 1 destined data unit on port 2 has to wait to be transmitted. In principle, while input port 2 waits to get a path to output port 1, the data unit destined to output port 2 could be transmitted in parallel. The reality is that the fabric cannot see the input traffic in the buffer if the input buffer is a first-in-first-out (FIFO) structure. Shared memory buffering in more modern switches tends to avoid head-of-line blocking, since all ports have access to the memory. You encounter head-of-line blocking in daily life, when you are driving in the right lane, and come to a traffic light where you want to turn right. Your car, however, is the second in line, and the car in front of you wants to go straight. If that car were not at the head of the right lane, you could turn right on red. You are, however, blocked at the head of the line. Given an understanding that blocking may occur even in a "nonblocking" design, an anyto-any crossbar architecture may not improve performance at lower speeds. [Berkowitz 1999, p. 197-199].
Grandfather Switch: Catalyst 5x00 Platform Family

These are obsolescent switches, but worth mentioning because so many people have experience with them. 5000/5500 switches use the CatOS interface, except when configuring L3. Optional L3 forwarding capability began on the NetFlow Feature Card (NFFC). This card can filter, snoop CGMP, and enforce QoS in the "fast path", but does not run the control plane of routing or switching protocols. Control plane functions run in the Supervisor Engine. The Route Switch Feature Card (RSFC) essentially is a full IOS 12.0 capable router, which uses a NFFC II function on the Supervisor Engine for its forwarding. The RSFC directly supports Cisco Express Forwarding.
Stacking and Clustering: 3750 and 2950

Stacking has been in the industry for some time, and is a means of providing management for a group of switches using only one IP address for management. Cisco extended the concept to clustering.
Page 22 of 50
Clustering provides the functionality of stacking, but removes some of the limitations. Stacked switches needed to be in close proximity, as in a single wire closet. A cluster, however, can be defined among switches in different locations reachable by the same LAN. The members of the cluster are selected dynamically rather than by the physical wiring used in a stack. Members of a cluster can be linked with a dedicated cable and GBICs as in stacks, but also with Fast Ethernet or Fast EtherChannel. Since clustering no longer requires the physical proximity of switching, clustering is available on mixtures of up to 16 switches, including the Catalyst 3550, 2950, 3500 XL, 2900 XL, 2900 LRE XL and 1900 Series. While clustering functionality was introduced with the 3512XL, 3524XL, and 3508G XL, only the 3508 G XL is still sold. Their replacements are the Catalyst 3550 and 2950 series. The 3508 G XL is still supported, primarily as a GE concentrator. 3750 series switches are only semi-modular. They have fixed ports for 10/100 Ethernet, but have some number of Small Form-Factor Pluggable (SFP) uplinks. Gigabit Interface Converters (GBIC) plug into the SFP ports. You will also find SFP ports on the 3550 series.
Midrange Flexibility: Catalyst 3550 Platform Family

All 3550s have L2 switching capability "out of the box". The 3550 12-, 24-, and 48-port switches have optional L3 switching (i.e., routing) capability. The 3550-12G and 12T come standard with the L3 software image. It is very simple to upgrade the software, and migrate an L2-box to an L3-box -- all that is required is a software license upgrade. In life, as well as in the CCIE Lab, one should adjust to the new metaphor. This is particularly important for those who have concentrated their efforts around the Catalyst 5000 in their studies.
A New Interface Paradigm

You must understand the new 3550 interface metaphor and understand the relationships among physical ports, routed ports, port based VLANs, and switched virtual ports. With the old set-based switches, one was concerned only with the physical ports, and the placing of those ports into the appropriate VLANs. This was done via the "set port" series of commands. With the 3550, there are physical ports, and there are switched virtual interfaces (SVIs). Physical ports can be designated as L2 only or L2/L3. Without any additional configuration of the switch, all ports are Layer 2 by default. For the following discussion, these are the interfaces being referred to:
Switch_2(config)#interface ? FastEthernet FastEthernet IEEE 802.3 GigabitEthernet GigabitEthernet IEEE 802.3z Port-channel Ethernet Channel of interfaces Vlan Catalyst Vlans
A "port-based" VLAN is a physical port that either has not been configured at all (in which case it is by default a member of VLAN 1) or which has been placed into a particular VLAN via the switchport access vlan command. It should be apparent that port-based VLANs are Layer 2 only. Physical ports become physical Layer 3 ports by the issuing of the no switchport interface command. Once this has been done, the port can be given an IP address and one can enter the port into a routing domain.
Page 23 of 50
Switch_2(config-if)#ip address 10.3.3.1 255.255.255.240 % IP addresses may not be configured on L2 links. Switch_2(config-if)#no switchport Switch_2(config-if)#ip address 10.3.3.1 255.255.255.240 Switch_2(config-if)#
Figure 6. Creating a Layer 3 Port A switch virtual interface (SVI) is a logical interface that represents VLANs of physical switch ports to the routed or bridged processes of the switch. Some detail will be given in the fallback bridging section. For now, let it be said that configuration and capability are similar to that of loopback interfaces. This really takes the concept of VLANs just a very small step beyond the thinking on the earlier Catalyst switches. Unlike loopback interfaces, the creation of an SVI is a two-step process. 1. create the VLAN, using either the VLAN database command from the privilege exec or the VLAN command from the global configuration mode. 2. invoke the SVI by entering the command interface vlan from the global configuration mode.
Switch_2(config)#vlan 307 Switch_2(config-vlan)#name Three-oh-seven Switch_2(config-vlan)#interface vlan 307 Switch_2(config-if)#^Z

Figure 7. Creating Switch Virtual Interfaces (SVI) At this point, the SVI exists. Observe how the virtual interface can be displayed by a command that would normally be used for a physical interface. To be correct, the show interface command also can show subinterfaces.
Switch_2#show interface Vlan307 is up, line protocol is up Hardware is EtherSVI, address is 0009.b775.d400 (bia 0009.b775.d400) MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec, reliability 255/255, txload 1/255, rxload 1/255 Encapsulation ARPA, loopback not set ARP type: ARPA, ARP Timeout 04:00:00 Last input 00:01:48, output never, output hang never Last clearing of "show interface" counters never Input queue: 0/75/0/0 (size/max/drops/flushes);Total output drops: 0 Queueing strategy: fifo Output queue :0/40 (size/max) 5 minute input rate 0 bits/sec, 0 packets/sec 5 minute output rate 0 bits/sec, 0 packets/sec 0 packets input, 0 bytes, 0 no buffer Received 0 broadcasts, 0 runts, 0 giants, 0 throttles 0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored 0 packets output, 0 bytes, 0 underruns 0 output errors, 0 interface resets 0 output buffer failures, 0 output buffers swapped out
Figure 8. Displaying an SVI Even though the VLAN is not assigned to a physical port, and even though there is no other
Page 24 of 50
configuration on the SVI, the SVI shows "up" and "up". The integration of the Layer 2 and Layer 3 functionality takes place at the SVI level. With the 3550 paradigm, you also have the capability to define an L2 port as either an access port, a trunk port, or a voice port:
Switch_2(config-if)#switchport ? access Set access mode characteristics of the interface trunk Set trunking characteristics of the interface voice Voice appliance attributes <cr> Switch_2(config-if)#switchport voice vlan 77 Switch_2(config-if)#switchport access vlan 78
Figure 9. SVI Functionality The above configuration shows that voice and data VLANs can co-exist on the same port.
Management and Control

The principal human interface to the 3550 is IOS. Cluster management software, however, principally uses a Web interface. The Supervisor engine has direct access to the shared memory, at an internal rate faster than the interface card slots.
Hardware Aspects of Voice Ports

Many IP telephones expect -48 VDC power to be provided to them. Many Cisco switches can send this power over an Ethernet interface, but it is not a given that all platforms and line card types will support it. Take the even potential need for power into consideration when selecting new switches: is there any chance you will need IP phones? If so, plan the migration path for the switch, which might be no more than leaving slots for Ethernetwith-power line cards, and considering the additional power draw when you specify power supplies.
Forwarding
Like the 4000 and related platforms, the 3550 has shared memory. All forwarding decisions take place in "Satellite" ASICs. Control information is sent on a separate control ring to the egress interfaces, while the data part is stored in shared memory.
Table 17. 3550 Filtering Capacity Resource Access control list Limit 512 security (256 in/256 out) 128 QoS Access control entry (i.e., a line in an ACL) 4000 security Depending on the specific 3550 model, there may be more than one TCAM. Table 18. Number and Use of TCAMs in 3550 Models Model 3550-24 TCAMs TCAM use 1 All interfaces on same TCAM
Page 25 of 50
3550-48
Fast Ethernets 1-36 on TCAM 1 All others on TCAM 2 Interfaces 1-4 on TCAM 1 5-8 on TCAM 2 9-12 on TCAM 3 Interfaces 1-4 on TCAM 1 5-8 on TCAM 2 9-12 on TCAM 3
3550-12T 3
3550-12G 3
Catalyst 4000/4500 Platform Family

The original 4000 switches use a code base derived from the 5000 code set. The 4500, however, is IOS-based. 4000 switches are modular. The related 2948G is fixed configuration, as is the 4912G. Once the 4500 was released, the 4000 was targeted more at the wiring closet, especially with AVVID. In contrast, the 4500 is optimized as a termination/aggregation point for metro Gigabit Ethernet. 4500 switches can use cards from both the 4000 and 6500 product lines. One of the target markets for the 4500 is terminating Metro Ethernet between enterprises and service providers: "Gigabit to the end user". Table 19. 4000/4500 Platforms Supervisor Type III IV IV IV IV Slot 1 1 1 1 2
Platform Line Card Slots 4003 4006 4503 4506 4507R 3 6 3 6 7
For increased availability, load-sharing redundant power supplies work with all models of the 4500 series. Only the 4507R supports redundant supervisors.

Originally, Catalyst 4000 Supervisors ran a derivative of the Catalyst 5000 code. Both in later 4000s and in the 4500s, the supervisor code migrated to IOS. You can have redundant supervisors on the 4507R platform. Failover takes 30-50 seconds. Another high-availability features are the use of redundant power supplies and the ability to hot-swap line cards. Supervisor II 4006, 4503, II-Plus 4006,4503, III 4006, 4503, IV 4006,4503,
Feature or Parameter Platforms
Page 26 of 50
4506 Fabric (Gbps) Chipset Layers Processors NFSC [1] MAC addresses OS
[NA] Not available [1] NetFlow Services Card
4506,4507R 64 One K2 L2, L3 1 No
4506 64 NA L2, L3 1 No
4506,4507R 64 NA L2, L3 2 Yes 32K
24 Three K1 ASIC L2 1 No
CatOS
IOS
IOS
IOS
Forwarding
In the 4000, the actual forwarding is done using a K2 chipset. The packet engine/supervisor populates TCAMs in the forwarding engines. The 4000 fabric is shared memory. With the Supervisor 3, cards have a 6 Gbps path into the fabric, so populating it with 4 cards establishes an aggregate bandwidth of 24 Mbps. Adding two GE uplinks brings the aggregate to 32 Mbps. It has an interesting variant on line cards. Each slot has a 6Gbps bandwidth, so you might assume that a card will not have more than 6 GE interfaces. This is true for cards intended for uplinks, but there is a server-oriented 18-port server interface cards. This card exploits the reality that most servers with GE interface cannot drive those interfaces at full speed. While the original 4000 had L2 switching only, an L3 path has been added with the Supervisor 3 with the Layer 3 service module. You can add multiple Layer 3 Services Modules for greater bandwidth. Table 20. 4000 Forwarding Performance L2 switching decisions L3 routing decisions L2-4 ACL processing 48 Mbps 48 Mbps 48 Mbps
QoS marking/processing 48 Mbps Filtering is in the fast path. Traffic policing is also in the fast path, but traffic shaping takes place in the supervisor. Table 21. 4000 Filtering Capacity (Supervisor 3) Resource Access control list Limit 1024 This number combines security and QoS
16,000 in/16,000 out security Access control entry (i.e., a line in an ACL)
Page 27 of 50
16,000 in/16,000 out QoS To understand QoS filtering, you must be aware of several assumptions. First, L2 prioritization depends on the QoS value in an ISL or 802.1Q header. Second, L3 IPv4 prioritization depends either on the Differentiated Services Code Point (DSCP) or the IP precedence value in the ToS byte. Both the DSCP and precedence value are in the Type of Service byte of the IPv4 header. This discussion applies to the Supervisor III, and, unless specifically mentioned, to the Supervisor IV. Given that the fabric is nonblocking, there is no input queuing. Each output interface has four queues, 240 packets each for Fast Ethernet and 1920 packets for non-blocking Gigabit Ethernet. Table 22. Blocking and non-blocking port types on the Catalyst 4000 series Non-blocking Supervisor III and IV uplinks all ports on WS-X4306-GB line card two 1000BASE-X ports on the WS-X4232-GB-RJ line card first two ports on the WS-X4418-GB line card two 1000BASE-X ports on the WS-X4412-2GB-TX line card WS-X4424-GB-RJ45 line card WS-X4448-GB-LX line card Switch supervisors often do not support the range of QoS measures on a router platform. For example, Weighted Random Early Discard (WRED) is not supported on switch platforms, but is available on routers like the 7200. The 6500 switch is an exception that supports WRED. Depending on the model, 4500 platforms will have 28 to 64 Gbps of shared memory backplane. With the Supervisor III or IV, the fabric is fast enough to allow all interfaces to run at wire speed, without fabric blocking. All other ports Blocking 10/100/1000 T ports on the WS-X4412-2GB-TX line card
Catalyst 6000/6500 Platform Family

There are a number of common features between the 6x00 switch series and the 7600 routers. We will concentrate here, however, on the true 6x00 switches. Be aware that the 6x00 family tends to have more functionality, especially in QoS, than other switches. Table 23. 6500 Platforms Supervisor Type Slot
Platform 6503 6506 6509
Line Card Slots 3 6 9
1, 2, 720 2
Page 28 of 50
6509-NEB [1] 9 6513 13
[1] Chassis, airflow, and power supply optimized for service provider environments

The Supervisor card proper contains the switch processor. Optional daughter cards include the MultiLayer Switch Feature Card (MSFC) and Policy Feature Card (PFC). You can install both types of daughter card. In the MSFC is a routing engine and EARL switching ASICs. 6500 forwarding decisions involve a pipeline of three ASICs. The first does L2 forwarding, but recognizes traffic that must be handled at L3. The second does L3 forwarding based on information placed in the flow cache by the MSFC. Finally, the third ASIC does ACL processing. In the 6000, the COIL ASIC connects forwarding cards to the control and results busses, which are separate from the basic 32 Gbps fabric. Pinnacle ASICs connect forwarding cards to the fabric. Pinnacle ASICs support Weighted Round Robin and Weighted Random Early Discard QoS functions. Each Pinnacle handles up to four GE ports.
Database Manager
On a high-end platform such as the 6500, the more traditional limiting factors such as bandwidth are less often a problem than resource contention and exhaustion. You need to understand which ACL and related functions are done in software, creating a centralized bottleneck. Critical resources also can be in the distributed forwarding cards. In particular, these include masks in TCAM, the Logical Operation Units (LOUs); and the ACL-to-switch interface mapping labels. TCAM entries, LOUs, and ACL labels are limited resources. Therefore, depending on your ACL configuration, you might need to be careful not to exhaust the available resources. In addition, with large QoS ACL and VACL configurations, you also might need to consider Non-Volatile Random Access Memory (NVRAM) space. Remember that booting a configuration from a TFTP server is a workaround for configurations that won't fit into NVRAM. Table 24. ACLs Processed in Software in Cisco Catalyst 6500 Series Switches Function ACL denied traffic Specific Environment Supervisor 1a with PFC -- ACL denied packets are processed in software if interface does not have the no ip unreachables command configured Supervisor 2 with PFC2 -- ACL denied packets are leaked to the MSFC2 if unreachables are enabled. Packets are leaked at 10 packets per second (pps) per VLAN (Catalyst OS software with Cisco IOS Software) or one packet every two seconds per VLAN (Cisco IOS Software) Comments
Page 29 of 50
Supervisor 720 with PFC3 -ACL denied packets are leaked to the MSFC3 if unreachables are enabled Traffic denied in an output ACL (Supervisor 1a with PFC only)
Packets requiring ICMP unreachables are leaked at a user-configurable rate (500 pps by default) If traffic is denied in an output ACL, an MLS cache entry is never created for the flow. Therefore, subsequent packets do not match a hardware cache entry and are sent to the MSFC where they are denied in software
IPX filtering based on unsupported parameters (such as source host); ACEs requiring logging (log keyword)
on Supervisor 720, Layer 3 IPX traffic is always processed in software ACEs in the same ACL that do not require logging are still processed in hardware; Supervisor 1a with PFC -Traffic permitted in a TCP intercept ACL is handled in software Supervisor 2 with PFC2 and Supervisor 720 with PFC3 The TCP three-way handshake (SYN, SYN/ACK, ACK) and session close (FIN/RST) are handled in software; all remaining traffic is handled in hardware The set interface parameter is supported in software, with the exception of the set interface Null0 parameter, which is handled in hardware on Supervisor 2 with PFC2 and Supervisor 720 with PFC3 Supervisor 2 with PFC2 and Supervisor 720 with PFC3 support rate-limiting of packets redirected to the MSFC for ACL logging.
TCP intercept
Policy routed traffic (if match length, set ip precedence, or other unsupported parameters are used; if the mls ip pbr command is not configured
Supervisor 1a with PFC
Null0 parameter, which is handled in hardware on Supervisor 2 with PFC2 and Supervisor 720 with PFC3 WCCP redirection for HTTP requests Traffic requiring Network Address Translation (NAT) Supervisor 1a with PFC only) (Supervisor 1a with PFC and Supervisor 2 with PFC2); traffic requiring NAT translation or NetFlow setup (Supervisor 720 with PFC3) Supervisor 2 with PFC2 and Supervisor 720 with PFC3 -Traffic denied in a uRPF check ACL ACE Supervisor 1a with PFC -- Any uRPF check configuration Non-IP (all Supervisors) and Supervisor 1a with PFC and
Unicast RPF check
Page 30 of 50
non-IPX Broadcast traffic denied in a RACL
Supervisor 2 with PFC2 only) RACLs
Forwarding
Depending on the model and features, the 6000 series may use one of several fabric methods. On the 6000, "classic" line cards are interconnected by a Pinnacle ASICs to a 16 Gbps bus. Since the bus is bidirectional, it is marketed as 32 Mbps. In contrast to the 6000, the 6500 has a crossbar fabric, which is mounted on a separate card. This Switch Fabric Module (SFM) has a one-way 128 Gbps or 256 Gbps full-duplex capacity. Individual card channels are 8 Gbps; there are two channels per slot. Maximum throughput in the 6000 is 15 Mbps, while the 6500's maximum is 30 to 210 Mbps, depending on whether the SFM is present. Depending on the platform, slots can have different speeds, even within the same platform. On the 6506 and 6509 switches, and the 7606 router, all with SFM or SFM2 fabrics, each slot gets 16 Gbps. On the 6513, slots 1-8 get 8 Gbps but slots 9-13 get 16 Gbps. In the 6500, the Medusa ASIC interconnects the local card bus and the crossbar fabric. It also connects fabric-enabled cards to the 32 Gbps shared memory. Remember the 6500 supports 10 Gbps Ethernet and the 7600 supports OC-192. While these are considered, respectively, LAN and WAN interfaces, their physical layer is identical. Table 25. 6500 Card Types Card type Classic Fabric enabled Fabric only Function Bus only. COIL and Pinnacle ASICs Bus and fabric Medusa and Pinnacle ASICs. Fabric only. Medusa and Pinnacle ASICs. Can have Distributed Forwarding Card.
Switch fabric module line card Contains the actual fabric 6500s also use TCAM tables for cEF and ACLs. Input and output queuing take place in Pinnacle ASICs on the line card. Table 26. 6500 Filtering Capacity (Supervisor 2) Resource Access control list Limit 512 This number combines security RACL, QoS ACL and VLAN ACL (VACL)
Page 31 of 50
Access control entry (i.e., a line in an ACL)
32,000 entries 8,000 masks
Table 27. 65xx Forwarding Capacity Supervisor 1 and/or classic line cards Supervisor 2 with fabric enabled line cards Supervisor 2 with SFM and 7 6816 fabric-only line cards 15 Mbps 30 Mbps 107 Mbps
Supervisor 2 with SFM and 7 6816 fabric-only line cards plus card-local traffic switching 170 Mbps 6513 with DFC-enabled fabric-only line cards plus card-local traffic switching 210 Mbps
Switching Functions for High Availability

Layer 1/2 High Availability for Links and Interfaces
While redundancy isn't always the solution to high availability, it will usually be the case at Layers 1 and 2. You can, of course, have multiple media, each known to routing. Especially if convergence is an issue, however, the wise course may be to bundle several links together so that Layer 3 does not see a link going up and down. The major techniques for doing this are 802.3 aggregation and Multilink PPP. SONET/SDH restoration is not quite bundling, but it is self-healing in the same general sense. When a lower layer cannot repair itself, that responsibility moves to a higher layer, such as dynamic routing. Another alternative is to use dial backup to a link. Do remember dial backup is primarily Layer 1/2, where dial-on-demand is Layer 3.
Layer 1 Failover
Remember that many Layer 1 mechanisms cannot tell when a link has failed in one direction. You need Layer 2 mechanisms, ranging from link keepalives to the Unidirectional Link Detection Protocol, or Layer 3 routing updates, to detect that condition. The first Cisco feature to provide any sort of recovery in the event of link failure was dial backup, which operates at Layers 1 and 2. Subsequently, dial-on-demand routing (DDR) was adapted to give a Layer 3 capability for such backup. See the CertificationZone High Availability Study Guide for more detail on dial-based recovery.
SONET and POS

While the CCIE lab has no SONET equipment, you may need to answer written questions about SONET Alternate Protection Switching (APS. SONET/SDH can carry either ATM or Packet over SONET (POS), for SONET has become a transitional technology, just as ATM/SONET was an evolutionary step beyond TDM. Due to the large and effective SONET installed base, newer technologies must support SONET. Generalized MPLS and long-haul Ethernet and optical technologies will supplant it, but need to be backward compatible. SONET connectivity does not mean that you have automatic backup. You must explicitly enable Automatic Protection Switching, a SONET high availability technology. In the original version, SONET Line Terminating Equipment connects to a primary and backup SONET medium. The specific SONET terminology used is the working and protection ring. APS supports 1+1 and 1:N models.
Page 32 of 50
In APS, only the working ring actually carries user traffic. A management protocol runs over both rings, however. The APS Protect Group Protocol detects failures and triggers ring switchover. However, SONET has been extremely reliable, and duplicating all rings is very expensive. In the 1:N variant shown on the right side of the figure below, one protection ring covers four LTEs. When a failure occurs, the protection ring is activated only between the endpoints affected by the actual failure.
SONET no longer needs to run over its own physical fiber, but can run on a wavelength of DWDM. This allows links in multiple protection rings to run over the same fiber, with due regard not to put both links of the same ring over the same physical fiber, creating an SRG.
Unidirectional Links: Detection Protocol (UDLD) and configuring Unidirectional Ethernet

UDLD is a control protocol, operating at layer 2, whose function is closely tied to layer 1. Its purpose is detecting cases where your local device can send data to its neighbor, but you cannot receive traffic from the neighbor. Such a failure may not trigger physical layer alarms (e.g., as part of autodetection), but it can cause all sorts of failures, such as spanning tree loops, that can cause widespread network chaos. UDLD works on both copper and fiber media. When it detects the half-failure, it will shut down the interface and force the local switch to take action as if the entire link were down. Autonegotiation complement UDLD as a failure detection function, but can only see layer 1 functions in one direction. Every 60 seconds (by default) or a configurable number of seconds, a switch transmits UDLD messages (packets) to neighbor devices. UDLD messages only go out on ports where the protocol is enabled., and both ends of the link must enable it for the mechanism to work. Unidirectional communications are not always an error. There is a unidirectional Ethernet configuration command that tells the platform to use only a single fiber for one-way traffic on an interface.
Layer 2 Aggregation
Layer 2 aggregation distributes frames across two or more links, normally load-sharing but also providing fallback if a link fails. LAN and MAN standards in this area come from IEEE, primarily under
Page 33 of 50
802.3, but also under 802.17. Table 28. Recent IEEE 802.3 Standards Protocol Function 802.3aa 802.3ab 802.3ac 802.3ad Updates to 802.3u Fast Ethernet Gigabit Ethernet over Cat 5 Frame extension for baby giants Link aggregation
There are two schemes for aggregating 802.3 traffic: Cisco's early and proprietary EtherChannel, and the newer IEEE 802.3ad standard. EtherChannel uses a control protocol called Port Aggregation Protocol (PAgP). 802.3ad uses the Link Aggregate Control Protocol (LACP). These methods use at least two parallel links between two routers or switches, protecting you against a single link failure or a failure of an interface at either end of one link.
Figure 10. Basic 802.3 Aggregation Protection between Switches You can also use 802.3 aggregation between a switch or router and a server with a suitable NIC (Figure 8). Using more links gives you protection against more link failures.
Page 34 of 50
Figure 11. Multiported Servers Other benefits of 802.3 aggregation include load balancing, in which source-destination pairs of MAC addresses are assigned to specific links in the bundle. Should a link fail, the addresses are redistributed onto the working links. Again, routing will be unaware of this redistribution. To implement 802.3 aggregation, first be sure that your interface card supports 802.3 aggregation. Check the platform-specific restrictions, such as which ports can be bundled and if they need to be contiguously numbered. See Dan Farkas' LAN Switching tutorials for configuration details and Chuck Larrieu's paper on 3550-specific features. Any easy way to ensure that all ports have a common configuration is to create the channel first and then configure one port in the channel. Perhaps the most basic application of 802.3 aggregation is having a bundle between two switches. If one link fails, traffic flow continues without impact on STP. It should have little effect on user traffic, although there is a possibility that a frame in transit on the failing link might be lost.
Page 35 of 50
Figure 12. Basic Link/Interface Protection between Access and Distribution Switches Figure 13 shows a fairly complex configuration I implemented for a client, which protected against both default router failure and distribution failure and could protect against access switch failure. To protect against access switch failure, a host would need to have two NICs; each connected to a different access switch. STP would keep one of those NICs in the blocking state. See Dan Farkas' LAN switching papers for configuration details.
Figure 13. Link/Interface Protection to Default Router(s) By using multiple NICs that the host knows how to bundle into an 802.3 aggregation, you can also protect against failures from network element to host. Again, a frame in transit might be lost. Multilink PPP also protects you against failures of interfaces or links in a bundle, but the technologies involved are appropriate for WANs rather than LANs. Potentially, multichassis multilink protects you against a failure of an access server in a stack of access servers. If you simply have one hunt group phone number for the entire stack, it will be random whether different calls go to the same or different access servers. Perhaps the extreme case of using multilink to avoid single points of failure is PPP over L2TP. Resilient Packet Rings, RPR, now under development in the IEEE 802.17 working group, is intended as a more efficient replacement for SONET/SDH, allowing better use of backup facilities. MANs, and RPR in general, are intended to smooth some of the disconnects between enterprise-oriented LANs and longhaul SONET/DWDM [Vijeh 2000]. While SONET/SDH are Layer 1 technologies, RPR is a Layer 2 MAC that runs over the arbitrary physical facilities, including those compatible with SONET/SDH, metropolitan Gigabit Ethernet, etc. The basic unit of data transfer on RPR is an Ethernet frame, not a bit. RPR's L2 technology replaces the framing and the protection mechanisms of SONET/SDH. As opposed to Ethernet, it offers protection switching at SONET speeds. RPR accepts that some traffic can be preempted if one ring fails, an idea certainly consistent with QoS prioritization. Other information is available from the IP over RPR Working Group in the IETF's sub-IP area, and an industry forum, the RPR Alliance, is being formed. As a technique primarily used in metropolitan and wide area carrier networks, RPR is beyond the scope of this paper.
Page 36 of 50
Preventing Broadcast Storms

Broadcast storms have become less of a problem as IP implementers learn ways to avoid them and broadcast-intensive desktop protocols disappear. They still happen, but they can be restricted by appropriate settings on switch ports to which hosts connect. Cisco broadcast suppression counts broadcast frame during a predefined interval, and shuts down the port if the count exceeds a configurable threshold. You should never suppress all broadcasts, because that may disable perfectly legitimate functions such as ARP and DHCP. Since broadcast storms inherently involve bursts of broadcasts that reinforce one another, you can't look at a port and instantly decide a broadcast storm is in process. You need to count the broadcasts over at least 1 s. Cisco switches have broadcast suppression disabled by default. When you enable broadcast suppression, you specify a percentage of bandwidth that can be used by broadcasts. In addition, on Gigabit Ethernet ports, you can suppress multicast and unicast traffic rates as well. Traffic shaping and policing, however, may be a better and more general way to deal with multicasts and unicasts.
Other Layer 2 Security and Management Enhancements

Don't assume that every possible problem in L2 networks is directly associated with operations of the spanning tree. You want to be sure that no more than the essential minimum of devices actually participate in the spanning tree, just as most of the edge routers in an enterprise network simply default to routers higher in the hierarchy. Don't turn off STP, but control it. STP information can be important to error management. Remember there can be other L2 threats that have nothing to do with STP computation, such as broadcast storms and L2 denial of service attacks.
Private VLANs
Mentioned frequently in the Cisco SAFE blueprints, private VLANs impose a NBMA topology on a single Ethernet subnet. This is especially useful in broadband provider applications, where you do not want any user to be able to see the traffic of any other user. Table 29. Types of Ports in Private VLANs Port type Communicates with
promiscuous all other private VLAN ports and is the port used to communicate with devices such as routers, LocalDirector, backup servers, and administrative workstations. isolated Community promiscuous ports only communicate among themselves and with their promiscuous ports. These ports are isolated at Layer 2 from all other ports in other communities or isolated ports within their private VLAN.
Once you have defined the ports, you define pairs of VLANs (e.g., primary to community) that permit communications between them. Table 30. Sub-VLANs in a Private VLAN
Page 37 of 50
Type 1 primary VLAN 1 isolated VLAN 1 or more community VLANs
Traffic rules forwards incoming traffic arriving at a promiscuous port to all other promiscuous, isolated, and community ports isolated ports to communicate to the promiscuous ports. used by a group of community ports to communicate among themselves and transmit traffic to outside the group via the designated promiscuous port.
The simplest private VLAN consists of one primary VLAN and one of either isolated or community types. You are allowed to have additional isolated or community types, which do not communicate with one another. In your configuration, you must bind the isolated and/or community VLAN(s) to the primary VLAN and assign the isolated or community ports to the appropriate sub-VLAN. You will find that many of the private VLAN constraints (Table 31) also apply to 802.1x constraints (Table 32). Table 31. Private VLAN constraints Feature BPDU Guard Constraint Automatically enabled
VLAN membership Set to static Access ports VTP VTP mode primary VLAN isolated or community VLAN numbers Port restrictions ASIC consistency Redefined as host ports transparent mode cannot be changed to client or server. VTP does not understand private VLANs. only 1 isolated VLAN and/or multiple communities can be associated with it. only 1 primary VPAN Private VLANs cannot be numbered 1 or 1001 through 1005. Private VLAN port cannot be channeling or dynamic membership. It only can be trunking if it is a MSFC port. On the same ASIC, you cannot have one port that is a trunk or a SPAN destination, and others that are community, isolated or promiscuous. This is hardware platform specific. Must be identically configured on primary and isolated/community Mutually exclusive with private VLAN port May belong to a private VLAN Cannot be used with private VLANs Cannot be used on private VLAN ports Not supported
Spanning tree parameters Destination SPAN Source SPAN Remote SPAN EtherChannel IGMP snooping
! define the primary VLAN set vlan vlan_num pvlan-type primary ! set vlan vlan_num pvlan-type {isolated | community}
Page 38 of 50
set pvlan primary_vlan_num {isolated_vlan_num | community_vlan_num} ! Associate the isolated or community port(s) to the private VLAN. set pvlan primary_vlan_num {isolated_vlan_num | community_vlan_num} mod/ports ! Map the isolated/community VLAN to the primary VLAN on the promiscuous port. set pvlan mapping primary_vlan_num {isolated_vlan_num | community_vlan_num} mod/ports ! check the configuration show pvlan [vlan_num] show pvlan mapping
Figure 14. Configuring Private VLANs Before doing private VLANs in production, check platform and line-card-specific constraints. Many of these are 6500-specific, so won't show up in the CCIE lab.
802.1x -- Port Based Authentication

With the advent of wireless network access, security of network resources is more important than ever. Security isn't just about protecting your data, but also protecting your networks against attacks and unauthorized use. 802.1x is a Layer 2 port-oriented feature that can become an important component of any security plan. It certainly is advisable to use it for wireless LANs, but that does not do away with the need for encryption. Other available authentications include AAA (with and without Radius). Not all port definitions support 802.1x. Table 32. 802.1x Support by Port Type Type Static access Trunk port 802.1x support Yes No. If you have an existing trunk port and try to configure 802.1x, the port mode will not change. Configuring 802.1x on an existing port generates an error message. If you try to change the mode of an 802.1X-enabled port to trunk, the port mode is not changed. No No. Bear the security issues in mind if you use EtherChannel-capable NICs on your servers. No You can enable it, but 802.1x will not run on the port until the port is no longer a SPAN destination. You can enable 802.1X on a port that is a SPAN destination port; however, 802.1X is disabled until the port is removed as a SPAN destination. Yes.
Dynamic ports EtherChannel port Secure port (i.e., with MAC filters) Switch Port Analyzer (SPAN) destination port SPAN source port
To authenticate with 802.1x, Extensible Authentication Protocol Over LAN (EAPOL) frames are the only ones allowed through the switch, destined for an authentication server or an authentication function on
Page 39 of 50
the switch, until the port user successfully authenticates. In other words, EAPOL extends the functionality of RADIUS to the switch port.
DHCP-related Security Features

Cisco introduced several features to improve security of DHCP. DHCP has special vulnerabilities because the client may be hard to identify or easy to imitate. The DHCP Snooping Feature intercepts DHCP requests from hosts outside your network or outside your firewall. If you think about it, why should a host outside your network want one of your addresses?
show ip dhcp snooping binding would result in a display like that in Table 33.
Table 33. Hypothetical DHCP Snooping Table MAC address IP address Lease (sec) Type VLAN Interface FE 2/1
00c0:0012:3456 192.168.58.37 7200 To configure DHCP snooping,
dynamic 86
! enable globally ip dhcp snooping ! enable for a VLAN ip dhcp snooping vlan vlan-number ! enable information insertion for subscriber tracking with ! dhcp option 82 ip dhcp snooping information option ! define the interface as trusted (e.g., inside firewall) ! or untrusted ip dhcp snooping trust ! set rate limit for accepting DHCP requests ip dhcp snooping limit rate
There is also a set of DHCP snooping features relevant to the DHCP database agent, but that agent and its files are beyond the scope of this discussion of switch functionality. IP source guard, intended for layer 2 ports, complements DHCP snooping at layer 3. Operating on the Principle of Least Privilege, IP source guard blocks any user traffic until a DHCP request and response are fulfilled. Once that process is complete, you can permit either:
z
IP traffic only with the IP source address captured during the DHCP exchange IP traffic only, which meets both the previous IP address criterion, but also originates from the MAC address observed during the DHCP exchange.
This feature can also be used on private VLANs.
Growing Frames beyond Normal Size

While it may sound as if it comes from Alice in Wonderland, there are several capabilities where you can feed magic mushrooms to your switch ... I mean, configure it ... to have Ethernet frames containing more than 1518 bytes (or 1526 if you add the preamble).
Page 40 of 50
The first case is the rather oxymoronic one of "baby giants". This deals with the need for a switch not to drop VLAN-associated frames that have either the additional 2 bytes of an 802.1q header, or the considerably longer frame of Cisco proprietary ISL. Baby giants are not a problem for switches that have ports associated with VLANs as well as their trunks, but the problem happens when you have an intermediate switch (e.g., for aggregation into GE), which only forwards trunk frames. This is a case that matters purely at Layer 2, for inter-switch trunking. The danger is that an intermediate switch such as the one above might drop perfectly legal VLAN trunk frames because they exceed 1518 bytes. IEEE 802.3ac provides a specification under which baby giants are legal. The second case has both L2 and potentially L3 implications. Some switches can support frames with lengths up to 9216 bytes, which reduce the overhead of 26 bytes of header and preamble plus a 9.6 microsecond (on 10 Mbps Ethernet) between frames. Such "jumbo" frames may be kept purely in a switched environment, or you also may need to increase the default MTU of 1500 on IP interfaces.
Single Spanning Tree High Availability

IEEE 802.1d defines the standard spanning tree algorithm and procedures for the industry. IEEE recently reissued an updated version of 802.1d proper, but has also promulgated a new Rapid Spanning Tree architecture, 802.1w. Originally, STP was expected to operate in a flat topology where any bridge could reasonably become root. As we have learned more about campus networks and hierarchical models for them, the reality is that the designer will want to select switches eligible for roots and prioritize which ports will block on those switches. Most of the new performance-enhancing features rely on a much more structured topology than the original 802.1d designers considered. Always consider network management and troubleshooting. Keep a management LAN or VLAN separate from all user traffic. Remember that the basic networking troubleshooting aids such as ping and traceroute do not understand Layer 2 addressing, although Cisco has implemented "Layer 2 traceroute", discussed in this paper. In general, it is harder to troubleshoot at Layer 2 than at Layer 3, a warning to keep Layer 2 designs as simple as possible.
Layer 2 Traceroute
Traditionally, traceroute could only be used with routers and L3 hosts. Cisco developed an extension to traceroute that uses Cisco Discovery Protocol to identify L2-only devices in the path. For this feature to work, a number of criteria must be met. 1. All L2 switches in the path must have IP connectivity. 2. You cannot trace more than ten hops. 3. You can only trace within one VLAN. A given MAC address being traced can belong to more than one VLAN, but you can trace only its role in a single VLAN 4. Multicast addresses are not supported. 5. L3 addresses in the path are identified with ARP. If ARP requests cannot be resolved, the L2 traceroute fails. 6. Hubs will break L2 traceroute, because it assumes one device per switch port.
Page 41 of 50
To execute a L2 traceroute, enter the following command from privileged EXEC mode:
traceroute mac [ip] {source-mac-address} {destination-mac-address}
This section assumes that you understand the basic 802.1d implementation including Bridge Protocol Data Units (BPDUs). See Farkas' LAN Switching and Larrieu's 3550 for details. Table 34. 802.1 and Related Cisco Protocol Summary IEEE Protocol 802.1d 802.1s 802.1w 802.1q 802.1s 802.1x* Function Basic spanning tree Multiple spanning trees Rapid spanning tree; see 802.1d Basic VLAN VLAN Extensions Port authentication PVST Cisco Proprietary Equivalent None ISL PortFast IEEE Enhancement Second edition 802.1d, 802.1w
Its basic approach is for bridges to announce themselves to other bridges using Bridge Protocol Data Units (BPDUs). From BPDU information, the bridges elect a root bridge, and then prevent loops by blocking all but one link between pairs of bridges. We already discussed EtherChannel as a protection against interface and link failures. At the next logical level, several things can go wrong with a spanning tree or VLAN. They include: 1. More than one bridge assumes that it is root ("root wars"). 2. A new non-bridging device attached to a switch port takes too long to start forwarding. 3. A device, typically an end host, endangers the network with a malfunction. 4. A correct spanning tree takes too long to converge, even in the absence of failures. 5. A distribution switch fails and it takes the network a long time to reconverge. 6. A core switch fails and results in long reconvergence time. On a switch, ports will eventually be put in the roles in Table 12. Table 35. STP Port Types Port type Root port (RP) Function BPDUs
Used by nonroot switches to reach the root (1 per switch) Receives
Page 42 of 50
Designated port (DP)
Lowest-cost path to the root (1 per collision domain)
Sends Receives Receives Receives
Nondesignated port (NDP) All other ports Alternate port* Backup port*
*IEEE 802.1w RSTP.
All other ports All other ports
IEEE 802.1w is an upgrade of 802.1d that significantly improves recovery time in switched networks. Cisco has long had proprietary mechanisms to improve convergence after certain failures or addition of devices to the network. Many of these capabilities appear in 802.1w as well. Cisco's hierarchical design model nicely identifies places where you can have STP failures that can get special handling:
z
Core switch failure and failover Distribution switch failure and failover Access switch failure and failover Multiported host NIC/port failure and failover
Core/Backbone Switch Failure

While generic STP assumes a single core switch acting as root, real-world configurations tend to have a pair of such switches. You should be concerned with two main types of root switch failure: indirect root failures and root wars.
Indirect Root Failures

In Figure 14, the normal port blocking caused by STP causes distribution switch DS1 to forward toward core switch CS2, which, in turn, forwards to core switch CS1 via link 2.
Figure 15. Link Failure between Core Switches CS1 is intended to be the root. However, what if link 1 fails? How does DS1 know that it now needs to unblock link 3 and forward directly to CS1?
Page 43 of 50
When CS2 loses link 1, it will start sending inferior BPDUs to its subordinate bridges. Under normal conditions, when a bridge starts to receive inferior BPDUs, it will ignore them until its STP aging timer expires. At the expiry of that timer, the default condition would be to go to a new STP reconvergence, blocking all forwarding until all bridges agree on the new root. If there are no blocked ports on the bridge whose timer expired, it will decide it is the new root bridge -- which may or may not be true. One workaround allows the bridge receiving inferior PDUs to send root link query PDUs out all blocked alternate paths to the root bridge. If the response to one of these PDUs indicates there is a path to the root, then all the blocked ports go into listening and learning, and eventually the spanning tree reconverges. But if we know at configuration time which alternate bridge will become root, we can speed recovery by recognizing that reality and giving the distribution switches a fast mechanism to find the backup core switch without a full STP recomputation. Cisco's original proprietary solution was BackboneFast. IEEE 802.1w has an equivalent solution.
Root Wars
One of the special problems of root switch failures is that you can wind up in failure modes where several switches, some of which were not intended ever to become root, decide that they are root. There may even be a root war. Root wars tend to happen most frequently when the spanning tree is quite large and contains relatively slow links. Root wars were especially common in bridging over WANs with link speeds below 56 Kbps. They are very rare at LAN speeds.
Distribution Switch Failure

If a distribution switch fails, its backup, which typically has been idled by STP, needs to find the new core switch. You might ask, "But doesn't BackboneFast deal with the problem of root failure?" The answer is that it does, but it deals with problems caused by interactions between multiple switches that try to become root. The distribution switch problem assumes that there is indeed a root switch, but that the backup distribution switch needs to find it quickly. Cisco's original proprietary solution was UplinkFast.
Figure 16. Distribution Switch Failure
Page 44 of 50
Spanning tree, of course, will block one of these ports. If the port (or the associated link) fails, the spanning tree algorithm eventually will unblock the alternate port on the wiring closet switch. Deciding which port to unblock requires the spanning tree algorithm to run and possibly elect a new root. During this decision process, forwarding can stop for up to 90 seconds. UplinkFast works on the principle of a sergeant saying, "When I want your opinion, I'll tell you what it is." At the time the wiring closet switch is being configured, the network administrator knows which core switch is primary and which is backup. The noncore switch is told which its primary and secondary links are, and, when it detects a failure on the primary, it has been preconfigured with the knowledge of the backup switch to use. It enables the backup interface without going through the 802.1d listening and learning states.
Performance Enhancements to Individual Spanning Trees

Cisco has long had a variety of enhancements to the spanning tree, but functional equivalents in standards-based protocols are now appearing. Ironically, some of the enhancements go back to the spanning tree originally designed by Radia Perlman and historically available as the "DEC" spanning tree, rather than "IEEE".
IEEE 802.1w Rapid Spanning Tree Protocol (RSTP)

Rapid Spanning Tree Protocol (RSTP) is the industry standard specified in 802.1w. It can be used in a single spanning tree, or in 802.1s VLANs. In the long term, RSTP should provide a standards-based equivalent to many Cisco proprietary spanning tree improvements (e.g., UplinkFast, BackboneFast). Its mechanisms allow rapid addition of new hosts and reconvergence after failures or the addition of new switches. The speed improvement is significant for applications that previously seemed unattainable, such as integrated voice and data. Compared to 802.1d, RSTP has two more port types and two less states. It decoupled two 802.1d concepts, which were the basis for many operational problems with the older algorithm:
z
Port state (Table 35) Port role (Table 36)
RSTP can provide subsecond reconvergence where traditional 802.1d could take 30 s or more to converge. Remember that, as opposed to routers, bridges do not forward while reconvergence is in process. One of the problems of 802.1d convergence is that it must dynamically discover alternate switches after a failure by rerunning spanning tree. It must do so even if there is only one possible backup switch for the particular switch detecting the failure. While STP is running, no forwarding takes place. Another 802.1d problem is that a newly added port must spend 30 s in the learning state before it can begin forwarding. Obviously, this slows recovery time.
Port Types in 802.1d and 802.1w

In 802.1d, after the root is selected, the nonroot switches must know which port is their RP. This is accomplished by each switch calculating path costs back to the root. Costs are based on the speed of the ports a BPDU is received on.
Page 45 of 50
RSTP allows for the creation of several types of ports in addition to the root port and the designated ports of 802.1d. There is an alternate port and a backup port designation kept in the active spanning tree table. When a topology change is detected that affects the current spanning tree, the active tree is immediately flushed, and the backup ports are immediately placed in the forwarding state. Having backup ports allows 802.1w RSTP to provide functionality similar to that of UplinkFast and BackupFast. After the root is selected, the nonroot switches must know which port is their RP. This is accomplished by each switch calculating path costs back to the root. Costs are based on the speed of the ports on which BPDUs are received.
Port States in 802.1d and 802.1w

One of the features causing slow recovery, or new device introduction, in the IEEE 802.1d algorithm is its need to spend 30 s listening and learning before a port will actively forward frames. The idea here is that the port needs to know where the root is, whether it needs to go into a blocking state, etc. In practice, you know, from the network design, that a given port has an end host on it, not another bridge. There is no benefit for the port learning its role in the spanning tree, because it will not have any. The Cisco workaround to this problem is PortFast, which skips the learning phase for edge ports, defined as ports that will not connect to a device with STP capability. 802.1w also avoids this problem in an equivalent manner. Neither of these mechanisms inherently improves availability, but both do decrease the recovery time. Table 36. STP Port States 802.1d State 802.1w State Disabled Blocking Listening Learning Forwarding Learning Forwarding Discarding
The PortFast feature disables the learning phase of the 802.1d algorithm for that port, so that the port will begin forwarding after it learns key MAC addresses. There are cases when an installation tries to make a server more fault-tolerant by giving it multiple interfaces. When these interfaces are in different spanning trees, PortFast would not create any problem. In that case, the challenge is how the server knows which interface to use, a problem that needs to be solved at higher layers or in host software. If, however, the server has two or more interfaces in the spanning tree, there might or might not be a problem. As long as the host never attempts to forward frames between its interfaces, there will be no problem. The spanning tree will simply see it as two separate host endpoints. If the host is capable of bridging, however, then PortFast must not be enabled. Bridging-capable multiple interface hosts need to know which interface to put into the blocking state, putting it into a hot standby role.
PortFast, BPDU Guard, and 802.1w Functional Equivalence
Page 46 of 50
PortFast is not a recovery mechanism, but rather a mechanism to reduce the delay before a nonbridging end host can start to participate in higher-level communications. IEEE 802.1w provides an equivalent standards-based function. Both prevent the attempted reconvergence of the spanning tree when a new device is plugged into a switch port. Remember that while spanning tree reconvergence is in progress, as opposed to routing protocol reconvergence, all forwarding stops. You must not use PortFast or equivalents on hosts with multiple NICs in the same broadcast domain, although it can be used if you have hosts in different redundant broadcast domains. Using it in the first case will prevent the host from being able to decide which NIC to block to follow spanning tree rules.
Root Wars and Root Guard

If we allow our switches to use their default STP configurations, we have no way of telling which switch will come up as the root. As a rule of thumb, in switched campus networks we would want one of our distribution switches to be our root. While we would also want to consider traffic patterns, server placements, VLAN configuration, network policy, and other design and configuration issues, we usually end up choosing a distribution layer switch to be our root. If we had an access switch as our root, we could end up with a suboptimal switching path and longer convergence times. When choosing a root switch, we will also want to choose a secondary root for redundancy. Since distribution switches are deployed in pairs, one could be configured as the root and the other as the secondary root. An additional problem comes when a slow link connects two parts of a spanning tree, as might be done in remote bridging. If that link is sufficiently slow or congested that the hello timers expire, it is entirely possible that bridges on both sides of the link may elect different roots. This problem can be avoided by routing across the link or by using Metro Ethernet or a similar WAN technology of typical LAN speed. Cisco has developed an additional safeguard, extending STP, called Root Guard. Root Guard lets a device participate in STP, as long as it doesn't try to become root. Another root war problem comes from a failure of a link in one direction. In such a case, the real root will receive BPDUs and conclude that it is root, but it is unable to tell other switches it is root. While Root Guard does not solve this, the Unidirectional Link Detection Protocol will.
STP Convergence Time

Three timing parameters affect STP convergence, and are discussed in my High Availability tutorial and Dan Farkas' two Switching tutorials. They are selected by the root and propagated to nonroot switches. Change them only as a last resort, first trying to optimize the topology and then turning on protocol efficiency features. If you must change timers, do so only on root switches and let the information propagate through STP. Table 37. Timers Affecting 802.1d Convergence Timer BPDU hello time Forward delay Maximum BPDU age time (max age) Default 2 every 2 s 15 s 20 s Function Rate and volume of BPDU generation Time port stays in listening and learning states Time after which a forwarding port, hearing no BPDUs, goes back to learning.
Page 47 of 50
Performance Enhancements to Multiple Spanning Trees

Recent enhancements in IEEE spanning tree protocols improve the basic performance of spanning trees by using information hiding and hierarchy, much as we see with ISIS and OSPF. Other enhancements allow better VLAN utilization by establishing a spanning tree for each VLAN, rather than assigning VLANs to the same spanning tree and not gaining an optimal solution for the specific spanning tree population. Much of the IEEE functionality already existed, in closely related form, in Cisco proprietary features. Table 38. Relationships among Spanning Trees, VLANs, and Protocols Protocol Cisco N/A 802.1d, 802.1w 802.1q ISL (obsolete), PVST+ MISTP and MISTP-PVST+ (5000/6000 compatibility mode) 802.1s (you may configure 1 or many VLANs per spanning tree) IEEE
Spanning Trees 1 1 M N
VLANs None M M M
As with routing protocols, especially link state, processing load grows exponentially in some fullinformation topologies. Techniques for improving convergence in small domains, such as decreasing the interval between hellos, do not scale to large size. Instead, the trend is to introduce ISIS-like hierarchy into advanced spanning trees, restricting information from hierarchically lower parts just as information is withheld from ISIS or OSPF stub areas. When we discuss VLANs, you will also see that you can reduce STP overhead by choosing to assign a single STP instance to multiple VLANs, although you can create a 1:1 relationship between STPs and VLANs. 1:1, originally introduced in Cisco ISL, does allow optimal topology for each VLAN, but at greater overhead than MISTP and 802.1s.
MSTP: Subdividing the Spanning Tree for Faster Convergence

Historically, one of the attractions of switched networks was their simplicity, with a logical flat topology. While there might be topological hierarchy based on faster and faster bandwidth toward the core switch (es), there was no concept of aggregation as is seen in OSPF and ISIS. This is no longer the case. While Cisco long has had individual proprietary mechanisms for tuning convergence, there is now an industry standard, IEEE Multiple Spanning Tree Protocol (MSTP -802.1s), which encompasses many of the Cisco tuning features. MSTP builds on RSTP and provides load balancing, multiple paths, and other enhancements. MSTP is an extension of RSTP and uses RSTP messages. It organizes switches into regions, which have some conceptual similarity to ISIS areas. All switches in a region need to have identical MSTP configurations.
MSTP Regions
Page 48 of 50
MSTP regions, comparable to Layer 3 routing protocol backbones, have a single STP for the region. Each MSTP switch belongs to one and only one region. The backbone consists of an internal spanning tree (IST) that sends and receives BPDUs and knows about up to 16 MSTP instances -- the nonbackbone areas. Table 39. Region Definition Parameter Name of region Revision number
spanning-tree mst Configuration Subcommand

name version
MST instance-to-VLAN assignment map instance
IST, CIST, and CST

PVST+ has greater processing overhead, although with potentially greater fault tolerance, than MSTP. MSTP gains efficiency by not trying to keep all the spanning trees separate and equal, as does PVST+, but by creating a hierarchy of spanning trees. MSTP assigns an instance number to each of its spanning trees, up to 16. Instance 0 is the internal spanning tree (IST), the backbone of a region, with subordinate stub regions of limited information. Like the ISIS backbone, only the IST has full information. Only the IST transmits and receives BPDUs. All VLANs are assigned by default to the IST. Each BPDU contains an M-record containing the information for all MSTs, reducing header processing in comparison with per-ST BPDUs. Instances, like nonbackbone areas in a strictly hierarchical routing protocol, do not exchange information. If the regions are interconnected, the exchanged topological information doesn't flow directly between instances. The common spanning tree (CST) interconnects ISTs; the common and internal spanning tree (CIST) is the set of ISTs and the CST belonging to a region. Regions are subtrees of the entire switched domain. The switched domain, defined by the CIST, is an exchange of 802.1d, 802.1s, and 802.1w information. IST instances 1 through 15 are logical stub spanning trees. IST convergence results in the creation of an IST master, which is the CST root if there is only one region. If there is more than one region, the IST master is assigned to the region boundary MSTP switch with the lowest bridge ID and cost to the CST root. IST master status is preemptible if a later reconvergence finds a switch with lower values for the selection tiebreakers of ID and cost. As the region converges, subregions, each with their own IST master, join subregions with a master that is at least preferable to their master. Eventually, only one master is left and there is only one subregion. It may not always be possible to create a single region, perhaps because there are 802.1d legacy switches as well as MST. In such cases, we create a backbone of IST master switches and legacy 802.1d switches, called the CST.
VLAN Tagging and VLAN Trunk Protocol (VTP)

VLANs rely on some mechanism of frame tagging, which lets the VLAN-aware devices recognize the traffic of individual VLANs carried on trunks. IEEE 802.1q is the dominant method of tagging frames, but there are other, obsolescent methods. IEEE 802.10 was used to tag frames on FDDI. Cisco's ISL preceded 802.1q and had significant technical advantages until 802.1q was extended to deal with
Page 49 of 50
multiple spanning trees. LAN Emulation (LANE), strictly speaking, does not tag frames. It has an equivalent and implicit function, however, which identifies VLAN-equivalent Emulated LANs (ELANs) by the virtual circuit with which they are attached. Especially when multiple tagging methods are in use, there needs to be a way to convey the tagging translations among switches. This is the first function of the VLAN Trunk Protocol (VTP). VTP is less a specific high availability mechanism than a means to distribute -- quickly, efficiently, and with minimum human intervention -- changes made as a result of reconfiguration or recovery. VTP has one or more VTP domains to which VLANs are assigned. One or more physical ports are assigned to each VLAN. VTP propagates changes in these relationships.
VLAN-to-Spanning Tree Relationships

IEEE 802.1q is the base industry standard for VLANs. Cisco has a proprietary Inter-Switch Link (ISL) protocol for VLANs, but, as IEEE has extended the capabilities of 802.1q, Cisco is migrating to the standard. Newer products do not support ISL. VLAN management has additional features on the 3550 series as compared with the 4000/5000/6000 series. Be aware that the spanning tree algorithm used on individual VLANs can either be 802.1d or 802.1w.
VTP Pruning
The VTP pruning function reduces demands on system-wide performance by blocking information irrelevant to the downstream switches on a given path. For example, if a particular switch does not have VLAN 42 configured, VTP pruning prevents VLAN 42 control messages from going to that switch. Configuring VLAN 42 on that switch will inform VTP that the switch now needs to hear information about that VLAN.
PVST
Per-VLAN Spanning Tree Protocol (PVST) is built around the 802.1s extensions to spanning tree. Essentially, it is the specification that applies the 802.1w single spanning tree enhancements to multiple spanning trees in a VLAN environment. PVST causes some changes in VLAN numbering that you need to know about; these are described in the 3550 Study Guide.
Conclusion
To understand "switching", you must avoid the market droids' confusion based on the premise "switch good router bad". If you make decisions on L3 information, you are routing. At best, "L3 switching" means that some hardware acceleration techniques were used -- but very similar techniques are used on any high-performance router, such as the Cisco 12000 "Gigabit Switch Router." L2 switching does have meaning: it's the combination of bridging with microsegmentation. Bridging uses spanning trees to form its forwarding tables, and there have been considerable improvements in spanning tree robustness and performance. You need to be familiar with these, both Cisco and IEEE. You also need to be aware of the bandwidth-reducing techniques used to improve performance at L2, such as VTP pruning, CGMP, and IGMP snooping. It appears that Cisco is asking more and more platform-specific "speeds and feeds" questions on certification exams. You need to know the basic characteristic of platform families, but you also need to be realistic. While the number seems to change on a daily basis, it's not unreasonable to say there are over 500 combinations of families, models within families, and common components such as supervisors and power supplies.
Page 50 of 50
References
[Berkowitz 2000] H. Berkowitz. WAN Survival Guide. Wiley, 2000. [Berkowitz 2002] H. Berkowitz. Building Service Provider Networks. Wiley, 2002. [Hines 2002] AA Hines. Planning for Survivable Networks. Wiley, 2002. [Perlman 1988] R. Perlman. "Network Layer Protocols with Byzantine Robustness." PhD dissertation, Massachusetts Institute of Technology, 1988. Laboratory of Computer Science document MIT-LCS-TR429. http://www.lcs.mit.edu/publications/pubs/pdf/MIT-LCS-TR-429.pdf
Change Log: 2003-12-29: - Changed 5 (erroneous) 802.1t references to 802.1s
[IENP-SW2-WP1-F03] [2003-12-29-02]

New Generation of Cisco Switching

Загружено:

Сведения о документе

Исходное описание:

Авторское право

Доступные форматы

Поделиться этим документом

Поделиться или встроить документ

Параметры публикации

Этот документ был вам полезен?

Это неприемлемый материал?

Авторское право:

Доступные форматы

New Generation of Cisco Switching

Загружено:

Авторское право:

Доступные форматы

Certification Zone - Tutorial