Certification
Student Guide
Version 1.0 Rev A
This document (the Document) contains confidential information of Dell and embodies trade
secret and proprietary intellectual property of Dell. It is legally protected and shall not be copied,
modified, reverse engineered, published, disclosed, disseminated outside of Dell or otherwise
used, in whole or in part, without Dell's written consent; provided, however, that you have the
right to use the Document solely for your internal use and solely as necessary for you to enjoy
the benefit of Services under the applicable SOW (or other agreement) you have entered into with
Dell. Copyright 2016 by Dell Inc. The copyright notice does not imply publication of this
document or its contents.
Table of Contents
Physical Layout
Collapsed Core
Campus (Core/Distribution/Access)
Redundant Paths
Other Topologies
Other Topologies (cont.)
Section 2 Cabling Considerations
Cabling Considerations
Optical Fiber
Optical Fiber Connector
Optical Fiber Connector (cont.)
Optical Fiber Connector (cont.)
Optical Fiber Connector (cont.)
Optical Fiber Multi-Mode
Optical Fiber Single-Mode
MTP Fiber Cable
MTP Fiber Cable (cont.)
MTP Fiber Cable (cont.)
MTP Fiber Cable (cont.)
Direct Attach Cable (DAC)
Section 3 Hierarchical Design
Hierarchical Design 2-Tier
Hierarchical Design 3-Tier
Section 4 L2/L3 Termination and Segmentation
L2/L3 Termination and Segmentation
Centralized Routing
Centralized Routing (cont.)
Distributed Routing
Distributed Routing (cont.)
Distributed Routing (cont.)
SNMPv3
Section 1 IP Addressing
Prefix Planning and Placement
Summarization and Aggregation
Section 2 Optimization of Dynamic Routing Protocols
OSPF Design General Considerations
OSPF Flooding Domains
OSPF Design 2-Tier Hierarchy
OSPF Design 3-Tier Hierarchy
Reliability is one of the key pieces of a good network design. If, on the other hand, a network is unreliable, the
consequences can be catastrophic.
A reliable network serves as a communication medium for its applications, providing timely, consistent, and
virtually lossless delivery of data.
3. It provides consistent communication between applications: the delay interval between packets needs to
be as similar as possible, otherwise you run into issues with jitter. This is most noticeable in
VoIP quality.
4. It provides virtually lossless delivery of data: this is fairly straightforward; if the fabric is overly lossy, then
bandwidth requirements were not properly calculated, or the design has suboptimal link utilization. In such a
network, users will be frustrated and applications will begin failing.
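The jitter mentioned in point 3 is simply the variation in inter-packet delay. As a rough illustration (not part of the course material; the timestamps are hypothetical sample data), it can be estimated like this:

```python
# Sketch: estimating jitter as the variation in inter-packet arrival
# intervals. Timestamps below are hypothetical sample data.

def inter_arrival_jitter(timestamps):
    """Mean absolute deviation of inter-arrival intervals, in the same
    units as the timestamps (e.g. milliseconds)."""
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    mean = sum(intervals) / len(intervals)
    return sum(abs(i - mean) for i in intervals) / len(intervals)

# Packets arriving every ~20 ms, as a steady VoIP stream would:
steady = [0, 20, 40, 60, 80, 100]
bursty = [0, 5, 45, 50, 95, 100]
print(inter_arrival_jitter(steady))  # 0.0 -- no jitter
print(inter_arrival_jitter(bursty))  # 18.0 -- large deviation
```

A steady stream yields zero jitter even under heavy load; it is the deviation between intervals, not the delay itself, that degrades VoIP quality.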
Because reliability is costly, we need to determine which applications need ABSOLUTE reliability versus the
applications that can have minimal reliability and still function. Another factor that needs to be taken into
consideration is the cost of unreliability for an application.
As we have seen in previous slides, reliability is much more than "is the network up?"; it also consists of reliable
delivery of packets with acceptable jitter and delay characteristics. That said, if the network is down it will be
unable to deliver those packets reliably unless an alternate route exists; and will the convergence time to
alternate paths be too long for critical applications to handle, causing a failure?
In certain situations this may be catastrophic, or it could be a minor annoyance that is acceptable. We have to
classify which sections of the network are more critical than others, and define the application requirements. At that
point we need to show what a failure would look like from a cost perspective, along with the amount of time needed
to recover from it.
Scalability should not be associated with the size of a network. The idea behind scalability is that a network of any
size can scale and grow, and is documented and designed in such a way as to minimize downtime when scaling.
A manageable network is a well-documented network, with proper day-to-day maintenance performed and
emergency safeguards in place to prevent unnecessary downtime.
If the network is undocumented, the time to resolve any issue increases. (Well Documented)
If day-to-day maintenance is not performed, you begin to have network equipment with outdated firmware and the
potential for unresolved bugs to plague the network. (Daily Maintenance)
If emergency safeguards are not put in place, then there is virtually no change management, with changes
erupting from multiple sectors of the network and left unchecked. (Network Contingency/Backup)
1. Gather all the documentation. This includes L1, L2, and L3 documentation, any network policies; really,
any documentation related to the network at all.
2. If possible, validate the documentation and ensure that it accurately reflects the network; programs like
NetBrain can help with this.
3. Utilize a test device to validate interoperability with the most common network devices.
Always bring a sample device. For instance, if you are deploying N3000s, take one on site and validate
that it is compatible with many of the customer's common devices and applications.
4. Validate that the existing infrastructure can support the new equipment.
Ensure that they have the rack space and the power infrastructure to support the new design.
Baselines are extremely important in the pre-work and post-work of network design. We need to ensure that we
have an accurate view of the entire network before making changes or modifications. This will help us in our traffic
manipulation policies and in our overall customer experience. For example, customers may feel the network is
performing at a level it is not; they may assume certain devices work when in fact they don't. We need to
show how the network is functioning pre- and post-deployment.
Utilize documentation to access all existing network infrastructure and gather a network baseline.
The documentation that we gathered in the previous slides can now be compiled together and used to
perform baselines of what the current network looks like.
Traffic Utilization Baseline
This baseline takes traffic utilization across the networking fabric and gives us an idea of timing versus
utilization. We also need to monitor link utilization and backup-link utilization for critical applications.
CPU Baseline
A lot of people look over this one, but showcasing common CPU trends can grossly assist in troubleshooting
procedures. It can also show us issues early on, for instance I have seen situations where ICMP redirects had
been occurring in a network for ages. They added a bit more traffic and now suddenly the CPU of a switch is
completely verl aded it no longer processes spanning tree BPDUs and declares itself root thus causing a loop.
So the bad situati n which would have been easy to manage has now become catastrophic.
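The kind of creeping CPU problem described above is exactly what a baseline catches. A minimal sketch of the idea, using hypothetical samples and an assumed deviation threshold (neither comes from the course):

```python
# Sketch (assumed threshold and sample data): flag CPU samples that
# deviate sharply from a baseline built over a quiet period.

def baseline_stats(samples):
    """Return (mean, max) of a list of CPU-utilization percentages."""
    return sum(samples) / len(samples), max(samples)

def anomalies(samples, baseline_mean, margin=20.0):
    """Indexes of samples exceeding the baseline mean by more than `margin` points."""
    return [i for i, s in enumerate(samples) if s - baseline_mean > margin]

quiet_week = [12, 15, 11, 14, 13, 16, 12]   # hypothetical baseline window
today = [14, 13, 55, 60, 15, 14, 90]        # hypothetical new samples

mean, peak = baseline_stats(quiet_week)
print(anomalies(today, mean))               # [2, 3, 6]
```

In practice the samples would come from an SNMP poller or NMS rather than hard-coded lists; the point is that without the quiet-week baseline there is nothing to compare today's numbers against.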
Delay and Jitter characteristics of critical applications
It is important to note the delay and jitter characteristics of critical applications during periods of high traffic
utilization.
L2 and L3 CAM size / stability
This is a major necessity. Let's look at the ramifications of not monitoring this in terms of available L3 CAM:
http://arstechnica.com/security/2014/08/internet-routers-hitting-512k-limit-some-become-unreliable/ .
Stability of CAM is important as well: if we are constantly flushing and relearning entries in CAM, there are
issues we need to look into, such as potential loops.
The first step in interpreting business requirements is really going to be understanding the business as a whole.
The placement of different office locations and the overall business hierarchy, including lines of business,
define how the network traffic patterns will work.
Define the overall goals for the design project and the consequences of failure.
This is crucial: we need to understand the visibility placed on this project, and what is considered a success and
what is considered a failure.
Now that we understand the business as a whole, let's start to dive deep into the individual application requirements.
Define critical applications and their locations across the network fabric.
We need to begin by creating a list of critical applications and mapping them to a network topology.
Create a list of all applications and their business impact if failure occurs.
Take the list and assign a dollar value to failures.
When we talk about layers, we are talking about layers such as access, which provides entry into the network and,
in certain situations, access to policy management, and distribution, which can become a point of aggregation with
certain routing policies applied, such as summarization.
Ideally we want to use the fewest layers that perform the needed network functions.
The more layers, the more hops from any given point to another.
For the sake of function separation and consistent logical boundaries, implement choke points.
A choke point is a logical point where traffic converges into a centralized point. This may be an aggregation-like
device. The reason you create these points is to map to logical flows more effectively and maintain
management of your various networking policies. For example, if devices are going upstream to an aggregation
device that has QoS implemented in a manner to only send a CIR to the WAN routers, then it makes sense
to connect your new device to the existing aggregation point, even if you are able to connect directly into the
WAN router.
First Cloud:
If we are using ECMP across the fabric, it is a 1:1 ratio.
If we are using a single path, it is a 2:1 ratio. We calculate this as (total downstream bandwidth) : (total
upstream bandwidth), so in this case it would be 20:10, or simplified, 2:1.
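That arithmetic can be sketched as a small helper (an illustration only, not course material; the bandwidth figures are the ones from the example):

```python
# Sketch: oversubscription expressed as the reduced ratio of
# (total downstream bandwidth) : (total upstream bandwidth).
from math import gcd

def oversubscription(downstream_gbps, upstream_gbps):
    """Return the reduced down:up ratio as a (down, up) tuple."""
    d, u = int(downstream_gbps), int(upstream_gbps)
    g = gcd(d, u)
    return d // g, u // g

# The single-path example above: 20G downstream over a 10G uplink.
print(oversubscription(20, 10))   # (2, 1), i.e. a 2:1 ratio
# ECMP with upstream capacity equal to downstream capacity:
print(oversubscription(20, 20))   # (1, 1), i.e. 1:1
```

The same helper applies to access-layer planning, e.g. 48 × 1G user ports over 2 × 10G uplinks reduces to 12:5.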
When planning the physical layout, try to use Collapsed Core or Campus (Core/Aggregation/Access) layouts as a
reference.
As they are very common, they have better documentation and support, and it will be easier for the customer to
understand and maintain the network.
Collapsed Core:
Very common for small/medium-size networks in a single building;
Easy for customers to understand, very important as networks of this size usually don't have a dedicated
administrator;
Easy to maintain, as it usually doesn't depend heavily on spanning-tree or routing protocols;
The collapsed core usually is a pair of switches for redundancy, and the access switches may be stacked.
Avoid as much as possible cascading access switches if more than one is required in the closet.
Campus (Core/Aggregation/Access):
More commonly used for large networks, usually spread across different buildings;
Topology simple to understand, but may be difficult to maintain, due to the high dependency on spanning-tree and
routing protocols;
Core and aggregation usually are pairs of switches for redundancy; access switches may be stacked.
Ring topology will not be covered, but some networks may require it due to cabling limitations. It is not
recommended unless required by those limitations.
Special attention is needed for spanning-tree planning and oversubscription in that topology.
Full and partial mesh topologies will also not be covered. They are not recommended due to being hard to maintain,
being very dependent on L2 and L3 protocols, and having a high cost of connections. Use them only
if really needed.
To ensure redundancy, the first step is to map the critical components and ensure that a single failure will not
affect each component.
Core and aggregation are usually in pairs, and the equipment below them is connected to both.
Firewalls must be in pairs, with each one connected to a different switch.
Servers are usually capable of having a dual path and are recommended to be connected to two different
switches.
End-user nodes usually don't have redundancy, and in case of a failure on an access switch, a group of users will
be offline. But all the access switches where users are connected are recommended to have redundant
uplinks, to prevent a failure on the core from affecting a group of access switches.
In case of a stacked switch at the access layer, the uplinks are recommended to be spread across the stack members.
If the customer doesn't expect redundancy at some level of the network, always explain and document the risks
and potential losses in case of failures.
This is a real-case scenario: a customer with a pair of core switches, a pair of access switches in the data
center, and a large campus with one switch at each building.
The data center access switches are dual-homed to the core switches.
The access switches are in buildings far from each other, with fiber rings connecting them. To
ensure redundancy, the rings have two uplinks from different points to the core.
In this scenario the customer has one building with the main DC and an office; in the second building the
customer has a backup DC.
For Building 1, as the data center core has more capacity, it was connected as the main core of the network, with
the DC access switches connected directly to it, but an aggregation switch was added for the office, with the access
switches connected to it.
Building 2 has another core, connected to the core in Building 1. Due to cost limitations there are just two
links connecting the buildings, but they can ensure redundancy.
Using optical fiber is not as simple as UTP. Some parameters need to be checked during the design and
pre-site phases to ensure everything will run well.
Connector: SFP/SFP+ uses LC, the more common connector; QSFP+ uses MTP/MPO.
Strands: SFP/SFP+ usually uses duplex fiber, unless a special connector is being used.
Mode: Multi-mode for short distances; single-mode for long distances. Distance varies according to speed,
connector, fiber quality, and fiber connections on the path.
For switches directly connected with a fiber patch, you must check only whether the fiber is compatible with the
connectors and the patch cord.
When Fiber Optic Distributors (FOD) are on the path, you must check the compatibility between the FOD, connector,
and patch on both sides.
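Part of the mode/distance check can be done against typical reach figures. The sketch below uses common published values for standard optics (not from the course; always verify the actual optic's datasheet, since reach also depends on fiber quality and splices on the path):

```python
# Sketch: look up the typical maximum reach of an optic/fiber combination
# to sanity-check a planned fiber run. Values are typical published
# figures -- verify against the actual optic's datasheet.

MAX_REACH_M = {
    ("10GBASE-SR", "multi-mode OM3"): 300,
    ("10GBASE-LR", "single-mode"): 10_000,
    ("40GBASE-SR4", "multi-mode OM3"): 100,
}

def reach_ok(optic, fiber, run_m):
    """True if the planned run length is within the optic's typical reach."""
    limit = MAX_REACH_M.get((optic, fiber))
    if limit is None:
        raise ValueError(f"unknown optic/fiber combination: {optic}/{fiber}")
    return run_m <= limit

print(reach_ok("10GBASE-SR", "multi-mode OM3", 250))   # True
print(reach_ok("10GBASE-SR", "multi-mode OM3", 450))   # False
```

A 450 m multi-mode run failing the check is the design-phase signal to move to single-mode optics before equipment is ordered.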
Connectors: An MTP cable may have 12 or 24 fibers (6 or 12 pairs). They are not compatible, and the right type
must be used.
Polarity: Cable polarity must be flipped an odd (1, 3, 5...) number of times on the path. Polarity may be flipped on
the cable or on the coupler. Some MTP cables have connectors that allow polarity to be reversed in the field;
other cables don't and must be replaced, delaying deployments.
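The odd-number-of-flips rule lends itself to a trivial sanity check (a sketch with a hypothetical path description, not a field tool):

```python
# Sketch: the polarity rule above requires the total number of polarity
# flips along the path to be odd. Each list entry is the number of flips
# contributed by one cable or coupler on the path (hypothetical example).

def polarity_ok(flips_per_segment):
    """True when the total flip count across the path is odd."""
    return sum(flips_per_segment) % 2 == 1

print(polarity_ok([1, 0, 0]))   # True: one flipped trunk cable
print(polarity_ok([1, 1, 0]))   # False: two flips cancel out
```

The second case is the common field mistake: adding a second flipped component to an already-correct path brings the count back to even and breaks the link.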
A 2-Tier network is a very simple design and usually the recommended one, unless the physical distribution of
equipment or the network size does not allow it.
On the access switches, it is recommended that the port speed of uplinks to the core be higher than the port speed
of downlinks to the users, to prevent a single user's traffic from overloading the uplink and affecting the
performance of the network.
Example: 10 Gbps uplink / 1 Gbps downlink; 40 Gbps uplink / 10 Gbps downlink.
Any blade switch connecting to the network must be treated as an access switch and connected directly to the
core.
A 3-Tier design may be needed when it is not efficient or possible to connect all the equipment to the core, such as
in a campus with several buildings.
In this case, it is recommended to dedicate a pair of switches to the core, instead of creating a mesh between the
distribution switches, as this topology is usually easier to monitor and maintain.
The relation between the uplinks and downlinks of the access switches must follow the same recommendation as in
the 2-Tier design.
Regarding the uplinks of distribution switches, the ports may have the same speed as the downlinks, but
depending on the traffic patterns of the network, more aggregated ports may be needed.
Any blade switch is recommended to be connected directly to a distribution switch.
On a centralized routing topology, all the routing is performed at the network core, and the connections to the
lower layers of the network are done at Layer 2.
The L2 connections between the layers may be done using port-channels, for better utilization of the
uplinks and faster convergence time in case of failure.
But if there are restrictions on using port-channels, using a spanning-tree protocol to block the redundant path and
avoid a loop is an option.
This kind of topology is recommended for most small/medium environments, especially if there isn't a
network professional to maintain the environment after the deployment.
The biggest advantage of centralized routing is the ease of configuration and maintenance.
It also enables the use of some applications spread across the environment that may require being on a single
VLAN.
The disadvantages of this topology are bigger broadcast domains and a bigger spanning-tree topology.
With a big broadcast domain, the broadcast traffic may become significant on the end nodes, which can lead to
network slowness, loss of connections, and worse isolation of network problems.
Regarding the bigger spanning-tree topology, it may not be a big deal if correctly configured, but the network is
subject to more frequent topology changes, causing MAC table flushes, leading to unnecessary unicast flooding
and sometimes high switch CPU usage.
On a distributed routing topology, the routing is moved toward the access layer, closer to the nodes.
The biggest advantages of this kind of topology are better scalability and better isolation of L2 problems. On the
other hand, it is more difficult to deploy and maintain.
Usually this design is needed for big environments, especially virtualized ones, where a huge number of VLANs and
MAC addresses are needed on the network.
This may also be needed in a campus environment, moving routing to the distribution layer to avoid
unnecessary broadcast traffic on the uplinks to the core.
This type of design also has a lower dependency on spanning-tree, and changes in topology are restricted to
small parts of the network.
On the other hand, it is very dependent on dynamic routing protocols, usually OSPF, and may require a
knowledgeable network administrator.
Flat networks are not scalable or manageable. Only in very small networks are broadcasts not a concern. At
some point you'll have so much broadcast traffic that all your hosts will be spending time looking at that traffic
instead of processing what is actually destined for them. So you have to break things up at a Layer 3 boundary. The
original way to do this would have been to have many smaller broadcast domains broken up by dedicated routers.
This approach works just fine, but it's costly and requires much more hardware. Layer 3 switching with VLANs
solves this problem in a much easier and cheaper way.
VLANs also make administration easier by providing a way to segregate the network into smaller sections for
administration.
Finally, VLANs provide a piece of the overall security picture. Although not really a security feature in itself, the
segmentation provided helps ensure that conversations are never visible to those outside the VLAN.
There are many ideas you could design VLANs around. One would be to place all the computers in the same
geographic area, such as a floor of a building, into the same VLAN. This is an easy way to design, since it is so
obvious what goes into each VLAN.
Another idea is to take all the computers from the same department or job function and put them into the same VLAN.
For instance, all the marketing computers could be put into their own VLAN, all the sales computers into another, and
all the datacenter computers into yet another VLAN.
A third idea would be to take all computers that run the same type of application and put them into the same
VLAN. For instance, all the VoIP phones could be put into an isolated VLAN, all SQL servers into another VLAN, and
so on.
In most deployed environments there is a mix of these philosophies, either on purpose or as a result of natural
growth. If you have the ability to design from scratch, make sure that you have some idea of what you want to
accomplish before you start figuring out how many VLANs you need.
In most cases, avoid VLANs with more than 254 clients, for the sake of broadcast mitigation.
The guidelines for VLANs are fairly straightforward. In general you do not want more than 254 clients on one VLAN;
this corresponds to a single /24 network. This is best practice because as the number of devices increases, the
amount of broadcast traffic also increases.
Apply restrictive subnets to VLANs you expect only a limited number of users to join (for example, a point-to-point
VLAN interconnecting two L3 switches).
Sometimes you may want fewer than 254 clients in a VLAN, and in those cases you may want to add multiple
VLANs and use a more restrictive subnet mask than a /24. For instance, for border VLANs going to an ISP you might
want an isolated /30 for just that connection. This gives you exactly two usable addresses, point to
point.
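The subnet sizing above can be verified with a short Python sketch using the standard ipaddress module (the prefix values shown are just the examples from the text):

```python
import ipaddress

# A /24 VLAN subnet: 256 addresses, 254 usable hosts
vlan_net = ipaddress.ip_network("192.168.10.0/24")  # example prefix
print(vlan_net.num_addresses - 2)  # subtract network + broadcast -> 254

# A /30 point-to-point subnet between two L3 switches
p2p_net = ipaddress.ip_network("10.0.0.0/30")  # example prefix
usable = list(p2p_net.hosts())
print(len(usable))  # -> 2 usable addresses, one per switch
```

This makes the guideline concrete: a /24 caps the broadcast domain at 254 hosts, while a /30 leaves room for exactly the two endpoints of a point-to-point link.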
Mitigate VLAN propagation where possible to limit the size of the L2 CAM table and the VLAN database.
Once you have decided what to base your VLAN design on and how many VLANs are needed, you have to
start deciding where those VLANs will be flooded. The layer 2 domain should encompass only as many devices as it
absolutely needs to get the job done. In the above topology you have core switches and distribution switches
connected to client devices. As you can see, not every switch needs to have VLANs 10 and 20 assigned.
The best method is to manually prune those VLANs to where they need to be.
GVRP is one option for situations where you want to populate VLANs throughout your domain from a single source.
The idea is that a single switch becomes the server in an environment and then floods the VLANs learned on that
switch to all other switches in the domain. This is done via a multicast packet that goes throughout the entire
domain. As you can see above, Switch A will flood to switches B and C. When VLAN 2 is entered on Switch A, an
update message will be sent to the other two switches with the VLAN to be added.
This is very similar to Cisco's VTP protocol. The two are not compatible, however: you cannot take VTP packets and
translate them to GVRP, for instance. There are other significant drawbacks to using GVRP. For instance, VTP can
send out VLAN names but GVRP cannot, and there is no such thing as transparent mode in GVRP. Finally, there is
essentially no security in GVRP: anything in the multicast domain will see the messages and add the
VLANs into the domain.
Bundling bandwidth from multiple links has been important since the early telecom days. Early on, link
aggregation was important for switch-to-switch uplinks, but now it is used in a variety of scenarios, especially on the
host side.
Planning for link aggregation is not difficult, but to make sure that the customer is going to get the benefit they
need from the design, there are several things that need to be considered. We will look at the design decisions
involved in where to put LAGs, how many links are needed, and how to make sure that you are getting the throughput
that you need.
The most common reason to implement link aggregation is to increase bandwidth. It is common, even among industry
professionals, to confuse bandwidth with speed, however. In the above diagram you have three 10-gig links. The
maximum speed that you can have on any single connection is going to be 10 gigabits, even though you
have three 10-gig links. Link aggregation is a load-balancing mechanism, not a throughput increase for a single flow.
Bandwidth refers to the amount of total traffic available. The top speed of any individual connection is only the link
speed. For a single endpoint to utilize more than the link speed of a connection, it needs to be a multi-
threaded application, and the hashing algorithm on the switch side must actually spread its flows across
multiple links.
Before we can determine how many links we need in the LAG, and what type of hashing we need to be using, we
need to determine the use case for the LAG. In the simple case of switch-to-switch uplinks, the primary consideration
is oversubscription. Ideally we never want to have to buffer anything on switch uplinks. So if we have 20 populated
gigabit ports on a switch, we will need at least two 10-gig ports for uplinks to another switch in most cases. Talking to
the client and understanding traffic patterns is also very important: if we have a 48-port switch with 30 ports populated
at gigabit speeds, but they only require 100 megabits of throughput each, then we could theoretically get by with one
10-gigabit link. We would not want to do this for redundancy reasons, but it is possible.
We also need to understand whether the traffic pattern is bursty or consistent. If the typical transfer rate on a port is
100 megabits but can burst up to a gigabit, we have to determine whether we are willing to accept temporary queuing
on an interface, or whether we want to make sure the uplink is completely non-blocking.
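The uplink-sizing reasoning above boils down to a simple oversubscription ratio. A minimal Python sketch, using the port counts and speeds from the examples in the text:

```python
def oversubscription_ratio(edge_ports, edge_speed_gbps, uplink_ports, uplink_speed_gbps):
    """Ratio of possible edge traffic to available uplink bandwidth (1.0 = non-blocking)."""
    return (edge_ports * edge_speed_gbps) / (uplink_ports * uplink_speed_gbps)

# 20 populated gigabit ports with two 10-gig uplinks: exactly non-blocking
print(oversubscription_ratio(20, 1, 2, 10))  # -> 1.0

# 30 gigabit ports that really push only ~100 Mb/s each fit in one 10-gig uplink
actual_demand_gbps = 30 * 0.1
print(actual_demand_gbps <= 1 * 10)  # -> True (but no redundancy)
```

A ratio above 1.0 means bursts can force buffering on the uplinks, which is the trade-off the paragraph above asks you to discuss with the customer.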
Another consideration relates to hashing: what are the normal traffic sources and destinations? If most of our
traffic is destined for one port on this switch, say a single gateway, then something like source and destination
MAC will not be enough. We will want to choose the method that has the highest amount of information available to
hash on.
In the above diagram you now have multiple endpoints and multiple sources. MAC addresses are visible in the
above diagram so that you can see the information available to the switches to hash on. Remember that when a
packet is sent out of Switch A to B, the hashing algorithm on the switch determines which link it will be sent
down. The actual algorithm is generally not published, but the method is.
When only source and destination MAC are used, as in the above diagram, you get a situation where it is very likely
that all three hosts will be sent across link A. All the MAC addresses are very similar and may end up with the exact
same outcome from the hashing algorithm.
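To see why similar MAC addresses defeat a source/destination MAC hash, here is a hypothetical hash function in Python. Real switch ASICs use different, unpublished algorithms, and all the MAC addresses below are invented for illustration:

```python
# Hypothetical hash: XOR the last octet of source and destination MAC,
# modulo the number of LAG member links. Real hardware hashes differ.
def pick_link(src_mac, dst_mac, n_links):
    src_last = int(src_mac.split(":")[-1], 16)
    dst_last = int(dst_mac.split(":")[-1], 16)
    return (src_last ^ dst_last) % n_links

gateway = "00:1e:c9:aa:00:01"
hosts = ["00:1e:c9:bb:00:10", "00:1e:c9:bb:01:10", "00:1e:c9:bb:02:10"]

# Sequentially assigned NICs share the same last octet, so every host
# lands on the same LAG member: no load balancing at all.
print([pick_link(h, gateway, 3) for h in hosts])  # -> [2, 2, 2]
```

Hashing on more fields (IP addresses, L4 ports) adds entropy and spreads these flows across the members, which is why the text recommends the method with the most information available.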
In most situations you are going to be better off with an LACP LAG than with a static LAG. LACP sends protocol
data units, or PDUs, back and forth across the links involved to ensure that the other side is still up. In fact, LACP will
not come up at all if it does not detect a PDU from the expected system on the other end.
What this means is that if a system is miscabled, the link will not come up. There are two main reasons we would
want the link to stay down if PDUs are not detected. The first is a situation where we have a bad cable and are
transmitting but not receiving on a pair. With a static LAG the switch would keep sending down that link as long as the
physical link state is up. This leads to a large loss of packets on the other end that can be difficult to diagnose
until you shut down that individual LAG member or look carefully at the counters on each side. If the link had been set
to LACP, then as soon as the other pair went bad the link would immediately have gone down on both sides,
resulting in no loss of packets.
Another problem with static LAGs is that spanning tree runs only at the LAG level, not at the physical link level. If a
cable is plugged into the wrong port, the traffic will not be blocked at the physical port level, only at the LAG
level. Enough packets will result in a broadcast storm in your network as the link plugged into the wrong endpoint
keeps flooding them out to all destinations. A cable plugged into the wrong port would never have been an issue with
LACP, as the link would simply not have come up.
Not all host NIC teaming options require link aggregation. Many forms of host teaming are software based and
require no switch configuration at all. When switch configuration is required, there are a few things to keep in mind.
First, the hashing that we talked about earlier is going to be very important. The host itself controls the
hashing for sending, so there is nothing that can be done on the switch end to ensure that traffic sent out from the
host is equally balanced. We also want to consider whether hashing with LACP is even the right choice in every
situation. It may be better to use something like VMware's "Route based on virtual port ID", where traffic is
balanced across the available links per virtual machine, with failover available.
When we do use LAGs, be aware that not all host systems support LACP. VMware only supports static LAGs unless
you are using distributed virtual switches, an enterprise feature. Linux needs the bonding mode set to 4 (802.3ad) in
order to use LACP, and so on.
Once you have determined the requirements, you need to baseline the network and test it. If this is a situation
where you are replacing a switch, you will need to collect the traffic patterns for a short time to see what the network
currently looks like. Collect sFlow or NetFlow information if possible, and if not, collect statistics either via several
static show commands or through a network management system.
Test the current top throughput while in production and during a maintenance window when there is little traffic on
the network. Use iperf for raw network testing and something like SQLIO for application testing. There may also be
application-specific considerations that you would need to test, such as for VoIP.
In a greenfield environment you are working with the customer's expected traffic patterns, but you never know for
sure until all the endpoints are implemented. It is best to set up the environment, get theoretical performance
numbers on the network, say with an iperf test, and then go back to the customer to test performance once
applications and endpoints are up, with SQLIO or something similar.
Spanning tree protocol is both a blessing and a curse. When you need a multi-device layer 2 topology, it prevents
loops when using a redundant design. However, it can be difficult to manage and design properly if you have not
done the proper work ahead of time. One of the dangers of spanning tree is that when you plug in several switches
that use spanning tree, it just works. In theory that is great: it sounds plug and play. The problem is that your
spanning tree topology will probably not converge in the best way possible. To make sure that we get an optimal
design, we need to plan carefully, looking at all the factors involved before we deploy it.
Understanding traffic patterns is important in all scenarios, but especially in spanning tree design. When we talk of
traffic patterns we typically think of traffic as either East-West or North-South. We are going to look at two types of
datacenters. The first is a datacenter that does a large amount of backend research for a financial institution. The
number of transactions coming into the datacenter from outside is small, but there are complicated data analysis
and mining applications that function in the background. Data goes through a front-end process that takes a raw data
file and then splits it into a database cluster. This database cluster stores the data on redundant SANs for protection.
From the database cluster, a variety of custom applications take that data and analyze and categorize
it.
Compare this to a company that does internet-based multimedia distribution. Requests constantly come into the
datacenter for information, and streaming media servers serve those files to customers via streaming technology.
While there is still some amount of traffic inside the datacenter for accounting and the like, the majority is destined for
outside the local network.
Why is this significant? We need to plan the spanning tree root before we can plan anything else in spanning
tree. Then we need to think about whether we want all traffic flowing to the root, or whether we want to optimize
so that East-West traffic can communicate more effectively. While we need to plan for redundancy, we
also want to plan for maximum efficiency. As in all design scenarios, talk to the customer and confirm their
requirements before proceeding.
In spanning tree, the primary design decision is choosing the spanning tree root. This is typically a fairly easy
decision; however, it is one that needs to be made consciously in order to avoid problems later. By default,
spanning tree protocol has the same bridge priority assigned to each switch. It falls back to the lowest MAC
address to determine which switch should be root, which will often cause another vendor's or an outdated switch to
become the root.
Imagine an all-Dell environment where no priority is set. A customer might have an old 2700 switch in a utility
closet somewhere. Going by MAC alone, this switch may very well be elected root, and it simply does not have the
resources to handle the traffic, nor is it in an optimized location. Traffic may flow from more powerful switches
toward this switch and then back out to leave the local network.
Typically your newest or best switch should be the spanning tree root. If doing layer 3 switching, the root should
also be where you are doing your routing. The priority will be set to the lowest value on this switch. Ideally you should
have a second device of nearly the same capability in service and set it as the second-lowest priority. It is important
to make sure that in a failure scenario you have predictable failovers; without a backup root designated, it is
impossible to tell how this will happen.
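The root election described above can be sketched in Python: the bridge with the lowest (priority, MAC) pair wins. The priorities and MAC addresses below are invented for illustration:

```python
# STP root election sketch: lowest (priority, MAC address) tuple wins.
def elect_root(bridges):
    return min(bridges, key=lambda b: (b["priority"], b["mac"]))

bridges = [
    {"name": "core-1",     "priority": 4096,  "mac": "f8:b1:56:aa:00:01"},
    {"name": "core-2",     "priority": 8192,  "mac": "f8:b1:56:aa:00:02"},
    {"name": "old-closet", "priority": 32768, "mac": "00:01:e8:00:00:01"},
]
# With priorities deliberately set, the intended core wins:
print(elect_root(bridges)["name"])  # -> core-1

# With the default priority everywhere, the lowest (often oldest) MAC wins:
for b in bridges:
    b["priority"] = 32768
print(elect_root(bridges)["name"])  # -> old-closet
```

This is exactly the failure mode in the utility-closet example: leave every switch at the default priority and the election degenerates to a MAC-address lottery.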
Once spanning tree is optimized the way you want, you need to understand how it will operate in a failure scenario.
There are three types of failures that are important. The first is the total failure of the core. If the core immediately
died, you need to make sure that you know which switch will become the backup, make sure you have valid
connections to the backup, and be sure that the traffic will forward in the way that you need in that scenario.
The second scenario is if the path to your root goes down. Modern protocols such as RSTP and MSTP include a
backup path to the root, but you need to ensure you know what that path is. Plugging a switch that is critical for your
company into the core is perfect; however, if your backup link goes into an unmanaged switch that you found in the
closet, that does not provide for a good failure scenario. That may make sense for a little-used set of kiosk
computers, but if your failure scenario is not as optimized as your primary scenario, you are not planning well.
Finally, you need to plan for the introduction of new switches into your environment, whether someone brings a
switch from home and plugs it into their workstation port, adds a new core switch with a lower priority, or plugs in a
cable where it should not be. Although you can never mitigate problems 100%, some of the following are important
for minimizing risk in your environment.
One danger in your network is an outside switch taking over the root for your network, either intentionally or
unintentionally. In the above topology someone has added a switch to your network at their desk. Even though
you set your core priority to 4096, the new equipment has a lower MAC address, which gives it a superior bridge ID
for the root. The network will now reconverge with that switch as the root.
There are two problems with this: one, the traffic now flows to someone's desk, where they can tap into everything
going to that switch if they want. The second is that it causes an outage and almost assuredly poor performance
for your network.
You have two options to deal with this that sound similar. The first is putting spanning tree BPDU guard on all your
edge ports. In this case, if a BPDU is received on the edge port, it will immediately go into a disabled state, meaning
that a switch running spanning tree simply cannot be added. The default behavior for spanning tree is for the port to
go into forwarding, whether portfast is enabled or not.
Another option that should be added is to use spanning tree root guard on any port that should be the designated
port in your network. This requires that you again understand how forwarding should happen in your network. If a
switch receives a superior BPDU on a port that should remain the designated port, it discards it. This protects
against any switch, even a legitimate one, accidentally stealing the spanning tree root.
Spanning tree proactively notifies the network about changes via a topology change notification (TCN) message.
This is important in the event of, say, a redundant uplink failing.
TCN is designed to correct forwarding tables after the forwarding topology has changed. This is necessary to
avoid a connectivity outage, as after a topology change some destinations previously accessible via particular
ports might become accessible via different ports. TCN operates by shortening the forwarding table aging time,
such that if an address is not relearned, it will age out and flooding will occur.
The downside is that every port change triggers a TCN by default. So if you have a switch with an attached client that
is flapping up and down due to a bad NIC, you are fast-aging the CAM tables in your layer 2 environment every
time that happens, causing constant flooding. When the CAM is in a constant state of fast aging, the switch floods
until it relearns the MAC addresses on its local ports, but the next TCN fast-ages the CAM again, forcing it to
flood once more until it receives a frame from each connected host.
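The fast-aging behavior described above can be sketched in Python. The timer values follow the common 300-second default aging time and 15-second forward delay, and the table model is deliberately simplified for illustration:

```python
# Sketch: a TCN shortens MAC aging from the default (300 s) to the
# forward-delay time (15 s), so stale entries age out and flooding begins.
DEFAULT_AGING, FAST_AGING = 300, 15

class CamTable:
    def __init__(self):
        self.aging = DEFAULT_AGING
        self.entries = {}              # MAC -> seconds since last seen

    def on_tcn(self):
        self.aging = FAST_AGING        # fast-age the whole table

    def tick(self, seconds):
        # Keep only entries still younger than the current aging time
        self.entries = {m: t + seconds for m, t in self.entries.items()
                        if t + seconds < self.aging}

cam = CamTable()
cam.entries = {"00:aa": 0, "00:bb": 120}  # invented MACs, ages in seconds
cam.on_tcn()
cam.tick(20)                  # 20 s after the TCN...
print(sorted(cam.entries))    # -> [] both entries fast-aged; switch floods
```

With a flapping NIC generating a TCN every few seconds, the table never escapes this fast-aged state, which is the constant-flooding problem the text describes.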
The easiest way to fix this issue is to make sure that all TCNs that happen in your environment are important. This is
as simple as adding portfast (edge port) to any ports that do not connect to other network devices. Portfast
suppresses the generation of TCNs and allows the port to go into forwarding immediately.
If you are in an environment where you may not have access to other devices and are getting TCN floods, you
can also use the TCN guard feature. This takes any TCNs received on a port and drops them instead of
forwarding them on to the rest of the broadcast domain.
Although having a large layer 2 domain is probably not a great design decision, it is sometimes necessary. You may
run into issues where you have uplinks, and even entire switches, that are not being utilized as well as they could be.
Per-VLAN spanning tree and Multiple spanning tree are two methods of using all your resources. Both work in a
similar way, allowing individual VLANs or sets of VLANs to have entirely separate roots and paths to their root. In the
above diagram there is only one active path by default. However, by moving to one of these implementations you
would be able to utilize both of the links to the root switch.
PVST is very common throughout non-Dell switch implementations. PVST works by maintaining a separate
spanning tree instance for each VLAN in the environment. In doing so, it sends a BPDU for each VLAN to a
well-known MAC address. Each VLAN can have its own root and forwarding ports. In the above diagram, VLANs 1, 2,
and 3 are forwarding to a root of Switch A, while Switch B is the root for VLANs 4, 5, and 6.
PVST+ will work along with RSTP by detecting which ports are connected to a rapid spanning tree
environment. The switch will then send out regular BPDUs so that non-PVST+ switches can still communicate.
MSTP works in much the same way as PVST, but not with one VLAN per instance. Instead, VLANs are associated
with any number of instances. There is a common instance 0 that encompasses anything not already assigned to
another instance of spanning tree. You then divide up the environment based on your traffic patterns
into other instances. In the above diagram, instance 1 is used for VLANs 1, 2, 3, and the root is Switch A. For
whatever reason, the decision was made to use Switch A also as the root for VLANs 4, 5, 6, but using a different path
to the root.
MSTP also interoperates directly with RSTP. For a port to be in MSTP mode, all the parameters, such as the version,
the assigned VLANs, and the instances, must match on the connecting switches. If this does not happen, the
switches still communicate, but fall back into regular rapid spanning tree mode. So you can easily have a mix of
MSTP and RSTP in your environment if you need to.
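The matching requirement above can be sketched in Python: two switches behave as one MSTP region only when the region parameters agree, otherwise they fall back to RSTP. The configuration values below are invented for illustration:

```python
# MSTP sketch: switches join the same region only if the region name,
# revision, and VLAN-to-instance mapping all match exactly.
def same_mst_region(a, b):
    return (a["name"] == b["name"]
            and a["revision"] == b["revision"]
            and a["vlan_to_instance"] == b["vlan_to_instance"])

switch_a = {"name": "campus", "revision": 1,
            "vlan_to_instance": {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2}}
switch_b = dict(switch_a)   # identical config -> same region
switch_c = {"name": "campus", "revision": 1,
            "vlan_to_instance": {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3}}

print(same_mst_region(switch_a, switch_b))  # -> True
print(same_mst_region(switch_a, switch_c))  # -> False (RSTP fallback)
```

Note how a single VLAN mapped to a different instance (VLAN 6 on switch_c) is enough to break the region, which is why MSTP mappings should be planned and pushed consistently.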
Exclusive: A 100% isolated out-of-band system in which physical and logical paths never cross the production
environment.
Hybrid: An out-of-band system that includes both in-band and exclusive characteristics. This is the more
common of the two, based on manageability and cost.
Developed to manage nodes, such as servers, workstations, routers, switches, hubs, and security appliances on
an IP network
All versions are Application Layer protocols that facilitate the exchange of management information between
network devices
Enables network administrators to manage network performance, find and solve network problems, and plan for
network growth
Difference between v2 and v3: v2 does not support encryption or authentication, whereas v3 supports both.
To use SNMPv3 on N-Series and S-Series switches, the switch must have an engine ID configured.
auth: Authenticates a packet by using either the Hashed Message Authentication Code (HMAC) with Message
Digest 5 (MD5) method or the Secure Hash Algorithm (SHA) method.
priv: Authenticates a packet by using either the HMAC-MD5 or HMAC-SHA algorithms, and encrypts the packet using
the Data Encryption Standard (DES), Triple DES (3DES), or Advanced Encryption Standard (AES) algorithms.
sFlow overview:
sFlow is a packet sampling technology, and as such it collects data from a random selection of the packets passing
through an interface. By randomizing the samples, synchronization with traffic patterns can be avoided, and while the
resulting data is not 100% accurate, the errors can be quantified and therefore accounted for in any analysis.
sFlow sampling:
Statistical packet-based sampling of switched or routed packet flows
Time-based sampling of counters
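The random 1-in-N sampling idea can be sketched in Python. The sampling rate and packet counts are invented for illustration, and the result shows why the error is quantifiable rather than exact:

```python
import random

# sFlow-style sampling sketch: pick roughly 1-in-N packets at random so the
# samples cannot synchronize with periodic traffic patterns.
def sample(packets, rate_n, seed=42):
    rng = random.Random(seed)               # seeded for reproducibility
    return [p for p in packets if rng.randrange(rate_n) == 0]

packets = list(range(100_000))
picked = sample(packets, rate_n=512)

# Expected count is about 100000/512 (~195); the actual count varies,
# and that sampling error is statistically quantifiable.
print(len(picked))
```

Scaling each sampled packet's byte count by the sampling rate N gives an unbiased traffic estimate, which is the property the paragraph above relies on.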
High Stratum Campus Time Distribution Network: The previous section described a WAN time distribution
network. This section moves one step down in the hierarchy to discuss time distribution on a high stratum
campus network.
A high stratum campus network is a potential use case for the broadcast association mode.
The root of the synchronization tree is a private time source rather than a public time source from the Internet.
Low Stratum Campus Time Distribution Network: A cesium time source is provided at the central data center for the
low stratum campus network. This provisions a stratum 1 time source on the private network.
Under ideal conditions, summarization and aggregation can be fairly straightforward. But if we are honest, most
customers' environments have prefixes deployed in a way that does not always make sense and requires us to work
around the issues.
Why do we summarize?
We summarize first and foremost to reduce the size of our IP routing tables, which saves space in the CAM. We also
summarize for the sake of information hiding, which we talked about in the first module. This provides smaller
LSDBs in the case of OSPF. It also reduces routing updates, since neighbors see only the overall summary block
rather than every change behind it.
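The routing-table reduction from summarization can be illustrated with Python's standard ipaddress module. The prefixes are invented for illustration:

```python
import ipaddress

# Four contiguous /24s advertised from one area...
prefixes = [ipaddress.ip_network(f"10.1.{i}.0/24") for i in range(4)]

# ...collapse into a single /22 summary: one route instead of four,
# and churn inside the block never leaks outside it.
summary = list(ipaddress.collapse_addresses(prefixes))
print(summary)  # -> [IPv4Network('10.1.0.0/22')]
```

This also shows why addressing plans matter: collapse_addresses can only merge prefixes that are actually contiguous and aligned, which is exactly what messy real-world deployments break.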
Although the OSPF process ID has only local significance to the router, it is recommended to use the same process
ID for all the routers in the same OSPF domain. This improves configuration consistency and eases automatic
configuration tasks.
Configure a deterministic router ID for the OSPF process using the router-id command.
Choose the router ID (IP address) from the address space of the OSPF area the router belongs to. This helps with
route summarization, in case these router IDs need to be routed.
If the OSPF router ID needs to be routable, configure a loopback interface with the same IP address and include it
under the OSPF process.
Note situations where ECMP can be utilized to balance traffic across the networking fabric.
Showcase how OSPF ECMP can be utilized to balance traffic across a collapsed core scenario, and the benefits of
this type of setup. Note also that certain applications require L2 reachability to each other, so in those situations
another HA / traffic balancing option, such as VLT, would be an option for L2.
Boundaries can be placed at the edge of the core or within the distribution layer.
This is the decision we need to make: do we want to assign the flooding boundary at the core or at the distribution?
There are costs and benefits to each.
At times we can accept a hybrid model that moves the flooding boundary depending on circumstances. In the case
of area 10, we can note that the added complexity of the interconnections between the access and distribution layers
may not be something we want to allow within area 0. This all depends on business requirements.
When we have multiple points of redistribution, this protection mechanism is removed: one device places RIP
prefixes in OSPF while another device advertises those same routes back to RIP as if they originated in OSPF. Due
to administrative distance, RIP would in many cases prefer to go to OSPF for its own internally advertised routes,
with the exception of directly connected routes. This can in many cases cause a routing loop, or at best suboptimal
routing. The way to fix this is either to play with administrative distance, which is NOT the preferred option, or to tag
all routes redistributed from RIP and from OSPF with different route tag values. Then filter by route tag and do not
allow RIP- or OSPF-learned routes to be readvertised back to themselves.
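The tag-and-filter rule described above can be sketched in Python. The tag values and route structure are invented for illustration; real routers express this with route maps:

```python
# Sketch: tag routes at each redistribution point and refuse to re-import
# a route that already carries the other protocol's tag.
RIP_TAG, OSPF_TAG = 100, 200    # invented tag values

def redistribute_into_ospf(route):
    if route.get("tag") == OSPF_TAG:       # originated in OSPF: came back around
        return None                        # filtered -> loop prevented
    return {**route, "tag": RIP_TAG}       # mark as learned from RIP

rip_route = {"prefix": "172.16.1.0/24"}
leaked_back = {"prefix": "172.16.1.0/24", "tag": OSPF_TAG}

print(redistribute_into_ospf(rip_route))    # accepted and tagged 100
print(redistribute_into_ospf(leaked_back))  # -> None (dropped)
```

The mirror-image filter at the OSPF-to-RIP boundary drops anything tagged RIP_TAG, so neither protocol ever relearns its own routes through the other.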
Asymmetric Routing
The problem: certain applications require security validation against man-in-the-middle attacks. They do this by
verifying that their next hop is the same location that the return traffic is coming from. In certain situations the return
path can be different from the path the traffic was sent down. There are a couple of ways to get around this, with the
easiest being to place a static route for that type of traffic to use the needed path. Other options are generally
situational; consult the customer documentation.
Routing Loops
Routing loops can occur for various reasons: for instance, redistribution loops like the one above, improper
placement and configuration of static routes, improper placement of eBGP and iBGP speakers, and so on.
Suboptimal Routing
This is a situation where traffic takes a path that is not the most optimal, for various reasons: it could be bad
summarization placement, improper OSPF area placement, and so on. In some cases suboptimal routing can be
acceptable if the needs of the business are in play.
Given the importance of networks to businesses today, it is very important to understand the required level of availability and plan accordingly.
Components (switches, hosts, firewalls, links) are subject to failure. Be sure to understand the impact of a failure in these components and what changes are required to mitigate the risks, or to keep them at a level acceptable to the business.
In this module we are going to discuss some configurations that can make the network more reliable.
In an earlier chapter we discussed Port-Channels. When used to connect two devices, a Port-Channel is a great technology to increase bandwidth, but it may not protect against a component failure. Using a Multi-Chassis Link Aggregation technology solves this issue, preventing a switch failure from causing an outage.
Among the multi-chassis link aggregation options, stacking is the easiest to deploy and manage, but on the other hand software failures or software updates affect the entire stack at once.
This option is recommended when:
Customer staff is not able to maintain a more complex multi-chassis technology;
A network failure does not cause losses to the customer's operation;
The customer can easily schedule a maintenance window with downtime.
The procedure to create the stack varies with the switch model, with some series requiring the use of dedicated stacking ports and cables, and others using regular user ports for stacking.
Stacking supports connecting the switches in a ring or daisy-chain topology, with ring always being recommended.
For additional reference on support and limitations, check the Configuration Guide for the switch model being used.
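As a sketch of what user-port stacking looks like in practice, the following N-Series-style CLI converts two 10G user ports to stacking mode. The port names, unit numbers, and exact command syntax here are illustrative assumptions; always confirm them against the Configuration Guide for the model in use.

```text
! Sketch only: convert two 10G user ports on unit 1 to stacking mode
console# configure
console(config)# stack
console(config-stack)# stack-port tengigabitethernet 1/0/47 stack
console(config-stack)# stack-port tengigabitethernet 1/0/48 stack
console(config-stack)# exit
! Repeat on the second switch, then cable the stack ports so the
! topology forms a ring rather than a daisy chain
```

A reload is typically required before the converted ports begin operating as stack links.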
*Note: Figure on the left: Stacking Dell N4000 series using 40G and 10G stack links.
A VLT pair may be viewed as two independent switches (independent management planes), connected by a LAG, with a virtualization protocol running.
This virtualization protocol enables devices to connect to both switches using a single LAG. The technology also enables devices to have independent connections to the switches if required.
VLT in general has better availability than stacking, as a software failure or software update on one member does not affect the other. But to ensure network traffic is not affected by a switch failure or reload, all parameters must be correctly configured and all connections must be redundant.
Managing a VLT domain is more difficult than managing a stacked switch, and it is recommended to evaluate whether the customer's team will have the required knowledge before suggesting this design. A lack of knowledge may lead to network failures, a customer unable to manage the network, and dissatisfaction with the product.
Recommendations:
The VLT Domain ID must not repeat on the network;
Use a static port-channel for the VLTi;
Use LACP on devices connecting to the VLT;
Configure the system mac-address parameter;
Configure the heartbeat link.
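The recommendations above can be sketched in OS 9-style CLI as follows. The domain ID, heartbeat address, system MAC, and port-channel numbers are illustrative assumptions; the matching configuration on the second peer (same domain ID and system MAC, different unit-id) is omitted.

```text
! Sketch only: one VLT peer (mirror on the other peer with unit-id 1)
vlt domain 1
 peer-link port-channel 128               ! VLTi - static port-channel
 back-up destination 10.10.10.2           ! heartbeat, e.g. via OOB management
 system-mac mac-address 02:00:00:00:00:01
 unit-id 0
!
interface port-channel 10                 ! LAG toward a downstream device
 vlt-peer-lag port-channel 10             ! ties it to the same LAG on the peer
 no shutdown
```

Per the recommendations above, the downstream device connecting to this port-channel should run LACP.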
MLAG is a virtualization technology with some similarities to VLT, but it has restrictions regarding the supported topologies, such as limited support for single-homed devices.
It is very important to know whether the customer is able to manage this technology and understands its limitations. Advise that all equipment connecting to the MLAG must always be dual-homed with a port-channel.
Supported:
MLAG can also be used in conjunction with Dell Networking VLT. VLT provides equivalent results to MLAG, but is available on the OS 9-based Dell Networking switches such as the S4810. Each peer pair must use the same multi-chassis LAG feature. This provides the network with the same resiliency and improved bandwidth as when using MLAG in a multi-tier MLAG-only environment.
Unsupported:
Attempting to connect an MLAG peer with a non-MLAG peer.
Using two different switch series models as MLAG peers: an N2000 can only peer with an N2000, an N3000 with an N3000, and an N4000 with an N4000.
From: Using MLAG in Dell Networks v1.3, Victor Teeter, Feb 2015
VRRP and HSRP have the same purpose: to provide first-hop redundancy by configuring a virtual IP between two or more routers.
They are easy to configure, and the configuration steps do not change much between them.
From the comparison table we can note:
HSRP is Cisco proprietary, while VRRP is an open standard;
Both can preempt the virtual IP, with VRRP having preemption enabled by default;
With VRRP you can use the interface IP address of the primary router as the VRRP address; HSRP does not allow this. This can be useful when there is a lack of available IP addresses;
Default hello and dead timers are different, but configurable. Some equipment may accept sub-second timers;
Both use multicast to send hello packets.
VRRP is pretty straightforward to configure, but some attention is required to prevent issues after deployment.
Some equipment has a limit on the maximum number of virtual routers allowed; check this limit for the model and software version being used. For the N-Series it is currently 50; for the S-Series there is no maximum.
Deployments with more than 100 virtual routers, combined with other protocols enabled on the switch, may consume too much CPU and cause the virtual routers to flap. Consider increasing the hello time to consume fewer resources and prevent flapping.
After a switch reboots it can take a while for the routing protocols and routing table to stabilize; consider adding a delay to the preemption.
Evaluate which switch in your topology will be the primary and configure a higher priority on that router. Usually the spanning-tree root is recommended.
When sharing an IP between the physical interface and VRRP, the VRRP priority on that switch must be 255.
Check whether there is a dependency on an interface or route that should cause a VRRP failover in case of failure, and configure tracking accordingly.
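A minimal OS 9-style VRRP sketch reflecting the points above. The addresses, group number, and priority are illustrative; the exact preemption-delay and tracking commands vary by platform and firmware, so they are noted only as comments.

```text
! Sketch only: the intended master (e.g. the spanning-tree root)
interface Vlan 10
 ip address 10.1.10.2/24
 vrrp-group 10
  virtual-address 10.1.10.1
  priority 110                  ! higher than the backup's (default 100)
 no shutdown
! On the backup peer, configure the same group with the default priority.
! Where supported, add a preemption delay so a rebooted master waits for
! routing to converge before taking over, and add interface or route
! tracking to force failover when an uplink fails.
```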
VRRP usually works with one active virtual router and all the other switches in standby. When configured together with VLT, both switches in the VLT domain actively forward traffic, even if a switch appears as backup in the output of a show vrrp command.
Consider configuring the VRRP priorities so that all virtual routers have the same VLT peer as master and the other as backup.
Configure the static and dynamic routing so that both VLT peers have the same routing table; both VRRP peers must be able to forward traffic the same way.
VLT peer routing makes the switch route packets destined to the peer's MAC address instead of sending them across the VLTi.
The switch also answers ARP requests directed to the other switch in case it fails.
One big advantage of peer routing when compared to VRRP is scalability, as there is none of the processing associated with the VRRP advertisement packets.
But peer routing needs both switches to be online and synced before it starts working, which can be a challenge if one unit is removed due to a failure and the other needs to be rebooted. Some reconfiguration of IP addresses may be needed in this case. VRRP does not have this limitation.
Special care is needed with peer-routing-timeout: if this is changed from the default (infinite), routing may stop working if the failed switch does not recover.
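On OS 9, peer routing is enabled under the VLT domain; a minimal sketch (the domain number is illustrative):

```text
! Sketch only
vlt domain 1
 peer-routing
 ! peer-routing-timeout <seconds> exists but defaults to infinite;
 ! if changed, routing on behalf of a failed peer stops after the
 ! timeout when that peer never recovers - usually leave the default.
```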
Storage is a very critical part of any network design; it is important to ensure redundancy to avoid impact on the operation in case of a component failure.
To achieve redundancy on the storage side, it is recommended to use multipathing techniques.
The storage array usually has two controllers, with one, two, or four interfaces on each controller.
At the switch level, it is recommended to have two different paths, with the storage having access to both paths. On the server you may have pairs of network interfaces, connected to the different paths.
It is important to check that, in case of a component failure (storage controller, cable, switch, network interface), there will still be a working path.
In this example, the PS4210 has two controllers, with two ports on each controller.
Each controller is dual-homed, and the server is dual-homed.
For EqualLogic (EQL), the switches are required to be lagged or stacked (lagged is preferable), and a single VLAN must be used.
In this example, the SC4020 has two controllers, with two ports on each controller.
Each controller is dual-homed, and the server is dual-homed.
For Compellent, the switches are required to be independent, simulating the concept of fabrics that exists in FC networks.
To connect the server to the network with redundancy, it is recommended to have an even number of connections for the same function, spread equally across the switches. A power of two (2, 4, 8, 16) connections is recommended, so that traffic is better distributed on the links, due to the hashing algorithms used.
For each function, the ports used must be of the same speed, to prevent the slowest ports from being overloaded while the others still have bandwidth available.
For stand-alone switches, techniques such as active/passive, BACS Smart Load Balancing, VMware Hash Based on Port ID, or Linux TLB may be used. For stacked switches or switches with VLT, besides the techniques above, the ports may also be lagged. For MLAG, the ports MUST be lagged.
If the option is not to use a LAG, it is important to ensure the ports have the same configuration.
It is recommended that NIC connections be used in pairs, going to two different switches;
Use the same port speed for the connections;
If switches are stacked or running VLT, using a port-channel is an option instead of active/standby or another active/active technique;
If switches are MLAGed, a port-channel is required;
If active/standby, or active/active without a port-channel, is being used, ensure the switch ports have the same configuration.
It is recommended that each standalone ToR or stack be dual-homed. Spreading the ToRs across the core/aggregation with single connections is not a good idea, as a single switch failure may affect many ToR switches.
If using a stack, consider spreading the uplinks across different switches in the stack, to prevent a single switch failure from isolating the stack.
You can achieve an acceptable oversubscription if you use a LAG on the uplink, with the same port speeds as the downlink ports. But depending on the algorithm used on the LAG, a single end station may overload one uplink port and impact other users/traffic.
To prevent problems created by unidirectional links, if the ToR and the upstream switches support UDLD and are compatible, consider enabling it. If they do not support it, or are from different vendors or families, consider using single-port LACP as a replacement for UDLD.
Uplink Failure Detection (UFD) on the S-Series, and Link Dependency on the N-Series, are good features to avoid black-holing traffic in case of an uplink failure.
As some customers may have limitations on the ports available on the core/aggregation to connect the access switches, sometimes the access switches may be single-homed while the servers still need to be dual-homed.
In this case, with UFD/Link Dependency configured, the server is able to know when that path is not available, as the switch shuts down the downstream port in case of an uplink failure.
For both features, the uplinks and downlinks must be explicitly configured in each group.
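A UFD sketch in S-Series (OS 9) style. The group number and port names are illustrative, and Link Dependency on the N-Series expresses the same idea with different syntax.

```text
! Sketch only: shut the server-facing ports if the uplink goes down
uplink-state-group 1
 upstream TenGigabitEthernet 0/48
 downstream TenGigabitEthernet 0/1
 downstream TenGigabitEthernet 0/2
```

With this in place, a server with an active/standby NIC pair sees the link drop on its active port and fails over to the NIC connected to the healthy path.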
QoS is a topic that can quickly become frustrating. It's one of those design decisions that can be completely ignored in favor of excess capacity, used as a temporary failure mechanism, or granularly controlled and obsessed over. Hopefully this module can provide some guidelines on implementing it in customer environments where needed.
Quality of Service is simply the art of traffic throughput manipulation. The reasons for needing it vary, but typically there will be a single application or set of applications that need close to 100% network integrity.
QoS is only one method of accomplishing this. You could, for instance, have an isolated network for this traffic, as in the best practice for storage networks, or make sure that you are not oversubscribed, as we will go into later.
Essentially, QoS design occurs in several phases. After identifying the points of contention in your network, you have to come up with a consistent system for marking the essential traffic. There are two basic approaches here: you can either (a) mark all traffic as close to the source as possible, or (b) ensure your applications mark their own traffic and trust that prioritization throughout the network.
Once traffic is prioritized it will be sent first whenever there is contention in the network. This solves many issues, including microbursts in your environment. When a large amount of traffic is sent at once, the prioritized traffic is not dropped; instead, lower-priority traffic is dropped and then retransmitted.
Finally, there are cases when you may need to use one or more QoS tools, such as policing or shaping, to control traffic.
Below are some of the requirements and best practices for voice and video traffic. Discuss these with the class.
Example: voice and video
Voice requirements:
Voice traffic should be marked to DSCP EF per the QoS Baseline and RFC 3246.
Loss should be no more than 1%.
One-way latency (mouth-to-ear) should be no more than 150 ms.
Average one-way jitter should be targeted under 30 ms.
21-320 kbps of guaranteed priority bandwidth is required per call.
Video requirements:
Interactive video traffic should be marked to DSCP AF41; excess interactive-video traffic can be marked down by a policer to AF42 or AF43.
Loss should be no more than 1%.
One-way latency should be no more than 150 ms.
Jitter should be no more than 30 ms.
Overprovision interactive video queues by 20% to accommodate bursts.
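As a sketch of how the voice marking requirement might be applied at the access layer, the following N-Series-style lines trust DSCP markings from the phone and map EF (DSCP 46) to a high-priority queue. The command names and the queue number are assumptions to be verified against the switch's Configuration Guide for the model and firmware in use.

```text
! Sketch only
classofservice trust dscp                  ! honor endpoint DSCP markings
classofservice ip-dscp-mapping 46 6        ! EF -> queue 6 (assumed queue id)
```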
The requirements above can be met using QoS.
Congestion: the potential for congestion exists in campus uplinks because of oversubscription ratios, and in campus downlinks because of speed mismatches.
Oversubscription somewhere in your network is almost impossible to overcome. You are not going to provide 1 Gb of Internet uplink for each desktop user, despite the fact that most NICs are now gigabit.
The question becomes: where is it okay to oversubscribe, and by how much? Both the theoretical oversubscription ratios and the actual traffic usage should be examined before deciding what number is correct.
Oversubscription: a 2:1 ratio typically means, for example, 20 user ports trying to reach an application over a single distribution link with capacity for only 10. That is, network oversubscription refers to a point of bandwidth consolidation where the ingress bandwidth is greater than the egress bandwidth. The distribution-to-core ratio in the data center is 1:1.
Distributed Core design: provides massive scalability, non-blocking forwarding, high resiliency, and energy efficiency, and reduces the oversubscription ratio in the data center.
One method of assigning QoS is to simply trust the application, as in the diagram above. This means less work on the switch, but more initial verification that the software used fully supports QoS marking.
Trusted endpoints: trusted endpoints have the capability and intelligence to mark application traffic to the appropriate CoS and/or DSCP values. Trusted endpoints also have the ability to remark traffic that may have been previously marked by an untrusted device. Trusted endpoints are not typically mobile devices, which means that the switch port into which they are plugged does not usually change.
Conditionally trusted endpoints: IP phones are trusted devices while PCs are not, so a port carrying a mix of trusted and untrusted endpoints (for example, a PC daisy-chained through a phone) is called a conditionally trusted endpoint.
The above is a typical example of QoS in a campus-type environment: you have PCs and telephones at the access layer.
At the core layer you will mostly deal with queuing. The core switches should always forward the VoIP traffic first. At the WAN edge you may need to apply some toolset options, such as policing, to ensure that WAN subscription rates are not exceeded.
For instance, many providers will give you a Metro Ethernet handoff that physically runs at 1 Gb, but you may only have 512 Mbps of negotiated bandwidth. In this case you would want to shape the 1 Gb rate down to ensure you do not have drops at the handoff.
The above are all items that can be used to help with QoS control. We've already talked about queuing. Shaping is the process of making traffic flow at a consistent rate by buffering items as needed. Policing, on the other hand, is the process of dropping traffic once it exceeds a certain rate, period. The choice between these two will depend on your situation, but in most cases you police hostile traffic and shape friendly traffic.
Admission control: a validation process in communication systems where a check is performed before a connection is established, to see if current resources are sufficient for the proposed connection.
Classification and marking: classify and identify the traffic that is to be treated differently.
Policing and markdown: policing tools (policers) determine whether packets conform to administratively defined traffic rates and take action accordingly. Such action could include marking, remarking, or dropping a packet.
Scheduling: scheduling tools determine how a frame/packet exits a device (for example, strict priority or weighted round-robin scheduling).
Traffic shaping: a shaper typically delays excess traffic above an administratively defined rate, using a buffer to hold packets and shape the flow when the data rate of the source is higher than expected.
Mean rate:
Also called the committed information rate (CIR), it specifies how much data can be sent or forwarded per unit of time, on average.
Burst size:
Also called the committed burst (Bc) size, it specifies in bits (or bytes) per burst how much traffic can be sent within a given unit of time without creating scheduling concerns. (For a shaper, such as GTS, it specifies bits per burst; for a policer, such as CAR, it specifies bytes per burst.)
Time interval:
Also called the measurement interval, it specifies the time quantum in seconds per burst.
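The three parameters are related by Bc = CIR × Tc. A small Python sketch (the class and parameter names are illustrative, not any vendor's implementation) shows how a single-rate token-bucket policer uses them:

```python
# Minimal token-bucket policer sketch illustrating CIR, Bc, and Tc.
# Names and numbers are illustrative only.

class TokenBucketPolicer:
    def __init__(self, cir_bps, bc_bits):
        self.cir = cir_bps            # mean rate (CIR), bits per second
        self.bc = bc_bits             # burst size (Bc), bits
        self.tokens = bc_bits         # bucket starts full
        self.tc = bc_bits / cir_bps   # time interval Tc = Bc / CIR

    def refill(self, elapsed_s):
        # Tokens accumulate at CIR, capped at the burst size Bc.
        self.tokens = min(self.bc, self.tokens + self.cir * elapsed_s)

    def conforms(self, packet_bits):
        # A packet conforms if enough tokens remain; otherwise the
        # policer drops it or marks it down.
        if packet_bits <= self.tokens:
            self.tokens -= packet_bits
            return True
        return False

# 512 kbps CIR with an 8 kbit burst: Tc = 8000 / 512000 = 15.625 ms
p = TokenBucketPolicer(cir_bps=512_000, bc_bits=8_000)
print(p.tc)               # 0.015625
print(p.conforms(8_000))  # True  - consumes the whole burst allowance
print(p.conforms(1))      # False - bucket is now empty
p.refill(1.0)             # one second at CIR refills it (capped at Bc)
print(p.conforms(8_000))  # True
```

Increasing Bc at a fixed CIR lengthens Tc and tolerates larger bursts without changing the long-term average rate.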
This example shows how a network administrator can provide equal access to the Internet (or other external
network) to different departments within a company. Each of four departments has its own Class B subnet that is
allocated 25% of the available bandwidth on the port accessing the Internet.
The QoS Differentiated Services (DiffServ) feature allows traffic to be classified into streams and given certain QoS treatment in accordance with defined per-hop behaviors.
Dell Networking N-Series switches support both IPv4 and IPv6 packet classification. The DiffServ capability is not present on the Dell Networking N1500 switches; DiffServ is supported by the N2000, N3000, and N4000 hardware.
The Class of Service (CoS) queuing feature enables direct configuration of certain aspects of switch queuing. This provides the desired QoS behavior for different types of network traffic when the complexities of DiffServ are not required. CoS queue characteristics, such as minimum guaranteed bandwidth and transmission rate shaping, are configurable at the queue (or port) level.