Data Center Design Considerations with 40GbE and 100GbE

Overview of 40GbE and 100GbE technologies in Dell Data Center Architectures

Dell Networking
August 2013

Contents

- Executive summary
- 40GbE and 100GbE in data centers
- 40GbE connectivity characteristics
- 100GbE connectivity characteristics
- Dell Active Fabric Solutions
- Benefits of Active Fabric Architecture
- Why Dell?


Executive summary
Modern data centers have seen an unprecedented increase in network connectivity and bandwidth to meet the workload needs of cloud computing, web services and high-performance computing. The evolution of Ethernet technology has enabled 10GbE adoption at the server and provided a cost-effective means of carrying multiple 10GbE streams over 40GbE from the access layer to the aggregation and core layers within the data center.

This paper examines the connectivity, bandwidth, reach, media types, power, cabling and optics
technologies that have allowed the creation of data center network architectures without network
bandwidth constraints. Maturation of 10GbE and the emerging 40GbE and 100GbE technologies provide
connectivity alternatives to scale up and scale out data center architectures across enterprise, service
provider and cloud data centers.

The key factors driving 40GbE and 100GbE within the data center are server transitions from 1GbE to 10GbE, virtualized applications driving the need for bisection bandwidth, and favorable fabric economics resulting from a reduction in transport cost per gigabit of traffic. Bandwidth in the rack is projected to grow by roughly 24x, from 40Gbps in a 1GbE server rack to 960Gbps in a high-density 10GbE server rack. Such levels of rack bandwidth growth are driving increased bandwidth requirements for intra-rack and inter-rack communications within a data center.
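
As a rough illustration of the rack bandwidth arithmetic above, the sketch below recomputes the two rack totals; the server and NIC counts per rack are assumptions chosen to reproduce the cited 40Gbps and 960Gbps figures.

```python
# Illustrative sketch of the rack bandwidth growth described above.
# Server and NIC counts per rack are assumptions chosen to match the cited figures.

def rack_bandwidth_gbps(servers_per_rack, nic_speed_gbps, nics_per_server=1):
    """Total server-facing bandwidth in a rack, in Gbps."""
    return servers_per_rack * nics_per_server * nic_speed_gbps

legacy_rack = rack_bandwidth_gbps(servers_per_rack=40, nic_speed_gbps=1)   # 40 Gbps
dense_rack = rack_bandwidth_gbps(servers_per_rack=48, nic_speed_gbps=10,
                                 nics_per_server=2)                        # 960 Gbps

print(f"1GbE rack: {legacy_rack} Gbps, 10GbE rack: {dense_rack} Gbps, "
      f"growth: {dense_rack / legacy_rack:.0f}x")  # ~24x
```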

The ecosystem of 40GbE technologies, including switching systems, cabling and optics, has reached a maturity point where data center networks can be deployed with high levels of scalability at reasonable price points. 40GbE technology has reached a point where the cost of ownership is at or below that of four 10GbE pipes. There has been significant progress in 100GbE standardization, but with limited product offerings to date. 100GbE technologies are in the early adopter phase, with the maturity of the technology and the richness of connectivity options not yet at the same level as 40GbE. This paper discusses the use cases that will drive 100GbE adoption and how 40GbE now, and 100GbE in the future, will enable network architectures in which bandwidth is no longer the bottleneck in data centers.

40GbE and 100GbE in Data Centers


As shown in Figure 1 below, there are multiple connectivity options in the data center depending on media type (copper or fiber) and bandwidth needs (1GbE, 10GbE or 40GbE) at the rack or blade server. Server uplinks that have traditionally been 1GbE copper are now transitioning to 10GbE over copper and fiber. Server NICs with 40GbE ports have also started to become available. Bandwidth transitions at the server from 1GbE to 10GbE and 40GbE have driven a fivefold increase in switching system capacity in the past five years. Uplinks from the top of rack have correspondingly increased from 20Gbps to 320Gbps, a sixteenfold increase. Reductions in oversubscription rates at the top of rack are driving uplink bandwidth requirements even higher. However, servers have so far driven only about half of their 10GbE pipes, which has helped keep uplink bandwidth demand at the top of rack relatively constant. Advances in server CPU performance in line with Moore's Law will continue to drive higher throughput from the server pipes, which will force increased bandwidth requirements at the top of rack.
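
The uplink figures above translate directly into an oversubscription ratio at the top of rack. A minimal sketch, assuming illustrative (not product-specific) port counts:

```python
# Oversubscription ratio at a top-of-rack switch: downlink bandwidth vs uplink bandwidth.
# Port counts below are illustrative assumptions, not a specific Dell product.

def oversubscription(downlink_ports, downlink_gbps, uplink_ports, uplink_gbps):
    down = downlink_ports * downlink_gbps
    up = uplink_ports * uplink_gbps
    return down / up

# Legacy rack: 48 x 1GbE servers with 2 x 10GbE uplinks (20 Gbps total)
print(oversubscription(48, 1, 2, 10))    # 2.4:1

# 10GbE rack: 48 x 10GbE servers with 8 x 40GbE uplinks (320 Gbps total)
print(oversubscription(48, 10, 8, 40))   # 1.5:1
```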


The deployment of 40GbE NICs on servers will reach critical mass when server throughput exceeds 10Gbps and applications start to drive more than 10GbE of data. At that point the need will arise to carry traffic from 40GbE servers in a rack over 100GbE uplinks at the top of rack switch. The cutover to 100GbE at the top of rack will drive economies of scale for the 100GbE ecosystem, as the top of rack is the position in the data center network that drives the economics of networking technologies.

The first use case for 40GbE in top of rack switches was server connectivity in 10GbE breakout mode, because core switches were not yet ready with 40GbE interfaces. As 40GbE reach over multimode fiber improved from 150m to 400m, and with core refreshes to high-density fixed form factor 40GbE switches such as the Dell Networking Z-Series, 40GbE has been adopted as the uplink technology between the access and core/aggregation layers in the data center network. 40GbE direct attach cables and 40GbE active optical cables (AOC) have provided rich cabling alternatives for intra-pod and inter-pod connectivity within the data center. 40GbE single-mode fiber solutions such as PSM4 (a 4-lane solution over parallel single-mode fiber) and LR4 can now provide long-reach connectivity within modularized data centers and as interconnects between data centers.

100GbE, on the other hand, is beginning to experience very early adoption with the limited set of connectivity options that are currently available. Internet2 has been one of the earliest adopters of 100GbE technology in the backbone network, and connectivity to the Internet2 network from the data center is driving some of the earliest 100GbE deployments in the data center. Another early use case for 100GbE has been as a substitute for 10x10GbE link aggregation between switches within and across data centers. Cost-effective 100GbE optical solutions for intra-data center connectivity are limited to 100/150m reach over multimode fiber with the current generation of 100GbE technologies. 100GbE breakout to 10GbE and 40GbE, though available, has not reached the same level of reach and flexibility as 40GbE breakout to 4x10GbE.

As stated earlier, the key use case that will drive 100GbE adoption in the data center is 100GbE uplinks at the top of rack, driven by the transition from 10GbE to 40GbE server NICs and by servers exceeding 10Gbps of throughput. However, with the current 100GbE reach limitation over multimode fiber, the lack of copper cabling, and core refresh cycles with 100GbE at least a few years away, the options for 100GbE connectivity at the top of rack are limited. While a mature 10GbE and 40GbE ecosystem now exists (copper breakout cables, AOCs, optics and network test gear), the broader 100GbE ecosystem is just now starting to appear. Additionally, merchant silicon is enabling compelling cost and density for 40GbE fabrics.

End users must weigh several factors when they consider 100GbE in the data center:

- Future-proofing of the fiber plant for 100GbE
- Cost of optics and reuse
- Link distances
- Structured wiring availability


Given the plethora of emerging 100GbE connectivity options (CXP, CFP, CFP2, CFP4, PICs/BMO/SiPH) over both multimode and single-mode fiber, an analogy may help clarify the "how-why-where" of the various interconnect options. 40/100GbE links can be categorized as Plumbing links, Fabric links and Clustering links. Plumbing links are the much longer reach structured wiring links, which are expected to remain future proof through technology evolutions and to provide standards-based interoperability across multi-vendor solutions. Fabric links are often shorter reach, are intimately tied to the deployed technology (single-vendor solutions) and may need to be upgraded to match the deployed networking hardware. Clustering links are used as interconnects between switching systems with technologies such as stacking, multi-chassis link aggregation and multi-terabit system backplane interconnects.

This view implies that single-mode fiber, which removes reach limitations and accommodates future bandwidth growth, is the best choice for Plumbing links. Plumbing links are often far fewer in number than Fabric links, so the cost impact of single-mode optics is minimized. The significantly larger number of Fabric links naturally raises concerns about deployment cost; here, multimode fiber with its cheaper multimode optics, along with emerging photonic integrated optics, offers an economical solution. Clustering links interconnect collocated switching systems to facilitate intra-row or intra-pod communications, and cheaper multimode optics or copper direct attach cables are more amenable to such links.
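
The categorization above can be condensed into a simple selection heuristic. The sketch below is only an illustrative rule of thumb derived from this discussion; the link roles follow the text, while the distance thresholds are assumptions.

```python
# Illustrative media-selection heuristic for Plumbing, Fabric and Clustering links,
# following the categorization in the text. Distance thresholds are assumptions.

def suggest_media(link_role, distance_m):
    if link_role == "plumbing":
        # Long-reach structured wiring: single-mode removes reach limits and
        # survives future bandwidth upgrades.
        return "single-mode fiber (LR4/PSM4 class optics)"
    if link_role == "fabric":
        # Numerous shorter-reach links: cheaper multimode optics keep costs down.
        return "multimode fiber (SR4/ESR4 class optics)" if distance_m > 7 else "copper DAC"
    if link_role == "clustering":
        # Collocated systems: copper DAC or short multimode / AOC.
        return "copper DAC" if distance_m <= 7 else "multimode fiber or AOC"
    raise ValueError(f"unknown link role: {link_role}")

print(suggest_media("fabric", 50))      # multimode fiber (SR4/ESR4 class optics)
print(suggest_media("plumbing", 900))   # single-mode fiber (LR4/PSM4 class optics)
```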

40GbE connectivity characteristics


Table 1 shows the media types and connectivity characteristics of the different 40GbE technologies, including copper direct attach cable (DAC), multimode fiber and single-mode fiber. The 40G specifications shown (IEEE standard or proprietary) are available as products in the QSFP+ form factor, the only exception being 40GBASE-ER4, which is still under standardization and for which products are not yet available. Some 40GbE QSFP+ modules can be used in breakout mode (4x10GbE) to connect to 10GbE ports, a capability that has been greatly leveraged in data center deployments. QSFP+ to 4xSFP+ copper DAC cables are commonly used over short distances, whereas 40GbE SR4 and ESR4 QSFP+ modules are used in conjunction with MPO-LC breakout cables or passive cassettes to break the QSFP+ into 10Gbps lanes that connect to 10GBASE-SR modules over medium to longer reaches.
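
Breakout mode changes the effective 10GbE port count of a switch. A minimal sketch of that arithmetic, assuming a hypothetical 32-port QSFP+ switch rather than any specific product:

```python
# Effective 10GbE density when QSFP+ ports are run in 4x10GbE breakout mode.
# The 32-port figure is a hypothetical example, not a specific product.

qsfp_ports = 32        # hypothetical fixed form factor switch
breakout_lanes = 4     # each 40GbE QSFP+ breaks out to 4 x 10GbE

ten_gbe_ports = qsfp_ports * breakout_lanes
print(f"{qsfp_ports} QSFP+ ports -> {ten_gbe_ports} x 10GbE in breakout mode")  # 128
```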

A relatively new technology currently under development is Parallel Single Mode (PSM), which enables an architecture where multiple single-mode transceivers are interfaced to parallel single-mode fibers. For example, four 10Gbps transceivers are interfaced to eight single-mode fibers (four transmit fibers and four receive fibers) to carry 40GbE traffic or 4x10GbE traffic. This is similar to the parallel architecture currently used for 40GBASE-SR4 and ESR4 multimode modules. PSM is necessary to break out multiple lanes over single-mode fiber (e.g. 4x10GbE-LR breakout) at distances greater than multimode fiber supports. PSM can also be used for point-to-point transmission at 40Gbps (or 100Gbps in the future) where the use of multiple (parallel) fibers for point-to-point communication is possible and the cost of additional cabling is acceptable. However, this mode of operation is not compliant with the 40GbE standard and is considered a proprietary solution.
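
The main cabling difference between parallel (PSM) and duplex WDM (LR4) single-mode solutions is fiber count. A rough sketch of the arithmetic described above, with an arbitrary link count:

```python
# Fiber-count comparison: parallel single mode (PSM4) versus duplex WDM (LR4),
# following the description above. The link count is an arbitrary example.

def fibers_needed(links, fibers_per_direction):
    # Each direction needs its own fiber(s): transmit plus receive.
    return links * fibers_per_direction * 2

links = 24
# PSM4: 4 parallel lanes per direction -> 8 fibers per link
print("PSM4:", fibers_needed(links, fibers_per_direction=4))  # 192 fibers
# LR4: 4 WDM lanes multiplexed onto a single fiber per direction -> 2 fibers per link
print("LR4 :", fibers_needed(links, fibers_per_direction=1))  # 48 fibers
```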

Table 1: 40GbE connectivity options and specifications

40GbE specification | Media type | Breakout to 10GbE | Reach
40GBASE-CR4 | Twin-ax direct attach copper | 4 x 10GbE DAC | 7m
40GBASE-SR4 | Multimode fiber | 4 x 10GBASE-SR | 100/150m over OM3/OM4
40GbE ESR4 (not an IEEE standard) | Multimode fiber | 4 x 10GBASE-SR | 300/400m over OM3/OM4
40GbE-PSM4_LR (based on the 10GBASE-LR IEEE standard) | Single-mode fiber | 4 x 10GBASE-LR | 10km over SMF
40GbE PSM4 (not an IEEE standard) | Single-mode fiber | Not applicable | 2km
40GBASE-LR4 | Single-mode fiber | Not applicable | 10km
40GBASE-ER4 (ongoing specification in the IEEE, not yet a standard) | Single-mode fiber | Not applicable | 40km

The rich interconnect options in QSFP+ for point-to-point 40GbE links as well as 4x10GbE breakout have made 40GbE technology very attractive for data centers. Additionally, 40GbE modules are based on 10Gbps lane rates and can leverage the maturity and cost reduction of 10Gbps signaling across electrical and optical media. The QSFP+ is rated for 3.5W power consumption, and all solutions in the table above are below 3.5W. Multimode solutions (40GBASE-SR4 and ESR4) typically operate around 1.5W, whereas single-mode solutions typically operate closer to 3W. The 40GBASE-CR4 is a passive QSFP+ plug, and its power is drawn inside the chips on the system board.
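
These per-module figures add up at the system level. A minimal optics power-budget sketch, assuming a hypothetical fully populated 32-port QSFP+ switch and the typical per-module values quoted above:

```python
# Rough optics power budget for a fully populated QSFP+ switch, using the
# typical per-module figures quoted above. The 32-port chassis is hypothetical.

TYPICAL_POWER_W = {
    "40GBASE-CR4": 0.0,   # passive copper: power is drawn inside the host ASIC
    "40GBASE-SR4": 1.5,   # multimode, typical
    "40GBASE-LR4": 3.0,   # single-mode, typical
}

def optics_power(port_count, module_type):
    return port_count * TYPICAL_POWER_W[module_type]

for module in TYPICAL_POWER_W:
    print(f"32 x {module}: ~{optics_power(32, module):.1f} W")
```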

100GbE connectivity characteristics


Multiple form factors are available and emerging for 100GbE. Each generation of technology improvement drives down the space and power requirements of 100GbE in system designs, while each form factor lends itself to particular use cases that users should carefully consider when evaluating 100GbE technology alternatives in the data center.

Table 2: 100GbE connectivity options and specifications

100GbE form factor | Possible 100GbE solutions | Power | Use case
CXP | 10x10G copper cable and multimode optics | <3.5W | Useful for high-performance computing type interconnects in Clustering links.
CFP | Single-mode and multimode optics | <24W | Offers early adopters technology to enable 100GbE services.
CFP2 | Single-mode and multimode optics | <12W | Mature 100GbE interconnects that can be used in Fabric and Plumbing links.
CFP4 | 4x25G copper cable, single-mode and multimode optics | <6W | Used as 100GbE interconnects across access and core/aggregation layers in Fabric and Plumbing links in a data center.
QSFP28 | 4x25G copper cable, single-mode and multimode optics | <3.5W | Used as 100GbE interconnects across access and core/aggregation layers in Fabric and Plumbing links in a data center. The form factor may support parallel single-mode solutions in the future.
PICs/BMO/SiPH | 10x10G and/or 4x25G multimode and single-mode optics | Technology dependent | Photonic Integrated Circuits (PICs), Board Mounted Optical sub-assemblies (BMOs) and Silicon Photonics (SiPH) are ideal candidates for Clustering links and Fabric links, as their fixed media nature may not be suitable for heterogeneous multi-vendor solutions.


The following table shows the currently standardized solutions for 100GbE and their potential for interoperability with 10GbE and 40GbE.

Table 3: 100GbE standardization solutions

100GbE specification | Media type | Breakout to 10/40GbE | Reach
100GBASE-CR10 | Twin-ax direct attach copper | N x 10GbE DAC, N x 40GBASE-CR4 | 7m
100GBASE-SR10 | Multimode fiber | N x 10GBASE-SR, N x 40GBASE-SR4 | 100/150m over OM3/OM4
100GBASE-LR4 | Single-mode fiber | Not applicable | 10km
100GBASE-ER4 | Single-mode fiber | Not applicable | 40km

The 100GbE standard supports a 7m copper DAC and a 100/150m multimode solution, similar to 40GbE. The twin-ax copper and multimode fiber standards at 100GbE are based on 10Gbps signaling rates; therefore, interoperability with 10GbE as well as with 40GbE (which is internally based on 4x10G) is technically feasible. 100GBASE-LR4 is the 10km specification over SMF, whereas 100GBASE-ER4 is a 40km specification, similar to the 40GbE versions. Both LR4 and ER4 use 4x25G WDM optical lanes.
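
The lane structures behind these specifications are simple multiplications; the sketch below makes the 10x10G and 4x25G arithmetic explicit.

```python
# Lane arithmetic behind the 100GbE specifications above.

def aggregate_gbps(lanes, lane_rate_gbps):
    return lanes * lane_rate_gbps

print(aggregate_gbps(10, 10))  # 100GBASE-CR10 / -SR10: 10 lanes at 10 Gbps
print(aggregate_gbps(4, 25))   # 100GBASE-LR4 / -ER4:   4 WDM lanes at 25 Gbps
print(aggregate_gbps(4, 10))   # 40GbE, for comparison: 4 lanes at 10 Gbps
```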

100GbE solutions currently in use are in the CFP and CXP form factors. The solutions implemented are 100GBASE-SR10 and 100GBASE-LR4, as well as 100GBASE-CR10 copper cable. 100GBASE-ER4 technology has not yet matured enough to enable implementations that fit into power profiles similar to those of LR4 implementations.

As with 4x10G solutions in QSFP+, copper breakout cables with appropriate form-factor plugs at either end can provide 10/40GbE interoperability (e.g. CXP to 3xQSFP+), whereas multimode modules such as a 100GBASE-SR10 CFP or CXP can be broken out to Nx10G or Nx40G using optical breakout cables or cassettes.

Multimode Nx10Gbps optical transceiver chips have been packaged into board-mountable optical
modules. These modules can be used for 100GBASE-SR10 implementations, as well as for Nx10G or
Nx40G breakout.

However, at 100GbE the only option beyond the 100/150m reach of 100GBASE-SR10 is the 100GbE LR4 specification over 10km SMF. This requires users to transition from MMF to SMF and incurs the additional cost differential between MMF and SMF modules, which is greater at 100GbE today than at 40GbE or 10GbE. This has led to significant industry debate and a search for a lower-cost, reduced-reach solution over SMF (e.g. ~500m over SMF).
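
The cost tradeoff behind that debate can be framed as a simple comparison. The sketch below parameterizes it; the relative module costs are placeholder assumptions purely for illustration, not actual prices.

```python
# Illustrative framing of the MMF-vs-SMF tradeoff at 100GbE discussed above.
# All cost figures are relative, placeholder assumptions, not real prices.

def fabric_optics_cost(num_links, cost_per_module):
    # Two modules per link; cabling cost is ignored for simplicity.
    return num_links * 2 * cost_per_module

MMF_MODULE = 1.0    # 100GBASE-SR10 class, relative unit cost (assumption)
SMF_MODULE = 4.0    # 100GBASE-LR4 class, assumed several times more expensive

links = 64
print("MMF fabric optics:", fabric_optics_cost(links, MMF_MODULE))   # 128.0
print("SMF fabric optics:", fabric_optics_cost(links, SMF_MODULE))   # 512.0
# Beyond ~150m, MMF is simply unavailable at 100GbE today, which is what
# motivates the search for a lower-cost ~500m single-mode solution.
```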

Also, as increasing capacity drives the use of 100GbE in data center top of rack and core switches, and in higher-density systems across the board, 100GbE module form factors have started to shrink. Currently under development is the CFP2 form factor, which is roughly half the width of the CFP, doubling faceplate density. Both multimode and single-mode 100GbE solutions are expected in CFP2, and the CFP2 module power profile is expected to be below 12W.

The next reduction in module size will come with the CFP4 and QSFP28 form factors. These are roughly half the width of a CFP2 and hence enable a further 2x increase in density over CFP2. These modules require further advances in technology because of their smaller size and power capability. Copper, multimode and single-mode solutions could make use of the CFP4 and/or QSFP28 form factors. CFP4 and QSFP28 have a 4x25G electrical interface, which will result in more streamlined 4x25G optical/DAC implementations, whereas 10x10G implementations such as 100GBASE-SR10 will require additional electrical processing (a gearbox) inside the module compared to a CFP2. The CFP4 maximum power profile is <6W, while the QSFP28 is <3.5W, the same as QSFP+.
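
The practical effect of these form-factor transitions is faceplate density. A rough sketch of relative 100GbE port density, assuming a CFP baseline of 4 ports per rack unit (an assumption for illustration) and the roughly-half-the-width relationships described above; actual densities depend on host system design.

```python
# Relative faceplate density of the 100GbE form factors discussed above.
# The CFP baseline of 4 ports per rack unit is an assumption for illustration;
# the 2x steps follow the "roughly half the width" relationships in the text.

cfp_ports_per_ru = 4  # assumed baseline

density = {
    "CFP":    cfp_ports_per_ru,
    "CFP2":   cfp_ports_per_ru * 2,   # half the width of a CFP
    "CFP4":   cfp_ports_per_ru * 4,   # half the width of a CFP2
    "QSFP28": cfp_ports_per_ru * 4,   # also roughly half the width of a CFP2
}

for form_factor, ports in density.items():
    print(f"{form_factor:7s} ~{ports} x 100GbE ports per RU, ~{ports * 100} Gbps per RU")
```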

Standardization is underway in the IEEE to develop smaller, denser 100GbE solutions that will better fit into a CFP4/QSFP28. These solutions will rely on four lanes with 25Gbps signaling rates per lane. The specifications being created are 100GBASE-CR4, a 5m solution over twin-ax DAC cable, and 100GBASE-SR4, a ~70–100m solution over multimode OM3/OM4 fiber. The standards body is also considering a single-mode solution that will work over at least 500m, which will serve as the intermediate-reach solution between 150m MMF and 10km SMF at 100GbE.

To continue supporting 10/40GbE over 4x25G (100G) lane technology, the OIF has created the Multi-Link Gearbox (MLG) specification, which enables N x 10GbE and M x 40GbE links across 4x25G interfaces. MLG technology will have to be implemented in system chips and in the gearbox chips used inside modules to support backward compatibility with 10/40GbE as more and more systems move to 4x25G designs.
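
A minimal sketch of the bookkeeping MLG implies is shown below: it checks whether a requested mix of 10GbE and 40GbE logical links fits within the 100Gbps carried by a 4x25G interface. This models aggregate capacity only, not the actual OIF MLG lane-mapping rules.

```python
# Simplified capacity check for carrying N x 10GbE and M x 40GbE logical links
# over a 4x25G (100 Gbps) interface, in the spirit of the OIF MLG specification.
# This models only aggregate capacity, not the real MLG lane-mapping rules.

def fits_mlg(n_10g, m_40g, interface_gbps=100):
    required = n_10g * 10 + m_40g * 40
    return required <= interface_gbps

print(fits_mlg(n_10g=10, m_40g=0))  # True: 10 x 10GbE
print(fits_mlg(n_10g=2,  m_40g=2))  # True: 2 x 10GbE + 2 x 40GbE
print(fits_mlg(n_10g=4,  m_40g=2))  # False: 120 Gbps exceeds the interface
```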

Dell Active Fabric Solutions


Consumer demand, an exponential increase in Internet traffic, big data, data-intensive distributed applications such as Hadoop, data center consolidation, virtualization and video are driving the need for high-density, high-performance solutions. Analysts predict that in the next three years 80 percent of all data center traffic will be contained within the data center (also known as east-west traffic) and only 20 percent will be traffic leaving the data center, such as Internet traffic (also referred to as north-south traffic). A primary example of this new traffic pattern can be seen in web search engine applications. The typical web search engine requires simultaneous communication with every node in the cluster or rack to provide the most relevant results; furthermore, web servers may require interaction with hundreds of sub-services that could be running on remote nodes.

Dell Active Fabric solutions are high-performance network fabrics that can meet the resiliency, performance and scaling needs of enterprise and cloud data centers at reasonable price points while optimizing space and power. Active Fabric is a leaf-and-spine fabric architecture based on a fat-tree Clos architecture. It can provide full bisection bandwidth between any two compute nodes in a large cluster using high-capacity, low-cost fixed form factor Ethernet switches. Dell Active Fabric solutions enable scale-out designs in data centers to meet the bandwidth requirements of east-west traffic. The leaf-spine architecture consists of two types of nodes that are used to construct a fabric. The leaf nodes connect directly to the storage or servers in the rack, or connect to top-of-rack switches. The spine nodes connect to each of the leaf nodes, providing an interconnect function between leaf nodes. Full non-blocking fabrics can be supported with an even distribution of bandwidth at the leaf layer (half the links towards the spine and half the links towards compute nodes). The leaf-to-spine connections are fabric links that can be based on 40GbE interconnects over multimode or single-mode fiber, depending on the collocation requirements of the leaf and spine switches. In the future, 100GbE technologies can be used for fabric links to further optimize cabling and support mixed 10GbE and 40GbE compute nodes.
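
The non-blocking sizing rule above (half the leaf ports toward servers, half toward the spine) can be captured in a short calculation. A minimal sketch, assuming identical fixed form factor switches with a hypothetical 32 x 40GbE port count rather than any specific product:

```python
# Sizing sketch for a non-blocking leaf-spine (Clos) fabric in the Active Fabric
# style: each leaf dedicates half its ports to servers and half to spine uplinks.
# The 32 x 40GbE switch is a hypothetical building block, not a specific product.

def leaf_spine_capacity(leaf_count, ports_per_switch, port_gbps):
    downlinks_per_leaf = ports_per_switch // 2                 # toward servers
    uplinks_per_leaf = ports_per_switch - downlinks_per_leaf   # toward the spine
    spine_count = uplinks_per_leaf        # one uplink from each leaf to each spine
    server_ports = leaf_count * downlinks_per_leaf
    bisection_gbps = leaf_count * uplinks_per_leaf * port_gbps
    return spine_count, server_ports, bisection_gbps

spines, servers, bisection = leaf_spine_capacity(leaf_count=16,
                                                 ports_per_switch=32,
                                                 port_gbps=40)
print(f"{spines} spines, {servers} x 40GbE server-facing ports, "
      f"{bisection / 1000:.1f} Tbps bisection bandwidth")
# 16 spines, 256 server-facing ports, 10.2 Tbps
```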

Fixed form factor switches provide superior fabric economics for Active Fabrics through low-cost, low-power, compact and high-density switching devices. Resiliency is achieved through path diversity in the fabric, using layer 2 or layer 3 multipathing techniques between the leaf and spine layers. The loss of switching capacity in case of a link or node failure is significantly lower in an Active Fabric architecture than in conventional designs. For example, a 2x4 leaf-spine fabric would lose only 25% of its switching capacity if a leaf device fails, compared to a 50% loss when one of the devices fails in a two-node conventional core design. No impact to application performance on device failure has been observed in large Active Fabric designs, due to the resiliency of the fabric architecture. Scale out with an Active Fabric solution is achieved by adding leaf and spine nodes as compute requirements increase. A unique differentiator of fixed form factor switches as building blocks for an Active Fabric is the option to repurpose the switches to other positions in the network when the next generation of core devices becomes available.
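
The resiliency comparison above is easy to verify. A minimal sketch of the capacity-loss arithmetic for the 2x4 leaf-spine example versus a two-node core:

```python
# Capacity-loss arithmetic behind the resiliency example above.

def capacity_loss_pct(total_nodes, failed_nodes=1):
    """Fraction of switching capacity lost when nodes at one layer fail,
    assuming capacity is spread evenly across the nodes at that layer."""
    return 100.0 * failed_nodes / total_nodes

print(capacity_loss_pct(total_nodes=4))  # 25.0% -> one leaf fails in a 2x4 leaf-spine fabric
print(capacity_loss_pct(total_nodes=2))  # 50.0% -> one device fails in a two-node core
```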

Benefits of Active Fabric Architecture


There are several advantages of the Active Fabric architecture over conventional chassis-based
architecture:

Cost: The Active Fabric architecture can be scaled to extremely large capacities using attractively
priced fixed form factor Ethernet switches as opposed to complex, expensive chassis-based switches.

Scalability: The Active Fabric architecture is massively scalable and can be used to build large-capacity data centers.

Performance: The non-blocking architecture delivers high performance and supports any-to-any
communication with full bisection bandwidth at line rate between any two servers.

Scale Out: The architecture enables scale-out networking to increase fabric capacity as compute demands grow, by adding leaf and spine nodes without inserting additional layers of switches.


Resiliency: Network components in a traditional architecture need to be updated or replaced periodically to scale up. The Active Fabric architecture allows a node to be brought down or replaced without any impact on the overall switch fabric. This built-in resiliency is another key advantage of the leaf-spine architecture.

Flexibility: The distributed core approach allows the use of either layer 2 or layer 3 protocols for
switching and routing within the fabric.

Why Dell?
Dell understands the requirements of the new data center, and its leadership in the server technology market grants unique insight into the ever-increasing demands on IT and the changing role it will play in the next decade. With the popularity of virtualization and cloud computing, the enterprise IT landscape has changed drastically. IT is no longer considered a cost center, but rather a potential source of revenue.

Providers must embrace these changes and respond to them through innovation and openness. The field of information technology is far too fluid to continue down the same path of proprietary solutions that lead to vendor lock-in and rigid environments.

Dell is the only vendor with an end-to-end converged infrastructure solution comprising rack servers, blade servers, storage offerings (iSCSI, FCoE) and top of rack and core switching products that can scale from the smallest (<5 racks) to the largest (>100 racks) data centers. In addition, Dell has the tools and management systems to seamlessly manage the converged infrastructure end to end. Highlights of Dell data center product offerings include:

 A complete 1/10/40GbE networking portfolio
 A rich data center software feature set, including DCB, FIP snooping, stacking and VLT
 Industry-leading server technology: blade, rack, converged
 Storage (FCoE and iSCSI) with Dell’s Compellent and EqualLogic portfolios
 Security (Dell SonicWall)
 Cloud solutions (vCloud), Active Systems, OpenStack
 Open automation software
 Simple, easy-to-use management tools

The following documentation is available for more information on Dell’s enterprise portfolio:

Dell Networking Quick Reference Guide

Dell Data Center Switching Quick Guide

Dell Networking Brochure

10
Dell Data Center Design Considerations with 40GbE and 100GbE

This document is for informational purposes only and may contain typographical errors. The
content is provided as is, without express or implied warranties of any kind.

© 2013 Dell Inc. All rights reserved. Dell and its affiliates cannot be responsible for errors or omissions
in typography or photography. Dell and the DELL logo are trademarks of Dell Inc. Intel and Xeon are
registered trademarks of Intel Corporation in the U.S. and other countries. Microsoft, Windows, and
Windows Server are either trademarks or registered trademarks of Microsoft Corporation in the United
States and/or other countries. Other trademarks and trade names may be used in this document to
refer to either the entities claiming the marks and names or their products. Dell disclaims proprietary
interest in the marks and names of others.

August 2013 | Rev 1.0
