Kenny Ranerup
Director of Engineering
SwitchCore AB
kenny.ranerup@switchcore.com
Ethernet
[Figure: Ethernet frame layout. Labels shown: IPG, preamble, data (46-1500 bytes), CRC; a field width of 6 bytes is marked.]
10 Mbit/s.
Coaxial cable.
Shared medium.
10-100 machines on one network.
Physical size limited.
[Figure: terminals (TE) connected through hubs; an Ethernet frame is shown on the network.]
Simplex.
100 Mbit/s.
Same logical and physical structure as
10BASE-T.
Same cable type as 10BASE-T (two twisted
pairs), but requires Category 5 cable.
More advanced physical encoding (4B/5B, MLT-3)
allows a 100 Mbit/s data rate within a frequency
spectrum of only about 31 MHz.
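The 31 MHz figure can be checked with a small sketch. It assumes the usual description of the two encodings: 4B/5B sends 5 code bits for every 4 data bits, and MLT-3 steps through the levels 0, +1, 0, -1 on each 1 bit and holds on each 0 bit. The function name `mlt3_encode` is illustrative only.

```python
def mlt3_encode(bits):
    """MLT-3 sketch: a 1 bit advances through 0, +1, 0, -1; a 0 bit holds the level."""
    cycle = [0, 1, 0, -1]
    idx = 0
    levels = []
    for bit in bits:
        if bit:
            idx = (idx + 1) % 4
        levels.append(cycle[idx])
    return levels

DATA_RATE_MBIT_S = 100
# 4B/5B: every 4 data bits become 5 code bits on the wire.
line_rate_mbaud = DATA_RATE_MBIT_S * 5 / 4
# Worst case for MLT-3 is a run of ones: one full waveform period takes 4 symbols.
fundamental_mhz = line_rate_mbaud / 4

print(mlt3_encode([1] * 8))
print(line_rate_mbaud, "Mbaud,", fundamental_mhz, "MHz")
```

Because a full MLT-3 cycle needs four symbols, the highest fundamental frequency is a quarter of the 125 Mbaud line rate.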
Ethernet PHY: 1000BASE-SX/LX
1 Gbit/s.
Based on the Fibre Channel physical layer, which
is fibre-optic.
Data is transmitted over a pair of fibres.
Available today.
1000BASE-T: Encoding
[Figure: terminals (TE) connected to the ports of a switch.]
Arbitration protocol
hidden inside switch
fabric, i.e. no longer
limited by CSMA/CD.
Many senders and
receivers at the same
time.
Total bandwidth is limited only
by the switch fabric's
internal design.
[Figure: several interconnected switches, each with attached terminals (TE).]
Can be extended to
arbitrary size.
Address Lookup
Forwarding
Transfer the packet from
the input port to the correct
output port.
One common solution is a
crossbar which can connect
all inputs to all outputs.
Switch fabric
Queuing
In some cases the packet has to be stored
temporarily before transmitting.
If the output port speed differs from the input port.
If several inputs transmit to the same output.
If some packets have higher priority and are
therefore not sent in the arrival order.
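The third case, where priority overrides arrival order, can be sketched as a small output queue. This is only an illustration under assumed conventions: the class name `OutputQueue` is hypothetical, a lower number means higher priority, and frames of equal priority leave in arrival order.

```python
import heapq
from itertools import count

class OutputQueue:
    """Per-port output queue: highest priority first, arrival order within a priority."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker that preserves arrival order

    def enqueue(self, frame, priority=1):
        # Assumption: lower number = higher priority.
        heapq.heappush(self._heap, (priority, next(self._arrival), frame))

    def dequeue(self):
        """Return the next frame to transmit, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

q = OutputQueue()
q.enqueue("frame A", priority=1)
q.enqueue("frame B", priority=0)  # arrives later but has higher priority
q.enqueue("frame C", priority=1)
sent = [q.dequeue() for _ in range(3)]
print(sent)
```

Here frame B leaves first even though it arrived second, while A and C keep their relative order.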
Output queue
Layer-2 switching
Address lookup
Address learning/updates
The switch learns address to port mapping by
listening to source addresses in all incoming
frames, thereby determining which addresses
are present at each port.
If an address isn't present in the lookup table,
the frame is flooded (broadcast) to all ports.
This simulates a shared LAN.
Due to the need for broadcast for address
learning (and other functions), L2 switches
don't scale very well.
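The learning and flooding behaviour described above can be sketched in a few lines. This is a minimal model, not a real switch: the class `LearningSwitch` is hypothetical, there is no address aging, and flooding is assumed to skip the port the frame arrived on.

```python
class LearningSwitch:
    """Toy layer-2 switch: learns source addresses, floods unknown destinations."""

    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.table = {}  # MAC address -> port number

    def receive(self, in_port, src, dst):
        """Return the list of ports the frame is sent out on."""
        # Learning: the source address is reachable through the ingress port.
        self.table[src] = in_port
        out_port = self.table.get(dst)
        if out_port is None:
            # Unknown destination: flood (assumed here: all ports except ingress).
            return [p for p in range(self.num_ports) if p != in_port]
        if out_port == in_port:
            return []  # destination is on the same segment; nothing to forward
        return [out_port]

sw = LearningSwitch(num_ports=4)
first = sw.receive(in_port=0, src="aa", dst="bb")   # "bb" unknown: flooded
reply = sw.receive(in_port=2, src="bb", dst="aa")   # "aa" was learned on port 0
again = sw.receive(in_port=0, src="aa", dst="bb")   # "bb" is now known on port 2
print(first, reply, again)
```

Only the first frame is flooded; once both stations have sent a frame, traffic between them uses a single output port.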
Layer-3 switching
Address lookup
Address learning/updates
Bus Based
Bus based.
Crossbar switch.
Multi-stage network.
Shared memory.
Combinations thereof.
Shared bus
Crossbar switch
Multi-stage network.
Scalable.
Requires a large number of chips.
Store-and-forward results in long latencies.
Shared memory
Input Buffering
[Figure: input buffering; packets queued at inputs In1 to In4, with output Out3 marked as congested.]
Output buffering
Packet buffering
No head-of-line blocking.
Broadcast traffic is duplicated.
Buffer bandwidth must be much higher, since the
buffer must be able to absorb traffic from all input
ports.
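Head-of-line blocking, which output buffering avoids, can be illustrated with a tiny time-slot model. The sketch below makes simplifying assumptions: all packets are queued at time zero, each output accepts one packet per slot, inputs are served in fixed index order, and the output-buffered case is idealized so that the fabric can move every arrival to its output queue at once. The function names are illustrative.

```python
from collections import deque

def slots_with_input_fifos(inputs):
    """Slots needed when each input is a FIFO and only its head packet may be sent."""
    queues = [deque(q) for q in inputs]
    slots = 0
    while any(queues):
        busy_outputs = set()
        for q in queues:  # fixed priority: lower input index wins a conflict
            if q and q[0] not in busy_outputs:
                busy_outputs.add(q.popleft())
        slots += 1
    return slots

def slots_with_output_queues(inputs):
    """Slots needed when packets wait at their output: the busiest output decides."""
    per_output = {}
    for q in inputs:
        for out in q:
            per_output[out] = per_output.get(out, 0) + 1
    return max(per_output.values(), default=0)

# Each inner list holds the destination outputs of one input's queued packets.
traffic = [[3, 1], [3, 2]]
print(slots_with_input_fifos(traffic), slots_with_output_queues(traffic))
```

With input FIFOs the second input's packet for output 2 waits behind a packet for the contended output 3, so the transfer takes three slots instead of two.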
Buffer fragmentation
[Figure: plot against packet length in bytes; axis ticks at 64, 400 and 1500.]
Cut-through / Store-and-forward
1518
Frames span multiple cells (small external
fragmentation).
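How much space a cell-based buffer leaves unused can be estimated with a short calculation. The cell size of 256 bytes is an assumption taken from the figure label on this slide; the function names are illustrative.

```python
CELL_SIZE_BYTES = 256  # assumption: taken from the "256" label in the slide figure

def cells_needed(frame_len, cell_size=CELL_SIZE_BYTES):
    """Number of fixed-size cells a frame occupies (integer ceiling division)."""
    return (frame_len + cell_size - 1) // cell_size

def unused_bytes(frame_len, cell_size=CELL_SIZE_BYTES):
    """Bytes left unused in the frame's last cell."""
    return cells_needed(frame_len, cell_size) * cell_size - frame_len

for length in (64, 400, 1518):
    print(length, cells_needed(length), unused_bytes(length))
```

Under this assumption a maximum-length 1518-byte frame spans six cells and leaves only 18 bytes unused, while a minimum-length 64-byte frame leaves most of its single cell empty.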
[Figure: a frame stored across buffer cells; label: 256.]
Congestion
Congestion Control
[Figure: congestion example with traffic rates of 0.5, 0.8 and 1.0 Gbps.]
PAUSE frames
Admission Control
TCP slow-start.
QoS
QoS: Latency
QoS: Bandwidth
Class-of-Service / DiffServ
[Figure: DiffServ domains with boundary nodes and interior nodes; each node is a switch or router.]
VLSI Architecture for Gigabit Ethernet Switches
Switch Fabrics
Design goals:
Desired characteristics:
Switch Architecture
[Figure: switch architecture block diagram; MAC ports, CAM, serial-to-parallel conversion, shared buffer memory, parallel-to-serial conversion.]
Advantages:
Disadvantages:
Switching at 1 Gbit/s
32 ports @ 1 Gbit/s
32 * 2 * 125 Mbyte/s = 8 Gbyte/s
32 ports @ 10 Gbit/s
32 * 2 * 1250 Mbyte/s = 80 Gbyte/s
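The two bandwidth figures above follow the same pattern: every port both writes to and reads from the shared buffer, hence the factor of 2. A small sketch of that arithmetic, with an illustrative function name:

```python
def buffer_bandwidth_gbyte_s(ports, port_rate_gbit_s):
    """Shared-buffer bandwidth: each port writes and reads at line rate."""
    mbyte_s_per_port = port_rate_gbit_s * 1000 / 8  # 1 Gbit/s = 125 Mbyte/s
    return ports * 2 * mbyte_s_per_port / 1000      # factor 2: write + read

for rate in (1, 10):
    print(f"32 ports @ {rate} Gbit/s: {buffer_bandwidth_gbyte_s(32, rate)} Gbyte/s")
```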
32 ports @ 1 Gbit/s
48 million packets/s.
3 lookups/packet
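One plausible way to arrive at the 48 million packets/s figure is to assume every port carries back-to-back minimum-size frames. The sketch below uses the standard Ethernet numbers (64-byte minimum frame, 8-byte preamble, 12-byte inter-packet gap); whether the slide used exactly this reasoning is an assumption.

```python
MIN_FRAME_BYTES = 64   # minimum Ethernet frame, including CRC
PREAMBLE_BYTES = 8     # preamble plus start-of-frame delimiter
IPG_BYTES = 12         # inter-packet gap

def max_packets_per_second(ports, port_rate_bit_s):
    """Worst-case packet rate with minimum-size frames on every port."""
    bits_per_packet = (MIN_FRAME_BYTES + PREAMBLE_BYTES + IPG_BYTES) * 8
    return ports * port_rate_bit_s / bits_per_packet

pps = max_packets_per_second(32, 1e9)
lookups_per_second = 3 * pps  # 3 lookups per packet, as stated on the slide
print(f"{pps / 1e6:.1f} Mpackets/s, {lookups_per_second / 1e6:.1f} Mlookups/s")
```

Under these assumptions the result is about 47.6 million packets/s, which rounds to the quoted 48 million, and roughly 143 million lookups/s.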
Pin Count
Low swing to reduce average power and
peak currents.
High frequency to reduce number of pins.
Very difficult to avoid skew on PCB. Signaling
should probably be asynchronous.
Signals must not switch simultaneously.
Interface must be standardized.
Processor Performance
Single processor.
48 million packets/s (32 ports @ 1 Gbit/s).
1000 MIPS processor.
20 instructions/packet.
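The instruction budget is simply the processor's instruction rate divided by the packet rate. A minimal sketch of that division follows; the `processors` parameter and the one-processor-per-port case are hypothetical additions for comparison, not something these lines state.

```python
def instructions_per_packet(mips, packets_per_second, processors=1):
    """Average instruction budget per packet for a given processing capacity."""
    return processors * mips * 1e6 / packets_per_second

PACKETS_PER_SECOND = 48e6  # 32 ports @ 1 Gbit/s, from the slide

single = instructions_per_packet(1000, PACKETS_PER_SECOND)
# Hypothetical: one 1000 MIPS processor per port (32 in total).
per_port = instructions_per_packet(1000, PACKETS_PER_SECOND, processors=32)
print(f"single: {single:.1f}, one per port: {per_port:.0f} instructions/packet")
```

A single 1000 MIPS processor gets just over 20 instructions per packet, which leaves very little room for anything beyond a basic lookup.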
Packet Processing
The amount of processing performed for
each packet is increasing rapidly as more
advanced services are introduced.
Programmable solutions are required to give
sufficient flexibility and speed of
implementation.
670 instructions/packet.
Design Flow
[Figure: chip block diagram; MAC, PDEC, ReEnc, CAM, address lookup, serial-to-parallel conversion, shared buffer memory, parallel-to-serial conversion, Q-Engine, CPU i/f, Rambus i/f.]
Design Approach
Choose an architecture that makes it
possible to utilize the silicon.
The design is divided into a number of
blocks with 1-2 designers per block.
Critical blocks are designed in full custom;
these are mostly memory or datapath-like structures.
All other blocks are designed at RTL
and synthesized to standard cells.
Synthesis
Full-Custom Design
Methodology Comparison
Others (semi-custom approach): 30 mm², 6.3 W (~3x the area, ~12x the power)
SwitchCore: 12 mm², 0.5 W
SwitchCore AB
Characteristics
References