A 24-port 10G Ethernet Switch
(with asynchronous circuitry)

Andrew Lines

Agenda

Product Information
Technical Details
Photos

Tahoe: First FocalPoint Family Member


The lowest-latency, feature-rich 10GE switch chip

10G Ethernet switch
- 24 ports

Line-rate performance
- 240Gb/s bandwidth
- 360M frames/s
- Full-speed multicast

Fully integrated single chip
- 1MB frame memory
- 16K MAC addresses

Lowest-latency Ethernet
- 200ns with copper cables

Rich feature set
- Extensive layer-2 features

Flexible SERDES interfaces
- 10G XAUI (CX-4)
- 1G SGMII
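The line-rate figures follow from simple arithmetic. This sketch assumes minimum-size 64-byte frames plus the standard 8-byte preamble/SFD and 12-byte inter-frame gap (wire overhead the slide does not state); under those assumptions 24 ports come to roughly 357M frames/s, which the slide rounds to 360M.

```python
# Back-of-the-envelope check of the line-rate figures.
# Assumption: minimum-size frames with standard Ethernet wire overhead.
PORTS = 24
PORT_RATE_BPS = 10e9          # 10 Gb/s per port
MIN_FRAME_BYTES = 64
PREAMBLE_SFD_BYTES = 8
IFG_BYTES = 12


def wire_bits_per_frame(frame_bytes: int) -> int:
    """Bits one frame occupies on the wire, including preamble/SFD and IFG."""
    return (frame_bytes + PREAMBLE_SFD_BYTES + IFG_BYTES) * 8


aggregate_bps = PORTS * PORT_RATE_BPS
per_port_fps = PORT_RATE_BPS / wire_bits_per_frame(MIN_FRAME_BYTES)
aggregate_fps = PORTS * per_port_fps

print(f"aggregate bandwidth:  {aggregate_bps / 1e9:.0f} Gb/s")
print(f"per-port frame rate:  {per_port_fps / 1e6:.2f} M frames/s")
print(f"aggregate frame rate: {aggregate_fps / 1e6:.1f} M frames/s")
```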

[Figure: Tahoe block diagram. SPI, CPU, JTAG, and LED interfaces; Frame Processor (Scheduler); RapidArray™ (packet storage); Nexus® crossbars; XAUI (CX-4) ports. The asynchronous blocks are highlighted.]

Tahoe Hardware Architecture

Modular architecture, centralized control

[Figure: Tahoe hardware architecture. SPI, CPU, JTAG, and LED interfaces; Management and Frame Control blocks (LCI, Lookup, Handler, Stats, Scheduler); per-port SerDes, PCS, and MAC blocks with RX and TX port logic; a switch element data path joining the ports through the Nexus® crossbars to RapidArray™ (1MB shared memory).]

Tahoe Chip Plot

Fabricated in TSMC 0.13um

Ethernet Port Logic
- SerDes
- PCS
- MAC

Nexus Crossbars
- 1.5Tb/s total
- 3ns latency

RapidArray Memory
- 1MB shared

Scheduler
- Highly optimized
- High event rate

MAC Table
- 16K addresses

Management
- CPU interface
- JTAG
- EEPROM interface
- LEDs

Frame Control
- Frame handler
- Lookup
- Statistics

Bridge Features

Robust set of layer-2 features

General Bridge Features
- 16K MAC entries
- STP: multiple (MSTP), rapid (RSTP), and standard
- Learning and ageing
- Multicast: GMRP and IGMPv3
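Learning and ageing can be sketched as a table keyed by (VLAN ID, MAC address). This is a software model, not Tahoe's hardware table: the 16K capacity comes from the slide, the 300-second ageing time is the IEEE 802.1D default rather than a documented Tahoe setting, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Key = Tuple[int, str]  # (VLAN ID, MAC address)


@dataclass
class MacTable:
    capacity: int = 16 * 1024  # 16K entries, as on the slide
    max_age_s: float = 300.0   # IEEE 802.1D default; Tahoe's value is an assumption
    entries: Dict[Key, Tuple[int, float]] = field(default_factory=dict)  # key -> (port, last seen)

    def learn(self, vlan: int, src_mac: str, port: int, now: float) -> bool:
        """Record or refresh where a source address was seen; False if the table is full."""
        key = (vlan, src_mac)
        if key not in self.entries and len(self.entries) >= self.capacity:
            return False
        self.entries[key] = (port, now)
        return True

    def lookup(self, vlan: int, dst_mac: str) -> Optional[int]:
        """Egress port for a destination, or None if unknown (the frame would be flooded)."""
        entry = self.entries.get((vlan, dst_mac))
        return entry[0] if entry else None

    def age_out(self, now: float) -> int:
        """Remove entries not refreshed within max_age_s; return how many were removed."""
        stale = [k for k, (_, seen) in self.entries.items() if now - seen > self.max_age_s]
        for k in stale:
            del self.entries[k]
        return len(stale)


table = MacTable()
table.learn(10, "00:11:22:33:44:55", port=3, now=0.0)
print(table.lookup(10, "00:11:22:33:44:55"))  # 3
table.age_out(now=400.0)
print(table.lookup(10, "00:11:22:33:44:55"))  # None
```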

VLAN Tagging (IEEE 802.1Q-2003)
- Add / remove tags
- Default VLAN association per port
- 4K-entry VLAN-ID table
- Per-VLAN, per-port STP

Scheduling, Pause, Congestion
- 16 traffic classes for WRED
- 4 scheduling queues per port
- WRR or strict priority
- Pause support
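The two scheduling disciplines can be contrasted with a minimal sketch. It assumes four per-port queues with queue 0 as the highest priority and positive integer weights; the queue contents and weights are hypothetical, and this toy counts whole frames rather than bytes.

```python
from collections import deque
from typing import List, Optional, Sequence


def strict_priority(queues: Sequence[deque]) -> Optional[str]:
    """Dequeue from the highest-priority non-empty queue (index 0 = highest)."""
    for q in queues:
        if q:
            return q.popleft()
    return None


def weighted_round_robin(queues: Sequence[deque], weights: Sequence[int], budget: int) -> List[str]:
    """Dequeue up to `budget` frames; each round takes up to weights[i] frames from queue i.
    Weights are assumed to be positive."""
    served: List[str] = []
    while len(served) < budget and any(queues):
        for q, w in zip(queues, weights):
            for _ in range(w):
                if not q or len(served) == budget:
                    break
                served.append(q.popleft())
    return served


def make_queues() -> List[deque]:
    # Four hypothetical per-port queues; names encode queue number and arrival order.
    return [deque(["q0a", "q0b", "q0c"]), deque(["q1a", "q1b"]),
            deque(["q2a"]), deque(["q3a", "q3b"])]


print(strict_priority(make_queues()))
print(weighted_round_robin(make_queues(), weights=[2, 1, 1, 1], budget=8))
```

Strict priority can starve the lower queues while queue 0 stays busy; WRR bounds each queue's share by its weight.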

Security
- 802.1X; MAC address security

Monitoring
- Rich set of monitoring terms
  • Logical combination of terms
  • Src Port, Dst Port, VLAN, Traffic Type, Priority, Src MA, Dst MA, etc.
- Monitoring actions
  • Drop, Mirror, Redirect, Count, Change Priority
- 16 rules per frame
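A monitoring rule as described pairs a logical combination of terms with an action. This sketch assumes frames are dictionaries of header fields, that a rule's condition is an arbitrary Boolean function of those fields, and that "16 rules per frame" means at most 16 matching rules act on one frame; the field names and rules are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Frame = Dict[str, object]

MAX_RULES_PER_FRAME = 16  # assumed meaning of "16 rules per frame"


@dataclass
class Rule:
    name: str
    condition: Callable[[Frame], bool]  # a logical combination of terms
    action: str                         # "drop", "mirror", "redirect", "count", "change_priority"


def matching_actions(frame: Frame, rules: List[Rule]) -> List[str]:
    """Evaluate every rule and return the actions of the first 16 that match."""
    hits = [rule.action for rule in rules if rule.condition(frame)]
    return hits[:MAX_RULES_PER_FRAME]


rules = [
    # Mirror traffic arriving on port 5 in VLAN 10.
    Rule("mirror-p5-v10", lambda f: f["src_port"] == 5 and f["vlan"] == 10, "mirror"),
    # Count high-priority frames, or any frame to one multicast destination.
    Rule("count-hi", lambda f: f["priority"] >= 6 or f["dst_ma"] == "01:00:5e:00:00:01", "count"),
    # Drop frames from a blocked source address unless they are in VLAN 1.
    Rule("drop-src", lambda f: f["src_ma"] == "00:de:ad:be:ef:00" and f["vlan"] != 1, "drop"),
]

frame = {"src_port": 5, "dst_port": 9, "vlan": 10, "priority": 7,
         "src_ma": "00:de:ad:be:ef:00", "dst_ma": "00:11:22:33:44:55"}
print(matching_actions(frame, rules))
```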

Statistics
- RFC 2819 compliant
- All counters are 64 bits
- 13 counter groups
  • RMON and SMON
  • Fulcrum extensions
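One reason for 64-bit counters is wrap time at 10G. This sketch assumes an octet counter on one port running continuously at full line rate (1.25GB/s): a 32-bit counter would wrap every few seconds, while a 64-bit counter effectively never does.

```python
LINE_RATE_BPS = 10e9
BYTES_PER_S = LINE_RATE_BPS / 8           # 1.25e9 bytes/s at full 10G line rate
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def wrap_time_s(counter_bits: int, bytes_per_s: float = BYTES_PER_S) -> float:
    """Seconds until an octet counter of the given width wraps at full line rate."""
    return 2 ** counter_bits / bytes_per_s


print(f"32-bit octet counter wraps after {wrap_time_s(32):.2f} s")
print(f"64-bit octet counter wraps after {wrap_time_s(64) / SECONDS_PER_YEAR:.0f} years")
```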

Link Aggregation and Fat Tree Support

True IEEE-compliant Link Aggregation is used to group links between line and fabric switches.

Symmetric hashing guarantees that a conversation resolves to the same fabric switch.

[Figure: Fat tree of Line Chips connected through Intra-switch Links (ISLs) to Fabric Chips. The ingress-to-fabric hop uses Link Aggregation hardware to load balance; traffic between MAC A and MAC B is shown crossing the fabric.]

Link Aggregation chip features

Configuration
- 12 trunk groups
- Any ports in a group
- Up to 12 members

Hash: Ethernet CRC
- Programmable input
  • SA, DA, Type, VLAN-ID, Priority, Source port
- SA-DA hash symmetry forcing
- Group renumbering

Other HW hooks
- Slow protocol traps
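Symmetric hashing can be sketched as follows, assuming the hash covers only SA, DA, and VLAN ID and that the member is chosen by taking the CRC modulo the group size. Both are assumptions: the slide says only that the input is programmable and that SA-DA symmetry can be forced. Python's `zlib.crc32` computes the standard CRC-32 used for the Ethernet FCS; putting the two addresses in a canonical order before hashing makes A→B and B→A traffic select the same member.

```python
import zlib


def lag_member(src_mac: str, dst_mac: str, vlan: int, n_members: int,
               symmetric: bool = True) -> int:
    """Pick a link-aggregation member for a frame (hypothetical hash layout)."""
    sa = bytes.fromhex(src_mac.replace(":", ""))
    da = bytes.fromhex(dst_mac.replace(":", ""))
    if symmetric and da < sa:
        sa, da = da, sa  # canonical order: the hash no longer depends on direction
    key = sa + da + vlan.to_bytes(2, "big")
    # zlib.crc32 uses the same CRC-32 polynomial as the Ethernet FCS.
    return zlib.crc32(key) % n_members


MAC_A = "00:1b:21:0a:00:01"
MAC_B = "00:1b:21:0a:00:02"
print(lag_member(MAC_A, MAC_B, vlan=10, n_members=12))
print(lag_member(MAC_B, MAC_A, vlan=10, n_members=12))  # same member as the line above
```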


Two Versions Sampling in Q1 2006

Announced pricing at SC|05: first company to break through $20/port for 10GE


FM2224
- 24 10GE interfaces
- 1433-ball BGA
- 40mm
- $450

FM2112
- 8 10GE interfaces
- 16 1-2.5GE interfaces
- 897-ball BGA
- 32mm
- $265
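The "$20/port" claim can be checked against the announced FM2224 price. This sketch assumes the price is divided evenly over the 24 10GE interfaces; the FM2112 is left out because its mix of 10GE and 1-2.5GE ports has no single per-port figure.

```python
def price_per_port(price_usd: float, ports: int) -> float:
    """Announced chip price divided evenly over its 10GE interfaces."""
    return price_usd / ports


fm2224 = price_per_port(450.00, 24)
print(f"FM2224: ${fm2224:.2f} per 10GE port (under $20: {fm2224 < 20})")
```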


24-Port Reference Design (Now Shipping)

[Figure: Evaluation Platform board showing 24 numbered ports (1-24) and areas labeled CSL and ETH.]


Tahoe Hardware Features

Multiple frequency requirements
- 3.125GHz serial links (licensed from RAMBUS)
- 312.5MHz 32-bit datapaths (sync and async)
- 750MHz MAC table, scheduler, main memory, statistics, cross-chip interconnect (async)
- 360MHz frame processing (sync)
- 66MHz management (sync)

Mixed design styles
- 3 synchronous blocks: synthesize, place, and route
- Many custom async blocks (most of the transistors)
- Licensed cores: SERDES, PLL, TTL pads, fusebox
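Two of these frequencies line up with the 10G port rate. The check below assumes the standard XAUI arrangement of four serial lanes with 8b/10b line coding (the slide gives only the 3.125GHz lane rate) and shows that a 312.5MHz, 32-bit datapath carries exactly the same 10Gb/s.

```python
XAUI_LANES = 4
LANE_BAUD = 3_125_000_000        # 3.125 GBd per serial lane
DATA_BITS, LINE_BITS = 8, 10     # 8b/10b coding: 8 data bits per 10 line bits

DATAPATH_HZ = 312_500_000        # 312.5 MHz
DATAPATH_BITS = 32

# Integer arithmetic keeps the results exact.
xaui_payload_bps = XAUI_LANES * LANE_BAUD * DATA_BITS // LINE_BITS
datapath_bps = DATAPATH_HZ * DATAPATH_BITS

print(f"XAUI payload: {xaui_payload_bps / 1e9:.1f} Gb/s")
print(f"32-bit datapath: {datapath_bps / 1e9:.1f} Gb/s")
```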


Tahoe Chip Statistics

TSMC 0.13um LVOD FSG 1.2V

105M transistors

Over 3000 unique cells

1.5MB total SRAM (all asynchronous)

0.5-1.5W per port depending on activity (36W peak)

Flip-chip BGA package


Sync and Async together?

Use existing third-party IP cores for synchronous I/O, such as high-speed SERDES from RAMBUS.

Use the standard synchronous synthesis, place, and route flow to implement logically complex units with lower speed requirements.

Use the async flow only where it has the biggest advantages: SRAMs, crossbars, chip-wide interconnect, FIFOs, and high-speed blocks.

The problem must be partitioned at the architecture level.

Someday everything will be async, but not yet!


Simple Sync­to­Async Conversion

Synchronous Request / Grant FIFO protocol

[Figure: An S2A block converts a clocked synchronous datapath into an asynchronous datapath, and an A2S block converts an asynchronous datapath back into a clocked synchronous one. Both use Request and Grant signals.]

Seamlessly Bridges Different Clock Domains
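The request/grant handshake can be illustrated with a toy cycle-level model of the S2A direction. It assumes the synchronous side keeps its request high until a word is accepted on a clock edge, and that grant reflects free space in a small FIFO that the asynchronous side empties at its own pace. The class and function names, FIFO depth, and drain rate are hypothetical, not taken from the Tahoe design.

```python
from collections import deque


class S2ABridge:
    """Toy sync-to-async bridge with request/grant flow control (hypothetical parameters)."""

    def __init__(self, depth: int):
        self.depth = depth
        self.fifo = deque()

    def grant(self) -> bool:
        # Grant is high whenever there is room for another word.
        return len(self.fifo) < self.depth

    def clock_edge(self, request: bool, data) -> bool:
        """On a clock edge, a word transfers only if request and grant are both high."""
        if request and self.grant():
            self.fifo.append(data)
            return True
        return False

    def async_take(self, n: int) -> list:
        """The asynchronous side removes up to n words, independent of the clock."""
        taken = []
        while self.fifo and len(taken) < n:
            taken.append(self.fifo.popleft())
        return taken


def simulate(words, depth=2, drain_every=2):
    """A producer requests every cycle; a slower consumer takes one word every `drain_every` cycles."""
    bridge = S2ABridge(depth)
    pending = deque(words)
    received, stalls, cycle = [], 0, 0
    while pending or bridge.fifo:
        if pending:
            if bridge.clock_edge(request=True, data=pending[0]):
                pending.popleft()
            else:
                stalls += 1  # request high, grant low: the producer waits
        if cycle % drain_every == 0:
            received.extend(bridge.async_take(1))
        cycle += 1
    return received, stalls


received, stalls = simulate(range(10))
print(received, stalls)
```

Because a word moves only when request and grant coincide on a clock edge, the producer stalls while the FIFO is full and no word is lost or duplicated.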


Digital Verification

Often overlooked in academia, but crucial in industry!

There are nearly as many engineers in verification as there are in design.

Use the industry-standard approach: full-chip simulation with a testbench, test suite, and regression engine.

Try to get full line and conjunct coverage.

Convert CSP/PRS into Verilog for chip-level simulation combined with the synchronous blocks.

Also use simple closed-environment self-tests to check that different levels of async decomposition match, but this alone is not sufficient.
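A closed-environment self-test of this kind can be sketched by driving a high-level specification and a finer decomposition with the same random stimulus and comparing their outputs. Here the hypothetical specification is an n-place FIFO and the decomposition is a chain of one-place buffers that is allowed to settle between operations. As the slide notes, a passing random test raises confidence but does not prove equivalence.

```python
import random
from collections import deque


class SpecFifo:
    """Specification: a FIFO holding at most n items."""

    def __init__(self, n: int):
        self.n = n
        self.q = deque()

    def put(self, x) -> bool:
        if len(self.q) < self.n:
            self.q.append(x)
            return True
        return False

    def get(self):
        return self.q.popleft() if self.q else None


class BufferChain:
    """Decomposition: n one-place buffers; tokens advance toward the last slot."""

    def __init__(self, n: int):
        self.slots = [None] * n

    def _settle(self):
        # Let tokens advance until none can move (a stand-in for async propagation).
        moved = True
        while moved:
            moved = False
            for i in range(len(self.slots) - 1):
                if self.slots[i] is not None and self.slots[i + 1] is None:
                    self.slots[i + 1], self.slots[i] = self.slots[i], None
                    moved = True

    def put(self, x) -> bool:
        self._settle()
        if self.slots[0] is None:
            self.slots[0] = x
            return True
        return False

    def get(self):
        self._settle()
        x, self.slots[-1] = self.slots[-1], None
        return x


def self_test(n: int = 4, steps: int = 10_000, seed: int = 1):
    """Drive both models with one random put/get sequence.
    Return the first step where they disagree, or None if no mismatch was seen."""
    rng = random.Random(seed)
    spec, impl = SpecFifo(n), BufferChain(n)
    for step in range(steps):
        if rng.random() < 0.5:
            same = spec.put(step) == impl.put(step)
        else:
            same = spec.get() == impl.get()
        if not same:
            return step
    return None


print(self_test())
```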


Design For Test

Must be able to check for manufacturing defects in async blocks.

Introduce special “scan-buffers” that integrate a serial shift register into an async buffer.

Connect the scan-buffers into 16 serial scan-chains.

Can issue an inject, drain, or skip command to each scan-buffer on a scan-chain.

External clocked interface to standard testers.

Commercial fault-grading tool (ZOIX).
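The slide does not define the three commands, so this toy model uses assumed semantics: SKIP forwards a token, DRAIN captures it into the scan register (observability), and INJECT emits the scan register's value downstream, discarding whatever arrives (controllability). Commands are shifted serially along a single scan-chain; the names and the one-token pipeline are illustrative only.

```python
from enum import Enum
from typing import List, Optional, Sequence, Tuple


class Cmd(Enum):
    SKIP = "skip"      # assumed: the token passes through unchanged
    INJECT = "inject"  # assumed: emit the value held in the scan register
    DRAIN = "drain"    # assumed: capture the token into the scan register and remove it


def shift_in(serial_stream: Sequence, chain_len: int) -> List:
    """Shift items serially into a scan-chain; the first item shifted in ends at the far end."""
    chain: List = [None] * chain_len
    for item in serial_stream:
        chain = [item] + chain[:-1]
    return chain


def pass_token(token: Optional[int], commands: Sequence[Optional[Cmd]],
               scan_regs: Sequence[Optional[int]]) -> Tuple[Optional[int], List[Optional[int]]]:
    """Send one token through the scan-buffers in pipeline order.

    Returns the token leaving the last stage (None if drained) and the updated
    scan registers, which a tester would then shift out.
    """
    regs = list(scan_regs)
    for i, cmd in enumerate(commands):
        if cmd is Cmd.DRAIN:
            regs[i], token = token, None
        elif cmd is Cmd.INJECT:
            token = regs[i]
        # Cmd.SKIP (or no command): forward the token unchanged
    return token, regs


chain = shift_in([Cmd.SKIP, Cmd.DRAIN, Cmd.SKIP], chain_len=3)
print(pass_token(0xA5, chain, [0, 0, 0]))                                # stage 1 captures the token
print(pass_token(0x11, [Cmd.SKIP, Cmd.INJECT, Cmd.SKIP], [0, 0x3C, 0]))  # stage 1 emits its scan value
```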


Async SRAM in FocalPoint


Use TSMC 6T state-bit layout

Multi-bank design connected with async crossbars and buses

Supports up to 32 write ports and 32 read ports in parallel

Bank runs at 600MHz, but interconnect sustains 750MHz


SRAM Test and Repair

Scan-buffers integrated into most SRAM banks.

On-chip accelerated testing for the largest SRAM.

Tester produces a defect map.

Burn the fusebox to map in spare addresses, repairing bit or address-line errors.

In many SRAMs, bad “segments” of storage can simply be removed from the free memory pool. This can repair many more types of errors.

Yield looks quite good so far, as expected.
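The two repair mechanisms can be sketched under hypothetical parameters: a 256-byte segment size, a made-up defect map, and a fuse map from bad addresses to spare addresses. Segment removal drops every segment containing a defect from the free pool, while the fuse map redirects individual addresses.

```python
from typing import Dict, Iterable, List

MEMORY_BYTES = 1 << 20   # 1MB frame memory
SEGMENT_BYTES = 256      # hypothetical segment size


def free_segment_pool(total_bytes: int, segment_bytes: int,
                      defect_addrs: Iterable[int]) -> List[int]:
    """Usable segment indices, omitting any segment that contains a flagged address."""
    n_segments = total_bytes // segment_bytes
    bad = {addr // segment_bytes for addr in defect_addrs}
    return [seg for seg in range(n_segments) if seg not in bad]


def repair_address(addr: int, fuse_map: Dict[int, int]) -> int:
    """Redirect an address to its spare if the fusebox maps it; otherwise pass it through."""
    return fuse_map.get(addr, addr)


defects = [0x0010, 0x0020, 0x3000]  # hypothetical defect map from the tester
pool = free_segment_pool(MEMORY_BYTES, SEGMENT_BYTES, defects)
print(len(pool))  # 4094: segments 0 and 48 are removed

fuses = {0x01F4: 0x100000}          # hypothetical: bad address -> spare address
print(hex(repair_address(0x01F4, fuses)), hex(repair_address(0x01F5, fuses)))
```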



FocalPoint Test Platform


FocalPoint EP Board


FocalPoint EP Rack


Wishlist

CSP vs CSP formal verification

CSP vs PRS formal verification

ATPG tools for async circuits

Static timing for async circuits

Async synthesis from CSP

65nm advice

If you're working on any of these, talk to me!
