Processing
Version 2.0
January 2005
Xilinx Advanced Products Division
Agenda
Introduction
Background
Virtex-4 Solutions
Summary
Virtex-4 DSP
Introduction
Virtex-4 DSP
Low Power
2.3mW/100MHz scalable power efficiency
Value
Operate the XtremeDSP slice in over 40 different modes
Highest DSP bandwidth per dollar solution
Background
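[Figure: two ways to implement a 256-tap filter with coefficients C0 to C255]
Serial: one MAC unit, Data In to Data Out through a register; 256 clock cycles needed.
1 GHz / 256 clock cycles = 4 MSPS
Parallel: one multiplier per coefficient (C0 to C255) feeding an adder; 256 operations in 1 clock cycle.
500 MHz / 1 clock cycle = 500 MSPS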
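The two sample-rate figures on this slide follow from dividing the clock rate by the number of clock cycles needed per output sample. A minimal sketch of that arithmetic; the function name `sample_rate_msps` is illustrative and not from the deck. Note that 1 GHz / 256 is about 3.9 MSPS, which the slide rounds to 4 MSPS.

```python
def sample_rate_msps(clock_hz: float, cycles_per_sample: int) -> float:
    """Output sample rate in MSPS for a filter that needs
    `cycles_per_sample` clock cycles to produce one output sample."""
    return clock_hz / cycles_per_sample / 1e6

# Serial case from the slide: 1 GHz clock, 256 cycles per output sample.
serial = sample_rate_msps(1e9, 256)
# Parallel case from the slide: 500 MHz clock, 1 cycle per output sample.
parallel = sample_rate_msps(500e6, 1)

print(f"serial:   {serial:.1f} MSPS")
print(f"parallel: {parallel:.1f} MSPS")
```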
[Figure: multipliers for coefficients C0 to C7 summed by adders]
Consumes Logic to Implement Adders
[Figure: filter with coefficients C0 to C31, registers between stages, Data In to Data Out]
Variable Latency
Filters Implemented Entirely Within the XtremeDSP Slice
[Figure: filter with coefficients C0 to C7, registers between stages, Data In to Data Out]
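As a rough behavioral illustration of a filter built from a chain of multiply-add stages, the sketch below models each tap as one stage that adds its product to the partial sum from the previous stage. It is an assumption about the general structure, not the deck's actual design: it ignores pipeline registers and latency, and the name `fir_cascade` and the tap values are hypothetical.

```python
def fir_cascade(samples, coeffs):
    """FIR filter where tap k adds coeffs[k] * x[n-k] to the
    partial sum handed over by tap k-1 (behavioral model only)."""
    delay = [0] * len(coeffs)        # tap delay line (the "Reg" boxes)
    out = []
    for x in samples:
        delay = [x] + delay[:-1]     # shift the new sample in
        p = 0                        # partial sum entering the first stage
        for c, d in zip(coeffs, delay):
            p = p + c * d            # one stage: partial sum plus product
        out.append(p)
    return out

# An impulse input reproduces the coefficients in order.
impulse = [1, 0, 0, 0, 0, 0, 0, 0, 0]
taps = [1, 2, 3, 4, 5, 6, 7, 8]      # hypothetical values for C0..C7
response = fir_cascade(impulse, taps)
print(response)
```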
Virtex-4 XtremeDSP: Highest DSP Bandwidth Available
[Chart: DSP bandwidth by device generation, vertical axis 0 to 250 GMACs/s]
1st Generation, Virtex-E: 11 GMACs/s
2nd Generation, Virtex-II: 32 GMACs/s
3rd Generation, Virtex-II Pro: 111 GMACs/s
4th Generation, Virtex-4: 256 GMACs/s
Virtex-4 Solutions
Integrated Cascade Routing enables scalable performance
Arithmatica Parallel Counter
20% faster performance and uses less area
Arithmatica A+Adder
20% faster than other implementations
2X the performance of Virtex-II Pro
[Chart axes: Power (mW), 0 to 1000; Number of XtremeDSP Slices, 100 to 512; Frequency (MHz), 0 to 500]
Note: Power efficiency was measured using the DSP48 component at a toggle rate of 38%.
The figure does not cover an entire MAC with BRAM, a control path sequencer/address generator in fabric, or external routing.
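The 2.3mW/100MHz figure can be turned into a rough power estimate. The sketch below assumes the figure applies per DSP48 slice and scales linearly with both clock frequency and slice count; both are assumptions for illustration, and the function name `dsp_power_mw` is hypothetical. As the note says, the result leaves out BRAM, fabric control logic, and routing.

```python
MW_PER_100MHZ = 2.3  # figure quoted in the deck (38% toggle rate); assumed per slice

def dsp_power_mw(num_slices: int, freq_mhz: float) -> float:
    """Estimated DSP48 power in mW, assuming linear scaling
    with clock frequency and with the number of slices."""
    return num_slices * MW_PER_100MHZ * freq_mhz / 100.0

one_slice = dsp_power_mw(1, 500)   # a single slice at 500 MHz
print(f"{one_slice:.1f} mW")
```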
35x18 Complex Multiply: Real and Imaginary
35x18 Complex Multiply consumes only 92mW at 500MHz
[Figure: two 35x18 Complex Multiply blocks, one for the Real Portion and one for the Imaginary Portion]
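A complex multiply splits into a real portion and an imaginary portion, each built from two real products. The sketch below shows that decomposition with ordinary integers; it does not model the 35x18 operand widths. The power check at the end assumes eight DSP48 slices at 2.3mW/100MHz each, which reproduces the 92mW figure, but the slice count is an assumption and is not stated on the slide.

```python
def complex_multiply(a_re, a_im, b_re, b_im):
    """(a_re + j*a_im) * (b_re + j*b_im) using four real multiplies."""
    real = a_re * b_re - a_im * b_im   # real portion
    imag = a_re * b_im + a_im * b_re   # imaginary portion
    return real, imag

result = complex_multiply(3, 2, 1, 4)  # (3 + 2j) * (1 + 4j)
print(result)

# Power consistency check: the slice count is an assumption, not from the slide.
assumed_slices = 8
power_mw = assumed_slices * 2.3 * (500 / 100)
print(f"{power_mw:.0f} mW")
```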
[Chart: Virtex-4 SX devices (4VSX25, 4VSX35, 4VSX55) compared with Virtex-II Pro devices (2VP30, 2VP40, 2VP50, 2VP70); vertical axis 50 to 250; annotation: "Example: 10X"]
Virtex-4 SX (DSP Platform): 256GMACs/s DSP Bandwidth
Virtex-4 FX: 96GMACs/s DSP Bandwidth
Virtex-4 LX: 48GMACs/s DSP Bandwidth
Resources compared: DSP Slices, Memory, Logic, DCMs
Dynamically Programmable
DSP Op Modes
OpMode bits: Z = bits 6:4, Y = bits 3:2, X = bits 1:0

Mode                         Z(6:4)  Y(3:2)  X(1:0)  Output
Zero                         000     00      00      +/- Cin
Hold P                       000     00      10      +/- (P + Cin)
A:B Select                   000     00      11      +/- (A:B + Cin)
Multiply                     000     01      01      +/- (A * B + Cin)
C Select                     000     11      00      +/- (C + Cin)
Feedback Add                 000     11      10      +/- (C + P + Cin)
36-Bit Adder                 000     11      11      +/- (A:B + C + Cin)
P Cascade Select             001     00      00      PCIN +/- Cin
P Cascade Feedback Add       001     00      10      PCIN +/- (P + Cin)
P Cascade Add                001     00      11      PCIN +/- (A:B + Cin)
P Cascade Multiply Add       001     01      01      PCIN +/- (A * B + Cin)
P Cascade Add                001     11      00      PCIN +/- (C + Cin)
P Cascade Feedback Add Add   001     11      10      PCIN +/- (C + P + Cin)
P Cascade Add Add            001     11      11      PCIN +/- (A:B + C + Cin)
Hold P                       010     00      00      P +/- Cin
Double Feedback Add          010     00      10      P +/- (P + Cin)
Feedback Add                 010     00      11      P +/- (A:B + Cin)
Multiply-Accumulate          010     01      01      P +/- (A * B + Cin)
Feedback Add                 010     11      00      P +/- (C + Cin)
Double Feedback Add          010     11      10      P +/- (C + P + Cin)
Feedback Add Add             010     11      11      P +/- (A:B + C + Cin)
C Select                     011     00      00      C +/- Cin
Feedback Add                 011     00      10      C +/- (P + Cin)
36-Bit Adder                 011     00      11      C +/- (A:B + Cin)
Multiply-Add                 011     01      01      C +/- (A * B + Cin)
Double                       011     11      00      C +/- (C + Cin)
Double Add Feedback Add      011     11      10      C +/- (C + P + Cin)
Double Add                   011     11      11      C +/- (A:B + C + Cin)
Enables time-division multiplexing for DSP
Over 40 different modes
Each XtremeDSP Slice individually controllable
Change operation in a single clock cycle
Control functionality from logic, memory or processor
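To make the OpMode encoding concrete, here is a minimal sketch that splits a 7-bit OpMode value into its Z (bits 6:4), Y (bits 3:2), and X (bits 1:0) fields and returns the output expression from the op-mode table. The dictionary and function names are hypothetical, and only the 28 combinations listed in the deck are covered; anything else raises an error.

```python
# Operand selected by the Z field (bits 6:4), as listed in the deck's table.
Z_TERMS = {0b000: "", 0b001: "PCIN", 0b010: "P", 0b011: "C"}

# Term selected by the (Y, X) field pair (bits 3:2 and bits 1:0).
YX_TERMS = {
    (0b00, 0b00): "Cin",
    (0b00, 0b10): "(P + Cin)",
    (0b00, 0b11): "(A:B + Cin)",
    (0b01, 0b01): "(A * B + Cin)",
    (0b11, 0b00): "(C + Cin)",
    (0b11, 0b10): "(C + P + Cin)",
    (0b11, 0b11): "(A:B + C + Cin)",
}

def decode_opmode(opmode: int) -> str:
    """Return the output expression for a 7-bit OpMode value."""
    z = (opmode >> 4) & 0b111
    y = (opmode >> 2) & 0b11
    x = opmode & 0b11
    if z not in Z_TERMS or (y, x) not in YX_TERMS:
        raise ValueError(f"OpMode {opmode:07b} is not in the table")
    return f"{Z_TERMS[z]} +/- {YX_TERMS[(y, x)]}".strip()

print(decode_opmode(0b0100101))  # Z=010, Y=01, X=01: the Multiply-Accumulate row
```

Because the mode is just a 7-bit value, logic, memory, or a processor can supply a different value on each clock cycle, which is what allows one slice to be time-shared between operations.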
256 GMACs
Performance
Lowest Cost
(90nm)
Comprehensive Library
Fast Turnaround
Exceptional Performance
DSP Design Services, Training & Hotline
Distributor Services & Training
Specify Design
Synthesize Design
VHDL and Coregen Output
Summary
Virtex-4 XtremeDSP
Enabling next generation high-performance DSP
Highest Performance
512 XtremeDSP slices at 500MHz
256 GMACs/s DSP bandwidth
Lowest Power
2.3mW/100MHz
Most Value
Operate the XtremeDSP slice in over 40 different modes
Highest DSP bandwidth per dollar solution available
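The headline bandwidth figure is the slice count multiplied by the clock rate. A minimal sketch of that arithmetic, assuming each slice completes one multiply-accumulate per clock cycle; the function name `dsp_bandwidth_gmacs` is illustrative.

```python
def dsp_bandwidth_gmacs(num_slices: int, clock_mhz: float) -> float:
    """Peak DSP bandwidth in GMACs/s, assuming one MAC per slice per clock."""
    return num_slices * clock_mhz * 1e6 / 1e9

peak = dsp_bandwidth_gmacs(512, 500)   # 512 slices at 500 MHz
print(f"{peak:.0f} GMACs/s")
```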