
Digital Signal Processing
Version 2.0
January 2005
Xilinx Advanced Products Division

Agenda
Introduction
Background
Virtex-4 Solutions
Summary

Virtex-4 DSP

Introduction

High-Speed DSP Challenges


High-performance digital communication and video imaging designs challenge existing DSP solutions
Need higher performance
Need lower costs
Need lower power

Compromises are often made


Performance is sacrificed
Time is spent designing substitute implementations


Achieve DSP Performance and Efficiency in Virtex-4
Virtex-4 XtremeDSP
Performance
512 XtremeDSP slices at 500MHz
256 GMACs/s DSP bandwidth

Low Power
2.3mW/100MHz scalable power efficiency

Value
Operate the XtremeDSP slice in over 40 different modes
Highest DSP bandwidth per dollar solution
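
The bandwidth figure above follows from the slice count and clock rate. A minimal sketch of that arithmetic, assuming each XtremeDSP slice completes one multiply-accumulate per clock cycle (the function name is illustrative, not from the source):

```python
def peak_gmacs(num_slices: int, clock_mhz: float) -> float:
    """Peak DSP bandwidth in GMACs/s, assuming one MAC per slice per clock."""
    return num_slices * clock_mhz / 1000

# 512 slices at 500 MHz, as quoted on the slide
print(peak_gmacs(512, 500))  # 256.0
```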


Background

FPGAs Enable Massively Parallel DSP
Example: 256-Tap Filter Implementation

Programmable DSP - Sequential
[Figure: Data In and Coefficients feed a single MAC Unit with an output register driving Data Out.]
256 clock cycles needed
1 GHz / 256 clock cycles = 4 MSPS

FPGA - Fully Parallel Implementation
[Figure: Data In passes through a register chain; taps C0 ... C255 are multiplied and summed to Data Out.]
256 operations in 1 clock cycle
500 MHz / 1 clock cycle = 500 MSPS
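
The two sample rates above come from dividing the clock rate by the number of clock cycles spent per output sample. A small sketch of that calculation (the function name is illustrative); note that 1 GHz over 256 cycles is about 3.9 MSPS, which the slide rounds to 4 MSPS:

```python
def sample_rate_msps(clock_mhz: float, cycles_per_sample: int) -> float:
    """Output sample rate in MSPS for a filter needing cycles_per_sample clocks per output."""
    return clock_mhz / cycles_per_sample

sequential = sample_rate_msps(1000, 256)  # one MAC unit reused for all 256 taps
parallel = sample_rate_msps(500, 1)       # 256 MACs working in the same clock cycle

print(sequential)             # 3.90625
print(parallel)               # 500.0
print(parallel / sequential)  # 128.0
```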

"the unprecedented signal processing requirements of next-generation wireless devices threaten to outpace the capabilities of DSP processors, creating opportunities for massively parallel and highly customized devices." BDTI, 2004


Parallel Adder Tree Implementation Consumes FPGA Resources

[Figure: Parallel Adder Tree Implementation. Data In passes through a register chain; taps C0 ... C31 are multiplied and summed through a tree of adders to Data Out.]

Consumes logic to implement adders
32-tap filter implementation will consume 1,461 logic cells to implement adders in fabric
Variable latency
Fabric and routing may reduce performance
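
To make the structure of an adder tree concrete, the sketch below sums a list of tap products pairwise, level by level, and counts the adders and tree levels used. It is a generic software model (function name hypothetical). It counts adders only and does not derive the 1,461 logic cell figure, which depends on bit widths not given here.

```python
def adder_tree(values):
    """Sum a non-empty list pairwise, level by level.

    Returns (total, adders_used, levels).
    """
    level = list(values)
    adders = 0
    levels = 0
    while len(level) > 1:
        # Add neighbouring pairs; each pair costs one adder.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        adders += len(nxt)
        if len(level) % 2:
            # An odd element passes through to the next level unchanged.
            nxt.append(level[-1])
        level = nxt
        levels += 1
    return level[0], adders, levels

# 32 tap products: 31 adders arranged in 5 levels
print(adder_tree(range(32)))  # (496, 31, 5)
```

The number of levels grows with the logarithm of the tap count, so the latency of the tree changes with filter size.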

Virtex-4 Parallel Implementation Consumes Zero Logic Resources

[Figure: Parallel Adder Cascade Implementation. Data In passes through a register chain; each tap C0 ... C31 is multiplied and added into a cascade of registered adders ending at Data Out.]

32-tap filter implementation using 32 XtremeDSP slices
Filters implemented entirely within the XtremeDSP slice
Guaranteed 500 MHz performance regardless of filter size

Xilinx 4th Generation XtremeDSP

Virtex-4 XtremeDSP: Highest DSP Bandwidth Available

DSP Bandwidth (GMACs/s) by generation:
1st Generation   Virtex-E        11 GMACs/s
2nd Generation   Virtex-II       32 GMACs/s
3rd Generation   Virtex-II Pro   111 GMACs/s
4th Generation   Virtex-4        256 GMACs/s

Virtex-4 Solutions

Full Custom Design Results in Higher Performance

Scalable 500 MHz performance is impossible with Standard Cell libraries and Standard Cell design flow

Pipeline registers enable 500 MHz performance
Integrated cascade routing enables scalable performance
Arithmatica Parallel Counter: 20% faster performance and uses less area
Arithmatica A+Adder: 20% faster than other implementations
2X the performance of Virtex-II Pro

Wide Filters At Full Speed Within the Virtex-4 DSP Slice Column

Systolic N-tap FIR
Scalable N-level deep implementation
500 MHz performance at N-level deep
Uses integrated pipeline registers to synchronize filter inputs
Utilizes input and output cascade routing

Build a massively parallel 512-tap FIR filter in a single device achieving 256 GMACs/s performance
An equivalent implementation would consume 444 embedded multipliers and 77,008 LCs and would only achieve half the performance
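
The systolic structure described above can be modelled in software. The sketch below is a register-level model of a generic systolic FIR: each tap holds two data registers in the input cascade and one registered adder in the output cascade. This register arrangement is an assumption based on the textbook systolic form, not the exact DSP48 pipeline, and the function names are illustrative.

```python
def systolic_fir(coeffs, samples):
    """Clock-by-clock model of an N-tap systolic FIR; returns one output per clock."""
    n = len(coeffs)
    d1 = [0] * n  # first data register in each tap (feeds the multiplier)
    d2 = [0] * n  # second data register (feeds the next tap's input cascade)
    p = [0] * n   # registered adder output (feeds the next tap's output cascade)
    out = []
    for x in samples:
        # All registers update together from the previous clock's values.
        new_p = [(p[k - 1] if k else 0) + coeffs[k] * d1[k] for k in range(n)]
        new_d2 = d1[:]
        new_d1 = [x] + d2[:-1]
        d1, d2, p = new_d1, new_d2, new_p
        out.append(p[-1])
    return out

def direct_fir(coeffs, samples):
    """Reference direct-form FIR with no pipeline latency."""
    return [sum(c * samples[m - i] for i, c in enumerate(coeffs) if m - i >= 0)
            for m in range(len(samples))]

# Impulse response of a 3-tap filter appears after 3 clocks of latency.
print(systolic_fir([1, 2, 3], [1, 0, 0, 0, 0, 0, 0, 0]))  # [0, 0, 0, 1, 2, 3, 0, 0]
```

In this model the output matches the direct-form result delayed by N clocks. Every adder has only its neighbour as input, so the longest combinational path does not grow with N, which is the property the slide relies on for full-speed wide filters.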

Lowest Power DSP

18x18 Multiply-Accumulate scalable power efficiency at 2.3 mW/100 MHz

[Figure: Power (mW) and Number of XtremeDSP Slices versus Frequency (MHz), 100 to 500 MHz. Annotated points: 60 GMACs/s for 1.38 W; 20 GMACs/s for 0.46 W.]

Note: Power efficiency achieved using the DSP48 component with a toggle rate of 38%.
The figure covers the DSP48 component only. It does not represent an entire MAC, which would also include BRAM, a control path sequencer/address generator in fabric, and external routing.
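
The two annotated points can be reproduced from the 2.3 mW/100 MHz figure. The sketch below assumes that figure is per slice, that power scales linearly with clock rate and slice count, and that both points are taken at 500 MHz; these are assumptions, although the results agree with the quoted 1.38 W and 0.46 W. Function names are illustrative.

```python
MW_PER_SLICE_PER_100MHZ = 2.3  # from the slide; assumed to be per XtremeDSP slice

def slices_needed(gmacs: float, clock_mhz: float) -> float:
    """Slices required for a given bandwidth, at one MAC per slice per clock."""
    return gmacs * 1000 / clock_mhz

def power_watts(num_slices: float, clock_mhz: float) -> float:
    """Estimated DSP48 power in watts under the linear-scaling assumption."""
    return num_slices * MW_PER_SLICE_PER_100MHZ * (clock_mhz / 100) / 1000

for gmacs in (60, 20):
    n = slices_needed(gmacs, 500)
    print(gmacs, "GMACs/s:", n, "slices,", round(power_watts(n, 500), 2), "W")
```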


High-Speed, Low Power Complex Multiply

Complex filter implementation
Register the inputs using minimal external resources
Synchronize data using pipeline delay elements

35x18 Complex Multiply at 500 MHz
Real and Imaginary 35x18 Complex Multiply consumes only 92 mW at 500 MHz

[Figure: two blocks labeled "35x18 Complex Multiply Real Portion" and "35x18 Complex Multiply Imaginary Portion".]
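
As a sketch of the arithmetic behind a 35x18 complex multiply: a wide multiply can be built from two 18x18 partial products by splitting the 35-bit operand into a signed upper 18 bits and an unsigned lower 17 bits, and a complex product takes four such multiplies. The split shown here is an assumption about how the wide multiply is decomposed, and the function names are illustrative.

```python
def mul35x18(a: int, b: int) -> int:
    """Signed 35x18 multiply assembled from two 18x18 partial products."""
    a_lo = a & 0x1FFFF  # lower 17 bits, treated as unsigned
    a_hi = a >> 17      # upper 18 bits, signed (arithmetic shift)
    return ((a_hi * b) << 17) + a_lo * b

def complex_mul(ar: int, ai: int, br: int, bi: int):
    """(ar + j*ai) * (br + j*bi) using four real multiplies."""
    real = mul35x18(ar, br) - mul35x18(ai, bi)
    imag = mul35x18(ar, bi) + mul35x18(ai, br)
    return real, imag

print(complex_mul(3, 4, 5, -6))  # (39, 2)
```

If each 35x18 product occupies two slices, the four products use eight slices. At 2.3 mW/100 MHz per slice and 500 MHz that gives 8 x 11.5 mW = 92 mW, which is consistent with the figure quoted on the slide.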

Up to 10X Greater DSP Bandwidth Per Dollar

[Figure: DSP bandwidth (GMAC/s)* versus Unit Cost. Virtex-4 devices: 4VSX25, 4VSX35, 4VSX55. Previous generation FPGAs: 2VP30, 2VP40, 2VP50, 2VP70. Annotation: "Example: 10X".]

Example: 4VSX55 vs. 2VP70
1.6x more MACs
2x higher performance
1/3 the price
10X MACs/price ratio

* 18x18 mult. + 48-bit acc.
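
The 10X figure in the example appears to be the product of the three ratios listed, rounded up. A minimal sketch of that arithmetic, assuming the ratios combine multiplicatively (the function name is illustrative):

```python
def bandwidth_per_dollar_gain(mac_ratio: float, speed_ratio: float, price_ratio: float) -> float:
    """Relative gain in DSP bandwidth per dollar between two devices."""
    return mac_ratio * speed_ratio / price_ratio

# 4VSX55 vs. 2VP70: 1.6x the MACs, 2x the clock rate, 1/3 the price
gain = bandwidth_per_dollar_gain(1.6, 2.0, 1 / 3)
print(round(gain, 1))  # 9.6
```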



Virtex-4 DSP Solutions: Choose the Right Combination

Platform      Description              DSP Bandwidth
Virtex-4 SX   DSP Platform             256 GMACs/s
Virtex-4 FX   Full Featured Platform   96 GMACs/s
Virtex-4 LX   Logic Platform           48 GMACs/s

Features compared across platforms: DSP Slices, Memory, Logic, DCMs

Dynamically Programmable DSP Op Modes

OpMode                       Z (6:4)  Y (3:2)  X (1:0)  Output
Zero                         000      00       00       +/- Cin
Hold P                       000      00       10       +/- (P + Cin)
A:B Select                   000      00       11       +/- (A:B + Cin)
Multiply                     000      01       01       +/- (A * B + Cin)
C Select                     000      11       00       +/- (C + Cin)
Feedback Add                 000      11       10       +/- (C + P + Cin)
36-Bit Adder                 000      11       11       +/- (A:B + C + Cin)
P Cascade Select             001      00       00       PCIN +/- Cin
P Cascade Feedback Add       001      00       10       PCIN +/- (P + Cin)
P Cascade Add                001      00       11       PCIN +/- (A:B + Cin)
P Cascade Multiply Add       001      01       01       PCIN +/- (A * B + Cin)
P Cascade Add                001      11       00       PCIN +/- (C + Cin)
P Cascade Feedback Add Add   001      11       10       PCIN +/- (C + P + Cin)
P Cascade Add Add            001      11       11       PCIN +/- (A:B + C + Cin)
Hold P                       010      00       00       P +/- Cin
Double Feedback Add          010      00       10       P +/- (P + Cin)
Feedback Add                 010      00       11       P +/- (A:B + Cin)
Multiply-Accumulate          010      01       01       P +/- (A * B + Cin)
Feedback Add                 010      11       00       P +/- (C + Cin)
Double Feedback Add          010      11       10       P +/- (C + P + Cin)
Feedback Add Add             010      11       11       P +/- (A:B + C + Cin)
C Select                     011      00       00       C +/- Cin
Feedback Add                 011      00       10       C +/- (P + Cin)
36-Bit Adder                 011      00       11       C +/- (A:B + Cin)
Multiply-Add                 011      01       01       C +/- (A * B + Cin)
Double                       011      11       00       C +/- (C + Cin)
Double Add Feedback Add      011      11       10       C +/- (C + P + Cin)
Double Add                   011      11       11       C +/- (A:B + C + Cin)

Enables time-division multiplexing for DSP
Over 40 different modes
Each XtremeDSP slice individually controllable
Change operation in a single clock cycle
Control functionality from logic, memory or processor
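
The op-mode table can be read as three multiplexer selects packed into a 7-bit word: Z in bits 6:4, Y in bits 3:2, and X in bits 1:0. The sketch below decodes an OpMode value into the output expression listed in the table. It covers only the 28 combinations shown on this slide; the dictionary and function names are illustrative and not part of any Xilinx tool.

```python
# Terms selected by the Y and X bits, keyed by (Y, X), as listed in the table.
XY_TERMS = {
    (0b00, 0b00): [],
    (0b00, 0b10): ["P"],
    (0b00, 0b11): ["A:B"],
    (0b01, 0b01): ["A * B"],
    (0b11, 0b00): ["C"],
    (0b11, 0b10): ["C", "P"],
    (0b11, 0b11): ["A:B", "C"],
}
# Term selected by the Z bits.
Z_TERM = {0b000: "", 0b001: "PCIN", 0b010: "P", 0b011: "C"}

def decode_opmode(opmode: int) -> str:
    """Return the output expression for a 7-bit OpMode from the slide's table."""
    z = (opmode >> 4) & 0b111
    y = (opmode >> 2) & 0b11
    x = opmode & 0b11
    if z not in Z_TERM or (y, x) not in XY_TERMS:
        raise ValueError(f"OpMode {opmode:07b} is not in the slide's table")
    terms = XY_TERMS[(y, x)] + ["Cin"]
    inner = " + ".join(terms)
    if len(terms) > 1:
        inner = f"({inner})"
    return f"{Z_TERM[z]} +/- {inner}".strip()

print(decode_opmode(0b0100101))  # P +/- (A * B + Cin)   (Multiply-Accumulate)
print(decode_opmode(0b0010101))  # PCIN +/- (A * B + Cin) (P Cascade Multiply Add)
```

Because the OpMode is an ordinary input word, changing it from one clock to the next switches the slice between these functions, which is what makes time-division multiplexing possible.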

Virtex-4 XtremeDSP Slices Useful For More Than DSP
6:1 high-speed, 36-bit Multiplexer
Use four XtremeDSP slices and op-modes
500 MHz performance using no programmable logic
Save 1584 LCs to build equivalent function in logic

Dynamic 18-bit Barrel Shifter


Use two XtremeDSP slices
Use dedicated cascade routing and integrated 17-bit shift
Save 1449 LCs to build equivalent function in logic

36-bit Loadable Counter


Use a single XtremeDSP slice, achieve 500 MHz performance
Save 540 LCs to build equivalent function in logic
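
One common way to build a barrel shifter from a multiplier is to multiply by a power of two and then recombine the upper and lower halves of the product. The sketch below shows that idea for an 18-bit rotate. It is a software illustration of the general technique, not a description of how the two-slice Virtex-4 design is wired, and the names are hypothetical.

```python
WIDTH = 18
MASK = (1 << WIDTH) - 1

def rotate_left_via_multiply(value: int, shift: int) -> int:
    """Rotate an 18-bit value left by shift (0..17) using a multiply by 2**shift."""
    product = (value & MASK) * (1 << shift)  # the shift expressed as a multiply
    low = product & MASK                     # bits that stayed inside the word
    high = product >> WIDTH                  # bits that overflowed, wrapped around
    return low | high

print(rotate_left_via_multiply(1 << 17, 1))  # 1
```

The shift amount is just a data input to the multiplier, so it can change every clock cycle, which is what makes the shifter dynamic.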

FPGAs with DSP Functions

256 GMACs
Performance

Shortest Design Time

Lowest Cost
(90nm)

60+ Advanced DSP Cores

60+ DSP Development Boards

Comprehensive Library
Fast Turnaround
Exceptional Performance
DSP Design Services, Training & Hotline

Distributor
Services &
Training


Major DSP Alliances

Dedicated Field Specialists

50+ Field DSP Experts
Systems Expertise

DSP Division Experts


Tools, IP Solutions
Xilinx Design Services, Education and Support

Xilinx FPGA DSP Design Flow

[Figure: design flow diagram. Labels: Implement In Hardware; FPGA Designer and System Architect; Specify Design; Synthesize Design; DSP Architectural Wizard; VHDL and Coregen Output.]

Summary

Virtex-4 XtremeDSP
Enabling next generation high-performance DSP
Highest Performance
512 XtremeDSP slices at 500MHz
256 GMACs/s DSP bandwidth

Lowest Power
2.3mW/100MHz

Most Value
Operate the XtremeDSP slice in over 40 different modes
Highest DSP bandwidth per dollar solution available


If You Want to Learn More


Evaluate XtremeDSP in Virtex-4
Request an advanced DSP presentation
Learn about advanced, high-performance filter implementations only possible in Virtex-4

Request a demo of the new XtremeDSP capability in Virtex-4 today
See the fastest, lowest power FPGA DSP solution available

Visit www.xilinx.com/dsp for more information on Xilinx DSP solutions
