
DSP or FPGA?

How to choose the right device


Bamdad Afra and Amit Kapadiya, Nuvation - May 07, 2008

System designers face a number of key questions during the architecture phase of their project.
Increasingly one of these questions is whether to use an FPGA (field programmable gate array) or a
DSP (digital signal processor). To answer this question, system designers consider parameters such
as:

● System performance requirements for signal processing
● Power consumption
● Component count and form factor
● Future product/system road-map and upgradeability for the system
● Economic parameters such as non-recurring engineering (NRE) investment, bill-of-materials
(BOM) cost, time-to-market and project risk

The decision also depends on the technology familiarity factor. In some cases, the design team is
well-versed in DSP systems but has little FPGA background, or vice-versa. In such cases, the team
skill-set may drive the choice between FPGA and DSP. For example, Nuvation recently worked on an
algorithm acceleration project where the algorithm lent itself to wide parallel implementation in an
FPGA. However, classic FPGA approaches were ruled out due to the lack of FPGA skills in the
client's engineering team and the potential barriers this presented to product lifecycle maintenance.

We acknowledge that most engineers and system architects are more familiar with DSP technology
due to the simplicity of designing with DSPs. This is a clear advantage for DSPs. However, developer
familiarity varies widely across design teams, so it is difficult to say how important this issue is to a
"generic" design team. Thus, this article ignores this DSP advantage and assumes that the choice of
technology does not depend on developer familiarity.

In order to choose between FPGA and DSP, we look at system performance requirements for signal
processing and BOM cost. We consider devices from a major DSP vendor (Texas Instruments) and a
major FPGA vendor (Altera) to guide us through this process. We identify some of the signal
processing applications in which each specific technology is clearly superior. We also consider
where an FPGA may be used as a co-processor to a DSP chip.

DSP devices from Texas Instruments


Table 1 lists principal DSP devices from Texas Instruments (TI) in different cost categories. This
table summarizes the cost/performance data of more than 160 DSP devices. As shown in Table 1,
DSPs achieve cost/performance in the range of 1.8 to 48 cents per MMAC (millions of multiply-
accumulate operations per second).
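
To make the cents-per-MMAC metric concrete, the short Python sketch below computes it from a unit price and an MMAC rating. The function name and the sample price and rating are illustrative assumptions and are not entries from Table 1.

```python
def cents_per_mmac(unit_price_usd: float, mmacs: float) -> float:
    """Return cost/performance in cents per MMAC (lower is better)."""
    if mmacs <= 0:
        raise ValueError("MMAC rating must be positive")
    return unit_price_usd * 100.0 / mmacs


# Hypothetical device: $9.00 at 100-unit volume, rated at 500 MMACS.
print(f"{cents_per_mmac(9.00, 500):.2f} cents/MMAC")
```

A device priced and rated like this hypothetical one would sit at the low end of the range quoted above.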

Note that the table foot-note provides details relating to each device family. The cost/performance
data presented in this table should be considered in conjunction with these details. For example,
each DaVinci digital media processor incorporates an ARM9 processor (at up to 297 MHz), a
TMS320C64x+ DSP core (up to 4752 MIPS, and eight 8-bit MACs per cycle for up to 4752 MMACS)
as well as many peripherals and internal memory.

Table 1 omits specific device names and lists only overall performance and cost ranges, grouped
by device family and by device cost. Some families appear more than once in the table because
their devices fall into different cost categories.

Table 1. DSP device families from Texas Instruments in different cost categories.

Table notes:
(1): The device cost is based on 100u volumes. Pricing info was obtained from www.ti.com in
January 2008.

(2): MIPS is defined as the number of instructions that can be executed in millions per second. A
range of MIPS ratings is available across the devices within each family.

(3): MMAC is defined as the number of single precision floating-point or fixed-point 32-bit
multiply-and-accumulate operations that can be executed in millions per second. MMAC performance
values increase by a factor of 2x and 4x for 16-bit and 8-bit operations, respectively. All operations
assume no truncation of multiplication results (i.e. results are twice the size of operation bit-width:
32-bit multiply generates 64-bit result; 16-bit multiply generates 32-bit results, etc).
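
The scaling rule in note (3) can be written down as a small Python sketch. The function and constant names are assumptions made for this illustration; the factors themselves are the 2x and 4x values stated in the note.

```python
# Scaling factors relative to the 32-bit MMAC rating, per table note (3).
WIDTH_SCALE = {32: 1, 16: 2, 8: 4}


def dsp_mmacs(mmacs_32bit: float, operand_bits: int) -> float:
    """Scale a 32-bit MMAC rating to 16-bit or 8-bit operands."""
    return mmacs_32bit * WIDTH_SCALE[operand_bits]


def result_bits(operand_bits: int) -> int:
    """Untruncated product width: twice the operand width."""
    return 2 * operand_bits


# Hypothetical device rated at 100 MMACS for 32-bit operations.
for bits in (32, 16, 8):
    print(bits, dsp_mmacs(100, bits), result_bits(bits))
```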

Device Family Notes:


The following notes provide a collective information summary on features and capabilities offered
within each device family. For specific and complete device capabilities refer to www.ti.com.

DaVinci Digital Media Processors include on-chip ARM9 processor, Ethernet MAC and/or Switch
sub-system, Video/Audio Ports, Video Processing Subsystem, High Definition features, PCI Bus
Interface, ATA Interface, USB Interface, etc.

C6000 Fixed Point DSPs include Viterbi Decoder Co-processor, Turbo Decoder Co-processor,
UTOPIA Slave 2 ATM Controller, PCI Bus Interface, RapidIO, Ethernet MAC, etc.

C6000 Floating Point DSPs include Single/Double precision floating point DSP core, Enhanced
CPU core, Audio Port, Dual Access Internal Memory, etc.

C5000 Fixed Point DSPs include Dual/Quad Core DSPs, On-chip ARM7 Processor, Video Hardware
Accelerator, ADC, USB, etc. This family includes low power devices and devices targeted for IP-
Phone and Client Side Telephony applications.

C2000 Digital Signal Controllers include 16/32-bit Core, Internal Flash Memory, 10/12-bit ADC
(multi-channel), PWM, etc. This device family is targeted for Control Applications.

FPGA device families from Altera


Table 2 lists FPGA device families from Altera in different cost categories. This table summarizes
cost/performance data for over 100 FPGA devices (including various speed grades for each specific
device). Note that details regarding each FPGA family (interface capabilities, internal memory
architecture, and versatility of DSP resources) have been excluded from this summary. Moreover,
the MMAC performance estimates are based on clock frequencies which are achievable given an
overall resource utilization of 70 percent.

Also note that Table 2 does not identify MIPS performance values. A designer may include one or
multiple Nios II processors in the FPGA depending on the available resources (registers and
memory). However, defining a MIPS performance value for the FPGA as a whole is impossible
because the use of an embedded processor (and its specifications) is design specific. Finally, it is
important to note that logic resources and registers in the FPGA may be used to create additional
signal processing resources to increase the MMAC performance.

Table 2. Various FPGA device families from Altera sorted in different cost categories.

Table notes:
(1): Costs were obtained from www.altera.com in Jan 2008. At this time, the pricing for only one
device, EP3SL150, was available in Stratix III device family.

(2): MMAC is defined as the number of fixed-point 32-bit or single-precision floating point multiply-
and-accumulate operations that can be executed in units of millions per second. MMAC performance
values increase by a factor of 4x for 16-bit operations.

(3): Assuming a clock frequency of 120MHz for Cyclone II and Stratix II and a clock frequency of
165MHz for Cyclone III and Stratix III devices. The overall resource utilization is estimated to be
70%. Note that higher resource utilization and performance is achievable in the FPGA.
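
As a rough illustration of how such an estimate could be formed, the Python sketch below multiplies a count of hardware multipliers by the clock frequencies assumed in note (3). The multiplier count, and the rule that one 32-bit MAC consumes four 18x18 multipliers while a 16-bit MAC consumes one, are assumptions made for this example (the rule was chosen to agree with the 4x factor in note (2)). Neither is taken from Table 2.

```python
# Clock frequencies (MHz) assumed in table note (3).
CLOCK_MHZ = {
    "Cyclone II": 120,
    "Stratix II": 120,
    "Cyclone III": 165,
    "Stratix III": 165,
}


def fpga_mmacs(family: str, multipliers_18x18: int, operand_bits: int = 32) -> int:
    """Estimate MMACS, assuming each MAC unit completes one MAC per clock."""
    # Assumption: a 32-bit MAC is built from four 18x18 multipliers,
    # while a 16-bit MAC needs only one.
    multipliers_per_mac = {32: 4, 16: 1}[operand_bits]
    return (multipliers_18x18 // multipliers_per_mac) * CLOCK_MHZ[family]


# Hypothetical device with 100 embedded 18x18 multipliers.
print(fpga_mmacs("Cyclone III", 100))      # 32-bit MACs
print(fpga_mmacs("Cyclone III", 100, 16))  # 16-bit MACs
```

This sketch counts only dedicated multipliers; as noted earlier, logic resources and registers may be used to build additional MAC units.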

Choosing FPGAs and/or DSPs


FPGAs and DSPs are different devices, and they were created for different purposes. DSPs were
created to provide an optimized platform for signal processing algorithms implemented in software,
while FPGAs were initially created for providing glue logic. Over time, DSPs and FPGAs have grown
in performance and resources, and they now provide solutions in overlapping markets. Both have
application areas in which they are the optimum solution. For example, FPGAs are by far the
superior choice for networking applications that move traffic at Gigabit/second data rates. DSPs are
the superior choice for video applications such as surveillance. However, the application spaces
of these two devices overlap in many areas.

In previous sections, we looked at the cost-performance values for DSPs and FPGAs. Table 3
summarizes these comparisons in three different MMAC performance categories: Low, Medium and
High. Table 3 also groups devices according to their cost. For example, Medium-performance
devices are sub-categorized into those costing $10-30, and those costing $30-100.

This table shows minimum cost/performance values. Note that FPGAs and DSPs differ in their
functionality and features. These features must be kept in mind while considering their
cost/performance values. Blindly choosing an FPGA for its cost advantage (FPGA cost can be as low
as 0.2 cents/MMAC) would be a mistake.

Table 3. FPGA/DSP comparison summary.

Table Note: MMAC is defined as the number of fixed-point 32-bit or single-precision floating point
multiply-and-accumulate operations that can be executed in units of millions per second.

To weigh DSP and FPGA features against each other, one can use Table 3 and Table 4 as guidelines
for choosing between FPGAs and DSPs. The decision process shown in Table 4 considers application-
specific DSP features. DSPs often come with bundled features (see Notes for Table 1) that can
translate into cost savings. Therefore, a DSP with application specific features has an advantage
over an FPGA with similar cost/performance value. Table 4 reflects this advantage.

Table 4. Guideline for choosing DSP and/or FPGA.

Table Note: MMAC is defined as the number of fixed-point 32-bit or single-precision floating point
multiply-and-accumulate operations that can be executed in units of millions per second.

For designs with an MMAC requirement below 300 MMAC, DSPs are in general the optimum solution
from a cost/performance perspective. For designs with an MMAC requirement between 300 and 1000
MMAC, the DSP is generally preferable when it comes with application specific resources (such as
video/audio ports, ARM processor, etc., as is the case with the DaVinci digital media processors).
When a DSP with application specific resources does not exist, other aspects of the design must be
considered.

For applications with performance requirements above 1000 MMAC, FPGA/DSP hybrid designs are
often the ideal solution. These applications often include multiple signal processing algorithms, some
of which have low performance requirements. In such cases, relatively inexpensive DSPs can
implement the algorithms with low-to-medium performance requirements, leaving the higher-
performance algorithms to FPGAs.
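
The thresholds described above can be condensed into a first-pass selection helper. This Python sketch is only an illustration of the guideline as stated in the text. The function name and return strings are assumptions, and the result is a starting point rather than a final decision.

```python
def recommend_platform(required_mmacs: float, dsp_has_app_features: bool) -> str:
    """First-pass platform suggestion based on the MMAC requirement."""
    if required_mmacs < 300:
        return "DSP"
    if required_mmacs <= 1000:
        # In the middle band, bundled application-specific resources
        # (video/audio ports, ARM core, etc.) tip the balance toward a DSP.
        if dsp_has_app_features:
            return "DSP"
        return "consider other aspects of the design"
    return "FPGA/DSP hybrid"


print(recommend_platform(150, False))
print(recommend_platform(600, True))
print(recommend_platform(600, False))
print(recommend_platform(5000, True))
```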

It is important to note that each design is unique. No global solution exists for choosing between
DSPs and FPGAs. The aim of this article is to provide the reader with an overview of DSPs
and FPGAs along with their cost/performance values and features. The data and guidelines
presented here may be used as the initial step for choosing a DSP and/or FPGA. The choice must
be justified based on the totality of the design requirements. In the following sections, we look at
some of the additional requirements one must consider when choosing DSPs and/or FPGAs.

DSP in signal processing applications


A DSP is a specialized CPU for signal processing applications. Its core is designed to optimally
execute signal processing algorithms, whose principal operation (multiply-and-accumulate) is
similar across almost all algorithms. DSPs are also packaged with many peripherals and different
types of memory in the same device, similar to microcontrollers. In a sense, DSPs offer the
flexibility and functionality of microcontrollers while also being well suited to signal
processing at low and medium performance levels. Therefore, DSPs become the
device of choice for system architects across a large range of applications, given the combination of:

● Microcontroller functionality,
● Being optimized for signal processing, and
● Numerous on-chip peripherals bundled in the same package.

As an example, a Nuvation customer wanted to develop a laser control loop. For that project a DSP-
capable microcontroller was optimal for cost and integration reasons. FPGAs lack integrated ADCs
(analog-to-digital converters) and DACs (digital-to-analog converters), but developers can
get that functionality in a DSP. Nuvation used a small DSP-capable microcontroller with a 12-bit
ADC, 8-bit DAC, and an Ethernet interface to handle an entire control loop without additional parts.
This approach saved considerably on the BOM cost and board complexity.

As another example, Nuvation recently worked on a motor control application with multiple control
loops that would work well in an FPGA's parallel architecture. However, a TI DSP-capable controller
was clearly optimal when we considered the cost implications of developing each separate
processor-type function (communications, supervising, etc).

FPGAs in signal processing applications


FPGA devices let designers create custom logic for widely parallel, high computation rate signal
processing. For example, an 81-tap FIR filter operating at 400 MSPS requires over 32 billion
MACs per second (32,400 MMACS). Such performance is approximately an order of magnitude beyond
the capabilities of a single DSP. However, a single FPGA can easily provide this performance.
It is important to mention that this performance is available at a cost premium, which can be up to
an order of magnitude higher than the price of a DSP. The increased cost is due to NRE costs and
the cost of the FPGA itself. For example, most signal processing systems require more than just a
simple function such as a FIR filter. Most systems perform other types of functions such as data
treatment and decision making. When using an FPGA, every additional signal processing function,
data treatment and/or algorithm requires its own specialized logic. Therefore, added system
complexity may quickly increase the device size, NRE costs and schedule.
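
The FIR figure quoted at the start of this section follows from a simple product of tap count and sample rate. The Python sketch below reproduces it, assuming a direct-form filter that performs one MAC per tap per input sample (coefficient symmetry is not exploited). The function name is an assumption made for this illustration.

```python
def fir_mmacs(taps: int, sample_rate_msps: float) -> float:
    """MMACS needed by a direct-form FIR: one MAC per tap per sample."""
    return taps * sample_rate_msps


required = fir_mmacs(81, 400)
print(f"{required:,.0f} MMACS = {required / 1000:.1f} billion MACs per second")
```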

FPGA-DSP co-processing architectures


As mentioned in the previous paragraph, each added function in the FPGA has the potential to
increase the schedule, NRE and parts cost. If the added functionality is within the capabilities of a
DSP chip, it is cost effective to implement that function on a DSP while keeping the MAC-intensive
operations in the FPGA.

In general, this means placing functions that consume less than 1000 MMAC on the DSP, and
placing functions with higher requirements on the FPGA. For example, Nuvation implemented an
envelope detection application with a 500 MSPS sample rate on a DSP-FPGA hybrid. The FPGA
performed the initial high-sampling rate filtering and decimation, and a DSP performed the
remaining signal processing functions. This system configuration profited from the advantages
offered by each platform.

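
The partitioning rule described in this section can be sketched in a few lines of Python. The workload names and MMAC figures below are hypothetical and are not measurements from the envelope detection project. The 1000 MMAC threshold is the general guideline given above.

```python
def partition(functions: dict, threshold_mmacs: float = 1000.0):
    """Split functions into (dsp, fpga) groups by their MMAC requirement."""
    dsp = {name: m for name, m in functions.items() if m < threshold_mmacs}
    fpga = {name: m for name, m in functions.items() if m >= threshold_mmacs}
    return dsp, fpga


# Hypothetical workload (MMAC requirements are illustrative only).
workload = {
    "input filtering and decimation": 20000,
    "envelope detection": 400,
    "control and reporting": 50,
}

dsp_side, fpga_side = partition(workload)
print("DSP: ", sorted(dsp_side))
print("FPGA:", sorted(fpga_side))
```

In practice the split also depends on data movement between the two devices, which this sketch ignores.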
Application example: IP camera reference design


To illustrate these design decisions in more detail, let us consider an example video application.
Video applications such as IP set-top-boxes, digital video recorders (DVRs), entertainment devices,
and digital cameras (to name a few) are growing rapidly. In such applications, system architects
search for a platform that can address the following areas:

● Video ports and other interface connectivity
● Digital signal processing power
● A CPU that can execute various scheduling, management, and control tasks
● Volume pricing for multi-unit applications

Often these applications target the consumer electronics market. This translates to cost sensitivity,
including a desire to minimize NRE costs. Furthermore, as consumer electronics is a fast moving
market, time-to-market becomes a crucial factor. Overall project risk also needs to be minimized.
Finally, the trend toward miniaturization of consumer electronics and security applications plays an
important role in the system's form factor.

The IP camera market illustrates these concerns. The explosive growth of video surveillance,
machine vision and video teleconferencing has created a need for a low-cost camera reference
design. Such a design should allow clients to achieve rapid time-to-market with minimum NRE, while
also allowing modification of the design to tailor it to a specific application. In response to these
needs, Nuvation specified the following requirements for its IP camera reference design:

● Smallest form factor IP camera
● Standard optics (CS mount) with wide dynamic range (WDR) imaging
● Low power with PoE support
● Low BOM cost
● DFM/DfX including RoHS compliance and obsolescence risk mitigation
● Full embedded Linux, real-time
● TCP/IP and/or analog video output
● H.264, MPEG-4, MJPEG encoder flexibility, up to D1 at 30 fps
● Support for custom or licensed video analytics software
● Field programmable

Figure 1. Nuvation IP camera reference design. Photo Credit: Jason Rothe.

To meet these requirements, Nuvation engineers chose a DaVinci device from TI, the
TMS320DM6446. This device is a high performance digital media system on chip (SoC) targeted at
high-end video applications. It is a dual processor device that contains a C64x+ DSP core for
accelerated video processing and an ARM9 core for co-processing tasks and peripheral
management.

As the central device in the IP camera reference design, the DM6446 is responsible for acquiring
video data, encoding it in the desired format and outputting it via Ethernet and TCP/IP. As a dual
processor device, the DM6446 allows designers to implement signal processing algorithms in the
C64x+ DSP core while executing other tasks, such as packet assembly and peripheral management,
in the ARM9 microcontroller core.

The availability of a full Linux distribution for the ARM9 is another advantage of the DM6446. Linux
allows system designers to use existing firmware in the open source community and quickly
integrate third party libraries. The DM6446's Ethernet ports, video ports, small footprint and
low power consumption were also driving factors for choosing the device. In brief, the DM6446 DSP
made it possible to meet the design requirements while minimizing NRE.

FPGA or DSP? Choose


The choice between the FPGA and DSP depends on many parameters. There is no global recipe for
making the right choice, and there are always trade-offs. It is the understanding of these trade-offs
that guides an architect to choose a platform that best meets the requirements of a specific system.
We highlighted design examples where an FPGA or a DSP is the superior choice, as well as cases that
call for a DSP/FPGA hybrid system. Using these examples, we hope to have given you more insight
into choosing the appropriate device for your design. For more information, or help on choosing the
correct device, contact Nuvation at sales@nuvation.com.

About the authors

Bamdad Afra obtained his bachelor's degree from the University of Waterloo,
Canada. He has over two years of experience in signal processing and over five years of experience in
FPGA design. He has held various technical positions at Texas Instruments and Nuvation.

Amit Kapadiya has more than 10 years of experience in the embedded system design industry. He is
currently the Marketing Manager at Nuvation, and has held hardware engineering positions prior to
marketing. His undergraduate degree in computer engineering is from the University of Waterloo,
Canada.
