REVIEW OF DSP HARDWARE TECHNOLOGY
2.1 Introduction
finds itself suddenly in the main stream of embedded systems technology. DSP has
military, medical and industrial products. While the number and variety of products
that include some form of signal processing has grown dramatically over the
last decade, the DSP hardware has also evolved according to the requirements of the
applications and algorithms [14]. Throughout the history of computing, the DSP
algorithms particularly for real-time applications have pushed the limits of number
such heavy computational demand for the real time processing of audio, video, radar,
sonar, and several other signals. Options range from dedicated full custom VLSI and
rapidly host and update algorithms, but typically operate at a fraction of theoretical
peak performance due to inefficiencies in mapping algorithms to the available
logic.
The DSP hardware, therefore, may be placed in three main categories e.g.,
DSP hardware. In this chapter, we present a brief review of the development of
these general-purpose DSPs, as well as the ASIC- and FPGA-based DSP hardware. In
the next section we discuss the evolution of the programmable DSP (referred
for executing DSP functionality. These processors can handle a variety of applications.
through a limited and fixed set of arithmetic and control operations organized and
sequenced in suitable programs. The general purpose DSPs are, therefore, also
the requirement of the applications, and are often more cost-effective than custom
custom ICs may be prohibitive [15]. These DSPs are, therefore, readily available, and
are widely applicable. In spite of their inherent inefficiency in terms of speed and
DSPs were introduced in the early 1980s. The first-generation DSPs such as the TMS32010
and NEC 7720 adopted the basic Harvard architecture (Fig. 2.1) that consisted of a
program memory, a data memory, a multiply-accumulator unit and a control unit with
Various features of the DSP architecture, e.g., the number of execution units, bus
systems, memory access for data and instructions, instruction set design, address
generation and addressing have evolved in the last twenty-five years according to the
need of the algorithms and applications. The need to minimize the cost and energy
consumption has influenced the data word width used in DSP processors. DSPs tend
to use the shortest data words that provide adequate accuracy in their target
applications. These processors now support zero-overhead looping, since they usually
spend much of their computation time executing small sections of the program
repeatedly [14]. To allow low-cost and high-performance input and output, most DSP
processors incorporate one or more specialized serial or parallel I/O interfaces, and
memory access (DMA) that facilitates data transfer with little or no intervention from
bit floating-point CPU, separate address generators, DMA control, SRAM memory,
Conventional DSP processors contain a single MAC unit and an ALU with
few execution units. They are designed to execute one MAC instruction per clock
family, and DSP560xx family operating at around 20-50 MHz. Due to their low cost,
low power consumption and low memory usage, they are widely used in consumer
augmenting additional hardware and pipelining. The examples of such processors are
DSP563xx and TMS320Cxx (e.g., TMS320C54x), which operate at 100-150 MHz and
include additional hardware such as an instruction cache and a barrel shifter to improve
the speed of DSP algorithm implementations. Apart from that, this type
of processors also use instruction and arithmetic pipelines to improve instruction
throughput and reduce overall processing time. These processors although do not
as, data parallelism to have more computation performed in every clock cycle. They
contain parallel execution units with extra multiplier and adder circuitry with
allow more data words to be accessed per clock cycle and fed to the different
execution units for parallel operation. They also have wider instruction words to
conventional DSP processor requires specialized and complex hardware for executing
to compiler targets. Therefore, new DSP processors with a multi-issue approach have been
developed.
executing the instructions in parallel groups rather than one at a time. The first multi-issue
DSP, the TMS320C62xx, was introduced in 1996. At that time, it was much faster
than other DSP processors. Now all the DSP processor vendors (TI, Analog Devices,
Motorola, and Lucent Technologies) are using multi-issue architectures for high
multiple instructions in parallel. They are very long instruction word (VLIW) and
superscalar processors.
units and it issues a maximum of four to eight instructions per clock cycle. The
processors issue and execute two to four instructions per cycle. In VLIW architecture,
parallel. Accordingly, the instructions are grouped during the assembly process.
But superscalar processors contain a specialized hardware unit, which specifies the
programmer to the processor. Both VLIW and superscalar processors require high-
speed. These processors have more execution units, which are active in parallel in
comparison to the conventional DSP processor. Besides, they also require wide on-
chip buses and memory banks to support data movement for parallel execution of
2.3 ASIC-based DSP Hardware
signal processing functions. Considerable advance has taken place in the past few
decades in the field of microelectronics, and therefore, it has been possible to realize a
complete printed circuit board on a single chip. ASICs are the key components in
performance and lower power consumption. ASICs and other semiconductor chips
with ASIC blocks are therefore widely used in space applications, defense
applications and consumer products as well. ASIC architectures can be placed in
two basic categories. One category is based on standard cells while the other is based
on gate arrays. These categories differ widely in terms of their manufacturing
techniques, cost, and development time. The gate arrays consist of
rows and columns of regular transistor structures, where each basic cell or the gate
transistors are connected together first to realize low-level functions and low-level
functions are then routed and connected to build higher-level functions. The
standard-cell ASICs, on the other hand, are designed using transistors that are already
connected together and routed to form the higher level functions like flip-flops,
adders and counters. The ASIC designers connect these cells together to
implement the higher-level functions. The standard-cell ASICs are more
customizable, achieve higher utilization of chip area and have a smaller die size than
the equivalent gate array implementation, but involve a high NRE cost and a long
turnaround time.
Most DSP ASICs use a fixed-point numeric format, because arithmetic
with a floating-point format is more complex and requires more silicon area than
the fixed-point format. The precision and the range of the numeric format used in the design,
however, affect the behavior of the system in the following three ways:
(i) Functionality: The frequency response of the system changes because the locations of
the poles and zeros of the filter change when the filter coefficients are
quantized.
arithmetic operation.
(iii) Overflow: The output of the system may be distorted due to overflow that may
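Effects (i) and (iii) can be illustrated with a minimal sketch. It assumes a hypothetical two-pole resonator (pole radius 0.98, pole angle 0.1 rad) and 16-bit two's-complement wraparound; these values and all helper names are illustrative assumptions, not taken from the chapter. Only the fractional resolution of the coefficients is modelled, not their range.

```python
import cmath
import math


def quantize(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits (fixed-point grid)."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def poles(a1, a2):
    """Roots of z**2 + a1*z + a2, i.e. the poles of 1/(1 + a1*z^-1 + a2*z^-2)."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return (-a1 + disc) / 2.0, (-a1 - disc) / 2.0


def quantized_pole(r, theta, frac_bits):
    """One pole of the resonator after its coefficients are quantized."""
    a1 = quantize(-2.0 * r * math.cos(theta), frac_bits)
    a2 = quantize(r * r, frac_bits)
    pole, _ = poles(a1, a2)
    return pole


def wrap16(x):
    """Two's-complement wraparound of an integer to 16 bits (no saturation)."""
    return (x + 0x8000) % 0x10000 - 0x8000


# Effect (i): the pole moves as fewer fractional bits are kept.
for frac_bits in (15, 7, 4):
    p = quantized_pole(0.98, 0.1, frac_bits)
    print(frac_bits, "bits: radius", abs(p), "angle", abs(cmath.phase(p)))

# Effect (iii): the sum of two in-range 16-bit values wraps to a negative number.
print(wrap16(30000 + 10000))
```

With 4 fractional bits in this example, the complex pole pair collapses onto the real axis and one pole lands on the unit circle, so the quantized filter no longer behaves like the intended resonator.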
CAD tools are available these days to analyze the effects of fixed-point
numeric formats. One can simulate the design using appropriate test signals and
analyze the fidelity of the output signals. Apart from that, filter design packages can
also be used to evaluate the behavior of a filter implemented using different fixed-
point numeric formats. The DSP ASIC designers also have the option to customize the
designs to best suit the constraints and requirements of their applications, such as area,
speed, power consumption, production cost and design cycle time. The most
requirements and sampling rates of the application. It is imperative to state that the
underlying hardware structure then directly follows the degree of parallelism in the
algorithm and the architecture. The hardware synthesis tools are found to be quite
useful for the design of hardware architecture. In the first pass, such synthesis tools
candidate architecture. In the second pass, higher-level synthesis tools are used for the
In case of ASICs, the designers need to wait for weeks for the delivery of the
finished products but the FPGA designers can realize the design on FPGA chips by
themselves in minutes. FPGA is a class of field programmable logic device that is
reconfigure the FPGA for multiple reuse. This can be an advantage in applications
that need multiple trial versions within a development cycle. FPGAs are significantly
more expensive than DSPs, but can offer higher performance in specific
applications, and can consume less power than the programmable
DSPs. FPGAs are mostly used these days for testing, rapid prototyping, and for low-
been possible to reprogram the FPGAs as many times as one needs. The same silicon
resource in an FPGA can thus be reused for a wide range of DSP functionalities. FPGAs
derive this advantage of flexibility over the ASICs at the cost of higher power
spite of all those architectural differences of the FPGAs from different vendors, they
number of more powerful logical units while the fine-grained FPGAs contain
relatively larger number of less powerful, elementary logic blocks. Most of the
popularly used FPGAs are coarse-grained and are built from logical units based on
look-up tables (LUTs). The Xilinx 4000 series family of FPGAs is composed of logical
units called configurable logic blocks (CLBs). Each CLB consists of two 4-input,
1-output LUTs, one 3-input, 1-output LUT, two flip-flops, and some multiplexers for
selecting the appropriate output from the LUTs or flip-flops. Each CLB of the Xilinx 4K
series can be used to implement a two-bit adder or a nine-bit parity checker. The logic
cell of a fine-grained FPGA like the Xilinx 6200 consists of two 4-to-1 multiplexers and
three 2-to-1 multiplexers, which can implement any two-input function or single-bit
storage. A coarse-grained FPGA like the Xilinx 4K series can also be used as a form of
distributed memory in addition to logic resources, e.g., each LUT can be used to implement a
programmable switches to connect the LUTs, memory blocks, and the flip-flops for
programmable routing.
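The nine-bit parity checker mentioned above can be sketched in software to show how the three LUTs of one CLB might cooperate. This is an illustrative model only: it assumes that the outputs of the two 4-input LUTs feed the 3-input LUT together with the ninth input bit, and the names `make_lut`, `lut_read`, `LUT_F`, `LUT_G`, `LUT_H` and `clb_parity9` are hypothetical.

```python
def make_lut(func, n_inputs):
    """Precompute the truth table of an n-input, 1-output boolean function."""
    table = []
    for index in range(1 << n_inputs):
        bits = [(index >> i) & 1 for i in range(n_inputs)]
        table.append(func(*bits) & 1)
    return table


def lut_read(table, bits):
    """Return the stored output; bits[0] is the least significant address bit."""
    index = 0
    for i, b in enumerate(bits):
        index |= (b & 1) << i
    return table[index]


def xor_all(*bits):
    """XOR of all arguments (odd parity)."""
    result = 0
    for b in bits:
        result ^= b
    return result


# Two 4-input LUTs and one 3-input LUT, matching the CLB resources in the text.
LUT_F = make_lut(xor_all, 4)
LUT_G = make_lut(xor_all, 4)
LUT_H = make_lut(xor_all, 3)


def clb_parity9(bits):
    """Parity of nine input bits using only the three LUTs of one CLB."""
    if len(bits) != 9:
        raise ValueError("expected exactly nine input bits")
    f = lut_read(LUT_F, bits[0:4])           # parity of bits 0..3
    g = lut_read(LUT_G, bits[4:8])           # parity of bits 4..7
    return lut_read(LUT_H, [f, g, bits[8]])  # combine with the ninth bit


print(clb_parity9([1, 0, 1, 1, 0, 0, 1, 0, 1]))  # five ones, prints 1
```

Each LUT is just a small memory addressed by its inputs, which is also why the same resource can serve as distributed memory.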
Until recently, however, FPGAs were rarely used to implement DSP
tasks, as they were more power hungry and lacked the gate capacity to
handle the DSP algorithms. All this may be changing with the introduction of new
DSP-oriented products from various FPGA vendors. Altera's Stratix family and
enhancements. For example, both these products offer hardwired on-chip multipliers
embedded throughout the reconfigurable logic array that are intended to accelerate
elements, FPGAs are improving their energy efficiency and cost performance while
DSP applications often exceed the performance available from even the fastest DSP
2.5 Conclusion
In this chapter we have discussed the three main categories of DSP hardware,
namely, programmable general-purpose DSP, FPGA-based DSP hardware and ASIC-
based DSP hardware. DSPs are potentially reprogrammable in the field, allowing
product upgrades, and are often more cost-effective than custom hardware,
particularly for low-volume applications, where the development cost of custom ICs
architecture to do multiple tasks in one cycle for faster realization of DSP algorithms.
dividing the memory into a number of banks. We have discussed here the evolution of
to very long instruction word (VLIW) and superscalar processors, and reconfigurable
requires chips with very high number-crunching capabilities and very high throughput
that is not met even by the fastest programmable DSP. Very often again the
tasks in such situations are usually carried out by the ASICs. ASICs can achieve high
area, but they require massive design efforts. ASIC-based hardware, however, does not
offer any flexibility of operation for more than one application or for product
upgrades under evolving technology because once the design and fabrication
design time and very high non-recurring engineering cost, and therefore, they are
currently used only for high-volume markets, time-critical and military applications.
FPGAs are thus becoming more popular these days. FPGAs possess a similar
capability to the ASICs to provide specific circuits for a given DSP application, but
differ basically in terms of their internal connectivity. Unlike dedicated hardware, the
FPGA can be time-shared between algorithms by simply reloading the configuration
code. In contrast to a programmable DSP, the FPGA actually assumes the logic
typically about 20% of a microprocessor-based design working at the same sample
rate. It is not hard to see how the configurability of FPGAs makes them ideal for
customized but reconfigurable logic that can execute specific, compute-intensive