
A High Dynamic Range Digital Down-Converter for Software Radio Application
Adrian Nash
Entegra Ltd
Maidenhead SL6 3NH
UK
Email: Adrian@entegra.co.uk

Abstract
As Digital Signal Processing encroaches further into a radio receiver's RF sections, the dynamic
range that must be handled by the Digital Down-Converter (DDC) increases. This paper presents a
high dynamic range DDC designed using HDL Design Studio and SystemView [1]. The techniques
applied to attain >100dB spur-free dynamic range in a compact design are reviewed. This is
followed by a presentation of the automated design-flow from system-level design through to
synthesis in FPGA technology.

1. Introduction

As the dynamic range and bandwidth capabilities of both data converters and DSP (software and
hardware) increase, DSP is taking on more of the IF or even lower RF signal processing roles in
a receiver. This is especially so for multi-channel receivers.
For some time, wireless IC vendors have offered very capable data converters and front-end
DSP chips to down-convert a receiver's IF or RF signals to a zero-IF complex signal with a
limited bandwidth. Whilst these ICs are usually programmable to suit a wide range of needs,
they have a fixed architecture. This can result in a poor cost-to-performance trade-off in some
applications that require more channels than the IC has to offer. Since the IC's architecture
cannot be changed by the user, the only alternatives are to either re-design the radio card with
more ICs, or to adopt a modular approach in which extra receiver cards are provided; both
options are expensive.
For this reason, many designers are adopting the FPGA. FPGA vendors have responded to the
demand by producing architectures that inherently support high speed DSP operations. The
FPGA enables the architecture to be programmable. However, FPGAs have a high unit cost,
relatively high power consumption and relatively little available memory (RAM or ROM)
compared to ASICs since much of the device must be dedicated to interconnect and logic cells.
This paper presents a fully custom-designed, area-efficient Direct Digital Down-Converter
(DDC) suitable for FPGA implementation. Its Numerically Controlled Oscillator (NCO) achieves a
high Spur-Free Dynamic Range (SFDR) by using interpolation, rather than the more common
combination of a massive sin(x) ROM lookup table and Dithering logic.
A design-flow is presented in which the DDC design is automatically converted from the system
design plane to the implementation plane in an FPGA vendor-independent manner.

[Figure 1 diagram labels: IF input signal (16 bits); NCO (16 bit outputs); I channel and Q channel
paths, each with an image rejection half-band filter and a decimating CIC filter; 20 bit data
paths; I channel output; Q channel output.]

Figure 1: The Digital Down Converter (DDC) showing the main components.

2. Description

2.1 Overview
Figure 1 shows the main components of the DDC. This design demodulates an IF centred on
9.0MHz with a 12MHz bandwidth. The Spur-Free Dynamic Range (SFDR) is designed to be
100dB or better. The sample rate, fS, is 50MHz.
The Demodulator is composed of two 16x16 bit multipliers, each of which produces a 32 bit result.
The 20 Most Significant Bits (MSBs) are retained (with rounding) to limit the growth of spurs to
< -100dBc. It produces products at fIF - fNCO and fIF + fNCO (the image).
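The reduction of each 32 bit product to its 20 MSBs can be sketched as follows. This is a minimal
illustration rather than the design's actual logic: the paper says only that rounding is applied,
so the round-half-up rule and the name multiply_round are assumptions.

```python
def multiply_round(a, b, in_bits=16, out_bits=20):
    """Multiply two signed in_bits words and keep the out_bits MSBs of the product.

    Rounding is to nearest, with ties rounded up (an assumed rounding mode).
    """
    product = a * b                      # exact product, fits in 2 * in_bits bits
    drop = 2 * in_bits - out_bits        # number of LSBs discarded (12 here)
    # Adding half of the discarded weight before the shift turns truncation
    # into rounding; Python's >> is an arithmetic (sign-preserving) shift.
    return (product + (1 << (drop - 1))) >> drop


if __name__ == "__main__":
    for a, b in [(32767, 32767), (-32768, 32767), (1, 2048), (1, 2047)]:
        print(a, b, multiply_round(a, b))
```

Plain truncation would bias every product downwards by about half an LSB on average, which appears
as a small DC offset at the mixer output; rounding removes most of that bias.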
The Image Rejection Filter is a 27 tap half-band FIR filter with 12 bit coefficients that limits the
image at fNCO + fIF to -78dBc. Its cut-off frequency is fS /8.
Following the Image Rejection Filter is the Decimation Filter. The Decimation Filter is a 3rd
order Cascaded-Integrator-Comb (CIC) type which decimates by a factor of 4. It provides a
further 33dB of rejection at the image frequency, bringing the total image rejection to 111dB.
2.2 The Numerically Controlled Oscillator (NCO)

The NCO is shown schematically in Figure 2. It generates sin(x) and cos(x) signals which are
fed to the demodulator's multipliers. The main feature of the NCO is the very small Lookup
Table (LUT) used, only 512 x (16 + 8) = 12,288 bits. The FPGA that the design is targeted at
has a limited amount of memory resource which can be allocated as ROM for the LUTs.
The worst-case spur level for an NCO depends upon the lookup table size, in both length
(representing phase precision) and width (representing amplitude precision). In some
commercial DDC chips, massive lookup tables are used. In an ASIC or volume production IC,
memory is low cost and in plentiful supply. Therefore, it makes sense to use memory to gain
dynamic range. However, in the case of an FPGA, the available memory cells may be very
limited. In this design, interpolation is used to achieve a high dynamic range with two short
lookup tables.

[Figure 2 diagram labels: Phase increment (32 bits); -1; Extract b[31:30]; Extract b[29:21];
Extract b[20:3]; Quadrant selection logic; Y0 coeff. 512 word LUT; Y1 coeff. 512 word LUT;
16 bit signals.]

Figure 2: Simplified diagram of the NCO. Though the NCO generates both sin(x) and cos(x),
only one channel is shown.
If a B-bit phase accumulator is truncated to b bits, the phase quantization results in a worst-case
spur level of 13 - 6b dBc, i.e. a Spur-Free Dynamic Range (SFDR) of 6b - 13 dB [2]. In this
example, B is 32 bits and b is 11 bits (ignoring the interpolation bits for the moment). Thus for
11 bits, the worst-case SFDR is 53dB.
By adding a phase dithering signal of B-b bits to the phase accumulator output, the worst-case
spur level can be reduced to 7.84 - 12.04b dBc [2]. However, Dithering raises the noise floor and
so degrades the Signal to Noise Ratio (SNR).
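As a worked check, the two expressions quoted from [2] can be evaluated for the b = 11 bit phase
word used here. Each expression is read as a worst-case spur level relative to the carrier, so a
more negative value means a higher SFDR; this reading is the one consistent with the 53dB figure
above.

```latex
\begin{align*}
\text{truncation only:}   \quad 13 - 6b        &= 13 - 6(11)        = -53\ \text{dBc}   \\
\text{with phase dither:} \quad 7.84 - 12.04b  &= 7.84 - 12.04(11)  = -124.6\ \text{dBc}
\end{align*}
```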
In a broad-band communication system, the slight degradation in SNR caused by Dithering is
usually acceptable because the Noise Power Spectral Density is low and may be masked by the
noise from the RF section. However, in this application, narrow band communication systems
may be used. If a high close-in dynamic range and selectivity is required, the noise level due to
Dithering may be too high. Linear interpolation has the benefit that it significantly lowers the
average spur power but without raising the noise floor, especially close to the IF centre
frequency. Ultimately the maximum SFDR is limited by the amplitude quantization which is 16
bits and the noise floor is limited by the jitter of the clock generator and A/D converters.
The 32 bit phase accumulator is split into 11 integer bits and 7 fractional bits. The 11 LSBs are
ignored since they have been found to have minimal impact on the SFDR. The fractional bits are
used for linear interpolation. The output from the NCO is:
$y[n] = s(\phi_I[n]) + \{ s(\phi_I[n]) - s(\phi_I[n] - 1) \} \, \phi_F[n]$    (1)
Where $y[n]$ is the output from the NCO, $s(\cdot)$ is a sinusoidal function, $\phi_I[n]$ is the
integer part of the phase $\phi[n]$ and $\phi_F[n]$ is the fractional part of $\phi[n]$.
The term $s(\phi_I[n]) - s(\phi_I[n] - 1)$ is most conveniently pre-calculated and stored in a
secondary lookup table. The secondary table needs fewer amplitude bits than the primary table
because it holds only the small difference between adjacent samples (8 bits against 16 bits in
this design). From Eq. 1 it can be seen that a multiplier and adder are required.
In an FPGA architecture that has embedded multipliers and/or adders, this may represent little
overhead in logic.
The lookup tables are both 512 words long and store a quarter of a cycle. The 11 integer bits are
split into 9 table-indexing bits and the 2 MSBs are used to indicate the quadrant.
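The table arrangement described above can be sketched in a few lines. This is an illustrative
model under stated assumptions, not the paper's implementation: the difference table holds
s(i+1) - s(i) (the usual forward form of linear interpolation, which may not be the exact term
stored in the design), 7 interpolation bits are taken from the text, the 90 degree point is handled
by a special case, and all names (Y0, Y1, nco_sin, nco_cos) are hypothetical.

```python
import math

ACC_BITS = 32      # phase accumulator width (from the text)
INDEX_BITS = 9     # 512 word quarter-wave tables (from the text)
FRAC_BITS = 7      # interpolation bits (the text states 7; treat as adjustable)
AMP_BITS = 16      # amplitude precision of the primary table (from the text)

TABLE_LEN = 1 << INDEX_BITS
SCALE = (1 << (AMP_BITS - 1)) - 1          # full-scale amplitude, 32767


def quarter_sine(i):
    """Quantised sine sample at table position i (0..TABLE_LEN spans 0..90 degrees)."""
    return round(SCALE * math.sin(0.5 * math.pi * i / TABLE_LEN))


# Primary table: sine samples. Secondary table: difference to the next sample
# (assumed forward difference); its largest entry is about 101, so 8 bits suffice.
Y0 = [quarter_sine(i) for i in range(TABLE_LEN)]
Y1 = [quarter_sine(i + 1) - Y0[i] for i in range(TABLE_LEN)]


def quarter_wave(pos):
    """Interpolated sine for a fixed-point position in [0, TABLE_LEN << FRAC_BITS]."""
    idx, frac = pos >> FRAC_BITS, pos & ((1 << FRAC_BITS) - 1)
    if idx == TABLE_LEN:                   # exactly 90 degrees: the sine peak
        return SCALE
    # One multiply and one add, with the product rounded back to table precision.
    return Y0[idx] + ((Y1[idx] * frac + (1 << (FRAC_BITS - 1))) >> FRAC_BITS)


def nco_sin(phase):
    """Sine output for an ACC_BITS-wide phase accumulator value."""
    quadrant = phase >> (ACC_BITS - 2)                       # the 2 MSBs
    shift = ACC_BITS - 2 - INDEX_BITS - FRAC_BITS            # LSBs that are ignored
    pos = (phase >> shift) & ((1 << (INDEX_BITS + FRAC_BITS)) - 1)
    if quadrant & 1:                                         # quadrants 1 and 3: mirror
        pos = (TABLE_LEN << FRAC_BITS) - pos
    value = quarter_wave(pos)
    return -value if quadrant & 2 else value                 # quadrants 2 and 3: negate


def nco_cos(phase):
    """Cosine output: the sine advanced by a quarter cycle."""
    return nco_sin((phase + (1 << (ACC_BITS - 2))) & ((1 << ACC_BITS) - 1))


if __name__ == "__main__":
    increment = round(9.0e6 / 50.0e6 * 2 ** ACC_BITS)        # 9MHz NCO at fS = 50MHz
    phase = 0
    for _ in range(8):
        print(nco_sin(phase), nco_cos(phase))
        phase = (phase + increment) & ((1 << ACC_BITS) - 1)
```

In this sketch the stored data amount to 512 x (16 + 8) bits, matching the table size quoted
earlier, because the sine rises by at most about 101 counts between adjacent entries.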

Figure 3: Frequency and phase response of the image rejection filter.

2.3 Filtering

2.3.1 Image Rejection Filter

The image rejection low-pass filter is a 27 tap half-band design. The frequency response is
shown in Figure 3. The corner frequency is set to fS/8. The stop-band frequency is set to 3fS/8.
The minimum stop-band attenuation depends upon the precision of the coefficients. Using 12
bits, the minimum at fS/2 is about 62dB. As can be seen from Figure 3, the rejection at the
18MHz image is 77dB.
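The structure that makes a half-band filter attractive in hardware can be seen in a short sketch.
The paper does not state how the coefficients were designed, so a Hamming-windowed ideal response
is assumed here purely for illustration; its attenuation figures will therefore not match those
quoted above. The function names are hypothetical.

```python
import math

TAPS = 27              # filter length (from the text)
COEFF_BITS = 12        # coefficient precision (from the text)
HALF = TAPS // 2       # index of the centre tap


def halfband_coefficients():
    """Hamming-windowed ideal half-band response, quantised to COEFF_BITS (assumed design)."""
    scale = 1 << (COEFF_BITS - 1)                      # 2048 counts represent 1.0
    coeffs = []
    for k in range(TAPS):
        n = k - HALF                                   # offset from the centre tap
        ideal = 0.5 if n == 0 else math.sin(0.5 * math.pi * n) / (math.pi * n)
        window = 0.54 + 0.46 * math.cos(math.pi * n / HALF)
        coeffs.append(round(ideal * window * scale))
    return coeffs


def magnitude(coeffs, f, fs):
    """Magnitude of the FIR frequency response at frequency f, in coefficient counts."""
    re = sum(c * math.cos(2 * math.pi * f * k / fs) for k, c in enumerate(coeffs))
    im = sum(c * math.sin(2 * math.pi * f * k / fs) for k, c in enumerate(coeffs))
    return math.hypot(re, im)


def gain_db(coeffs, f, fs):
    """Gain at f relative to the DC gain, in dB."""
    return 20 * math.log10(max(magnitude(coeffs, f, fs), 1e-12) / abs(sum(coeffs)))


if __name__ == "__main__":
    h = halfband_coefficients()
    print(h)
    print("non-zero taps:", sum(1 for c in h if c))
    print("gain at 18MHz: %.1f dB" % gain_db(h, 18e6, 50e6))
```

Every even-offset coefficient other than the centre tap is zero, so only 15 of the 27 taps need
any arithmetic, and the symmetry of the remaining taps halves the number of distinct coefficients.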
2.3.2 Decimation Filter

The Decimation filter is a 3rd order Cascaded Integrator Comb (CIC) type. The schematic (from
SystemView [1]) is shown in Figure 4. The number of bits required in the accumulators is 2M
+ N where N is the number of input bits and M is the number of stages; the factor of 2 is log2
of the decimation factor of 4.
Since the input to the filter is 20 bits wide (Figure 1), the accumulators must therefore have a
width of 26 bits. The integrators naturally overflow (since the differentiators are placed after
the decimation) but the overflows are deterministic and do not affect the overall filter's
behaviour.
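The claim that integrator overflow is harmless can be checked with a small bit-true model. The
sketch below assumes a 20 bit input (the width of the data paths labelled in Figure 1, which gives
the 26 bit accumulators quoted above), a comb differential delay of one output sample, and
two's-complement wrap-around; the function names are hypothetical. It also evaluates the standard
CIC magnitude response at the 18MHz image, which under these assumptions comes to roughly 34dB of
attenuation, close to the 33dB quoted earlier.

```python
import math


def wrap(value, bits):
    """Wrap an integer into a signed two's-complement word of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def cic_decimate(samples, stages=3, rate=4, in_bits=20):
    """Bit-true CIC decimator: integrators at the input rate, combs at the output rate."""
    acc_bits = in_bits + math.ceil(stages * math.log2(rate))   # 20 + 3 * 2 = 26
    integrators = [0] * stages
    comb_delays = [0] * stages
    output = []
    for i, x in enumerate(samples):
        v = x
        for k in range(stages):                 # integrators are allowed to overflow
            integrators[k] = wrap(integrators[k] + v, acc_bits)
            v = integrators[k]
        if i % rate == rate - 1:                # keep one sample in every `rate`
            for k in range(stages):             # combs (differentiators)
                diff = wrap(v - comb_delays[k], acc_bits)
                comb_delays[k] = v
                v = diff
            output.append(v)
    return output


def cic_gain_db(f, fs, stages=3, rate=4):
    """CIC gain at frequency f relative to DC, in dB (differential delay of 1 assumed)."""
    num = math.sin(math.pi * f * rate / fs)
    den = rate * math.sin(math.pi * f / fs)
    return 20 * stages * math.log10(abs(num / den))


if __name__ == "__main__":
    full_scale = (1 << 19) - 1                  # largest positive 20 bit input
    out = cic_decimate([full_scale] * 400)
    print(out[-1], 64 * full_scale)             # settled output against the expected DC gain
    print("gain at 18MHz: %.2f dB" % cic_gain_db(18e6, 50e6))
```

With a constant full-scale input the first integrator exceeds the 26 bit range within a few
hundred samples, yet the settled output still equals the input multiplied by the DC gain of
4^3 = 64, because all additions and subtractions are carried out modulo 2^26.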

Figure 4: The Cascaded Integrator-Comb decimation filter (SystemView).

3. Implementation

The DDC was designed in SystemView by Elanix [1]. The main reasons for selecting this tool
were:
- A wide range of libraries including RF/analogue, IEEE 802.11 and UMTS, enabling a wide
  range of radio test benches to be constructed around the DDC design.
- Efficient multi-rate scheduler.
- Bit-precise DSP library.
- Support for HDL Design Studio, enabling the design to be converted to VHDL and
  combined with an existing hardware design in one environment.
3.1 Design Flow
Figure 5 illustrates the design flow. The design process started with a high precision (64 bit
floating point) simulation model that included models of the transmitter, RF channel, low-noise
front-end, RF down-conversion mixer and A/D converter. Such a model enables the
performance of the DDC to be measured in terms of link performance metrics, e.g. Bit-Error
Ratio. From this model, the precision specification for each block was ascertained.

Once the block specifications had been decided, the finite precision model was created using the
techniques described in Section 2. This model was then tested by substituting it for the high
precision model in the SystemView Testbench. The finite-precision ("bit-true") design
was automatically converted to VHDL using HDL Design Studio.

[Figure 5 flow-chart labels. Steps: Floating point design; Fixed-point design; Convert to VHDL;
Block level VHDL simulation; System level VHDL simulation; results comparison; Test; Import into
main chip design; Synthesis; Routing; Post-layout simulation. Tool domains: SystemView; HDL
Design Studio; EDA tools (any vendor).]

Figure 5: Design flow used for the DDC.

The SystemView-designed DDC block can be merged into a complete chip design using HDL
Design Studio's graphical design management tool. A working solution requires many other
components besides the DSP. In this example, A/D interface, D/A interface and control logic
(including the control registers and DSP bus interface) are also required. This support logic was
inherited from a previous IC design project.
The VHDL output from HDL Design Studio was simulated in ModelSim (though other
simulators could be used) with a Testbench that was automatically generated by HDL Design
Studio. The HDL simulation results were imported back into SystemView's Analysis tool for
direct comparison with the system-level simulation. Sometimes, changes to the system-level
design, such as the addition of pipelining registers (delays) to speed up the design in hardware,
are required. The cycle of exporting to HDL Design Studio and then simulating the HDL design is a
tight loop thanks to HDL Design Studio's macro-based script processing and process
binding tools.
The VHDL was synthesized to the target device (Xilinx Virtex-II) using Leonardo Spectrum.
HDL Design Studio is Vendor-independent so the simulation, synthesis and other down-stream
tools can be whatever the user decides to use. These tools can be run under the control of HDL
Design Studio using macro-based scripts. Synthesis results for both Xilinx and Altera devices
are presented later.
3.2 HDL Design Studio

HDL Design Studio is a language-based hardware design environment. When a DSP design in
SystemView is exported to HDL Design Studio, a Netlist file is generated which is translated
into hierarchical VHDL or Verilog. SystemView provides not only hierarchical information but
clock-domain and feedback loop information which HDL Design Studio's translator exploits to
create a synthesizable, synchronous logic design. HDL Design Studio also creates a multiple
Clock Domain Testbench using test vectors and expected results generated by SystemView.
HDS embeds the automatically generated HDL with existing, user-created HDL in a non-destructive
manner. This results in a very efficient incremental design loop.
Figure 6 illustrates a key concept in HDL Design Studio: the automatic generation of flow-control
signals. SystemView is a multi-rate simulator in which a time-active block (such as a
decimator) changes the sample rate (the event rate in a more general data-flow sense) for
down-stream blocks. The Netlist generator analyzes the user's design and identifies Clock Domains
(signals are timed to different clocks which are usually asynchronous with respect to each other)
and Clock-Enable (or simply Enable) Domains. HDL Design Studio generates an infrastructure
of clocks and data-valid (Ready) flags to indicate when data are valid. The automatic
management of data flow saves the system-level designer from becoming involved in the detailed
data-transport signals required in a Register Transfer Level (RTL) design.

[Figure 6 diagram labels: Data In; Time-passive blocks; Time-active blocks; Data Out; CE; RDY;
Data path signals; Automatically generated flow control signals; Enable Domain 1; Enable
Domain 2.]

Figure 6: HDL Design Studio's automatic generation of "Enable Domains" to provide event-driven data flow.
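The Enable Domain idea can be illustrated with a small behavioural model. This is only a sketch of
the concept, not code produced by HDL Design Studio: the class names, the single shared clock and
the way the RDY flag is wired to the next block's CE are all assumptions.

```python
class Decimator:
    """Time-active block: passes every `rate`-th sample and flags it with RDY."""

    def __init__(self, rate):
        self.rate = rate
        self.count = 0
        self.data_out = 0
        self.rdy = False

    def clock(self, data_in, ce=True):
        self.rdy = False                       # RDY is a one-clock pulse
        if ce:
            self.count = (self.count + 1) % self.rate
            if self.count == 0:
                self.data_out = data_in
                self.rdy = True


class Register:
    """Time-passive block: updates only when its clock enable (CE) is asserted."""

    def __init__(self):
        self.data_out = 0
        self.updates = 0

    def clock(self, data_in, ce):
        if ce:
            self.data_out = data_in
            self.updates += 1


def run(samples, rate=4):
    """Clock both blocks once per input sample; the decimator's RDY drives the register's CE."""
    decimator, register = Decimator(rate), Register()
    captured = []
    for x in samples:
        decimator.clock(x)                                  # Enable Domain 1: every clock
        register.clock(decimator.data_out, decimator.rdy)   # Enable Domain 2: on RDY only
        if decimator.rdy:
            captured.append(register.data_out)
    return captured, register.updates


if __name__ == "__main__":
    print(run(list(range(16))))
```

Both blocks share one clock, but the register after the decimator changes state only on one clock
in four, which is how a lower sample rate is carried through a fully synchronous design.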
HDL Design Studio creates HDL for the design, and provides a framework to support a wide
range of VHDL or Verilog modelling techniques. This includes models for many of the DSP
blocks in SystemView. In addition, custom functions written in VHDL or Verilog can be easily
included together with IP Vendor functions, provided that an appropriate SystemView run-time
model is available.
The VHDL code that represents the DSP blocks in SystemView is supplied as free source code
so that a user can extend or modify the functionality as required. When the design is translated,
HDL Design Studio references the appropriate VHDL code modules from the library. It also
references any user's custom code modules.
Figure 7 shows the DDC design in SystemView. Figure 8 shows the same design exported to
HDL Design Studio. In the HDL Design Studio environment, the DDC design can be integrated
with other HDL blocks such as the control logic and A/D converter interface (not shown).

Figure 7: The SystemView schematic for the Digital Down-Converter.

Figure 8: The equivalent Digital Down-Converter design in HDL Design Studio. Each circle represents a hierarchical object. Grey circles have child
blocks. Green circles are "Leaf" blocks.

3.3 Automatic Filter Design Generation


The Image Rejection filters are automatically translated from SystemView to a synthesizable
VHDL design using HDL Design Studio's Optimized Parallel FIR Filter Generator (OPFGEN)
tool. This tool is automatically called as part of the design translation process. It produces
multiplier-free fixed coefficient filter designs given an arbitrary set of coefficients. OPFGEN
can create a design for any Finite Impulse Response desired so is not limited to specific filter
designs. Currently only Feed-forward taps are supported, but it is planned to support feedback
taps in the future enabling completely arbitrary difference equations to be translated to VHDL.
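The paper does not describe how OPFGEN removes the multipliers. One common technique for
fixed-coefficient filters, shown here only as an assumed illustration, is to recode each
coefficient in canonical signed digit (CSD) form so that the multiplication becomes a handful of
shifts, additions and subtractions. The function names are hypothetical.

```python
def csd_digits(coefficient):
    """Return [(shift, sign), ...] such that coefficient == sum(sign * 2**shift)."""
    digits = []
    shift = 0
    while coefficient != 0:
        if coefficient & 1:
            # +1 when the value is 1 mod 4, -1 when it is 3 mod 4; either choice
            # leaves a multiple of 4, so no two non-zero digits are adjacent.
            sign = 2 - (coefficient & 3)
            digits.append((shift, sign))
            coefficient -= sign
        coefficient >>= 1
        shift += 1
    return digits


def multiply_by_constant(x, digits):
    """Multiply x by the recoded constant using only shifts and adds/subtracts."""
    return sum(sign * (x << shift) for shift, sign in digits)


if __name__ == "__main__":
    for c in (7, 1024, -325, 2047):
        d = csd_digits(c)
        print(c, d, multiply_by_constant(1000, d))
```

For instance, the coefficient 7 is recoded as 8 - 1, so the product costs one subtraction instead
of the two additions that a plain binary expansion (4 + 2 + 1) would need.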

4. Simulation Results

Figure 9 shows the complex spectrum of the I + jQ output signal from the VHDL simulation
(imported into SystemView's Analysis tool). These results exactly match the SystemView
simulation as expected. For a 9.500MHz input signal (NCO tuned to 9.000MHz), the worst-case
spur in the 12.5MHz bandwidth is -106.6dBc at 6.000MHz.

5. Synthesis Results

The DDC design has been synthesized using Leonardo Spectrum (level 2) to two types of
device: a Xilinx Virtex-II (XC2V500fg256-6), and an Altera Stratix (EP1S25F1020C-5). The
results for area (Xilinx Slices and Altera Logic Cells) and maximum clock speed are
summarized in Tables 1 and 2.

[Figure 9 plot (SystemView): "VHDL simulation results: I + jQ output with 9.5MHz input signal".
Horizontal axis: Frequency, Hz, from -5e+6 to 5e+6. Vertical axis: Magnitude, dB, with ticks at
-50, -100 and -150.]

Figure 9: VHDL Simulation results. Complex spectrum of the I + jQ output with a 9.5000MHz input signal.

ALTERA Stratix EP1S25F1020C-5

Design Element            Logic Cells    Max. Clock speed (MHz)
NCO                       301            271.6
Image Rejection Filter    975            408.1
CIC Decimation Filter     286            325.8
Complete DDC Design       2931           271.6

Table 1: Synthesis results for Altera Stratix.

Xilinx XC2V500fg256-6

Design Element            Slices         Max. Clock speed (MHz)
NCO                       126            177.4
Image Rejection Filter    482            233.5
CIC Decimation Filter     139            241.4
Complete DDC Design       1397           102.1

Table 2: Synthesis results for Xilinx Virtex-II.

In both Xilinx and Altera devices, the limitation on maximum clock speed is due to the path from
the NCO output to the 32 bit multipliers (Tokens 3 and 4 in Figure 7). The NCO itself is the
slowest component; this is due to the limitations of the 15 bit interpolation multipliers. In the
case of Virtex-II, the embedded 18x18 bit multipliers were used in all cases. A disadvantage of
these multipliers is that, due to their fixed position in the chip layout, the routing delay
associated with them can often be quite significant. In this design, the critical path timing was
around 10ns, of which 6ns was due to the multiplier routing. Use of a conventional pipelined
multiplier may improve the overall speed for Virtex-II significantly.
The required sample rate is 50MHz, but the design is capable of being clocked at least twice as
fast. This means that a significant reduction in area can be made by multiplexing some hardware.
For example, the filters could be multiplexed between I and Q by clocking at 100MHz.
OPFGEN supports multi-channel filters. Whilst the multiplexing logic can be introduced at the
system design level, greater reductions in area may be gained by applying architecture-specific
multiplexing logic at the RTL level once the basic design has been proven to work correctly.

6. Conclusion

This paper has presented a high dynamic range Digital Down-Converter (DDC). The DDC's
Numerically Controlled Oscillator employs linear interpolation to vastly reduce the size of the
sin(x) lookup table for a given Spur-Free Dynamic Range. The avoidance of Dithering enables the
close-in dynamic range to be maximized.
The DDC was designed in SystemView then automatically translated to a synthesizable VHDL
design using HDL Design Studio. HDL Design Studio's automatic insertion of event-driven data
flow control signals and generation of multiplier-free VHDL code for the image rejection FIR
filters was also presented. The simulation results compare exactly with those of the SystemView
simulation. Though not presented here, a post-synthesis simulation for Xilinx Virtex-II also
matches the SystemView simulation results.
The synthesis results for two leading FPGAs (Altera Stratix and Xilinx Virtex-II) indicate that
the resulting design is both area-efficient and fast. Some limitation is encountered through the
use of embedded multipliers in Virtex-II.

7. References

[1] http://www.elanix.com
[2] M. J. Flanagan, G. A. Zimmerman. Spur-Reduced Digital Sinusoid Synthesis. IEEE Trans. Communications.
Vol. 43, No.7, July 1995, pp. 2254-2262.
