Ashkan Borna, Student, Dept. of EECS, University of Michigan, ashborna@umich.edu
Mojtaba Mehrara, Student, Dept. of EECS, University of Michigan, mehrara@umich.edu
Robert Mullenix, Student, Dept. of EECS, University of Michigan, rbmullen@umich.edu
Brian Pietras, Student, Dept. of EECS, University of Michigan, bpeitras@umich.edu
ABSTRACT
In this report, we describe in detail the implementation of a
Viterbi decoder on a 16-bit RISC microprocessor with a 2-stage
pipeline. We added extra modules and instructions to the baseline
processor to optimize it as a Viterbi decoder. The Viterbi
algorithm decodes a class of error-correcting codes called
convolutional codes, which are widely used as channel codes in
digital communication systems. We discuss our design choices and
supplemental additions, as well as some of the pitfalls
encountered.
Keywords
Viterbi algorithm, convolutional codes, RISC processor, VLSI.
1. INTRODUCTION
The Viterbi algorithm is widely used in communication systems
to extract the most probable bit sequence out of a transmitted data
stream that has been encoded using convolutional codes. The
algorithm is based on computing the distance (Hamming distance
for hard input data and Euclidean distance for soft input data)
between the received data sequence and all possible sequences,
and extracting the most probable one.
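As an illustration of the distance-based search described above, the following is a minimal hard-decision sketch in Python. It assumes a hypothetical rate-1/2, constraint-length-3 code with octal generators (7, 5); these parameters and all function names are assumptions chosen for brevity and are not the encoder or the assembly program used in this report. For soft inputs, the Hamming branch metric would be swapped for the squared Euclidean one.

```python
# Assumed code parameters (illustrative only, not the report's encoder).
G = (0b111, 0b101)          # generator polynomials, octal (7, 5)
K = 3                       # constraint length
N_STATES = 1 << (K - 1)     # 4 trellis states

def parity(x):
    return bin(x).count("1") & 1

def step(state, bit):
    """Return (next_state, output_pair) for one input bit."""
    reg = (bit << (K - 1)) | state            # newest bit enters at the MSB
    out = tuple(parity(reg & g) for g in G)
    return reg >> 1, out

def encode(bits):
    state, out = 0, []
    for b in bits:
        state, pair = step(state, b)
        out.extend(pair)
    return out

def hamming(a, b):
    """Branch metric for hard inputs."""
    return sum(x != y for x, y in zip(a, b))

def squared_euclidean(a, b):
    """Branch metric for soft inputs (not used by the decoder below)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def viterbi_decode(received):
    """Hard-decision decoding; keeps one survivor path per state."""
    inf = float("inf")
    metrics = [0] + [inf] * (N_STATES - 1)    # encoder starts in state 0
    paths = [[] for _ in range(N_STATES)]
    for i in range(0, len(received), 2):
        symbol = tuple(received[i:i + 2])
        new_metrics = [inf] * N_STATES
        new_paths = [None] * N_STATES
        for s in range(N_STATES):
            if metrics[s] == inf:
                continue                      # state not reachable yet
            for bit in (0, 1):
                nxt, out = step(s, bit)
                m = metrics[s] + hamming(out, symbol)   # add
                if m < new_metrics[nxt]:                # compare and select
                    new_metrics[nxt] = m
                    new_paths[nxt] = paths[s] + [bit]
        metrics, paths = new_metrics, new_paths
    best = min(range(N_STATES), key=lambda s: metrics[s])
    return paths[best]

# Usage: the last two zeros flush the encoder back to state 0.
message = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
noisy = encode(message)
noisy[3] ^= 1                                 # inject one channel error
decoded = viterbi_decode(noisy)
```

The sketch stores full survivor paths for clarity; a hardware decoder would more likely keep path metrics and decision bits and trace back, which is a design choice outside what this example shows.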
2. BACKGROUND
2.1 Convolutional Encoder
3. CHIP OVERVIEW
3.1 Design Considerations
3.2.1 Datapath
Our fully custom datapath incorporates the register file, ALU,
shifter, and enough multiplexers (muxes) and tri-state buffers to
control the flow of data from one unit to another. Although we
designed each component independently, we adjusted input and
output ports as their interactions revealed more accurate timing
and load constraints. Section 6 details the timing specifics.
the non-uniform length made custom implementation very
time-consuming with only marginal perceived performance gains. In
hindsight, we probably should have picked the fixed-length
design.
For our shifter, we used the funnel shifter design described in [6].
We picked this shifter because of its simple, intuitive layout and
flexible functionality. We achieved logical and arithmetic right
and left shifts just by using a 4-to-1 mux on the input. Although
the Viterbi algorithm did not explicitly require arithmetic shifts,
we gained this functionality with almost zero additional work.
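A behavioral sketch of the funnel-shifter idea follows: a 16-bit window is selected from a 32-bit word, and the shift type is set purely by what is fed into the two input halves. The mapping of shift modes to inputs shown here is the textbook arrangement and is an assumption; it is not necessarily how our 4-to-1 input mux is wired. The names funnel and shift are hypothetical.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1

def funnel(high, low, offset):
    """Select a WIDTH-bit window from the word {high, low}, starting at bit `offset`."""
    assert 0 <= offset <= WIDTH
    wide = ((high & MASK) << WIDTH) | (low & MASK)
    return (wide >> offset) & MASK

def shift(a, k, mode):
    """Shift the 16-bit value `a` by `k` positions using only the funnel."""
    assert 0 <= k < WIDTH
    a &= MASK
    if mode == "srl":                       # logical right: zeros above a
        return funnel(0, a, k)
    if mode == "sra":                       # arithmetic right: sign bits above a
        sign = MASK if a & (1 << (WIDTH - 1)) else 0
        return funnel(sign, a, k)
    if mode == "sll":                       # left (logical and arithmetic coincide)
        return funnel(a, 0, WIDTH - k)
    raise ValueError(f"unknown mode: {mode}")
```

Because only the input selection changes between modes, adding arithmetic shifts costs one more mux input rather than a new shifting array, which matches the "almost zero additional work" observation above.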
During the datapath construction, we noticed poorer than
expected performance from our components. Our muxes did not
pass data with sufficiently low delay. We spent some time
investigating different mux designs until we settled on a
six-transistor circuit (a PMOS, an NMOS, and two inverters as
output buffers) that met our needs. It had the lowest delay of the
designs we tested, and we could fit two of them inside our
bit-slice width of 73.5 lambda.
3.2.2 Controller
For the most part, the controller determines the next state logic on
the positive edge of the clock. We could not, however, get the
Next_PC register to set correctly unless we used the negative
edge. This had a detrimental effect on our final clock speed, since
that forced the branch instruction to finish in half a cycle.
BMU Rdest
PMU Rsrc1,Rsrc2,Rdest
Swap1(2) Rsrc1,Rsrc2,Rdest
Hs Rsrc,Rdest (half shift)
Hcmp1(2,3) Rsrc,Rdest
[Figure: PMU blocks labeled PM0, PM1, PM2, PM3]
Figure 5 shows the program flow for decoding the input data.
4. TESTABILITY
[Figure labels: scan_in, scan_out, PSR, IR, PC]
Component          Isolated Delay    Integrated Delay
(label missing)    0.7835 ns         0.8646 ns
(label missing)    0.6744 ns         0.7224 ns
ALU                2.4154 ns         2.3589 ns
Shifter            0.290 ns          0.6709 ns
5. PINS
Our processor uses twenty-six pins: twelve for I/O and the
remaining fourteen for power and ground. Figure 7 shows the I/O
placement around our chip with respect to the major internal
components. Five pins handle testability and overall system
control/synchronization (scan_en, scan_in, scan_out, reset, and
clk). One pin starts the Viterbi decode (do_viterbi), three pins
provide validation/handshaking (send_data, data_out_valid, and
din_valid), two pins provide input to the Viterbi module
(opx_serial and opy_serial), and one pin (data_out) carries the
recovered signal.
[Figure 7. I/O pin placement around the chip. Internal components
shown: RAM, ROM, Viterbi, Ctrlr, Dtpth. Signal pins: opx_serial,
opy_serial, send_data, data_out, data_out_valid, do_viterbi,
din_valid, reset, clk, scan_en, scan_in, scan_out. Power pins:
four dirty vdd, four dirty gnd, three clean vdd, three clean gnd.]
6. TIMING ANALYSIS
We ran timing analysis on each component as we built it,
determining the rise and fall times, worst-case delays, and critical
paths. As the circuit became progressively more complex, we
realized that testing each component in isolation would yield
inaccurate results due to factors such as input drive strength and
output capacitance. We therefore went back and recalculated many
of the delays. Table 1 highlights some of the differences we
observed.
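To put the Table 1 differences in proportion, the short calculation below computes the relative change from isolated to integrated delay for the two rows whose component names are legible (ALU and Shifter). The helper name relative_change is hypothetical; the delay values are copied from Table 1.

```python
# Delays in ns from Table 1: (isolated, integrated).
delays = {
    "ALU": (2.4154, 2.3589),
    "Shifter": (0.290, 0.6709),
}

def relative_change(isolated, integrated):
    """Percent change in delay when the component is integrated."""
    return (integrated - isolated) / isolated * 100.0

for name, (iso, integ) in delays.items():
    print(f"{name}: {relative_change(iso, integ):+.1f}%")
```

The shifter's delay more than doubles once it drives real loads, while the ALU's changes by only a few percent, which is why isolated measurements alone were not a reliable guide.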
Figure 9. Output Bit Error Rate vs. SNR for one million samples
7. CONCLUSION
In this project, we fitted a Viterbi decoder into a baseline 16-bit
RISC processor by adding some extra features and instructions.
Although we could have implemented the decoder as a separate
module such as an ASIC, our design fits well into the processor
and uses the processor's features to implement our application. To
verify the final processor design, we wrote an assembly program
that performed the decoding and compared the chip's outputs to the
Matlab outputs, which matched completely. Figure 9 shows the Bit
Error Rate measurements of our decoder in Matlab for one million
samples at different channel signal-to-noise ratios.
8. REFERENCES
[1] Forney, G.D., Jr., The Viterbi algorithm, Proceedings of
the IEEE, Vol. 61, Issue 3, pp. 268-278 (March 1973)