
MASTER THESIS
A Prototype Laboratory Environment for Digital Signal Processing

Using Simulink and a Texas Instrument DSP Device
Calle Gustavsson
March 2002
IR-SB-EX-0207
Department of Signals, Sensors & Systems,

Royal Institute of Technology
Abstract
Normally, when a model is designed from building blocks in Simulink, the simulation is
performed within the Simulink environment. A test of the design in a real-time
environment requires that source code is generated, compiled and downloaded to the
target hardware. As a first attempt to bridge this software gap, this thesis describes and
evaluates a prototype laboratory environment, which directly links Simulink to a Texas
Instruments DSP device. The prototype system converts graphical models and makes
available various real-time signal processing algorithms, such as adders, delays, FFTs,
IIR filters and multipliers. Future work is to consider modification of the prototype to
allow for feedback in the graphical models and to find an efficient way of handling signal
processing algorithms where variable buffer lengths are required.
Acknowledgements
I would like to thank Ph.D. Mats Bengtsson for his encouraging comments on my early
work and for his excellent guidance throughout the project. Personally, I gained a lot of
practical software experience and valuable insights into signal processing. Also, I would
like to thank Ph.D. Student Georg Jörngren for good advice concerning DSP issues.
CONTENTS
TABLE OF FIGURES
1 INTRODUCTION
1.1 BACKGROUND
1.2 THE PROBLEM
1.3 THE OBJECTIVE OF THE PROJECT
1.4 EARLIER WORK
2 SYSTEM OVERVIEW
2.1 SYSTEM OVERVIEW
2.2 COMMENTS ON THE SYSTEM OVERVIEW
2.3 PROGRAMMING METHODOLOGY
2.3.1 Block-wise Processing
2.3.2 The Sampling Rate
2.4 EQUIPMENT
3 THE MAIN PARTS OF THE SYSTEM
3.1 THE GRAPHICAL INTERFACE
3.1.1 The Model Description Language/List-File
3.2 THE LINK PROGRAM
3.2.1 Class Diagram
3.3 THE DSP PROGRAM
3.3.1 Hardware
3.3.2 DMA and Signal Processing
3.3.3 The Amount of Time Available for Signal Processing
3.3.4 Buffer Handling in the Signal Processing Unit
3.4 MAIN LOOP OF THE DSP PROGRAM
4 OPTIMIZATION AND TESTMETHOD
4.1 OPTIMIZATION
4.1.1 C or Assembly?
4.2 MEMORY USAGE
4.3 TEST METHODS
5 RESULTS
5.1 SPEED PERFORMANCE OF THE SYSTEM
5.2 CASE STUDIES
5.3 REQUIREMENTS
5.4 LIMITATIONS
5.5 CONCLUSION AND FUTURE WORK
6 APPENDIXES
APPENDIX A: BRIEF OUTLINE FOR RECURSIVE SEARCH
APPENDIX B: C6701 INTERRUPTS
APPENDIX C: DIRECT MEMORY ACCESS (DMA)
APPENDIX D: USER’S GUIDE
7 REFERENCES
TABLE OF FIGURES
FIGURE 1: PROTOTYPE LABORATORY SYSTEM OVERVIEW
FIGURE 2: MODULE LIBRARY FOR THE GRAPHICAL INTERFACE
FIGURE 3: A BASIC GRAPHICAL MODEL AND AN EXTRACT FROM ITS .MDL-FILE
FIGURE 4: CLASS DIAGRAM. THE MOST IMPORTANT CLASSES IN THE LINK PROGRAM AND THEIR ASSOCIATIONS
FIGURE 5: THIRTEEN-BLOCK MODEL. THE RECURSIVE ALGORITHM IN THE LINK PROGRAM RECYCLES BUFFERS
FIGURE 6: EXECUTION ORDER. THE LINK PROGRAM WRITES DATA TO THE EXTERNAL MEMORY OF THE DSP PROGRAM TO INFORM THE DSP PROGRAM WHAT MODULE TO RUN AND WHAT BUFFERS TO USE
FIGURE 7: OVERVIEW OF A FEW OF THE PERIPHERALS ON THE EVM BOARD
FIGURE 8: DSP PROGRAM OVERVIEW
FIGURE 9: MEMORY STRUCTURE. SAMPLES FOR THE DELAY MODULE AND THE IIR FILTER ARE PRESERVED IN A SEPARATE MEMORY AREA
FIGURE 10: PSEUDO CODE FOR THE MAIN LOOP IN THE ISR
FIGURE 11: EXTERNAL MEMORY ACCESS
FIGURE 12: BENCHMARKING. THE SCREEN DUMP ILLUSTRATES THE BENCHMARKING PROCEDURE IN CODE COMPOSER STUDIO
FIGURE 13: FEEDBACK MODEL. THIS MODEL IS NOT HANDLED BY THE SYSTEM
FIGURE 14: INTERRUPT RESPONSE PROCEDURE
1 INTRODUCTION
This Master Thesis tests the feasibility of an idea through the design of a prototype
laboratory environment for digital signal processing. The thesis begins with an
introduction to the problem of controlling digital signal processors (DSPs). In particular,
the first chapter discusses the problem of directly linking a high-level tool for modeling
and simulation, Simulink (The MathWorks Inc. [1]), to a DSP. The second chapter
presents an overview of the proposed system and the subsequent chapter describes its
three parts: the graphical interface, the link program and the DSP implementation.
Chapter four discusses optimization, memory usage, and the test methods utilized.
Finally, the last chapter evaluates the performance of the system and identifies future
work.
1.1 Background
There are many different ways of controlling a digital signal processor:
1. By writing a computer program in a DSP-compatible language, for example C, and then
compiling and downloading it directly to the target hardware.
2. By describing a system graphically using Simulink, and then generating C code using
special toolboxes sold by The MathWorks. The generated code is more or less
ready for compilation.
3. By developing special programs for DSP control. You run the programs from the
MATLAB prompt or from the DOS prompt. Each program starts a specific
algorithm, and parameters can be set before the function call.
Such programs form a vital part of the current laboratory exercise in the course Digital
Signal Processing at the Royal Institute of Technology.
The first alternative above is the most general, and the third the least general. The idea
behind this project is to explore a kind of DSP-control, which fits in between the second
and the third alternative as far as flexibility is concerned. To avoid the complication of
having to compile source code, the real-time application is implemented as a DSP-
program, controlled via parameter settings. To create a pedagogically sound and intuitive
laboratory environment, the proposed system has a graphical user interface (similar to
Simulink), where a number of building blocks can be connected in a block diagram.
1.2 The Problem
Primarily, this project considers the linking problem between Simulink and a Texas
Instruments DSP device. Normally, when a model is developed from building blocks in
Simulink, the simulation is performed within the Simulink environment. A test of the
design in a real-time environment requires that source code is generated, compiled and
downloaded to the target hardware.
Accordingly, a direct conversion of a Simulink model to some kind of “parameter table”-
representation, which a DSP-program can interpret, involves a number of problems. To
begin with, the analysis of a Simulink block diagram that describes a signal processing
system consists of these tasks:
• Determining the building blocks (modules) of the graphical system and retrieving
their parameters.
• Identifying the input and output ports of the system.
• Determining the connections between the modules.
• Determining the execution order of the modules in the graphical model.
After the analysis of the graphical model these tasks remain:
• Conveying the results from the analysis of the model to a DSP program.
• Creating the DSP-program and a toolbox library.
1.3 The Objective of the Project
The objective of this project is to develop a prototype laboratory environment for digital
signal processing.
The implementation is divided into three parts:
1) A drawing board, which displays a model of the desired digital signal processing
system. By double-clicking on a building block on the drawing board, parameter
values should be adjustable. A pre-defined limited toolbox library should contain
modules, such as FFT, linear filter, delay, multiplier and adder.
In digital signal processing the system typically performs some kind of filtering to extract information from a signal.
2) A Link-program, which interprets and converts the graphical representation of the
model, its component parts and their interconnections, to some kind of
“parameter-table”-representation. The table defines what building blocks and
what parameter values are used, and how the input and output ports are
interconnected.
3) A DSP-program, which interprets the parameter table description of the system,
and performs simulation, analysis and possible visualization. Part of the DSP-program
is a toolbox library consisting of the modules listed under
implementation, part 1.
The typical sampling rate of the system should be at least 8 kHz. It should be possible to
use sound (voice) as input to the analog-to-digital converter (ADC) on the
evaluation module board (EVM Board); refer to Figure 1, System Overview, below.
1.4 Earlier Work
The attempt to directly bridge the software gap between Simulink and a DSP device is, to
the author’s knowledge, a new design approach. However, several successful approaches
have been taken to convert Simulink models to VHDL code. Grout and Keane [2], for
example, describe the development of a software toolbox that can analyze a Simulink
block model in order to produce a VHDL representation of the model. The resulting data
from the toolbox is a model description language/list file (.mdl-file) for the complete
system, and a second model file that can be processed to create the VHDL code.
Similarly, Krukowski and Kale [3] outline the direct mapping of a Simulink structure into
one described in VHDL by generating a VHDL equivalent model.
Further, The MathWorks' Real-Time Workshop [4] allows for C code generation directly from
Simulink models. By combining such code generation tools with real-time systems
hardware, it is possible to simulate and analyze signal processing designs in real time. In
this thesis, however, code generation is not considered.
2 SYSTEM OVERVIEW
Chapter 2 presents an overview of the implemented laboratory system.
2.1 System Overview
Figure 1: Prototype Laboratory System Overview
[Figure: a Simulink block diagram (high level) is passed, via ">>RUN LINK_PROGRAM", to the Link program (medium level, C++ interface; get_param(), create table()). The Link program hands the table to the DSP program (low level, assembler and C; real-time application; Dsp(vector* table), case (adder)), which runs on the EVM module with ADC/DAC input and output.]
2.2 Comments on the System Overview
The process begins in Simulink - refer to figure 1 above. You develop a model of a
system on a drawing board in Simulink, using ready-made modules from a toolbox
library. Parameter settings can be changed, by right clicking on the building blocks.
You run the Link program from the prompt in MATLAB, or by double clicking on the
file “link.exe” in the Windows Explorer. The Link program converts the graphical
representation to a kind of “parameter table”-representation of the model. The Link
program automatically launches the DSP-program which interprets the parameter table
and starts the simulation in real-time by calling various DSP toolbox modules.
Finally, a loudspeaker or an oscilloscope connected to the digital-to-analog converter on
the Evaluation Module (EVM Board) conveys the result of the simulation.
2.3 Programming Methodology
A prototype is, according to [5]:

“An original thing in relation to a copy, imitation, representation, later specimen, improved form etc.,
a trial model, a preliminary version”
This prototype laboratory system was developed quickly. It has limited functionality -
time was not wasted on details. Nor is it intended to be complete or accurate in all details.
However, the project aims at designing a dynamic system, in the sense that it should be
easy to expand and modify the system. Emphasis has therefore been laid on making the C
and C++ code easy to understand. Comments on all functions are incorporated in the
source code and the programs are divided into modules; it should not be a nightmare to
improve the system and to add new DSP-modules.
The three parts of the laboratory system were developed and tested separately. The
toolbox library for the graphical interface was developed from ready-made building
blocks in Simulink, whereas Microsoft Visual C++ 6.0 was the essential tool for writing
the link program.
Code Composer Studio and the TMS320C6701 DSP platform were used for the initial
design of the DSP program. As for literature, [6] and [7] served as a starting point for the
design of the DSP modules. Also, [8] was at times used for inspiration.
2.3.1 Block-wise Processing
In order to simplify the first version of the conversion program, the DSP-program utilizes
block-wise processing. It first captures N samples and then performs
operations on all N samples, not only when the discrete Fourier transform (DFT) or FFT
is performed, but also during filtering. Operations such as filtering can be done on every
incoming sample and do not require that a frame or block of data is available at the time
of processing. However, to avoid having to switch between different buffer operations,
block-wise processing is utilized for filtering too.
2.3.2 The Sampling Rate
The sampling rate of the system affects the speed requirements on the DSP program. For
the system to work properly, the time the DSP program spends processing one buffer of
samples has to be shorter than the time it takes to fill an input
buffer. To achieve this goal, economical memory usage, buffer sizes and the number of blocks
allowed in a model are key issues that have been considered.
2.4 Equipment
The following equipment was available for design and implementation:
One PC (Pentium III 500 MHz)

One DSP-card (Texas Instruments C6701)
Code Composer Studio
Matlab 6.0, Release 12
Microsoft Visual C++ 6.0
Function generator
Headphones, adapters
3 THE MAIN PARTS OF THE SYSTEM
This chapter describes the graphical interface, the link program and the DSP program
more extensively.
3.1 The Graphical Interface
The graphical interface utilizes Simulink, a flexible design tool from The MathWorks [1].
Simulink allows efficient testing and verification of signal processing algorithms [1], [3].
In particular, the high-level circuit description makes possible quick changes and
corrections, which would be impractical, if not impossible, to carry out if a low-level tool
were to be used from the start of the design process.
Figure 2: Module Library for the Graphical Interface
As can be seen from figure 2, a few ready-made modules form a module library. It should
be noted that Matlab’s Simulink offers various options and settings for each module. The
proposed interface, on the other hand, has put some constraints on these options. More
often than not, it is possible to adjust only one parameter (Multiplier, Delay). As for the
IIR filter, the numerator and denominator are adjustable. For professional use this
limitation is unsatisfactory, but for pedagogical purposes, for example if the interface is
to be used as an introduction to signal processing for university students, this constraint
might not be considered a serious flaw.
Note the “FFT-Out” module in the module library. If an FFT module is included in a
block diagram, the program immediately interrupts signal processing and the result of the
FFT is sent to the output.
Once the user has described an algorithm by connecting a number of given library
modules, the model is stored in a Model Description Language/List-File, which is
described in the next section.
3.1.1 The Model Description Language/List-File
The Model Description Language/List-file (.mdl-file) is a low-tech file format that
defines a syntax for storing simple data in text and binary files [9]. The data is arranged
into “chunks”.
(informal) A chunk is a part of something, esp. a large part: ‘a chunk of text’ [5]
There are several different chunk-types. In a graphical model developed in Simulink, the
major chunk-types are blocks and lines. The block sections, which describe the
components of the design model, are stored in alphabetical order, followed by the line
sections, each equivalent to a single wire connector.
Figure 3 shows a basic graphical model of a communication channel, where one branch is
delayed and attenuated. The last part of this model’s .mdl-file, illustrates the idea of
chunks:
Figure 3: A basic graphical model and an extract from its .mdl-file
…
    Block {
      BlockType Inport
      Name "In"
      Position [30, 38, 60, 52]
      Port "1"
      Interpolate on
    }
    Block {
      BlockType Fcn
      Name "Delay"
      Position [95, 80, 155, 110]
      ForegroundColor "red"
      Expr "2"
    }
    Block {
      BlockType Fcn
      Name "Multiplier"
      Position [195, 80, 255, 110]
      ForegroundColor "green"
      Expr "0.2"
    }
    Block {
      BlockType Sum
      Name "Sum"
      Ports [2, 1]
      Position [275, 35, 295, 55]
      ForegroundColor "blue"
      ShowName off
      IconShape "round"
      Inputs "|++"
      SaturateOnIntegerOverflow on
    }
    Block {
      BlockType Outport
      Name "Out"
      Position [335, 38, 365, 52]
      Port "1"
      OutputWhenDisabled "held"
      InitialOutput "[]"
    }
    Line {
      SrcBlock "Sum"
      SrcPort 1
      DstBlock "Out"
      DstPort 1
    }
    Line {
      SrcBlock "Delay"
      SrcPort 1
      DstBlock "Multiplier"
      DstPort 1
    }
    Line {
      SrcBlock "Multiplier"
      SrcPort 1
      Points [25, 0]
      DstBlock "Sum"
      DstPort 2
    }
    Line {
      SrcBlock "In"
      SrcPort 1
      Points [5, 0]
      Branch {
        DstBlock "Sum"
        DstPort 1
      }
      Branch {
        Points [0, 50]
        DstBlock "Delay"
        DstPort 1
      }
    }
  }
}
Each chunk has a keyword telling what type of chunk it is and a sequence of data items,
each of which can be an integer, a float, a string, or a new chunk. Each block section, for
instance, has a string, which identifies what type of block it is, and other strings
identifying its parameters. In the .mdl-file all text constants are enclosed in double
quotes. For the “Multiplier”-block above, the multiplier is “0.2”.
As for the line-chunks, the integers and strings are “nested”. The line with “SrcBlock”
“In” has, for example, two branches, ending at the Sum block and the Delay block
respectively. Thus, it is possible for one line to branch to several destination blocks, but
Simulink does not handle two branches ending at the same block without an intermediate adder.
The .mdl-format is simple. The format was intended to create a practical way of storing
models that was fast to read and write, easy to use in programs and reasonably
space-efficient [9].
3.2 The Link Program
This section describes the Link program. Basically, what the Link program does is to
open an .mdl-file to retrieve information about the various building blocks and their wire
connections in the graphical model. Then, this information is encoded and conveyed to
the DSP program.
The core of the link program is an iterative and recursive process, which establishes in
what order to read from and write to what buffers.
Whilst the amount of processing time spent on signal processing in the DSP program has
to be shorter than the time it takes to fill an input buffer, there are no speed requirements
on the Link program. The function of the Link program is to convert an .mdl-file to an
execution order for ready-made library DSP modules, and this conversion is not supposed
to be carried out in real time.
The link program may be operated directly from the Matlab prompt, or via the file
“link.exe” in the Windows Explorer.
3.2.1 Class Diagram
This section describes the most important classes in the link program and how they
interact.
[Figure: UML class diagram of the Link program. Recoverable content:
Block: Module, Startpoint, Endpoint, Param (all string).
Line: Start, End (both string).
Dsp_box: module, param, readbuffer, writebuffer (all integer).
Chunk_finder: open_file(), search_mdl_file(), store_in_vec().
Vec_handler: vec_search(), create_module_list(), fill_ex_vec().
Encoder: code_name(), code_param.
Dsp_ctrl: dsp_init(), dsp_run(), write_module_list(); EVM Board handle, HPI handle, event handle; handles overload messages; inits and supplies the DSP interface.
Further attributes shown: block_vec, line_vec and ex_vec (vector), branch_flag (integer), dsp_box, encoder, and a code table (int2float).
Associations are labelled “works for”, with multiplicities 1 and *.]
Figure 4: Class diagram. The most important classes in the Link program and their associations
In the initial conversion stage, class Chunk_finder opens an .mdl-file and reads it in text
mode. Chunk_finder searches the .mdl-file for two kinds of chunks: blocks and lines. It
ignores the rest of the data in the file, whereas the block-chunks and the line-chunks,
encapsulated in separate classes, are stored in two separate vectors, together with the data
items defining them.
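As a rough illustration of this first conversion stage, the sketch below scans .mdl text row by row and collects block chunks and line connections into two vectors. It is not the Chunk_finder source code: the names MdlBlock, MdlLine, MdlChunks and find_chunks are hypothetical, and the sketch assumes that every keyword/value pair sits on its own row, as in the extract in Figure 3, and that branches are nested only one level deep.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct MdlBlock { std::string type, name, expr; };  // BlockType, Name, Expr
struct MdlLine  { std::string src, dst; };          // SrcBlock, DstBlock

struct MdlChunks {
    std::vector<MdlBlock> blocks;
    std::vector<MdlLine>  lines;
};

// Trim surrounding whitespace and double quotes from a value field.
static std::string unquote(const std::string& s) {
    const char* skip = " \t\r\"";
    const std::string::size_type first = s.find_first_not_of(skip);
    if (first == std::string::npos) return "";
    const std::string::size_type last = s.find_last_not_of(skip);
    return s.substr(first, last - first + 1);
}

MdlChunks find_chunks(std::istream& in) {
    assert(in.good());  // the caller is expected to pass an opened, readable stream
    enum class State { Outside, InBlock, InLine, InBranch };
    State state = State::Outside;
    MdlChunks out;
    MdlBlock block;
    std::string src, dst, row;

    while (std::getline(in, row)) {
        std::istringstream fields(row);
        std::string key, value;
        if (!(fields >> key)) continue;  // skip blank rows
        std::getline(fields, value);
        value = unquote(value);
        const bool opens = (value == "{");
        const bool closes = (key == "}");

        switch (state) {
        case State::Outside:  // everything except Block and Line chunks is ignored
            if (key == "Block" && opens) { block = MdlBlock{}; state = State::InBlock; }
            else if (key == "Line" && opens) { src.clear(); dst.clear(); state = State::InLine; }
            break;
        case State::InBlock:
            if (key == "BlockType") block.type = value;
            else if (key == "Name") block.name = value;
            else if (key == "Expr") block.expr = value;
            else if (closes) { out.blocks.push_back(block); state = State::Outside; }
            break;
        case State::InLine:
            if (key == "SrcBlock") src = value;
            else if (key == "DstBlock") dst = value;
            else if (key == "Branch" && opens) { dst.clear(); state = State::InBranch; }
            else if (closes) {
                if (!dst.empty()) out.lines.push_back({src, dst});
                state = State::Outside;
            }
            break;
        case State::InBranch:  // each branch yields its own connection
            if (key == "DstBlock") dst = value;
            else if (closes) {
                if (!dst.empty()) out.lines.push_back({src, dst});
                dst.clear();
                state = State::InLine;
            }
            break;
        }
    }
    return out;
}
```

Fed the extract from Figure 3 through a std::istringstream, this sketch would report five blocks and five connections, because the line leaving “In” contributes one connection per branch.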
In the second conversion stage, class Vec_handler takes over. Put simply, Vec_handler
combines the elements in the two vectors, which Chunk_finder has filled with blocks and
lines, into a new vector, which contains the execution order for the DSP-program. More
specifically, class Vec_handler manages the core of the link program, that is, the
recursive search algorithm, which handles the ordering of input and output buffers for the
different signal processing modules of the graphical model.
Figure 5: Thirteen-block model. The recursive algorithm in the link program recycles buffers
The picture above illustrates how the execution order of the DSP program is established,
and how buffers are reused during the analysis of the graphical model. Block1, block2
and block3, for example, have to write to separate buffers, since buffer 1 is to be read by
an adder later on in the model. When all modules connected to buffer 1 have finished
reading, buffer 1 can be reused. Therefore, buffer 1 is used as the output buffer for the
adder, which reads buffer 2 and buffer 1, and so on. As can be seen above, the recursive
search algorithm in Vec_handler always starts and ends on buffer 1 to simplify buffer
handling in the DSP program. Refer to Appendix A for a brief outline of the recursive
search algorithm.
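The idea of buffer recycling can be sketched without the recursive search itself. The example below is a simplified stand-in, not the algorithm in Vec_handler: it assumes the execution order is already known, counts the pending readers of every signal, and releases a buffer as soon as its last reader has run. The names Node, kSystemInput and assign_buffers are hypothetical, and the sketch further assumes that a module may write to a buffer it has just read (in-place operation).

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Marks an input that comes from the model's input port rather than a module.
constexpr int kSystemInput = -1;

// One module, listed in execution order. Each entry of `inputs` is the index
// of the module producing that signal, or kSystemInput.
struct Node { std::vector<int> inputs; };

// Returns the 1-based output buffer number chosen for every module.
std::vector<int> assign_buffers(const std::vector<Node>& nodes) {
    const std::size_t n = nodes.size();

    // readers[0] counts readers of the system input, readers[i + 1] those of module i.
    std::vector<int> readers(n + 1, 0);
    for (std::size_t i = 0; i < n; ++i) {
        for (int src : nodes[i].inputs) {
            assert(src >= kSystemInput && src < static_cast<int>(i));  // producers run first
            ++readers[static_cast<std::size_t>(src + 1)];
        }
    }

    std::vector<int> buffer_of(n + 1, 0);
    buffer_of[0] = 1;  // the system input arrives in buffer 1
    std::set<int> free_buffers;
    int highest = 1;

    for (std::size_t i = 0; i < n; ++i) {
        // A buffer becomes free once its last pending reader is about to run.
        for (int src : nodes[i].inputs) {
            const std::size_t s = static_cast<std::size_t>(src + 1);
            if (--readers[s] == 0) free_buffers.insert(buffer_of[s]);
        }
        int out;
        if (!free_buffers.empty()) {
            out = *free_buffers.begin();  // reuse the lowest-numbered free buffer
            free_buffers.erase(free_buffers.begin());
        } else {
            out = ++highest;              // otherwise allocate a new one
        }
        buffer_of[i + 1] = out;
    }
    return std::vector<int>(buffer_of.begin() + 1, buffer_of.end());
}
```

For the model in Figure 3, executed in the order Delay, Multiplier, Sum, the sketch assigns output buffers 2, 2 and 1: the Delay cannot overwrite buffer 1 because the adder still needs the input, the Multiplier works in place on buffer 2, and the adder reuses buffer 1.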
Class Encoder “works for” class Vec_handler. The Encoder converts string-names and
string parameters to float parameters, according to a predefined table. The conversion is
carried out before the names and the parameters are stored in the execution vector.
Once the conversion has been completed, class Dsp_ctrl initiates the DSP-program, and
downloads an executable file to the target hardware. Class Dsp_ctrl supplies the DSP-
program with the execution order of the library modules by writing numbers to an
external memory on the EVM Board. Then, this class initializes the C6701 memory space
through the Host Port Interface (HPI), and the external memory registration registers.
When the boot process has ended, the CPU is taken out of reset and starts executing code
from address zero.
The picture below illustrates the list of data the link program writes to the external
memory in the DSP program. Each module has five memory positions at its disposal.
Position 1: Block identifier

Position 2: Parameter, for example, delay factor
Position 3: Buffer to read from
Position 4: Buffer to write to
Position 5: Empty; reserved for future work
Position 6: Block identifier
Etcetera.
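[Figure: the external memory holds consecutive five-position module entries, separated by a fixed offset and followed by a stop signal.]
             Position 1  Position 2  Position 3  Position 4  Position 5
Module 1         2          10           1           2        (empty)
Module 2         4           2           1           1        (empty)
Module N         3           1           2           1        (empty)
Stop signal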
Figure 6: Execution order. The link program writes data to the external memory of the DSP
program to inform the DSP program what module to run and what buffers to use.
Position number one tells the DSP program what module to run. A float value of 2, for
example, identifies the multiplier module, a value of 3 corresponds to the adder module,
a value of 4 to the delay module, and so on, according to a predefined coding table. The
second position in the external memory conveys the parameter chosen for the module. In
the case of the multiplier module, that is module 1 in the picture, the second position
identifies the multiplier factor, 10. Positions 3 and 4 identify what buffer to read data
from and what buffer to write the processed data to, respectively. Position 5 is reserved
for future work. After the last module in the list, a stop signal is included so that the DSP
program knows where to stop reading the memory. Section 3.4 below describes how the
DSP program interprets the data written to the external memory.
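The five-position layout can be illustrated with a short decoding sketch. It assumes the list is available as a plain array of floats; the names ModuleEntry and decode_module_list are hypothetical, and the value used for the stop signal (-1) is an assumption, since the text above does not state it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ModuleEntry {
    int   id;            // position 1: block identifier (2 multiplier, 3 adder, 4 delay)
    float param;         // position 2: parameter, for example the multiplier factor
    int   read_buffer;   // position 3: buffer to read from
    int   write_buffer;  // position 4: buffer to write to
};                       // position 5 is reserved and therefore skipped

constexpr std::size_t kSlotsPerModule = 5;
constexpr float kStopSignal = -1.0f;  // assumed value, not taken from the thesis

// Walks the list in steps of five positions until the stop signal is found.
std::vector<ModuleEntry> decode_module_list(const std::vector<float>& memory) {
    std::vector<ModuleEntry> entries;
    std::size_t pos = 0;
    while (pos < memory.size() && memory[pos] != kStopSignal) {
        assert(pos + kSlotsPerModule <= memory.size());  // every entry must be complete
        entries.push_back({static_cast<int>(memory[pos]),
                           memory[pos + 1],
                           static_cast<int>(memory[pos + 2]),
                           static_cast<int>(memory[pos + 3])});
        pos += kSlotsPerModule;
    }
    return entries;
}
```

Applied to the values shown in Figure 6, with the reserved position filled with zero, the sketch yields three entries: a multiplier with factor 10 reading buffer 1 and writing buffer 2, followed by a delay and an adder.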
3.3 The DSP Program
Section 3.3 presents the DSP program utilized by the laboratory system.
3.3.1 Hardware
This section presents the C6701 processor and the Evaluation Module (EVM) board.
The floating-point processor C6701 has a peak performance of 1000 million floating-point
operations per second and can operate at 167 MHz (6 ns cycle time) [7]. It executes
up to eight 32-bit instructions every cycle. It has 64 k internal program memory or
cache, and 64 k internal data memory.
The C6x EVM board is a complete DSP system, which provides quad DSP clock support
up to 133 MHz. Apart from the processor chip, the EVM board includes memory, A/D
capabilities and PC interfacing components [10],[7]. The peripherals include an External
Memory interface (EMIF), Direct Memory Access (DMA), Multi-channel Buffered
Serial Ports (McBSP) and Host Port Interface (HPI). The 32-bit EMIF handles the
communication with external memory and supports SDRAM, SBSRAM and
asynchronous memories. The DMA has two 32-bit data busses and two 32-bit
address busses. Refer to Appendix C for a description of DMA. The McBSP provides a
high-speed communication link with external devices. The Host Port Interface (HPI) provides a
low cost interface through which a host processor can directly access the CPU’s internal
memory [6].
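[Figure: the CPU surrounded by four peripherals: the EMIF (connected to the external memory), the McBSP (connected to the A/D converter, CODEC), the DMA, and the HPI (connected to the PC host).]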
Figure 7: Overview of a few of the peripherals on the EVM Board
To enable communication with external peripherals, the EVM Board also includes a CD
quality, 16-bit audio interface. The coder/decoder (CODEC) in the interface supports
sample rates from 5.5 kHz to 48 kHz [10]. The CODEC is connected to the C6701
processor through two serial ports. The most time efficient way of transferring data
between the serial ports and the internal memory is to utilize the DMA.
3.3.2 DMA and Signal Processing
The DMA channels and their interaction with the signal processing unit, are described in
this section.
The part of the DSP program that handles the communication with the McBSPs was
developed from a skeleton program, found in [11]. The DSP program makes use of two
DMA channels, each configured to transfer sample data continuously, alternating between
two buffers. Refer to Figure 8 below. DMA channel two copies data to InBuffer1 or to
InBuffer2. When the input buffer is full, an interrupt is posted to the CPU and the
contents of its registers are stored. Refer to Appendix B for a brief description of interrupts.
During each interrupt service routine some signal processing, involving one or more
buffer transfers, has to be completed on the sample data. Once the signal processing is
completed, the processed data is copied to OutBuffer1 or to OutBuffer2. Then, the
procedure starts all over again. DMA channel two copies data from the receive register of
the McBSP, while DMA channel one copies the processed data to the transmit register of
the McBSP.
At the end of each block transfer, all necessary registers are restored so that the DMA can
perform another block transfer.
[Figure: McBSP receive register -> InBuffer1 / InBuffer2 -> signal processing unit -> OutBuffer1 / OutBuffer2 -> McBSP transmit register. An interrupt is raised when an input buffer is full.]
Figure 8: DSP program overview
3.3.3 The Amount of Time Available for Signal Processing
It should be clear by now that the amount of time spent on signal processing one buffer of
sample data in the DSP program has to be shorter than the time it takes to fill an input
buffer. Thus, it is necessary to consider the amount of time available for the signal
processing to be performed.
The formula for calculating the amount of time available for signal processing was found
in [7]. Given a sampling frequency, $f_s$, of 8 kHz and an EVM frequency, $f_{EVM}$, of 133
MHz, there are 16 625 clock cycles ($f_{EVM}/f_s$) between consecutive samples. If each
input buffer contains 512 samples, where even data is right channel data and odd data is
left channel data, there will be roughly $8.5 \cdot 10^6$ (512 times 16 625) clock cycles
available for signal processing. A sampling frequency of 44.1 kHz results in roughly
$1.5 \cdot 10^6$ available cycles.
In other words, since one sample corresponds to 0.000125 s ($1/f_s$), it takes 0.064 s
(512 times 0.000125 s) to fill an input buffer. Within this time all signal
processing has to be performed. Consequently, a higher sampling frequency, for example
44.1 kHz, decreases the available time by a factor of about 5.5.
Both channels are sampled and put into the input buffer. Only one channel’s data is
processed during the interrupt service routine though. Naturally, if both channels were to
be processed, the amount of time available for signal processing would decrease by
almost a factor 2.
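The arithmetic in this section can be reproduced with a few lines of C. The sketch below is not part of the DSP program; the function names are chosen for illustration only, and integer arithmetic is used so that the results can be compared directly with the figures quoted above.

```c
/* Clock cycles between two consecutive samples: f_clk / f_s. */
unsigned long cycles_per_sample(unsigned long f_clk_hz, unsigned long fs_hz)
{
    return f_clk_hz / fs_hz;
}

/* Clock cycles that elapse while one input buffer of n samples fills. */
unsigned long cycles_per_buffer(unsigned long f_clk_hz, unsigned long fs_hz,
                                unsigned long n)
{
    /* Multiply first so that the integer division loses as little as possible. */
    return (unsigned long)(((unsigned long long)f_clk_hz * n) / fs_hz);
}

/* Time in microseconds needed to fill a buffer of n samples. */
unsigned long fill_time_us(unsigned long fs_hz, unsigned long n)
{
    return (unsigned long)((1000000ULL * n) / fs_hz);
}
```

With a 133 MHz clock, 8 kHz sampling and 512 samples per buffer, these functions give 16 625 cycles per sample, 8 512 000 cycles per buffer and a fill time of 64 000 microseconds.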
3.3.4 Buffer Handling in the Signal Processing Unit
This section describes the buffer handling in the signal processing unit of the DSP
program.
DSP programs, such as the delay or the IIR filter, where the last values of one buffer
constitute the first values in the next buffer, require preservation of buffer values so as
not to cause a gap in between buffers. To allow for more than one delay and one IIR filter
in the model, the samples that are preserved until the next buffer arrives are stored in a
separate memory space, with only an offset in between the different filters or delays. This
solution makes it possible to configure each delay and each filter independently of the
other filters and delays in the model. An offset pointer keeps track of the offset to each
filter’s start address. The filter coefficients for the different filters are stored in a similar
fashion in a separate memory area, with an offset in between.
[Figure 9 diagram: one memory area holding, in order, the preserved samples for filter1,
filter2, ..., filterN; an offset marks the start of each filter's samples.]
Figure 9: Memory structure. Samples for the delay module and the IIR filter are preserved in a
separate memory area
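The idea of preserving samples in a shared memory area, addressed by an offset per module, can be illustrated with a block-wise delay. The sketch below is a simplified assumption of how such a scheme might look, not the code of the prototype: state_area, reserve_state and delay_block are hypothetical names, and the size of the area is chosen arbitrarily.

```c
#include <stddef.h>

#define MAX_STATE 1024   /* size of the shared state area (assumed) */

/* One shared area holds the preserved samples of every delay or filter;
 * each module only remembers the offset of its own slice. */
static float state_area[MAX_STATE];
static size_t next_offset = 0;

/* Reserve `count` preserved samples and return their offset,
 * or (size_t)-1 if the area is exhausted. */
size_t reserve_state(size_t count)
{
    if (count > MAX_STATE - next_offset)
        return (size_t)-1;
    size_t offset = next_offset;
    next_offset += count;
    return offset;
}

/* Delay a buffer of n samples by d samples (d <= n). in and out must be
 * different buffers. The last d input samples are preserved at
 * state_area[offset] and form the start of the next output buffer. */
void delay_block(const float *in, float *out, size_t n,
                 size_t d, size_t offset)
{
    float *saved = &state_area[offset];
    for (size_t i = 0; i < d; ++i)
        out[i] = saved[i];           /* tail of the previous buffer */
    for (size_t i = d; i < n; ++i)
        out[i] = in[i - d];          /* delayed part of this buffer */
    for (size_t i = 0; i < d; ++i)
        saved[i] = in[n - d + i];    /* keep the tail for the next call */
}
```

Because every module works on its own slice of the area, several delays with different delay factors can coexist without knowing about each other.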
Since the IIR filter output depends both on input and output samples, the buffers used
whilst IIR filtering data are slightly longer than the buffers used in, for example, the
delay program. The length of the input and output buffers is the same as for all the other
modules in the library though. If the modules are to be connected, as they are in a
graphical model, the length of the buffers has to be the same.
As seen above, the preservation of a few samples in between buffer transfers is easily
dealt with. More demanding problems crop up when up sampling and down sampling are
considered.
The implementation of up sampling and down sampling requires variable buffer lengths.
If a program resamples the N values in a buffer at a rate L times higher than the input
sample rate, by inserting L-1 zeros between consecutive samples, the output
buffer will consequently be L times longer. Similarly, since down sampling involves
discarding a number of consecutive samples following each sample, the result will be a
shorter output buffer. Dexterity in handling buffers is needed to solve these problems on
a general level. Most certainly, the simple program structure of this prototype cannot be
used.
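The change in buffer length is easy to see in a small C sketch. The two functions below are illustrations only and are not part of the prototype; they perform plain zero insertion and sample discarding, without the filtering that a complete resampler would normally include.

```c
#include <stddef.h>

/* Up sample by L: copy each input sample and follow it with L-1 zeros.
 * out must have room for n * L samples. Returns the output length. */
size_t upsample(const float *in, size_t n, size_t L, float *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; ++i) {
        out[k++] = in[i];
        for (size_t z = 1; z < L; ++z)
            out[k++] = 0.0f;
    }
    return k;
}

/* Down sample by M (M >= 1): keep every M-th sample, starting with the
 * first. out must have room for (n + M - 1) / M samples.
 * Returns the output length. */
size_t downsample(const float *in, size_t n, size_t M, float *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i += M)
        out[k++] = in[i];
    return k;
}
```

A buffer of 4 samples thus becomes 12 samples after up sampling by 3 and 2 samples after down sampling by 2, which is exactly the variation that fixed-length buffers cannot accommodate.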
3.4 Main Loop of the DSP Program
This section describes how the main loop in the Interrupt Service Routine (ISR) is
designed. The pseudo code below illustrates how the DSP program interprets the
execution order, which the C++ interface writes to the external memory.
The signal process function is called each time the processor interrupts its normal
program flow, that is, when an input buffer is full of data. Initially, in the signal process
function, the content of a full input buffer is converted from short integers (16 bits) to
floating point values (32 bits) via a function call to a separate function.
Typecasting of the data that is to be processed is necessary: all signal processing
algorithms are performed on float values, whereas the audio interface on the C6701 EVM
board is a 16-bit interface. The speed penalty for using ‘floats’ instead of ‘shorts’
on the C6701 is typically only a factor of 2 [12].
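A conversion of this kind might look as follows. The sketch assumes the interleaved layout described in section 3.3.3 (even positions right channel, odd positions left channel) and converts only the left channel. The function names are hypothetical, and the saturation on the way back to 16 bits is an assumption of this example rather than a documented feature of the prototype.

```c
#include <stddef.h>

/* Convert the left-channel samples (odd indices) of an interleaved 16-bit
 * buffer to float. n is the total number of interleaved samples.
 * Returns the number of floats written. */
size_t left_to_float(const short *in, size_t n, float *out)
{
    size_t k = 0;
    for (size_t i = 1; i < n; i += 2)
        out[k++] = (float)in[i];
    return k;
}

/* Convert processed floats back to 16-bit samples with saturation
 * (assumed), writing them to the odd (left-channel) positions of out. */
void float_to_left(const float *in, size_t count, short *out)
{
    for (size_t k = 0; k < count; ++k) {
        float v = in[k];
        if (v > 32767.0f)
            v = 32767.0f;
        if (v < -32768.0f)
            v = -32768.0f;
        out[2 * k + 1] = (short)v;
    }
}
```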
Convert short input data to floating point values
Set float pointer to first position in external memory

MAIN LOOP
do
{
    set buffer to read
    set buffer to write
    run signal process

    switch(module)
        ……
        case 2: call multiplier
                break;
        ……
        case 5: call IIR filter
                break;
        ……
}while(!stop signal)
Signal processing done!
Figure 10: Pseudo code for the main loop in the ISR
Next, a pointer is set to the first position in external memory, where the execution order
for the DSP program resides.
At the outset of the main loop, the buffer to read from and the buffer to write to are
established. Then, one of the various signal processing functions is called. The second
iteration of the loop starts reading at the fifth position in the external memory, the third at
the tenth position, and so on. The loop is repeated until a stop signal is found.
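The loop described above can be sketched in C as a simple dispatcher over records of five float values. Only the stride of five and the case numbers 2 (multiplier) and 5 (IIR filter) are taken from the text; the layout of each record, the stop code and all function names are assumptions made for this illustration.

```c
#include <stddef.h>

#define RECORD_LEN 5       /* floats per module record (stride from the text) */
#define STOP_CODE  0.0f    /* hypothetical stop marker */

enum { MOD_MULTIPLIER = 2, MOD_IIR = 5 };   /* case numbers from figure 10 */

/* Multiply every sample by a constant factor. */
static void multiplier(const float *in, float *out, size_t n, float factor)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = factor * in[i];
}

/* Walk the execution order and run each module on the named buffers.
 * Assumed record layout: {module, read buffer, write buffer, parameter, unused}.
 * Returns the number of records processed. */
int run_execution_order(const float *order, float *buffers[], size_t n)
{
    int executed = 0;
    const float *rec = order;                /* first position in memory */

    while (rec[0] != STOP_CODE) {
        const float *in  = buffers[(int)rec[1]];
        float       *out = buffers[(int)rec[2]];

        switch ((int)rec[0]) {
        case MOD_MULTIPLIER:
            multiplier(in, out, n, rec[3]);
            break;
        /* case MOD_IIR and the other modules would be dispatched here */
        default:
            break;
        }
        rec += RECORD_LEN;                   /* next record: position 5, 10, ... */
        ++executed;
    }
    return executed;
}
```

Storing the execution order as data rather than code is what allows the host to change the model without recompiling the DSP program.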
4 OPTIMIZATION AND TEST METHODS
Chapter 4 discusses optimization in general and whether to use C code or assembly code
in particular. It also comments on the preparation of memory in the DSP program and the
test methods utilized.
4.1 Optimization
In general, the purpose of optimization is to create the smallest, or the fastest, object code
possible. To achieve these goals, the compiler performs various changes to the assembly
code. For example, it eliminates the dead code, removes the redundant expressions,
optimizes loops, and uses inline functions [19].
Generally, “inline” means to place something directly in the source code. More specifically, in [13]’s words:
“When an inline function is called, the C/C++ source code for the function is inserted at the point of the call. This is known
as inline function expansion. Inline function expansion is advantageous in short functions for the following reasons:
It saves the overhead of a function call.
Once inlined, the optimizer is free to optimize the function in context with the surrounding code”
The compiler in Code Composer Studio allows four levels of optimization: -o0, -o1, -o2,
and -o3. Level -o0 is the lowest level and performs only basic, register-level
optimization. At level -o3, the C compiler and optimizer are claimed by Texas Instruments
to generate code that is 80 % efficient, that is, code that reaches up to 80 % of the
performance of handwritten assembly. Refer to [17] for details.
4.1.1 C or Assembly?
The DSP program is almost entirely written in C. The user is normally able to use C for
all programs and functions [12]. For many functions, the code generated by the C
compiler and optimizer in Code Composer makes full use of the processor’s resources
[12], and no benefit would come from writing that code in low-level assembly.
However, FFT assembly code was downloaded from the Texas Instruments FTP site [14] to
improve the performance. The FFT is an example of a particularly complicated signal
processing task which involves manipulation of the real and imaginary parts of complex
samples. The FFT assembly routine suits the instruction set of the C6701,
which allows simultaneous manipulation of the real and imaginary parts of complex
samples that are stored as a single 32-bit entity.
The Texas Instruments FTP site also makes available IIR filter assembly code. But this code
works on the assumption that interrupts are disabled before the filter function is called
[14]. Therefore, the IIR filter module in this prototype utilizes the C routine provided
together with the assembly code.
4.2 Memory Usage
There are two methods for preparing the memory: Static allocation and dynamic
allocation. Static allocation refers to allocation of memory before the program starts. The
locations of objects are decided at compile-time. Dynamic allocation, on the other hand,
refers to allocation and deallocation of storage in arbitrary order, normally determined by
the choices the user makes, at run-time [15]. Static memory is typically faster than
dynamic memory and sometimes used in real-time systems.
This section discusses how the memory is prepared in the DSP program. It also
comments on the advantages and disadvantages of internal and external memory access.
For an algorithm to run efficiently on the C6701, the code and data must reside on the
DSP’s internal program and data memory. If data has to be retrieved from the external
memory, these transfers can slow down the execution by two to six times [16].
[Figure 11 diagram: the ALU is connected to the internal program memory, the internal data
memory and the external memory; access to the external memory has a 2-6 cycle latency.]
Figure 11: External memory access
The program code for the DSP program fits entirely in the internal program memory.
Similarly, flags for communication with the host-PC reside in the internal data memory.
But, only a limited number of signal buffers can be statically allocated in the internal
memory. Since the aim of this project is to design a user-friendly and dynamic system,
memory is dynamically allocated at run time for filter coefficients, for signal processing
buffers, and for buffers preserving samples in between buffer transfers. After
compilation, at run-time, the user can make certain choices as far as sampling frequency
and buffer sizes are concerned.
To explore the performance of the speed critical DSP program, two separate DSP applications
were implemented though: one utilizing the internal memory and another utilizing the
external memory. Refer to chapter 5 for details.
It should be noted that with C6701, dynamic memory allocation is not possible in the
internal data memory. Consequently, at the expense of speed (refer to section 5.1), filter
coefficients and signal buffers reside in an external memory space (SDRAM0). The
SDRAM devices are always clocked at one-half the CPU rate [10]. In this application the
DSP core runs at 133 MHz, that is, the SDRAM runs at 66.5 MHz (15 ns). In return, the
size of the dynamically allocatable memory section (.sysmem), which utilizes the
SDRAM0 memory, can be extended up to roughly 5 Mbytes, if a linker option (heap
size) is changed. Accordingly, there is almost no upper limit as far as how many buffers
can be used in a model.
The sequence of numbers conveying the order in which to run the various DSP modules
resides in the external memory as well. The retrieval of a few float values from the
external memory before running each DSP module did not have any major impact on the
amount of time required by the various signal processing options.
Finally, the assembly code for the Infinite Impulse Response (IIR) filter module makes
use of buffers that must be aligned in certain ways in the memory. The IIR filter utilizes
opposite (even and odd) double word (64 bit) boundaries to avoid memory bank hits.
Refer to [17] and [18]. But since this prototype utilizes the C routine provided together
with the assembly code, no such concerns were taken when the filter buffers were
dynamically allocated in the external memory.
4.3 Test Methods
This section describes the two test methods utilized in the project.
Initially, in the project, Benchmarking was used to estimate the number of clock cycles
needed for each DSP module.
Figure 12: Benchmarking. The screen dump illustrates the benchmarking procedure in Code
Composer Studio.
The screen dump above illustrates the benchmarking procedure in Code Composer. The
clock is enabled. Breakpoints and Profile points are inserted on and immediately after the
function call to signal_process(). A return statement is added early in the multiplier
module to determine the overhead of the signal process function itself. Note the dead
code in the figure below, “end = end”, which is added to make possible the setting of
breakpoints.
During the benchmarking procedure, optimization level -o3 and ‘Speed most critical’
were chosen under ‘Options’ in Code Composer Studio. Still, the results obtained from the
initial measurements may be overly pessimistic, since “debugging and full scale
optimizations cannot be done together” [7]. In debugging, information is added to
enhance the debugging process. In optimization, on the other hand, information is
minimized or removed to increase code efficiency.
A slightly different approach was taken to estimate the amount of processing time needed
for various DSP programs. These tests were conducted from the C++ interface and with
the DSP programs utilizing both DMA and interrupt. Four different signal-processing
algorithms (test cases) were studied:
• 1 multiplier module
• 1 IIR filter
• 10 IIR filters (serially)
• 1 FFT
The amount of time required to complete the signal processing for each test case was
calculated indirectly. The program resides in an infinite loop while it is waiting for the
next interrupt:
while(!DONE)
COUNTER++;
A “counter-variable” in the infinite loop counts the number of additions performed while
the CPU is waiting for the next interrupt. When the program leaves the infinite loop to
service an ISR, a mailbox message communicates the obtained number of additions to the
PC. By resetting the counter immediately after finishing the signal processing in the ISR,
the amount of time needed for each test case can be estimated.
The received number of additions actually corresponds to the amount of time available
during one interrupt in the C++ program. To obtain the amount of time required to signal
process each test case, the received number of additions must be subtracted from a
reference value, that is, the maximum number of additions performed during one
interrupt. This reference value is obtained by commenting out the function call to the signal
processing function, whereby nothing of value is actually performed during the ISR, apart
from the saving and the restoring of the contents of registers and flags.
Further, to obtain the amount of time needed for each algorithm, the maximum number of
additions was correlated with the amount of time available during one interrupt, as
calculated in section 3.3.3. After a simple calculation in the C++ interface, the result was
written to stdout.
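The calculation can be sketched as below. This is a reconstruction of the reasoning in the text and not the code of the C++ interface; the function names are hypothetical. The reference count is the number of additions obtained with the signal processing call commented out, and the measured count is the number obtained with a test case enabled.

```c
/* Share of the interrupt period spent on signal processing, in percent.
 * reference: additions counted with the signal processing call removed.
 * measured:  additions counted with the test case enabled. */
double usage_percent(unsigned long reference, unsigned long measured)
{
    if (reference == 0 || measured > reference)
        return 0.0;                          /* no meaningful estimate */
    return 100.0 * (double)(reference - measured) / (double)reference;
}

/* Processing time in milliseconds, given the buffer fill time in ms. */
double processing_time_ms(unsigned long reference, unsigned long measured,
                          double fill_time_ms)
{
    return usage_percent(reference, measured) / 100.0 * fill_time_ms;
}
```

For example, if the reference count is 1000 additions and a test case leaves 750, then 25 % of the period was spent on signal processing, which corresponds to 16 ms of a 64 ms period.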
To make it possible to compare the different DSP programs, the sampling frequency was
8 kHz and the buffer size 256 throughout the measurements.
5 RESULTS
This chapter evaluates the performance of the system. Initially, the speed performance of
a few individual algorithms is evaluated. Then, the speed performance of four different
DSP programs is considered. The programs utilize static memory allocation in the
internal memory or dynamic memory allocation in the external memory. Then, the
constraint put on the prototype is accounted for. Finally, future studies are identified.
5.1 Speed Performance of the System
This section presents the speed performance of individual algorithms in a DSP program
utilizing static memory allocation in the internal memory.
The table below lists the number of clock cycles required to perform signal processing for
a few of the algorithms in the DSP program. The test utilized Benchmarking, as described
in section 4.3, with the DMA and the interrupt routine disabled. Number of modules (Nof
Modules) refers to one, two or three algorithms serially:
CLOCK CYCLES
Nof Modules    1        2        3
Multiplier     9606     12326    15043
Delay          9048     11204    14626
FFT            27259    -        -
IIR            36223    50905    65663
Table 1: Number of clock cycles needed for one, two and three modules, including overhead (7014
clock cycles)
One clock tick in C6701 is equivalent to 7.5 ns. Table 2 lists the amount of time needed
for each module.
TIME [ms]
Nof Modules    1         2         3
Multiplier     0.0724    0.0923    0.1128
Delay          0.0678    0.0841    0.1096
FFT            0.2044    -         -
IIR            0.271     0.381     0.4924
Table 2: Time needed for one, two and three modules, including overhead (0.052 ms)
The multiplier module above utilizes 0.1 % of the amount of time available during one
interrupt. The IIR filter utilizes 0.4 %. These results were an indication that it might be
feasible to implement a laboratory system, which required a number of buffer transfers
between various modules for each algorithm.
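The conversion from clock cycles to time and to a share of the available time is a short calculation. The C sketch below is for illustration only, with hypothetical function names; it uses the clock period of 7.5 ns from the text and the buffer fill time of 64 ms from section 3.3.3.

```c
/* Convert a cycle count to milliseconds, given the clock period in ns. */
double cycles_to_ms(unsigned long cycles, double period_ns)
{
    return (double)cycles * period_ns * 1e-6;
}

/* Share of the available time that is used, in percent. */
double share_percent(double used_ms, double available_ms)
{
    return 100.0 * used_ms / available_ms;
}
```

The overhead of 7014 cycles corresponds to about 0.0526 ms, and the 0.0724 ms of the multiplier module is about 0.11 % of 64 ms, in line with the 0.1 % quoted above.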
With a buffer size of 256, the signal process function in the program has an overhead of
about 7014 clock cycles, corresponding to 52 microseconds (7.5 ns/clock cycle times
7014). This overhead includes the int2float typecasting for left channel data and the
copying of right channel data (reference channel) to the output buffer.
5.2 Case Studies
This section presents the speed performance of one DSP program utilizing internal data
memory, and three DSP programs utilizing the external memory.
These DSP programs were tested:
Intern: “Intern” utilizes static allocation of memory. Buffers for signal processing, as
well as filter coefficients reside in the internal data memory; sampling frequency and
buffer sizes are set before compilation.
Extern: “Extern” utilizes dynamic memory allocation in the external memory for signal
processing buffers; the user determines sampling frequency and buffer size at run time.
FixedFs/N: “FixedFs/N” is the same application as “Extern”, but sampling frequency and
buffer size are set before compilation.
ExtIntern: “ExtIntern” is the same application as “Extern”, but memory for filter
coefficients is not dynamically allocated in the external memory; the filter coefficients
reside in the internal memory.
In order to go through with the tests, a few test algorithms were created. The algorithm
below, where 20 IIR filters are connected serially, is not a useful one; it is just there to
illustrate the time needed to signal process such a block diagram or a similar time
consuming circuit.
In the table below, “None” refers to the result when the function call to the signal
processing unit is removed from the source code and nothing is performed during the
actual interrupt routine.
Table 3: % Cycle Usage
Signal Processing             Intern    Extern    FixedFs/N    ExtIntern
None                          0         0         0            0
One multiplier module         0.226     1.015     0.929        1.015
One IIR filter module         0.681     4.725     3.992        4.854
Ten IIR filters (serially)    5.355     40.263    33.718       41.526
One FFT module                1.412     2.665     2.101        2.663

Table 4: TIME [ms]
Signal Processing             Intern    Extern    FixedFs/N    ExtIntern
None                          0         0         0            0
One multiplier module         0.144     0.649     0.595        0.650
One IIR filter module         0.436     3.023     2.555        3.106
Ten IIR filters (serially)    3.426     25.771    21.577       26.576
One FFT module                0.903     1.705     1.345        1.705
As seen from the tables above, the amount of time needed for signal processing the
algorithms varies a lot between the different applications. With 10 IIR filters serially in one
algorithm, Intern utilizes a little more than 5 % of the available time. Extern utilizes
slightly more than 40 % of the available time, whereas FixedFs/N, the external system with
sampling frequency and buffer size fixed at the time of compilation, utilizes about 34 %.
The external system became slightly faster when the filter coefficients were
placed in the external memory (Extern, 40.3 %) instead of in the internal memory
(ExtIntern, 41.5 %). It might be fair to say that it is not always better to cram the
internal data memory full of data, instead of reallocating certain data to the external memory.
The obvious generalization to be made with respect to these results is of course that there
is a tradeoff between speed and flexibility. If speed is an issue, choose static allocation in
the internal memory; if flexibility is your main concern, choose dynamic memory
allocation and the external memory.
A maximum test was also performed to see how much signal processing the most flexible
DSP program, Extern, could handle within one interrupt.
Table 5: MAXIMUM NUMBER OF IIR FILTERS (Extern)
Signal Processing             %       T [ms]
20 IIR filters (serially)     80      51.2
25 IIR filters (serially)     99.8    63.8
As seen below, this test led to the constraint on the maximum number of
IIR filters that should be allowed in one model.
5.3 Requirements
Certain knowledge on the part of the user is required to run the laboratory system.
Primarily, the first version of the prototype has been designed with some constraints put
on the Simulink model:
• “In” and “Out” blocks define the start point and the end point for the graphical
model and have to be included. Only one input and one output block are allowed.
• Adders must be connected by both inputs to be handled by the Link program.
• As seen from the module library in section 3.1, the program immediately
interrupts the signal processing, if an FFT module is included in the model. The
result of the FFT is sent to the output.
• The maximum number of IIR filters allowed in the model is 20, each individually
configured. That is, the filters may have different orders and different parameters.
Maximum “order” is 30.
• The maximum number of characters in the numerator and the denominator for the
IIR filters is 150, respectively.
• The maximum number of delay modules is 100, each having a delay factor in the
range 1-10, integer value.
The maximum number of filters and delays in the model is by no means fixed, nor is it an
upper limit for what the laboratory system might handle. Rather it is just the way this
prototype has been set up to work. If required, the user may, with a small modification to
the source code, increase the number of allowed filters and delays.
5.4 Limitations
The major drawback of the prototype is that it cannot handle feedback in graphical
models. A simple model, such as the one depicted in figure 13, could easily be dealt with, but a
general solution to the problem of handling feedback models requires an approach where
every incoming sample is treated separately.
Figure 13: Feedback model. This model is not handled by the system
5.5 Conclusion and Future Work
Though several constraints have been put on the models, the proposed system combines,
on a small scale, the benefits of Simulink’s intuitiveness and user-friendliness, with the
real-time capabilities of a proper DSP-implementation. The system allows for intricate
models to be converted and makes available various signal-processing algorithms, such
as adders, delays, FFTs, IIR filters and multipliers.
As for the DSP implementation, there is a trade-off between flexibility and speed. Having
data reside on the internal memory puts more constraints on the graphical models. In
return, it allows for a much higher sampling frequency. The first version of the prototype
utilizes the external memory. Considering that the recursive search algorithm in the link
program recycles buffers quite effectively, memory allocation in the internal memory is
an interesting alternative though.
Apart from thorough tests, future work is to consider the modification of the C++
interface and the DSP program to allow for feedback in the graphical models. This
improvement of the system will require a new design approach, where individual
processing of each incoming sample replaces the block-wise processing. Once this is
complete, more modules should be added to the DSP program and to the block library. In
particular, future work is to consider an efficient and general way of handling up-sampling
and down-sampling, that is, signal processing algorithms where variable buffer
lengths are required.
6 APPENDIXES
Appendix A: Brief outline for recursive search
1. Find a line which has the "In"-block as its start block. Follow the line. When the
end block of that line is found, retrieve its parameter and pick a buffer number.
Then, store the block in a vector containing the execution order for the DSP
program.
2. Follow the line from the new block. If more than one line has its origin at the new
output, store the block and its output as a “loose end" in a temporary vector. Then
ignore the rest of the branch and resume the recursive search from (1). If only one
line has its origin at the block, continue searching for the next block. An iterative
function call is needed to handle several blocks serially.
3. If an adder is found, store buffer number and name of adder (Sum, or Sum1, etc)
in a “temporary adder vector” and ignore the rest of the branch. Else, if the adder
already exists in the “temporary adder vector”, set “readbuffer” and “writebuffer”
for the adder, find its parameter and store it in the vector containing the execution
order for the DSP program.
4. Repeat the procedure from (1) with “loose end”-blocks and adders as starting
points instead of "In". Continue until there are no loose ends and the "Out"-block
is found.
If possible, reuse buffers.
A few more tests are included to make things work; refer to the source code for details.
Appendix B: C6701 Interrupts
The brief descriptions in appendixes B and C of the Interrupt Service Routine (ISR) and
the Direct Memory Access (DMA) are based on [6].
Most microprocessors, including C6701, have one or more inputs for stopping
(interrupting) the normal program flow. The interrupt can come from an external or
internal peripheral, or simply from a special instruction in the program. As mentioned
earlier, in the laboratory system an interrupt is posted to the processor when an input
buffer is full with sample data.
When an interrupt occurs, the CPU finishes the current instruction. Refer to figure 14
below. Then, the hardware in the processor branches to a fixed address, predefined during
the construction of the computer. At this address, the programmer has placed an interrupt
service routine. The program code in the ISR of this application performs signal-
processing on the sample data during the interrupt.
Program flow:
    Instruction 1
    Instruction 2
    ...
    Instruction n        <- Interrupt occurs
        Save contents of registers and flags
        Service the interrupt
        Restore the contents of the registers
        Resume original process
Program flow:
    Instruction n+1
    Instruction n+2
Figure 14: Interrupt response procedure
It should be noted that an interrupt might occur at any time during the program flow; it is
impossible to predict between what two instructions in the program flow the interrupt
will occur. Therefore, the user must save the contents of the registers and all flags, before
servicing the interrupt task. Then, the user must restore the registers and the context of
the process before the program is allowed to resume its original process. You can think of
an ISR as an ordinary function with no arguments and void return value that saves and
restores the CPU state.
Appendix C: Direct Memory Access (DMA)
Load and store instructions can be used for transferring data from one part of memory to
another in a central processing unit (CPU). However, data transfers keep the CPU busy
and prevent it from performing other tasks. In fact, if the CPU is used for data transfers,
most of the processor time will be wasted while the processor is waiting for new data to
arrive.
A less time-consuming (in terms of CPU time) way of transferring data between internal and
external memory is to use Direct Memory Access (DMA). The DMA acts as a co-processor,
which moves data from one part of memory to another without interfering
with the CPU. Therefore, the DMA method leaves the CPU free to perform other tasks.
Once the CPU has specified which data transfers are to be carried out, the DMA unit
can operate independently.
C6701 has four DMA channels. Each channel has its own memory-mapped control
registers that can be set up to move data from one place in memory to another. These
registers contain information regarding source and destination locations in memory,
number of transfers, and format of transfers. To avoid memory conflicts when more than
one DMA channel tries to access the same resource in a given clock cycle, a priority
scheme has to be established. In C6701, the four DMA channels have fixed priorities,
with channel 0 having the highest and channel 3 the lowest priority. In this application,
DMA channel 2 is programmed to generate an interrupt of the CPU when an input buffer
is full. DMA channel 1 copies the data to the transmit register of the McBSP. Channel 0
is used for reset.
Appendix D: User’s Guide
Start the Simulink program from the Matlab-prompt:
>simulink
Develop a model from the ready-made building blocks in the sl_library, found in the
folder C:\BRIDGE_PROJECT. Choose parameters for the building blocks, by double
clicking on the blocks. Store the model as sl_model.mdl in the folder
C:\BRIDGE_PROJECT.
To start the conversion program, execute the file LINKPROGRAM.EXE in the folder
C:\BRIDGE_PROJECT\cpp. You can either double click on the file in Windows
Explorer, or you can create a shortcut to the program and move it to your desktop area.
You may also execute the program from the Matlab prompt by typing:
>linkprogram
At start up, the visual control panel will appear and prompt you to enter sampling
frequency and buffer size for the DSP program. Once a valid frequency and buffer size
have been entered (you will be guided by the program) and the ‘Enter’ key pressed, the
DSP program is launched automatically.
If overload occurs in the DSP program, a warning will appear before the program is
halted. If an overload warning appears, firstly check your input level (Maximum Signal
Level: 6Vpp, 2.1 Vrms). Secondly, check your multiplier factors.
To stop the signal-processing, press any key.
Important Note:
If you recompile the program and want to run the new version from outside the Microsoft
Visual C++ program, i.e. from Windows Explorer or from the Matlab prompt, the
LINKPROGRAM.EXE file has to be copied from folder ..\cpp\Debug into folder ..\cpp.
7 REFERENCES
[1] The MathWorks Inc., www.mathworks.com/products/
[2] “A Matlab to VHDL Conversion Toolbox for Digital Control”, I.A. Grout, K. Keane,
Department of Electronic and Computer Engineering, University of Limerick,
Limerick, Ireland, 2000, www.ece.ul.ie/homepage/ian_grout/paper1.pdf
[3] “Simulink/Matlab-to-VHDL Route for Full-Custom/FPGA Rapid Prototyping of
DSP Algorithms”, Artur Krukowski, Izzet Kale, University of Westminster, United
Kingdom, November 1999, www.cmsa.wmin.ac.uk/~artur/papers/Paper18.pdf
[4] www.mathworks.com/products/controldesign/cgrp.shtml
[5] Definition from Cambridge International Dictionary of English or Concise Oxford
English Dictionary, www.ordboken.nu
[6] Digital Signal Processing Implementation using the TMS320C6000 DSP Platform,
Naim Dahnoun, Prentice Hall, 2000.
[7] C6x-Based Digital Signal Processing, Nasser Kehtarnavaz, Burc Simsek, Prentice
Hall, 2000.
[8] “A “C” Test: The 0x10 Best Questions for Would-be Embedded Programmers”,
Nigel Jones, www.embedded.com/2000/0005/0005feat2.htm
[9] “The MDL File Format”, Cornell University Program of Computer Graphics,
Ithaca, New York, May 1998, www.graphics.cornell.edu/online/formats/mdl/
[10] TMS320C6201/6701 Evaluation Module, Technical Reference, Texas Instruments
[11] Department of Signals, Sensors and Systems, Royal Institute of Technology (KTH),
Stockholm, Sweden, www.s3.kth.se
[12] “Comparing the C4x and the C6x”, Mark Siggins, Johan Thie, Horizon
Technologies, Oslo, Norway, www.horizon-tech.fr/articles/dsp/hunt/compare.htm
[13] TMS320C6000 Code Generation Tools Online Documentation (SPRH014E),
(c) 1998-2000 Texas Instruments Incorporated
[14] Texas Instruments, TMS320C67x DSP Software Support Files,
www-k.ext.ti.com/sc/technical-support/tools/dsp/ftp/c67x.htm
[15] “The Memory Management Glossary”, Ravenbrook Limited, Cambridge,
www.memorymanagement.org/glossary/
[16] “An Approach for Quick Development of High Performance Telecom Applications
on the TI TMS320C620X DSP”, Manish Kasliwal, RadiSys Corporation,
http://www.radisys.com/files/Task_article_euromagazine.pdf
[17] The TMS320C6X Optimizing C Compilers User’s Guide (SPRU187), Texas
Instruments
[18] The TMS320C62x/C67x CPU and Instruction Set Reference Guide (SPRU189),
Texas Instruments
[19] “Run-Time Debugging with Microsoft Visual Studio and Rational Purify”, Goran
Begic, www.therationaledge.com/content/apr_01/t_debug_gb.html