DSP TMS Processors PART2

A property of MVG_OMALLOOR
Addressing Modes

Addressing
- Addressing refers to the means of specifying the location of operands for instructions
  - Types of addressing are called addressing modes
  - Operands may be input operands for the operation as well as results of the operation
- DSPs contain separate address generation units (AGUs)
  - Arithmetic units dedicated to address calculation
  - Analog Devices calls this unit a data address generator; Lucent Technologies calls it an address arithmetic unit
Implied Addressing
- Operand addresses are implied by the instruction
- In the AT&T DSP16, the operands for the multiplier are always fetched from registers X and Y, and the result is placed into register P. While the assembler syntax of multiplication is
      P = X * Y
  it could also be written simply as
      *

Immediate Addressing
- The operand itself is encoded into the instruction
- In the ADSP-21xx, a constant value is loaded into the register AX0 as follows:
      AX0 = 1234
- Small data words may be included in the instruction word
  - Typically half of the instruction word width
- Longer data words may be stored in the word following the instruction
  => Requires two accesses to program memory (instruction and operand)
Memory-Direct Addressing
- Also called absolute addressing
- The address of the operand is encoded into the instruction or can be found in a separate data word following the instruction
- In the ADSP-21xx, an operand at address 1000 is loaded into register AX0 as follows:
      AX0 = DM(1000)
- Small addresses can be encoded into the instruction word
- Long addresses require a separate data word following the instruction code

Register-Direct Addressing
- The operand can be found in a specified register
- In the TMS320C3X, the value in register R1 is subtracted from the value in register R2 and the result is stored in R2 as follows:
      SUBF R1,R2
- Important addressing mode in load-store processors
Register-Indirect Addressing
- The operand is located at the memory address stored in a register
- A special group of registers can be used to store addresses (address registers)
- Most important addressing mode in DSPs
- Natural pointing mechanism when working with data arrays
  - Allows automatic modification of pointers
- Efficient from the instruction set point of view
  - Few bits are needed to indicate the address of the operand
- In the Lucent DSP32C, the value pointed to by the contents of register R5 is added to the value in accumulator A0:
      A0 = A0 + *R5

Register-Indirect Addressing with Pre- or Post-Increment
- Many DSP algorithms access data arrays sequentially
- The address generation unit can increment the address value in the address register
  - before the memory access (pre-increment)
  - after the memory access (post-increment)
- DSP32xx:
      A0 = A0 + *R5++; post-increment by one
      A0 = A0 + *R5--; post-decrement by one
- Some DSPs support address increment or decrement by a value in another register (offset register, modifier register)
- In the DSP32xx, address register R5 is post-incremented by the value stored in register R17 as follows:
      A0 = A0 + *R5++R17;
- In the DSP5600x:
      MOVE X:-(R0), A1; pre-decrement R0
- A pre-increment operation typically requires an extra instruction cycle
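The post-increment and pre-decrement modes above can be modelled in a few lines. This is only a sketch: the dictionary-based register file and the helper names `load_post_modify` and `load_pre_decrement` are illustrative assumptions, not part of any DSP toolchain.

```python
def load_post_modify(memory, regs, addr_reg, step=1):
    """Read memory at the address register, then add step to the register."""
    value = memory[regs[addr_reg]]
    regs[addr_reg] += step
    return value


def load_pre_decrement(memory, regs, addr_reg):
    """Subtract one from the address register, then read at the new address."""
    regs[addr_reg] -= 1
    return memory[regs[addr_reg]]


memory = [10, 20, 30, 40, 50]
regs = {"R5": 0, "R17": 2, "A0": 0}

# A0 = A0 + *R5++      (post-increment by one)
regs["A0"] += load_post_modify(memory, regs, "R5")
# A0 = A0 + *R5++R17   (post-increment by the value in R17)
regs["A0"] += load_post_modify(memory, regs, "R5", regs["R17"])
```

In both modes the memory access and the pointer update are a single operation from the programmer's point of view, which is what makes sequential array access cheap.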
Register-Indirect Addressing with Indexing
- The effective address is obtained by adding together the value in the address register and the value in another register (index register)
- The values in the registers are not modified, unlike in the previous addressing mode
- In the DSP5600x:
      MOVE Y1, X:(R6+N6), A1
- Sometimes the index value can be part of the instruction
- In the TMS320C3X:
      LDI *-AR1(1),R7;
  The effective address is the contents of register AR1 minus one. The contents of AR1 are not modified.
- Indexed addressing is useful when using the same code for accessing several data arrays

Register-Indirect Addressing with Indexing
- Compilers utilize indexed addressing for passing parameters on the stack
- A stack frame is created each time a subroutine is called
- The subroutine can access parameters consistently
- No need for absolute addresses
- The TI C compiler generates code for copying the first parameter (integer) of a subroutine to register R0 as follows (AR3 is used as frame pointer):
      LDI *+AR3(2),R0;
[Figure: stack with the stack frame pointer (address register) pointing at a frame that holds, in order, the pointer to the previous stack frame, the number of parameters, Parameter #1 and Parameter #2]
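As a rough illustration of indexed addressing for stack frames, the sketch below computes an effective address from a base and a displacement and uses it to read parameters. The frame layout (previous frame pointer, parameter count, then the parameters at increasing addresses) is an assumption read off the figure; `effective_address` and `read_parameter` are hypothetical names.

```python
def effective_address(base, displacement):
    """Base plus displacement; neither input is modified."""
    return base + displacement


def read_parameter(stack, frame_pointer, number):
    """Read parameter `number` (1-based) from the assumed frame layout."""
    # Assumed layout: fp+0 previous frame pointer, fp+1 parameter count,
    # fp+2 parameter #1, fp+3 parameter #2, ...
    return stack[effective_address(frame_pointer, 1 + number)]


stack = [0] * 16
fp = 4
stack[fp:fp + 4] = [0, 2, 111, 222]  # previous fp, count, param #1, param #2
```

Because every access is relative to the frame pointer, the same subroutine code works wherever the frame happens to sit in memory.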
Modulo Address Arithmetic
- Data buffer management is often needed in DSP applications
- In embedded systems, dynamic memory management is expensive
- Typically there is a need for a first-in-first-out (FIFO) buffer
- The programmer maintains two pointers:
  - Read pointer: address of the memory location to be read next
  - Write pointer: address of the memory location where the next data value is to be written
- Each time a read or write operation is performed, the programmer needs to check whether the end of the buffer has been reached
- At the end of the buffer, the pointer is reinitialized to point to the beginning of the buffer
[Figure: linear buffer holding X0 X1 X2 X3, with ReadPointer and WritePointer]

Modulo Address Arithmetic
- Modulo addressing (circular addressing) provides hardware support for checking the end of the buffer
- Address registers are updated with pre- or post-increment
- Address generation performs modulo arithmetic on the computation
  => The programmer sees a circular buffer
[Figure: circular buffer holding X0 X1 X2 X3, with ReadPointer and WritePointer]
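The software-managed FIFO described above, where every access is followed by an explicit end-of-buffer check, might look like the following sketch. The class name `SoftwareFifo` is made up for illustration, and full/empty detection is deliberately left out.

```python
class SoftwareFifo:
    """FIFO in a fixed array; pointers are wrapped by explicit end checks."""

    def __init__(self, size):
        self.buffer = [0] * size
        self.read_pointer = 0
        self.write_pointer = 0

    def write(self, value):
        self.buffer[self.write_pointer] = value
        self.write_pointer += 1
        if self.write_pointer == len(self.buffer):  # end of buffer reached
            self.write_pointer = 0                  # back to the beginning

    def read(self):
        value = self.buffer[self.read_pointer]
        self.read_pointer += 1
        if self.read_pointer == len(self.buffer):   # end of buffer reached
            self.read_pointer = 0                   # back to the beginning
        return value
```

The two `if` checks are exactly the per-access work that modulo addressing hardware takes over.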
Modulo Address Arithmetic
Implementation #1
- The programmer needs to store the length of the circular buffer in a special modifier or modulo register
- Each modifier register is associated with one or more address registers
- The starting address of the buffer is not specified; the address register must contain a valid value before usage
  - Circular buffers must begin at k-word boundaries, where k is the smallest power of two that is equal to or greater than the size of the buffer
  - A 48-word circular buffer must reside on a 64-word boundary, i.e., the starting address may be 0, 64, 128, 192, etc.
- This kind of mechanism can be found in TI TMS320C3X and C4X, Motorola, NEC, and Analog Devices DSPs

Modulo Address Arithmetic
- The DSP56001 and DSP96002 have address register triplets Rx, Nx, and Mx, where x is 0 - 7. The address is stored in Rx, the increment used in post-auto-increment addressing is stored in Nx, and the length of the modulo-mode addressing buffer is in Mx. These registers can be read and written via the general data bus.
- Auto-increment and modulo-mode arithmetic is performed in an independent address ALU. Thus, it is possible to access two circular buffers simultaneously.
[Figure: general data bus (24 bits) connected to two address ALUs; the low address ALU serves N0-N3, M0-M3 and R0-R3, the high address ALU serves R4-R7, M4-M7 and N4-N7; multiplexers route the results to PAB (16), XAB (16) and YAB (16)]
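The boundary rule of Implementation #1 can be sketched as follows: because the buffer starts on a power-of-two boundary, the base address can be recovered from the pointer itself by masking its low bits. The function names are hypothetical, and a real AGU may restrict the step size, which this sketch does not model.

```python
def boundary(length):
    """Smallest power of two that is equal to or greater than length."""
    k = 1
    while k < length:
        k *= 2
    return k


def modulo_post_increment(address, step, length):
    """Next address inside a circular buffer of `length` words."""
    k = boundary(length)
    base = address & ~(k - 1)                   # buffer start, from alignment
    offset = (address - base + step) % length   # wrap inside the buffer
    return base + offset
```

For a 48-word buffer, `boundary(48)` gives 64, so a buffer starting at address 64 occupies addresses 64 to 111 and the pointer wraps from 111 back to 64.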
Modulo Address Arithmetic
Implementation #2
- An alternative mechanism is to utilize start and end registers
- Hardware compares the address against the value in the end register
- Modulo addressing may be used for any buffer
- This mechanism can be found in the Lucent DSP16XX and TI TMS320C5X

Modulo Arithmetic in Lucent DSP16xx
[Figure: YAAU block diagram showing the data bus, registers j, k, rb, re and r0-r3, an adder (ADD), a comparator (CMP) and the RAM]
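A minimal sketch of Implementation #2, assuming the end register holds the address of the last word in the buffer and that the pointer is reloaded from the start register when the two match. The function name is hypothetical and the exact comparison point in real hardware may differ.

```python
def post_increment_with_end_check(address, start, end):
    """Return the next address; wrap to start when address equals end."""
    if address == end:   # comparison done by hardware in a real AGU
        return start
    return address + 1


# A 48-word buffer at an arbitrary start address, with no alignment needed.
start, end = 100, 147
address = start
visited = []
for _ in range(49):
    visited.append(address)
    address = post_increment_with_end_check(address, start, end)
```

Unlike the modulo-register scheme, nothing here depends on the start address, which is why this mechanism works for any buffer placement.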
Modulo Address Arithmetic
- Different DSPs may support different numbers of simultaneous circular buffers
- The TI TMS320C5x supports two circular buffers and the Motorola DSP561xx four buffers. The Motorola DSP5600x and Analog Devices DSPs support eight circular buffers.

Bit Reversal
- Bit-reversed addressing is used mainly in the FFT
[Figure: 8-point FFT flow graph with twiddle factors of W2, W4 and W8; memory locations x0 to x7 (0 to 7) before the permutation and values X0 to X7]

Index mapping (bit-reversed permutation)
        decimal             binary
        index   reversed    index   reversed
X0      0       0           000     000
X1      1       4           001     100
X2      2       2           010     010
X3      3       6           011     110
X4      4       1           100     001
X5      5       5           101     101
X6      6       3           110     011
X7      7       7           111     111
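The index mapping in the bit-reversal table can be reproduced with a short function. This is an illustrative sketch; `bit_reverse` is a name chosen here, not a library call.

```python
def bit_reverse(index, bits):
    """Reverse the order of the lowest `bits` bits of index."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # move the LSB to the other end
        index >>= 1
    return result


# Mapping for an 8-point FFT (3 address bits).
mapping = [bit_reverse(i, 3) for i in range(8)]
```

Applying the mapping twice returns the original order, so the same routine serves for both scrambling and unscrambling.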
Bit Reversal
- The hardware implementation may be
  - Real bit reversal between the address register and the address bus
  - Reverse-carry arithmetic in the AGU
- In the TMS320C3X, the bit-reversed addressing mode is denoted by the symbol "B". Suppose the data is stored in memory starting from address 60h (= AR2) and the length of the FFT is 16 (IR0 contains 8, half of the length of the FFT):
      *AR2++(IR0)B; AR2 = 0110 0000 = 60 (0. sample)
      *AR2++(IR0)B; AR2 = 0110 1000 = 68 (1. sample)
      *AR2++(IR0)B; AR2 = 0110 0100 = 64 (2. sample)
      *AR2++(IR0)B; AR2 = 0110 1100 = 6c (3. sample)
      *AR2++(IR0)B; AR2 = 0110 0010 = 62 (4. sample)
      *AR2++(IR0)B; AR2 = 0110 1010 = 6a (5. sample)
      *AR2++(IR0)B; AR2 = 0110 0110 = 66 (6. sample)
      *AR2 ; AR2 = 0110 1110 = 6e (7. sample)

Short Addressing Modes
- Some addressing modes require several words in program memory (instruction code and data word)
- Some DSPs offer short versions which require only one instruction word
  => Short versions place some restrictions on usage
- Typical short addressing modes are:
  - Short immediate
  - Short memory-direct
  - Paged memory-direct
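The AR2 sequence in the TMS320C3X bit-reversal example can be reproduced by modelling reverse-carry addition, that is, an addition whose carry propagates from the most significant bit towards the least significant bit. The sketch below implements it as "reverse, add normally, reverse back" over the low address bits; the function names and the explicit `bits` parameter are assumptions made for illustration.

```python
def bit_reverse(value, bits):
    """Reverse the order of the lowest `bits` bits of value."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def reverse_carry_add(address, step, bits):
    """Add step to the low `bits` bits of address with reversed carry."""
    mask = (1 << bits) - 1
    low = bit_reverse(address & mask, bits)
    low = (low + bit_reverse(step & mask, bits)) & mask
    return (address & ~mask) | bit_reverse(low, bits)


# 16-point FFT: 4 address bits, start address 60h, step IR0 = 8.
address = 0x60
sequence = []
for _ in range(8):
    sequence.append(address)
    address = reverse_carry_add(address, 8, 4)
```

The first eight addresses come out as 60h, 68h, 64h, 6ch, 62h, 6ah, 66h and 6eh, matching the listing.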
Short Immediate Addressing
- If the immediate data word is small enough, it can be packed into the same instruction word as the operation code
- Typically negative numbers can also be used
  - The sign is extended automatically
- In the DSP5600x, operands of at most 12 bits may be used in immediate addressing:
      MOVE #1234, A

Short Memory-Direct Addressing
- If the direct memory address is small enough, it can be packed into the same instruction word as the operation code
- In the DSP5600x, addresses of at most 6 bits (00H - 3FH) may be used in short direct addressing:
      MOVE $10, A
- Sometimes a DSP may provide a means to add an offset to the short direct address
- In the DSP5600x, the I/O registers are at the end of the memory map (FFC0H - FFFFH). The MOVEP instruction may be used with short direct addressing to access those registers.
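Automatic sign extension of a short immediate and the range check for a short direct address can be sketched as below. The field widths (12 and 6 bits) come from the DSP5600x examples above; the helper names are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the lowest `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def fits_short_direct(address, bits=6):
    """True if the address can be encoded in a `bits`-wide address field."""
    return 0 <= address < (1 << bits)
```

For example, the 12-bit field 0xFFF extends to -1, while 1234 stays 1234 because its sign bit is clear.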
Paged Memory-Direct Addressing
- A special page register holds the number of the page or section of memory to be accessed
- When access outside this page is required, the page register must be updated
- In the TMS320C2X and C5X, the data memories are divided into pages containing 128 (2^7) words. The programmer sets a page register to point to a specific page, and the short direct addressing mode can then be used to access data within the page.
[Figure: the 9-bit data page pointer (DP), loaded from the 16-bit data bus, is combined with the 7 LSBs from the instruction register to form the 16-bit data address]
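The address formation for paged memory-direct addressing, with a 9-bit page number and a 7-bit offset as in the TMS320C2X/C5X description, can be sketched as a simple bit concatenation. The function names are illustrative assumptions.

```python
def paged_address(page, offset):
    """Concatenate a 9-bit page number with a 7-bit offset (16-bit address)."""
    return ((page & 0x1FF) << 7) | (offset & 0x7F)


def split_address(address):
    """Return the (page, offset) pair for a 16-bit data address."""
    return (address >> 7) & 0x1FF, address & 0x7F
```

`split_address` shows the cost side of the scheme: whenever the page part of the next address differs from the current page register, the register has to be reloaded first.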
Instruction Set and Execution Control

Instruction Set
- Defines what are natural and efficient operations on the processor
- A processor with more instructions is not necessarily better
  - Specialized instructions may require more silicon area
- Traditional instruction types
  - Multiplication and arithmetic
  - Logic operations
  - Shifting and rotation
  - Comparison
Looping
- DSP applications require repeated execution of a small number of arithmetic or multiplication instructions
- If the number of instructions in the inner loop is small, the looping overhead lowers the performance
  => All DSPs provide hardware looping instructions (zero-overhead looping)
- These repeat a single instruction or a block of instructions without the normal decrement-test-branch sequence
- Loop counter increment, test against the end condition, and branching are done by hardware

Looping
- Software looping takes roughly three times longer to execute than hardware looping in the following:
      ;SW LOOPING
            MOVE #16,B
      LOOP: MAC (R0)+,(R4)+,A
            DEC B
            JNE LOOP
      ;HW LOOPING
            RPT #16
            MAC (R0)+,(R4)+,A
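The "roughly three times" figure can be checked with a simple cycle count. The sketch assumes, for illustration only, that every instruction takes one cycle and that the loop setup (MOVE or RPT) costs one cycle; real branch instructions often cost more, which would make the software loop look even worse.

```python
def software_loop_cycles(iterations, body=1):
    """Setup + per-iteration body, decrement and conditional branch."""
    return 1 + iterations * (body + 2)


def hardware_loop_cycles(iterations, body=1):
    """Setup + per-iteration body only; loop control is done by hardware."""
    return 1 + iterations * body


sw = software_loop_cycles(16)
hw = hardware_loop_cycles(16)
ratio = sw / hw
```

With 16 iterations this gives 49 cycles against 17, a ratio just under three.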
Single and Multi-Instruction Loops
- A single-instruction loop repeats execution of one instruction
  - The repeated instruction is fetched once from the program memory
  - Consecutive executions free the program bus for operand fetch
- A multi-instruction loop repeats execution of a group of instructions
  - Instructions must be refetched on each iteration
  - The program bus is not available for operand fetch
- Some DSPs limit the number of instructions for hardware loops
  - The DSP16xx has a special 15-word buffer for repeated instructions, thus a repeat block can contain at most 15 instruction words

Looping Effects
- Typically a single-instruction loop disables interrupts
  - The maximum single-instruction loop lockout time must be considered
  - Alternatives
    - Multi-instruction loops with a kernel of one instruction
    - Several single-instruction loops
- Some DSPs may also disable interrupts during multi-instruction loops
Loop Nesting Depth
- A nested loop is a loop inside another loop
- Approaches to handling nested loops
  - Directly nestable (Motorola, Analog Devices, NEC uPD7701x)
    - Nested hardware loops are allowed
    - Maximum depths range from three to seven
  - Partially nestable (DSP Group PineDSPCore, TI TMS320C3X, C4x, C5X)
    - A single-instruction loop is allowed inside a multi-instruction loop
    - Multi-instruction loops cannot be nested
  - Software nestable (TI TMS320C3x, TMS320C5x)
    - A multi-instruction loop can be nested by saving the state of the loop registers before entering the inner loop
  - Non-nestable (TI TMS320C2X, AT&T DSP16xx, DSP32xx)
    - Nesting of hardware loops is not supported

Branching
- Conditional/Unconditional
  - An unconditional branch is always taken
  - A conditional branch is taken only when the condition is fulfilled
- Delayed/Multi-Cycle
  - A multi-cycle branch requires several cycles to complete
  - A delayed branch allows a number of instructions following the branch to be executed; the branch requires only one cycle
- Delayed Branch with Nullify
  - The TMS320C4x provides a conditional delayed branch where the instructions in the delay slots are conditionally executed
- PC-Relative
  - The branch location is not an absolute address
  - An offset from the current instruction location is used
  - Needed in position-independent code
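Why PC-relative branches suit position-independent code can be shown with a small calculation. The sketch assumes the offset is measured from the address of the branch instruction itself; many processors measure it from the following instruction instead, so treat the convention and the function names here as assumptions.

```python
def branch_offset(branch_address, target_address):
    """Offset stored in the instruction for a PC-relative branch."""
    return target_address - branch_address


def branch_target(pc, offset):
    """Address the processor jumps to."""
    return pc + offset


# A loop whose start is at 0x20 and whose closing branch sits at 0x28.
offset = branch_offset(0x28, 0x20)

# The same code loaded 0x1000 words higher still branches correctly,
# although the encoded offset has not changed.
relocated_target = branch_target(0x1028, offset)
```

An absolute branch would have needed its target field patched from 0x20 to 0x1020 when the code was moved.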
Conditional Instruction Execution
- An instruction is executed only if the given condition is true
  - Branches can be avoided in decision-intensive code
  - Useful in DSPs with deep pipelines, where branching produces extra overhead
- The Analog Devices ADSP-21xx and ADSP-210xx allow the programmer to specify the conditions under which the instruction is executed.
  - Condition codes are built into the instruction opcode
  - Extra cycles in execution are needed
- The TMS320C30 has a conditional load for fixed-point and floating-point operands (LDIcond and LDFcond). This is useful when searching for a minimum. In the following, the minimum of three operands is searched for. AR2 points to the beginning of the operand array.
      LDI   *AR2,R3      ;load the first value
      CMPI  *+AR2(1),R3  ;compare it to next value
      LDIGT *+AR2(1),R3  ;conditional load
      CMPI  *+AR2(2),R3  ;compare result to 3rd value
      LDIGT *+AR2(2),R3  ;minimum in the register R3
- The TMS320C5X and C54X provide the conditional execution instruction XC. If the specified condition is true, the next two single-word instructions or one two-word instruction are executed; otherwise NOPs are executed.

Orthogonality
- The extent to which the processor's instruction set is consistent
- The more orthogonal the instruction set, the easier the processor is to program
  - There are fewer inconsistencies and special cases
- Orthogonality is a subjective topic
- Consistency and completeness of the instruction set
  - e.g., a processor with an add instruction but no subtract instruction would be non-orthogonal
- The degree to which operands and addressing modes are uniformly available with different operations
  - e.g., a processor which provides the register-indirect addressing mode for add but not for subtract is non-orthogonal
- Processors with larger instruction word widths tend to be more orthogonal
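The minimum search with conditional loads can be mirrored in Python. This is only a model of the idea: the comparison stands in for CMPI setting the flags and the conditional expression stands in for LDIGT; the function name is hypothetical.

```python
def minimum_by_conditional_load(values):
    """Keep the smaller value at each step, as the CMPI/LDIGT pairs do."""
    r3 = values[0]                      # LDI: load the first value
    for candidate in values[1:]:
        greater = r3 > candidate        # CMPI: compare the register to memory
        r3 = candidate if greater else r3   # LDIGT: load only if greater
    return r3
```

Each step always performs the compare and the (possibly suppressed) load, so on a pipelined processor the instruction stream contains no branches to disturb the pipeline.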
Minimize Instruction Word Width
- Reduced number of operations
  - Fewer operations -> fewer bits for the opcode
  - e.g., the DSP16xx does not support rotation
- Reduced number of addressing modes
  - Processors with a smaller instruction word width provide fewer addressing modes
- Limitations on allowable combinations of operations and addressing modes
- Restrictions on source/destination operands
  - e.g., only one certain register can be used as the address register in certain instructions
- Use of mode bits
  - A mode bit specifies what the actual operation of an instruction is
  - e.g., the TMS320C5x does not have separate arithmetic and logical shift instructions, thus the actual operation is defined by the shift mode bit
  - e.g., an accumulator shift takes the shift count from a special register rather than from the instruction word
- Most of these features complicate programming, but a narrower instruction word reduces the required die size

Assembly Language Format
- A) Traditional opcode-operand assembly syntax
  - Instructions are expressed as an instruction mnemonic and its operands
      MPY X0,X0 ;multiply
      ADD P,A ;add product to accumulator
      MOV (R0),X0 ;
      JMP LOOP
- B) Functional, C-like or algebraic syntax (C-like arithmetic shorthand)
      P = X0*X0
      A = P+A
      X0 = *R0
      GOTO LOOP
  - Algorithms are expressed close to the mathematical form
  - The code is not actually C; experienced C programmers may find it frustrating because the syntax does not support all of C
- Assembly language syntax is not related to the instruction set of the processor
  - A single processor may have two assemblers
  - As long as the assemblers generate the same binary opcodes, the syntaxes are the same from the processor's point of view
