Assembly I

TMS320C6713 Assembly Language
General DSP System Block Diagram

Internal Memory
Internal Buses
External Memory
Central Processing Unit
P E R I P H E R A L S
What are the typical DSP algorithms?

The Sum of Products (SOP) is the key element in most DSP algorithms:
Algorithm Finite Impulse Response Filter Equation
y ( n) =
a
k =0 M
x(n k )
Infinite Impulse Response Filter
y ( n) =
a
k =0
x ( n k )+
b
k =1
y (n k )
Convolution
y ( n) =
x ( k ) h( n k )
k =0
Discrete Fourier Transform
X (k ) =
x(n) exp[ j(2 / N )nk ]

n =0
N 1
N 1
Discrete Cosine Transform
F (u ) =
c(u ). f ( x). cos u (2 x + 1) 2N x =0
Implementation of Sum of Products (SOP)

It has been shown that SOP is the key element for most DSP algorithms. So lets write the code for this algorithm and at the same time discover the C6713 architecture.
N
Y = an * xn
n = 1
= a1 * x1 + a2 * x2 +... + aN * xN
Two basic operations are required for this algorithm. (1) Multiplication (2) Addition Therefore two basic instructions are required
Implementation of Sum of Products (SOP)

So lets implement the SOP algorithm! The implementation in this module will be done in assembly.
N
Y = an * xn
n = 1
= a1 * x1 + a2 * x2 +... + aN * xN
Two basic operations are required for this algorithm. (1) Multiplication (2) Addition Therefore two basic instructions are required
Multiply (MPY)
N
Y = an * xn
n = 1
= a1 * x1 + a2 * x2 +... + aN * xN
The multiplication of a1 by x1 is done in assembly by the following instruction: MPY a1, x1, Y
This instruction is performed by a multiplier unit that is called .M
Multiply (.M unit)

40
Y = an * xn
n = 1
.M .M
The . M unit performs multiplications in hardware MPY .M a1, x1, Y
Note: 16-bit by 16-bit multiplier provides a 32-bit result. 32-bit by 32-bit multiplier provides a 64-bit result.
Addition (.?)
40
Y = an * xn
n = 1
.M .M .? .?
MPY ADD .M .? a1, x1, prod Y, prod, Y
Add (.L unit)

40
Y = an * xn
n = 1
.M .M .L .L
MPY ADD .M .L a1, x1, prod Y, prod, Y
RISC processors such as the C6000 use registers to hold the operands, so lets change this code.
Register File - A
Register File A A0 A1 A2 A3 prod Y . . . a1 x1
40
Y = an * xn
n = 1
.M .M .L .L
MPY ADD .M .L a1, x1, prod Y, prod, Y
A15 32-bits
Let us correct this by replacing a, x, prod and Y by the registers as shown above.
Specifying Register Names

40
Y = an * xn
n = 1
.M .M .L .L
MPY ADD .M .L A0, A1, A3 A4, A3, A4
A15 32-bits
The registers A0, A1, A3 and A4 contain the values to be used by the instructions.
Specifying Register Names

40
Y = an * xn
n = 1
.M .M .L .L
MPY ADD .M .L A0, A1, A3 A4, A3, A4
A15 32-bits
Register File A contains 16 registers (A0 -A15) which are 32-bits wide.
Data loading
Q: How do we load the operands into the registers? .M .M .L .L
A15 32-bits
Load Unit .D
Q: How do we load the operands into the registers? .M .M .L .L .D .D
A: The operands are loaded into the registers by loading them from the memory using the .D unit.
A15 32-bits
Data Memory
Load Unit .D
It is worth noting at this stage that the only way to access memory is through the .D unit. .M .M .L .L .D .D
A15 32-bits
Data Memory
Load Instruction
Q: Which instruction(s) can be used for loading operands from the memory to the registers? .M .M .L .L .D .D
A15 32-bits
Data Memory
Load Instructions (LDB, LDH,LDW,LDDW)

Q: Which instruction(s) can be used for loading operands from the memory to the registers? .M .M .L .L .D .D A: The load instructions.
A15 32-bits
Data Memory
Using the Load Instructions

Before using the load unit you have to be aware that this processor is byte addressable, which means that each byte is represented by a unique address. Also the addresses are 32-bit wide. Data
address 00000000 00000002 00000004 00000006 00000008
FFFFFFFF 16-bits

The syntax for the load instruction is: LD *Rn,Rm Where: Rn is a register that contains the address of the operand to be loaded and Rm is the destination register.
16-bits FFFFFFFF
Data a1 x1 prod Y
address 00000000 00000002 00000004 00000006 00000008

The syntax for the load instruction is: LD *Rn,Rm The question now is how many bytes are going to be loaded into the destination register? Data a1 x1 prod Y
address 00000000 00000002 00000004 00000006 00000008
FFFFFFFF 16-bits

The syntax for the load instruction is: LD *Rn,Rm The answer, is that it depends on the instruction you choose:
LDB: loads one byte (8-bit) LDH: loads half word (16-bit) LDW: loads a word (32-bit) LDDW: loads a double word (64-bit)
Data a1 x1 prod Y
address 00000000 00000002 00000004 00000006 00000008
Note: LD on its own does not exist.
FFFFFFFF 16-bits

The syntax for the load instruction is: LD *Rn,Rm
Example:
If we assume that A5 = 0x4 then: (1) LDB *A5, A7 ; gives A7 = 0x00000001 (2) LDH *A5,A7; gives A7 = 0x00000201 (3) LDW *A5,A7; gives A7 = 0x04030201 (4) LDDW *A5,A7:A6; gives A7:A6 = 0x0807060504030201
Data
1
0xA 0xC 0x2 0x4 0x6 0x8
0
0xB 0xD 0x1 0x3 0x5 0x7
address 00000000 00000002 00000004 00000006 00000008
FFFFFFFF 16-bits

The syntax for the load instruction is: LD *Rn,Rm Question: If data can only be accessed by the load instruction and the .D unit, how can we load the register pointer Rn in the first place? Data
0xA 0xC 0x2 0x4 0x6 0x8 0xB 0xD 0x1 0x3 0x5 0x7
address 00000000 00000002 00000004 00000006 00000008
FFFFFFFF 16-bits
Loading the Pointer Rn

The instruction MVKL will allow a move of a 16-bit constant into a register as shown below: MVKL
(a is a constant or label)
.?
a, A5
How many bits represent a full address? 32 bits So why does the instruction not allow a 32-bit move? All instructions are 32-bit wide (see instruction opcode).

To solve this problem another instruction is available: MVKH
ah al x a A5
eg.
MVKH
.?
a, A5
ah
(a is a constant or label)
Finally, to move the 32-bit address to a register we can use:

MVKL MVKH a, A5 a, A5

Always use MVKL then MVKH, look at the following examples:
Example 1 A5 = 0x87654321 MVKL 0x1234FABC, A5 MVKH 0x1234FABC, A5
A5 = 0xFFFFFABC (sign extension)
A5 = 0x1234FABC ; OK
Example 2 MVKH A5 = 0x12344321 0x1234FABC, A5 MVKL 0x1234FABC, A5
A5 = 0xFFFFFABC ; Wrong
LDH, MVKL and MVKH

Register File A
A0 A1 A2 A3 A4
a x prod Y . . .
MVKL MVKH
pt1, A5 pt1, A5 pt2, A6 pt2, A6 .D .D .M .L *A5, A0 *A6, A1 A0, A1, A3 A4, A3, A4
.M .M .L .L .D .D
MVKL MVKH LDH LDH MPY ADD
A15
32-bits
pt1 and pt2 point to some locations
Data Memory
in the data memory.
Creating a loop
So far we have only implemented the SOP for one tap only, i.e. Y= a1 * x1 So lets create a loop so that we can implement the SOP for N Taps.
MVKL MVKH MVKL MVKH LDH LDH MPY ADD .D .D .M .L pt1, A5 pt1, A5 pt2, A6 pt2, A6 *A5, A0 *A6, A1 A0, A1, A3 A4, A3, A4
Creating a loop
So far we have only implemented the SOP for one tap only, i.e. Y= a1 * x1 So lets create a loop so that we can implement the SOP for N Taps.
With the C6000 processors there are no dedicated instructions such as block repeat. The loop is created using the B instruction.
What are the steps for creating a loop

1. Create a label to branch to. 2. Add a branch instruction, B. 3. Create a loop counter. 4. Add an instruction to decrease the loop counter. 5. Make the branch conditional based on the value in the loop counter.
1. Create a label to branch to

MVKL MVKH MVKL MVKH pt1, A5 pt1, A5 pt2, A6 pt2, A6
loop
LDH LDH MPY ADD
.D .D .M .L
*A5, A0 *A6, A1 A0, A1, A3 A4, A3, A4
2. Add a branch instruction, B.

MVKL MVKH MVKL MVKH pt1, A5 pt1, A5 pt2, A6 pt2, A6
loop
LDH LDH MPY ADD B
.D .D .M .L .?
*A5, A0 *A6, A1 A0, A1, A3 A4, A3, A4 loop
Which unit is used by the B instruction?

Register File A A0 A1 A2 A3 prod Y . . . a x MVKL MVKH pt1, A5 pt1, A5 pt2, A6 pt2, A6
.S .S .M .M .M .M .L .L .L .L .D .D .D .D
MVKL MVKH
loop
LDH LDH MPY ADD B
.D .D .M .L .?
*A5, A0 *A6, A1 A0, A1, A3 A4, A3, A4 loop
A15 32-bits
Data Memory
Which unit is used by the B instruction?

Register File A A0 A1 A2 A3 prod Y . . . a x MVKL .S MVKH .S pt1, A5 pt1, A5 pt2, A6 pt2, A6
.S .S .M .M .M .M .L .L .L .L .D .D .D .D
MVKL .S MVKH .S
loop
LDH LDH MPY ADD B
.D .D .M .L .S
*A5, A0 *A6, A1 A0, A1, A3 A4, A3, A4 loop
A15 32-bits
Data Memory
3. Create a loop counter.

Register File A A0 A1 A2 A3 prod Y . . . a x MVKL .S MVKH .S pt1, A5 pt1, A5 pt2, A6 pt2, A6 count, B0 *A5, A0 *A6, A1 A0, A1, A3 A4, A3, A4 loop
.S .S .M .M .M .M .L .L .L .L .D .D .D .D
MVKL .S MVKH .S MVKL .S loop LDH LDH MPY ADD B .D .D .M .L .S
A15 32-bits
B0 registers will be introduced later
Data Memory
4. Decrease the loop counter

Register File A A0 A1 A2 A3 prod Y . . . a x MVKL .S MVKH .S pt1, A5 pt1, A5 pt2, A6 pt2, A6 count, B0 *A5, A0 *A6, A1 A0, A1, A3 A4, A3, A4 B0, 1, B0 loop
.S .S .M .M .M .M .L .L .L .L .D .D .D .D
MVKL .S MVKH .S MVKL .S loop LDH LDH MPY ADD SUB B .D .D .M .L .S .S
A15 32-bits
Data Memory
5. Make the branch conditional based on the value in the loop counter
What is the syntax for making instruction conditional? [condition] e.g. [B1] B loop Instruction Label
(1) The condition can be one of the following registers: A1, A2, B0, B1, B2. (2) Any instruction can be conditional.
5. Make the branch conditional based on the value in the loop counter
The condition can be inverted by adding the exclamation symbol ! as follows: [!condition] e.g. [!B0] [B0] B B loop ;branch if B0 = 0 loop ;branch if B0 != 0 Instruction Label
5. Make the branch conditional

Register File A A0 A1 A2 A3 prod Y . . . a x MVKL .S1 pt1, A5 MVKH .S1 pt1, A5
.S .S .M .M .M .M .L .L .L .L .D .D .D .D
MVKL .S1 pt2, A6 MVKH .S1 pt2, A6 MVKL .S2 count, B0 loop LDH LDH MPY ADD SUB [B0] B .D .D .M .L .S .S *A5, A0 *A6, A1 A0, A1, A3 A4, A3, A4 B0, 1, B0 loop
A15 32-bits
Data Memory
More on the Branch Instruction (1)

With this processor all the instructions are encoded in a 32-bit. Therefore the label must have a dynamic range of less than 32-bit as the instruction B has to be coded.
32-bit
B Case 1:
21-bit relative address
B .S1
label
Relative branch. Label limited to +/- 220 offset.
More on the Branch Instruction (2)

By specifying a register as an operand instead of a label, it is possible to have an absolute branch. This will allow a dynamic range of 232.
32-bit
B Case 2: B .S2
5-bit register code
register
Absolute branch. Operates on .S2 ONLY!
Testing the code

MVKL .S1 pt1, A5 MVKH .S1 pt1, A5 MVKL .S1 pt2, A6 MVKH .S1 pt2, A6 MVKL .S2 count, B0 loop LDH LDH MPY
However, we would like to perform: a0*x0 + a1*x1 + a2*x2 + + aN*xN
This code performs the following operations: a0*x0 + a0*x0 + a0*x0 + + a0*x0
.D .D .M .L .S .S
*A5, A0 *A6, A1 A0, A1, A3 A4, A3, A4 B0, 1, B0 loop
ADD SUB [B0] B
Modifying the pointers

MVKL .S2 pt1, A5 MVKH .S2 pt1, A5 MVKL .S2 pt2, A6 MVKH .S2 pt2, A6 MVKL .S2 count, B0
The solution is to modify the pointers A5 and A6.
loop
LDH LDH MPY ADD SUB
.D .D .M .L .S .S
*A5, A0 *A6, A1 A0, A1, A3 A4, A3, A4 B0, 1, B0 loop
[B0]
Indexing Pointers
Syntax *R Description Pointer Pointer Modified No
In this case the pointers are used but not modified.
R can be any register
Indexing Pointers
Syntax *R *+R[disp] *-R[disp] Description Pointer + Pre-offset - Pre-offset Pointer Modified No No No
In this case the pointers are modified BEFORE being used and RESTORED to their previous values.
[disp] specifies the number of elements size in DW (64-bit), W (32-bit), H (16-bit), or B (8-bit). disp = R or 5-bit constant. R can be any register.
Indexing Pointers
Syntax *R *+R[disp] *-R[disp] *++R[disp] *--R[disp] Description Pointer + Pre-offset - Pre-offset Pre-increment Pre-decrement Pointer Modified No No No Yes Yes
In this case the pointers are modified BEFORE being used and NOT RESTORED to their Previous Values.
Indexing Pointers
Syntax *R *+R[disp] *-R[disp] *++R[disp] *--R[disp] *R++[disp] *R--[disp] Description Pointer + Pre-offset - Pre-offset Pre-increment Pre-decrement Post-increment Post-decrement Pointer Modified No No No Yes Yes Yes Yes
In this case the pointers are modified AFTER being used and NOT RESTORED to their Previous Values.
Indexing Pointers
Syntax *R *+R[disp] *-R[disp] *++R[disp] *--R[disp] *R++[disp] *R--[disp] Description Pointer + Pre-offset - Pre-offset Pre-increment Pre-decrement Post-increment Post-decrement Pointer Modified No No No Yes Yes Yes Yes
[disp] specifies # elements - size in DW, W, H, or B. disp = R or 5-bit constant. R can be any register.
Modify and testing the code

This code now performs the following operations: a0*x0 + a1*x1 + a2*x2 + ... + aN*xN
loop
LDH LDH MPY ADD SUB
.D .D .M .L .S .S
*A5++, A0 *A6++, A1 A0, A1, A3 A4, A3, A4 B0, 1, B0 loop
[B0]
Store the final result

This code now performs the following operations: a0*x0 + a1*x1 + a2*x2 + ... + aN*xN
loop
LDH LDH MPY ADD SUB
.D .D .M .L .S .S .D
*A5++, A0 *A6++, A1 A0, A1, A3 A4, A3, A4 B0, 1, B0 loop A4, *A7
[B0]
B STH

MVKL .S1 pt1, A5 MVKH .S1 pt1, A5 MVKL .S1 pt2, A6 MVKH .S1 pt2, A6 MVKL .S2 count, B0 loop
The Pointer A7 has not been initialised.
LDH LDH MPY ADD SUB
.D .D .M .L .S .S .D
*A5++, A0 *A6++, A1 A0, A1, A3 A4, A3, A4 B0, 1, B0 loop A4, *A7
[B0]
B STH

MVKL .S1 pt1, A5 MVKH .S1 pt1, A5 MVKL .S1 pt2, A6 MVKH .S1 pt2, A6 MVKL MVKH MVKL loop LDH LDH MPY ADD SUB [B0] B STH .S1 .S1 .S2 .D .D .M .L .S .S .D pt3, A7 pt3, A7 count, B0 *A5++, A0 *A6++, A1 A0, A1, A3 A4, A3, A4 B0, 1, B0 loop A4, *A7
The Pointer A7 is now initialised.
What is the initial value of A4?

MVKL .S1 pt1, A5 MVKH .S1 pt1, A5 MVKL .S1 pt2, A6 MVKH .S1 pt2, A6 MVKL MVKH MVKL ZERO LDH LDH MPY ADD SUB [B0] B STH .S1 .S1 .S2 .L .D .D .M .L .S .S .D pt3, A7 pt3, A7 count, B0 A4 *A5++, A0 *A6++, A1 A0, A1, A3 A4, A3, A4 B0, 1, B0 loop A4, *A7
A4 is used as an accumulator, so it needs to be reset to zero.
loop
Increasing the processing power!

Register File A A0 A1 A2 A3 A4 .S1 .S1 .M1 .M1
. . .
A15
32-bits
.L1 .L1 .D1 .D1
How can we add more processing power to this processor?
Data Memory
Increasing the processing power!

Register File A A0 A1 A2 A3 A4 .S1 .S1 .M1 .M1
(1) Increase the clock frequency. (2) Increase the number of Processing units.
. . .
A15
32-bits
.L1 .L1 .D1 .D1
Data Memory
To increase the Processing Power, this processor has two sides (A and B or 1 and 2)
Register File A A0 A1 A2 A3 A4 .S1 .S1 .M1 .M1 .S2 .S2 .M2 .M2 .L2 .L2 .D2 .D2
32-bits
Register File B B0 B1 B2 B3 B4
. . .
A15
32-bits
.L1 .L1 .D1 .D1
. . .
B15
Data Memory
Code Review (using side A only)

Y = an * xn
n = 1 40
MVK loop: LDH LDH MPY ADD SUB [A2] B STH
.S1 .D1 .D1 .M1 .L1 .L1 .S1 .D1
40, A2 *A5++, A0 *A6++, A1 A0, A1, A3 A3, A4, A4 A2, 1, A2 loop A4, *A7
; A2 = 40, loop count ; A0 = a(n) ; A1 = x(n) ; A3 = a(n) * x(n) ; Y = Y + A3 ; decrement loop count ; if A2 0, branch ; *A7 = Y
Note: Assume that A4 was previously cleared and the pointers are initialised.
Let us have a look at the final details concerning the functional units. Consider first the case of the .L and .S units.
Operands - 32/40-bit Register, 5-bit Constant

Operands can be:
5-bit constants (or 16-bit for MVKL and MVKH). 32-bit registers. 40-bit Registers.
However, we have seen that registers are only 32-bit.
So where do the 40-bit registers come from?

A 40-bit register can be obtained by concatenating two registers. However, there are 3 conditions that need to be respected:
The registers must be from the same side. The first register must be even and the second odd. The registers must be consecutive.

All combinations of 40-bit registers are shown below:
40-bit Reg
odd 8
40-bit Reg
odd 8
even
32
even
32
A1:A0 A3:A2 A5:A4 A7:A6 A9:A8 A11:A10 A13:A12 A15:A14
B1:B0 B3:B2 B5:B4 B7:B6 B9:B8 B11:B10 B13:B12 B15:B14

instr .unit <src>, <src>, <dst> instr .unit <src>, <src>, <dst> 32-bit Reg 5-bit Const
< src >
32-bit Reg
< src >
40-bit Reg
.L or .S
< dst >
32-bit Reg
40-bit Reg

< src >
32-bit Reg
< src >
40-bit Reg
.L or .S
< dst >
32-bit Reg
40-bit Reg

< src >
32-bit Reg
< src >
40-bit Reg
.L or .S
< dst >
OR.L1
A0, A1, A2
32-bit Reg
40-bit Reg

< src >
32-bit Reg
< src >
40-bit Reg
.L or .S
< dst >
OR.L1 ADD.L2
A0, A1, A2 -5, B3, B4
32-bit Reg
40-bit Reg

< src >
32-bit Reg
< src >
40-bit Reg
.L or .S
< dst >
OR.L1 ADD.L2 ADD.L1
A0, -5, A2,
A1, B3, A3,
A2 B4 A5:A4
32-bit Reg
40-bit Reg

< src >
32-bit Reg
< src >
40-bit Reg
.L or .S
< dst >
32-bit Reg
40-bit Reg
OR.L1 ADD.L2 ADD.L1 SUB.L1
A0, -5, A2, A2,
A1, A2 B3, B4 A3, A5:A4 A5:A4, A5:A4

< src >
32-bit Reg
< src >
40-bit Reg
.L or .S
< dst >
32-bit Reg
40-bit Reg
OR.L1 ADD.L2 ADD.L1 SUB.L1 ADD.L2
A0, A1, A2 -5, B3, B4 A2, A3, A5:A4 A2, A5:A4, A5:A4 3, B9:B8, B9:B8
Register to register data transfer

To move the content of a register (A or B) to another register (B or A) use the move MV Instruction, e.g.:
MV MV A0, B0 B6, B7
To move the content of a control register to another register (A or B) or vice-versa use the MVC instruction, e.g.:
MVC MVC IFR, A0 A0, IRP
TMS320C6000 Instruction Set
' C6700: Superset of Fixed-Point (by unit)

.S Unit .L Unit
ABS ADD AND CMPEQ CMPGT CMPLT LMBD MV NEG NORM NOT OR SADD SAT SSUB SUB SUBC XOR ZERO ADDSP ADDDP SUBSP SUBDP INTSP INTDP SPINT DPINT SPRTUNC DPTRUNC DPSP MPYSP MPYDP MPYI MPYID IDLE
.S .S .L .L .D .D .M .M
ADD ADDK ADD2 AND B CLR EXT MV MVC MVK MVKL MVKH
NEG NOT OR SET SHL SHR SSHL SUB SUB2 XOR ZERO
ABSSP ABSDP CMPGTSP CMPEQSP CMPLTSP CMPGTDP CMPEQDP CMPLTDP RCPSP RCPDP RSQRSP RSQRDP SPDP
C67x
.M Unit
MPY MPYH MPYLH MPYHL NOP SMPY SMPYH
.D Unit
ADD NEG ADDAB (B/H/W) STB (B/H/W) ADDAD SUB LDB (B/H/W) SUBAB (B/H/W) LDDW ZERO MV
No Unit Used
Note: Refer to the 'C6000 CPU Reference Guide for more details.
'C6x System Block Diagram

Memory
Internal Buses EMIF
.D1 .D2 .M1 .M2 .L1 .L2 .S1 .S2

Control Regs
Extl Memory
Regs (A0-A15) Regs (A0-A15)
Regs (B0-B15) Regs (B0-B15)
CPU
'C6x System Block Diagram

Program RAM Addr Internal Buses D (32) EMIF Data Ram
.D1 .D2 .M1 .M2 .L1 .L2 .S1 .S2

Control Regs
Extl Memory - Sync - Async
Regs (A0-A15) Regs (A0-A15)
Regs (B0-B15) Regs (B0-B15)
CPU
Module 1 Exam
1. Functional Units a. How many can perform an ADD? Name them. b. Which support memory loads/stores? .M .S .D .L
2. Conditional Code
a. Which registers can be used as condl registers? b. Which instructions can be conditional?
3. Coding Problems
a. Move contents of A0-->A1
3. Coding Problems
a. Move contents of A0-->A1
b. Move contents of CSR-->A1 c. Clear register A5
3. Coding Problems (contd)

d. A2 = A02 + A1
e. If (B1 0) then B2 = B5 * B6 f. A2 = A0 * A1 + 10
g. Load an unsigned constant (19ABCh) into register A6.
3. Coding Problems (contd)

h. Load A7 with contents of mem1 and postincrement the selected pointer.

Assembly I

Загружено:

Сведения о документе

Исходное описание:

Авторское право

Доступные форматы

Поделиться этим документом

Поделиться или встроить документ

Параметры публикации

Этот документ был вам полезен?

Это неприемлемый материал?

Авторское право:

Доступные форматы

Assembly I

Загружено:

Авторское право:

Доступные форматы

TMS320C6713 Assembly Language

General DSP System Block Diagram

Central Processing Unit

What are the typical DSP algorithms?

Infinite Impulse Response Filter

Discrete Fourier Transform

x(n) exp[ j(2 / N )nk ]

Discrete Cosine Transform

c(u ). f ( x). cos u (2 x + 1) 2N x =0

Implementation of Sum of Products (SOP)

Implementation of Sum of Products (SOP)

This instruction is performed by a multiplier unit that is called .M

Multiply (.M unit)

The . M unit performs multiplications in hardware MPY .M a1, x1, Y

Add (.L unit)

Specifying Register Names

Specifying Register Names

Q: How do we load the operands into the registers? .M .M .L .L

Q: How do we load the operands into the registers? .M .M .L .L .D .D

Load Instructions (LDB, LDH,LDW,LDDW)

Using the Load Instructions

Using the Load Instructions

address 00000000 00000002 00000004 00000006 00000008

Using the Load Instructions

Using the Load Instructions

address 00000000 00000002 00000004 00000006 00000008

Note: LD on its own does not exist.

Using the Load Instructions

address 00000000 00000002 00000004 00000006 00000008

Using the Load Instructions

address 00000000 00000002 00000004 00000006 00000008

Loading the Pointer Rn

Loading the Pointer Rn

Finally, to move the 32-bit address to a register we can use:

Loading the Pointer Rn

A5 = 0xFFFFFABC (sign extension)

Example 2 MVKH A5 = 0x12344321 0x1234FABC, A5 MVKL 0x1234FABC, A5

LDH, MVKL and MVKH

MVKL MVKH LDH LDH MPY ADD

in the data memory.

What are the steps for creating a loop

1. Create a label to branch to

LDH LDH MPY ADD

*A5, A0 *A6, A1 A0, A1, A3 A4, A3, A4

2. Add a branch instruction, B.

LDH LDH MPY ADD B

*A5, A0 *A6, A1 A0, A1, A3 A4, A3, A4 loop

Which unit is used by the B instruction?

LDH LDH MPY ADD B

*A5, A0 *A6, A1 A0, A1, A3 A4, A3, A4 loop

Which unit is used by the B instruction?

LDH LDH MPY ADD B

*A5, A0 *A6, A1 A0, A1, A3 A4, A3, A4 loop

3. Create a loop counter.

MVKL .S MVKH .S MVKL .S loop LDH LDH MPY ADD B .D .D .M .L .S

B0 registers will be introduced later

4. Decrease the loop counter

MVKL .S MVKH .S MVKL .S loop LDH LDH MPY ADD SUB B .D .D .M .L .S .S

5. Make the branch conditional

More on the Branch Instruction (1)

21-bit relative address

Relative branch. Label limited to +/- 220 offset.

More on the Branch Instruction (2)

A5, A0 A6, A1 A0, A1, A3 A4, A3, A4

A5, A0 A6, A1 A0, A1, A3 A4, A3, A4 loop

A5, A0 A6, A1 A0, A1, A3 A4, A3, A4 loop

A5, A0 A6, A1 A0, A1, A3 A4, A3, A4 loop

A5, A0 A6, A1 A0, A1, A3 A4, A3, A4 B0, 1, B0 loop

A5, A0 A6, A1 A0, A1, A3 A4, A3, A4 B0, 1, B0 loop

A5++, A0 A6++, A1 A0, A1, A3 A4, A3, A4 B0, 1, B0 loop