Академический Документы
Профессиональный Документы
Культура Документы
External Memory
P E R I P H E R A L S
y ( n) =
a
k =0 M
x(n k )
y ( n) =
a
k =0
x ( n k )+
b
k =1
y (n k )
Convolution
y ( n) =
x ( k ) h( n k )
k =0
X (k ) =
N 1
F (u ) =
Y = an * xn
n = 1
= a1 * x1 + a2 * x2 +... + aN * xN
Two basic operations are required for this algorithm. (1) Multiplication (2) Addition Therefore two basic instructions are required
Y = an * xn
n = 1
= a1 * x1 + a2 * x2 +... + aN * xN
Two basic operations are required for this algorithm. (1) Multiplication (2) Addition Therefore two basic instructions are required
Multiply (MPY)
N
Y = an * xn
n = 1
= a1 * x1 + a2 * x2 +... + aN * xN
The multiplication of a1 by x1 is done in assembly by the following instruction: MPY a1, x1, Y
Y = an * xn
n = 1
.M .M
Note: 16-bit by 16-bit multiplier provides a 32-bit result. 32-bit by 32-bit multiplier provides a 64-bit result.
Addition (.?)
40
Y = an * xn
n = 1
.M .M .? .?
MPY ADD .M .? a1, x1, prod Y, prod, Y
Y = an * xn
n = 1
.M .M .L .L
MPY ADD .M .L a1, x1, prod Y, prod, Y
RISC processors such as the C6000 use registers to hold the operands, so lets change this code.
Register File - A
Register File A A0 A1 A2 A3 prod Y . . . a1 x1
40
Y = an * xn
n = 1
.M .M .L .L
MPY ADD .M .L a1, x1, prod Y, prod, Y
A15 32-bits
Let us correct this by replacing a, x, prod and Y by the registers as shown above.
Y = an * xn
n = 1
.M .M .L .L
MPY ADD .M .L A0, A1, A3 A4, A3, A4
A15 32-bits
The registers A0, A1, A3 and A4 contain the values to be used by the instructions.
Y = an * xn
n = 1
.M .M .L .L
MPY ADD .M .L A0, A1, A3 A4, A3, A4
A15 32-bits
Register File A contains 16 registers (A0 -A15) which are 32-bits wide.
Data loading
Register File A A0 A1 A2 A3 prod Y . . . a1 x1
A15 32-bits
Load Unit .D
Register File A A0 A1 A2 A3 prod Y . . . a1 x1
A: The operands are loaded into the registers by loading them from the memory using the .D unit.
A15 32-bits
Data Memory
Load Unit .D
Register File A A0 A1 A2 A3 prod Y . . . a1 x1
It is worth noting at this stage that the only way to access memory is through the .D unit. .M .M .L .L .D .D
A15 32-bits
Data Memory
Load Instruction
Register File A A0 A1 A2 A3 prod Y . . . a1 x1
Q: Which instruction(s) can be used for loading operands from the memory to the registers? .M .M .L .L .D .D
A15 32-bits
Data Memory
Q: Which instruction(s) can be used for loading operands from the memory to the registers? .M .M .L .L .D .D A: The load instructions.
A15 32-bits
Data Memory
FFFFFFFF 16-bits
Data a1 x1 prod Y
FFFFFFFF 16-bits
Data a1 x1 prod Y
FFFFFFFF 16-bits
Data
1
0xA 0xC 0x2 0x4 0x6 0x8
0
0xB 0xD 0x1 0x3 0x5 0x7
FFFFFFFF 16-bits
FFFFFFFF 16-bits
.?
a, A5
How many bits represent a full address? 32 bits So why does the instruction not allow a 32-bit move? All instructions are 32-bit wide (see instruction opcode).
eg.
MVKH
.?
a, A5
ah
(a is a constant or label)
A5 = 0x1234FABC ; OK
A5 = 0xFFFFFABC ; Wrong
a x prod Y . . .
MVKL MVKH
pt1, A5 pt1, A5 pt2, A6 pt2, A6 .D .D .M .L *A5, A0 *A6, A1 A0, A1, A3 A4, A3, A4
.M .M .L .L .D .D
A15
32-bits
pt1 and pt2 point to some locations
Data Memory
Creating a loop
So far we have only implemented the SOP for one tap only, i.e. Y= a1 * x1 So lets create a loop so that we can implement the SOP for N Taps.
MVKL MVKH MVKL MVKH LDH LDH MPY ADD .D .D .M .L pt1, A5 pt1, A5 pt2, A6 pt2, A6 *A5, A0 *A6, A1 A0, A1, A3 A4, A3, A4
Creating a loop
So far we have only implemented the SOP for one tap only, i.e. Y= a1 * x1 So lets create a loop so that we can implement the SOP for N Taps.
With the C6000 processors there are no dedicated instructions such as block repeat. The loop is created using the B instruction.
loop
.D .D .M .L
loop
.D .D .M .L .?
.S .S .M .M .M .M .L .L .L .L .D .D .D .D
MVKL MVKH
loop
.D .D .M .L .?
A15 32-bits
Data Memory
.S .S .M .M .M .M .L .L .L .L .D .D .D .D
MVKL .S MVKH .S
loop
.D .D .M .L .S
A15 32-bits
Data Memory
.S .S .M .M .M .M .L .L .L .L .D .D .D .D
A15 32-bits
Data Memory
.S .S .M .M .M .M .L .L .L .L .D .D .D .D
A15 32-bits
Data Memory
5. Make the branch conditional based on the value in the loop counter
What is the syntax for making instruction conditional? [condition] e.g. [B1] B loop Instruction Label
(1) The condition can be one of the following registers: A1, A2, B0, B1, B2. (2) Any instruction can be conditional.
5. Make the branch conditional based on the value in the loop counter
The condition can be inverted by adding the exclamation symbol ! as follows: [!condition] e.g. [!B0] [B0] B B loop ;branch if B0 = 0 loop ;branch if B0 != 0 Instruction Label
.S .S .M .M .M .M .L .L .L .L .D .D .D .D
MVKL .S1 pt2, A6 MVKH .S1 pt2, A6 MVKL .S2 count, B0 loop LDH LDH MPY ADD SUB [B0] B .D .D .M .L .S .S *A5, A0 *A6, A1 A0, A1, A3 A4, A3, A4 B0, 1, B0 loop
A15 32-bits
Data Memory
B Case 1:
B .S1
label
B Case 2: B .S2
register
This code performs the following operations: a0*x0 + a0*x0 + a0*x0 + + a0*x0
.D .D .M .L .S .S
loop
.D .D .M .L .S .S
[B0]
Indexing Pointers
Syntax *R Description Pointer Pointer Modified No
Indexing Pointers
Syntax *R *+R[disp] *-R[disp] Description Pointer + Pre-offset - Pre-offset Pointer Modified No No No
In this case the pointers are modified BEFORE being used and RESTORED to their previous values.
[disp] specifies the number of elements size in DW (64-bit), W (32-bit), H (16-bit), or B (8-bit). disp = R or 5-bit constant. R can be any register.
Indexing Pointers
Syntax *R *+R[disp] *-R[disp] *++R[disp] *--R[disp] Description Pointer + Pre-offset - Pre-offset Pre-increment Pre-decrement Pointer Modified No No No Yes Yes
In this case the pointers are modified BEFORE being used and NOT RESTORED to their Previous Values.
Indexing Pointers
Syntax *R *+R[disp] *-R[disp] *++R[disp] *--R[disp] *R++[disp] *R--[disp] Description Pointer + Pre-offset - Pre-offset Pre-increment Pre-decrement Post-increment Post-decrement Pointer Modified No No No Yes Yes Yes Yes
In this case the pointers are modified AFTER being used and NOT RESTORED to their Previous Values.
Indexing Pointers
Syntax *R *+R[disp] *-R[disp] *++R[disp] *--R[disp] *R++[disp] *R--[disp] Description Pointer + Pre-offset - Pre-offset Pre-increment Pre-decrement Post-increment Post-decrement Pointer Modified No No No Yes Yes Yes Yes
[disp] specifies # elements - size in DW, W, H, or B. disp = R or 5-bit constant. R can be any register.
loop
.D .D .M .L .S .S
[B0]
loop
.D .D .M .L .S .S .D
*A5++, A0 *A6++, A1 A0, A1, A3 A4, A3, A4 B0, 1, B0 loop A4, *A7
[B0]
B STH
.D .D .M .L .S .S .D
*A5++, A0 *A6++, A1 A0, A1, A3 A4, A3, A4 B0, 1, B0 loop A4, *A7
[B0]
B STH
loop
. . .
A15
32-bits
Data Memory
(1) Increase the clock frequency. (2) Increase the number of Processing units.
. . .
A15
32-bits
Data Memory
To increase the Processing Power, this processor has two sides (A and B or 1 and 2)
Register File A A0 A1 A2 A3 A4 .S1 .S1 .M1 .M1 .S2 .S2 .M2 .M2 .L2 .L2 .D2 .D2
32-bits
Register File B B0 B1 B2 B3 B4
. . .
A15
32-bits
. . .
B15
Data Memory
40, A2 *A5++, A0 *A6++, A1 A0, A1, A3 A3, A4, A4 A2, 1, A2 loop A4, *A7
; A2 = 40, loop count ; A0 = a(n) ; A1 = x(n) ; A3 = a(n) * x(n) ; Y = Y + A3 ; decrement loop count ; if A2 0, branch ; *A7 = Y
Note: Assume that A4 was previously cleared and the pointers are initialised.
Let us have a look at the final details concerning the functional units. Consider first the case of the .L and .S units.
40-bit Reg
odd 8
even
32
even
32
32-bit Reg
< src >
40-bit Reg
.L or .S
< dst >
32-bit Reg
40-bit Reg
32-bit Reg
< src >
40-bit Reg
.L or .S
< dst >
32-bit Reg
40-bit Reg
32-bit Reg
< src >
40-bit Reg
.L or .S
< dst >
OR.L1
A0, A1, A2
32-bit Reg
40-bit Reg
32-bit Reg
< src >
40-bit Reg
.L or .S
< dst >
OR.L1 ADD.L2
32-bit Reg
40-bit Reg
32-bit Reg
< src >
40-bit Reg
.L or .S
< dst >
A2 B4 A5:A4
32-bit Reg
40-bit Reg
32-bit Reg
< src >
40-bit Reg
.L or .S
< dst >
32-bit Reg
40-bit Reg
32-bit Reg
< src >
40-bit Reg
.L or .S
< dst >
32-bit Reg
40-bit Reg
A0, A1, A2 -5, B3, B4 A2, A3, A5:A4 A2, A5:A4, A5:A4 3, B9:B8, B9:B8
To move the content of a control register to another register (A or B) or vice-versa use the MVC instruction, e.g.:
MVC MVC IFR, A0 A0, IRP
.S .S .L .L .D .D .M .M
ADD ADDK ADD2 AND B CLR EXT MV MVC MVK MVKL MVKH
NEG NOT OR SET SHL SHR SSHL SUB SUB2 XOR ZERO
ABSSP ABSDP CMPGTSP CMPEQSP CMPLTSP CMPGTDP CMPEQDP CMPLTDP RCPSP RCPDP RSQRSP RSQRDP SPDP
C67x
.M Unit
MPY MPYH MPYLH MPYHL NOP SMPY SMPYH
.D Unit
ADD NEG ADDAB (B/H/W) STB (B/H/W) ADDAD SUB LDB (B/H/W) SUBAB (B/H/W) LDDW ZERO MV
No Unit Used
Note: Refer to the 'C6000 CPU Reference Guide for more details.
Extl Memory
P E R I P H E R A L S
CPU
P E R I P H E R A L S
CPU
Module 1 Exam
1. Functional Units a. How many can perform an ADD? Name them. b. Which support memory loads/stores? .M .S .D .L
2. Conditional Code
a. Which registers can be used as condl registers? b. Which instructions can be conditional?
3. Coding Problems
a. Move contents of A0-->A1
3. Coding Problems
a. Move contents of A0-->A1
e. If (B1 0) then B2 = B5 * B6 f. A2 = A0 * A1 + 10