
CSCI 510: Computer Architecture

Written Assignment 2 Solutions

The following loop computes X(i) = a*X(i) - Y(i)/b over two vectors X and Y. Consider the different
execution scenarios below and provide the average number of cycles per iteration for each of them.
DADDIU R4,R1,#800 ; R4 = upper bound for X
foo: L.D F2,0(R1) ; (F2) = X(i)
MUL.D F2,F2,F0 ; (F2) = a*X(i)
L.D F6,0(R2) ; (F6) = Y(i)
DIV.D F6,F6,F8 ; (F6) = Y(i)/b
SUB.D F6,F2,F6 ; (F6) = a*X(i) - Y(i)/b
S.D F6,0(R1) ; X(i) = a*X(i) - Y(i)/b
DADDIU R1,R1,#8 ; increment X index
DADDIU R2,R2,#8 ; increment Y index
DSLTU R3,R1,R4 ; test: continue loop?
BNEZ R3,foo ; loop if needed
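As a reading aid, here is a Python sketch of what the loop computes. It assumes X and Y are arrays of 8-byte doubles (consistent with L.D/S.D and the #8 stride), so the #800 bound gives 800 / 8 = 100 iterations; the function name `kernel` is hypothetical.

```python
def kernel(x, y, a, b):
    """Update x in place with x[i] = a*x[i] - y[i]/b, like the MIPS loop.

    R1 and R2 advance 8 bytes per iteration and the loop exits once
    R1 reaches R4 = R1 + 800, so the body runs 800 // 8 = 100 times.
    """
    iterations = 800 // 8
    for i in range(iterations):
        x[i] = a * x[i] - y[i] / b   # MUL.D, DIV.D, SUB.D, then S.D
    return x


x = kernel([1.0] * 100, [2.0] * 100, a=3.0, b=2.0)
print(x[0], x[-1])   # 2.0 2.0
```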

Operation Number of Clock Cycles in the Execution Stage
Floating Point Multiplication 5
Floating Point Division 15
Floating Point Subtraction 2
Integer Calculation 1
Memory Access 1

1) (10 pts) Assume a single-issue pipeline not using Tomasulo’s algorithm. Show how an
iteration of the loop would execute without any scheduling by the compiler.
Answer:
Clock Cycle Instruction
1 L.D F2, 0(R1)
2 MUL.D F2, F2, F0
3 L.D F6, 0(R2)
4 DIV.D F6, F6, F8
Stall for 14 cycles
19 SUB.D F6, F2, F6
Stall for 1 cycle
21 S.D F6, 0(R1)
22 DADDIU R1, R1, #8
23 DADDIU R2, R2, #8
24 DSLTU R3, R1, R4
25 BNEZ R3, foo
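The stall pattern above follows a single rule: an instruction issues one cycle after its predecessor unless a source register is not yet available, and a result becomes available in the cycle after the producer's last execution cycle. The Python sketch below encodes that rule, assuming in-order single issue and no structural stalls (`issue_cycles` is a hypothetical helper), and reproduces the issue cycles of the table, i.e. 25 cycles per iteration.

```python
# Execution-stage latencies from the table above (cycles).
LATENCY = {"L.D": 1, "S.D": 1, "MUL.D": 5, "DIV.D": 15, "SUB.D": 2,
           "DADDIU": 1, "DSLTU": 1, "BNEZ": 1}

# One loop iteration as (opcode, destination, sources); None = no destination.
LOOP = [
    ("L.D",    "F2", ["R1"]),
    ("MUL.D",  "F2", ["F2", "F0"]),
    ("L.D",    "F6", ["R2"]),
    ("DIV.D",  "F6", ["F6", "F8"]),
    ("SUB.D",  "F6", ["F2", "F6"]),
    ("S.D",    None, ["F6", "R1"]),
    ("DADDIU", "R1", ["R1"]),
    ("DADDIU", "R2", ["R2"]),
    ("DSLTU",  "R3", ["R1", "R4"]),
    ("BNEZ",   None, ["R3"]),
]


def issue_cycles(program):
    """Issue cycle of each instruction under in-order, single issue, where a
    result is usable in the cycle right after the producer's last execute cycle."""
    ready = {}      # register -> first cycle its newest value can be read
    cycles = []
    cycle = 0
    for op, dst, srcs in program:
        # Earliest slot is the cycle after the previous issue; stall until
        # every source operand is available.
        cycle = max([cycle + 1] + [ready.get(r, 0) for r in srcs])
        cycles.append(cycle)
        if dst is not None:
            ready[dst] = cycle + LATENCY[op]
    return cycles


cycles = issue_cycles(LOOP)
print(cycles)                                # [1, 2, 3, 4, 19, 21, 22, 23, 24, 25]
print("cycles per iteration:", cycles[-1])   # cycles per iteration: 25
```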

2) (20 pts) Assume a single-issue pipeline not using Tomasulo’s algorithm. Unroll the loop as
many times as necessary and schedule it without any stalls, collapsing the loop overhead
instructions. Show the execution of the scheduled unrolled code.
Answer:
Clock Cycle Instruction
1 L.D F6, 0(R2)
2 L.D F7, 8(R2)
3 L.D F9, 16(R2)
4 L.D F10, 24(R2)
5 DIV.D F6, F6, F8
6 DIV.D F7, F7, F8
7 DIV.D F9, F9, F8
8 DIV.D F10, F10, F8
9 L.D F2, 0(R1)
10 L.D F3, 8(R1)
11 L.D F4, 16(R1)
12 L.D F5, 24(R1)
13 MUL.D F2, F2, F0
14 MUL.D F3, F3, F0
15 MUL.D F4, F4, F0
16 MUL.D F5, F5, F0
17 DADDIU R1, R1, #32
18 DADDIU R2, R2, #32
19 DSLTU R3, R1, R4
20 SUB.D F6, F2, F6
21 SUB.D F7, F3, F7
22 SUB.D F9, F4, F9
23 SUB.D F10, F5, F10
24 S.D F6, -32(R1)
25 S.D F7, -24(R1)
26 S.D F9, -16(R1)
27 S.D F10, -8(R1)
28 BNEZ R3, foo
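The unrolled schedule covers four original iterations in 28 cycles, or 7 cycles per iteration. The sketch below checks that it is stall-free by replaying it with the same availability rule as the unscheduled version. It assumes the FP units are pipelined, so a new DIV.D or MUL.D can start every cycle, which this schedule relies on; `hazards` is a hypothetical helper name.

```python
# Execution-stage latencies from the table at the top (cycles).
LATENCY = {"L.D": 1, "S.D": 1, "MUL.D": 5, "DIV.D": 15, "SUB.D": 2,
           "DADDIU": 1, "DSLTU": 1, "BNEZ": 1}

# Unrolled body from the answer: (cycle, opcode, destination, sources).
UNROLLED = [
    (1, "L.D", "F6", ["R2"]), (2, "L.D", "F7", ["R2"]),
    (3, "L.D", "F9", ["R2"]), (4, "L.D", "F10", ["R2"]),
    (5, "DIV.D", "F6", ["F6", "F8"]), (6, "DIV.D", "F7", ["F7", "F8"]),
    (7, "DIV.D", "F9", ["F9", "F8"]), (8, "DIV.D", "F10", ["F10", "F8"]),
    (9, "L.D", "F2", ["R1"]), (10, "L.D", "F3", ["R1"]),
    (11, "L.D", "F4", ["R1"]), (12, "L.D", "F5", ["R1"]),
    (13, "MUL.D", "F2", ["F2", "F0"]), (14, "MUL.D", "F3", ["F3", "F0"]),
    (15, "MUL.D", "F4", ["F4", "F0"]), (16, "MUL.D", "F5", ["F5", "F0"]),
    (17, "DADDIU", "R1", ["R1"]), (18, "DADDIU", "R2", ["R2"]),
    (19, "DSLTU", "R3", ["R1", "R4"]),
    (20, "SUB.D", "F6", ["F2", "F6"]), (21, "SUB.D", "F7", ["F3", "F7"]),
    (22, "SUB.D", "F9", ["F4", "F9"]), (23, "SUB.D", "F10", ["F5", "F10"]),
    (24, "S.D", None, ["F6", "R1"]), (25, "S.D", None, ["F7", "R1"]),
    (26, "S.D", None, ["F9", "R1"]), (27, "S.D", None, ["F10", "R1"]),
    (28, "BNEZ", None, ["R3"]),
]


def hazards(schedule):
    """Return (cycle, register) pairs where a source is read before the
    producer's result is usable (issue cycle + latency)."""
    ready = {}
    found = []
    for cycle, op, dst, srcs in schedule:
        found += [(cycle, r) for r in srcs if cycle < ready.get(r, 0)]
        if dst is not None:
            ready[dst] = cycle + LATENCY[op]
    return found


cycles = [c for c, *_ in UNROLLED]   # one instruction in every cycle 1..28
print(hazards(UNROLLED))             # []
print(len(UNROLLED) / 4)             # 7.0
```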

3) (20 pts) Assume a VLIW processor with instructions that contain three operations (1 floating
point, 1 memory access and 1 integer or branch). Unroll the loop as many times as necessary
and schedule it without any stalls, collapsing the loop overhead instructions. What percent of
the operation slots are used in each schedule? How much larger is the size of the code
compared to the original? What is the total register demand?
Answer:
Clock Cycle Memory Reference Floating Point Op. Integer/Branch
1 L.D F6, 0(R2)
2 L.D F7, 8(R2) DIV.D F6, F6, F8
3 L.D F9, 16(R2) DIV.D F7, F7, F8
4 L.D F10, 24(R2) DIV.D F9, F9, F8
5 L.D F11, 32(R2) DIV.D F10, F10, F8
6 L.D F12, 40(R2) DIV.D F11, F11, F8
7 L.D F13, 48(R2) DIV.D F12, F12, F8
8 L.D F14, 56(R2) DIV.D F13, F13, F8
9 L.D F2, 0(R1) DIV.D F14, F14, F8
10 L.D F3, 8(R1) MUL.D F2, F2, F0
11 L.D F4, 16(R1) MUL.D F3, F3, F0
12 L.D F5, 24(R1) MUL.D F4, F4, F0
13 L.D F15, 32(R1) MUL.D F5, F5, F0
14 L.D F16, 40(R1) MUL.D F15, F15, F0
15 L.D F17, 48(R1) MUL.D F16, F16, F0
16 L.D F18, 56(R1) MUL.D F17, F17, F0
17 MUL.D F18, F18, F0
18 SUB.D F6, F2, F6
19 SUB.D F7, F3, F7
20 S.D F6, 0(R1) SUB.D F9, F4, F9
21 S.D F7, 8(R1) SUB.D F10, F5, F10
22 S.D F9, 16(R1) SUB.D F11, F15, F11
23 S.D F10, 24(R1) SUB.D F12, F16, F12
24 S.D F11, 32(R1) SUB.D F13, F17, F13 DADDIU R1, R1, #64
25 S.D F12, -24(R1) SUB.D F14, F18, F14 DADDIU R2, R2, #64
26 S.D F13, -16(R1) DSLTU R3, R1, R4
27 S.D F14, -8(R1) BNEZ R3, foo

There are 27 × 3 = 81 slots. 52 are used (24 memory, 24 floating point, 4 integer/branch) and 29
are empty, so 52 / 81 × 100% ≈ 64.2% of the operation slots are used.
Assume each operation takes 32 bits. The original loop has 10 instructions, and thus uses
10 × 32 = 320 bits (40 bytes). The VLIW code has 81 operations (each slot, including empty ones,
counted as one operation), and thus uses 81 × 32 = 2592 bits (324 bytes). It is 81/10 = 8.1 times
the size of the original, i.e. 7.1 times larger.
The VLIW code needs 18 floating point registers (F0 and F2 through F18, 14 more than the F0, F2,
F6, and F8 used by the original code) and 4 integer registers (R1 through R4).
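The slot, code-size, and register figures can be recounted directly from the VLIW table. This Python sketch transcribes the 27 instruction words, treating empty slots as no-ops and assuming, as the answer does, 32 bits per operation slot.

```python
import re

# The VLIW schedule: (memory, floating point, integer/branch); "" = empty slot.
VLIW = [
    ("L.D F6, 0(R2)", "", ""),
    ("L.D F7, 8(R2)", "DIV.D F6, F6, F8", ""),
    ("L.D F9, 16(R2)", "DIV.D F7, F7, F8", ""),
    ("L.D F10, 24(R2)", "DIV.D F9, F9, F8", ""),
    ("L.D F11, 32(R2)", "DIV.D F10, F10, F8", ""),
    ("L.D F12, 40(R2)", "DIV.D F11, F11, F8", ""),
    ("L.D F13, 48(R2)", "DIV.D F12, F12, F8", ""),
    ("L.D F14, 56(R2)", "DIV.D F13, F13, F8", ""),
    ("L.D F2, 0(R1)", "DIV.D F14, F14, F8", ""),
    ("L.D F3, 8(R1)", "MUL.D F2, F2, F0", ""),
    ("L.D F4, 16(R1)", "MUL.D F3, F3, F0", ""),
    ("L.D F5, 24(R1)", "MUL.D F4, F4, F0", ""),
    ("L.D F15, 32(R1)", "MUL.D F5, F5, F0", ""),
    ("L.D F16, 40(R1)", "MUL.D F15, F15, F0", ""),
    ("L.D F17, 48(R1)", "MUL.D F16, F16, F0", ""),
    ("L.D F18, 56(R1)", "MUL.D F17, F17, F0", ""),
    ("", "MUL.D F18, F18, F0", ""),
    ("", "SUB.D F6, F2, F6", ""),
    ("", "SUB.D F7, F3, F7", ""),
    ("S.D F6, 0(R1)", "SUB.D F9, F4, F9", ""),
    ("S.D F7, 8(R1)", "SUB.D F10, F5, F10", ""),
    ("S.D F9, 16(R1)", "SUB.D F11, F15, F11", ""),
    ("S.D F10, 24(R1)", "SUB.D F12, F16, F12", ""),
    ("S.D F11, 32(R1)", "SUB.D F13, F17, F13", "DADDIU R1, R1, #64"),
    ("S.D F12, -24(R1)", "SUB.D F14, F18, F14", "DADDIU R2, R2, #64"),
    ("S.D F13, -16(R1)", "", "DSLTU R3, R1, R4"),
    ("S.D F14, -8(R1)", "", "BNEZ R3, foo"),
]

total = 3 * len(VLIW)
used = sum(1 for word in VLIW for op in word if op)
print(total, used, total - used)         # 81 52 29
print(f"{100 * used / total:.1f}%")      # 64.2%

BITS_PER_SLOT = 32    # assumption taken from the answer
ORIGINAL_OPS = 10     # loop body L.D ... BNEZ
print(total * BITS_PER_SLOT, ORIGINAL_OPS * BITS_PER_SLOT)   # 2592 320
print(total / ORIGINAL_OPS)                                  # 8.1

# Collect every register name (F<n> or R<n>) that appears in any slot.
regs = {r for word in VLIW for op in word for r in re.findall(r"\b[FR]\d+\b", op)}
fp_regs = sorted((r for r in regs if r[0] == "F"), key=lambda r: int(r[1:]))
int_regs = sorted(r for r in regs if r[0] == "R")
print(len(fp_regs), len(int_regs))       # 18 4
```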

4) (20 pts) Assume a single-issue pipeline using Tomasulo’s algorithm without speculation.
Show the execution of three iterations following the example given below. Assume a 1-cycle
stall for branches.
Iteration Instruction Issue at Executes/Memory Write CDB at Comment
1 L.D F2, 0(R1) 1 2 3
1 MUL.D F2, F2, F0 2 4 9 Wait for L.D
1 L.D F6, 0(R2) 3 4 5
1 DIV.D F6, F6, F8 4 6 21
1 SUB.D F6, F2, F6 5 22 24 Wait for DIV.D
1 S.D F6, 0(R1) 6 25 Wait for SUB.D
1 DADDIU R1, R1, #8 7 8 10 Wait for CDB
1 DADDIU R2, R2, #8 8 9 11 Wait for CDB
1 DSLTU R3, R1, R4 9 11 12 Wait for DADDIU
1 BNEZ R3, foo 10 13 Wait for DSLTU
2 L.D F2, 0(R1) 11 15 16 Stall for BNEZ
2 MUL.D F2, F2, F0 12 17 23 Wait for L.D; wait for CDB
2 L.D F6, 0(R2) 13 15 17 Stall for BNEZ; wait for CDB
2 DIV.D F6, F6, F8 22 23 38 Wait for RS
2 SUB.D F6, F2, F6 24 39 41 Wait for DIV.D
2 S.D F6, 0(R1) 25 42 Wait for SUB.D
2 DADDIU R1, R1, #8 26 27 28
2 DADDIU R2, R2, #8 27 28 29
2 DSLTU R3, R1, R4 28 29 30
2 BNEZ R3, foo 29 31 Wait for DSLTU
3 L.D F2, 0(R1) 30 33 34 Stall for BNEZ
3 MUL.D F2, F2, F0 31 35 40 Wait for L.D
3 L.D F6, 0(R2) 32 33 35 Stall for BNEZ; wait for CDB
3 DIV.D F6, F6, F8 39 40 55 Wait for RS
3 SUB.D F6, F2, F6 40 56 58 Wait for DIV.D
3 S.D F6, 0(R1) 41 59 Wait for SUB.D
3 DADDIU R1, R1, #8 42 43 44
3 DADDIU R2, R2, #8 43 44 45
3 DSLTU R3, R1, R4 44 45 46
3 BNEZ R3, foo 45 47 Wait for DSLTU
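Two rules the table relies on can be checked mechanically: with a single CDB, no two results are broadcast in the same cycle, and an instruction starts executing only after the value it waits for has appeared on the CDB. The Python sketch below encodes the three iterations (taking iteration 1's DSLTU as executing in cycle 11, the cycle after DADDIU R1 broadcasts R1 in cycle 10) and reports violations; reservation-station occupancy and the branch stall are not modeled, and `violations` is a hypothetical helper name.

```python
from collections import Counter

LATENCY = {"L.D": 1, "S.D": 1, "MUL.D": 5, "DIV.D": 15, "SUB.D": 2,
           "DADDIU": 1, "DSLTU": 1, "BNEZ": 1}

# One iteration: (opcode, destination, sources); None = no CDB result.
BODY = [
    ("L.D", "F2", ["R1"]), ("MUL.D", "F2", ["F2", "F0"]),
    ("L.D", "F6", ["R2"]), ("DIV.D", "F6", ["F6", "F8"]),
    ("SUB.D", "F6", ["F2", "F6"]), ("S.D", None, ["F6", "R1"]),
    ("DADDIU", "R1", ["R1"]), ("DADDIU", "R2", ["R2"]),
    ("DSLTU", "R3", ["R1", "R4"]), ("BNEZ", None, ["R3"]),
]

# (issue, execute/memory, write CDB) for the 30 instructions; None = no write.
TIMING = [
    (1, 2, 3), (2, 4, 9), (3, 4, 5), (4, 6, 21), (5, 22, 24),
    (6, 25, None), (7, 8, 10), (8, 9, 11), (9, 11, 12), (10, 13, None),
    (11, 15, 16), (12, 17, 23), (13, 15, 17), (22, 23, 38), (24, 39, 41),
    (25, 42, None), (26, 27, 28), (27, 28, 29), (28, 29, 30), (29, 31, None),
    (30, 33, 34), (31, 35, 40), (32, 33, 35), (39, 40, 55), (40, 56, 58),
    (41, 59, None), (42, 43, 44), (43, 44, 45), (44, 45, 46), (45, 47, None),
]


def violations(body, timing, cdb_count=1):
    program = body * (len(timing) // len(body))
    broadcast = {}   # register -> CDB cycle of its latest producer in program order
    found = []
    for i, ((op, dst, srcs), (issue, ex, wr)) in enumerate(zip(program, timing)):
        if i and issue <= timing[i - 1][0]:
            found.append(f"#{i} {op}: not issued in order, one per cycle")
        if ex <= issue:
            found.append(f"#{i} {op}: executes in or before its issue cycle")
        for r in srcs:
            if r in broadcast and ex <= broadcast[r]:
                found.append(f"#{i} {op}: executes before {r} is on the CDB")
        if wr is not None and wr < ex + LATENCY[op]:
            found.append(f"#{i} {op}: writes before execution finishes")
        if dst is not None:
            broadcast[dst] = wr
    per_cycle = Counter(wr for _, _, wr in timing if wr is not None)
    found += [f"cycle {c}: {n} CDB writes" for c, n in per_cycle.items() if n > cdb_count]
    return found


print(violations(BODY, TIMING))   # []
```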
5) (20 pts) Assume a single-issue pipeline using Tomasulo’s algorithm with speculation. Show
the execution of three iterations. Assume branches are predicted to be taken.

Itrn Instruction Issue at Executes/Memory Write CDB at Commit at Comment
1 L.D F2, 0(R1) 1 2 3 4
1 MUL.D F2, F2, F0 2 4 9 10 Wait for L.D
1 L.D F6, 0(R2) 3 4 5 11
1 DIV.D F6, F6, F8 4 6 21 22
1 SUB.D F6, F2, F6 5 22 24 25 Wait for DIV.D
1 S.D F6, 0(R1) 6 25 26 Wait for SUB.D
1 DADDIU R1, R1, #8 7 8 10 27 Wait for CDB
1 DADDIU R2, R2, #8 8 9 11 28
1 DSLTU R3, R1, R4 9 11 12 29 Wait for DADDIU
1 BNEZ R3, foo 10 13 30 Wait for DSLTU
2 L.D F2, 0(R1) 11 12 13 31
2 MUL.D F2, F2, F0 12 14 19 32 Wait for L.D
2 L.D F6, 0(R2) 13 14 15 33
2 DIV.D F6, F6, F8 20 21 36 37 Wait for RS
2 SUB.D F6, F2, F6 21 37 39 40 Wait for DIV.D
2 S.D F6, 0(R1) 22 40 41 Wait for SUB.D
2 DADDIU R1, R1, #8 23 24 25 42
2 DADDIU R2, R2, #8 24 25 26 43
2 DSLTU R3, R1, R4 25 26 27 44
2 BNEZ R3, foo 26 28 45 Wait for DSLTU
3 L.D F2, 0(R1) 27 28 29 46
3 MUL.D F2, F2, F0 28 30 35 47 Wait for L.D
3 L.D F6, 0(R2) 29 30 31 48
3 DIV.D F6, F6, F8 36 37 52 53 Wait for RS
3 SUB.D F6, F2, F6 37 53 55 56 Wait for DIV.D
3 S.D F6, 0(R1) 38 56 57 Wait for SUB.D
3 DADDIU R1, R1, #8 39 40 41 58
3 DADDIU R2, R2, #8 40 41 42 59
3 DSLTU R3, R1, R4 41 42 43 60
3 BNEZ R3, foo 42 44 61 Wait for DSLTU
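What speculation adds is in-order commit through the reorder buffer. The sketch below, assuming at most one commit per cycle for this single-issue machine and using the hypothetical helper `commit_problems`, checks that every instruction in the table commits after its result is ready (its CDB write, or its execute cycle for S.D and BNEZ) and that commits follow program order.

```python
from collections import Counter

# (cycle the result is ready, commit cycle) for each row, in program order.
# Ready = CDB write; for S.D and BNEZ, which broadcast nothing, it is the
# execute cycle.
ROWS = [
    (3, 4), (9, 10), (5, 11), (21, 22), (24, 25),
    (25, 26), (10, 27), (11, 28), (12, 29), (13, 30),
    (13, 31), (19, 32), (15, 33), (36, 37), (39, 40),
    (40, 41), (25, 42), (26, 43), (27, 44), (28, 45),
    (29, 46), (35, 47), (31, 48), (52, 53), (55, 56),
    (56, 57), (41, 58), (42, 59), (43, 60), (44, 61),
]


def commit_problems(rows, commits_per_cycle=1):
    found = []
    for i, (ready, commit) in enumerate(rows):
        if commit <= ready:
            found.append(f"row {i}: commits before its result is ready")
        if i and commit < rows[i - 1][1]:
            found.append(f"row {i}: commits out of program order")
    per_cycle = Counter(commit for _, commit in rows)
    found += [f"cycle {c}: {n} commits" for c, n in per_cycle.items()
              if n > commits_per_cycle]
    return found


print(commit_problems(ROWS))   # []
print(ROWS[-1][1])             # 61
```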

6) (20 pts) Assume a dual-issue pipeline using Tomasulo’s algorithm with speculation. Show
the execution of three iterations. Assume branches are predicted to be taken.
For 4), 5), and 6), assume 5 reservation stations for integer operations, 3 reservation stations for load,
3 reservation stations for store, 2 reservation stations for floating point addition/subtraction, and 2
reservation stations for floating point multiplication/division. Also assume two functional units of
each type.
Answer:

Itrn Instruction Issue at Executes/Memory Write CDB at Commit at Comment
1 L.D F2, 0(R1) 1 2 3 4
1 MUL.D F2, F2, F0 1 4 9 10
1 L.D F6, 0(R2) 2 3 4 10
1 DIV.D F6, F6, F8 2 5 20 21
1 SUB.D F6, F2, F6 3 21 23 24
1 S.D F6, 0(R1) 3 24 25
1 DADDIU R1, R1, #8 4 5 6 25
1 DADDIU R2, R2, #8 4 5 6 26
1 DSLTU R3, R1, R4 5 7 8 26
1 BNEZ R3, foo 5 9 27
2 L.D F2, 0(R1) 6 7 8 28
2 MUL.D F2, F2, F0 6 9 14 28
2 L.D F6, 0(R2) 7 8 9 29
2 DIV.D F6, F6, F8 22 23 38 39 Wait for RS
2 SUB.D F6, F2, F6 22 39 41 42 Wait for DIV.D
2 S.D F6, 0(R1) 23 42 43
2 DADDIU R1, R1, #8 23 24 25 43
2 DADDIU R2, R2, #8 24 25 26 44
2 DSLTU R3, R1, R4 24 26 27 44
2 BNEZ R3, foo 25 28 45
3 L.D F2, 0(R1) 26 27 28 46
3 MUL.D F2, F2, F0 26 29 34 46
3 L.D F6, 0(R2) 27 28 29 47
3 DIV.D F6, F6, F8 35 36 51 52 Wait for RS
3 SUB.D F6, F2, F6 35 52 54 55 Wait for DIV.D
3 S.D F6, 0(R1) 36 55 56
3 DADDIU R1, R1, #8 36 37 38 56
3 DADDIU R2, R2, #8 37 38 39 57
3 DSLTU R3, R1, R4 37 39 40 57
3 BNEZ R3, foo 38 41 58
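For the dual-issue machine, the per-cycle limits are what change. The sketch below assumes a width of two for issue, commit, and CDB writes (the problem does not state the number of CDBs, but the table broadcasts two results in cycle 6) and checks the table against those limits and against in-order issue and commit. Reservation-station occupancy is not modeled, and `width_problems` is a hypothetical helper name.

```python
from collections import Counter

# (issue, write CDB, commit) per row in program order; None = no CDB write.
ROWS = [
    (1, 3, 4), (1, 9, 10), (2, 4, 10), (2, 20, 21), (3, 23, 24),
    (3, None, 25), (4, 6, 25), (4, 6, 26), (5, 8, 26), (5, None, 27),
    (6, 8, 28), (6, 14, 28), (7, 9, 29), (22, 38, 39), (22, 41, 42),
    (23, None, 43), (23, 25, 43), (24, 26, 44), (24, 27, 44), (25, None, 45),
    (26, 28, 46), (26, 34, 46), (27, 29, 47), (35, 51, 52), (35, 54, 55),
    (36, None, 56), (36, 38, 56), (37, 39, 57), (37, 40, 57), (38, None, 58),
]


def width_problems(rows, width=2):
    found = []
    # No cycle may exceed the machine width for issues, CDB writes, or commits.
    for name, col in (("issues", 0), ("CDB writes", 1), ("commits", 2)):
        per_cycle = Counter(r[col] for r in rows if r[col] is not None)
        found += [f"cycle {c}: {n} {name}" for c, n in sorted(per_cycle.items())
                  if n > width]
    # Issue and commit must both follow program order (ties allowed).
    for name, col in (("issue", 0), ("commit", 2)):
        column = [r[col] for r in rows]
        if column != sorted(column):
            found.append(f"{name} is out of program order")
    return found


print(width_problems(ROWS))                   # []
print(len(width_problems(ROWS, width=1)) > 0)  # True
```

With `width=1` the check fails, which is a quick way to see that this schedule depends on issuing and committing two instructions per cycle.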
