CSCI 510: Computer Architecture Written Assignment 2 Solutions

The following code does computation over two vectors. Consider different execution scenarios
and provide the average number of cycles per iteration for each of them.
DADDIU R4,R1,#800 ; R4 = upper bound for X
foo: L.D F2,0(R1) ; (F2) = X(i)
MUL.D F2,F2,F0 ; (F2) = a*X(i)
L.D F6,0(R2) ; (F6) = Y(i)
DIV.D F6,F6,F8 ; (F6) = Y(i)/b
SUB.D F6,F2,F6 ; (F6) = a*X(i) - Y(i)/b
S.D F6,0(R1) ; X(i) = a*X(i) - Y(i)/b
DADDIU R1,R1,#8 ; increment X index
DADDIU R2,R2,#8 ; increment Y index
DSLTU R3,R1,R4 ; test: continue loop?
BNEZ R3,foo ; loop if needed
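For reference, the arithmetic performed by the loop can be written in a high-level form. This is only a sketch: the function name `update` and the use of Python lists are illustrative assumptions, with `a` standing for the value in F0 and `b` for the value in F8. Since R4 = R1 + 800 and each element is 8 bytes, the assembly loop covers 100 elements.

```python
def update(x, y, a, b):
    """In-place X(i) = a*X(i) - Y(i)/b, mirroring the loop body above."""
    for i in range(len(x)):
        x[i] = a * x[i] - y[i] / b
    return x


# Small usage example with two elements instead of 100.
result = update([2.0, 4.0], [1.0, 2.0], a=3.0, b=2.0)
print(result)
```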
Operation                        Clock Cycles for the Execution Stage
Floating Point Multiplication    5
Floating Point Division          15
Floating Point Subtraction       2
Integer Calculation              1
Memory Access                    1
1) (10 pts) Assume a single-issue pipeline not using Tomasulo’s algorithm. Show how an
iteration of the loop would execute without being scheduled by the compiler.
Answer:
Clock Cycle Instruction
1 L.D F2, 0(R1)
2 MUL.D F2, F2, F0
3 L.D F6, 0(R2)
4 DIV.D F6, F6, F8
Stall for 14 cycles
19 SUB.D F6, F2, F6
Stall for 1 cycle
21 S.D F6, 0(R1)
22 DADDIU R1, R1, #8
23 DADDIU R2, R2, #8
24 DSLTU R3, R1, R4
25 BNEZ R3, foo
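The stall pattern in the table above can be reproduced with a small model. This is a sketch under an assumption inferred from the table rather than stated in the assignment: an instruction starts at least one cycle after the previous one, and no earlier than the start cycle of each producer of its source registers plus that producer's execution latency. The names `LATENCY`, `LOOP` and `issue_cycles` are illustrative.

```python
# Execution-stage latencies from the table in the problem statement.
LATENCY = {"L.D": 1, "S.D": 1, "MUL.D": 5, "DIV.D": 15, "SUB.D": 2,
           "DADDIU": 1, "DSLTU": 1, "BNEZ": 1}

# (opcode, destination register or None, source registers)
LOOP = [
    ("L.D", "F2", ["R1"]),
    ("MUL.D", "F2", ["F2", "F0"]),
    ("L.D", "F6", ["R2"]),
    ("DIV.D", "F6", ["F6", "F8"]),
    ("SUB.D", "F6", ["F2", "F6"]),
    ("S.D", None, ["F6", "R1"]),
    ("DADDIU", "R1", ["R1"]),
    ("DADDIU", "R2", ["R2"]),
    ("DSLTU", "R3", ["R1", "R4"]),
    ("BNEZ", None, ["R3"]),
]


def issue_cycles(program):
    """Return the start cycle of each instruction under the assumed model."""
    ready = {}      # register -> first cycle in which a consumer may start
    previous = 0    # start cycle of the previous instruction
    cycles = []
    for op, dest, srcs in program:
        start = max([previous + 1] + [ready.get(s, 0) for s in srcs])
        cycles.append(start)
        if dest is not None:
            ready[dest] = start + LATENCY[op]
        previous = start
    return cycles


cycles = issue_cycles(LOOP)
print(cycles)
```

Under this model the last instruction of the iteration starts in cycle 25, matching the table.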
2) (20 pts) Assume a single-issue pipeline not using Tomasulo’s algorithm. Unroll the loop as
many times as necessary and schedule it without any stalls, collapsing the loop overhead
instructions. Show the execution of the scheduled unrolled code.
Answer:
Clock Cycle Instruction
1 L.D F6, 0(R2)
2 L.D F7, 8(R2)
3 L.D F9, 16(R2)
4 L.D F10, 24(R2)
5 DIV.D F6, F6, F8
6 DIV.D F7, F7, F8
7 DIV.D F9, F9, F8
8 DIV.D F10, F10, F8
9 L.D F2, 0(R1)
10 L.D F3, 8(R1)
11 L.D F4, 16(R1)
12 L.D F5, 24(R1)
13 MUL.D F2, F2, F0
14 MUL.D F3, F3, F0
15 MUL.D F4, F4, F0
16 MUL.D F5, F5, F0
17 DADDIU R1, R1, #32
18 DADDIU R2, R2, #32
19 DSLTU R3, R1, R4
20 SUB.D F6, F2, F6
21 SUB.D F7, F3, F7
22 SUB.D F9, F4, F9
23 SUB.D F10, F5, F10
24 S.D F6, -32(R1)
25 S.D F7, -24(R1)
26 S.D F9, -16(R1)
27 S.D F10, -8(R1)
28 BNEZ R3, foo
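The schedule above can be checked for stalls with a short script. This is a sketch under an assumption inferred from the answer to question 1, not stated in the assignment: an instruction may start no earlier than the start cycle of each producer of its sources plus that producer's execution latency, and the floating point units are assumed to be pipelined. The names `LATENCY`, `issue_cycles`, `X`, `Y` and `schedule` are illustrative.

```python
LATENCY = {"L.D": 1, "S.D": 1, "MUL.D": 5, "DIV.D": 15, "SUB.D": 2,
           "DADDIU": 1, "DSLTU": 1, "BNEZ": 1}


def issue_cycles(program):
    """Return the start cycle of each instruction under the assumed model."""
    ready, previous, cycles = {}, 0, []
    for op, dest, srcs in program:
        start = max([previous + 1] + [ready.get(s, 0) for s in srcs])
        cycles.append(start)
        if dest is not None:
            ready[dest] = start + LATENCY[op]
        previous = start
    return cycles


X = ["F2", "F3", "F4", "F5"]    # registers holding X(i) .. X(i+3)
Y = ["F6", "F7", "F9", "F10"]   # registers holding Y(i) .. Y(i+3)

# The unrolled schedule, in the same order as the table above.
schedule = []
schedule += [("L.D", y, ["R2"]) for y in Y]
schedule += [("DIV.D", y, [y, "F8"]) for y in Y]
schedule += [("L.D", x, ["R1"]) for x in X]
schedule += [("MUL.D", x, [x, "F0"]) for x in X]
schedule += [("DADDIU", "R1", ["R1"]), ("DADDIU", "R2", ["R2"]),
             ("DSLTU", "R3", ["R1", "R4"])]
schedule += [("SUB.D", y, [x, y]) for x, y in zip(X, Y)]
schedule += [("S.D", None, [y, "R1"]) for y in Y]
schedule += [("BNEZ", None, ["R3"])]

cycles = issue_cycles(schedule)
stall_free = cycles == list(range(1, len(schedule) + 1))
cycles_per_iteration = len(schedule) / len(X)
print(stall_free, cycles_per_iteration)
```

If the check holds, the 28 instructions cover four original iterations, which works out to 28 / 4 = 7 cycles per original iteration before any branch penalty.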
3) (20 pts) Assume a VLIW processor with instructions that contain three operations (1 floating
point, 1 memory access and 1 integer or branch). Unroll the loop as many times as necessary
and schedule it without any stalls, collapsing the loop overhead instructions. What percent of
the operation slots are used in each schedule? How much larger is the size of the code
compared to the original? What is the total register demand?
Answer:
Cycle  Memory Reference   Floating Point Op.    Integer/Branch
1      L.D F6, 0(R2)      -                     -
2      L.D F7, 8(R2)      DIV.D F6, F6, F8      -
3      L.D F9, 16(R2)     DIV.D F7, F7, F8      -
4      L.D F10, 24(R2)    DIV.D F9, F9, F8      -
5      L.D F11, 32(R2)    DIV.D F10, F10, F8    -
6      L.D F12, 40(R2)    DIV.D F11, F11, F8    -
7      L.D F13, 48(R2)    DIV.D F12, F12, F8    -
8      L.D F14, 56(R2)    DIV.D F13, F13, F8    -
9      L.D F2, 0(R1)      DIV.D F14, F14, F8    -
10     L.D F3, 8(R1)      MUL.D F2, F2, F0      -
11     L.D F4, 16(R1)     MUL.D F3, F3, F0      -
12     L.D F5, 24(R1)     MUL.D F4, F4, F0      -
13     L.D F15, 32(R1)    MUL.D F5, F5, F0      -
14     L.D F16, 40(R1)    MUL.D F15, F15, F0    -
15     L.D F17, 48(R1)    MUL.D F16, F16, F0    -
16     L.D F18, 56(R1)    MUL.D F17, F17, F0    -
17     -                  MUL.D F18, F18, F0    -
18     -                  SUB.D F6, F2, F6      -
19     -                  SUB.D F7, F3, F7      -
20     S.D F6, 0(R1)      SUB.D F9, F4, F9      -
21     S.D F7, 8(R1)      SUB.D F10, F5, F10    -
22     S.D F9, 16(R1)     SUB.D F11, F15, F11   -
23     S.D F10, 24(R1)    SUB.D F12, F16, F12   -
24     S.D F11, 32(R1)    SUB.D F13, F17, F13   DADDIU R1, R1, #64
25     S.D F12, -24(R1)   SUB.D F14, F18, F14   DADDIU R2, R2, #64
26     S.D F13, -16(R1)   -                     DSLTU R3, R1, R4
27     S.D F14, -8(R1)    -                     BNEZ R3, foo
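The schedule has 27 instructions × 3 = 81 operation slots. 52 of them are used (24 memory
accesses, 24 floating point operations, and 4 integer or branch operations), so 29 are not
used. (1 – 29 / 81) × 100% ≈ 64.2% of the operation slots are used.
Assume each operation takes 32 bits. The original loop has 10 instructions, and thus uses
10 × 32 = 320 bits (40 bytes). The VLIW code has 81 operation slots (each slot counted as one
instruction), and thus uses 81 × 32 = 2592 bits (324 bytes), which is 81/10 = 8.1 times the
original size, or 81/10 – 1 = 7.1 times larger.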
The VLIW code needs 18 floating point registers (F0 and F2 through F18; 14 more than the 4
used by the original code) and 4 integer registers (R1 through R4).
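The slot and code-size arithmetic can be recomputed directly from the schedule table. The sketch below assumes the counts read off that table: 27 instruction rows, an unroll factor of 8, two loads and one store per original iteration, three floating point operations per original iteration, and 4 loop-overhead operations. All variable names are illustrative; substitute different counts if you tally the table differently.

```python
UNROLL = 8              # original iterations covered by one VLIW loop body
BUNDLES = 27            # rows in the schedule table
SLOTS_PER_BUNDLE = 3    # memory, floating point, integer/branch
BITS_PER_OPERATION = 32
ORIGINAL_INSTRUCTIONS = 10

memory_ops = UNROLL * 3    # 2 loads + 1 store per iteration
fp_ops = UNROLL * 3        # DIV.D, MUL.D, SUB.D per iteration
integer_ops = 4            # 2 DADDIU, DSLTU, BNEZ after collapsing overhead

total_slots = BUNDLES * SLOTS_PER_BUNDLE
used_slots = memory_ops + fp_ops + integer_ops
utilization_percent = 100.0 * used_slots / total_slots

original_bits = ORIGINAL_INSTRUCTIONS * BITS_PER_OPERATION
vliw_bits = total_slots * BITS_PER_OPERATION
size_ratio = vliw_bits / original_bits

print(total_slots, used_slots, round(utilization_percent, 1))
print(original_bits // 8, vliw_bits // 8, size_ratio)
```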
4) (20 pts) Assume a single-issue pipeline using Tomasulo’s algorithm without speculation.
Show the execution of three iterations following the example given below. Assume 1 cycle
stall for branch.
Iteration  Instruction  Issue at  Executes/Memory  Write CDB at  Comment
1 L.D F2, 0(R1) 1 2 3
1 MUL.D F2, F2, F0 2 4 9 Wait for L.D
1 L.D F6, 0(R2) 3 4 5
1 DIV.D F6, F6, F8 4 6 21
1 SUB.D F6, F2, F6 5 22 24 Wait for DIV.D
1 S.D F6, 0(R1) 6 25 Wait for SUB.D
1 DADDIU R1, R1, #8 7 8 10 Wait for CDB
1 DADDIU R2, R2, #8 8 9 11 Wait for CDB
1 DSLTU R3, R1, R4 9 10 12 Wait for CDB
1 BNEZ R3, foo 10 13 Wait for DSLTU
2 L.D F2, 0(R1) 11 15 16 Stall for BNEZ
2 MUL.D F2, F2, F0 12 17 23 Wait for L.D; Wait for CDB
2 L.D F6, 0(R2) 13 15 17 Stall for BNEZ; Wait for CDB
2 DIV.D F6, F6, F8 22 23 38 Wait for RS
2 DADDIU R1, R1, #8 26 27 28
2 DADDIU R2, R2, #8 27 28 29
2 DSLTU R3, R1, R4 28 29 30
3 L.D F2, 0(R1) 30 33 34 Stall for BNEZ
3 MUL.D F2, F2, F0 31 35 40 Wait for L.D
3 L.D F6, 0(R2) 32 33 35 Stall for BNEZ; Wait for CDB
3 DIV.D F6, F6, F8 39 40 55 Wait for RS
3 DADDIU R1, R1, #8 42 43 44
3 DADDIU R2, R2, #8 43 44 45
3 DSLTU R3, R1, R4 44 45 46
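The "Wait for RS" entries in the table can be illustrated with a small helper. This is a sketch under an assumption inferred from the table, not stated in the assignment: a reservation station is released in the cycle its instruction writes the CDB and can be reused by an instruction issuing in the following cycle. The function name `issue_cycle` and its parameters are illustrative. The station count of 2 for floating point multiplication/division comes from the assumptions given for questions 4) to 6).

```python
def issue_cycle(earliest_in_order, rs_release_cycles, num_stations):
    """Earliest issue cycle given in-order issue and reservation station limits.

    earliest_in_order: cycle after the previous instruction issued
    rs_release_cycles: CDB write cycles of instructions holding stations now
    num_stations: number of reservation stations of the needed type
    """
    if len(rs_release_cycles) < num_stations:
        return earliest_in_order          # a station is already free
    first_free = min(rs_release_cycles) + 1
    return max(earliest_in_order, first_free)


# Iteration 2 DIV.D: stations held by iteration 2 MUL.D (writes at 23)
# and iteration 1 DIV.D (writes at 21); previous instruction issued at 13.
print(issue_cycle(14, [23, 21], num_stations=2))

# Iteration 3 DIV.D: stations held by iteration 3 MUL.D (writes at 40)
# and iteration 2 DIV.D (writes at 38); previous instruction issued at 32.
print(issue_cycle(33, [40, 38], num_stations=2))
```

With the values from the table, the helper returns the issue cycles 22 and 39 shown for the two DIV.D instructions.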
5) (20 pts) Assume a single-issue pipeline using Tomasulo’s algorithm with speculation. Show
the execution of three iterations. Assume branches are predicted to be taken.
Itrn  Instruction  Issue at  Executes/Memory  Write CDB at  Commit  Comment
1 L.D F2, 0(R1) 1 2 3 4
1 MUL.D F2, F2, F0 2 4 9 10 Wait for L.D
1 L.D F6, 0(R2) 3 4 5 11
1 DIV.D F6, F6, F8 4 6 21 22
1 SUB.D F6, F2, F6 5 22 24 25 Wait for DIV.D
1 S.D F6, 0(R1) 6 25 26 Wait for SUB.D
1 DADDIU R1, R1, #8 7 8 10 27 Wait for CDB
1 DADDIU R2, R2, #8 8 9 11 28
1 DSLTU R3, R1, R4 9 11 12 29 Wait for DADDIU
1 BNEZ R3, foo 10 13 30 Wait for DSLTU
2 L.D F2, 0(R1) 11 12 13 31
2 L.D F6, 0(R2) 13 14 15 33
2 DIV.D F6, F6, F8 20 21 36 37 Wait for RS
2 DADDIU R1, R1, #8 23 24 25 42
2 DADDIU R2, R2, #8 24 25 26 43
2 DSLTU R3, R1, R4 25 26 27 44
3 L.D F2, 0(R1) 27 28 29 46
3 L.D F6, 0(R2) 29 30 31 48
3 DADDIU R1, R1, #8 39 40 41 58
3 DADDIU R2, R2, #8 40 41 42 59
3 DSLTU R3, R1, R4 41 42 43 60
6) (20 pts) Assume a dual-issue pipeline using Tomasulo’s algorithm with speculation. Show
the execution of three iterations. Assume branches are predicted to be taken.
For 4), 5), and 6), assume 5 reservation stations for integer operations, 3 reservation
stations for loads, 3 reservation stations for stores, 2 reservation stations for floating point
addition/subtraction, and 2 reservation stations for floating point multiplication/division.
Also assume two functional units of each type.
Answer:
Itrn  Instruction  Issue at  Executes/Memory  Write CDB at  Commit  Comment
1 L.D F2, 0(R1) 1 2 3 4
1 MUL.D F2, F2, F0 1 4 9 10
1 L.D F6, 0(R2) 2 3 4 10
1 DIV.D F6, F6, F8 2 5 20 21
1 SUB.D F6, F2, F6 3 21 23 24
1 S.D F6, 0(R1) 3 24 25
1 DADDIU R1, R1, #8 4 5 6 25
1 DADDIU R2, R2, #8 4 5 6 26
1 DSLTU R3, R1, R4 5 7 8 26
1 BNEZ R3, foo 5 9 27
2 L.D F2, 0(R1) 6 7 8 28
2 MUL.D F2, F2, F0 6 9 14 28
2 L.D F6, 0(R2) 7 8 9 29
2 S.D F6, 0(R1) 23 42 43
2 DADDIU R1, R1, #8 23 24 25 43
2 DADDIU R2, R2, #8 24 25 26 44
2 DSLTU R3, R1, R4 24 26 27 44
2 BNEZ R3, foo 25 28 45
3 L.D F2, 0(R1) 26 27 28 46
3 MUL.D F2, F2, F0 26 29 34 46
3 L.D F6, 0(R2) 27 28 29 47
3 S.D F6, 0(R1) 36 55 56
3 DADDIU R1, R1, #8 36 37 38 56
3 DADDIU R2, R2, #8 37 38 39 57
3 DSLTU R3, R1, R4 37 39 40 57
3 BNEZ R3, foo 38 41 58
