
Pipelining tutorial

1. Consider a machine which supports the following two
instruction schedules for R class and I class instructions.
Assume an instruction mix of 60% R class and 40% I
class instructions. Assume that IF steps take 25
nanoseconds, MEM steps of instruction execution require
45 nanoseconds, and the other steps require 20
nanoseconds.

For a multi-cycle implementation,

i. What is the minimum clock cycle time?
ii. How long does it take to execute 100
instructions in nanoseconds?
1. For a multi-cycle implementation, the
clock cycle time is the time for the
longest step => 45 ns

2. For a multi-cycle implementation, with 4 cycles
per R class instruction and 5 cycles per I class
instruction (the values used in this solution):

exec_time = IC x [(CPI x cycle time x fraction) R +
(CPI x cycle time x fraction) I]
= 100 x (4 x 45 x 0.6 + 5 x 45 x 0.4)
= 100 x (108 + 90)
= 19800 ns
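The arithmetic above can be checked with a short Python sketch. The cycle counts (4 for R class, 5 for I class) are taken from the worked answer; the function and variable names are illustrative assumptions, not part of the original problem.

```python
# Step latencies from the problem statement, in nanoseconds.
STEP_NS = {"IF": 25, "MEM": 45, "OTHER": 20}


def multicycle_time_ns(n_instructions, mix, cycles, cycle_ns):
    """Total time = instruction count x average CPI x clock cycle time."""
    avg_cpi = sum(mix[cls] * cycles[cls] for cls in mix)
    return n_instructions * avg_cpi * cycle_ns


# In a multi-cycle design every step takes one clock cycle,
# so the slowest step sets the minimum cycle time.
cycle_ns = max(STEP_NS.values())

total_ns = multicycle_time_ns(
    n_instructions=100,
    mix={"R": 0.6, "I": 0.4},   # instruction mix from the problem
    cycles={"R": 4, "I": 5},    # assumed from the worked answer
    cycle_ns=cycle_ns,
)

print(f"cycle time = {cycle_ns} ns, total = {total_ns:.0f} ns")
```

Changing the mix or the per-class cycle counts shows how strongly the 45 ns MEM step dominates the result, since every cycle is stretched to that length.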
Q. You have a system that contains a special
processor for doing floating-point operations. You
have determined that 60% of your computations
can use the floating-point
processor. When a program uses the floating-point
processor, it runs 40% faster than when it doesn't
use it.
i. What is the overall speedup obtained by using
the floating-point processor?
ii. In order to improve the speedup you are
considering two options:
Option 1: Modifying the compiler so that 70% of
the computations can use the floating-point
processor. Cost of this option is $50K.
Option 2: Modifying the floating-point processor so
that a program runs 100% faster with it than when
it doesn't use it. Assume in this case that 50% of
the computations can use the floating-point
processor. Cost of this option is $60K.
Which option would you recommend? Justify your
answer quantitatively.
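Ans. 1. Overall speedup by using the floating-point
processor, where F = 0.6 and S = 1.4:
overall speedup = 1/[(1 - F) + F/S]
= 1/[(1 - 0.6) + 0.6/1.4] = 1.207

where F is the fraction of computation that can use
the floating-point processor and S is the speedup of
that fraction.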

For option 1, F = 0.7 and S = 1.4:
overall speedup = 1/[(1 - 0.7) + 0.7/1.4] = 1.25
Cost/Performance = $50K/1.25 = $40K
For option 2, F = 0.5 and S = 2:
overall speedup = 1/[(1 - 0.5) + 0.5/2] = 1.33
Cost/Performance = $60K/1.33 = $45.1K
Therefore, Option 1 is better: it costs less per
unit of speedup.
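The comparison above is an application of Amdahl's law, and a minimal Python sketch makes it easy to re-run with other values. The function name and variable names are illustrative assumptions. The sketch keeps full precision, so option 2 comes out at $45.0K rather than the $45.1K obtained by rounding the speedup to 1.33 first; the recommendation does not change.

```python
def amdahl_speedup(fraction, speedup):
    """Overall speedup when `fraction` of the work is sped up by `speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / speedup)


# Current system: 60% of computations use the FP processor, which is 40% faster.
base = amdahl_speedup(0.6, 1.4)

# Option 1: compiler change, 70% coverage, same 1.4x processor, $50K.
opt1 = amdahl_speedup(0.7, 1.4)
# Option 2: processor change, 50% coverage, 2x processor, $60K.
opt2 = amdahl_speedup(0.5, 2.0)

# Cost per unit of overall speedup, in thousands of dollars (lower is better).
cost_perf1 = 50 / opt1
cost_perf2 = 60 / opt2

print(f"base speedup     = {base:.3f}")
print(f"option 1 speedup = {opt1:.3f}, cost/perf = ${cost_perf1:.1f}K")
print(f"option 2 speedup = {opt2:.3f}, cost/perf = ${cost_perf2:.1f}K")
```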
Q. Given a 100 MHz machine with a miss
penalty of 20 cycles, a hit time of 2 cycles, and a
miss rate of 5%, calculate the average memory
access time (AMAT).
Ans. AMAT = hit_time + miss_rate x miss_penalty
Since the clock rate is 100 MHz, the cycle time is
1/(100 MHz) = 10 ns,
which gives AMAT = 10 ns x (2 + 20 x 0.05)
= 30 ns

Note: Here we needed to multiply by the
cycle time because the hit_time and
miss_penalty were given in cycles.
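The AMAT formula, including the conversion from cycles to nanoseconds, can be sketched in Python as follows. The function and parameter names are illustrative assumptions.

```python
def amat_ns(clock_mhz, hit_cycles, miss_rate, miss_penalty_cycles):
    """Average memory access time in ns, with hit time and penalty in cycles."""
    cycle_ns = 1000.0 / clock_mhz          # 1 / (f in MHz) = 1000 / f ns
    amat_cycles = hit_cycles + miss_rate * miss_penalty_cycles
    return cycle_ns * amat_cycles


# Values from the question: 100 MHz, 2-cycle hit, 5% misses, 20-cycle penalty.
amat = amat_ns(clock_mhz=100, hit_cycles=2, miss_rate=0.05,
               miss_penalty_cycles=20)
print(f"AMAT = {amat:.1f} ns")
```

Keeping the clock rate as a parameter makes it clear that the same cache behaviour in cycles gives a different AMAT in nanoseconds on a faster or slower clock.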
Q. Suppose doubling the size of the cache
decreases the miss rate to 3%, but causes the hit
time to increase to 3 cycles and the miss penalty
1. Consider the following sequence of instructions:
Add #20, R0, R1
Mul #3, R2, R3
And #$3A, R2, R4
Add R0, R2, R5
In all instructions, the destination operand is
given last. Initially, registers R0 and R2
contain 2000 and 50, respectively. These
instructions are executed in a computer that
has a four-stage pipeline. Assume that the
first instruction is fetched in clock cycle 1, and
that instruction fetch requires only one clock cycle.
a. Describe the operation being performed
E.g. Consider an un-pipelined processor.
Assume that it has a 1 ns clock cycle and
that it uses 4 cycles for ALU operations,
5 cycles for branches and 4 cycles for
memory operations. Assume that the
relative frequencies of these operations are
50%, 35% and 15% respectively. Suppose
that due to clock skew and setup,
pipelining the processor adds 0.15 ns of
overhead to the clock. Ignoring any latency
impact, how much speedup in the
instruction execution rate will we gain from
a pipeline?
The average instruction execution
time on an un-pipelined processor is
= clock cycle x Avg. CPI
= 1 ns x ((0.5 x 4) + (0.35 x 5) +
(0.15 x 4))
= 4.35 ns
The avg. instruction execution time
on the pipelined processor is = 1 ns + 0.15
ns = 1.15 ns
So speedup = 4.35/1.15 = 3.78
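The same calculation can be written as a short Python sketch, using the 0.15 ns overhead given in the problem statement. It assumes, as the example does, that the ideal pipelined processor completes one instruction per clock; the names are illustrative assumptions.

```python
def average_cpi(mix):
    """Weighted CPI from (relative frequency, cycles) pairs."""
    return sum(freq * cycles for freq, cycles in mix)


CLOCK_NS = 1.0       # un-pipelined clock cycle
OVERHEAD_NS = 0.15   # clock skew and setup overhead added by pipelining

# (frequency, cycles) for ALU operations, branches, memory operations.
mix = [(0.50, 4), (0.35, 5), (0.15, 4)]

unpipelined_ns = CLOCK_NS * average_cpi(mix)
# Ideal pipeline: one instruction finishes every (clock + overhead).
pipelined_ns = CLOCK_NS + OVERHEAD_NS
speedup = unpipelined_ns / pipelined_ns

print(f"un-pipelined = {unpipelined_ns:.2f} ns")
print(f"pipelined    = {pipelined_ns:.2f} ns")
print(f"speedup      = {speedup:.2f}")
```

Treating the overhead as a named constant makes it easy to see how sensitive the speedup is to it: a larger overhead lengthens every pipelined cycle and lowers the ratio.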
Q. Assume a pipeline with four stages: Fetch
Instruction (FI), Decode instruction and calculate
address (DA), fetch operand (FO), and execute (EX).
Draw a diagram for a sequence of 7 instructions in
which the third instruction is a branch that is taken
and in which there are no data dependencies.
1. Design a 16-bit memory of total capacity
8192 bits using SRAM chips of size 64 x 1 bit.
Give the array configuration of the chips on the
memory board, showing all required input and
output signals for assigning this memory to the
lowest address space. The design should allow
for both byte and 16-bit word accesses.
