Академический Документы
Профессиональный Документы
Культура Документы
Laundry Example
Ann, Brian, Cathy, Dave
each have one load of
clothes to wash, dry, and fold
A B C D
Washer takes 30 minutes
30 40 20 30 40 20 30 40 20 30 40 20
T
a A
s
k
B
O
r
d C
e
r
D
Sequential laundry takes 6 hours for 4 loads
If they learned pipelining, how long would laundry take?
204521 Digital System August 4, 2011 5
Pipelined Laundry Start work ASAP
6 PM 7 8 9 10 11 Midnight
Time
30 40 40 40 40 20
T
a A
s
k
B
O
r
d C
e
r
D
Pipelined laundry takes 3.5 hours for 4 loads
Speedup = 6/3.5 = 1.7
I
n
ALU
Inst 0 Im Reg Dm Reg
s
t
ALU
r. Inst 1 Im Reg Dm Reg
ALU
O Im Reg Dm Reg
r Inst 2
d
ALU
e Inst 3 Im Reg Dm Reg
r
ALU
Im Reg Dm Reg
Inst 4
P ro g ra m
e x e c u t io n
o rd e r 2 4 6 8 10 12 14 16 18
Time
(in in structions)
Instruction Reg ALU Data Reg
lw $ 1 , 10 0 ($0 ) fetch access
Instruction Data
lw $2, 200($0) 2 ns
fetch
Reg ALU
access
Reg
Instruction Data
lw $3, 300($0) 2 ns Reg ALU Reg
fetch access
2 ns 2 ns 2 ns 2 ns 2 ns
M
Inst. IR11..15
u
Memory MEM/ x
Register ALU
WB.IR File M
Data u
Data
Datamust
mustbebe x
Mem.
stored
stored fromone
from one M
stage to the next
stage to the next u
ininpipeline
pipeline x
registers/latches.
registers/latches.
hold
holdtemporary
temporary Sign
values
valuesbetween
between Extend
clocks
clocks andneeded
and needed 16 32
info. for
info. for
execution.
execution.
PC+4 [31–28] M M
u u
x x
ALU
Add result 1 0
Add Shift
RegDst
Jump left 2
4 Branch
MemRead
Instruction [31– 26]
Control MemtoReg
ALUOp
MemWrite
ALUSrc
RegWrite
Instruction [5– 0]
Practically:
– Stages are imperfectly balanced
– Pipelining needs overhead
– Speedup less than number of stages
Instruction[25–21] Read
PC Read register 1
address Read
Instruction[20–16] Read data 1
register 2 Zero
Instruction 0 Registers Read ALU ALU
[31–0] Write 0 Read
M data 2 result Address data 1
Instruction u register M M
memory x u u
Instruction[15–11] Write x
1 data Data x
1 memory 0
Write
data
Instruction[15–0] 16 32
Sign
extend ALU
control
Instruction [5–0]
Instruction[25–21] Read
PC Read register 1
address Read
Instruction[20–16] Read data 1
register 2 Zero
Instruction 0 Registers Read ALU ALU
[31–0] Write 0 Read
M data 2 result Address data 1
Instruction u register M M
memory x u u
Instruction[15–11] Write x
1 data Data x
1 memory 0
Write
data
Instruction[15–0] 16 32
Sign
extend ALU
control
Instruction [5–0]
Instruction[25–21] Read
PC Read register 1
address Read
Instruction[20–16] Read data 1
register 2 Zero
Instruction 0 Registers Read ALU ALU
[31–0] Write 0 Read
M data 2 result Address data 1
Instruction u register M M
memory x u u
Instruction[15–11] Write x
1 data Data x
1 memory 0
Write
data
Instruction[15–0] 16 32
Sign
extend ALU
control
Instruction [5–0]
Instruction[25–21] Read
PC Read register 1
address Read
Instruction[20–16] Read data 1
register 2 Zero
Instruction 0 Registers Read ALU ALU
[31–0] Write 0 Read
M data 2 result Address data 1
Instruction u register M M
memory x u u
Instruction[15–11] Write x
1 data Data x
1 memory 0
Write
data
Instruction[15–0] 16 32
Sign
extend ALU
control
Instruction [5–0]
Instruction I IF ID EX MEM WB
Instruction I+1 IF ID EX MEM WB
Instruction I+2 IF ID EX MEM WB
Instruction I+3 IF ID EX MEM WB
Instruction I +4 IF ID EX MEM WB
4 cycles = n -1
Time to fill the pipeline
MIPS Pipeline Stages:
IF = Instruction Fetch
First instruction, I Last instruction,
ID = Instruction Decode I+4 completed
Completed
EX = Execution
MEM = Memory Access
n= 5 pipeline stages Ideal CPI =1
WB = Write Back
In-order = instructions executed in original program order
Ideal pipeline operation without any stall cycles
204521 Digital System August 4, 2011 29
IF ID EX MEM WB
Branches resolved
Here in MEM (Stage 4)
ID
MEM
WB
IF
EX
ALU
Load Mem Reg DM Reg
ALU
Instruction 1 Mem Reg DM Reg
ALU
Instruction 2 Mem Reg DM Reg
ALU
Instruction 3 Mem Reg DM Reg
ALU
Instruction 4 Mem Reg DM Reg
ALU
Load Mem Reg DM Reg
ALU
Instruction 1 Mem Reg DM Reg
ALU
Instruction 2 Mem Reg DM Reg
ALU
Instruction 3 Mem Reg DM Reg
Clock Number
Inst. # 1 2 3 4 5 6 7 8 9 10
LOAD IF ID EX MEM WB
Inst. i+1 IF ID EX MEM WB
Inst. i+2 IF ID EX MEM WB
Inst. i+3 stall IF ID EX MEM WB
Inst. i+4 IF ID EX MEM WB
Inst. i+5 IF ID EX MEM
Inst. i+6 IF ID EX
LOAD instruction “steals” an instruction fetch cycle which will cause the pipeline to stall.
Data
4
Hazard Example
5
Figure A.6 The use of the result of the DADD instruction in the next three instructions
causes a hazard, since the register is not written until after those instructions read it.
Forward
Forward
5
Pipeline
with Forwarding
A set of instructions that depend on the DADD result uses forwarding paths to avoid the data hazard
204521 Digital System August 4, 2011 45
Data Hazard Classification
Given two instructions I, J, with I occurring before J in an
instruction stream: I
..
RAW (read after write): A true data dependence ..
J tried to read a source before I writes to it, so J
incorrectly gets the old value. J
WAW (write after write): A name dependence Program
J tries to write an operand before it is written by I Order
I (Write) I (Read)
Shared Shared
Operand Operand
J (Write) J (Read)
Write after Write (WAW) Read after Read (RAR) not a hazard
204521 Digital System August 4, 2011 47
Read after write (RAW) hazards
With RAW hazard, instruction j tries to read a source
operand before instruction i writes it.
Thus, j would incorrectly receive an old or incorrect
value
Graphically/Example:
Graphically/Example:
Graphically/Example:
… j i … i: DIV F7, F1, F3
Instruction j is a Instruction i is a
j: SUB F1, F4, F6
write instruction read instruction
issued after i issued before j
LD R1, 0 (R2)
DSUB R4, R1, R5
AND R6, R1, R7
OR R8, R1, R9
The LD (load double word) instruction has the data in clock cycle 4
(MEM cycle).
Stall
Stall
M ID/EX
u
x WB EX/MEM
Control M WB MEM/WB
ALUOp
IF/ID EX M WB
ALUSrc
MemWrite
MemRead
Branch
MemtoReg
RegWrite
Add
Add
Shift
left 2
Read
RegDst
Read reg 1
PC
address Read
Read
reg 2 data 1 Read
Zero addr
Read
Write Read M
Instruction data
reg data 2 M ALU Write u
Memory u addr x
Write x
data Registers
Write
Data
data
Memory
Inst[15-0] Sign ALU
extend control
Inst[20-16]
M
u
Inst[15-11] x
Hazard if a current register read address = any register write address in pipeline
and RegWrite is asserted in that pipeline stage
A = B+ C
produces a stall when loading the second data value (B).
Rather than allow the pipeline to stall, the compiler could sometimes
schedule the pipeline to avoid stalls.
ID
EX MEM WB
IF
Stall
Pipeline stall cycles from branches = frequency of taken branches X branch penalty
More Loops
Type Frequency
Arith/Logic 40%
Load 30% of which 25% are followed immediately by
an instruction using the loaded value
Store 10%
branch 20% of which 45% are taken
What is the resulting CPI for the pipelined MIPS with forwarding and branch
address calculation in ID stage when using a branch not-taken scheme?
Branch Penalty = 1 cycle
CPI = Ideal CPI + Pipeline stall clock cycles per instruction
= 1 + stalls by loads + stalls by branches
= 1 + .3 x .25 x 1 + .2 x .45 x 1
= 1 + .075 + .09
= 1.165