Pipeline

ECE4680
Computer Organization and Architecture

Designing a Pipeline Processor
2002-4-3
ECE4680 Pipeline.1
A Single Cycle Processor

[Figure: single cycle datapath. The Instruction Fetch Unit supplies Instruction<31:0>, whose fields (op, Rs, Rt, Rd, func, Imm16) drive the Main Control and the ALU Control, the 32 x 32-bit register file (Ra, Rb, Rw; busA, busB, busW), the Extender, the ALU, and the Data Memory (Adr, Data In, WrEn). Control signals: RegDst, RegWr, ExtOp, ALUSrc, ALUop, ALUctr, MemWr, MemtoReg, Branch, Jump.]
Drawbacks of this Single Cycle Processor

Long cycle time:
Cycle time must be long enough for the load instruction:
- PC's Clock-to-Q +
  Instruction Memory Access Time +
  Register File Access Time +
  ALU Delay (address calculation) +
  Data Memory Access Time +
  Register File Setup Time +
  Clock Skew
Cycle time is much longer than needed for all other instructions.
Examples:
R-type instructions do not require data memory access
Jump requires neither an ALU operation nor a data memory access
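The critical-path sum above can be made concrete with a few lines of Python. This is a minimal sketch: the delay values (in ns) and the helper names `DELAYS`, `PATHS`, and `path_delay` are hypothetical, chosen only for illustration, and the Jump path is simplified. Only the structure of the sum comes from the slide.

```python
# Hypothetical component delays in ns (illustrative only, not measured values).
DELAYS = {
    "pc_clk_to_q": 1,
    "imem_access": 10,
    "regfile_access": 5,
    "alu": 8,
    "dmem_access": 10,
    "regfile_setup": 1,
    "clock_skew": 1,
}

# Components on each instruction class's path through the single cycle datapath.
PATHS = {
    # Load uses every component: this is the critical path on the slide.
    "load": ["pc_clk_to_q", "imem_access", "regfile_access", "alu",
             "dmem_access", "regfile_setup", "clock_skew"],
    # R-type skips the data memory access.
    "rtype": ["pc_clk_to_q", "imem_access", "regfile_access", "alu",
              "regfile_setup", "clock_skew"],
    # Jump needs neither the ALU nor the data memory (simplified path).
    "jump": ["pc_clk_to_q", "imem_access", "clock_skew"],
}


def path_delay(instr):
    """Sum of the component delays on one instruction class's path."""
    return sum(DELAYS[c] for c in PATHS[instr])


# A single clock must accommodate the slowest path: the load.
cycle_time = max(path_delay(i) for i in PATHS)

for instr in PATHS:
    print(f"{instr:5s} needs {path_delay(instr):2d} ns, but every cycle is {cycle_time} ns")
```

With these made-up numbers, an R-type instruction idles for the part of the 36 ns cycle it does not need, which is exactly the waste the slide describes.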
Overview of a Multiple Cycle Implementation

The root of the single cycle processor's problem:
The cycle time has to be long enough for the slowest instruction
Solution:
Break the instruction into smaller steps
Execute each step (instead of the entire instruction) in one cycle
- Cycle time: time it takes to execute the longest step
- Keep all the steps similar in length
This is the essence of the multiple cycle processor
The advantages of the multiple cycle processor:
Cycle time is much shorter
Different instructions take different numbers of cycles to complete
- Load takes five cycles
- Jump only takes three cycles
Allows a functional unit to be used more than once per instruction
Why will this hinder the pipeline?
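The trade-off between a long single cycle and several short cycles can be checked with a weighted average. This is a sketch under stated assumptions: Load = 5 and Jump = 3 cycles come from the slide, but the Store, R-type, and Beq cycle counts, the instruction mix, and both clock periods are hypothetical values for illustration.

```python
# Cycles per instruction class in the multiple cycle processor.
# Load = 5 and Jump = 3 are from the slide; Store, R-type, and Beq are assumed.
CYCLES = {"load": 5, "store": 4, "rtype": 4, "beq": 3, "jump": 3}

# Hypothetical instruction mix (fractions sum to 1.0), for illustration only.
MIX = {"load": 0.25, "store": 0.10, "rtype": 0.45, "beq": 0.15, "jump": 0.05}

# Hypothetical clock periods in ns: the single cycle clock must fit a whole load,
# while the multiple cycle clock only has to fit the longest single step.
SINGLE_CYCLE_NS = 45.0
MULTI_CYCLE_NS = 10.0


def average_cpi(cycles, mix):
    """Mix-weighted average number of cycles per instruction."""
    return sum(cycles[kind] * fraction for kind, fraction in mix.items())


cpi = average_cpi(CYCLES, MIX)
single_ns = 1 * SINGLE_CYCLE_NS   # CPI = 1, but every cycle is long
multi_ns = cpi * MULTI_CYCLE_NS   # CPI > 1, but every cycle is short
print(f"multiple cycle CPI = {cpi:.2f}")
print(f"average time per instruction: single = {single_ns:.1f} ns, multiple = {multi_ns:.1f} ns")
```

With these numbers the multiple cycle design wins only modestly (about 40.5 ns versus 45 ns); how large the gain is depends on how evenly the steps are balanced and on the instruction mix.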
Multiple Cycle Processor

MCP: a functional unit can be used more than once per instruction
[Figure: multiple cycle datapath. A single Ideal Memory (RAdr, WrAdr, Din, Dout) holds both instructions and data; the PC, the Instruction Reg, the register file (Ra, Rb, Rw; busA, busB, busW), the Extender, a << 2 shifter, a Target register, and a single ALU with ALU Control are shared across cycles, with muxes selecting the operands. Control signals: PCWr, PCWrCond, PCSrc, BrWr, IorD, MemWr, IRWr, RegDst, RegWr, MemtoReg, ExtOp, ALUSelA, ALUSelB, ALUOp.]
Outline of Today's Lecture: Pipelining

Introduction to the Concept of Pipelined Processor
Pipelined Datapath and Pipelined Control
How to Avoid Race Conditions in a Pipeline Design?
Pipeline Example: Instruction Interaction
Timing Diagram of a Load Instruction

[Figure: timing diagram of a load in the single cycle processor, across the phases Instruction Fetch, Instr Decode / Reg. Fetch, Address, Data Memory, and Reg Wr. After the clock edge, each signal switches from its old value to its new value: the PC after Clk-to-Q; Rs, Rt, Rd, Op, Func after the Instruction Memory Access Time; ALUctr, ExtOp, ALUSrc, RegWr after the Delay through Control Logic; busA and busB after the Register File Access Time; the ALU operand after the Delay through Extender & Mux; the Address after the ALU Delay; busW after the Data Memory Access Time; the result is written at the next clock edge (Register File Write Time).]
The Five Stages of Load

          Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5
Load      Ifetch    Reg/Dec   Exec      Mem       Wr
Ifetch: Instruction Fetch
  Fetch the instruction from the Instruction Memory
Reg/Dec: Register Fetch and Instruction Decode
Exec: Calculate the memory address
Mem: Read the data from the Data Memory
Wr: Write the data back to the register file
Key Ideas Behind Pipelining

Grading the midterm exams:
5 problems, five people grading the exams
Each person ONLY grades one problem
Each person passes the exam to the next person as soon as they finish their part
Assume each problem takes 0.5 hour to grade
- Each individual exam still takes 2.5 hours to grade
- But with 5 people, all exams can be graded much more quickly
The load instruction has 5 stages:

Five independent functional units, one working on each stage
- Each functional unit is used only once
The 2nd load can start as soon as the 1st finishes its Ifetch stage
Each load still takes five cycles to complete
The throughput, however, is much higher
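The latency-versus-throughput point can be written as two small formulas: without overlap, n jobs through k stages take n * k steps; with a pipeline they take k + n - 1 steps. The function names and the class size of 100 exams below are hypothetical, used only for illustration.

```python
def sequential_time(n_jobs, n_stages, stage_time):
    """No overlap: each job passes through every stage before the next job starts."""
    return n_jobs * n_stages * stage_time


def pipelined_time(n_jobs, n_stages, stage_time):
    """Overlap: the first job needs n_stages steps, then one job finishes per step."""
    if n_jobs == 0:
        return 0
    return (n_stages + n_jobs - 1) * stage_time


# Exam grading: 5 problems (stages), 0.5 hour per problem.
# The class size of 100 exams is a made-up number for illustration.
print(sequential_time(100, 5, 0.5))  # 250.0 hours
print(pipelined_time(100, 5, 0.5))   # 52.0 hours
print(pipelined_time(1, 5, 0.5))     # 2.5 hours: one exam's latency is unchanged
```

The same formula with 5 stages and 3 loads gives 5 + 3 - 1 = 7 cycles.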
Pipelining the Load Instruction

          Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5   Cycle 6   Cycle 7
1st lw    Ifetch    Reg/Dec   Exec      Mem       Wr
2nd lw              Ifetch    Reg/Dec   Exec      Mem       Wr
3rd lw                        Ifetch    Reg/Dec   Exec      Mem       Wr
The five independent functional units in the pipeline datapath are:
Instruction Memory for the Ifetch stage
Register File's read ports (busA and busB) for the Reg/Dec stage
ALU for the Exec stage
Data Memory for the Mem stage
Register File's write port (busW) for the Wr stage
The register file is used twice (read ports in Reg/Dec, write port in Wr), but this is no problem!

One instruction enters the pipeline every cycle
One instruction comes out of the pipeline (complete) every cycle
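Comparison with the multiple cycle processor: 5 x 3 = 15 cycles versus 7 cycles for these three loads
The effective Cycles per Instruction (CPI) approaches 1: once the pipeline is full, one instruction completes every cycle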
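A pipeline diagram like the one above can be generated mechanically: instruction i is fetched in cycle i + 1 and then occupies one stage per cycle. This sketch assumes no stalls and one fetch per cycle; `pipeline_diagram`, `print_diagram`, and `effective_cpi` are hypothetical helper names.

```python
# Stage names used throughout this lecture.
STAGES = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]


def pipeline_diagram(names):
    """Map each instruction name to {cycle: stage}, assuming one fetch per cycle and no stalls."""
    table = {}
    for i, name in enumerate(names):
        first_cycle = i + 1  # the i-th instruction (0-based) is fetched in cycle i + 1
        table[name] = {first_cycle + k: stage for k, stage in enumerate(STAGES)}
    return table


def print_diagram(table):
    """Print the table in the same row/column layout as the slide."""
    last = max(max(row) for row in table.values())
    print(" " * 8 + "".join(f"Cycle {c:<3d}" for c in range(1, last + 1)))
    for name, row in table.items():
        cells = "".join(f"{row.get(c, ''):<9s}" for c in range(1, last + 1))
        print(f"{name:<8s}{cells}")


def effective_cpi(n_instructions):
    """Cycles per completed instruction: the first needs 5 cycles, each later one adds 1."""
    return (len(STAGES) + n_instructions - 1) / n_instructions


loads = pipeline_diagram(["1st lw", "2nd lw", "3rd lw"])
print_diagram(loads)
total_cycles = max(max(row) for row in loads.values())
```

For three loads the effective CPI is 7/3, but for 1000 instructions it is already 1004/1000 = 1.004, which is why the slide calls the effective CPI 1.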
The Four Stages of R-type

          Cycle 1   Cycle 2   Cycle 3   Cycle 4
R-type    Ifetch    Reg/Dec   Exec      Wr

Exec: ALU operates on the two register operands
Wr: Write the ALU output back to the register file
Pipelining the R-type and Load Instruction

          Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5   Cycle 6   Cycle 7   Cycle 8   Cycle 9
R-type    Ifetch    Reg/Dec   Exec      Wr
R-type              Ifetch    Reg/Dec   Exec      Wr
Load                          Ifetch    Reg/Dec   Exec      Mem       Wr
R-type                                  Ifetch    Reg/Dec   Exec      Wr
R-type                                            Ifetch    Reg/Dec   Exec      Wr
Oops! We have a problem! (Cycle 7: the Load and the R-type after it are both in Wr)
Ideal case: each functional unit is used once per instruction AND
all instructions are the same, as in the previous two slides.
We have a problem:
Two instructions try to write to the register file at the same time!
We will see many other problems involved in pipelining.
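The write-port conflict can be found by computing, for each instruction, the cycle in which it reaches Wr. This is a sketch assuming one fetch per cycle and no stalls; `write_port_conflicts` and the `padded` stage table are hypothetical names. The padded table gives R-type a NOOP Mem stage so that it also writes in stage 5.

```python
# Stages used by each instruction class: Load has 5 stages, R-type has 4 (as on the slide).
STAGES = {
    "load": ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"],
    "rtype": ["Ifetch", "Reg/Dec", "Exec", "Wr"],
}


def write_port_conflicts(program, stages=STAGES):
    """Return {cycle: [instruction indices]} for cycles where more than one
    instruction is in its Wr stage (i.e., wants the register file write port).

    Assumes instruction i is fetched in cycle i + 1 and nothing stalls.
    """
    writers = {}
    for i, kind in enumerate(program):
        wr_cycle = (i + 1) + stages[kind].index("Wr")
        writers.setdefault(wr_cycle, []).append(i)
    return {cycle: who for cycle, who in writers.items() if len(who) > 1}


program = ["rtype", "rtype", "load", "rtype", "rtype"]
print(write_port_conflicts(program))          # {7: [2, 3]}

# Giving R-type a NOOP Mem stage makes every instruction write in stage 5.
padded = {"load": STAGES["load"], "rtype": ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]}
print(write_port_conflicts(program, padded))  # {}
```

The conflict lands in Cycle 7 between the Load (index 2) and the R-type fetched right after it (index 3), matching the diagram.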
Important Observation
Using each functional unit only once per instruction is
necessary but not sufficient.
Each functional unit must be used at the same stage for all instructions:
Load uses the Register File's Write Port during its 5th stage
  Load:    1 Ifetch   2 Reg/Dec   3 Exec   4 Mem   5 Wr
R-type uses the Register File's Write Port during its 4th stage
  R-type:  1 Ifetch   2 Reg/Dec   3 Exec   4 Wr
Problem!
Solution 1: Insert Bubble into the Pipeline

[Figure: pipeline diagram, Cycles 1 to 9: a Load followed by R-type instructions. A pipeline bubble is inserted after an R-type's Reg/Dec stage so that its Wr stage no longer falls in the same cycle as the Load's Wr stage.]
Insert a bubble into the pipeline to prevent 2 writes in the same cycle
The control logic can be complex
No instruction is completed during Cycle 5:
The Effective CPI for load is 2
Solution 2: Delay R-type's Write by One Cycle

Delay R-type's register write by one cycle:
Now R-type instructions also use the Register File's write port at Stage 5
Mem stage is a NOOP stage: nothing is being done
  R-type:  1 Ifetch   2 Reg/Dec   3 Exec   4 Mem   5 Wr

          Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5   Cycle 6   Cycle 7   Cycle 8   Cycle 9
R-type    Ifetch    Reg/Dec   Exec      Mem       Wr
R-type              Ifetch    Reg/Dec   Exec      Mem       Wr
Load                          Ifetch    Reg/Dec   Exec      Mem       Wr
R-type                                  Ifetch    Reg/Dec   Exec      Mem       Wr
R-type                                            Ifetch    Reg/Dec   Exec      Mem       Wr
The Four Stages of Store

          Cycle 1   Cycle 2   Cycle 3   Cycle 4
Store     Ifetch    Reg/Dec   Exec      Mem       Wr (NOOP)

Exec: Calculate the memory address
Mem: Write the data into the Data Memory
Wr: a NOOP stage, kept so that the pipeline diagram looks more uniform
The Four Stages of Beq

          Cycle 1   Cycle 2   Cycle 3   Cycle 4
Beq       Ifetch    Reg/Dec   Exec      Mem       Wr (NOOP!)

Question: how does Beq differ between the pipelined processor and the
multiple cycle processor, and why?
Exec: see "A Detail View of the Execution Unit" for more details
  ALU compares the two register operands
  Adder calculates the branch target address
Mem: If the registers compared in the Exec stage are equal,
  write the branch target address into the PC
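The "same unit, same stage" rule can be checked from a table of what each instruction does per stage. This is a sketch: the action strings paraphrase the slides for lw, R-type, sw, and beq, and `ACTIONS` and `stages_where` are hypothetical names introduced for illustration.

```python
# Stage names of the uniform 5-stage pipeline.
STAGE_NAMES = ("Ifetch", "Reg/Dec", "Exec", "Mem", "Wr")

# Per-stage work of each instruction class, paraphrased from the slides.
# "NOOP" marks a stage kept only so that every instruction has the same shape.
ACTIONS = {
    "lw":     ("fetch", "read regs", "address = rs + imm", "read Data Memory", "write register"),
    "R-type": ("fetch", "read regs", "ALU op on rs, rt", "NOOP", "write register"),
    "sw":     ("fetch", "read regs", "address = rs + imm", "write Data Memory", "NOOP"),
    "beq":    ("fetch", "read regs", "compare; compute target", "update PC if equal", "NOOP"),
}


def stages_where(keyword):
    """Return the set of stages in which some instruction's action contains keyword."""
    return {
        STAGE_NAMES[k]
        for actions in ACTIONS.values()
        for k, action in enumerate(actions)
        if keyword in action
    }


# Each shared resource is touched in exactly one stage across all instructions.
print(stages_where("write register"))  # {'Wr'}
print(stages_where("Data Memory"))     # {'Mem'}
```

If any instruction used the register write port or the data memory in a different stage, the corresponding set would contain more than one stage name.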
A Pipelined Datapath
[Figure: pipelined datapath. The stages Ifetch, Reg/Dec, Exec, Mem, and Wr are separated by the IF/ID, ID/Ex, Ex/Mem, and Mem/Wr pipeline registers. Units: IUnit (PC, PC+4), register file RFile (Ra, Rb, Rw, Di; busA, busB), Exec Unit (Zero), Data Memory (RA, WA, Di, Do), and muxes. Control signals: ExtOp, ALUSrc, ALUOp, RegDst, MemWr, Branch, MemtoReg, RegWr. Annotations: "Clock-to-Q delay"; "Why do we need such registers?"; "No datapath under Wr?"]
The Instruction Fetch Stage

Location 10: lw $1, 0x100($2)
$1 <- Mem[($2) + 0x100]
[Figure: pipelined datapath with the Ifetch stage highlighted ("You are here!"). After the clock edge, the IF/ID register holds lw $1, 0x100($2) and PC = 14.]
A Detail View of the Instruction Unit

Location 10: lw $1, 0x100($2)
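[Figure: detail of the instruction unit with the Ifetch stage highlighted ("You are here!"). The PC addresses the Instruction Memory while an Adder computes the next PC; after the clock edge, PC = 14 and IF/ID holds lw $1, 0x100($2). Annotation: "PC: new value, old output?"]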
The Decode / Register Fetch Stage

Location 10: lw $1, 0x100($2)
$1 <- Mem[($2) + 0x100]

[Figure: pipelined datapath with the Reg/Dec stage highlighted ("You are here!"). After the clock edge, the ID/Ex register holds the contents of register $2 and the immediate 0x100.]
Load's Address Calculation Stage

Location 10: lw $1, 0x100($2)
$1 <- Mem[($2) + 0x100]

[Figure: pipelined datapath with the Exec stage highlighted ("You are here!"). Control: ALUOp=Add, ExtOp=1, ALUSrc=1, RegDst=0. After the clock edge, the Ex/Mem register holds the load's address. Annotation: "Remember Rt/Rd was connected to RFile before?"]
A Detail View of the Execution Unit

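[Figure: detail of the execution unit with the Exec stage highlighted ("You are here!"). From the ID/Ex register, busA feeds the ALU; the Extender sign-extends imm16 to 32 bits (ExtOp=1) and the mux selects it instead of busB (ALUSrc=1); ALU Control drives ALUctr from ALUOp=Add. The ALU output (ALUout, plus Zero) goes to Ex/Mem as the load's memory address. A separate Adder computes the branch Target from PC+4 and the extended immediate shifted left by 2 (<< 2).]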
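The two computations in the execution unit can be sketched with 32-bit arithmetic. This is a behavioral sketch, not the hardware: the helper names are hypothetical, and the register value $2 = 0x1000 is a made-up example. The branch example reuses the later slides' Beq at address 12 with target 1000, which needs an offset of (1000 - 16) / 4 = 246.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    """Extender with ExtOp=1: copy bit 15 into bits 31..16 (result as unsigned 32-bit)."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16


def alu_add(a, b):
    """32-bit two's-complement addition (ALUOp = Add)."""
    return (a + b) & MASK32


def load_address(bus_a, imm16):
    """Exec stage of lw: busA + sign-extended imm16 (ALUSrc=1 selects the immediate)."""
    return alu_add(bus_a, sign_extend16(imm16))


def branch_target(pc_plus_4, imm16):
    """Separate adder: (PC+4) + (sign-extended imm16 << 2)."""
    return (pc_plus_4 + (sign_extend16(imm16) << 2)) & MASK32


print(hex(load_address(0x1000, 0x100)))  # 0x1100: lw $1, 0x100($2) with $2 = 0x1000
print(branch_target(16, 246))            # 1000
```

Masking to 32 bits makes negative offsets wrap correctly: an immediate of 0xFFFF (-1) moves the target back by 4 bytes.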
Load's Memory Access Stage

Location 10: lw $1, 0x100($2)
$1 <- Mem[($2) + 0x100]

[Figure: pipelined datapath with the Mem stage highlighted ("You are here!"). Control: Branch=0, MemWr=0, so the Data Memory is read at the load's address. After the clock edge, the Mem/Wr register holds the load's data.]
Load's Write Back Stage

Location 10: lw $1, 0x100($2)
$1 <- Mem[($2) + 0x100]

[Figure: pipelined datapath with the Wr stage highlighted ("You are somewhere out there!"). Control: RegWr=1, MemtoReg=1, so the data held in the Mem/Wr register is written into register $1.]
How About Control Signals?

Key Observation: Control Signals at Stage N = Func (Instr. at Stage N)
N = Exec, Mem, or Wr
Example: Control signals at the Exec stage = Func(Load's Exec)
[Figure: pipelined datapath during the load's Exec stage, with ALUOp=Add, ExtOp=1, ALUSrc=1, RegDst=0; the Ex/Mem register receives the load's address. Annotation: "Why no control signals at the 1st and 2nd stages?"]
Pipeline Control
The Main Control generates the control signals during Reg/Dec
Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
Control signals for Mem (MemWr, Branch) are used 2 cycles later
Control signals for Wr (MemtoReg, RegWr) are used 3 cycles later
[Figure: pipeline control. During Reg/Dec, the Main Control decodes the instruction in the IF/ID register. The ID/Ex register stores the Exec controls (ExtOp, ALUSrc, ALUOp, RegDst), the Mem controls (MemWr, Branch), and the Wr controls (MemtoReg, RegWr); the Ex/Mem register passes on the Mem and Wr controls; the Mem/Wr register passes on only the Wr controls.]
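The way control signals ride along in the pipeline registers can be modeled as three registers that shift on each clock edge, each one dropping the group of signals its predecessor stage has consumed. This is a sketch: `Controls`, `MAIN_CONTROL`, and `ControlPipeline` are hypothetical names, and the signal values are taken from the walkthrough slides ('x' = don't care).

```python
from dataclasses import dataclass


@dataclass
class Controls:
    exec_: dict  # used in Exec: ExtOp, ALUSrc, ALUOp, RegDst
    mem: dict    # used in Mem: MemWr, Branch
    wr: dict     # used in Wr: MemtoReg, RegWr


# Illustrative Main Control outputs, taken from the walkthrough slides ('x' = don't care).
MAIN_CONTROL = {
    "lw": Controls({"ExtOp": 1, "ALUSrc": 1, "ALUOp": "Add", "RegDst": 0},
                   {"MemWr": 0, "Branch": 0},
                   {"MemtoReg": 1, "RegWr": 1}),
    "R-type": Controls({"ExtOp": "x", "ALUSrc": 0, "ALUOp": "R-type", "RegDst": 1},
                       {"MemWr": 0, "Branch": 0},
                       {"MemtoReg": 0, "RegWr": 1}),
}


class ControlPipeline:
    """ID/Ex keeps all three groups, Ex/Mem drops Exec, Mem/Wr keeps only Wr."""

    def __init__(self):
        self.id_ex = None   # Controls of the instruction now in Exec
        self.ex_mem = None  # (mem, wr) of the instruction now in Mem
        self.mem_wr = None  # wr of the instruction now in Wr

    def clock(self, instr_in_reg_dec):
        """One clock edge: every register loads from the stage before it."""
        # Update from the back so each register reads its predecessor's OLD value.
        self.mem_wr = self.ex_mem[1] if self.ex_mem else None
        self.ex_mem = (self.id_ex.mem, self.id_ex.wr) if self.id_ex else None
        self.id_ex = MAIN_CONTROL[instr_in_reg_dec] if instr_in_reg_dec else None

    def signals(self):
        """Control signals seen by each stage during the current cycle."""
        return {
            "Exec": self.id_ex.exec_ if self.id_ex else None,
            "Mem": self.ex_mem[0] if self.ex_mem else None,
            "Wr": self.mem_wr,
        }


# lw, then R-type, then two empty slots: watch lw's controls move down the pipe.
cp = ControlPipeline()
trace = []
for instr in ["lw", "R-type", None, None]:
    cp.clock(instr)
    trace.append(cp.signals())
```

In the resulting trace, lw's Exec controls appear 1 cycle after decode, its Mem controls 2 cycles after, and its Wr controls 3 cycles after, as the slide states.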
Beginning of the Wr Stage: A Real-World Problem

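[Figure: the Ex/Mem register drives the Data Memory (WrAdr, MemWr, Data) and the Mem/Wr register drives the register file (RegAdr, RegWr, Data). After each clock edge, every signal becomes valid only after its own Clk-to-Q delay: RegAdr's, RegWr's, WrAdr's, and MemWr's.]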
At the beginning of the Wr stage, we have a problem if:

RegAdr's (Rd or Rt) Clk-to-Q > RegWr's Clk-to-Q
Similarly, at the beginning of the Mem stage, we have a problem if:
WrAdr's Clk-to-Q > MemWr's Clk-to-Q
We have a race condition between Address and Write Enable!
Why is the multiple cycle processor's approach not useful here?
The Pipeline Problem

Multiple cycle design prevents the race condition between Addr and WrEn:
Make sure the Address is stable by the end of Cycle N
Assert WrEn during Cycle N + 1
This approach can NOT be used in the pipeline design because:
Must be able to write the register file every cycle
Must be able to write the data memory every cycle
          Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5   Cycle 6   Cycle 7   Cycle 8
Store     Ifetch    Reg/Dec   Exec      Mem       Wr
Store               Ifetch    Reg/Dec   Exec      Mem       Wr
R-type                        Ifetch    Reg/Dec   Exec      Mem       Wr
R-type                                  Ifetch    Reg/Dec   Exec      Mem       Wr
Solution? Recall the single cycle processor's approach.
Synchronize Register File & Synchronize Memory

Solution: AND the Write Enable signal with the Clock
This is the ONLY place where gating the clock is used
MUST consult a circuit expert to ensure no timing violation:
- Example: Clock High Time > Write Access Delay
Synchronize Memory and Register File
[Figure: synchronized register file or memory. The inputs I_Addr, I_Data, and I_WrEn are captured at the clock edge, and the gated write enable C_WrEn (WrEn AND Clk) drives the Reg File or Memory. Address, Data, and WrEn must be stable at least one set-up time before the Clk edge; the actual write occurs in the cycle following the clock edge that captures the signals.]
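The benefit of gating the write with the clock can be shown behaviorally: while the Mem/Wr outputs settle, the register file may briefly see WrEn = 1 together with a stale address. This is a sketch that abstracts the gated write into a single event at the clock edge; it is not a timing-accurate circuit model, and the class names, register numbers, and data value are hypothetical.

```python
class SyncRegisterFile:
    """Register file that writes only at a clock edge, and only if WrEn is high then.

    Behavioral sketch of 'AND the Write Enable with the Clock': the inputs may
    take transient values while they settle after the pipeline register's
    Clk-to-Q, but only the values present at the clock edge are written.
    """

    def __init__(self, n_regs=32):
        self.regs = [0] * n_regs
        self.addr, self.data, self.wr_en = 0, 0, False

    def drive(self, addr, data, wr_en):
        # Inputs change during the cycle; nothing is written yet.
        self.addr, self.data, self.wr_en = addr, data, wr_en

    def clock_edge(self):
        # Register $0 is hardwired to zero in MIPS, so writes to it are ignored.
        if self.wr_en and self.addr != 0:
            self.regs[self.addr] = self.data


class UnsyncRegisterFile(SyncRegisterFile):
    """Unsafe variant: writes whenever WrEn is high, without waiting for the clock."""

    def drive(self, addr, data, wr_en):
        super().drive(addr, data, wr_en)
        if wr_en and addr != 0:
            self.regs[addr] = data

    def clock_edge(self):
        pass  # the clock plays no part in the write


def run_cycle(rf_class, transient_inputs):
    """Apply a sequence of (addr, data, wr_en) values within one cycle, then clock."""
    rf = rf_class()
    for addr, data, wr_en in transient_inputs:
        rf.drive(addr, data, wr_en)
    rf.clock_edge()
    return rf.regs


# Intended: write 42 into $5. RegWr's Clk-to-Q is shorter than RegAdr's, so the
# register file briefly sees WrEn = 1 together with the stale address 7.
glitch = [(7, 42, True), (5, 42, True)]
sync_regs = run_cycle(SyncRegisterFile, glitch)
unsync_regs = run_cycle(UnsyncRegisterFile, glitch)
print(sync_regs[5], sync_regs[7])      # 42 0
print(unsync_regs[5], unsync_regs[7])  # 42 42  (register 7 was corrupted)
```

The synchronized version ignores the transient because only the values present at the clock edge matter, which is why the inputs must be stable one set-up time before that edge.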
A More Extensive Pipelining Example

          Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5   Cycle 6   Cycle 7   Cycle 8
0: Load   Ifetch    Reg/Dec   Exec      Mem       Wr
4: R-type           Ifetch    Reg/Dec   Exec      Mem       Wr
8: Store                      Ifetch    Reg/Dec   Exec      Mem       Wr
12: Beq                                 Ifetch    Reg/Dec   Exec      Mem       Wr
(Beq's target is 1000)

End of Cycle 4: Load's Mem, R-type's Exec, Store's Reg, Beq's Ifetch
End of Cycle 5: Load's Wr, R-type's Mem, Store's Exec, Beq's Reg
End of Cycle 6: R-type's Wr, Store's Mem, Beq's Exec
End of Cycle 7: Store's Wr, Beq's Mem
Pipelining Example: End of Cycle 4

0: Load's Mem    4: R-type's Exec    8: Store's Reg    12: Beq's Ifetch
[Figure: pipelined datapath at the end of Cycle 4. Control: ALUOp=R-type, ExtOp=x, ALUSrc=0, RegDst=1 (R-type in Exec); MemWr=0, Branch=0 (Load in Mem); RegWr=0, MemtoReg=x (no instruction in Wr). Pipeline registers: IF/ID holds the Beq instruction; ID/Ex holds the Store's busA and busB; Ex/Mem holds the R-type's result; Mem/Wr holds the Load's Dout; PC = 16.]

Pipelining Example: End of Cycle 5

0: Load's Wr    4: R-type's Mem    8: Store's Exec    12: Beq's Reg    16: R-type's Ifetch
[Figure: pipelined datapath at the end of Cycle 5. Control: ALUOp=Add, ExtOp=1, ALUSrc=1, RegDst=x (Store in Exec); MemWr=0, Branch=0 (R-type in Mem); RegWr=1, MemtoReg=1 (Load in Wr). Pipeline registers: IF/ID holds the instruction at 16; ID/Ex holds the Beq's busA and busB; Ex/Mem holds the Store's address; Mem/Wr holds the R-type's result; PC = 20.]

Pipelining Example: End of Cycle 6

4: R-type's Wr    8: Store's Mem    12: Beq's Exec    16: R-type's Reg    20: R-type's Ifetch
[Figure: pipelined datapath at the end of Cycle 6. Control: ALUOp=Sub, ExtOp=1, ALUSrc=0, RegDst=x (Beq in Exec); MemWr=1, Branch=0 (Store in Mem); RegWr=1, MemtoReg=0 (R-type in Wr). Pipeline registers: ID/Ex holds the R-type's busA and busB; Ex/Mem holds the Beq's results; Mem/Wr holds nothing for the Store; PC = 24.]

Pipelining Example: End of Cycle 7

8: Store's Wr    12: Beq's Mem    16: R-type's Exec    20: R-type's Reg    24: R-type's Ifetch
[Figure: pipelined datapath at the end of Cycle 7. Control: ALUOp=R-type, ExtOp=x, ALUSrc=0, RegDst=1 (R-type in Exec); MemWr=0, Branch=1 (Beq in Mem); RegWr=0, MemtoReg=x (Store in Wr). Pipeline registers: ID/Ex holds the R-type's busA and busB; Ex/Mem holds the R-type's results; Mem/Wr holds nothing for the Beq; PC = 1000 (the branch is taken).]
The Delay Branch Phenomenon

                    Cycle 4   Cycle 5   Cycle 6   Cycle 7   Cycle 8   Cycle 9   Cycle 10  Cycle 11  Cycle 12
12: Beq             Ifetch    Reg/Dec   Exec      Mem       Wr
16: R-type                    Ifetch    Reg/Dec   Exec      Mem       Wr
20: R-type                              Ifetch    Reg/Dec   Exec      Mem       Wr
24: R-type                                        Ifetch    Reg/Dec   Exec      Mem       Wr
1000: Target of Br                                          Ifetch    Reg/Dec   Exec      Mem       Wr
(Beq's target is 1000)
Although Beq is fetched during Cycle 4:

Target address is NOT written into the PC until the end of Cycle 7
Branch's target is NOT fetched until Cycle 8
3-instruction delay before the branch takes effect
This is referred to as a Branch Hazard:
Clever design techniques can reduce the delay to ONE instruction
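The 3-instruction branch delay follows directly from which stage writes the PC. This sketch assumes one fetch per cycle and no stalls; `branch_delay` is a hypothetical helper, and the "Reg/Dec" call models a hypothetical design that resolves the branch earlier, not this lecture's datapath.

```python
STAGES = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]


def branch_delay(fetch_cycle, pc_write_stage="Mem"):
    """Return (cycle at whose end the PC gets the target,
               cycle in which the target is fetched,
               number of instructions fetched after the branch before its target).

    In this lecture's datapath the branch target is written into the PC during Mem.
    """
    pc_written = fetch_cycle + STAGES.index(pc_write_stage)
    target_fetched = pc_written + 1
    return pc_written, target_fetched, target_fetched - fetch_cycle - 1


print(branch_delay(4))             # (7, 8, 3): Beq at 12, fetched in Cycle 4
print(branch_delay(4, "Reg/Dec"))  # (5, 6, 1): hypothetical earlier branch resolution
```

Moving the PC update earlier in the pipeline is one way to shrink the delay toward the single instruction mentioned above.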
The Delay Load Phenomenon

          Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5   Cycle 6   Cycle 7   Cycle 8   Cycle 9
I0: Load  Ifetch    Reg/Dec   Exec      Mem       Wr
Plus 1              Ifetch    Reg/Dec   Exec      Mem       Wr
Plus 2                        Ifetch    Reg/Dec   Exec      Mem       Wr
Plus 3                                  Ifetch    Reg/Dec   Exec      Mem       Wr
Plus 4                                            Ifetch    Reg/Dec   Exec      Mem       Wr
Although Load is fetched during Cycle 1:

The data is NOT written into the Reg File until the end of Cycle 5
We cannot read this value from the Reg File until Cycle 6
3-instruction delay before the load takes effect
This is referred to as a Data Hazard:
Clever design techniques can reduce the delay to ONE instruction
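Which following instructions see the old register value can be computed from the cycle of each one's Reg/Dec stage. This sketch follows the slide's assumptions (one fetch per cycle, registers read in Reg/Dec, the load's write complete only at the end of its Wr cycle); `stale_readers` is a hypothetical helper.

```python
STAGES = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]


def stale_readers(load_fetch_cycle, n_following):
    """Return the offsets k ("Plus k") of following instructions that read the
    register file before the load's write has completed, i.e. see the OLD value.
    """
    write_done = load_fetch_cycle + STAGES.index("Wr")  # end of this cycle
    stale = []
    for k in range(1, n_following + 1):
        read_cycle = load_fetch_cycle + k + STAGES.index("Reg/Dec")
        if read_cycle <= write_done:
            stale.append(k)
    return stale


print(stale_readers(1, 4))  # [1, 2, 3]: Plus 4 is the first to read the new value
```

With the load fetched in Cycle 1, Plus 1 through Plus 3 read in Cycles 3 to 5, and only Plus 4 reads in Cycle 6, after the write.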
Summary
Disadvantages of the Single Cycle Processor
Long cycle time
Cycle time is too long for all instructions except the Load
Multiple Clock Cycle Processor:
Divide the instructions into smaller steps
Execute each step (instead of the entire instruction) in one cycle
Pipeline Processor:
Natural enhancement of the multiple clock cycle processor
Each functional unit can only be used once per instruction
If an instruction is going to use a functional unit:
- it must use it at the same stage as all other instructions
Pipeline Control:
- Each stage's control signals depend ONLY on the instruction that is currently in that stage
Single Cycle, Multiple Cycle, vs. Pipeline

Single Cycle Implementation:
Cycle 1: Load
Cycle 2: Store (the rest of Cycle 2 is Waste)

Multiple Cycle Implementation:
          Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5   Cycle 6   Cycle 7   Cycle 8   Cycle 9   Cycle 10
Load      Ifetch    Reg       Exec      Mem       Wr
Store                                                       Ifetch    Reg       Exec      Mem
R-type                                                                                              Ifetch

Pipeline Implementation:
          Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5   Cycle 6   Cycle 7
Load      Ifetch    Reg       Exec      Mem       Wr
Store               Ifetch    Reg       Exec      Mem       Wr
R-type                        Ifetch    Reg       Exec      Mem       Wr
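The three diagrams can be compared numerically for the same Load, Store, R-type sequence. This is a sketch under hypothetical timing: one multiple cycle step equals one pipeline stage (T time units), and the single cycle clock is assumed to span five such steps. Load = 5 and Store = 4 cycles match the multiple cycle diagram; R-type = 4 is an assumption.

```python
# Hypothetical timing: one multi-cycle step = one pipeline stage = T time units,
# and the single cycle clock must fit all five steps of a load.
T = 1
SINGLE_CYCLE_PERIOD = 5 * T

# Multiple cycle: each instruction uses only the steps it needs (R-type value assumed).
MULTI_CYCLES = {"Load": 5, "Store": 4, "R-type": 4}
# Pipeline: every instruction passes through all 5 stages.
PIPE_DEPTH = 5

program = ["Load", "Store", "R-type"]

single = len(program) * SINGLE_CYCLE_PERIOD            # one long cycle each
multi = sum(MULTI_CYCLES[i] for i in program) * T       # 5 + 4 + 4 short cycles
pipelined = (PIPE_DEPTH + len(program) - 1) * T         # overlap: 5 + 2 cycles

print(single, multi, pipelined)  # 15 13 7
```

The pipeline wins not by making any single instruction faster but by overlapping them, while still using the short cycle of the multiple cycle design.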
