
ECE4680

Computer Organization and Architecture


Designing a Pipeline Processor

2002-4-3

ECE4680 Pipeline.1

A Single Cycle Processor


[Figure: single cycle datapath. The Instruction Fetch Unit delivers Instruction<31:0>, whose fields op <31:26>, Rs <21:25>, Rt <16:20>, Rd <11:15>, func <5:0>, and Imm16 <0:15> drive the Main Control, the ALU Control, a register file of 32 32-bit registers (ports Rw, Ra, Rb; buses busA, busB, busW), the Extender, the ALU, and the Data Memory. Control signals: RegDst, RegWr, ALUSrc, ExtOp, ALUop, ALUctr, MemWr, MemtoReg, Branch, Jump. The ALU's Zero output goes to the Instruction Fetch Unit.]

Drawbacks of this Single Cycle Processor


Long cycle time:
Cycle time must be long enough for the load instruction:
- PC's Clock-to-Q +
- Instruction Memory Access Time +
- Register File Access Time +
- ALU Delay (address calculation) +
- Data Memory Access Time +
- Register File Setup Time +
- Clock Skew

Cycle time is much longer than needed for all other instructions.
Examples:
- R-type instructions do not require data memory access
- Jump requires neither an ALU operation nor a data memory access
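
The sum above can be sketched in a few lines of Python. The delay values are made-up placeholders (the slides give no numbers), and the names `DELAYS_NS`, `LOAD_PATH`, `RTYPE_PATH`, and `path_delay` are illustrative only. The point is that the load path sets the clock period, so an instruction with a shorter path leaves part of every cycle unused.

```python
# Hypothetical component delays in nanoseconds. These numbers are assumptions
# chosen for illustration; the lecture does not give actual values.
DELAYS_NS = {
    "pc_clk_to_q": 1,
    "instr_mem_access": 10,
    "reg_file_access": 5,
    "alu_delay": 8,
    "data_mem_access": 10,
    "reg_file_setup": 1,
    "clock_skew": 1,
}

# Components on the load's path, in the order listed on the slide.
LOAD_PATH = [
    "pc_clk_to_q", "instr_mem_access", "reg_file_access",
    "alu_delay", "data_mem_access", "reg_file_setup", "clock_skew",
]

# An R-type instruction skips the data memory access.
RTYPE_PATH = [c for c in LOAD_PATH if c != "data_mem_access"]


def path_delay(path, delays=DELAYS_NS):
    """Sum the delays of the components an instruction passes through."""
    return sum(delays[component] for component in path)


if __name__ == "__main__":
    load = path_delay(LOAD_PATH)
    rtype = path_delay(RTYPE_PATH)
    print(f"load path:   {load} ns (sets the single cycle clock period)")
    print(f"R-type path: {rtype} ns ({load - rtype} ns of each cycle unused)")
```

With these placeholder values the load path is 36 ns and the R-type path is 26 ns, so every R-type instruction would idle for 10 ns of a 36 ns cycle.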


Overview of a Multiple Cycle Implementation


The root of the single cycle processor's problems:
The cycle time has to be long enough for the slowest instruction
Solution:
Break the instruction into smaller steps
Execute each step (instead of the entire instruction) in one cycle
- Cycle time: time it takes to execute the longest step
- Keep all the steps about the same length
This is the essence of the multiple cycle processor
The advantages of the multiple cycle processor:
Cycle time is much shorter
Different instructions take different numbers of cycles to complete
- Load takes five cycles
- Jump only takes three cycles
Allows a functional unit to be used more than once per instruction
(Margin question: why will this hinder the pipeline?)

Multiple Cycle Processor


MCP: A functional unit to be used more than once per instruction
[Figure: multiple cycle datapath. A single Ideal Memory (RAdr, WrAdr, Din, Dout) serves both instructions and data. The Instruction Reg, the Reg File (Ra, Rb, Rw; busA, busB, busW), Extend, the << 2 shifter, the ALU, the Target register, and the PC are connected through muxes. Control signals: PCWr, PCWrCond, PCSrc, IorD, MemWr, IRWr, RegDst, RegWr, ExtOp, ALUSelA, ALUSelB, ALUOp, MemtoReg, BrWr. The ALU produces Zero.]

Outline of Today's Lecture: Pipelining


Introduction to the Concept of Pipelined Processor
Pipelined Datapath and Pipelined Control
How to Avoid Race Condition in a Pipeline Design?
Pipeline Example: Instructions Interaction


Timing Diagram of a Load Instruction


[Figure: timing diagram of a load across the phases Instruction Fetch, Instr Decode / Reg. Fetch, Address, Data Memory, and Reg Wr. Each signal (PC; Rs, Rt, Rd, Op, Func; ALUctr; ExtOp; ALUSrc; RegWr; busA; busB; Address; busW) changes from its old value to its new value after the corresponding delay: Clk-to-Q, Instruction Memory Access Time, Delay through Control Logic, Register File Access Time, Delay through Extender & Mux, ALU Delay, Data Memory Access Time, and Register File Write Time.]

The Five Stages of Load


          Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5
Load      Ifetch   Reg/Dec  Exec     Mem      Wr

Ifetch: Instruction Fetch
- Fetch the instruction from the Instruction Memory
Reg/Dec: Registers Fetch and Instruction Decode
Exec: Calculate the memory address
Mem: Read the data from the Data Memory
Wr: Write the data back to the register file

Key Ideas Behind Pipelining


Grading the midterm exams:
5 problems, five people grading the exam
Each person ONLY grades one problem
Pass the exam to the next person as soon as one finishes his part
Assume each problem takes 0.5 hour to grade
- Each individual exam still takes 2.5 hours to grade
- But with 5 people, all exams can be graded much quicker

The load instruction has 5 stages:
Five independent functional units to work on each stage
- Each functional unit is used only once
The 2nd load can start as soon as the 1st finishes its Ifetch stage
Each load still takes five cycles to complete
The throughput, however, is much higher


Pipelining the Load Instruction


          Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7
1st lw    Ifetch   Reg/Dec  Exec     Mem      Wr
2nd lw             Ifetch   Reg/Dec  Exec     Mem      Wr
3rd lw                      Ifetch   Reg/Dec  Exec     Mem      Wr

The five independent functional units in the pipeline datapath are:
Instruction Memory for the Ifetch stage
Register File's Read ports (busA and busB) for the Reg/Dec stage
ALU for the Exec stage
Data Memory for the Mem stage
Register File's Write port (busW) for the Wr stage
(Margin note: the register file is used twice, but that is no problem!)

One instruction enters the pipeline every cycle
One instruction comes out of the pipeline (completes) every cycle
Comparison with the multiple cycle processor: 5 x 3 = 15 cycles versus 7 cycles for the three loads
The Effective Cycles per Instruction (CPI) is 1
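
The "15 cycles versus 7 cycles" comparison and the effective CPI of 1 follow from simple counting, sketched below. The function names are illustrative, and the sketch assumes an ideal pipeline in which one instruction enters every cycle and nothing ever stalls.

```python
STAGES = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]


def multicycle_cycles(num_instructions, cycles_each=len(STAGES)):
    """Multiple cycle processor: each instruction runs start to finish alone."""
    return num_instructions * cycles_each


def pipelined_cycles(num_instructions, num_stages=len(STAGES)):
    """Ideal pipeline: the first instruction needs num_stages cycles,
    and every later instruction finishes one cycle after its predecessor."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


def effective_cpi(num_instructions):
    """Average cycles per instruction in the ideal pipeline."""
    return pipelined_cycles(num_instructions) / num_instructions


if __name__ == "__main__":
    print(multicycle_cycles(3), "cycles for three loads, multiple cycle")
    print(pipelined_cycles(3), "cycles for three loads, pipelined")
    for n in (3, 100, 10000):
        print(f"n = {n}: effective CPI = {effective_cpi(n):.4f}")
```

For three loads the counts are 15 and 7. The effective CPI is (5 + n - 1) / n, which approaches 1 as the number of instructions n grows, matching the slide's claim for a pipeline that stays full.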

The Four Stages of R-type


          Cycle 1  Cycle 2  Cycle 3  Cycle 4
R-type    Ifetch   Reg/Dec  Exec     Wr

Ifetch: Instruction Fetch
- Fetch the instruction from the Instruction Memory
Reg/Dec: Registers Fetch and Instruction Decode
Exec: ALU operates on the two register operands
Wr: Write the ALU output back to the register file

Pipelining the R-type and Load Instruction


          Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8  Cycle 9
R-type    Ifetch   Reg/Dec  Exec     Wr
R-type             Ifetch   Reg/Dec  Exec     Wr
Load                        Ifetch   Reg/Dec  Exec     Mem      Wr
R-type                               Ifetch   Reg/Dec  Exec     Wr
R-type                                        Ifetch   Reg/Dec  Exec     Wr

Oops! We have a problem!

Ideal case:
each functional unit is used once per instruction AND
all instructions are the same, just as discussed in the previous 2 slides.
We have a problem:
Two instructions try to write to the register file at the same time!
We will see many other problems involved in pipelining.
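
A minimal sketch of how the write-port collision can be found mechanically. It assumes one instruction is fetched per cycle starting in Cycle 1, and the instruction order (R-type, R-type, Load, R-type, R-type) is the one drawn on this slide. `STAGES_BY_KIND` and `write_port_conflicts` are hypothetical names for this illustration.

```python
from collections import defaultdict

# Stage sequences as given in the lecture: R-type has four stages, Load five.
STAGES_BY_KIND = {
    "R-type": ["Ifetch", "Reg/Dec", "Exec", "Wr"],
    "Load": ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"],
}


def write_port_conflicts(program, stages_by_kind=STAGES_BY_KIND):
    """Return {cycle: [instruction indices]} for cycles with more than one Wr.

    Instruction i (0-based) is fetched in cycle i + 1.
    """
    writers = defaultdict(list)
    for index, kind in enumerate(program):
        fetch_cycle = index + 1
        wr_cycle = fetch_cycle + stages_by_kind[kind].index("Wr")
        writers[wr_cycle].append(index)
    return {cycle: ids for cycle, ids in writers.items() if len(ids) > 1}


if __name__ == "__main__":
    program = ["R-type", "R-type", "Load", "R-type", "R-type"]
    print(write_port_conflicts(program))

    # If every instruction uses the write port in the same (5th) stage,
    # no two instructions can reach Wr in the same cycle.
    uniform = {kind: STAGES_BY_KIND["Load"] for kind in STAGES_BY_KIND}
    print(write_port_conflicts(program, uniform))
```

The first call reports that the Load (index 2) and the R-type after it (index 3) both reach Wr in Cycle 7. The second call returns an empty dictionary.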

Important Observation
Each functional unit can only be used once per instruction:
necessary but not sufficient.
Each functional unit must be used at the same stage for all instructions:
Load uses the Register File's Write Port during its 5th stage
          1        2        3        4        5
Load      Ifetch   Reg/Dec  Exec     Mem      Wr

R-type uses the Register File's Write Port during its 4th stage
          1        2        3        4
R-type    Ifetch   Reg/Dec  Exec     Wr
Problem!

Solution 1: Insert Bubble into the Pipeline


[Figure: pipeline diagram over Cycles 1 to 9 for a Load among R-type instructions, with a Pipeline Bubble inserted so that two instructions do not reach Wr in the same cycle.]

Insert a bubble into the pipeline to prevent 2 writes in the same cycle
The control logic can be complex
No instruction is completed during Cycle 5:
The Effective CPI for load is 2

Solution 2: Delay R-type's Write by One Cycle

Delay R-type's register write by one cycle:
Now R-type instructions also use the Reg File's write port at Stage 5
Mem stage is a NOOP stage: nothing is being done
          1        2        3        4        5
R-type    Ifetch   Reg/Dec  Exec     Mem      Wr

          Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8  Cycle 9
R-type    Ifetch   Reg/Dec  Exec     Mem      Wr
R-type             Ifetch   Reg/Dec  Exec     Mem      Wr
Load                        Ifetch   Reg/Dec  Exec     Mem      Wr
R-type                               Ifetch   Reg/Dec  Exec     Mem      Wr
R-type                                        Ifetch   Reg/Dec  Exec     Mem      Wr

The Four Stages of Store


          Cycle 1  Cycle 2  Cycle 3  Cycle 4
Store     Ifetch   Reg/Dec  Exec     Mem      Wr

Ifetch: Instruction Fetch
- Fetch the instruction from the Instruction Memory
Reg/Dec: Registers Fetch and Instruction Decode
Exec: Calculate the memory address
Mem: Write the data into the Data Memory
Wr: a NOOP stage, kept so that the pipeline diagram looks more uniform.

The Four Stages of Beq


          Cycle 1  Cycle 2  Cycle 3  Cycle 4
Beq       Ifetch   Reg/Dec  Exec     Mem      Wr (NOOP!)

Ifetch: Instruction Fetch
- Fetch the instruction from the Instruction Memory
Reg/Dec: Registers Fetch and Instruction Decode
Exec: (see "A Detail View of the Execution Unit" for more details)
- ALU compares the two register operands
- Adder calculates the branch target address
Mem: If the registers we compared in the Exec stage are the same,
- Write the branch target address into the PC
(Margin question: how does Beq differ between the pipeline processor and the multiple cycle processor, and why?)

A Pipelined Datapath

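[Figure: pipelined datapath. The five stages Ifetch, Reg/Dec, Exec, Mem, and Wr are separated by the IF/ID, ID/Ex, Ex/Mem, and Mem/Wr registers, whose outputs change one Clock-to-Q delay after the clock edge. The IUnit holds the PC and produces the instruction and PC+4; the RFile (Ra, Rb, Rw, Di) drives busA and busB; the Exec Unit produces Zero and the ALU result; the Data Mem has ports RA, WA, Di, and Do. Control signals shown: ExtOp, ALUOp, RegDst, ALUSrc, Branch, MemWr, MemtoReg, RegWr. Margin questions: why is there no datapath under Wr? Why do we need these registers?]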

The Instruction Fetch Stage


Location 10: lw $1, 0x100($2)

$1 <- Mem[($2) + 0x100]

[Figure: pipelined datapath with the Ifetch stage marked "You are here!". At the end of this cycle the IF/ID register holds lw $1, 0x100($2) and PC = 14.]

A Detail View of the Instruction Unit


Location 10: lw $1, 0x100($2)

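[Figure: the Instruction Unit during Ifetch ("You are here!"). The PC value 10 addresses the Instruction Memory while an Adder computes the next value, PC = 14; the fetched instruction lw $1, 0x100($2) goes to IF/ID. Margin question: when the PC takes its new value, is the Instruction Memory still reading the old output?]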

The Decode / Register Fetch Stage


Location 10: lw $1, 0x100($2)

$1 <- Mem[($2) + 0x100]


[Figure: pipelined datapath with the Reg/Dec stage marked "You are here!". At the end of this cycle the ID/Ex register holds the contents of Reg. 2 and the immediate 0x100.]

Load's Address Calculation Stage


Location 10: lw $1, 0x100($2)

$1 <- Mem[($2) + 0x100]


[Figure: pipelined datapath with the Exec stage marked "You are here!". Control signals in Exec: ALUOp=Add, ExtOp=1, ALUSrc=1, RegDst=0. At the end of this cycle the Ex/Mem register holds Load's Address. Margin note: remember that Rt/Rd was connected to the RFile before?]

A Detail View of the Execution Unit


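[Figure: the Execution Unit between the ID/Ex and Ex/Mem registers ("You are here!"). The Extender (ExtOp=1) widens imm16 from 16 to 32 bits; a Mux (ALUSrc=1) selects between busB and the extended immediate as the second ALU input; the ALU, driven by ALUctr from the ALU Control (ALUOp=Add), takes busA as its first input and produces ALUout and Zero. A separate Adder combines PC+4 with the extended immediate shifted left by 2 to form Target. At the end of the cycle Ex/Mem holds Load's Memory Address.]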
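
The two computations performed in the Execution Unit can be sketched as follows. This assumes that ExtOp=1 selects sign extension and ExtOp=0 zero extension (the slides only show ExtOp=1 for the load), and that the branch adder uses the sign-extended immediate. All function names are illustrative, and the register and PC values in the example are made up.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    """Interpret the low 16 bits of imm16 as a two's complement number."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def extend(imm16, ext_op):
    """Extender: sign-extend when ext_op is 1 (assumed), else zero-extend."""
    return sign_extend16(imm16) if ext_op else imm16 & 0xFFFF


def load_address(bus_a, imm16, ext_op=1):
    """ALU with ALUSrc=1 and ALUOp=Add: busA plus the extended immediate."""
    return (bus_a + extend(imm16, ext_op)) & MASK32


def branch_target(pc_plus_4, imm16):
    """Adder: PC+4 plus the extended immediate shifted left by 2."""
    return (pc_plus_4 + (sign_extend16(imm16) << 2)) & MASK32


if __name__ == "__main__":
    # lw $1, 0x100($2), with a made-up value of 0x2000 in register $2
    print(hex(load_address(0x2000, 0x100)))
    # a negative offset: 0xFFFC is -4 after sign extension
    print(hex(load_address(0x2000, 0xFFFC)))
    # a branch with a made-up PC+4 of 16 and an immediate of 3 words
    print(branch_target(16, 3))
```

The shift by 2 converts a word offset into a byte offset, which is why the branch immediate counts instructions rather than bytes.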

Load's Memory Access Stage


Location 10: lw $1, 0x100($2)

$1 <- Mem[($2) + 0x100]


[Figure: pipelined datapath with the Mem stage marked "You are here!". Control signals in Mem: Branch=0, MemWr=0. At the end of this cycle the Mem/Wr register holds Load's Data.]

Load's Write Back Stage


Location 10: lw $1, 0x100($2)

$1 <- Mem[($2) + 0x100]


[Figure: pipelined datapath with the Wr stage active ("You are somewhere out there!"). Control signals in Wr: RegWr=1, MemtoReg=1, so the data held in the Mem/Wr register is written into the RFile.]

How About Control Signals?


Key Observation: Control Signals at Stage N = Func (Instr. at Stage N)
N = Exec, Mem, or Wr
Example: Control Signals at Exec Stage = Func(Load's Exec)
[Figure: pipelined datapath with the Load in the Exec stage: ALUOp=Add, ExtOp=1, ALUSrc=1, RegDst=0; Ex/Mem: Load's Address. Margin question: why are there no control signals at the 1st and 2nd stages?]

Pipeline Control
The Main Control generates the control signals during Reg/Dec
Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
Control signals for Mem (MemWr, Branch) are used 2 cycles later
Control signals for Wr (MemtoReg, RegWr) are used 3 cycles later
[Figure: the Main Control sits in Reg/Dec, reads the IF/ID Register, and writes all control signals into the ID/Ex Register. ExtOp, ALUSrc, ALUOp, and RegDst are consumed in Exec. MemWr and Branch pass through the Ex/Mem Register and are consumed in Mem. MemtoReg and RegWr pass through the Ex/Mem and Mem/Wr Registers and are consumed in Wr.]
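
The way control signals travel with their instruction can be sketched as a chain of registers that each pass on only what later stages still need. This is an illustrative model, not the lecture's implementation: the names `clock`, `subset`, and `EMPTY` are made up, and the lw control values are the ones shown on the load walkthrough slides.

```python
EXEC_SIGNALS = ("ExtOp", "ALUSrc", "ALUOp", "RegDst")
MEM_SIGNALS = ("MemWr", "Branch")
WR_SIGNALS = ("MemtoReg", "RegWr")

# Control word the Main Control would produce for lw during Reg/Dec.
LW_CONTROL = {
    "ExtOp": 1, "ALUSrc": 1, "ALUOp": "Add", "RegDst": 0,
    "MemWr": 0, "Branch": 0,
    "MemtoReg": 1, "RegWr": 1,
}

EMPTY = {"ID/Ex": None, "Ex/Mem": None, "Mem/Wr": None}


def subset(word, names):
    """Keep only the named signals; an empty slot (None) stays empty."""
    return None if word is None else {name: word[name] for name in names}


def clock(regs, decoded):
    """Return the control fields of the pipeline registers after one clock edge.

    `decoded` is the control word of the instruction leaving Reg/Dec,
    or None if no instruction is there.
    """
    return {
        "ID/Ex": decoded,
        "Ex/Mem": subset(regs["ID/Ex"], MEM_SIGNALS + WR_SIGNALS),
        "Mem/Wr": subset(regs["Ex/Mem"], WR_SIGNALS),
    }


if __name__ == "__main__":
    regs = clock(EMPTY, LW_CONTROL)   # lw enters Exec, 1 cycle after decode
    print("Exec uses:", subset(regs["ID/Ex"], EXEC_SIGNALS))
    regs = clock(regs, None)          # lw enters Mem, 2 cycles after decode
    print("Mem uses: ", subset(regs["Ex/Mem"], MEM_SIGNALS))
    regs = clock(regs, None)          # lw enters Wr, 3 cycles after decode
    print("Wr uses:  ", regs["Mem/Wr"])
```

Because each stage reads its signals from the register directly in front of it, the control at a stage depends only on the instruction currently in that stage.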


Beginning of the Wr Stage: A Real World Problem

[Figure: the Mem/Wr register supplies RegAdr, Data, and RegWr to the Reg File, and the Ex/Mem register supplies WrAdr, Data, and MemWr to the Data Memory. The waveforms show RegAdr, RegWr, WrAdr, and MemWr each changing one Clk-to-Q delay after the clock edge.]

At the beginning of the Wr stage, we have a problem if:
RegAdr's (Rd or Rt) Clk-to-Q > RegWr's Clk-to-Q
Similarly, at the beginning of the Mem stage, we have a problem if:
WrAdr's Clk-to-Q > MemWr's Clk-to-Q
We have a race condition between Address and Write Enable!
(Margin question: why can't the multiple cycle processor's approach be used here?)

The Pipeline Problem


Multiple Cycle design prevents race condition between Addr and WrEn:
Make sure Address is stable by the end of Cycle N
Assert WrEn during Cycle N + 1
This approach can NOT be used in the pipeline design because:
Must be able to write the register file every cycle
Must be able to write the data memory every cycle

Store     Ifetch   Reg/Dec  Exec     Mem      Wr
Store              Ifetch   Reg/Dec  Exec     Mem      Wr
R-type                      Ifetch   Reg/Dec  Exec     Mem      Wr
R-type                               Ifetch   Reg/Dec  Exec     Mem      Wr

Solution? Recall the single cycle processor's approach.

Synchronize Register File & Synchronize Memory


Solution: AND the Write Enable signal with the Clock
This is the ONLY place where gating the clock is used
MUST consult circuit expert to ensure no timing violation:
- Example: Clock High Time > Write Access Delay
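
A toy model of the gating, under the assumption that the write takes place while the gated enable is high. The sampled waveforms, the nanosecond values, and the names `gated_write_enable` and `write_window_ok` are all illustrative and not taken from the slides.

```python
def gated_write_enable(clk_samples, wr_en_samples):
    """AND the write enable with the clock, sample by sample."""
    return [clk & wr_en for clk, wr_en in zip(clk_samples, wr_en_samples)]


def write_window_ok(clock_high_ns, write_access_ns):
    """The slide's example check: Clock High Time > Write Access Delay."""
    return clock_high_ns > write_access_ns


if __name__ == "__main__":
    # Two samples per cycle: clock high, then clock low.
    clk = [1, 0, 1, 0, 1, 0]
    wr_en = [0, 0, 1, 1, 0, 0]   # enable asserted for the whole second cycle
    print(gated_write_enable(clk, wr_en))   # high only while the clock is high
    print(write_window_ok(clock_high_ns=5, write_access_ns=3))
```

The gated signal is high only during the clock-high half of the cycle in which the enable is asserted, so the write can be triggered in every cycle without waiting a full extra cycle for the address to settle.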
Synchronize Memory and Register File
Address, Data, and WrEn must be stable at least 1 set-up time before the Clk edge
Write occurs at the cycle following the clock edge that captures the signals
[Figure: waveforms of Clk, I_Addr, I_WrEn, and C_WrEn with the actual write marked, next to two block diagrams of a Reg File or Memory with Address, Data, and WrEn inputs: one fed through I_Addr, I_Data, I_WrEn, and C_WrEn, and one with a Clk input.]

A More Extensive Pipelining Example


                          Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8
0: Load                   Ifetch   Reg/Dec  Exec     Mem      Wr
4: R-type                          Ifetch   Reg/Dec  Exec     Mem      Wr
8: Store                                    Ifetch   Reg/Dec  Exec     Mem      Wr
12: Beq (target is 1000)                             Ifetch   Reg/Dec  Exec     Mem      Wr

End of Cycle 4: Load's Mem, R-type's Exec, Store's Reg, Beq's Ifetch
End of Cycle 5: Load's Wr, R-type's Mem, Store's Exec, Beq's Reg
End of Cycle 6: R-type's Wr, Store's Mem, Beq's Exec
End of Cycle 7: Store's Wr, Beq's Mem
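
The four "End of Cycle" lines can be derived from the fetch order alone. The sketch below assumes that instruction i (counting from 0) is fetched in cycle i + 1 and advances one stage per cycle; `stage_at` and `occupancy` are illustrative helper names.

```python
STAGES = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]

# (address, name) pairs in program order, as in the example.
PROGRAM = [(0, "Load"), (4, "R-type"), (8, "Store"), (12, "Beq")]


def stage_at(fetch_cycle, cycle):
    """Stage occupied in `cycle` by an instruction fetched in `fetch_cycle`,
    or None if it has not started yet or has already finished."""
    offset = cycle - fetch_cycle
    return STAGES[offset] if 0 <= offset < len(STAGES) else None


def occupancy(program, cycle):
    """Map each in-flight instruction to the stage it occupies in `cycle`."""
    result = {}
    for index, (address, name) in enumerate(program):
        stage = stage_at(index + 1, cycle)
        if stage is not None:
            result[f"{address}: {name}"] = stage
    return result


if __name__ == "__main__":
    for cycle in (4, 5, 6, 7):
        print(f"End of Cycle {cycle}:", occupancy(PROGRAM, cycle))
```

For Cycle 4 this yields Load in Mem, R-type in Exec, Store in Reg/Dec, and Beq in Ifetch, in agreement with the list above.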

Pipelining Example: End of Cycle 4


0: Load's Mem    4: R-type's Exec    8: Store's Reg    12: Beq's Ifetch
[Figure: pipelined datapath at the end of Cycle 4.]
Exec controls (R-type): ALUOp=R-type, ExtOp=x, RegDst=1, ALUSrc=0
Mem controls (Load): Branch=0, MemWr=0
Wr controls: RegWr=0, MemtoReg=x
Pipeline registers: IF/ID: Beq Instruction; ID/Ex: Store's busA & B; Ex/Mem: R-type's Result; Mem/Wr: Load's Dout; PC = 16

Pipelining Example: End of Cycle 5


0: Load's Wr    4: R-type's Mem    8: Store's Exec    12: Beq's Reg    16: R-type's Ifetch
[Figure: pipelined datapath at the end of Cycle 5.]
Exec controls (Store): ALUOp=Add, ExtOp=1, RegDst=x, ALUSrc=1
Mem controls (R-type): Branch=0, MemWr=0
Wr controls (Load): RegWr=1, MemtoReg=1
Pipeline registers: IF/ID: Instruction @ 16; ID/Ex: Beq's busA & B; Ex/Mem: Store's Address; Mem/Wr: R-type's Result; PC = 20

Pipelining Example: End of Cycle 6


4: R-type's Wr    8: Store's Mem    12: Beq's Exec    16: R-type's Reg    20: R-type's Ifetch
[Figure: pipelined datapath at the end of Cycle 6.]
Exec controls (Beq): ALUOp=Sub, ExtOp=1, RegDst=x, ALUSrc=0
Mem controls (Store): Branch=0, MemWr=1
Wr controls (R-type): RegWr=1, MemtoReg=0
Pipeline registers: IF/ID: Instruction @ 20; ID/Ex: R-type's busA & B; Ex/Mem: Beq's Results; Mem/Wr: Nothing for Store; PC = 24

Pipelining Example: End of Cycle 7


8: Store's Wr    12: Beq's Mem    16: R-type's Exec    20: R-type's Reg    24: R-type's Ifetch
[Figure: pipelined datapath at the end of Cycle 7.]
Exec controls (R-type): ALUOp=R-type, ExtOp=x, RegDst=1, ALUSrc=0
Mem controls (Beq): Branch=1, MemWr=0
Wr controls (Store): RegWr=0, MemtoReg=x
Pipeline registers: IF/ID: Instruction @ 24; ID/Ex: R-type's busA & B; Ex/Mem: R-type's Results; Mem/Wr: Nothing for Beq; PC = 1000

The Delay Branch Phenomenon


                          Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8  Cycle 9  Cycle 10 Cycle 11
12: Beq (target is 1000)  Ifetch   Reg/Dec  Exec     Mem      Wr
16: R-type                         Ifetch   Reg/Dec  Exec     Mem      Wr
20: R-type                                  Ifetch   Reg/Dec  Exec     Mem      Wr
24: R-type                                           Ifetch   Reg/Dec  Exec     Mem      Wr
1000: Target of Br                                            Ifetch   Reg/Dec  Exec     Mem      Wr

Although Beq is fetched during Cycle 4:
Target address is NOT written into the PC until the end of Cycle 7
Branch's target is NOT fetched until Cycle 8
3-instruction delay before the branch takes effect
This is referred to as Branch Hazard:
Clever design techniques can reduce the delay to ONE instruction
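
The 3-instruction delay is a direct consequence of where the PC is written. In the sketch below, `branch_delay` and `target_fetch_cycle` are illustrative names. The `"Reg/Dec"` case is an assumption about one way the delay could reach a single instruction (resolving the branch during decode); the slides do not say which technique is meant.

```python
STAGES = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]


def target_fetch_cycle(branch_fetch_cycle, pc_write_stage="Mem"):
    """First cycle in which the branch target can be fetched.

    The PC is written at the end of the branch's `pc_write_stage`,
    so the target is fetched in the following cycle.
    """
    pc_write_cycle = branch_fetch_cycle + STAGES.index(pc_write_stage)
    return pc_write_cycle + 1


def branch_delay(pc_write_stage="Mem"):
    """Number of sequential instructions fetched after the branch
    before the target can be fetched."""
    return STAGES.index(pc_write_stage)


if __name__ == "__main__":
    print("target fetched in cycle", target_fetch_cycle(4))
    print("delay with PC written in Mem:", branch_delay("Mem"))
    # Assumption for illustration: the branch is resolved during Reg/Dec.
    print("delay with PC written in Reg/Dec:", branch_delay("Reg/Dec"))
```

With Beq fetched in Cycle 4 and the PC written at the end of its Mem stage (Cycle 7), the target is fetched in Cycle 8 and the instructions at 16, 20, and 24 have already entered the pipeline.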

The Delay Load Phenomenon


          Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8
I0: Load  Ifetch   Reg/Dec  Exec     Mem      Wr
Plus 1             Ifetch   Reg/Dec  Exec     Mem      Wr
Plus 2                      Ifetch   Reg/Dec  Exec     Mem      Wr
Plus 3                               Ifetch   Reg/Dec  Exec     Mem      Wr
Plus 4                                        Ifetch   Reg/Dec  Exec     Mem      Wr

Although Load is fetched during Cycle 1:
The data is NOT written into the Reg File until the end of Cycle 5
We cannot read this value from the Reg File until Cycle 6
3-instruction delay before the load takes effect
This is referred to as Data Hazard:
Clever design techniques can reduce the delay to ONE instruction
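
Which of the following instructions would read a stale register value can be worked out from the stage timing. This sketch assumes that registers are read in Reg/Dec, that the loaded value becomes readable in the cycle after the load's Wr stage, and that there is no forwarding. `first_readable_cycle` and `stale_readers` are illustrative names.

```python
STAGES = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]


def first_readable_cycle(load_fetch_cycle):
    """The load writes the Reg File at the end of its Wr stage,
    so the new value can first be read in the following cycle."""
    return load_fetch_cycle + STAGES.index("Wr") + 1


def stale_readers(load_fetch_cycle, followers):
    """Names of following instructions whose Reg/Dec stage comes too early
    to see the loaded value. "Plus k" is fetched k cycles after the load."""
    readable = first_readable_cycle(load_fetch_cycle)
    stale = []
    for k in range(1, followers + 1):
        reg_dec_cycle = load_fetch_cycle + k + STAGES.index("Reg/Dec")
        if reg_dec_cycle < readable:
            stale.append(f"Plus {k}")
    return stale


if __name__ == "__main__":
    print("value readable from cycle", first_readable_cycle(1))
    print("stale readers:", stale_readers(1, 4))
```

For a load fetched in Cycle 1, the value is readable from Cycle 6. Plus 1, Plus 2, and Plus 3 read their registers in Cycles 3, 4, and 5, so Plus 4 is the first instruction that sees the loaded data.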

Summary
Disadvantages of the Single Cycle Processor
- Long cycle time
- Cycle time is too long for all instructions except the Load
Multiple Clock Cycle Processor:
- Divide the instructions into smaller steps
- Execute each step (instead of the entire instruction) in one cycle
Pipeline Processor:
- Natural enhancement of the multiple clock cycle processor
- Each functional unit can only be used once per instruction
- If an instruction is going to use a functional unit, it must use it at the same stage as all other instructions
- Pipeline Control: each stage's control signals depend ONLY on the instruction that is currently in that stage

Single Cycle, Multiple Cycle, vs. Pipeline


Single Cycle Implementation:
          Cycle 1  Cycle 2
          Load     Store, then Waste (the rest of the cycle is unused)

Multiple Cycle Implementation:
          Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8  Cycle 9  Cycle 10
Load      Ifetch   Reg      Exec     Mem      Wr
Store                                                  Ifetch   Reg      Exec     Mem
R-type                                                                                     Ifetch

Pipeline Implementation:
Load      Ifetch   Reg      Exec     Mem      Wr
Store              Ifetch   Reg      Exec     Mem      Wr
R-type                      Ifetch   Reg      Exec     Mem      Wr
