
10/8/15

ILP: Out-of-Order Execution

Antonia Zhai
Department of Computer Science and Engineering
University of Minnesota
http://www.cs.umn.edu/~zhai
With slides from: Profs. Mowry, Falsafi, Hill, Hoe, Lipasti, Shen,
Smith, Sohi, Vijaykumar, Patterson, Culler

Branch on equal
beq: PC <-- (Ra == 0) ? PC + 4 + disp*4 : PC + 4

Bits:    31-26   25-21   20-0
Field:   0x39    ra      disp

IF: Instruction fetch


IR <-- IMemory[PC]
incrPC <-- PC + 4

ID: Instruction decode/register fetch


A <-- Register[IR[25:21]]

Ex: Execute
Target <-- incrPC + (SignExtend(IR[20:0]) << 2)
Z <-- (A == 0)

MEM: Memory
PC <-- Z ? Target : incrPC

WB: Write back


nop
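
The stage-by-stage register transfers above can be collapsed into one function as a minimal sketch. The field positions (opcode 31-26, ra 25-21, disp 20-0) come from the slide; the names `sign_extend`, `encode_beq`, and `beq_next_pc` are hypothetical helpers introduced only for illustration.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def encode_beq(ra, disp):
    """Pack a beq word: opcode 0x39 in bits 31-26, ra in 25-21, disp in 20-0."""
    return (0x39 << 26) | ((ra & 0x1F) << 21) | (disp & 0x1FFFFF)


def beq_next_pc(pc, ir, regs):
    """Return the next PC for a beq instruction word `ir` fetched at `pc`."""
    incr_pc = pc + 4                            # IF: incrPC <-- PC + 4
    a = regs[(ir >> 21) & 0x1F]                 # ID: A <-- Register[IR[25:21]]
    target = incr_pc + (sign_extend(ir, 21) << 2)   # EX: branch target
    z = (a == 0)                                # EX: zero test
    return target if z else incr_pc             # MEM: PC update


regs = [0] * 32
regs[1] = 5
taken_pc = beq_next_pc(0x100, encode_beq(31, 3), regs)      # regs[31] == 0
backward_pc = beq_next_pc(0x100, encode_beq(31, -2), regs)  # negative displacement
fallthrough_pc = beq_next_pc(0x100, encode_beq(1, 3), regs) # regs[1] != 0
```

The displacement counts instructions, not bytes, which is why it is shifted left by 2 before being added to the incremented PC.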


Datapath for Conditional Branch Instructions


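[Figure: pipelined datapath for conditional branches. Pipeline registers IF/ID, ID/EX, and EX/MEM separate the stages. Components shown: PC with a +4 incrementer (IncrPC), instruction memory, register array (regA from Instr[25:21], regB from Instr[20:16], outputs datA and datB, write ports regW and datW), sign extension of Instr[20:0] shifted left by 2 (Xtnd << 2), ALU (aluA, aluB, ALUout), a zero test on Adata, and a branch flag.]
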

Branch Hazard Pipeline Diagram

Problem
Instruction fetched in IF, branch condition set in MEM

        beq  $31, target
        addq $31, 63, $1
        addq $31, 63, $2
        addq $31, 63, $3
        addq $31, 63, $4
target: addq $31, 63, $5

[Pipeline diagram over time: each instruction passes through IF ID EX M WB, starting one cycle after its predecessor. The PC is updated when beq reaches M, so the instructions fetched behind beq before that point are useless if the branch is taken.]


Branch Prediction
Why does prediction work?
Underlying algorithm has regularities.
Loops are iterated multiple times
Data that is being operated on has regularities.
Instruction sequence has redundancies:
Artifacts of the way that humans/compilers think
E.g., error-checking branches are rarely taken
Prediction <==> compressible information streams?
Prediction allows us to break control dependence constraints


Elements of Branch Prediction


Determine whether it is a branch instruction
Predict whether it will be taken or not
Predict the target address if taken

r1 <-- r2 / r3
r2 <-- r1 / r3
r3 <-- r2 - r3
beq r3, 100

Just Predicting Taken/Not Taken Can Help


The target can be computed much earlier than the branch decision


Prediction I: A branch will do exactly what it did last time

Branch History Table (BHT)
Each entry is a state machine
Indexed by low-order bits of instruction address
Encodes information about prior history of branch instructions
Small chance of two branch instructions aliasing
Predicts whether or not branch will be taken

[Figure: the low-order bits of the instruction address form the Branch Prediction Table Index, which selects a one-bit entry (0 or 1) in the table.]


Example
Table entry used: 0x08

Taken/Not Taken   Instruction            Prediction
Taken             0x108: beq r1, 0x20    Not taken
Taken             0x108: beq r1, 0x20
Taken             0x108: beq r1, 0x20
Not taken         0x108: beq r1, 0x20
Taken             0x108: beq r1, 0x20
Not taken         0x208: beq r2, 0x10
Taken             0x108: beq r1, 0x20


Example
Table entry used: 0x08 (one bit, 0/1)

Taken/Not Taken   Instruction            Prediction
Taken             0x108: beq r1, 0x20    Not taken
Taken             0x108: beq r1, 0x20    Taken
Taken             0x108: beq r1, 0x20    Taken
Not taken         0x108: beq r1, 0x20    Taken
Taken             0x108: beq r1, 0x20    Not taken
Not taken         0x208: beq r2, 0x10    Taken
Taken             0x108: beq r1, 0x20    Not taken

Problem:
Predictor changes too quickly

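The table above can be reproduced with a few lines of simulation. This is a minimal sketch, assuming the table is indexed by the low 8 bits of the branch address (so 0x108 and 0x208 both land on entry 0x08) and that every entry starts as "not taken"; the function name `simulate_one_bit_bht` is hypothetical.

```python
def simulate_one_bit_bht(trace, index_bits=8):
    """Return the prediction made for each (pc, taken) pair in `trace`.

    Each table entry remembers only the last outcome seen at that index.
    Entries that were never written predict not taken (False).
    """
    mask = (1 << index_bits) - 1
    table = {}
    predictions = []
    for pc, taken in trace:
        index = pc & mask
        predictions.append(table.get(index, False))  # predict the last outcome
        table[index] = taken                          # then remember this one
    return predictions


trace = [
    (0x108, True), (0x108, True), (0x108, True), (0x108, False),
    (0x108, True), (0x208, False), (0x108, True),
]
predictions = simulate_one_bit_bht(trace)
mispredictions = sum(p != taken for p, (_, taken) in zip(predictions, trace))
```

Five of the seven predictions are wrong here: a single not-taken outcome flips the entry, and the aliasing branch at 0x208 flips it again.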

Example
for (i = 0; i < 100; i++) {
    for (j = 0; j < 10; j++) {
        total = a[i][j]
    }
}
What is the misprediction rate?

Solution:
(200 + 2)/(1000 + 100)

There are two branches:
1. Backward branch for the inner loop
(2 mispredictions out of 10 for each invocation, 100 invocations; misprediction rate: 200/1000)
2. Backward branch for the outer loop
(2 mispredictions out of 100 in total)

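The count can be checked by simulation. The sketch below assumes each of the two backward branches has its own one-bit entry (no aliasing) that starts as "not taken", and that a backward branch is taken on every iteration except the last; `nested_loop_branch_trace` and `count_mispredictions_one_bit` are hypothetical names.

```python
def nested_loop_branch_trace(outer=100, inner=10):
    """Yield (branch_name, taken) for the two backward loop branches."""
    for i in range(outer):
        for j in range(inner):
            yield "inner", j < inner - 1   # taken unless this is the last iteration
        yield "outer", i < outer - 1


def count_mispredictions_one_bit(trace):
    """Count mispredictions per branch for a last-outcome (one-bit) predictor."""
    last_outcome = {}
    misses = {}
    for name, taken in trace:
        if last_outcome.get(name, False) != taken:
            misses[name] = misses.get(name, 0) + 1
        last_outcome[name] = taken
    return misses


misses = count_mispredictions_one_bit(nested_loop_branch_trace())
rate = (misses["inner"] + misses["outer"]) / (1000 + 100)
```

The inner branch is mispredicted twice per invocation: once on entry (the entry still says "not taken" from the previous exit) and once on exit.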

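Prediction II: Change the prediction after two mispredictions

2-bit saturating counter

[Figure: four-state machine for table entry 0x08 (values 00/01/10/11) with states Yes!, Yes?, No?, and No!. A taken branch (T) moves the state toward Yes!; a not-taken branch (NT) moves it toward No!.]
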

Example
for (i = 0; i < 100; i++) {
    for (j = 0; j < 10; j++) {
        total = a[i][j]
    }
}
What is the misprediction rate?

Solution:
(102 + 3)/(1000 + 100)

There are two branches:
1. Backward branch for the inner loop
(1 misprediction out of 10 for each invocation, 100 invocations, plus 2 extra misses in the first invocation; misprediction rate: 102/1000)
2. Backward branch for the outer loop
(3 mispredictions out of 100 in total)


Generalization
Using an N-bit saturating counter as a predictor
If branch taken & counter value < (2^N - 1):
Increment counter
If branch not taken & counter > 0:
Decrement counter
Prediction:
Taken: if most significant bit is 1
Not taken: if most significant bit is 0
Find the proper N:
We want to remember the history, but only recent history

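The update and prediction rules above translate almost line for line into code. This is a sketch under the same assumptions as the loop examples (one counter per branch, counters start at 0, backward branches taken except on loop exit); `SaturatingCounter` and `loop_mispredictions` are hypothetical names.

```python
class SaturatingCounter:
    """N-bit saturating counter used as a branch predictor."""

    def __init__(self, bits, value=0):
        self.bits = bits
        self.max_value = (1 << bits) - 1
        self.value = value

    def predict(self):
        # Taken if the most significant bit is 1.
        return bool(self.value >> (self.bits - 1))

    def update(self, taken):
        if taken and self.value < self.max_value:
            self.value += 1
        elif not taken and self.value > 0:
            self.value -= 1


def loop_mispredictions(bits, outer=100, inner=10):
    """Mispredictions of the inner and outer backward branches of a nested loop."""
    counters = {"inner": SaturatingCounter(bits), "outer": SaturatingCounter(bits)}
    misses = {"inner": 0, "outer": 0}

    def branch(name, taken):
        if counters[name].predict() != taken:
            misses[name] += 1
        counters[name].update(taken)

    for i in range(outer):
        for j in range(inner):
            branch("inner", j < inner - 1)
        branch("outer", i < outer - 1)
    return misses


one_bit = loop_mispredictions(1)
two_bit = loop_mispredictions(2)
```

With N = 1 the counter degenerates to the last-outcome predictor (200 + 2 misses); with N = 2 a single loop exit no longer flips the prediction, which gives the 102 + 3 misses of the 2-bit example.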

Prediction III: Whether a branch is taken or not depends on other branch instructions

if (a > 1) // branch #1
    conquer the world
if (a < -1) // branch #2
    clean my living room

Two branch instructions:
If branch #1 is taken, branch #2 will never be taken

How can we make use of this information?


Correlation Branch Predictor


Every branch has two separate predictors
One bit predicts the branch if the last branch was taken
One bit predicts the branch if the last branch was not taken
A.K.A. two-level predictor

[Figure: a predictor entry with two one-bit fields, one used when the previous branch was not taken and one used when the previous branch was taken. Caption: one-bit predictor with one bit of correlation.]


Example
for (i = 0; i < 10; i++)
{
    a = random(-100, 100)   // a is a random number from -100 to 100
    if (a > 1)              // b1
        conquer the world
    if (a < -1)             // b2
        clean my living room
}                           // b3

Input:
-10, 7, 5, 10, -2, -55, 4, -89, 33, -3

Branch #2
[Table to fill in for each input: B2 predictor (correlated with B1), B1 action, B2 prediction, B2 action. The B2 predictor starts as NT/NT.]


Example
Same loop as before (b1: a > 1, b2: a < -1), with input:
-10, 7, 5, 10, -2, -55, 4, -89, 33, -3

Branch #2
The B2 predictor holds two bits, written as (used if B1 not taken) / (used if B1 taken).

Input   B2 predictor   B1 action   B2 prediction   B2 action
-10     NT / NT        NT          NT              T
7       T / NT         T           NT              NT
5       T / NT         T           NT              NT
10      T / NT         T           NT              NT
-2      T / NT         NT          T               T
-55     T / NT         NT          T               T
4       T / NT         T           NT              NT
-89     T / NT         NT          T               T
33      T / NT         T           NT              NT
-3      T / NT         NT          T               T

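The B2 column of this example can be reproduced with a short simulation. The sketch assumes that "taken" means the `if` condition is true, that the B2 predictor starts as NT/NT, and that the bit selected by B1's outcome is overwritten with B2's actual outcome; `simulate_b2` is a hypothetical name.

```python
def simulate_b2(inputs):
    """Simulate a (1, 1) predictor for branch b2, correlated with branch b1.

    Returns one (a, b1_taken, b2_prediction, b2_taken) tuple per input.
    """
    # predictor[b1_taken] is the one-bit prediction used for b2.
    predictor = {False: False, True: False}   # starts as NT / NT
    rows = []
    for a in inputs:
        b1_taken = a > 1
        b2_taken = a < -1
        prediction = predictor[b1_taken]
        rows.append((a, b1_taken, prediction, b2_taken))
        predictor[b1_taken] = b2_taken        # one-bit update of the selected entry
    return rows


inputs = [-10, 7, 5, 10, -2, -55, 4, -89, 33, -3]
rows = simulate_b2(inputs)
b2_mispredictions = sum(pred != actual for _, _, pred, actual in rows)
```

After the first input trains the "B1 not taken" bit, every later B2 outcome is predicted correctly, because B2's behavior is fully determined by B1 for this input.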

Definition
(1, 1) predictor
Uses the behavior of the last branch
Selects from 2^1 sets of choices
Each choice is coded with 1 bit
(m, n) predictor
Use the history of m branches
Select from 2^m sets of choices
Each choice is coded with n bits
Example: How many bits are there in a 1024-entry (2, 2) branch predictor?
1024 * (2^2) * 2 = 8192 bits

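The storage formula is simple enough to write as a one-line function. This is a small sketch; `predictor_bits` and `entries_for_budget` are hypothetical helper names.

```python
def predictor_bits(entries, m, n):
    """Total storage of an (m, n) predictor: 2^m choices of n bits per entry."""
    return entries * (2 ** m) * n


def entries_for_budget(total_bits, m, n):
    """Number of entries an (m, n) predictor can have within `total_bits`."""
    return total_bits // ((2 ** m) * n)


bits_2_2 = predictor_bits(1024, 2, 2)
bits_1_1 = predictor_bits(1024, 1, 1)
plain_two_bit_entries = entries_for_budget(8192, 0, 2)
```

For the same 8192-bit budget, a plain 2-bit predictor with no global history (m = 0) could hold 4096 entries, so correlation trades table entries for history.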

A (2,2) Predictor

[Figure: the branch address and a two-bit global history (00/01/10/11) together select one 2-bit predictor entry (XX).]


Combine the Local & Global Predictors

Local predictor:
Predict based on history of just one branch

Global predictor:
Predict based on global history

Combine them with a selector (a multilevel predictor)

A branch predictor without branch address???


Tournament Predictor
Alpha branch predictor

4K-entry, 2-bit predictor of predictors (the selector)
Uses a 2-bit saturating counter to select between two predictors
Based on local information of the branch

A local predictor
1024-entry table of 10-bit histories; each entry keeps track of the 10 most recent outcomes
The entry then selects a 3-bit saturating counter

4K-entry global predictor
Indexed by the history of the last 12 branches
Each entry is a standard 2-bit predictor

11.5 mispredictions per 1000 completed instructions for SPECint95

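The three tables can be wired together as in the sketch below. It follows the sizes on the slide but is not a model of the real hardware: indexing the selector and the local history table by word-aligned branch address bits, and updating the selector only when the two predictors disagree, are assumptions made here. `Counter` and `TournamentPredictor` are hypothetical names.

```python
class Counter:
    """N-bit saturating counter; predicts taken when the top bit is set."""

    def __init__(self, bits):
        self.bits = bits
        self.value = 0

    def predict(self):
        return self.value >= (1 << (self.bits - 1))

    def update(self, taken):
        if taken:
            self.value = min(self.value + 1, (1 << self.bits) - 1)
        else:
            self.value = max(self.value - 1, 0)


class TournamentPredictor:
    def __init__(self):
        self.local_history = [0] * 1024                        # 10-bit histories
        self.local_counters = [Counter(3) for _ in range(1024)]
        self.global_history = 0                                # last 12 outcomes
        self.global_counters = [Counter(2) for _ in range(4096)]
        self.selector = [Counter(2) for _ in range(4096)]      # top bit set: use global

    def _components(self, pc):
        entry = (pc >> 2) % 1024
        local = self.local_counters[self.local_history[entry]]
        glob = self.global_counters[self.global_history]
        chooser = self.selector[(pc >> 2) % 4096]
        return entry, local, glob, chooser

    def predict(self, pc):
        _, local, glob, chooser = self._components(pc)
        return glob.predict() if chooser.predict() else local.predict()

    def update(self, pc, taken):
        entry, local, glob, chooser = self._components(pc)
        local_ok = local.predict() == taken
        global_ok = glob.predict() == taken
        if local_ok != global_ok:
            chooser.update(global_ok)      # move toward whichever predictor was right
        local.update(taken)
        glob.update(taken)
        self.local_history[entry] = ((self.local_history[entry] << 1) | int(taken)) & 0x3FF
        self.global_history = ((self.global_history << 1) | int(taken)) & 0xFFF


predictor = TournamentPredictor()
correct = []
for i in range(200):
    taken = (i % 2 == 0)                   # a branch that strictly alternates
    correct.append(predictor.predict(0x108) == taken)
    predictor.update(0x108, taken)
```

An alternating branch defeats a plain 2-bit counter, but both history-based components learn it after a short warm-up, so the later predictions are all correct regardless of which one the selector trusts.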

Branch Target Buffer


[Figure: a 16-entry branch target buffer with entries 0x00 to 0x0f. Branch address 0x0000ac24 = 0b0000001010110000100100 has tag 0x2b0; branch address 0x0000aca4 = 0b0000001010110010100100 has tag 0x2b2. Both addresses select the same entry.]


Example
(all branches are taken)

TAG/Address   Instruction             Prediction
NULL/NULL     0xac24: beq r1, 0x20
              0xac24: beq r1, 0x20
              0xac24: beq r1, 0x20
              0xac24: beq r1, 0x20
              0xac24: beq r1, 0x20
              0xac24: beq r1, 0x20
              0xac24: beq r1, 0x20
              0xaca4: beq r2, 0x10
              0xac24: beq r1, 0x20


Example
(all branches are taken)

TAG/Address   Instruction             Prediction
NULL/NULL     0xac24: beq r1, 0x20    No match, no prediction
2b0/0xac48    0xac24: beq r1, 0x20    Match, 0xac48
2b0/0xac48    0xac24: beq r1, 0x20    Match, 0xac48
2b0/0xac48    0xac24: beq r1, 0x20    Match, 0xac48
2b0/0xac48    0xac24: beq r1, 0x20    Match, 0xac48
2b0/0xac48    0xac24: beq r1, 0x20    Match, 0xac48
2b0/0xac48    0xac24: beq r1, 0x20    Match, 0xac48
2b0/0xac48    0xaca4: beq r2, 0x10    No match, no prediction
2b2/0xacb8    0xac24: beq r1, 0x20    No match, no prediction

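The lookups in this example can be replayed with a small direct-mapped table. The sketch assumes 16 entries indexed by address bits 5-2, with the remaining upper bits as the tag; that split is consistent with the tags 0x2b0 and 0x2b2 shown for 0xac24 and 0xaca4, but the exact index bits are an assumption. `BranchTargetBuffer` is a hypothetical name.

```python
class BranchTargetBuffer:
    """Direct-mapped branch target buffer: each entry holds (tag, target)."""

    def __init__(self, index_bits=4):
        self.index_bits = index_bits
        self.entries = [None] * (1 << index_bits)

    def _split(self, pc):
        word = pc >> 2                                  # drop the byte offset
        index = word & ((1 << self.index_bits) - 1)
        tag = word >> self.index_bits
        return index, tag

    def lookup(self, pc):
        """Return the predicted target, or None when there is no match."""
        index, tag = self._split(pc)
        entry = self.entries[index]
        if entry is not None and entry[0] == tag:
            return entry[1]
        return None

    def update(self, pc, target):
        """Record the target of a taken branch, replacing any previous entry."""
        index, tag = self._split(pc)
        self.entries[index] = (tag, target)


btb = BranchTargetBuffer()
trace = [(0xAC24, 0xAC48)] * 7 + [(0xACA4, 0xACB8), (0xAC24, 0xAC48)]
predictions = []
for pc, target in trace:           # all branches are taken
    predictions.append(btb.lookup(pc))
    btb.update(pc, target)
```

The last lookup misses because the branch at 0xaca4 evicted the entry that 0xac24 had been using: the two addresses share an index and differ only in the tag.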

The Entire Process


Hardware Speculation
When the prediction is wrong, incorrectly executed instructions must be erased
Our current hardware support does not allow this
Extending the hardware --- hardware speculation
Separate the bypassing of results among instructions from the completion of an instruction
Add an instruction commit stage to Tomasulo's algorithm
Goal
Instructions commit in order


Reorder Buffer --- ROB


Hardware buffer that holds the results of instructions that have finished execution but not yet committed

Each entry has four fields: Instruction Type, Destination Field, Value Field, Ready Field


Tomasulo's Algorithm with ROB

[Figure: Tomasulo's datapath extended with a reorder buffer. Components: instruction queue, registers, ROB (with Reg#, store value, and store address paths), address unit, load buffer, memory unit, reservation stations feeding the FP adders and FP multipliers, operation bus, operand buses, and the common data bus (CDB).]


Four Execution Stages


Issue:
Get an instruction from the in-order instruction queue
Issue the instruction if a reservation station and a ROB entry are available, else stall
Read each register value if it is available in the register file or the ROB, else set a tag
Update the register entry and the ROB entry
Execute (a.k.a. issue):
Monitor the common data bus to wait for operands
Execute when both operands are ready (resolves RAW dependences)
Write Results:
Write the result on the CDB to all waiting reservation stations and the ROB
For a store, the value and address are sent to the ROB
Commit (a.k.a. complete, graduate):
Normal commit: the instruction is at the head of the ROB and its result is in the buffer
Update register
Remove instruction from ROB
Store instruction
Update memory
Remove instruction from ROB
Branch instruction
Correctly predicted: nothing to do
Incorrectly predicted: flush ROB

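The commit rules can be sketched as a function that retires at most one instruction from the head of the ROB per call. This is a simplified illustration, not the slide's hardware: the `RobEntry` fields mirror the four ROB fields plus a hypothetical `mispredicted` flag, and a mispredicted branch simply discards everything behind it.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class RobEntry:
    op: str                    # "alu", "store", or "branch"
    dest: object = None        # register name, or memory address for a store
    value: object = None
    ready: bool = False
    mispredicted: bool = False # only meaningful for branches


def commit(rob, registers, memory):
    """Commit the instruction at the head of the ROB if its result is ready."""
    if not rob or not rob[0].ready:
        return None            # head not finished yet: nothing commits this cycle
    entry = rob.popleft()
    if entry.op == "alu":
        registers[entry.dest] = entry.value   # architectural register update
    elif entry.op == "store":
        memory[entry.dest] = entry.value      # memory is written only at commit
    elif entry.op == "branch" and entry.mispredicted:
        rob.clear()                           # squash all younger instructions
    return entry


registers = {"R0": 3, "R1": 100, "R2": 11}
memory = {}
rob = deque([
    RobEntry("alu", "R2", 9, ready=True),
    RobEntry("branch", ready=True, mispredicted=True),
    RobEntry("alu", "R0", 6, ready=True),     # speculative, must not commit
])
while commit(rob, registers, memory) is not None:
    pass
```

Because registers and memory change only at commit, squashing speculative work is just a matter of dropping ROB entries; the younger `R0` write never becomes visible.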

Examples: Hardware Speculation


Program:
i1: R2 <-- R0 * R0
i2: bne R2, 0x20         (incorrectly predicted as not taken)
i3: 100(R1) <-- st R0    (branch delay slot)
i4: R0 <-- R0 + R0
Latencies: multiply 3 cycles; other units 1 cycle each

Time: T0
i1: issue

Reservation stations (Adder1, Adder2, Mult 1, Mult 2, Branch1):
Mult 1: Op1 V:3, Op2 V:3, ROB1

Reorder Buffer
Entry   Op    Dst   Val.   Ready
ROB1    ALU   R2    ---    No
ROB2
ROB3
ROB4

Register File: R0 = 3; R1 = 100; R2 = 11 (tag Mult 1)


Examples: Hardware Speculation


Time: T1
i1: execute 1
i2: issue (incorrectly predicted as not taken)

Reservation stations:
Mult 1: Op1 V:3, Op2 V:3, ROB1
Branch1: PC V:PC+4, Op1 T:Mult1, Op2 V:0x20, ROB2

Reorder Buffer
Entry   Op    Dst   Val.   Ready
ROB1    ALU   R2    ---    No
ROB2    bne   ---   ---    No
ROB3
ROB4

Register File: R0 = 3; R1 = 100; R2 = 11 (tag Mult 1)


Examples: Hardware Speculation


Time: T2
i1: execute 2
i2: wait (pred)
i3: issue (not shown)

Reservation stations:
Mult 1: Op1 V:3, Op2 V:3, ROB1
Branch1: PC V:PC+4, Op1 T:Mult1, Op2 V:0x20, ROB2

Reorder Buffer
Entry   Op    Dst   Val.   Ready
ROB1    ALU   R2    ---    No
ROB2    bne   ---   ---    No
ROB3    St    ---   ---    No
ROB4

Register File: R0 = 3; R1 = 100; R2 = 11 (tag Mult 1)


Examples: Hardware Speculation


Time: T3
i1: execute 3
i2: wait
i3: execute (addr. unit)
i4: issue

Reservation stations:
Adder1: Op1 V:3, Op2 V:3, ROB4
Mult 1: Op1 V:3, Op2 V:3, ROB1
Branch1: PC V:PC+4, Op1 T:Mult1, Op2 V:0x20, ROB2

Reorder Buffer
Entry   Op    Dst   Val.   Ready
ROB1    ALU   R2    ---    No
ROB2    bne   ---   ---    No
ROB3    St    ---   ---    No
ROB4    ALU   R0    ---    No

Register File: R0 = 3 (tag Adder 1); R1 = 100; R2 = 11 (tag Mult 1)


Examples: Hardware Speculation


Time: T4
i1: write result
i2: wait
i3: write result (stall)
i4: execute

Reservation stations:
Adder1: Op1 V:3, Op2 V:3, ROB4
Branch1: PC V:PC+4, Op1 V:9, Op2 V:0x20, ROB2

Reorder Buffer
Entry   Op    Dst   Val.   Ready
ROB1    ALU   R2    V:9    Yes
ROB2    bne   ---   ---    No
ROB3    St    ---   ---    No
ROB4    ALU   R0    ---    No

Register File: R0 = 3 (tag Adder 1); R1 = 100; R2 = 11 (tag Mult 1)


Examples: Hardware Speculation


Time: T5
i1: commit
i2: execute
i3: write results
i4: write results (stall)

Reservation stations:
Adder1: Op1 V:3, Op2 V:3, ROB4
Branch1: PC V:PC+4, Op1 V:9, Op2 V:0x20, ROB2

Reorder Buffer
Entry   Op    Dst   Val.   Ready
ROB1
ROB2    bne   ---   ---    No
ROB3    St    200   V:3    Yes
ROB4    ALU   R0    ---    No

Register File: R0 = 3 (tag Adder 1); R1 = 100; R2 is written by the commit of i1


Examples: Hardware Speculation


Time: T6
i2: write result (misprediction)
i3: wait for commit
i4: write results

Reservation stations:
Adder1: Op1 V:3, Op2 V:3, ROB4
Branch1: PC V:PC+4, Op1 V:9, Op2 V:0x20, ROB2

Reorder Buffer
Entry   Op    Dst   Val.   Ready
ROB1
ROB2    bne   ---   ---    Yes
ROB3    St    200   V:3    Yes
ROB4    ALU   R0    V:6    Yes

Register File: R0 = 3 (tag Adder 1); R1 = 100


Examples: Hardware Speculation


Time: T7
i2: commit (misprediction)
i3: wait for commit (branch delay slot)
i4: squashed

Reservation stations: all empty

Reorder Buffer
Entry   Op    Dst   Val.   Ready
ROB1
ROB2
ROB3    St    200   V:3    Yes
ROB4

Register File: R1 = 100; R0 no longer carries a tag


Examples: Hardware Speculation

Time: T8
i3: commit (branch delay slot)
Store 3 to memory location 200

Reservation stations: all empty
Reorder Buffer: ROB1 to ROB4 are empty
Register File: R1 = 100