Antonia Zhai
Department of Computer Science and Engineering
University of Minnesota
http://www.cs.umn.edu/~zhai
With slides from: Profs. Mowry, Falsafi, Hill, Hoe, Lipasti, Shen,
Smith, Sohi, Vijaykumar, Patterson, Culler
Branch on equal
beq: PC <-- (Ra == 0) ? PC + 4 + disp*4 : PC + 4

Instruction format:
  Bits 31-26   Bits 25-21   Bits 20-0
  0x39         ra           disp

EX: Execute
  Target <-- incrPC + (SignExtend(IR[20:0]) << 2)
  Z <-- (A == 0)
MEM: Memory
  PC <-- Z ? Target : incrPC
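The EX and MEM steps above can be sketched in a few lines of Python. This is only an illustration of the register-transfer statements on the slide; the function names and the example encoding (ra = 1, disp = 3) are made up for the sketch.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def beq_next_pc(pc: int, ir: int, reg_a: int) -> int:
    """Next PC for beq, following the EX and MEM steps of the slide."""
    incr_pc = pc + 4
    disp = sign_extend(ir & 0x1FFFFF, 21)    # IR[20:0]
    target = incr_pc + (disp << 2)           # EX: Target
    z = (reg_a == 0)                         # EX: zero test on A
    return target if z else incr_pc          # MEM: select the next PC


# Hypothetical instruction word: opcode 0x39, ra = 1, disp = 3
ir = (0x39 << 26) | (1 << 21) | 3
print(hex(beq_next_pc(0x100, ir, 0)))   # Ra == 0: branch taken
print(hex(beq_next_pc(0x100, ir, 5)))   # Ra != 0: fall through
```

With Ra equal to zero the next PC is incrPC + 3*4; otherwise it is simply incrPC.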
10/8/15
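[Figure: pipelined datapath for beq. Labels: PC, +4, IncrPC, Instr. Mem., Reg. Array (regA = Instr 25:21, regB = Instr 20:16, regW, datW, datA, datB), Xtnd << 2 on Instr 20:0, ALU (aluA, aluB, ALUout), Zero Test on Adata, Branch Flag, pipeline registers ID/EX and EX/MEM.]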
Problem
Useless instructions if branch taken

  addq $31, 63, $1    IF ID EX M  WB
  addq $31, 63, $2       IF ID EX M  WB
  addq $31, 63, $3          IF ID EX M  WB
  addq $31, 63, $4             IF ID EX M  WB

(Time runs left to right; the figure marks the cycle in which the PC is updated.)
Branch Prediction
Why does prediction work?
Underlying algorithm has regularities.
Loops are iterated multiple times
Data that is being operated on has regularities.
Instruction sequence has redundancies:
Artifacts of the way that humans/compilers think
E.g., Error checking branches are rarely taken
Prediction <-> compressible information streams?
Prediction allows us to break control dependence constraints
[Figure: example instruction sequence over r1, r2, r3 (r2 / r3, r1 / r3, r2 - r3) followed by beq r3, 100.]
Example
Instruction: branch at 0x08
Taken/Not Taken (actual outcomes, in order):
  Taken, Taken, Taken, Not taken, Taken, Not taken, Taken
Example (continued)
Prediction for the branch at 0x08: a single bit (0/1)
Problem:
Predictor changes too quickly
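To see how quickly a single-bit predictor flips, here is a minimal sketch that replays the outcome sequence from the example. It assumes the bit simply records the last outcome and that the initial prediction is "not taken"; the slide does not state the initial state, so that choice is an assumption.

```python
def one_bit_mispredictions(outcomes, initial=False):
    """Count mispredictions of a 1-bit (last-outcome) predictor.

    True means taken. `initial` is the assumed starting prediction.
    """
    state = initial
    misses = 0
    for taken in outcomes:
        if state != taken:
            misses += 1
        state = taken          # the bit only remembers the latest outcome
    return misses


T, NT = True, False
outcomes = [T, T, T, NT, T, NT, T]   # sequence from the example
print(one_bit_mispredictions(outcomes), "of", len(outcomes))
```

Under these assumptions the predictor misses 5 of the 7 branches: every isolated "not taken" costs two mispredictions, one on the way out and one on the way back.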
Example
for (i = 0; i < 100; i++) {
    for (j = 0; j < 10; j++) {
        total = a[i][j]
    }
}
What is the misprediction rate?
Solution:
(200 + 2)/(1000 + 100) = 202/1100, about 18.4%
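Prediction II
Two bits of state (00/01/10/11) for the branch at 0x08, with four states:
  Yes!  (predict taken, strong)
  Yes?  (predict taken, weak)
  No?   (predict not taken, weak)
  No!   (predict not taken, strong)
[Figure: state-transition diagram; edges are labeled T (taken) and NT (not taken).]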
Example
for (i = 0; i < 100; i++) {
    for (j = 0; j < 10; j++) {
        total = a[i][j]
    }
}
What is the misprediction rate?
Solution:
(102 + 3)/(1000 + 100) = 105/1100, about 9.5%
There are two branches:
1. Backward branch for the inner loop
   (1 misprediction out of 10 for each invocation,
   100 invocations,
   2 extra misses in the first invocation while the predictor warms up;
   misprediction rate: 102/1000)
2. Backward branch for the outer loop
   (3 mispredictions out of 100 in total)
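The same loop nest can be replayed against 2-bit predictors. This sketch assumes a 2-bit saturating counter per branch, starting at 00 (strongly not taken) and predicting taken when the counter is 2 or 3; the starting state is an assumption chosen because it matches the "2 extra misses" in the solution.

```python
def loop_branch_outcomes(outer=100, inner=10):
    """Yield (branch_id, taken) for the two backward branches of the loop nest."""
    for i in range(outer):
        for j in range(inner):
            yield "inner", j < inner - 1     # taken except on loop exit
        yield "outer", i < outer - 1


def simulate_two_bit(events):
    """Return per-branch misprediction counts for 2-bit saturating counters."""
    counters = {}                            # branch_id -> counter in 0..3
    misses = {}
    for branch, taken in events:
        c = counters.get(branch, 0)          # assumed initial state: 00
        if (c >= 2) != taken:                # predict taken for 10 and 11
            misses[branch] = misses.get(branch, 0) + 1
        counters[branch] = min(c + 1, 3) if taken else max(c - 1, 0)
    return misses


misses = simulate_two_bit(loop_branch_outcomes())
print(misses)
```

Each loop exit now costs a single misprediction, because one "not taken" only moves the counter from strong to weak taken.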
Generalization
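Using an N-bit saturating counter as a predictor
If branch taken and counter value < (2^N - 1):
    Increment counter
If branch not taken and counter value > 0:
    Decrement counter
Prediction: taken if the counter is in the upper half of its range (counter >= 2^(N-1)), not taken otherwise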
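A minimal sketch of the N-bit saturating counter, assuming the common convention that the branch is predicted taken when the counter is in the upper half of its range. The class and method names are illustrative only.

```python
class SaturatingCounter:
    """N-bit saturating counter used as a branch predictor."""

    def __init__(self, n_bits: int, value: int = 0):
        self.max_value = (1 << n_bits) - 1     # 2^N - 1
        self.threshold = 1 << (n_bits - 1)     # assumed rule: taken if >= 2^(N-1)
        self.value = value

    def predict(self) -> bool:
        return self.value >= self.threshold

    def update(self, taken: bool) -> None:
        if taken and self.value < self.max_value:
            self.value += 1                    # saturate at 2^N - 1
        elif not taken and self.value > 0:
            self.value -= 1                    # saturate at 0


counter = SaturatingCounter(n_bits=3)
for _ in range(10):
    counter.update(True)       # a long run of taken branches
counter.update(False)
counter.update(False)          # two not-taken outcomes in a row
print(counter.value, counter.predict())
```

A wider counter tolerates more consecutive contrary outcomes before the prediction flips, at the cost of adapting more slowly when the branch behavior really changes.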
Prediction III
[Figure: the prediction is selected by the outcome of the previous branch (Prev. Branch: Taken or NT); edges are labeled T and NT.]
Example
for (i = 0; i < 10; i++)
{
    a = random(-100, 100)   // a is a random number from -100 to 100
    if (a > 1)              // b1
        conquer the world
    if (a < -1)             // b2
        clean my living room
}                           // b3
Input:
-10, 7, 5, 10, -2, -55, 4, -89, 33, -3
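Branch #2: B2 predictor correlated with B1
(B2 Predictor is written as: prediction if B1 was not taken / prediction if B1 was taken)

  Input a   B2 Predictor   B1 Action   B2 Prediction   B2 Action
  -10       NT / NT        NT          NT              T
    7       T / NT         T           NT              NT
    5       T / NT         T           NT              NT
   10       T / NT         T           NT              NT
   -2       T / NT         NT          T               T
  -55       T / NT         NT          T               T
    4       T / NT         T           NT              NT
  -89       T / NT         NT          T               T
   33       T / NT         T           NT              NT
   -3       T / NT         NT          T               T

Only the first B2 prediction is wrong.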
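The behavior of the correlated B2 predictor can be replayed in a few lines. The sketch assumes, as the example appears to, that "taken" means the if-condition is true, that B2 keeps two 1-bit predictions selected by the outcome of B1, and that both start as not taken.

```python
def run_correlated_b2(inputs):
    """Replay b1/b2 with a (1,1) predictor for b2, correlated with b1."""
    b2_pred = [False, False]      # [prediction if b1 NT, prediction if b1 T]
    rows = []
    misses = 0
    for a in inputs:
        b1_taken = a > 1          # b1
        b2_taken = a < -1         # b2
        slot = int(b1_taken)      # the last branch outcome selects the bit
        prediction = b2_pred[slot]
        rows.append((a, b1_taken, prediction, b2_taken))
        if prediction != b2_taken:
            misses += 1
        b2_pred[slot] = b2_taken  # 1-bit update: remember the last outcome
    return rows, misses, b2_pred


inputs = [-10, 7, 5, 10, -2, -55, 4, -89, 33, -3]
rows, misses, final_pred = run_correlated_b2(inputs)
for a, b1, pred, b2 in rows:
    print(a, "T" if b1 else "NT", "T" if pred else "NT", "T" if b2 else "NT")
print("B2 mispredictions:", misses)
```

After the first input the two bits settle at T / NT and B2 is predicted correctly for the rest of the sequence, since B2 is taken only when B1 is not.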
Definition
(1, 1) predictor
Uses the behavior of the last branch
Selects from 2^1 sets of choices
Each choice is coded with 1 bit
(m, n) predictor
Use the history of m branches
Select from 2^m sets of choices
Each choice is coded with n bits
Example: How many bits are there in a 1024-entry (2, 2) branch predictor?
1024 * (2^2) * 2 = 8192 bits
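The storage calculation generalizes directly from the definition: each entry holds 2^m choices of n bits each. A small helper (the function name is only for illustration) makes the arithmetic explicit.

```python
def predictor_bits(entries: int, m: int, n: int) -> int:
    """Total state bits of an (m, n) predictor with the given number of entries."""
    return entries * (2 ** m) * n


print(predictor_bits(1024, 2, 2))   # the 1024-entry (2, 2) example
print(predictor_bits(1024, 1, 1))   # a (1, 1) predictor of the same size
```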
A (2,2) Predictor
[Figure: (2,2) predictor indexed by the Branch Address, with a 2-bit history field (XX).]
Local predictor:
Global predictor:
Tournament Predictor
Alpha branch predictor
4K-entry 2-bit predictor-predictor (chooses which predictor to use)
A local predictor
[Figure: branch target buffer table with entries 0x01 through 0x0f.]
0x0000ac24 = 0b0000001010110000100100, tag 0x2b0
0x0000aca4 = 0b0000001010110010100100, tag 0x2b2
Example
(columns on the slide: TAG/Address, Instruction, Prediction; the Instruction column did not survive)

  TAG/Address    Prediction
  NULL/NULL      No match, no prediction
  2b0/0xac48     Match, 0xac48
  2b0/0xac48     Match, 0xac48
  2b0/0xac48     Match, 0xac48
  2b0/0xac48     Match, 0xac48
  2b0/0xac48     Match, 0xac48
  2b0/0xac48     Match, 0xac48
  2b0/0xac48     No match, no prediction
  2b2/0xacb8     No match, no prediction
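The table is consistent with a small direct-mapped branch target buffer. The sketch below assumes 16 entries indexed by PC bits [5:2] with the remaining upper bits as the tag, which reproduces the tags 0x2b0 and 0x2b2 for 0xac24 and 0xaca4. The access order (seven executions of the branch at 0xac24, then 0xaca4, then 0xac24 again) and the update-after-every-access policy are assumptions for illustration.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB: each entry is None or a (tag, target) pair."""

    def __init__(self, index_bits: int = 4):
        self.index_bits = index_bits
        self.entries = [None] * (1 << index_bits)

    def split(self, pc: int):
        word = pc >> 2                                   # drop byte offset
        index = word & ((1 << self.index_bits) - 1)      # PC[5:2]
        tag = word >> self.index_bits                    # remaining upper bits
        return index, tag

    def predict(self, pc: int):
        index, tag = self.split(pc)
        entry = self.entries[index]
        if entry is not None and entry[0] == tag:
            return entry[1]                              # match: predicted target
        return None                                      # no match, no prediction

    def update(self, pc: int, target: int) -> None:
        index, tag = self.split(pc)
        self.entries[index] = (tag, target)


targets = {0xAC24: 0xAC48, 0xACA4: 0xACB8}
accesses = [0xAC24] * 7 + [0xACA4, 0xAC24]               # assumed access order

btb = BranchTargetBuffer()
predictions = []
for pc in accesses:
    predictions.append(btb.predict(pc))
    btb.update(pc, targets[pc])                          # assume every branch is taken
print([hex(p) if p is not None else None for p in predictions])
```

Both branches map to the same entry, so the second branch evicts the first and the following access to 0xac24 misses again.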
Hardware Speculation
When the prediction is wrong, incorrectly executed instructions must be erased
Our hardware support so far does not allow this
Extending the hardware: hardware speculation
Separate the bypassing of results among instructions from the completion of an instruction
Add an instruction commit stage to Tomasulo's algorithm
Goal
Instructions commit in order
Reorder buffer (ROB) entry fields:
  Instruction Type
  Destination Field
  Value Field
  Ready Field
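A rough sketch of in-order commit using the four fields above. It is a simplification, not the exact hardware: it commits at most one instruction per call, squashes everything behind a mispredicted branch, and ignores branch delay slots. The `mispredicted` flag and all names are additions made for the illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class RobEntry:
    op: str                    # instruction type: "ALU", "St" or "bne"
    dest: object = None        # register name or memory address
    value: object = None       # result, once known
    ready: bool = False        # has the instruction finished executing?
    mispredicted: bool = False # hypothetical flag set when a branch resolves


def commit_one(rob, regs, memory):
    """Commit the oldest entry if it is ready; return its op, or None."""
    if not rob or not rob[0].ready:
        return None                        # head not finished: nothing commits
    entry = rob.popleft()
    if entry.op == "ALU":
        regs[entry.dest] = entry.value     # architectural state changes only here
    elif entry.op == "St":
        memory[entry.dest] = entry.value
    elif entry.op == "bne" and entry.mispredicted:
        rob.clear()                        # squash all younger instructions
    return entry.op


regs = {"R0": 3, "R1": 100, "R2": 11}
memory = {}
rob = deque([
    RobEntry("ALU", "R2", 9, ready=True),
    RobEntry("bne", ready=True, mispredicted=True),
    RobEntry("ALU", "R0", 6, ready=True),  # speculative, must not reach R0
])
first = commit_one(rob, regs, memory)
second = commit_one(rob, regs, memory)
third = commit_one(rob, regs, memory)
print(first, second, third, regs)
```

Because results reach the register file only at commit, the speculative R0 + R0 result is discarded without ever touching architectural state.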
[Figure: Tomasulo datapath. Labels: Instr. Queue, Registers, Reg#, Addr. Unit, Load buffer, Store Addr, Store Value, Mem. Unit, Operation Bus, Operand Buses, Res. Stations, FP Adders, FP Multipliers.]
Branch incorrectly predicted as not taken

Instruction sequence:
  i1: R2 <- R0 * R0
  i2: bne R2, 0x20
  i3: 100(R1) <- st R0    (branch delay slot)
  i4: R0 <- R0 + R0
Latencies shown: 3 cycles for the multiply, 1 cycle for the other units.
Functional units: Adder1, Adder2, Mult 1, Mult 2, Branch1

Time: T0
i1: issue
Reservation stations: Mult 1 (Op1 V:3, Op2 V:3, ROB1)
Reorder Buffer:
  Entry  Op   Dst  Val.  Ready
  ROB1   ALU  R2   ---   No
  ROB2
  ROB3
  ROB4
Register File: R0 = 3, R1 = 100, R2 = 11 (R2 tagged Mult 1)
Time: T1
i1: execute 1
i2: issue
Reservation stations: Mult 1 (Op1 V:3, Op2 V:3, ROB1)
Reorder Buffer:
  Entry  Op   Dst  Val.  Ready
  ROB1   ALU  R2   ---   No
  ROB2   bne  ---  ---   No
  ROB3
  ROB4
Time: T2
i1: execute 2
i2: wait (pred)
i3: issue (not shown)
Reservation stations: Mult 1 (Op1 V:3, Op2 V:3, ROB1)
Reorder Buffer:
  Entry  Op   Dst  Val.  Ready
  ROB1   ALU  R2   ---   No
  ROB2   bne  ---  ---   No
  ROB3   St   ---  ---   No
  ROB4
Time: T3
i1: execute 3
i2: wait
i3: execute (addr. unit)
i4: issue
Reservation stations: Adder1 (Op1 V:3, Op2 V:3, ROB4); Mult 1 (Op1 V:3, Op2 V:3, ROB1)
Reorder Buffer:
  Entry  Op   Dst  Val.  Ready
  ROB1   ALU  R2   ---   No
  ROB2   bne  ---  ---   No
  ROB3   St   ---  ---   No
  ROB4   ALU  R0   ---   No
Register File: R0 = 3 (now tagged Adder 1), R1 = 100, R2 = 11 (tagged Mult 1)
Time: T4
Reservation stations: Adder1 (Op1 V:3, Op2 V:3, ROB4); Branch1 (Op1 V:9, Op2 V:0x20, ROB2; V:PC+4)
Reorder Buffer:
  Entry  Op   Dst  Val.  Ready
  ROB1   ALU  R2   V:9   Yes
  ROB2   bne  ---  ---   No
  ROB3   St   ---  ---   No
  ROB4   ALU  R0   ---   No
Time: T5
i1: commit
i2: execute
i3: write results
i4: write results (stall)
Reservation stations: Adder1 (Op1 V:3, Op2 V:3, ROB4); Branch1 (Op1 V:9, Op2 V:0x20, ROB2; V:PC+4)
Reorder Buffer:
  Entry  Op   Dst  Val.  Ready
  ROB1
  ROB2   bne  ---  ---   No
  ROB3   St   200  V:3   Yes
  ROB4   ALU  R0   ---   No
Time: T6
Reservation stations: Adder1 (Op1 V:3, Op2 V:3, ROB4); Branch1 (Op1 V:9, Op2 V:0x20, ROB2; V:PC+4)
Reorder Buffer:
  Entry  Op   Dst  Val.  Ready
  ROB1
  ROB2   bne  ---  ---   Yes
  ROB3   St   200  V:3   Yes
  ROB4   ALU  R0   V:6   Yes
Time: T7 (branch incorrectly predicted as not taken)
i2: commit (misprediction)
i3: wait for commit (i3 is in the branch delay slot)
i4: Squashed
Reorder Buffer:
  Entry  Op   Dst  Val.  Ready
  ROB1
  ROB2
  ROB3   St   200  V:3   Yes
  ROB4
Time: T8
i3: commit
Reorder Buffer:
  Entry  Op   Dst  Val.  Ready
  ROB1
  ROB2
  ROB3
  ROB4