Joel Emer
Computer Science and Artificial Intelligence Laboratory
M.I.T.
http://www.csg.csail.mit.edu/6.823
* Assuming zero detect on register read
October 20, 2008 http://www.csg.csail.mit.edu/6.823 Arvind & Emer
A  PC Generation/Mux
P  Instruction Fetch Stage 1
F  Instruction Fetch Stage 2
B  Branch Address Calc/Begin Decode        (Branch Target Address Known)
I  Complete Decode
J  Steer Instructions to Functional units
R  Register File Read
E  Integer Execute                         (Branch Direction & Jump Register Target Known)
   Remainder of execute pipeline (+ another 6 stages)
Hardware solutions
• Find something else to do - delay slots
Replaces pipeline bubbles with useful work
(requires software cooperation)
• Speculate - branch prediction
Speculative execution of instructions beyond
the branch
Branch Prediction
Motivation:
Branch penalties limit performance of deeply pipelined
processors
[Figure: JZ branches by direction. Backward: 90% taken. Forward: 50% taken.]
Dynamic Prediction
Prediction as a feedback control process
[Figure: Input -> Predictor -> Prediction, with Truth/Feedback returned to the Predictor]
Operations
• Predict
• Update
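Predictor Primitive
• Indexed table holding values (Depth entries, each Width bits wide)
• Operations
– Predict: read the entry selected by Index (I) to give the Prediction (P)
– Update: apply the Update (U) to the entry selected by Index
• Algebraic notation
Prediction = P[Width, Depth](Index; Update)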
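The primitive above can be pictured as a small class: a table of `depth` entries, each `width` bits wide, with a predict (read) and an update (write) operation. This is a minimal sketch, not the lecture's definition. The class name `PredictorTable`, the modulo indexing, and the masking of written values are illustrative assumptions.

```python
class PredictorTable:
    """Hypothetical model of an indexed predictor table."""

    def __init__(self, width, depth):
        self.width = width                  # bits per entry
        self.depth = depth                  # number of entries
        self.mask = (1 << width) - 1        # keeps stored values within width bits
        self.entries = [0] * depth

    def predict(self, index):
        # Predict: read the entry selected by the index.
        # Indices beyond the table wrap around (assumption), so they alias.
        return self.entries[index % self.depth]

    def update(self, index, value):
        # Update: write a new value into the selected entry.
        self.entries[index % self.depth] = value & self.mask


table = PredictorTable(width=2, depth=8)
table.update(3, 0b10)
print(table.predict(3))    # 2
print(table.predict(11))   # 2, because 11 % 8 == 3 selects the same entry
```

The second lookup shows aliasing: two different indices share one entry whenever the table is smaller than the index space.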
Temporal correlation
The way a branch resolves may be a good
predictor of the way it will resolve at the next
execution
Spatial correlation
Several branches may resolve in a highly
correlated manner (a preferred path of
execution)
One-bit Predictor
[Figure: table of 1-bit entries indexed by PC; the entry read (P) is the Prediction, and the update (U) is Taken]
A21064(PC; T) = P[ 1, 2K ](PC; T)
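A one-bit predictor simply remembers the last outcome of the branch that maps to each entry. The sketch below assumes "2K" means 2048 entries, assumes 4-byte instructions for the index, and starts every entry at "not taken". The names `OneBitPredictor` and `count_mispredictions` are hypothetical.

```python
class OneBitPredictor:
    """One bit per entry: predict whatever the branch did last time."""

    def __init__(self, entries=2048):       # assumed reading of "2K"
        self.entries = entries
        self.table = [0] * entries          # 0 = predict not taken (assumed reset state)

    def _index(self, pc):
        return (pc >> 2) % self.entries     # assumes 4-byte instructions

    def predict(self, pc):
        return self.table[self._index(pc)] == 1

    def update(self, pc, taken):
        self.table[self._index(pc)] = 1 if taken else 0


def count_mispredictions(predictor, pc, outcomes):
    """Predict, compare with the real outcome, then update."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# A loop branch: taken 9 times, then falls through; the loop runs 3 times.
loop_branch = ([True] * 9 + [False]) * 3
misses = count_mispredictions(OneBitPredictor(), 0x400, loop_branch)
print(misses)   # 6: two mispredictions per loop execution
```

Each loop execution costs two mispredictions: one on exit, and one on re-entry because the exit overwrote the single bit.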
Two-bit counter states:
1 1  Strongly taken
1 0  Weakly taken
0 1  Weakly ¬taken
0 0  Strongly ¬taken
On taken, move one state toward 1 1; on ¬taken, move one state toward 0 0.
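Two-bit Predictor
[Figure: table of 2-bit entries indexed by PC (I); the entry read (P) gives the Prediction, and the update (U) is a +/- adder driven by Taken]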
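The two-bit scheme can be sketched as a table of saturating counters: the update adds or subtracts one, and the upper bit of the counter is the prediction. The table size, the PC indexing, and the initial state (0 1, weakly ¬taken) are assumptions for illustration. `TwoBitPredictor` is a hypothetical name.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters (0..3); predict taken when counter >= 2."""

    def __init__(self, entries=1024):       # assumed size
        self.entries = entries
        self.table = [1] * entries          # start weakly not taken (assumption)

    def _index(self, pc):
        return (pc >> 2) % self.entries     # assumes 4-byte instructions

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2   # upper bit of the counter

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)   # saturate at 11
        else:
            self.table[i] = max(0, self.table[i] - 1)   # saturate at 00


def count_mispredictions(predictor, pc, outcomes):
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# Same loop branch as a typical example: taken 9 times, then not taken, 3 times over.
loop_branch = ([True] * 9 + [False]) * 3
misses = count_mispredictions(TwoBitPredictor(), 0x400, loop_branch)
print(misses)   # 4: one warm-up miss, then one miss per loop exit
```

Because a single ¬taken outcome only moves the counter from strongly to weakly taken, the loop exit no longer causes a second misprediction on re-entry.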
[Figure: k bits form the BHT Index into a 2^k-entry BHT, 2 bits/entry, looked up alongside the I-Cache; the fetched Instruction supplies the Opcode and offset]
History Register
[Figure: table indexed by PC (I); the entry read (P) is the History, and the update (U) concatenates the old entry with Taken]
History(PC, T) = P(PC; P || T)
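The update `P || T` amounts to a shift register: the old history moves left by one bit and the newest outcome enters at the bottom. The sketch below assumes a fixed history length, so the oldest bit falls off the top. The names, sizes, and modulo indexing are illustrative assumptions.

```python
class HistoryTable:
    """Per-entry shift registers recording recent branch outcomes."""

    def __init__(self, history_bits=4, entries=16):   # assumed sizes
        self.mask = (1 << history_bits) - 1
        self.entries = entries
        self.table = [0] * entries

    def read(self, pc):
        return self.table[pc % self.entries]

    def update(self, pc, taken):
        i = pc % self.entries
        # Concatenate: old history shifted left, new outcome in the low bit,
        # truncated to history_bits so the oldest outcome is dropped.
        self.table[i] = ((self.table[i] << 1) | int(taken)) & self.mask


h = HistoryTable()
for outcome in [True, True, False, True]:
    h.update(5, outcome)
print(bin(h.read(5)))   # 0b1101

h.update(5, True)
print(bin(h.read(5)))   # 0b1011, the oldest outcome has been shifted out
```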
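Global History
[Figure: a single history register (index 0) is updated by Concat with Taken; the Global History selects a +/- counter, also updated with Taken, that gives the Prediction]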
Local History
[Figure: a history stage (Concat) feeding a counter stage (+/-), both updated with Taken]
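Two-level Predictor
[Figure: the Global History register (index 0, Concat update) shifts in the Taken/¬Taken results of each branch; the PC and the Global History are concatenated to select a +/- counter, updated with Taken, that gives the Prediction]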
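A two-level predictor can be sketched by combining a global history shift register with a table of two-bit counters selected by the PC and that history together. The bit widths, the initial counter value, and the exact way the PC and history are combined into one index are assumptions made for illustration. `TwoLevelPredictor` is a hypothetical name.

```python
class TwoLevelPredictor:
    """Global history register plus 2-bit counters indexed by (PC bits, history)."""

    def __init__(self, history_bits=2, pc_bits=4):    # assumed sizes
        self.history_bits = history_bits
        self.history_mask = (1 << history_bits) - 1
        self.pc_mask = (1 << pc_bits) - 1
        self.history = 0                               # shared by all branches
        self.counters = [1] * (1 << (history_bits + pc_bits))

    def _index(self, pc):
        # Concatenate low PC bits with the global history (assumed layout).
        return (((pc >> 2) & self.pc_mask) << self.history_bits) | self.history

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        c = self.counters[i]
        self.counters[i] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift the outcome of this branch into the global history.
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


predictor = TwoLevelPredictor()
outcomes = [True, False] * 20          # a strictly alternating branch
misses = 0
for taken in outcomes:
    if predictor.predict(0x400) != taken:
        misses += 1
    predictor.update(0x400, taken)
print(misses)   # 2: only the warm-up mispredictions
```

The alternating pattern defeats a single counter, but here each history value selects its own counter, so the pattern is learned after a short warm-up.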
Choosing Predictors
[Figure: LHist and GHist each produce a prediction; a Chooser selects which one becomes the final Prediction]
[Figure labels: PC; Choice Prediction (4,096x2b)]
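A chooser can be sketched as one more table of two-bit counters that tracks which component predictor has been more reliable. The encoding (low values favor the local prediction, high values the global one), the update rule (train only when the two components disagree), and the way the table is indexed are assumptions here, not details from the slide. Only the 4,096 x 2-bit size comes from the figure.

```python
class Chooser:
    """2-bit counters selecting between a local and a global prediction."""

    def __init__(self, entries=4096):
        self.entries = entries
        self.choice = [1] * entries    # < 2 favors local, >= 2 favors global (assumed)

    def select(self, index, local_pred, global_pred):
        if self.choice[index % self.entries] >= 2:
            return global_pred
        return local_pred

    def update(self, index, local_pred, global_pred, taken):
        if local_pred == global_pred:
            return                      # nothing to learn when both agree
        i = index % self.entries
        if global_pred == taken:
            self.choice[i] = min(3, self.choice[i] + 1)
        else:
            self.choice[i] = max(0, self.choice[i] - 1)


chooser = Chooser()
# Hypothetical situation: the local predictor says not taken, the global
# predictor says taken, and the branch is actually taken.
first = chooser.select(7, local_pred=False, global_pred=True)
chooser.update(7, local_pred=False, global_pred=True, taken=True)
second = chooser.select(7, local_pred=False, global_pred=True)
print(first, second)   # False True
```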
Limitations of BHTs
A BHT only predicts branch direction. It therefore cannot redirect the
fetch stream until after the branch target is determined.
A  PC Generation/Mux
P  Instruction Fetch Stage 1
F  Instruction Fetch Stage 2
B  Branch Address Calc/Begin Decode
I  Complete Decode
J  Steer Instructions to Functional units
R  Register File Read
E  Integer Execute
   Remainder of execute pipeline (+ another 6 stages)
[Figure annotations, marked against the stages above: "Correctly predicted taken branch penalty" and "Jump Register penalty"]
[Figure: table looked up by PC; each entry holds a target and BP]
Address Collisions
A  PC Generation/Mux
P  Instruction Fetch Stage 1               (BTB)
F  Instruction Fetch Stage 2
B  Branch Address Calc/Begin Decode        (BHT)
I  Complete Decode
J  Steer Instructions to Functional units
R  Register File Read
E  Integer Execute
BHT in later pipeline stage corrects when BTB misses a predicted taken branch.
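The split between an early BTB and a later BHT can be sketched as two steps: the BTB supplies the next fetch PC from the PC alone, and a later stage redirects fetch only if the BHT predicts taken for a branch the BTB did not hold. The full-PC tag check, the table size, the `pc + 4` fall-through, and all names are assumptions for illustration.

```python
class BranchTargetBuffer:
    """Direct-mapped table from branch PC to predicted target, with a tag check."""

    def __init__(self, entries=64):              # assumed size
        self.entries = entries
        self.tags = [None] * entries
        self.targets = [0] * entries

    def _index(self, pc):
        return (pc >> 2) % self.entries          # assumes 4-byte instructions

    def lookup(self, pc):
        i = self._index(pc)
        # The tag check stops a colliding address from using another branch's target.
        return self.targets[i] if self.tags[i] == pc else None

    def insert(self, pc, target):
        i = self._index(pc)
        self.tags[i] = pc
        self.targets[i] = target


def next_fetch_pc(btb, pc):
    """Early stage: use the BTB target on a hit, else fall through."""
    target = btb.lookup(pc)
    return target if target is not None else pc + 4


def late_correction(btb, pc, bht_taken, computed_target):
    """Later stage: redirect only when the BHT says taken and the BTB missed."""
    if bht_taken and btb.lookup(pc) is None:
        return computed_target
    return None


btb = BranchTargetBuffer()
print(hex(next_fetch_pc(btb, 0x100)))                 # 0x104, BTB miss
print(hex(late_correction(btb, 0x100, True, 0x40)))   # 0x40, BHT corrects the miss
btb.insert(0x100, 0x40)
print(hex(next_fetch_pc(btb, 0x100)))                 # 0x40, BTB hit
print(btb.lookup(0x200))                              # None, same index but different tag
```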
Line Prediction
(Alpha 21[234]64)
[Figure components: Instr Cache, Line Predictor, Branch Predictor, PC Calc, Return Stack, Indirect Branch Predictor]
[Figure: return stack holding &fd(), &fc(), &fb(), listed in the order shown; k entries (typically k=8-16)]
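A return stack of k entries can be sketched as a bounded stack: a call pushes its return address, and a return pops the most recent one as the predicted target. The overflow policy (discard the oldest entry) and the behavior on an empty stack (no prediction) are assumptions. The class name and the addresses are hypothetical.

```python
class ReturnStack:
    """Bounded stack of predicted return addresses."""

    def __init__(self, k=8):
        self.k = k
        self.stack = []

    def push(self, return_address):
        # On a call: remember where the matching return should go.
        if len(self.stack) == self.k:
            self.stack.pop(0)           # full: drop the oldest entry (assumed policy)
        self.stack.append(return_address)

    def pop(self):
        # On a return: predict the most recently pushed address.
        return self.stack.pop() if self.stack else None


ras = ReturnStack(k=8)
for address in (0xB00, 0xC00, 0xD00):   # hypothetical return addresses for nested calls
    ras.push(address)
predictions = [ras.pop(), ras.pop(), ras.pop(), ras.pop()]
print(predictions)   # [3328, 3072, 2816, None]
```

Nested calls deeper than k lose their oldest entries, so the outermost returns in a deep call chain go unpredicted.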
Best predictors reflect program behavior
[Figure: pipeline stages PC, Decode, Reg Read, Execute, annotated with the predictors BTB and BP, JMP, Ret]