CS-421 Parallel Processing BE (CIS) Batch 2004-05

Handout_7

Branch Prediction
Static Branch Prediction
In static branch prediction, the prediction does not depend on the run-time behavior of the
branch: the predictor always predicts a given branch in the same direction (taken or not-taken).
Dynamic Branch Prediction
• The goal of dynamic branch prediction is to use the run-time behavior of a branch to predict
more accurately which direction (taken or not-taken) it will follow.
• To achieve this, the following control structures are employed.
Branch Prediction Buffer
• A branch prediction buffer (BPB), also called a branch history table (BHT), holds the
required run-time information.
o The BHT is indexed by the lower 16 bits of the branch instruction address. Each entry
is a 1-bit or 2-bit value, depending on whether a 1-bit or 2-bit predictor is used.

[Figure: a BHT indexed by the lower 16 bits of the PC; each entry holds a prediction, e.g. T, NT, T, T, NT, NT, ...]
o The indexed entry gives the predicted direction of the branch.
o In the 1-bit prediction scheme, the entry is inverted if the branch was predicted incorrectly
and left unchanged otherwise.
o In effect, the predictor guesses that a branch will behave the same way as it did the last time.
o The following state diagram shows the operation of a 1-bit branch predictor.
[State diagram: 1-bit branch predictor, shown as a transition table]

Current state            Next state on Taken Branch   Next state on Not-Taken Branch
0 (Predict Not-Taken)    1                            0
1 (Predict Taken)        1                            0
o The drawback of a 1-bit branch predictor is that it reverses its prediction after just one
misprediction. This lowers prediction accuracy, especially for highly regular branches
that strongly favor taken or not-taken, as most branches do (e.g. backward branches
used as loop tests are mostly taken).

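o On the first iteration of a loop that has run before, the predictor still holds NT from the
previous loop exit, so we predict the backward branch to be NT but it is actually T.
o On the last iteration we predict the backward branch to be T but it is actually NT.
o Thus, each execution of a loop suffers two mispredictions. This is especially bad for nested
loops, where the inner loop is executed many times.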
o Prediction accuracy can be improved by using more bits per entry. In
an n-bit prediction scheme, an n-bit saturating counter is used.
o An ordinary n-bit counter wraps around to 0 when incremented at the count of $2^n - 1$;
similarly, it wraps around to $2^n - 1$ when decremented at 0.
o An n-bit saturating counter, however, stays (i.e. saturates) at $2^n - 1$ when
incremented at $2^n - 1$; similarly, it stays (i.e. saturates) at 0 when decremented at 0.
o The lower half of the counter states, from 0 to $2^{n-1} - 1$, are treated as predict
not-taken, whereas the upper half, from $2^{n-1}$ to $2^n - 1$, are treated as predict
taken.
o The counter is incremented on a taken branch and decremented on a not-taken branch.
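The counter rules above can be sketched in a few lines of Python. This is only an illustrative model, not a description of any particular hardware; the names `SaturatingCounter` and `wrapping_increment` are made up for this example.

```python
class SaturatingCounter:
    """Illustrative n-bit saturating counter used as a branch predictor."""

    def __init__(self, n_bits: int, initial: int = 0) -> None:
        self.max_value = (1 << n_bits) - 1   # 2^n - 1, the saturation point
        self.threshold = 1 << (n_bits - 1)   # 2^(n-1), first predict-taken state
        self.value = initial

    def predict_taken(self) -> bool:
        # Upper half of the states predicts taken, lower half not-taken.
        return self.value >= self.threshold

    def update(self, taken: bool) -> None:
        # Increment on taken, decrement on not-taken, never rolling over.
        if taken:
            self.value = min(self.value + 1, self.max_value)
        else:
            self.value = max(self.value - 1, 0)


def wrapping_increment(value: int, n_bits: int) -> int:
    """Ordinary (non-saturating) n-bit increment, shown for contrast."""
    return (value + 1) % (1 << n_bits)


counter = SaturatingCounter(n_bits=2, initial=3)
counter.update(taken=True)            # already at 2^n - 1, so it stays there
saturated_value = counter.value
wrapped_value = wrapping_increment(3, n_bits=2)   # ordinary counter resets to 0
```

With `n_bits=1` the same class reduces to the 1-bit scheme: the single bit simply records the last outcome of the branch.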
Example: 2-bit Branch Predictor
o The 2-bit branch predictor adds inertia to the prediction: starting from a saturated state,
the prediction changes only after the branch goes against it twice in a row.
o The counter saturates at counts of 11 and 00 (i.e. it does not roll over). The FSM
below describes the operation of a 2-bit branch predictor.

[FSM: 2-bit branch predictor, shown as a transition table]

Current state             Next state on Taken Branch   Next state on Not-Taken Branch
00 (Predict Not-Taken)    01                           00
01 (Predict Not-Taken)    10                           00
10 (Predict Taken)        11                           01
11 (Predict Taken)        11                           10
o As discussed above for backward branches in loops, 2-bit prediction improves accuracy
over the 1-bit scheme: once the counter has saturated at 11, a loop exit moves it only to 10,
so the next execution of the loop is still predicted taken and only the exit is mispredicted.

o In practice, a 2-bit branch predictor gives fairly good performance (93% accuracy),
and therefore many architectures rely on 2-bit predictors for branch prediction.
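To make the loop argument concrete, the sketch below counts mispredictions of a single backward branch in a nested loop under a 1-bit and a 2-bit counter. The loop sizes (100 outer iterations, 10 inner iterations), the initial counter value of 0, and the function names are assumptions chosen for illustration only.

```python
def count_mispredictions(outcomes, n_bits, initial=0):
    """Run one n-bit saturating counter over a list of branch outcomes."""
    max_value = (1 << n_bits) - 1
    threshold = 1 << (n_bits - 1)
    counter = initial
    misses = 0
    for taken in outcomes:
        predicted_taken = counter >= threshold
        if predicted_taken != taken:
            misses += 1
        # Saturating update: up on taken, down on not-taken.
        counter = min(counter + 1, max_value) if taken else max(counter - 1, 0)
    return misses


def inner_loop_branch_trace(outer_iterations, inner_iterations):
    """Outcomes of the inner loop's backward branch across all outer iterations.

    In each run of the inner loop the branch is taken on every iteration
    except the last, where the loop exits (not-taken).
    """
    one_run = [True] * (inner_iterations - 1) + [False]
    return one_run * outer_iterations


trace = inner_loop_branch_trace(outer_iterations=100, inner_iterations=10)
one_bit_misses = count_mispredictions(trace, n_bits=1)   # 2 per run: 200
two_bit_misses = count_mispredictions(trace, n_bits=2)   # 3 warm-up + 1 per later run: 102
```

Under these assumptions the 1-bit predictor mispredicts 200 of the 1000 branch executions, while the 2-bit predictor mispredicts 102, i.e. roughly one per loop exit once it has warmed up.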
Downside of BPB
o Aliasing may occur, since the lower 16 bits of two different branch addresses may match.
However, the scheme still works because the prediction is only a guess: a wrong guess costs cycles, not correctness.
o The BHT is accessed in the IF stage, but even a correctly predicted taken branch incurs a
1-cycle penalty because the branch target address is not known before the ID stage.
o Can we avoid this penalty?
We can avoid this penalty by employing a Branch Target Buffer (BTB).
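The aliasing effect can be illustrated with a small sketch. The two branch addresses below are hypothetical and were picked so that they differ only above bit 15; the table model (a dictionary of 2-bit counters starting at 0) and the alternating trace are likewise assumptions made for this example.

```python
INDEX_BITS = 16
INDEX_MASK = (1 << INDEX_BITS) - 1


def bht_index(pc: int) -> int:
    """Index a BHT with the lower 16 bits of the branch address."""
    return pc & INDEX_MASK


def run_two_bit_table(trace, index_of):
    """Count mispredictions for (pc, taken) pairs using 2-bit counters."""
    table = {}
    misses = 0
    for pc, taken in trace:
        index = index_of(pc)
        counter = table.get(index, 0)
        if (counter >= 2) != taken:
            misses += 1
        table[index] = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


branch_a = 0x00012340   # hypothetical branch that is always taken
branch_b = 0x00022340   # hypothetical branch that is never taken
shares_entry = bht_index(branch_a) == bht_index(branch_b)

# The two branches execute alternately, 50 times each.
trace = [(branch_a, True), (branch_b, False)] * 50
aliased_misses = run_two_bit_table(trace, bht_index)          # shared entry
private_misses = run_two_bit_table(trace, lambda pc: pc)      # one entry per branch
```

In this contrived trace the shared entry never leaves the not-taken half, so every execution of `branch_a` is mispredicted (50 misses), whereas with separate entries only the two warm-up predictions are wrong. Real programs rarely interfere this badly, which is why aliasing is usually tolerated.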
Branch Target Buffer (BTB)
• The BTB is indexed during the IF stage using the PC of the instruction being fetched.
• Each entry holds:

o Predicted target address (used if the branch is taken)
o Prediction (taken/not-taken). This field is optional. If it is not used, then every
entry in the BTB is for a taken branch (we assume this approach here).
• The branch target address is known before the ID stage, so a correct prediction incurs no penalty.

• Not every reference is a hit. (This is in contrast to the BPB, where every reference is a
hit.)
• In a variation of the BTB, the target instruction is stored in the BTB rather than the target address.
• Figure 2 shows the steps followed when a BTB is used with no prediction field.
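A BTB without a prediction field can be modelled roughly as follows. This is a minimal sketch under several assumptions that do not come from the handout: the table is an unbounded dictionary rather than a finite tagged structure, instructions are 4 bytes long, and the update policy (insert an entry when a branch turns out taken, remove it when a hit turns out not-taken) is a common textbook scheme that may differ in detail from the figure referenced above.

```python
from typing import Dict

INSTRUCTION_SIZE = 4   # assumed fixed instruction length in bytes


class BranchTargetBuffer:
    """Illustrative BTB that stores entries only for taken branches."""

    def __init__(self) -> None:
        self.entries: Dict[int, int] = {}   # branch PC -> predicted target

    def predict_next_pc(self, pc: int) -> int:
        """Looked up during IF: a hit means 'predict taken' to the stored target."""
        target = self.entries.get(pc)
        if target is not None:
            return target
        return pc + INSTRUCTION_SIZE        # miss: fetch the fall-through path

    def update(self, pc: int, taken: bool, target: int) -> None:
        """Called once the branch outcome is known."""
        if taken:
            self.entries[pc] = target       # enter or refresh a taken branch
        else:
            self.entries.pop(pc, None)      # drop an entry that was not taken


btb = BranchTargetBuffer()
first_guess = btb.predict_next_pc(0x100)    # miss: falls through to 0x104
btb.update(0x100, taken=True, target=0x80)
second_guess = btb.predict_next_pc(0x100)   # hit: predicted target 0x80
```

The first lookup misses because the branch has not been seen yet, which matches the point that not every BTB reference is a hit.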


The following table lists branch penalties for various branches using BTB (assuming that BTB
only stores taken branches):
Branch in BTB   Prediction   Actual Branch   Penalty (clock cycles)
Yes             Taken        Taken           0
Yes             Taken        Not-Taken       2
No              Not-Taken    Taken           2
No              Not-Taken    Not-Taken       0
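The table can be turned into an average penalty per branch once the BTB hit rate and the taken frequencies are known. The sketch below encodes the four rows and evaluates them for made-up rates (90% hit rate, 90% of hits taken, 60% of misses taken); these numbers and the function names are purely hypothetical.

```python
# (branch found in BTB, branch actually taken) -> penalty in clock cycles
PENALTY_CYCLES = {
    (True, True): 0,     # predicted taken, actually taken
    (True, False): 2,    # predicted taken, actually not-taken
    (False, True): 2,    # predicted not-taken, actually taken
    (False, False): 0,   # predicted not-taken, actually not-taken
}


def average_branch_penalty(hit_rate, taken_given_hit, taken_given_miss):
    """Expected penalty per branch, weighting each table row by its probability."""
    probabilities = {
        (True, True): hit_rate * taken_given_hit,
        (True, False): hit_rate * (1 - taken_given_hit),
        (False, True): (1 - hit_rate) * taken_given_miss,
        (False, False): (1 - hit_rate) * (1 - taken_given_miss),
    }
    return sum(probabilities[case] * PENALTY_CYCLES[case] for case in PENALTY_CYCLES)


# Hypothetical rates, for illustration only.
penalty = average_branch_penalty(hit_rate=0.9, taken_given_hit=0.9, taken_given_miss=0.6)
```

For these assumed rates the result is about 0.30 cycles per branch: 0.9 x 0.1 x 2 from hits that are not taken plus 0.1 x 0.6 x 2 from misses that are taken.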

******
