Pipeline Note 2

Data Hazards
We must ensure that the results obtained when instructions are executed in a pipelined processor are identical to those obtained when the same instructions are executed sequentially.
Hazard occurs:
A ← 3 + A
B ← 4 × A
No hazard:
A ← 5 × C
B ← 20 + C
When two operations depend on each other, they must be executed sequentially in the correct order.
Another example, in which R4, the result of the Mul, is a source operand of the Add:
Mul R2, R3, R4
Add R5, R4, R6
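To see why the order matters, the short Python sketch below evaluates the dependent pair A ← 3 + A, B ← 4 × A twice: once strictly in sequence, and once as if the second operation had fetched A before the first one wrote it. The function names and the starting value 5 are illustrative assumptions, not part of the original notes.

```python
def run_sequential(a):
    """Execute A <- 3 + A, then B <- 4 * A, strictly in order."""
    a = 3 + a
    b = 4 * a
    return a, b


def run_with_stale_read(a):
    """Model a hazard: the second operation reads A before the first writes it."""
    stale_a = a          # operand fetched too early
    a = 3 + a
    b = 4 * stale_a      # uses the old value of A
    return a, b


if __name__ == "__main__":
    print(run_sequential(5))       # (8, 32)
    print(run_with_stale_read(5))  # (8, 20)
```

The two results for B differ, which is exactly the situation a pipelined processor has to prevent.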
Figure 8.6. Pipeline stalled by data dependency between D2 and W1. (Timing diagram: clock cycles 1 to 9 along the time axis; one row each for instructions I1 (Mul), I2 (Add), I3 and I4, with steps F, D, E and W; the decode step of I2 is split into D2 and D2A.)
Operand Forwarding
Instead of reading it from the register file, the second instruction can get the data directly from the output of the ALU as soon as the previous instruction has produced it. A special arrangement (a forwarding path) is needed to route the output of the ALU back to the input of the ALU.
Data Hazards
ADD R1, R2, R3: IF ID EX WB
  ID: select R2 and R3 for the ALU operation
  EX: add R2 and R3
  WB: store the sum in R1
SUB R4, R1, R5: IF ID EX WB
  ID: select R1 and R5 for the ALU operation
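The effect of operand forwarding on the ADD/SUB pair can be sketched in Python as follows. This is a minimal model, not a description of any particular processor: the helper names `read_operand` and `add_then_sub`, the dictionary used as a register file, and the register values are all assumptions made for illustration. The SUB is executed before the ADD has written R1 back, so without forwarding it sees a stale R1.

```python
def read_operand(reg, regfile, fwd_reg=None, fwd_value=None):
    """Return the forwarded ALU result if it targets `reg`, else the register file value."""
    if fwd_reg is not None and reg == fwd_reg:
        return fwd_value
    return regfile[reg]


def add_then_sub(regfile, forwarding):
    """Run ADD R1,R2,R3 then SUB R4,R1,R5 before ADD has written R1 back."""
    alu_out = regfile["R2"] + regfile["R3"]   # EX step of ADD; R1 not yet updated
    if forwarding:
        r1 = read_operand("R1", regfile, fwd_reg="R1", fwd_value=alu_out)
    else:
        r1 = read_operand("R1", regfile)      # stale value from the register file
    return r1 - read_operand("R5", regfile)   # EX step of SUB


# Hypothetical register contents for the demonstration.
regs = {"R1": 0, "R2": 10, "R3": 20, "R4": 0, "R5": 4}

if __name__ == "__main__":
    print(add_then_sub(regs, forwarding=True))   # 26, the correct result
    print(add_then_sub(regs, forwarding=False))  # -4, computed from the stale R1
```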
Stalling
Stalling halts the flow of instructions until the required result is ready to be used. However, stalling wastes processor time: the pipeline does nothing while it waits for the result.
Stalling (cont.)
ADD R1, R2, R3: IF ID EX M WB
STALL: IF ID EX WB
STALL: IF ID EX WB
STALL: IF ID EX WB
SUB R4, R1, R5: IF ID EX WB
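The three STALL entries above can be reproduced with a small calculation. The sketch below assumes a simple model that is not stated in the notes: the producer is issued in cycle 1, each stage takes one cycle, there is no forwarding, and the consumer may read its operands only in a cycle strictly after the producer's write-back. The function name `stall_cycles` and its parameters are hypothetical.

```python
def stall_cycles(stages, distance=1, read_stage="ID", write_stage="WB"):
    """Stall cycles for a consumer issued `distance` instructions after its producer.

    Assumption: the consumer's read stage must fall in a cycle strictly
    after the producer's write stage (no forwarding, no same-cycle write/read).
    """
    write_cycle = stages.index(write_stage) + 1             # producer issued in cycle 1
    natural_read = distance + stages.index(read_stage) + 1  # read cycle with no stall
    return max(0, write_cycle + 1 - natural_read)


five_stage = ["IF", "ID", "EX", "M", "WB"]

if __name__ == "__main__":
    print(stall_cycles(five_stage))               # 3, adjacent dependent instructions
    print(stall_cycles(five_stage, distance=4))   # 0, enough independent work in between
```

Under the same assumptions, a four-stage F, D, E, W pipeline gives two stall cycles for adjacent dependent instructions.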
Types of Pipelining
Software Pipelining
- Can handle complex instructions
- Allows programs to be reused
Hardware Pipelining
- Helps the designer manage complexity: a complex task can be divided into smaller, more manageable pieces
- Offers higher performance
Types of Hardware Pipelines

- Instruction pipeline: very similar to a manufacturing assembly line.
- Data pipeline: designed to pass data from stage to stage.
Instruction Pipeline Conflicts

Conflicts are divided into two categories:
- Data conflicts: occur when the current instruction changes a register that the next instruction needs.
- Branch conflicts: occur when the current instruction makes a jump.
Control Hazards
Overview
- Branch penalties
- Branch prediction
Branch Penalties
Branches are a major problem because the processor does not know which instruction comes next until the branch instruction has been executed. The time needed to fill the pipeline again is known as the branch penalty. The larger the number of stages in the pipeline, the larger the potential branch penalty.
Branch Penalties Cont..
Instruction 3 is a conditional branch to instruction 20, which is taken. As soon as it is executed in step 6, the pipeline is flushed (instruction 3 is able to complete) and instructions starting at #20 are loaded into the pipeline. Note that no instructions are completed during cycles 7 through 10.
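The cost of such flushes can be estimated with a very simple model, sketched below in Python. It assumes (these are assumptions, not statements from the notes) that one instruction completes per cycle once the pipeline is full and that every taken branch adds a fixed number of idle cycles. The function name, the six-stage depth and the instruction counts are illustrative; the penalty of 4 matches the idle cycles 7 through 10 in the example above.

```python
def total_cycles(n_instructions, n_stages, taken_branches, penalty):
    """Cycles to run a program under a fixed-penalty model of taken branches."""
    fill = n_stages - 1                      # cycles before the first instruction completes
    return fill + n_instructions + taken_branches * penalty


if __name__ == "__main__":
    print(total_cycles(100, 6, 0, 4))    # 105 cycles with no taken branches
    print(total_cycles(100, 6, 10, 4))   # 145 cycles with ten taken branches
```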
Branch Prediction: Improving Branch Performance

Static Prediction
With static branch prediction, the instruction loaded after a branch depends on the design of the pipeline. If the prediction is correct, there is no branch penalty. A compiler has to be aware of the type of prediction used on the machine in order to optimize the machine code properly.
- Never taken: the prediction is that the branch will not be taken, so the next instructions in memory are loaded into the pipeline.
- Always taken: the prediction is that the branch will be taken, so the next instruction loaded is the one at the branch destination.
- Code prediction: the processor determines which prediction to use based on the instruction.
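How well a fixed prediction works depends on the branch. The Python sketch below compares "always taken" with "never taken" on a made-up outcome trace for a loop-closing branch that is taken nine times and then falls through once. The function name `accuracy` and the trace are assumptions chosen for illustration.

```python
def accuracy(outcomes, predict_taken):
    """Fraction of branches predicted correctly by a fixed (static) prediction."""
    correct = sum(1 for taken in outcomes if taken == predict_taken)
    return correct / len(outcomes)


# Hypothetical trace: a loop-closing branch taken 9 times, then not taken once.
loop_branch = [True] * 9 + [False]

if __name__ == "__main__":
    print(accuracy(loop_branch, predict_taken=True))    # 0.9
    print(accuracy(loop_branch, predict_taken=False))   # 0.1
```

For this kind of branch "always taken" is clearly better; for a branch that rarely jumps, the opposite holds.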
Delayed Branch
The instructions in the delay slots are always fetched. Therefore, we would like to arrange for them to be fully executed whether or not the branch is taken. The objective is to place useful instructions in these slots. The effectiveness of the delayed branch approach depends on how often it is possible to reorder instructions.
Delayed Branch
(a) Original program loop
LOOP  Shift_left  R1
      Decrement   R2
      Branch=0    LOOP
NEXT  Add         R1,R3

(b) Reordered instructions
LOOP  Decrement   R2
      Branch=0    LOOP
      Shift_left  R1
NEXT  Add         R1,R3

Figure 8.12. Reordering of instructions for a delayed branch.
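The reordering in Figure 8.12 is only valid if both versions compute the same values. The Python sketch below checks this for one case. It is a simplified model with assumptions of its own: the branch condition is passed in as a function (here "loop while R2 is greater than 0", chosen only for the demonstration), the delay slot holds exactly one instruction, and the final Add is modeled as a plain sum without writing a destination register.

```python
def original_loop(r1, r2, r3, taken):
    """Shift_left R1; Decrement R2; Branch LOOP; then Add R1,R3 (no delay slot)."""
    while True:
        r1 <<= 1                 # Shift_left R1
        r2 -= 1                  # Decrement R2
        if not taken(r2):        # Branch back to LOOP?
            break
    return r1, r2, r1 + r3       # Add R1,R3 (sum only)


def reordered_loop(r1, r2, r3, taken):
    """Decrement R2; Branch LOOP; Shift_left R1 in the delay slot; then Add R1,R3."""
    while True:
        r2 -= 1                  # Decrement R2
        branch_taken = taken(r2) # branch decision is made here
        r1 <<= 1                 # delay slot: executed whether or not the branch is taken
        if not branch_taken:
            break
    return r1, r2, r1 + r3       # Add R1,R3 (sum only)


def keep_looping(r2):
    """Hypothetical branch condition used only for this demonstration."""
    return r2 > 0


if __name__ == "__main__":
    print(original_loop(1, 3, 5, keep_looping))    # (8, 0, 13)
    print(reordered_loop(1, 3, 5, keep_looping))   # (8, 0, 13)
```

The shift can fill the delay slot because it does not affect R2, on which the branch decision depends.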
Figure 8.13. Execution timing showing the delay slot being filled during the last two passes through the loop in Figure 8.12. (Timing diagram: time in clock cycles along the top; rows for Decrement, Branch, Shift (delay slot), Decrement (branch taken), Branch, Shift (delay slot), Add (branch not taken).)
Branch Prediction Cont..
Better performance can be achieved if some branch instructions are predicted as taken and others as not taken.
- Use hardware to observe whether the target address is lower or higher than the address of the branch instruction.
- Let the compiler include a branch prediction bit in the instruction.
So far, the prediction decision is the same every time a given instruction is executed; this is static branch prediction.
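A prediction based on the direction of the branch can be written in one line. The sketch below assumes the common convention that backward branches (target address lower than the branch address) are predicted taken, because they usually close loops, and forward branches are predicted not taken; the notes themselves only say that the hardware compares the two addresses. The function name and the addresses are made up for illustration.

```python
def predict_taken(branch_addr, target_addr):
    """Static direction-based prediction: backward branches taken, forward not taken."""
    return target_addr < branch_addr


if __name__ == "__main__":
    print(predict_taken(0x120, 0x100))   # True: backward branch, probably a loop
    print(predict_taken(0x120, 0x140))   # False: forward branch
```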
Influence on Instruction Sets
Overview
Some instructions are much better suited to pipelined execution than others. Two aspects are considered:
- Addressing modes
- Condition code flags
Addressing Modes
Addressing modes include simple ones and complex ones. In choosing the addressing modes to be implemented in a pipelined processor, we must consider the effect of each addressing mode on instruction flow in the pipeline:
- Side effects
- The extent to which complex addressing modes cause the pipeline to stall
- Whether a given mode is likely to be used by compilers
Addressing Modes
In a pipelined processor, complex addressing modes do not necessarily lead to faster execution.
- Advantage: they reduce the number of instructions and the program space.
- Disadvantage: they cause the pipeline to stall, need more hardware to decode, and are not convenient for the compiler to work with.
- Conclusion: complex addressing modes are not suitable for pipelined execution.
Addressing Modes
Good addressing modes should have the following features:
- Access to an operand does not require more than one access to memory
- Only load and store instructions access memory operands
- The addressing modes used do not have side effects
Modes with these features: register, register indirect, index.
Condition Codes
If an optimizing compiler attempts to reorder instructions to avoid stalling the pipeline when branches or data dependencies between successive instructions occur, it must ensure that the reordering does not change the outcome of the computation. The dependency introduced by the condition-code flags reduces the flexibility available to the compiler to reorder instructions.
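The restriction can be illustrated with a toy interpreter in Python. Everything in it is hypothetical: the three mnemonics (`add`, `sub`, `beqz`), the single zero flag that every arithmetic instruction overwrites, and the register values. The two programs contain the same instructions, and the `add` and `sub` share no registers, yet swapping them changes which instruction sets the flag last and therefore changes the branch outcome.

```python
def run(program, regs):
    """Execute a tiny program; return True if the final branch is taken."""
    zero_flag = False
    for op, *args in program:
        if op == "sub":                      # dst <- a - b, sets the zero flag
            dst, a, b = args
            regs[dst] = regs[a] - regs[b]
            zero_flag = regs[dst] == 0
        elif op == "add":                    # dst <- a + b, also sets the zero flag
            dst, a, b = args
            regs[dst] = regs[a] + regs[b]
            zero_flag = regs[dst] == 0
        elif op == "beqz":                   # branch if the zero flag is set
            return zero_flag
    return False


# Hypothetical register contents and programs.
start = {"R1": 0, "R2": 7, "R3": 7, "R4": 0, "R5": 1, "R6": 2}
original = [("add", "R4", "R5", "R6"), ("sub", "R1", "R2", "R3"), ("beqz",)]
reordered = [("sub", "R1", "R2", "R3"), ("add", "R4", "R5", "R6"), ("beqz",)]

if __name__ == "__main__":
    print(run(original, dict(start)))    # True: the flag comes from the sub
    print(run(reordered, dict(start)))   # False: the add overwrote the flag
```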
Datapath and Control Considerations
Original Design
Figure 7.8. Three-bus organization of the datapath. (Block diagram showing: PC, incrementer, register file, constant 4, MUX, ALU with inputs A and B and output R, instruction decoder, IR, MDR, MAR, memory bus data lines, address lines.)
Pipelined Design
Figure 8.18. Datapath modified for pipelined execution, with interstage buffers at the input and output of the ALU. (Block diagram showing: register file, Bus A, Bus B, Bus C, ALU with inputs A and B and output R, PC, incrementer, control signal pipeline, instruction decoder, IMAR with memory address for instruction fetches, instruction queue, instruction cache, DMAR with memory address for data access, MDR/Write, MDR/Read, data cache.)
Changes in the pipelined design:
- Separate instruction and data caches
- PC is connected to IMAR
- DMAR
- Separate MDR
- Buffers for ALU
- Instruction queue
- Instruction decoder output
Operations that can be performed independently:
- Reading an instruction from the instruction cache
- Incrementing the PC
- Decoding an instruction
- Reading from or writing into the data cache
- Reading the contents of up to two registers
- Writing into one register in the register file
- Performing an ALU operation
