Вы находитесь на странице: 1из 77

Computer Organization and Architecture (AT70.

01)
Comp. Sc. and Inf. Mgmt. Asian Institute of Technology
Instructor: Dr. Sumanta Guha Slide Sources: Patterson & Hennessy COD book site (copyright Morgan Kaufmann) adapted and supplemented

COD Ch. 6 Enhancing Performance with Pipelining

Pipelining

Start work ASAP!! Do not waste time!


Time Task order A B C D 6 PM 7 8 9 10 11 12 1 2 AM

Not pipelined

Assume 30 min. each task wash, dry, fold, store and that separate tasks use separate hardware and so can be overlapped
6 PM Time 7 8 9 10 11 12 1 2 AM

Task order A B C D

Pipelined

Pipelined vs. Single-Cycle Instruction Execution: the Plan


Program execution Time order (in instructions) lw $1, 100($0) 2 4 6 8 10 12 14 16 18
Instruction Reg fetch ALU Data access Reg Instruction Reg fetch ALU

Single-cycle
Data access Reg Instruction fetch

lw $2, 200($0)
lw $3, 300($0)

8 ns

8 ns

...
8 ns

Assume 2 ns for memory access, ALU operation; 1 ns for register access: therefore, single cycle clock 8 ns; pipelined clock cycle 2 ns.
Program 2 execution Time order (in instructions) Instruction lw $1, 100($0)
fetch

10

12

14

Reg Instruction fetch

ALU Reg Instruction fetch

Data access ALU Reg

Reg Data access ALU Reg Data access Reg

Pipelined

lw $2, 200($0) lw $3, 300($0)

2 ns

2 ns

2 ns

2 ns

2 ns

2 ns

2 ns

Pipelining: Keep in Mind



Pipelining does not reduce latency of a single task, it increases throughput of entire workload Pipeline rate limited by longest stage
potential speedup = number pipe stages unbalanced lengths of pipe stages reduces speedup

Time to fill pipeline and time to drain it when there is slack in the pipeline reduces speedup

Example Problem

Problem: for the laundry fill in the following table when


1.
2.

the stage lengths are 30, 30, 30 30 min., resp. the stage lengths are 20, 20, 60, 20 min., resp.
Pipeline 1 finish time Ratio unpipelined to pipeline 1 Pipeline 2 finish time Ratio unpiplelined to pipeline 2

Person 1 2 3 4

Unpipelined finish time

Come up with a formula for pipeline speed-up!

Pipelining MIPS

What makes it easy with MIPS?

all instructions are same length

so fetch and decode stages are similar for all instructions simplifies instruction decode and makes it possible in one stage so memory access can be deferred to exactly one later stage one data transfer instruction requires one memory access stage

just a few instruction formats

memory operands appear only in load/stores

operands are aligned in memory

Pipelining MIPS

What makes it hard?

structural hazards: different instructions, at different stages,


in the pipeline want to use the same hardware resource control hazards: succeeding instruction, to put into pipeline, depends on the outcome of a previous branch instruction, already in pipeline data hazards: an instruction in the pipeline requires data to be computed by a previous instruction still in the pipeline

Before actually building the pipelined datapath and control we first briefly examine these potential hazards individually

Structural Hazards

Structural hazard: inadequate hardware to simultaneously

support all instructions in the pipeline in the same clock cycle E.g., suppose single not separate instruction and data memory in pipeline below with one read port

then a structural hazard between first and fourth lw instructions


Program 2 execution Time order (in instructions) Instruction lw $1, 100($0)
fetch

10

12

14

Reg Instruction fetch

ALU Reg Instruction fetch

Data access ALU Reg

Reg Data access ALU Reg Reg Data access ALU

Pipelined
Hazard if single memory
Reg

lw $2, 200($0) lw $3, 300($0) lw $4, 400($0)

2 ns

2 ns

2 ns

Instruction fetch

Data access

Reg

2 ns

2 ns

2 ns

2 ns

2 ns

MIPS was designed to be pipelined: structural hazards are easy


to avoid!

Control Hazards

Control hazard: need to make a decision based on the result of


a previous instruction still executing in pipeline Solution 1 Stall the pipeline
2 4 6 8 10 12

Program execution Time order (in instructions) add $4, $5, $6 beq $1, $2, 40 lw $3, 300($0)
Instruction fetch

14

16

Reg Instruction fetch

ALU Reg

Data access ALU Instruction fetch

Reg Data access Reg Reg ALU

Note that branch outcome is computed in ID stage with added hardware (later)
Data access

2ns

bubble

Reg

4 ns

2ns

Pipeline stall

Control Hazards

Solution 2 Predict branch outcome

e.g., predict branch-not-taken :


2 4 6 8 10 12 14

Program execution Time order (in instructions) add $4, $5, $6 beq $1, $2, 40 2 ns lw $3, 300($0)

Instruction Reg fetch

ALU

Data access ALU

Reg Data access ALU

Instruction Reg fetch

Reg Data access

2 ns

Instruction Reg fetch

Reg

Prediction success
Program execution Time order (in instructions) add $4, $5 ,$6 beq $1, $2, 40 2 ns 2 4 6 8 10 12 14

Instruction Reg fetch

ALU

Data access ALU

Reg Data access

Instruction Reg fetch

Reg

bubble

bubble

bubble

bubble
ALU

bubble
Data access Reg

or $7, $8, $9

4 ns

Instruction Reg fetch

Prediction failure: undo (=flush) lw

Control Hazards

Solution 3 Delayed branch: always execute the sequentially next statement with the branch executing after one instruction delay compilers job to find a statement that can be put in the slot that is independent of branch outcome

P r o g ra m execution orde r T im e (in instructions) beq $ 1, $2, 40

MIPS does this but it is an option in SPIM (Simulator -> Settings)


2 4 6 8 10 12 14

Instruction fetch

Reg Instruction fetch

ALU Reg Instruction fetch

Data access ALU Reg

Reg Data access ALU Reg Data access Reg

add $4, $5 , $ 6 (d elaye d branch slot) lw $3 , 3 00($0)

2 ns

2 ns

2 ns

Delayed branch beq is followed by add that is independent of branch outcome

Data Hazards

Data hazard: instruction needs data from the result of a


previous instruction still executing in pipeline Solution Forward data if possible
Time add $s0, $t0, $t1 IF 2 4 6 8 10

ID

EX

MEM

WB

Instruction pipeline diagram: shade indicates use left=write, right=read

Program execution order Time (in instructions) add $s0, $t0, $t1 IF

10

ID

EX

MEM

WB

sub $t2, $s0, $t3

IF

ID

EX

MEM

WB

Without forwarding blue line data has to go back in time; with forwarding red line data is available in time

Data Hazards

Forwarding may not be enough

e.g., if an R-type instruction following a load uses the result of the load called load-use data hazard
Time Program execution order (in instructions)
lw $s0, 20($t1) IF

10

12

14

ID

EX

MEM

WB

Without a stall it is impossible to provide input to the sub instruction in time


WB

sub $t2, $s0, $t3

IF

ID

EX

MEM

Time Program execution order (in instructions) lw $s0, 20($t1) IF

10

12

14

ID

EX

MEM

WB

With a one-stage stall, forwarding can get the data to the sub instruction in time
bubble

bubble
sub $t2, $s0, $t3

bubble

bubble

bubble

IF

ID

EX

MEM

WB

Reordering Code to Avoid Pipeline Stall (Software Solution)


Example: lw $t0, 0($t1) lw $t2, 4($t1) sw $t2, 0($t1) sw $t0, 4($t1)
Data hazard

Reordered code: lw $t0, 0($t1) lw $t2, 4($t1) sw $t0, 4($t1) sw $t2, 0($t1)

Interchanged

Pipelined Datapath

1. 2. 3. 4. 5.

We now move to actually building a pipelined datapath First recall the 5 steps in instruction execution
Instruction Fetch & PC Increment (IF) Instruction Decode and Register Read (ID) Execution or calculate address (EX) Memory access (MEM) Write result into register (WB)

Review: single-cycle processor


all 5 steps done in a single clock cycle dedicated hardware required for each step

What happens if we break the execution into multiple cycles, but keep the extra hardware?

Review - Single-Cycle Datapath Steps


ADD ADD 4

PC
ADDR RD Instruction I
32
16

<<2
5 5

32

Instruction Memory

RN1

RN2

Register File
WD

WN RD1

ALU
M U X

Zero

RD2

ADDR

16

E X T N D

Data Memory
WD

RD

32

M U X

Instruction Fetch

IF

Instruction Decode

ID

EX
Execute/ Address Calc.

Memory Access

MEM

Write Back

WB

Pipelined Datapath Key Idea

What happens if we break the execution into multiple cycles, but keep the extra hardware?

Answer: We may be able to start executing a new instruction at each clock cycle - pipelining

but we shall need extra registers to hold data between cycles pipeline registers

Pipelined Datapath
Pipeline registers wide enough to hold data coming in
ADD ADD 4

64 bits
Instruction I
32 16 32 5 5 5

128 bits <<2

PC
ADDR RD

97 bits

64 bits

Instruction Memory

RN1

RN2

Register File
WD

WN RD1

ALU
M U X

Zero

RD2

ADDR

16

E X T N D

Data Memory
WD

RD

32

M U X

IF/ID

ID/EX

EX/MEM

MEM/WB

Pipelined Datapath
Pipeline registers wide enough to hold data coming in
ADD ADD 4

64 bits
Instruction I
32 16 32 5 5 5

128 bits <<2

PC
ADDR RD

97 bits

64 bits

Instruction Memory

RN1

RN2

Register File
WD

WN RD1

ALU
M U X

Zero

RD2

ADDR

16

E X T N D

Data Memory
WD

RD

32

M U X

IF/ID

ID/EX

EX/MEM

MEM/WB

Only data flowing right to left may cause hazard, why?

Bug in the Datapath


IF/ID
ADD
ADD

ID/EX

EX/MEM

MEM/WB

PC
ADDR RD Instruction I
32

<<2
5 5

Instruction Memory

16

32

RN1

RN2

Register File
WD

WN RD1

ALU
M U X

RD2

ADDR

16

E X T N D

Data Memory
WD

RD

32

M U X

Write register number comes from another later instruction!

Corrected Datapath
IF/ID
ADD

ID/EX

EX/MEM

MEM/WB

64 bits

133 bits
<<2

ADD

102 bits

69 bits

PC
ADDR RD

Instruction Memory

32

5
5 5

RN1 RN2 WN WD

RD1

ALU
M U X

Zero

Register File RD2

ADDR

16

E X 32 T N D

Data Memory RD
WD

M U X

Destination register number is also passed through ID/EX, EX/MEM and MEM/WB registers, which are now wider by 5 bits

Pipelined Example

Consider the following instruction sequence:

lw

$t0,

10($t1)

sw $t3, 20($t4) add $t5, $t6, $t7 sub $t8, $t9, $t10

LW

Single-Clock-Cycle Diagram: Clock Cycle 1

SW

Single-Clock-Cycle Diagram: Clock Cycle 2


LW

ADD

Single-Clock-Cycle Diagram: Clock Cycle 3


SW LW

SUB

Single-Clock-Cycle Diagram: Clock Cycle 4


ADD SW LW

Single-Clock-Cycle Diagram: Clock Cycle 5


SUB ADD SW

LW

Single-Clock-Cycle Diagram: Clock Cycle 6


SUB ADD

SW

Single-Clock-Cycle Diagram: Clock Cycle 7


SUB

ADD

Single-Clock-Cycle Diagram: Clock Cycle 8

SUB

Alternative View Multiple-Clock-Cycle Diagram


CC 1 CC 2 CC 3 CC 4 CC 5 CC 6 CC 7 CC 8

Time axis
lw $t0, 10($t1)
IM REG ALU DM REG

sw $t3, 20($t4)

IM

REG

ALU

DM

REG

add $t5, $t6, $t7

IM

REG

ALU

DM

REG

sub $t8, $t9, $t10

IM

REG

ALU

DM

REG

Notes

One significant difference in the execution of an R-type instruction between multicycle and pipelined implementations:

register write-back for the R-type instruction is the 5th (the last write-back) pipeline stage vs. the 4th stage for the multicycle implementation. Why? think of structural hazards when writing to the register file

Worth repeating: the essential difference between the pipeline and multicycle implementations is the insertion of pipeline registers to decouple the 5 stages The CPI of an ideal pipeline (no stalls) is 1. Why? The RaVi Architecture Visualization Project of Dortmund U. has pipeline simulations see link in our Additional Resources page As we develop control for the pipeline keep in mind that the text does not consider jump should not be too hard to implement!

Recall Single-Cycle Control the Datapath


0 M u x ALU Add result Add 4 Instruction [31 26] RegDst Branch MemRead MemtoReg ALUOp MemWrite ALUSrc RegWrite Read register 1 Shift left 2 1 PCSrc

Control

Instruction [25 21] PC Read address Instruction [31 0] Instruction memory Instruction [15 11] Instruction [20 16] 0 M u x 1

Read data 1 Read register 2 Registers Read Write data 2 register Write data

0 M u x 1

Zero ALU ALU result

Address

Read data Data memory

Write data Instruction [15 0] 16 Sign extend 32 ALU control

1 M u x 0

Instruction [5 0]

Recall Single-Cycle ALU Control


Instruction AluOp opcode
LW SW Branch eq R-type R-type R-type R-type R-type 00 00 01 10 10 10 10 10 load word store word branch eq add subtract AND OR set on less xxxxxx xxxxxx xxxxxx 100000 100010 100100 100101 101010 add add subtract add subtract and or set on less 010 010 110 010 110 000 001 111

Instruction Funct Field Desired ALU control operation ALU action input

ALUOp Funct field Operation ALUOp1 ALUOp0 F5 F4 F3 F2 F1 F0 0 0 X X X X X X 010 0 1 X X X X X X 110 1 X X X 0 0 0 0 010 1 X X X 0 0 1 0 110 1 X X X 0 1 0 0 000 1 X X X 0 1 0 1 001 1 X X X 1 0 1 0 111 Truth table for ALU control bits

Recall Single-Cycle Control Signals


Effect of control bits
Signal Name RegDst RegWrite AlLUSrc PCSrc MemRead MemWrite MemtoReg Effect when deasserted The register destination number for the Write register comes from the rt field (bits 20-16) None The second ALU operand comes from the second register file output (Read data 2) The PC is replaced by the output of the adder that computes the value of PC + 4 None None The value fed to the register Write data input comes from the ALU Effect when asserted The register destination number for the Write register comes from the rd field (bits 15-11) The register on the Write register input is written with the value on the Write data input The second ALU operand is the sign-extended, lower 16 bits of the instruction The PC is replaced by the output of the adder that computes the branch target Data memory contents designated by the address input are put on the first Read data output Data memory contents designated by the address input are replaced by the value of the Write data input The value fed to the register Write data input comes from the data memory

Memto- Reg Mem Mem Deter- Instruction RegDst ALUSrc Reg Write Read Write Branch ALUOp1 ALUp0 1 0 0 1 0 0 0 1 0 mining R-format 0 1 1 1 1 0 0 0 0 control lw sw X 1 X 0 0 1 0 0 0 bits beq X 0 X 0 0 0 1 0 1

Pipeline Control

Initial design motivated by single-cycle datapath control use the same control signals Observe:

No separate write signal for the PC as it is written every cycle modified by hazard No separate write signals for the pipeline registers as they are detection unit!! written every cycle No separate read signal for instruction memory as it is read every clock cycle No separate read signal for register file as it is read every clock cycle

Will be

Need to set control signals during each pipeline stage Since control signals are associated with components active during a single pipeline stage, can group control lines into five

groups according to pipeline stage

Pipelined Datapath with Control I


0 M u x 1 IF/ID ID/EX EX/MEM Add 4
RegWrite Shift left 2 Add Add result

PCSrc

MEM/WB

Branch

PC

Address Instruction memory

Instruction

Read register 1

MemWrite Read data 1


ALUSrc

Read register 2 Registers Read Write data 2 register Write data

0 M u x 1

Zero Zero ALU ALU result

MemtoReg Address Data memory Write data Read data 1 M u x 0

Instruction 16 [15 0]

Sign extend

32

ALU control

MemRead

Same control signals as the single-cycle datapath

Instruction [20 16] Instruction [15 11] 0 M u x 1


RegDst ALUOp

Pipeline Control Signals

There are five stages in the pipeline instruction fetch / PC increment instruction decode / register fetch execution / address calculation

Nothing to control as instruction memory read and PC write are always enabled

memory access write back


Write-back stage control lines Reg Mem to write Reg 1 0 1 1 0 X 0 X

Instruction R-format lw sw beq

Execution/Address Calculation Memory access stage stage control lines control lines Reg ALU ALU ALU Mem Mem Dst Op1 Op0 Src Branch Read Write 1 1 0 0 0 0 0 0 0 0 1 0 1 0 X 0 0 1 0 0 1 X 0 1 0 1 0 0

Pipeline Control Implementation

Pass control signals along just like the data extend each
WB Instruction

pipeline register to hold needed control bits for succeeding stages

Control

M EX

WB M WB

IF/ID

ID/EX

EX/MEM

MEM/WB

Note: The 6-bit funct field of the instruction required in the EX

stage to generate ALU control can be retrieved as the 6 least significant bits of the immediate field which is sign-extended and passed from the IF/ID register to the ID/EX register

Pipelined Datapath with Control II


PCSrc 0 M u x 1 Control ID/EX WB M EX EX/MEM WB M MEM/WB WB IF/ID Add 4 Add Add result

RegWrite

ALUSrc

PC

Address Instruction memory

Read data 1 Read register 2 Registers Read Write data 2 register Write data

0 M u x 1

Zero ALU ALU result

Address Data memory Write data

Read data

Control signals emanate from the control portions of the pipeline registers

Instruction 16 [15 0]

Sign extend

32

ALU control

MemRead

Instruction [20 16]

Instruction [15 11]

0 M u x 1 RegDst

ALUOp

MemtoReg
1 M u x 0

Instruction

Read register 1

MemWrite

Shift left 2

Branch

Pipelined Execution and Control

IF: lw $10, 20($1)

ID: before<1>

EX: before<2>

MEM: before<3>

WB: before<4>

0 M u x 1

IF/ID 00 000 0000

ID/EX WB M 00 000

EX/MEM

MEM/WB

Control

WB M

00 0 0 0 0 WB 0

0 00 EX 0

Add 4 Add Add result

RegWrite

ALUSrc

PC

Address Instruction memory

Read data 1 Read register 2 Registers Read Write data 2 register Write data

0 M u x 1

Zero ALU ALU result

Address Data memory Write data

Read data

Instruction [15 0]

Sign extend

ALU control

MemRead

Instruction sequence:
lw sub and or add $10, $11, $12, $13, $14, 20($1) $2, $3 $4, $7 $6, $7 $8, $9

Clock cycle 1
Clock 1
IF: sub $11, $2, $3

Instruction [20 16] Instruction [15 11]

0 M u x 1 RegDst

ALUOp

ID: lw $10, 20($1)

EX: before<1>

MEM: before<2>

WB: before<3>

0 M u x 1

IF/ID 11 lw 010 0001

ID/EX WB M 00 000

EX/MEM

MEM/WB

Control

WB M

00 0 0 0 0 WB 0

0 00 EX 0

Add 4 Add Add result

RegWrite

ALUSrc

PC

Address Instruction memory

Read $1 data 1 Read register 2 Registers Read $X Write data 2 register Write data

Label before<i> means i th instruction before lw Clock cycle 2


Clock 2

0 M u x 1

Zero ALU ALU result

Address Data memory Write data

Read data

20

Instruction [15 0]

Sign extend

20

ALU control

MemRead

10

Instruction [20 16] Instruction [15 11]

10

0 M u x 1 RegDst

ALUOp

MemtoReg
1 M u x 0

Instruction

Read register 1

MemWrite

Shift left 2

Branch

MemtoReg
1 M u x 0

Instruction

Read register 1

MemWrite

Shift left 2

Branch

Pipelined Execution and Control

IF: and $12, $4, $5

ID: sub $11, $2, $3

EX: lw $10, . . .

MEM: before<1>

WB: before<2>

0 M u x 1

IF/ID 10 sub 000 1100

ID/EX WB M 11 010

EX/MEM

MEM/WB

Control

WB M

00 0 0 0 0 WB 0

0 00 EX 1

Add 4
RegWrite

Add Add result


MemWrite

Shift left 2 ALUSrc $1 Zero ALU ALU result

Branch

PC

Address Instruction memory

Read $2 data 1 Read register 2 Registers Read $3 Write data 2 register Write data

0 M u x 1

Address Data memory Write data

Read data

Instruction [15 0]

Sign extend

20

ALU control

MemRead

Instruction sequence:
lw sub and or add $10, $11, $12, $13, $14, 20($1) $2, $3 $4, $7 $6, $7 $8, $9

Clock cycle 3
Clock 3
IF: or $13, $6, $7

Instruction [20 16] Instruction [15 11]

10

11

11

0 M u x 1 RegDst

ALUOp

ID: and $12, $2, $3

EX: sub $11, . . .

MEM: lw $10, . . .

WB: before<1>

0 M u x 1

IF/ID 10 and 000 1100

ID/EX WB M 10 000

EX/MEM

MEM/WB

Control

WB M

11 0 1 0 0 WB 0

1 10 EX 0

Add 4
RegWrite

Add Add result


MemWrite

Shift left 2 ALUSrc $2 Zero ALU ALU result

Branch

PC

Address Instruction memory

Read $4 data 1 Read register 2 Registers Read $5 Write data 2 register Write data

$3

0 M u x 1

Address Data memory Write data

Read data

Instruction [15 0]

Sign extend

ALU control

MemRead

Instruction [20 16] Instruction [15 11]

Clock cycle 4 Clock 4

12

12

11

0 M u x 1 RegDst

ALUOp 10

MemtoReg

Instruction

Read register 1

MemtoReg

Instruction

Read register 1

1 M u x 0

1 M u x 0

Pipelined Execution and Control

IF: add $14, $8, $9

ID: or $13, $6, $7

EX: and $12, . . .

MEM: sub $11, . . .

WB: lw $10, . . .

0 M u x 1

IF/ID 10 or 000 1100

ID/EX WB M 10 000

EX/MEM

MEM/WB

Control

WB M

10 0 0 0 1 WB 1

1 10 EX 0

Add 4 Add Add result

RegWrite

ALUSrc $4 Zero ALU ALU result

PC

Address Instruction memory

7 10

Read $6 data 1 Read register 2 Registers Read $7 Write data 2 register Write data

$5

0 M u x 1

Address Data memory Write data

Read data

Instruction [15 0]

Sign extend

ALU control

MemRead

Clock cycle 5
IF: after<1>

Instruction [20 16] Instruction [15 11]

Instruction sequence:
lw sub and or add $10, $11, $12, $13, $14, 20($1) $2, $3 $4, $7 $6, $7 $8, $9

Clock 5

13

13

12

0 M u x 1 RegDst

ALUOp 11 10

ID: add $14, $8, $9

EX: or $13, . . .

MEM: and $12, . . .

WB: sub $11, . . .

0 M u x 1

IF/ID 10 add 000 1100

ID/EX WB M 10 000

EX/MEM

MEM/WB

Control

WB M

10 0 0 0 1 WB 0

1 10 EX 0

Add 4 Add Add result

RegWrite

ALUSrc $6 Zero ALU ALU result

PC

Address Instruction memory

9 11

Read $8 data 1 Read register 2 Registers Read $9 Write data 2 register Write data

$7

Label after<i> means i th instruction after add Clock cycle 6


Clock 6

0 M u x 1

Address Data memory Write data

Read data

Instruction [15 0]

Sign extend

ALU control

MemRead

Instruction [20 16] Instruction [15 11]

14

14

13

0 M u x 1 RegDst

ALUOp 12 11

MemtoReg
1 M u x 0

Instruction

Read register 1

MemWrite

Shift left 2

Branch

MemtoReg
1 M u x 0

Instruction

Read register 1

MemWrite

Shift left 2

Branch

Pipelined Execution and Control

IF: after<2>

ID: after<1>

EX: add $14, . . .

MEM: or $13, . . .

WB: and $12, . . .

0 M u x 1

IF/ID 00 000 0000

ID/EX WB M 10 000

EX/MEM

MEM/WB

Control

WB M

10 0 0 0 1 WB 0

1 10 EX 0

Add 4 Add Add result

RegWrite

ALUSrc $8 Zero ALU ALU result

PC

Address Instruction memory

12

Read data 1 Read register 2 Registers Read Write data 2 register Write data

$9

0 M u x 1

Address Data memory Write data

Read data

Instruction [15 0]

Sign extend

ALU control

MemRead

Clock cycle 7
Clock 7
IF: after<3>

Instruction [20 16] Instruction [15 11]

14

0 M u x 1 RegDst

ALUOp 13 12

Instruction sequence:
lw sub and or add $10, $11, $12, $13, $14, 20($1) $2, $3 $4, $7 $6, $7 $8, $9

ID: after<2>

EX: after<1>

MEM: add $14, . . .

WB: or $13, . . .

0 M u x 1

IF/ID 00 000 0000

ID/EX WB M 00 000

EX/MEM

MEM/WB

Control

WB M

10 0 0 0 1 WB 0

0 00 EX 0

Add 4 Add Add result

RegWrite

ALUSrc

PC

Address Instruction memory

13

Read data 1 Read register 2 Registers Read Write data 2 register Write data

0 M u x 1

Zero ALU ALU result

Address Data memory Write data

Read data

Instruction [15 0]

Sign extend

ALU control

MemRead

Clock cycle 8
Clock 8

Instruction [20 16] Instruction [15 11]

0 M u x 1 RegDst

ALUOp 14 13

MemtoReg
1 M u x 0

Instruction

Read register 1

MemWrite

Shift left 2

Branch

MemtoReg
1 M u x 0

Instruction

Read register 1

MemWrite

Shift left 2

Branch

Pipelined Execution and Control

Instruction sequence:
lw sub and or add $10, $11, $12, $13, $14, 20($1) $2, $3 $4, $7 $6, $7 $8, $9

IF: after<4>

ID: after<3>

EX: after<2>

MEM: after<1>

WB: add $14, . . .

0 M u x 1

IF/ID 00 000 0000

ID/EX WB M 00 000

EX/MEM

MEM/WB

Control

WB M

00 0 0 0 1 WB 0

0 00 EX 0

Add 4 Add Add result

RegWrite

ALUSrc

PC

Address Instruction memory

14

Read data 1 Read register 2 Registers Read Write data 2 register Write data

0 M u x 1

Zero ALU ALU result

Address Data memory Write data

Read data

Instruction [15 0]

Sign extend

ALU control

MemRead

Instruction [20 16]

Clock cycle 9
Clock 9

Instruction [15 11]

0 M u x 1 RegDst

ALUOp 14

MemtoReg
1 M u x 0

Instruction

Read register 1

MemWrite

Shift left 2

Branch

Revisiting Hazards

So far our datapath and control have ignored hazards We shall revisit data hazards and control hazards and enhance our datapath and control to handle them in hardware

Data Hazards and Forwarding

Problem with starting an instruction before previous are finished:

data dependencies that go backward in time called data hazards


Time (in clock cycles) CC 1 Value of register $2: 10 Program execution order (in instructions) sub $2, $1, $3 IM Reg DM Reg CC 2 10 CC 3 10 CC 4 10 CC 5 10/ 20 CC 6 20 CC 7 20 CC 8 20 CC 9 20

$2 = 10 before sub; $2 = -20 after sub

sub and or add sw

$2, $12, $13, $14, $15,

$1, $3 $2, $5 $6, $2 $2, $2 100($2)

and $12, $2, $5

IM

Reg

DM

Reg

or $13, $6, $2

IM

Reg

DM

Reg

add $14, $2, $2

IM

Reg

DM

Reg

sw $15, 100($2)

IM

Reg

DM

Reg

Software Solution

Have compiler guarantee never any data hazards!

by rearranging instructions to insert independent instructions between instructions that would otherwise have a data hazard between them, or, if such rearrangement is not possible, insert nops $2, $1, $3 sub $2, $1, $3

sub

lw slt and or add sw

$10, 40($3) $5, $6, $7 $12, $2, $5 $13, $6, $2 $14, $2, $2 $15, 100($2)

or

nop nop and or add sw

$12, $13, $14, $15,

$2, $5 $6, $2 $2, $2 100($2)

Such compiler solutions may not always be possible, and nops slow the machine down
MIPS: nop = no operation = 000 (32bits) = sll $0, $0, 0

Hardware Solution: Forwarding

Idea: use intermediate data, do not wait for result to be finally written to the destination register. Two steps:
1. 2.

Detect data hazard Forward intermediate data to resolve hazard

Pipelined Datapath with Control II (as before)


PCSrc 0 M u x 1 Control ID/EX WB M EX EX/MEM WB M MEM/WB WB IF/ID Add 4 Add Add result

RegWrite

ALUSrc

PC

Address Instruction memory

Read data 1 Read register 2 Registers Read Write data 2 register Write data

0 M u x 1

Zero ALU ALU result

Address Data memory Write data

Read data

Control signals emanate from the control portions of the pipeline registers

Instruction 16 [15 0]

Sign extend

32

ALU control

MemRead

Instruction [20 16]

Instruction [15 11]

0 M u x 1 RegDst

ALUOp

MemtoReg
1 M u x 0

Instruction

Read register 1

MemWrite

Shift left 2

Branch

Hazard Detection

Hazard conditions:

1a. EX/MEM.RegisterRd = ID/EX.RegisterRs 1b. EX/MEM.RegisterRd = ID/EX.RegisterRt 2a. MEM/WB.RegisterRd = ID/EX.RegisterRs 2b. MEM/WB.RegisterRd = ID/EX.RegisterRt

Eg., in the earlier example, first hazard between sub $2, $1, $3 and and $12, $2, $5 is detected when the and is in EX stage and the sub is in MEM stage because

EX/MEM.RegisterRd = ID/EX.RegisterRs = $2 (1a)

Whether to forward also depends on:

forward, even if there is register number match as in conditions above if the destination register of the later instruction is $0 in which case there is no need to forward value ($0 is always 0 and never overwritten)

if the later instruction is going to write a register if not, no need to

Data Forwarding Plan:


allow inputs to the ALU not just from ID/EX, but also later pipeline registers, and use multiplexors and control signals to choose appropriate inputs to ALU
Time (in clock cycles) CC 1 Value of register $2 : 10 Value of EX/MEM : X Value of MEM/WB : X Program execution order (in instructions) sub $2, $1, $3 IM Reg DM Reg CC 2 10 X X CC 3 10 X X CC 4 10 20 X CC 5 10/ 20 X 20 CC 6 20 X X CC 7 20 X X CC 8 20 X X CC 9 20 X X

sub and or add sw

$2, $12, $13, $14, $15,

$1, $3 $2, $5 $6, $2 $2, $2 100($2)

and $12, $2, $5

IM

Reg

DM

Reg

or $13, $6, $2

IM

Reg

DM

Reg

add $14, $2, $2

IM

Reg

DM

Reg

sw $15, 100($2)

IM

Reg

DM

Reg

Dependencies between pipelines move forward in time

ID/EX

EX/MEM

MEM/WB

Forwarding Hardware

Registers

ALU Data memory

M u x

a. No forwarding

Datapath before adding forwarding hardware


ID/EX EX/MEM MEM/WB

M u x Registers ForwardA M u x ALU Data memory

M u x

Rs Rt Rt Rd

ForwardB M u x Forwarding unit

EX/MEM.RegisterRd MEM/WB.RegisterRd

b. With forwarding

Datapath after adding forwarding hardware

Forwarding Hardware: Multiplexor Control


Mux control
ForwardA = 00 ForwardA = 10 ForwardA = 01 ForwardB = 00 ForwardB = 10 ForwardB = 01

Source
ID/EX EX/MEM MEM/WB ID/EX EX/MEM MEM/WB

Explanation
The first ALU operand comes from the register file The first ALU operand is forwarded from prior ALU result The first ALU operand is forwarded from data memory or an earlier ALU result The second ALU operand comes from the register file The second ALU operand is forwarded from prior ALU result The second ALU operand is forwarded from data memory or an earlier ALU result

Depending on the selection in the rightmost multiplexor (see datapath with control diagram)

Data Hazard: Detection and Forwarding

Forwarding unit determines multiplexor control according to the following rules:


EX hazard if ( EX/MEM.RegWrite // if there is a write and ( EX/MEM.RegisterRd 0 ) // to a non-$0 register and ( EX/MEM.RegisterRd = ID/EX.RegisterRs ) ) // which matches, then ForwardA = 10 EX/MEM.RegWrite // if there is a write and ( EX/MEM.RegisterRd 0 ) // to a non-$0 register and ( EX/MEM.RegisterRd = ID/EX.RegisterRt ) ) // which matches, then ForwardB = 10 if (

1.

Data Hazard: Detection and Forwarding


2.

MEM hazard if ( MEM/WB.RegWrite and ( MEM/WB.RegisterRd 0 ) and ( EX/MEM.RegisterRd ID/EX.RegisterRs )

// if there is a write // to a non-$0 register // and not already a register match // with earlier pipeline register and ( MEM/WB.RegisterRd = ID/EX.RegisterRs ) ) // but match with later pipeline register, then ForwardA = 01 if ( // if there is a write // to a non-$0 register // and not already a register match // with earlier pipeline register and ( MEM/WB.RegisterRd = ID/EX.RegisterRt ) ) // but match with later pipeline register, then ForwardB = 01
This check is necessary, e.g., for sequences such as add $1, $1, $2; add $1, $1, $3; add $1, $1, $4; (array summing?), where an earlier pipeline (EX/MEM) register has more recent data

MEM/WB.RegWrite and ( MEM/WB.RegisterRd 0 ) and ( EX/MEM.RegisterRd ID/EX.RegisterRt )

Forwarding Hardware with Control


ID/EX WB M EX/MEM WB Control IF/ID EX M

Called forwarding unit, not hazard detection unit, because once data is forwarded there is no hazard!

MEM/WB WB

Instruction

M u x Registers ALU M u x IF/ID.RegisterRs IF/ID.RegisterRt IF/ID.RegisterRt IF/ID.RegisterRd Rs Rt Rt Rd M u x Forwarding unit EX/MEM.RegisterRd Data memory

PC

Instruction memory

M u x

MEM/WB.RegisterRd

Datapath with forwarding hardware and control wires certain details, e.g., branching hardware, are omitted to simplify the drawing Note: so far we have only handled forwarding to R-type instructions!

or $4, $4, $2

and $4, $2, $5

sub $2, $1, $3


ID/EX 10 WB M EX 10

before<1>

before<2>

Forwarding
PC Instruction memory

EX/MEM WB M MEM/WB WB

Control

IF/ID

2
Instruction

$2

$1 M u x

5 Registers

ALU $5 $3 M u x 2 5 4 1 3 2 M u x Forwarding unit

Data memory

M u x

Execution example:
$2, $4, $4, $9, $1, $2, $4, $4, $3 $5 $2 $2

Clock cycle 3
Clock 3 add $9, $4, $2 or $4, $4, $2 and $4, $2, $5
ID/EX

sub $2, . . .

before<1>

Instruction

sub and or add

10

WB M EX

10 EX/MEM WB M 10 MEM/WB WB

Control

IF/ID

4 6 Registers

$4

$2 M u x ALU Data memory

PC

Instruction memory

$2

$5 M u x

M u x

2 6 4

2 5 4 M u x Forwarding unit 2

Clock cycle 4
Clock 4

after<1>

add $9, $4, $2

or $4, $4, $2
ID/EX 10 WB M EX 10

and $4, . . .

sub $2, . . .

EX/MEM WB M 10 MEM/WB WB 1

Forwarding
PC Instruction memory

Control

IF/ID

4
Instruction

$4

$4 M u x

2 2 Registers

ALU $2 $2 M u x 4 2 4 2 4 M u x Forwarding unit 4

Data memory

M u x

Execution example (cont.):


$2, $4, $4, $9, $1, $2, $4, $4, $3 $5 $2 $2

Clock cycle 5
Clock 5 after<2> after<1> add $9, $4, $2
ID/EX

or $4, . . .

and $4, . . .

Instruction

sub and or add

WB M EX

10 EX/MEM WB M 10 MEM/WB WB 1

Control

IF/ID

$4 M u x 4 Registers ALU $2 M u x 4 2 9 M u x Forwarding unit 4 4 Data memory

PC

Instruction memory

M u x

Clock cycle 6
Clock 6

Data Hazards and Stalls

Load word can still cause a hazard:

an instruction tries to read a register following a load instruction that writes to the same register
20($1) $2, $5 $2, $6 $4, $2 $6, $7
Time (in clock cycles) Program CC 1 execution order (in instructions) lw $2, 20($1) IM CC 2 CC 3 CC 4 CC 5 CC 6 CC 7 CC 8 CC 9

lw and or add Slt

$2, $4, $8, $9, $1,

Reg

DM

Reg

and $4, $2, $5

IM

Reg

DM

Reg

As even a pipeline dependency goes backward in time forwarding will not solve the hazard

or $8, $2, $6

IM

Reg

DM

Reg

add $9, $4, $2

IM

Reg

DM

Reg

slt $1, $6, $7

IM

Reg

DM

Reg

therefore, we need a hazard detection unit to stall the pipeline after the load instruction

Pipelined Datapath with Control II (as before)


PCSrc 0 M u x 1 Control ID/EX WB M EX EX/MEM WB M MEM/WB WB IF/ID Add 4 Add Add result

RegWrite

ALUSrc

PC

Address Instruction memory

Read data 1 Read register 2 Registers Read Write data 2 register Write data

0 M u x 1

Zero ALU ALU result

Address Data memory Write data

Read data

Control signals emanate from the control portions of the pipeline registers

Instruction 16 [15 0]

Sign extend

32

ALU control

MemRead

Instruction [20 16]

Instruction [15 11]

0 M u x 1 RegDst

ALUOp

MemtoReg
1 M u x 0

Instruction

Read register 1

MemWrite

Shift left 2

Branch

Hazard Detection Logic to Stall

Hazard detection unit implements the following check if to stall

if ( ID/EX.MemRead // if the instruction in the EX stage is a load and ( ( ID/EX.RegisterRt = IF/ID.RegisterRs ) // and the destination register or ( ID/EX.RegisterRt = IF/ID.RegisterRt ) ) ) // matches either source register // of the instruction in the ID stage, then stall the pipeline

Mechanics of Stalling

If the check to stall verifies, then the pipeline needs to stall only 1 clock cycle after the load as after that the forwarding unit can resolve the dependency What the hardware does to stall the pipeline 1 cycle:

does not let the IF/ID register change (disable write!) this will cause the instruction in the ID stage to repeat, i.e., stall
therefore, the instruction, just behind, in the IF stage must be stalled as well so hardware does not let the PC change (disable write!) this will cause the instruction in the IF stage to repeat, i.e., stall

changes all the EX, MEM and WB control fields in the ID/EX pipeline register to 0, so effectively the instruction just behind the load becomes a nop a bubble is said to have been inserted
into the pipeline

note that we cannot turn that instruction into an nop by 0ing all the bits in the instruction itself recall nop = 000 (32 bits) because it has already been decoded and control signals generated

Hazard Detection Unit


Hazard detection unit
IF/IDWrite

ID/EX.MemRead ID/EX WB M u x EX/MEM WB M

Control 0

M EX

MEM/WB WB

IF/ID

PCWrite

Instruction

M u x Registers ALU M u x IF/ID.RegisterRs IF/ID.RegisterRt IF/ID.RegisterRt IF/ID.RegisterRd ID/EX.RegisterRt Rt Rd Rs Rt Data memory

PC

Instruction memory

M u x

M u x Forwarding unit

EX/MEM.RegisterRd

MEM/WB.RegisterRd

Datapath with forwarding hardware, the hazard detection unit and controls wires certain details, e.g., branching hardware are omitted to simplify the drawing

Stalling Resolves a Hazard

Same instruction sequence as before for which forwarding by itself could not resolve the hazard:
Program Time (in clock cycles) execution CC 1 CC 2 order (in instructions) CC 3 CC 4 CC 5 CC 6 CC 7 CC 8 CC 9 CC 10

lw and or add Slt

$2, $4, $8, $9, $1,

20($1) $2, $5 $2, $6 $4, $2 $6, $7

lw $2, 20($1)

IM

Reg

DM

Reg

and $4, $2, $5

IM

Reg

Reg

DM

Reg

or $8, $2, $6

IM

IM

Reg

DM

Reg

bubble
add $9, $4, $2 IM Reg DM Reg

slt $1, $6, $7

IM

Reg

DM

Reg

Hazard detection unit inserts a 1-cycle bubble in the pipeline, after which all pipeline register dependencies go forward so then the forwarding unit can handle them and there are no more hazards

and $4, $2, $5

lw $2, 20($1)
1 X
IF/IDWrite

before<1>
ID/EX.MemRead ID/EX 11 M u x WB M EX

before<2>

before<3>

Hazard detection unit

EX/MEM WB M MEM/WB WB

Instruction

Stalling

Control 0

IF/ID

PCWrite

1 X Registers

$1 M u x ALU $X M u x Data memory

PC

Instruction memory

M u x

Execution example:
Clock cycle 2 Clock 2 lw and or add $2, $4, $4, $9, 20($1) $2, $5 $4, $2 $4, $2
or $4, $4, $2 and $4, $2, $5
2 5
IF/IDWrite

1 X 2

M u x Forwarding unit

ID/EX.RegisterRt

lw $2, 20($1)
ID/EX.MemRead ID/EX 00 M u x WB M EX 11

before<1>

before<2>

Hazard detection unit

EX/MEM WB M MEM/WB WB

Control 0

IF/ID

PCWrite

2
Instruction

$2

$1 M u x

5 Registers

PC

Instruction memory

ALU $5 $X M u x 2 5 4 ID/EX.RegisterRt 1 X 2 M u x Forwarding unit

Data memory

M u x

Clock cycle 3
Clock 3

or $4, $4, $2

and $4, $2, $5


2 5 Hazard detection unit ID/EX.MemRead

bubble
ID/EX 10 M u x WB M EX 00

lw $2, . . .

before<1>

IF/IDWrite

EX/MEM WB M 11 MEM/WB WB

PCWrite

Instruction

Stalling
PC Instruction memory

Control 0

IF/ID

2 5 Registers

$2

$2 M u x ALU Data memory

$5

$5 M u x

M u x

Execution example (cont.):


lw and or add $2, $4, $4, $9,

2 5 4 ID/EX.RegisterRt

2 5 4 M u x Forwarding unit 2

Clock cycle 4
Clock 4 add $9, $4, $2 or $4, $4, $2
4 2 Hazard detection unit ID/EX.MemRead ID/EX 10 M u x WB M EX 10

and $4, $2, $5

bubble

lw $2, . . .

PCWrite

20($1) $2, $5 $4, $2 $4, $2

IF/IDWrite

EX/MEM WB M 0 MEM/WB WB 11

Control 0

IF/ID

$4

$2 M u x

Instruction

2 2 Registers

PC

Instruction memory

ALU $2 $5 M u x 4 2 4 ID/EX.RegisterRt 2 5 4 M u x Forwarding unit

Data memory

M u x

Clock cycle 5
Clock 5

after<1>

add $9, $4, $2


4 2 Hazard detection unit ID/EX.MemRead ID/EX 10 M u x WB M EX 10

or $4, $4, $2

and $4, . . .

bubble

IF/IDWrite

EX/MEM WB M 10 MEM/WB WB 0

Control 0

Instruction

Stalling

IF/ID

PCWrite

4 2 Registers

$4

$4 M u x ALU Data memory

PC

Instruction memory

$2

$2 M u x

M u x

Execution example (cont.):


lw and or add $2, $4, $4, $9, 20($1) $2, $5 $4, $2 $4, $2

4 2 9 ID/EX.RegisterRt

4 2 4 M u x Forwarding unit 4

Clock cycle 6
Clock 6 after<2> after<1>
Hazard detection unit ID/EX.MemRead ID/EX 10 M u x WB M EX 10

add $9, $4, $2

or $4, . . .

and $4, . . .

IF/IDWrite

EX/MEM WB M 10 MEM/WB WB 1

Control 0

IF/ID

PCWrite

$4

Instruction

M u x 4 Registers ALU $2 M u x 4 2 9 ID/EX.RegisterRt M u x Forwarding unit 4 4 Data memory

PC

Instruction memory

M u x

Clock cycle 7
Clock 7

Control (or Branch) Hazards

Problem with branches in the pipeline we have so far is that the branch decision is not made till the MEM stage so what

instructions, if at all, should we insert into the pipeline following the branch instructions?

Possible solution: stall the pipeline till branch decision is known

not efficient, slow the pipeline significantly!

Another solution: predict the branch outcome

e.g., always predict branch-not-taken continue with next

sequential instructions

if the prediction is wrong have to flush the pipeline behind the branch discard instructions already fetched or decoded and

continue execution at the branch target

Predicting Branch-not-taken: Misprediction delay


Time (in clock cycles) Program execution CC 1 CC 2 order (in instructions) 40 beq $1, $3, 7 IM Reg CC 3 CC 4 CC 5 CC 6 CC 7 CC 8 CC 9 DM Reg

44 and $12, $2, $5

IM

Reg

DM

Reg

48 or $13, $6, $2

IM

Reg

DM

Reg

52 add $14, $2, $2

IM

Reg

DM

Reg

72 lw $4, 50($7)

IM

Reg

DM

Reg

The outcome of branch taken (prediction wrong) is decided only when beq is in the MEM stage, so the following three sequential instructions already in the pipeline have to be flushed and execution resumes at lw

Optimizing the Pipeline to Reduce Branch Delay

Move the branch decision from the MEM stage (as in our current pipeline) earlier to the ID stage

calculating the branch target address involves moving the

branch adder from the MEM stage to the ID stage inputs to this adder, the PC value and the immediate fields are already available in the IF/ID pipeline register calculating the branch decision is efficiently done, e.g., for equality test, by XORing respective bits and then ORing all the results and inverting, rather than using the ALU to subtract and then test for zero (when there is a carry delay)

with the more efficient equality test we can put it in the ID stage without significantly lengthening this stage remember an objective of pipeline design is to keep pipeline stages balanced

we must correspondingly make additions to the forwarding and hazard detection units to forward to or stall the branch at the ID
stage in case the branch decision depends on an earlier result

Flushing on Misprediction

Same strategy as for stalling on load-use data hazard Zero out all the control values (or the instruction itself) in pipeline registers for the instructions following the branch that are already in the pipeline effectively turning them into nops so they are flushed

in the optimized pipeline, with branch decision made in the ID stage, we have to flush only one instruction in the IF stage the branch delay penalty is then only one clock cycle

Optimized Datapath for Branch


IF.Flush Hazard detection unit M u x M u x ID/EX WB Control 0 IF/ID M EX

IF.Flush control zeros out the instruction in the IF/ID pipeline register (which follows the branch)
EX/MEM WB M MEM/WB WB

Shift left 2 M u x ALU M u x Data memory

Registers PC Instruction memory

M u x

Sign extend

M u x Forwarding unit

Branch decision is moved from the MEM stage to the ID stage simplified drawing not showing enhancements to the forwarding and hazard detection units

Pipelined Branch

and $12, $2, $5


IF.Flush

beq $1, $3, 7

sub $10, $4, $8

before<1>

before<2>

Hazard detection unit 72 48 x


M u

ID/EX WB Control 28 IF/ID 48 44 72 0 M u x M EX EX/MEM WB M MEM/WB WB

4 $1
Shift left 2

Registers 72 PC 44 Instruction memory 7

=
$3

M u x

$4

ALU M u x $8

Data memory

M u x

Execution example:
sub beq and or add slt $10, $1, $12 $13 $14, $15, $4, $4, $3, $2, $2, $4, $6, $8 7 $5 $6 $2 $7 Clock cycle 3 Clock 3
lw $4, 50($7)
IF.Flush

Sign extend

10 Forwarding unit

36 40 44 48 52 56

bubble (nop)

beq $1, $3, 7

sub $10, . . .

before<1>

Hazard detection unit ID/EX 76 x


M u

WB Control 0 IF/ID 76 72 M u x M EX

EX/MEM WB M MEM/WB WB

72 lw 50($7)
76 PC 72

4
Shift left 2

Registers Instruction memory

M u x

$1

ALU M u x $3

Data memory

M u x

Optimized pipeline with only one bubble as a result of the taken branch Clock cycle 4 Clock 4

Sign extend

10 Forwarding unit

Simple Example: Comparing Performance

Compare performance for single-cycle, multicycle, and pipelined datapaths using the gcc instruction mix

assume 2 ns for memory access, 2 ns for ALU operation, 1 ns for register read or write assume gcc instruction mix 23% loads, 13% stores, 19% branches, 2% jumps, 43% ALU for pipelined execution assume

50% of the loads are followed immediately by an instruction that uses the result of the load 25% of branches are mispredicted branch delay on misprediction is 1 clock cycle jumps always incur 1 clock cycle delay so their average time is 2 clock cycles

Simple Example: Comparing Performance


Single-cycle (p. 373): average instruction time 8 ns Multicycle (p. 397): average instruction time 8.04 ns Pipelined:
loads use 1 cc (clock cycle) when no load-use dependency and 2 cc when there is dependency given 50% of loads are followed by dependency the average cc per load is 1.5 stores use 1 cc each branches use 1 cc when predicted correctly and 2 cc when not given 25% misprediction average cc per branch is 1.25 jumps use 2 cc each ALU instructions use 1 cc each therefore, average CPI is 1.5 23% + 1 13% + 1.25 19% + 2 2% + 1 43% = 1.18 therefore, average instruction time is 1.18 2 = 2.36 ns

Вам также может понравиться