Академический Документы
Профессиональный Документы
Культура Документы
References:
Chapter 11 of M. Morris Mano and Charles Kime, Logic and
Computer Design Fundamentals, Pearson Prentice Hall, 4th
Edition, 2008.
Overview
Pipelining concept
Pipelined design of Simple Computer
Pipeline hazards
1ns
200ps
200ps
200ps
200ps
200ps
Pipeline
Register
3
Pipelined:
1 operation finishes
every 200ps
200ps
200ps
200ps
200ps
200ps
Pipelining
Pipelining increases throughput, but not total
computation time of a task
Limitations:
Pipelining
Pipelining transformation leads to a
reduction in the critical path, which can be
exploited to increase the clock speed or to
reduce power consumption at same speed.
In parallel processing, multiple outputs are
computed in parallel in a clock period.
Therefore, the effective clock speed is
increased by the level of parallelism.
If we do laundry sequentially...
6 PM
30 30
T
a
s
k
O
r
d
e
r
30
30
30 30
11
10
9
30
30
30 30
30
12
30
2 AM
30 30
30
30
Time
B
C
D
30 30
T
a
s
k
O
r
d
e
r
30
30
10
30
30
11
12
2 AM
Time
30
B
D
30 30
T
a
s
k
O
r
d
e
r
B
D
30
30
10
30
30
30
11
12
2 AM
Time
13
14
DMem
Ifetch
Reg
DMem
Ifetch
Reg
DMem
Ifetch
Reg
ALU
O
r
d
e
r
Reg
ALU
Ifetch
ALU
I
n
s
t
r.
ALU
Reg
Reg
DMem
Reg
15
Overview
Pipelining concept
Pipelined design of Simple Computer
Pipeline hazards
16
17
18
19
20
10
Pipelined Implementation
21
5-stage Pipeline
CPU stages
DR
DF
IF
DR
DF
IF
DR
DF
22
11
Computer Pipelines
Execute billions of instruction, so throughput is
what matters
Throughput versus latency
+ Throughput increases
- Latency for a single instruction increases
Unpipelined Datapath
24
12
Pipelined Datapath
MAR
MDR
25
Overview
Pipelining concept
Pipelined design of Simple Computer
Pipeline hazards
26
13
Pipelining Hazards
Hazards cause the pipe to stall because of
some conflict in the pipe (prevents the next
instruction in pipe from executing in its turn)
Types of hazards
27
Structural Hazards
Resource conflicts in the pipeline
Examples
data access
Register file without a separate write port
28
14
Structural Hazards
IF
load
DR
DF
IF
DR
DF
IF
DR
DF
STALL
IF
DR
DF
IF
DR
DF
sub
and
or
W
29
Structural Hazards
IF and DF compete for single memory port
Ideal Machine
Solutions:
cache-memory (expensive)
Have separate instruction cache-memory and data
cache-memory
30
15
31
r1, (r3)+
32
16
33
Data Hazards
Overlapping instructions cause dependencies
on data (RAW)
e.g.,
MOVA R1, R5
ADD R2, R1, R6
ADD R3, R1, R2
R1R5
Write R1
IF
DR
DF
IF
DR
DF
IF
DR
DF
R2R1 + R6
R3R1 + R2
Write R2
34
17
Hardware stalls
MOVA R1, R5
IF
ADD R2
R2, R1
R1, R6
R6
Hazard detection
DR
IF
DR
DF
IF
DR
IF
ADD R2
R2, R1
R1, R6
R6
DF
DR
DF
IF
DR
DF
W
36
18
37
38
19
Control Hazards
For branch or jump instruction, the correct
instruction to execute is not known in time (at
the start of the IF stage of the instruction after
the Branch)
branch instruction)
Target instruction address not yet calculated
39
Control Hazards
Conflicts that arise from changes in the Program Counter
R1=0 evaluated,
and PC set to 20
Branch instructions
1 BZ R1, 18
2 MOV R2, R3
3 MOV R1, R2
IF
DR
DF
WB
IF
DR
DF
WB
IF
DR
DF
WB
IF
DR
DF
20 MOV R5, R6
WB
Instruction fetched
from target address
20
41
Sol 2:
Perform target address calculation earlier in DR
stage
42
21
Sol. 3a
Assume branch not taken, fixup if taken
43
Sol.3b
Assume branch taken, fixup if not taken
44
22
Sol.4
Delayed branch (ISA change)
45
ISA solution
Branch prediction
47
23
48
Pipelining Cautions
Superpipelining can cause long latencies
49
24
Overview
Pipelining concept
Pipelined design of Simple Computer
Pipeline hazards
50
Parallelism in CPUs
Instruction Level Parallelism
Superscalar
51
25
Pipelined CPUs
mult r1, r2, r3
add r4, r5, r6
ID
IF
ID
MEM
WB
MEM
STALL STALL
WB
52
Superscalar CPUs
mult r1, r2, r3
add r4, r5, r6
IF
ID
IF
ID
MEM
WB
MEM
WB
53
26
4-Way Superscalar
Todays microprocessors are typically 4-way
superscalar
54
VLIW CPUs
mult r1, r2, r3; add r4, r5, r6
IF
ID
MEM
WB
MEM
WB
55
27
4-way VLIW
56
Summary
Pipelining is essential
Parallel computing is the future
57
28