Chapter 3.8, 3.9, 3.11-3.13
Ideal Machine
Results
[Figure: instructions issued per cycle (axis 0 to 200) for tomcatv, doduc, fpppp, li, espresso, gcc]
Observations
First three are floating-point programs
data-intensive, loop-intensive
Issue Width
[Figure: results for tomcatv, doduc, fpppp, espresso, gcc, li at settings 2K, 512, 128, 32]
Observations
Branch Prediction
[Figure: results (axis 0 to 70) for tomcatv, doduc, fpppp, espresso, gcc, li with each branch predictor: Perfect, Tournament, 2-bit, Static, None]
Observations
No misprediction penalty is charged; a misprediction
just reduces the available ILP
tomcatv & fpppp
Results
[Figure: results (axis 0 to 100) for gcc, espresso, li, fpppp, doduc, tomcatv with each predictor: tournament, 2-bit counter, static (profile)]
Observations
tomcatv has very high prediction accuracy
Integer programs are generally lower than
scientific (floating-point) programs
For the rest of the results, use a larger tournament
predictor than shown in this picture (and a
2K window size and 64 issue width)
[Figure: results for tomcatv, doduc, fpppp, espresso, gcc, li at register counts 256, 128, 64, 32, None]
Observations
tomcatv & fpppp are more sensitive to the number
of registers. Why?
For the rest of the results: 256 integer & 256 FP
regs
Alpha 21264: 41 integer & 41 FP (+ 32 of
each in ISA)
Which do you think made more difference:
regs or branch prediction?
Alias Analysis
Memory Disambiguation
Global/stack perfect
assumes all heap refs conflict
perfectly predicts global & stack
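The global/stack perfect model above can be sketched as a single conflict test. This is a minimal illustration, not the study's actual implementation: the `Region` enum and the `may_conflict` function are hypothetical names, and the sketch assumes that any pair involving a heap reference counts as a conflict.

```python
from enum import Enum


class Region(Enum):
    # Hypothetical classification of a memory reference by segment.
    GLOBAL = "global"
    STACK = "stack"
    HEAP = "heap"


def may_conflict(region_a, addr_a, region_b, addr_b):
    """Global/stack perfect disambiguation (illustrative sketch).

    Assumption: if either reference is to the heap, the pair is
    conservatively treated as conflicting. Global and stack references
    are predicted perfectly, so they conflict only when the addresses
    are actually equal.
    """
    if region_a is Region.HEAP or region_b is Region.HEAP:
        return True
    return addr_a == addr_b
```

Under this model, two heap references to different addresses still block reordering, while two stack references to different addresses do not.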
Alias Analysis
Memory Disambiguation
[Figure: results (axis 0 to 60) for tomcatv, doduc, fpppp, espresso, gcc, li under each model: Perfect, Global/Stack Perfect, Inspection, None]
Observations
Compiler is not good enough
Scientific programs have few heap
references
RAW hazards
value prediction
Realistic machines
Address value prediction and speculation
predict the address
reorder if two predicted addresses don't
match
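The address-speculation rule can be sketched as follows. The function name `try_reorder` and its parameters are hypothetical, and the recovery check is an assumption: the slide only says to predict and reorder, while this sketch also flags the case where the speculation turns out to be wrong.

```python
def try_reorder(pred_store, pred_load, actual_store, actual_load):
    """Return (reordered, must_recover) for one store/load pair.

    Illustrative sketch: the load is moved ahead of the store only when
    the two predicted addresses differ. The must_recover flag is an
    assumption about what a real machine would need: it is set when the
    pair was reordered but the actual addresses turned out to be equal.
    """
    reordered = pred_store != pred_load
    must_recover = reordered and actual_store == actual_load
    return reordered, must_recover
```

For example, `try_reorder(0x100, 0x200, 0x100, 0x200)` reorders safely, while `try_reorder(0x100, 0x200, 0x200, 0x200)` reorders on a bad prediction and reports that recovery is needed.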
An alternate approach
Accept that there will be delays in processing
Give the processor something else to do while it is waiting
Let two threads share the same machine
TLP vs ILP
Advantages
What is the difference between filling the
window with instructions from two threads
and instructions from one thread?
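One way to explore the question above is a toy model. It is a sketch under strong assumptions: each thread is a fully serial dependence chain with one-cycle latency, instructions from different threads are independent, and `cycles_to_finish` is a hypothetical helper rather than anything from the slides.

```python
def cycles_to_finish(chain_lengths, issue_width):
    """Cycles needed to issue every instruction (toy model).

    Assumptions: each entry in chain_lengths is one thread whose
    instructions form a serial dependence chain, so at most one
    instruction per thread can issue per cycle; the machine issues up
    to issue_width instructions per cycle in total.
    """
    remaining = list(chain_lengths)
    cycles = 0
    while any(r > 0 for r in remaining):
        issued = 0
        for i in range(len(remaining)):
            if remaining[i] > 0 and issued < issue_width:
                remaining[i] -= 1  # issue the next instruction of thread i
                issued += 1
        cycles += 1
    return cycles


one_thread = cycles_to_finish([8], issue_width=2)      # 8 cycles
two_threads = cycles_to_finish([4, 4], issue_width=2)  # 4 cycles
```

In this model the same eight instructions finish in half the time when they come from two threads, because the second issue slot is no longer blocked by a dependence.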
Different Perspectives
SMT allows one thread to get work done
while the other thread is waiting for
something
SMT is a parallel machine that allows
dynamic resource allocation
SMT is a parallel machine that is
unnecessarily large and complex, and
parallel threads would be better off on
separate, simpler processing elements.
Disadvantages
Interference
Clock rate
Fallacies
Processors with lower CPIs will always be
faster
Processors with faster clock rates will
always be faster
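Both fallacies can be checked against the standard relation: CPU time = instruction count x CPI / clock rate. The sketch below uses made-up processors and numbers purely for illustration; none of them come from the slides.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """Execution time in seconds: instructions x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


INSTRUCTIONS = 1e9  # hypothetical program size, same on every machine

# Hypothetical processors (CPI, clock rate); values chosen for illustration.
time_a = cpu_time(INSTRUCTIONS, cpi=1.0, clock_rate_hz=1e9)  # low CPI, slow clock
time_b = cpu_time(INSTRUCTIONS, cpi=1.5, clock_rate_hz=2e9)  # higher CPI, fast clock
time_c = cpu_time(INSTRUCTIONS, cpi=3.0, clock_rate_hz=2e9)  # fast clock, poor CPI

# A has the lowest CPI yet loses to B; C has a faster clock than A yet loses to A.
assert time_b < time_a < time_c
```

Neither CPI nor clock rate decides the comparison alone; only their combination does.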
Pitfall
Improving CPI by increasing width but
sacrificing ______________
Improving only one aspect of a multiple-issue processor and expecting overall
performance improvement
Sacrificing complexity for space