Presentation by: Robert Duckles, CSE 520
Paper being presented: "Limits of Instruction-Level Parallelism", David W. Wall, WRL Research Report, November 1993
What is ILP?
Instruction-level parallelism (ILP) exists among instructions that have no dependencies on each other; such instructions can be executed in any order.
Has ILP:
    r1 := 0[r9]
    r2 := 17
    4[r3] := r6

No ILP:
    r1 := 0[r9]
    r2 := r1 + 17
    4[r2] := r6
Super-scalar machine: a machine that can issue multiple independent instructions in the same clock cycle.
Definition of Parallelism
Parallelism = (Number of instructions) / (Number of cycles it takes to execute them)

    r1 := 0[r9]
    r2 := 17
    4[r3] := r6
    Parallelism = 3 (three instructions in one cycle)

    r1 := 0[r9]
    r2 := r1 + 17
    4[r2] := r6
    Parallelism = 1 (three instructions in three cycles)
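The two parallelism figures above can be reproduced with a small sketch. It is only an illustration under stated assumptions, not the paper's simulator: every instruction takes one cycle, any number of instructions may issue per cycle, only register read-after-write dependencies are tracked, and memory is ignored. The function name `parallelism` and the `(dest, sources)` encoding are hypothetical.

```python
def parallelism(instrs):
    """Return instructions / cycles for a list of (dest, sources) tuples.

    Assumptions (not from the slides): unit latency, unlimited issue
    width, and only register read-after-write dependencies cause delay.
    """
    ready = {}       # register -> cycle in which its value is produced
    last_cycle = 0
    for dest, sources in instrs:
        # Earliest cycle is one after the latest producer of any source.
        cycle = 1 + max((ready.get(s, 0) for s in sources), default=0)
        if dest is not None:
            ready[dest] = cycle
        last_cycle = max(last_cycle, cycle)
    return len(instrs) / last_cycle


# r1 := 0[r9]; r2 := 17; 4[r3] := r6
with_ilp = [("r1", ["r9"]), ("r2", []), (None, ["r3", "r6"])]
# r1 := 0[r9]; r2 := r1 + 17; 4[r2] := r6
no_ilp = [("r1", ["r9"]), ("r2", ["r1"]), (None, ["r2", "r6"])]

print(parallelism(with_ilp))  # 3.0
print(parallelism(no_ilp))    # 1.0
```

A store such as `4[r3] := r6` is encoded with no destination register, since it only reads registers.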
That depends how hard you want to look for it...
Ways to increase ILP:
* Register renaming
* Branch prediction
* Alias analysis
* Indirect-jump prediction
Programs are made up of basic blocks: uninterrupted sequences of instructions with no branches. On average, in typical applications, basic blocks are ~10 instructions long. Each basic block has parallelism of around 3.
Types of dependencies
* True dependency - given the computations involved, the dependency must exist.
* False dependency - the dependency happens to exist as an artifact of the code generation engine, e.g., two independent values are allocated to the same register by the compiler.

(a) true data dependency
    r1 := 20[r4]
    ...
    r2 := r1 + 1

(b) anti-dependency
    r2 := r1 + r4
    ...
    r1 := r17 - 1

(c) output dependency
    r1 := r2 * r3
    ...
    r1 := 0[r7]

(d) control dependency
    if r17 = 0 goto L
    ...
    r1 := r2 + r3
    ...
    L:
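As a rough illustration of the three data-dependency kinds, the sketch below classifies how a later instruction depends on an earlier one by comparing the registers each reads and writes. The `(dest, sources)` encoding and the function name `classify` are assumptions made for this example, and control dependencies are not modeled.

```python
def classify(first, second):
    """Return the kinds of data dependency that `second` has on `first`.

    Each instruction is a hypothetical (dest, sources) tuple of
    register names; `first` precedes `second` in program order.
    """
    d1, s1 = first
    d2, s2 = second
    kinds = set()
    if d1 is not None and d1 in s2:
        kinds.add("true")     # second reads what first wrote
    if d2 is not None and d2 in s1:
        kinds.add("anti")     # second overwrites what first read
    if d1 is not None and d1 == d2:
        kinds.add("output")   # both write the same register
    return kinds


# r1 := 20[r4] ... r2 := r1 + 1
example_a = (("r1", ["r4"]), ("r2", ["r1"]))
# r2 := r1 + r4 ... r1 := r17 - 1
example_b = (("r2", ["r1", "r4"]), ("r1", ["r17"]))
# r1 := r2 * r3 ... r1 := 0[r7]
example_c = (("r1", ["r2", "r3"]), ("r1", ["r7"]))

for name, pair in [("a", example_a), ("b", example_b), ("c", example_c)]:
    print(name, classify(*pair))
```

Only the "true" kind reflects the computation itself; "anti" and "output" arise from register reuse.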
Register renaming
The compiler's register allocation algorithm can insert false dependencies by assigning unrelated values to the same register. We can undo this damage by assigning each value to a unique register so that only true dependencies remain. However, machines have a finite number of registers, so we can never guarantee perfect parallelism.
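A minimal sketch of the renaming idea follows, assuming an unlimited supply of fresh registers (which, as noted above, real machines do not have). Every write gets a new name, and every read uses the newest name of its register. The `p0, p1, ...` naming, the `(dest, sources)` encoding, and the function name `rename` are hypothetical.

```python
from itertools import count


def rename(instrs):
    """Rewrite (dest, sources) tuples so each value has a unique register.

    Assumption: unlimited fresh registers named p0, p1, ...
    """
    fresh = (f"p{i}" for i in count())
    current = {}   # original register -> newest renamed register
    renamed = []
    for dest, sources in instrs:
        new_sources = []
        for s in sources:
            if s not in current:          # value defined before this code
                current[s] = next(fresh)
            new_sources.append(current[s])
        new_dest = None
        if dest is not None:
            new_dest = next(fresh)        # every write gets a new name
            current[dest] = new_dest
        renamed.append((new_dest, new_sources))
    return renamed


# Anti-dependency: r2 := r1 + r4 ... r1 := r17 - 1
anti = [("r2", ["r1", "r4"]), ("r1", ["r17"])]
# Output dependency: r1 := r2 * r3 ... r1 := 0[r7]
output = [("r1", ["r2", "r3"]), ("r1", ["r7"])]

print(rename(anti))
print(rename(output))
```

After renaming, the second instruction in each pair writes a register that the first neither reads nor writes, so the false dependency disappears.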
Alias analysis
We often have registers that point to a memory location or contain a memory offset. Can two memory pointers point to the same place in memory? If so, there might be a dependency; we cannot be sure until the addresses are known. We can try to inspect pointer values at runtime to see if they point to overlapping memory.
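The runtime check described above might look like the following sketch. It assumes that every access is `offset[base]`, that each access covers a fixed 4-byte word, and that register contents are available as a dictionary; these details and the function names are assumptions for illustration only.

```python
WORD = 4  # assumed access size in bytes


def effective_address(regs, base, offset):
    """Address of offset[base], e.g. 4[r3] -> regs['r3'] + 4."""
    return regs[base] + offset


def overlaps(addr_a, addr_b, size=WORD):
    """True if two size-byte accesses touch at least one common byte."""
    return abs(addr_a - addr_b) < size


regs = {"r3": 100, "r9": 104}
store = effective_address(regs, "r3", 4)   # 4[r3] := r6
load = effective_address(regs, "r9", 0)    # r1 := 0[r9]

# Same address: the load must stay ordered with respect to the store.
print(overlaps(store, load))

regs["r9"] = 200
load = effective_address(regs, "r9", 0)
# Disjoint addresses: the two accesses are independent.
print(overlaps(store, load))
```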
Branch prediction
We can correctly predict about 0.9 (roughly 90%) of branches by keeping a count of each branch's recent outcomes and predicting the more common one.
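The slide does not say which counting scheme is meant. The sketch below assumes one common choice, a 2-bit saturating counter per branch address, and measures its accuracy on a made-up trace of a loop branch that is taken nine times and then falls through. The class name, the initial counter value, and the trace are all hypothetical.

```python
class CounterPredictor:
    """2-bit saturating counter per branch (an assumed scheme)."""

    def __init__(self):
        self.counters = {}   # branch address -> counter in 0..3

    def predict(self, pc):
        # Counter values 2 and 3 mean "predict taken".
        return self.counters.get(pc, 1) >= 2

    def update(self, pc, taken):
        c = self.counters.get(pc, 1)
        self.counters[pc] = min(c + 1, 3) if taken else max(c - 1, 0)


def accuracy(predictor, trace):
    """Fraction of (pc, taken) entries predicted correctly."""
    correct = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(trace)


# Hypothetical loop branch: taken 9 times, then not taken, 10 times over.
loop_trace = ([(0x40, True)] * 9 + [(0x40, False)]) * 10
print(accuracy(CounterPredictor(), loop_trace))
```

On this trace the predictor mispredicts the loop exit each time, plus the very first iteration, which lands close to the 0.9 figure quoted above.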
Indirect-jump prediction
Sometimes we jump to an address that is not known at compile time, for example when a destination address is calculated into a register at runtime. This is often the case for "return" constructs, where the return address into the calling function is stored on the stack. In these cases, we can do indirect-jump prediction.
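One simple way to predict an indirect jump is to guess that it goes wherever it went last time. The slide does not specify a scheme, so the sketch below is an assumption: a table keyed by the address of the jump instruction that remembers the most recent destination. The class name and the trace are hypothetical.

```python
class LastTargetPredictor:
    """Predict that an indirect jump repeats its previous destination."""

    def __init__(self):
        self.last = {}   # jump address -> most recent destination

    def predict(self, pc):
        # None means "no prediction yet" for this jump.
        return self.last.get(pc)

    def update(self, pc, target):
        self.last[pc] = target


def count_hits(predictor, trace):
    """Number of (pc, target) entries whose target was predicted."""
    hits = 0
    for pc, target in trace:
        if predictor.predict(pc) == target:
            hits += 1
        predictor.update(pc, target)
    return hits


# Hypothetical jump at 0x10 that goes to 0x100 twice, then 0x200 twice.
jump_trace = [(0x10, 0x100), (0x10, 0x100), (0x10, 0x200), (0x10, 0x200)]
print(count_hits(LastTargetPredictor(), jump_trace))  # 2
```

Returns called from many different sites change destination often, so a table like this would do worse on them than on jumps with a stable target.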
Latency
Window size
The window size is the maximum number of instructions that can appear in the pending cycle list.
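To see why the window size limits parallelism, the sketch below uses a deliberately simplified model that is an assumption, not the paper's exact mechanism: the instruction stream is cut into consecutive windows of at most `window` instructions, each window is scheduled on its own with unit latency and unlimited issue width, and a window starts only after the previous one has finished. Function names and the `(dest, sources)` encoding are hypothetical.

```python
def cycles_for(chunk):
    """Cycles needed for one window, honoring true register dependencies."""
    ready = {}   # register -> cycle in which its value is produced
    last = 0
    for dest, sources in chunk:
        cycle = 1 + max((ready.get(s, 0) for s in sources), default=0)
        if dest is not None:
            ready[dest] = cycle
        last = max(last, cycle)
    return last


def windowed_parallelism(instrs, window):
    """Instructions per cycle when scheduling one window at a time."""
    total = sum(
        cycles_for(instrs[i:i + window])
        for i in range(0, len(instrs), window)
    )
    return len(instrs) / total


# Four independent instructions (hypothetical example).
independent = [("r1", []), ("r2", []), ("r3", []), ("r4", [])]
for w in (1, 2, 4):
    print(w, windowed_parallelism(independent, w))
```

Even with no dependencies at all, the measured parallelism in this model can never exceed the window size.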
Overall results