Steve Johnson
The Problem
Given an expression tree and a machine architecture, generate a set of instructions that evaluate the tree
Initially, consider only trees (no common subexpressions)
Interested in the quality of the program
Interested in the running time of the algorithm
The Solution
Over a large class of machine architectures, we can generate optimal programs in linear time
A very practical algorithm
But different from the way most compilers work today
And the technique, dynamic programming, is powerful and interesting
Representing Operands
In fact, we want the tree to represent where the operands are found
Tree for A = B + C:

        =
       / \
  MEM(A)  +
         / \
    MEM(B)  MEM(C)
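As a purely illustrative sketch (the type and field names are mine, not from the slides), such a tree might be represented in C like this, with leaves recording the memory cell where each operand is found:

#include <stdio.h>

/* Illustrative tree representation.  Leaves name the memory cell holding
 * an operand; interior nodes are operators. */
enum op { OP_ASSIGN, OP_ADD, OP_MEM };

struct node {
    enum op op;                  /* what this node is               */
    const char *name;            /* operand name, for OP_MEM leaves */
    struct node *left, *right;   /* children, NULL for leaves       */
};

int main(void)
{
    /* build the tree for  A = B + C  drawn above */
    struct node a   = { OP_MEM, "A", NULL, NULL };
    struct node b   = { OP_MEM, "B", NULL, NULL };
    struct node c   = { OP_MEM, "C", NULL, NULL };
    struct node sum = { OP_ADD, NULL, &b, &c };
    struct node asg = { OP_ASSIGN, NULL, &a, &sum };

    printf("%s = %s + %s\n",
           asg.left->name, sum.left->name, sum.right->name);
    return 0;
}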
Possible Programs
load  B,r1
load  C,r2
add   r1,r2,r1
store r1,A

or

load  B,r1
add   C,r1
store r1,A

or

add   B,C,A
(Assembler Notation)
Data always moves left to right:

  load  B,r1        r1 = MEM(B)
  add   r1,r2,r3    r3 = r1 + r2
  store r1,A        MEM(A) = r1
Which is Better?
Not all sequences legal on all machines
Longer sequences may be faster
Situation gets more complex when:
  Complicated expressions run out of registers
  Some operations (e.g., call) take a lot of registers
  Instructions have complicated addressing modes
Example Code
A = 5*B + asin(C/2 + sin(D)) might generate (machine with 2 registers):

load  B,r1
mul   r1,#5,r1
store r1,T1
load  C,r1
div   r1,#2,r1
store r1,T2
load  D,r1
call  sin
load  T2,r2
add   r2,r1
call  asin
load  T1,r2
add   r2,r1,r1
store r1,A

OR

load  D,r1
call  sin
load  C,r2
div   r2,#2,r2
add   r2,r1,r1
call  asin
load  B,r2
mul   r2,#5,r2
add   r1,r2,r1
store r1,A
What is an Instruction?
An instruction is a tree transformation
[Diagrams: each instruction is drawn as a tree rewrite.
  load  A,r1       rewrites the leaf MEM(A) into REG(r1)
  store r1,A       rewrites REG(r1) into MEM(A)
  load  r1(r2),r3  rewrites an indexed-address MEM subtree built from REG(r1), REG(r2), and INT 2 into REG(r3)]
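As a rough illustration (names and simplifications are mine, not the slides'), one instruction can be coded as a pattern that matches a subtree plus a rewrite that replaces the matched subtree with a node saying where the result now lives:

#include <stdio.h>

enum kind { MEM, REG };

struct tnode {
    enum kind kind;
    const char *name;            /* memory operand, for MEM nodes  */
    int reg;                     /* register number, for REG nodes */
    struct tnode *left, *right;
};

/* "load A,r": matches a MEM leaf and rewrites it in place into REG(r). */
static int try_load(struct tnode *t, int r)
{
    if (t->kind != MEM)
        return 0;                         /* pattern does not match */
    printf("load %s,r%d\n", t->name, r);  /* emit the instruction   */
    t->kind = REG;                        /* apply the rewrite      */
    t->reg  = r;
    return 1;
}

int main(void)
{
    struct tnode a = { MEM, "A", 0, NULL, NULL };
    try_load(&a, 1);                      /* prints: load A,r1 */
    return 0;
}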
Programs
A program is a sequence of instructions
A program computes an expression tree if it transforms the tree according to the desired goal:
  Compute the tree into a register
  Compute the tree into memory
  Compute the tree for its side effects
    Condition codes
    Assignments
Example
Goal: compute for side effects
load  B,r1
load  C,r2
add   r1,r2,r1
store r1,A
Example (cont.)
[Diagram: after load B,r1 the tree is = (MEM(A), + (REG(r1), MEM(C))); load C,r2 rewrites MEM(C) into REG(r2).]
Example (cont.)
[Diagram: the tree is = (MEM(A), + (REG(r1), REG(r2))); add r1,r2,r1 rewrites the + subtree into REG(r1), leaving = (MEM(A), REG(r1)).]
Example (concl.)
[Diagram: the tree is now = (MEM(A), REG(r1)); store r1,A carries out the assignment, completing the computation for its side effect.]
Practical Observation
Many (most?) code generation bugs happen in spill code (the stores and reloads inserted when an expression runs out of registers)
  Choosing a register to spill that is really still needed
  Very hard to test...
  Create test cases that just barely fit, or just barely don't fit, to exercise the edge cases
Complexity Results
Simple machine with 2-address instructions:
r1 op r2 => r1
Cost = number of instructions
Allow common subexpressions only of the form A op B, where A and B are leaf nodes
Generating optimal code is NP-complete
  Even if there are an infinite number of registers!
  Implies exponential time for a tree with n nodes
Cost = number of instructions
Allow arbitrary common subexpressions
Infinite number of registers
Can get optimal code in linear time:
  Topological sort
  Each node in a different register
  (see the sketch below)
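A minimal sketch of the infinite-register case, under assumptions of my own (a load per leaf and a three-operand register-register instruction per interior node): emit the DAG bottom up in a topological order, give every node its own register, and emit each shared node exactly once, so the work is one instruction per node and linear in the size of the DAG.

#include <stdio.h>

struct dnode {
    const char *leaf;             /* operand name, or NULL for an op node   */
    const char *op;               /* mnemonic, for op nodes                 */
    struct dnode *left, *right;
    int reg;                      /* assigned register, 0 = not yet emitted */
};

static int next_reg = 1;

static int emit(struct dnode *n)
{
    if (n->reg)                   /* common subexpression: already emitted */
        return n->reg;
    if (n->leaf) {
        n->reg = next_reg++;
        printf("load  %s,r%d\n", n->leaf, n->reg);
    } else {
        int l = emit(n->left);    /* children before parents */
        int r = emit(n->right);
        n->reg = next_reg++;
        printf("%-5s r%d,r%d,r%d\n", n->op, l, r, n->reg);
    }
    return n->reg;
}

int main(void)
{
    /* (B + C) * (B + C): the + node is shared and is computed only once */
    struct dnode b    = { "B", NULL, NULL, NULL, 0 };
    struct dnode c    = { "C", NULL, NULL, NULL, 0 };
    struct dnode sum  = { NULL, "add", &b, &c, 0 };
    struct dnode prod = { NULL, "mul", &sum, &sum, 0 };
    emit(&prod);
    return 0;
}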
Suppose the root node of T is +
Then, in an optimal program, the last instruction must have a + at the root of the tree that it transforms
We make a list of these instructions
Each has some preconditions for it to be legal
Preconditions: Example
Suppose the last instruction was add r1,r2,r1
Suppose the tree T looks like:

      +
     / \
    T1  T2
Precondition Resources
If our optimal program ends in this add instruction, then we can assume that it contains two subprograms that compute T1 and T2 into r1 and r2, respectively
Reordering Lemma
Let P be an optimal program without stores that computes T. Suppose it ends in an instruction X that has k preconditions. Then we can reorder the instructions in P so it looks like

  P1 P2 P3 ... Pk X

where the Pi compute the preconditions of X in some order. Moreover, P2 uses at most N-1 registers, P3 uses at most N-2 registers, etc., and each Pi computes its precondition optimally using that number of registers.
Cost Computation
Define C(T,n) to be the cost of the optimal program computing T using at most n registers.
Suppose X is an instruction matching the root of T with k preconditions, corresponding to subtrees T1 through Tk. Then

  C(T,n) <= C(X) + C(T1,p1) + ... + C(Tk,pk)

where C(X) is the cost of instruction X, and p1,...,pk are a permutation of the numbers n, n-1, ..., n-k+1.
In fact, C(T,n) equals the minimum, over all instructions X and permutations, of this sum.
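As a concrete instance (my own example, not from the slides): for a binary register-register instruction X of cost 1 there are k = 2 preconditions, the permutations are just the two evaluation orders, and the recurrence becomes

  C(T,n) = 1 + min( C(T1,n) + C(T2,n-1),
                    C(T2,n) + C(T1,n-1) )

Whichever subtree is computed first has all n registers available; the other is computed with n-1, because one register is now holding the first result.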
Sketch of Proof
By the reordering lemma, we can write any optimal program as a sequence of subprograms computing the preconditions in order, with decreasing numbers of scratch registers, followed by some instruction X. If any subprogram is not optimal, we can make the program shorter, contradicting optimality of the original program. Thus the optimal cost equals one of the sums (for some X and permutation).
Consequences
P1 can use all N registers. After P1 runs, all registers are free again.
Let C(S,0) be the cost of computing S into a temporary (MEM) location. Then

  C(T,n) <= C(S,0) + C(T/S,n)

where T/S denotes T with the subtree S replaced by a reference to that temporary.
One way to compute S into memory is to compute it into a register and store it (there may be other, cheaper, ways). Thus

  C(S,0) <= C(S,N) + Cstore
Optimal Algorithm
1. Recursively compute C(S,n) and C(S,0) for all subtrees S of T, starting bottom up, and all n <= N.
2. Enumerate all instructions matching the root of T. Those that leave a result in memory contribute to C(T,0). Those leaving a result in a register contribute to C(T,n).
3. Apply the cost formula for each permutation of the preconditions of the instruction, and remember the minimal costs.
4. Update C(T,0) using C(T,0) = min(C(T,0), C(T,N) + Cstore).

The result gives the minimal cost to compute the tree using n registers, or to compute it into memory (a simplified sketch in C follows).
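The sketch below follows steps 1-4 under simplifying assumptions that are mine rather than the slides': binary trees whose leaves are memory operands, and a machine whose only instructions are load, store, a register-register op, and a register-memory op, each of cost 1. A real implementation would loop over an instruction table, try every permutation of the k preconditions, and record the winning choices for the unravelling step described later.

#include <stdio.h>
#include <limits.h>

#define NREGS      2             /* N: number of scratch registers */
#define COST_INSTR 1             /* every instruction costs 1 here */

struct node {
    struct node *left, *right;   /* NULL for leaves (memory operands)        */
    int cost[NREGS + 1];         /* cost[n] = C(T,n); cost[0] = into memory  */
};

static int min2(int a, int b) { return a < b ? a : b; }

static void compute_costs(struct node *t)
{
    int n;
    if (t == NULL)
        return;
    compute_costs(t->left);      /* step 1: bottom up over all subtrees */
    compute_costs(t->right);

    if (t->left == NULL && t->right == NULL) {
        for (n = 1; n <= NREGS; n++)
            t->cost[n] = COST_INSTR;   /* a load puts a leaf in a register */
        t->cost[0] = 0;                /* a leaf is already in memory      */
        return;
    }

    for (n = 1; n <= NREGS; n++) {
        struct node *l = t->left, *r = t->right;
        int best = INT_MAX;
        if (n >= 2) {
            /* steps 2-3: op r,r,r -- try both permutations of the preconditions */
            best = min2(best, l->cost[n] + r->cost[n-1] + COST_INSTR);
            best = min2(best, r->cost[n] + l->cost[n-1] + COST_INSTR);
        }
        /* op M,r -- the memory operand is computed (and stored) ahead of
         * time with all registers free, per C(T,n) <= C(S,0) + C(T/S,n) */
        best = min2(best, l->cost[n] + r->cost[0] + COST_INSTR);
        best = min2(best, r->cost[n] + l->cost[0] + COST_INSTR);
        t->cost[n] = best;
    }
    /* step 4: into memory = into a register, then store */
    t->cost[0] = t->cost[NREGS] + COST_INSTR;
}

int main(void)
{
    struct node b = {0}, c = {0}, sum = {0};
    sum.left = &b;
    sum.right = &c;
    compute_costs(&sum);
    printf("C(B+C,%d) = %d\n", NREGS, sum.cost[NREGS]);
    return 0;
}

For the subtree B + C on this 2-register machine the sketch reports a cost of 2 (load B,r1; add C,r1); adding the assignment's store brings the whole statement A = B + C to 3, matching the second sequence on the Possible Programs slide under this cost model.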
Dynamic Programming
This bottom-up technique is called dynamic programming
It has a fixed cost per tree node because:
  There are a finite (usually small) number of instructions that match the root of each tree
  The number of permutations for each instruction is fixed (and typically small)
  The number of scratch registers N is fixed
So the optimal cost can be determined in time linear in the size of the tree
Unravelling
Going from the minimal cost back to the instructions can be done several ways:
Can remember the instruction and permutation that gives the minimal value for each node
At each node, recompute the desired minimal value until you find an instruction and permutation that attain it
No Spills!
Note that we do not have to have spill code in this algorithm. The subtrees that are computed and stored fall out of the algorithm. They are computed ahead of the main computation, when all registers are available. The resulting instruction stream is not typically a tree walk of the input.
Reality Check
Major assumptions
Cost is the sum of costs of instructions
Assumes single ALU, no overlapping
Many machines now have multiple ALUs, overlapping operations
Other Issues
Register allocation across multiple statements, flow control, etc.
Can make a big difference in performance
Can use this algorithm to evaluate possible allocations
Cost of losing a scratch register to hold a variable
Common Subexpressions
A subtree S of T is used more than once (T is now not a tree, but a DAG)
Say there are 2 uses of S. Then there are 4 strategies:
  Compute S and store it
  Compute one use and save the result until the second use (2 ways, depending on which use is first)
  Ignore the sharing, and recompute S
Cost Computations
Ignoring the sharing is easy
Computing and storing is easy
Ordering the two uses implies an ordering of preconditions in some higher-level instruction selection
And the number of free registers is affected, too
Summary
Register spills are evil
Complicated, error-prone, hard to test
If something is to be spilled, compute it ahead of time with all registers free
The optimal spill points fall out of the dynamic programming algorithm