Вы находитесь на странице: 1из 59

Code optimization by partial redundancy elimination using Eliminatability paths (E-paths)

Prof. Dhananjay M Dhamdhere

Computer Science & Engineering, Indian Institute of Technology, Bombay

These slides are based on

D. M. Dhamdhere: E-path_PRE---Partial redundancy elimination made easy, SIGPLAN Notices, v 37, n 8 (2002), 53-65. D. M. Dhamdhere: Eliminatability path---A versatile basis for partial redundancy elimination, 2002 Dheeraj Kumar: Syntactic and Semantic Partial Redundancy elimination, M. Tech. dissertation, I.I.T. Bombay, 2006.

Computer Science & Engineering, Indian Institute of Technology, Bombay

Partial redundancy elimination

Partial redundancy An expression e in statement s is partially redundant if its value is identical with value of e in some path from start of program to s

Partial redundancy elimination -- A partially redundant occurrence of e is made totally redundant by inserting evaluations of e in some path(s) from start of the program to s

-- The totally redundant occurrence of e is now eliminated

Computer Science & Engineering, Indian Institute of Technology, Bombay

An example of PRE
t=a*b
1

a*b

t=a*b

a*b

-- Insert a*b in node 2 -- Delete a*b from node 3


Computer Science & Engineering, Indian Institute of Technology, Bombay

Partial redundancy elimination


PRE subsumes 3 important classical optimizations:

Common subexpression elimination (CSE) - Expression e is computed along all paths reaching its occurrence

Loop invariant movement - A loop-invariant expression is available along the looping edge. Hence it is partially redundant.

Classical code motion - A less known optimization. It is in fact partial redundancy elimination in specific situations.
Computer Science & Engineering, Indian Institute of Technology, Bombay

PRE subsumes 3 optimizations


1 1. CSE - a*b of node 5 is a CSE. 2 a=.. 4 a*b 2. Loop invariant movement - a*b of node 4 is partially redundant

a*b

3. Code movement - a*b of node 6 can be moved to node 3.

a*b

Computer Science & Engineering, Indian Institute of Technology, Bombay

Benefits and costs of PRE

Benefits: Execution efficiency through a reduction in the number of expression occurrences along a graph path Costs: - Use of compiler generated temporaries to hold values of expressions - Lifetimes of compiler generated temporaries increase register pressure - Insertion of new blocks due to edge placement Desirable goals: Computational optimality and lifetime optimality
Computer Science & Engineering, Indian Institute of Technology, Bombay

Data flow concepts used in partial redundancy elimination

Availability : An expression e is available at a program point p if its value is computed along ALL paths from start of the program to p Partial availability : An expression e is partially available at a program point p if its value is computed along SOME path from start of the program to p Availability = Total redundancy
Partial availability = Partial redundancy
Computer Science & Engineering, Indian Institute of Technology, Bombay

Data flow concepts used in partial redundancy elimination

Anticipatability: An expression e is anticipatable (that is, very busy) at a program point p if it is computed along ALL paths from p to an exit of the program

Computer Science & Engineering, Indian Institute of Technology, Bombay

Data flow concepts used in partial redundancy elimination

Anticipatability: An expression e is anticipatable (that is, very busy) at a program point p if it is computed along ALL paths from p to an exit of the program Safety of a computation (Kennedy 1972): An expression e is safe at a program point p if it is either available or anticipatable at p - Insertion of e at p is a new computation if e is not safe at p. - It increases the execution time of the program. It may also raise new exceptions
Computer Science & Engineering, Indian Institute of Technology, Bombay

Safe insertion of computations


11 a*b 12 21 a*b 22

13

a*b

a*b

23

-- a*b is anticipatable in node 12, but not anticipatable in node 22


-- Insertion of a*b in node 12 is safe, however in 22 it is unsafe -- Insertion in edge (22,23) is safe!
Computer Science & Engineering, Indian Institute of Technology, Bombay

Some partial redundancies cannot be eliminated through safe code insertion


i m a*b -- Insertion in the in-edge of node n is unsafe because a*b is not anticipatable t=a*b

a*b available
n

a*b available, anticipatable a*b anticipatable


k a*b

Computer Science & Engineering, Indian Institute of Technology, Bombay

Performing Partial Redundancy Elimination

Identify partially redundant occurrences of an expression e in a program Insert occurrences of e at some program points where e is safe Delete partially redundant occurrences of e which have become totally redundant Classical PRE: Elimination of partial redundancies in a program through safe insertion of computations.

- Can be looked upon as `code movement from the point of original occurrence to the point of insertion
- It cannot eliminate all partial redundancies in a program!
Computer Science & Engineering, Indian Institute of Technology, Bombay

A brief history of PRE

Morel, Renvoise (1979): Bidirectional data flows for code placement in nodes (MRA). Lacks both computational and lifetime optimality. Dhamdhere (1988): Computational optimality and reduced lifetimes of temporaries than Morel-Renvoise through placement in nodes and edges (EPA). Knoop, Ruthing, Steffen (1992): Lazy code motion (LCM) offering computational optimality and lifetime optimality through a priori edge splitting and placement in nodes. Drechsler and Stadel (1993) reformulated LCM to handle basic blocks. Bodik, Gupta, Soffa (1998) : Complete elimination of partial redundancies through selective code expansion (ComPRE). Based on the work by Steffen (1996). Kennedy et al (1999): PRE in SSA representation of programs (SSAPRE). Dhamdhere (2002): Eliminatability path --- A versatile basis for PRE (E-path_PRE). Develops a concept originating in Dhaneshwar, Dhamdhere (1995) and uses it for evaluation of PRE algorithms and development of new ones. Xue, Knoop (2006) and Dheeraj kumar, Dhamdhere (2006)
Computer Science & Engineering, Indian Institute of Technology, Bombay

Morel-Renvoise Algorithm (MRA)

Performs insertions strictly in nodes of the program graph


Placement possibility (PP) of e at entry/exit of basic blocks: whether it is feasible and safe to place expression e at entry/exit of a block Insert e at the exit of a basic block b if it can be placed at the exit of b but not at its entry

Delete an existing occurrence of e in a basic block if it can be placed at the entry of that block

Computer Science & Engineering, Indian Institute of Technology, Bombay

Morel-Renvoise Algorithm (MRA) (simplified equations)

Computer Science & Engineering, Indian Institute of Technology, Bombay

Morel-Renvoise Algorithm (MRA)


1 1 a=.. t=a*b

a=..

a*b

t=a*b

a*b

a*b

1. a*b is inserted in node 2. Insertion in node 3 would have been lifetime optimal. 2. a*b of node 4 cannot be optimized because it cannot be inserted in node 1. 3. a*b is saved in t in nodes 2 and 4. a*b of node 6 is replaced by use of t.
Computer Science & Engineering, Indian Institute of Technology, Bombay

Edge placement algorithm (Dhamdhere 1988)

Performs insertions both in nodes and along edges in the program graph An expression is hoisted as far up as possible to obtain computational optimality It is then subjected to sinking (without affecting computational optimality) to obtain lifetime optimality

It is placed along an edge only if it cannot be placed in a node It is performed only along a critical edge, i.e., an edge from a branch node to a join node

Computer Science & Engineering, Indian Institute of Technology, Bombay

Edge placement algorithm (Dhamdhere 1988)

Computer Science & Engineering, Indian Institute of Technology, Bombay

Edge placement algorithm (Dhamdhere 1988)


A. Computational optimality:

The term of PPIN is dropped. Hence PPIN can be true even if PPOUT of a predecessor is false. If PP is true for entry of a basic block i but PP is false for exit of a predecessor j, e is placed along the edge (j,i). -- It is called edge placement. A basic block is inserted in the edge if e is to be placed along it.

-- Edge placement performed only along a critical edge, i.e. along an edge from a branch node to a join node.

Placement into nodes is done as in MRA.


Computer Science & Engineering, Indian Institute of Technology, Bombay

Edge placement algorithm (Dhamdhere 1988)


B. Reducing lifetimes of expression variables:

Move insertion points as far down as possible without sacrificing computational optimality (it is achieved by the term)

Computer Science & Engineering, Indian Institute of Technology, Bombay

Edge placement algorithm (Dhamdhere 1988)


EPA solution technique: (hoisting-followed-by-sinking approach)

1. Solve the unidirectional data flow problem obtained by omitting the term from the PPIN equation.
PPIN - apxi (Pavini Antloci. Transpi ). (Antloci Transpi. PPOUT - apxi ). PPOUT - apxi succ(PPINsucc) It hoists e as far up as possible. Provides computational optimality.

2. Now a second data flow is solved to incorporate the term: We examine all predecessors of a block i and change PPIN of block i from true to false if the term is false for its predecessors. It sinks the hoisted expression as far down as possible without compromising computational optimality.
Computer Science & Engineering, Indian Institute of Technology, Bombay

Edge placement algorithm (EPA)


1 1 t=a*b

a=..

a*b

a=..

a*b

t=a*b

a*b

1. a*b is inserted in node 3. However, EPA does not provide lifetime optimality in some cases. 2. a*b is inserted in edge (1,4). This is computationally optimal.
Computer Science & Engineering, Indian Institute of Technology, Bombay

Lazy code motion (KRS 92)

All join edges are split a priori by inserting blocks along them D-Safe-earliest points: An expression is placed at the earliest points where it is anticipatable. Evaluation of an expression is delayed to the latest point where it can be placed without losing computational optimality. Thus, it conceptually performs hoisting-followed-by-sinking, as in the edge placement algorithm.

Insertion and saving is performed uniformly.


Data flow equations are not given here. (Drechsler and Stadel reformulated them.)
Computer Science & Engineering, Indian Institute of Technology, Bombay

Lazy code motion (KRS)


1 1 (1,4) t=a*b 2 a=.. 4 a*b 2 a=.. 4 t

a*b

3
(3,6) t=a*b

a*b

1. Edges (1,4), (3,6), (5,6) and (5,4) are split a priori 2. a*b is inserted in edge (3,6). LCM provides lifetime optimality 3. a*b is inserted in edge (1,4). As in EPA, this is computationally optimal

4. Empty blocks: removed

Computer Science & Engineering, Indian Institute of Technology, Bombay

Eliminatability paths offer ..

A conceptual basis for PRE: - Identifies partial redundancies which can be eliminated through insertion of code in safe places

* We call them eliminatable partial redundancies


- A simple method for identifying safe insertion points which offer lifetime optimality - Thus, no hoisting-followed-by-sinking
Computer Science & Engineering, Indian Institute of Technology, Bombay

Eliminatability paths offer ..

Computationally optimal PRE: - Elimination of all eliminatable partial redundancies identified by E-paths through appropriate insertions provides computational optimality

Computer Science & Engineering, Indian Institute of Technology, Bombay

Eliminatability paths offer ..

PRE with lifetime optimality: - Insertions performed using the notion of E-paths provides lifetime optimality

Computer Science & Engineering, Indian Institute of Technology, Bombay

Eliminatability paths offer ..

A versatile basis for PRE:


- Classical PRE: PRE performed by insertion, deletion and saving of expressions over a program graph - PRE over SSA representations of programs

Computer Science & Engineering, Indian Institute of Technology, Bombay

Eliminatability paths offer ..

Simplicity:
- Insertion, deletion and save points are identified using simple and well-known data flow concepts of availability and anticipatability

Computer Science & Engineering, Indian Institute of Technology, Bombay

Eliminatability paths offer ..

A basis for evaluating effectiveness of an approach to PRE:


- Does the approach provide computational optimality? (i.e. does it eliminate all partial redundancies which can be eliminated?) - Does the approach provide lifetime optimality?

Computer Science & Engineering, Indian Institute of Technology, Bombay

Eliminatability Paths (E-paths)

A path i .. k in a program control flow graph is an E-path for an expression e if - Node i contains a locally available occurrence of e and node k contains a locally anticipatable occurrence of e - Nodes in the path (i .. k) are empty wrt e, i.e. they do not contain an occurrence of e or a definition of any of its operands - e is safe at the exit of each node in [i .. k), i.e., it is either available or anticipatable at the exit of each node in [i .. k).

Path [i .. k) includes node i, but excludes node k. Path (i .. k) excludes nodes i and k.
Computer Science & Engineering, Indian Institute of Technology, Bombay

Eliminatability Path*
i m a*b

- a*b available at exit of [i .. m]


- a*b anticipatable at exit of [n .. k) - Occurrence of a*b in node k

is said to be eliminatable

* Dhaneshwar, Dhamdhere (1995) used eliminatability of exps, but did not define or use E-paths explicitly.

a*b

Computer Science & Engineering, Indian Institute of Technology, Bombay

Properties of E-paths: 1

PRE using E-paths provides computational optimality Use of this property: - Use it to evaluate computational optimality of a PRE algorithm.

- A PRE algorithm possesses computational optimality if it can eliminate partial redundancy of e in EACH node k such that an E-path i .. k exists in G.

Computer Science & Engineering, Indian Institute of Technology, Bombay

Properties of E-paths: 2

If i .. k is an E-path and j is a node in (i .. k] - For each in-edge (g, j) such that node g is not in an E-path: if node g has a successor s which is not in an E-path then insert e in edge (g, j) else insert e in node g - Such insertion provides lifetime optimality of the temporary variable used to hold value of e

Use of the property: - Check whether a PRE algorithm provides lifetime optimality by comparing program points where insertions are made

Computer Science & Engineering, Indian Institute of Technology, Bombay

Lifetime optimality using E-paths


i m t=a*b g2 a*b g1 t=a*b - i .. k is an E-path - Insertion in edge (g1, j) and node g2 is lifetime optimal

a*b

Computer Science & Engineering, Indian Institute of Technology, Bombay

Evaluating MRA using E-paths


1 1 a=.. t=a*b

a=..

a*b

t=a*b

a*b

a*b

0. Three E-paths exist: 4 .. 5, 5 .. 4 and 5 .. 6. 1. 5 .. 6 is an E-path. Insertion node 3 would have been lifetime optimal. 2. 5 .. 4 is an E-path. Hence a*b of node 4 is eliminatable, but not eliminated!
Computer Science & Engineering, Indian Institute of Technology, Bombay

PRE using E-paths

For an E-path i .. k a) Insertions: For a node j in (i .. k] - Insert e in edge (g, j) if g is not in an E-path and has a successor which is not in an E-path - Insert e in predecessor g if g is not in an E-path and all its successors are in E-paths b) Save: Save the computation of e in node i, unless i is the end-node of some E-path h .. i (in which case it would be deleted). c) Deletion: Delete the occurrence of e in node k.
Computer Science & Engineering, Indian Institute of Technology, Bombay

PRE using E-paths

E-path i .. k may contain 3 kinds of segments - Avail . Ant segment - Avail . Ant segment - Avail . Ant segment : This is called the E-path suffix.

Find a node m : Avail(m) . Anticipatable(m). Avail(p), p=pred This is the start node of the E-path suffix.
- Trace Avail . Ant segment backwards from m to find node i, the start of the E-path and perform a save in it - Trace Avail . Ant segment forward from m a) to perform appropriate insertion for in-edges b) to find k and perform a deletion
Computer Science & Engineering, Indian Institute of Technology, Bombay

Segments in an E-Path
1 2 3 Start node Of E-path suffix 4 5 10 a*b
Computer Science & Engineering, Indian Institute of Technology, Bombay

a*b

a) 1 .. 2 : Avail Ant. b) 3 .. 4 : Avail Ant. c) 5 .. 10 : Avail Ant (E-path suffix).

E-path suffix: insertions may be needed in paths joining it

Simple data flows for E-path_PRE@


Comp : e is locally available (i.e. downwards exposed) in node Antloc : e is locally anticipatable (i.e. upwards exposed) in node Transp : node does not contain definitions of es operands

@:

Terminology is from Morel-Renvoise algorithm

Computer Science & Engineering, Indian Institute of Technology, Bombay

Simple data flows for E-path_PRE

Availability and Anticipatability (i.e. very busy exps.) Eps-in/Eps-out (Node is in E-path suffix)

Computer Science & Engineering, Indian Institute of Technology, Bombay

Simple data flows for E-path_PRE

Availability and Anticipatability Eps-in/Eps-out (Node is in E-path suffix) SA_in/SA_out (A save should be performed above)

Computer Science & Engineering, Indian Institute of Technology, Bombay

Efficiency of E-path_PRE data flows

The generalized theory of bit-vector data flow analysis by Khedker, Dhamdhere (1994) defines two concepts for determining the cost of data flow analysis - Information flow path (ifp): A graph path along which data flow information may flow during data flow analysis.

(Information flow : Values of data flow properties change from`lattice top to `lattice bot during iterative data flow analysis)
- Width of a graph (reduces to depth of a graph for unidirectional data flows)

Computer Science & Engineering, Indian Institute of Technology, Bombay

Efficiency of E-path_PRE data flows

The generalized theory of bit-vector data flow analysis by Khedker, Dhamdhere (1994) defines two concepts for determining the cost of data flow analysis - Information flow path (ifp): A graph path along which data flow information may flow during data flow analysis.

(Information flow : Values of data flow properties change from`lattice top to `lattice bot during iterative data flow analysis)
- Width of a graph (reduces to depth of a graph for unidirectional data flows)

Number of bit-vector operations during work-list iterative df analysis depend on length of an ifp, and the number of iterations during round-robin iterative df analysis depend on width of an ifp
Computer Science & Engineering, Indian Institute of Technology, Bombay

Efficiency of E-path_PRE data flows

The Eps_in/out data flow of E-path_PRE has been designed to have short information flow paths. This fact may also lead to small width of a program graph. Short information flow paths and small width leads to smaller solution times of data flows. This fact is borne out by experimentation --- comparison with the later data flow of Drechsler, Stadel (1993) (Dhamdhere 2002): - In worklist solution: No. of bit vector operations is 80% smaller - In round-robin iterative solution: No. of iterations is 37% smaller

Computer Science & Engineering, Indian Institute of Technology, Bombay

Code placement models in PRE

Node model - Simple node model Each node contains a single statement - Basic block model Each node is a basic block

Insertion and Saving model


- Saving in situ Value of an expression is saved in the place where it is located - Saving in entry/exit of node An expression is moved to node entry/exit if its value is to be saved - Insertion at entry/exit of node - Unified insertion and saving This is possible only when saving is done at node entry/exit
Computer Science & Engineering, Indian Institute of Technology, Bombay

Code placement models in PRE

Morel-Renvoise Algorithm (MRA): - Basic blocks, saving in situ, insertion at exit Edge placement algorithm (EPA): - Basic blocks, saving in situ, insertions at node exit and in critical edges (edge splitting performed on a needs basis)

Lazy Code Motion (LCM): - Simple nodes, unified saving and insertion, insertion at node entries and in blocks inserted in join edges in a priori edge splitting
E_path-PRE - Basic blocks, saving in situ, insertions at node exit and in critical edges SIM-PRE - Basic blocks, saving in situ, insertion strictly along edges
Computer Science & Engineering, Indian Institute of Technology, Bombay

Evaluation of code placement models using E-paths

Morel-Renvoise algorithm (MRA)


Missed opportunities of optimization (seen before)

Lazy code motion (LCM)

Performs insertion in a join edge (p,j) even if it could have been performed in node p

a*b

2 a*b inserted 3 a*b

Computer Science & Engineering, Indian Institute of Technology, Bombay

Evaluation of code placement models using E-paths

Optimal code motion (OCM) Knoop et al 1994 - Basic blocks, Hybrid model, Insertions at node entry and exit - Hybrid: Uniform insertion and saving model but saving is performed in situ No insertions and savings will be performed at entry to a node (Lemmas 19 and 23). Hence this feature is redundant.

Computer Science & Engineering, Indian Institute of Technology, Bombay

Evaluations of code placement models using E-paths

Complete elimination of partial redundancies (ComPRE) Bodik, Gupta and Soffa (1998) (when adapted to classical PRE) - Simple nodes, unified saving and insertion only in edges An expression in a node is redundantly hoisted into its entry-edges - Addressing this problem will require an additional data flow problem, making it less efficient than E-path_PRE.

Computer Science & Engineering, Indian Institute of Technology, Bombay

Later work

SIM-PRE by J. Xue, J. Knoop (2006): inserts only along edges

Computer Science & Engineering, Indian Institute of Technology, Bombay

SIM-PRE by Xue and Knoop

This data flow traces an E-path !

Computer Science & Engineering, Indian Institute of Technology, Bombay

SIM-PRE by Xue and Knoop


i m a*b g1 t=a*b - i .. k is an E-path - Insertion in edge (g1, j) and node g2 is lifetime optimal - SIM-PRE inserts in edges (g1, j) and (g2, l ) k a*b

g2 t = a*b l

Computer Science & Engineering, Indian Institute of Technology, Bombay

SIM-PRE by Xue and Knoop


SIM-PRE performs better than E-path_PRE in bit vector operations
(Graphic is from J. Xue, J. Knoop (2006))

However, it adds almost 50% more new blocks than E-path_PRE


(Dheeraj Kumar, 2006)
Computer Science & Engineering, Indian Institute of Technology, Bombay

Work by Dheeraj Kumar (2006)

Simplified the data flows of E-path_PRE

Eps_in/out data flow finds nodes {i } that belong to an E-path and have Antouti = true

SA_in/out data flow finds nodes {i } that belong to an E-path and have Avouti = true

@ : M. Tech. dissertation, IIT Bombay

Computer Science & Engineering, Indian Institute of Technology, Bombay

Work by Dheeraj Kumar (2006)

Simplified data flows of E-path_PRE (Proposal 2):

Computer Science & Engineering, Indian Institute of Technology, Bombay

Work by Dheeraj Kumar (2006)

Experimental results - SPECcpu2000 benchmark under GCC 3.4.3 - Proposal 2 performance * Bit map operations 5.5% smaller than SIM-PRE in worklist and 15.8% smaller in iterative * Introduced 30% fewer blocks than SIM-PRE

Computer Science & Engineering, Indian Institute of Technology, Bombay

Thus, eliminatability paths offer ..

A conceptual basis for PRE A versatile basis for PRE A basis for evaluating effectiveness of an approach to PRE

(Efficiency is a bonus!)

Computer Science & Engineering, Indian Institute of Technology, Bombay

Вам также может понравиться