
Self-adapting Compilers

Compiler Optimization using Machine Learning


Archana Chauhan1, Kartikey Pokhriyal2, Kriti Kathuria3, Naman Goyal4
Department of Computer Science
Maharaja Agrasen College
University of Delhi
archana.chauhan16@yahoo.com1, kartikeypokhriyal@gmail.com2, kathuriakriti1@gmail.com3, namangoyal.official@gmail.com4
Abstract— This paper focuses on the optimization of compilers in three areas: function in-lining, loop unrolling and loop fusion. We theorize the features of a code that can affect these areas, and the machine learning techniques that can be used to tackle the problems posed by the parts of the code that hinder performance. We discuss some of the techniques that are already in use for optimizing compilers, and propose new ways in which machine learning techniques can be used and adapted to solve the problem of compiler optimization.
Keywords— compiler optimization, machine learning, function in-lining, loop unrolling, loop fusion, neural networks, regression, k-means, k-nearest neighbours, Apriori, FP-growth, decision trees, genetic programming, principal component analysis

I. THE NEED FOR OPTIMIZING COMPILERS


A compiler is a set of computer programs that translate source code from a human-readable language into binary machine code, which is understood by the computer. As with everything else in this world, we strive to get the best out of the said compiler. In a world obsessed with results and performance, it is no wonder that the compiler's day has also come.
By compiler optimization we mean changing some parts of the code in a way that it does not lose its integrity and at the same time becomes more efficient in terms of memory utilization, time taken to achieve results and the cost of running the program. The program should execute faster, utilize fewer clock cycles, give clean results, form smaller executables and use the available registers efficiently, so as to promote parallel processing.
Here it is important to note that there is no way we can build the best, unbeatable compiler optimizer; but we can make a compiler that is best for a particular code.
II. CLASSICAL SOLUTION
It has been quite some time since the concept of optimizing compilers first did the rounds. Since then, many optimization techniques have been conceptualized and applied in compilers, a few of them being function in-lining, loop unrolling, loop fusion, loop fission, loop peeling, peep-hole optimization, dead code elimination, strength reduction, code raising, etc.
So when a program is compiled, a standard set of optimizations is applied to the source code, regardless of whether the optimizations are meant for it or not. All the parts of the code that can be changed for the better are changed.

This technique is cumbersome and operates on the assumption that any optimization applied will either improve the code or leave it unchanged. Also, as more and more ways to optimize code are found, this set expands and the optimization process becomes lengthy and time consuming.
However, it has been found that some optimizations, while optimizing one part of the code, have a negative effect on another part. So the selection of optimizations has to be done carefully. For example, a transformation applied in the name of optimization may end up using more registers than is feasible; this defeats the purpose of optimization and ends up de-optimizing the code.
This could be remedied by selecting, from the standard set of optimizations, those that are custom fitted to the code being compiled.
This could be done manually, but that would defeat the purpose of making the process conducive to saving cost and time; as a final nail, for longer codes the manual process takes even longer. The same could be said for hard-coded algorithms.
Hence the need to build adaptive, self-learning compilers that know which techniques will make them time and cost friendly. The term machine learning comes to mind at this realization.
III. MACHINE LEARNING BASED OPTIMIZATION
The famous words of Arthur Samuel, from 1959, sum up the notion and essence of machine learning in the most simplistic way: "[Machine learning is the] field of study that gives computers the ability to learn without being explicitly programmed."
Or, more formally, as stated by Tom Mitchell in 1997: "A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E."
The aim is to decide, by comparing results from different machine learning algorithms, which algorithm is suited to any particular optimization.
IV. FINE TUNING THE OPTIMIZATIONS
There are three main ways of improving optimizations for any given code:
Optimization Selection: selecting which of the optimizations, from a given set of optimizations, must be applied to the code in order to maximize performance.
Optimization Ordering (Phase Ordering): sometimes applying one optimization first creates an opening for the application of other optimizations, leading to an overall performance improvement. Therefore, the compiler must be able to decide upon the most efficient order of application of optimizations for any given code.
Optimization Tuning: an optimization may be suited to some codes only after tweaking it a little. Hence, optimization tuning refers to tuning an optimization to best suit the needs of the code being compiled.

For each of these, the machine learning algorithms work upon features extracted from the code and assign each feature a weight. This weight determines which feature of the code has more influence and which has less influence on the objective of the optimization.

V. AREAS OF CODE FOCUSED ON FOR OPTIMIZATION


Some optimizations, such as code hoisting, dead code elimination, code motion and common sub-expression elimination, can be applied regardless of the code.
Others, like loop unrolling, loop fusion and function in-lining, need some thought. We need to consider factors such as the total number of registers that will be required for operations and the length of the appended code. We do not want the number of instructions to increase so much that it defeats the purpose, nor do we want the code to demand more registers than are available.
Therefore, for each of function in-lining, loop unrolling and loop fusion, the number of instructions and the number of registers will be factored into the machine learning algorithm.
For this paper, we elect loop fusion, loop unrolling and function in-lining.
VI. MACHINE LEARNING MODULE INSIDE THE COMPILER
As we can see in Figure 1, there are two types of code optimizations: machine independent and machine dependent.
Machine independent optimizations: optimizations applied to the parts of the code that do not involve any CPU registers or memory allocations.
Machine dependent optimizations: optimizations that affect the parts of the code that communicate with and use the machine hardware, such as memory, registers and clock cycles. These optimizations factor in information about the limits and special features of the target machine to produce code which gives better performance.
Code transformations such as function in-lining, loop fusion and loop unrolling are most definitely machine dependent optimizations, and therefore we propose that the algorithms for these optimizations sit in the machine dependent code optimizer phase.
VII. FEATURE SELECTION
Feature selection in machine learning, also called attribute selection or variable selection, is the process of determining those characteristics of a problem that may affect its solution. Therefore, we must isolate features pertaining to the areas of the code we wish to optimize (function in-lining, loop fusion and loop unrolling).

Fig. 1. The processes in a compiler

For each area we wish to focus on optimizing, we need to decide the features that will positively or negatively affect the code. These are listed below; a sketch of how such features might be packed into a numeric vector follows the lists.
1) Function in-lining
Size of the function
Number of times the function is called from the same call site
Number of parameters to the function
Whether the function is recursive
Number of operations
Number of operands
Live range size
Number of floating point operations
Number of memory operations

2) Loop unrolling
Loop nest level (the number of loops present within a
given loop)
Number of operands
Number of operations
Number of floating point operations
Number of memory operations
Number of branches
Live range size
Known trip count (the number of times the loop executes)
Number of parallel computations in a loop
The minimum memory-to-memory loop carried dependency
3) Loop fusion
Number of consecutive loops
Whether the trip counts of the loops to be fused are the same
Variable dependence between the loops to be fused
Loop nest level (the number of loops present within a given loop)
Number of operands
Number of operations
Number of floating point operations
Number of memory operations
Number of branches
Live range size
Known trip count (the number of times the loop executes)
Number of parallel computations in a loop
The minimum memory-to-memory loop carried dependency
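
To make this concrete, the following is a minimal sketch in Python, with illustrative field names of our own choosing, of how the loop unrolling features listed above could be packed into a fixed-order numeric vector for the learning algorithms discussed in the next section. The same pattern applies to the function in-lining and loop fusion feature sets.

from dataclasses import dataclass, asdict

@dataclass
class LoopFeatures:
    """Illustrative container for the loop-unrolling features listed above."""
    nest_level: int
    num_operands: int
    num_operations: int
    num_fp_operations: int
    num_memory_operations: int
    num_branches: int
    live_range_size: int
    known_trip_count: int
    parallel_computations: int
    min_loop_carried_dependency: int

def to_vector(f: LoopFeatures) -> list:
    """Flatten the features into the fixed-order numeric vector that the
    machine learning algorithms below take as input."""
    return list(asdict(f).values())

# Example: a doubly nested loop with a known trip count of 64.
loop = LoopFeatures(2, 6, 10, 4, 3, 1, 8, 64, 2, 0)
print(to_vector(loop))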

VIII. MACHINE LEARNING TECHNIQUES


A) LINEAR AND LOGISTIC REGRESSION
Linear and logistic regression are closely related techniques. For any particular optimization, we assign each feature a weight. This weight determines the influence a particular feature has in selecting the optimization. Initially, these weights are undetermined; to determine them, both techniques fit a model to the training data and differ mainly in the function being fitted. Linear regression, as the name states, models the output as a linear combination of the features, h(x) = ax1 + bx2 + c, where a and b are the weights assigned to features x1 and x2 and c is the bias term, and fits the weights by the least squares method.
Logistic regression passes this linear combination through the sigmoid function, so that the output can be read as the probability of a class.

These two algorithms divide the data space into two parts: the part where the features of the code make it inappropriate to apply the optimization, and the part where the features make the code appropriate for applying it. So whenever a new code is compiled, it is checked which part it lies in, and the optimization decision follows.
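
As an illustration, the following minimal sketch uses scikit-learn's LogisticRegression to learn such a two-part division of the data space; the feature vectors and labels are placeholders, and in practice they would come from profiling runs of the training codes.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row is a feature vector for one candidate call site:
# [function size, call count, parameter count, is_recursive].
X = np.array([[ 12, 40, 2, 0],
              [300,  2, 6, 1],
              [ 25, 15, 3, 0],
              [450,  1, 8, 0]])
# 1 = in-lining improved performance, 0 = it did not (placeholder labels).
y = np.array([1, 0, 1, 0])

model = LogisticRegression().fit(X, y)
print(model.coef_)                      # learned per-feature weights
print(model.predict([[30, 20, 2, 0]]))  # decision for a new call site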
B) NEURAL NETWORKS
This technique can be used for selecting from all three optimizations at once, though it will still select only one of the optimizations being evaluated.
Visualize it as follows: each optimization has some features. In our case they form a matrix with three rows and as many columns as there are features. This matrix can be unrolled to form a vector, which serves as the input to the neural network. Since the three optimizations do not all have the same number of features, some of the feature values will have to be zero. This forms the input layer of the neural network. The output layer has three units, one per optimization, into which the code is classified. The algorithm can be tweaked to assign a code more than one of the optimizations.
This is the case of checking for all three optimizations at once. We can also train a separate neural network for each optimization. In that case, the features of the particular optimization form the input layer, and the output layer holds a binary 1/0 result: 1 for applying the optimization and 0 for forgoing it.
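
A minimal sketch of the three-way variant, using scikit-learn's MLPClassifier as the neural network; the zero-padded feature vectors and labels here are random placeholders standing in for a real training set.

import numpy as np
from sklearn.neural_network import MLPClassifier

# Placeholder training data: each row is the unrolled feature vector of a
# code region, zero-padded so all three optimizations share one layout.
X = np.random.rand(60, 12)
# Class 0 = function in-lining, 1 = loop unrolling, 2 = loop fusion.
y = np.random.randint(0, 3, size=60)

net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
net.fit(X, y)
print(net.predict(X[:1]))  # suggested optimization for the first region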
C) k-Means Clustering
This unsupervised machine learning technique groups the objects in the data space into clusters, such that the observations in one cluster are similar in one or more ways.
Initially, k points are chosen at random, each forming the centroid of a cluster. The remaining data points are added to the cluster with the nearest centroid. New centroids are then calculated and the data points are re-assigned to the cluster with the closest centroid. This is repeated until stable clusters are obtained.
We start by plotting the characteristics of the training dataset, and subsequently cluster them according to the approach defined above. The end result is different codes grouped into different clusters. Each cluster has a set of similar characteristics, based on which a supervised learning algorithm can be implemented to decide upon the optimizations to be applied. So whenever a new code comes in to be compiled, it can be put into a suitable cluster and optimized accordingly.
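
A minimal sketch of this clustering step, using scikit-learn's KMeans; the feature matrix is a random placeholder for the characteristics extracted from the training codes.

import numpy as np
from sklearn.cluster import KMeans

# Placeholder feature vectors extracted from a training set of programs.
X = np.random.rand(100, 10)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])    # cluster assigned to each training code
print(kmeans.predict(X[:1]))  # cluster for a newly compiled code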

D) k Nearest Neighbours Classification
k-NN, or k nearest neighbours, is a technique that predicts the class of the current element based on its k nearest neighbours. The predicted value depends on the value of k used. For example, suppose we are classifying shapes and the object to be classified has 2 triangles and 3 squares among its five nearest neighbours, with the triangles closest: with k = 3 the predicted shape is a triangle, but with k = 5 it is a square. A large number of elements of one particular class may thus dominate the prediction: in the previous example, with k = 5 the predicted value will be square no matter how far the square neighbours lie. So we use the concept of distance weighting: each neighbour's vote is multiplied by the inverse of its distance, the weighted votes are summed per class, and the class with the largest sum becomes the predicted value.
While designing the compiler, we use this k nearest neighbours technique to predict which optimization technique should be used on which set of program statements. In a particular program, code statements are interrelated, and therefore the technique that should be used to optimize a particular set of code statements can be predicted from the optimization techniques used on its neighbouring code statements.
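
The distance weighting described above corresponds to the weights="distance" option of scikit-learn's KNeighborsClassifier; a minimal sketch with placeholder data follows.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Placeholder feature vectors and the optimization (0/1/2) that worked best
# for neighbouring code regions in the training set.
X = np.random.rand(80, 10)
y = np.random.randint(0, 3, size=80)

# weights="distance" implements inverse-distance weighting, so far-away
# neighbours contribute less to the vote.
knn = KNeighborsClassifier(n_neighbors=5, weights="distance").fit(X, y)
print(knn.predict(X[:1]))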
E) Apriori Algorithm
This algorithm mines those sets of sample data points (itemsets) that occur frequently, where "frequent" means occurring at least a pre-defined number of times.
Apriori is an iterative algorithm which starts with single elements. The elements that clear a minimum occurrence threshold form the data space for the next iteration. The next iteration builds sets of two from the data space isolated in the previous step, and the sets that clear the pre-defined minimum occurrence threshold (checked against the original dataset) form the latest data space. Then sets of three are formed, and so on. This process is repeated until the desired itemset size is reached, by which point only a handful of sets remain, which can then be used to reach a conclusion.
We propose to employ multi-dimensional association rules, where the size of the code, the different features and the optimization selected form the dimensions of the rules. By the end of the analysis we will have some k-itemsets that indicate, by virtue of their set form, for which features any particular optimization has been applied, hence helping us predict an optimization for a new code being compiled. This can also be done for multiple optimizations over a set of features.
The point that absolutely must be brought to attention is that if the subsets of the frequent k-itemsets are analyzed, we can extrapolate the order in which optimizations are applied.
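
To make the iterations concrete, here is a minimal from-scratch sketch of the candidate-generation loop; the transactions are illustrative, each one pairing discretised code features with the optimization that was applied to that code.

from itertools import chain

def apriori(transactions, min_support):
    """Return every itemset occurring in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = set(chain.from_iterable(transactions))
    candidates = [frozenset([i]) for i in items]
    frequent = []
    while candidates:
        # Keep the candidates that clear the occurrence threshold.
        survivors = [c for c in candidates
                     if sum(1 for t in transactions if c <= t) >= min_support]
        frequent.extend(survivors)
        # Build (k+1)-item candidates from the surviving k-item sets.
        candidates = list({a | b for a in survivors for b in survivors
                           if len(a | b) == len(a) + 1})
    return frequent

# Illustrative transactions: discretised features plus the optimization applied.
data = [{"small_loop", "high_trip_count", "unroll"},
        {"small_loop", "high_trip_count", "unroll"},
        {"adjacent_loops", "same_trip_count", "fuse"},
        {"small_loop", "high_trip_count", "unroll"}]
print(apriori(data, min_support=3))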

F) FP-Growth
FP-growth, or frequent pattern growth, adopts a divide and conquer strategy to find frequent patterns of any size.
We start by ordering the items in the database by decreasing order of occurrence. We begin with a null node, which first branches to the item with the highest occurrence. Then, according to the itemsets in the database, a tree is constructed, going from the most frequent itemset to the least frequent. If an itemset does not contain the most frequent item, another branch from the null node is created. Each node, which corresponds to an individual item, is assigned a count. Following any path from the root (null) to a leaf gives us a frequent pattern, provided that all nodes on the path satisfy the minimum threshold.
It is fairly obvious that in our case the item of the dimension "optimization" will occur the fewest times compared to the other features that make up the items. We can use the Apriori property to discard those items that have an even lower occurrence than the result item, and assign the minimum threshold equal to the frequency of the result item. On creation of the FP-tree, we can easily follow a path and hence determine the order of application of optimizations that maximizes performance.
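
Assuming the mlxtend library is available, its fpgrowth implementation can be applied to the same kind of transactions as in the Apriori sketch; the item names below are illustrative.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

# Illustrative transactions: discretised code features plus the optimization
# that was applied to that code.
transactions = [["small_loop", "high_trip_count", "unroll"],
                ["small_loop", "high_trip_count", "unroll"],
                ["adjacent_loops", "same_trip_count", "fuse"],
                ["small_loop", "high_trip_count", "unroll"]]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)
# In mlxtend, min_support is a fraction of all transactions.
print(fpgrowth(onehot, min_support=0.6, use_colnames=True))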
G) Decision Trees
A decision tree is a flowchart-like structure in which each node denotes a test on a feature, the branches lead to further tests based on the outcome of the previous test, and the leaf nodes denote the final classification.
A decision tree, in conjunction with neural networks or any other supervised learning algorithm, is a good method for determining the order of optimizations. Figure 2 depicts this most appropriately.
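
A minimal sketch with scikit-learn's DecisionTreeClassifier, where the (placeholder) label of each training code is an index into a small table of candidate optimization orders; export_text prints the flowchart of feature tests described above.

import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Placeholder feature vectors; labels index a table of candidate orders,
# e.g. 0 = inline -> unroll -> fuse, 1 = unroll -> fuse -> inline, ...
X = np.random.rand(50, 6)
y = np.random.randint(0, 3, size=50)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree))   # the learned sequence of feature tests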
H) Principal Component Analysis
Also called PCA, this is a technique for dimensionality reduction. It coalesces all the variables, possibly correlated, to arrive at a smaller number of uncorrelated variables that can serve as descriptors of the underlying structure of the data.
The idea is to apply PCA to shortlist the most important features from the list of all features that a code can employ. The input to the PCA algorithm would be a list of all the features of the codes that make up the sample space, from the most simple to the most complex. After the analysis, we would get a short list of features that have the highest probability of influencing the performance of a code. We could use this new list of features to run any of the machine learning algorithms. We propose using PCA to isolate the features, instead of considering all the possible features.
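
A minimal sketch of this shortlisting step with scikit-learn's PCA, applied to a placeholder matrix with one row per training code and one column per raw feature.

import numpy as np
from sklearn.decomposition import PCA

# Placeholder matrix: one row per training code, one column per raw feature.
X = np.random.rand(200, 30)

pca = PCA(n_components=5)          # keep the 5 strongest components
reduced = pca.fit_transform(X)
print(reduced.shape)               # the shortened feature representation
print(pca.explained_variance_ratio_)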


Fig. 2. Using decision trees in conjunction with neural networks to arrive at the best possible order for application of optimizations.

I) Genetic Programming
This concept uses heuristics of search and optimization that imitate the process of natural evolution. Survival of the fittest has been observed since the earliest times, and the same analogy can be applied to optimization. Being an evolutionary algorithm, it incorporates gradual and progressive enhancement of the subject.
Compiler optimization can be obtained by reducing the execution time or by reducing the code size. Genetic algorithms use compiler flags to obtain the necessary optimal results; an optimization flag is one which optimizes a particular feature of the code, whereas an optimization level optimizes a number of features by setting many flags at once.
The algorithm has three ways to approach an optimal solution, viz. selection, crossover and mutation.
Selection: the process of selecting the best and discarding the rest; from a pool of candidate solutions it chooses the best, using a fitness function to decide which that is. For compiler optimization we take
E: execution time
C: compilation time
Fitness function = 1 / (E + C)
The greater the value of the fitness function, the more optimal the solution; the inverse requires the execution and compilation times to be low.
Crossover and mutation: the solutions so obtained are crossed over to obtain a better solution, whereas mutation adds some features which enhance the optimal solution.
Algorithm:
1. The flags are chosen and combined to form a chromosome, which is a contender for the best solution.
2. The execution and compilation times for every possible combination of flags are noted.
3. The fitness function described above is evaluated for each combination.
4. The combination with the highest fitness value is selected and carried over to the following generations (codes, in our case).

Fig. 3. Diagrammatic representation of the genetic algorithm

Simple genetic algorithm (pseudocode):

func SGA()
{
    initialize population;
    F = fitness_function(population);
    while (F does not meet the termination criterion)
    {
        selection;
        crossover;
        mutation;
        F = fitness_function(population);
    }
}
The simple genetic algorithm sketched above describes how selection, crossover and mutation are repeated until a termination criterion is met, which in this case is reaching the best possible result.
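
The following is a minimal, self-contained sketch of such a genetic algorithm over a small set of GCC-style optimization flags; the measure() function is a placeholder for actually compiling and timing the program, so the numbers it returns are illustrative only.

import random

FLAGS = ["-funroll-loops", "-finline-functions", "-fomit-frame-pointer",
         "-ftree-vectorize", "-fpeel-loops"]

def measure(chromosome):
    """Placeholder: a real setup would compile the program with the enabled
    flags and return (execution time, compilation time)."""
    enabled = sum(chromosome)
    return 10.0 / (1 + enabled) + random.random(), 1.0 + 0.2 * enabled

def fitness(chromosome):
    e, c = measure(chromosome)
    return 1.0 / (e + c)               # the fitness function 1/(E+C) above

def evolve(pop_size=20, generations=30, mutation_rate=0.1):
    pop = [[random.randint(0, 1) for _ in FLAGS] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half of the population.
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(FLAGS))   # single-point crossover
            child = a[:cut] + b[cut:]
            for i in range(len(child)):             # mutation
                if random.random() < mutation_rate:
                    child[i] ^= 1
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print([flag for flag, bit in zip(FLAGS, best) if bit])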

With the advent of the numerous algorithms used in machine learning, it becomes paramount to determine which technique is to be used; the genetic algorithm, being analogous to nature, brings in the concept of crossover. In the context of compilers, once candidate solutions have been selected, they can be crossed over to produce a new solution by combining the common attributes which contribute to the optimization of the code. Figure 4 below illustrates the idea.

Fig. 4. Using genetic programming to find the best of the probable strategies for optimization.

REFERENCES

[1] G. Fursin, Y. Kashnikov, A. W. Memon, Z. Chamski, O. Temam, M. Namolaru, E. Yom-Tov, B. Mendelson, A. Zaks, E. Courtois, F. Bodin, P. Barnard, E. Ashton, E. Bonilla, J. Thomson, C. K. I. Williams and M. O'Boyle, "Milepost GCC: machine learning enabled self-tuning compiler."
[2] S. Kulkarni, "Improving Compiler Optimization using Machine Learning."
[3] G. G. Pekhimenko, "Machine Learning Algorithms for Choosing Compiler Heuristics."
[4] https://www.toptal.com/machine-learning/machine-learning-theory-an-introductory-primer, accessed on 14-02-2016.
[5] P. J. Joseph, M. T. Jacob, Y. N. Srikant and K. Vaswani, "Statistical and Machine Learning Techniques in Compiler Design."
[6] http://www.tutorialspoint.com/compiler_design/compiler_design_code_optimization.htm, accessed on 04-02-2016.
[7] https://www.cs.princeton.edu/picasso/mats/PCA-Tutorial-Intuition_jp.pdf, accessed on 02-03-2016.
[8] http://ordination.okstate.edu/PCA.htm, accessed on 02-03-2016.
[9] http://support.minitab.com/en-us/minitab/17/topic-library/modeling-statistics/multivariate/principal-components-and-factor-analysis/what-is-pca/, accessed on 02-03-2016.
[10] https://georgemdallas.wordpress.com/2013/10/30/principal-component-analysis-4-dummies-eigenvectors-eigenvalues-and-dimension-reduction/, accessed on 02-03-2016.
[11] http://research.ijcaonline.org/volume112/number10/pxc3900938.pdf, accessed on 05-03-2016.
