
CS 498: Program Optimization

Spring 2010
Maria J. Garzaran

University of Illinois at Urbana-Champaign


Course Organization
•  Instructor: Maria J. Garzaran

•  Office: 4308, Siebel Center


•  Email: garzaran@cs.uiuc.edu
•  Office Hours: By appointment (send me an email)
•  Course website: http://www.cs.uiuc.edu/class/sp10/cs498mjg/

•  Location: Room 1103, Siebel Center, W-F 11:00 - 12:15


•  Credits: 3 undergraduate hours; 3 or 4 graduate hours
Course Subject
•  Program optimization

•  One of the main topics in computer science, although there are few courses focusing on this topic.

•  Improve programs (or generate programs with good properties) along one or more of the following dimensions:
–  Execution time
–  Power
–  Space
–  Reliability
–  Modularity, Readability
–  Accuracy/completeness
Program Optimization

•  Program optimization is a difficult task

–  Computers are becoming very complex

–  Interactions between hardware and software are difficult to understand

Course Subject
•  Importance of reducing execution time
–  Faster programs are better.
•  An interactive environment exacerbates the thirst
for speed.
•  More science and better engineering designs in the same
amount of time
–  Although machines are getting more powerful,
sometimes the only way to take advantage of this
power is through program optimization.
•  Itanium instruction level parallelism
•  Multicore machines
Evolution of Processors (Intel)

Source: Markus Püschel. “How to write fast code”, Spring 2008.



Course Subject
•  How is a program optimized?
–  Manually
•  Algorithm choice
•  Library selection
•  Need tools to assess progress and understand what is
happening.
•  Tradeoff: development and maintenance time vs.
performance.

–  Automatically
•  By a compiler
•  By developing code generators that automatically search for
the best shape of the program. Here the choices are those of the
programmer, whereas in a compiler the choices are those of the
compiler writer.
So, we have 3 choices …
•  Option 1: Use the compiler

•  Option 2: Manual optimizations

•  Option 3: Use libraries


–  can be manually written
–  can be automatically generated

Option 1: using the compiler
•  It is the most desirable option, but we will find that it does not always work well.

•  Compilers have many flags. Experimenting with them may yield some performance benefit.
–  A few research papers have explored this idea.

Option 2: Manual Tuning
•  Choose the right algorithm

•  Choose the right instruction mix (probably using assembly instructions).

•  Take into account architectural parameters such as cache size and instruction throughput, and architectural features such as SIMD instructions or parallelism.
–  The resulting code is not portable from architecture to architecture.
–  This option is also very time-consuming.
Option 3: Libraries
•  Building libraries is one of the earliest strategies
to improve productivity.

•  Libraries are particularly important for performance
–  High performance is difficult to attain and not portable.

•  Libraries are not used as often as is believed.
–  Not all algorithms are implemented.
–  Not all data structures are covered.
Compilers vs. Libraries in Sorting
[Figure: execution time versus input set on the IBM Power 3 and the IBM Power 4, comparing quicksort with the vendor library; the annotated differences are about 2x.]
Compilers versus Libraries in DFT

Source: Markus Püschel. “How to write fast code”, Spring 2008.


Compilers versus Libraries in MMM

Source: Markus Püschel. “How to write fast code”, Spring 2008.


Some conclusions
•  Implementations with the same operation counts can
have vastly different performance (100x or more)
–  A cache miss can be 100 times more expensive than an addition
or a multiplication.
–  Vector operations can perform 2 or 4 operations in parallel.
–  All recent desktop computers have multiple cores (processors)
on a single die.
•  Minimizing the operation count does not mean maximizing
performance

•  End of free speed-up: legacy code will not automatically get faster anymore.
Source: Markus Püschel. “How to write fast code”, Spring 2008.
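The point about equal operation counts can be illustrated with a small sketch. The two functions below perform exactly the same additions over the same matrix and differ only in traversal order. The names make_matrix, sum_row_order, sum_column_order and timed are hypothetical helpers introduced for this illustration, not part of the course material. How large the timing gap is (if any) depends on the machine and the language; in an interpreted language such as Python the interpreter overhead hides much of the memory effect that a C version would show, so no expected timings are given.

```python
import time

def make_matrix(n):
    """Build an n x n matrix of integers stored as a list of rows."""
    return [[i * n + j for j in range(n)] for i in range(n)]

def sum_row_order(m):
    """Visit elements row by row: consecutive accesses stay in one row."""
    n = len(m)
    total = 0
    for i in range(n):
        for j in range(n):
            total += m[i][j]
    return total

def sum_column_order(m):
    """Visit elements column by column: each access moves to another row."""
    n = len(m)
    total = 0
    for j in range(n):
        for i in range(n):
            total += m[i][j]
    return total

def timed(fn, m):
    """Run fn(m) once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(m)
    return result, time.perf_counter() - start

if __name__ == "__main__":
    matrix = make_matrix(500)  # size chosen arbitrarily for the sketch
    for fn in (sum_row_order, sum_column_order):
        result, seconds = timed(fn, matrix)
        print(f"{fn.__name__}: sum={result}, {seconds:.4f} s")
```

Both functions execute n*n additions and return the same sum, so any difference in running time comes from the access pattern rather than from the operation count.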
Libraries and Productivity

•  Much effort goes into highly-tuned libraries.

•  Automatic generation of libraries would
–  Reduce implementation cost
–  For a fixed cost, enable a wider range of implementations and
thus make libraries more usable.

Library Generators

•  Automatically generate highly efficient libraries for a class of platforms.

•  No need to manually tune the library to the architectural characteristics of a new machine.

Library Generators
•  How do they work?
–  They use empirical search for program optimization
•  Generate different versions of a program, execute them, and choose the fastest

•  Hard to use in general programs

•  Can be used for library generation
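The generate, execute, and choose-the-fastest loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any real library generator is implemented: the "versions" are a blocked matrix multiplication parameterized by a tile size, and make_blocked_matmul, empirical_search, and the candidate tile sizes are all hypothetical names and values chosen for the example.

```python
import time

def make_blocked_matmul(block):
    """Generate one program version: matrix multiply using block x block tiles."""
    def matmul(a, b):
        n = len(a)
        c = [[0.0] * n for _ in range(n)]
        for ii in range(0, n, block):
            for kk in range(0, n, block):
                for jj in range(0, n, block):
                    for i in range(ii, min(ii + block, n)):
                        row_a, row_c = a[i], c[i]
                        for k in range(kk, min(kk + block, n)):
                            a_ik, row_b = row_a[k], b[k]
                            for j in range(jj, min(jj + block, n)):
                                row_c[j] += a_ik * row_b[j]
        return c
    return matmul

def empirical_search(candidates, a, b, repeats=3):
    """Time every candidate tile size; return (fastest tile size, all timings)."""
    timings = {}
    for block in candidates:
        variant = make_blocked_matmul(block)
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            variant(a, b)
            best = min(best, time.perf_counter() - start)
        timings[block] = best  # keep the best of several runs to reduce noise
    best_block = min(timings, key=timings.get)
    return best_block, timings

if __name__ == "__main__":
    n = 48  # small problem size so the sketch runs quickly
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    best, timings = empirical_search([4, 8, 16, 48], a, b)
    for block, seconds in timings.items():
        print(f"tile size {block:2d}: {seconds:.4f} s")
    print("fastest tile size on this machine:", best)
```

The winning tile size is whatever the measurements on the current machine say it is, which is the essence of empirical search: the choice is made by running the candidates rather than by a model of the hardware.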


Library Generators (Cont.)
•  Examples:
–  In linear algebra: ATLAS, PhiPAC
–  In signal processing: FFTW, SPIRAL

•  Library generators produce a pre-defined set of algorithms.
–  Exception: SPIRAL accepts formulas and rewriting rules as input.

Course Subject
•  The focus of this course is how to reduce the
execution time of a program.

•  Some issues discussed
–  Factors that affect performance (branches, instruction-level
parallelism, choice of instructions, cache misses, …)
–  Program transformations (algorithm change, loop
transformations, vectorization and parallelization)
–  Tools (VTune, gprof)
–  Automatic program synthesis (library generators)
Course Subject
•  The main focus of this course is manual optimization, but
we need to understand compilers unless we are willing to
program in assembly language.

•  Understanding the manual approach is important for compiler writers, since they should master the manual approach before trying to automate it.

•  The manual approach is also important for machine designers, so they can understand their choices when mapping programs to their designs.
Course Subject
•  For performance optimization it is necessary to have a
good understanding of target machine features.
•  Program optimization is difficult for many reasons:
–  Machine features may interact in ways that are difficult to analyze.
Explaining/predicting behavior is difficult and has become more difficult
with increasing machine complexity.

–  There are many ways to solve a problem, and in many cases it is not clear
which one is better. It depends on the class of machine and the
characteristics of the input data. For example, it is difficult to know when
quicksort is better than radix sort.

–  In general, a “proof” of optimality is unrealistic. It is usually difficult to know how much more could be done.
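The quicksort versus radix sort question can be explored with a small experiment. The sketch below is an illustration only: quicksort and radix_sort are simple textbook-style implementations written for this example (not tuned, and not the course's code), and the two input distributions are assumptions chosen to vary the key range. Which algorithm is faster on a given input is left for the measurement to show, since it depends on the machine and the data.

```python
import random
import time

def quicksort(values):
    """Out-of-place quicksort with a random pivot and three-way partition."""
    if len(values) <= 1:
        return list(values)
    pivot = random.choice(values)
    less = [v for v in values if v < pivot]
    equal = [v for v in values if v == pivot]
    greater = [v for v in values if v > pivot]
    return quicksort(less) + equal + quicksort(greater)

def radix_sort(values, bits_per_pass=8):
    """Least-significant-digit radix sort for non-negative integers."""
    out = list(values)
    if not out:
        return out
    mask = (1 << bits_per_pass) - 1
    largest = max(out)
    shift = 0
    # One stable bucketing pass per digit, until all digits of the
    # largest key have been processed.
    while (largest >> shift) > 0:
        buckets = [[] for _ in range(mask + 1)]
        for v in out:
            buckets[(v >> shift) & mask].append(v)
        out = [v for bucket in buckets for v in bucket]
        shift += bits_per_pass
    return out

if __name__ == "__main__":
    rng = random.Random(0)
    inputs = {
        "small keys (below 2**8)": [rng.randrange(2**8) for _ in range(50_000)],
        "large keys (below 2**48)": [rng.randrange(2**48) for _ in range(50_000)],
    }
    for label, data in inputs.items():
        for sorter in (quicksort, radix_sort):
            start = time.perf_counter()
            result = sorter(data)
            elapsed = time.perf_counter() - start
            assert result == sorted(data)  # both must agree with the built-in sort
            print(f"{label:26s} {sorter.__name__:10s} {elapsed:.4f} s")
```

The number of radix passes grows with the key width while quicksort's work depends on the number of elements and their ordering, which is one reason the better choice changes with the input.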
Tentative List of Topics
•  What is program optimization (performance
tools, hotspots, …)
•  Performance Issues (algorithms, branches,
memory, loops, slow operations, floating point
operations … )
•  Vectorization
•  Parallelism
•  Locality Enhancement
•  Library Generators (matrix-matrix multiplication,
sorting, SPARSITY, data mining, SPIRAL)
Tentative Course Outline
•  W 1/20: Introduction. Course Organization. What
is program optimization? Performance Tools,
Hotspots and Cold-Spots.
•  F 1/22: Performance Issues: Algorithms and
Branching
•  W 1/27: Performance Issues: Memory
•  F 1/29: Performance Issues: Loops
•  W 2/3: Performance Issues: Slow operations and
Floating point operations
Tentative Course Outline
•  F 2/5 – 2/12: Vectorization (3 lectures)

•  W 2/17-2/26: Parallelization (4 lectures)

•  W 3/3-3/10: Locality/Tiling (3 lectures)

•  F 3/12: Midterm
Tentative Course Outline
•  W 3/17: Introduction to Library Generators.
Empirical Search
•  F 3/19- 3/31: The ATLAS system (2
lectures)
•  F 4/2: Cache Oblivious algorithms
•  W 4/7-4/9: Sorting (2 lectures)
•  F 4/9: Sorting
•  W 4/14: The SPIRAL system
Tentative Course Outline
•  F 4/16: The SPARSITY system
•  F 4/21: Data mining optimizations (2
lectures)
•  W 5/5: TBD.
Format of the class
•  I will use slides (and will post them in advance)

•  There will be reading assignments for some of the topics covered

•  Some of the lectures will be based on the textbook:
“The Software Optimization Cookbook: High Performance Recipes for IA-32 Platforms”,
Second Edition, by Richard Gerber, Aart J.C.
Bik, Kevin B. Smith, and Xinmin Tian, published
by Intel Press, 2006. ISBN: 0-9764832-1-1.
Homeworks and Programming Assignments

•  There will be between 4 and 6 homeworks and programming assignments.

•  They can be done in groups of 2, in which case both students will receive the same grade.

•  Solutions will be discussed in class.


Exams
•  Exams will be closed book and closed
notes. The final will cover only the material not
covered in the midterm

•  Midterm: Friday 3/12


•  Final: Date to be announced
Grading
•  Final grades will be computed based on:
20% midterm
20% final
50% homeworks and programming assignments
10% class participation
Questions?
