
Session 1

Optimization Techniques

Profiling Tools 1
Content
 Program optimization – introduction
 Optimization techniques for embedded systems
development
 C for Embedded systems

Session Objectives
 To learn the importance of optimization of the program
 To know the different optimization techniques for
embedded systems design
 To understand why C is used for embedded system
development

Introduction

The Problem
 PC speed increased 500 times since 1981, but today’s
software is more complex and still hungry for more
resources
 How to run faster on same hardware and OS
architecture?
 Highly optimized applications can run tens of times faster
than poorly written ones
 Using efficient algorithms and well-designed
implementations leads to high performance
applications

Writing Fast Programs
 Use a fast algorithm
 It does not make sense to optimize a bad algorithm
 Implement it efficiently
 Detect hotspots using profiler and fix them
 Understanding of target system architecture is
often required – such as cache structure
 Use platform-specific compiler extensions:
 memory pre-fetching
 cache-control instructions
 branch prediction
 SIMD instructions
 Write multithreaded applications
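As a rough illustration of the compiler extensions listed above, the sketch below uses two GCC/Clang builtins, `__builtin_prefetch` and `__builtin_expect`. The function name `sum_non_negative`, the `UNLIKELY` macro and the `PREFETCH_AHEAD` distance are assumptions made for this example; other compilers need their own equivalents, and whether the hints help at all depends on the target.

```c
#include <stddef.h>

/* GCC/Clang-specific branch hint; the macro name is illustrative. */
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Guessed prefetch distance in elements; must be tuned per target. */
#define PREFETCH_AHEAD 16

/* Sum the non-negative elements of data[0..n-1]. */
long sum_non_negative(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n) {
            /* Hint that this element will be read soon
             * (0 = read access, 1 = low temporal locality). */
            __builtin_prefetch(&data[i + PREFETCH_AHEAD], 0, 1);
        }
        if (UNLIKELY(data[i] < 0)) {
            continue;   /* negative values are assumed to be rare */
        }
        total += data[i];
    }
    return total;
}
```

Measure before and after: on small arrays or simple cores the hints may make no difference.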

Writing Fast Programs
 Use good coding practices
 Use good data structures
 Apply appropriate optimization techniques
 Optimizing code takes time and reduces source code
readability

Optimizing Embedded Software
 Embedded software often runs on processors with
limited computation power, thus optimizing the code
becomes a necessity
 A program can usually be made either faster or smaller, but
rarely both
 An improvement in one of these areas can have a
negative impact on the other
 It is up to the programmer to decide which of these
improvements is most important to her/him
 Recommendation: reduce the size of your program

Optimizing For Program Size
 Goal:
 Reduce hardware cost of memory
 Reduce power consumption of memory units
 Two opportunities:
 Data
 Reuse constants, variables, data buffers in
different parts of code
 Requires careful verification of correctness
 Generate data using instructions
 Instructions
 Avoid function inlining
 Choose CPU with compact instructions
 Use specialized instructions where possible
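One way to read "generate data using instructions" is to compute a value on demand instead of storing a table of precomputed results. The sketch below is an assumed example, not taken from the slides: it counts the set bits of a byte with a short loop rather than a 256-entry lookup table. The name `popcount8` is hypothetical.

```c
#include <stdint.h>

/* Count the set bits in a byte with a short loop instead of a
 * 256-entry lookup table: slower per call, but no table in memory. */
uint8_t popcount8(uint8_t value)
{
    uint8_t count = 0;
    while (value != 0) {
        value &= (uint8_t)(value - 1);  /* clear the lowest set bit */
        count++;
    }
    return count;
}
```

This trades a few cycles per call for 256 bytes of data, which is the size-versus-speed trade-off described earlier.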

Cost Of High Performance

Performance: Where To Look
 “Maximize performance - who knows where to
optimize and where not to optimize”
 Spend your time optimizing the portions of code where
the most time is taken
 Run a compiled program to learn where that
program spends its time
 May profile other computational resource usage -
Space, Power, I/O
 Not easy to estimate this resource usage by static
analysis (requires dynamic)

Performance: Where To Look
 Problem: You're given a program's source
code (which someone else wrote) and asked
to improve its performance by at least 20%
 Where do you begin?
 Look at source code and try to find
inefficient C code
 Try rewriting some of it in assembly
 Rewrite using a different algorithm
 (Remove random portions of the code) :-)

Performance: Where To Look
How to figure out where a program is spending its time?
Count every static instruction - to know which routines
(functions) were the biggest
 Big deal, large functions that aren't executed often
don't really matter
Count every dynamic instruction – to know which
routines executed the most instructions
 Excellent! It tells the “relative importance” of each
function
 But doesn't account for memory system

Count how many cycles were spent in each routine - to
know which routines took the most time
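Measuring the time spent in a routine can be sketched with the standard `clock()` function. The workload `busy_work` and the wrapper `time_busy_work` are hypothetical names used only for this illustration. On an embedded target a hardware cycle counter or timer would normally replace `clock()`.

```c
#include <time.h>

/* Hypothetical workload used only for this illustration. */
unsigned long busy_work(unsigned long iterations)
{
    /* volatile keeps the compiler from removing the loop */
    volatile unsigned long acc = 0;
    for (unsigned long i = 0; i < iterations; i++) {
        acc += i;
    }
    return acc;
}

/* Return the processor time, in seconds, used by one call of busy_work(). */
double time_busy_work(unsigned long iterations)
{
    clock_t start = clock();
    (void)busy_work(iterations);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

A single short call is often below the timer resolution, so in practice the routine is run many times and the total is divided by the repeat count.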

The Software Optimization Process
 Hotspots are areas in your code that take a long time to execute
 The process is a cycle:
 Create benchmark
 Find hotspots
 Investigate causes
 Modify application
 Retest using benchmark, then repeat

Extreme Optimization Pitfalls
 Large application’s performance cannot be improved
before it runs
 Build the application then see what machine it runs on
 Runs great on my computer…
 Debug versus release builds
 Performance requires assembly language
programming
 Code features first, then optimize if there is time
left over

Key Point:

Software optimization doesn’t begin where coding ends –
it is an ongoing process that starts at the design stage and
continues all the way through development

90/10 Rule
 90% of execution time is spent in 10% of code
 So the ‘hot’ 10% is the code that must be optimized
 Optimization takes time, but gives efficient code – so
only use for 10%
 Simple, unoptimized code is quick to write but runs slower –
acceptable for the remaining 90%
 Tradeoff – need to get balance right!

How To Find Performance
Bottlenecks
 Determine how the system resources are being utilized to
identify system-level bottlenecks
 Measure the execution time for each module and function
in the application
 Determine how the various modules running on the
system affect the performance of each other
 Identify the most time-consuming function calls and call
sequences within the application
 Determine how the application is executing at the
processor level to identify microarchitecture-level
performance problems

Improving Program Performance
 Compiler writers try to apply several standard
optimizations - Do not always succeed
 Compiler writers sometimes apply aggressive
optimizations
 Often not “informed” enough to know that change
will help rather than hurt
 Optimizations based on specific
architecture/implementation characteristics can be
very helpful
 Much harder for compiler writers because it
requires multiple, generally very different, “back-end”
implementations

Improving Program Performance
 How can one help?
 Better code, algorithms and data structures (of
course)
 Re-organize code to help compiler find
opportunities for improvement
 Replace poorly optimized code with assembly code
(i.e., bypass compiler)

Writing Efficient C code
 To write efficient C code, you must be aware of:
 Areas where the C compiler has to be conservative
 The limits of the processor architecture the C
compiler is mapping to
 The limits of a specific C compiler - dependent on
the compiler vendor
 Look at the compiler’s documentation or
experiment with the compiler
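One common case where the compiler has to be conservative, assumed here as the example, is pointer aliasing: if two pointers might refer to the same memory, a value read through one must be reloaded after every store through the other. The C99 `restrict` qualifier lets the programmer promise that there is no overlap. The function names below are hypothetical.

```c
#include <stddef.h>

/* Without restrict the compiler must assume out[] may overlap *scale,
 * so it may reload *scale on every iteration. */
void scale_may_alias(int *out, const int *in, const int *scale, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] * *scale;
    }
}

/* restrict is the caller's promise that the three objects do not
 * overlap; the compiler is then free to keep *scale in a register. */
void scale_no_alias(int *restrict out, const int *restrict in,
                    const int *restrict scale, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] * *scale;
    }
}
```

Calling the `restrict` version with overlapping arguments is undefined behavior, so the promise has to be true at every call site.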

Performance Tools Overview
 Timing mechanisms
 Stopwatch: UNIX time tool
 Optimizing compiler (easy way)
 System load monitors
 vmstat, iostat, perfmon.exe, VTune Counter
 Software profiler
 gprof, VTune, Visual C++ Profiler, IBM Quantify
 Memory debugger/profiler
 Valgrind, IBM Purify, Parasoft Insure++

Optimization Techniques
• Bad memory management has serious impacts
• Poor data locality causes high power dissipation
• Poor memory throughput leads to poor
performance
• Optimization techniques
• Platform independent
• Loop transformation
• Data reuse
• Processor partitioning

Optimization Techniques
 Architecture specific
 Memory modeling optimization
 Register allocation – graph coloring
 Custom memory architecture
 Memory address generation
 General compilers – generated addresses are
periodic
 Embedded systems – address sequence might
not be periodic

Optimization Techniques
 The "scope" of the optimization:
 Local optimizations - Performed in a part of one procedure.
 Common sub-expression elimination (e.g. those occurring when
translating array indices to memory addresses)
 Using registers for temporary results, and if possible for
variables.
 Replacing multiplication and division by shift and add operations.
 Global optimizations - Performed with the help of data flow
analysis and split-lifetime analysis.
 Code motion (hoisting) outside of loops
 Value propagation
 Strength reductions
 Inter-procedural optimizations
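The local and global optimizations listed above can be imitated by hand to see what the compiler is trying to do. In this sketch (function names are hypothetical) the first version recomputes an address and a product in every iteration, and the second applies code motion and strength reduction manually. An optimizing compiler will often perform the same rewrite on its own.

```c
#include <stddef.h>

/* Straightforward version: recomputes row * width and scale * 2
 * in every iteration. */
void fill_row_naive(int *image, size_t width, size_t row, int scale)
{
    for (size_t col = 0; col < width; col++) {
        image[row * width + col] = (int)col * (scale * 2);
    }
}

/* Hand-transformed version of the same loop. */
void fill_row_optimized(int *image, size_t width, size_t row, int scale)
{
    int *dst = image + row * width; /* code motion: invariant address hoisted */
    int step = scale + scale;       /* multiplication by 2 replaced by an add */
    int value = 0;                  /* strength reduction: col * step becomes
                                       a running sum */
    for (size_t col = 0; col < width; col++) {
        dst[col] = value;
        value += step;
    }
}
```

Both functions write the same values; the second simply does less work per iteration.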

Optimization Techniques
 What is improved in the optimization:
 Space optimizations - Reduces the size of the
executable/object.
 Constant pooling
 Dead-code elimination
 Speed optimizations - Most optimizations belong to
this category

Optimization Techniques
 There are important optimizations not covered above,
e.g. the various loop transformations:
 Loop unrolling - Full or partial transformation of a
loop into straight code
 Loop blocking (tiling) - Minimizes cache misses by
replacing each array processing loop with two
loops, dividing the "iteration space" into smaller
"blocks"
 Loop interchange - Change the nesting order of
loops, may make it possible to perform other
transformations
 Loop distribution - Replace a loop by two (or more)
equivalent loops
 Loop fusion - Make one loop out of two (or more)
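Two of the loop transformations above are sketched below: partial unrolling by a factor of four, and blocking (tiling) applied to a matrix transpose. The function names and the `BLOCK` tile size are assumptions for illustration; a suitable tile size depends on the cache of the target.

```c
#include <stddef.h>

/* Partial loop unrolling by 4: fewer loop-control branches per element. */
long sum_unrolled(const int *data, size_t n)
{
    long total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        total += data[i];
        total += data[i + 1];
        total += data[i + 2];
        total += data[i + 3];
    }
    for (; i < n; i++) {        /* remainder loop for the last n % 4 items */
        total += data[i];
    }
    return total;
}

#define BLOCK 8   /* hypothetical tile size; tune to the target's cache */

/* Loop blocking (tiling): transpose an n x n row-major matrix tile by
 * tile so that each tile of src and dst stays in cache while in use. */
void transpose_blocked(const int *src, int *dst, size_t n)
{
    for (size_t ib = 0; ib < n; ib += BLOCK) {
        for (size_t jb = 0; jb < n; jb += BLOCK) {
            size_t i_end = (ib + BLOCK < n) ? ib + BLOCK : n;
            size_t j_end = (jb + BLOCK < n) ? jb + BLOCK : n;
            for (size_t i = ib; i < i_end; i++) {
                for (size_t j = jb; j < j_end; j++) {
                    dst[j * n + i] = src[i * n + j];
                }
            }
        }
    }
}
```

Unrolling increases code size, so it works against the size goal discussed earlier and is best kept to measured hotspots.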

C Language In Embedded Systems

C Language In Embedded Systems
Several factors explain the increased popularity of C in the
embedded systems area:
 The ever-increasing complexity of applications drives
programmers from assembly to the high-level
languages
 The high-level programming language C offers good
support for high-speed, low-level I/O operations
 Programmers of embedded applications particularly
appreciate this mixed high/low-level approach
 In comparison to other high-level language compilers,
C language compilers tend to deliver more condensed
code size

C Language In Embedded Systems
 Virtually all mathematical modeling tools generate C
source code
 C offers significant productivity gains with opportunities
for
 Code re-use
 Improved code maintenance
 Ongoing developments over the life of the application
 C can be written in a structured manner that reduces
the chance of producing errors
 C can also be written in a very condensed manner,
which is hard to comprehend and dramatically
increases the likelihood of introducing errors

C Language In Embedded Systems
 The compiler does not necessarily detect small typing
errors
 Consider the operators &&, &, ||, |, +=, =, and ==, and
how easily a typo still yields perfectly valid
C code
 Not every programmer is fully aware of the effects of
all the possible constructs in the C language
 Casts (implicit or explicit) can cause both confusion
and errors
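A minimal sketch of such a typo, using hypothetical function names: dropping one `&` turns a logical AND into a bitwise AND. Both versions compile, but they disagree whenever the two operands are non-zero and share no set bits.

```c
#include <stdbool.h>

/* Intended test: both values are non-zero (logical AND). */
bool both_set_logical(int a, int b)
{
    return a && b;
}

/* One character dropped: bitwise AND. Still valid C, but the result is
 * false whenever the two values have no set bits in common. */
bool both_set_bitwise(int a, int b)
{
    return a & b;
}
```

For `a = 1` and `b = 2` the logical version yields true while the bitwise version yields false, because `1 & 2` is `0`.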

C Language In Embedded Systems
 One of the main reasons that C compilers do a great
job of generating compact, efficient code is because of
the limited run-time checking in C
 C has no built-in run-time checks for divide by zero,
arithmetic overflow, invalid addresses or pointers, or
out-of-bounds array accesses, so any of these can
cause a runtime software failure
 It is therefore easy to understand that programmers
with a special interest in writing robust, consistent
code have a concern with the programming
language C
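Because the language does not perform these checks, robust code has to add them where they matter. The helpers below are a minimal sketch with hypothetical names (`checked_div`, `checked_get`), not part of any standard library.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Divide only when the result is defined; return false otherwise. */
bool checked_div(int numerator, int denominator, int *result)
{
    if (denominator == 0) {
        return false;                       /* divide by zero */
    }
    if (numerator == INT_MIN && denominator == -1) {
        return false;                       /* quotient would overflow */
    }
    *result = numerator / denominator;
    return true;
}

/* Read array[index] only when the pointer is valid and index is in range. */
bool checked_get(const int *array, size_t length, size_t index, int *value)
{
    if (array == NULL || index >= length) {
        return false;
    }
    *value = array[index];
    return true;
}
```

Each check costs a few instructions, which is exactly the overhead C avoids by default, so such helpers are usually applied at module boundaries rather than in inner loops.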

C Language In Embedded Systems
 Many of the companies developing safety-related
embedded applications have written guidelines to
restrict the use of error-prone C constructs with the
intention of reducing the probability of errors
 The goal of these standards is to increase portability,
reduce maintenance, and above all improve clarity
 Mixed coding style is harder to maintain than bad
coding style

C Language In Embedded Systems
 These standards recognize that individual
programmers have the right to make judgments about
how best to achieve the goal of code clarity
 All code should be ANSI standard and should compile
without warning under at least its principal compiler
 Any warnings that cannot be eliminated should be
commented in the code

Optimizing C Code

Help From The Compiler
 Always use compiler optimization settings to build an
application for use with performance tools
 Understanding and using all the features of an
optimizing compiler is required for maximum
performance with the least effort
 Use a compiler that supports your CPU
 Avoid compiler optimization when debugging
 Compiler optimization may:
 Cause certain variables to vanish
 Prevent stepping through each line of the code
 Make it impossible to place breakpoints freely
 Identify your machine to the compiler
 gcc -march=athlon

Help From The Compiler
 Ask the compiler to unroll loops
 gcc -funroll-loops
 gcc -funroll-all-loops
 Ask the compiler to generate procedures inline
 gcc -finline-functions
 Ask the compiler to generate conditional
expressions in place of branches
 gcc -O
 Use hand-tuned library calls for your platform
 There is very little gain in optimizing the string copy
function... Someone already did this for you

Gcc Optimization Levels
 O0
 don’t optimize
 reduce cost of compilation
 make debugging possible
 only variables declared as register are placed in registers
 O1
 basic optimizations for execution time and space reduction
 only functions declared as inline are expanded inline
 O2
 most optimization flags are turned on
 compiler optimizes variable register usage
 does not do any space-speed trade-offs (i.e. no automatic inlining)
 O3
 turns on the O2 optimizations plus more aggressive ones
 compiler will attempt inlining for all compact functions
 code generated can be much larger than O2 but is often only slightly faster

Optimizing Compiler : Choosing
Optimization Flags Combination

Optimizing Compiler’s Effect

Helping The Compiler
 Variables
 Avoid complicated pointer arithmetic; use array
indexes
 Use aliases
 Use const and register where appropriate
 Use integer arithmetic in place of floating point
 Use local variables in place of function arguments
 Use word sized variables if possible
 Avoid globals; use static variables as a last resort
 Avoid volatile unless you mean it
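As one illustration of using integer arithmetic in place of floating point, the sketch below converts a raw ADC reading to millivolts without any floating-point operation. The 12-bit range (0..4095), the 3300 mV reference and the function name are assumptions invented for this example.

```c
#include <stdint.h>

/* Convert a raw reading from a hypothetical 12-bit ADC (0..4095) with a
 * 3300 mV reference to millivolts, using integer arithmetic only.
 * Equivalent to raw * 3.3 / 4095 * 1000, rounded to nearest. */
uint32_t adc_to_millivolts(uint32_t raw)
{
    /* Adding half the divisor before dividing rounds to nearest.
     * For raw <= 4095 the product fits easily in 32 bits. */
    return (raw * 3300u + 2047u) / 4095u;
}
```

On processors without a floating-point unit this avoids pulling in software floating-point routines, which saves both time and code size. The 32-bit type also matches the word size of a typical 32-bit microcontroller.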

Helping The Compiler
 Functions
 Declare compact functions as inline
 Declare local functions as static
 Avoid function calls in tight and frequent loops
 Avoid indirect calls
 Avoid recursion, unless necessary
 Use __attribute__ ((noreturn))
 Use __attribute__ ((const))
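The function-level hints above might look like this with GCC or Clang. The function names and the input range are hypothetical; the two `__attribute__` forms are compiler extensions and are not part of standard C.

```c
#include <stdlib.h>

/* Compact helper: static keeps it local to this file, inline invites
 * the compiler to expand it at the call site. */
static inline int clamp_int(int value, int low, int high)
{
    if (value < low) {
        return low;
    }
    if (value > high) {
        return high;
    }
    return value;
}

/* const: the result depends only on the argument and the function has
 * no side effects, so repeated calls with the same value can be merged. */
__attribute__((const)) int square(int x)
{
    return x * x;
}

/* noreturn: control never comes back, so the compiler need not keep
 * any state alive across the call. */
__attribute__((noreturn)) void fatal_error(void)
{
    abort();
}

/* Square x after rejecting inputs outside an assumed valid range. */
int checked_square(int x)
{
    if (x < -1000 || x > 1000) {
        fatal_error();
    }
    return clamp_int(square(x), 0, 1000000);
}
```

Marking a function `const` when it actually reads global state or has side effects can produce wrong code, so the attribute should only be used when the claim is true.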

Helping The Compiler
 Control flow
 Simple design will often prevent extra branches
 Fewer branches lead to more effective branch
prediction
 Faster for loop
 If..else…
 Switch
 Loop breaking
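The slide does not spell out what "faster for loop" and "loop breaking" refer to. A common reading, assumed here, is a loop that counts down to zero and a search that exits as soon as the answer is known. Function names are hypothetical.

```c
#include <stddef.h>

/* Count-down loop: on many processors a compare against zero comes for
 * free with the decrement, which saves an instruction per iteration. */
long sum_count_down(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = n; i != 0; i--) {
        total += data[i - 1];
    }
    return total;
}

/* Loop breaking: leave the loop as soon as the key is found.
 * Returns the index of the first match, or -1 if there is none. */
long find_first(const int *data, size_t n, int key)
{
    for (size_t i = 0; i < n; i++) {
        if (data[i] == key) {
            return (long)i;     /* early exit skips the remaining items */
        }
    }
    return -1;
}
```

Modern compilers often rewrite an up-counting loop into this form themselves, so the gain should be confirmed with a profiler.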

Helping The Compiler
 Files
 Keep closely related functions together
 Little optimization is done (by ld) at the linking stage
 Libraries
 Use functions best suited for the task
 memcpy can be faster than strcpy if you know the
length
 puts can be faster than printf for fixed strings
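A short sketch of the two library choices above, with hypothetical function names. `memcpy` is given the length up front, so it does not need to test every byte for the terminator. `puts` prints a fixed string without parsing a format string.

```c
#include <stdio.h>
#include <string.h>

/* Copy a string whose length (excluding the terminator) is already
 * known. The caller must ensure dst has room for len + 1 bytes. */
void copy_known_length(char *dst, const char *src, size_t len)
{
    memcpy(dst, src, len + 1);  /* + 1 also copies the terminating '\0' */
}

/* Print fixed text: puts has no format string to interpret and
 * appends the newline itself. */
void print_banner(void)
{
    puts("ready");
}
```

Many compilers already turn a `printf` call with a plain string ending in a newline into `puts`, so the difference shows up mainly in unoptimized builds.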

Session Summary
 Software optimization doesn’t begin where coding
ends – it is an ongoing process that starts at the design
stage and continues all the way through development
• Optimization techniques
• Platform independent
• Loop transformation
• Data reuse
• Processor partitioning

