
Top Ten Development Tips for High Performance Embedded Systems

Michael Barr & Nigel Jones

January 6, 2016

ABOUT BARR GROUP

Mission: “Help as many people as possible build safer, more reliable, and more secure embedded systems.”

Services: Consult on (re)architecture and process, design and develop embedded systems/software, train engineers in best practices, & testify
2 Copyright 2015 Barr Group. All rights reserved.

Copyright Barr Group. Do not copy. High Performance p. 1-1



UPCOMING PUBLIC TRAINING

In U.S. locations from February through April:
•  Embedded Software Boot Camp®
•  Embedded Security Boot Camp®
•  Embedded Android Boot Camp™
•  Test-Driven Development; and Agile Management

We’ll also be coming to Munich, Germany in May!

http://barrgroup.com/training-calendar

NIGEL JONES, CHIEF ENGINEER

30+ years of embedded systems design
•  Hardware & firmware
•  Industrial, telecom, consumer
•  CPUs: 8, 16 & 32 bit
•  Low power / mobile
•  Embedded Languages: C, assembly
•  Roles: engineer, consultant, expert witness

E-mail: njones@barrgroup.com

High Performance Embedded Systems

INTRODUCTION

Motivations for increasing performance:

•  Cheaper hardware.
•  Lower energy consumption.
•  Better user experience.
•  Personal satisfaction.


HIGH PERFORMANCE PRINCIPLES

•  Do it in hardware
•  If you have to do it in firmware, then:
   •  Choose the right algorithms
   •  Use the best tools
   •  Configure the tools optimally
   •  Use the right data types / formats
   •  Use the right language constructs
   •  Apply high performance coding techniques


HARDWARE CONFIGURATION

•  Is your oscillator configured correctly?
   •  Use the Clock Output feature to verify
•  Do you have the wait states set correctly for each memory space?
•  Do you understand that increasing the clock speed ≠ increasing performance?
•  Do you have the instruction and data caches turned on?
•  Do you understand how to access non-cached memory?

DIRECT MEMORY ACCESS

DMA offers massive performance gains
•  Memory-to-memory transfer
•  Peripheral-to-memory transfer
•  CRC calculation
DMA is typically quite hard to set up
•  Much of the gain comes from minimizing interrupts, so get the configuration right!


OTHER SPECIALIZED HW

Do you have FIFOs turned on?
Are you taking advantage of a MAC unit?
Is the FPU correctly set up?
•  Does the compiler know about the FPU?
•  What’s the optimal word length for the FPU?
Does your CPU contain a barrel shifter?
What about other specialized HW features?


ALGORITHMS

“A good algorithm badly coded will always beat a bad algorithm coded superbly”
•  Put time into researching algorithms
•  Don’t listen to folklore
•  Understand the limitations of an algorithm
•  Benchmark competing algorithms without worrying too much about the implementation details
•  Only after determining the best algorithm should you worry about implementation

FRIDEN INTEGER SQUARE ROOT

See http://embeddedgurus.com/stack-overflow/2007/04/crest-factor-square-roots-neat-algorithms/
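As an illustration of this class of routine, here is a minimal sketch of the widely used binary digit-by-digit integer square root, which needs only shifts, adds and compares. It is not claimed to be the exact code from the linked article, and the name `isqrt_u32` is hypothetical.

```c
#include <stdint.h>

/* Floor of the square root of a 32-bit unsigned value.
   Binary digit-by-digit method: no multiply, no divide. */
uint16_t isqrt_u32(uint32_t x)
{
    uint32_t res = 0u;
    uint32_t bit = (uint32_t)1 << 30;   /* highest power of 4 in 32 bits */

    /* Start at the highest power of 4 that does not exceed x. */
    while (bit > x) {
        bit >>= 2;
    }

    while (bit != 0u) {
        if (x >= res + bit) {
            x -= res + bit;
            res = (res >> 1) + bit;
        } else {
            res >>= 1;
        }
        bit >>= 2;
    }
    return (uint16_t)res;
}
```

The loop runs at most 16 times regardless of the input, so the worst-case run time is easy to bound.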


SORTING

Name       Random  Sorted  Inverse Sorted
Qsort       23004    3088           19853
Gnome       17389     892           35395
Selection   14392   14392           14392
Insertion    5588    1179           10324
Shell        6589    4675            6115
Comb        10217    8638           10047
Heap         8449    8607            7413
Bubble      13664     784           16368
Cocktail    17657    3807           27634
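Insertion sort has the lowest figure for random input in the table above. A minimal sketch follows, assuming 16-bit unsigned elements; the slide does not state the element type, the array size or the unit of measurement, and the name `insertion_sort_u16` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* In-place ascending insertion sort. */
void insertion_sort_u16(uint16_t *data, size_t n)
{
    for (size_t i = 1; i < n; ++i) {
        uint16_t key = data[i];
        size_t j = i;

        /* Shift larger elements one slot up to open a gap for key. */
        while ((j > 0) && (data[j - 1] > key)) {
            data[j] = data[j - 1];
            --j;
        }
        data[j] = key;
    }
}
```

As with any entry in the table, the ranking depends on the data set and the target, so it is worth benchmarking on the real hardware.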


MIDDLE OF 3 VALUES

See http://www.cs.mtu.edu/~shene/COURSES/cs201/NOTES/chap03/sort.html
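One way to pick the middle of three values without sorting is to compute max(a, min(b, c)) after ordering the first pair. The following is a minimal sketch, not necessarily the approach shown on the original slide; the name `middle_of_3` and the `int32_t` type are assumptions.

```c
#include <stdint.h>

/* Return the median of three values using at most three comparisons. */
static inline int32_t middle_of_3(int32_t a, int32_t b, int32_t c)
{
    if (a > b) {            /* order the first pair so that a <= b */
        int32_t t = a;
        a = b;
        b = t;
    }
    if (b > c) {            /* b becomes min(b, c) */
        b = c;
    }
    return (a > b) ? a : b; /* max(a, min(b, c)) */
}
```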


MEDIAN FILTERING

•  Ekstrom’s algorithm uses a linked list approach and is the best I’ve found
•  It destroys sorting-based algorithms (with some caveats)
•  Gives you the median, minimum and maximum

See http://embeddedgurus.com/stack-overflow/2010/10/median-filtering/


INTEGER LOG

See http://embeddedgurus.com/stack-overflow/2008/05/integer-log-functions/
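For illustration, here is a minimal sketch of two integer log functions that avoid floating point entirely. These are generic textbook forms, not necessarily those in the linked article; the names are hypothetical, and returning 0 for an input of 0 is an arbitrary choice since the logarithm is undefined there.

```c
#include <stdint.h>

/* floor(log2(x)) by counting shifts. Returns 0 for x == 0. */
uint8_t ilog2_u32(uint32_t x)
{
    uint8_t r = 0u;
    while ((x >>= 1) != 0u) {
        ++r;
    }
    return r;
}

/* floor(log10(x)) by comparison against powers of ten. Returns 0 for x == 0. */
uint8_t ilog10_u32(uint32_t x)
{
    return (x >= 1000000000u) ? 9u :
           (x >= 100000000u)  ? 8u :
           (x >= 10000000u)   ? 7u :
           (x >= 1000000u)    ? 6u :
           (x >= 100000u)     ? 5u :
           (x >= 10000u)      ? 4u :
           (x >= 1000u)       ? 3u :
           (x >= 100u)        ? 2u :
           (x >= 10u)         ? 1u : 0u;
}
```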


CHOOSE THE BEST TOOLS

•  How much is your time worth?
•  Can you afford cheap / free tools?
•  Evaluate tools – particularly compilers
•  Interrupt performance
•  Floating point performance (if applicable)
•  Function calling performance
•  Example code from previous projects
•  Documentation / support


CONFIGURE COMPILER FOR C99

I always configure the compiler for at least C99 compliance.
•  Better defined behavior
•  Access to C99 integer types
•  restrict
•  inline
•  _Bool (or bool)
•  _Complex (or complex)


ALLOW COMPILER EXTENSIONS

•  Optimal access to hardware / SFR
•  Intrinsic functions
   •  Swap nibbles
   •  Saturated arithmetic
   •  Integer rotation
   •  Endian swap
•  Interrupt support
•  Memory models


USE FULL OPTIMIZATION

If optimization breaks your code then…
•  Are you using volatile correctly?
•  Are you misusing restrict?
•  Are you exploiting dark corners of the language?
•  Are you writing convoluted code?
•  Is your code warning-free?
•  Is your code Lint-free?
•  Are you using a cheap compiler?
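Missing `volatile` is the classic reason code stops working at full optimization: the optimizer may cache a flag in a register and never see the ISR change it. The following is a minimal sketch, assuming a hypothetical ADC-complete interrupt; `adc_isr`, `adc_result_ready` and `g_adc_done` are illustrative names, and on a real target the ISR would carry the compiler's interrupt qualifier.

```c
#include <stdbool.h>

/* Written by the ISR, read by the main loop: must be volatile so the
   compiler re-reads it from memory on every access. */
static volatile bool g_adc_done = false;

/* Body of the (hypothetical) ADC-complete interrupt handler. */
void adc_isr(void)
{
    g_adc_done = true;
}

/* Called from the main loop. Returns true once per completed conversion.
   Note: test-and-clear is not atomic; a real design must make sure the ISR
   cannot fire again between the test and the clear, or tolerate it. */
bool adc_result_ready(void)
{
    if (g_adc_done) {
        g_adc_done = false;
        return true;
    }
    return false;
}
```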

SPEED OPTIMIZATION > SIZE OPTIMIZATION

Assuming you have enough memory:
•  Speed optimization gives faster code!
•  Speed optimization uses less stack space
•  Code size can be smaller


MEMORY MODELS

If the target CPU supports different memory spaces then:
•  Do you understand the access speed of the different memory spaces?
•  Correct placement of key variables is critical for high performance
•  Default placements are often awful!
•  Do you understand what is being cached and what isn’t?

INTEGER SIZES

•  Choice of the correct integer size can have a massive impact on performance.
•  Unsigned integers are normally faster than signed on low end processors.
•  Understand the C integer promotion rules!
•  Use the C99 data types to build high performance portable code
   •  uint16_t
   •  uint_least16_t
   •  uint_fast16_t
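The points above about promotion rules and the C99 types can be sketched as follows. The function names are hypothetical; the example only assumes a C99 compiler with `<stdint.h>`.

```c
#include <stdint.h>

/* Integer promotion: both operands are promoted to int before the add,
   so 200 + 100 yields 300 here, not a wrapped 8-bit value. */
int promoted_sum(uint8_t a, uint8_t b)
{
    return a + b;
}

/* The wrap only happens when the result is converted back to 8 bits. */
uint8_t wrapped_sum(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* uint_fast16_t asks for "at least 16 bits, whatever is fastest here":
   16 bits on a small CPU, typically 32 or 64 bits on a larger one. */
uint32_t sum_u8(const uint8_t *data, uint_fast16_t n)
{
    uint32_t total = 0u;
    for (uint_fast16_t i = 0u; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

Using `uint16_t` for the loop counter instead would force a 32-bit CPU to emulate 16-bit wraparound, which can cost extra instructions in every iteration.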

STRUCTURE ALIGNMENT

•  Not an issue on low end processors
•  Huge impact on high end processors
•  Make sure you know the optimal alignment rules
•  Order structure elements to maximize performance
•  Use pragmas to guarantee you get the alignment you want.
•  Be very careful with bitfields
   •  Inherently non-portable / ill-defined
   •  Bitfield widths > 1 can be very inefficient
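The effect of member ordering can be seen with `sizeof`. The two structures below are hypothetical and hold the same four fields; on a typical ABI with 4-byte alignment for `uint32_t`, the first occupies 12 bytes and the second 8, though the exact figures are implementation-defined.

```c
#include <stddef.h>
#include <stdint.h>

/* Members in "as they came to mind" order: padding is inserted after
   flag and after mode so that value and count are naturally aligned. */
struct sample_padded {
    uint8_t  flag;
    uint32_t value;
    uint8_t  mode;
    uint16_t count;
};

/* Same members ordered largest first: no internal padding is needed
   on common ABIs. */
struct sample_ordered {
    uint32_t value;
    uint16_t count;
    uint8_t  flag;
    uint8_t  mode;
};
```

Ordering from largest to smallest member is a simple default; the optimal layout for a given CPU may also depend on which members are accessed together.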

HIGH PERFORMANCE CODING TECHNIQUES

•  Integer division by a constant
•  Fixed point arithmetic
•  Lookup tables


INTEGER DIVISION BY A CONSTANT

Examples of uint16_t variable A being divided by:

3: (((uint32_t)A * (uint32_t)0xAAAB) >> 16) >> 1
10: (((uint32_t)A * (uint32_t)0xCCCD) >> 16) >> 3
60: (((uint32_t)A * (uint32_t)0x8889) >> 16) >> 5
100: (((((uint32_t)A * (uint32_t)0x47AF) >> 16) + A) >> 1) >> 6
π: (((((uint32_t)A * (uint32_t)0x45F3) >> 16) + A) >> 1) >> 1
√2: (((uint32_t)A * (uint32_t)0xB505) >> 16) >> 0

The above is optimal for 8/16-bit processors. If you have a 32-bit machine with a barrel shifter, then combine the shifts, e.g.
Divide by 10: ((uint32_t)A * (uint32_t)0xCCCD) >> (16 + 3)
See http://embeddedgurus.com/stack-overflow/2009/06/division-of-integers-by-constants/
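The integer divisors above can be wrapped in small functions and checked exhaustively against the `/` operator, since a uint16_t has only 65536 possible values. This is a minimal sketch using the constants from the slide; the function names are hypothetical.

```c
#include <stdint.h>

/* Multiply by a scaled reciprocal, then shift: A / 3, A / 10, A / 60. */
static inline uint16_t div3_u16(uint16_t a)
{
    return (uint16_t)((((uint32_t)a * (uint32_t)0xAAABu) >> 16) >> 1);
}

static inline uint16_t div10_u16(uint16_t a)
{
    return (uint16_t)((((uint32_t)a * (uint32_t)0xCCCDu) >> 16) >> 3);
}

static inline uint16_t div60_u16(uint16_t a)
{
    return (uint16_t)((((uint32_t)a * (uint32_t)0x8889u) >> 16) >> 5);
}

/* A / 100 needs a 17-bit multiplier, hence the extra "+ a" step. */
static inline uint16_t div100_u16(uint16_t a)
{
    return (uint16_t)((((((uint32_t)a * (uint32_t)0x47AFu) >> 16) + a) >> 1) >> 6);
}

/* Exhaustive check over every uint16_t value; returns the mismatch count. */
uint32_t div_const_mismatches(void)
{
    uint32_t bad = 0u;
    for (uint32_t i = 0u; i <= 0xFFFFu; ++i) {
        uint16_t a = (uint16_t)i;
        if (div3_u16(a)   != (uint16_t)(a / 3u))   { ++bad; }
        if (div10_u16(a)  != (uint16_t)(a / 10u))  { ++bad; }
        if (div60_u16(a)  != (uint16_t)(a / 60u))  { ++bad; }
        if (div100_u16(a) != (uint16_t)(a / 100u)) { ++bad; }
    }
    return bad;
}
```

An exhaustive test like `div_const_mismatches()` is cheap at 16 bits and is a good habit whenever a magic constant replaces a division. Many compilers already apply this transformation at high optimization levels, so it is worth checking the generated code before hand-coding it.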


FIXED POINT ARITHMETIC

•  Dramatically faster than floating point
•  Direct HW support built into many CPUs
•  Can be handled in standard C
•  Some compilers have intrinsic support for it
•  Usually re-entrant
•  Limited dynamic range
See: https://en.wikipedia.org/wiki/Q_%28number_format%29
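To show how fixed point can be handled in standard C, here is a minimal sketch of a Q15 multiply (1 sign bit, 15 fractional bits, range roughly -1.0 to +1.0). The type and function names are hypothetical, and the choice of Q15 with round-to-nearest and saturation is an assumption for the example.

```c
#include <stdint.h>

typedef int16_t q15_t;   /* value = raw / 32768 */

/* Multiply two Q15 numbers with rounding and saturation. */
static inline q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;  /* Q30 product */

    p += (int32_t)1 << 14;                /* round to nearest */
    p >>= 15;                             /* back to Q15; right shift of a
                                             negative value is implementation-
                                             defined, arithmetic on most
                                             compilers */
    if (p > INT16_MAX) {                  /* only (-1.0 * -1.0) can overflow */
        p = INT16_MAX;
    }
    return (q15_t)p;
}
```

For example, 0.5 is 16384 in Q15, and `q15_mul(16384, 16384)` gives 8192, which is 0.25.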


LOOKUP TABLES

•  Normally the fastest way of ‘computing’ something
•  Don’t be afraid of massive lookup tables
•  When combined with range reduction techniques can accommodate a huge dynamic range
•  You still need to perform range checking and illegal condition trapping
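Two of the points above can be sketched in a few lines: range reduction (a 16-entry table serving all 256 byte values) and range checking before indexing. The examples and names are hypothetical; the month table assumes a non-leap year.

```c
#include <stdint.h>

/* Bit-reversed value of each 4-bit nibble. */
static const uint8_t k_rev_nibble[16] = {
    0x0u, 0x8u, 0x4u, 0xCu, 0x2u, 0xAu, 0x6u, 0xEu,
    0x1u, 0x9u, 0x5u, 0xDu, 0x3u, 0xBu, 0x7u, 0xFu
};

/* Range reduction: reverse a byte with two lookups in the 16-entry table
   instead of one lookup in a 256-entry table. Both indices are <= 15. */
static inline uint8_t reverse_bits_u8(uint8_t x)
{
    return (uint8_t)((k_rev_nibble[x & 0x0Fu] << 4) | k_rev_nibble[x >> 4]);
}

static const uint8_t k_days_in_month[12] = {
    31u, 28u, 31u, 30u, 31u, 30u, 31u, 31u, 30u, 31u, 30u, 31u
};

/* Range checking: trap illegal input instead of reading past the table.
   month is 1..12; returns 0 for anything else. */
static inline uint8_t days_in_month(uint8_t month)
{
    return ((month >= 1u) && (month <= 12u)) ? k_days_in_month[month - 1u] : 0u;
}
```

Declaring the tables `static const` lets the linker place them in read-only memory, which on many microcontrollers means flash rather than scarce RAM.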


LOOKUP TABLE SYNTAX


OPTIMAL C CONSTRUCTS - RESTRICT

•  ‘restrict’ added as a keyword in C99
•  Allows compiler to ignore potential aliasing issues with pointers
•  Can have a huge impact on speed
•  Not without its risks!
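A minimal sketch of `restrict` in use follows; the function name and signature are hypothetical. The qualifier is a promise from the caller that `dst` and `src` never overlap, which frees the compiler to keep values in registers, reorder loads and stores, or vectorize the loop.

```c
#include <stddef.h>
#include <stdint.h>

/* dst[i] += gain * src[i] for n elements.
   The caller guarantees that dst and src do not overlap; passing
   overlapping buffers is undefined behavior, which is the risk. */
void scale_accumulate(int32_t *restrict dst,
                      const int32_t *restrict src,
                      int32_t gain,
                      size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        dst[i] += gain * src[i];
    }
}
```

Without `restrict`, the compiler must assume that each store to `dst[i]` might change a later `src` element and reload accordingly.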


OPTIMAL C CONSTRUCTS - CONST

Makes code:
•  More robust
•  More maintainable
•  Potentially faster
•  Almost no downside to use


OPTIMAL C CONSTRUCTS - STATIC+INLINE

Declaring local functions as static inline makes code:
•  More robust
•  More maintainable
•  Potentially a lot faster
•  Almost no downside to use
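A minimal sketch combining `static inline` with `const` follows. The helper and its limits are hypothetical; the point is that a small file-local helper can be folded into its caller, removing the call and parameter-passing overhead.

```c
#include <stddef.h>
#include <stdint.h>

/* File-local helper: static limits its visibility to this translation
   unit, inline invites the compiler to expand it at the call site. */
static inline uint16_t clamp_u16(const uint16_t value,
                                 const uint16_t lo,
                                 const uint16_t hi)
{
    return (value < lo) ? lo : ((value > hi) ? hi : value);
}

/* const on the pointer target documents (and enforces) read-only access. */
uint32_t sum_clamped(const uint16_t *const samples, const size_t count)
{
    uint32_t total = 0u;
    for (size_t i = 0; i < count; ++i) {
        total += clamp_u16(samples[i], 100u, 900u);
    }
    return total;
}
```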


OPTIMAL C CONSTRUCTS – *PTR;++PTR

Avoid post increment and decrement:

y = *ptr++;  →  y = *ptr; ++ptr;

See: https://www.iar.com/Support/resources/articles/writing-optimizer-friendly-code/
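Applied to a simple copy loop, the advice looks like the sketch below. The function name is hypothetical. Both forms produce the same result; whether the split form generates better code depends on the compiler and target, so treat it as something to measure.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy n bytes, one operation per statement, pre-increment only. */
void copy_u8(uint8_t *dst, const uint8_t *src, size_t n)
{
    while (n != 0u) {
        *dst = *src;   /* instead of: *dst++ = *src++; */
        ++dst;
        ++src;
        --n;
    }
}
```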


PARAMETER PASSING

Parameter passing is expensive:
•  Use lots of small functions rather than a large function that switches on the parameter list.
•  Order of parameter passing can be important on small processors
   •  Some parameters passed in registers
   •  Others passed in memory locations
•  Don’t use globals to avoid passing parameters!
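The first point can be sketched with a hypothetical LED driver. Instead of one `led_control(command)` function that switches on its argument, each action gets its own small function with no parameters to pass and no switch to evaluate. `g_led_port` is a plain variable standing in for a memory-mapped port register.

```c
#include <stdint.h>

/* Stand-in for a memory-mapped output port (hypothetical). */
static volatile uint8_t g_led_port = 0u;

#define LED_MASK 0x01u

/* One small function per action: nothing to pass, nothing to decode. */
static inline void led_on(void)
{
    g_led_port |= LED_MASK;
}

static inline void led_off(void)
{
    g_led_port &= (uint8_t)~LED_MASK;
}

static inline void led_toggle(void)
{
    g_led_port ^= LED_MASK;
}
```

Each of these typically compiles to a single read-modify-write, or to one bit-set or bit-clear instruction on CPUs that have them.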


BE WARY OF FORMATTED OUTPUT

•  Uses variable length argument lists…
•  Massive stack usage is normal
•  Long and variable run time
•  It’s easy to link in libraries that you don’t need
   •  Floating point
   •  Long long
•  Sometimes not re-entrant
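When all that is needed is a decimal number, a dedicated conversion avoids the printf family altogether. The following is a minimal sketch; the function name and the 6-byte buffer contract are assumptions for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert value to a NUL-terminated decimal string.
   buf must hold at least 6 bytes (5 digits + terminator).
   Returns the number of digits written. Re-entrant: no static state. */
size_t u16_to_dec(uint16_t value, char buf[6])
{
    char   tmp[5];
    size_t n = 0;

    /* Produce digits least significant first. */
    do {
        tmp[n] = (char)('0' + (value % 10u));
        ++n;
        value /= 10u;
    } while (value != 0u);

    /* Reverse into the caller's buffer. */
    for (size_t i = 0; i < n; ++i) {
        buf[i] = tmp[n - 1u - i];
    }
    buf[n] = '\0';
    return n;
}
```

Stack usage is a handful of bytes and the run time is bounded by five iterations, in contrast to the variable cost of a general formatter.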


INTERRUPTS

High frequency interrupts consume vast amounts of CPU bandwidth.
•  Intrinsic interrupt overhead
•  Cache flush
•  Pipeline stall
•  Register stacking


REGISTER STACKING FOR TWO LINE ISR

160 static __interrupt void timer0_CompareMatchAIsr(void)


\ timer0_CompareMatchAIsr:
161 {
\ 00000000 938A ST -Y, R24
\ 00000002 93FA ST -Y, R31
\ 00000004 93EA ST -Y, R30
\ 00000006 923A ST -Y, R3
\ 00000008 922A ST -Y, R2
\ 0000000A 921A ST -Y, R1
\ 0000000C 920A ST -Y, R0
\ 0000000E 937A ST -Y, R23
\ 00000010 936A ST -Y, R22
\ 00000012 935A ST -Y, R21
\ 00000014 934A ST -Y, R20
\ 00000016 933A ST -Y, R19
\ 00000018 932A ST -Y, R18
\ 0000001A 931A ST -Y, R17
\ 0000001C 930A ST -Y, R16
\ 0000001E B78F IN R24, 0x3F
162 TCCR0B = 0; /* Stop the timer */
\ 00000020 E000 LDI R16, 0
\ 00000022 BF03 OUT 0x33, R16
163 fifo_AddEvent(Event); /* Post the event */
\ 00000024 9100.... LDS R16, Event
\ 00000028 .... RCALL fifo_AddEvent
164 }
\ 0000002A BF8F OUT 0x3F, R24
\ 0000002C 9109 LD R16, Y+
\ 0000002E 9119 LD R17, Y+
\ 00000030 9129 LD R18, Y+
\ 00000032 9139 LD R19, Y+
\ 00000034 9149 LD R20, Y+
\ 00000036 9159 LD R21, Y+
\ 00000038 9169 LD R22, Y+
\ 0000003A 9179 LD R23, Y+
\ 0000003C 9009 LD R0, Y+
\ 0000003E 9019 LD R1, Y+
\ 00000040 9029 LD R2, Y+
\ 00000042 9039 LD R3, Y+
\ 00000044 91E9 LD R30, Y+
\ 00000046 91F9 LD R31, Y+
\ 00000048 9189 LD R24, Y+
\ 0000004A 9518 RETI

15 registers stacked and unstacked!



OCCASIONAL FUNCTION CALLS FROM AN ISR


TIPS FOR ISRS

•  Minimize frequency
•  Don’t make function calls, including library calls
•  Consider using software interrupts
•  No floating point
•  Maximum optimization
•  Size optimization may be better than speed!
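The register-stacking listing earlier shows what a call to `fifo_AddEvent` from an ISR can cost. One way to follow the "no function calls" tip is to make the FIFO insert a `static inline` helper in the same file so that the compiler can expand it inside the ISR. The sketch below is hypothetical throughout (names, FIFO size, event code) and models the ISR as an ordinary function so that it can run on a host.

```c
#include <stdbool.h>
#include <stdint.h>

#define EVT_FIFO_SIZE 8u          /* power of two; holds SIZE - 1 events */
#define EVT_TIMER0    0x10u       /* hypothetical event code */

static volatile uint8_t s_fifo[EVT_FIFO_SIZE];
static volatile uint8_t s_head = 0u;   /* written by the ISR only */
static volatile uint8_t s_tail = 0u;   /* written by the main loop only */

/* Inline so the ISR contains no call instruction. Returns false if full. */
static inline bool evt_fifo_put(uint8_t evt)
{
    uint8_t next = (uint8_t)((s_head + 1u) & (EVT_FIFO_SIZE - 1u));
    if (next == s_tail) {
        return false;
    }
    s_fifo[s_head] = evt;
    s_head = next;
    return true;
}

/* Body of the timer ISR: post the event and leave. */
void timer0_isr_body(void)
{
    (void)evt_fifo_put(EVT_TIMER0);
}

/* Main-loop side: fetch the next event, if any. */
bool evt_fifo_get(uint8_t *evt)
{
    if (s_tail == s_head) {
        return false;
    }
    *evt = s_fifo[s_tail];
    s_tail = (uint8_t)((s_tail + 1u) & (EVT_FIFO_SIZE - 1u));
    return true;
}
```

With a single producer (the ISR) and a single consumer (the main loop), each index has exactly one writer, which keeps the design simple on CPUs where 8-bit accesses are atomic.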


SIMPLIFY – THEN ADD LIGHTNESS

Allow the compiler to do its thing by:
•  Writing simple code
•  Striving for one operation per line
•  Constantly refactoring

With apologies to Colin Chapman!



KEY TAKEAWAYS

•  Use the hardware to its limits
•  Choose objectively great algorithms
•  Use the best tools you can
•  Configure the tools to perform their best
•  Use full speed optimization
•  Use the correct memory spaces and integer types
•  Use techniques such as fixed point & lookup tables
•  Use inline, const, restrict, static and volatile
•  Pay special attention to ISRs
•  Write clean, simple code

QUESTION & ANSWER


CONCLUSION
