
Using GCC Auto-Vectorizer

Ira Rosen <ira.rosen@linaro.org>
Michael Hope <michael.hope@linaro.org>
bzr branch lp:~michaelh/+junk/using-the-vectorizer

Using GCC Vectorizer

Vectorization is enabled by the flag -ftree-vectorize and by default at -O3:


    gcc -O2 -ftree-vectorize myloop.c
or
    gcc -O3 myloop.c

To enable NEON:

    -mfpu=neon -mfloat-abi=softfp   (or -mfloat-abi=hard)
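A minimal myloop.c for trying the commands above. The file name comes from the slide. The array size N and the function name add_arrays are illustrative assumptions. The loop is countable, has no control flow, and reads and writes contiguous elements of distinct global arrays, so it is the kind of loop the vectorizer is meant to handle. Whether it is actually vectorized still depends on the GCC version and the target flags.

```c
/* myloop.c: hypothetical test file; N and add_arrays are illustrative. */
#define N 256

/* Distinct global arrays, so the compiler can see that they do not overlap. */
int a[N], b[N], c[N];

/* Countable innermost loop with independent, contiguous accesses. */
void add_arrays(void)
{
    for (int i = 0; i < N; i++)
        a[i] = b[i] + c[i];
}
```

For example, on an ARM target: gcc -O3 -mfpu=neon -mfloat-abi=softfp -c myloop.c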

Information on which loops got vectorized, which didn't, and why:

    -fdump-tree-vect or -fdump-tree-vect-details
        dumps information into myloop.c.##t.vect (## is the pass number)

    -ftree-vectorizer-verbose=[X]
        dumps to stderr

More information: http://gcc.gnu.org/projects/tree-ssa/vectorization.html

Other useful flags

-ffast-math - if operating on floats in a reduction computation (allows the vectorizer to change the order of the computation)
-funsafe-loop-optimizations - if using "unsigned int" loop counters (they can then be assumed not to overflow)
-ftree-loop-if-convert-stores - more aggressive if-conversion
--param min-vect-loop-bound=[X] - if you have loops with a short trip count
-fno-tree-vect-loop-version - if worried about code size (no runtime-checked vector and scalar versions of a loop)
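The -ffast-math entry applies to reductions like the one below. The function name sum_floats is an illustrative assumption. Vectorizing this loop means keeping several partial sums and combining them at the end. That changes the order of the floating-point additions, and possibly the rounding. GCC therefore typically keeps such a loop scalar unless reassociation is permitted, for example with -ffast-math.

```c
/* Float sum reduction (illustrative). Written in sequential order;
   a vectorized version adds several partial sums in parallel,
   which reorders the additions. */
float sum_floats(const float *a, int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += a[i];
    return sum;
}
```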

What's vectorizable

Innermost loops that are countable, contain no control flow, and have independent, contiguous data accesses:

    for (k = 0; k < m; k++)
      for (j = 0; j < m; j++)
        for (i = 0; i < n; i++)                   /* innermost, countable */
          a[k][j][i] = b[k][j][i] * c[k][j][i];   /* no control flow; independent,
                                                     contiguous data accesses */

Example of a loop that is not vectorizable:

    while (a[i] != 8) {      /* uncountable */
      if (a[i] != 0)         /* control flow */
        a[i] = a[i-1];       /* loop-carried dependence */
      b[i*stride] = 0;       /* access with unknown stride */
      i++;
    }

Special features

vectorization of outer loops
vectorization of straight-line code
if-conversion
multiple data types and type conversions
recognition of special idioms (e.g. dot-product, widening operations)
strided memory accesses
cost model
runtime aliasing and alignment tests
auto-detection of vector size
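The "special idioms" item can be illustrated with the loop below, which has the shape of a dot product: 16-bit inputs, a widening multiply, and accumulation into a 32-bit sum. The function name dot_s16 is an assumption. Whether a particular GCC version recognises the idiom on a particular target is not guaranteed.

```c
#include <stdint.h>

/* Dot product of 16-bit vectors with a 32-bit accumulator (illustrative).
   Each product is widened to 32 bits before it is added, which is the
   pattern the vectorizer can match as a dot-product/widening idiom. */
int32_t dot_s16(const int16_t *a, const int16_t *b, int n)
{
    int32_t sum = 0;
    for (int i = 0; i < n; i++)
        sum += (int32_t)a[i] * b[i];   /* widening multiply-accumulate */
    return sum;
}
```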

Examples: http://gcc.gnu.org/projects/tree-ssa/vectorization.html

GCC Versions

Current Linaro GCC is based on FSF GCC 4.6. Once FSF GCC 4.7 is released (in about six months), Linaro GCC will switch to GCC 4.7.

Some of the GCC 4.7 vectorizer-related features:
__builtin_assume_aligned alignment hints
vectorization of conditions with mixed types
vectorization of bool
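Here is a sketch of the __builtin_assume_aligned hint (GCC 4.7 and later). The builtin returns its pointer argument and lets the compiler assume that the result is aligned to the given number of bytes. The vectorizer can then skip runtime alignment checks or loop peeling. The function scale and the 16-byte alignment are illustrative assumptions. The caller must actually supply memory with that alignment.

```c
/* Multiply n floats in place by k (illustrative).
   Requires GCC 4.7+ for __builtin_assume_aligned. */
void scale(float *p, float k, int n)
{
    /* Promise to the compiler: p is 16-byte aligned. If it is not,
       the assumption is wrong and the generated code may misbehave. */
    float *q = __builtin_assume_aligned(p, 16);
    for (int i = 0; i < n; i++)
        q[i] *= k;
}
```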


Vectorizing for NEON

libswscale/rgb2rgb_template.c:

    static inline void rgb24tobgr16_c(const uint8_t *src, uint8_t *dst, int src_size)
    {
        const uint8_t *s = src;
        const uint8_t *end;
        uint16_t *d = (uint16_t *)dst;
        end = s + src_size;
        while (s < end) {
            const int b = *s++;    /* strided access */
            const int g = *s++;
            const int r = *s++;
            *d++ = (b>>3) | ((g&0xFC)<<3) | ((r&0xF8)<<8);   /* no over-promotion to s32 */
        }
    }

Generated NEON loop:

    .L12:
        mov      r1, r4
        add      r2, r2, #1
        vld3.8   {d16, d18, d20}, [r1]!    @ strided access
        cmp      r2, r6
        mov      r0, ip
        add      r4, r4, #48
        add      ip, ip, #32
        vld3.8   {d17, d19, d21}, [r1]
        vand     q12, q9, q14              @ AND and shift right
        vshr.u8  q11, q8, #3               @ performed on u8
        vand     q8, q10, q13
        vshll.u8 q9, d25, #3               @ widening shift
        vmovl.u8 q10, d22
        vshll.u8 q15, d24, #3
        vmovl.u8 q11, d23
        vorr     q10, q15, q10
        vorr     q11, q9, q11
        vshll.u8 q12, d16, #8
        vshll.u8 q9, d17, #8
        vorr     q8, q12, q10
        vorr     q11, q11, q9
        vst1.16  {q8}, [r0]!
        vst1.16  {q11}, [r0]
        bcc      .L12

scalar: 75000 runs take 216.583ms
vector: 75000 runs take 48.8586ms
speedup: 4.4337

Writing vectorizer-friendly code

Avoid aliasing problems: use __restrict__-qualified pointers

    void foo (int *__restrict__ pInput, int *__restrict__ pOutput)
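This minimal sketch shows why the qualifier matters. Without __restrict__, the compiler must allow for pOutput overlapping pInput. It then has to add a runtime overlap check or keep the loop scalar. With the qualifier, the programmer promises that the arrays do not overlap. The function name copy_doubled, the length parameter n, and the loop body are illustrative assumptions and are not part of the original foo.

```c
/* Illustrative: write 2 * pInput[i] to pOutput[i].
   __restrict__ (GCC's spelling of C99 'restrict') promises that the
   two arrays do not overlap during the call. */
void copy_doubled(const int *__restrict__ pInput,
                  int *__restrict__ pOutput, int n)
{
    for (int i = 0; i < n; i++)
        pOutput[i] = 2 * pInput[i];
}
```

Passing overlapping arrays to such a function breaks the promise and gives undefined behaviour.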

Don't unroll loops: loop vectorization is more powerful than SLP (straight-line code vectorization). Instead of

    for (i=0; i<n; i+=4) {
      sum += a[0];
      sum += a[1];
      sum += a[2];
      sum += a[3];
      a += 4;
    }

write

    for (i=0; i<n; i++)
      sum += a[i];
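The rolled loop is shown below as a complete function. The name sum_array is illustrative. Keeping the loop rolled lets the vectorizer choose the vector width itself (for example, four 32-bit ints per 128-bit NEON register) and handle any leftover iterations. The hand-unrolled form, by contrast, is only correct when n is a multiple of 4.

```c
/* Rolled integer sum (illustrative): one statement per iteration,
   no manual unrolling, no assumption that n is a multiple of 4. */
int sum_array(const int *a, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i];
    return sum;
}
```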

Writing vectorizer-friendly code (cont.)

Use countable loops, with no side effects

o No function calls in the loop (distribute them into a separate loop; this is valid as long as foo() does not touch b or c). Instead of

    for (i=0; i<n; i++) {
      if (a[i] == 0) foo();
      b[i] = c[i];
    }

  write

    for (i=0; i<n; i++)
      if (a[i] == 0) foo();
    for (i=0; i<n; i++)
      b[i] = c[i];

o No 'break'/'continue'. Instead of

    for (i=0; i<n; i++) {
      if (a[i] == 8) break;
      b[i] = c[i];
    }

  write

    m = n;
    for (i=0; i<n; i++)
      if (a[i] == 8) { m = i; break; }
    for (i=0; i<m; i++)
      b[i] = c[i];
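Both rewrites are given below as compilable C. foo is replaced by a hypothetical counter function so that the code runs. The names process and copy_until_8 are illustrative. Distributing the first loop is only valid because this foo() does not touch b or c. In the second function, m starts at n so that the whole range is copied when no element equals 8.

```c
/* Hypothetical stand-in for foo(): counts how often it is called. */
static int foo_calls;
static void foo(void) { foo_calls++; }

/* Loop distribution: the call stays in its own (scalar) loop,
   which leaves a call-free copy loop for the vectorizer. */
void process(const int *a, int *b, const int *c, int n)
{
    for (int i = 0; i < n; i++)
        if (a[i] == 0)
            foo();
    for (int i = 0; i < n; i++)
        b[i] = c[i];
}

/* 'break' removed from the copy loop: first find the stopping point,
   then run a countable loop up to it. */
void copy_until_8(const int *a, int *b, const int *c, int n)
{
    int m = n;                 /* copy everything if no 8 is found */
    for (int i = 0; i < n; i++)
        if (a[i] == 8) {
            m = i;
            break;
        }
    for (int i = 0; i < m; i++)
        b[i] = c[i];
}
```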

Writing vectorizer-friendly code (cont.)

Keep the memory access pattern simple.

Don't use indirect accesses, e.g.:

    for (i=0; i<n; i++)
      a[b[i]] = x;

Don't use an unknown stride, e.g.:

    for (i=0; i<n; i++)
      a[i*stride] = x;
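An unknown stride is a problem, but a small constant stride falls under the "strided memory accesses" the vectorizer supports (the slides use vld3 for a stride-3 case). The sketch below splits interleaved pairs using the constant stride 2, a layout that NEON's vld2 can load. The names split_pairs, even and odd are illustrative. Whether GCC vectorizes the loop depends on the version and the cost model.

```c
/* Split interleaved pairs (x0, y0, x1, y1, ...) into two arrays
   (illustrative). The stride is the compile-time constant 2, and
   restrict rules out overlap between the three arrays. */
void split_pairs(const int *restrict in, int *restrict even,
                 int *restrict odd, int n)
{
    for (int i = 0; i < n; i++) {
        even[i] = in[2 * i];
        odd[i]  = in[2 * i + 1];
    }
}
```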

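Use "int" iterators rather than "unsigned int" iterators.

The C standard leaves signed overflow undefined, so the compiler may assume that an int iterator never overflows, while an unsigned int is defined to wrap around. This helps the compiler to determine the trip count.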
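The following small example illustrates the trip-count argument. The name fill_inclusive is illustrative. With an int counter, i <= n cannot legitimately wrap, so the compiler may take the trip count to be n + 1. If i and n were unsigned int, n == UINT_MAX would make i <= n always true and the loop infinite, and the compiler would have to account for that case.

```c
/* Store i into a[0..n] (inclusive bound; illustrative). Because signed
   overflow is undefined, the compiler may assume i never wraps and
   compute the trip count as n + 1 (for n >= 0). */
void fill_inclusive(int *a, int n)
{
    for (int i = 0; i <= n; i++)
        a[i] = i;
}
```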

Some of our recent contributions


Support of vldN/vstN
NEON-specific patterns, e.g. widening shift
SLP (straight-line code vectorization) improvements
RTL improvements:
    reducing the number of moves and the amount of spilling (both for auto- and hand-vectorised code)
    improving modulo scheduling of NEON code

People

Linaro Toolchain WG

Ira Rosen (IRC: irar)
ira.rosen@linaro.org
auto-vectorizer

Richard Sandiford (IRC: rsandifo)
richard.sandiford@linaro.org
NEON back-end/RTL optimizations

Helping us

Send us examples of code that is important for you to vectorize.

Output example

ex.c:

     1  #define N 128
     2  int a[N], b[N];
     3  void foo (void)
     4  {
     5    int i;
     6
     7    for (i = 0; i < N; i++)
     8      a[i] = i;
     9
    10    for (i = 0; i < N; i+=5)
    11      b[i] = i;
    12  }

What got vectorized:

    gcc -c -O3 -ftree-vectorizer-verbose=1 ex.c
    ex.c:7: note: LOOP VECTORIZED.
    ex.c:3: note: vectorized 1 loops in function.

What got vectorized and what didn't:

    gcc -c -O3 -ftree-vectorizer-verbose=2 ex.c
    ex.c:10: note: not vectorized: complicated access pattern.
    ex.c:10: note: not vectorized: complicated access pattern.
    ex.c:7: note: LOOP VECTORIZED.
    ex.c:3: note: vectorized 1 loops in function.

All the details:

    gcc -c -O3 -ftree-vectorizer-verbose=9 ex.c
or
    gcc -c -O3 -fdump-tree-vect-details ex.c
