Modern Benchmarking

Don Stewart | Galois Tech Talk | Feb 2010

2008 Galois, Inc. All rights reserved.

Measuring Performance
• Easy, right?
– Time the computation
– Record the time
– Compare it against other programs

• Too many variables
– Noise on the machine...
– (See Matt S. talk on eliminating noise)

• Need a statistically sound approach
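To make the "easy" approach concrete, here is a minimal sketch of timing a computation by hand, using only base. The helper timeOnce and the workload are illustrative assumptions, not code from the talk. Repeating the same measurement usually prints several different times for the same work, which is the noise problem described above.

```haskell
import Control.Exception (evaluate)
import Data.List (foldl')
import System.CPUTime (getCPUTime)

-- | Run an action once, force its result to weak head normal form,
-- and return the result with the elapsed CPU time in seconds.
-- (Hypothetical helper for illustration.)
timeOnce :: IO a -> IO (a, Double)
timeOnce act = do
  start <- getCPUTime                 -- CPU time in picoseconds
  x     <- act >>= evaluate
  end   <- getCPUTime
  pure (x, fromIntegral (end - start) / 1e12)

-- | Illustrative workload: sum the integers from 1 to n.
workload :: Int -> Int
workload n = foldl' (+) 0 [1 .. n]

main :: IO ()
main = do
  -- Time nearly identical work five times; the offset i only keeps
  -- the compiler from sharing one result across iterations.
  times <- mapM (\i -> snd <$> timeOnce (pure (workload (1000000 + i)))) [1 .. 5]
  mapM_ print times
```

A single number from one such run says little by itself; the spread across runs is what a statistically sound tool has to model.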

Making it easy
• A good benchmarking lib should
– Make it easy to code up benchmarks
– Plot / print it in a variety of formats
– Tell you when you're doing it wrong

Compared to most other benchmarking frameworks (for any programming language, not just Haskell), criterion focuses on being easy to use, informative, and robust.

Criterion
• http://is.gd/8ZqaO
• Install the Haskell Platform, $ cabal install criterion
• Uses some libraries from Hackage:
– statistics
– uvector + uvector-algorithms
– Chart + gtk2hs

Output: Timing 100 runs

Output: Density Function

Write code...

Stream fusion code from the vector package:
$ cabal install vector

import qualified Data.Vector as U
main = print . U.sum $ U.enumFromTo 1 (1000000000 :: Int)

Run code...

$ ghc -Odph --make enum.hs
[1 of 1] Compiling Main ( enum.hs, enum.o )
Linking enum ...
$ time ./enum
500000000500000000
./enum 1.02s user 0.00s system 96% cpu 1.052 total

Now, using Criterion
General structure
• import Criterion.Main
• main = defaultMain $ …
• Compile
• Run and collect density functions

Benchmarking simple functions

Example 1.
$ ./ex1 -k win -u ex1.csv
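A program in the spirit of this example might look like the following minimal sketch. It assumes the whnf-based API exported by Criterion.Main in recent criterion releases, which may differ from the version shown in the talk. The function fib and the benchmark labels are illustrative, not taken from the slides.

```haskell
import Criterion.Main (bench, defaultMain, whnf)

-- | Hypothetical function under test: the n-th Fibonacci number.
fib :: Int -> Integer
fib n = go n 0 1
  where
    go :: Int -> Integer -> Integer -> Integer
    go 0 a _ = a
    go k a b = go (k - 1) b (a + b)

main :: IO ()
main = defaultMain
  [ bench "fib 10" (whnf fib 10)  -- apply fib to 10, force the result to WHNF
  , bench "fib 30" (whnf fib 30)
  ]
```

Passing the function and its argument separately lets the library re-apply the function on every iteration instead of timing an already evaluated value.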

Benchmarking groups

Example 2.
$ ./ex2 -k png:400x200 -u ex2.csv
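Grouping can be sketched as follows, again assuming the API of recent criterion releases. The group and benchmark names are chosen so that the reported labels take the form fold/100k, matching the label style in the sample output later in the talk. sumTo is a hypothetical workload.

```haskell
import Criterion.Main (bench, bgroup, defaultMain, whnf)
import Data.List (foldl')

-- | Hypothetical workload: strict left fold over 1..n.
sumTo :: Int -> Int
sumTo n = foldl' (+) 0 [1 .. n]

main :: IO ()
main = defaultMain
  [ bgroup "fold"                          -- group name, used as a label prefix
      [ bench "10k"  (whnf sumTo 10000)
      , bench "100k" (whnf sumTo 100000)
      ]
  ]
```

Groups can be nested, which keeps related measurements together when comparing input sizes or implementations.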

Timings vs Densities
• Multiple timings are somewhat useful
• A probability distribution for expected times is more useful
• Histograms or kernel density estimates
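The kernel density estimate mentioned above can be sketched in a few lines of base-only Haskell. This is the textbook Gaussian-kernel estimator, f(x) = 1/(n h) Σ K((x − xᵢ)/h), with a hand-picked bandwidth h. How criterion itself chooses the kernel and bandwidth is not covered here, and the timings are made up for illustration.

```haskell
-- | Gaussian kernel density estimate.
-- 'h' is the bandwidth (hand-picked here), 'xs' the observed samples,
-- and the result is the estimated density at point 'x'.
kde :: Double -> [Double] -> Double -> Double
kde h xs x = sum [gauss ((x - xi) / h) | xi <- xs] / (n * h)
  where
    n       = fromIntegral (length xs)
    gauss u = exp (-0.5 * u * u) / sqrt (2 * pi)

main :: IO ()
main = do
  -- Made-up timings in microseconds, with one slow run at 99.0.
  let timings = [93.8, 94.0, 94.1, 94.2, 94.3, 94.5, 99.0]
  -- Print the estimated density on a grid of points; the peak marks
  -- the most common running time, the small bump the slow run.
  mapM_ (\x -> putStrLn (show x ++ "\t" ++ show (kde 0.5 timings x)))
        [93.0, 93.5 .. 100.0]
```

Unlike a histogram, the estimate does not depend on where bin edges fall, only on the bandwidth.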

Estimation of most common times

Lots of other useful info
warming up
estimating clock resolution...
mean is 11.95260 us (80001 iterations)
found 3366 outliers among 79999 samples (4.2%)
  1052 (1.3%) high mild
  2186 (2.7%) high severe
estimating cost of a clock call...
mean is 1.289043 us (77 iterations)
found 5 outliers among 77 samples (6.5%)
  3 (3.9%) high mild
  2 (2.6%) high severe
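The "mild" and "severe" labels in this output follow the vocabulary of the conventional boxplot rule, which can be sketched as below: a sample is a mild outlier if it lies more than 1.5 interquartile ranges beyond the quartiles, and severe if it lies more than 3. That criterion uses exactly these fences, and this quantile method, is an assumption here; the type and function names are illustrative.

```haskell
import Data.List (sort)

-- | Counts of outliers in each class (illustrative type).
data Outliers = Outliers
  { lowSevere, lowMild, highMild, highSevere :: Int }
  deriving (Eq, Show)

-- | Linear-interpolation quantile of an already sorted, non-empty list.
quantile :: Double -> [Double] -> Double
quantile p sorted = lo + frac * (hi - lo)
  where
    pos  = p * fromIntegral (length sorted - 1)
    i    = floor pos :: Int
    frac = pos - fromIntegral i
    lo   = sorted !! i
    hi   = sorted !! min (i + 1) (length sorted - 1)

-- | Classify samples using fences at 1.5 and 3 interquartile ranges.
classify :: [Double] -> Outliers
classify xs = Outliers
  { lowSevere  = count (< q1 - 3 * iqr)
  , lowMild    = count (\x -> x >= q1 - 3 * iqr && x < q1 - 1.5 * iqr)
  , highMild   = count (\x -> x > q3 + 1.5 * iqr && x <= q3 + 3 * iqr)
  , highSevere = count (> q3 + 3 * iqr)
  }
  where
    sorted  = sort xs
    q1      = quantile 0.25 sorted
    q3      = quantile 0.75 sorted
    iqr     = q3 - q1
    count p = length (filter p xs)

main :: IO ()
main = print (classify ([1 .. 8] ++ [20, 100]))  -- made-up samples
```

For timing data the high side dominates, since interruptions can only make a run slower.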

Lots of other useful info
benchmarking fold/100k
collecting 100 samples, 125 iterations each, in estimated 1.202303 s
bootstrapping with 100000 resamples
mean: 94.20363 us, lb 93.86267 us, ub 94.71123 us, ci 0.950
std dev: 2.104295 us, lb 1.615804 us, ub 3.312965 us, ci 0.950
found 3 outliers among 100 samples (3.0%)
  2 (2.0%) high mild
  1 (1.0%) high severe
variance introduced by outliers: 0.997%
variance is unaffected by outliers
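The lb and ub figures are bounds of a confidence interval obtained by bootstrapping: resampling the measured times with replacement and recomputing the statistic each time. The sketch below uses the simple percentile bootstrap as a stand-in; the slides do not say which bootstrap variant criterion applies, so this should not be read as its implementation. To stay within base it uses a small linear congruential generator, which is adequate for a demonstration only, and the timings are made up.

```haskell
import Data.List (sort, unfoldr)

-- | One step of a small linear congruential generator. It keeps the
-- example free of dependencies; it is not a high-quality generator.
nextSeed :: Int -> Int
nextSeed s = (s * 1103515245 + 12345) `mod` 2147483648

-- | Infinite stream of pseudo-random indices in [0, n), for small n.
indices :: Int -> Int -> [Int]
indices n seed = [ (s `div` 65536) `mod` n | s <- drop 1 (iterate nextSeed seed) ]

mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- | Percentile-bootstrap confidence interval for the mean.
-- Arguments: seed, number of resamples, confidence level, samples.
bootstrapMean :: Int -> Int -> Double -> [Double] -> (Double, Double)
bootstrapMean seed resamples ci xs = (at alpha, at (1 - alpha))
  where
    n      = length xs
    alpha  = (1 - ci) / 2
    -- Each group of n indices defines one resample drawn with replacement.
    groups = take resamples (unfoldr (Just . splitAt n) (indices n seed))
    means  = sort [ mean [ xs !! i | i <- g ] | g <- groups ]
    at q   = means !! min (resamples - 1) (floor (q * fromIntegral resamples))

main :: IO ()
main = do
  let timings  = [93.8, 94.0, 94.1, 94.2, 94.3, 94.5, 99.0]  -- made-up data
      (lb, ub) = bootstrapMean 42 2000 0.95 timings
  putStrLn ("mean: " ++ show (mean timings) ++ ", lb " ++ show lb ++ ", ub " ++ show ub)
```

The appeal of bootstrapping for benchmarks is that it makes no assumption that the timings are normally distributed.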

Benchmarking other types
• Different evaluation depths:
– whnf
– nf
– nfIO
– whnfIO

• Benchmarkable class.
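The difference between the evaluation depths matters for lazy results: whnf stops at the outermost constructor, while nf evaluates the whole structure. A minimal sketch, assuming the functions of these names exported by Criterion.Main in recent releases; bump and bumpIO are hypothetical workloads.

```haskell
import Criterion.Main (bench, bgroup, defaultMain, nf, nfIO, whnf, whnfIO)

-- | Illustrative pure workload: increment every element.
bump :: [Int] -> [Int]
bump = map (+ 1)

-- | Illustrative IO workload: the same, built inside IO.
bumpIO :: [Int] -> IO [Int]
bumpIO = mapM (pure . (+ 1))

main :: IO ()
main = defaultMain
  [ bgroup "pure"
      [ bench "whnf" (whnf bump xs)      -- evaluates only to the first (:) cell
      , bench "nf"   (nf   bump xs)      -- evaluates the whole result list
      ]
  , bgroup "io"
      [ bench "whnfIO" (whnfIO (bumpIO xs))  -- runs the action, forces the result to WHNF
      , bench "nfIO"   (nfIO   (bumpIO xs))  -- runs the action, forces the entire result
      ]
  ]
  where
    xs = [1 .. 10000] :: [Int]
```

With whnf the pure benchmark would report a tiny time regardless of list length, which is a typical way of measuring the wrong thing.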

Combining plots: progression
• http://is.gd/8Zr6E
• Collect sets of benchmarks into a group
• Compare against prior runs
• ./ex -n first
• ./ex -n second

-fllvm beats -fasm

An aside: the LLVM backend
• http://is.gd/8TOHc
• New backend to GHC
• Uses LLVM for code generation
• Adds asm optimization pass
• Good speedups for numeric code
