# Rethinking the Benchmark module

There's more to benchmarking than slapping your code inside

    Benchmark.bm{|bm| bm.report("foo"){ ... } }

The first thing we instinctively do is wrap the code in TIMES.times with a large TIMES constant. Surely the "errors" cancel each other out for large enough values of TIMES, right? But how large is large enough?
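
To make the discussion concrete, the naive pattern looks more or less like this (a minimal sketch; `TIMES` and the `foo` method are placeholders, not code from this post):

    require 'benchmark'

    # a trivial placeholder for the code being measured
    def foo
      (1..100).inject(0){|s, i| s + i }
    end

    TIMES = 100_000  # how large is large enough?

    Benchmark.bm(10) do |bm|
      bm.report("foo"){ TIMES.times { foo } }
    end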

I wrote a simple AdaptativeBenchmark which works more or less like the Benchmark in stdlib, but also decides how many times to repeat execution in order to approach the average time with the desired precision for a given confidence level:

    AdaptativeBenchmark.bm do |bm|
      # by default, approach the average with a 10% confidence interval
      # for a 95% confidence level
      bm.report("bar") { bar }
      bm.report("foo", :precision => 0.05, :confidence => 0.9) { foo }
    end

The AdaptativeBenchmark first estimates the sample variance, population variance and population average by running the given block min_runs times (10 by default), and then uses those initial estimates to compute how many iterations are needed for the desired confidence interval and level. The initial number of runs should probably be adjusted the same way the extra ones are, but I'm not sure it's worth the effort.

    class AdaptativeBenchmark
      def initialize(len, runs)
        @field_len = len
        @min_runs  = runs
      end

      def self.bm(len = 10, min_runs = 10)
        puts "#{" " * len} avg     \tstddev \truns  \tinterval\tconfidence"
        yield new(len, min_runs)
      end

      REPORT_OPT = {:precision => 0.1, :confidence => 0.95}

      def report(name, options = {})
        old_sync, $stdout.sync = $stdout.sync, true
        opt = REPORT_OPT.clone.update(options)
        sample_avg = sample_variance = 0
        tms_to_total = lambda{|tms| tms.utime + tms.stime }
        # Time one run of the block and update the running average and the
        # (biased) sample variance with the recursive relations given below.
        take_sample = lambda do |i|
          GC.start   # lower the odds of a GC run in the middle of the block
          t0 = tms_to_total[Process.times]
          yield
          exec_time = tms_to_total[Process.times] - t0
          new_sample_avg = sample_avg + (exec_time - sample_avg) / i
          if i == 1
            sample_avg = new_sample_avg
            next
          end
          sample_variance = (1 - 1.0/i) * sample_variance +
                            (i - 1) * (new_sample_avg - sample_avg) ** 2
          sample_avg = new_sample_avg
        end
        # initial estimates from @min_runs samples
        (1..@min_runs).each{|i| take_sample[i] }
        # decide how many extra samples are needed for the desired confidence
        # interval and level, then take them
        a = opt[:precision] * sample_avg
        num_runs = (sample_variance / (a ** 2 * (1 - opt[:confidence]))).ceil
        total_runs = @min_runs + num_runs
        (@min_runs+1..total_runs).each{|i| take_sample[i] }
        population_variance = 1.0 * total_runs / (total_runs - 1) * sample_variance
        puts "%-#{@field_len}s %8.6f\t%8.6f\t%-6d\t%8.6f\t%3d%%" %
             [name, sample_avg, Math.sqrt(population_variance), total_runs,
              a, opt[:confidence] * 100]
      ensure
        $stdout.sync = old_sync
      end
    end

### The maths

In order to avoid storing all the sample values, the sample average is computed with the recursive relation

$$\bar{x}_i = \bar{x}_{i-1} + \frac{x_i - \bar{x}_{i-1}}{i}$$

and the (biased) sample variance with

$$s_i^2 = \left(1 - \frac{1}{i}\right) s_{i-1}^2 + (i - 1)\,(\bar{x}_i - \bar{x}_{i-1})^2$$

the population variance being estimated as $\sigma^2 \approx \frac{n}{n-1}\, s_n^2$. The number of iterations is taken as

$$n = \left\lceil \frac{s_n^2}{a^2\,(1 - P)} \right\rceil$$

where $a$ is the desired half-width of the confidence interval (the :precision parameter times the estimated average) and $P$ the confidence level (the :confidence parameter). The last formula is just Chebyshev's inequality solved for $n$: since $\Pr(|\bar{x}_n - \mu| \geq a) \leq \sigma^2 / (n a^2)$, keeping that probability below $1 - P$ requires $n \geq \sigma^2 / (a^2 (1 - P))$.
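
The recursive relations are easy to check numerically; here's a quick sanity check (an illustrative snippet, not part of AdaptativeBenchmark) against the direct definitions:

    # compare the recursive estimates against the direct definitions
    samples = Array.new(1000){ rand }

    avg = variance = 0.0
    samples.each_with_index do |x, idx|
      i = idx + 1
      new_avg = avg + (x - avg) / i
      variance = (1 - 1.0/i) * variance +
                 (i - 1) * (new_avg - avg) ** 2 unless i == 1
      avg = new_avg
    end

    n = samples.size
    direct_avg = samples.inject(0.0){|s, x| s + x } / n
    direct_var = samples.inject(0.0){|s, x| s + (x - direct_avg) ** 2 } / n

    # both pairs agree up to floating-point error
    puts "avg: %f vs %f" % [avg, direct_avg]
    puts "var: %f vs %f" % [variance, direct_var]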

### Limitations

As noted above, the initial estimates determine how many runs are deemed necessary, so far too many can be requested if the variance happened to be high during the first few runs, or if the desired confidence interval is very narrow (or the confidence level very high).

Computing the maximum likelihood estimates with the above recursive relations makes sense when there are lots of samples, since it avoids storing them all; for the few dozen runs typical here, the basic definitions are probably more practical, as sketched below.
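
For reference, the basic definitions would amount to something like this (an illustrative sketch; `direct_stats` is a hypothetical helper, not part of AdaptativeBenchmark):

    # store every sample and compute the statistics directly
    def direct_stats(samples)
      n   = samples.size
      avg = samples.inject(0.0){|s, x| s + x } / n
      sample_variance     = samples.inject(0.0){|s, x| s + (x - avg) ** 2 } / n
      population_variance = sample_variance * n / (n - 1.0)
      [avg, sample_variance, population_variance]
    end

    avg, s2, sigma2 = direct_stats([0.070, 0.071, 0.069, 0.072])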

### Example

    TIMES = 100_000

    AdaptativeBenchmark.bm(10) do |bm|
      a = 0
      $a = 0
      @a = 0
      bm.report("lvar"){ TIMES.times{|a| a; a; a; a; a} }
      bm.report("dvar_curr"){ TIMES.times{|b| b; b; b; b; b} }
      bm.report("dvar") do
        1.times{|b| 1.times{|c| 1.times{|d|
          TIMES.times{|b| b; b; b; b; b}
        } } }
      end
      bm.report("gvar"){ TIMES.times{|$a| $a; $a; $a; $a; $a} }
      bm.report("ivar"){ TIMES.times{|@a| @a; @a; @a; @a; @a} }
      # don't really care about cvar, constant
    end

               avg       stddev    runs    interval  confidence
    lvar       0.070500  0.003436  19      0.007100   95%
    dvar_curr  0.069048  0.003481  20      0.006800   95%
    dvar       0.076000  0.005712  24      0.007600   95%
    gvar       0.070000  0.000000  11      0.007000   95%
    ivar       0.100714  0.003397  13      0.010100   95%

### BenchUnit - Daniel Berger (2006-05-10 (Wed) 09:03:16)

I blogged about the idea of a BenchUnit way back at http://djberg96.livejournal.com/52734.html

Perhaps we can join these two ideas....

#### mfp 2006-05-10 (Wed) 09:52:57

Hmm that gives me another idea: testing non-functional requirements the same way we unit test. Imagine

    assert_complexity(:linear) do |n, runner|
      struct = FunkyStruct.new
      # ...
      runner.run { n.times{|i| struct.insert(foo(i)) } }
    end

#### Daniel Berger 2006-05-11 (Thu) 08:31:54

I feel "ZenBench" coming on...


Keyword(s):[blog] [ruby] [frontpage] [benchmark] [snippet] [probability] [estimate]

References:[Ruby]