eigenclass logo
MAIN  Index  Search  Changes  PageRank  Login

Aim for the Top! Beating the current #1 Wide Finder log analyzer with the join-calculus.

Yesterday's goal was to beat the fastest "Wide Finder" log analyzer program. I have been successful; my best effort is nearly 3 times faster than the current number 1 in Tim Bray's table on the two-core machine I tested it on. Not bad considering that I'm essentially fighting against highly optimized standard libraries written in C (string searching and regexp matching).

I took Ilmari Heikkinen's OCaml implementation (referred to as wf-kig below) and improved it progressively, adding multicore support using JoCaml's join-calculus in the last step (the concurrent core, a mere 12 lines, is explained below). Here follows a summary of the optimizations, but let's see the numbers first. These are my results on a Core 2 Duo 2GHz, 2GB DDR2 667MHz machine (hot cache)*1:

wf-6.py (2)1.012??
wf-6.py (1)1.737??

wf-6.py is the current leader in Tim Bray's classification. It is multicore-enabled; the above table lists the times I got with one and two processes (unsurprisingly, further processes don't make it any faster because the test machine has got only two cores). wf-2.py is the basic Python implementation wf-6.py was evolved from and is about the same size as wf-kig.ml or wf.ml. My optimizations mostly mirror those applied to wf-2.py in order to create wf-6.py.

My fast "Wide Finder" implementations can be found in the darcs repository.

First improvement: sublinear string search

wf-kig.ml processes the input one line at a time, matching each one against the "GET /ongoing/When/\d\d\dx..." regexp. The first optimization I did was splitting that regexp matching into two parts:

  1. find the "GET /ongoing/When/" substring in the line
  2. try to match the remaining regexp at that position

This allows me to use a sublinear string search algorithm for the first part. I took the Boyer-Moore implementation I had written for ICFP07, achieving a 25% speed boost (this is "wf" in the above table). Note that wf-2.py already uses sublinear searching, so it is equivalent to "wf" modulo the implementation language.

Block processing

The profiler indicated nearly half of the time was spent in the function that reads a single line from disk. I changed my code to read 50MB chunks and scan them in one go, with some care to take the incomplete line at the end of each chunk and copy it at the beginning of the next one. This brings a 46% speed gain in wf-block relative to the previous version ("wf").

Avoiding IO

In order to avoid useless copying between kernel and user space, I switched to a memory-mapped solution. I wrote a reusable Bigstring module that allows to use mmap'ed memory as an OCaml string and adapted the Boyer-Moore string search implementation. wf-mmap.ml itself takes less code than wf-block.ml, and is about 30% faster.

There is still some potential for optimization in the single-core case because I have to copy the parts of the line to be matched against the regexp to a buffer, owing to a limitation in the interface of the regexp library. Of course, the ultimate optimization would be to specialize the code for the particular regexp we're using.

At this point, wf-mmap processes 200MB in 0.584s, and is nearly twice faster on a single core than wf-6.py using two. However, Tim's machine has got 8 cores, so I went for a multicore-enabled version.

Distributed programming with JoCaml

I adapted wf-mmap.ml to use several processes managed using JoCaml's join calculus. wf-mmap-multicore processes 200MB in 0.367s when using 2 processes, considerably faster than wf-6.py.

The join calculus is really neat, it allows you to think about concurrent processes at a higher level with fewer chances to deadlock or corrupt your data.

This is the code that controls the workers (if you have some time follow the introduction to concurrent and distributed programming with JoCaml, I can't recommend it enough); it differs a bit from what you'll find in the actual sources, because I just thought of this more elegant way as I was writing these lines:

   def agent(worker) & pending_chunks(chunk::chunks) & pending_results(n) =
      worker(filename, chunk, job_done) & pending_chunks(chunks) &
   or job_done(result) & pending_results(n) =
      merge_hash h result; pending_results(n-1)
   or pending_results(0) & pending_chunks([]) & wait() =
      print_top_n 10 (sort_results h);
      reply to wait
   in spawn pending_chunks(chunks_for_file filename) & pending_results(0);

The various clauses of the first join pattern can be considered separately:

  • if there's a free worker, a pending chunk to be processed and n chunks being processed, whose results are yet to be collected, give that chunk to the worker and record that there's a new pending result; the list of pending chunks is the tail of the previous list
  • once we get a result (job_done(result)), merge it with the statistics we have; there's now one less pending result
  • if there are no pending results and no chunks to be processed (empty list), print the top 10. By replying to the "wait" synchronous channel (whose result some other part of the program is waiting for), the program will terminate.

At the beginning, there are no pending results (pending_results(0)), but some pending_chunks.

Note all the things you can't see in the above code: there are no locks, no explicit synchronization, no boilerplate code that doesn't contribute directly to the solution.


Join Calculus Hurts My Head - Danno (2007-11-05 (Mon) 10:47:15)

Maybe it's just because I had no background in OCaml at the time, but I had to use JOCaml for a school project once and I think I spend more time reading the documentation for it than I ultimately ended up spending programming the dang thing.

Suffice to say, ultimately I prefer the Erlang actor model approach to concurrency.

That said, at least no one's pushing a concurrency approach based on Pi-Calculus, yeesh.

mfp 2007-11-05 (Mon) 11:04:22

I find JoCaml's join patterns very convenient, and I rather enjoy having a type system verifying that I'm using the processes correctly. Erlang does the peripheral stuff better, though (registering channels against a Join.Ns, forking + exec... feels a bit unclean).


Build error - Tim Bray (2007-11-05 (Mon) 12:05:39)

I don't seem to have those .o's and .a's in /usr/local/lib/ocaml... are there special installation options I need or some such? -Tim

sh jocaml-build.sh + as -o 'bigstring.o' 'bigstring.s' + as -o 'wf-mmap-multicore.o' 'wf-mmap-multicore.s' + as -o '/tmp/camlstartup335da0.o' '/tmp/camlstartup7ae318.s' + gcc -o 'wf-mmap-multicore' -I'/usr/local/lib/jocaml' '/tmp/camlstartup335da0.o' '/usr/local/lib/jocaml/std_exit.o' 'wf-mmap-multicore.o' 'bigstring.o' '/usr/local/lib/ocaml/bigarray.o' '/usr/local/lib/ocaml/str.o' '/usr/local/lib/jocaml/threads/join.a' '/usr/local/lib/jocaml/threads/threads.a' '/usr/local/lib/jocaml/unix.a' '/usr/local/lib/jocaml/stdlib.a' '-L/usr/local/lib/jocaml/threads' '-L/usr/local/lib/jocaml' '-L/usr/local/lib/ocaml' '-lthreadsnat' '-lunix' '-lpthread' '-lposix4' '-lunix' 'libbigarray.a' 'libstr.a' '/usr/local/lib/jocaml/libasmrun.a' -lnsl -lsocket -lm gcc: /usr/local/lib/ocaml/bigarray.o: No such file or directory gcc: /usr/local/lib/ocaml/str.o: No such file or directory gcc: libbigarray.a: No such file or directory gcc: libstr.a: No such file or directory Error during linking


Build problem - Tim (2007-11-05 (Mon) 12:10:47)

Once again with linebreaks... I'm missing the stuff in /usr/local/lib/ocaml

sh jocaml-build.sh

+ as -o 'bigstring.o' 'bigstring.s'

+ as -o 'wf-mmap-multicore.o' 'wf-mmap-multicore.s'

+ as -o '/tmp/camlstartup335da0.o' '/tmp/camlstartup7ae318.s'

+ gcc -o 'wf-mmap-multicore' -I'/usr/local/lib/jocaml' '/tmp/camlstartup335da0.o' '/usr/local/lib/jocaml/std_exit.o' 'wf-mmap-multicore.o' 'bigstring.o' '/usr/local/lib/ocaml/bigarray.o' '/usr/local/lib/ocaml/str.o' '/usr/local/lib/jocaml/threads/join.a' '/usr/local/lib/jocaml/threads/threads.a' '/usr/local/lib/jocaml/unix.a' '/usr/local/lib/jocaml/stdlib.a' '-L/usr/local/lib/jocaml/threads' '-L/usr/local/lib/jocaml' '-L/usr/local/lib/ocaml' '-lthreadsnat' '-lunix' '-lpthread' '-lposix4' '-lunix' 'libbigarray.a' 'libstr.a' '/usr/local/lib/jocaml/libasmrun.a' -lnsl -lsocket -lm

gcc: /usr/local/lib/ocaml/bigarray.o: No such file or directory

gcc: /usr/local/lib/ocaml/str.o: No such file or directory

gcc: libbigarray.a: No such file or directory

gcc: libstr.a: No such file or directory

Error during linking

mfp 2007-11-05 (Mon) 12:48:11

Got it now, you have to install JoCaml 3.10.0 after OCaml 3.10.0 and make sure it detects the latter as the "companion OCaml". There are a couple options in JoCaml's configure script:

  JoCaml may have a 'companion' Objective Caml, so as to allow
  object level compatibility beetween JoCaml and OCaml. There are
  severe restructions: the version numbers of both systems should match.

-ocamlc <ocamlc bytecode compiler>
  If <ocamlc bytecode compiler> is executable, then the script runs it,
  to check version number and, when it matches with JoCaml version number,
  to extract the location of the Objective Caml library.
    By default, this option is activated with value 'ocamlc'

-ocamllib <dir>
  The Objective Caml library is <dir>. No version check is performed. Use
  this option at your own risk.

When the configure script is done, you must see something like

** Configuration summary **

Directories where JoCaml will be installed:
       binaries.................. /usr/local/bin
       standard library.......... /usr/local/lib/jocaml
Companion ocaml:
       version: 3.10.0, library: /usr/lib/ocaml/3.10.0.


$ make world   
$ make opt
$ make opt.opt
$ make install

If all goes well, you should get something like this:

$ jocamlopt -v
The JoCaml native-code compiler, version 3.10.0
Standard library directory: /usr/local/lib/jocaml
Companion OCaml library directory: /usr/lib/ocaml/3.10.0

Once JoCaml has got a companion OCaml, you can compile normally:

$ jocamlopt -unsafe -c bigstring.ml
$ jocamlopt -unsafe -c wf-mmap-multicore.ml
$ jocamlopt -o wf-mmap-multicore bigarray.cmxa str.cmxa bigstring.cmx wf-mmap-multicore.cmx

I'll update README.txt and the Makefile to reflect this.

mfp 2007-11-05 (Mon) 13:08:53

README.txt and Makefile updated.


Use this form to create a new top-level comment; for direct replies to existing comments, use the text entries you'll find below.


Last modified:2007/11/05 10:02:30
Keyword(s):[blog] [ocaml] [fronpage] [widefinder] [python] [optimization]

*1 I'm using OCaml 3.10.0, JoCaml 3.10.0 and Python 2.5.1, compiled with gcc 4.0.1,