eigenclass logo
MAIN  Index  Search  Changes  PageRank  Login

The ICFP Programming Contest 2006: a VM that challenges YARV as far as the fun factor goes

In 1967, during excavation for the construction of a new shopping center in Monroeville, Pennsylvania, workers uncovered a vault containing a cache of ancient scrolls. (...) Based on a translation of these documents, we now know that the society, the Cult of the Bound Variable, was devoted to the careful study of computation, over two millennia before the invention of the digital computer.

So began the task description for the ICFP Programming Contest 2006, the international programming contest where you are given 72H to solve a problem, working alone or in a group.

This year's contest was truly impressive; it's obvious the organizers have poured an enormous amount of work into it. It was so fun many (most) people will keep working on it even though the contest is over. And there was much to be learned as usual!*1

Imagine how exhilarating playing a text adventure game in a simulated OS running on a VM you implemented yourself can be, and that's but one of the puzzles...



The "qualifying problem" involved writing a small VM; you can find my implementation, a fairly fast Ruby extension, below.

We were given the description of a virtual machine (the Universal Machine) and a rather mystifying codex which was actually a program to be executed on that machine. After running it, you'd get a larger program that corresponded to something resembling an OS, named UMIX for its superficial similarity with UNIX.

There were 8 user accounts (in addition to root), each with its home directory, containing the descriptions of the actual problems to be solved. Most of them involved writing a program in some obscure programming language, whose documentation and reference interpreter could be found in the FS inside the UMIX machine!



I worked alone and only scored 607 points. I started quite late (about 20H into the contest), only after reading Gregory Brown's blog.

Since there were several independent problems, this year larger teams had a big head start --- there was virtually no cost associated to communication, and everybody could work in parallel, which was not quite as easy in previous years. I played it safe, attempting to solve the easy parts of several small problems, instead of trying harder with only a couple, facing the risk of not securing any points at all if I didn't finish in time*2

I recognized a few other Rubyists in the scoreboard*3. In addition to Gregory Brown's "4.times { |geek| }" team, I saw a "Chunky Bacon" team, certainly a disciple of _why's.

The UMIX system

At the beginning, you only had access to the guest account, but after some privilege escalation you'd get all the other passwords one after another. The first step, in the guest account, was creating a program in BASIC to obtain weak passwords from other accounts (the same way the well-known "crack" program does). That script could be run with the UMIX-provided interpreter you could find /bin/qbasic.

The attention to detail was stunning; the "FS" replicated what you'd expect on a real system: the UMIX "OS" had a sort of primitive shell, a MUA called ... 'mail' (you could even see the spam received by each user! :), the above-mentioned BASIC interpreter, a "umodem" program that could be used to upload your code to the machine (so that it could be interpreted by the embedded programs)... It even pretended that some incorrect programs dumped core (you got some points after examining the core file you'd get after running ./a.out on the guest account :)

Functional problems

Just a few words about the actual problems. I left a couple untouched...

The "basic" problem

This was the first thing you had to solve: by using the embedded QBASIC interpreter, you got to crack weak passwords from other accounts!

The text adventure game

What can I say... a text adventure in a simulated OS running on your own VM! I've read that at some point you had to solve a puzzle about some sort of ML dialect introduced during the game, but never got to that part --- too little time.

The 2D language

You had to write programs to multiply unary numbers and reverse lists, and a raytracer (!) in the 2D language. A function to add two unary numbers looked like this:

      :plus                          |                                    :
      :  *==================*        |                                    :
      -->!send [(W,S),(W,E)]!--------#--------------------+               :
      :  *==================*        |                    |               :
      :         |                    |                    |               :
      :         |                    v                    v               :
      :         |                 *==============*    *============*      : 
      :         |                 !case N of S, E!--->!send [(N,E)]!-------
      :         |                 *==============*    *============*      : 
      :         |                        |                                : 
      :         |                        v                                : 
      :         |                    *========*       *================*  : 
      :         +------------------->!use plus!------>!send [(Inl W,E)]!---
      :                              *========*       *================*  :
:main                                                          :
:                                                              :
:  *================================================*          :
:  !send [(Inl Inl Inl Inr (),E),(Inl Inl Inr (),S)]!--+       :
:  *================================================*  |       :
:                   |                                  v       :
:                   |                            *========*    :
:                   +--------------------------->!use plus!-----
:                                                *========*    :

Here are my solutions to the multiplication and list reversal problems: mult-opt.2d rev-b.2d

The "Black" problem

The Black problem involved designing a number of black-knots:

 A black-knot has n inputs (numbered 0 to n-1) and n outputs.
 (These are the little holes that kids drop marbles into.) The
 black-knot is a function from each of its n inputs to a pair of
 integers (j, k) where 0 <= j < n and 0 <= k. The number j is the
 output hole that the marble comes out of. The number k is a number
 of "plink" sounds that are produced as the marble rolls unseen
 through the toy.

There were specifications for black-knots with as many as 500 input pipes... I made several variants of a small Ruby program to generate them (some of them using Amb to backtrack and other techniques), but I only solved the problem for the 10-input black-knot --- my programs were too slow for the larger ones:


The O'Cult language

This was a twisted rewrite system; I only solved the first part and didn't even try the XML stuff. I went the TDD way for this: adding one rule at a time until the tests passed. But the solution is quite ugly (only 157 points) so I'd rather keep it to myself :)

Other problems

I left a couple puzzles untouched: 'balance', about yet another twisted programming language (found on yang's account) and the one on ant simulation (Smellular Antomata) with automata.

A fairly fast VM

I read from Gregory's blog that his team tried to code this in Ruby first, and then in C, but the performance was abyssal. Strangely enough, C was the perfect tool for this task: functional languages had a hard time with this part.

Here's my VM, which runs on 32 bit machines, and executes the standard SANDMark benchmark in under 2 minutes on my older K7 XP 1700+. It's a Ruby extension that achieves acceptable performance. It features :

  • a custom allocator for small arrays (20% speed boost over plain malloc)
  • fast array management by reusing the 32bit addresses as the array IDs (this is what makes it 64bit-unsafe)
  • no array bound checking by default (don't use it to run programs you don't trust!)

Quite straightforward, I got it running in a few hours --- I had the codex running in half an hour, but a segfault caused by endianness issues took longer to solve --- I wasn't very bright that day.

This tarball includes my VM and the decoded program with the UMIX environment: mfp-icfp06-vm.tar.gz.

ea2a3fdca4bdd0aedee77c96a2f89db3  mfp-icfp06-vm.tar.gz

Here's a simplified version of the main loop:

while(1) {
    ins = *ptr++;
    switch(OPCODE(ins)) {
 case 0: /* CMOV */
     RA(ins) = RC(ins) ? RB(ins) : RA(ins);  /* heh CMOV on recent micros */
 case 1: /* LD */
     RA(ins) = *(PT(RB(ins)) + RC(ins));
 case 2: /* ST */
     *(PT(RA(ins)) + RB(ins)) = RC(ins);
 case 3: /* ADD */
     RA(ins) = RB(ins) + RC(ins);
 case 4: /* MUL */
     RA(ins) = RB(ins) * RC(ins);
 case 5: /* DIV */
     if(!RC(ins)) {
	 rb_raise(rb_eRuntimeError, "Division by 0.");
     RA(ins) = RB(ins) / RC(ins);
 case 6: /* NAND */
     RA(ins) = ~(RB(ins) & RC(ins));
 case 7: /* HALT */
     return Qnil;
 case 8: /* ALLOC */
	 unsigned int size = RC(ins);
	 unsigned long *p;
	 p = calloc(size + 1, 4);
	 RB(ins) = (unsigned long)p;
	 *p = size * 4;
 case 9: /* FREE */
     free((unsigned long *)RC(ins));
 case 10: /* OUT */
     fputc(RC(ins), stdout);
     fputc(RC(ins), stderr);
 case 11: /* IN */
     i = fgetc(stdin);
     if(i == 4) { /* ^D */
	 i = 0xFFFFFFFF;
     RC(ins) = (unsigned long)i;
     fputc(RC(ins), stderr);
 case 12: /* JMP */
     if(RB(ins) == 0) {
	 ptr = curr_program + RC(ins);
     } else {
	 unsigned long *newarr;
	 unsigned int progsize = *(PT(RB(ins)) - 1);
	 free(curr_program - 1);
	 newarr = malloc(progsize + 4);
	 memcpy(newarr + 1, PT(RB(ins)), progsize);
	 newarr[0] = progsize;
	 curr_program = newarr + 1;
	 ptr = curr_program + RC(ins);
 case 13: /* LIMM */
     registers[REG(ins)] = IMM(ins);
	 char msg[512];
	 snprintf(msg, 511, "Unknown opcode: %d.\n", OPCODE(ins));
	 rb_raise(rb_eRuntimeError, msg);

The details

There were so many little things that made this all so much fun... Here's a funny signature found in an email inside the UMIX system:

 Bill Ohmega     "Hell is other programming
ohmega@cbv.net    languages." -- Sartran

Was (Jean-Paul?) Sartran referring to the "2D" language?

well done on the vm - _why (2006-07-28 (Fri) 13:34:21)

Such an amazing quest. Hey, thanks for the VM, it's been the most swift and stable among the handful I've tried.

mfp 2006-07-28 (Fri) 15:02:45

I've tried some 3-4 VMs people linked to on the icfpcontest-discuss ML, and only the JITting one was a bit faster: SANDMark in 90s vs. 120s with mine (the result is a bit disappointing, the optimizer must be improved) --- but that one only works with sandmark.umz. So mine is the fastest usable one AFAIK.

Jon 2006-08-01 (Tue) 19:34:56

Awesome! Nice writeup.

archive invalid? - Forrest Chang (2006-07-26 (Wed) 10:21:55)

It seems that gunzip finds an unexpected EOF on vm.tar.gz

Otherwise, cool and I look forward to seeing what you wrote.

mfp 2006-07-26 (Wed) 11:00:19

Weird, I just downloaded and it decompressed fine.

$ md5sum vm.tar.gz 
ea2a3fdca4bdd0aedee77c96a2f89db3  vm.tar.gz
$ tar zxvf vm.tar.gz

Just to make sure, I moved it to the static area; you can download it from http://eigenclass.org/misc/mfp-icfp06-vm.tar.gz .

Danno 2006-07-26 (Wed) 12:54:19

This is nuts. I totally don't have the chops for this sort of thing.

Last modified:2006/07/25 13:18:25
Keyword(s):[blog] [ruby] [frontpage] [vm] [icfp] [icfp06]
References:[More archeolinguistics: unearthing proto-Ruby]

*1 I gained some proficiency with Ruby extensions in ICFP03 :)

*2 This is what happened to me on my first participation, in 2003. I worked in a large team, coding for >50H over 72H period. The second time I took it easy, starting only a few hours before the end, so it doesn't really count :)

*3 the scores stopped being updated 8H before the end of the contest