MAIN
2008-07-02 18:44 UTC About problem formulations and ordered permutations
The mere formulation of a problem is far more essential than its solution, which may be merely a matter of mathematical or experimental skills. -- Albert Einstein
Here's an unremarkable example where the very description of the problem leads you in the wrong direction (on purpose?).
Here is an interesting coding challenge: write a counter function that counts from 1 to max but only returns numbers whose digits don’t repeat(...) Also display the biggest jump and the total count of numbers (...).
It suggests a brute-force solution, with max iterations and a predicate to verify if the counter satisfies the "no repeated digits criterion".
A vastly better approach (with 1126 times fewer iterations when considering numbers up to 10^10) becomes apparent if the problem is reworded as follows:
generate permutations of the digits '0' to '9' that do not start with '0' and whose numerical value is smaller than 'max', in increasing order
The "do not start with '0'" restriction is required to avoid generating a permutation twice (e.g. 12 and then 012), since a leading zero is ignored.
The easiest way to generate a permutation with m items from a set of n elements is to pick one element and prepend it to a (m-1) permutation of items from the resulting set of (n-1) elements. The "not starting with 0" criterion is fulfilled simply by picking the initial element from the digits different from zero and then putting zero back in the set used in the recursion.
This is expressed directly in OCaml as follows (you can find the full code of the OCaml and Ruby versions below):
module S = Set.Make(struct type t = int let compare = (-) end) let (--) set elm = S.remove elm set let permutations f zero digits len = let base = S.cardinal digits in let rec aux n digits = function 0 -> f n | len -> S.iter (fun s -> aux (n * base + s) (digits -- s) (len-1)) digits in S.iter (fun s -> aux s (digits -- s) (len - 1)) (digits -- zero)
The function is general in the sense that it will find unique permutations in any base (that is, by placing the numbers 10 to 15 in the initial set, a sequence with the digits '0'..'9' + 'a'..'f' can be generated with the same code).
Here's the translation to Ruby:
def aux(n, len, syms, &b) if len == 0 yield n return end syms.each{|s| aux(n * 10 + s, len - 1, syms - [s], &b) } end def compute_seq(zero, syms, len, &b) (syms - [zero]).each{|s| aux(s, len - 1, syms - [s], &b) } end
This version is not as general as the OCaml one, but that's not its only problem. This is how long it takes to find the 8877690 numbers under 10^10 with unique digits:
| Ruby 1.8.6 (base 10) | 73s |
| Ruby 1.9 (base 10) | 63s |
| OCaml (any base) | 0.640s |
As usual, Ruby is 100 times slower.
The OCaml version is more general than the strict minimum required by the original problem (it works with arbitrary bases), and thus presumedly slower than a more specific one. It doesn't fare badly against the fastest implementation I've found, an equivalent solution in Java that uses a bitmask to represent the sets:
| Java (base 10, bitset) | 0.680s without JVM startup overhead |
The Ruby solution uses Array#-, also know as "the poor man's set subtraction". Removing an element from an array is an O(n) operation. In the above-mentioned Java version, iterating over the remaining digits is O(all digits), instead of O(remaining ones). The OCaml version uses a set based on balanced trees where the relevant operations are logarithmic (it has a fairly large constant factor, so an integer set based on a bitset would be faster in this case). This means that the OCaml version would perform nearly as well with base-1000 permutations, as opposed to the other solutions which would become nearly 100 times slower.
Ironically, the OCaml stdlib is richer than Ruby's when it comes down to data structures. (The irony is that, while OCaml makes it very easy to build a data structure with good performance, in Ruby you are effectively stuck with the core classes implemented in C, which means essentially Array and Hash --- Ruby is so slow that algorithmic improvements hardly make up for the interpretation overhead.)
Here's the full code:
Read more...
2008-06-12 08:28 UTC The lightest lightweight threads, Protothreads
Last week, I used the Lwt cooperative lightweight thread library to implement a benchmark that measures context switch performance, determined that it was GC bound and timed it against comparable programs (i.e., the fastest implementations in the computer language benchmark games, which are all based on lightweight threads) and a C version that uses POSIX threads, obtaining these results:
| implementation | memory usage | time (s) |
| Haskell GHC 6.8.2 | 2680KB | 1.22 |
| OCaml ocamlopt 1024Kword minor heap | 5178KB | 1.85 |
| Haskell GHC 6.8.2 -threaded, -N1 | 2760KB | 1.9 |
| Erlang R12B-1 HiPE | 5996KB | 3.96 |
| Mozart/Oz | 3788KB | 4.10 |
| OCaml ocamlopt 32Kword minor heap | 970KB | 4.24 |
| Haskell GHC 6.8.2 -threaded, -N2 | 3300KB | 15.27 |
| GCC C (POSIX threads) | 4520KB | 28.7 |
Here are the figures I get for the C version I made with Protothreads:
Addendum This post wouldn't exist hadn't rektide asked for these results.
| GCC C (Protothreads, optimum scheduling) | 220KB | 0.076s |
| GCC C (Protothreads, pessimum scheduling) | 220KB | 18.6s |
(Read below for an explanation of the difference between optimum and pessimum scheduling policies.)
It is nearly 400 times faster than the C version with POSIX threads, and represents a one order of magnitude improvement over the other lightweight thread implementations. It also needs less memory. The performance is almost unbelievable. Where's the catch, ugly code maybe? Judge for yourself; here are the Protothreads and the POSIX threads C implementations head to head*1:
/* Protothreads, 0.076s*/ /* POSIX threads, 28.7s */
#include <stdio.h> #include <stdio.h>
#include <stdlib.h> #include <stdlib.h>
#include "pt.h" #include <pthread.h>
#include "pt-sem.h" #include <limits.h>
#define NUM_THREADS 503 #define NUM_THREADS 503
static struct pt thread_context[NUM_THREADS]; struct stack {
static int data[NUM_THREADS]; char x[PTHREAD_STACK_MIN];
static struct pt_sem mutex[NUM_THREADS]; };
static static pthread_mutex_t mutex[THREADS];
PT_THREAD(thread(struct pt *pt, int id)) static int data[THREADS];
{ static struct stack stacks[THREADS];
static int token;
static int r; static void* thread(void *num)
PT_BEGIN(pt); {
int l = (int)num;
while(1) { int r = (l+1) % THREADS;
PT_SEM_WAIT(pt, &mutex[id]); int token;
token = data[id];
r = (id + 1) % NUM_THREADS; while(1) {
if(token) { pthread_mutex_lock(mutex + l);
data[r] = token - 1; token = data[l];
PT_SEM_SIGNAL(pt, &mutex[r]); if (token) {
} else { data[r] = token - 1;
printf("%d\n", id + 1); pthread_mutex_unlock(mutex + r);
exit(0); }
} else {
} printf("%i\n", l+1);
PT_END(pt); exit(0);
} }
}
int }
main(int argc, char *argv[])
{ int main(int argc, char **argv)
int i; {
int i;
if(argc != 2) exit(255); pthread_t cthread;
data[0] = atoi(argv[1]); pthread_attr_t stack_attr;
for(i = 0; i < NUM_THREADS; i++) { if (argc != 2) exit(255);
PT_SEM_INIT(&mutex[i], 0); data[0] = atoi(argv[1]);
PT_INIT(&thread_context[i]);
} pthread_attr_init(&stack_attr);
PT_SEM_INIT(&mutex[0], 1); for (i = 0; i < THREADS; i++) {
pthread_mutex_init(mutex + i, NULL);
while(1) { pthread_mutex_lock(mutex + i);
for(i = 0; i < NUM_THREADS; i++)
thread(&thread_context[i], i); pthread_attr_setstack(&stack_attr, &stacks[i],
} sizeof(struct stack));
} pthread_create(&cthread, &stack_attr, thread,
(void*)i);
}
pthread_mutex_unlock(mutex + 0);
pthread_join(cthread, NULL);
}
Even though the Protothreads code is similar to the one with pthreads, there are two important differences:
- there is no thread-local data in the Protothreads version. r is recomputed on each iteration
- the proto threads are scheduled manually by running all the corresponding functions in sequence
These dissimilarities give some insight into the nature of Protothreads. They are little more than small state machines whose state is stored in an opaque pt structure. Protothreads' implementation consists of a few preprocessor macros in headers, so the best way to see how they work is to examine the output of the preprocessor (gcc -E, slightly abridged and reformatted here):
Read more...
2008-03-31 11:39 UTC gibak 0.3.0 (backup tool using Git): OSX support, extended attributes, bugfixes
gibak is a backup tool based on git. Since gibak builds upon the infrastructure offered by Git, it shares its main strengths:
- speed: recovering your data is faster that cp -a...
- full revision history
- space-efficient data store, with file compression and textual/binary deltas
- efficient transport protocol to replicate the backup (faster than rsync)
gibak uses Git's hook system to save and restore the information Git doesn't track itself such as permissions, empty directories and optionally extended attributes and mtime fields.
You can read more about gibak here.
What's new in 0.3.0
- OSX support
- support for extended attributes both on Linux and OSX (it might work on other systems if their getxattr(2) interface matches Linux or OSX's).
- a few bugfixes
Read README.upgrade if you used earlier versions of gibak and want to use extended attributes.
Getting it
The latest tarball as of 2008-03-31 is gibak-0.3.0.tar.gz.
You can always get the latest code with
git clone http://eigenclass.org/repos/git/gibak/.git/
The repository can be browsed at http://eigenclass.org/repos/gitweb
Thanks
- Lee Marlow solved problems with submodule paths including spaces
- sean provided most of the fixes required for OSX support
Some FAQs
(This will eventually be moved to a new node.)
Will the repository grow monotonically? Can I delete older history? What about large files?
Read more...
2008-03-24 17:11 UTC Comparing lightweight threads
The Computer Language Benchmarks Game includes a benchmark that measures context switch performance. The entries can be classified into three categories:
- those that rely on lightweight threads, namely Haskell GHC, Erlang HiPE and Mozart/Oz. I don't know whether Smalltalk VisualWorks and Scala belong here too; their performance is far from the top three entries. Update (2008-03-25) According to Julian Morrison and Calum, Smalltalk and Scala have lightweight threads and (lightweight) actors on top of Java thread pools, respectively (see comments below).
- those that use POSIX threads, with or without actual parallelism, including most other entries
- Ruby, a category of its own (?): very expensive user-mode threads with no parallelism
The OCaml entry is amongst the fastest pthread-based ones, but still markedly slower than the top entry, by around an order of magnitude. The version I wrote some time ago, based on the Lwt cooperative lightweight thread library, is close in performance to GHC. Some analysis reveals interesting facts about GHC's concurrency support and Lwt.
Performance
Here are the figures I get on a dual-core AMD Athlon 64 X2 6000+ in 32 bit mode (Why 32 and not 64? Because it's faster in this benchmark; 64 bit pointers are heavy and we get nothing in return in this case.)
Update: I've also timed the Mozart/Oz version and GCC C+pthreads as a baseline.
| implementation | memory usage | time (s) |
| Haskell GHC 6.8.2 | 2680KB | 1.22 |
| OCaml ocamlopt 1024Kword minor heap | 5178KB | 1.85 |
| Haskell GHC 6.8.2 -threaded, -N1 | 2760KB | 1.9 |
| OCaml ocamlopt 256Kword minor heap | 2016KB | 2.05 |
| OCaml ocamlopt 64Kword minor heap | 1228KB | 3.06 |
| Erlang R12B-1 HiPE | 5996KB | 3.96 |
| Mozart/Oz | 3788KB | 4.10 |
| OCaml ocamlopt 32Kword minor heap | 970KB | 4.24 |
| Haskell GHC 6.8.2 -threaded, -N2 | 3300KB | 15.27 |
| GCC C (POSIX threads) | 4520KB | 28.7 |
The Haskell code was compiled with -O2; I used erlc +native +"{hipe, [o3]}" for Erlang and -O3 -fomit-frame-pointer -march=athlong for GCC.
GC overhead
The OCaml version is clearly GC bound, and performance increases as the minor heap is enlarged, decreasing the amount of GC work. Whereas with the default 256KB heap the Erlang program is slightly faster, when OCaml is allowed to use comparable amounts of memory, it is over twice as fast.
Read more...
2008-03-11 10:32 UTC Warm fuzzy things for random simulations
Let's talk about random experiments. The simplest one is tossing a coin, with outcomes "heads" and "tails". It's so elementary that we fully understand it intuitively once we're told that the coin is not loaded. There are three basic problems of interest:
- How to simulate the experiment with a computer and obtain results undistinguishable from the real thing?
- If we're given an outcome, can we compute its probability?
- Can we enumerate all possible outcomes and their probabilities?
In our first experiment, the answers are "rand 2" (if take e.g. "heads" = 0, "tails" = 1), "1/2" and "heads, tails, both with probability 1/2". Easy indeed. Now on to something harder: rolling a fair die. The answers are now "1 + rand(6)", "1/6", and "1..6, all with probability 1/6". Still trivial. Let's aim for the stars! What about rolling three dice? The outcomes are clearly 3..18, but what are the probabilities?
Intuition doesn't help much here --- we "can see" the possible outcomes, but most people cannot give the precise probabilities intuitively (I certainly can't; I can at most draw an approximate graph, but that's because I am used to convolutions). The first problem was how to simulate the experiment; this is by far the easiest one. This is what our dice rolling simulation looks like in Ruby:
def roll_3d6 d1 = rand(6) d2 = rand(6) d3 = rand(6) 1 + d1 + 1 + d2 + 1 + d3 end
We can now simulate the experiment easily, but this doesn't help us with the harder problems: giving the probability of an outcome and doing so for all possible outcomes. Clearly, the above snippet has got all the information we need about this random simulation; it's indeed a precise definition. Wouldn't it be nice to have the computer solve the three problems of interest (simulation, distribution and expectation) for us if we give it the specification of the simulation? It turns out this can be done with a little code massaging:
def roll_3d6(m) m.rand(6).in do |d1| m.rand(6).in do |d2| m.rand(6).in do |d3| m.ret(1+d1 + 1+d2 + 1+d3) end end end end
Given suitable Simulation, Distribution and Expectation classes, we can solve all three problems with the above code.
Simulating the experiment:
p roll_3d6(Simulation.init).run # >> 9
All the possible outcomes and their probabilities:
p roll_3d6(Distribution.init).run # >> # [[3, 0.00462962962962963], [4, 0.0138888888888889], [5, 0.0277777777777778], # [6, 0.0462962962962963], [7, 0.0694444444444444], [8, 0.0972222222222222], # [9, 0.115740740740741], [10, 0.125], [11, 0.125], [12, 0.115740740740741], # [13, 0.0972222222222222], [14, 0.0694444444444444], [15, 0.0462962962962963], # [16, 0.0277777777777778], [17, 0.0138888888888889], [18, 0.00462962962962963]]
Getting the probability given an outcome:
(1..21).each{|n| p roll_3d6(Expectation.init).run(n) }
# >>
# 0.0
# 0.0
# 0.00462962962962963
# 0.0138888888888889
# ...
What's really nice is that Simulation, Expectation and Distribution can be reused for other random experiments. Say you get an extra roll if any of the first 3 dice gave a 6:
def roll_3d6_b(m) m.rand(6).in do |d1| m.rand(6).in do |d2| m.rand(6).in do |d3| if [d1, d2, d3].include?(5) m.rand(6).in{|d4| m.ret(1+d1 + 1+d2 + 1+d3 + 1+d4) } else m.ret(1+d1 + 1+d2 + 1+d3) end end end end end
Here's the distribution we get with roll_3d6_b(Distribution.init).run:
Read more...
- 1723 http://redhanded.hobix.com
- 1300 http://www.rubyinside.com
- 651 http://planetruby.0x42.net
- 398 http://www.rubyweeklynews.org
- 341 http://jp.rubyist.net/magazine/?RubyKaigi2006-0610-2
- 317 http://redhanded.hobix.com/cult/variousPresents.html
- 287 http://programming.reddit.com/info/5znh8/comments
- 275 http://en.wikipedia.org/wiki/Metaprogramming
- 268 http://www.onlinebizbuzz.com
- 260 http://rubyinside.com
Keyword(s):
References:[Opening up my hiki wiki: bliki.rb plugin]