With Ruby 1.9.1 being out for a while now, it's time to review my calculations regarding the memory footprint of objects, since 1.9 incorporates some optimizations that improve significantly on 1.8. I also measured the footprint of OCaml objects while I was at it.
Addendum: added note about Ruby Enterprise Edition (Ruby EE), a patched Ruby 1.8.6 used with Phusion Passenger; see below.
This table summarizes the results (sizes in bytes on x86; roughly twice as much on x86-64, exactly twice for OCaml, although the malloc overhead might differ):
| | Ruby 1.8 | Ruby EE | Ruby 1.9 | OCaml |
|---|---|---|---|---|
| object with no IVs | 20 | 20 | 20 | 12 |
| object with 1 IV | 120 | 96 | 20 | 16 |
| object with 2 IVs | 144 | 112 | 20 | 20 |
| object with 3 IVs | 168 | 128 | 20 | 24 |
| object with 4 IVs | 192 | 144 | 48 | 28 |
| struct (Struct or record) 1 elm. | 32 | 24 | 20 | 8 |
| struct 2 elms. | 36 | 28 | 20 | 12 |
| struct 3 elms. | 40 | 32 | 20 | 16 |
| struct 4 elms. | 44 | 36 | 44 | 20 |
(The Ruby EE gains come from the TCMalloc allocator and these are best case figures; the actual footprint will be between them and those for Ruby 1.8.)
Keep in mind that both Ruby 1.8 and 1.9 can suffer from heavy memory fragmentation (both internal and external) when allocating many objects (also, objects might be retained for an arbitrarily long amount of time because the GC is conservative). OCaml has no such problem, as it has got a generational, exact GC with a copying GC in the minor heap and an incremental mark & sweep & compact GC in the major heap.
Ruby 1.8
In Ruby 1.8, an object with one instance variable (IV) takes:
5 words for the object slot
4 (+2 = malloc overhead) words for the IV table (st_table struct)
11 (+2) words for the bins
4 (+2) words for the entry
That is, given
class X; def initialize(x); @x = x end end
X.new(1) will take 30 words, or 120 bytes on x86 (24 of which are used by malloc for internal bookkeeping).
Additional IVs cost 6 words (24 bytes) per IV until we reach 11 IVs (at which point the hash table resizes to 19 bins).
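The arithmetic above can be written down as a small helper. This is only a sketch of the word counts listed in this section, assuming 4-byte words, 2 words of malloc overhead per block, and an IV table that still has its initial 11 bins; `ruby18_object_bytes` is a name made up for illustration.

```ruby
# Estimated footprint in bytes of a Ruby 1.8 object with n_ivs instance
# variables, valid only while the IV table keeps its initial 11 bins.
# word_bytes and malloc_overhead (in words) are assumptions for x86.
def ruby18_object_bytes(n_ivs, word_bytes = 4, malloc_overhead = 2)
  slot = 5                                   # object slot
  return slot * word_bytes if n_ivs.zero?    # no IV table yet (20 bytes in the table)
  table   = 4 + malloc_overhead              # st_table struct
  bins    = 11 + malloc_overhead             # bin array
  entries = n_ivs * (4 + malloc_overhead)    # one entry per IV
  (slot + table + bins + entries) * word_bytes
end

(0..4).each { |n| puts "#{n} IVs: #{ruby18_object_bytes(n)} bytes" }
```

For 0 to 4 IVs this reproduces the Ruby 1.8 column of the table.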
Ruby Enterprise Edition
Ruby EE is a patched Ruby 1.8.6 that uses Google's TCMalloc, a much faster allocator than the most common one, which is based on Doug Lea's malloc. There are no changes to the runtime representation of objects, so all the possible space gains come from TCMalloc. According to its documentation, small blocks can be allocated with virtually no overhead, so Ruby EE will take up to 24 fewer bytes per object with IVs, and as much as 8 bytes less per Struct.
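As a rough illustration, the best-case Ruby EE figures can be obtained by taking the Ruby 1.8 layout and setting the per-block malloc overhead to zero. The sketch below assumes 4-byte words; the method names are hypothetical, and the Struct breakdown (object slot plus one word per element in a single malloc'ed block) is inferred from the table's figures rather than stated in the text.

```ruby
# Object with IVs: slot + st_table struct + 11 bins + one 4-word entry per IV.
# overhead is the assumed malloc bookkeeping per block, in words:
# 2 for stock Ruby 1.8, 0 for the Ruby EE best case.
def ruby18_layout_bytes(n_ivs, overhead, word_bytes = 4)
  return 5 * word_bytes if n_ivs.zero?
  words = 5 + (4 + overhead) + (11 + overhead) + n_ivs * (4 + overhead)
  words * word_bytes
end

# Struct: slot + one block holding the elements (assumption, see above).
def ruby18_struct_bytes(n_elems, overhead, word_bytes = 4)
  (5 + n_elems + overhead) * word_bytes
end

puts ruby18_layout_bytes(1, 2) - ruby18_layout_bytes(1, 0)   # bytes saved, 1 IV
puts ruby18_struct_bytes(1, 2) - ruby18_struct_bytes(1, 0)   # bytes saved, Struct
```

With `overhead = 0` the two helpers give the Ruby EE column of the table; the real footprint lies somewhere between the two settings.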
Ruby 1.9
Ruby 1.9 no longer uses a symbol -> value hash table for IVs. Instead, each class has an IV index table that maps each IV name to an index, and that index is used to look up the value in a per-object IV array.
(Note that the IV index table is shared amongst all the objects of the same class. If each one uses different names for the IVs, the indexes will keep increasing, so in a pathological case the IV array could become arbitrarily large even when the object has got only one IV.)
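To make the pathological case concrete, here is a toy model of the scheme in plain Ruby. It is not the interpreter's code: `ToyClass`, `ToyObject`, `iv_set` and `iv_get` are made-up names, and the model ignores the in-slot storage and the array growth policy.

```ruby
# Toy model: one name -> index table per class, one value array per object.
class ToyClass
  attr_reader :iv_index

  def initialize
    @iv_index = {}                      # shared by all instances of this class
  end

  # Return the index for an IV name, handing out the next free one if new.
  def index_for(name)
    @iv_index[name] ||= @iv_index.size
  end
end

class ToyObject
  attr_reader :ivars

  def initialize(klass)
    @klass = klass
    @ivars = []                         # per-object IV array
  end

  def iv_set(name, value)
    @ivars[@klass.index_for(name)] = value   # array must reach index + 1 slots
  end

  def iv_get(name)
    idx = @klass.iv_index[name]
    idx && @ivars[idx]
  end
end

klass = ToyClass.new
objs = (0...100).map do |i|
  obj = ToyObject.new(klass)
  obj.iv_set("@iv_#{i}", i)             # every object uses a different IV name
  obj
end
puts objs.first.ivars.size              # the first object needs 1 slot
puts objs.last.ivars.size               # the last one needs 100 slots for a single IV
```

Each new name bumps the shared index, so later objects pay for all the names used by earlier ones.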
Ruby 1.9 stores up to 3 instance variables in the object slot without using an external table, so an object with one IV will only take 5 words. Beyond 3 instance variables, it reverts to an external IV array which is resized exponentially (factor 1.25) as new elements are added. For an object with 4 IVs, it'll be of size 5, and the overall footprint will be:
5 words for the object slot
5 (+2) words for the IV array
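The same calculation as a sketch, assuming 4-byte words, 2 words of malloc overhead, and IVs added one at a time. The exact rounding of the 1.25 growth factor (integer division below) is my assumption, chosen so that 4 IVs give an array of size 5 as stated above; the method names are hypothetical.

```ruby
# Capacity (in words) of the external IV array after adding n_ivs instance
# variables one by one; 0 means everything fits in the object slot.
def ruby19_iv_array_capacity(n_ivs)
  capacity = 0
  (0...n_ivs).each do |index|
    next if index < 3                 # the first 3 IVs live in the object slot
    next if index < capacity          # still room in the external array
    capacity = (index + 1) + (index + 1) / 4   # grow to ~1.25x (assumed rounding)
  end
  capacity
end

# Estimated footprint in bytes of a Ruby 1.9 object with n_ivs IVs.
def ruby19_object_bytes(n_ivs, word_bytes = 4, malloc_overhead = 2)
  capacity = ruby19_iv_array_capacity(n_ivs)
  words = 5                                          # object slot
  words += capacity + malloc_overhead if capacity > 0
  words * word_bytes
end

(0..4).each { |n| puts "#{n} IVs: #{ruby19_object_bytes(n)} bytes" }
```

For 0 to 4 IVs this matches the Ruby 1.9 column of the table (20 bytes up to 3 IVs, 48 bytes with 4).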
OCaml
I'd never bothered to look into the size of OCaml objects before (as you're going for records when you want speed anyway), even though it's really easy using the low-level Obj module, which gives information about the runtime representation:
    # open Obj;;
    # let value_size o =
        let t = repr o in
        if is_block t then 1 + size t else 0;;
    val value_size : 'a -> int = <fun>
    # value_size 0;;
    - : int = 0
    # type foo = A | B of int;;
    type foo = A | B of int
    # value_size A;;
    - : int = 0
    # value_size (B 1);;
    - : int = 2
    # value_size (object end);;
    - : int = 3
    # value_size (object val a = 1 end);;
    - : int = 4
    # value_size (object val a = 1 method x = 1 end);;
    - : int = 4
The value_size function returns the size in words of the value in the heap, and returns 0 for immediate values (bool, char, int, constant constructors).
After a look at CamlinternalOO.ml, I now know that, in addition to the 1 word overhead taken for all values in the heap (the block header used by the GC and the runtime), objects take:
1 word for the method table
1 word for a unique object ID
1 additional word per instance variable
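Put as arithmetic (written in Ruby to match the other examples), the word counts above give the OCaml column of the table. This is a sketch with made-up method names; the record formula (header plus one word per field) assumes word-sized fields.

```ruby
# Bytes taken by an OCaml object: block header + method table + object ID
# + one word per instance variable.
def ocaml_object_bytes(n_ivs, word_bytes = 4)
  (1 + 1 + 1 + n_ivs) * word_bytes
end

# Bytes taken by an OCaml record with word-sized fields: block header
# + one word per field.
def ocaml_record_bytes(n_fields, word_bytes = 4)
  (1 + n_fields) * word_bytes
end

(0..4).each { |n| puts "object, #{n} IVs: #{ocaml_object_bytes(n)} bytes" }
(1..4).each { |n| puts "record, #{n} fields: #{ocaml_record_bytes(n)} bytes" }
```

Passing `word_bytes = 8` doubles every figure, which is the x86-64 case.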
Comments
I'd be interested to see Fusion's Enterprise Ruby included in the comparisons.
Just added it to the table. It's marginally better than stock 1.8 thanks to the TCMalloc allocator, but still far from 1.9, since the internal object representation hasn't changed.
The ruby data is interesting, but why compare it to OCaml? It would seem equally useful to compare it to C, C++, or a fishing expedition...
I bumped into some OCaml code that used objects where I'd normally have records, and realized I didn't know the space overhead for the former. Feel free to ignore that :)
The memory layout in C is uninteresting because trivial (esp. if we only have word-sized fields), and at most a sizeof away. As for C++, it's compiler-dependent, but it should be around malloc overhead (~8 bytes) + 1 word (vtable pointer) + 1 word per (word-sized) field (duh), yielding the same sizes as OCaml with common malloc implementations (of course, you can get rid of most of the overhead by using custom allocators).
Re: C++ -> Virtual method table in Wikipedia.
You mention using records in OCaml, for speed. And, I mean, I'm sure you already know this-- that the Struct class serves the same purpose in Ruby. I can't imagine it's that fast (since every generated class has a field lookup table,) but it is as memory slim as a C struct or O'Caml record (okay okay, minus the object header.) Strange that most people think it's just a novelty.
You're of course right, _why. I gave the sizes of (Ruby) Structs and (OCaml) records for exactly that reason, and they are indeed the same modulo the object slot and malloc overhead. I guess in 1.9 Structs should be about as fast as regular objects, as the operations needed to access a field are roughly the same (find index in the table, access per-object array). OTOH, did 1.9 have inline caches for IV access? I don't remember, it's been a while since I read the code, and it might have changed since. If there are, and there's no such logic for Structs, regular IVs could be faster (no hash table lookup to get the index).