Low-level introspection to save brain bits: Ruby object model, class hierarchy and method dispatching
I've often found that it's easier to remember some implementation detail of Ruby, and infer from it how the language behaves, than to keep in mind the numerous implications themselves.
Take method dispatching. There are quite a lot of rules to consider if you try to remember what will happen in each of the cases shown below, yet it's all caused by a particularity of the implementation, which I'll showcase using evil.rb:
    class A; def foo; "A#foo" end end
    class B < A; end
    module C; def foo; "C#foo" end end
    module D; def foo; "D#foo" end end
    class E < A; end
    class F < A; end

    a = A.new
    b = B.new
    e = E.new
    f = F.new
    a.foo                               # => "A#foo"
    b.foo                               # => "A#foo"
    e.foo                               # => "A#foo"
    class E; def foo; "E#foo" end end
    e.foo                               # => "E#foo"
    class << e; include C end
    e.foo                               # => "C#foo"
    class A; include C end
    f.foo                               # => "A#foo"
    f.extend C
    f.foo                               # => "A#foo"
    class F; include D end
    f.foo                               # => "D#foo"
    class F; def foo; "F#foo" end end
    f.foo                               # => "F#foo"
    def f.foo; "f#foo" end
    f.foo                               # => "f#foo"

    module X; def bar; "X#bar" end end
    b.foo                               # => "A#foo"
    class B; include X end
    b.foo                               # => "A#foo"
    b.bar                               # => "X#bar"
    module X; include D end
    b.foo                               # => "A#foo"
    class B; include X.clone end
    b.foo                               # => "D#foo"
Singletons, mixins, inclusion after inclusion and extension after extension, that's quite a lot of possibilities...
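For readers without evil.rb at hand, a few of these cases can be checked from inside the language by asking a method object who owns it. This is only a rough sketch, assuming a Ruby recent enough to have Method#owner and Object#singleton_class; it covers the uncontroversial cases and says nothing about the low-level fields discussed below.

```ruby
# Plain-Ruby sketch: ask which class or module supplies #foo at each step.
class A; def foo; "A#foo" end end
module C; def foo; "C#foo" end end
class E < A; end

e = E.new
puts e.method(:foo).owner                       # A (inherited from the superclass)

class E; def foo; "E#foo" end end
puts e.method(:foo).owner                       # E (the object's own class wins)

class << e; include C end
puts e.method(:foo).owner                       # C (mixed into e's singleton class)

def e.foo; "e#foo" end
puts e.method(:foo).owner == e.singleton_class  # true (a singleton method beats them all)
```

Each step adds something closer to the object than the previous provider, which is why the answer changes every time.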
First of all, we need to summon the power of darkness to deconstruct matz' creation:
Object#internal is defined by evil.rb to return a handle that allows us to inspect and manipulate the low-level fields associated with the original object. For regular objects, the interesting fields would be:
- iv_tbl: a pointer to a hash table with the instance variables (st_tbl *, see st.c)
- flags: contains information such as whether the object is frozen
- klass: points to the class of the object
Time for some basic low-level introspection:
    a = Object.new
    a.internal.klass.to_i               # => 3085110560
    a.internal.iv_tbl.to_i              # => 0
    a.instance_variable_set(:@foo, 1)
    a.internal.iv_tbl.to_i              # => 135912160
The iv_tbl isn't initialized until an instance variable is set: not only the language, but also the implementation makes sense.
Classes and modules carry a bit of additional info:
- m_tbl: st_tbl (hash table) holding instance methods
- super: a pointer to the class/module higher in the hierarchy
    class A; end
    A.internal.super.to_i               # => 3085110560
    A.new.internal.klass.super.to_i     # => 3085110560
    # ...
    Object.object_id * 2 + 2**32        # => 3085110560
Let's begin with something simple:
    o = Object.new
    o.internal.klass.to_i               # => 3085110560
    Object.object_id*2 + 2**32          # => 3085110560
    OBJ(o.internal.klass)               # => Object

    class A
      def foo; "A#foo" end
    end
    a = A.new
    OBJ(a.internal.klass)               # => A
    OBJ(a.internal.klass.super)         # => Object
OBJ is a method that returns a Ruby object given its address (which is related to, but not identical to, its object_id). We can ignore it for now (there's a small hint in the above snippet if you want to try to figure out how it's implemented).
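The hint can be turned into arithmetic. The sketch below assumes what the snippet suggests for this 32-bit Ruby 1.8 build: the address equals object_id * 2 modulo 2**32, with object_id read as a signed number. Both helper names are hypothetical, and later Ruby versions do not keep this relation. How OBJ then turns an id back into an object is not shown here; handing the id to something like ObjectSpace._id2ref is one guess.

```ruby
# Assumption: 32-bit Ruby 1.8, where address == object_id * 2 (mod 2**32).
WORD = 2**32

# Hypothetical helper: heap address implied by a 1.8-style object_id.
def address_from_object_id(id)
  (id * 2) % WORD
end

# Hypothetical inverse: the signed 1.8-style object_id for a heap address.
def object_id_from_address(addr)
  signed = addr >= WORD / 2 ? addr - WORD : addr
  signed / 2
end

object_klass_address = 3085110560   # the value printed for Object above
id = object_id_from_address(object_klass_address)
puts id                             # -604928368
puts address_from_object_id(id)     # 3085110560
```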
Nothing surprising so far: the implementation matches what we can see inside the language (without resorting to wicked methods).
someobj.internal.klass returns the address of someobj.class, and someclass.internal.super that of someclass.superclass.
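The in-language side of that correspondence needs no evil.rb. A minimal sketch, assuming Ruby 1.9 or later (method names come back as symbols there, where 1.8 returned strings):

```ruby
class A
  def foo; "A#foo" end
end

a = A.new
puts a.class                            # A       (what klass points to, as long as there is no singleton class)
puts a.class.superclass                 # Object  (what super points to, as long as nothing is mixed in)
puts A.instance_methods(false).inspect  # [:foo]  (the names held in A's own method table)
```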
Moving on to a slightly more complex example:
    def a.foo; "singleton foo" end
    s = OBJ(a.internal.klass)           # => #<Class:#<A:0xb7dc9c2c>>
    class << a; self end                # => #<Class:#<A:0xb7dc9c2c>>
    s.instance_methods(false)           # => ["foo"]
    OBJ(a.internal.klass)               # => #<Class:#<A:0xb7dc9c2c>>
    OBJ(a.internal.klass.super)         # => A
    OBJ(a.internal.klass.super.super)   # => Object
This shows that the singleton class is being inserted before (lower than) the actual class in the klass chain. One could also say that the actual class is the singleton one, and that obj.class should always return the singleton class of the object; but that's not the way matz meant it to be*1.
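The same picture is visible without evil.rb. A small sketch, assuming Ruby 1.9.2 or later, where Object#singleton_class is a shorthand for the class << a; self end idiom:

```ruby
class A; def foo; "A#foo" end end

a = A.new
def a.foo; "singleton foo" end

s = a.singleton_class                   # same object as `class << a; self end`
puts s.instance_methods(false).inspect  # [:foo]
puts s.superclass                       # A: the singleton class sits just below A
puts a.class                            # A: #class skips over the singleton class
puts a.foo                              # singleton foo
```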
Including a module inserts yet another kind of hidden class into the chain. The Pickaxe refers to them as proxy classes, but that term is somewhat overloaded, so I prefer naming them ICLASSes (after the T_ICLASS constant used to tag them), which is reminiscent of their low-level and "only for the implementor's eyes" status.
But what are they? Let's see:
    module B
      def foo; "B#foo" end
    end
    class A
      include B
    end
    a = A.new
    OBJ(a.internal.klass)               # => A

    sup = OBJ(a.internal.klass.super) # ~> undefined method `inspect' for #<B:0xb7d8b028> (NoMethodError)

    Object.instance_method(:inspect).bind(sup).call # ~> TypeError: bind argument must be an instance of Array
"What the heck?"
    OBJ(a.internal.klass.super.klass)   # => B
    OBJ(a.internal.klass.super.super)   # => Object
    a.foo                               # => "A#foo"
    B.internal.super                    # => nil
What's going on is that the ICLASS was inserted between A and Object in the klass chain. Its klass field points to the module that was included (B). Why didn't #inspect work on it? A module is an Object, and a Module instance too, but method lookup on the ICLASS starts at its klass (B) and follows super from there; B.internal.super is nil, so the search stops at the included module and never goes up to Object. It takes some time to internalize what the klass and super fields mean for modules, classes, or ICLASSes.
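The ICLASS itself stays hidden from ordinary Ruby code, but its effect can be observed. In this sketch (plain Ruby, 1.9 or later assumed), ancestors lists the module where the ICLASS sits, while superclass steps over it:

```ruby
module B; def foo; "B#foo" end end
class A
  include B
  def foo; "A#foo" end
end

puts A.ancestors.take(3).inspect     # [A, B, Object]: B shows up where the hidden class sits
puts A.superclass                    # Object: #superclass skips the hidden class
puts A.new.foo                       # A#foo: A's own method table is searched first
puts A.instance_method(:foo).owner   # A
puts B.instance_method(:foo).owner   # B: the method lives in the module, not in a copy
```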
A few more bits to think about before I get to explain them properly:
    class << a; end
    OBJ(a.internal.klass)               # => #<Class:#<A:0xb7dbbbb8>>
    class << a; self end.ancestors      # => [A, B, Object, Kernel]
    a.extend B
    a.foo                               # => "A#foo"
    class << a; self end.ancestors      # => [A, B, Object, Kernel]

    module C
      def foo; "C#foo" end
    end
    class << a; include C end
    a.foo
    class << a; self end.ancestors      # => [C, A, B, Object, Kernel]
    OBJ(a.internal.klass)               # => #<Class:#<A:0xb7dbbbb8>>
    # OBJ(a.internal.klass.super)       # => #~> undefined method `inspect' for #<C:0xb7d70334> (NoMethodError)
    OBJ(a.internal.klass.super.super)   # => A
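The two outcomes above (extending with a module that is already reachable through the class, versus including a fresh one into the singleton class) can be reproduced with plain introspection. A sketch, assuming Ruby 2.1 or later; note that recent versions also list the singleton class itself in ancestors, which 1.8 did not:

```ruby
module B; def foo; "B#foo" end end
module C; def foo; "C#foo" end end
class A
  include B
  def foo; "A#foo" end
end

a = A.new
before = a.singleton_class.ancestors
a.extend B                                  # B is already reachable through A
puts a.singleton_class.ancestors == before  # true: nothing new was inserted
puts a.foo                                  # A#foo

class << a; include C end                   # C is new to the chain
chain = a.singleton_class.ancestors
puts chain.index(C) < chain.index(A)        # true: C now sits below A
puts a.foo                                  # C#foo
```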
I was scanning the article to make sure I knew what I thought I knew and it seems you have an error.
    f.foo #=> "A#foo"
    f.extend C
    f.foo #=> "A#foo" <-- this line surprised me at first. then I went to irb...
it should be "C#foo" if I didn't botch my test.
Too bad I didn't think to format. The error's location is obvious enough though.
(reformatted your code)
Surprising, isn't it? I think "A#foo" is correct (the # => lines should be correct because they get added automatically by my xmp filter). Did you include C into A before extending f? Just to double-check:
    batsman@tux-chan:~/mess/current$ cat foo.rb
    class A; def foo; "A#foo" end end
    class F < A; end
    module C; def foo; "C#foo" end end
    f = F.new
    puts f.foo
    class A; include C end
    f.extend C
    puts f.foo
    batsman@tux-chan:~/mess/current$ ruby -v foo.rb
    ruby 1.8.4 (2005-12-24) [i686-linux]
    A#foo
    A#foo
My irb also agrees.
Ah. right. I scanned too fast. And only did the partial test. Thanks for the pointer!
*1 Indeed, he considers even class << obj; self end somewhat abusive, i.e. singleton classes need not be accessible Ruby objects, so an alternative implementation could do without that.