eigenclass logo
MAIN  Index  Search  Changes  PageRank  Login

On GC and finalizers in Ruby, corrected weak hash table implementations

Jens Himmelreich reported a problem with my weak hash table implementations:

You use a hash, but you only save the object_id in the WeakHash. Every time you need the hash, you get it back from the heap by _id2ref. You use a finalizer to be acknowledged, if the hash is garbage-collected.

Sometimes this concept works, sometimes I get a RangeError. I log the finalization and my theory - without reading reference-material about ruby-garbage-collection - is this: The hash is cleared on the heap, but the finalizer is not immediately called. In this gap it's possible to get a RangeError.

I took another look at gc.c, and Jens is right on.

You can find corrected versions of the SimpleWeakHash and WeakHash classes (two weak hash tables with slightly different semantics) below, but I'll first expand on why GC and object finalization can be fairly distant in time.

The GC

Ruby uses a very simple mark&sweep GC, which as its name implies works by first marking all live*1 objects, and then reclaiming all those that remained unmarked.

gc.c is actually one of the easiest core components; it's easier to follow than eval.c, since it doesn't require remembering lots of things (node types and undocumented conventions regarding the way they nest) and it's leaps and bounds simpler than parse.y. So it only took me a few minutes to verify Jens' intuition.

Finalizers

How do finalizers fit in the picture? As the rest of gc.c, this is also very simple. The interpreter just adds the objects being swept for which a finalizer has been defined to a list of... well, objects whose finalizers need to be called. (If you want to see where this happens, read gc_sweep and look for uses of the deferred_final_list and final_list variables). At some point, later in time, Ruby follows that list and executes the finalizers one after the other.

This explains how finalizers are executed, but not when. The code could hardly be more expressive:

void
rb_gc()
{
    garbage_collect();
    rb_gc_finalize_deferred(); /* <==== */
    /* the above call runs the pending finalizers from the list */
}

/*
 *  call-seq:
 *     GC.start                     => nil
 *     gc.garbage_collect           => nil
 *     ObjectSpace.garbage_collect  => nil
 *
 *  Initiates garbage collection, unless manually disabled.
 *
 */

VALUE
rb_gc_start()
{
    rb_gc();
    return Qnil;
}

So finalizers are executed right away when we trigger the GC manually with GC.start.

When GC isn't followed by finalizer execution...

However, things don't always happen that way. Indeed, although coalesced inside rb_gc(), garbage collection and finalizer execution are conceptually distinct, and you can have the former without the latter when the following functions are called:

  • ruby_xmalloc
  • ruby_xrealloc
  • rb_newobj

ruby_x{malloc,realloc} are but {malloc,realloc} substitutes that trigger a GC run when a given amount of memory has been allocated (the underlying assumption is that if enough memory has been malloc()ed there will probably be lots of dead objects).

Now, rb_newobj is called every time a new object is created! So every instantiation can potentially trigger a GC run which won't be immediately followed by a finalizer execution round.

Deferred execution

For the sake of completeness, please indulge some code taken from eval.c:

static VALUE
rb_call0(klass, recv, id, oid, argc, argv, body, flags)
    VALUE klass, recv;
    ID    id;
    ID    oid;
    int argc;                       /* OK */
    VALUE *argv;            /* OK */
    NODE * volatile body;
    int flags;
{
 ....
   static int tick;
 ....
    if ((++tick & 0xff) == 0) {
        CHECK_INTS;         /* better than nothing */
        stack_check();
        rb_gc_finalize_deferred();
    }

This means that the finalizers can be executed up to 256 method calls after the GC, quite a long time...

Corrected weak hash tables

Here's the fixed code for the SimpleWeakHash, as given by Jens (you can read more about how it works here):

class SimpleWeakHash
 def initialize
   @valid = false
 end

 def [](key)
   __get_hash__[key]
 end

 def []=(key, value)
   __get_hash__[key] = value
 end

 private

 def __get_hash__
   old_critical = Thread.critical
   Thread.critical = true
   set_internal_hash unless @valid
   recover_hash
 ensure
   Thread.critical = old_critical
 end

 def recover_hash
   return ObjectSpace._id2ref(@hash_id)
 rescue RangeError
   set_internal_hash
   return ObjectSpace._id2ref(@hash_id)
 end

 def set_internal_hash
   hash = {}
   @hash_id = hash.object_id
   @valid = true
   ObjectSpace.define_finalizer(hash, lambda{ |id| @valid = id !=
   @hash_id })
   hash = nil
 end
end

Unlike the first version, it doesn't assume that finalizers are called right after garbage collection, and rescues RangeError exceptions raised by ObjectSpace._id2ref.

I fixed WeakHash similarly:

class WeakHash
  attr_reader :cache
  def initialize( cache = Hash.new )
    @cache = cache
    @key_map = {}
    @rev_cache = Hash.new{|h,k| h[k] = {}}
    @reclaim_value = lambda do |value_id| 
      if @rev_cache.has_key? value_id
        @rev_cache[value_id].each_key{|key| @cache.delete key}
        @rev_cache.delete value_id
      end
    end
    @reclaim_key = lambda do |key_id|
      if @key_map.has_key? key_id
        @cache.delete @key_map[key_id]
      end
    end
  end

   def []( key )
     value_id = @cache[key]
     return ObjectSpace._id2ref(value_id) unless value_id.nil?
     nil
   rescue RangeError
     nil
   end

   def []=( key, value )
     case key
     when Fixnum, Symbol, true, false
       key2 = key
     else
       key2 = key.dup
     end
     @rev_cache[value.object_id][key2] = true
     @cache[key2] = value.object_id
     @key_map[key.object_id] = key2

     ObjectSpace.define_finalizer(value, @reclaim_value)
     ObjectSpace.define_finalizer(key, @reclaim_key)
   end
end

Moral

Test often, check your assumptions, code carefully, read the sources!



No Title - evan (2007-05-04 (Fri) 05:50:19)

Nice article; it unscared me of the GC implementation. Time for some hacking.

Name:
Comment:
TextFormattingRules
Preview:

Use this form to create a new top-level comment; for direct replies to existing comments, use the text entries you'll find below.

Name:
Subject:

TextFormattingRules
Preview:
Last modified:2007/04/19 09:08:04
Keyword(s):[blog] [ruby] [frontpage] [weakhash] [_id2ref] [ObjectSpace] [GC] [finalizer]
References:

*1 and some which really aren't, but seem so, due to references left in the C stack