eigenclass logo
MAIN  Index  Search  Changes  PageRank  Login

A better Kernel#caller

update.png restored

Proposals for extending Kernel#caller come up regularly on ruby-core and ruby-talk. Indeed, the information it returns in textual form isn't as utile and easy to use as it could be, forcing one to parse it:

def foo; bar end
def bar; puts caller; end

 kernel_caller.rb:1:in `foo'

I decided to implement a Kernel#caller substitute for rcov which would provide more information in a more convenient format:

require 'my_caller'
require 'pp'

def foo; bar end
def bar; pp $rec.callstack(2).map{|cs| cs.to_a} end

$rec = CallStackRecorder.new

would print

 [[Object, :bar, "example.rb", 5], [Object, :foo, "example.rb", 4]]

(it actually returns an array of CallSite structs).

Actually, the main reason to implement it was performance-related: Kernel#caller doesn't allow you to specify how deep the backtrace information should be (one how many levels to skip). When creating cross-referenced code coverage repots for Rails applications, Kernel#caller proved to be way too slow since the stack was very deep (over 60 levels...). I didn't need anything past the second stack frame, so Kernel#caller was wasting everybody's time generating useless data. Therefore, the key feature I needed was the ability to get only the top N frames.

I implemented that in C, using Ruby's RUBY_EVENT hooks. I traced all the (c-)call/return events to maintain a call stack for each Ruby thread. That's fairly slow, but I anticipated it would be faster overall than Kernel#caller if the call stack was deep enough, and I was right (rcov's tracing phase was nearly 2X faster than before).

It took about ~500 lines of C, which corresponded broadly to 80 lines of Ruby:

class CallStack
  def initialize(stack = []); @stack = stack end
  def <<(x); @stack << x end
  def pop; @stack.pop end
  def last(n = 1); @stack[-[n,@stack.size].min..-1] end # bomb for n <= 0?
  def tos; @stack.last end
  def to_a; @stack end

CALLSITES = Hash.new{|h,k| h[k] = Hash.new{|h2,k2| h2[k2] = 0 } }

class CallStackRecorder
  CallSite = Struct.new(:klass, :mid, :file, :line)

  def initialize
    @callstack_tbl = Hash.new{|h,k| h[k] = CallStack.new }

  def install_hook 
    ignore = false
    set_trace_func lambda{|event, file, line, mid, binding, klass|
      if [klass, mid] == [CallStackRecorder, :callstack]
        case event
        when 'return': ignore = false; next
        when 'call': ignore = true
      next if ignore
      case event
      when 'c-call', 'c-return': file, line = nil, 0
      case event
      when 'call', 'c-call'
        # this was different in the C implementation, where klass was always
        # the real one (it could even be an ICLASS aka. proxy class)
        receiver = eval("self", binding)
        klass = class << klass; self end unless klass === receiver
        cstack = callstack(1)
        CALLSITES[[klass, mid]][cstack] += 1
        @callstack_tbl[Thread.current] << CallSite.new(klass, mid, file, line)
      when 'return', 'c-return'
        last_site = @callstack_tbl[Thread.current].tos
        if last_site && file && line && last_site.mid == mid && last_site.file == file &&
          last_site.klass == klass
          last_site.line = line

  def remove_hook

  def callstack(depth)
    r = @callstack_tbl[Thread.current].last(depth)


  # works only for the thread that called #install_hook; part of the backtrace
  # is lost for the rest
  def prefill_callstack_tbl
    @callstack_tbl[Thread.current] = CallStack.new format_backtrace_array(caller(2))

  def format_backtrace_array(backtrace)
    backtrace.map do |line|
      md = /^([^:]*)(?::(\d+)(?::in `(.*)'))?/.match(line)
      raise "Bad backtrace format" unless md
      CallSite.new(nil, md[3] ? md[3].to_sym : nil, md[1], (md[2] || '').to_i)

There's a difference between the backtrace info returned by CallStackRecorder#callstack and Kernel#caller's. Can you spot it?*1 I handled that partially in the C code but omitted it from the above snippet.

Another way

After a few hours of C hacking, when all the tests passed and I was about to merge the modifications into the 'head' branch, I checked if the standard Kernel#caller actually accessed private (static, in C terminology, but it's an homophone) internal structures, which would mean my implementation could not operate the same way, and maintaining a call stack myself was indeed the way to go.

It turns out I'd been wrong all that time. I'd written several hundred lines of C for nothing, when this, which gave another >200% speed boost to rcov and took me but a few minutes, would have sufficed:

 static VALUE
 callsite_custom_backtrace(int lev)
   struct FRAME *frame = ruby_frame;
   VALUE ary;
   NODE *n;
   VALUE level;
   VALUE klass;
   ary = rb_ary_new();
   if (frame->last_func == ID_ALLOCATOR) {
           frame = frame->prev;
   for (; frame && (n = frame->node); frame = frame->prev) {
           if (frame->prev && frame->prev->last_func) {
                   if (frame->prev->node == n) continue;
                   level = rb_ary_new();
                   klass = frame->prev->last_class ? frame->prev->last_class : Qnil;
                   if(TYPE(klass) == T_ICLASS) {
                           klass = CLASS_OF(klass);
                   rb_ary_push(level, klass);
                   rb_ary_push(level, ID2SYM(frame->prev->last_func));
                   rb_ary_push(level, rb_str_new2(n->nd_file));
                   rb_ary_push(level, INT2NUM(nd_line(n)));
           else {
                   level = rb_ary_new();
                   rb_ary_push(level, Qnil);
                   rb_ary_push(level, Qnil);
                   rb_ary_push(level, rb_str_new2(n->nd_file));
                   rb_ary_push(level, INT2NUM(nd_line(n)));
           rb_ary_push(ary, level);
           if(--lev == 0)
   return ary;

Sydney and TracePoint - trans (2006-06-06 (Tue) 12:56:08)

Hey another implementation! ;)

Don't know if your aware of Facets' TracePoint. Essentially the same thing and covers the full gamut. I implementated it as a foundation of an event-based AOP library --which worked but using tracing for AOP is terribly inefficent, at least at the Ruby level, nonetheless the TracePoint lib has a particularly nice interface.

The other thing is Sydney's Frame implementation, which I never actually looked at but I suspect is essentially like your example.

It would be nice if Ruby had something built-in of course, perhaps a blend of the best features of these many past attempts.

mfp 2006-06-08 (Thr) 01:29:23

I could have saved a lot of time had I looked for existing implementations or read eval.c before :-|

*1 hint: it's related to blocks