
Ruby internals: a self-study guide to the sources

I want to read Ruby's sources; which order is best?

I get asked that question a few times a year, sometimes on ruby-talk and lately in private emails. The last time, I made the extra effort to draft a self-study guide to Ruby's internals. Here's a reformatted version.

Representing Ruby objects, object model basics

Internal representation of Ruby objects

ruby.h
RBasic, RObject, RClass, RFloat, RString, RArray, RRegexp, RHash, RFile, RData and RBignum structs

Keeping track of Ruby objects

gc.c
struct RVALUE and rb_newobj(), struct heaps_slot

Object instantiation

ruby.h
OBJSETUP

(optional diversion) Internal hash tables used by the functions we'll see later on

  • st.h (st.c if you want to see the implementation)

Method dispatching, method cache

eval.c
rb_call(), struct cache_entry, search_method(), rb_get_method_body()

Of special interest is search_method(), which performs the method lookup: you can see how it moves up the class hierarchy (the klass chain).
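
To get a feel for what "moving up the klass chain" means before opening eval.c, here is a minimal sketch in plain Ruby. The helper toy_search_method and the Animal/Dog classes are made up for illustration; this is not a translation of search_method(), and Class#superclass hides singleton and proxy classes, so it only models the plain class-to-superclass walk.

```ruby
# Hypothetical helper: returns the first class in the superclass chain
# that defines `name` directly, or nil if none does.
def toy_search_method(klass, name)
  k = klass
  while k
    # instance_methods(false) lists only the methods defined in k itself
    defined_here = k.instance_methods(false).map { |m| m.to_s }
    return k if defined_here.include?(name.to_s)
    k = k.superclass          # one step up the chain
  end
  nil                         # fell off the top: no such method
end

class Animal
  def speak; "..."; end
end

class Dog < Animal
  def fetch; "stick"; end
end

p toy_search_method(Dog, :fetch)           # Dog
p toy_search_method(Dog, :speak)           # Animal
p toy_search_method(Dog, :no_such_method)  # nil
```

The real interpreter also caches the result of a lookup (that is what struct cache_entry is for), which this sketch leaves out.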

Singleton classes

class.c
rb_singleton_class()

See how singleton classes are inserted in the klass chain.
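
The effect is easy to observe from Ruby itself. The sketch below uses a made-up Greeter class and a hypothetical helper, toy_singleton_of; it only shows the visible behaviour, not what rb_singleton_class() does internally.

```ruby
class Greeter
  def hello; "hello"; end
end

# Hypothetical helper: `class << obj; self; end` evaluates to the
# object's singleton class, creating it if needed.
def toy_singleton_of(obj)
  class << obj; self; end
end

a = Greeter.new
b = Greeter.new

# A method defined on a single object lives in that object's singleton class.
def a.hello; "hi, I'm special"; end

singleton = toy_singleton_of(a)
p singleton.superclass == Greeter   # true: the singleton sits in front of Greeter
p a.class == Greeter                # true: Object#class skips the singleton class
p a.hello                           # "hi, I'm special"
p b.hello                           # "hello" (b is unaffected)
```

Since the singleton class comes first in a's chain, the lookup finds the per-object method before it ever reaches Greeter.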

Mixins

class.c
rb_include_module(), include_class_new()

The meaning of ICLASSes (proxy classes).
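
The proxy classes are invisible from Ruby, but their effect is not. A small sketch, with Walkable, Creature and Robot being made-up names:

```ruby
module Walkable
  def walk; "walking"; end
end

class Creature; end

class Robot < Creature
  include Walkable
end

# The module shows up between Robot and its superclass in the lookup order...
p Robot.ancestors.first(3)   # [Robot, Walkable, Creature]
# ...yet the Ruby-level superclass is unchanged: the proxy is hidden.
p Robot.superclass           # Creature

# A method added to the module after it was included is still found.
module Walkable
  def run; "running"; end
end
p Robot.new.run              # "running"
```

The last line is consistent with the proxy referring to the module's method table rather than copying it, which is the thing to look for in include_class_new().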

Adding methods

eval.c
rb_add_method()

How methods are added to the m_tbl table of a klass.
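
Seen from Ruby, the consequence is that methods belong to the class, not to its instances. A rough sketch with a made-up Counter class; instance_methods(false) is only a Ruby-level view and says nothing about the table's actual layout:

```ruby
class Counter; end

early = Counter.new   # created while Counter has no methods of its own

class Counter
  def tick; :tick; end
end

# define_method is private in older Rubies, so go through send.
Counter.send(:define_method, :tock) { :tock }

# Both methods now belong to Counter itself...
p Counter.instance_methods(false).map { |m| m.to_s }.sort   # ["tick", "tock"]
# ...so an instance created earlier sees them too.
p early.tick   # :tick
p early.tock   # :tock
```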

Instance variables

variable.c
rb_ivar_set(), rb_ivar_get()

How instance variables are stored.
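
Unlike methods, instance variables are per object. The sketch below shows this from the Ruby side; Point and the helper toy_ivar_names are made up, and the reflection calls reveal nothing about the storage itself, which is what rb_ivar_set() and rb_ivar_get() are worth reading for.

```ruby
class Point
  def initialize(x); @x = x; end
end

# Hypothetical helper: sorted instance variable names as strings
# (older Rubies return strings, newer ones symbols).
def toy_ivar_names(obj)
  obj.instance_variables.map { |v| v.to_s }.sort
end

p1 = Point.new(1)
p2 = Point.new(2)
p2.instance_variable_set(:@label, "second")   # only p2 gets @label

p toy_ivar_names(p1)                 # ["@x"]
p toy_ivar_names(p2)                 # ["@label", "@x"]
p p2.instance_variable_get(:@x)      # 2
```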


At this point, you can read object.c and variable.c to understand most of the object model, peeking into eval.c as needed for the functions you'll see referenced there.


Evaluating Ruby code

Basic nodes

node.h
struct RNode/NODE, enum node_type
eval.c
quick look at rb_eval()

This is the core of the interpreter. Here are some branches of the big switch statement you can read to get the gist (chosen because they rely on concepts covered earlier in this guide):

  • NODE_TRUE/NODE_FALSE
  • NODE_IVAR
  • NODE_IASGN
  • NODE_IF
  • NODE_SCLASS
  • NODE_DEFN

More complex nodes

eval.c
rb_eval()

Further study of the interpreter, taking rb_eval() as the starting point. Some easy NODEs to begin with:

  • NODE_CLASS: defining new classes
  • NODE_LASGN, NODE_GASGN, NODE_DASGN, NODE_CVAR, NODE_CONST: locals, globals, dynamic variables, class variables, constants...

The hardest ones are those that handle exceptions and blocks.

Core classes

You can read for instance array.c, hash.c and string.c to see how the core classes are implemented.

Take the class you like, scroll down to the Init_xxx() function and locate the C function that implements the method you want to study. No particular order required.

Harder stuff

The most complex parts come last.

The GC

gc.c
gc_mark(), gc_sweep(), obj_free()

It's a fairly straightforward mark-and-sweep GC, so you'll have no problem understanding it if you know about GCs.
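
If you don't know about GCs, the two phases fit in a few lines. This is a toy model written in Ruby, not a rendering of gc.c: ToyObj, toy_mark, toy_sweep and toy_gc are all made-up names, and the "heap" is just an array.

```ruby
# A toy heap object: a name, outgoing references and a mark bit.
class ToyObj
  attr_reader :name, :refs
  attr_accessor :marked
  def initialize(name)
    @name, @refs, @marked = name, [], false
  end
end

# Mark phase: flag everything reachable from obj.
def toy_mark(obj)
  return if obj.marked        # already visited; this also stops on cycles
  obj.marked = true
  obj.refs.each { |child| toy_mark(child) }
end

# Sweep phase: unmarked objects are garbage; survivors get their mark cleared.
def toy_sweep(heap)
  live, dead = heap.partition { |o| o.marked }
  live.each { |o| o.marked = false }
  [live, dead]
end

def toy_gc(heap, roots)
  roots.each { |r| toy_mark(r) }
  toy_sweep(heap)
end

a, b, c, d = %w[a b c d].map { |n| ToyObj.new(n) }
b.refs << a                   # b -> a
c.refs << d; d.refs << c      # c <-> d: a cycle nobody else points to

live, dead = toy_gc([a, b, c, d], [b])
p live.map { |o| o.name }     # ["a", "b"]
p dead.map { |o| o.name }     # ["c", "d"]
```

Note that the unreachable cycle is collected, since marking starts from the roots and never gets to it.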

Parsing

Time to take a look at parse.y.

Concentrate on how the AST is built to begin with.

The YACC grammar is tricky, and when combined with yylex it makes for a fairly difficult read, so skip this unless you specifically want to mess with the grammar.

Threading

eval.c
rb_thread_schedule(), rb_thread_restore_context()

The implementation of green (userspace) threads (you need to know setjmp and friends).




Last modified:2006/09/15 04:54:01