Ruby internals: a self-study guide to the sources
I want to read Ruby's sources; which order is best?
I answer that question a few times a year, sometimes on ruby-talk and, as of late, in response to private emails. The last time, I took the extra effort to draft a self-study guide to Ruby's internals. Here's a reformatted version.
- Representing Ruby objects, object model basics
- Internal representation of Ruby objects
- Keeping track of Ruby objects
- Object instantiation
- (optional diversion) Internal hash tables used by the functions we'll see later on
- Method dispatching, method cache
- Singleton classes
- Adding methods
- Instance variables
- Evaluating Ruby code
- Core classes
- Harder stuff
- RBasic, RObject, RClass, RFloat, RString, RArray, RRegexp, RHash, RFile, RData and RBignum structs
- struct RVALUE and rb_newobj(), struct heaps_slot
- st.h (st.c if you want to see the implementation)
- rb_call(), struct cache_entry, search_method(), rb_get_method_body()
Of special interest is search_method(), which performs method lookup: you can see how it moves up the class hierarchy (the klass chain).
How singleton classes are inserted in the klass chain.
- rb_include_module(), include_class_new()
The meaning of ICLASSes (proxy classes).
How methods are added to the m_tbl table of a klass.
- rb_ivar_set(), rb_ivar_get()
How instance variables are stored.
At this point, you can read object.c and variable.c to understand most of the object model, peeking into eval.c as needed for the functions you'll see referenced there.
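To make the lookup described above a bit more concrete, here is a minimal, self-contained sketch of a klass chain walk in the spirit of search_method(). It is not the code from eval.c: every name in it (toy_class, toy_search_method(), toy_make_singleton() and so on) is hypothetical, a fixed-size array stands in for the st.h hash table behind m_tbl, and a singleton class is simply modelled as an extra class placed in front of the object's original class.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAX_METHODS 8

/* Hypothetical stand-in for a method body: just a C function pointer. */
typedef int (*toy_method_fn)(void);

struct toy_method {
    const char *name;
    toy_method_fn body;
};

/* Toy class: a method table plus a link to the next class in the chain. */
struct toy_class {
    const char *name;
    struct toy_class *super;                   /* next link in the klass chain */
    struct toy_method m_tbl[TOY_MAX_METHODS];  /* array instead of a hash table */
    size_t n_methods;
};

/* Toy object: reaches its methods through its klass pointer. */
struct toy_object {
    struct toy_class *klass;
};

/* Add a method to a class's table; returns 0 on success, -1 if it is full. */
int toy_add_method(struct toy_class *c, const char *name, toy_method_fn body)
{
    if (c->n_methods >= TOY_MAX_METHODS) return -1;
    c->m_tbl[c->n_methods].name = name;
    c->m_tbl[c->n_methods].body = body;
    c->n_methods++;
    return 0;
}

/* Walk up the chain starting at klass. If origin is non-NULL, it receives
   the class where the method was found. Returns NULL when nothing matches. */
toy_method_fn toy_search_method(struct toy_class *klass, const char *name,
                                struct toy_class **origin)
{
    struct toy_class *c;
    size_t i;

    for (c = klass; c != NULL; c = c->super) {
        for (i = 0; i < c->n_methods; i++) {
            if (strcmp(c->m_tbl[i].name, name) == 0) {
                if (origin) *origin = c;
                return c->m_tbl[i].body;
            }
        }
    }
    return NULL;
}

/* Model a singleton class: a class inserted between the object and its
   original class, so that its methods are found first. */
void toy_make_singleton(struct toy_object *obj, struct toy_class *singleton)
{
    singleton->super = obj->klass;
    obj->klass = singleton;
}

static int toy_hello_base(void)      { return 1; }
static int toy_hello_singleton(void) { return 2; }

/* Small demo: returns the result of calling "hello" after a singleton
   class that redefines it has been inserted, or -1 on any failure. */
int toy_demo(void)
{
    struct toy_class base = {0}, meta = {0};
    struct toy_object obj;
    toy_method_fn m;

    base.name = "Base";
    meta.name = "(singleton)";
    toy_add_method(&base, "hello", toy_hello_base);
    obj.klass = &base;

    m = toy_search_method(obj.klass, "hello", NULL);  /* found in Base */
    if (m == NULL || m() != 1) return -1;

    toy_add_method(&meta, "hello", toy_hello_singleton);
    toy_make_singleton(&obj, &meta);
    m = toy_search_method(obj.klass, "hello", NULL);  /* found in the singleton */
    return m ? m() : -1;
}
```

Calling toy_demo() exercises both cases: the first lookup stops at Base, the second stops one link earlier at the inserted singleton class. The real sources add a method cache on top of this walk, which the sketch leaves out entirely.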
- struct RNode/NODE, enum node_type
- quick look at rb_eval()
This is the core of the interpreter. Read a few branches of its big switch statement to get the gist; the ones listed below were chosen because they rely on concepts you have already met if you followed this guide.
Further study of the interpreter, taking rb_eval() as the starting point. Some easy NODEs to begin with:
- NODE_CLASS: defining new classes
- NODE_LASGN, NODE_GASGN, NODE_DASGN, NODE_CVAR, NODE_CONST: locals, globals, dynamic variables, class variables, constants...
The hardest ones are those that handle exceptions and blocks.
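If the idea of "one big switch over node types" is new to you, the following toy evaluator may help before you open eval.c. It is only a sketch of the general technique, not a simplification of rb_eval() itself: the node types (TOY_LIT, TOY_LASGN, TOY_LVAR, TOY_ADD, TOY_IF), the struct layouts and the function names are all made up for illustration, and values are plain longs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node types, loosely inspired by the NODE_xxx naming. */
enum toy_node_type { TOY_LIT, TOY_LASGN, TOY_LVAR, TOY_ADD, TOY_IF };

struct toy_node {
    enum toy_node_type type;
    long value;                  /* TOY_LIT: the literal; TOY_LASGN/TOY_LVAR: slot index */
    struct toy_node *a, *b, *c;  /* children; their meaning depends on type */
};

#define TOY_NLOCALS 4

/* A frame holding the local variable slots. */
struct toy_frame {
    long locals[TOY_NLOCALS];
};

/* Recursive tree walk: one switch branch per node type. */
long toy_eval(struct toy_frame *f, const struct toy_node *n)
{
    if (n == NULL) return 0;  /* a missing child evaluates to 0 in this toy */

    switch (n->type) {
      case TOY_LIT:
        return n->value;
      case TOY_LASGN:  /* local assignment: evaluate child a, store it in the slot */
        assert(n->value >= 0 && n->value < TOY_NLOCALS);
        return f->locals[n->value] = toy_eval(f, n->a);
      case TOY_LVAR:   /* local variable read */
        assert(n->value >= 0 && n->value < TOY_NLOCALS);
        return f->locals[n->value];
      case TOY_ADD: {  /* evaluate left before right, explicitly */
        long lhs = toy_eval(f, n->a);
        long rhs = toy_eval(f, n->b);
        return lhs + rhs;
      }
      case TOY_IF:     /* a: condition, b: then branch, c: else branch */
        return toy_eval(f, n->a) ? toy_eval(f, n->b) : toy_eval(f, n->c);
    }
    return 0;
}

/* Demo: evaluates "x = 2" and then "if x then x + 40 else 0". */
long toy_eval_demo(void)
{
    struct toy_frame f = { {0} };
    struct toy_node two   = { TOY_LIT,   2,  NULL,   NULL,   NULL };
    struct toy_node forty = { TOY_LIT,   40, NULL,   NULL,   NULL };
    struct toy_node zero  = { TOY_LIT,   0,  NULL,   NULL,   NULL };
    struct toy_node set_x = { TOY_LASGN, 0,  &two,   NULL,   NULL };
    struct toy_node get_x = { TOY_LVAR,  0,  NULL,   NULL,   NULL };
    struct toy_node sum   = { TOY_ADD,   0,  &get_x, &forty, NULL };
    struct toy_node cond  = { TOY_IF,    0,  &get_x, &sum,   &zero };

    toy_eval(&f, &set_x);
    return toy_eval(&f, &cond);
}
```

Exceptions and blocks are hard precisely because they don't fit this simple "evaluate the children, combine the results" shape, which is why they are best left for later.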
You can read for instance array.c, hash.c and string.c to see how the core classes are implemented.
Take the class you like, scroll down to the Init_xxx() function and locate the C function that implements the method you want to study. No particular order required.
The more complex parts come last.
- gc_mark(), gc_sweep(), obj_free()
It's a fairly straightforward mark-and-sweep GC, so you'll have no problem understanding it if you know about GCs.
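If you have not met mark-and-sweep before, here is a toy version of the technique to read first. It is a sketch under heavy assumptions, not a reduction of gc.c: the heap is a fixed array of eight slots, each object has at most two outgoing references, the roots are handed in explicitly, and every name (toy_obj, toy_gc_mark(), toy_gc_sweep(), ...) is hypothetical.

```c
#include <stddef.h>

#define TOY_HEAP_SLOTS 8

struct toy_obj {
    int in_use;                 /* slot currently holds a live allocation */
    int marked;                 /* set during the mark phase */
    struct toy_obj *ref[2];     /* outgoing references (may be NULL) */
    struct toy_obj *next_free;  /* free-list link while the slot is unused */
};

static struct toy_obj toy_heap[TOY_HEAP_SLOTS];
static struct toy_obj *toy_freelist;

/* Put every slot on the free list. */
void toy_gc_init(void)
{
    size_t i;

    toy_freelist = NULL;
    for (i = 0; i < TOY_HEAP_SLOTS; i++) {
        toy_heap[i].in_use = 0;
        toy_heap[i].marked = 0;
        toy_heap[i].ref[0] = toy_heap[i].ref[1] = NULL;
        toy_heap[i].next_free = toy_freelist;
        toy_freelist = &toy_heap[i];
    }
}

/* Take a slot from the free list; NULL means the heap is exhausted. */
struct toy_obj *toy_newobj(void)
{
    struct toy_obj *o = toy_freelist;

    if (o == NULL) return NULL;
    toy_freelist = o->next_free;
    o->in_use = 1;
    o->marked = 0;
    o->ref[0] = o->ref[1] = NULL;
    o->next_free = NULL;
    return o;
}

/* Mark phase: flag everything reachable from o. The "already marked"
   check is also what stops the recursion on reference cycles. */
void toy_gc_mark(struct toy_obj *o)
{
    if (o == NULL || o->marked) return;
    o->marked = 1;
    toy_gc_mark(o->ref[0]);
    toy_gc_mark(o->ref[1]);
}

/* Sweep phase: return unmarked slots to the free list and clear the marks
   of the survivors. Returns the number of slots freed. */
size_t toy_gc_sweep(void)
{
    size_t i, freed = 0;

    for (i = 0; i < TOY_HEAP_SLOTS; i++) {
        struct toy_obj *o = &toy_heap[i];
        if (!o->in_use) continue;
        if (o->marked) {
            o->marked = 0;
            continue;
        }
        o->in_use = 0;
        o->next_free = toy_freelist;
        toy_freelist = o;
        freed++;
    }
    return freed;
}

/* One full collection: mark from the given roots, then sweep. */
size_t toy_gc(struct toy_obj **roots, size_t n_roots)
{
    size_t i;

    for (i = 0; i < n_roots; i++) toy_gc_mark(roots[i]);
    return toy_gc_sweep();
}
```

With this picture in mind, the interesting questions when reading the real collector are where the roots come from and what has to be released per object type, which is what obj_free() deals with.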
Time to take a look at parse.y.
Concentrate on how the AST is built to begin with.
The YACC grammar is tricky, and when combined with yylex it makes for a fairly difficult read, so skip this unless you specifically want to mess with the grammar.
- rb_thread_schedule(), rb_thread_restore_context()
The implementation of green (userspace) threads (you need to know setjmp and friends).
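As a refresher on setjmp and friends, here is a minimal sketch of the primitive on its own: a "scheduler" saves its context with setjmp(), and each task hands control back with longjmp() instead of returning. All names are hypothetical. Note what it does not show: the tasks here run to their jump point and are never resumed, so no per-thread stack has to be preserved, and preserving that state is a large part of what the real thread code has to worry about.

```c
#include <setjmp.h>

static jmp_buf toy_scheduler_ctx;  /* saved context of the scheduler loop */
static int toy_steps_run;          /* how many tasks have run so far */

/* A task does its "work" and then jumps straight back to the scheduler.
   id must be non-zero so that setjmp() can tell the jump from the
   initial call. */
static void toy_task(int id)
{
    toy_steps_run++;
    longjmp(toy_scheduler_ctx, id);  /* does not return */
}

/* Run tasks 1..n_tasks one after another; returns how many ran. */
int toy_run_tasks(int n_tasks)
{
    /* volatile: this local is modified after setjmp() and read again
       after longjmp(), so it must not be kept only in a register. */
    volatile int next = 1;

    toy_steps_run = 0;
    if (setjmp(toy_scheduler_ctx) != 0) {
        /* We land here every time a task jumps back. */
        next++;
    }
    if (next <= n_tasks) {
        toy_task(next);  /* control comes back through setjmp() above */
    }
    return toy_steps_run;
}
```

The setjmp() call is deliberately used as the whole condition of the if statement, since the C standard only permits it in a few such contexts.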