Chapter 4
Classes and Modules

The following six C functions constitute the principal API for defining Classes, Modules and Methods.

  1. rb_define_class
  2. rb_define_class_under
  3. rb_define_module
  4. rb_define_module_under
  5. rb_define_method
  6. rb_define_singleton_method

There exist derivatives of these for special cases; however, the majority of the API Library is built with these functions.

We should stress that these routines are used primarily for implementing the Ruby API. For example, rb_define_class is used to create the Built-in Class Objects. However, all of these routines are available for creating Ruby Extensions in 'C'.

Defining of a Class

rb_define_class defines a new Top-Level Class with a given name and superclass. Classes and Superclasses for built-in types are of the form: rb_cXxxxxx, rb_mXxxxxx and rb_eXxxxxx. For example, to inherit from Class Object use rb_cObject. Top-level Classes are Classes that are NOT nested inside another Class. They CAN appear at any level within the inheritance tree.

VALUE
rb_define_class(name, super)
    const char *name;
    VALUE super;
{
    VALUE klass;
    ID id;

    id = rb_intern(name);
    if (rb_const_defined(rb_cObject, id)) {
        klass = rb_const_get(rb_cObject, id);
        if (TYPE(klass) != T_CLASS) {
            rb_raise(rb_eTypeError, "%s is not a class", name);
        }
        if (rb_class_real(RCLASS(klass)->super) != super) {
            rb_name_error(id, "%s is already defined", name);
        }
        return klass;
    }
    if (!super) {
        rb_warn("no super class for `%s', Object assumed", name);
    }
    klass = rb_define_class_id(id, super);
    st_add_direct(rb_class_tbl, id, klass);
    rb_name_class(klass, id);
    rb_const_set(rb_cObject, id, klass);
    rb_class_inherited(super, klass);

    return klass;
}

(class.c)

When creating a Class, more than one Object needs to be created. For every Class Object created, a Constant object must be created to hold a reference to the Class Object. Additionally, an ID Value must be created for the Class and Constant Name. The Name/ID pair is stored in two Hash Tables, sym_tbl and sym_rev_tbl. These tables allow quick conversions from ID to Name and Name to ID.

These actions are performed by calling rb_intern(name). It first tries to look up the Name in sym_tbl. If it is found, the ID value from this table is returned. If not, the function generates an ID of the appropriate type for the Name.

Additionally, the Name/Object Address pair must be stored in the rb_class_tbl for all defined Classes.

Identifiers (IDs) are a composite of two things: an Internal Reference number and a Scope Identifier. Furthermore, the Internal Reference represents two different classes of items, Object References and Internal_Token References. The difference between these items is the magnitude of the reference number. The list of Built-in Token Symbols is maintained internally, and the length of this list is a fixed value represented by tLAST_TOKEN. Internal References greater than this value are Object References, and values that are lower are Internal_Token References.

The Scope Identifier is represented by the lower three (3) bits of the Internal Reference (ID). Its value is one of the following:

#define ID_SCOPE_SHIFT 3         /* Shift Left to make room for ID */
#define ID_SCOPE_MASK 0x07       /* Mask for Scope ID */

#define ID_LOCAL    0x01
#define ID_INSTANCE 0x02
#define ID_GLOBAL   0x03
#define ID_ATTRSET  0x04
#define ID_CONST    0x05
#define ID_CLASS    0x06
#define ID_JUNK     0x07

#define ID_INTERNAL ID_JUNK      /* Unidentified Scope Identifier */

(parse.y)
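As a rough illustration of the layout described above, the following Ruby sketch packs a reference number and a Scope Identifier into one integer and takes it apart again, using the constants from parse.y. The helpers make_id, id_scope and id_reference are hypothetical names invented for this example; they are not functions of the interpreter.

```ruby
# Constants copied from the parse.y excerpt above.
ID_SCOPE_SHIFT = 3
ID_SCOPE_MASK  = 0x07

ID_LOCAL    = 0x01
ID_INSTANCE = 0x02
ID_GLOBAL   = 0x03
ID_ATTRSET  = 0x04
ID_CONST    = 0x05
ID_CLASS    = 0x06
ID_JUNK     = 0x07

# Hypothetical helper: shift the reference left and put the scope in the low bits.
def make_id(reference, scope)
  (reference << ID_SCOPE_SHIFT) | scope
end

# Hypothetical helper: the low three bits hold the Scope Identifier.
def id_scope(id)
  id & ID_SCOPE_MASK
end

# Hypothetical helper: the remaining bits hold the reference number.
def id_reference(id)
  id >> ID_SCOPE_SHIFT
end

id = make_id(500, ID_CONST)
puts format("id=0x%x scope=%d reference=%d", id, id_scope(id), id_reference(id))
```

The reference number 500 is arbitrary; any non-negative integer survives the round trip because the scope occupies only the three low bits.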

At this point no objects have been created, but we have loaded or created a Name/ID pair. Before creating the actual objects, rb_define_class must check a couple of error and warning conditions.

Now that a valid ID has been returned by rb_intern(), the function checks to see if a Constant with the same name is already defined. If it is defined, the type of klass must be T_CLASS and its super value must equal the requested value (second parameter). If not, an error is raised and processing is aborted.

If no errors are detected, the super entry is inspected. If it is empty, a pointer to rb_cObject is used and a warning message is displayed!
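Similar checks can be observed from Ruby code when a class is reopened with the class keyword. This is only a Ruby-level sketch: the class keyword does not go through rb_define_class, and it reports both problems as TypeError. The names ReopenDemo, OtherBaseDemo and NotAClassDemo are made up for the example.

```ruby
class OtherBaseDemo; end
class ReopenDemo; end        # first definition, superclass is Object
NotAClassDemo = 42           # a constant that does not hold a class

# Reopening without changing the superclass yields the existing class object.
first = ReopenDemo
class ReopenDemo; end
same_object = first.equal?(ReopenDemo)

# Reopening with a different superclass is rejected.
mismatch = begin
  class ReopenDemo < OtherBaseDemo; end
  nil
rescue TypeError => e
  e.message
end

# A constant that is not a class cannot be reopened as one.
not_class = begin
  class NotAClassDemo; end
  nil
rescue TypeError => e
  e.message
end

puts same_object
puts mismatch
puts not_class
```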

OK boys and girls, we are finally ready to create the objects.

The function rb_define_class_id(...) creates a new class object and metaclass object. Most of the work is done by the function rb_class_boot.

Creating a Class

  1. Request an Object-sized (20 bytes) block of memory in ObjectSpace. The memory that constitutes ObjectSpace is managed by the Garbage Collector (see Chapter five for a detailed explanation of Garbage Collection).
  2. Set the block to zero.
  3. Initialize the Class Object: xx->klass is set to "rb_cClass" and xx->flags is set to "T_CLASS".
  4. Set the xx->super entry to the input parameter super.
  5. Set xx->iv_tbl to zero.
  6. Initialize xx->m_tbl as an empty numeric Hash Table.
  7. OR xx->flags with the Taint bit of super's flags, so that the new class inherits super's taint status.
  8. Create a Metaclass and insert it between this class and the defined klass entry.
  9. Set "__classid__" in xx->iv_tbl with the value of rb_id2name(id).
  10. Add the name/id pair to rb_class_tbl and rb_cObject's iv_tbl.

Creating a Metaclass

Metaclasses are similarly constructed:

  1. Request an Object-sized block of memory from ObjectSpace.
  2. Set the block to zero.
  3. Initialize the Class Object: xx->klass is set to "rb_cClass" and xx->flags is set to "T_CLASS".
  4. Set the xx->super entry to the input parameter super.
  5. Set xx->iv_tbl to zero.
  6. Initialize xx->m_tbl as an empty numeric Hash Table.
  7. Set the FL_SINGLETON Flag in xx->flags.
  8. Insert the entry "__attached__" into the Singleton's iv_tbl. Its value is a reference to the associated class.
  9. Return the address of the created Metaclass.
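The outcome of these two procedures can be inspected from Ruby itself. The sketch below assumes a reasonably recent Ruby (Module#singleton_class? is needed) and only shows the visible result, not the memory layout described above.

```ruby
klass = Class.new(Object)        # anonymous class with Object as superclass
meta  = klass.singleton_class    # the metaclass attached to it

puts klass.class                              # class objects are instances of Class
puts klass.superclass                         # the requested superclass
puts klass.instance_methods(false).inspect    # the method table starts out empty
puts klass.singleton_class?                   # an ordinary class
puts meta.singleton_class?                    # the metaclass carries the singleton flag
```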

The following is a minimal definition for a class named TestCase.

  class TestCase
    def initialize
    end
  end

Figure 1a is the node structure before the Singleton is created; Figure 1b is the completed node structure.

Figure 1: rb_define_class(...);

Defining of a Nested Class

The function rb_define_class_under defines a class that is subordinate, or nested, inside another class. An example of such a nested class is Stat.


void
Init_File(void)
{
      :          :          :          :
    rb_cFile = rb_define_class("File", rb_cIO);
      :          :          :          :
    /* Define Methods and other functions for Class File */
      :          :          :          :
    rb_cStat = rb_define_class_under(rb_cFile, "Stat", rb_cObject);
      :          :          :          :
    /* Define Methods and other functions for Nested Class Stat */
}

(file.c)

The difference between rb_define_class_under and rb_define_class is that the '_under' variant places the name of the class/constant under the Outer Class, not under rb_cObject!

A simplified and incomplete Ruby Program Equivalent is:

  class File < IO
    class Stat < Object
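A complete, self-contained analogue with hypothetical names (OuterDemo and InnerDemo) shows where the constant ends up. It is only meant to illustrate the placement of the name, not the File/Stat definitions themselves.

```ruby
class OuterDemo
  class InnerDemo < Object
  end
end

# The second argument `false` restricts the lookup to the receiver itself.
puts OuterDemo.const_defined?(:InnerDemo, false)   # the constant lives under the outer class
puts Object.const_defined?(:InnerDemo, false)      # and not at the top level
puts OuterDemo::InnerDemo.name
```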

Definition of a Module

The processing for rb_define_module differs from rb_define_class in several ways. Since modules do not use the super entry, the processing steps associated with the super value are not needed. The type field of the class/module structure is set to T_MODULE instead of T_CLASS. The following is the general processing for creating modules.

Creating a Module

  1. Request an Object-sized (20 bytes) block of memory in ObjectSpace.
  2. Set the block to zero.
  3. Initialize the Class Object: xx->klass is set to "rb_cModule" and xx->flags is set to "T_MODULE".
  4. Set the xx->super entry to zero.
  5. Set xx->iv_tbl to zero.
  6. Initialize xx->m_tbl as an empty numeric Hash Table.
  7. Set "__classid__" in xx->iv_tbl with the value of rb_id2name(id).
  8. Add the name/id pair to rb_class_tbl and rb_cObject's iv_tbl.

An example of a module definition is as follows:

void
Init_Enumerable(void)
{
    rb_mEnumerable = rb_define_module("Enumerable");

    rb_define_method(rb_mEnumerable, "to_a", enum_to_a, 0);
    /* Define Methods and other functions for Module Enumerable */
}

(enum.c)

Note that the naming convention rb_mXxxxx is used.   The 'm' indicates a Module object reference, whereas a 'c' indicates a Class reference.

When a module is included, it is inserted into the inheritance tree between the current class and the superclass. Multiple module inclusions all take precedence over the superclass.
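The effect on method lookup order can be checked with Module#ancestors. The classes and modules below are hypothetical and exist only for this sketch.

```ruby
module WalkDemo; end
module SwimDemo; end
class AnimalDemo; end

class DuckDemo < AnimalDemo
  include WalkDemo
  include SwimDemo   # included last, so it is searched before WalkDemo
end

# Both modules sit between the class and its superclass.
puts DuckDemo.ancestors.take(4).inspect
# The superclass reported by Ruby is unchanged by the inclusions.
puts DuckDemo.superclass
```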

Figure 2:   Module inclusion;

Definition of a Method

The function rb_define_method is used extensively. The following is an example of defining a method, in this case Array#to_s:

rb_define_method(rb_cArray, "to_s", rb_ary_to_s, 0);

(array.c)

This function uses rb_intern to convert the name ("to_s") into an ID. The ID is then associated with the function pointer, rb_ary_to_s. The fourth argument indicates the number of arguments required by the function.

The equivalent Ruby code for defining a method within Class Array would be:

  class Array < Object
    def to_s
      # rb_ary_to_s() function code
    end
  end

The following shows a definition that takes one(1) argument.

rb_define_method(rb_cArray, "concat", rb_ary_concat, 1); (array.c)

The equivalent Ruby code for defining this method would be:

  class Array < Object
    def concat(str)
      # method implementation code
    end
  end
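The argument count given to rb_define_method has a visible counterpart in Ruby: Method#arity. The sketch below uses a hypothetical class ArityDemo written in plain Ruby. The value -1 corresponds to the argc of -1 used for methods that accept a variable number of arguments.

```ruby
class ArityDemo
  def no_args; end            # comparable to argc 0
  def one_arg(other); end     # comparable to argc 1
  def any_args(*args); end    # comparable to argc -1
end

%i[no_args one_arg any_args].each do |name|
  puts "#{name}: #{ArityDemo.instance_method(name).arity}"
end
```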

Class Initialization

When a Class or Module is Instantiated, it needs to be initialized. In the case of user-defined Classes and Modules, the "initialize" method handles this chore. Built-in Classes and Modules must also be initialized, and they all have an Init_xxxx() function that performs the initialization task.

void
Init_Array(void)
{
    rb_cArray = rb_define_class("Array", rb_cObject);
    rb_include_module(rb_cArray, rb_mEnumerable);

    rb_define_alloc_func(rb_cArray, ary_alloc);
    rb_define_singleton_method(rb_cArray, "[]", rb_ary_s_create, -1);
    rb_define_method(rb_cArray, "initialize", rb_ary_initialize, -1);
    rb_define_method(rb_cArray, "initialize_copy", rb_ary_replace, 1);

    rb_define_method(rb_cArray, "to_s", rb_ary_to_s, 0);
    rb_define_method(rb_cArray, "inspect", rb_ary_inspect, 0);
    rb_define_method(rb_cArray, "to_a", rb_ary_to_a, 0);
    :: :: :: :: :: :: :: ::
    id_cmp = rb_intern("<=>");
    rb_cValues = rb_define_class("Values", rb_cArray);
}

(array.c)

Ruby calls these initialization functions when it starts, via rb_call_inits().

void
rb_call_inits(void)
{
    Init_sym();
    Init_var_tables();
    Init_Object();
    Init_Comparable();
    :: :: :: ::
    Init_marshal();
    Init_Enumerator();
    Init_version();
}

(inits.c)

The installed library and extended libraries must also be initialized. The complete explanation of these operations is contained in Chapter 18. Loading a library:

require "somelibrary"

The loading mechanism can load either compiled "C" object files (*.o, *.so) or Ruby files (*.rb). In the case of compiled libraries, the same Init_xxxx() function is expected. With Ruby Libraries the Classes and Modules defined should have an "initialize" Method.

The extended library stringio is an example of a compiled library element. Notice that it is initialized in the same manner as built-in classes. Examine the items in the ext directory in the Ruby Source Directory.

void
Init_stringio()
{
    VALUE StringIO = rb_define_class("StringIO", rb_cData);

    rb_include_module(StringIO, rb_mEnumerable);
    rb_define_alloc_func(StringIO, strio_s_allocate);
    :: :: :: :: :: :: :: ::
    rb_define_method(StringIO, "truncate", strio_truncate, 1);
}

(ext/stringio.c)

Singleton Creation

For each class generated by rb_define_class, a singleton class is also generated. In addition, Singleton Classes can be explicitly created.

Definition of Singleton Methods

In addition to class method definitions as described in the previous sections, Ruby also allows definition of Singleton Methods.   Singleton Methods are methods that are only defined for a particular Instance of a class.

rb_define_singleton_method(rb_cFile,"link",rb_file_s_link, 2); (file.c)

Singleton classes

rb_define_singleton_method()

Now that the creation of normal methods is understood, let us tackle singleton methods. A normal method is created and its name is registered in the class's m_tbl. But a singleton method is a different animal.

void
rb_define_singleton_method(VALUE obj, const char *name, VALUE (*func)(), int argc)
{
    rb_define_method(rb_singleton_class(obj), name, func, argc);
}

(class.c)

As explained, rb_define_method is the normal method definition function.  The difference is the replacement of the Class Name parameter with the rb_singleton_class function.  Briefly, this function creates a Singleton Class for the indicated Class, providing it does not already exist, then populates it with the method specified.

A Unique Class is formally referred to as a Singleton Class. It should be understood that it is a virtual class: it cannot be sub-classed or instantiated. Its only purpose is to provide a place to mount methods and instance variables that are uniquely associated with the specified Class Name.
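Seen from Ruby, the effect is that a singleton method exists on one object only and is stored in that object's singleton class. The class GreeterDemo and the method hello are hypothetical names for this sketch.

```ruby
class GreeterDemo; end

a = GreeterDemo.new
b = GreeterDemo.new

# Defines a method on the object a alone.
def a.hello
  "hello from a"
end

puts a.respond_to?(:hello)
puts b.respond_to?(:hello)
puts a.singleton_methods.inspect
# The method is held by the singleton class of a, not by GreeterDemo.
puts a.singleton_class.instance_methods(false).inspect
puts GreeterDemo.instance_methods(false).inspect
```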

This figure shows the simplified object structure after the call to rb_define_singleton_method() with the parameter values of (rb_cArray, "test_singleton_method", rb_test_single_method, 0).

Figure 3:   rb_define_singleton_method(...);

rb_singleton_class()

In this section we will be using Call Graphs to better explain the creation of singleton classes and their ramifications.

rb_define_singleton_method
    rb_define_method
    rb_singleton_class
        SPECIAL_SINGLETON
        rb_make_metaclass
            rb_class_boot
            rb_singleton_class_attached
            rb_class_real

The Call Graph shown above is a Static Call Graph. It shows the functions as they appear in the source code. A Call Graph that shows only the functions called at run-time is a Dynamic Call Graph.

In the figure above, function calls within a procedure are indicated by indentation. For example, rb_define_singleton_method() calls rb_define_method() and rb_singleton_class(). In turn, SPECIAL_SINGLETON and rb_make_metaclass() are called by rb_singleton_class(). These call graphs were generated by the program cflow.

The call graphs are indented to show the parent/child relationship between functions. When a function has a deep call graph, it can be easy to forget who is calling whom. All functions at a particular level were called by the last function at the previous level. For example, the functions rb_class_boot(), rb_singleton_class_attached() and rb_class_real() are called by rb_make_metaclass().

As an example of a more complex call graph, see the call graph in the next section. The functions rb_class_new and rb_make_metaclass are both called by rb_define_class_id.

What exactly are singleton classes?

What is the purpose of singleton classes?

Normal classes and singleton classes

Singleton classes are special classes.  We will explore these differences, and the reasons for them.

What should we do to find them? We should find the differences between the function creating normal classes and the one creating singleton classes. Normal classes can be defined by the rb_define_class() function. It calls a series of functions to create normal classes. For the moment, we'll not look at the full content of rb_define_class(). Let us skip ahead to the call graph of rb_define_class().

rb_define_class
    rb_intern
    rb_const_defined
    rb_define_class_id
        rb_class_new
        rb_make_metaclass
            rb_class_boot
            rb_singleton_class_attached
            rb_class_real
    st_add_direct
    rb_name_class
    rb_const_set
    rb_class_inherited

The following is the code for rb_class_new.

VALUE
rb_class_new(super)
    VALUE super;
{
    Check_Type(super, T_CLASS);
    if (super == rb_cClass) {
        rb_raise(rb_eTypeError, "can't make subclass of Class");
    }
    if (FL_TEST(super, FL_SINGLETON)) {
        rb_raise(rb_eTypeError, "can't make subclass of virtual class");
    }
    return rb_class_boot(super);
}

(class.c)

Check_Type() checks the type of the object structure, so we can ignore it. rb_raise() is error handling, so we can ignore it as well. Only rb_class_boot() remains, so let's look at it.

VALUE
rb_class_boot(super)
    VALUE super;
{
    NEWOBJ(klass, struct RClass);        /* allocates struct RClass */
    OBJSETUP(klass, rb_cClass, T_CLASS); /* initialization of the RBasic */

    klass->super = super;                /* (A) */
    klass->iv_tbl = 0;
    klass->m_tbl = 0;
    klass->m_tbl = st_init_numtable();

    OBJ_INFECT(klass, super);
    return (VALUE)klass;
}

(class.c)

NEWOBJ() and OBJSETUP() are fixed expressions used when creating Ruby objects that possess one of the internal structure types (struct Rxxxx). They are both macros. In NEWOBJ(), a struct RClass is created and the pointer is put in its first parameter, klass. In OBJSETUP(), the struct RBasic member of the RClass (and thus basic.klass and basic.flags) is initialized.

OBJ_INFECT() is a macro related to security. It is responsible for ensuring that security levels assigned to the super class are transferred to the new class.

At (A), the super member of the class is set to the super parameter.   The function rb_class_boot() is responsible for creating a class inheriting from super.
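The checks performed by rb_class_new() before it hands over to rb_class_boot() have Ruby-level counterparts in Class.new. The sketch below assumes a current Ruby, where the exact error messages differ from the ones in the listing but the exception class is still TypeError. The helper try_subclass is hypothetical.

```ruby
# Hypothetical helper: attempt to create a subclass and report the outcome.
def try_subclass(super_class)
  Class.new(super_class)
  "ok"
rescue TypeError => e
  "TypeError: #{e.message}"
end

singleton = Object.new.singleton_class

puts try_subclass(Object)       # an ordinary superclass is accepted
puts try_subclass(Comparable)   # not a class at all (the Check_Type case)
puts try_subclass(Class)        # Class itself cannot be subclassed
puts try_subclass(singleton)    # neither can a singleton (virtual) class
```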

Then, let's look at rb_singleton_class()'s call graph:

rb_singleton_class
    SPECIAL_SINGLETON
    rb_make_metaclass
        rb_class_boot
        rb_singleton_class_attached
        rb_class_real

Here also rb_class_boot() is called. So up to that point, it's the same as in normal classes. What happens afterwards is what differs between normal classes and singleton classes, in other words the characteristics of singleton classes. If everything's clear so far, we just need to read rb_singleton_class() and rb_make_metaclass().

rb_singleton_class()

The Macro SPECIAL_SINGLETON is defined to support testing for Special Constants.

#define SPECIAL_SINGLETON(x,c) do {\
    if (obj == (x)) {\
        return c;\
    }\
} while (0)

rb_singleton_class() is the function most involved in creating Singleton Classes. It comprises the following functional steps:

  1. Handle creating Singletons for the appropriate Special Constants
  2. Create or retrieve the address of the Metaclass
  3. Transfer the object's Taint status to the Metaclass
  4. Transfer the object's Frozen status to the Metaclass

VALUE
rb_singleton_class(obj)
    VALUE obj;
{
    VALUE klass;

    if (FIXNUM_P(obj) || SYMBOL_P(obj)) {
        rb_raise(rb_eTypeError, "can't define singleton");
    }
    if (rb_special_const_p(obj)) {
        SPECIAL_SINGLETON(Qnil, rb_cNilClass);
        SPECIAL_SINGLETON(Qfalse, rb_cFalseClass);
        SPECIAL_SINGLETON(Qtrue, rb_cTrueClass);
        rb_bug("unknown immediate %ld", obj);
    }

    DEFER_INTS;
    if (FL_TEST(RBASIC(obj)->klass, FL_SINGLETON) &&
        rb_iv_get(RBASIC(obj)->klass, "__attached__") == obj) {
        klass = RBASIC(obj)->klass;
    }
    else {
        klass = rb_make_metaclass(obj, RBASIC(obj)->klass);
    }
    if (OBJ_TAINTED(obj)) {
        OBJ_TAINT(klass);
    }
    else {
        FL_UNSET(klass, FL_TAINT);
    }
    if (OBJ_FROZEN(obj)) OBJ_FREEZE(klass);
    ALLOW_INTS;

    return klass;
}

(class.c)

The first and the second half are separated by a blank line. The first half (step 1) handles the special constants and the second half (steps 2, 3 and 4) handles the general case. In other words, the second half is the trunk of the function. That's why we'll keep it for later and talk about the first half.

Everything that is handled in the first half is a non-pointer VALUE, in other words an object without an existing C structure. First, Fixnum and Symbol are explicitly checked. If found, an error is raised because singletons cannot be generated for FIXNUM and SYMBOL types. rb_special_const_p() is a function that returns true for non-pointer VALUEs, so only Qtrue, Qfalse and Qnil should get caught here. Other than these, there are no valid non-pointer values. Any other value is reported as an interpreter bug through rb_bug().
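The first half of the function can be exercised from Ruby with Object#singleton_class. The helper singleton_of is hypothetical; it only turns the TypeError into a string so that all five cases can be printed.

```ruby
# Hypothetical helper: return the singleton class, or describe the error.
def singleton_of(obj)
  obj.singleton_class
rescue TypeError => e
  "TypeError: #{e.message}"
end

puts singleton_of(nil)     # special constant mapped to a built-in class
puts singleton_of(true)
puts singleton_of(false)
puts singleton_of(42)      # Fixnum: no singleton can be defined
puts singleton_of(:name)   # Symbol: no singleton can be defined
```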

DEFER_INTS() and ALLOW_INTS() are macros related to signals. As you can guess, DEFER_INTS() suspends or blocks interrupt processing and ALLOW_INTS() unblocks interrupt processing. This allows modification of Object Space structures without interruption.

VALUE's Dark Secret -- Part II!

If you look at the SPECIAL_SINGLETON processing, you find that it returns a VALUE pointer to a built-in class for that constant. Essentially, for processing Normal and Class Methods there are actual objects substituted for these non-pointer values. This is what allows singleton classes to be created for the special constants Nil, True, and False!

rb_make_metaclass()

Step 2 involves the creation of the metaclass, or locating the metaclass if it already exists. The following code fragment from rb_singleton_class handles this processing.

VALUE
rb_singleton_class(obj)
{
    :: :: :: ::
    if (FL_TEST(RBASIC(obj)->klass, FL_SINGLETON) &&
        rb_iv_get(RBASIC(obj)->klass, "__attached__") == obj) {
        klass = RBASIC(obj)->klass;
    }
    else {
        klass = rb_make_metaclass(obj, RBASIC(obj)->klass);
    }
    :: :: :: ::
}

(class.c)

As we will see when examining rb_make_metaclass(), metaclasses always have an iv_tbl entry named "__attached__". This entry holds a reference to the associated object (for a metaclass, its class).

If a class already has a metaclass, its RBASIC(xx)->klass will point to the metaclass. Additionally, the current object must be a Class or Module. Further, if these conditions are met, the value of the variable "__attached__" is requested and compared with the current Object. If it is a match, the klass variable is set to the address of the metaclass.

If these conditions are not met, then the function rb_make_metaclass() is called to create the metaclass and the klass variable is set to the returned metaclass address.

VALUE
rb_make_metaclass(obj, super)
    VALUE obj, super;
{
    VALUE klass = rb_class_boot(super);
    FL_SET(klass, FL_SINGLETON);
    RBASIC(obj)->klass = klass;
    rb_singleton_class_attached(klass, obj);
    if (BUILTIN_TYPE(obj) == T_CLASS && FL_TEST(obj, FL_SINGLETON)) {
        RBASIC(klass)->klass = klass;
        RCLASS(klass)->super = RBASIC(rb_class_real(RCLASS(obj)->super))->klass;
    }
    else {
        VALUE metasuper = RBASIC(rb_class_real(super))->klass;

        /* metaclass of a superclass may be NULL at boot time */
        if (metasuper) {
            RBASIC(klass)->klass = metasuper;
        }
    }

    return klass;
}

(class.c)

As we have seen earlier, rb_class_boot() creates a class structure and initializes it as a subordinate class to the specified super class. At this point, things are exactly the same as the creation of a Normal Class.

What follows are the actions that modify this normal class structure for its role as a Singleton Class. The changes are as follows:

  1. Set the SINGLETON Bit in the metaclass flag word.
  2. Set the base object's klass pointer to point at this object (the new Singleton Object).
  3. Set the super entry to point at the base object's klass pointer.
  4. Call rb_singleton_class_attached to create the iv_tbl "__attached__" entry in the metaclass.
  5. Adjust the super pointer of the metaclass.

The chart below shows the result of the first three steps described above.

Figure 4: rb_singleton_class

When comparing the first and last part of this diagram, you can understand that sclass is inserted without changing the structure.   That's all there is to singleton classes.   In other words the inheritance is increased one step.   If a method is defined in a singleton class, this construction allows the other instances of klass to define completely different methods.
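The insertion can be confirmed from Ruby: the singleton class sits directly below the original class, and Object#class keeps reporting the original class. ChainDemo and only_here are hypothetical names used for this sketch.

```ruby
class ChainDemo; end

obj   = ChainDemo.new
other = ChainDemo.new

def obj.only_here
  :singleton
end

sclass = obj.singleton_class

puts sclass.superclass.inspect            # the original class is one step above
puts obj.class.inspect                    # Object#class skips the singleton class
puts sclass.ancestors.first(2).inspect    # lookup starts at the singleton class
puts other.respond_to?(:only_here)        # other instances are unaffected
```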

rb_singleton_class_attached

We will now look at some processing that we only lightly touched on. After the adjustments described above, we call this function to create the "__attached__" variable in the new metaclass.

Let's have a look at what it does.

void
rb_singleton_class_attached(klass, obj)
    VALUE klass, obj;
{
    if (FL_TEST(klass, FL_SINGLETON)) {
        if (!RCLASS(klass)->iv_tbl) {
            RCLASS(klass)->iv_tbl = st_init_numtable();
        }
        st_insert(RCLASS(klass)->iv_tbl, rb_intern("__attached__"), obj);
    }
}

(class.c)

If the FL_SINGLETON flag of klass is set, in other words if it's a singleton class, put the "__attached__" => obj relation in the instance variable table of klass (iv_tbl).

Although the "__attached__" variable does not have the '@' prefix, it is still stored in the instance variables table. Such an instance variable can never be read at the Ruby level, so it can be used to keep values for the system's exclusive use.

Let's now think about the relationship between klass and obj. klass is the singleton class of obj. In other words, this “invisible” instance variable allows the singleton class to remember the instance it was created from. Its value is used when the singleton class is changed, notably to call hook methods on the instance (i.e. obj). For example, when a method is added to a singleton class, the obj's singleton_method_added method is called. There is no logical necessity for doing so; it was done because that's how it was defined in the language.
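The hook mentioned above can be observed directly. In this sketch the local array added is an assumption of the example; define_singleton_method is used for the hook itself so that the block can capture that array.

```ruby
obj   = Object.new
added = []

# Install the hook on obj; the block records every singleton method name.
obj.define_singleton_method(:singleton_method_added) do |name|
  added << name
end

# Adding a singleton method makes the interpreter call the hook on obj.
def obj.greet
  "hi"
end

puts added.inspect
```

Note that installing the hook is itself the addition of a singleton method, so the recorded list may contain the hook's own name as well.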

But is it really all right?   Storing the instance in __attached__ will force one singleton class to have only one attached instance.   For example, by getting (in some way or an other) the singleton class and calling new on it, won't a singleton class end up having multiple instances?

This cannot be done because the proper checks are done to prevent the creation of an instance of a singleton class.

Singleton classes exist in the first place for singleton methods. Singleton methods are methods existing only on a particular object. If singleton classes could have multiple instances, they would be the same as normal classes. That's why they are forced to have only one instance.
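A short check of this restriction, assuming a current Ruby where the attempt raises TypeError:

```ruby
obj    = Object.new
sclass = obj.singleton_class

# Trying to create a second instance of the singleton class fails.
result = begin
  sclass.new
rescue TypeError => e
  "TypeError: #{e.message}"
end

puts result
puts obj.is_a?(sclass)   # obj itself remains the one and only instance
```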

rb_class_real

VALUE
rb_class_real(cl)
    VALUE cl;
{
    while (FL_TEST(cl, FL_SINGLETON) || TYPE(cl) == T_ICLASS) {
        cl = RCLASS(cl)->super;
    }
    return cl;
}

(object.c)

The processing for step five (5) involves ensuring that the super entry for the metaclass is a viable candidate, that is, a class that is neither a Singleton nor an ICLASS node. The function is passed the super input value from rb_make_metaclass() and then searches up the inheritance tree until it finds a suitable object.
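The same walk can be approximated in Ruby. The method real_class below is a hypothetical analogue written for this sketch, not the interpreter's function. Proxy nodes for included modules (T_ICLASS) are not visible through Class#superclass, so only the singleton test is needed.

```ruby
# Hypothetical Ruby-level analogue of rb_class_real().
def real_class(cl)
  cl = cl.superclass while cl.singleton_class?
  cl
end

class RealDemo; end
obj = RealDemo.new

puts real_class(obj.singleton_class).inspect       # singleton class of an instance
puts real_class(RealDemo.singleton_class).inspect  # singleton class of a class
puts real_class(RealDemo).inspect                  # a normal class is returned as is
```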

The typical resulting object is a rb_cClass for top level classes.   Notice that the klass entry is the Metaclass_class object.

OBJECT ====> Object Address ------>0x00b7f29d08 <Object> rb_cClass
RBASIC ====> flags --------------->0x0000000003 <Class>
RBASIC ====> klass --------------->0x00b7f29ccc <klass> metaclass_class
RCLASS ====> iv_tbl -------------->0x0008fcbdf8 => Variables
RCLASS ====> m_tbl --------------->0x0008fcbd70 => Methods
RCLASS ====> super --------------->0x00b7f29d1c <super> rb_cModule

(Dump of rb_cClass Object)

Figure 5a is the node structure before the Singleton is created; Figure 5b is the completed node structure.

Figure 5c is the result of the following Ruby Code that adds a singleton method:

  class TestCase
    def initialize
    end
  end

  #
  # Define a Singleton Class for 'TestCase'
  #
  test_object = TestCase.new()

  def test_object.test_singleton_method()
    puts("TestCase Singleton Method")
  end

Figure 5:   rb_define_class(...) -- Followed by rb_singleton_class(...);

Summary

We've done a lot, maybe made a real mayhem, so let's finish and put everything in order with a summary.

What are singleton classes?   They are classes that have the FL_SINGLETON flag set and that can only have one instance.

What are singleton methods?   They are methods defined in the singleton class of an object.

What are Singleton instance variables? They are variables defined in the singleton class of an object.

Metaclasses

Inheritance of singleton methods

Infinite chain of classes

Even a class has a class, and it's Class. And the class of Class is again Class. We find ourselves in an infinite loop (Figure 5).

Figure 5: Infinite loop of classes

Up to here it's something we've already gone through. What follows is the theme of this chapter. Why do classes have to make a loop?

First, in Ruby all data are objects. And classes are data, so in Ruby they have to be objects.

As they are objects, they must respond to methods. And setting the rule “to respond to methods you must belong to a class” made processing easier. That's where the need for a class to also have a class comes from.

Let's take this as our basis and think about how to implement it. First, we can try the most naïve way: Class's class is ClassClass, ClassClass's class is ClassClassClass, and so on, chaining classes of classes one by one. But whichever way you look at it, this can't be implemented effectively. That's why it's common in object oriented languages where classes are objects for Class's class to be Class itself, creating an endless virtual instance-class relationship.

I'm repeating myself, but the fact that Class's class is Class is only to make the implementation easier, there's nothing important in this logic.
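The loop is easy to see from Ruby itself:

```ruby
puts Class.class                       # the class of Class is Class
puts Class.class.class.equal?(Class)   # and it stays that way however far we go
puts Object.class                      # every class object is an instance of Class
puts Module.class
puts Class.superclass                  # inheritance, by contrast, does not loop
```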

“Class is also an object”

“Everything is an object” is often used as an advertising statement when speaking about Ruby. And as a part of that, “Classes are also objects!” also appears. But these expressions often go too far. When thinking about these sayings, we have to split them in two:

all data are objects

classes are data

Talking about data or code makes a discussion much harder to understand.   That's why here we'll restrict the meaning of “data” to “what can be put in variables in programs”.

Being able to manipulate classes from programs gives programs the ability to manipulate themselves.   This is called reflection.   It fits object oriented languages, and even more Ruby with the classes it has, to be able to directly manipulate classes.

Nevertheless, classes could be made available in a form that is not an object. For example, classes could be manipulated with function-style methods (functions defined at the top-level). However, as inside the interpreter there are data structures to represent the classes, it's more natural in object oriented languages to make them available directly. And Ruby made this choice.

Furthermore, an objective in Ruby is for all data to be objects.   That's why it's appropriate to make them objects.

By the way, there is a reason not linked to reflection why in Ruby classes had to be made objects.   That is to be able to define methods independently from instances (what is called static methods in Java and in C++).

And to implement static methods, another thing was necessary: singleton methods.   By chain reaction, that also makes singleton classes necessary.

Figure 6: Dependency relationships

Class methods inheritance

In Ruby, singleton methods defined in a class are called class methods.   However, their specification is a little strange.   Why are class methods inherited?

  class A
    def A.test    # defines a singleton method in A
      puts("ok")
    end
  end

  class B < A
  end

  B.test()        # calls it

This can't occur with singleton methods from objects that are not classes.   In other words, classes are the only ones handled specially.   In the following section we'll see how class methods are inherited.

Singleton class of a class

Assuming that class methods are inherited, where is this operation done?   At class definition (creation)?   At singleton method definition? Then let's look at the code defining classes.

Class definition means of course rb_define_class().   Now let's take the call graph of this function.

rb_define_class
    rb_class_inherited
    rb_define_class_id
        rb_class_new
            rb_class_boot
        rb_make_metaclass
            rb_class_boot
            rb_singleton_class_attached

If you're wondering where you've seen it before, we looked at it in the previous section.   At that time we did not dwell on it, but why does rb_make_metaclass() appear?   As we saw before, this function introduces a singleton class.   This is very suspicious.   Why is this called even if we are not defining a singleton function? Furthermore, why is the lower level rb_make_metaclass() used instead of rb_singleton_class()?   It looks like we have to check these surroundings again.

rb_define_class_id()

Let's first start our reading with its caller, rb_define_class_id().

VALUE
rb_define_class_id(id, super)
    ID id;
    VALUE super;
{
    VALUE klass;

    if (!super) super = rb_cObject;
    klass = rb_class_new(super);
    rb_make_metaclass(klass, RBASIC(super)->klass);

    return klass;
}

(class.c)

rb_class_new() was a function that creates a class with super as its superclass. After that there's the rb_make_metaclass() in question. What concerns me is that the parameters differ from those used when it is called from rb_singleton_class(). Last time it was like this:

rb_make_metaclass(obj, RBASIC(obj)->klass);

But this time is like this:

rb_make_metaclass(klass, RBASIC(super)->klass);

As you can see, it's slightly different. How do the results change depending on that? Let's look at rb_make_metaclass() once again.

rb_make_metaclass (once more)

    VALUE
    rb_make_metaclass(obj, super)
        VALUE obj, super;
    {
        VALUE klass = rb_class_boot(super);
        FL_SET(klass, FL_SINGLETON);
        RBASIC(obj)->klass = klass;
        rb_singleton_class_attached(klass, obj);
        if (BUILTIN_TYPE(obj) == T_CLASS && FL_TEST(obj, FL_SINGLETON)) {
            RBASIC(klass)->klass = klass;
            RCLASS(klass)->super = RBASIC(rb_class_real(RCLASS(obj)->super))->klass;
        }
        else {
            VALUE metasuper = RBASIC(rb_class_real(super))->klass;

            /* metaclass of a superclass may be NULL at boot time */
            if (metasuper) {
                RBASIC(klass)->klass = metasuper;
            }
        }

        return klass;
    }

(class.c)

As we saw last time, if the object type is T_CLASS (i.e. it's a class), extra processing is needed.

Now let us follow the processing when rb_make_metaclass() is called as follows:

rb_make_metaclass(klass, RBASIC(super)->klass);

Drawn as a diagram, this gives something like Figure 7. In it, the names in parentheses are singleton classes. This notation is often used in this book, so I'd like you to remember it: obj's singleton class is written as (obj), and (klass) is the singleton class of klass. It looks like the singleton class is caught between a class and this class's superclass's class.

Figure 7: Introduction of a class's singleton class

From this result, and thinking a little further, we can infer that the superclass's class must in turn be the superclass's singleton class. This becomes clear with one more level of inheritance (Figure 8).

Figure 8: Hierarchy of multi-level inheritance

As the relationship between super and klass is the same as the one between klass and klass2, c must be the singleton class (super). If you continue like this, you will finally arrive at the conclusion that Object's class must be (Object). And that's the case in practice. For example, by inheriting as in the following program:

    class A < Object
    end
    class B < A
    end

internally, a structure like Figure 9 is created.

Figure 9: Class hierarchy and metaclasses

As classes and their metaclasses are linked and inherit like this, class methods are inherited.
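The link between the metaclasses can be observed from Ruby itself. The following is a minimal sketch, assuming a Ruby recent enough to provide Object#singleton_class (which the interpreter version discussed in this book does not have). It checks that the superclass of (B) is (A), and that the class method is stored only in (A).

```ruby
class A
  def A.test
    "ok"
  end
end

class B < A
end

# (B)'s superclass is (A), mirroring B < A.
puts B.singleton_class.superclass == A.singleton_class

# The method is stored in (A) only; B reaches it through the link above.
puts A.singleton_methods(false).inspect
puts B.singleton_methods(false).inspect
puts B.test
```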

Class of a class of a class

You now understand how class method inheritance works, but the singleton classes raise some questions of their own. First, what is the class of a class's singleton class? To find out, we can try debugging. I made Figure 10 from the results of this investigation.

Figure 10: Class of a class's singleton class

A class's singleton class puts itself as its own class.   Quite complicated.

The second question: the class of Object must be Class.   Didn't I properly confirm this in chapter 1: Ruby language minimum?

p(Object.class()) # Class

Certainly, that's the case “at the Ruby level”.   But “at the C level”, it's the singleton class (Object).   If (Object) does not appear at the Ruby level, it's because Object#class skips the singleton classes.   Let's look at the body of the method, rb_obj_class() to confirm that.

    VALUE
    rb_obj_class(obj)
        VALUE obj;
    {
        return rb_class_real(CLASS_OF(obj));
    }

    VALUE
    rb_class_real(cl)
        VALUE cl;
    {
        while (FL_TEST(cl, FL_SINGLETON) || TYPE(cl) == T_ICLASS) {
            cl = RCLASS(cl)->super;
        }
        return cl;
    }

(object.c)

CLASS_OF(obj) returns the basic.klass of obj, while rb_class_real() skips all singleton classes (advancing towards the superclass). In the first place, singleton classes are caught between a class and its superclass, like a proxy. That's why, when a “real” class is necessary, we have to follow the superclass chain (Figure 11).

T_ICLASS will appear later, when we talk about include.

Figure 11: Singleton class and real class
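The loop in rb_class_real() can be imitated at the Ruby level. This is only a sketch: real_class is a hypothetical helper name, it assumes a Ruby that provides Module#singleton_class? (2.1 or later), and since include classes cannot be observed from Ruby the T_ICLASS half of the condition has no counterpart here.

```ruby
# Ruby-level model of rb_class_real(): climb the superclass chain until a
# class that is not a singleton class is reached.
def real_class(cl)
  cl = cl.superclass while cl.singleton_class?
  cl
end

obj = Object.new
puts real_class(obj.singleton_class) == Object     # skips (obj)
puts real_class(Object.singleton_class) == Class   # skips (Object) and what lies above it
puts Object.class == Class                         # what Object#class reports
```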

Singleton class and metaclass

Well, the singleton classes that were introduced in classes are also a kind of class: each one is a class's class. So it can be called a metaclass.

However, you should be wary of the fact that singleton classes are not, in general, metaclasses. Only the singleton classes introduced in classes are metaclasses. The important fact is not that they are singleton classes, but that they are the classes of classes. I was stuck on this point when I started learning Ruby. As I may not be the only one, I would like to make this clear.

Thinking about this, the rb_make_metaclass() function name is somewhat misleading. When used on a class it does indeed create a metaclass, but when used on any other object it does not.
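The distinction can be checked with a short sketch. Here metaclass? is a hypothetical helper that takes "a class whose instances are classes" to mean "a descendant of Class", and Module#singleton_class? assumes Ruby 2.1 or later.

```ruby
# A metaclass is a class whose instances are classes, i.e. a descendant of Class.
def metaclass?(cl)
  cl.ancestors.include?(Class)
end

plain = Object.new

puts plain.singleton_class.singleton_class?    # a singleton class ...
puts metaclass?(plain.singleton_class)         # ... but its instance is not a class

puts String.singleton_class.singleton_class?   # also a singleton class ...
puts metaclass?(String.singleton_class)        # ... and its instance, String, is a class
```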

Bootstrap

We have nearly finished our talk about classes and metaclasses, but there is still one problem left. It concerns the three metaobjects Object, Module and Class. These three cannot be created with the ordinary API. To make a class, its metaclass must be built, but as we saw a while ago, the metaclass's superclass is Class. However, as Class has not been created yet, the metaclass cannot be built. So in ruby, the creation of these three classes alone is handled specially.

Then let's look at the code:

    Init_Object()
    {
        :: :: :: :: :: :: :: ::
        rb_cObject = boot_defclass("Object", 0);
        rb_cModule = boot_defclass("Module", rb_cObject);
        rb_cClass = boot_defclass("Class", rb_cModule);

        metaclass = rb_make_metaclass(rb_cObject, rb_cClass);
        metaclass = rb_make_metaclass(rb_cModule, metaclass);
        metaclass = rb_make_metaclass(rb_cClass, metaclass);
        :: :: :: :: :: :: :: ::
    }

(object.c)

First, in the first half, boot_defclass() is similar to rb_class_boot(): it just creates a class with the given superclass set. These links give us something like the left part of Figure 12.

In the three lines of the second half, (Object), (Module) and (Class) are created and set (right side of Figure 12). The classes of (Object) and (Module), that is, themselves, are already set in rb_make_metaclass(), so there is no problem. With this, the bootstrap of the metaobjects is finished.

Figure 12: Metaobjects creation

Taking everything into account, we get the final shape shown in Figure 13.

Figure 13: Ruby metaobjects
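The finished structure can be inspected from a running interpreter. The sketch below assumes a recent Ruby; such versions also place BasicObject above Object, which the version described here predates, so only the links among Object, Module and Class are checked.

```ruby
# Superclass links created by boot_defclass().
puts Module.superclass == Object
puts Class.superclass == Module

# Seen from Ruby, singleton classes are skipped, so all three report Class.
puts [Object, Module, Class].map(&:class).inspect

# The singleton classes created in the second half are chained the same way.
puts Module.singleton_class.superclass == Object.singleton_class
puts Class.singleton_class.superclass == Module.singleton_class
```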

Class names

In this section, we will analyse the bidirectional conversion between classes and class names, in other words constants. We will start with rb_define_class() and rb_define_class_under().

Naming classes

As we discussed at the beginning, before creating a class we have to find or create the constant that will hold the reference to the actual class object. The constant has the same name as the class we wish to create.
The steps in creating a class were outlined in a previous section, but the part that interests us is below. Remember that at this point we have found or created the constant and the name/id pair.

Creating a Class

  9. Set "__classid__" in xx->iv_tbl with the value of rb_id2name(id).
  10. Add name/id pair to rb_class_tbl and rb_cObject's iv_tbl.

This part assigns the class reference to the constant and to the appropriate tables. In fact, top-level classes are separated from the other constants and grouped in rb_class_tbl, and the references are stored in rb_cObject's iv_tbl.

Class name

We understood how the class can be obtained from the class name, but how do we do the opposite? Calling p or Class#name gives us the name of the class, but how is that implemented?

In fact this was already done a long time ago by rb_name_class(). The call happens around here:

    rb_define_class
        klass = rb_define_class_id(id, super);
        st_add_direct(rb_class_tbl, id, klass);
        rb_name_class(klass, id);
    rb_define_class_id

Let's look at its content:

    void
    rb_name_class(klass, id)
        VALUE klass;
        ID id;
    {
        rb_iv_set(klass, "__classid__", ID2SYM(id));
    }

(variable.c)

__classid__ is another instance variable that can't be seen from Ruby.   As only VALUEs can be put in the instance variable table, the ID is converted to Symbol using ID2SYM().

That's how we are able to find the constant name from the class.

Nested classes

So, in the case of classes defined at the top-level, we know how the reciprocal link between name and class is constructed.   What's left is the case of classes defined in modules or other classes, and for that it's a little more complicated.   The function to define these nested classes is rb_define_class_under().

    VALUE
    rb_define_class_under(outer, name, super)
        VALUE outer;
        const char *name;
        VALUE super;
    {
        VALUE klass;
        ID id;

        id = rb_intern(name);
        if (rb_const_defined_at(outer, id)) {
            klass = rb_const_get_at(outer, id);
            if (TYPE(klass) != T_CLASS) {
                rb_raise(rb_eTypeError, "%s is not a class", name);
            }
            if (rb_class_real(RCLASS(klass)->super) != super) {
                rb_name_error(id, "%s is already defined", name);
            }
            return klass;
        }
        if (!super) {
            rb_warn("no super class for `%s::%s', Object assumed",
                    rb_class2name(outer), name);
        }
        klass = rb_define_class_id(id, super);
        rb_set_class_path(klass, outer, name);
        rb_const_set(outer, id, klass);
        rb_class_inherited(super, klass);

        return klass;
    }

(class.c)

The primary differences between this and rb_define_class() are that the constant is stored and looked up in the outer class's iv_tbl rather than rb_cObject's, and that the "__classpath__" is created by rb_set_class_path().

rb_set_class_path()

This function gives the name name to the class klass nested in the class under. “Class path” means a name including all the nesting information starting from the top-level, for example “Net::NetPrivate::Socket”.

    void
    rb_set_class_path(klass, under, name)
        VALUE klass, under;
        const char *name;
    {
        VALUE str;

        if (under == rb_cObject) {
            /* defined at top-level */
            str = rb_str_new2(name);    /* create a Ruby string from name */
        }
        else {
            /* nested constant */
            str = rb_str_dup(rb_class_path(under));    /* copy return value */
            rb_str_cat2(str, "::");     /* concatenate "::" */
            rb_str_cat2(str, name);     /* concatenate name */
        }
        rb_iv_set(klass, "__classpath__", str);
    }

(variable.c)

Everything except the last line is the construction of the class path, and the last line makes the class remember its own name.   __classpath__ is of course another instance variable that can't be seen from a Ruby program.   In rb_name_class() there was __classid__.   But __classpath__ is different because it includes nesting information (look at the table below).

    __classpath__    Net::NetPrivate::Socket
    __classid__      Socket

This means that most classes have "__classid__" or "__classpath__" defined. So to find under's class path we can look up these instance variables. This is done by rb_class_path().
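The string construction done by rb_set_class_path() can be modelled in a few lines of Ruby. This is only a sketch: build_class_path is a hypothetical helper that takes the enclosing class's path as a string, and Outer, Middle and Leaf are made-up names. The last line prints what Ruby itself reports for comparison.

```ruby
# Ruby-level model of rb_set_class_path(): build the path string from the
# enclosing class's path and the new name.
def build_class_path(under_path, name)
  return name.dup if under_path == "Object"   # defined at top-level
  "#{under_path}::#{name}"                    # nested constant
end

module Outer
  module Middle
    class Leaf
    end
  end
end

puts build_class_path("Object", "Outer")
puts build_class_path("Outer::Middle", "Leaf")
puts Outer::Middle::Leaf.name
```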

Nameless classes

Contrary to what I have just said, there are in fact cases in which neither __classpath__ nor __classid__ is set. That is because in Ruby you can use a method like the following to create a class.

c = Class.new()

If you create a class like this, we do not go through rb_define_class() or rb_define_class_under(), so neither rb_name_class() nor rb_set_class_path() is called and the class path is not set. In this case, 'c' does not have any name, which is to say we get an unnamed class.

However, if later it's assigned into a constant, the name of this constant will be attached to the class.

SomeClass = c # the class name is SomeClass

Strictly speaking, the name is attached after the assignment, the first time it is requested, for instance when calling p on this SomeClass class or when calling the Class#name method. At that point, rb_class_tbl is searched for a value equal to the class, and a name has to be chosen. The following case can also happen:

    class A
      class B
        C = tmp = Class.new()
        p(tmp)   # here we search for the name
      end
    end

so in the worst case we have to search the whole constant space. However, there generally aren't many constants, so searching all of them does not take too much time.
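The visible behaviour can be tried out as follows. This is a small sketch in which LateNamed is a made-up constant name. Newer interpreters may attach the name at the moment of assignment rather than lazily on first request, but what a Ruby program observes is the same.

```ruby
anon = Class.new
puts anon.name.inspect        # no constant refers to the class yet

LateNamed = anon              # assigning it to a constant gives it a name
puts anon.name
puts anon.equal?(LateNamed)   # still the very same class object
```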

Include

We have only talked about classes so far, so let's finish this chapter with something else and talk about module inclusion.

rb_include_module (1)

Includes are done by the ordinary method Module#include. Its corresponding function in C is rb_include_module(). To be precise, the body of Module#include is rb_mod_include(), which calls Module#append_features, and that method's default implementation finally calls rb_include_module(). Mixing what happens in Ruby and in C gives us the following call graph.

    Module#include (rb_mod_include)
        Module#append_features (rb_mod_append_features)
            rb_include_module

All usual includes are done by rb_include_module(). This function is a little long so we'll look at it a half at a time. The first half follows:

    void
    rb_include_module(klass, module)
        VALUE klass, module;
    {
        VALUE p, c;
        int changed = 0;

        rb_frozen_class_p(klass);
        if (!OBJ_TAINTED(klass)) {
            rb_secure(4);
        }

        if (NIL_P(module)) return;
        if (klass == module) return;

        if (TYPE(module) != T_MODULE) {
            Check_Type(module, T_MODULE);
        }

(class.c)

The first half (above) performs type and security checks. We can ignore this part. The second half of the processing is below:

        OBJ_INFECT(klass, module);
        c = klass;
        while (module) {
            int superclass_seen = Qfalse;

            if (RCLASS(klass)->m_tbl == RCLASS(module)->m_tbl)    /* (A) */
                rb_raise(rb_eArgError, "cyclic include detected");
            /* ignore if the module included already in superclasses */
            for (p = RCLASS(klass)->super; p; p = RCLASS(p)->super) {
                switch (BUILTIN_TYPE(p)) {
                  case T_ICLASS:
                    if (RCLASS(p)->m_tbl == RCLASS(module)->m_tbl) {
                        if (!superclass_seen) {
                            c = p;  /* move insertion point */
                        }
                        goto skip;
                    }
                    break;
                  case T_CLASS:
                    superclass_seen = Qtrue;
                    break;
                }
            }
            c = RCLASS(c)->super = include_class_new(module, RCLASS(c)->super);
            changed = 1;
          skip:
            module = RCLASS(module)->super;
        }
        if (changed) rb_clear_cache();
    }

(class.c)

First, block (A) above checks for cyclic conditions (i.e. endlessly including itself!).

Second, what the following block (B), the for loop, does is written in the comment above it. It seems to be a special condition, so let's skip it for now. Extracting the important parts from the rest gives the following:

    c = klass;
    while (module) {
        c = RCLASS(c)->super = include_class_new(module, RCLASS(c)->super);
        module = RCLASS(module)->super;
    }

In other words, it loops over module's super chain. What is in module's super must be a module included by module (because our intuition tells us so). On each iteration, the superclass of the class where the inclusion occurs is replaced with something. We do not yet know with what, but the moment I saw this I thought “Ah, doesn't this look like adding elements to a list (like LISP's cons)?”, and that suddenly made the reading go faster. In other words it has the following form:

list = new(item, list)

Thinking about this, it seems we can expect that module is inserted between c and c->super. If so, it fits the specification of modules.
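This expectation can be played through with a toy model. Everything here is hypothetical: Node is a stand-in for a class, module or include class, model_include mirrors only the simplified loop above, and, as a simplification, m1's super link points straight at m2 rather than at an include class of m2.

```ruby
# Hypothetical stand-in for a class, module or include class.
Node = Struct.new(:name, :superclass)

# Model of the simplified loop: for each module in the chain, splice a new
# node in just after the insertion point c.
def model_include(klass, mod)
  c = klass
  while mod
    # c = RCLASS(c)->super = include_class_new(module, RCLASS(c)->super)
    c.superclass = Node.new("i#{mod.name}", c.superclass)
    c = c.superclass
    mod = mod.superclass
  end
  klass
end

# Collect the names found along the superclass chain.
def chain(node)
  names = []
  while node
    names << node.name
    node = node.superclass
  end
  names
end

object = Node.new("Object", nil)
c1 = Node.new("c1", object)
m1 = Node.new("m1", Node.new("m2", nil))   # simplification: m1's super is m2 itself

puts chain(model_include(c1, m1)).inspect
```

In this model the chain c1, Object becomes c1, im1, im2, Object: each new node lands between the insertion point and its former superclass.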

But to be sure of this we have to look at include_class_new().

include_class_new()

    static VALUE
    include_class_new(module, super)
        VALUE module, super;
    {
        NEWOBJ(klass, struct RClass);               /* (A) */
        OBJSETUP(klass, rb_cClass, T_ICLASS);

        if (BUILTIN_TYPE(module) == T_ICLASS) {
            module = RBASIC(module)->klass;
        }
        if (!RCLASS(module)->iv_tbl) {
            RCLASS(module)->iv_tbl = st_init_numtable();
        }
        klass->iv_tbl = RCLASS(module)->iv_tbl;     /* (B) */
        klass->m_tbl = RCLASS(module)->m_tbl;
        klass->super = super;                       /* (C) */
        if (TYPE(module) == T_ICLASS) {             /* (D) */
            RBASIC(klass)->klass = RBASIC(module)->klass;   /* (D-1) */
        }
        else {
            RBASIC(klass)->klass = module;                  /* (D-2) */
        }
        OBJ_INFECT(klass, module);
        OBJ_INFECT(klass, super);

        return (VALUE)klass;
    }

(class.c)

Luckily, there is nothing here that we do not already know.

(A) First create a new class.

(B) Transplant module's instance variable and method tables into this class.

(C) Make the including class's superclass (super) the superclass of this new class.

In other words, this function creates an include class for the module. The important point is that at (B) only the pointers are copied; the tables themselves are not duplicated. Later, if a method is added, the module's body and the include class will still have exactly the same methods (Figure 14).

Figure 14: Include class
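The effect of sharing the table can be seen from Ruby. In this minimal sketch Speaker and Robot are made-up names: a method added to the module after the include is immediately available through the including class, without a second include.

```ruby
module Speaker
end

class Robot
  include Speaker
end

# Added after the include; no second include is performed.
module Speaker
  def speak
    "beep"
  end
end

puts Robot.new.speak
```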

If you look closely at (A), the structure type flag is set to T_ICLASS.   This seems to be the mark of an include class.   This function's name is include_class_new() so ICLASS's I must be include.

Putting together what this function and rb_include_module() do, we see that our earlier expectation was right. In brief, including is inserting the include class of a module between a class and its superclass (Figure 15).

Figure 15: Include

At (D-2) the module is stored in the include class's klass. At (D-1), the module's body is taken out… at least that's what I'd like to say, but in fact this check does not have any use. The T_ICLASS check is already done at the beginning of this function, so when arriving here there can't still be a T_ICLASS. Modifications to ruby piled up at a fast pace over quite a long period of time, so there are quite a few small oversights.

There is one more thing to consider.   Somehow the include class's basic.klass is only used to point to the module's body, so for example calling a method on the include class would be very bad.   So include classes must not be seen from Ruby programs.   And in practice all methods skip include classes, with no exception.

Simulation

It was complicated so let's look at a concrete example.   I'd like you to look at Figure 16 (1).   We have the c1 class and the m1 module that includes m2.   From there, the changes made to include m1 in c1 are (2) and (3).   ims are of course include classes.

Figure 16: Include
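The same situation can be reproduced at the Ruby level. In the sketch below Mod1, Mod2 and Klass1 are made-up names playing the roles of m1, m2 and c1. Include classes themselves stay hidden, but Module#ancestors shows the modules in the positions where their include classes were inserted, and Class#superclass skips them.

```ruby
module Mod2
end

module Mod1
  include Mod2
end

class Klass1
  include Mod1
end

# The modules appear between the class and its superclass.
puts Klass1.ancestors.take(4).inspect

# superclass skips the include classes and reports the real superclass.
puts Klass1.superclass
```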

rb_include_module (2)

Well, now we can explain the part of rb_include_module() we skipped.

    /* (A) skip if the superclass already includes module */
    for (p = RCLASS(klass)->super; p; p = RCLASS(p)->super) {
        switch (BUILTIN_TYPE(p)) {
          case T_ICLASS:
            if (RCLASS(p)->m_tbl == RCLASS(module)->m_tbl) {
                if (!superclass_seen) {
                    c = p;  /* the inserting point is moved */
                }
                goto skip;
            }
            break;
          case T_CLASS:
            superclass_seen = Qtrue;
            break;
        }
    }

(class.c)

If one of the T_ICLASSes (include classes) among klass's superclasses (p) has the same table as one of the modules we want to include (module), it is an include class for module. That's why we skip the inclusion, so as not to include the module twice. If this module includes another module (module->super), we check this once more.

However, when we skip an inclusion, p is a module that has been included once, so its included modules must already be included… that's what I thought for a moment, but we can have the following context:

    module M
    end
    module M2
    end

    class C
      include M   # M2 is not yet included in M
    end           # therefore M2 is not in C's superclasses

    module M
      include M2  # as there M2 is included in M,
    end
    class C
      include M   # I would like here to only add M2
    end

So, conversely, there are cases in which an include does not take effect immediately.
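The scenario can be run as a sketch, with Feature, Extra and Host as made-up names for M, M2 and C. Whether Extra is already visible before the second include depends on the interpreter: with the code read in this chapter it is not, whereas Ruby 3.0 and later propagate an include to classes that already included the receiver. For that reason the first check is printed without being relied upon.

```ruby
module Feature
end

module Extra
end

class Host
  include Feature      # Extra is not yet included in Feature
end

module Feature
  include Extra        # Feature gains Extra after Host included Feature
end

# Version dependent: reflects whether the include above reached Host already.
puts Host.ancestors.include?(Extra)

class Host
  include Feature      # including again adds whatever is still missing
end

puts Host.ancestors.include?(Extra)
puts Host.ancestors.count(Feature)   # Feature itself is not inserted twice
```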

For class inheritance, the class's singleton methods were inherited, but in the case of modules there is no such thing. Therefore the singleton methods of the module are not inherited by the including class (or module). When you want to also inherit singleton methods, the usual way is to override Module#append_features.
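As a minimal sketch of that approach (Greeter, ClassMethods and Person are made-up names, and grouping the methods in a nested module is just one possible arrangement), the module overrides append_features, lets the default implementation do the normal inclusion through super, and then extends the including class.

```ruby
module Greeter
  # Singleton method of the module: include does not carry it over.
  def self.shout
    "HI"
  end

  # Methods meant to become singleton (class) methods of the includer.
  module ClassMethods
    def greeting
      "hello"
    end
  end

  # Called by Module#include; super performs the normal inclusion.
  def self.append_features(base)
    super
    base.extend(ClassMethods)
  end
end

class Person
  include Greeter
end

puts Person.respond_to?(:shout)   # the module's own singleton method is absent
puts Person.greeting              # added by the overridden append_features
```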

The original work is Copyright © 2002 - 2004 Minero AOKI.
Translated by Vincent ISAMBART
Translations and Additions by C.E. Thornton

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.