Chapter 2
Ruby Objects

Objects

From this chapter on,  we will begin examining the source code.

One of Ruby's principles is that Everything is an Object. What is an Object?

There are three prime requirements that describe an Object.   Namely:

  1) An object has an identity (an Object Identifier or ID)
  2) It can react to received messages (using Methods)
  3) It can maintain an internal state (Instance Variables)

This chapter will discuss these attributes of Objects and the code that in part supports them.   The Core Sources that are most in play here are:

  1) ruby.h
  2) object.c
  3) class.c 
  4) variable.c

Variables and their relation to Objects

When an Object is created, it is stored someplace in Object Space. But to use it, some sort of handle must be provided. This is the primary job of Variables and Constants. Regardless of the type or class of the Object, the Variable is always a VALUE (an unsigned long). With some exceptions, discussed later, a Variable most often contains a reference to an Object.

(Value)


Figure 1: VALUE as a Reference to an Object Structure

  typedef unsigned long VALUE;

(ruby.h)

The VALUE is cast into a pointer to the various object structures.   We will cover this topic shortly.

We often create objects that do not have a handle stored in a Variable. These are often transitory or intermediate objects used in expressions. References to these objects are maintained in the Syntactic Tree. The construction of the Syntactic Tree and its associated APIs are discussed in Chapter 12. Finally, Objects can be created that are never referenced; they simply exist in Object Space until the Garbage Collector destroys them. For example:

   c  = "List of Files" 
   p("printing a list of files")  
   "hi there"

The first line creates a single object (of Class String) and stores a reference to it in the variable 'c'. In other words, it creates a variable that points to the string "List of Files" in Object Space.

The second line creates an object (a String) without creating a variable to hold its reference. This is an example of an Intermediate Object that exists only for the purpose of being passed to the method 'p'. The 'p' method is one of the standard methods defined by Ruby before the user program is read. It prints the result of <object>.inspect.

The last line simply creates the string object "hi there". However, once it has been created in this manner, it is inaccessible. It will be destroyed by the garbage collector (asynchronously) at some later time.

Ruby will inform the user with a rather cryptic warning that the statement is 'useless'.

./tst:5: warning: useless use of a literal in void context

Object Structures

The following is a list of the structures that implement objects. The first part of every Object Structure is the structure RBasic (described below). This structure contains the information that tells Ruby what kind of object it is and how to handle it.

  struct RBasic  - Preamble for all structures (flags and klass pointer)
  struct RObject - General object, for anything not covered below
  struct RClass  - Class objects
  struct RFloat  - Floating-point numbers
  struct RString - Character strings
  struct RArray  - Arrays
  struct RRegexp - Regular expressions
  struct RHash   - Hash tables
  struct RFile   - IO, File, Socket, and so on
  struct RData   - Wrapper for data defined at the C level
  struct RStruct - Instances of Ruby's Struct class
  struct RBignum - Big integers

For example,  a character string object is represented by the RString Structure.

(String)
Figure 2: Anatomy of a character string object

Now,  let's look at the definitions of several other object types.

* Examples of Object,  String,  and Array Structures

// Structure for a general object
  struct RObject {
      struct RBasic basic;
      struct st_table *iv_tbl;
  };

// Character string object structure
  struct RString {
      struct RBasic basic;
      long len;
      char *ptr;
      union {
          long capa;
          VALUE shared;
      } aux;
  };

// Array Object Structure
  struct RArray {
      struct RBasic basic;
      long len;
      union {
          long capa;
          VALUE shared;
      } aux;
      VALUE *ptr;
  };

(ruby.h)

Variables as Reference Pointers (VALUE)

As stated earlier, the data type VALUE may be used to hold a reference to an Object. It is cast into a pointer appropriate for the Object being referenced. There is a macro RXXXX (for example RSTRING or RARRAY) for each built-in type in Ruby. For example:

  VALUE str = .....;
  VALUE arr = .....;

  RSTRING(str)->len;       /* ((struct RString*)str)->len */
  RARRAY(arr)->len;        /* ((struct RArray*)arr)->len */
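The cast behind these macros can be tried outside of Ruby. The following is a minimal sketch, not Ruby's source: the Toy names are hypothetical, and uintptr_t is used in place of unsigned long only so that the sketch stays portable to platforms where a pointer does not fit in a long.

```c
#include <stdint.h>

typedef uintptr_t VALUE;   /* ruby.h uses unsigned long; uintptr_t is an assumption of this sketch */

struct ToyBasic  { unsigned long flags; VALUE klass; };
struct ToyString { struct ToyBasic basic; long len; char *ptr; };

/* Same shape as RSTRING(): reinterpret the VALUE as a structure pointer. */
#define TOY_STRING(v) ((struct ToyString *)(v))

/* Turn a structure pointer into a VALUE handle. */
static VALUE toy_value_of(struct ToyString *s)
{
    return (VALUE)s;
}

/* Read a member back through the handle. */
static long toy_str_len(VALUE str)
{
    return TOY_STRING(str)->len;
}
```

The handle itself carries no type information; the caller has to know (or find out) which structure it refers to before casting.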

Foundation Structure -- RBasic

All Object Structure definitions begin with the structure RBasic. This structure contains the information necessary for Ruby to perform the appropriate processing for the object.

(Rbasic)
Figure 3: Struct RBasic

The Structure Definition of RBasic is as follows:

  struct RBasic {
      unsigned long flags;
      VALUE klass;
  };

(ruby.h)

RBasic Entry -- flags

A major reason RBasic is included in all other object structures is its flags entry. It has many uses, the most important of which is to indicate the object type (T_xxxx). The macro TYPE returns the type of any object it is given.

VALUE str;
int type;
str = rb_str_new2("example");   // Create a new RString Object
type = TYPE(str);               // type is the enumeration T_STRING

There are flag-enumerations for each Object Structure type.   T_STRING for RString and T_ARRAY for RArray for example.
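As a rough illustration of how a type tag in a common header lets one routine handle any structure, here is a small sketch. The tag values, the mask, and all Toy names are made up for the example and are not the values found in ruby.h.

```c
#include <stdint.h>

typedef uintptr_t VALUE;

/* Illustrative tag values and mask; not copied from ruby.h. */
enum { TOY_T_OBJECT = 1, TOY_T_STRING = 2, TOY_T_ARRAY = 3, TOY_T_MASK = 0x3f };

struct ToyBasic  { unsigned long flags; VALUE klass; };
struct ToyString { struct ToyBasic basic; long len; char *ptr; };
struct ToyArray  { struct ToyBasic basic; long len; VALUE *ptr; };

/* Every structure starts with ToyBasic, so the tag can be read
   without knowing which structure the VALUE refers to. */
static int toy_type(VALUE v)
{
    return (int)(((struct ToyBasic *)v)->flags & TOY_T_MASK);
}

/* Dispatch on the tag, the way a switch on TYPE(obj) would. */
static const char *toy_type_name(VALUE v)
{
    switch (toy_type(v)) {
      case TOY_T_OBJECT: return "object";
      case TOY_T_STRING: return "string";
      case TOY_T_ARRAY:  return "array";
      default:           return "unknown";
    }
}
```

The mask matters because the remaining bits of flags are free to carry other information without disturbing the type tag.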

RBasic Entry -- klass

The other major entry in the structure RBasic is klass. It holds a reference to the Class of the Object. This information is contained in an RClass Object.

(Class)
Figure 4: Object and class

  It has been said that class reference is named klass to prevent
  name clashes when Ruby is compiled with a C++ Compiler!

The Use of Type in Structures

I said that the type of structure is stored in the flags member of struct RBasic.    But why do we have to store the type of structure?   It’s to be able to handle all different types of structure via VALUE.    If you cast a pointer to a structure to VALUE,  as the type information does not remain,  the compiler won’t be able to help.    Therefore we have to manage the type ourselves.    That’s the consequence of being able to handle all the structure types in a unified way.

OK, but the structure used is determined by the class, so why are the structure type and the class stored separately? Being able to find the structure type from the class should be enough. There are two reasons for not doing this.

The first one is (I’m sorry for contradicting what I said before),  in fact there are structures that do not have a struct RBasic (i.e.   they have no klass member).   For example struct RNode that will appear in the second part of the book.   However,  flags is guaranteed to be in the beginning members even in special structures like this.   So if you put the type of structure in flags,  all the object structures can be differentiated in one unified way.

The second reason is that there is no one-to-one correspondence between class and structure. For example, all the instances of classes defined at the Ruby level use struct RObject, so finding a structure from a class would require keeping track of the correspondence between each class and its structure. That’s why it’s easier and faster to put the information about the type in the structure.

Usage of RBasic.flags

The RBasic.flags entry is used for many purposes, and these will be discussed in detail at the appropriate time. Just bear in mind that every object contains this entry and that it controls how an object is processed.

(Flags)

Figure 5: RBasic.flags -- Usage Information

The 8 flag bits (FL_USER0 - FL_USER7) are used for multiple purposes by different sections of Ruby. For example, FL_USER0, when set, indicates that this object is a Singleton.

VALUE's Dark Secret

When we discussed the usage of entries defined as type VALUE, it was indicated that they hold a reference to Objects. This is true, but you may have wondered why VALUE was not simply coded as a void * pointer. It should come as no surprise that we told a half-truth. It is not always simply cast to a pointer!

If Ruby had strictly enforced the Object Model, then VALUE could be a void * pointer. However, certain kinds of simple objects are used so often that this would incur a substantial performance hit. So in programming Ruby, they cheated a little.

Some Objects are fully specified by a VALUE,  eliminating the need to create an actual object in Object Space.   This saves a lot of processing cycles and does not functionally compromise the Object Model.   These object types are:

  1) Small Integer
  2) Symbols
  3) True
  4) False
  5) Nil
  6) Undef

How should Variable data be interpreted? There are seven possible interpretations of the data. Ruby interprets the contents of a Variable as follows:

  1) If the LSB = 1,  it is a Small Integer.

  2) If the VALUE is equal to 0,2,4,  or 6 it is a special
     constant: false,  true,  nil, or undef.

  3) If the lower 8 bits are equal to 0x0e,  it is a Symbol.

  4) Otherwise,  it is an Object Reference
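The four rules above can be written down as a single classification function. This is only a sketch built from the rules as stated here; the enum and the function name are hypothetical and do not appear in Ruby's source.

```c
#include <stdint.h>

typedef uintptr_t VALUE;

enum toy_kind {
    TOY_FIXNUM, TOY_FALSE, TOY_TRUE, TOY_NIL, TOY_UNDEF,
    TOY_SYMBOL, TOY_REFERENCE
};

/* Apply the rules in the order given in the text. */
static enum toy_kind toy_classify(VALUE v)
{
    if (v & 0x01) return TOY_FIXNUM;          /* rule 1: LSB set         */
    if (v == 0) return TOY_FALSE;             /* rule 2: special values  */
    if (v == 2) return TOY_TRUE;
    if (v == 4) return TOY_NIL;
    if (v == 6) return TOY_UNDEF;
    if ((v & 0xff) == 0x0e) return TOY_SYMBOL; /* rule 3: low byte 0x0e  */
    return TOY_REFERENCE;                      /* rule 4: a real pointer */
}
```

The order of the tests matters only for readability here: the three immediate encodings never overlap, because a Fixnum is always odd, the special constants are 0, 2, 4, and 6, and a Symbol always ends in 0x0e.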

The actual processing associated with Variables is slightly more complicated, and each interpretation is discussed below:

Small integer

Because integers are extremely common, creating an Integer Object for each one would slow the execution of Ruby programs. For this reason, signed integers that can be represented in 31 bits or less (when a long is 32 bits wide) are stored in the VALUE itself.

Small Integers (signed integers of no more than 31 bits) are FIXNUM types. Integers exceeding this limit are converted to the BIGNUM type, which can hold Integers of any size.

To reconstruct the original integer, the contents of the VALUE are shifted right by 1. This removes the FIXNUM flag (bit 0). The following defines and macros convert Integers to and from the FIXNUM type.

#define FIXNUM_FLAG 0x01
#define INT2FIX(i) ((VALUE)(((long)(i))<<1 | FIXNUM_FLAG))
#define FIX2LONG(x) RSHIFT((long)x,1)
(ruby.h)

In brief,  shift 1 bit to the left,  and bitwise 'OR' it with 1.

0110100001000 before conversion
1101000010001 after conversion

That means that a Fixnum as VALUE will always be an odd number. As Ruby object structures are allocated in 20-byte blocks, their addresses will always be divisible by 4. So they never overlap with the values of Fixnum as VALUE.

There are a number of specialized Integer/FIXNUM conversions. To convert an int or long to VALUE, we can use macros like INT2NUM() or LONG2NUM(). Any conversion macro XXXX2XXXX with a name containing NUM can handle both Fixnum and Bignum. For example, if INT2NUM() can’t convert an integer into a Fixnum, it will automatically convert it to a Bignum. NUM2INT() will convert both Fixnum and Bignum to int. If the number can’t fit in an int, an exception will be raised, so there is no need to check the value range.
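The shift-and-tag arithmetic behind INT2FIX and FIX2LONG can be checked with a few lines of standalone C. This is a sketch of the idea only: the toy_ functions are hypothetical, and the decode step assumes that right-shifting a negative value is an arithmetic shift, as it is on common compilers.

```c
#include <stdint.h>

typedef uintptr_t VALUE;

#define TOY_FIXNUM_FLAG 0x01

/* Encode: shift left one bit and set bit 0 (the idea behind INT2FIX).
   The shift is done on the unsigned type so negative inputs are well defined. */
static VALUE toy_int2fix(long i)
{
    return ((VALUE)i << 1) | TOY_FIXNUM_FLAG;
}

/* Decode: shift the tag bit back out (the idea behind FIX2LONG).
   Assumes an arithmetic right shift for negative values. */
static long toy_fix2long(VALUE v)
{
    return (long)((intptr_t)v >> 1);
}

/* A VALUE is a Fixnum exactly when its lowest bit is set. */
static int toy_fixnum_p(VALUE v)
{
    return (int)(v & TOY_FIXNUM_FLAG);
}
```

With these definitions the binary example above corresponds to toy_int2fix(3336), which yields 6673, that is 3336 * 2 + 1.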

Symbol

Internally Ruby uses an ID to reference symbols.   For every symbol,  there is a corresponding unique ID (unsigned long).

  typedef unsigned long ID;

(ruby.h)

With any language processing system, handling a large volume of symbols (i.e. character strings) becomes a problem. Comparing strings one by one slows the system more and more as the number of symbols increases. Ruby handles this problem by storing all symbols in a hash table and producing an ID as a key. This is discussed further in the next chapter.

Ruby processes symbols as much as possible using their IDs. This greatly increases processing efficiency. When an ID is stored in a VALUE word, it must first be converted into an immediate value. It is shifted left eight bits, and the vacated bits are loaded with 0x0e. Since this value is NOT divisible by four (4), Ruby knows that it cannot be an Object Reference. (All Object Reference addresses are divisible by four (4) because memory is allocated in quad bytes.)

The macros and defines below are used to convert IDs to and from Symbol values.

  #define SYMBOL_FLAG 0x0e
  #define ID2SYM(x) ((VALUE)(((long)(x))<<8|SYMBOL_FLAG))
  #define SYM2ID(x) RSHIFT((long)x,8)

(ruby.h)

The following macro has been defined to test whether a VALUE is holding a Symbol value. It returns true if it is a Symbol value.

  #define SYMBOL_P(x) (((VALUE)(x)&0xff)==SYMBOL_FLAG) 

(ruby.h)
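The Symbol encoding can be exercised the same way as the Fixnum encoding. The sketch below restates the three macros as hypothetical toy_ functions so that the round trip from ID to Symbol value and back can be checked in isolation.

```c
#include <stdint.h>

typedef uintptr_t VALUE;
typedef unsigned long ID;

#define TOY_SYMBOL_FLAG 0x0e

/* Encode: shift the ID left eight bits and put 0x0e in the low byte. */
static VALUE toy_id2sym(ID id)
{
    return ((VALUE)id << 8) | TOY_SYMBOL_FLAG;
}

/* Decode: shift the low byte back out. */
static ID toy_sym2id(VALUE v)
{
    return (ID)(v >> 8);
}

/* A VALUE holds a Symbol when its low byte is exactly 0x0e. */
static int toy_symbol_p(VALUE v)
{
    return (v & 0xff) == TOY_SYMBOL_FLAG;
}
```

For example, ID 1 becomes the VALUE 0x10e. Because 0x0e is even and not a multiple of four, such a VALUE can be mistaken neither for a Fixnum nor for an object address.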

True false nil

These are Special Constants that are stored in VALUE words as immediate values. When a VALUE word is an even number less than seven, it is interpreted as one of the following immediate values:

  #define Qfalse 0     /* Ruby false */
  #define Qtrue  2     /* Ruby true */
  #define Qnil   4     /* Ruby nil */

(ruby.h)

True and false are tested directly. The following two macros test for nil (NIL_P) and for Ruby truth (RTEST). RTEST is false only for nil and false; every other value is true.

  #define NIL_P(v) ((VALUE)(v) == Qnil)         /* True if nil */
  #define RTEST(v) (((VALUE)(v) & ~Qnil) != 0)  /* True unless nil
                                                   or false */

(ruby.h)
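Why a single mask is enough to reject both nil and false becomes clear when the bit patterns are written out: false is 0 and nil is 4, so clearing bit 2 leaves zero for exactly these two values. The sketch below uses hypothetical TOY_ names with the constant values quoted above.

```c
#include <stdint.h>

typedef uintptr_t VALUE;

#define TOY_QFALSE ((VALUE)0)
#define TOY_QTRUE  ((VALUE)2)
#define TOY_QNIL   ((VALUE)4)

/* Same test as NIL_P: an exact comparison with nil. */
static int toy_nil_p(VALUE v)
{
    return v == TOY_QNIL;
}

/* Same test as RTEST: clear the nil bit, then check for any remaining bit.
   0 (false) and 4 (nil) both become 0; everything else stays non-zero. */
static int toy_rtest(VALUE v)
{
    return (v & ~TOY_QNIL) != 0;
}
```

Note that the Fixnum zero is stored as the VALUE 1, so it passes this test: the integer 0 counts as true in Ruby.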

Qundef

  #define Qundef  6     /* Undefined Value as Placeholder */
(ruby.h)

This value is used internally in the interpreter for Undefined Conditions and is never visible at the Ruby Level.  

Methods

The second requirement of Ruby Objects is that they can respond to messages. When an Object receives a message, it will attempt to call the requested method. Methods are stored in Class Objects. So to start understanding how objects respond to messages, it is necessary to understand how classes provide the basic support structure for method response processing.

struct RClass

Because Classes and Modules are generally similar, they both use the structure RClass. They are differentiated by the contents of the RBasic flags (i.e. T_CLASS or T_MODULE).

Modules are also involved in processing messages.   However,  they do so in a way very analogous to Classes,  and discussion of them can be postponed till later.

  struct RClass {
      struct RBasic basic;
      struct st_table *iv_tbl;
      struct st_table *m_tbl;
      VALUE super;
  };

(ruby.h)

The two fields in RClass that are involved in message processing,  are m_tbl,  and super.

m_tbl is a pointer to a Hash Table. The method name serves as the Hash Key. Assuming the method is found, an information block (i.e. a NODE) is returned that contains the information necessary to execute the method. As with other things introduced in this opening chapter, Nodes are a matter for later discussion.

super is a reference to the Superclass in the Class Inheritance Tree. Each Class has only one Superclass, thus there is only one super entry in the class structure.

All hash tables used internally by Ruby are referenced with a pointer of type st_table. Hash table functions and how they work are the topic of the next chapter.

Ruby is a single inheritance language. As we will learn in later chapters, one of the purposes of Modules is to provide Mix-ins, which yield many of the benefits of Multiple Inheritance without ambiguity.

The super entry is used by Ruby to build a Class Inheritance Tree. Almost all Classes have a reference to their Superclass. The only Class whose super entry is empty (0) is the Root Class, Object. The details of how the Root Class is constructed and initialized will be discussed in later chapters. For now, all that is necessary is the knowledge that all the Kernel Methods are embedded in this class. This means that all Kernel Methods are available to all classes unless overridden by a later class!

The following graphic depicts the Class Inheritance Tree.

 (classtree)

Figure 6: Class Inheritance Tree

Search for methods

With the structure described above, the procedure for searching for methods is straightforward. If the method is not in the current class's m_tbl, the procedure searches up through the Superclass links, one class at a time. If the Root Object is reached and the method is not found there, an exception is raised.

The following code tries to activate the method box ,  which does not exist.   This results in an exception error and program termination.

p("Test") 
box("printing a list of files")  
p("OF")

========= Output =========

"Test"
-:2: undefined method `box' for main:Object (NameError)

The following procedure sequentially searches the Class Inheritance Tree for a method.

 static NODE*
 search_method(klass, id, origin)
     VALUE klass, *origin;
     ID id;
 {
     NODE *body;

     if (!klass) return 0;
     while (!st_lookup(RCLASS(klass)->m_tbl, id, &body)) {
         klass = RCLASS(klass)->super;
         if (!klass) return 0;
     }

     if (origin) *origin = klass;
     return body;
 }
(eval.c)

The hash function st_lookup searches the current class's method table (via the m_tbl pointer) for the requested method. If it is not found, the Superclass becomes the current class and the search is repeated. If the chain ends (the super entry is 0) before the method is found, the function returns 0; otherwise it returns a pointer to the body of the method!
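The same walk up the superclass chain can be modelled with ordinary C structures. In this sketch the method table is a plain array searched with strcmp rather than a hash table, an int stands in for the NODE, and every Toy name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct ToyMethod { const char *name; int body; };   /* body stands in for a NODE */

struct ToyClass {
    const struct ToyMethod *m_tbl;   /* method table (an array here, not a hash) */
    size_t m_count;
    const struct ToyClass *super;    /* NULL at the root class */
};

/* Look in one class only; returns NULL when the method is absent. */
static const struct ToyMethod *toy_lookup(const struct ToyClass *klass,
                                          const char *name)
{
    size_t i;
    for (i = 0; i < klass->m_count; i++)
        if (strcmp(klass->m_tbl[i].name, name) == 0)
            return &klass->m_tbl[i];
    return NULL;
}

/* Walk the super chain; *origin receives the class that defines the method. */
static const struct ToyMethod *toy_search_method(const struct ToyClass *klass,
                                                 const char *name,
                                                 const struct ToyClass **origin)
{
    const struct ToyMethod *body = NULL;

    while (klass && !(body = toy_lookup(klass, name)))
        klass = klass->super;
    if (!klass) return NULL;         /* ran off the top of the tree */
    if (origin) *origin = klass;
    return body;
}
```

A subclass that defines a method with the same name as its superclass is found first, which is all that overriding amounts to in this model.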

Throughout Ruby, extraordinary care has been taken to ensure efficient processing. This is a major concern of good interpreter design, and it is especially true of the Ruby Language.

Method lookup is another place that could markedly slow Ruby programs. Searching for methods is not only time consuming, it is also one of the most common operations Ruby performs.

Ruby solves this by performing the method search only the first time a method name is encountered! The results of these searches are cached. Subsequent references to a method name are fulfilled from the cache.
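To make the caching idea concrete, here is a toy direct-mapped cache keyed by class and method ID. It is an assumption-laden sketch and not Ruby's cache: the table size, the slot formula, and all names are invented for the example, and the slow search is replaced by a stub that merely counts how often it runs.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_CACHE_SIZE 64            /* illustrative size, a power of two */

struct ToyCacheEntry {
    const void *klass;               /* class the search started from */
    unsigned long mid;               /* method ID */
    int body;                        /* cached search result */
    int valid;
};

static struct ToyCacheEntry toy_cache[TOY_CACHE_SIZE];
static int toy_slow_searches;        /* how many times the slow path ran */

/* Stand-in for the full walk up the superclass chain. */
static int toy_slow_search(const void *klass, unsigned long mid)
{
    (void)klass;
    toy_slow_searches++;
    return (int)(mid * 10);          /* pretend method body */
}

/* Consult the cache first; fall back to the slow search on a miss. */
static int toy_find_method(const void *klass, unsigned long mid)
{
    size_t slot = (size_t)(((uintptr_t)klass >> 3) ^ mid) & (TOY_CACHE_SIZE - 1);
    struct ToyCacheEntry *e = &toy_cache[slot];

    if (e->valid && e->klass == klass && e->mid == mid)
        return e->body;              /* cache hit */

    e->klass = klass;                /* miss: search, then remember */
    e->mid = mid;
    e->body = toy_slow_search(klass, mid);
    e->valid = 1;
    return e->body;
}
```

A real cache also has to be invalidated when methods are defined or removed; that part is left out of the sketch.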

Instance Variable

Now, the third requirement of an Object is the ability to retain information specific to an individual Object.

rb_ivar_set ()

The mechanism for retaining object-specific data is an Instance Variable. Not surprisingly, references to Instance Variables in an object are stored in a Hash Table.

 VALUE
 rb_ivar_set(obj, id, val)
     VALUE obj;
     ID id;
     VALUE val;
 {
     if (!OBJ_TAINTED(obj) && rb_safe_level() >= 4)
         rb_raise(rb_eSecurityError,
                  "Insecure: Can't modify instance variable");
     if (OBJ_FROZEN(obj)) rb_error_frozen("object");
     switch (TYPE(obj)) {
       case T_OBJECT:
       case T_CLASS:
       case T_MODULE:
         if (!ROBJECT(obj)->iv_tbl)
             ROBJECT(obj)->iv_tbl = st_init_numtable();
         st_insert(ROBJECT(obj)->iv_tbl, id, val);
         break;
       default:
         generic_ivar_set(obj, id, val);
         break;
     }
     return val;
 }

(variable.c)

The routines rb_raise and rb_error_frozen are error checks. While these routines are necessary for the actual function, they are not the main thread of the processing. For now, ignore the error processing and study the procedure's primary function. With the error code removed, we are left with a switch and its attendant processing.

switch (TYPE(obj)) {
  case T_aaaa:
  case T_bbbb:
    :  :  :
  default:
}

The macro TYPE() returns an object's Type Flag (T_OBJECT, T_STRING, ...). These flags are enumerated as integer values and used by the 'switch' statement to select the appropriate processing. Fixnums and Symbols, although entirely contained in the variable reference, still return a type of T_FIXNUM and T_SYMBOL, so these types do not cause any processing problems.

Only two object structures contain an st_table pointer to Instance Variables. Between them they cover three types:

/*
** TYPE (val) == T_OBJECT
*/
 struct RObject {
     struct RBasic basic;
     struct st_table *iv_tbl;
 };
/*
** TYPE (val) == T_CLASS or T_MODULE
*/
 struct RClass {
     struct RBasic basic;
     struct st_table *iv_tbl;
     struct st_table *m_tbl;
     VALUE super;
 };

(ruby.h)

The objects above contain an entry labeled iv_tbl (Instance Variable TaBLe). In other words, OBJECTS, CLASSES, and MODULES contain an st_table pointer to a Hash Table for storing Instance Variables.

For these three types, loading a value into an Instance Variable is accomplished by the following code fragment. (See rb_ivar_set above.)

 if (!ROBJECT(obj)->iv_tbl)
     ROBJECT(obj)->iv_tbl = st_init_numtable();
 st_insert(ROBJECT(obj)->iv_tbl, id, val);

A key to understanding this code is that the first two entries in the RObject and RClass Structures are the same. Because of this, an RClass Structure can be cast to RObject as long as only the RBasic and iv_tbl entries are accessed. As the chart below shows, iv_tbl can be accessed in both Objects.

 (RObject-RClass)

Figure 6a: RObject vs RClass

If the iv_tbl entry is empty (NULL), an empty Hash Table is constructed. From this point on, iv_tbl is assumed to be valid. The Hash Table insert function, st_insert, first checks whether the key (the Instance Variable's ID) is already in the table. If it is, the value entry is loaded with the new value. Otherwise, a new table item is constructed and inserted into the Hash Table.
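The claim that iv_tbl sits at the same place in both structures can be checked with offsetof. The sketch below uses hypothetical Toy structures with the same member order as RObject and RClass. Instead of casting one structure type to the other, as ROBJECT() does, it reaches the slot through the shared offset, which keeps the example within what standard C guarantees.

```c
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t VALUE;

struct ToyTable { int entries; };                 /* stand-in for st_table */
struct ToyBasic { unsigned long flags; VALUE klass; };

struct ToyObject {
    struct ToyBasic basic;
    struct ToyTable *iv_tbl;
};

struct ToyClass {
    struct ToyBasic basic;
    struct ToyTable *iv_tbl;
    struct ToyTable *m_tbl;
    VALUE super;
};

/* Non-zero when iv_tbl lives at the same offset in both structures. */
static int toy_same_prefix(void)
{
    return offsetof(struct ToyObject, iv_tbl) == offsetof(struct ToyClass, iv_tbl);
}

/* Address of the iv_tbl slot of either structure, found via the shared offset. */
static struct ToyTable **toy_iv_tbl_slot(void *obj)
{
    return (struct ToyTable **)((char *)obj + offsetof(struct ToyObject, iv_tbl));
}
```

Writing through toy_iv_tbl_slot() on a ToyClass changes that structure's own iv_tbl member, which is the effect the shared code path in rb_ivar_set relies on.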

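Warning: as struct RClass is a class object, this instance variable table is for the use of the class object itself. In Ruby programs, it corresponds to an instance variable assigned directly inside a class body (outside any method), not to the instance variables of the class's instances.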

generic_ivar_set()

As stated above, only the RObject and RClass Object Structures contain an iv_tbl entry. So what happens if you want to define an Instance Variable in other Built-in Objects (or objects derived from them)? The function generic_ivar_set is used to store the Instance Variables of objects without an internal iv_tbl entry.

 default:
   generic_ivar_set(obj, id, val);
   break;

(variable.c)

Figure 6b - Code Fragment from rb_ivar_set()

Objects other than RObject and RClass do not have iv_tbl entries. Their Instance Variables are stored instead in a separate Hash Table, keyed by the Object itself (its VALUE).

The structure generic_iv_tbl is the Primary Hash Table. It uses the Object (its VALUE) as the Table Key, and the Table Value is a standard (although external) iv_tbl entry. This entry is a pointer to the Hash Table containing the Instance Variables for that Object.

 (givtable)

Figure 7: Generic Instance Variable Tables

 static st_table *generic_iv_tbl;  /* Root of Generic Instance */
                                   /* Variable Table           */

 static void
 generic_ivar_set(obj, id, val)
     VALUE obj;
     ID id;
     VALUE val;
 {
     st_table *tbl;

     /* Handle Special Constants */
     if (rb_special_const_p(obj)) {
         special_generic_ivar = 1;
     }
     /* If there is no generic_iv_tbl, create it! */
     if (!generic_iv_tbl) {
         generic_iv_tbl = st_init_numtable();
     }

     /* Creating and/or Updating a Generic Instance Variable */
     if (!st_lookup(generic_iv_tbl, obj, &tbl)) {
         FL_SET(obj, FL_EXIVAR);
         tbl = st_init_numtable();
         st_add_direct(generic_iv_tbl, obj, tbl);
         st_add_direct(tbl, id, val);
         return;
     }
     st_insert(tbl, id, val);
 }

(variable.c)

The function rb_special_const_p() returns true if obj is a Special Constant (i.e. Qfalse, Qtrue, Qnil, Qundef, a Symbol, or a Fixnum).

The function st_init_numtable creates a new empty Hash Table.

The function st_lookup is first used to determine whether there are any Instance Variables associated with this object. If there are, a pointer to the Hash Table for that object is returned. The particular Instance Variable is then either updated or, if it does not currently exist, created.

If there are no Instance Variables associated with this object, a Generic Instance Variable entry is created for it. An empty Hash Table is created and added to the generic_iv_tbl under the current object. The Instance Variable is then inserted into the empty table just created.

The macro FL_SET is used to set the EXIVAR flag in the current Object's basic.flags. The flag name can be thought of as an abbreviation of EXternal Instance VARiable.
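The two-level arrangement (object to table, then ID to value) can be modelled with small fixed-size arrays in place of hash tables. Everything in this sketch is hypothetical, including the flag bit and the size limits; it only mirrors the control flow described above.

```c
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t VALUE;
typedef unsigned long ID;

#define TOY_MAX_OBJS  8
#define TOY_MAX_IVARS 8
#define TOY_FL_EXIVAR 0x1UL          /* illustrative bit, not ruby.h's value */

struct ToyIvTable { ID ids[TOY_MAX_IVARS]; VALUE vals[TOY_MAX_IVARS]; size_t n; };

/* First level: object -> its external instance variable table. */
static struct { VALUE obj; struct ToyIvTable tbl; } toy_generic[TOY_MAX_OBJS];
static size_t toy_generic_n;

static struct ToyIvTable *toy_table_for(VALUE obj)
{
    size_t i;
    for (i = 0; i < toy_generic_n; i++)
        if (toy_generic[i].obj == obj) return &toy_generic[i].tbl;
    return NULL;
}

/* Returns 0 on success, -1 when one of the fixed-size tables is full. */
static int toy_generic_ivar_set(VALUE obj, unsigned long *flags, ID id, VALUE val)
{
    struct ToyIvTable *tbl = toy_table_for(obj);
    size_t i;

    if (!tbl) {                      /* first ivar for this object */
        if (toy_generic_n == TOY_MAX_OBJS) return -1;
        toy_generic[toy_generic_n].obj = obj;
        tbl = &toy_generic[toy_generic_n++].tbl;
        tbl->n = 0;
        *flags |= TOY_FL_EXIVAR;     /* remember that an external table exists */
    }
    for (i = 0; i < tbl->n; i++)     /* update an existing variable */
        if (tbl->ids[i] == id) { tbl->vals[i] = val; return 0; }
    if (tbl->n == TOY_MAX_IVARS) return -1;
    tbl->ids[tbl->n] = id;           /* or append a new one */
    tbl->vals[tbl->n++] = val;
    return 0;
}

/* Two-stage lookup; returns 1 and stores the value in *out when found. */
static int toy_generic_ivar_get(VALUE obj, ID id, VALUE *out)
{
    struct ToyIvTable *tbl = toy_table_for(obj);
    size_t i;

    if (!tbl) return 0;
    for (i = 0; i < tbl->n; i++)
        if (tbl->ids[i] == id) { *out = tbl->vals[i]; return 1; }
    return 0;
}
```

The flag lets a reader skip the first-level lookup entirely for the common case of an object that has never been given an instance variable.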

Object Construction

The question is asked,  why do only RObject  and RClass structures have iv_tbl entries? There are several reasons.

The primary reason is reducing memory usage. Adding an iv_tbl to each Built-in Object would increase memory consumption by 20%. Instance Variables are most often used in Objects, Classes, and Modules. While any object can have Instance Variables, they are seldom found in other objects. Thus, using Generic Instance Tables to implement Instance Variables substantially reduces memory usage with only a small penalty in efficiency.

A secondary reason is that all objects are now about the same size. They all fit within twenty (20) bytes. This in turn improves Garbage Collection and Memory Allocation functions. This issue is discussed in more detail in the chapter describing Ruby Garbage Collection processing.

As an interesting side note, Generic Instance Tables were not introduced until Ruby 1.2.

rb_ivar_get ()

The function rb_ivar_set () is used to create and/or update the value of an Instance Variable. Here we discuss the companion function that returns the value of an Instance Variable.

 VALUE
 rb_ivar_get(obj, id)
     VALUE obj;
     ID id;
 {
     VALUE val;

     switch (TYPE(obj)) {
       /* (A) */
       case T_OBJECT:
       case T_CLASS:
       case T_MODULE:
         if (ROBJECT(obj)->iv_tbl &&
             st_lookup(ROBJECT(obj)->iv_tbl, id, &val))
             return val;
         break;
       /* (B) */
       default:
         if (FL_TEST(obj, FL_EXIVAR) || rb_special_const_p(obj))
             return generic_ivar_get(obj, id);
         break;
     }
     /* (C) */
     rb_warning("instance variable %s not initialized",
                rb_id2name(id));

     return Qnil;
 }
(variable.c)

(A) The RObject iv_tbl entry is checked. If it is NULL, the break statement is executed and processing continues at (C). Otherwise, the function st_lookup is called. If the desired Instance Variable is found, its value is returned. If not, processing continues at (C).

(B) If the object type is not RObject or RClass, we must check whether the object has an entry in the generic_iv_tbl. If it does, the basic.flags word will have the FL_EXIVAR bit set. If the bit is set, or if the Object is a Special Constant, generic_ivar_get() is called.

(C) When the requested variable is not found, a warning message is issued and nil is returned.

generic_ivar_get() performs a two-stage Hash Table lookup. First, st_lookup is performed using the current Object as the Key. If the object is in the generic_iv_tbl, the iv_tbl entry for the Object's External Instance Variables Hash Table is returned. Then st_lookup is called again to get the value of the requested Instance Variable.

Object Structures

In this section we will look at some of the more complex Object structures and how they are handled.

struct RString

  struct RString {
      struct RBasic basic;
      long len;
      char *ptr;
      union {
          long capa;
          VALUE shared;
      } aux;
  };

(ruby.h)

The ptr entry (PoinTeR) points to a null-terminated character string, and the len entry (LENgth) holds the length of the string, NOT including the null character.

Strings are handled not only by Ruby itself but also by extension libraries. The string pointer and length can be accessed with RSTRING(str)->ptr and RSTRING(str)->len. You can in fact write into the ptr and len entries, but it is generally foolish to do so.

However, if you must mess with the string internals, at least observe the following:

  1) Before use, check that str really points to a struct RString

  2) Members may be read,  but do not modify them

  3) Do not copy RSTRING(str)->ptr to another location and try to
     use it at some later time!   The Garbage Collector may destroy
     the original string at any time and deallocate the memory
     associated with that string.

Unless a string is shared, a subject discussed below, the aux.capa entry contains the amount of memory allocated for this object's string. Note that this is usually more than the length. If it is, the string can grow up to that limit without allocating more memory. Again, this is done to increase efficiency, as memory allocation can be fairly slow.

Strings and Arrays also implement COPY-ON-WRITE. Let us return to a previous example:

(Reference)

Figure 8: Ruby Variables keep references to an object

Now we have three Variables all pointing at the same String Object. What happens if we want to modify Variable B?

Ruby’s strings can be modified (are mutable).   By mutable I mean after the following code:

s = "str"        # create a string and assign it to s
s.concat("ing")  # append "ing" to this string object
p(s)             # show the string

the content of the object pointed to by s will become “string”. This is different from Java or Python string objects. Java’s StringBuffer is closer.

And what’s the relation? First,  mutable means the length (len) of the string can change.   We have to increase or decrease the allocated memory size each time the length changes.   We can of course use realloc() for that,  but generally malloc() and realloc() are heavy operations.   Having to realloc() each time the string changes is a huge burden.

That’s why the memory pointed by ptr has been allocated with a size a little bigger than len.   Because of that,  if the added part can fit into the remaining memory,  it’s taken care of without calling realloc(),  so it’s faster.   The structure member aux.capa contains the length including this additional memory.
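The effect of keeping spare capacity can be seen in a small model of string concatenation. The structure and the growth policy (doubling the needed size) are assumptions made for this sketch; they are not taken from string.c.

```c
#include <stdlib.h>
#include <string.h>

struct ToyStr { long len; long capa; char *ptr; };

/* Append add to s. Returns 0 on success, -1 on allocation failure.
   *reallocs (optional) counts how often memory had to be (re)allocated. */
static int toy_str_cat(struct ToyStr *s, const char *add, int *reallocs)
{
    long add_len = (long)strlen(add);
    long need = s->len + add_len;

    if (need > s->capa || !s->ptr) {
        long new_capa = need * 2;            /* illustrative growth policy */
        char *p = realloc(s->ptr, (size_t)new_capa + 1);
        if (!p) return -1;
        s->ptr = p;
        s->capa = new_capa;
        if (reallocs) (*reallocs)++;
    }
    /* Enough room: copy the new characters and the terminating NUL. */
    memcpy(s->ptr + s->len, add, (size_t)add_len + 1);
    s->len = need;
    return 0;
}
```

Starting from an empty string, appending "str" allocates once with room to spare, and the following append of "ing" fits into the existing buffer without another call to realloc().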

So what is this other aux.shared? It’s to speed up the creation of literal strings.   Have a look at the following Ruby program.

while true do  # repeat indefinitely
  a = "str"        # create a string with "str" as content and assign it to a
  a.concat("ing")  # append "ing" to the object pointed by a
  p(a)             # show "string" 
end

Whatever the number of times you repeat the loop,  the fourth line’s p has to show "string".   That’s why the code "str" should create,  each time,  a string object holding a different char[].   However,  if no change occurs for a lot of strings,  useless copies of char[] can be created many times.   It would be better to share one common char[].

The trick that allows this to happen is aux.shared.   String objects created with a literal use one shared char[].   When a change occurs,  the string is copied in unshared memory,  and the change is done on this new copy.   This technique is called “copy-on-write”.   When using a shared char[],  the flag ELTS_SHARED is set in the object structure’s basic.flags,  and aux.shared contains the original object.   ELTS seems to be the abbreviation of ELemenTS.
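A stripped-down model of copy-on-write may help here. In this sketch a plain int plays the role of the ELTS_SHARED flag and a borrowed C literal plays the role of the shared char[]; the names and layout are hypothetical and much simpler than struct RString.

```c
#include <stdlib.h>
#include <string.h>

struct ToyStr {
    long len;
    char *ptr;
    int shared;      /* plays the role of the ELTS_SHARED flag */
};

/* Create a string that borrows the literal's buffer (no copy is made). */
static struct ToyStr toy_str_share(const char *literal)
{
    struct ToyStr s;
    s.len = (long)strlen(literal);
    s.ptr = (char *)literal;     /* borrowed: must not be written to */
    s.shared = 1;
    return s;
}

/* Give the string a private buffer before its first modification. */
static int toy_str_modify(struct ToyStr *s)
{
    char *copy;
    if (!s->shared) return 0;
    copy = malloc((size_t)s->len + 1);
    if (!copy) return -1;
    memcpy(copy, s->ptr, (size_t)s->len + 1);
    s->ptr = copy;
    s->shared = 0;
    return 0;
}

/* Append add to s, unsharing first. Returns 0 on success, -1 on failure. */
static int toy_str_cat(struct ToyStr *s, const char *add)
{
    size_t add_len = strlen(add);
    char *p;

    if (toy_str_modify(s) != 0) return -1;
    p = realloc(s->ptr, (size_t)s->len + add_len + 1);
    if (!p) return -1;
    memcpy(p + s->len, add, add_len + 1);
    s->ptr = p;
    s->len += (long)add_len;
    return 0;
}
```

Two strings created from the same literal point at the same characters until one of them is modified; only the modified one pays for a copy, and the literal itself is never touched.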

But, well, let’s return to our talk about RSTRING(str)->ptr. Even if consulting the pointer is OK, you must not modify it, first because the value of len or capa will no longer agree with the content, and also because when modifying strings created as literals, aux.shared has to be separated.

To finish this section about RString, let’s write some examples of how to use it. str is a VALUE that points to RString.

RSTRING(str)->len;                // string length
RSTRING(str)->ptr[0];             // first character
str = rb_str_new("content", 7);   // create a string containing "content";
                                  // the second parameter is the length
str = rb_str_new2("content");     // create a string containing "content";
                                  // its length is calculated with strlen()
rb_str_cat2(str, "end");          // concatenate a C string to a Ruby string

The following function (from string.c) creates a new String Object that shares the character data of an existing one. The new Object's ptr and len are copied from the original, and its aux field contains a reference to the original object. If the original Object was TAINTED, the new one will also be TAINTED!

static VALUE
str_new3(klass,  str)
    VALUE klass,  str;
{
    VALUE str2 = str_alloc(klass);

    RSTRING(str2)->len = RSTRING(str)->len;
    RSTRING(str2)->ptr = RSTRING(str)->ptr;
    RSTRING(str2)->aux.shared = str;
    FL_SET(str2,  ELTS_SHARED);
    OBJ_INFECT(str2,  str);

    return str2;
}

(string.c)

Notice that the original object's data (ptr and len) is unchanged, although in some cases the original object is frozen. Any variables that referenced the original object still point at the original object!

struct RArray

The RArray and RString structures are similar. The ptr member points to the data, and len contains the data length. Depending on the basic.flags word, aux contains either the array capacity (aux.capa) or a reference to a shared array (aux.shared).

struct RArray {
    struct RBasic basic;
    long len;
    union {
        long capa;
        VALUE shared;
    } aux;
    VALUE *ptr;
};

(ruby.h)

From this structure, it’s clear that Ruby’s Array is an array and not a list. So when the number of elements changes significantly, a realloc() must be done, and if an element is inserted anywhere other than at the end, a memmove() occurs. Even so, on current machines these operations are impressively fast.
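The cost of inserting into a contiguous array can be sketched in plain C as follows. This is a minimal stand-alone model, not Ruby's array.c: SketchArray and sketch_ary_insert are hypothetical names, RBasic and the sharing flag are omitted, and the doubling growth policy is an assumption chosen for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned long VALUE;   /* as in ruby.h */

/* Hypothetical, simplified stand-in for struct RArray. */
typedef struct {
    long len;     /* number of elements in use */
    long capa;    /* number of elements allocated */
    VALUE *ptr;   /* contiguous element storage */
} SketchArray;

/* Insert v at index pos (0 <= pos <= len).
   Returns 0 on success, -1 on a bad index or allocation failure. */
int sketch_ary_insert(SketchArray *a, long pos, VALUE v)
{
    assert(a != NULL);
    if (pos < 0 || pos > a->len) return -1;

    /* Storage is full: grow it with realloc(). */
    if (a->len == a->capa) {
        long new_capa = a->capa > 0 ? a->capa * 2 : 4;
        VALUE *p = realloc(a->ptr, (size_t)new_capa * sizeof(VALUE));
        if (p == NULL) return -1;
        a->ptr = p;
        a->capa = new_capa;
    }

    /* Shift the tail one slot to the right; this moves nothing
       when the insertion point is the end of the array. */
    memmove(a->ptr + pos + 1, a->ptr + pos,
            (size_t)(a->len - pos) * sizeof(VALUE));
    a->ptr[pos] = v;
    a->len++;
    return 0;
}
```

Appending touches only the new slot, while inserting at index 0 moves every existing element, which is the trade-off of an array compared with a linked list.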

The way to access it is also similar to RString. You can read the RARRAY(arr)->ptr and RARRAY(arr)->len members, but you must not set them directly, and so on. We’ll only look at simple examples:

/* Usage from C side */
VALUE ary;
ary = rb_ary_new();               /* Creating Empty Array */
rb_ary_push(ary, INT2FIX(9));     /* Push '9' onto the array */
RARRAY(ary)->ptr[0];              /* Index 0 reference */
rb_p(RARRAY(ary)->ptr[0]);        /* 'p' reference, Result 9 */

# Usage from Ruby
ary = []                           # Creating Empty Array
ary.push(9)                        # push '9' onto the array
ary[0]                             # Index 0 reference
p(ary[0])                          # 'p' reference, Result 9

struct RRegexp

This is the structure for instances of Regexp, the regular expression class.

struct RRegexp {
    struct RBasic basic;
    struct re_pattern_buffer *ptr;
    long len;
    char *str;
};

(ruby.h)

The Regular Expression package and its workings are beyond the scope of this document. The char *str entry points at the Regular Expression string created by the user, and len is of course the length of that string. The *ptr entry points at the compiled Regular Expression.

struct RHash

The following is the RHash structure, which implements Ruby's Hash class.

struct RHash {
    struct RBasic basic;
    struct st_table *tbl;
    int iter_lev;
    VALUE ifnone;
};

(ruby.h)

The structure st_table is discussed in the next chapter, "Name and Name Chart". The entry iter_lev provides for re-entrancy, for example during multi-threaded operation. Finally, ifnone holds the value used when a key has no entry: Qnil by default, or a reference to a block procedure.

struct RFile

The structure RFile implements Ruby file operations. It is the structure used for instances of the IO class and of its subclasses.

struct RFile {
struct RBasic basic;
struct OpenFile *fptr;
};

(ruby.h)

The RFile structure simply points to a large I/O-specific structure that handles file operations.

typedef struct OpenFile {
    FILE *f;                    /* Stdio ptr for read/write */
    FILE *f2;                   /* Additional ptr for rw pipes */
    int mode;                   /* Mode flags */
    int pid;                    /* Child's pid (for pipes) */
    int lineno;                 /* Number of lines read */
    char *path;                 /* Pathname for file */
    void (*finalize) _((struct OpenFile*)); /* Finalize proc */
} OpenFile;

(rubyio.h)

The use of each entry is documented in the comments. The structure provides a wrapper around the C language's standard I/O.

struct RData

The RData structure is different from the other object structures. It is used to implement extension libraries.

struct RData {
    struct RBasic basic;
    void (*dmark) _((void*));
    void (*dfree) _((void*));
    void *data;
};

(ruby.h)

The entry *data  points to a user defined structure.   The entries *dmark  and *dfree  are pointers to the functions provided for Mark and Sweep  operations.   For a more detailed explanation of how these functions are used,  please see Chapter 5 (Garbage Collection).
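How the three entries cooperate can be sketched in plain C as follows. This is a simplified stand-alone model under stated assumptions, not Ruby's gc.c: SketchData, Point, sketch_new_point, sketch_mark, and sketch_sweep are hypothetical names, RBasic is omitted, and the counter freed_count exists only to make the effect observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct RData. */
typedef struct {
    void (*dmark)(void *);   /* called while marking; may be NULL */
    void (*dfree)(void *);   /* called when the object is swept */
    void *data;              /* the user-defined structure */
} SketchData;

/* A user-defined structure, as an extension library might define one. */
typedef struct {
    int x, y;
} Point;

static int freed_count = 0;  /* illustration only */

/* Release function supplied by the "extension". */
static void point_free(void *p)
{
    free(p);
    freed_count++;
}

/* Wrap a freshly allocated Point. It refers to no other objects,
   so no mark function is needed and dmark stays NULL. */
SketchData *sketch_new_point(int x, int y)
{
    SketchData *d = malloc(sizeof(SketchData));
    Point *pt = malloc(sizeof(Point));
    if (d == NULL || pt == NULL) { free(d); free(pt); return NULL; }
    pt->x = x;
    pt->y = y;
    d->dmark = NULL;
    d->dfree = point_free;
    d->data = pt;
    return d;
}

/* What a collector would do for this object in its mark phase. */
void sketch_mark(SketchData *d)
{
    assert(d != NULL);
    if (d->dmark != NULL) d->dmark(d->data);
}

/* What a collector would do when the object is found unreachable. */
void sketch_sweep(SketchData *d)
{
    assert(d != NULL);
    if (d->dfree != NULL && d->data != NULL) d->dfree(d->data);
    free(d);
}
```

The design point is that the interpreter never needs to know the layout of Point: it only calls back through the two function pointers supplied by the extension. In real extension code this wrapping is done through ruby.h's Data_Wrap_Struct macro rather than by hand.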

Figure 9 shows the basic relationship that ties together the data structures of Ruby proper and of an extension library.

 (rdata)

Figure 9: Conceptual image of struct RData


The original work is Copyright © 2002 - 2004 Minero AOKI.
Translated by Vincent ISAMBART
Translations and Additions by C.E. Thornton
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.