From this chapter on, we will begin examining the source code.
One of Ruby's principles is that everything is an Object. But what is an Object?
There are three prime requirements that describe an Object, namely:
1) An object has an identity (an Object Identifier or ID)
2) It can react to received messages (using Methods)
3) It can maintain an internal state (Instance Variables)
This chapter will discuss these attributes of Objects and the code that in part supports them. The core sources that are most in play here are:
1) ruby.h
2) object.c
3) class.c
4) variable.c
When an Object is created, it is stored somewhere in Object Space. To use it, some sort of handle must be provided. This is the primary job of Variables and Constants. Regardless of the type or class of the Object, the Variable is always a VALUE (an unsigned long). With some exceptions, discussed later, a Variable most often contains a reference to an Object.
Figure 1: VALUE as a Reference to an Object Structure
typedef unsigned long VALUE; (ruby.h)
The VALUE is cast into a pointer to the various object structures.   We will cover this topic shortly.
We often create objects that do not have a handle stored in a Variable. These are often transitory or intermediate objects used in expressions. References to such objects are maintained in the Syntactic Tree. The construction of the Syntactic Tree and its associated APIs are discussed in Chapter 12. Finally, Objects can be created that are never referenced; they simply exist in Object Space until the Garbage Collector destroys them. For example:
c = "List of Files"
p("printing a list of files")
"hi there"
The first line creates a single object (of Class String) and stores a reference to it in the variable 'c'. In other words, it creates a variable that points to the string "List of Files" in Object Space.
The second line creates an object (a String) without creating a variable to hold its reference. This is an example of an Intermediate Object that exists only for the purpose of being passed to the method 'p'. The 'p' method is one of the standard methods Ruby defines before the user program is read. This method prints the result of <object>.inspect.
The last line simply creates the string object "hi there". However, once it has been created in this manner, it is inaccessible. It will be destroyed by the garbage collector (asynchronously) at some later time.
Ruby will inform the user with a rather cryptic warning that the statement is 'useless'.
./tst:5: warning: useless use of a literal in void context
struct RBasic - Preamble for all structures (flags and klass pointer)
struct RObject - General object, for things not covered below
struct RClass - Class objects
struct RFloat - Floating-point numbers
struct RString - Character strings
struct RArray - Arrays
struct RRegexp - Regular expressions
struct RHash - Hash tables
struct RFile - IO: File, Socket, and so on
struct RData - Class for describing everything at the 'C' level
struct RStruct - The structure of Ruby's Struct class
struct RBignum - Big integers
For example,  a character string object is represented by the RString Structure.
Figure 2: Anatomy of a character string object
Now,  let's look at the definitions of several other object types.
// Structure for a general object
struct RObject {
    struct RBasic basic;
    struct st_table *iv_tbl;
};

// Character string object structure
struct RString {
    struct RBasic basic;
    long len;
    char *ptr;
    union {
        long capa;
        VALUE shared;
    } aux;
};

// Array object structure
struct RArray {
    struct RBasic basic;
    long len;
    union {
        long capa;
        VALUE shared;
    } aux;
    VALUE *ptr;
};
(ruby.h)
As stated earlier, the data type VALUE may be used to hold a reference to an Object. It is cast into a pointer appropriate for the Object being referenced. There is a macro Rxxxx for each built-in type in Ruby (RSTRING for struct RString, RARRAY for struct RArray, and so on). For example:
VALUE str = .....;
VALUE arr = .....;
RSTRING(str)->len;   /* ((struct RString*)str)->len */
RARRAY(arr)->len;    /* ((struct RArray*)arr)->len */
All Object Structure definitions begin with the structure RBasic. This structure contains the information necessary for Ruby to perform the appropriate processing for the object.
Figure 3: Struct RBasic
The Structure Definition of RBasic is as follows:
struct RBasic { unsigned long flags; VALUE klass; }; (ruby.h)
A major reason RBasic is included in all other object structures is the flags entry. It has many uses, the most important of which is to indicate the object type (T_xxxx). The macro TYPE returns the type of any object it is given.
VALUE str;
int type;
str = rb_str_new("", 0);   // Create a new (empty) RString Object
type = TYPE(str);          // type == the enumeration T_STRING
There are flag-enumerations for each Object Structure type.   T_STRING for RString and T_ARRAY for RArray for example.
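The way a type tag in flags lets one VALUE stand for several structures can be sketched in a few lines of stand-alone C. This is only an illustration: the structures are cut-down stand-ins for those in ruby.h, the numeric values of the T_xxxx constants and of T_MASK are assumptions made for the example, and type_of and length_of are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Cut-down stand-ins for the definitions in ruby.h. The numeric values
 * below are illustrative assumptions, not taken from the text. */
typedef uintptr_t VALUE;   /* the text uses unsigned long */

enum { T_OBJECT = 0x02, T_STRING = 0x07, T_ARRAY = 0x09, T_MASK = 0x3f };

struct RBasic  { unsigned long flags; VALUE klass; };
struct RString { struct RBasic basic; long len; char *ptr; };
struct RArray  { struct RBasic basic; long len; VALUE *ptr; };

#define RBASIC(obj)  ((struct RBasic *)(obj))
#define RSTRING(obj) ((struct RString *)(obj))
#define RARRAY(obj)  ((struct RArray *)(obj))

/* Read the type tag through the RBasic header shared by every structure. */
static int type_of(VALUE obj)
{
    assert(obj != 0);
    return (int)(RBASIC(obj)->flags & T_MASK);
}

/* Length of a string or an array, or -1 for any other type. The cast is
 * chosen only after the tag has told us which structure we are holding. */
static long length_of(VALUE obj)
{
    switch (type_of(obj)) {
    case T_STRING: return RSTRING(obj)->len;
    case T_ARRAY:  return RARRAY(obj)->len;
    default:       return -1;
    }
}
```

Because every structure starts with struct RBasic, the tag can be read before anything else is known about the object, and only then is the VALUE cast to the specific structure.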
The other major entry in the structure RBasic is klass. It holds a reference to the Class of the Object. This information is contained in an RClass Object.
Figure 4: Object and class
It has been said that the class reference is named klass, rather than class, to prevent name clashes when Ruby is compiled with a C++ compiler, where class is a reserved word.
I said that the type of structure is stored in the flags member of struct RBasic. But why do we have to store the type of structure? It’s to be able to handle all different types of structure via VALUE. If you cast a pointer to a structure to VALUE, the type information is lost, so the compiler won’t be able to help. Therefore we have to manage the type ourselves. That’s the consequence of being able to handle all the structure types in a unified way.
OK, but the structure used is determined by the class, so why are the structure type and the class stored separately? Being able to find the structure type from the class should be enough. There are two reasons for not doing this.
The first one is (I’m sorry for contradicting what I said before) that in fact there are structures that do not have a struct RBasic (i.e. they have no klass member). An example is struct RNode, which will appear in the second part of the book. However, flags is guaranteed to be among the first members even in special structures like this. So if you put the type of structure in flags, all the object structures can be differentiated in one unified way.
The second reason is that there is no one-to-one correspondence between class and structure. For example, all the instances of classes defined at the Ruby level use struct RObject, so finding a structure from a class would require keeping track of the correspondence between each class and structure. That’s why it’s easier and faster to put the information about the type in the structure.
The RBasic.flags entry is used for many purposes and these will be discussed in detail at the appropriate time.   Just bear in mind that every object contains this entry and it controls how an object is processed.
Figure 5: RBasic.Flags -- Usage Information
The 8 flag bits (FL_USER0 - FL_USER7) are used for multiple purposes by different sections of Ruby. For example, FL_USER0, when set, indicates that the object is a Singleton.
When we discussed the usage of entries defined as type VALUE, it was indicated that they hold references to Objects. This is true, but you may have wondered why VALUE was not simply declared as a void * pointer. It should come as no surprise that we told a half-truth: a VALUE is not always simply cast to a pointer!
If Ruby had strictly enforced the Object Model, then VALUE could be a void * pointer. However, certain kinds of simple objects are used so often that this would incur a substantial performance hit. So, in implementing Ruby, its authors cheated a little.
Some Objects are fully specified by a VALUE,  eliminating the need to create an actual object in Object Space.   This saves a lot of processing cycles and does not functionally compromise the Object Model.   These object types are:
1) Small Integer
2) Symbols
3) True
4) False
5) Nil
6) Undef
How should the data in a Variable be interpreted? There are seven possible interpretations. Ruby interprets the contents of a Variable as follows:
1) If the LSB is 1, it is a Small Integer.
2) If the VALUE is equal to 0, 2, 4, or 6, it is a special constant: false, true, nil, or undef, respectively.
3) If the lower 8 bits are equal to 0x0e, it is a Symbol.
4) Otherwise, it is an Object Reference.
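The four rules can be written down as one small decision function. The sketch below is stand-alone C rather than code from Ruby: the constants are the ones quoted in this chapter, while the enum value_kind and the function classify are names invented for the illustration.

```c
typedef unsigned long VALUE;

#define FIXNUM_FLAG 0x01
#define SYMBOL_FLAG 0x0e
#define Qfalse 0
#define Qtrue  2
#define Qnil   4
#define Qundef 6

/* Hypothetical names for the seven interpretations of a VALUE. */
enum value_kind {
    KIND_FIXNUM, KIND_FALSE, KIND_TRUE, KIND_NIL,
    KIND_UNDEF, KIND_SYMBOL, KIND_REFERENCE
};

static enum value_kind classify(VALUE v)
{
    /* Rule 1: an odd VALUE is a small integer. */
    if (v & FIXNUM_FLAG) return KIND_FIXNUM;

    /* Rule 2: the four special constants. */
    switch (v) {
    case Qfalse: return KIND_FALSE;
    case Qtrue:  return KIND_TRUE;
    case Qnil:   return KIND_NIL;
    case Qundef: return KIND_UNDEF;
    }

    /* Rule 3: the low 8 bits mark a symbol. */
    if ((v & 0xff) == SYMBOL_FLAG) return KIND_SYMBOL;

    /* Rule 4: anything else is taken to be a pointer to an object. */
    return KIND_REFERENCE;
}
```

The order of the tests follows the numbered rules; only a VALUE that fails all three immediate-value tests is ever treated as a pointer.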
The actual processing associated with Variables is slightly more complicated, and each interpretation is discussed below:
Because integers are extremely common, creating an Integer Object for each one would slow the execution of Ruby programs. For this reason, signed integers that can be represented in 31 bits or less are stored in the VALUE itself.
Small Integers (signed integers of no more than 31 bits) are FIXNUM types. Integers exceeding this limit are converted to the BIGNUM type, which can hold an Integer of any size.
To reconstruct the original integer, the contents of the VALUE are right shifted by 1. This removes the FIXNUM flag (bit 0). The following defines and macros convert integers to and from the FIXNUM type.
#define FIXNUM_FLAG 0x01
#define INT2FIX(I) ((VALUE) (((long) (I)) << 1 | FIXNUM_FLAG))
#define FIX2LONG(x) RSHIFT((long)x,1)
(ruby.h)
In brief,  shift 1 bit to the left,  and bitwise 'OR' it with 1.
0110100001000   before conversion
1101000010001   after conversion
That means that a Fixnum as VALUE will always be an odd number. As Ruby object structures are allocated in 20-byte blocks, their addresses will always be divisible by 4. So they do not overlap with the values of Fixnum as VALUE.
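As a worked check of the conversion, here is a stand-alone sketch of the encoding and decoding. It is not the code from ruby.h: the function names int2fix, fix2long, and fixable are hypothetical, the range limits are derived from LONG_MAX as an assumption for the example, and the arithmetic is written with an unsigned shift and a division so that it does not depend on how the compiler shifts negative numbers (it does assume a two's complement machine).

```c
#include <assert.h>
#include <limits.h>

typedef unsigned long VALUE;

#define FIXNUM_FLAG 0x01UL

/* Assumed range: one bit of a long is given up to the flag. */
#define FIXNUM_MAX (LONG_MAX >> 1)
#define FIXNUM_MIN (-FIXNUM_MAX - 1)

static int fixable(long i)
{
    return i >= FIXNUM_MIN && i <= FIXNUM_MAX;
}

/* Shift left by one and set bit 0. The shift is done on the unsigned
 * type, where dropping the top bit is well defined. */
static VALUE int2fix(long i)
{
    assert(fixable(i));
    return ((VALUE)i << 1) | FIXNUM_FLAG;
}

/* Clear the flag and halve. v - 1 is even, so the division is exact;
 * converting it to long assumes two's complement representation. */
static long fix2long(VALUE v)
{
    assert(v & FIXNUM_FLAG);
    return (long)(v - 1) / 2;
}
```

With these definitions the bit pattern shown above works out as expected: 0110100001000 is 3336, and int2fix(3336) gives 6673, which is 1101000010001.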
There are a number of specialized Integer/FIXNUM conversions. For example, INT2NUM and NUM2INT handle both FIXNUM and BIGNUM types during conversions.
Also, to convert int or long to VALUE, we can use macros like INT2NUM() or LONG2NUM(). Any conversion macro XXXX2XXXX with a name containing NUM can manage both Fixnum and Bignum. For example, if INT2NUM() can’t convert an integer into a Fixnum, it will automatically convert it to a Bignum. NUM2INT() will convert both Fixnum and Bignum to int. If the number can’t fit in an int, an exception will be raised, so there is no need to check the value range.
Internally Ruby uses an ID to reference symbols.   For every symbol,  there is a corresponding unique ID (unsigned long).
typedef unsigned long ID; (ruby.h)
With any language processing system, handling a large volume of symbols (i.e. character strings) becomes a problem. Comparing strings one by one will increasingly slow the system as the number of symbols increases. Ruby handles this problem by storing all symbols in a hash table and producing an ID as a key. This is discussed further in the next chapter.
Ruby processes symbols using their IDs as much as possible, which greatly increases processing efficiency. When an ID is stored in a VALUE word, it must first be converted into an immediate value: it is shifted left eight bits, and the vacated bits are loaded with 0x0e. Since this value is NOT divisible by four, Ruby knows that it cannot be an Object Reference, and the low eight bits identify it as a Symbol value. (All Object Reference addresses are divisible by four because memory is allocated in multiples of four bytes.)
The macros and defines below are used to convert IDs to and from Symbol values.
#define SYMBOL_FLAG 0x0e
#define ID2SYM(x) ((VALUE) (((long) (x)) << 8|SYMBOL_FLAG))
#define SYM2ID(x) RSHIFT((long)x,8)
(ruby.h)
The following macro tests whether a VALUE is holding a Symbol value. It returns true if it is.
#define SYMBOL_P(x) (((VALUE)(x)&0xff)==SYMBOL_FLAG) (ruby.h)
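A quick round trip makes the encoding concrete. The following is a stand-alone sketch using plain functions in place of the macros; the names id2sym, sym2id, and symbol_p are hypothetical, and the ID values used in the usage note are arbitrary numbers chosen for the example.

```c
typedef unsigned long VALUE;
typedef unsigned long ID;

#define SYMBOL_FLAG 0x0e

/* Shift the ID up by eight bits and put the symbol tag in the low byte. */
static VALUE id2sym(ID id)
{
    return (id << 8) | SYMBOL_FLAG;
}

/* Drop the tag byte to recover the ID. */
static ID sym2id(VALUE v)
{
    return v >> 8;
}

/* A VALUE is a symbol when its low byte equals the tag. */
static int symbol_p(VALUE v)
{
    return (v & 0xff) == SYMBOL_FLAG;
}
```

For instance, id2sym(1) is 0x10e. Its low byte is 0x0e, which is even (so it is not a Fixnum) and leaves a remainder of 2 when divided by 4 (so it cannot be an object address).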
True false nil
These are Special Constants that are stored in VALUE words as immediate values.   When a VALUE word is less than seven and an even number,  it is interpreted as the following immediate values:
#define Qfalse 0 /* Ruby false */
#define Qtrue  2 /* Ruby true */
#define Qnil   4 /* Ruby nil */
(ruby.h)
True and false are tested directly. The following two macros test for nil (NIL_P) and for truth in the Ruby sense (RTEST), that is, for any value other than nil or false.
#define NIL_P(v) ((VALUE)(v) == Qnil)          /* True if nil */
#define RTEST(v) (((VALUE)(v) & ~Qnil) != 0)   /* True unless nil or false */
(ruby.h)
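Why a single mask is enough for RTEST becomes clear when the three constants are written out in binary: false is 000, nil is 100, and true is 010. Clearing the nil bit turns both false and nil into zero and leaves every other value non-zero. A small stand-alone sketch, with nil_p and rtest as hypothetical function versions of the two macros:

```c
typedef unsigned long VALUE;

#define Qfalse ((VALUE)0)   /* binary 000 */
#define Qtrue  ((VALUE)2)   /* binary 010 */
#define Qnil   ((VALUE)4)   /* binary 100 */

/* True only for nil. */
static int nil_p(VALUE v)
{
    return v == Qnil;
}

/* Ruby truth: mask off the nil bit; only false (0) and nil (4)
 * become zero, every other VALUE stays non-zero. */
static int rtest(VALUE v)
{
    return (v & ~Qnil) != 0;
}
```

Note that the Fixnum zero, which is stored as the VALUE 1, passes rtest; this matches the Ruby rule that 0 is a true value.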
#define Qundef 6 /* Undefined Value as Placeholder */ (ruby.h)
This value is used internally in the interpreter for Undefined Conditions and is never visible at the Ruby Level.  
The second requirement of Ruby Objects is that they can respond to messages. When an Object receives a message, it attempts to call the requested method. Methods are stored in Class Objects. So, to understand how objects respond to messages, it is necessary to understand how classes provide the basic support structure for method response processing.
struct RClass
Because Classes and Modules are generally similar, they both use the structure RClass. They are differentiated by the contents of the RBasic flags (i.e. T_CLASS or T_MODULE).
Modules are also involved in processing messages.   However,  they do so in a way very analogous to Classes,  and discussion of them can be postponed till later.
struct RClass {
    struct RBasic basic;
    struct st_table *iv_tbl;
    struct st_table *m_tbl;
    VALUE super;
};
(ruby.h)
The two fields in RClass that are involved in message processing are m_tbl and super.
m_tbl is a pointer to a Hash Table. The method name (its ID) serves as the hash key. If the method is found, an information block (i.e. a NODE) is returned that contains the information necessary to execute the method. As with other things at this early stage, Nodes are a matter for later discussion.
super is a reference to the Superclass in the Class Inheritance Tree. Each Class has only one Superclass, thus only one super entry in the class structure.
All hash tables used internally by Ruby are referenced with a pointer of type st_table. Hash table functions and how they work are the topic of the next chapter.
Ruby is a single-inheritance language. As we will learn in later chapters, one of the purposes of Modules is to provide Mix-ins, which yield many of the benefits of multiple inheritance without its ambiguity.
The super entry is used by Ruby to build a Class Inheritance Tree. Almost all Classes have a reference to their Superclass. The only Class whose super entry is empty (0) is the Root Class: Object. The details of how the Root Class is constructed and initialized will be discussed in later chapters. For now, all that is necessary is the knowledge that all the Kernel Methods are embedded in this class. This means that all Kernel Methods are available to all classes unless overridden by a later class!
The following graphic depicts the Class Inheritance Tree.
Figure 6: Class Inheritance Tree
With the structure described above, the procedure for searching for methods is straightforward. If the method is not in the current class's m_tbl, the procedure searches up through the Superclass links, one class at a time. If the Root Class is reached and the method is not found there, an exception is raised.
The following code tries to call the method box, which does not exist. This results in an exception and program termination.
p("Test")
box("printing a list of files")
p("OF")

========= Output =========
"Test"
-:2: undefined method `box' for main:Object (NameError)
The following procedure sequentially searches the Class Inheritance Tree for a method.
static NODE*
search_method(klass, id, origin)
    VALUE klass, *origin;
    ID id;
{
    NODE *body;

    if (!klass) return 0;
    while (!st_lookup(RCLASS(klass)->m_tbl, id, &body)) {
        klass = RCLASS(klass)->super;
        if (!klass) return 0;
    }

    if (origin) *origin = klass;
    return body;
}
(eval.c)
The hash function st_lookup searches the current class's method table (via the m_tbl pointer) for the requested method. If it is not found, the Superclass becomes the current class and the search is repeated. If the top of the tree is reached (the super entry is 0) without a match, the function returns 0. If the method is found, the function returns a pointer to the body of the method and, when origin is not NULL, stores the class in which it was found.
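The same walk up the superclass chain can be tried out without any of Ruby's headers. The sketch below is a deliberately simplified model and makes several assumptions for the sake of a self-contained example: methods are keyed by name strings rather than IDs, each method table is a small array scanned linearly rather than an st_table, a method "body" is just a label string, and struct klass, struct method_entry, and table_lookup are invented names.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins: a method table is an array ended by a NULL name. */
struct method_entry {
    const char *name;
    const char *body;
};

struct klass {
    const char *name;
    const struct klass *super;          /* NULL for the root class */
    const struct method_entry *m_tbl;
};

/* Look in one class only (the part played by st_lookup in the text). */
static const char *table_lookup(const struct method_entry *tbl, const char *name)
{
    for (; tbl->name != NULL; tbl++) {
        if (strcmp(tbl->name, name) == 0) return tbl->body;
    }
    return NULL;
}

/* Walk the superclass links until the method is found or the chain ends.
 * When origin is not NULL it receives the class that defines the method. */
static const char *search_method(const struct klass *klass, const char *name,
                                 const struct klass **origin)
{
    for (; klass != NULL; klass = klass->super) {
        const char *body = table_lookup(klass->m_tbl, name);
        if (body != NULL) {
            if (origin != NULL) *origin = klass;
            return body;
        }
    }
    return NULL;   /* the caller would report an undefined method */
}
```

With a two-class chain where the subclass defines only its own methods, a search that starts at the subclass for a method defined in the root reports the root as origin, and a search for an unknown name such as box falls off the end of the chain and returns NULL.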
Throughout Ruby, extraordinary care has been taken to ensure efficient processing. This is a major concern of good interpreter design, and it is especially true of the Ruby language. Method lookup is another place that could markedly slow Ruby programs. Searching for methods is not only time consuming, it is also one of the most common operations Ruby performs. Ruby solves this by performing the method search only the first time a method name is encountered. The results of these searches are cached, and subsequent references to a method name are fulfilled from the cache.
Now, the third requirement of an Object is the ability to retain information specific to an individual Object.
rb_ivar_set ()
The mechanism for retaining object-specific data is the Instance Variable. Not surprisingly, references to Instance Variables in an object are stored in a Hash Table.
VALUE
rb_ivar_set(obj, id, val)
    VALUE obj;
    ID id;
    VALUE val;
{
    if (!OBJ_TAINTED(obj) && rb_safe_level() >= 4)
        rb_raise(rb_eSecurityError, "Insecure: Can't modify instance variable");
    if (OBJ_FROZEN(obj)) rb_error_frozen("object");
    switch (TYPE(obj)) {
      case T_OBJECT:
      case T_CLASS:
      case T_MODULE:
        if (!ROBJECT(obj)->iv_tbl) ROBJECT(obj)->iv_tbl = st_init_numtable();
        st_insert(ROBJECT(obj)->iv_tbl, id, val);
        break;
      default:
        generic_ivar_set(obj, id, val);
        break;
    }
    return val;
}
(variable.c)
The routines rb_raise and rb_error_frozen handle error checks. While they are necessary for the actual function, they are not the main thread of the processing. For now, ignore the error processing and study the procedure's primary function. With the error code removed, we are left with a switch and its attendant processing.
switch (TYPE(obj)) {
  case T_aaaa:
  case T_bbbb:
     :
     :
  default:
     :
}
The macro TYPE() returns an object's type flag (T_OBJECT, T_STRING, ...). These flags are enumerated as integer values and used by the 'switch' statement to select the appropriate processing. Fixnums and Symbols, although entirely contained in the VALUE, still return the types T_FIXNUM and T_SYMBOL, so they do not cause any processing problems.
Only two object structures contain an st_table pointer to Instance Variables, and between them they cover three types:
/* TYPE(val) == T_OBJECT */
struct RObject {
    struct RBasic basic;
    struct st_table *iv_tbl;
};

/* TYPE(val) == T_CLASS or T_MODULE */
struct RClass {
    struct RBasic basic;
    struct st_table *iv_tbl;
    struct st_table *m_tbl;
    VALUE super;
};
(ruby.h)
The structures above contain an entry labeled iv_tbl (Instance Variable TaBLe). In other words, Objects, Classes, and Modules contain an st_table pointer to a Hash Table for storing Instance Variables.
For these three types, loading a value into an Instance Variable is accomplished by the following code fragment (see rb_ivar_set above).
if (!ROBJECT(obj)->iv_tbl) ROBJECT(obj)->iv_tbl = st_init_numtable();
st_insert(ROBJECT(obj)->iv_tbl, id, val);
A key to understanding this code is that the first two entries in the RObject and RClass structures are the same. Because of this, both structures can be accessed through ROBJECT as long as only the RBasic and iv_tbl entries are touched. As the chart below shows, iv_tbl can be accessed in both kinds of object.
Figure 6: RObject vs RClass
If the iv_tbl entry is empty, an empty Hash Table is constructed. From this point on, iv_tbl is assumed to be valid. The Hash Table insert function, st_insert, first checks whether the key (the variable's ID) is already in the table. If it is, the existing entry is loaded with the new value. Otherwise, a new table item is constructed and inserted into the Hash Table.
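Warning: as struct RClass is a class object, this instance variable table is for the use of the class object itself. In Ruby programs, it corresponds to an instance variable assigned directly in a class body (for example, @ivar = "content" written between class C and end), not to the instance variables of the instances of that class.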
As stated above, only the RObject and RClass structures contain an iv_tbl entry. So what happens if you want to define an Instance Variable in other built-in objects (or objects derived from them)? The function generic_ivar_set is used to store the Instance Variables of objects without an internal iv_tbl entry.
default:
    generic_ivar_set(obj, id, val);
    break;
Figure 6b - Code Fragment from rb_ivar_set() (variable.c)
Since objects other than RObject and RClass do not have iv_tbl entries, their Instance Variables are stored instead in a separate Hash Table, keyed by the object itself.
The structure generic_iv_tbl is the primary Hash Table. It uses the object (its VALUE) as the table key, and the table value is a standard (although external) iv_tbl entry. This entry is a pointer to the Hash Table containing the Instance Variables for that object.
Figure 7: Generic Instance Variable Tables
static st_table *generic_iv_tbl;   /* Root of Generic Instance Variable Table */

static void
generic_ivar_set(obj, id, val)
    VALUE obj;
    ID id;
    VALUE val;
{
    st_table *tbl;

    /* Handle Special Constants */
    if (rb_special_const_p(obj)) {
        special_generic_ivar = 1;
    }
    /* If there is no generic_iv_tbl, create it! */
    if (!generic_iv_tbl) {
        generic_iv_tbl = st_init_numtable();
    }

    /* Creating and/or updating a Generic Instance Variable */
    if (!st_lookup(generic_iv_tbl, obj, &tbl)) {
        FL_SET(obj, FL_EXIVAR);
        tbl = st_init_numtable();
        st_add_direct(generic_iv_tbl, obj, tbl);
        st_add_direct(tbl, id, val);
        return;
    }
    st_insert(tbl, id, val);
}
(variable.c)
The function rb_special_const_p() returns true if obj is a Special Constant (i.e. Qfalse, Qtrue, Qnil, Qundef, a Symbol, or a Fixnum).
The function st_init_numtable creates a new empty Hash Table.
The function st_lookup is used first to determine whether there are any Instance Variables associated with this object. If there are, a pointer to the Hash Table for that object is returned. The particular Instance Variable is then either updated or, if it does not currently exist, created.
If there are no Instance Variables associated with this object, a Generic Instance Variable entry is created for it: an empty Hash Table is created and added to generic_iv_tbl under the current object, and the Instance Variable is then added to that new table.
The macro FL_SET is used to set the EXIVAR flag in the current object's basic.flags. The flag name can be thought of as an abbreviation of EXternal Instance VARiable.
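The two-level arrangement (object first, then variable ID) can be modelled in a few dozen lines. The sketch below is an illustration under stated assumptions rather than the code from variable.c: linked lists stand in for the st_table hash tables, the FL_EXIVAR flag is not modelled, nothing is ever freed, and the struct names and the return conventions of the two functions are invented for the example.

```c
#include <stdlib.h>

typedef unsigned long VALUE;
typedef unsigned long ID;

/* Second level: the instance variables of one object. */
struct ivar {
    ID id;
    VALUE val;
    struct ivar *next;
};

/* First level: one entry per object that has external variables. */
struct generic_entry {
    VALUE obj;
    struct ivar *ivars;
    struct generic_entry *next;
};

static struct generic_entry *generic_iv_list;   /* stands in for generic_iv_tbl */

static struct generic_entry *find_entry(VALUE obj)
{
    struct generic_entry *e;
    for (e = generic_iv_list; e != NULL; e = e->next) {
        if (e->obj == obj) return e;
    }
    return NULL;
}

/* Create or update a variable. Returns 0 on success, -1 if out of memory. */
static int generic_ivar_set(VALUE obj, ID id, VALUE val)
{
    struct generic_entry *e = find_entry(obj);
    struct ivar *iv;

    if (e == NULL) {                       /* first variable for this object */
        e = malloc(sizeof *e);
        if (e == NULL) return -1;
        e->obj = obj;
        e->ivars = NULL;
        e->next = generic_iv_list;
        generic_iv_list = e;
    }
    for (iv = e->ivars; iv != NULL; iv = iv->next) {
        if (iv->id == id) {                /* existing variable: update */
            iv->val = val;
            return 0;
        }
    }
    iv = malloc(sizeof *iv);               /* new variable: insert */
    if (iv == NULL) return -1;
    iv->id = id;
    iv->val = val;
    iv->next = e->ivars;
    e->ivars = iv;
    return 0;
}

/* Two-stage lookup. Returns 1 and fills *out when found, 0 otherwise. */
static int generic_ivar_get(VALUE obj, ID id, VALUE *out)
{
    struct generic_entry *e = find_entry(obj);
    struct ivar *iv;

    if (e == NULL) return 0;
    for (iv = e->ivars; iv != NULL; iv = iv->next) {
        if (iv->id == id) {
            *out = iv->val;
            return 1;
        }
    }
    return 0;
}
```

The point of the design is visible even in this toy form: an object that never receives an instance variable costs nothing extra, while one that does pays for an additional lookup on every access.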
This raises a question: why do only the RObject and RClass structures have iv_tbl entries? There are several reasons.
The primary reason is reducing memory usage. Adding an iv_tbl to each built-in object would increase memory consumption by 20%. Instance Variables are most often used in Objects, Classes, and Modules. While any object can have Instance Variables, they are seldom found in other objects. Thus, using Generic Instance Tables to implement Instance Variables substantially reduces memory usage with only a small penalty in efficiency.
A secondary reason is that all objects are now about the same size. They all fit within twenty (20) bytes. This in turn improves Garbage Collection and Memory Allocation functions. This issue is discussed in more detail in the chapter describing Ruby Garbage Collection processing.
As an interesting side note, Generic Instance Tables were not introduced until Ruby 1.2.
rb_ivar_get ()
The function rb_ivar_set () is used to create and/or update the value of an Instance Variable. Here we discuss the companion function that returns the value of an Instance Variable.
VALUE
rb_ivar_get(obj, id)
    VALUE obj;
    ID id;
{
    VALUE val;

    switch (TYPE(obj)) {
      /* (A) */
      case T_OBJECT:
      case T_CLASS:
      case T_MODULE:
        if (ROBJECT(obj)->iv_tbl &&
            st_lookup(ROBJECT(obj)->iv_tbl, id, &val))
            return val;
        break;
      /* (B) */
      default:
        if (FL_TEST(obj, FL_EXIVAR) || rb_special_const_p(obj))
            return generic_ivar_get(obj, id);
        break;
    }
    /* (C) */
    rb_warning("instance variable %s not initialized", rb_id2name(id));
    return Qnil;
}
(variable.c)
(A) The RObject iv_tbl entry is checked. If it is NULL, the break statement is executed and processing continues at (C). Otherwise, the function st_lookup is called. If the desired Instance Variable is found, the resulting value is returned. If not, processing continues at (C).
(B) If the object type is not RObject or RClass, then we must check whether the object has an entry in the generic_iv_tbl. If it does, the basic.flags word will have the FL_EXIVAR bit set. If the bit is set, or if the Object is a Special Constant, generic_ivar_get() is called. generic_ivar_get() performs a two-stage Hash Table lookup. First, st_lookup is performed using the current object as the key. If the object is in the generic_iv_tbl, the iv_tbl entry for the object's External Instance Variables Hash Table is returned. Then st_lookup is called again to get the value of the requested Instance Variable.
(C) When the requested variable is not found, a warning message is issued and nil is returned.
In this section we will look at some of the more complex Object structures and how they are handled.
struct RString
struct RString {
    struct RBasic basic;
    long len;
    char *ptr;
    union {
        long capa;
        VALUE shared;
    } aux;
};
(ruby.h)
The ptr entry (PoinTeR) points to a null-terminated character string, and the len entry (LENgth) holds the length of the string NOT including the null character.
Strings are handled not only in Ruby itself but also by the Extended Library. The string pointer and length can be accessed with RSTRING(str)->ptr and RSTRING(str)->len. You can in fact write into the ptr and len entries; however, it is generally foolish to do so.
However, if you must mess with the string internals, at least observe the following:
1) Before using it, check that str really points to a struct RString
2) Members may be read, but do not modify them
3) Do not copy RSTRING(str)->ptr to another location and try to use it at some later time! The Garbage Collector may destroy the original string at any time and deallocate the memory associated with that string.
Unless a string is shared, a subject discussed below, the aux.capa entry contains the amount of memory allocated for this object's string. Note that this is usually more than the length. If it is, the string can grow up to that limit without allocating more memory. Again, this is done to increase efficiency, as memory allocation can be fairly slow.
Strings and Arrays also implement copy-on-write. Let us return to a previous example:
Figure 8: Ruby Variables keep a reference to an object
Now we have three Variables all pointing at the same String Object. What happens if we want to modify Variable B?
Ruby’s strings can be modified (are mutable). By mutable I mean that after the following code:
s = "str"          # create a string and assign it to s
s.concat("ing")    # append "ing" to this string object
p(s)               # show the string
the content of the object pointed to by s will become “string”. It’s different from Java or Python string objects. Java’s StringBuffer is closer.
And what’s the relation? First, mutable means the length (len) of the string can change. We have to increase or decrease the allocated memory size each time the length changes. We can of course use realloc() for that, but generally malloc() and realloc() are heavy operations. Having to realloc() each time the string changes is a huge burden.
That’s why the memory pointed to by ptr has been allocated with a size a little bigger than len. Because of that, if the added part can fit into the remaining memory, it’s taken care of without calling realloc(), so it’s faster. The structure member aux.capa contains the length including this additional memory.
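The effect of keeping a capacity larger than the length can be seen in a small stand-alone model. This is a sketch, not Ruby's string code: struct sketch_string and str_cat are invented names, and the growth policy (allocating twice the needed size) is an assumption chosen only to make the behaviour easy to observe.

```c
#include <stdlib.h>
#include <string.h>

/* Cut-down model of the len / ptr / capa trio. */
struct sketch_string {
    long len;    /* characters in use, not counting the final '\0' */
    long capa;   /* characters that fit before a realloc() is needed */
    char *ptr;
};

/* Append a C string. realloc() is called only when the result does not
 * fit in the memory already allocated. Returns 0 on success, -1 on failure. */
static int str_cat(struct sketch_string *s, const char *src)
{
    long add = (long)strlen(src);
    long need = s->len + add;

    if (need > s->capa || s->ptr == NULL) {
        long new_capa = need * 2;                 /* assumed growth policy */
        char *p = realloc(s->ptr, (size_t)new_capa + 1);
        if (p == NULL) return -1;
        s->ptr = p;
        s->capa = new_capa;
    }
    memcpy(s->ptr + s->len, src, (size_t)add + 1);   /* copies the '\0' too */
    s->len = need;
    return 0;
}
```

Starting from an empty string, appending "str" allocates room for six characters, so the following append of "ing" fits into the spare room and leaves ptr untouched; only a further append forces a new allocation.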
So what is this other aux.shared? It’s to speed up the creation of literal strings. Have a look at the following Ruby program.
while true do        # repeat indefinitely
  a = "str"          # create a string with "str" as content and assign it to a
  a.concat("ing")    # append "ing" to the object pointed to by a
  p(a)               # show "string"
end
Whatever the number of times you repeat the loop, the fourth line’s p has to show "string". That’s why the code "str" should create, each time, a string object holding a different char[]. However, if no change occurs for a lot of strings, useless copies of char[] can be created many times. It would be better to share one common char[].
The trick that allows this to happen is aux.shared. String objects created with a literal use one shared char[]. When a change occurs, the string is copied into unshared memory, and the change is done on this new copy. This technique is called “copy-on-write”. When using a shared char[], the flag ELTS_SHARED is set in the object structure’s basic.flags, and aux.shared contains the original object. ELTS seems to be the abbreviation of ELemenTS.
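Copy-on-write itself is easy to demonstrate in isolation. The following stand-alone sketch is a simplified model and not the code from string.c: struct cow_string and the three functions are invented for the illustration, a non-NULL shared pointer plays the part of the ELTS_SHARED flag, and memory ownership is reduced to the bare minimum.

```c
#include <stdlib.h>
#include <string.h>

struct cow_string {
    long len;
    char *ptr;
    const struct cow_string *shared;   /* non-NULL while ptr is borrowed */
};

/* Make a second string that borrows the buffer of the original. */
static struct cow_string str_share(const struct cow_string *orig)
{
    struct cow_string s;
    s.len = orig->len;
    s.ptr = orig->ptr;
    s.shared = orig;
    return s;
}

/* Called before any write: give the string a private copy of the buffer
 * if it is still borrowing one. Returns 0 on success, -1 on failure. */
static int str_modify(struct cow_string *s)
{
    char *p;

    if (s->shared == NULL) return 0;       /* already private */
    p = malloc((size_t)s->len + 1);
    if (p == NULL) return -1;
    memcpy(p, s->ptr, (size_t)s->len);
    p[s->len] = '\0';
    s->ptr = p;
    s->shared = NULL;
    return 0;
}

/* Overwrite one character, separating from the shared buffer first. */
static int str_set_char(struct cow_string *s, long i, char c)
{
    if (i < 0 || i >= s->len) return -1;
    if (str_modify(s) != 0) return -1;
    s->ptr[i] = c;
    return 0;
}
```

As long as nobody writes, any number of strings can point at the same characters. The first write to one of them pays for a private copy, and the original and the other sharers are unaffected.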
But, well, let’s return to our talk about RSTRING(str)->ptr. Even if consulting the pointer is OK, you must not modify it, first because the value of len or capa will no longer agree with the content, and also because when modifying strings created as literals, aux.shared has to be separated.
To finish this section about RString, let’s write some examples of how to use it. str is a VALUE that points to an RString.
RSTRING(str)->len;                // string length
RSTRING(str)->ptr[0];             // first character
str = rb_str_new("content", 7);   // create a string containing "content"; the second parameter is the length
str = rb_str_new2("content");     // create a string containing "content"; its length is calculated with strlen()
rb_str_cat2(str, "end");          // concatenate a C string to a Ruby string
The following C function creates a new String Object that shares the character data of the Object passed to it. The new Object's aux.shared field contains a reference to the original object, and its ELTS_SHARED flag is set. If the original Object was TAINTED, the copy will also be TAINTED!
static VALUE
str_new3(klass, str)
    VALUE klass, str;
{
    VALUE str2 = str_alloc(klass);

    RSTRING(str2)->len = RSTRING(str)->len;
    RSTRING(str2)->ptr = RSTRING(str)->ptr;
    RSTRING(str2)->aux.shared = str;
    FL_SET(str2, ELTS_SHARED);
    OBJ_INFECT(str2, str);

    return str2;
}
(string.c)
Notice that the Original Object's data (ptr and len) are unchanged, although in some cases the original object is frozen. Any variables that referenced the Original Object still point at it!
The RArray and RString Structures are similar. The ptr points to data, and len contains the data length. Depending on the basic.flags word, aux can contain the array capacity (aux.capa) or a reference to a shared array (aux.shared).
struct RArray {
    struct RBasic basic;
    long len;
    union {
        long capa;
        VALUE shared;
    } aux;
    VALUE *ptr;
};
(ruby.h)
From this structure, it’s clear that Ruby’s Array is an array and not a list. So when the number of elements changes in a big way, a realloc() must be done, and if an element must be inserted at a place other than the end, a memmove() will occur. But even so, on current machines these operations are impressively fast.
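What "an array and not a list" means for insertion can be shown with a small model. The sketch below is stand-alone C under stated assumptions, not the code from array.c: struct sketch_array and ary_insert are invented names, and the growth policy (start at four elements, then double) is chosen only for the example.

```c
#include <stdlib.h>
#include <string.h>

typedef unsigned long VALUE;

struct sketch_array {
    long len;     /* elements in use */
    long capa;    /* elements that fit before a realloc() is needed */
    VALUE *ptr;
};

/* Insert v at index pos (0 <= pos <= len). Growing the buffer takes a
 * realloc(); opening a gap anywhere but the end takes a memmove().
 * Returns 0 on success, -1 on a bad index or allocation failure. */
static int ary_insert(struct sketch_array *a, long pos, VALUE v)
{
    if (pos < 0 || pos > a->len) return -1;

    if (a->len == a->capa) {
        long new_capa = a->capa ? a->capa * 2 : 4;   /* assumed policy */
        VALUE *p = realloc(a->ptr, (size_t)new_capa * sizeof *p);
        if (p == NULL) return -1;
        a->ptr = p;
        a->capa = new_capa;
    }
    /* Shift the tail up by one slot; a no-op when pos == len. */
    memmove(a->ptr + pos + 1, a->ptr + pos,
            (size_t)(a->len - pos) * sizeof *a->ptr);
    a->ptr[pos] = v;
    a->len++;
    return 0;
}
```

Appending at the end moves nothing, while inserting at the front moves every element, which is the cost the text refers to.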
The way to access it is similar to RString. You can consult the RARRAY(arr)->ptr and RARRAY(arr)->len members, but you must not set them, and so on. We’ll only look at simple examples:
/* Usage from C side */
VALUE ary;
ary = rb_ary_new();              /* Creating an empty array */
rb_ary_push(ary, INT2FIX(9));    /* Push 9 onto the array */
RARRAY(ary)->ptr[0];             /* Index 0 reference */
rb_p(RARRAY(ary)->ptr[0]);       /* 'p' reference, result 9 */

# Usage from Ruby
ary = []       # Creating an empty array
ary.push(9)    # Push 9 onto the array
ary[0]         # Index 0 reference
p(ary[0])      # 'p' reference, result 9
struct RRegexp
This is the structure for instances of the regular expression class Regexp.
struct RRegexp {
    struct RBasic basic;
    struct re_pattern_buffer *ptr;
    long len;
    char *str;
};
(ruby.h)
The Regular Expression package and its workings are beyond the scope of this document. The char *str entry points at the Regular Expression string created by the user, and len of course is the length of that string. The *ptr entry points at the compiled Regular Expression.
struct RHash
The following is the RHash structure, which implements the Hash class for Ruby.
struct RHash {
    struct RBasic basic;
    struct st_table *tbl;
    int iter_lev;
    VALUE ifnone;
};
(ruby.h)
The structure st_table is discussed in the next chapter, "Name and Name Chart". The entry iter_lev provides for re-entrancy (multi-thread operation, for example). Finally, ifnone holds the reference to a block procedure or Qnil.
struct RFile
The structure RFile implements Ruby file operations. It is used for instances of class IO and its subordinate classes.
struct RFile { struct RBasic basic; struct OpenFile *fptr; }; (ruby.h)
The RFile structure simply points to a large I/O specific structure for handling file Operations.  
typedef struct OpenFile {
    FILE *f;                                   /* Stdio ptr for read/write */
    FILE *f2;                                  /* Additional ptr for rw pipes */
    int mode;                                  /* Mode flags */
    int pid;                                   /* Child's pid (for pipes) */
    int lineno;                                /* Number of lines read */
    char *path;                                /* Pathname for file */
    void (*finalize) _((struct OpenFile*));    /* Finalize proc */
} OpenFile;
(rubyio.h)
The structure entry usage is documented in the comments.   It provides a wrapper for C Language Standard IO.
struct RData
The RData structure is different from the other Object Structures. It is used to implement the Extended Library.
struct RData {
    struct RBasic basic;
    void (*dmark) _((void*));
    void (*dfree) _((void*));
    void *data;
};
(ruby.h)
The entry *data  points to a user defined structure.   The entries *dmark  and *dfree  are pointers to the functions provided for Mark and Sweep  operations.   For a more detailed explanation of how these functions are used,  please see Chapter 5 (Garbage Collection).
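The idea of a generic wrapper that carries its own clean-up function can be sketched without Ruby's headers. The example below is a simplified model: struct data_object mirrors the three RData entries but omits RBasic, data_wrap and data_release are invented names standing in for whatever the interpreter does when it creates and sweeps such an object, and struct point is an arbitrary user structure chosen for the illustration. The mark function is left NULL because the example holds no references to other objects.

```c
#include <stdlib.h>

/* Cut-down model of the three RData entries. */
struct data_object {
    void (*dmark)(void *);
    void (*dfree)(void *);
    void *data;
};

/* Wrap a user-defined structure together with its callbacks. */
static struct data_object *data_wrap(void *data,
                                     void (*dmark)(void *),
                                     void (*dfree)(void *))
{
    struct data_object *obj = malloc(sizeof *obj);
    if (obj == NULL) return NULL;
    obj->dmark = dmark;
    obj->dfree = dfree;
    obj->data = data;
    return obj;
}

/* What a sweep would do: let the owner free its data, then free the wrapper. */
static void data_release(struct data_object *obj)
{
    if (obj == NULL) return;
    if (obj->dfree != NULL) obj->dfree(obj->data);
    free(obj);
}

/* A user-defined structure and its free function. */
struct point { int x, y; };

static int points_freed;   /* counts calls, so the effect can be observed */

static void point_free(void *p)
{
    free(p);
    points_freed++;
}

static struct data_object *point_new(int x, int y)
{
    struct point *pt = malloc(sizeof *pt);
    struct data_object *obj;

    if (pt == NULL) return NULL;
    pt->x = x;
    pt->y = y;
    obj = data_wrap(pt, NULL, point_free);
    if (obj == NULL) free(pt);
    return obj;
}
```

The wrapper never needs to know what a point is. It only stores the pointer and calls back into the code that does know, which is what lets one structure serve every extension.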
Figure 9 shows the basic relationship that ties together Ruby proper and Extended Library data structures.
Figure 9: Image of struct RData
The original work is Copyright © 2002 - 2004 Minero AOKI.
Translated by Vincent ISAMBART
Translations and Additions by C.E. Thornton
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.