chapter 5
Garbage Collection

Execution time program image

This chapter will abruptly subject the reader to a discussion of various low level issues.   First,  the Ruby 'C' program memory structure,  followed by the Memory Structure supporting the actual Ruby Language execution.   This may be tough going for some readers,  but this chapter is important to understanding later chapters.

C  Program Memory Structure

A general a 'C'  program has the following basic memory structure.

1. Program Code Space
2. Constants, Static variables, and global variables
3. Machine stack
4. Heap, also refered to a Free Store.
Items 1 and 2 are generally contained in the load image.   Items 3 an 4 are setup by program initialization.

Once a 'C' program is initialized,  it contains at least three seperate memory segments.   Program Code,  Globals,  and static variables are generally in the load image stored in Text Memory.   Arguments and local variables of functions are stored on the Machine Stack.  Lastly,  the program can also acquire additional memory from the Free Store (also refered to as the Heap).

The third item should to well known to most programmers.   The Machine Stack is implemented by the processor hosting Ruby.   In Figure 1a.  the stack is shown growing toward higher addresses.   This is not universal,  the X86 Memory Stack,  Figure 1b,  grows toward the lower addresses.

(Macstack)

Figure 1: Machine Stacks

The Ruby language program uses the the Machine Stack to handle it's stack operations.   The small items, like Integers and arguments are pushed onto the Machine Stack by the underlying code generated by the 'C' Compiler.

As we will see, the ruby interpreter pushes larger blocks of data onto the Machine Stack.   These blocks are called a Stack Frames.   The Stack Frame is a predefine memory block that allows Ruby push structured data onto the Machine Stack.   Ruby also pushes various other structured and unstructured data onto the stack.

The Stack Frame corresponds to a Ruby Function Call.   When the function returns,  the Stack Frame is poped from the Machine Stack!   The left side of  Figure 2 shows the stack after the Ruby Function 'D' is called.   The right shows the stack after returning from the functions 'D' & 'C'.

(Macstack)

Figure 2: Machine Stack - Using native alloca

Alloca() vs Malloc()

The malloc()  function returns a pointer to a block of memory of the size requested, of a Null pointer to indicate an error.   The memory block is obtained from the Heap.   Memory Blocks acquired using malloc() must eventually be returned to the Heap via Free()  function.

The Native alloca()  function is a different animal.   This function is a very low level function without many safeguards!   This function is generally implemented as in-line function generated by the complier.   The requested memory block is allocated on the Machine Stack.   Thus,  when the function that called alloca() returns the memory block evaporates.   The inlined code is generally a single instruction adjusting the stack pointer,  it does not check for stack overflow,  and there is no NULL error return.

The use of Native alloca()  is good in the respect that is simple and can improve execution times when used appropriately.

Now for the "fly in the ointment"!   The sad fact is that Native alloc()  is not supported by all host systems and compilers used to execute the Ruby Language Program (i.e.  the Ruby Interpreter).   The parts of the Ruby interpreter that use alloca()  determine which implementation of alloca() to use.

The Alloca() Emulation  code is provided by missing/alloca.c.

The alloca emulation  function uses a linked list to manage the allocation and release of the acquired memory.   Each entry in the linked list consists of a header and a block of memory.   Each Header contains a pointer to the next entry (if any)  and the current Machine Stack location.   A List Element  is made up of a Header (described above) followed by a Size  block of memory.

(Calloca)
Figure 3: Alloca Emulation linked list diagram

This list and it's operations are controlled by one function,  alloca()   This is necessary to maintain compatibility with Native Alloca.  

When a user calls alloca(),  the function performs Garbage Collection.   The Garbage Collector steps backward through the linked List removing all List Elements where the recorded depth is deeper in stack than the current Machine Stack pointer.   After garbage collection is finished, a new List Element is created and placed a the top of the linked list.  

If the size  is zero, the function returns after the garbage collection is perfomed and does not create a List Element

This is important to remember that until the next alloca() call the linked list remains untouched.  While a List Element may exist beyond the point where calling function returns,  it will evaporate the next time the alloca() function is called.

The Figure below shows the results after the return of 'D' and 'C' and the next garbage collection cycle caused by a alloca() call.   The figure below does not show creation of the next List Element,  if any.

(Calloca)
Figure 4: Alloca Emulation linked list diagram

The missing/alloca.c  function emulates the Native Alloca()  on host systems or compilers that do not support the native alloca().

GC ... Explanation

We have discussed the underlying support for stack resident memory allocation and it's emulation for those host systems and compilers that do not have Native alloca support.  The balance of this chapter will be the management of the Heap.

GC

Normally,  objects reside in memory and at the conclusion of the program these memory resident objects are released along the other memory resources used by the program.   As long as the memory space is large enough to provide all the memory requested, things work fine!  

In actuallity,  memory is always a limited resource.   Even if the user has access to all them memory available, even small amounts of discarded memory add up to very large values for long running programs.   Almost all programs will acquire blocks of heap memory.   When these resources are not released properly,  the accumlation of this memory debris is called a Memory Leak.

The control of the memory resources is always a available to the programmer by using malloc() , and free().   In Object Oriented programs knowing when to release objects from memory can become difficult due to complex object references.

Ruby relieves the programmer of much of this complexity by managing object memory allocation and release automatically.   Ruby handles this task,  as many other languages do,  by using a Garbage Collection  Scheme.

Ruby uses a modified version of the Boehm-Demers-Weiser garbage collector ,  generally known as simply Boehm GC.   With this function there is no need release the memory allocated for Ruby objects and other automatically allocated memory.   The Ruby Garbage Collector,  hereafter refered to as just GC,  manages both the allocation and release of these memory resources.

The issue of Heap Memory Compaction arises in many discussions of the various implementations of Boehm GC.   Many programs that the use a Garbage Collector,  often face issues of memory fragmentation.   Ruby does not implement memory compaction,  but does not generally suffer from crippling heap fragmentation.  

One reason for this is that the basic object structures are all the same size.   It is also true that some objects,  strings leap to mind,  allocate additional memory to hold the string value.   The free store  or heap  storage algorithm is controll at a low level by the malloc functiion.  This, and other issues concernint the Ruby GC will be covered in a later section.

The Boehm GC has been used successfully in many systems.   More information on Boehm GC is available at: http://www.hpl.hp.com/personal/Hans_Boehm/gc.

Recent languages such as Java and Perl, Python, C#, and Eiffel use Garbage Collection.  The rest of this chapter will be devoted to the details of Ruby's Garbage Collector.

What is Garbage Collection?

The first question is what is GC trying accomplish.   The GC allocates memory from the heap  for objects and memory controlled by Objects.   Additionally, and most importantly,  it manages this resource and reclaims the allocated memory from the program when it is no longer in use.   This releaves the programmer of the task of keeping track of allocated memory and releasing it when no longer needed.

Ruby's GC is,  as are most Garbage Collectors,   is composed of three main sections.   The first is Memory Allocation,  the second is Garbage Detection,  and the last is Garbage Reclarmation.

Memory Allocation  functions control access to the memory set aside for GC functions.    This memory is initially allocated from Heap Memory at program initialization.  When this initial allocation of memory is exhausted, Ruby will perform Garbage Collection.   If there is still not enougth memory to satisfy the allocation request,  another block of memory is requested and added to the memory controlled by the Garbage Collector.

The various memory allocation functions return blocks of memory of the requested sizes.   These functions also maintain lists of memory that is allocated and memory that is available for allocation.   In addition to being used by memory allocation functions, these lists are also updated by the Garbage Reclarmation  processing.

Garbage Detection is the process of determining which objects are still in use.  These objects are Live Objects  and will be preserved.    Conversely, all other objects are considered "Dead" and no longer viable.   These objects, called garbage,  are processed by the Garbage Reclarmation  section.

Garbage Reclarmation section of GC attempts to find all the memory being used by the so called Garbage Objects and returns it to the Memory Allocation functions by updating the lists of available memory.   The memory returned is not just the object itself,  but any addtional memory attached to the basic object.   In Ruby for example, Fixnum,  true,  and false  are all completely contained the basic object structure.    Strings,  on the other hand,  allocate additional memory to hold the actual character string.

Survey of garbage collector types

The following sections discuss several general methods for performingGarbage Collection.   We briefing discuss the major types,  their weaknesses and strengths.   Bear in mind the Mark and Sweep  is the type used by Ruby.

Mark & sweep

The Mark phase finds live objects and marks  them to prevent the GC from deallocating the associated memory.   It performs this task by defining a Root Set  of known viable objects and tracing their connections to other objects.  

The Root Set  are objects referenced by:

  1. Ruby Global Variables and Constants
  2. Ruby live Local Variables
  3. The machine stack
  4. Objects referenced by 'C' Global Variables registered by rb_gc_register_address()

(Gcimage)
Figure 5: Root Set trace of live objects

The set of objects found by the sweep phase that are not marked  are considered dead objects,  also called garbage.

Mark and Sweep Merits:

  1. While Mark & Sweep GC's normally have problems with memory fragmentation, Ruby GC is somewhat immune to this problem because the common size of Ruby Objects.
  2. One other advantage is that Mark & Sweep can release a cycle.   See the section on Reference Counting for a discussion of this problem.

Mark and Sweep Deficiencies

  1. The minimum time it takes to run one Mark and Sweep Cycle is directly proportional to the number of Managed Objects.   Each object must be 'touched' at least once.   All live objects must be marked and all garbage objects must be collected.
  2. Because the Mark and Sweep cycle locks out other processing while it runs,  the GC can make a noticable dent in a programs performace.

Stop & copy

This method is a modification of the Mark and Sweep Garbage Collector.   The managed heap space is divided into two sections.   One is the current active heap and the other is empty, call them A and B.   Any any one time, only one these heaps will be active!   The GC allocates new objects in the active heap.

(Stop2)
Figure 6: Stop & copy (1)

When the GC Runs, it transverses all active objects in the same way as Mark and Sweep.   However, the live objects are transferred to the inactive heap instead marking them.   When the Mark phase is complete, the inactive heap is made the active heap.   The freshly inactivated heap only contains garbage objects and thus they are implicitly collected.   This is, the inactive area is considered empty!

(Stop3)
Figure 7: Stop & copy (2)

Stop and Copy Merits:

  1. The live objects are essentially compacted during the Garbage Collection cycle.
  2. This additonally improves the locality of live objects.   Since they are all located near each other, they reduce memory paging in a virtual memory system.   There is a much higher probabilty that referneces will be in the memory cache.
  3. Like it's cousin Mark and Sweep, this method also is able to collect cycles.

Stop and Copy Deficiencies

  1. The time to copy objects to the inactive heap takes longer than simply marking a live object.
  2. The position of the objects change.   The relocation of references may take a signigicant amount of processing.

Reference counting

A Reference Counting Garbage Collector is different from the other types of GC's we have discussed so far.   In this method,  each object has a reference counter  associate with it.   This counter is incremented by one each tiem a new reference pointer is created that points to a particular object.   For example, when a pointer is copied from one place to another by an assignment statement.

When a referece to an object is removed, the counter is decremented by one.   When the reference counter becomes zero,  the object is collected,  e.g.,  the memory allocated for the object is reclaimed.

(Refcnt)
Figure 8: Reference counting example

Reference Counting Merits:
  1. The processing load imposed by the Garbage Collector is distributed across the whole execution of the program.  
  2. Objects are, theorically, released the moment they can no longer be reached by the program.

Reference Counting Deficiencies

  1. The most serious problem with reference counting is that is it not always effective.    The inability to handle cycles  is discussed below.
  2. Reference counting is very hard to make efficient.   Since every time a refernce to an object is created or destroyed, a reference counter must be incremented or decremented.   This adds more overhead to the processing of objects and references.
  3. Additionally, if the reference counter for a large complex object structure becomes zero, it can cascade through references finding other referneces becoming zero.   In some cases this may cause a noticable performace hit.

As mentioned above, a major deficiency of reference counting is handling cycles.  These are a group of objects that form a circular structure.  A simple example is shown below:

(Cycle)
Figure 9: Cycle

As you can see, even if such a structure loses the last reference from an external object,  the counters will never reach zero.

Management of Ruby objects

The Ruby GC only manages Ruby Objects and other memory blocks controlled  by Ruby Objects.   Ruby Objects often acquire addtional memory to provide storage for a variety of purposes.   These include buffers to hold character strings,  memory to hold numeric values, memory to hold arrays and hashes.

While memory can be allocated directly from the heap by calling 'C' language functions,  this memory will not be managed by the Ruby GC.   It is the Users resposibility to release any memory acquired in this manner.  Failure to release memory will result in a memory leak  , as shown below:
Void not_ok ()
{
    Malloc (1024);     /* Possible Memory Leak! */
}

In the following case,  the user allocates memory for a Ruby object using the Ruby API.   Ruby will manage this block on memory automatically and release it when it is no longer in use.

Void this_is_ok ()
{
    Rb_ary_new ();    /* Object managed by the Ruby GC */
away, *
}

Struct RVALUE

As we stated the Ruby GC only manages Ruby Objects.   This control is accomplished using the basic Ruby object structure.   Ruby objects are always addressed with a reference pointer of type RVALUE.

The struct RVALUE a union of Object Structures and System Structures.   There is one object structure for each ruby object type( object,  klass,  string,  ...),  and additonal objects to support the Ruby interpreter operations.


typedef struct RVALUE {
    union {
  struct {
      unsigned long flags;  /* always 0 for freed obj */
      struct RVALUE *next;
  } free;
  struct RBasic  basic;
  struct RObject object;
  struct RClass  klass;
  struct RFloat  flonum;
  struct RString string;
  struct RArray  array;
  struct RRegexp regexp;
  struct RHash   hash;
  struct RData   data;
  struct RStruct rstruct;
  struct RBignum bignum;
  struct RFile   file;
  struct RNode   node;
  struct RMatch  match;
  struct RVarmap varmap;
  struct SCOPE   scope;
    } as;
#ifdef GC_DEBUG
    char *file;            /* Internal Debugging Feature */
    int   line;
#endif
} RVALUE;

(Gc.C)

Ruby GC managed memory contains two distinct types of data.   The first is the actual Ruby Objects,  which are always exactly 20 Bytes(40 Bytes on 64 bit Processors).   The other type is the additional memory allocated and referenced by these objects.   This memory is used to store data and table references.

Of the RVALUE Structures listed above,  only the ones itemized below represent the structures that implement Ruby Objects.

Struct RBasic    - Preamble for all Structures(flags & Klass ptr)
Struct RObject - General Object - For things not applicable below Struct RClass - Class objects Struct RFloat - Integer Decimals (small & medium) Struct RString - Character strings Struct RArray - Arrays Struct RRegexp - Regular expression Struct RHash - Hash table Struct RFile - IO File,  Socket,  And so on Struct RData - Class for describing everything for 'C' level Struct RStruct - The structure of Ruby Struct Class Struct RBignum - Big integer

The importance of the  RVALUE Structure can not by over emphasized.   This is the core of the Ruby Interpreter.   It not only expresses all Ruby objects,  but it also contains structures used by the Interpter itself.   All of these structures are same length and share a common preamble, the RBasic Structure.  

The Structure Definition of RBasic is as follows:

  struct RBasic {
  unsigned long flags;
  VALUE klass;
  };

(ruby.h)
The two structures most important to the operation of the Ruby GC are:
1)  Struct RBasic   
2)  Struct as.free   

As you will remember,  most of the RVALUE structures contain RBasic,  The only exceptions are RNode  and SCOPE.   In the case of as.free,  it reinterprets RBasic enteries only for the purpose of linking RVALUE blocks into the *freelst.

as.free
unsigned long flags;   -- Always zero when added to the *freelist
struct RVALUE *next;   -- Points at next entry in the *freelist

The construction of the *freelist will be discussed shortly.

The flags  word in RBasic,  as explained in Chapter 02,  contains all the state information for a Ruby Object.

Free Store Mapping

Due to the way the malloc()  functions,  the free store  memory is not always allocated in contiguous fashion.    This is because malloc allocates memory in various locations depending on the size of the requested block.   Exactly where memory allocations will take place is beyond the scope of the document and is dependent on the exact implementation of malloc()!   However, since Ruby make one or more very large allocations (Ruby heap's)  and many smaller allocations during object creation and Interpreter processing,  this generally results in only two primary locations for memory allocation.

The following is a plausible memory map for many processors running Ruby.

(Memory Allocation Map)

Figure 10: Memory allocation map

Ruby object heaps

The Ruby Interpreter implements a specialized memory management system for Ruby Objects  on top of the more general Free Store  management scheme provided by   malloc()realloc(),  and free()  functions.   Since Ruby Objects are fixed in size,  the Ruby GC can optimize their management.

The Ruby Object Heap,  hereafter called an Object Heap,  is implemented using two interrelated arrays. The array heaps[ ]  is the root structure.   Each element contains a heaps_slot structure.   Initially only the first elment of heaps[ ]  is instantiated.

The following code fragment show the important defines,  arrays,  and structures that support Object Heap contruction and management.

#define HEAPS_INCREMENT 10

#define HEAP_MIN_SLOTS 10000
static int heap_slots = HEAP_MIN_SLOTS;

static struct heaps_slot {
    void *membase;
    RVALUE *slot;
    int limit;
} *heaps;

static int heaps_length = 0;
static int heaps_used = 0;

(gc.c)

The heaps_slot  structure contains three elements that describe an allocated array.   This array is often refered to as the heap.   It is made up of  heap_slots  number of elements,  refered to as a slot.   Each of these slots  is RVALUE sized.   In other words, each slot is an empty object-sized  block of memory that has been set to zero.

The last sentence above is important.   These slots  will eventully become Ruby Objects!

The three elements the heap_slots  structure are:

1) *membase - Points to the origin of the allocated memory block.
2) *slot    - The first properly "aligned" slot location in the heap.
              (*membase and *slot MAY or MAY NOT be the same address) 
3) limit    - The total number of available "slots" in the heap. 
                 

(Heapitems)

Figure 11: Relationship between the heaps_slot structure and the heap it describes

The *slot  value is necessary because the starting address of the heap  must  be evenly divisible by the size of RVALUE.   When this is not the case,  the nearest evenly divisible address thats greater than *membase  is used.

The heaps  structure is a array of heaps_slot elements. The variables that control it's usage are:

1) *heaps       - Base address  of an array of heaps_slot elements.
2) heaps_length - The number of elements in the heaps array.
3) heaps_used   - The number of elements in heaps array being used.

(Heapitems)

Figure 12: The complete heaps structure

Freelist

The *freelist is the repository of object-sized  memory blocks that will be used to create new objects.   The list is created or extended by the add_heap() function by linking all the elements in a new heap into the *freelist.

Also remember, as objects are reclaimed during the sweep phase  these objects are also added to the *freelist.

This accomplished by steping through each element in the new heap and perfoming the following actions until the array of new elements is exhausted.

1) The heap element  is referenced through the as.free  structure.
2) The as.flags word is zero indicating it is a empty object, as.*next  
   is set to the contents of the *freelist.
3) The *freelist is set to the address of the current heap element.

(*freelist linking)

Figure 13: Freelist construction order

When the Ruby GC sweeps memory each object that is not marked  is added to the free list.   Any memory referenced by the object is simply freed and returned to free stote.   The as.free  entry is used to construct the *freelist.   The as.flags  word is cleared to indicate that the object is "free",  and the as.*next  entry is used to link the object into the *freelist  as shown above.

Add_heap ()

▼ Add_heap () (Concise edition)

Static void
add_heap()
{
    RVALUE *p, *pend;

// Section 1 -- Extend the length of Heaps Array (if necessary)
//
    if (heaps_used == heaps_length) {
  /* Realloc heaps */
  struct heaps_slot *p;
  int length;

  heaps_length += HEAPS_INCREMENT;
  length = heaps_length*sizeof(struct heaps_slot);
  RUBY_CRITICAL(
      if (heaps_used > 0) {
    p = (struct heaps_slot *)realloc(heaps, length);
    if (p) heaps = p;
      }
      else {
    p = heaps = (struct heaps_slot *)malloc(length);
      });
  if (p == 0) rb_memerror();
    }

// Section 2 - Create new Heap Array and compute Heap_slots values
//
    for (;;) {
  RUBY_CRITICAL(p = 
           (RVALUE*)malloc(sizeof(RVALUE)*(heap_slots+1)));
  if (p == 0) {
      if (heap_slots == HEAP_MIN_SLOTS) {
    rb_memerror();
      }
      heap_slots = HEAP_MIN_SLOTS;
      continue;
  }
        heaps[heaps_used].membase = p;
        if ((VALUE)p % sizeof(RVALUE) == 0)
            heap_slots += 1;
        else
            p = (RVALUE*)((VALUE)p + sizeof(RVALUE) - 
                   ((VALUE)p % sizeof(RVALUE)));
        heaps[heaps_used].slot = p;
        heaps[heaps_used].limit = heap_slots;
  break;
    }

// Section 3 - Compute Begin/End of Heap and Adjust Parameters
//
    pend = p + heap_slots;
    if (lomem == 0 || lomem > p) lomem = p;
    if (himem < pend) himem = pend;
    heaps_used++;
    heap_slots *= 1.8;
    if (heap_slots <= 0) heap_slots = HEAP_MIN_SLOTS;

// Section 4 - Add all new Slots to *freelist
//
    while (p < pend) {
  p->as.free.flags = 0;
  p->as.free.next = freelist;
  freelist = p;
  p++;
    }
}

(gc.c)

The following discussion covers the sections noted in the code block above.

Section 1:
When the initial allotted elements in the heaps  array has been exhausted, the array must be expanded to hold additional elements.   The special case of initialization is handled by the same code,  since the size heaps_used  and heaps_length  are both initially zero.

The length of the heaps array is initially set to the value of the HEAP_MIN_SLOTS constant.   Thereafter, it is exapanded by the same value.

Section 2:
This section attempts to allocate a new heap.   If allocation fails and the heap_slots  value is greater than the HEAP_MIN_SLOTS constant value,  an attempt is made to to allocate just HEAP_MIN_SLOTS.   If this also fails, Ruby aborts with memory failure.

If memory allocation is sucessful, the values for the heaps_slot  element are computed.   The *membase value is the address returned by the malloc function.   If this value can not be evenly divided by the size of RVALUE, the next heighest address that is evenly divisable is used for the *slot  value. Otherwise, the *membase  value is used for *slot.

Section 3:
This section computes the beginning and end addresses for the new heap.   Next,  the number of heaps_used  is incremented.   Finally,  heap_slots  variable is multiplied by the constant '1.8'.   This means that each new heap  will be '1.8'  times larger that the last.   This is done to reduce the number times that the GC is called for programs with large numbers of objects.

Section 4:
The last step is to link each element of the heap into the *freelist.   This provides Ruby with a large number of new "empty" objects to be used.

Rb_newobj ()

▼ rb_newobj ()


VALUE
rb_newobj()
{
    VALUE obj;

    if (!freelist) garbage_collect();

    obj = (VALUE)freelist;
    freelist = freelist->as.free.next;

    MEMZERO((void*)obj, RVALUE, 1);
    return obj;
}

(gc.c)

When Ruby needs to create a new object,  the rb_newobj()  function is called.   This function performs the following basic operations:

1) If the *freelist is empty, garbage collection is performed.
2) A new object is extracted from the *freelist.
3) The new object is cleared and returned to the caller.

Marking

This process is performed by marking that following objects and their descendants:

1) global variables and constants in Ruby.
2) live local variables in Ruby
3) the machine stack
4) objects referenced by C global variables
   registerd by rb_gc_register_address()

Rb_gc_mark ()

rb_gc_mark() - Simply Calls gc_mark().

void
rb_gc_mark(ptr)
    VALUE ptr;
{
    gc_mark(ptr, 0);
}
The separation of   rb_gc_mark()  into two functions is a result of 
refactoring. This allows the core processing functions of gc_mark() to
be modified for reuse in different contexts.  This is example of the 
continuing work being done to not only add new functionality,  but to 
improve the underlying code of the Ruby Interpreter.

gc_mark() - Recursively marks all live objects

static void
gc_mark(ptr, lev)
    VALUE ptr;
    int lev;
{
    register RVALUE *obj;

    obj = RANY(ptr);
    if (rb_special_const_p(ptr)) return; /* special const not marked */
    if (obj->as.basic.flags == 0) return;       /* free cell */
    if (obj->as.basic.flags & FL_MARK) return;  /* already marked */
    obj->as.basic.flags |= FL_MARK;

    if (lev > GC_LEVEL_MAX || (lev == 0 && ruby_stack_check())) {
      if (!mark_stack_overflow) {
        if (mark_stack_ptr - mark_stack < MARK_STACK_MAX) {
          *mark_stack_ptr = ptr;
          mark_stack_ptr++;   
        }
        else {
         mark_stack_overflow = 1;
        }
      }
      return;
   }
   gc_mark_children(ptr, lev+1);
}

The RANY definition allows the reference of any  object.   For this particular function,  it's purpose to reference the flags  word of each object processed.   It's definition is shown below:

    #define RANY(o) ((RVALUE*)(o))

(gc.c)

If the object is a special constant,  already marked,  or a 'free' object no further processing is performed.   Otherwise, the object is marked and further processing is required.

After the object being marked,  a bit of special processing takes place.   For purposes of program stability,  the depth of recursion during object tracing is limited.   When this depth is reached, an attempt is made to place the current object pointer onto a special axuillary stack, the mark_stack.

If the mark stack  is full, the the mark_stack_overflow  marker is checked.   If it is set, when the marking phase is finished,  all viable objects in all the heaps are marked.   This is necessary to insure that all 'live' objects will be marked!

If the mark_stack_overflow  marker is not set,  then all the objects on the mark_stack are passed to gc_mark_children.   When complete, the Ruby GC starts the sweep  phase.

gc_mark_children ()

The following is a abbreviated version of the source code for Gc_mark_children().   The code contains two main sections.   The first processes the T_NODE Objects.   The second processes the standard Object Nodes.

▼ gc_mark_children ()


static void
gc_mark_children(ptr, lev)
    VALUE ptr;
    int lev;
{
#ifdef ENABLE_DBG
if (mute_flag != 0) {
printf("  gc_mark_children(...) at (923)\n");
}
#endif
    register RVALUE *obj = RANY(ptr);

    goto marking;   /* skip */

  again:
    obj = RANY(ptr);
    if (rb_special_const_p(ptr)) return; /* special const not marked */
    if (obj->as.basic.flags == 0) return;       /* free cell */
    if (obj->as.basic.flags & FL_MARK) return;  /* already marked */
    obj->as.basic.flags |= FL_MARK;

  marking:
    if (FL_TEST(obj, FL_EXIVAR)) {
  rb_mark_generic_ivar(ptr);
    }

    switch (obj->as.basic.flags & T_MASK) {
      case T_NIL:
      case T_FIXNUM:
  rb_bug("rb_gc_mark() called for broken object");
  break;

      case T_NODE:
  mark_source_filename(obj->as.node.nd_file);
  switch (nd_type(obj)) {
    case NODE_IF:   /* 1,2,3 */
    case NODE_FOR:
    case NODE_ITER:
         
          /*............ abbreviation............ */
  }
  return;     /* no need to mark class. */
  }

    gc_mark(obj->as.basic.klass, lev);
    switch (obj->as.basic.flags & T_MASK) {
      case T_ICLASS:
      case T_CLASS:
      case T_MODULE:
  mark_tbl(obj->as.klass.m_tbl, lev);
  mark_tbl(obj->as.klass.iv_tbl, lev);
  ptr = obj->as.klass.super;
  goto again;

      case T_ARRAY:
  if (FL_TEST(obj, ELTS_SHARED)) {
      ptr = obj->as.array.aux.shared;
      goto again;
  }
  else {
      long i, len = obj->as.array.len;
      VALUE *ptr = obj->as.array.ptr;

      for (i=0; i < len; i++) {
    gc_mark(*ptr++, lev);
      }
  }
  break;

          /*............ abbreviation............ */
  
      default:
  rb_bug("rb_gc_mark(): unknown data type 0x%x(0x%x) %s",
         obj->as.basic.flags & T_MASK, obj,
         is_pointer_to_heap(obj) ? 
                            "corrupted object" : "non object");
    }
}

(gc.c)

Gc_mark_children()  tries to find all the objects that can be reached from the input object.   As we will see in this and following sections,  this is a endeavor that involves detailed processing.   What do we mean?    Consider the following simple array:   "ary = [ 1, 2]"

(node tree)

Figure 14: Objects for   "ary = [ 1 , 2 ]"

The Gray structures(See Below) indicate the extent of the basic input object,  that is the part of the object that has already been marked.   This is the object that was processed by gc_mark().   It has been passed to this routine to do what?   To mark all of it's childern!   The array structure itself is not marked,  it is considered part of the input object and it will deleted when it's object is destroyed.   This is true of all auxillary memory used by objects.

The blue structures are the children  of the input object,   They are the reachable  objects.   These are the objects that gc_mark_children() will Mark  if they are not already marked.

As you can see,  these are the constructs that represent the objects and data created by the programmer.   However,  there are another set of objects supported by the Ruby Interpreter.   These are T_NODES.   These are the objects that represent the program code inside Ruby.   Each source line  is converted into a tree of  T_NODES.

T_NODES will be discussed in some detail in Secton II.   For now,  T_NODES are shown just to illustrate the processing in first section of gc_mark_children().   The following T_NODE Tree is created for the source line "x = 5 + 1":

(tnode tree)

Figure 15: T_NODES Generated for   "x = 5 + 1"

As before the gray structures illustrate the input object processed by gc_mark() and the Blue structures represent the objects that are "reachable" by gc_mark_children().

This pairing of gc_mark  and gc_mark_children()  work recursively to trace the objects.   The trace depth variable lev  and the mark_stack work to prevent run-away recursion.   These features breaks very deep traces into smaller chucks to ease processing.   It is important to know that while these features sometimes fail,  the program does not crash.  It simply marks ALL of the ojbects.   While this mary appear crude, it will allow Ruby to continue to run until the processor memory is exhausted.

In the case of T_DATA Objects,  if a Marking Function  is present, it is executed.   Ruby has no way of knowing what the structure of a T_DATA Object represents,  thus we can not process this object any further.

 case T_DATA:
   if (obj-> As.Data.Dmark) (*obj-> As.Data.Dmark) (DATA_PTR (obj));
 break;

(gc.c)

Rb_gc ()

The function rb_gc()  simply calls the function garbage_collect()  and on completion calls rb_gc_finalize_deferred().

void
rb_gc()
{
    garbage_collect();
    rb_gc_finalize_deferred();
}

(gc.c)

The function  garbage_collect()  attempts to reclaim the space allocated to non-live  objects before adding another heap.   The reclaimation process is performed by two main tasks.   The first task is performed by calling gc_mark()  for the root set  of objects.   These are:

  1. global variables and constants in Ruby.
  2. live local variables in Ruby
  3. the machine stack
  4. objects referenced by C global variables registerd by rb_gc_register_address()

Second,  the function gc_sweep  is then called called to reclaim all objects that were not marked  by the marking functions.   That is,  all objects that were not reachable from the root set  of objects.

The function rb_gc_finalize_deferred()  is called to perform any actions that have be performed before an object is destroyed.   There three instances when this must be done.

  1. When an object was placed on the deferred_final_list  by calling the method ObjectSpace.define_finalizer,   This method stores a proc  that rb_gc_finalize_deferred()  will execute before deleting the target object.
  2. When a T_DATA Object has a dfree  function specified.
  3. When a T_FILE Object is destroyed, any open I/O channels must be closed.

Ruby stack and object lists

The following section of code,  from garbage_collect(),  traces objects found in Ruby stack structures and certain lists of objects.   These include Local Variables,  Nodes in Stack Frames,  Scope Objects, in-block Variables, and any finalizer objects.  

   gc_mark((VALUE)ruby_current_node, 0);

    /* mark frame stack */
    for (frame = ruby_frame; frame; frame = frame->prev) {
  rb_gc_mark_frame(frame);
  if (frame->tmp) {
      struct FRAME *tmp = frame->tmp;
      while (tmp) {
    rb_gc_mark_frame(tmp);
    tmp = tmp->prev;
      }
  }
    }
    gc_mark((VALUE)ruby_scope, 0);
    gc_mark((VALUE)ruby_dyna_vars, 0);
    if (finalizer_table) {
  mark_tbl(finalizer_table, 0);
    }

(gc.c)

The variable ruby_current_node  points to the current node on the Abstract Syntax Tree(AST).   As usual, marking this node also marks all nodes reachable from this node.

Each Stack Frame contains a pointer to an AST Node,  which is marked!   Additionally, a Stack Frame may contain a sub-list of Stack Frames, which also have their Nodes marked.  

Marking ruby_scope  marks all local variables in the current Scope.

Marking ruby_dyna_vars  marks all in-block Variables in the current Scope.

Finally, if a finalizer_table  is present,  all objects contained in the table are marked.

Registers and the Stack

The GC  also has to worry about objects referenced from machine registers and the stack.   For registers, this presents a problem for the interpreter.   When you attempt to examine register contents,  knowledge of the underlying architecture is required.   This presents some difficulties.   This will be obvious when examining the code.   For the purposes of this document,  will examine only the X86 Processor.

The following code example is for the X86 Processor ( both 32 & 64 bit).   The FLUSH_REGISTER _WINDOWS definition is not used by the X86.   A setjmp()  function is used to store the machine registers in a known location.   Then function mark_locations_array()  then marks any objects referenced from the saved registers.   This function will be discussed shortly.

Now the registers have been marked,  attempt is made to find all apparent object references on the full stack.   These references unlike others are more tentative in nature.   All we know about the references on the stack is that they point an object in an active heap.   When found, these objects are marked.

The only problem we have is that the object may be a "dead" object.   We just don't know for certain!   The reference may never be used.  This is yet another example of why GC is called a conservative garbage collector.   When we are uncertain about whether an object is alive or dead,  we err on the side of caution and mark it!


    FLUSH_REGISTER_WINDOWS;     /* "((void)0)" for X86 Architecture */

    setjmp(save_regs_gc_mark);  /* Save registers in local array */
  
    mark_locations_array((VALUE*)save_regs_gc_mark, 
                     sizeof(save_regs_gc_mark) / sizeof(VALUE *));

    #if STACK_GROW_DIRECTION < 0
       rb_gc_mark_locations((VALUE*)STACK_END, rb_gc_stack_start);  ??

    ( code for other architectures )

(gc.c)
#ifdef __GNUC__
#if defined(__human68k__) || defined(DJGPP)
#if defined(__human68k__)
typedef unsigned long rb_jmp_buf[8];
__asm__ (".even\n\
_rb_setjmp:\n\
	move.l	4(sp),a0\n\
	movem.l	d3-d7/a3-a5,(a0)\n\
	moveq.l	#0,d0\n\
	rts");
#ifdef setjmp
#undef setjmp
#endif
#else
#if defined(DJGPP)
typedef unsigned long rb_jmp_buf[6];
__asm__ (".align 4\n\
_rb_setjmp:\n\
	pushl	%ebp\n\
	movl	%esp,%ebp\n\
	movl	8(%ebp),%ebp\n\
	movl	%eax,(%ebp)\n\
	movl	%ebx,4(%ebp)\n\
	movl	%ecx,8(%ebp)\n\
	movl	%edx,12(%ebp)\n\
	movl	%esi,16(%ebp)\n\
	movl	%edi,20(%ebp)\n\
	popl	%ebp\n\
	xorl	%eax,%eax\n\
	ret");
#endif
#endif
int rb_setjmp (rb_jmp_buf);
#define jmp_buf rb_jmp_buf
#define setjmp rb_setjmp
#endif /* __human68k__ or DJGPP */
#endif /* __GNUC__ */

(gc.c)

Mark_locations_array ()

This function purpose is to step through an array a possible VALUE  words and determine if they point to a object in a valid heap.   Each possible VALUE is presented to the function is_pointer_to_heap().   If the function returns true the object is marked by calling gc_mark().

As discussed in previous sections,  these arrays have been stored registers and the stack.


mark_locations_array(x, n)
    register VALUE *x;
    register long n;
{
    VALUE v;
    while (n--) {
        v = *x;
        if (is_pointer_to_heap((void *)v)) {
            gc_mark(v, 0);
        }
    x++;
    }
}

(gc.c)

Is_pointer_to_heap ()

This function given a possible  object address,  determines if it is a valid address.   We know that all objects are allocated from one or more heaps in memory.   If the given pointer falls within the address range of a valid heap,  then it is an object.

However, just falling with the proper range does not make is a "live" object.   It simply indicates that it maybe  a valid object.   The object could easily be on the freelist,  for example.   This function only determines that it is valid object address.

static inline int
is_pointer_to_heap(ptr)
    void *ptr;
{
    register RVALUE *p = RANY(ptr);
    register RVALUE *heap_org;
    register long i;

    if (p < lomem || p > himem) return Qfalse;
    if ((VALUE)p % sizeof(RVALUE) != 0) return Qfalse;

    /* check if p looks like a pointer */
    for (i=0; i < heaps_used; i++) {
        heap_org = heaps[i].slot;
        if (heap_org <= p && p < heap_org + heaps[i].limit)
          return Qtrue;
    }
    return Qfalse;
}

(gc.c)

Register window

 #if defined (sparc) || defined (__sparc__)
 # if defined (linux) || defined (__linux__)
 #define FLUSH_REGISTER_WINDOWS asm ("ta 0x83")
 # else /* Solaris and not sparc linux *
 #define FLUSH_REGISTER_WINDOWS asm ("ta 0x03")
 # endif
 #else /* Not a sparc *
 #define FLUSH_REGISTER_WINDOWS
 #endif

(defines.h)

Machine stack

 rb_gc_mark_locations (rb_gc_stack_start, (VALUE*) STACK_END);
 #if defined (__human68k__)
 rb_gc_mark_locations ((VALUE*) ((char*) rb_gc_stack_start
 
 (VALUE*) ((char*) STACK_END + 2));
 #endif

(gc.c)

Init_stack ()

This function initializes the stack parameters.   As with some previous sections that involved machine level interactions,  we will discuss specifics for the X86 processor only.

First,  the starting address of the stack must be computed.   First,  if the specified parameter address contains zero,  it will be filled with it's own address.

Now the macro  STACK_UPPER(x, y, z)  is a piece of tricky business.   Since stack direction and usage varies,  this macro adjusts for it.   For X86 processors the third parameter, z,  is always used.   Thus, the first usage of  STACK_UPPER()  executes the instruction ++addr.   The second,  executes the conditional rb_gc_stack_start < addr.

If the symbol HAVE_GETRLIMIT is defined,  the code following will compute a "reasonable" stack size for the current processor.   This is done by asking the processor for the current Stack Limit and using it to compute the STACK_LEVEL_MAX value.   For the X86 Processor the initial value is set to 655300.

void
Init_stack(addr)
    VALUE *addr;
{
#ifdef __ia64
  ::       ::          ::
#endif

#if defined(_WIN32) || defined(__CYGWIN__)
  ::       ::          ::
#elif defined(STACK_END_ADDRESS)
  ::       ::          ::
#else
    if (!addr) addr = (VALUE *)&addr;   /* X86 Processing */
    STACK_UPPER(&addr, addr, ++addr);
    if (rb_gc_stack_start) {
  if (STACK_UPPER(&addr,
      rb_gc_stack_start > addr,
      rb_gc_stack_start < addr))
      rb_gc_stack_start = addr;
  return;
    }
    rb_gc_stack_start = addr;
#endif

#ifdef HAVE_GETRLIMIT                  /* X86 Processing */
    {
  struct rlimit rlim;

  if (getrlimit(RLIMIT_STACK, &rlim) == 0) {
      unsigned int space = rlim.rlim_cur/5;

      if (space > 1024*1024) space = 1024*1024;
      STACK_LEVEL_MAX = (rlim.rlim_cur - space) / sizeof(VALUE);
  }
    }
#endif
}


(gc.c)

STACK_END

Again,  as we did previously,  we have isolated the defines for the X86 processor.   Other processors take slightly different tacks.   The important issue here is that the two defines that "set" and "return" the Stack End Address are created properly for the current processor.

#ifdef C_ALLOCA
   ::         ::       ::
#else
# if defined(__GNUC__) && 
         defined(USE_BUILTIN_FRAME_ADDRESS) && !defined(__ia64)
   ::         ::       ::
   ::         ::       ::
# else                       /* Starting X86 Defines */                                   
#  define  SET_STACK_END    VALUE *stack_end = alloca(1)
# endif
# define STACK_END (stack_end)
#endif

(gc.c)

The SET_STACK_END operation creates the pointer to the Stack End Address by allocating one word on the current stack.   The returned address is stored in the pointer location, stack_end.

    VALUE *stack_end = alloca(1)

When ever the define STACK_END is used, the value (stack_end)  is inserted.

These methods of creating and returning the Stack End Address isolates all the processor differences in static defines rather than coding executable methods of processor differentiation.   This is a common technique used throughout the Ruby Interpreter.

Rb_gc_mark_locations ()

This function searches the stack for pointers into active heaps.   Checks are made to assure that the possible object appears to be an appropriate size and resides at valid heap location.   At this point,  we assume it is a "live" object.   There is some chance that the object is already "dead", but as a conservative collector we accept the occasional uncollected "dead" object.

The actual traversing of the stack and marking objects is performed by mark_locations_array().   The function rb_gc_mark_locations ()  only coverts the input parameters into a form that is acceptable to mark_locations_array().

void
rb_gc_mark_locations(start, end)
    VALUE *start, *end;
{
    long n;

    n = end - start;
    mark_locations_array(start,n);
}
(gc.c)

While traversing a stack is generally a processor specific function,  rb_gc_mark_locations() avoids this problem by requiring the caller make adjustments for processor differences.   The following code encompasses the various processor adjustments for calls to rb_gc_mark_locations().

 garbage_collect():  (continued)
    ::       ::       ::
    ::       ::       ::
#if STACK_GROW_DIRECTION < 0      /* X86 Processors */
    rb_gc_mark_locations((VALUE*)STACK_END, rb_gc_stack_start);

#elif STACK_GROW_DIRECTION > 0
    rb_gc_mark_locations(rb_gc_stack_start, (VALUE*)STACK_END + 1);

#else
    if ((VALUE*)STACK_END < rb_gc_stack_start)
       rb_gc_mark_locations((VALUE*)STACK_END, rb_gc_stack_start);
    else
       rb_gc_mark_locations(rb_gc_stack_start, (VALUE*)STACK_END + 1);
#endif

#ifdef __ia64
    /* mark backing store (flushed register window on the stack) */
    /* the basic idea from guile GC code                         */
    rb_gc_mark_locations(rb_gc_register_stack_start,
          (VALUE*)rb_ia64_bsp());
#endif

#if defined(__human68k__) || defined(__mc68000__)
    rb_gc_mark_locations((VALUE*)((char*)STACK_END + 2),
       (VALUE*)((char*)rb_gc_stack_start + 2));
#endif

(gc.c)

As an example the first two lines are for calling this rb_gc_mark_locations()  on X86 Processors.

Notice that the "start" and "end" specifications are not necessarily  the actual start and end addresses. These values are calculated for each processor so the stack is traversed in a positive direction, regardless of the actual stack orientation.

Other route objects

Lastly, there are a number of system tables and miscellaneous objects that are left to be marked.   We will cover them in this section.   As you can see below,  this is the final marking phase before the sweep  begins!

    garbage_collect():  (continued)
       ::         ::       ::
       ::         ::       ::
    rb_gc_mark_threads();

    /* mark protected global variables */
    for (list = global_List; list; list = list->next) {
       rb_gc_mark_maybe(*list->varptr);
    }
    rb_mark_end_proc();
    rb_gc_mark_global_tbl();

    rb_mark_tbl(rb_class_tbl);
    rb_gc_mark_trap_list();

    /* mark generic instance variables for special constants */
    rb_mark_generic_ivar_tbl();

    rb_gc_mark_parser();

    /* gc_mark objects whose marking are not completed*/
    do {
       while (!MARK_STACK_EMPTY) {
         if (mark_stack_overflow){
           gc_mark_all();
         }
         else {
           gc_mark_rest();
         }
       }
       rb_gc_abort_threads();

    } while (!MARK_STACK_EMPTY);

    gc_sweep(); /* Performs the Sweep Phase of the Garbage Collector */
 }

(gc.c)

rb_gc_mark_threads()
If threads are being used,  this function will mark all active threads.

Global List and rb_gc_mark_maybe()
The global list contains pointers to both simple global address and objects.   The global_list  is scanned and each address is passed to rb_gc_mark_maybe().   This function determines if the pointer resides in an active heap,  which means the address points at an object.   All objects found on the global_list are marked.

rb_gc_mark_end_proc()
The handling of proc's by eval.c  evolves keeping various lists of proc objects.  This function marks all the objects recorded in these lists.

rb_gc_mark_global_tbl()
Ruby maintains an internal hash table of all define global variables.   This function marks each of these variables.

rb_gc_mark_trap_list()
Ruby keeps a list of signal traps for handling threads.   This function marks the objects associated with these thread signals.

rb_gc_mark_generic_ivar_tbl()
When instance variables are created for special constants,  they are stored in the generic_ivar_tbl.   This is necessary because special constants exist only in the VALUE word and do not instantiate an actual object.   All variable objects in this table are marked!

rb_gc_mark_parser()
This function marks the objects that hold the state of the parser.

Marking Objects with incomplete marking
The last marking sequence to complete is the marking of objects on the mark_stack.   As stated previously,  if the mark_stack has overflowed  all objects will be marked.   Otherwise, just the objects on the mark_stack are fed to gc_mark_children().

rb_gc_abort_threads()
At the completion of all marking,  this function will terminate all unmarked threads.

Sweep

Special treatment of NODE Objects

For the sweep phase  we will search for objects that are not marked and release them!   However,  there is a small problem with objects of type NODE if the parser's semantic stack is not allocated on the machine stack.   For configurations were the machine stack is not used,  the following code is provided to mark those objects.   Notice that even for these configurations,  this is only a problem if the parser in compiling!

static void
gc_sweep()
{
 RVALUE *p, *pend, *final_list;
 int freed = 0;
 int i;
 unsigned long live = 0;
 unsigned long free_min = 0;
   ::       ::       ::    ::
   ::       ::       ::    ::
/* We should not reclaim nodes during compilation             */
/* if yacc's semantic stack is not allocated on machine stack */

 if (ruby_in_compile && ruby_parser_stack_on_heap()) {
   for (i = 0; i < heaps_used; i++) {
     p = heaps[i].slot; pend = p + heaps[i].limit;
     while (p < pend) {
       if (!(p->as.basic.flags&FL_MARK) && BUILTIN_TYPE(p) == T_NODE)
         gc_mark((VALUE)p, 0);
        p++;
        }
     }
   }
   ::       ::       ::    ::
   ::       ::       ::    ::

(gc.c)

Reclaiming "dead" Objects

The following section of gc_sweep()  examines the active heaps for objects that are not marked!   Unmarked,  or "dead",  objects are first processed by obj_free().   This function releases any auxiliary memory that has been allocated for an object.   These are things like the memory that holds the string for a string object.   Other objects, such a float,  are self-contained objects and are not modified.

This does not actually "free" an object,  it simply reduces it to a bare, and probably an incomplete,  object.   It becomes free only when it is placed on the freelist.   However, before it can legitimately be placed on that list,  it must be checked for a finalizer.

The rest of the section is concerned with insuring that unnecessary heaps are reclaimed and that there are sufficient "free' objects available.   Care is taken not to reclaim an empty heap if the number of "free" objects is less than the computed value free_min.   This value never less than the constant FREE_MIN (currently 4096).   If however,  after all reclamation has been complete the number of "free" objects is less than free_min,  the function add_heap()  performed to increase their number.

  gc_sweep():  (continued)
   ::       ::       ::    ::
   ::       ::       ::    ::
  freelist = 0;
  final_list = deferred_final_list;
  deferred_final_list = 0;
  
  for (i = 0; i < heaps_used; i++) {
    int n = 0;
    RVALUE *free = freelist;
    RVALUE *final = final_list;
    
    p = heaps[i].slot; pend = p + heaps[i].limit;
    while (p < pend) {
      if (!(p->as.basic.flags & FL_MARK)) {
        if (p->as.basic.flags) {
            obj_free((VALUE)p);
        }
        if (need_call_final && FL_TEST(p, FL_FINALIZE)) {
          p->as.free.flags = FL_MARK; /* remain marked */
          p->as.free.next = final_list;
          final_list = p;
        }
        else {
          p->as.free.flags = 0;
          p->as.free.next = freelist;
          freelist = p;
        }
        n++;
      }
      else if (RBASIC(p)->flags == FL_MARK) {
        /* objects to be finalized */
        /* do nothing remain marked */
      }
      else {
        RBASIC(p)->flags &= ~FL_MARK;
        live++;
      }
      p++;
    }
    if (n == heaps[i].limit && freed > free_min) {
      RVALUE *pp;
      
      heaps[i].limit = 0;
      for (pp = final_list; pp != final; pp = pp->as.free.next) {
          pp->as.free.flags |= FL_SINGLETON; /* freeing page mark */
      }
      freelist = free;  /* cancel this page from freelist */
    }
    else {
      freed += n;
    }
  }
  if (malloc_increase > malloc_limit) {
      malloc_limit += (malloc_increase - malloc_limit) * (double)live / (live + freed);
      if (malloc_limit < GC_MALLOC_LIMIT) malloc_limit = GC_MALLOC_LIMIT;
  }
  malloc_increase = 0;
  if (freed < free_min) {
      add_heap();
  }
   ::       ::       ::    ::
   ::       ::       ::    ::

(gc.c)

Finalizers

Ruby supplies two methods for managing finalization processing for objects.   They are:
   1)  ObjectSpace.define_finalizer
   2)  ObjectSpace.undefine_finalizer

Defined finalizers are maintained in the hash table finalizer_table.   This table is indexed by the Object's VALUE and contains a reference to a proc block.

Unfortunately,  two more depreciated methods must also be handled.   They are:
   1)  ObjectSpace.add_finalizer
   2)  ObjectSpace.remove_finalizer

These finalizers are held in the table finalizers.   This table is indexed by the Object's ID and contains a reference to a proc block.   As noted these are depreciated methods and are meant to be eventually removed.

Let us look closer at the section above responsible for finalizer processing(shown below).   When the first  define_finalizer()  method is executed,  the variable need_call_final  is set to one(1).   This enables the checking for finalizers.   Surprisingly,  the flag is never cleared once set.   Even if all finalizers are undefined.

   ::       ::       ::    ::
   if (need_call_final && FL_TEST(p, FL_FINALIZE)) {
     p->as.free.flags = FL_MARK; /* remain marked */
     p->as.free.next = final_list;
     final_list = p;
   }
   else {
     p->as.free.flags = 0;
     p->as.free.next = freelist;
     freelist = p;
   }
   ::       ::       ::    ::

Remember at this point in the processing the current object is unmarked!.   If the FL_FINALIZE Flag is set for the current object,  the object is placed on the final_list and marked.   For those that are wondering why the object is being marked again,  it must now persist in it's new state until the proc  associated with the finalizer is actually executed.

If FL_FINALIZE is not set, the object is placed on the free_list.

Finalizers are only executed at the completion of garbage collection and at program exit.

Run_final

The function run_final(),  as shown below,  handles executing the finalizer proc's.

static void
run_final(obj)
    VALUE obj;
{
    long i;
    int status, critical_save = rb_thread_critical;
    VALUE args[3], table, objid;

    objid = rb_obj_id(obj); /* make obj into id */
       ::       ::       ::    ::
       ::       ::       ::    ::
     /* Code for Depreciated 'add_final' method */
       ::       ::       ::    ::
       ::       ::       ::    ::
    if (finalizer_table && st_delete(finalizer_table, 
                  (st_data_t*)&obj, &table)) {
      for (i=0; ilen; i++) {
        VALUE final = RARRAY(table)->ptr[i];
        args[0] = RARRAY(final)->ptr[1];
        if (!args[1]) args[1] = rb_ary_new3(1, objid);
        args[2] = FIX2INT(RARRAY(final)->ptr[0]);
        rb_protect((VALUE(*)_((VALUE)))run_single_final, 
                   (VALUE)args, &status);
      }
    }
    rb_thread_critical = critical_save;
}

(gc.c)

This function processes the finalizer_table  created by the define_finalizer()  method.   The finalizer parameters are referenced through the hash table finalizer_table.   The hash key  is the target Object's VALUE Pointer and hash value  is a VALUE pointer to the Block Array.

The Block Array contains one VALUE Pointer to a Command Array  for each define_finalizer()  method call applied to a specific object.   If define_finalizer()  was called twice for the same object,  the Block Array will contain two entries.

Each VALUE pointer in the Block Array references an array containing three entries,  cmd,  args,  and safe_level.   The proc  object is executed by feeding these three entries to the rb_eval_cmd()  function.

(finalizer structure)

Figure 16: Finalizer Structure

Rb_gc_force_recycle ()

There comes a time when you wish to kill an object with extreme prejudice!   Not at some undetermined time later,  now!.   While this facility is not available the user,  it is available to internal routines where the consequences are determinate.   In any case,  rb_gc_force_recycle()  will clear the objects flags(i.e. effectively deleting the object) and places the object on the freelist!.

A word of warning though,  rb_gc_force_recycle()  does not release any memory allocated by the object.   Failure to take this into account will lead to memory leaks!

void
rb_gc_force_recycle(p)
    VALUE p;
{
    RANY(p)->as.free.flags = 0;
    RANY(p)->as.free.next = freelist;
    freelist = RANY(p);
}

(gc.c)

Timing of starting

Inside gc.c

Where does Garbage Collection start?   There are three primary places:

1.  Ruby_xmalloc()
2.  Ruby_xrealloc()
3.  Rb_newobj()

These are the functions that allocate space for objects,  both the object itself and any axillary space needed.   Axillary space consists of things like space for strings and other data structures.

The functions Ruby_xmalloc() and Ruby_xrealloc() are used by a variety of internal functions to process requests for additional memory.   When a memory allocation request fails,  these functions call garbage_collect()  to free up memory.   A second failure will cause Ruby to generate a memory allocation error and exit.

The function rb_newobj()  is used to create new objects.   When there are no objects on the freelist,  the function garbage_collect() is run to free up the space allocated to "dead" objects.   When this does not create enough free objects,  a new heap is created.   As above,  failure to allocate memory will cause Ruby to print an error and exit.

Inside interpreter

Inside the interpreter there are numerous places where memory must be allocated,  such as opening files.   In these and other cases when memory is required,  the function rb_gc()  is called and this function calls garbage_collect().

The Ruby statements GC.startgc.garbage_collect,  and ObjectSpace.garbage_collect  all call rb_gc()  directly.

Formation of object

The story of Garbage Collection has ended.   We have discussed the reclamation of dead objects.   We now need to further discuss the formation of objects.   While this was done to some degree during the previous chapter for Classes,  we will continue the discussion in the following sections.

Allocation framework

We have covered the formation of several types of objects.   Here we tackle the creation of a class instance with the 'new' method.

    ::     ::   ::
  class C
  end

  C.new()
    ::     ::   ::
  
(<somefile>.c)

We can guess that  C.new()  will create an object.

But how is this accomplished?   With most languages,  this sort of operation would be handled internally by the interpreter.   In Ruby,  this and many other operations are handled by predefined methods.   As part of the initialization of the Ruby interpreter,  all the built-in objects and their methods are created before interpretation begins.   The following is a portion for the initialization code for Objects.

void

Init_Object() { :: :: :: rb_define_alloc_func(rb_cObject, rb_class_allocate_instance); :: :: :: rb_define_method(rb_cClass, "new", rb_class_new_instance, -1); :: :: :: } (object.c)

Let's set aside the first statement for minute and consider the second statement.   This statement calls a function that adds the method  new()  to the class Class.   This means when a new()  method is specified for a class,  the function rb_class_new_instance()  is executed.

VALUE
rb_class_new_instance(argc, argv, klass)
    int argc;
    VALUE *argv;
    VALUE klass;
{
    VALUE obj;

    obj = rb_obj_alloc(klass);
    rb_obj_call_init(obj, argc, argv);

    return obj;
}

(object.c)

How does the interpreter "know" which allocation function to use when creating an object?   Since different objects can be allocated differently,  then each type of object should have a way to define the allocation function.   That method is called appropriately rb_define_alloc_func().   Notice that this is the first statement highlighted in Init_Object().

For class allocation,  the allocation function rb_class_allocate_instance()  is called by rb_obj_alloc()

static VALUE
rb_class_allocate_instance(klass)
    VALUE klass;
{
    NEWOBJ(obj, struct RObject);
    OBJSETUP(obj, klass, T_OBJECT);
    return (VALUE)obj;
}

(object.c)

This allocation function, and many others, makes use of the macro's NEWOBJ and OBJSETUP to create an appropriate object structure and install the specified values.  

 #define NEWOBJ (obj and type) type *obj = (type*) rb_newobj ()

 #define OBJSETUP (obj, c, t) do {\
   RBASIC (obj) -> Flags = (t); \
   RBASIC (obj) -> Klass = (c); \
   if (rb_safe_level () > = 3) FL_SET (obj and FL_TAINT);\
  While (0)

(ruby.h)

The NEWOBJ  macro calls rb_newobj(),  which fetches the next  RVALUE  slot off the freelist.   Additionally, it sets the entire RVALUE block to zero.

The OBJSETUP  macro sets the Klass  and Flags  words in the object.   It also sets the FL_TAINT flag if necessary.

Returning to the discussion of rb_class_new_instance()  above.   When control returns from rb_obj_alloc(),  the new object is initialized by calling rb_obj_call_init().   For a new class object this entails no further action and the function rb_obj_dummy()  is called.

Creating a class object using the new()  command involves only the following generic steps:

SomeClass.New = Class#new (rb_class_new_instance)
    SomeClass.Allocate = Class#allocate (rb_class_allocate_instance)
    SomeClass#initialize = Object#initialize (rb_obj_dummy)

Formation of user definition objects

This section is concerned with the formation of user defined  objects with the extended library.   The extended library API contains three functions for handling these type of objects.

1. Data_Wrap_Struct - Wraps a value/structure in a Ruby RData Object
2. Data_Make_Struct - Allocates Memory for a 'c-type' and wraps it
3. Data_Get_Struct  - Returns Pointer to the wrapped structure
(RData Structure)

Figure 17: RData Structure

The RDATA  structure is the ruby object structure used to 'wrap' user defined  objects.   Other than the standard basic  structure, the three other items are used as follows:

*dmark  points to a function for marking  Ruby Object references.   This is only necessary if your structure contains references to Ruby objects.   If so,  you must supply a function (pointed to by *dmark)  that will call rb_gc_mark()  for each reference to a Ruby Object.   If no function is needed,  then *dmark  may be set to zero.

*free  points to a user function to free the user defined objects,  the system function free(),  or a '-1'  to just to free the memory pointed to by *data_ptr.   Note that both specifying free()  and '-1' accomplish the same thing,  the choice is up to the user.

*data_ptr  points at the user defined structure 'wrapped' by the RDATA  Object.

Data_Wrap_Struct()   and   Data_Make_Struct()


The Data_Wrap_Struct()  function is called as follows:
Wrap_Data_Struct( VALUE class, void (*dmark)(), void (*free)(), *ptr )

The purpose of Data_Wrap_Struct() is to encapsolate a user defined value or structure in a Ruby RData  Object. Let us look at a simple example:

typedef struct adot {
  int horz;
  int vert;
} ADot;

ADot *ptr;
VALUE info

ptr = ALLOC(ADot);
ptr->horz = 0;
ptr->vert = 0;

info = Data_Wrap_Struct( cDots,  0,  free,  ptr);

Though not shown here, this code refers on a user defined class named cDots.   This class is not particularly important beyond the need for a specific class to be specified.

Since the encapsulated structure does not refer to any other Ruby Objects,  a *dmark  function pointer does not need to be specified.   Further,  because the structure does not have references to other user defined allocations,  the method free  is all that is required for freeing the ADot  structure.   The value '-1'  could have been used instead of specifying free  and produced the same result.

As you can guess,  the ptr  value points a the starting address of a ADot  structure.  

In conclusion,  it returns a VALUE pointer to a RData  object.

The Data_Make_Struct()  function is called as follows:

Data_Make_Struct( VALUE class, c-type, void (*dmark)(), void (*free)(), (c-type) *ptr )

The primary difference between these functions is that Data_Make_Struct()  performs the allocation of the 'c' data structure for you.   Presented here is another version of the previous example.   It uses Data_Make_Struct()  instead of Data_Wrap Struct().

typedef struct adot {
  int horz;
  int vert;
} ADot;

ADot *ptr;
VALUE info

info = Data_Wrap_Struct( cDots,  ADot,   0,  free,  ptr);

ptr->horz = 0;
ptr->vert = 0;

While the use of ptr  in this case to initialize the structure is valid,  you should generally use the method Data_Get_Struct()  to fetch the address of the encapsulated structure.

The definitions for Data_Wrap_Struct and Data_Make_Structare defined as macros.


/*
#define RUBY_DATA_FUNC(func) ((void (*)_((void*)))func)
*/
typedef void (*RUBY_DATA_FUNC) _((void*));

VALUE rb_data_object_alloc _((VALUE,void*,RUBY_DATA_FUNC,RUBY_DATA_FUNC));

#define Data_Wrap_Struct(klass,mark,free,sval)\
    rb_data_object_alloc(klass,sval,(RUBY_DATA_FUNC)mark,(RUBY_DATA_FUNC)free)

#define Data_Make_Struct(klass,type,mark,free,sval) (\
    sval = ALLOC(type),\
    memset(sval, 0, sizeof(type)),\
    Data_Wrap_Struct(klass,mark,free,sval)\
)

(ruby.h)

The function Data_Make_Struct  uses Data_Wrap_Struct  to actually create the desired RData Object.   So in both cases that actual creation of the object is accomplished with the function rb_data_object_alloc()

VALUE rb_data_object_alloc(klass, datap, dmark, dfree)
    VALUE klass;
    void *datap;
    RUBY_DATA_FUNC dmark;
    RUBY_DATA_FUNC dfree;
{
    NEWOBJ(data, struct RData);
    if (klass) Check_Type(klass, T_CLASS);
    OBJSETUP(data, klass, T_DATA);
    data->data = datap;
    data->dfree = dfree;
    data->dmark = dmark;

    return (VALUE)data;
}

(gc.c)

Data_Get_Struct ()

The Data_Get_Struct()  function is called as follows:

Data_Get_Struct( VALUE object, c-type, (c-type) *ptr )

This function returns a c-type pointer to the value or structure embedded in a RData  Object.   The definition for Data_Get_Struct  is a macro and it's definition is as follows:

#define Data_Get_Struct(obj,type,sval) do {\
    Check_Type(obj, T_DATA); \
    sval = (type*)DATA_PTR(obj);\
} while (0)

(ruby.h)

The following is an example of how to retrieve data from a RData  object.   In this case we examine the data stored in the examples above:

typedef struct adot {
  int horz;
  int vert;
} ADot;

ADot *sval;
VALUE info;

int HORZ;
int VERT;

Data_Get_Struct( info, ADot, sval);

HORZ = sval->horz;  
VERT = sval->vert;

The original work is Copyright © 2002 - 2004 Minero AOKI.
Translations,  additions,  and graphics by C.E. Thornton
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike2.5 License.