Hash Tables are constructed with  paired  Data Values.    One is the value is the key Value  and the other is the Data Value.   A hash table is not the only method storing and retrieving Key/Value  pairs.   For example it is possible to use a linked list.
Struct entry { ID key; VALUE val; Struct entry *next; /* Points at next entry */ };
However, this method is about the slowest way to implement a Key/Value  pair table.   If there are a 1000 entries, it can take up to 1000 steps to find the correct pair.   If the Keys  are ordered, alphabetically or Numerically, the retrieval time can be shorten dramatically.   As we will see, Hash Tables are the most efficient method, in most cases,  for storing Key/Value  pairs.
Ruby's Hash Table Functions are provided by the Public Domain Package  st.c./* This is a public domain general purpose hash table package Written by Peter Moore @ UCB. */ (st.c)
First let us setup a hash table with 'n' Bins.  In the following example 'n'  is chosen to be 64.
Figure 1: Basic Hash Table
When inserting the Value  of a Key/Value  pair into a Hash table,  a Hash Function  uses the Key  to generate an index.   In the simplest case,  the Value is stored in the Hash Table at the computed index.   Regardless of the Key's  data type,  the Hash Function  must return an index in the range of 0-(n-1).
Figure 2: Insertion Without Collision
For example when collision happens,  one method of resolving this problem is to insert the data value in the following element.   It is called the  opening address method (figure 3).
Figure 3: Opening address method
Ruby uses the Chain Method.   When a collision is detected,  the instead of advancing the next address in the Hash Table,  it starts a chain a the collision point.   That is a sub-list is created and the new value is stored in the sub-list.   Additional collisions at the same index will be added to the end of the chain.
Figure 4: Chain method
There are methods of generating a Perfect Hash Function ,  that is Hash Functions that guarantee that no collisions will take place.   The GNU gperf package is one method of generating Perfect Hash Functions.
Unfortunately,  these methods only work when the complete set of Keys  is known in advance.   Since this is not possible for Ruby,  the Chain Method  is used.
Each Hash Table must have a root structure.   For Ruby Hash Table's it is the structure st_table.   It maintains the information necessary for the Hash Table's processing functions.   The supporting source code is shown after the description of st_table.
1) st_hash_type structure contains the following pointers: a) Pointer to a function that compares Keys. b) Pointer to the function the generate the Hash Index. 2) num_bins specifies the number of bins in the table. 3) num_entries contains the total number of entries in the table. 4) **bins is a pointer the array of st_table_entry pointers.
typedef struct st_table st_table; struct st_table { struct st_hash_type; /* Hash Table Type */ int num_bins; /* Number of Slots */ int num_entries; /* Total Number of entries in the Table */ struct st_table_entry **bins; /* Root of Bin Table */ }; (st.h)
struct st_table_entry { unsigned int hash; /* Computed hash value */ char *key; /* Key */ char *record; /* Data Value */ st_table_entry *next; /* Next entry in chain */ }; (st.c)
Figure 5: st_table and st_table_entry structures
The st_hash_type  entry points to the following structure.
struct st_hash_type { int (*compare) (); /* Compare Function */ int (*hash) (); /* Hash Function */ }; (st.h)
The construction   "int (*compare) ()"  means:
      Call the function pointed to by (*compare)  and return an int  value.
The usage of the construction  "int (*hash) ()"  is comparable.
There is a compare function for both Numeric and String Keys.  Both return zero if they match a key in the Hash Table,  otherwise they return a non-zero value.
A compare function is necessary because Ruby's Hash Function  does not guarantee a 'perfect hash'.   The possibility that a entry in the referenced Bin  is part of a chain of entries.   The compare function is necessary to determine that none of the entries in the chain are equal to the current Key.
Ruby's Hash Table functions are general purpose in nature because of two design decisions. 1) The st_hash_type structure allows the Compare and Hash Functions to be tailored to the type of Key. 3) The set_table_entry references the Key and the Value through pointers.
In this example we will examine the Hash Table Functions for Numeric Keys.   The st_init_numtable  function creates a Hash Table for these Keys.
st_table* st_init_numtable () { return st_init_table (&type_numhash); } (st.c)
The function st_init_table  builds and initializes an empty st_table.   In this case it is creating a Hash Table for Numeric Keys.   The numeric specific table constructions are as follows:
static struct st_hash_type type_numhash = { numcmp, numhash, } ; static int numcmp (x, y) long x, y; { return x!= Y; } static int numhash (n) long n; { return n; } (st.c)
This type of table is used often within Ruby to convert Numeric ID's into their associated String Values.   The constructions for type_strhash  are similar.
Now let's look at a function that uses the st_table  structures.
st_lookup(table, key, value) st_table *table; register st_data_t key; st_data_t *value; { unsigned int hash_val, bin_pos; register st_table_entry *ptr; hash_val = do_hash(key, table); FIND_ENTRY(table, ptr, hash_val, bin_pos); if (ptr == 0) { return 0; } else { if (value != 0) *value = ptr->record; return 1; } } (st.c)
The st_lookup  function tries to find the Value  associated with the Key  in the specified st_table  table.
The Macros do_hash  and FIND_ENTRY constitute to core of this functions processing.
The Macro do_hash  executes the specified Hash Function.  
#define do_hash(key,table) (unsigned int)(*(table)->type->hash)((key)) (st.c)
This macro purpose is to call the specified Hash Function  with the Key  as it's parameter.   The particular table is referenced via '*(table)',  the '->type' references the st_hash_type  entry in that table,  and the '->hash' points to the actual Hash Function .   The result is returned as an unsigned integer hash key.
Given the hash key  the macro FIND_ENTRY is called,  which in turn calls two other macros,  PTR_NOT_EQUAL and EQUAL..
#define FIND_ENTRY(table, ptr, hash_val, bin_pos) \ bin_pos = hash_val%(table)->num_bins;\ ptr = (table)->bins[bin_pos];\ if (PTR_NOT_EQUAL(table, ptr, hash_val, key)) {\ while (PTR_NOT_EQUAL(table, ptr->next, hash_val, key)) {\ ptr = ptr->next;\ }\ ptr = ptr->next;\ } #define PTR_NOT_EQUAL(table, ptr, hash_val, key) \ ( (ptr) != 0 && \ (ptr->hash != (hash_val) || !EQUAL((table), (key), (ptr)->key)) ) #define EQUAL(table,x,y) \ ( (x)==(y) || (*table->type->compare)((x),(y)) == 0 ) (st.c)
All macros are one-line constructions.   For that reason each line in a macro except the last line are "continued" with the back-slash, ' \' , character.
The macro FIND_ENTRY takes two inputs and returns two outputs.   The inputs are the st_table  to be searched and the Hash Key.   It returns the bin_pos  and either a st_table_entry  pointer or zero(0).
The PTR_NOT_EQUAL macro also uses the input Key  parameter of st_lookup. 
The st_lookup  function returns TRUE(1) if the value  parameter equals the value in st_table_entry associated with the key  parameter.
St_add_direct ()
This functions creates a new entry in the specified st_table.   This function does not inspect the table to see if the entry is already present.   It simply creates a new entry.
void st_add_direct (table, key and value) st_table *table; char *key; char *value; { unsigned int hash_val and bin_pos; hash_val = do_hash (key and table); bin_pos = hash_val % table-> Num_bins; ADD_DIRECT (table, key, value, hash_val and bin_pos); } (st.c)
The core operations are performed by two macros: do_hash  which was described earlier and ADD_DIRECT.
#define ADD_DIRECT(table, key, value, hash_val, bin_pos)\ {\ st_table_entry *entry;\ if (table->num_entries/(table->num_bins) > ST_DEFAULT_MAX_DENSITY) {\ rehash(table);\ bin_pos = hash_val % table->num_bins;\ }\ \ entry = alloc(st_table_entry);\ \ entry->hash = hash_val;\ entry->key = key;\ entry->record = value;\ entry->next = table->bins[bin_pos];\ table->bins[bin_pos] = entry;\ table->num_entries++;\ } (st.c)
Before adding a new entry,  the current Density  of the table is examined.   If greater than ST_DEFAULT_MAX_DENSITY,  the table size is increased and the table is 'rehashed'.
#define ST_DEFAULT_MAX_DENSITY 5 if (table->num_entries/(table->num_bins) > ST_DEFAULT_MAX_DENSITY) {\ rehash(table);\ bin_pos = hash_val % table->num_bins;\ }\
Regardless of whether the table was expanded or not,  a new st_table_entry  structure is allocated and the new entry created.  
The function rehash  will create a new st_table  larger than the current table,  and recompute the hash_val  and  bin_pos  for each entry.   The entries will then be transfered to the new table at the new bin_pos.   This procedure is done to prevent table crowding.   When the number of bins  is too small for the number of entries,  the number of st_table_entry  strings tend to get long.   This leads to slowing of response for table lookups.
The function st_insert  combines st_lookup and st_add_direct.
int st_insert (table, key and value) register st_table *table; register char *key; char *value; { unsigned int hash_val and bin_pos; register st_table_entry *ptr; hash_val = do_hash (key and table); FIND_ENTRY (table, ptr, hash_val and bin_pos); if (ptr == 0) { ADD_DIRECT (table, key, value, hash_val and bin_pos); return 0; } else { ptr-> Record = value; return 1; } } (st.c)
As can be seen,  the macros do_hash  and FIND_ENTRY perform as described in st_lookup.   If the Key  is found in the table,  then value of the key/value pair is returned in the Value  parameter and the function returns TRUE(1)   .   Otherwise,  the ADD_DIRECT macro is called to create a new entry and the function returns FALSE(0).
Internally,  ruby uses an ID (unsigned int) to refer to symbols.   This increases processing efficiency.   In order to convert from Symbols to ID's and ID's to Symbols two st_tables  are used: sym_tbl  and  sym_rev_tbl.
While an  ID  is described above as an unsigned integer,  it like many things in ruby it is a little more complicated.   The ID is actually constructed out of two separate items.  The lower 3 bits are encoded with type information.   The actual ID Value is shifted left three bits to accommodate this type information.
The lower 3 bits of the ID contain the following Type Information: ID_LOCAL 0x01 - Local Variables ID_INSTANCE 0x02 - Instance Variables ID_GLOBAL 0x03 - Global Variables ID_ATTRSET 0x04 - Attributed ID_CONST 0x05 - Constants ID_CLASS 0x06 - Class Variables ID_JUNK 0x07 - Junk - Not Viable
Figure 6: Complete Symbol ID Construction
Additionally,  when a symbol ID is encoded in a VALUE,  it is left shifted again by 8 bits to make room the Symbol Flag - 0xe.   See "VALUE's Dark Secret" in chapter 2.
The function rb_intern  takes a Symbol(name) as a parameter and returns an ID.
static st_table *sym_tbl; /* Symbol -> ID Conversions */ static st_table *sym_rev_tbl; /* ID -> Symbol Conversions */ ID rb_intern(name) const char *name; { const char *m = name; ID id; int last; if (st_lookup(sym_tbl, (st_data_t)name, (st_data_t *)&id)) return id; last = strlen(name)-1; id = 0; switch (*name) { case '$': id |= ID_GLOBAL; m++; if (!is_identchar(*m)) m++; break; case '@': if (name[1] == '@') { m++; id |= ID_CLASS; } else { id |= ID_INSTANCE; } m++; break; default: if (name[0] != '_' && !ISALPHA(name[0]) && !ismbchar(name[0])) { /* operators */ int i; for (i=0; op_tbl[i].token; i++) { if (*op_tbl[i].name == *name && strcmp(op_tbl[i].name, name) == 0) { id = op_tbl[i].token; goto id_regist; } } if (name[last] == '=') { /* attribute assignment */ char *buf = ALLOCA_N(char,last+1); strncpy(buf, name, last); buf[last] = '\0'; id = rb_intern(buf); if (id > tLAST_TOKEN && !is_attrset_id(id)) { id = rb_id_attrset(id); goto id_regist; } id = ID_ATTRSET; } else if (ISUPPER(name[0])) { id = ID_CONST; } else { id = ID_LOCAL; } break; } while (m <= name + last && is_identchar(*m)) { m += mbclen(*m); } if (*m) id = ID_JUNK; id |= ++last_id << ID_SCOPE_SHIFT; id_regist: name = strdup(name); st_add_direct(sym_tbl, (st_data_t)name, id); st_add_direct(sym_rev_tbl, id, (st_data_t)name); return id; } (parse.y)
If the Symbol is already in the sym_tbl,  then the function st_lookup  will return the respective ID.
If the Symbol is not found,  then rest of the rb_intern  function will create an ID for the input Symbol.   This entails creating the appropriate flags for the type of Symbol.   This is done in the switch statement.Finally,  the Symbol / ID Pair is inserted into the two tables,  sym_tbl  and  sym_rev_tbl.   The sym_rev_tbl  as we will see next is used to convert ID's to Symbols.
The function rb_id2name  takes an input parameter of ID and returns the appropriate Symbol String.
char * rb_id2name(id) ID id; { char *name; if (id < tLAST_TOKEN) { int i = 0; for (i=0; op_tbl[i].token; i++) { if (op_tbl[i].token == id) return op_tbl[i].name; } } if (st_lookup(sym_rev_tbl, id, (st_data_t *)&name)) return name; if (is_attrset_id(id)) { ID id2 = (id & ~ID_SCOPE_MASK) | ID_LOCAL; again: name = rb_id2name(id2); if (name) { char *buf = ALLOCA_N(char, strlen(name)+2); strcpy(buf, name); strcat(buf, "="); rb_intern(buf); return rb_id2name(id); } if (is_local_id(id2)) { id2 = (id & ~ID_SCOPE_MASK) | ID_CONST; goto again; } } return 0; } (parse.y)
The first thing we check is the ID for an Operator.  This is possible because all operators have a value less the symbol  tLAST_TOKEN.   If so,  then the Symbol returned is from the table op_tbl.
Next,  we attempt a st_lookup  for the ID.   If the function returns TRUE(1),  the name  parameter is returned as the Symbol.
If both of these conversions fail and the Attribute  bit is set,  then the ID is checked to see if it is ID_LOCAL or ID_CONST.   If it is,  the search will continue and check Locals and Constants for a match.   This is done by looping back to the label  again: .
If no conversion can be found,  then a FALSE(0) is returned.
At the Ruby Level there are times when the programmer wishes to convert a string  into a Symbol.
static VALUE rb_str_intern (str) VALUE str; { ID id; if (!RSTRING (str) -> Ptr || RSTRING (str) -> Len == 0) { rb_raise (rb_eArgError, "interning empty string"); } if (strlen (RSTRING (str) -> Ptr) != RSTRING (str) -> Len) rb_raise (rb_eArgError, "string contains `\\0'"); id = rb_intern (RSTRING (str) -> Ptr) ; return ID2SYM (id); } (string.c)
The input to this function must be a non-empty String and it can not be a Null Terminated Character String.
If these conditions are met,  the function rb_intern  is called using the String  as the input.   It returns the ID for the String  interpreted as a Symbol.
The function rb_str_intern  then returns the Symbol that is equivalent to the String.
The following function takes a Symbol input and returns the equivalent String.
static VALUE sym_to_s (sym) VALUE sym; { return rb_str_new2 (rb_id2name (SYM2ID (sym))); } (object.c)
The original work is Copyright © 2002 - 2004 Minero AOKI.
Translated by Vincent ISAMBART
Translations and Additions by C.E. Thornton
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike2.5 License.