Introduction to the Ruby Language

Ruby is an "Open Source" program. It is supported by a large community of developers and users.

Distribution of Ruby

  1. The Ruby Source Code can be distributed
  2. The Ruby Source Code can be modified
  3. The Altered Source Code can be distributed

In each case no special permission or fee is required.

Ruby is Conservative

Ruby's features are in wide use in other languages. Special or experimental features are not included. Syntactic punctuation such as parentheses and semicolons is not necessary in most cases, but can be added for clarity.
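As a small illustration of the optional punctuation, the sketch below calls the same method with and without parentheses and a trailing semicolon. The method name greet is made up for this example.

```ruby
# Parentheses around arguments and semicolons between statements are optional.
def greet(name)
  "Hello, #{name}"
end

terse   = greet "Ruby"      # no parentheses, no semicolon
verbose = greet("Ruby");    # both added for clarity

puts terse
puts verbose
```

Both calls produce the same string; the choice is purely a matter of readability.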

Ruby is an Object-Oriented Language

The notion of an object cannot be separated from Ruby. The nature of object orientation, as far as Ruby is concerned, will be explained later.

Ruby is a Scripting Language

Introducing Ruby as an "object-oriented scripting language" will not please everyone. But Ruby is a true programming language, with strong theoretical roots. It can be used for almost any programming job, including GUI programs and even web programming.

Ruby is an Interpreter

Ruby is an interpreter; that is a fact. Being an interpreter allows it to solve problems quickly.

Ruby Portability

Ruby is a UNIX-centered language. But the fact that it rarely uses assembly code makes porting the language to other systems fairly easy. The following is a partial list of systems and OSs that currently host the Ruby interpreter.

Linux
Win32 (Windows 95 and 98, Me and NT, 2000, XP)
Cygwin
Djgpp
FreeBSD
NetBSD
OpenBSD
BSD/OS
Mac OS X
Solaris
Tru64 UNIX
HP-UX
AIX
VMS
UX/4800
BeOS
OS/2 (emx)
Psion

Automatic Memory Management

Ruby implements automatic garbage collection. The programmer does not have to worry about malloc/free sequences as in 'C' and 'C++'. When an object is no longer being used, it is automatically destroyed and its memory is returned to the pool.
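The following is a minimal sketch of what this means in practice. It assumes a reasonably recent Ruby in which GC.count and GC.start are available; neither call is mentioned in the text above, and explicitly starting a collection is normally unnecessary.

```ruby
# Allocate many short-lived strings; nothing here frees them explicitly.
runs_before = GC.count          # number of collections performed so far

100_000.times { |i| "temporary string #{i}" }

GC.start                        # request a collection (only for demonstration)
runs_after = GC.count

puts "collections during this snippet: #{runs_after - runs_before}"
```

The temporary strings become unreachable as soon as each block call returns, so the collector is free to reclaim them without any action by the programmer.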

Variables Have No 'Type'

Variables have no fixed type. This is one of the most powerful weapons of the Ruby language. It means, for example, that an array can contain several different data types.
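A short sketch of both points: one variable that refers to objects of different classes over time, and one array holding elements of several classes. The variable names are chosen only for this example.

```ruby
value = 42            # the variable refers to an integer
value = "forty-two"   # now the same variable refers to a String

# One array, several different kinds of objects.
mixed = [1, "two", 3.0, :four, nil, [5, 6]]
classes = mixed.map { |item| item.class }

puts value
puts classes.inspect
```

The type belongs to the object, not to the variable that happens to refer to it.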

Ruby is written in ANSI 'C'

Nowadays, a program written in ANSI 'C' can be ported to an almost unlimited number of systems and OS environments. The fact that Ruby is written in 'C' is a major feature. The original Ruby code was written in K&R-style 'C', and its influence is still visible, though the code is now ANSI 'C' compatible. It is compiled with gcc in the Linux environment.

Extended Library

The Ruby Language can be augmented with extended functions written in 'C'. The functions provided mirror the language grammar, thus features written in 'C' closely resemble Ruby Code.

# method call
obj.method(arg)                                        # Ruby
rb_funcall(obj, rb_intern("method"), 1, arg);          # C

# block call
yield arg                                              # Ruby
rb_yield(arg);                                         # C

# raising an exception
raise ArgumentError, 'wrong number of arguments'       # Ruby
rb_raise(rb_eArgError, "wrong number of arguments");   # C

# object creation
arr = Array.new                                        # Ruby
VALUE arr = rb_ary_new();                              # C

Threads

Ruby implements threads within Ruby itself. They are totally in-process, implemented within the Ruby interpreter. That makes Ruby threads completely portable.
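The sketch below shows only the Thread API as seen from a Ruby program. How threads are implemented underneath differs between Ruby versions, so this says nothing about the in-process implementation described above; the variable names are illustrative.

```ruby
results = []
lock = Mutex.new

# Start three threads; each computes a square and records it.
threads = 3.times.map do |n|
  Thread.new(n) do |id|
    square = id * id
    lock.synchronize { results << square }
  end
end

threads.each(&:join)   # wait for all threads to finish
puts results.sort.inspect
```

The same program text runs unchanged on any platform that hosts the interpreter, which is the portability the text refers to.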

Techniques for reading source code

This section discusses information and techniques that make reading the Ruby source code easier.

Learning Ruby Internals

Techniques of analysis

To analyze the source code, there are roughly two techniques:

  1. Static Techniques
  2. Dynamic Techniques

Dynamic analysis

Using the target program

The target program is run with a specific usage in mind and the results are observed, either with a debugger, a tracer, and/or debug code inserted into the program being analyzed.

Following the behavior with a debugger

A debugger can be used to watch the execution flow and to see what data is loaded into data structures. Tools such as dia help draw pictures of data structures, which makes them easier to visualize. See also the graphviz program.

Tracer

Tracers can also capture information about program flow, for example ctrace {http://www.Vicente- .Org/ctrace}. To trace system calls there are strace {Http://www.Wi.Leidenuniv.Nl/ - Wichert/strace/} and the ktrace tool. The tool IDBG (www.hawthorne-press.com) can also be used together with ctrace.

Printf Tracing

Conditional tracing statements can be embedded in the code being examined. Again, the IDBG program comes with Ruby tracing support and the ability to automatically insert and remove print statements and calls to support routines.
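The idea of conditional tracing can be sketched in a few lines. The text applies it to C code; this example uses Ruby only to keep it short. The Tracer class and the factorial method are hypothetical names invented for the illustration and have nothing to do with IDBG, whose interface is not described here.

```ruby
require 'stringio'

# Hypothetical helper: emits a line only when tracing is switched on.
class Tracer
  def initialize(enabled, out = $stderr)
    @enabled = enabled
    @out = out
  end

  def trace(message)
    @out.puts("[trace] #{message}") if @enabled
  end
end

# The code under examination, with a trace statement embedded in it.
def factorial(n, tracer)
  tracer.trace("factorial(#{n}) called")
  n <= 1 ? 1 : n * factorial(n - 1, tracer)
end

log = StringIO.new
result = factorial(3, Tracer.new(true, log))

puts result
puts log.string
```

Passing Tracer.new(false) instead leaves the trace statements in place but silent, so they can stay in the code between investigations.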

Modifying the code and running it

Also, if a function is difficult to understand, try changing its parameters or code slightly and look at the result. The change can often reveal what the function is doing.

Cflow, Cflow2dot, and Dot

The program 'cflow', or the programs 'prcc' and 'prcg' that it uses, can be combined with 'cflow2dot' and 'dot' to produce process flow diagrams.

Static analysis

Importance of names

When doing static analysis of a program, the names of functions, variables, and constants can often be good clues to their usage. This is especially true if the original programmer followed good naming conventions and practices.

Reading the documentation

Sometimes a document that explains the internal structure is available, or the internal comments are extensive enough to explain the code.

Investigation of abbreviations

If there are abbreviations in the code (say, GC), determine what they stand for: is it 'Graphic Context' or 'Garbage Collection'?

Call Relationships of a function

Using a program suite like 'cflow/cflow2dot/dot' to generate a call graph of a program or section of a program is very helpful. It is easier to grasp the process relationships visually.

Read the function code

Read the function carefully. Try to describe its purpose in one or two words. If it is hard to read because of the coding style, use 'indent' to convert the 'C' style to a form you are comfortable with.

Try rewriting the function to your taste

Sometimes rewriting a function and proving that it produces the same result can lead to a much better understanding of the function. However, leave the original code intact: if halfway through you find that the results differ, having the original allows you to find out where you went wrong.

Reading program history and change logs

A lot of information about a program can usually be found in change logs, whether attached to the program or not, as well as in CVS logs and annotations of various sorts. The mailing lists of the development community are another source; for example, the history of changes discussed on ruby-core contains a lot of information.

Tools for static analysis

Using 'etags' to generate a 'TAGS' file makes a lot of information available, for example a list of all functions called by a particular file. Using 'ctags', a cross reference can be generated.

Building Ruby

Building on a Unix platform is divided into three steps:

  1. Configure
  2. Make
  3. Make Install

Configure

Configure is a script that tries to determine whether everything needed by the build process is present. The method of investigation is unexpectedly simple.
The file 'Makefile.in' contains parameterized code that is converted into the final makefile based on results of configure.

Makefile.in      CFLAGS = @CFLAGS@

Makefile         CFLAGS = -g -O2
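The substitution step shown above can be sketched as a simple placeholder replacement. This is only an illustration of the idea, not how configure itself is implemented; the settings hash and the CC entry are assumptions made up for the example.

```ruby
# Hypothetical results that a configure run might have collected.
settings = { "CFLAGS" => "-g -O2", "CC" => "gcc" }

# A tiny stand-in for Makefile.in.
template = <<~TEMPLATE
  CC = @CC@
  CFLAGS = @CFLAGS@
TEMPLATE

# Replace every @NAME@ placeholder; unknown names are left untouched.
makefile = template.gsub(/@(\w+)@/) { settings.fetch($1, $&) }

puts makefile
```

Each @NAME@ marker in the template is replaced by the value found for NAME, which is the relationship between 'Makefile.in' and the final makefile.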

After the configure script is executed, the file 'config.h' is created. This file contains some of the results of configure execution.

         :
         :
#define HAVE_SYS_STAT_H 1
#define HAVE_STDLIB_H 1
#define HAVE_STRING_H 1
#define HAVE_MEMORY_H 1
#define HAVE_STRINGS_H 1
#define HAVE_INTTYPES_H 1
#define HAVE_STDINT_H 1
#define HAVE_UNISTD_H 1
#define _FILE_OFFSET_BITS 64
#define HAVE_LONG_LONG 1
#define HAVE_OFF_T 1
#define SIZEOF_INT 4
#define SIZEOF_SHORT 2
         :

These values can be used by the programmer to determine whether certain items are available. For example, this is from 'ruby.h':

#ifdef HAVE_STDLIB_H
# include <stdlib.h>
#endif

Autoconf

The use of Autoconf, Automake, and friends is described in full in the GNU documents available at 'gnu.org'.

(Build)

Figure 1: Makefile Construction


Make

The second stage, Make, proceeds as follows:

  1. The Ruby source code is compiled
  2. The static library is built
  3. Miniruby is statically linked
  4. If --enable-shared was given, the shared library libruby.so is built
  5. Using Miniruby, the extension libraries are compiled
  6. Lastly, the real Ruby is linked

CVS

CVS is a source management system that allows not only the current version of the program to be compiled, but also any previous version that was entered into the CVS system.

Ruby Construction

Physical Structure

As Ruby has grown, the program sources have been divided into a number of sub-directories:

  1. Documents
  2. Ruby Source Code
  3. Ruby Tools for Building
  4. Standard Extended Library
  5. Standard Ruby Library
  6. Translations and Additions

Classification of source code

The Ruby source code itself is divided into several parts:

Core of Ruby language
class.c       Class-related API
error.c       Exception-related API
eval.c        Evaluator
gc.c          Garbage collector
lex.c         Reserved word table
object.c      Object system
parse.y       Parser
variable.c    Constants, global variables and class variables
ruby.h        Principal macros and prototypes of Ruby
intern.h      Prototypes of the Ruby C API. 'intern' is presumably
              an abbreviation of 'internal', but the functions
              declared here can nonetheless be used from
              extension libraries.
rubysig.h     Header file that supplies the macros related to
              signals
node.h        Definitions related to syntactic tree nodes
env.h         Definition of the structures that express the
              context of the evaluator

Utility
dln.c         Dynamic loader
regex.c       Regular expression engine
st.c          Hash table
util.c        Library for radix conversion, sorting, and so on

Ruby Initialization and Loading Routines
dmyext.c     Dummy of the extension library initialization routine
             (DumMY EXTension)
inits.c      Entry point of the initialization routines of the core
             and the library
main.c       Entry point of the ruby command
             (not needed in libruby)
ruby.c       Principal part of the ruby command
             (also needed in libruby)
version.c    Ruby version


Class library
array.c      Class Array
bignum.c     Class Bignum
compar.c     Module Comparable
dir.c        Class Dir
enum.c       Module Enumerable
file.c       Class File
hash.c       Class Hash (see 'st.c' also)
io.c         Class IO
marshal.c    Module Marshal
math.c       Module Math
numeric.c    Class Numeric (Integer, Fixnum, and Float)
pack.c       Array#pack and String#unpack
prec.c       Module Precision
process.c    Module Process
random.c     Kernel#srand() and rand()
range.c      Class Range
re.c         Class Regexp (see 'regex.c')
signal.c     Module Signal
sprintf.c    Ruby's own sprintf()
string.c     Class String
struct.c     Class Struct
time.c       Class Time

Platform-dependent files
bcc32/       Borland C++ (Win32)
beos/        BeOS
cygwin/      Cygwin (the UNIX emulation layer on Win32)
djgpp/       Djgpp (the free environment for software
             development for DOS)
vms/         VMS (an OS formerly released by DEC)
win32/       Visual C++ (Win32)
x68/         Sharp X680x0 series (the OS is Human68k)

Logical structure

The Ruby core group of files is divided into three parts. The first creates the object world of Ruby (the object space). The second is the parser, which creates the internal representation of the Ruby program. The last is the evaluator, which drives the program.

Object Space

The object space is the memory that holds the objects created and operated on by the evaluator. This is explained in chapters 2 through 7. The following node chart represents the nodes in object space for a simple Ruby program consisting of only a minimal class called TestCase. The unshaded nodes are created by Ruby before reading the user's program.




Figure 2: Object Space Node Chart   (Created with the Graphviz DOT program)
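The object space can also be inspected from within a Ruby program. The sketch below assumes the ObjectSpace module of the standard interpreter and defines the same minimal TestCase class; it only lists class objects and does not reproduce the node chart.

```ruby
# The minimal user-defined class from the example.
class TestCase
end

# Walk the object space and collect every Class object in it.
all_classes = ObjectSpace.each_object(Class).to_a

puts "class objects in object space: #{all_classes.size}"
puts "TestCase present: #{all_classes.include?(TestCase)}"
puts "Object present:   #{all_classes.include?(Object)}"
puts "TestCase superclass: #{TestCase.superclass}"
```

Classes such as Object exist before the user's program is read, while TestCase appears only once its definition has been evaluated.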

Parser

The Ruby parser converts a Ruby program into an internal representation called a "syntactic tree". This representation is processed by the evaluator when executing the program. The following Ruby statements are converted into the syntactic tree shown below.





Figure 3: Syntactic Tree for example statements

The parser and syntactic trees are discussed in chapters 8 through 12.
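To get a feel for what a parser produces, the sketch below uses Ripper, a parser library bundled with later versions of Ruby. This is an assumption outside the text: the nested arrays Ripper returns are not the internal node structures this book describes, but they show the same idea of source text becoming a tree. The sample statement is made up for the example.

```ruby
require 'ripper'
require 'pp'

source = "a = 1 + 2"

# Parse the source text into a tree of nested arrays.
tree = Ripper.sexp(source)

pp tree
```

The outermost element represents the whole program, and the assignment and the addition appear as nested sub-trees beneath it.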

Evaluator

The evaluator is where a Ruby program is actually 'executed'. This is the subject of the third part of this book (Evaluation) and is covered in chapters 13 through 17.






The original work is Copyright © 2002 - 2004 Minero AOKI.

Translated by Vincent ISAMBART

Translations and Additions by C.E. Thornton


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.