In theory, the scanner simply divides a source line into tokens.   The parser
then steps through the tokens, branching as appropriate to compute the desired action.
It would seem that the scanner and the parser could act independently.   But, as usual in
the real world, it does not work out that way.
The scanner and the parser have to work together.   For Ruby, the result is a reasonably complex, state-machine-driven scanner that feeds the parser.   To be clear, both the scanner and the parser manipulate the scanner's state variable.   In other words, the scanner and the parser talk to each other.
Concrete example
In many languages, white space outside string literals has little effect on how the source is read, and statements are terminated explicitly.   In C, a statement is terminated by a semicolon (';') or by the closing of a block ('}').
In Ruby, the interpretation of a statement can change completely as additional tokens are processed.   The source fragment "a[i]" can be, depending on context, either an indexed variable or a method call with implicit parentheses.
a[i] = 1 # a[i] = (1) - Index substitution
a[i] # a([i]) - Method with implicit parameter.
This sort of dramatic change in interpretation occurs in Ruby because the language has a very flexible syntax.   Ask yourself: how many languages allow implicit parentheses?
Ruby's flexible syntax also means that a single blank can change how a statement is interpreted.   Consider the following two statements:
a + 1 # (a) + (1)
a +1 # a (+1)
When a names a method rather than a local variable, the first example is interpreted as: <variable or call> <op> <literal integer>.   The second is interpreted as: <method> <literal integer parameter>.   In C, for example, the two spellings would be interpreted identically.
The bottom line is that parsing and scanning are more complex in Ruby.   The following sections break the scanner down into pieces small enough to be understood.
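The difference can be observed directly when a is a method. The following is a minimal sketch: the method a and its default argument are made up for illustration, and Ruby may print an "ambiguous first argument" warning for the second call when warnings are enabled.

```ruby
# Hypothetical method named `a`, used only to make the two parses observable.
def a(x = 10)
  x * 2
end

with_space    = a + 1   # parsed as (a) + (1): a() returns 20, then 1 is added
without_space = a +1    # parsed as a(+1): the argument is the literal +1

puts with_space      # 21
puts without_space   # 2

# With a local variable instead of a method, the blank no longer matters.
b = 10
puts b +1            # 11
```

The scanner can only choose between the two readings because it knows what kind of token came before the '+', which is exactly the information kept in its state.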
Function yylex()
The heart of the lexical scanner is yylex().   This function is often generated by programs such as lex or flex.   In the case of Ruby, the complexity of this function necessitated a hand-written state-equipped scanner that interacts with a Bison-generated parser.
The following sections discuss the various scanner states and their functions.
Lex_state
The current state of the scanner is kept in the variable lex_state.   Its declaration and the definition of its enumerated states follow:
static enum lex_state {
    EXPR_BEG,      /* ignore newline, +/- is a sign. */
    EXPR_END,      /* newline significant, +/- is an operator. */
    EXPR_ARG,      /* newline significant, +/- is an operator. */
    EXPR_CMDARG,   /* newline significant, +/- is an operator. */
    EXPR_ENDARG,   /* newline significant, +/- is an operator. */
    EXPR_MID,      /* newline significant, +/- is an operator. */
    EXPR_FNAME,    /* ignore newline, no reserved words. */
    EXPR_DOT,      /* right after `.' or `::', no reserved words. */
    EXPR_CLASS,    /* immediate after `class', no here document. */
} lex_state;
(parse.y)
The prefix 'EXPR_' stands for "expression".   It is there to remind us that the scanner is an expression-processing engine.
Specifically, EXPR_BEG indicates the beginning of an expression, EXPR_END the end of an expression, EXPR_ARG the position before method arguments, and EXPR_FNAME the position before a method name (after def and the like).
EXPR_DOT, likewise, means "inside an expression, right after a dot". The states skipped here are examined in detail later.
By the way, what lex_state records is information such as "right after a parenthesis" or "at the head of a statement", so it feels more like the state of the parser than the state of the scanner. Yet it is usually called the state of the scanner. Why?
In fact, "state" is used here in a slightly different sense from the everyday one. A state such as lex_state means "the state in which the scanner behaves in a particular way". Stated precisely, EXPR_BEG is "the state in which the scanner, if run now, behaves as if it were at the head of an expression".
In technical terms, if you view the scanner as a state machine, this is a state of that machine. Explaining that here would stray too far from the topic, so for details please consult a suitable textbook on data structures.
Reading a state-equipped scanner
The trick to reading a state-equipped scanner is not to try to grasp everything at once. People who write parsers would rather not use a state-equipped scanner, and naturally they do not want it to become the main part of the processing. As a result, the scanner's state management tends to be a bonus attached to other, main parts of the code. In other words, a clean overall picture of the scanner's state transitions does not exist in the first place.
So what should you do? Read with a specific purpose in mind: "this part exists to solve this problem", "this code was written to get around that problem". Cut the code apart by purpose. If you start thinking about how the problems relate to one another, you will certainly get stuck. To repeat, no such overall picture exists in the source.
Still, some overall goal is needed. When reading a state-equipped scanner, the goal should above all be to learn what each state represents. For example, what kind of state is EXPR_BEG? It is the state in which the parser is at the head of an expression. And so on.
The static approach
So how can you find out what a state represents? There are three ways.
Look at the name of the state
Obviously the easiest way. EXPR_BEG, for example, can be expected to mean the beginning of something.
Look closely at how behavior changes in the state
See how the way tokens are cut out changes depending on the state, then compare that with the actual behavior to work out what the state stands for.
Look at which states it is entered from
See from which state, and on which token, the transition happens. For example, if '\n' is always followed by a transition to a state named HEAD, that state surely represents the head of a line.
Take EXPR_BEG as an example. In ruby, every state transition is written as an assignment to lex_state, so first grep for assignments of EXPR_BEG. Then note where each one is, for example in the '#', '*' and '!' cases of yylex(). Finally, take the state before the transition into account and consider what all of these places have in common (Figure 1).
Figure 1: Transitions to EXPR_BEG
I see: this certainly looks like the head of a statement, especially around '\n' and ';'. And since there are also transitions from an open parenthesis and a comma, it is presumably the head not only of a statement but of an expression as well.
The dynamic approach
A handier way is to check the actual behavior. For example, you can hook yylex() in a debugger and watch lex_state.
Alternatively, you can modify the source code so that it prints each state transition. For lex_state there are only a few patterns of assignment and comparison, so it is enough to treat them as text patterns and rewrite them into code that prints the transition. Such a tool, rubylex-analyser, is included on the accompanying CD-ROM (tools/rubylex-analyser.tar.gz). This text uses the tool where needed.
The general procedure is first to get a feel for the behavior with a debugger or the tool, and then to read the source code to confirm that information and build on it.
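On a current Ruby, a similar dynamic check needs no patched interpreter, because the standard ripper library reports a scanner state with every token. This is not the rubylex-analyser tool described above, and the states of a modern Ruby are bit flags that no longer match the enum listed earlier one to one. Treat the following as a rough sketch, assuming Ruby 2.5 or later, where Ripper.lex returns the state as a fourth element; lex_states is a helper name invented here.

```ruby
require 'ripper'

# Hypothetical helper: list each token of `src` together with the scanner
# state that Ripper reports for it (the state after the token was scanned).
def lex_states(src)
  Ripper.lex(src).map do |token|
    _pos, event, text, state = token
    [event, text, state.to_s]
  end
end

# Print one line per token: event name, token text, reported state.
lex_states("def foo; end").each do |event, text, state|
  printf("%-10s %-8s %s\n", event, text.inspect, state)
end
```

The line for the keyword def should report a state whose name contains FNAME, which corresponds to "before a method name" in the enum above.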
The states
Here is a brief description of each lex_state state.
EXPR_BEG
The head of an expression. Comes right after \n ( { [ ! ? : , an operator, op= and so on.
The most common state.
EXPR_MID
Right after the reserved words return, break, next and rescue.
The binary operators * and & are disabled.
Behaves much like EXPR_BEG.
EXPR_ARG
Right after something that may be the method-name part of a method call,
or right after '['.
Excludes the places where EXPR_CMDARG is used.
EXPR_CMDARG
Before the first argument of a method call in the ordinary form.
For details, see the section "The do conflict".
EXPR_END
Where a statement may end, for example after a literal or a closing parenthesis.
Excludes the places where EXPR_ENDARG is used.
EXPR_ENDARG
A special version of EXPR_END. Right after the closing parenthesis that matches tLPAREN_ARG.
See the section "First argument enclosed in parentheses".
EXPR_FNAME
Before a method name. Specifically, right after def, alias, undef, or the ':' of a symbol.
A lone '`' is a name here.
EXPR_DOT
After the dot of a method call. Handled much like EXPR_FNAME.
Every reserved word is treated as a plain identifier.
A lone '`' is a name here.
EXPR_CLASS
After the reserved word class. This is the only place it occurs, so it is quite a limited state.
In summary,
BEG MID
END ENDARG
ARG CMDARG
FNAME DOT
each pair represents a similar situation. Only EXPR_CLASS is a little special,
but it appears in so few places to begin with that you need not think about it much.
Line-break handling
The problem
A Ruby statement does not necessarily need a terminator. In C or Java, for example, a statement must always end with a semicolon, but Ruby needs nothing of the kind. The basic rule is one statement per line, so a statement ends at the end of its line.
On the other hand, when a statement "clearly continues", it is automatically carried on to the next line. Conditions under which a statement "clearly continues" include:
after a comma
after an infix operator
while parentheses are unbalanced
right after the reserved word if
and so on.
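Each of these conditions can be tried in a few lines of Ruby. The sketch below is only an illustration: the method name continuation_demo and the variable names are arbitrary, and the parenthesis case shows just the position right after the opening parenthesis.

```ruby
# Hypothetical demo method: every statement inside spans a line break
# without any explicit continuation mark.
def continuation_demo
  # After a comma: the element list continues on the next line.
  pair = [1,
          2]

  # After an infix operator: the expression continues.
  sum = 1 +
        2

  # While parentheses are unbalanced (here, right after the opening one).
  product = [3, 4].inject(
    :*
  )

  # Right after the reserved word `if`: the condition starts on the next line.
  answer = if
             sum == 3
             :yes
           else
             :no
           end

  [pair, sum, product, answer]
end

p continuation_demo
```

In each case the line break falls where the scanner is still expecting the start of an expression, so it cannot be the end of the statement.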
Implementation
How can such a grammar be implemented? Simply having the scanner skip line breaks is not enough. In a grammar like Ruby's, where statements are delimited at both ends by reserved words, conflicts are not as frequent as in C-like languages, but when I tried it briefly, the grammar would not go through until I removed return, next, break and method calls with omitted parentheses. To keep those, some terminator symbol is needed at the end of a statement. It does not matter whether it is \n or ';', but some end marker is required.
There are two solutions: handle it in the parser or handle it in the scanner. In the parser, you write the grammar so that an optional \n is allowed wherever a \n may appear. In the scanner, you pass \n to the parser only where it is meaningful and skip it everywhere else.
Which one to use is a matter of taste, but the scanner usually handles it. That tends to need less code and, above all, cluttering the rules with meaningless symbols defeats the purpose of using a parser generator.
So, in conclusion, ruby also handles line breaks in the scanner. When a line should continue, the \n is skipped; when it should end, the \n is sent as a token. In yylex() that happens here:
▼ yylex() - '\n'
      case '\n':
        switch (lex_state) {
          case EXPR_BEG:
          case EXPR_FNAME:
          case EXPR_DOT:
          case EXPR_CLASS:
            goto retry;
          default:
            break;
        }
        command_start = Qtrue;
        lex_state = EXPR_BEG;
        return '\n';
(parse.y)
In EXPR_BEG, EXPR_FNAME, EXPR_DOT and EXPR_CLASS the code does goto retry. In other words, the line break is meaningless there, so it is skipped. The label retry sits just before the giant switch in yylex().
Everywhere else the line break is meaningful, so it is passed to the parser, and along the way lex_state is set back to EXPR_BEG, because wherever a line break is meaningful it marks the boundary of an expr.
command_start can be ignored for now. As said at the beginning, trying to follow many things at once only leads to confusion.
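The decision made by this case can be restated as a small Ruby model. It is only a sketch of the logic quoted above: the method name scan_newline and the use of symbols for states and return values are assumptions made for illustration, and command_start is left out.

```ruby
# States in which the quoted code skips a line break (goto retry).
SKIP_NEWLINE_STATES = %i[EXPR_BEG EXPR_FNAME EXPR_DOT EXPR_CLASS].freeze

# Hypothetical model of the '\n' case in yylex().
# Returns [token, next_state]; token is nil when the line break is skipped.
def scan_newline(lex_state)
  if SKIP_NEWLINE_STATES.include?(lex_state)
    [nil, lex_state]        # meaningless line break: keep scanning
  else
    ["\n", :EXPR_BEG]       # statement terminator: pass it to the parser
  end
end

p scan_newline(:EXPR_BEG)   # line break skipped, state unchanged
p scan_newline(:EXPR_ARG)   # "\n" token, state reset to EXPR_BEG
```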
Let's look at a concrete example, using the accompanying analysis tool rubylex-analyser.
% rubylex-analyser -e '
m(a,
  b, c) unless i
'
+EXPR_BEG
EXPR_BEG     C       "\nm"  tIDENTIFIER          EXPR_CMDARG
EXPR_CMDARG            "("  '('                  EXPR_BEG
                                              0:cond push
                                              0:cmd push
EXPR_BEG     C         "a"  tIDENTIFIER          EXPR_CMDARG
EXPR_CMDARG            ","  ','                  EXPR_BEG
EXPR_BEG    S        "\nb"  tIDENTIFIER          EXPR_ARG
EXPR_ARG               ","  ','                  EXPR_BEG
EXPR_BEG    S          "c"  tIDENTIFIER          EXPR_ARG
EXPR_ARG               ")"  ')'                  EXPR_END
                                              0:cond lexpop
                                              0:cmd lexpop
EXPR_END    S     "unless"  kUNLESS_MOD          EXPR_BEG
EXPR_BEG    S          "i"  tIDENTIFIER          EXPR_ARG
EXPR_ARG              "\n"  \n                   EXPR_BEG
EXPR_BEG     C        "\n"  '                    EXPR_BEG
There is a lot of output, but only the left and middle fields matter here. The left field shows lex_state before entering yylex(), and the middle field shows the token text and its symbol.
The first token m and the second argument b are each preceded by a line break, but that \n is attached to the front of the token and does not come out as a terminator symbol, because lex_state is EXPR_BEG.
In the second line from the bottom, however, \n does come out as a terminator symbol, because the state is EXPR_ARG.
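For readers without the CD-ROM, a current Ruby shows the same distinction through the standard ripper library, which labels a skipped line break on_ignored_nl and a significant one on_nl. This is a substitute for rubylex-analyser, not the tool itself, and newline_events is a helper name invented here.

```ruby
require 'ripper'

# Hypothetical helper: return only the line-break events of `src`, in order.
def newline_events(src)
  Ripper.lex(src)
        .map { |token| token[1] }
        .select { |event| %i[on_nl on_ignored_nl].include?(event) }
end

# Same statement as in the example above, without the leading line break.
src = "m(a,\n  b, c) unless i\n"
p newline_events(src)
```

The line break after the comma is reported as ignored, while the one after i is reported as a real newline token, matching the EXPR_BEG and EXPR_ARG cases described above.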
That is how the rubylex-analyser output is read. Let's look at one more example.