Chapter 12 Syntactic Tree

Blue Chapter -- Machine Translation

node

NODE

Ruby already mentioned above once the construction program will be converted into a tree. And syntax tree is something concrete to say, "nodes (node)" and called Produced by the structure of the tree structure. ruby nodes are not all NODE type Represented.

NODE

 
  128 typedef struct RNode ( 
  129 unsigned long flags; 
  130 char * nd_file; 
  131 union ( 
  132 struct RNode * node; 
  133 ID id; 
  134 VALUE value; 
  135 VALUE (* cfunc) (ANYARGS); 
  136 ID * tbl; 
  137) u1; 
  138 union ( 
  139 struct RNode * node; 
  140 ID id; 
  141 int argc; 
  142 VALUE value; 
  143) u2; 
  144 union ( 
  145 struct RNode * node; 
  146 ID id; 
  147 long state; 
  148 struct global_entry * entry; 
  149 long cnt; 
  150 VALUE value; 
  151) u3; 
  152) NODE; 

(node.h) 

Structure name RNode might guess from the object node is Ruby . And the release of the node that is generated ruby is the GABEJIKOREKUTA care.

So naturally flags is the first object structure basic.flags same role. That is the type of structure T_NODE and FL_FREEZE and other records to the flag. NODE If you add to this type of node flags and store them.

What do you mean? Program at if and while , def A variety of Because there are elements, corresponding to the many types of nodes. That's Node type. NODE union is a complex that includes three, The union is using only one type of a node in the decision. For example if nodes NODE_IF If these.

member union members role
u1 u1.node conditional expression
u2 u2.node true body
u3 u3.node fake body

Also node.h , the union members to access the macro have been .

NODE macro access

 
  166 # define nd_head u1.node 
  167 # define nd_alen u2.argc 
  168 # define nd_next u3.node 
  169 
  170 # define nd_cond u1.node 
  171 # define nd_body u2.node 
  172 # define nd_else u3.node 
  173 
  174 # define nd_orig u3.value 
                  : 
                  : 

(node.h) 

For example KONNAFUUNI use.

 
NODE * head, * tail; 
head-> nd_next = tail; / * head-> u3.node = tail * / 

In the most likely source of this macro it is used. The negligible The exception is parse.y , NODE to generate about it and gc.c , NODE mark It's only two places.

By the way like this is to use a macro Why? One is, u1 like it Nothing is meaningless figure is troublesome to learn from. But the more weight It is important, the corresponding figures are not mind changing the actual change may have . For example if conditional clause u1 on the necessity because, if for some reason Oil and u2 might be変えたく. But u1 directly, if SOSUKO Around a re-end and ears and inconvenient. The entire node NODE , To be declared, if representing the node is hard to find. Access to macro - I do have one, you can not be, contrary to the macro-node to determine SU It will also be able to.

node type

NODE structure flags has a node in the types of storage said. This information, let us look at how to memory. Node type is nd_set_type () , Set, nd_type () can be obtained.

nd_type nd_set_type

 
  156 # define nd_type (n) (((RNODE (n)) -> flags>> FL_USHIFT) & 0xff) 
  157 # define nd_set_type (n, t) \ 
  158 RNODE (n) -> flags = ((RNODE (n) -> flags & ~ FL_UMASK) \ 
                              | (((T) < 


FL_USHIFT FL_UMASK

 
  418 # define FL_USHIFT 11 
  429 # define FL_UMASK (0xff < 


nd_type per TAISHITE should pay attention to home. YOUSURUNI as shown in Figure 1 is heard.

(flagUsage)
Figure 1: RNode.flags used in the way

Then, as the macro function to be used in the debugger nodetype () and Have already done that.

nodetype

 
4247 static enum node_type 
4248 nodetype (node) / * for debug * / 
4249 NODE * node; 
(4250 
4251 return (enum node_type) nd_type (node); 
4252) 

(parse.y) 

file name and line number

NODE - nd_file node corresponding to the existing text is filenames (A pointer) is kept. If the file name and line number is likely that Members are not seem appropriate. In fact, the following macro-line number flags to Fill the frame.

nd_line nd_set_line

 
  160 # define NODE_LSHIFT (FL_USHIFT +8) 
  161 # define NODE_LMASK (((long) 1 <<(sizeof (NODE *) * CHAR_BIT-NODE_LSHIFT)) -1) 
  162 # define nd_line (n) \ 
           ((unsigned int) ((RNODE (n) -> flags>> NODE_LSHIFT) & NODE_LMASK)) 
  163 # define nd_set_line (n, l) \ 
  164 RNODE (n) -> flags = ((RNODE (n) -> flags & ~ (-1 < 


nd_set_line () are quite magnificent. But the name of her nd_set_line () and nd_line is a symmetrical work, so it is no doubt a simple nd_line () investigating Parameters, if the relationship grabbed nd_set_line () It is necessary analysis There is no.

First NODE_LSHIFT However, this is the type of nodes in the preceding paragraph to see me talking to imagine As you can, flags the number of bits used. FL_USHIFT is ruby system Book (11-bit, ruby.h ), 8 minutes node type.

Then NODE_LMASK .

 
sizeof (NODE *) * CHAR_BIT - NODE_LSHIFT 

The remaining bits. To restbits is the place to try. Then Much easier.

 
# define NODE_LMASK (((long) 1 < 

That is like KOTORASHII Figure 2. 1 draw and will happen again this time falling a point. Ultimately NODE_LMASK is still available for a sequence of bits of it understandable.

(lmask)
Figure 2: NODE_LMASK

Has once again nd_line () to look at.

 
(RNODE (n) -> flags>> NODE_LSHIFT) & NODE_LMASK 

Shift to the right to an unused space LSB to drop a bit. Bit by bit and the unused territory Only region to leave. In other words flags how to put together and used as in Figure 3 Said. FL_USHIFT was 11, 32-bit machine is 32 - (10 +8) = 13-bit Minutes-line number to use it.

(flags)
Figure 3: NODE , flags using

…… That the line number is 2 ^ 13 = 8192 and the line number is more than Should be mad. Paced puzzler.

 
File.open ( 'overflow.rb', 'w') (| f | 
     10000.times (f.puts) 
     f.puts' raise ' 
) 

On my machines, a 686 ruby overflow.rb Kick Chile in 1809 and displayed on the line. Sung Isao. However 64-bit machine is a little bigger files do not speak It can not fail.

rb_node_newnode ()

Finally, the function that generates node rb_node_newnode () to see.

rb_node_newnode ()

 
4228 NODE * 
4229 rb_node_newnode (type, a0, a1, a2) 
4230 enum node_type type; 
4231 NODE * a0, * a1, * a2; 
(4232 
4233 NODE * n = (NODE *) rb_newobj (); 
4234 
4235 n-> flags | = T_NODE; 
4236 nd_set_type (n, type); 
4237 nd_set_line (n, ruby_sourceline); 
4238 n-> nd_file = ruby_sourcefile; 
4239 
4240 n-> u1.node = a0; 
4241 n-> u2.node = a1; 
4242 n-> u3.node = a2; 
4243 
4244 return n; 
4245) 

(parse.y) 

rb_newobj () Chapter 5 of the moth - BEJIKOREKUSHON, has ever seen. Unoccupied RVALUE to retrieve one Functions. This structure flag T_NODE dipped at VALUE as Initialization is complete. u1 u2 u3 of course NODE * can also receive the value of non-Oh Of thing, but in the meantime NODE * together by it. ruby syntax is a tree double pointer, and so are not counted by size is too small, you can say The following is not worried.

Write to know, forget the rest is details, NODE and

  • flags
  • nodetype
  • nd_line
  • nd_file
  • u1
  • u2
  • u3

Seven members of a structure I think you need.

wooden building construction

The role of a string of bytes parser source code to convert it for use. Grammar is passed on to the job only half KONASE is not even a pair of nodes There should be a tree stand. The construction of this section is a look at the tree-building process Let's.

YYSTYPE

This chapter is to say YOUSURUNI action, $$ and $ 1 type of YYSTYPE is important to it. First ruby -% union to look at.

% union declaration

 
  170% union ( 
  171 NODE * node; 
  172 ID id; 
  173 int num; 
  174 struct RVarmap * vars; 
  175) 

(parse.y) 

struct RVarmap evaluator of the structure, to store a block local variables. The rest is good. Of course, most use them node .

syntax tree landscape

First, the facts of the theory that he read the code. Now what kind of structure Thursday's statement can not because you want to know, first the answer (for use) to see To begin.

Looking at all of the time in the debugger any time, attach a tool CD-ROM. nodedump \ footnote ( nodedump : The accompanying CD-ROM tools / nodedump.tar.gz ) a Easy to use and more syntax tree can be visualized. This tool is Pragmatic Programmers \ footnote (Pragmatic Programmers ^ http://www.pragmaticprogrammers.com) film NodeDump for a reshuffle of this document. Original output is very descriptive, but to me, this revamped version of the The figure beat the syntax tree to direct the output to.

For example m (a) dump the simple formula is the only way to do good.

 
% Ruby-rnodedump-e 'm (a)' 
NODE_NEWLINE 
nd_file = "-e" 
nd_nth = 1 
nd_next: 
     NODE_FCALL 
     nd_mid = 9617 (m) 
     nd_args: 
         NODE_ARRAY 
         nd_alen = 1 
         nd_head: 
             NODE_VCALL 
             nd_mid = 9625 (a) 
         nd_next = (null) 

-r options you want to load the library, -e to pass the program. The expression is the program for use by the dump.

Look at the way the contents of a simple explanation. NODE_NEWLINE and NODE_FCALL is the Node type. The same is written in an indentation in the node-Men BAS contents. For example, is the root of NODE_NEWLINE and its members are nd_file nd_nth nd_next three. nd_file string of C "-e" points, nd_nth C is a whole number of points, nd_next the next node NODE_CALL is stored Being. Words and telling him not to be confusing, and the contrast in Figure 4 I want to try.

(stree)
Figure 4: syntax tree

Each node and explain the meaning, NODE_FCALL is Function CALL. NODE_ARRAY is the name of the array of nodes, where a list of arguments Respectively. NODE_VCALL is Variable or CALL, local variables undefined In a reference.

, NODE_NEWLINE What? This is the time to run the file name and line running Number of nodes in the suit, stmt one set for each one. So Execution is considering the meaning of this node will ignore it. Also nodedump of Instead nodedump-short and require Then the first place NODE_NEWLINE me U is a distraction Back to the area. It is simple and easy to look at the future Especially as write nodedump-short to use.

The following is a syntax tree to capture the overall look for three types of components We take a look at. First syntax leaves (leaf). Then a combination of the ceremony, That is the syntax tree branch. Finally, a list that is to lay out the sentence syntax tree trunks.

leaves

First terminal, to look at the syntax of leaves. And variables such as a literal reference, And the rules say primary in a particularly simple fall into this category.

 
% Ruby-rnodedump-short-e'1 ' 
NODE_LIT 
nd_lit = 1: Fixnum 

Numeric 1. HINERI what is not. But where do you belong to the node 1 C, but Ruby's 1 ( Fixnum 1) It is important to be careful. Why? Said……

 
% Ruby-rnodedump-short-e ': sym' 
NODE_LIT 
nd_lit = 9617: Symbol 

In this way, Symbol syntax of the same tree and NODE_LIT represented by. We always nd_lit in VALUE like to put it on, Symbol awful. UTO Fixnum would run when you are dealing with exactly the same way as we can. - Marika, nd_lit to retrieve the return value as well. It's syntax tree In order to make it run because of the run to make it work is correct.

 
% Ruby-rnodedump-short-e ' "a"' 
NODE_STR 
nd_lit = "a": String 

String. This is Ruby's string. A string literal, so when you are actually a copy.

 
% Ruby-rnodedump-e '[0,1]' 
NODE_NEWLINE 
nd_file = "-e" 
nd_nth = 1 
nd_next: 
     NODE_ARRAY 
     nd_alen = 2 
     nd_head: 
         NODE_LIT 
         nd_lit = 0: Fixnum 
     nd_next: 
         NODE_ARRAY 
         nd_alen = 1 
         nd_head: 
             NODE_LIT 
             nd_lit = 1: Fixnum 
         nd_next = (null) 

Arrays. This is a leaf can not say, OK, so literal and ties. NODE_ARRAY list of nodes in each element suspensory feel? That nodedump-short …… reason not to use this section you'll read to the end.

branch

The following is part of the branch…… "combination" to focus on. Take the example if .

if

NANIKATO said if to the example just feel like, but it is of course structure And it is simple, if reader does not know to write it from a convenient.

Anyway it is if example. For example, a number of trees to try to syntax.

▼ source program

 
if true 
   'true expr' 
else 
   'false expr' 
end 
The syntax tree representation

 
NODE_IF 
nd_cond: 
     NODE_TRUE 
nd_body: 
     NODE_STR 
     nd_lit = "true expr": String 
nd_else: 
     NODE_STR 
     nd_lit = "false expr": String 

This is the aforementioned nodedump-short allows it NODE_NEWLINE is removed and Be. nd_cond criteria, nd_body If the body is true, nd_else is false. In the case of the body.

Now it has to make a look at the code.

if rules

 
1373 | kIF expr_value then 
1374 compstmt 
1375 if_tail 
1376 kEND 
(1377 
1378 $ $ = NEW_IF (cond ($ 2), $ 4, $ 5); 
1379 fixpos ($ $, $ 2); 
1380) 

(parse.y) 

NEW_IF () which is NODE_IF to generate MAKURORASHII. The value of symbols $ 2 $ 4 $ 5 using the symbols and rules $ n deal with it

 
kIF expr_value then compstmt if_tail kEND 
  $ 1 $ 2 $ 3 $ 4 $ 5 $ 6 
NEW_IF (expr_value, compstmt, if_tail) 

. In other words expr_value is conditional expression, compstmt ( $ 4 ) is true, if_tail If it is false.

Meanwhile, all nodes to generate a macro NEW_xxxx with the name, node.h Definition . NEW_IF () to look at.

NEW_IF ()

 
  243 # define NEW_IF (c, t, e) rb_node_newnode (NODE_IF, c, t, e) 

(node.h) 

Each parameter is c that condition, t was then, e else to represent that. As the previous section, the node is a member of the order does not make much sense I do not see into the parameter name is not necessary.

The conditional expression of action to deal with node cond () function of the analysis is meaningless. This is described below.

And fixpos () line number is correct them. NODE of them are "generated When the "loading of the file name and line number to be initialized. But if to I think about it, NODE_IF when making already end to Perth to that. So let go and ZURERU line number. So fixpos () and the correction Rather, they said.

 
fixpos (dest, src) 

The node dest row number of nodes src ones. if say, if formula applies to the entire line is the line number of conditions to the ceremony.

elsif

Then if_tail also look at the rules.

if_tail

 
1543 if_tail: opt_else 
1544 | kELSIF expr_value then 
1545 compstmt 
1546 if_tail 
(1547 
1548 $ $ = NEW_IF (cond ($ 2), $ 4, $ 5); 
1549 fixpos ($ $, $ 2); 
1550) 

1553 opt_else: none 
1554 | kELSE compstmt 
(1555 
1556 $ $ = $ 2; 
1557) 

(parse.y) 

First, the rule is "zero or more elsif clause after opt_else that ends" Respectively. Because, elsif have been limited if_tail without appearing to reproduce, opt_else to come and gone. Appropriate rules to expand the number of eyes Each Be.

 
if_tail: kELSIF .... if_tail 
if_tail: kELSIF .... kELSIF .... if_tail 
if_tail: kELSIF .... kELSIF .... kELSIF .... if_tail 
if_tail: kELSIF .... kELSIF .... kELSIF .... opt_else 
if_tail: kELSIF .... kELSIF .... kELSIF .... kELSE compstmt 

Then the attention and action, what, elsif normal, if same NEW_IF () to use it. In other words, following two programs Syntax tree to become the difference between them is that they were missing.

 
if cond1 if cond1 
   body1 body1 
elsif cond2 else 
   body2 if cond2 
elsif cond3 body2 
   body3 else 
else if cond3 
   body4 body3 
end else 
                                 body4 
                               end 
                             end 
                           end 

I said that's kind of like the C language syntax level, this does not distinguish between the two, It might be natural course. Another option is conditional operator ( a? B: c ) and a syntax tree And if statement and the distinction is out of control.

Grammar, it is important to him is a priority for use of the structure itself contains information Because no longer needed. Also if and conditions for operators like the difference between watching has reached It is no longer any sense of the meaning of (behavior) are important. So if Operators and conditions for use, you can do exactly the same expression that's fine.

These are some examples to try out. and and && are the same. or and || same. not and ! , if and qualified if , and so is identical.

reflexive left and right reflexive

By the way, Chapter 9 crash yacc list, we always represent the list of symbols Was written on the left, if_tail Conversely, it has been noticed? The following is the point where it regrouped.

 
if_tail: opt_else 
        | KELSIF ... if_tail 

Until now, and certainly the opposite. No. list if_tail is right.

In fact, the list is another routine there,

 
list: END_ITEM 
     | ITEM list 

And write, ITEM continued to zero or more END_ITEM ends with a list.

Both list as a representation would be less significant differences, but not the actions The execution-style have been fatally different. list to the right format to write back ITEM order from the action. Left list ifスタッ QUEUE movement has been done, right list If you have to try. Enter the ITEM and four END_ITEM .

first check the
ITEM ITEM to shift
ITEM ITEM ITEM to shift
ITEM ITEM ITEM ITEM to shift
ITEM ITEM ITEM ITEM ITEM to shift
ITEM ITEM ITEM ITEM END_ITEM END_ITEM to shift
ITEM ITEM ITEM ITEM list END_ITEM list reduction in
ITEM ITEM ITEM list ITEM list list reduction in
ITEM ITEM list ITEM list list reduction in
ITEM list ITEM list list reduction in
list ITEM list list reduction in
accept.

"Left to list " when the reduction of shift and had been held alternately, As you can see much of this shift, much reduction in the analysis.

Why if_tail "right list " I had to say, for use in a bottom-up Make. Finally, make a bottom-up and if nodes have forgotten. Shimo Shika "Left list ", if_tail to define, the last to hand if nodes in order to leave Is elsif to find every time elsif 辿っlinks to all of them to add at the end of IKENAKU become desired. This is a pain. TSUIDENI late. So if_tail "The right is list " have built.

Finally headlines, but the meaning, grammar and terminology. "Left to list " reflexive left (left recursion), "The right is list " right-reflexive (right recursion) said. This term is mainly grammar and the ability to read a paper processing yacc used to write books.

stem

Leaves, branches and would end up in the trunk. Together to make a statement in a list of what's going see.

▼ source program

 
7 
8 
Nine 

The corresponding syntax tree, looks like a dump. This is nodedump-short not complete.

▼ corresponding syntax tree

 
NODE_BLOCK 
nd_head: 
     NODE_NEWLINE 
     nd_file = "multistmt" 
     nd_nth = 1 
     nd_next: 
         NODE_LIT 
         nd_lit = 7: Fixnum 
nd_next: 
     NODE_BLOCK 
     nd_head: 
         NODE_NEWLINE 
         nd_file = "multistmt" 
         nd_nth = 2 
         nd_next: 
             NODE_LIT 
             nd_lit = 8: Fixnum 
     nd_next: 
         NODE_BLOCK 
         nd_head: 
             NODE_NEWLINE 
             nd_file = "multistmt" 
             nd_nth = 3 
             nd_next: 
                 NODE_LIT 
                 nd_lit = 9: Fixnum 
         nd_next = (null) 

NODE_BLOCK list is ready, and NODE_NEWLINE is a header Sticking to understand that (Fig. 5).

(blocklist)
Figure 5: NODE_BLOCK and NODE_NEWLINE

In other words statement ( stmt ) at the rate of one NODE_NEWLINE is included, and it is lined with more than NODE_BLOCK to be listed. Let us also take a look at code.

stmts

 
  354 stmts: none 
  355 | stmt 
  356 ( 
  357 $ $ = newline_node ($ 1); 
  358) 
  359 | stmts terms stmt 
  360 ( 
  361 $ $ = block_append ($ 1, newline_node ($ 3)); 
  362) 

(parse.y) 

newline_node () , NODE_NEWLINE to cover, block_append () connect to the list. Descriptive. block_append () , let us just look at the contents.

block_append ()

This function is in the middle of the error-checking, so there are disturbing Back to that part of the show.

block_append () (shorthand version)

 
4285 static NODE * 
4286 block_append (head, tail) 
4287 NODE * head, * tail; 
(4288 
4289 NODE * end; 
4290 
4291 if (tail == 0) return head; 
4292 if (head == 0) return tail; 
4293 
4294 if (nd_type (head)! = NODE_BLOCK) ( 
4295 end = NEW_BLOCK (head); 
4296 end-> nd_end = end; / * (A-1) * / 
4297 fixpos (end, head); 
4298 head = end; 
4299) 
4300 else ( 
4301 end = head-> nd_end; / * (A-2) * / 
4302) 

           / *…… Back…… * / 

4325 if (nd_type (tail)! = NODE_BLOCK) ( 
4326 tail = NEW_BLOCK (tail); 
4327 tail-> nd_end = tail; 
4328) 
4329 end-> nd_next = tail; 
4330 head-> nd_end = tail-> nd_end; / * (A-3) * / 
4331 return head; 
4332) 

(parse.y) 

According to dump trees and syntax NODE_BLOCK is nd_next using a linked list. Also note that read, " head and tail either NODE_BLOCK DENAKATTARA NODE_BLOCK in the case, a consolidated list to each other, "read.

And (A-1 〜 3), the top of the list NODE_BLOCK - nd_end is always at the end of the list Tail NODE_BLOCK to point to that. Thus, if every list 辿らなくat the end of all the elements to be able to be consolidated (Fig. 6). In other words, should be added to the list when NODE_BLOCK is Suited to it.

(append)
Figure 6: Add it easier

two lists

Now, here at the headline said. Syntax wooden structure are also part of the third Come to mass, the second part is to keep this long. But before the end of I want to talk about one thing. It consists of two generic list.

The list is two, BLOCK and LIST . BLOCK described the just - NODE_BLOCK the linked list, which connects to a statement. LIST , LIST saying that while the entity NODE_ARRAY list. Array literal Who have been used. LIST method is a list of arguments and multiple assignment Stored and used.

This list of the differences between the two nodes of how to use and easy to understand.

NODE_BLOCK nd_head elements to store
nd_end list last NODE_BLOCK points
nd_next following NODE_BLOCK points
NODE_ARRAY nd_head elements to store
nd_alen later this node list length
nd_next following NODE_ARRAY points

Usage is the difference between the second element, nd_end and nd_alen only. And that is their raison d'etre. NODE_ARRAY to store size , So often the size of this list may be required for ARRAY to use the list. Otherwise, it is bound fast BLOCK to use the list. This is Enough Use code to see that it is not known, so deeply not talk about the significance of the After the third appearance of the "Oh, I'm with length," he remembers.

semantic analysis

The second part is the first FURETA easily bite-analysis of the program say they also want to Analysis and meaningful analysis of the two. The analysis is to look at yacc they are almost done, After the action was in the good sense to do the analysis.

Action

error in the

Analysis and concrete meaning of what it is? In language, for example-type test. In addition, many times the same name variables that are not defined or defined before does not use Or using procedures that are defined and procedures outside return and you have a How, and semantic analysis is included.

Current ruby , meaning he has been analyzing what? If you look at our example and ruby , error checking accounts for most of the analysis is significant because the errors out And you'll find someone looks good. yacc parser error in Sometimes yyerror () to call itself…… In other words yyerror () to By the way, he said the error occurred. So in Action yyerror () to call him to list the location.

  • value is required to place a value equation does not (void value expression)
  • $ n - alias
  • method in the BEGIN
  • method in the END
  • method outside return
  • constant at the request of local variables
  • method in the class statement
  • inappropriate variable parameters ( $ gvar and CONST , etc.)
  • variable parameters of the same name comes out twice
  • inappropriate methods specific receivers ( def (). method , etc.)
  • literal definition of the specific method
  • hash literal list of odd
  • self / nil / true / false / __FILE__ / __LINE__ assigned to the
  • constant in the method of assigning
  • conditional expression in the multiple assignment

These checks are roughly the following classes of purpose.

    I give an error from
  • rules only complicate the
  • Otherwise
  • (purely semantic analysis)

For example, a "method outside return " is the only complex rules to be checked . This error is a structural problem because I can not deal with grammar. For example, methods and rules out a separate statement, and allowed the unforgivable Tiles do all things for their good. But also consider how to take care of it now, actin HANETA applications, it is very concise.

The " self the assignment", I give an error to be checked I think. This is a "method outside return " than the grammar is negative Much easier, but simply a negative parser "parse error" the output Ends. It is from this

 
% Ruby-e 'self = 1' 
- e: 1: Can't change the value of self 
self = 1 
       ^ 

An error that it is much kindness.

Of course, Chile and Kick "for this purpose" and would not necessarily say that. For example "Methods outside return " to have, it is better to give an error You can check think. The goal is not to overlap.

Now the problem is "purely semantic analysis" But, Ruby, this classification is true Oh Marika. Typed language and type of analysis, but the major event, Ruby is a variable - Protection so pointless. Come to the front instead of an expression of the value of the check.

With value, to be precise and to "assess the value and can be obtained." return and break is no value in itself. Of course return to pass away, but where is the value, return is written in a place with a value in itself is left. So, like the following Expression is strange.

 
i = return (1) 

These expressions were mistaken, or simply because Miss compile time They play better. This value will be taken to make sure one of the functions, value_expr () will be able to look at.

value_expr ()

value_expr () the value (value) with expr that the function of checking.

value_expr ()

 
4754 static int 
4755 value_expr (node) 
4756 NODE * node; 
(4757 
4758 while (node) ( 
4759 switch (nd_type (node)) ( 
4760 case NODE_CLASS: 
4761 case NODE_MODULE: 
4762 case NODE_DEFN: 
4763 case NODE_DEFS: 
4764 rb_warning ( "void value expression"); 
4765 return Qfalse; 
4766 
4767 case NODE_RETURN: 
4768 case NODE_BREAK: 
4769 case NODE_NEXT: 
4770 case NODE_REDO: 
4771 case NODE_RETRY: 
4772 yyerror ( "void value expression"); 
4773 / * or "control never reach"? * / 
4774 return Qfalse; 
4775 
4776 case NODE_BLOCK: 
4777 while (node-> nd_next) ( 
4778 node = node-> nd_next; 
4779) 
4780 node = node-> nd_head; 
4781 break; 
: 4782 
4783 case NODE_BEGIN: 
4784 node = node-> nd_body; 
4785 break; 
4786 
4787 case NODE_IF: 
4788 if (! Value_expr (node-> nd_body)) return Qfalse; 
4789 node = node-> nd_else; 
4790 break; 
4791 
4792 case NODE_AND: 
4793 case NODE_OR: 
4794 node = node-> nd_2nd; 
4795 break; 
4796 
4797 case NODE_NEWLINE: 
4798 node = node-> nd_next; 
4799 break; 
4800 
4801 default: 
4802 return Qtrue; 
4803) 
4804) 
4805 
4806 return Qtrue; 
4807) 

(parse.y) 

algorithm

Summary. The tree in order to lick it "absolutely has no value equation" of the spotted ATATTARA With no known value at the time. rb_warning () , warning that Qfalse returns. Value of Do not leave tree ceremony for a ride OWATTARA value. Qtrue returns.

Here, not necessarily the whole tree in a check should be aware that it is not. For example value_expr () method to assume that argument to be called out. It goes here.

arg value value_expr () check

 
1055 arg_value: arg 
(1056 
1057 value_expr ($ 1); 
1058 $ $ = $ 1; 
1059) 

(parse.y) 

This argument $ 1 in the nest is also called the method might be. And However, the method of argument is already inside value_expr () , that should have been So, you do not check again.

More generally on the bright side. Certain elements of grammar A was as the structure for all Sung elements in the value_expr () to him if the element A to check You will not.

For example, if How? Unconditionally, all the elements in the value_expr () to Called as a handle? In conclusion it up, it does not work. Why If that statement (that do not value) if The body is then supposed to be good value for both返さなく . For example, if you like.

 
def method 
   if true 
     return 1 
   else 
     return 2 
   end 
   Five 
end 

This if statement of values is required. But the value of such cases is necessary.

 
def method (arg) 
   tmp = if arg 
         then 3 
         else 98 
         end 
   tmp * tmp / 3.5 
end 

So in this case, the entire assignment in the first when checking if statement also check It must be like. That is value_expr () - switch statement, along with It is not.

tail recursion removal

By the way, value_expr () and look at the whole pattern follows that saw frequent And you know.

 
while (node) ( 
     switch (nd_type (node)) ( 
       case NODE_XXXX: 
         node = node-> nd_xxxx; 
         break; 
          : 
          : 
     ) 
) 

This expression would change the meaning of the same.

 
return value_expr (node-> nd_xxxx) 

In this way return recursion just before the code tail recursion, At the end of reflexive say, but it is generally goto is known to be able to convert. Optimization technique is commonly used. Scheme has reached the end of recursive language processing systems must be Remove And standards must be established. Lisp language is well-loop system instead of Reflexive use.

I'd like to note, however, at the end of recursion is " return just before the call" only. For instance value_expr () - NODE_IF to see

 
if (! value_expr (node-> nd_body)) return Qfalse; 
node = node-> nd_else; 
break; 

And so is the first time that recursion. return format and will use it,

 
return value_expr (node-> nd_body) & & value_expr (node-> nd_else); 

And the left value_expr () is false if the right value_expr () and Executed. This means that if the left value_expr () is return "just before" Is not. Therefore it is not the end of recursion. So goto can not be expanded.

Check

value of the overall picture

Function to check the value of reading to a close. I think it is too early Not all, for any other functions value_expr () the same honest way to make an honest 辿っnodes to check it, so completely insipid. But the overall picture Want to understand the relationship graph only shows the function call to end (Figure 7).

(callgraph)
Figure 7: validation function call graph

local variables

local variable definition

Ruby's definition of the types of variables and many pretty. And the constant variable is the first class assignment that has been defined, Global instance variable names of all the variables are defined and You can think of, (But a warning is out) suddenly without reference to the assignment.

The definition of local variables, and it is also like a different. Local variables, Variable on the program appeared to be defined at the time. For example follows.

 
lvar = nil 
p lvar # defined in the 

In this case, the first line lvar assigned to that point, so I wrote lvar defined That. Undefined in the following run-time exception NameError said.

 
% Ruby lvar.rb 
lvar.rb: 1: undefined local variable or method `lvar ' 
for #  (NameError) 

"local variable or method" Why are there. Method calls When the argument is omitted parentheses, so when there is no argument at the local variables 付かなくis divided. To solve it ruby is undefined local variables DP Let's look at the call as a method to try. That method is NAKE If the above error.

By the way, the assignment is defined by the "appearance", will actually be assigned No longer be defined by it. Variable defined by the initial value is nil .

 
if false 
   lvar = "This is by no means the assignment does not run" 
end 
p lvar # nil and displayed 

Moreover, the assignment is defined as "appeared" and "when", On the string before without reference to it. For example, the following cases: That is not defined.

 
p lvar # which is not defined! 
lvar = nil # where it is emerging that…… 

"String up" Note that they want. Rating with a There is no relationship. For example, following a number of course continue to assess the condition of expression, Order to string p At the time of lvar not come to an assignment. Therefore NameError said.

 
p (lvar) if lvar = true 

So far, we know the local variables are very influenced by watching for it . Assignment of string when it appeared that, in order to see defined. The Based on information given, ruby in Perth at the time of the local variables Natural and predictable. Because the order, such as the string is out of the parser MATTARA not exist anymore. That's right, and in fact, ruby , parser Local variables.

block local variables

ITERETABUROKKU declared to be the first time in local variables Or block local variables and dynamic variable. Block local variables in the language specification Local variables and the same thing. But these two are implemented is different. What's the difference or not, we take a look at the future.

data structure

Find a local variable table struct local_vars to start a conversation.

struct local_vars

 
5174 static struct local_vars ( 
5175 ID * tbl; / * local variable names table * / 
5176 int nofree; / * from being used outside * / 
5177 int cnt; / * tbl array size * / 
5178 int dlev; / * dyna_vars nesting levels * / 
5179 struct RVarmap * dyna_vars; / * * block local variables / 
5180 struct local_vars * prev; 
5181) * lvtbl; 

(parse.y) 

prev name from a member of the struct local_vars linked list in the reverse direction …… There is to be expected from the stack. Simultaneously declared in the global variable lvtbl the end of the stack local_vars points.

Also struct RVarmap is env.h Definition, and the evaluator also used Published structure. Block local variables used to store.

struct RVarmap

 
   52 struct RVarmap ( 
   53 struct RBasic super; 
   54 ID id; / * variable name * / 
   55 VALUE val; / * The value * / 
   56 struct RVarmap * next; 
   57); 

(env.h) 

Back to struct RBasic This is because there are objects of Ruby. That is managed by GABEJIKOREKUTA. Also next member So tied to the linked list.

The results of this observation, you'll also find information in addition to running both the image parser The figure is shown in Figure 8 and the like.

(localvars)
Figure 8: The local variables when running table image

local variable scope

parse.y view of the function name to the list and local_push () local_pop () local_cnt () functions such as the line the見付? Be. Have every reason to believe this is a bias for local variables. In addition, the names of push pop so bright They stack. The first of these functions are being used to look for Like.

local_push () local_pop () for example

 
1475 | kDEF fname 
(1476 
1477 $  $ = cur_mid; 
1478 cur_mid = $ 2; 
1479 in_def + +;
1480 local_push (0); 
1481) 
1482 f_arglist 
1483 bodystmt 
1484 kEND 
(1485 
1486 / * NOEX_PRIVATE for toplevel * / 
1487 $ $ = NEW_DEFN ($ 2, $ 4, $ 5, 
                                   class_nest? NOEX_PUBLIC: NOEX_PRIVATE); 
1488 if (is_attrset_id ($ 2)) 
                                   $ $ -> nd_noex = NOEX_PUBLIC; 
1489 fixpos ($ $, $ 4); 
1490 local_pop (); 
1491 in_def -; 
1492 cur_mid = $  3; 
1493) 

(parse.y) 

def that is being used to discover. In addition to the class definitions and specific definition of class, especially Definition to a different class. That is the end of the scope of local variables. And look at how to use the method to define the beginning of push over the The following places pop . It is still local_ cognizance of the local function is strange It is almost certain that the number of relationships. Also push and pop until one The scope of local variables can also be found.

In addition local_cnt () been looking for him.

NEW_LASGN ()

 
  269 # define NEW_LASGN (v, val) rb_node_newnode (NODE_LASGN, v, val, local_cnt (v)) 

(node.h) 

node.h be found. parse.y also using it may not be good Do not bother to find out from different sets of files, the author also seems to be a last resort.

This NEW_LASGN new local assignment is an assignment that is a local variable Nodes must be, and is using it and think about where the parameters v argument is a local name. val is the right term value (representing the syntax tree).

More than considering the scope of local variables starting point local_push () , on the way If there is a local variable assignment local_cnt () add local variables, Schou Quit Group, local_pop () . The perfect scenario will float to (Fig. 9).

(localtbl)
Figure 9: The local variable flow management

Function, let's look at the contents.

push and pop

local_push ()

 
5183 static void 
5184 local_push (top) 
5185 int top; 
(5186 
5187 struct local_vars * local; 
5188 
5189 local = ALLOC (struct local_vars); 
5190 local-> prev = lvtbl; 
5191 local-> nofree = 0; 
5192 local-> cnt = 0; 
5193 local-> tbl = 0; 
5194 local-> dlev = 0; 
5195 local-> dyna_vars = ruby_dyna_vars; 
5196 lvtbl = local; 
5197 if (! Top) ( 
5198 / * one table on the scope of variables to keep val * / 
5199 rb_dvar_push (0, (VALUE) ruby_dyna_vars); 
5200 ruby_dyna_vars-> next = 0; 
5201) 
5202) 

(parse.y) 

But struct local_vars is used as a stack. And lvtbl pointed tip of the stack that is understandable. rb_dvar_push () KUDARI to read it later, so keep in the meantime.

Then local_pop () and local_tbl () together to look at.

local_tbl local_pop

 
5218 static ID * 
5219 local_tbl () 
(5220 
5221 lvtbl-> nofree = 1; 
5222 return lvtbl-> tbl; 
5223) 

5204 static void 
5205 local_pop () 
(5206 
5207 struct local_vars * local = lvtbl-> prev; 
5208 
5209 if (lvtbl-> tbl) ( 
5210 if (! Lvtbl-> nofree) free (lvtbl-> tbl); 
5211 else lvtbl-> tbl [0] = lvtbl-> cnt; 
5212) 
5213 ruby_dyna_vars = lvtbl-> dyna_vars; 
5214 free (lvtbl); 
5215 lvtbl = local; 
5216) 

(parse.y) 

local_tbl () I want to see. This is the current local variables table ( lvtbl ) To get the functions. When you call it, the current table nofree is true. nofree natural meaning of " free () to it". This means that Li FARENSUKAUNTO things like, "This table is used to free () of the IDE, "they say. In other words, local_tbl () also called not once Pop the table when it is released, abandoned. For example, local variables, I think that if no method would be.

But here, "necessary table" is lvtbl-> tbl that. As you can see lvtbl Pop and freed itself is in it. In other words, the evaluator made lvtbl-> tbl 使うらしいonly. So This lvtbl-> tbl What is the structure or, if that is important It. To add variables (like) function local_cnt () If you look at Let's see.

Before it and, lvtbl-> tbl 0 index lvtbl-> cnt on I remember that you want.

Add

variable

Add a local variable (like) function local_cnt () .

local_cnt ()

 
5246 static int 
5247 local_cnt (id) 
5248 ID id; 
(5249 
5250 int cnt, max; 
5251 
5252 if (id == 0) return lvtbl-> cnt; 
5253 
5254 for (cnt = 1, max = lvtbl-> cnt +1; cnt  tbl [cnt] == id) return cnt-1; 
5256) 
5257 return local_append (id); 
5258) 

(parse.y) 

lvtbl-> tbl Scan and, id equal and we are looking for. And見付かった I still cnt-1 return is not found when local_append () a . local_append () is append said adding that much work at all . In other words local_cnt () variable is not already registered in check if the If not registered in the local_append () add the returns would be understandable.

This function return value is what they are? lvtbl-> tbl name of the variable Array, so variable names and "The index -1 ( cnt-1 )" is one to one Corresponding (Fig. 10).

(lvtbltbl)
Figure 10: variable name and return the correspondence between

In addition, the return value is zero starting point to be calculated from that, perhaps local Space is an array of variables. And the index to access it. The returns would be possible. Otherwise, if the instance variables and constants Variable from the first to name ( ID ) should do a key.

Why 0 to avoid the index ( cnt = 1 from around the loop) are concerned about Sometimes, there must be local_pop () Cart from the value.

That's from local_append () 's role is also looking at the contents understandable. Local Registered variables and the variables " lvtbl-> tbl index of -1" to return. Following the confirmation.

local_append ()

 
5225 static int 
5226 local_append (id) 
5227 ID id; 
(5228 
5229 if (lvtbl-> tbl == 0) ( 
5230 lvtbl-> tbl = ALLOC_N (ID, 4); 
5231 lvtbl-> tbl [0] = 0; 
5232 lvtbl-> tbl [1] = '_'; 
5233 lvtbl-> tbl [2] = '~'; 
5234 lvtbl-> cnt = 2; 
5235 if (id == '_') return 0; 
5236 if (id == '~') return 1; 
5237) 
5238 else ( 
5239 REALLOC_N (lvtbl-> tbl, ID, lvtbl-> cnt +2); 
5240) 
5241 
5242 lvtbl-> tbl [lvtbl-> cnt +1] = id; 
5243 return lvtbl-> cnt + +; 
5244) 

(parse.y) 

No doubt. lvtbl-> tbl is the local name of an array of variables, "The index -1" to return value (local variables ID), the source said.

Also noteworthy is lvtbl-> cnt to increase that. lvtbl-> cnt to increase This code is not only easier, so I just code lvtbl-> cnt of To determine meaning. Now it has to do with what you mean, "added a new variable When lvtbl-> cnt 1 increase "That's why," lvtbl-> cnt this Schou The presence of a local group representing the number of variables. "

And last tbl [1] and tbl [2] I'll explain. This '_' and '~' is, Ruby accustomed to expect, but you know, $_ and $~. Special variable. These two variables of the global variables and the apparent contrast with the low - Cal's variables. And You'll also explicitly Kernel # gets and other methods DE implicitly calling for it to happen, so the assignment, is always assigned region You have to be.

local variables together

There are lots of variables local story, so once YAYAKOSHIKATTA MATOMEYOU.

First, local variables are unlike other variables st_table management does not look like that It is a point. What do you say in it and is配列らしい. In addition, the scope Unit into an array.

The arrangement is lvtbl-> tbl that the index is 0 local_pop () set in Of the project lvtbl-> cnt namely local variables is going on. 1 index The scope is defined in the local variable names are kept. Therefore Figure 11 is the final figure would be like.

(tbl)
Figure 11: variable name and return the correspondence between

block local variables

The remaining problem is struct local_vars member dyna_vars . That is blocked Local variables. To do something you should have a function, I think the function name to the list眺 Because of it, also. dyna_push () dyna_pop () dyna_in_block () Correction U function is dubious. And you can use it someone goes here.

dyna_push dyna_pop for example

 
1651 brace_block: '(' 
(1652 
1653 $  $ = dyna_push (); 
1654) 
1655 opt_block_var 
1656 compstmt ')' 
(1657 
1658 $ $ = NEW_ITER ($ 3, 0, $ 4); 
1659 fixpos ($ $, $ 4); 
1660 dyna_pop ($  2); 
1661) 

(parse.y) 

ITERETABUROKKU launch push , in the end pop . This is the block local variables The process must be new.

Will look at the function.

dyna_push ()

 
5331 static struct RVarmap * 
5332 dyna_push () 
(5333 
5334 struct RVarmap * vars = ruby_dyna_vars; 
5335 
5336 rb_dvar_push (0, 0); 
5337 lvtbl-> dlev + +; 
5338 return vars; 
5339) 

(parse.y) 

lvtbl-> dlev , which is up to block a marked presence of local variables. And rb_dvar_push () is said……

rb_dvar_push ()

 
  691 void 
  692 rb_dvar_push (id, value) 
  693 ID id; 
  694 VALUE value; 
  695 ( 
  696 ruby_dyna_vars = new_dvar (id, value, ruby_dyna_vars); 
  697) 

(eval.c) 

Variable name id , the value val to a member of struct RVarmap generation, it glow Bar - variable ruby_dyna_vars top of it. This is once again in the form of cons. dyna_push () , ruby_dyna_vars to the evacuation because of the scope on ruby_dyna_vars added as it could get.

And here, adding that RVarmap is id 0 value is a member. This is serious Is a YARANAKATTA, ruby - ID as normal rb_intern () make is that as long Absolute zero. So, this RVarmap is NUL and NULL like a sentry (Sentinel) in the role I think. Given that the variables are Has been added not to holders of variable ( RVarmap ) to add to the reasons can explain Cognizance.

Then dyna_pop () .

dyna_pop ()

 
5341 static void 
5342 dyna_pop (vars) 
5343 struct RVarmap * vars; 
(5344 
5345 lvtbl-> dlev -; 
5346 ruby_dyna_vars = vars; 
5347) 

(parse.y) 

lvtbl-> dlev to come back Monday to block variable scope is recorded. Arguments with something like that, but look at the future together.

This is a block local variables were added. Local variables. Says local_cnt () , or is not bad. So dvar and dyna , grep まくっto him, such a code.

assignable () (part)

 
4599 static NODE * 
4600 assignable (id, val) 
4601 ID id; 
4602 NODE * val; 
(4603 
                             : 
4634 rb_dvar_push (id, Qnil); 
4635 return NEW_DASGN_CURR (id, val); 

(parse.y) 

assignable () node is assigned to make the system function, is quoted block them, KUROKARU handling of variables. We just saw rb_dvar_push () in the new And variable ( ruby_dyna_vars ) seem to be added.

the parser ruby_dyna_vars

Now, taking into account more than the local Perth and one variable scope OWATTA At the time ruby_dyna_vars to try to imagine the face.

First just said, adding that block the beginning of the scope id = 0 of RVarmap to break the block representing the scope of the sentry. This " Ruby_dyna_vars header" called to it.

Then, like I showed ITERETABUROKKU rules of Action I want to pay attention to this part.

 
$  $ = dyna_push (); / * $  $ you put in…… * / 
         : 
         : 
dyna_pop ($  2); / *…… $  2 to come out * / 

dyna_push () is the ruby_dyna_vars returns. dyna_pop () is Arguments ruby_dyna_vars to Cart. This means every block break ruby_dyna_vars is saved by not returning. It is following a program to Perth,

 
iter ( 
     a = nil 
     iter ( 
         b = nil 
         iter ( 
             c = nil 
             Level 3 # nest 
         ) 
         bb = nil 
         Level 2 # nest 
         iter ( 
             e = nil 
         ) 
     ) 
     Level 1 # nest 
) 

ruby_dyna_vars Figure 12 is supposed to be.

(dynavars)
Figure 12: the end of the scope Perth ruby_dyna_vars

This structure is pretty good. Where is the good and tell the level of nesting During the entire list of deep辿っif on the nature and level of variables out of sight This time. Each level from this table to make it in the search process Pretty simple.

The painting is bb is giving away suspensory seem, this is rightness . There is an increase in nesting level dropped after見付かったらvariable is the original level The link, since it is bound to continue. And rather than having to "sign on the column At present the only variable is not defined, "a local version of a natural variable Form of expression.

And finally, (rather than block local variables) local variable scope Break this link in the whole lvtbl-> dyna_vars stored in the back. A little back local_push () and local_pop () I'd like to confirm that.

By the way, so painstakingly created ruby_dyna_vars list, but it itself is Reviews are not used to be. Check this list of variables exist to be used alone, Perth by the end of the GC. Evaluator and again since the beginning of another To make the chain in the world. It requires a lot of reasons for that deep, but the…… The third part is once-per-view to it.


The original work is Copyright © 2002 - 2004 Minero AOKI.
Translations,  additions,  and graphics by C.E. Thornton
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike2.5 License.