yacc
crash courseHow to write parsers for programming languages has been an active area of research for a long time, and there is a quite firm established tactic for doing it. If we limit ourselves to a grammar not too strange (or ambiguous), we can solve this problem by following this method.
The first part consists in splitting a string in a list of words (or tokens). This is called a scanner or lexer. The term “lexical analyzer” is also used, but is too complicated to say so we’ll use the name scanner.
When speaking about scanners, the common sense first says “there are generally spaces at the end of a word”. And in practice, it was made like this in most programming languages, because it’s the easiest way.
There can also be exceptions. For example, in the old Fortran, white spaces did not have any meaning. This means a white space did not end a word, and you could put spaces in the name of a variable. However that made the parsing very complicated so the compiler vendors, one by one, started ignoring that standard. Finally Fortran 90 followed this trend and made the fact that white spaces have an impact the standard.
By the way, it seems the reason white spaces had not meaning in Fortran 77 was that when writing programs on punch cards it was easy to make errors in the number of spaces.
I said that the scanner spits out a list of words (tokens), but, to be exact, what the scanner creates is a list of “symbols”, not words.
What are symbols? Let’s take numbers as an example. In a programming language, 1, 2, 3, 99 are all “numbers”. They can all be handled the same way by the grammar. Where we can write 1, we can also write 2 or 3. That’s why the parser does not need to handle them in different ways. For numbers, “number” is enough.
“number”, “identifier” and others can be grouped together as
“symbol”. But be careful not to mix this with the Symbol
class.
The scanner first splits the string into words and determines what
these symbols are. For example, NUMBER
or DIGIT
for numbers,
IDENTIFIER
for names like “name
”, IF
for the reserved word
if
. These symbols are then given to the next phase.
The list of words and symbols spitted out by the scanner are going to be used to form a tree. This tree is called a syntax tree.
The name “parser” is also sometimes used to include both the scanner and the creation of the syntax tree. However, we will use the narrow sense of “parser”, the creation of the syntax tree. How does this parser make a tree from the list of symbols? In other words, on what should we focus to find the tree corresponding to a piece of code?
The first way is to focus on the meaning of the words. For example,
let’s suppose we find the word var
. If the definition of the local
variable var
has been found before this, we’ll understand it’s the
reading of a local variable.
An other ways is to only focus on what we see. For example, if after
an identified comes a ‘=
’, we’ll understand it’s an assignment. If
the reserved word if
appears, we’ll understand it’s the start of an
if
statement.
The later method, focusing only on what we see, is the current trend. In other words the language must be designed to be analyzed just by looking at the list of symbols. The choice was because because this way is simpler, can be more easily generalized and can therefore be automatized using tools. These tools are called parser generators.
The most used parser generator under UNIX is yacc
. Like many others,
ruby
’s parser is written using yacc
. The input file for this tool
is parser.y
. That’s why to be able to read ruby
’s parser, we need
to understand yacc
to some extent. (Note: Starting from 1.9, ruby
requires bison
instead of yacc
. However, bison
is mainly yacc
with additional functionality, so this does not diminish the interest
of this chapter.)
This chapter will be a simple presentation of yacc
to be able to
understand parse.y
, and therefore we will limit ourselves to what’s
needed to read parse.y
. If you want to know more about parsers and
parser generators, I recommend you a book I wrote called “Rubyを256倍使
うための本 無道編” (The book to use 256 times more of Ruby -
Unreasonable book). I do not recommend it because I wrote it, but
because in this field it’s the easiest book to understand. And besides
it’s cheap so it won’t make me rich.
Nevertheless, if you would like a book from someone else (or can’t read Japanese), I recommend O’Reilly’s “lex & yacc programming” by John R. Levine, Tony Mason and Doug Brown. And if your are still not satisfied, you can also read “Compilers” (also known as the “dragon book” because of the dragon on its cover) by Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman.
The input file for yacc
is called “grammar file”, as it’s the file
where the grammar is written. The convention is to name this grammar
file *.y
. It will be given to yacc
who will generate C source
code. This file can then be compiled as usual (figure 1 shows the full
process).
Figure 1: File dependencies
The output file name is always y.tab.c
and can’t be changed. The
recent versions of yacc
usually allow to change it on the command
line, but for compatibility it was safer to keep y.tab.c
. By the
way, it seems the tab
of y.tab.c
comes from table
, as lots of
huge tables are defined in it. We should now have a look at the file.
The grammar file’s content has the the following form:
▼ General form of the grammar file%{ Header %} %union .... %token .... %type .... %% Rules part %% User defined part
yacc
’s input file is first divided in 3 parts by %%
. The first
part if called the definition part, has a lot of definitions and
setups. Between %{
and %}
we can write anything we want in C, like
for example necessary macros. After that, the instructions starting
with %
are special yacc
instructions. Every time we use one, we’ll
explain it.
The middle part of the file is called the rules part, and is the most
essential part for yacc
. It’s where is written the grammar we want
to parse. We’ll explain it in details in the next section.
The last part of the file, the user defined part, can be used freely
by the user. yacc
just copies this part verbatim in the output
file. It’s used for example to put auxiliary routines needed by the
parser.
yacc
do.What yacc
takes care of is mainly this rules part in the
middle. yacc
takes the grammar written there and use it to make a
function called yyparse()
. It’s the parser, in the narrow sense of
the word.
In the narrow sense, so it means a scanner is needed. However, yacc
won’t take care of it, it must be done by the user. The function for
the scanner is named yylex()
.
Even if yacc
creates yyparse()
, it only takes care of its core
part. The “actions” we’ll mention later is out of its scope. You can
think the part done by yacc
is too small, but that’s not the
case. That’s because this “core part” is overly important that yacc
survived to this day even though we keep complaining about it.
But what on earth is this core part? That’s what we’re going to see.
When we want to write a parser in C, its code will be “cut the string
this way, make this an if
statement…” When using parser
generators, we say the opposite, that is “I would like to parse this
grammar.” Doing this creates for us a parser to handle the
grammar. This means telling the specification gives us the
implementation. That’s the convenient point of yacc
.
But how can we tell the specification? With yacc
, the method of
description used is the BNF (Backus-Naur Form). Let’s look at a very
simple example.
if_stmt: IF expr THEN stmt END
Let’s see separately what’s at the left and at the right of the
“:
”. The part on the left side, if_stmt
, is equal to the right
part… is what I mean here. In other words, I’m saying that:
if_stmt
and IF expr THEN stmt END
are equivalent.
Here, if_stmt
, IF
, expr
... are all “symbols”. expr
is the
abbreviation of expression
, stmt
of statement
. It must be for
sure the declaration of the if
statement.
One definition is called a rule. The part at the left of “:
” is
called the left side and the right part called the right side. This is
quite easy to remember.
But something is missing. We do not want an if
statement without
being able to use else
. And even
if we could write else
, having
to always write the else
even when it’s useless would be
cumbersome. In this case we could do the following:
if_stmt: IF expr THEN stmt END | IF expr THEN stmt ELSE stmt END
“|
” means “or”.
if_stmt
is either “IF expr THEN stmt END
” or “IF expr THEN stmt
ELSE stmt END
”.
That’s it.
Here I would like you to pay attention to the split done with
|
. With just this, one more rule is added. In fact, punctuating with
|
is just a shorter way to repeat the left side. The previous
example has exactly the same meaning as the following:
if_stmt: IF expr THEN stmt END if_stmt: IF expr THEN stmt ELSE stmt END
This means two rules are defined in the example.
This is not enough to complete the definition of the if
statement. That’s because the symbols expr
and stmt
are not sent
by the scanner, rules must be defined. To be closer to Ruby, let’s
boldly add some rules.
stmt : if_stmt | IDENTIFIER '=' expr /* assignment */ | expr if_stmt: IF expr THEN stmt END | IF expr THEN stmt ELSE stmt END expr : IDENTIFIER /* reading a variable */ | NUMBER /* integer constant */ | funcall /* FUNction CALL */ funcall: IDENTIFIER '(' args ')' args : expr /* only one parameter */
I used two new elements. First, comments of the same form as in C, and
character expressed using '='
. This '='
is also of course a
symbol. Symbols like ”=” are different from numbers as there is only
one variety for them. That’s why for symbols where can also use '='
.
It would be great to be able to use for strings for, for example,
reserved words, but due to limitations of the C language this cannot
be done.
We add rules like this, to the point we complete writing all the
grammar. With yacc
, the left side of the first written rule is “the
whole grammar we want to express”. So in this example, stmt
expresses the whole program.
It was a little too abstract. Let’s explain this a little more
concretely. By “stmt
expresses the whole program”, I mean stmt
and
the rows of symbols expressed as equivalent by the rules, are all
recognized as grammar. For example, stmt
and stmt
are
equivalent. Of course. Then expr
is equivalent to stmt
. That’s
expressed like this in the rule. Then, NUMBER
and stmt
are
equivalent. That’s because NUMBER
is expr
and expr
is stmt
.
We can also say that more complicated things are equivalent.
stmt ↓ if_stmt ↓ IF expr THEN stmt END ↓ ↓ IF IDENTIFIER THEN expr END ↓ IF IDENTIFIER THEN NUMBER END
In the statement developed here, at the end all symbols are ones sent by the scanner. That means this expression is a correct program. Or putting it the other way around, if this sequence of symbols is sent by the scanner, the parser will understand it in the opposite way it was developed.
IF IDENTIFIER THEN NUMBER END ↓ IF IDENTIFIER THEN expr END ↓ ↓ IF expr THEN stmt END ↓ if_stmt ↓ stmt
And stmt
is a symbol expressing the whole program. That’s why this
sequence of symbols is a correct program for the parser. When it’s the
case, the parsing routine yyparse()
ends returning 0.
By the way, the technical term expressing that the parser succeeded is that it “accepted” the input. The parser is like a government office: if you do not fill the documents in the boxes exactly like he asked you to, he’ll refuse them. The accepted sequences of symbols are the ones for which the boxes where filled correctly. Parser and government office are strangely similar for instance in the fact that they care about details in specification and that they use complicated terms.
Well, in the confusion of the moment I used without explaining it the expression “symbols coming from the scanner”. So let’s explain this. I use one word “symbol” but there are two types.
The first type of the symbols are the ones sent by the scanner. They
are for example, IF
, THEN
, END
, '='
, ... They are called
terminal symbols. That’s because like before when we did the quick
expansion we find them aligned at the end. In this chapter terminal
symbols are always written in capital letters. However, symbols like
'='
between quotes are special. Symbols like this are all terminal
symbols, without exception.
The other type of symbols are the ones that never come from the
scanner, for example if_stmt
, expr
or stmt
. They are called
nonterminal symbols. As they don’t come from the scanner, they only
exist in the parser. Nonterminal symbols also always appear at one
moment or the other as the left side of a rule. In this chapter,
nonterminal symbols are always written in lower case letters.
I’m now going to tell you the way to process the grammar file with
yacc
.
%token A B C D E %% list: A B C | de de : D E
First, put all terminal symbols used after %token
. However, you do
not have to type the symbols with quotes (like '='
). Then, put %%
to mark a change of section and write the grammar. That’s all.
Let’s now process this.
% yacc first.y % ls first.y y.tab.c %
Like most Unix tools, “silence means success”.
There’s also implementations of yacc
that need semicolons at the end
of (groups of) rules. When it’s the case we need to do the following:
%token A B C D E %% list: A B C | de ; de : D E ;
I hate these semicolons so in this book I’ll never use them.
Let’s now look a little more at some of the established ways of grammar description. I’ll first introduce void rules.
void:
There’s nothing on the right side, this rule is “void”. For example,
the two following target
s means exactly the same thing.
target: A B C target: A void B void C void :
What is the use of such a thing? It’s very useful. For example in the following case.
if_stmt : IF expr THEN stmts opt_else END opt_else: | ELSE stmts
Using void rules, we can express cleverly the fact that “the else
section may be omitted”. Compared to the rules made previously using
two definitions, this way is shorter and we do not have to disperse
the burden.
The following example is still a little hard to understand.
list: ITEM /* rule 1 */ | list ITEM /* rule 2 */
This expresses a list of one or more items, in other words any of the following lists of symbols:
ITEM ITEM ITEM ITEM ITEM ITEM ITEM ITEM ITEM ITEM :
Do you understand why? First, according to rule 1 list
can be read
ITEM
. If you merge this with rule 2, list
can be ITEM ITEM
.
list: list ITEM = ITEM ITEM
We now understand that the list of symbols ITEM ITEM
is similar to
list
. By applying again rule 2 to list
, we can say that 3 ITEM
are also similar to list
. By quickly continuing this process, the
list can grow to any size.
I’ll now show you the next example. The following example expresses
the lists with 0 or more ITEM
.
list: | list ITEM
First the first line means “list
is equivalent to (void)”. By void I
mean the list with 0 ITEM
. Then, by looking at rule 2 we can say
that “list ITEM
” is equivalent to 1 ITEM
. That’s because list
is
equivalent to void.
list: list ITEM = (void) ITEM = ITEM
By applying the same operations of replacement multiple times, we can
understand that list
is the expression a list of 0 or more items.
With this knowledge, “lists of 2 or more ITEM
” or “lists of 3 or
more ITEM
” are easy, and we can even create “lists or an even number
of elements”.
list: | list ITEM ITEM
This abstract talk lasted long enough so in this section I’d really like to go on with a more concrete talk.
For the moment we have only seen how to write grammars, but what we want is being able to build the full syntax tree. However, I’m afraid to say that has can be expected there’s no way to build the syntax tree with just expressing the rules. That’s why this time we’ll go a little farther and I’ll explain how to build the syntax tree.
We’ll first see what the parser does during the execution. We’ll use the following simple grammar as an example.
%token A B C %% program: A B C
In the parser there is a stack called the semantic stack. The parser pushes on it all the symbols coming from the scanner. This move is called “shifting the symbols”.
[ A B ] ← C shift
And when any of the right side of a rule is equal to the end of the end, this is called ?understanding?. When this happens, the right side of the rule is replaced by the left side on the stack.
[ A B C ] ↓ reduction [ program ]
This move is called “a reduction of A B C
” to program
”. This term
is a little presomptious
%token IF E S THEN END %% program : if if : IF expr THEN stmts END expr : E stmts : S | stmts S
Input from the scanner like this is.
IF E THEN S S S END
Transition of semantics stack of this time is shown below。
Stack | |
IF |
IF |
IF E It shifts |
E It shifts |
IF expr |
E →expr So it restores |
IF expr THEN It shifts |
THEN It shifts |
IF expr THEN S |
S |
IF expr THEN stmts |
S →stmts So it restores |
IF expr THEN stmts S |
S
|
IF expr THEN stmts |
stmts S →stmts So it restores |
IF expr THEN stmts S |
S It shifts |
IF expr THEN stmts |
stmts S →stmts So it restores |
IF expr THEN stmts END |
END It shifts |
if |
IF expr THEN stmts END →if So it restores |
program |
if →program So it restores |
accept. |
Lastly just one note.With restoration the sign does not limit decreases with. When there is a empty rule, when “the non” empty sign is formed, it is
Well, is the important place from here.It probably is shift, but it probably is restoration, but if only [udauda] you do during semantics stack, there are no many meanings.As for our last goals because it was to form the syntactic tree, unless it is connected to that, it is troubled. Being the intention how of dropping yacc and acquiring before?“Hook which the parser restores that it will try it is possible the instantaneous”, it is the answering which yacc puts out.That hook is called action (action) of the parser.Rule you write action as follows lastly.
program: A B C { /* Here action */ }
{With} with the part which is surrounded is action.Like this when you write, this action is executed instantaneously restores A B C to PROGRAM.That with action what will be done, it is free.If C cord/code you can write generally regardless.
But and furthermore is important from here, there is “the value” in all signs.Are the terminal symbol and the nonterminal symbol.Because the terminal symbol comes from the scanner, you receive also the value from the scanner.For example as for that 1 or 9 perhaps 108 vis-a-vis sign NUMBER.Vis-a-vis sign IDENTIFIER perhaps " attr " " name " “sym ".It is good regardless.The value in the sign and simultaneous is stacked in semantics stack.The following figure now exactly it shifts S, it is the place where in value and simultaneous.
IF expr THEN stmts S Value Value Value Value Value
Some time ago, according to rule it can restore stmts S to stmts.If action is written on the rule, but the reason where that is executed, that time, value of the sign of the amount which corresponds to the right-hand side is transferred to action.
IF expr THEN stmts S /*Stack */ Value Value Value Value Value ↓ ↓ stmts: stmts S /* Rule */ ↓ ↓ { $1 + $2; } /* Action */
With, this way with action <$1
、$2
、$3
.....
So it can take the value of the sign which is suitable to the systematic right-hand side.
The $1
or the $2
is the case that yacc
rewrites to the expression which points to stack.
Though if C language because truth however type it is to be a variety
, is difficult, for the time being int
you probably will suppose.
IF expr THEN stmts S /* Stack immediately before the restoring */ Value Value Value Value Value ↓ ↓ stmts: stmts S /*The rule where the right-hand side matches to end */ ↑ ↓ ↓ { $$ = $1 + $2; } /*The action */ IF expr THEN stmts /* Stack after the restoring */ Value Value Value Value Value
Lastly redundancy.As for value of the sign semantic value, there are also times when it is called semantic value. Therefore, the stack which inserts that with semantic value stack, omitting, is the case that it calls semantic stack.
yacc
With typeWell but truly trouble, if story of type is not done, story is not settled.Type of value of the sign just probably is what.When you say from conclusion, it becomes the type, YYSTYPE.Certainly as for this YY Stack TYPE, it is and Semantic value TYPE, difference it is not in either abbreviation. And YYSTYPE naturally something is typedef of another type.The type is the common body which is appointed in the order, %union in the defined department.
But %union how you had not written until now.Although that it is, the fact that it has not become error probably is what reason.Because that processed yacc making the air be effective, having selfishly with default.If you mention default with C, naturally it is int.With the default of YYSTYPE is int with the notion that where you say.
If about example and the portable calculator program which are provided to the book of yacc it does not care even with while it is int but is, to make the syntactic tree, the structure and the pointer and in addition we would like to use the varieties.Then for example %union is used as follows.
%union { struct node { int type; struct node *left; struct node *right; } *node; int num; char *str; }
Because now it is not the case that it is really used, type and member name are suitable.Different from normal C
%union
Because lastly of block the semicolon there is no necessity, note.
When like this you write with that, with y.tab.c it is the case that it becomes as follows.
typedef union { struct node { int type; struct node *left; struct node *right; } *node; int num; char *str; } YYSTYPE;
So when it does, as for semantics stack
YYSTYPE yyvs[256]; /* Substance (yyvs = YY Value Stack) of stack */ YYSTYPE *yyvsp = yyvs; /* The pointer which points to the point of stack */
Expectation is attached with it probably is the feeling which is said, that. Then value of the sign which appears in action ......
/* yacc Action before the processing */ target: A B C { func($1, $2, $3); } /* After the converting, the circumstances with y.tab.c */ { func(yyvsp[-2], yyvsp[-1], yyvsp[0]); ;
Naturally it becomes like this.
In this case because int of default was used, just to refer to stack it is necessary, but when YYSTYPE is the common body, if either the member does not appoint simultaneously, it is the expectation which cannot be accessed.The method of tying at the symbolic unit there are two kinds of method of appointing every time to that.
First generality, from the method of appointing at the symbolic unit.In case of the terminal symbol in case of the nonterminal symbol using %type, you write %token, as follows.
%token<num> A B C /* As for the value of all A B C int type */ %type<str> target /* As for the value of all target char* type */
%union { char *str; }
%%
target: { $
%union { char *str; } %% target: { $<str>$ = "It requires cast like ones"; }
As for this method the one which is used as little as possible is better. What the member is decided at the symbolic unit is the basis.
Everything you spoke now concerning [arekore] of value in the parser.After if you speak the connected protocol of the scanner, as for the item which becomes the nucleus everything you put away.
First when you verify, the scanner () was function yylex. (With int) it returns (terminal) as for the sign itself as a return value of function.Because yacc #define has done constant with the same name as the sign, if sign NUMBER NUMBER just to write it is necessary.And inserting in the global variable, yylval it transfers the value. Also this and with YYSTYPE type, the completely same thing as the time of the parser can call yylval.In other words when it defines with %union, it becomes the common body.But this time because the member is not done it chooses selfishly, unless you write member name by your, it is useless.In other words when it is very simple example, it becomes as follows.
static int yylex() { yylval.str = next_token(); return STRING; }
Because the relationship to here was arranged to Figure 2, we want trying verifying one by one. yylval, $$, $1 and $2 ......And so on, the variable which becomes interface is all YYSTYPE types.
Figure 2: yacc Something related to related variable function
Those which rule are written lastly, with it explained action, but to tell the truth it can also write in the middle of rule
target: A B { puts("embedded action"); } C D
This is called pad action. Pad action is the mere syntax sugar of following kind of description.
target: A B dummy C D dummy : /* Empty rule */ { puts("embedded action"); }
yacc
how we do not fear anymore with this.
When we assume that with you thought, that is rather sweet.Why whether you can fear yacc to this much, the reason is after this.
So far, “when the right-hand side on rule matches to the stacked point”, that you wrote casually, but when there is a following kind of rule, how it probably will become?
target : A B C | A B C
When the string, A B C appears, it is the expectation which stops knowing whether either rule matches.You cannot understand any such a things in even the human. Therefore yacc it is not recognized.When such strange grammar is discovered, as for yacc reduce/reduce conflict (restoration & restoration collision) occurred, that it is complaint.In the sense that plural rules are restoration possible simultaneously.
% yacc rrconf.y conflicts: 1 reduce/reduce
If with say it is normal, you think such a thing other than accident that it does not do, but will the following example how probably be?The string which has described completely is the same.
target : abc | A bc abc : A B C bc : B C
If this it is, it can be relatively.Especially while thinking of rule, when [guchiyaguchiya] it is moving, it is something which it designates such a rule as the unwittingly inside.
There are also following kind of ones with the pattern which is similar.
target : abc | ab C abc : A B C ab : A B
When the string, A B C appears, you do not know whether it should choose abc one, whether ab it should make C combination.Such time yacc occurred shift/reduce conflict (shift restoration collision), that it droops complaint. This in the sense that the rule which can be shifted simultaneously there is a rule which can be restored.
% yacc srconf.y conflicts: 1 shift/reduce
The famous example of shift/reduce conflict “it dangles and else problem” is. For example this problem happens in the if sentence of C language.Simplifying story, it will try writing.
stmt : expr ';' | if expr : IDENTIFIER if : IF '(' expr ')' stmt | IF '(' expr ')' stmt ELSE stmt
As for formula just IDENTIFIER
(variable), if
as for the substance tried making rule as Bunichi just.
Well, when the following program pass is done with this grammar, it probably means some thing.>
if (cond) if (cond) true_stmt; else false_stmt;
Like this, when you write, it is visible clearly, how without but is, to tell the truth it can interpret as follows.
if (cond) { if (cond) true_stmt; } else { false_stmt; }
In other words it is the problem you attach else outside and inside either if.
However shift/reduce conflict if you compare to reduce/reduce conflict, relatively is harmless collision.Because that when you say, why when mostly is, if shift is chosen, it goes well.Shift is chosen, that “be as close as possible the element is connected”, it is synonym generally, it is easy to match to the intuition of the human.Actually, if it shifts also dangling else, it goes well.When shift/reduce conflict occurred with such reason yacc in accordance with the flow, it has reached the point where shift is chosen with default.
We want trying applying the following grammar to yacc in trial.
%token A B C %% target : A B C /*Rule 1 */ | A B /*Rule 2*/
How thinking, it collides and may probably will be.To A B it wishes to shift rule 1 at the point in time when you read and, it wishes to restore rule 2. In other words this is the expectation which becomes shift/reduce conflict.However ......
% yacc conf.y %
It is strange, it does not collide.Why probably will be.
When it is the truth, the parser which was made with yacc “pre-reading (look ahead)” can do just 1 signs. Before shifting truly and restoring, gledging the following sign, it can judge how it does.
Therefore even at the time of parser formation considering that, if you can distinguish with one pre-reading, it does not make collide.For example if some time ago rule if of A B the C comes next, because only rule 1 there is a possibility, rule 1 is chosen, (shifting,).When input ends, rule 2 is chosen, (restoring,).
Those where we want noting are to be meaning of two kinds in the word, “pre-reading”. As for one when processing *.y with yacc, pre-reading.As for one more when really moving the parser which is formed, pre-reading.Pre-reading in the execution time is not difficult very, but yacc itself pre-reading is very complicated.Because because estimating all input configurations of the execution time from just syntax rule, unless it decides behavior, it is not good.
Though, because really as for “all” it is unreasonable, with it means to cope “considerable” pattern.Whether or not and it can cope with the inside about some range of all pattern, it is the case that it becomes strength of pre-reading algorithm.In the collision solution algorithm which as for the pre-reading algorithm which yacc uses at the time of grammatical file processing LALR (1) you say, exists extremely they are powerful ones.
The varieties you said, but because those where you do in this book just read rule are not to write, there are no times when you worry excessively.When we would like to explain here, as for are not pre-reading which used grammar and are method of pre-reading the execution time.
For a while because abstract story continued, already a little concrete story is done here and others.+ And * and so on we will have decided to try defining the rule of binary operator (infix type operator).Because there is a formula even in this, following to that placidly, you should have put.Those like the portable calculator which can use arithmetical operation below were defined.
expr : expr '+' expr | expr '-' expr | expr '*' expr | expr '/' expr | primary primary : NUMBER | '(' expr ')'
primary is translated “the section”.It is the smallest grammatical unit. When expr is bound with the parenthesis, the place that is the point it becomes primary.
The [te], writing this grammar on the file suitably, when it compiles, it becomes like this.
% yacc infix.y 16 shift/reduce conflicts
It collided extremely.If only five minutes think, you think that it is understood, but with this rule it is troubled in following kind of case.
1 - 1 - 1
Being able to interpret this it finishes in both of the following two kinds.。
(1 - 1) - 1 1 - (1 - 1)
The fact that it is natural as a numerical formula of course is former.But as for yacc
doing you see to the last and it is
processing of for the sake of, meaning does not enter completely there. -
With considering also the semantic how this [tsu] [po] [chi] which
That when you say, so how it should have done, like this you should have written on the defined department.
%left '+' '-' %left '*' '/'
This order priority and appoints connection of operator characteristic two simultaneously. It probably will keep explaining to order.
As for the word, priority when doing the story of grammar of programming language, you think that it comes out well.When you speak theoretically, because it is complicated, speaking intuitively know, it is the story the parenthesis is attached to either operator in following kind of case.
1 + 2 * 3
*
If one priority is high, it becomes like this.
1 + (2 * 3)
+
If one priority is high, it becomes like this.
(1 + 2) * 3
This way, those which are strong in operator setting weak ones, the fact that it solves shift/reduce conflict is operator priority
But but, falling in the same circumstance, when should priority have done is the same how probably?For example this way.
1 - 2 - 3
Because this time both is -, priority completely is the same.Using connection characteristic such time, it solves.There are three types of left right nonassoc in connection characteristic, are interpreted respectively as follows.
Connection characteristic | Interpretation |
left(The left connection) | (1 - 2) - 3 |
right(The right connection) | 1 - (2 - 3) |
nonassoc(Non connection) | [pasuera] |
When it is operator for numerical formula, they are most left connections.As for the right connection = of substitution and you use mainly with not of denial.
a = b = 1 # (a = (b = 1)) not not a # (not (not a))
The typical case of nonassoc probably is comparison operators.
a == b == c # Pass error a <= b <= c # Pass error
Though because with Python and the like comparison of three sections is possible, it is not this limit
The order, some time ago %left %right %nonassoc with that is used in order to show the connection characteristic of according to name.And it shows priority in the order which arranges. About the operator which is under priority is high.If it is in the same line, it is the same ranking.
%left '+' '-' /* With the left connection priority 3 */ %left '*' '/' /* With the left connection priority 2 */ %right '!' /* With the left connection priority 1 */