1 # Copyright (C) 2008, Parrot Foundation.
6 pct_optable_guide.pod - A Guide to Using an Operator Parsing Table in PGE-based grammars.
10 This document describes how to use an operator parsing table in grammars
11 written for the Parrot Grammar Engine. The C<mk_language_shell.pl> script
12 will generate a very simple grammar that already includes an operator
13 table. This document may help you to understand the syntax of defining
16 =head1 WHY AN OPTABLE?
18 Even in simple languages such as C<languages/abc>, parse trees can become very
19 big for even very simple expressions.
21 Consider the following example grammar:
23 token addop { '+' | '-' }
24 token mulop { '*' | '/' }
27 <mulexpr> [<addop> <mulexpr>]*
31 <primary> [<mulop> <primary>]*
40 An expression such as C<1 + 2> will result in a parse tree like this:
54 This may not seem very big, but remember it's only for C<1 + 2>!
55 Also note that you have to add at least one rule per precedence level,
56 so the rules for parsing simple statements can increase the size of
57 your grammar dramatically.
59 In order to prevent very big parse trees (which is rather inefficient),
60 it's much better to do bottom-up parsing for expressions such as these.
63 =head1 HOW TO USE AN OPTABLE?
65 Perl 6 rules are run as ordinary subroutines (well, in fact, they are
66 C<methods> of the class representing the grammar you're writing), and
67 parsing will be done by invoking a rule's subrules. A subrule can also
68 contain subrules. Therefore, the generated parser is a top-down parser,
69 starting at the C<TOP> rule, and invoking rules that represent I<lower>
72 =head2 Switching Top-down and Bottom-up parsing
74 An optable parses operators bottom-up, which is the other well-known
75 parsing approach, implemented popular parser generators such as Yacc.
76 At some point, when using an optable, your parser must switch from
77 top-down parsing to bottom-up, and after parsing the operators (and
78 related operands), the parser switches back from bottom-up to top-down
81 Bottom up parsing is very well suited for expressions that consist of
82 terms, unary and binary operators with no two terms in a row.
84 In order to define the entry point of the bottom-up, operator parsing
85 table, you should define a rule. Below is an example that states that
86 an C<expression> is parsed by the operator parser:
88 rule expression is optable { ... }
90 The C<{ ... }> is legal Perl 6 syntax. See C<Synopsis 6: subroutines>
93 In order to define what an operand is, a special rule is define, named
96 proto 'term:' is tighter('infix:*')
97 is parsed(&simple_expression) { ... }
99 This rule states that whenever an operand is expected, it is parsed by
100 the rule named C<simple_expression>. In other words, this is the point
101 where the parser switches back from bottom-up to top-down.
103 Be sure to add the C<tighter> clause, to make sure that your I<terms>
104 are parsed correctly. The grammar compiler (PGE) will not notify you
105 if you don't do this.
107 =head2 Defining operators
109 In between these two rules defining the entry point and the I<exit> point
110 of the operator parser, operators are declared, along with their
111 precedence. This typically look as follows:
113 proto 'infix:+' is looser('infix:*') { ... }
115 This defines the symbol C<+> as an C<infix> operator. The C<is looser>
116 specification defines the precedence of the C<+> operator in relation
117 to the C<*> operator.
119 =head3 Operator precedence
121 Operator precedence can be expressed using the following traits:
127 States that the precedence of the current operator being defined is looser
132 States that the precedence of the current operator being defined is equal
133 to the precedence of operator C<OP>.
135 =item * tighter( OP )
137 States that the precedence of the current operator being defined is
138 tighter than the precedence of the operator C<OP>.
142 Be sure that C<OP> is defined I<before> referencing it in a operator
143 precedence statement as discussed in this section.
145 =head3 Where's the operator?
147 In Perl 6, operators are just subroutines with special names and scoping
148 (see C<Synopsis 6: subroutines>).
150 When operators such as C<infix:+> are defined, the bottom-up parser will
151 generate a C<call> instruction to a subroutine with the same name, that is,
152 C<infix:+>. Therefore, you need to write a subroutine for each operator
153 you define. The subroutine for the C<infix:+> operator could look like this:
158 .param pmc a # left operand
159 .param pmc b # right operand
160 n_add $P0, a, b # create a new PMC object with the value a+b
166 Whenever an expression such as C<42 + 1> is parsed, this will result in a
167 call to C<infix:+> with the operands C<42> and C<1>. The C<infix:+> sub
168 will create a new object, and assign the result of C<42 + 1> to it, after
169 which this new object is returned. Note that the C<n_> prefix of the C<add>
170 instruction implies that a new object is created.
172 You might think that it's somewhat overkill to write and call a subroutine
173 for such a simple operation, and that this could be done simpler. Well,
174 you're right. If the implementation of the C<+> operator is as simple as
175 in this case, there's a simpler way to implement the exact same behavior.
176 This can be accomplished using a the C<pirop> trait, discussed below.
178 =head3 Infix, prefix, postfix
180 Besides C<infix> operators, operators can be C<prefix> or C<postfix>
181 Operators can even be declared to be both C<prefix> and C<infix>. A common
182 example is the C<-> operator. This is fine, because each operator has
183 a certain precedence, which is kept track of by the bottom-up parser.
185 In order to define a prefix C<++> operator, you could write this:
187 proto 'prefix:++' ... # specify precedence
189 Besides these well-known mathematical operators, some languages have
190 built-in operators with C<normal> names. For instance, Ruby defines
191 an operator called C<defined?>, which queries its operand whether it
192 is defined. Defining such special operators are not special in any way,
193 just declared them as any other operator:
195 proto 'prefix:defined?' ...
197 Note that C<defined?> is a prefix operator, which would look like this
202 =head3 Ternary operators
204 Most languages only have one ternary operator: C<? :>. An example is shown
207 $income > 100000 ? print("I'm rich!") : print("I'll keep my job for now")
209 To declare a ternary operator, you write:
211 proto ternary:<? :> is looser('infix:+') { ... }
213 =head3 Circumfix operators
215 Brackets can be considered operators as well. There are two types:
221 C<circumfix> operators can enclose other terms and operators. For instance
222 many languages define a rule C<expression> similar to the following:
227 | expression ['+'|'-'|...] expression
229 When defining an operator table, the third alternative specifying a
230 parenthesized expression can be expressed as follows:
232 proto circumfix:<( )> ...
234 This means that operands (which are parsed through the special C<term:> rule)
235 can be enclosed in parenthesis, like so:
239 This is legal, because the C<( )> is just another operator handled by the
242 =item * postcircumfix
244 Postcircumfix brackets are useful for subroutine invocation or indexing. Of
245 course, this fully depends on the specification of your language. Sometimes,
246 you need a different rule to define subroutine invocation syntax. This is the
247 case when arguments can be other objects than operands of normal operators
248 (which, again, are defined by the C<term:> rule).
250 An example to handle indexing (assuming the index is an operand as any other
251 operator's operand) is this:
253 proto postcircumfix:<[ ]> ...
255 which, given the following rules:
257 rule expression is optable { ... }
259 rule 'term:' (...) is parsed(&simple_expression) { ... }
261 rule simple_expression {
266 allows us to write this:
270 Here, C<"hello"> is a literal and C<foo> is an identifier.
275 =head3 The C<assoc> trait
277 Operators have a certain associacity. For instance, the C<+> operator
278 is usually said to be C<left> associated, while the exponential operator
279 (often written as C<^>) is usually C<right> associated. Consider this
284 Should this be parsed as:
286 ((10 - 5) - 2) # 5-2=3
290 (10 - (5 - 2)) # 10-3=7
292 In other words, does the C<-> I<associate> with the left or right operand?
293 According to standard rules, the C<-> operator is C<left> associated,
294 which means the first option is correct.
296 This associecity is expressed using the C<assoc> trait, like so:
298 proto 'infix:-' is assoc('left') ...
300 If you don't specify the association, it defaults to C<left>. C<right>
301 association works the other way around. In mathematics, the power operator
304 There is a third associacity in Perl 6. Generally a list (using the comma
305 operator) is not nested at all:
309 should neither be parsed as
317 You can achieve that with the C<is assoc('none')> trait.
319 =head3 The C<pirop> trait
321 Some operators can be perfectly mapped to a specific Parrot instruction,
322 for instance the C<n_add> op that we introduced earlier. By default, an
323 operator is implemented as a subroutine call, which is obviously not as
324 efficient as a single Parrot instruction. Therefore, you can specify the
325 Parrot instruction using the C<pirop> trait. You can do this as follows:
327 proto 'infix:+' ... is pirop('n_add') { ... }
329 This will not work for all Parrot ops, though. Certain instructions
330 leave their result in an C<I> register (one of the four types of Parrot
331 registers). PCT currently only supports PMC registers (C<P> registers).
332 Operators such as C<==> and C<!=> must therefore be implemented as
335 =head3 The C<pasttype> trait
337 Some operators have behavior that can be implemented by certain C<PAST>
338 nodes. For instance, many languages define the semantics of the C<and>
339 operator to be the following:
344 2 if result(1) is false return false
348 As soon as the result of A is found to be false, there is no need to
349 evaluate B, as the final result can never be true (as this is the C<and>
350 operator). So, C<B> is evaluated only if C<A> is true.
352 This is very similar to the PAST::Op node with the C<if> C<:pasttype>:
358 Therefore, an operator such as C<and> can be implemented as an C<if>
359 statement. In order to specify that the C<and> operator must be handled
360 as a C<PAST::Op( :pasttype('if') )> node, you can use the C<pasttype> trait,
363 proto 'infix:and' ... is pasttype('if') { ... }
365 The C<or> operator is similar to the semantics of a
366 PAST::Op( :pasttype('unless') node:
371 2 if result(1) is true return true
375 So, C<unless> A is true, evaluate B. Hence, the C<or> operator could
376 be implemented like so:
378 proto 'infix:or' ... is pasttype('unless') { ... }
380 =head3 Special characters
382 Some operators, such as the shift operators contain the "<" or ">" character.
383 As operators can be specified as, for instance, infix:<+>, defining an operator
384 such as "<<" will make the rule parser (the parser generator) confused.
386 You have two options. Either use quoted names for the operator subroutine names,
389 proto 'infix:>>' is equiv('infix:<<') { ... }
391 Or, use so-called French quotes to do this. This looks as follows:
393 proto infix:«>>» is equiv(infix:«<<») { ... }
400 =item * Why are some operators quoted and some not?
402 That's a matter of taste. You can define the operator C<+> in various ways:
408 Note that C<< 'infix:<+>' >> is not allowed.
410 =item * I get an error message saying:
412 "Can only replace inside string or index after end of string"
414 This error occurs if the operator table contains a misspelling somewhere. Make sure
415 that operators containing the character "<" or ">" are quoted using French quotes
416 (« and »), or quote the proto name, as in C<< 'infix:<' >>.
417 The best solution to solve this is to comment out the whole table except for a
418 single operator, and try to find the error using a binary search. Or, better yet,
419 set up your operator table incrementally, while keeping tests passing.
421 =item * Help! I defined my operator table, but some operators are not recognized.
423 Be sure that there are no spelling errors. For instance, on first sight the following
426 proto 'infix:-' is equal('infix:+') { ... }
428 However, C<equal> is not a valid precedence specifier, it should be spelled as C<equiv>.
429 The PGE will not warn you for such errors.
431 =item * How many operator tables can I use in my language?
433 {{ XXX I think one only? Anybody? }}
435 =item * How does an optable parser work?
437 If you really want to know, check out compiler literature.
438 It's something with stacks and emitting instructions for operators with C<tighter>
439 precedence (and their operands) first.
447 =item * L<docs\pct\gettingstarted.pod>
449 =item * L<docs\pct\past_building_blocks.pod>
451 =item * L<http://dev.perl.org/perl6/doc/design/syn/S06.html>