1 # Copyright (C) 2007, The Perl Foundation.
6 docs/pdds/pdd19_pir.pod - Parrot Intermediate Representation
15 This document outlines the architecture and core syntax of the Parrot
16 Intermediate Representation (PIR).
This document describes PIR, a stable, middle-level language that both
compilers and humans can target.
23 PIR is a stable, middle-level language intended both as a target for the
24 generated output from high-level language compilers, and for human use
25 developing core features and extensions for Parrot.
31 A valid PIR program consists of a sequence of statements, directives, comments
36 A statement starts with an optional label, contains an instruction, and is
37 terminated by a newline (<NL>). Each statement must be on its own line.
39 [label:] [instruction] <NL>
41 An instruction may be either a low-level opcode or a higher-level PIR
42 operation, such as a subroutine call, a method call, a directive, or PIR
47 A directive provides information for the PIR compiler that is outside the
48 normal flow of executable statements. Directives are all prefixed with a ".",
49 as in C<.local> or C<.sub>.
53 Comments start with C<#> and last until the following newline. PIR also allows
54 comments in Pod format. Comments, Pod content, and empty lines are ignored.
Identifiers start with a letter or underscore, and may then contain
letters, digits, and underscores. Identifiers don't have any limit on length
60 at the moment, but some sane-but-generous length limit may be imposed in the
61 future (256 chars, 1024 chars?). The following examples are all valid
68 Opcode names are not reserved words in PIR, and may be used as variable names.
69 For example, you can define a local variable named C<print>. [See RT #24251]
71 {{ NOTE: The use of C<::> in identifiers is deprecated. [See RT #48735] }}
75 A label declaration consists of a label name followed by a colon. A label name
76 conforms to the standard requirements for identifiers. A label declaration may
77 occur at the start of a statement, or stand alone on a line, but always within
80 A reference to a label consists of only the label name, and is generally used
81 as an argument to an instruction or directive.
83 A PIR label is accessible only in the compilation unit where it's defined. A
84 label name must be unique within a compilation unit, but it can be reused in
85 other compilation units.
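
As a minimal sketch of declaring and referencing labels (the sub name
C<main> and the labels C<loop> and C<done> are illustrative, not taken from
this document), a counting loop might be written like this:

  .sub main :main
      .local int i
      i = 0
    loop: if i >= 3 goto done   # label at the start of a statement
      print i
      print "\n"
      inc i
      goto loop                 # reference to a label as a branch target
    done:                       # label standing alone on a line
  .end

Both labels are visible only inside this C<.sub>; another compilation unit
could reuse the names C<loop> and C<done> without conflict.
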
91 =head3 Registers and Variables
93 There are three ways of referencing Parrot's registers. The first is direct
94 access to a specific register by name In, Sn, Nn, Pn. The second is through a
95 temporary register variable $In, $Sn, $Nn, $Pn. I<n> consists of digit(s)
96 only. There is no limit on the size of I<n>.
98 The third syntax for accessing registers is through named local variables
99 declared with C<.local>.
103 The type of a named variable can be C<int>, C<num>, C<string> or C<pmc>,
104 corresponding to the types of registers. No other types are used. [See
107 The difference between direct register access and register variables or local
108 variables is largely a matter of allocation. If you directly reference C<P99>,
109 Parrot will blindly allocate 100 registers for that compilation unit. If you
110 reference C<$P99> or a named variable C<foo>, on the other hand, Parrot will
111 intelligently allocate a literal register in the background. So, C<$P99> may
112 be stored in C<P0>, if it is the only register in the compilation unit.
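
The sketch below illustrates the two allocator-managed forms, assuming the
register allocator behaves as described above. The names C<main> and
C<answer> are illustrative only.

  .sub main :main
      $I99 = 42               # register variable; the allocator picks the real register
      .local int answer       # named local variable, also allocated automatically
      answer = $I99
      print answer
      print "\n"
  .end

Despite the high number in C<$I99>, this sub should need only a handful of
integer registers, because neither form names a literal register.
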
116 Constants may be used in place of registers or variables. A constant is not
117 allowed on the left side of an assignment, or in any other context where the
118 variable would be modified.
122 =item 'single-quoted string constant'
124 Are delimited by single-quotes (C<'>). They are taken to be ASCII encoded. No
125 escape sequences are processed.
127 =item "double-quoted string constants"
129 Are delimited by double-quotes (C<">). A C<"> inside a string must be escaped
130 by C<\>. Only 7-bit ASCII is accepted in string constants; to use characters
outside that range, specify an encoding as described below.
133 =item <<"heredoc", <<'heredoc'
135 Heredocs work like single or double quoted strings. All lines up to
136 the terminating delimiter are slurped into the string. The delimiter
137 has to be on its own line, at the beginning of the line and with no
140 Assignment of a heredoc:
146 A heredoc as an argument:
148 function(<<"END_OF_HERE", arg)
160 You may have multiple heredocs within a single statement or directive:
162 function(<<'INPUT', <<'OUTPUT', 'some test')
168 =item charset:"string constant"
170 Like above with a character set attached to the string. Valid character
171 sets are currently: C<ascii> (the default), C<binary>, C<unicode>
172 (with UTF-8 as the default encoding), and C<iso-8859-1>.
176 =head2 String escape sequences
178 Inside double-quoted strings the following escape sequences are processed.
183 \x{h..h} 1..8 hex digits
185 \Uhhhhhhhh 8 hex digits
186 \a, \b, \t, \n, \v, \f, \r, \e, \\
190 =item encoding:charset:"string constant"
192 Like above with an extra encoding attached to the string. For example:
194 set S0, utf8:unicode:"«"
The encoding and charset get attached to the string; no further processing
is done. Specifically, escape sequences are not honored.
199 =item numeric constants
201 C<0x> and C<0b> denote hex and binary constants respectively.
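
A short sketch of several constant forms used as the source of an assignment
follows. The register variables and string contents are illustrative only.

  .sub main :main
      $S0 = 'no escapes: \n stays as two characters'
      $S1 = "escapes processed:\ttab and newline\n"
      $I0 = 0xFF              # hexadecimal constant
      $I1 = 0b1010            # binary constant
      print $S0
      print "\n"
      print $S1
      print $I0
      print "\n"
      print $I1
      print "\n"
  .end
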
209 =item .local <type> <identifier> [:unique_reg]
211 Define a local name I<identifier> for this compilation unit with the given
212 I<type>. You can define multiple identifiers of the same type by separating
217 The optional C<:unique_reg> modifier will force the register allocator to
218 associate the identifier with a unique register for the duration of the
221 =item .lex <string constant>, <reg>
223 Declare a lexical variable that is an alias for a PMC register. For example,
229 These two opcodes have an identical effect:
234 And these two opcodes also have an identical effect:
239 =item .const <type> <identifier> = <const>
242 .const <string constant> <identifier> = <const>
243 as an alternative to allow ".const 'Sub' ... "
246 Define a constant named I<identifier> of type I<type> and assign value
247 I<const> to it. The constant is stored in the constant table of the current
250 =item .globalconst <type> <identifier> = <const>
252 As C<.const> above, but the defined constant is globally accessible.
254 =item .namespace <identifier> [deprecated: See RT #48737]
256 Open a new scope block. This "namespace" is not the same as the
257 .namespace [ <identifier> ] syntax, which is used for storing subroutines
258 in a particular namespace in the global symbol table.
259 This directive is useful in cases such as (pseudocode):
263 do # open a new namespace/scope block
264 local x = 2; # this x hides the previous x
266 end # close the current namespace
267 print(x); # prints 1 again
Common language constructs that have nested scopes, such as if, for, while,
and repeat, can use this directive.
{{ NOTE: this variation of C<.namespace>, along with C<.endnamespace>, is deprecated.
273 They were a hackish attempt at implementing scopes in Parrot, but didn't
274 actually turn out to be useful.}}
276 =item .endnamespace <identifier> [deprecated: See RT #48737]
278 Closes the scope block that was opened with .namespace <identifier>.
280 =item .namespace [ <identifier> ; <identifier> ]
282 .namespace [ <key>? ]
284 key: <identifier> [';' <identifier>]*
286 Defines the namespace from this point onwards. By default the program is not
287 in any namespace. If you specify more than one, separated by semicolons, it
288 creates nested namespaces, by storing the inner namespace object in the outer
289 namespace's global pad.
291 You can specify the root namespace by using empty brackets, such as:
295 The brackets are not optional, although the string inside them is.
297 {{ NOTE: currently the brackets *are* optional. TODO: make decision whether
298 we want the brackets optional. }}
301 =item .pragma n_operators
Convert arithmetic infix operators to n_infix operations. The unary opcodes
C<abs>, C<not>, C<bnot>, C<bnots>, and C<neg> are also changed to use a C<n_>
307 .pragma n_operators 1
310 $P0 = $P1 + $P2 # n_add $P0, $P1, $P2
311 $P2 = abs $P0 # n_abs $P2, $P0
313 =item .loadlib "lib_name"
Load the given library at compile time, that is, as soon as that line is
316 parsed. See also the C<loadlib> opcode, which does the same at run time.
A library loaded this way is also available at runtime, as if it had been
319 loaded again in C<:load>, so there is no need to call C<loadlib> at runtime.
321 =item .HLL <hll_name>, <hll_lib>
323 Define the HLL for the current file. Takes two string constants. If the string
I<hll_lib> isn't empty, this compile-time pragma also loads the shared library
for the HLL, so that integer type constants work when creating new PMCs.
327 {{ PROPOSAL: make the ",<hll_lib>" part optional, so you don't have to
328 specify an empty string for the library.
329 (Alternatively, make this two different directives: .HLL_name, .HLL_lib)
332 =item .HLL_map <core_type>, <user_type>
334 {{ PROPOSAL: make the ',' an "->", "=>", "=", for instance, so it's easier
335 to remember what argument comes first, the core type or the user type.
338 Whenever Parrot has to create PMCs inside C code on behalf of the running
339 user program it consults the current type mapping for the executing HLL
340 and creates a PMC of type I<user_type> instead of I<core_type>, if such
341 a mapping is defined. I<core_type> and I<user_type> may be any valid string
344 For example, with this code snippet ...
349 .HLL_map 'LexPad', 'DynLexPad'
354 ... all subroutines for language I<Foo> would use a dynamic lexpad pmc.
356 {{ PROPOSAL: stop using integer constants for types RT#45453 }}
360 .sub <identifier> [:<flag> ...]
361 .sub <quoted string> [:<flag> ...]
363 Define a compilation unit. All code in a PIR source file must be defined in a
364 compilation unit. See the section C<Subroutine flags> for
365 available flags. Optional flags are a list of I<flag>, separated by empty
368 The name of the sub may be either a bare identifier or a quoted string
369 constant. Bare identifiers must be valid PIR identifiers (see L<Identifiers>
370 above), but string sub names can contain any characters, including characters
371 from different character sets (see L<Constants> above).
373 Always paired with C<.end>.
377 End a compilation unit. Always paired with C<.sub>.
379 =item .line <integer>, <string>
381 Set the line number and filename to the value specified. This is useful in
382 case the PIR code is generated from some source file, and any error messages
383 should print the source file, not the line number and filename of the
386 {{ DEPRECATION NOTE: was C<<#line <integer> <string>>>. See [RT#45857],
387 [RT#43269], and [RT#47141]. }}
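
To tie several of the directives above together, here is a hedged sketch
that opens a nested namespace, defines a compilation unit in it, and declares
a constant and two locals. The names C<Foo>, C<Bar>, C<greet>, C<GREETING>,
C<who>, and C<msg> are hypothetical.

  .namespace [ 'Foo' ; 'Bar' ]

  .sub greet
      .const string GREETING = 'hello'   # named constant for this unit
      .local string who, msg             # two locals of the same type
      who = 'world'
      msg = GREETING
      msg = msg . ', '
      msg = msg . who
      print msg
      print "\n"
  .end

The sub C<greet> is stored in the nested namespace C<Foo;Bar> rather than in
the root namespace.
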
391 =head3 Subroutine flags
397 Define "main" entry point to start execution. If multiple subroutines are
398 marked as B<:main>, the B<last> marked subroutine is used. Only the first
399 file loaded or compiled counts; subs marked as B<:main> are ignored by the
404 Run this subroutine when loaded by the B<load_bytecode> op (i.e. neither in
405 the initial program file nor compiled from memory). This is complementary to
406 what B<:init> does (below); to get both behaviours, use B<:init :load>. If
407 multiple subs have the B<:load> pragma, the subs are run in source code order.
411 Run the subroutine when the program is run directly (that is, not loaded as a
412 module), including when it is compiled from memory. This is complementary to
413 what B<:load> does (above); to get both behaviours, use B<:init :load>.
417 Do not install this subroutine in the namespace. Allows the subroutine
420 =item :multi(Type1, Type2...)
422 Engage in multiple dispatch with the listed types.
423 See L<docs/pdds/pdd27_multi_dispatch.pod> for more information on the
424 multiple dispatch system.
428 Execute this subroutine immediately after being compiled, which is analogous
429 to C<BEGIN> in Perl 5.
431 In addition, if the sub returns a PMC value, that value replaces the sub in
432 the constant table of the bytecode file. This makes it possible to build
433 constants at compile time, provided that (a) the generated constant can be
434 computed at compile time (i.e. doesn't depend on the runtime environment), and
435 (b) the constant value is of a PMC class that supports saving in a bytecode
436 file [need a freeze/thaw reference].
438 For example, C<examples/shootout/revcomp.pir> contains the following (slightly
439 abbreviated) definition:
441 .sub tr_00_init :immediate
443 tr_array = new 'FixedIntegerArray'
445 ## [code to initialize tr_array omitted.]
449 This code is run at compile time, and the returned C<tr_array> is stored
450 in the bytecode file in place of the sub. Other subs may then do:
452 .const .Sub tr_00 = 'tr_00_init'
454 in order to fetch the constant.
458 Execute immediately after being compiled, but only if the subroutine is in the
459 initial file (i.e. not in PIR compiled as result of a C<load_bytecode>
460 instruction from another file).
462 As an example, suppose file C<main.pir> contains:
465 load_bytecode "foo.pir"
468 and the file C<foo.pir> contains:
478 Executing C<foo.pir> will run both C<foo> and C<bar>. On the other hand,
479 executing C<main.pir> will run only C<foo>. If C<foo.pir> is compiled to
480 bytecode, only C<foo> will be run, and loading C<foo.pbc> will not run either
485 The marked C<.sub> is a method. In the method body, the object PMC
486 can be referred to with C<self>.
490 The marked C<.sub> overrides a v-table method. By default, a sub with the same
491 name as a v-table method does not override the v-table method. To specify that
there should be no namespace entry (that is, it just overrides the v-table
method and is not callable as a normal method), use B<:vtable :anon>. To give the
v-table method a different name, use B<:vtable("...")>. For example, to have
the method B<ToString> also be the v-table method B<get_string>, use
B<:vtable("get_string")>.
When the B<:vtable> flag is set, the object PMC can be referred to with
499 C<self>, as with the B<:method> flag.
502 =item :outer(subname)
504 The marked C<.sub> is lexically nested within the sub known by B<subname>.
506 =item :lexid( <string_constant> )
508 Identifies the subroutine by the specified string.
510 {{ TODO: explain purpose and details of this flag. }}
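
As an illustration of how flags select which sub runs and when, consider the
following sketch. The sub names C<helper>, C<main>, and C<on_load> are
hypothetical.

  .sub helper
      print "helper called\n"
  .end

  .sub main :main                 # entry point, even though it is not the first sub
      helper()
  .end

  .sub on_load :load              # runs only when this file is loaded via load_bytecode
      print "loaded as a library\n"
  .end

When this file is run directly, execution starts at C<main> and C<on_load>
is not run. When another program loads the file with C<load_bytecode>,
C<on_load> is run and the C<:main> flag is ignored.
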
515 =head3 Directives used for Parrot calling conventions.
517 {{ A bit of a radical idea, but now would be the time to decide on this:
518 Remove the whole "long-style" invocation syntax altogether.
519 Only allow the short version.
520 As PIR is typically being generated, and hopefully by PCT-based
521 compilers, there seems to be no real use for too much syntactic
522 sugar. Just a thought.
527 =item .begin_call and .end_call
529 Directives to start and end a subroutine invocation, respectively.
531 =item .begin_return and .end_return
533 Directives to start and end a statement to return values.
535 =item .begin_yield and .end_yield
537 Directives to start and end a statement to yield values.
541 Takes either 2 arguments: the sub and the return continuation, or the
542 sub only. For the latter case an B<invokecc> gets emitted. Providing
an explicit return continuation is more efficient if it is created
544 outside of a loop and the call is done inside a loop.
548 Directive to specify the object for a method call. Use it in combination
553 Directive to do a method call. It calls the specified method on the object
554 that was specified with the C<.invocant> directive.
558 Directive to make a call through the Native Calling Interface (NCI).
The specified subroutine must be loaded using the C<dlfunc> op that takes
560 the library, function name and function signature as arguments.
561 See L<docs/pdds/pdd16_native_call> for details.
563 =item .return <var> [:<flag>]*
565 Between C<.begin_return> and C<.end_return>, specify one or
566 more of the return value(s) of the current subroutine. Available
567 flags: C<:flat>, C<:named>.
569 =item .arg <var> [:<flag>]*
571 Between C<.begin_call> and C<.call>, specify an argument to be
572 passed. Available flags: C<:flat>, C<:named>.
574 =item .result <var> [:<flag>]*
576 Between C<.call> and C<.end_call>, specify where one or more return
577 value(s) should be stored. Available flags:
578 C<:slurpy>, C<:named>, C<:optional>, and C<:opt_flag>.
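
The long-style directives above can be combined as in the following sketch.
The sub names C<sum_two> and C<main> are hypothetical, and fetching the Sub
PMC with the C<get_global> op is an assumption made for this example.

  .sub sum_two
      .param int a
      .param int b
      $I0 = a + b
      .begin_return
      .return $I0
      .end_return
  .end

  .sub main :main
      .local int r
      $P0 = get_global 'sum_two'   # assumed way to fetch the Sub PMC
      .begin_call
      .arg 3
      .arg 4
      .call $P0                    # sub only, so an invokecc is emitted
      .result r
      .end_call
      print r
      print "\n"
  .end
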
582 =head3 Directives for subroutine parameters
586 =item .param <type> <identifier> [:<flag>]*
588 At the top of a subroutine, declare a local variable, in the manner
589 of C<.local>, into which parameter(s) of the current subroutine should
590 be stored. Available flags:
591 C<:slurpy>, C<:named>, C<:optional>, C<:opt_flag> and C<:unique_reg>.
593 =item .param <type> "<identifier>" => <identifier> [:<flag>]*
595 Define a named parameter. This is syntactic sugar for:
597 .param <type> <identifier> :named("<identifier>")
601 =head3 Parameter Passing and Getting Flags
603 See L<PDD03|pdds/pdd03_calling_conventions.pod> for a description of
604 the meaning of the flag bits C<SLURPY>, C<OPTIONAL>, C<OPT_FLAG>,
605 and C<FLAT>, which correspond to the calling convention flags
606 C<:slurpy>, C<:optional>, C<:opt_flag>, and C<:flat>.
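
A brief sketch of C<:optional> together with C<:opt_flag> follows. The sub
and parameter names are hypothetical. The C<:opt_flag> parameter is an
integer that tells the sub whether the preceding optional parameter was
actually passed.

  .sub greet
      .param string name
      .param string greeting     :optional
      .param int    has_greeting :opt_flag
      if has_greeting goto say
      greeting = 'Hello'         # default when the caller passed one argument
    say:
      print greeting
      print ', '
      print name
      print "\n"
  .end

  .sub main :main
      greet('world')
      greet('world', 'Goodbye')
  .end
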
609 =head3 Catching Exceptions
611 Using the C<push_eh> op you can install an exception handler. If an exception
612 is thrown, Parrot will execute the installed exception handler. In order to
613 retrieve the thrown exception, use the C<.get_results> directive. This
614 directive always takes 2 arguments: an exception object and a message string.
616 {{ Wouldn't it be more useful to make this flexible, or at least only the
617 exception object? The message can be retrieved from the exception object. }}
623 .local string message
624 .get_results (exception, message)
627 This is syntactic sugar for the C<get_results> op, but any flags set on the
628 targets will be handled automatically by the PIR compiler.
629 The C<.get_results> directive must be the first instruction of the exception
630 handler; only declarations (.lex, .local) may come first.
632 =head2 Syntactic Sugar
634 Any PASM opcode is a valid PIR instruction. In addition, PIR defines some
635 syntactic shortcuts. These are provided for ease of use by humans producing
and maintaining PIR code.
640 =item goto <identifier>
642 C<branch> to I<identifier> (label or subroutine name).
648 =item if <var> goto <identifier>
If I<var> evaluates as true, jump to the named I<identifier>. Translates to
C<if var, identifier>.
653 =item unless <var> goto <identifier>
Unless I<var> evaluates as true, jump to the named I<identifier>. Translates
to C<unless var, identifier>.
658 =item if null <var> goto <identifier>
If I<var> evaluates as null, jump to the named I<identifier>. Translates to
C<if_null var, identifier>.
663 =item unless null <var> goto <identifier>
Unless I<var> evaluates as null, jump to the named I<identifier>. Translates
to C<unless_null var, identifier>.
668 =item if <var1> <relop> <var2> goto <identifier>
The I<relop> can be: C<E<lt>, E<lt>=, ==, !=, E<gt>=, E<gt>> which translate
671 to the PASM opcodes C<lt>, C<le>, C<eq>, C<ne>, C<ge> or C<gt>. If
672 I<var1 relop var2> evaluates as true, jump to the named I<identifier>.
674 =item unless <var1> <relop> <var2> goto <identifier>
The I<relop> can be: C<E<lt>, E<lt>=, ==, !=, E<gt>=, E<gt>> which translate
677 to the PASM opcodes C<lt>, C<le>, C<eq>, C<ne>, C<ge> or C<gt>. Unless
678 I<var1 relop var2> evaluates as true, jump to the named I<identifier>.
680 =item <var1> = <var2>
682 Assign a value. Translates to C<set var1, var2>.
684 =item <var1> = <unary> <var2>
686 The unaries C<!>, C<-> and C<~> generate C<not>, C<neg> and C<bnot> ops.
688 =item <var1> = <var2> <binary> <var3>
690 The binaries C<+>, C<->, C<*>, C</>, C<%> and C<**> generate
691 C<add>, C<sub>, C<mul>, C<div>, C<mod> and C<pow> arithmetic ops.
Binary C<.> is C<concat> and is only valid for string arguments.
694 C<E<lt>E<lt>> and C<E<gt>E<gt>> are arithmetic shifts C<shl> and C<shr>.
695 C<E<gt>E<gt>E<gt>> is the logical shift C<lsr>.
697 C<&&>, C<||> and C<~~> are logic C<and>, C<or> and C<xor>.
699 C<&>, C<|> and C<~> are binary C<band>, C<bor> and C<bxor>.
701 {{PROPOSAL: Change description to support logic operators (comparisons) as
702 implemented (and working) in imcc.y.}}
704 =item <var1> <op>= <var2>
706 This is equivalent to
C<E<lt>var1E<gt> = E<lt>var1E<gt> E<lt>opE<gt> E<lt>var2E<gt>>, where
708 I<op> is called an assignment operator and can be any of the following
709 binary operators described earlier: C<+>, C<->, C<*>, C</>, C<%>, C<.>,
710 C<&>, C<|>, C<~>, C<E<lt>E<lt>>, C<E<gt>E<gt>> or C<E<gt>E<gt>E<gt>>.
712 =item <var> = <var> [ <var> ]
714 This generates either a keyed C<set> operation or C<substr var, var,
715 var, 1> for string arguments and an integer key.
717 =item <var> = <var> [ <key> ]
719 {{ NOTE: keyed assignment is still valid in PIR, but the C<..> notation in
720 keys is deprecated [See RT #48561], so this syntactic sugar for slices is also
721 deprecated. See the (currently experimental) C<slice> opcode instead. }}
727 returns a slice defined starting at C<var1> and ending at C<var2>.
731 returns a slice starting at the first element, and ending at C<var2>.
735 returns a slice starting at C<var1> to the end of the array.
737 see src/pmc/slice.pmc
740 =item <var> [ <var> ] = <var>
742 A keyed C<set> operation.
744 {{ DEPRECATION NOTE: this syntactic sugar will no longer be used for the
745 assign C<substr> op with a length of 1. }}
747 =item <var> = <opcode> <arguments>
749 All opcodes can use this PIR syntactic sugar. The first argument for the
750 opcode is placed before the C<=>, and all remaining arguments go after the
751 opcode name. For example:
759 =item global "string" = <var>
761 {{ DEPRECATED: op store_global was deprecated }}
763 =item <var> = global "string"
765 {{ DEPRECATED: op find_global was deprecated }}
767 =item ([<var1> [:<flag1> ...], ...]) = <var2>([<arg1> [:<flag2> ...], ...])
775 .result <var1> <flag1>
779 =item <var> = <var>([arg [:<flag> ...], ...])
781 =item <var>([arg [:<flag> ...], ...])
783 =item <var>."_method"([arg [:<flag> ...], ...])
785 =item <var>._method([arg [:<flag> ...], ...])
787 Function or method call. These notations are shorthand for a longer PCC
788 function call. I<var> can denote a global subroutine, a local I<identifier> or
791 {{We should review the (currently inconsistent) specification of the
792 method name. Currently it can be a bare word, a quoted string or a
793 string register. See #45859.}}
795 =item .return ([<var> [:<flag> ...], ...])
797 Return from the current compilation unit with zero or more values.
The surrounding parentheses are mandatory. Besides making the sequence
break more conspicuous, this is necessary to distinguish this syntax
from other uses of the C<.return> directive that will probably be
804 =item .return <var>(args)
806 =item .return <var>."somemethod"(args)
808 =item .return <var>.somemethod(args)
810 Tail call: call a function or method and return from the sub with the
811 function or method call return values.
813 Internally, the call stack doesn't increase because of a tail call, so
814 you can write recursive functions and not have stack overflows.
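
The sketch below shows a few of the shortcuts above next to the opcode forms
they stand for, followed by a tail call. All sub, label, and variable names
are hypothetical.

  .sub count_down
      .param int n
      if n == 0 goto done          # eq n, 0, done
      n -= 1                       # same as n = n - 1
      .return count_down(n)        # tail call: the call stack does not grow
    done:
      .return ()
  .end

  .sub main :main
      .local int a, b, c
      a = 6                        # set a, 6
      b = 7
      c = a * b                    # mul c, a, b
      c += 1                       # same as c = c + 1
      $S0 = 'abc'
      $I0 = length $S0             # length $I0, $S0
      if c > 40 goto big           # gt c, 40, big
      print "small\n"
      goto finish                  # branch finish
    big:
      print "big\n"
    finish:
      count_down(100000)
  .end

Because C<count_down> returns through a tail call, the deep recursion in
C<main> is not expected to exhaust the call stack.
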
818 =head2 Assignment and Morphing
820 The C<=> syntactic sugar in PIR, when used in the simple case of:
824 directly corresponds to the C<set> opcode. So, two low-level arguments (int,
825 num, or string registers, variables, or constants) are a direct C assignment,
826 or a C-level conversion (int cast, float cast, a string copy, or a call to one
827 of the conversion functions like C<string_to_num>).
A PMC source with a low-level destination calls the C<get_integer>,
830 C<get_number>, or C<get_string> vtable function on the PMC. A low-level source
831 with a PMC destination calls the C<set_integer_native>, C<set_number_native>,
832 or C<set_string_native> vtable function on the PMC (assign to value
833 semantics). Two PMC arguments are a direct C assignment (assign to container
836 For assign to value semantics for two PMC arguments use C<assign>, which calls
837 the C<assign_pmc> vtable function.
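
The difference between the two kinds of PMC assignment can be sketched as
follows, assuming the core C<Integer> PMC implements C<assign_pmc> by copying
the value. The register variables are illustrative.

  .sub main :main
      $P0 = new 'Integer'
      $P0 = 5             # low-level source, PMC destination: set_integer_native
      $P1 = $P0           # assign to container: $P1 and $P0 are the same PMC
      $P2 = new 'Integer'
      assign $P2, $P0     # assign to value: $P2 keeps its own PMC, value copied
      $P0 = 7             # changes the PMC shared by $P0 and $P1
      print $P1           # expected: 7
      print "\n"
      print $P2           # expected: 5
      print "\n"
  .end
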
840 {{ NOTE: response to the question:
842 <pmichaud> I don't think that 'morph' as a method call is a good idea
843 <pmichaud> we need something that says "assign to value" versus
844 "assign to container"
845 <pmichaud> we can't eliminate the existing 'morph' opcode until we have a
853 This section describes the macro layer of the PIR language. The macro layer of
854 the PIR compiler handles the following directives:
858 =item * C<.include> "<filename>"
860 The C<.include> directive takes a string argument that contains the
861 name of the PIR file that is included. The contents of the included
862 file are inserted as if they were written at the point where the
863 C<.include> directive occurs.
865 The include file is searched for in the current directory and in
866 runtime/parrot/include, in that order. The first file of that name to be found
869 {{ Check the include directive's search order and whether it's complete }}
871 =item * C<.macro> <identifier> [<parameters>]
The C<.macro> directive starts a macro definition named by the specified
874 identifier. The optional parameter list is a comma-separated list of
875 identifiers, enclosed in parentheses. See C<.endm> for ending the macro
880 Closes a macro definition.
882 =item * C<.macro_const> <identifier> (<literal>|<reg>)
886 The C<.macro_const> directive is a special type of macro; it allows the user
887 to use a symbolic name for a constant value. Like C<.macro>, the substitution
occurs at compile time. It takes two arguments (not comma separated): the
first is an identifier, the second a constant value or a register.
893 The macro layer is completely implemented in the lexical analysis phase.
894 The parser does not know anything about what happens in the lexical
897 When the C<.include> directive is encountered, the specified file is opened
898 and the following tokens that are requested by the parser are read from
901 A macro expansion is a dot-prefixed identifier. For instance, if a macro
902 was defined as shown below:
908 this macro can be expanded by writing C<.foo(42)>. The body of the macro
909 will be inserted at the point where the macro expansion is written.
911 A C<.macro_const> expansion is more or less the same as a C<.macro> expansion,
912 except that a constant expansion cannot take any arguments, and the
913 substitution of a C<.macro_const> contains no newlines, so it can be used
914 within a line of code.
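
As a small sketch of both kinds of expansion (the names C<ANSWER>,
C<say_twice>, and C<msg> are hypothetical), a constant macro and a
one-parameter macro might be defined and used like this:

  .macro_const ANSWER 42

  .macro say_twice(msg)
      print .msg
      print .msg
  .endm

  .sub main :main
      .say_twice("hi\n")      # the macro body is inserted here, with .msg replaced
      print .ANSWER           # expands in place, within the line
      print "\n"
  .end
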
916 =head3 Macro parameter list
918 The parameter list for a macro is specified in parentheses after the name of
919 the macro. Macro parameters are not typed.
921 .macro foo(bar, baz, buz)
925 The number of arguments in the call to a macro must match the number of
926 parameters in the macro's parameter list. Macros do not perform multidispatch,
927 so you can't have two macros with the same name but different parameters.
928 Calling a macro with the wrong number of arguments gives the user an error.
930 If a macro defines no parameter list, parentheses are optional on both the
931 definition and the call. This means that a macro defined as:
937 can be expanded by writing either C<.foo> or C<.foo()>. And a macro definition
944 can also be expanded by writing either C<.foo> or C<.foo()>.
946 {{ NOTE: this is a change from the current implementation, which requires the
947 definition and call of a zero-parameter macro to match in the use of
952 =item * Heredoc arguments
954 Heredoc arguments are not allowed when expanding a macro. This means that
955 the following is not allowed:
967 {{ NOTE: This is likely because the parsing of heredocs happens later than the
968 preprocessing of macros. Might be nice if we could parse heredocs at the macro
969 level, but not a high priority. compilers/pirc/new can do this. }}
971 Using braces, { }, allows you to span multiple lines for an argument.
972 See runtime/parrot/include/hllmacros.pir for examples and possible usage.
973 A simple example is this:
989 This will expand the macro C<foo>, after which the input to the PIR parser is:
998 which will result in the output:
1004 =head3 Unique local labels
1006 Within the macro body, the user can declare a unique label identifier using
1007 the value of a macro parameter, like so:
1016 =head3 Unique local variables
1018 Within the macro body, the user can declare a local variable with a unique
1026 print .b # prints the value of the unique variable (42)
1030 The C<.macro_local> directive declares a local variable with a unique name in
1031 the macro. When the macro C<.foo()> is called, the resulting code that is
1032 given to the parser will read as follows:
1035 .local int local__foo__b__2
1037 local__foo__b__2 = 42
1038 print local__foo__b__2
1042 The user can also declare a local variable with a unique name set to the
1043 symbolic value of one of the macro parameters.
1050 print .$b # prints the value of the unique variable (42)
1051 print .b # prints the value of parameter "b", which is
1052 # also the name of the variable.
1056 So, the special C<$> character indicates whether the symbol is interpreted as
1057 just the value of the parameter, or that the variable by that name is meant.
1058 Obviously, the value of C<b> should be a string.
1060 The automatic name munging on C<.macro_local> variables allows for using
1061 multiple macros, like so:
1076 This will result in code for the parser as follows:
1079 .local int local__foo__x__2
1080 .local int local__bar__x__4
1083 Each expansion is associated with a unique number; for labels declared with
1084 C<.macro_label> and locals declared with C<.macro_local> expansions, this
1085 means that multiple expansions of a macro will not result in conflicting
1086 label or local names.
1088 =head3 Ordinary local variables
1090 Defining a non-unique variable can still be done, using the normal syntax:
1097 When invoking the macro C<foo> as follows:
1101 there will be two variables: C<b> and C<x>. When the macro is invoked twice:
1108 the resulting code that is given to the parser will read as follows:
1112 .local int local__foo__x
1114 .local int local__foo__y
1117 Obviously, this will result in an error, as the variable C<b> is defined
twice. If you intend the macro to create unique variable names, use
1119 C<.macro_local> instead of C<.local> to take advantage of the name munging.
1123 =head2 Subroutine Definition
1125 .sub _sub_label [<subflag>]*
1136 =head2 Subroutine Call
1138 .const .Sub $P0 = "_sub_label"
1139 $P1 = new 'Continuation'
1140 set_addr $P1, ret_addr
1149 .call $P0, $P1 # r = _sub_label(x, y, z)
1151 .local int r # optional - new result var
  loadlib $P0, "libname"
  dlfunc $P1, $P0, "funcname", "signature"
1164 .nci_call $P1 # r = funcname(x, y, z)
1165 .local int r # optional - new result var
1169 =head2 Subroutine Call Syntactic Sugar
1171 ... # variable decls
1172 r = _sub_label(x, y, z)
1173 (r1[, r2 ...]) = _sub_label(x, y, z)
This also works for NCI calls, as the subroutine PMC will be
an NCI sub, and on invocation will do the Right Thing.
A subroutine object can also be used instead of the label:
1180 find_global $P0, "_sub_label"
1186 .namespace [ "Foo" ]
1188 .sub _sub_label :method [,Subpragma, ...]
1193 self."_other_meth"()
The variable C<self> automatically refers to the invoking object if the
subroutine declaration carries the C<:method> flag.
1204 =head2 Calling Methods
The syntax is very similar to subroutine calls. The call is done with
C<.meth_call>, which must be immediately preceded by C<.invocant>:
1211 newclass class, "Foo"
1218 .meth_call "_method" [, $P1 ] # r = obj."_method"(x, y, z)
1219 .local int r # optional - new result var
1223 The return continuation is optional. The method can be a string
1224 constant or a string variable.
1226 =head2 Returning and Yielding
1228 .return ( a, b ) # return the values of a and b
1230 .return () # return no value
1232 .return func_call() # tail call function
1234 .return o."meth"() # tail method call
Similarly, one can yield using the C<.yield> directive:
1238 .yield ( a, b ) # yield with the values of a and b
1240 .yield () # yield with no value
1243 =head2 Stack calling conventions
1245 Arguments are B<save>d in reverse order onto the user stack:
1247 .arg y # save args in reversed order
1249 call _foo #(r, s) = _foo(x,y)
1252 .result r # restore results in order
1255 and return values are B<restore>d in argument order from there.
1259 .sub _foo # sub foo(int a, int b)
1261 .param int a # receive arguments from left to right
1265 .return mi # return (pl, mi), push results
1266 .return pl # in reverse order
Pushing arguments in reversed order on the user stack makes the leftmost
argument the top-of-stack entry. This allows for a variable
number of function arguments (and return values), where the leftmost
1274 argument before a variable number of following arguments is the
1288 See C<docs/imcc/macros.pod>