docs/pdds/draft/pdd19_pir.pod

   1 # Copyright (C) 2007, The Perl Foundation.
   2 # $Id$
   3
   4 =head1 NAME
   5
   6 docs/pdds/pdd19_pir.pod - Parrot Intermediate Representation
   7
   8 =head1 VERSION
   9
  10 $Revision$
  11
  12
  13 =head1 ABSTRACT
  14
  15 This document outlines the architecture and core syntax of the Parrot
  16 Intermediate Representation (PIR).
  17
  18 This document describes PIR, a stable, middle-level language for both
  19 compilers and humans to target.
  20
  21 =head1 DESCRIPTION
  22
  23 PIR is a stable, middle-level language intended both as a target for the
  24 generated output from high-level language compilers, and for human use
  25 developing core features and extensions for Parrot.
  26
  27 =head1 IMPLEMENTATION
  28
  29 =head2 Basic Syntax
  30
  31 A valid PIR program consists of a sequence of statements, directives, comments
  32 and empty lines.
  33
  34 =head3 Statements
  35
  36 A statement starts with an optional label, contains an instruction, and is
  37 terminated by a newline (<NL>). Each statement must be on its own line.
  38
  39   [label:] [instruction] <NL>
  40
  41 An instruction may be either a low-level opcode or a higher-level PIR
  42 operation, such as a subroutine call, a method call, a directive, or PIR
  43 syntactic sugar.
  44
  45 =head3 Directives
  46
  47 A directive provides information for the PIR compiler that is outside the
  48 normal flow of executable statements. Directives are all prefixed with a ".",
  49 as in C<.local> or C<.sub>.
  50
  51 =head3 Comments
  52
  53 Comments start with C<#> and last until the following newline. PIR also allows
  54 comments in Pod format. Comments, Pod content, and empty lines are ignored.
  55
  56 =head3 Identifiers
  57
  58 Identifiers start with a letter or underscore, and may then contain additional
  59 letters, digits, and underscores. Identifiers don't have any limit on length
  60 at the moment, but some sane-but-generous length limit may be imposed in the
  61 future (256 chars, 1024 chars?). The following examples are all valid
  62 identifiers.
  63
  64     a
  65     _a
  66     A42
  67
  68 Opcode names are not reserved words in PIR, and may be used as variable names.
  69 For example, you can define a local variable named C<print>.  [See RT #24251]
  70
  71 {{ NOTE: The use of C<::> in identifiers is deprecated. [See RT #48735] }}
  72
  73 =head3 Labels
  74
  75 A label declaration consists of a label name followed by a colon. A label name
  76 conforms to the standard requirements for identifiers. A label declaration may
  77 occur at the start of a statement, or stand alone on a line, but always within
  78 a compilation unit.
  79
  80 A reference to a label consists of only the label name, and is generally used
  81 as an argument to an instruction or directive.
  82
  83 A PIR label is accessible only in the compilation unit where it's defined. A
  84 label name must be unique within a compilation unit, but it can be reused in
  85 other compilation units.
  86
  87   goto label1
  88      ...
  89   label1:
  90
  91 =head3 Registers and Variables
  92
  93 There are three ways of referencing Parrot's registers. The first is direct
  94 access to a specific register by name In, Sn, Nn, Pn. The second is through a
  95 temporary register variable $In, $Sn, $Nn, $Pn. I<n> consists of digit(s)
  96 only.  There is no limit on the size of I<n>.
  97
  98 The third syntax for accessing registers is through named local variables
  99 declared with C<.local>.
 100
 101   .local pmc foo
 102
 103 The type of a named variable can be C<int>, C<num>, C<string> or C<pmc>,
 104 corresponding to the types of registers. No other types are used. [See
 105 RT#42769]
 106
 107 The difference between direct register access and register variables or local
 108 variables is largely a matter of allocation. If you directly reference C<P99>,
 109 Parrot will blindly allocate 100 registers for that compilation unit. If you
 110 reference C<$P99> or a named variable C<foo>, on the other hand, Parrot will
 111 intelligently allocate a literal register in the background. So, C<$P99> may
 112 be stored in C<P0>, if it is the only register in the compilation unit.
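
      As a minimal sketch of the second and third access styles (the sub name
      C<main> and the variable names are illustrative, not taken from this
      document), the following uses a named local and two register variables, and
      leaves the choice of literal registers to the PIR compiler:

        .sub main :main
            .local int count      # named local, backed by an integer register
            count = 21
            $I0 = count * 2       # $I0 is a temporary integer register variable
            $S0 = "answer: "      # $S0 is a temporary string register variable
            print $S0
            print $I0
            print "\n"
        .end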
 113
 114 =head2 Constants
 115
 116 Constants may be used in place of registers or variables. A constant is not
 117 allowed on the left side of an assignment, or in any other context where the
 118 variable would be modified.
 119
 120 =over 4
 121
 122 =item 'single-quoted string constant'
 123
 124 Are delimited by single-quotes (C<'>). They are taken to be ASCII encoded. No
 125 escape sequences are processed.
 126
 127 =item "double-quoted string constants"
 128
 129 Are delimited by double-quotes (C<">). A C<"> inside a string must be escaped
 130 by C<\>.  Only 7-bit ASCII is accepted in string constants; to use characters
  131 outside that range, specify a character set or encoding as described below.
 132
 133 =item <<"heredoc",  <<'heredoc'
 134
 135 Heredocs work like single or double quoted strings. All lines up to
 136 the terminating delimiter are slurped into the string. The delimiter
 137 has to be on its own line, at the beginning of the line and with no
 138 trailing whitespace.
 139
 140 Assignment of a heredoc:
 141
 142   $S0 = <<"EOS"
 143   ...
 144  EOS
 145
 146 A heredoc as an argument:
 147
 148   function(<<"END_OF_HERE", arg)
 149   ...
 150  END_OF_HERE
 151
 152   .return(<<'EOS')
 153   ...
 154  EOS
 155
 156   .yield(<<'EOS')
 157   ...
 158  EOS
 159
 160 You may have multiple heredocs within a single statement or directive:
 161
 162    function(<<'INPUT', <<'OUTPUT', 'some test')
 163    ...
 164  INPUT
 165    ...
 166  OUTPUT
 167
 168 =item charset:"string constant"
 169
 170 Like above with a character set attached to the string. Valid character
 171 sets are currently: C<ascii> (the default), C<binary>, C<unicode>
 172 (with UTF-8 as the default encoding), and C<iso-8859-1>.
 173
 174 =back
 175
 176 =head2 String escape sequences
 177
 178 Inside double-quoted strings the following escape sequences are processed.
 179
 180   \xhh        1..2 hex digits
 181   \ooo        1..3 oct digits
 182   \cX         control char X
 183   \x{h..h}    1..8 hex digits
 184   \uhhhh      4 hex digits
 185   \Uhhhhhhhh  8 hex digits
 186   \a, \b, \t, \n, \v, \f, \r, \e, \\
 187
 188 =over 4
 189
 190 =item encoding:charset:"string constant"
 191
 192 Like above with an extra encoding attached to the string. For example:
 193
 194   set S0, utf8:unicode:"«"
 195
  196 The encoding and charset are attached to the string and no further processing
  197 is done; specifically, escape sequences are not honored.
 198
 199 =item numeric constants
 200
 201 C<0x> and C<0b> denote hex and binary constants respectively.
 202
 203 =back
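
      A short sketch of how the constant forms described above might be used
      together. The sub name and the values are illustrative only:

        .sub main :main
            $S0 = 'single \n quoted'   # single quotes: the backslash and n stay literal
            $S1 = "double\tquoted\n"   # double quotes: \t and \n are processed
            $S2 = ascii:"plain"        # explicit character set prefix
            $I0 = 0x2A                 # hex constant
            $I1 = 0b101010             # binary constant with the same value
            print $S0
            print "\n"
            print $S1
            print $S2
            print "\n"
            print $I0
            print "\n"
            print $I1
            print "\n"
        .end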
 204
 205 =head2 Directives
 206
 207 =over 4
 208
 209 =item .local <type> <identifier> [:unique_reg]
 210
 211 Define a local name I<identifier> for this compilation unit with the given
 212 I<type>. You can define multiple identifiers of the same type by separating
 213 them with commas:
 214
 215   .local int i, j
 216
 217 The optional C<:unique_reg> modifier will force the register allocator to
 218 associate the identifier with a unique register for the duration of the
 219 compilation unit.
 220
 221 =item .lex <string constant>, <reg>
 222
 223 Declare a lexical variable that is an alias for a PMC register. For example,
 224 given this preamble:
 225
 226     .lex "$a", $P0
 227     $P1 = new 'Integer'
 228
  229 These two opcodes have an identical effect:
 230
 231     $P0 = $P1
 232     store_lex "$a", $P1
 233
  234 And these two opcodes also have an identical effect:
 235
 236     $P1 = $P0
 237     $P1 = find_lex "$a"
 238
 239 =item .const <type> <identifier> = <const>
 240
 241 {{ PROPOSAL: add
 242    .const <string constant> <identifier> = <const>
 243    as an alternative to allow ".const 'Sub' ... "
 244 }}
 245
 246 Define a constant named I<identifier> of type I<type> and assign value
 247 I<const> to it. The constant is stored in the constant table of the current
 248 bytecode file.
 249
 250 =item .globalconst <type> <identifier> = <const>
 251
 252 As C<.const> above, but the defined constant is globally accessible.
 253
 254 =item .namespace <identifier> [deprecated: See RT #48737]
 255
 256 Open a new scope block. This "namespace" is not the same as the
 257 .namespace [ <identifier> ] syntax, which is used for storing subroutines
 258 in a particular namespace in the global symbol table.
 259 This directive is useful in cases such as (pseudocode):
 260
 261   local x = 1;
 262   print(x);       # prints 1
 263   do              # open a new namespace/scope block
 264     local x = 2;  # this x hides the previous x
 265     print(x);     # prints 2
 266   end             # close the current namespace
 267   print(x);       # prints 1 again
 268
  269 Common language constructs that have nested scopes, such as if, for, while
  270 and repeat, can use this directive.
 271
 272 {{ NOTE: this variation of C<.namespace> and C<.endnamespace> are deprecated.
 273 They were a hackish attempt at implementing scopes in Parrot, but didn't
 274 actually turn out to be useful.}}
 275
 276 =item .endnamespace <identifier> [deprecated: See RT #48737]
 277
 278 Closes the scope block that was opened with .namespace <identifier>.
 279
 280 =item .namespace [ <identifier> ; <identifier> ]
 281
 282    .namespace [ <key>? ]
 283
 284    key: <identifier> [';' <identifier>]*
 285
  286 Defines the namespace from this point onwards.  By default the program is not
  287 in any namespace.  If you specify more than one identifier, separated by
  288 semicolons, this creates nested namespaces by storing the inner namespace
  289 object in the outer namespace's global pad.
 290
 291 You can specify the root namespace by using empty brackets, such as:
 292
 293     .namespace [ ]
 294
 295 The brackets are not optional, although the string inside them is.
 296
 297 {{ NOTE: currently the brackets *are* optional. TODO: make decision whether
 298    we want the brackets optional. }}
 299
 300
 301 =item .pragma n_operators
 302
  303 Convert arithmetic infix operators to n_infix operations. The unary opcodes
 304 C<abs>, C<not>, C<bnot>, C<bnots>, and C<neg> are also changed to use a C<n_>
 305 prefix.
 306
 307  .pragma n_operators 1
 308  .sub foo
 309    ...
 310    $P0 = $P1 + $P2           # n_add $P0, $P1, $P2
 311    $P2 = abs $P0             # n_abs $P2, $P0
 312
 313 =item .loadlib "lib_name"
 314
  315 Load the given library at compile time, that is, as soon as that line is
 316 parsed.  See also the C<loadlib> opcode, which does the same at run time.
 317
  318 A library loaded this way is also available at runtime, as if it had been
 319 loaded again in C<:load>, so there is no need to call C<loadlib> at runtime.
 320
 321 =item .HLL <hll_name>, <hll_lib>
 322
 323 Define the HLL for the current file. Takes two string constants. If the string
  324 I<hll_lib> isn't empty, this compile-time pragma also loads the shared library
  325 for the HLL, so that integer type constants work when creating new PMCs.
 326
 327 {{ PROPOSAL: make the ",<hll_lib>" part optional, so you don't have to
 328    specify an empty string for the library.
 329    (Alternatively, make this two different directives: .HLL_name, .HLL_lib)
 330 }}
 331
 332 =item .HLL_map <core_type>, <user_type>
 333
 334 {{ PROPOSAL: make the ',' an "->", "=>", "=", for instance, so it's easier
 335    to remember what argument comes first, the core type or the user type.
 336 }}
 337
 338 Whenever Parrot has to create PMCs inside C code on behalf of the running
 339 user program it consults the current type mapping for the executing HLL
 340 and creates a PMC of type I<user_type> instead of I<core_type>, if such
 341 a mapping is defined. I<core_type> and I<user_type> may be any valid string
 342 constant.
 343
 344 For example, with this code snippet ...
 345
 346   .loadlib 'dynlexpad'
 347
 348   .HLL "Foo", ""
 349   .HLL_map 'LexPad', 'DynLexPad'
 350
 351   .sub main :main
 352     ...
 353
  354 ... all subroutines for language I<Foo> would use a dynamic lexpad PMC.
 355
 356 {{ PROPOSAL: stop using integer constants for types RT#45453 }}
 357
 358 =item .sub
 359
 360   .sub <identifier> [:<flag> ...]
 361   .sub <quoted string> [:<flag> ...]
 362
 363 Define a compilation unit. All code in a PIR source file must be defined in a
 364 compilation unit. See the section C<Subroutine flags> for
  365 available flags.  The optional flags are a list of I<flag> items separated by
  366 spaces.
 367
 368 The name of the sub may be either a bare identifier or a quoted string
 369 constant. Bare identifiers must be valid PIR identifiers (see L<Identifiers>
 370 above), but string sub names can contain any characters, including characters
 371 from different character sets (see L<Constants> above).
 372
 373 Always paired with C<.end>.
 374
 375 =item .end
 376
 377 End a compilation unit. Always paired with C<.sub>.
 378
 379 =item .line <integer>, <string>
 380
 381 Set the line number and filename to the value specified. This is useful in
 382 case the PIR code is generated from some source file, and any error messages
 383 should print the source file, not the line number and filename of the
 384 generated file.
 385
 386 {{ DEPRECATION NOTE: was C<<#line <integer> <string>>>. See [RT#45857],
 387 [RT#43269], and [RT#47141]. }}
 388
 389 =back
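
      A minimal sketch that combines several of the directives above: C<.sub> and
      C<.end> delimit the compilation unit, C<.const> names a constant, and
      C<.local> declares two identifiers of the same type in one statement. The
      names C<LIMIT>, C<i> and C<total> are illustrative only:

        .sub main :main
            .const int LIMIT = 3
            .local int i, total
            i = 0
            total = 0
          loop:
            if i >= LIMIT goto done
            total += i            # accumulate 0 + 1 + 2
            inc i
            goto loop
          done:
            print total
            print "\n"
        .end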
 390
 391 =head3 Subroutine flags
 392
 393 =over 4
 394
 395 =item :main
 396
 397 Define "main" entry point to start execution.  If multiple subroutines are
 398 marked as B<:main>, the B<last> marked subroutine is used.  Only the first
 399 file loaded or compiled counts; subs marked as B<:main> are ignored by the
 400 B<load_bytecode> op.
 401
 402 =item :load
 403
 404 Run this subroutine when loaded by the B<load_bytecode> op (i.e. neither in
 405 the initial program file nor compiled from memory).  This is complementary to
 406 what B<:init> does (below); to get both behaviours, use B<:init :load>.  If
  407 multiple subs have the B<:load> flag, the subs are run in source code order.
 408
 409 =item :init
 410
 411 Run the subroutine when the program is run directly (that is, not loaded as a
 412 module), including when it is compiled from memory.  This is complementary to
 413 what B<:load> does (above); to get both behaviours, use B<:init :load>.
 414
 415 =item :anon
 416
 417 Do not install this subroutine in the namespace. Allows the subroutine
 418 name to be reused.
 419
 420 =item :multi(Type1, Type2...)
 421
 422 Engage in multiple dispatch with the listed types.
 423 See L<docs/pdds/pdd27_multi_dispatch.pod> for more information on the
 424 multiple dispatch system.
 425
 426 =item :immediate
 427
 428 Execute this subroutine immediately after being compiled, which is analogous
 429 to C<BEGIN> in Perl 5.
 430
 431 In addition, if the sub returns a PMC value, that value replaces the sub in
 432 the constant table of the bytecode file.  This makes it possible to build
 433 constants at compile time, provided that (a) the generated constant can be
 434 computed at compile time (i.e. doesn't depend on the runtime environment), and
 435 (b) the constant value is of a PMC class that supports saving in a bytecode
 436 file [need a freeze/thaw reference].
 437
 438 For example, C<examples/shootout/revcomp.pir> contains the following (slightly
 439 abbreviated) definition:
 440
 441     .sub tr_00_init :immediate
 442     .local pmc tr_array
 443     tr_array = new 'FixedIntegerArray'
 444     tr_array = 256
 445     ## [code to initialize tr_array omitted.]
 446     .return (tr_array)
 447     .end
 448
 449 This code is run at compile time, and the returned C<tr_array> is stored
 450 in the bytecode file in place of the sub.  Other subs may then do:
 451
 452     .const .Sub tr_00 = 'tr_00_init'
 453
 454 in order to fetch the constant.
 455
 456 =item :postcomp
 457
 458 Execute immediately after being compiled, but only if the subroutine is in the
 459 initial file (i.e. not in PIR compiled as result of a C<load_bytecode>
 460 instruction from another file).
 461
 462 As an example, suppose file C<main.pir> contains:
 463
 464     .sub main
 465         load_bytecode "foo.pir"
 466     .end
 467
 468 and the file C<foo.pir> contains:
 469
 470     .sub foo :immediate
 471         print "42"
 472     .end
 473
 474     .sub bar :postcomp
 475         print "43"
 476     .end
 477
 478 Executing C<foo.pir> will run both C<foo> and C<bar>.  On the other hand,
 479 executing C<main.pir> will run only C<foo>.  If C<foo.pir> is compiled to
 480 bytecode, only C<foo> will be run, and loading C<foo.pbc> will not run either
 481 C<foo> or C<bar>.
 482
 483 =item :method
 484
 485 The marked C<.sub> is a method. In the method body, the object PMC
 486 can be referred to with C<self>.
 487
 488 =item :vtable
 489
 490 The marked C<.sub> overrides a v-table method. By default, a sub with the same
 491 name as a v-table method does not override the v-table method. To specify that
 492 there should be no namespace entry (that is, it just overrides the v-table
  493 method but is not callable as a normal method), use B<:vtable :anon>. To give the
 494 v-table method a different name, use B<:vtable("...")>. For example, to have
  495 the method B<ToString> also be the v-table method B<get_string>, use
 496 B<:vtable("get_string")>.
 497
  498 When the B<:vtable> flag is set, the object PMC can be referred to with
 499 C<self>, as with the B<:method> flag.
 500
 501
 502 =item :outer(subname)
 503
 504 The marked C<.sub> is lexically nested within the sub known by B<subname>.
 505
 506 =item :lexid( <string_constant> )
 507
 508 Identifies the subroutine by the specified string.
 509
 510 {{ TODO: explain purpose and details of this flag. }}
 511
 512 =back
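
      The following sketch shows how a few of these flags might be combined in one
      file. The sub names and messages are illustrative, and the sketch makes no
      claim about the relative order in which the subs run beyond what the flag
      descriptions above state:

        .sub setup :init :load
            # runs both when the file is run directly and when it is loaded
            print "setup\n"
        .end

        .sub helper :anon
            # not installed in the namespace, so the name can be reused
            print "helper\n"
        .end

        .sub main :main
            # entry point when this is the first file loaded or compiled
            print "main\n"
        .end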
 513
 514
 515 =head3 Directives used for Parrot calling conventions.
 516
 517 {{ A bit of a radical idea, but now would be the time to decide on this:
 518    Remove the whole "long-style" invocation syntax altogether.
 519    Only allow the short version.
 520    As PIR is typically being generated, and hopefully by PCT-based
 521    compilers, there seems to be no real use for too much syntactic
 522    sugar. Just a thought.
 523 }}
 524
 525 =over 4
 526
 527 =item .begin_call and .end_call
 528
 529 Directives to start and end a subroutine invocation, respectively.
 530
 531 =item .begin_return and .end_return
 532
 533 Directives to start and end a statement to return values.
 534
 535 =item .begin_yield and .end_yield
 536
 537 Directives to start and end a statement to yield values.
 538
 539 =item .call
 540
 541 Takes either 2 arguments: the sub and the return continuation, or the
 542 sub only. For the latter case an B<invokecc> gets emitted. Providing
  543 an explicit return continuation is more efficient if it is created
 544 outside of a loop and the call is done inside a loop.
 545
 546 =item .invocant
 547
 548 Directive to specify the object for a method call. Use it in combination
 549 with C<.meth_call>.
 550
  551 =item .meth_call
 552
 553 Directive to do a method call. It calls the specified method on the object
 554 that was specified with the C<.invocant> directive.
 555
 556 =item .nci_call
 557
 558 Directive to make a call through the Native Calling Interface (NCI).
  559 The specified subroutine must be loaded using the C<dlfunc> op that takes
 560 the library, function name and function signature as arguments.
 561 See L<docs/pdds/pdd16_native_call> for details.
 562
 563 =item .return <var> [:<flag>]*
 564
 565 Between C<.begin_return> and C<.end_return>, specify one or
 566 more of the return value(s) of the current subroutine.  Available
 567 flags: C<:flat>, C<:named>.
 568
 569 =item .arg <var> [:<flag>]*
 570
 571 Between C<.begin_call> and C<.call>, specify an argument to be
 572 passed.  Available flags: C<:flat>, C<:named>.
 573
 574 =item .result <var> [:<flag>]*
 575
 576 Between C<.call> and C<.end_call>, specify where one or more return
 577 value(s) should be stored.  Available flags:
 578 C<:slurpy>, C<:named>, C<:optional>, and C<:opt_flag>.
 579
 580 =back
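
      As a hedged sketch of the long-style invocation syntax, the following calls
      a sub with two arguments and stores one result. The sub names C<add> and
      C<adder> are illustrative; fetching the sub with C<.const .Sub> follows the
      form shown under C<:immediate> above and is assumed to work for ordinary
      subs as well:

        .sub main :main
            .const .Sub adder = 'add'
            .local int x, y, sum
            x = 2
            y = 3
            .begin_call
            .arg x                # arguments go between .begin_call and .call
            .arg y
            .call adder           # no return continuation given: invokecc is emitted
            .result sum           # results go between .call and .end_call
            .end_call
            print sum
            print "\n"
        .end

        .sub add
            .param int a
            .param int b
            $I0 = a + b
            .begin_return
            .return $I0
            .end_return
        .end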
 581
 582 =head3 Directives for subroutine parameters
 583
 584 =over 4
 585
 586 =item .param <type> <identifier> [:<flag>]*
 587
 588 At the top of a subroutine, declare a local variable, in the manner
 589 of C<.local>, into which parameter(s) of the current subroutine should
 590 be stored. Available flags:
 591 C<:slurpy>, C<:named>, C<:optional>, C<:opt_flag> and C<:unique_reg>.
 592
 593 =item .param <type> "<identifier>" => <identifier> [:<flag>]*
 594
 595 Define a named parameter. This is syntactic sugar for:
 596
 597  .param <type> <identifier> :named("<identifier>")
 598
 599 =back
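
      A minimal sketch of C<.param> with the C<:optional> and C<:opt_flag> flags.
      The sub and parameter names are illustrative; the C<:opt_flag> parameter
      records whether the preceding optional parameter was actually passed, as
      described in PDD03:

        .sub main :main
            greet('World')               # greeting omitted, default is used
            greet('World', 'Howdy')      # greeting supplied by the caller
        .end

        .sub greet
            .param string name
            .param string greeting  :optional
            .param int has_greeting :opt_flag
            if has_greeting goto show
            greeting = 'Hello'           # fall back to a default value
          show:
            print greeting
            print ', '
            print name
            print "\n"
        .end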
 600
 601 =head3 Parameter Passing and Getting Flags
 602
 603 See L<PDD03|pdds/pdd03_calling_conventions.pod> for a description of
 604 the meaning of the flag bits C<SLURPY>, C<OPTIONAL>, C<OPT_FLAG>,
 605 and C<FLAT>, which correspond to the calling convention flags
 606 C<:slurpy>, C<:optional>, C<:opt_flag>, and C<:flat>.
 607
 608
 609 =head3 Catching Exceptions
 610
 611 Using the C<push_eh> op you can install an exception handler. If an exception
 612 is thrown, Parrot will execute the installed exception handler. In order to
 613 retrieve the thrown exception, use the C<.get_results> directive. This
 614 directive always takes 2 arguments: an exception object and a message string.
 615
 616 {{ Wouldn't it be more useful to make this flexible, or at least only the
 617 exception object? The message can be retrieved from the exception object. }}
 618
 619    push_eh handler
 620    ...
 621  handler:
 622    .local pmc exception
 623    .local string message
 624    .get_results (exception, message)
 625    ...
 626
 627 This is syntactic sugar for the C<get_results> op, but any flags set on the
 628 targets will be handled automatically by the PIR compiler.
 629 The C<.get_results> directive must be the first instruction of the exception
 630 handler; only declarations (.lex, .local) may come first.
 631
 632 =head2 Syntactic Sugar
 633
 634 Any PASM opcode is a valid PIR instruction. In addition, PIR defines some
 635 syntactic shortcuts. These are provided for ease of use by humans producing
  636 and maintaining PIR code.
 637
 638 =over 4
 639
 640 =item goto <identifier>
 641
 642 C<branch> to I<identifier> (label or subroutine name).
 643
 644 Examples:
 645
 646   goto END
 647
 648 =item if <var> goto <identifier>
 649
  650 If I<var> evaluates as true, jump to the named I<identifier>. Translates to
 651 C<if var, identifier>.
 652
 653 =item unless <var> goto <identifier>
 654
  655 Unless I<var> evaluates as true, jump to the named I<identifier>. Translates
 656 to C<unless var, identifier>.
 657
 658 =item if null <var> goto <identifier>
 659
  660 If I<var> evaluates as null, jump to the named I<identifier>. Translates to
 661 C<if_null var, identifier>.
 662
 663 =item unless null <var> goto <identifier>
 664
  665 Unless I<var> evaluates as null, jump to the named I<identifier>. Translates
 666 to C<unless_null var, identifier>.
 667
 668 =item if <var1> <relop> <var2> goto <identifier>
 669
  670 The I<relop> can be: C<E<lt>, E<lt>=, ==, !=, E<gt>=, E<gt>> which translate
 671 to the PASM opcodes C<lt>, C<le>, C<eq>, C<ne>, C<ge> or C<gt>. If
 672 I<var1 relop var2> evaluates as true, jump to the named I<identifier>.
 673
 674 =item unless <var1> <relop> <var2> goto <identifier>
 675
  676 The I<relop> can be: C<E<lt>, E<lt>=, ==, !=, E<gt>=, E<gt>> which translate
 677 to the PASM opcodes C<lt>, C<le>, C<eq>, C<ne>, C<ge> or C<gt>. Unless
 678 I<var1 relop var2> evaluates as true, jump to the named I<identifier>.
 679
 680 =item <var1> = <var2>
 681
 682 Assign a value. Translates to C<set var1, var2>.
 683
 684 =item <var1> = <unary> <var2>
 685
 686 The unaries C<!>, C<-> and C<~> generate C<not>, C<neg> and C<bnot> ops.
 687
 688 =item <var1> = <var2> <binary> <var3>
 689
 690 The binaries C<+>, C<->, C<*>, C</>, C<%> and C<**> generate
 691 C<add>, C<sub>, C<mul>, C<div>, C<mod> and C<pow> arithmetic ops.
  692 Binary C<.> is C<concat> and only valid for string arguments.
 693
 694 C<E<lt>E<lt>> and C<E<gt>E<gt>> are arithmetic shifts C<shl> and C<shr>.
 695 C<E<gt>E<gt>E<gt>> is the logical shift C<lsr>.
 696
  697 C<&&>, C<||> and C<~~> are logical C<and>, C<or> and C<xor>.
 698
  699 C<&>, C<|> and C<~> are bitwise C<band>, C<bor> and C<bxor>.
 700
 701 {{PROPOSAL: Change description to support logic operators (comparisons) as
 702 implemented (and working) in imcc.y.}}
 703
 704 =item <var1> <op>= <var2>
 705
 706 This is equivalent to
  707 C<E<lt>var1E<gt> = E<lt>var1E<gt> E<lt>opE<gt> E<lt>var2E<gt>>, where
 708 I<op> is called an assignment operator and can be any of the following
 709 binary operators described earlier: C<+>, C<->, C<*>, C</>, C<%>, C<.>,
 710 C<&>, C<|>, C<~>, C<E<lt>E<lt>>, C<E<gt>E<gt>> or C<E<gt>E<gt>E<gt>>.
 711
 712 =item <var> = <var> [ <var> ]
 713
 714 This generates either a keyed C<set> operation or C<substr var, var,
 715 var, 1> for string arguments and an integer key.
 716
 717 =item <var> = <var> [ <key> ]
 718
 719 {{ NOTE: keyed assignment is still valid in PIR, but the C<..> notation in
 720 keys is deprecated [See RT #48561], so this syntactic sugar for slices is also
 721 deprecated. See the (currently experimental) C<slice> opcode instead. }}
 722
 723 where C<key> is:
 724
 725  <var1> .. <var2>
 726
 727 returns a slice defined starting at C<var1> and ending at C<var2>.
 728
 729  .. <var2>
 730
 731 returns a slice starting at the first element, and ending at C<var2>.
 732
 733  <var1> ..
 734
 735 returns a slice starting at C<var1> to the end of the array.
 736
  737 See src/pmc/slice.pmc
 738 and t/pmc/slice.t.
 739
 740 =item <var> [ <var> ] = <var>
 741
 742 A keyed C<set> operation.
 743
 744 {{ DEPRECATION NOTE: this syntactic sugar will no longer be used for the
 745 assign C<substr> op with a length of 1. }}
 746
 747 =item <var> = <opcode> <arguments>
 748
 749 All opcodes can use this PIR syntactic sugar. The first argument for the
 750 opcode is placed before the C<=>, and all remaining arguments go after the
 751 opcode name. For example:
 752
 753   new $P0, 'Type'
 754
 755 becomes:
 756
 757   $P0 = new 'Type'
 758
 759 =item global "string" = <var>
 760
 761 {{ DEPRECATED: op store_global was deprecated }}
 762
 763 =item <var> = global "string"
 764
 765 {{ DEPRECATED: op find_global was deprecated }}
 766
 767 =item ([<var1> [:<flag1> ...], ...]) = <var2>([<arg1> [:<flag2> ...], ...])
 768
 769 This is short for:
 770
 771   .begin_call
 772   .arg <arg1> <flag2>
 773   ...
 774   .call <var2>
 775   .result <var1> <flag1>
 776   ...
 777   .end_call
 778
 779 =item <var> = <var>([arg [:<flag> ...], ...])
 780
 781 =item <var>([arg [:<flag> ...], ...])
 782
 783 =item <var>."_method"([arg [:<flag> ...], ...])
 784
 785 =item <var>._method([arg [:<flag> ...], ...])
 786
 787 Function or method call. These notations are shorthand for a longer PCC
 788 function call. I<var> can denote a global subroutine, a local I<identifier> or
 789 a I<reg>.
 790
 791 {{We should review the (currently inconsistent) specification of the
 792 method name. Currently it can be a bare word, a quoted string or a
 793 string register. See #45859.}}
 794
 795 =item .return ([<var> [:<flag> ...], ...])
 796
 797 Return from the current compilation unit with zero or more values.
 798
  799 The surrounding parentheses are mandatory. Besides making the sequence
  800 break more conspicuous, this is necessary to distinguish this syntax
  801 from other uses of the C<.return> directive that will probably be
  802 deprecated.
 803
 804 =item .return <var>(args)
 805
 806 =item .return <var>."somemethod"(args)
 807
 808 =item .return <var>.somemethod(args)
 809
 810 Tail call: call a function or method and return from the sub with the
 811 function or method call return values.
 812
 813 Internally, the call stack doesn't increase because of a tail call, so
 814 you can write recursive functions and not have stack overflows.
 815
 816 =back
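
      The following sketch pulls several of these shortcuts together: a short-form
      call, a relational C<if ... goto>, assignment operators, and a tail call.
      The sub name C<fact> and its accumulator parameter are illustrative choices,
      not part of this specification:

        .sub main :main
            $I0 = fact(5, 1)          # short-form call with one result
            print $I0
            print "\n"
        .end

        .sub fact
            .param int n
            .param int acc
            if n <= 1 goto done       # sugar for the le opcode
            acc *= n                  # sugar for acc = acc * n
            dec n
            .return fact(n, acc)      # tail call: the call stack does not grow
          done:
            .return (acc)             # parentheses are mandatory here
        .end

      Because the recursive call is in tail position, the accumulated product is
      carried in C<acc> rather than on the call stack.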
 817
 818 =head2 Assignment and Morphing
 819
 820 The C<=> syntactic sugar in PIR, when used in the simple case of:
 821
 822   <var1> = <var2>
 823
 824 directly corresponds to the C<set> opcode. So, two low-level arguments (int,
 825 num, or string registers, variables, or constants) are a direct C assignment,
 826 or a C-level conversion (int cast, float cast, a string copy, or a call to one
 827 of the conversion functions like C<string_to_num>).
 828
  829 A PMC source with a low-level destination calls the C<get_integer>,
 830 C<get_number>, or C<get_string> vtable function on the PMC. A low-level source
 831 with a PMC destination calls the C<set_integer_native>, C<set_number_native>,
 832 or C<set_string_native> vtable function on the PMC (assign to value
 833 semantics).  Two PMC arguments are a direct C assignment (assign to container
 834 semantics).
 835
 836 For assign to value semantics for two PMC arguments use C<assign>, which calls
 837 the C<assign_pmc> vtable function.
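
      A minimal sketch contrasting the cases described above. The register choices
      are illustrative, and the comments restate the semantics given in this
      section rather than adding new guarantees:

        .sub main :main
            $P0 = new 'Integer'
            $P0 = 5              # int source, PMC destination: set_integer_native
            $I0 = $P0            # PMC source, int destination: get_integer
            $P1 = $P0            # two PMCs: $P1 now refers to the same PMC as $P0
            $P2 = new 'Integer'
            assign $P2, $P0      # value semantics: calls assign_pmc on $P2
            $P0 = 7              # updates the PMC shared by $P0 and $P1
            print $P1            # the shared PMC
            print "\n"
            print $P2            # the separate PMC filled in by assign
            print "\n"
        .end

      Under these semantics, C<$P1> reflects the later change made through C<$P0>,
      while C<$P2> keeps the value it was given by C<assign>.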
 838
 839
 840 {{ NOTE: response to the question:
 841
 842     <pmichaud>  I don't think that 'morph' as a method call is a good idea
 843     <pmichaud>  we need something that says "assign to value" versus
 844         "assign to container"
 845     <pmichaud>  we can't eliminate the existing 'morph' opcode until we have a
 846         replacement
 847
 848 }}
 849
 850
 851 =head2 Macros
 852
 853 This section describes the macro layer of the PIR language. The macro layer of
 854 the PIR compiler handles the following directives:
 855
 856 =over 4
 857
 858 =item * C<.include> "<filename>"
 859
 860 The C<.include> directive takes a string argument that contains the
 861 name of the PIR file that is included. The contents of the included
 862 file are inserted as if they were written at the point where the
 863 C<.include> directive occurs.
 864
 865 The include file is searched for in the current directory and in
 866 runtime/parrot/include, in that order. The first file of that name to be found
 867 is included.
 868
 869 {{ Check the include directive's search order and whether it's complete }}
 870
 871 =item * C<.macro> <identifier> [<parameters>]
 872
  873 The C<.macro> directive starts a macro definition named by the specified
 874 identifier. The optional parameter list is a comma-separated list of
 875 identifiers, enclosed in parentheses.  See C<.endm> for ending the macro
 876 definition.
 877
 878 =item * C<.endm>
 879
 880 Closes a macro definition.
 881
 882 =item * C<.macro_const> <identifier> (<literal>|<reg>)
 883
 884  .macro_const   PI  3.14
 885
 886 The C<.macro_const> directive is a special type of macro; it allows the user
 887 to use a symbolic name for a constant value. Like C<.macro>, the substitution
  888 occurs at compile time. It takes two arguments (not comma separated): the
 889 first is an identifier, the second a constant value or a register.
 890
 891 =back
 892
 893 The macro layer is completely implemented in the lexical analysis phase.
 894 The parser does not know anything about what happens in the lexical
 895 analysis phase.
 896
 897 When the C<.include> directive is encountered, the specified file is opened
 898 and the following tokens that are requested by the parser are read from
 899 that file.
 900
 901 A macro expansion is a dot-prefixed identifier. For instance, if a macro
 902 was defined as shown below:
 903
 904  .macro foo(bar)
 905  ...
 906  .endm
 907
 908 this macro can be expanded by writing C<.foo(42)>. The body of the macro
 909 will be inserted at the point where the macro expansion is written.
 910
 911 A C<.macro_const> expansion is more or less the same as a C<.macro> expansion,
 912 except that a constant expansion cannot take any arguments, and the
 913 substitution of a C<.macro_const> contains no newlines, so it can be used
 914 within a line of code.
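
     The following sketch puts both kinds of expansion side by side. The names
     C<say_line> and C<GREETING> are invented for illustration. The C<.macro>
     expands to whole lines, while the C<.macro_const> is substituted inside
     a line of code:

      .macro_const GREETING "hello"

      .macro say_line(msg)
        print .msg
        print "\n"
      .endm

      .sub main :main
        # constant expansion within a line
        $S0 = .GREETING
        # macro expansion; the two print lines are inserted here
        .say_line($S0)
      .end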
 915
 916 =head3 Macro parameter list
 917
 918 The parameter list for a macro is specified in parentheses after the name of
 919 the macro. Macro parameters are not typed.
 920
 921  .macro foo(bar, baz, buz)
 922  ...
 923  .endm
 924
 925 The number of arguments in the call to a macro must match the number of
 926 parameters in the macro's parameter list. Macros do not perform multidispatch,
 927 so you can't have two macros with the same name but different parameters.
 928 Calling a macro with the wrong number of arguments gives the user an error.
 929
 930 If a macro defines no parameter list, parentheses are optional on both the
 931 definition and the call.  This means that a macro defined as:
 932
 933  .macro foo
 934  ...
 935  .endm
 936
 937 can be expanded by writing either C<.foo> or C<.foo()>. And a macro definition
 938 written as:
 939
 940  .macro foo()
 941  ...
 942  .endm
 943
 944 can also be expanded by writing either C<.foo> or C<.foo()>.
 945
 946 {{ NOTE: this is a change from the current implementation, which requires the
 947 definition and call of a zero-parameter macro to match in the use of
 948 parentheses. }}
 949
 950 =over
 951
 952 =item * Heredoc arguments
 953
 954 Heredoc arguments are not allowed when expanding a macro. This means that
 955 the following is not allowed:
 956
 957    .macro foo(bar)
 958    ...
 959    .endm
 960
 961    .foo(<<'EOS')
 962  This is a heredoc
 963     string.
 964
 965  EOS
 966
 967 {{ NOTE: This is likely because the parsing of heredocs happens later than the
 968 preprocessing of macros. Might be nice if we could parse heredocs at the macro
 969 level, but not a high priority. compilers/pirc/new can do this. }}
 970
 971 Braces, C<{ }>, allow a macro argument to span multiple lines.
 972 See runtime/parrot/include/hllmacros.pir for examples and possible usage.
 973 A simple example is this:
 974
 975  .macro foo(a,b)
 976    .a
 977    .b
 978  .endm
 979
 980  .sub main
 981    .foo({ print "1"
 982           print "2"
 983         }, {
 984           print "3"
 985           print "4"
 986         })
 987  .end
 988
 989 This will expand the macro C<foo>, after which the input to the PIR parser is:
 990
 991  .sub main
 992    print "1"
 993    print "2"
 994    print "3"
 995    print "4"
 996  .end
 997
 998 which will result in the output:
 999
1000  1234
1001
1002 =back
1003
1004 =head3 Unique local labels
1005
1006 Within the macro body, the user can declare a unique label identifier using
1007 the value of a macro parameter, like so:
1008
1009   .macro foo(a)
1010   ...
1011   .label $a:
1012   ...
1013   .endm
1014
1015
1016 =head3 Unique local variables
1017
1018 Within the macro body, the user can declare a local variable with a unique
1019 name.
1020
1021   .macro foo()
1022   ...
1023   .macro_local int b
1024   ...
1025   .b = 42
1026   print .b # prints the value of the unique variable (42)
1027   ...
1028   .endm
1029
1030 The C<.macro_local> directive declares a local variable with a unique name in
1031 the macro. When the macro C<.foo()> is called, the resulting code that is
1032 given to the parser will read as follows:
1033
1034   .sub main
1035     .local int local__foo__b__2
1036     ...
1037     local__foo__b__2 = 42
1038     print local__foo__b__2
1039
1040   .end
1041
1042 The user can also declare a local variable with a unique name set to the
1043 symbolic value of one of the macro parameters.
1044
1045   .macro foo(b)
1046   ...
1047   .macro_local int $b
1048   ...
1049   .$b = 42
1050   print .$b # prints the value of the unique variable (42)
1051   print .b  # prints the value of parameter "b", which is
1052             # also the name of the variable.
1053   ...
1054   .endm
1055
1056 The special C<$> character thus selects the interpretation: C<.b> expands to
1057 the value of the parameter, while C<.$b> refers to the variable named by that
1058 value. The value of C<b> should therefore be a string.
1059
1060 The automatic name munging on C<.macro_local> variables allows for using
1061 multiple macros, like so:
1062
1063   .macro foo(a)
1064   .macro_local int $a
1065   .endm
1066
1067   .macro bar(b)
1068   .macro_local int $b
1069   .endm
1070
1071   .sub main
1072     .foo("x")
1073     .bar("x")
1074   .end
1075
1076 This will result in code for the parser as follows:
1077
1078   .sub main
1079     .local int local__foo__x__2
1080     .local int local__bar__x__4
1081   .end
1082
1083 Each expansion is associated with a unique number, so for labels declared
1084 with C<.macro_label> and locals declared with C<.macro_local>, multiple
1085 expansions of a macro will not result in conflicting label or local
1086 names.
1087
1088 =head3 Ordinary local variables
1089
1090 Defining a non-unique variable can still be done, using the normal syntax:
1091
1092   .macro foo(b)
1093   .local int b
1094   .macro_local int $b
1095   .endm
1096
1097 When invoking the macro C<foo> as follows:
1098
1099   .foo("x")
1100
1101 there will be two variables, C<b> and a munged C<x>. When it is invoked twice:
1102
1103   .sub main
1104     .foo("x")
1105     .foo("y")
1106   .end
1107
1108 the resulting code that is given to the parser will read as follows:
1109
1110   .sub main
1111     .local int b
1112     .local int local__foo__x
1113     .local int b
1114     .local int local__foo__y
1115   .end
1116
1117 Obviously, this will result in an error, as the variable C<b> is defined
1118 twice.  If you intend the macro to create unique variable names, use
1119 C<.macro_local> instead of C<.local> to take advantage of the name munging.
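
     A sketch of the recommended form, using a hypothetical macro C<bump> that
     declares only a C<.macro_local> variable, so that it can be expanded more
     than once in the same subroutine:

      .macro bump()
        .macro_local int n
        .n = 1
        inc .n
        print .n
        print "\n"
      .endm

      .sub main :main
        .bump()
        .bump()
      .end

     Each expansion declares its own munged name for C<n>, so the two
     declarations do not collide.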
1120
1121 =head1 EXAMPLES
1122
1123 =head2 Subroutine Definition
1124
1125   .sub _sub_label [<subflag>]*
1126     .param int a
1127     .param int b
1128     .param int c
1129     ...
1130     .begin_return
1131     .return xy
1132     .end_return
1133     ...
1134   .end
1135
1136 =head2 Subroutine Call
1137
1138   .const .Sub $P0 = "_sub_label"
1139   $P1 = new 'Continuation'
1140   set_addr $P1, ret_addr
1141   ...
1142   .local int x
1143   .local num y
1144   .local str z
1145   .begin_call
1146   .arg x
1147   .arg y
1148   .arg z
1149   .call $P0, $P1    # r = _sub_label(x, y, z)
1150   ret_addr:
1151   .local int r  # optional - new result var
1152   .result r
1153   .end_call
1154
1155 =head2 NCI Call
1156
1157   load_lib $P0, "libname"
1158   dlfunc $P1, $P0, "funcname", "signature"
1159   ...
1160   .begin_call
1161   .arg x
1162   .arg y
1163   .arg z
1164   .nci_call $P1 # r = funcname(x, y, z)
1165   .local int r  # optional - new result var
1166   .result r
1167   .end_call
1168
1169 =head2 Subroutine Call Syntactic Sugar
1170
1171   ...  # variable decls
1172   r = _sub_label(x, y, z)
1173   (r1[, r2 ...]) = _sub_label(x, y, z)
1174   _sub_label(x, y, z)
1175
1176 This also works for NCI calls, as the subroutine PMC will be
1177 an NCI sub, and on invocation will do the Right Thing.
1178 Instead of the label, a subroutine object can be used too:
1179
1180    find_global $P0, "_sub_label"
1181    $P0(args)
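
     As a self-contained sketch of the sugared form, the following program
     defines a hypothetical subroutine C<_add> (the name and body are
     illustrative only) and calls it by label:

      .sub main :main
        .local int r
        # call by label; the single result is assigned to r
        r = _add(2, 3)
        print r
        print "\n"
      .end

      .sub _add
        .param int a
        .param int b
        .local int total
        total = a + b
        .return (total)
      .end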
1182
1183
1184 =head2 Methods
1185
1186   .namespace [ "Foo" ]
1187
1188   .sub _sub_label :method [,Subpragma, ...]
1189     .param int a
1190     .param int b
1191     .param int c
1192     ...
1193     self."_other_meth"()
1194     ...
1195     .begin_return
1196     .return xy
1197     .end_return
1198     ...
1199   .end
1200
1201 The variable C<self> automatically refers to the invocant if the
1202 subroutine declaration contains the C<:method> flag.
1203
1204 =head2 Calling Methods
1205
1206 The syntax is very similar to subroutine calls. The call is done with
1207 C<.meth_call>, which must be immediately preceded by the C<.invocant>:
1208
1209   .local pmc class
1210   .local pmc obj
1211   newclass class, "Foo"
1212   new obj, class
1213   .begin_call
1214   .arg x
1215   .arg y
1216   .arg z
1217   .invocant obj
1218   .meth_call "_method" [, $P1 ] # r = obj."_method"(x, y, z)
1219   .local int r  # optional - new result var
1220   .result r
1221   .end_call
1222
1223 The return continuation is optional. The method can be a string
1224 constant or a string variable.
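
     The comment in the example above shows the equivalent short form,
     C<obj."_method"(x, y, z)>. A minimal sketch of that form follows; the
     class C<Foo> is taken from the example, while the method C<_greet> is
     hypothetical, and the object creation simply mirrors the lines shown
     above:

      .sub main :main
        .local pmc class
        .local pmc obj
        newclass class, "Foo"
        new obj, class
        # short form of .invocant obj / .meth_call "_greet"
        obj."_greet"()
      .end

      .namespace [ "Foo" ]

      .sub _greet :method
        print "hello from Foo\n"
      .end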
1225
1226 =head2 Returning and Yielding
1227
1228   .return ( a, b )      # return the values of a and b
1229
1230   .return ()            # return no value
1231
1232   .return func_call()   # tail call function
1233
1234   .return o."meth"()    # tail method call
1235
1236 Similarly, one can yield using the C<.yield> directive:
1237
1238   .yield ( a, b )      # yield with the values of a and b
1239
1240   .yield ()            # yield with no value
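
     As a sketch of the tail-call form of C<.return>, the hypothetical
     subroutine C<_countdown> below calls itself in tail position and returns
     no value once the counter reaches zero:

      .sub main :main
        _countdown(3)
      .end

      .sub _countdown
        .param int n
        if n <= 0 goto done
        print n
        print "\n"
        dec n
        # tail call: the callee returns directly to the caller of this sub
        .return _countdown(n)
      done:
        .return ()
      .end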
1241
1242
1243 =head2 Stack calling conventions
1244
1245 Arguments are B<save>d in reverse order onto the user stack:
1246
1247    .arg y   # save args in reversed order
1248    .arg x
1249    call _foo    #(r, s) = _foo(x,y)
1250    .local int r
1251    .local int s
1252    .result r    # restore results in order
1253    .result s    #
1254
1255 and return values are B<restore>d in argument order from there.
1256
1257
1258
1259  .sub _foo      # sub foo(int a, int b)
1260    saveall
1261    .param int a         # receive arguments from left to right
1262    .param int b
1263    ...
1264
1265    .return mi       # return (pl, mi), push results
1266    .return pl       # in reverse order
1267    restoreall
1268    ret
1269  .end
1270
1271 Pushing arguments in reversed order on the user stack makes the leftmost
1272 argument the top-of-stack entry. This allows for a variable
1273 number of function arguments (and return values), where the leftmost
1274 argument before a variable number of following arguments is the
1275 argument count.
1276
1277
1278 =head1 ATTACHMENTS
1279
1280 N/A
1281
1282 =head1 FOOTNOTES
1283
1284 N/A
1285
1286 =head1 REFERENCES
1287
1288 See C<docs/imcc/macros.pod>
1289
1290 =cut
1291
1292 __END__
1293 Local Variables:
1294   fill-column:78
1295 End: