docs/pct/past_building_blocks.pod

   1 # Copyright (C) 2008, Parrot Foundation.
   2 # $Id$
   3
   4 =head1 NAME
   5
   6 PAST Building Blocks - A catalogue of PAST nodes and how to use them.
   7
   8 =head1 DESCRIPTION
   9
  10 This document describes the PIR output for each type of Parrot Abstract Syntax
  11 Tree (PAST) node, and all available attributes.
  12
  13 =head1 PAST::Node
  14
  15 PAST::Node is the base class of all other PAST classes.
  16
  17 =head2 Attributes
  18
  19 =head3 C<:node>
  20
  21 Sets this node's C<source> and C<pos> attributes.  These indicate which part of
  22 the source code generated this node.  The C<node> attribute can be set using
  23 another node or a Match object.
  24
  25 The C<:node> attribute is needed for annotating a source line. If a source line
  26 is represented by several nodes, i.e. a subtree of the whole AST, then only the
  27 root of that subtree needs to have the C<:node> attribute.
  28
  29 =head2 Methods
  30
  31 See L<docs/pdd26_ast.pod>.
  32
  33 =head1 PAST::Block
  34
  35 =head2 Synopsis
  36
  37  PAST::Block.new()
  38
  39 Without any attributes, a PAST::Block translates to the following:
  40
  41 =begin PIR
  42
  43  .namespace []
  44  .sub "_block10"
  45      .return ()
  46  .end
  47
  48 =end PIR
  49
  50 =head2 Typical use
  51
  52 A block represents a lexical C<scope>. The most common use for a block
  53 is subroutines (or functions or procedures or whatever name Your Language
  54 gives to invokable objects). Usually, C<nested> blocks define a new scope
  55 as well, which can also be represented by a block. A typical example is
  56 the common C<if> statement (the example here uses Lua syntax):
  57
  58  if <expr> then
  59    <block1>
  60  else
  61    <block2>
  62  end
  63
  64 A variable in C<block1> is (depending on the language) usually not visible
  65 in C<block2>. If the statements in the body of the C<if> statement are not
  66 in a different scope, use a PAST::Stmts node instead.
  67
  68 =head2 Attributes
  69
  70 =head3 C<:namespace>
  71
  72 Sets the namespace of the block to the specified value.
  73
  74  :namespace('foo')
  75
  76 will result in a directive:
  77
  78  .namespace ["foo"]
  79
  80 {{XXX how to specify nested namespaces? what kind of string array?}}
  81
  82 =head3 C<:blocktype>
  83
  84 Available options:
  85
  86 =over 4
  87
  88 =item 'declaration'
  89
  90 Default value. The block is created, but not invoked. This is typically used
  91 for creating functions.
  92
  93 =item 'immediate'
  94
  95 The block is invoked immediately. When the :blocktype attribute is set to this
  96 value, the generated PIR becomes:
  97
  98 =begin PIR
  99
 100  .namespace []
 101  .sub "anon"
 102      get_global $P11, "_block10"
 103      newclosure $P11, $P11
 104      $P11()
 105  .end
 106
 107  .namespace []
 108  .sub "_block10"
 109      .return ()
 110  .end
 111
 112 =end PIR
 113
 114 The sub C<anon> is the I<main> function, which is the entry point of the
 115 whole program. It looks for the sub that we declared, creates a new closure
 116 from that block (more on that later), and invokes that closure.
 117
 118 =back
 119
 120 =head3 C<:name>
 121
 122 Gives a name to the block. The generated PIR subroutine is named by the
 123 value of this attribute. Example:
 124
 125  PAST::Block.new( :name('foo') )
 126
 127 translates to:
 128
 129 =begin PIR
 130
 131  .sub "foo"
 132      .return ()
 133  .end
 134
 135 =end PIR
 136
 137 =head2 Symbol tables
 138
 139 As a block defines a new scope, a PAST::Block object has a symbol table,
 140 which can be used to store and look up symbols. See PDD26 for more details.
 141 A PAST::Block node has a method C<symbol> used to both enter and find symbols.
 142
 143 The C<symbol> method takes the symbol name as an argument, and optional
 144 attributes. If the symbol does not exist in the table, a new hash is created
 145 for that symbol. Any additional attribute arguments are set into this new hash.
 146 The C<symbol> method returns the current entry for the specified name. Checking
 147 for a symbol in a symbol table is as easy as:
 148
 149  our $?BLOCK; # the current scope
 150
 151  if $?BLOCK.symbol('foo') {
 152    # symbol 'foo' was found
 153  }
 154  else {
 155    # symbol 'foo' was not found
 156  }
 157
 158  {{
 159  XXX Review this; this should refer to PAST compiler's symbol() method.
 160
 161 If a symbol is not found in a block's symbol table, any outer blocks
 162 (outer scopes) will be inspected. The symbol table entry of the innermost
 163 scope is returned. For instance, consider the following Perl 6 code:
 164
 165  sub foo {
 166    my $i = 42;
 167    sub bar {
 168      sub baz {
 169        say("baz: $i");
 170      }
 171    }
 172  }
 173
 174 In the subroutine C<baz>, the variable C<$i> is referenced. It is not
 175 found in the symbol table of C<baz>, so it is looked for in baz' outer
 176 scope, C<bar>. It's not found there either, so bar's outer scope, C<foo>
 177 is inspected, where it is found.
 178
 179  XXX
 180  }}
 181
 182 =head1 PAST::Stmts
 183
 184 =head2 Synopsis
 185
 186  PAST::Stmts.new()
 187
 188 =head2 Typical use
 189
 190 The common use of a PAST::Stmts node is to group a set of statements. Most
 191 programming languages have a grammar rule that similar to this:
 192
 193  rule statements {
 194      [<statement> ';']*
 195      {*}
 196  }
 197
 198 In order to move around the PAST nodes representing these individual statements,
 199 it is useful to group them and store them in one node. This can be done by
 200 storing all PAST nodes in a PAST::Stmts node.
 201 Note that PAST does not define a C<statement>; any PAST node can be considered
 202 a statement, including a PAST node as simple as PAST::Val.
 203
 204 =head1 PAST::Var
 205
 206 =head2 Synopsis
 207
 208  PAST::Var.new( :name('foo'), :scope('lexical') )
 209
 210 results in:
 211
 212 =begin PIR_FRAGMENT
 213
 214  find_lex $P10, "foo"
 215
 216 =end PIR_FRAGMENT
 217
 218 =head2 Typical use
 219
 220 Nodes of type PAST::Var are used to represent variables and their declarations
 221 (based on the C<:isdecl> flag, see below). Wherever a variable is used in the
 222 source language, this can be represented by a PAST::Var node.
 223
 224 =head2 Attributes
 225
 226 =head3 C<:name> (required)
 227
 228 Sets the name of the variable. This attribute is required.
 229
 230 =head3 C<:scope>
 231
 232 Set the scope of this PAST::Var node. If the C<:scope>  is not set, it is
 233 inherited from the symbol table entry in a PAST::Block's symbol table.
 234 If the scope is not set, the scope defaults to C<lexical>.
 235
 236 Available values:
 237
 238 =over 4
 239
 240 =item 'lexical'
 241
 242 When the C<:scope> attribute is set to C<lexical>, then the identifier
 243 specified in the C<:name> attribute is handled as a lexical:
 244
 245 =begin PIR_FRAGMENT
 246
 247  find_lex $P10, "foo"
 248
 249 =end PIR_FRAGMENT
 250
 251 =item 'package'
 252
 253 Defines the variable as a C<package> variable. This is handled by the
 254 C<get_global> and C<set_global> ops.
 255
 256 =item 'parameter'
 257
 258 Defines the variable as a parameter (but is stored as a lexical).
 259
 260 =begin PIR_FRAGMENT
 261
 262  .param pmc param_10
 263  .lex "foo", param_10
 264
 265 =end PIR_FRAGMENT
 266
 267 =item 'keyed'
 268
 269  my $idx := PAST::Var.new( :name('foo'), :scope('package') );
 270  my $agg := PAST::Var.new( :name('bar'), :scope('package') );
 271  make PAST::Var.new( $agg, $idx, :scope('keyed') );
 272
 273 results in:
 274
 275 =begin PIR_FRAGMENT
 276
 277  get_global $P10, "foo"    # get index, a package-scoped variable "foo"
 278  get_global $P11, "bar"    # get aggregate, a package-scoped variable "bar"
 279  set $P12, $P11[$P10]      # get the contents at bar[foo]
 280
 281 =end PIR_FRAGMENT
 282
 283 =item 'attribute'
 284
 285 Defines the variable as an attribute of an object. The first child of
 286 the PAST::Var node evaluates to the object from which the attribute is
 287 requested. If there is no such child, the object defaults to C<self>,
 288 meaning the current invocant.
 289
 290  make PAST::Var.new( PAST::Var.new( :name('foo'), :scope('package') ),
 291                      :name('bar'),
 292                      :scope('attribute')
 293                    );
 294
 295 translates to:
 296
 297 =begin PIR_FRAGMENT_INVALID
 298
 299  get_global $P10, "foo"
 300  get_attribute $P11, $P10, "bar"
 301
 302 =end PIR_FRAGMENT_INVALID
 303
 304 Note: currently, no child nodes are evaluated, so you can only get
 305 attributes on C<self> for now.
 306
 307 =back
 308
 309 =head3 C<:isdecl(INT)>
 310
 311 If this flag is set, the variable represented by this PAST::Var node is
 312 declared at this point. This flag is cleared by default.
 313
 314 =begin PIR_FRAGMENT
 315
 316  .lex "foo", $P10
 317
 318 =end PIR_FRAGMENT
 319
 320 =head3 C<:viviself>
 321
 322 When this attribute is set, the variable is initialized with the value
 323 represented by the PAST subtree given as this attribute's argument.
 324 Adding this attribute will result in the following generated code:
 325
 326     <code for PAST::Var, leaving the variable in $P10>
 327     unless_null $P10, vivify_11
 328     <evaluate the child PAST node, leaving result in $P12>
 329     assign $P10, $P12
 330   vivify_11:
 331
 332 =head3 C<:vivibase>
 333
 334 Similar to C<:viviself>, but the value of this attribute yields the
 335 initialization code for an aggregate object. See L<pdd26_ast.pod> for
 336 more details.
 337
 338 =head1 PAST::Val
 339
 340 =head2 Synopsis
 341
 342  PAST::Val.new( :value(42) )
 343
 344 results in:
 345
 346 =begin PIR_FRAGMENT
 347
 348  new $P10, "Integer"
 349  assign $P10, 42
 350
 351 =end PIR_FRAGMENT
 352
 353 =head2 Typical use
 354
 355 All literal values in your source language can be represented by
 356 PAST::Val nodes.
 357
 358 =head2 Attributes
 359
 360 =head3 C<:value> (required)
 361
 362 Specifies the literal value that is represented by this PAST::Val node.
 363
 364 =head3 C<:returns>
 365
 366 Specifies the type of the value. If this is not specified, then some
 367 defaults are used. For integer values, this is C<Integer>, for quoted
 368 strings this is C<String>; for floating point values, this is C<Float>.
 369
 370 =head1 PAST::VarList
 371
 372 =head2 Synopsis
 373
 374  PAST::VarList.new()
 375
 376 =head2 Typical use
 377
 378 Used to group a number of PAST::Var nodes.
 379
 380
 381 =head1 PAST::Op
 382
 383 =head2 Synopsis
 384
 385  PAST::Op.new( :pasttype('if') )
 386
 387 =head2 Typical use
 388
 389 PAST::Op nodes are used to represent common operations, such as an
 390 C<if> statement, a sub C<call>. They can also be used to generate
 391 custom PIR instructions, using the C<:inline> attribute.
 392
 393 =head2 Attributes
 394
 395 =head3 C<:pasttype>
 396
 397
 398 =over 4
 399
 400 =item C<if>
 401
 402 Requires at least 1 child node, which is evaluated as the conditional
 403 expression.
 404
 405     <evaluate 1st child, result stored in $P11>
 406     if $P11, if_10
 407     <else part (3rd child); optional>
 408     goto if_10_end
 409  if_10:
 410     <then part (2nd child); optional>
 411  if_10_end:
 412
 413 =item C<unless>
 414
 415 Same as C<if>, except that the C<if> op is replaced by the C<unless> op.
 416
 417 =item C<while>
 418
 419 Requires 2 children; the first is the loop condition, the second is the
 420 loop body.
 421
 422  while_10:
 423     <evaluate 1st child, result stored in $P11>
 424     unless $P11, while_10_end
 425     <evaluate 2nd child>
 426     goto while_10
 427  while_10_end:
 428
 429 =item C<until>
 430
 431 Same as C<while>, except the loop is executed while the condition evaluates
 432 to false.
 433
 434 =item C<call> (default)
 435
 436  PAST::Op.new( :name('foo'), :pasttype('call') );
 437
 438 results in:
 439
 440 =begin PIR_FRAGMENT
 441
 442  "foo"()
 443
 444 =end PIR_FRAGMENT
 445
 446 while
 447
 448  my $fun := PAST::Var.new( :name('foo'), :scope('package'));
 449  PAST::Op.new( $fun, :pasttype('call') );
 450
 451 generates:
 452
 453 =begin PIR_FRAGMENT
 454
 455  get_global $P10, "foo"
 456  $P10()
 457
 458 =end PIR_FRAGMENT
 459
 460 Children of a :pasttype('call') node are evaluated and passed as arguments.
 461 If the node does not receive a C<:name> attribute, then the first child
 462 is evaluated as the subroutine to be invoked.
 463
 464 =item C<callmethod>
 465
 466  my $invocant := PAST::Var.new( :name('foo'), :scope('package') );
 467  PAST::Op.new( $invocant,
 468                :name('bar'),
 469                :pasttype('callmethod')
 470               );
 471
 472 generates:
 473
 474 =begin PIR_FRAGMENT
 475
 476  get_global $P10, "foo"
 477  $P10."bar"()
 478
 479 =end PIR_FRAGMENT
 480
 481 =item C<bind>
 482
 483 Binds the variable represented by the first child to the value
 484 represented by the second child.
 485
 486  my $lhs := PAST::Var.new( :name('foo'), :scope('package') );
 487  my $rhs := PAST::Val.new( :value(42) );
 488  make PAST::Op.new($lhs, $rhs, :pasttype('bind') );
 489
 490 results in:
 491
 492 =begin PIR_FRAGMENT
 493
 494  new $P10, "Integer"       # code for evaluating $rhs
 495  assign $P10, 42
 496  set_global "foo", $P10    # code for the :pasttype('bind') op
 497
 498 =end PIR_FRAGMENT
 499
 500 when the scope is set to C<lexical>, the last line becomes:
 501
 502 =begin PIR_FRAGMENT
 503
 504  store_lex "foo", $P10
 505
 506 =end PIR_FRAGMENT
 507
 508 =back
 509
 510 =head3 C<:inline>
 511
 512 If this attribute is specified, C<:pasttype> is implicitly set
 513 to the value of C<inline>. The specified string is emitted in the
 514 code generator. The string may contain special fields: %n where n
 515 is an integer value between (0,9); %r, %t and %u. See the PDD for
 516 details.
 517
 518 Example:
 519
 520  my $var := PAST::Var.new( :name('foo'), :scope('lexical') );
 521  PAST::Op.new( $var, :inline('    %r = %0') );
 522
 523 is transformed to:
 524
 525 =begin PIR_FRAGMENT
 526
 527  find_lex $P10, "foo" # generated for $var
 528  $P11 = $P10          # inline '%r = %0'
 529
 530 =end PIR_FRAGMENT
 531
 532 =head1 TIPS AND TRICKS
 533
 534 Once you have experience in using PAST nodes, generating code for
 535 Your Favorite Language becomes rather straightforward. However, it
 536 is sometimes tricky to get started. Therefore, this section presents
 537 some tips 'n' tricks to get you started.
 538
 539 =head2 Refactor Grammar Rules
 540
 541 =head3 Scenario 1
 542
 543 Sometimes it is useful to refactor the grammar of your language in
 544 order to make code generation somewhat easier or to make the action
 545 method easier. Consider the following example.
 546
 547  rule primary_expr {
 548      [ <prefix> | <functioncall> ] <expression>
 549      {*}
 550  }
 551
 552  method primary_expr($/) {
 553      my $past;
 554      if $<prefix> {
 555          $past := $( $<prefix> );
 556      }
 557      else {
 558          $past := $( $<functioncall> );
 559      }
 560      my $expr := $( $<expression> );
 561
 562      # do something with $past and $expr
 563      # ...
 564  }
 565
 566 while this solution is straightforward, the code in the action method
 567 contains a conditional statement. The more branches you have in your
 568 code, the more you need to think when you re-read this in 6 months time.
 569 An alternative solution would be this:
 570
 571  rule primary_expr {
 572      <prefix_expr> <expression>
 573      {*}
 574  }
 575
 576  rule prefix_expr {
 577      | <prefix> {*}        #= prefix
 578      | <functioncall> {*}  #= functioncall
 579  }
 580
 581  method primary_expr($/) {
 582      my $past := $( $<prefix_expr> );
 583      my $expr := $( $<expression> );
 584
 585      # do something with $past and $expr
 586      # ...
 587  }
 588
 589  method prefix_expr($/, $key) {
 590      make $( $/{$key} );
 591  }
 592
 593 While you have to write a bit more code, this code is more straightforward.
 594 While there might be a small cost in function call overhead, there is no
 595 longer the conditional statement, which itself is more efficient.
 596
 597 =head3 Scenario 2
 598
 599 Consider a language that uses an index notation to indicate fields (attributes),
 600 like so:
 601
 602  foo["bar"] = 42
 603
 604 this language also has some syntactic sugar for this, using a dot notation:
 605
 606  foo.bar = 42
 607
 608 This can be expressed in the following grammar rules:
 609
 610  rule target {
 611      | <ident> <index>?
 612      | ...
 613  }
 614
 615  rule index {
 616      | '.' <ident> {*}      #= ident
 617      | '[' <quote> ']' {*}  #= quote
 618  }
 619
 620 A naive implementation could look like this:
 621
 622  method target($/) {
 623      my $name := $( $<ident> );
 624
 625      if $<index> {
 626          my $idx := $( $<index>[0] );
 627          # do something with $idx
 628      }
 629      # ...
 630  }
 631
 632  method index($/, $key) {
 633      my $indexexpr := $( $/{$key} );
 634
 635      # if $indexexpr is an identifier, stringify it
 636      if $key eq 'ident' {
 637         $indexexpr := PAST::Val.new( :returns('String'), :value($indexexpr) );
 638      }
 639      make $indexexpr;
 640  }
 641
 642 Somewhere you have to check the type of index. Not only does this
 643 result in more complex code (more conditional statements), it is
 644 less efficient, as an extra PAST node must be created.
 645
 646 A more elegant solution is to refactor the grammar slightly, like so:
 647
 648  rule index {
 649      | <dot_field> {*}     #= dot_field
 650      | '[' <quote> ']' {*} #= quote
 651  }
 652
 653  rule dot_field {
 654      '.' <ident>
 655      {*}
 656  }
 657
 658  method index($/, $key) {
 659      make $( $/{$key} );
 660  }
 661
 662  method dot_field($/) {
 663      my $field := $( $<ident> );
 664      make PAST.Val.new( :returns('String'), :value($field) );
 665  }
 666
 667 There is no more conditional code, it's just a matter of computations;
 668 based on the type of index, do the right thing automatically.
 669
 670
 671 =head2 Create PAST nodes I<deep> in the parse tree
 672
 673 Consider a grammar fragment such as this:
 674
 675  rule function_def {
 676      'function' <ident> '(' <parameters>? ')' <block> 'end'
 677      {*}
 678  }
 679
 680  rule parameters {
 681      <ident> [',' <ident>]*
 682      {*}
 683  }
 684
 685 You could write the action methods for these rules as follows:
 686
 687  method function_def($/) {
 688     my $past := PAST::Block.new( :node($/) );
 689
 690     if $<parameters> {
 691         # get the PAST::VarList that contains all parameters
 692         my $params := $( $<parameters>[0] );
 693
 694         # put all of them into the PAST::Block node
 695         for @($params) {
 696             $past.push($_);
 697         }
 698     }
 699
 700     $past.name( $( $<ident> ) );
 701     $past.push( $( $<block> ) );
 702     make $past;
 703  }
 704
 705  method parameters($/) {
 706     my $past := PAST::VarList.new( :node($/) );
 707     for $<ident> {
 708         my $param := $($_);
 709         $param.scope('parameter');
 710         $past.push( $param );
 711     }
 712     make $past;
 713  }
 714
 715 While this solution works well, this is suboptimal. In the
 716 action method for <parameters>, a PAST::VarList node is created,
 717 which is only used to move around the parameter identifiers, and
 718 then discarded.
 719 An alternative solution would be to create the PAST::Block node
 720 that represents the function in the action method for <parameters>.
 721 This makes perfect sense, as the only place where the parameters
 722 should live is in that function block. Then, in the action method
 723 for C<function_def>, this PAST::Block node is retrieved and
 724 I<decorated> with other values (such as a function name, for instance)
 725 and the PAST node for the function body. Only if there are no parameters
 726 should the action method for C<function_def> create a PAST::Block node.
 727
 728 The result could look like this:
 729
 730  method function_def($/) {
 731     my $past;
 732     if $<parameters> {
 733         $past := $( $<parameters>[0] );
 734     }
 735     else { # no parameters, create the function block here
 736         $past := PAST::Block.new( :node($/) );
 737     }
 738     $past.name( $( $<ident> ).name() );
 739     $past.push( $( $<block> ) );
 740  }
 741
 742  method parameters($/) {
 743     my $past := PAST::Block.new( :node($/) );
 744     for $<ident> {
 745         my $param := $($_);
 746         $param.scope('parameter');
 747         $past.push( $param );
 748     }
 749     make $past;
 750  }
 751
 752 A further refactor could result in even simpler code:
 753
 754  rule function_def {
 755      'function' <ident> '(' <parameters> ')' <block> 'end'
 756      {*}
 757  }
 758
 759  rule parameters {
 760      [ <ident> [',' <ident>]* ]?
 761  }
 762
 763  method function_def($/) {
 764     my $past := $( $<parameters> );
 765     $past.name( $( $<ident> ).name() );
 766     $past.push( $( $<block> ) );
 767     make $past;
 768  }
 769
 770  method parameters($/) {
 771     my $past := PAST::Block.new( :node($/) );
 772     for $<ident> {
 773         my $param := $( $<ident> );
 774         $param.scope('parameter');
 775         $past.push($param);
 776     }
 777     make $past;
 778  }
 779
 780 Note that the rule C<parameters> is changed slightly. For indexing
 781 the C<ident> nodes, this makes no difference, as they already lived
 782 in an array. Making all of them optional doesn't change the way they
 783 are stored in the parse tree.
 784
 785 The same principle can be applied in several scenarios.
 786 More tips will be added later.
 787
 788 =head2 Steal from Perl 6
 789
 790 You are implementing a language and you want to find out which PAST nodes
 791 you should generate for your language construct. Often Perl 6 has the same
 792 language construct. This means that you can write a quick example script
 793 in Perl 6 and look at the PAST generated by Rakudo (the Perl 6 implementation
 794 for Parrot):
 795
 796   ./perl6 --target=past t.pl
 797
 798 =head1 SEE ALSO
 799
 800 =over 4
 801
 802 =item * docs/pdds/pdd26_ast.pod
 803
 804 =back
 805
 806 =head1 BUGS AND IMPROVEMENTS
 807
 808 Bug reports and improvements on this document may be sent to
 809 C<parrotbug@parrotcode.org>.
 810
 811 =cut