docs/pdds/pdd26_ast.pod

   1 # Copyright (C) 2007, The Perl Foundation
   2 # $Id$
   3
   4 =head1 NAME
   5
   6 docs/pdds/pdd26_ast.pod - Parrot Abstract Syntax Tree
   7
   8 =head1 ABSTRACT
   9
  10 This PDD describes the node types and semantics of the Parrot
  11 Abstract Syntax Tree representation.
  12
  13 =head1 VERSION
  14
  15 $Revision$
  16
  17 =head1 DESCRIPTION
  18
  19 The Parrot Abstract Syntax Tree (PAST) is a set of Parrot
  20 classes useful for generating an abstract semantic representation
  21 of a program written in a high level language such as Perl or
  22 Python.  Once a program has been translated into its equivalent
  23 PAST representation, the Parrot Compiler Toolkit can be used to
  24 generate executable PIR or bytecode from the abstract syntax
  25 tree representation.
  26
  27 PAST is designed to provide the common operations and semantics
  28 needed by many of the high level languages targeted by Parrot,
  29 while also being extensible to support the unique needs of
  30 specific languages.
  31
  32 =head1 IMPLEMENTATION
  33
  34 =head2 PAST::Node
  35
  36 C<PAST::Node> is the base class for all nodes in an abstract
  37 syntax tree.  Each C<PAST::Node> element has an array of
  38 children nodes (which may be empty), as well as a number of
  39 attributes and accessor methods for the node.
  40
  41 Every PAST node has C<name>, C<source>, and C<pos> attributes.
  42 The C<name> attribute is the node's name, if any, while
  43 C<source> and C<pos> are used to identify the location of the
  44 node in the original source code.
  45
  46 =over 4
  47
  48 =item init([child1, child2, ..., ] [attr1=>val1, attr2=>val2, ...])
  49
  50 Initialize a PAST node with the given children and attributes.
  51 Adds each child to the node (using the C<push> method, below)
  52 and calls the appropriate accessor method for each attribute.
  53
  54 =item new([child1, child2, ..., ] [attr1=>val1, attr2=>val2, ...])
  55
  56 Create a new PAST node with the same type as the invocant
  57 and initialized with the given children and attributes.
  58 Returns the newly created node.
  59
  60 =item push(child)
  61
  62 Add C<child> to the end of the node's array of children.
  63
  64 =item pop()
  65
  66 Remove the last child from the node's array of children.
  67 Returns the child.
  68
  69 =item unshift(child)
  70
  71 Add C<child> to the beginning of the node's array of children.
  72
  73 =item shift()
  74
  75 Remove the first child from the node's array of children.
  76 Returns the child.
  77
  78 =item iterator( )
  79
  80 Return a newly initialized C<Iterator> for the node's list
  81 of children.
  82
  83 =item node(val)
  84
  85 Set the invocant's C<source> and C<pos> attributes to be the
  86 same as C<val>.  If C<val> is another PAST node, then C<source>
  87 and C<pos> are simply copied from that node.  Otherwise, C<val>
  88 is assumed to be a C<Match> object (e.g., from PGE) and the
  89 source and position information are obtained from that.
  90
  91 =item name([value])
  92
  93 Accessor method -- sets/returns the C<name> attribute of the node.
  94
  95 =item named([value])
  96
  97 Accessor method -- for nodes that are being used as named arguments,
  98 sets/returns the name to be associated with the argument.
  99
 100 =item flat([value])
 101
 102 Accessor method -- sets/returns the "flatten" flag on nodes being
 103 used as arguments.
 104
 105 =item attr(STR attrname, PMC value, INT has_value)
 106
 107 Helper method for writing accessors -- if C<has_value> is true
 108 then set the node's value of C<attrname> to C<value>.  Returns
 109 the (resulting) value of C<attrname> for the invocant.
 110
 111 =back
 112
 113 =head2 PAST::Stmts
 114
 115 C<PAST::Stmts> is simply a C<PAST::Node> used to contain a
 116 sequence of other PAST nodes to be evaluated.  It has no
 117 specific methods or attributes.
 118
 119 =head2 PAST::Val
 120
 121 C<PAST::Val> nodes represent constant values in the abstract
 122 syntax tree.  The C<value> attribute represents the value of
 123 the node itself.
 124
 125 =over 4
 126
 127 =item value([value])
 128
 129 Accessor method to get/set the constant value for this node.
 130
 131 =item returns([typename])
 132
 133 Get/set the type of value to be generated by this node.
 134 If not specified, the type is taken from the type of
 135 the value attribute given above.
 136
 137 =back
 138
 139 =head2 PAST::Var
 140
 141 C<PAST::Var> nodes represent variable-like items within the
 142 abstract syntax tree.  The C<name> attribute (inherited from
 143 C<PAST::Node>) identifies the name of the variable as given
 144 by the high level language program.  The other attributes for
 145 C<PAST::Var> nodes provide the details of how the variable
 146 is defined and accessed.
 147
 148 =over 4
 149
 150 =item scope([value])
 151
 152 A variable node's "scope" indicates how the variable is to be
 153 defined or accessed within the program.  The available scopes
 154 include:
 155
 156 =over 4
 157
 158 =item "lexical"
 159
 160 Lexical variables are scoped to the C<PAST::Block> in which
 161 they are declared, including any nested blocks.  If the
 162 node's C<isdecl> attribute is true, then this node defines
 163 a new lexical variable within the current block.  Otherwise,
 164 the node refers to a lexical variable already declared in the current
 165 or outer block.
 166
 167 =item "package"
 168
 169 Package variables represent global or namespace-managed
 170 variables in the program.  The node's C<namespace> attribute
 171 may be used to explicitly identify the namespace of the variable,
 172 otherwise it is assumed to be within the namespace of the
 173 C<PAST::Block> containing the node.
 174
 175 =item "parameter"
 176
 177 Parameter variables are the formal arguments to subroutine and
 178 methods, typically represented as C<PAST::Block> nodes.  The
 179 parameter's lexical name is given by the node's C<name> attribute.
 180 Positional parameters are defined by the order in which they
 181 appear in the C<PAST::Block> node, named parameters have values
 182 for the C<named> attribute.  Optional parameters are identified
 183 via the C<viviself> attribute (see below) indicating how the parameter
 184 is to be initialized if not supplied by the caller.  Slurpy parameters
 185 are indicated via the node's C<slurpy> attribute.
 186
 187 =item "keyed"
 188
 189 Keyed variables represent the elements of aggregates such as
 190 arrays and hashes.  Nodes representing keyed elements have two
 191 children; the first child is the PAST representation of the
 192 aggregate, and the second child is the PAST representation of
 193 the key or index.  The C<vivibase> attribute below may be used
 194 to specify how to generate the aggregate if it doesn't already
 195 exist.
 196
 197 =item "attribute"
 198
 199 Attribute variables represent object attributes (in some languages
 200 they're called "member variables"). The attribute's name is given
 201 by the node's C<name> attribute. Nodes representing attribute
 202 variables have an optional child, representing the object to which
 203 the attribute belongs. If this child is not present, the attribute
 204 is assumed to belong to the current invocant, indicated by the
 205 special variable C<self> (which is implicitly passed to all subs
 206 that are flagged as a C<:method>.
 207
 208
 209 =back
 210
 211 If C<scope> is not explicitly provided in the node, then PAST will
 212 look at the local symbol tables of any outer C<PAST::Block> nodes
 213 to try to determine the scope of the named variable.  If this still
 214 does not result in a scope, then 'lexical' is assumed.
 215
 216 =item viviself([value])
 217
 218 Accessor method for the C<viviself> attribute, which specifies
 219 how uninitialized variables are to be initialized when first
 220 used.  The C<viviself> attribute may be either a string or array
 221 representing a type (e.g., C<Integer>), or it may be a PAST
 222 representation for generating the variable's initial value.
 223
 224 =item vivibase([value])
 225
 226 Accessor method for the C<vivibase> attribute, which specifies
 227 how the base portion of aggregate variables (see "keyed scope" above)
 228 is to be created if it doesn't already exist.  C<Vivibase>
 229 may either be a string or array representing the type of the
 230 newly created aggregate, or it may be a PAST representation for
 231 generating the aggregate.
 232
 233 =item isdecl([flag])
 234
 235 Accessor method for the C<isdecl> attribute.  A true value
 236 of C<isdecl> indicates that the variable given by this node
 237 is being created at this point within the current lexical
 238 scope.  Otherwise, the node refers to a pre-existing variable
 239 (possibly from an outer scope).
 240
 241 =item lvalue([flag])
 242
 243 Accessor method for the C<lvalue> attribute, which indicates
 244 whether this variable is being used in an lvalue context.
 245
 246 =item slurpy([flag])
 247
 248 Accessor method for the C<slurpy> attribute of parameter variables.
 249 A true value of C<slurpy> indicates that the parameter variable
 250 given by this node is to be created as a slurpy parameter (consuming
 251 all remaining arguments passed in).  Named slurpy parameters are
 252 indicated by having a non-empty C<named> attribute and a true value of
 253 C<slurpy>.
 254
 255 =back
 256
 257 =head2 PAST::Op
 258
 259 C<PAST::Op> nodes represent the operations in an abstract
 260 syntax tree.  The primary function of the node is given by
 261 its C<pasttype> attribute, secondary functions may be indicated
 262 by the node's C<name>, C<pirop>, or other attributes as given
 263 below.
 264
 265 =over 4
 266
 267 =item pasttype([value])
 268
 269 Accessor method for the node's C<pasttype> attribute.  The
 270 C<pasttype> is the primary indicator of the type of operation
 271 to be performed, the operands are typically given as the
 272 children of the node.  Defined values of C<pasttype> are:
 273
 274 =over 4
 275
 276 =item copy
 277
 278 Copy the value of the node's second child into the variable
 279 expression given by its first child.
 280
 281 =item bind
 282
 283 Bind the variable given by the node's first child to the
 284 value given by its second child.
 285
 286 =item if
 287
 288 The first, second, and third children represent the
 289 "condition", "then", and "else" parts of a conditional
 290 expression.  The first child is evaluated; if the result
 291 is true then the second child is evaluated and returned
 292 as the result of the C<PAST::Op> node, otherwise the
 293 third child is evaluated and returned as the result.
 294 This implements the standard if-then-else logic needed
 295 by most higher level languages, and can also be used
 296 for implementing the ternary operator.
 297
 298 If the node is missing its second ("then") or third ("else")
 299 child, then the result of the condition is used as the
 300 corresponding result of the operation.  This makes it easy
 301 to implement the "short-circuit and" operator commonly used in many
 302 high level languages.  For example, the standard C<&&> operator
 303 may be implemented using an "if" node, where the left operand is
 304 the first (condition) child, the right operand is the
 305 second (then) child, and the third child is left as null
 306 or uninitialized.
 307
 308 It's also possible to build a "short-circuit or" (C<||>)
 309 operator using this pasttype, by setting the left operand to
 310 C<||> as the first child and the right operand as the I<third>
 311 child (leaving the second child as null).  However, it's probably
 312 simpler to use the "unless" type as described below.
 313
 314 =item unless
 315
 316 Same as C<if> above, except that the second child is evaluated
 317 if the first child evaluates to false and the third child is
 318 evaluated if the first child evaluates to true.
 319
 320 The C<unless> type can be used to implement "short-circuit or"
 321 semantics; simply set the first child to the left operand and
 322 the second child to the right operand, leaving the third
 323 child empty or uninitialized.  If the first child evaluates to
 324 true it is returned as the result of the operation, otherwise the
 325 second child is evaluated and returned as the result.
 326
 327 =item while
 328
 329 Evaluate the first child (condition), if the result is true
 330 then evaluate the second child (body) and repeat.
 331
 332 =item until
 333
 334 Evaluate the first child (condition), if the result is false then evaluate
 335 the second child (body) and repeat.
 336
 337 =item repeat_while, repeat_until
 338
 339 Same as C<while> and C<until> above, except the second child is evaluated
 340 before the conditional first child is evaluated for continuation of
 341 the loop.
 342
 343 =item for
 344
 345 Iterate over the first child in groups of elements given by
 346 C<arity> (default 1).  For each iteration, invoke the second
 347 child, passing the elements as parameters.
 348
 349 =item call
 350
 351 Call the subroutine given by the C<name> attribute, passing
 352 the results of any child nodes as arguments.  If the node
 353 has no C<name> attribute, then the first child is assumed
 354 to evaluate to a callable subroutine, and any remaining
 355 children are used as arguments.
 356
 357 =item callmethod
 358
 359 Invoke the method given by C<name> on the first child,
 360 passing the results of any child nodes as arguments.  If the
 361 node has no C<name> attribute, then the first child is
 362 evaluated as a method to be called, the second child is
 363 the invocant, and any remaining children are arguments to
 364 the method call.
 365
 366 =item pirop
 367
 368 Execute the PIR opcode given by the C<pirop> attribute.  See the
 369 C<pirop> method below for details.  This is also the default
 370 behavior when a C<pirop> attribute is set and C<pasttype> is
 371 not.
 372
 373 =item inline
 374
 375 Execute the sequence of PIR statements given by the
 376 node's C<inline> attribute (a string).  See the C<inline>
 377 method below for details.
 378
 379 =item try
 380
 381 Evaluates the first child, if any exceptions occur then they
 382 are handled by the code given by the second child (if any).
 383
 384 =item xor
 385
 386 Evaluate the child nodes looking for exactly one true result
 387 to be returned.  If two true results are encountered, the
 388 operation immediately short-circuits and returns false.
 389 Otherwise, after all children have been evaluated the result
 390 of any true child is used as the result of the operation, or
 391 the result of the last child if none of the children evaluated
 392 to true.
 393
 394 =back
 395
 396 =item pirop([opcode])
 397
 398 Get/set the PIR opcode to be executed for this node.  Internally
 399 the PAST and POST (Parrot Opcode Syntax Tree) implementations
 400 understand the register types available for common PIR operations
 401 and will handle any needed register or constant conversion of
 402 operands automatically.  Note that except for the C<assign>
 403 opcode, any destination is typically generated automatically
 404 and should not be explicitly given as a child operand to the node.
 405 The table of PIR opcodes that PAST "knows" about is given in
 406 F<compilers/pct/src/PAST/Compiler.pir> .
 407
 408 =item lvalue([flag])
 409
 410 Get/set whether this node is an lvalue, or treats its first
 411 child as an lvalue (e.g., for assignment).
 412
 413 =item inline([STRING code])
 414
 415 Get/set the code to be used for inline PIR when C<pasttype> is
 416 set to "inline".  The C<code> argument is PIR text to be inserted in
 417 the final generated code sequence.  Sequences of "%0", "%1",
 418 "%2", ... "%9" in C<code> are replaced with the evaluated
 419 results of the first, second, third, ..., tenth children nodes.
 420 (If you need more than ten arguments to your inline PIR, consider
 421 making it a subroutine call instead.)
 422
 423 The register to hold the result of the inline PIR operation is
 424 given by "%r", "%t", or "%u" in the C<code> string:
 425
 426   %r   - Generate a unique PMC register for the result.
 427   %t   - Generate a unique PMC register for the result,
 428          and initialize it with an object of type C<returns>
 429          {{Pm: or possibly C<viviself> }}
 430          before the execution of the inline PIR.
 431   %u   - Re-use the first child's PMC (%0) if it's a temporary
 432          result, otherwise same as %t above.
 433
 434 If none of %r, %t, or %u appear in C<code>, then the first
 435 child's (%0) is used as the result of this operation.
 436
 437 =back
 438
 439 =head2 PAST::Block
 440
 441 C<PAST::Block> nodes represent lexical scopes within an abstract
 442 syntax tree, and roughly translate to individual Parrot subroutines.
 443 A C<PAST::Block> node nested within another C<PAST::Block> node
 444 acts like a nested lexical scope.
 445
 446 If the block has a C<name> attribute, that becomes the name of
 447 the resulting Parrot sub.  Otherwise, a unique name is automatically
 448 generated for the block.
 449
 450 Each PAST::Block node can maintain its own local symbol table, see
 451 the C<symbol> method below.
 452
 453 =over 4
 454
 455 =item blocktype([type])
 456
 457 Get/set the type of the block to C<type>.  The currently
 458 understood values are 'declaration' and 'immediate'.  'Declaration'
 459 indicates that a block is simply being defined at this point, the
 460 result of the node is a reference to the block.  A C<blocktype>
 461 of 'immediate' indicates a block that is to be immediately
 462 executed when it is evaluated in the AST, and the result of the
 463 last operation is used as the return value for the block.
 464
 465 =item namespace([value])
 466
 467 Get/set the namespace for this particular block (and any nested
 468 blocks that do not explicitly provide a namespace).  C<Value>
 469 may either be a simple string or an array of strings representing
 470 a nested namespace.
 471
 472 =item symbol(name, [attr1 => val1, attr2 => val2, ...])
 473
 474 Each PAST::Block node can use the C<symbol> method to maintain
 475 its own compile-time notion of a local symbol table.  Each value
 476 of C<name> is given its own hash to hold information about that
 477 symbol for the block (i.e., the symbol table acts like a hash of
 478 hashes indexed by symbol name).  If the C<symbol> method is called
 479 with named arguments, then the method sets the entries in the hash
 480 corresponding to C<name> in the block's symbol table.  If C<symbol>
 481 is called with just a single C<name> argument, then the current hash
 482 for local symbol C<name> is returned.
 483
 484 HLLs are free to place any values in the symbol hashes that
 485 may be useful.  However, the C<scope> entry for a symbol is
 486 typically used to provide the C<scope> attribute for any nested
 487 C<PAST::Var> nodes that do not provide an explicit C<scope> attribute.
 488
 489 =item symbol_defaults([attr1 => val1, attr2 => val2, ...])
 490
 491 Sets default attributes for symbols that aren't explicitly
 492 given by the C<symbol> method above.  For example, an abstract
 493 syntax tree can use this method on a top-level C<PAST::Block>
 494 to specify the C<scope> attribute to be used for all
 495 variable nodes that don't otherwise provide one.
 496
 497 =item pir_pragma([value])
 498
 499 Get/set any PIR pragmas to be used when generating the block.
 500
 501 =item compiler([compiler_name])
 502
 503 Specify that the children nodes of this block are to be compiled
 504 using C<compiler_name> instead of being treated as standard PAST
 505 nodes.  This is useful when a program may contain components of
 506 code written in other HLLs.  For example, the C<perl6> compiler
 507 uses this feature to use PGE to compile pre-parsed Perl 6 regular
 508 expressions, rather than trying to represent the semantics of
 509 those expressions in PAST itself.
 510
 511 When code is generated from a C<PAST::Block> node having a
 512 C<compiler> attribute, the compiler is invoked with C<name>
 513 and C<outer> arguments so that any generated subs can have
 514 names and lexical scoping appropriate to the block (it's the
 515 responsibility of the external compiler to support this).
 516
 517 =back
 518
 519 =head2 PAST::CompUnit
 520
 521 C<PAST::CompUnit> nodes are used for the abstract representation
 522 of compilation units.  Specific attributes and semantics are
 523 yet to be determined.
 524
 525 =head1 REFERENCES
 526
 527 NA.
 528
 529 =head1 COPYRIGHT
 530
 531 Copyright (C) 2007, The Perl Foundation.
 532
 533 =cut
 534
 535 # Local Variables:
 536 #   mode: pir
 537 #   fill-column: 100
 538 # End:
 539 # vim: expandtab shiftwidth=4: