docs/pmc/subs.pod

   1 =head1 NAME
   2
   3 Subs - Parrot Subroutines
   4
   5 =head1 VERSION
   6
   7 $Revision$
   8
   9 =head1 ABSTRACT
  10
  11 This document describes how to define, call, and return from Parrot subroutine
  12 objects and other invokables.
  13
  14 =head1 DESCRIPTION
  15
  16 Parrot comes with several subroutine and related classes that implement CPS
  17 (Continuation Passing Style) and PCC (Parrot Calling Conventions); see
  18 F<docs/pdds/pdd03_calling_conventions.pod>.
  19
  20 =head2 Class Tree
  21
  22 These are all of the built-in classes that are directly callable, or
  23 "invokable":
  24
  25   Sub
  26     Closure
  27     Coroutine
  28     Eval
  29   Continuation
  30     ExceptionHandler
  31
  32 By "invokable" we mean that they can be supplied as the first argument to the
  33 C<invoke>, C<invokecc>, or C<tailcall> instructions.  Generally speaking,
  34 invokable objects are divided into two subtypes:  C<Sub> and classes that are
  35 built on it create a new context when invoked, and C<Continuation> classes
  36 return control to an existing context that was captured when the
  37 C<Continuation> was created.
  38
  39 There are (of course) two classes that straddle this distinction:
  40
  41 =over 4
  42
  43 =item 1.
  44
  45 Invoking a C<Closure> object creates a new context for the sub it refers to
  46 directly, but it also captures an "outer" context that provides bindings for
  47 the immediately-enclosing lexical scope (and, if that context is itself for
  48 a C<Closure>, the subsequent scopes working outwards).
  49
  50 [add a C<newclosure> example?  -- rgr, 6-Apr-08.]
  51
  52 =item 2.
  53
  54 A C<Coroutine> acts like a normal sub when called initially, and can also
  55 return normally, but acts like a continuation when exited via the C<yield>
  56 instruction and re-entered by re-invoking.
  57
  58 [need a reference to a C<coroutine> example.  -- rgr, 6-Apr-08.]
  59
  60 =back
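
     The two straddling cases can be sketched as follows.  The sub names used here
     (C<make_counter>, C<increment>, C<three_values>) are illustrative only and do
     not exist in Parrot.  In the first sketch, C<newclosure> captures the context
     of C<make_counter>, so each call to the resulting closure should see the same
     C<total> lexical:

     =begin PIR

         .sub main :main
             .local pmc counter
             counter = make_counter()
             # Each call increments the captured lexical.
             $I0 = counter()
             print $I0
             print "\n"
             $I0 = counter()
             print $I0
             print "\n"
         .end

         .sub make_counter
             .local pmc total
             total = new 'Integer'
             total = 0
             # Bind the PMC to a lexical name visible to inner subs.
             .lex 'total', total
             .const 'Sub' inner = 'increment'
             # Capture the current context as the closure's "outer" context.
             $P0 = newclosure inner
             .return ($P0)
         .end

         .sub increment :outer('make_counter')
             .local pmc total
             total = find_lex 'total'
             inc total
             $I0 = total
             .return ($I0)
         .end

     =end PIR

     In the second sketch, a sub that contains C<.yield> is compiled as a
     C<Coroutine>, so each call by name is expected to resume where the previous
     C<.yield> left off, and the final C<.return> exits normally:

     =begin PIR

         .sub main :main
             # First call: runs up to the first .yield.
             $I0 = three_values()
             print $I0
             print "\n"
             # Second call: resumes after the first .yield.
             $I0 = three_values()
             print $I0
             print "\n"
             # Third call: resumes again and returns normally.
             $I0 = three_values()
             print $I0
             print "\n"
         .end

         .sub three_values
             .yield (1)
             .yield (2)
             .return (3)
         .end

     =end PIR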
  61
  62 =head1 SYNOPSIS
  63
  64 =head2 Creating subs
  65
  66 Subs are created by IMCC (the PIR compiler) via the B<.sub> directive.  Unless
  67 the C<:anon> pragma is included, they are stored in the constant table
  68 associated with the bytecode and can be fetched with the B<get_hll_global> and
  69 B<get_root_global> opcodes.  Within the PIR source, they can also be put in
  70 registers with a C<.const 'Sub'> declaration:
  71
  72 =begin PIR_FRAGMENT
  73
  74     .const 'Sub' rsub = 'random_sub'
  75
  76 =end PIR_FRAGMENT
  77
  78 This uses C<find_sub_not_null> under the hood to look up the sub named
  79 "random_sub".
  80
  81 Here's an example of fetching a sub from another namespace:
  82
  83 =begin PIR
  84
  85     .sub main :main
  86         get_hll_global $P0, ['Other'; 'Namespace'], "the_sub"
  87         $P0()
  88         print "back\n"
  89     .end
  90
  91     .namespace ['Other'; 'Namespace']
  92
  93     .sub the_sub
  94         print "in sub\n"
  95     .end
  96
  97 =end PIR
  98
  99 Note that C<the_sub> could be defined in a different bytecode or PIR source
 100 file from C<main>.
 101
 102 =head2 Program entry point
 103
 104 One subroutine in the first executed source or bytecode file may be flagged as
 105 the "main" subroutine, where execution starts.
 106
 107 =begin PIR
 108
 109   .sub the_main_event :main
 110      # ...
 111   .end
 112
 113 =end PIR
 114
 115 In the absence of a B<:main> entry, Parrot starts execution at the first
 116 statement.  Any C<:main> directives in subsequent PIR or bytecode files that
 117 are loaded under program control are ignored.
 118
 119 Note that if the first executed source or bytecode file contains more than one
 120 sub flagged as C<:main>, Parrot currently picks the I<last> such sub to start
 121 execution.  This is arguably a bug, so users should not depend upon it.
 122
 123 =head2 Load-time initialization
 124
 125 If a subroutine is marked as B<:load>, it is run before the B<load_bytecode>
 126 opcode returns.
 127
 128 For example:
 129
 130 =begin PIR
 131
 132   .sub main :main
 133      print "in main\n"
 134      load_bytecode "library_code.pir"
 135      print "back to main\n"
 136   .end
 137
 138   # library_code.pir
 139
 140   .sub _my_lib_init :load
 141      print "initializing library\n"
 142   .end
 143
 144 =end PIR
 145
 146 If a subroutine is marked as B<:init>, it is run before the B<:main> sub or
 147 the first subroutine in the source file runs.  Unlike B<:main> subs, B<:init>
 148 subs are also run when compiling from memory.  B<:load> subs are run only for
 149 source or bytecode files that are loaded subsequently.
 150
 151 These markers are called "pragmas", and are defined fully in
 152 L<docs/pdds/pdd19_pir.pod>.  The following table summarizes the behavior
 153 of the five pragmas that cause Parrot to run a sub implicitly:
 154
 155                 ------ Executed when --------
 156                 compiling to    -- loading --
 157   Sub Pragma    disk  memory    first   after
 158   ==========    ====  ======    =====   =====
 159    :immediate   yes   yes       no      no
 160    :postcomp    yes   no        no      no
 161    :load        no    no        no      yes
 162    :init        no    yes       yes     no
 163    :main        no    no        yes     no
 164
 165 The same load-time behavior applies regardless of whether the loaded file is
 166 PIR source or bytecode.  Note that it is possible to mark a sub with both
 167 B<:load> and B<:init>.
 168
 169 =head2 Defining subs
 170
 171 A sub is defined by a block of code starting with C<.sub> and ending with
 172 C<.end>. Parameters which the sub can be called with are defined by C<.param>:
 173
 174 =begin PIR
 175
 176     .sub do_something
 177       .param pmc a_pmc
 178       .param string some_string
 179       #do something
 180     .end
 181
 182 =end PIR
 183
 184 The set of C<.param> directives is converted to a single C<get_params>
 185 instruction.  The compiler decides which registers to use.
 186
 187 =begin PIR_FRAGMENT
 188
 189     get_params '(0,0)', $P0, $S0
 190
 191 =end PIR_FRAGMENT
 192
 193 A parameter can be declared optional with the C<:optional> flag.  If an
 194 optional parameter is followed by a parameter declared C<:opt_flag>, the
 195 latter will store an integer indicating whether the optional parameter
 196 was passed.
 197
 198 =begin PIR_FRAGMENT
 199
 200     .param string maybe :optional
 201     .param int has_maybe :opt_flag
 202     unless has_maybe goto no_maybe
 203     #do something with maybe
 204     no_maybe:
 205     #don't use maybe
 206
 207 =end PIR_FRAGMENT
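
     Put together in a complete program, the optional-parameter pattern might look
     like the sketch below.  The sub C<greet> and its parameter names are
     hypothetical, chosen only for illustration:

     =begin PIR

         .sub main :main
             # Call once without and once with the optional argument.
             greet('Hello')
             greet('Hello', 'world')
         .end

         .sub greet
             .param string greeting
             .param string who :optional
             .param int has_who :opt_flag
             if has_who goto have_who
             # Supply a default when the caller passed no second argument.
             who = 'stranger'
           have_who:
             print greeting
             print ', '
             print who
             print "\n"
         .end

     =end PIR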
 208
 209 A sub can accept an arbitrary number of parameters by declaring a C<:slurpy>
 210 parameter.  This creates a PMC containing an array of all the arguments passed
 211 to the sub, which can be accessed like so:
 212
 213 =begin PIR_FRAGMENT
 214
 215     .param pmc all_params :slurpy
 216
 217     $P0 = all_params[0]
 218     $S0 = all_params[1]
 219
 220 =end PIR_FRAGMENT
 221
 222 A slurpy parameter can also be defined after a set of positional parameters, in
 223 which case it will only hold any additional parameters passed.
 224
 225 A parameter may also be declared C<:named>, giving it a string which can be
 226 used when calling the sub to explicitly assign a parameter, ignoring position.
 227
 228 =begin PIR_FRAGMENT
 229
 230     .param int counter :named("counter")
 231
 232 =end PIR_FRAGMENT
 233
 234 This can be combined with C<:optional> as well as C<:opt_flag>, so that the
 235 parameter need only be passed when necessary.
 236
 237 If a parameter is declared with C<:slurpy> and C<:named> (with no string), it
 238 creates an associative array containing all named parameters which can be
 239 accessed like so:
 240
 241 =begin PIR_FRAGMENT
 242
 243     .param pmc all_params :slurpy :named
 244     $S0 = all_params['name']
 245     $I0 = all_params['counter']
 246
 247 =end PIR_FRAGMENT
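
     As a sketch of how these declarations combine, the hypothetical sub
     C<describe> below takes one positional parameter, collects any further
     positional arguments in a C<:slurpy> array, and collects all named arguments
     in a C<:slurpy> C<:named> hash.  The names and argument values are assumptions
     made for the example:

     =begin PIR

         .sub main :main
             describe('a', 'b', 'c', 'name' => 'parrot', 'counter' => 3)
         .end

         .sub describe
             .param string first
             .param pmc rest :slurpy
             .param pmc options :slurpy :named
             # Number of extra positional arguments ('b' and 'c' here).
             $I0 = elements rest
             print "extra positional arguments: "
             print $I0
             print "\n"
             # Named arguments are looked up by key.
             $S0 = options['name']
             $I1 = options['counter']
             print $S0
             print " "
             print $I1
             print "\n"
         .end

     =end PIR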
 248
 249 =head2 Calling the sub
 250
 251 PIR sub invocation syntax is similar to HLL syntax:
 252
 253 =begin PIR_FRAGMENT
 254
 255     $P0 = do_something($P1, $S3)
 256
 257 =end PIR_FRAGMENT
 258
 259 This is syntactic sugar for the following four bytecode instructions:
 260
 261 =begin PIR_FRAGMENT
 262
 263     # Establish arguments.
 264     set_args '(0,0)', $P1, $S3
 265     # Find the sub.
 266     $P8 = find_sub_not_null "do_something"
 267     # Establish return values.
 268     get_results '(0)', $P0
 269     # Call the sub in $P8, implicitly creating a return continuation.
 270     invokecc $P8
 271
 272 =end PIR_FRAGMENT
 273
 274 The sub name could be replaced with a PMC register, in which case the
 275 C<find_sub_not_null> instruction would not be needed.  If the return values
 276 from the sub were ignored (by dropping the C<$P0 => part), the C<get_results>
 277 instruction would be omitted.  However, C<set_args> is emitted even in the
 278 case of a call without arguments.
 279
 280 The first operands to the C<set_args> and C<get_results> instructions are
 281 actually placeholders for an integer array that describes the register types.
 282 For example, the '(0,0)' for C<set_args> is replaced internally with C<[2,
 283 1]>, which means "two arguments, of type PMC and string".  Note that return
 284 values get the same register type coercion as sub parameters.  This is all
 285 described in much more detail in L<docs/pdds/pdd03_calling_conventions.pod>.
 286
 287 Named parameters can be explicitly passed in one of two ways:
 288
 289 =begin PIR_FRAGMENT
 290
 291     $P5 = do_something($I6 :named("counter"), $S4 :named("name"))
 292     #or equivalently
 293     $P5 = do_something("counter" => $I6, "name" => $S4)
 294
 295 =end PIR_FRAGMENT
 296
 297 To receive multiple values, put the register names in parentheses:
 298
 299 =begin PIR_FRAGMENT
 300
 301     ($P10, $P11) = do_something($P1, $S3)
 304
 305 =end PIR_FRAGMENT
 306
 307 To test whether a value was returned, declare it C<:optional>, and follow it
 308 with an integer register declared C<:opt_flag>:
 309
 310 =begin PIR_FRAGMENT_INVALID
 311
 312     ($P10 :optional, $I10 :opt_flag) = do_something($P1, $S3)
 313
 314 =end PIR_FRAGMENT_INVALID
 315
 316 A C<:slurpy> value can be declared, as in parameter declarations, to catch an
 317 arbitrary number of return values:
 318
 319 =begin PIR_FRAGMENT
 320
 321     ($P12, $P13 :slurpy) = do_something($P1, $S3)
 322
 323 =end PIR_FRAGMENT
 324
 325 Note that the values stored in a C<:slurpy> or C<:slurpy> C<:named> array
 326 can be used as arguments for another call using the C<:flat> declaration:
 327
 328 =begin PIR_FRAGMENT
 329
 330     ($P14, $P15) = do_something($P13 :flat)
 331
 332 =end PIR_FRAGMENT
 333
 334 Subs may also return C<:named> values, which can be accessed explicitly in much
 335 the same way as in parameter declarations:
 336
 337 =begin PIR_FRAGMENT
 338
 339     ($I11 :named("counter"), $S4 :named("name")) = do_something($P1, $S3)
 340
 341 =end PIR_FRAGMENT
 342
 343 All of these affect only the signature provided via C<get_results>.
 344
 345 [not sure what this is for, leaving it alone for now -aninhumer]
 346
 347 =begin PIR_FRAGMENT
 348
 349     # Call the sub in $P8, with continuation (created earlier) in $P9.
 350     invoke $P8, $P9
 351
 352 =end PIR_FRAGMENT
 353
 354 =head2 Returning from a sub
 355
 356 PIR supports a convenient syntax for returning any number of values from a sub
 357 or closure:
 358
 359 =begin PIR
 360
 361     .sub main
 362       .return ($P0, $I1, $S3)
 363     .end
 364
 365 =end PIR
 366
 367 Integer, float, and string constants are also accepted.  This is translated
 368 to:
 369
 370 =begin PIR_FRAGMENT
 371
 372     set_returns '(0,0,0)', $P0, $I1, $S3
 373     returncc    # return by calling the current continuation
 374
 375 =end PIR_FRAGMENT
 376
 377 As for C<set_args>, the '(0,0,0)' is actually a placeholder for an integer
 378 array that describes the register types; it is replaced internally with C<[2,
 379 0, 1]>, which means "three arguments, of type PMC, integer, and string".
 380
 381 All of the declarations allowed for calls to a sub (C<:named>, C<:flat>) can
 382 also be used with return values.
 383
 384 Another way to return from a sub is to use tail-calling, which calls a new sub
 385 with the current continuation, so that the new sub returns directly to the
 386 caller of the old sub (i.e. without first returning to the old sub).  This
 387 passes the three values to C<another_sub> via tail-calling:
 388
 389 =begin PIR
 390
 391     .sub main
 392       .tailcall another_sub($P0, $I1, $S3)
 393     .end
 394
 395 =end PIR
 396
 397 This is translated into a C<set_args> instruction for the call, but with
 398 C<tailcall> instead of C<invokecc>:
 399
 400 =begin PIR_FRAGMENT
 401
 402     set_args '(0,0,0)', $P0, $I1, $S3
 403     $P8 = find_sub_not_null "another_sub"
 404     tailcall $P8
 405
 406 =end PIR_FRAGMENT
 407
 408 As for calling, the sub name could be replaced with a PMC register, in which
 409 case the C<find_sub_not_null> instruction would not be needed.
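
     A common use of tail-calling is a recursive sub that carries its result in an
     accumulator, so that each recursive step returns directly to the original
     caller.  The sub C<fact> below is a hypothetical example written for this
     document, not part of Parrot:

     =begin PIR

         .sub main :main
             # Compute 5! with an initial accumulator of 1.
             $I0 = fact(5, 1)
             print $I0
             print "\n"
         .end

         .sub fact
             .param int n
             .param int acc
             if n > 1 goto recurse
             .return (acc)
           recurse:
             acc *= n
             dec n
             # Reuse the current continuation instead of creating a new one.
             .tailcall fact(n, acc)
         .end

     =end PIR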
 410
 411 If needed, the current continuation can be extracted and called explicitly as
 412 follows:
 413
 414 =begin PIR_FRAGMENT
 415
 416     ## This is what defines .INTERPINFO_CURRENT_CONT.
 417     .include 'interpinfo.pasm'
 418     ## Store our return continuation as exit_cont.
 419     .local pmc exit_cont
 420     exit_cont = interpinfo .INTERPINFO_CURRENT_CONT
 421     ## Invoke it explicitly:
 422     invokecc exit_cont
 423     ## ... or equivalently:
 424     tailcall exit_cont
 425
 426 =end PIR_FRAGMENT
 427
 428 To return values, use C<set_args> as before.
 429
 430 =head2 All together now
 431
 432 The following complete example illustrates the typical call/return pattern:
 433
 434 =begin PIR
 435
 436     .sub main :main
 437         print "in main\n"
 438         the_sub()
 439         print "back to main\n"
 440     .end
 441
 442     .sub the_sub
 443         print "in sub\n"
 444     .end
 445
 446 =end PIR
 447
 448 Notice that we are not passing or returning values here.
 449
 450 [example of passing values.  this could get pretty elaborate; look for other
 451 examples first.  -- rgr, 6-Apr-08.]
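
     A variant that does pass and return values might look like the following
     sketch.  The sub C<add_and_subtract> and the variable names are hypothetical
     and serve only to show two arguments going in and two results coming back:

     =begin PIR

         .sub main :main
             .local int total, delta
             # Pass two integers and receive two results.
             (total, delta) = add_and_subtract(7, 3)
             print "total: "
             print total
             print "\n"
             print "delta: "
             print delta
             print "\n"
         .end

         .sub add_and_subtract
             .param int x
             .param int y
             $I0 = x + y
             $I1 = x - y
             .return ($I0, $I1)
         .end

     =end PIR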
 452
 453 If a short subroutine is called several times, for instance inside a loop, the
 454 creation of the return continuation can be done outside the loop:
 455
 456 =begin PIR_INVALID
 457
 458     .sub main :main
 459             ## Initialize the sub and the return cont.
 460             .local pmc cont
 461             cont = new 'Continuation'
 462             set_addr cont, ret_label
 463             .const 'Sub' rsub = 'random_sub'
 464             ## Loop initialization.
 465             .local int loop_max, i
 466             loop_max = 1000000
 467             i = 0
 468
 469             ## Main loop.
 470     again:
 471             set_args '(0)', i
 472             invoke rsub, cont
 473     ret_label:
 474             ## This is where "cont" returns.
 475             inc i
 476             if i < loop_max goto again
 477     .end
 478
 479     .sub random_sub
 480             .param int foo
 481             ## do_something
 482     .end
 483
 484 =end PIR_INVALID
 485
 486 If the sub returns values, the C<get_results> must be B<after> C<ret_label> in
 487 order to receive them.
 488
 489 Since this is much more obscure than the PIR calling syntax, it should only be
 490 done if there is a measurable performance advantage.  Even in this trivial
 491 example, calling "rsub(i)" is only about a third slower on x86.
 492
 493 =head1 FILES
 494
 495 F<src/pmc/sub.pmc>, F<src/pmc/closure.pmc>,
 496 F<src/pmc/continuation.pmc>, F<src/pmc/coroutine.pmc>, F<src/sub.c>,
 497 F<t/pmc/sub.t>
 498
 499 =head1 SEE ALSO
 500
 501 F<docs/pdds/pdd03_calling_conventions.pod>
 502 F<docs/pdds/pdd19_pir.pod>
 503
 504 =head1 AUTHOR
 505
 506 Leopold Toetsch <lt@toetsch.at>
 507
 508 =cut
 509
 510 __END__
 511 Local Variables:
 512   fill-column:78
 513 End: