docs/book/ch04_pir_subroutines.pod

   1 =pod
   2
   3 Z<CHP-4>
   4
   5 =head1 Subroutines
   6
   7 Z<CHP-4-SECT-1>
   8
   9 X<PIR (Parrot intermediate representation);subroutines>
  10 X<subroutines;in PIR>
  11 A calculation like "the factorial of a number" may be used several
  12 times in a large program. Subroutines allow this kind of functionality
  13 to be abstracted into a unit, which benefits both code reuse and
  14 maintainability. Even though PASM is just an assembly language for a
  15 virtual processor, it has a number of features to support high-level
  16 subroutine calls. PIR offers a smoother interface to those features.
  17
  18 PIR provides several different sets of syntax for subroutine calls.
  19 This is a language designed to implement other languages, and every
  20 language does subroutine calls a little differently. What's needed is
  21 a set of building blocks and tools, not a single prepackaged solution.
  22
  23 =head2 Parrot Calling Conventions
  24
  25 Z<CHP-4-SECT-1.1>
  26
  27 X<PIR (Parrot intermediate representation);subroutines;Parrot calling conventions>
  28 X<subroutines;Parrot calling conventions;in PIR>
  29 As we mentioned in the previous chapter, Parrot defines a set of
  30 calling conventions for externally visible subroutines. In these
  31 calls, the caller is responsible for preserving its own registers, and
  32 arguments and return values are passed in a predefined set of Parrot
  33 registers. The calling conventions use the Continuation Passing Style
  34 X<Continuation Passing Style (CPS)>X<CPS (Continuation Passing Style)>
  35 to pass control to subroutines and back again.
  36
  37 X<PIR (Parrot intermediate representation);subroutine calls>
  38 The fact that the Parrot calling conventions are clearly defined also
  39 makes it possible to provide some higher-level syntax for them. Manually
  40 setting up all the registers for each subroutine call isn't just
  41 tedious, it's also prone to bugs introduced by typos. PIR's simplest
  42 subroutine call syntax looks much like a high-level language. This
  43 example calls the subroutine C<_fact> with two arguments and assigns
  44 the two results to C<$I0> and C<$I1>:
  45
  46      ($I0, $I1) = _fact(count, product)
  47
  48 This simple statement hides a great deal of complexity. It generates a
  49 subroutine object and stores it in C<P0>. It assigns the arguments to
  50 the appropriate registers, assigning any extra arguments to the
  51 overflow array in C<P3>. It also sets up the other registers to mark
  52 whether this is a prototyped call and how many arguments it passes of
  53 each type. It calls the subroutine stored in C<P0>, saving and
  54 restoring the top half of all register frames around the call. And
  55 finally, it assigns the result of the call to the given temporary
  56 register variables (for a single result you can drop the parentheses).
  57 If the one line above were written out in basic PIR it would be
  58 something like:
  59
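  60   newsub P0, .Sub, _fact   # subroutine object goes in P0
  61   I5 = count               # first integer argument
  62   I6 = product             # second integer argument
  63   I0 = 1                   # flag: this is a prototyped call
  64   I1 = 2                   # two integer arguments
  65   I2 = 0                   # no arguments of the other
  66   I3 = 0                   #   register types
  67   I4 = 0
  68   savetop                  # save the top half of the register frames
  69   invokecc                 # call P0, creating a return continuation
  70   restoretop               # restore the saved registers
  71   $I0 = I5                 # first result
  72   $I1 = I6                 # second result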
  73
  74 The PIR code actually generates an C<invokecc> opcode internally. It
  75 not only invokes the subroutine in C<P0>, but also generates a new
  76 return continuation in C<P1>. The called subroutine invokes this
  77 continuation to return control to the caller.
  78
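To make the control flow concrete, here is a minimal Python sketch of the continuation-passing idea. It is a model, not Parrot code: the names C<invokecc> and C<return_continuation> are borrowed from the text for illustration, and the Python closure only stands in for the continuation that Parrot stores in C<P1>.

```python
def fact(count, product, return_continuation):
    """Callee: multiplies product by count!, then hands the result back."""
    while count > 1:
        product *= count
        count -= 1
    # Instead of a plain `return`, the callee invokes the continuation it
    # was given, much as a Parrot subroutine invokes the one in P1.
    return_continuation(product)


def invokecc(sub, *args):
    """Caller side: build a fresh return continuation, then invoke the sub."""
    results = []

    def return_continuation(*values):
        # "Returning to the caller" is modeled as recording the values.
        results.extend(values)

    sub(*args, return_continuation)
    return tuple(results)


print(invokecc(fact, 5, 1))  # (120,)
```

The callee never needs to know who called it; everything it needs to get back is carried by the continuation object.
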
  79 The single-line subroutine call is incredibly convenient, but it isn't
  80 always flexible enough. So PIR also has a more verbose call syntax
  81 that is still more convenient than manual calls. This example pulls
  82 the subroutine C<_fact> out of the global symbol table and calls it:
  83
  84   find_global $P1, "_fact"
  85
  86   .begin_call
  87     .arg count
  88     .arg product
  89     .call $P1
  90     .result $I0
  91   .end_call
  92
  93 X<.arg directive>
  94 X<.result directive>
  95 The whole chunk of code from C<.begin_call> to C<.end_call> acts as a
  96 single unit. The C<.begin_call> directive can be marked as
  97 C<prototyped> or C<unprototyped>, which corresponds to the flag C<I0>
  98 in the calling conventions. The C<.arg> directive sets up arguments to
  99 the call. The C<.call> directive saves top register frames, calls
 100 the subroutine, and restores the top registers. The C<.result>
 101 directive retrieves return values from the call.
 102
 103 X<.param directive>
 104 In addition to syntax for subroutine calls, PIR provides syntax for
 105 subroutine definitions. The C<.param> directive pulls parameters out
 106 of the registers and creates local named variables for them:
 107
 108   .param int c
 109
 110 X<.begin_return directive>
 111 X<.end_return directive>
 112 The C<.begin_return> and C<.end_return> directives act as a
 113 unit much like the C<.begin_call> and C<.end_call> directives:
 114
 115   .begin_return
 116     .return p
 117   .end_return
 118
 119 X<.return directive>
 120 The C<.return> directive sets up return values in the appropriate
 121 registers. After all the registers are set up the unit invokes the
 122 return continuation in C<P1> to return control to the caller.
 123
 124 Here's a complete code example that reimplements the factorial code
 125 from the previous section as an independent subroutine. The subroutine
 126 C<_fact> is a separate compilation unit, assembled and processed after
 127 the C<_main> function.  Parrot resolves global symbols like the
 128 C<_fact> label between different units.
 129
 130   # factorial.pir
 131   .sub _main
 132      .local int count
 133      .local int product
 134      count = 5
 135      product = 1
 136
 137      $I0 = _fact(count, product)
 138
 139      print $I0
 140      print "\n"
 141      end
 142   .end
 143
 144   .sub _fact
 145      .param int c
 146      .param int p
 147
 148   loop:
 149      if c <= 1 goto fin
 150      p = c * p
 151      dec c
 152      branch loop
 153   fin:
 154      .begin_return
 155      .return p
 156      .end_return
 157   .end
 158
 159
 160 This example defines two local named variables, C<count> and
 161 C<product>, and assigns them the values 5 and 1. It calls the C<_fact>
 162 subroutine passing the two variables as arguments. In the call, the
 163 two arguments are assigned to consecutive integer registers, because
 164 they're stored in typed integer variables. The C<_fact> subroutine
 165 uses C<.param> and the return directives for retrieving parameters and
 166 returning results. The final printed result is 120.
 167
 168 You may want to generate a PASM source file for the above example to
 169 look at the details of how the PIR code translates to PASM:
 170
 171   $ parrot -o- factorial.pir
 172
 173 =head2 Compilation Units Revisited
 174
 175 Z<CHP-4-SECT-1.2>
 176
 177 The example above could have been written using simple labels instead
 178 of separate compilation units:
 179
 180   .sub _main
 181       $I1 = 5         # counter
 182       call fact       # same as bsr fact
 183       print $I0
 184       print "\n"
 185       $I1 = 6         # counter
 186       call fact
 187       print $I0
 188       print "\n"
 189       end
 190
 191   fact:
 192       $I0 = 1           # product
 193   L1:
 194       $I0 = $I0 * $I1
 195       dec $I1
 196       if $I1 > 0 goto L1
 197       ret
 198   .end
 199
 200 The unit of code from the C<fact> label definition to C<ret> is a
 201 reusable routine. There are several problems with this simple
 202 approach. First, the caller has to know to pass the argument to
 203 C<fact> in C<$I1> and to get the result from C<$I0>. Second, neither
 204 the caller nor the function itself preserves any registers. This is
 205 fine for the example above, because very few registers are used. But
 206 if this same bit of code were buried deeply in a math routine package,
 207 you would have a high risk of clobbering the caller's register values.
 208
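The clobbering risk can be sketched in Python with a shared list standing in for the integer registers. This is only a model under simplifying assumptions: the list C<I>, the helper names, and the choice to save every register (rather than just the top half) are all hypothetical.

```python
NUM_REGS = 32
I = [0] * NUM_REGS  # hypothetical register file shared by caller and routine


def fact_label():
    """Mirrors the `fact:` label routine: argument in I[1], result in I[0]."""
    I[0] = 1
    while True:
        I[0] *= I[1]
        I[1] -= 1
        if I[1] <= 0:
            break


def call_unprotected(n):
    """Call the routine with no register preservation at all."""
    I[1] = n
    fact_label()
    return I[0]


def call_with_saved_registers(n):
    """Caller-saves variant: snapshot the registers around the call."""
    saved = I[:]          # save (all registers, for simplicity)
    I[1] = n
    fact_label()
    result = I[0]
    I[:] = saved          # restore the caller's values
    return result


I[0], I[1] = 42, 7        # values the caller still cares about
print(call_with_saved_registers(5), I[0], I[1])  # 120 42 7
print(call_unprotected(5), I[0], I[1])           # 120 120 0
```

In the second call the caller's values in the two registers are gone, which is exactly the hazard the calling conventions are designed to avoid.
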
 209 X<PIR (Parrot intermediate representation);register allocation>
 210 X<data flow graph (DFG)>
 211 Another disadvantage of this approach is that C<_main> and C<fact>
 212 share the same compilation unit, so they're parsed and processed as
 213 one piece of code. When Parrot does register allocation, it calculates
 214 the data flow graph (DFG) of all symbols,N<The operation to calculate
 215 the DFG has a quadratic cost or better. It depends on I<n_lines *
 216 n_symbols>.> looks at their usage, calculates the interference between
 217 all possible combinations of symbols, and then assigns a Parrot
 218 register to each symbol. This process is less efficient for large
 219 compilation units than it is for several small ones, so it's better to
 220 keep the code modular. Deciding whether register usage is light
 221 enough to merit combining two compilation units, or even inlining
 222 an entire function, is a job for the optimizer.
 223
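The following Python sketch illustrates why this work grows quickly with the size of a compilation unit. It is not Parrot's allocator: it assumes each symbol is live over a single range of lines (a simplification of a real data flow graph), and the function name and sample symbols are made up for illustration.

```python
from itertools import combinations


def allocate_registers(live_ranges):
    """Map each symbol to a register number.

    live_ranges maps symbol -> (first_line, last_line), inclusive.
    """
    symbols = sorted(live_ranges, key=lambda s: live_ranges[s])

    # Pairwise interference check: quadratic in the number of symbols.
    interferes = {s: set() for s in symbols}
    for a, b in combinations(symbols, 2):
        a_first, a_last = live_ranges[a]
        b_first, b_last = live_ranges[b]
        if a_first <= b_last and b_first <= a_last:  # ranges overlap
            interferes[a].add(b)
            interferes[b].add(a)

    # Greedy assignment: lowest register not used by an interfering symbol.
    assignment = {}
    for s in symbols:
        taken = {assignment[n] for n in interferes[s] if n in assignment}
        reg = 0
        while reg in taken:
            reg += 1
        assignment[s] = reg
    return assignment


ranges = {"count": (1, 4), "product": (2, 6), "tmp": (5, 6)}
print(allocate_registers(ranges))  # {'count': 0, 'product': 1, 'tmp': 0}
```

Because C<count> and C<tmp> are never live at the same time, they can share a register. Splitting code into small units keeps the number of pairs to compare small.
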
 224 =begin sidebar A Short Note on the Optimizer
 225
 226 Z<CHP-4-SIDEBAR-1>
 227
 228 X<optimizer>
 229 The optimizer isn't powerful enough to inline small subroutines yet.
 230 But it already does other simpler optimizations. You may recall that
 231 the PASM opcode C<mul> (multiply) has a two-argument version that uses
 232 the same register for the destination and the first operand. When
 233 Parrot comes across a PIR statement like C<$I0 = $I0 * $I1>, it
 234 can optimize it to the two-argument C<mul $I0, $I1> instead of
 235 the three-argument C<mul $I0, $I0, $I1>. This kind of
 236 optimization is enabled by the C<-O1> command-line
 237 option.
 238
 239 So you don't need to worry about finding the shortest PASM
 240 instruction, calculating constant terms, or avoiding branches to speed
 241 up your code. Parrot does it already.
 242
 243 =end sidebar
 244
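The C<mul> rewrite described in the sidebar above is a simple pattern match. Here is a rough Python sketch of the idea; the tuple representation of instructions and the function name C<shorten_mul> are assumptions made for illustration, not how Parrot represents code internally.

```python
def shorten_mul(instructions):
    """Rewrite `mul d, d, s` into the two-argument form `mul d, s`."""
    optimized = []
    for ins in instructions:
        # A three-operand mul whose destination is also its first operand.
        if len(ins) == 4 and ins[0] == "mul" and ins[1] == ins[2]:
            optimized.append(("mul", ins[1], ins[3]))
        else:
            optimized.append(ins)
    return optimized


program = [
    ("mul", "I0", "I0", "I1"),  # can be shortened
    ("mul", "I2", "I0", "I1"),  # destination differs, left alone
    ("dec", "I1"),
]
for ins in shorten_mul(program):
    print(ins)
```

Only the first instruction changes; the other two pass through untouched.
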
 245
 246 =head2 PASM Subroutines
 247
 248 Z<CHP-4-SECT-1.4>
 249
 250 X<subroutines;PASM>
 251 X<PASM (Parrot assembly language);subroutines>
 252 PIR code can include pure PASM compilation units. These are wrapped in
 253 the C<.emit> and C<.eom> directives instead of C<.sub> and C<.end>.
 254 The C<.emit> directive doesn't take a name; it only acts as a
 255 container for the PASM code. These primitive compilation units can be
 256 useful for grouping PASM functions or function wrappers. Subroutine
 257 entry labels inside C<.emit> blocks have to be global labels:
 258
 259   .emit
 260   _substr:
 261       ...
 262       ret
 263   _grep:
 264       ...
 265       ret
 266   .eom
 267
 268 =head1 Methods
 269
 270 Z<CHP-4-SECT-2>
 271
 272 X<PIR (Parrot intermediate representation);methods>
 273 X<methods;in PIR>
 274 X<classes;methods>
 275 X<. (dot);. (method call);instruction (PIR)>
 276 PIR provides syntax to simplify writing methods and method calls for
 277 object-oriented programming. These calls follow the Parrot
 278 calling conventions as well. First we want to discuss I<namespaces>
 279 in Parrot.
 280
 281 =head2 Namespaces
 282
 283 Z<CHP-4-SECT-2.1>
 284
 285 X<Namespaces>
 286 X<.namespace>
 287 Namespaces provide a mechanism where names can be reused. This may not
 288 sound like much, but in large complicated systems, or systems with
 289 many included libraries, it can become a big hassle very quickly. Each
 290 namespace gets its own area for function names and global variables.
 291 This way, you can have multiple functions named C<create> or C<new>
 292 or C<convert>, for instance, without having to use I<Multi-Method
 293 Dispatch> (MMD), which we will describe later.
 294
 295 Namespaces are specified with the C<.namespace []> directive. The brackets
 296 themselves are not optional, but the keys inside them are. Here are some
 297 examples:
 298
 299     .namespace [ ]               # The root namespace
 300     .namespace [ "Foo" ]         # The namespace "Foo"
 301     .namespace [ "Foo" ; "Bar" ] # Namespace Foo::Bar
 302
 303 Using semicolons, namespaces can be nested to arbitrary depth. Namespaces
 304 are special types of PMC, so we can access them and manipulate them just
 305 like other data objects. We can get the PMC for the root namespace using
 306 the C<get_root_namespace> opcode:
 307
 308     $P0 = get_root_namespace
 309
 310 The current namespace, which might be different from the root namespace,
 311 can be retrieved with the C<get_namespace> opcode:
 312
 313     $P0 = get_namespace             # get current namespace
 314     $P0 = get_namespace [ "Foo" ]   # get PMC for namespace "Foo"
 315
 316 Once we have a namespace PMC, we can call functions in it or retrieve
 317 global variables from it using the C<get_global> opcode:
 318
 319     $P1 = get_global $S0            # Get global in current namespace
 320     $P1 = get_global [ "Foo" ], $S0 # Get global in namespace "Foo"
 321     $P1 = get_global $P0, $S0       # Get global in $P0 namespace PMC
 322
 323 In the examples above, of course, C<$S0> contains the string name of the
 324 global variable or function from the namespace to find.
 325
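As a mental model, nested namespaces behave like nested symbol tables. The Python sketch below shows that idea only; the C<Namespace> class, its attributes, and the choice to create namespaces on first lookup are assumptions for illustration and do not describe the namespace PMC's real interface.

```python
class Namespace:
    """A toy namespace: child namespaces plus a table of globals."""

    def __init__(self):
        self.children = {}
        self.globals = {}

    def get_namespace(self, *keys):
        """Walk a key path such as ("Foo", "Bar"), creating levels as needed."""
        ns = self
        for key in keys:
            if key not in ns.children:
                ns.children[key] = Namespace()
            ns = ns.children[key]
        return ns

    def get_global(self, name):
        """Look up a function or variable stored in this namespace."""
        return self.globals[name]


root = Namespace()
root.get_namespace("Foo").globals["create"] = lambda: "Foo create"
root.get_namespace("Foo", "Bar").globals["create"] = lambda: "Foo::Bar create"

# The same name lives in two namespaces without any conflict.
print(root.get_namespace("Foo").get_global("create")())         # Foo create
print(root.get_namespace("Foo", "Bar").get_global("create")())  # Foo::Bar create
```
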
 326
 327 =head2 Method Syntax
 328
 329 Z<CHP-4-SECT-2.2>
 330
 331 Now that we've discussed namespaces, we can start to discuss
 332 object-oriented programming and method calls. The basic syntax is
 333 similar to the single-line subroutine call above, but instead of a
 334 subroutine label name it takes a variable for the invocant PMC and a
 335 string with the name of the method:
 336
 337   object."methodname"(arguments)
 338
 339 The invocant can be a variable or register, and the method name can be
 340 a literal string, string variable, or method object register. This
 341 tiny bit of code sets up all the registers for a method call and makes
 342 the call, saving and restoring the top half of the register frames
 343 around the call. Internally, the call is a C<callmethodcc> opcode, so
 344 it also generates a return continuation.
 345
 346 This example defines two methods in the C<Foo> class. It calls one
 347 from the main body of the subroutine and the other from within the
 348 first method:
 349
 350   .sub _main
 351     .local pmc class
 352     .local pmc obj
 353     newclass class, "Foo"       # create a new Foo class
 354     new obj, "Foo"              # instantiate a Foo object
 355     obj."_meth"()               # call obj."_meth" which is actually
 356     print "done\n"              # "_meth" in the "Foo" namespace
 357     end
 358   .end
 359
 360   .namespace [ "Foo" ]          # start namespace "Foo"
 361
 362   .sub _meth :method            # define Foo::_meth global
 363      print "in meth\n"
 364      $S0 = "_other_meth"        # method names can be in a register too
 365      self.$S0()                 # self is the invocant
 366   .end
 367
 368   .sub _other_meth :method      # define another method
 369      print "in other_meth\n"    # as above, Parrot provides an
 370   .end                          # implicit return
 371
 372 Each method call looks up the method name in the symbol table of the
 373 object's class. Like C<.pccsub> in PASM, C<.sub> makes a symbol table
 374 entry for the subroutine in the current namespace.
 375
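The lookup can be pictured with a small Python model of the example above. Everything here is a hypothetical stand-in: C<class_namespaces> plays the role of the per-class symbol table, the C<sub> decorator mimics a C<.sub> registering itself in the current namespace, and C<callmethod> imitates the method call.

```python
class_namespaces = {"Foo": {}}  # hypothetical: class name -> symbol table


def sub(namespace, name):
    """Register a function in a namespace, like a .sub after .namespace."""
    def register(fn):
        class_namespaces[namespace][name] = fn
        return fn
    return register


class Obj:
    """A bare object that only remembers the name of its class."""
    def __init__(self, class_name):
        self.class_name = class_name


def callmethod(obj, method_name, *args):
    # Look the name up in the symbol table of the object's class,
    # then pass the invocant as the first argument (self).
    method = class_namespaces[obj.class_name][method_name]
    return method(obj, *args)


log = []


@sub("Foo", "_meth")
def _meth(self):
    log.append("in meth")
    name = "_other_meth"       # the method name can come from a variable
    callmethod(self, name)


@sub("Foo", "_other_meth")
def _other_meth(self):
    log.append("in other_meth")


callmethod(Obj("Foo"), "_meth")
log.append("done")
print(log)  # ['in meth', 'in other_meth', 'done']
```
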
 376 When a C<.sub> is marked with C<:method>, it automatically creates a
 377 local variable named C<self> and assigns it the object passed in
 378 C<P2>.
 379
 380 You can pass multiple arguments to a method and retrieve multiple
 381 return values just like a single-line subroutine call:
 382
 383   (res1, res2) = obj."method"(arg1, arg2)
 384
 385
 386 =cut
 387
 388 # Local variables:
 389 #   c-file-style: "parrot"
 390 # End:
 391 # vim: expandtab shiftwidth=4: