compilers/pirc/README.pod

# Copyright (C) 2001-2009, Parrot Foundation.
# $Id$

=head1 NAME

README.pod - Readme file for the PIRC compiler.

=head1 DESCRIPTION

PIRC is a fresh implementation of the PIR language using Bison and Flex.
Its main features are:

=over 4

=item * reentrancy, which makes it thread-safe.

=item * strength reduction, implemented in the parser.

=item * constant folding, implemented in the parser.

=item * checks for proper use of op arguments in PIR syntax (disallowing, e.g., C<$S0 = print>)

=item * support for multiple heredocs in subroutine invocations (like: foo(<<'A', <<'B', <<'C'))

=item * register usage optimization

=back

=head2 Compiling and Running

=head3 Windows using Microsoft Visual Studio

To compile PIRC on Windows using MSVC:

   nmake

When running PIRC, it needs the shared library C<libparrot>; an easy way to
provide it is to copy C<libparrot.dll> from the Parrot root directory to
C<compilers/pirc/src>.

Running PIRC is as easy as:

 pirc test.pir

See C<pirc -h> for help.

=head3 Linux using GCC

The Makefile should work fine on Linux:

 cd compilers/pirc && make

When running PIRC, it needs the shared library C<libparrot>; in order to let
PIRC find it, set the path as follows:

 export LD_LIBRARY_PATH=../../../blib/lib

Running is as easy as:

 ./pirc test.pir

=head2 Overview

The new Bison/Flex based implementation of the PIR compiler is designed
as a two-stage compiler:

=over 4

=item 1. Heredoc preprocessor

=item 2. PIR compiler

=back

=head2 Heredoc preprocessing

The heredoc preprocessor takes the input as written by the PIR programmer,
and flattens out all heredoc strings. For example, the following input:

 .sub main
   $S0 = <<'EOS'
 This is a heredoc string
   divided
     over
       five
         lines.
 EOS
 .end

is transformed into:

 .sub main
   $S0 = "This is a heredoc string\n  divided\n    over\n      five\n        lines.\n"
 .end
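
The flattening step can be sketched in C as follows. This is a minimal
illustration and not code from C<hdocprep.l>: the function name
C<flatten_heredoc> is hypothetical, and escaping double quotes and backslashes
is an assumption (the example above only shows newlines being escaped).

```c
#include <stddef.h>

/* Hypothetical helper, not taken from hdocprep.l: writes the heredoc body
 * `body` into `out` as a double-quoted string literal, escaping newlines,
 * double quotes and backslashes (the last two are an assumption).
 * Returns 0 on success, -1 if `out` is too small. */
int flatten_heredoc(const char *body, char *out, size_t outsize)
{
    size_t n = 0;

    if (outsize < 3)                 /* need room for two quotes and the NUL */
        return -1;
    out[n++] = '"';

    for (; *body != '\0'; body++) {
        const char *esc = NULL;
        size_t need;

        switch (*body) {
            case '\n': esc = "\\n";  break;
            case '"':  esc = "\\\""; break;
            case '\\': esc = "\\\\"; break;
            default:   break;
        }
        need = esc ? 2 : 1;
        if (n + need + 2 > outsize)  /* keep room for closing quote and NUL */
            return -1;
        if (esc) {
            out[n++] = esc[0];
            out[n++] = esc[1];
        } else {
            out[n++] = *body;
        }
    }
    out[n++] = '"';
    out[n] = '\0';
    return 0;
}
```

Called with the five-line heredoc body from the example, this would yield the
single-line string literal shown in the transformed output.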

In order to allow C<.include>d files to contain heredoc strings, the heredoc
preprocessor also handles the C<.include> directive, even though logically this
is a macro function. See the discussion below for how the C<.include> directive works.

=head2 PIR compiler

The PIR compiler parses the output of the heredoc preprocessor. PIRC's lexer also
handles macros.

The macro layer basically implements text replacements. The following directives are handled:

=over 4

=item C<.macro>

=item C<.macro_const>

=item C<.macro_local>

=item C<.macro_label>

=back

=head3 C<.include>

The C<.include> directive takes a string argument, which is the name of a file. The
contents of this file are inserted at the point where the C<.include> directive
is written. To illustrate this, consider the following example:

 main.pir:
 ========================
 .sub main
   print "hi\n"
   foo()
 .end

 .include "lib.pir"
 ========================

 lib.pir:
 ========================
 .sub foo
   print "foo\n"
 .end
 ========================

This will result in the following output:

 .sub main
   print "hi\n"
   foo()
 .end

 .sub foo
   print "foo\n"
 .end

=head3 C<.macro>

The C<.macro> directive starts a macro definition. The macro preprocessor
implements the expansion of macros. For instance, the following input:

 .macro say(msg)
   print .msg
   print "\n"
 .endm

 .sub main
   .say("hi there!")
 .end

results in this output:

 .sub main
   print "hi there!"
   print "\n"
 .end

=head3 C<.macro_const>

The C<.macro_const> directive is a simplified form of the C<.macro> directive;
it merely gives a name to a constant:

 .macro_const PI 3.14

 .sub main
   print "PI is approximately: "
   print .PI
   print "\n"
 .end

This will result in the output:

 .sub main
   print "PI is approximately: "
   print 3.14
   print "\n"
 .end
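
Since the macro layer is described as text replacement, the expansion of a
C<.macro_const> can be sketched in C as below. This is an illustration only,
not PIRC's lexer code: the function name C<expand_macro_const> is hypothetical,
and the sketch makes no attempt to skip string literals or comments.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: copies `src` to `out`, replacing every `.NAME` that is
 * not followed by another identifier character with `value`.
 * Returns 0 on success, -1 on bad arguments or if `out` is too small. */
int expand_macro_const(const char *src, const char *name, const char *value,
                       char *out, size_t outsize)
{
    size_t namelen = strlen(name);
    size_t vallen  = strlen(value);
    size_t n = 0;

    if (namelen == 0 || outsize == 0)
        return -1;

    while (*src != '\0') {
        /* A use is a dot, the macro name, and then a non-identifier char. */
        int is_use = src[0] == '.'
                  && strncmp(src + 1, name, namelen) == 0
                  && !(isalnum((unsigned char)src[1 + namelen])
                       || src[1 + namelen] == '_');

        if (is_use) {
            if (n + vallen >= outsize)
                return -1;
            memcpy(out + n, value, vallen);
            n   += vallen;
            src += 1 + namelen;
        } else {
            if (n + 1 >= outsize)
                return -1;
            out[n++] = *src++;
        }
    }
    out[n] = '\0';
    return 0;
}
```

Applied to the line C<print .PI> with name C<PI> and value C<3.14>, the sketch
produces C<print 3.14>, matching the output shown above.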

=head3 PIR compiler

As Parrot instructions are polymorphic, the PIR compiler is responsible for
selecting the right variant of the instruction. The selection is based on the
types of the operands. For instance:

 set $I0, 42

will select the C<set_i_ic> instruction: this is the C<set> instruction, taking
an integer (i) result operand and an integer constant (ic) operand. Other examples
are:

 $P0[1] = 42           --> set_p_kic_ic # kic = key integer constant
 $I0 = $P0["hi"]       --> set_i_p_kc   # kc = key constant from constant table
 $P1 = new "Hash"      --> new_p_sc     # sc = string constant
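
The naming scheme used in these examples (op name, then one suffix per operand)
can be sketched in C as follows. The enum, the function names and the set of
operand kinds are hypothetical and limited to the suffixes mentioned above;
how PIRC itself derives the signature is not shown here.

```c
#include <stddef.h>
#include <stdio.h>

/* Operand kinds, limited to the suffixes that appear in the text above. */
typedef enum { OPND_I, OPND_IC, OPND_P, OPND_SC, OPND_KIC, OPND_KC } operand_kind;

static const char *kind_suffix(operand_kind k)
{
    switch (k) {
        case OPND_I:   return "i";    /* integer register         */
        case OPND_IC:  return "ic";   /* integer constant         */
        case OPND_P:   return "p";    /* PMC register             */
        case OPND_SC:  return "sc";   /* string constant          */
        case OPND_KIC: return "kic";  /* key, integer constant    */
        case OPND_KC:  return "kc";   /* key, from constant table */
    }
    return "?";
}

/* Hypothetical helper: builds the full op name, e.g. "set" + (i, ic)
 * gives "set_i_ic". Returns 0 on success, -1 if `out` is too small. */
int full_op_name(const char *op, const operand_kind *operands, size_t count,
                 char *out, size_t outsize)
{
    size_t used, i;
    int n = snprintf(out, outsize, "%s", op);

    if (n < 0 || (size_t)n >= outsize)
        return -1;
    used = (size_t)n;

    for (i = 0; i < count; i++) {
        n = snprintf(out + used, outsize - used, "_%s", kind_suffix(operands[i]));
        if (n < 0 || (size_t)n >= outsize - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}
```

For C<set $I0, 42> the operand kinds are (i, ic), so the sketch yields
C<set_i_ic>; for C<$P0[1] = 42> they are (p, kic, ic), giving C<set_p_kic_ic>.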

=head3 Constant folding

Expressions that can be evaluated at compile time are pre-evaluated, saving
calculations at runtime. Some constant folding is required rather than optional,
as Parrot depends on it. For instance:

 add $I0, 1, 2

is not a valid Parrot instruction; there is no C<add_i_ic_ic> instruction.
Instead, this will be translated to:

 set $I0, 3

which, as was explained earlier, will select the C<set_i_ic> instruction.
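
The rewrite from C<add $I0, 1, 2> to C<set $I0, 3> can be sketched in C as
below. The C<operand> and C<instr> types and the function C<fold_constants> are
hypothetical and much simpler than PIRC's real data structures; the sketch
handles integer constants only and ignores overflow.

```c
#include <string.h>

/* Hypothetical operand: either an integer constant or a register number. */
typedef struct { int is_const; int value; } operand;

/* Hypothetical instruction: op name, target register, up to two sources. */
typedef struct { const char *op; int target; operand src[2]; int nsrc; } instr;

/* Folds add/sub/mul on two integer constants into `set target, result`.
 * Returns 1 if the instruction was folded, 0 if it was left unchanged. */
int fold_constants(instr *ins)
{
    int a, b, result;

    if (ins->nsrc != 2 || !ins->src[0].is_const || !ins->src[1].is_const)
        return 0;

    a = ins->src[0].value;
    b = ins->src[1].value;

    if      (strcmp(ins->op, "add") == 0) result = a + b;
    else if (strcmp(ins->op, "sub") == 0) result = a - b;
    else if (strcmp(ins->op, "mul") == 0) result = a * b;
    else return 0;

    /* Replace the instruction by `set target, result`. */
    ins->op = "set";
    ins->src[0].is_const = 1;
    ins->src[0].value = result;
    ins->nsrc = 1;
    return 1;
}
```

An instruction with a register among its source operands is left untouched,
so that the normal op selection applies to it.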

The conditional branch instructions are also pre-evaluated, if possible. For
instance, consider the following statement:

 if 1 < 2 goto L1

It is clear at compile time that 1 is smaller than 2; so instead of
evaluating this at runtime, we know for sure that the branch to label
C<L1> will be made, effectively replacing the above statement by:

 goto L1

Likewise, if it's clear that certain instructions don't have any effect,
they can be removed altogether:

 if 1 > 2 goto L1        --> nop  # nop: no opcode is emitted.
 $I0 = $I0 + 0           --> nop
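
The decision for a conditional branch on two integer constants can be sketched
in C as follows. The names C<branch_result> and C<fold_branch> are hypothetical,
and the set of relational operators is an assumption for illustration.

```c
#include <string.h>

/* Outcome of evaluating a constant condition at compile time. */
typedef enum {
    BRANCH_ALWAYS,  /* rewrite to an unconditional `goto`  */
    BRANCH_NEVER    /* the statement can be dropped        */
} branch_result;

/* Hypothetical helper: evaluates `if lhs <relop> rhs goto L` for integer
 * constants. Returns 0 and stores the outcome in *out, or -1 if the
 * operator is not recognised. */
int fold_branch(int lhs, const char *relop, int rhs, branch_result *out)
{
    int taken;

    if      (strcmp(relop, "<")  == 0) taken = lhs <  rhs;
    else if (strcmp(relop, "<=") == 0) taken = lhs <= rhs;
    else if (strcmp(relop, ">")  == 0) taken = lhs >  rhs;
    else if (strcmp(relop, ">=") == 0) taken = lhs >= rhs;
    else if (strcmp(relop, "==") == 0) taken = lhs == rhs;
    else if (strcmp(relop, "!=") == 0) taken = lhs != rhs;
    else return -1;

    *out = taken ? BRANCH_ALWAYS : BRANCH_NEVER;
    return 0;
}
```

With this sketch, C<if 1 E<lt> 2 goto L1> evaluates to C<BRANCH_ALWAYS> and
C<if 1 E<gt> 2 goto L1> to C<BRANCH_NEVER>, matching the two cases above.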

Another type of optimization is the selection of (slightly) more efficient
variants of instructions. For instance, consider the following instruction:

 $I0 = $I0 + $I1

which is actually syntactic sugar for:

 add $I0, $I0, $I1

In C one would write (ignoring the fact that $I0 and $I1 are not valid C
identifiers):

 $I0 += $I1

which is in fact valid PIR as well. When the PIR parser sees an instruction
of this form, it will automatically select the variant with 2 operands
instead of the 3-operand variant. So:

 add $I0, $I0, $I1   # $I0 is an out operand

will be optimized, as if you had written:

 add $I0, $I1        # $I0 is an in/out operand

The PIR parser can do even more improvements, if it sees an opportunity to do so.
Consider the following statement:

 $I0 = $I0 + 1

or, in Parrot assembly syntax:

 add $I0, $I0, 1

Again, in C one would write (again ignoring the valid identifier issue) C<$I0++>,
or in other words, I<incrementing> the given identifier. Parrot has C<inc> and C<dec>
instructions built-in as well, so that the above statement C<$I0 = $I0 + 1> can be
optimized to:

 inc $I0
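
Both rewrites of C<add> described above can be sketched in C as below. The
C<operand> and C<instr> types and the function C<reduce_add> are hypothetical
simplifications (integer registers only), not PIRC's parser code.

```c
#include <string.h>

/* Hypothetical operand: either an integer constant or a register number. */
typedef struct { int is_const; int value; } operand;

/* Hypothetical instruction: op name, target register, up to two sources. */
typedef struct { const char *op; int target; operand src[2]; int nsrc; } instr;

/* Rewrites `add T, T, x` where the first source is the target register:
 *   x is the constant 1  ->  inc T
 *   anything else        ->  add T, x   (two-operand form)
 * Returns 1 if the instruction was rewritten, 0 otherwise. */
int reduce_add(instr *ins)
{
    if (strcmp(ins->op, "add") != 0 || ins->nsrc != 2)
        return 0;
    if (ins->src[0].is_const || ins->src[0].value != ins->target)
        return 0;

    if (ins->src[1].is_const && ins->src[1].value == 1) {
        ins->op = "inc";
        ins->nsrc = 0;
    } else {
        ins->src[0] = ins->src[1];   /* drop the repeated target operand */
        ins->nsrc = 1;
    }
    return 1;
}
```

An C<add> whose first source differs from its target, such as
C<add $I0, $I2, $I1>, is left in its three-operand form.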

=head3 Vanilla Register Allocator

The PIR compiler implements a vanilla register allocator. This means that each
declared C<.local> or C<.param> symbol, and each PIR register ($Px, $Sx, $Ix, $Nx)
is assigned a unique PASM register, which is associated with the original symbol
or PIR register throughout the subroutine.

PIRC has a register optimizer, which can optimize the register usage. Run PIRC
with the C<-r> option to activate this. The register optimizer is implemented
using a Linear Scan Register allocator.

The implementation of the vanilla register allocator is done in the PIR symbol
management module (C<pirsymbol.c>).

=head2 Register optimizer

PIRC has a register optimizer, which uses a Linear Scan Register allocation
algorithm. For each symbolic register, a live-interval object is created, which has
a I<start> and I<end> point, indicating the first and last usage of that
symbolic register in the sub. The register optimizer figures out when the live
intervals of symbolic registers don't overlap, in which case they can use the same
register (assuming they're of the same type).
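
The idea of reusing a register once a live interval has ended can be sketched
in C as follows. This is a simplified illustration in the spirit of linear
scan, not PIRC's implementation: the C<live_interval> type and the function
C<assign_registers> are hypothetical, all intervals are assumed to be of one
register type, there is no spilling, and an interval ending at the instruction
where another starts is treated as overlapping.

```c
#include <stdlib.h>

/* Hypothetical live interval: first and last use of a symbolic register,
 * plus the register number assigned by assign_registers(). */
typedef struct { int start; int end; int reg; } live_interval;

static int by_start(const void *a, const void *b)
{
    const live_interval *x = a, *y = b;
    return (x->start > y->start) - (x->start < y->start);
}

/* Sorts the intervals by start point (in place) and gives each one the
 * lowest-numbered register whose previous interval has already ended.
 * Returns the number of registers used, or -1 if allocation fails. */
int assign_registers(live_interval *iv, size_t count)
{
    int *busy_until;   /* busy_until[r]: end point of the last interval in r */
    int used = 0;
    size_t i;

    if (count == 0)
        return 0;
    busy_until = malloc(count * sizeof *busy_until);
    if (busy_until == NULL)
        return -1;

    qsort(iv, count, sizeof *iv, by_start);

    for (i = 0; i < count; i++) {
        int r;
        for (r = 0; r < used; r++)
            if (busy_until[r] < iv[i].start)
                break;               /* register r is free again */
        if (r == used)
            used++;                  /* no free register: take a new one */
        busy_until[r] = iv[i].end;
        iv[i].reg = r;
    }

    free(busy_until);
    return used;
}
```

For four intervals spanning instructions 0 to 3, 1 to 2, 3 to 5 and 4 to 6, the
sketch needs only two registers instead of the four that the vanilla allocator
would assign.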

=head2 Status

Bytecode generation is done, but there is the occasional bug. These
are reported at trac.parrot.org.

=head1 IMPLEMENTATION

The directory C<compilers/pirc> has a number of subdirectories:

=over 4

=item doc - contains documentation.

=item heredoc - contains the implementation of the heredoc preprocessor. This is now
integrated with pirc/src. It now only has a driver program to build a stand-alone
heredoc preprocessor.

=item src - contains the Bison/Flex implementation of PIRC

=item t - for tests. Test input is fed into Parrot after compilation,
which will run the code.

=item macro - contains the old implementation of the macro preprocessor. This is now
integrated with pirc/src. These files are kept as a reference until the macro
preprocessor in pirc/src is completed.

=back

=head1 MAKING CHANGES

If you want to make changes to the lexer or parser files, you will need the Flex
and/or Bison programs. There are ports available for Windows, but I don't know
whether they're any good. I use Cygwin's tools.

=head2 Updating the lexer

The heredoc preprocessor is implemented in C<hdocprep.l>, and can be regenerated
using:

   cd compilers/pirc/src
   flex hdocprep.l

PIRC's normal lexer is implemented in C<pir.l>, and can be regenerated using:

   cd compilers/pirc/src
   flex pir.l

=head2 Updating the parser

The parser is implemented in C<pir.y>, and can be regenerated using:

   cd compilers/pirc/src
   bison pir.y

=head1 NOTES

=head2 Cygwin processable lexer spec.

The file C<pir.l> from which the lexer is generated is I<not> processable by Cygwin's
default version of Flex. In order to make a reentrant lexer, a newer version is needed,
which can be downloaded from the link below.

L<http://sourceforge.net/project/downloading.php?groupname=flex&filename=flex-2.5.33.tar.gz&use_mirror=belnet>

Just do:

 $ ./configure
 $ make

Then make sure to overwrite the supplied flex binary.

=head1 BUGS

Having a look at this implementation would be greatly appreciated, and any resulting
feedback even more :-). Please post bug reports at trac.parrot.org.

=head1 SEE ALSO

See also:

=over 4

=item * C<languages/PIR> for a PGE based implementation.

=item * C<compilers/imcc>, the current I<standard> PIR implementation.

=item * C<docs/imcc/syntax.pod> for a description of PIR syntax.

=item * C<docs/imcc/> for more documentation about the PIR language.

=item * C<docs/pdds/pdd19_pir.pod> for the PIR design document.

=back

=cut
 407
 408 =cut