docs/intro.pod

   1 # Copyright (C) 2001-2009, Parrot Foundation.
   2 # $Id$
   3
   4 =head1 NAME
   5
   6 docs/intro.pod - The Parrot Primer
   7
   8 =head1 Welcome to Parrot
   9
  10 This document provides a gentle introduction to the Parrot virtual machine for
  11 anyone considering writing code for Parrot by hand, writing a compiler that
  12 targets Parrot, getting involved with Parrot development or simply wondering
  13 what on earth Parrot is.
  14
  15 =head1 What is Parrot?
  16
  17 =head2 Virtual Machines
  18
  19 Parrot is a virtual machine. To understand what a virtual machine is, consider
  20 what happens when you write a program in a language such as Perl, then run it
  21 with the applicable interpreter (in the case of Perl, the perl executable).
  22 First, the program you have written in a high level language is turned into
  23 simple instructions, for example I<fetch the value of the variable named x>,
  24 I<add 2 to this value>, I<store this value in the variable named y>, etc. A
  25 single line of code in a high level language may be converted into tens of
  26 these simple instructions. This stage is called I<compilation>.
  27
  28 The second stage involves executing these simple instructions. Some languages
  29 (for example, C) are often compiled to instructions that are understood by the
  30 CPU and as such can be executed by the hardware. Other languages, such as Perl,
  31 Python and Java, are usually compiled to CPU-independent instructions.  A
  32 I<virtual machine> (sometimes known as an I<interpreter>) is required to
  33 execute those instructions.
  34
  35 While the central role of a virtual machine is to efficiently execute
  36 instructions, it also performs a number of other functions. One of these is to
  37 abstract away the details of the hardware and operating system that a program
  38 is running on. Once a program has been compiled to run on a virtual machine, it
  39 will run on any platform that the VM has been implemented on. VMs may also
  40 provide security by allowing more fine-grained limitations to be placed on a
  41 program, memory management functionality and support for high level language
  42 features (such as objects, data structures, types, subroutines, etc).
  43
  44 =head2 Design goals
  45
  46 Parrot is designed with the needs of dynamically typed languages (such as Perl
  47 and Python) in mind, and should be able to run programs written in these
  48 languages more efficiently than VMs developed with static languages in mind
  49 (JVM, .NET). Parrot is also designed to provide interoperability between
  50 languages that compile to it. In theory, you will be able to write a class in
  51 Perl, subclass it in Python and then instantiate and use that subclass in a Tcl
  52 program.
  53
  54 Historically, Parrot started out as the runtime for Perl 6. Unlike Perl 5, the
  55 Perl 6 compiler and runtime (VM) are to be much more clearly separated. The
  56 name I<Parrot> was chosen after the 2001 April Fools' joke, which had Perl and
  57 Python collaborating on the next version of their languages. The name reflects
  58 the intention to build a VM to run not just Perl 6, but also many other
  59 languages.
  60
  61
  62 =head1 Parrot concepts and jargon
  63
  64 =head2 Instruction formats
  65
  66 Parrot can currently accept instructions to execute in four forms. PIR (Parrot
  67 Intermediate Representation) is designed to be written by people and generated
  68 by compilers. It hides away some low-level details, such as the way parameters
  69 are passed to functions. PASM (Parrot Assembly) is a level below PIR - it is
  70 still human readable/writable and can be generated by a compiler, but the
  71 author has to take care of details such as calling conventions and register
  72 allocation. PAST (Parrot Abstract Syntax Tree) enables Parrot to accept an
  73 abstract syntax tree style input - useful for those writing compilers.
  74
  75 All of the above forms of input are automatically converted inside Parrot to
  76 PBC (Parrot Bytecode). This is much like machine code, but understood by the
  77 Parrot interpreter. It is not intended to be human-readable or human-writable,
  78 but unlike the other forms execution can start immediately, without the need
  79 for an assembly phase. Parrot bytecode is platform independent.
  80
  81 =head2 The instruction set
  82
  83 The Parrot instruction set includes arithmetic and logical operators, compare
  84 and branch/jump (for implementing loops, if...then constructs, etc), finding
  85 and storing global and lexical variables, working with classes and objects,
  86 calling subroutines and methods along with their parameters, I/O, threads and
  87 more.
  88
  89 =head2 Registers and fundamental data types
  90
  91 The Parrot VM is register based. This means that, like a hardware CPU, it has a
  92 number of fast-access units of storage called registers. There are 4 types of
  93 register in Parrot: integers (I), numbers (N), strings (S) and PMCs (P). The
  94 registers of each type are numbered from zero: I0, I1, ..., N0, N1, ... and so
  95 on. Integer registers are the same size as a word on the machine Parrot is
  96 running on, and number registers also map to a native floating point type.
  97 The number of registers needed is determined per subroutine at compile time.
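
As a rough illustration of the four register types, the following sketch stores
one value in a register of each kind and prints it. It uses PIR, which is
introduced later in this document and writes registers with a C<$> prefix. The
values are arbitrary, and the choice of the C<Integer> PMC is only an example.

=begin PIR

  .sub main :main
      $I0 = 42              # integer register
      $N0 = 1.5             # number (floating point) register
      $S0 = "parrot"        # string register
      $P0 = new 'Integer'   # PMC register holding an Integer PMC
      $P0 = 7               # give the PMC an integer value

      say $I0
      say $N0
      say $S0
      say $P0
  .end

=end PIR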
  98
  99 =head2 PMCs
 100
 101 PMC stands for Polymorphic Container. PMCs represent any complex data structure
 102 or type, including aggregate data types (arrays, hash tables, etc). A PMC can
 103 implement its own behavior for arithmetic, logical and string operations
 104 performed on it, allowing for language-specific behavior to be introduced. PMCs
 105 can be built in to the Parrot executable or dynamically loaded when they are
 106 needed.
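
As a minimal sketch of an aggregate PMC in use, the following PIR creates a
C<ResizablePMCArray>, adds two strings to it, and reads back the element count
and the first element. The variable name C<names> and the strings are made up
for this example, and the syntax assumes a Parrot release from the same period
as this document.

=begin PIR

  .sub main :main
      .local pmc names
      names = new 'ResizablePMCArray'

      # The array PMC supplies its own behavior for push.
      push names, "alpha"
      push names, "beta"

      # Ask the PMC how many elements it holds.
      $I0 = elements names
      print "The array holds "
      print $I0
      say " items."

      # Keyed access fetches an element as a string.
      $S0 = names[0]
      say $S0
  .end

=end PIR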
 107
 108 =head2 Garbage Collection
 109
 110 Parrot provides garbage collection, meaning that Parrot programs do not need
 111 to free memory explicitly; it will be freed when it is no longer in use (that
 112 is, no longer referenced) whenever the garbage collector runs.
 113
 114
 115 =head1 Obtaining, building and testing Parrot
 116
 117 =head2 Where to get Parrot
 118
 119 See L<http://www.parrot.org/download> for several ways to get a recent
 120 version of Parrot.
 121
 122 =head2 Building Parrot
 123
 124 The first step to building Parrot is to run the F<Configure.pl> program, which
 125 looks at your platform and decides how Parrot should be built. This is done by
 126 typing:
 127
 128   perl Configure.pl
 129
 130 Once this is complete, run the C<make> program that C<Configure.pl> tells you
 131 to use. When this completes, you will have a working C<parrot> executable.
 132
 133 Please report any problems that you encounter while building Parrot so the
 134 developers can fix them. You can do this by creating a login and opening
 135 a new ticket at L<https://trac.parrot.org>.  Please include the F<myconfig>
 136 file that was generated as part of the build process and any errors that you
 137 observed.
 138
 139 =head2 The Parrot test suite
 140
 141 Parrot has an extensive regression test suite. This can be run by typing:
 142
 143   make test
 144
 145 Replace C<make> with the name of the make program on your platform. The output
 146 will look something like this:
 147
 148  C:\Perl\bin\perl.exe t\harness --gc-debug
 149    t\library\*.t  t\op\*.t  t\pmc\*.t  t\run\*.t  t\native_pbc\*.t
 150    imcc\t\*\*.t  t\dynpmc\*.t  t\p6rules\*.t t\src\*.t t\perl\*.t
 151  t\library\dumper...............ok
 152  t\library\getopt_long..........ok
 153  ...
 154  All tests successful, 4 test and 71 subtests skipped.
 155  Files=163, Tests=2719, 192 wallclock secs ( 0.00 cusr +  0.00 csys =  0.00 CPU)
 156
 157 It is possible that a number of tests may fail. If this is a small number, then
 158 it is probably little to worry about, especially if you have the latest Parrot
 159 sources from the SVN repository. However, please do not let this discourage you
 160 from reporting test failures, using the same method as described for reporting
 161 build problems.
 162
 163
 164 =head1 Some simple Parrot programs
 165
 166 =head2 Hello world!
 167
 168 Create a file called F<hello.pir> that contains the following code.
 169
 170 =begin PIR
 171
 172   .sub main
 173       say "Hello world!"
 174   .end
 175
 176 =end PIR
 177
 178 Then run it by typing:
 179
 180   parrot hello.pir
 181
 182 As expected, this will display the text C<Hello world!> on the console,
 183 followed by a new line.
 184
 185 Let's take the program apart. C<.sub main> states that the instructions that
 186 follow make up a subroutine named C<main>, until a C<.end> is encountered. The
 187 second line contains the C<say> instruction. In this case, we are calling the
 188 variant of the instruction that accepts a constant string. The assembler takes
 189 care of deciding which variant of the instruction to use for us.
 190
 191 =head2 Using registers
 192
 193 We can modify F<hello.pir> to first store the string C<Hello world!> in a
 194 register and then use that register with the C<say> instruction.
 195
 196 =begin PIR
 197
 198   .sub main
 199       $S0 = "Hello world!"
 200       say $S0
 201   .end
 202
 203 =end PIR
 204
 205 PIR does not allow us to name a real register directly. Instead, we prefix the
 206 register name with C<$>. The compiler will map C<$S0> to one of the available
 207 string registers, for example S0, and set the value.
 208 This example also uses the syntactic sugar provided by the C<=> operator.  C<=>
 209 is simply a more readable way of using the C<set> opcode.
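
For comparison, here is a sketch of the same program with the C<set> opcode
written out instead of C<=>. It is meant to behave the same as the version
above.

=begin PIR

  .sub main
      # Same as: $S0 = "Hello world!"
      set $S0, "Hello world!"
      say $S0
  .end

=end PIR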
 210
 211 To make PIR even more readable, named registers can be used. These are later
 212 mapped to real numbered registers.
 213
 214 =begin PIR
 215
 216   .sub main
 217       .local string hello
 218       hello = "Hello world!"
 219       say hello
 220   .end
 221
 222 =end PIR
 223
 224 The C<.local> directive indicates that the named register is only needed inside
 225 the current subroutine (that is, between C<.sub> and C<.end>). Following
 226 C<.local> is a type. This can be C<int> (for I registers), C<float> (for N
 227 registers), C<string> (for S registers), C<pmc> (for P registers) or the name
 228 of a PMC type.
 229
 230 =head2 PIR vs. PASM
 231
 232 PASM does not handle register allocation or provide support for named
 233 registers.  It also does not have the C<.sub> and C<.end> directives, instead
 234 replacing them with a label at the start of the instructions.
 235
 236 =head2 Summing squares
 237
 238 This example introduces some more instructions and PIR syntax. Lines starting
 239 with a C<#> are comments.
 240
 241 =begin PIR
 242
 243   .sub main
 244       # State the number of squares to sum.
 245       .local int maxnum
 246       maxnum = 10
 247
 248       # We'll use some named registers. Note that we can declare many
 249       # registers of the same type on one line.
 250       .local int i, total, temp
 251       total = 0
 252
 253       # Loop to do the sum.
 254       i = 1
 255   loop:
 256       temp = i * i
 257       total += temp
 258       inc i
 259       if i <= maxnum goto loop
 260
 261       # Output result.
 262       print "The sum of the first "
 263       print maxnum
 264       print " squares is "
 265       print total
 266       print ".\n"
 267   .end
 268
 269 =end PIR
 270
 271 PIR provides a bit of syntactic sugar that makes it look more high level than
 272 assembly. For example:
 273
 274 =begin PIR_FRAGMENT
 275
 276   .local pmc temp, i
 277   temp = i * i
 278
 279 =end PIR_FRAGMENT
 280
 281 Is just another way of writing the more assembly-ish:
 282
 283 =begin PIR_FRAGMENT
 284
 285   .local pmc temp, i
 286   mul temp, i, i
 287
 288 =end PIR_FRAGMENT
 289
 290 And:
 291
 292 =begin PIR_FRAGMENT
 293
 294   .local pmc i, maxnum
 295   if i <= maxnum goto loop
 296   # ...
 297   loop:
 298
 299 =end PIR_FRAGMENT
 300
 301 Is the same as:
 302
 303 =begin PIR_FRAGMENT
 304
 305   .local pmc i, maxnum
 306   le i, maxnum, loop
 307   # ...
 308   loop:
 309
 310 =end PIR_FRAGMENT
 311
 312 And:
 313
 314 =begin PIR_FRAGMENT
 315
 316   .local pmc temp, total
 317   total += temp
 318
 319 =end PIR_FRAGMENT
 320
 321 Is the same as:
 322
 323 =begin PIR_FRAGMENT
 324
 325   .local pmc  temp, total
 326   add total, temp
 327
 328 =end PIR_FRAGMENT
 329
 330 As a rule, whenever a Parrot instruction modifies the contents of a register,
 331 that will be the first register when writing the instruction in assembly form.
 332
 333 As is usual in assembly languages, loops and selection are implemented in terms
 334 of conditional branch statements and labels, as shown above. Assembly
 335 programming is one place where using goto is not bad form!
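
Selection works the same way as the loop. The sketch below is illustrative
only: it picks between two strings with a single conditional branch, and the
names C<n>, C<kind> and the label C<report> are made up for this example.

=begin PIR

  .sub main
      .local int n
      .local string kind
      n = 7

      # Assume the number is odd, then correct that if the remainder is zero.
      kind = "odd"
      $I0 = n % 2
      if $I0 != 0 goto report
      kind = "even"

  report:
      print n
      print " is "
      say kind
  .end

=end PIR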
 336
 337 =head2 Recursively computing factorial
 338
 339 In this example we define a factorial function and recursively call it to
 340 compute factorial.
 341
 342 =begin PIR
 343
 344   .sub factorial
 345       # Get input parameter.
 346       .param int n
 347
 348       # return (n > 1 ? n * factorial(n - 1) : 1)
 349       .local int result
 350
 351       if n > 1 goto recurse
 352       result = 1
 353       goto return
 354
 355   recurse:
 356       $I0 = n - 1
 357       result = factorial($I0)
 358       result *= n
 359
 360   return:
 361       .return (result)
 362   .end
 363
 364
 365   .sub main :main
 366       .local int f, i
 367
 368       # We'll do factorial 0 to 10.
 369       i = 0
 370   loop:
 371       f = factorial(i)
 372
 373       print "Factorial of "
 374       print i
 375       print " is "
 376       print f
 377       print ".\n"
 378
 379       inc i
 380       if i <= 10 goto loop
 381   .end
 382
 383 =end PIR
 384
 385 The first line, C<.param int n>, specifies that this subroutine takes one
 386 integer parameter and that we'd like to refer to the register it was passed in
 387 by the name C<n> for the rest of the sub.
 388
 389 Much of what follows has been seen in previous examples, apart from the line
 390 reading:
 391
 392 =begin PIR_FRAGMENT
 393
 394   .local int result
 395   result = factorial($I0)
 396
 397 =end PIR_FRAGMENT
 398
 399 The last line of PIR actually represents a few lines of PASM. The assembler
 400 builds a PMC that describes the signature, including which register the
 401 arguments are held in. A similar process happens for providing the registers
 402 that the return values should be placed in. Finally, the C<factorial> sub is
 403 invoked.
 404
 405 Right before the C<.end> of the C<factorial> sub, a C<.return> directive is
 406 used to specify that the value held in the register named C<result> is to be
 407 copied to the register that the caller is expecting the return value in.
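
The same directives extend to more than one parameter. As a small sketch, the
hypothetical sub C<add_two> below takes two integers, each declared with its
own C<.param> line, and hands their sum back with C<.return>. The sub name and
the argument values are invented for this example.

=begin PIR

  .sub add_two
      # One .param line per parameter, in the order the caller passes them.
      .param int a
      .param int b

      $I0 = a + b
      .return ($I0)
  .end

  .sub main :main
      # The return value is copied into $I0 in the caller.
      $I0 = add_two(3, 4)
      say $I0
  .end

=end PIR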
 408
 409 The call to C<factorial> in main works in just the same way as the recursive
 410 call to C<factorial> within the sub C<factorial> itself. The only remaining
 411 bit of new syntax is the C<:main>, written after C<.sub main>. By default,
 412 PIR assumes that execution begins with the first sub in the file. This
 413 behavior can be changed by marking the sub to start in with C<:main>.
 414
 415 =head2 Compiling to PBC
 416
 417 To compile PIR to bytecode, use the C<-o> flag and specify an output file with
 418 the extension F<.pbc>.
 419
 420   parrot -o factorial.pbc factorial.pir
 421
 422 =head1 Where next?
 423
 424 =head2 Documentation
 425
 426 What documentation you read next depends upon what you are looking to do with
 427 Parrot. The opcodes reference and built-in PMCs reference are useful to dip
 428 into for pretty much everyone. If you intend to write or compile to PIR then
 429 there are a number of documents about PIR that are worth a read. For compiler
 430 writers, the Compiler FAQ is essential reading. If you want to get involved
 431 with Parrot development, the PDDs (Parrot Design Documents) contain some
 432 details of the internals of Parrot; a few other documents fill in the gaps. One
 433 way of helping Parrot development is to write tests, and there is a document
 434 entitled I<Testing Parrot> that will help with this.
 435
 436 =head2 The Parrot Mailing List
 437
 438 Much Parrot development and discussion takes place on the
 439 parrot-dev mailing list. You can subscribe by filling out the form at
 440 L<http://lists.parrot.org/mailman/listinfo/parrot-dev> or read the NNTP
 441 archive at L<http://groups.google.com/group/parrot-dev/>.
 442
 443 =head2 IRC
 444
 445 The Parrot IRC channel is hosted on irc.parrot.org and is named C<#parrot>.
 446 Alternative IRC servers are at irc.pobox.com and irc.rhizomatic.net.
 447
 448 =cut