=head1 NAME

design.pod - description of PIRC's design.

=head1 DESCRIPTION

This document describes the design and implementation of PIRC, a PIR Compiler.

=head1 OVERVIEW

PIRC currently consists of a PIR parser, together with a lexer. It also has
the beginning of semantic actions in the parser. Through the use of a vtable,
several back-ends can be implemented, leaving the parser untouched.

Documentation of the lexer and the parser can be generated by running:

 make docs

which will generate HTML files in the C<doc> directory.

This document will only provide a high-level overview.


=head1 THE LEXER

The lexer is defined in C<pirlexer.c>. The header file lists all tokens that
may be returned by the lexer.

The lexer reads the complete file contents into a buffer, from which it reads
the individual words, or I<tokens>. Reading from a buffer is much faster than
calling C<getc()> for each character, as I/O is relatively slow.
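
As an illustration of the buffering approach, the following is a minimal sketch
of reading a whole file into memory using only standard C. The function name
C<read_file()> and its signature are assumptions made for this example; they
are not taken from C<pirlexer.c>.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read the complete contents of `path` into a newly allocated,
 * NUL-terminated buffer. Returns NULL on failure; on success the caller
 * owns the buffer and must free() it. If `length` is not NULL, it receives
 * the number of bytes read.
 * Hypothetical helper for illustration, not PIRC's actual code. */
char *
read_file(const char *path, size_t *length) {
    FILE  *fp = fopen(path, "rb");
    char  *buffer;
    long   size;
    size_t nread;

    if (fp == NULL)
        return NULL;

    /* determine the file size by seeking to the end and back */
    if (fseek(fp, 0L, SEEK_END) != 0
        || (size = ftell(fp)) < 0
        || fseek(fp, 0L, SEEK_SET) != 0) {
        fclose(fp);
        return NULL;
    }

    buffer = malloc((size_t)size + 1);   /* one extra byte for '\0' */
    if (buffer == NULL) {
        fclose(fp);
        return NULL;
    }

    /* a single read call instead of one call per character */
    nread = fread(buffer, 1, (size_t)size, fp);
    fclose(fp);

    buffer[nread] = '\0';
    if (length != NULL)
        *length = nread;
    return buffer;
}
```

A lexer built on such a buffer only needs a pointer to the current position;
recognizing a token then amounts to advancing that pointer, without any
further I/O.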


=head1 THE PARSER

The parser is defined in C<pirparser.c>. The header file only forward-declares
the C<parser_state> structure; its definition is written in the C file, to hide
the implementation details from other files. Access to specific fields is done
through accessor functions, which are declared in the header file as well.

The parser communicates with the lexer through the lexer's accessor functions.
Of these, the C<next_token()> function is the most important: it requests the
next token from the lexer.

The parser does not know anything about the spelling of tokens, although it can
request these through C<find_keyword()>.
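
The way C<parser_state> is hidden behind accessor functions can be sketched as
follows. This is only an illustration of the opaque-structure idiom in C: apart
from C<parser_state> and C<curtoken>, all names (C<new_parser()>,
C<free_parser()>, C<get_parse_errors()>, C<record_parse_error()>, and the
C<parse_errors> field) are hypothetical and not taken from PIRC. Header and C
file are shown in one block to keep the example self-contained.

```c
#include <stdlib.h>

/* --- what a header such as pirparser.h might expose (hypothetical) --- */

/* forward declaration only: users of the header cannot see the fields */
struct parser_state;
typedef struct parser_state parser_state;

parser_state *new_parser(void);
void          free_parser(parser_state *p);
int           get_parse_errors(const parser_state *p);
void          record_parse_error(parser_state *p);

/* --- what the C file keeps to itself --- */

/* full definition; the fields are illustrative, not PIRC's actual ones */
struct parser_state {
    int curtoken;      /* token currently being examined */
    int parse_errors;  /* number of syntax errors reported so far */
};

/* allocate a parser with all fields zeroed; returns NULL on failure */
parser_state *
new_parser(void) {
    parser_state *p = malloc(sizeof *p);
    if (p != NULL) {
        p->curtoken     = 0;
        p->parse_errors = 0;
    }
    return p;
}

void
free_parser(parser_state *p) {
    free(p);
}

/* accessor: read a field without exposing the structure layout */
int
get_parse_errors(const parser_state *p) {
    return p->parse_errors;
}

void
record_parse_error(parser_state *p) {
    p->parse_errors++;
}
```

Because other files only ever hold a C<parser_state *>, fields can be added or
renamed in the C file without having to recompile or change the code that uses
the parser.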


=head1 SEMANTIC ACTIONS

At a number of places, the parser calls C<emit> functions. These are I<hooks>:
a back-end can attach a function to each of them, and that function is then
called whenever the parser reaches that point. This is implemented using
vtables. Of course, not all hooks need to be used. If the user does not assign
a function to a hook, a default, empty function is invoked. This avoids NULL
checks in the parser; hopefully the optimizer sees that the invoked function
is empty and removes the overhead of calling it.
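
The default-hook mechanism can be sketched as below. This is a minimal
illustration under stated assumptions, not PIRC's actual vtable: the structure
name C<pirvtable>, its two hooks, the C<data> pointer, and the functions
C<new_pirvtable()>, C<emit_sub_start()>, C<emit_sub_end()> and C<count_sub()>
are all made up for the example.

```c
#include <stdlib.h>

/* Hypothetical vtable with two hooks; PIRC's real vtable has more entries
 * and its hooks may take different arguments. */
typedef struct pirvtable {
    void (*sub_start)(void *data);
    void (*sub_end)(void *data);
    void  *data;   /* back-end specific state, passed to every hook */
} pirvtable;

/* default hook: does nothing */
static void
default_hook(void *data) {
    (void)data;   /* unused */
}

/* create a vtable in which every hook points at the empty default */
pirvtable *
new_pirvtable(void) {
    pirvtable *vt = malloc(sizeof *vt);
    if (vt != NULL) {
        vt->sub_start = default_hook;
        vt->sub_end   = default_hook;
        vt->data      = NULL;
    }
    return vt;
}

/* what an emit function called by the parser could look like:
 * the hook is always callable, so no NULL check is needed */
void
emit_sub_start(pirvtable *vt) {
    vt->sub_start(vt->data);
}

void
emit_sub_end(pirvtable *vt) {
    vt->sub_end(vt->data);
}

/* example hook for a back-end that only counts subs;
 * expects `data` to point at an int */
void
count_sub(void *data) {
    ++*(int *)data;
}
```

A back-end that is only interested in the start of a sub would assign
C<count_sub> to C<sub_start> and leave C<sub_end> at its default; the parser
code calling C<emit_sub_end()> stays exactly the same.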

=head2 Example Vtable methods

This section gives a simple example to show how the hooks are used.
Let's consider a simplified version of a C<long_invocation>. Syntactically, it
looks like this:

 long-invocation -> '.begin_call' '\n'
                    arguments
                    '.call' invokable '\n'
                    results
                    '.end_call' '\n'

 invokable -> IDENTIFIER | PREG

The parsing routine for long-invocation (again, in simplified form) looks
as follows:

 static void
 long_invocation(parser_state *p) {
     emit_invocation_start(p); /* indicate start of invocation */
     match(p, T_PCC_BEGIN);    /* accept token '.begin_call' */
     match(p, T_NEWLINE);
     arguments(p);

     match(p, T_PCC_CALL);     /* accept token '.call' */

     /* check whether the current token is an invokable object; if so,
      * store it and get the next token */
     switch (p->curtoken) {
         case T_IDENTIFIER:
         case T_PREG:
             emit_invokable(p, get_current_token(p->lexer));
             /* advance; assumes next_token() takes the lexer, like
              * get_current_token() does, and returns the new token */
             p->curtoken = next_token(p->lexer);
             break;
         default:
             syntax_error(p, 1, "invokable object expected");
             break;
     }

     match(p, T_NEWLINE);      /* accept the newline after the invokable */

     results(p);

     match(p, T_PCC_END);      /* accept token '.end_call' */
     match(p, T_NEWLINE);      /* accept the newline token */
     emit_invocation_end(p);   /* close down invocation sequence */
 }

 static void
 arguments(parser_state *p) {
     emit_args_start(p);    /* start sequence of arguments */
     /* handle arguments */
     emit_args_end(p);      /* stop sequence of arguments */
 }

 static void
 results(parser_state *p) {
     emit_results_start(p); /* start sequence of results */
     /* handle results */
     emit_results_end(p);   /* stop sequence of results */
 }

To each of the C<emit_*> function calls, the writer of a back-end can
hook a custom function that does the Appropriate Thing. What is
appropriate depends on the back-end. Some back-ends need to construct
a data structure such as an AST (the PBC back-end, for example, would need
this); others can just spit out what they get (like the PIR back-end).
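
To make this concrete, here is a sketch of a back-end of the "spit out what
you get" kind, which turns the hooks of an invocation back into text. It is an
illustration only: the cut-down C<emit_vtable> structure, the C<text_backend>
state and the function C<init_text_backend()> are assumptions for this example
and do not reflect the real interface in C<src/pirvtable.h>.

```c
#include <string.h>

/* Hypothetical, cut-down vtable with just the hooks used below. */
typedef struct emit_vtable {
    void (*invocation_start)(void *data);
    void (*invokable)(void *data, const char *name);
    void (*invocation_end)(void *data);
    void  *data;                       /* back-end state */
} emit_vtable;

/* State of a text-emitting back-end: output is appended to a fixed buffer. */
typedef struct text_backend {
    char   out[256];
    size_t used;
} text_backend;

/* append a string to the back-end's buffer, truncating if it is full */
static void
append(text_backend *b, const char *s) {
    size_t room = sizeof b->out - b->used - 1;   /* keep space for '\0' */
    size_t len  = strlen(s);
    if (len > room)
        len = room;
    memcpy(b->out + b->used, s, len);
    b->used += len;
    b->out[b->used] = '\0';
}

/* the custom functions hooked into the vtable */
static void
text_invocation_start(void *data) {
    append(data, ".begin_call\n");
}

static void
text_invokable(void *data, const char *name) {
    append(data, ".call ");
    append(data, name);
    append(data, "\n");
}

static void
text_invocation_end(void *data) {
    append(data, ".end_call\n");
}

/* fill in a vtable so that it drives the text back-end */
void
init_text_backend(emit_vtable *vt, text_backend *b) {
    b->used   = 0;
    b->out[0] = '\0';
    vt->invocation_start = text_invocation_start;
    vt->invokable        = text_invokable;
    vt->invocation_end   = text_invocation_end;
    vt->data             = b;
}
```

A back-end that needs a data structure would hook functions into the same
slots, but allocate and link nodes in its own state instead of appending text.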


=head2 Supported back-ends

Currently, there are the following back-end targets:

=over 4

=item *

PAST - textual form of PAST using Data::Dumper format.

=item *

PIR - PIR output, which I<may> change PIR syntax into PASM syntax.

=item *

JSON - JSON is extremely simple, and adding this back-end was pretty easy.

=item *

PBC - not implemented at all yet; there is only a stub file.

=back

See src/pirvtable.{c,h} for details.

Please note that none of the back-ends is complete.


=head2 VTable Methods

Now, you might ask yourself who or what decides where these hooks, or vtable
method calls, are placed. "Why is there a hook over I<here>?" Well, that is
decided by figuring out at what moment a back-end might need some information
from the parser. As the parser continues, the tokens being read are lost if
they are not stored anywhere. So, every once in a while the back-end needs to
be able to store stuff, so it can do its job properly.

The vtable is not complete yet. There are a number of parsing routines that do
not have associated vtable methods (invocations). Of course, we don't want the
parser to do too many vtable invocations. On the other hand, if the parser does
too few, constructing a back-end might be impossible. It's a bit of a trade-off.


=head1 FILE OVERVIEW

This section gives an overview of what functionality is in what file:

=over 4

=item *

src/pirlexer.{c,h} - implementation of the lexer

=item *

src/pirparser.{c,h} - implementation of the parser

=item *

src/pirvtable.{c,h} - constructor of an empty vtable

=item *

src/pirout.{c,h} - back-end that implements the vtable methods to output PIR

=item *

src/pastout.{c,h} - back-end that implements the vtable methods to output PAST
(in Data::Dumper format)

=item *

src/pbcout.{c,h} - dummy back-end for PBC. This file only creates a vtable;
there is no implementation yet.

=item *

src/jsonout.{c,h} - back-end that implements the vtable methods to output JSON.

=item *

src/pirmain.c - main file for C<pirc>. Execution starts here.

=back

=head1 WHAT NEEDS TO BE DONE

There are some major TODOs:

=over 4

=item *

Check whether an identifier is actually a Parrot op. In IMCC, this is done by
calling C<Parrot_is_builtin()>. However, for that, we need a C<Parrot_Interp>.
Currently I have problems getting things to link correctly.

=item *

Complete at least one back-end, to see what more vtable entries we need. And of
course, to generate PBC in the end.

=item *

Complete the vtable structure with all needed vtable methods.

=item *

Memory management: not all memory is freed at this moment. Does it need to be
done by the back-end, or by the parser?

=back

=head1 AUTHOR

Klaas-Jan Stol <parrotcode at gmail dot com>

=cut