1 # Copyright (C) 2001-2005, Parrot Foundation.
14 =item 0.2 uninitialized warning; optimizations
16 =item 0.3 merged parsing.pod into this file.
22 This document describes the principles of IMCC operation.
26 The main features of imcc are:
30 =item Source file parsing
32 =item Register allocation
38 =item Running the code
42 =head1 SOURCE FILE PARSING
46 IMCC parses and generates code in terms of I<compilation units>. These
47 are self-contained blocks of code very similar to subroutines.
49 Code for a compilation unit is created as soon (or not earlier) as the
50 end of the unit is reached.
52 {{ Is this true? one sub calling another not-yet compiled sub would
53 not work in that case. }}
56 =head2 Symbols, constants and labels
58 I<Compilation units> maintain their own symbol table containing local
59 labels and variable symbols. This symbol table, C<hash>, is not visible
60 to code in different units.
62 If you need global variables, please use the B<get_{hll,root}_global> opcodes.
64 Global labels and constants are kept in the global symbol table
65 C<ghash>. This allows for global constant folding beyond the scope
66 of individual subroutines.
68 This also means that you currently can't use the same global symbol (e.g.
69 subroutine name) in different namespaces. The following creates invalid code:
84 Local labels in different I<compilation units> with the same name are
85 allowed, though assembling the generated PASM doesn't work. However,
86 running this code inside imcc is ok. This will probably change in the
87 future so that local labels are mangled to be unique.
91 =head1 REGISTER ALLOCATION
93 Register allocation is done per I<compilation unit>.
95 IMCC I<identifiers> and I<temporary variables> e.g. $I0 are assigned a
96 physical parrot register depending on the life range of these
97 variables. If the life range of one variable doesn't overlap the range
98 of another variable, they might get the same parrot register. For
117 provided that $I0 is not used after these lines. In this case, the
118 assignment to $I0 is redundant and will be optimized away if IMCC is
119 run with optimization level B<-O2>.
121 I<PASM registers> keep their register. During the usage of a I<PASM
122 register> this register will be not get assigned to. Therefore, they
123 should be used only when absolutely necessary, and you should try to
124 avoid using them within long pieces of code.
128 To determine the life range of variables, the code gets separated into
129 pieces, called B<basic block>s. A B<basic block> starts at a label,
130 which can get jumped to, and ends at a branch instruction.
134 All connections between the B<basic block>s are calculated. This allows
137 =head2 Loop detection
139 where the range and depth of loops is calculated.
143 Whenever an operand is marked as an B<OUT> argument, this
144 operand starts with a new value. This means that at this point the
145 life range of the symbol ends and a new life range is started, which
146 allows the allocation of a different register to the same variable or
147 the same register to a different variable.
149 Variables used as B<IN> parameters must keep their parrot register
150 over their usage range.
152 When C<imcc> detects a register usage, where the first operation is
153 using (reading) a register (and warnings are enabled), C<imcc> emits
154 an appropriate message.
156 Consider these two code snippets (block numbers are attached):
161 $I0 = 0 # 0 initialized
169 print $I1 # 3 all paths leading here do init
180 $I0 = 0 # 0 initialized
181 if $I0 goto l1 # 0 branch to bb 1 or 2
182 $I1 = 1 # 1 init only in block 1
185 print $I1 # 2 no init in code path from block 0
191 The latter of these emits the warning:
193 warning:imcc:propagate_need: '$I1' might be used \
194 uninitialized in _main:7
196 =head2 Interference graph
198 Once the above information is calculated, the next step is to look at
199 which variables interfere with which others. Non-interfering variables
200 can be given the same parrot register.
202 =head2 Register allocation
204 C<imcc> then starts allocating registers according to a variable's
205 score. Variables deeply nested inside loops have the highest score and
206 get a parrot register first. Variables with a long life range (i.e.
207 with many interferences) get allocated last.
211 Optimizations are only done when enabled with the B<-O> switch. Please
212 consult F<t/imcpasm/*.t> for examples. They occur between various
213 stages and may be repeatedly done: e.g. after converting a conditional
214 branch to an absolute one, unreachable code will be removed then,
215 which might cause unused labels ...
217 =head1 OPTIMIZATIONS WITH -O1
219 =head2 Constant substitution
221 Constant arguments to many ops are evaluated. Conditional branches
222 with constant conditions are converted to unconditional branches.
223 Integer arguments to float operations are converted to float operands.
225 =head2 If-branch optimization
253 The same is done for other conditional branches B<gt>, B<ge>, B<eq>
254 and their reverse meanings.
256 =head2 Branches to branches
258 Unconditional branch sequences get optimized to jumps to the final label.
262 Unreferenced labels are deleted.
264 =head2 Dead code removal
266 Code not reachable after an unconditional branch instruction and basic
267 blocks that are not entered from somewhere get removed.
269 =head1 OPTIMIZATIONS WITH -O2
271 Note: These are currently experimental and might not do the Right
276 For a sequence of code
285 where B<$I0> is not used again, the first assignment will be tossed,
286 resulting in code like:
294 =head2 Loop optimization
296 Instructions which are invariant to a loop are pulled out of the loop
297 and inserted in front of the loop entry.
299 =head1 Code generation
301 C<imcc> either generates PASM or else directly generates a PBC file for
306 Additionally the generated code can be run immediately inside imcc.
307 All parrot runtime options like B<-R jit> or B<-t> are available.
311 F<imc.c>, F<cfg.c>, F<optimizer.c>, F<pbc.c>
315 Leopold Toetsch <lt@toetsch.at>