1 We need to switch to a new register allocator.
2 The current one is split in a global and a local register allocator.
3 The global one can assign only callee-saves registers and happens
4 on the tree-based internal representation: it assigns local variables
6 The local one happens on the linear representation on a per basic
7 block basis and assigns hard registers to virtual registers (which
8 hold temporary values during expression executions) and it deals also
9 with the platform-specific issues (fixed registers, call conventions).
11 Moving to a different register will help solve some of the performance
12 issues introduced by the above split, make the register more easily
13 portable and solve some of the issues generated by dealing with trees.
15 The general design ideas are below.
17 The new allocator should have a global view of all the method, so it can be
18 able to assign variables also to some of the volatile registers if possible,
19 even across basic blocks (this would improve performance).
21 The allocator would be driven by per-arch declarative data, so porting
22 should be easier: an architecture needs to specify register classes,
23 call convention and instructions requirements (similar to the gcc code).
25 The allocator should operate on the linear representation, this way it's
26 easier and faster to track usages more correctly. We need to assign virtual
27 registers on a per-method basis instead of per basic block. We can assign
28 virtual registers to variables, too. Note that since we fix the stack offset
29 of local vars only after this step (which happens after the burg rules are run),
30 some of the burg rules that try to optimize the code won't apply anymore:
31 the peephole code may need to be enhanced to do the optimizations instead.
33 We need to handle floating point registers in the global allocator, too.
35 The new allocator also needs to keep track precisely of which registers
36 contain references or managed pointers to allow us to move to a precise GC.
38 It may be worth to use a single increasing set of integers for the virtual
39 registers, with the class of the register stored separately (unless the
40 current local allocator which keeps interger and fp registers separate).
42 Since this is a large task, we need to do it in steps as much as possible.
43 The first is to run the register allocator _after_ the burg rules: this
44 requires a rewrite of the liveness code, too, to use linear indexes instead
45 of basic-block/tree number combinations. This can be done by:
46 *) allocating virtual regs to all the locals that can be register allocated
47 *) running the burg rules (some may require adjustments): the local virtual
48 registers are assigned starting from global-virt-regs+1, instead of the current
49 hardware-regs+1, so we can tell apart global and local virt regs.
50 *) running the liveness/whatever code is needed to allocate the global registers
51 *) allocate the rest of the local variables to stack slots
52 *) continue with the current local allocator
54 This work could take 2-3 weeks.
56 The next step is to define the kind of declarative data an architecture needs
57 and assigning virtual regs to all the registers and making the allocator
58 assign from the volatile registers, too.
59 Note that some of the code that is currently emitted in the arch-specific
60 code, will need to be emitted as instructions that the reg allocator
61 can inspect: think of a method that returns the first argument which is
62 received in a register: the current code copies it to either a local slot or
63 to a global reg in the prolog an copies it back to the return register
64 int he basic block, but since neither the regallocator nor the peephole code
65 knows about the prolog code, the first store cannot be optimized away.
66 The gcc code has some example of how to specify register classes in a