gcc/PROJECTS

   1 0. Improved efficiency.
   2
   3 * Parse and output array initializers an element at a time, freeing
   4 storage after each, instead of parsing the whole initializer first and
   5 then outputting.  This would reduce memory usage for large
   6 initializers.
   7
   8 * See if the techniques describe in Oct 1991 SIGPLAN Notices
   9 (Frazer and Hanson) are applicable to GCC.
  10
  11 1. Better optimization.
  12
  13 * Constants in unused inline functions
  14
  15 It would be nice to delay output of string constants so that string
  16 constants mentioned in unused inline functions are never generated.
  17 Perhaps this would also take care of string constants in dead code.
  18
  19 The difficulty is in finding a clean way for the RTL which refers
  20 to the constant (currently, only by an assembler symbol name)
  21 to point to the constant and cause it to be output.
  22
  23 * More cse
  24
  25 The techniques for doing full global cse are described in the red
  26 dragon book, or (a different version) in Frederick Chow's thesis from
  27 Stanford.  It is likely to be slow and use a lot of memory, but it
  28 might be worth offering as an additional option.
  29
  30 It is probably possible to extend cse to a few very frequent cases
  31 without so much expense.
  32
  33 For example, it is not very hard to handle cse through if-then
  34 statements with no else clauses.  Here's how to do it.  On reaching a
  35 label, notice that the label's use-count is 1 and that the last
  36 preceding jump jumps conditionally to this label.  Now you know it
  37 is a simple if-then statement.  Remove from the hash table
  38 all the expressions that were entered since that jump insn
  39 and you can continue with cse.
  40
  41 It is probably not hard to handle cse from the end of a loop
  42 around to the beginning, and a few loops would be greatly sped
  43 up by this.
  44
  45 * Optimize a sequence of if statements whose conditions are exclusive.
  46
  47 It is possible to optimize
  48
  49     if (x == 1) ...;
  50     if (x == 2) ...;
  51     if (x == 3) ...;
  52
  53 into
  54
  55     if (x == 1) ...;
  56     else if (x == 2) ...;
  57     else if (x == 3) ...;
  58
  59 provided that x is not altered by the contents of the if statements.
  60
  61 It's not certain whether this is worth doing.  Perhaps programmers
  62 nearly always write the else's themselves, leaving few opportunities
  63 to improve anything.
  64
  65 * Un-cse.
  66
  67 Perhaps we should have an un-cse step right after cse, which tries to
  68 replace a reg with its value if the value can be substituted for the
  69 reg everywhere, if that looks like an improvement.  Which is if the
  70 reg is used only a few times.  Use rtx_cost to determine if the
  71 change is really an improvement.
  72
  73 * Clean up how cse works.
  74
  75 The scheme is that each value has just one hash entry.  The
  76 first_same_value and next_same_value chains are no longer needed.
  77
  78 For arithmetic, each hash table elt has the following slots:
  79
  80 * Operation.  This is an rtx code.
  81 * Mode.
  82 * Operands 0, 1 and 2.  These point to other hash table elements.
  83
  84 So, if we want to enter (PLUS:SI (REG:SI 30) (CONST_INT 104)), we
  85 first enter (CONST_INT 104) and find the entry that (REG:SI 30) now
  86 points to.  Then we put these elts into operands 0 and 1 of a new elt.
  87 We put PLUS and SI into the new elt.
  88
  89 Registers and mem refs would never be entered into the table as such.
  90 However, the values they contain would be entered.  There would be a
  91 table indexed by regno which points at the hash entry for the value in
  92 that reg.
  93
  94 The hash entry index now plays the role of a qty number.
  95 We still need qty_first_reg, reg_next_eqv, etc. to record which regs
  96 share a particular qty.
  97
  98 When a reg is used whose contents are unknown, we need to create a
  99 hash table entry whose contents say "unknown", as a place holder for
 100 whatever the reg contains.  If that reg is added to something, then
 101 the hash entry for the sum will refer to the "unknown" entry.  Use
 102 UNKNOWN for the rtx code in this entry.  This replaces make_new_qty.
 103
 104 For a constant, a unique hash entry would be made based on the
 105 value of the constant.
 106
 107 What about MEM?  Each time a memory address is referenced, we need a
 108 qty (a hash table elt) to represent what is in it.  (Just as for a
 109 register.)  If this isn't known, create one, just as for a reg whose
 110 contents are unknown.
 111
 112 We need a way to find all mem refs that still contain a certain value.
 113 Do this with a chain of hash elts (for memory addresses) that point to
 114 locations that hold the value.  The hash elt for the value itself should
 115 point to the start of the chain.  It would be good for the hash elt
 116 for an address to point to the hash elt for the contents of that address
 117 (but this ptr can be null if the contents have never been entered).
 118
 119 With this data structure, nothing need ever be invalidated except
 120 the lists of which regs or mems hold a particular value.  It is easy
 121 to see if there is a reg or mem that is equiv to a particular value.
 122 If the value is constant, it is always explicitly constant.
 123
 124 * Support more general tail-recursion among different functions.
 125
 126 This might be possible under certain circumstances, such as when
 127 the argument lists of the functions have the same lengths.
 128 Perhaps it could be done with a special declaration.
 129
 130 You would need to verify in the calling function that it does not
 131 use the addresses of any local variables and does not use setjmp.
 132
 133 * Put short statics vars at low addresses and use short addressing mode?
 134
 135 Useful on the 68000/68020 and perhaps on the 32000 series,
 136 provided one has a linker that works with the feature.
 137 This is said to make a 15% speedup on the 68000.
 138
 139 * Keep global variables in registers.
 140
 141 Here is a scheme for doing this.  A global variable, or a local variable
 142 whose address is taken, can be kept in a register for an entire function
 143 if it does not use non-constant memory addresses and (for globals only)
 144 does not call other functions.  If the entire function does not meet
 145 this criterion, a loop may.
 146
 147 The VAR_DECL for such a variable would have to have two RTL expressions:
 148 the true home in memory, and the pseudo-register used temporarily.
 149 It is necessary to emit insns to copy the memory location into the
 150 pseudo-register at the beginning of the function or loop, and perhaps
 151 back out at the end.  These insns should have REG_EQUIV notes so that,
 152 if the pseudo-register does not get a hard register, it is spilled into
 153 the memory location which exists in any case.
 154
 155 The easiest way to set up these insns is to modify the routine
 156 put_var_into_stack so that it does not apply to the entire function
 157 (sparing any loops which contain nothing dangerous) and to call it at
 158 the end of the function regardless of where in the function the
 159 address of a local variable is taken.  It would be called
 160 unconditionally at the end of the function for all relevant global
 161 variables.
 162
 163 For debugger output, the thing to do is to invent a new binding level
 164 around the appropriate loop and define the variable name as a register
 165 variable with that scope.
 166
 167 * Live-range splitting.
 168
 169 Currently a variable is allocated a hard register either for the full
 170 extent of its use or not at all.  Sometimes it would be good to
 171 allocate a variable a hard register for just part of a function; for
 172 example, through a particular loop where the variable is mostly used,
 173 or outside of a particular loop where the variable is not used.  (The
 174 latter is nice because it might let the variable be in a register most
 175 of the time even though the loop needs all the registers.)
 176
 177 It might not be very hard to do this in global.c when a variable
 178 fails to get a hard register for its entire life span.
 179
 180 The first step is to find a loop in which the variable is live, but
 181 which is not the whole life span or nearly so.  It's probably best to
 182 use a loop in which the variable is heavily used.
 183
 184 Then create a new pseudo-register to represent the variable in that loop.
 185 Substitute this for the old pseudo-register there, and insert move insns
 186 to copy between the two at the loop entry and all exits.  (When several
 187 such moves are inserted at the same place, some new feature should be
 188 added to say that none of those registers conflict merely because of
 189 overlap between the new moves.  And the reload pass should reorder them
 190 so that a store precedes a load, for any given hard register.)
 191
 192 After doing this for all the reasonable candidates, run global-alloc
 193 over again.  With luck, one of the two pseudo-registers will be fit
 194 somewhere.  It may even have a much higher priority due to its reduced
 195 life span.
 196
 197 There will be no room in general for the new pseudo-registers in
 198 basic_block_live_at_start, so there will need to be a second such
 199 matrix exclusively for the new ones.  Various other vectors indexed by
 200 register number will have to be made bigger, or there will have to be
 201 secondary extender vectors just for global-alloc.
 202
 203 A simple new feature could arrange that both pseudo-registers get the
 204 same stack slot if they both fail to get hard registers.
 205
 206 Other compilers split live ranges when they are not connected, or
 207 try to split off pieces `at the edge'.  I think splitting around loops
 208 will provide more speedup.
 209
 210 Creating a fake binding block and a new like-named variable with
 211 shorter life span and different address might succeed in describing
 212 this technique for the debugger.
 213
 214 * Detect dead stores into memory?
 215
 216 A store into memory is dead if it is followed by another store into
 217 the same location; and, in between, there is no reference to anything
 218 that might be that location (including no reference to a variable
 219 address).
 220
 221 * Loop optimization.
 222
 223 Strength reduction and iteration variable elimination could be
 224 smarter.  They should know how to decide which iteration variables are
 225 not worth making explicit because they can be computed as part of an
 226 address calculation.  Based on this information, they should decide
 227 when it is desirable to eliminate one iteration variable and create
 228 another in its place.
 229
 230 It should be possible to compute what the value of an iteration
 231 variable will be at the end of the loop, and eliminate the variable
 232 within the loop by computing that value at the loop end.
 233
 234 When a loop has a simple increment that adds 1,
 235 instead of jumping in after the increment,
 236 decrement the loop count and jump to the increment.
 237 This allows aob insns to be used.
 238
 239 * Using constraints on values.
 240
 241 Many operations could be simplified based on knowledge of the
 242 minimum and maximum possible values of a register at any particular time.
 243 These limits could come from the data types in the tree, via rtl generation,
 244 or they can be deduced from operations that are performed.  For example,
 245 the result of an `and' operation one of whose operands is 7 must be in
 246 the range 0 to 7.  Compare instructions also tell something about the
 247 possible values of the operand, in the code beyond the test.
 248
 249 Value constraints can be used to determine the results of a further
 250 comparison.  They can also indicate that certain `and' operations are
 251 redundant.  Constraints might permit a decrement and branch
 252 instruction that checks zeroness to be used when the user has
 253 specified to exit if negative.
 254
 255 * Smarter reload pass.
 256
 257 The reload pass as currently written can reload values only into registers
 258 that are reserved for reloading.  This means that in order to use a
 259 register for reloading it must spill everything out of that register.
 260
 261 It would be straightforward, though complicated, for reload1.c to keep
 262 track, during its scan, of which hard registers were available at each
 263 point in the function, and use for reloading even registers that were
 264 free only at the point they were needed.  This would avoid much spilling
 265 and make better code.
 266
 267 * Change the type of a variable.
 268
 269 Sometimes a variable is declared as `int', it is assigned only once
 270 from a value of type `char', and then it is used only by comparison
 271 against constants.  On many machines, better code would result if
 272 the variable had type `char'.  If the compiler could detect this
 273 case, it could change the declaration of the variable and change
 274 all the places that use it.
 275
 276 * Better handling for very sparse switches.
 277
 278 There may be cases where it would be better to compile a switch
 279 statement to use a fixed hash table rather than the current
 280 combination of jump tables and binary search.
 281
 282 * Order of subexpressions.
 283
 284 It might be possible to make better code by paying attention
 285 to the order in which to generate code for subexpressions of an expression.
 286
 287 * More code motion.
 288
 289 Consider hoisting common code up past conditional branches or
 290 tablejumps.
 291
 292 * Trace scheduling.
 293
 294 This technique is said to be able to figure out which way a jump
 295 will usually go, and rearrange the code to make that path the
 296 faster one.
 297
 298 * Distributive law.
 299
 300 The C expression *(X + 4 * (Y + C)) compiles better on certain
 301 machines if rewritten as *(X + 4*C + 4*Y) because of known addressing
 302 modes.  It may be tricky to determine when, and for which machines, to
 303 use each alternative.
 304
 305 Some work has been done on this, in combine.c.
 306
 307 * Can optimize by changing if (x) y; else z; into z; if (x) y;
 308 if z and x do not interfere and z has no effects not undone by y.
 309 This is desirable if z is faster than jumping.
 310
 311 * For a two-insn loop on the 68020, such as
 312   foo:  movb    a2@+,a3@+
 313         jne     foo
 314 it is better to insert dbeq d0,foo before the jne.
 315 d0 can be a junk register.  The challenge is to fit this into
 316 a portable framework: when can you detect this situation and
 317 still be able to allocate a junk register?
 318
 319 2. Simpler porting.
 320
 321 Right now, describing the target machine's instructions is done
 322 cleanly, but describing its addressing mode is done with several
 323 ad-hoc macro definitions.  Porting would be much easier if there were
 324 an RTL description for addressing modes like that for instructions.
 325 Tools analogous to genflags and genrecog would generate macros from
 326 this description.
 327
 328 There would be one pattern in the address-description file for each
 329 kind of addressing, and this pattern would have:
 330
 331   * the RTL expression for the address
 332   * C code to verify its validity (since that may depend on
 333     the exact data).
 334   * C code to print the address in assembler language.
 335   * C code to convert the address into a valid one, if it is not valid.
 336     (This would replace LEGITIMIZE_ADDRESS).
 337   * Register constraints for all indeterminates that appear
 338     in the RTL expression.
 339
 340 3. Other languages.
 341
 342 Front ends for Pascal, Fortran, Algol, Cobol, Modula-2 and Ada are
 343 desirable.
 344
 345 Pascal, Modula-2 and Ada require the implementation of functions
 346 within functions.  Some of the mechanisms for this already exist.
 347
 348 4. More extensions.
 349
 350 * Generated unique labels.  Have some way of generating distinct labels
 351 for use in extended asm statements.  I don't know what a good syntax would
 352 be.
 353
 354 * A way of defining a structure containing a union, in which the choice of
 355 union alternative is controlled by a previous structure component.
 356
 357 Here is a possible syntax for this.
 358
 359 struct foo {
 360   enum { INT, DOUBLE } code;
 361   auto union { case INT: int i; case DOUBLE: double d;} value : code;
 362 };
 363
 364 * Allow constructor expressions as lvalues, like this:
 365
 366         (struct foo) {a, b, c} = foo();
 367
 368 This would call foo, which returns a structure, and then store the
 369 several components of the structure into the variables a, b, and c.
 370
 371 5. Generalize the machine model.
 372
 373 * Some new compiler features may be needed to do a good job on machines
 374 where static data needs to be addressed using base registers.
 375
 376 * Some machines have two stacks in different areas of memory, one used
 377 for scalars and another for large objects.  The compiler does not
 378 now have a way to understand this.
 379
 380 6. Useful warnings.
 381
 382 * Warn about statements that are undefined because the order of
 383 evaluation of increment operators makes a big difference.  Here is an
 384 example:
 385
 386     *foo++ = hack (*foo);
 387
 388 7. Better documentation of how GCC works and how to port it.
 389
 390 Here is an outline proposed by Allan Adler.
 391
 392 I.    Overview of this document
 393 II.   The machines on which GCC is implemented
 394     A. Prose description of those characteristics of target machines and
 395        their operating systems which are pertinent to the implementation
 396        of GCC.
 397         i. target machine characteristics
 398         ii. comparison of this system of machine characteristics with
 399             other systems of machine specification currently in use
 400     B. Tables of the characteristics of the target machines on which
 401        GCC is implemented.
 402     C. A priori restrictions on the values of characteristics of target
 403        machines, with special reference to those parts of the source code
 404        which entail those restrictions
 405         i. restrictions on individual characteristics
 406         ii. restrictions involving relations between various characteristics
 407     D. The use of GCC as a cross-compiler
 408         i. cross-compilation to existing machines
 409         ii. cross-compilation to non-existent machines
 410     E. Assumptions which are made regarding the target machine
 411         i.  assumptions regarding the architecture of the target machine
 412         ii. assumptions regarding the operating system of the target machine
 413         iii. assumptions regarding software resident on the target machine
 414         iv. where in the source code these assumptions are in effect made
 415 III.   A systematic approach to writing the files tm.h and xm.h
 416     A. Macros which require special care or skill
 417     B. Examples, with special reference to the underlying reasoning
 418 IV.    A systematic approach to writing the machine description file md
 419     A. Minimal viable sets of insn descriptions
 420     B. Examples, with special reference to the underlying reasoning
 421 V.     Uses of the file aux-output.c
 422 VI.    Specification of what constitutes correct performance of an
 423        implementation of GCC
 424     A. The components of GCC
 425     B. The itinerary of a C program through GCC
 426     C. A system of benchmark programs
 427     D. What your RTL and assembler should look like with these benchmarks
 428     E. Fine tuning for speed and size of compiled code
 429 VII.   A systematic procedure for debugging an implementation of GCC
 430     A. Use of GDB
 431         i. the macros in the file .gdbinit for GCC
 432         ii. obstacles to the use of GDB
 433             a. functions implemented as macros can't be called in GDB
 434     B. Debugging without GDB
 435         i. How to turn off the normal operation of GCC and access specific
 436            parts of GCC
 437     C. Debugging tools
 438     D. Debugging the parser
 439         i. how machine macros and insn definitions affect the parser
 440     E. Debugging the recognizer
 441         i. how machine macros and insn definitions affect the recognizer
 442
 443 ditto for other components
 444
 445 VIII. Data types used by GCC, with special reference to restrictions not
 446       specified in the formal definition of the data type
 447 IX.   References to the literature for the algorithms used in GCC
 448