gcc/PROJECTS

   1 Haifa scheduler (haifa-sched.c, loop.[ch], unroll.[ch], genattrtab.c):
   2 (contact law@cygnus.com before starting any serious haifa work)
   3
   4   * Fix all the formatting problems.  Simple, mindless work.
   5
   6   * Fix/add comments throughout the code.  Many of the comments are from
   7   the old scheduler and are out of date and misleading.  Many new hunks
   8   of code don't have sufficient comments and documentation.  Those which
   9   do have comments need to be rewritten to use complete sentences and
  10   proper formatting.
  11
  12   * Someone needs make one (or more) passes over the scheduler as a whole to
  13   just clean it up.  Try to move the machine dependent bits into the target
  14   files where they belong, avoid re-creating functions where or near
  15   equivalents already exist (ie is_conditional_branch and friends), etc., etc.
  16
  17   * Document the new scheduling options.  Remove those options which are
  18   not really useful (like reverse scheduling for example).  In general
  19   the haifa scheduler adds _way_ too many options.  I'm definitely of the
  20   opinion that gcc already has too many -foptions, and haifa doesn't help
  21   that situation.
  22
  23   * Testing and benchmarking.    We've converted a few ports to using the
  24   Haifa scheduler (hppa, sparc, ppc, alpha).  We need to continue testing
  25   and benchmarking the new scheduler on additional targets.
  26
  27   We need to have some kind of docs for how to best describe a machine to
  28   the haifa scheduler to get good performance.  Some existing ports have
  29   been tuned to deal with the old scheduler -- they may need to be tuned
  30   to generate good schedules with haifa.
  31
  32
  33
  34 Improvements to global cse and partial redundancy elimination:
  35
  36 The current implementation of global cse uses a lazy code motion algorithm
  37 from Muchnick's "Advanced Compiler Design and Implementation".
  38
  39 Longer term we want to convert to an edge based LCM algorithm using the
  40 new structures defined by flow.c.  This allows for better expression
  41 placement and provides edge splitting "for free".
  42
  43 lcm also provides the underlying framework for several additional optimizations
  44 such as shrink wrapping, spill code motion, dead store elimination, and generic
  45 load/store motion (all the other examples are subcases of load/store motion).
  46
  47 It can probably also be used to improve the reg-stack pass of the compiler.
  48
  49 Contact law@cygnus.com if you're interested in working on lazy code motion.
  50
  51 -------------
  52
  53 The old PROJECTS file.  Stuff I know has been done has been deleted.
  54 Stuff in progress has a contact name associated with it.
  55 has been
  56
  57 1. Better optimization.
  58
  59 * Constants in unused inline functions
  60
  61 It would be nice to delay output of string constants so that string
  62 constants mentioned in unused inline functions are never generated.
  63 Perhaps this would also take care of string constants in dead code.
  64
  65 The difficulty is in finding a clean way for the RTL which refers
  66 to the constant (currently, only by an assembler symbol name)
  67 to point to the constant and cause it to be output.
  68
  69 * Optimize a sequence of if statements whose conditions are exclusive.
  70
  71 It is possible to optimize
  72
  73     if (x == 1) ...;
  74     if (x == 2) ...;
  75     if (x == 3) ...;
  76
  77 into
  78
  79     if (x == 1) ...;
  80     else if (x == 2) ...;
  81     else if (x == 3) ...;
  82
  83 provided that x is not altered by the contents of the if statements.
  84
  85 It's not certain whether this is worth doing.  Perhaps programmers
  86 nearly always write the else's themselves, leaving few opportunities
  87 to improve anything.
  88
  89 * Un-cse.
  90
  91 Perhaps we should have an un-cse step right after cse, which tries to
  92 replace a reg with its value if the value can be substituted for the
  93 reg everywhere, if that looks like an improvement.  Which is if the
  94 reg is used only a few times.  Use rtx_cost to determine if the
  95 change is really an improvement.
  96
  97 * Clean up how cse works.
  98
  99 The scheme is that each value has just one hash entry.  The
 100 first_same_value and next_same_value chains are no longer needed.
 101
 102 For arithmetic, each hash table elt has the following slots:
 103
 104 * Operation.  This is an rtx code.
 105 * Mode.
 106 * Operands 0, 1 and 2.  These point to other hash table elements.
 107
 108 So, if we want to enter (PLUS:SI (REG:SI 30) (CONST_INT 104)), we
 109 first enter (CONST_INT 104) and find the entry that (REG:SI 30) now
 110 points to.  Then we put these elts into operands 0 and 1 of a new elt.
 111 We put PLUS and SI into the new elt.
 112
 113 Registers and mem refs would never be entered into the table as such.
 114 However, the values they contain would be entered.  There would be a
 115 table indexed by regno which points at the hash entry for the value in
 116 that reg.
 117
 118 The hash entry index now plays the role of a qty number.
 119 We still need qty_first_reg, reg_next_eqv, etc. to record which regs
 120 share a particular qty.
 121
 122 When a reg is used whose contents are unknown, we need to create a
 123 hash table entry whose contents say "unknown", as a place holder for
 124 whatever the reg contains.  If that reg is added to something, then
 125 the hash entry for the sum will refer to the "unknown" entry.  Use
 126 UNKNOWN for the rtx code in this entry.  This replaces make_new_qty.
 127
 128 For a constant, a unique hash entry would be made based on the
 129 value of the constant.
 130
 131 What about MEM?  Each time a memory address is referenced, we need a
 132 qty (a hash table elt) to represent what is in it.  (Just as for a
 133 register.)  If this isn't known, create one, just as for a reg whose
 134 contents are unknown.
 135
 136 We need a way to find all mem refs that still contain a certain value.
 137 Do this with a chain of hash elts (for memory addresses) that point to
 138 locations that hold the value.  The hash elt for the value itself should
 139 point to the start of the chain.  It would be good for the hash elt
 140 for an address to point to the hash elt for the contents of that address
 141 (but this ptr can be null if the contents have never been entered).
 142
 143 With this data structure, nothing need ever be invalidated except
 144 the lists of which regs or mems hold a particular value.  It is easy
 145 to see if there is a reg or mem that is equiv to a particular value.
 146 If the value is constant, it is always explicitly constant.
 147
 148 * Support more general tail-recursion among different functions.
 149
 150 This might be possible under certain circumstances, such as when
 151 the argument lists of the functions have the same lengths.
 152 Perhaps it could be done with a special declaration.
 153
 154 You would need to verify in the calling function that it does not
 155 use the addresses of any local variables and does not use setjmp.
 156
 157 * Put short statics vars at low addresses and use short addressing mode?
 158
 159 Useful on the 68000/68020 and perhaps on the 32000 series,
 160 provided one has a linker that works with the feature.
 161 This is said to make a 15% speedup on the 68000.
 162
 163 * Keep global variables in registers.
 164
 165 Here is a scheme for doing this.  A global variable, or a local variable
 166 whose address is taken, can be kept in a register for an entire function
 167 if it does not use non-constant memory addresses and (for globals only)
 168 does not call other functions.  If the entire function does not meet
 169 this criterion, a loop may.
 170
 171 The VAR_DECL for such a variable would have to have two RTL expressions:
 172 the true home in memory, and the pseudo-register used temporarily.
 173 It is necessary to emit insns to copy the memory location into the
 174 pseudo-register at the beginning of the function or loop, and perhaps
 175 back out at the end.  These insns should have REG_EQUIV notes so that,
 176 if the pseudo-register does not get a hard register, it is spilled into
 177 the memory location which exists in any case.
 178
 179 The easiest way to set up these insns is to modify the routine
 180 put_var_into_stack so that it does not apply to the entire function
 181 (sparing any loops which contain nothing dangerous) and to call it at
 182 the end of the function regardless of where in the function the
 183 address of a local variable is taken.  It would be called
 184 unconditionally at the end of the function for all relevant global
 185 variables.
 186
 187 For debugger output, the thing to do is to invent a new binding level
 188 around the appropriate loop and define the variable name as a register
 189 variable with that scope.
 190
 191 * Live-range splitting.
 192
 193 Currently a variable is allocated a hard register either for the full
 194 extent of its use or not at all.  Sometimes it would be good to
 195 allocate a variable a hard register for just part of a function; for
 196 example, through a particular loop where the variable is mostly used,
 197 or outside of a particular loop where the variable is not used.  (The
 198 latter is nice because it might let the variable be in a register most
 199 of the time even though the loop needs all the registers.)
 200
 201 Contact meissner@cygnus.com before starting any work on live range
 202 splitting.
 203
 204 * Detect dead stores into memory?
 205
 206 A store into memory is dead if it is followed by another store into
 207 the same location; and, in between, there is no reference to anything
 208 that might be that location (including no reference to a variable
 209 address).
 210
 211 This can be modeled as a partial redundancy elimination/lazy code motion
 212 problem.  Contact law@cygnus.com before working on dead store elimination
 213 optimizations.
 214
 215 * Loop optimization.
 216
 217 Strength reduction and iteration variable elimination could be
 218 smarter.  They should know how to decide which iteration variables are
 219 not worth making explicit because they can be computed as part of an
 220 address calculation.  Based on this information, they should decide
 221 when it is desirable to eliminate one iteration variable and create
 222 another in its place.
 223
 224 It should be possible to compute what the value of an iteration
 225 variable will be at the end of the loop, and eliminate the variable
 226 within the loop by computing that value at the loop end.
 227
 228 When a loop has a simple increment that adds 1,
 229 instead of jumping in after the increment,
 230 decrement the loop count and jump to the increment.
 231 This allows aob insns to be used.
 232
 233 * Using constraints on values.
 234
 235 Many operations could be simplified based on knowledge of the
 236 minimum and maximum possible values of a register at any particular time.
 237 These limits could come from the data types in the tree, via rtl generation,
 238 or they can be deduced from operations that are performed.  For example,
 239 the result of an `and' operation one of whose operands is 7 must be in
 240 the range 0 to 7.  Compare instructions also tell something about the
 241 possible values of the operand, in the code beyond the test.
 242
 243 Value constraints can be used to determine the results of a further
 244 comparison.  They can also indicate that certain `and' operations are
 245 redundant.  Constraints might permit a decrement and branch
 246 instruction that checks zeroness to be used when the user has
 247 specified to exit if negative.
 248
 249 * Change the type of a variable.
 250
 251 Sometimes a variable is declared as `int', it is assigned only once
 252 from a value of type `char', and then it is used only by comparison
 253 against constants.  On many machines, better code would result if
 254 the variable had type `char'.  If the compiler could detect this
 255 case, it could change the declaration of the variable and change
 256 all the places that use it.
 257
 258 * Better handling for very sparse switches.
 259
 260 There may be cases where it would be better to compile a switch
 261 statement to use a fixed hash table rather than the current
 262 combination of jump tables and binary search.
 263
 264 * Order of subexpressions.
 265
 266 It might be possible to make better code by paying attention
 267 to the order in which to generate code for subexpressions of an expression.
 268
 269 * More code motion.
 270
 271 Consider hoisting common code up past conditional branches or tablejumps.
 272
 273 Contact law@cygnus.com before working on code hoisting.
 274
 275 * Trace scheduling.
 276
 277 This technique is said to be able to figure out which way a jump
 278 will usually go, and rearrange the code to make that path the
 279 faster one.
 280
 281 * Distributive law.
 282
 283 The C expression *(X + 4 * (Y + C)) compiles better on certain
 284 machines if rewritten as *(X + 4*C + 4*Y) because of known addressing
 285 modes.  It may be tricky to determine when, and for which machines, to
 286 use each alternative.
 287
 288 Some work has been done on this, in combine.c.
 289
 290 * Can optimize by changing if (x) y; else z; into z; if (x) y;
 291 if z and x do not interfere and z has no effects not undone by y.
 292 This is desirable if z is faster than jumping.
 293
 294 * For a two-insn loop on the 68020, such as
 295   foo:  movb    a2@+,a3@+
 296         jne     foo
 297 it is better to insert dbeq d0,foo before the jne.
 298 d0 can be a junk register.  The challenge is to fit this into
 299 a portable framework: when can you detect this situation and
 300 still be able to allocate a junk register?
 301
 302 2. Simpler porting.
 303
 304 Right now, describing the target machine's instructions is done
 305 cleanly, but describing its addressing mode is done with several
 306 ad-hoc macro definitions.  Porting would be much easier if there were
 307 an RTL description for addressing modes like that for instructions.
 308 Tools analogous to genflags and genrecog would generate macros from
 309 this description.
 310
 311 There would be one pattern in the address-description file for each
 312 kind of addressing, and this pattern would have:
 313
 314   * the RTL expression for the address
 315   * C code to verify its validity (since that may depend on
 316     the exact data).
 317   * C code to print the address in assembler language.
 318   * C code to convert the address into a valid one, if it is not valid.
 319     (This would replace LEGITIMIZE_ADDRESS).
 320   * Register constraints for all indeterminates that appear
 321     in the RTL expression.
 322
 323 3. Other languages.
 324
 325 Front ends for Pascal, Fortran, Algol, Cobol, Modula-2 and Ada are
 326 desirable.
 327
 328 Pascal, Modula-2 and Ada require the implementation of functions
 329 within functions.  Some of the mechanisms for this already exist.
 330
 331 4. More extensions.
 332
 333 * Generated unique labels.  Have some way of generating distinct labels
 334 for use in extended asm statements.  I don't know what a good syntax would
 335 be.
 336
 337 * A way of defining a structure containing a union, in which the choice of
 338 union alternative is controlled by a previous structure component.
 339
 340 Here is a possible syntax for this.
 341
 342 struct foo {
 343   enum { INT, DOUBLE } code;
 344   auto union { case INT: int i; case DOUBLE: double d;} value : code;
 345 };
 346
 347 * Allow constructor expressions as lvalues, like this:
 348
 349         (struct foo) {a, b, c} = foo();
 350
 351 This would call foo, which returns a structure, and then store the
 352 several components of the structure into the variables a, b, and c.
 353
 354 5. Generalize the machine model.
 355
 356 * Some new compiler features may be needed to do a good job on machines
 357 where static data needs to be addressed using base registers.
 358
 359 * Some machines have two stacks in different areas of memory, one used
 360 for scalars and another for large objects.  The compiler does not
 361 now have a way to understand this.
 362
 363 6. Useful warnings.
 364
 365 * Warn about statements that are undefined because the order of
 366 evaluation of increment operators makes a big difference.  Here is an
 367 example:
 368
 369     *foo++ = hack (*foo);
 370
 371 7. Better documentation of how GCC works and how to port it.
 372
 373 Here is an outline proposed by Allan Adler.
 374
 375 I.    Overview of this document
 376 II.   The machines on which GCC is implemented
 377     A. Prose description of those characteristics of target machines and
 378        their operating systems which are pertinent to the implementation
 379        of GCC.
 380         i. target machine characteristics
 381         ii. comparison of this system of machine characteristics with
 382             other systems of machine specification currently in use
 383     B. Tables of the characteristics of the target machines on which
 384        GCC is implemented.
 385     C. A priori restrictions on the values of characteristics of target
 386        machines, with special reference to those parts of the source code
 387        which entail those restrictions
 388         i. restrictions on individual characteristics
 389         ii. restrictions involving relations between various characteristics
 390     D. The use of GCC as a cross-compiler
 391         i. cross-compilation to existing machines
 392         ii. cross-compilation to non-existent machines
 393     E. Assumptions which are made regarding the target machine
 394         i.  assumptions regarding the architecture of the target machine
 395         ii. assumptions regarding the operating system of the target machine
 396         iii. assumptions regarding software resident on the target machine
 397         iv. where in the source code these assumptions are in effect made
 398 III.   A systematic approach to writing the files tm.h and xm.h
 399     A. Macros which require special care or skill
 400     B. Examples, with special reference to the underlying reasoning
 401 IV.    A systematic approach to writing the machine description file md
 402     A. Minimal viable sets of insn descriptions
 403     B. Examples, with special reference to the underlying reasoning
 404 V.     Uses of the file aux-output.c
 405 VI.    Specification of what constitutes correct performance of an
 406        implementation of GCC
 407     A. The components of GCC
 408     B. The itinerary of a C program through GCC
 409     C. A system of benchmark programs
 410     D. What your RTL and assembler should look like with these benchmarks
 411     E. Fine tuning for speed and size of compiled code
 412 VII.   A systematic procedure for debugging an implementation of GCC
 413     A. Use of GDB
 414         i. the macros in the file .gdbinit for GCC
 415         ii. obstacles to the use of GDB
 416             a. functions implemented as macros can't be called in GDB
 417     B. Debugging without GDB
 418         i. How to turn off the normal operation of GCC and access specific
 419            parts of GCC
 420     C. Debugging tools
 421     D. Debugging the parser
 422         i. how machine macros and insn definitions affect the parser
 423     E. Debugging the recognizer
 424         i. how machine macros and insn definitions affect the recognizer
 425
 426 ditto for other components
 427
 428 VIII. Data types used by GCC, with special reference to restrictions not
 429       specified in the formal definition of the data type
 430 IX.   References to the literature for the algorithms used in GCC
 431