gcc/doc/loop.texi

   1 @c Copyright (C) 2006-2015 Free Software Foundation, Inc.
   2 @c Free Software Foundation, Inc.
   3 @c This is part of the GCC manual.
   4 @c For copying conditions, see the file gcc.texi.
   5
   6 @c ---------------------------------------------------------------------
   7 @c Loop Representation
   8 @c ---------------------------------------------------------------------
   9
  10 @node Loop Analysis and Representation
  11 @chapter Analysis and Representation of Loops
  12
  13 GCC provides extensive infrastructure for working with natural loops, i.e.,
  14 strongly connected components of the CFG with only one entry block.  This
  15 chapter describes the representation of loops in GCC, both on GIMPLE and on
  16 RTL, as well as the interfaces to loop-related analyses (induction
  17 variable analysis and number of iterations analysis).
  18
  19 @menu
  20 * Loop representation::         Representation and analysis of loops.
  21 * Loop querying::               Getting information about loops.
  22 * Loop manipulation::           Loop manipulation functions.
  23 * LCSSA::                       Loop-closed SSA form.
  24 * Scalar evolutions::           Induction variables on GIMPLE.
  25 * loop-iv::                     Induction variables on RTL.
  26 * Number of iterations::        Number of iterations analysis.
  27 * Dependency analysis::         Data dependency analysis.
  28 * Omega::                       A solver for linear programming problems.
  29 @end menu
  30
  31 @node Loop representation
  32 @section Loop representation
  33 @cindex Loop representation
  34 @cindex Loop analysis
  35
  36 This chapter describes the representation of loops in GCC, and functions
  37 that can be used to build, modify and analyze this representation.  Most
  38 of the interfaces and data structures are declared in @file{cfgloop.h}.
  39 Loop structures are analyzed, and this information is discarded or updated,
  40 at the discretion of individual passes.  Still, most of the generic
  41 CFG manipulation routines are aware of loop structures and try to
  42 keep them up-to-date.  By this means an increasing part of the
  43 compilation pipeline is set up to maintain loop structure across
  44 passes, which allows meta information to be attached to individual loops
  45 for consumption by later passes.
  46
  47 In general, a natural loop has one entry block (header) and possibly
  48 several back edges (latches) leading to the header from the inside of
  49 the loop.  Loops with several latches may appear if several loops share
  50 a single header, or if there is a branching in the middle of the loop.
  51 The representation of loops in GCC however allows only loops with a
  52 single latch.  During loop analysis, headers of such loops are split and
  53 forwarder blocks are created in order to disambiguate their structures.
  54 A heuristic based on profile information and the structure of the induction
  55 variables in the loops is used to determine whether the latches
  56 correspond to sub-loops or to control flow in a single loop.  This means
  57 that the analysis sometimes changes the CFG, and if you run it in the
  58 middle of an optimization pass, you must be able to deal with the new
  59 blocks.  You may avoid CFG changes by passing the
  60 @code{LOOPS_MAY_HAVE_MULTIPLE_LATCHES} flag to the loop discovery.
  61 Note, however, that most other loop manipulation functions will not work
  62 correctly for loops with multiple latch edges (the functions that only
  63 query membership of blocks to loops and subloop relationships, or
  64 enumerate and test loop exits, can be expected to work).
  65
  66 The body of a loop is the set of blocks that are dominated by its header,
  67 and reachable from its latch against the direction of edges in CFG@.  The
  68 loops are organized in a containment hierarchy (tree) such that all the
  69 loops immediately contained inside loop L are the children of L in the
  70 tree.  This tree is represented by the @code{struct loops} structure.
  71 The root of this tree is a fake loop that contains all blocks in the
  72 function.  Each of the loops is represented in a @code{struct loop}
  73 structure.  Each loop is assigned an index (@code{num} field of the
  74 @code{struct loop} structure), and the pointer to the loop is stored in
  75 the corresponding field of the @code{larray} vector in the loops
  76 structure.  The indices do not have to be contiguous; there may be
  77 empty (@code{NULL}) entries in the @code{larray} created by deleting
  78 loops.  Also, there is no guarantee on the relative order of a loop
  79 and its subloops in the numbering.  The index of a loop never changes.
  80
  81 The entries of the @code{larray} field should not be accessed directly.
  82 The function @code{get_loop} returns the loop description for a loop with
  83 the given index.  The @code{number_of_loops} function returns the number of
  84 loops in the function.  To traverse all loops, use the @code{FOR_EACH_LOOP}
  85 macro.  The @code{flags} argument of the macro is used to determine
  86 the direction of traversal and the set of loops visited.  Each loop is
  87 guaranteed to be visited exactly once, regardless of the changes to the
  88 loop tree, and the loops may be removed during the traversal.  The newly
  89 created loops are never traversed; if they need to be visited, this
  90 must be done separately after their creation.  The @code{FOR_EACH_LOOP}
  91 macro allocates temporary variables.  If the @code{FOR_EACH_LOOP} loop
  92 were ended using @code{break} or @code{goto}, they would not be released;
  93 the @code{FOR_EACH_LOOP_BREAK} macro must be used instead.
  94
  95 Each basic block contains the reference to the innermost loop it belongs
  96 to (@code{loop_father}).  For this reason, it is only possible to have
  97 one @code{struct loops} structure initialized at the same time for each
  98 CFG@.  The global variable @code{current_loops} contains the
  99 @code{struct loops} structure.  Many of the loop manipulation functions
 100 assume that dominance information is up-to-date.
 101
 102 The loops are analyzed through the @code{loop_optimizer_init} function.  The
 103 argument of this function is a set of flags represented in an integer
 104 bitmask.  These flags specify what other properties of the loop
 105 structures should be calculated/enforced and preserved later:
 106
 107 @itemize
 108 @item @code{LOOPS_MAY_HAVE_MULTIPLE_LATCHES}: If this flag is set, no
 109 changes to the CFG will be performed in the loop analysis; in particular,
 110 loops with multiple latch edges will not be disambiguated.  If a loop
 111 has multiple latches, its latch block is set to @code{NULL}.  Most of
 112 the loop manipulation functions will not work for loops in this shape.
 113 No other flags that require CFG changes can be passed to
 114 @code{loop_optimizer_init}.
 115 @item @code{LOOPS_HAVE_PREHEADERS}: Forwarder blocks are created in such
 116 a way that each loop has only one entry edge, and additionally, the
 117 source block of this entry edge has only one successor.  This creates a
 118 natural place where the code can be moved out of the loop, and ensures
 119 that the entry edge of the loop leads from its immediate super-loop.
 120 @item @code{LOOPS_HAVE_SIMPLE_LATCHES}: Forwarder blocks are created to
 121 force the latch block of each loop to have only one successor.  This
 122 ensures that the latch of the loop does not belong to any of its
 123 sub-loops, and makes manipulation with the loops significantly easier.
 124 Most of the loop manipulation functions assume that the loops are in
 125 this shape.  Note that with this flag, the ``normal'' loop without any
 126 control flow inside and with one exit consists of two basic blocks.
 127 @item @code{LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS}: Basic blocks and
 128 edges in the strongly connected components that are not natural loops
 129 (have more than one entry block) are marked with
 130 @code{BB_IRREDUCIBLE_LOOP} and @code{EDGE_IRREDUCIBLE_LOOP} flags.  The
 131 flag is not set for blocks and edges that belong to natural loops that
 132 are in such an irreducible region (but it is set for the entry and exit
 133 edges of such a loop, if they lead to/from this region).
 134 @item @code{LOOPS_HAVE_RECORDED_EXITS}: The lists of exits are recorded
 135 and updated for each loop.  This makes some functions (e.g.,
 136 @code{get_loop_exit_edges}) more efficient.  Some functions (e.g.,
 137 @code{single_exit}) can be used only if the lists of exits are
 138 recorded.
 139 @end itemize
 140
 141 These properties may also be computed/enforced later, using functions
 142 @code{create_preheaders}, @code{force_single_succ_latches},
 143 @code{mark_irreducible_loops} and @code{record_loop_exits}.
 144 The properties can be queried using @code{loops_state_satisfies_p}.
 145
 146 The memory occupied by the loops structures should be freed with the
 147 @code{loop_optimizer_finalize} function.  When loop structures are
 148 set up to be preserved across passes, this function reduces the
 149 information to be kept up-to-date to a minimum (only
 150 @code{LOOPS_MAY_HAVE_MULTIPLE_LATCHES} set).
 151
 152 The CFG manipulation functions in general do not update loop structures.
 153 Specialized versions that additionally do so are provided for the most
 154 common tasks.  On GIMPLE, the @code{cleanup_tree_cfg_loop} function can be
 155 used to clean up the CFG while updating the loops structures if
 156 @code{current_loops} is set.
 157
 158 At the moment loop structure is preserved from the start of GIMPLE
 159 loop optimizations until the end of RTL loop optimizations.  During
 160 this time a loop can be tracked by its @code{struct loop} and number.
 161
 162 @node Loop querying
 163 @section Loop querying
 164 @cindex Loop querying
 165
 166 The functions to query the information about loops are declared in
 167 @file{cfgloop.h}.  Some of the information can be taken directly from
 168 the structures.  The @code{loop_father} field of each basic block contains
 169 the innermost loop to which the block belongs.  The most useful fields of
 170 the loop structure (that are kept up-to-date at all times) are:
 171
 172 @itemize
 173 @item @code{header}, @code{latch}: Header and latch basic blocks of the
 174 loop.
 175 @item @code{num_nodes}: Number of basic blocks in the loop (including
 176 the basic blocks of the sub-loops).
 177 @item @code{depth}: The depth of the loop in the loops tree, i.e., the
 178 number of super-loops of the loop.
 179 @item @code{outer}, @code{inner}, @code{next}: The super-loop, the first
 180 sub-loop, and the sibling of the loop in the loops tree.
 181 @end itemize
 182
 183 There are other fields in the loop structures, many of them used only by
 184 some of the passes, or not updated during CFG changes; in general, they
 185 should not be accessed directly.
 186
 187 The most important functions to query loop structures are:
 188
 189 @itemize
 190 @item @code{flow_loops_dump}: Dumps the information about loops to a
 191 file.
 192 @item @code{verify_loop_structure}: Checks consistency of the loop
 193 structures.
 194 @item @code{loop_latch_edge}: Returns the latch edge of a loop.
 195 @item @code{loop_preheader_edge}: If loops have preheaders, returns
 196 the preheader edge of a loop.
 197 @item @code{flow_loop_nested_p}: Tests whether loop is a sub-loop of
 198 another loop.
 199 @item @code{flow_bb_inside_loop_p}: Tests whether a basic block belongs
 200 to a loop (including its sub-loops).
 201 @item @code{find_common_loop}: Finds the common super-loop of two loops.
 202 @item @code{superloop_at_depth}: Returns the super-loop of a loop with
 203 the given depth.
 204 @item @code{tree_num_loop_insns}, @code{num_loop_insns}: Estimates the
 205 number of insns in the loop, on GIMPLE and on RTL.
 206 @item @code{loop_exit_edge_p}: Tests whether edge is an exit from a
 207 loop.
 208 @item @code{mark_loop_exit_edges}: Marks all exit edges of all loops
 209 with @code{EDGE_LOOP_EXIT} flag.
 210 @item @code{get_loop_body}, @code{get_loop_body_in_dom_order},
 211 @code{get_loop_body_in_bfs_order}: Enumerates the basic blocks in the
 212 loop in depth-first search order in the reversed CFG, ordered by dominance
 213 relation, and breadth-first search order, respectively.
 214 @item @code{single_exit}: Returns the single exit edge of the loop, or
 215 @code{NULL} if the loop has more than one exit.  You can only use this
 216 function if the @code{LOOPS_HAVE_RECORDED_EXITS} property is used.
 217 @item @code{get_loop_exit_edges}: Enumerates the exit edges of a loop.
 218 @item @code{just_once_each_iteration_p}: Returns true if the basic block
 219 is executed exactly once during each iteration of a loop (that is, it
 220 does not belong to a sub-loop, and it dominates the latch of the loop).
 221 @end itemize
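
The tree fields listed earlier in this section (@code{outer}, @code{inner}
and @code{next}) are already enough to answer simple nesting queries.  The
following is a minimal, self-contained C sketch of that idea.  The type
@code{struct toy_loop} and all @code{toy_*} helpers are hypothetical names
invented for this illustration; they are not the declarations or the
implementations found in @file{cfgloop.h}.  In particular,
@code{toy_loop_contained_p} treats a loop as contained in itself, which is
a choice made by this sketch only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the tree-related fields of a loop description.
   Hypothetical type, used only in this sketch.  */
struct toy_loop
{
  int num;                 /* Index of the loop.  */
  struct toy_loop *outer;  /* Super-loop, NULL for the root.  */
  struct toy_loop *inner;  /* First sub-loop.  */
  struct toy_loop *next;   /* Next sibling in the tree.  */
};

/* Link CHILD as the first sub-loop of PARENT.  */
static void
toy_add_child (struct toy_loop *parent, struct toy_loop *child)
{
  child->outer = parent;
  child->next = parent->inner;
  parent->inner = child;
}

/* Depth of LOOP, i.e. the number of its super-loops (0 for the root).  */
static unsigned
toy_loop_depth (const struct toy_loop *loop)
{
  unsigned depth = 0;
  for (loop = loop->outer; loop != NULL; loop = loop->outer)
    depth++;
  return depth;
}

/* True if LOOP is OUTER itself or is nested (at any depth) in OUTER.  */
static bool
toy_loop_contained_p (const struct toy_loop *outer,
                      const struct toy_loop *loop)
{
  for (; loop != NULL; loop = loop->outer)
    if (loop == outer)
      return true;
  return false;
}

/* Innermost loop containing both A and B, or NULL if they do not
   belong to the same tree.  */
static const struct toy_loop *
toy_find_common_loop (const struct toy_loop *a, const struct toy_loop *b)
{
  while (a != NULL && !toy_loop_contained_p (a, b))
    a = a->outer;
  return a;
}
```

With a root that has two children and one grandchild, the common loop of the
grandchild and its uncle is the root, and the common loop of the grandchild
and its parent is the parent.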
 222
 223 @node Loop manipulation
 224 @section Loop manipulation
 225 @cindex Loop manipulation
 226
 227 The loops tree can be manipulated using the following functions:
 228
 229 @itemize
 230 @item @code{flow_loop_tree_node_add}: Adds a node to the tree.
 231 @item @code{flow_loop_tree_node_remove}: Removes a node from the tree.
 232 @item @code{add_bb_to_loop}: Adds a basic block to a loop.
 233 @item @code{remove_bb_from_loops}: Removes a basic block from loops.
 234 @end itemize
 235
 236 Most low-level CFG functions update loops automatically.  The following
 237 functions handle some more complicated cases of CFG manipulations:
 238
 239 @itemize
 240 @item @code{remove_path}: Removes an edge and all blocks it dominates.
 241 @item @code{split_loop_exit_edge}: Splits exit edge of the loop,
 242 ensuring that PHI node arguments remain in the loop (this ensures that
 243 loop-closed SSA form is preserved).  Only useful on GIMPLE.
 244 @end itemize
 245
 246 Finally, there are some higher-level loop transformations implemented.
 247 While some of them are written so that they should work on non-innermost
 248 loops, they are mostly untested in that case, and at the moment, they
 249 are only reliable for the innermost loops:
 250
 251 @itemize
 252 @item @code{create_iv}: Creates a new induction variable.  Only works on
 253 GIMPLE@.  @code{standard_iv_increment_position} can be used to find a
 254 suitable place for the iv increment.
 255 @item @code{duplicate_loop_to_header_edge},
 256 @code{tree_duplicate_loop_to_header_edge}: These functions (on RTL and
 257 on GIMPLE) duplicate the body of the loop a prescribed number of times on
 258 one of the edges entering the loop header, thus performing either loop
 259 unrolling or loop peeling.  @code{can_duplicate_loop_p}
 260 (@code{can_unroll_loop_p} on GIMPLE) must be true for the duplicated
 261 loop.
 262 @item @code{loop_version}, @code{tree_ssa_loop_version}: These functions
 263 create a copy of a loop, and a branch before the two copies that selects one of
 264 them depending on the prescribed condition.  This is useful for
 265 optimizations that need to verify some assumptions at run time (one of
 266 the copies of the loop is usually left unchanged, while the other one is
 267 transformed in some way).
 268 @item @code{tree_unroll_loop}: Unrolls the loop, including peeling the
 269 extra iterations to make the number of iterations divisible by unroll
 270 factor, updating the exit condition, and removing the exits that now
 271 cannot be taken.  Works only on GIMPLE.
 272 @end itemize
 273
 274 @node LCSSA
 275 @section Loop-closed SSA form
 276 @cindex LCSSA
 277 @cindex Loop-closed SSA form
 278
 279 Throughout the loop optimizations on tree level, one extra condition is
 280 enforced on the SSA form:  No SSA name is used outside of the loop in
 281 which it is defined.  The SSA form satisfying this condition is called
 282 ``loop-closed SSA form'' -- LCSSA@.  To enforce LCSSA, PHI nodes must be
 283 created at the exits of the loops for the SSA names that are used
 284 outside of them.  Only the real operands (not virtual SSA names) are
 285 held in LCSSA, in order to save memory.
 286
 287 There are various benefits of LCSSA:
 288
 289 @itemize
 290 @item Many optimizations (value range analysis, final value
 291 replacement) are interested in the values that are defined in the loop
 292 and used outside of it, i.e., exactly those for which we create new PHI
 293 nodes.
 294 @item In induction variable analysis, it is not necessary to specify the
 295 loop in which the analysis should be performed -- the scalar evolution
 296 analysis always returns the results with respect to the loop in which the
 297 SSA name is defined.
 298 @item It makes updating of SSA form during loop transformations simpler.
 299 Without LCSSA, operations like loop unrolling may force creation of PHI
 300 nodes arbitrarily far from the loop, while in LCSSA, the SSA form can be
 301 updated locally.  However, since we only keep real operands in LCSSA, we
 302 cannot use this advantage (we could have local updating of real
 303 operands, but it is not much more efficient than to use generic SSA form
 304 updating for it as well; the amount of changes to SSA is the same).
 305 @end itemize
 306
 307 However, it also means LCSSA must be updated.  This is usually
 308 straightforward, unless you create a new value in loop and use it
 309 outside, or unless you manipulate loop exit edges (functions are
 310 provided to make these manipulations simple).
 311 @code{rewrite_into_loop_closed_ssa} is used to rewrite SSA form to
 312 LCSSA, and @code{verify_loop_closed_ssa} to check that the invariant of
 313 LCSSA is preserved.
 314
 315 @node Scalar evolutions
 316 @section Scalar evolutions
 317 @cindex Scalar evolutions
 318 @cindex IV analysis on GIMPLE
 319
 320 Scalar evolutions (SCEV) are used to represent results of induction
 321 variable analysis on GIMPLE@.  They enable us to represent variables with
 322 complicated behavior in a simple and consistent way (we only use it to
 323 express values of polynomial induction variables, but it is possible to
 324 extend it).  The interfaces to SCEV analysis are declared in
 325 @file{tree-scalar-evolution.h}.  To use scalar evolutions analysis,
 326 @code{scev_initialize} must be used.  To stop using SCEV,
 327 @code{scev_finalize} should be used.  SCEV analysis caches results in
 328 order to save time and memory.  This cache however is made invalid by
 329 most of the loop transformations, including removal of code.  If such a
 330 transformation is performed, @code{scev_reset} must be called to clean
 331 the caches.
 332
 333 Given an SSA name, its behavior in loops can be analyzed using the
 334 @code{analyze_scalar_evolution} function.  The returned SCEV however
 335 does not have to be fully analyzed and it may contain references to
 336 other SSA names defined in the loop.  To resolve these (potentially
 337 recursive) references, @code{instantiate_parameters} or
 338 @code{resolve_mixers} functions must be used.
 339 @code{instantiate_parameters} is useful when you use the results of SCEV
 340 only for some analysis, and when you work with a whole nest of loops at
 341 once.  It will try replacing all SSA names by their SCEV in all loops,
 342 including the super-loops of the current loop, thus providing
 343 complete information about the behavior of the variable in the loop nest.
 344 @code{resolve_mixers} is useful if you work with only one loop at a
 345 time, and if you possibly need to create code based on the value of the
 346 induction variable.  It will only resolve the SSA names defined in the
 347 current loop, leaving the SSA names defined outside unchanged, even if
 348 their evolution in the outer loops is known.
 349
 350 The SCEV is a normal tree expression, except for the fact that it may
 351 contain several special tree nodes.  One of them is
 352 @code{SCEV_NOT_KNOWN}, used for SSA names whose value cannot be
 353 expressed.  The other one is @code{POLYNOMIAL_CHREC}.  Polynomial chrec
 354 has three arguments -- base, step and loop (both base and step may
 355 contain further polynomial chrecs).  Type of the expression and of base
 356 and step must be the same.  A variable has evolution
 357 @code{POLYNOMIAL_CHREC(base, step, loop)} if it is (in the specified
 358 loop) equivalent to @code{x_1} in the following example
 359
 360 @smallexample
 361 while (@dots{})
 362   @{
 363     x_1 = phi (base, x_2);
 364     x_2 = x_1 + step;
 365   @}
 366 @end smallexample
 367
 368 Note that this includes the language restrictions on the operations.
 369 For example, if we compile C code and @code{x} has signed type, then the
 370 overflow in addition would cause undefined behavior, and we may assume
 371 that this does not happen.  Hence, the value with this SCEV cannot
 372 overflow (which restricts the number of iterations of such a loop).
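
To make the meaning of a polynomial chrec more concrete, the following
self-contained C sketch replays the loop above and compares the result with
a closed form.  It is an illustration only: the @code{toy_*} functions are
hypothetical, they work on plain @code{long} values rather than on trees,
and they assume that no arithmetic overflow occurs.  The second pair of
functions covers the case where the step is itself an affine chrec in the
same loop.

```c
#include <assert.h>

/* Value of x_1 at the start of iteration N (counting from zero) for the
   evolution {base, +, step}, obtained by replaying the loop.  */
static long
toy_affine_by_simulation (long base, long step, unsigned n)
{
  long x_1 = base;              /* x_1 = phi (base, x_2) on loop entry.  */
  for (unsigned i = 0; i < n; i++)
    x_1 = x_1 + step;           /* x_2 = x_1 + step, fed back into x_1.  */
  return x_1;
}

/* The same value in closed form: base + n * step.  */
static long
toy_affine_closed_form (long base, long step, unsigned n)
{
  return base + (long) n * step;
}

/* Evolution whose step is itself {step_base, +, step_step} in the same
   loop, obtained by replaying both recurrences.  */
static long
toy_quadratic_by_simulation (long base, long step_base, long step_step,
                             unsigned n)
{
  long x = base;
  long step = step_base;
  for (unsigned i = 0; i < n; i++)
    {
      x += step;
      step += step_step;
    }
  return x;
}

/* Closed form: base + n * step_base + step_step * n * (n - 1) / 2.  */
static long
toy_quadratic_closed_form (long base, long step_base, long step_step,
                           unsigned n)
{
  long k = (long) n;
  return base + k * step_base + step_step * (k * (k - 1) / 2);
}
```

For instance, with base 3, an initial step of 2 and a step increment of 5,
both variants give 41 at the start of iteration 4.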
 373
 374 In many cases, one wants to restrict the attention just to affine
 375 induction variables.  The extra expressive power of SCEV is then
 376 not useful, and may complicate the optimizations.  In this case, the
 377 @code{simple_iv} function may be used to analyze a value -- the result
 378 is a loop-invariant base and step.
 379
 380 @node loop-iv
 381 @section IV analysis on RTL
 382 @cindex IV analysis on RTL
 383
 384 The induction variable analysis on RTL is simple and only allows analysis of
 385 affine induction variables, and only in one loop at once.  The interface
 386 is declared in @file{cfgloop.h}.  Before analyzing induction variables
 387 in a loop L, the @code{iv_analysis_loop_init} function must be called on L.
 388 After the analysis (possibly calling @code{iv_analysis_loop_init} for
 389 several loops) is finished, @code{iv_analysis_done} should be called.
 390 The following functions can be used to access the results of the
 391 analysis:
 392
 393 @itemize
 394 @item @code{iv_analyze}: Analyzes a single register used in the given
 395 insn.  If no use of the register in this insn is found, the following
 396 insns are scanned, so that this function can be called on the insn
 397 returned by @code{get_condition}.
 398 @item @code{iv_analyze_result}: Analyzes result of the assignment in the
 399 given insn.
 400 @item @code{iv_analyze_expr}: Analyzes a more complicated expression.
 401 All its operands are analyzed by @code{iv_analyze}, and hence they must
 402 be used in the specified insn or one of the following insns.
 403 @end itemize
 404
 405 The description of the induction variable is provided in @code{struct
 406 rtx_iv}.  In order to handle subregs, the representation is a bit
 407 complicated; if the value of the @code{extend} field is not
 408 @code{UNKNOWN}, the value of the induction variable in the i-th
 409 iteration is
 410
 411 @smallexample
 412 delta + mult * extend_@{extend_mode@} (subreg_@{mode@} (base + i * step)),
 413 @end smallexample
 414
 415 with the following exception:  if @code{first_special} is true, then the
 416 value in the first iteration (when @code{i} is zero) is @code{delta +
 417 mult * base}.  However, if @code{extend} is equal to @code{UNKNOWN},
 418 then @code{first_special} must be false, @code{delta} 0, @code{mult} 1
 419 and the value in the i-th iteration is
 420
 421 @smallexample
 422 subreg_@{mode@} (base + i * step)
 423 @end smallexample
 424
 425 The function @code{get_iv_value} can be used to perform these
 426 calculations.
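
The two formulas above can be modelled outside of the compiler.  The
following C sketch is a toy model, not the code of @code{get_iv_value}:
@code{struct toy_iv} and the @code{toy_*} helpers are hypothetical, modes
are represented only by their width in bits, and a subreg is assumed to
select the low-order bits of its operand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum toy_extend { TOY_UNKNOWN, TOY_ZERO_EXTEND, TOY_SIGN_EXTEND };

/* Toy description of an induction variable (hypothetical type).  */
struct toy_iv
{
  uint64_t base, step;
  unsigned mode_bits;         /* Width of the subreg mode, 1..64.  */
  unsigned extend_mode_bits;  /* Width of the extended mode, up to 64.  */
  enum toy_extend extend;
  uint64_t delta, mult;
  bool first_special;
};

/* Keep only the low BITS bits of X (models a low-part subreg).  */
static uint64_t
toy_low_bits (uint64_t x, unsigned bits)
{
  return bits >= 64 ? x : x & ((UINT64_C (1) << bits) - 1);
}

/* Sign-extend X, which must already be truncated to BITS bits.  */
static uint64_t
toy_sign_extend (uint64_t x, unsigned bits)
{
  if (bits >= 64)
    return x;
  uint64_t sign = UINT64_C (1) << (bits - 1);
  return (x ^ sign) - sign;
}

/* Value of IV in iteration I, following the two formulas above.  */
static uint64_t
toy_iv_value (const struct toy_iv *iv, uint64_t i)
{
  assert (iv->mode_bits >= 1);
  uint64_t inner = toy_low_bits (iv->base + i * iv->step, iv->mode_bits);

  if (iv->extend == TOY_UNKNOWN)
    {
      /* Plain case: subreg_{mode} (base + i * step).  */
      assert (!iv->first_special && iv->delta == 0 && iv->mult == 1);
      return inner;
    }

  /* Exception for the first iteration: delta + mult * base.  */
  if (iv->first_special && i == 0)
    return toy_low_bits (iv->delta + iv->mult * iv->base,
                         iv->extend_mode_bits);

  if (iv->extend == TOY_SIGN_EXTEND)
    inner = toy_sign_extend (inner, iv->mode_bits);
  /* For zero extension INNER already has its high bits clear.  */
  return toy_low_bits (iv->delta + iv->mult * inner, iv->extend_mode_bits);
}
```

With an 8-bit subreg mode, base 250 and step 4, the inner value wraps around
in iteration 2 (258 becomes 2), so a zero-extended variable with
@code{delta} 1 and @code{mult} 2 has the value 5 there.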
 427
 428 @node Number of iterations
 429 @section Number of iterations analysis
 430 @cindex Number of iterations analysis
 431
 432 Both on GIMPLE and on RTL, there are functions available to determine
 433 the number of iterations of a loop, with a similar interface.  The
 434 number of iterations of a loop in GCC is defined as the number of
 435 executions of the loop latch.  In many cases, it is not possible to
 436 determine the number of iterations unconditionally -- the determined
 437 number is correct only if some assumptions are satisfied.  The analysis
 438 tries to verify these conditions using the information contained in the
 439 program; if it fails, the conditions are returned together with the
 440 result.  The following information and conditions are provided by the
 441 analysis:
 442
 443 @itemize
 444 @item @code{assumptions}: If this condition is false, the rest of
 445 the information is invalid.
 446 @item @code{noloop_assumptions} on RTL, @code{may_be_zero} on GIMPLE: If
 447 this condition is true, the loop exits in the first iteration.
 448 @item @code{infinite}: If this condition is true, the loop is infinite.
 449 This condition is only available on RTL@.  On GIMPLE, conditions for
 450 finiteness of the loop are included in @code{assumptions}.
 451 @item @code{niter_expr} on RTL, @code{niter} on GIMPLE: The expression
 452 that gives number of iterations.  The number of iterations is defined as
 453 the number of executions of the loop latch.
 454 @end itemize
 455
 456 Both on GIMPLE and on RTL, it is necessary for the induction variable
 457 analysis framework to be initialized (SCEV on GIMPLE, loop-iv on RTL).
 458 On GIMPLE, the results are stored to @code{struct tree_niter_desc}
 459 structure.  Number of iterations before the loop is exited through a
 460 given exit can be determined using @code{number_of_iterations_exit}
 461 function.  On RTL, the results are returned in @code{struct niter_desc}
 462 structure.  The corresponding function is named
 463 @code{check_simple_exit}.  There are also functions that pass through
 464 all the exits of a loop and try to find one with easy to determine
 465 number of iterations -- @code{find_loop_niter} on GIMPLE and
 466 @code{find_simple_exit} on RTL@.  Finally, there are functions that
 467 provide the same information, but additionally cache it, so that
 468 repeated calls to number of iterations are not so costly --
 469 @code{number_of_latch_executions} on GIMPLE and @code{get_simple_loop_desc}
 470 on RTL.
 471
 472 Note that some of these functions may behave slightly differently than
 473 others -- some of them return only the expression for the number of
 474 iterations, and fail if there are some assumptions.  The function
 475 @code{number_of_latch_executions} works only for single-exit loops.
 476 The function @code{number_of_cond_exit_executions} can be used to
 477 determine number of executions of the exit condition of a single-exit
 478 loop (i.e., the @code{number_of_latch_executions} increased by one).
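
The difference between the two counts can be observed on an ordinary
counted loop.  The C sketch below is only a model of the definitions given
in this section; @code{struct toy_counts}, @code{toy_run_loop} and
@code{toy_niter} are hypothetical names, the loop is assumed to test its
exit condition in the header, and the step is assumed to be positive and
small enough that the counter does not overflow (conditions of the kind
that the real analysis would report through @code{assumptions}).

```c
#include <assert.h>

struct toy_counts
{
  unsigned latch_executions;  /* Times the back edge is taken.  */
  unsigned exit_tests;        /* Times the exit condition is evaluated.  */
};

/* Execute `for (i = start; i < bound; i += step)' and count events.  */
static struct toy_counts
toy_run_loop (int start, int bound, int step)
{
  struct toy_counts c = { 0, 0 };
  int i = start;
  assert (step > 0);
  for (;;)
    {
      c.exit_tests++;           /* Header: evaluate the exit condition.  */
      if (!(i < bound))
        break;                  /* Exit edge taken.  */
      i += step;                /* Loop body.  */
      c.latch_executions++;     /* Latch: jump back to the header.  */
    }
  return c;
}

/* Closed form for the number of latch executions of the loop above.  */
static unsigned
toy_niter (int start, int bound, int step)
{
  assert (step > 0);
  if (start >= bound)
    return 0;                   /* The loop exits in the first iteration.  */
  return ((unsigned) bound - (unsigned) start + (unsigned) step - 1)
         / (unsigned) step;
}
```

For @code{start} 0, @code{bound} 10 and @code{step} 3 the latch is executed
4 times and the exit condition is evaluated 5 times.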
 479
 480 @node Dependency analysis
 481 @section Data Dependency Analysis
 482 @cindex Data Dependency Analysis
 483
 484 The code for the data dependence analysis can be found in
 485 @file{tree-data-ref.c} and its interface and data structures are
 486 described in @file{tree-data-ref.h}.  The function that computes the
 487 data dependences for all the array and pointer references for a given
 488 loop is @code{compute_data_dependences_for_loop}.  This function is
 489 currently used by the linear loop transform and the vectorization
 490 passes.  Before calling this function, one has to allocate two vectors:
 491 a first vector will contain the set of data references that are
 492 contained in the analyzed loop body, and the second vector will contain
 493 the dependence relations between the data references.  Thus if the
 494 vector of data references is of size @code{n}, the vector containing the
 495 dependence relations will contain @code{n*n} elements.  However if the
 496 analyzed loop contains side effects, such as calls that potentially can
 497 interfere with the data references in the current analyzed loop, the
 498 analysis stops while scanning the loop body for data references, and
 499 inserts a single @code{chrec_dont_know} in the dependence relation
 500 array.
 501
 502 The data references are discovered in a particular order during the
 503 scanning of the loop body: the loop body is analyzed in execution order,
 504 and the data references of each statement are pushed at the end of the
 505 data reference array.  Two data references syntactically occur in the
 506 program in the same order as in the array of data references.  This
 507 syntactic order is important in some classical data dependence tests,
 508 and mapping this order to the elements of this array avoids costly
 509 queries to the loop body representation.
 510
 511 Three types of data references are currently handled: @code{ARRAY_REF},
 512 @code{INDIRECT_REF} and @code{COMPONENT_REF}.  The data structure for the
 513 data reference is @code{data_reference}, where @code{data_reference_p} is a
 514 name of a pointer to the data reference structure.  The structure contains the
 515 following elements:
 516
 517 @itemize
 518 @item @code{base_object_info}: Provides information about the base object
 519 of the data reference and its access functions. These access functions
 520 represent the evolution of the data reference in the loop relative to
 521 its base, in keeping with the classical meaning of the data reference
 522 access function for the support of arrays. For example, for a reference
 523 @code{a.b[i][j]}, the base object is @code{a.b} and the access functions,
 524 one for each array subscript, are:
 525 @code{@{i_init, +, i_step@}_1, @{j_init, +, j_step@}_2}.
 526
 527 @item @code{first_location_in_loop}: Provides information about the first
 528 location accessed by the data reference in the loop and about the access
 529 function used to represent evolution relative to this location. This data
 530 is used to support pointers, and is not used for arrays (for which we
 531 have base objects). Pointer accesses are represented as a one-dimensional
 532 access that starts from the first location accessed in the loop. For
 533 example:
 534
 535 @smallexample
 536       for1 i
 537          for2 j
 538           *((int *)p + i + j) = a[i][j];
 539 @end smallexample
 540
 541 The access function of the pointer access is @code{@{0, +, 4B@}_for2}
 542 relative to @code{p + i}.  The access functions of the array are
 543 @code{@{i_init, +, i_step@}_for1} and @code{@{j_init, +, j_step@}_for2}
 544 relative to @code{a}.
 545
 546 Usually, the object the pointer refers to is either unknown, or we can't
 547 prove that the access is confined to the boundaries of a certain object.
 548
 549 Two data references can be compared only if at least one of these two
 550 representations has all its fields filled for both data references.
 551
 552 The current strategy for data dependence tests is as follows:
 553 If both @code{a} and @code{b} are represented as arrays, compare
 554 @code{a.base_object} and @code{b.base_object};
 555 if they are equal, apply dependence tests (use access functions based on
 556 base_objects).
 557 Else if both @code{a} and @code{b} are represented as pointers, compare
 558 @code{a.first_location} and @code{b.first_location};
 559 if they are equal, apply dependence tests (use access functions based on
 560 first location).
 561 However, if @code{a} and @code{b} are represented differently, only try
 562 to prove that the bases are definitely different.
 563
 564 @item Aliasing information.
 565 @item Alignment information.
 566 @end itemize
 567
 568 The structure describing the relation between two data references is
 569 @code{data_dependence_relation} and the shorter name for a pointer to
 570 such a structure is @code{ddr_p}.  This structure contains:
 571
 572 @itemize
 573 @item a pointer to each data reference,
 574 @item a tree node @code{are_dependent} that is set to @code{chrec_known}
 575 if the analysis has proved that there is no dependence between these two
 576 data references, to @code{chrec_dont_know} if the analysis was not able to
 577 determine any useful result and a dependence could potentially exist
 578 between these data references, and to @code{NULL_TREE} if a dependence
 579 relation exists between the data references, in which case the
 580 description of this dependence relation is
 581 given in the @code{subscripts}, @code{dir_vects}, and @code{dist_vects}
 582 arrays,
 583 @item a boolean that determines whether the dependence relation can be
 584 represented by a classical distance vector,
 585 @item an array @code{subscripts} that contains a description of each
 586 subscript of the data references.  Given two array accesses a
 587 subscript is the tuple composed of the access functions for a given
 588 dimension.  For example, given @code{A[f1][f2][f3]} and
 589 @code{B[g1][g2][g3]}, there are three subscripts: @code{(f1, g1), (f2,
 590 g2), (f3, g3)}.
 591 @item two arrays @code{dir_vects} and @code{dist_vects} that contain
 592 classical representations of the data dependences in the form of
 593 direction and distance dependence vectors,
 594 @item an array of loops @code{loop_nest} that contains the loops to
 595 which the distance and direction vectors refer.
 596 @end itemize
 597
 598 Several functions for pretty printing the information extracted by the
 599 data dependence analysis are available: @code{dump_ddrs} prints, with
 600 maximum verbosity, the details of a data dependence relations array,
 601 @code{dump_dist_dir_vectors} prints only the classical distance and
 602 direction vectors for a data dependence relations array, and
 603 @code{dump_data_references} prints the details of the data references
 604 contained in a data reference array.
 605
 606
 607 @node Omega
 608 @section Omega, a solver for linear programming problems
 609 @cindex Omega, a solver for linear programming problems
 610
 611 The data dependence analysis contains several solvers triggered
 612 sequentially from the less complex ones to the more sophisticated.
 613 For ensuring the consistency of the results of these solvers, a data
 614 dependence check pass has been implemented based on two different
 615 solvers.  The second method that has been integrated into GCC is based
 616 on the Omega dependence solver, written in the 1990s by William Pugh
 617 and David Wonnacott.  Data dependence tests can be formulated using a
 618 subset of Presburger arithmetic that can be translated to linear
 619 constraint systems.  These linear constraint systems can then be
 620 solved using the Omega solver.
 621
 622 The Omega solver uses the Fourier-Motzkin algorithm for variable
 623 elimination: a linear constraint system containing @code{n} variables
 624 is reduced to a linear constraint system with @code{n-1} variables.
 625 The Omega solver can also be used for solving other problems that can
 626 be expressed in the form of a system of linear equalities and
 627 inequalities.  The Omega solver is known to have an exponential worst
 628 case, also known under the name of ``omega nightmare'' in the
 629 literature, but in practice, the omega test is known to be efficient
 630 for the common data dependence tests.
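
The variable elimination step can be illustrated with a small C sketch.
It is not taken from the Omega solver and does not use the interface of
@file{omega.h}; every @code{toy_*} and @code{TOY_*} name is hypothetical.
The sketch handles only inequalities over two variables and decides
feasibility over the rationals, whereas a dependence test also has to
reason about integer solutions, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_VARS 2
#define TOY_MAX_ROWS 32

/* One inequality: coef[0] * x0 + coef[1] * x1 <= bound.  */
struct toy_row
{
  long coef[TOY_VARS];
  long bound;
};

struct toy_system
{
  int n;
  struct toy_row row[TOY_MAX_ROWS];
};

static void
toy_add_row (struct toy_system *s, struct toy_row r)
{
  assert (s->n < TOY_MAX_ROWS);
  s->row[s->n++] = r;
}

/* Eliminate variable VAR from IN and store the resulting system, which
   no longer mentions VAR, in OUT.  */
static void
toy_eliminate (const struct toy_system *in, int var, struct toy_system *out)
{
  out->n = 0;
  for (int i = 0; i < in->n; i++)
    {
      long ci = in->row[i].coef[var];
      if (ci == 0)
        {
          toy_add_row (out, in->row[i]);  /* VAR not involved: keep.  */
          continue;
        }
      if (ci < 0)
        continue;               /* Lower bounds are used in pairs below.  */
      for (int j = 0; j < in->n; j++)
        {
          long cj = in->row[j].coef[var];
          if (cj >= 0)
            continue;
          /* (-cj) * row_i + ci * row_j cancels VAR; both multipliers are
             positive, so the direction of the inequality is kept.  */
          struct toy_row r;
          for (int v = 0; v < TOY_VARS; v++)
            r.coef[v] = -cj * in->row[i].coef[v] + ci * in->row[j].coef[v];
          r.bound = -cj * in->row[i].bound + ci * in->row[j].bound;
          toy_add_row (out, r);
        }
    }
}

/* Eliminate all variables; what remains are rows of the form 0 <= bound.  */
static bool
toy_feasible_p (const struct toy_system *s)
{
  struct toy_system a = *s;
  struct toy_system b = { 0 };
  for (int v = 0; v < TOY_VARS; v++)
    {
      toy_eliminate (&a, v, &b);
      a = b;
    }
  for (int i = 0; i < a.n; i++)
    if (a.row[i].bound < 0)
      return false;
  return true;
}
```

For example, the system @code{x0 <= x1}, @code{x1 <= 5}, @code{x0 >= 7}
reduces to the contradiction @code{0 <= -2} and is reported as infeasible,
while replacing the last constraint by @code{x0 >= 3} leaves @code{0 <= 2}.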
 631
 632 The interface used by the Omega solver for describing the linear
 633 programming problems is described in @file{omega.h}, and the solver is
 634 @code{omega_solve_problem}.