gcc/doc/loop.texi

   1 @c Copyright (C) 2006-2020 Free Software Foundation, Inc.
   2 @c Free Software Foundation, Inc.
   3 @c This is part of the GCC manual.
   4 @c For copying conditions, see the file gcc.texi.
   5
   6 @c ---------------------------------------------------------------------
   7 @c Loop Representation
   8 @c ---------------------------------------------------------------------
   9
  10 @node Loop Analysis and Representation
  11 @chapter Analysis and Representation of Loops
  12
  13 GCC provides extensive infrastructure for work with natural loops, i.e.,
  14 strongly connected components of CFG with only one entry block.  This
  15 chapter describes representation of loops in GCC, both on GIMPLE and in
  16 RTL, as well as the interfaces to loop-related analyses (induction
  17 variable analysis and number of iterations analysis).
  18
  19 @menu
  20 * Loop representation::         Representation and analysis of loops.
  21 * Loop querying::               Getting information about loops.
  22 * Loop manipulation::           Loop manipulation functions.
  23 * LCSSA::                       Loop-closed SSA form.
  24 * Scalar evolutions::           Induction variables on GIMPLE.
  25 * loop-iv::                     Induction variables on RTL.
  26 * Number of iterations::        Number of iterations analysis.
  27 * Dependency analysis::         Data dependency analysis.
  28 @end menu
  29
  30 @node Loop representation
  31 @section Loop representation
  32 @cindex Loop representation
  33 @cindex Loop analysis
  34
  35 This chapter describes the representation of loops in GCC, and functions
  36 that can be used to build, modify and analyze this representation.  Most
  37 of the interfaces and data structures are declared in @file{cfgloop.h}.
  38 Loop structures are analyzed and this information disposed or updated
  39 at the discretion of individual passes.  Still most of the generic
  40 CFG manipulation routines are aware of loop structures and try to
  41 keep them up-to-date.  By this means an increasing part of the
  42 compilation pipeline is setup to maintain loop structure across
  43 passes to allow attaching meta information to individual loops
  44 for consumption by later passes.
  45
  46 In general, a natural loop has one entry block (header) and possibly
  47 several back edges (latches) leading to the header from the inside of
  48 the loop.  Loops with several latches may appear if several loops share
  49 a single header, or if there is a branching in the middle of the loop.
  50 The representation of loops in GCC however allows only loops with a
  51 single latch.  During loop analysis, headers of such loops are split and
  52 forwarder blocks are created in order to disambiguate their structures.
  53 Heuristic based on profile information and structure of the induction
  54 variables in the loops is used to determine whether the latches
  55 correspond to sub-loops or to control flow in a single loop.  This means
  56 that the analysis sometimes changes the CFG, and if you run it in the
  57 middle of an optimization pass, you must be able to deal with the new
  58 blocks.  You may avoid CFG changes by passing
  59 @code{LOOPS_MAY_HAVE_MULTIPLE_LATCHES} flag to the loop discovery,
  60 note however that most other loop manipulation functions will not work
  61 correctly for loops with multiple latch edges (the functions that only
  62 query membership of blocks to loops and subloop relationships, or
  63 enumerate and test loop exits, can be expected to work).
  64
  65 Body of the loop is the set of blocks that are dominated by its header,
  66 and reachable from its latch against the direction of edges in CFG@.  The
  67 loops are organized in a containment hierarchy (tree) such that all the
  68 loops immediately contained inside loop L are the children of L in the
  69 tree.  This tree is represented by the @code{struct loops} structure.
  70 The root of this tree is a fake loop that contains all blocks in the
  71 function.  Each of the loops is represented in a @code{struct loop}
  72 structure.  Each loop is assigned an index (@code{num} field of the
  73 @code{struct loop} structure), and the pointer to the loop is stored in
  74 the corresponding field of the @code{larray} vector in the loops
  75 structure.  The indices do not have to be continuous, there may be
  76 empty (@code{NULL}) entries in the @code{larray} created by deleting
  77 loops.  Also, there is no guarantee on the relative order of a loop
  78 and its subloops in the numbering.  The index of a loop never changes.
  79
  80 The entries of the @code{larray} field should not be accessed directly.
  81 The function @code{get_loop} returns the loop description for a loop with
  82 the given index.  @code{number_of_loops} function returns number of
  83 loops in the function.  To traverse all loops, use @code{FOR_EACH_LOOP}
  84 macro.  The @code{flags} argument of the macro is used to determine
  85 the direction of traversal and the set of loops visited.  Each loop is
  86 guaranteed to be visited exactly once, regardless of the changes to the
  87 loop tree, and the loops may be removed during the traversal.  The newly
  88 created loops are never traversed, if they need to be visited, this
  89 must be done separately after their creation.
  90
  91 Each basic block contains the reference to the innermost loop it belongs
  92 to (@code{loop_father}).  For this reason, it is only possible to have
  93 one @code{struct loops} structure initialized at the same time for each
  94 CFG@.  The global variable @code{current_loops} contains the
  95 @code{struct loops} structure.  Many of the loop manipulation functions
  96 assume that dominance information is up-to-date.
  97
  98 The loops are analyzed through @code{loop_optimizer_init} function.  The
  99 argument of this function is a set of flags represented in an integer
 100 bitmask.  These flags specify what other properties of the loop
 101 structures should be calculated/enforced and preserved later:
 102
 103 @itemize
 104 @item @code{LOOPS_MAY_HAVE_MULTIPLE_LATCHES}: If this flag is set, no
 105 changes to CFG will be performed in the loop analysis, in particular,
 106 loops with multiple latch edges will not be disambiguated.  If a loop
 107 has multiple latches, its latch block is set to NULL@.  Most of
 108 the loop manipulation functions will not work for loops in this shape.
 109 No other flags that require CFG changes can be passed to
 110 loop_optimizer_init.
 111 @item @code{LOOPS_HAVE_PREHEADERS}: Forwarder blocks are created in such
 112 a way that each loop has only one entry edge, and additionally, the
 113 source block of this entry edge has only one successor.  This creates a
 114 natural place where the code can be moved out of the loop, and ensures
 115 that the entry edge of the loop leads from its immediate super-loop.
 116 @item @code{LOOPS_HAVE_SIMPLE_LATCHES}: Forwarder blocks are created to
 117 force the latch block of each loop to have only one successor.  This
 118 ensures that the latch of the loop does not belong to any of its
 119 sub-loops, and makes manipulation with the loops significantly easier.
 120 Most of the loop manipulation functions assume that the loops are in
 121 this shape.  Note that with this flag, the ``normal'' loop without any
 122 control flow inside and with one exit consists of two basic blocks.
 123 @item @code{LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS}: Basic blocks and
 124 edges in the strongly connected components that are not natural loops
 125 (have more than one entry block) are marked with
 126 @code{BB_IRREDUCIBLE_LOOP} and @code{EDGE_IRREDUCIBLE_LOOP} flags.  The
 127 flag is not set for blocks and edges that belong to natural loops that
 128 are in such an irreducible region (but it is set for the entry and exit
 129 edges of such a loop, if they lead to/from this region).
 130 @item @code{LOOPS_HAVE_RECORDED_EXITS}: The lists of exits are recorded
 131 and updated for each loop.  This makes some functions (e.g.,
 132 @code{get_loop_exit_edges}) more efficient.  Some functions (e.g.,
 133 @code{single_exit}) can be used only if the lists of exits are
 134 recorded.
 135 @end itemize
 136
 137 These properties may also be computed/enforced later, using functions
 138 @code{create_preheaders}, @code{force_single_succ_latches},
 139 @code{mark_irreducible_loops} and @code{record_loop_exits}.
 140 The properties can be queried using @code{loops_state_satisfies_p}.
 141
 142 The memory occupied by the loops structures should be freed with
 143 @code{loop_optimizer_finalize} function.  When loop structures are
 144 setup to be preserved across passes this function reduces the
 145 information to be kept up-to-date to a minimum (only
 146 @code{LOOPS_MAY_HAVE_MULTIPLE_LATCHES} set).
 147
 148 The CFG manipulation functions in general do not update loop structures.
 149 Specialized versions that additionally do so are provided for the most
 150 common tasks.  On GIMPLE, @code{cleanup_tree_cfg_loop} function can be
 151 used to cleanup CFG while updating the loops structures if
 152 @code{current_loops} is set.
 153
 154 At the moment loop structure is preserved from the start of GIMPLE
 155 loop optimizations until the end of RTL loop optimizations.  During
 156 this time a loop can be tracked by its @code{struct loop} and number.
 157
 158 @node Loop querying
 159 @section Loop querying
 160 @cindex Loop querying
 161
 162 The functions to query the information about loops are declared in
 163 @file{cfgloop.h}.  Some of the information can be taken directly from
 164 the structures.  @code{loop_father} field of each basic block contains
 165 the innermost loop to that the block belongs.  The most useful fields of
 166 loop structure (that are kept up-to-date at all times) are:
 167
 168 @itemize
 169 @item @code{header}, @code{latch}: Header and latch basic blocks of the
 170 loop.
 171 @item @code{num_nodes}: Number of basic blocks in the loop (including
 172 the basic blocks of the sub-loops).
 173 @item @code{outer}, @code{inner}, @code{next}: The super-loop, the first
 174 sub-loop, and the sibling of the loop in the loops tree.
 175 @end itemize
 176
 177 There are other fields in the loop structures, many of them used only by
 178 some of the passes, or not updated during CFG changes; in general, they
 179 should not be accessed directly.
 180
 181 The most important functions to query loop structures are:
 182
 183 @itemize
 184 @item @code{loop_depth}: The depth of the loop in the loops tree, i.e., the
 185 number of super-loops of the loop.
 186 @item @code{flow_loops_dump}: Dumps the information about loops to a
 187 file.
 188 @item @code{verify_loop_structure}: Checks consistency of the loop
 189 structures.
 190 @item @code{loop_latch_edge}: Returns the latch edge of a loop.
 191 @item @code{loop_preheader_edge}: If loops have preheaders, returns
 192 the preheader edge of a loop.
 193 @item @code{flow_loop_nested_p}: Tests whether loop is a sub-loop of
 194 another loop.
 195 @item @code{flow_bb_inside_loop_p}: Tests whether a basic block belongs
 196 to a loop (including its sub-loops).
 197 @item @code{find_common_loop}: Finds the common super-loop of two loops.
 198 @item @code{superloop_at_depth}: Returns the super-loop of a loop with
 199 the given depth.
 200 @item @code{tree_num_loop_insns}, @code{num_loop_insns}: Estimates the
 201 number of insns in the loop, on GIMPLE and on RTL.
 202 @item @code{loop_exit_edge_p}: Tests whether edge is an exit from a
 203 loop.
 204 @item @code{mark_loop_exit_edges}: Marks all exit edges of all loops
 205 with @code{EDGE_LOOP_EXIT} flag.
 206 @item @code{get_loop_body}, @code{get_loop_body_in_dom_order},
 207 @code{get_loop_body_in_bfs_order}: Enumerates the basic blocks in the
 208 loop in depth-first search order in reversed CFG, ordered by dominance
 209 relation, and breath-first search order, respectively.
 210 @item @code{single_exit}: Returns the single exit edge of the loop, or
 211 @code{NULL} if the loop has more than one exit.  You can only use this
 212 function if LOOPS_HAVE_MARKED_SINGLE_EXITS property is used.
 213 @item @code{get_loop_exit_edges}: Enumerates the exit edges of a loop.
 214 @item @code{just_once_each_iteration_p}: Returns true if the basic block
 215 is executed exactly once during each iteration of a loop (that is, it
 216 does not belong to a sub-loop, and it dominates the latch of the loop).
 217 @end itemize
 218
 219 @node Loop manipulation
 220 @section Loop manipulation
 221 @cindex Loop manipulation
 222
 223 The loops tree can be manipulated using the following functions:
 224
 225 @itemize
 226 @item @code{flow_loop_tree_node_add}: Adds a node to the tree.
 227 @item @code{flow_loop_tree_node_remove}: Removes a node from the tree.
 228 @item @code{add_bb_to_loop}: Adds a basic block to a loop.
 229 @item @code{remove_bb_from_loops}: Removes a basic block from loops.
 230 @end itemize
 231
 232 Most low-level CFG functions update loops automatically.  The following
 233 functions handle some more complicated cases of CFG manipulations:
 234
 235 @itemize
 236 @item @code{remove_path}: Removes an edge and all blocks it dominates.
 237 @item @code{split_loop_exit_edge}: Splits exit edge of the loop,
 238 ensuring that PHI node arguments remain in the loop (this ensures that
 239 loop-closed SSA form is preserved).  Only useful on GIMPLE.
 240 @end itemize
 241
 242 Finally, there are some higher-level loop transformations implemented.
 243 While some of them are written so that they should work on non-innermost
 244 loops, they are mostly untested in that case, and at the moment, they
 245 are only reliable for the innermost loops:
 246
 247 @itemize
 248 @item @code{create_iv}: Creates a new induction variable.  Only works on
 249 GIMPLE@.  @code{standard_iv_increment_position} can be used to find a
 250 suitable place for the iv increment.
 251 @item @code{duplicate_loop_to_header_edge},
 252 @code{tree_duplicate_loop_to_header_edge}: These functions (on RTL and
 253 on GIMPLE) duplicate the body of the loop prescribed number of times on
 254 one of the edges entering loop header, thus performing either loop
 255 unrolling or loop peeling.  @code{can_duplicate_loop_p}
 256 (@code{can_unroll_loop_p} on GIMPLE) must be true for the duplicated
 257 loop.
 258 @item @code{loop_version}: This function creates a copy of a loop, and
 259 a branch before them that selects one of them depending on the
 260 prescribed condition.  This is useful for optimizations that need to
 261 verify some assumptions in runtime (one of the copies of the loop is
 262 usually left unchanged, while the other one is transformed in some way).
 263 @item @code{tree_unroll_loop}: Unrolls the loop, including peeling the
 264 extra iterations to make the number of iterations divisible by unroll
 265 factor, updating the exit condition, and removing the exits that now
 266 cannot be taken.  Works only on GIMPLE.
 267 @end itemize
 268
 269 @node LCSSA
 270 @section Loop-closed SSA form
 271 @cindex LCSSA
 272 @cindex Loop-closed SSA form
 273
 274 Throughout the loop optimizations on tree level, one extra condition is
 275 enforced on the SSA form:  No SSA name is used outside of the loop in
 276 that it is defined.  The SSA form satisfying this condition is called
 277 ``loop-closed SSA form'' -- LCSSA@.  To enforce LCSSA, PHI nodes must be
 278 created at the exits of the loops for the SSA names that are used
 279 outside of them.  Only the real operands (not virtual SSA names) are
 280 held in LCSSA, in order to save memory.
 281
 282 There are various benefits of LCSSA:
 283
 284 @itemize
 285 @item Many optimizations (value range analysis, final value
 286 replacement) are interested in the values that are defined in the loop
 287 and used outside of it, i.e., exactly those for that we create new PHI
 288 nodes.
 289 @item In induction variable analysis, it is not necessary to specify the
 290 loop in that the analysis should be performed -- the scalar evolution
 291 analysis always returns the results with respect to the loop in that the
 292 SSA name is defined.
 293 @item It makes updating of SSA form during loop transformations simpler.
 294 Without LCSSA, operations like loop unrolling may force creation of PHI
 295 nodes arbitrarily far from the loop, while in LCSSA, the SSA form can be
 296 updated locally.  However, since we only keep real operands in LCSSA, we
 297 cannot use this advantage (we could have local updating of real
 298 operands, but it is not much more efficient than to use generic SSA form
 299 updating for it as well; the amount of changes to SSA is the same).
 300 @end itemize
 301
 302 However, it also means LCSSA must be updated.  This is usually
 303 straightforward, unless you create a new value in loop and use it
 304 outside, or unless you manipulate loop exit edges (functions are
 305 provided to make these manipulations simple).
 306 @code{rewrite_into_loop_closed_ssa} is used to rewrite SSA form to
 307 LCSSA, and @code{verify_loop_closed_ssa} to check that the invariant of
 308 LCSSA is preserved.
 309
 310 @node Scalar evolutions
 311 @section Scalar evolutions
 312 @cindex Scalar evolutions
 313 @cindex IV analysis on GIMPLE
 314
 315 Scalar evolutions (SCEV) are used to represent results of induction
 316 variable analysis on GIMPLE@.  They enable us to represent variables with
 317 complicated behavior in a simple and consistent way (we only use it to
 318 express values of polynomial induction variables, but it is possible to
 319 extend it).  The interfaces to SCEV analysis are declared in
 320 @file{tree-scalar-evolution.h}.  To use scalar evolutions analysis,
 321 @code{scev_initialize} must be used.  To stop using SCEV,
 322 @code{scev_finalize} should be used.  SCEV analysis caches results in
 323 order to save time and memory.  This cache however is made invalid by
 324 most of the loop transformations, including removal of code.  If such a
 325 transformation is performed, @code{scev_reset} must be called to clean
 326 the caches.
 327
 328 Given an SSA name, its behavior in loops can be analyzed using the
 329 @code{analyze_scalar_evolution} function.  The returned SCEV however
 330 does not have to be fully analyzed and it may contain references to
 331 other SSA names defined in the loop.  To resolve these (potentially
 332 recursive) references, @code{instantiate_parameters} or
 333 @code{resolve_mixers} functions must be used.
 334 @code{instantiate_parameters} is useful when you use the results of SCEV
 335 only for some analysis, and when you work with whole nest of loops at
 336 once.  It will try replacing all SSA names by their SCEV in all loops,
 337 including the super-loops of the current loop, thus providing a complete
 338 information about the behavior of the variable in the loop nest.
 339 @code{resolve_mixers} is useful if you work with only one loop at a
 340 time, and if you possibly need to create code based on the value of the
 341 induction variable.  It will only resolve the SSA names defined in the
 342 current loop, leaving the SSA names defined outside unchanged, even if
 343 their evolution in the outer loops is known.
 344
 345 The SCEV is a normal tree expression, except for the fact that it may
 346 contain several special tree nodes.  One of them is
 347 @code{SCEV_NOT_KNOWN}, used for SSA names whose value cannot be
 348 expressed.  The other one is @code{POLYNOMIAL_CHREC}.  Polynomial chrec
 349 has three arguments -- base, step and loop (both base and step may
 350 contain further polynomial chrecs).  Type of the expression and of base
 351 and step must be the same.  A variable has evolution
 352 @code{POLYNOMIAL_CHREC(base, step, loop)} if it is (in the specified
 353 loop) equivalent to @code{x_1} in the following example
 354
 355 @smallexample
 356 while (@dots{})
 357   @{
 358     x_1 = phi (base, x_2);
 359     x_2 = x_1 + step;
 360   @}
 361 @end smallexample
 362
 363 Note that this includes the language restrictions on the operations.
 364 For example, if we compile C code and @code{x} has signed type, then the
 365 overflow in addition would cause undefined behavior, and we may assume
 366 that this does not happen.  Hence, the value with this SCEV cannot
 367 overflow (which restricts the number of iterations of such a loop).
 368
 369 In many cases, one wants to restrict the attention just to affine
 370 induction variables.  In this case, the extra expressive power of SCEV
 371 is not useful, and may complicate the optimizations.  In this case,
 372 @code{simple_iv} function may be used to analyze a value -- the result
 373 is a loop-invariant base and step.
 374
 375 @node loop-iv
 376 @section IV analysis on RTL
 377 @cindex IV analysis on RTL
 378
 379 The induction variable on RTL is simple and only allows analysis of
 380 affine induction variables, and only in one loop at once.  The interface
 381 is declared in @file{cfgloop.h}.  Before analyzing induction variables
 382 in a loop L, @code{iv_analysis_loop_init} function must be called on L.
 383 After the analysis (possibly calling @code{iv_analysis_loop_init} for
 384 several loops) is finished, @code{iv_analysis_done} should be called.
 385 The following functions can be used to access the results of the
 386 analysis:
 387
 388 @itemize
 389 @item @code{iv_analyze}: Analyzes a single register used in the given
 390 insn.  If no use of the register in this insn is found, the following
 391 insns are scanned, so that this function can be called on the insn
 392 returned by get_condition.
 393 @item @code{iv_analyze_result}: Analyzes result of the assignment in the
 394 given insn.
 395 @item @code{iv_analyze_expr}: Analyzes a more complicated expression.
 396 All its operands are analyzed by @code{iv_analyze}, and hence they must
 397 be used in the specified insn or one of the following insns.
 398 @end itemize
 399
 400 The description of the induction variable is provided in @code{struct
 401 rtx_iv}.  In order to handle subregs, the representation is a bit
 402 complicated; if the value of the @code{extend} field is not
 403 @code{UNKNOWN}, the value of the induction variable in the i-th
 404 iteration is
 405
 406 @smallexample
 407 delta + mult * extend_@{extend_mode@} (subreg_@{mode@} (base + i * step)),
 408 @end smallexample
 409
 410 with the following exception:  if @code{first_special} is true, then the
 411 value in the first iteration (when @code{i} is zero) is @code{delta +
 412 mult * base}.  However, if @code{extend} is equal to @code{UNKNOWN},
 413 then @code{first_special} must be false, @code{delta} 0, @code{mult} 1
 414 and the value in the i-th iteration is
 415
 416 @smallexample
 417 subreg_@{mode@} (base + i * step)
 418 @end smallexample
 419
 420 The function @code{get_iv_value} can be used to perform these
 421 calculations.
 422
 423 @node Number of iterations
 424 @section Number of iterations analysis
 425 @cindex Number of iterations analysis
 426
 427 Both on GIMPLE and on RTL, there are functions available to determine
 428 the number of iterations of a loop, with a similar interface.  The
 429 number of iterations of a loop in GCC is defined as the number of
 430 executions of the loop latch.  In many cases, it is not possible to
 431 determine the number of iterations unconditionally -- the determined
 432 number is correct only if some assumptions are satisfied.  The analysis
 433 tries to verify these conditions using the information contained in the
 434 program; if it fails, the conditions are returned together with the
 435 result.  The following information and conditions are provided by the
 436 analysis:
 437
 438 @itemize
 439 @item @code{assumptions}: If this condition is false, the rest of
 440 the information is invalid.
 441 @item @code{noloop_assumptions} on RTL, @code{may_be_zero} on GIMPLE: If
 442 this condition is true, the loop exits in the first iteration.
 443 @item @code{infinite}: If this condition is true, the loop is infinite.
 444 This condition is only available on RTL@.  On GIMPLE, conditions for
 445 finiteness of the loop are included in @code{assumptions}.
 446 @item @code{niter_expr} on RTL, @code{niter} on GIMPLE: The expression
 447 that gives number of iterations.  The number of iterations is defined as
 448 the number of executions of the loop latch.
 449 @end itemize
 450
 451 Both on GIMPLE and on RTL, it necessary for the induction variable
 452 analysis framework to be initialized (SCEV on GIMPLE, loop-iv on RTL).
 453 On GIMPLE, the results are stored to @code{struct tree_niter_desc}
 454 structure.  Number of iterations before the loop is exited through a
 455 given exit can be determined using @code{number_of_iterations_exit}
 456 function.  On RTL, the results are returned in @code{struct niter_desc}
 457 structure.  The corresponding function is named
 458 @code{check_simple_exit}.  There are also functions that pass through
 459 all the exits of a loop and try to find one with easy to determine
 460 number of iterations -- @code{find_loop_niter} on GIMPLE and
 461 @code{find_simple_exit} on RTL@.  Finally, there are functions that
 462 provide the same information, but additionally cache it, so that
 463 repeated calls to number of iterations are not so costly --
 464 @code{number_of_latch_executions} on GIMPLE and @code{get_simple_loop_desc}
 465 on RTL.
 466
 467 Note that some of these functions may behave slightly differently than
 468 others -- some of them return only the expression for the number of
 469 iterations, and fail if there are some assumptions.  The function
 470 @code{number_of_latch_executions} works only for single-exit loops.
 471 The function @code{number_of_cond_exit_executions} can be used to
 472 determine number of executions of the exit condition of a single-exit
 473 loop (i.e., the @code{number_of_latch_executions} increased by one).
 474
 475 On GIMPLE, below constraint flags affect semantics of some APIs of number
 476 of iterations analyzer:
 477
 478 @itemize
 479 @item @code{LOOP_C_INFINITE}: If this constraint flag is set, the loop
 480 is known to be infinite.  APIs like @code{number_of_iterations_exit} can
 481 return false directly without doing any analysis.
 482 @item @code{LOOP_C_FINITE}: If this constraint flag is set, the loop is
 483 known to be finite, in other words, loop's number of iterations can be
 484 computed with @code{assumptions} be true.
 485 @end itemize
 486
 487 Generally, the constraint flags are set/cleared by consumers which are
 488 loop optimizers.  It's also the consumers' responsibility to set/clear
 489 constraints correctly.  Failing to do that might result in hard to track
 490 down bugs in scev/niter consumers.  One typical use case is vectorizer:
 491 it drives number of iterations analyzer by setting @code{LOOP_C_FINITE}
 492 and vectorizes possibly infinite loop by versioning loop with analysis
 493 result.  In return, constraints set by consumers can also help number of
 494 iterations analyzer in following optimizers.  For example, @code{niter}
 495 of a loop versioned under @code{assumptions} is valid unconditionally.
 496
 497 Other constraints may be added in the future, for example, a constraint
 498 indicating that loops' latch must roll thus @code{may_be_zero} would be
 499 false unconditionally.
 500
 501 @node Dependency analysis
 502 @section Data Dependency Analysis
 503 @cindex Data Dependency Analysis
 504
 505 The code for the data dependence analysis can be found in
 506 @file{tree-data-ref.c} and its interface and data structures are
 507 described in @file{tree-data-ref.h}.  The function that computes the
 508 data dependences for all the array and pointer references for a given
 509 loop is @code{compute_data_dependences_for_loop}.  This function is
 510 currently used by the linear loop transform and the vectorization
 511 passes.  Before calling this function, one has to allocate two vectors:
 512 a first vector will contain the set of data references that are
 513 contained in the analyzed loop body, and the second vector will contain
 514 the dependence relations between the data references.  Thus if the
 515 vector of data references is of size @code{n}, the vector containing the
 516 dependence relations will contain @code{n*n} elements.  However if the
 517 analyzed loop contains side effects, such as calls that potentially can
 518 interfere with the data references in the current analyzed loop, the
 519 analysis stops while scanning the loop body for data references, and
 520 inserts a single @code{chrec_dont_know} in the dependence relation
 521 array.
 522
 523 The data references are discovered in a particular order during the
 524 scanning of the loop body: the loop body is analyzed in execution order,
 525 and the data references of each statement are pushed at the end of the
 526 data reference array.  Two data references syntactically occur in the
 527 program in the same order as in the array of data references.  This
 528 syntactic order is important in some classical data dependence tests,
 529 and mapping this order to the elements of this array avoids costly
 530 queries to the loop body representation.
 531
 532 Three types of data references are currently handled: ARRAY_REF,
 533 INDIRECT_REF and COMPONENT_REF@. The data structure for the data reference
 534 is @code{data_reference}, where @code{data_reference_p} is a name of a
 535 pointer to the data reference structure. The structure contains the
 536 following elements:
 537
 538 @itemize
 539 @item @code{base_object_info}: Provides information about the base object
 540 of the data reference and its access functions. These access functions
 541 represent the evolution of the data reference in the loop relative to
 542 its base, in keeping with the classical meaning of the data reference
 543 access function for the support of arrays. For example, for a reference
 544 @code{a.b[i][j]}, the base object is @code{a.b} and the access functions,
 545 one for each array subscript, are:
 546 @code{@{i_init, + i_step@}_1, @{j_init, +, j_step@}_2}.
 547
 548 @item @code{first_location_in_loop}: Provides information about the first
 549 location accessed by the data reference in the loop and about the access
 550 function used to represent evolution relative to this location. This data
 551 is used to support pointers, and is not used for arrays (for which we
 552 have base objects). Pointer accesses are represented as a one-dimensional
 553 access that starts from the first location accessed in the loop. For
 554 example:
 555
 556 @smallexample
 557       for1 i
 558          for2 j
 559           *((int *)p + i + j) = a[i][j];
 560 @end smallexample
 561
 562 The access function of the pointer access is @code{@{0, + 4B@}_for2}
 563 relative to @code{p + i}. The access functions of the array are
 564 @code{@{i_init, + i_step@}_for1} and @code{@{j_init, +, j_step@}_for2}
 565 relative to @code{a}.
 566
 567 Usually, the object the pointer refers to is either unknown, or we cannot
 568 prove that the access is confined to the boundaries of a certain object.
 569
 570 Two data references can be compared only if at least one of these two
 571 representations has all its fields filled for both data references.
 572
 573 The current strategy for data dependence tests is as follows:
 574 If both @code{a} and @code{b} are represented as arrays, compare
 575 @code{a.base_object} and @code{b.base_object};
 576 if they are equal, apply dependence tests (use access functions based on
 577 base_objects).
 578 Else if both @code{a} and @code{b} are represented as pointers, compare
 579 @code{a.first_location} and @code{b.first_location};
 580 if they are equal, apply dependence tests (use access functions based on
 581 first location).
 582 However, if @code{a} and @code{b} are represented differently, only try
 583 to prove that the bases are definitely different.
 584
 585 @item Aliasing information.
 586 @item Alignment information.
 587 @end itemize
 588
 589 The structure describing the relation between two data references is
 590 @code{data_dependence_relation} and the shorter name for a pointer to
 591 such a structure is @code{ddr_p}.  This structure contains:
 592
 593 @itemize
 594 @item a pointer to each data reference,
 595 @item a tree node @code{are_dependent} that is set to @code{chrec_known}
 596 if the analysis has proved that there is no dependence between these two
 597 data references, @code{chrec_dont_know} if the analysis was not able to
 598 determine any useful result and potentially there could exist a
 599 dependence between these data references, and @code{are_dependent} is
 600 set to @code{NULL_TREE} if there exist a dependence relation between the
 601 data references, and the description of this dependence relation is
 602 given in the @code{subscripts}, @code{dir_vects}, and @code{dist_vects}
 603 arrays,
 604 @item a boolean that determines whether the dependence relation can be
 605 represented by a classical distance vector,
 606 @item an array @code{subscripts} that contains a description of each
 607 subscript of the data references.  Given two array accesses a
 608 subscript is the tuple composed of the access functions for a given
 609 dimension.  For example, given @code{A[f1][f2][f3]} and
 610 @code{B[g1][g2][g3]}, there are three subscripts: @code{(f1, g1), (f2,
 611 g2), (f3, g3)}.
 612 @item two arrays @code{dir_vects} and @code{dist_vects} that contain
 613 classical representations of the data dependences under the form of
 614 direction and distance dependence vectors,
 615 @item an array of loops @code{loop_nest} that contains the loops to
 616 which the distance and direction vectors refer to.
 617 @end itemize
 618
 619 Several functions for pretty printing the information extracted by the
 620 data dependence analysis are available: @code{dump_ddrs} prints with a
 621 maximum verbosity the details of a data dependence relations array,
 622 @code{dump_dist_dir_vectors} prints only the classical distance and
 623 direction vectors for a data dependence relations array, and
 624 @code{dump_data_references} prints the details of the data references
 625 contained in a data reference array.