gcc/doc/lto.texi

   1 @c Copyright (c) 2010 Free Software Foundation, Inc.
   2 @c Free Software Foundation, Inc.
   3 @c This is part of the GCC manual.
   4 @c For copying conditions, see the file gcc.texi.
   5 @c Contributed by Jan Hubicka <jh@suse.cz> and
   6 @c Diego Novillo <dnovillo@google.com>
   7
   8 @node LTO
   9 @chapter Link Time Optimization
  10 @cindex lto
  11 @cindex whopr
  12 @cindex wpa
  13 @cindex ltrans
  14
  15 Link Time Optimization (LTO) gives GCC the capability of
  16 dumping its internal representation (GIMPLE) to disk,
  17 so that all the different compilation units that make up
  18 a single executable can be optimized as a single module.
  19 This expands the scope of inter-procedural optimizations
  20 to encompass the whole program (or, rather, everything
  21 that is visible at link time).
  22
  23 @menu
  24 * LTO Overview::            Overview of LTO.
  25 * LTO object file layout::  LTO file sections in ELF.
  26 * IPA::                     Using summary information in IPA passes.
  27 * WHOPR::                   Whole program assumptions,
  28                             linker plugin and symbol visibilities.
  29 * Internal flags::          Internal flags controlling @code{lto1}.
  30 @end menu
  31
  32 @node LTO Overview
  33 @section Design Overview
  34
  35 Link time optimization is implemented as a GCC front end for a
  36 bytecode representation of GIMPLE that is emitted in special sections
  37 of @code{.o} files.  Currently, LTO support is enabled in most
  38 ELF-based systems, as well as darwin, cygwin and mingw systems.
  39
  40 Since GIMPLE bytecode is saved alongside final object code, object
  41 files generated with LTO support are larger than regular object files.
  42 This ``fat'' object format makes it easy to integrate LTO into
  43 existing build systems, as one can, for instance, produce archives of
  44 the files.  Additionally, one might be able to ship one set of fat
  45 objects which could be used both for development and the production of
  46 optimized builds.  A, perhaps surprising, side effect of this feature
  47 is that any mistake in the toolchain that leads to LTO information not
  48 being used (e.g.@: an older @code{libtool} calling @code{ld} directly).
  49 This is both an advantage, as the system is more robust, and a
  50 disadvantage, as the user is not informed that the optimization has
  51 been disabled.
  52
  53 The current implementation only produces ``fat'' objects, effectively
  54 doubling compilation time and increasing file sizes up to 5x the
  55 original size.  This hides the problem that some tools, such as
  56 @code{ar} and @code{nm}, need to understand symbol tables of LTO
  57 sections.  These tools were extended to use the plugin infrastructure,
  58 and with these problems solved, GCC will also support ``slim'' objects
  59 consisting of the intermediate code alone.
  60
  61 At the highest level, LTO splits the compiler in two.  The first half
  62 (the ``writer'') produces a streaming representation of all the
  63 internal data structures needed to optimize and generate code.  This
  64 includes declarations, types, the callgraph and the GIMPLE representation
  65 of function bodies.
  66
  67 When @option{-flto} is given during compilation of a source file, the
  68 pass manager executes all the passes in @code{all_lto_gen_passes}.
  69 Currently, this phase is composed of two IPA passes:
  70
  71 @itemize @bullet
  72 @item @code{pass_ipa_lto_gimple_out}
  73 This pass executes the function @code{lto_output} in
  74 @file{lto-streamer-out.c}, which traverses the call graph encoding
  75 every reachable declaration, type and function.  This generates a
  76 memory representation of all the file sections described below.
  77
  78 @item @code{pass_ipa_lto_finish_out}
  79 This pass executes the function @code{produce_asm_for_decls} in
  80 @file{lto-streamer-out.c}, which takes the memory image built in the
  81 previous pass and encodes it in the corresponding ELF file sections.
  82 @end itemize
  83
  84 The second half of LTO support is the ``reader''.  This is implemented
  85 as the GCC front end @file{lto1} in @file{lto/lto.c}.  When
  86 @file{collect2} detects a link set of @code{.o}/@code{.a} files with
  87 LTO information and the @option{-flto} is enabled, it invokes
  88 @file{lto1} which reads the set of files and aggregates them into a
  89 single translation unit for optimization.  The main entry point for
  90 the reader is @file{lto/lto.c}:@code{lto_main}.
  91
  92 @subsection LTO modes of operation
  93
  94 One of the main goals of the GCC link-time infrastructure was to allow
  95 effective compilation of large programs.  For this reason GCC implements two
  96 link-time compilation modes.
  97
  98 @enumerate
  99 @item   @emph{LTO mode}, in which the whole program is read into the
 100 compiler at link-time and optimized in a similar way as if it
 101 were a single source-level compilation unit.
 102
 103 @item   @emph{WHOPR or partitioned mode}, designed to utilize multiple
 104 CPUs and/or a distributed compilation environment to quickly link
 105 large applications.  WHOPR stands for WHOle Program optimizeR (not to
 106 be confused with the semantics of @option{-fwhole-program}).  It
 107 partitions the aggregated callgraph from many different @code{.o}
 108 files and distributes the compilation of the sub-graphs to different
 109 CPUs.
 110
 111 Note that distributed compilation is not implemented yet, but since
 112 the parallelism is facilitated via generating a @code{Makefile}, it
 113 would be easy to implement.
 114 @end enumerate
 115
 116 WHOPR splits LTO into three main stages:
 117 @enumerate
 118 @item Local generation (LGEN)
 119 This stage executes in parallel.  Every file in the program is compiled
 120 into the intermediate language and packaged together with the local
 121 call-graph and summary information.  This stage is the same for both
 122 the LTO and WHOPR compilation mode.
 123
 124 @item Whole Program Analysis (WPA)
 125 WPA is performed sequentially.  The global call-graph is generated, and
 126 a global analysis procedure makes transformation decisions.  The global
 127 call-graph is partitioned to facilitate parallel optimization during
 128 phase 3.  The results of the WPA stage are stored into new object files
 129 which contain the partitions of program expressed in the intermediate
 130 language and the optimization decisions.
 131
 132 @item Local transformations (LTRANS)
 133 This stage executes in parallel.  All the decisions made during phase 2
 134 are implemented locally in each partitioned object file, and the final
 135 object code is generated.  Optimizations which cannot be decided
 136 efficiently during the phase 2 may be performed on the local
 137 call-graph partitions.
 138 @end enumerate
 139
 140 WHOPR can be seen as an extension of the usual LTO mode of
 141 compilation.  In LTO, WPA and LTRANS are executed within a single
 142 execution of the compiler, after the whole program has been read into
 143 memory.
 144
 145 When compiling in WHOPR mode, the callgraph is partitioned during
 146 the WPA stage.  The whole program is split into a given number of
 147 partitions of roughly the same size.  The compiler tries to
 148 minimize the number of references which cross partition boundaries.
 149 The main advantage of WHOPR is to allow the parallel execution of
 150 LTRANS stages, which are the most time-consuming part of the
 151 compilation process.  Additionally, it avoids the need to load the
 152 whole program into memory.
 153
 154
 155 @node LTO object file layout
 156 @section LTO file sections
 157
 158 LTO information is stored in several ELF sections inside object files.
 159 Data structures and enum codes for sections are defined in
 160 @file{lto-streamer.h}.
 161
 162 These sections are emitted from @file{lto-streamer-out.c} and mapped
 163 in all at once from @file{lto/lto.c}:@code{lto_file_read}.  The
 164 individual functions dealing with the reading/writing of each section
 165 are described below.
 166
 167 @itemize @bullet
 168 @item Command line options (@code{.gnu.lto_.opts})
 169
 170 This section contains the command line options used to generate the
 171 object files.  This is used at link time to determine the optimization
 172 level and other settings when they are not explicitly specified at the
 173 linker command line.
 174
 175 Currently, GCC does not support combining LTO object files compiled
 176 with different set of the command line options into a single binary.
 177 At link time, the options given on the command line and the options
 178 saved on all the files in a link-time set are applied globally.  No
 179 attempt is made at validating the combination of flags (other than the
 180 usual validation done by option processing).  This is implemented in
 181 @file{lto/lto.c}:@code{lto_read_all_file_options}.
 182
 183
 184 @item Symbol table (@code{.gnu.lto_.symtab})
 185
 186 This table replaces the ELF symbol table for functions and variables
 187 represented in the LTO IL.  Symbols used and exported by the optimized
 188 assembly code of ``fat'' objects might not match the ones used and
 189 exported by the intermediate code.  This table is necessary because
 190 the intermediate code is less optimized and thus requires a separate
 191 symbol table.
 192
 193 Additionally, the binary code in the ``fat'' object will lack a call
 194 to a function, since the call was optimized out at compilation time
 195 after the intermediate language was streamed out.  In some special
 196 cases, the same optimization may not happen during link-time
 197 optimization.  This would lead to an undefined symbol if only one
 198 symbol table was used.
 199
 200 The symbol table is emitted in
 201 @file{lto-streamer-out.c}:@code{produce_symtab}.
 202
 203
 204 @item Global declarations and types (@code{.gnu.lto_.decls})
 205
 206 This section contains an intermediate language dump of all
 207 declarations and types required to represent the callgraph, static
 208 variables and top-level debug info.
 209
 210 The contents of this section are emitted in
 211 @file{lto-streamer-out.c}:@code{produce_asm_for_decls}.  Types and
 212 symbols are emitted in a topological order that preserves the sharing
 213 of pointers when the file is read back in
 214 (@file{lto.c}:@code{read_cgraph_and_symbols}).
 215
 216
 217 @item The callgraph (@code{.gnu.lto_.cgraph})
 218
 219 This section contains the basic data structure used by the GCC
 220 inter-procedural optimization infrastructure.  This section stores an
 221 annotated multi-graph which represents the functions and call sites as
 222 well as the variables, aliases and top-level @code{asm} statements.
 223
 224 This section is emitted in
 225 @file{lto-streamer-out.c}:@code{output_cgraph} and read in
 226 @file{lto-cgraph.c}:@code{input_cgraph}.
 227
 228
 229 @item IPA references (@code{.gnu.lto_.refs})
 230
 231 This section contains references between function and static
 232 variables.  It is emitted by @file{lto-cgraph.c}:@code{output_refs}
 233 and read by @file{lto-cgraph.c}:@code{input_refs}.
 234
 235
 236 @item Function bodies (@code{.gnu.lto_.function_body.<name>})
 237
 238 This section contains function bodies in the intermediate language
 239 representation.  Every function body is in a separate section to allow
 240 copying of the section independently to different object files or
 241 reading the function on demand.
 242
 243 Functions are emitted in
 244 @file{lto-streamer-out.c}:@code{output_function} and read in
 245 @file{lto-streamer-in.c}:@code{input_function}.
 246
 247
 248 @item Static variable initializers (@code{.gnu.lto_.vars})
 249
 250 This section contains all the symbols in the global variable pool.  It
 251 is emitted by @file{lto-cgraph.c}:@code{output_varpool} and read in
 252 @file{lto-cgraph.c}:@code{input_cgraph}.
 253
 254 @item Summaries and optimization summaries used by IPA passes
 255 (@code{.gnu.lto_.<xxx>}, where @code{<xxx>} is one of @code{jmpfuncs},
 256 @code{pureconst} or @code{reference})
 257
 258 These sections are used by IPA passes that need to emit summary
 259 information during LTO generation to be read and aggregated at
 260 link time.  Each pass is responsible for implementing two pass manager
 261 hooks: one for writing the summary and another for reading it in.  The
 262 format of these sections is entirely up to each individual pass.  The
 263 only requirement is that the writer and reader hooks agree on the
 264 format.
 265 @end itemize
 266
 267
 268 @node IPA
 269 @section Using summary information in IPA passes
 270
 271 Programs are represented internally as a @emph{callgraph} (a
 272 multi-graph where nodes are functions and edges are call sites)
 273 and a @emph{varpool} (a list of static and external variables in
 274 the program).
 275
 276 The inter-procedural optimization is organized as a sequence of
 277 individual passes, which operate on the callgraph and the
 278 varpool.  To make the implementation of WHOPR possible, every
 279 inter-procedural optimization pass is split into several stages
 280 that are executed at different times during WHOPR compilation:
 281
 282 @itemize @bullet
 283 @item LGEN time
 284 @enumerate
 285 @item @emph{Generate summary} (@code{generate_summary} in
 286 @code{struct ipa_opt_pass_d}).  This stage analyzes every function
 287 body and variable initializer is examined and stores relevant
 288 information into a pass-specific data structure.
 289
 290 @item @emph{Write summary} (@code{write_summary} in
 291 @code{struct ipa_opt_pass_d}).  This stage writes all the
 292 pass-specific information generated by @code{generate_summary}.
 293 Summaries go into their own @code{LTO_section_*} sections that
 294 have to be declared in @file{lto-streamer.h}:@code{enum
 295 lto_section_type}.  A new section is created by calling
 296 @code{create_output_block} and data can be written using the
 297 @code{lto_output_*} routines.
 298 @end enumerate
 299
 300 @item WPA time
 301 @enumerate
 302 @item @emph{Read summary} (@code{read_summary} in
 303 @code{struct ipa_opt_pass_d}).  This stage reads all the
 304 pass-specific information in exactly the same order that it was
 305 written by @code{write_summary}.
 306
 307 @item @emph{Execute} (@code{execute} in @code{struct
 308 opt_pass}).  This performs inter-procedural propagation.  This
 309 must be done without actual access to the individual function
 310 bodies or variable initializers.  Typically, this results in a
 311 transitive closure operation over the summary information of all
 312 the nodes in the callgraph.
 313
 314 @item @emph{Write optimization summary}
 315 (@code{write_optimization_summary} in @code{struct
 316 ipa_opt_pass_d}).  This writes the result of the inter-procedural
 317 propagation into the object file.  This can use the same data
 318 structures and helper routines used in @code{write_summary}.
 319 @end enumerate
 320
 321 @item LTRANS time
 322 @enumerate
 323 @item @emph{Read optimization summary}
 324 (@code{read_optimization_summary} in @code{struct
 325 ipa_opt_pass_d}).  The counterpart to
 326 @code{write_optimization_summary}.  This reads the interprocedural
 327 optimization decisions in exactly the same format emitted by
 328 @code{write_optimization_summary}.
 329
 330 @item @emph{Transform} (@code{function_transform} and
 331 @code{variable_transform} in @code{struct ipa_opt_pass_d}).
 332 The actual function bodies and variable initializers are updated
 333 based on the information passed down from the @emph{Execute} stage.
 334 @end enumerate
 335 @end itemize
 336
 337 The implementation of the inter-procedural passes are shared
 338 between LTO, WHOPR and classic non-LTO compilation.
 339
 340 @itemize
 341 @item During the traditional file-by-file mode every pass executes its
 342 own @emph{Generate summary}, @emph{Execute}, and @emph{Transform}
 343 stages within the single execution context of the compiler.
 344
 345 @item In LTO compilation mode, every pass uses @emph{Generate
 346 summary} and @emph{Write summary} stages at compilation time,
 347 while the @emph{Read summary}, @emph{Execute}, and
 348 @emph{Transform} stages are executed at link time.
 349
 350 @item In WHOPR mode all stages are used.
 351 @end itemize
 352
 353 To simplify development, the GCC pass manager differentiates
 354 between normal inter-procedural passes and small inter-procedural
 355 passes.  A @emph{small inter-procedural pass}
 356 (@code{SIMPLE_IPA_PASS}) is a pass that does
 357 everything at once and thus it can not be executed during WPA in
 358 WHOPR mode.  It defines only the @emph{Execute} stage and during
 359 this stage it accesses and modifies the function bodies.  Such
 360 passes are useful for optimization at LGEN or LTRANS time and are
 361 used, for example, to implement early optimization before writing
 362 object files.  The simple inter-procedural passes can also be used
 363 for easier prototyping and development of a new inter-procedural
 364 pass.
 365
 366
 367 @subsection Virtual clones
 368
 369 One of the main challenges of introducing the WHOPR compilation
 370 mode was addressing the interactions between optimization passes.
 371 In LTO compilation mode, the passes are executed in a sequence,
 372 each of which consists of analysis (or @emph{Generate summary}),
 373 propagation (or @emph{Execute}) and @emph{Transform} stages.
 374 Once the work of one pass is finished, the next pass sees the
 375 updated program representation and can execute.  This makes the
 376 individual passes dependent on each other.
 377
 378 In WHOPR mode all passes first execute their @emph{Generate
 379 summary} stage.  Then summary writing marks the end of the LGEN
 380 stage.  At WPA time,
 381 the summaries are read back into memory and all passes run the
 382 @emph{Execute} stage.  Optimization summaries are streamed and
 383 sent to LTRANS, where all the passes execute the @emph{Transform}
 384 stage.
 385
 386 Most optimization passes split naturally into analysis,
 387 propagation and transformation stages.  But some do not.  The
 388 main problem arises when one pass performs changes and the
 389 following pass gets confused by seeing different callgraphs
 390 between the @emph{Transform} stage and the @emph{Generate summary}
 391 or @emph{Execute} stage.  This means that the passes are required
 392 to communicate their decisions with each other.
 393
 394 To facilitate this communication, the GCC callgraph
 395 infrastructure implements @emph{virtual clones}, a method of
 396 representing the changes performed by the optimization passes in
 397 the callgraph without needing to update function bodies.
 398
 399 A @emph{virtual clone} in the callgraph is a function that has no
 400 associated body, just a description of how to create its body based
 401 on a different function (which itself may be a virtual clone).
 402
 403 The description of function modifications includes adjustments to
 404 the function's signature (which allows, for example, removing or
 405 adding function arguments), substitutions to perform on the
 406 function body, and, for inlined functions, a pointer to the
 407 function that it will be inlined into.
 408
 409 It is also possible to redirect any edge of the callgraph from a
 410 function to its virtual clone.  This implies updating of the call
 411 site to adjust for the new function signature.
 412
 413 Most of the transformations performed by inter-procedural
 414 optimizations can be represented via virtual clones.  For
 415 instance, a constant propagation pass can produce a virtual clone
 416 of the function which replaces one of its arguments by a
 417 constant.  The inliner can represent its decisions by producing a
 418 clone of a function whose body will be later integrated into
 419 a given function.
 420
 421 Using @emph{virtual clones}, the program can be easily updated
 422 during the @emph{Execute} stage, solving most of pass interactions
 423 problems that would otherwise occur during @emph{Transform}.
 424
 425 Virtual clones are later materialized in the LTRANS stage and
 426 turned into real functions.  Passes executed after the virtual
 427 clone were introduced also perform their @emph{Transform} stage
 428 on new functions, so for a pass there is no significant
 429 difference between operating on a real function or a virtual
 430 clone introduced before its @emph{Execute} stage.
 431
 432 Optimization passes then work on virtual clones introduced before
 433 their @emph{Execute} stage as if they were real functions.  The
 434 only difference is that clones are not visible during the
 435 @emph{Generate Summary} stage.
 436
 437 To keep function summaries updated, the callgraph interface
 438 allows an optimizer to register a callback that is called every
 439 time a new clone is introduced as well as when the actual
 440 function or variable is generated or when a function or variable
 441 is removed.  These hooks are registered in the @emph{Generate
 442 summary} stage and allow the pass to keep its information intact
 443 until the @emph{Execute} stage.  The same hooks can also be
 444 registered during the @emph{Execute} stage to keep the
 445 optimization summaries updated for the @emph{Transform} stage.
 446
 447 @subsection IPA references
 448
 449 GCC represents IPA references in the callgraph.  For a function
 450 or variable @code{A}, the @emph{IPA reference} is a list of all
 451 locations where the address of @code{A} is taken and, when
 452 @code{A} is a variable, a list of all direct stores and reads
 453 to/from @code{A}.  References represent an oriented multi-graph on
 454 the union of nodes of the callgraph and the varpool.  See
 455 @file{ipa-reference.c}:@code{ipa_reference_write_optimization_summary}
 456 and
 457 @file{ipa-reference.c}:@code{ipa_reference_read_optimization_summary}
 458 for details.
 459
 460 @subsection Jump functions
 461 Suppose that an optimization pass sees a function @code{A} and it
 462 knows the values of (some of) its arguments.  The @emph{jump
 463 function} describes the value of a parameter of a given function
 464 call in function @code{A} based on this knowledge.
 465
 466 Jump functions are used by several optimizations, such as the
 467 inter-procedural constant propagation pass and the
 468 devirtualization pass.  The inliner also uses jump functions to
 469 perform inlining of callbacks.
 470
 471 @node WHOPR
 472 @section Whole program assumptions, linker plugin and symbol visibilities
 473
 474 Link-time optimization gives relatively minor benefits when used
 475 alone.  The problem is that propagation of inter-procedural
 476 information does not work well across functions and variables
 477 that are called or referenced by other compilation units (such as
 478 from a dynamically linked library).  We say that such functions
 479 and variables are @emph{externally visible}.
 480
 481 To make the situation even more difficult, many applications
 482 organize themselves as a set of shared libraries, and the default
 483 ELF visibility rules allow one to overwrite any externally
 484 visible symbol with a different symbol at runtime.  This
 485 basically disables any optimizations across such functions and
 486 variables, because the compiler cannot be sure that the function
 487 body it is seeing is the same function body that will be used at
 488 runtime.  Any function or variable not declared @code{static} in
 489 the sources degrades the quality of inter-procedural
 490 optimization.
 491
 492 To avoid this problem the compiler must assume that it sees the
 493 whole program when doing link-time optimization.  Strictly
 494 speaking, the whole program is rarely visible even at link-time.
 495 Standard system libraries are usually linked dynamically or not
 496 provided with the link-time information.  In GCC, the whole
 497 program option (@option{-fwhole-program}) asserts that every
 498 function and variable defined in the current compilation
 499 unit is static, except for function @code{main} (note: at
 500 link time, the current unit is the union of all objects compiled
 501 with LTO).  Since some functions and variables need to
 502 be referenced externally, for example by another DSO or from an
 503 assembler file, GCC also provides the function and variable
 504 attribute @code{externally_visible} which can be used to disable
 505 the effect of @option{-fwhole-program} on a specific symbol.
 506
 507 The whole program mode assumptions are slightly more complex in
 508 C++, where inline functions in headers are put into @emph{COMDAT}
 509 sections.  COMDAT function and variables can be defined by
 510 multiple object files and their bodies are unified at link-time
 511 and dynamic link-time.  COMDAT functions are changed to local only
 512 when their address is not taken and thus un-sharing them with a
 513 library is not harmful.  COMDAT variables always remain externally
 514 visible, however for readonly variables it is assumed that their
 515 initializers cannot be overwritten by a different value.
 516
 517 GCC provides the function and variable attribute
 518 @code{visibility} that can be used to specify the visibility of
 519 externally visible symbols (or alternatively an
 520 @option{-fdefault-visibility} command line option).  ELF defines
 521 the @code{default}, @code{protected}, @code{hidden} and
 522 @code{internal} visibilities.
 523
 524 The most commonly used is visibility is @code{hidden}.  It
 525 specifies that the symbol cannot be referenced from outside of
 526 the current shared library.  Unfortunately, this information
 527 cannot be used directly by the link-time optimization in the
 528 compiler since the whole shared library also might contain
 529 non-LTO objects and those are not visible to the compiler.
 530
 531 GCC solves this problem using linker plugins.  A @emph{linker
 532 plugin} is an interface to the linker that allows an external
 533 program to claim the ownership of a given object file.  The linker
 534 then performs the linking procedure by querying the plugin about
 535 the symbol table of the claimed objects and once the linking
 536 decisions are complete, the plugin is allowed to provide the
 537 final object file before the actual linking is made.  The linker
 538 plugin obtains the symbol resolution information which specifies
 539 which symbols provided by the claimed objects are bound from the
 540 rest of a binary being linked.
 541
 542 Currently, the linker plugin  works only in combination
 543 with the Gold linker, but a GNU ld implementation is under
 544 development.
 545
 546 GCC is designed to be independent of the rest of the toolchain
 547 and aims to support linkers without plugin support.  For this
 548 reason it does not use the linker plugin by default.  Instead,
 549 the object files are examined by @command{collect2} before being
 550 passed to the linker and objects found to have LTO sections are
 551 passed to @command{lto1} first.  This mode does not work for
 552 library archives.  The decision on what object files from the
 553 archive are needed depends on the actual linking and thus GCC
 554 would have to implement the linker itself.  The resolution
 555 information is missing too and thus GCC needs to make an educated
 556 guess based on @option{-fwhole-program}.  Without the linker
 557 plugin GCC also assumes that symbols are declared @code{hidden}
 558 and not referred by non-LTO code by default.
 559
 560 @node Internal flags
 561 @section Internal flags controlling @code{lto1}
 562
 563 The following flags are passed into @command{lto1} and are not
 564 meant to be used directly from the command line.
 565
 566 @itemize
 567 @item -fwpa
 568 @opindex fwpa
 569 This option runs the serial part of the link-time optimizer
 570 performing the inter-procedural propagation (WPA mode).  The
 571 compiler reads in summary information from all inputs and
 572 performs an analysis based on summary information only.  It
 573 generates object files for subsequent runs of the link-time
 574 optimizer where individual object files are optimized using both
 575 summary information from the WPA mode and the actual function
 576 bodies.  It then drives the LTRANS phase.
 577
 578 @item -fltrans
 579 @opindex fltrans
 580 This option runs the link-time optimizer in the
 581 local-transformation (LTRANS) mode, which reads in output from a
 582 previous run of the LTO in WPA mode.  In the LTRANS mode, LTO
 583 optimizes an object and produces the final assembly.
 584
 585 @item -fltrans-output-list=@var{file}
 586 @opindex fltrans-output-list
 587 This option specifies a file to which the names of LTRANS output
 588 files are written.  This option is only meaningful in conjunction
 589 with @option{-fwpa}.
 590 @end itemize