libstdc++-v3/doc/xml/manual/profile_mode.xml

   1 <chapter xmlns="http://docbook.org/ns/docbook" version="5.0"
   2          xml:id="manual.ext.profile_mode" xreflabel="Profile Mode">
   3 <?dbhtml filename="profile_mode.html"?>
   4
   5 <info><title>Profile Mode</title>
   6   <keywordset>
   7     <keyword>C++</keyword>
   8     <keyword>library</keyword>
   9     <keyword>profile</keyword>
  10   </keywordset>
  11 </info>
  12
  13
  14
  15
  16 <section xml:id="manual.ext.profile_mode.intro" xreflabel="Intro"><info><title>Intro</title></info>
  17
  18   <para>
  19   <emphasis>Goal: </emphasis>Give performance improvement advice based on
  20   recognition of suboptimal usage patterns of the standard library.
  21   </para>
  22
  23   <para>
  24   <emphasis>Method: </emphasis>Wrap the standard library code.  Insert
  25   calls to an instrumentation library to record the internal state of
  26   various components at interesting entry/exit points to/from the standard
  27   library.  Process trace, recognize suboptimal patterns, give advice.
  28   For details, see the
  29   <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://ieeexplore.ieee.org/document/4907670/">Perflint
  30   paper presented at CGO 2009</link>.
  31   </para>
  32   <para>
  33   <emphasis>Strengths: </emphasis>
  34 <itemizedlist>
  35   <listitem><para>
  36   Unintrusive solution.  The application code does not require any
  37   modification.
  38   </para></listitem>
  39   <listitem><para> The advice is call-context sensitive, and is thus capable of
  40   precisely identifying interesting dynamic performance behavior.
  41   </para></listitem>
  42   <listitem><para>
  43   The overhead model is pay-per-view.  When you turn off a diagnostic class
  44   at compile time, its overhead disappears.
  45   </para></listitem>
  46 </itemizedlist>
  47   </para>
  48   <para>
  49   <emphasis>Drawbacks: </emphasis>
  50 <itemizedlist>
  51   <listitem><para>
  52   You must recompile the application code with custom options.
  53   </para></listitem>
  54   <listitem><para>You must run the application on representative input.
  55   The advice is input dependent.
  56   </para></listitem>
  57   <listitem><para>
  58   The execution time will increase, in some cases by factors.
  59   </para></listitem>
  60 </itemizedlist>
  61   </para>
  62
  63
  64 <section xml:id="manual.ext.profile_mode.using" xreflabel="Using"><info><title>Using the Profile Mode</title></info>
  65
  66
  67   <para>
  68   This is the anticipated common workflow for program <code>foo.cc</code>:
  69 <programlisting>
  70 $ cat foo.cc
  71 #include &lt;vector&gt;
  72 int main() {
  73   std::vector&lt;int&gt; v;
  74   for (int k = 0; k &lt; 1024; ++k) v.insert(v.begin(), k);
  75 }
  76
  77 $ g++ -D_GLIBCXX_PROFILE foo.cc
  78 $ ./a.out
  79 $ cat libstdcxx-profile.txt
  80 vector-to-list: improvement = 5: call stack = 0x804842c ...
  81     : advice = change std::vector to std::list
  82 vector-size: improvement = 3: call stack = 0x804842c ...
  83     : advice = change initial container size from 0 to 1024
  84 </programlisting>
  85   </para>
  86
  87   <para>
  88   Anatomy of a warning:
  89   <itemizedlist>
  90   <listitem>
  91   <para>
  92   Warning id.  This is a short descriptive string for the class
  93   that this warning belongs to.  E.g., "vector-to-list".
  94   </para>
  95   </listitem>
  96   <listitem>
  97   <para>
  98   Estimated improvement.  This is an approximation of the benefit expected
  99   from implementing the change suggested by the warning.  It is given on
 100   a log10 scale.  Negative values mean that the alternative would actually
 101   do worse than the current choice.
 102   In the example above, 5 comes from the fact that the overhead of
 103   inserting at the beginning of a vector vs. a list is around 1024 * 1024 / 2,
 104   which is on the order of 10^5.  The improvement from setting the initial size to
 105   1024 is on the order of 10^3, since the overhead of dynamic resizing is
 106   linear in this case.
 107   </para>
 108   </listitem>
 109   <listitem>
 110   <para>
 111   Call stack.  Currently, the addresses are printed without
 112   symbol name or code location attribution.
 113   Users are expected to postprocess the output using, for instance, addr2line.
 114   </para>
 115   </listitem>
 116   <listitem>
 117   <para>
 118   The warning message.  For some warnings, this is static text, e.g.,
 119   "change vector to list".  For other warnings, such as the one above,
 120   the message contains numeric advice, e.g., the suggested initial size
 121   of the vector.
 122   </para>
 123   </listitem>
 124   </itemizedlist>
 125   </para>
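  <para>
  As a rough illustration of where the two estimates in the example come
  from, the sketch below counts element moves directly.  The helper names are
  hypothetical.  The capacity-doubling growth policy and the truncation of
  the base-10 logarithm to an integer are assumptions made for this
  illustration; they are not a description of the cost formulas actually
  used by the profile mode.
  </para>
<programlisting><![CDATA[
```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper: number of element shifts caused by inserting n
// elements, one at a time, at the front of an initially empty vector.
// The k-th insertion (k = 0, 1, ...) shifts the k elements already present.
constexpr std::size_t front_insert_shifts(std::size_t n) {
  return n * (n - 1) / 2;
}

// Hypothetical helper: number of element copies caused by reallocation
// while growing an empty vector to n elements, ASSUMING the capacity
// doubles on each reallocation (1, 2, 4, ...).
constexpr std::size_t resize_copies(std::size_t n) {
  std::size_t copies = 0;
  for (std::size_t cap = 1; cap < n; cap *= 2)
    copies += cap;  // all cap elements move into the new, larger buffer
  return copies;
}

// ASSUMED reporting convention: the integer part of log10(cost).
inline int log10_estimate(std::size_t cost) {
  return cost == 0
             ? 0
             : static_cast<int>(std::log10(static_cast<double>(cost)));
}
```
]]></programlisting>
  <para>
  Under these assumptions, 1024 front insertions cause 523776 shifts and
  1023 reallocation copies, whose truncated base-10 logarithms are 5 and 3.
  </para>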
 126
 127   <para>Three files are generated.  <code>libstdcxx-profile.txt</code>
 128    contains human-readable advice.  <code>libstdcxx-profile.raw</code>
 129    contains implementation-specific data about each diagnostic.
 130    Its format is not documented, but the data is sufficient to generate
 131    all the advice given in <code>libstdcxx-profile.txt</code>.  The advantage
 132    of keeping this raw format is that traces from multiple executions can
 133    be aggregated simply by concatenating the raw traces.  We intend to
 134    offer an external utility program that can issue advice from a trace.
 135    <code>libstdcxx-profile.conf.out</code> lists the actual diagnostic
 136    parameters used.  To alter parameters, edit this file and rename it to
 137    <code>libstdcxx-profile.conf</code>.
 138   </para>
 139
 140   <para>Advice is given regardless of whether the transformation is valid.
 141   For instance, we advise changing a map to an unordered_map even if the
 142   application semantics require that data be ordered.
 143   We believe such warnings can help users understand the performance
 144   behavior of their application better, which can lead to changes
 145   at a higher abstraction level.
 146   </para>
 147
 148 </section>
 149
 150 <section xml:id="manual.ext.profile_mode.tuning" xreflabel="Tuning"><info><title>Tuning the Profile Mode</title></info>
 151
 152
 153   <para>Compile time switches and environment variables (see also file
 154    profiler.h).  Unless specified otherwise, they can be set at compile time
 155    using -D_&lt;name&gt; or by setting variable &lt;name&gt;
 156    in the environment where the program is run, before starting execution.
 157   <itemizedlist>
 158   <listitem><para>
 159    <code>_GLIBCXX_PROFILE_NO_&lt;diagnostic&gt;</code>:
 160    disable specific diagnostics.
 161    See section Diagnostics for possible values.
 162    (Environment variables not supported.)
 163    </para></listitem>
 164   <listitem><para>
 165    <code>_GLIBCXX_PROFILE_TRACE_PATH_ROOT</code>: set an alternative root
 166    path for the output files.
 167    </para></listitem>
 168   <listitem><para><code>_GLIBCXX_PROFILE_MAX_WARN_COUNT</code>: set it to the maximum
 169    number of warnings desired.  The default value is 10.</para></listitem>
 170   <listitem><para>
 171    <code>_GLIBCXX_PROFILE_MAX_STACK_DEPTH</code>: if set to 0,
 172    the advice will
 173    be collected and reported for the program as a whole, and not for each
 174    call context.
 175    This could also be used in continuous regression tests, where you
 176    just need to know whether there is a regression or not.
 177    The default value is 32.
 178    </para></listitem>
 179   <listitem><para>
 180    <code>_GLIBCXX_PROFILE_MEM_PER_DIAGNOSTIC</code>:
 181    set a limit on how much memory to use for the accounting tables for each
 182    diagnostic type.  When this limit is reached, new events are ignored
 183    until the memory usage decreases under the limit.  Generally, this means
 184    that newly created containers will not be instrumented until some
 185    live containers are deleted.  The default is 128 MB.
 186    </para></listitem>
 187   <listitem><para>
 188    <code>_GLIBCXX_PROFILE_NO_THREADS</code>:
 189    Make the library not use threads.  If thread local storage (TLS) is not
 190    available, you will get a preprocessor error asking you to set
 191    -D_GLIBCXX_PROFILE_NO_THREADS if your program is single-threaded.
 192    Multithreaded execution without TLS is not supported.
 193    (Environment variable not supported.)
 194    </para></listitem>
 195   <listitem><para>
 196    <code>_GLIBCXX_HAVE_EXECINFO_H</code>:
 197    This name should be defined automatically at library configuration time.
 198    If your library was configured without <code>execinfo.h</code>, but
 199    you have it in your include path, you can define it explicitly.  Without
 200    it, advice is collected for the program as a whole, and not for each
 201    call context.
 202    (Environment variable not supported.)
 203    </para></listitem>
 204   </itemizedlist>
 205   </para>
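  <para>
  The following is a minimal sketch of how a numeric setting could be given a
  compile-time default and then overridden from the environment.  The macro
  <code>EXAMPLE_MAX_WARN_COUNT</code> and the functions
  <code>read_setting</code> and <code>max_warn_count</code> are hypothetical,
  and the precedence shown (environment over compile-time default) is an
  assumption of this example rather than documented behavior of the library.
  </para>
<programlisting><![CDATA[
```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical compile-time default.  It could be changed with
// -DEXAMPLE_MAX_WARN_COUNT=<n> on the compiler command line.
#ifndef EXAMPLE_MAX_WARN_COUNT
# define EXAMPLE_MAX_WARN_COUNT 10
#endif

// Return the environment variable `name` parsed as an unsigned decimal
// integer, or `fallback` if it is unset, empty, or has trailing garbage.
inline std::size_t read_setting(const char* name, std::size_t fallback) {
  const char* text = std::getenv(name);
  if (text == nullptr || *text == '\0')
    return fallback;
  char* end = nullptr;
  unsigned long long value = std::strtoull(text, &end, 10);
  return *end == '\0' ? static_cast<std::size_t>(value) : fallback;
}

// ASSUMED precedence: the environment overrides the compile-time default.
inline std::size_t max_warn_count() {
  return read_setting("EXAMPLE_MAX_WARN_COUNT", EXAMPLE_MAX_WARN_COUNT);
}
```
]]></programlisting>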
 206
 207 </section>
 208
 209 </section>
 210
 211
 212 <section xml:id="manual.ext.profile_mode.design" xreflabel="Design"><info><title>Design</title></info>
 213 <?dbhtml filename="profile_mode_design.html"?>
 214
 215
 216 <para>
 217 </para>
 218 <table frame="all" xml:id="table.profile_code_loc">
 219 <title>Profile Code Location</title>
 220
 221 <tgroup cols="2" align="left" colsep="1" rowsep="1">
 222 <colspec colname="c1"/>
 223 <colspec colname="c2"/>
 224
 225 <thead>
 226   <row>
 227     <entry>Code Location</entry>
 228     <entry>Use</entry>
 229   </row>
 230 </thead>
 231 <tbody>
 232   <row>
 233     <entry><code>libstdc++-v3/include/std/*</code></entry>
 234     <entry>Preprocessor code to redirect to profile extension headers.</entry>
 235   </row>
 236   <row>
 237     <entry><code>libstdc++-v3/include/profile/*</code></entry>
 238     <entry>Profile extension public headers (map, vector, ...).</entry>
 239   </row>
 240   <row>
 241     <entry><code>libstdc++-v3/include/profile/impl/*</code></entry>
 242     <entry>Profile extension internals.  Implementation files are
 243      only included from <code>impl/profiler.h</code>, which is the only
 244      file included from the public headers.</entry>
 245   </row>
 246 </tbody>
 247 </tgroup>
 248 </table>
 249
 250 <para>
 251 </para>
 252
 253 <section xml:id="manual.ext.profile_mode.design.wrapper" xreflabel="Wrapper"><info><title>Wrapper Model</title></info>
 254
 255   <para>
 256   In order to get our instrumented library version included instead of the
 257   release one,
 258   we use the same wrapper model as the debug mode.
 259   We subclass entities from the release version.  Wherever
 260   <code>_GLIBCXX_PROFILE</code> is defined, the release namespace is
 261   <code>std::__norm</code>, whereas the profile namespace is
 262   <code>std::__profile</code>.  Using plain <code>std</code> translates
 263   into <code>std::__profile</code>.
 264   </para>
 265   <para>
 266   Whenever possible, we try to wrap at the public interface level, e.g.,
 267   in <code>unordered_set</code> rather than in <code>hashtable</code>,
 268   in order not to depend on implementation.
 269   </para>
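  <para>
  The wrapper model can be sketched outside the library as follows.  The
  namespaces <code>example::norm</code> and <code>example::profile</code>
  and the counter <code>insert_events</code> are hypothetical stand-ins;
  the real headers are considerably more involved.  The sketch only shows
  the idea of deriving from the release container at the public interface
  level and forwarding after recording an event.
  </para>
<programlisting><![CDATA[
```cpp
#include <cstddef>
#include <vector>

namespace example {

// Stand-in for the uninstrumented ("release") implementation.
namespace norm {
template <typename T>
using vector = std::vector<T>;
}

// Instrumented wrappers live in their own namespace.
namespace profile {

// Hypothetical counter standing in for a call into the
// instrumentation library.
inline std::size_t& insert_events() {
  static std::size_t count = 0;
  return count;
}

// The wrapper derives from the release container and re-declares only
// the members it needs to observe.
template <typename T>
class vector : public norm::vector<T> {
  typedef norm::vector<T> base;

public:
  typedef typename base::iterator iterator;
  typedef typename base::const_iterator const_iterator;

  using base::base;  // inherit the constructors

  iterator insert(const_iterator pos, const T& value) {
    ++insert_events();                // record the event
    return base::insert(pos, value);  // then forward to the release code
  }
};

}  // namespace profile

// Plain example::vector resolves to the instrumented wrapper.
using profile::vector;

}  // namespace example
```
]]></programlisting>
  <para>
  Code written against <code>example::vector</code> needs no changes: each
  call to <code>insert</code> increments the counter and then behaves like
  the underlying container.
  </para>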
 270   <para>
 271   Mixing object files built with and without the profile mode must
 272   not affect the program execution.  However, there are no guarantees about
 273   the accuracy of diagnostics when using even a single object not built with
 274   <code>-D_GLIBCXX_PROFILE</code>.
 275   Currently, mixing the profile mode with debug and parallel extensions is
 276   not allowed.  Mixing them at compile time will result in preprocessor errors.
 277   Mixing them at link time is undefined.
 278   </para>
 279 </section>
 280
 281
 282 <section xml:id="manual.ext.profile_mode.design.instrumentation" xreflabel="Instrumentation"><info><title>Instrumentation</title></info>
 283
 284   <para>
 285   Instead of instrumenting every public entry and exit point,
 286   we chose to add instrumentation on demand, as needed
 287   by individual diagnostics.
 288   The main reason is that some diagnostics require us to extract bits of
 289   internal state that are particular only to that diagnostic.
 290   We plan to formalize this later, after we learn more about the requirements
 291   of several diagnostics.
 292   </para>
 293   <para>
 294   All the instrumentation points can be switched on and off using
 295   <code>-D[_NO]_GLIBCXX_PROFILE_&lt;diagnostic&gt;</code> options.
 296   With all the instrumentation calls off, there should be negligible
 297   overhead over the release version.  This property is needed to support
 298   diagnostics based on timing of internal operations.  For such diagnostics,
 299   we anticipate turning most of the instrumentation off in order to prevent
 300   profiling overhead from polluting time measurements, and thus diagnostics.
 301   </para>
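  <para>
  A minimal sketch of an instrumentation point that disappears at compile
  time is shown below.  The macro names
  <code>EXAMPLE_PROFILE_NO_MAGIC</code> and
  <code>EXAMPLE_PROFILE_MAGIC_HOOK</code> and the functions are
  hypothetical; they only illustrate the technique of expanding a hook
  macro to nothing when a diagnostic is switched off.
  </para>
<programlisting><![CDATA[
```cpp
#include <cstddef>

// Hypothetical event counter standing in for the instrumentation library.
inline std::size_t& example_event_count() {
  static std::size_t count = 0;
  return count;
}

// Hypothetical switch: compiling with -DEXAMPLE_PROFILE_NO_MAGIC removes
// the hook, so the instrumented operation carries no extra code.
#ifndef EXAMPLE_PROFILE_NO_MAGIC
# define EXAMPLE_PROFILE_MAGIC_HOOK(n) (example_event_count() += (n))
#else
# define EXAMPLE_PROFILE_MAGIC_HOOK(n) ((void)0)
#endif

// A library operation with an instrumentation point at its entry.
inline int example_operation(int x) {
  EXAMPLE_PROFILE_MAGIC_HOOK(1);
  return 2 * x;
}
```
]]></programlisting>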
 302   <para>
 303   All the instrumentation on/off compile time switches live in
 304   <code>include/profile/impl/profiler.h</code>.
 305   </para>
 306 </section>
 307
 308
 309 <section xml:id="manual.ext.profile_mode.design.rtlib" xreflabel="Run Time Behavior"><info><title>Run Time Behavior</title></info>
 310
 311   <para>
 312   For practical reasons, the instrumentation library processes the trace
 313   partially
 314   rather than dumping it to disk in raw form.  Each event is processed when
 315   it occurs.  It is usually assigned a cost and aggregated into
 316   the database of a specific diagnostic class.  The cost model
 317   is based largely on the standard performance guarantees, but in some
 318   cases we use knowledge about GCC's standard library implementation.
 319   </para>
 320   <para>
 321   Information is indexed by (1) call stack and (2) instance id or address
 322   to be able to understand and summarize precise creation-use-destruction
 323   dynamic chains.  Although the analysis is sensitive to dynamic instances,
 324   the reports are only sensitive to call context.  Whenever a dynamic instance
 325   is destroyed, we accumulate its effect to the corresponding entry for the
 326   call stack of its constructor location.
 327   </para>
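  <para>
  The indexing scheme described above can be sketched with two tables, one
  keyed by instance and one keyed by call stack.  All type and member names
  below are hypothetical, and a call stack is simplified to a list of
  integer addresses.  The sketch is single-threaded and ignores locking.
  </para>
<programlisting><![CDATA[
```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// A call context, represented here as a list of return addresses.
typedef std::vector<std::uintptr_t> call_stack;

// Hypothetical per-instance record: where the object was constructed
// and the cost accumulated by events on it so far.
struct object_info {
  call_stack creation_context;
  std::size_t cost;
};

class example_trace {
public:
  void on_construct(const void* object, const call_stack& context) {
    objects_[object] = object_info{context, 0};
  }

  // Each event is processed as it occurs: its cost is added to the instance.
  void on_event(const void* object, std::size_t cost) {
    auto it = objects_.find(object);
    if (it != objects_.end())
      it->second.cost += cost;
  }

  // On destruction, fold the instance total into the entry for the call
  // stack of its constructor, then forget the instance.
  void on_destruct(const void* object) {
    auto it = objects_.find(object);
    if (it == objects_.end())
      return;
    per_context_[it->second.creation_context] += it->second.cost;
    objects_.erase(it);
  }

  std::size_t cost_for(const call_stack& context) const {
    auto it = per_context_.find(context);
    return it == per_context_.end() ? 0 : it->second;
  }

  std::size_t live_objects() const { return objects_.size(); }

private:
  std::map<const void*, object_info> objects_;     // indexed by instance
  std::map<call_stack, std::size_t> per_context_;  // indexed by call stack
};
```
]]></programlisting>
  <para>
  Two objects constructed at the same call site contribute to a single
  report entry once both are destroyed, while objects that are still alive
  remain in the per-instance table.
  </para>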
 328
 329   <para>
 330   For details, see the
 331    <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://dx.doi.org/10.1109/CGO.2009.36">paper presented at
 332    CGO 2009</link>.
 333   </para>
 334 </section>
 335
 336
 337 <section xml:id="manual.ext.profile_mode.design.analysis" xreflabel="Analysis and Diagnostics"><info><title>Analysis and Diagnostics</title></info>
 338
 339   <para>
 340   Final analysis takes place offline, and it is based entirely on the
 341   generated trace and debugging info in the application binary.
 342   See section Diagnostics for a list of analysis types that we plan to support.
 343   </para>
 344   <para>
 345   The input to the analysis is a table indexed by profile type and call stack.
 346   The data type for each entry depends on the profile type.
 347   </para>
 348 </section>
 349
 350
 351 <section xml:id="manual.ext.profile_mode.design.cost-model" xreflabel="Cost Model"><info><title>Cost Model</title></info>
 352
 353   <para>
 354   While it is likely that cost models will become complex as we get into
 355   more sophisticated analysis, we will try to follow a simple set of rules
 356   at the beginning.
 357   </para>
 358 <itemizedlist>
 359   <listitem><para><emphasis>Relative benefit estimation:</emphasis>
 360   The idea is to estimate or measure the cost of all operations
 361   in the original scenario versus the scenario we advise to switch to.
 362   For instance, when advising to change a vector to a list, an occurrence
 363   of the <code>insert</code> method will generally count as a benefit.
 364   Its magnitude depends on (1) the number of elements that get shifted
 365   and (2) whether it triggers a reallocation.
 366   </para></listitem>
 367   <listitem><para><emphasis>Synthetic measurements:</emphasis>
 368   We will measure the relative difference between similar operations on
 369   different containers.  We plan to write a battery of small tests that
 370   compare the times of the executions of similar methods on different
 371   containers.  The idea is to run these tests on the target machine.
 372   If this training phase is very quick, we may decide to perform it at
 373   library initialization time.  The results can be cached on disk and reused
 374   across runs.
 375   </para></listitem>
 376   <listitem><para><emphasis>Timers:</emphasis>
 377   We plan to use timers for operations of larger granularity, such as sort.
 378   For instance, we can switch between different sort methods on the fly
 379   and report the one that performs best for each call context.
 380   </para></listitem>
 381   <listitem><para><emphasis>Show stoppers:</emphasis>
 382   We may decide that the presence of an operation nullifies the advice.
 383   For instance, when considering switching from <code>set</code> to
 384   <code>unordered_set</code>, if we detect use of operator <code>++</code>,
 385   we will simply not issue the advice, since this could signal that the use
 386   case requires a sorted container.</para></listitem>
 387 </itemizedlist>
 388
 389 </section>
 390
 391
 392 <section xml:id="manual.ext.profile_mode.design.reports" xreflabel="Reports"><info><title>Reports</title></info>
 393
 394   <para>
 395 There are two types of reports.  First, if we recognize a pattern for which
 396 we have a substitute that is likely to give better performance, we print
 397 the advice and estimated performance gain.  The advice is usually associated
 398 with a code position and possibly a call stack.
 399   </para>
 400   <para>
 401 Second, we report performance characteristics for which we do not have
 402 a clear solution for improvement.  For instance, we can point the user to
 403 the top 10 <code>multimap</code> locations
 404 which have the worst data locality in actual traversals.
 405 Although this does not offer a solution,
 406 it helps the user focus on the key problems and ignore the uninteresting ones.
 407   </para>
 408 </section>
 409
 410
 411 <section xml:id="manual.ext.profile_mode.design.testing" xreflabel="Testing"><info><title>Testing</title></info>
 412
 413   <para>
 414   First, we want to make sure we preserve the behavior of the release mode.
 415   You can just type <code>make check-profile</code>, which
 416   builds and runs the whole test suite in profile mode.
 417   </para>
 418   <para>
 419   Second, we want to test the correctness of each diagnostic.
 420   We created a <code>profile</code> directory in the test suite.
 421   Each diagnostic must come with at least two tests, one for false positives
 422   and one for false negatives.
 423   </para>
 424 </section>
 425
 426 </section>
 427
 428 <section xml:id="manual.ext.profile_mode.api" xreflabel="API"><info><title>Extensions for Custom Containers</title></info>
 429 <?dbhtml filename="profile_mode_api.html"?>
 430
 431
 432   <para>
 433   Many large projects use their own data structures instead of the ones in the
 434   standard library.  If these data structures are similar in functionality
 435   to the standard library, they can be instrumented with the same hooks
 436   that are used to instrument the standard library.
 437   The instrumentation API is exposed in file
 438   <code>profiler.h</code> (look for "Instrumentation hooks").
 439   </para>
 440
 441 </section>
 442
 443
 444 <section xml:id="manual.ext.profile_mode.cost_model" xreflabel="Cost Model"><info><title>Empirical Cost Model</title></info>
 445 <?dbhtml filename="profile_mode_cost_model.html"?>
 446
 447
 448   <para>
 449   Currently, the cost model uses formulas with predefined relative weights
 450   for alternative containers or container implementations.  For instance,
 451   iterating through a vector is X times faster than iterating through a list.
 452   </para>
 453   <para>
 454   (Under development.)
 455   We are working on customizing this to a particular machine by providing
 456   an automated way to compute the actual relative weights for operations
 457   on the given machine.
 458   </para>
 459   <para>
 460   (Under development.)
 461   We plan to provide a performance parameter database format that can be
 462   filled in either by hand or by an automated training mechanism.
 463   The analysis module will then use this database instead of the built-in
 464   generic parameters.
 465   </para>
 466
 467 </section>
 468
 469
 470 <section xml:id="manual.ext.profile_mode.implementation" xreflabel="Implementation"><info><title>Implementation Issues</title></info>
 471 <?dbhtml filename="profile_mode_impl.html"?>
 472
 473
 474
 475 <section xml:id="manual.ext.profile_mode.implementation.stack" xreflabel="Stack Traces"><info><title>Stack Traces</title></info>
 476
 477   <para>
 478   Accurate stack traces are needed during profiling since we group events by
 479   call context and dynamic instance.  Without accurate traces, diagnostics
 480   may be hard to interpret.  For instance, when giving advice to the user
 481   it is imperative to reference application code, not library code.
 482   </para>
 483   <para>
 484   Currently we are using the libc <code>backtrace</code> routine to get
 485   stack traces.
 486   <code>_GLIBCXX_PROFILE_MAX_STACK_DEPTH</code> can be set
 487   to 0 if you are willing to give up call context information, or to a small
 488   positive value to reduce run time overhead.
 489   </para>
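  <para>
  For illustration, a bounded stack capture with <code>backtrace</code>
  could look like the sketch below.  The function
  <code>capture_stack</code> and its <code>max_depth</code> parameter are
  hypothetical.  The example assumes a C library that provides
  <code>execinfo.h</code>, such as glibc.
  </para>
<programlisting><![CDATA[
```cpp
#include <execinfo.h>  // backtrace(); a C library extension, not standard C++
#include <cstddef>
#include <vector>

// Capture up to max_depth return addresses of the current call stack.
// A max_depth of 0 gives up call context information entirely; a small
// positive value bounds the work done per event.
inline std::vector<void*> capture_stack(std::size_t max_depth) {
  std::vector<void*> frames(max_depth);
  if (max_depth == 0)
    return frames;
  int depth = backtrace(frames.data(), static_cast<int>(max_depth));
  frames.resize(depth > 0 ? static_cast<std::size_t>(depth) : 0);
  return frames;
}
```
]]></programlisting>
  <para>
  The returned addresses are raw instruction addresses.  Mapping them to
  function names or source lines is left to an external tool.
  </para>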
 490 </section>
 491
 492
 493 <section xml:id="manual.ext.profile_mode.implementation.symbols" xreflabel="Symbolization"><info><title>Symbolization of Instruction Addresses</title></info>
 494
 495   <para>
 496   The profiling and analysis phases use only instruction addresses.
 497   An external utility such as addr2line is needed to postprocess the result.
 498   We do not plan to add symbolization support in the profile extension.
 499   This would require access to symbol tables, debug information tables,
 500   external programs or libraries and other system dependent information.
 501   </para>
 502 </section>
 503
 504
 505 <section xml:id="manual.ext.profile_mode.implementation.concurrency" xreflabel="Concurrency"><info><title>Concurrency</title></info>
 506
 507   <para>
 508   Our current model is simplistic, but precise.
 509   We cannot afford to approximate because some of our diagnostics require
 510   precise matching of operations to container instance and call context.
 511   During profiling, we keep a single information table per diagnostic.
 512   There is a single lock per information table.
 513   </para>
 514 </section>
 515
 516
 517 <section xml:id="manual.ext.profile_mode.implementation.stdlib-in-proflib" xreflabel="Using the Standard Library in the Runtime Library"><info><title>Using the Standard Library in the Instrumentation Implementation</title></info>
 518
 519   <para>
 520   As much as we would like to avoid uses of libstdc++ within our
 521   instrumentation library, containers such as unordered_map are very
 522   appealing.  We plan to use them as long as they are named properly
 523   to avoid ambiguity.
 524   </para>
 525 </section>
 526
 527
 528 <section xml:id="manual.ext.profile_mode.implementation.malloc-hooks" xreflabel="Malloc Hooks"><info><title>Malloc Hooks</title></info>
 529
 530   <para>
 531   User applications/libraries can provide malloc hooks.
 532   When the implementation of the malloc hooks uses libstdc++, there can
 533   be an infinite cycle between the profile mode instrumentation and the
 534   malloc hook code.
 535   </para>
 536   <para>
 537   We protect against reentrance to the profile mode instrumentation code,
 538   which should avoid this problem in most cases.
 539   The protection mechanism is thread safe and exception safe.
 540   This mechanism does not prevent reentrance to the malloc hook itself,
 541   which could still result in deadlock, if, for instance, the malloc hook
 542   uses non-recursive locks.
 543   XXX: A definitive solution to this problem would be for the profile extension
 544   to use a custom allocator internally, and perhaps not to use libstdc++.
 545   </para>
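  <para>
  One way to obtain a reentrance guard that is both thread safe and
  exception safe is a per-thread flag managed by an RAII object, as in the
  sketch below.  The names <code>reentrance_guard</code>,
  <code>inside_instrumentation</code> and <code>example_hook</code> are
  hypothetical, and this is not necessarily how the library implements its
  own guard.
  </para>
<programlisting><![CDATA[
```cpp
// Hypothetical per-thread flag: true while instrumentation code is running
// on the current thread.  Being thread-local, it needs no locking.
inline bool& inside_instrumentation() {
  thread_local bool active = false;
  return active;
}

// RAII guard.  The flag is cleared in the destructor, so it is also reset
// when the guarded code exits through an exception.
class reentrance_guard {
public:
  reentrance_guard() : entered_(!inside_instrumentation()) {
    if (entered_)
      inside_instrumentation() = true;
  }
  ~reentrance_guard() {
    if (entered_)
      inside_instrumentation() = false;
  }
  reentrance_guard(const reentrance_guard&) = delete;
  reentrance_guard& operator=(const reentrance_guard&) = delete;

  // True if this guard is the outermost one on the current thread.
  bool entered() const { return entered_; }

private:
  bool entered_;
};

inline int& recorded_events() {
  static int count = 0;
  return count;
}

// Hypothetical hook.  With reenter == true it calls itself once, to
// simulate instrumentation being re-entered (for example via a malloc hook).
inline void example_hook(bool reenter) {
  reentrance_guard guard;
  if (!guard.entered())
    return;  // nested call: do nothing
  ++recorded_events();
  if (reenter)
    example_hook(false);  // ignored while the outer call is active
}
```
]]></programlisting>
  <para>
  As noted above, a guard of this kind only protects the instrumentation
  code.  It does not prevent reentrance into the malloc hook itself.
  </para>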
 546 </section>
 547
 548
 549 <section xml:id="manual.ext.profile_mode.implementation.construction-destruction" xreflabel="Construction and Destruction of Global Objects"><info><title>Construction and Destruction of Global Objects</title></info>
 550
 551   <para>
 552   The profiling library state is initialized at the first call to a profiling
 553   method.  This allows us to record the construction of all global objects.
 554   However, we cannot do the same at destruction time.  The trace is written
 555   by a function registered by <code>atexit</code>, thus invoked by
 556   <code>exit</code>.
 557   </para>
 558 </section>
 559
 560 </section>
 561
 562
 563 <section xml:id="manual.ext.profile_mode.developer" xreflabel="Developer Information"><info><title>Developer Information</title></info>
 564 <?dbhtml filename="profile_mode_devel.html"?>
 565
 566
 567 <section xml:id="manual.ext.profile_mode.developer.bigpic" xreflabel="Big Picture"><info><title>Big Picture</title></info>
 568
 569
 570   <para>The profile mode headers are included with
 571    <code>-D_GLIBCXX_PROFILE</code> through preprocessor directives in
 572    <code>include/std/*</code>.
 573   </para>
 574
 575   <para>Instrumented implementations are provided in
 576    <code>include/profile/*</code>.  All instrumentation hooks are macros
 577    defined in <code>include/profile/impl/profiler.h</code>.
 578   </para>
 579
 580   <para>The implementation of all instrumentation hooks is in
 581    <code>include/profile/impl/*</code>.  Although all the code gets included,
 582    thus is publicly visible, only a small number of functions are called from
 583    outside this directory.  All calls to hook implementations must be
 584    done through macros defined in <code>profiler.h</code>.  The macro
 585    must ensure (1) that the call is guarded against reentrance and
 586    (2) that the call can be turned off at compile time using a
 587    <code>-D_GLIBCXX_PROFILE_...</code> compiler option.
 588   </para>
 589
 590 </section>
 591
 592 <section xml:id="manual.ext.profile_mode.developer.howto" xreflabel="How To Add A Diagnostic"><info><title>How To Add A Diagnostic</title></info>
 593
 594
 595   <para>Let's say the diagnostic name is "magic".
 596   </para>
 597
 598   <para>If you need to instrument a header not already under
 599    <code>include/profile/*</code>, first edit the corresponding header
 600    under <code>include/std/</code> and add a preprocessor directive such
 601    as the one in <code>include/std/vector</code>:
 602 <programlisting>
 603 #ifdef _GLIBCXX_PROFILE
 604 # include &lt;profile/vector&gt;
 605 #endif
 606 </programlisting>
 607   </para>
 608
 609   <para>If the file you need to instrument is not yet under
 610    <code>include/profile/</code>, make a copy of the one in
 611    <code>include/debug</code>, or the main implementation.
 612    You'll need to include the main implementation and inherit the classes
 613    you want to instrument.  Then define the methods you want to instrument,
 614    define the instrumentation hooks and add calls to them.
 615    Look at <code>include/profile/vector</code> for an example.
 616   </para>
 617
 618   <para>Add macros for the instrumentation hooks in
 619    <code>include/profile/impl/profiler.h</code>.
 620    Hook names must start with <code>__profcxx_</code>.
 621    Make sure they expand
 622    to no code with <code>-D_NO_GLIBCXX_PROFILE_MAGIC</code>.
 623    Make sure all calls to any method in namespace <code>__gnu_profile</code>
 624    are protected against reentrance using macro
 625    <code>_GLIBCXX_PROFILE_REENTRANCE_GUARD</code>.
 626    All names of methods in namespace <code>__gnu_profile</code> called from
 627    <code>profiler.h</code> must start with <code>__trace_magic_</code>.
 628   </para>
 629
 630   <para>Add the implementation of the diagnostic.
 631    <itemizedlist>
 632      <listitem><para>
 633       Create new file <code>include/profile/impl/profiler_magic.h</code>.
 634      </para></listitem>
 635      <listitem><para>
 636       Define class <code>__magic_info: public __object_info_base</code>.
 637       This is the representation of a line in the object table.
 638       The <code>__magnitude</code> method must return the estimated benefit
 639       across all dynamic instances created at the same call context.
 640       The <code>__magnitude</code> must return the estimation of the benefit
 641       as a number of small operations, e.g., number of words copied.
 642       The <code>__write</code> method is used to produce the raw trace.
 643       The <code>__advice</code> method is used to produce the advice string.
 644      </para></listitem>
 645      <listitem><para>
 646       Define class <code>__magic_stack_info: public __magic_info</code>.
 647       This defines the content of a line in the stack table.
 648      </para></listitem>
 649      <listitem><para>
 650       Define class <code>__trace_magic: public __trace_base&lt;__magic_info,
 651       __magic_stack_info&gt;</code>.
 652       It defines the content of the trace associated with this diagnostic.
 653      </para></listitem>
 654     </itemizedlist>
 655   </para>
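  <para>
  To make the roles of these methods concrete, here is a simplified,
  free-standing sketch of an object-table entry.  The class
  <code>magic_info</code> and its members are hypothetical: they mirror the
  roles described above without the reserved leading underscores, do not
  derive from the library base classes, and use signatures chosen for this
  example only.
  </para>
<programlisting><![CDATA[
```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for one line of the object table of a "magic"
// diagnostic.  The tracked quantity (words copied) is an example only.
class magic_info {
public:
  explicit magic_info(std::size_t words_copied = 0)
      : words_copied_(words_copied) {}

  // Aggregate another instance created at the same call context.
  void merge(const magic_info& other) {
    words_copied_ += other.words_copied_;
  }

  // Estimated benefit as a count of small operations (here, words copied).
  std::size_t magnitude() const { return words_copied_; }

  // One record of the raw trace.
  std::string write() const { return std::to_string(words_copied_); }

  // Human-readable advice.
  std::string advice() const {
    return "avoid copying " + write() + " words";
  }

private:
  std::size_t words_copied_;
};
```
]]></programlisting>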
 656
 657   <para>Add initialization and reporting calls in
 658    <code>include/profile/impl/profiler_trace.h</code>.  Use
 659    <code>__trace_vector_to_list</code> as an example.
 660   </para>
 661
 662   <para>Add documentation in file <code>doc/xml/manual/profile_mode.xml</code>.
 663   </para>
 664 </section>
 665 </section>
 666
 667 <section xml:id="manual.ext.profile_mode.diagnostics"><info><title>Diagnostics</title></info>
 668 <?dbhtml filename="profile_mode_diagnostics.html"?>
 669
 670
 671   <para>
 672   The table below presents all the diagnostics we intend to implement.
 673   Each diagnostic has a corresponding compile time switch
 674   <code>-D_GLIBCXX_PROFILE_&lt;diagnostic&gt;</code>.
 675   Groups of related diagnostics can be turned on with a single switch.
 676   For instance, <code>-D_GLIBCXX_PROFILE_LOCALITY</code> is equivalent to
 677   <code>-D_GLIBCXX_PROFILE_SOFTWARE_PREFETCH
 678   -D_GLIBCXX_PROFILE_RBTREE_LOCALITY</code>.
 679   </para>
 680
 681   <para>
 682   The benefit, cost, expected frequency and accuracy of each diagnostic
 683   were each given a grade from 1 to 10, where 10 is highest.
 684   A high benefit means that, if the diagnostic is accurate, the expected
 685   performance improvement is high.
 686   A high cost means that turning this diagnostic on leads to high slowdown.
 687   A high frequency means that we expect this to occur relatively often.
 688   A high accuracy means that the diagnostic is unlikely to be wrong.
 689   These grades are not perfect.  They are just meant to guide users with
 690   specific needs or time budgets.
 691   </para>
 692
 693 <table frame="all" xml:id="table.profile_diagnostics">
 694 <title>Profile Diagnostics</title>
 695
 696 <tgroup cols="7" align="left" colsep="1" rowsep="1">
 697 <colspec colname="c1"/>
 698 <colspec colname="c2"/>
 699 <colspec colname="c3"/>
 700 <colspec colname="c4"/>
 701 <colspec colname="c5"/>
 702 <colspec colname="c6"/>
 703 <colspec colname="c7"/>
 704
 705 <thead>
 706   <row>
 707     <entry>Group</entry>
 708     <entry>Flag</entry>
 709     <entry>Benefit</entry>
 710     <entry>Cost</entry>
 711     <entry>Freq.</entry><entry>Accuracy</entry>
 712     <entry>Implemented</entry>
 713   </row>
 714 </thead>
 715 <tbody>
 716   <row>
 717     <entry><link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#manual.ext.profile_mode.analysis.containers">
 718     CONTAINERS</link></entry>
 719     <entry><link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#manual.ext.profile_mode.analysis.hashtable_too_small">
 720     HASHTABLE_TOO_SMALL</link></entry>
 721     <entry>10</entry>
 722     <entry>1</entry>
 723     <entry/>
 724     <entry>10</entry>
 725     <entry>yes</entry>
 726   </row>
 727   <row>
 728     <entry/>
 729     <entry><link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#manual.ext.profile_mode.analysis.hashtable_too_large">
 730     HASHTABLE_TOO_LARGE</link></entry>
 731     <entry>5</entry>
 732     <entry>1</entry>
 733     <entry/>
 734     <entry>10</entry>
 735     <entry>yes</entry>
 736   </row>
 737   <row>
 738     <entry/>
 739     <entry><link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#manual.ext.profile_mode.analysis.inefficient_hash">
 740     INEFFICIENT_HASH</link></entry>
 741     <entry>7</entry>
 742     <entry>3</entry>
 743     <entry/>
 744     <entry>10</entry>
 745     <entry>yes</entry>
 746   </row>
 747   <row>
 748     <entry/>
 749     <entry><link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#manual.ext.profile_mode.analysis.vector_too_small">
 750     VECTOR_TOO_SMALL</link></entry>
 751     <entry>8</entry>
 752     <entry>1</entry>
 753     <entry/>
 754     <entry>10</entry>
 755     <entry>yes</entry>
 756   </row>
 757   <row>
 758     <entry/>
 759     <entry><link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#manual.ext.profile_mode.analysis.vector_too_large">
 760     VECTOR_TOO_LARGE</link></entry>
 761     <entry>5</entry>
 762     <entry>1</entry>
 763     <entry/>
 764     <entry>10</entry>
 765     <entry>yes</entry>
 766   </row>
 767   <row>
 768     <entry/>
 769     <entry><link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#manual.ext.profile_mode.analysis.vector_to_hashtable">
 770     VECTOR_TO_HASHTABLE</link></entry>
 771     <entry>7</entry>
 772     <entry>7</entry>
 773     <entry/>
 774     <entry>10</entry>
 775     <entry>no</entry>
 776   </row>
 777   <row>
 778     <entry/>
 779     <entry><link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#manual.ext.profile_mode.analysis.hashtable_to_vector">
 780     HASHTABLE_TO_VECTOR</link></entry>
 781     <entry>7</entry>
 782     <entry>7</entry>
 783     <entry/>
 784     <entry>10</entry>
 785     <entry>no</entry>
 786   </row>
 787   <row>
 788     <entry/>
 789     <entry><link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#manual.ext.profile_mode.analysis.vector_to_list">
 790     VECTOR_TO_LIST</link></entry>
 791     <entry>8</entry>
 792     <entry>5</entry>
 793     <entry/>
 794     <entry>10</entry>
 795     <entry>yes</entry>
 796   </row>
 797   <row>
 798     <entry/>
 799     <entry><link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#manual.ext.profile_mode.analysis.list_to_vector">
 800     LIST_TO_VECTOR</link></entry>
 801     <entry>10</entry>
 802     <entry>5</entry>
 803     <entry/>
 804     <entry>10</entry>
 805     <entry>no</entry>
 806   </row>
 807   <row>
 808     <entry/>
 809     <entry><link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#manual.ext.profile_mode.analysis.assoc_ord_to_unord">
 810     ORDERED_TO_UNORDERED</link></entry>
 811     <entry>10</entry>
 812     <entry>5</entry>
 813     <entry/>
 814     <entry>10</entry>
 815     <entry>only map/unordered_map</entry>
 816   </row>
 817   <row>
 818     <entry><link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#manual.ext.profile_mode.analysis.algorithms">
 819     ALGORITHMS</link></entry>
 820     <entry><link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#manual.ext.profile_mode.analysis.algorithms.sort">
 821     SORT</link></entry>
 822     <entry>7</entry>
 823     <entry>8</entry>
 824     <entry/>
 825     <entry>7</entry>
 826     <entry>no</entry>
 827   </row>
 828   <row>
 829     <entry><link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#manual.ext.profile_mode.analysis.locality">
 830     LOCALITY</link></entry>
 831     <entry><link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#manual.ext.profile_mode.analysis.locality.sw_prefetch">
 832     SOFTWARE_PREFETCH</link></entry>
 833     <entry>8</entry>
 834     <entry>8</entry>
 835     <entry/>
 836     <entry>5</entry>
 837     <entry>no</entry>
 838   </row>
 839   <row>
 840     <entry/>
 841     <entry><link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#manual.ext.profile_mode.analysis.locality.linked">
 842     RBTREE_LOCALITY</link></entry>
 843     <entry>4</entry>
 844     <entry>8</entry>
 845     <entry/>
 846     <entry>5</entry>
 847     <entry>no</entry>
 848   </row>
 849   <row>
 850     <entry/>
 851     <entry><link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#manual.ext.profile_mode.analysis.mthread.false_share">
 852     FALSE_SHARING</link></entry>
 853     <entry>8</entry>
 854     <entry>10</entry>
 855     <entry/>
 856     <entry>10</entry>
 857     <entry>no</entry>
 858   </row>
 859 </tbody>
 860 </tgroup>
 861 </table>
 862
 863 <section xml:id="manual.ext.profile_mode.analysis.template" xreflabel="Template"><info><title>Diagnostic Template</title></info>
 864
 865 <itemizedlist>
 866   <listitem><para><emphasis>Switch:</emphasis>
 867   <code>_GLIBCXX_PROFILE_&lt;diagnostic&gt;</code>.
 868   </para></listitem>
 869   <listitem><para><emphasis>Goal:</emphasis>  What problem will it diagnose?
 870   </para></listitem>
 871   <listitem><para><emphasis>Fundamentals:</emphasis>
 872   What is the fundamental reason why this is a problem?</para></listitem>
 873   <listitem><para><emphasis>Sample runtime reduction:</emphasis>
 874   Percentage reduction in execution time.  When reduction is more than
 875   a constant factor, describe the reduction rate formula.
 876   </para></listitem>
 877   <listitem><para><emphasis>Recommendation:</emphasis>
 878   What would the advice look like?</para></listitem>
 879   <listitem><para><emphasis>To instrument:</emphasis>
 880   What libstdc++ components need to be instrumented?</para></listitem>
 881   <listitem><para><emphasis>Analysis:</emphasis>
 882   How do we decide when to issue the advice?</para></listitem>
 883   <listitem><para><emphasis>Cost model:</emphasis>
 884   How do we measure benefits?  Math goes here.</para></listitem>
 885   <listitem><para><emphasis>Example:</emphasis>
 886 <programlisting>
 887 program code
 888 ...
 889 advice sample
 890 </programlisting>
 891 </para></listitem>
 892 </itemizedlist>
 893 </section>
 894
 895
 896 <section xml:id="manual.ext.profile_mode.analysis.containers" xreflabel="Containers"><info><title>Containers</title></info>
 897
 898
 899 <para>
 900 <emphasis>Switch:</emphasis>
 901   <code>_GLIBCXX_PROFILE_CONTAINERS</code>.
 902 </para>
 903
 904 <section xml:id="manual.ext.profile_mode.analysis.hashtable_too_small" xreflabel="Hashtable Too Small"><info><title>Hashtable Too Small</title></info>
 905
 906 <itemizedlist>
 907   <listitem><para><emphasis>Switch:</emphasis>
 908   <code>_GLIBCXX_PROFILE_HASHTABLE_TOO_SMALL</code>.
 909   </para></listitem>
 910   <listitem><para><emphasis>Goal:</emphasis> Detect hashtables with many
 911   rehash operations, small construction size and large destruction size.
 912   </para></listitem>
 913   <listitem><para><emphasis>Fundamentals:</emphasis> Rehashing is very expensive:
 914   every element is read by following the chain in its bucket, its hash is
 915   re-evaluated, and it is placed at a new location in a different order.</para></listitem>
 916   <listitem><para><emphasis>Sample runtime reduction:</emphasis> 36%.
 917   Code similar to example below.
 918   </para></listitem>
 919   <listitem><para><emphasis>Recommendation:</emphasis>
 920   Set initial size to N at construction site S.
 921   </para></listitem>
 922   <listitem><para><emphasis>To instrument:</emphasis>
 923   <code>unordered_set, unordered_map</code> constructor, destructor, rehash.
 924   </para></listitem>
 925   <listitem><para><emphasis>Analysis:</emphasis>
 926   For each dynamic instance of <code>unordered_[multi]set|map</code>,
 927   record initial size and call context of the constructor.
 928   Record size increase, if any, after each relevant operation such as insert.
 929   Record the estimated rehash cost.</para></listitem>
 930   <listitem><para><emphasis>Cost model:</emphasis>
 931   Number of individual rehash operations * cost per rehash.</para></listitem>
 932   <listitem><para><emphasis>Example:</emphasis>
 933 <programlisting>
 934 1 unordered_set&lt;int&gt; us;
 935 2 for (int k = 0; k &lt; 1000000; ++k) {
 936 3   us.insert(k);
 937 4 }
 938
 939 foo.cc:1: advice: Changing initial unordered_set size from 10 to 1000000 saves 1025530 rehash operations.
 940 </programlisting>
 941 </para></listitem>
 942 </itemizedlist>
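     <para>
     The recommendation can be applied by hand.  The following is a minimal
     sketch under stated assumptions, not code from the profile mode:
     <code>count_rehashes</code> is a hypothetical helper introduced here, and
     it assumes that a rehash always shows up as a change in
     <code>bucket_count()</code>.  It contrasts a default-constructed
     <code>unordered_set</code> with one that calls <code>reserve</code>
     before inserting.
     </para>
     <programlisting>
     #include &lt;cstddef&gt;
     #include &lt;unordered_set&gt;

     // Hypothetical helper: insert 0..n-1 and count how often the bucket
     // array is rebuilt, detected as a change in bucket_count().
     std::size_t count_rehashes(std::size_t n, bool presize)
     {
       std::unordered_set&lt;int&gt; us;
       if (presize)
         us.reserve(n);  // enough buckets for n elements at the current load factor
       std::size_t rehashes = 0;
       std::size_t buckets = us.bucket_count();
       for (std::size_t k = 0; k != n; ++k) {
         us.insert(static_cast&lt;int&gt;(k));
         if (us.bucket_count() != buckets) {
           buckets = us.bucket_count();
           ++rehashes;
         }
       }
       return rehashes;
     }
     </programlisting>
     <para>
     With <code>presize</code> set, the helper is expected to return zero,
     because <code>reserve(n)</code> requests enough buckets for
     <code>n</code> elements at the current maximum load factor.  Passing a
     size to the constructor is an alternative; note that the constructor
     argument is a minimum bucket count rather than an element count.
     </para>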
 943 </section>
 944
 945
 946 <section xml:id="manual.ext.profile_mode.analysis.hashtable_too_large" xreflabel="Hashtable Too Large"><info><title>Hashtable Too Large</title></info>
 947
 948 <itemizedlist>
 949   <listitem><para><emphasis>Switch:</emphasis>
 950   <code>_GLIBCXX_PROFILE_HASHTABLE_TOO_LARGE</code>.
 951   </para></listitem>
 952   <listitem><para><emphasis>Goal:</emphasis> Detect hashtables which are
 953   never filled up because fewer elements than reserved are ever
 954   inserted.
 955   </para></listitem>
 956   <listitem><para><emphasis>Fundamentals:</emphasis> Save memory, which
 957   is good in itself and may also improve memory reference performance through
 958   fewer cache and TLB misses.</para></listitem>
 959   <listitem><para><emphasis>Sample runtime reduction:</emphasis> unknown.
 960   </para></listitem>
 961   <listitem><para><emphasis>Recommendation:</emphasis>
 962   Set initial size to N at construction site S.
 963   </para></listitem>
 964   <listitem><para><emphasis>To instrument:</emphasis>
 965   <code>unordered_set, unordered_map</code> constructor, destructor, rehash.
 966   </para></listitem>
 967   <listitem><para><emphasis>Analysis:</emphasis>
 968   For each dynamic instance of <code>unordered_[multi]set|map</code>,
 969   record initial size and call context of the constructor, and correlate it
 970   with its size at destruction time.
 971   </para></listitem>
 972   <listitem><para><emphasis>Cost model:</emphasis>
 973   Number of iteration operations + memory saved.</para></listitem>
 974   <listitem><para><emphasis>Example:</emphasis>
 975 <programlisting>
 976 1 vector&lt;unordered_set&lt;int&gt;&gt; v(100000, unordered_set&lt;int&gt;(100));
 977 2 for (int k = 0; k &lt; 100000; ++k) {
 978 3   for (int j = 0; j &lt; 10; ++j) {
 979 4     v[k].insert(k + j);
 980 5   }
 981 6 }
 982
 983 foo.cc:1: advice: Changing initial unordered_set size from 100 to 10 saves N
 984 bytes of memory and M iteration steps.
 985 </programlisting>
 986 </para></listitem>
 987 </itemizedlist>
 988 </section>
 989
 990 <section xml:id="manual.ext.profile_mode.analysis.inefficient_hash" xreflabel="Inefficient Hash"><info><title>Inefficient Hash</title></info>
 991
 992 <itemizedlist>
 993   <listitem><para><emphasis>Switch:</emphasis>
 994   <code>_GLIBCXX_PROFILE_INEFFICIENT_HASH</code>.
 995   </para></listitem>
 996   <listitem><para><emphasis>Goal:</emphasis> Detect hashtables with polarized
 997   distribution.
 998   </para></listitem>
 999   <listitem><para><emphasis>Fundamentals:</emphasis> A non-uniform
1000   distribution may lead to long chains, thus possibly increasing complexity
1001   by a factor up to the number of elements.
1002   </para></listitem>
1003   <listitem><para><emphasis>Sample runtime reduction:</emphasis> factor up
1004    to container size.
1005   </para></listitem>
1006   <listitem><para><emphasis>Recommendation:</emphasis> Change hash function
1007   for container built at site S.  Distribution score = N.  Access score = S.
1008   Longest chain = C, in bucket B.
1009   </para></listitem>
1010   <listitem><para><emphasis>To instrument:</emphasis>
1011   <code>unordered_set, unordered_map</code> constructor, destructor, [],
1012   insert, iterator.
1013   </para></listitem>
1014   <listitem><para><emphasis>Analysis:</emphasis>
1015   Count the exact number of link traversals.
1016   </para></listitem>
1017   <listitem><para><emphasis>Cost model:</emphasis>
1018   Total number of links traversed.</para></listitem>
1019   <listitem><para><emphasis>Example:</emphasis>
1020 <programlisting>
1021 class dumb_hash {
1022  public:
1023   size_t operator() (int i) const { return 0; }
1024 };
1025 ...
1026   unordered_set&lt;int, dumb_hash&gt; hs;
1027   ...
1028   for (int i = 0; i &lt; COUNT; ++i) {
1029     hs.find(i);
1030   }
1031 </programlisting>
1032 </para></listitem>
1033 </itemizedlist>
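     <para>
     The longest chain mentioned in the recommendation can be approximated
     from user code with the standard bucket interface.  This is a minimal
     sketch, not the profile mode's own analysis, which counts link
     traversals internally.  <code>longest_chain</code> and
     <code>longest_chain_after_fill</code> are hypothetical helpers;
     <code>dumb_hash</code> repeats the hash from the example above.
     </para>
     <programlisting>
     #include &lt;algorithm&gt;
     #include &lt;cstddef&gt;
     #include &lt;unordered_set&gt;

     // Hash from the example above: every key lands in the same bucket.
     struct dumb_hash {
       std::size_t operator()(int) const { return 0; }
     };

     // Hypothetical helper: length of the longest bucket chain in a container.
     template &lt;typename Set&gt;
     std::size_t longest_chain(const Set&amp; s)
     {
       std::size_t longest = 0;
       for (std::size_t b = 0; b != s.bucket_count(); ++b)
         longest = std::max(longest, s.bucket_size(b));
       return longest;
     }

     // Hypothetical helper: fill a set with 0..n-1 and report its longest chain.
     template &lt;typename Set&gt;
     std::size_t longest_chain_after_fill(int n)
     {
       Set s;
       for (int i = 0; i != n; ++i)
         s.insert(i);
       return longest_chain(s);
     }
     </programlisting>
     <para>
     With <code>dumb_hash</code> the longest chain equals the number of
     elements, so each <code>find</code> degenerates into a linear search.
     With the default <code>std::hash&lt;int&gt;</code> the chains should be
     much shorter, although their exact length depends on the implementation.
     </para>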
1034 </section>
1035
1036 <section xml:id="manual.ext.profile_mode.analysis.vector_too_small" xreflabel="Vector Too Small"><info><title>Vector Too Small</title></info>
1037
1038 <itemizedlist>
1039   <listitem><para><emphasis>Switch:</emphasis>
1040   <code>_GLIBCXX_PROFILE_VECTOR_TOO_SMALL</code>.
1041   </para></listitem>
1042   <listitem><para><emphasis>Goal:</emphasis> Detect vectors with many
1043   resize operations, small construction size and large destruction size.
1044   </para></listitem>
1045   <listitem><para><emphasis>Fundamentals:</emphasis> Resizing can be expensive.
1046   Copying large amounts of data takes time.  Resizing many small vectors may
1047   have allocation overhead and affect locality.</para></listitem>
1048   <listitem><para><emphasis>Sample runtime reduction:</emphasis>%.
1049   </para></listitem>
1050   <listitem><para><emphasis>Recommendation:</emphasis>
1051   Set initial size to N at construction site S.</para></listitem>
1052   <listitem><para><emphasis>To instrument:</emphasis> <code>vector</code>.
1053   </para></listitem>
1054   <listitem><para><emphasis>Analysis:</emphasis>
1055   For each dynamic instance of <code>vector</code>,
1056   record initial size and call context of the constructor.
1057   Record size increase, if any, after each relevant operation such as
1058   <code>push_back</code>.  Record the estimated resize cost.
1059   </para></listitem>
1060   <listitem><para><emphasis>Cost model:</emphasis>
1061   Total number of words copied * time to copy a word.</para></listitem>
1062   <listitem><para><emphasis>Example:</emphasis>
1063 <programlisting>
1064 1 vector&lt;int&gt; v;
1065 2 for (int k = 0; k &lt; 1000000; ++k) {
1066 3   v.push_back(k);
1067 4 }
1068
1069 foo.cc:1: advice: Changing initial vector size from 10 to 1000000 saves
1070 copying 4000000 bytes and 20 memory allocations and deallocations.
1071 </programlisting>
1072 </para></listitem>
1073 </itemizedlist>
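     <para>
     The cost model above counts the data copied while the vector grows.
     The following minimal sketch approximates that count from user code.
     <code>elements_copied</code> is a hypothetical helper, and it assumes
     that every change in <code>capacity()</code> relocates all elements
     already stored.
     </para>
     <programlisting>
     #include &lt;cstddef&gt;
     #include &lt;vector&gt;

     // Hypothetical helper: push_back 0..n-1 and add up the elements that
     // had to be relocated each time the capacity changed.
     std::size_t elements_copied(std::size_t n, bool presize)
     {
       std::vector&lt;int&gt; v;
       if (presize)
         v.reserve(n);  // a single allocation up front
       std::size_t copied = 0;
       for (std::size_t k = 0; k != n; ++k) {
         const std::size_t old_capacity = v.capacity();
         v.push_back(static_cast&lt;int&gt;(k));
         if (v.capacity() != old_capacity)
           copied += v.size() - 1;  // elements that existed before this push_back
       }
       return copied;
     }
     </programlisting>
     <para>
     After <code>reserve(n)</code> no reallocation happens until the size
     exceeds <code>n</code>, so the helper returns zero when
     <code>presize</code> is set.  Without it, the total depends on the
     growth policy of the implementation.
     </para>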
1074 </section>
1075
1076 <section xml:id="manual.ext.profile_mode.analysis.vector_too_large" xreflabel="Vector Too Large"><info><title>Vector Too Large</title></info>
1077
1078 <itemizedlist>
1079   <listitem><para><emphasis>Switch:</emphasis>
1080   <code>_GLIBCXX_PROFILE_VECTOR_TOO_LARGE</code>.
1081   </para></listitem>
1082   <listitem><para><emphasis>Goal:</emphasis> Detect vectors which are
1083   never filled up because fewer elements than reserved are ever
1084   inserted.
1085   </para></listitem>
1086   <listitem><para><emphasis>Fundamentals:</emphasis> Save memory, which
1087   is good in itself and may also improve memory reference performance through
1088   fewer cache and TLB misses.</para></listitem>
1089   <listitem><para><emphasis>Sample runtime reduction:</emphasis>%.
1090   </para></listitem>
1091   <listitem><para><emphasis>Recommendation:</emphasis>
1092   Set initial size to N at construction site S.</para></listitem>
1093   <listitem><para><emphasis>To instrument:</emphasis> <code>vector</code>.
1094   </para></listitem>
1095   <listitem><para><emphasis>Analysis:</emphasis>
1096   For each dynamic instance of <code>vector</code>,
1097   record initial size and call context of the constructor, and correlate it
1098   with its size at destruction time.</para></listitem>
1099   <listitem><para><emphasis>Cost model:</emphasis>
1100   Total amount of memory saved.</para></listitem>
1101   <listitem><para><emphasis>Example:</emphasis>
1102 <programlisting>
1103 1 vector&lt;vector&lt;int&gt;&gt; v(100000, vector&lt;int&gt;(100));
1104 2 for (int k = 0; k &lt; 100000; ++k) {
1105 3   for (int j = 0; j &lt; 10; ++j) {
1106 4     v[k].push_back(k + j);
1107 5   }
1108 6 }
1109
1110 foo.cc:1: advice: Changing initial vector size from 100 to 10 saves N
1111 bytes of memory and may reduce the number of cache and TLB misses.
1112 </programlisting>
1113 </para></listitem>
1114 </itemizedlist>
1115 </section>
1116
1117 <section xml:id="manual.ext.profile_mode.analysis.vector_to_hashtable" xreflabel="Vector to Hashtable"><info><title>Vector to Hashtable</title></info>
1118
1119 <itemizedlist>
1120   <listitem><para><emphasis>Switch:</emphasis>
1121   <code>_GLIBCXX_PROFILE_VECTOR_TO_HASHTABLE</code>.
1122   </para></listitem>
1123   <listitem><para><emphasis>Goal:</emphasis> Detect uses of
1124   <code>vector</code> that can be substituted with <code>unordered_set</code>
1125   to reduce execution time.
1126   </para></listitem>
1127   <listitem><para><emphasis>Fundamentals:</emphasis>
1128   Linear search in a vector is very expensive, whereas searching in a hashtable
1129   is very quick.</para></listitem>
1130   <listitem><para><emphasis>Sample runtime reduction:</emphasis> factor up
1131    to container size.
1132   </para></listitem>
1133   <listitem><para><emphasis>Recommendation:</emphasis> Replace
1134   <code>vector</code> with <code>unordered_set</code> at site S.
1135   </para></listitem>
1136   <listitem><para><emphasis>To instrument:</emphasis> <code>vector</code>
1137   operations and access methods.</para></listitem>
1138   <listitem><para><emphasis>Analysis:</emphasis>
1139   For each dynamic instance of <code>vector</code>,
1140   record call context of the constructor.  Issue the advice only if the
1141   only methods called on this <code>vector</code> are <code>push_back</code>,
1142   <code>insert</code> and <code>find</code>.
1143   </para></listitem>
1144   <listitem><para><emphasis>Cost model:</emphasis>
1145   Cost(vector::push_back) + cost(vector::insert) + cost(find, vector) -
1146   (cost(unordered_set::insert) + cost(unordered_set::find)).
1147   </para></listitem>
1148   <listitem><para><emphasis>Example:</emphasis>
1149 <programlisting>
1150 1  vector&lt;int&gt; v;
1151 ...
1152 2  for (int i = 0; i &lt; 1000; ++i) {
1153 3    find(v.begin(), v.end(), i);
1154 4  }
1155
1156 foo.cc:1: advice: Changing "vector" to "unordered_set" will save about 500,000
1157 comparisons.
1158 </programlisting>
1159 </para></listitem>
1160 </itemizedlist>
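     <para>
     The comparison count in the sample advice can be reproduced with a
     small experiment.  This is a minimal sketch under an assumption the
     example leaves open: the vector is taken to hold the values 0 to n-1
     in increasing order.  Both helper names are hypothetical.
     </para>
     <programlisting>
     #include &lt;cstddef&gt;
     #include &lt;unordered_set&gt;
     #include &lt;vector&gt;

     // Hypothetical helper: comparisons made by a linear search for every
     // key in 0..n-1 within a vector that holds 0..n-1 in order.
     std::size_t linear_search_comparisons(int n)
     {
       std::vector&lt;int&gt; v;
       for (int i = 0; i != n; ++i)
         v.push_back(i);

       std::size_t comparisons = 0;
       for (int key = 0; key != n; ++key) {
         for (std::size_t pos = 0; pos != v.size(); ++pos) {
           ++comparisons;
           if (v[pos] == key)
             break;  // found, stop scanning
         }
       }
       return comparisons;
     }

     // The suggested replacement: each lookup hashes the key instead of scanning.
     std::size_t hashed_lookups_found(int n)
     {
       std::unordered_set&lt;int&gt; us;
       for (int i = 0; i != n; ++i)
         us.insert(i);

       std::size_t found = 0;
       for (int key = 0; key != n; ++key)
         found += us.count(key);
       return found;
     }
     </programlisting>
     <para>
     For <code>n</code> equal to 1000 the linear version performs 500500
     comparisons, which is consistent with the figure of about 500,000 in the
     sample advice, while the hashed version finds all 1000 keys without
     scanning the container.
     </para>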
1161 </section>
1162
1163 <section xml:id="manual.ext.profile_mode.analysis.hashtable_to_vector" xreflabel="Hashtable to Vector"><info><title>Hashtable to Vector</title></info>
1164
1165 <itemizedlist>
1166   <listitem><para><emphasis>Switch:</emphasis>
1167   <code>_GLIBCXX_PROFILE_HASHTABLE_TO_VECTOR</code>.
1168   </para></listitem>
1169   <listitem><para><emphasis>Goal:</emphasis> Detect uses of
1170   <code>unordered_set</code> that can be substituted with <code>vector</code>
1171   to reduce execution time.
1172   </para></listitem>
1173   <listitem><para><emphasis>Fundamentals:</emphasis>
1174   Iterating over a hashtable is slower than iterating over a vector.</para></listitem>
1175   <listitem><para><emphasis>Sample runtime reduction:</emphasis> 95%.
1176   </para></listitem>
1177   <listitem><para><emphasis>Recommendation:</emphasis> Replace
1178   <code>unordered_set</code> with <code>vector</code> at site S.
1179   </para></listitem>
1180   <listitem><para><emphasis>To instrument:</emphasis> <code>unordered_set</code>
1181   operations and access methods.</para></listitem>
1182   <listitem><para><emphasis>Analysis:</emphasis>
1183   For each dynamic instance of <code>unordered_set</code>,
1184   record call context of the constructor.  Issue the advice only if the
1185   number of <code>find</code>, <code>insert</code> and <code>[]</code>
1186   operations on this <code>unordered_set</code> is small relative to the
1187   number of elements, and methods <code>begin</code> or <code>end</code>
1188   are invoked (suggesting iteration).</para></listitem>
1189   <listitem><para><emphasis>Cost model:</emphasis>
1190   Number of .</para></listitem>
1191   <listitem><para><emphasis>Example:</emphasis>
1192 <programlisting>
1193 1  unordered_set&lt;int&gt; us;
1194 ...
1195 2  int s = 0;
1196 3  for (unordered_set&lt;int&gt;::iterator it = us.begin(); it != us.end(); ++it) {
1197 4    s += *it;
1198 5  }
1199
1200 foo.cc:1: advice: Changing "unordered_set" to "vector" will save about N
1201 indirections and may achieve better data locality.
1202 </programlisting>
1203 </para></listitem>
1204 </itemizedlist>
1205 </section>
1206
1207 <section xml:id="manual.ext.profile_mode.analysis.vector_to_list" xreflabel="Vector to List"><info><title>Vector to List</title></info>
1208
1209 <itemizedlist>
1210   <listitem><para><emphasis>Switch:</emphasis>
1211   <code>_GLIBCXX_PROFILE_VECTOR_TO_LIST</code>.
1212   </para></listitem>
1213   <listitem><para><emphasis>Goal:</emphasis> Detect cases where
1214   <code>vector</code> could be substituted with <code>list</code> for
1215   better performance.
1216   </para></listitem>
1217   <listitem><para><emphasis>Fundamentals:</emphasis>
1218   Inserting in the middle of a vector is expensive compared to inserting in a
1219   list.
1220   </para></listitem>
1221   <listitem><para><emphasis>Sample runtime reduction:</emphasis> factor up to
1222    container size.
1223   </para></listitem>
1224   <listitem><para><emphasis>Recommendation:</emphasis> Replace vector with list
1225   at site S.</para></listitem>
1226   <listitem><para><emphasis>To instrument:</emphasis> <code>vector</code>
1227   operations and access methods.</para></listitem>
1228   <listitem><para><emphasis>Analysis:</emphasis>
1229   For each dynamic instance of <code>vector</code>,
1230   record the call context of the constructor.  Record the overhead of each
1231   <code>insert</code> operation based on current size and insert position.
1232   Report instances with high insertion overhead.
1233   </para></listitem>
1234   <listitem><para><emphasis>Cost model:</emphasis>
1235   (Sum(cost(vector::method)) - Sum(cost(list::method)), for
1236   method in [push_back, insert, erase])
1237   + (Cost(iterate vector) - Cost(iterate list))</para></listitem>
1238   <listitem><para><emphasis>Example:</emphasis>
1239 <programlisting>
1240 1  vector&lt;int&gt; v;
1241 2  for (int i = 0; i &lt; 10000; ++i) {
1242 3    v.insert(v.begin(), i);
1243 4  }
1244
1245 foo.cc:1: advice: Changing "vector" to "list" will save about 5,000,000
1246 operations.
1247 </programlisting>
1248 </para></listitem>
1249 </itemizedlist>
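     <para>
     The analysis above charges each <code>insert</code> according to the
     current size and the insert position.  The following minimal sketch
     illustrates that accounting for insertions at the front, where every
     stored element has to move.  It is an illustration only, not the
     profile mode's cost function, and both helper names are hypothetical.
     </para>
     <programlisting>
     #include &lt;cstddef&gt;
     #include &lt;list&gt;
     #include &lt;vector&gt;

     // Hypothetical helper: elements shifted by n insertions at the front of
     // a vector.  Inserting at position p of a vector of size s shifts s - p
     // elements; here p is always 0.
     std::size_t front_insert_shifts(int n)
     {
       std::vector&lt;int&gt; v;
       std::size_t shifted = 0;
       for (int i = 0; i != n; ++i) {
         shifted += v.size();  // all current elements move one slot
         v.insert(v.begin(), i);
       }
       return shifted;
     }

     // The suggested replacement: same resulting order, no elements shifted.
     std::list&lt;int&gt; front_insert_list(int n)
     {
       std::list&lt;int&gt; l;
       for (int i = 0; i != n; ++i)
         l.insert(l.begin(), i);
       return l;
     }
     </programlisting>
     <para>
     The number of shifted elements grows quadratically with
     <code>n</code> for the vector, whereas each list insertion only links
     one new node.
     </para>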
1250 </section>
1251
1252 <section xml:id="manual.ext.profile_mode.analysis.list_to_vector" xreflabel="List to Vector"><info><title>List to Vector</title></info>
1253
1254 <itemizedlist>
1255   <listitem><para><emphasis>Switch:</emphasis>
1256   <code>_GLIBCXX_PROFILE_LIST_TO_VECTOR</code>.
1257   </para></listitem>
1258   <listitem><para><emphasis>Goal:</emphasis> Detect cases where
1259   <code>list</code> could be substituted with <code>vector</code> for
1260   better performance.
1261   </para></listitem>
1262   <listitem><para><emphasis>Fundamentals:</emphasis>
1263   Iterating through a vector is faster than through a list.
1264   </para></listitem>
1265   <listitem><para><emphasis>Sample runtime reduction:</emphasis> 64%.
1266   </para></listitem>
1267   <listitem><para><emphasis>Recommendation:</emphasis> Replace list with vector
1268   at site S.</para></listitem>
1269   <listitem><para><emphasis>To instrument:</emphasis> <code>list</code>
1270   operations and access methods.</para></listitem>
1271   <listitem><para><emphasis>Analysis:</emphasis>
1272   Issue the advice if there are no <code>insert</code> operations.
1273   </para></listitem>
1274   <listitem><para><emphasis>Cost model:</emphasis>
1275     (Sum(cost(vector::method)) - Sum(cost(list::method)), for
1276   method in [push_back, insert, erase])
1277   + (Cost(iterate vector) - Cost(iterate list))</para></listitem>
1278   <listitem><para><emphasis>Example:</emphasis>
1279 <programlisting>
1280 1  list&lt;int&gt; l;
1281 ...
1282 2  int sum = 0;
1283 3  for (list&lt;int&gt;::iterator it = l.begin(); it != l.end(); ++it) {
1284 4    sum += *it;
1285 5  }
1286
1287 foo.cc:1: advice: Changing "list" to "vector" will save about 1000000 indirect
1288 memory references.
1289 </programlisting>
1290 </para></listitem>
1291 </itemizedlist>
1292 </section>
1293
1294 <section xml:id="manual.ext.profile_mode.analysis.list_to_slist" xreflabel="List to Forward List"><info><title>List to Forward List (Slist)</title></info>
1295
1296 <itemizedlist>
1297   <listitem><para><emphasis>Switch:</emphasis>
1298   <code>_GLIBCXX_PROFILE_LIST_TO_SLIST</code>.
1299   </para></listitem>
1300   <listitem><para><emphasis>Goal:</emphasis> Detect cases where
1301   <code>list</code> could be substituted with <code>forward_list</code> for
1302   better performance.
1303   </para></listitem>
1304   <listitem><para><emphasis>Fundamentals:</emphasis>
1305   The memory footprint of a forward_list is smaller than that of a list.
1306   This has beneficial effects on memory subsystem, e.g., fewer cache misses.
1307   </para></listitem>
1308   <listitem><para><emphasis>Sample runtime reduction:</emphasis> 40%.
1309   Note that the reduction is only noticeable if the size of the forward_list
1310   node is in fact smaller than that of the list node.  For memory allocators
1311   with size classes, you will only notice an effect when the two node sizes
1312   belong to different allocator size classes.
1313   </para></listitem>
1314   <listitem><para><emphasis>Recommendation:</emphasis> Replace list with
1315   forward_list at site S.</para></listitem>
1316   <listitem><para><emphasis>To instrument:</emphasis> <code>list</code>
1317   operations and iteration methods.</para></listitem>
1318   <listitem><para><emphasis>Analysis:</emphasis>
1319   Issue the advice if there are no backward traversals
1320   or insertions before a given node.
1321   </para></listitem>
1322   <listitem><para><emphasis>Cost model:</emphasis>
1323   Always true.</para></listitem>
1324   <listitem><para><emphasis>Example:</emphasis>
1325 <programlisting>
1326 1  list&lt;int&gt; l;
1327 ...
1328 2  int sum = 0;
1329 3  for (list&lt;int&gt;::iterator it = l.begin(); it != l.end(); ++it) {
1330 4    sum += *it;
1331 5  }
1332
1333 foo.cc:1: advice: Change "list" to "forward_list".
1334 </programlisting>
1335 </para></listitem>
1336 </itemizedlist>
1337 </section>
1338
1339 <section xml:id="manual.ext.profile_mode.analysis.assoc_ord_to_unord" xreflabel="Ordered to Unordered Associative Container"><info><title>Ordered to Unordered Associative Container</title></info>
1340
1341 <itemizedlist>
1342   <listitem><para><emphasis>Switch:</emphasis>
1343   <code>_GLIBCXX_PROFILE_ORDERED_TO_UNORDERED</code>.
1344   </para></listitem>
1345   <listitem><para><emphasis>Goal:</emphasis>  Detect cases where ordered
1346   associative containers can be replaced with unordered ones.
1347   </para></listitem>
1348   <listitem><para><emphasis>Fundamentals:</emphasis>
1349   Insert and search are quicker in a hashtable than in
1350   a red-black tree.</para></listitem>
1351   <listitem><para><emphasis>Sample runtime reduction:</emphasis> 52%.
1352   </para></listitem>
1353   <listitem><para><emphasis>Recommendation:</emphasis>
1354   Replace set with unordered_set at site S.</para></listitem>
1355   <listitem><para><emphasis>To instrument:</emphasis>
1356   <code>set</code>, <code>multiset</code>, <code>map</code>,
1357   <code>multimap</code> methods.</para></listitem>
1358   <listitem><para><emphasis>Analysis:</emphasis>
1359   Issue the advice only if we are not using operator <code>++</code> on any
1360   iterator on a particular <code>[multi]set|map</code>.
1361   </para></listitem>
1362   <listitem><para><emphasis>Cost model:</emphasis>
1363   (Sum(cost(hashtable::method)) - Sum(cost(rbtree::method)), for
1364   method in [insert, erase, find])
1365   + (Cost(iterate hashtable) - Cost(iterate rbtree))</para></listitem>
1366   <listitem><para><emphasis>Example:</emphasis>
1367 <programlisting>
1368 1  set&lt;int&gt; s;
1369 2  for (int i = 0; i &lt; 100000; ++i) {
1370 3    s.insert(i);
1371 4  }
1372 5  long long sum = 0;
1373 6  for (int i = 0; i &lt; 100000; ++i) {
1374 7    sum += *s.find(i);
1375 8  }
1376 </programlisting>
1377 </para></listitem>
1378 </itemizedlist>
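     <para>
     Because the workload in the example never iterates in key order, the
     container type can be swapped without changing the result.  The
     following minimal sketch runs the same insert-then-find loop on either
     container.  <code>insert_then_find_sum</code> is a hypothetical helper;
     it accumulates into a <code>long long</code> because the sum for
     100000 elements does not fit in a 32-bit <code>int</code>.
     </para>
     <programlisting>
     #include &lt;set&gt;
     #include &lt;unordered_set&gt;

     // Hypothetical helper: insert 0..n-1, look every key up again, and
     // return the sum of the values found.  Works for any set-like container.
     template &lt;typename Set&gt;
     long long insert_then_find_sum(int n)
     {
       Set s;
       for (int i = 0; i != n; ++i)
         s.insert(i);
       long long sum = 0;
       for (int i = 0; i != n; ++i)
         sum += *s.find(i);  // every key was inserted, so find never returns end()
       return sum;
     }
     </programlisting>
     <para>
     Calling the helper with <code>std::set&lt;int&gt;</code> and with
     <code>std::unordered_set&lt;int&gt;</code> yields the same sum; only
     the cost of <code>insert</code> and <code>find</code> differs.
     </para>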
1379 </section>
1380
1381 </section>
1382
1383
1384
1385 <section xml:id="manual.ext.profile_mode.analysis.algorithms" xreflabel="Algorithms"><info><title>Algorithms</title></info>
1386
1387
1388   <para><emphasis>Switch:</emphasis>
1389   <code>_GLIBCXX_PROFILE_ALGORITHMS</code>.
1390   </para>
1391
1392 <section xml:id="manual.ext.profile_mode.analysis.algorithms.sort" xreflabel="Sorting"><info><title>Sort Algorithm Performance</title></info>
1393
1394 <itemizedlist>
1395   <listitem><para><emphasis>Switch:</emphasis>
1396   <code>_GLIBCXX_PROFILE_SORT</code>.
1397   </para></listitem>
1398   <listitem><para><emphasis>Goal:</emphasis> Give a measure of sort algorithm
1399   performance based on actual input.  For instance, advise Radix Sort over
1400   Quick Sort for a particular call context.
1401   </para></listitem>
1402   <listitem><para><emphasis>Fundamentals:</emphasis>
1403   See papers:
1404   <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://dl.acm.org/citation.cfm?doid=1065944.1065981">
1405   A framework for adaptive algorithm selection in STAPL</link> and
1406   <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://ieeexplore.ieee.org/document/4228227/">
1407   Optimizing Sorting with Machine Learning Algorithms</link>.
1408   </para></listitem>
1409   <listitem><para><emphasis>Sample runtime reduction:</emphasis> 60%.
1410   </para></listitem>
1411   <listitem><para><emphasis>Recommendation:</emphasis> Change sort algorithm
1412   at site S from X Sort to Y Sort.</para></listitem>
1413   <listitem><para><emphasis>To instrument:</emphasis> <code>sort</code>
1414   algorithm.</para></listitem>
1415   <listitem><para><emphasis>Analysis:</emphasis>
1416   Issue the advice if the cost model tells us that another sort algorithm
1417   would do better on this input.  Requires us to know what algorithm we
1418   are using in our sort implementation in release mode.</para></listitem>
1419   <listitem><para><emphasis>Cost model:</emphasis>
1420   Runtime(algo) for algo in [radix, quick, merge, ...]</para></listitem>
1421   <listitem><para><emphasis>Example:</emphasis>
1422 <programlisting>
1423 </programlisting>
1424 </para></listitem>
1425 </itemizedlist>
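     <para>
     To make the goal concrete, the following minimal sketch shows the kind
     of alternative such advice could point to: a byte-wise radix sort for
     32-bit unsigned keys that can be used in place of
     <code>std::sort</code> at a given call site.  It is an illustration
     only and not part of the profile mode; <code>radix_sort</code> is a
     hypothetical function, and whether it is faster depends on the input
     and the machine.
     </para>
     <programlisting>
     #include &lt;algorithm&gt;
     #include &lt;cstddef&gt;
     #include &lt;cstdint&gt;
     #include &lt;vector&gt;

     // Least-significant-digit radix sort on 32-bit unsigned keys, one byte
     // per pass: four linear passes, no key comparisons.
     void radix_sort(std::vector&lt;std::uint32_t&gt;&amp; keys)
     {
       std::vector&lt;std::uint32_t&gt; buffer(keys.size());
       for (int shift = 0; shift != 32; shift += 8) {
         std::size_t count[257] = {0};
         // Histogram of the current byte, stored one slot to the right.
         for (std::uint32_t key : keys)
           ++count[((key &gt;&gt; shift) &amp; 0xFFu) + 1];
         // Prefix sum turns the histogram into starting offsets.
         for (int digit = 0; digit != 256; ++digit)
           count[digit + 1] += count[digit];
         // Stable scatter into the buffer, then make it the current sequence.
         for (std::uint32_t key : keys)
           buffer[count[(key &gt;&gt; shift) &amp; 0xFFu]++] = key;
         keys.swap(buffer);
       }
     }
     </programlisting>
     <para>
     The result is the same ordering that <code>std::sort</code> produces
     for these keys, so the two can be timed against each other on real
     input, which is the comparison the cost model above describes.
     </para>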
1426 </section>
1427
1428 </section>
1429
1430
1431 <section xml:id="manual.ext.profile_mode.analysis.locality" xreflabel="Data Locality"><info><title>Data Locality</title></info>
1432
1433
1434   <para><emphasis>Switch:</emphasis>
1435   <code>_GLIBCXX_PROFILE_LOCALITY</code>.
1436   </para>
1437
1438 <section xml:id="manual.ext.profile_mode.analysis.locality.sw_prefetch" xreflabel="Need Software Prefetch"><info><title>Need Software Prefetch</title></info>
1439
1440 <itemizedlist>
1441   <listitem><para><emphasis>Switch:</emphasis>
1442   <code>_GLIBCXX_PROFILE_SOFTWARE_PREFETCH</code>.
1443   </para></listitem>
1444   <listitem><para><emphasis>Goal:</emphasis> Discover sequences of indirect
1445   memory accesses that are not regular, thus cannot be predicted by
1446   hardware prefetchers.
1447   </para></listitem>
1448   <listitem><para><emphasis>Fundamentals:</emphasis>
1449   Indirect references are hard to predict and are very expensive when they
1450   miss in caches.</para></listitem>
1451   <listitem><para><emphasis>Sample runtime reduction:</emphasis> 25%.
1452   </para></listitem>
1453   <listitem><para><emphasis>Recommendation:</emphasis> Insert prefetch
1454   instruction.</para></listitem>
1455   <listitem><para><emphasis>To instrument:</emphasis> Vector iterator and
1456   access operator [].
1457   </para></listitem>
1458   <listitem><para><emphasis>Analysis:</emphasis>
1459   First, get cache line size and page size from system.
1460   Then record iterator dereference sequences for which the value is a pointer.
1461   For each sequence within a container, issue a warning if successive pointer
1462   addresses do not fall within the same cache line and do not form a linear
1463   pattern (otherwise they may be prefetched by hardware).
1464   If they also step across page boundaries, make the warning stronger.
1465   </para>
1466   <para>The same analysis applies to containers other than vector.
1467   However, we cannot give the same advice for linked structures, such as list,
1468   as there is no random access to the n-th element.  The user may still be
1469   able to benefit from this information, for instance by employing fibers
1470   (user-level lightweight threads) to hide the latency of chasing pointers.
1471   </para>
1472   <para>
1473   This analysis is a little oversimplified.  A better cost model could be
1474   created by understanding the capability of the hardware prefetcher.
1475   This model could be trained automatically by running a set of synthetic
1476   cases.
1477   </para>
1478   </listitem>
1479   <listitem><para><emphasis>Cost model:</emphasis>
1480   Total distance between pointer values of successive elements in vectors
1481   of pointers.</para></listitem>
1482   <listitem><para><emphasis>Example:</emphasis>
1483 <programlisting>
1484 1 int zero = 0;
1485 2 vector&lt;int*&gt; v(10000000, &amp;zero);
1486 3 for (int k = 0; k &lt; 10000000; ++k) {
1487 4   v[random() % 10000000] = new int(k);
1488 5 }
1489 6 for (int j = 0; j &lt; 10000000; ++j) {
1490 7   count += (*v[j] == 0 ? 0 : 1);
1491 8 }
1492
1493 foo.cc:7: advice: Insert prefetch instruction.
1494 </programlisting>
1495 </para></listitem>
1496 </itemizedlist>
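<para>
The analysis described above can be sketched as a scan over the recorded
pointer values.  This is an illustration under stated assumptions, not the
profile mode implementation: <code>PrefetchReport</code> and
<code>scan_pointer_sequence</code> are hypothetical names, the cache line and
page sizes are passed in by the caller, and a hop is treated as linear when it
repeats the stride of the previous hop.
</para>
<programlisting language="c++"><![CDATA[
```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical summary of one sequence of dereferenced pointer values.
struct PrefetchReport {
  std::size_t irregular_hops;  // hops that leave the line and break the stride
  std::size_t page_crossings;  // irregular hops that also change page
};

// Hypothetical helper: the first hop only establishes a stride; every later
// hop is counted when it leaves the current cache line and does not repeat
// the previous stride.
PrefetchReport scan_pointer_sequence(const std::vector<std::uintptr_t>& addrs,
                                     std::size_t line_size,
                                     std::size_t page_size) {
  PrefetchReport report = {0, 0};
  for (std::size_t i = 2; i < addrs.size(); ++i) {
    bool same_line = addrs[i] / line_size == addrs[i - 1] / line_size;
    bool linear = addrs[i] - addrs[i - 1] == addrs[i - 1] - addrs[i - 2];
    if (same_line || linear) continue;  // hardware prefetch likely copes
    ++report.irregular_hops;
    if (addrs[i] / page_size != addrs[i - 1] / page_size)
      ++report.page_crossings;  // candidate for the stronger warning
  }
  return report;
}
```
]]></programlisting>
<para>
A warning would then be issued when <code>irregular_hops</code> is large
relative to the sequence length; the threshold is left open here, as it is in
the description above.
</para>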
1497 </section>
1498
1499 <section xml:id="manual.ext.profile_mode.analysis.locality.linked" xreflabel="Linked Structure Locality"><info><title>Linked Structure Locality</title></info>
1500
1501 <itemizedlist>
1502   <listitem><para><emphasis>Switch:</emphasis>
1503   <code>_GLIBCXX_PROFILE_RBTREE_LOCALITY</code>.
1504   </para></listitem>
1505   <listitem><para><emphasis>Goal:</emphasis> Give measure of locality of
1506   objects stored in linked structures (lists, red-black trees and hashtables)
1507   with respect to their actual traversal patterns.
1508   </para></listitem>
1509   <listitem><para><emphasis>Fundamentals:</emphasis> Allocation can be tuned
1510   to a specific traversal pattern, to result in better data locality.
1511   See paper:
1512   <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://parasol.tamu.edu/publications/download.php?file_id=570">
1513   Custom Memory Allocation for Free</link> by Jula and Rauchwerger.
1514   </para></listitem>
1515   <listitem><para><emphasis>Sample runtime reduction:</emphasis> 30%.
1516   </para></listitem>
1517   <listitem><para><emphasis>Recommendation:</emphasis>
1518   High scatter score N for container built at site S.
1519   Consider changing the allocation sequence or choosing a structure-conscious
1520   allocator.</para></listitem>
1521   <listitem><para><emphasis>To instrument:</emphasis> Methods of all
1522   containers using linked structures.</para></listitem>
1523   <listitem><para><emphasis>Analysis:</emphasis>
1524   First, get cache line size and page size from system.
1525   Then record the number of successive elements that are on a different cache
1526   line or page, for each traversal method such as <code>find</code>.  Give advice
1527   only if the ratio between this number and the total number of node hops
1528   is above a threshold.</para></listitem>
1529   <listitem><para><emphasis>Cost model:</emphasis>
1530   Sum(same_cache_line(this,previous))</para></listitem>
1531   <listitem><para><emphasis>Example:</emphasis>
1532 <programlisting>
1533  1  set&lt;int&gt; s;
1534  2  for (int i = 0; i &lt; 10000000; ++i) {
1535  3    s.insert(i);
1536  4  }
1537  5  set&lt;int&gt; s1, s2;
1538  6  for (int i = 0; i &lt; 10000000; ++i) {
1539  7    s1.insert(i);
1540  8    s2.insert(i);
1541  9  }
1542 ...
1543       // Fast, better locality.
1544 10    for (set&lt;int&gt;::iterator it = s.begin(); it != s.end(); ++it) {
1545 11      sum += *it;
1546 12    }
1547       // Slow, elements are further apart.
1548 13    for (set&lt;int&gt;::iterator it = s1.begin(); it != s1.end(); ++it) {
1549 14      sum += *it;
1550 15    }
1551
1552 foo.cc:5: advice: High scatter score NNN for set built here.  Consider changing
1553 the allocation sequence or switching to a structure conscious allocator.
1554 </programlisting>
1555 </para></listitem>
1556 </itemizedlist>
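<para>
The ratio used in the analysis above can be sketched as follows.  The names
<code>scatter_score</code> and <code>traversal_addresses</code> are
hypothetical, the score only considers cache lines (not pages), and element
addresses stand in for node addresses.
</para>
<programlisting language="c++"><![CDATA[
```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: fraction of hops in a traversal that land on a
// different cache line than the previous element.
double scatter_score(const std::vector<std::uintptr_t>& addrs,
                     std::size_t line_size) {
  if (addrs.size() < 2) return 0.0;
  std::size_t scattered = 0;
  for (std::size_t i = 1; i < addrs.size(); ++i)
    if (addrs[i] / line_size != addrs[i - 1] / line_size) ++scattered;
  return static_cast<double>(scattered) / (addrs.size() - 1);
}

// Hypothetical helper: collect element addresses in iteration order.
template <typename Container>
std::vector<std::uintptr_t> traversal_addresses(const Container& c) {
  std::vector<std::uintptr_t> addrs;
  for (typename Container::const_iterator it = c.begin(); it != c.end(); ++it)
    addrs.push_back(reinterpret_cast<std::uintptr_t>(&*it));
  return addrs;
}
```
]]></programlisting>
<para>
For instance, <code>scatter_score(traversal_addresses(s1), 64)</code> would
give the score for the set <code>s1</code> from the example, assuming a
64-byte cache line.
</para>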
1557 </section>
1558
1559 </section>
1560
1561
1562 <section xml:id="manual.ext.profile_mode.analysis.mthread" xreflabel="Multithreaded Data Access"><info><title>Multithreaded Data Access</title></info>
1563
1564
1565   <para>
1566   The diagnostics in this group are not meant to be implemented in the short term.
1567   They require compiler support to know when container elements are written
1568   to.  Instrumentation can only tell us when elements are referenced.
1569   </para>
1570
1571   <para><emphasis>Switch:</emphasis>
1572   <code>_GLIBCXX_PROFILE_MULTITHREADED</code>.
1573   </para>
1574
1575 <section xml:id="manual.ext.profile_mode.analysis.mthread.ddtest" xreflabel="Dependence Violations at Container Level"><info><title>Data Dependence Violations at Container Level</title></info>
1576
1577 <itemizedlist>
1578   <listitem><para><emphasis>Switch:</emphasis>
1579   <code>_GLIBCXX_PROFILE_DDTEST</code>.
1580   </para></listitem>
1581   <listitem><para><emphasis>Goal:</emphasis> Detect container elements
1582   that are referenced from multiple threads within a parallel region or
1583   across parallel regions.
1584   </para></listitem>
1585   <listitem><para><emphasis>Fundamentals:</emphasis>
1586   Sharing data between threads requires communication and perhaps locking,
1587   which may be expensive.
1588   </para></listitem>
1589   <listitem><para><emphasis>Sample runtime reduction:</emphasis> ?%.
1590   </para></listitem>
1591   <listitem><para><emphasis>Recommendation:</emphasis> Change data
1592   distribution or parallel algorithm.</para></listitem>
1593   <listitem><para><emphasis>To instrument:</emphasis> Container access methods
1594   and iterators.
1595   </para></listitem>
1596   <listitem><para><emphasis>Analysis:</emphasis>
1597   Keep a shadow for each container.  Record iterator dereferences and
1598   container member accesses.  Issue advice for elements referenced by
1599   multiple threads.
1600   See paper: <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://dl.acm.org/citation.cfm?id=207110.207148">
1601   The LRPD test: speculative run-time parallelization of loops with
1602   privatization and reduction parallelization</link>.
1603   </para></listitem>
1604   <listitem><para><emphasis>Cost model:</emphasis>
1605   Number of accesses to elements referenced from multiple threads.
1606   </para></listitem>
1607   <listitem><para><emphasis>Example:</emphasis>
1608 <programlisting>
1609 </programlisting>
1610 </para></listitem>
1611 </itemizedlist>
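<para>
The shadow structure described in the analysis can be sketched as a map from
element index to the set of threads that referenced it.  This is an
illustration only: <code>AccessShadow</code> is a hypothetical class, thread
identities are plain integers supplied by the caller, and the class itself is
not thread-safe, so a real instrumentation library would need to synchronize
calls to <code>record</code>.
</para>
<programlisting language="c++"><![CDATA[
```cpp
#include <cstddef>
#include <map>
#include <set>

// Hypothetical shadow for one container.
class AccessShadow {
 private:
  struct Entry {
    std::set<int> threads;  // ids of the threads that referenced the element
    std::size_t accesses;   // total references to the element
    Entry() : accesses(0) {}
  };
  std::map<std::size_t, Entry> entries_;

 public:
  // Record one reference to element `index` made by thread `tid`.
  void record(std::size_t index, int tid) {
    Entry& e = entries_[index];
    e.threads.insert(tid);
    ++e.accesses;
  }

  // Cost model: number of accesses to elements referenced from more than
  // one thread.
  std::size_t shared_access_count() const {
    std::size_t total = 0;
    for (std::map<std::size_t, Entry>::const_iterator it = entries_.begin();
         it != entries_.end(); ++it)
      if (it->second.threads.size() > 1) total += it->second.accesses;
    return total;
  }
};
```
]]></programlisting>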
1612 </section>
1613
1614 <section xml:id="manual.ext.profile_mode.analysis.mthread.false_share" xreflabel="False Sharing"><info><title>False Sharing</title></info>
1615
1616 <itemizedlist>
1617   <listitem><para><emphasis>Switch:</emphasis>
1618   <code>_GLIBCXX_PROFILE_FALSE_SHARING</code>.
1619   </para></listitem>
1620   <listitem><para><emphasis>Goal:</emphasis> Detect elements in the
1621   same container which share a cache line, are written by at least one
1622   thread, and accessed by different threads.
1623   </para></listitem>
1624   <listitem><para><emphasis>Fundamentals:</emphasis> Under these conditions,
1625   cache coherence protocols require
1626   communication to invalidate lines, which may be expensive.
1627   </para></listitem>
1628   <listitem><para><emphasis>Sample runtime reduction:</emphasis> 68%.
1629   </para></listitem>
1630   <listitem><para><emphasis>Recommendation:</emphasis> Reorganize container
1631   or use padding to avoid false sharing.</para></listitem>
1632   <listitem><para><emphasis>To instrument:</emphasis> Container access methods
1633   and iterators.
1634   </para></listitem>
1635   <listitem><para><emphasis>Analysis:</emphasis>
1636   First, get the cache line size.
1637   For each shared container, record all the associated iterator dereferences
1638   and member access methods with the thread id.  Compare the address lists
1639   across threads to detect references in two different threads to the same
1640   cache line.  Issue a warning only if the ratio to total references is
1641   significant.  Do the same for iterator dereference values if they are
1642   pointers.</para></listitem>
1643   <listitem><para><emphasis>Cost model:</emphasis>
1644   Number of accesses to same cache line from different threads.
1645   </para></listitem>
1646   <listitem><para><emphasis>Example:</emphasis>
1647 <programlisting>
1648 1     vector&lt;int&gt; v(2, 0);
1649 2 #pragma omp parallel for shared(v, SIZE) schedule(static, 1)
1650 3     for (i = 0; i &lt; SIZE; ++i) {
1651 4       v[i % 2] += i;
1652 5     }
1653
1654 OMP_NUM_THREADS=2 ./a.out
1655 foo.cc:1: advice: Change container structure or padding to avoid false
1656 sharing in multithreaded access at foo.cc:4.  Detected N shared cache lines.
1657 </programlisting>
1658 </para></listitem>
1659 </itemizedlist>
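<para>
The padding recommendation can be illustrated with an element type that
occupies a full cache line, so that the counters of two adjacent elements
never fall in the same line.  This is a minimal sketch under assumptions:
<code>PaddedCounter</code>, <code>same_cache_line</code> and
<code>kAssumedLineSize</code> are hypothetical names, and the 64-byte line
size is a common value rather than one queried from the system.
</para>
<programlisting language="c++"><![CDATA[
```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed cache line size; 64 bytes is common but not universal.
const std::size_t kAssumedLineSize = 64;

// Hypothetical padded element: the counter plus enough padding to fill one
// cache line, so adjacent counters are kAssumedLineSize bytes apart.
struct PaddedCounter {
  long value;
  char pad[kAssumedLineSize - sizeof(long)];
  PaddedCounter() : value(0) {}
};

// True when the two addresses fall in the same cache line.
inline bool same_cache_line(const void* a, const void* b,
                            std::size_t line_size) {
  return reinterpret_cast<std::uintptr_t>(a) / line_size ==
         reinterpret_cast<std::uintptr_t>(b) / line_size;
}
```
]]></programlisting>
<para>
Replacing <code>vector&lt;int&gt; v(2, 0)</code> in the example with
<code>std::vector&lt;PaddedCounter&gt; v(2)</code> and updating
<code>v[i % 2].value</code> keeps each thread's counter in its own line, at
the cost of extra memory per element.
</para>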
1660 </section>
1661
1662 </section>
1663
1664
1665 <section xml:id="manual.ext.profile_mode.analysis.statistics" xreflabel="Statistics"><info><title>Statistics</title></info>
1666
1667
1668 <para>
1669 <emphasis>Switch:</emphasis>
1670   <code>_GLIBCXX_PROFILE_STATISTICS</code>.
1671 </para>
1672
1673 <para>
1674   In some cases the cost model may not tell us anything because the costs
1675   appear to offset the benefits.  Consider the choice between a vector and
1676   a list.  When there are both inserts and iteration, advice may not be
1677   issued automatically.  However, the programmer may still be able to make use
1678   of this information in a different way.
1679 </para>
1680 <para>
1681   This diagnostic will not issue any advice, but it will print statistics for
1682   each container construction site.  The statistics will contain the cost
1683   of each operation actually performed on the container.
1684 </para>
1685
1686 </section>
1687
1688
1689 </section>
1690
1691
1692 <bibliography xml:id="profile_mode.biblio"><info><title>Bibliography</title></info>
1693
1694
1695   <biblioentry>
1696     <citetitle>
1697       Perflint: A Context Sensitive Performance Advisor for C++ Programs
1698     </citetitle>
1699
1700     <author><personname><firstname>Lixia</firstname><surname>Liu</surname></personname></author>
1701     <author><personname><firstname>Silvius</firstname><surname>Rus</surname></personname></author>
1702
1703     <copyright>
1704       <year>2009</year>
1705       <holder/>
1706     </copyright>
1707
1708     <publisher>
1709       <publishername>
1710         Proceedings of the 2009 International Symposium on Code Generation
1711         and Optimization
1712       </publishername>
1713     </publisher>
1714   </biblioentry>
1715 </bibliography>
1716
1717
1718 </chapter>