tools-src/gnu/binutils/gprof/gprof.info-2

   1 This is gprof.info, produced by makeinfo version 4.2 from gprof.texi.
   2
   3 START-INFO-DIR-ENTRY
   4 * gprof: (gprof).                Profiling your program's execution
   5 END-INFO-DIR-ENTRY
   6
   7    This file documents the gprof profiler of the GNU system.
   8
   9    Copyright (C) 1988, 92, 97, 98, 99, 2000, 2001 Free Software
  10 Foundation, Inc.
  11
  12    Permission is granted to copy, distribute and/or modify this document
  13 under the terms of the GNU Free Documentation License, Version 1.1 or
  14 any later version published by the Free Software Foundation; with no
  15 Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
  16 Texts.  A copy of the license is included in the section entitled "GNU
  17 Free Documentation License".
  18
  19 \x1f
  20 File: gprof.info,  Node: Line-by-line,  Next: Annotated Source,  Prev: Call Graph,  Up: Output
  21
  22 Line-by-line Profiling
  23 ======================
  24
  25    `gprof''s `-l' option causes the program to perform "line-by-line"
  26 profiling.  In this mode, histogram samples are assigned not to
  27 functions, but to individual lines of source code.  The program usually
  28 must be compiled with a `-g' option, in addition to `-pg', in order to
  29 generate debugging symbols for tracking source code lines.
  30
  31    The flat profile is the most useful output table in line-by-line
  32 mode.  The call graph isn't as useful as normal, since the current
  33 version of `gprof' does not propagate call graph arcs from source code
  34 lines to the enclosing function.  The call graph does, however, show
  35 each line of code that called each function, along with a count.
  36
  37    Here is a section of `gprof''s output, without line-by-line
  38 profiling.  Note that `ct_init' accounted for four histogram hits, and
  39 13327 calls to `init_block'.
  40
  41      Flat profile:
  42
  43      Each sample counts as 0.01 seconds.
  44        %   cumulative   self              self     total
  45       time   seconds   seconds    calls  us/call  us/call  name
  46       30.77      0.13     0.04     6335     6.31     6.31  ct_init
  47
  48
  49                      Call graph (explanation follows)
  50
  51
  52      granularity: each sample hit covers 4 byte(s) for 7.69% of 0.13 seconds
  53
  54      index % time    self  children    called     name
  55
  56                      0.00    0.00       1/13496       name_too_long
  57                      0.00    0.00      40/13496       deflate
  58                      0.00    0.00     128/13496       deflate_fast
  59                      0.00    0.00   13327/13496       ct_init
  60      [7]      0.0    0.00    0.00   13496         init_block
  61
  62    Now let's look at some of `gprof''s output from the same program run,
  63 this time with line-by-line profiling enabled.  Note that `ct_init''s
  64 four histogram hits are broken down into four lines of source code -
  65 one hit occurred on each of lines 349, 351, 382 and 385.  In the call
  66 graph, note how `ct_init''s 13327 calls to `init_block' are broken down
  67 into one call from line 396, 3071 calls from line 384, 3730 calls from
  68 line 385, and 6525 calls from 387.
  69
  70      Flat profile:
  71
  72      Each sample counts as 0.01 seconds.
  73        %   cumulative   self
  74       time   seconds   seconds    calls  name
  75        7.69      0.10     0.01           ct_init (trees.c:349)
  76        7.69      0.11     0.01           ct_init (trees.c:351)
  77        7.69      0.12     0.01           ct_init (trees.c:382)
  78        7.69      0.13     0.01           ct_init (trees.c:385)
  79
  80
  81                      Call graph (explanation follows)
  82
  83
  84      granularity: each sample hit covers 4 byte(s) for 7.69% of 0.13 seconds
  85
  86        % time    self  children    called     name
  87
  88                  0.00    0.00       1/13496       name_too_long (gzip.c:1440)
  89                  0.00    0.00       1/13496       deflate (deflate.c:763)
  90                  0.00    0.00       1/13496       ct_init (trees.c:396)
  91                  0.00    0.00       2/13496       deflate (deflate.c:727)
  92                  0.00    0.00       4/13496       deflate (deflate.c:686)
  93                  0.00    0.00       5/13496       deflate (deflate.c:675)
  94                  0.00    0.00      12/13496       deflate (deflate.c:679)
  95                  0.00    0.00      16/13496       deflate (deflate.c:730)
  96                  0.00    0.00     128/13496       deflate_fast (deflate.c:654)
  97                  0.00    0.00    3071/13496       ct_init (trees.c:384)
  98                  0.00    0.00    3730/13496       ct_init (trees.c:385)
  99                  0.00    0.00    6525/13496       ct_init (trees.c:387)
 100      [6]  0.0    0.00    0.00   13496         init_block (trees.c:408)
 101
 102 \x1f
 103 File: gprof.info,  Node: Annotated Source,  Prev: Line-by-line,  Up: Output
 104
 105 The Annotated Source Listing
 106 ============================
 107
 108    `gprof''s `-A' option triggers an annotated source listing, which
 109 lists the program's source code, each function labeled with the number
 110 of times it was called.  You may also need to specify the `-I' option,
 111 if `gprof' can't find the source code files.
 112
 113    Compiling with `gcc ... -g -pg -a' augments your program with
 114 basic-block counting code, in addition to function counting code.  This
 115 enables `gprof' to determine how many times each line of code was
 116 executed.  For example, consider the following function, taken from
 117 gzip, with line numbers added:
 118
 119       1 ulg updcrc(s, n)
 120       2     uch *s;
 121       3     unsigned n;
 122       4 {
 123       5     register ulg c;
 124       6
 125       7     static ulg crc = (ulg)0xffffffffL;
 126       8
 127       9     if (s == NULL) {
 128      10         c = 0xffffffffL;
 129      11     } else {
 130      12         c = crc;
 131      13         if (n) do {
 132      14             c = crc_32_tab[...];
 133      15         } while (--n);
 134      16     }
 135      17     crc = c;
 136      18     return c ^ 0xffffffffL;
 137      19 }
 138
 139    `updcrc' has at least five basic-blocks.  One is the function
 140 itself.  The `if' statement on line 9 generates two more basic-blocks,
 141 one for each branch of the `if'.  A fourth basic-block results from the
 142 `if' on line 13, and the contents of the `do' loop form the fifth
 143 basic-block.  The compiler may also generate additional basic-blocks to
 144 handle various special cases.
 145
 146    A program augmented for basic-block counting can be analyzed with
 147 `gprof -l -A'.  I also suggest use of the `-x' option, which ensures
 148 that each line of code is labeled at least once.  Here is `updcrc''s
 149 annotated source listing for a sample `gzip' run:
 150
 151                      ulg updcrc(s, n)
 152                          uch *s;
 153                          unsigned n;
 154                  2 ->{
 155                          register ulg c;
 156
 157                          static ulg crc = (ulg)0xffffffffL;
 158
 159                  2 ->    if (s == NULL) {
 160                  1 ->   c = 0xffffffffL;
 161                  1 ->    } else {
 162                  1 ->   c = crc;
 163                  1 ->        if (n) do {
 164              26312 ->            c = crc_32_tab[...];
 165      26312,1,26311 ->        } while (--n);
 166                          }
 167                  2 ->    crc = c;
 168                  2 ->    return c ^ 0xffffffffL;
 169                  2 ->}
 170
 171    In this example, the function was called twice, passing once through
 172 each branch of the `if' statement.  The body of the `do' loop was
 173 executed a total of 26312 times.  Note how the `while' statement is
 174 annotated.  It began execution 26312 times, once for each iteration
 175 through the loop.  One of those times (the last time) it exited, while
 176 it branched back to the beginning of the loop 26311 times.
 177
 178 \x1f
 179 File: gprof.info,  Node: Inaccuracy,  Next: How do I?,  Prev: Output,  Up: Top
 180
 181 Inaccuracy of `gprof' Output
 182 ****************************
 183
 184 * Menu:
 185
 186 * Sampling Error::      Statistical margins of error
 187 * Assumptions::         Estimating children times
 188
 189 \x1f
 190 File: gprof.info,  Node: Sampling Error,  Next: Assumptions,  Up: Inaccuracy
 191
 192 Statistical Sampling Error
 193 ==========================
 194
 195    The run-time figures that `gprof' gives you are based on a sampling
 196 process, so they are subject to statistical inaccuracy.  If a function
 197 runs only a small amount of time, so that on the average the sampling
 198 process ought to catch that function in the act only once, there is a
 199 pretty good chance it will actually find that function zero times, or
 200 twice.
 201
 202    By contrast, the number-of-calls and basic-block figures are derived
 203 by counting, not sampling.  They are completely accurate and will not
 204 vary from run to run if your program is deterministic.
 205
 206    The "sampling period" that is printed at the beginning of the flat
 207 profile says how often samples are taken.  The rule of thumb is that a
 208 run-time figure is accurate if it is considerably bigger than the
 209 sampling period.
 210
 211    The actual amount of error can be predicted.  For N samples, the
 212 _expected_ error is the square-root of N.  For example, if the sampling
 213 period is 0.01 seconds and `foo''s run-time is 1 second, N is 100
 214 samples (1 second/0.01 seconds), sqrt(N) is 10 samples, so the expected
 215 error in `foo''s run-time is 0.1 seconds (10*0.01 seconds), or ten
 216 percent of the observed value.  Again, if the sampling period is 0.01
 217 seconds and `bar''s run-time is 100 seconds, N is 10000 samples,
 218 sqrt(N) is 100 samples, so the expected error in `bar''s run-time is 1
 219 second, or one percent of the observed value.  It is likely to vary
 220 this much _on the average_ from one profiling run to the next.
 221 (_Sometimes_ it will vary more.)
 222
 223    This does not mean that a small run-time figure is devoid of
 224 information.  If the program's _total_ run-time is large, a small
 225 run-time for one function does tell you that that function used an
 226 insignificant fraction of the whole program's time.  Usually this means
 227 it is not worth optimizing.
 228
 229    One way to get more accuracy is to give your program more (but
 230 similar) input data so it will take longer.  Another way is to combine
 231 the data from several runs, using the `-s' option of `gprof'.  Here is
 232 how:
 233
 234   1. Run your program once.
 235
 236   2. Issue the command `mv gmon.out gmon.sum'.
 237
 238   3. Run your program again, the same as before.
 239
 240   4. Merge the new data in `gmon.out' into `gmon.sum' with this command:
 241
 242           gprof -s EXECUTABLE-FILE gmon.out gmon.sum
 243
 244   5. Repeat the last two steps as often as you wish.
 245
 246   6. Analyze the cumulative data using this command:
 247
 248           gprof EXECUTABLE-FILE gmon.sum > OUTPUT-FILE
 249
 250 \x1f
 251 File: gprof.info,  Node: Assumptions,  Prev: Sampling Error,  Up: Inaccuracy
 252
 253 Estimating `children' Times
 254 ===========================
 255
 256    Some of the figures in the call graph are estimates--for example, the
 257 `children' time values and all the the time figures in caller and
 258 subroutine lines.
 259
 260    There is no direct information about these measurements in the
 261 profile data itself.  Instead, `gprof' estimates them by making an
 262 assumption about your program that might or might not be true.
 263
 264    The assumption made is that the average time spent in each call to
 265 any function `foo' is not correlated with who called `foo'.  If `foo'
 266 used 5 seconds in all, and 2/5 of the calls to `foo' came from `a',
 267 then `foo' contributes 2 seconds to `a''s `children' time, by
 268 assumption.
 269
 270    This assumption is usually true enough, but for some programs it is
 271 far from true.  Suppose that `foo' returns very quickly when its
 272 argument is zero; suppose that `a' always passes zero as an argument,
 273 while other callers of `foo' pass other arguments.  In this program,
 274 all the time spent in `foo' is in the calls from callers other than `a'.
 275 But `gprof' has no way of knowing this; it will blindly and incorrectly
 276 charge 2 seconds of time in `foo' to the children of `a'.
 277
 278    We hope some day to put more complete data into `gmon.out', so that
 279 this assumption is no longer needed, if we can figure out how.  For the
 280 nonce, the estimated figures are usually more useful than misleading.
 281
 282 \x1f
 283 File: gprof.info,  Node: How do I?,  Next: Incompatibilities,  Prev: Inaccuracy,  Up: Top
 284
 285 Answers to Common Questions
 286 ***************************
 287
 288 How do I find which lines in my program were executed the most times?
 289      Compile your program with basic-block counting enabled, run it,
 290      then use the following pipeline:
 291
 292           gprof -l -C OBJFILE | sort -k 3 -n -r
 293
 294      This listing will show you the lines in your code executed most
 295      often, but not necessarily those that consumed the most time.
 296
 297 How do I find which lines in my program called a particular function?
 298      Use `gprof -l' and lookup the function in the call graph.  The
 299      callers will be broken down by function and line number.
 300
 301 How do I analyze a program that runs for less than a second?
 302      Try using a shell script like this one:
 303
 304           for i in `seq 1 100`; do
 305             fastprog
 306             mv gmon.out gmon.out.$i
 307           done
 308
 309           gprof -s fastprog gmon.out.*
 310
 311           gprof fastprog gmon.sum
 312
 313      If your program is completely deterministic, all the call counts
 314      will be simple multiples of 100 (i.e. a function called once in
 315      each run will appear with a call count of 100).
 316
 317 \x1f
 318 File: gprof.info,  Node: Incompatibilities,  Next: Details,  Prev: How do I?,  Up: Top
 319
 320 Incompatibilities with Unix `gprof'
 321 ***********************************
 322
 323    GNU `gprof' and Berkeley Unix `gprof' use the same data file
 324 `gmon.out', and provide essentially the same information.  But there
 325 are a few differences.
 326
 327    * GNU `gprof' uses a new, generalized file format with support for
 328      basic-block execution counts and non-realtime histograms.  A magic
 329      cookie and version number allows `gprof' to easily identify new
 330      style files.  Old BSD-style files can still be read.  *Note File
 331      Format::.
 332
 333    * For a recursive function, Unix `gprof' lists the function as a
 334      parent and as a child, with a `calls' field that lists the number
 335      of recursive calls.  GNU `gprof' omits these lines and puts the
 336      number of recursive calls in the primary line.
 337
 338    * When a function is suppressed from the call graph with `-e', GNU
 339      `gprof' still lists it as a subroutine of functions that call it.
 340
 341    * GNU `gprof' accepts the `-k' with its argument in the form
 342      `from/to', instead of `from to'.
 343
 344    * In the annotated source listing, if there are multiple basic
 345      blocks on the same line, GNU `gprof' prints all of their counts,
 346      separated by commas.
 347
 348    * The blurbs, field widths, and output formats are different.  GNU
 349      `gprof' prints blurbs after the tables, so that you can see the
 350      tables without skipping the blurbs.
 351
 352 \x1f
 353 File: gprof.info,  Node: Details,  Next: GNU Free Documentation License,  Prev: Incompatibilities,  Up: Top
 354
 355 Details of Profiling
 356 ********************
 357
 358 * Menu:
 359
 360 * Implementation::      How a program collects profiling information
 361 * File Format::         Format of `gmon.out' files
 362 * Internals::           `gprof''s internal operation
 363 * Debugging::           Using `gprof''s `-d' option
 364
 365 \x1f
 366 File: gprof.info,  Node: Implementation,  Next: File Format,  Up: Details
 367
 368 Implementation of Profiling
 369 ===========================
 370
 371    Profiling works by changing how every function in your program is
 372 compiled so that when it is called, it will stash away some information
 373 about where it was called from.  From this, the profiler can figure out
 374 what function called it, and can count how many times it was called.
 375 This change is made by the compiler when your program is compiled with
 376 the `-pg' option, which causes every function to call `mcount' (or
 377 `_mcount', or `__mcount', depending on the OS and compiler) as one of
 378 its first operations.
 379
 380    The `mcount' routine, included in the profiling library, is
 381 responsible for recording in an in-memory call graph table both its
 382 parent routine (the child) and its parent's parent.  This is typically
 383 done by examining the stack frame to find both the address of the
 384 child, and the return address in the original parent.  Since this is a
 385 very machine-dependent operation, `mcount' itself is typically a short
 386 assembly-language stub routine that extracts the required information,
 387 and then calls `__mcount_internal' (a normal C function) with two
 388 arguments - `frompc' and `selfpc'.  `__mcount_internal' is responsible
 389 for maintaining the in-memory call graph, which records `frompc',
 390 `selfpc', and the number of times each of these call arcs was traversed.
 391
 392    GCC Version 2 provides a magical function
 393 (`__builtin_return_address'), which allows a generic `mcount' function
 394 to extract the required information from the stack frame.  However, on
 395 some architectures, most notably the SPARC, using this builtin can be
 396 very computationally expensive, and an assembly language version of
 397 `mcount' is used for performance reasons.
 398
 399    Number-of-calls information for library routines is collected by
 400 using a special version of the C library.  The programs in it are the
 401 same as in the usual C library, but they were compiled with `-pg'.  If
 402 you link your program with `gcc ... -pg', it automatically uses the
 403 profiling version of the library.
 404
 405    Profiling also involves watching your program as it runs, and
 406 keeping a histogram of where the program counter happens to be every
 407 now and then.  Typically the program counter is looked at around 100
 408 times per second of run time, but the exact frequency may vary from
 409 system to system.
 410
 411    This is done is one of two ways.  Most UNIX-like operating systems
 412 provide a `profil()' system call, which registers a memory array with
 413 the kernel, along with a scale factor that determines how the program's
 414 address space maps into the array.  Typical scaling values cause every
 415 2 to 8 bytes of address space to map into a single array slot.  On
 416 every tick of the system clock (assuming the profiled program is
 417 running), the value of the program counter is examined and the
 418 corresponding slot in the memory array is incremented.  Since this is
 419 done in the kernel, which had to interrupt the process anyway to handle
 420 the clock interrupt, very little additional system overhead is required.
 421
 422    However, some operating systems, most notably Linux 2.0 (and
 423 earlier), do not provide a `profil()' system call.  On such a system,
 424 arrangements are made for the kernel to periodically deliver a signal
 425 to the process (typically via `setitimer()'), which then performs the
 426 same operation of examining the program counter and incrementing a slot
 427 in the memory array.  Since this method requires a signal to be
 428 delivered to user space every time a sample is taken, it uses
 429 considerably more overhead than kernel-based profiling.  Also, due to
 430 the added delay required to deliver the signal, this method is less
 431 accurate as well.
 432
 433    A special startup routine allocates memory for the histogram and
 434 either calls `profil()' or sets up a clock signal handler.  This
 435 routine (`monstartup') can be invoked in several ways.  On Linux
 436 systems, a special profiling startup file `gcrt0.o', which invokes
 437 `monstartup' before `main', is used instead of the default `crt0.o'.
 438 Use of this special startup file is one of the effects of using `gcc
 439 ... -pg' to link.  On SPARC systems, no special startup files are used.
 440 Rather, the `mcount' routine, when it is invoked for the first time
 441 (typically when `main' is called), calls `monstartup'.
 442
 443    If the compiler's `-a' option was used, basic-block counting is also
 444 enabled.  Each object file is then compiled with a static array of
 445 counts, initially zero.  In the executable code, every time a new
 446 basic-block begins (i.e. when an `if' statement appears), an extra
 447 instruction is inserted to increment the corresponding count in the
 448 array.  At compile time, a paired array was constructed that recorded
 449 the starting address of each basic-block.  Taken together, the two
 450 arrays record the starting address of every basic-block, along with the
 451 number of times it was executed.
 452
 453    The profiling library also includes a function (`mcleanup') which is
 454 typically registered using `atexit()' to be called as the program
 455 exits, and is responsible for writing the file `gmon.out'.  Profiling
 456 is turned off, various headers are output, and the histogram is
 457 written, followed by the call-graph arcs and the basic-block counts.
 458
 459    The output from `gprof' gives no indication of parts of your program
 460 that are limited by I/O or swapping bandwidth.  This is because samples
 461 of the program counter are taken at fixed intervals of the program's
 462 run time.  Therefore, the time measurements in `gprof' output say
 463 nothing about time that your program was not running.  For example, a
 464 part of the program that creates so much data that it cannot all fit in
 465 physical memory at once may run very slowly due to thrashing, but
 466 `gprof' will say it uses little time.  On the other hand, sampling by
 467 run time has the advantage that the amount of load due to other users
 468 won't directly affect the output you get.
 469
 470 \x1f
 471 File: gprof.info,  Node: File Format,  Next: Internals,  Prev: Implementation,  Up: Details
 472
 473 Profiling Data File Format
 474 ==========================
 475
 476    The old BSD-derived file format used for profile data does not
 477 contain a magic cookie that allows to check whether a data file really
 478 is a `gprof' file.  Furthermore, it does not provide a version number,
 479 thus rendering changes to the file format almost impossible.  GNU
 480 `gprof' uses a new file format that provides these features.  For
 481 backward compatibility, GNU `gprof' continues to support the old
 482 BSD-derived format, but not all features are supported with it.  For
 483 example, basic-block execution counts cannot be accommodated by the old
 484 file format.
 485
 486    The new file format is defined in header file `gmon_out.h'.  It
 487 consists of a header containing the magic cookie and a version number,
 488 as well as some spare bytes available for future extensions.  All data
 489 in a profile data file is in the native format of the target for which
 490 the profile was collected.  GNU `gprof' adapts automatically to the
 491 byte-order in use.
 492
 493    In the new file format, the header is followed by a sequence of
 494 records.  Currently, there are three different record types: histogram
 495 records, call-graph arc records, and basic-block execution count
 496 records.  Each file can contain any number of each record type.  When
 497 reading a file, GNU `gprof' will ensure records of the same type are
 498 compatible with each other and compute the union of all records.  For
 499 example, for basic-block execution counts, the union is simply the sum
 500 of all execution counts for each basic-block.
 501
 502 Histogram Records
 503 -----------------
 504
 505    Histogram records consist of a header that is followed by an array of
 506 bins.  The header contains the text-segment range that the histogram
 507 spans, the size of the histogram in bytes (unlike in the old BSD
 508 format, this does not include the size of the header), the rate of the
 509 profiling clock, and the physical dimension that the bin counts
 510 represent after being scaled by the profiling clock rate.  The physical
 511 dimension is specified in two parts: a long name of up to 15 characters
 512 and a single character abbreviation.  For example, a histogram
 513 representing real-time would specify the long name as "seconds" and the
 514 abbreviation as "s".  This feature is useful for architectures that
 515 support performance monitor hardware (which, fortunately, is becoming
 516 increasingly common).  For example, under DEC OSF/1, the "uprofile"
 517 command can be used to produce a histogram of, say, instruction cache
 518 misses.  In this case, the dimension in the histogram header could be
 519 set to "i-cache misses" and the abbreviation could be set to "1"
 520 (because it is simply a count, not a physical dimension).  Also, the
 521 profiling rate would have to be set to 1 in this case.
 522
 523    Histogram bins are 16-bit numbers and each bin represent an equal
 524 amount of text-space.  For example, if the text-segment is one thousand
 525 bytes long and if there are ten bins in the histogram, each bin
 526 represents one hundred bytes.
 527
 528 Call-Graph Records
 529 ------------------
 530
 531    Call-graph records have a format that is identical to the one used in
 532 the BSD-derived file format.  It consists of an arc in the call graph
 533 and a count indicating the number of times the arc was traversed during
 534 program execution.  Arcs are specified by a pair of addresses: the
 535 first must be within caller's function and the second must be within
 536 the callee's function.  When performing profiling at the function
 537 level, these addresses can point anywhere within the respective
 538 function.  However, when profiling at the line-level, it is better if
 539 the addresses are as close to the call-site/entry-point as possible.
 540 This will ensure that the line-level call-graph is able to identify
 541 exactly which line of source code performed calls to a function.
 542
 543 Basic-Block Execution Count Records
 544 -----------------------------------
 545
 546    Basic-block execution count records consist of a header followed by a
 547 sequence of address/count pairs.  The header simply specifies the
 548 length of the sequence.  In an address/count pair, the address
 549 identifies a basic-block and the count specifies the number of times
 550 that basic-block was executed.  Any address within the basic-address can
 551 be used.
 552
 553 \x1f
 554 File: gprof.info,  Node: Internals,  Next: Debugging,  Prev: File Format,  Up: Details
 555
 556 `gprof''s Internal Operation
 557 ============================
 558
 559    Like most programs, `gprof' begins by processing its options.
 560 During this stage, it may building its symspec list
 561 (`sym_ids.c:sym_id_add'), if options are specified which use symspecs.
 562 `gprof' maintains a single linked list of symspecs, which will
 563 eventually get turned into 12 symbol tables, organized into six
 564 include/exclude pairs - one pair each for the flat profile
 565 (INCL_FLAT/EXCL_FLAT), the call graph arcs (INCL_ARCS/EXCL_ARCS),
 566 printing in the call graph (INCL_GRAPH/EXCL_GRAPH), timing propagation
 567 in the call graph (INCL_TIME/EXCL_TIME), the annotated source listing
 568 (INCL_ANNO/EXCL_ANNO), and the execution count listing
 569 (INCL_EXEC/EXCL_EXEC).
 570
 571    After option processing, `gprof' finishes building the symspec list
 572 by adding all the symspecs in `default_excluded_list' to the exclude
 573 lists EXCL_TIME and EXCL_GRAPH, and if line-by-line profiling is
 574 specified, EXCL_FLAT as well.  These default excludes are not added to
 575 EXCL_ANNO, EXCL_ARCS, and EXCL_EXEC.
 576
 577    Next, the BFD library is called to open the object file, verify that
 578 it is an object file, and read its symbol table (`core.c:core_init'),
 579 using `bfd_canonicalize_symtab' after mallocing an appropriately sized
 580 array of symbols.  At this point, function mappings are read (if the
 581 `--file-ordering' option has been specified), and the core text space
 582 is read into memory (if the `-c' option was given).
 583
 584    `gprof''s own symbol table, an array of Sym structures, is now built.
 585 This is done in one of two ways, by one of two routines, depending on
 586 whether line-by-line profiling (`-l' option) has been enabled.  For
 587 normal profiling, the BFD canonical symbol table is scanned.  For
 588 line-by-line profiling, every text space address is examined, and a new
 589 symbol table entry gets created every time the line number changes.  In
 590 either case, two passes are made through the symbol table - one to
 591 count the size of the symbol table required, and the other to actually
 592 read the symbols.  In between the two passes, a single array of type
 593 `Sym' is created of the appropriate length.  Finally,
 594 `symtab.c:symtab_finalize' is called to sort the symbol table and
 595 remove duplicate entries (entries with the same memory address).
 596
 597    The symbol table must be a contiguous array for two reasons.  First,
 598 the `qsort' library function (which sorts an array) will be used to
 599 sort the symbol table.  Also, the symbol lookup routine
 600 (`symtab.c:sym_lookup'), which finds symbols based on memory address,
 601 uses a binary search algorithm which requires the symbol table to be a
 602 sorted array.  Function symbols are indicated with an `is_func' flag.
 603 Line number symbols have no special flags set.  Additionally, a symbol
 604 can have an `is_static' flag to indicate that it is a local symbol.
 605
 606    With the symbol table read, the symspecs can now be translated into
 607 Syms (`sym_ids.c:sym_id_parse').  Remember that a single symspec can
 608 match multiple symbols.  An array of symbol tables (`syms') is created,
 609 each entry of which is a symbol table of Syms to be included or
 610 excluded from a particular listing.  The master symbol table and the
 611 symspecs are examined by nested loops, and every symbol that matches a
 612 symspec is inserted into the appropriate syms table.  This is done
 613 twice, once to count the size of each required symbol table, and again
 614 to build the tables, which have been malloced between passes.  From now
 615 on, to determine whether a symbol is on an include or exclude symspec
 616 list, `gprof' simply uses its standard symbol lookup routine on the
 617 appropriate table in the `syms' array.
 618
 619    Now the profile data file(s) themselves are read
 620 (`gmon_io.c:gmon_out_read'), first by checking for a new-style
 621 `gmon.out' header, then assuming this is an old-style BSD `gmon.out' if
 622 the magic number test failed.
 623
 624    New-style histogram records are read by `hist.c:hist_read_rec'.  For
 625 the first histogram record, allocate a memory array to hold all the
 626 bins, and read them in.  When multiple profile data files (or files
 627 with multiple histogram records) are read, the starting address, ending
 628 address, number of bins and sampling rate must match between the
 629 various histograms, or a fatal error will result.  If everything
 630 matches, just sum the additional histograms into the existing in-memory
 631 array.
 632
 633    As each call graph record is read (`call_graph.c:cg_read_rec'), the
 634 parent and child addresses are matched to symbol table entries, and a
 635 call graph arc is created by `cg_arcs.c:arc_add', unless the arc fails
 636 a symspec check against INCL_ARCS/EXCL_ARCS.  As each arc is added, a
 637 linked list is maintained of the parent's child arcs, and of the child's
 638 parent arcs.  Both the child's call count and the arc's call count are
 639 incremented by the record's call count.
 640
 641    Basic-block records are read (`basic_blocks.c:bb_read_rec'), but
 642 only if line-by-line profiling has been selected.  Each basic-block
 643 address is matched to a corresponding line symbol in the symbol table,
 644 and an entry made in the symbol's bb_addr and bb_calls arrays.  Again,
 645 if multiple basic-block records are present for the same address, the
 646 call counts are cumulative.
 647
 648    A gmon.sum file is dumped, if requested (`gmon_io.c:gmon_out_write').
 649
 650    If histograms were present in the data files, assign them to symbols
 651 (`hist.c:hist_assign_samples') by iterating over all the sample bins
 652 and assigning them to symbols.  Since the symbol table is sorted in
 653 order of ascending memory addresses, we can simple follow along in the
 654 symbol table as we make our pass over the sample bins.  This step
 655 includes a symspec check against INCL_FLAT/EXCL_FLAT.  Depending on the
 656 histogram scale factor, a sample bin may span multiple symbols, in
 657 which case a fraction of the sample count is allocated to each symbol,
 658 proportional to the degree of overlap.  This effect is rare for normal
 659 profiling, but overlaps are more common during line-by-line profiling,
 660 and can cause each of two adjacent lines to be credited with half a
 661 hit, for example.
 662
 663    If call graph data is present, `cg_arcs.c:cg_assemble' is called.
 664 First, if `-c' was specified, a machine-dependent routine (`find_call')
 665 scans through each symbol's machine code, looking for subroutine call
 666 instructions, and adding them to the call graph with a zero call count.
 667 A topological sort is performed by depth-first numbering all the
 668 symbols (`cg_dfn.c:cg_dfn'), so that children are always numbered less
 669 than their parents, then making a array of pointers into the symbol
 670 table and sorting it into numerical order, which is reverse topological
 671 order (children appear before parents).  Cycles are also detected at
 672 this point, all members of which are assigned the same topological
 673 number.  Two passes are now made through this sorted array of symbol
 674 pointers.  The first pass, from end to beginning (parents to children),
 675 computes the fraction of child time to propagate to each parent and a
 676 print flag.  The print flag reflects symspec handling of
 677 INCL_GRAPH/EXCL_GRAPH, with a parent's include or exclude (print or no
 678 print) property being propagated to its children, unless they
 679 themselves explicitly appear in INCL_GRAPH or EXCL_GRAPH.  A second
 680 pass, from beginning to end (children to parents) actually propagates
 681 the timings along the call graph, subject to a check against
 682 INCL_TIME/EXCL_TIME.  With the print flag, fractions, and timings now
 683 stored in the symbol structures, the topological sort array is now
 684 discarded, and a new array of pointers is assembled, this time sorted
 685 by propagated time.
 686
 687    Finally, print the various outputs the user requested, which is now
 688 fairly straightforward.  The call graph (`cg_print.c:cg_print') and
 689 flat profile (`hist.c:hist_print') are regurgitations of values already
 690 computed.  The annotated source listing
 691 (`basic_blocks.c:print_annotated_source') uses basic-block information,
 692 if present, to label each line of code with call counts, otherwise only
 693 the function call counts are presented.
 694
 695    The function ordering code is marginally well documented in the
 696 source code itself (`cg_print.c').  Basically, the functions with the
 697 most use and the most parents are placed first, followed by other
 698 functions with the most use, followed by lower use functions, followed
 699 by unused functions at the end.
 700
 701 \x1f
 702 File: gprof.info,  Node: Debugging,  Prev: Internals,  Up: Details
 703
 704 Debugging `gprof'
 705 -----------------
 706
 707    If `gprof' was compiled with debugging enabled, the `-d' option
 708 triggers debugging output (to stdout) which can be helpful in
 709 understanding its operation.  The debugging number specified is
 710 interpreted as a sum of the following options:
 711
 712 2 - Topological sort
 713      Monitor depth-first numbering of symbols during call graph analysis
 714
 715 4 - Cycles
 716      Shows symbols as they are identified as cycle heads
 717
 718 16 - Tallying
 719      As the call graph arcs are read, show each arc and how the total
 720      calls to each function are tallied
 721
 722 32 - Call graph arc sorting
 723      Details sorting individual parents/children within each call graph
 724      entry
 725
 726 64 - Reading histogram and call graph records
 727      Shows address ranges of histograms as they are read, and each call
 728      graph arc
 729
 730 128 - Symbol table
 731      Reading, classifying, and sorting the symbol table from the object
 732      file.  For line-by-line profiling (`-l' option), also shows line
 733      numbers being assigned to memory addresses.
 734
 735 256 - Static call graph
 736      Trace operation of `-c' option
 737
 738 512 - Symbol table and arc table lookups
 739      Detail operation of lookup routines
 740
 741 1024 - Call graph propagation
 742      Shows how function times are propagated along the call graph
 743
 744 2048 - Basic-blocks
 745      Shows basic-block records as they are read from profile data (only
 746      meaningful with `-l' option)
 747
 748 4096 - Symspecs
 749      Shows symspec-to-symbol pattern matching operation
 750
 751 8192 - Annotate source
 752      Tracks operation of `-A' option
 753