gcc/doc/analyzer.texi

@c Copyright (C) 2019-2024 Free Software Foundation, Inc.
@c This is part of the GCC manual.
@c For copying conditions, see the file gcc.texi.
@c Contributed by David Malcolm <dmalcolm@redhat.com>.

@node Static Analyzer
@chapter Static Analyzer
@cindex analyzer
@cindex static analysis
@cindex static analyzer

@menu
* Analyzer Internals::       Analyzer Internals
* Debugging the Analyzer::   Useful debugging tips
@end menu

@node Analyzer Internals
@section Analyzer Internals
@cindex analyzer, internals
@cindex static analyzer, internals

@subsection Overview

At a high level, we're doing coverage-guided symbolic execution of the
user's code.

The analyzer implementation works on the GIMPLE-SSA representation.
(I chose this in the hope of making it easy to work with LTO to
do whole-program analysis.)

The implementation is read-only: it doesn't attempt to change anything,
it just emits warnings.

The GIMPLE representation can be seen using @option{-fdump-ipa-analyzer}.
@quotation Tip
If the analyzer ICEs before this is written out, one workaround is to use
@option{--param=analyzer-bb-explosion-factor=0} to force the analyzer
to bail out after analyzing the first basic block.
@end quotation

First, we build a @code{supergraph} which combines the callgraph and all
of the CFGs into a single directed graph, with both interprocedural and
intraprocedural edges.  The nodes and edges in the supergraph are called
``supernodes'' and ``superedges'', and are often referred to in code as
@code{snodes} and @code{sedges}.  Basic blocks in the CFGs are split at
interprocedural calls, so there can be more than one supernode per
basic block.  Most statements will be in just one supernode, but a call
statement can appear in two supernodes: at the end of one for the call,
and again at the start of another for the return.
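
To make the splitting rule concrete, here is a minimal sketch in C.  It
is not the analyzer's actual (C++) implementation: the @code{stmt_kind}
enum and the @code{count_supernodes} function are hypothetical, and it
assumes that every call statement is interprocedural.  Under that
assumption a basic block containing @var{k} calls becomes @var{k}+1
supernodes, since each call ends one supernode and starts the next:

```c
#include <stddef.h>

/* Hypothetical classification of statements: for the purposes of
   splitting a basic block into supernodes, only calls matter.  */
enum stmt_kind { STMT_OTHER, STMT_CALL };

/* Count the supernodes that a basic block with the given statements
   would be split into, assuming every call is interprocedural: each
   call ends the current supernode (for the call) and starts a new one
   (for the return).  */
static size_t
count_supernodes (const enum stmt_kind *stmts, size_t num_stmts)
{
  size_t count = 1;
  for (size_t i = 0; i < num_stmts; i++)
    if (stmts[i] == STMT_CALL)
      count++;
  return count;
}
```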

The supergraph can be seen using @option{-fdump-analyzer-supergraph}.

We then build an @code{analysis_plan} which walks the callgraph to
determine which calls might be suitable for being summarized (rather
than fully explored) and thus in what order to explore the functions.

Next is the heart of the analyzer: we use a worklist to explore state
within the supergraph, building an ``exploded graph''.
Nodes in the exploded graph correspond to <point,@w{ }state> pairs, as in
``Precise Interprocedural Dataflow Analysis via Graph Reachability''
(Thomas Reps, Susan Horwitz and Mooly Sagiv); note, however, that
we're not using the algorithm described in that paper, just the
``exploded graph'' terminology.

We reuse nodes for <point,@w{ }state> pairs we've already seen, and avoid
tracking state too closely, so that (hopefully) we rapidly converge
on a final exploded graph, and terminate the analysis.  We also bail
out if the number of exploded <end-of-basic-block,@w{ }state> nodes gets
larger than a particular multiple of the total number of basic blocks
(to ensure termination in the face of pathological state-explosion
cases, or bugs).  We also stop exploring a point once we hit a limit
on the number of states for that point.
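
The reuse of nodes and the per-point limit can be sketched as a tiny
worklist over a toy program whose loop would otherwise generate new
states forever.  This is an illustration under simplifying assumptions,
not GCC's implementation: program points and states are plain integers,
and @code{struct egraph}, @code{get_or_create_node}, @code{explore} and
@code{PER_POINT_LIMIT} are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 64
#define PER_POINT_LIMIT 2   /* hypothetical limit on states per point */

/* An exploded node: a <point, state> pair, both plain integers here.  */
struct enode { int point; int state; };

struct egraph
{
  struct enode nodes[MAX_NODES];
  size_t num_nodes;
};

/* Return the index of the node for <POINT, STATE>, reusing an existing
   node if we've seen that pair before (*IS_NEW = false), else creating
   one (*IS_NEW = true).  Return -1 if POINT already has PER_POINT_LIMIT
   distinct states (or the graph is full): stop exploring that way.  */
static int
get_or_create_node (struct egraph *eg, int point, int state, bool *is_new)
{
  size_t states_at_point = 0;
  *is_new = false;
  for (size_t i = 0; i < eg->num_nodes; i++)
    if (eg->nodes[i].point == point)
      {
        if (eg->nodes[i].state == state)
          return (int) i;
        states_at_point++;
      }
  if (states_at_point >= PER_POINT_LIMIT || eg->num_nodes >= MAX_NODES)
    return -1;
  eg->nodes[eg->num_nodes] = (struct enode) { point, state };
  *is_new = true;
  return (int) eg->num_nodes++;
}

/* Toy program: point 0 is the entry; point 1 is a loop head that either
   goes round again with the counter (the state) incremented, or exits
   to point 2.  Writes the successors to OUT and returns their count.  */
static size_t
successors (struct enode cur, struct enode out[2])
{
  switch (cur.point)
    {
    case 0:
      out[0] = (struct enode) { 1, cur.state };
      return 1;
    case 1:
      out[0] = (struct enode) { 1, cur.state + 1 };
      out[1] = (struct enode) { 2, cur.state };
      return 2;
    default:
      return 0;
    }
}

/* Explore from <0, 0> using a worklist of node indices; only newly
   created nodes are added to it.  Returns the number of nodes.  */
static size_t
explore (struct egraph *eg)
{
  int worklist[MAX_NODES];
  size_t wl_len = 0;
  bool is_new;

  eg->num_nodes = 0;
  worklist[wl_len++] = get_or_create_node (eg, 0, 0, &is_new);
  while (wl_len > 0)
    {
      struct enode cur = eg->nodes[worklist[--wl_len]];
      struct enode succs[2];
      size_t n = successors (cur, succs);
      for (size_t i = 0; i < n; i++)
        {
          int idx = get_or_create_node (eg, succs[i].point, succs[i].state,
                                        &is_new);
          if (idx >= 0 && is_new)
            worklist[wl_len++] = idx;
        }
    }
  return eg->num_nodes;
}
```

With @code{PER_POINT_LIMIT} set to 2, exploring the toy program creates
five nodes and terminates, even though the loop counter on its own would
never converge.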

We can identify problems directly when processing a <point,@w{ }state>
instance.  For example, if we're finding the successors of

@smallexample
   <point: before-stmt: "free (ptr);",
    state: @{"ptr": freed@}>
@end smallexample

then we can detect a double-free of @code{ptr}.  We can then emit a path
to reach the problem by finding the simplest route through the graph.

Program points in the analysis are much more fine-grained than in the
CFG and supergraph, with points (and thus potentially exploded nodes)
for various events, including before individual statements.
By default the exploded graph merges multiple consecutive statements
in a supernode into one exploded edge to minimize the size of the
exploded graph.  This can be suppressed via
@option{-fanalyzer-fine-grained}.
The fine-grained approach seems to make things simpler and more debuggable
than other approaches I tried, in that each point is responsible for one
thing.

Program points in the analysis also have a ``call string'' identifying the
stack of callsites below them, so that paths in the exploded graph
correspond to interprocedurally valid paths: we always return to the
correct call site, propagating state information accordingly.
We avoid infinite recursion by stopping exploration along a path if a
callsite appears more than @code{analyzer-max-recursion-depth} times in
its call string (defaulting to 2).
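
A drastically simplified, single-pointer version of that check follows,
as a sketch only: the real @code{malloc} state machine has more states
(such as @code{unchecked}, seen later in this chapter) and works on
symbolic values, whereas the hypothetical @code{on_event} function below
tracks just one pointer through a sequence of events:

```c
/* Hypothetical states for one pointer: a drastic simplification of
   the malloc state machine.  */
enum ptr_state { PTR_START, PTR_ALLOCATED, PTR_FREED };

enum ptr_event { EV_MALLOC, EV_FREE };

/* Update *STATE for EV.  Returns 1 if EV is a double-free (a "free"
   of a pointer that is already in the freed state), else 0.  */
static int
on_event (enum ptr_state *state, enum ptr_event ev)
{
  switch (ev)
    {
    case EV_MALLOC:
      *state = PTR_ALLOCATED;
      return 0;
    case EV_FREE:
      if (*state == PTR_FREED)
        return 1;  /* cf. <before-stmt "free (ptr);", ptr: freed> */
      *state = PTR_FREED;
      return 0;
    }
  return 0;
}
```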
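
A minimal sketch of a call string with that recursion limit applied,
assuming call sites are identified by small integers.  The names
@code{struct call_string}, @code{push_call} and @code{pop_call} are
hypothetical, and @code{MAX_RECURSION_DEPTH} merely stands in for the
@code{analyzer-max-recursion-depth} parameter:

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CALL_STRING 16
/* Stand-in for analyzer-max-recursion-depth (default 2).  */
#define MAX_RECURSION_DEPTH 2

/* A call string: the stack of call sites (here just integer IDs)
   below a program point.  */
struct call_string
{
  int sites[MAX_CALL_STRING];
  size_t len;
};

/* Count how many times CALLSITE appears in CS.  */
static size_t
count_occurrences (const struct call_string *cs, int callsite)
{
  size_t n = 0;
  for (size_t i = 0; i < cs->len; i++)
    if (cs->sites[i] == callsite)
      n++;
  return n;
}

/* Try to push CALLSITE when entering a callee.  Return false, leaving
   CS unchanged, if CALLSITE would then appear more than
   MAX_RECURSION_DEPTH times: we don't explore that deeper recursion.  */
static bool
push_call (struct call_string *cs, int callsite)
{
  if (cs->len >= MAX_CALL_STRING
      || count_occurrences (cs, callsite) >= MAX_RECURSION_DEPTH)
    return false;
  cs->sites[cs->len++] = callsite;
  return true;
}

/* Pop the innermost call site on return.  Always returning to it,
   rather than to an arbitrary caller, is what keeps paths
   interprocedurally valid.  Returns -1 if CS is empty.  */
static int
pop_call (struct call_string *cs)
{
  return cs->len > 0 ? cs->sites[--cs->len] : -1;
}
```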

@subsection Graphs

Nodes and edges in the exploded graph are called ``exploded nodes'' and
``exploded edges'' and are often referred to in the code as
@code{enodes} and @code{eedges} (especially when distinguishing them
from the @code{snodes} and @code{sedges} in the supergraph).

Each graph numbers its nodes, giving unique identifiers: supernodes
are referred to throughout dumps in the form @samp{SN: @var{index}} and
exploded nodes in the form @samp{EN: @var{index}} (e.g. @samp{SN: 2} and
@samp{EN: 29}).

The supergraph can be seen using @option{-fdump-analyzer-supergraph}.

The exploded graph can be seen using @option{-fdump-analyzer-exploded-graph}
and other dump options.  Exploded nodes are color-coded in the @file{.dot}
output based on state-machine states to make it easier to see state changes
at a glance.

@subsection State Tracking

There's a tension between:
@itemize @bullet
@item
precision of analysis in the straight-line case, vs
@item
exponential blow-up in the face of control flow.
@end itemize

For example, in general, given this CFG:

@smallexample
      A
     / \
    B   C
     \ /
      D
     / \
    E   F
     \ /
      G
@end smallexample

we want to avoid differences in state-tracking in B and C from
leading to blow-up.  If we don't prevent state blowup, we end up
with exponential growth of the exploded graph like this:

@smallexample

           1:A
          /   \
         /     \
        /       \
      2:B       3:C
       |         |
      4:D       5:D        (2 exploded nodes for D)
     /   \     /   \
   6:E   7:F 8:E   9:F
    |     |   |     |
   10:G 11:G 12:G  13:G    (4 exploded nodes for G)

@end smallexample

Similar issues arise with loops.

To prevent this, we follow various approaches:

@enumerate a
@item
state pruning, which tries to discard state that won't be relevant
later on within the function.
This can be disabled via @option{-fno-analyzer-state-purge}.

@item
state merging: we can try to find the commonality between two
@code{program_state} instances to make a third, simpler @code{program_state}.
We have two strategies here:

  @enumerate
  @item
     the worklist keeps new nodes for the same @code{program_point} together,
     and tries to merge them before processing, and thus before they have
     successors.  Hence, in the above, the two nodes for D (4 and 5) reach
     the front of the worklist together, and we create a node for D with
     the merger of the incoming states.

  @item
     try merging with the state of existing enodes for the @code{program_point}
     (which may have already been explored).  There will be duplication,
     but only one set of duplication; subsequent duplicates are more likely
     to hit the cache.  In particular, all merger chains are (hopefully)
     finite, which is what guarantees termination.
     This is intended to help with loops: we ought to explore the first
     iteration, and then have a ``subsequent iterations'' exploration,
     which uses a state merged from that of the first, to be more abstract.
  @end enumerate

We avoid merging pairs of states that have state-machine differences,
as these are the kinds of differences that are likely to be most
interesting.  So, for example, given:

@smallexample
      if (condition)
        ptr = malloc (size);
      else
        ptr = local_buf;

      .... do things with 'ptr'

      if (condition)
        free (ptr);

      ...etc
@end smallexample

then we end up with an exploded graph that looks like this:

@smallexample

                   if (condition)
                     / T      \ F
            ---------          ----------
           /                             \
      ptr = malloc (size)             ptr = local_buf
          |                               |
      copy of                         copy of
        "do things with 'ptr'"          "do things with 'ptr'"
      with ptr: heap-allocated        with ptr: stack-allocated
          |                               |
      if (condition)                  if (condition)
          | known to be T                 | known to be F
      free (ptr);                         |
           \                             /
            -----------------------------
                         | ('ptr' is pruned, so states can be merged)
                        etc

@end smallexample

where some duplication has occurred, but only for the places where the
different paths are worth exploring separately.

Merging can be disabled via @option{-fno-analyzer-state-merge}.
@end enumerate
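
Both techniques can be illustrated on a toy state in which each variable
has a state-machine state plus an optionally-known value.  This is a
hypothetical sketch (@code{struct toy_state}, @code{purge_dead} and
@code{try_merge} are not GCC's @code{program_state} API): merging is
refused when state-machine states differ, and otherwise values that
disagree become unknown.  As in the @code{ptr} example above, purging a
variable that is no longer live removes the difference that blocked the
merge.

```c
#include <stdbool.h>

#define NUM_VARS 3

/* Toy program state: each variable has a state-machine state and a
   possibly-known integer value.  */
enum sm_state { SM_START, SM_HEAP, SM_STACK };

struct var_state
{
  enum sm_state sm;
  bool value_known;
  int value;
};

struct toy_state
{
  struct var_state vars[NUM_VARS];
};

/* State purging: forget everything about variables that are not live
   (LIVE[i] is false), since they can't affect the rest of the
   function.  */
static void
purge_dead (struct toy_state *s, const bool live[NUM_VARS])
{
  for (int i = 0; i < NUM_VARS; i++)
    if (!live[i])
      {
        s->vars[i].sm = SM_START;
        s->vars[i].value_known = false;
        s->vars[i].value = 0;
      }
}

/* State merging: combine A and B into *OUT.  Refuse (return false) if
   they differ in any state-machine state; otherwise keep values that
   agree and make values that differ unknown.  */
static bool
try_merge (const struct toy_state *a, const struct toy_state *b,
           struct toy_state *out)
{
  for (int i = 0; i < NUM_VARS; i++)
    if (a->vars[i].sm != b->vars[i].sm)
      return false;
  for (int i = 0; i < NUM_VARS; i++)
    {
      const struct var_state *va = &a->vars[i];
      const struct var_state *vb = &b->vars[i];
      out->vars[i].sm = va->sm;
      out->vars[i].value_known = (va->value_known && vb->value_known
                                  && va->value == vb->value);
      out->vars[i].value = out->vars[i].value_known ? va->value : 0;
    }
  return true;
}
```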

@subsection Region Model

Part of the state stored at an @code{exploded_node} is a @code{region_model}.
This is an implementation of the region-based ternary model described in
@url{https://www.researchgate.net/publication/221430855_A_Memory_Model_for_Static_Analysis_of_C_Programs,
``A Memory Model for Static Analysis of C Programs''}
(Zhongxing Xu, Ted Kremenek, and Jian Zhang).

A @code{region_model} encapsulates a representation of the state of
memory, with a @code{store} recording bindings from @code{region}
instances to @code{svalue} instances.  The bindings are organized into
clusters, where regions accessible via well-defined pointer arithmetic
are in the same cluster.  The representation is graph-like because values
can be pointers to regions.  It also stores a @code{constraint_manager},
capturing relationships between the values.

Because each node in the @code{exploded_graph} has a @code{region_model},
and each of the latter is graph-like, the @code{exploded_graph} is in some
ways a graph of graphs.

There are several ``dump'' functions for use when debugging the analyzer.

Consider this example C code:

@smallexample
void *
calls_malloc (size_t n)
@{
  void *result = malloc (n);
  return result; /* HERE */
@}

void test (size_t n)
@{
  void *ptr = calls_malloc (n * 4);
  /* etc.  */
@}
@end smallexample

Now consider the state at the point @code{/* HERE */} for the
interprocedural analysis case, where @code{calls_malloc} returns back to
@code{test}.

Here's an example of printing a @code{program_state} at @code{/* HERE */},
showing the @code{region_model} within it, along with state for the
@code{malloc} state machine.

@smallexample
(gdb) break region_model::on_return
[...snip...]
(gdb) run
[...snip...]
(gdb) up
[...snip...]
(gdb) call state->dump()
State
├─ Region Model
│  ├─ Current Frame: frame: ‘calls_malloc’@@2
│  ├─ Store
│  │  ├─ m_called_unknown_fn: false
│  │  ├─ frame: ‘test’@@1
│  │  │  ╰─ _1: (INIT_VAL(n_2(D))*(size_t)4)
│  │  ╰─ frame: ‘calls_malloc’@@2
│  │     ├─ result_4: &HEAP_ALLOCATED_REGION(27)
│  │     ╰─ _5: &HEAP_ALLOCATED_REGION(27)
│  ╰─ Dynamic Extents
│     ╰─ HEAP_ALLOCATED_REGION(27): (INIT_VAL(n_2(D))*(size_t)4)
╰─ ‘malloc’ state machine
   ╰─ 0x468cb40: &HEAP_ALLOCATED_REGION(27): unchecked (@{free@}) (‘result_4’)
@end smallexample

Within the store, there are binding clusters for the SSA names for the
various local variables within frames for @code{test} and
@code{calls_malloc}.  For example,

@itemize @bullet
@item
within @code{test} the whole cluster for @code{_1} is bound
to a @code{binop_svalue} representing @code{n * 4}, and
@item
within @code{calls_malloc} the whole cluster for @code{result_4} is bound
to a @code{region_svalue} pointing at @code{HEAP_ALLOCATED_REGION(27)}.
@end itemize

Additionally, this latter pointer has the @code{unchecked} state for the
@code{malloc} state machine, indicating it hasn't yet been checked against
@code{NULL} since the allocation call.

We also see that the state has captured the size of the heap-allocated
region (``Dynamic Extents'').

This visualization can also be seen within the output of
@option{-fdump-analyzer-exploded-nodes-2} and
@option{-fdump-analyzer-exploded-nodes-3}.

As well as the above visualizations of states, there are tree-like
visualizations for instances of @code{svalue} and @code{region}, showing
their IDs and how they are constructed from simpler symbols:

@smallexample
(gdb) break region_model::set_dynamic_extents
[...snip...]
(gdb) run
[...snip...]
(gdb) up
[...snip...]
(gdb) call size_in_bytes->dump()
(17): ‘long unsigned int’: binop_svalue(mult_expr: ‘*’)
├─ (15): ‘size_t’: initial_svalue
│  ╰─ m_reg: (12): ‘size_t’: decl_region(‘n_2(D)’)
│     ╰─ parent: (9): frame_region(‘test’, index: 0, depth: 1)
│        ╰─ parent: (1): stack region
│           ╰─ parent: (0): root region
╰─ (16): ‘size_t’: constant_svalue (‘4’)
@end smallexample

i.e. that @code{size_in_bytes} is a @code{binop_svalue} expressing
the result of multiplying

@itemize @bullet
@item
the initial value of the @code{PARM_DECL} @code{n_2(D)} for the
parameter @code{n} within the frame for @code{test} by
@item
the constant value @code{4}.
@end itemize
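
The way such values nest can be mimicked with a small tagged tree.  The
sketch below is hypothetical and much simpler than the real
@code{svalue} hierarchy: it omits types, so the constant prints as
@samp{4} rather than @samp{(size_t)4}.  Its @code{sval_print} function
produces text in the style of the compact @code{dump(true)} output
shown in this section:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, much-simplified symbolic values (no types).  */
enum sval_kind { SVAL_CONSTANT, SVAL_INITIAL, SVAL_BINOP };

struct sval
{
  enum sval_kind kind;
  long constant;                 /* for SVAL_CONSTANT */
  const char *reg_name;          /* for SVAL_INITIAL: the region's name */
  char op;                       /* for SVAL_BINOP */
  const struct sval *lhs, *rhs;  /* for SVAL_BINOP */
};

/* Append S to the NUL-terminated string in BUF (of size LEN >= 1),
   truncating if necessary.  */
static void
append (char *buf, size_t len, const char *s)
{
  size_t used = strlen (buf);
  snprintf (buf + used, len - used, "%s", s);
}

/* Append a compact description of SV to BUF, recursing into the
   operands of a binop, e.g. "(INIT_VAL(n_2(D))*4)".  */
static void
sval_print (const struct sval *sv, char *buf, size_t len)
{
  char tmp[32];
  switch (sv->kind)
    {
    case SVAL_CONSTANT:
      snprintf (tmp, sizeof tmp, "%ld", sv->constant);
      append (buf, len, tmp);
      break;
    case SVAL_INITIAL:
      append (buf, len, "INIT_VAL(");
      append (buf, len, sv->reg_name);
      append (buf, len, ")");
      break;
    case SVAL_BINOP:
      tmp[0] = sv->op;
      tmp[1] = '\0';
      append (buf, len, "(");
      sval_print (sv->lhs, buf, len);
      append (buf, len, tmp);
      sval_print (sv->rhs, buf, len);
      append (buf, len, ")");
      break;
    }
}
```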

The above visualizations rely on the @code{text_art::widget} framework,
which performs significant work to lay out the output, so there is also
an earlier, simpler, form of dumping available.  For states there is:

@smallexample
(gdb) call state->dump(eg.m_ext_state, true)
rmodel:
stack depth: 2
  frame (index 1): frame: ‘calls_malloc’@@2
  frame (index 0): frame: ‘test’@@1
clusters within frame: ‘test’@@1
  cluster for: _1: (INIT_VAL(n_2(D))*(size_t)4)
clusters within frame: ‘calls_malloc’@@2
  cluster for: result_4: &HEAP_ALLOCATED_REGION(27)
  cluster for: _5: &HEAP_ALLOCATED_REGION(27)
m_called_unknown_fn: FALSE
constraint_manager:
  equiv classes:
  constraints:
dynamic_extents:
  HEAP_ALLOCATED_REGION(27): (INIT_VAL(n_2(D))*(size_t)4)
malloc:
  0x468cb40: &HEAP_ALLOCATED_REGION(27): unchecked (@{free@}) (‘result_4’)
@end smallexample

or for @code{region_model} just:

@smallexample
(gdb) call state->m_region_model->debug()
stack depth: 2
  frame (index 1): frame: ‘calls_malloc’@@2
  frame (index 0): frame: ‘test’@@1
clusters within frame: ‘test’@@1
  cluster for: _1: (INIT_VAL(n_2(D))*(size_t)4)
clusters within frame: ‘calls_malloc’@@2
  cluster for: result_4: &HEAP_ALLOCATED_REGION(27)
  cluster for: _5: &HEAP_ALLOCATED_REGION(27)
m_called_unknown_fn: FALSE
constraint_manager:
  equiv classes:
  constraints:
dynamic_extents:
  HEAP_ALLOCATED_REGION(27): (INIT_VAL(n_2(D))*(size_t)4)
@end smallexample

and for instances of @code{svalue} and @code{region} there is this
older dump implementation, which takes a @code{bool simple} flag
controlling the verbosity of the dump:

@smallexample
(gdb) call size_in_bytes->dump(true)
(INIT_VAL(n_2(D))*(size_t)4)

(gdb) call size_in_bytes->dump(false)
binop_svalue (mult_expr, initial_svalue(‘size_t’, decl_region(frame_region(‘test’, index: 0, depth: 1), ‘size_t’, ‘n_2(D)’)), constant_svalue(‘size_t’, 4))
@end smallexample

@subsection Analyzer Paths

We need to explain to the user what the problem is, and to persuade them
that there really is a problem.  Hence having a @code{diagnostic_path}
isn't just an incidental detail of the analyzer; it's required.

Paths ought to be:
@itemize @bullet
@item
interprocedurally valid
@item
feasible
@end itemize

Without state-merging, all paths in the exploded graph are feasible
(in terms of constraints being satisfied).
With state-merging, paths in the exploded graph can be infeasible, since
a merged state no longer records which constraints held along each of
the paths that were merged into it.

We collate warnings and only emit them for the simplest path.
For example, for a bug in a utility function with lots of routes to
calling it, we only emit the simplest path (which could be
intraprocedural, if it can be reproduced without a caller).

We thus want to find the shortest feasible path through the exploded
graph from the origin to the exploded node at which the diagnostic was
saved.  Unfortunately, if we simply find the shortest such path and
check if it's feasible we might falsely reject the diagnostic, as there
might be a longer path that is feasible.  Examples include the cases
where the diagnostic requires us to go at least once around a loop for a
later condition to be satisfied, or where for a later condition to be
satisfied we need to enter a suite of code that the simpler path skips.

We attempt to find the shortest feasible path to each diagnostic by
first constructing a ``trimmed graph'' from the exploded graph,
containing only those nodes and edges from which there are paths to
the target node, and using Dijkstra's algorithm to order the trimmed
nodes by minimal distance to the target.
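
A sketch of the distance computation follows, assuming the trimmed
graph is small enough to store as an adjacency matrix of non-negative
edge weights (the names and representation are hypothetical).  Running
Dijkstra's algorithm backwards from the target gives each node's minimal
distance to it; nodes left at @code{INF} cannot reach the target at all,
which are exactly the nodes that trimming discards.

```c
#include <limits.h>
#include <stdbool.h>

#define MAX_ENODES 8
#define NO_EDGE (-1)
#define INF INT_MAX

/* WEIGHT[u][v] is the (non-negative) weight of edge u->v in a
   hypothetical graph of N nodes, or NO_EDGE.  Sets DIST[u] to the
   length of the shortest path from u to TARGET, or INF if there is
   none.  This is Dijkstra's algorithm run over the edges in reverse,
   with O(n^2) node selection.  */
static void
distances_to_target (int n, int weight[MAX_ENODES][MAX_ENODES],
                     int target, int dist[MAX_ENODES])
{
  bool done[MAX_ENODES] = { false };

  for (int i = 0; i < n; i++)
    dist[i] = INF;
  dist[target] = 0;

  for (int iter = 0; iter < n; iter++)
    {
      /* Pick the closest node that isn't finalized yet.  */
      int v = -1;
      for (int i = 0; i < n; i++)
        if (!done[i] && dist[i] != INF && (v < 0 || dist[i] < dist[v]))
          v = i;
      if (v < 0)
        break;  /* the rest can't reach TARGET: they'd be trimmed */
      done[v] = true;

      /* Relax each edge u->v: u can reach TARGET by going via v.  */
      for (int u = 0; u < n; u++)
        if (!done[u] && weight[u][v] != NO_EDGE
            && dist[v] + weight[u][v] < dist[u])
          dist[u] = dist[v] + weight[u][v];
    }
}
```

The O(@var{n}^2) node selection keeps the sketch short; a priority queue
would be the usual choice for larger graphs.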

We then use a worklist to iteratively build a ``feasible graph''
(actually a tree), capturing the pertinent state along each path, in
which every path to a ``feasible node'' is feasible by construction,
restricting ourselves to the trimmed graph to ensure we stay on target,
and ordering the worklist so that the first feasible path we find to the
target node is the shortest possible path.  Hence we start by trying the
shortest possible path, but if that fails, we explore progressively
longer paths, eventually trying iterations through loops.  The
exploration is captured in the @code{feasible_graph}, which can be dumped
as a @file{.dot} file via @option{-fdump-analyzer-feasibility} to
visualize the exploration.  The indices of the feasible nodes show the
order in which they were created.  We effectively explore the tree of
feasible paths in order of shortest path until we either find a feasible
path to the target node, or hit a limit and give up.

This is something of a brute-force approach, but the trimmed graph
hopefully keeps the complexity manageable.

This algorithm can be disabled (for debugging purposes) via
@option{-fno-analyzer-feasibility}, which simply uses the shortest path,
and notes if it is infeasible.

The above gives us a shortest feasible @code{exploded_path} through the
@code{exploded_graph} (a list of @code{exploded_edge *}).  We use this
@code{exploded_path} to build a @code{diagnostic_path} (a list of
@strong{events} for the diagnostic subsystem), specifically a
@code{checker_path}.

Having built the @code{checker_path}, we prune it to try to eliminate
events that aren't relevant, to minimize how much the user has to read.

After pruning, we notify each event in the path of its ID and record the
IDs of interesting events, allowing for events to refer to other events
in their descriptions.  The @code{pending_diagnostic} class has various
vfuncs to support emitting more precise descriptions, so that e.g.

@itemize @bullet
@item
a deref-of-unchecked-malloc diagnostic might use:
@smallexample
  returning possibly-NULL pointer to 'make_obj' from 'allocator'
@end smallexample
for a @code{return_event} to make it clearer how the unchecked value moves
from callee back to caller;
@item
a double-free diagnostic might use:
@smallexample
  second 'free' here; first 'free' was at (3)
@end smallexample
and a use-after-free might use:
@smallexample
  use after 'free' here; memory was freed at (2)
@end smallexample
@end itemize

At this point we can emit the diagnostic.

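
Whether a path is feasible comes down to whether the conditions along it
can all hold at once.  As a deliberately simplified sketch (the real
analyzer uses its @code{constraint_manager} over symbolic values), the
hypothetical @code{path_is_feasible} function below tracks a single
integer variable as an interval and rejects a path as soon as that
interval becomes empty.  This is the kind of check that can reject the
shortest path to a target while accepting a longer one.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* A condition on a hypothetical integer variable "x" taken along one
   edge of a path: either "x < BOUND" or "x > BOUND".  */
struct cond { bool is_less_than; int bound; };

/* Return true if all of CONDS can hold simultaneously.  The possible
   values of x are tracked as the closed interval [LO, HI]; each
   condition narrows it, and the path is infeasible once it's empty.  */
static bool
path_is_feasible (const struct cond *conds, size_t num_conds)
{
  long long lo = LLONG_MIN, hi = LLONG_MAX;
  for (size_t i = 0; i < num_conds; i++)
    {
      long long b = conds[i].bound;
      if (conds[i].is_less_than)
        {
          if (b - 1 < hi)
            hi = b - 1;   /* x < b means x <= b - 1 */
        }
      else if (b + 1 > lo)
        lo = b + 1;       /* x > b means x >= b + 1 */
      if (lo > hi)
        return false;
    }
  return true;
}
```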
@subsection Limitations

@itemize @bullet
@item
Only for C so far.
@item
The implementation of call summaries is currently very simplistic.
@item
Lack of function pointer analysis.
@item
The constraint-handling code assumes reflexivity in some places
(that values are equal to themselves), which is not the case for NaN.
As a simple workaround, constraints on floating-point values are
currently ignored.
@item
There are various other limitations in the region model (grep for TODO/xfail
in the testsuite).
@item
The @code{constraint_manager}'s implementation of transitivity is currently
too expensive to enable by default and so must be manually enabled via
@option{-fanalyzer-transitivity}.
@item
The checkers are currently hardcoded and don't allow for user extensibility
(e.g. adding allocate/release pairs).
@item
Although the analyzer's test suite has a proof-of-concept test case for
LTO, LTO support hasn't had extensive testing.  There are various
lang-specific things in the analyzer that assume C rather than LTO.
For example, SSA names are printed to the user in ``raw'' form, rather
than printing the underlying variable name.
@end itemize
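
The NaN problem is easy to demonstrate in C: a NaN compares unequal to
everything, including itself, so an assumption that @code{v == v}
always holds is wrong for floating-point values.  The
@code{is_reflexive} helper below is purely illustrative, and the
behavior shown assumes IEEE floating point without
@option{-ffast-math}.

```c
#include <math.h>
#include <stdbool.h>

/* Reflexivity ("v == v") holds for every integer, but not for a NaN,
   which compares unequal to everything, itself included.  Assumes
   IEEE semantics (no -ffast-math).  */
static bool
is_reflexive (double v)
{
  return v == v;
}
```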

@node Debugging the Analyzer
@section Debugging the Analyzer
@cindex analyzer, debugging
@cindex static analyzer, debugging

When debugging the analyzer I normally use all of these options
together:

@smallexample
./xgcc -B. \
  -S \
  -fanalyzer \
  OTHER_GCC_ARGS \
  -wrapper gdb,--args \
  -fdump-analyzer-stderr \
  -fanalyzer-fine-grained \
  -fdump-ipa-analyzer=stderr
@end smallexample

where:

@itemize @bullet
@item @code{./xgcc -B.}
is the usual way to invoke a self-built GCC from within the @file{BUILDDIR/gcc}
subdirectory.

@item @code{-S}
so that the driver (@code{./xgcc}) invokes @code{cc1}, but doesn't bother
running the assembler or linker (since the analyzer runs inside @code{cc1}).

@item @code{-fanalyzer}
enables the analyzer, obviously.

@item @code{-wrapper gdb,--args}
invokes @code{cc1} under the debugger so that I can debug @code{cc1} and
set breakpoints and step through things.

@item @code{-fdump-analyzer-stderr}
so that the logging interface is enabled and goes to stderr, which often
gives valuable context into what's happening when stepping through the
analyzer.

@item @code{-fanalyzer-fine-grained}
which splits the effect of every statement into its own
@code{exploded_node}, rather than the default (which tries to combine
successive statements to reduce the size of the @code{exploded_graph}).
This makes it easier to see exactly where a particular change happens.

@item @code{-fdump-ipa-analyzer=stderr}
which dumps the GIMPLE IR seen by the analyzer pass to stderr.

@end itemize

Other useful options:

@itemize @bullet
@item @code{-fdump-analyzer-exploded-graph}
which dumps a @file{SRC.eg.dot} GraphViz file that I can look at (with
python-xdot).

@item @code{-fdump-analyzer-exploded-nodes-2}
which dumps a @file{SRC.eg.txt} file containing the full @code{exploded_graph}.

@end itemize

Assuming that you have the
@uref{https://gcc-newbies-guide.readthedocs.io/en/latest/debugging.html,,python support scripts for gdb}
installed (which you should do, as it makes debugging GCC much easier),
you can use:

@smallexample
(gdb) break-on-saved-diagnostic
@end smallexample

to put a breakpoint at the place where a diagnostic is saved during
@code{exploded_graph} exploration, to see where a particular diagnostic
is being saved, and:

@smallexample
(gdb) break-on-diagnostic
@end smallexample

to put a breakpoint at the place where diagnostics are actually emitted.

@subsection Special Functions for Debugging the Analyzer

The analyzer recognizes various special functions by name, for use
in debugging the analyzer, and for use in DejaGnu tests.

The declarations of these functions can be seen in the testsuite
in @file{analyzer-decls.h}.  None of these functions has a real
implementation in code; instead, each is handled by a
@code{known_function} subclass (in @file{gcc/analyzer/kf-analyzer.cc}).

@table @code

@item __analyzer_break
Add:
@smallexample
  __analyzer_break ();
@end smallexample
to the source being analyzed to trigger a breakpoint in the analyzer when
that source is reached.  By putting a series of these in the source, it's
much easier to effectively step through the program state as it's analyzed.

@item __analyzer_describe
The analyzer handles:

@smallexample
__analyzer_describe (0, expr);
@end smallexample

by emitting a warning describing the 2nd argument (which can be of any
type), at a verbosity level given by the 1st argument.  This is for use when
debugging, and may be of use in DejaGnu tests.

@item __analyzer_dump
@smallexample
__analyzer_dump ();
@end smallexample

will dump copious information about the analyzer's state each time it
reaches the call in its traversal of the source.

@item __analyzer_dump_capacity
@smallexample
extern void __analyzer_dump_capacity (const void *ptr);
@end smallexample

will emit a warning describing the capacity of the base region of
the region pointed to by the 1st argument.

@item __analyzer_dump_escaped
@smallexample
extern void __analyzer_dump_escaped (void);
@end smallexample

will emit a warning giving the number of decls that have escaped on this
analysis path, followed by a comma-separated list of their names,
in alphabetical order.

@item __analyzer_dump_path
@smallexample
__analyzer_dump_path ();
@end smallexample

will emit a placeholder ``note'' diagnostic with a path to that call site,
if the analyzer finds a feasible path to it.  This can be useful for
writing DejaGnu tests for constraint-tracking and feasibility checking.

@item __analyzer_dump_exploded_nodes
For every callsite to @code{__analyzer_dump_exploded_nodes} the analyzer
will emit a warning after it has finished the analysis, containing
information on all of the exploded nodes at that program point.

@smallexample
  __analyzer_dump_exploded_nodes (0);
@end smallexample

will output the number of ``processed'' nodes, and the IDs of
both ``processed'' and ``merger'' nodes, such as:

@smallexample
warning: 2 processed enodes: [EN: 56, EN: 58] merger(s): [EN: 54-55, EN: 57, EN: 59]
@end smallexample

With a non-zero argument

@smallexample
  __analyzer_dump_exploded_nodes (1);
@end smallexample

it will also dump all of the states within the ``processed'' nodes.

@item __analyzer_dump_named_constant
When the analyzer sees a call to @code{__analyzer_dump_named_constant} it
will emit a warning describing what is known about the value of a given
named constant, for parts of the analyzer that interact with target
headers.

For example:

@smallexample
__analyzer_dump_named_constant ("O_RDONLY");
@end smallexample

might lead to the analyzer emitting the warning:

@smallexample
warning: named constant 'O_RDONLY' has value '1'
@end smallexample

@item __analyzer_dump_region_model
@smallexample
   __analyzer_dump_region_model ();
@end smallexample
will dump the @code{region_model}'s state to stderr.

@item __analyzer_dump_state
@smallexample
__analyzer_dump_state ("malloc", ptr);
@end smallexample

will emit a warning describing the state of the 2nd argument
(which can be of any type) with respect to the state machine with
a name matching the 1st argument (which must be a string literal).
This is for use when debugging, and may be of use in DejaGnu tests.

@item __analyzer_eval
@smallexample
__analyzer_eval (expr);
@end smallexample
will emit a warning with text ``TRUE'', ``FALSE'' or ``UNKNOWN'' based on the
truthfulness of the argument.  This is useful for writing DejaGnu tests.

@item __analyzer_get_unknown_ptr
@smallexample
__analyzer_get_unknown_ptr ();
@end smallexample
will obtain an unknown @code{void *}.

@item __analyzer_get_strlen
@smallexample
__analyzer_get_strlen (buf);
@end smallexample
will emit a warning if @code{buf} doesn't point to a null-terminated string.
TODO: eventually get the strlen of the buffer (without the
optimizer touching it).

@end table

@subsection Other Debugging Techniques

To compare two different exploded graphs, try
@code{-fdump-analyzer-exploded-nodes-2 -fdump-noaddr -fanalyzer-fine-grained}.
This will dump a @file{SRC.eg.txt} file containing the full
@code{exploded_graph}.  I use @code{diff -u50 -p} to compare two different
such files (e.g. before and after a patch) to find the first place where the
two graphs diverge.  The option @option{-fdump-noaddr} will suppress
printing pointers within the dumps (which would otherwise hide the real
differences with irrelevant churn).

The option @option{-fdump-analyzer-json} will dump both the supergraph
and the exploded graph in compressed JSON form.

One approach when tracking down where a particular bogus state is
introduced into the @code{exploded_graph} is to add custom code to
@code{program_state::validate}.

The debug function @code{region::is_named_decl_p} can be used when debugging,
such as for assertions and conditional breakpoints.  For example, when
tracking down a bug in handling a decl called @code{yy_buffer_stack}, I
temporarily added:
@smallexample
  gcc_assert (!m_base_region->is_named_decl_p ("yy_buffer_stack"));
@end smallexample
to @code{binding_cluster::mark_as_escaped} to trap a point where
@code{yy_buffer_stack} was mistakenly being treated as having escaped.