doc/debugger/manual.rst

   1 .. _charmdebug:
   2
   3 ================
   4 Charm++ Debugger
   5 ================
   6
   7 .. contents::
   8    :depth: 3
   9
  10
  11 Introduction
  12 ============
  13
  14 The primary goal of the Charm++ parallel debugger is to provide an integrated
  15 debugging environment that allows the programmer to examine the
  16 changing state of parallel programs during the course of their execution.
  17
  18 The Charm++ debugging system has a number of useful features for Charm++
  19 programmers. The system includes a Java GUI client program which runs on
  20 the programmer’s desktop, and a Charm++ parallel program which acts as a
  21 server. The client and server need not be on the same machine, and
  22 communicate over the network using a secure protocol described in
  23 :ref:`converse_client_server`.
  24
  25 The system provides the following features:
  26
  27 -  Provides a means to easily access and view the major programmer-visible
  28    entities, including array elements and messages in queues,
  29    across the parallel machine during program execution. Objects and
  30    messages are extracted as raw data, and interpreted by the debugger.
  31
  32 -  Provides an interface to set and remove breakpoints on remote entry
  33    points, which capture the major programmer-visible control flows in a
  34    Charm++ program.
  35
  36 -  Provides the ability to freeze and unfreeze the execution of selected
  37    processors of the parallel program, which allows a consistent
  38    snapshot by preventing things from changing as they are examined.
  39
  40 -  Provides a way to attach a sequential debugger to a specific subset
  41    of processes of the parallel program during execution, which keeps a
  42    manageable number of sequential debugger windows open. Currently
  43    these windows are opened independently of the GUI, while in
  44    the future they will be transformed into an integrated view.
  45
  46 The debugging client provides these features via extensive support built
  47 into the Charm++ runtime.
  48
  49 Building the Charm++ Debug Tool
  50 ===============================
  51
  52 To get the CharmDebug tool, check out the source code from the following
  53 repository. This will create a directory named ccs_tools. Change to this
  54 directory and build the project.
  55
  56 .. code-block:: bash
  57
  58    $ git clone https://charm.cs.illinois.edu/gerrit/ccs_tools
  59    $ cd ccs_tools
  60    $ ant
  61
  62 This will create the executable ``bin/charmdebug``.
  63
  64 You can also download the binaries from the Charm++ downloads website
  65 and use them directly without building. (NOTE: Binaries may not work properly
  66 on some platforms, so building from the source code is recommended.)
  67
  68 Preparing the Charm++ Application for Debugging
  69 ===============================================
  70
  71 Build Charm++ using the ``--enable-charmdebug`` option. For example:
  72
  73 .. code-block:: bash
  74
  75    $ ./build charm++ netlrts-darwin-x86_64 --enable-charmdebug
  76
  77 No instrumentation is required to use the Charm++ debugger. Because it is
  78 CCS based, you can use it to set and step through entry point breakpoints
  79 and examine Charm++ structures in any Charm++ application.
  80
  81 Nevertheless, for some features to be present, some additional options
  82 may be required at either compile or link time:
  83
  84 -  In order to provide a symbolic representation of the machine code executed
  85    by the application, the ``-g`` option is needed at compile time. This
  86    setting is needed to provide function names as well as source file
  87    names and line numbers wherever useful. This is also important to fully
  88    utilize gdb (or any other serial debugger) on one or more processes.
  89
  90 -  Optimization options, by nature of transforming the source
  91    code, can produce a mismatch between the function displayed in the
  92    debugger (for example in a stack trace) and the functions present in
  93    the source code. To produce information coherent with source code,
  94    optimization is discouraged. Newer versions of some compilers support
  95    the ``-Og`` optimization level, which performs all optimizations that do
  96    not inhibit debugging.
  97
  98 -  The link time option ``-memory charmdebug`` is only needed if you want
  99    to use the Memory view (see :numref:`sec-memory`) or the
 100    Inspector framework (see :numref:`sec_inspector`) in CharmDebug.
 101
 102 Record Replay
 103 -------------
 104
 105 The *Record Replay* feature is independent of the charmdebug
 106 application. It is a mechanism used to detect bugs that occur rarely
 107 because they depend on the order in which messages are processed. The
 108 program under consideration is first executed in record mode, which produces a
 109 trace. When the program is run in replay mode, it uses the previously recorded
 110 trace to ensure that messages are processed in the
 111 same order as in the recorded run. The idea is to make use of a
 112 message sequence number and a theorem which says that the serial numbers will
 113 be the same if the messages are processed in the same order.
 114 .. `\cite{rashmithesis}`
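
As a rough illustration of the record and replay idea, the following
self-contained sketch models a single PE whose messages may arrive in a
different order on each run. It is not the Charm++ implementation: the names
``Msg``, ``Recorder`` and ``Replayer``, and the use of a (sender, sequence
number) pair as the message identity, are assumptions made for this example.

.. code-block:: c++

   #include <cstddef>
   #include <map>
   #include <string>
   #include <utility>
   #include <vector>

   // Hypothetical message identity: sending PE plus a per-sender sequence number.
   using MsgId = std::pair<int, int>;

   struct Msg {
     MsgId id;
     std::string payload;
   };

   // Record mode: remember the order in which messages were processed.
   class Recorder {
    public:
     void process(const Msg &m) { trace_.push_back(m.id); }
     const std::vector<MsgId> &trace() const { return trace_; }

    private:
     std::vector<MsgId> trace_;
   };

   // Replay mode: hold back early arrivals until the trace says it is their turn.
   class Replayer {
    public:
     explicit Replayer(std::vector<MsgId> trace) : trace_(std::move(trace)) {}

     // Accept a message in arrival order and return the messages that may now
     // be processed, in recorded order.
     std::vector<Msg> arrive(const Msg &m) {
       pending_[m.id] = m;
       std::vector<Msg> ready;
       while (next_ < trace_.size()) {
         auto it = pending_.find(trace_[next_]);
         if (it == pending_.end()) break;  // next recorded message not here yet
         ready.push_back(it->second);
         pending_.erase(it);
         ++next_;
       }
       return ready;
     }

    private:
     std::vector<MsgId> trace_;
     std::map<MsgId, Msg> pending_;
     std::size_t next_ = 0;
   };

In this toy model, a replayed run releases messages only in the order stored by
``Recorder``, however they happen to arrive. A message that arrives early is
held in ``pending_`` until every message recorded before it has been processed.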
 115
 116 *Record Replay* tracing is automatically enabled for Charm++ programs
 117 and requires nothing special to be done during compilation. (Linking with
 118 the option ``-tracemode recordreplay`` used to be necessary). At run
 119 time, the ``+record`` option is used, which records messages in order in
 120 a file for each processor. The same execution order can be replayed
 121 using the ``+replay`` runtime option, which can be used at the same time
 122 as the other debugging tools in Charm++.
 123
 124 *Note!* If your Charm++ is built with ``CMK_OPTIMIZE`` on, all tracing
 125 will be disabled. So, use an unoptimized Charm++ to do your debugging.
 126
 127 Running the Debugger
 128 ====================
 129
 130 CharmDebug command line parameters
 131 ----------------------------------
 132
 133 ``-pes``
 134    Number of PEs
 135
 136 ``+p``
 137    Number of PEs
 138
 139 ``-host``
 140    hostname of CCS server for application
 141
 142 ``-user``
 143    the username to use to connect to the hostname selected
 144
 145 ``-port``
 146    portnumber of CCS server for application
 147
 148 ``-sshtunnel``
 149    force the communication between client and server (in particular the
 150    one for CCS) to be tunnelled through ssh. This allows firewalls to be
 151    bypassed.
 152
 153 ``-display``
 154    X Display
 155
 156 Basic usage
 157 -----------
 158
 159 To run an application locally via the debugger on 4 PEs with command
 160 line options for your program (shown here as ``opt1 opt2``):
 161
 162 .. code-block:: bash
 163
 164    $ charmdebug pgm +p4 4 opt1 opt2
 165
 166 If the application is to be run on a remote cluster behind a firewall,
 167 the previous command line becomes:
 168
 169 .. code-block:: bash
 170
 171    $ charmdebug -host cluster.inst.edu -user myname -sshtunnel pgm +p4 4 opt1 opt2
 172
 173 CharmDebug can also be executed without any parameters. The user can
 174 then choose the application to launch and its command line parameters
 175 from within the ``File`` menu as shown in Figure :numref:`menu`.
 176
 177 .. figure:: figs/menu.png
 178    :name: menu
 179    :width: 3in
 180    :height: 3in
 181
 182    Using the menu to set parameters for the Charm++ program
 183
 184 *Note: charmdebug command line launching only works on netlrts-\* and
 185 verbs-\* builds of Charm++.*
 186
 187 To replay a previously recorded session:
 188
 189 .. code-block:: bash
 190
 191    $ charmdebug pgm +p4 opt1 opt2  +replay
 192
 193 Charm Debugging Related Options
 194 -------------------------------
 195
 196 When you use the Charm debugger to launch your application, it
 197 automatically sets these options to defaults appropriate for most situations.
 198
 199 ``+cpd``
 200    Triggers application freeze at startup for debugger.
 201
 202 ``++charmdebug``
 203    Triggers charmrun to provide some information about the executable,
 204    as well as provide an interface to gdb for querying.
 205
 206 ``+debugger``
 207    Which debuggers to use.
 208
 209 ``++debug``
 210    Run each node under gdb in an xterm window, prompting the user to
 211    begin execution.
 212
 213 ``++debug-no-pause``
 214    Run each node under gdb in an xterm window immediately (i.e. without
 215    prompting the user to begin execution).
 216
 217    *Note:* If you’re using the charm debugger it will probably be best
 218    to control the sequential (i.e. gdb) debuggers from within its GUI
 219    interface.
 220
 221 ``++DebugDisplay``
 222    X Display for xterm
 223
 224 ``++server-port``
 225    Port to listen for CCS requests
 226
 227 ``++server``
 228    Enable client-server (CCS) mode
 229
 230 ``+record``
 231    Use the recordreplay tracemode to record the exact event/message
 232    sequence for later use.
 233
 234 ``+replay``
 235    Force the use of recorded log of events/messages to exactly reproduce
 236    a previous run.
 237
 238    The preceding pair of options, ``+record`` and ``+replay``, is used to
 239    produce the “instant replay” feature. This feature is valuable for
 240    catching errors which only occur sporadically. Such bugs, which arise
 241    from the nondeterminacy of parallel execution, can be fiendishly
 242    difficult to replicate in a debugging environment. Typical usage is
 243    to keep running the application with ``+record`` until the bug occurs,
 244    then run the application under the debugger with the ``+replay`` option.
 245
 246 CharmDebug limitations
 247 ----------------------
 248
 249 Clusters
 250 ~~~~~~~~
 251
 252 CharmDebug is currently limited to applications started directly by the
 253 debugger due to implementation peculiarities. It will be extended to
 254 support connection to remote running applications in the near future.
 255
 256 Due to the current implementation, the debugging tool is limited to
 257 netlrts-\* and verbs-\* versions. Other builds of Charm++ might have
 258 unexpected behavior. In the near future this will be extended at least
 259 to the mpi-\* versions.
 260
 261 .. _record-replay-1:
 262
 263 Record Replay
 264 ~~~~~~~~~~~~~
 265
 266 The *Record Replay* feature does not work well with spontaneous
 267 events. Load balancing is the most common form of spontaneous event in
 268 that it occurs periodically with no other causal event.
 269
 270 .. figure:: figs/snapshot3.png
 271    :name: snapshot3
 272    :width: 3in
 273    :height: 4in
 274
 275    Parallel debugger when a break point is reached
 276
 277 As per Rashmi’s thesis:
 278
 279    "There are some unique issues for replay in the
 280    context of Charm because it provides high-level support for dynamic load
 281    balancing, quiescence detection and information sharing. Many of the
 282    load balancing strategies in Charm have a spontaneous component. The
 283    strategy periodically checks the sizes of the queues on the local
 284    processor. A replay load balancing strategy implements the known load
 285    redistribution. The behavior of the old balancing strategy is therefore
 286    not replayed only its effect is. Since minimal tracing is used by the
 287    replay mechanism the amount of perturbation due to tracing is reduced.
 288    The replay mechanism is proposed as a debugging support to replay
 289    asynchronous message arrival orders."
 290
 291 Moreover, if your application crashes without a clean shutdown, the log
 292 may be lost with the application.
 293
 294 .. _sec:using:
 295
 296 Using the Debugger
 297 ------------------
 298
 299 Once the debugger’s GUI loads, the programmer triggers the program
 300 execution by clicking the *Start* button. When CharmDebug is started from the
 301 command line, the application is started automatically. The program begins by
 302 displaying the user and system entry points as a list of check boxes,
 303 pausing at the onset. The user can set breakpoints by
 304 clicking on the corresponding entry points and kick off execution by
 305 clicking the *Continue* button. Figure :numref:`snapshot3` shows a
 306 snapshot of the debugger when a breakpoint is reached. The program
 307 freezes when a breakpoint is reached.
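
The interaction between entry point breakpoints, freezing and continuing can be
pictured with a small model of one PE's message queue. This is a sketch under
stated assumptions, not the runtime's actual mechanism: ``ToyScheduler`` and all
of its members are hypothetical names, and running an entry method is
represented simply by appending its name to a list.

.. code-block:: c++

   #include <deque>
   #include <set>
   #include <string>
   #include <vector>

   // Toy model of one PE's scheduler; every name here is hypothetical.
   class ToyScheduler {
    public:
     void setBreakpoint(const std::string &entry) { breakpoints_.insert(entry); }
     void removeBreakpoint(const std::string &entry) { breakpoints_.erase(entry); }

     // A message for an entry point arrives and is queued.
     void enqueue(const std::string &entry) { queue_.push_back(entry); }

     void freeze() { frozen_ = true; }
     bool frozen() const { return frozen_; }

     // Resume execution: deliver queued messages until the queue is empty or a
     // breakpoint is reached. The message that caused the previous stop is
     // delivered without re-checking its breakpoint, so that it can make progress.
     void resume() {
       bool skipCheck = stoppedAtBreakpoint_;
       frozen_ = false;
       stoppedAtBreakpoint_ = false;
       while (!queue_.empty()) {
         const std::string entry = queue_.front();
         if (!skipCheck && breakpoints_.count(entry) != 0) {
           frozen_ = true;  // stop before running the entry method
           stoppedAtBreakpoint_ = true;
           return;
         }
         skipCheck = false;
         queue_.pop_front();
         executed_.push_back(entry);  // stands in for running the entry method
       }
     }

     const std::vector<std::string> &executed() const { return executed_; }

    private:
     std::set<std::string> breakpoints_;
     std::deque<std::string> queue_;
     std::vector<std::string> executed_;
     bool frozen_ = true;  // the model starts frozen, like an application run with +cpd
     bool stoppedAtBreakpoint_ = false;
   };

In this model, ``resume()`` plays the role of the *Continue* button and
``freeze()`` that of the *Freeze* button. Messages that arrive while the PE is
frozen stay in the queue and are delivered on the next ``resume()``.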
 308
 309 Clicking the *Freeze* button during the execution of the program freezes
 310 execution, while the *Continue* button resumes execution. The *Quit* button can
 311 be used to abort execution at any point of time. Entities (for instance,
 312 array elements) and their contents on any processor can be viewed at any
 313 point in time during execution as illustrated in Figure
 314 :numref:`arrayelement`.
 315
 316 .. figure:: figs/arrayelement.png
 317    :name: arrayelement
 318    :width: 3in
 319    :height: 4in
 320
 321    Freezing program execution and viewing the contents of an array
 322    element using the Parallel Debugger
 323
 324 Specific individual processes of the Charm++ program can be attached to
 325 instances of *gdb* as shown in Figure :numref:`gdb`. The programmer
 326 chooses which PEs to connect *gdb* processes to via the checkboxes on
 327 the right side. *Note!* While the program is suspended in gdb for step
 328 debugging, high-level CharmDebug features such as object inspection will not
 329 work.
 330
 331 .. figure:: figs/snapshot4-crop.png
 332    :name: gdb
 333    :width: 6in
 334
 335    Parallel debugger showing instances of *gdb* open for the selected
 336    processor elements
 337
 338 Charm++ objects can be examined via the *View Entities on PE : Display*
 339 selector. It allows the user to choose from *Charm Objects, Array
 340 Elements, Messages in Queue, Readonly Variables, Readonly Messages,
 341 Entry Points, Chare Types, Message Types and Mainchares*. The right-side
 342 selector sets the PE upon which the request for display will be
 343 made. The user may then click on the *Entity* to see the details.
 344
 345 .. _sec-memory:
 346
 347 Memory View
 348 ~~~~~~~~~~~
 349
 350 The menu option Action \ :math:`\rightarrow` Memory allows the user to
 351 display the entire memory layout of a specific processor. An example is
 352 shown in Figure :numref:`fig:memory`. This layout is colored and the
 353 colors have the following meaning:
 354
 355 .. figure:: figs/memoryView.png
 356    :name: fig:memory
 357
 358    Main memory view
 359
 360 red
 361    memory allocated by the Charm++ Runtime System;
 362
 363 blue
 364    memory allocated directly by the user in their code;
 365
 366 pink
 367    memory used by messages;
 368
 369 orange
 370    memory allocated to a chare element;
 371
 372 black
 373    memory not allocated;
 374
 375 gray
 376    a big jump in memory addresses due to the memory pooling system; it
 377    represents a large portion of virtual address space not used between
 378    two different zones of used virtual address space;
 379
 380 yellow
 381    the currently selected memory slot;
 382
 383 Currently it is not possible to change this color association. The
 384 bottom part of the view shows the stack trace at the moment when the
 385 highlighted (yellow) memory slot was allocated. Left-clicking a
 386 particular slot fixes it in highlight mode. This allows a
 387 more accurate inspection of its stack trace when the trace is large and does
 388 not fit the window.
 389
 390 Info \ :math:`\rightarrow`\ Show Statistics will display a small
 391 information box like the one in Figure :numref:`fig:memory-stat`.
 392
 393 .. figure:: figs/memoryStatistics.png
 394    :name: fig:memory-stat
 395
 396    Information box displaying memory statistics
 397
 398 A useful tool of this view is the memory leak search. This is located in
 399 the menu Action \ :math:`\rightarrow` Search Leaks. The processor under
 400 inspection runs a reachability test on every allocated memory slot to
 401 find whether there is a pointer to it. If there is none, the slot is
 402 partially colored in green to mark it as a leak. The user can
 403 then inspect these slots further. Figure :numref:`fig:memory-leak` shows
 404 some leaks being detected.
 405
 406 .. figure:: figs/memoryLeaking.png
 407    :name: fig:memory-leak
 408
 409    Memory view after running the Search Leaks tool
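
To make the idea of a reachability test concrete, the sketch below marks every
slot that can be reached by following pointer-sized words, and reports the rest
as leaks. It is a simplified model rather than the algorithm CharmDebug runs:
the ``Slot`` layout, the explicit list of root addresses, and the function names
are all assumptions introduced for this example.

.. code-block:: c++

   #include <cstddef>
   #include <cstdint>
   #include <vector>

   // Hypothetical description of one allocated slot: its address range and the
   // pointer-sized words stored in it.
   struct Slot {
     std::uintptr_t begin;
     std::size_t size;
     std::vector<std::uintptr_t> words;
   };

   // True if addr falls inside the slot's address range.
   inline bool pointsInto(std::uintptr_t addr, const Slot &s) {
     return addr >= s.begin && addr - s.begin < s.size;
   }

   // Return the indices of slots that cannot be reached from the roots.
   std::vector<std::size_t> findLeaks(const std::vector<Slot> &slots,
                                      const std::vector<std::uintptr_t> &roots) {
     std::vector<bool> reached(slots.size(), false);
     std::vector<std::uintptr_t> work(roots);
     while (!work.empty()) {
       const std::uintptr_t addr = work.back();
       work.pop_back();
       for (std::size_t i = 0; i < slots.size(); ++i) {
         if (!reached[i] && pointsInto(addr, slots[i])) {
           reached[i] = true;
           // Every word in a reached slot is treated as a candidate pointer.
           work.insert(work.end(), slots[i].words.begin(), slots[i].words.end());
         }
       }
     }
     std::vector<std::size_t> leaks;
     for (std::size_t i = 0; i < slots.size(); ++i)
       if (!reached[i]) leaks.push_back(i);
     return leaks;
   }

Treating every word as a possible pointer is conservative: a stray integer that
happens to look like an address can hide a real leak, but a slot reported by
``findLeaks`` is never referenced from anything reachable in this model.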
 410
 411 If the memory window is kept open while the application is unfrozen and
 412 makes progress, the loaded image will become obsolete. To cope with
 413 this, the “Update” button will refresh the view to the current
 414 allocation status. All the leaks that had already been found
 415 will still be partially colored in green, while the newly allocated
 416 slots will not, even if leaking. To update the leak status, re-run the
 417 Search Leaks tool.
 418
 419 Finally, when a specific slot is highlighted, the menu
 420 Action \ :math:`\rightarrow` Inspect opens a new window displaying the
 421 content of the memory in that slot, as interpreted by the debugger (see
 422 next subsection for more details on this).
 423
 424 .. _sec_inspector:
 425
 426 Inspector framework
 427 ~~~~~~~~~~~~~~~~~~~
 428
 429 Without any rewriting of the application’s code, CharmDebug is capable of
 430 loading a raw area of memory and parsing it with a given type name. The
 431 result (as shown in Figure :numref:`fig:inspect`) is a browsable tree.
 432 The initial type of a memory area is given by its virtual table pointer
 433 (Charm++ objects are virtual and therefore loadable). In the case of
 434 memory slots not containing classes with virtual methods, no display
 435 will be possible.
 436
 437 .. figure:: figs/memoryInspector.png
 438    :name: fig:inspect
 439
 440    Raw memory parsed and displayed as a tree
 441
 442 When the view is open and is displaying a type, right-clicking on a
 443 leaf containing a pointer to another memory location opens a popup menu that
 444 allows the user to ask for its dereference (shown in
 445 Figure :numref:`fig:inspect`). In this case, CharmDebug will load this
 446 raw data as well and parse it with the given type name of the pointer.
 447 This dereference will be inlined and the leaf will become an internal
 448 node of the browse tree.
 449
 450
 451 Debugger Implementation Details
 452 ===============================
 453
 454 The following classes in the PUP framework were used in implementing
 455 debugging support in Charm++.
 456
 457 -  ``class PUP::er`` - This class is the abstract superclass of all the
 458    other classes in the framework. The ``pup`` method of a particular
 459    class takes a reference to a ``PUP::er`` as parameter. This class has
 460    methods for dealing with all the basic C++ data types. All these
 461    methods are expressed in terms of a generic pure virtual method.
 462    Subclasses only need to provide the generic method.
 463
 464 -  ``class PUP::toText`` - This is a subclass of the ``PUP::toTextUtil``
 465    class which is a subclass of the ``PUP::er`` class. It copies the
 466    data of an object to a C string, including the terminating NULL.
 467
 468 -  ``class PUP::sizerText`` - This is a subclass of the
 469    ``PUP::toTextUtil`` class which is a subclass of the ``PUP::er``
 470    class. It returns the number of characters including the terminating
 471    NULL and is used by the ``PUP::toText`` object to allocate space for
 472    building the C string.
 473
 474 The code below shows a simple class declaration that includes a ``pup``
 475 method.
 476
 477 .. code-block:: c++
 478
 479      class foo {
 480       private:
 481        bool isBar;
 482        int x;
 483        char y;
 484        unsigned long z;
 485        float q[3];
 486       public:
 487        void pup(PUP::er &p) {  // one routine serves every PUP::er subclass
 488          p(isBar);
 489          p(x); p(y); p(z);
 490          p(q, 3);              // array of 3 floats
 491        }
 492      };
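
The two-pass pattern described above, in which a sizer measures the output
before a text writer fills a buffer, can be sketched without the Charm++
headers. The ``toy`` namespace below is a deliberately reduced imitation, not
the real PUP framework: its classes only borrow the names ``er``, ``sizerText``
and ``toText``, and the ``field`` method, the ``;`` separator, ``Point`` and
``describe`` are assumptions made for this example.

.. code-block:: c++

   #include <cstddef>
   #include <cstring>
   #include <string>

   namespace toy {

   // Abstract base: the typed calls all funnel into one pure virtual method.
   class er {
    public:
     virtual ~er() {}
     void operator()(int &v) { field(std::to_string(v)); }
     void operator()(bool &v) { field(v ? "true" : "false"); }
     void operator()(char &v) { field(std::string(1, v)); }

    protected:
     virtual void field(const std::string &text) = 0;
   };

   // First pass: count the characters needed, including the terminator.
   class sizerText : public er {
    public:
     std::size_t size() const { return chars_ + 1; }

    protected:
     void field(const std::string &text) override { chars_ += text.size() + 1; }

    private:
     std::size_t chars_ = 0;
   };

   // Second pass: write each field, followed by ';', into the caller's buffer.
   class toText : public er {
    public:
     explicit toText(char *buf) : cur_(buf) { *cur_ = '\0'; }

    protected:
     void field(const std::string &text) override {
       std::memcpy(cur_, text.data(), text.size());
       cur_ += text.size();
       *cur_++ = ';';
       *cur_ = '\0';  // keep the buffer terminated after every field
     }

    private:
     char *cur_;
   };

   }  // namespace toy

   struct Point {
     int x;
     bool visible;
     void pup(toy::er &p) { p(x); p(visible); }
   };

   // Size the object, allocate a buffer of that size, then render it as text.
   std::string describe(Point &pt) {
     toy::sizerText sizer;
     pt.pup(sizer);
     std::string buf(sizer.size(), '\0');
     toy::toText writer(&buf[0]);
     pt.pup(writer);
     return std::string(buf.c_str());
   }

The point of the design is that ``Point::pup`` is written once and is traversed
twice with different ``er`` subclasses, so the sizing pass and the writing pass
cannot disagree about which fields are visited.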
 493
 494 Converse Client-Server Interface
 495 --------------------------------
 496
 497 The Converse Client-Server (CCS) module enables Converse
 498 .. `\cite{InterOpIPPS96}`
 499 programs to act as parallel servers,
 500 responding to requests from non-Converse programs. The CCS module is
 501 split into two parts -- client and server. The server side is used by
 502 Converse programs while the client side is used by arbitrary non-Converse
 503 programs. A CCS client accesses a running Converse program by talking to
 504 a ``server-host`` which receives the CCS requests and relays them to the
 505 appropriate processor. The ``server-host`` is ``charmrun``
 506 .. `\cite{charmman}`
 507 for netlrts- versions and is the first
 508 processor for all other versions.
 509
 510 In the case of the netlrts- version of Charm++, a Converse program is
 511 started as a server by running the Charm++ program using the additional
 512 runtime option ``++server``. This opens the CCS server on any TCP port
 513 number. The TCP port number can be specified using the command-line
 514 option ``++server-port``. A CCS client connects to a CCS server, asks a
 515 server PE to execute a pre-registered handler and receives the response
 516 data. The function ``CcsConnect`` takes a pointer to a ``CcsServer`` as
 517 an argument and connects to the given CCS server. The functions
 518 ``CcsNumNodes``, ``CcsNumPes``, and ``CcsNodeSize``, implemented as part of
 519 the client interface in Charm++, return information about the parallel
 520 machine. The function ``CcsSendRequest`` takes a handler ID and the
 521 destination processor number as arguments and asks the server to execute
 522 the requested handler on the specified processor. ``CcsRecvResponse``
 523 receives a response to the previous request in-place. A timeout, in seconds,
 524 is also specified: if it expires before a response arrives, the function
 525 returns 0; otherwise it returns the number of bytes received.
 526
 527 Once a request arrives on a CCS server socket, the CCS server runtime
 528 looks up the appropriate registered handler and calls it. If no handler
 529 is found, the runtime prints a diagnostic and ignores the message. If the
 530 CCS module is disabled in the core, all CCS routines become macros
 531 returning 0. The function ``CcsRegisterHandler`` is used to register
 532 handlers in the CCS server. A handler ID string and a function pointer
 533 are passed as parameters, and the runtime builds a table that maps each
 534 handler ID string to its function pointer. Various built-in functions are
 535 provided which can be called from within a CCS handler. The debugger
 536 behaves as a CCS client invoking appropriate handlers which make use of
 537 some of these functions. Some of the built-in functions are as follows.
 538
 539 -  ``CcsSendReply`` - This function sends the data provided as an
 540    argument back to the client as a reply. This function can only be
 541    called from a CCS handler invoked remotely.
 542
 543 -  ``CcsDelayReply`` - This call is made to allow a CCS reply to be
 544    delayed until after the handler has completed.
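
The server-side lookup described above, a table from handler ID strings to
functions with a diagnostic when no entry matches, can be modelled in a few
lines. This is an illustrative sketch only and does not use the CCS API:
``ToyCcsServer``, ``ToyHandler``, ``registerHandler`` and ``dispatch`` are
hypothetical names, and requests and replies are plain strings.

.. code-block:: c++

   #include <functional>
   #include <iostream>
   #include <map>
   #include <string>
   #include <utility>

   // Hypothetical stand-in for a CCS handler: takes the request bytes and
   // returns the reply bytes.
   using ToyHandler = std::function<std::string(const std::string &)>;

   class ToyCcsServer {
    public:
     // Similar in spirit to CcsRegisterHandler: associate an ID string with a
     // function.
     void registerHandler(const std::string &id, ToyHandler fn) {
       table_[id] = std::move(fn);
     }

     // Look up the handler for a request. Returns true and fills reply if one
     // is registered; otherwise prints a diagnostic and ignores the request.
     bool dispatch(const std::string &id, const std::string &request,
                   std::string &reply) const {
       auto it = table_.find(id);
       if (it == table_.end()) {
         std::cerr << "toy CCS: no handler registered for '" << id << "'\n";
         return false;
       }
       reply = it->second(request);
       return true;
     }

    private:
     std::map<std::string, ToyHandler> table_;
   };

A debugger-like client would then name the handler it wants in each request. In
this model an unknown ID leaves ``reply`` untouched, which mirrors the
"print a diagnostic and ignore the message" behaviour.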
 545
 546 The CCS runtime system provides several built-in CCS handlers, which are
 547 available to any Converse program. All Charm++ programs are essentially
 548 Converse programs. ``ccs_getinfo`` takes an empty message and responds
 549 with information about the parallel job. Similarly the handler
 550 ``ccs_killport`` allows a client to be notified when a parallel run
 551 exits.