14 The primary goal of the Charm++ parallel debugger is to provide an integrated
15 debugging environment that allows the programmer to examine the
16 changing state of parallel programs during the course of their execution.
18 The Charm++ debugging system has a number of useful features for Charm++
19 programmers. The system includes a Java GUI client program which runs on
20 the programmer’s desktop, and a Charm++ parallel program which acts as a
21 server. The client and server need not be on the same machine, and
22 communicate over the network using a secure protocol described in
23 :ref:`converse_client_server`.
25 The system provides the following features:
27 - Provides a means to easily access and view the major programmer-visible
28 entities, including array elements and messages in queues,
29 across the parallel machine during program execution. Objects and
30 messages are extracted as raw data, and interpreted by the debugger.
- Provides an interface to set and remove breakpoints on remote entry
  points, which capture the major programmer-visible control flows in a
  Charm++ program.
36 - Provides the ability to freeze and unfreeze the execution of selected
37 processors of the parallel program, which allows a consistent
38 snapshot by preventing things from changing as they are examined.
40 - Provides a way to attach a sequential debugger to a specific subset
41 of processes of the parallel program during execution, which keeps a
42 manageable number of sequential debugger windows open. Currently
43 these windows are opened independently of the GUI interface, while in
44 the future they will be transformed into an integrated view.
46 The debugging client provides these features via extensive support built
47 into the Charm++ runtime.
49 Building the Charm++ Debug Tool
50 ===============================
52 To get the CharmDebug tool, check out the source code from the following
53 repository. This will create a directory named ccs_tools. Change to this
54 directory and build the project.
58 $ git clone https://charm.cs.illinois.edu/gerrit/ccs_tools
62 This will create the executable ``bin/charmdebug``.
You can also download the binaries from the Charm++ downloads website
and use them directly without building. (NOTE: The binaries may not work
properly on some platforms, so building from the source code is recommended.)
68 Preparing the Charm++ Application for Debugging
69 ===============================================
Build Charm++ using the ``--enable-charmdebug`` option. For example:
75 $ ./build charm++ netlrts-darwin-x86_64 --enable-charmdebug
No instrumentation is required to use the Charm++ debugger. Because it is
CCS based, you can use it to set and step through entry point breakpoints
and examine Charm++ structures in any Charm++ application.
81 Nevertheless, for some features to be present, some additional options
82 may be required at either compile or link time:
84 - In order to provide a symbolic representation of the machine code executed
85 by the application, the ``-g`` option is needed at compile time. This
86 setting is needed to provide function names as well as source file
87 names and line numbers wherever useful. This is also important to fully
88 utilize gdb (or any other serial debugger) on one or more processes.
- Optimization options, by nature of transforming the source
  code, can produce a mismatch between the functions displayed in the
  debugger (for example in a stack trace) and the functions present in
93 the source code. To produce information coherent with source code,
94 optimization is discouraged. Newer versions of some compilers support
95 the ``-Og`` optimization level, which performs all optimizations that do
96 not inhibit debugging.
98 - The link time option ``-memory charmdebug`` is only needed if you want
99 to use the Memory view (see :numref:`sec-memory`) or the
100 Inspector framework (see :numref:`sec_inspector`) in CharmDebug.
105 The *Record Replay* feature is independent of the charmdebug
106 application. It is a mechanism used to detect bugs that happen rarely
107 depending on the order in which messages are processed. The
program under consideration is first executed in record mode, which produces a
trace. When the program is run in replay mode, it uses the previously recorded
trace to ensure that messages are processed in the
same order as in the recorded run. The idea is to make use of a
message sequence number to satisfy a theorem that says that the serial numbers will
be the same if the messages are processed in the same order.
116 *Record Replay* tracing is automatically enabled for Charm++ programs
117 and requires nothing special to be done during compilation. (Linking with
118 the option ``-tracemode recordreplay`` used to be necessary). At run
119 time, the ``+record`` option is used, which records messages in order in
120 a file for each processor. The same execution order can be replayed
121 using the ``+replay`` runtime option, which can be used at the same time
122 as the other debugging tools in Charm++.
124 *Note!* If your Charm++ is built with ``CMK_OPTIMIZE`` on, all tracing
125 will be disabled. So, use an unoptimized Charm++ to do your debugging.
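The record and replay modes described above can be illustrated with a small, self-contained simulation. This is only a sketch of the general idea (log the processing order once, then buffer early arrivals during replay until they are due); it does not use or reproduce the Charm++ runtime. The ``Message`` type and the ``recordRun`` and ``replayRun`` functions are hypothetical names introduced for this example, and sequence numbers are assumed to be unique.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical message type for illustration; not a Charm++ structure.
struct Message {
    int seq;              // sequence number, assumed unique per message
    std::string payload;
};

// Record mode: process messages in arrival order and log that order.
std::vector<int> recordRun(const std::vector<Message>& arrivals,
                           std::vector<std::string>& processed) {
    std::vector<int> trace;
    for (const Message& m : arrivals) {
        processed.push_back(m.payload);
        trace.push_back(m.seq);
    }
    return trace;
}

// Replay mode: buffer messages that arrive early and process a message
// only when its sequence number is the next one in the recorded trace.
std::vector<std::string> replayRun(const std::vector<Message>& arrivals,
                                   const std::vector<int>& trace) {
    std::vector<std::string> processed;
    std::map<int, Message> pending;   // arrived but not yet due
    std::size_t next = 0;             // position in the recorded trace
    for (const Message& m : arrivals) {
        assert(pending.count(m.seq) == 0);  // sequence numbers must be unique
        pending.emplace(m.seq, m);
        // Drain every buffered message that has now become due.
        while (next < trace.size()) {
            auto it = pending.find(trace[next]);
            if (it == pending.end()) break;
            processed.push_back(it->second.payload);
            pending.erase(it);
            ++next;
        }
    }
    return processed;
}
```

Calling ``recordRun`` on one arrival order and then ``replayRun`` on a different arrival order of the same messages yields the same processing order in both runs, which is the property the replay mode relies on.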
130 CharmDebug command line parameters
131 ----------------------------------
140 hostname of CCS server for application
143 the username to use to connect to the hostname selected
146 portnumber of CCS server for application
force the communication between client and server (in particular the
one for CCS) to be tunnelled through ssh. This allows the bypass of
firewalls.
159 To run an application locally via the debugger on 4 PEs with command
160 line options for your program (shown here as ``opt1 opt2``):
164 $ charmdebug pgm +p4 4 opt1 opt2
166 If the application should be run in a remote cluster behind a firewall,
167 the previous command line will become:
171 $ charmdebug -host cluster.inst.edu -user myname -sshtunnel pgm +p4 4 opt1 opt2
173 CharmDebug can also be executed without any parameters. The user can
174 then choose the application to launch and its command line parameters
175 from within the ``File`` menu as shown in Figure :numref:`menu`.
177 .. figure:: figs/menu.png
182 Using the menu to set parameters for the Charm++ program
184 *Note: charmdebug command line launching only works on netlrts-\* and
185 verbs-\* builds of Charm++.*
187 To replay a previously recorded session:
191 $ charmdebug pgm +p4 opt1 opt2 +replay
193 Charm Debugging Related Options
194 -------------------------------
When the Charm debugger launches your application, it automatically
sets these to defaults appropriate for most situations.
200 Triggers application freeze at startup for debugger.
203 Triggers charmrun to provide some information about the executable,
204 as well as provide an interface to gdb for querying.
207 Which debuggers to use.
Run each node under gdb in an xterm window, prompting the user to
begin execution.
214 Run each node under gdb in an xterm window immediately (i.e. without
215 prompting the user to begin execution).
*Note:* If you’re using the charm debugger it will probably be best
to control the sequential (i.e. gdb) debuggers from within its GUI.
225 Port to listen for CCS requests
228 Enable client-server (CCS) mode
231 Use the recordreplay tracemode to record the exact event/message
232 sequence for later use.
Force the use of the recorded log of events/messages to exactly reproduce
a previous run.
The preceding pair of options, ``+record`` and ``+replay``, is used to
produce the “instant replay” feature. This feature is valuable for
catching errors which only occur sporadically. Such bugs, which arise
from the nondeterminacy of parallel execution, can be fiendishly
difficult to replicate in a debugging environment. Typical usage is
to keep running the application with ``+record`` until the bug occurs,
and then run the application under the debugger with the ``+replay`` option.
246 CharmDebug limitations
247 ----------------------
252 CharmDebug is currently limited to applications started directly by the
253 debugger due to implementation peculiarities. It will be extended to
254 support connection to remote running applications in the near future.
256 Due to the current implementation, the debugging tool is limited to
257 netlrts-\* and verbs-\* versions. Other builds of Charm++ might have
258 unexpected behavior. In the near future this will be extended at least
259 to the mpi-\* versions.
266 The *Record Replay* feature does not work well with spontaneous
267 events. Load balancing is the most common form of spontaneous event in
268 that it occurs periodically with no other causal event.
270 .. figure:: figs/snapshot3.png
275 Parallel debugger when a break point is reached
277 As per Rashmi’s thesis:
279 "There are some unique issues for replay in the
280 context of Charm because it provides high-level support for dynamic load
281 balancing, quiescence detection and information sharing. Many of the
282 load balancing strategies in Charm have a spontaneous component. The
283 strategy periodically checks the sizes of the queues on the local
284 processor. A replay load balancing strategy implements the known load
285 redistribution. The behavior of the old balancing strategy is therefore
286 not replayed only its effect is. Since minimal tracing is used by the
287 replay mechanism the amount of perturbation due to tracing is reduced.
288 The replay mechanism is proposed as a debugging support to replay
289 asynchronous message arrival orders."
291 Moreover, if your application crashes without a clean shutdown, the log
292 may be lost with the application.
299 Once the debugger’s GUI loads, the programmer triggers the program
300 execution by clicking the *Start* button. When starting by command line,
301 the application is automatically started. The program begins by
302 displaying the user and system entry points as a list of check boxes,
pausing at the onset. The user can choose to set breakpoints by
clicking on the corresponding entry points and kick off execution by
clicking the *Continue* button. Figure :numref:`snapshot3` shows a
306 snapshot of the debugger when a breakpoint is reached. The program
307 freezes when a breakpoint is reached.
Clicking the *Freeze* button during the execution of the program freezes
execution, while the *Continue* button resumes execution. The *Quit* button can
be used to abort execution at any time. Entities (for instance,
312 array elements) and their contents on any processor can be viewed at any
313 point in time during execution as illustrated in Figure
314 :numref:`arrayelement`.
316 .. figure:: figs/arrayelement.png
321 Freezing program execution and viewing the contents of an array
322 element using the Parallel Debugger
324 Specific individual processes of the Charm++ program can be attached to
325 instances of *gdb* as shown in Figure :numref:`gdb`. The programmer
326 chooses which PEs to connect *gdb* processes to via the checkboxes on
327 the right side. *Note!* While the program is suspended in gdb for step
debugging, high-level CharmDebug features such as object inspection will not
work.
331 .. figure:: figs/snapshot4-crop.png
335 Parallel debugger showing instances of *gdb* open for the selected
338 Charm++ objects can be examined via the *View Entities on PE : Display*
339 selector. It allows the user to choose from *Charm Objects, Array
340 Elements, Messages in Queue, Readonly Variables, Readonly Messages,
341 Entry Points, Chare Types, Message Types and Mainchares*. The right
side selector sets the PE upon which the request for display will be
343 made. The user may then click on the *Entity* to see the details.
350 The menu option Action \ :math:`\rightarrow` Memory allows the user to
351 display the entire memory layout of a specific processor. An example is
352 shown in Figure :numref:`fig:memory`. This layout is colored and the
353 colors have the following meaning:
355 .. figure:: figs/memoryView.png
361 memory allocated by the Charm++ Runtime System;
364 memory allocated directly by the user in its code;
367 memory used by messages;
370 memory allocated to a chare element;
373 memory not allocated;
a big jump in memory addresses due to the memory pooling system; it
represents a large portion of virtual address space left unused between two
different zones of used virtual address space;
381 the currently selected memory slot;
383 Currently it is not possible to change this color association. The
384 bottom part of the view shows the stack trace at the moment when the
385 highlighted (yellow) memory slot was allocated. By left clicking on a
386 particular slot, this slot is fixed in highlight mode. This allows a
387 more accurate inspection of its stack trace when this is large and does
390 Info \ :math:`\rightarrow`\ Show Statistics will display a small
391 information box like the one in Figure :numref:`fig:memory-stat`.
393 .. figure:: figs/memoryStatistics.png
394 :name: fig:memory-stat
Information box displaying memory statistics
398 A useful tool of this view is the memory leak search. This is located in
399 the menu Action \ :math:`\rightarrow` Search Leaks. The processor under
400 inspection runs a reachability test on every memory slot allocated to
find if there is a pointer to it. If there is none, the slot is
partially colored in green to indicate that it is a leak. The user can
then inspect these slots further. Figure :numref:`fig:memory-leak` shows
404 some leaks being detected.
406 .. figure:: figs/memoryLeaking.png
407 :name: fig:memory-leak
409 Memory view after running the Search Leaks tool
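A reachability test of the kind described above can be sketched as a mark pass over the allocated slots. The following is an illustrative simulation under stated assumptions, not CharmDebug's implementation: the ``Slot`` structure, the ``findSlot`` and ``findLeaks`` functions, and the notion of an explicit ``roots`` list (standing in for pointers held outside the heap, for example on the stack or in globals) are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one allocated memory slot.
struct Slot {
    std::uintptr_t begin;               // first address of the slot
    std::uintptr_t size;                // size in bytes
    std::vector<std::uintptr_t> words;  // pointer-sized values stored in it
};

// Return the index of the slot containing addr, or -1 if there is none.
int findSlot(const std::vector<Slot>& slots, std::uintptr_t addr) {
    for (std::size_t i = 0; i < slots.size(); ++i) {
        if (addr >= slots[i].begin && addr - slots[i].begin < slots[i].size)
            return static_cast<int>(i);
    }
    return -1;
}

// Mark every slot reachable from the roots; unmarked slots are leaks.
std::vector<bool> findLeaks(const std::vector<Slot>& slots,
                            const std::vector<std::uintptr_t>& roots) {
    for (const Slot& s : slots) assert(s.size > 0);  // empty slots not modelled

    std::vector<bool> reachable(slots.size(), false);
    std::vector<int> work;  // slots marked but not yet scanned

    auto mark = [&](std::uintptr_t value) {
        int s = findSlot(slots, value);
        if (s >= 0 && !reachable[s]) {
            reachable[s] = true;
            work.push_back(s);
        }
    };

    for (std::uintptr_t r : roots) mark(r);
    while (!work.empty()) {
        int cur = work.back();
        work.pop_back();
        for (std::uintptr_t w : slots[cur].words) mark(w);
    }

    std::vector<bool> leaked(slots.size());
    for (std::size_t i = 0; i < slots.size(); ++i) leaked[i] = !reachable[i];
    return leaked;
}
```

In this sketch a slot that only points to itself, or that is only pointed to by other unreachable slots, is still reported as leaked, because marking starts from the roots rather than from any pointer found in memory.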
411 If the memory window is kept open while the application is unfrozen and
412 makes progress, the loaded image will become obsolete. To cope with
413 this, the “Update” button will refresh the view to the current
allocation status. All the leaks that had already been found
will still be partially colored in green, while the newly allocated
slots will not, even if leaking. To update the leak status, re-run the
Search Leaks tool.
420 Action \ :math:`\rightarrow` Inspect opens a new window displaying the
421 content of the memory in that slot, as interpreted by the debugger (see
422 next subsection for more details on this).
429 Without any code rewriting of the application, CharmDebug is capable of
430 loading a raw area of memory and parsing it with a given type name. The
result (as shown in Figure :numref:`fig:inspect`) is a browsable tree.
432 The initial type of a memory area is given by its virtual table pointer
433 (Charm++ objects are virtual and therefore loadable). In the case of
434 memory slots not containing classes with virtual methods, no display
437 .. figure:: figs/memoryInspector.png
440 Raw memory parsed and displayed as a tree
442 When the view is open and is displaying a type, by right clicking on a
443 leaf containing a pointer to another memory location, a popup menu will
444 allow the user to ask for its dereference (shown in
445 Figure :numref:`fig:inspect`). In this case, CharmDebug will load this
446 raw data as well and parse it with the given type name of the pointer.
447 This dereference will be inlined and the leaf will become an internal
448 node of the browse tree.
451 Debugger Implementation Details
452 ===============================
454 The following classes in the PUP framework were used in implementing
debugging support in Charm++.
457 - ``class PUP::er`` - This class is the abstract superclass of all the
458 other classes in the framework. The ``pup`` method of a particular
459 class takes a reference to a ``PUP::er`` as parameter. This class has
460 methods for dealing with all the basic C++ data types. All these
461 methods are expressed in terms of a generic pure virtual method.
462 Subclasses only need to provide the generic method.
464 - ``class PUP::toText`` - This is a subclass of the ``PUP::toTextUtil``
465 class which is a subclass of the ``PUP::er`` class. It copies the
466 data of an object to a C string, including the terminating NULL.
468 - ``class PUP::sizerText`` - This is a subclass of the
469 ``PUP::toTextUtil`` class which is a subclass of the ``PUP::er``
470 class. It returns the number of characters including the terminating
471 NULL and is used by the ``PUP::toText`` object to allocate space for
472 building the C string.
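The relationship between these three classes can be illustrated with a standalone mock. The sketch below does not include or reproduce the real PUP headers: ``MockEr``, ``MockSizerText``, ``MockToText``, ``pupInt``, ``emit``, ``Point`` and ``toString`` are hypothetical names chosen for this example. It only shows the pattern described above, in which typed methods funnel into one generic pure virtual method, a sizing pass counts the characters, and a second pass writes them into a buffer of exactly that size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Mock of the abstract superclass: typed methods forward to one generic method.
class MockEr {
public:
    virtual ~MockEr() = default;
    void pupInt(const char* name, int v) {
        emit(std::string(name) + "=" + std::to_string(v) + ";");
    }
protected:
    virtual void emit(const std::string& text) = 0;  // the single generic method
};

// Mock sizer: counts characters, including the terminating null character.
class MockSizerText : public MockEr {
    std::size_t chars_ = 1;  // starts at 1 for the terminator
public:
    std::size_t size() const { return chars_; }
protected:
    void emit(const std::string& text) override { chars_ += text.size(); }
};

// Mock writer: appends text to a caller-provided C string buffer.
class MockToText : public MockEr {
    char* buf_;
    std::size_t cap_;
    std::size_t used_ = 0;
public:
    MockToText(char* buf, std::size_t cap) : buf_(buf), cap_(cap) {
        assert(cap_ > 0);
        buf_[0] = '\0';
    }
protected:
    void emit(const std::string& text) override {
        assert(used_ + text.size() + 1 <= cap_);  // buffer was sized beforehand
        std::memcpy(buf_ + used_, text.c_str(), text.size() + 1);
        used_ += text.size();
    }
};

// A user class with a single pup method that serves both passes.
struct Point {
    int x;
    int y;
    void pup(MockEr& p) { p.pupInt("x", x); p.pupInt("y", y); }
};

// Size first, then write into a buffer of exactly the reported size.
std::string toString(Point& pt) {
    MockSizerText sizer;
    pt.pup(sizer);
    std::vector<char> buf(sizer.size());
    MockToText writer(buf.data(), buf.size());
    pt.pup(writer);
    return std::string(buf.data());
}
```

The design point is that ``Point::pup`` is written once and is reused unchanged for both the sizing pass and the writing pass.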
474 The code below shows a simple class declaration that includes a ``pup``
487 void pup(PUP::er &p) {
494 Converse Client-Server Interface
495 --------------------------------
497 The Converse Client-Server (CCS) module enables Converse
499 programs to act as parallel servers,
500 responding to requests from non-Converse programs. The CCS module is
501 split into two parts -- client and server. The server side is used by
502 Converse programs while the client side is used by arbitrary non-Converse
503 programs. A CCS client accesses a running Converse program by talking to
504 a ``server-host`` which receives the CCS requests and relays them to the
505 appropriate processor. The ``server-host`` is ``charmrun``
507 for netlrts- versions and is the first
508 processor for all other versions.
510 In the case of the netlrts- version of Charm++, a Converse program is
511 started as a server by running the Charm++ program using the additional
runtime option ``++server``. This opens the CCS server on any TCP port
number. The TCP port number can be specified using the command-line
option ``++server-port``. A CCS client connects to a CCS server, asks a
515 server PE to execute a pre-registered handler and receives the response
516 data. The function ``CcsConnect`` takes a pointer to a ``CcsServer`` as
517 an argument and connects to the given CCS server. The functions
518 ``CcsNumNodes``, ``CcsNumPes``, and ``CcsNodeSize`` implemented as part of
519 the client interface in Charm++ return information about the parallel
520 machine. The function ``CcsSendRequest`` takes a handler ID and the
521 destination processor number as arguments and asks the server to execute
522 the requested handler on the specified processor. ``CcsRecvResponse``
receives a response to the previous request in-place. A timeout is also
specified, which gives the number of seconds to wait; if it expires, the function
returns 0, otherwise the number of bytes received is returned.
527 Once a request arrives on a CCS server socket, the CCS server runtime
528 looks up the appropriate registered handler and calls it. If no handler
529 is found the runtime prints a diagnostic and ignores the message. If the
530 CCS module is disabled in the core, all CCS routines become macros
531 returning 0. The function ``CcsRegisterHandler`` is used to register
532 handlers in the CCS server. A handler ID string and a function pointer
533 are passed as parameters. A table of strings corresponding to
534 appropriate function pointers is created. Various built-in functions are
535 provided which can be called from within a CCS handler. The debugger
536 behaves as a CCS client invoking appropriate handlers which make use of
537 some of these functions. Some of the built-in functions are as follows.
539 - ``CcsSendReply`` - This function sends the data provided as an
540 argument back to the client as a reply. This function can only be
541 called from a CCS handler invoked remotely.
543 - ``CcsDelayReply`` - This call is made to allow a CCS reply to be
544 delayed until after the handler has completed.
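The handler table described above, which maps handler ID strings to function pointers and is consulted when a request arrives, can be sketched in isolation as follows. This is a mock for illustration only and does not call the real CCS API: ``HandlerTable``, ``registerHandler``, ``dispatch`` and ``echoHandler`` are hypothetical names, and the reply is returned as a string instead of being sent back over a socket.

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical handler signature: takes the request body, returns the reply.
using Handler = std::string (*)(const std::string& request);

// Mock of a server-side table from handler ID strings to function pointers.
class HandlerTable {
    std::map<std::string, Handler> table_;
public:
    // Register (or replace) the handler stored under the given ID string.
    void registerHandler(const std::string& id, Handler fn) {
        assert(fn != nullptr);
        table_[id] = fn;
    }

    // Look up and call the handler for an incoming request. If none is
    // registered, print a diagnostic, ignore the request and return false.
    bool dispatch(const std::string& id, const std::string& request,
                  std::string& reply) const {
        auto it = table_.find(id);
        if (it == table_.end()) {
            std::fprintf(stderr, "no handler registered for '%s'; ignored\n",
                         id.c_str());
            return false;
        }
        reply = it->second(request);
        return true;
    }
};

// Example handler used for the demonstration.
std::string echoHandler(const std::string& request) {
    return "echo:" + request;
}
```

After ``registerHandler("echo", echoHandler)``, a call to ``dispatch("echo", "hi", reply)`` stores ``echo:hi`` in ``reply``, while a request for an unregistered ID leaves ``reply`` untouched and only prints the diagnostic.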
546 The CCS runtime system provides several built-in CCS handlers, which are
547 available to any Converse program. All Charm++ programs are essentially
548 Converse programs. ``ccs_getinfo`` takes an empty message and responds
549 with information about the parallel job. Similarly the handler
550 ``ccs_killport`` allows a client to be notified when a parallel run