% doc/projections/tracing.tex

\projections{} is a performance analysis and visualization framework that
helps you understand and investigate performance-related problems in
\charmpp{} applications. It consists of an event tracing component,
which lets you control the amount of information generated and which
causes little perturbation of the application, and a Java-based
visualization and analysis component with various views that present
the performance information in a visually useful manner.

Performance analysis with \projections{} typically involves two simple
steps:

\begin{enumerate}
\item
Prepare your application by linking it with the appropriate trace
generation modules, and execute it to generate trace data.
\item
Use the Java-based tool to visually study various aspects of the
performance and to locate the performance issues in that application run.
\end{enumerate}

The \charmpp{} runtime automatically records pertinent performance
data for performance-related events during execution. These
events include the start and end of entry method execution, message
sends from entry methods, and scheduler idle time. This means {\em
most} users do not need to manually insert code into their
applications in order to generate trace data. In scenarios where
special performance information not captured by the runtime is
required, an API (see section \ref{sec::api}) is available for
recording user-specific events, with some support for visualization by the
Java-based tool. If greater control over tracing activities
(e.g., dynamically turning instrumentation on and off) is desired, the
API also allows users to insert code into their applications for such
purposes.

The automatic recording of events by the \projections{} framework
introduces the overhead of an if-statement for each runtime event,
even if no performance analysis traces are desired. Developers of
\charmpp{} applications who consider such an overhead to be
unacceptable (e.g., for a production application that requires the
absolute best performance) may recompile the \charmpp{} runtime with
the \verb|--with-production| flag, which removes the instrumentation
stubs. To enable the instrumentation stubs while retaining the other
optimizations enabled by \verb|--with-production|, compile
\charmpp{} with both \verb|--with-production|
and \verb|--enable-tracing|; the latter explicitly enables \projections{} tracing.
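
The per-event cost described above can be pictured with the following minimal C++ sketch. It is not the \charmpp{} runtime's code: the macro {\tt EXAMPLE\_TRACE\_ENABLED}, the flag {\tt tracingActive}, and the functions {\tt recordEvent} and {\tt onRuntimeEvent} are hypothetical names chosen for illustration. With the macro set to 1, every runtime event pays for the branch on {\tt tracingActive} even when tracing is off. With it set to 0, which loosely mirrors a \verb|--with-production| build without \verb|--enable-tracing|, the stub compiles to nothing.

```cpp
// Hypothetical sketch of an instrumentation stub; not Charm++ source code.
// EXAMPLE_TRACE_ENABLED stands in for the effect of the build flags.
#ifndef EXAMPLE_TRACE_ENABLED
#define EXAMPLE_TRACE_ENABLED 1
#endif

// Runtime switch, conceptually flipped by +traceoff, traceBegin(), traceEnd().
bool tracingActive = true;
long eventsRecorded = 0;

// Stand-in for writing a log entry.
void recordEvent(int /*eventKind*/) { ++eventsRecorded; }

// Called by the (hypothetical) scheduler at every runtime event.
void onRuntimeEvent(int eventKind) {
#if EXAMPLE_TRACE_ENABLED
  if (tracingActive) {  // the per-event if-statement
    recordEvent(eventKind);
  }
#else
  (void)eventKind;      // stub compiled out: no branch, no call
#endif
}
```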

To enable performance tracing of your application, simply link the
appropriate trace data generation module(s), also referred to as
{\em tracemode(s)} (see section \ref{sec::trace modules}).

\section{Enabling Performance Tracing at Link/Run Time}
\label{sec::trace modules}

\projections{} tracing modules determine the type of performance data,
the level of detail, and the data format that each processor records. They are also
referred to as ``tracemodes''. There are currently two tracemodes
available. Zero or more tracemodes may be specified at link time. When
no tracemode is specified, no trace data is generated.

\subsection{Tracemode {\tt projections}}

Link time option: {\tt -tracemode projections}

This tracemode generates files that contain
information about all \charmpp{} events, such as entry method calls and
message packing, during the execution of the program. \projections{} uses
this data for visualization and analysis.

This tracemode creates a single symbol table file and $p$ ASCII
log files (by default) for $p$ processors. The log files are named
{\tt NAME.\#.log}, where {\tt NAME} is the name of your executable and {\tt \#} is the
processor number. The symbol table file is named {\tt NAME.sts}.
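
As a concrete illustration of this naming scheme, the following C++ sketch lists the files that a run would produce under this tracemode. The helper names are hypothetical. The sketch also ignores {\tt +traceroot}, {\tt +trace-subdirs}, and the compression and binary options, any of which may change where the files go or what they contain.

```cpp
#include <string>
#include <vector>

// Hypothetical helpers reproducing the documented naming scheme of the
// projections tracemode: NAME.sts plus NAME.#.log for each processor #.
std::string projectionsLogName(const std::string& exe, int pe) {
  return exe + "." + std::to_string(pe) + ".log";
}

std::string projectionsStsName(const std::string& exe) {
  return exe + ".sts";
}

// All files expected for a run on numPes processors (symbol table first).
std::vector<std::string> projectionsOutputs(const std::string& exe, int numPes) {
  std::vector<std::string> files{projectionsStsName(exe)};
  for (int pe = 0; pe < numPes; ++pe) files.push_back(projectionsLogName(exe, pe));
  return files;
}
```

For a hypothetical executable named {\tt jacobi} run on 4 processors, {\tt projectionsOutputs("jacobi", 4)} yields {\tt jacobi.sts} followed by {\tt jacobi.0.log} through {\tt jacobi.3.log}.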

This is the main source of data needed by the performance
visualizer. Certain tools, such as Timeline, will not work without the
detailed data from this tracemode.

The following runtime options are available under this tracemode:

\begin{itemize}
\item
{\tt +logsize NUM}: keep only NUM log entries in the memory of each
processor. The logs are flushed to disk and emptied when full.
\item
{\tt +binary-trace}: generate \projections{} logs in binary form.
\item
{\tt +gz-trace}: generate gzip-compressed log files (if gzip support is available).
\item
{\tt +gz-no-trace}: generate regular (uncompressed) log files.
\item
{\tt +checknested}: a debug option. Checks whether events are improperly nested
while they are recorded and issues a warning immediately.
\item {\tt +trace-subdirs NUM}: divide the generated log files among
  {\tt NUM} subdirectories of the trace root, each named {\tt
    PROGNAME.projdir.K}.
\end{itemize}
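
The effect of {\tt +logsize} can be modeled with a bounded buffer that is flushed when it fills up. The following C++ class is a hypothetical sketch of that behavior, not the runtime's actual log implementation. The {\tt sink} vector stands in for the log file on disk. A smaller {\tt NUM} lowers the memory footprint per processor but causes more frequent flushes, each of which can perturb the application.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a per-processor trace buffer holding at most
// `capacity` entries (the +logsize NUM value); not the runtime's code.
class BoundedLog {
 public:
  BoundedLog(std::size_t capacity, std::vector<std::string>& sink)
      : capacity_(capacity), sink_(sink) {}

  // Buffer an entry; once the buffer is full, write it out and empty it.
  void record(const std::string& entry) {
    buffer_.push_back(entry);
    if (buffer_.size() >= capacity_) flush();
  }

  // Append buffered entries to the sink (standing in for the log file).
  void flush() {
    sink_.insert(sink_.end(), buffer_.begin(), buffer_.end());
    buffer_.clear();
    ++flushes_;
  }

  std::size_t pending() const { return buffer_.size(); }
  int flushes() const { return flushes_; }

 private:
  std::size_t capacity_;
  std::vector<std::string>& sink_;
  std::vector<std::string> buffer_;
  int flushes_ = 0;
};
```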

\subsection{Tracemode {\tt summary}}

Link time option: {\tt -tracemode summary}

In this tracemode, execution time across all entry points for each
processor is partitioned into a fixed number of equally sized
time-interval bins. Whenever all bins are filled, they are globally
resized to accommodate longer execution times while keeping
the amount of space used constant.
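
One way to keep the space constant while covering ever longer runs is to merge adjacent bins pairwise and double the bin width whenever a sample falls past the last bin. The C++ sketch below models that policy. The doubling factor and the class name {\tt SummaryBins} are assumptions made for illustration, and the summary module's actual resizing rule may differ.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of the summary tracemode's fixed-size bin array.
// Assumption: when a sample falls past the last bin, adjacent bins are
// merged pairwise and the bin width doubles, so memory use stays constant.
class SummaryBins {
 public:
  // count must be > 0 and binSize > 0 (cf. +bincount and +binsize).
  SummaryBins(std::size_t count, double binSize)
      : bins_(count, 0.0), binSize_(binSize) {}

  // Add `busy` seconds of entry-method time observed at time t >= 0.
  void add(double t, double busy) {
    while (static_cast<std::size_t>(t / binSize_) >= bins_.size()) compact();
    bins_[static_cast<std::size_t>(t / binSize_)] += busy;
  }

  double binSize() const { return binSize_; }
  const std::vector<double>& bins() const { return bins_; }

 private:
  // Old bins 2j and 2j+1 together cover exactly new bin j.
  void compact() {
    std::vector<double> merged(bins_.size(), 0.0);
    for (std::size_t i = 0; i < bins_.size(); ++i) merged[i / 2] += bins_[i];
    bins_.swap(merged);
    binSize_ *= 2.0;
  }

  std::vector<double> bins_;
  double binSize_;
};
```

Under this assumed policy, a run twice as long uses the same number of bins, each covering twice as much time.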

Additional data, such as the total number of calls made to each entry
point, is summarized within each processor.

This tracemode generates a single symbol table file and $p$ ASCII
summary files for $p$ processors. The summary files are named
{\tt NAME.\#.sum}, where {\tt NAME} is the name of your executable and {\tt \#} is the
processor number. The symbol table file is named {\tt NAME.sum.sts}.

This tracemode can be used to control the amount of output generated
in a run. It is typically used when a quick look at the
overall utilization graph of the application is desired to identify
smaller regions of time for more detailed study. Generating
the same graph from the detailed logs of the {\tt projections} tracemode
may be unnecessarily time-consuming or even impossible.

The following runtime options are available under this tracemode:

\begin{itemize}
\item
{\tt +bincount NUM}: use NUM time-interval bins. The bins are resized and compacted when filled.
\item
{\tt +binsize TIME}: set the initial time quantum each bin represents.
\item
{\tt +version}: set the summary version to generate.
%\item
%{\tt +epThreshold}: DOES NOT DO ANYTHING YET. LEFT COMMENTED FOR DOC PURPOSES
%\item
%{\tt +epInterval}: DOES NOT DO ANYTHING YET. LEFT COMMENTED FOR DOC PURPOSES
\item
{\tt +sumDetail}: generate an additional set of files, one per processor,
that stores the time spent in each entry method for each
time bin. The ``summary detail'' files are named {\tt NAME.\#.sumd}, where
{\tt NAME} is the name of your executable and {\tt \#} is the processor number.
\item
{\tt +sumonly}: generate a single file that stores a single
utilization value per time bin, averaged across all processors. This
file is named {\tt NAME.sum}, where {\tt NAME} is the name of your
executable. This runtime option currently overrides the {\tt
+sumDetail} option.
\end{itemize}
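
The reduction that {\tt +sumonly} is described as performing, one utilization value per time bin averaged across processors, amounts to a per-bin mean. The C++ function below is a hypothetical sketch of that computation over an in-memory table {\tt util[pe][bin]}. It assumes every processor has the same number of bins and is not the summary module's code.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical sketch: average per-processor utilization bin by bin.
// util[pe][bin] holds each processor's utilization for each time bin.
std::vector<double> averageUtilization(const std::vector<std::vector<double>>& util) {
  if (util.empty()) return {};
  const std::size_t nBins = util.front().size();
  std::vector<double> avg(nBins, 0.0);
  for (const auto& pe : util) {
    if (pe.size() != nBins) throw std::invalid_argument("bin counts differ");
    for (std::size_t b = 0; b < nBins; ++b) avg[b] += pe[b];  // sum over PEs
  }
  for (double& v : avg) v /= static_cast<double>(util.size());  // mean
  return avg;
}
```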

\subsection{General Runtime Options}
\label{sec::general options}

The following runtime options are available, with the same
semantics, for all tracemodes:

\begin{itemize}
\item
{\tt +traceroot DIR}: place all generated files in DIR.
\item
{\tt +traceoff}: trace generation is turned off when the application
is started. The user is expected to insert code to turn tracing on at
some point in the run (see section \ref{sec::selective tracing}).
\item
{\tt +traceWarn}: by default, warning messages from the framework are
not displayed. This option enables warning messages to be printed to
the screen. However, on large numbers of processors, they can overwhelm
the terminal I/O system of the machine and result in unacceptable
perturbation of the application.
\item
{\tt +traceprocessors RANGE}: only output log files for PEs present in the given range. For example, in a run on one million processors, {\tt 0-31,32-999966:1000,999967-999999} records every PE among the first 32, only every thousandth PE in the middle range, and every PE among the last 33.
\end{itemize}
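
To make the range syntax concrete, the following C++ sketch parses a specification such as the one above and tests whether a given PE would be traced. It is a hypothetical re-implementation written for illustration. It assumes that a bare number denotes a single PE and that a stride {\tt :S} counts from the lower bound of its range, as the example suggests. The runtime's own parser may accept other forms.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// One item of a processor range list: lo-hi with an optional :stride.
struct PeRange {
  long lo, hi, stride;
  bool contains(long pe) const {
    return pe >= lo && pe <= hi && (pe - lo) % stride == 0;
  }
};

// Parses a list such as "0-31,32-999966:1000,999967-999999".
std::vector<PeRange> parsePeRanges(const std::string& spec) {
  std::vector<PeRange> out;
  std::stringstream ss(spec);
  std::string item;
  while (std::getline(ss, item, ',')) {
    long stride = 1;
    std::size_t colon = item.find(':');
    if (colon != std::string::npos) {
      stride = std::stol(item.substr(colon + 1));
      item = item.substr(0, colon);
    }
    std::size_t dash = item.find('-');
    long lo = std::stol(item.substr(0, dash));  // whole item if no dash
    long hi = (dash == std::string::npos) ? lo : std::stol(item.substr(dash + 1));
    if (stride <= 0 || hi < lo) throw std::invalid_argument("bad range: " + item);
    out.push_back({lo, hi, stride});
  }
  return out;
}

// True if any range in the list selects this PE.
bool peIsTraced(const std::vector<PeRange>& ranges, long pe) {
  for (const auto& r : ranges)
    if (r.contains(pe)) return true;
  return false;
}
```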

\subsection{End-of-run Analysis for Data Reduction}
\label{sec::data reduction}

As applications are scaled to thousands or hundreds of thousands of
processors, the amount of data generated becomes extremely large and
potentially unmanageable by the visualization tool. At the time of
writing, \projections{} is capable of handling data from more than 8000
processors, but with fairly severe tool responsiveness issues. We
have developed an approach to mitigate this data size problem, with
options to trim off ``uninteresting'' processors' data by not writing
such data at the end of an application's execution.

This is currently done through heuristics that pick out interesting
extremal (i.e., poorly behaved) processors, combined with
k-means clustering that picks out exemplar processors from equivalence
classes, to form a representative subset of processor data. The analyst
is advised to also link in the summary module via {\tt -tracemode
summary} and enable the {\tt +sumDetail} option in order to retain
some profile data for processors whose data were dropped.

\begin{itemize}
\item
{\tt +extrema}: enables extremal processor identification analysis at
the end of the application's execution.
\item
{\tt +numClusters}: determines the number of clusters (equivalence
classes) to be used by the k-means clustering algorithm for
determining exemplar processors. Analysts should take advantage of
their knowledge of the application's natural decomposition to guess at a
good value for this.
\end{itemize}
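
The combination of extremal selection and k-means clustering can be illustrated on a single per-PE metric, such as total busy time. The following C++17 function is a simplified, hypothetical stand-in for the end-of-run analysis. It keeps the least and most loaded PEs as extrema, runs a one-dimensional k-means with {\tt numClusters} clusters, and keeps the PE nearest to each centroid as that cluster's exemplar. The runtime's actual heuristics and clustering features are not described here and may differ.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <set>
#include <vector>

// Hypothetical sketch: choose PEs whose data would be kept, given one
// metric per PE. Keeps the extrema plus one exemplar per k-means cluster.
std::set<int> selectRepresentativePes(const std::vector<double>& metric,
                                      int numClusters, int iterations = 20) {
  std::set<int> keep;
  const int n = static_cast<int>(metric.size());
  if (n == 0 || numClusters <= 0) return keep;

  // Extremal PEs: the least and the most loaded.
  auto [mn, mx] = std::minmax_element(metric.begin(), metric.end());
  keep.insert(static_cast<int>(mn - metric.begin()));
  keep.insert(static_cast<int>(mx - metric.begin()));

  // Spread the initial centroids evenly between the minimum and maximum.
  const int k = std::min(numClusters, n);
  std::vector<double> centroid(k);
  for (int c = 0; c < k; ++c) centroid[c] = *mn + (*mx - *mn) * (c + 0.5) / k;

  std::vector<int> label(n, 0);
  for (int it = 0; it < iterations; ++it) {
    // Assignment step: each PE joins its nearest centroid.
    for (int p = 0; p < n; ++p) {
      int best = 0;
      for (int c = 1; c < k; ++c)
        if (std::abs(metric[p] - centroid[c]) < std::abs(metric[p] - centroid[best]))
          best = c;
      label[p] = best;
    }
    // Update step: centroid = mean of its members (empty clusters stay put).
    std::vector<double> sum(k, 0.0);
    std::vector<int> count(k, 0);
    for (int p = 0; p < n; ++p) {
      sum[label[p]] += metric[p];
      ++count[label[p]];
    }
    for (int c = 0; c < k; ++c)
      if (count[c] > 0) centroid[c] = sum[c] / count[c];
  }

  // Exemplar: the member closest to its cluster's centroid.
  for (int c = 0; c < k; ++c) {
    int best = -1;
    double bestDist = std::numeric_limits<double>::infinity();
    for (int p = 0; p < n; ++p) {
      double d = std::abs(metric[p] - centroid[c]);
      if (label[p] == c && d < bestDist) {
        bestDist = d;
        best = p;
      }
    }
    if (best >= 0) keep.insert(best);
  }
  return keep;
}
```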

This feature is still being developed and refined as part of our
research. Users of this feature are encouraged to contact the
developers with input or suggestions.

\section{Controlling Tracing from Within the Program}
\label{sec::api}

\subsection{Selective Tracing}
\label{sec::selective tracing}

\charmpp{} allows users to start and stop tracing the execution at certain
points in time on the local processor. Users are advised to make these
calls on all processors and at well-defined points in the application.

Users may choose to have instrumentation turned off at first (by the
command line option {\tt +traceoff}; see section \ref{sec::general options}) if some period of time in the middle of the
application's execution is of interest.

Alternatively, users may start the application with instrumentation
turned on (the default) and turn off tracing for specific sections of the
application.

Again, users are advised to be consistent, as the {\tt +traceoff}
runtime option applies to all processors in the application.

\begin{itemize}
\item
{\tt void traceBegin()}

Enables the runtime to trace events (including all user events) on the local processor where {\tt traceBegin} is called.

\item
{\tt void traceEnd()}

Prevents the runtime from tracing events (including all user events) on the local processor where {\tt traceEnd} is called.

\end{itemize}
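
A common pattern is to start with {\tt +traceoff} and trace only a window of iterations. The C++ class below is a hypothetical helper, not part of \charmpp{}, that decides when to switch tracing on and off. The callbacks are injected so that the sketch compiles on its own; in a \charmpp{} program they would call {\tt traceBegin()} and {\tt traceEnd()}. As advised above, it should be invoked at the same well-defined point (e.g., the start of each time step) on every processor.

```cpp
#include <functional>
#include <utility>

// Hypothetical helper (not part of Charm++): traces steps [first, last).
// Assumes the run was started with +traceoff; begin/end stand in for
// calls to traceBegin() and traceEnd().
class TraceWindow {
 public:
  TraceWindow(int first, int last, std::function<void()> begin,
              std::function<void()> end)
      : first_(first), last_(last), begin_(std::move(begin)), end_(std::move(end)) {}

  // Call at the start of every step, on every processor.
  void atStepStart(int step) {
    if (step == first_) begin_();  // switch tracing on
    if (step == last_) end_();     // switch tracing off
  }

  // Whether events during `step` fall inside the traced window.
  bool traced(int step) const { return step >= first_ && step < last_; }

 private:
  int first_, last_;
  std::function<void()> begin_, end_;
};

// Possible use in a Charm++ program (not compiled here):
//   TraceWindow window(100, 110, [] { traceBegin(); }, [] { traceEnd(); });
//   window.atStepStart(step);  // at the top of each time step
```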

\subsection{User Events}
\label{sec::user events}

\projections{} has the ability to visualize traceable user-specified
events. User events are usually displayed in the Timeline view as vertical bars above the entry methods. Alternatively, a user event can be displayed as a vertical bar that spans the timelines of all processors. Follow these basic steps to create user events in a \charmpp{} program:

\begin{enumerate}
\item
Register an event with an identifying string and either specify or acquire
a globally unique event identifier. All user events that are not registered will be displayed in white.

\item
Use the event identifier to specify trace points of interest in your code.
\end{enumerate}

The functions available are as follows:

\begin{itemize}
\item
{\tt int traceRegisterUserEvent(char* EventDesc, int EventNum=-1) }

This function registers a user event by associating {\tt EventNum} with
{\tt EventDesc}. If {\tt EventNum} is not specified, a globally unique
event identifier is obtained from the runtime and returned. The string {\tt EventDesc} must either be a constant string or a dynamically allocated string that is {\bf NOT} freed by the program. If {\tt EventDesc} contains the substring ``***'', the \projections{} Timeline tool will draw the event vertically spanning all PE timelines.

{\tt EventNum} has to be the same on all processors. Therefore, use one of the following methods to ensure the same value on all PEs generating the user events:

\begin{enumerate}
\item
Call {\tt traceRegisterUserEvent} on PE 0 in {\tt main::main} without specifying
an event number, and store the returned event number in a readonly variable.
\item
Call {\tt traceRegisterUserEvent} and specify the event number on
processor 0. Doing this on other processors would have no
effect. Afterwards, the event number can be used in subsequent user
event calls.
\end{enumerate}

E.g., {\tt traceRegisterUserEvent("Time Step Begin", 10);}

E.g., {\tt eventID = traceRegisterUserEvent("Time Step Begin");}

\end{itemize}
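
The registration rules above, namely an explicit or runtime-assigned identifier and the ``***'' convention for events drawn across all PE timelines, can be summarized in a small model. The following C++ class is purely illustrative. {\tt UserEventRegistry} and its methods are hypothetical, it runs in a single process, and it does not reproduce how the \charmpp{} runtime actually allocates identifiers or keeps them consistent across processors.

```cpp
#include <map>
#include <string>

// Hypothetical single-process model of user event registration.
// Not the Charm++ implementation; identifiers here start at 0.
class UserEventRegistry {
 public:
  // Mirrors the documented contract: eventNum == -1 requests a fresh id.
  int registerEvent(const std::string& desc, int eventNum = -1) {
    if (eventNum == -1) {
      while (events_.count(nextFree_) != 0) ++nextFree_;  // skip ids in use
      eventNum = nextFree_++;
    }
    events_[eventNum] = desc;
    return eventNum;
  }

  bool isRegistered(int eventNum) const { return events_.count(eventNum) != 0; }

  // Events whose description contains "***" span all PE timelines.
  bool spansAllPes(int eventNum) const {
    auto it = events_.find(eventNum);
    return it != events_.end() && it->second.find("***") != std::string::npos;
  }

 private:
  std::map<int, std::string> events_;
  int nextFree_ = 0;
};
```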

There are two main types of user events: bracketed and non-bracketed. Non-bracketed user events mark a specific point in time. Bracketed user events span an arbitrary contiguous time range. Additionally, the user can supply a short text string that is recorded with the event in the log file. These strings should not contain newline characters, but they may contain simple HTML formatting tags such as \texttt{<br>}, \texttt{<b>}, \texttt{<i>}, \texttt{<font color=\#ff00ff>}, etc.
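
Because newlines are not allowed in these strings, it can help to normalize a note before recording it. The following C++ function is a hypothetical helper, not part of the \charmpp{} API. It replaces each newline with the \texttt{<br>} tag mentioned above, drops carriage returns, and leaves all other characters, including HTML tags, untouched.

```cpp
#include <string>

// Hypothetical helper: make a note safe for a user-supplied string by
// turning line breaks into <br> tags and removing carriage returns.
std::string sanitizeNote(const std::string& note) {
  std::string out;
  out.reserve(note.size());
  for (char ch : note) {
    if (ch == '\n') {
      out += "<br>";   // newline -> HTML line break
    } else if (ch != '\r') {
      out += ch;       // keep everything except '\r'
    }
  }
  return out;
}
```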

The calls for recording user events are the following:

\begin{itemize}

\item
{\tt void traceUserEvent(int EventNum) }

This function creates a user event that marks a specific point in time.

E.g., {\tt traceUserEvent(10);}

\item
{\tt void traceUserBracketEvent(int EventNum, double StartTime, double EndTime) }

This function records a user event spanning the time interval from {\tt StartTime} to {\tt EndTime}. Both {\tt StartTime} and {\tt EndTime} should be obtained from a call to {\tt CmiWallTimer()} at the appropriate point in the program.

E.g.,
\begin{verbatim}
   traceRegisterUserEvent("Critical Code", 20); // on PE 0
   double critStart = CmiWallTimer();  // start time
   // ... the critical code being measured ...
   traceUserBracketEvent(20, critStart, CmiWallTimer());  // end time taken here
\end{verbatim}

\item
{\tt void traceUserSuppliedData(int data) }

This function records a user-specified data value at the current
time. This data value can be used to color entry method invocations in
Timeline; see section \ref{sec::timeline view}.

\item
{\tt void traceUserSuppliedNote(char * note) }

This function records a user-specified text string at the current time.

\item
{\tt void traceUserSuppliedBracketedNote(char *note, int EventNum, double StartTime, double EndTime)}

This function records a user event spanning the time interval from {\tt StartTime} to {\tt EndTime}. Both {\tt StartTime} and {\tt EndTime} should be obtained from a call to {\tt CmiWallTimer()} at the appropriate point in the program.

Additionally, a user-supplied text string and the {\tt EventNum} are recorded. These events are therefore displayed with colors determined by the {\tt EventNum}, just as those generated with {\tt traceUserBracketEvent} are.

\end{itemize}
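
Because the start and end times of a bracketed event must be captured at matching points, a scope-based guard can keep them paired even when a block has several exits. The C++ template below is a hypothetical sketch, not a \charmpp{} facility. The clock and the recording function are injected so that it compiles without \charmpp{}; in a \charmpp{} program they would wrap {\tt CmiWallTimer()} and {\tt traceUserBracketEvent()}, as the trailing comment shows. It relies on C++17 class template argument deduction.

```cpp
#include <utility>

// Hypothetical RAII helper: records one bracketed event covering the
// lifetime of a scope. Clock returns the current time as a double;
// Record receives (eventNum, startTime, endTime).
template <typename Clock, typename Record>
class ScopedBracketEvent {
 public:
  ScopedBracketEvent(int eventNum, Clock clock, Record record)
      : eventNum_(eventNum),
        clock_(std::move(clock)),
        record_(std::move(record)),
        start_(clock_()) {}  // start time taken on construction

  ~ScopedBracketEvent() { record_(eventNum_, start_, clock_()); }  // end time

  ScopedBracketEvent(const ScopedBracketEvent&) = delete;
  ScopedBracketEvent& operator=(const ScopedBracketEvent&) = delete;

 private:
  int eventNum_;
  Clock clock_;
  Record record_;
  double start_;
};

// Possible use in a Charm++ entry method (not compiled here):
//   ScopedBracketEvent guard(20, [] { return CmiWallTimer(); },
//       [](int ev, double s, double e) { traceUserBracketEvent(ev, s, e); });
//   ... the critical code being measured ...
```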

\subsection{User Stats}
\label{sec::user stats}

\charmpp{} allows the user to track the progression of any variable or value throughout the program execution.
These user-specified stats can then be plotted in \projections{}, either over time or by processor.
To enable this feature, build \charmpp{} with the \verb|--enable-tracing| flag.

Follow these steps to track user stats in a \charmpp{} program:

\begin{enumerate}
\item
Register a stat with an identifying string and a globally unique integer identifier.

\item
Update the value of the stat at points of interest in the code by calling the update stat functions.

\item
Compile and link the program with the {\tt -tracemode projections} flag.
\end{enumerate}

The functions available are as follows:

\begin{itemize}
\item
{\tt int traceRegisterUserStat(const char * StatDesc, int StatNum) }

This function is called once near the beginning of the \charmpp{} program. {\tt StatDesc} is the identifying
string and {\tt StatNum} is the unique integer identifier.

\item
{\tt void updateStat(int StatNum, double StatValue)}

This function updates the value of a user stat and can be called many times throughout program execution.
{\tt StatNum} is the integer identifier corresponding to the desired stat. {\tt StatValue} is the updated value of the user stat.

\item
{\tt void updateStatPair(int StatNum, double StatValue, double Time)}

This function works like {\tt updateStat()}, but also allows the user to store a user-specified time for the update. In
\projections{}, the user can then choose which time scale to use: real time, user-specified time, or ordered.

\end{itemize}
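
The three time scales can be pictured as three ways of choosing the x-coordinate of each recorded update. The following C++ sketch is a hypothetical model of that choice, not \projections{} code. The names {\tt StatSample}, {\tt StatAxis}, and {\tt xCoordinates} are illustrative, and ``ordered'' is interpreted here as the position of an update in the sequence of updates.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of the x-axis choices for plotting a user stat.
enum class StatAxis { RealTime, UserTime, Ordered };

struct StatSample {
  double value;     // the StatValue passed to the update call
  double wallTime;  // when the update happened
  double userTime;  // the Time argument of updateStatPair()
};

// Returns one x-coordinate per sample for the chosen time scale.
std::vector<double> xCoordinates(const std::vector<StatSample>& samples, StatAxis axis) {
  std::vector<double> xs;
  xs.reserve(samples.size());
  for (std::size_t i = 0; i < samples.size(); ++i) {
    switch (axis) {
      case StatAxis::RealTime: xs.push_back(samples[i].wallTime); break;
      case StatAxis::UserTime: xs.push_back(samples[i].userTime); break;
      case StatAxis::Ordered:  xs.push_back(static_cast<double>(i)); break;
    }
  }
  return xs;
}
```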