\projections{} is a performance analysis and visualization framework that
helps you understand and investigate performance-related problems in
\charmpp{} applications. Its event tracing component lets you control the
amount of information generated, and the tracing has low perturbation
on the application. It also has a Java-based visualization and analysis component with
various views that present the performance information in a
visually useful manner.

Performance analysis with \projections{} typically involves two simple
steps:

\begin{enumerate}
\item Prepare your application by linking with the appropriate trace
generation modules and execute it to generate trace data.
\item Use the Java-based tool to visually study various aspects of the
performance and locate the performance issues for that application execution.
\end{enumerate}

The \charmpp{} runtime automatically records pertinent performance
data for performance-related events during execution. These
events include the start and end of entry method execution, message
sends from entry methods, and scheduler idle time. This means {\em most}
users do not need to manually insert code into their
applications in order to generate trace data. In scenarios where
special performance information not captured by the runtime is
required, an API (see section \ref{sec::api}) is available for
user-specific events with some support for visualization by the
Java-based tool. If greater control over tracing activities
(e.g. dynamically turning instrumentation on and off) is desired, the
API also allows users to insert code into their applications for such
purposes.

The automatic recording of events by the \projections{} framework
introduces the overhead of an if-statement for each runtime event,
even if no performance analysis traces are desired. Developers of
\charmpp{} applications who consider such an overhead to be
unacceptable (e.g. for a production application which requires the
absolute best performance) may recompile the \charmpp{} runtime with
the \verb|--with-production| flag, which removes the instrumentation
stubs. To enable the instrumentation stubs while retaining the other
optimizations enabled by \verb|--with-production|, one may
compile \charmpp{} with both \verb|--with-production|
and \verb|--enable-tracing|, which explicitly enables Projections tracing.

To enable performance tracing of your application, you simply need
to link the appropriate trace data generation module(s), also referred
to as {\em tracemode(s)} (see section \ref{sec::trace modules}).

\section{Enabling Performance Tracing at Link/Run Time}
\label{sec::trace modules}

\projections{} tracing modules dictate the type of performance data,
data detail and data format each processor will record. They are also
referred to as ``tracemodes''. There are currently 2 tracemodes
available. Zero or more tracemodes may be specified at link-time. When
no tracemodes are specified, no trace data is generated.

\subsection{Tracemode {\tt projections}}

Link time option: {\tt -tracemode projections}

This tracemode generates files that contain
information about all \charmpp{} events like entry method calls and
message packing during the execution of the program. The data will be
used by \projections{} in visualization and analysis.

This tracemode creates a single symbol table file and $p$ ASCII
log files for $p$ processors. The names of the log files will be
NAME.\#.log where NAME is the name of your executable and \# is the
processor \#. The name of the symbol table file is NAME.sts where NAME
is the name of your executable.

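The naming scheme above can be sketched as a small helper that lists the files one would expect to find after a run. This is only an illustration: {\tt expectedTraceFiles} is a hypothetical function, not part of \charmpp{}, and the sketch assumes that processors are numbered from 0 and that no compression or subdirectory options are in use.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of Charm++): lists the files the
// projections tracemode is described as writing for `numPes` processors.
// Assumes PEs are numbered 0..numPes-1 and default (uncompressed) output.
std::vector<std::string> expectedTraceFiles(const std::string& exe, int numPes) {
    assert(numPes >= 0);
    std::vector<std::string> files;
    files.push_back(exe + ".sts");  // the single symbol table file
    for (int pe = 0; pe < numPes; ++pe) {
        // one log file per processor: NAME.#.log
        files.push_back(exe + "." + std::to_string(pe) + ".log");
    }
    return files;
}
```

For an executable called {\tt jacobi} run on two processors, the helper yields {\tt jacobi.sts}, {\tt jacobi.0.log} and {\tt jacobi.1.log}.
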
This is the main source of data needed by the performance
visualizer. Certain tools like timeline will not work without the
detail data from this tracemode.

The following is a list of runtime options available under this tracemode:

\begin{itemize}
\item {\tt +logsize NUM}: keep only NUM log entries in the memory of each
processor. The logs are emptied and flushed to disk when filled.
\item {\tt +binary-trace}: generate projections log in binary form.
\item {\tt +gz-trace}: generate gzip (if available) compressed log files.
\item {\tt +gz-no-trace}: generate regular (not compressed) log files.
\item {\tt +checknested}: a debug option. Checks whether events are improperly nested
while being recorded and issues a warning immediately.
\item {\tt +trace-subdirs NUM}: divide the generated log files among
{\tt NUM} subdirectories of the trace root.
\end{itemize}

\subsection{Tracemode {\tt summary}}

Link time option: {\tt -tracemode summary}

In this tracemode, execution time across all entry points for each
processor is partitioned into a fixed number of equally sized
time-interval bins. These bins are globally resized whenever they are
all filled in order to accommodate longer execution times while keeping
the amount of space used constant.

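The idea of a fixed number of bins that are resized when they fill up can be illustrated with a short sketch. It is not the \charmpp{} implementation: the class {\tt SummaryBins} and its members are hypothetical, and the choice to merge neighbouring bins pairwise and double the bin width is an assumption made here for concreteness.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative sketch only; not the Charm++ implementation.
class SummaryBins {
public:
    SummaryBins(std::size_t binCount, double binSize)
        : busy_(binCount, 0.0), binSize_(binSize) {
        assert(binCount > 0 && binSize > 0.0);
    }

    // Credit `busy` seconds to the bin that contains time `when` (when >= 0).
    void record(double when, double busy) {
        assert(when >= 0.0);
        // Widen the bins until `when` fits; the number of bins never changes.
        while (static_cast<std::size_t>(when / binSize_) >= busy_.size()) {
            compact();
        }
        busy_[static_cast<std::size_t>(when / binSize_)] += busy;
    }

    double binSize() const { return binSize_; }
    const std::vector<double>& bins() const { return busy_; }

private:
    // Merge neighbouring bins pairwise and double the bin width.
    // Doubling is an assumption of this sketch.
    void compact() {
        std::vector<double> merged(busy_.size(), 0.0);
        for (std::size_t i = 0; i < busy_.size(); ++i) {
            merged[i / 2] += busy_[i];
        }
        busy_.swap(merged);
        binSize_ *= 2.0;
    }

    std::vector<double> busy_;
    double binSize_;
};
```

With four bins of one second each, recording an event at time 5.0 forces one compaction: the bins then cover two seconds each, and the data recorded earlier is preserved in the first half of the array. Memory use stays constant no matter how long the run lasts, at the cost of coarser time resolution.
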
Additional data like the total number of calls made to each entry
point is summarized within each processor.

This tracemode will generate a single symbol table file and $p$ ASCII
summary files for $p$ processors. The names of the summary files will
be NAME.\#.sum where NAME is the name of your executable and \# is the
processor \#. The name of the symbol table file is NAME.sum.sts where NAME
is the name of your executable.

This tracemode can be used to control the amount of output generated
in a run. It is typically used in scenarios where a quick look at the
overall utilization graph of the application is desired to identify
smaller regions of time for more detailed study. Attempting to
generate the same graph using the detailed logs of the prior tracemode
may be unnecessarily time consuming or impossible.

The following is a list of runtime options available under this tracemode:

\begin{itemize}
\item {\tt +bincount NUM}: use NUM time-interval bins. The bins are resized and compacted when filled.
\item {\tt +binsize TIME}: sets the initial time quantum each bin represents.
\item {\tt +version}: set summary version to generate.
%{\tt +epThreshold}: DOESNT DO ANYTHING YET. LEFT COMMENTED FOR DOC PURPOSES
%{\tt +epInterval}: DOESNT DO ANYTHING YET. LEFT COMMENTED FOR DOC PURPOSES
\item {\tt +sumDetail}: Generates an additional set of files, one per processor,
that store the time spent by each entry method associated with each
time-bin. The names of ``summary detail'' files will be NAME.\#.sumd where
NAME is the name of your executable and \# is the processor \#.
\item {\tt +sumonly}: Generates a single file that stores a single
utilization value per time-bin, averaged across all processors. This
file bears the name NAME.sum where NAME is the name of your
executable.
\end{itemize}

\subsection{General Runtime Options}
\label{sec::general options}

The following is a list of runtime options available with the same
semantics for all tracemodes:

\begin{itemize}
\item {\tt +traceroot DIR}: place all generated files in DIR.
\item {\tt +traceoff}: trace generation is turned off when the application
is started. The user is expected to insert code to turn tracing on at
some point in the run.
\item {\tt +traceWarn}: By default, warning messages from the framework are
not displayed. This option enables warning messages to be printed to
screen. However, on large numbers of processors, they can overwhelm
the terminal I/O system of the machine and result in unacceptable
perturbation of the application.
\item {\tt +traceprocessors RANGE}: Only output logfiles for PEs present in the range (e.g.
{\tt 0-31,32-999966:1000,999967-999999} to record every PE in the first
32, only every thousandth for the middle range, and the last 33 for a million processor run).
\end{itemize}

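To make the meaning of a {\tt RANGE} string concrete, the following sketch expands one into the set of selected PEs. The grammar is inferred from the single example above and is an assumption: comma-separated items of the form {\tt a}, {\tt a-b} or {\tt a-b:stride}, with inclusive bounds. The function {\tt expandPeRange} is hypothetical and is not the parser used by the \charmpp{} runtime.

```cpp
#include <cstddef>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: expands a range string such as "0-3,10-30:10,40"
// into the set of PE numbers it selects. The grammar (inclusive bounds,
// optional ":stride") is inferred from the documentation example.
std::set<long> expandPeRange(const std::string& spec) {
    std::set<long> pes;
    std::stringstream items(spec);
    std::string item;
    while (std::getline(items, item, ',')) {
        if (item.empty()) continue;
        long stride = 1;
        const std::size_t colon = item.find(':');
        if (colon != std::string::npos) {
            stride = std::stol(item.substr(colon + 1));
            item = item.substr(0, colon);
        }
        if (stride < 1) stride = 1;  // guard against a non-advancing loop
        const std::size_t dash = item.find('-');
        const long first = std::stol(item.substr(0, dash));
        const long last = (dash == std::string::npos)
                              ? first
                              : std::stol(item.substr(dash + 1));
        for (long pe = first; pe <= last; pe += stride) {
            pes.insert(pe);
        }
    }
    return pes;
}
```

Under these assumptions, the example string from the option description selects 32 PEs from the first range, 1000 from the strided middle range and 33 from the last range.
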
\subsection{End-of-run Analysis for Data Reduction}
\label{sec::data reduction}

As applications are scaled to thousands or hundreds of thousands of
processors, the amount of data generated becomes extremely large and
potentially unmanageable by the visualization tool. At the time of this
documentation, \projections{} is capable of handling data from 8000+
processors but with somewhat severe tool responsiveness issues. We
have developed an approach to mitigate this data size problem with
options to trim off ``uninteresting'' processors' data by not writing
such data at the end of an application's execution.

This is currently done through heuristics to pick out interesting
extremal (i.e. poorly behaved) processors and at the same time using
k-means clustering to pick out exemplar processors from equivalence
classes to form a representative subset of processor data. The analyst
is advised to also link in the summary module via {\tt -tracemode
summary} and enable the {\tt +sumDetail} option in order to retain
some profile data for processors whose data were dropped.

\begin{itemize}
\item {\tt +extrema}: enables extremal processor identification analysis at
the end of the application's execution.
\item {\tt +numClusters}: determines the number of clusters (equivalence
classes) to be used by the k-means clustering algorithm for
determining exemplar processors. Analysts should take advantage of
their knowledge of natural application decomposition to guess at a
good value for this.
\end{itemize}

This feature is still being developed and refined as part of our
research. We would appreciate it if users of this feature
contacted the developers with input or suggestions.

\section{Controlling Tracing from Within the Program}
\label{sec::api}

\subsection{Selective Tracing}
\label{sec::selective tracing}

\charmpp{} allows users to start and stop tracing at certain
points in time on the local processor. Users are advised to make these
calls on all processors and at well-defined points in the application.

Users may choose to have instrumentation turned off at first (by
command line option {\tt +traceoff} - see section
\ref{sec::general options}) if some period of time in the middle of the
application's execution is of interest to the user.

Alternatively, users may start the application with instrumentation
turned on (default) and turn off tracing for specific sections of the
application.

Again, users are advised to be consistent as the {\tt +traceoff}
runtime option applies to all processors in the application.

\begin{itemize}
\item {\tt void traceBegin()}

Enables the runtime to trace events (including all user events) on the local processor where {\tt traceBegin} is called.

\item {\tt void traceEnd()}

Prevents the runtime from tracing events (including all user events) on the local processor where {\tt traceEnd} is called.
\end{itemize}

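A common pattern is to trace only a window of iterations in a run started with {\tt +traceoff}. The sketch below shows the shape of such code. So that it compiles on its own, {\tt traceBegin} and {\tt traceEnd} are replaced by trivial stand-ins that merely toggle a flag; in a real program the \charmpp{} runtime provides them. The function {\tt runIterations} and its parameters are hypothetical.

```cpp
#include <cassert>

// Stand-ins so the sketch compiles without Charm++. In a real program
// traceBegin() and traceEnd() are provided by the Charm++ runtime.
static bool tracingEnabled = false;  // models a run started with +traceoff
void traceBegin() { tracingEnabled = true; }
void traceEnd() { tracingEnabled = false; }

// Hypothetical driver: trace only iterations firstTraced..lastTraced
// (inclusive) and return how many iterations ran with tracing enabled.
int runIterations(int total, int firstTraced, int lastTraced) {
    assert(firstTraced <= lastTraced);
    int traced = 0;
    for (int it = 0; it < total; ++it) {
        if (it == firstTraced) traceBegin();
        // ... the real work of one iteration would go here ...
        if (tracingEnabled) ++traced;
        if (it == lastTraced) traceEnd();
    }
    return traced;
}
```

Because the calls only affect the local processor, the same window should be used on every processor, for example by deriving it from an iteration counter that all processors share.
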
\subsection{User Events}
\label{sec::user events}

\projections{} has the ability to visualize traceable user-specified
events. User events are usually displayed in the Timeline view as vertical bars above the entry methods. Alternatively, a user event can be displayed as a vertical bar that spans the timelines for all processors. Follow these basic steps to create user events in a \charmpp{} program:

\begin{enumerate}
\item Register an event with an identifying string and either specify or acquire
a globally unique event identifier. All user events that are not registered will be displayed in white.
\item Use the event identifier to specify trace points of interest in your code.
\end{enumerate}

The functions available are as follows:

\begin{itemize}
\item {\tt int traceRegisterUserEvent(char* EventDesc, int EventNum=-1)}

This function registers a user event by associating {\tt EventNum} to
{\tt EventDesc}. If {\tt EventNum} is not specified, a globally unique
event identifier is obtained from the runtime and returned. The string {\tt EventDesc} must either be a constant string, or it can be a dynamically allocated string that is {\bf NOT} freed by the program. If the {\tt EventDesc} contains a substring ``***'' then the Projections Timeline tool will draw the event vertically spanning all PE timelines.

{\tt EventNum} has to be the same on all processors. Therefore use one of the following methods to ensure the same value for any PEs generating the user events:

\begin{enumerate}
\item Call {\tt traceRegisterUserEvent} on PE 0 in main::main without specifying
an event number, and store the returned event number in a readonly variable.
\item Call {\tt traceRegisterUserEvent} and specify the event number on
processor 0. Doing this on other processors would have no
effect. Afterwards, the event number can be used in the following user
event calls.
\end{enumerate}

Eg. {\tt traceRegisterUserEvent("Time Step Begin", 10);}

Eg. {\tt eventID = traceRegisterUserEvent("Time Step Begin");}
\end{itemize}

There are two main types of user events, bracketed and non-bracketed. Non-bracketed user events mark a specific point in time. Bracketed user events span an arbitrary contiguous time range. Additionally, the user can supply a short text string that is recorded with the event in the log file. These strings should not contain newline characters, but they may contain simple html formatting tags such as \texttt{<br>}, \texttt{<b>}, \texttt{<i>}, \texttt{<font color=\#ff00ff>}, etc.

The calls for recording user events are the following:

\begin{itemize}
\item {\tt void traceUserEvent(int EventNum)}

This function creates a user event that marks a specific point in time.

Eg. {\tt traceUserEvent(10);}

\item {\tt void traceUserBracketEvent(int EventNum, double StartTime, double EndTime)}

This function records a user event spanning a time interval from {\tt StartTime} to {\tt EndTime}. Both {\tt StartTime} and {\tt EndTime} should be obtained from a call to {\tt CmiWallTimer()} at the appropriate point in the program.

Eg.
\begin{verbatim}
   traceRegisterUserEvent("Critical Code", 20); // on PE 0
   double critStart = CmiWallTimer();  // start time
   // do the critical code
   traceUserBracketEvent(20, critStart, CmiWallTimer());
\end{verbatim}

\item {\tt void traceUserSuppliedData(int data)}

This function records a user specified data value at the current
time. This data value can be used to color entry method invocations in
Timeline, see \ref{sec::timeline view}.

\item {\tt void traceUserSuppliedNote(char * note)}

This function records a user specified text string at the current time.

\item {\tt void traceUserSuppliedBracketedNote(char *note, int EventNum, double StartTime, double EndTime)}

This function records a user event spanning a time interval from {\tt StartTime} to {\tt EndTime}. Both {\tt StartTime} and {\tt EndTime} should be obtained from a call to {\tt CmiWallTimer()} at the appropriate point in the program.

Additionally, a user supplied text string is recorded, and the {\tt EventNum} is recorded. These events are therefore displayed with colors determined by the {\tt EventNum}, just as those generated with {\tt traceUserBracketEvent} are.
\end{itemize}

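When a bracketed event should cover exactly one scope, a small wrapper can take the start time on construction and record the event on destruction, so that early returns are still bracketed. The class {\tt ScopedUserEvent} below is a hypothetical convenience, not part of \charmpp{}. To keep the sketch compilable on its own, {\tt CmiWallTimer} and {\tt traceUserBracketEvent} are replaced by stand-ins that use a standard clock and store the recorded events in a vector.

```cpp
#include <chrono>
#include <vector>

// Stand-ins so the sketch compiles without Charm++; the real functions
// are provided by the runtime.
struct RecordedBracket { int eventNum; double start; double end; };
static std::vector<RecordedBracket> recordedBrackets;

double CmiWallTimer() {
    using clock = std::chrono::steady_clock;
    static const clock::time_point origin = clock::now();
    return std::chrono::duration<double>(clock::now() - origin).count();
}

void traceUserBracketEvent(int eventNum, double startTime, double endTime) {
    recordedBrackets.push_back({eventNum, startTime, endTime});
}

// Hypothetical wrapper: brackets the lifetime of the object.
class ScopedUserEvent {
public:
    explicit ScopedUserEvent(int eventNum)
        : eventNum_(eventNum), start_(CmiWallTimer()) {}
    ~ScopedUserEvent() {
        traceUserBracketEvent(eventNum_, start_, CmiWallTimer());
    }
    ScopedUserEvent(const ScopedUserEvent&) = delete;
    ScopedUserEvent& operator=(const ScopedUserEvent&) = delete;

private:
    int eventNum_;
    double start_;
};
```

A typical use is to declare {\tt ScopedUserEvent ev(20);} as the first statement of the critical block, with event number 20 registered beforehand as described above.
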
\subsection{User Stats}
\label{sec::user stats}

\charmpp{} allows the user to track the progression of any variable or value throughout the program execution.
These user specified stats can then be plotted in \projections{}, either over time or by processor.
To enable this feature for \charmpp{}, build \charmpp{} with the {\tt --enable-tracing} flag.

Follow these steps to track user stats in a \charmpp{} program:

\begin{enumerate}
\item Register a stat with an identifying string and a globally unique integer identifier.
\item Update the value of the stat at points of interest in the code by calling the update stat functions.
\item Compile the program with the {\tt -tracemode projections} flag.
\end{enumerate}

The functions available are as follows:

\begin{itemize}
\item {\tt int traceRegisterUserStat(const char * StatDesc, int StatNum)}

This function is called once near the beginning of the \charmpp{} program. {\tt StatDesc} is the identifying
string and {\tt StatNum} is the unique integer identifier.

\item {\tt void updateStat(int StatNum, double StatValue)}

This function updates the value of a user stat and can be called many times throughout program execution.
{\tt StatNum} is the integer identifier corresponding to the desired stat. {\tt StatValue} is the updated value of the user stat.

\item {\tt void updateStatPair(int StatNum, double StatValue, double Time)}

This function works similarly to {\tt updateStat()}, but also allows the user to store a user specified time for the update. In
\projections{}, the user can then choose which time scale to use: real time, user specified time, or ordered.
\end{itemize}

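The register-then-update steps can be sketched as follows for a solver that reports its residual after every step. To keep the example self-contained, {\tt traceRegisterUserStat} and {\tt updateStat} are replaced by stand-ins that store the values in memory, and the return value of the stand-in registration function is an assumption. The names {\tt kResidualStat}, {\tt setupStats} and {\tt iterate} are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Stand-ins so the sketch compiles without Charm++; the real functions
// are provided by the runtime when tracing is enabled.
static std::map<int, std::string> statNames;
static std::map<int, std::vector<double>> statValues;

int traceRegisterUserStat(const char* statDesc, int statNum) {
    statNames[statNum] = statDesc;
    return statNum;  // return value of the stand-in is an assumption
}

void updateStat(int statNum, double statValue) {
    statValues[statNum].push_back(statValue);
}

// Hypothetical identifier, chosen by the application and unique per stat.
const int kResidualStat = 1;

// Step 1: register the stat once, near the beginning of the program.
void setupStats() { traceRegisterUserStat("Residual", kResidualStat); }

// Step 2: update the stat at points of interest, here once per step.
double iterate(int steps) {
    assert(steps >= 0);
    double residual = 1.0;
    for (int i = 0; i < steps; ++i) {
        residual *= 0.5;  // stands in for one solver step
        updateStat(kResidualStat, residual);
    }
    return residual;
}
```

In a real application, {\tt updateStatPair} could be used in place of {\tt updateStat} to attach the solver's own notion of time, such as the step number, to each update.
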