mono/profiler/log-profiler.txt

   1 # The Mono log profiler
   2
   3 The Mono *log* profiler can be used to collect a lot of information about
   4 a program running in the Mono runtime. This data can be used (both while the process
   5 is running and later) to do analyses of the program behaviour, determine
   6 resource usage, performance issues or even look for particular execution patterns.
   7
   8 This is accomplished by logging the events provided by the Mono runtime through the
   9 profiling interface and periodically writing them to a file which can be later
  10 inspected with the command line *mprof-report* program or with a GUI (not developed yet).
  11
  12 The events collected include (among others):
  13
  14 * method enter and leave
  15 * object allocation
  16 * garbage collection
  17 * JIT compilation
  18 * metadata loading
  19 * lock contention
  20 * exceptions
  21
  22 In addition, the profiler can periodically collect info about all the objects
  23 present in the heap at the end of a garbage collection (this is called heap shot
  24 and currently implemented only for the sgen garbage collector).
  25 Another available profiler mode is the *sampling* or *statistical* mode:
  26 periodically the program is sampled and the information about what the program was
  27 busy with is saved. This allows to get information about the program behaviour
  28 without degrading its performance too much (usually less than 10%).
  29
  30 ## Basic profiler usage
  31
  32 The simpler way to use the profiler is the following:
  33
  34 `mono --profile=log program.exe`
  35
  36 At the end of the execution the file *output.mlpd* will be found in the current
  37 directory. A summary report of the data can be printed by running:
  38
  39 `mprof-report output.mlpd`
  40
  41 With this invocation a huge amount of data is collected about the program execution
  42 and collecting and saving this data can significantly slow down program execution.
  43 If saving the profiling data is not needed, a report can be generated directly with:
  44
  45 `mono --profile=log:report program.exe`
  46
  47 If the information about allocations is not of interest, it can be excluded:
  48
  49 `mono --profile=log:noalloc program.exe`
  50
  51 On the other hand, if method call timing is not important, while allocations are,
  52 the needed info can be gathered with:
  53
  54 `mono --profile=log:nocalls program.exe`
  55
  56 You will still be able to inspect information about the sequence of calls that lead
  57 to each allocation because at each object allocation a stack trace is collected if
  58 full enter/leave information is not available.
  59
  60 To periodically collect heap shots (and exclude method and allocation events) use the
  61 following options (making sure you run with the sgen garbage collector):
  62
  63 `mono --gc=sgen --profile=log:heapshot program.exe`
  64
  65 To perform a sampling profiler run, use the *sample* option:
  66
  67 `mono --profile=log:sample program.exe`
  68
  69 ## Profiler option documentation
  70
  71 By default the *log* profiler will gather all the events provided by the Mono runtime
  72 and write them to a file named *output.mlpd*. When no option is specified, it
  73 is equivalent to using:
  74
  75 `--profile=log:calls,alloc,output=output.mlpd,maxframes=8,calldepth=100`
  76
  77 The following options can be used to modify this default behaviour. Each option
  78 is separated from the next by a `,` character, with no spaces and all the options
  79 are included after the *log:* profile module specifier.
  80
  81 * *help*: display concise help info about each available option
  82
  83 * *[no]alloc*: *noalloc* disables collecting object allocation info, *alloc* enables
  84 it if it was disabled by another option like *heapshot*.
  85
  86 * *[no]calls*: *nocalls* disables collecting method enter and leave events. When this
  87 option is used at each object allocation and at some other events (like lock contentions
  88 and exception throws) a stack trace is collected by default. See the *maxframes* option to
  89 control this behaviour. *calls* enables method enter/leave events if they were disabled
  90 by another option like *heapshot*.
  91
  92 * *heapshot[=MODE]*: collect heap shot data at each major collection. The frequency of the
  93 heap shots can be changed with the *MODE* parameter. When this option is used
  94 allocation events and method enter/leave events are not recorded by default: if they
  95 are needed, they need to be enabled explicitly.
  96 The optional parameter *MODE* can modify the default heap shot frequency.
  97 heapshot can be used multiple times with different modes: in that case a heap shot is
  98 taken if either of the conditions are met.
  99 MODE can be one of:
 100         * *NUM*ms: perform a heap shot if at least *NUM* milliseconds passed since
 101         the last one.
 102         * *NUM*gc: perform a heap shot every *NUM* major garbage collections
 103         * *ondemand*: perform a heap shot when such a command is sent to the
 104         control port
 105
 106 * *sample[=TYPE[/FREQ]]*: collect statistical samples of the program behaviour. The
 107 default is to collect a 1000 times per second the instruction pointer. This is
 108 equivalent to the value "cycles/1000" for *TYPE*. On some systems, like with recent
 109 Linux kernels, it is possible to cause the sampling to happen for other events
 110 provided by the performance counters of the cpu. In this case, *TYPE* can be one of:
 111         * *cycles*: processor cycles
 112         * *instr*: executed instructions
 113         * *cacherefs*: cache references
 114         * *cachemiss*: cache misses
 115         * *branches*: executed branches
 116         * *branchmiss*: mispredicted branches
 117
 118 * *time=TIMER*: use the TIMER timestamp mode. TIMER can have the following values:
 119         * *fast*: a usually faster but possibly more inaccurate timer
 120
 121 * *maxframes=NUM*: when a stack trace needs to be performed, collect *NUM* frames
 122 at the most. The default is 8.
 123
 124 * *calldepth=NUM*: ignore method enter/leave events when the call chain depth is
 125 bigger than NUM.
 126
 127 * *zip*: automatically compress the output data in gzip format.
 128
 129 * *output=OUTSPEC*: instead of writing the profiling data to the output.mlpd file,
 130 substitute *%p* in *OUTSPEC* with the current process id and *%t* with the current
 131 date and time, then do according to *OUTSPEC*:
 132         * if *OUTSPEC* begins with a *|* character, execute the rest as a program
 133         and feed the data to its standard input
 134         * if *OUTSPEC* begins with a *-* character, use the rest of OUTSPEC as
 135         the filename, but force overwrite any existing file by that name
 136         * if *OUTSPEC* begins with a *#* character, use the rest of OUTSPEC as a
 137         file descriptor and feed the data to it.
 138         * otherwise write the data the the named file: note that is a file by that
 139         name already exists, a warning is issued and profiling is disabled.
 140
 141 * *report*: the profiling data is sent to mprof-report, which will print a summary
 142 report. This is equivalent to the option: `output=mprof-report -`. If the *output*
 143 option is specified as well, the report will be written to the output file instead of
 144 the console.
 145
 146 * *port=PORT*: specify the tcp/ip port to use for the listening command server.
 147 Currently not available for windows.
 148 This server is started for example when heapshot=ondemand is used: it will read
 149 commands line by line. The following commands are available:
 150         * *heapshot*: perform a heapshot as soon as possible
 151
 152 ## Analyzing the profile data
 153
 154 Currently there is a command line program (*mprof-report*) to analyze the
 155 data produced by the profiler. This is ran automatically when the *report*
 156 profiler option is used.
 157 Simply run:
 158
 159 `mprof-report output.mlpd`
 160
 161 to see a summary report of the data included in the file.
 162
 163 ### Trace information for events
 164
 165 Often it is important for some events, like allocations, lock contention
 166 and exception throws to know where they happened. Or we may want to see
 167 what sequence of calls leads to a particular method invocation. To see this
 168 info invoke mprof-report as follows:
 169
 170 `mprof-report --traces output.mlpd`
 171
 172 The maximum number of methods in each stack trace can be specified with the
 173 *--maxframes=NUM* option:
 174
 175 `mprof-report --traces --maxframes=4 output.mlpd`
 176
 177 The stack trace info will be available if method enter/leave events have been
 178 recorded or if stack trace collection wasn't explicitly disabled with the
 179 *maxframes=0* profiler option. Note that the profiler will collect up to 8
 180 frames by default at specific events when the *nocalls* option is used, so
 181 in that case, if more stack frames are required in mprof-report, a bigger
 182 value for maxframes when profiling must be used, too.
 183
 184 The *--traces* option also controls the reverse reference feature in the heapshot
 185 report: for each class it reports how many references to objects of that class
 186 come from other classes.
 187
 188 ### Sort order for methods and allocations
 189
 190 When a list of methods is printed the default sort order is based on the total time
 191 spent in the method. This time is wall clock time (that is, it includes the time
 192 spent, for example, in a sleep call, even if actual cpu time would be basically 0).
 193 Also, if the method has been ran on different threads, the time will be a sum
 194 of the time used in each thread.
 195
 196 To change the sort order, use the option:
 197
 198 `--method-sort=MODE`
 199
 200 where *MODE* can be:
 201
 202 * *self*: amount of time spent in the method itself and not in its callees
 203 * *calls*: the number of method invocations
 204 * *total*: the total time spent in the method.
 205
 206 Object allocation lists are sorted by default depending on the total amount
 207 of bytes used by each type.
 208
 209 To change the sort order of object allocations, use the option:
 210
 211 `--alloc-sort=MODE`
 212
 213 where *MODE* can be:
 214
 215 * *count*: the number of allocated objects of the given type
 216 * *bytes*: the total number of bytes used by objects of the given type
 217
 218 ### Selecting what data to report
 219
 220 The profiler by default collects data about many runtime subsystems and mprof-report
 221 prints a summary of all the subsystems that are found in the data file. It is possible
 222 to tell mprof-report to only show information about some of them with the following
 223 option:
 224
 225 `--reports=R1[,R2...]`
 226
 227 where the report names R1, R2 etc. can be:
 228
 229 * *header*: information about program startup and profiler version
 230 * *jit*: JIT compiler information
 231 * *sample*: statistical sampling information
 232 * *gc*: garbage collection information
 233 * *alloc*: object allocation information
 234 * *call*: method profiling information
 235 * *metadata*: metadata events like image loads
 236 * *exception*: exception throw and handling information
 237 * *monitor*: lock contention information
 238 * *thread*: thread information
 239 * *heapshot*: live heap usage at heap shots
 240
 241 It is possible to limit some of the data displayed to a timeframe of the
 242 program execution with the option:
 243
 244 `--time=FROM-TO`
 245
 246 where *FROM* and *TO* are seconds since application startup (they can be
 247 floating point numbers).
 248
 249 Another interesting option is to consider only events happening on a particular
 250 thread with the following option:
 251
 252 `--thread=THREADID`
 253
 254 where *THREADID* is one of the numbers listed in the thread summary report
 255 (or a thread name when present).
 256
 257 By default long lists of methods or other information like object allocations
 258 are limited to the most important data. To increase the amount of information
 259 printed you can use the option:
 260
 261 `--verbose`
 262
 263 ### Track individual objects
 264
 265 Instead of printing the usual reports from the profiler data, it is possible
 266 to track some interesting information about some specific object addresses.
 267 The objects are selected based on their address with the *--track* option as follows:
 268
 269 `--track=0xaddr1[,0xaddr2,...]`
 270
 271 The reported info (if available in the data file), will be class name, size,
 272 creation time, stack trace of creation (with the *--traces* option), etc.
 273 If heapshot data is available it will be possible to also track what other objects
 274 reference one of the listed addresses.
 275
 276 The object addresses can be gathered either from the profiler report in some
 277 cases (like in the monitor lock report), from the live application or they can
 278 be selected with the *--find=FINDSPEC* option. FINDSPEC can be one of the
 279 following:
 280
 281 * *S:SIZE*: where the object is selected if it's size is at least *SIZE*
 282 * *T:NAME*: where the object is selected if *NAME* partially matches its class name
 283
 284 This option can be specified multiple times with one of the different kinds
 285 of FINDSPEC. For example, the following:
 286
 287 `--find=S:10000 --find=T:Byte[]`
 288
 289 will find all the byte arrays that are at least 10000 bytes in size.
 290
 291 Note that with a moving garbage collector the object address can change, so
 292 you may need to track the changed address manually. It can also happen that
 293 multiple objects are allocated at the same address, so the output from this
 294 option can become large.
 295
 296 ### Saving a profiler report
 297
 298 By default mprof-report will print the summary data to the console.
 299 To print it to a file, instead, use the option:
 300
 301 `--out=FILENAME`
 302
 303 ## Dealing with profiler slowness
 304
 305 If the profiler needs to collect lots of data, the execution of the program will
 306 slow down significantly, usually 10 to 20 times slower. There are several
 307 ways to reduce the impact of the profiler on the program execution.
 308
 309 ### Use the statistical sampling mode
 310
 311 Statistical sampling allows executing a program under the profiler with minimal
 312 performance overhead (usually less than 10%). This mode allows checking where
 313 the program is spending most of it's execution time without significantly
 314 perturbing its behaviour.
 315
 316 ### Collect less data
 317
 318 Collecting method enter/leave events can be very expensive, especially in programs
 319 that perform many millions of tiny calls. The profiler option *nocalls* can be
 320 used to avoid collecting this data or it can be limited to only a few call levels
 321 with the *calldepth* option.
 322
 323 Object allocation information is expensive as well, though much less than
 324 method enter/leave events. If it's not needed, it can be skipped with the
 325 *noalloc* profiler option. Note that when method enter/leave events are
 326 discarded, by default stack traces are collected at each allocation and this
 327 can be expensive as well. The impact of stack trace information can be reduced
 328 by setting a low value with the *maxframes* option or by eliminating them
 329 completely, by setting it to 0.
 330
 331 The other major source of data is the heapshot profiler option: especially
 332 if the managed heap is big, since every object needs to be inspected. The *MODE*
 333 parameter of the *heapshot* option can be used to reduce the frequency of the heap
 334 shots.
 335
 336 ### Reduce the timestamp overhead
 337
 338 On many operating systems or architectures what actually slows down profiling
 339 is the function provided by the system to get timestamp information.
 340 The *time=fast* profiler option can be usually used to speed up this operation,
 341 but, depending on the system, time accounting may have some level of approximation
 342 (though statistically the data should be still fairly valuable).
 343
 344 ## Dealing with the size of the data files
 345
 346 When collecting a lot of information about a profiled program, huge data
 347 files can be generated. There are a few ways to minimize the amount of data,
 348 for example by not collecting some of the more space-consuming information
 349 or by compressing the information on the fly or by just generating a summary
 350 report.
 351
 352 ### Reducing the amount of data
 353
 354 Method enter/leave events can be excluded completely with the *nocalls* option
 355 or they can be limited to just a few levels of calls with the *calldepth* option.
 356 For example, the option:
 357
 358 `calldepth=10`
 359
 360 will ignore the method events when there are more than 10 managed stack frames.
 361 This is very useful for programs that have deep recursion or for programs that
 362 perform many millions of tiny calls deep enough in the call stack. The optimal
 363 number for the calldepth option depends on the program and it needs to be balanced
 364 between providing enough profiling information and allowing fast execution speed.
 365
 366 Note that by default, if method events are not recorded at all, the profiler will
 367 collect stack trace information at events like allocations. To avoid gathering this
 368 data, use the *maxframes=0* profiler option.
 369
 370 Allocation events can be eliminated with the *noalloc* option.
 371
 372 Heap shot data can also be huge: by default it is collected at each major collection.
 373 To reduce the frequency, you can specify a heapshot mode: for example to collect
 374 every 5 collections (including major and minor):
 375
 376 `heapshot=5gc`
 377
 378 or when at least 5 seconds passed since the last heap shot:
 379
 380 `heapshot=5000ms`
 381
 382 ### Compressing the data
 383
 384 To reduce the amout of disk space used by the data, the data can be compressed
 385 either after it has been generated with the gzip command:
 386
 387 `gzip -9 output.mlpd`
 388
 389 or it can be compressed automatically by using the *zip* profiler option. Note
 390 that in this case there could be a significant slowdown of the profiled program.
 391
 392 The mprof-report program will tranparently deal with either compressed or
 393 uncompressed data files.
 394
 395 ### Generating only a summary report
 396
 397 Often it's enough to look at the profiler summary report to diagnose an issue and in this
 398 case it's possible to avoid saving the profiler data file to disk. This can be
 399 accomplished with the *report* profiler option, which will basically send the data
 400 to the mprof-report program for display.
 401
 402 To have more control of what summary information is reported (or to use a completely
 403 different program to decode the profiler data), the *output* profiler option can be
 404 used, with `|` as the first character: the rest of the output name will be
 405 executed as a program with the data fed in on the standard input.
 406
 407 For example, to print only the Monitor summary with stack trace information, you
 408 could use it like this:
 409
 410 `output=|mprof-report --reports=monitor --traces -`
 411