1 \chapter{The Python Profiler \label{profile}}
3 \sectionauthor{James Roskind}{}
5 Copyright \copyright{} 1994, by InfoSeek Corporation, all rights reserved.
6 \index{InfoSeek Corporation}
8 Written by James Roskind.\footnote{
9 Updated and converted to \LaTeX\ by Guido van Rossum. The references to
10 the old profiler are left in the text, although it no longer exists.}
12 Permission to use, copy, modify, and distribute this Python software
13 and its associated documentation for any purpose (subject to the
14 restriction in the following sentence) without fee is hereby granted,
15 provided that the above copyright notice appears in all copies, and
16 that both that copyright notice and this permission notice appear in
17 supporting documentation, and that the name of InfoSeek not be used in
18 advertising or publicity pertaining to distribution of the software
19 without specific, written prior permission. This permission is
20 explicitly restricted to the copying and modification of the software
21 to remain in Python, compiled Python, or other languages (such as C)
22 wherein the modified or derived code is exclusively imported into a
23 Python module.
25 INFOSEEK CORPORATION DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS
26 SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
27 FITNESS. IN NO EVENT SHALL INFOSEEK CORPORATION BE LIABLE FOR ANY
28 SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER
29 RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF
30 CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
31 CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
34 The profiler was written after only programming in Python for 3 weeks.
35 As a result, it is probably clumsy code, but I don't know for sure yet
36 'cause I'm a beginner :-). I did work hard to make the code run fast,
37 so that profiling would be a reasonable thing to do. I tried not to
38 repeat code fragments, but I'm sure I did some stuff in really awkward
39 ways at times. Please send suggestions for improvements to:
40 \email{jar@netscape.com}. I won't promise \emph{any} support. ...but
41 I'd appreciate the feedback.
44 \section{Introduction to the profiler}
45 \nodename{Profiler Introduction}
47 A \dfn{profiler} is a program that describes the run time performance
48 of a program, providing a variety of statistics. This documentation
49 describes the profiler functionality provided in the modules
\module{profile} and \module{pstats}. This profiler provides
\dfn{deterministic profiling} of any Python program. It also
provides a series of report generation tools that let users rapidly
examine the results of a profile operation.
54 \index{deterministic profiling}
55 \index{profiling, deterministic}
58 %\section{How Is This Profiler Different From The Old Profiler?}
59 %\nodename{Profiler Changes}
61 %(This section is of historical importance only; the old profiler
62 %discussed here was last seen in Python 1.1.)
64 %The big changes from old profiling module are that you get more
65 %information, and you pay less CPU time. It's not a trade-off, it's a
66 %trade-up.
68 %To be specific:
70 %\begin{description}
72 %\item[Bugs removed:]
73 %Local stack frame is no longer molested, execution time is now charged
74 %to correct functions.
76 %\item[Accuracy increased:]
77 %Profiler execution time is no longer charged to user's code,
78 %calibration for platform is supported, file reads are not done \emph{by}
79 %profiler \emph{during} profiling (and charged to user's code!).
81 %\item[Speed increased:]
82 %Overhead CPU cost was reduced by more than a factor of two (perhaps a
83 %factor of five), lightweight profiler module is all that must be
84 %loaded, and the report generating module (\module{pstats}) is not needed
85 %during profiling.
87 %\item[Recursive functions support:]
88 %Cumulative times in recursive functions are correctly calculated;
89 %recursive entries are counted.
91 %\item[Large growth in report generating UI:]
92 %Distinct profiles runs can be added together forming a comprehensive
93 %report; functions that import statistics take arbitrary lists of
94 %files; sorting criteria is now based on keywords (instead of 4 integer
95 %options); reports shows what functions were profiled as well as what
96 %profile file was referenced; output format has been improved.
98 %\end{description}
101 \section{Instant Users Manual \label{profile-instant}}
103 This section is provided for users that ``don't want to read the
104 manual.'' It provides a very brief overview, and allows a user to
105 rapidly perform profiling on an existing application.
107 To profile an application with a main entry point of \function{foo()},
108 you would add the following to your module:
\begin{verbatim}
import profile
profile.run('foo()')
\end{verbatim}
115 The above action would cause \function{foo()} to be run, and a series of
116 informative lines (the profile) to be printed. The above approach is
117 most useful when working with the interpreter. If you would like to
118 save the results of a profile into a file for later examination, you
119 can supply a file name as the second argument to the \function{run()}
120 function:
\begin{verbatim}
import profile
profile.run('foo()', 'fooprof')
\end{verbatim}
The file \file{profile.py} can also be invoked as
a script to profile another script. For example:

\begin{verbatim}
python -m profile myscript.py
\end{verbatim}

\file{profile.py} accepts two optional arguments on the command line:

\begin{verbatim}
profile.py [-o output_file] [-s sort_order]
\end{verbatim}

\programopt{-s} applies only when the report is written to standard
output (that is, when \programopt{-o} is not supplied). Look in the
\class{Stats} documentation for valid sort values.
144 When you wish to review the profile, you should use the methods in the
145 \module{pstats} module. Typically you would load the statistics data as
146 follows:
\begin{verbatim}
import pstats
p = pstats.Stats('fooprof')
\end{verbatim}
153 The class \class{Stats} (the above code just created an instance of
154 this class) has a variety of methods for manipulating and printing the
155 data that was just read into \code{p}. When you ran
156 \function{profile.run()} above, what was printed was the result of three
157 method calls:
\begin{verbatim}
p.strip_dirs().sort_stats(-1).print_stats()
\end{verbatim}
163 The first method removed the extraneous path from all the module
164 names. The second method sorted all the entries according to the
165 standard module/line/name string that is printed (this is to comply
166 with the semantics of the old profiler). The third method printed out
167 all the statistics. You might try the following sort calls:
\begin{verbatim}
p.sort_stats('name')
p.print_stats()
\end{verbatim}
174 The first call will actually sort the list by function name, and the
175 second call will print out the statistics. The following are some
176 interesting calls to experiment with:
\begin{verbatim}
p.sort_stats('cumulative').print_stats(10)
\end{verbatim}
182 This sorts the profile by cumulative time in a function, and then only
183 prints the ten most significant lines. If you want to understand what
184 algorithms are taking time, the above line is what you would use.
186 If you were looking to see what functions were looping a lot, and
187 taking a lot of time, you would do:
\begin{verbatim}
p.sort_stats('time').print_stats(10)
\end{verbatim}
193 to sort according to time spent within each function, and then print
194 the statistics for the top ten functions.
196 You might also try:
\begin{verbatim}
p.sort_stats('file').print_stats('__init__')
\end{verbatim}
202 This will sort all the statistics by file name, and then print out
203 statistics for only the class init methods (since they are spelled
204 with \code{__init__} in them). As one final example, you could try:
\begin{verbatim}
p.sort_stats('time', 'cum').print_stats(.5, 'init')
\end{verbatim}
210 This line sorts statistics with a primary key of time, and a secondary
211 key of cumulative time, and then prints out some of the statistics.
212 To be specific, the list is first culled down to 50\% (re: \samp{.5})
213 of its original size, then only lines containing \code{init} are
214 maintained, and that sub-sub-list is printed.
216 If you wondered what functions called the above functions, you could
217 now (\code{p} is still sorted according to the last criteria) do:
\begin{verbatim}
p.print_callers(.5, 'init')
\end{verbatim}
223 and you would get a list of callers for each of the listed functions.
225 If you want more functionality, you're going to have to read the
226 manual, or guess what the following functions do:
\begin{verbatim}
p.print_callees()
p.add('fooprof')
\end{verbatim}
233 Invoked as a script, the \module{pstats} module is a statistics
234 browser for reading and examining profile dumps. It has a simple
235 line-oriented interface (implemented using \refmodule{cmd}) and
236 interactive help.
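
The steps of this section can be strung together into one script. The
following is only a sketch under stated assumptions: \function{fib()}
and \function{foo()} are hypothetical stand-ins for a real application,
the dump is written to a scratch directory rather than to
\file{fooprof} in the current directory, and
\function{profile.runctx()} is used instead of \function{profile.run()}
so that the example does not depend on which module it is run from.

\begin{verbatim}
import os
import profile
import pstats
import tempfile


def fib(n):
    # Deliberately slow recursive function, used only as a workload.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


def foo():
    # Hypothetical "main entry point" standing in for an application.
    return fib(12)


# Write the raw statistics to a scratch file instead of printing them.
stats_path = os.path.join(tempfile.mkdtemp(), 'fooprof')
profile.runctx('foo()', globals(), locals(), stats_path)

# Load the dump and print the ten entries with the largest
# cumulative time.
p = pstats.Stats(stats_path)
p.strip_dirs().sort_stats('cumulative').print_stats(10)
\end{verbatim}

Keeping the dump file around means the same data can be re-sorted and
re-printed later without running the application again.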
238 \section{What Is Deterministic Profiling?}
239 \nodename{Deterministic Profiling}
241 \dfn{Deterministic profiling} is meant to reflect the fact that all
242 \emph{function call}, \emph{function return}, and \emph{exception} events
243 are monitored, and precise timings are made for the intervals between
244 these events (during which time the user's code is executing). In
245 contrast, \dfn{statistical profiling} (which is not done by this
246 module) randomly samples the effective instruction pointer, and
247 deduces where time is being spent. The latter technique traditionally
248 involves less overhead (as the code does not need to be instrumented),
249 but provides only relative indications of where time is being spent.
251 In Python, since there is an interpreter active during execution, the
252 presence of instrumented code is not required to do deterministic
253 profiling. Python automatically provides a \dfn{hook} (optional
254 callback) for each event. In addition, the interpreted nature of
255 Python tends to add so much overhead to execution, that deterministic
256 profiling tends to only add small processing overhead in typical
257 applications. The result is that deterministic profiling is not that
258 expensive, yet provides extensive run time statistics about the
259 execution of a Python program.
261 Call count statistics can be used to identify bugs in code (surprising
262 counts), and to identify possible inline-expansion points (high call
263 counts). Internal time statistics can be used to identify ``hot
264 loops'' that should be carefully optimized. Cumulative time
265 statistics should be used to identify high level errors in the
266 selection of algorithms. Note that the unusual handling of cumulative
267 times in this profiler allows statistics for recursive implementations
268 of algorithms to be directly compared to iterative implementations.
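
To make the idea of an event hook concrete, the sketch below installs a
callback with \function{sys.setprofile()} and merely counts Python-level
call events. It is an illustration of the mechanism, not the code of
the \module{profile} module; the names \code{count_calls},
\code{call_counts}, \code{helper} and \code{work} are invented for the
example.

\begin{verbatim}
import sys

call_counts = {}


def count_calls(frame, event, arg):
    # The interpreter invokes this hook for every profiling event;
    # only Python-level function calls are tallied here.
    if event == 'call':
        name = frame.f_code.co_name
        call_counts[name] = call_counts.get(name, 0) + 1


def helper():
    return 1


def work():
    total = 0
    for i in range(3):
        total = total + helper()
    return total


sys.setprofile(count_calls)
try:
    work()
finally:
    sys.setprofile(None)    # always remove the hook again

print(call_counts['helper'])    # 3
\end{verbatim}

A real profiler would also read a clock on each call and return event;
that extra work per event is the overhead discussed above.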
271 \section{Reference Manual}
273 \declaremodule{standard}{profile}
274 \modulesynopsis{Python profiler}
278 The primary entry point for the profiler is the global function
279 \function{profile.run()}. It is typically used to create any profile
280 information. The reports are formatted and printed using methods of
281 the class \class{pstats.Stats}. The following is a description of all
282 of these standard entry points and functions. For a more in-depth
283 view of some of the code, consider reading the later section on
284 Profiler Extensions, which includes discussion of how to derive
285 ``better'' profilers from the classes presented, or reading the source
286 code for these modules.
288 \begin{funcdesc}{run}{command\optional{, filename}}
This function takes a single argument that can be passed to the
\keyword{exec} statement, and an optional file name. In all cases this
routine attempts to \keyword{exec} its first argument, and gather profiling
293 statistics from the execution. If no file name is present, then this
294 function automatically prints a simple profiling report, sorted by the
295 standard name string (file/line/function-name) that is presented in
296 each line. The following is a typical output from such a call:
\begin{verbatim}
main()
         2706 function calls (2004 primitive calls) in 4.504 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2    0.006    0.003    0.953    0.477 pobject.py:75(save_objects)
     43/3    0.533    0.012    0.749    0.250 pobject.py:99(evaluate)
\end{verbatim}
310 The first line indicates that this profile was generated by the call:\\
311 \code{profile.run('main()')}, and hence the exec'ed string is
312 \code{'main()'}. The second line indicates that 2706 calls were
313 monitored. Of those calls, 2004 were \dfn{primitive}. We define
314 \dfn{primitive} to mean that the call was not induced via recursion.
315 The next line: \code{Ordered by:\ standard name}, indicates that
316 the text string in the far right column was used to sort the output.
317 The column headings include:
\begin{description}

\item[ncalls ]
is the number of calls.

\item[tottime ]
is the total time spent in the given function, excluding time spent
in calls to sub-functions.

\item[percall ]
is the quotient of \code{tottime} divided by \code{ncalls}.

\item[cumtime ]
is the total time spent in this function and all sub-functions (from
invocation till exit). This figure is accurate \emph{even} for
recursive functions.

\item[percall ]
is the quotient of \code{cumtime} divided by the number of primitive
calls.

\item[filename:lineno(function) ]
gives the file name, line number, and name of each function.

\end{description}
344 When there are two numbers in the first column (for example,
345 \samp{43/3}), then the latter is the number of primitive calls, and
346 the former is the actual number of calls. Note that when the function
347 does not recurse, these two values are the same, and only the single
348 figure is printed.
350 \end{funcdesc}
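
The two-number form of the \code{ncalls} column is easiest to see with a
small recursive function. The sketch below assumes a hypothetical
function \function{countdown()} and a scratch file for the dump;
neither is part of the profiler.

\begin{verbatim}
import os
import profile
import pstats
import tempfile


def countdown(n):
    # Calls itself n more times, so countdown(4) is entered five times.
    if n > 0:
        countdown(n - 1)


stats_path = os.path.join(tempfile.mkdtemp(), 'countdown.prof')
profile.runctx('countdown(4)', globals(), locals(), stats_path)

st = pstats.Stats(stats_path)
# Restrict the report to entries whose standard name matches
# 'countdown'.
st.strip_dirs().print_stats('countdown')
\end{verbatim}

In the printed report, the \code{ncalls} entry for
\function{countdown()} should read \samp{5/1}: five calls in total, of
which only the outermost one is primitive.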
352 \begin{funcdesc}{runctx}{command, globals, locals\optional{, filename}}
353 This function is similar to \function{profile.run()}, with added
354 arguments to supply the globals and locals dictionaries for the
355 \var{command} string.
356 \end{funcdesc}
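
A minimal sketch of \function{runctx()} with explicit namespaces
follows. The function \function{square()} and the dictionary names
\code{global_ns} and \code{local_ns} are assumptions made for the
example only.

\begin{verbatim}
import profile


def square(x):
    return x * x


# Explicit namespaces: the command sees only what these
# dictionaries hold.
global_ns = {'square': square}
local_ns = {}

# No file name is given, so the report is printed when the
# command has finished.
profile.runctx('result = square(7)', global_ns, local_ns)

print(local_ns['result'])    # 49
\end{verbatim}

Passing the namespaces explicitly is useful when the code to be
profiled does not live in the main module.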
358 Analysis of the profiler data is done using this class from the
359 \module{pstats} module:
361 % now switch modules....
362 % (This \stmodindex use may be hard to change ;-( )
363 \stmodindex{pstats}
365 \begin{classdesc}{Stats}{filename\optional{, \moreargs}}
366 This class constructor creates an instance of a ``statistics object''
367 from a \var{filename} (or set of filenames). \class{Stats} objects are
368 manipulated by methods, in order to print useful reports.
370 The file selected by the above constructor must have been created by
371 the corresponding version of \module{profile}. To be specific, there is
372 \emph{no} file compatibility guaranteed with future versions of this
373 profiler, and there is no compatibility with files produced by other
374 profilers (such as the old system profiler).
376 If several files are provided, all the statistics for identical
377 functions will be coalesced, so that an overall view of several
378 processes can be considered in a single report. If additional files
379 need to be combined with data in an existing \class{Stats} object, the
380 \method{add()} method can be used.
381 \end{classdesc}
384 \subsection{The \class{Stats} Class \label{profile-stats}}
386 \class{Stats} objects have the following methods:
388 \begin{methoddesc}[Stats]{strip_dirs}{}
389 This method for the \class{Stats} class removes all leading path
390 information from file names. It is very useful in reducing the size
391 of the printout to fit within (close to) 80 columns. This method
392 modifies the object, and the stripped information is lost. After
393 performing a strip operation, the object is considered to have its
394 entries in a ``random'' order, as it was just after object
395 initialization and loading. If \method{strip_dirs()} causes two
396 function names to be indistinguishable (they are on the same
397 line of the same filename, and have the same function name), then the
398 statistics for these two entries are accumulated into a single entry.
399 \end{methoddesc}
402 \begin{methoddesc}[Stats]{add}{filename\optional{, \moreargs}}
403 This method of the \class{Stats} class accumulates additional
404 profiling information into the current profiling object. Its
405 arguments should refer to filenames created by the corresponding
406 version of \function{profile.run()}. Statistics for identically named
407 (re: file, line, name) functions are automatically accumulated into
408 single function statistics.
409 \end{methoddesc}
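
As an illustration of how \method{add()} accumulates statistics, the
sketch below profiles a hypothetical function \function{work()} in two
separate runs and merges the two dump files. The file names and the
scratch directory are assumptions of the example.

\begin{verbatim}
import os
import profile
import pstats
import tempfile


def work():
    return sum(range(1000))


scratch = tempfile.mkdtemp()
first = os.path.join(scratch, 'run1.prof')
second = os.path.join(scratch, 'run2.prof')

# Two separate profiling runs of the same function, one dump each.
profile.runctx('work()', globals(), locals(), first)
profile.runctx('work(); work()', globals(), locals(), second)

combined = pstats.Stats(first)
combined.add(second)         # merge the second dump into the first
combined.strip_dirs().sort_stats('calls').print_stats('work')
\end{verbatim}

The merged report should show three calls of \function{work()}, one
from the first run and two from the second. Passing both file names to
the \class{Stats} constructor has the same effect.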
411 \begin{methoddesc}[Stats]{dump_stats}{filename}
412 Save the data loaded into the \class{Stats} object to a file named
413 \var{filename}. The file is created if it does not exist, and is
414 overwritten if it already exists. This is equivalent to the method of
415 the same name on the \class{profile.Profile} class.
416 \versionadded{2.3}
417 \end{methoddesc}
419 \begin{methoddesc}[Stats]{sort_stats}{key\optional{, \moreargs}}
420 This method modifies the \class{Stats} object by sorting it according
421 to the supplied criteria. The argument is typically a string
422 identifying the basis of a sort (example: \code{'time'} or
423 \code{'name'}).
425 When more than one key is provided, then additional keys are used as
426 secondary criteria when there is equality in all keys selected
427 before them. For example, \code{sort_stats('name', 'file')} will sort
428 all the entries according to their function name, and resolve all ties
429 (identical function names) by sorting by file name.
431 Abbreviations can be used for any key names, as long as the
432 abbreviation is unambiguous. The following are the keys currently
433 defined:
435 \begin{tableii}{l|l}{code}{Valid Arg}{Meaning}
436 \lineii{'calls'}{call count}
437 \lineii{'cumulative'}{cumulative time}
438 \lineii{'file'}{file name}
439 \lineii{'module'}{file name}
440 \lineii{'pcalls'}{primitive call count}
441 \lineii{'line'}{line number}
442 \lineii{'name'}{function name}
443 \lineii{'nfl'}{name/file/line}
444 \lineii{'stdname'}{standard name}
445 \lineii{'time'}{internal time}
446 \end{tableii}
Note that all sorts on statistics are in descending order (placing
most time consuming items first), whereas name, file, and line number
sorts are in ascending order (alphabetical). The subtle
451 distinction between \code{'nfl'} and \code{'stdname'} is that the
452 standard name is a sort of the name as printed, which means that the
453 embedded line numbers get compared in an odd way. For example, lines
454 3, 20, and 40 would (if the file names were the same) appear in the
455 string order 20, 3 and 40. In contrast, \code{'nfl'} does a numeric
456 compare of the line numbers. In fact, \code{sort_stats('nfl')} is the
457 same as \code{sort_stats('name', 'file', 'line')}.
459 For compatibility with the old profiler, the numeric arguments
460 \code{-1}, \code{0}, \code{1}, and \code{2} are permitted. They are
461 interpreted as \code{'stdname'}, \code{'calls'}, \code{'time'}, and
462 \code{'cumulative'} respectively. If this old style format (numeric)
463 is used, only one sort key (the numeric key) will be used, and
464 additional arguments will be silently ignored.
465 \end{methoddesc}
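
The difference between \code{'stdname'} and \code{'nfl'} can be
reproduced without the profiler. The sketch below uses three
hypothetical entries from one file and plain list sorting; it models
the two orderings and is not taken from the \module{pstats} source.

\begin{verbatim}
# Hypothetical (file, line, function) entries from a single file.
entries = [('spam.py', 3, 'a'), ('spam.py', 20, 'a'),
           ('spam.py', 40, 'a')]


def std_name(entry):
    # Mimics the printed form filename:lineno(function).
    return '%s:%d(%s)' % entry


# 'stdname' compares the printed strings, so '20' sorts before '3'.
by_stdname = sorted(entries, key=std_name)
print([line for (_, line, _) in by_stdname])    # [20, 3, 40]

# 'nfl' compares name, then file, then the line number as an integer.
by_nfl = sorted(entries, key=lambda e: (e[2], e[0], e[1]))
print([line for (_, line, _) in by_nfl])        # [3, 20, 40]
\end{verbatim}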
468 \begin{methoddesc}[Stats]{reverse_order}{}
469 This method for the \class{Stats} class reverses the ordering of the basic
470 list within the object. This method is provided primarily for
471 compatibility with the old profiler. Its utility is questionable
472 now that ascending vs descending order is properly selected based on
473 the sort key of choice.
474 \end{methoddesc}
476 \begin{methoddesc}[Stats]{print_stats}{\optional{restriction, \moreargs}}
477 This method for the \class{Stats} class prints out a report as described
478 in the \function{profile.run()} definition.
480 The order of the printing is based on the last \method{sort_stats()}
481 operation done on the object (subject to caveats in \method{add()} and
482 \method{strip_dirs()}).
484 The arguments provided (if any) can be used to limit the list down to
485 the significant entries. Initially, the list is taken to be the
486 complete set of profiled functions. Each restriction is either an
487 integer (to select a count of lines), or a decimal fraction between
488 0.0 and 1.0 inclusive (to select a percentage of lines), or a regular
489 expression (to pattern match the standard name that is printed; as of
490 Python 1.5b1, this uses the Perl-style regular expression syntax
491 defined by the \refmodule{re} module). If several restrictions are
492 provided, then they are applied sequentially. For example:
\begin{verbatim}
print_stats(.1, 'foo:')
\end{verbatim}

would first limit the printing to the first 10\% of the list, and then
only print functions that were part of filename \file{.*foo:}. In
contrast, the command:

\begin{verbatim}
print_stats('foo:', .1)
\end{verbatim}
506 would limit the list to all functions having file names \file{.*foo:},
507 and then proceed to only print the first 10\% of them.
508 \end{methoddesc}
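
Why the order of restrictions matters can be shown with a simplified
model. The function \function{apply_restrictions()} below is
hypothetical and is not the \module{pstats} implementation; in
particular, the way a fraction is rounded to a line count is an
assumption of the sketch, and the report lines are invented.

\begin{verbatim}
import re


def apply_restrictions(names, *restrictions):
    # Simplified model: each restriction narrows the list that the
    # previous one produced.
    for r in restrictions:
        if isinstance(r, float):
            # Fraction: keep that share of the current list.
            names = names[:int(len(names) * r + 0.5)]
        elif isinstance(r, int):
            # Integer: keep that many lines.
            names = names[:r]
        else:
            # String: keep lines matching the regular expression.
            names = [n for n in names if re.search(r, n)]
    return names


# Ten hypothetical report lines, already sorted; three are in "foo".
report = ['bar:%d(f)' % i for i in range(10)]
for i in (3, 6, 9):
    report[i] = 'foo:%d(f)' % i

print(apply_restrictions(report, .5, 'foo:'))
# ['foo:3(f)']
print(apply_restrictions(report, 'foo:', .5))
# ['foo:3(f)', 'foo:6(f)']
\end{verbatim}

Cutting the list first discards matching lines that sit further down;
matching first and cutting afterwards keeps a share of the matches.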
511 \begin{methoddesc}[Stats]{print_callers}{\optional{restriction, \moreargs}}
512 This method for the \class{Stats} class prints a list of all functions
513 that called each function in the profiled database. The ordering is
514 identical to that provided by \method{print_stats()}, and the definition
515 of the restricting argument is also identical. For convenience, a
516 number is shown in parentheses after each caller to show how many
517 times this specific call was made. A second non-parenthesized number
518 is the cumulative time spent in the function at the right.
519 \end{methoddesc}
521 \begin{methoddesc}[Stats]{print_callees}{\optional{restriction, \moreargs}}
This method for the \class{Stats} class prints a list of all functions
that were called by the indicated function. Aside from this reversal
524 of direction of calls (re: called vs was called by), the arguments and
525 ordering are identical to the \method{print_callers()} method.
526 \end{methoddesc}
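
A small call graph makes the two reports easier to read. In the sketch
below the functions \function{leaf()}, \function{caller_a()},
\function{caller_b()} and \function{top()} are hypothetical, as is the
scratch file used for the dump.

\begin{verbatim}
import os
import profile
import pstats
import tempfile


def leaf():
    return 1


def caller_a():
    return leaf()


def caller_b():
    return leaf() + leaf()


def top():
    return caller_a() + caller_b()


stats_path = os.path.join(tempfile.mkdtemp(), 'calls.prof')
profile.runctx('top()', globals(), locals(), stats_path)

st = pstats.Stats(stats_path).strip_dirs()
st.print_callers('leaf')    # who called leaf(), and how often
st.print_callees('top')     # what top() called
\end{verbatim}

The first report should list \function{caller_a()} and
\function{caller_b()} as the callers of \function{leaf()}; the second
should list the same two functions as callees of \function{top()}.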
529 \section{Limitations \label{profile-limits}}
531 One limitation has to do with accuracy of timing information.
532 There is a fundamental problem with deterministic profilers involving
533 accuracy. The most obvious restriction is that the underlying ``clock''
534 is only ticking at a rate (typically) of about .001 seconds. Hence no
535 measurements will be more accurate than the underlying clock. If
536 enough measurements are taken, then the ``error'' will tend to average
537 out. Unfortunately, removing this first error induces a second source
538 of error.
540 The second problem is that it ``takes a while'' from when an event is
541 dispatched until the profiler's call to get the time actually
542 \emph{gets} the state of the clock. Similarly, there is a certain lag
543 when exiting the profiler event handler from the time that the clock's
544 value was obtained (and then squirreled away), until the user's code
545 is once again executing. As a result, functions that are called many
546 times, or call many functions, will typically accumulate this error.
547 The error that accumulates in this fashion is typically less than the
548 accuracy of the clock (less than one clock tick), but it
549 \emph{can} accumulate and become very significant. This profiler
550 provides a means of calibrating itself for a given platform so that
551 this error can be probabilistically (on the average) removed.
After the profiler is calibrated, it will be more accurate (in a least
squares sense), but it will sometimes produce negative numbers (when
call counts are exceptionally low, and the gods of probability work
against you :-).) Do \emph{not} be alarmed by negative numbers in
the profile. They should \emph{only} appear if you have calibrated
557 your profiler, and the results are actually better than without
558 calibration.
561 \section{Calibration \label{profile-calibration}}
563 The profiler subtracts a constant from each
564 event handling time to compensate for the overhead of calling the time
565 function, and socking away the results. By default, the constant is 0.
566 The following procedure can
567 be used to obtain a better constant for a given platform (see discussion
568 in section Limitations above).
\begin{verbatim}
import profile
pr = profile.Profile()
for i in range(5):
    print pr.calibrate(10000)
\end{verbatim}
577 The method executes the number of Python calls given by the argument,
578 directly and again under the profiler, measuring the time for both.
579 It then computes the hidden overhead per profiler event, and returns
that as a float. For example, on an 800 MHz Pentium running
Windows 2000, and using Python's \function{time.clock()} as the timer,
the magical number is about 12.5e-6.
584 The object of this exercise is to get a fairly consistent result.
585 If your computer is \emph{very} fast, or your timer function has poor
586 resolution, you might have to pass 100000, or even 1000000, to get
587 consistent results.
When you have a consistent answer,
there are three ways you can use it:\footnote{Prior to Python 2.2, it
was necessary to edit the profiler source code to embed the bias as
a literal number. You still can, but that method is no longer
described, because it is no longer needed.}
\begin{verbatim}
import profile

# 1. Apply computed bias to all Profile instances created hereafter.
profile.Profile.bias = your_computed_bias

# 2. Apply computed bias to a specific Profile instance.
pr = profile.Profile()
pr.bias = your_computed_bias

# 3. Specify computed bias in instance constructor.
pr = profile.Profile(bias=your_computed_bias)
\end{verbatim}
609 If you have a choice, you are better off choosing a smaller constant, and
610 then your results will ``less often'' show up as negative in profile
611 statistics.
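
One possible way to follow this advice is to calibrate several times
and keep the smallest result. The sketch below is only an illustration:
the repeat count of 3 and the argument 2000 are arbitrary values chosen
to keep the example quick, not recommendations, and a real calibration
should use counts large enough to give consistent results.

\begin{verbatim}
import profile

pr = profile.Profile()

# Repeat the measurement a few times with a small, arbitrary count.
samples = [pr.calibrate(2000) for i in range(3)]

# Prefer the smallest estimate, which is the least likely to
# over-correct and produce negative times.
bias = min(samples)

# Use the chosen constant for one specific Profile instance.
calibrated = profile.Profile(bias=bias)
print(bias)
\end{verbatim}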
614 \section{Extensions --- Deriving Better Profilers}
615 \nodename{Profiler Extensions}
617 The \class{Profile} class of module \module{profile} was written so that
618 derived classes could be developed to extend the profiler. The details
619 are not described here, as doing this successfully requires an expert
620 understanding of how the \class{Profile} class works internally. Study
621 the source code of module \module{profile} carefully if you want to
622 pursue this.
624 If all you want to do is change how current time is determined (for
625 example, to force use of wall-clock time or elapsed process time),
626 pass the timing function you want to the \class{Profile} class
627 constructor:
\begin{verbatim}
pr = profile.Profile(your_time_func)
\end{verbatim}
633 The resulting profiler will then call \code{your_time_func()}.
634 The function should return a single number, or a list of
635 numbers whose sum is the current time (like what \function{os.times()}
636 returns). If the function returns a single time number, or the list of
637 returned numbers has length 2, then you will get an especially fast
638 version of the dispatch routine.
640 Be warned that you should calibrate the profiler class for the
timer function that you choose. For most machines, a timer that
returns a lone integer value will provide the best results in terms of
low overhead during profiling. (\function{os.times()} is
\emph{pretty} bad, as it returns a tuple of floating point values.) If
you want to substitute a better timer in the cleanest fashion,
646 derive a class and hardwire a replacement dispatch method that best
647 handles your timer call, along with the appropriate calibration
648 constant.
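
As a sketch of the simpler route, the example below passes
\function{time.time()} as the timing function so that wall-clock time
is measured. The function \function{work()} and the scratch file name
are assumptions of the example, and no calibration is applied here.

\begin{verbatim}
import os
import profile
import pstats
import tempfile
import time


def work():
    return sum(range(1000))


# time.time returns a single number of wall-clock seconds.
pr = profile.Profile(time.time)
pr.runctx('work()', globals(), locals())

stats_path = os.path.join(tempfile.mkdtemp(), 'walltime.prof')
pr.dump_stats(stats_path)
pstats.Stats(stats_path).strip_dirs().print_stats('work')
\end{verbatim}

Because the timer returns a single number, this should select the fast
version of the dispatch routine mentioned above.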