________________________________________________________________________
PYBENCH - A Python Benchmark Suite
________________________________________________________________________

Extendable suite of low-level benchmarks for measuring
the performance of the Python implementation
(interpreter, compiler or VM).
pybench is a collection of tests that provides a standardized way to
measure the performance of Python implementations. It takes a very
close look at different aspects of Python programs and lets you
decide which factors are more important to you than others, rather
than wrapping everything up in one number, like other performance
tests do (e.g. pystone, which is included in the Python Standard
Library).

pybench has been used in the past by several Python developers to
track down performance bottlenecks or to demonstrate the impact of
optimizations and new features in Python.
The command line interface for pybench is the file pybench.py. Run
this script with the option '--help' to get a listing of the possible
options. Without options, pybench will simply execute the benchmark
and then print out a report to stdout.
Run 'pybench.py -h' to see the help screen. Run 'pybench.py' to run
the benchmark suite using default settings and 'pybench.py -f <file>'
to have it store the results in a file too.

It is usually a good idea to run pybench.py multiple times to see
whether the environment, timers and benchmark run-times are suitable
for doing benchmark tests.
You can use the comparison feature of pybench.py ('pybench.py -c
<file>') to check how well the system behaves in comparison to a
reference run.
If the differences are well below 10% for each test, then you have a
system that is well suited for benchmark testing. If you get random
differences of more than 10%, or significant differences between the
values for minimum and average time, then you likely have some
background processes running which cause the readings to become
inconsistent. Examples include: web browsers, email clients, RSS
readers, music players, backup programs, etc.
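The 10% rule of thumb above is easy to apply mechanically. Here is a
minimal sketch (a hypothetical helper, not part of pybench) that flags
tests whose average run-time deviates from the minimum run-time by
more than a given threshold:

```python
# Hypothetical helper (not part of pybench): flag benchmark readings
# whose average deviates from the minimum by more than `threshold`,
# which usually points at background processes disturbing the run.

def unstable_tests(results, threshold=0.10):
    """results maps test name -> (minimum_time, average_time) in seconds."""
    flagged = []
    for name, (minimum, average) in results.items():
        # Relative deviation of the average from the best observed time.
        if (average - minimum) / minimum > threshold:
            flagged.append(name)
    return flagged

readings = {
    'CompareIntegers':   (0.137, 0.138),  # ~0.7% deviation: stable
    'UnicodeProperties': (0.115, 0.153),  # ~33% deviation: unstable
}
print(unstable_tests(readings))  # -> ['UnicodeProperties']
```

If such a check flags many tests, stop the background applications and
rerun the suite rather than trusting the numbers.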
If you are only interested in a few tests of the whole suite, you can
use the filtering option, e.g. 'pybench.py -t string' will only
run/show the tests that have 'string' in their name.

This is the current output of pybench.py --help:
------------------------------------------------------------------------
PYBENCH - a benchmark test suite for Python interpreters/compilers.
------------------------------------------------------------------------

pybench.py [option] files...

Options and default settings:
  -n arg           number of rounds (10)
  -f arg           save benchmark to file arg ()
  -c arg           compare benchmark with the one in file arg ()
  -s arg           show benchmark in file arg, then exit ()
  -w arg           set warp factor to arg (10)
  -t arg           run only tests with names matching arg ()
  -C arg           set the number of calibration runs to arg (20)
  -d               hide noise in comparisons (0)
  -v               verbose output (not recommended) (0)
  --with-gc        enable garbage collection (0)
  --with-syscheck  use default sys check interval (0)
  --timer arg      use given timer (time.time)
  -h               show this help text
  --help           show this help text
  --debug          enable debugging
  --copyright      show copyright
  --examples       show examples of usage
The normal operation is to run the suite and display the
results. Use -f to save them for later reuse or comparisons.

python2.1 pybench.py -f p21.pybench
python2.5 pybench.py -f p25.pybench
python pybench.py -s p25.pybench -c p21.pybench
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
* disabled garbage collection
* system check interval set to maximum: 2147483647
* using timer: time.time

Calibrating tests. Please wait...

Running 10 round(s) of the suite at warp factor 10:

* Round 1 done in 6.388 seconds.
* Round 2 done in 6.485 seconds.
* Round 3 done in 6.786 seconds.
* Round 10 done in 6.546 seconds.

-------------------------------------------------------------------------------
Benchmark: 2006-06-12 12:09:25
-------------------------------------------------------------------------------

Platform ID: Linux-2.6.8-24.19-default-x86_64-with-SuSE-9.2-x86-64
Executable:  /usr/local/bin/python
Compiler:    GCC 3.3.4 (pre 3.3.5 20040809)
Build:       Oct 1 2005 15:24:35 (#1)
Test                       minimum  average operation  overhead
-------------------------------------------------------------------------------
BuiltinFunctionCalls:        126ms    145ms    0.28us   0.274ms
BuiltinMethodLookup:         124ms    130ms    0.12us   0.316ms
CompareFloats:               109ms    110ms    0.09us   0.361ms
CompareFloatsIntegers:       100ms    104ms    0.12us   0.271ms
CompareIntegers:             137ms    138ms    0.08us   0.542ms
CompareInternedStrings:      124ms    127ms    0.08us   1.367ms
CompareLongs:                100ms    104ms    0.10us   0.316ms
CompareStrings:              111ms    115ms    0.12us   0.929ms
CompareUnicode:              108ms    128ms    0.17us   0.693ms
ConcatStrings:               142ms    155ms    0.31us   0.562ms
ConcatUnicode:               119ms    127ms    0.42us   0.384ms
CreateInstances:             123ms    128ms    1.14us   0.367ms
CreateNewInstances:          121ms    126ms    1.49us   0.335ms
CreateStringsWithConcat:     130ms    135ms    0.14us   0.916ms
CreateUnicodeWithConcat:     130ms    135ms    0.34us   0.361ms
DictCreation:                108ms    109ms    0.27us   0.361ms
DictWithFloatKeys:           149ms    153ms    0.17us   0.678ms
DictWithIntegerKeys:         124ms    126ms    0.11us   0.915ms
DictWithStringKeys:          114ms    117ms    0.10us   0.905ms
ForLoops:                    110ms    111ms    4.46us   0.063ms
IfThenElse:                  118ms    119ms    0.09us   0.685ms
ListSlicing:                 116ms    120ms    8.59us   0.103ms
NestedForLoops:              125ms    137ms    0.09us   0.019ms
NormalClassAttribute:        124ms    136ms    0.11us   0.457ms
NormalInstanceAttribute:     110ms    117ms    0.10us   0.454ms
PythonFunctionCalls:         107ms    113ms    0.34us   0.271ms
PythonMethodCalls:           140ms    149ms    0.66us   0.141ms
Recursion:                   156ms    166ms    3.32us   0.452ms
SecondImport:                112ms    118ms    1.18us   0.180ms
SecondPackageImport:         118ms    127ms    1.27us   0.180ms
SecondSubmoduleImport:       140ms    151ms    1.51us   0.180ms
SimpleComplexArithmetic:     128ms    139ms    0.16us   0.361ms
SimpleDictManipulation:      134ms    136ms    0.11us   0.452ms
SimpleFloatArithmetic:       110ms    113ms    0.09us   0.571ms
SimpleIntFloatArithmetic:    106ms    111ms    0.08us   0.548ms
SimpleIntegerArithmetic:     106ms    109ms    0.08us   0.544ms
SimpleListManipulation:      103ms    113ms    0.10us   0.587ms
SimpleLongArithmetic:        112ms    118ms    0.18us   0.271ms
SmallLists:                  105ms    116ms    0.17us   0.366ms
SmallTuples:                 108ms    128ms    0.24us   0.406ms
SpecialClassAttribute:       119ms    136ms    0.11us   0.453ms
SpecialInstanceAttribute:    143ms    155ms    0.13us   0.454ms
StringMappings:              115ms    121ms    0.48us   0.405ms
StringPredicates:            120ms    129ms    0.18us   2.064ms
StringSlicing:               111ms    127ms    0.23us   0.781ms
TryExcept:                   125ms    126ms    0.06us   0.681ms
TryRaiseExcept:              133ms    137ms    2.14us   0.361ms
TupleSlicing:                117ms    120ms    0.46us   0.066ms
UnicodeMappings:             156ms    160ms    4.44us   0.429ms
UnicodePredicates:           117ms    121ms    0.22us   2.487ms
UnicodeProperties:           115ms    153ms    0.38us   2.070ms
UnicodeSlicing:              126ms    129ms    0.26us   0.689ms
-------------------------------------------------------------------------------
Totals:                     6283ms   6673ms
________________________________________________________________________
Writing New Tests
________________________________________________________________________

pybench tests are simple modules defining one or more pybench.Test
subclasses.

Writing a test essentially boils down to providing two methods:
.test(), which runs .rounds rounds of .operations test operations
each, and .calibrate(), which does the same except that it doesn't
actually execute the operations.
Example:

from pybench import Test

class IntegerCounting(Test):

    # Version number of the test as float (x.yy); this is important
    # for comparisons of benchmark runs - tests with unequal version
    # number will not get compared.
    version = 2.0

    # The number of abstract operations done in each round of the
    # test. An operation is the basic unit of what you want to
    # measure. The benchmark will output the amount of run-time per
    # operation. Note that in order to raise the measured timings
    # significantly above noise level, it is often required to repeat
    # sets of operations more than once per test round. The measured
    # overhead per test round should be less than 1 second.
    operations = 20

    # Number of rounds to execute per test run. This should be
    # adjusted to a figure that results in a test run-time of between
    # 1-2 seconds (at warp 1).
    rounds = 100000

    def test(self):

        """ Run the test.

            The test needs to run self.rounds executing
            self.operations number of operations each.
        """
        # Init the test
        a = 1

        # Run test rounds
        #
        # NOTE: Use xrange() for all test loops unless you want to
        # face a 20MB process!
        for i in xrange(self.rounds):

            # Repeat the operations per round to raise the run-time
            # per operation significantly above the noise level of the
            # for-loop overhead.

            # Execute 20 operations (a += 1):
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1

    def calibrate(self):

        """ Calibrate the test.

            This method should execute everything that is needed to
            setup and run the test - except for the actual operations
            that you intend to measure. pybench uses this method to
            measure the test implementation overhead.
        """
        # Init the test
        a = 1

        # Run test rounds (without actually doing any operation)
        for i in xrange(self.rounds):

            # Skip the actual execution of the operations, since we
            # only want to measure the test's administration overhead.
            pass
Registering a new test module
-----------------------------

To register a test module with pybench, the classes need to be
imported into the pybench.Setup module. pybench will then scan all the
symbols defined in that module for subclasses of pybench.Test and
automatically add them to the benchmark suite.
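The scanning step can be pictured with a short sketch (an illustration
of the idea only; `find_tests` is a hypothetical name and this is not
pybench's actual implementation):

```python
# Illustration of scanning a module namespace for Test subclasses.
# Hypothetical stand-in code, not pybench's actual implementation.
import inspect

class Test:
    """Stand-in for pybench.Test."""

class IntegerCounting(Test):
    pass

def find_tests(namespace):
    """Return every class in the namespace that subclasses Test."""
    return [obj for obj in namespace.values()
            if inspect.isclass(obj)
            and issubclass(obj, Test)
            and obj is not Test]

print([cls.__name__ for cls in find_tests(dict(globals()))])
```

Because discovery works on the module's symbols, an `import *` of a
test module into the Setup module is enough to register every test it
defines.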
Breaking Comparability
----------------------

If a change is made to any individual test that means it is no
longer strictly comparable with previous runs, the '.version' class
variable should be updated. Thereafter, comparisons with previous
versions of the test will be listed as "n/a" to reflect the change.
2.0: rewrote parts of pybench which resulted in more repeatable
     timings:
     - made timer a parameter
     - changed the platform default timer to use high-resolution
       timers rather than process timers (which have a much lower
       resolution)
     - added option to select timer
     - added process time timer (using systimes.py)
     - changed to use min() as timing estimator (the average
       is still taken as well to provide an idea of the difference)
     - garbage collection is turned off by default
     - sys check interval is set to the highest possible value
     - calibration is now a separate step and done using
       a different strategy that allows measuring the test
       overhead more accurately
     - modified the tests to each give a run-time of between
       100-200ms using warp 10
     - changed default warp factor to 10 (from 20)
     - compared results with timeit.py and confirmed measurements
     - bumped all test versions to 2.0
     - updated platform.py to the latest version
     - changed the output format a bit to make it look
       nicer
     - refactored the APIs somewhat
1.3+: Steve Holden added the NewInstances test and the filtering
      option during the NeedForSpeed sprint; this also triggered a
      long discussion on how to improve benchmark timing and finally
      resulted in the release of 2.0
1.3: initial checkin into the Python SVN repository