________________________________________________________________________
PYBENCH - A Python Benchmark Suite
________________________________________________________________________

Extendable suite of low-level benchmarks for measuring
the performance of the Python implementation
(interpreter, compiler or VM).
pybench is a collection of tests that provides a standardized way to
measure the performance of Python implementations. It takes a very
close look at different aspects of Python programs and lets you
decide which factors are more important to you than others, rather
than wrapping everything up in one number, like other performance
tests do (e.g. pystone, which is included in the Python Standard
Library).

pybench has been used in the past by several Python developers to
track down performance bottlenecks or to demonstrate the impact of
optimizations and new features in Python.
The command line interface for pybench is the file pybench.py. Run
this script with the option '--help' to get a listing of the possible
options. Without options, pybench will simply execute the benchmark
and then print out a report to stdout.
Run 'pybench.py -h' to see the help screen. Run 'pybench.py' to run
the benchmark suite using default settings and 'pybench.py -f <file>'
to have it store the results in a file too.

It is usually a good idea to run pybench.py multiple times to see
whether the environment, timers and benchmark run-times are suitable
for doing benchmark tests.
You can use the comparison feature of pybench.py ('pybench.py -c
<file>') to check how well the system behaves in comparison to a
reference run.
If the differences are well below 10% for each test, then you have a
system that is well suited for benchmark testing. If you get random
differences of more than 10%, or significant differences between the
values for minimum and average time, then you likely have some
background processes running which cause the readings to become
inconsistent. Examples include: web browsers, email clients, RSS
readers, music players, backup programs, etc.
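The 10% rule of thumb above is easy to apply mechanically. Here is a
minimal sketch (a hypothetical helper, not part of pybench) that flags
tests whose average run-time deviates from the minimum run-time by
more than a given threshold:

```python
# Hypothetical helper (not part of pybench): flag benchmark readings
# whose average deviates from the minimum by more than `threshold`,
# which usually points at background processes disturbing the run.

def unstable_tests(results, threshold=0.10):
    """results maps test name -> (minimum_time, average_time) in seconds."""
    flagged = []
    for name, (minimum, average) in results.items():
        # Relative deviation of the average from the best observed time.
        if (average - minimum) / minimum > threshold:
            flagged.append(name)
    return flagged

readings = {
    'CompareIntegers':   (0.137, 0.138),  # ~0.7% deviation: stable
    'UnicodeProperties': (0.115, 0.153),  # ~33% deviation: unstable
}
print(unstable_tests(readings))  # -> ['UnicodeProperties']
```

If such a check flags many tests, stop the background applications and
rerun the suite rather than trusting the numbers.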
If you are only interested in a few tests of the whole suite, you can
use the filtering option, e.g. 'pybench.py -t string' will only
run/show the tests that have 'string' in their name.

This is the current output of pybench.py --help:
------------------------------------------------------------------------
PYBENCH - a benchmark test suite for Python interpreters/compilers.
------------------------------------------------------------------------

pybench.py [option] files...

Options and default settings:
  -n arg           number of rounds (10)
  -f arg           save benchmark to file arg ()
  -c arg           compare benchmark with the one in file arg ()
  -s arg           show benchmark in file arg, then exit ()
  -w arg           set warp factor to arg (10)
  -t arg           run only tests with names matching arg ()
  -C arg           set the number of calibration runs to arg (20)
  -d               hide noise in comparisons (0)
  -v               verbose output (not recommended) (0)
  --with-gc        enable garbage collection (0)
  --with-syscheck  use default sys check interval (0)
  --timer arg      use given timer (time.time)
  -h               show this help text
  --help           show this help text
  --debug          enable debugging
  --copyright      show copyright
  --examples       show examples of usage
The normal operation is to run the suite and display the
results. Use -f to save them for later reuse or comparisons.

python2.1 pybench.py -f p21.pybench
python2.5 pybench.py -f p25.pybench
python pybench.py -s p25.pybench -c p21.pybench
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
* disabled garbage collection
* system check interval set to maximum: 2147483647
* using timer: time.time

Calibrating tests. Please wait...

Running 10 round(s) of the suite at warp factor 10:

* Round 1 done in 6.388 seconds.
* Round 2 done in 6.485 seconds.
* Round 3 done in 6.786 seconds.
* Round 10 done in 6.546 seconds.

-------------------------------------------------------------------------------
Benchmark: 2006-06-12 12:09:25
-------------------------------------------------------------------------------

Platform ID: Linux-2.6.8-24.19-default-x86_64-with-SuSE-9.2-x86-64
Executable:  /usr/local/bin/python
Compiler:    GCC 3.3.4 (pre 3.3.5 20040809)
Build:       Oct 1 2005 15:24:35 (#1)
Test                       minimum  average operation  overhead
-------------------------------------------------------------------------------
BuiltinFunctionCalls:        126ms    145ms    0.28us   0.274ms
BuiltinMethodLookup:         124ms    130ms    0.12us   0.316ms
CompareFloats:               109ms    110ms    0.09us   0.361ms
CompareFloatsIntegers:       100ms    104ms    0.12us   0.271ms
CompareIntegers:             137ms    138ms    0.08us   0.542ms
CompareInternedStrings:      124ms    127ms    0.08us   1.367ms
CompareLongs:                100ms    104ms    0.10us   0.316ms
CompareStrings:              111ms    115ms    0.12us   0.929ms
CompareUnicode:              108ms    128ms    0.17us   0.693ms
ConcatStrings:               142ms    155ms    0.31us   0.562ms
ConcatUnicode:               119ms    127ms    0.42us   0.384ms
CreateInstances:             123ms    128ms    1.14us   0.367ms
CreateNewInstances:          121ms    126ms    1.49us   0.335ms
CreateStringsWithConcat:     130ms    135ms    0.14us   0.916ms
CreateUnicodeWithConcat:     130ms    135ms    0.34us   0.361ms
DictCreation:                108ms    109ms    0.27us   0.361ms
DictWithFloatKeys:           149ms    153ms    0.17us   0.678ms
DictWithIntegerKeys:         124ms    126ms    0.11us   0.915ms
DictWithStringKeys:          114ms    117ms    0.10us   0.905ms
ForLoops:                    110ms    111ms    4.46us   0.063ms
IfThenElse:                  118ms    119ms    0.09us   0.685ms
ListSlicing:                 116ms    120ms    8.59us   0.103ms
NestedForLoops:              125ms    137ms    0.09us   0.019ms
NormalClassAttribute:        124ms    136ms    0.11us   0.457ms
NormalInstanceAttribute:     110ms    117ms    0.10us   0.454ms
PythonFunctionCalls:         107ms    113ms    0.34us   0.271ms
PythonMethodCalls:           140ms    149ms    0.66us   0.141ms
Recursion:                   156ms    166ms    3.32us   0.452ms
SecondImport:                112ms    118ms    1.18us   0.180ms
SecondPackageImport:         118ms    127ms    1.27us   0.180ms
SecondSubmoduleImport:       140ms    151ms    1.51us   0.180ms
SimpleComplexArithmetic:     128ms    139ms    0.16us   0.361ms
SimpleDictManipulation:      134ms    136ms    0.11us   0.452ms
SimpleFloatArithmetic:       110ms    113ms    0.09us   0.571ms
SimpleIntFloatArithmetic:    106ms    111ms    0.08us   0.548ms
SimpleIntegerArithmetic:     106ms    109ms    0.08us   0.544ms
SimpleListManipulation:      103ms    113ms    0.10us   0.587ms
SimpleLongArithmetic:        112ms    118ms    0.18us   0.271ms
SmallLists:                  105ms    116ms    0.17us   0.366ms
SmallTuples:                 108ms    128ms    0.24us   0.406ms
SpecialClassAttribute:       119ms    136ms    0.11us   0.453ms
SpecialInstanceAttribute:    143ms    155ms    0.13us   0.454ms
StringMappings:              115ms    121ms    0.48us   0.405ms
StringPredicates:            120ms    129ms    0.18us   2.064ms
StringSlicing:               111ms    127ms    0.23us   0.781ms
TryExcept:                   125ms    126ms    0.06us   0.681ms
TryRaiseExcept:              133ms    137ms    2.14us   0.361ms
TupleSlicing:                117ms    120ms    0.46us   0.066ms
UnicodeMappings:             156ms    160ms    4.44us   0.429ms
UnicodePredicates:           117ms    121ms    0.22us   2.487ms
UnicodeProperties:           115ms    153ms    0.38us   2.070ms
UnicodeSlicing:              126ms    129ms    0.26us   0.689ms
-------------------------------------------------------------------------------
Totals:                     6283ms   6673ms
________________________________________________________________________
Writing New Tests
________________________________________________________________________

pybench tests are simple modules defining one or more pybench.Test
subclasses.

Writing a test essentially boils down to providing two methods:
.test(), which runs .rounds rounds of .operations test operations
each, and .calibrate(), which does the same except that it doesn't
actually execute the operations.
Example:

from pybench import Test

class IntegerCounting(Test):

    # Version number of the test as float (x.yy); this is important
    # for comparisons of benchmark runs - tests with unequal version
    # number will not get compared.
    version = 2.0

    # The number of abstract operations done in each round of the
    # test. An operation is the basic unit of what you want to
    # measure. The benchmark will output the amount of run-time per
    # operation. Note that in order to raise the measured timings
    # significantly above noise level, it is often required to repeat
    # sets of operations more than once per test round. The measured
    # overhead per test round should be less than 1 second.
    operations = 20

    # Number of rounds to execute per test run. This should be
    # adjusted to a figure that results in a test run-time of between
    # 1-2 seconds (at warp 1).
    rounds = 100000

    def test(self):

        """ Run the test.

            The test needs to run self.rounds executing
            self.operations number of operations each.
        """
        # Init the test
        a = 1

        # Run test rounds
        #
        # NOTE: Use xrange() for all test loops unless you want to
        # face a 20MB process!
        for i in xrange(self.rounds):

            # Repeat the operations per round to raise the run-time
            # per operation significantly above the noise level of the
            # for-loop overhead.

            # Execute 20 operations (a += 1):
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1

    def calibrate(self):

        """ Calibrate the test.

            This method should execute everything that is needed to
            setup and run the test - except for the actual operations
            that you intend to measure. pybench uses this method to
            measure the test implementation overhead.
        """
        # Init the test
        a = 1

        # Run test rounds (without actually doing any operation)
        for i in xrange(self.rounds):

            # Skip the actual execution of the operations, since we
            # only want to measure the test's administration overhead.
            pass
Registering a new test module
-----------------------------

To register a test module with pybench, the classes need to be
imported into the pybench.Setup module. pybench will then scan all the
symbols defined in that module for subclasses of pybench.Test and
automatically add them to the benchmark suite.
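The scanning step can be pictured with a short sketch (an illustration
of the idea only; `find_tests` is a hypothetical name and this is not
pybench's actual implementation):

```python
# Illustration of scanning a module namespace for Test subclasses.
# Hypothetical stand-in code, not pybench's actual implementation.
import inspect

class Test:
    """Stand-in for pybench.Test."""

class IntegerCounting(Test):
    pass

def find_tests(namespace):
    """Return every class in the namespace that subclasses Test."""
    return [obj for obj in namespace.values()
            if inspect.isclass(obj)
            and issubclass(obj, Test)
            and obj is not Test]

print([cls.__name__ for cls in find_tests(dict(globals()))])
```

Because discovery works on the module's symbols, an `import *` of a
test module into the Setup module is enough to register every test it
defines.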
Breaking Comparability
----------------------

If a change is made to any individual test that means it is no
longer strictly comparable with previous runs, the '.version' class
variable should be updated. Thereafter, comparisons with previous
versions of the test will be listed as "n/a" to reflect the change.
2.0: rewrote parts of pybench which resulted in more repeatable
     timings:
     - made timer a parameter
     - changed the platform default timer to use high-resolution
       timers rather than process timers (which have a much lower
       resolution)
     - added option to select timer
     - added process time timer (using systimes.py)
     - changed to use min() as timing estimator (the average
       is still taken as well to provide an idea of the difference)
     - garbage collection is turned off by default
     - sys check interval is set to the highest possible value
     - calibration is now a separate step and done using
       a different strategy that allows measuring the test
       overhead more accurately
     - modified the tests to each give a run-time of between
       100-200ms using warp 10
     - changed default warp factor to 10 (from 20)
     - compared results with timeit.py and confirmed measurements
     - bumped all test versions to 2.0
     - updated platform.py to the latest version
     - changed the output format a bit to make it look
       nicer
     - refactored the APIs somewhat
1.3+: Steve Holden added the NewInstances test and the filtering
      option during the NeedForSpeed sprint; this also triggered a
      long discussion on how to improve benchmark timing and finally
      resulted in the release of 2.0
1.3: initial checkin into the Python SVN repository