perf: factor common harness code out of the benchmarks
The aim of this refactoring was to remove code duplicated across all
benchmarks that was responsible for (a) computing a proper workload size
and (b) computing the final statistics.
After the refactoring, each benchmark is quite short and contains only
the work being measured.
The harness code was factored out into perf.c, which now computes the
workload size and then runs the benchmark.
As an extra feature, we pass a stopwatch_t into the benchmark code,
which is only responsible for starting and stopping it. The duration is
then queried outside the benchmark code.