Improve implementation of cycle subcounting

Configuring with GMX_CYCLE_SUBCOUNTERS on is intended to activate
some counters that show finer-grained timing details, but the previous
preprocessor-based implementation was more complex and more bug-prone
than this one. Static analysis now finds the bug where we over-run buf
(fixed here and in release-5-1).

The only performance requirement for the subcounter implementation is
that it does no work when GMX_CYCLE_SUBCOUNTERS is off, and constant
propagation and dead-code elimination will handle that.

Also moved some declarations into the blocks where they are used.
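
A minimal sketch of the idea, not the code in this change: the build
setting becomes a compile-time constant tested with an ordinary if, so
the guarded code is always compiled (and visible to static analysis)
but can be eliminated when the constant is false. All names other than
GMX_CYCLE_SUBCOUNTERS are hypothetical, and the fallback definition of
the macro stands in for the real build configuration.

```cpp
#include <cassert>

// Assumption: the build system defines GMX_CYCLE_SUBCOUNTERS to 0 or 1.
// This fallback only keeps the sketch self-contained.
#ifndef GMX_CYCLE_SUBCOUNTERS
#    define GMX_CYCLE_SUBCOUNTERS 0
#endif

// The preprocessor value is captured once as a compile-time constant.
constexpr bool useCycleSubcounters = (GMX_CYCLE_SUBCOUNTERS != 0);

// Hypothetical subcounter indices; the last entry is the count.
enum SubCounter
{
    subNonbonded,
    subBonded,
    subCount
};

// Hypothetical storage for accumulated subcounter cycles.
struct CycleCounts
{
    long long sub[subCount];
};

// Accumulate cycles into a subcounter. With the constant false, the
// compiler can propagate it and drop the body as dead code, so no
// work is done at run time.
void subcounterAdd(CycleCounts* counts, int index, long long cycles)
{
    if (useCycleSubcounters)
    {
        // Always compiled, so an out-of-range index is checkable
        // regardless of the build setting.
        assert(index >= 0 && index < subCount);
        counts->sub[index] += cycles;
    }
}
```

Unlike an #ifdef block, the body of the if is parsed and type-checked
in every configuration, which is what lets tools inspect it even when
subcounters are off.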
Change-Id: I3d7a06a65c636c11557a094997a7a81f86a1ed8a