Separate wallcycle responsibilities
Collecting cycle counts, scaling them by thread count, summing them
over ranks, and printing the results are responsibilities worth
separating. This commit removes several fields from gmx_wallcycle
that don't relate to collecting cycle counts.
Specifically,
* the summation code no longer handles scaling by thread
count, for which there is a new function.
* the summation code returns the summed cycle counts for printing,
rather than keeping them in the cycle-counting struct
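As a rough illustration of the split described above, the sketch below
keeps collection, scaling, and summation apart. All names in it
(CycleCounter, scaleByThreadCount, sumOverRanks) are hypothetical and
are not the identifiers used in this change. It also assumes that
scaling means multiplying each count by the thread count, and it
stands in for the cross-rank reduction with a plain loop over per-rank
data.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the cycle-counting struct: it only
// holds collected counts, one entry per counting region.
struct CycleCounter
{
    std::vector<double> cycles;
};

// Scaling is its own step, separate from summation.
// Assumption: scaling multiplies each count by the thread count.
void scaleByThreadCount(CycleCounter* counter, int numThreads)
{
    for (double& c : counter->cycles)
    {
        c *= numThreads;
    }
}

// Summation returns the totals to the caller for printing instead of
// storing them back into the counting struct. The loop over perRank
// is a placeholder for a reduction over ranks.
std::vector<double> sumOverRanks(const std::vector<CycleCounter>& perRank)
{
    std::vector<double> total;
    for (const CycleCounter& rank : perRank)
    {
        if (total.size() < rank.cycles.size())
        {
            total.resize(rank.cycles.size(), 0.0);
        }
        for (std::size_t i = 0; i < rank.cycles.size(); ++i)
        {
            total[i] += rank.cycles[i];
        }
    }
    return total;
}
```

With this shape, the printing code only needs the returned vector, and
the counting struct carries no summation or reporting state.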
Noted TODOs:
* ewcWAIT_GPU_NB_L is used for load balancing and never for
reporting, so it should have a different implementation
* reducing haveInvalidCount at the same time as counter summation
has been buggy and is needless complexity
* wallcycle_sum needs to return both rank-0 and global data in
a way that makes it clearer how they should be used
* finish_run can be cleaned up and moved back to runner.cpp
* scaling counters by thread counts should be the
responsibility of the code that opened the counting region
Also removed an unused field from wallcc_t.
Change-Id: Ieefdb3118c5de539debc3a48426fc0461182f5fe