Restructure the load balancing timing
The load balancing region is now set by a (variable) start and end
point. This is much simpler than the tedious addition of timings of
many small regions during the force communication over multiple files.
The disadvantage of the current implementation is that one needs to
place a call to ddReopenBalanceRegionCpu() after every communication
call that can occur in the balancing region.
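
As an illustration only, the start/reopen/end pattern described above
could be sketched as follows. The class BalanceRegionSketch and its
members are hypothetical names and are not taken from the changed
files. The sketch assumes that reopening simply restarts the
measurement, so that time spent waiting in communication is not
counted as load.

```cpp
#include <chrono>

// Hypothetical sketch of a balancing region with a variable start and
// end point; this is not the code from this change.
class BalanceRegionSketch
{
public:
    using Clock = std::chrono::steady_clock;

    // Mark the start of the region in which load is measured.
    void open()
    {
        isOpen_ = true;
        start_  = Clock::now();
    }

    // Intended to be called directly after a communication call inside
    // the region: waiting for other ranks is not this rank's load, so
    // the measurement restarts from this point.
    void reopen()
    {
        if (isOpen_)
        {
            start_ = Clock::now();
        }
    }

    // Mark the end of the region and accumulate the elapsed time.
    void close()
    {
        if (isOpen_)
        {
            total_ += Clock::now() - start_;
            isOpen_ = false;
        }
    }

    bool isOpen() const { return isOpen_; }

    // Total measured time in seconds over all closed regions.
    double seconds() const
    {
        return std::chrono::duration<double>(total_).count();
    }

private:
    bool              isOpen_ = false;
    Clock::time_point start_{};
    Clock::duration   total_ = Clock::duration::zero();
};
```

In this sketch a missing reopen() after a blocking communication call
would count the wait time as load, which is the disadvantage noted
above.
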
This change should avoid instabilities in DLB caused by, e.g., more
time being measured (but also being available) when using GPUs, and by
ranks with less load starting earlier because their constraints finish
earlier.
Change-Id: Idf73c3367adc269def533dfabf27df2ba4f6834f
19 files changed: