pack-objects: fix threaded load balancing

The current method consists of a master thread serving chunks of objects
to worker threads as they finish their previous chunk. The difficulty
is choosing the chunk size: making it too large gives poor load
balancing, while making it too small hurts pack size because of the
increased number of chunk boundaries and poor delta window utilization.

This patch implements a completely different approach: the work is
initially split into large chunks distributed uniformly amongst all
threads, and whenever a thread finishes its chunk it steals half of the
remaining work from the thread with the largest amount of unprocessed
objects.

This has the advantage of greatly reducing the number of chunk
boundaries while achieving almost perfect load balancing.
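
As a rough illustration only (this is not the code in the patch), the
split-then-steal scheme described above could be sketched in C as
follows. The struct and function names are hypothetical, and the locking
that a threaded implementation needs around the steal is left out.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical per-thread state: the contiguous slice
 * [start, start + remaining) of the object list that this
 * thread has not processed yet.
 */
struct work_slice {
	size_t start;
	size_t remaining;
};

/* Initial split: divide nr_objects as evenly as possible among n threads. */
static void split_evenly(struct work_slice *w, size_t n, size_t nr_objects)
{
	size_t i, next = 0;

	assert(n > 0);
	for (i = 0; i < n; i++) {
		size_t share = nr_objects / n + (i < nr_objects % n ? 1 : 0);

		w[i].start = next;
		w[i].remaining = share;
		next += share;
	}
}

/*
 * Called when thread `idle` has run out of work. Picks the thread with
 * the most unprocessed objects and hands the second half of its slice
 * to `idle`. Returns the number of objects stolen, or 0 if no other
 * thread has at least two objects left. A real implementation would
 * hold a lock while doing this.
 */
static size_t steal_half(struct work_slice *w, size_t n, size_t idle)
{
	struct work_slice *victim = NULL;
	size_t i, stolen;

	for (i = 0; i < n; i++) {
		if (i == idle)
			continue;
		if (!victim || w[i].remaining > victim->remaining)
			victim = &w[i];
	}
	if (!victim || victim->remaining < 2)
		return 0;

	stolen = victim->remaining / 2;
	victim->remaining -= stolen;
	w[idle].start = victim->start + victim->remaining;
	w[idle].remaining = stolen;
	return stolen;
}
```

In this sketch the idle thread takes the tail of the victim's slice, so
both halves stay contiguous and each steal adds only one new chunk
boundary.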
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>