gc.sh: overhaul how forks are handled during gc
If the project we are running garbage collection (gc) on has any forks,
we must be careful not to remove any objects that, while no longer
referenced by the project being gc'd (the parent), are still referenced
by one or more forks (the children). Otherwise the children will become
corrupt, and we can't abide corrupt children.
One way to accomplish this is to hard-link all currently existing loose
objects and packs in the parent into all the children that refer to the
parent (via a line in their objects/info/alternates file) before
beginning the gc operation, and then rely on a subsequent gc in each
child to clean up any excess objects/packs. We used to use this
strategy, but it's very inefficient because:
1. The disk space used by the old pack(s)/object(s) will not be
reclaimed until all children (and their children, if any) run gc, by
which time it's quite possible the topmost parent will have run gc
again and hard-linked yet another old pack down to its children
(not to mention loose objects).
2. As we are now using the "-A" option with "git repack", any new
objects in the parent that are not referenced by children will
continually get exploded out of the hard-linked pack in the
children whenever the children run gc.
3. To avoid suboptimal and/or unnecessarily many packs being
hard-linked into child forks, we must run the "mini" gc maintenance
before we perform the hard-linking into the children, which is
yet another source of inefficiency.
Since we are now using the "-A" option to "git repack" (that was not
always the case) to guarantee we can access old ref values for long
enough to send out a meaningful mail.sh notification, we now have
another, more efficient, option available to prevent corruption of child
forks that continue to refer to objects that are no longer reachable
from any ref in the parent.
The only things that need be copied (or hard-linked) into the child
fork(s) are those objects that have become unreachable from any ref in
the parent. They are the only things that could ever be removed by "git
prune" and therefore the only things we need to prevent the loss of in
order to avoid corruption of the child fork(s).
Therefore change the way we handle forks during gc to use the following
strategy instead, which avoids excessive disk use and lots of
unnecessary loose objects in child forks:
1. Run "git repack -A -d -l" in the parent BEFORE doing anything about
child forks.
2. Collect all remaining existing loose objects in the parent into a
single pack BEFORE running "git prune" in the parent and, if that
pack is not empty, hard-link it into the immediate children.
3. Now run "git prune" in the parent.
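
The three steps above can be sketched roughly as follows. This is only
an illustration under stated assumptions, not the code in gc.sh: the
function names, the "forkpack-tmp" temporary base name, and passing the
child git directories as arguments are all hypothetical, and locking,
error recovery and discovery of the children are left out. Hard links
also require the parent and children to live on the same filesystem.

```shell
#!/bin/sh
# Hypothetical sketch of the new strategy; not the actual gc.sh code.

# Print the full object name of every loose object under objects dir "$1".
list_loose_objects() {
	_objdir="$1"
	for _f in "$_objdir"/[0-9a-f][0-9a-f]/*; do
		[ -f "$_f" ] || continue   # glob matched nothing, or not a file
		_name="${_f##*/}"
		_dir="${_f%/*}"; _dir="${_dir##*/}"
		# skip anything that is not pure hex (e.g. tmp_obj_* leftovers)
		case "$_name" in *[!0-9a-f]*) continue;; esac
		printf '%s%s\n' "$_dir" "$_name"
	done
}

# gc_parent PARENT_GIT_DIR [CHILD_GIT_DIR...]   (hypothetical helper)
gc_parent() {
	_parent="$1"; shift
	# Step 1: repack the parent before touching any child fork.
	git --git-dir="$_parent" repack -A -d -l || return 1
	# Step 2: pack the remaining loose objects; --non-empty means no
	# pack is written (and no name is printed) when there are none.
	_tmp="$_parent/forkpack-tmp"   # assumed temporary base name
	_hash="$(list_loose_objects "$_parent/objects" |
		git --git-dir="$_parent" pack-objects -q --non-empty "$_tmp")" ||
		return 1
	if [ -n "$_hash" ]; then
		for _child in "$@"; do
			# link the .pack before the .idx so the child never
			# sees an index without its pack
			for _ext in pack idx; do
				ln -f "$_tmp-$_hash.$_ext" \
					"$_child/objects/pack/pack-$_hash.$_ext" || return 1
			done
		done
		rm -f "$_tmp-$_hash.pack" "$_tmp-$_hash.idx"
	fi
	# Step 3: only now is it safe to prune the parent.
	git --git-dir="$_parent" prune
}
```

With these definitions, something like "gc_parent /srv/parent.git
/srv/parent/child.git" (example paths only) would run the whole
sequence. Any loose object created between steps 1 and 2 simply ends up
in the linked pack as well, which costs a little space but is harmless.
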
With this new strategy we avoid the need to run any "mini" gc
maintenance before copying (or hard-linking) anything down to the child
forks. Furthermore, only when the parent performs a non-fast-forward
update will anything ever be transferred to the children, leaving them
unperturbed in the vast majority of cases. Finally, even if the parent
references objects the children do not, those objects will no longer
continually end up in the children as unreachable loose objects after
the children run gc.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>