Awking Awesome Acceleration!
The TopGit patch dependency graph is a Directed Acyclic Graph (DAG)
embedded within the Git repository via ".topdeps" files located at
the top level of each branch that has a same-named counterpart
under top-bases.
Most of the TopGit functions need to traverse this TopGit DAG in one
way or another. However, since the DAG itself is external to Git
(even though it's stored in a Git repository), none of the Git
commands can be used for traversal since they know nothing about
the semantics of ".topdeps" files.
Unfortunately, this has resulted in poor performance when traversing
even moderately complex TopGit DAGs. Various auto-maintaining
caches have been added that do result in a nearly 10x speedup
when the cache is up-to-date (which it tends to be just from normal
TopGit operations).
But it's still rather slow and only just bearable.
However, combining the '%(rest)' atom in the format string for
"git cat-file --batch-check" and "git cat-file --batch" (introduced
with Git version 1.8.5) with POSIX awk scripts yields a further 10x
speedup over the fully cached version. No cache is required for
this approach, which means it provides approximately a 100x speedup
over the uncached version!
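As a rough illustration of the batch + awk idea (a minimal sketch,
not the actual TopGit scripts), the following reads every branch's
".topdeps" blob through a single "git cat-file --batch" process and
lets awk turn the stream into "branch dependency" pairs. The helper
names parse_topdeps_stream and list_topdeps_edges are hypothetical,
and the sketch assumes every local branch may be inspected; branches
without a ".topdeps" file simply produce a "missing" line that is
skipped.

```shell
# Hypothetical helper: parse the output of
#   git cat-file --batch='%(objecttype) %(objectsize) %(rest)'
# on stdin and print one "branch dependency" pair per .topdeps line.
parse_topdeps_stream() {
    # LC_ALL=C makes length() count bytes, matching %(objectsize).
    LC_ALL=C awk '
        # Inside a blob: consume its bytes plus the LF git appends.
        need > 0 {
            need -= length($0) + 1
            if ($0 != "") print branch, $0
            next
        }
        # Header line: "blob <size> <rest>"; here <rest> is the branch name.
        $1 == "blob" { need = $2 + 1; branch = $3 }
        # Anything else (for example "<object> missing") is ignored.
    '
}

# Hypothetical driver: one git process serves all branches.
list_topdeps_edges() {
    git for-each-ref --format='%(refname)' refs/heads |
    sed 's|^refs/heads/\(.*\)$|&:.topdeps \1|' |
    git cat-file --batch='%(objecttype) %(objectsize) %(rest)' |
    parse_topdeps_stream
}
```

Running list_topdeps_edges inside a TopGit repository would print
the edges of the dependency graph without spawning one Git process
per branch, which is where the bulk of the saving is assumed to
come from.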
The awk scripts provided are strictly POSIX-compliant and Git
version 1.8.5 or later has been required by TopGit for a while
now. Additionally, if Git is version 2.6 or later the "--buffer"
option will be passed along to the "git cat-file" command to
further boost performance.
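The version-dependent use of "--buffer" might look something like
the sketch below. The function names version_at_least and
cat_file_batch_check are made up for this illustration and are not
the helpers TopGit actually uses; only the "git version" output
format and the "--buffer" option itself are taken from Git.

```shell
# Hypothetical helper: succeed if dotted version $1 is >= $2.$3.
version_at_least() {
    printf '%s\n' "$1" | awk -v maj="$2" -v min="$3" '{
        split($0, v, ".")
        ok = (v[1] + 0 > maj) || (v[1] + 0 == maj && v[2] + 0 >= min)
        exit !ok
    }'
}

# Hypothetical wrapper: add --buffer only when Git is 2.6 or later.
# $1 is the --batch-check format string.
cat_file_batch_check() {
    ver=$(git version | awk '{ print $3 }')   # "git version X.Y.Z"
    if version_at_least "$ver" 2 6; then
        git cat-file --buffer --batch-check="$1"
    else
        git cat-file --batch-check="$1"
    fi
}
```

A caller would pipe object names into
cat_file_batch_check '%(objectname) %(objecttype) %(rest)' and get
the same output either way; only the flushing behavior differs.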
Awk is fast. Very fast. Some might say "Awking" fast.
The result is that TopGit commands which only need to access the
TopGit DAG are extremely fast.
There are only two areas these scripts do not help with:
1) Out-of-date checks
This is a "branch contains" operation and those results continue
to be cached. The first part of the out-of-date check, however,
involves traversal of the TopGit DAG to determine which refs
need to be passed to Git for the branch contains check. The
time needed for that portion of the operation is practically
eliminated by use of the batch-check + awk combination.
(Provided via the run_awk_topgit_recurse helper function.)
2) Merging
The tg update command performs merges to do updates. The
recent change to do index-only merges with octopus support
where possible provided about a 5x speed boost for moderately
complex TopGit DAGs. Awk cannot help with the actual merging
except that the first part of the update operation is an
out-of-date check, and there is an awk speed boost for that portion.
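The DAG traversal that precedes the "branch contains" check in
item 1 can be pictured with a small awk recursion. This is only a
sketch of the general idea, not run_awk_topgit_recurse itself: the
function name topgit_deps_closure is hypothetical, and the input is
assumed to be "branch dependency" pairs, one per line, already
extracted from the ".topdeps" files.

```shell
# Hypothetical helper: print branch $1 and everything it depends on,
# dependencies first, reading "branch dependency" pairs from stdin.
topgit_deps_closure() {
    awk -v start="$1" '
        # Depth-first walk; n, i and list are local to each call.
        function visit(b,    n, i, list) {
            if (b in seen) return
            seen[b] = 1
            n = split(deps[b], list, " ")
            for (i = 1; i <= n; i++) visit(list[i])
            print b
        }
        # Collect each branch dependency list as a space-separated string.
        { deps[$1] = deps[$1] " " $2 }
        END { visit(start) }
    '
}
```

The resulting list is the set of refs that would then be handed to
Git for the containment test, so awk does the graph walking and Git
is only asked the questions it can actually answer.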
Overall, use of these awk scripts and helper functions to run them
begets an enormous performance boost for TopGit!
The switchover to make full use of these awk scripts (via the helper
functions defined in tg--awksome) will take place piecemeal over the
course of several future changes.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
15 files changed: