xdiff: load full words in the inner loop of xdl_hash_record
Redo the hashing loop in xdl_hash_record in a way that loads an entire
'long' at a time, using masking tricks to see when and where we found
the terminating '\n'.
I stole inspiration and code from the posts by Linus Torvalds around
https://lkml.org/lkml/2012/3/2/452
https://lkml.org/lkml/2012/3/5/6
His method reads the buffers in sizeof(long) increments, and may thus
overrun it by at most sizeof(long)-1 bytes before it sees the final
newline (or hits the buffer length check). I considered padding out
all buffers by a suitable amount to "catch" the overrun, but
* this does not work for mmap()'d buffers: if you map 4096+8 bytes
from a 4096 byte file, accessing the last 8 bytes results in a
SIGBUS on my machine; and
* it would also be extremely ugly because it intrudes deep into the
unpacking machinery.
So I adapted it to not read beyond the buffer at all. Instead, it
reads the final partial word byte-by-byte and strings it together.
Then it can use the same logic as before to finish the hashing.
So far we enable this only on x86_64, where it provides nice speedup
for diff-related work:
Test origin/next tr/xdiff-fast-hash
-----------------------------------------------------------------------------
4000.1: log -3000 (baseline) 0.07(0.05+0.02) 0.08(0.06+0.02) +14.3%
4000.2: log --raw -3000 (tree-only) 0.37(0.33+0.04) 0.37(0.32+0.04) +0.0%
4000.3: log -p -3000 (Myers) 1.75(1.65+0.09) 1.60(1.49+0.10) -8.6%
4000.4: log -p -3000 --histogram 1.73(1.62+0.09) 1.58(1.49+0.08) -8.7%
4000.5: log -p -3000 --patience 2.11(2.00+0.10) 1.94(1.80+0.11) -8.1%
Perhaps other platforms could also benefit. However it does NOT work
on big-endian systems!
[jc: minimum style and compilation fixes]
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>