[ARM] Optimise memchr for NEON-enabled processors
commit f8f72bc0c3da8ba039e6a1ed670ca576120b1f85
author Prakhar Bahuguna <prakhar.bahuguna@arm.com>
Tue, 27 Jun 2017 15:43:50 +0000 (15:43 +0000)
committer Joseph Myers <joseph@codesourcery.com>
Tue, 27 Jun 2017 15:43:50 +0000 (15:43 +0000)
tree 83b3438aea7f6425cf94c5f97cbbca1d62797683
parent a37b5daa6bc7fbcbbc229b2549a161fa15023f41

This patch provides an optimised implementation of memchr using NEON
instructions, improving performance especially for longer search regions.
It outperforms the Thumb2+DSP optimised code, with more significant gains
for larger inputs. The NEON code also wins when the input is small (less
than 8 bytes): in that case it falls back to a simple byte-by-byte search,
which avoids the overhead of filling two quadword registers from memory.
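
The control flow described above can be sketched in portable C. This is
only an illustration, not the assembly added by this patch: the names
memchr_sketch, search_bytes, SMALL_LIMIT and BLOCK_SIZE are hypothetical,
and the inner comparison loop merely stands in for the NEON vector compare
over two 16-byte quadword registers.

```c
#include <stddef.h>
#include <string.h>

#define SMALL_LIMIT 8   /* inputs shorter than this use the byte loop */
#define BLOCK_SIZE  32  /* two 16-byte quadword registers (assumed layout) */

/* Plain byte-by-byte search over n bytes starting at p. */
static void *search_bytes(const unsigned char *p, unsigned char ch, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (p[i] == ch)
            return (void *)(p + i);
    return NULL;
}

/* Sketch of the strategy: small inputs skip the block path entirely. */
void *memchr_sketch(const void *s, int c, size_t n)
{
    const unsigned char *p = s;
    unsigned char ch = (unsigned char)c;

    if (n < SMALL_LIMIT)
        return search_bytes(p, ch, n);

    while (n >= BLOCK_SIZE) {
        unsigned char block[BLOCK_SIZE];
        unsigned any = 0;

        /* Stand-in for loading two quadword registers from memory. */
        memcpy(block, p, BLOCK_SIZE);

        /* Stand-in for a vector compare: test every lane, then reduce. */
        for (size_t i = 0; i < BLOCK_SIZE; i++)
            any |= (block[i] == ch);

        /* On a hit, locate the first matching byte inside this block. */
        if (any)
            return search_bytes(p, ch, BLOCK_SIZE);

        p += BLOCK_SIZE;
        n -= BLOCK_SIZE;
    }

    /* Fewer than BLOCK_SIZE bytes remain: finish with the byte loop. */
    return search_bytes(p, ch, n);
}
```

A call such as memchr_sketch(buf, 'x', len) should return the same pointer
as memchr(buf, 'x', len); only the way the buffer is traversed differs.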

* sysdeps/arm/armv7/multiarch/Makefile: Add memchr_neon to
sysdep_routines.
* sysdeps/arm/armv7/multiarch/ifunc-impl-list.c: Add define for
__memchr_neon.
Add ifunc definitions for __memchr_neon and __memchr_noneon.
* sysdeps/arm/armv7/multiarch/memchr.S: New file.
* sysdeps/arm/armv7/multiarch/memchr_impl.S: Likewise.
* sysdeps/arm/armv7/multiarch/memchr_neon.S: Likewise.
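
The entries above add two variants, __memchr_neon and __memchr_noneon,
chosen at run time. A rough C sketch of that kind of selection follows.
It is an assumption-laden illustration rather than the glibc ifunc
machinery: SKETCH_HWCAP_NEON, struct memchr_impl_sketch and
select_memchr_sketch are hypothetical names, and both variants simply
forward to the C library's memchr.

```c
#include <stddef.h>
#include <string.h>

typedef void *(*memchr_fn)(const void *, int, size_t);

/* Illustrative capability bit; not read from the real auxiliary vector. */
#define SKETCH_HWCAP_NEON (1UL << 12)

/* One selectable implementation: a name plus its entry point. */
struct memchr_impl_sketch {
    const char *name;
    memchr_fn fn;
};

/* Both stand-ins forward to memchr; the real variants are assembly. */
static void *neon_variant(const void *s, int c, size_t n)
{
    return memchr(s, c, n);
}

static void *noneon_variant(const void *s, int c, size_t n)
{
    return memchr(s, c, n);
}

static const struct memchr_impl_sketch impls[] = {
    { "__memchr_neon",   neon_variant   },
    { "__memchr_noneon", noneon_variant },
};

/* Pick the NEON variant only when the capability bit is set. */
const struct memchr_impl_sketch *select_memchr_sketch(unsigned long hwcap)
{
    return (hwcap & SKETCH_HWCAP_NEON) ? &impls[0] : &impls[1];
}
```

With an indirect function, this choice is made once when the symbol is
resolved, so later calls do not pay for the capability check.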

Testing done: Ran regression tests for arm-none-linux-gnueabihf as well as a
full toolchain bootstrap. Benchmark tests were run on ARMv7-A and ARMv8-A
hardware targets.
ChangeLog
sysdeps/arm/armv7/multiarch/Makefile
sysdeps/arm/armv7/multiarch/ifunc-impl-list.c
sysdeps/arm/armv7/multiarch/memchr.S [new file with mode: 0644]
sysdeps/arm/armv7/multiarch/memchr_impl.S [new file with mode: 0644]
sysdeps/arm/armv7/multiarch/memchr_neon.S [new file with mode: 0644]