util/bufferiszero: Add simd acceleration for aarch64
commit      22437b4de94c37e6104d90d46b31d80cf14358d4
author      Richard Henderson <richard.henderson@linaro.org>
            Sat, 10 Feb 2024 00:02:17 +0000 (10 00:02 +0000)
committer   Richard Henderson <richard.henderson@linaro.org>
            Fri, 3 May 2024 15:03:35 +0000 (3 08:03 -0700)
tree        b849d2df917884f0bb140e9d0f5308ae0ca10d86
parent      bf67aa3dd2d8b28d7618d8ec62cd9f6055366751

Because non-embedded aarch64 is expected to have AdvSIMD enabled, just
double-check the compiler's predefined __ARM_NEON macro at build time and
don't bother with a runtime check.  Otherwise, model the loop on the x86
SSE2 function.
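A minimal sketch of the build-time selection described above, assuming a simple OR-accumulate loop rather than the actual structure of QEMU's SSE2 function. The names `buffer_is_zero_sketch`, `is_zero_neon` and `is_zero_scalar` are hypothetical. The AdvSIMD path is chosen only by the preprocessor, with no runtime CPU probe; the extra `__aarch64__` test is an assumption added here because `vmaxvq_u32` (UMAXV) exists only on AArch64, not on 32-bit Arm NEON.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical portable fallback: OR every byte together, test once. */
static bool is_zero_scalar(const unsigned char *p, size_t len)
{
    unsigned char acc = 0;
    for (size_t i = 0; i < len; i++) {
        acc |= p[i];
    }
    return acc == 0;
}

#if defined(__aarch64__) && defined(__ARM_NEON)
/* Hypothetical AdvSIMD path: OR 16-byte vectors into one accumulator,
 * then reduce it with UMAXV (the vmaxvq_u32 intrinsic). Simplified: no
 * unrolling and no early exit, unlike a tuned implementation. */
static bool is_zero_neon(const unsigned char *p, size_t len)
{
    uint32x4_t acc = vdupq_n_u32(0);
    size_t i = 0;

    for (; i + 16 <= len; i += 16) {
        acc = vorrq_u32(acc, vreinterpretq_u32_u8(vld1q_u8(p + i)));
    }
    if (vmaxvq_u32(acc) != 0) {
        return false;
    }
    /* Finish any tail shorter than one vector with the scalar loop. */
    return is_zero_scalar(p + i, len - i);
}
#endif

/* Hypothetical entry point: the choice is fixed at compile time. */
bool buffer_is_zero_sketch(const void *buf, size_t len)
{
#if defined(__aarch64__) && defined(__ARM_NEON)
    return is_zero_neon(buf, len);
#else
    return is_zero_scalar(buf, len);
#endif
}
```

On an AArch64 build both paths are compiled but only the vector one is called; on any other host the sketch silently falls back to the scalar loop, so there is nothing to probe or dispatch at runtime.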

Use UMAXV for the vector reduction.  It takes 3 cycles on Cortex-A76 and
2 cycles on Neoverse-N1.
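Why a maximum works as the reduction can be shown in a few lines: the unsigned maximum across lanes is zero exactly when every lane is zero, so one across-lanes instruction yields a scalar to test. This sketch assumes four 32-bit lanes (`vmaxvq_u32`, i.e. `UMAXV Sd, Vn.4S`); the commit does not say which lane arrangement it uses. The helper names are hypothetical, and the signed model is included only to show why a signed maximum (SMAXV) would be wrong: a lane with only its top bit set reads as negative and is lost.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: true if any bit of the 128-bit value is set. */
bool any_lane_nonzero(const uint32_t lanes[4])
{
#if defined(__aarch64__) && defined(__ARM_NEON)
    /* vmaxvq_u32 is the ACLE intrinsic for UMAXV on four 32-bit lanes. */
    return vmaxvq_u32(vld1q_u32(lanes)) != 0;
#else
    /* Scalar model of the same horizontal unsigned maximum. */
    uint32_t m = lanes[0];
    for (int i = 1; i < 4; i++) {
        if (lanes[i] > m) {
            m = lanes[i];
        }
    }
    return m != 0;
#endif
}

/* Hypothetical scalar model of a signed maximum over the same bits,
 * for contrast only. The uint32_t -> int32_t conversion is
 * implementation-defined in C but is two's complement on GCC/Clang. */
int32_t signed_max_model(const uint32_t lanes[4])
{
    int32_t m = (int32_t)lanes[0];
    for (int i = 1; i < 4; i++) {
        int32_t v = (int32_t)lanes[i];
        if (v > m) {
            m = v;
        }
    }
    return m;
}
```

With lanes `{0, 0, 0, 0x80000000}` the unsigned maximum is nonzero, but the signed maximum is 0, which would wrongly report an all-zero buffer.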

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
util/bufferiszero.c