[2/5] AArch64: Improve A64FX memset for large sizes
commit 9bc2ed8f46d80859a5596789cc9e8cc2de84b0e7
author Wilco Dijkstra <wdijkstr@arm.com>
       Tue, 10 Aug 2021 12:39:37 +0000 (10 13:39 +0100)
committer Wilco Dijkstra <wdijkstr@arm.com>
       Tue, 10 Aug 2021 12:39:37 +0000 (10 13:39 +0100)
tree 732b364e1d9578686e0ad7fee2c1d8ced1818c27
parent 07b427296b8d59f439144029d9a948f6c1ce0a31

Improve performance of large memsets and simplify the alignment code. For zero
memsets, use DC ZVA, which almost doubles performance. For non-zero memsets, use
the unroll8 loop, which is about 10% faster.
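The large-size strategy described above can be modeled in portable C. This is only a sketch under stated assumptions, not the committed code, which is SVE assembly in memset_a64fx.S. The zero path stores an unaligned head with ordinary stores, zeroes whole aligned blocks (one DC ZVA each in the real code), then finishes the tail. The non-zero path uses a loop with 8 stores per iteration. The 256-byte block size, `zero_block`, `fill_unroll8` and `sketch_memset_large` are all hypothetical. Real code reads the DC ZVA block size from DCZID_EL0, and DC ZVA zeroes a whole block without first loading it, which is the usual reason it is faster.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 256-byte zeroing block. Real code reads the DC ZVA block
   size from DCZID_EL0 instead of hard-coding it. */
#define ZVA_BLOCK 256u

/* Hypothetical stand-in for one "dc zva": zeroes a single block that
   must be aligned to ZVA_BLOCK. */
static void zero_block(unsigned char *p)
{
    assert(((uintptr_t)p & (ZVA_BLOCK - 1)) == 0);
    for (size_t i = 0; i < ZVA_BLOCK; i++)
        p[i] = 0;
}

/* Hypothetical unroll8-style loop: 8 stores of 8 bytes per iteration.
   The real code uses SVE vector stores rather than 64-bit stores. */
static void fill_unroll8(unsigned char *p, size_t n, unsigned char c)
{
    uint64_t v = 0x0101010101010101ULL * c;   /* replicate byte 8 times */
    while (n >= 64) {
        memcpy(p + 0, &v, 8);
        memcpy(p + 8, &v, 8);
        memcpy(p + 16, &v, 8);
        memcpy(p + 24, &v, 8);
        memcpy(p + 32, &v, 8);
        memcpy(p + 40, &v, 8);
        memcpy(p + 48, &v, 8);
        memcpy(p + 56, &v, 8);
        p += 64;
        n -= 64;
    }
    while (n > 0) {   /* remaining bytes, one at a time */
        *p++ = c;
        n--;
    }
}

/* Hypothetical entry point modeling only the large-size case. */
void *sketch_memset_large(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    unsigned char byte = (unsigned char)c;

    if (byte != 0) {               /* non-zero: unrolled store loop */
        fill_unroll8(p, n, byte);
        return dst;
    }

    /* Zero path: bytes needed to reach the next ZVA_BLOCK boundary. */
    size_t head = (size_t)(-(uintptr_t)p & (ZVA_BLOCK - 1));
    if (head > n)
        head = n;
    fill_unroll8(p, head, 0);
    p += head;
    n -= head;

    while (n >= ZVA_BLOCK) {       /* whole aligned blocks (DC ZVA) */
        zero_block(p);
        p += ZVA_BLOCK;
        n -= ZVA_BLOCK;
    }

    fill_unroll8(p, n, 0);         /* tail shorter than one block */
    return dst;
}
```

Splitting on the fill value keeps the block-zeroing path free of any value replication. The head/blocks/tail split is the usual way to use an instruction that only operates on aligned blocks.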

Reviewed-by: Naohiro Tamura <naohirot@fujitsu.com>
sysdeps/aarch64/multiarch/memset_a64fx.S