AArch64: Update A64FX memset not to degrade at 16KB
commit 1d9f99ce1b3788d1897cb53a76d57e973111b8fe
author Naohiro Tamura <naohirot@fujitsu.com>
Fri, 27 Aug 2021 05:03:04 +0000 (05:03 +0000)
committer Szabolcs Nagy <szabolcs.nagy@arm.com>
Mon, 6 Sep 2021 09:23:24 +0000 (10:23 +0100)
tree 684c81cc6e88650313797fadaa642d714fcce8a8
parent f873adf3df443f8d302677f963adcc3c22187e68

This patch updates the unroll8 code so that performance does not
degrade at 16KB, where it peaks, on both FX1000 and FX700.

The two instructions inserted at the beginning of the unroll8 loop,
a cmp and a branch, are a workaround that was found heuristically.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
sysdeps/aarch64/multiarch/memset_a64fx.S