commit	f28e0bbefa41fe643cce2f107e868abff312ced9
author	Alexander Monakov <amonakov@ispras.ru>
	Tue, 6 Feb 2024 20:48:08 +0000 (23:48 +0300)
committer	Richard Henderson <richard.henderson@linaro.org>
	Fri, 3 May 2024 15:03:05 +0000 (08:03 -0700)
tree	933db7fedccb1c2590441909271db03ff8cba52f
parent	93a6085618f16fb2cd316d1e84f1a638b7e2d8ff

util/bufferiszero: Optimize SSE2 and AVX2 variants

Increase unroll factor in SIMD loops from 4x to 8x in order to move
their bottlenecks from ALU port contention to load issue rate (two loads
per cycle on popular x86 implementations).

Avoid using out-of-bounds pointers in loop boundary conditions.
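
For illustration only, here is a portable scalar sketch of the two ideas
above: an 8x unrolled accumulate-then-test loop, and a loop condition that
compares the remaining length instead of forming a pointer past the end of
the buffer. It is not the code from this patch. It uses uint64_t words as a
stand-in for SIMD vectors, and the names sketch_buffer_is_zero and load64 are
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load one 8-byte word with no alignment assumption. */
static uint64_t load64(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

/*
 * Hypothetical scalar stand-in for an 8x unrolled SIMD loop.
 * Each iteration issues eight independent loads, ORs them together,
 * and tests the combined result once.
 */
bool sketch_buffer_is_zero(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    const size_t block = 8 * sizeof(uint64_t);   /* 64 bytes per iteration */
    size_t i = 0;

    /*
     * The bound is expressed as "bytes remaining >= block", so no pointer
     * beyond the end of the buffer is ever computed. i never exceeds len,
     * so len - i cannot wrap around.
     */
    while (len - i >= block) {
        uint64_t a = load64(p + i)      | load64(p + i + 8);
        uint64_t b = load64(p + i + 16) | load64(p + i + 24);
        uint64_t c = load64(p + i + 32) | load64(p + i + 40);
        uint64_t d = load64(p + i + 48) | load64(p + i + 56);

        if ((a | b | c | d) != 0) {
            return false;
        }
        i += block;
    }

    /* Tail: fewer than 64 bytes remain; check them one at a time. */
    for (; i < len; i++) {
        if (p[i] != 0) {
            return false;
        }
    }
    return true;
}
```

Calling sketch_buffer_is_zero(buf, len) returns true only when all len bytes
are zero. A real SIMD variant would replace the eight word loads with vector
loads; the loop structure is what the sketch is meant to show.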

Follow SSE2 implementation strategy in the AVX2 variant. Avoid use of
PTEST, which is not profitable there (like in the removed SSE4 variant).
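
As a rough illustration of a zero test that does not rely on PTEST, the
sketch below checks one 16-byte chunk with a byte-wise compare against zero
followed by a move-mask, using only SSE2 intrinsics. Whether the patch uses
exactly this sequence is an assumption. The function name
sketch_vec16_is_zero is hypothetical, and a memcmp fallback is included so
the example also builds where SSE2 is not available.

```c
#include <stdbool.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: true if the 16 bytes at p are all zero. */
bool sketch_vec16_is_zero(const void *p)
{
#if defined(__SSE2__)
    /* Unaligned 16-byte load. */
    __m128i v = _mm_loadu_si128((const __m128i *)p);
    /* Each byte lane becomes 0xFF where the input byte equals zero. */
    __m128i eq = _mm_cmpeq_epi8(v, _mm_setzero_si128());
    /* Gather the top bit of every lane; all 16 bits set means all zero. */
    return _mm_movemask_epi8(eq) == 0xFFFF;
#else
    /* Portable fallback for targets without SSE2. */
    static const unsigned char zero[16] = {0};
    return memcmp(p, zero, sizeof(zero)) == 0;
#endif
}
```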

Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
Signed-off-by: Mikhail Romanov <mmromanov@ispras.ru>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240206204809.9859-6-amonakov@ispras.ru>