Use AVX unaligned memcpy only if AVX2 is available  [hjl/release/2.20/master]
commit    328fc20e5e334a642f0152d9662474789381a897
author    H.J. Lu <hjl.tools@gmail.com>
          Fri, 30 Jan 2015 14:50:20 +0000 (06:50 -0800)
committer H.J. Lu <hjl.tools@gmail.com>
          Fri, 30 Jan 2015 20:26:33 +0000 (12:26 -0800)
tree      6efc6b6b150d1373c36c8bdaf4a606989c152a9a
parent    f80af76648ed97a76745fad6caa3315a79cb1c7c
Use AVX unaligned memcpy only if AVX2 is available

memcpy with unaligned 256-bit AVX register loads/stores is slow on older
processors like Sandy Bridge.  This patch adds bit_AVX_Fast_Unaligned_Load
and sets it only when AVX2 is available; the sketches below illustrate the
detection and dispatch sides of this change.
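
For illustration only, here is a minimal, self-contained C sketch of the kind
of CPUID-based gating described above.  The bit values, the cpu_features word,
and the helper names are made up for the example; the real logic lives in
init-arch.c/init-arch.h and also handles vendor-specific cases.

#include <stdio.h>
#include <cpuid.h>   /* GCC/Clang __get_cpuid / __get_cpuid_count */

/* Illustrative feature bits; the real names come from
   sysdeps/x86_64/multiarch/init-arch.h.  */
#define bit_AVX_Usable              (1u << 0)
#define bit_AVX2_Usable             (1u << 1)
#define bit_AVX_Fast_Unaligned_Load (1u << 2)

static unsigned int cpu_features;

/* Check via XGETBV that the OS saves/restores the SSE and AVX register
   state (XCR0 bits 1 and 2) before trusting any AVX CPUID bit.  */
static int
ymm_state_enabled (void)
{
  unsigned int eax, edx;
  __asm__ ("xgetbv" : "=a" (eax), "=d" (edx) : "c" (0));
  return (eax & 0x6) == 0x6;
}

static void
init_cpu_features (void)
{
  unsigned int eax, ebx, ecx, edx;

  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
    return;

  /* CPUID.1:ECX bit 27 = OSXSAVE, bit 28 = AVX.  */
  if ((ecx & (1u << 27)) && (ecx & (1u << 28)) && ymm_state_enabled ())
    {
      cpu_features |= bit_AVX_Usable;

      /* CPUID.(EAX=7,ECX=0):EBX bit 5 = AVX2.  Only then is the
         unaligned 256-bit load/store path treated as fast.  */
      if (__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx)
          && (ebx & (1u << 5)))
        cpu_features |= bit_AVX2_Usable | bit_AVX_Fast_Unaligned_Load;
    }
}

int
main (void)
{
  init_cpu_features ();
  printf ("AVX usable:              %d\n", !!(cpu_features & bit_AVX_Usable));
  printf ("AVX fast unaligned load: %d\n",
          !!(cpu_features & bit_AVX_Fast_Unaligned_Load));
  return 0;
}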

[BZ #17801]
* sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features):
Set the bit_AVX_Fast_Unaligned_Load bit for AVX2.
* sysdeps/x86_64/multiarch/init-arch.h (bit_AVX_Fast_Unaligned_Load):
New.
(index_AVX_Fast_Unaligned_Load): Likewise.
(HAS_AVX_FAST_UNALIGNED_LOAD): Likewise.
* sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check the
bit_AVX_Fast_Unaligned_Load bit instead of the bit_AVX_Usable bit.
* sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk): Likewise.
* sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Likewise.
* sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk): Likewise.
* sysdeps/x86_64/multiarch/memmove.c (__libc_memmove): Replace
HAS_AVX with HAS_AVX_FAST_UNALIGNED_LOAD.
* sysdeps/x86_64/multiarch/memmove_chk.c (__memmove_chk): Likewise.
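
The memcpy.S/mempcpy.S/memmove.c changes listed above only swap which feature
bit the existing multiarch dispatchers test.  Below is a rough, self-contained
C analogue of that dispatch; the variant functions are stand-ins (the real
implementations are assembly routines such as __memcpy_avx_unaligned), and the
feature word mirrors the HAS_AVX_FAST_UNALIGNED_LOAD macro only in spirit.

#include <stdio.h>
#include <string.h>

/* Illustrative feature word and bit; the real macro is
   HAS_AVX_FAST_UNALIGNED_LOAD from init-arch.h.  */
static unsigned int cpu_features;
#define bit_AVX_Fast_Unaligned_Load (1u << 2)
#define HAS_AVX_FAST_UNALIGNED_LOAD \
  ((cpu_features & bit_AVX_Fast_Unaligned_Load) != 0)

/* Stand-ins for the real multiarch variants; they simply forward to the
   C library so the example links and runs.  */
static void *
memcpy_avx_unaligned (void *dst, const void *src, size_t n)
{
  return memcpy (dst, src, n);
}

static void *
memcpy_sse2 (void *dst, const void *src, size_t n)
{
  return memcpy (dst, src, n);
}

/* C analogue of the dispatch in memcpy.S: before the patch the AVX path
   was taken whenever HAS_AVX was true; after it, only when the
   AVX_Fast_Unaligned_Load bit (i.e. AVX2) is set.  */
static void *(*select_memcpy (void)) (void *, const void *, size_t)
{
  if (HAS_AVX_FAST_UNALIGNED_LOAD)
    return memcpy_avx_unaligned;
  return memcpy_sse2;
}

int
main (void)
{
  char src[32] = "feature-gated memcpy dispatch";
  char dst[32];

  void *(*impl) (void *, const void *, size_t) = select_memcpy ();
  impl (dst, src, sizeof src);
  printf ("%s via %s\n", dst,
          HAS_AVX_FAST_UNALIGNED_LOAD ? "AVX unaligned" : "SSE2");
  return 0;
}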

[cherry picked from commit 56d25c11b64a97255a115901d136d753c86de24e]
sysdeps/x86_64/multiarch/init-arch.c
sysdeps/x86_64/multiarch/init-arch.h
sysdeps/x86_64/multiarch/memcpy.S
sysdeps/x86_64/multiarch/memcpy_chk.S
sysdeps/x86_64/multiarch/memmove.c
sysdeps/x86_64/multiarch/memmove_chk.c
sysdeps/x86_64/multiarch/mempcpy.S
sysdeps/x86_64/multiarch/mempcpy_chk.S