Add SSE4.1 trunc, truncf (bug 20142).
commit ae8372d7e4c44f6839aa3d851d4d0cb486b81cd5
author Joseph Myers <joseph@codesourcery.com>
Wed, 20 Sep 2017 16:54:05 +0000
committer Joseph Myers <joseph@codesourcery.com>
Wed, 20 Sep 2017 16:54:05 +0000
tree 83340587a4086402e9f1686c278aa1a264ef77e7
parent a856d4d4a8a56eaefdddb58884bfa2bfe922ee4c

This patch adds SSE4.1 versions of trunc and truncf, using the roundsd
/ roundss instructions, similar to the versions of ceil, floor, rint
and nearbyint functions we already have.  In my testing with the glibc
benchtests these are about 30% faster than the C versions for double,
20% faster for float.
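
As a rough illustration of the approach described above, the following C
sketch pairs a bit-manipulation trunc with a roundsd-based one and picks
between them at run time.  It is not the code from the new files: the names
sketch_trunc_c, sketch_trunc_sse41 and sketch_trunc are hypothetical, the
dispatch uses the GCC/Clang builtin __builtin_cpu_supports rather than
glibc's own selection machinery, and the C fallback is a simplified stand-in
that may differ from the real C version in details such as NaN handling.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical C fallback: clear the fractional mantissa bits.  */
static double
sketch_trunc_c (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  int exp = (int) ((bits >> 52) & 0x7ff) - 1023;   /* unbiased exponent */
  if (exp < 0)
    bits &= UINT64_C (0x8000000000000000);         /* |x| < 1: signed zero */
  else if (exp < 52)
    bits &= ~(UINT64_C (0x000fffffffffffff) >> exp);
  /* exp >= 52: already integral, or infinity/NaN; leave unchanged.  */
  memcpy (&x, &bits, sizeof x);
  return x;
}

#if defined (__x86_64__) && (defined (__GNUC__) || defined (__clang__))
# include <smmintrin.h>

/* Hypothetical SSE4.1 variant: one roundsd, rounding toward zero and
   suppressing the inexact exception (immediate 0x03 | 0x08).  */
__attribute__ ((target ("sse4.1")))
static double
sketch_trunc_sse41 (double x)
{
  __m128d v = _mm_set_sd (x);
  v = _mm_round_sd (v, v, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
  return _mm_cvtsd_f64 (v);
}

/* Run-time selection; an assumption standing in for the multiarch
   dispatch, which this sketch does not reproduce.  */
static double
sketch_trunc (double x)
{
  if (__builtin_cpu_supports ("sse4.1"))
    return sketch_trunc_sse41 (x);
  return sketch_trunc_c (x);
}
#else
/* Other targets: only the C version is available.  */
static double
sketch_trunc (double x)
{
  return sketch_trunc_c (x);
}
#endif
```

A float variant would follow the same shape with _mm_round_ss and a 23-bit
mantissa mask.  Checking the CPU on every call, as this sketch does, adds
overhead that a one-time selection would avoid.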

Tested for x86_64.

[BZ #20142]
* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
Add s_trunc-c, s_truncf-c, s_trunc-sse4_1 and s_truncf-sse4_1.
* sysdeps/x86_64/fpu/multiarch/s_trunc-c.c: New file.
* sysdeps/x86_64/fpu/multiarch/s_trunc-sse4_1.S: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_trunc.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_truncf-c.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_truncf-sse4_1.S: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_truncf.c: Likewise.
ChangeLog
NEWS
sysdeps/x86_64/fpu/multiarch/Makefile
sysdeps/x86_64/fpu/multiarch/s_trunc-c.c [new file with mode: 0644]
sysdeps/x86_64/fpu/multiarch/s_trunc-sse4_1.S [new file with mode: 0644]
sysdeps/x86_64/fpu/multiarch/s_trunc.c [new file with mode: 0644]
sysdeps/x86_64/fpu/multiarch/s_truncf-c.c [new file with mode: 0644]
sysdeps/x86_64/fpu/multiarch/s_truncf-sse4_1.S [new file with mode: 0644]
sysdeps/x86_64/fpu/multiarch/s_truncf.c [new file with mode: 0644]