Add new pow implementation
commit 424c4f60ed6190e2ea0e72e0873bf3ebcbbf5448
author Szabolcs Nagy <szabolcs.nagy@arm.com>
Wed, 13 Jun 2018 16:57:20 +0000 (13 17:57 +0100)
committer Szabolcs Nagy <szabolcs.nagy@arm.com>
Wed, 19 Sep 2018 09:04:51 +0000 (19 10:04 +0100)
tree 52fbd60de3d3b1e99208b3018cf79ee8a230a878
parent dab9c3488e86d5304f3e4b778933760374494a82

The algorithm is exp(y * log(x)), where log(x) is computed with about
1.3*2^-68 relative error (1.5*2^-68 without fma), returning the result
in two doubles, and the exp part uses the same algorithm (and lookup
tables) as exp, but takes the input as two doubles and a sign (to handle
negative bases with odd integer exponent).  The __exp1 internal symbol
is no longer necessary.

There is a separate code path when fma is not available, but the worst
case error is about 0.54 ULP in both cases.  The lookup table and
constants for log take 4168 bytes.  The .rodata+.text size decreases
by 37908 bytes on aarch64.  In non-nearest rounding modes the error is
less than 1 ULP.

Improvements on Cortex-A72 compared to current glibc master:
pow throughput: 2.40x in [0.01 11.1]x[0.01 11.1]
pow latency: 1.84x in [0.01 11.1]x[0.01 11.1]

Tested on
aarch64-linux-gnu (defined __FP_FAST_FMA, TOINT_INTRINSICS) and
arm-linux-gnueabihf (!defined __FP_FAST_FMA, !TOINT_INTRINSICS) and
x86_64-linux-gnu (!defined __FP_FAST_FMA, !TOINT_INTRINSICS) and
powerpc64le-linux-gnu (defined __FP_FAST_FMA, !TOINT_INTRINSICS) targets.
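
The __FP_FAST_FMA macro mentioned above is predefined by GCC on targets
where fma is fast.  A common pattern for selecting between an fma and
a non-fma path at compile time looks roughly like the sketch below;
mul_add is a made-up name, and the actual code in e_pow.c is more
involved than this.

```c
#include <math.h>

/* Hypothetical helper: compute a * b + c, with a single rounding when
   the compiler reports that fma is fast on the target.  */
static double
mul_add (double a, double b, double c)
{
#ifdef __FP_FAST_FMA
  return fma (a, b, c);
#else
  /* Two roundings unless the compiler contracts the expression.  */
  return a * b + c;
#endif
}
```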

* NEWS: Mention pow improvements.
* math/Makefile (type-double-routines): Add e_pow_log_data.
* sysdeps/generic/math_private.h (__exp1): Remove.
* sysdeps/i386/fpu/e_pow_log_data.c: New file.
* sysdeps/ia64/fpu/e_pow_log_data.c: New file.
* sysdeps/ieee754/dbl-64/Makefile (CFLAGS-e_pow.c): Allow fma
contraction.
* sysdeps/ieee754/dbl-64/e_exp.c (__exp1): Remove.
(exp_inline): Remove.
(__ieee754_exp): Only single double input is handled.
* sysdeps/ieee754/dbl-64/e_pow.c: Rewrite.
* sysdeps/ieee754/dbl-64/e_pow_log_data.c: New file.
* sysdeps/ieee754/dbl-64/math_config.h (issignaling_inline): Define.
(__pow_log_data): Define.
* sysdeps/ieee754/dbl-64/upow.h: Remove.
* sysdeps/ieee754/dbl-64/upow.tbl: Remove.
* sysdeps/m68k/m680x0/fpu/e_pow_log_data.c: New file.
* sysdeps/x86_64/fpu/multiarch/Makefile (CFLAGS-e_pow-fma.c): Allow fma
contraction.
(CFLAGS-e_pow-fma4.c): Likewise.
15 files changed:
ChangeLog
NEWS
math/Makefile
sysdeps/generic/math_private.h
sysdeps/i386/fpu/e_pow_log_data.c [new file with mode: 0644]
sysdeps/ia64/fpu/e_pow_log_data.c [new file with mode: 0644]
sysdeps/ieee754/dbl-64/Makefile
sysdeps/ieee754/dbl-64/e_exp.c
sysdeps/ieee754/dbl-64/e_pow.c
sysdeps/ieee754/dbl-64/e_pow_log_data.c [new file with mode: 0644]
sysdeps/ieee754/dbl-64/math_config.h
sysdeps/ieee754/dbl-64/upow.h [deleted file]
sysdeps/ieee754/dbl-64/upow.tbl [deleted file]
sysdeps/m68k/m680x0/fpu/e_pow_log_data.c [new file with mode: 0644]
sysdeps/x86_64/fpu/multiarch/Makefile