Optimise sqrt reciprocal multiplications
commit3cb2785efe2b496978a94f52920751007019a4ce
authorktkachov <ktkachov@138bc75d-0d04-0410-961f-82ee72b054a4>
Wed, 5 Sep 2018 13:39:38 +0000 (5 13:39 +0000)
committerktkachov <ktkachov@138bc75d-0d04-0410-961f-82ee72b054a4>
Wed, 5 Sep 2018 13:39:38 +0000 (5 13:39 +0000)
tree65e7f2c1e0a84d7ea2da8fca45dd3c7b86341ccb
parent659169d3b2ea9fde2873aab8e36a4d1eedc8ff2f
Optimise sqrt reciprocal multiplications

This patch aims to optimise sequences involving uses of 1.0 / sqrt (a) under -freciprocal-math and -funsafe-math-optimizations.
In particular consider:

x = 1.0 / sqrt (a);
r1 = x * x;  // same as 1.0 / a
r2 = a * x; // same as sqrt (a)

If x, r1 and r2 are all used further on in the code, this can be transformed into:
tmp1 = 1.0 / a
tmp2 = sqrt (a)
tmp3 = tmp1 * tmp2
x = tmp3
r1 = tmp1
r2 = tmp2

A bit convoluted, but this saves us one multiplication and, more importantly, the sqrt and division are now independent.
This also allows optimisation of a subset of these expressions.
For example:
x = 1.0 / sqrt (a)
r1 = x * x

can be transformed to r1 = 1.0 / a, eliminating the sqrt if x is not used anywhere else.
And similarly:
x = 1.0 / sqrt (a)
r1 = a * x

can be transformed to sqrt (a) eliminating the division.

For the testcase:
double res, res2, tmp;
void
foo (double a, double b)
{
  tmp = 1.0 / __builtin_sqrt (a);
  res = tmp * tmp;
  res2 = a * tmp;
}

We now generate for aarch64 with -Ofast:
foo:
        fmov    d2, 1.0e+0
        adrp    x2, res2
        fsqrt   d1, d0
        adrp    x1, res
        fdiv    d0, d2, d0
        adrp    x0, tmp
        str     d1, [x2, #:lo12:res2]
        fmul    d1, d1, d0
        str     d0, [x1, #:lo12:res]
        str     d1, [x0, #:lo12:tmp]
        ret

where before it generated:
foo:
        fsqrt   d2, d0
        fmov    d1, 1.0e+0
        adrp    x1, res2
        adrp    x2, tmp
        adrp    x0, res
        fdiv    d1, d1, d2
        fmul    d0, d1, d0
        fmul    d2, d1, d1
        str     d1, [x2, #:lo12:tmp]
        str     d0, [x1, #:lo12:res2]
        str     d2, [x0, #:lo12:res]
        ret

As you can see, the new sequence has one fewer multiply and the fsqrt and fdiv are independent.

* tree-ssa-math-opts.c (is_mult_by): New function.
(is_square_of): Use the above.
(optimize_recip_sqrt): New function.
(pass_cse_reciprocals::execute): Use the above.

* gcc.dg/recip_sqrt_mult_1.c: New test.
* gcc.dg/recip_sqrt_mult_2.c: Likewise.
* gcc.dg/recip_sqrt_mult_3.c: Likewise.
* gcc.dg/recip_sqrt_mult_4.c: Likewise.
* gcc.dg/recip_sqrt_mult_5.c: Likewise.
* g++.dg/recip_sqrt_mult_1.C: Likewise.
* g++.dg/recip_sqrt_mult_2.C: Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@264126 138bc75d-0d04-0410-961f-82ee72b054a4
gcc/ChangeLog
gcc/testsuite/ChangeLog
gcc/testsuite/g++.dg/recip_sqrt_mult_1.C [new file with mode: 0644]
gcc/testsuite/g++.dg/recip_sqrt_mult_2.C [new file with mode: 0644]
gcc/testsuite/gcc.dg/recip_sqrt_mult_1.c [new file with mode: 0644]
gcc/testsuite/gcc.dg/recip_sqrt_mult_2.c [new file with mode: 0644]
gcc/testsuite/gcc.dg/recip_sqrt_mult_3.c [new file with mode: 0644]
gcc/testsuite/gcc.dg/recip_sqrt_mult_4.c [new file with mode: 0644]
gcc/testsuite/gcc.dg/recip_sqrt_mult_5.c [new file with mode: 0644]
gcc/tree-ssa-math-opts.c