swap ops in reassoc to reduce cross backedge FMA
commit	746344dd53807d840c29f52adba10d0ab093bd3d
author	Di Zhao <dizhao@os.amperecomputing.com>
	Thu, 9 Nov 2023 07:06:37 +0000 (9 15:06 +0800)
committer	Di Zhao <dizhao@os.amperecomputing.com>
	Thu, 23 Nov 2023 12:56:31 +0000 (23 20:56 +0800)
tree	d125ee965f08ace40f9b75f608bfebe4bc2ff935
parent	ef296fb37cac12a5a10e83c16ae021a624e1238c

Previously, for ops.length >= 3, when an FMA is present we do not
rank the operands, so that more FMAs can be preserved. But this
generates more FMAs with loop dependencies, which leads to worse
performance on some targets.

Rank the operands (setting width = 2) when:
1. avoid_fma_max_bits is set, and
2. a loop-dependent FMA sequence is found.

This way, we do not have to discard all the FMA candidates in the
badly shaped sequence in widening_mul; instead we can keep a
smaller set of FMAs that have no loop dependency.

With this patch, there is about a 2% improvement in the 510.parest_r
1-copy run on ampere1 (with "-Ofast -mcpu=ampere1 -flto
--param avoid-fma-max-bits=512").

PR tree-optimization/110279

gcc/ChangeLog:

	* tree-ssa-reassoc.cc (get_reassociation_width): Check
	for loop-dependent FMAs.
	(reassociate_bb): For 3 ops, refine the condition for calling
	swap_ops_for_binary_stmt.

gcc/testsuite/ChangeLog:

* gcc.dg/pr110279-1.c: New test.
gcc/testsuite/gcc.dg/pr110279-1.c [new file with mode: 0644]
gcc/tree-ssa-reassoc.cc