Remove pass_cpb which is related to enable avx512 embedded broadcast from constant...
commit a6291d88d5b6c17d41950e21d7d452f7f0f73020
author liuhongt <hongtao.liu@intel.com>
Tue, 13 Jul 2021 10:22:03 +0000 (13 18:22 +0800)
committer liuhongt <hongtao.liu@intel.com>
Thu, 22 Jul 2021 05:07:29 +0000 (22 13:07 +0800)
tree 35f123fcca95b4c991f92c66952d105f3a1d4c7c
parent a56c251898ea70b46798d7893a871bcfe318529b
Remove pass_cpb which is related to enable avx512 embedded broadcast from constant pool.

When ix86_expand_vector_move optimizes vector moves into broadcasts
during pass_expand, pass_reload/LRA can automatically generate an
avx512 embedded broadcast, so pass_cpb is no longer needed.

Even in the absence of avx512f, a broadcast from memory is still
slightly faster than loading the entire vector from memory, so always
enable broadcast.
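
As an illustration (this is not taken from the patch or from the linked
benchmark), the following is a minimal sketch of the kind of source code
the change affects. The function name add_const is hypothetical. When the
loop is vectorized for x86 (for example with -O3 -mavx2, an assumed
choice of flags), the compiler can either load a full-width vector of
copies of the constant from the constant pool (the "memory" strategy) or
load a single scalar copy and replicate it with a broadcast instruction
(the "broadcast" strategy). With avx512f, the broadcast can additionally
be folded into the memory operand of the arithmetic instruction, which is
what "embedded broadcast" refers to.

```c
#include <stddef.h>

/* Add the same scalar constant to every element of SRC and store the
   result in DST.  When this loop is vectorized, the constant 1.5f has
   to be present in every lane of a vector register, either via a
   full-width load from the constant pool or via a broadcast of one
   scalar copy.  Which one is chosen depends on the compiler version
   and flags.  */
void
add_const (float *dst, const float *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i] + 1.5f;
}
```

Comparing the assembly produced with -S before and after this commit is
one way to see which strategy was picked for a given target.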

benchmark:
https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/vaddps/broadcast

The performance diff

strategy    : cycles
memory      : 1046611188
memory      : 1255420817
memory      : 1044720793
memory      : 1253414145
average     : 1097868397

broadcast   : 1044430688
broadcast   : 1044477630
broadcast   : 1253554603
broadcast   : 1044561934
average     : 1096756213

However, broadcast has a larger code size.

The size diff

size broadcast.o
   text    data     bss     dec     hex filename
    137       0       0     137      89 broadcast.o

size memory.o
   text    data     bss     dec     hex filename
    115       0       0     115      73 memory.o
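
To put the two comparisons side by side, the sketch below computes the
relative change from memory to broadcast using only the figures quoted
above (the reported cycle averages and the text sizes). The helper names
percent_change and print_tradeoff are hypothetical. By this arithmetic,
broadcast needs about 0.1% fewer cycles on average, while its text
section is 22 bytes (about 19%) larger.

```c
#include <stdio.h>

/* Relative change from BASE to VALUE, in percent.  A negative result
   means VALUE is smaller than BASE.  */
double
percent_change (double base, double value)
{
  return (value - base) / base * 100.0;
}

/* Print both comparisons, memory -> broadcast, using the averages and
   text sizes quoted in the commit message.  */
void
print_tradeoff (void)
{
  printf ("cycles: %+.2f%%\n",
          percent_change (1097868397.0, 1096756213.0));
  printf ("text:   %+.2f%%\n", percent_change (115.0, 137.0));
}
```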

gcc/ChangeLog:

* config/i386/i386-expand.c
(ix86_broadcast_from_integer_constant): Rename to ..
(ix86_broadcast_from_constant): .. this, and extend it to
handle float mode.
(ix86_expand_vector_move): Extend to float mode.
* config/i386/i386-features.c
(replace_constant_pool_with_broadcast): Remove.
(remove_partial_avx_dependency_gate): Ditto.
(constant_pool_broadcast): Ditto.
(class pass_constant_pool_broadcast): Ditto.
(make_pass_constant_pool_broadcast): Ditto.
(remove_partial_avx_dependency): Adjust gate.
* config/i386/i386-passes.def: Remove pass_constant_pool_broadcast.
* config/i386/i386-protos.h
(make_pass_constant_pool_broadcast): Remove.

gcc/testsuite/ChangeLog:

* gcc.target/i386/fuse-caller-save-xmm.c: Adjust testcase.
gcc/config/i386/i386-expand.c
gcc/config/i386/i386-features.c
gcc/config/i386/i386-passes.def
gcc/config/i386/i386-protos.h
gcc/testsuite/gcc.target/i386/fuse-caller-save-xmm.c