i386: Improve permutations with INSERTPS instruction [PR94908]
INSERTPS can select any element from src and insert into any place
of the dest. For SSE4.1 targets, compiler can generate e.g.
insertps $64, %xmm0, %xmm1
to insert element 1 from %xmm1 to element 0 of %xmm0.
gcc/ChangeLog:
PR target/94908
* config/i386/i386-builtin.def (__builtin_ia32_insertps128):
Use CODE_FOR_sse4_1_insertps_v4sf.
* config/i386/i386-expand.cc (expand_vec_perm_insertps): New.
(expand_vec_perm_1): Call expand_vec_per_insertps.
* config/i386/i386.md ("unspec"): Declare UNSPEC_INSERTPS here.
* config/i386/mmx.md (mmxscalarmode): New mode attribute.
(@sse4_1_insertps_<mode>): New insn pattern.
* config/i386/sse.md (@sse4_1_insertps_<mode>): Macroize insn
pattern from sse4_1_insertps using VI4F_128 mode iterator.
gcc/testsuite/ChangeLog:
PR target/94908
* gcc.target/i386/pr94908.c: New test.
* gcc.target/i386/sse4_1-insertps-5.c: New test.
* gcc.target/i386/vperm-v4sf-2-sse4.c: New test.