i386: Do not sanitize upper part of V2SFmode reg with -fno-trapping-math [PR110832]
Also introduce -m[no-]partial-vector-fp-math option to disable trapping
V2SF named patterns in order to avoid generation of partial vector V4SFmode
trapping instructions.
The new option is enabled by default, because even with sanitization,
a small but consistent speed up of 2 to 3% with Polyhedron capacita
benchmark can be achieved vs. scalar code.
Using -fno-trapping-math improves Polyhedron capacita runtime 8 to 9%
vs. scalar code. This is what clang does by default, as it defaults
to -fno-trapping-math.
PR target/110832
gcc/ChangeLog:
* config/i386/i386.opt (mpartial-vector-fp-math): New option.
* config/i386/mmx.md (movq_<mode>_to_sse): Do not sanitize
upper part of V2SFmode register with -fno-trapping-math.
(<plusminusmult:insn>v2sf3): Enable for ix86_partial_vec_fp_math.
(divv2sf3): Ditto.
(<smaxmin:code>v2sf3): Ditto.
(sqrtv2sf2): Ditto.
(*mmx_haddv2sf3_low): Ditto.
(*mmx_hsubv2sf3_low): Ditto.
(vec_addsubv2sf3): Ditto.
(vec_cmpv2sfv2si): Ditto.
(vcond<V2FI:mode>v2sf): Ditto.
(fmav2sf4): Ditto.
(fmsv2sf4): Ditto.
(fnmav2sf4): Ditto.
(fnmsv2sf4): Ditto.
(fix_truncv2sfv2si2): Ditto.
(fixuns_truncv2sfv2si2): Ditto.
(floatv2siv2sf2): Ditto.
(floatunsv2siv2sf2): Ditto.
(nearbyintv2sf2): Ditto.
(rintv2sf2): Ditto.
(lrintv2sfv2si2): Ditto.
(ceilv2sf2): Ditto.
(lceilv2sfv2si2): Ditto.
(floorv2sf2): Ditto.
(lfloorv2sfv2si2): Ditto.
(btruncv2sf2): Ditto.
(roundv2sf2): Ditto.
(lroundv2sfv2si2): Ditto.
* doc/invoke.texi (x86 Options): Document
-mpartial-vector-fp-math option.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr110832-1.c: New test.
* gcc.target/i386/pr110832-2.c: New test.
* gcc.target/i386/pr110832-3.c: New test.