x86: Tune Skylake, Cannonlake and Icelake as Haswell
commit4810dbfbd115be3180bc36984093e5627804e9f1
authorhjl <hjl@138bc75d-0d04-0410-961f-82ee72b054a4>
Fri, 13 Jul 2018 20:36:01 +0000 (13 20:36 +0000)
committerhjl <hjl@138bc75d-0d04-0410-961f-82ee72b054a4>
Fri, 13 Jul 2018 20:36:01 +0000 (13 20:36 +0000)
treed4259c38880d0b3093c51fdbfc6bcec340ece947
parent950c7ddb563d0f91ef6db973f0393e7edba811f1
x86: Tune Skylake, Cannonlake and Icelake as Haswell

r259399, which added PROCESSOR_SKYLAKE, disabled many x86 optimizations
which are enabled by PROCESSOR_HASWELL.  As the result, -mtune=skylake
generates slower codes on Skylake than before.  The same also applies
to Cannonlake and Icelak tuning.

This patch changes -mtune={skylake|cannonlake|icelake} to tune like
-mtune=haswell for until their tuning is properly adjusted. It also
enables -mprefer-vector-width=256 for -mtune=haswell, which has no
impact on codegen when AVX512 isn't enabled.

Performance impacts on SPEC CPU 2017 rate with 1 copy using

-march=native -mfpmath=sse -O2 -m64

are

1. On Broadwell server:

500.perlbench_r -0.56%
502.gcc_r -0.18%
505.mcf_r 0.24%
520.omnetpp_r 0.00%
523.xalancbmk_r -0.32%
525.x264_r -0.17%
531.deepsjeng_r 0.00%
541.leela_r 0.00%
548.exchange2_r 0.12%
557.xz_r 0.00%
Geomean 0.00%

503.bwaves_r 0.00%
507.cactuBSSN_r 0.21%
508.namd_r 0.00%
510.parest_r 0.19%
511.povray_r -0.48%
519.lbm_r 0.00%
521.wrf_r 0.28%
526.blender_r 0.19%
527.cam4_r 0.39%
538.imagick_r 0.00%
544.nab_r -0.36%
549.fotonik3d_r 0.51%
554.roms_r 0.00%
Geomean 0.17%

On Skylake client:

500.perlbench_r 0.96%
502.gcc_r 0.13%
505.mcf_r -1.03%
520.omnetpp_r -1.11%
523.xalancbmk_r 1.02%
525.x264_r 0.50%
531.deepsjeng_r 2.97%
541.leela_r 0.50%
548.exchange2_r -0.95%
557.xz_r 2.41%
Geomean 0.56%

503.bwaves_r 0.49%
507.cactuBSSN_r 3.17%
508.namd_r 4.05%
510.parest_r 0.15%
511.povray_r 0.80%
519.lbm_r 3.15%
521.wrf_r 10.56%
526.blender_r 2.97%
527.cam4_r 2.36%
538.imagick_r 46.40%
544.nab_r 2.04%
549.fotonik3d_r 0.00%
554.roms_r 1.27%
Geomean 5.49%

On Skylake server:

500.perlbench_r 0.71%
502.gcc_r -0.51%
505.mcf_r -1.06%
520.omnetpp_r -0.33%
523.xalancbmk_r -0.22%
525.x264_r 1.72%
531.deepsjeng_r -0.26%
541.leela_r 0.57%
548.exchange2_r -0.75%
557.xz_r -1.28%
Geomean -0.21%

503.bwaves_r 0.00%
507.cactuBSSN_r 2.66%
508.namd_r 3.67%
510.parest_r 1.25%
511.povray_r 2.26%
519.lbm_r 1.69%
521.wrf_r 11.03%
526.blender_r 3.39%
527.cam4_r 1.69%
538.imagick_r 64.59%
544.nab_r -0.54%
549.fotonik3d_r 2.68%
554.roms_r 0.00%
Geomean 6.19%

This patch improves -march=native performance on Skylake up to 60% and
leaves -march=native performance unchanged on Haswell.

gcc/

Backport from mainline
2018-07-13  H.J. Lu  <hongjiu.lu@intel.com>
    Sunil K Pandey  <sunil.k.pandey@intel.com>

PR target/84413
* config/i386/i386.c (m_CORE_AVX512): New.
(m_CORE_AVX2): Likewise.
(m_CORE_ALL): Add m_CORE_AVX2.
* config/i386/x86-tune.def: Replace m_HASWELL with m_CORE_AVX2.
Replace m_SKYLAKE_AVX512 with m_CORE_AVX512 on avx256_optimal
and remove the rest of m_SKYLAKE_AVX512.

gcc/testsuite/

Backport from mainline
2018-07-13  H.J. Lu  <hongjiu.lu@intel.com>
    Sunil K Pandey  <sunil.k.pandey@intel.com>

PR target/84413
* gcc.target/i386/pr84413-1.c: New test.
* gcc.target/i386/pr84413-2.c: Likewise.
* gcc.target/i386/pr84413-3.c: Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-8-branch@262650 138bc75d-0d04-0410-961f-82ee72b054a4
gcc/ChangeLog
gcc/config/i386/i386.c
gcc/config/i386/x86-tune.def
gcc/testsuite/ChangeLog
gcc/testsuite/gcc.target/i386/pr84413-1.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr84413-2.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr84413-3.c [new file with mode: 0644]