tcg/i386: Fix dupi/dupm for avx1 and 32-bit hosts
The VBROADCASTSD instruction only allows %ymm registers as destination.
Rather than forcing VEX.L and writing to the entire 256-bit register,
revert to using MOVDDUP with an %xmm register. This is sufficient for
an avx1 host since we do not support TCG_TYPE_V256 for that case.
Also fix the 32-bit avx2, which should have used VPBROADCASTW.
Fixes:
1e262b49b533
Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>