sh: Add new optimisation to the SH4 memcpy
This optimization is based on prefetching and 64bit data transfer via FPU
(only for the little endianess)
Tests shows that:
----------------------------------------
Memory bandwidth | Gain
| sh4-300 | sh4-200
----------------------------------------
512 bytes to 16KiB | ~20% | ~25%
from 32KiB to 16MiB | ~190% | ~5%
----------------------------------------
Signed-off-by: Austin Foxley <austinf@cetoncorp.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Carmelo Amoroso <carmelo.amoroso@st.com>