i686: Use generic sinf implementation for SSE2 version
Performance seems to be similar (gcc 11.2.1 on a Ryzen 9 5900X),
the generic algorithm shows slight better performance for
the 'workload-huge.wrf' input set.
* s_sinf-sse2.S:
"sinf": {
"": {
"duration": 3.72405e+09,
"iterations": 2.38374e+08,
"max": 63.973,
"min": 11.211,
"mean": 15.6227
},
"workload-random.wrf": {
"duration": 3.76923e+09,
"iterations": 8.4e+07,
"reciprocal-throughput": 17.6355,
"latency": 72.108,
"max-throughput": 5.67037e+07,
"min-throughput": 1.38681e+07
},
"workload-huge.wrf": {
"duration": 3.76943e+09,
"iterations": 6e+07,
"reciprocal-throughput": 29.3493,
"latency": 96.2985,
"max-throughput": 3.40724e+07,
"min-throughput": 1.03844e+07
}
}
* generic s_sinf.c:
"sinf": {
"": {
"duration": 3.70989e+09,
"iterations": 2.18025e+08,
"max": 69.782,
"min": 11.1,
"mean": 17.0159
},
"workload-random.wrf": {
"duration": 3.77213e+09,
"iterations": 9.6e+07,
"reciprocal-throughput": 17.5402,
"latency": 61.0459,
"max-throughput": 5.70119e+07,
"min-throughput": 1.63811e+07
},
"workload-huge.wrf": {
"duration": 3.81576e+09,
"iterations": 5.6e+07,
"reciprocal-throughput": 38.2111,
"latency": 98.0659,
"max-throughput": 2.61704e+07,
"min-throughput": 1.01972e+07
}
}
Checked on i686-linux-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>