gpu: generate pure host code if the schedule does not exhibit parallelism

Ideally, we should use a cost model to determine whether to
generate GPU code or CPU code. However, if the computed schedule
does not exhibit any parallelism, then it is clear that we
should not generate GPU code, so generate pure host code instead.
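
The decision described above can be pictured with the following
minimal sketch. Everything in it is hypothetical: the toy_schedule
structure, the toy_target enum and both functions are made up for
illustration and do not correspond to the data structures or
functions changed by this commit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a schedule: one flag per schedule
 * dimension saying whether that dimension was found to be parallel.
 * Illustration only, not the representation used by the tool.
 */
struct toy_schedule {
	size_t n_dim;
	const bool *parallel;
};

enum toy_target { TOY_TARGET_HOST, TOY_TARGET_GPU };

/* Return true if at least one schedule dimension is parallel. */
static bool toy_has_parallelism(const struct toy_schedule *s)
{
	for (size_t i = 0; i < s->n_dim; ++i)
		if (s->parallel[i])
			return true;
	return false;
}

/* Fall back to pure host code when nothing can run in parallel.
 * A cost model could refine the TOY_TARGET_GPU case later.
 */
enum toy_target toy_choose_target(const struct toy_schedule *s)
{
	return toy_has_parallelism(s) ? TOY_TARGET_GPU : TOY_TARGET_HOST;
}
```

Note that a single parallel dimension is enough to keep GPU code
generation enabled in this sketch; only the complete absence of
parallelism selects the host target.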

The absence of parallelism is detected for the PolyBench benchmarks
cholesky and ludcmp because of memory-based dependences induced
by a scalar variable. The symm benchmark also has such a scalar,
but it nevertheless exhibits some parallelism and therefore
does not trigger the generation of pure host code with this
commit.
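
As an illustration of how a scalar can induce memory-based
dependences, consider the sketch below. It is a hypothetical loop
nest, not code taken from cholesky, ludcmp or symm, and both
function names are made up. In the first function every iteration
of the outer loop writes and reads the same location x, so an
analysis that takes memory-based dependences into account has to
order the outer iterations, while the inner loop is a sequential
accumulation. The second function gives each outer iteration its
own element of a scratch array, which removes the dependences
between outer iterations.

```c
#include <stddef.h>

/* The scalar x is a single memory location shared by all iterations
 * of the i loop, so consecutive i iterations are linked by
 * write-after-write and write-after-read dependences on x.
 */
void row_sums_shared_scalar(size_t n, const double *a, double *out)
{
	double x;

	for (size_t i = 0; i < n; ++i) {
		x = 0.0;
		for (size_t j = 0; j < n; ++j)
			x += a[i * n + j];
		out[i] = x;
	}
}

/* Same computation with the scalar expanded into a caller-provided
 * scratch array x of length n.  Iteration i only touches x[i] and
 * out[i], so no dependence is carried by the i loop.
 */
void row_sums_expanded(size_t n, const double *a, double *x, double *out)
{
	for (size_t i = 0; i < n; ++i) {
		x[i] = 0.0;
		for (size_t j = 0; j < n; ++j)
			x[i] += a[i * n + j];
		out[i] = x[i];
	}
}
```

Both functions compute the same result; they differ only in
whether the outer loop can be considered parallel.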

Signed-off-by: Sven Verdoolaege <skimo@kotnet.org>