gpu: only use shared memory if kernel will be mapped to more than one thread

Previously, we could in theory end up mapping data to shared memory
even if the kernel was going to be mapped to a single thread (in
particular, if the number of block dimensions is zero).  The check for
coalescing that we perform is meaningless in this case and may return 0,
i.e., report the accesses as not coalesced, making it look as though
mapping the data to shared memory would be worthwhile to enable
coalescing.

Simply do not even consider mapping anything to shared memory
for kernels that are going to be mapped to a single thread.

Signed-off-by: Sven Verdoolaege <skimo@kotnet.org>