copy entire array tile to shared memory

Until now, when copying data to or from shared memory, we always copied
exactly those elements that were going to be read, or that had been
written, inside the kernel. This could lead to complicated code being
generated for the copying, especially for stencil computations.

When reading from global memory into shared memory, we now copy the
entire global array tile to shared memory (under some conditions).
This may result in some extra elements being copied that will not be
used, but it should also result in simpler code and/or less divergence.
Signed-off-by: Sven Verdoolaege <skimo@kotnet.org>