copy entire array tile to shared memory
commit fbb71c74f1847f9f8b4c5132f76cbe61c94fbdac
author Sven Verdoolaege <skimo@kotnet.org>
Wed, 29 Jun 2011 13:33:12 +0000 (15:33 +0200)
committer Sven Verdoolaege <skimo@kotnet.org>
Thu, 7 Jul 2011 21:21:26 +0000 (23:21 +0200)
tree bdc153fe0170873c566a25e2bfc53e465e812cff
parent 512c9d1ae54f37ab2cfb7c4dc8e6e8d94429f99a

When copying data to/from shared memory, we used to copy exactly
those elements that were going to be read or that had been written
inside the kernel.  This could result in complicated copying code
being generated, especially for stencil computations.

When reading from global memory into shared memory, we now copy the
entire global array tile to shared memory (under some conditions).
This may copy some extra elements that are never used, but should
also result in simpler copying code and/or less thread divergence.
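
The trade-off can be illustrated with a minimal host-side sketch in
plain C.  It is not the code generated by this commit.  It assumes a
5-point stencil over a B x B block, so that the elements read form the
(B+2) x (B+2) bounding box minus its four corners.  The names B, H,
is_read, copy_exact and copy_tile are hypothetical.

```c
#include <stddef.h>

#define B 4        /* hypothetical thread-block size per dimension */
#define H (B + 2)  /* tile extent: block plus a one-element halo */

/* A 5-point stencil over the BxB block reads every tile element
 * except the four corners of the HxH bounding box. */
static int is_read(int r, int c)
{
    int on_row_edge = (r == 0 || r == H - 1);
    int on_col_edge = (c == 0 || c == H - 1);
    return !(on_row_edge && on_col_edge);
}

/* Old strategy: copy exactly the elements that are read.
 * The per-element condition is what complicates the copy loop.
 * Returns the number of elements copied. */
static int copy_exact(const float *global, size_t ld, float tile[H][H])
{
    int copied = 0;
    for (int r = 0; r < H; ++r)
        for (int c = 0; c < H; ++c)
            if (is_read(r, c)) {
                tile[r][c] = global[r * ld + c];
                ++copied;
            }
    return copied;
}

/* New strategy: copy the whole rectangular tile, corners included.
 * Returns the number of elements copied. */
static int copy_tile(const float *global, size_t ld, float tile[H][H])
{
    int copied = 0;
    for (int r = 0; r < H; ++r)
        for (int c = 0; c < H; ++c) {
            tile[r][c] = global[r * ld + c];
            ++copied;
        }
    return copied;
}
```

With B = 4, copy_exact moves 32 elements and copy_tile moves 36.  The
four extra corner elements are never used, but the copy loop no longer
needs a per-element condition.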

Signed-off-by: Sven Verdoolaege <skimo@kotnet.org>
cuda.c