gpu.c: remove_local_accesses: restrict computation to useful part of dataflow

remove_local_accesses performs some computations on the "external"
dataflow dependences that in the end are used to restrict the given
set of accesses. If there are many kernels, then a lot of this
computation is needlessly repeated over all those kernels.

Restrict the computation to that part of the dataflow dependences
that can actually influence the given set of accesses.

The computation can probably be optimized a lot further, but
this should be a reasonable first step.

Signed-off-by: Sven Verdoolaege <skimo@kotnet.org>