PME force gathering - CUDA kernel + unit tests