NVIDIA Volta performance tweaks
commit bb4128daf630719a670dc795b2cf2cc937c2c8f1
author Szilárd Páll <pall.szilard@gmail.com>
Mon, 4 Sep 2017 15:26:59 +0000 (4 17:26 +0200)
committer Mark Abraham <mark.j.abraham@gmail.com>
Mon, 11 Sep 2017 15:09:24 +0000 (11 17:09 +0200)
tree 1044226857c3218672105044f2574f9465e93d4c
parent 149c6633b5f9372a8b0143888b014ba2de411fce

Removed the ballot syncs and replaced all computed masks with the full
warp mask (as all branches in question are warp-synchronous).
This improves performance by 7-12%.
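
The change described above can be illustrated with a minimal, self-contained
sketch. It is not the code from nbnxn_cuda_kernel.cuh; the names
kFullWarpMask, warpSumBallotMask, warpSumFullMask and reduceKernel are
hypothetical, and the sketch assumes CUDA 9 or later and a launch in which all
32 lanes of the warp reach the shuffles together (the warp-synchronous case
the message refers to). The first variant computes its mask with a ballot; the
second passes the constant full-warp mask directly.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical name: a mask with all 32 lanes of the warp set.
constexpr unsigned int kFullWarpMask = 0xffffffffu;

// Before-style variant: the participation mask is computed with a ballot
// and then passed to every shuffle.
__device__ float warpSumBallotMask(float v)
{
    const unsigned int mask = __ballot_sync(kFullWarpMask, true);
    for (int offset = 16; offset > 0; offset >>= 1)
    {
        v += __shfl_down_sync(mask, v, offset);
    }
    return v;
}

// After-style variant: no ballot; the full-warp mask is used directly.
// This is only valid when every lane of the warp executes the shuffle.
__device__ float warpSumFullMask(float v)
{
    for (int offset = 16; offset > 0; offset >>= 1)
    {
        v += __shfl_down_sync(kFullWarpMask, v, offset);
    }
    return v;
}

// Launched with exactly one warp; lane 0 ends up holding the full sum.
__global__ void reduceKernel(const float* in, float* out)
{
    const float v = in[threadIdx.x];
    const float a = warpSumBallotMask(v);
    const float b = warpSumFullMask(v);
    if (threadIdx.x == 0)
    {
        out[0] = a;
        out[1] = b;
    }
}

int main()
{
    float hIn[32];
    float hOut[2] = { 0.0f, 0.0f };
    float expected = 0.0f;
    for (int i = 0; i < 32; i++)
    {
        hIn[i] = static_cast<float>(i + 1);
        expected += hIn[i];
    }

    float* dIn  = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, sizeof(hIn));
    cudaMalloc(&dOut, sizeof(hOut));
    cudaMemcpy(dIn, hIn, sizeof(hIn), cudaMemcpyHostToDevice);

    reduceKernel<<<1, 32>>>(dIn, dOut);
    cudaMemcpy(hOut, dOut, sizeof(hOut), cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dOut);

    printf("ballot mask: %g, full mask: %g, expected: %g\n", hOut[0], hOut[1], expected);
    // Both variants should agree with the host-side sum.
    return (hOut[0] == expected && hOut[1] == expected) ? 0 : 1;
}
```

Both variants should produce the same sum here, because the ballot in the
first one evaluates to the full mask whenever the whole warp is active. The
second variant simply drops that extra warp-level instruction, which is the
kind of saving the message describes.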

Change-Id: I769d6d8f0d171eb528d30868d567624d5e246dbf
src/gromacs/mdlib/nbnxn_cuda/nbnxn_cuda_kernel.cuh