Remove duplicate macros and enable -Wundef in CUDA
Commit adbada4 left behind a check of the __CUDA_ARCH__ macro, which is
undefined on the host side, in nbnxn_cuda.cu, as well as some unused
macros. This change cleans up these leftovers.
Additionally, this change enables -Wundef for CUDA files, but only for
CUDA >= 7.5, as earlier versions ship a header that uses a macro
without checking that it is defined.
As the host-side code still includes kernels that test __CUDA_ARCH__,
which is not defined in the host compilation pass, we need to create our
own copy of this arch macro in a new header that is meant to contain
CUDA arch-specific code.
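A minimal sketch of what such a copy of the arch macro could look like.
The names EXAMPLE_PTX_ARCH and example_use_arch_specific_path, and the
300 threshold, are hypothetical and used only for illustration; they are
not taken from this change.

```cpp
// Hypothetical arch header sketch; all EXAMPLE_/example_ names are assumptions.
#ifndef __CUDA_ARCH__
// Host compilation pass: __CUDA_ARCH__ is not defined, so fall back to 0.
#    define EXAMPLE_PTX_ARCH 0
#else
// Device compilation pass: forward the value set by the compiler.
#    define EXAMPLE_PTX_ARCH __CUDA_ARCH__
#endif

// EXAMPLE_PTX_ARCH is always defined, so this test does not trigger -Wundef
// in either pass, unlike a bare "#if __CUDA_ARCH__ >= 300" on the host.
#if EXAMPLE_PTX_ARCH >= 300
constexpr bool example_use_arch_specific_path = true;
#else
constexpr bool example_use_arch_specific_path = false;
#endif
```

Code that previously tested __CUDA_ARCH__ directly would then test the
copy instead, so both compilation passes see a defined macro.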
Refs #1855
Change-Id: Ibab891eaee11ea8952a67125f7ac4cc620b88d1c