Add support for CUDA CC 6.0/6.1
This change adds build-system and kernel generator support for the
Pascal architectures announced so far (GP100: 6.0, GP104: 6.1) and
supported by the CUDA 8.0 compiler.
By default we now generate binary code for both sm_60 and sm_61 and,
given the considerable differences between the two, we also generate
PTX for both virtual architectures (compute_60 and compute_61). For now
we don't add CC 6.2 (GP102) compilation support, as we know nothing
about it yet.
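As an illustration of the flag pattern described above, here is a minimal sketch of how binary (SASS) plus PTX targets map onto nvcc `-gencode` arguments. The helper name `hypotheticalGencodeFlags` is invented for this example; the actual change lives in the CMake build system and may assemble its flags differently.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of the build system): for each compute
// capability, given as major*10+minor (e.g. 60, 61), emit one nvcc
// -gencode argument for the binary (SASS) code and one for the PTX of the
// matching virtual architecture.
std::vector<std::string> hypotheticalGencodeFlags(const std::vector<int>& ccs)
{
    std::vector<std::string> flags;
    for (int cc : ccs)
    {
        assert(cc > 0);
        const std::string n = std::to_string(cc);
        // Binary code for the real architecture, e.g. sm_60.
        flags.push_back("-gencode=arch=compute_" + n + ",code=sm_" + n);
        // PTX for the virtual architecture, e.g. compute_60; the driver can
        // JIT-compile it for GPUs that have no matching binary.
        flags.push_back("-gencode=arch=compute_" + n + ",code=compute_" + n);
    }
    return flags;
}
```

Calling `hypotheticalGencodeFlags({60, 61})` yields four arguments: `sm_60` and `compute_60` code for `arch=compute_60`, followed by the same pair for 61.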
On the kernel generation side, given its larger register file, CC 6.0
gets the "wider" 128 threads/block kernels enabled, while on 6.1 and
later the 64 threads/block kernels remain.
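The block-size choice above can be summarized as a simple mapping. This is a minimal sketch only: `hypotheticalThreadsPerBlock` is an invented name, the real selection happens in the kernel generator rather than in a runtime function like this, and the value returned for pre-6.0 devices is an assumption, since this change does not describe them.

```cpp
#include <cassert>

// Hypothetical mapping from compute capability to nonbonded kernel block
// size, following the description above: CC 6.0 (GP100) gets the "wider"
// 128 threads/block kernels, CC 6.1 and later keep 64 threads/block.
// ASSUMPTION: pre-6.0 architectures are out of scope for this change;
// returning 64 for them is purely a placeholder in this sketch.
int hypotheticalThreadsPerBlock(int ccMajor, int ccMinor)
{
    assert(ccMajor > 0 && ccMinor >= 0);
    if (ccMajor == 6 && ccMinor == 0)
    {
        // Larger register file: the wider kernel variant fits.
        return 128;
    }
    return 64;
}
```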
Some macros that were incorrectly left behind by the adbada4 fix had
to be removed from the CUDA host code because they caused duplicate
definitions.
Change-Id: I7f465651125fe135255ce5c84db644c62caeea6b