Move CUDA profiler triggers out of NBNXN