gpu backend: create single kernel for entire subtree without permutable bands