gpu: generate the AST from a single schedule tree

In particular, this means that we no longer perform nested AST generation.
The main advantage is that the entire AST generation input is available
at once, making the flow a lot easier to understand and significantly
simplifying debugging.

We are, however, currently limited to only generating a single instance
of a kernel launch since all the information about the kernel is computed
up front. We can also no longer set different iterator names for
different phases of the AST generation since there is only one phase left.

The synchronization may have changed slightly in this commit.
The original synchronization was difficult to follow and it did not
seem worth it to try and replicate it exactly, especially since
we will be removing some synchronization in later commits.

In order to keep this commit as small as possible, some code
that is no longer used is still kept in the source.
This code will be removed in subsequent commits.

Signed-off-by: Sven Verdoolaege <skimo@kotnet.org>