Move calling of special force functions
All special force provider algorithm functions are now collected
in computeSpecialForces(). This function is now called before
the wait for non-bondeds on the GPU. This allows for more overlap.
The only cost is potentially one extra force buffer reduction at
steps where the virial is needed, but with PME we already do that.
To achieve this, f->f_novirsum is now always set, as the
documentation already (incorrectly) stated.
This change also simplifies both do_force routines and clarifies
where special algorithms can be called, which could now actually
be anywhere in do_force().
Change-Id: I0711a379ed3c31838ede9e55c4cd5d0a95e967fd