Rewrite virtual loss support
Instead of directly manipulating the u stats, track the number
of parallel descents separately and use it during node evaluation.
Also, manipulate the counter atomically.
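
A minimal sketch of the idea, not the actual patch. The names
(struct tree_node, descents, descent_enter, descent_leave, node_eval,
the vloss weight and the 0.5 default for an empty node) are all
hypothetical, and C11 <stdatomic.h> is assumed here only for
illustration:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical per-node statistics; names are illustrative only. */
struct move_stats {
    int playouts;  /* finished playouts through this node */
    double value;  /* mean result in [0, 1] over those playouts */
};

struct tree_node {
    struct move_stats u;  /* real stats, never modified by virtual loss */
    atomic_int descents;  /* threads currently descending through here */
};

/* Called when a thread walks down through the node. */
static void descent_enter(struct tree_node *n)
{
    atomic_fetch_add(&n->descents, 1);
}

/* Called when the same thread is done with its playout. */
static void descent_leave(struct tree_node *n)
{
    int before = atomic_fetch_sub(&n->descents, 1);
    assert(before > 0);  /* every leave must pair with an enter */
    (void) before;       /* silence unused warning under NDEBUG */
}

/* Evaluate the node as if each pending descent were vloss lost playouts. */
static double node_eval(struct tree_node *n, int vloss)
{
    int pending = atomic_load(&n->descents) * vloss;
    int total = n->u.playouts + pending;
    if (total == 0)
        return 0.5;  /* assumed neutral value for an unvisited node */
    return n->u.value * n->u.playouts / total;
}
```

With 10 playouts at value 0.6 and two descents in flight (vloss = 1),
node_eval() sees 6 wins out of 12 and returns 0.5; once both threads
leave, it returns 0.6 again and the u stats were never touched.
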
On highly parallel machines, the amount of inconsistency introduced by
virtual loss was simply too large. Mostly, there was a tendency to
"forget" the virtual loss addition but still perform the removal,
swinging the results off balance. The effect is self-reinforcing
(a funny consequence of how the search works) and causes the tree to
quickly deteriorate into very long, not particularly wise branches.
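
One way such a "forgotten" addition can arise, assuming the old code
did plain unsynchronized read-modify-write on a shared field (an
assumption for illustration; the exact mechanism is not spelled out
here). The sketch below replays one unlucky two-thread schedule
step by step in a single thread; struct rmw and the helper names are
hypothetical:

```c
#include <assert.h>

/* One non-atomic "counter += delta", split into its load and store halves. */
struct rmw {
    int loaded;  /* value this thread saw when it read the counter */
};

static void rmw_load(struct rmw *op, const int *counter)
{
    op->loaded = *counter;
}

static void rmw_store(const struct rmw *op, int *counter, int delta)
{
    *counter = op->loaded + delta;
}

/* Replays the schedule for threads A and B; returns the final counter. */
static int replay_lost_addition(void)
{
    int counter = 0;
    struct rmw a, b;

    /* Both threads add their virtual loss, with overlapping updates. */
    rmw_load(&a, &counter);      /* A reads 0 */
    rmw_load(&b, &counter);      /* B reads 0 */
    rmw_store(&a, &counter, 1);  /* A writes 1 */
    rmw_store(&b, &counter, 1);  /* B writes 1, A's addition is lost */
    assert(counter == 1);        /* two additions, only one recorded */

    /* Both removals happen to run without overlapping. */
    rmw_load(&a, &counter);
    rmw_store(&a, &counter, -1); /* counter is 0 */
    rmw_load(&b, &counter);
    rmw_store(&b, &counter, -1); /* counter is -1 */

    return counter;
}
```

The counter ends at -1 instead of 0, so the node is left looking
better than it really is, which attracts further descents.
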
The change makes virtual loss support reliable (though I did not test
*too* rigorously whether it still works as well, game-performance-wise)
and doesn't seem to introduce any noticeable performance degradation.