kernel - Refactor cpu localization for VM page allocations (3)
* Instead of iterating the cpus in the mask starting at cpu #0, iterate
starting at mycpu to the end, then from 0 to mycpu - 1.
This keeps random masked wakeups from favoring lower-numbered cpus.
* The user process scheduler (usched_dfly) was favoring lower-numbered
cpus due to a bug in the simple selection algorithm, so forked
processes were initially weighted improperly. A high fork or
fork/exec rate skewed the load across the cpus.
Fix this by correctly scanning cpus from the (scancpu) rover.
* For now, use a random 'previous' affinity for initially scheduling a
fork.
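
The wrap-around scan order described in the first item can be sketched
roughly as below. This is only an illustration, not the kernel code:
scan_mask_from() and its parameters are hypothetical names, and a plain
64-bit integer stands in for the kernel's cpu mask type.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not the kernel's implementation): return the
 * first cpu whose bit is set in mask, visiting cpus in the order
 * start, start + 1, ..., ncpus - 1, 0, 1, ..., start - 1.
 * Returns -1 if no bit below ncpus is set.
 *
 * Assumes 0 <= start < ncpus and 1 <= ncpus <= 64 so the shift
 * below stays within the width of uint64_t.
 */
static int
scan_mask_from(uint64_t mask, int start, int ncpus)
{
	int i;

	for (i = 0; i < ncpus; ++i) {
		/* Wrap past the last cpu back around to cpu 0. */
		int cpu = (start + i) % ncpus;

		if (mask & ((uint64_t)1 << cpu))
			return cpu;
	}
	return -1;
}
```

With cpus 1 and 5 set in the mask and 8 cpus, a scan starting at cpu 3
selects cpu 5, while a scan starting at cpu 6 wraps and selects cpu 1.
A scan that always started at cpu 0 would select cpu 1 every time,
which is the bias toward lower-numbered cpus described above.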