Fix issues with the scheduler that were causing unnecessary reschedules
between tightly coupled processes as well as inefficient reschedules under
heavy loads.
The basic problem is that a process entering the kernel is 'passively
released', meaning its thread priority is left at TDPRI_USER_NORM. The
thread priority is only raised to TDPRI_KERN_USER if the thread switches
out. This has the side effect of forcing an LWKT reschedule whenever any
other user process wakes up from a blocked condition in the kernel,
regardless of its user priority, because its LWKT thread sits at the
higher TDPRI_KERN_USER priority. This resulted in significant switching
cavitation under load.
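
To make the mechanics concrete, here is a minimal user-space sketch of
the passive-release problem. Only the relative ordering of the two
priority levels matters; the struct layout and helper names are
illustrative stand-ins, not the actual kernel code:

    #include <stdio.h>

    /* Illustrative values; only the relative ordering matters here. */
    #define TDPRI_USER_NORM  6   /* normal user LWKT priority */
    #define TDPRI_KERN_USER 10   /* kernel-side priority for user threads */

    struct thread {
        const char *name;
        int         pri;         /* current LWKT priority */
    };

    /*
     * Passive release: on kernel entry the thread keeps its user
     * priority.  It is bumped to TDPRI_KERN_USER only if it actually
     * blocks and switches out.
     */
    static void kernel_entry(struct thread *td)
    {
        td->pri = TDPRI_USER_NORM;      /* left at user priority */
    }

    static void switch_out(struct thread *td)
    {
        td->pri = TDPRI_KERN_USER;      /* raised only when blocking */
    }

    /*
     * The problem: a thread that blocked and woke up in the kernel
     * outranks the passively released current thread and forces a
     * reschedule no matter what its user priority is.
     */
    static int forces_resched(struct thread *cur, struct thread *woken)
    {
        return woken->pri > cur->pri;
    }

    int main(void)
    {
        struct thread cur   = { "current", 0 };
        struct thread other = { "woken",   0 };

        kernel_entry(&cur);     /* current thread enters the kernel */
        kernel_entry(&other);
        switch_out(&other);     /* other thread blocked, now wakes up */

        printf("reschedule forced: %s\n",
               forces_resched(&cur, &other) ? "yes" : "no");
        return 0;
    }
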
There is a twist here: we do not want to starve threads running in the
kernel on behalf of a very low priority user process, because doing so
can deadlock the namecache or other kernel elements that sleep with
lockmgr locks held. In addition, the 'other' LWKT thread might be
associated with a much higher priority user process that we *DO* in fact
want to give cpu to.
The solution is elegant. First, do not force an LWKT reschedule for the
above case. Second, force an LWKT reschedule on every hard clock. Remove
all the old hacks. That's it!
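
A minimal sketch of the second half of the fix, assuming a hypothetical
per-cpu flag set by the clock interrupt and checked on the way back to
user mode (the names here are invented for illustration, not the real
hardclock() internals):

    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical per-cpu state; the kernel keeps the real thing
       in its per-cpu data, not a global. */
    static bool lwkt_resched_wanted;

    /*
     * Hard clock: unconditionally request an LWKT reschedule.  Other
     * runnable LWKT threads now get the cpu at tick boundaries instead
     * of at every syscall enter/return.
     */
    static void hardclock_tick(void)
    {
        lwkt_resched_wanted = true;
    }

    /* Trap/syscall return path: switch only if a tick asked us to. */
    static void userret(void)
    {
        if (lwkt_resched_wanted) {
            lwkt_resched_wanted = false;
            printf("switch: give other LWKT threads the cpu\n");
        } else {
            printf("return to user mode without switching\n");
        }
    }

    int main(void)
    {
        userret();          /* no tick yet: no forced reschedule */
        hardclock_tick();   /* clock interrupt requests one */
        userret();          /* now we switch */
        return 0;
    }
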
The result is that the current thread is allowed to return to user
mode and run until the next hard clock even if other LWKT threads (running
on behalf of a user process) are runnable. Pure kernel LWKT threads still
get absolute priority, of course. When the hard clock occurs the other LWKT
threads get the cpu and at the end of that whole mess most of those
LWKT threads will be trying to return to user mode and the user scheduler
will be able to select the best one. Doing this on a hardclock boundary
prevents cavitation from occurring at the syscall enter and return boundary.
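
For illustration, a tiny model of that final selection step, where the
user scheduler picks the best of the threads trying to return to user
mode (the queue and pick function are invented for this sketch; the
real user scheduler is considerably more involved):

    #include <stddef.h>
    #include <stdio.h>

    /* Threads attempting to return to user mode after the tick. */
    struct uthread {
        const char *name;
        int         upri;   /* user priority: lower is better */
    };

    /* Pick the thread with the best user priority; the rest simply
       remain queued until a later tick. */
    static const struct uthread *
    pick_best(const struct uthread *q, size_t n)
    {
        const struct uthread *best = NULL;
        for (size_t i = 0; i < n; i++)
            if (best == NULL || q[i].upri < best->upri)
                best = &q[i];
        return best;
    }

    int main(void)
    {
        const struct uthread q[] = {
            { "editor",  2 },   /* interactive, high user priority */
            { "batch",  20 },   /* nice'd batch job */
            { "daemon", 10 },
        };

        printf("user scheduler selects: %s\n",
               pick_best(q, sizeof(q) / sizeof(q[0]))->name);
        return 0;
    }
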
With this change the TDF_NORESCHED and PNORESCHED flags and their associated
code hacks have also been removed, along with lwkt_checkpri_self(), which
is no longer needed.