kernel - Refactor struct vmstats and vm_zone
* These changes significantly improve the rate of simultaneous,
non-conflicting VM faults. On our 4-socket Opteron (48 cores, which
makes a great test case because its cache mastership stalls are so
expensive), the maximum concurrent VM fault rate increased from
~2.4M/sec to ~3.5M/sec, with no degradation after topping out.
* Refactor the fields in struct vmstats to separate mostly read-only
variables from frequently modified ones, reducing cache mastership
stalls.
* Remove the vm_shared_hit, vm_shared_count, and vm_shared_miss sysctl
statistics, eliminating their cache mastership stalls from the
critical path.
* Move the spinlock in vpgqueues to the base of the structure.
* Increase the vmstats slop (how large a negative value can accumulate
in the pcpu stats before it is rolled up).
* Fix cache mastership stalls in the zalloc() and zfree() paths by
consolidating the pcpu elements into their own cache-aligned structure
and giving each pcpu its own znalloc counter.
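A minimal C sketch of the layout ideas in the vmstats and vpgqueues
items, assuming 64-byte cache lines. The struct and field names are
hypothetical stand-ins, not the kernel's actual definitions: read-mostly
tuning values share the first cache line, the frequently updated
counters are pushed onto the next line with alignas, and the page-queue
header keeps its spinlock at offset 0 (presumably so the lock shares a
line with the queue fields it guards).

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines, typical of x86-64 (including Opteron). */
#define CACHE_LINE 64

/* Hypothetical stand-in for a kernel spinlock. */
struct spinlock_sketch {
    volatile unsigned int lock;
};

/*
 * Hypothetical vmstats-style layout: fields set at boot and mostly read
 * afterwards share the first cache line; counters updated on every
 * fault start on a separate line, so updating them does not steal
 * mastership of the line that readers of the limits are using.
 */
struct vmstats_sketch {
    /* Mostly read-only tuning values. */
    unsigned long v_page_count;
    unsigned long v_free_min;
    unsigned long v_free_target;

    /* Frequently modified counters, forced onto their own cache line. */
    alignas(CACHE_LINE) long v_free_count;
    long v_active_count;
    long v_inactive_count;
};

/* Hypothetical page-queue header with the spinlock at the base (offset 0). */
struct vpgqueue_sketch {
    struct spinlock_sketch spin;
    void *head;
    long lcnt;
};

/* Compile-time checks that the layout matches the comments above. */
static_assert(offsetof(struct vmstats_sketch, v_free_count) % CACHE_LINE == 0,
              "hot counters must start on a cache-line boundary");
static_assert(offsetof(struct vpgqueue_sketch, spin) == 0,
              "spinlock must sit at the base of the structure");
```

The static_assert checks fail the build if a later edit moves a hot
counter back onto the read-mostly line.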
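To illustrate what the vmstats slop bounds, here is a single-threaded C
sketch of per-cpu counters with rollup. The names, the slop value, and
the rule that only a negative excess forces an immediate rollup
(positive deltas wait for a periodic sync) are assumptions drawn from
the one-line description above, not the kernel's code.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical values chosen for illustration only. */
#define NCPUS_SKETCH 4
#define VMSTATS_SLOP_SKETCH 64

/* Global count read by every CPU; written only on rollup. */
static _Atomic long global_free_count;

/* Per-cpu pending adjustment; in a real kernel only its own CPU touches it. */
static long pcpu_free_delta[NCPUS_SKETCH];

/* Fold one CPU's pending delta into the global count. */
static void pcpu_rollup(int cpu)
{
    atomic_fetch_add(&global_free_count, pcpu_free_delta[cpu]);
    pcpu_free_delta[cpu] = 0;
}

/*
 * Adjust the free count from 'cpu'. Increments and small decrements stay
 * local; once the local delta drops below -VMSTATS_SLOP_SKETCH it is
 * rolled up, so the global value never overstates free pages by more
 * than NCPUS_SKETCH * VMSTATS_SLOP_SKETCH.
 */
static void free_count_adjust(int cpu, long n)
{
    assert(cpu >= 0 && cpu < NCPUS_SKETCH);
    pcpu_free_delta[cpu] += n;
    if (pcpu_free_delta[cpu] < -VMSTATS_SLOP_SKETCH)
        pcpu_rollup(cpu);
}

/*
 * Periodic housekeeping: push every pending delta, positive or not.
 * Single-threaded simulation; a real kernel would have each CPU roll up
 * its own slot.
 */
static void free_count_sync(void)
{
    for (int cpu = 0; cpu < NCPUS_SKETCH; cpu++)
        pcpu_rollup(cpu);
}
```

A larger slop lets a CPU absorb more page consumption before it writes
the shared counter, so the global cache line is dirtied less often; the
cost is a larger bound (NCPUS_SKETCH * VMSTATS_SLOP_SKETCH pages) on how
far the global count can overstate what is actually free.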
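A sketch of the per-cpu zone layout that the zalloc()/zfree() item
describes, again assuming 64-byte cache lines. struct zpcpu_sketch,
struct zone_sketch, and the free-list fast path are hypothetical; znalloc
is modelled here as a per-cpu count of allocations. The point is that
each CPU's fast-path state lives in its own cache-aligned element, so
CPUs allocating from the same zone never write to a shared line.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64      /* assumed cache-line size */
#define NCPUS_SKETCH 4     /* hypothetical CPU count */

/*
 * Hypothetical per-cpu zone state. alignas on the first member pads
 * each element to a full cache line, so neighboring CPUs never share one.
 */
struct zpcpu_sketch {
    alignas(CACHE_LINE) void *free_list;  /* this CPU's cached free items */
    long free_cnt;                        /* items on free_list */
    long znalloc;                         /* allocations made by this CPU */
};

struct zone_sketch {
    size_t item_size;                     /* mostly read-only */
    struct zpcpu_sketch pcpu[NCPUS_SKETCH];
};

static_assert(sizeof(struct zpcpu_sketch) % CACHE_LINE == 0,
              "each per-cpu element must fill whole cache lines");

/* Fast path: pop from this CPU's list and bump only its own counter. */
static void *zalloc_sketch(struct zone_sketch *z, int cpu)
{
    struct zpcpu_sketch *zp = &z->pcpu[cpu];
    void *item = zp->free_list;

    if (item == NULL)
        return NULL;                  /* slow path (refill) omitted */
    zp->free_list = *(void **)item;   /* next pointer lives in the free item */
    zp->free_cnt--;
    zp->znalloc++;
    return item;
}

/* Fast path: push onto this CPU's list; no shared state is written. */
static void zfree_sketch(struct zone_sketch *z, int cpu, void *item)
{
    struct zpcpu_sketch *zp = &z->pcpu[cpu];

    *(void **)item = zp->free_list;
    zp->free_list = item;
    zp->free_cnt++;
}

/* Total allocations, summed on demand because readers are rare. */
static long zone_total_allocs(const struct zone_sketch *z)
{
    long total = 0;

    for (int cpu = 0; cpu < NCPUS_SKETCH; cpu++)
        total += z->pcpu[cpu].znalloc;
    return total;
}
```

Summing znalloc in zone_total_allocs() makes the (rare) read slightly
more expensive in exchange for contention-free updates on the hot path.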