kernel - fine-grained namecache and partial vnode MPSAFE work
Namecache subsystem
* All vnode->v_flag modifications now use vsetflags() and vclrflags().
Because some flags are set and cleared by vhold()/vdrop(), which
do not require any locks to be held, all modifications must use atomic
ops.
* Clean up and revamp the namecache MPSAFE work. Namecache operations now
use a fine-grained MPSAFE locking model which loosely follows these
rules:
- lock ordering is child to parent. e.g. lock file, then lock parent
directory. This allows resolver recursions up the parent directory
chain.
- Downward-traversing namecache invalidations and path lookups will
unlock the parent (but leave it referenced) before attempting to
lock the child.
- Namecache hash table lookups utilize a per-bucket spinlock.
- vnode locks may be acquired while holding namecache locks, but not
vice versa. Vnodes are not destroyed until all namecache references
go away, but they can enter reclamation. Namecache lookups detect this
case and re-resolve to overcome the race. Namecache entries are not
destroyed while referenced.
* Remove vfs_token; the namecache MPSAFE model is now fully fine-grained.
* Revamp the namecache locking primitives (cache_lock/cache_unlock and
friends). Use atomic ops and nc_exlocks instead of nc_locktd, and
build in a request flag. This solves busy/tsleep races between the
lock holder and lock requester.
* Revamp namecache parent/child linkages. Instead of using vfs_token to
lock such operations we simply lock both child and parent namecache
entries. Hash table operations are also fully integrated with the
parent/child linking operations.
* The vnode->v_namecache list is locked via vnode->v_spinlock, which
is actually vnode->v_lock.lk_spinlock.
* Revamp cache_vref() and cache_vget(). The passed namecache entry must
be referenced and locked. Internals are simplified.
* Fix a deadlock by moving the call to _cache_hysteresis() to a
place where the current thread otherwise does not hold any locked
ncp's.
* Revamp nlookup() to follow the new namecache locking rules.
* Fix a number of places, e.g. in vfs/nfs/nfs_subs.c, where ncp->nc_parent
or ncp->nc_vp was being accessed with an unlocked ncp. nc_parent
and nc_vp accesses are only valid if the ncp is locked.
* Add the vfs.cache_mpsafe sysctl, which defaults to 0. This may be set
to 1 to enable MPSAFE namecache operations for [l,f]stat() and open()
system calls (for the moment).
VFS/VNODE subsystem
* Use a global spinlock, for now called vfs_spin, to manage the vnode_free_list.
Use vnode->v_spinlock (and vfs_spin) to manage vhold/vdrop ops and
to interlock v_auxrefs tests against vnode terminations.
* Integrate per-mount mnt_token and (for now) the MP lock into VOP_*()
and VFS_*() operations. This allows the MP lock to be shifted further
inward from the system calls, but we don't do it quite yet.
* HAMMER: VOP_GETATTR, VOP_READ, and VOP_INACTIVE are now MPSAFE. The
corresponding sysctls have been removed.
* FIFOFS: Required some MPSAFE work to enable the HAMMER changes
above, since HAMMER forwards vops for in-filesystem fifos to
fifofs.
* Add some debugging kprintf()s when certain MP races are averted, for
testing only.
MISC
* Add some assertions to the VM system.
* Document existing and newly MPSAFE code.
44 files changed: