Public Git Hosting - dragonfly.git/commit

commit	caf661fcf8eadefbd5a83af42a3fcb41ac93c805
author	Matthew Dillon <dillon@apollo.backplane.com>
	Fri, 20 Oct 2023 06:00:44 +0000 (19 23:00 -0700)
committer	Matthew Dillon <dillon@apollo.backplane.com>
	Fri, 20 Oct 2023 06:00:44 +0000 (19 23:00 -0700)
tree	a0dfe2843978adccdd40f7b3d534cfca4000e348
parent	34fb48c236fd17fbe558c7b2cf21b4e50f38153e

hammer2 - Try to reduce no-activity stalls during complex flushes

* Hammer2 keeps track of directory dependencies to maintain
  meta-data consistency at flush boundaries.  This can cause
  problems when heavy simultaneous frontend activity exceeds the
  dirty buffer limits and the frontend stalls in 'h2memw'.

  Processes stalled on the frontend are not supposed to be holding
  vnodes, but there do appear to be cases where the backend flusher
  is not able to immediately acquire some vnode locks during the
  flush.  When this happens the backend skips that vnode and also
  introduces some static delays (rather than becoming cpu-bound).
  The backend ultimately restarts the flush and tries again.

  Situations can develop where the backend also stalls in a
  sequence of 'h2syndel' tsleep delays, resulting in zero
  cpu activity (frontend is stalled in 'h2memw'), and zero
  disk activity (backend is also stalled) for a short period of
  time.

* This problem does not lead to permanent deadlocks, however.
  H2 is always able to recover.

* Rearrange an 'h2syndel' tsleep() call in the backend flusher.
  Instead of calling tsleep() once for each vnode that fails to
  lock, we now finish flushing the remaining vnodes, then try to
  wake up processes blocked in 'h2memw' on the frontend, and THEN
  sleep for a few ticks before restarting.

  This is an attempt to close the gap causing these periods of
  no-activity.
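
  The reordering can be illustrated with a small userspace model.
  This is only a sketch and is not the hammer2 code: the types
  sim_vnode and sim_stats and both function names are hypothetical,
  the lock attempt is reduced to a flag, and the sleep and wakeup are
  only counted.  It also assumes the wakeup and the single sleep
  happen only when at least one vnode could not be locked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vnode; not the hammer2 structure. */
struct sim_vnode {
    bool lock_busy;  /* true: a non-blocking lock attempt would fail */
    bool flushed;    /* true: already written out by the flusher */
};

/* Counters for one pass of the simulated backend flusher. */
struct sim_stats {
    int sleeps;      /* simulated 'h2syndel' delays */
    int wakeups;     /* simulated wakeups of 'h2memw' waiters */
    int flushed;     /* vnodes flushed during this pass */
};

/* Old ordering: delay once for every vnode whose lock attempt fails. */
static struct sim_stats
flush_pass_sleep_per_failure(struct sim_vnode *v, size_t n)
{
    struct sim_stats st = {0, 0, 0};

    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (v[i].flushed)
            continue;
        if (v[i].lock_busy) {
            st.sleeps++;         /* stall here; later vnodes wait */
            continue;
        }
        v[i].flushed = true;
        st.flushed++;
    }
    return st;
}

/*
 * New ordering: flush every vnode that can be locked, then wake the
 * frontend, and only then delay once before the caller restarts.
 */
static struct sim_stats
flush_pass_sleep_once(struct sim_vnode *v, size_t n)
{
    struct sim_stats st = {0, 0, 0};
    bool any_failed = false;

    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (v[i].flushed)
            continue;
        if (v[i].lock_busy) {
            any_failed = true;   /* remember the failure, keep going */
            continue;
        }
        v[i].flushed = true;
        st.flushed++;
    }
    if (any_failed) {
        st.wakeups++;            /* frontend may be able to progress */
        st.sleeps++;             /* one short delay before restarting */
    }
    return st;
}
```

  In this model, a pass over four vnodes with two busy locks takes two
  delays under the old ordering and one delay plus one frontend wakeup
  under the new ordering, while both flush the same two vnodes.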