block: Fix hangs in synchronous APIs with iothreads
commit 4482258130f8a54ade448a483599888570e73e92
author Kevin Wolf <kwolf@redhat.com>
Mon, 7 Jan 2019 12:02:48 +0000 (13:02 +0100)
committer Michael Roth <mdroth@linux.vnet.ibm.com>
Tue, 30 Jul 2019 20:44:05 +0000 (15:44 -0500)
tree 730402e3f6e7a8d0db355a34c16115e4b481bdd9
parent 41dd30ff634a3fc8892480881d1abc76daeb5e95

In the block layer, synchronous APIs are often implemented by creating a
coroutine that calls the asynchronous coroutine-based implementation and
then waiting for completion with BDRV_POLL_WHILE().

For this to work with iothreads (more specifically, when the synchronous
API is called in a thread that is not the home thread of the block
device, so that the coroutine will run in a different thread), we must
make sure to call aio_wait_kick() at the end of the operation. Many
places are missing this, so that BDRV_POLL_WHILE() keeps hanging even if
the condition has long become false.
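
The wait-and-kick pattern described above can be illustrated with a small
pthread model. This is only a sketch and not QEMU code: ModelOp, model_kick(),
model_entry() and model_sync_call() are hypothetical names, a condition
variable stands in for the waiter's event loop, and a plain thread stands in
for the iothread that runs the coroutine.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy model only: none of these names are QEMU APIs. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  wakeup;  /* stands in for waking the waiter's event loop */
    atomic_bool     done;    /* the condition the waiter polls on */
    int             ret;     /* result produced in the "home" thread */
} ModelOp;

/* Rough analogue of aio_wait_kick(): make the waiter re-check its condition. */
static void model_kick(ModelOp *op)
{
    pthread_mutex_lock(&op->lock);
    pthread_cond_broadcast(&op->wakeup);
    pthread_mutex_unlock(&op->lock);
}

/* Rough analogue of the coroutine entry stub running in another thread. */
static void *model_entry(void *opaque)
{
    ModelOp *op = opaque;

    op->ret = 42;                   /* store the result first */
    atomic_store(&op->done, true);  /* then make the polled condition false */
    model_kick(op);                 /* without this, the waiter may sleep forever */
    return NULL;
}

/* Rough analogue of a synchronous wrapper called outside the home thread. */
static int model_sync_call(void)
{
    ModelOp op;
    pthread_t home;
    int ret;

    pthread_mutex_init(&op.lock, NULL);
    pthread_cond_init(&op.wakeup, NULL);
    atomic_init(&op.done, false);
    op.ret = -1;

    if (pthread_create(&home, NULL, model_entry, &op) != 0) {
        return -1;
    }

    /* Rough analogue of BDRV_POLL_WHILE(): sleep until kicked, then re-check. */
    pthread_mutex_lock(&op.lock);
    while (!atomic_load(&op.done)) {
        pthread_cond_wait(&op.wakeup, &op.lock);
    }
    pthread_mutex_unlock(&op.lock);

    pthread_join(home, NULL);
    ret = op.ret;
    pthread_cond_destroy(&op.wakeup);
    pthread_mutex_destroy(&op.lock);
    return ret;
}
```

In this model the waiter only re-evaluates its condition when it is woken, so
dropping the model_kick() call from model_entry() can leave it blocked even
though done is already true.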

Note that bdrv_dec_in_flight() involves an aio_wait_kick() call. This
corresponds to the BDRV_POLL_WHILE() in the drain functions, but it is
generally not enough for most other operations because they haven't set
the return value in the coroutine entry stub yet. To avoid race
conditions there, we need to kick after setting the return value.
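
The ordering problem can be replayed deterministically in a single-threaded
model. This is a sketch under assumptions, not the actual entry stub:
ModelWaiter, model_kick(), model_hangs() and the MODEL_NOT_DONE sentinel are
illustrative names, and the waiter is reduced to a flag that says whether it
is still asleep.

```c
#include <limits.h>
#include <stdbool.h>

/* Toy model only: the names and the sentinel value are illustrative. */
#define MODEL_NOT_DONE INT_MAX

typedef struct {
    int  ret;     /* MODEL_NOT_DONE until the entry stub stores a result */
    bool asleep;  /* true while the waiter is blocked in its event loop */
} ModelWaiter;

/*
 * A kick wakes the waiter, which re-checks its condition and goes back
 * to sleep only if the result is still missing.
 */
static void model_kick(ModelWaiter *w)
{
    w->asleep = (w->ret == MODEL_NOT_DONE);
}

/* Returns true if the waiter ends up stuck for the given ordering. */
static bool model_hangs(bool kick_after_ret)
{
    ModelWaiter w = { .ret = MODEL_NOT_DONE, .asleep = true };

    model_kick(&w);      /* early kick, as from dropping the in-flight
                          * counter: result not stored yet, waiter
                          * re-checks and sleeps again */
    w.ret = 0;           /* entry stub stores the return value */
    if (kick_after_ret) {
        model_kick(&w);  /* the additional kick after setting the value */
    }
    return w.asleep;
}
```

Here model_hangs(false) reports a stuck waiter because the only kick arrives
before the return value is set, while model_hangs(true) does not.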

The race window is small enough that the problem doesn't usually surface
in the common path. However, it does surface and causes easily
reproducible hangs if the operation can return early before even calling
bdrv_inc/dec_in_flight, which many of them do (trivial error or no-op
success paths).

The bug in bdrv_truncate(), bdrv_check() and bdrv_invalidate_cache() is
slightly different: These functions even neglected to schedule the
coroutine in the home thread of the node. This avoids the hang, but is
obviously wrong, too. Fix those to schedule the coroutine in the right
AioContext in addition to adding aio_wait_kick() calls.
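
The difference between running the work in the caller's thread and scheduling
it in the node's home thread can be sketched with the following pthread model.
It is an illustration only: ModelNode, model_job(), model_home_thread() and
model_run() are hypothetical names, and a dedicated thread with a one-slot job
flag stands in for the node's AioContext.

```c
#include <pthread.h>
#include <stdbool.h>

/* Toy model only: none of these names are QEMU APIs. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    pthread_t       home_id;      /* recorded by the home thread itself */
    bool            home_ready;
    bool            job_posted;
    bool            job_done;
    bool            ran_in_home;  /* did the job run in the home thread? */
} ModelNode;

/* The operation itself; the caller must hold n->lock. */
static void model_job(ModelNode *n)
{
    n->ran_in_home = pthread_equal(pthread_self(), n->home_id) != 0;
    n->job_done = true;
}

/* Stands in for the node's home thread and its event loop. */
static void *model_home_thread(void *opaque)
{
    ModelNode *n = opaque;

    pthread_mutex_lock(&n->lock);
    n->home_id = pthread_self();
    n->home_ready = true;
    pthread_cond_broadcast(&n->cond);
    while (!n->job_posted) {
        pthread_cond_wait(&n->cond, &n->lock);
    }
    if (!n->job_done) {
        model_job(n);                  /* run the job in the home thread */
    }
    pthread_cond_broadcast(&n->cond);  /* kick the waiting caller */
    pthread_mutex_unlock(&n->lock);
    return NULL;
}

/* Returns true if the job ended up running in the home thread. */
static bool model_run(bool schedule_in_home)
{
    ModelNode n;
    pthread_t t;

    pthread_mutex_init(&n.lock, NULL);
    pthread_cond_init(&n.cond, NULL);
    n.home_ready = n.job_posted = n.job_done = n.ran_in_home = false;

    if (pthread_create(&t, NULL, model_home_thread, &n) != 0) {
        return false;
    }

    pthread_mutex_lock(&n.lock);
    while (!n.home_ready) {
        pthread_cond_wait(&n.cond, &n.lock);
    }
    if (!schedule_in_home) {
        model_job(&n);     /* wrong thread: runs in the caller, never hangs */
    }
    n.job_posted = true;   /* hand over to (or release) the home thread */
    pthread_cond_broadcast(&n.cond);
    while (!n.job_done) {
        pthread_cond_wait(&n.cond, &n.lock);
    }
    pthread_mutex_unlock(&n.lock);

    pthread_join(t, NULL);
    pthread_cond_destroy(&n.cond);
    pthread_mutex_destroy(&n.lock);
    return n.ran_in_home;
}
```

Running the job directly in the caller completes without any cross-thread
wakeup, which matches the observation that the unfixed functions did not hang,
but model_run(false) also shows that the work then executes outside the home
thread.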

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 4720cbeea1f42fd905fc69338fd42b191e58b412)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
block.c
block/block-backend.c
block/io.c
block/nbd-client.c
block/nvme.c
block/qcow2.c
block/qed.c
tests/Makefile.include
tests/test-block-iothread.c [new file with mode: 0644]