AioContext: fix broken placement of event_notifier_test_and_clear
commit21a03d17f2edb1e63f7137d97ba355cc6f19d79f
authorPaolo Bonzini <pbonzini@redhat.com>
Tue, 21 Jul 2015 14:07:52 +0000 (21 16:07 +0200)
committerStefan Hajnoczi <stefanha@redhat.com>
Wed, 22 Jul 2015 11:41:40 +0000 (22 12:41 +0100)
tree37068ed33f88b2e8a8eae78a63d798f6f7aa4a30
parenteabc977973103527bbb8fed69c91cfaa6691f8ab
AioContext: fix broken placement of event_notifier_test_and_clear

event_notifier_test_and_clear must be called before processing events.
Otherwise, an aio_poll could "eat" the notification before the main
I/O thread invokes ppoll().  The main I/O thread then never wakes up.
This is an example of what could happen:

   i/o thread       vcpu thread                     worker thread
   ---------------------------------------------------------------------
   lock_iothread
   notify_me = 1
   ...
   unlock_iothread
                                                     bh->scheduled = 1
                                                     event_notifier_set
                    lock_iothread
                    notify_me = 3
                    ppoll
                    notify_me = 1
                    aio_dispatch
                     aio_bh_poll
                      thread_pool_completion_bh
                                                     bh->scheduled = 1
                                                     event_notifier_set
                     node->io_read(node->opaque)
                      event_notifier_test_and_clear
   ppoll
   *** hang ***

"Tracing" with qemu_clock_get_ns shows pretty much the same behavior as
in the previous bug, so there are no new tricks here---just stare more
at the code until it is apparent.

One could also use a formal model, of course.  The included one shows
this with three processes: notifier corresponds to a QEMU thread pool
worker, temporary_waiter to a VCPU thread that invokes aio_poll(),
waiter to the main I/O thread.  I would be happy to say that the
formal model found the bug for me, but actually I wrote it after the
fact.

This patch is a bit of a big hammer.  The next one optimizes it,
with help (this time for real rather than a posteriori :)) from
another, similar formal model.

Reported-by: Richard W. M. Jones <rjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Message-id: 1437487673-23740-6-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
aio-posix.c
aio-win32.c
async.c
docs/aio_notify_bug.promela [new file with mode: 0644]