1 jbd2: fix race between jbd2_journal_try_to_free_buffers() and jbd2 commit transaction
3 From: Mingming Cao <cmm@us.ibm.com>
5 journal_try_to_free_buffers() could race with jbd commit transaction
6 when the later is holding the buffer reference while waiting for the
7 data buffer to flush to disk. If the caller of
8 journal_try_to_free_buffers() request tries hard to release the buffers,
9 it will treat the failure as error and return back to the caller. We
10 have seen the directo IO failed due to this race. Some of the caller of
11 releasepage() also expecting the buffer to be dropped when passed with
12 GFP_KERNEL mask to the releasepage()->journal_try_to_free_buffers().
14 With this patch, if the caller is passing the GFP_KERNEL to indicating
15 this call could wait, in case of try_to_free_buffers() failed, let's
16 waiting for journal_commit_transaction() to finish commit the current
17 committing transaction , then try to free those buffers again with
20 Signed-off-by: Mingming Cao <cmm@us.ibm.com>
21 Reviewed-by: Badari Pulavarty <pbadari@us.ibm.com>
22 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
24 fs/jbd2/transaction.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++----
25 1 file changed, 58 insertions(+), 4 deletions(-)
27 Index: linux-2.6.26-rc6/fs/jbd2/transaction.c
28 ===================================================================
29 --- linux-2.6.26-rc6.orig/fs/jbd2/transaction.c 2008-06-17 10:43:20.000000000 -0700
30 +++ linux-2.6.26-rc6/fs/jbd2/transaction.c 2008-06-17 10:43:27.000000000 -0700
31 @@ -41,7 +41,6 @@ static void __jbd2_journal_temp_unlink_b
32 * new transaction and we can't block without protecting against other
33 * processes trying to touch the journal while it is in transition.
35 - * Called under j_state_lock
38 static transaction_t *
39 @@ -1656,12 +1655,43 @@ out:
44 + * jbd2_journal_try_to_free_buffers() could race with
45 + * jbd2_journal_commit_transaction(). The later might still hold the
46 + * reference count to the buffers when inspecting them on
47 + * t_syncdata_list or t_locked_list.
49 + * jbd2_journal_try_to_free_buffers() will call this function to
50 + * wait for the current transaction to finish syncing data buffers, before
51 + * try to free that buffer.
53 + * Called with journal->j_state_lock hold.
55 +static void jbd2_journal_wait_for_transaction_sync_data(journal_t *journal)
57 + transaction_t *transaction = NULL;
60 + spin_lock(&journal->j_state_lock);
61 + transaction = journal->j_committing_transaction;
64 + spin_unlock(&journal->j_state_lock);
68 + tid = transaction->t_tid;
69 + spin_unlock(&journal->j_state_lock);
70 + jbd2_log_wait_commit(journal, tid);
74 * int jbd2_journal_try_to_free_buffers() - try to free page buffers.
75 * @journal: journal for operation
76 * @page: to try and free
77 - * @unused_gfp_mask: unused
78 + * @gfp_mask: we use the mask to detect how hard should we try to release
79 + * buffers. If __GFP_WAIT and __GFP_FS is set, we wait for commit code to
80 + * release the buffers.
83 * For all the buffers on this page,
84 @@ -1690,9 +1720,11 @@ out:
85 * journal_try_to_free_buffer() is changing its state. But that
86 * cannot happen because we never reallocate freed data as metadata
87 * while the data is part of a transaction. Yes?
89 + * Return 0 on failure, 1 on success
91 int jbd2_journal_try_to_free_buffers(journal_t *journal,
92 - struct page *page, gfp_t unused_gfp_mask)
93 + struct page *page, gfp_t gfp_mask)
95 struct buffer_head *head;
96 struct buffer_head *bh;
97 @@ -1708,7 +1740,8 @@ int jbd2_journal_try_to_free_buffers(jou
99 * We take our own ref against the journal_head here to avoid
100 * having to add tons of locking around each instance of
101 - * jbd2_journal_remove_journal_head() and jbd2_journal_put_journal_head().
102 + * jbd2_journal_remove_journal_head() and
103 + * jbd2_journal_put_journal_head().
105 jh = jbd2_journal_grab_journal_head(bh);
107 @@ -1721,7 +1754,28 @@ int jbd2_journal_try_to_free_buffers(jou
110 } while ((bh = bh->b_this_page) != head);
112 ret = try_to_free_buffers(page);
115 + * There are a number of places where jbd2_journal_try_to_free_buffers()
116 + * could race with jbd2_journal_commit_transaction(), the later still
117 + * holds the reference to the buffers to free while processing them.
118 + * try_to_free_buffers() failed to free those buffers. Some of the
119 + * caller of releasepage() request page buffers to be dropped, otherwise
120 + * treat the fail-to-free as errors (such as generic_file_direct_IO())
122 + * So, if the caller of try_to_release_page() wants the synchronous
123 + * behaviour(i.e make sure buffers are dropped upon return),
124 + * let's wait for the current transaction to finish flush of
125 + * dirty data buffers, then try to free those buffers again,
126 + * with the journal locked.
128 + if (ret == 0 && (gfp_mask & __GFP_WAIT) && (gfp_mask & __GFP_FS)) {
129 + jbd2_journal_wait_for_transaction_sync_data(journal);
130 + ret = try_to_free_buffers(page);