ext4: handle unwritten or delalloc buffers before enabling data journaling

From: Daeho Jeong <daeho.jeong@samsung.com>

We already allocate delalloc blocks before switching the inode to
"per-file data journal" mode so that no delalloc blocks are left
unallocated, but a related problem with the "BH_Unwritten" state still
exists. For example, fallocate() puts buffers into the "BH_Unwritten"
state, and ext4_alloc_da_blocks() does not process such buffers. They
therefore stay unwritten after per-file data journaling is enabled and
can no longer be converted to written. If they are journaled and
eventually checkpointed, these unwritten buffers trigger a kernel panic
at the BUG_ON() below in submit_bh_wbc() when they are submitted:

    static int submit_bh_wbc(int rw, struct buffer_head *bh,...
        BUG_ON(buffer_unwritten(bh));

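The sequence described above can be approached from user space. The
following is a minimal sketch, not a verified reproducer: it preallocates
a range with fallocate(), which on extent-mapped ext4 files leaves
unwritten extents, and then requests per-file data journaling through
FS_IOC_SETFLAGS with FS_JOURNAL_DATA_FL. The helper names
journal_flag_apply() and prealloc_then_journal() are hypothetical and
exist only for illustration. Setting the flag is assumed to require
CAP_SYS_RESOURCE, and whether the panic is actually reached depends on
the kernel version and on later journal commit and checkpoint activity.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: set or clear the per-file data journaling bit. */
unsigned int journal_flag_apply(unsigned int flags, int enable)
{
	if (enable)
		return flags | FS_JOURNAL_DATA_FL;
	return flags & ~(unsigned int)FS_JOURNAL_DATA_FL;
}

/*
 * Hypothetical helper: preallocate len bytes, then ask for per-file
 * data journaling on the same file.
 * Returns 0 on success or the negative errno of the step that failed.
 */
int prealloc_then_journal(int fd, off_t len)
{
	int flags; /* the flags ioctls exchange an int-sized value */

	assert(fd >= 0 && len > 0);

	/* Mode 0 allocates blocks without writing data to them. */
	if (fallocate(fd, 0, 0, len) < 0)
		return -errno;

	if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0)
		return -errno;

	flags = (int)journal_flag_apply((unsigned int)flags, 1);

	/* This request is what ends up switching the inode's journal mode. */
	if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0)
		return -errno;

	return 0;
}
```

A caller would open a file on an ext4 mount with O_RDWR and pass the
descriptor together with a length such as 1 MiB; a negative return value
is the errno of the step that failed.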
Moreover, when the "dioread_nolock" option is enabled, a buffer is
marked "BH_Unwritten" after write_begin() completes, and the
"BH_Unwritten" state is cleared only after the I/O is done. Therefore,
if a buffer has been marked unwritten but its I/O has not yet been
submitted and completed, the same problem can occur after per-file data
journaling is enabled. The bug is easy to reproduce with the following
command:

    ./kvm-xfstests -C 10000 -m nodelalloc,dioread_nolock generic/269

To resolve these problems and to define a clear boundary between the
previous mode and per-file data journaling mode, we need to flush and
wait for all buffer I/O of a file before enabling per-file data
journaling.

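The ordering rule stated above can be illustrated with a small user-space
model. This is a sketch under stated assumptions, not kernel code: the
toy_* types and functions are hypothetical, the three states only stand
in for mapped, delalloc and unwritten buffers, and "flush and wait" is
reduced to converting every pending buffer in place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer states for this model; not the kernel's BH_* bits. */
enum toy_state { TOY_MAPPED, TOY_DELALLOC, TOY_UNWRITTEN };

struct toy_inode {
	enum toy_state buf[8];
	size_t nbufs;
	int journal_data; /* 1 once per-file data journaling is on */
};

/* Model of "flush and wait": every pending buffer ends up mapped. */
void toy_flush_and_wait(struct toy_inode *ino)
{
	for (size_t i = 0; i < ino->nbufs; i++)
		ino->buf[i] = TOY_MAPPED;
}

/* Count buffers still in a state the journalled path cannot handle. */
size_t toy_pending(const struct toy_inode *ino)
{
	size_t n = 0;

	for (size_t i = 0; i < ino->nbufs; i++)
		if (ino->buf[i] != TOY_MAPPED)
			n++;
	return n;
}

/* Switch modes; flush first when enabling so nothing pending crosses over. */
int toy_set_journal_data(struct toy_inode *ino, int val)
{
	if (val)
		toy_flush_and_wait(ino);

	/* The boundary: no delalloc or unwritten buffer may remain. */
	assert(!val || toy_pending(ino) == 0);

	ino->journal_data = val;
	return 0;
}
```

In this model, skipping toy_flush_and_wait() would make the assertion
fire for any inode that still holds a delalloc or unwritten buffer, which
mirrors the role the flush plays before the mode switch.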
Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

 fs/ext4/inode.c | 31 ++++++++++++++++++++-----------
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 17bfa42..779ef4c 100644
@@ -5452,22 +5452,29 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
 	if (is_journal_aborted(journal))
-	/* We have to allocate physical blocks for delalloc blocks
-	 * before flushing journal. otherwise delalloc blocks can not
-	 * be allocated any more. even more truncate on delalloc blocks
-	 * could trigger BUG by flushing delalloc blocks in journal.
-	 * There is no delalloc block in non-journal data mode.
-	if (val && test_opt(inode->i_sb, DELALLOC)) {
-		err = ext4_alloc_da_blocks(inode);
 	/* Wait for all existing dio workers */
 	ext4_inode_block_unlocked_dio(inode);
 	inode_dio_wait(inode);
+	 * Before flushing the journal and switching inode's aops, we have
+	 * to flush all dirty data the inode has. There can be outstanding
+	 * delayed allocations, there can be unwritten extents created by
+	 * fallocate or buffered writes in dioread_nolock mode covered by
+	 * dirty data which can be converted only after flushing the dirty
+	 * data (and journalled aops don't know how to handle these cases).
+		down_write(&EXT4_I(inode)->i_mmap_sem);
+		err = filemap_write_and_wait(inode->i_mapping);
+			up_write(&EXT4_I(inode)->i_mmap_sem);
+			ext4_inode_resume_unlocked_dio(inode);
 	jbd2_journal_lock_updates(journal);
@@ -5492,6 +5499,8 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
 	jbd2_journal_unlock_updates(journal);
+		up_write(&EXT4_I(inode)->i_mmap_sem);
 	ext4_inode_resume_unlocked_dio(inode);
 	/* Finally we can mark the inode as dirty. */