[Devel] [PATCH RHEL7 COMMIT] writeback: Write dirty times for WB_SYNC_ALL writeback

Vasily Averin vvs at virtuozzo.com
Mon Oct 4 13:57:23 MSK 2021


The commit is pushed to "branch-rh7-3.10.0-1160.42.2.vz7.184.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1160.42.2.vz7.184.1
------>
commit 34d1136342cc5b90390ff9a7f91b3bb24ae665d3
Author: Jan Kara <jack at suse.cz>
Date:   Mon Oct 4 13:57:23 2021 +0300

    writeback: Write dirty times for WB_SYNC_ALL writeback
    
    ms commit dc5ff2b1d66f
    
    Currently we take care to handle I_DIRTY_TIME in vfs_fsync() and
    queue_io() so that inodes which have only dirty timestamps are properly
    written on fsync(2) and sync(2). However there are other call sites -
    most notably going through write_inode_now() - which expect inode to be
    clean after WB_SYNC_ALL writeback. This is not currently true as we do
    not clear I_DIRTY_TIME in __writeback_single_inode() even for
    WB_SYNC_ALL writeback in all the cases. This then resulted in the
    following oops because bdev_write_inode() did not clean the inode and
    writeback code later stumbled over a dirty inode with detached wb.
    
      general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
      Modules linked in:
      CPU: 3 PID: 32 Comm: kworker/u10:1 Not tainted 4.6.0-rc3+ #349
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Workqueue: writeback wb_workfn (flush-11:0)
      task: ffff88006ccf1840 ti: ffff88006cda8000 task.ti: ffff88006cda8000
      RIP: 0010:[<ffffffff818884d2>]  [<ffffffff818884d2>]
      locked_inode_to_wb_and_lock_list+0xa2/0x750
      RSP: 0018:ffff88006cdaf7d0  EFLAGS: 00010246
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88006ccf2050
      RDX: 0000000000000000 RSI: 000000114c8a8484 RDI: 0000000000000286
      RBP: ffff88006cdaf820 R08: ffff88006ccf1840 R09: 0000000000000000
      R10: 000229915090805f R11: 0000000000000001 R12: ffff88006a72f5e0
      R13: dffffc0000000000 R14: ffffed000d4e5eed R15: ffffffff8830cf40
      FS:  0000000000000000(0000) GS:ffff88006d500000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000003301bf8 CR3: 000000006368f000 CR4: 00000000000006e0
      DR0: 0000000000001ec9 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
      Stack:
       ffff88006a72f680 ffff88006a72f768 ffff8800671230d8 03ff88006cdaf948
       ffff88006a72f668 ffff88006a72f5e0 ffff8800671230d8 ffff88006cdaf948
       ffff880065b90cc8 ffff880067123100 ffff88006cdaf970 ffffffff8188e12e
      Call Trace:
       [<     inline     >] inode_to_wb_and_lock_list fs/fs-writeback.c:309
       [<ffffffff8188e12e>] writeback_sb_inodes+0x4de/0x1250 fs/fs-writeback.c:1554
       [<ffffffff8188efa4>] __writeback_inodes_wb+0x104/0x1e0 fs/fs-writeback.c:1600
       [<ffffffff8188f9ae>] wb_writeback+0x7ce/0xc90 fs/fs-writeback.c:1709
       [<     inline     >] wb_do_writeback fs/fs-writeback.c:1844
       [<ffffffff81891079>] wb_workfn+0x2f9/0x1000 fs/fs-writeback.c:1884
       [<ffffffff813bcd1e>] process_one_work+0x78e/0x15c0 kernel/workqueue.c:2094
       [<ffffffff813bdc2b>] worker_thread+0xdb/0xfc0 kernel/workqueue.c:2228
       [<ffffffff813cdeef>] kthread+0x23f/0x2d0 drivers/block/aoe/aoecmd.c:1303
       [<ffffffff867bc5d2>] ret_from_fork+0x22/0x50 arch/x86/entry/entry_64.S:392
      Code: 05 94 4a a8 06 85 c0 0f 85 03 03 00 00 e8 07 15 d0 ff 41 80 3e
      00 0f 85 64 06 00 00 49 8b 9c 24 88 01 00 00 48 89 d8 48 c1 e8 03 <42>
      80 3c 28 00 0f 85 17 06 00 00 48 8b 03 48 83 c0 50 48 39 c3
      RIP  [<     inline     >] wb_get include/linux/backing-dev-defs.h:212
      RIP  [<ffffffff818884d2>] locked_inode_to_wb_and_lock_list+0xa2/0x750
      fs/fs-writeback.c:281
       RSP <ffff88006cdaf7d0>
      ---[ end trace 986a4d314dcb2694 ]---
    
    Fix the problem by making sure __writeback_single_inode() also writes
    inodes that have only dirty timestamps when in WB_SYNC_ALL mode.
    
    Reported-by: Dmitry Vyukov <dvyukov at google.com>
    Tested-by: Laurent Dufour <ldufour at linux.vnet.ibm.com>
    Signed-off-by: Jan Kara <jack at suse.cz>
    Signed-off-by: Jens Axboe <axboe at fb.com>
    
    Without this patch, an inode with only dirty timestamps is not
    written out when writeback is called from freeze_bdev().
    So, backup loses mtime.
    
    In scope of #PSBM-134225 (but not a final fix)
    Signed-off-by: Kirill Tkhai <ktkhai at virtuozzo.com>
---
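For readers who want to see the effect of the added condition in isolation,
here is a minimal userspace sketch of the decision made in the hunk below.
It is a model, not kernel code: the flag values are made up for illustration
(they are not the kernel's actual bit assignments), the helper
must_write_dirty_time() is hypothetical, and the time_after() expiry test,
which is cut off in the hunk, is replaced by a plain boolean parameter.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits only; NOT the kernel's real values. */
enum {
	I_DIRTY_SYNC         = 1 << 0,
	I_DIRTY_DATASYNC     = 1 << 1,
	I_DIRTY_PAGES        = 1 << 2,
	I_DIRTY_TIME         = 1 << 3,
	I_DIRTY_TIME_EXPIRED = 1 << 4,
};
#define I_DIRTY (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES)

enum sync_mode { WB_SYNC_NONE, WB_SYNC_ALL };

/*
 * Hypothetical helper: returns true when an inode carrying I_DIRTY_TIME
 * should have its timestamps written during this writeback pass.
 * 'time_expired' stands in for the time_after(jiffies, ...) check.
 */
bool must_write_dirty_time(unsigned int state, enum sync_mode mode,
			   bool time_expired)
{
	unsigned int dirty = state & I_DIRTY;

	if (!(state & I_DIRTY_TIME))
		return false;

	return (dirty & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) ||
	       mode == WB_SYNC_ALL ||	/* the condition this patch adds */
	       (state & I_DIRTY_TIME_EXPIRED) ||
	       time_expired;
}

/* Small self-check of the two cases the commit message is about. */
void check_examples(void)
{
	/* Timestamp-only dirty inode, data-integrity sync: now written. */
	assert(must_write_dirty_time(I_DIRTY_TIME, WB_SYNC_ALL, false));
	/* Same inode under background writeback: still deferred. */
	assert(!must_write_dirty_time(I_DIRTY_TIME, WB_SYNC_NONE, false));
}
```

In this model, removing the WB_SYNC_ALL line makes the first assertion fail,
which corresponds to the inode staying dirty after a WB_SYNC_ALL pass.
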
 fs/fs-writeback.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index c16a39f4f724..1c8c27188361 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -553,6 +553,7 @@ __do_writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
 	dirty = inode->i_state & I_DIRTY;
 	if (inode->i_state & I_DIRTY_TIME) {
 		if ((dirty & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) ||
+		    wbc->sync_mode == WB_SYNC_ALL ||
 		    unlikely(inode->i_state & I_DIRTY_TIME_EXPIRED) ||
 		    unlikely(time_after(jiffies,
 					(inode->dirtied_time_when +

