[Devel] [PATCH RHEL7 COMMIT] ms/jbd2: discard dirty data when forgetting an un-journalled buffer

Thu Aug 22 14:36:34 MSK 2019

The commit is pushed to "branch-rh7-3.10.0-957.27.2.vz7.107.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-957.27.2.vz7.107.5
------>
commit c3e05ce86f4c9c5ab668041c61051c3497f91024
Author: zhangyi (F) <yi.zhang at huawei.com>
Date:   Thu Aug 22 14:36:31 2019 +0300

    ms/jbd2: discard dirty data when forgetting an un-journalled buffer
    
    We do not unmap a buffer or clear its dirty flag when forgetting a
    buffer that is not journalled or does not belong to any transaction,
    so the invalid dirty data may still be written to the disk later. That
    is fine if the corresponding block is never used before the next
    mount, and it is also fine as long as we invoke clean_bdev_aliases()
    related functions to unmap the block device mapping when re-allocating
    such a freed block as a data block. But this logic is fragile: it may
    lead to data corruption if we forget to clean the bdev aliases. So it
    is better to discard the dirty data at forget time.
    
    We have already handled all the cases of forgetting a journalled
    buffer; this patch deals with the remaining two cases:
    
    - the buffer is not journalled yet,
    - the buffer is journalled but doesn't belong to any transaction.
    
    We invoke __bforget() instead of __brelse() when forgetting an
    un-journalled buffer in jbd2_journal_forget(). After this patch we can
    remove all clean_bdev_aliases() related calls in ext4.
    
    https://jira.sw.ru/browse/PSBM-96719
    
    Suggested-by: Jan Kara <jack at suse.cz>
    Signed-off-by: zhangyi (F) <yi.zhang at huawei.com>
    Signed-off-by: Theodore Ts'o <tytso at mit.edu>
    Reviewed-by: Jan Kara <jack at suse.cz>
    
    (cherry picked from commit 597599268e3b91cac71faf48743f4783dec682fc)
    Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
    
    =====================
    Patchset description:
    
    ext4/jbd2: port data corruption fixes from ms
    
    While investigating the data corruption in the vzt-ploop-check test,
    where we detect one page in a file that contains wrong data, we were
    lucky to see exactly the same pattern in the bad page each time. So we
    added a small debug check at the beginning of __set_page_dirty and
    __set_page_dirty_nobuffers that fails when the dirty bit is set on a
    page containing the pattern.
    
    We got a crash that looks related to the ported patches:
    
    crash> bt
    PID: 17855  TASK: ffff8cfb19144000  CPU: 3   COMMAND: "jbd2/ploop45613"
     #0 [ffff8cfcb6fdf8a0] machine_kexec at ffffffff9e2643c4
     #1 [ffff8cfcb6fdf900] __crash_kexec at ffffffff9e32d672
     #2 [ffff8cfcb6fdf9d0] crash_kexec at ffffffff9e32d760
     #3 [ffff8cfcb6fdf9e8] oops_end at ffffffff9e99f858
     #4 [ffff8cfcb6fdfa10] die at ffffffff9e22f88b
     #5 [ffff8cfcb6fdfa40] do_trap at ffffffff9e99eee0
     #6 [ffff8cfcb6fdfa90] do_invalid_op at ffffffff9e22c1d4
     #7 [ffff8cfcb6fdfb40] invalid_op at ffffffff9e9a928e
        [exception RIP: page_check_corruption_pattern+397]
        RIP: ffffffff9e3d719d  RSP: ffff8cfcb6fdfbf8  RFLAGS: 00010246
        RAX: ffff8cfcb6fdffd8  RBX: 00007303b0607000  RCX: 000000010025603b
        RDX: 0000000000000190  RSI: 0000000000000000  RDI: 0000000000000206
        RBP: ffff8cfcb6fdfc10   R8: ffff8cfbe6c19e00   R9: 0000000000000001
        R10: 0000000000000004  R11: 0000000000000005  R12: ffffe079873e7e40
        R13: ffff8cfca7bc3ab0  R14: ffff8cfc351d5a90  R15: 0000000000000000
        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
     #8 [ffff8cfcb6fdfbf0] page_check_corruption_pattern at ffffffff9e3d70d6
     #9 [ffff8cfcb6fdfc18] __set_page_dirty at ffffffff9e499cb5
     #10 [ffff8cfcb6fdfc50] mark_buffer_dirty at ffffffff9e499efa
     #11 [ffff8cfcb6fdfc70] __jbd2_journal_temp_unlink_buffer at ffffffffc048893a [jbd2]
     #12 [ffff8cfcb6fdfc80] __jbd2_journal_refile_buffer at ffffffffc048ac08 [jbd2]
     #13 [ffff8cfcb6fdfca8] jbd2_journal_commit_transaction at ffffffffc048c1e0 [jbd2]
     #14 [ffff8cfcb6fdfe48] kjournald2 at ffffffffc0491f79 [jbd2]
     #15 [ffff8cfcb6fdfec8] kthread at ffffffff9e2c4661
    
    Before ("jbd2: clear dirty flag when revoking a buffer from an older
    transaction"), a revoked buffer/page could be wrongly marked dirty and
    later be wrongly written to disk. Other patches from the same series
    might also be helpful.
    
    https://jira.sw.ru/browse/PSBM-96719
    
    zhangyi (F) (3):
      jbd2: clear dirty flag when revoking a buffer from an older
            transaction
      jbd2: discard dirty data when forgetting an un-journalled buffer
      ext4: cleanup clean_bdev_aliases() calls
---
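The debug check mentioned in the patchset description (fail when a page
that is about to be marked dirty contains the known bad pattern) could look
roughly like the userspace sketch below. The function name
page_contains_pattern() and its parameters are hypothetical; the real
page_check_corruption_pattern() seen in the backtrace is not shown in this
mail and may work differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return true if the byte sequence `pattern` occurs
 * anywhere inside the buffer `page`. In the kernel, a caller would trap
 * (e.g. with BUG()) on a true result before setting the dirty bit; that
 * part is only described here, not implemented.
 */
bool page_contains_pattern(const unsigned char *page, size_t page_len,
			   const unsigned char *pattern, size_t pattern_len)
{
	assert(page != NULL && pattern != NULL);

	/* An empty or oversized pattern can never match. */
	if (pattern_len == 0 || pattern_len > page_len)
		return false;

	/* Naive scan: compare the pattern at every possible offset. */
	for (size_t off = 0; off + pattern_len <= page_len; off++) {
		if (memcmp(page + off, pattern, pattern_len) == 0)
			return true;
	}
	return false;
}
```
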
 fs/jbd2/transaction.c | 42 ++++++++++++++++++++++++++++++++++++++----
 1 file changed, 38 insertions(+), 4 deletions(-)
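
For readers who want the new "journalled but belongs to no transaction"
branch in isolation, here is a minimal userspace model of its three
outcomes. All types and names (model_buffer, forget_action,
forget_untransacted) are hypothetical stand-ins: on_checkpoint_list models
jh->b_cp_transaction being non-NULL and dirty models buffer_dirty(bh).
Locking and the real list manipulation are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of the new else-branch in jbd2_journal_forget(). */
enum forget_action {
	FORGET_DISCARD,            /* goto not_jbd: __bforget() the buffer */
	FORGET_DROP_CHECKPOINT,    /* remove checkpoint, then goto not_jbd */
	FORGET_FILE_ON_FORGET_LIST /* clear dirty, file on BJ_Forget, __brelse() */
};

/* Hypothetical, simplified view of the buffer state the branch inspects. */
struct model_buffer {
	bool on_checkpoint_list; /* models jh->b_cp_transaction != NULL */
	bool dirty;              /* models buffer_dirty(bh) */
};

enum forget_action forget_untransacted(struct model_buffer *buf)
{
	assert(buf != NULL);

	/* No checkpoint: nothing depends on the buffer, discard it now. */
	if (!buf->on_checkpoint_list)
		return FORGET_DISCARD;

	/* Already written to disk: drop the checkpoint, then discard. */
	if (!buf->dirty) {
		buf->on_checkpoint_list = false;
		return FORGET_DROP_CHECKPOINT;
	}

	/*
	 * Not yet written: clear the dirty flag and attach the buffer to
	 * the running transaction's forget list, so it is only dropped
	 * after that transaction commits.
	 */
	buf->dirty = false;
	return FORGET_FILE_ON_FORGET_LIST;
}
```

Only the last outcome falls through to the __brelse() path in the patch; the
first two end at the relocated not_jbd label, which now calls __bforget().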

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index c182906a8bc4..9a7d1a57f4f7 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1482,9 +1482,7 @@ int jbd2_journal_forget (handle_t *handle, struct buffer_head *bh)
 			__jbd2_journal_unfile_buffer(jh);
 			if (!buffer_jbd(bh)) {
 				spin_unlock(&journal->j_list_lock);
-				jbd_unlock_bh_state(bh);
-				__bforget(bh);
-				goto drop;
+				goto not_jbd;
 			}
 		}
 		spin_unlock(&journal->j_list_lock);
@@ -1517,9 +1515,40 @@ int jbd2_journal_forget (handle_t *handle, struct buffer_head *bh)
 			if (was_modified)
 				drop_reserve = 1;
 		}
+	} else {
+		/*
+		 * Finally, if the buffer is not belongs to any
+		 * transaction, we can just drop it now if it has no
+		 * checkpoint.
+		 */
+		spin_lock(&journal->j_list_lock);
+		if (!jh->b_cp_transaction) {
+			JBUFFER_TRACE(jh, "belongs to none transaction");
+			spin_unlock(&journal->j_list_lock);
+			goto not_jbd;
+		}
+
+		/*
+		 * Otherwise, if the buffer has been written to disk,
+		 * it is safe to remove the checkpoint and drop it.
+		 */
+		if (!buffer_dirty(bh)) {
+			__jbd2_journal_remove_checkpoint(jh);
+			spin_unlock(&journal->j_list_lock);
+			goto not_jbd;
+		}
+
+		/*
+		 * The buffer is still not written to disk, we should
+		 * attach this buffer to current transaction so that the
+		 * buffer can be checkpointed only after the current
+		 * transaction commits.
+		 */
+		clear_buffer_dirty(bh);
+		__jbd2_journal_file_buffer(jh, transaction, BJ_Forget);
+		spin_unlock(&journal->j_list_lock);
 	}
 
-not_jbd:
 	jbd_unlock_bh_state(bh);
 	__brelse(bh);
 drop:
@@ -1528,6 +1557,11 @@ int jbd2_journal_forget (handle_t *handle, struct buffer_head *bh)
 		handle->h_buffer_credits++;
 	}
 	return err;
+
+not_jbd:
+	jbd_unlock_bh_state(bh);
+	__bforget(bh);
+	goto drop;
 }
 
 /**