[Devel] [PATCH RHEL9 COMMIT] dm-ploop: fix discard writeback

Wed Jun 25 13:59:12 MSK 2025

The commit is pushed to "branch-rh9-5.14.0-427.55.1.vz9.82.x-ovz" and will appear at git at bitbucket.org:openvz/vzkernel.git
after rh9-5.14.0-427.55.1.el9
------>
commit 40fa381f054f8888bde6567bde7851fb87bf7068
Author: Andrey Zhadchenko <andrey.zhadchenko at virtuozzo.com>
Date:   Thu Jun 19 12:16:23 2025 +0300

    dm-ploop: fix discard writeback
    
    When doing simple discard tests, we encountered the following crash
    with panic_on_warn:
    
    [4421147.245184] ------------[ cut here ]------------
    [4421147.245569] WARNING: CPU: 6 PID: 505054 at drivers/md/dm-ploop.h:392 ploop_advance_local_after_bat_wb+0x2ae/0x2d0 [ploop]
    [4421147.246073] Modules linked in: <skipped>
    [4421147.249733] CPU: 6 PID: 505054 Comm: kworker/6:1 ve: / Kdump: loaded Tainted: G            E  X  -------  ---  5.14.0-427.44.1.ovz9.80.29 #1 ovz9.80.29
    [4421147.250261] Hardware name: Virtuozzo KVM/Virtuozzo, BIOS 1.16.3-2.vz9.2 04/01/2014
    [4421147.250651] Workqueue: dio/dm-1 iomap_dio_complete_work
    [4421147.250914] RIP: 0010:ploop_advance_local_after_bat_wb+0x2ae/0x2d0 [ploop]
    [4421147.251216] Code: <skipped>
    [4421147.251938] RSP: 0018:ffffc07dc8447d98 EFLAGS: 00010002
    [4421147.252202] RAX: ffff9dc778699040 RBX: ffff9dc7413bd800 RCX: 0000000000000000
    [4421147.252578] RDX: 0000000000000010 RSI: 0000000068746957 RDI: ffff9dcc5414c420
    [4421147.252943] RBP: ffff9dcac2299000 R08: 0000000000000000 R09: ffff9dcac2299400
    [4421147.253328] R10: 0000000000000001 R11: ffff9dc747460000 R12: ffffc07dc8447dd0
    [4421147.253706] R13: ffffffffcedc7000 R14: 0000000000000074 R15: ffff9dc8e56aa380
    [4421147.254071] FS:  0000000000000000(0000) GS:ffff9dce9fb80000(0000) knlGS:0000000000000000
    [4421147.254468] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [4421147.254743] CR2: 00007ffd825d3000 CR3: 00000001052a4002 CR4: 0000000000372ee0
    [4421147.255109] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [4421147.255474] DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
    [4421147.255848] Call Trace:
    [4421147.256040]  <TASK>
    [4421147.256225]  ? show_trace_log_lvl+0x1c4/0x2df
    [4421147.256466]  ? show_trace_log_lvl+0x1c4/0x2df
    [4421147.256712]  ? ploop_put_piwb+0x187/0x1f0 [ploop]
    [4421147.256959]  ? ploop_advance_local_after_bat_wb+0x2ae/0x2d0 [ploop]
    [4421147.257272]  ? __warn+0x81/0x110
    [4421147.257485]  ? ploop_advance_local_after_bat_wb+0x2ae/0x2d0 [ploop]
    [4421147.257775]  ? report_bug+0x10a/0x140
    [4421147.257997]  ? handle_bug+0x3c/0x70
    [4421147.258212]  ? exc_invalid_op+0x14/0x70
    [4421147.258438]  ? asm_exc_invalid_op+0x16/0x20
    [4421147.258678]  ? ploop_advance_local_after_bat_wb+0x2ae/0x2d0 [ploop]
    [4421147.258962]  ? ploop_advance_local_after_bat_wb+0xcc/0x2d0 [ploop]
    [4421147.259246]  ploop_put_piwb+0x187/0x1f0 [ploop]
    [4421147.259493]  ploop_do_pio_endio+0x31/0x90 [ploop]
    [4421147.259748]  process_one_work+0x1e2/0x3b0
    [4421147.259980]  ? __pfx_worker_thread+0x10/0x10
    [4421147.261238]  worker_thread+0x50/0x3a0
    [4421147.261460]  ? __pfx_worker_thread+0x10/0x10
    [4421147.261704]  kthread+0xdd/0x100
    [4421147.261915]  ? __pfx_kthread+0x10/0x10
    [4421147.262138]  ret_from_fork+0x29/0x50
    [4421147.262360]  </TASK>
    [4421147.262553] Kernel panic - not syncing: panic_on_warn set ...
    
    Extra debug output looks like this:
    
    [ 3022.561985] got discard: pos 0, size 1048576
    [ 3022.562373]          processing discard: clu 0 (idx 16), len 1048576
    [ 3022.563182]                  dropped clu 1752459607 (total 101) [page 0, clu 0 (i 16 + off -16)]
    
    1752459607 -> 0x68746957 -> bytes 57 69 74 68 ("With"), which is the
    start of the ploop header. So we clearly mishandled the first metadata
    page.
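
    The decoding above can be reproduced with a small standalone sketch.
    It is illustrative only and not part of the patch; the helper name
    u32_le_to_ascii() is hypothetical. It assumes the value was read as a
    little-endian u32, so the least significant byte comes first in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the driver): unpack a u32 into its four
 * bytes in little-endian memory order and NUL-terminate the result.
 */
static void u32_le_to_ascii(uint32_t v, char out[5])
{
	assert(out != NULL);

	for (int i = 0; i < 4; i++)
		out[i] = (char)((v >> (8 * i)) & 0xff); /* byte i, lowest first */
	out[4] = '\0';
}
```

    Passing 1752459607 (0x68746957) fills the buffer with "With", which
    is also the value visible in RSI in the trace above.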
    
    The first ploop metadata page starts with a 64-byte header (16 u32
    slots), so BAT entries on that page are shifted by 16 slots.
    ploop_advance_local_after_bat_wb() therefore iterates over the page
    starting from 16 on the first page and from 0 on any other page.
    Inside that loop:
     - i is the position of the iterated cluster within the metadata page,
       so it is the index to use when accessing bat_entries and bat_levels
     - i + off is the real cluster number
     - bat_entries[i] is the cluster to mark as a hole in the hole bitmap
    
    The old code also performs an out-of-bounds access when the metadata
    page is not the first one, because it uses the real cluster number as
    an index into the BAT page, which is only 4096 bytes long.
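
    The index arithmetic can be modelled in userspace as follows. This is
    a sketch under stated assumptions, not driver code: all helper names
    and macros are hypothetical, entries are assumed to be 4-byte values
    (1024 per 4096-byte page), and off is assumed to be
    page_id * 1024 - 16, which agrees with the "off -16" shown for page 0
    in the debug output.

```c
#include <stdint.h>

/* Assumptions for illustration only; names are not the driver's. */
#define ENTRIES_PER_PAGE 1024 /* 4096-byte page / 4-byte entry */
#define HEADER_ENTRIES   16   /* header slots on the first page */

/* First in-page slot that holds a BAT entry on the given metadata page. */
static uint32_t first_slot(uint32_t page_id)
{
	return page_id == 0 ? HEADER_ENTRIES : 0;
}

/* Offset such that slot i on this page describes cluster i + off. */
static int64_t page_off(uint32_t page_id)
{
	return (int64_t)page_id * ENTRIES_PER_PAGE - HEADER_ENTRIES;
}

/* Real cluster number described by slot i of the page. */
static int64_t slot_to_clu(uint32_t page_id, uint32_t i)
{
	return (int64_t)i + page_off(page_id);
}

/* Nonzero if idx is a valid index into one page's entry array. */
static int slot_in_bounds(int64_t idx)
{
	return idx >= 0 && idx < ENTRIES_PER_PAGE;
}
```

    In this model, slot 16 of page 0 maps to cluster 0, so indexing the
    page by cluster number reads slot 0, the first header word. Slot 16
    of page 1 maps to cluster 1024, which is already past the end of a
    1024-entry page.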
    
    https://virtuozzo.atlassian.net/browse/VSTOR-108725
    
    Fixes: da5f60147b62 ("dm-ploop: dm-ploop: simplify discard completion")
    Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko at virtuozzo.com>
    Reviewed-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
    
    Feature: dm-ploop: ploop target driver
---
 drivers/md/dm-ploop-map.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm-ploop-map.c b/drivers/md/dm-ploop-map.c
index 6fff0aeb9e839..62a43eaa531d5 100644
--- a/drivers/md/dm-ploop-map.c
+++ b/drivers/md/dm-ploop-map.c
@@ -853,16 +853,15 @@ static void ploop_advance_local_after_bat_wb(struct ploop *ploop,
 	for (; i < last; i++) {
 		if (piwb->type == PIWB_TYPE_DISCARD) {
 			/* discard completed */
-			u32 clu = i + off;
-			u8 level = md->bat_levels[clu];
-			u32 d_clu = READ_ONCE(bat_entries[clu]);
+			u8 level = md->bat_levels[i];
+			u32 d_clu = READ_ONCE(bat_entries[i]);
 
 			if (success && !dst_clu[i] &&
 			    (!(d_clu == BAT_ENTRY_NONE ||
 			    level < ploop_top_level(ploop)))) {
 				WARN_ON_ONCE(ploop->nr_deltas != 1);
-				WRITE_ONCE(bat_entries[clu], BAT_ENTRY_NONE);
-				WRITE_ONCE(md->bat_levels[clu], 0);
+				WRITE_ONCE(bat_entries[i], BAT_ENTRY_NONE);
+				WRITE_ONCE(md->bat_levels[i], 0);
 				ploop_hole_set_bit(d_clu, ploop);
 			}