[Devel] [PATCH v2] dm-ploop: fix discard writeback

Pavel Tikhomirov ptikhomirov at virtuozzo.com
Thu Jun 19 13:42:25 MSK 2025



On 6/19/25 18:37, Andrey Zhadchenko wrote:
> 
> 
> On 6/19/25 12:17, Pavel Tikhomirov wrote:
>>
>>
>> On 6/19/25 17:16, Andrey Zhadchenko wrote:
>>> When doing simple discard tests, we encountered the following crash
>>> with panic_on_warn:
>>>
>>> [4421147.245184] ------------[ cut here ]------------
>>> [4421147.245569] WARNING: CPU: 6 PID: 505054 at drivers/md/dm- 
>>> ploop.h:392 ploop_advance_local_after_bat_wb+0x2ae/0x2d0 [ploop]
>>> [4421147.246073] Modules linked in: loop tls dm_qcow2(E) vhost_net 
>>> tap xt_physdev nft_meta_bridge tun dm_zero_rq vhost_blk vhost_vsock 
>>> vhost vhost_iotlb ip6t_REJECT nf_reject_ipv6 xt_CHECKSUM 
>>> xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat 
>>> nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 
>>> nft_counter 8021q garp mrp nf_tables nfnetlink rpcrdma rfkill 
>>> intel_rapl_msr intel_rapl_common intel_pmc_core intel_vsec 
>>> pmt_telemetry pmt_class kvm_intel kvm iTCO_wdt irqbypass 
>>> iTCO_vendor_support rapl vmw_vsock_virtio_transport 
>>> vmw_vsock_virtio_transport_common virtio_console vsock i2c_i801 
>>> pcspkr lpc_ich i2c_smbus pvpanic_mmio pvpanic joydev nfsd 
>>> br_netfilter veth overlay auth_rpcgss nfs_acl lockd grace bridge stp 
>>> llc sunrpc binfmt_misc xfs libcrc32c sd_mod sr_mod cdrom t10_pi sg 
>>> drm_ttm_helper ttm drm_kms_helper crct10dif_pclmul syscopyarea 
>>> crc32_pclmul sysfillrect ahci sysimgblt libahci fb_sys_fops 
>>> virtio_net ghash_clmulni_intel libata net_failover drm virtio_scsi 
>>> failover
>>> [4421147.246120]  serio_raw crc32c_intel fuse_kio_pcs rdma_cm iw_cm 
>>> ib_cm ib_core fuse push_backup dm_tracking ploop dm_mod [last 
>>> unloaded: dm_qcow2]
>>> [4421147.249733] CPU: 6 PID: 505054 Comm: kworker/6:1 ve: / Kdump: 
>>> loaded Tainted: G            E  X  -------  --- 
>>> 5.14.0-427.44.1.ovz9.80.29 #1 ovz9.80.29
>>> [4421147.250261] Hardware name: Virtuozzo KVM/Virtuozzo, BIOS 
>>> 1.16.3-2.vz9.2 04/01/2014
>>> [4421147.250651] Workqueue: dio/dm-1 iomap_dio_complete_work
>>> [4421147.250914] RIP: 
>>> 0010:ploop_advance_local_after_bat_wb+0x2ae/0x2d0 [ploop]
>>> [4421147.251216] Code: 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 
>>> 48 8b b5 88 00 00 00 3b 8d 90 00 00 00 73 0c 89 cf f0 48 0f ab 3e e9 
>>> 81 fe ff ff <0f> 0b e9 7a fe ff ff 0f 0b 0f 0b e9 22 ff ff ff 0f 0b 
>>> e9 c9 fe ff
>>> [4421147.251938] RSP: 0018:ffffc07dc8447d98 EFLAGS: 00010002
>>> [4421147.252202] RAX: ffff9dc778699040 RBX: ffff9dc7413bd800 RCX: 
>>> 0000000000000000
>>> [4421147.252578] RDX: 0000000000000010 RSI: 0000000068746957 RDI: 
>>> ffff9dcc5414c420
>>> [4421147.252943] RBP: ffff9dcac2299000 R08: 0000000000000000 R09: 
>>> ffff9dcac2299400
>>> [4421147.253328] R10: 0000000000000001 R11: ffff9dc747460000 R12: 
>>> ffffc07dc8447dd0
>>> [4421147.253706] R13: ffffffffcedc7000 R14: 0000000000000074 R15: 
>>> ffff9dc8e56aa380
>>> [4421147.254071] FS:  0000000000000000(0000) 
>>> GS:ffff9dce9fb80000(0000) knlGS:0000000000000000
>>> [4421147.254468] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [4421147.254743] CR2: 00007ffd825d3000 CR3: 00000001052a4002 CR4: 
>>> 0000000000372ee0
>>> [4421147.255109] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
>>> 0000000000000000
>>> [4421147.255474] DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 
>>> 0000000000000400
>>> [4421147.255848] Call Trace:
>>> [4421147.256040]  <TASK>
>>> [4421147.256225]  ? show_trace_log_lvl+0x1c4/0x2df
>>> [4421147.256466]  ? show_trace_log_lvl+0x1c4/0x2df
>>> [4421147.256712]  ? ploop_put_piwb+0x187/0x1f0 [ploop]
>>> [4421147.256959]  ? ploop_advance_local_after_bat_wb+0x2ae/0x2d0 [ploop]
>>> [4421147.257272]  ? __warn+0x81/0x110
>>> [4421147.257485]  ? ploop_advance_local_after_bat_wb+0x2ae/0x2d0 [ploop]
>>> [4421147.257775]  ? report_bug+0x10a/0x140
>>> [4421147.257997]  ? handle_bug+0x3c/0x70
>>> [4421147.258212]  ? exc_invalid_op+0x14/0x70
>>> [4421147.258438]  ? asm_exc_invalid_op+0x16/0x20
>>> [4421147.258678]  ? ploop_advance_local_after_bat_wb+0x2ae/0x2d0 [ploop]
>>> [4421147.258962]  ? ploop_advance_local_after_bat_wb+0xcc/0x2d0 [ploop]
>>> [4421147.259246]  ploop_put_piwb+0x187/0x1f0 [ploop]
>>> [4421147.259493]  ploop_do_pio_endio+0x31/0x90 [ploop]
>>> [4421147.259748]  process_one_work+0x1e2/0x3b0
>>> [4421147.259980]  ? __pfx_worker_thread+0x10/0x10
>>> [4421147.261238]  worker_thread+0x50/0x3a0
>>> [4421147.261460]  ? __pfx_worker_thread+0x10/0x10
>>> [4421147.261704]  kthread+0xdd/0x100
>>> [4421147.261915]  ? __pfx_kthread+0x10/0x10
>>> [4421147.262138]  ret_from_fork+0x29/0x50
>>> [4421147.262360]  </TASK>
>>> [4421147.262553] Kernel panic - not syncing: panic_on_warn set ...
>>>
>>> The extra debug looks like that:
>>>
>>> [ 3022.561985] got discard: pos 0, size 1048576
>>> [ 3022.562373]        processing discard: clu 0 (idx 16), len 1048576
>>> [ 3022.563182]            dropped clu 1752459607 (total 101) [page 0, 
>>> clu 0 (i 16 + off -16)]
>>>
>>> 1752459607 -> 0x68746957 -> 57 69 74 68 ("With") which is a start of 
>>> ploop
>>> header. So we clearly missed the first metadata page handling.
>>>
>>> Since ploop first metadata page contains 64bit header, we shift all
>>> clusters in bat by this value when accessing them.
>>> In ploop_advance_local_after_bat_wb() we iterate over the page either 
>>> from
>>> 0 or 16, if it is a first page. Therefore:
>>>   - i is the position of iterated cluster in the metadata page, so we 
>>> should
>>> use it for accessing/writing bat_entries and bat_levels.
>>>   - i + off is the real cluster number
>>>   - bat_entries[i] is the cluster we should punch from hole bitmap
>>>
>>> Also the current code does out-of-bound access if the metadata page is
>>> not the first, as it uses real cluster number as index when accessing
>>> the bat page (which is only 4096 bytes long)
>>>
>>> Fixes: da5f60147b62 ("dm-ploop: dm-ploop: simplify discard completion")
>>> Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko at virtuozzo.com>
>>> ---
>>> v2: dropped erroneous ploop_hole_set_bit() change, expanded commit
>>> message for a bit
>>>
>>>   drivers/md/dm-ploop-map.c | 9 ++++-----
>>>   1 file changed, 4 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/md/dm-ploop-map.c b/drivers/md/dm-ploop-map.c
>>> index 6dd94af5fd94..988bf00ed4be 100644
>>> --- a/drivers/md/dm-ploop-map.c
>>> +++ b/drivers/md/dm-ploop-map.c
>>> @@ -853,16 +853,15 @@ static void 
>>> ploop_advance_local_after_bat_wb(struct ploop *ploop,
>>>       for (; i < last; i++) {
>>>           if (piwb->type == PIWB_TYPE_DISCARD) {
>>>               /* discard completed */
>>> -            u32 clu = i + off;
>>> -            u8 level = md->bat_levels[clu];
>>> -            u32 d_clu = READ_ONCE(bat_entries[clu]);
>>> +            u8 level = md->bat_levels[i];
>>> +            u32 d_clu = READ_ONCE(bat_entries[i]);
>>>               if (success && !dst_clu[i] &&
>>>                   (!(d_clu == BAT_ENTRY_NONE ||
>>>                   level < ploop_top_level(ploop)))) {
>>>                   WARN_ON_ONCE(ploop->nr_deltas != 1);
>>> -                WRITE_ONCE(bat_entries[clu], BAT_ENTRY_NONE);
>>> -                WRITE_ONCE(md->bat_levels[clu], 0);
>>> +                WRITE_ONCE(bat_entries[i], BAT_ENTRY_NONE);
>>> +                WRITE_ONCE(md->bat_levels[i], 0);
>>
>> In original code in ploop_piwb_discard_completed we accessed 
>> bat_entries and bat_levels at index no more than 1024 (PAGE_SIZE / 
>> sizeof(map_index_t)) as we used ploop_bat_clu_idx_in_page() for 
>> conversion, now maximum value is not so obvious:
>>
>> (READ_ONCE(ploop->nr_bat_entries) - piwb->page_id * PAGE_SIZE / 
>> sizeof(map_index_t) + PLOOP_MAP_OFFSET - 1)
>>
>> are we sure it won't overflow?
> 
> Sure, that is the point of the patch. We use i as index in the page. It 
> is strictly lower than amount of indexes in a page:
> 
>      last = READ_ONCE(ploop->nr_bat_entries) - off;
>      if (last > PAGE_SIZE / sizeof(map_index_t))
>          last = PAGE_SIZE / sizeof(map_index_t);

Oh, sorry, I missed this "last" update. Then all seems OK.

Reviewed-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>

> 
> This cycle just iterates over all clusters in metadata page either form 
> 0 or from 16 in case of page 1.
> 
>      i = 0;
>      if (!piwb->page_id)
>          i = PLOOP_MAP_OFFSET;
> 
> Could you explain a bit more where we can go out-of-bound in a new code? 
> I do not see this possibility now
> 
>>
>> I mean, maybe the idea in da5f60147b62 ("dm-ploop: dm-ploop: simplify 
>> discard completion") about "ploop_advance_local_after_bat_wb is 
>> handling only one md page" was wrong?
>>
>>>                   ploop_hole_set_bit(d_clu, ploop);
>>>               }
>>
> 

-- 
Best regards, Pavel Tikhomirov
Senior Software Developer, Virtuozzo.



More information about the Devel mailing list