[Devel] [PATCH RHEL9 COMMIT] dm-ploop: fix requeuing of delayed pios at the end of BAT
Konstantin Khorenko
khorenko at virtuozzo.com
Wed Apr 30 12:06:44 MSK 2025
The commit is pushed to "branch-rh9-5.14.0-427.44.1.vz9.80.x-ovz" and will appear at git at bitbucket.org:openvz/vzkernel.git
after rh9-5.14.0-427.44.1.vz9.80.31
------>
commit 0008940efa4f995304ca1dc224d6104e548759c7
Author: Alexander Atanasov <alexander.atanasov at virtuozzo.com>
Date: Fri Apr 25 08:35:16 2025 +0300
dm-ploop: fix requeuing of delayed pios at the end of BAT
Case 1: Rework ploop_advance_local_after_bat_wb so that if we have delayed
update to the same page is really executed and don't endup in
wait_list.
Case 2: Due to releasing and taking locks there was a window that a new
update could go into the wait_list.
In both cases above this happens due to WRITEBACK bit cleared after
retaking the lock.
To fix this take all pending pios and clear the bit under lock,
using the same schema as in other code that checks the bit.
After releasing the lock dispatch delayed pios.
https://virtuozzo.atlassian.net/browse/VSTOR-102677
https://virtuozzo.atlassian.net/browse/VSTOR-103768
Signed-off-by: Alexander Atanasov <alexander.atanasov at virtuozzo.com>
Reviewed-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
Feature: vStorage
Additional notes about how the code works from Pasha:
Only place we can add to md_page->wait_llist is in
ploop_delay_if_md_busy if md_page->md_status has MD_UPDATING or
MD_WRITEBACK bits (under both md_lock and bat_lock). Previously we first
acquired elements from wait_llist and only then cleared MD_WRITEBACK
bit, thus having a race window where wait_llist can receive new delayed
elements after acquiring elements from it and before clearing
MD_WRITEBACK and those new elements will never be processed (until the
next writeback to the same metadata page).
Putting bit-clear and elements-acquire under both locks will guarantee
it is atomicity relative to ploop_delay_if_md_busy.
One alternative case is when we delay due to MD_UPDATING bit set, and
there is no MD_WRITEBACK at the time. This is also covered though more
trickily. In this case the one who set MD_UPDATING bit, will call
ploop_add_dirty_for_wb, which will be processed later in
do_ploop_run_work -> ploop_submit_metadata_writeback and will eventually
lead to MD_WRITEBACK bit set and writeback submitted via
ploop_index_wb_submit. So it is guaranteed that wait_llist will always
be handled in this case.
---
drivers/md/dm-ploop-map.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/md/dm-ploop-map.c b/drivers/md/dm-ploop-map.c
index 6dd94af5fd947..35085a04ca5f5 100644
--- a/drivers/md/dm-ploop-map.c
+++ b/drivers/md/dm-ploop-map.c
@@ -884,11 +884,15 @@ static void ploop_advance_local_after_bat_wb(struct ploop *ploop,
#endif
WARN_ON_ONCE(!test_bit(MD_WRITEBACK, &md->status));
+
+ wait_llist_pending = llist_del_all(&md->wait_llist);
+
+ /* Clear flag at the end when all processing related is done */
+ clear_bit(MD_WRITEBACK, &md->status);
spin_unlock(&md->md_lock);
spin_unlock_irqrestore(&ploop->bat_lock, flags);
kunmap_local(dst_clu);
- wait_llist_pending = llist_del_all(&md->wait_llist);
if (wait_llist_pending) {
llist_for_each_safe(pos, t, wait_llist_pending) {
pio = list_entry((struct list_head *)pos, typeof(*pio), list);
@@ -900,10 +904,6 @@ static void ploop_advance_local_after_bat_wb(struct ploop *ploop,
if (!list_empty(&list))
ploop_dispatch_pios(ploop, NULL, &list);
- spin_lock_irqsave(&ploop->bat_lock, flags);
- /* Clear flag at the end when all processing related is done */
- clear_bit(MD_WRITEBACK, &md->status);
- spin_unlock_irqrestore(&ploop->bat_lock, flags);
}
static void ploop_free_piwb(struct ploop_index_wb *piwb)
More information about the Devel
mailing list