[Devel] [PATCH vz9] dm-ploop: fix requeuing of delayed pios at the end of BAT
Pavel Tikhomirov
ptikhomirov at virtuozzo.com
Fri Apr 25 13:05:32 MSK 2025
On 4/25/25 13:35, Alexander Atanasov wrote:
> Rework ploop_advance_local_after_bat_wb so that if we have delayed
> update to the same page is really executed and don't endup in wait_list.
^ missing "it"
> Due to releasing and taking locks there was a window that a new update
> could go into the wait_list. Both due to WRITEBACK bit cleared after
^ "both" does not sound right, maybe you
meant "also"?
> retaking the lock.
>
> To fix this take all pending pios and clear the bit under lock,
> using the same schema as in other code that checks the bit.
> After releasing the lock displatch delayed pios.
^ excess 'l'
Reviewed-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
I would like explanation more specific:
Only place we can add to md_page->wait_llist is in
ploop_delay_if_md_busy if md_page->md_status has MD_UPDATING or
MD_WRITEBACK bits (under both md_lock and bat_lock). Previously we first
acquired elements from wait_llist and only then cleared MD_WRITEBACK
bit, thus having a race window where wait_llist can receive new delayed
elements after acquiring elements from it and before clearing
MD_WRITEBACK and those new elements will never be processed (until the
next writeback to the same metadata page).
Putting bit-clear and elements-acquire under both locks will guarantee
it is atomicity relative to ploop_delay_if_md_busy.
One alternative case is when we delay due to MD_UPDATING bit set, and
there is no MD_WRITEBACK at the time. This is also covered though more
trickily. In this case the one who set MD_UPDATING bit, will call
ploop_add_dirty_for_wb, which will be processed later in
do_ploop_run_work -> ploop_submit_metadata_writeback and will eventually
lead to MD_WRITEBACK bit set and writeback submitted via
ploop_index_wb_submit. So it is guaranteed that wait_llist will always
be handled in this case.
>
> https://virtuozzo.atlassian.net/browse/VSTOR-102677
> https://virtuozzo.atlassian.net/browse/VSTOR-103768
> Signed-off-by: Alexander Atanasov <alexander.atanasov at virtuozzo.com>
> ---
> drivers/md/dm-ploop-map.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/md/dm-ploop-map.c b/drivers/md/dm-ploop-map.c
> index 76544cdb476e..8ef0426e7468 100644
> --- a/drivers/md/dm-ploop-map.c
> +++ b/drivers/md/dm-ploop-map.c
> @@ -884,11 +884,15 @@ static void ploop_advance_local_after_bat_wb(struct ploop *ploop,
> #endif
>
> WARN_ON_ONCE(!test_bit(MD_WRITEBACK, &md->status));
> +
> + wait_llist_pending = llist_del_all(&md->wait_llist);
> +
> + /* Clear flag at the end when all processing related is done */
> + clear_bit(MD_WRITEBACK, &md->status);
> spin_unlock(&md->md_lock);
> spin_unlock_irqrestore(&ploop->bat_lock, flags);
> kunmap_local(dst_clu);
>
> - wait_llist_pending = llist_del_all(&md->wait_llist);
> if (wait_llist_pending) {
> llist_for_each_safe(pos, t, wait_llist_pending) {
> pio = list_entry((struct list_head *)pos, typeof(*pio), list);
> @@ -900,10 +904,6 @@ static void ploop_advance_local_after_bat_wb(struct ploop *ploop,
> if (!list_empty(&list))
> ploop_dispatch_pios(ploop, NULL, &list);
>
> - spin_lock_irqsave(&ploop->bat_lock, flags);
> - /* Clear flag at the end when all processing related is done */
> - clear_bit(MD_WRITEBACK, &md->status);
> - spin_unlock_irqrestore(&ploop->bat_lock, flags);
> }
>
> static void ploop_free_piwb(struct ploop_index_wb *piwb)
--
Best regards, Pavel Tikhomirov
Senior Software Developer, Virtuozzo.
More information about the Devel
mailing list