[Devel] [PATCH vz9] dm-ploop: fix requeuing of delayed pios at the end of BAT

Pavel Tikhomirov ptikhomirov at virtuozzo.com
Fri Apr 25 13:05:32 MSK 2025



On 4/25/25 13:35, Alexander Atanasov wrote:
> Rework ploop_advance_local_after_bat_wb so that if we have delayed
> update to the same page is really executed and don't endup in wait_list.
                          ^ missing "it"
> Due to releasing and taking locks there was a window that a new update
> could go into the wait_list. Both due to WRITEBACK bit cleared after
                                ^ "both" does not sound right, maybe you 
meant "also"?
> retaking the lock.
> 
> To fix this take all pending pios and clear the bit under lock,
> using the same schema as in other code that checks the bit.
> After releasing the lock displatch delayed pios.
                                ^ excess 'l'

Reviewed-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>

I would like explanation more specific:

Only place we can add to md_page->wait_llist is in 
ploop_delay_if_md_busy if md_page->md_status has MD_UPDATING or 
MD_WRITEBACK bits (under both md_lock and bat_lock). Previously we first 
acquired elements from wait_llist and only then cleared MD_WRITEBACK 
bit, thus having a race window where wait_llist can receive new delayed 
elements after acquiring elements from it and before clearing 
MD_WRITEBACK and those new elements will never be processed (until the 
next writeback to the same metadata page).

Putting bit-clear and elements-acquire under both locks will guarantee 
it is atomicity relative to ploop_delay_if_md_busy.

One alternative case is when we delay due to MD_UPDATING bit set, and 
there is no MD_WRITEBACK at the time. This is also covered though more 
trickily. In this case the one who set MD_UPDATING bit, will call 
ploop_add_dirty_for_wb, which will be processed later in 
do_ploop_run_work -> ploop_submit_metadata_writeback and will eventually 
lead to MD_WRITEBACK bit set and writeback submitted via 
ploop_index_wb_submit. So it is guaranteed that wait_llist will always 
be handled in this case.

> 
> https://virtuozzo.atlassian.net/browse/VSTOR-102677
> https://virtuozzo.atlassian.net/browse/VSTOR-103768
> Signed-off-by: Alexander Atanasov <alexander.atanasov at virtuozzo.com>
> ---
>   drivers/md/dm-ploop-map.c | 10 +++++-----
>   1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/md/dm-ploop-map.c b/drivers/md/dm-ploop-map.c
> index 76544cdb476e..8ef0426e7468 100644
> --- a/drivers/md/dm-ploop-map.c
> +++ b/drivers/md/dm-ploop-map.c
> @@ -884,11 +884,15 @@ static void ploop_advance_local_after_bat_wb(struct ploop *ploop,
>   #endif
>   
>   	WARN_ON_ONCE(!test_bit(MD_WRITEBACK, &md->status));
> +
> +	wait_llist_pending = llist_del_all(&md->wait_llist);
> +
> +	/* Clear flag at the end when all processing related is done */
> +	clear_bit(MD_WRITEBACK, &md->status);
>   	spin_unlock(&md->md_lock);
>   	spin_unlock_irqrestore(&ploop->bat_lock, flags);
>   	kunmap_local(dst_clu);
>   
> -	wait_llist_pending = llist_del_all(&md->wait_llist);
>   	if (wait_llist_pending) {
>   		llist_for_each_safe(pos, t, wait_llist_pending) {
>   			pio = list_entry((struct list_head *)pos, typeof(*pio), list);
> @@ -900,10 +904,6 @@ static void ploop_advance_local_after_bat_wb(struct ploop *ploop,
>   	if (!list_empty(&list))
>   		ploop_dispatch_pios(ploop, NULL, &list);
>   
> -	spin_lock_irqsave(&ploop->bat_lock, flags);
> -	/* Clear flag at the end when all processing related is done */
> -	clear_bit(MD_WRITEBACK, &md->status);
> -	spin_unlock_irqrestore(&ploop->bat_lock, flags);
>   }
>   
>   static void ploop_free_piwb(struct ploop_index_wb *piwb)

-- 
Best regards, Pavel Tikhomirov
Senior Software Developer, Virtuozzo.



More information about the Devel mailing list