[Devel] [PATCH vz9] dm-ploop: fix requeuing of delayed pios at the end of BAT
Alexander Atanasov
alexander.atanasov at virtuozzo.com
Fri Apr 25 13:32:39 MSK 2025
On 25.04.25 13:27, Pavel Tikhomirov wrote:
>
>
> On 4/25/25 18:12, Alexander Atanasov wrote:
>> On 25.04.25 13:05, Pavel Tikhomirov wrote:
>>>
>>>
>>> On 4/25/25 13:35, Alexander Atanasov wrote:
>>>> Rework ploop_advance_local_after_bat_wb so that if we have delayed
>>>> update to the same page is really executed and don't endup in
>>>> wait_list.
>>> ^ missing "it"
>>>> Due to releasing and taking locks there was a window that a new update
>>>> could go into the wait_list. Both due to WRITEBACK bit cleared after
>>> ^ "both" does not sound right, maybe
>>> you meant "also"?
>>
>> We have two cases that a pio can get there for the same reason so both
>> not also.
>
> What cases when? It's not clear from the commit message.
Case 1:
>>>> Rework ploop_advance_local_after_bat_wb so that if we have delayed
>>>> update to the same page is really executed and don't endup in
>>>> wait_list.
Case 2:
>>>> Due to releasing and taking locks there was a window that a new update
>>>> could go into the wait_list. Both due to WRITEBACK bit cleared after
And yes it must be "Both are" to be grammar compliant.
>
>>
>>
>>>> retaking the lock.
>>>>
>>>> To fix this take all pending pios and clear the bit under lock,
>>>> using the same schema as in other code that checks the bit.
>>>> After releasing the lock displatch delayed pios.
>>> ^ excess 'l'
>>>
>>> Reviewed-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
>>>
>>> I would like explanation more specific:
>>>
>>> Only place we can add to md_page->wait_llist is in
>>> ploop_delay_if_md_busy if md_page->md_status has MD_UPDATING or
>>> MD_WRITEBACK bits (under both md_lock and bat_lock). Previously we
>>> first acquired elements from wait_llist and only then cleared
>>> MD_WRITEBACK bit, thus having a race window where wait_llist can
>>> receive new delayed elements after acquiring elements from it and
>>> before clearing MD_WRITEBACK and those new elements will never be
>>> processed (until the next writeback to the same metadata page).
>>
>>
>> I explain what problem we faced and how we fixed it, Why do i have to
>> explain how code works - it is there to be read if one needs the details?
>>
>>>
>>> Putting bit-clear and elements-acquire under both locks will
>>> guarantee it is atomicity relative to ploop_delay_if_md_busy.
>>>
>>> One alternative case is when we delay due to MD_UPDATING bit set, and
>>> there is no MD_WRITEBACK at the time. This is also covered though
>>> more trickily. In this case the one who set MD_UPDATING bit, will
>>> call ploop_add_dirty_for_wb, which will be processed later in
>>> do_ploop_run_work -> ploop_submit_metadata_writeback and will
>>> eventually lead to MD_WRITEBACK bit set and writeback submitted via
>>> ploop_index_wb_submit. So it is guaranteed that wait_llist will
>>> always be handled in this case.
>>
>> This is again explaining how code works not what the patch does.
>
> Strongly disagree. I believe the whole locking scheme should be
> explained here, as it is the thing that we are fixing. There is no
> description of it anywhere. But we should be able to verify that what is
> written in code corresponds to how we intend it to work.
There is a description in the previous commits how locking works.
Here it is about how we fix a specific place. No need to repeat it.
Documentation yes, there is no documentation we have to use the source
until there is documentation.
>
>>>
>>>>
>>>> https://virtuozzo.atlassian.net/browse/VSTOR-102677
>>>> https://virtuozzo.atlassian.net/browse/VSTOR-103768
>>>> Signed-off-by: Alexander Atanasov <alexander.atanasov at virtuozzo.com>
>>>> ---
>>>> drivers/md/dm-ploop-map.c | 10 +++++-----
>>>> 1 file changed, 5 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/md/dm-ploop-map.c b/drivers/md/dm-ploop-map.c
>>>> index 76544cdb476e..8ef0426e7468 100644
>>>> --- a/drivers/md/dm-ploop-map.c
>>>> +++ b/drivers/md/dm-ploop-map.c
>>>> @@ -884,11 +884,15 @@ static void
>>>> ploop_advance_local_after_bat_wb(struct ploop *ploop,
>>>> #endif
>>>> WARN_ON_ONCE(!test_bit(MD_WRITEBACK, &md->status));
>>>> +
>>>> + wait_llist_pending = llist_del_all(&md->wait_llist);
>>>> +
>>>> + /* Clear flag at the end when all processing related is done */
>>>> + clear_bit(MD_WRITEBACK, &md->status);
>>>> spin_unlock(&md->md_lock);
>>>> spin_unlock_irqrestore(&ploop->bat_lock, flags);
>>>> kunmap_local(dst_clu);
>>>> - wait_llist_pending = llist_del_all(&md->wait_llist);
>>>> if (wait_llist_pending) {
>>>> llist_for_each_safe(pos, t, wait_llist_pending) {
>>>> pio = list_entry((struct list_head *)pos,
>>>> typeof(*pio), list);
>>>> @@ -900,10 +904,6 @@ static void
>>>> ploop_advance_local_after_bat_wb(struct ploop *ploop,
>>>> if (!list_empty(&list))
>>>> ploop_dispatch_pios(ploop, NULL, &list);
>>>> - spin_lock_irqsave(&ploop->bat_lock, flags);
>>>> - /* Clear flag at the end when all processing related is done */
>>>> - clear_bit(MD_WRITEBACK, &md->status);
>>>> - spin_unlock_irqrestore(&ploop->bat_lock, flags);
>>>> }
>>>> static void ploop_free_piwb(struct ploop_index_wb *piwb)
>>>
>>
>>
>
--
Regards,
Alexander Atanasov
More information about the Devel
mailing list