[Devel] [RFC PATCH vz9 v6 20/62] dm-ploop: reduce BAT accesses on discard completion

Pavel Tikhomirov ptikhomirov at virtuozzo.com
Tue Jan 21 08:51:30 MSK 2025



On 1/21/25 04:08, Alexander Atanasov wrote:
> On 20.01.25 15:33, Alexander Atanasov wrote:
>> On 20.01.25 6:15, Pavel Tikhomirov wrote:
>>>
>>>
>>> On 12/6/24 05:55, Alexander Atanasov wrote:
>>>> From: Andrey Zhadchenko <andrey.zhadchenko at virtuozzo.com>
>>>>
>>>> Drop extra ploop_cluster_is_in_top_delta() as we are planning to
>>>> access BAT anyway
>>>>
>>>> https://virtuozzo.atlassian.net/browse/VSTOR-91817
>>>> Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko at virtuozzo.com>
>>>> ---
>>>>   drivers/md/dm-ploop-map.c | 28 ++++++++++++----------------
>>>>   1 file changed, 12 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/drivers/md/dm-ploop-map.c b/drivers/md/dm-ploop-map.c
>>>> index ad7ca7d43dfc..b00dd364072d 100644
>>>> --- a/drivers/md/dm-ploop-map.c
>>>> +++ b/drivers/md/dm-ploop-map.c
>>>> @@ -711,12 +711,15 @@ static void ploop_complete_cow(struct ploop_cow *cow, blk_status_t bi_status)
>>>>       kmem_cache_free(cow_cache, cow);
>>>>   }
>>>> -static void ploop_release_cluster(struct ploop *ploop, u32 clu)
>>>> +static void ploop_piwb_discard_completed(struct ploop *ploop,
>>>> +                     bool success, u32 clu, u32 new_dst_clu)
>>>>   {
>>>>       u32 id, *bat_entries, dst_clu;
>>>>       struct md_page *md;
>>>> +    u8 level;
>>>> -    lockdep_assert_held(&ploop->bat_rwlock);
>>>> +    if (new_dst_clu)
>>>> +        return;
>>>>       id = ploop_bat_clu_to_page_nr(clu);
>>>>       md = ploop_md_page_find(ploop, id);
>>>
>>> Is this md the same as the md in the caller function
>>> ploop_advance_local_after_bat_wb?
>>
>> It can be the same or different: the loop iterates over the clusters and
>> the page may change, so this needs a rewrite.
>> Maybe pass md as an argument and check whether it is the same; if it is
>> not, lock it or something like that. I have to think about how to do it.
> 
> 
> After a deeper look: it is the same. i and off are limited to be within
> one page, so it does not change. (I actually tested this by passing md
> as md_in into ploop_piwb_discard_completed and adding a
> WARN_ON(md != md_in).)
> 
> I think we should remove ploop_piwb_discard_completed.
> Most of the init is duplicated, and it boils down to:
> 
>          if (piwb->type == PIWB_TYPE_DISCARD) {
>              u32 clu = i + off;
>              u8 level = md->bat_levels[clu];
>              u32 d_clu = READ_ONCE(bat_entries[clu]);
> 
>              if (success && !dst_clu[i] &&
>                  !(d_clu == BAT_ENTRY_NONE || level < ploop_top_level(ploop))) {
>                  WARN_ON_ONCE(ploop->nr_deltas != 1);
>                  WRITE_ONCE(bat_entries[clu], BAT_ENTRY_NONE);
>                  WRITE_ONCE(md->bat_levels[clu], 0);
>                  ploop_hole_set_bit(d_clu, ploop);
>              }
> 
>              continue;
>          }
> 
> 
> It will save a page lookup (and a function call) and make the code a bit
> more readable. Another option I will explore is to split this into separate
> code paths for alloc/discard/realloc instead of a single for loop with
> conditions. This is WIP; it may be shortened further.

Looks good.
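
For reference, here is a minimal userspace sketch of the inlined discard branch quoted above. It is not the driver code: the mock_* structs, CLU_PER_PAGE, the hole counter and the helper names are hypothetical stand-ins, BAT_ENTRY_NONE is assumed to be the all-ones u32 value, and READ_ONCE/WRITE_ONCE, WARN_ON_ONCE and all locking are left out. It only illustrates that the negated condition can also be read as "the entry is mapped and lives in the top delta".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins for the kernel definitions; values are illustrative. */
#define BAT_ENTRY_NONE UINT32_MAX
#define CLU_PER_PAGE 8 /* hypothetical number of entries in a mock page */

struct mock_md_page {
	uint32_t bat_entries[CLU_PER_PAGE]; /* stands in for md->kmpage */
	uint8_t bat_levels[CLU_PER_PAGE];   /* stands in for md->bat_levels */
};

struct mock_ploop {
	uint8_t top_level;    /* stands in for ploop_top_level(ploop) */
	uint32_t holes_freed; /* counts would-be ploop_hole_set_bit() calls */
	uint32_t last_hole;   /* last cluster that would be marked as a hole */
};

/* The condition in the form used in the quoted snippet. */
static bool in_top_delta_orig(uint32_t d_clu, uint8_t level, uint8_t top)
{
	return !(d_clu == BAT_ENTRY_NONE || level < top);
}

/* The same condition after applying De Morgan's law. */
static bool in_top_delta(uint32_t d_clu, uint8_t level, uint8_t top)
{
	return d_clu != BAT_ENTRY_NONE && level >= top;
}

/* Exhaustively compare both forms over a small set of inputs. */
static void check_forms_agree(void)
{
	const uint32_t entries[] = { 0, 5, BAT_ENTRY_NONE };

	for (size_t e = 0; e < sizeof(entries) / sizeof(entries[0]); e++)
		for (uint8_t level = 0; level < 3; level++)
			for (uint8_t top = 0; top < 3; top++)
				assert(in_top_delta_orig(entries[e], level, top) ==
				       in_top_delta(entries[e], level, top));
}

/*
 * Mock of the discard branch: release the cluster only if the write
 * succeeded, no new destination was assigned, and the entry is mapped
 * in the top delta. Returns true if the entry was cleared.
 */
static bool discard_completed(struct mock_ploop *p, struct mock_md_page *md,
			      uint32_t clu, bool success, uint32_t new_dst_clu)
{
	uint32_t d_clu = md->bat_entries[clu];
	uint8_t level = md->bat_levels[clu];

	if (!success || new_dst_clu || !in_top_delta(d_clu, level, p->top_level))
		return false;

	md->bat_entries[clu] = BAT_ENTRY_NONE;
	md->bat_levels[clu] = 0;
	p->last_hole = d_clu; /* where the driver calls ploop_hole_set_bit() */
	p->holes_freed++;
	return true;
}
```

Calling check_forms_agree() and then discard_completed() on a mock page with one mapped top-level entry, one BAT_ENTRY_NONE entry and one lower-level entry should clear only the first.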

> 
>>
>>>
>>>> @@ -726,22 +729,15 @@ static void ploop_release_cluster(struct ploop *ploop, u32 clu)
>>>>       bat_entries = md->kmpage;
>>>>       dst_clu = READ_ONCE(bat_entries[clu]);
>>>> -    WRITE_ONCE(bat_entries[clu], BAT_ENTRY_NONE);
>>>> -    WRITE_ONCE(md->bat_levels[clu], 0);
>>>> -
>>>> -    ploop_hole_set_bit(dst_clu, ploop);
>>>> -}
>>>> -
>>>> -static void ploop_piwb_discard_completed(struct ploop *ploop,
>>>> -                     bool success, u32 clu, u32 new_dst_clu)
>>>> -{
>>>> -    if (new_dst_clu)
>>>> -        return;
>>>> +    level = md->bat_levels[clu];
>>>
>>> If the answer to the previous comment is no, shouldn't we take
>>> md->md_lock here to make the use of md->bat_levels and md->kmpage
>>> atomic / consistent? In the next patch we introduce md->md_lock to
>>> "use it when accessing md->levels and md->page at the same time to
>>> protect readers against writers".
>>>
>>> If the answer is yes, shouldn't we do a lockdep check for md->md_lock?
>>
>> If it comes as an argument, a lockdep check can be added, but if it is
>> different we will get a false alarm.
>>
>>>
>>>> -    if (ploop_cluster_is_in_top_delta(ploop, clu)) {
>>>> +    if (!(dst_clu == BAT_ENTRY_NONE || level < ploop_top_level(ploop))) {
>>>>           WARN_ON_ONCE(ploop->nr_deltas != 1);
>>>> -        if (success)
>>>> -            ploop_release_cluster(ploop, clu);
>>>> +        if (success) {
>>>> +            WRITE_ONCE(bat_entries[clu], BAT_ENTRY_NONE);
>>>> +            WRITE_ONCE(md->bat_levels[clu], 0);
>>>> +            ploop_hole_set_bit(dst_clu, ploop);
>>>> +        }
>>>>       }
>>>>   }
>>>
>>
> 

-- 
Best regards, Tikhomirov Pavel
Senior Software Developer, Virtuozzo.


