[Devel] [RFC PATCH vz9 v6 39/62] dm-ploop: prepare bat updates under bat_lock

Fri Dec 20 18:01:31 MSK 2024

On 20.12.24 15:24, Andrey Zhadchenko wrote:
> 
> 
> On 12/5/24 22:56, Alexander Atanasov wrote:
>> Prepare for threads. When preparing bat updates there are two important
>> things to protect - md->status MD_DIRTY bit  and holes bitmap.
>> Use bat_lock to protect them.
>>
>> https://virtuozzo.atlassian.net/browse/VSTOR-91821
>> Signed-off-by: Alexander Atanasov <alexander.atanasov at virtuozzo.com>
>> ---
>>   drivers/md/dm-ploop-map.c | 83 ++++++++++++++++++++++++++++-----------
>>   1 file changed, 61 insertions(+), 22 deletions(-)
>>
>> diff --git a/drivers/md/dm-ploop-map.c b/drivers/md/dm-ploop-map.c
>> index bb15fa1f854b..0eaab61a0c31 100644
>> --- a/drivers/md/dm-ploop-map.c
>> +++ b/drivers/md/dm-ploop-map.c
>> @@ -555,13 +555,18 @@ static bool ploop_md_make_dirty(struct ploop 
>> *ploop, struct md_page *md)
>>       unsigned long flags;
>>       bool new = false;
>> -    spin_lock_irqsave(&ploop->bat_lock, flags);
>> +    /*
>> +     * Usual pattern is check for dirty, prepare bat update
>> +     * and then call make_dirty - to avoid races callers must lock
>> +     */
>> +
>> +    lockdep_assert_held(&ploop->bat_lock);
>> +
>>       WARN_ON_ONCE(test_bit(MD_WRITEBACK, &md->status));
>>       if (!test_and_set_bit(MD_DIRTY, &md->status)) {
>>           llist_add((struct llist_node *)&md->wb_link, 
>> &ploop->wb_batch_llist);
>>           new = true;
>>       }
>> -    spin_unlock_irqrestore(&ploop->bat_lock, flags);
>>       md->dirty_timeout = jiffies + ploop->md_submit_delay_ms*HZ/1000;
>>       return new;
>> @@ -807,6 +812,8 @@ static void 
>> ploop_advance_local_after_bat_wb(struct ploop *ploop,
>>       dst_clu = piwb->kmpage;
>> +    /* holes bit map requires bat_lock */
>> +    spin_lock_irqsave(&ploop->bat_lock, flags);
>>       spin_lock(&md->md_lock);
>>       if (piwb->type == PIWB_TYPE_ALLOC)
>>           goto skip_apply;
>> @@ -833,6 +840,7 @@ static void 
>> ploop_advance_local_after_bat_wb(struct ploop *ploop,
>>       clear_bit(MD_WRITEBACK, &md->status);
>>       md->piwb = NULL;
>>       spin_unlock(&md->md_lock);
>> +    spin_unlock_irqrestore(&ploop->bat_lock, flags);
>>       wait_llist_pending = llist_del_all(&md->wait_llist);
>>       if (wait_llist_pending) {
>> @@ -1077,10 +1085,22 @@ static int ploop_allocate_cluster(struct ploop 
>> *ploop, u32 *dst_clu)
>>       u32 clu_size = CLU_SIZE(ploop);
>>       loff_t off, pos, end, old_size;
>>       struct file *file = top->file;
>> +    unsigned long flags;
>>       int ret;
>> -    if (ploop_find_dst_clu_bit(ploop, dst_clu) < 0)
>> +    spin_lock_irqsave(&ploop->bat_lock, flags);
>> +    if (ploop_find_dst_clu_bit(ploop, dst_clu) < 0) {
>> +        spin_unlock_irqrestore(&ploop->bat_lock, flags);
>>           return -EIO;
>> +    }
>> +    /*
>> +     * Mark clu as used. Find & clear bit must be locked,
>> +     * since  this may be called from different threads.
>> +     * Note, that set_bit may be made from many places.
>> +     * We only care to clear what find got. parallel set is ok.
>> +     */
>> +    ploop_hole_clear_bit(*dst_clu, ploop);
>> +    spin_unlock_irqrestore(&ploop->bat_lock, flags);
>>       pos = CLU_TO_POS(ploop, *dst_clu);
>>       end = pos + clu_size;
>> @@ -1094,6 +1114,8 @@ static int ploop_allocate_cluster(struct ploop 
>> *ploop, u32 *dst_clu)
>>           else
>>               ret = ploop_zero_range(file, pos, off - pos);
>>           if (ret) {
>> +            ploop_hole_set_bit(*dst_clu, ploop);
>> +
>>               if (ret != -EOPNOTSUPP)
>>                   PL_ERR("punch/zero area: %d", ret);
>>               else if (ploop->falloc_new_clu)
>> @@ -1108,18 +1130,15 @@ static int ploop_allocate_cluster(struct ploop 
>> *ploop, u32 *dst_clu)
>>       if (end > old_size) {
>>           ret = ploop_truncate_prealloc_safe(ploop, top, end, __func__);
>> -        if (ret)
>> +        if (ret) {
>> +            ploop_hole_set_bit(*dst_clu, ploop);
>>               return ret;
>> +        }
>>       }
>>       if (end > top->file_preallocated_area_start)
>>           top->file_preallocated_area_start = end;
> 
> I think whole ploop_md_make_dirty() is also racy. If we have parallel 
> allocations, we also need to protect ploop fields. Probably if there 
> will be too much allocation at the same time (more than prealloc size), 
> we may have racing truncates with different file lengths.

While i agree with you about ploop_md_make_dirty(), i do not think is 
possible to have running parallel allocations - if there are concurent 
one will end in some delay queue either cluster locked or md page dirty. 
But i will double check this. According to code comments - problem is if 
we have parallel allocation and discard but i think bitmap locking helps 
with it.

-- 
Regards,
Alexander Atanasov