[Devel] [PATCH VZ9 2/3] block/blk-cbt: allow multiple cbts in a single queue

Pavel Tikhomirov ptikhomirov at virtuozzo.com
Thu Jan 23 12:23:36 MSK 2025



On 1/22/25 19:29, Andrey Zhadchenko wrote:
> 
> 
> On 1/22/25 12:14, Pavel Tikhomirov wrote:
>>
>>
>> On 1/22/25 19:05, Andrey Zhadchenko wrote:
>>>
>>>
>>> On 1/22/25 11:51, Pavel Tikhomirov wrote:
>>>>> +static struct cbt_info *blk_cbt_find(struct request_queue *q, __u8 *name)
>>>>> +{
>>>>> +    struct cbt_info *cbt;
>>>>> +
>>>>> +    list_for_each_entry(cbt, &q->cbt_list, list)
>>>>> +        if (!memcmp(name, cbt->name, CBT_NAME_LENGTH))
>>>>> +            return cbt;
>>>>> +
>>>>> +    return NULL;
>>>>> +}
>>>>> +
>>>>>   static int blk_cbt_snap_create(struct request_queue *q, __u8 *name,
>>>>>                      struct blk_user_cbt_snap_create __user *arg)
>>>>>   {
>>>>> @@ -382,8 +397,7 @@ static int blk_cbt_snap_create(struct request_queue *q, __u8 *name,
>>>>>           return -EFAULT;
>>>>>       mutex_lock(&cbt_mutex);
>>>>> -    cbt = q->cbt;
>>>>> -
>>>>> +    cbt = blk_cbt_find(q, name);
>>>>>       if (!cbt) {
>>>>>           mutex_unlock(&cbt_mutex);
>>>>>           return -ENOENT;
>>>>> @@ -392,11 +406,6 @@ static int blk_cbt_snap_create(struct request_queue *q, __u8 *name,
>>>>>       BUG_ON(!cbt->map);
>>>>>       BUG_ON(!cbt->block_max);
>>>>> -    if (!name || memcmp(name, cbt->name, sizeof(cbt->name))) {
>>>>> -        mutex_unlock(&cbt_mutex);
>>>>> -        return -EINVAL;
>>>>> -    }
>>>>> -
>>>>>       if (cbt->snp_map) {
>>>>>           mutex_unlock(&cbt_mutex);
>>>>>           return -EBUSY;
>>>> ...
>>>>> @@ -667,35 +665,46 @@ static void cbt_release_callback(struct rcu_head *head)
>>>>>       kfree(cbt);
>>>>>   }
>>>>> -void blk_cbt_release(struct request_queue *q)
>>>>> +static void blk_cbt_del(struct cbt_info *cbt)
>>>>>   {
>>>>> -    struct cbt_info *cbt;
>>>>> -    int in_use = 0;
>>>>>       unsigned long flags;
>>>>> +    int in_use;
>>>>> -    cbt = q->cbt;
>>>>> -    if (!cbt)
>>>>> -        return;
>>>>>       spin_lock_irqsave(&cbt->lock, flags);
>>>>>       set_bit(CBT_DEAD, &cbt->flags);
>>>>> -    rcu_assign_pointer(q->cbt, NULL);
>>>>> +    list_del_rcu(&cbt->list);
>>>>>       in_use = cbt->count;
>>>>>       spin_unlock_irqrestore(&cbt->lock, flags);
>>>>>       if (!in_use)
>>>>>           call_rcu(&cbt->rcu, &cbt_release_callback);
>>>>>   }
>>>>
>>>> Why don't we take rcu_read_lock around the list iteration in 
>>>> blk_cbt_find? If we don't take rcu_read_lock there, the cbt returned 
>>>> from blk_cbt_find can already be freed.
>>>
>>> rcu is not taken on paths where the mutex is held. cbt_mutex makes 
>>> user operations exclusive, so there won't be any race. Maybe I should 
>>> add a comment then?
>>>
>>
>> There is no mutex on the cbt_page_alloc path, and there is also no 
>> mutex on the removal path.
> 
> I do think that the removal path is blk_cbt_del(), which is called 
> either from release (see my answer about release and concurrency) or 
> from cbt_ioc_stop(), which takes cbt_mutex

   +-< blk_cbt_del
     +-< blk_cbt_release
     | +-< blk_free_queue # queue is at freeing stage and we don't expect concurrent access to cbt
     | | +-< blk_put_queue
     +-< cbt_ioc_stop # under cbt_mutex

If blk_cbt_del can't run concurrently with any other code accessing cbt 
in the blk_free_queue case, then why don't we (in blk_cbt_release) free 
the cbts from queue->cbt_list immediately, without the extra call_rcu?
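For concreteness, the direct-free variant I have in mind would look 
roughly like this (a sketch only, assuming blk_free_queue really excludes 
all concurrent readers; cbt_release_callback is the existing release 
callback from the patch):

```c
/*
 * Hypothetical sketch: if the queue is already unreachable when
 * blk_cbt_release() runs, no RCU readers can hold references to the
 * cbts, so they could be freed directly instead of via call_rcu().
 */
void blk_cbt_release(struct request_queue *q)
{
	struct cbt_info *cbt, *tmp;

	list_for_each_entry_safe(cbt, tmp, &q->cbt_list, list) {
		list_del(&cbt->list);
		cbt_release_callback(&cbt->rcu); /* frees cbt immediately */
	}
}
```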

> 
> cbt_page_alloc() is complicated, but in the end it is either called from 
> blk_cbt_bio_queue(), where rcu is taken, or from a bunch of user 
> operations, where the mutex is held. Could you point to any call stack 
> where this is false?
> 

agreed, ok

   +-< cbt_page_alloc
     +-< __blk_cbt_set
     | +-< blk_cbt_add
     | | +-< blk_cbt_bio_queue # under rcu_read_lock
     | +-< __cbt_flush_cpu_cache
     | | +-< cbt_flush_cache
     | |   +-< blk_cbt_snap_create # under cbt_mutex
     | |   +-< blk_cbt_update_cbt_size
     | |   | +-< blk_cbt_update_size # under cbt_mutex
     | |   +-< cbt_ioc_get # under cbt_mutex
     | |   +-< cbt_ioc_set # under cbt_mutex
     | +-< cbt_ioc_get # under cbt_mutex
     | +-< cbt_ioc_set # under cbt_mutex
     +-< blk_cbt_snap_merge_back # under cbt_mutex
     +-< cbt_flush_cache # ok, see above
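If we want this mixed rcu/mutex scheme both documented and checked, the 
_rcu list walk can carry a lockdep condition. A sketch of how blk_cbt_find 
could be annotated (the optional fourth argument of 
list_for_each_entry_rcu() is the lockdep condition accepted by recent 
kernels; whether it fits here is my assumption):

```c
/*
 * Sketch: blk_cbt_find() is reached either under rcu_read_lock() (bio
 * submission path) or under cbt_mutex (ioctl paths).  Passing a lockdep
 * condition to list_for_each_entry_rcu() lets lockdep verify that at
 * least one of the two protections is actually held by every caller.
 */
static struct cbt_info *blk_cbt_find(struct request_queue *q, __u8 *name)
{
	struct cbt_info *cbt;

	list_for_each_entry_rcu(cbt, &q->cbt_list, list,
				lockdep_is_held(&cbt_mutex))
		if (!memcmp(name, cbt->name, CBT_NAME_LENGTH))
			return cbt;

	return NULL;
}
```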

>>
>> We definitely need a good explanation of why the rcu usage is correct. 
>> If we partially protect the list with a lock, the lock should be taken 
>> on both ends of the critical section.
>>
>>>>
>>>> I see other places with an _rcu walk without rcu_read_lock, e.g. the 
>>>> cbt_page_alloc (!in_rcu) case and blk_cbt_update_size.
>>>>
>>>> Also, the list can probably be corrupted if blk_cbt_del runs 
>>>> concurrently with itself (blk_cbt_release vs cbt_ioc_stop). I think 
>>>> the whole loop in blk_cbt_release should be protected against a 
>>>> concurrent cbt_ioc_stop (which may already be the case, as 
>>>> blk_cbt_release probably runs at the very end).
>>>>
>>> I don't think blk_cbt_release can run concurrently with any of the 
>>> blk_cbt functions, because they are operations on a file descriptor. I 
>>> am sure that the kernel won't call blk_cbt_release() while some fd 
>>> still refers to the device
>>

-- 
Best regards, Tikhomirov Pavel
Senior Software Developer, Virtuozzo.
