[Devel] [PATCH VZ9 2/3] block/blk-cbt: allow multiple cbts in a single queue

Andrey Zhadchenko andrey.zhadchenko at virtuozzo.com
Wed Jan 22 14:29:52 MSK 2025



On 1/22/25 12:14, Pavel Tikhomirov wrote:
> 
> 
> On 1/22/25 19:05, Andrey Zhadchenko wrote:
>>
>>
>> On 1/22/25 11:51, Pavel Tikhomirov wrote:
>>>> +static struct cbt_info *blk_cbt_find(struct request_queue *q, __u8 *name)
>>>> +{
>>>> +    struct cbt_info *cbt;
>>>> +
>>>> +    list_for_each_entry(cbt, &q->cbt_list, list)
>>>> +        if (!memcmp(name, cbt->name, CBT_NAME_LENGTH))
>>>> +            return cbt;
>>>> +
>>>> +    return NULL;
>>>> +}
>>>> +
>>>>   static int blk_cbt_snap_create(struct request_queue *q, __u8 *name,
>>>>                      struct blk_user_cbt_snap_create __user *arg)
>>>>   {
>>>> @@ -382,8 +397,7 @@ static int blk_cbt_snap_create(struct request_queue *q, __u8 *name,
>>>>           return -EFAULT;
>>>>       mutex_lock(&cbt_mutex);
>>>> -    cbt = q->cbt;
>>>> -
>>>> +    cbt = blk_cbt_find(q, name);
>>>>       if (!cbt) {
>>>>           mutex_unlock(&cbt_mutex);
>>>>           return -ENOENT;
>>>> @@ -392,11 +406,6 @@ static int blk_cbt_snap_create(struct request_queue *q, __u8 *name,
>>>>       BUG_ON(!cbt->map);
>>>>       BUG_ON(!cbt->block_max);
>>>> -    if (!name || memcmp(name, cbt->name, sizeof(cbt->name))) {
>>>> -        mutex_unlock(&cbt_mutex);
>>>> -        return -EINVAL;
>>>> -    }
>>>> -
>>>>       if (cbt->snp_map) {
>>>>           mutex_unlock(&cbt_mutex);
>>>>           return -EBUSY;
>>> ...
>>>> @@ -667,35 +665,46 @@ static void cbt_release_callback(struct rcu_head *head)
>>>>       kfree(cbt);
>>>>   }
>>>> -void blk_cbt_release(struct request_queue *q)
>>>> +static void blk_cbt_del(struct cbt_info *cbt)
>>>>   {
>>>> -    struct cbt_info *cbt;
>>>> -    int in_use = 0;
>>>>       unsigned long flags;
>>>> +    int in_use;
>>>> -    cbt = q->cbt;
>>>> -    if (!cbt)
>>>> -        return;
>>>>       spin_lock_irqsave(&cbt->lock, flags);
>>>>       set_bit(CBT_DEAD, &cbt->flags);
>>>> -    rcu_assign_pointer(q->cbt, NULL);
>>>> +    list_del_rcu(&cbt->list);
>>>>       in_use = cbt->count;
>>>>       spin_unlock_irqrestore(&cbt->lock, flags);
>>>>       if (!in_use)
>>>>           call_rcu(&cbt->rcu, &cbt_release_callback);
>>>>   }
>>>
>>> Why don't we take rcu_read_lock around the list iteration in 
>>> blk_cbt_find? If we don't take rcu_read_lock there, the cbt returned 
>>> from blk_cbt_find may already be freed.
>>
>> RCU is not taken on paths where the mutex is held. cbt_mutex makes user 
>> operations exclusive, so there won't be any race. Maybe I should add a 
>> comment then?
>>
> 
> There is no mutex in the cbt_page_alloc path, and there is also no mutex 
> on the removal path.

I think the removal path is blk_cbt_del(), which is called either from 
release (see my answer about release and concurrency) or from 
cbt_ioc_stop(), which takes cbt_mutex.

cbt_page_alloc() is complicated, but in the end it is called either from 
blk_cbt_bio_queue(), where the rcu read lock is taken, or from a number 
of user operations, where the mutex is held. Could you point to any call 
stack where this is false?

> 
> We definitely need a good explanation of why the rcu usage is correct. 
> If we partially protect the list with a lock, the lock should be held on 
> both ends of the critical section.
> 
>>>
>>> I see other places with a _rcu walk but no rcu_read_lock, e.g. the 
>>> cbt_page_alloc (!in_rcu) case and blk_cbt_update_size.
>>>
>>> Also, the list can probably be corrupted if blk_cbt_del runs 
>>> concurrently with itself (blk_cbt_release vs cbt_ioc_stop). I think 
>>> the whole loop in blk_cbt_release should be protected against a 
>>> concurrent cbt_ioc_stop (that may already be true, as blk_cbt_release 
>>> probably runs at the very end).
>>>
>> I don't think blk_cbt_release can run concurrently with any of the 
>> blk_cbt functions, because they operate on a file descriptor. I am 
>> sure the kernel won't call blk_cbt_release() while some fd still 
>> refers to the device.
> 

