[Devel] [PATCH VZ9 2/3] block/blk-cbt: allow multiple cbts in a single queue
Pavel Tikhomirov
ptikhomirov at virtuozzo.com
Wed Jan 22 14:14:08 MSK 2025
On 1/22/25 19:05, Andrey Zhadchenko wrote:
>
>
> On 1/22/25 11:51, Pavel Tikhomirov wrote:
>>> +static struct cbt_info *blk_cbt_find(struct request_queue *q, __u8 *name)
>>> +{
>>> +        struct cbt_info *cbt;
>>> +
>>> +        list_for_each_entry(cbt, &q->cbt_list, list)
>>> +                if (!memcmp(name, cbt->name, CBT_NAME_LENGTH))
>>> +                        return cbt;
>>> +
>>> +        return NULL;
>>> +}
>>> +
>>>  static int blk_cbt_snap_create(struct request_queue *q, __u8 *name,
>>>                                 struct blk_user_cbt_snap_create __user *arg)
>>>  {
>>> @@ -382,8 +397,7 @@ static int blk_cbt_snap_create(struct request_queue *q, __u8 *name,
>>>                  return -EFAULT;
>>>          mutex_lock(&cbt_mutex);
>>> -        cbt = q->cbt;
>>> -
>>> +        cbt = blk_cbt_find(q, name);
>>>          if (!cbt) {
>>>                  mutex_unlock(&cbt_mutex);
>>>                  return -ENOENT;
>>> @@ -392,11 +406,6 @@ static int blk_cbt_snap_create(struct request_queue *q, __u8 *name,
>>>          BUG_ON(!cbt->map);
>>>          BUG_ON(!cbt->block_max);
>>> -        if (!name || memcmp(name, cbt->name, sizeof(cbt->name))) {
>>> -                mutex_unlock(&cbt_mutex);
>>> -                return -EINVAL;
>>> -        }
>>> -
>>>          if (cbt->snp_map) {
>>>                  mutex_unlock(&cbt_mutex);
>>>                  return -EBUSY;
>> ...
>>> @@ -667,35 +665,46 @@ static void cbt_release_callback(struct rcu_head *head)
>>>          kfree(cbt);
>>>  }
>>> -void blk_cbt_release(struct request_queue *q)
>>> +static void blk_cbt_del(struct cbt_info *cbt)
>>>  {
>>> -        struct cbt_info *cbt;
>>> -        int in_use = 0;
>>>          unsigned long flags;
>>> +        int in_use;
>>> -        cbt = q->cbt;
>>> -        if (!cbt)
>>> -                return;
>>>          spin_lock_irqsave(&cbt->lock, flags);
>>>          set_bit(CBT_DEAD, &cbt->flags);
>>> -        rcu_assign_pointer(q->cbt, NULL);
>>> +        list_del_rcu(&cbt->list);
>>>          in_use = cbt->count;
>>>          spin_unlock_irqrestore(&cbt->lock, flags);
>>>          if (!in_use)
>>>                  call_rcu(&cbt->rcu, &cbt_release_callback);
>>>  }
>>
>> Why don't we take rcu_read_lock around the list iteration in
>> blk_cbt_find()? If we don't take rcu_read_lock there, the cbt returned
>> from blk_cbt_find() may already have been freed.
>
> RCU is not taken on the paths where the mutex is held. cbt_mutex makes
> user operations mutually exclusive, so there won't be any race. Maybe I
> should add a comment then?
>
>
There is no mutex on the cbt_page_alloc path, and there is no mutex on
the removal path either.
We definitely need a good explanation of why the RCU usage is correct.
If the list is only partially protected by a lock, that is not enough:
the protection has to cover both sides of the critical section, the
writers as well as the readers.
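
To illustrate, here is an untested sketch of what a lookup safe for the
lockless paths could look like. blk_cbt_find_rcu_get() is a made-up
name, not part of the patch; the idea is to pin the entry via
cbt->count before leaving the RCU read-side section, matching the
count/CBT_DEAD logic that blk_cbt_del() relies on:

/*
 * Untested sketch: RCU-safe lookup for paths that hold neither
 * cbt_mutex nor anything else serializing modification of q->cbt_list.
 */
static struct cbt_info *blk_cbt_find_rcu_get(struct request_queue *q,
                                             __u8 *name)
{
        struct cbt_info *cbt;
        unsigned long flags;

        rcu_read_lock();
        list_for_each_entry_rcu(cbt, &q->cbt_list, list) {
                if (memcmp(name, cbt->name, CBT_NAME_LENGTH))
                        continue;
                spin_lock_irqsave(&cbt->lock, flags);
                if (test_bit(CBT_DEAD, &cbt->flags)) {
                        /* deleted concurrently, treat as not found */
                        spin_unlock_irqrestore(&cbt->lock, flags);
                        break;
                }
                /* pin it: blk_cbt_del() defers the free while count != 0 */
                cbt->count++;
                spin_unlock_irqrestore(&cbt->lock, flags);
                rcu_read_unlock();
                return cbt;
        }
        rcu_read_unlock();
        return NULL;
}

The caller would then be responsible for dropping the reference and
triggering cbt_release_callback() if CBT_DEAD was set in the meantime.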
>>
>> I see other places where the list is walked with _rcu primitives but
>> without rcu_read_lock, e.g. the cbt_page_alloc() (!in_rcu) case and
>> blk_cbt_update_size().
>>
>> Also, the list can probably get corrupted if blk_cbt_del() runs
>> concurrently with itself (blk_cbt_release() vs cbt_ioc_stop()). I think
>> the whole loop in blk_cbt_release() should be protected against a
>> concurrent cbt_ioc_stop() (that may already be the case, as
>> blk_cbt_release() probably runs at the very end).
>>
> I don't think blk_cbt_release() can run concurrently with any of the
> blk_cbt ioctl functions, because those operate on a file descriptor. I
> am sure the kernel won't call blk_cbt_release() while some fd still
> refers to the device.
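
Even if that assumption holds, making the exclusion explicit would be
cheap. An untested sketch, assuming blk_cbt_release() now walks
q->cbt_list (the loop body here is only illustrative):

/*
 * Untested sketch: taking cbt_mutex around the teardown loop
 * serializes blk_cbt_del() against cbt_ioc_stop(), so correctness
 * would not depend on the "no fd is open at release time" argument.
 */
void blk_cbt_release(struct request_queue *q)
{
        struct cbt_info *cbt, *tmp;

        mutex_lock(&cbt_mutex);
        list_for_each_entry_safe(cbt, tmp, &q->cbt_list, list)
                blk_cbt_del(cbt);
        mutex_unlock(&cbt_mutex);
}

With that in place, blk_cbt_del() could also do
lockdep_assert_held(&cbt_mutex) to document the invariant.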
--
Best regards, Tikhomirov Pavel
Senior Software Developer, Virtuozzo.