[Devel] [PATCH VZ9 2/3] block/blk-cbt: allow multiple cbts in a single queue
Andrey Zhadchenko
andrey.zhadchenko at virtuozzo.com
Wed Jan 22 14:29:52 MSK 2025
On 1/22/25 12:14, Pavel Tikhomirov wrote:
>
>
> On 1/22/25 19:05, Andrey Zhadchenko wrote:
>>
>>
>> On 1/22/25 11:51, Pavel Tikhomirov wrote:
>>>> +static struct cbt_info *blk_cbt_find(struct request_queue *q, __u8
>>>> *name)
>>>> +{
>>>> + struct cbt_info *cbt;
>>>> +
>>>> + list_for_each_entry(cbt, &q->cbt_list, list)
>>>> + if (!memcmp(name, cbt->name, CBT_NAME_LENGTH))
>>>> + return cbt;
>>>> +
>>>> + return NULL;
>>>> +}
>>>> +
>>>> static int blk_cbt_snap_create(struct request_queue *q, __u8 *name,
>>>> struct blk_user_cbt_snap_create __user *arg)
>>>> {
>>>> @@ -382,8 +397,7 @@ static int blk_cbt_snap_create(struct
>>>> request_queue *q, __u8 *name,
>>>> return -EFAULT;
>>>> mutex_lock(&cbt_mutex);
>>>> - cbt = q->cbt;
>>>> -
>>>> + cbt = blk_cbt_find(q, name);
>>>> if (!cbt) {
>>>> mutex_unlock(&cbt_mutex);
>>>> return -ENOENT;
>>>> @@ -392,11 +406,6 @@ static int blk_cbt_snap_create(struct
>>>> request_queue *q, __u8 *name,
>>>> BUG_ON(!cbt->map);
>>>> BUG_ON(!cbt->block_max);
>>>> - if (!name || memcmp(name, cbt->name, sizeof(cbt->name))) {
>>>> - mutex_unlock(&cbt_mutex);
>>>> - return -EINVAL;
>>>> - }
>>>> -
>>>> if (cbt->snp_map) {
>>>> mutex_unlock(&cbt_mutex);
>>>> return -EBUSY;
>>> ...
>>>> @@ -667,35 +665,46 @@ static void cbt_release_callback(struct
>>>> rcu_head *head)
>>>> kfree(cbt);
>>>> }
>>>> -void blk_cbt_release(struct request_queue *q)
>>>> +static void blk_cbt_del(struct cbt_info *cbt)
>>>> {
>>>> - struct cbt_info *cbt;
>>>> - int in_use = 0;
>>>> unsigned long flags;
>>>> + int in_use;
>>>> - cbt = q->cbt;
>>>> - if (!cbt)
>>>> - return;
>>>> spin_lock_irqsave(&cbt->lock, flags);
>>>> set_bit(CBT_DEAD, &cbt->flags);
>>>> - rcu_assign_pointer(q->cbt, NULL);
>>>> + list_del_rcu(&cbt->list);
>>>> in_use = cbt->count;
>>>> spin_unlock_irqrestore(&cbt->lock, flags);
>>>> if (!in_use)
>>>> call_rcu(&cbt->rcu, &cbt_release_callback);
>>>> }
>>>
>>> Why don't we take rcu_read_lock around the list iteration in
>>> blk_cbt_find? If we don't take rcu_read_lock there, the cbt returned
>>> from blk_cbt_find can already be freed.
>>
>> rcu is not taken on a path where mutex is held. cbt_mutex makes user
>> operations exclusive, so there won't be any race. Maybe I should add a
>> comment then?
>>
>
> There is no mutex on the cbt_page_alloc path, and there is also no
> mutex on the removal path.
I think the removal path is blk_cbt_del(), which is called either from
the release path (see my answer about release and concurrency) or from
cbt_ioc_stop(), which takes cbt_mutex.
cbt_page_alloc() is more complicated, but in the end it is called either
from blk_cbt_bio_queue(), where the rcu read lock is taken, or from a
number of user operations, where the mutex is held. Could you point to
any call stack where this is not the case?
>
> We definitely need a good explanation of why the rcu usage is correct.
> If we only partially protect the list with a lock, the lock should be
> taken on both ends of the critical section.
>
>>>
>>> I see other places of _rcu walk without rcu_read_lock, e.g.:
>>> cbt_page_alloc (!in_rcu) case, blk_cbt_update_size.
>>>
>>> Also list can be broken probably if blk_cbt_del runs concurrently
>>> with itself (blk_cbt_release vs cbt_ioc_stop). I think the whole loop
>>> in blk_cbt_release should be protected against concurrent
>>> cbt_ioc_stop (that may be true as blk_cbt_release probably runs at
>>> the very end).
>>>
>> I don't think blk_cbt_release can run concurrently with any of the
>> blk_cbt functions, because they are operations on a file descriptor.
>> I am sure the kernel won't call blk_cbt_release() while some fd still
>> refers to the device.
>