[Devel] [PATCH VZ9 2/3] block/blk-cbt: allow multiple cbts in a single queue
Pavel Tikhomirov
ptikhomirov at virtuozzo.com
Thu Jan 23 12:23:36 MSK 2025
On 1/22/25 19:29, Andrey Zhadchenko wrote:
>
>
> On 1/22/25 12:14, Pavel Tikhomirov wrote:
>>
>>
>> On 1/22/25 19:05, Andrey Zhadchenko wrote:
>>>
>>>
>>> On 1/22/25 11:51, Pavel Tikhomirov wrote:
>>>>> +static struct cbt_info *blk_cbt_find(struct request_queue *q, __u8 *name)
>>>>> +{
>>>>> + struct cbt_info *cbt;
>>>>> +
>>>>> + list_for_each_entry(cbt, &q->cbt_list, list)
>>>>> + if (!memcmp(name, cbt->name, CBT_NAME_LENGTH))
>>>>> + return cbt;
>>>>> +
>>>>> + return NULL;
>>>>> +}
>>>>> +
>>>>> static int blk_cbt_snap_create(struct request_queue *q, __u8 *name,
>>>>> struct blk_user_cbt_snap_create __user *arg)
>>>>> {
>>>>> @@ -382,8 +397,7 @@ static int blk_cbt_snap_create(struct request_queue *q, __u8 *name,
>>>>> return -EFAULT;
>>>>> mutex_lock(&cbt_mutex);
>>>>> - cbt = q->cbt;
>>>>> -
>>>>> + cbt = blk_cbt_find(q, name);
>>>>> if (!cbt) {
>>>>> mutex_unlock(&cbt_mutex);
>>>>> return -ENOENT;
>>>>> @@ -392,11 +406,6 @@ static int blk_cbt_snap_create(struct request_queue *q, __u8 *name,
>>>>> BUG_ON(!cbt->map);
>>>>> BUG_ON(!cbt->block_max);
>>>>> - if (!name || memcmp(name, cbt->name, sizeof(cbt->name))) {
>>>>> - mutex_unlock(&cbt_mutex);
>>>>> - return -EINVAL;
>>>>> - }
>>>>> -
>>>>> if (cbt->snp_map) {
>>>>> mutex_unlock(&cbt_mutex);
>>>>> return -EBUSY;
>>>> ...
>>>>> @@ -667,35 +665,46 @@ static void cbt_release_callback(struct rcu_head *head)
>>>>> kfree(cbt);
>>>>> }
>>>>> -void blk_cbt_release(struct request_queue *q)
>>>>> +static void blk_cbt_del(struct cbt_info *cbt)
>>>>> {
>>>>> - struct cbt_info *cbt;
>>>>> - int in_use = 0;
>>>>> unsigned long flags;
>>>>> + int in_use;
>>>>> - cbt = q->cbt;
>>>>> - if (!cbt)
>>>>> - return;
>>>>> spin_lock_irqsave(&cbt->lock, flags);
>>>>> set_bit(CBT_DEAD, &cbt->flags);
>>>>> - rcu_assign_pointer(q->cbt, NULL);
>>>>> + list_del_rcu(&cbt->list);
>>>>> in_use = cbt->count;
>>>>> spin_unlock_irqrestore(&cbt->lock, flags);
>>>>> if (!in_use)
>>>>> call_rcu(&cbt->rcu, &cbt_release_callback);
>>>>> }
>>>>
>>>> Why don't we take rcu_read_lock around the list iteration in
>>>> blk_cbt_find? If we don't take rcu_read_lock there, the cbt returned
>>>> from blk_cbt_find may already have been freed.
>>>
>>> rcu is not taken on paths where the mutex is held. cbt_mutex makes
>>> user operations exclusive, so there won't be any race. Maybe I should
>>> add a comment then?
>>>
>>
>> There is no mutex on the cbt_page_alloc path, and there is also no
>> mutex on the removal path.
>
> I do think that the removal path is blk_cbt_del(), which is called
> either from release (see my answer about release and concurrency) or
> from cbt_ioc_stop(), which takes cbt_mutex
+-< blk_cbt_del
+-< blk_cbt_release
| +-< blk_free_queue # queue is at freeing stage and we don't expect concurrent access to cbt
| | +-< blk_put_queue
+-< cbt_ioc_stop # under cbt_mutex
If, in the blk_free_queue case, blk_cbt_del can't run concurrently with
any other code that accesses cbt, then why don't we (in blk_cbt_release)
free the cbts from queue->cbt_list immediately, without the excess
call_rcu?
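To illustrate the question: if the blk_free_queue path really has no
concurrent readers, the release path could, in principle, drop the grace
period and free everything synchronously. A hypothetical sketch, not the
actual patch; cbt_free_pages() is an assumed helper name for releasing
the map pages:

```c
/* Hypothetical: release variant without call_rcu, valid only under the
 * assumption that blk_free_queue guarantees no concurrent cbt access.
 */
void blk_cbt_release(struct request_queue *q)
{
	struct cbt_info *cbt, *tmp;

	/* _safe iteration because we delete entries while walking */
	list_for_each_entry_safe(cbt, tmp, &q->cbt_list, list) {
		list_del(&cbt->list);
		cbt_free_pages(cbt);	/* assumed page-freeing helper */
		kfree(cbt);
	}
}
```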
>
> cbt_page_alloc() is complicated, but in the end it is either called from
> blk_cbt_bio_queue(), where rcu is taken, or from a bunch of user
> operations, where the mutex is held. Could you point to any call stack
> where this is false?
>
agreed, ok
+-< cbt_page_alloc
+-< __blk_cbt_set
| +-< blk_cbt_add
| | +-< blk_cbt_bio_queue # under rcu_read_lock
| +-< __cbt_flush_cpu_cache
+-< cbt_flush_cache
+-< blk_cbt_snap_create # under cbt_mutex
+-< blk_cbt_update_cbt_size
| +-< blk_cbt_update_size # under cbt_mutex
+-< cbt_ioc_get # under cbt_mutex
+-< cbt_ioc_set # under cbt_mutex
| +-< cbt_ioc_get # under cbt_mutex
| +-< cbt_ioc_set # under cbt_mutex
+-< blk_cbt_snap_merge_back # under cbt_mutex
+-< cbt_flush_cache # ok, see above
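Given the call tree above, the list is walked either under rcu_read_lock
(the bio path) or under cbt_mutex (the ioctl paths). One way to make
that mixed contract explicit, and let lockdep verify it, would be the
four-argument form of list_for_each_entry_rcu(). A sketch against the
patch's blk_cbt_find, not a requirement:

```c
/* Sketch: document that the list is protected either by RCU (I/O path)
 * or by cbt_mutex (ioctl paths); lockdep checks the condition when the
 * caller is not in an RCU read-side critical section.
 */
static struct cbt_info *blk_cbt_find(struct request_queue *q, __u8 *name)
{
	struct cbt_info *cbt;

	list_for_each_entry_rcu(cbt, &q->cbt_list, list,
				lockdep_is_held(&cbt_mutex))
		if (!memcmp(name, cbt->name, CBT_NAME_LENGTH))
			return cbt;

	return NULL;
}
```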
>>
>> We definitely need a good explanation of why the rcu usage is correct.
>> If we partially protect the list with a lock, the lock should be on
>> both ends of the critical section.
>>
>>>>
>>>> I see other places with an _rcu walk without rcu_read_lock, e.g. the
>>>> cbt_page_alloc (!in_rcu) case and blk_cbt_update_size.
>>>>
>>>> Also, the list can probably be broken if blk_cbt_del runs
>>>> concurrently with itself (blk_cbt_release vs cbt_ioc_stop). I think
>>>> the whole loop in blk_cbt_release should be protected against a
>>>> concurrent cbt_ioc_stop (that may be fine, as blk_cbt_release
>>>> probably runs at the very end).
>>>>
>>> I don't think blk_cbt_release can run concurrently with any of the
>>> blk_cbt functions, because they are operations on a file descriptor.
>>> I am sure that the kernel won't call blk_cbt_release() while some fd
>>> still refers to the device
>>
--
Best regards, Tikhomirov Pavel
Senior Software Developer, Virtuozzo.