[Devel] [RFC PATCH vz9 v6 12/62] dm-ploop: WIP move from wq to kthread

Pavel Tikhomirov ptikhomirov at virtuozzo.com
Fri Jan 10 13:40:28 MSK 2025



On 1/10/25 18:36, Alexander Atanasov wrote:
> On 10.01.25 12:28, Pavel Tikhomirov wrote:
>>
>>
>> On 12/6/24 05:55, Alexander Atanasov wrote:
>>> This is a base for moving to multithreaded model.
>>>
>>> https://virtuozzo.atlassian.net/browse/VSTOR-91821
>>> Signed-off-by: Alexander Atanasov <alexander.atanasov at virtuozzo.com>
>>> ---
>>>   drivers/md/dm-ploop-map.c    | 65 ++++++++++++++++++++++++++++++++----
>>>   drivers/md/dm-ploop-target.c | 60 +++++++++++++++++++++++++++++++++
>>>   drivers/md/dm-ploop.h        | 10 ++++++
>>>   3 files changed, 128 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/md/dm-ploop-map.c b/drivers/md/dm-ploop-map.c
>>> index 0240d6fc376c..28bced7684e5 100644
>>> --- a/drivers/md/dm-ploop-map.c
>>> +++ b/drivers/md/dm-ploop-map.c
>>> @@ -338,6 +338,16 @@ static int ploop_split_pio_to_list(struct ploop *ploop, struct pio *pio,
>>>   }
>>>   ALLOW_ERROR_INJECTION(ploop_split_pio_to_list, ERRNO);
>>> +#define USE_KTHREAD
>>> +static void ploop_schedule_work(struct ploop *ploop)
>>> +{
>>> +#ifndef USE_KTHREAD
>>> +    queue_work(ploop->wq, &ploop->worker);
>>> +#else
>>> +    wake_up_process(ploop->kt_worker->task);
>>> +#endif
>>> +}
>>> +
>>>   static void ploop_dispatch_pio(struct ploop *ploop, struct pio *pio)
>>>   {
>>>       struct llist_head *list = (struct llist_head *)&ploop->pios[pio->queue_list_id];
>>> @@ -361,7 +371,7 @@ void ploop_dispatch_pios(struct ploop *ploop, struct pio *pio,
>>>               ploop_dispatch_pio(ploop, pio);
>>>       }
>>> -    queue_work(ploop->wq, &ploop->worker);
>>> +    ploop_schedule_work(ploop);
>>>   }
>>>   static bool ploop_delay_if_md_busy(struct ploop *ploop, struct md_page *md,
>>> @@ -693,7 +703,7 @@ static void ploop_complete_cow(struct ploop_cow *cow, blk_status_t bi_status)
>>>       ploop_queue_or_fail(ploop, blk_status_to_errno(bi_status), cow_pio);
>>> -    queue_work(ploop->wq, &ploop->worker);
>>> +    ploop_schedule_work(ploop);
>>>       ploop_free_pio_with_pages(ploop, cow->aux_pio);
>>>       kmem_cache_free(cow_cache, cow);
>>>   }
>>> @@ -1142,7 +1152,7 @@ static void ploop_queue_resubmit(struct pio *pio)
>>>       llist_add((struct llist_node *)(&pio->list), &ploop->llresubmit_pios);
>>> -    queue_work(ploop->wq, &ploop->worker);
>>> +    ploop_schedule_work(ploop);
>>>   }
>>>   static void ploop_check_standby_mode(struct ploop *ploop, long res)
>>> @@ -1815,9 +1825,8 @@ static void process_ploop_fsync_work(struct ploop *ploop)
>>>       }
>>>   }
>>> -void do_ploop_work(struct work_struct *ws)
>>> +void do_ploop_run_work(struct ploop *ploop)
>>>   {
>>> -    struct ploop *ploop = container_of(ws, struct ploop, worker);
>>>       LIST_HEAD(deferred_pios);
>>>       struct llist_node *llembedded_pios;
>>>       struct llist_node *lldeferred_pios;
>>> @@ -1829,12 +1838,14 @@ void do_ploop_work(struct work_struct *ws)
>>>       current->flags |= PF_IO_THREAD|PF_LOCAL_THROTTLE|PF_MEMALLOC_NOIO;
>>>       llembedded_pios = llist_del_all(&ploop->pios[PLOOP_LIST_PREPARE]);
>>> +    smp_wmb(); /* */
>>> +
>>>       lldeferred_pios = llist_del_all(&ploop->pios[PLOOP_LIST_DEFERRED]);
>>>       lldiscard_pios = llist_del_all(&ploop->pios[PLOOP_LIST_DISCARD]);
>>>       llcow_pios = llist_del_all(&ploop->pios[PLOOP_LIST_COW]);
>>>       llresubmit = llist_del_all(&ploop->llresubmit_pios);
>>> -    /* add old deferred to the list */
>>> +    /* add old deferred back to the list */
>>>       if (lldeferred_pios) {
>>>           struct llist_node *pos, *t;
>>>           struct pio *pio;
>>> @@ -1866,6 +1877,46 @@ void do_ploop_work(struct work_struct *ws)
>>>       current->flags = old_flags;
>>>   }
>>> +void do_ploop_work(struct work_struct *ws)
>>> +{
>>> +    struct ploop *ploop = container_of(ws, struct ploop, worker);
>>> +
>>> +    do_ploop_run_work(ploop);
>>> +}
>>> +
>>> +int ploop_worker(void *data)
>>> +{
>>> +    struct ploop_worker *worker = data;
>>> +    struct ploop *ploop = worker->ploop;
>>> +
>>> +    for (;;) {
>>> +        set_current_state(TASK_INTERRUPTIBLE);
>>> +
>>> +        if (kthread_should_stop()) {
>>> +            __set_current_state(TASK_RUNNING);
>>> +            break;
>>> +        }
>>> +#ifdef USE_KTHREAD
>>> +        smp_rmb(); /* */
>>
>> 1. This requires an explanation both in the comments and in the commit
>> message. Which reads and which writes are you trying to separate with
>> these memory barriers?
>>
>> 2. It is very hard to guess. It can't be to separate the setting of the
>> task's interruptible state from the llist_empty checks, because then
>> you should have used smp_mb (as you have both a write and a read). It
>> could be to separate the kthread_should_stop read from the llist_empty
>> reads. But if that were the case, then why do we have smp_rmb (and not
>> smp_wmb) in ploop_destroy, where we have kthread_stop, which is
>> effectively a write?
> 
> We already discussed this with Andrey - the barriers are leftovers which
> I missed removing.
> 
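Understood. Just to make the question itself clear, the pairing I was
asking about would look roughly like this (a generic sketch with made-up
names, not ploop code):

    static int data;
    static int ready;

    static void producer(void)
    {
            data = 42;              /* publish the payload                  */
            smp_wmb();              /* order the data store before the flag */
            WRITE_ONCE(ready, 1);
    }

    static void consumer(void)
    {
            if (READ_ONCE(ready)) { /* observe the flag                     */
                    smp_rmb();      /* order the flag load before the data  */
                    pr_info("data=%d\n", data);
            }
    }

With the leftover barriers removed this is moot, it only shows what
"what does this barrier pair with" means.
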
>> 3. Side note: here we probably need atomic flag setting, else the flags
>> can get corrupted in the case of a concurrent ploop_destroy->kthread_stop.
>>
>> do_ploop_run_work() {
>>    current->flags |= PF_IO_THREAD|PF_LOCAL_THROTTLE|PF_MEMALLOC_NOIO;
>> }
>>
> 
> That is interesting ... as we play a lot with those flags.
> Even if we implement some atomicity inside the ploop code - how will we
> protect them if they are changed from other kernel code?
> I will think about it.
> 
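To be precise about what I mean by "corrupted": the |= is a plain
read-modify-write, nothing atomic about it. A sketch of the window
(whether anything else really writes ->flags of the worker task
concurrently is exactly what needs checking):

            unsigned int tmp;

            tmp = current->flags;           /* read                          */
            tmp |= PF_MEMALLOC_NOIO;        /* modify                        */
            current->flags = tmp;           /* write back - a store to
                                             * ->flags made by another
                                             * context inside this window
                                             * would be lost                 */

(memalloc_noio_save()/memalloc_noio_restore() exist for the NOIO bit, but
they do the same non-atomic RMW on current->flags, so they would not help
against a cross-task writer either.)
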
>> 4. Also, the use of smp_rmb in ploop_destroy looks like it separates the
>> llist_empty reads from each other across the while-loop iterations. But
>> it's unclear to me why we would need that...
>>
>> 5. Also note that wake_up_process already has implicit write/read
>> memory barriers on a successful wakeup.
> 
> The barriers are removed - it was an early experiment.
> 
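Good. For the record, once the barriers are gone the wait loop I would
expect is roughly the canonical one below (a sketch only; ploop_has_work()
is a made-up helper standing in for the llist_empty checks on the pio
lists). set_current_state() implies a full barrier and wake_up_process()
orders the wakeup against it, so no explicit barriers are needed:

    int ploop_worker(void *data)
    {
            struct ploop_worker *worker = data;
            struct ploop *ploop = worker->ploop;

            for (;;) {
                    set_current_state(TASK_INTERRUPTIBLE);

                    if (kthread_should_stop()) {
                            __set_current_state(TASK_RUNNING);
                            break;
                    }

                    if (!ploop_has_work(ploop)) {
                            /* nothing queued - sleep until the next wakeup */
                            schedule();
                            continue;
                    }

                    __set_current_state(TASK_RUNNING);
                    do_ploop_run_work(ploop);
            }

            return 0;
    }
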
>>
>> Please write some explanations, otherwise reviewing this is too complex.
> 
> These patches follow, step by step, how the work progressed and are still
> an RFC. So to make things easier it is good to check the final result -
> some of the issues are resolved there and some patches are still to be
> merged, but I give priority to bugs/design issues first.
> 

Yes, I understand, I just got lost among those barriers =) Please try to
write follow-ups about the major things that have already been resolved,
just to keep everybody updated; I seem to have at least partially missed
the messages about the barriers.

> 

-- 
Best regards, Tikhomirov Pavel
Senior Software Developer, Virtuozzo.


