[Devel] [RFC PATCH vz9 v6 12/62] dm-ploop: WIP move from wq to kthread

Alexander Atanasov alexander.atanasov at virtuozzo.com
Fri Jan 10 13:36:07 MSK 2025


On 10.01.25 12:28, Pavel Tikhomirov wrote:
> 
> 
> On 12/6/24 05:55, Alexander Atanasov wrote:
>> This is a base for moving to multithreaded model.
>>
>> https://virtuozzo.atlassian.net/browse/VSTOR-91821
>> Signed-off-by: Alexander Atanasov <alexander.atanasov at virtuozzo.com>
>> ---
>>   drivers/md/dm-ploop-map.c    | 65 ++++++++++++++++++++++++++++++++----
>>   drivers/md/dm-ploop-target.c | 60 +++++++++++++++++++++++++++++++++
>>   drivers/md/dm-ploop.h        | 10 ++++++
>>   3 files changed, 128 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/md/dm-ploop-map.c b/drivers/md/dm-ploop-map.c
>> index 0240d6fc376c..28bced7684e5 100644
>> --- a/drivers/md/dm-ploop-map.c
>> +++ b/drivers/md/dm-ploop-map.c
>> @@ -338,6 +338,16 @@ static int ploop_split_pio_to_list(struct ploop 
>> *ploop, struct pio *pio,
>>   }
>>   ALLOW_ERROR_INJECTION(ploop_split_pio_to_list, ERRNO);
>> +#define USE_KTHREAD
>> +static void ploop_schedule_work(struct ploop *ploop)
>> +{
>> +#ifndef USE_KTHREAD
>> +    queue_work(ploop->wq, &ploop->worker);
>> +#else
>> +    wake_up_process(ploop->kt_worker->task);
>> +#endif
>> +}
>> +
>>   static void ploop_dispatch_pio(struct ploop *ploop, struct pio *pio)
>>   {
>>       struct llist_head *list = (struct llist_head 
>> *)&ploop->pios[pio->queue_list_id];
>> @@ -361,7 +371,7 @@ void ploop_dispatch_pios(struct ploop *ploop, 
>> struct pio *pio,
>>               ploop_dispatch_pio(ploop, pio);
>>       }
>> -    queue_work(ploop->wq, &ploop->worker);
>> +    ploop_schedule_work(ploop);
>>   }
>>   static bool ploop_delay_if_md_busy(struct ploop *ploop, struct 
>> md_page *md,
>> @@ -693,7 +703,7 @@ static void ploop_complete_cow(struct ploop_cow 
>> *cow, blk_status_t bi_status)
>>       ploop_queue_or_fail(ploop, blk_status_to_errno(bi_status), 
>> cow_pio);
>> -    queue_work(ploop->wq, &ploop->worker);
>> +    ploop_schedule_work(ploop);
>>       ploop_free_pio_with_pages(ploop, cow->aux_pio);
>>       kmem_cache_free(cow_cache, cow);
>>   }
>> @@ -1142,7 +1152,7 @@ static void ploop_queue_resubmit(struct pio *pio)
>>       llist_add((struct llist_node *)(&pio->list), 
>> &ploop->llresubmit_pios);
>> -    queue_work(ploop->wq, &ploop->worker);
>> +    ploop_schedule_work(ploop);
>>   }
>>   static void ploop_check_standby_mode(struct ploop *ploop, long res)
>> @@ -1815,9 +1825,8 @@ static void process_ploop_fsync_work(struct 
>> ploop *ploop)
>>       }
>>   }
>> -void do_ploop_work(struct work_struct *ws)
>> +void do_ploop_run_work(struct ploop *ploop)
>>   {
>> -    struct ploop *ploop = container_of(ws, struct ploop, worker);
>>       LIST_HEAD(deferred_pios);
>>       struct llist_node *llembedded_pios;
>>       struct llist_node *lldeferred_pios;
>> @@ -1829,12 +1838,14 @@ void do_ploop_work(struct work_struct *ws)
>>       current->flags |= PF_IO_THREAD|PF_LOCAL_THROTTLE|PF_MEMALLOC_NOIO;
>>       llembedded_pios = llist_del_all(&ploop->pios[PLOOP_LIST_PREPARE]);
>> +    smp_wmb(); /* */
>> +
>>       lldeferred_pios = llist_del_all(&ploop->pios[PLOOP_LIST_DEFERRED]);
>>       lldiscard_pios = llist_del_all(&ploop->pios[PLOOP_LIST_DISCARD]);
>>       llcow_pios = llist_del_all(&ploop->pios[PLOOP_LIST_COW]);
>>       llresubmit = llist_del_all(&ploop->llresubmit_pios);
>> -    /* add old deferred to the list */
>> +    /* add old deferred back to the list */
>>       if (lldeferred_pios) {
>>           struct llist_node *pos, *t;
>>           struct pio *pio;
>> @@ -1866,6 +1877,46 @@ void do_ploop_work(struct work_struct *ws)
>>       current->flags = old_flags;
>>   }
>> +void do_ploop_work(struct work_struct *ws)
>> +{
>> +    struct ploop *ploop = container_of(ws, struct ploop, worker);
>> +
>> +    do_ploop_run_work(ploop);
>> +}
>> +
>> +int ploop_worker(void *data)
>> +{
>> +    struct ploop_worker *worker = data;
>> +    struct ploop *ploop = worker->ploop;
>> +
>> +    for (;;) {
>> +        set_current_state(TASK_INTERRUPTIBLE);
>> +
>> +        if (kthread_should_stop()) {
>> +            __set_current_state(TASK_RUNNING);
>> +            break;
>> +        }
>> +#ifdef USE_KTHREAD
>> +        smp_rmb(); /* */
> 
> 1. This requires explanation both in the comments and in the commit message. 
> Which reads and which writes are you trying to separate with the memory barriers?
> 
> 2. It is very hard to guess. It can't be to separate the task interruptible 
> state setting from the llist_empty checks, because then you should've used 
> smp_mb (as you have both a write and a read). It could be to separate the 
> kthread_should_stop read from the llist_empty check reads. But if that were so, 
> then why do we have smp_rmb (and not wmb) in ploop_destroy, where we have 
> kthread_stop, which is effectively a write.

We already discussed this with Andrey - the barriers are leftovers that I 
missed removing.
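
For reference, the wait loop in the final version can simply follow the usual 
kthread idiom with no explicit barriers. A rough sketch (the list names are 
taken from the patch; the exact set of emptiness checks is my assumption, 
since the rest of ploop_worker() is not quoted here):

	for (;;) {
		set_current_state(TASK_INTERRUPTIBLE);

		if (kthread_should_stop()) {
			__set_current_state(TASK_RUNNING);
			break;
		}

		/*
		 * set_current_state() above implies a full barrier, and
		 * wake_up_process() in ploop_schedule_work() pairs with it,
		 * so the emptiness checks below need no extra smp_rmb().
		 */
		if (llist_empty(&ploop->pios[PLOOP_LIST_PREPARE]) &&
		    llist_empty(&ploop->pios[PLOOP_LIST_DEFERRED]) &&
		    llist_empty(&ploop->pios[PLOOP_LIST_DISCARD]) &&
		    llist_empty(&ploop->pios[PLOOP_LIST_COW]) &&
		    llist_empty(&ploop->llresubmit_pios)) {
			schedule();
			continue;
		}

		__set_current_state(TASK_RUNNING);
		do_ploop_run_work(ploop);
	}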

> 3. Side note: here we probably need atomic flag setting, else the flags can 
> get corrupted in case of a concurrent ploop_destroy->kthread_stop.
> 
> do_ploop_run_work() {
>    current->flags |= PF_IO_THREAD|PF_LOCAL_THROTTLE|PF_MEMALLOC_NOIO;
> }
> 

That is interesting ... as we play a lot with those flags.
Even if we implement some atomicity inside the ploop code - how would we 
protect them if they are changed from other kernel code?
I will think about it.
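
One option, at least for the PF_MEMALLOC_NOIO bit, is to use the existing 
memalloc_noio_save()/memalloc_noio_restore() helpers, so the flag update 
follows the same pattern the rest of the kernel uses on current->flags. Only 
a sketch of the idea - it does not make the update atomic either, and 
PF_IO_THREAD/PF_LOCAL_THROTTLE would still be open-coded; ploop_run_work_noio() 
is a made-up wrapper, not something from the patch:

	#include <linux/sched/mm.h>

	static void ploop_run_work_noio(struct ploop *ploop)
	{
		unsigned int noio_flags;

		noio_flags = memalloc_noio_save();	/* sets PF_MEMALLOC_NOIO on current */

		do_ploop_run_work(ploop);		/* drain the pio lists */

		memalloc_noio_restore(noio_flags);	/* put the flag back as it was */
	}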

> 4. Also, the use of smp_rmb in ploop_destroy looks like it separates the 
> llist_empty reads from each other across while-loop iterations. But it's 
> unclear to me why we might need that...
> 
> 5. Also note that wake_up_process already has implicit write/read memory 
> barriers on successful wakeup.

The barriers are removed - they were an early experiment.

> 
> Please write some explanations, else reviewing it is too complex.

These patches follow the work step by step as it progressed and are still an 
RFC. So to make things easier, it is best to check the final result -
some of the issues are resolved there, and some patches are to be 
merged, but I give priority to bugs/design issues first.


-- 
Regards,
Alexander Atanasov


