[Devel] [PATCH vz9] dm-ploop: fix and rework md updates

Pavel Tikhomirov ptikhomirov at virtuozzo.com
Wed Feb 12 14:12:13 MSK 2025



On 2/12/25 17:42, Alexander Atanasov wrote:
> On 12.02.25 11:02, Pavel Tikhomirov wrote:
>>
>>
>> On 2/12/25 14:38, Alexander Atanasov wrote:
>>> On 12.02.25 7:54, Pavel Tikhomirov wrote:
>>>>
>>>>
>>>> On 2/11/25 22:25, Alexander Atanasov wrote:
>>>>> With the locking reduced it opened windows for races between
>>>>> running updates, pending and new updates. Current logic to
>>>>> deal with them is not correct.
>>>>
>>>> Examples of exact race situation in code would be appreciated, as 
>>>> now it's impossible to unserstand which exact race we are trying to 
>>>> fix.
>>>>
>>>> E.g. preferably in this format:
>>>>
>>>> Thread 1:
>>>> A1
>>>> B1
>>>>                  Thread2:
>>>>                  A2
>>>>                  B2
>>>> C1
>>>>                  C2
>>>
>>>
>>>
>>> Too many to describe them that way - we have three actors.
>>> Locking and logic is wrong.
>>
>> Sorry, but it is not obvious what was wrong with it in the first 
>> place. To review and verify the fix, I need to have full information 
>> about what the problem was (it's unclear from JIRA too, unless I 
>> reinvestigate all the crashes there).
>>
>>> Should i go over the code and explain it?
>>
>> Can you please explain general lock logic a bit and maybe with couple 
>> of references to code. Because for me "races between running updates, 
>> pending and new updates" brings more questions when answers: What is 
>> "running" updates? What is "pending" and "new" updates? What is 
>> "updates"?
> 
> 
> Ok let me try - an md page can be in three states / which comments are 
> updated in .h file/
>   - MD_DIRTY - page awaits writeback - needs to be updated on disk
>   - MD_WRITEBACK - page update is being  processed - write out started 
> and is in progress
> 
> there is only one piwb - in md_page struct.
> Let's walk over ploop_locate_new_cluster_and_attach_pio as it is the 
> base case, others are derivatives.
> 
> 
> First under lock spin_lock_irqsave(&ploop->bat_lock, flags);
> The there is :
> if (piwb && (piwb->type != type || test_bit(MD_WRITEBACK, &md->status)))
> 
> It uses piwb to check for dirty - if we have piwb we delay.
> But the problem is in the case MD_WRITEBACK , i.e. it is not started
> page is waiting for it to start.
> 
> Next check the type ALLOC vs DISCARD - we have only one piwb.
> If it is the same it does not delay. And the currently waiting piwb
> can be executed before it is completely updated.
> For example in cow case a pio is added to cow_llist  -> the update can 
> go and complete before it is added, so it won't be executed.
> 
> RACE - bat_write_complete clears  MD_WRITEBACK and sets piwb to NULL
> early before the update is completed. Delayes pios can cause new updates
> while the initial one is not completed.
> 
> 
> Next it marks page dirty - but along that it is adding it to the 
> writeback list - RACE - thread can see it before update is ready.
> Next lock is unlocked.
> 
> To summarize - the race is between update preparation and thread
> executing the updates - in several forms.
> 
> Request thread                            Execution Thread
> Start to prepare update
> take lock
> mark for update and put into the list
> release lock
>                                            Executes unfinished update
>                                            clears flags and NULLs piwb - 
>  > crash somewhere along the path
> 
> do more work to finish preparation

Yes, this explanation is very helpful, now with new MD_UPDATING bit we 
are protected against concurrent "updates" which previously could've 
missed e.g. the pio ploop_submit_cow_index_wb adds to the cow_list.

And that means that we need to add memory barriers between 
clear_bit(MD_UPDATING) and previous code we want to protect (e.g. adding 
pio to cow_llist in ploop_process_one_delta_cow) else we would 
definitely still have races and crashes due to compiler/cpu reordering. 
Probably easiest way to do so is to put bat_lock around all 
clear_bit(MD_UPDATING).

-- 
Best regards, Tikhomirov Pavel
Senior Software Developer, Virtuozzo.



More information about the Devel mailing list