[Devel] [PATCH vz9] dm-ploop: fix and rework md updates

Wed Feb 12 14:19:10 MSK 2025

On 12.02.25 13:12, Pavel Tikhomirov wrote:
> 
> 
> On 2/12/25 17:42, Alexander Atanasov wrote:
>> On 12.02.25 11:02, Pavel Tikhomirov wrote:
>>>
>>>
>>> On 2/12/25 14:38, Alexander Atanasov wrote:
>>>> On 12.02.25 7:54, Pavel Tikhomirov wrote:
>>>>>
>>>>>
>>>>> On 2/11/25 22:25, Alexander Atanasov wrote:
>>>>>> With the locking reduced it opened windows for races between
>>>>>> running updates, pending and new updates. Current logic to
>>>>>> deal with them is not correct.
>>>>>
>>>>> Examples of exact race situation in code would be appreciated, as 
>>>>> now it's impossible to unserstand which exact race we are trying to 
>>>>> fix.
>>>>>
>>>>> E.g. preferably in this format:
>>>>>
>>>>> Thread 1:
>>>>> A1
>>>>> B1
>>>>>                  Thread2:
>>>>>                  A2
>>>>>                  B2
>>>>> C1
>>>>>                  C2
>>>>
>>>>
>>>>
>>>> Too many to describe them that way - we have three actors.
>>>> Locking and logic is wrong.
>>>
>>> Sorry, but it is not obvious what was wrong with it in the first 
>>> place. To review and verify the fix, I need to have full information 
>>> about what the problem was (it's unclear from JIRA too, unless I 
>>> reinvestigate all the crashes there).
>>>
>>>> Should i go over the code and explain it?
>>>
>>> Can you please explain general lock logic a bit and maybe with couple 
>>> of references to code. Because for me "races between running updates, 
>>> pending and new updates" brings more questions when answers: What is 
>>> "running" updates? What is "pending" and "new" updates? What is 
>>> "updates"?
>>
>>
>> Ok let me try - an md page can be in three states / which comments are 
>> updated in .h file/
>>   - MD_DIRTY - page awaits writeback - needs to be updated on disk
>>   - MD_WRITEBACK - page update is being  processed - write out started 
>> and is in progress
>>
>> there is only one piwb - in md_page struct.
>> Let's walk over ploop_locate_new_cluster_and_attach_pio as it is the 
>> base case, others are derivatives.
>>
>>
>> First under lock spin_lock_irqsave(&ploop->bat_lock, flags);
>> The there is :
>> if (piwb && (piwb->type != type || test_bit(MD_WRITEBACK, &md->status)))
>>
>> It uses piwb to check for dirty - if we have piwb we delay.
>> But the problem is in the case MD_WRITEBACK , i.e. it is not started
>> page is waiting for it to start.
>>
>> Next check the type ALLOC vs DISCARD - we have only one piwb.
>> If it is the same it does not delay. And the currently waiting piwb
>> can be executed before it is completely updated.
>> For example in cow case a pio is added to cow_llist  -> the update can 
>> go and complete before it is added, so it won't be executed.
>>
>> RACE - bat_write_complete clears  MD_WRITEBACK and sets piwb to NULL
>> early before the update is completed. Delayes pios can cause new updates
>> while the initial one is not completed.
>>
>>
>> Next it marks page dirty - but along that it is adding it to the 
>> writeback list - RACE - thread can see it before update is ready.
>> Next lock is unlocked.
>>
>> To summarize - the race is between update preparation and thread
>> executing the updates - in several forms.
>>
>> Request thread                            Execution Thread
>> Start to prepare update
>> take lock
>> mark for update and put into the list
>> release lock
>>                                            Executes unfinished update
>>                                            clears flags and NULLs piwb 
>> -  > crash somewhere along the path
>>
>> do more work to finish preparation
> 
> Yes, this explanation is very helpful, now with new MD_UPDATING bit we 
> are protected against concurrent "updates" which previously could've 
> missed e.g. the pio ploop_submit_cow_index_wb adds to the cow_list.
> 
> And that means that we need to add memory barriers between 
> clear_bit(MD_UPDATING) and previous code we want to protect (e.g. adding 
> pio to cow_llist in ploop_process_one_delta_cow) else we would 
> definitely still have races and crashes due to compiler/cpu reordering. 
> Probably easiest way to do so is to put bat_lock around all 
> clear_bit(MD_UPDATING).
> 

only the set part is important, even if clear is seen later it is okay.
it will be processed on the next cycle

-- 
Regards,
Alexander Atanasov