[Devel] [PATCH vz9] dm-ploop: fix and rework md updates
Alexander Atanasov
alexander.atanasov at virtuozzo.com
Wed Feb 12 14:19:10 MSK 2025
On 12.02.25 13:12, Pavel Tikhomirov wrote:
>
>
> On 2/12/25 17:42, Alexander Atanasov wrote:
>> On 12.02.25 11:02, Pavel Tikhomirov wrote:
>>>
>>>
>>> On 2/12/25 14:38, Alexander Atanasov wrote:
>>>> On 12.02.25 7:54, Pavel Tikhomirov wrote:
>>>>>
>>>>>
>>>>> On 2/11/25 22:25, Alexander Atanasov wrote:
>>>>>> With the locking reduced it opened windows for races between
>>>>>> running updates, pending and new updates. Current logic to
>>>>>> deal with them is not correct.
>>>>>
>>>>> Examples of exact race situation in code would be appreciated, as
>>>>> now it's impossible to unserstand which exact race we are trying to
>>>>> fix.
>>>>>
>>>>> E.g. preferably in this format:
>>>>>
>>>>> Thread 1:
>>>>> A1
>>>>> B1
>>>>> Thread2:
>>>>> A2
>>>>> B2
>>>>> C1
>>>>> C2
>>>>
>>>>
>>>>
>>>> Too many to describe them that way - we have three actors.
>>>> Locking and logic is wrong.
>>>
>>> Sorry, but it is not obvious what was wrong with it in the first
>>> place. To review and verify the fix, I need to have full information
>>> about what the problem was (it's unclear from JIRA too, unless I
>>> reinvestigate all the crashes there).
>>>
>>>> Should i go over the code and explain it?
>>>
>>> Can you please explain general lock logic a bit and maybe with couple
>>> of references to code. Because for me "races between running updates,
>>> pending and new updates" brings more questions when answers: What is
>>> "running" updates? What is "pending" and "new" updates? What is
>>> "updates"?
>>
>>
>> Ok let me try - an md page can be in three states / which comments are
>> updated in .h file/
>> - MD_DIRTY - page awaits writeback - needs to be updated on disk
>> - MD_WRITEBACK - page update is being processed - write out started
>> and is in progress
>>
>> there is only one piwb - in md_page struct.
>> Let's walk over ploop_locate_new_cluster_and_attach_pio as it is the
>> base case, others are derivatives.
>>
>>
>> First under lock spin_lock_irqsave(&ploop->bat_lock, flags);
>> The there is :
>> if (piwb && (piwb->type != type || test_bit(MD_WRITEBACK, &md->status)))
>>
>> It uses piwb to check for dirty - if we have piwb we delay.
>> But the problem is in the case MD_WRITEBACK , i.e. it is not started
>> page is waiting for it to start.
>>
>> Next check the type ALLOC vs DISCARD - we have only one piwb.
>> If it is the same it does not delay. And the currently waiting piwb
>> can be executed before it is completely updated.
>> For example in cow case a pio is added to cow_llist -> the update can
>> go and complete before it is added, so it won't be executed.
>>
>> RACE - bat_write_complete clears MD_WRITEBACK and sets piwb to NULL
>> early before the update is completed. Delayes pios can cause new updates
>> while the initial one is not completed.
>>
>>
>> Next it marks page dirty - but along that it is adding it to the
>> writeback list - RACE - thread can see it before update is ready.
>> Next lock is unlocked.
>>
>> To summarize - the race is between update preparation and thread
>> executing the updates - in several forms.
>>
>> Request thread Execution Thread
>> Start to prepare update
>> take lock
>> mark for update and put into the list
>> release lock
>> Executes unfinished update
>> clears flags and NULLs piwb
>> - > crash somewhere along the path
>>
>> do more work to finish preparation
>
> Yes, this explanation is very helpful, now with new MD_UPDATING bit we
> are protected against concurrent "updates" which previously could've
> missed e.g. the pio ploop_submit_cow_index_wb adds to the cow_list.
>
> And that means that we need to add memory barriers between
> clear_bit(MD_UPDATING) and previous code we want to protect (e.g. adding
> pio to cow_llist in ploop_process_one_delta_cow) else we would
> definitely still have races and crashes due to compiler/cpu reordering.
> Probably easiest way to do so is to put bat_lock around all
> clear_bit(MD_UPDATING).
>
only the set part is important, even if clear is seen later it is okay.
it will be processed on the next cycle
--
Regards,
Alexander Atanasov
More information about the Devel
mailing list