[Devel] [PATCH vz9] dm-ploop: fix and rework md updates
Pavel Tikhomirov
ptikhomirov at virtuozzo.com
Wed Feb 12 14:12:13 MSK 2025
On 2/12/25 17:42, Alexander Atanasov wrote:
> On 12.02.25 11:02, Pavel Tikhomirov wrote:
>>
>>
>> On 2/12/25 14:38, Alexander Atanasov wrote:
>>> On 12.02.25 7:54, Pavel Tikhomirov wrote:
>>>>
>>>>
>>>> On 2/11/25 22:25, Alexander Atanasov wrote:
>>>>> With the locking reduced it opened windows for races between
>>>>> running updates, pending and new updates. Current logic to
>>>>> deal with them is not correct.
>>>>
>>>> Examples of exact race situation in code would be appreciated, as
>>>> now it's impossible to unserstand which exact race we are trying to
>>>> fix.
>>>>
>>>> E.g. preferably in this format:
>>>>
>>>> Thread 1:
>>>> A1
>>>> B1
>>>> Thread2:
>>>> A2
>>>> B2
>>>> C1
>>>> C2
>>>
>>>
>>>
>>> Too many to describe them that way - we have three actors.
>>> Locking and logic is wrong.
>>
>> Sorry, but it is not obvious what was wrong with it in the first
>> place. To review and verify the fix, I need to have full information
>> about what the problem was (it's unclear from JIRA too, unless I
>> reinvestigate all the crashes there).
>>
>>> Should i go over the code and explain it?
>>
>> Can you please explain general lock logic a bit and maybe with couple
>> of references to code. Because for me "races between running updates,
>> pending and new updates" brings more questions when answers: What is
>> "running" updates? What is "pending" and "new" updates? What is
>> "updates"?
>
>
> Ok let me try - an md page can be in three states / which comments are
> updated in .h file/
> - MD_DIRTY - page awaits writeback - needs to be updated on disk
> - MD_WRITEBACK - page update is being processed - write out started
> and is in progress
>
> there is only one piwb - in md_page struct.
> Let's walk over ploop_locate_new_cluster_and_attach_pio as it is the
> base case, others are derivatives.
>
>
> First under lock spin_lock_irqsave(&ploop->bat_lock, flags);
> The there is :
> if (piwb && (piwb->type != type || test_bit(MD_WRITEBACK, &md->status)))
>
> It uses piwb to check for dirty - if we have piwb we delay.
> But the problem is in the case MD_WRITEBACK , i.e. it is not started
> page is waiting for it to start.
>
> Next check the type ALLOC vs DISCARD - we have only one piwb.
> If it is the same it does not delay. And the currently waiting piwb
> can be executed before it is completely updated.
> For example in cow case a pio is added to cow_llist -> the update can
> go and complete before it is added, so it won't be executed.
>
> RACE - bat_write_complete clears MD_WRITEBACK and sets piwb to NULL
> early before the update is completed. Delayes pios can cause new updates
> while the initial one is not completed.
>
>
> Next it marks page dirty - but along that it is adding it to the
> writeback list - RACE - thread can see it before update is ready.
> Next lock is unlocked.
>
> To summarize - the race is between update preparation and thread
> executing the updates - in several forms.
>
> Request thread Execution Thread
> Start to prepare update
> take lock
> mark for update and put into the list
> release lock
> Executes unfinished update
> clears flags and NULLs piwb -
> > crash somewhere along the path
>
> do more work to finish preparation
Yes, this explanation is very helpful, now with new MD_UPDATING bit we
are protected against concurrent "updates" which previously could've
missed e.g. the pio ploop_submit_cow_index_wb adds to the cow_list.
And that means that we need to add memory barriers between
clear_bit(MD_UPDATING) and previous code we want to protect (e.g. adding
pio to cow_llist in ploop_process_one_delta_cow) else we would
definitely still have races and crashes due to compiler/cpu reordering.
Probably easiest way to do so is to put bat_lock around all
clear_bit(MD_UPDATING).
--
Best regards, Tikhomirov Pavel
Senior Software Developer, Virtuozzo.
More information about the Devel
mailing list