[CRIU] [PATCH 18/18] SCM: Dump and restore SCM_RIGHTs
Pavel Emelyanov
xemul at virtuozzo.com
Thu Jul 13 12:48:46 MSK 2017
On 07/12/2017 05:42 PM, Kirill Tkhai wrote:
> On 12.07.2017 17:19, Pavel Emelyanov wrote:
>>
>>>>> If so, why don't we try to analyze the task's
>>>>> descriptors to see if it already has the one we add?
>>>>
>>>> We can. In that case we'll need to add a flag on the fle, saying that
>>>> this fle can be closed by scm-sending code. I thought about it, but
>>>> wanted to make the first version simpler.
>>>
>>> Hm, I skipped this on the first look, but you introduce a duplicate of
>>> generic code in close_scm_fds(). Why can't a new scm be added as
>>> a fake fle? If so, the generic code will close it like any other entry
>>> of that sort.
>>>
>>> So the algorithm may be much simpler: if the task has a fle, use it.
>>> If it has none, add a fle with the fake flag, and the generic code will
>>> close it later.
>>
>> Do we have this close-if-flag-is-set code already?
>
> fdinfo_list_entry::fake
>
>>> I don't think it's good to duplicate generic code.
>>> I see your comment below vvvv, but it does not seem to imply that
>>> close_scm_fds() will be deleted later.
>>
>> It does. The flag you're talking about above is to be added (unless it's there already).
>
> There is already a generic ^^^
OK, will take a look.
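For reference, a rough sketch of the fake-fle idea being discussed, under the assumption
that suitable helpers exist: find_task_fle() and add_fake_fle() below are made-up names,
only the fdinfo_list_entry::fake flag mentioned above is an existing notion. This is not
the patch code, just an illustration of "reuse the task's fle if present, otherwise add a
fake one and let the generic code close it":

/*
 * Rough sketch only, not the patch code. find_task_fle() and
 * add_fake_fle() are hypothetical helpers.
 */
static struct fdinfo_list_entry *get_scm_fle(struct pstree_item *task,
					     struct file_desc *fd)
{
	struct fdinfo_list_entry *fle;

	/* If the task already holds this file, just reuse its fle. */
	fle = find_task_fle(task, fd);		/* hypothetical lookup */
	if (fle)
		return fle;

	/*
	 * Otherwise register a fake fle; the generic restore code closes
	 * fake fles on its own, so no scm-specific close_scm_fds() pass
	 * would be needed.
	 */
	return add_fake_fle(task, fd);		/* hypothetical, sets ->fake */
}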
>>>>>>> I suppose the above introduces some circular dependencies for cases
>>>>>>> with several sockets and an epoll that is listening to them.
>>>>>>
>>>>>> Nope :) On dump I explicitly ban unix sockets and epolls sent via unix
>>>>>> sockets. So we can only wait here for some other files, which cannot wait for us.
>>>>>>
>>>>>>>> +
>>>>>>>> + /* Optimization for the next pass */
>>>>>>>> + list_del(&sf->l);
>>>>>>>> + xfree(sf);
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + return 0;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> static int wake_connected_sockets(struct unix_sk_info *ui)
>>>>>>>> {
>>>>>>>> struct fdinfo_list_entry *fle;
>>>>>>>> @@ -1322,12 +1438,18 @@ static int open_unix_sk(struct file_desc *d, int *new_fd)
>>>>>>>> struct unix_sk_info *ui;
>>>>>>>> int ret;
>>>>>>>>
>>>>>>>> + ui = container_of(d, struct unix_sk_info, d);
>>>>>>>> +
>>>>>>>> + /* FIXME -- only queue restore may be postponed */
>>>>>>>> + if (chk_restored_scms(ui)) {
>>>>>>>> + pr_info("scm: Wait for tgt to restore\n");
>>>>>>>> + return 1;
>>>>>>>> + }
>>>>>>>
>>>>>>> It's too strong a limitation, we delay other tasks here.
>>>>>>
>>>>>> Other tasks? No, we just ask the files restore loop to go and try to open
>>>>>> some other files.
>>>>>
>>>>> But other tasks may be waiting for file descriptors you're serving out.
>>>>> These tasks will be delayed.
>>>>
>>>> Mm... Yes, they will. Isn't that the cost of synchronization anyway? The FIXME
>>>> comment above is exactly about that -- complicating the code for the sake
>>>> of (hopefully) performance.
>>>
>>> I don't think this will complicate the code. We moved to an asynchronous file
>>> restore engine precisely so as not to have "big criu locks".
>>
>> There are no locks here with this if, it just reorders the sequence of fle->open
>> calls.
>
> It's a dependency, and it influences the open order.
Of course. This is why we introduced the ability to call ->open again and again -- to
re-order openings the way we need. And I still don't understand what the problem is with
the fact that I delay opening of a socket. "This changes the order" and "delays others" are
not real problems; the biggest problem they introduce is a couple of additional CPU
ticks needed to call unix's open for the 2nd time.
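To make that mechanism concrete, a minimal sketch of the postpone-and-retry pattern
(scm_files_ready() and do_open_unix_sk() are hypothetical stand-ins, not the patch's
functions; the "return 1 means call this ->open again on a later pass" convention is the
files restore loop's real signal):

/* Sketch of an ->open callback that postpones itself. */
static int open_unix_sk_sketch(struct file_desc *d, int *new_fd)
{
	struct unix_sk_info *ui = container_of(d, struct unix_sk_info, d);

	if (!scm_files_ready(ui)) {	/* hypothetical readiness check */
		pr_info("scm: referenced files not restored yet, postpone\n");
		return 1;	/* restore loop will retry this socket later */
	}

	/* ... open the socket and restore its queue as usual ... */
	*new_fd = do_open_unix_sk(ui);	/* hypothetical helper */
	return *new_fd < 0 ? -1 : 0;
}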
>>> Now all dependencies
>>> are as strong as they need to be, and no more. It does not seem good
>>> to move away from this path.
>>>
>>>>>>> We may open the socket and serve it out, and we do
>>>>>>> not need to wait for queuers for these actions.
>>>>>>
>>>>>> Not always. There are several cases when we do socketpair(), then restore
>>>>>> a queue, then close one of the ends immediately.
>>>>>
>>>>> Even if the second end is closed, other tasks may want this
>>>>> slave fle, and their eventpolls may depend on it.
>>>>> I think we should wait only at the moments when we must wait, and
>>>>> not do that in other cases.
>>>>
>>>> This is valid, but "must" is to be replaced with "should" :)
>>>> Is anything broken with the current approach? Or is it just suboptimal?
>>>
>>> Yes, this is suboptimal, and it's really easy to avoid, I think.
>>
>> Actually it's not. As I said earlier, there are cases when you cannot restore
>> a socket without its queue and restore the queue later, so for these cases you
>> need to skip the whole restoration. For the cases when you can restore a socket
>> and postpone the queue restore, you may do the first step and then return 1
>> requesting yet another ->open call. With this, the unix ->open logic alone
>> becomes noticeably more complex.
>
> Could you give an example of a case when a socket needs to wait until other sockets
> are restored before it can call the socket() syscall?
With SCMs -- easily. open_unix_sk_standalone(), branch (ui->ue->state == TCP_ESTABLISHED) && !ui->ue->peer
or ui->ue->type == SOCK_DGRAM && !ui->queuer. Both call socketpair(), restore the queue at once, then close
one of the ends. If I have an SCM_RIGHTS message in the queue, I need to wait for, e.g., an inet socket to be
opened before I can call socketpair().
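A simplified sketch of that pattern, assuming a hypothetical restore_sk_queue_with_scms()
helper (the real code lives in open_unix_sk_standalone()): the queue is written through the
peer end before that end is closed, so any fd referenced by an SCM_RIGHTS message in it
must already be open, which is why the whole ->open call is postponed.

/* Simplified sketch of the "socketpair, fill queue, close one end" case. */
static int restore_dead_end_pair(struct unix_sk_info *ui, int *new_fd)
{
	int sk[2];

	if (socketpair(PF_UNIX, ui->ue->type, 0, sk) < 0)
		return -1;

	/* The queue (including SCM_RIGHTS messages) goes in via the peer end... */
	if (restore_sk_queue_with_scms(sk[1], ui) < 0)	/* hypothetical */
		return -1;

	/* ...which is then closed immediately. */
	close(sk[1]);

	*new_fd = sk[0];
	return 0;
}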
>>> It would be difficult to rework that later if we found the limitations
>>> so strong that they don't allow restoring some type of files.
>>
>> While I do agree with this in general, I see no point in splitting the unix sk
>> restore into open, postpone, restore the queue (optionally), postpone again, do
>> post_open. This doesn't seem to solve any real problem, makes something (what?)
>> better for a really rare case, and complicates the ->open logic for unix sockets.
>
> There's no need to introduce multiple cases. There is a generic rule:
> queues with scms or msg names are restored by the socket itself, and the socket
> uses the task's (maybe ghost) fles to do that.
No, queues are not restored by the socket itself, they are restored by the socket's queuer, which
is some other socket.
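For reference, what a queuer ultimately does to place an fd-carrying message into its
peer's receive queue is a plain sendmsg() with an SCM_RIGHTS control message; a minimal
standalone illustration (not the CRIU code, helper name made up):

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send one data buffer plus one file descriptor over a unix socket. */
static int send_fd(int via_sk, const void *data, size_t len, int fd)
{
	struct iovec iov = { .iov_base = (void *)data, .iov_len = len };
	union {	/* union guarantees alignment for the cmsg header */
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = u.buf,
		.msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;	/* the descriptor travels in the cmsg */
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(via_sk, &msg, 0) < 0 ? -1 : 0;
}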
-- Pavel