[CRIU] [PATCH 18/18] SCM: Dump and restore SCM_RIGHTs

Pavel Emelyanov xemul at virtuozzo.com
Thu Jul 13 14:41:19 MSK 2017


On 07/13/2017 01:41 PM, Kirill Tkhai wrote:
> On 13.07.2017 12:48, Pavel Emelyanov wrote:
>> On 07/12/2017 05:42 PM, Kirill Tkhai wrote:
>>> On 12.07.2017 17:19, Pavel Emelyanov wrote:
>>>>
>>>>>>> If so, why don't we analyze the task's
>>>>>>> descriptors: if it already has the one we are adding, use it?
>>>>>>
>>>>>> We can. In that case we'll need to add a flag on the fle, saying that
>>>>>> this fle can be closed by the scm-sending code. I thought about it, but
>>>>>> wanted to keep the first version simpler.
>>>>>
>>>>> Hm. I skipped this on first look, but you introduce a duplicate of
>>>>> generic code in close_scm_fds(). Why can't a new scm be added as
>>>>> a fake fle? If so, the generic code will close it like any other entry
>>>>> of that sort.
>>>>>
>>>>> So the algorithm may be much simpler. If the task has a fle, use it.
>>>>> If it has none, add a fle with the fake flag, and the generic code will
>>>>> close it later.
>>>>
>>>> Do we have this close-if-flag-is-set code already?
>>>
>>> fdinfo_list_entry::fake
>>>
>>>>> I don't think it's good to duplicate generic code.
>>>>> I see your comment below vvvv, but it does not seem to imply
>>>>> deleting close_scm_fds() later.
>>>>
>>>> It does. The flag you're talking about above is to be added (unless it's there already).
>>>
>>> There is already a generic one ^^^
>>
>> OK, will take a look.
>>
>>>>>>>>> I suppose the above introduces some circular dependencies for cases
>>>>>>>>> with several sockets and an epoll that is listening to them.
>>>>>>>>
>>>>>>>> Nope :) On dump I explicitly ban unix sockets and epolls sent via unix
>>>>>>>> sockets. So we can only wait here for some other files, which cannot wait for us.
>>>>>>>>
>>>>>>>>>> +
>>>>>>>>>> +		/* Optimization for the next pass */
>>>>>>>>>> +		list_del(&sf->l);
>>>>>>>>>> +		xfree(sf);
>>>>>>>>>> +	}
>>>>>>>>>> +
>>>>>>>>>> +	return 0;
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>>  static int wake_connected_sockets(struct unix_sk_info *ui)
>>>>>>>>>>  {
>>>>>>>>>>  	struct fdinfo_list_entry *fle;
>>>>>>>>>> @@ -1322,12 +1438,18 @@ static int open_unix_sk(struct file_desc *d, int *new_fd)
>>>>>>>>>>  	struct unix_sk_info *ui;
>>>>>>>>>>  	int ret;
>>>>>>>>>>  
>>>>>>>>>> +	ui = container_of(d, struct unix_sk_info, d);
>>>>>>>>>> +
>>>>>>>>>> +	/* FIXME -- only queue restore may be postponed */
>>>>>>>>>> +	if (chk_restored_scms(ui)) {
>>>>>>>>>> +		pr_info("scm: Wait for tgt to restore\n");
>>>>>>>>>> +		return 1;
>>>>>>>>>> +	}
>>>>>>>>>
>>>>>>>>> It's too strong a limitation; we delay other tasks here.
>>>>>>>>
>>>>>>>> Other tasks? No, we just ask the files restore loop to go and try to open
>>>>>>>> some other files.
>>>>>>>
>>>>>>> But other tasks may be waiting for the file descriptors you're serving out.
>>>>>>> These tasks will be delayed.
>>>>>>
>>>>>> Mm... Yes, they will. Isn't that the cost of synchronization anyway? The FIXME
>>>>>> comment above is exactly about this -- complicating the code for the sake
>>>>>> of (hopefully) performance.
>>>>>
>>>>> I don't think this will complicate the code. We moved to the asynchronous file
>>>>> restore engine precisely to avoid "big criu locks".
>>>>
>>>> There are no locks here with this if; it just reorders the sequence of fle->open
>>>> calls.
>>>
>>> It's a dependency, and it influences the open order.
>>
>> Of course. This is why we introduced the ability to call ->open again and again -- to
>> re-order openings the way we need. And I still don't understand what the problem is with
>> the fact that I delay the opening of a socket. "This changes the order" and "delays others" are
>> not real problems; the biggest cost they introduce is a couple of additional CPU
>> ticks needed to call unix's open for the 2nd time.
> 
> My idea is just to restore as much in parallel as possible.

I generally agree with that, but scms are not a case we should break our backs over.

> When you don't send the master fle to other tasks, these tasks may reach the stage,

Come on, this is a pure race. Whether or not the peer task goes to sleep because
we've skipped unix open this time is a question of hitting or missing the time
frame it takes to call unix_open by pointer and check that the return code
is 1. How many CPU ticks (not even sched ticks) is that?!

> when the fle is the only thing they need at the current moment of restore. And
> they will just wait for you.
> 
> Also, you will need to serve the master fle out as soon as possible, when you
> implement scm with unix fds.

When we implement scm with unix fds, just serving the master out ASAP won't help; we'll
need to reshuffle much more code to resolve circular and mutual scms.

>>>>> Now all dependencies
>>>>> are exactly as strong as they need to be, and no more. It doesn't seem good
>>>>> to move away from this path.
>>>>>  
>>>>>>>>> We may open the socket and serve it out, and we do
>>>>>>>>> not need to wait for queuers to do these actions.
>>>>>>>>
>>>>>>>> Not always. There are several cases when we do socketpair(), then restore
>>>>>>>> a queue, then close one of the ends immediately.
>>>>>>>
>>>>>>> Even if the second end is closed, other tasks may want this
>>>>>>> slave fle, and their eventpolls may depend on it.
>>>>>>> I think we should wait only at the moments when we must wait, and not
>>>>>>> in other cases.
>>>>>>
>>>>>> This is valid, but "must" is to be replaced with "should" :)
>>>>>> Is anything broken with the current approach? Or is it just suboptimal?
>>>>>
>>>>> Yes, this is suboptimal, and I think it's really easy to avoid.
>>>>
>>>> Actually it's not. As I said earlier, there are cases when you cannot restore
>>>> a socket without its queue and restore the queue later, so for these cases you
>>>> need to skip the whole restoration. For the cases when you can restore a socket
>>>> and postpone the queue restore, you may do the first step and then return 1,
>>>> requesting yet another ->open call. With this, only the unix ->open logic
>>>> becomes noticeably more complex.
>>>
>>> Could you give an example of a case when a socket needs to wait until other sockets
>>> are restored before it can call the socket() syscall?
>>
>> With SCMs -- easily. open_unix_sk_standalone(), branch (ui->ue->state == TCP_ESTABLISHED) && !ui->ue->peer
>> or ui->ue->type == SOCK_DGRAM && !ui->queuer. Both call socketpair(), restore the queue at once, then close
>> one of the ends. If I have an SCM_RIGHTS message in a queue, I need to wait for e.g. an inet socket to be opened
>> before I can call socketpair().
> 
> Then only this type of socket should wait till scm dependencies are restored
> before it creates a socketpair.

Yes, but what's the point? This won't help us resolve the generic unix-sent-via-unix case anyway.
The change you're asking for doubles the checks for whether or not to call socket() or
socketpair(), for the sake of spectral parallelism. I can agree with this change, but not as
part of this set; it's large enough already.

> And the above holds only until unix fds in scm are implemented. When
> they are, this type of socket will also fit the generic model, but we will need
> a more complex solution for the second, closed end. I think we will need to
> open a socketpair, serve the first end out, and then pin the second end open as a fake
> file.
> 
>>>>> It would be difficult to rework this later if we find the limitations
>>>>> are so strong that they don't allow restoring some types of files.
>>>>
>>>> While I do agree with this in general, I see no point in splitting the unix sk
>>>> restore into open, postpone, restore the queue (optionally), postpone again, do
>>>> post_open. This doesn't seem to solve any real problem, makes something (what?)
>>>> better for a really rare case, and complicates the ->open logic for unix sockets.
>>>
>>> There is no need to introduce multiple cases. There is a generic rule:
>>> queues with scms or msg names are restored by the socket itself, and the socket
>>> uses the task's (maybe ghost) fles to do that.
>>
>> No, queues are not restored by the socket itself; they are restored by the socket's
>> queuer, that is, some other socket.

-- Pavel

