[CRIU] [PATCH 18/18] SCM: Dump and restore SCM_RIGHTs
Kirill Tkhai
ktkhai at virtuozzo.com
Wed Jul 12 17:42:51 MSK 2017
On 12.07.2017 17:19, Pavel Emelyanov wrote:
>
>>>> If so, why don't we try to analyze the task's
>>>> descriptors to see whether it already has the one we add?
>>>
>>> We can. In that case we'll need to add a flag on the fle, saying that
>>> this fle can be closed by scm-sending code. I thought about it, but
>>> wanted to make the first version simpler.
>>
>> Hm. I skipped this on the first look, but you introduce a duplicate of
>> the generic code in close_scm_fds(). Why can't a new scm be added as
>> a fake fle? Then the generic code would close it like any other file of
>> that sort.
>>
>> So, the algorithm may be much simpler. If the task has a fle, use it.
>> If it has none, add a fle with the fake flag set, and the generic code
>> will close it later.
>
> Do we have this close-if-flag-is-set code already?
Yes, fdinfo_list_entry::fake.
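Roughly, I mean something like this (just a sketch: find_fle_by_file() is a
made-up lookup helper, and I assume the last argument of collect_fd_to() is
the fake flag):

	/*
	 * Reuse the owner's descriptor for this file if it already has
	 * one; otherwise add a fake fle, and the generic code will close
	 * it after the files are restored.
	 */
	fle = find_fle_by_file(owner, tgt);	/* hypothetical lookup */
	if (!fle) {
		fle = collect_fd_to(vpid(owner), e, rsti(owner), tgt, true);
		if (!fle) {
			pr_err("Can't request new fle for scm\n");
			return -1;
		}
	}
	sfle->fle = fle;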
>> I don't think it's good to duplicate generic code.
>> I see your comment below vvvv, but it does not seem to imply that
>> close_scm_fds() will be deleted later.
>
> It does. The flag you're talking about above is to be added (unless it's there already).
There already is a generic one ^^^
> Another thing about it -- consider you've added this flag to the generic code.
> Calling close on the descriptor should happen strictly after the respective fle
> has been sent over the unix socket with scm. How should the _generic_ code make
> sure it has happened?
>
>>>>>>> + }
>>>>>>> +
>>>>>>> + pr_info("scm: add %d -> %d.fd[%d]\n", tgt->id, vpid(owner), fd);
>>>>>>> + sfle = xmalloc(sizeof(*sfle));
>>>>>>> + if (!sfle)
>>>>>>> + return -1;
>>>>>>> +
>>>>>>> + sfle->fle = collect_fd_to(vpid(owner), e, rsti(owner), tgt, false);
>>>>>>> + if (!sfle->fle) {
>>>>>>> + pr_err("Can't request new fle for scm\n");
>>>>>>> + return -1;
>>>>>>> + }
>>>>>>> +
>>>>>>> + list_add_tail(&sfle->l, &ui->scm_fles);
>>>>>>> + fds[i] = fd;
>>>>>>> + }
>>>>>>> +
>>>>>>> + return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static int chk_restored_scms(struct unix_sk_info *ui)
>>>>>>> +{
>>>>>>> + struct scm_fle *sf, *n;
>>>>>>> +
>>>>>>> + list_for_each_entry_safe(sf, n, &ui->scm_fles, l) {
>>>>>>> + if (sf->fle->stage != FLE_RESTORED)
>>>>>>> + return 1;
>>>>>>
>>>>>> We need to wait for sf->fle->bound here instead.
>>>>>
>>>>> Why?
>>>>
>>>> Then, what are scms? If these are the files contained in the
>>>> queued messages, we just need to wait till they are open, don't we?
>>>
>>> Here we wait for a file descriptor to appear in the task's fdtable, so that
>>> we could reference one. Do you want to say that waiting for FLE_OPEN is
>>> enough? I need to make sure that fle->fd does reference the correct file,
>>> in other words, that the file has been put into the proper descriptor already.
>>> Is that so? If yes, I will fix this place, but what's the difference between
>>> FLE_OPEN and FLE_RESTORED then?
>>
>> FLE_OPEN means "the file is just open, but its type-specific part is not configured yet".
>> It means the file is right after the open() syscall or socket() syscall etc.
>> It seems to be enough, if unix sockets do not transfer file attributes
>> in addition to the flags we set in setup_and_serve_out::fcntl(fle->flags).
>> If so, we should extend setup_and_serve_out() in this case, but I don't
>> think such attributes exist in Linux: the whole generic part is done
>> in setup_and_serve_out(), and if they existed, this function would contain
>> them.
>
> OK, fair enough. Please send the patch with comments on the constants in
> question. I'll update this piece in this set.
OK
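So the check in chk_restored_scms() would become something like this (a
sketch, assuming the stages are ordered so that FLE_OPEN precedes
FLE_RESTORED):

	list_for_each_entry_safe(sf, n, &ui->scm_fles, l) {
		/*
		 * The fd only has to exist in the owner's fdtable;
		 * type-specific setup of the file may still be pending.
		 */
		if (sf->fle->stage < FLE_OPEN)
			return 1;

		/* Optimization for the next pass */
		list_del(&sf->l);
		xfree(sf);
	}

	return 0;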
>>>>>> I suppose the above introduces some circular dependencies for cases
>>>>>> with several sockets and an epoll which is listening to them.
>>>>>
>>>>> Nope :) On dump I explicitly ban unix sockets and epolls sent via unix
>>>>> sockets. So we can only wait here for some other files, which cannot wait for us.
>>>>>
>>>>>>> +
>>>>>>> + /* Optimization for the next pass */
>>>>>>> + list_del(&sf->l);
>>>>>>> + xfree(sf);
>>>>>>> + }
>>>>>>> +
>>>>>>> + return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> static int wake_connected_sockets(struct unix_sk_info *ui)
>>>>>>> {
>>>>>>> struct fdinfo_list_entry *fle;
>>>>>>> @@ -1322,12 +1438,18 @@ static int open_unix_sk(struct file_desc *d, int *new_fd)
>>>>>>> struct unix_sk_info *ui;
>>>>>>> int ret;
>>>>>>>
>>>>>>> + ui = container_of(d, struct unix_sk_info, d);
>>>>>>> +
>>>>>>> + /* FIXME -- only queue restore may be postponed */
>>>>>>> + if (chk_restored_scms(ui)) {
>>>>>>> + pr_info("scm: Wait for tgt to restore\n");
>>>>>>> + return 1;
>>>>>>> + }
>>>>>>
>>>>>> It's too strong a limitation, we delay other tasks here.
>>>>>
>>>>> Other tasks? No, we just ask the files restore loop to go and try to open
>>>>> some other files.
>>>>
>>>> But other tasks may be waiting for the file descriptors you're serving out.
>>>> These tasks will be delayed.
>>>
>>> Mm... Yes, it will. Isn't it the cost of synchronization anyway? The FIXME
>>> comment above it is exactly about it -- complicating the code for the sake
>>> of (hopefully) performance.
>>
>> I don't think this will complicate the code. We moved to the asynchronous file
>> restore engine precisely to avoid "big criu locks".
>
> There are no locks here with this if, it just reorders the sequence of fle->open
> calls.
It's a dependency, and it influences the open order.
>> Now all dependencies
>> are exactly as strong as they need to be, and not more. It does not seem good
>> to move away from this path.
>>
>>>>>> We may open the socket and serve it out, and we do
>>>>>> not need to wait for queuers for these actions.
>>>>>
>>>>> Not always. There are several cases when we do socketpair() then restore
>>>>> a queue then close one of the ends immediately.
>>>>
>>>> Even if the second end is closed, other tasks may want this
>>>> slave fle, and their eventpolls may depend on it.
>>>> I think we should wait only at the moments when we must wait, and
>>>> not do that in other cases.
>>>
>>> This is valid, but "must" is to be replaced with "should" :)
>>> Is anything broken with the current approach? Or is it just suboptimal?
>>
>> Yes, this is suboptimal, and it's really easy to avoid, I think.
>
> Actually it's not. As I said earlier, there are cases when you cannot restore
> a socket without a queue and restore the queue later, so for these cases you
> need to skip the whole restoration. For the cases when you can restore a socket
> and postpone the queue restore, you may do the first step and then return 1
> requesting yet another ->open call. With this, only the unix ->open logic
> becomes noticeably more complex.
Could you give an example of a case when a socket needs to wait till other sockets
are restored before it can call the socket() syscall?
>> It would be difficult to rework that later, if we find the limitations
>> so strong that they don't allow restoring some type of files.
>
> While I do agree with this in general, I see no point in splitting the unix sk
> restore into open, postpone, restore the queue (optionally), postpone again, do
> post_open. This doesn't seem to solve any real problem, makes something (what?)
> better for a really rare case and complicates the ->open logic for unix sockets.
There is no need to introduce multiple cases. There is a generic rule:
queues with scms or msg names are restored by the socket itself, and the socket
uses the task's (maybe ghost) fles to do that.
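So the ->open callback would open and serve the socket out right away and
postpone only the queue restore; roughly (a sketch only: the ui->is_served_out
flag and restore_scm_queue() are illustrative names, and I assume
open_unixsk_standalone() keeps its (ui, new_fd) signature):

	static int open_unix_sk(struct file_desc *d, int *new_fd)
	{
		struct unix_sk_info *ui = container_of(d, struct unix_sk_info, d);

		if (!ui->is_served_out) {
			/*
			 * Open the socket and serve it out right away, so
			 * that tasks waiting for this fle are not delayed.
			 */
			if (open_unixsk_standalone(ui, new_fd) < 0)
				return -1;
			ui->is_served_out = true;
		}

		/*
		 * Only the queue with scms/msg names depends on other files:
		 * wait until the owner's (maybe ghost/fake) fles are open,
		 * then let the socket restore its own queue.
		 */
		if (chk_restored_scms(ui))
			return 1;	/* ask for another ->open call */

		return restore_scm_queue(ui);	/* illustrative helper */
	}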