[CRIU] [PATCH 18/18] SCM: Dump and restore SCM_RIGHTs

Pavel Emelyanov xemul at virtuozzo.com
Wed Jul 12 17:19:58 MSK 2017


>>> If so, why don't we try to analyze the task's
>>> descriptors to see if it already has the one we add? 
>>
>> We can. In that case we'll need to add a flag on the fle, saying that
>> this fle can be closed by scm-sending code. I thought about it, but
>> wanted to make the first version simpler.
> 
> Hm. I skipped this on the first look, but you introduce a duplicate of
> the generic code in close_scm_fds(). Why can't a new scm be added as
> a fake fle? Then the generic code would close it like any other file of
> that sort.
> 
> So, the algorithm may be much simpler. If the task has a fle, use it.
> If it has none, add a fle with the fake flag, and the generic code will
> close it later.

Do we have this close-if-flag-is-set code already?

> I don't think it's good to duplicate generic code.
> I see your comment below vvvv, but it doesn't seem to imply that
> close_scm_fds() gets deleted later.

It does. The flag you're talking about above is to be added (unless it's there already).

Another thing about it -- suppose you've added this flag to the generic code.
Calling close() on the descriptor must happen strictly after the respective fle
has been sent over the unix socket with SCM. How would the _generic_ code make
sure that has happened?
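
To make the ordering constraint concrete, here is a minimal sketch of the
scm-restoring side. send_one_scm_msg() is an assumed helper and the fd field
access is approximate; only the list and struct names come from the patch below:

/*
 * Sketch only: a descriptor referenced by a queued SCM_RIGHTS message
 * may be closed only after the message carrying it has been pushed
 * into the socket queue.  A generic "close fake fles" pass cannot
 * know when that point has been reached.
 */
static int restore_scm_queue(struct unix_sk_info *ui, int sk)
{
	struct scm_fle *sf;

	list_for_each_entry(sf, &ui->scm_fles, l) {
		int fd = sf->fle->fe->fd;	/* target descriptor; field access approximate */

		/* 1. Send the descriptor with SCM_RIGHTS into the queue ... */
		if (send_one_scm_msg(sk, fd))	/* assumed helper */
			return -1;
		/* 2. ... and only now may it be closed. */
		close(fd);
	}

	return 0;
}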

>>>>>> +		}
>>>>>> +
>>>>>> +		pr_info("scm: add %d -> %d.fd[%d]\n", tgt->id, vpid(owner), fd);
>>>>>> +		sfle = xmalloc(sizeof(*sfle));
>>>>>> +		if (!sfle)
>>>>>> +			return -1;
>>>>>> +
>>>>>> +		sfle->fle = collect_fd_to(vpid(owner), e, rsti(owner), tgt, false);
>>>>>> +		if (!sfle->fle) {
>>>>>> +			pr_err("Can't request new fle for scm\n");
>>>>>> +			return -1;
>>>>>> +		}
>>>>>> +
>>>>>> +		list_add_tail(&sfle->l, &ui->scm_fles);
>>>>>> +		fds[i] = fd;
>>>>>> +	}
>>>>>> +
>>>>>> +	return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int chk_restored_scms(struct unix_sk_info *ui)
>>>>>> +{
>>>>>> +	struct scm_fle *sf, *n;
>>>>>> +
>>>>>> +	list_for_each_entry_safe(sf, n, &ui->scm_fles, l) {
>>>>>> +		if (sf->fle->stage != FLE_RESTORED)
>>>>>> +			return 1;
>>>>>
>>>>> We need to wait for sf->fle->bound here instead. 
>>>>
>>>> Why?
>>>
>>> Then what are scms? If these are the files contained in the queued
>>> messages, we just need to wait until they are open, don't we?
>>
>> Here we wait for a file descriptor to appear in the task's fdtable, so that
>> we can reference it. Do you want to say that waiting for FLE_OPEN is
>> enough? I need to make sure that fle->fd references the correct file,
>> in other words, that the file has been put into the proper descriptor
>> already. Is that so? If yes, I will fix this place, but what's the
>> difference between FLE_OPEN and FLE_RESTORED then?
> 
> FLE_OPEN means "the file is just open, but its type-specific part is not configured yet".
> It means the file is right after the open() syscall or socket() syscall, etc.
> That should be enough, provided unix sockets do not transfer file attributes
> in addition to the flags we set in setup_and_serve_out::fcntl(fle->flags).
> If they did, we would have to extend setup_and_serve_out() for this case, but
> I don't think such attributes exist in Linux: the whole generic part is done
> in setup_and_serve_out(), and if such attributes existed, this function would
> contain them.

OK, fair enough. Please send a patch with comments for the constants in
question. I'll update this piece in this set.
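
For reference, the relaxed check would look roughly like this (a sketch; it
assumes the stage values are ordered so that FLE_OPEN precedes FLE_RESTORED):

static int chk_restored_scms(struct unix_sk_info *ui)
{
	struct scm_fle *sf, *n;

	list_for_each_entry_safe(sf, n, &ui->scm_fles, l) {
		/* Wait only until the file is past open()/socket() ... */
		if (sf->fle->stage < FLE_OPEN)
			return 1;

		/* ... not until it is fully restored. */
		/* Optimization for the next pass */
		list_del(&sf->l);
		xfree(sf);
	}

	return 0;
}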

>>>>> I suppose the above introduces circular dependencies for cases
>>>>> with several sockets and an epoll that is listening to them.
>>>>
>>>> Nope :) On dump I explicitly ban unix sockets and epolls sent via unix
>>>> sockets. So here we can only wait for some other files, which cannot wait for us.
>>>>
>>>>>> +
>>>>>> +		/* Optimization for the next pass */
>>>>>> +		list_del(&sf->l);
>>>>>> +		xfree(sf);
>>>>>> +	}
>>>>>> +
>>>>>> +	return 0;
>>>>>> +}
>>>>>> +
>>>>>>  static int wake_connected_sockets(struct unix_sk_info *ui)
>>>>>>  {
>>>>>>  	struct fdinfo_list_entry *fle;
>>>>>> @@ -1322,12 +1438,18 @@ static int open_unix_sk(struct file_desc *d, int *new_fd)
>>>>>>  	struct unix_sk_info *ui;
>>>>>>  	int ret;
>>>>>>  
>>>>>> +	ui = container_of(d, struct unix_sk_info, d);
>>>>>> +
>>>>>> +	/* FIXME -- only queue restore may be postponed */
>>>>>> +	if (chk_restored_scms(ui)) {
>>>>>> +		pr_info("scm: Wait for tgt to restore\n");
>>>>>> +		return 1;
>>>>>> +	}
>>>>>
>>>>> It's too strong a limitation, we delay other tasks here.
>>>>
>>>> Other tasks? No, we just ask the files restore loop to go and try to open
>>>> some other files.
>>>
>>> But other tasks may be waiting for the file descriptors you're serving out.
>>> These tasks will be delayed.
>>
>> Mm... Yes, they will. Isn't that the cost of synchronization anyway? The FIXME
>> comment above it is exactly about that -- complicating the code for the sake
>> of (hopefully better) performance.
> 
> I don't think this will complicate the code. We moved to the asynchronous file
> restore engine precisely so as not to have "big criu locks".

There are no locks here with this if; it just reorders the sequence of fle->open
calls.
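
In other words, the generic loop treats a positive return from ->open as "try
this fle again on a later pass". Roughly (a simplified view, not the actual
loop in files.c; field names follow the patch context):

static int try_open_fle(struct fdinfo_list_entry *fle)
{
	int new_fd = -1, ret;

	/* For unix sockets this ends up in open_unix_sk() */
	ret = fle->desc->ops->open(fle->desc, &new_fd);
	if (ret < 0)
		return -1;	/* hard error, abort the restore */
	if (ret > 0)
		return 1;	/* not ready yet, retry on the next pass */

	/* ret == 0: the file is open, proceed with the generic stages */
	return 0;
}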

> Now all dependencies
> are as strong as they need to be, and no more. It doesn't seem good
> to move away from this path.
>  
>>>>> We may open the socket and serve it out, and we do
>>>>> not need to wait for the queuers for these actions.
>>>>
>>>> Not always. There are several cases when we do socketpair(), then restore
>>>> a queue, then close one of the ends immediately.
>>>
>>> Even if the second end is closed, other tasks may want this slave
>>> fle, and their eventpolls may depend on it.
>>> I think we should wait only at the moments when we must wait, and not
>>> do so in other cases.
>>
>> This is valid, but "must" is to be replaced with "should" :)
>> Is anything broken with the current approach? Or is it just suboptimal?
> 
> Yes, this is suboptimal, and it's really easy to avoid, I think.

Actually it's not. As I said earlier, there are cases when you cannot restore
a socket without its queue and restore the queue later, so for these cases you
need to skip the whole restoration. For the cases when you can restore a socket
and postpone the queue restore, you may do the first step and then return 1,
requesting yet another ->open call. With this, the unix ->open logic alone
becomes noticeably more complex; see the sketch below.
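
Roughly, the split variant under discussion would have to look something like
this (purely hypothetical; the sk_created/sk_fd fields and the two helpers are
assumptions, used only to show the extra per-socket state the callback would
have to carry):

static int open_unix_sk_split(struct file_desc *d, int *new_fd)
{
	struct unix_sk_info *ui = container_of(d, struct unix_sk_info, d);

	if (!ui->sk_created) {				/* assumed flag */
		if (create_and_serve_out_sk(ui))	/* assumed helper */
			return -1;
		ui->sk_created = 1;
	}

	/* From now on, only the queue restore is postponed */
	if (chk_restored_scms(ui))
		return 1;

	if (restore_sk_queue_with_scms(ui))		/* assumed helper */
		return -1;

	*new_fd = ui->sk_fd;				/* assumed field */
	return 0;
}

And this still would not cover the socketpair() cases above, where the queue
cannot be restored after one of the ends has been closed.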

> It would be difficult to rework that later if we find the limitations
> so strong that they don't allow restoring some types of files.

While I do agree with this in general, I see no point in splitting the unix sk
restore into open, postpone, restore the queue (optionally), postpone again, do
post_open. This doesn't seem to solve any real problem, makes something (what?)
better for a really rare case, and complicates the ->open logic for unix sockets.

-- Pavel

