[CRIU] [PATCH 18/18] SCM: Dump and restore SCM_RIGHTs
Pavel Emelyanov
xemul at virtuozzo.com
Wed Jul 12 15:10:01 MSK 2017
>>>> +	for (i = 0; i < n_ids; i++) {
>>>> +		struct file_desc *tgt;
>>>> +		struct fdinfo_list_entry *fle;
>>>> +		struct scm_fle *sfle;
>>>> +		FdinfoEntry *e;
>>>> +		int fd;
>>>> +
>>>> +		tgt = find_file_desc_raw(FD_TYPES__UND, file_ids[i]);
>>>> +		if (!tgt) {
>>>> +			pr_err("Can't find fdesc to send\n");
>>>> +			return -1;
>>>> +		}
>>>> +
>>>> +		pr_info("`- Found %d file\n", file_ids[i]);
>>>> +		fd = find_unused_fd(owner, -1);
>>>
>>> If I haven't missed something, I don't see where you try
>>> to find the fle in the task's file descriptors to use it for
>>> queuing. It seems we add a fake fle for every packet in
>>> a queue. If so, that's not good for performance, and first
>>> we should check whether the task already owns the fle instead
>>> of adding a new one.
>>
>> Yes, I did this deliberately. Adding one more descriptor for a file
>> will cost us several more syscalls. Fortunately, queues with scms will
>> not be a frequent case.
>
> Your series restores queues with file descriptors in scm, and one of
> the queuers is used to do that. The task of this queuer obtains additional
> file descriptors. Right?
Right. The task owning the queuer's master fle does so.
> If so, why don't we try to analyze the task's
> descriptors to see whether it already has the one we add?
We can. In that case we'll need to add a flag on the fle saying that
this fle can be closed by the scm-sending code. I thought about it, but
wanted to make the first version simpler.
> I'm asking about that,
> and it seems not to require extra file descriptors or system calls
> (it's possible to just analyze the fd identifiers). I'm not sure we understood
> each other on this question %) Do you mean the same?
You're right. If a task that is to send a descriptor already owns the
relevant fle, we can just reuse that. I wanted to do it incrementally,
as this would require more changes and would complicate maintainers'
review :P
At the same time, I don't expect tons of scm-rights messages and think
that saving a couple of syscalls and one descriptor might not be worth the
effort. But ... anyway.
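To make the incremental step concrete, here's roughly what the reuse lookup
would be (just a sketch: find_used_fle() is a made-up name, and I'm quoting
the rsti()->fds / ps_list field names from memory):

/*
 * Sketch only: check whether the owner task already has an fle
 * referencing the target file, so we could reuse it instead of
 * duplicating a descriptor.  The real change would also need the
 * "may be closed by the scm code" flag discussed above.
 */
static struct fdinfo_list_entry *find_used_fle(struct pstree_item *owner,
					       struct file_desc *tgt)
{
	struct fdinfo_list_entry *fle;

	list_for_each_entry(fle, &rsti(owner)->fds, ps_list)
		if (fle->desc == tgt)
			return fle;

	return NULL;
}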
>>>> +
>>>> +		fle = try_file_master(tgt);
>>>> +		if (fle) {
>>>> +			e = dup_fdinfo(fle->fe, fd, 0);
>>>> +			if (!e) {
>>>> +				pr_err("Can't duplicate fdinfo for scm\n");
>>>> +				return -1;
>>>> +			}
>>>> +		} else {
>>>> +			/*
>>>> +			 * This can happen if the file in question is
>>>> +			 * sent over the socket and closed. In this case
>>>> +			 * we need to ... invent a new one!
>>>> +			 */
>>>> +
>>>> +			e = xmalloc(sizeof(*e));
>>>> +			if (!e)
>>>> +				return -1;
>>>> +
>>>> +			fdinfo_entry__init(e);
>>>> +			e->id = tgt->id;
>>>> +			e->type = tgt->ops->type;
>>>> +			e->fd = fd;
>>>> +			e->flags = 0;
>>>
>>> I'd move the code around fdinfo_entry__init() into a separate function,
>>> as we already have 3 places like it.
>>
>> OK, will do. What are the others?
>
> I mean the places where fdinfo_entry__init() is used:
> $ git grep fdinfo_entry__init
> criu/files.c: fdinfo_entry__init(e);
> criu/files.c: fdinfo_entry__init(e);
> criu/sk-unix.c: fdinfo_entry__init(e);
>
> Something like
>
> e = create_fdinfo_entry(tgt->id, tgt->ops->type, fd, 0, false);
> if (!e)
> return -1;
>
> "false" is to use shmem or not.
Ah, I see. So the options we have now are having N lines of assignments vs
having a routine with M arguments. I would choose the 2nd only if M < N,
but your proposal is 4 assignments vs 5 arguments.
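That said, if we do go with the helper, I'd imagine it roughly like this
(a sketch only; treating the "false" as the shmem choice is my guess, and
shmalloc() vs xmalloc() is how I'd express it):

/* Sketch of the proposed helper, not merged code. */
static FdinfoEntry *create_fdinfo_entry(u32 id, int type, int fd,
					int flags, bool shared)
{
	FdinfoEntry *e;

	/* shared entries go to shmem so that other tasks can see them */
	e = shared ? shmalloc(sizeof(*e)) : xmalloc(sizeof(*e));
	if (!e)
		return NULL;

	fdinfo_entry__init(e);
	e->id = id;
	e->type = type;
	e->fd = fd;
	e->flags = flags;

	return e;
}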
>>>> +		}
>>>> +
>>>> +		pr_info("scm: add %d -> %d.fd[%d]\n", tgt->id, vpid(owner), fd);
>>>> +		sfle = xmalloc(sizeof(*sfle));
>>>> +		if (!sfle)
>>>> +			return -1;
>>>> +
>>>> +		sfle->fle = collect_fd_to(vpid(owner), e, rsti(owner), tgt, false);
>>>> +		if (!sfle->fle) {
>>>> +			pr_err("Can't request new fle for scm\n");
>>>> +			return -1;
>>>> +		}
>>>> +
>>>> +		list_add_tail(&sfle->l, &ui->scm_fles);
>>>> +		fds[i] = fd;
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int chk_restored_scms(struct unix_sk_info *ui)
>>>> +{
>>>> +	struct scm_fle *sf, *n;
>>>> +
>>>> +	list_for_each_entry_safe(sf, n, &ui->scm_fles, l) {
>>>> +		if (sf->fle->stage != FLE_RESTORED)
>>>> +			return 1;
>>>
>>> We need to wait for sf->fle->bound here instead.
>>
>> Why?
>
> Then, what are scms? If these are the files contained in the
> queued messages, we just need to wait till they are opened, don't we?
Here we wait for the file descriptor to appear in the task's fdtable, so that
we can reference it. Do you want to say that waiting for FLE_OPEN is
enough? I need to make sure that fle->fd does reference the correct file,
in other words, that the file has already been put into the proper descriptor.
Is that so? If yes, I will fix this place, but what's the difference between
FLE_OPEN and FLE_RESTORED then?
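In case FLE_OPEN turns out to be enough, the fix here is a one-liner,
something like the below (assuming the stages are ordered and FLE_OPEN
really means the fd already sits at its final number, which is exactly
the question above):

	list_for_each_entry_safe(sf, n, &ui->scm_fles, l) {
		/* wait only until the fd exists at the right number */
		if (sf->fle->stage < FLE_OPEN)
			return 1;

		/* Optimization for the next pass */
		list_del(&sf->l);
		xfree(sf);
	}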
>>> I suppose the above introduces some circular dependencies for cases
>>> with several sockets having an epoll which is listening to them.
>>
>> Nope :) On dump I explicitly ban unix sockets and epolls sent via unix
>> sockets. So we can only wait here for some other files, which cannot wait for us.
>>
>>>> +
>>>> +		/* Optimization for the next pass */
>>>> +		list_del(&sf->l);
>>>> +		xfree(sf);
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>>  static int wake_connected_sockets(struct unix_sk_info *ui)
>>>>  {
>>>>  	struct fdinfo_list_entry *fle;
>>>> @@ -1322,12 +1438,18 @@ static int open_unix_sk(struct file_desc *d, int *new_fd)
>>>>  	struct unix_sk_info *ui;
>>>>  	int ret;
>>>>
>>>> +	ui = container_of(d, struct unix_sk_info, d);
>>>> +
>>>> +	/* FIXME -- only queue restore may be postponed */
>>>> +	if (chk_restored_scms(ui)) {
>>>> +		pr_info("scm: Wait for tgt to restore\n");
>>>> +		return 1;
>>>> +	}
>>>
>>> It's too strong a limitation; we delay other tasks here.
>>
>> Other tasks? No, we just ask the files restore loop to go and try to open
>> some other files.
>
> But other tasks may wait for the file descriptors you're serving out.
> These tasks will be delayed.
Mm... Yes, they will. Isn't that the cost of synchronization anyway? The FIXME
comment above is exactly about this -- complicating the code for the sake
of (hopefully) performance.
>>> We may open the socket and serve it out, and we do
>>> not need to wait for queuers for these actions.
>>
>> Not always. There are several cases when we do socketpair(), then restore
>> a queue, then close one of the ends immediately.
>
> Even if the second end is closed, other tasks may want this
> slave fle, and their eventpolls may depend on it.
> I think we should wait only in the moments when we must wait, and not
> do that in other cases.
This is valid, but "must" is to be replaced with "should" :)
Is anything broken with the current approach? Or is it just suboptimal?
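If we decide it is worth it, the FIXME would turn into something like the
sketch below: open and serve the socket out right away, and postpone only
the queue restore behind the scm wait (the ->post_open callback and
restore_unix_queue() are invented here just for illustration, that's not
what the series does):

/*
 * Illustration only: postpone just the queue restore, not the whole
 * open_unix_sk().  restore_unix_queue() stands for the existing
 * queue-restore step.
 */
static int post_open_unix_sk(struct file_desc *d)
{
	struct unix_sk_info *ui = container_of(d, struct unix_sk_info, d);

	if (chk_restored_scms(ui))
		return 1;	/* not ready yet, retry on the next pass */

	return restore_unix_queue(ui);
}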
-- Pavel