[CRIU] [PATCH 18/18] SCM: Dump and restore SCM_RIGHTs
Kirill Tkhai
ktkhai at virtuozzo.com
Wed Jul 12 16:04:47 MSK 2017
On 12.07.2017 15:10, Pavel Emelyanov wrote:
>
>>>>> + for (i = 0; i < n_ids; i++) {
>>>>> + struct file_desc *tgt;
>>>>> + struct fdinfo_list_entry *fle;
>>>>> + struct scm_fle *sfle;
>>>>> + FdinfoEntry *e;
>>>>> + int fd;
>>>>> +
>>>>> + tgt = find_file_desc_raw(FD_TYPES__UND, file_ids[i]);
>>>>> + if (!tgt) {
>>>>> + pr_err("Can't find fdesc to send\n");
>>>>> + return -1;
>>>>> + }
>>>>> +
>>>>> + pr_info("`- Found %d file\n", file_ids[i]);
>>>>> + fd = find_unused_fd(owner, -1);
>>>>
>>>> If I haven't missed something, I don't see, where you try
>>>> to found the fle in task's file descriptor to use it for
>>>> queuing. It seems, we add a fake fle for every packet in
>>>> a queue. If it's so, it's not performance-good, and firstly
>>>> we should try to check, that task already owns the fle instead
>>>> of that.
>>>
>>> Yes, I made this deliberately. Adding one more descriptor for a file
>>> will take us several syscalls more. Fortunately queues with scms will
>>> not be a frequent case.
>>
>> Your series restore queues with file descriptors in scm, and one of
>> the queuers is used to to that. Task of this queuer obtains additional
>> file descriptors. Right?
>
> Right. Task of the master fle of the queuer does so.
>
>> If so, why we don't try to analyze the task's
>> descriptors, if it already has the one, we add?
>
> We can. In that case we'll need to add a flag on the fle, saying that
> this fle can be closed by scm-sending code. I thought about it, but
> wanted to make the first version simpler.
Hm. I skip this in the first look, but you introduces a duplicate of
generic code in close_scm_fds(). Why can't a new scm be added as
a fake fle? If so, generic code will close it as another thing of such
sort.
So, the algorithm may be much simplier. If task has a fle, use it.
If task has no, add a fle with fake flag, and generic code will close
it later.
I don't think, that it's good to duplicate a generic code.
I see your comment below vvvv, but it seems it does not suppose to
delete close_scm_fds() later.
>> I'm asking about that,
>> and this seems to not require excess file descriptors or system calls
>> (it's possible to just analyze fd identifiers). I'm not sure we understood
>> each other in question %) Do you mean the same?
>
> You're right. If a task that is to send a descriptor already owns the
> relevant fle, we can just reuse that. I wanted to do it incrementally,
> as this would require more changes and would complicate maintainers'
> review :P
>
> At the same time, I don't expect tons of scm-rights messages and think
> that saving a couple of syscalls and one descriptors might not worth the
> efforts. But ... anyway.
This comment referred ^^^^
>>>>> +
>>>>> + fle = try_file_master(tgt);
>>>>> + if (fle) {
>>>>> + e = dup_fdinfo(fle->fe, fd, 0);
>>>>> + if (!e) {
>>>>> + pr_err("Can't duplicate fdinfo for scm\n");
>>>>> + return -1;
>>>>> + }
>>>>> + } else {
>>>>> + /*
>>>>> + * This can happen if the file in question is
>>>>> + * sent over the socket and closed. In this case
>>>>> + * we need to ... invent a new one!
>>>>> + */
>>>>> +
>>>>> + e = xmalloc(sizeof(*e));
>>>>> + if (!e)
>>>>> + return -1;
>>>>> +
>>>>> + fdinfo_entry__init(e);
>>>>> + e->id = tgt->id;
>>>>> + e->type = tgt->ops->type;
>>>>> + e->fd = fd;
>>>>> + e->flags = 0;
>>>>
>>>> I'd add the code around fdinfo_entry__init() in separate function
>>>> as we already have 3 places like it.
>>>
>>> OK, will do. What are the others?
>>
>> I mean the places, where fdinfo_entry__init() is used:
>> $ git grep fdinfo_entry__init
>> criu/files.c: fdinfo_entry__init(e);
>> criu/files.c: fdinfo_entry__init(e);
>> criu/sk-unix.c: fdinfo_entry__init(e);
>>
>> Something like
>>
>> e = create_fdinfo_entry(tgt->id, tgt->ops->type, fd, 0, false);
>> if (!e)
>> return -1;
>>
>> "false" is to use shmem or not.
>
> Ah, I see. So the options we have now is having N lines of assignments vs
> having a routine with M arguments. I would chose the 2nd only if M < N,
> but your proposal is 4 assignments vs 5 arguments.
>
>>>>> + }
>>>>> +
>>>>> + pr_info("scm: add %d -> %d.fd[%d]\n", tgt->id, vpid(owner), fd);
>>>>> + sfle = xmalloc(sizeof(*sfle));
>>>>> + if (!sfle)
>>>>> + return -1;
>>>>> +
>>>>> + sfle->fle = collect_fd_to(vpid(owner), e, rsti(owner), tgt, false);
>>>>> + if (!sfle->fle) {
>>>>> + pr_err("Can't request new fle for scm\n");
>>>>> + return -1;
>>>>> + }
>>>>> +
>>>>> + list_add_tail(&sfle->l, &ui->scm_fles);
>>>>> + fds[i] = fd;
>>>>> + }
>>>>> +
>>>>> + return 0;
>>>>> +}
>>>>> +
>>>>> +static int chk_restored_scms(struct unix_sk_info *ui)
>>>>> +{
>>>>> + struct scm_fle *sf, *n;
>>>>> +
>>>>> + list_for_each_entry_safe(sf, n, &ui->scm_fles, l) {
>>>>> + if (sf->fle->stage != FLE_RESTORED)
>>>>> + return 1;
>>>>
>>>> It's need to wait for sf->fle->bound here instead.
>>>
>>> Why?
>>
>> Then, what is scms? If these are the files, which contain in the
>> queue messages, we just need to wait till they open, don't we?
>
> Here we wait for a file descriptor to appear in the task's fdtable, so that
> we could reference one. Do you want to say that waiting for FLE_OPEN is
> enough? I need make sure that fle->fd does reference the correct files,
> in other words, file has been put into proper descriptor already. Is that
> so? If yes, I will fix this place, but what's the difference between
> FLE_OPEN and FLE_RESTORED then?
FLE_OPEN means "file is just open, but it's type-specific part is not configured yet".
It means the file is right after the open() syscall or socket() syscall etc.
It seems to be enough, if unix sockets do not transfer file attributes
in additional to flags we set in setup_and_serve_out::fcntl(fle->flags).
If so, we should extent setup_and_serve_out() in this case, but I don't
think such attributes exist in Linux: the whole generic part is made
in setup_and_serve_out(), if the existed, this function would contained
them.
>>>> I suppose, the above introduces some circular dependencies for cases
>>>> with several sockets having epoll, which is listening them.
>>>
>>> Nope :) On dump I explicitly ban unix sockets and epolls sent via unix
>>> sockets. So we can only wait here for some other files, that cannot wait us.
>>>
>>>>> +
>>>>> + /* Optimization for the next pass */
>>>>> + list_del(&sf->l);
>>>>> + xfree(sf);
>>>>> + }
>>>>> +
>>>>> + return 0;
>>>>> +}
>>>>> +
>>>>> static int wake_connected_sockets(struct unix_sk_info *ui)
>>>>> {
>>>>> struct fdinfo_list_entry *fle;
>>>>> @@ -1322,12 +1438,18 @@ static int open_unix_sk(struct file_desc *d, int *new_fd)
>>>>> struct unix_sk_info *ui;
>>>>> int ret;
>>>>>
>>>>> + ui = container_of(d, struct unix_sk_info, d);
>>>>> +
>>>>> + /* FIXME -- only queue restore may be postponed */
>>>>> + if (chk_restored_scms(ui)) {
>>>>> + pr_info("scm: Wait for tgt to restore\n");
>>>>> + return 1;
>>>>> + }
>>>>
>>>> It's too strong limitation, we delay another tasks here.
>>>
>>> Another tasks? No, we just ask files restore loop to go and try to open
>>> some other files.
>>
>> But another tasks may wait file descriptors you're serving out.
>> This tasks will be delayed.
>
> Mm... Yes, it will. Isn't it the cost of synchronization anyway? The FIXME
> comment above it is exactly about it -- complicating the code for the sake
> of (hopefully) performance.
I don't think this will complicate the code. We moved to asynchronous file
restore engine just to do not have "big criu locks". Now all dependences
are as strong as they are need, and not more. It seems to be not good
to move away from this path.
>>>> We may open the socket and to serve it out, and we do
>>>> not need wait queuers for these actions.
>>>
>>> Not always. There are several cases when we do socketpair() then restore
>>> a queue then close one of the ends immediately.
>>
>> Even if the second end is closed, anither tasks may want this
>> slave fle, and their eventpolls may depend on it.
>> I think, we should wait only in the moments, when we must wait, and do
>> not do that in another cases.
>
> This is valid, but must is to be replaced with should :)
> Is anything broken with the current approach? Or just suboptimal?
Yes, this is suboptiomal, and it's really easy to avoid that, I think.
It would be difficult to rework that later, if we found the limitations
as too strong that they don't allow to restore some type of files.
More information about the CRIU
mailing list