[CRIU] [PATCH] Make restored processes inherit file descriptors from criu
Pavel Emelyanov
xemul at parallels.com
Tue Dec 2 00:33:41 PST 2014
On 12/01/2014 08:15 PM, Saied Kazemi wrote:
>> Actually, after more thinking I came to conclusion that with pipes the situation
>> may be even more confusing. If for some reason the process we restore holds one
>> pipe with two "ends" -- reading and writing -- and we say that this pipe should
>> be inherited from criu, we make this process put one criu's pipe end into both
>> of its descriptors. Thus one of them will end up in the wrong read/write mode.
>
> I would go back to the use case. We're not making a broad statement
> that with inherit fd one can replace an arbitrary descriptor at
> checkpoint time with another arbitrary descriptor at restore time for
> the entire process tree and expect everything to work. What we
> can/should say is that inherit fd functionality can restore certain
> descriptors that cannot otherwise be restored at all (causing the
> entire restore to fail).
>
> I suggest that we take the same approach here that we took with
> external bind mounts. Start with a simple implementation that does
> not change core CRIU functionality (hence very low risk) and see in
> practice what issues it may/will uncover. Based on the issues, we can
> decide whether to extend its functionality or drop it.
I don't disagree :) Though dropping something that has an external
API would not be extremely easy...
BTW, while we're at it, we have a list of problems in our image
files [1], I think that some time soon I'll start a page with the
list of API problems. Any ideas on how to fix those w/o breaking
users are always welcome.
[1] http://criu.org/What%27s_bad_with_V1_images
>
>> I think that the root cause of it is that with the --inherit-fd option we're
>> mapping objects of different types onto each other. We ask criu to replace an
>> "inode" object (/foo/bar from reg-files.img or pipe[12345] from pipes.img)
>> with a "file" one (the fd[3]).
>
> No, the idea is not to replace one inode type with another. We're
> asking CRIU to use a live pipe fd in its descriptors instead of a
> dead/broken pipe fd in the image file.
Agreed.
> Or, we're asking CRIU to use a
> new regular file fd (e.g., /tmp/newfoo) instead of an old regular file
> fd (e.g., /tmp/oldfoo). But we're not asking CRIU to use a regular
> file inode in place of a pipe inode or vice versa. Sorry if this
> wasn't clear.
Well, yes, now it's clearer. But the issue with addressing the object
still stands. Here's what I mean: if a task has some file opened, this
chain of objects exists in kernel memory:
task -> file-descriptors-table [ FD ] -> struct file -> struct dentry -> struct inode
Several file-s may reference one dentry/inode pair, and several
FD-s may reference one file, like this:
task1 -> FDT [FD] -+
                   |
task2 -> FDT [FD] -+-> struct file -+-> struct dentry -> struct inode
                                    |
task3 -> FDT [FD] ---> struct file -+
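To make the two kinds of sharing in the diagram concrete, here is a minimal sketch, assuming Linux and Python's os module. The temporary file and the function name are made up for illustration and have nothing to do with CRIU's code. A dup()-ed descriptor shares the struct file (and so the file offset), while a second open() of the same path gets its own struct file on the same inode:

```python
import os
import tempfile


def demo_shared_offsets():
    """Return (offset seen via dup'ed fd, offset seen via re-opened fd, same inode?)."""
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"0123456789")
        tmp.flush()
        fd_a = os.open(tmp.name, os.O_RDONLY)  # creates a struct file
        fd_b = os.dup(fd_a)                    # second FD, same struct file
        fd_c = os.open(tmp.name, os.O_RDONLY)  # new struct file, same inode
        try:
            # Move the offset through fd_a only.
            os.lseek(fd_a, 4, os.SEEK_SET)
            # The offset lives in the struct file, so fd_b sees the move...
            dup_offset = os.lseek(fd_b, 0, os.SEEK_CUR)
            # ...while fd_c has its own struct file and stays at 0.
            reopen_offset = os.lseek(fd_c, 0, os.SEEK_CUR)
            same_inode = os.fstat(fd_a).st_ino == os.fstat(fd_c).st_ino
        finally:
            for fd in (fd_a, fd_b, fd_c):
                os.close(fd)
    return dup_offset, reopen_offset, same_inode
```

Here fd_a/fd_b play the role of task1/task2 sharing one struct file, and fd_c plays the role of task3 with a separate struct file on the same inode.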
When we say that we want to --inherit-fd fd[X]:/foo/bar, with the tail
part of the option (the /foo/bar) we address the dentry/inode pair and with
the head part of it (the fd[X]) we point to the struct file object in
the criu's FDT.
I'm ready to make the --inherit-fd option a "use at your own risk"
one, but the problem is that a CRIU caller has no API to find out what
will happen after he uses it. IOW -- there's currently no way to reveal
the diagram above. For the three tasks in this example, /proc/pid/fd
would show you the same line three times
3 -> /foo/bar
and that's it. This complex structure is hidden inside the kernel.
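A small sketch of that limitation, assuming Linux with /proc mounted. The function name and temporary file are hypothetical, and a single process with three descriptors stands in for the three tasks. The /proc/self/fd links name only the path, so a dup()-ed descriptor and an independently opened one look identical:

```python
import os
import tempfile


def proc_fd_targets():
    """Return the /proc/self/fd link targets for three FDs on one file."""
    with tempfile.NamedTemporaryFile() as tmp:
        fd_a = os.open(tmp.name, os.O_RDONLY)  # struct file #1
        fd_b = os.dup(fd_a)                    # shares struct file #1
        fd_c = os.open(tmp.name, os.O_RDONLY)  # struct file #2, same inode
        try:
            # Each link resolves to the path only; nothing in it says
            # which descriptors share a struct file.
            targets = [
                os.readlink(f"/proc/self/fd/{fd}") for fd in (fd_a, fd_b, fd_c)
            ]
        finally:
            for fd in (fd_a, fd_b, fd_c):
                os.close(fd)
    return targets
```

All three entries come back as the same path, even though only two of the descriptors share a struct file.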
You can find this in the criu's image files as well. In the reg-files.img
there are records like
id: 0xabc ... name: /foo/bar
and while the id addresses the struct file, the name field is about the
dentry/inode. Same with pipes.img
id: 0xcde ... pipe_id: 0x123
The id is struct file and the pipe_id is struct inode (dentries are fake
for pipes).
And the same is true for any other image with struct file-s.
If with the --inherit-fd option we point to the "id" field, then it will
be clear what happens. Maybe we can go this route?
>> What would you say if we make the --inherit-fd option (on restore only) work like
>> this. User would say "--inherit-fd $pid.$fd:$fd2" thus making criu get a struct
>> file referenced by task $pid with the descriptor $fd, find everybody else who
>> references the same struct file and replace this struct file with the one, pointed
>> by criu process with the $fd2 descriptor. With this we map one file to another one.
>> And the collect_fd function from files.c does exactly what we need -- it resolves
>> the struct file sharing, finding the one who will "create" the inode and
>> serve one out to the others.
>
> If I understand the gist of your proposal, it's almost the same as
> what we're already doing because task $pid is CRIU itself and $fd is
> the descriptor that is set up for CRIU when it's called.
No, it's reversed. Sorry for not being clear enough.
> If I misunderstood, please explain what is task $pid and how is it created?
I meant that $pid.$fd points to some task from images and $fd2
points to criu's descriptor. Thus with the first pair we address
a struct file in the tree we restore, and with the $fd2 we also
address a struct file but in the criu's FDT.
Actually it's the same as with the "id" field I've described above.
The task's fdinfo.img image entry looks like
id: 0xfde ... type: xxx fd: 1
If you take task's $pid entry with fd being $fd you uniquely
identify the (type, id) pair, which is the struct file.
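The lookup can be sketched roughly like this. It is only an illustration over a made-up in-memory model: the FdinfoEntry tuple, the pid field on it, the "REG" type string and both helper names are assumptions for the example, not criu's image format or code:

```python
from collections import namedtuple

# Hypothetical flattened view of fdinfo entries: which task, which FD
# number, and the (type, id) pair naming the struct file.
FdinfoEntry = namedtuple("FdinfoEntry", "pid fd type id")


def resolve_file(entries, pid, fd):
    """Map $pid.$fd to the (type, id) pair, i.e. the struct file."""
    for entry in entries:
        if entry.pid == pid and entry.fd == fd:
            return (entry.type, entry.id)
    raise KeyError(f"no fd {fd} in task {pid}")


def sharers(entries, key):
    """List every (pid, fd) that references the same struct file."""
    return sorted((e.pid, e.fd) for e in entries if (e.type, e.id) == key)


# Example data: tasks 10 and 11 share one struct file, task 12 has its own.
ENTRIES = [
    FdinfoEntry(pid=10, fd=1, type="REG", id=0xFDE),
    FdinfoEntry(pid=11, fd=1, type="REG", id=0xFDE),
    FdinfoEntry(pid=12, fd=3, type="REG", id=0xABC),
]
```

With this model, resolving 10.1 gives the pair ("REG", 0xfde), and sharers() then lists both 10.1 and 11.1 as the descriptors that would receive the replacement file.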
> Is it a task that was previously checkpointed or is it a new task
> that the user has to start in addition to starting CRIU itself?
>
> Should I go ahead to remove checkpoint side of inherit fd and send you
> a patch (for pipe and regular file) or do you prefer to discuss an
> alternate design/implementation?
I agree with the implementation, right now I'd like to sort out
the option semantics :)
Thanks,
Pavel