[CRIU] [PATCH] Make restored processes inherit file descriptors from criu

Saied Kazemi saied at google.com
Tue Nov 25 14:49:13 PST 2014


On Mon, Nov 24, 2014 at 11:12 PM, Andrew Vagin <avagin at parallels.com> wrote:
>
> On Mon, Nov 24, 2014 at 02:18:23PM -0800, Saied Kazemi wrote:
> > There are cases where a process's file descriptor cannot be restored
> > from the checkpointed image.  For example, a pipe file descriptor with
> > one end in the checkpointed process and the other end in a separate
> > process (that was not part of the checkpointed process tree) cannot be
> > restored because after checkpoint the pipe would be broken and removed.
> > In these cases, criu's caller should set up a new file descriptor to be
> > inherited by the restored process.
> >
> > When checkpointing, the argument of --inherit-fd is a string that
> > identifies the file.  It can be obtained from the target of the symbolic
> > link in /proc/<pid>/fd (for example, /path/to/log/file or pipe:[234956]).
> >
> > When restoring, the argument of --inherit-fd has the form fd[%d]:%s,
> > where %d tells criu which file descriptor to use for restoring file
> > identified by %s.
>
> How do you handle conflicts between inherit-fd sets and restored file
> descriptors.
>
> For example I have a process with the following set of file descriptors:
> 0 -> tty
> 1 -> tty
> 2 -> tty
> 3 -> /test
> 4 -> pipe:[234956]
>
> On restore I create a pipe and use --inherit-fd fd3:pipe:[234956]. In
> this case the 3 fd will be restored before the 4 fd. Why does this
> operation not overwrite the inherit-fd descriptor?
>
> Why does criu not return an error?
> fd 3 already in use (called at files.c:867

Thanks for your quick review and feedback.

I have a patch that resolves clashing fds.  In an earlier
implementation, I moved inherit fds to service fd area but, in order
to avoid increasing the number of service fds, the patch that I am
going to send you moves the clashing inherit fd to a different
descriptor.

Below is an actual test program that I wrote for the case you've
mentioned above.  There is a pipe (pipe:[823441]) between parent
(29823) and child (29824).  Child keeps writing hello %d to the pipe
which parent reads and prints.  After checkpointing the child (criu
dump -D /tmp/criu_img -o dump.log -v4 -j -t 29824), the parent creates
a new pipe (pipe:[823469]) and calls criu to restore the child (criu
restore -D /tmp/criu_img -o restore.log v4 -j --inherit-fd
fd[3]:pipe:[823441]).  Child resumes execution as if nothing happened.

...
[29824] total 0
lrwx------ 1 root root 64 Nov 25 13:25 0 -> /dev/pts/3
lrwx------ 1 root root 64 Nov 25 13:25 1 -> /dev/pts/3
lrwx------ 1 root root 64 Nov 25 13:25 2 -> /dev/pts/3
lr-x------ 1 root root 64 Nov 25 13:25 3 -> /tmp/test
l-wx------ 1 root root 64 Nov 25 13:25 4 -> pipe:[823441]
[29824] writing hello 3

[29823] lr-x------ 1 root root 64 Nov 25 13:25 /proc/29823/fd/3 -> pipe:[823441]
[29823] read hello 3

[29823] child 29824 exited with status 9
[29823] creating a new pipe
[29869] criu restore -D /tmp/criu_img -o restore.log v4 -j
--inherit-fd fd[3]:pipe:[823441]

[29824] total 0
lrwx------ 1 root root 64 Nov 25 13:25 0 -> /dev/pts/3
lrwx------ 1 root root 64 Nov 25 13:25 1 -> /dev/pts/3
lrwx------ 1 root root 64 Nov 25 13:25 2 -> /dev/pts/3
lr-x------ 1 root root 64 Nov 25 13:25 3 -> /tmp/test
l-wx------ 1 root root 64 Nov 25 13:25 4 -> pipe:[823469]
[29824] writing hello 4

[29823] lr-x------ 1 root root 64 Nov 25 13:25 /proc/29823/fd/3 -> pipe:[823469]
[29823] read hello 4
...


> >
> > Note that while it's OK to omit --inherit-fd for checkpoint and specify
> > it only for restore, it's better to always specify it because checkpoint
> > code will not attempt to dump an inherit fd file.
> >
> > Signed-off-by: Saied Kazemi <saied at google.com>
> > ---
> >  crtools.c            |  15 +++++
> >  files-reg.c          |   8 +++
> >  files.c              | 183 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >  include/cr_options.h |   1 +
> >  include/files.h      |   8 +++
> >  pipes.c              |  33 +++++++++-
>
> Does it work for only pipes now?
>
> >  util.c               |   9 +++

Although my immediate use case and focus has been pipes, it also works
with files.  For example, below is how we can change stdout of test.sh
(http://criu.org/Simple_loop) during restore:

# ./test.sh > /tmp/old &
[1] 31732
# criu dump -D /tmp/criu_img -o dump.log -v4 -j --inherit-fd /tmp/old
-t 31732 && echo OK
[1]+  Killed                  ./test.sh > /tmp/old
OK
# criu restore -D /tmp/criu_img -o restore.log -v4 -j -d --inherit-fd
'fd[1]:tmp/old' > /tmp/new
# echo $?
0
# tail -f /tmp/new
Tue Nov 25 13:52:55 PST 2014
Tue Nov 25 13:52:56 PST 2014
...


> I would like to have tests for this functionality.

Absolutely.   As you've seen in the code, the changes to core criu
logic have been kept to a minimum.  But we still need regression tests
and any help in this area would be greatly appreciated.


> >  7 files changed, 255 insertions(+), 2 deletions(-)
> >
> > diff --git a/crtools.c b/crtools.c
> > index 0ac667c..d7bb6c8 100644
> > --- a/crtools.c
> > +++ b/crtools.c
> > @@ -52,6 +52,7 @@ void init_opts(void)
> >       INIT_LIST_HEAD(&opts.veth_pairs);
> >       INIT_LIST_HEAD(&opts.scripts);
> >       INIT_LIST_HEAD(&opts.ext_mounts);
> > +     INIT_LIST_HEAD(&opts.inherit_fds);
> >       INIT_LIST_HEAD(&opts.new_cgroup_roots);
> >
> >       opts.cpu_cap = CPU_CAP_DEFAULT;
> > @@ -187,6 +188,7 @@ int main(int argc, char *argv[], char *envp[])
> >               { "exec-cmd", no_argument, 0, 1059},
> >               { "manage-cgroups", no_argument, 0, 1060},
> >               { "cgroup-root", required_argument, 0, 1061},
> > +             { "inherit-fd", required_argument, 0, 1062},
> >               { },
> >       };
> >
> > @@ -392,6 +394,10 @@ int main(int argc, char *argv[], char *envp[])
> >                                       return -1;
> >                       }
> >                       break;
> > +             case 1062:
> > +                     if (inherit_fd_add(optarg) < 0)
> > +                             return 1;
> > +                     break;
> >               case 'M':
> >                       {
> >                               char *aux;
> > @@ -469,6 +475,9 @@ int main(int argc, char *argv[], char *envp[])
> >       if (log_init(opts.output))
> >               return 1;
> >
> > +     /* log the inherit fd list now that log is ready */
> > +     inherit_fd_log();
> > +
> >       if (opts.img_parent)
> >               pr_info("Will do snapshot from %s\n", opts.img_parent);
> >
> > @@ -491,6 +500,12 @@ int main(int argc, char *argv[], char *envp[])
> >               if (tree_id)
> >                       pr_warn("Using -t with criu restore is obsoleted\n");
> >
> > +             if (inherit_fd_validate_fds() < 0) {
> > +                     pr_msg("Some inherit fd items are invalid\n");
> > +                     pr_msg("See log file for details\n");
> > +                     return 1;
> > +             }
> > +
> >               ret = cr_restore_tasks();
> >               if (ret == 0 && opts.exec_cmd) {
> >                       close_pid_proc();
> > diff --git a/files-reg.c b/files-reg.c
> > index 20b50a0..bfd1896 100644
> > --- a/files-reg.c
> > +++ b/files-reg.c
> > @@ -962,6 +962,14 @@ int open_path(struct file_desc *d,
> >       }
> >
> >       mntns_root = mntns_get_root_by_mnt_id(rfi->rfe->mnt_id);
> > +
> > +pr_debug(">>> open_path(): rfi->path=(%s)\n", rfi->path);
>
> ^^^^ I think you forgot to remove this message

Yes, sorry about this :)


> > +     tmp = inherit_fd_lookup_id(rfi->path, true);
> > +     if (tmp >= 0) {
> > +             pr_info("File %s will be restored from fd %d\n", rfi->path, tmp);
> > +             return tmp;
> > +     }
> > +
> >       tmp = open_cb(mntns_root, rfi, arg);
> >       if (tmp < 0) {
> >               pr_perror("Can't open file %s", rfi->path);
> > diff --git a/files.c b/files.c
> > index f009e7d..7bace3a 100644
> > --- a/files.c
> > +++ b/files.c
> > @@ -37,6 +37,7 @@
> >  #include "imgset.h"
> >  #include "fs-magic.h"
> >  #include "proc_parse.h"
> > +#include "cr_options.h"
> >
> >  #include "parasite.h"
> >  #include "parasite-syscall.h"
> > @@ -1176,3 +1177,185 @@ int shared_fdt_prepare(struct pstree_item *item)
> >
> >       return 0;
> >  }
> > +
> > +/*
> > + * Inherit fd support.
> > + *
> > + * There are cases where a process's file descriptor cannot be restored
> > + * from the checkpointed image.  For example, a pipe file descriptor with
> > + * one end in the checkpointed process and the other end in a separate
> > + * process (that was not part of the checkpointed process tree) cannot be
> > + * restored because after checkpoint the pipe would be broken and removed.
> > + * In these cases, criu's caller should set up a new file descriptor to be
> > + * inherited by the restored process.
> > + *
> > + * When checkpointing, the argument of --inherit-fd is a string that
> > + * identifies the file.  It can be obtained from the target of the
> > + * symbolic link in /proc/<pid>/fd (for example, pipe:[234956]).
> > + *
> > + * When restoring, the argument of --inherit-fd has the form fd[%d]:%s,
> > + * where %d tells criu which file descriptor to use for restoring %s.
> > + *
> > + * Note that while it's OK to omit --inherit-fd for checkpoint and
> > + * specify it only for restore, it's better to always specify it because
> > + * checkpoint code will not attempt to dump an inherit fd file.
> > + *
> > + * In the following example from http://criu.org/Simple_loop, we first
> > + * redirect the output of test.sh to /tmp/old and then use criu's file
> > + * descriptor 7 to change /tmp/old with /tmp/new:
> > + *
> > + *     $ ./test.sh > /tmp/old
> > + *     $ sudo criu dump -j -t <pid>
> > + *     $ sudo criu restore -j --inherit-fd 'fd[7]:tmp/old' 7> /tmp/new
> > + *
> > + * Notice that the path in the argument of --inherit-fd is relative to the
> > + * root of the process (i.e., tmp/old).
> > + *
> > + * Below is an example of how criu will be called from Docker to checkpoint
> > + * and restore a container running in detached mode:
> > + *
> > + *     $ docker run -d ...
> > + *     $ docker checkpoint <container-id>
> > + *       |
> > + *       +---> criu dump ... --inherit-fd pipe:[686787] pipe:[686788]
> > + *     $ docker restore <container-id>
> > + *       |
> > + *       +---> criu restore ... --inherit-fd fd[1]:pipe:[686787] fd[2]:pipe:[686788]
> > + */
> > +
> > +struct inherit_fd {
> > +     char *inh_id;           /* identifier */
> > +     int inh_fd;             /* file descriptor to restore from */
> > +     struct list_head l;
> > +};
> > +
> > +/*
> > + * We can't print diagnostics messages in this function because the
> > + * log file isn't initialized yet.
> > + */
> > +int inherit_fd_add(char *optarg)
> > +{
> > +     char *cp;
> > +     int n;
> > +     int fd;
> > +     struct inherit_fd *inh;
> > +
> > +     /* beding debug support */
> > +     fd = -1;
> > +     n = sscanf(optarg, "debug[%d]:", &fd);
>
> I say nothing about debug[%d]:fd in the commit message.

I didn't say anything in the commit message because it's debug aid and
I wasn't sure whether we want to keep it or not.  It's actually a nice
feature and if we decide to keep it, I'll document it.


> > +     if (n == 1) {
> > +             /*
> > +              * If the argument has the format debug[%d]:%s, it means
> > +              * just write out the string after colon to the file
> > +              * descriptor %d.  This can be used to leave a restore
> > +              * marker in the output stream of the process.
> > +              */
> > +             if (fd < 0) {
> > +                     pr_msg("Invalid fd number %d in argument %s\n",
>
> pr_err

Will change.


> > +                             fd, optarg);
> > +                     return -1;
> > +             }
> > +             cp = strchr(optarg, ':') + 1;
> > +             n = strlen(cp);
> > +             if (write(fd, cp, n) != n) {
> > +                     pr_msg("Cannot write debug message %s to fd %d\n",
> > +                             cp, fd);
> > +                     return -1;
> > +             }
> > +             return 0;
> > +     }
> > +     /* end debug support */
> > +
> > +     fd = -1;
> > +     n = sscanf(optarg, "fd[%d]:", &fd);
> > +     if (n == 1) {
> > +             /* argument is in fd[%d]:%s format */
> > +             if (fd < 0) {
> > +                     pr_msg("Invalid fd number %d in argument %s\n", fd, optarg);
> > +                     return -1;
> > +             }
> > +             cp = strchr(optarg, ':');
> > +             *cp++ = '\0';
> > +     } else {
> > +             /* argument is in %s format */
> > +             cp = optarg;
> > +     }
> > +
> > +     if ((inh = xmalloc(sizeof *inh)) == NULL)
> > +             return -1;
> > +     inh->inh_id = cp;
> > +     inh->inh_fd = fd;
> > +     list_add_tail(&inh->l, &opts.inherit_fds);
> > +     return 0;
> > +}
> > +
> > +/*
> > + * Log the inherit fd list.  Called for diagnostics purposes
> > + * after the log file is initialized.
> > + */
> > +void inherit_fd_log(void)
> > +{
> > +     struct inherit_fd *inh;
> > +
> > +     list_for_each_entry(inh, &opts.inherit_fds, l) {
> > +             if (inh->inh_fd < 0) {
> > +                     pr_info("File %s will be marked as inherit fd\n",
> > +                             inh->inh_id);
> > +             } else {
> > +                     pr_info("File %s will be restored from criu's fd %d\n",
> > +                             inh->inh_id, inh->inh_fd);
> > +             }
> > +     }
> > +}
> > +
> > +/*
> > + * Validate that all entries in the inherit fd list have a valid
> > + * file descriptor.
> > + */
> > +int inherit_fd_validate_fds(void)
> > +{
> > +     struct inherit_fd *inh;
> > +
> > +     list_for_each_entry(inh, &opts.inherit_fds, l) {
> > +             if (inh->inh_fd < 0)
> > +                     return -1;
> > +     }
> > +     return 0;
> > +}
> > +
> > +/*
> > + * Look up the inherit fd list by a file identifier.
> > + *
> > + * If need_fd is true, the caller wants the corresponding mapping fd.
> > + * Otherwise, the caller just wants to know if the file is in the list.
> > + */
> > +int inherit_fd_lookup_id(char *id, bool need_fd)
> > +{
> > +     struct inherit_fd *inh;
> > +
> > +     list_for_each_entry(inh, &opts.inherit_fds, l) {
> > +             if (!strcmp(inh->inh_id, id)) {
> > +                     pr_info("File identifier %s in inherit fd list\n", id);
> > +                     return need_fd ? inh->inh_fd : 0;
> > +             }
> > +     }
> > +
> > +     return -1;
> > +}
> > +
> > +/*
> > + * Look up the inherit fd list by a file descriptor.
> > + */
> > +bool inherit_fd_lookup_fd(int fd)
> > +{
> > +     struct inherit_fd *inh;
> > +
> > +     list_for_each_entry(inh, &opts.inherit_fds, l) {
> > +             if (inh->inh_fd == fd) {
> > +                     pr_info("File descriptor %d in inherit fd list\n", fd);
> > +                     return true;
> > +             }
> > +     }
> > +
> > +     return false;
> > +}
> > diff --git a/include/cr_options.h b/include/cr_options.h
> > index a9f9e92..3735b09 100644
> > --- a/include/cr_options.h
> > +++ b/include/cr_options.h
> > @@ -43,6 +43,7 @@ struct cr_options {
> >       struct list_head        veth_pairs;
> >       struct list_head        scripts;
> >       struct list_head        ext_mounts;
> > +     struct list_head        inherit_fds;
> >       char                    *libdir;
> >       bool                    use_page_server;
> >       unsigned short          ps_port;
> > diff --git a/include/files.h b/include/files.h
> > index 4c9300d..e4e304f 100644
> > --- a/include/files.h
> > +++ b/include/files.h
> > @@ -168,4 +168,12 @@ extern struct collect_image_info ext_file_cinfo;
> >  extern int dump_unsupp_fd(struct fd_parms *p, int lfd,
> >                         struct cr_img *, char *more, char *info);
> >
> > +#define INHERIT_FD 0xcafebabe                /* marker */
> > +
> > +extern int inherit_fd_add(char *optarg);
> > +extern void inherit_fd_log(void);
> > +extern int inherit_fd_validate_fds(void);
> > +extern int inherit_fd_lookup_id(char *id, bool need_fd);
> > +extern bool inherit_fd_lookup_fd(int fd);
> > +
> >  #endif /* __CR_FILES_H__ */
> > diff --git a/pipes.c b/pipes.c
> > index ddcc5cf..73bc790 100644
> > --- a/pipes.c
> > +++ b/pipes.c
> > @@ -293,8 +293,20 @@ static int recv_pipe_fd(struct pipe_info *pi)
> >       return fd;
> >  }
> >
> > +static char *pipe_id_string(int pipe_id)
> > +{
> > +     static char idstr[16];
> > +
> > +     if (snprintf(idstr, sizeof idstr, "pipe:[%d]", pipe_id) >= sizeof idstr) {
> > +             pr_err("Not enough room for pipe %d identifier string\n", pipe_id);
> > +             return NULL;
> > +     }
> > +     return idstr;
> > +}
> > +
> >  static int open_pipe(struct file_desc *d)
> >  {
> > +     char *pipe_name;
> >       struct pipe_info *pi, *p;
> >       int ret, tmp;
> >       int pfd[2];
> > @@ -304,6 +316,18 @@ static int open_pipe(struct file_desc *d)
> >
> >       pr_info("\t\tCreating pipe pipe_id=%#x id=%#x\n", pi->pe->pipe_id, pi->pe->id);
> >
> > +     pipe_name = pipe_id_string(pi->pe->pipe_id);
> > +     tmp = inherit_fd_lookup_id(pipe_name, true);
> > +     if (tmp >= 0) {
> > +             pr_info("Pipe %s will be restored from inherit fd %d\n",
> > +                     pipe_name, tmp);
> > +             return tmp;
> > +     }
> > +     if (pi->pe->flags == INHERIT_FD) {
> > +             pr_err("No inherit fd for pipe %s\n", pipe_name);
> > +             return -1;
> > +     }
> > +
> >       if (!pi->create)
> >               return recv_pipe_fd(pi);
> >
> > @@ -497,13 +521,18 @@ static int dump_one_pipe(int lfd, u32 id, const struct fd_parms *p)
> >
> >       pe.id           = id;
> >       pe.pipe_id      = pipe_id(p);
> > -     pe.flags        = p->flags;
> >       pe.fown         = (FownEntry *)&p->fown;
> > +     if (inherit_fd_lookup_id(pipe_id_string(pe.pipe_id), false) < 0)
> > +             pe.flags = p->flags;
> > +     else {
> > +             pe.flags = INHERIT_FD;
> > +             pr_info("Pipe %d marked as inherit fd\n", pe.pipe_id);
> > +     }
> >
> >       if (pb_write_one(img_from_set(glob_imgset, CR_FD_PIPES), &pe, PB_PIPE))
> >               return -1;
> >
> > -     return dump_one_pipe_data(&pd_pipes, lfd, p);
> > +     return pe.flags == INHERIT_FD ? 0 : dump_one_pipe_data(&pd_pipes, lfd, p);
> >  }
> >
> >  const struct fdtype_ops pipe_dump_ops = {
> > diff --git a/util.c b/util.c
> > index dd76863..3b468f2 100644
> > --- a/util.c
> > +++ b/util.c
> > @@ -41,6 +41,7 @@
> >  #include "cr_options.h"
> >  #include "servicefd.h"
> >  #include "cr-service.h"
> > +#include "files.h"
> >
> >  #define VMA_OPT_LEN  128
> >
> > @@ -93,6 +94,14 @@ void pr_vma(unsigned int loglevel, const struct vma_area *vma_area)
> >  int close_safe(int *fd)
> >  {
> >       int ret = 0;
> > +
> > +     /*
> > +      * Don't close files descriptors that criu's caller
> > +      * set up to be inherited by the restored process.
> > +      */
> > +     if (inherit_fd_lookup_fd(*fd))
> > +             return 0;
>
> I think it's a bad place for this check. close_safe() is a generic
> function for closing file descriptors.
>
> For example one of patterns how it's used
>
>         int fd = -1;
>         if (smth)
>                 goto err;
>
>         fd = open();
>         if (fd < 0)
>                 goto err;
>
>         ...
>
> err:
>         close_safe(&fd)
>         return -1;

If open succeeds, fd will be a valid descriptor different from any
inherit fds.  Therefore, close_safe() would close it as expected.  Did
I miss something here?

--Saied


More information about the CRIU mailing list