[CRIU] Patch for unnamed unix sockets

Pavel Emelyanov xemul at parallels.com
Wed Jul 22 03:11:17 PDT 2015


On 07/21/2015 12:56 PM, artem.kuzmitskiy at lge.com wrote:
> Hi all,

Welcome :)

> This patch for dumping and restoring unnamed unix socket, details about feature
> you can find -> http://criu.org/External_UNIX_socket#What_to_do_with_socketpair.28.29-s.3F.

Cool! Thanks for the feature, it's really tempting.

> Review please.

Yup, I have several comments, please find them inline.

First of all, please, split the patch into smaller pieces. Minimally I
would suggest having separate patches for -x option extension (dump part)
and --inherit-fd support for sockets (restore part) and for test. I.e.
3 patches. More suggestions for separate patches are inline.

>>From a41fde482f5cb0a08191b4fcc05f04a0a67f77f6 Mon Sep 17 00:00:00 2001
> From: Artem Kuzmitskiy <artem.kuzmitskiy at lge.com>
> Date: Tue, 21 Jul 2015 12:15:21 +0300
> Subject: [PATCH] Added functionality for dumping\restoring unnamed unix
>  sockets.

Please, format the e-mail(s) so that it doesn't contain trash. The example
of patch message is here:
http://criu.org/How_to_submit_patches#An_example_of_patch_message

>     When we call CRIU with dump option, for unnamed socket we should pass it inode
>     into --ext-unix-sk in next format: socket[<inode_value>]. When we call
>     CRIU with restore option, we using inherit functionality and pass socket's
>     inode in same format(put socket instead of pipe).
>     Details about this problem described in
>     http://criu.org/External_UNIX_socket#What_to_do_with_socketpair.28.29-s.3F.
>     Simple example available in test/socketpair directory.
>     Usage example:
>     For dump, criu dump -D images -o dump.log -v4 -x socket:[4529709] -t 13506

I guess the "socket:" part is excessive here, isn't it? We know that we're
dumping the socket, so why writing it one more time? :)

>     For restore, criu restore -d -D images -o restore.log --pidfile restore.pid \
>                  -v4 -x --inherit-fd fd[3]:socket:[4529709]
> 
> Signed-off-by: Artem Kuzmitskiy <artem.kuzmitskiy at lge.com>
> ---
>  crtools.c                     |   6 +-
>  files.c                       |  19 +-
>  include/cr_options.h          |   1 +
>  include/files.h               |   2 +-
>  include/sockets.h             |   3 +
>  lib/criu.h                    |   9 +
>  sk-unix.c                     | 144 ++++++++--
>  test/libcriu/run.sh           |   1 +
>  test/socketpairs/Makefile     |  12 +
>  test/socketpairs/socketpair.c | 595 ++++++++++++++++++++++++++++++++++++++++++
>  10 files changed, 768 insertions(+), 24 deletions(-)
>  create mode 100644 test/socketpairs/Makefile
>  create mode 100644 test/socketpairs/socketpair.c
> 
> diff --git a/crtools.c b/crtools.c
> index b085d33..1012426 100644
> --- a/crtools.c
> +++ b/crtools.c
> @@ -50,6 +50,7 @@ void init_opts(void)
> 
>         /* Default options */
>         opts.final_state = TASK_DEAD;
> +       INIT_LIST_HEAD(&opts.ext_unixsk_ids);
>         INIT_LIST_HEAD(&opts.veth_pairs);
>         INIT_LIST_HEAD(&opts.scripts);
>         INIT_LIST_HEAD(&opts.ext_mounts);

Tabs were corrupted into spaces while sending the patch :(

> @@ -184,7 +185,7 @@ int main(int argc, char *argv[], char *envp[])
>         int log_level = LOG_UNSET;
>         char *imgs_dir = ".";
>         char *work_dir = NULL;
> -       static const char short_opts[] = "dSsRf:F:t:p:hcD:o:n:v::xVr:jlW:L:M:";
> +       static const char short_opts[] = "dSsRf:F:t:p:hcD:o:n:v::x::Vr:jlW:L:M:";
>         static struct option long_opts[] = {
>                 { "tree",                       required_argument,      0, 't'  },
>                 { "pid",                        required_argument,      0, 'p'  },
> @@ -201,7 +202,7 @@ int main(int argc, char *argv[], char *envp[])
>                 { "log-file",                   required_argument,      0, 'o'  },
>                 { "namespaces",                 required_argument,      0, 'n'  },
>                 { "root",                       required_argument,      0, 'r'  },
> -               { USK_EXT_PARAM,                no_argument,            0, 'x'  },
> +               { USK_EXT_PARAM,                optional_argument,      0, 'x'  },

Plz, update the help text below. And add the "adding the necessary bits to the RPC
code" task to your todo list :)

>                 { "help",                       no_argument,            0, 'h'  },
>                 { SK_EST_PARAM,                 no_argument,            0, 1042 },
>                 { "close",                      required_argument,      0, 1043 },
> @@ -278,6 +279,7 @@ int main(int argc, char *argv[], char *envp[])
>                         opts.final_state = TASK_ALIVE;
>                         break;
>                 case 'x':
> +                       if (optarg && unix_sk_ids_parse(optarg) < 0) return 1;

Styling: if/while bodies should go on separate line.

>                         opts.ext_unix_sk = true;
>                         break;
>                 case 'p':
> diff --git a/files.c b/files.c
> index 3e69be7..6c3472c 100644
> --- a/files.c
> +++ b/files.c
> @@ -1381,6 +1381,19 @@ static int inherit_fd_lookup_id(char *id)
>         return ret;
>  }
> 
> +bool inherit_fd_lookup_desc(struct file_desc *d)
> +{
> +       char buf[32], *id_str;
> +
> +       if (!d->ops->name)
> +               return -1;
> +
> +       id_str = d->ops->name(d, buf, sizeof(buf));
> +       int ret = inherit_fd_lookup_id(id_str);
> +
> +       return (ret < 0 ? false : true);
> +}
> +
>  bool inherited_fd(struct file_desc *d, int *fd_p)
>  {
>         char buf[32], *id_str;
> @@ -1398,10 +1411,12 @@ bool inherited_fd(struct file_desc *d, int *fd_p)
>                 return true;
> 
>         *fd_p = dup(i_fd);
> -       if (*fd_p < 0)
> +       if (*fd_p < 0) {
>                 pr_perror("Inherit fd DUP failed");
> +               return false;
> +       }
>         else
> -               pr_info("File %s will be restored from fd %d duped "
> +               pr_info("File %s will be restored from fd %d dumped "
>                                 "from inherit fd %d\n", id_str, *fd_p, i_fd);

This hunk deserves separate patch with comment why you think that reporting
dup() failure explicitly is nice.

>         return true;
>  }
> diff --git a/include/cr_options.h b/include/cr_options.h
> index 9ab8bba..62233c3 100644
> --- a/include/cr_options.h
> +++ b/include/cr_options.h
> @@ -45,6 +45,7 @@ struct cr_options {
>         };
>         bool                    restore_sibling;
>         bool                    ext_unix_sk;
> +       struct list_head        ext_unixsk_ids;
>         bool                    shell_job;
>         bool                    handle_file_locks;
>         bool                    tcp_established_ok;
> diff --git a/include/files.h b/include/files.h
> index db7e108..25d69ad 100644
> --- a/include/files.h
> +++ b/include/files.h
> @@ -174,7 +174,7 @@ extern int inherit_fd_add(int fd, char *key);
>  extern void inherit_fd_log(void);
>  extern int inherit_fd_resolve_clash(int fd);
>  extern int inherit_fd_fini(void);
> -
> +extern bool inherit_fd_lookup_desc(struct file_desc *);
>  extern bool inherited_fd(struct file_desc *, int *fdp);
> 
>  #endif /* __CR_FILES_H__ */
> diff --git a/include/sockets.h b/include/sockets.h
> index a3010e1..deb00a3 100644
> --- a/include/sockets.h
> +++ b/include/sockets.h
> @@ -60,6 +60,9 @@ extern int inet_collect_one(struct nlmsghdr *h, int family, int type);
>  extern int unix_receive_one(struct nlmsghdr *h, void *);
>  extern int netlink_receive_one(struct nlmsghdr *hdr, void *arg);
> 
> +extern int unix_sk_ids_parse(char *optarg);
> +extern int unix_sk_id_add(ino_t ino);
> +
>  extern int do_dump_opt(int sk, int level, int name, void *val, int len);
>  #define dump_opt(s, l, n, f)   do_dump_opt(s, l, n, f, sizeof(*f))
>  extern int do_restore_opt(int sk, int level, int name, void *val, int len);
> diff --git a/lib/criu.h b/lib/criu.h
> index 1655c02..0711313 100644
> --- a/lib/criu.h
> +++ b/lib/criu.h
> @@ -22,11 +22,16 @@
>  #include <stdbool.h>
>  #include "rpc.pb-c.h"
> 
> +#ifdef __GNUG__
> +extern "C" {
> +#endif

Please, describe this change and send as separate patch.

> +
>  enum criu_service_comm {
>         CRIU_COMM_SK,
>         CRIU_COMM_FD
>  };
> 
> +
>  void criu_set_service_address(char *path);
>  void criu_set_service_fd(int fd);
> 
> @@ -188,4 +193,8 @@ int criu_local_restore(criu_opts *opts);
>  int criu_local_restore_child(criu_opts *opts);
>  int criu_local_dump_iters(criu_opts *opts, int (*more)(criu_predump_info pi));
> 
> +#ifdef __GNUG__
> +}
> +#endif
> +
>  #endif /* __CRIU_LIB_H__ */
> diff --git a/sk-unix.c b/sk-unix.c
> index 6c9ec25..203a6f4 100644
> --- a/sk-unix.c
> +++ b/sk-unix.c
> @@ -65,6 +65,11 @@ struct unix_sk_listen_icon {
>         struct unix_sk_listen_icon      *next;
>  };
> 
> +struct  unix_sk_exception {
> +       struct list_head unix_sk_list;
> +       ino_t unix_sk_ino;
> +};
> +
>  #define SK_HASH_SIZE           32
> 
>  static struct unix_sk_listen_icon *unix_listen_icons[SK_HASH_SIZE];
> @@ -129,6 +134,22 @@ static int can_dump_unix_sk(const struct unix_sk_desc *sk)
>         return 1;
>  }
> 
> +static bool unix_sk_exception_lookup_id(ino_t ino)
> +{
> +       bool ret = false;
> +        struct unix_sk_exception *sk;
> +
> +        list_for_each_entry(sk, &opts.ext_unixsk_ids, unix_sk_list) {
> +                if (sk->unix_sk_ino == ino) {
> +                       pr_debug("Found ino %u in exception unix sk list\n", (unsigned int)ino);
> +                        ret = true;
> +                       break;
> +               }
> +        }
> +
> +       return ret;
> +}
> +
>  static int write_unix_entry(struct unix_sk_desc *sk)
>  {
>         int ret;
> @@ -559,16 +580,22 @@ static int dump_external_sockets(struct unix_sk_desc *peer)
>                                 return -1;
>                         }
> 
> -                       if (peer->type != SOCK_DGRAM) {
> -                               show_one_unix("Ext stream not supported", peer);
> -                               pr_err("Can't dump half of stream unix connection.\n");
> -                               return -1;
> +                       if (!peer->name && unix_sk_exception_lookup_id(sk->sd.ino)) {
> +                               pr_debug("found exception for unix name-less external socket.\n");
>                         }
> +                       else {

I don't quite get this if. I thought that ANY socket specified with -x socket:...
will be treated as external, why checking for !peer->name here?

> +                               if (peer->type != SOCK_DGRAM) {
> +                                       show_one_unix("Ext stream not supported", peer);
> +                                       pr_err("Can't dump half of stream unix connection.\n");
> +                                       return -1;
> +                               }
> 
> -                       if (!peer->name) {
> -                               show_one_unix("Ext dgram w/o name", peer);
> -                               pr_err("Can't dump name-less external socket.\n");
> -                               return -1;
> +                               if (!peer->name) {
> +                                       show_one_unix("Ext dgram w/o name", peer);
> +                                       pr_err("Can't dump name-less external socket.\n");
> +                                       pr_err("%d\n", sk->fd);
> +                                       return -1;
> +                               }
>                         }
>                 } else if (ret < 0)
>                         return -1;
> @@ -691,21 +718,24 @@ static int post_open_unix_sk(struct file_desc *d, int fd)
>         if (ui->ue->uflags & USK_CALLBACK)
>                 return 0;
> 
> -       pr_info("\tConnect %#x to %#x\n", ui->ue->ino, peer->ue->ino);
> -
>         /* Skip external sockets */
>         if (!list_empty(&peer->d.fd_info_head))
>                 futex_wait_while(&peer->prepared, 0);
> 
> -       memset(&addr, 0, sizeof(addr));
> -       addr.sun_family = AF_UNIX;
> -       memcpy(&addr.sun_path, peer->name, peer->ue->name.len);
> +       if (!inherit_fd_lookup_desc(d)) {

It was already looked-up in open_unix_sk(). Put a flag onto unix_sk_info 
object in the ->open and check one here. We have plenty of free bits on 
the unix_sk_info.flags.

> 
> -       if (connect(fd, (struct sockaddr *)&addr,
> -                               sizeof(addr.sun_family) +
> -                               peer->ue->name.len) < 0) {
> -               pr_perror("Can't connect %#x socket", ui->ue->ino);
> -               return -1;
> +               memset(&addr, 0, sizeof(addr));
> +               addr.sun_family = AF_UNIX;
> +               memcpy(&addr.sun_path, peer->name, peer->ue->name.len);
> +
> +               pr_info("\tConnect %#x to %#x\n", ui->ue->ino, peer->ue->ino);
> +
> +               if (connect(fd, (struct sockaddr *)&addr,
> +                                       sizeof(addr.sun_family) +
> +                                       peer->ue->name.len) < 0) {
> +                       pr_perror("Can't connect %#x socket", ui->ue->ino);
> +                       return -1;
> +               }
>         }
> 
>         if (restore_sk_queue(fd, peer->ue->id))
> @@ -981,7 +1011,15 @@ static int open_unix_sk(struct file_desc *d)
>         struct unix_sk_info *ui;
> 
>         ui = container_of(d, struct unix_sk_info, d);
> -       if (ui->flags & USK_PAIR_MASTER)
> +
> +       if (inherit_fd_lookup_desc(d)) {
> +               int sk = -1;
> +               if (inherited_fd(d, &sk))

Presumably single call to inherit_fd_lookup_desc(), without
inherit_fd_lookup_desc() in advance, would be enough.

> +                       return sk;
> +               else
> +                       return -1;
> +       }
> +       else if (ui->flags & USK_PAIR_MASTER)
>                 return open_unixsk_pair_master(ui);
>         else if (ui->flags & USK_PAIR_SLAVE)
>                 return open_unixsk_pair_slave(ui);
> @@ -989,11 +1027,27 @@ static int open_unix_sk(struct file_desc *d)
>                 return open_unixsk_standalone(ui);
>  }
> 
> +static char *socket_d_name(struct file_desc *d, char *buf, size_t s)
> +{
> +       struct unix_sk_info *ui;
> +
> +       ui = container_of(d, struct unix_sk_info, d);
> +
> +       if (snprintf(buf, s, "socket:[%d]", ui->ue->ino) >= s) {
> +               pr_err("Not enough room for unixsk %d identifier string\n",
> +                               ui->ue->ino);
> +               return NULL;
> +       }
> +
> +       return buf;
> +}
> +
>  static struct file_desc_ops unix_desc_ops = {
>         .type = FD_TYPES__UNIXSK,
>         .open = open_unix_sk,
>         .post_open = post_open_unix_sk,
>         .want_transport = unixsk_should_open_transport,
> +       .name = socket_d_name,
>  };
> 
>  static int collect_one_unixsk(void *o, ProtobufCMessage *base)
> @@ -1105,3 +1159,55 @@ int resolve_unix_peers(void)
>         return 0;
>  }
> 
> +int unix_sk_ids_parse(char *optarg)
> +{
> +       /* parsing option of the following form: --ext-unix-sk=socket:[<inode value>],
> +        * socket:[<inode value>]... or short form -x socket:[<inode value>],
> +        * socket:[<inode value>]...*/

Style: comments like

/*
 * text
 * text
 */

are preferred over

/* text
 * text */

> +
> +       char *iter = optarg;
> +       char *token = NULL;
> +       int success = 0;
> +       const char *keyword= "socket:[";
> +       while (iter != NULL){

Style: space between ) and {.

> +               if ((token = strstr(iter, keyword)) != NULL) {
> +                       token += strlen(keyword);
> +                       char *begin = token;
> +                       char *end = strchr(token, ']');
> +                       if (end == NULL) {
> +                               success = 0;
> +                               break;
> +                       }
> +                       iter = end;
> +                       ino_t ino = (ino_t)strtoul(begin, &end, 10);
> +                       if (ino > 0 && unix_sk_id_add(ino) > -1)
> +                               success++;
> +                       else {
> +                               success = 0;
> +                               break;
> +                       }
> +               }
> +               else break;
> +       }
> +
> +       if (!success){
> +               pr_err("Can't parse unix socket inode from optarg: %s\n", optarg);
> +               return -1;
> +       }
> +
> +       return 0;
> +}
> +
> +int unix_sk_id_add(ino_t ino)

This routine should be static.

> +{
> +       struct unix_sk_exception *unix_sk;
> +
> +       /* TODO: may validate inode here, but how?*/
> +
> +       unix_sk = xmalloc(sizeof *unix_sk);
> +       if (unix_sk == NULL) return -1;
> +       unix_sk->unix_sk_ino = ino;
> +       list_add_tail(&unix_sk->unix_sk_list, &opts.ext_unixsk_ids);
> +
> +       return 0;
> +}
> diff --git a/test/libcriu/run.sh b/test/libcriu/run.sh
> index e38c76f..d97c518 100755
> --- a/test/libcriu/run.sh
> +++ b/test/libcriu/run.sh
> @@ -43,5 +43,6 @@ run_test test_errno
> 
>  echo "== Stopping service"
>  kill -TERM $(cat wdir/s/pidfile)
> +unlink libcriu.so.1

Plz, describe this change.

>  [ $RESULT -eq 0 ] && echo "Success" || echo "FAIL"
>  exit $RESULT
> diff --git a/test/socketpairs/Makefile b/test/socketpairs/Makefile

Huge thanks for the test! Plz, add one to the test/Makefile's TEST
variable to that the other: target runs one. In this case it will
be auto-tested dayly and we'll find breakges early.

> new file mode 100644
> index 0000000..19330cf
> --- /dev/null
> +++ b/test/socketpairs/Makefile

That's it.

-- Pavel



More information about the CRIU mailing list