[CRIU] [PATCH v2] rework criu check logic
Saied Kazemi
saied at google.com
Thu Mar 10 13:01:18 PST 2016
Pavel and Adrian,
Please take a look at this patch and let me know what you think.
As you will see, I have changed some pr_error() to pr_warn() when calling
"criu check --extra" because we're saying that the extra features do not
necessarily cause dump/restore failures. But we can't change all
pr_error() calls to pr_warn() indiscriminately in the functions that are
called by "criu check --extra" because some of them are real errors.
Another item of discussion is the exit code. Obviously, if there are no
errors or warnings the exit code has to be zero. But what if there's a
warning on an extra or specific feature? Currently, the exit code is
non-zero but one can argue that for warnings it should still be zero.
The current patch is a definite improvement (because it was broken) but to
make it "perfect" will require more time and iterations. It's also very
difficult to test all permutations.
--Saied
On Thu, Mar 10, 2016 at 12:31 PM, Saied Kazemi <saied at google.com> wrote:
> The "criu check" command to check if the kernel is properly configured
> to run criu is broken.
>
> The "criu check --ms" command used to be the way to tell criu to check
> only for features that have been merged upstream. But recent kernels
> have a set of features whose presence doesn't necessarily mean that
> dump or restore will fail but rather *may* fail depending on whether
> the process tree uses those features.
>
> This patch deprecates --ms and introduces --extra, --experimental,
> and --all. See "criu --help" or "man criu" for more info.
>
> Typical use cases are:
>
> $ sudo criu check
> Looks good.
> $ echo $?
> 0
>
> $ sudo criu check --extra
> Looks good.
>
> $ sudo criu check --extra
> <one or more warnings...>
> Looks good but some kernel features are missing
> which, depending on your process tree, may cause
> dump or restore failure.
> $ echo $?
> 1
>
> $ sudo criu check --feature list
> mnt_id aio_remap timerfd tun userns fdinfo_lock seccomp_suspend \
> seccomp_filters loginuid cgroupns
>
> $ sudo criu check --feature mnt_id
> Warn (cr-check.c:283): fdinfo doesn't contain the mnt_id field
> $ echo $?
> 1
>
> $ sudo criu check --feature tun
> tun is supported
> $ echo $?
> 0
>
> Signed-off-by: Saied Kazemi <saied at google.com>
> ---
> Documentation/criu.txt | 48 ++++++++--
> criu/cr-check.c | 217
> ++++++++++++++++++++++++++--------------------
> criu/cr-service.c | 3 -
> criu/crtools.c | 40 +++++++--
> criu/include/cr_options.h | 3 +-
> criu/timerfd.c | 2 +-
> 6 files changed, 196 insertions(+), 117 deletions(-)
>
> diff --git a/Documentation/criu.txt b/Documentation/criu.txt
> index e7ba6dd..bd246bf 100644
> --- a/Documentation/criu.txt
> +++ b/Documentation/criu.txt
> @@ -194,7 +194,7 @@ In other words, do not use it until really needed.
>
> *-j*, *--shell-job*::
> Allow one to dump shell jobs. This implies the restored task will
> inherit session and
> - process group ID from the criu itself. Also this option allows one to
> migrate a
> + process group ID from the *criu* itself. Also this option allows one
> to migrate a
> single external tty connection, in other words this option allows one
> to migrate
> such application as "top" and friends. If passed on *dump* it must be
> specified on *restore* as well.
> @@ -267,7 +267,7 @@ The '<mode>' may be one of below.
> *--ext-mount-map* *auto*::
> This is a special case. If this flag is passed, when an external
> mount is missing from the command line '*--ext-mount-map*
> <KEY>:<VAL>' syntax,
> - criu attempts to automatically resolve this mount from its namespace.
> + *criu* attempts to automatically resolve this mount from its
> namespace.
>
> *--enable-external-sharing*, *--enable-external-masters*::
> These flags enable external shared or slave mounts to be resolved
> @@ -278,7 +278,7 @@ The '<mode>' may be one of below.
>
> *-j*, *--shell-job*::
> Restore shell jobs, in other words inherit session and process group
> - ID from the criu itself.
> + ID from the *criu* itself.
>
> *--cpu-cap* ['<cap>','<cap>']::
> Specify '<cap>' CPU capability to be present on the CPU the process is
> @@ -314,15 +314,45 @@ relaxing capability with *--cpu-cap*=*none*
> parameter.
>
> *check*
> ~~~~~~~
> -Tests whether the kernel support is up to date.
> +Checks whether the kernel supports the features that *criu* needs to
> +successfully dump and restore a process tree.
>
> -*--ms*::
> - Do not check not yet merged features.
> +There are three categories of kernel support as described below. *criu
> +check* always checks Category 1 features unless *--feature* is specified
> +which only checks the specified feature.
> +
> +If there are no errors or warnings, *criu* prints "Looks good." and its
> +exit code is 0. Otherwise, *criu* prints appropriate error and warning
> +messages and its exit code is non-zero.
> +
> +- *Category 1*. Absolutely required. These are features like
> + '/proc/<pid>/map_files', 'NETLINK_SOCK_DIAG' socket
> + monitoring, '/proc/sys/kernel/ns_last_pid', etc.
> +
> +- *Category 2*. Required only for specific cases. These are features
> + like aio remap, '/dev/net/tun', etc. that are
> + required if the process being dumped or restored
> + is using them.
> +
> +- *Category 3*. Experimental. These are features like task-diag that
> + are used for experimental purposes (mostly
> + during development).
> +
> +Without an argument, *criu check* checks Category 1 features. This
> behavior
> +can change with the following options:
> +
> +*--extra*::
> + Check kernel support for Category 2 features.
> +
> +*--experimental*::
> + Check kernel support for Category 3 features.
> +
> +*--all*::
> + Check kernel support for Category 1, 2, and 3 features.
>
> *--feature* '<name>'::
> - Check a particular feature. Instead of checking everything one may
> specify
> - which exactly feature is to be tested. The '<name>' may be: *mnt_id*,
> - *aio_remap*, *timerfd*, *tun*, *userns*.
> + Check a specific feature. If '<name>' is 'list', a list of valid
> + kernel feature names that can be checked will be printed.
>
> *page-server*
> ~~~~~~~~~~~~~
> diff --git a/criu/cr-check.c b/criu/cr-check.c
> index 2896b02..f3f2021 100644
> --- a/criu/cr-check.c
> +++ b/criu/cr-check.c
> @@ -41,6 +41,8 @@
> #include "pstree.h"
> #include "cr_options.h"
>
> +static char *feature_name(int (*func)());
> +
> static int check_tty(void)
> {
> int master = -1, slave = -1;
> @@ -169,9 +171,8 @@ static int check_kcmp(void)
> return -1;
> }
>
> -static int check_prctl(void)
> +static int check_prctl(int pr_set_mm_map)
> {
> - unsigned long user_auxv = 0;
> unsigned int *tid_addr;
> unsigned int size = 0;
> int ret;
> @@ -182,38 +183,13 @@ static int check_prctl(void)
> return -1;
> }
>
> - /*
> - * Either new or old interface must be supported in the kernel.
> - */
> - ret = prctl(PR_SET_MM, PR_SET_MM_MAP_SIZE, (unsigned long)&size,
> 0, 0);
> - if (ret) {
> - if (!opts.check_ms_kernel) {
> - pr_msg("prctl: PR_SET_MM_MAP is not supported,
> which "
> - "is required for restoring user
> namespaces\n");
> - return -1;
> - } else
> - pr_warn("Skipping unssuported PR_SET_MM_MAP\n");
> -
> - ret = prctl(PR_SET_MM, PR_SET_MM_BRK, brk(0), 0, 0);
> - if (ret) {
> - if (ret == -EPERM)
> - pr_msg("prctl: One needs CAP_SYS_RESOURCE
> capability to perform testing\n");
> - else
> - pr_msg("prctl: PR_SET_MM is not
> supported\n");
> - return -1;
> - }
> -
> - ret = prctl(PR_SET_MM, PR_SET_MM_EXE_FILE, -1, 0, 0);
> - if (ret != -EBADF) {
> - pr_msg("prctl: PR_SET_MM_EXE_FILE is not supported
> (%d)\n", ret);
> - return -1;
> - }
> -
> - ret = prctl(PR_SET_MM, PR_SET_MM_AUXV, (long)&user_auxv,
> sizeof(user_auxv), 0);
> - if (ret) {
> - pr_msg("prctl: PR_SET_MM_AUXV is not supported\n");
> - return -1;
> - }
> + if (pr_set_mm_map) {
> + /*
> + * PR_SET_MM_MAP is optional because criu/pie/restorer.c
> + * has proper fall-back in case this API is not supported.
> + */
> + if (prctl(PR_SET_MM, PR_SET_MM_MAP_SIZE, (unsigned
> long)&size, 0, 0))
> + pr_warn("prctl: PR_SET_MM_MAP is not supported\n");
> }
>
> return 0;
> @@ -304,7 +280,7 @@ int check_mnt_id(void)
> return -1;
>
> if (fdinfo.mnt_id == -1) {
> - pr_err("fdinfo doesn't contain the mnt_id field\n");
> + pr_warn("fdinfo doesn't contain the mnt_id field\n");
> return -1;
> }
>
> @@ -626,7 +602,7 @@ static int check_ptrace_suspend_seccomp(void)
>
> if (ptrace(PTRACE_SETOPTIONS, pid, NULL, PTRACE_O_SUSPEND_SECCOMP)
> < 0) {
> if (errno == EINVAL) {
> - pr_err("Kernel doesn't support
> PTRACE_O_SUSPEND_SECCOMP\n");
> + pr_warn("Kernel doesn't support
> PTRACE_O_SUSPEND_SECCOMP\n");
> } else {
> pr_perror("couldn't suspend seccomp");
> }
> @@ -670,7 +646,7 @@ static int check_ptrace_dump_seccomp_filters(void)
> len = ptrace(PTRACE_SECCOMP_GET_FILTER, pid, 0, NULL);
> if (len < 0) {
> ret = -1;
> - pr_perror("Dumping seccomp filters not supported");
> + pr_warn("Dumping seccomp filters not supported");
> }
>
> kill(pid, SIGKILL);
> @@ -766,11 +742,8 @@ static int check_aio_remap(void)
> ctx = (aio_context_t)naddr;
> r = syscall(SYS_io_getevents, ctx, 0, 1, NULL, NULL);
> if (r < 0) {
> - if (!opts.check_ms_kernel) {
> - pr_err("AIO remap doesn't work properly\n");
> - return -1;
> - } else
> - pr_warn("Skipping unsupported AIO remap\n");
> + pr_warn("AIO remap doesn't work properly\n");
> + return -1;
> }
>
> return 0;
> @@ -782,12 +755,8 @@ static int check_fdinfo_lock(void)
> return -1;
>
> if (!kdat.has_fdinfo_lock) {
> - if (!opts.check_ms_kernel) {
> - pr_err("fdinfo doesn't contain the lock field\n");
> - return -1;
> - } else {
> - pr_warn("fdinfo doesn't contain the lock field\n");
> - }
> + pr_warn("fdinfo doesn't contain the lock field\n");
> + return -1;
> }
>
> return 0;
> @@ -823,14 +792,10 @@ static int check_clone_parent_vs_pid()
> static int check_cgroupns(void)
> {
> int ret;
> - if (opts.check_ms_kernel) {
> - pr_warn("Skipping cgroup namespaces check\n");
> - return 0;
> - }
>
> ret = access("/proc/self/ns/cgroup", F_OK);
> if (ret < 0) {
> - pr_err("cgroupns not supported. This is not fatal.\n");
> + pr_warn("cgroupns not supported. This is not fatal.\n");
> return -1;
> }
>
> @@ -839,6 +804,22 @@ static int check_cgroupns(void)
>
> static int (*chk_feature)(void);
>
> +/*
> + * There are three categories of kernel features:
> + *
> + * 1. Absolutely required (/proc/pid/map_files, ptrace PEEKSIGINFO, etc.).
> + * 2. Required only for specific cases (aio remap, tun, etc.).
> + * Checked when --extra or --all is specified.
> + * 3. Experimental (task-diag).
> + * Checked when --experimental or --all is specified.
> + *
> + * We fail if any feature in category 1 is missing but tolerate failures
> + * in the other categories. Currently, there is nothing in category 3.
> + */
> +#define GOOD "Looks good."
> +#define GOOD_BUT "Looks good but some kernel features are
> missing\n" \
> + "which, depending on your process tree, may
> cause\n" \
> + "dump or restore failure."
> int cr_check(void)
> {
> struct ns_id ns = { .type = NS_CRIU, .ns_pid = PROC_SELF, .nd =
> &mnt_ns_desc };
> @@ -863,16 +844,22 @@ int cr_check(void)
> return -1;
>
> if (chk_feature) {
> - ret = chk_feature();
> - goto out;
> + if (chk_feature())
> + return -1;
> + print_on_level(DEFAULT_LOGLEVEL, "%s is supported\n",
> + feature_name(chk_feature));
> + return 0;
> }
>
> + /*
> + * Category 1 - absolutely required.
> + */
> ret |= check_map_files();
> ret |= check_sock_diag();
> ret |= check_ns_last_pid();
> ret |= check_sock_peek_off();
> ret |= check_kcmp();
> - ret |= check_prctl();
> + ret |= check_prctl(0);
> ret |= check_fcntl();
> ret |= check_proc_stat();
> ret |= check_tcp();
> @@ -883,22 +870,39 @@ int cr_check(void)
> ret |= check_ipc();
> ret |= check_sigqueuinfo();
> ret |= check_ptrace_peeksiginfo();
> - ret |= check_ptrace_suspend_seccomp();
> - ret |= check_ptrace_dump_seccomp_filters();
> - ret |= check_mem_dirty_track();
> - ret |= check_posix_timers();
> - ret |= check_tun_cr(0);
> - ret |= check_timerfd();
> - ret |= check_mnt_id();
> - ret |= check_aio_remap();
> - ret |= check_fdinfo_lock();
> - ret |= check_clone_parent_vs_pid();
> - ret |= check_cgroupns();
> + if (ret)
> + return ret;
>
> -out:
> - if (!ret)
> - print_on_level(DEFAULT_LOGLEVEL, "Looks good.\n");
> + /*
> + * Category 2 - required for specific cases.
> + */
> + if (opts.check_extra_features) {
> + ret |= check_prctl(1);
> + ret |= check_ptrace_suspend_seccomp();
> + ret |= check_ptrace_dump_seccomp_filters();
> + ret |= check_mem_dirty_track();
> + ret |= check_posix_timers();
> + ret |= check_tun_cr(0);
> + ret |= check_timerfd();
> + ret |= check_mnt_id();
> + ret |= check_aio_remap();
> + ret |= check_fdinfo_lock();
> + ret |= check_clone_parent_vs_pid();
> + ret |= check_cgroupns();
> + }
>
> + /*
> + * Category 3 - experimental.
> + */
> + if (opts.check_experimental_features) {
> + /*
> + * Place holder as currently there are no checks.
> + * Remove this comment when adding checks for
> + * experimental features.
> + */
> + }
> +
> + print_on_level(DEFAULT_LOGLEVEL, "%s\n", ret ? GOOD_BUT : GOOD);
> return ret;
> }
>
> @@ -920,14 +924,14 @@ static int check_userns(void)
>
> ret = access("/proc/self/ns/user", F_OK);
> if (ret) {
> - pr_perror("No userns proc file");
> + pr_warn("No userns proc file");
> return -1;
> }
>
> ret = prctl(PR_SET_MM, PR_SET_MM_MAP_SIZE, (unsigned long)&size,
> 0, 0);
> if (ret) {
> errno = -ret;
> - pr_perror("No new prctl API");
> + pr_warn("No new prctl API");
> return -1;
> }
>
> @@ -947,32 +951,53 @@ static int check_loginuid(void)
> return 0;
> }
>
> +struct feature_list {
> + char *name;
> + int (*func)();
> +};
> +
> +static struct feature_list feature_list[] = {
> + { "mnt_id", check_mnt_id },
> + { "aio_remap", check_aio_remap },
> + { "timerfd", check_timerfd },
> + { "tun", check_tun },
> + { "userns", check_userns },
> + { "fdinfo_lock", check_fdinfo_lock },
> + { "seccomp_suspend", check_ptrace_suspend_seccomp },
> + { "seccomp_filters", check_ptrace_dump_seccomp_filters },
> + { "loginuid", check_loginuid },
> + { "cgroupns", check_cgroupns },
> + { NULL, NULL },
> +};
> +
> int check_add_feature(char *feat)
> {
> - if (!strcmp(feat, "mnt_id"))
> - chk_feature = check_mnt_id;
> - else if (!strcmp(feat, "aio_remap"))
> - chk_feature = check_aio_remap;
> - else if (!strcmp(feat, "timerfd"))
> - chk_feature = check_timerfd;
> - else if (!strcmp(feat, "tun"))
> - chk_feature = check_tun;
> - else if (!strcmp(feat, "userns"))
> - chk_feature = check_userns;
> - else if (!strcmp(feat, "fdinfo_lock"))
> - chk_feature = check_fdinfo_lock;
> - else if (!strcmp(feat, "seccomp_suspend"))
> - chk_feature = check_ptrace_suspend_seccomp;
> - else if (!strcmp(feat, "seccomp_filters"))
> - chk_feature = check_ptrace_dump_seccomp_filters;
> - else if (!strcmp(feat, "loginuid"))
> - chk_feature = check_loginuid;
> - else if (!strcmp(feat, "cgroupns"))
> - chk_feature = check_cgroupns;
> - else {
> - pr_err("Unknown feature %s\n", feat);
> - return -1;
> + struct feature_list *fl;
> +
> + if (!strcmp(feat, "list")) {
> + for (fl = feature_list; fl->name; fl++)
> + pr_msg("%s ", fl->name);
> + pr_msg("\n");
> + return 1;
> }
>
> - return 0;
> + for (fl = feature_list; fl->name; fl++) {
> + if (!strcmp(feat, fl->name)) {
> + chk_feature = fl->func;
> + return 0;
> + }
> + }
> + pr_err("Unknown feature %s\n", feat);
> + return -1;
> +}
> +
> +static char *feature_name(int (*func)())
> +{
> + struct feature_list *fl;
> +
> + for (fl = feature_list; fl->func; fl++) {
> + if (fl->func == func)
> + return fl->name;
> + }
> + return NULL;
> }
> diff --git a/criu/cr-service.c b/criu/cr-service.c
> index 88d4af7..032c763 100644
> --- a/criu/cr-service.c
> +++ b/criu/cr-service.c
> @@ -567,9 +567,6 @@ static int check(int sk)
>
> setproctitle("check --rpc");
>
> - /* Check only minimal kernel support */
> - opts.check_ms_kernel = true;
> -
> if (!cr_check())
> resp.success = true;
>
> diff --git a/criu/crtools.c b/criu/crtools.c
> index d6e8672..db4c2d5 100644
> --- a/criu/crtools.c
> +++ b/criu/crtools.c
> @@ -275,6 +275,9 @@ int main(int argc, char *argv[], char *envp[])
> { "timeout", required_argument, 0,
> 1072 },
> { "external", required_argument, 0,
> 1073 },
> { "empty-ns", required_argument, 0,
> 1074 },
> + { "extra", no_argument, 0,
> 1075 },
> + { "experimental", no_argument, 0,
> 1076 },
> + { "all", no_argument, 0,
> 1077 },
> { },
> };
>
> @@ -455,8 +458,8 @@ int main(int argc, char *argv[], char *envp[])
> opts.force_irmap = true;
> break;
> case 1054:
> - opts.check_ms_kernel = true;
> - break;
> + pr_err("--ms is deprecated; see \"Check options\"
> of criu --help\n");
> + return 1;
> case 'L':
> opts.libdir = optarg;
> break;
> @@ -490,8 +493,11 @@ int main(int argc, char *argv[], char *envp[])
> return 1;
> break;
> case 1063:
> - if (check_add_feature(optarg) < 0)
> + ret = check_add_feature(optarg);
> + if (ret < 0) /* invalid kernel feature name */
> return 1;
> + if (ret > 0) /* list kernel features and exit */
> + return 0;
> break;
> case 1064:
> if (!add_skip_mount(optarg))
> @@ -554,6 +560,16 @@ int main(int argc, char *argv[], char *envp[])
> return 1;
> }
> break;
> + case 1075:
> + opts.check_extra_features = true;
> + break;
> + case 1076:
> + opts.check_experimental_features = true;
> + break;
> + case 1077:
> + opts.check_extra_features = true;
> + opts.check_experimental_features = true;
> + break;
> case 'V':
> pr_msg("Version: %s\n", CRIU_VERSION);
> if (strcmp(CRIU_GITID, "0"))
> @@ -715,7 +731,7 @@ usage:
> "Usage:\n"
> " criu dump|pre-dump -t PID [<options>]\n"
> " criu restore [<options>]\n"
> -" criu check [--ms]\n"
> +" criu check [--feature FEAT]\n"
> " criu exec -p PID <syscall-string>\n"
> " criu page-server\n"
> " criu service [<options>]\n"
> @@ -808,8 +824,19 @@ usage:
> " socket[inode]\n"
> " file[mnt_id:inode]\n"
> " --empty-ns {net}\n"
> -" Create a namespace, but don't restore its
> properies.\n"
> -" An user will retore them from action scripts.\n"
> +" Create a namespace, but don't restore its
> properies.\n"
> +" An user will retore them from action scripts.\n"
> +"Check options:\n"
> +" without any arguments, \"criu check\" checks availability of
> absolutely required\n"
> +" kernel features; if any of these features is missing dump and restore
> will fail\n"
> +" --extra also check availability of extra kernel
> features\n"
> +" --experimental also check availability of experimental kernel
> features\n"
> +" --all also check availability of extra and
> experimental kernel features\n"
> +" --feature FEAT only check availability of one of the following
> kernel features\n"
> +" "
> + );
> + check_add_feature("list");
> + pr_msg(
> "\n"
> "* Logging:\n"
> " -o|--log-file FILE log file name\n"
> @@ -837,7 +864,6 @@ usage:
> "Other options:\n"
> " -h|--help show this text\n"
> " -V|--version show version\n"
> -" --ms don't check not yet merged kernel features\n"
> );
>
> return 0;
> diff --git a/criu/include/cr_options.h b/criu/include/cr_options.h
> index a6f0b3e..5e1cc21 100644
> --- a/criu/include/cr_options.h
> +++ b/criu/include/cr_options.h
> @@ -56,7 +56,8 @@ struct cr_options {
> int final_state;
> char *show_dump_file;
> char *show_fmt;
> - bool check_ms_kernel;
> + bool check_extra_features;
> + bool check_experimental_features;
> bool show_pages_content;
> union {
> bool restore_detach;
> diff --git a/criu/timerfd.c b/criu/timerfd.c
> index 942e5e6..13a6570 100644
> --- a/criu/timerfd.c
> +++ b/criu/timerfd.c
> @@ -52,7 +52,7 @@ int check_timerfd(void)
> ret = ioctl(fd, TFD_IOC_SET_TICKS, NULL);
> if (ret < 0) {
> if (errno != EFAULT)
> - pr_perror("No timerfd support for c/r");
> + pr_warn("No timerfd support for c/r");
> else
> ret = 0;
> }
> --
> 2.7.0.rc3.207.g0ac5344
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openvz.org/pipermail/criu/attachments/20160310/84af60be/attachment-0001.html>
More information about the CRIU
mailing list