[CRIU] [PATCH] restore: helpers and zombies should collect their children
Pavel Emelyanov
xemul at virtuozzo.com
Sun Mar 20 23:44:02 PDT 2016
On 03/17/2016 05:05 PM, Tycho Andersen wrote:
> Consider when there is a double fork of helpers or zombies, e.g. when a
> zombie has a session id which doesn't match its pid.
That's the
task
`- helper
`- zombie
case.
> If the child dies and
> exits before the grandchild, the grandchild reparents to init, and when the
> task dies init doesn't have it in the helper list, so init dies as well,
> viz. the log below.
>
>
> Let's have the zombies and helpers reap their children before they exit to
> avoid this.
But how about
task
`- zombie
`- helper
one? How can _this_ happen?
And some more comments inline.
> v2: block SIGCHLD when waiting on helpers so that it doesn't race with the
> SICGHLD handler
>
> Signed-off-by: Tycho Andersen <tycho.andersen at canonical.com>
> ---
> Full log is available at: http://paste.ubuntu.com/15396732/
> ---
> criu/cr-restore.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 48 insertions(+), 1 deletion(-)
>
> diff --git a/criu/cr-restore.c b/criu/cr-restore.c
> index 30ddff9..7c03190 100644
> --- a/criu/cr-restore.c
> +++ b/criu/cr-restore.c
> @@ -972,6 +972,46 @@ static inline int sig_fatal(int sig)
> struct task_entries *task_entries;
> static unsigned long task_entries_pos;
>
> +static int wait_on_helpers_zombies(void)
> +{
> + struct pstree_item *pi;
> + sigset_t blockmask, oldmask;
> +
> + sigemptyset(&blockmask);
> + sigaddset(&blockmask, SIGCHLD);
> +
> + if (sigprocmask(SIG_BLOCK, &blockmask, &oldmask) == -1) {
> + pr_perror("Can not set mask of blocked signals");
> + return -1;
> + }
> +
> + list_for_each_entry(pi, ¤t->children, sibling) {
> + pid_t pid = pi->pid.virt;
> + int status;
> +
> + switch (pi->state) {
> + case TASK_DEAD:
> + if (waitid(P_PID, pid, NULL, WNOWAIT | WEXITED) < 0) {
Why P_PID?
> + pr_perror("Wait on %d zombie failed\n", pid);
> + return -1;
> + }
> + futex_dec_and_wake(&task_entries->nr_in_progress);
Hasn't the dead guy decremented this value himself?
> + case TASK_HELPER:
> + if (waitpid(pid, &status, 0) != pid) {
> + pr_perror("waitpid for helper %d failed", pid);
> + return -1;
> + }
> + }
> + }
> +
> + if (sigprocmask(SIG_SETMASK, &oldmask, NULL) == -1) {
> + pr_perror("Can not unset mask of blocked signals");
> + BUG();
> + }
> +
> + return 0;
> +}
> +
> static int restore_one_zombie(CoreEntry *core)
> {
> int exit_code = core->tc->exit_code;
> @@ -985,6 +1025,8 @@ static int restore_one_zombie(CoreEntry *core)
>
> if (task_entries != NULL) {
> restore_finish_stage(CR_STATE_RESTORE);
> + if (wait_on_helpers_zombies())
> + pr_err("failed to wait on helpers and zombies\n");
> zombie_prepare_signals();
> }
>
> @@ -1057,7 +1099,12 @@ static int restore_one_task(int pid, CoreEntry *core)
> ret = restore_one_zombie(core);
> else if (current->state == TASK_HELPER) {
> restore_finish_stage(CR_STATE_RESTORE);
> - ret = 0;
> + if (wait_on_helpers_zombies()) {
> + pr_err("failed to wait on helpers and zombies\n");
> + ret = -1;
> + } else {
> + ret = 0;
> + }
> } else {
> pr_err("Unknown state in code %d\n", (int)core->tc->task_state);
> ret = -1;
>
More information about the CRIU
mailing list