[CRIU] [PATCH] restore: helpers and zombies should collect their children

Tycho Andersen tycho.andersen at canonical.com
Thu Mar 17 07:05:58 PDT 2016


Consider a double fork of helpers or zombies, e.g. when a zombie has a
session id which doesn't match its pid. If the child exits before the
grandchild, the grandchild is reparented to init. When the grandchild later
dies, init doesn't find it in the helper list, so init dies as well, as
shown in the log below.

(00.118789) Add a helper 293 for restoring SID 293
(00.118792) Attach 294 to the temporary task 293
...
(01.394403)    294: Restoring zombie with 0 code
...
pie: Task 294  exited, status= 0
(01.434279) Error (cr-restore.c:1308): 12097 killed by signal 19
(01.434420) Error (cr-restore.c:1308): 12097 killed by signal 19
(01.450258) Switching to new ns to clean ghosts
(01.450324) Error (cr-restore.c:2138): Restoring FAILED.

Let's have the zombies and helpers reap their children before they exit to
avoid this.

v2: block SIGCHLD when waiting on helpers so that it doesn't race with the
    SIGCHLD handler
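
For illustration only (not part of this patch), the idea behind v2 can be
sketched as a standalone fragment. The names sigchld_handler,
reaped_by_handler and spawn_and_reap_blocked are hypothetical; the handler
stands in for any SIGCHLD handler that reaps children itself. With SIGCHLD
blocked around waitpid(), the handler cannot collect the child first:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Number of children the SIGCHLD handler reaped before the waiter could. */
static volatile sig_atomic_t reaped_by_handler;

/* Hypothetical handler: reaps every child that has already exited. */
static void sigchld_handler(int sig)
{
	int saved_errno = errno;

	(void)sig;
	while (waitpid(-1, NULL, WNOHANG) > 0)
		reaped_by_handler++;
	errno = saved_errno;
}

/*
 * Fork a child that exits with exit_code and reap it while SIGCHLD is
 * blocked. Returns the child's exit status, or -1 on error.
 */
static int spawn_and_reap_blocked(int exit_code)
{
	struct sigaction sa;
	sigset_t blockmask, oldmask;
	int status, ret = -1;
	pid_t pid;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = sigchld_handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGCHLD, &sa, NULL) == -1)
		return -1;

	sigemptyset(&blockmask);
	sigaddset(&blockmask, SIGCHLD);
	if (sigprocmask(SIG_BLOCK, &blockmask, &oldmask) == -1)
		return -1;

	pid = fork();
	if (pid == 0)
		_exit(exit_code);

	/* The handler cannot run while SIGCHLD is blocked, so this wait wins. */
	if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
		ret = WEXITSTATUS(status);

	/* Restore the previous mask whether or not the wait succeeded. */
	if (sigprocmask(SIG_SETMASK, &oldmask, NULL) == -1)
		return -1;

	return ret;
}
```

If the handler runs once the mask is restored, it finds no child left to
reap, so reaped_by_handler stays at zero.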

Signed-off-by: Tycho Andersen <tycho.andersen at canonical.com>
---
Full log is available at: http://paste.ubuntu.com/15396732/
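
A note on the zombie branch in the diff below: it calls waitid() with
WNOWAIT | WEXITED before waitpid(). The following standalone sketch (the
function name peek_then_reap is hypothetical, and this is not CRIU code)
shows the behaviour this relies on: WNOWAIT waits for the exit but leaves the
child waitable, so a later waitpid() still reaps it. The patch passes NULL
for the siginfo argument, which Linux accepts; the sketch passes a real
siginfo_t to stay within POSIX.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that exits with exit_code, observe the exit with WNOWAIT,
 * then reap the child with waitpid(). Returns the exit status if both
 * observations agree, -1 otherwise.
 */
static int peek_then_reap(int exit_code)
{
	siginfo_t info;
	int status;
	pid_t pid;

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(exit_code);

	/* Blocks until the child has exited, but leaves it as a zombie. */
	if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) < 0)
		return -1;
	if (info.si_code != CLD_EXITED || info.si_status != exit_code)
		return -1;

	/* The zombie is still waitable, so this call succeeds and reaps it. */
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;

	/* After reaping there is nothing left to wait for (ECHILD). */
	if (waitpid(pid, NULL, WNOHANG) != -1)
		return -1;

	return WEXITSTATUS(status);
}
```

Calling peek_then_reap(42) is expected to return 42, assuming SIGCHLD has its
default disposition in the calling process.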
---
 criu/cr-restore.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 48 insertions(+), 1 deletion(-)

diff --git a/criu/cr-restore.c b/criu/cr-restore.c
index 30ddff9..7c03190 100644
--- a/criu/cr-restore.c
+++ b/criu/cr-restore.c
@@ -972,6 +972,46 @@ static inline int sig_fatal(int sig)
 struct task_entries *task_entries;
 static unsigned long task_entries_pos;
 
+static int wait_on_helpers_zombies(void)
+{
+	struct pstree_item *pi;
+	sigset_t blockmask, oldmask;
+
+	sigemptyset(&blockmask);
+	sigaddset(&blockmask, SIGCHLD);
+
+	if (sigprocmask(SIG_BLOCK, &blockmask, &oldmask) == -1) {
+		pr_perror("Can not set mask of blocked signals");
+		return -1;
+	}
+
+	list_for_each_entry(pi, &current->children, sibling) {
+		pid_t pid = pi->pid.virt;
+		int status;
+
+		switch (pi->state) {
+		case TASK_DEAD:
+			if (waitid(P_PID, pid, NULL, WNOWAIT | WEXITED) < 0) {
+				pr_perror("Wait on %d zombie failed", pid);
+				return -1;
+			}
+			futex_dec_and_wake(&task_entries->nr_in_progress);
+		case TASK_HELPER:
+			if (waitpid(pid, &status, 0) != pid) {
+				pr_perror("waitpid for helper %d failed", pid);
+				return -1;
+			}
+		}
+	}
+
+	if (sigprocmask(SIG_SETMASK, &oldmask, NULL) == -1) {
+		pr_perror("Can not unset mask of blocked signals");
+		BUG();
+	}
+
+	return 0;
+}
+
 static int restore_one_zombie(CoreEntry *core)
 {
 	int exit_code = core->tc->exit_code;
@@ -985,6 +1025,8 @@ static int restore_one_zombie(CoreEntry *core)
 
 	if (task_entries != NULL) {
 		restore_finish_stage(CR_STATE_RESTORE);
+		if (wait_on_helpers_zombies())
+			pr_err("failed to wait on helpers and zombies\n");
 		zombie_prepare_signals();
 	}
 
@@ -1057,7 +1099,12 @@ static int restore_one_task(int pid, CoreEntry *core)
 		ret = restore_one_zombie(core);
 	else if (current->state == TASK_HELPER) {
 		restore_finish_stage(CR_STATE_RESTORE);
-		ret = 0;
+		if (wait_on_helpers_zombies()) {
+			pr_err("failed to wait on helpers and zombies\n");
+			ret = -1;
+		} else {
+			ret = 0;
+		}
 	} else {
 		pr_err("Unknown state in code %d\n", (int)core->tc->task_state);
 		ret = -1;
-- 
2.7.0


