[CRIU] [PATCH] restore: helpers and zombies should collect their children

Tycho Andersen tycho.andersen at canonical.com
Tue Mar 15 15:48:37 PDT 2016


Consider when there is a double fork of helpers or zombies, e.g. when a
zombie has a session id which doesn't match its pid. If the child dies and
exits before the grandchild, the grandchild reparents to init, and when the
task dies init doesn't have it in the helper list, so init dies as well,
viz. the log below.

(00.118789) Add a helper 293 for restoring SID 293
(00.118792) Attach 294 to the temporary task 293
...
(01.394403)    294: Restoring zombie with 0 code
...
pie: Task 294  exited, status= 0
(01.434279) Error (cr-restore.c:1308): 12097 killed by signal 19
(01.434420) Error (cr-restore.c:1308): 12097 killed by signal 19
(01.450258) Switching to new ns to clean ghosts
(01.450324) Error (cr-restore.c:2138): Restoring FAILED.

Let's have the zombies and helpers reap their children before they exit to
avoid this.

Signed-off-by: Tycho Andersen <tycho.andersen at canonical.com>
---
Full log is available at: http://paste.ubuntu.com/15396732/
---
 criu/cr-restore.c | 35 ++++++++++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/criu/cr-restore.c b/criu/cr-restore.c
index 30ddff9..f95adce 100644
--- a/criu/cr-restore.c
+++ b/criu/cr-restore.c
@@ -972,6 +972,32 @@ static inline int sig_fatal(int sig)
 struct task_entries *task_entries;
 static unsigned long task_entries_pos;
 
+static int wait_on_helpers_zombies(void)
+{
+	struct pstree_item *pi;
+
+	list_for_each_entry(pi, &current->children, sibling) {
+		pid_t pid = pi->pid.virt;
+		int status;
+
+		switch (pi->state) {
+		case TASK_DEAD:
+			if (waitid(P_PID, pid, NULL, WNOWAIT | WEXITED) < 0) {
+				pr_perror("Wait on %d zombie failed\n", pid);
+				return -1;
+			}
+			futex_dec_and_wake(&task_entries->nr_in_progress);
+		case TASK_HELPER:
+			if (waitpid(pid, &status, 0) != pid) {
+				pr_perror("waitpid for helper %d failed", pid);
+				return -1;
+			}
+		}
+	}
+
+	return 0;
+}
+
 static int restore_one_zombie(CoreEntry *core)
 {
 	int exit_code = core->tc->exit_code;
@@ -985,6 +1011,8 @@ static int restore_one_zombie(CoreEntry *core)
 
 	if (task_entries != NULL) {
 		restore_finish_stage(CR_STATE_RESTORE);
+		if (wait_on_helpers_zombies())
+			pr_err("failed to wait on helpers and zombies\n");
 		zombie_prepare_signals();
 	}
 
@@ -1057,7 +1085,12 @@ static int restore_one_task(int pid, CoreEntry *core)
 		ret = restore_one_zombie(core);
 	else if (current->state == TASK_HELPER) {
 		restore_finish_stage(CR_STATE_RESTORE);
-		ret = 0;
+		if (wait_on_helpers_zombies()) {
+			pr_err("failed to wait on helpers and zombies\n");
+			ret = -1;
+		} else {
+			ret = 0;
+		}
 	} else {
 		pr_err("Unknown state in code %d\n", (int)core->tc->task_state);
 		ret = -1;
-- 
2.7.0



More information about the CRIU mailing list