[CRIU] [PATCH v2] seize: Wait for the freezer to complete before processing tasks

Mon Jul 11 14:26:09 PDT 2016

Currently, when we use the cgroup freezer to seize tasks, we start the
freezer and then, without waiting for the transition to complete, seize
the tasks read from the freezer's @tasks file using fgets.

This is a fragile construction: fgets uses an internal buffer, and the
tasks we've read might be exiting at the same time as we're freezing
them. The kernel won't freeze these exiting tasks because they are
dying anyway, and I fear we might read a pid here that is no longer in
our cgroup but has been reused by another task outside of it.

Thus let's do the following: poll the freezer state in iterations,
waiting for the freezer to finish its transition, and only then
collect and seize all tasks in one pass.
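
A minimal C sketch of the "wait first, then read pids" step, assuming a
cgroup v1 freezer whose freezer.state file reads THAWED, FREEZING or
FROZEN (the states seen in the log) and that FROZEN has already been
written to it to start the transition. read_state() and
wait_for_frozen() are hypothetical helpers, not CRIU functions; the
patch itself passes an already open fd to get_freezer_state(), while
the sketch simply reopens the file on every attempt.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper (not CRIU's get_freezer_state()): read the first
 * line of a freezer.state-like file into buf and strip the newline.
 * Returns 0 on success, -1 if the file cannot be opened or read.
 */
static int read_state(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/*
 * Poll the state file until it reads "FROZEN", sleeping step_ms
 * between reads. The caller should only read @tasks once this
 * returns >= 0. Returns the number of attempts done before FROZEN
 * was seen, or -1 on a read error or if the cgroup is still
 * THAWED/FREEZING after nr_attempts + 1 reads.
 */
static long wait_for_frozen(const char *path, unsigned long nr_attempts,
			    unsigned long step_ms)
{
	const struct timespec req = {
		.tv_sec  = (time_t)(step_ms / 1000),
		.tv_nsec = (long)(step_ms % 1000) * 1000000L,
	};
	char state[32];
	unsigned long i;

	for (i = 0; i <= nr_attempts; i++) {
		if (read_state(path, state, sizeof(state)) < 0)
			return -1;
		if (strcmp(state, "FROZEN") == 0)
			return (long)i;
		nanosleep(&req, NULL);	/* still in transition, retry */
	}
	return -1;
}
```

With this loop, a cgroup that reads FREEZING on the first read and
FROZEN on the second returns 1, matching the "1 attempts done" line in
the log.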

For example, on the container I'm testing with, it takes just one iteration:

 | (00.013690) cg: Set 1 is criu one
 | (00.013705) freezing processes: 1800000 attempst with 100 ms steps
 | (00.013720) freezer.state=THAWED
 | (00.013795) freezer.state=FREEZING
 | (00.113962) freezer.state=FROZEN
 | (00.113990) freezing processes: 1 attempts done
 | (00.114073) SEIZE 240893 (comm systemd): success
 | (00.114110) Warn  (ptrace.c:121): Unable to interrupt task: 240905 (comm kthreadd/1) (Operation not permitted)
 | (00.114136) Warn  (ptrace.c:121): Unable to interrupt task: 240906 (comm khelper) (Operation not permitted)
 | (00.114155) SEIZE 240969 (comm screen): success
 | (00.114166) SEIZE 240970 (comm sendmail): success
 | (00.114179) SEIZE 240971 (comm sendmail): success
 | (00.114189) SEIZE 240972 (comm saslauthd): success
 | (00.114202) SEIZE 240973 (comm crond): success
 | (00.114211) SEIZE 240974 (comm agetty): success
 | (00.114221) SEIZE 240975 (comm agetty): success
 | ...
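
Once freezer.state reads FROZEN, the pids can be gathered from @tasks
in a single pass before any of them is seized, which is the order the
log shows. A minimal sketch, assuming a cgroup v1 tasks file with one
pid per line; collect_pids() is a hypothetical helper, not part of
CRIU, and the PTRACE_SEIZE step itself is left out.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not part of CRIU: read every pid listed in a
 * cgroup v1 "tasks"-like file (one pid per line) in a single pass.
 * Meant to be called only after freezer.state reads FROZEN.
 * Returns a malloc'ed array (the caller frees it) and stores the
 * number of pids in *nr, or returns NULL on error.
 */
static pid_t *collect_pids(const char *path, size_t *nr)
{
	size_t n = 0, cap = 16;
	pid_t *pids;
	FILE *f;
	long pid;
	int err;

	f = fopen(path, "r");
	if (!f)
		return NULL;

	pids = malloc(cap * sizeof(*pids));
	if (!pids) {
		fclose(f);
		return NULL;
	}

	while (fscanf(f, "%ld", &pid) == 1) {
		if (n == cap) {
			/* Grow the array geometrically. */
			pid_t *tmp = realloc(pids, 2 * cap * sizeof(*pids));

			if (!tmp) {
				free(pids);
				fclose(f);
				return NULL;
			}
			pids = tmp;
			cap *= 2;
		}
		pids[n++] = (pid_t)pid;
	}

	err = ferror(f);
	fclose(f);
	if (err) {
		free(pids);
		return NULL;
	}

	*nr = n;
	return pids;
}
```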

https://jira.sw.ru/browse/PSBM-49439

Signed-off-by: Cyrill Gorcunov <gorcunov at virtuozzo.com>
---
 criu/seize.c | 66 +++++++++++++++++++++++++++++++++---------------------------
 1 file changed, 36 insertions(+), 30 deletions(-)

diff --git a/criu/seize.c b/criu/seize.c
index ad88ea0..0eb8242 100644
--- a/criu/seize.c
+++ b/criu/seize.c
@@ -135,6 +135,19 @@ static int seize_cgroup_tree(char *root_path, const char *state)
 			if (stat(buf, &st) == -1 && errno == ENOENT)
 				continue;
 
+			/*
+			 * When a task is exiting while we're freezing
+			 * it, we may hit a race: the freezer tries to
+			 * freeze the task and sends a fake signal,
+			 * setting TIF_SIGPENDING, assuming that the next
+			 * time the task is scheduled for execution it
+			 * will enter try_to_freeze() and stand still.
+			 *
+			 * Meanwhile, the same task on another CPU is
+			 * doing do_exit(), and once it starts calling
+			 * cgroup_exit() it will wait for css_set_lock
+			 * to be released. Thus we may meet a zombie here.
+			 */
 			/* fails when meets a zombie */
 			pr_err("zombie %d (comm %s) found while seizing\n",
 			       pid, __task_comm_info(pid));
@@ -291,45 +304,38 @@ static int freeze_processes(void)
 			close(fd);
 			return -1;
 		}
-	}
 
-	/*
-	 * There is not way to wait a specified state, so we need to poll the
-	 * freezer.state.
-	 * Here is one extra attempt to check that everything are frozen.
-	 */
-	for (i = 0; i <= nr_attempts; i++) {
-		if (seize_cgroup_tree(opts.freeze_cgroup, state) < 0)
-			goto err;
+		/*
+		 * Wait for the freezer to complete before
+		 * processing tasks. They might be exiting
+		 * before freezing completes, so we should
+		 * not read @tasks pids while the freezer
+		 * is in the transition stage.
+		 */
+		for (i = 0; i <= nr_attempts; i++) {
+			state = get_freezer_state(fd);
+			if (!state) {
+				close(fd);
+				return -1;
+			}
 
-		if (state == frozen)
-			break;
+			if (state == frozen)
+				break;
+			if (alarm_timeouted())
+				goto err;
+			nanosleep(&req, NULL);
+		}
 
-		state = get_freezer_state(fd);
-		if (!state)
+		if (i > nr_attempts) {
+			pr_err("Unable to freeze cgroup %s\n", opts.freeze_cgroup);
 			goto err;
-
-		if (state == frozen) {
-			/*
-			 * Enumerate all tasks one more time to collect all new
-			 * tasks, which can be born while the cgroup is being frozen.
-			 */
-
-			continue;
 		}
 
-		if (alarm_timeouted())
-			goto err;
-		nanosleep(&req, NULL);
+		pr_debug("freezing processes: %lu attempts done\n", i);
 	}
 
-	if (i > nr_attempts) {
-		pr_err("Unable to freeze cgroup %s\n", opts.freeze_cgroup);
-		goto err;
-	}
+	exit_code = seize_cgroup_tree(opts.freeze_cgroup, state);
 
-	pr_debug("freezing processes: %lu attempts done\n", i);
-	exit_code = 0;
 err:
 	if (exit_code == 0 || freezer_thawed) {
 		lseek(fd, 0, SEEK_SET);
-- 
2.7.4