[CRIU] [PATCH v2] seize: Wait the freezer to complete before processing tags

Mon Jul 11 18:32:20 PDT 2016

On Tue, Jul 12, 2016 at 12:26:09AM +0300, Cyrill Gorcunov wrote:
> Currently, when we use cgroup freezer to seize the tasks we start freezer
> and then without waiting the completion of transition procedure we are
> seizing tasks read from freezer @tasks file, using fgets.
> 
> This is fragile construction because fgets uses internal buffer and tasks
> we've read might be exiting same time while we're freezing them,
> the kernel won't freeze these exiting tasks because they are dying
> anyway and I fear we might read a pid here which is not even in
> our cgroup anymore but reused with another out of cgroup task.
> 
> Thus lets do the following: use iterations to freeze tasks waiting
> for freezer to change its state and then collect/seize all tasks
> in one pass.
> 
> For example on container I'm playing with it takes just one iteration
> 
>  | (00.013690) cg: Set 1 is criu one
>  | (00.013705) freezing processes: 1800000 attempst with 100 ms steps
>  | (00.013720) freezer.state=THAWED
>  | (00.013795) freezer.state=FREEZING
>  | (00.113962) freezer.state=FROZEN
>  | (00.113990) freezing processes: 1 attempts done
>  | (00.114073) SEIZE 240893 (comm systemd): success
>  | (00.114110) Warn  (ptrace.c:121): Unable to interrupt task: 240905 (comm kthreadd/1) (Operation not permitted)
>  | (00.114136) Warn  (ptrace.c:121): Unable to interrupt task: 240906 (comm khelper) (Operation not permitted)
>  | (00.114155) SEIZE 240969 (comm screen): success
>  | (00.114166) SEIZE 240970 (comm sendmail): success
>  | (00.114179) SEIZE 240971 (comm sendmail): success
>  | (00.114189) SEIZE 240972 (comm saslauthd): success
>  | (00.114202) SEIZE 240973 (comm crond): success
>  | (00.114211) SEIZE 240974 (comm agetty): success
>  | (00.114221) SEIZE 240975 (comm agetty): success
>  | ...
> 
> https://jira.sw.ru/browse/PSBM-49439
>

Acked-by: Andrew Vagin <avagin at gmail.com>

> Signed-off-by: Cyrill Gorcunov <gorcunov at virtuozzo.com>
> ---
>  criu/seize.c | 66 +++++++++++++++++++++++++++++++++---------------------------
>  1 file changed, 36 insertions(+), 30 deletions(-)
> 
> diff --git a/criu/seize.c b/criu/seize.c
> index ad88ea0..0eb8242 100644
> --- a/criu/seize.c
> +++ b/criu/seize.c
> @@ -135,6 +135,19 @@ static int seize_cgroup_tree(char *root_path, const char *state)
>  			if (stat(buf, &st) == -1 && errno == ENOENT)
>  				continue;
>  
> +			/*
> +			 * When task is exiting while we're freezing
> +			 * it we may hit a contention: freezer tries
> +			 * to freeze the task and send fake signal
> +			 * setting TIF_SIGPENDING assuming the next
> +			 * time the task scheduled for execution it
> +			 * will enter try_to_freeze and stand still.
> +			 *
> +			 * In turn same this task on another cpu is
> +			 * doing do_exit() and once it start calling
> +			 * cgroup_exit it will wait for css_set_lock
> +			 * to relase. Thus we may meet a zombie here
> +			 */
>  			/* fails when meets a zombie */

should we remove the second comment?
>  			pr_err("zombie %d (comm %s) found while seizing\n",
>  			       pid, __task_comm_info(pid));
> @@ -291,45 +304,38 @@ static int freeze_processes(void)
>  			close(fd);
>  			return -1;
>  		}
> -	}
>  
> -	/*
> -	 * There is not way to wait a specified state, so we need to poll the
> -	 * freezer.state.
> -	 * Here is one extra attempt to check that everything are frozen.
> -	 */
> -	for (i = 0; i <= nr_attempts; i++) {
> -		if (seize_cgroup_tree(opts.freeze_cgroup, state) < 0)
> -			goto err;
> +		/*
> +		 * Wait the freezer to complete before
> +		 * processing tasks. They might be exiting
> +		 * before freezing complete so we should
> +		 * not read @tasks pids while freezer in
> +		 * transition stage.
> +		 */
> +		for (i = 0; i <= nr_attempts; i++) {
> +			state = get_freezer_state(fd);
> +			if (!state) {
> +				close(fd);
> +				return -1;
> +			}
>  
> -		if (state == frozen)
> -			break;
> +			if (state == frozen)
> +				break;
> +			if (alarm_timeouted())
> +				goto err;
> +			nanosleep(&req, NULL);
> +		}
>  
> -		state = get_freezer_state(fd);
> -		if (!state)
> +		if (i > nr_attempts) {
> +			pr_err("Unable to freeze cgroup %s\n", opts.freeze_cgroup);
>  			goto err;
> -
> -		if (state == frozen) {
> -			/*
> -			 * Enumerate all tasks one more time to collect all new
> -			 * tasks, which can be born while the cgroup is being frozen.
> -			 */
> -
> -			continue;
>  		}
>  
> -		if (alarm_timeouted())
> -			goto err;
> -		nanosleep(&req, NULL);
> +		pr_debug("freezing processes: %lu attempts done\n", i);
>  	}
>  
> -	if (i > nr_attempts) {
> -		pr_err("Unable to freeze cgroup %s\n", opts.freeze_cgroup);
> -		goto err;
> -	}
> +	exit_code = seize_cgroup_tree(opts.freeze_cgroup, state);
>  
> -	pr_debug("freezing processes: %lu attempts done\n", i);
> -	exit_code = 0;
>  err:
>  	if (exit_code == 0 || freezer_thawed) {
>  		lseek(fd, 0, SEEK_SET);
> -- 
> 2.7.4
>