[CRIU] [PATCHv3 2/3] restore: Fix deadlock when helper's child dies

Andrei Vagin avagin at virtuozzo.com
Thu Jul 20 10:39:49 MSK 2017


On Wed, Jul 19, 2017 at 05:02:53PM +0300, Dmitry Safonov wrote:
> Since commit ced9c529f687 ("restore: fix race with helpers' kids dying
> too early"), we block SIGCHLD in helper tasks before CR_STATE_RESTORE.
> This way we avoided default criu sighandler as it doesn't expect that
> childs may die.
> 
> This is very racy as we wait on futex for another stage to be started,
> but the next stage may start only when all the tasks complete previous
> stage. If some children of helper dies, the helper may already have
> blocked SIGCHLD and have started sleeping on the futex. Then the next
> stage never comes and no one reads a pending SIGCHLD for helper.
> 
> A customer met this situation on the node, where the following
> (non-related) problem has occured:
> Unable to send a fin packet: libnet_write_raw_ipv6(): -1 bytes written (Network is unreachable)
> Then child criu of the helper has exited with error-code and the
> lockup has happened.
> 
> While we could fix it by aborting futex in the end of
> restore_task_with_children() for each (non-root also) tasks,
> that would be not completely correct:
> 1. All futex-waiting tasks will wake up after that and they
>    may not expect that some tasks are on the previous stage,
>    so they will spam into logs with unrelated errors and may
>    also die painfully.
> 2. Child may die and miss aborting of the futex due to:
>    o segfault
>    o OOM killer
>    o User-sended SIGKILL
>    o Other error-path we forgot to cover with abort futex
> 
> To fix this deadlock in TASK_HELPER, as suggested-by Kirill,
> let's check if there are children deaths expected - if there
> isn't any, don't block SIGCHLD, otherwise wait() and check if
> death was on expected stage of restore (not CR_STATE_RESTORE).
> 
> Signed-off-by: Dmitry Safonov <dsafonov at virtuozzo.com>
> ---
>  criu/cr-restore.c | 75 +++++++++++++++++++++++++++++++++++++++----------------
>  1 file changed, 53 insertions(+), 22 deletions(-)
> 
> diff --git a/criu/cr-restore.c b/criu/cr-restore.c
> index 9143cd310fed..21c5fbf3eacc 100644
> --- a/criu/cr-restore.c
> +++ b/criu/cr-restore.c
> @@ -1207,6 +1207,58 @@ out:
>  	return ret;
>  }
>  
> +static bool child_death_expected(void)
> +{
> +	struct pstree_item *pi;
> +
> +	list_for_each_entry(pi, &current->children, sibling) {
> +		switch (pi->pid->state) {
> +		case TASK_DEAD:
> +		case TASK_HELPER:
> +			return true;
> +		}
> +	}
> +
> +	return false;
> +}
> +

Could you write a comment for this function?

> +static int restore_one_helper(void)
> +{
> +	sigset_t oldmask;
> +	siginfo_t info;
> +
> +	if (prepare_fds(current))
> +		return -1;
> +
> +	if (!child_death_expected()) {
> +		restore_finish_stage(task_entries, CR_STATE_RESTORE);
> +		return 0;
> +	}
> +
> +	if (block_sigmask(&oldmask, SIGCHLD))

block_sigmask(NULL, SIGCHLD) has to work too, oldmask isn't used
> +		return -1;
> +
> +	/* Finish CR_STATE_RESTORE, but do not wait for the next stage. */
> +	futex_dec_and_wake(&task_entries->nr_in_progress);
> +
> +	if (waitid(P_ALL, 0, &info, WEXITED | WNOWAIT)) {
> +		pr_perror("Failed to wait\n");
> +		return -1;
> +	}
> +
> +	if (futex_get(&task_entries->start) == CR_STATE_RESTORE) {
> +		pr_err("Child %d died too early\n", info.si_pid);
> +		return -1;
> +	}
> +
> +	if (wait_on_helpers_zombies()) {
> +		pr_err("Failed to wait on helpers and zombies\n");
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
>  static int restore_one_task(int pid, CoreEntry *core)
>  {
>  	int i, ret;
> @@ -1218,32 +1270,11 @@ static int restore_one_task(int pid, CoreEntry *core)
>  	else if (current->pid->state == TASK_DEAD)
>  		ret = restore_one_zombie(core);
>  	else if (current->pid->state == TASK_HELPER) {
> -		sigset_t blockmask, oldmask;
> -
> -		if (prepare_fds(current))
> -			return -1;
> -
> -		sigemptyset(&blockmask);
> -		sigaddset(&blockmask, SIGCHLD);
> -
> -		if (sigprocmask(SIG_BLOCK, &blockmask, &oldmask) == -1) {
> -			pr_perror("Can not set mask of blocked signals");
> -			return -1;
> -		}
> -
> -		restore_finish_stage(task_entries, CR_STATE_RESTORE);
> -
> +		ret = restore_one_helper();
>  		close_image_dir();
>  		close_proc();
>  		for (i = SERVICE_FD_MIN + 1; i < SERVICE_FD_MAX; i++)
>  			close_service_fd(i);
> -
> -		if (wait_on_helpers_zombies()) {
> -			pr_err("failed to wait on helpers and zombies\n");
> -			ret = -1;
> -		} else {
> -			ret = 0;
> -		}
>  	} else {
>  		pr_err("Unknown state in code %d\n", (int)core->tc->task_state);
>  		ret = -1;
> -- 
> 2.13.1
> 
> _______________________________________________
> CRIU mailing list
> CRIU at openvz.org
> https://lists.openvz.org/mailman/listinfo/criu


More information about the CRIU mailing list