[CRIU] [PATCH 6/7] restore: fix race with helpers' kids dying too early

Pavel Emelyanov xemul at virtuozzo.com
Tue Jun 28 05:52:49 PDT 2016


On 06/23/2016 06:13 PM, Tycho Andersen wrote:
> We masked off SIGCHLD in wait_on_helpers_zombies(), but in fact this is too
> late: zombies can die any time after CR_STATE_RESTORE before this function
> is called, which lead to us getting "unexpected" deaths. Instead, we should
> mask off SIGCHLD before the helpers finish CR_STATE_RESTORE, since they're
> explicitly going to wait on all their kids to make sure they don't die
> anyway.
> 
> Signed-off-by: Tycho Andersen <tycho.andersen at canonical.com>
> ---
>  criu/cr-restore.c | 24 ++++++++++--------------
>  1 file changed, 10 insertions(+), 14 deletions(-)
> 
> diff --git a/criu/cr-restore.c b/criu/cr-restore.c
> index 3651f54..771ea50 100644
> --- a/criu/cr-restore.c
> +++ b/criu/cr-restore.c
> @@ -624,15 +624,6 @@ static unsigned long task_entries_pos;
>  static int wait_on_helpers_zombies(void)
>  {
>  	struct pstree_item *pi;
> -	sigset_t blockmask, oldmask;
> -
> -	sigemptyset(&blockmask);
> -	sigaddset(&blockmask, SIGCHLD);
> -
> -	if (sigprocmask(SIG_BLOCK, &blockmask, &oldmask) == -1) {
> -		pr_perror("Can not set mask of blocked signals");
> -		return -1;
> -	}
>  
>  	list_for_each_entry(pi, &current->children, sibling) {
>  		pid_t pid = pi->pid.virt;
> @@ -654,11 +645,6 @@ static int wait_on_helpers_zombies(void)
>  		}
>  	}
>  
> -	if (sigprocmask(SIG_SETMASK, &oldmask, NULL) == -1) {
> -		pr_perror("Can not unset mask of blocked signals");
> -		BUG();
> -	}
> -
>  	return 0;
>  }
>  
> @@ -746,6 +732,16 @@ static int restore_one_task(int pid, CoreEntry *core)
>  	else if (current->pid.state == TASK_DEAD)
>  		ret = restore_one_zombie(core);
>  	else if (current->pid.state == TASK_HELPER) {
> +		sigset_t blockmask, oldmask;
> +
> +		sigemptyset(&blockmask);
> +		sigaddset(&blockmask, SIGCHLD);
> +
> +		if (sigprocmask(SIG_BLOCK, &blockmask, &oldmask) == -1) {
> +			pr_perror("Can not set mask of blocked signals");
> +			return -1;
> +		}
> +

Would this kill the --unshare pid functionality? The helper_cb below() is supposed
to pretend to be pidns reaper and has to wait() for all the dying stuff in ns.

>  		restore_finish_stage(CR_STATE_RESTORE);
>  		if (rsti(current)->helper_cb)
>  			rsti(current)->helper_cb();
> 



More information about the CRIU mailing list