[CRIU] [PATCH] restore: helpers and zombies should collect their children

Tycho Andersen tycho.andersen at canonical.com
Mon Mar 21 07:51:04 PDT 2016


On Mon, Mar 21, 2016 at 09:44:02AM +0300, Pavel Emelyanov wrote:
> On 03/17/2016 05:05 PM, Tycho Andersen wrote:
> > Consider when there is a double fork of helpers or zombies, e.g. when a
> > zombie has a session id which doesn't match its pid.
> 
> That's the
> 
> task
>  `- helper
>      `- zombie
> 
> case.
> 
> > If the child dies and
> > exits before the grandchild, the grandchild reparents to init, and when the
> > task dies init doesn't have it in the helper list, so init dies as well,
> > viz. the log below.
> > 
> > 
> > Let's have the zombies and helpers reap their children before they exit to
> > avoid this.
> 
> But how about
> 
> task
>  `- zombie
>      `- helper
> 
> one? How can _this_ happen?

Ah, I don't think it can :)

> And some more comments inline.
> 
> > v2: block SIGCHLD when waiting on helpers so that it doesn't race with the
> >     SICGHLD handler
> > 
> > Signed-off-by: Tycho Andersen <tycho.andersen at canonical.com>
> > ---
> > Full log is available at: http://paste.ubuntu.com/15396732/
> > ---
> >  criu/cr-restore.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 48 insertions(+), 1 deletion(-)
> > 
> > diff --git a/criu/cr-restore.c b/criu/cr-restore.c
> > index 30ddff9..7c03190 100644
> > --- a/criu/cr-restore.c
> > +++ b/criu/cr-restore.c
> > @@ -972,6 +972,46 @@ static inline int sig_fatal(int sig)
> >  struct task_entries *task_entries;
> >  static unsigned long task_entries_pos;
> >  
> > +static int wait_on_helpers_zombies(void)
> > +{
> > +	struct pstree_item *pi;
> > +	sigset_t blockmask, oldmask;
> > +
> > +	sigemptyset(&blockmask);
> > +	sigaddset(&blockmask, SIGCHLD);
> > +
> > +	if (sigprocmask(SIG_BLOCK, &blockmask, &oldmask) == -1) {
> > +		pr_perror("Can not set mask of blocked signals");
> > +		return -1;
> > +	}
> > +
> > +	list_for_each_entry(pi, &current->children, sibling) {
> > +		pid_t pid = pi->pid.virt;
> > +		int status;
> > +
> > +		switch (pi->state) {
> > +		case TASK_DEAD:
> > +			if (waitid(P_PID, pid, NULL, WNOWAIT | WEXITED) < 0) {
> 
> Why P_PID?

Because we want to wait on exactly this pid that we pass and not any
others (this is the same thing we do in pie/restorer.c).

> > +				pr_perror("Wait on %d zombie failed\n", pid);
> > +				return -1;
> > +			}
> > +			futex_dec_and_wake(&task_entries->nr_in_progress);
> 
> Hasn't the dead guy decremented this value himself?

Yes, you're right. I'll send a revised version with this and the fix
above.

Tycho

> > +		case TASK_HELPER:
> > +			if (waitpid(pid, &status, 0) != pid) {
> > +				pr_perror("waitpid for helper %d failed", pid);
> > +				return -1;
> > +			}
> > +		}
> > +	}
> > +
> > +	if (sigprocmask(SIG_SETMASK, &oldmask, NULL) == -1) {
> > +		pr_perror("Can not unset mask of blocked signals");
> > +		BUG();
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> >  static int restore_one_zombie(CoreEntry *core)
> >  {
> >  	int exit_code = core->tc->exit_code;
> > @@ -985,6 +1025,8 @@ static int restore_one_zombie(CoreEntry *core)
> >  
> >  	if (task_entries != NULL) {
> >  		restore_finish_stage(CR_STATE_RESTORE);
> > +		if (wait_on_helpers_zombies())
> > +			pr_err("failed to wait on helpers and zombies\n");
> >  		zombie_prepare_signals();
> >  	}
> >  
> > @@ -1057,7 +1099,12 @@ static int restore_one_task(int pid, CoreEntry *core)
> >  		ret = restore_one_zombie(core);
> >  	else if (current->state == TASK_HELPER) {
> >  		restore_finish_stage(CR_STATE_RESTORE);
> > -		ret = 0;
> > +		if (wait_on_helpers_zombies()) {
> > +			pr_err("failed to wait on helpers and zombies\n");
> > +			ret = -1;
> > +		} else {
> > +			ret = 0;
> > +		}
> >  	} else {
> >  		pr_err("Unknown state in code %d\n", (int)core->tc->task_state);
> >  		ret = -1;
> > 
> 


More information about the CRIU mailing list