[CRIU] [PATCH 6/7] restore: fix race with helpers' kids dying too early
Tycho Andersen
tycho.andersen at canonical.com
Tue Jun 28 07:57:52 PDT 2016
On Tue, Jun 28, 2016 at 03:52:49PM +0300, Pavel Emelyanov wrote:
> On 06/23/2016 06:13 PM, Tycho Andersen wrote:
> > We masked off SIGCHLD in wait_on_helpers_zombies(), but in fact this is too
> > late: zombies can die any time after CR_STATE_RESTORE before this function
> > is called, which lead to us getting "unexpected" deaths. Instead, we should
> > mask off SIGCHLD before the helpers finish CR_STATE_RESTORE, since they're
> > explicitly going to wait on all their kids to make sure they don't die
> > anyway.
> >
> > Signed-off-by: Tycho Andersen <tycho.andersen at canonical.com>
> > ---
> > criu/cr-restore.c | 24 ++++++++++--------------
> > 1 file changed, 10 insertions(+), 14 deletions(-)
> >
> > diff --git a/criu/cr-restore.c b/criu/cr-restore.c
> > index 3651f54..771ea50 100644
> > --- a/criu/cr-restore.c
> > +++ b/criu/cr-restore.c
> > @@ -624,15 +624,6 @@ static unsigned long task_entries_pos;
> > static int wait_on_helpers_zombies(void)
> > {
> > struct pstree_item *pi;
> > - sigset_t blockmask, oldmask;
> > -
> > - sigemptyset(&blockmask);
> > - sigaddset(&blockmask, SIGCHLD);
> > -
> > - if (sigprocmask(SIG_BLOCK, &blockmask, &oldmask) == -1) {
> > - pr_perror("Can not set mask of blocked signals");
> > - return -1;
> > - }
> >
> > list_for_each_entry(pi, ¤t->children, sibling) {
> > pid_t pid = pi->pid.virt;
> > @@ -654,11 +645,6 @@ static int wait_on_helpers_zombies(void)
> > }
> > }
> >
> > - if (sigprocmask(SIG_SETMASK, &oldmask, NULL) == -1) {
> > - pr_perror("Can not unset mask of blocked signals");
> > - BUG();
> > - }
> > -
> > return 0;
> > }
> >
> > @@ -746,6 +732,16 @@ static int restore_one_task(int pid, CoreEntry *core)
> > else if (current->pid.state == TASK_DEAD)
> > ret = restore_one_zombie(core);
> > else if (current->pid.state == TASK_HELPER) {
> > + sigset_t blockmask, oldmask;
> > +
> > + sigemptyset(&blockmask);
> > + sigaddset(&blockmask, SIGCHLD);
> > +
> > + if (sigprocmask(SIG_BLOCK, &blockmask, &oldmask) == -1) {
> > + pr_perror("Can not set mask of blocked signals");
> > + return -1;
> > + }
> > +
>
> Would this kill the --unshare pid functionality? The helper_cb below() is supposed
> to pretend to be pidns reaper and has to wait() for all the dying stuff in ns.
I don't think so, the fake init doesn't rely on SIGCHLD, it just keeps
wait()ing in a loop, so it should be ok.
> > restore_finish_stage(CR_STATE_RESTORE);
> > if (rsti(current)->helper_cb)
> > rsti(current)->helper_cb();
> >
>
More information about the CRIU
mailing list