[CRIU] [PATCH 7/7] restore: handle the case where zombies are reparented

Tycho Andersen tycho.andersen at canonical.com
Wed Jun 29 08:42:39 PDT 2016


On Tue, Jun 28, 2016 at 09:04:08AM -0600, Tycho Andersen wrote:
> On Tue, Jun 28, 2016 at 04:00:50PM +0300, Pavel Emelyanov wrote:
> > On 06/23/2016 06:13 PM, Tycho Andersen wrote:
> > > This commit is a little ugly: we are now passing three lists of pids
> > > (helpers, zombies, and things that are allowed to die) to the restorer blob
> > > for each task. We do need to have some way to distinguish, because:
> > > 
> > > * we want to wait on helpers
> > > * we don't want to wait on zombies (i.e. we want to keep the pid dead), but
> > >   confirm that it actually died
> > > * we don't want to wait on anything that gets reparented, but we don't want
> > >   to error out if we get a SIGCHLD for it
> > > 
> > > We could introduce a new struct:
> > > 
> > > struct pid_info {
> > >   pid_t pid;
> > >   bool  zombie;
> > >   bool  direct_child;
> > > };
> > > 
> > > to handle this instead of the three separate lists. I'm not sure which is
> > > cleaner, but I'd be happy to refactor to the other way if that's better.
> > 
> > What if we count these non-ours zombies into existing zombies? The only difference
> > from this patch I see is that the wait_zombies() would call wait() on it. Is it bad?
> 
> I thought I tried this and ran into some problem, but I don't remember
> what it was. I might have done this before I had the other patch about
> calculating the number of zombies, which probably would have caused
> problems. Anyway, I'll revisit this and figure it out for the next
> series.

Oh, I remember why now: sometimes zombies and helpers won't be
reparented to init, so we can't always wait() for them, but we need to
allow them to die if we do see them. I suppose we could be more clever
here and only collect pids which will be reparented (i.e. zombies
whose parent is itself a helper).
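
A rough sketch of that "more clever" collection, only to make the idea
concrete. The struct task_info table and both function names are
hypothetical and are not CRIU's actual process tree structures; it also
only looks one level up, at the zombie's immediate parent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical per-task record, assumed for illustration only. */
struct task_info {
	pid_t pid;
	pid_t ppid;
	bool zombie;
	bool helper;
};

/* Return true if `pid` names a helper task in the table. */
static bool is_helper(const struct task_info *tasks, size_t n, pid_t pid)
{
	for (size_t i = 0; i < n; i++)
		if (tasks[i].pid == pid)
			return tasks[i].helper;
	return false;
}

/*
 * Copy into `out` the pids of zombies whose parent is a helper, i.e. the
 * ones that will be reparented once that helper exits. `out` must have
 * room for `n` entries. Returns the number of pids written.
 */
static size_t collect_reparented_zombies(const struct task_info *tasks,
					  size_t n, pid_t *out)
{
	size_t count = 0;

	assert(tasks != NULL && out != NULL);
	for (size_t i = 0; i < n; i++)
		if (tasks[i].zombie && is_helper(tasks, n, tasks[i].ppid))
			out[count++] = tasks[i].pid;
	return count;
}
```

A zombie whose parent is a regular task is left out, since that parent
stays alive and the zombie is never reparented.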

Another thing I noticed about this patch is that since helpers wait on
their children, only direct children of init should ever be waited on
in init, so I think we don't need to do the recursive bits for that;
it's only the zombies that cause this problem.
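
As a sketch of the non-recursive check on the init side (again with a
hypothetical struct task_info and function name, and assuming init's pid
is passed in by the caller):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical per-task record, assumed for illustration only. */
struct task_info {
	pid_t pid;
	pid_t ppid;
	bool helper;
};

/*
 * Init only needs to wait on helpers that are its direct children:
 * nested helpers are waited on by their own helper parents, so no
 * walk over the whole tree is needed here.
 */
static bool init_should_wait(const struct task_info *t, pid_t init_pid)
{
	assert(t != NULL);
	return t->helper && t->ppid == init_pid;
}
```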

I'll send an updated patch which will still be fairly ugly, so we can
continue the discussion :)

Tycho

