[CRIU] hang when restoring container with zombies

Tycho Andersen tycho.andersen at canonical.com
Fri Jul 17 09:41:01 PDT 2015


On Fri, Jul 17, 2015 at 07:15:55PM +0300, Pavel Emelyanov wrote:
> On 07/17/2015 06:36 PM, Tycho Andersen wrote:
> > Hi all,
> > 
> > I'm experiencing a hang when restoring a process with zombies; the
> > zombies exit, but the parent process (in this case the container's
> > init) isn't getting the SIGCHLD, so it just gets stuck waiting for
> > zombies_inprogress. The parent process' /proc/pid/status is below, and
> > it doesn't seem to be blocking SIGCHLD and there are no pending
> > signals. I stuck a printf in the sigchld_handler in the restorer blob,
> > and it does get called for sid helpers, but not for the zombie
> > processes.
> > 
> > Does anyone have any ideas about what's going wrong? I have no idea
> > why the signal would be blocked.
> 
> Maybe it's not blocked but merged with another (previous) sigchild?

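Yep, I just traced it and that's exactly what's happening: SIGCHLD is a
standard signal and isn't queued, so several children exiting close
together can produce only one handler call. So I think the right thing
to do here is to call waitpid() in a loop in the restorer blob's
sigchld_handler, so that we collect all the processes that have died?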

If that sounds ok, I'll send a patch.

Tycho

