[CRIU] hang when restoring container with zombies

Mon Jul 20 05:16:16 PDT 2015

On Fri, Jul 17, 2015 at 09:20:28PM +0300, Pavel Emelyanov wrote:
> On 07/17/2015 07:41 PM, Tycho Andersen wrote:
> > On Fri, Jul 17, 2015 at 07:15:55PM +0300, Pavel Emelyanov wrote:
> >> On 07/17/2015 06:36 PM, Tycho Andersen wrote:
> >>> Hi all,
> >>>
> >>> I'm experiencing a hang when restoring a process with zombies; the
> >>> zombies exit, but the parent process (in this case the container's
> >>> init) isn't getting the SIGCHLD, so it just gets stuck waiting for
> >>> zombies_inprogress. The parent process' /proc/pid/status is below, and
> >>> it doesn't seem to be blocking SIGCHLD and there are no pending
> >>> signals. I stuck a printf in the sigchld_handler in the restorer blob,
> >>> and it does get called for sid helpers, but not for the zombie
> >>> processes.
> >>>
> >>> Does anyone have any ideas about what's going wrong? I have no idea
> >>> why the signal would be blocked.
> >>
> >> Maybe it's not blocked but merged with another (previous) SIGCHLD?
> > 
> > Yep, I just traced it and that's exactly what's happening. So I think
> > the right thing to do here is to waitpid() in a loop in the restorer
> > blob's sigchld_handler so that we make sure to collect all the
> > processes that have died?
> 
> No, we can't call waitpid() in pie/restore.c's sigchld_handler()
> since we must _leave_ the zombie in zombie state :)

We can try to use waitid() with WNOWAIT. From the man page:
"""
WNOWAIT     Leave the child in a waitable state; a later wait call can
            be used to again retrieve the child status information.
"""
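
To make the idea concrete, here is a minimal standalone sketch (not CRIU
code; peek_then_reap() is a hypothetical name used only for illustration).
It forks a few children that exit immediately, observes each exit with
waitid(..., WEXITED | WNOWAIT), and then shows that the zombie is still
there by reaping it with an ordinary waitpid(). It waits on each known pid
with P_PID rather than P_ALL, on the assumption that a P_ALL call with
WNOWAIT may keep returning the same unreaped child. Since several SIGCHLDs
can be merged into one, a handler would need to check every expected pid
like this instead of counting signals.

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 64

/*
 * Hypothetical helper, not taken from CRIU: fork n children that exit at
 * once, peek at each exit status with WNOWAIT, then reap for real.
 * Returns how many children were seen by both the peek and the reap with
 * the same exit status, or -1 on error.
 */
int peek_then_reap(int n)
{
	pid_t pids[MAX_CHILDREN];
	int i, seen_twice = 0;

	if (n < 0 || n > MAX_CHILDREN)
		return -1;

	for (i = 0; i < n; i++) {
		pids[i] = fork();
		if (pids[i] < 0)
			return -1;
		if (pids[i] == 0)
			_exit(i);	/* child: exit status is its index */
	}

	for (i = 0; i < n; i++) {
		siginfo_t info;
		int status;

		/* Blocks until the child has exited, but leaves it a zombie. */
		if (waitid(P_PID, pids[i], &info, WEXITED | WNOWAIT) < 0)
			return -1;
		if (info.si_pid != pids[i] || info.si_code != CLD_EXITED)
			return -1;

		/* The zombie is still waitable, so a normal reap succeeds. */
		if (waitpid(pids[i], &status, 0) != pids[i])
			return -1;
		if (WIFEXITED(status) && WEXITSTATUS(status) == info.si_status)
			seen_twice++;
	}

	return seen_twice;
}
```

Calling peek_then_reap(3) from a small main() should return 3 if WNOWAIT
behaves as the man page describes. In the restorer the second step (the
real waitpid()) would be skipped, so the zombies stay in place.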

> 
> > If that sounds ok, I'll send a patch.
> > 
> > Tycho
> > .
> > 
> 