[CRIU] hang when restoring container with zombies

Mon Jul 20 08:31:54 PDT 2015

On Mon, Jul 20, 2015 at 05:53:59PM +0300, Pavel Emelyanov wrote:
> On 07/20/2015 04:22 PM, Andrew Vagin wrote:
> 
> >>>
> >>> I'll try to send a patch with WNOWAIT.
> >>
> >> I guess another option would be to rename zombie_lock to something
> >> else, and just have the helpers block on that lock too. I can do
> >> either, let me know which you prefer.
> > 
> > I vote for waitid. In this case we will be able to remove zombie_lock.
> 
> I agree. But keep in mind that the SIGCHLD handler is also responsible
> for picking up tasks that died due to some errors and propagating this
> error back to the criu main process.

Yes, I think we can be more explicit about this by passing a list of
the zombies to the restorer blob (rather than just assuming that
anything that dies during CR_STATE_RESTORE_SIGCHLD is a zombie) if you
want, similar to how we do it with the helpers. I think we can just
make one list of expected_deaths, use it for both zombies and helpers,
and waitid() on it with WNOHANG | WNOWAIT.
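
Roughly what I have in mind, as a standalone sketch rather than actual
restorer code: struct expected_death, death_is_expected() and
poll_expected_deaths() are names made up for illustration, and the real
list would have to be passed through the restorer args. Note that
waitid() also needs WEXITED for WNOWAIT to be meaningful.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical entry for a task that is allowed to die during restore
 * (a zombie or a helper). This is not an existing CRIU structure. */
struct expected_death {
	pid_t pid;
	bool seen_dead;
};

/* True if pid is on the list, i.e. its death is not an error. A SIGCHLD
 * handler could use this to tell expected deaths from real failures. */
static bool death_is_expected(const struct expected_death *list, size_t n,
			      pid_t pid)
{
	for (size_t i = 0; i < n; i++)
		if (list[i].pid == pid)
			return true;
	return false;
}

/*
 * Peek at every listed pid without blocking (WNOHANG) and without
 * reaping (WNOWAIT), so each zombie stays around for whoever is meant
 * to collect it. Returns how many listed tasks are currently dead and
 * unreaped, or -1 if waitid() fails (e.g. ECHILD for a non-child).
 */
static int poll_expected_deaths(struct expected_death *list, size_t n)
{
	int dead = 0;

	assert(list != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		siginfo_t info;

		/* With WNOHANG, si_pid stays 0 if the task has not exited. */
		memset(&info, 0, sizeof(info));

		if (waitid(P_PID, (id_t)list[i].pid, &info,
			   WEXITED | WNOHANG | WNOWAIT) < 0)
			return -1;

		if (info.si_pid == list[i].pid) {
			list[i].seen_dead = true;
			dead++;
		}
	}

	return dead;
}
```

Since nothing is reaped here, a later waitpid() on the same pid still
succeeds, which is what should let us drop zombie_lock. Any death the
handler sees that is not on the list would still be reported back as an
error.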

Tycho