[CRIU] hang when restoring container with zombies

Mon Jul 20 08:39:29 PDT 2015

2015-07-20 18:31 GMT+03:00 Tycho Andersen <tycho.andersen at canonical.com>:
> On Mon, Jul 20, 2015 at 05:53:59PM +0300, Pavel Emelyanov wrote:
>> On 07/20/2015 04:22 PM, Andrew Vagin wrote:
>>
>> >>>
>> >>> I'll try to send a patch with WNOWAIT.
>> >>
>> >> I guess another option would be to rename zombie_lock to something
>> >> else, and just have the helpers block on that lock too. I can do
>> >> either, let me know which you prefer.
>> >
>> > I vote for waitid. In this case we will be able to remove zombie_lock.
>>
>> I agree. But keep in mind that the sigchild handler is also responsible
>> for picking up tasks that died due to some errors and propagating this
>> error back to the criu main process.
>
> Yes, I think we can be more explicit about this by passing a list of
> the zombies to the restorer blob (vs. just assuming anything during
> CR_STATE_RESTORE_SIGCHILD that dies is a zombie) if you want, sort of
> how we do it with the helpers. I think we can just make one list of
> expected_deaths and use that for zombies and helpers and waitid() on
> it with WNOHANG | WNOWAIT.

We need to use WNOWAIT only for zombies. And I don't understand why
we need to use WNOHANG if we want to wait for events.

>
> Tycho