[CRIU] hang when restoring container with zombies
Pavel Emelyanov
xemul at parallels.com
Fri Jul 17 11:20:28 PDT 2015
On 07/17/2015 07:41 PM, Tycho Andersen wrote:
> On Fri, Jul 17, 2015 at 07:15:55PM +0300, Pavel Emelyanov wrote:
>> On 07/17/2015 06:36 PM, Tycho Andersen wrote:
>>> Hi all,
>>>
>>> I'm experiencing a hang when restoring a process with zombies; the
>>> zombies exit, but the parent process (in this case the container's
>>> init) isn't getting the SIGCHLD, so it just gets stuck waiting for
>>> zombies_inprogress. The parent process' /proc/pid/status is below, and
>>> it doesn't seem to be blocking SIGCHLD and there are no pending
>>> signals. I stuck a printf in the sigchld_handler in the restorer blob,
>>> and it does get called for sid helpers, but not for the zombie
>>> processes.
>>>
>>> Does anyone have any ideas about what's going wrong? I have no idea
>>> why the signal would be blocked.
>>
>> Maybe it's not blocked but merged with another (previous) SIGCHLD?
>
> Yep, I just traced it and that's exactly what's happening. So I think
> the right thing to do here is to waitpid() in a loop in the restorer
> blob's sigchld_handler so that we make sure to collect all the
> processes that have died?
No, we can't call waitpid() in pie/restore.c's sigchld_handler(),
since we must _leave_ the zombie in the zombie state :)
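
To illustrate the constraint: because several SIGCHLDs can be merged
into one delivery, the handler cannot count one dead child per signal,
and because a plain waitpid() reaps the child, it cannot be used to
count either. A minimal sketch of one way to count exited children
while leaving them as zombies is waitid() with WNOWAIT. This is not
CRIU's restorer code: the function names below are hypothetical, it
assumes the pids of the expected zombies are known up front, and it
uses libc wrappers, which the restorer blob (as far as I know) can't do.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork n children that exit at once, so that each
 * becomes a zombie until the parent reaps it. Returns 0 on success.
 */
static int spawn_exiting_children(pid_t *pids, size_t n)
{
	assert(pids != NULL);

	for (size_t i = 0; i < n; i++) {
		pid_t pid = fork();

		if (pid < 0)
			return -1;
		if (pid == 0)
			_exit(0);	/* child: die immediately */
		pids[i] = pid;
	}
	return 0;
}

/*
 * Hypothetical helper: count how many of the given children have
 * already exited. WNOWAIT keeps each child in a waitable (zombie)
 * state, and WNOHANG keeps the call from blocking on live children.
 * Since it polls every pid, it does not matter how many SIGCHLDs were
 * merged before it was called.
 */
static int count_zombies_without_reaping(const pid_t *pids, size_t n)
{
	int count = 0;

	assert(pids != NULL);

	for (size_t i = 0; i < n; i++) {
		siginfo_t info;

		/* With WNOHANG, si_pid stays 0 if the child is still running. */
		memset(&info, 0, sizeof(info));
		if (waitid(P_PID, pids[i], &info,
			   WEXITED | WNOHANG | WNOWAIT) == 0 &&
		    info.si_pid == pids[i])
			count++;
	}
	return count;
}
```

Calling count_zombies_without_reaping() from a SIGCHLD handler (or
after it) gives the number of dead children seen so far, and calling it
a second time gives the same number, since nothing was reaped.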
> If that sounds ok, I'll send a patch.
>
> Tycho
> .
>