[CRIU] crash in pb_read_one?

Pavel Emelyanov xemul at parallels.com
Tue Sep 16 12:42:17 PDT 2014


On 09/16/2014 11:32 PM, Tycho Andersen wrote:
> Hi Pavel,
> 
> On Tue, Sep 16, 2014 at 10:07:52PM +0400, Pavel Emelyanov wrote:
>> On 09/16/2014 09:44 PM, Tycho Andersen wrote:
>>> Hi Pavel,
>>>
>>> On Tue, Sep 16, 2014 at 12:02:19PM -0500, Tycho Andersen wrote:
>>>>>
>>>>> Hm... This should happen somewhere strictly after all files from this
>>>>> helper have been opened. This can be pretty well determined by the
>>>>> remap->users count. Next, when creating such helpers we can feed
>>>>> 0 into clone flag's exit_signal field, thus causing this particular
>>>>> child to auto-reap, so once the remap->users count hits zero we
>>>>> can just shoot it with SIGKILL.
>>>>
>>>> Ah, that sounds like a better approach. Actually I don't think we need
>>>> to shoot it, we can just synchronize it to the end of the RESTORE
>>>> stage and it should Just Work. I will give that a try, seems much
>>>> cleaner than messing around with rst memory.
>>
>> Hm... Then we don't need the users counter as well. Just auto-reap.
>>
>>> Actually it looks like the clone flags for the helpers are 0, but they
>>> still aren't auto-reaped when they exit (i.e. they are zombies, which
>>> need a wait() call). What am I missing?
>>
>> ret = clone(restore_task_with_children, ca.stack_ptr,
>>                         ca.clone_flags | SIGCHLD, &ca);
>>
>> This "| SIGCHLD" breaks auto-reap.
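
For reference, a minimal sketch of how the exit signal in the low byte of
the clone() flags affects waiting. It assumes Linux with glibc; helper_fn
and plain_wait_result are hypothetical names for illustration, not CRIU
code. Per clone(2) and wait(2), a child cloned with exit signal 0 sends no
signal to its parent and counts as a "clone child": a plain waitpid() does
not see it, while waitpid() with __WCLONE (or __WALL) does. Whether that
matches the "auto-reap" behaviour wanted here is worth checking on the
target kernel.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical stand-in for a restore helper: it exits immediately. */
static int helper_fn(void *arg)
{
	(void)arg;
	return 0;
}

/*
 * Clone a helper whose exit signal is `exit_signal` (the low byte of the
 * clone flags; no CLONE_* flags are set, so the child is fork-like).
 * Then try a plain waitpid() on it.
 *
 * Returns 0 if the plain waitpid() collected the child, the errno of the
 * plain waitpid() if it failed but __WCLONE then collected the child,
 * or -1 on any other failure.
 */
static int plain_wait_result(int exit_signal)
{
	const size_t stack_size = 64 * 1024;
	char *stack = malloc(stack_size);
	int result = 0;
	pid_t pid;

	if (!stack)
		return -1;

	/* The stack grows down on the usual architectures, so pass its top. */
	pid = clone(helper_fn, stack + stack_size, exit_signal, NULL);
	if (pid < 0) {
		free(stack);
		return -1;
	}

	if (waitpid(pid, NULL, 0) < 0) {
		result = errno;
		/* Not visible to a plain wait: collect it as a clone child. */
		if (waitpid(pid, NULL, __WCLONE) < 0)
			result = -1;
	}

	free(stack);
	return result;
}
```

With the default SIGCHLD disposition, plain_wait_result(SIGCHLD) should
return 0 and plain_wait_result(0) should return ECHILD, i.e. the second
child is only collectable through __WCLONE.
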
> 
> When I do this I get something like,
> 
> pie: 5: Collect a zombie with (pid 17, 17)
> 
> in the log. I think this means it is working, but that we still need
> to pass down the helper PIDs so that we can ignore them when they are

:( Can we make all these helpers children of the root task, so that
this list is needed only for the root task?

> reaped by the restorer blob's handler. Also, isn't there a race where,
> if the restore finishes entirely before the helper actually dies,
> the restored process gets a SIGCHLD?

We've solved this with stages. I can't tell you the full story, it
was quite a while ago :) but the final staging we have right now
does prevent restored tasks from seeing "wrong" handlers or
alien signals.

> I think I am seeing something like this in the session00 test.
> 
> Tycho
> .
> 


