[CRIU] crash when restoring with current git master?
Pavel Emelyanov
xemul at parallels.com
Fri Jun 5 01:09:47 PDT 2015
On 06/05/2015 03:14 AM, Tycho Andersen wrote:
> On Thu, Jun 04, 2015 at 04:29:25PM -0600, Tycho Andersen wrote:
>> On Thu, Jun 04, 2015 at 11:54:54PM +0300, Pavel Emelyanov wrote:
>>
>>>>>> +
>>>>>> +err:
>>>>>> + futex_abort_and_wake(&task_entries->nr_in_progress);
>>>>>> + return -1;
>>>>>> }
>>>>>
>>>>> But this thing has never been here. Instead, when a child gets an error it
>>>>> exits, and then the sigchld_handler() runs and does futex_abort_and_wake().
>>>>> Why hasn't this logic worked this time?
>>>>
>>>> I just got around to looking at this again, and I'm seeing:
>>>>
>>>> ShdPnd: 0000000000010000
>>>> SigBlk: fffffffe7ffbfeff
>>>>
>>>> in the parent of the process that died. If my math is right that's the
>>>> 17th bit, which is SIGCHLD. I don't know enough about why that
>>>> wouldn't get delivered, though, given SigBlk.
>>>
>>> Yes, the SIGCHLD is pending and is blocked too.
>>
>> Oh, whoops, yes, the mask goes the other way. Sorry about that :)
>>
>>> When restore starts, CRIU,
>>> before forking the root, blocks all signals but SIGCHLD (criu_signals_setup).
>>> Maybe SIGCHLD was blocked _before_ CRIU started and this block got inherited?
>>
>> I just checked and it doesn't look like it (nothing is blocked when
>> exec()ing criu or just after criu gets exec'd). I'll poke around and
>> see what I can figure out.
>
> Oh no, I was looking at the wrong thing. It seems this is exactly what
> happens (LXC blocks SIGCHLD and then forks criu). This makes me wonder
> how it ever worked and why I never saw this before...
Presumably you've always had successful restores :D
Now we know that CRIU has been bug-free recently :)
> Is there any reason not to apply the attached patch? With this it
> works for me, and it seems like since criu expects this we should do
> it.
> @@ -1255,7 +1255,7 @@ static int criu_signals_setup(void)
>  	 */
>  	sigfillset(&blockmask);
>  	sigdelset(&blockmask, SIGCHLD);
> -	ret = sigprocmask(SIG_BLOCK, &blockmask, NULL);
> +	ret = sigprocmask(SIG_SETMASK, &blockmask, NULL);
>  	if (ret < 0) {
>  		pr_perror("Can't block signals");
>  		return -1;
I like the patch. Please resend it with a good description of why we do this,
and I'll apply it. And if you need a 1.5-stable update with this, let
me know.
-- Pavel