[CRIU] crash when restoring with current git master?

Pavel Emelyanov xemul at parallels.com
Fri Jun 5 01:09:47 PDT 2015


On 06/05/2015 03:14 AM, Tycho Andersen wrote:
> On Thu, Jun 04, 2015 at 04:29:25PM -0600, Tycho Andersen wrote:
>> On Thu, Jun 04, 2015 at 11:54:54PM +0300, Pavel Emelyanov wrote:
>>
>>>>>> +
>>>>>> +err:
>>>>>> +	futex_abort_and_wake(&task_entries->nr_in_progress);
>>>>>> +	return -1;
>>>>>>  }
>>>>>
>>>>> But this thing has never been here. Instead, when a child gets an error it
>>>>> exits, and then sigchld_handler() runs and does futex_abort_and_wake().
>>>>> Why hasn't this logic worked this time?
>>>>
>>>> I just got around to looking at this again, and I'm seeing:
>>>>
>>>> ShdPnd: 0000000000010000
>>>> SigBlk: fffffffe7ffbfeff
>>>>
>>>> in the parent of the process that died. If my math is right that's the
>>>> 17th bit, which is SIGCHLD. I don't know enough about why that
>>>> wouldn't get delivered, though, given SigBlk.
>>>
>>> Yes, the SIGCHLD is pending and is blocked too.
>>
>> Oh, whoops, yes, the mask goes the other way. Sorry about that :)
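
The mask arithmetic discussed above can be checked with a small helper. This is only a sketch: it assumes the values are the hex masks that Linux prints in /proc/<pid>/status, where signal N is bit N-1, and that SIGCHLD is signal 17 (true on Linux/x86, other architectures may differ). The name mask_has_signal() is hypothetical and not part of CRIU.

```c
#include <stdint.h>

/*
 * Signals are numbered from 1, so signal N corresponds to bit N-1 of
 * the ShdPnd/SigBlk hex masks. Returns 1 if the bit is set, else 0.
 * Valid for signal numbers 1..64.
 */
static int mask_has_signal(uint64_t mask, int signo)
{
	return (int)((mask >> (signo - 1)) & 1);
}
```

With the two values quoted above, mask_has_signal(0x0000000000010000ULL, 17) and mask_has_signal(0xfffffffe7ffbfeffULL, 17) are both 1, which matches the reading that SIGCHLD is pending and also blocked.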
>>
>>> When restore starts, CRIU,
>>> before forking the root, blocks all signals but SIGCHLD (criu_signals_setup).
>>> Maybe SIGCHLD was blocked _before_ CRIU started and this block got inherited?
>>
>> I just checked and it doesn't look like it (nothing is blocked when
>> exec()ing criu or just after criu gets exec'd). I'll poke around and
>> see what I can figure out.
> 
> Oh no, I was looking at the wrong thing. It seems this is exactly what
> happens (LXC blocks SIGCHLD and then forks criu). This makes me wonder
> how it ever worked and why I never saw this before...

Presumably you've always had successful restores :D
Now we know that CRIU had been bug-free until recently :)

> Is there any reason not to apply the attached patch? With this it
> works for me, and it seems like since criu expects this we should do
> it.

> @@ -1255,7 +1255,7 @@ static int criu_signals_setup(void)
>  	 */
>  	sigfillset(&blockmask);
>  	sigdelset(&blockmask, SIGCHLD);
> -	ret = sigprocmask(SIG_BLOCK, &blockmask, NULL);
> +	ret = sigprocmask(SIG_SETMASK, &blockmask, NULL);
>  	if (ret < 0) {
>  		pr_perror("Can't block signals");
>  		return -1;
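
A minimal sketch of why the one-word change matters, assuming a caller that blocked SIGCHLD before starting the process (as LXC does in this thread). SIG_BLOCK only adds signals to the inherited mask, so an inherited SIGCHLD block survives; SIG_SETMASK replaces the mask outright. The function name sigchld_blocked_after() is hypothetical and this is not CRIU code; only the mask construction mirrors the quoted hunk.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/*
 * Returns 1 if SIGCHLD is still blocked after applying the
 * "everything but SIGCHLD" mask with the given `how`, 0 if it is
 * unblocked, -1 on error. Intended for a single-threaded process.
 */
static int sigchld_blocked_after(int how)
{
	sigset_t inherited, blockmask, saved, now;
	int blocked = -1;

	/* Simulate a parent that blocked SIGCHLD before fork/exec. */
	sigemptyset(&inherited);
	sigaddset(&inherited, SIGCHLD);
	if (sigprocmask(SIG_BLOCK, &inherited, &saved) < 0)
		return -1;

	/* Same mask construction as in the quoted hunk. */
	sigfillset(&blockmask);
	sigdelset(&blockmask, SIGCHLD);
	if (sigprocmask(how, &blockmask, NULL) == 0 &&
	    sigprocmask(SIG_BLOCK, NULL, &now) == 0)	/* NULL set: query only */
		blocked = sigismember(&now, SIGCHLD);

	/* Restore the caller's original mask. */
	sigprocmask(SIG_SETMASK, &saved, NULL);
	return blocked;
}
```

Under these assumptions, sigchld_blocked_after(SIG_BLOCK) returns 1 and sigchld_blocked_after(SIG_SETMASK) returns 0, which is consistent with the pending-but-undelivered SIGCHLD seen before the patch.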

I like the patch. Please add a good description of why we do this,
and I'll apply it. And if you need a 1.5-stable update with this, let
me know.

-- Pavel


