[CRIU] crash when restoring with current git master?

Tycho Andersen tycho.andersen at canonical.com
Fri Jun 5 06:47:28 PDT 2015


On Fri, Jun 05, 2015 at 11:09:47AM +0300, Pavel Emelyanov wrote:
> On 06/05/2015 03:14 AM, Tycho Andersen wrote:
> > On Thu, Jun 04, 2015 at 04:29:25PM -0600, Tycho Andersen wrote:
> >> On Thu, Jun 04, 2015 at 11:54:54PM +0300, Pavel Emelyanov wrote:
> >>
> >>>>>> +
> >>>>>> +err:
> >>>>>> +	futex_abort_and_wake(&task_entries->nr_in_progress);
> >>>>>> +	return -1;
> >>>>>>  }
> >>>>>
> >>>>> But this thing has never been here. Instead, when a child gets an error it
> >>>>> exits, and then sigchld_handler() runs and calls futex_abort_and_wake().
> >>>>> Why hasn't this logic worked this time?
> >>>>
> >>>> I just got around to looking at this again, and I'm seeing:
> >>>>
> >>>> ShdPnd: 0000000000010000
> >>>> SigBlk: fffffffe7ffbfeff
> >>>>
> >>>> in the parent of the process that died. If my math is right that's the
> >>>> 17th bit, which is SIGCHLD. I don't know enough about why that
> >>>> wouldn't get delivered, though, given SigBlk.
> >>>
> >>> Yes, the SIGCHLD is pending and is blocked too.
> >>
> >> Oh, whoops, yes, the mask goes the other way. Sorry about that :)
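
The mask arithmetic discussed above can be checked with a small helper.
This is only a sketch: mask_has_signal() is a hypothetical name, not
something from CRIU, and it assumes the hex values are the ShdPnd and
SigBlk fields as printed in /proc/<pid>/status, where signal N occupies
bit N-1 counting from the least significant bit.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of CRIU): report whether signal number
 * `signo` is set in a mask such as ShdPnd or SigBlk.
 * Signal N is stored in bit N-1, so SIGCHLD (17 on x86 Linux) is bit 16.
 */
int mask_has_signal(uint64_t mask, int signo)
{
	if (signo < 1 || signo > 64)
		return 0;
	return (int)((mask >> (signo - 1)) & 1);
}
```

With the values from this thread, mask_has_signal(0x0000000000010000ULL, 17)
and mask_has_signal(0xfffffffe7ffbfeffULL, 17) both return 1, i.e. SIGCHLD
is pending and also blocked. The cleared bits in SigBlk include signals 9
and 19 (SIGKILL and SIGSTOP), which cannot be blocked.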
> >>
> >>> When restore starts, CRIU,
> >>> before forking the root task, blocks all signals but SIGCHLD (criu_signals_setup).
> >>> Maybe SIGCHLD was blocked _before_ CRIU started and this block got inherited?
> >>
> >> I just checked and it doesn't look like it (nothing is blocked when
> >> exec()ing criu or just after criu gets exec'd). I'll poke around and
> >> see what I can figure out.
> > 
> > Oh no, I was looking at the wrong thing. It seems this is exactly what
> > happens (LXC blocks SIGCHLD and then forks criu). This makes me wonder
> > how it ever worked and why I never saw this before...
> 
> Presumably you've always had successful restores :D
> Now we know that CRIU had been bug-free until recently :)

Ha, maybe so. I've certainly broken things during development, though.
I guess I've just used up all my luck :(

> > Is there any reason not to apply the attached patch? With this it
> > works for me, and it seems like since criu expects this we should do
> > it.
> 
> > @@ -1255,7 +1255,7 @@ static int criu_signals_setup(void)
> >  	 */
> >  	sigfillset(&blockmask);
> >  	sigdelset(&blockmask, SIGCHLD);
> > -	ret = sigprocmask(SIG_BLOCK, &blockmask, NULL);
> > +	ret = sigprocmask(SIG_SETMASK, &blockmask, NULL);
> >  	if (ret < 0) {
> >  		pr_perror("Can't block signals");
> >  		return -1;
> 
> I like the patch. Please resend it with a good description of why we do this,
> and I'll apply it. And if you need a 1.5-stable update with this, let
> me know.

I don't think we need a stable release for this, but I'll let you know
if that changes. Thanks for the offer.

Tycho

> -- Pavel
> 

