[CRIU] crash when restoring with current git master?

Tycho Andersen tycho.andersen at canonical.com
Thu Jun 4 09:52:01 PDT 2015


On Fri, Apr 24, 2015 at 10:40:39PM +0300, Pavel Emelyanov wrote:
> On 04/24/2015 09:41 PM, Tycho Andersen wrote:
> > On Fri, Apr 24, 2015 at 09:15:46PM +0300, Cyrill Gorcunov wrote:
> >> On Fri, Apr 24, 2015 at 11:53:41AM -0600, Tycho Andersen wrote:
> >>> Hi all,
> >>>
> >>> When doing a c/r of lxc containers (vivid host, vivid container), I'm
> >>> getting a crash sometimes. It looks like something is dying in the
> >>> restorer blob:
> >>>
> >>> http://paste.ubuntu.com/10880188/
> >>>
> >>> Has anyone seen this? Any idea what's causing it?
> >>
> >> Haven't seen anything like this before. But something definitely goes wrong in cgroups, at least:
> >>
> >>  | (00.084962)    333: Error (files-reg.c:1055): File sys/fs/cgroup/perf_event/lxc/u2-30/cgroup.procs has bad size 17 (expect 11)
> >>
> >> Did someone write another task there?
> > 
> > Ah, whoops. I missed this somehow in the log. That's probably what's
> > happening. Note that criu hangs on this error; the attached patch
> > fixes the hang.
> 
> > +
> > +err:
> > +	futex_abort_and_wake(&task_entries->nr_in_progress);
> > +	return -1;
> >  }
> 
> But this has never been here. Instead, when a child gets an error it
> exits, and then sigchld_handler() runs and does futex_abort_and_wake().
> Why hasn't this logic worked this time?

I just got around to looking at this again, and I'm seeing:

ShdPnd: 0000000000010000
SigBlk: fffffffe7ffbfeff

in the parent of the process that died. If my math is right, that's the
17th bit (signal 17), which is SIGCHLD. I don't know enough to say why
that wouldn't get delivered, though, given SigBlk.
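
For anyone who wants to check the bit math, here is a minimal sketch that
decodes the two masks above. It assumes the usual /proc/<pid>/status
convention that bit n (counting from zero) stands for signal n + 1, and it
hardcodes a few x86 Linux signal numbers; the helper names are made up for
illustration and are not part of CRIU.

```python
# Signal numbers for x86 Linux (assumption; other architectures differ).
X86_LINUX_SIGNALS = {9: "SIGKILL", 17: "SIGCHLD", 19: "SIGSTOP"}


def decode_sigmask(hex_mask):
    """Return the signal numbers set in a /proc/<pid>/status hex mask."""
    mask = int(hex_mask, 16)
    # Bit n (counting from zero) corresponds to signal number n + 1.
    return [bit + 1 for bit in range(64) if mask & (1 << bit)]


def describe(signums):
    """Map signal numbers to names where known, else keep the number."""
    return [X86_LINUX_SIGNALS.get(n, str(n)) for n in signums]


pending = decode_sigmask("0000000000010000")   # ShdPnd from above
blocked = decode_sigmask("fffffffe7ffbfeff")   # SigBlk from above
not_blocked = [n for n in range(1, 65) if n not in blocked]

print("pending:", describe(pending))
print("SIGCHLD blocked:", 17 in blocked)
print("not blocked:", describe(not_blocked))
```

With these two values, signal 17 shows up both as pending and in the
blocked set. A blocked signal stays pending until it is unblocked, which
would be consistent with the handler not running, though it doesn't say
what blocked it or why.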

Tycho

