[CRIU] remap_dead_pid test hang?
Tycho Andersen
tycho.andersen at canonical.com
Thu Mar 19 15:20:33 PDT 2015
On Fri, Mar 20, 2015 at 12:59:43AM +0300, Pavel Emelyanov wrote:
> On 03/20/2015 12:42 AM, Tycho Andersen wrote:
> > On Thu, Mar 19, 2015 at 10:05:09PM +0300, Pavel Emelyanov wrote:
> >> On 03/19/2015 10:01 PM, Pavel Emelyanov wrote:
> >>> On 03/19/2015 09:54 PM, Tycho Andersen wrote:
> >>>> On Thu, Mar 19, 2015 at 07:36:41PM +0300, Pavel Emelyanov wrote:
> >>>>> On 03/19/2015 05:54 PM, Tycho Andersen wrote:
> >>>>>> Hi all,
> >>>>>>
> >>>>>> While testing the cgroup property restore patch, I noticed that the
> >>>>>> remap_dead_pid test seems to hang both with my patch and the current
> >>>>>> master. Has anyone else noticed this?
> >>>>>
> >>>>> No. We have Jenkins running the full zdtm suite (and more) around the
> >>>>> clock, and though it fails sometimes, we haven't hit any issues with
> >>>>> this particular test.
> >>>>>
> >>>>> Does the test hang by itself, or after restore?
> >>>>
> >>>> How do I tell? It looks like it's just hung:
> >>>>
> >>>> root 32749 0.0 0.1 11652 3292 pts/1 S 18:49 0:00 /bin/bash ./zdtm.sh --ct -r static/remap_dead_pid
> >>>> root 314 0.0 0.0 4196 644 pts/1 S 18:49 0:00 \_ ./zdtm_ct ./zdtm.sh -r static/remap_dead_pid
> >>>> root 370 0.0 0.1 11752 3476 ? Ss 18:49 0:00 \_ /bin/bash ./zdtm.sh -r static/remap_dead_pid
> >>>> root 15963 0.0 0.1 11752 2372 ? S 18:49 0:00 \_ /bin/bash ./zdtm.sh -r static/remap_dead_pid
> >>>> root 15964 0.0 0.1 8464 2332 ? S 18:49 0:00 \_ make -C zdtm/live/static remap_dead_pid.pid
> >>>> 18943 17937 0.0 0.0 4492 816 ? S 18:49 0:00 \_ ./remap_dead_pid --pidfile=remap_dead_pid.pid --outfile=remap_dead_pid.out
> >>>> 18943 17940 0.0 0.0 4492 104 ? Ss 18:49 0:00 \_ ./remap_dead_pid --pidfile=remap_dead_pid.pid --outfile=remap_dead_pid.out
> >>>> 18943 17941 0.0 0.0 4492 1136 ? S 18:49 0:00 \_ ./remap_dead_pid --pidfile=remap_dead_pid.pid --outfile=remap_dead_pid.out
> >>>
> >>> Hm... :) There should be a directory where criu is supposed to put its output
> >>> (logs and images), the test/dump/ one if I'm not mistaken. Then you can check
> >>> whether the remap_dead_pid directory contains dump.log or restore.log to find
> >>> out whether either of those actions took place.
> >>>
> >>
> >> Ah, no, what am I saying?! Sorry, of course there haven't been any dumps
> >> yet, the make $test.pid is still there. And it's not finished because the
> >> process 816 hasn't exited. Yet. Can you strace them all to check what
> >> they are currently doing?
> >
> > Yes, 17940 was in waitpid4, and 17941 was in nanosleep. I guess that
> > means the kill() in that test failed somehow?
>
> Looks like yes. What bothers me is that the task is killed with SIGINT,
> not SIGKILL. Is this signal blocked or ignored by 17941? Can you check
> it in /proc/pid/status?
Doesn't look like it,

http://paste.ubuntu.com/10630380/

However, when I switch it to SIGKILL it does seem to succeed every
time for me.

Tycho
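
For reference, the /proc/pid/status check discussed above can be scripted. This is only a rough sketch in Python, not anything taken from the zdtm suite: it reads the SigBlk, SigIgn and SigCgt fields (hexadecimal bitmasks in which bit N-1 stands for signal N) and reports whether SIGINT is set in each. The helper names signal_masks and has_signal are made up for illustration, and the example inspects its own pid rather than 17941.

```python
import os
import signal


def signal_masks(pid):
    """Return the SigBlk, SigIgn and SigCgt masks of a process as integers."""
    masks = {}
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            key, _, value = line.partition(":")
            if key in ("SigBlk", "SigIgn", "SigCgt"):
                # The kernel prints each mask as a hexadecimal string.
                masks[key] = int(value.strip(), 16)
    return masks


def has_signal(mask, signum):
    """Bit (signum - 1) of the mask corresponds to signal number signum."""
    return bool(mask & (1 << (signum - 1)))


if __name__ == "__main__":
    # Inspect this process; pass another pid to check a different task.
    for name, mask in signal_masks(os.getpid()).items():
        print(name, "contains SIGINT:", has_signal(mask, signal.SIGINT))
```

If SIGINT shows up in SigBlk or SigIgn of the target task, a plain kill(pid, SIGINT) would not terminate it, whereas SIGKILL can be neither blocked nor ignored.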
> > Is there a race there that I am missing?
>
> Shouldn't be. The code is pretty straightforward -- fork, kill then wait.
>
> Thanks,
> Pavel
>
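
To compare SIGINT and SIGKILL outside the test suite, the "fork, kill then wait" sequence can be reproduced in a few lines. The following is a hedged Python sketch, not the actual remap_dead_pid code (which is a C test); fork_kill_wait is a hypothetical name. Because the Python interpreter normally installs its own SIGINT handler, the child first restores the default disposition and unblocks the signal, then tells the parent over a pipe that it is ready. That handshake is an addition of this sketch and is not part of the sequence described above.

```python
import os
import signal
import time


def fork_kill_wait(signum):
    """Fork a sleeping child, send it signum, and return its wait status."""
    ready_r, ready_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(ready_r)
            if signum not in (signal.SIGKILL, signal.SIGSTOP):
                # Make sure the signal is neither handled nor blocked.
                signal.signal(signum, signal.SIG_DFL)
                signal.pthread_sigmask(signal.SIG_UNBLOCK, {signum})
            os.write(ready_w, b"x")  # tell the parent we are ready
            while True:
                time.sleep(1)  # sleep until the signal arrives
        finally:
            os._exit(1)  # never fall back into the parent's code path

    os.close(ready_w)
    os.read(ready_r, 1)  # wait for the child's handshake
    os.close(ready_r)
    os.kill(pid, signum)
    _, status = os.waitpid(pid, 0)
    return status


if __name__ == "__main__":
    for sig in (signal.SIGINT, signal.SIGKILL):
        status = fork_kill_wait(sig)
        print(sig.name, "terminated child:",
              os.WIFSIGNALED(status) and os.WTERMSIG(status) == sig)
```

If the waitpid() call never returns for SIGINT but does for SIGKILL, that would point at the signal being blocked, ignored or handled in the child rather than at the wait itself.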