[CRIU] hang in ip tool
Tycho Andersen
tycho.andersen at canonical.com
Mon Sep 8 07:09:01 PDT 2014
On Mon, Sep 08, 2014 at 05:41:29PM +0400, Pavel Emelyanov wrote:
> On 09/08/2014 05:17 PM, Tycho Andersen wrote:
> > On Mon, Sep 08, 2014 at 02:53:36PM +0400, Pavel Emelyanov wrote:
> >> On 09/05/2014 11:22 PM, Tycho Andersen wrote:
> >>> Hi all,
> >>>
> >>> On Wed, Sep 03, 2014 at 06:45:39PM +0400, Pavel Emelyanov wrote:
> >>>> On 09/03/2014 05:52 PM, Tycho Andersen wrote:
> >>>>> Hi all,
> >>>>>
> >>>>> Recently when restoring containers I have been getting a hang in the
> >>>>> ip tool when criu runs 'ip addr restore'. It prints,
> >>>>>
> >>>>> RTNETLINK answers: File exists
> >>>>> RTNETLINK answers: File exists
> >>>>
> >>>> I've seen such when it was putting 127.0.0.1 on lo which was
> >>>> already there "automatically" (some kernels seem to do it by
> >>>> default).
> >>>>
> >>>>> and then seems to hang. Has anyone seen this behavior, or any ideas
> >>>>> what the problem is?
> >>>>
> >>>> Hanging is something new to me, I've never seen it. Can you
> >>>> strace it to check where the problem is?
> >>>
> >>> So this issue did just resurface and it turns out that it's not
> >>> actually hanging in ip tool, it is hanging in cr_system in the
> >>> sigprocmask call where it is resetting the mask. As I write this, it
> >>> seems to have gone away again. Any ideas what might cause this?
> >>
> >> Hm... I've never seen a process hanging in procmask reset. Can you
> >> check the /proc/pid/stack file when it hangs for exact in-kernel
> >> calltrace?
> >
> > Yes,
> >
> > # the first criu process
> > criu2:~ sudo cat /proc/1537/stack
> > [<ffffffff810d7a8d>] futex_wait_queue_me+0xdd/0x140
> > [<ffffffff810d84f2>] futex_wait+0x182/0x290
> > [<ffffffff810daaee>] do_futex+0xde/0x760
> > [<ffffffff810db1e1>] SyS_futex+0x71/0x150
> > [<ffffffff8172adff>] tracesys+0xe1/0xe6
> > [<ffffffffffffffff>] 0xffffffffffffffff
> >
> > # the only other forked() process
>
> What is this "other"? The cr_system() is called from "main"
> process.
I don't know, the cmdline is the same as the criu process', that's the
only way I know it is a fork() :)
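
A less indirect check than comparing cmdlines would be to read the PPid and TracerPid fields from /proc/<pid>/status for the second process. This is a minimal sketch; parse_status() and relation() are hypothetical helper names, not anything in criu, and it only assumes the "Key: value" layout that the status file shows.

```python
def parse_status(text):
    """Parse the 'Key: value' lines of a /proc/<pid>/status dump into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def relation(pid, suspected_parent):
    """Report whether pid is a child of suspected_parent and who traces it."""
    with open("/proc/%d/status" % pid) as f:
        fields = parse_status(f.read())
    return {
        # PPid is the parent's pid, so a match confirms the fork() guess.
        "is_child": int(fields["PPid"]) == suspected_parent,
        # TracerPid is 0 when nothing is ptrace-attached to the process.
        "tracer": int(fields["TracerPid"]),
    }
```

Calling relation(1539, 1537) as root on the affected machine would show whether 1539 really is a child of the main criu process and, since it is sitting in ptrace_stop, which pid is tracing it.
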
> > criu2:~ sudo cat /proc/1539/stack
> > [<ffffffff8107888b>] ptrace_stop+0x15b/0x2b0
> > [<ffffffff8107a25d>] get_signal_to_deliver+0x3dd/0x6f0
> > [<ffffffff81013448>] do_signal+0x48/0x960
> > [<ffffffff81013dc9>] do_notify_resume+0x69/0xb0
> > [<ffffffff8172aeaa>] int_signal+0x12/0x17
> > [<ffffffffffffffff>] 0xffffffffffffffff
> >
> > I guess what is happening is that as soon as we unmask some signal is
> > delivered and it hangs?
>
> Presumably yes :) And what signal is it?
The full /proc/pid/status of the main criu process is:
Name: criu
State: S (sleeping)
Tgid: 1537
Ngid: 0
Pid: 1537
PPid: 1536
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 1024
Groups: 0
VmPeak: 11568 kB
VmSize: 11564 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 1172 kB
VmRSS: 1172 kB
VmData: 244 kB
VmStk: 268 kB
VmExe: 648 kB
VmLib: 2084 kB
VmPTE: 44 kB
VmSwap: 0 kB
Threads: 1
SigQ: 0/15551
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: fffffffe7ffafeff
SigIgn: 0000000000000000
SigCgt: 0000000180010000
CapInh: 0000000000000000
CapPrm: 0000001fffffffff
CapEff: 0000001fffffffff
CapBnd: 0000001fffffffff
Seccomp: 0
Cpus_allowed: f
Cpus_allowed_list: 0-3
Mems_allowed: 00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 14
nonvoluntary_ctxt_switches: 1
And I think the lowest-order bit set in SigCgt corresponds to SIGCHLD
(assuming I've decoded everything correctly), so I guess it is one of those?

I'm a bit confused, since the only futex-related thing I see in
cr-restore.c's sigchld_handler() is abort_and_wake, which seems like
it wouldn't block.
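
To double-check the decoding, here is a small sketch that turns the SigCgt and SigBlk values from the status dump above into signal numbers. It assumes the usual Linux layout where bit n-1 of the mask stands for signal n; decode_sigmask() and signame() are names made up for this example.

```python
import signal


def decode_sigmask(hex_mask):
    """Return the signal numbers whose bits are set in a status-file mask."""
    value = int(hex_mask, 16)
    # Bit n-1 of the mask corresponds to signal number n.
    return [n for n in range(1, 65) if value & (1 << (n - 1))]


def signame(n):
    """Best-effort name lookup; numbering is platform dependent."""
    try:
        return signal.Signals(n).name
    except ValueError:
        return "signal %d" % n


# Mask strings copied from the SigCgt and SigBlk lines of the dump.
sig_cgt = decode_sigmask("0000000180010000")
sig_blk = decode_sigmask("fffffffe7ffafeff")
unblocked = [n for n in range(1, 65) if n not in sig_blk]

print("caught:     ", sig_cgt, [signame(n) for n in sig_cgt])
print("not blocked:", unblocked, [signame(n) for n in unblocked])
```

The caught set decodes to signals 17, 32 and 33, and the signals left unblocked are 9, 17, 19, 32 and 33. On x86 Linux 17 is SIGCHLD, and 9 and 19 are SIGKILL and SIGSTOP, which cannot be blocked anyway, so SIGCHLD is the only ordinary signal that is both caught and unblocked here.
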
Tycho
> Thanks,
> Pavel
>