[CRIU] hang in ip tool

Tycho Andersen tycho.andersen at canonical.com
Mon Sep 8 07:09:01 PDT 2014


On Mon, Sep 08, 2014 at 05:41:29PM +0400, Pavel Emelyanov wrote:
> On 09/08/2014 05:17 PM, Tycho Andersen wrote:
> > On Mon, Sep 08, 2014 at 02:53:36PM +0400, Pavel Emelyanov wrote:
> >> On 09/05/2014 11:22 PM, Tycho Andersen wrote:
> >>> Hi all,
> >>>
> >>> On Wed, Sep 03, 2014 at 06:45:39PM +0400, Pavel Emelyanov wrote:
> >>>> On 09/03/2014 05:52 PM, Tycho Andersen wrote:
> >>>>> Hi all,
> >>>>>
> >>>>> Recently when restoring containers I have been getting a hang in the
> >>>>> ip tool when criu runs 'ip addr restore'. It prints,
> >>>>>
> >>>>> RTNETLINK answers: File exists
> >>>>> RTNETLINK answers: File exists
> >>>>
> >>>> I've seen such when it was putting 127.0.0.1 on lo which was
> >>>> already there "automatically" (some kernels seem to do it by
> >>>> default).
> >>>>
> >>>>> and then seems to hang. Has anyone seen this behavior, or any ideas
> >>>>> what the problem is?
> >>>>
> >>>> Hanging is something new to me, I've never seen it. Can you 
> >>>> strace it to check where the problem is?
> >>>
> >>> So this issue did just resurface and it turns out that it's not
> >>> actually hanging in ip tool, it is hanging in cr_system in the
> >>> sigprocmask call where it is resetting the mask. As I write this, it
> >>> seems to have gone away again. Any ideas what might cause this?
> >>
> >> Hm... I've never seen a process hanging in procmask reset. Can you
> >> check the /proc/pid/stack file when it hangs for exact in-kernel
> >> calltrace?
> > 
> > Yes,
> > 
> > # the first criu process
> > criu2:~ sudo cat /proc/1537/stack
> > [<ffffffff810d7a8d>] futex_wait_queue_me+0xdd/0x140
> > [<ffffffff810d84f2>] futex_wait+0x182/0x290
> > [<ffffffff810daaee>] do_futex+0xde/0x760
> > [<ffffffff810db1e1>] SyS_futex+0x71/0x150
> > [<ffffffff8172adff>] tracesys+0xe1/0xe6
> > [<ffffffffffffffff>] 0xffffffffffffffff
> > 
> > # the only other forked() process
> 
> What is this "other"? The cr_system() is called from "main"
> process.

I don't know; its cmdline is the same as the criu process's, which is
the only way I know it is a fork() :)

> > criu2:~ sudo cat /proc/1539/stack
> > [<ffffffff8107888b>] ptrace_stop+0x15b/0x2b0
> > [<ffffffff8107a25d>] get_signal_to_deliver+0x3dd/0x6f0
> > [<ffffffff81013448>] do_signal+0x48/0x960
> > [<ffffffff81013dc9>] do_notify_resume+0x69/0xb0
> > [<ffffffff8172aeaa>] int_signal+0x12/0x17
> > [<ffffffffffffffff>] 0xffffffffffffffff
> > 
> > I guess what is happening is that as soon as we unmask some signal is
> > delivered and it hangs?
> 
> Presumably yes :) And what signal is it?

The full /proc/pid/status of the main criu process is:

Name: criu
State:  S (sleeping)
Tgid: 1537
Ngid: 0
Pid:  1537
PPid: 1536
TracerPid:  0
Uid:  0 0 0 0
Gid:  0 0 0 0
FDSize: 1024
Groups: 0 
VmPeak:    11568 kB
VmSize:    11564 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:      1172 kB
VmRSS:      1172 kB
VmData:      244 kB
VmStk:       268 kB
VmExe:       648 kB
VmLib:      2084 kB
VmPTE:        44 kB
VmSwap:        0 kB
Threads:  1
SigQ: 0/15551
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: fffffffe7ffafeff
SigIgn: 0000000000000000
SigCgt: 0000000180010000
CapInh: 0000000000000000
CapPrm: 0000001fffffffff
CapEff: 0000001fffffffff
CapBnd: 0000001fffffffff
Seccomp:  0
Cpus_allowed: f
Cpus_allowed_list:  0-3
Mems_allowed: 00000000,00000001
Mems_allowed_list:  0
voluntary_ctxt_switches:  14
nonvoluntary_ctxt_switches: 1

And I think the lowest-order bit set in SigCgt corresponds to SIGCHLD
(assuming I've decoded everything correctly), so I guess it is one of
those?
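
For reference, a minimal sketch of how the masks in the status output
above can be decoded. It assumes the usual Linux layout, where bit N-1
of the hex mask stands for signal number N; the function name
decode_sigmask is made up for illustration and is not part of criu.
Signal 17 is SIGCHLD on x86 Linux, but the numbering differs on some
other architectures.

```python
def decode_sigmask(mask_hex):
    """Return the signal numbers whose bits are set in a /proc/<pid>/status mask.

    Assumption: bit (N - 1) of the mask corresponds to signal number N.
    """
    mask = int(mask_hex, 16)
    return [bit + 1 for bit in range(64) if mask & (1 << bit)]


# Values copied from the status output of pid 1537.
caught = decode_sigmask("0000000180010000")   # SigCgt
blocked = decode_sigmask("fffffffe7ffafeff")  # SigBlk
unblocked = [n for n in range(1, 65) if n not in blocked]

print("caught:   ", caught)     # [17, 32, 33]
print("unblocked:", unblocked)  # [9, 17, 19, 32, 33]
```

Under that assumption, the handlers installed are for signal 17 plus
32 and 33, and signal 17 is also one of the few signals left unblocked.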

I'm a bit confused, since the only futex-related thing I see in
cr-restore.c's sigchld_handler() is abort_and_wake, which seems like
it wouldn't block.

Tycho

> Thanks,
> Pavel
> 

