[CRIU] hang in ip tool
Pavel Emelyanov
xemul at parallels.com
Mon Sep 8 06:41:29 PDT 2014
On 09/08/2014 05:17 PM, Tycho Andersen wrote:
> On Mon, Sep 08, 2014 at 02:53:36PM +0400, Pavel Emelyanov wrote:
>> On 09/05/2014 11:22 PM, Tycho Andersen wrote:
>>> Hi all,
>>>
>>> On Wed, Sep 03, 2014 at 06:45:39PM +0400, Pavel Emelyanov wrote:
>>>> On 09/03/2014 05:52 PM, Tycho Andersen wrote:
>>>>> Hi all,
>>>>>
>>>>> Recently when restoring containers I have been getting a hang in the
>>>>> ip tool when criu runs 'ip addr restore'. It prints,
>>>>>
>>>>> RTNETLINK answers: File exists
>>>>> RTNETLINK answers: File exists
>>>>
>>>> I've seen this when it was putting 127.0.0.1 on lo, which was
>>>> already there "automatically" (some kernels seem to do it by
>>>> default).
>>>>
>>>>> and then seems to hang. Has anyone seen this behavior, or any ideas
>>>>> what the problem is?
>>>>
>>>> Hanging is something new to me, I've never seen it. Can you
>>>> strace it to check where the problem is?
>>>
>>> So this issue did just resurface and it turns out that it's not
>>> actually hanging in ip tool, it is hanging in cr_system in the
>>> sigprocmask call where it is resetting the mask. As I write this, it
>>> seems to have gone away again. Any ideas what might cause this?
>>
>> Hm... I've never seen a process hanging in procmask reset. Can you
>> check the /proc/pid/stack file when it hangs for the exact in-kernel
>> call trace?
>
> Yes,
>
> # the first criu process
> criu2:~ sudo cat /proc/1537/stack
> [<ffffffff810d7a8d>] futex_wait_queue_me+0xdd/0x140
> [<ffffffff810d84f2>] futex_wait+0x182/0x290
> [<ffffffff810daaee>] do_futex+0xde/0x760
> [<ffffffff810db1e1>] SyS_futex+0x71/0x150
> [<ffffffff8172adff>] tracesys+0xe1/0xe6
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> # the only other forked() process

What is this "other"? The cr_system() is called from the "main"
process.

> criu2:~ sudo cat /proc/1539/stack
> [<ffffffff8107888b>] ptrace_stop+0x15b/0x2b0
> [<ffffffff8107a25d>] get_signal_to_deliver+0x3dd/0x6f0
> [<ffffffff81013448>] do_signal+0x48/0x960
> [<ffffffff81013dc9>] do_notify_resume+0x69/0xb0
> [<ffffffff8172aeaa>] int_signal+0x12/0x17
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> I guess what is happening is that as soon as we unmask, some signal is
> delivered and it hangs?

Presumably yes :) And what signal is it?
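
One way to find out, as a rough sketch only: /proc/<pid>/status exposes
the pending and blocked signal sets as hex bitmasks (SigPnd, ShdPnd,
SigBlk), plus State and TracerPid, which should also say who is tracing
the task that sits in ptrace_stop. The helper names below (decode_sigmask,
parse_status, inspect_pid) are made up for illustration and are not part
of criu. Since the signal has probably been dequeued by the time the task
is in ptrace_stop, it may no longer show up as pending, so treat this as
a first look rather than a definitive answer.

```python
import signal

# Fields of /proc/<pid>/status that carry signal bitmasks as hex strings.
MASK_FIELDS = ("SigPnd", "ShdPnd", "SigBlk", "SigIgn", "SigCgt")


def decode_sigmask(hex_mask):
    """Turn a hex mask from /proc/<pid>/status into a list of signal names.

    Bit (n - 1) being set means signal number n is in the set.
    """
    mask = int(hex_mask, 16)
    names = []
    for signo in range(1, 65):
        if mask & (1 << (signo - 1)):
            try:
                names.append(signal.Signals(signo).name)
            except ValueError:
                # Real-time or otherwise unnamed signal on this platform.
                names.append("SIG%d" % signo)
    return names


def parse_status(text):
    """Pick State, TracerPid and the decoded signal masks out of status text."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        if key in MASK_FIELDS:
            info[key] = decode_sigmask(value)
        elif key in ("State", "TracerPid"):
            info[key] = value
    return info


def inspect_pid(pid):
    """Read and decode /proc/<pid>/status (Linux only)."""
    with open("/proc/%d/status" % pid) as f:
        return parse_status(f.read())


# Hypothetical usage on the hung task from the trace above:
#   print(inspect_pid(1539))
```

A non-zero TracerPid together with a "t (tracing stop)" state would
match the ptrace_stop frame in the stack above.
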
Thanks,
Pavel