[CRIU] CRIU segfaulting when restoring a process
Nikolay Borisov
kernel at kyup.com
Fri Aug 19 04:48:55 PDT 2016
On 08/19/2016 02:40 PM, Dmitry Safonov wrote:
> On 08/19/2016 01:25 PM, Nikolay Borisov wrote:
>>
>>
>> On 08/19/2016 01:03 PM, Dmitry Safonov wrote:
>>> On 08/19/2016 11:16 AM, Nikolay Borisov wrote:
>>>>
>>>>
>>>> On 08/18/2016 06:13 PM, Dmitry Safonov wrote:
>>>>> On 08/18/2016 04:44 PM, Nikolay Borisov wrote:
>>>>>> Hello,
>>>>>
>>>>> Hi Nikolay,
>>>>>
>>>>>> I've built CRIU 2.5 from source, plus some patches that move things
>>>>>> around in the headers to allow compilation on CentOS 6.7 with an
>>>>>> external glibc 2.19. My CRIU is built as follows:
>>>>>>
>>>>>> make -j8 USERCFLAGS="-I/opt/glibc-2.19/include/ \
>>>>>>     -L/opt/glibc-2.19//lib/ \
>>>>>>     -Wl,-dynamic-linker=/opt/glibc-2.19//lib/ld-linux-x86-64.so.2 \
>>>>>>     -Wl,-rpath=/opt/glibc-2.19/lib/:/usr/lib64/:/lib64/"
>>>>>>
>>>>>> Built this way, I can happily dump a simple process in the style of
>>>>>> https://criu.org/Simple_loop. However, my problems begin when I try
>>>>>> to restore the process: CRIU segfaults. Here is the restore log as
>>>>>> well as an strace of the restore process:
>>>>>>
>>>>>> http://sprunge.us/DcIh - restore.log
>>>>>> http://sprunge.us/CVBG - strace.log
>>>>>>
>>>>>> I'd be happy if you could shed some light on what might be causing
>>>>>> the problem. One thing I thought of is the difference between the
>>>>>> way the process being restored (bash) is compiled and the way criu
>>>>>> is. Here is a comparison of how their libraries look, should it
>>>>>> matter: http://paste.ubuntu.com/23067392/
>>>>>
>>>>> So, the problem seems to be in the restorer blob.
>>>>> Switching to the restorer was successful:
>>>>>> 23537 write(199999, "(00.188706) 23537: task_args: 0x20000\ntask_args->pid: 23537\ntask_args->nr_threads: 1\ntask_args->clo"..., 155) = 155
>>>>>> 23537 getpid() = 23537
>>>>>
>>>>> which is sys_getpid() in __export_restore_task().
>>>>> The fault address is a very strange one:
>>>>> 23537 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x12024b48d48} ---
>>>>>
>>>>> So, the failure is somewhere between the getpid() and sigaction()
>>>>> calls in __export_restore_task() (criu/pie/restorer.c), as we would
>>>>> see sys_sigaction() in the strace if it had been called.
>>>>>
>>>>> Could you give the following diff a try and paste the strace output
>>>>> preceding the failure?
>>>>>
>>>>> --->8--->8--->8---8<---8<---8<---
>>>>> diff --git a/criu/pie/restorer.c b/criu/pie/restorer.c
>>>>> index 7cc735c96870..8503078a82d9 100644
>>>>> --- a/criu/pie/restorer.c
>>>>> +++ b/criu/pie/restorer.c
>>>>> @@ -1123,6 +1123,7 @@ long __export_restore_task(struct task_restore_args *args)
>>>>> n_helpers = args->helpers_n;
>>>>> zombies = args->zombies;
>>>>> n_zombies = args->zombies_n;
>>>>> + sys_getpid();
>>>>> *args->breakpoint = rst_sigreturn;
>>>>>
>>>>> ksigfillset(&act.rt_sa_mask);
>>>>>
>>>>> --->8--->8--->8---8<---8<---8<---
>>>>>
>>>>> I suspect the problem is that the *args pointer is garbage for some
>>>>> reason. If we find a second getpid() call in the strace log, that's
>>>>> not the cause, and the fault is somewhere in ksigfillset() (which is
>>>>> unlikely).
>>>>>
>>>>> And let me think for a while about why *args might contain such
>>>>> strange junk (0x12024b48d48).
>>>>>
>>>>
>>>>
>>>> So here is an strace with your patch applied; it does look a bit
>>>> different: http://sprunge.us/HBEV
>>>
>>> Hmm, I don't see the second call to getpid(), but *args, which is in
>>> %rdi, looks quite normal (0x20000).
>>>
>>>> I checked the disassembly my compiler produces for
>>>> __export_restore_task, and it does have a prologue that sets up the
>>>> stack, so that looks fine.
>>>>
>>>> Also, here is the register state at the time of the crash with your
>>>> patch applied:
>>>>
>>>> (gdb) info register
>>>> rax            0x23000             143360
>>>> rbx            0x20000             131072
>>>> rcx            0x12dad             77229
>>>> rdx            0x48000042b0058948  5188147057151805768
>>>> rsi            0x6fc8a0            7325856
>>>> rdi            0x20000             131072
>>>> rbp            0x5b16              0x5b16
>>>> rsp            0x1eec0             0x1eec0
>>>> r8             0x1                 1
>>>> r9             0x1                 1
>>>> r10            0x7fffbf9c4d70      140736408079728
>>>> r11            0x206               518
>>>> r12            0x1f070             127088
>>>> r13            0x703e20            7355936
>>>> r14            0x203c0             132032
>>>> r15            0x7fffffffde60      140737488346720
>>>> rip            0x10b27             0x10b27
>>>> eflags         0x10206             [ PF IF RF ]
>>>> cs             0x33                51
>>>> ss             0x2b                43
>>>> ds             0x0                 0
>>>> es             0x0                 0
>>>> fs             0x0                 0
>>>> gs             0x0                 0
>>>
>>> So %rdi is fine, and so is %rsp; AFAICS everything looks just fine.
>>> Could you provide the disassembly of __export_restore_task, up to
>>> address 0xb27 for this binary? For example:
>>> $ objdump -dS criu/pie/restorer.built-in.o
>>> I believe it's a load from *args, but...
>>
>> Does this help: http://paste.ubuntu.com/23069854/ ?
>
> Yes, thanks.
> So, it does crash here:
> b27: 48 89 02 mov %rax,(%rdx)
> And %rdx was set earlier here:
> afa: 48 8b 15 00 00 00 00 mov 0x0(%rip),%rdx
>
> This is a simple R_X86_64_PC32 relocation.
> Since compel successfully patched these relative relocations
> (earlier in the same function):
> b01: 48 89 05 00 00 00 00 mov %rax,0x0(%rip)
> b0f: 89 05 00 00 00 00 mov %eax,0x0(%rip)
> b1c: 48 89 05 00 00 00 00 mov %rax,0x0(%rip)
>
> it should have resolved this one too (the cause of the failure):
> afa: 48 8b 15 00 00 00 00 mov 0x0(%rip),%rdx
>
> The question is: has compel patched it?
> To check, could you please do the following:
> 1. compile CRIU, e.g. `make -j5`
> 2. then run `touch criu/pie/restorer.c`
> 3. paste the output of `make V=1` to a pastebin.
http://paste.ubuntu.com/23070061/
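A hedged cross-check of the hypothesis in the quoted reply: if compel left the 32-bit displacement of the 7-byte `mov 0x0(%rip),%rdx` at 0xafa as zero, that load reads the 8 bytes starting right after the instruction, at 0xafa + 7 = 0xb01. That is exactly where the `mov %rax,0x0(%rip)` that compel did patch begins. The small C sketch below splits the %rdx value from the gdb dump into the little-endian bytes it would have been loaded from. The helper names (`to_le_bytes`, `le32`, `is_mov_rax_to_rip_rel`) and the `CRASH_RDX` macro are illustrative only, not part of CRIU.

```c
#include <stdint.h>

/* Value of %rdx at the time of the crash, copied from the gdb dump. */
#define CRASH_RDX UINT64_C(0x48000042b0058948)

/* Split a 64-bit register value into the bytes it was loaded from.
 * x86-64 is little-endian: out[0] came from the lowest address. */
static void to_le_bytes(uint64_t v, uint8_t out[8])
{
	for (int i = 0; i < 8; i++)
		out[i] = (uint8_t)(v >> (8 * i));
}

/* Read a little-endian 32-bit field starting at p. */
static uint32_t le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* 1 if the bytes start with the encoding objdump shows at 0xb01:
 * 48 89 05 = mov %rax,disp32(%rip). */
static int is_mov_rax_to_rip_rel(const uint8_t b[8])
{
	return b[0] == 0x48 && b[1] == 0x89 && b[2] == 0x05;
}
```

Applied to `CRASH_RDX`, `to_le_bytes()` yields `48 89 05 b0 42 00 00 48`. The first three bytes match the opcode at 0xb01, and `le32(b + 3)` is 0x42b0, presumably the displacement compel wrote there. This fits an unpatched relocation at 0xafa but does not by itself prove it. The `make V=1` output is the more direct check.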
[SNIP]
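For reference, the x86-64 System V psABI defines an R_X86_64_PC32 relocation as storing the signed 32-bit value S + A - P into the relocated field. Here S is the symbol's address, A is the addend and P is the address of the field itself. The sketch below shows that computation for a position-independent blob such as the restorer. `apply_pc32()` and its parameters are hypothetical and are not compel's actual code or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not compel's code): apply one R_X86_64_PC32
 * relocation to a blob that will run at load_addr.
 *   offset - position of the 32-bit field inside the blob (r_offset)
 *   sym    - run-time address of the target symbol (S)
 *   addend - relocation addend (A); for a %rip-relative operand that ends
 *            the instruction this is typically -4
 * The field receives S + A - P, with P = load_addr + offset.
 * Returns 0 on success, -1 if the value does not fit in a signed 32-bit field.
 */
static int apply_pc32(uint8_t *blob, size_t blob_size, uint64_t load_addr,
		      uint64_t offset, uint64_t sym, int64_t addend)
{
	assert(offset + 4 <= blob_size); /* field must lie inside the blob */

	uint64_t p = load_addr + offset;
	int64_t value = (int64_t)(sym + (uint64_t)addend - p);

	if (value < INT32_MIN || value > INT32_MAX)
		return -1;

	/* Store little-endian, the byte order the CPU reads it in. */
	uint32_t u = (uint32_t)(int32_t)value;
	for (int i = 0; i < 4; i++)
		blob[offset + i] = (uint8_t)(u >> (8 * i));
	return 0;
}
```

For example, take a blob loaded at 0x1000 with `48 8b 15 00 00 00 00` at offset 0, a target at 0x1020 and A = -4. The field at offset 3 gets 0x19. At run time the CPU adds that to %rip = 0x1007 (the end of the instruction) and lands on 0x1020. The range check is there because the field is only 32 bits wide: a target more than about 2 GiB away cannot be reached through this relocation type.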
More information about the CRIU mailing list