[CRIU] CRIU segfaulting when restoring a process

Dmitry Safonov dsafonov at virtuozzo.com
Fri Aug 19 04:53:18 PDT 2016


On 08/19/2016 02:48 PM, Nikolay Borisov wrote:
>
>
> On 08/19/2016 02:40 PM, Dmitry Safonov wrote:
>> On 08/19/2016 01:25 PM, Nikolay Borisov wrote:
>>>
>>>
>>> On 08/19/2016 01:03 PM, Dmitry Safonov wrote:
>>>> On 08/19/2016 11:16 AM, Nikolay Borisov wrote:
>>>>>
>>>>>
>>>>> On 08/18/2016 06:13 PM, Dmitry Safonov wrote:
>>>>>> On 08/18/2016 04:44 PM, Nikolay Borisov wrote:
>>>>>>> Hello,
>>>>>>
>>>>>> Hi Nikolay,
>>>>>>
>>>>>>> I've built CRIU 2.5 from source + some patches which move stuff
>>>>>>> around in the headers to facilitate compilation on CentOS 6.7 with
>>>>>>> external glibc 2.19. My CRIU is built the following way:
>>>>>>>
>>>>>>> make -j8 USERCFLAGS="-I/opt/glibc-2.19/include/
>>>>>>> -L/opt/glibc-2.19//lib/
>>>>>>> -Wl,-dynamic-linker=/opt/glibc-2.19//lib/ld-linux-x86-64.so.2
>>>>>>> -Wl,-rpath=/opt/glibc-2.19/lib/:/usr/lib64/:/lib64/"
>>>>>>>
>>>>>>> This way I can happily dump a simple process a la
>>>>>>> https://criu.org/Simple_loop style. However, my problems begin when
>>>>>>> I try to restore the process, since CRIU segfaults. Here is a
>>>>>>> restore log as well as strace from the restore process:
>>>>>>>
>>>>>>> http://sprunge.us/DcIh - restore.log
>>>>>>> http://sprunge.us/CVBG - strace.log
>>>>>>>
>>>>>>> I'd be happy if you could shed some light on what might be causing
>>>>>>> the problem. One thing I thought of is the difference between how
>>>>>>> the process being restored (bash) and criu are compiled. Here is a
>>>>>>> comparison of what their libraries look like:
>>>>>>> http://paste.ubuntu.com/23067392/ (should it matter, of course).
>>>>>>
>>>>>> So, the problem seems to be in the restorer blob.
>>>>>> Switching to the restorer was successful:
>>>>>>> 23537 write(199999, "(00.188706)  23537: task_args: 0x20000\ntask_args->pid: 23537\ntask_args->nr_threads: 1\ntask_args->clo"..., 155) = 155
>>>>>>> 23537 getpid()                          = 23537
>>>>>>
>>>>>> which is sys_getpid() in __export_restore_task().
>>>>>> The fault address is a very strange one:
>>>>>> 23537 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x12024b48d48} ---
>>>>>>
>>>>>> So, the failure is somewhere between the getpid() and sigaction()
>>>>>> calls in __export_restore_task() (criu/pie/restorer.c), as we would
>>>>>> see sys_sigaction() if it had been called.
>>>>>>
>>>>>> Could you give the following diff a shot and paste the strace output
>>>>>> up to the failure?
>>>>>>
>>>>>> --->8--->8--->8---8<---8<---8<---
>>>>>> diff --git a/criu/pie/restorer.c b/criu/pie/restorer.c
>>>>>> index 7cc735c96870..8503078a82d9 100644
>>>>>> --- a/criu/pie/restorer.c
>>>>>> +++ b/criu/pie/restorer.c
>>>>>> @@ -1123,6 +1123,7 @@ long __export_restore_task(struct task_restore_args *args)
>>>>>>      n_helpers = args->helpers_n;
>>>>>>      zombies = args->zombies;
>>>>>>      n_zombies = args->zombies_n;
>>>>>> +    sys_getpid();
>>>>>>      *args->breakpoint = rst_sigreturn;
>>>>>>
>>>>>>      ksigfillset(&act.rt_sa_mask);
>>>>>>
>>>>>> --->8--->8--->8---8<---8<---8<---
>>>>>>
>>>>>> I suspect that the problem is that the *args pointer is garbage for
>>>>>> some reason. If we find another getpid() call in the strace log,
>>>>>> that's not the reason and the failure is somewhere in ksigfillset()
>>>>>> (which is unlikely).
>>>>>>
>>>>>> And let me think for a while about why *args may have such strange
>>>>>> junk inside (0x12024b48d48).
>>>>>>
>>>>>
>>>>>
>>>>> So here is an strace with your patch applied, it looks a bit different
>>>>> indeed - http://sprunge.us/HBEV
>>>>
>>>> Hmm, I don't see the second call to getpid(), but args, which is in
>>>> %rdi, looks quite normal (0x20000).
>>>>
>>>>> I checked the disassembly my compiler produces for the
>>>>> __export_restore_task and it indeed has a prologue, setting up the
>>>>> stack. So that looks good indeed.
>>>>>
>>>>> Also, here is the register state at the time the crash occurs with
>>>>> your patch applied:
>>>>>
>>>>> (gdb) info register
>>>>> rax            0x23000    143360
>>>>> rbx            0x20000    131072
>>>>> rcx            0x12dad    77229
>>>>> rdx            0x48000042b0058948    5188147057151805768
>>>>> rsi            0x6fc8a0    7325856
>>>>> rdi            0x20000    131072
>>>>> rbp            0x5b16    0x5b16
>>>>> rsp            0x1eec0    0x1eec0
>>>>> r8             0x1    1
>>>>> r9             0x1    1
>>>>> r10            0x7fffbf9c4d70    140736408079728
>>>>> r11            0x206    518
>>>>> r12            0x1f070    127088
>>>>> r13            0x703e20    7355936
>>>>> r14            0x203c0    132032
>>>>> r15            0x7fffffffde60    140737488346720
>>>>> rip            0x10b27    0x10b27
>>>>> eflags         0x10206    [ PF IF RF ]
>>>>> cs             0x33    51
>>>>> ss             0x2b    43
>>>>> ds             0x0    0
>>>>> es             0x0    0
>>>>> fs             0x0    0
>>>>> gs             0x0    0
>>>>
>>>> So %rdi is fine, %rsp too; AFAICS, everything looks just fine.
>>>> Could you provide the disassembly of __export_restore_task, up to
>>>> address 0xb27, for this binary?
>>>> Like: $ objdump -dS criu/pie/restorer.built-in.o
>>>> I believe it's a load from *args, but...
>>>
>>> Does this help:  http://paste.ubuntu.com/23069854/ ?
>>
>> Yes, thanks.
>> So, it does crash here:
>> b27:       48 89 02                mov    %rax,(%rdx)
>> And %rdx was set earlier here:
>> afa:       48 8b 15 00 00 00 00    mov    0x0(%rip),%rdx
>>
>> This is a simple R_X86_64_PC32 relocation at this place.
>> So, as compel successfully patched these relative relocations here
>> (in the same function earlier):
>>      b01:       48 89 05 00 00 00 00    mov    %rax,0x0(%rip)
>>      b0f:       89 05 00 00 00 00       mov    %eax,0x0(%rip)
>>      b1c:       48 89 05 00 00 00 00    mov    %rax,0x0(%rip)
>>
>> It should have resolved this place too (the cause of the failure):
>> afa:       48 8b 15 00 00 00 00    mov    0x0(%rip),%rdx
>>
>> The question is: has compel patched it?
>> To check, could you please do the following:
>> 1. compile CRIU, like `make -j5`
>> 2. then do `touch criu/pie/restorer.c`
>> 3. paste the output of `make V=1` to a pastebin.
>
> http://paste.ubuntu.com/23070061/
>
> [SNIP]

Thanks, that's good.
So, it saw the relocation (0xafd), but hasn't patched it, AFAICS:

restorer_blob: 		r_offset 0xafd  r_info 0xd000000009 / sym 0xd0 type 0x9 symsecoff 0x0
restorer_blob: 		r_offset 0xb04  r_info 0x200000002 / sym 0x2  type 0x2 symsecoff 0x0
restorer_blob: 			value 0x0        addend32 44   addend64 44       place b04      symname
restorer_blob: 				R_X86_64_PC32     at 0xb04  val 0x42b0
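
As a cross-check on the log above, the r_info values can be split by
hand. The following is a minimal Python sketch, not part of compel: the
helper names decode_r_info and disp32_offset are made up for
illustration. It assumes the standard ELF64 layout (symbol index in the
upper 32 bits of r_info, relocation type in the lower 32) and the x86-64
psABI type numbers (2 is R_X86_64_PC32, 9 is R_X86_64_GOTPCREL). It also
assumes the 32-bit displacement occupies the last four bytes of each
7-byte mov, which holds for the encodings shown in the disassembly.

```python
# Relocation type numbers from the x86-64 psABI (same values as <elf.h>).
R_X86_64_NAMES = {0x2: "R_X86_64_PC32", 0x9: "R_X86_64_GOTPCREL"}


def decode_r_info(r_info):
    """Split a 64-bit ELF r_info into (symbol index, relocation type)."""
    return r_info >> 32, r_info & 0xFFFFFFFF


def disp32_offset(insn_addr, insn_len):
    """Offset of a 4-byte displacement stored at the end of an instruction."""
    return insn_addr + insn_len - 4


# (r_offset, r_info) pairs copied from the restorer_blob log lines.
for r_offset, r_info in [(0xAFD, 0xD000000009), (0xB04, 0x200000002)]:
    sym, rtype = decode_r_info(r_info)
    name = R_X86_64_NAMES.get(rtype, hex(rtype))
    print(f"r_offset {r_offset:#x}: sym {sym:#x} type {rtype:#x} ({name})")

# The mov at 0xafa and the mov at 0xb01 are both 7 bytes long, so their
# displacements sit at the two r_offset values seen in the log.
print(hex(disp32_offset(0xAFA, 7)), hex(disp32_offset(0xB01, 7)))
```

Under these assumptions the entry at 0xafd decodes to type 0x9 and the
one at 0xb04 to type 0x2, so the two entries are not the same relocation
type. That may be relevant to why only the second one shows a resolved
value in the log.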

So, that looks like the reason for the segfault. Let me check the compel
code to see why it didn't patch the relative relocation.
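
The register dump quoted earlier appears consistent with an unpatched
displacement. If the displacement at 0xafd is still zero, then
`mov 0x0(%rip),%rdx` loads eight bytes starting at the next instruction,
i.e. at offset 0xb01 of the blob. A small Python sketch (illustrative
only, and assuming the gdb dump and this build log describe the same
blob layout) that views the %rdx value as little-endian bytes:

```python
import struct

# %rdx as shown in the quoted gdb "info register" output.
rdx = 0x48000042B0058948

# x86-64 is little-endian, so this is the byte order as laid out in memory.
raw = struct.pack("<Q", rdx)
print(raw.hex(" "))

# First three bytes: candidate opcode/ModRM; next four: a 32-bit displacement.
opcode = raw[:3]
disp = struct.unpack_from("<i", raw, 3)[0]
print(opcode.hex(" "), hex(disp))
```

The first three bytes match the encoding of the mov at b01 (48 89 05)
and the following four decode to 0x42b0, the value reported for the
relocation at 0xb04. So %rdx seems to hold instruction bytes rather than
a pointer, which would explain the faulting store through it.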

-- 
              Dmitry

