[CRIU] CRIU segfaulting when restoring a process

Dmitry Safonov dsafonov at virtuozzo.com
Thu Aug 18 09:28:26 PDT 2016


On 08/18/2016 06:58 PM, Dmitry Safonov wrote:
> On 08/18/2016 06:13 PM, Dmitry Safonov wrote:
>> On 08/18/2016 04:44 PM, Nikolay Borisov wrote:
>>> Hello,
>>
>> Hi Nikolay,
>>
>>> I've built CRIU 2.5 from source + some patches which move around stuff
>>> in the headers to facilitate compilation on centos 6.7 with external
>>> glibc 2.19. My CRIU is built the following way:
>>>
>>> make -j8 USERCFLAGS="-I/opt/glibc-2.19/include/ -L/opt/glibc-2.19//lib/
>>> -Wl,-dynamic-linker=/opt/glibc-2.19//lib/ld-linux-x86-64.so.2
>>> -Wl,-rpath=/opt/glibc-2.19/lib/:/usr/lib64/:/lib64/"
>>>
>>> This way I can happily dump a simple process a la
>>> https://criu.org/Simple_loop style. However my problems begin when I try
>>> to restore the process, since CRIU segfaults. Here is a restore log as
>>> well as strace from the restore process:
>>>
>>> http://sprunge.us/DcIh - restore.log
>>> http://sprunge.us/CVBG - strace.log
>>>
>>> I'd be happy if you could shed some light on what might be causing the
>>> problem. One thing I thought of is a difference between how the
>>> process being restored (bash) and criu are compiled. Here is a comparison
>>> of the libraries they link against: http://paste.ubuntu.com/23067392/ in
>>> case it matters.
>>
>> So, the problem seems to be in the restorer blob:
>> Switching to the restorer was successful:
>>> 23537 write(199999, "(00.188706)  23537: task_args:
>> 0x20000\ntask_args->pid: 23537\ntask_args->nr_threads:
>> 1\ntask_args->clo"..., 155) = 155
>>> 23537 getpid()                          = 23537
>>
>> which is sys_getpid() in __export_restore_task().
>> The fault address is a very strange one:
>> 23537 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR,
>> si_addr=0x12024b48d48} ---
>>
>> So, the failure is somewhere between the getpid() and sigaction() calls
>> in __export_restore_task() (criu/pie/restorer.c), as we would see
>> sys_sigaction() in the strace if it had been called.
>>
>> Could you give the following diff a shot and paste the strace output
>> up to the failure?
>>
>> --->8--->8--->8---8<---8<---8<---
>> diff --git a/criu/pie/restorer.c b/criu/pie/restorer.c
>> index 7cc735c96870..8503078a82d9 100644
>> --- a/criu/pie/restorer.c
>> +++ b/criu/pie/restorer.c
>> @@ -1123,6 +1123,7 @@ long __export_restore_task(struct
>> task_restore_args *args)
>>      n_helpers = args->helpers_n;
>>      zombies = args->zombies;
>>      n_zombies = args->zombies_n;
>> +    sys_getpid();
>>      *args->breakpoint = rst_sigreturn;
>>
>>      ksigfillset(&act.rt_sa_mask);
>>
>> --->8--->8--->8---8<---8<---8<---
>>
>> I suspect the problem is that the *args pointer is garbage for some
>> reason -- if we find another getpid() call in the strace log, that's not
>> the reason and the fault is somewhere in ksigfillset() (which is unlikely).
>>
>> And let me think for a while about why *args may have such strange junk
>> inside (0x12024b48d48).
>>
>
> Hmm, another idea: what may be wrong is the stack pointer.
> We're right after entering the restorer and the compiler hasn't yet
> accessed the stack, even to form a stack frame.
> So, if the stack frame is corrupted, we can't save the result of getpid()
> and we get the same symptom.
> It would be worthwhile if you provided the register state at the moment
> of the segfault. To do this, run:
> $ ulimit -c unlimited
> which allows core dump files to be saved,
> $ strace criu restore -vvvv #... the usual args
> $ gdb -c core.<pid>
> to open the saved core file with gdb in the same directory, then
> (gdb) info registers
> to print the register state at the moment of the crash.
>

So, my toolchain produces a stack frame on restorer entry, and I think
yours should too (there is nothing special, just a function frame):
0000000000000fa0 <__export_restore_thread>:
/*
  * Threads restoration via sigreturn. Note it's locked
  * routine and calls for unlock at the end.
  */
long __export_restore_thread(struct thread_restore_args *args)
{
      fa0:       41 55                   push   %r13
      fa2:       41 54                   push   %r12
      fa4:       55                      push   %rbp
      fa5:       53                      push   %rbx
      fa6:       48 89 fb                mov    %rdi,%rbx
      fa9:       48 83 ec 18             sub    $0x18,%rsp

So I think the second theory is just BS.
Anyway, the register state may still be helpful.

-- 
              Dmitry

