[CRIU] CRIU segfaulting when restoring a process

Dmitry Safonov dsafonov at virtuozzo.com
Fri Aug 19 03:03:19 PDT 2016


On 08/19/2016 11:16 AM, Nikolay Borisov wrote:
>
>
> On 08/18/2016 06:13 PM, Dmitry Safonov wrote:
>> On 08/18/2016 04:44 PM, Nikolay Borisov wrote:
>>> Hello,
>>
>> Hi Nikolay,
>>
>>> I've built CRIU 2.5 from source + some patches which move around stuff
>>> in the headers to facilitate compilation on centos 6.7 with external
>>> glibc 2.19. My CRIU is built the following way:
>>>
>>> make -j8 USERCFLAGS="-I/opt/glibc-2.19/include/ -L/opt/glibc-2.19//lib/ \
>>> -Wl,-dynamic-linker=/opt/glibc-2.19//lib/ld-linux-x86-64.so.2 \
>>> -Wl,-rpath=/opt/glibc-2.19/lib/:/usr/lib64/:/lib64/"
>>>
>>> This way I can happily dump a simple process a la
>>> https://criu.org/Simple_loop style. However my problems begin when I try
>>> to restore the process, since CRIU segfaults. Here is a restore log as
>>> well as strace from the restore process:
>>>
>>> http://sprunge.us/DcIh - restore.log
>>> http://sprunge.us/CVBG - strace.log
>>>
>>> I'd be happy if you could shed some light on what might be causing the
>>> problem. One thing I thought of is the difference between the way the
>>> process being restored (bash) and criu are compiled. Here is a
>>> comparison of how their libraries look, should it matter:
>>> http://paste.ubuntu.com/23067392/
>>
>> So, the problem seems to be in the restorer blob.
>> Switching to the restorer was successful:
>>> 23537 write(199999, "(00.188706)  23537: task_args: 0x20000\ntask_args->pid: 23537\ntask_args->nr_threads: 1\ntask_args->clo"..., 155) = 155
>>> 23537 getpid()                          = 23537
>>
>> which is sys_getpid() in __export_restore_task().
>> The fault address is a very strange one:
>> 23537 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x12024b48d48} ---
>>
>> So, the failure is somewhere between the getpid() and sigaction() calls
>> in __export_restore_task() (criu/pie/restorer.c), as we would see
>> sys_sigaction() in the trace if it had been called.
>>
>> Could you give the following diff a shot and paste the strace output
>> leading up to the failure?
>>
>> --->8--->8--->8---8<---8<---8<---
>> diff --git a/criu/pie/restorer.c b/criu/pie/restorer.c
>> index 7cc735c96870..8503078a82d9 100644
>> --- a/criu/pie/restorer.c
>> +++ b/criu/pie/restorer.c
>> @@ -1123,6 +1123,7 @@ long __export_restore_task(struct task_restore_args *args)
>>      n_helpers = args->helpers_n;
>>      zombies = args->zombies;
>>      n_zombies = args->zombies_n;
>> +    sys_getpid();
>>      *args->breakpoint = rst_sigreturn;
>>
>>      ksigfillset(&act.rt_sa_mask);
>>
>> --->8--->8--->8---8<---8<---8<---
>>
>> I suspect the problem is that the args pointer is garbage for some
>> reason -- if we find a second getpid() call in the strace log, that's
>> not the cause and the fault is somewhere in ksigfillset() (which is
>> unlikely).
>>
>> And let me think for a while about why *args may contain such strange
>> junk (0x12024b48d48).
>>
>
>
> So here is a strace with your patch applied; it looks a bit different
> indeed - http://sprunge.us/HBEV

Hmm, I don't see the second call to getpid(), but the args pointer, which
is in %rdi, looks quite normal (0x20000).
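Since args itself looks sane, the odd si_addr is worth a closer look. One quick check is to print the value's bytes in the order they sit in memory (little-endian on x86-64). For 0x12024b48d48 those are 48 8d b4 24 20 01 00 00, and 0x48 0x8d is a REX.W prefix followed by the LEA opcode, so the "address" may really be instruction bytes that were loaded as data; %rdx (0x48000042b0058948) starts with 0x48 0x89 (REX.W + MOV) in the same way. This is only a sketch of that check; `le_bytes` and `dump_le_bytes` are hypothetical helpers, not part of CRIU.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: split a 64-bit value into its bytes in
 * little-endian (x86-64 memory) order; out[0] gets the lowest byte. */
static void le_bytes(uint64_t value, uint8_t out[8])
{
	assert(out != NULL);
	for (int i = 0; i < 8; i++)
		out[i] = (uint8_t)(value >> (8 * i));
}

/* Hypothetical helper: print the bytes as they would appear in a hexdump
 * of memory, so they can be compared against objdump output. */
static void dump_le_bytes(const char *label, uint64_t value)
{
	uint8_t b[8];

	le_bytes(value, b);
	printf("%-8s 0x%016llx ->", label, (unsigned long long)value);
	for (int i = 0; i < 8; i++)
		printf(" %02x", b[i]);
	printf("\n");
}
```

Calling `dump_le_bytes("si_addr", 0x12024b48d48ULL)` prints the bytes 48 8d b4 24 20 01 00 00; if a byte run like that also shows up in the restorer disassembly, that would point at a bad relocation rather than a bad args pointer.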

> I checked the disassembly my compiler produces for
> __export_restore_task, and it does have a prologue setting up the
> stack. So that looks good.
>
> Also, here is the register state at the time of the crash with your
> patch applied:
>
> (gdb) info register
> rax            0x23000	143360
> rbx            0x20000	131072
> rcx            0x12dad	77229
> rdx            0x48000042b0058948	5188147057151805768
> rsi            0x6fc8a0	7325856
> rdi            0x20000	131072
> rbp            0x5b16	0x5b16
> rsp            0x1eec0	0x1eec0
> r8             0x1	1
> r9             0x1	1
> r10            0x7fffbf9c4d70	140736408079728
> r11            0x206	518
> r12            0x1f070	127088
> r13            0x703e20	7355936
> r14            0x203c0	132032
> r15            0x7fffffffde60	140737488346720
> rip            0x10b27	0x10b27
> eflags         0x10206	[ PF IF RF ]
> cs             0x33	51
> ss             0x2b	43
> ds             0x0	0
> es             0x0	0
> fs             0x0	0
> gs             0x0	0

So %rdi is fine, and so is %rsp; AFAICS, everything looks just fine.
Could you provide the disassembly of __export_restore_task up to address
0xb27 for this binary?
For example: $ objdump -dS criu/pie/restorer.built-in.o
I believe it's a load from *args, but...

> My theory here is that since CRIU is compiled with a non-standard glibc,
> it has started being interpreted by glibc 2.19's interpreter
> (ld-linux-x86-64.so.2), and when it's time to restore the process, which
> is bash, it starts executing it with the new interpreter, and my bash
> indeed doesn't work with it and segfaults. Does this sound possible?

Well, it's a good theory, but it doesn't apply here, as the application
wasn't fully restored. Your restoree process has not yet gained control
and was never resumed. The failure is in the restore process itself, not
in its result.
So I think there shouldn't be anything special about using an external
glibc -- everything should C/R as normal.

The only concern is the restorer binary size -- for me, the address is:
0000000000001520 <__export_restore_task>

And the size at run time is:
(00.035761)  29972: Found bootstrap VMA hint at: 0x10000 (needs ~104K)

Yours is one page smaller -- but that's likely just OK.
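The "one page" difference is counted in whole pages, so the ~104K hint corresponds to roughly 26 pages of 4 KiB. A tiny sketch of the round-up, assuming 4 KiB pages (the usual x86-64 size); `size_to_pages` and `SKETCH_PAGE_SIZE` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, the usual page size on x86-64. */
#define SKETCH_PAGE_SIZE ((size_t)4096)

/* Round a byte count up to a whole number of pages, since a VMA is
 * always page-granular. */
static size_t size_to_pages(size_t bytes)
{
	/* Guard against overflow in the round-up below. */
	assert(bytes <= SIZE_MAX - (SKETCH_PAGE_SIZE - 1));
	return (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

`size_to_pages(104 * 1024)` returns 26, so a bootstrap area one page smaller would span 25 pages.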

-- 
              Dmitry

