[CRIU] CRIU segfaulting when restoring a process
Dmitry Safonov
dsafonov at virtuozzo.com
Fri Aug 19 03:03:19 PDT 2016
On 08/19/2016 11:16 AM, Nikolay Borisov wrote:
>
>
> On 08/18/2016 06:13 PM, Dmitry Safonov wrote:
>> On 08/18/2016 04:44 PM, Nikolay Borisov wrote:
>>> Hello,
>>
>> Hi Nikolay,
>>
>>> I've built CRIU 2.5 from source + some patches which move around stuff
>>> in the headers to facilitate compilation on centos 6.7 with external
>>> glibc 2.19. My CRIU is built the following way:
>>>
>>> make -j8 USERCFLAGS="-I/opt/glibc-2.19/include/ -L/opt/glibc-2.19//lib/
>>> -Wl,-dynamic-linker=/opt/glibc-2.19//lib/ld-linux-x86-64.so.2
>>> -Wl,-rpath=/opt/glibc-2.19/lib/:/usr/lib64/:/lib64/"
>>>
>>> This way I can happily dump a simple process a la
>>> https://criu.org/Simple_loop style. However my problems begin when I try
>>> to restore the process, since CRIU segfaults. Here is a restore log as
>>> well as strace from the restore process:
>>>
>>> http://sprunge.us/DcIh - restore.log
>>> http://sprunge.us/CVBG - strace.log
>>>
>>> I'd be happy if you could shed some light on what might be causing the
>>> problem. One thing I thought of is a difference in how the process
>>> being restored (bash) and criu are compiled. Here is a comparison of
>>> how their libraries look: http://paste.ubuntu.com/23067392/ (if it
>>> matters, of course).
>>
>> So, the problem seems to be in the restorer blob:
>> Switching to the restorer was successful:
>>> 23537 write(199999, "(00.188706) 23537: task_args:
>> 0x20000\ntask_args->pid: 23537\ntask_args->nr_threads:
>> 1\ntask_args->clo"..., 155) = 155
>>> 23537 getpid() = 23537
>>
>> which is sys_getpid() in __export_restore_task().
>> The fault address is a very strange one:
>> 23537 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR,
>> si_addr=0x12024b48d48} ---
>>
>> So, the failure is somewhere between the getpid() and sigaction() calls
>> in __export_restore_task() (criu/pie/restorer.c), as we would see
>> sys_sigaction() if it had been called.
>>
>> Could you give the following diff a shot and paste the strace output
>> before the failure?
>>
>> --->8--->8--->8---8<---8<---8<---
>> diff --git a/criu/pie/restorer.c b/criu/pie/restorer.c
>> index 7cc735c96870..8503078a82d9 100644
>> --- a/criu/pie/restorer.c
>> +++ b/criu/pie/restorer.c
>> @@ -1123,6 +1123,7 @@ long __export_restore_task(struct task_restore_args *args)
>>  	n_helpers = args->helpers_n;
>>  	zombies = args->zombies;
>>  	n_zombies = args->zombies_n;
>> +	sys_getpid();
>>  	*args->breakpoint = rst_sigreturn;
>>
>>  	ksigfillset(&act.rt_sa_mask);
>>
>> --->8--->8--->8---8<---8<---8<---
>>
>> I suspect the problem is that the *args pointer is garbage for some
>> reason. If we find another getpid() call in the strace log, that's not
>> the reason, and the fault is somewhere in ksigfillset() (which is
>> unlikely).
>>
>> And let me think for a while about why *args may hold such strange junk
>> (0x12024b48d48).
>>
>
>
> So here is an strace with your patch applied, it looks a bit different
> indeed - http://sprunge.us/HBEV
Hmm, I don't see the second call to getpid(), but args, which is in
%rdi, looks quite normal (0x20000).
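
The check above (is there a second getpid() before the SIGSEGV line?) can be scripted against a strace log. This is only a minimal sketch, not part of CRIU: the function name getpid_calls_before_segv is made up for illustration, and the sample lines are abbreviated from the strace excerpt quoted earlier in this thread.

```python
import re

def getpid_calls_before_segv(lines, pid):
    """Count getpid() calls made by `pid` before its first SIGSEGV line.

    Returns None if no SIGSEGV line is found for that pid.
    """
    call = re.compile(rf"^{pid}\s+getpid\(\)\s*=\s*\d+")
    segv = re.compile(rf"^{pid}\s+--- SIGSEGV ")
    count = 0
    for line in lines:
        if segv.match(line):
            return count
        if call.match(line):
            count += 1
    return None

# Illustrative input, shortened from the excerpt quoted in this thread.
sample = [
    "23537 getpid() = 23537",
    "23537 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x12024b48d48} ---",
]
print(getpid_calls_before_segv(sample, 23537))
```

With the debugging patch applied, a count of 1 would suggest the fault happens before the added sys_getpid() is reached; a count of 2 would point past it.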
> I checked the disassembly my compiler produces for the
> __export_restore_task and it indeed has a prologue, setting up the
> stack. So that looks good indeed.
>
> Also, here is the register state at the time the crash occurs with your
> patch applied:
>
> (gdb) info register
> rax 0x23000 143360
> rbx 0x20000 131072
> rcx 0x12dad 77229
> rdx 0x48000042b0058948 5188147057151805768
> rsi 0x6fc8a0 7325856
> rdi 0x20000 131072
> rbp 0x5b16 0x5b16
> rsp 0x1eec0 0x1eec0
> r8 0x1 1
> r9 0x1 1
> r10 0x7fffbf9c4d70 140736408079728
> r11 0x206 518
> r12 0x1f070 127088
> r13 0x703e20 7355936
> r14 0x203c0 132032
> r15 0x7fffffffde60 140737488346720
> rip 0x10b27 0x10b27
> eflags 0x10206 [ PF IF RF ]
> cs 0x33 51
> ss 0x2b 43
> ds 0x0 0
> es 0x0 0
> fs 0x0 0
> gs 0x0 0
So %rdi is fine, and so is %rsp; AFAICS, everything looks just fine.
Could you provide the disassembly of __export_restore_task for this
binary, up to address 0xb27?
For example: $ objdump -dS criu/pie/restorer.built-in.o
I believe it's a load from *args, but...
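
For reference, the 0xb27 above is just %rip from the register dump minus the address where the restorer blob is mapped. A minimal sketch of that arithmetic follows; the base 0x10000 is an assumption taken from the "bootstrap VMA hint" log line quoted later in this mail (it comes from my run, not from the failing one), and blob_offset is a made-up helper name.

```python
def blob_offset(rip, blob_base):
    """Offset of the faulting instruction inside the mapped blob."""
    if rip < blob_base:
        raise ValueError("rip is below the blob base")
    return rip - blob_base

rip = 0x10b27        # %rip from the gdb register dump
blob_base = 0x10000  # assumption: bootstrap VMA hint from the restore log
print(hex(blob_offset(rip, blob_base)))
```

The resulting offset is what to look for in the objdump output.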
> My theory here is that since CRIU is compiled with a non-standard glibc,
> it has started being interpreted by glibc 2.19's interpreter
> (ld-linux-x86-64.so.2) and when it's time to restore the process which
> is bash it starts executing it with the new interpreter and my bash
> indeed doesn't work with it and segfaults. Does this sound possible?
Well, it's a good theory, but it doesn't apply here, because the
application was not fully restored. Your restoree process has not yet
gained control and was not resumed. The failure is in the restore
process itself, not in its result.
So, I think there shouldn't be anything special about using an external
glibc; everything should C/R as normal.
The only concern is the restorer binary size. For me the address is:
0000000000001520 <__export_restore_task>
And the size at run-time is:
(00.035761) 29972: Found bootstrap VMA hint at: 0x10000 (needs ~104K)
Yours is one page smaller, but that's likely just OK.
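
To put the size comparison in numbers, here is a rough sketch that assumes 4 KiB pages and that the "K" in the log line means KiB. The 100K figure it prints is derived from "one page smaller" and is not read from any log; pages_needed is a made-up helper name.

```python
PAGE_SIZE_KIB = 4  # assumption: 4 KiB pages on x86-64

def pages_needed(size_kib):
    """Number of whole pages needed to hold size_kib KiB (rounded up)."""
    return -(-size_kib // PAGE_SIZE_KIB)

mine = pages_needed(104)  # from "needs ~104K" in the log line
yours = mine - 1          # "one page smaller"
print(mine, yours, yours * PAGE_SIZE_KIB)
```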
--
Dmitry