[CRIU] CRIU segfaulting when restoring a process

Nikolay Borisov kernel at kyup.com
Fri Aug 19 01:16:51 PDT 2016



On 08/18/2016 06:13 PM, Dmitry Safonov wrote:
> On 08/18/2016 04:44 PM, Nikolay Borisov wrote:
>> Hello,
> 
> Hi Nikolay,
> 
>> I've built CRIU 2.5 from source + some patches which move around stuff
>> in the headers to facilitate compilation on centos 6.7 with external
>> glibc 2.19. My CRIU is built the following way:
>>
>> make -j8 USERCFLAGS="-I/opt/glibc-2.19/include/ -L/opt/glibc-2.19//lib/
>> -Wl,-dynamic-linker=/opt/glibc-2.19//lib/ld-linux-x86-64.so.2
>> -Wl,-rpath=/opt/glibc-2.19/lib/:/usr/lib64/:/lib64/"
>>
>> This way I can happily dump a simple process a la
>> https://criu.org/Simple_loop style. However my problems begin when I try
>> to restore the process, since CRIU segfaults. Here is a restore log as
>> well as strace from the restore process:
>>
>> http://sprunge.us/DcIh - restore.log
>> http://sprunge.us/CVBG - strace.log
>>
>> I'd be happy if you could shed some light on what might be causing the
>> problem. One thing I thought of is a difference between the way the
>> process being restored (bash) and criu are compiled. Here is a comparison
>> of how their libraries look: http://paste.ubuntu.com/23067392/ in case
>> it matters.
> 
> So, the problem seems to be in the restorer blob.
> Switching to the restorer was successful:
>> 23537 write(199999, "(00.188706)  23537: task_args:
> 0x20000\ntask_args->pid: 23537\ntask_args->nr_threads:
> 1\ntask_args->clo"..., 155) = 155
>> 23537 getpid()                          = 23537
> 
> which is sys_getpid() in __export_restore_task().
> The fault address is a very strange one:
> 23537 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR,
> si_addr=0x12024b48d48} ---
> 
> So, the failure is somewhere between the getpid() and sigaction() calls in
> __export_restore_task() (criu/pie/restorer.c), as we would see
> sys_sigaction() if it had been called.
> 
> Could you give the following diff a shot and paste the strace output
> leading up to the failure?
> 
> --->8--->8--->8---8<---8<---8<---
> diff --git a/criu/pie/restorer.c b/criu/pie/restorer.c
> index 7cc735c96870..8503078a82d9 100644
> --- a/criu/pie/restorer.c
> +++ b/criu/pie/restorer.c
> @@ -1123,6 +1123,7 @@ long __export_restore_task(struct task_restore_args *args)
>      n_helpers = args->helpers_n;
>      zombies = args->zombies;
>      n_zombies = args->zombies_n;
> +    sys_getpid();
>      *args->breakpoint = rst_sigreturn;
> 
>      ksigfillset(&act.rt_sa_mask);
> 
> --->8--->8--->8---8<---8<---8<---
> 
> I suspect the problem is that the *args pointer is garbage for some
> reason. If we find another getpid() call in the strace log, that's not
> the reason and the fault is somewhere in ksigfillset() (which is unlikely).
> 
> And let me think for a while about why *args might contain such strange
> junk (0x12024b48d48).
> 


Here is an strace with your patch applied; it does look a bit
different: http://sprunge.us/HBEV

I checked the disassembly my compiler produces for
__export_restore_task and it does have a prologue that sets up the
stack, so that part looks fine.

Also, here is the register state at the time of the crash with your
patch applied:

(gdb) info register
rax            0x23000	143360
rbx            0x20000	131072
rcx            0x12dad	77229
rdx            0x48000042b0058948	5188147057151805768
rsi            0x6fc8a0	7325856
rdi            0x20000	131072
rbp            0x5b16	0x5b16
rsp            0x1eec0	0x1eec0
r8             0x1	1
r9             0x1	1
r10            0x7fffbf9c4d70	140736408079728
r11            0x206	518
r12            0x1f070	127088
r13            0x703e20	7355936
r14            0x203c0	132032
r15            0x7fffffffde60	140737488346720
rip            0x10b27	0x10b27
eflags         0x10206	[ PF IF RF ]
cs             0x33	51
ss             0x2b	43
ds             0x0	0
es             0x0	0
fs             0x0	0
gs             0x0	0
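
As a side note on this dump, the rdx value does not look like a typical
pointer or counter. The sketch below only prints its bytes in little-endian
memory order so they can be compared against the restorer blob's
disassembly. The helper name reg_bytes_le is made up for illustration, and
reading the result as x86-64 instruction bytes is an assumption, not
something established in this thread.

```python
def reg_bytes_le(value: int, width: int = 8) -> str:
    """Format a register value as the bytes it would occupy in memory
    on a little-endian machine (hypothetical helper, for illustration)."""
    return " ".join(f"{b:02x}" for b in value.to_bytes(width, "little"))


rdx = 0x48000042b0058948  # value taken from the gdb dump above
print(reg_bytes_le(rdx))  # 48 89 05 b0 42 00 00 48
```

If that byte sequence also shows up in the objdump output for the
restorer, it would suggest that something was loaded from a location
holding code rather than data, but that still needs to be confirmed.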


My theory is that, since CRIU is compiled against a non-standard glibc,
it is loaded by glibc 2.19's interpreter (ld-linux-x86-64.so.2). When it
is time to restore the process (bash), CRIU starts executing it with the
new interpreter, which my bash does not work with, so it segfaults.
Does this sound possible?
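
One way to check the premise of this theory is to compare the interpreter
each binary requests in its PT_INTERP program header. The following is a
minimal sketch, assuming 64-bit little-endian ELF files (as on x86-64);
the function name elf_interpreter is hypothetical, and it only reports the
requested interpreter, it does not show which loader CRIU uses at restore
time.

```python
import struct

PT_INTERP = 3  # program header type that holds the interpreter path


def elf_interpreter(data: bytes):
    """Return the PT_INTERP path from a 64-bit little-endian ELF image,
    or None if the image has no such header (e.g. a static binary)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("only 64-bit little-endian ELF is handled here")
    # ELF64 header: e_phoff at 0x20, e_phentsize at 0x36, e_phnum at 0x38.
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        # Elf64_Phdr: p_type (4), p_flags (4), p_offset (8), ...
        p_type, _p_flags, p_offset = struct.unpack_from("<IIQ", data, base)
        if p_type == PT_INTERP:
            # p_filesz sits at offset 0x20 inside the program header.
            (p_filesz,) = struct.unpack_from("<Q", data, base + 0x20)
            raw = data[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\0").decode()
    return None
```

To use it, read each binary with open(path, "rb").read() and pass the
bytes in, once for the criu binary and once for bash. With the build flags
above, criu would be expected to report the glibc 2.19 loader and the
system bash the distribution's loader.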

Regards,
Nikolay

