[CRIU] CRIU segfaulting when restoring a process
Nikolay Borisov
kernel at kyup.com
Fri Aug 19 01:16:51 PDT 2016
On 08/18/2016 06:13 PM, Dmitry Safonov wrote:
> On 08/18/2016 04:44 PM, Nikolay Borisov wrote:
>> Hello,
>
> Hi Nikolay,
>
>> I've built CRIU 2.5 from source + some patches which move around stuff
>> in the headers to facilitate compilation on centos 6.7 with external
>> glibc 2.19. My CRIU is built the following way:
>>
>> make -j8 USERCFLAGS="-I/opt/glibc-2.19/include/ -L/opt/glibc-2.19//lib/ -Wl,-dynamic-linker=/opt/glibc-2.19//lib/ld-linux-x86-64.so.2 -Wl,-rpath=/opt/glibc-2.19/lib/:/usr/lib64/:/lib64/"
>>
>> This way I can happily dump a simple process a la
>> https://criu.org/Simple_loop style. However my problems begin when I try
>> to restore the process, since CRIU segfaults. Here is a restore log as
>> well as strace from the restore process:
>>
>> http://sprunge.us/DcIh - restore.log
>> http://sprunge.us/CVBG - strace.log
>>
>> I'd be happy if you could shed some light on what might be causing the
>> problem. One thing I thought of is a difference between how the
>> process being restored (bash) and criu are compiled. Here is a comparison
>> of how their libraries look: http://paste.ubuntu.com/23067392/ in case
>> it matters.
>
> So, the problem seems to be in the restorer blob:
> Switching to the restorer was successful:
>> 23537 write(199999, "(00.188706) 23537: task_args:
> 0x20000\ntask_args->pid: 23537\ntask_args->nr_threads:
> 1\ntask_args->clo"..., 155) = 155
>> 23537 getpid() = 23537
>
> which is sys_getpid() in __export_restore_task().
> The fault address is a very strange one:
> 23537 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR,
> si_addr=0x12024b48d48} ---
>
> So, the failure is somewhere between the getpid() and sigaction() calls in
> __export_restore_task() (criu/pie/restorer.c), as we would see
> sys_sigaction() if it had been called.
>
> Could you give it a shot with the following diff and paste the strace
> output up to the failure?
>
> --->8--->8--->8---8<---8<---8<---
> diff --git a/criu/pie/restorer.c b/criu/pie/restorer.c
> index 7cc735c96870..8503078a82d9 100644
> --- a/criu/pie/restorer.c
> +++ b/criu/pie/restorer.c
> @@ -1123,6 +1123,7 @@ long __export_restore_task(struct task_restore_args *args)
>  	n_helpers = args->helpers_n;
>  	zombies = args->zombies;
>  	n_zombies = args->zombies_n;
> +	sys_getpid();
>  	*args->breakpoint = rst_sigreturn;
>  
>  	ksigfillset(&act.rt_sa_mask);
>
> --->8--->8--->8---8<---8<---8<---
>
> I suspect that the problem is the *args pointer being garbage for some
> reason. If we find another getpid() call in the strace log, that's not
> the reason and the fault is somewhere in ksigfillset() (which is unlikely).
>
> And let me think a while, why *args may have such strange junk inside
> (0x12024b48d48).
>
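
The marker-syscall idea behind the quoted diff can be sketched outside CRIU as follows. This is only an illustration under stated assumptions: trace_marker(), step_a(), step_b() and run_steps() are hypothetical names, and the raw syscall() stands in for CRIU's own sys_getpid() wrapper used in the diff.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Emit a harmless syscall that is easy to spot in strace output.
 * trace_marker() is a hypothetical helper name; the quoted diff uses
 * CRIU's sys_getpid() wrapper for the same purpose.
 */
long trace_marker(void)
{
    /* Raw syscall, so the call reaches the kernel even if libc caches the PID. */
    return syscall(SYS_getpid);
}

/* Hypothetical stand-ins for two steps between which a crash is suspected. */
int step_a(int x) { return x + 1; }
int step_b(int x) { return x * 2; }

int run_steps(int x)
{
    x = step_a(x);
    trace_marker();   /* one getpid() in strace: step_a() completed */
    x = step_b(x);
    trace_marker();   /* a second getpid(): step_b() completed as well */
    return x;
}
```

Counting how many getpid() lines appear in the strace log before the SIGSEGV then narrows down which step faulted.
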
So here is an strace with your patch applied; it does indeed look a bit
different: http://sprunge.us/HBEV

I checked the disassembly my compiler produces for
__export_restore_task and it does have a prologue that sets up the
stack, so that looks good.

Also, here is the register state at the time the crash occurs with your
patch applied:

(gdb) info register
rax            0x23000             143360
rbx            0x20000             131072
rcx            0x12dad             77229
rdx            0x48000042b0058948  5188147057151805768
rsi            0x6fc8a0            7325856
rdi            0x20000             131072
rbp            0x5b16              0x5b16
rsp            0x1eec0             0x1eec0
r8             0x1                 1
r9             0x1                 1
r10            0x7fffbf9c4d70      140736408079728
r11            0x206               518
r12            0x1f070             127088
r13            0x703e20            7355936
r14            0x203c0             132032
r15            0x7fffffffde60      140737488346720
rip            0x10b27             0x10b27
eflags         0x10206             [ PF IF RF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0

My theory is that since CRIU is compiled against a non-standard glibc,
it is loaded by glibc 2.19's interpreter (ld-linux-x86-64.so.2), and
when it is time to restore the process, which is bash, it starts
executing it with the new interpreter; my bash doesn't work with that
interpreter and segfaults. Does this sound possible?
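
One way to check the premise of this theory is to compare which interpreter each binary requests. Below is a minimal sketch, assuming 64-bit ELF files on Linux with <elf.h> available; elf_interp() is a hypothetical helper name, not CRIU code. It reads the PT_INTERP program header, the same field that `readelf -l` reports as "Requesting program interpreter".

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy the PT_INTERP path of a 64-bit ELF file into buf.
 * Returns 0 on success, -1 on any error or when the file has no
 * PT_INTERP segment (for example, a statically linked binary).
 * elf_interp() is a hypothetical helper, not part of CRIU.
 */
int elf_interp(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "rb");
    Elf64_Ehdr eh;
    Elf64_Phdr ph;
    int ret = -1;

    if (!f)
        return -1;
    if (len == 0)
        goto out;
    if (fread(&eh, sizeof(eh), 1, f) != 1)
        goto out;
    /* Accept only files with the ELF magic and the 64-bit class. */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        goto out;

    /* Walk the program header table looking for PT_INTERP. */
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);

        if (fseek(f, off, SEEK_SET) != 0)
            goto out;
        if (fread(&ph, sizeof(ph), 1, f) != 1)
            goto out;
        if (ph.p_type != PT_INTERP)
            continue;
        /* p_filesz covers the path string including its trailing NUL. */
        if (ph.p_filesz == 0 || ph.p_filesz > len)
            goto out;
        if (fseek(f, (long)ph.p_offset, SEEK_SET) != 0)
            goto out;
        if (fread(buf, 1, ph.p_filesz, f) != ph.p_filesz)
            goto out;
        buf[ph.p_filesz - 1] = '\0';
        ret = 0;
        break;
    }
out:
    fclose(f);
    return ret;
}
```

Calling it once on the criu binary and once on the bash binary and comparing the two strings shows whether they request different interpreters; it does not by itself show that this difference causes the segfault.
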
Regards,
Nikolay