[CRIU] CRIU segfaulting when restoring a process

Dmitry Safonov dsafonov at virtuozzo.com
Thu Aug 18 08:58:31 PDT 2016


On 08/18/2016 06:13 PM, Dmitry Safonov wrote:
> On 08/18/2016 04:44 PM, Nikolay Borisov wrote:
>> Hello,
>
> Hi Nikolay,
>
>> I've built CRIU 2.5 from source + some patches which move around stuff
>> in the headers to facilitate compilation on centos 6.7 with external
>> glibc 2.19. My CRIU is built the following way:
>>
>> make -j8 USERCFLAGS="-I/opt/glibc-2.19/include/ -L/opt/glibc-2.19//lib/
>> -Wl,-dynamic-linker=/opt/glibc-2.19//lib/ld-linux-x86-64.so.2
>> -Wl,-rpath=/opt/glibc-2.19/lib/:/usr/lib64/:/lib64/"
>>
>> This way I can happily dump a simple process a la
>> https://criu.org/Simple_loop style. However my problems begin when I try
>> to restore the process, since CRIU segfaults. Here is a restore log as
>> well as strace from the restore process:
>>
>> http://sprunge.us/DcIh - restore.log
>> http://sprunge.us/CVBG - strace.log
>>
>> I'd be happy if you could shed some light on what might be causing the
>> problem. One thing I thought of is the difference between the way the
>> process being restored (bash) and CRIU are compiled. Here is a comparison
>> of how their libraries look: http://paste.ubuntu.com/23067392/ , should
>> it matter, of course.
>
> So, the problem seems to be in the restorer blob.
> Switching to the restorer was successful:
>> 23537 write(199999, "(00.188706)  23537: task_args:
>> 0x20000\ntask_args->pid: 23537\ntask_args->nr_threads:
>> 1\ntask_args->clo"..., 155) = 155
>> 23537 getpid()                          = 23537
>
> which is sys_getpid() in __export_restore_task().
> The fault address is a very strange one:
> 23537 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR,
> si_addr=0x12024b48d48} ---
>
> So, the failure is somewhere between the getpid() and sigaction() calls
> in __export_restore_task() (criu/pie/restorer.c), as we would have seen
> sys_sigaction() in strace had it been called.
>
> Could you give the following diff a try and paste the strace output
> leading up to the failure?
>
> --->8--->8--->8---8<---8<---8<---
> diff --git a/criu/pie/restorer.c b/criu/pie/restorer.c
> index 7cc735c96870..8503078a82d9 100644
> --- a/criu/pie/restorer.c
> +++ b/criu/pie/restorer.c
> @@ -1123,6 +1123,7 @@ long __export_restore_task(struct task_restore_args *args)
>      n_helpers = args->helpers_n;
>      zombies = args->zombies;
>      n_zombies = args->zombies_n;
> +    sys_getpid();
>      *args->breakpoint = rst_sigreturn;
>
>      ksigfillset(&act.rt_sa_mask);
>
> --->8--->8--->8---8<---8<---8<---
>
> I suspect the problem is that the *args pointer is garbage for some
> reason. If we find a second getpid() call in the strace log, that's not
> the cause, and the fault is somewhere in ksigfillset() (which is
> unlikely).
>
> Meanwhile, let me think about why *args might contain such strange
> junk (0x12024b48d48).
>
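
The idea behind the diff above is to use a cheap syscall as a breadcrumb: strace prints every syscall, so the last getpid() seen before the SIGSEGV brackets the faulting code. Below is a minimal user-space sketch of the same technique. trace_marker(), struct fake_restore_args and fake_restore_task() are hypothetical names for illustration only, and the sketch calls libc's syscall(2), whereas the real restorer blob runs without libc and uses CRIU's own sys_getpid() wrapper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: a side-effect-free syscall used purely as a
 * breadcrumb. Under `strace -f`, the last marker printed before SIGSEGV
 * shows how far execution got. Calling syscall(SYS_getpid) directly forces
 * a real kernel entry even if libc were to cache the PID. */
static void trace_marker(void)
{
    (void)syscall(SYS_getpid);
}

/* Hypothetical, heavily reduced stand-in for task_restore_args: only the
 * kind of fields the diff touches around the marker are modelled. */
struct fake_restore_args {
    int zombies_n;
    long *breakpoint;
};

/* Mirrors the placement in the diff: read fields of *args, drop a marker,
 * then write through args->breakpoint. If the marker shows up in strace,
 * the reads of *args succeeded and any fault lies after it. */
static int fake_restore_task(struct fake_restore_args *args, long rst_sigreturn)
{
    assert(args != NULL);
    int n = args->zombies_n;           /* reads *args */
    trace_marker();                    /* extra breadcrumb, as in the diff */
    *args->breakpoint = rst_sigreturn; /* writes via a pointer taken from *args */
    return n;
}
```

Under strace -f, each marker that is reached shows up as one getpid() line, so counting them tells how far execution got before a crash.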

Hmm, another idea: what may be wrong is the stack pointer.
We're right after entering the restorer, and the compiler hasn't yet
touched the stack, not even to set up a stack frame.
So if the stack pointer is bad, we can't save the result of getpid()
and would get the same fault.
It would be worthwhile if you could provide the register state at the
moment of the segfault. To do this, first run:
$ ulimit -c unlimited
to allow core dump files to be saved, then reproduce the crash:
$ strace criu restore -vvvv #... the usual args
Then open the saved core file (it should appear in the same directory)
with gdb:
$ gdb -c core.<pid>
and at the gdb prompt run:
(gdb) info registers
to print the register state at the moment of the crash.
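
If getting a core dump turns out to be awkward, the same information can be captured in-process. Below is a minimal sketch for an ordinary x86-64 Linux/glibc program. It is not something that can be dropped into the restorer blob as-is, since that runs without libc. A SA_SIGINFO SIGSEGV handler prints si_addr plus RIP, RSP and RBP from the ucontext. It runs on an alternate signal stack (sigaltstack), so it should still work if the stack pointer itself is garbage, which is the case suspected above. The helper names and the 64 KiB stack size are my own choices for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <ucontext.h>
#include <unistd.h>

/* write(2) in a loop: printf() is not async-signal-safe, write() is. */
static void say(const char *s, size_t n)
{
    while (n > 0) {
        ssize_t w = write(STDERR_FILENO, s, n);
        if (w <= 0)
            return;
        s += w;
        n -= (size_t)w;
    }
}

/* Format v as "0x" followed by 16 hex digits; buf must hold 19 bytes. */
static void fmt_hex(char buf[19], uint64_t v)
{
    static const char digits[] = "0123456789abcdef";
    buf[0] = '0';
    buf[1] = 'x';
    for (int i = 0; i < 16; i++)
        buf[2 + i] = digits[(v >> (60 - 4 * i)) & 0xf];
    buf[18] = '\0';
}

static void put_reg(const char *name, uint64_t v)
{
    char hex[19];
    fmt_hex(hex, v);
    say(name, strlen(name));
    say(hex, 18);
    say("\n", 1);
}

/* Print the faulting address and (on x86-64) key registers, then exit. */
static void segv_handler(int sig, siginfo_t *si, void *ctx)
{
    ucontext_t *uc = ctx;
    (void)uc; /* unused on non-x86-64 builds */
    put_reg("si_addr = ", (uint64_t)(uintptr_t)si->si_addr);
#if defined(__x86_64__)
    put_reg("rip     = ", (uint64_t)uc->uc_mcontext.gregs[REG_RIP]);
    put_reg("rsp     = ", (uint64_t)uc->uc_mcontext.gregs[REG_RSP]);
    put_reg("rbp     = ", (uint64_t)uc->uc_mcontext.gregs[REG_RBP]);
#endif
    _exit(128 + sig);
}

/* Returns 0 on success, -1 on failure (errno set by the failing call). */
static int install_segv_handler(void)
{
    /* Fixed size: newer glibc may make SIGSTKSZ a runtime value. */
    static char altstack[64 * 1024];
    stack_t ss = { .ss_sp = altstack, .ss_flags = 0, .ss_size = sizeof(altstack) };
    struct sigaction sa;

    assert(sizeof(altstack) >= (size_t)MINSIGSTKSZ);
    if (sigaltstack(&ss, NULL) != 0)
        return -1;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK; /* run the handler on altstack */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Call install_segv_handler() early in main(). On a fault, the handler prints the values to stderr and exits with status 128 + SIGSEGV. An rsp value that falls outside every mapping in /proc/<pid>/maps would support the stack theory.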

-- 
              Dmitry

