[CRIU] Fwd: Checkpoint failure on arm64 platform

Vijay Kilari vijay.kilari at gmail.com
Wed Dec 23 07:53:00 PST 2015


Hi Christopher,

On Wed, Dec 23, 2015 at 7:27 PM, Christopher Covington
<cov at codeaurora.org> wrote:
> Hi Vijay,
>
> On 12/23/2015 12:18 AM, Vijay Kilari wrote:
>
>>> Try running the test code attached to the following email:
>>>
>>> https://lists.openvz.org/pipermail/criu/2015-March/019161.html
>>
>> The output of test code is as follows
>>
>> ubuntu@ubuntu:~/criu$ mv attachment.bin attachment.c
>> ubuntu@ubuntu:~/criu$ gcc attachment.c
>> ubuntu@ubuntu:~/criu$ ./a.out
>> [40136] vdso: 0x3ff862c0000 - 0x3ff862d0000
>> [40136] vdso moved to: 0x3ff86130000 - 0x3ff86140000
>> [40136] sending SIGUSR1 signal to myself
>> [40136] Caught signal 10 (siginfo=0x3ffe3dce770)
>> [40136]    siginfo=[si_signo=10, si_pid=40136, si_uid=1000]
>> Segmentation fault (core dumped)
>> ubuntu@ubuntu:~/criu$
>>
>>> If that crashes, try the following patch:
>>>
>>> http://www.spinics.net/lists/linux-arm-msm/msg18291.html
>>
>> After applying this patch
>>
>> ubuntu@ubuntu:~/criu$ ./a.out
>> [1493] vdso: 0x3ff93270000 - 0x3ff[   98.900231] pgd = fffffe03cd450000
>> 93280000
>> [1493] vdso moved to: [   98.906466] [00000500]
>> *pgd=00000003cee600030x3ff930e0000 - 0x3ff930f0000
>> [, *pud=00000003cee600031493] sending SIGUSR1 signal to ,
>> *pmd=00000003cee60003myself
>> [1493] Caught signal 10 , *pte=0000000000000000(siginfo=0x3ffc4e3f0c0)
>> [1493]
>> siginfo=[si_signo=10, si_pid=1493, si_uid=1000]
>> Segmentation fault (core dumped)
>> ubuntu@ubuntu:~/criu$
>
> Doesn't look like the patch does anything :(. My patch depends on
> Laurent Dufour's work which was merged in 4.2 but it sounds like you
> should already have those dependencies (f2abee, 83d3f0, 4abad2, 2ae416).
>

Sorry, I tested with the wrong kernel. After applying your patch, the
output of the test code is as follows:

[4190] vdso: 0x3ffa37d0000 - 0x3ffa37e0000
[4190] vdso moved to: 0x3ffa3640000 - 0x3ffa3650000
[4190] sending SIGUSR1 signal to myself
[4190] Caught signal 10 (siginfo=0x3ffe713d490)
[4190]    siginfo=[si_signo=10, si_pid=4190, si_uid=1000]
[4190] exiting

In fact, I also tried calling move_vdso() twice in the test code, and it works.
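
For reference, the "vdso: start - end" range printed by the test can
also be obtained by scanning /proc/self/maps for the "[vdso]" entry.
Below is a minimal sketch of that lookup. The helper names
parse_vdso_line() and find_self_vdso() are hypothetical; they are not
taken from the test program or from CRIU.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one line of /proc/<pid>/maps, for example
 *   "3ffa37d0000-3ffa37e0000 r-xp 00000000 00:00 0    [vdso]"
 * Returns 1 and fills *start / *end if the line describes the vDSO
 * mapping, 0 for any other line. Hypothetical helper, not CRIU code.
 */
int parse_vdso_line(const char *line, unsigned long *start,
                    unsigned long *end)
{
        unsigned long s, e;

        assert(line && start && end);

        /* Every maps line begins with "<start>-<end>" in hex. */
        if (sscanf(line, "%lx-%lx", &s, &e) != 2)
                return 0;
        /* The kernel labels the vDSO mapping with "[vdso]". */
        if (!strstr(line, "[vdso]"))
                return 0;

        *start = s;
        *end = e;
        return 1;
}

/* Scan /proc/self/maps for the vDSO. Returns 0 on success, -1 otherwise. */
int find_self_vdso(unsigned long *start, unsigned long *end)
{
        char line[512];
        int found = 0;
        FILE *f = fopen("/proc/self/maps", "r");

        if (!f)
                return -1;
        while (!found && fgets(line, sizeof(line), f))
                found = parse_vdso_line(line, start, end);
        fclose(f);

        return found ? 0 : -1;
}
```

Calling find_self_vdso() before and after the move should show the two
different ranges, as in the output above.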

>>> While commit 871da9 seemed to fix the issue for cross-compilation using
>>> Linaro compilers, native building under Arch results in the char
>>> *vdso_symbols requiring relocations. We're currently getting by with a
>>> hacky workaround. Unless some compiler option that we've not yet tried
>>> gets GCC to actually generate position independent code, piegen
>>> probably will need to be ported to AArch64 (and perhaps AArch32 as well)
>>> to resolve the load-time relocations.
>>
>> Assuming the vdso mapping is already done in vdso_init() for vdso_rt
>> and vdso_vvar, I tried a dirty workaround: I removed the vdso mapping
>> step from the __export_restore_task() function in pie/restorer.c.
>> With this change, checkpoint and restore both worked.
>>
>> #if 0
>>         /*
>>          * Proxify vDSO.
>>          */
>>         for (i = 0; i < args->vmas_n; i++) {
>>                 if (vma_entry_is(&args->vmas[i], VMA_AREA_VDSO) ||
>>                     vma_entry_is(&args->vmas[i], VMA_AREA_VVAR)) {
>>                         pr_info("In %s calling vdso_proxify for i = %d\n",
>>                                 __func__, i);
>>                         if (vdso_proxify("dumpee", &args->vdso_sym_rt,
>>                                          args->vdso_rt_parked_at,
>>                                          i, args->vmas, args->vmas_n))
>>                                 goto core_restore_end;
>>                         break;
>>                 }
>>         }
>> #endif
>
> Our current use case is running trusted tools and workloads with as much
> determinism as possible, so we generally turn off address space layout
> randomization (using the norandmaps kernel parameter) which has a side
> effect of skipping the VDSO proxy step.
>

You mean the VDSO proxy step succeeds if I pass norandmaps as a kernel parameter?
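
As far as I know, booting with norandmaps sets
/proc/sys/kernel/randomize_va_space to 0, so whether randomization is
currently off can be checked from user space. A minimal sketch follows;
read_int_file() and aslr_disabled() are hypothetical helper names, not
CRIU functions.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read a single integer from a sysctl-style file such as
 * /proc/sys/kernel/randomize_va_space. Returns the value, or -1 if
 * the file cannot be opened or does not start with an integer.
 */
int read_int_file(const char *path)
{
        int val = -1;
        FILE *f;

        assert(path);

        f = fopen(path, "r");
        if (!f)
                return -1;
        if (fscanf(f, "%d", &val) != 1)
                val = -1;
        fclose(f);

        return val;
}

/* A value of 0 means address space randomization is turned off. */
int aslr_disabled(void)
{
        return read_int_file("/proc/sys/kernel/randomize_va_space") == 0;
}
```

This only reports the sysctl value; it does not tell whether CRIU would
actually skip the proxy step on a given system.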

What I observed is that the failure happens when vdso_symbols[i] is
accessed in the function below.

void dump_vdso_symtable(void)
{
        const char *vdso_symbols[VDSO_SYMBOL_MAX] = {
                ARCH_VDSO_SYMBOLS
        };
        int i;

        for (i = 0; i < VDSO_SYMBOL_MAX; i++)
                pr_info("In %s i=%d max %d sym %s \n", __func__, i,
                        VDSO_SYMBOL_MAX, vdso_symbols[i]);
}

The above function is called from two places

1)  From vdso_init(), which calls vdso_fill_self_symtable()

static int vdso_fill_self_symtable(struct vdso_symtable *s)
{
        ....
                if (has_vdso) {
                        if (s->vma_start != VDSO_BAD_ADDR) {
                                pr_err("Got second vDSO entry\n");
                                goto err;
                        }
                        s->vma_start = start;
                        s->vma_end = end;
                        pr_info("In %s dumping vdso table \n", __func__);
                        dump_vdso_symtable();
                        pr_info("In %s calling vdso_fill_symtable \n", __func__);
                        ret = vdso_fill_symtable((void *)start, end - start, s);
                        if (ret)
                                goto err;
                }
        ....

}

The output is as follows; dumping the vdso symbol table succeeds here.

(00.000791) vdso: In vdso_fill_self_symtable dumping vdso table
(00.000812) vdso: In dump_vdso_symtable i=0 max 4 sym __kernel_clock_getres
(00.000822) vdso: In dump_vdso_symtable i=1 max 4 sym __kernel_clock_gettime
(00.000832) vdso: In dump_vdso_symtable i=2 max 4 sym __kernel_gettimeofday
(00.000841) vdso: In dump_vdso_symtable i=3 max 4 sym __kernel_rt_sigreturn
(00.000860) vdso: In vdso_fill_self_symtable calling vdso_fill_symtable
(00.000877) vdso: Parsing at 3ff91300000 3ff91310000

2)  From __export_restore_task() in pie/restorer.c

long __export_restore_task(struct task_restore_args *args)
{
        long ret = -1;
        int i;
        VmaEntry *vma_entry;
        unsigned long va;

        struct rt_sigframe *rt_sigframe;
        struct prctl_mm_map prctl_map;
        unsigned long new_sp;
        k_rtsigset_t to_block;
        pid_t my_pid = sys_getpid();
        rt_sigaction_t act;

        pr_info("In %s dump vdso table now \n",__func__);
        dump_vdso_symtable();
        pr_info("In %s dump vdso table after \n",__func__);
        bootstrap_start = args->bootstrap_start;
   ...

}

The log below shows that dump_vdso_symtable() fails here.

log:
task_args->nr_threads: 1
task_args->clone_restore_fn: 0x100b08
task_args->thread_args: 0x110540
pie: In __export_restore_task dump vdso table now
(01.164642) Error (cr-restore.c:1266): 2367 killed by signal 11
(01.223953) Error (cr-restore.c:1266): 2367 killed by signal 9
(01.364162) Error (cr-restore.c:1999): Restoring FAILED.
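
If the crash is indeed caused by the pointer array needing load-time
relocations, as you described, one common way to avoid them is to store
the names inline in a two-dimensional char array, so the table holds no
addresses. Below is a minimal sketch of that idea. SYM_MAX, SYM_LEN,
vdso_symbol_names, vdso_symbol_name() and vdso_symbol_index() are
hypothetical names; I have not verified that this is sufficient for the
restorer blob, and it is not necessarily how CRIU should fix it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SYM_MAX 4       /* hypothetical stand-in for VDSO_SYMBOL_MAX */
#define SYM_LEN 32      /* assumed upper bound on name length, incl. NUL */

/*
 * The characters live directly inside the array, so the table itself
 * contains no pointers that a loader would have to relocate. The four
 * names are the ones printed in the vdso_fill_self_symtable() log.
 */
static const char vdso_symbol_names[SYM_MAX][SYM_LEN] = {
        "__kernel_clock_getres",
        "__kernel_clock_gettime",
        "__kernel_gettimeofday",
        "__kernel_rt_sigreturn",
};

/* Return the name stored at index i. */
const char *vdso_symbol_name(size_t i)
{
        assert(i < SYM_MAX);
        return vdso_symbol_names[i];
}

/* Return the index of a symbol name, or -1 if it is not in the table. */
int vdso_symbol_index(const char *name)
{
        size_t i;

        for (i = 0; i < SYM_MAX; i++)
                if (strcmp(vdso_symbol_names[i], name) == 0)
                        return (int)i;
        return -1;
}
```

The cost is some padding per entry, since every name occupies SYM_LEN
bytes regardless of its length.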

> Regards,
> Christopher Covington
>
> --
> Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project

