[CRIU] Fwd: Checkpoint failure on arm64 platform

Vijay Kilari vijay.kilari at gmail.com
Wed Dec 23 09:54:11 PST 2015


Hi Christopher,

On Wed, Dec 23, 2015 at 10:34 PM, Christopher Covington
<cov at codeaurora.org> wrote:
> Hi Vijay,
>
> On 12/23/2015 10:53 AM, Vijay Kilari wrote:
>
>> Sorry, I tested with the wrong kernel. After applying your patch,
>> the output of the test code is as follows:
>>
>> [4190] vdso: 0x3ffa37d0000 - 0x3ffa37e0000
>> [4190] vdso moved to: 0x3ffa3640000 - 0x3ffa3650000
>> [4190] sending SIGUSR1 signal to myself
>> [4190] Caught signal 10 (siginfo=0x3ffe713d490)
>> [4190]    siginfo=[si_signo=10, si_pid=4190, si_uid=1000]
>> [4190] exiting
>>
>> In fact, I tried to call move_vdso() twice in the test code. It works.
>
> Good to hear. Now I (or someone else) just needs to do the work
> necessary to get that patch merged into the kernel.
>
>>>>> While commit 871da9 seemed to fix the issue for cross-compilation using
>>>>> Linaro compilers, native building under Arch results in the char
>>>>> *vdso_symbols requiring relocations. We're currently getting by with a
>>>>> hacky workaround. Unless some compiler option that we've not yet tried
>>>>> gets GCC to actually generate position-independent code, piegen
>>>>> probably will need to be ported to AArch64 (and perhaps AArch32 as well)
>>>>> to resolve the load-time relocations.
>>>>
>>>> Assuming the vdso mapping is already done in vdso_init()
>>>> for vdso_rt and vdso_vvar,
>>>> I have tried a dirty workaround where I removed the vdso mapping
>>>> in pie/restorer.c from the __export_restore_task() function. This
>>>> change worked: checkpoint and restore both succeeded.
>>>>
>>>> #if 0
>>>>         /*
>>>>          * Proxify vDSO.
>>>>          */
>>>>         for (i = 0; i < args->vmas_n; i++) {
>>>>                 if (vma_entry_is(&args->vmas[i], VMA_AREA_VDSO) ||
>>>>                     vma_entry_is(&args->vmas[i], VMA_AREA_VVAR)) {
>>>>                         pr_info("In %s calling vdso_proxify for i = %d\n", __func__, i);
>>>>                         if (vdso_proxify("dumpee", &args->vdso_sym_rt,
>>>>                                          args->vdso_rt_parked_at,
>>>>                                          i, args->vmas, args->vmas_n))
>>>>                                 goto core_restore_end;
>>>>                         break;
>>>>                 }
>>>>         }
>>>> #endif
>>>
>>> Our current use case is running trusted tools and workloads with as much
>>> determinism as possible, so we generally turn off address space layout
>>> randomization (using the norandmaps kernel parameter) which has a side
>>> effect of skipping the VDSO proxy step.
>>
>> You mean the VDSO proxy step succeeds if I pass norandmaps as a kernel parameter?
>
> With norandmaps specified at both dump and restore time, vdso_proxify
> will execute the "Runtime vdso/vvar matches dumpee, remap inplace" code
> path, returning before the "Runtime vdso mismatches dumpee, generate
> proxy" steps. While there have been hiccups in the past with generating
> the proxy, seeing the results below makes me think the vdso proxy is not
> the problem.
>
>> What I observed is that the failure happens when vdso_symbols[i]
>> is accessed in the function below.
>>
>> void dump_vdso_symtable(void)
>> {
>>         const char *vdso_symbols[VDSO_SYMBOL_MAX] = {
>>                 ARCH_VDSO_SYMBOLS
>>         };
>>         int i;
>>
>>         for (i = 0; i < VDSO_SYMBOL_MAX; i++)
>>                 pr_info("In %s i=%d max %d sym %s \n", __func__, i,
>>                         VDSO_SYMBOL_MAX, vdso_symbols[i]);
>> }
>>
>> The above function is called from two places:
>>
>> 1)  From vdso_init() which is calling vdso_fill_self_symtable()
>>
>> static int vdso_fill_self_symtable(struct vdso_symtable *s)
>> {
>>   ....
>>                 if (has_vdso) {
>>                         if (s->vma_start != VDSO_BAD_ADDR) {
>>                                 pr_err("Got second vDSO entry\n");
>>                                 goto err;
>>                         }
>>                         s->vma_start = start;
>>                         s->vma_end = end;
>>                         pr_info("In %s dumping vdso table \n",__func__);
>>                         dump_vdso_symtable();
>>                         pr_info("In %s calling vdso_fill_symtable \n", __func__);
>>                         ret = vdso_fill_symtable((void *)start, end - start, s);
>>                         if (ret)
>>                                 goto err;
>>                 }
>>    ....
>>
>> }
>>
>> The output is as follows, and the vdso symbol table is dumped successfully.
>>
>> (00.000791) vdso: In vdso_fill_self_symtable dumping vdso table
>> (00.000812) vdso: In dump_vdso_symtable i=0 max 4 sym __kernel_clock_getres
>> (00.000822) vdso: In dump_vdso_symtable i=1 max 4 sym __kernel_clock_gettime
>> (00.000832) vdso: In dump_vdso_symtable i=2 max 4 sym __kernel_gettimeofday
>> (00.000841) vdso: In dump_vdso_symtable i=3 max 4 sym __kernel_rt_sigreturn
>> (00.000860) vdso: In vdso_fill_self_symtable calling vdso_fill_symtable
>> (00.000877) vdso: Parsing at 3ff91300000 3ff91310000
>>
>> 2)  From __export_restore_task() in pie/restorer.c
>>
>> long __export_restore_task(struct task_restore_args *args)
>> {
>>         long ret = -1;
>>         int i;
>>         VmaEntry *vma_entry;
>>         unsigned long va;
>>
>>         struct rt_sigframe *rt_sigframe;
>>         struct prctl_mm_map prctl_map;
>>         unsigned long new_sp;
>>         k_rtsigset_t to_block;
>>         pid_t my_pid = sys_getpid();
>>         rt_sigaction_t act;
>>
>>         pr_info("In %s dump vdso table now \n",__func__);
>>         dump_vdso_symtable();
>>         pr_info("In %s dump vdso table after \n",__func__);
>>         bootstrap_start = args->bootstrap_start;
>>    ...
>>
>> }
>>
>> The log below shows that dump_vdso_symtable() fails here.
>>
>> log:
>> task_args->nr_threads: 1
>> task_args->clone_restore_fn: 0x100b08
>> task_args->thread_args: 0x110540
>> pie: In __export_restore_task dump vdso table now
>> (01.164642) Error (cr-restore.c:1266): 2367 killed by signal 11
>> (01.223953) Error (cr-restore.c:1266): 2367 killed by signal 9
>> (01.364162) Error (cr-restore.c:1999): Restoring FAILED.
>
> This indicates to me that GCC is generating code that requires run-time
> relocations that never get applied. The char *array seems to give it
> trouble. Probably the best long-term solution would be to port piegen to
> AArch64. My guess about how piegen works is that at build time, the
> framework (running on the build host) generates a minimal loader for the
> restorer blob (run on the target) that performs the necessary relocations.

Yes, the char * array is the issue. If I use a char array instead,
checkpoint and restore work.
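
To illustrate the difference, here is a minimal sketch of the char array
form. It is not the actual CRIU change: VDSO_SYMBOL_LENGTH and
vdso_symbol_index() are hypothetical names I made up for this example, and
the four symbol names are the ones printed in the log above. With an array
of pointers, each element stores the address of a string literal, which can
require a load-time relocation in a relocatable blob. With fixed-width char
arrays, the bytes of the names are stored inline in the table itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VDSO_SYMBOL_MAX    4
#define VDSO_SYMBOL_LENGTH 32	/* assumed width, not taken from CRIU */

/* The longest name from the log must fit, including its NUL terminator. */
static_assert(sizeof("__kernel_clock_gettime") <= VDSO_SYMBOL_LENGTH,
	      "VDSO_SYMBOL_LENGTH too small");

/*
 * Failing form, for comparison (array of pointers to string literals):
 *
 *	const char *vdso_symbols[VDSO_SYMBOL_MAX] = { ... };
 *
 * Form used here: an array of fixed-width char arrays. The table holds
 * the characters themselves, so no element is a pointer.
 */
static const char vdso_symbols[VDSO_SYMBOL_MAX][VDSO_SYMBOL_LENGTH] = {
	"__kernel_clock_getres",
	"__kernel_clock_gettime",
	"__kernel_gettimeofday",
	"__kernel_rt_sigreturn",
};

/* Hypothetical helper: index of name in the table, or -1 if absent. */
int vdso_symbol_index(const char *name)
{
	int i;

	for (i = 0; i < VDSO_SYMBOL_MAX; i++)
		if (strcmp(vdso_symbols[i], name) == 0)
			return i;
	return -1;
}
```

The cost is some padding per entry, and whether the compiler emits fully
relocation-free code for the accesses still depends on the build flags, so
treat this as a workaround rather than a replacement for porting piegen.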

>
> Christopher Covington
>
> --
> Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project

