[CRIU] Fwd: Checkpoint failure on arm64 platform
Vijay Kilari
vijay.kilari at gmail.com
Wed Dec 23 07:53:00 PST 2015
Hi Christopher,
On Wed, Dec 23, 2015 at 7:27 PM, Christopher Covington
<cov at codeaurora.org> wrote:
> Hi Vijay,
>
> On 12/23/2015 12:18 AM, Vijay Kilari wrote:
>
>>> Try running the test code attached to the following email:
>>>
>>> https://lists.openvz.org/pipermail/criu/2015-March/019161.html
>>
>> The output of test code is as follows
>>
>> ubuntu at ubuntu:~/criu$ mv attachment.bin attachment.c
>> ubuntu at ubuntu:~/criu$ gcc attachment.c
>> ubuntu at ubuntu:~/criu$ ./a.out
>> [40136] vdso: 0x3ff862c0000 - 0x3ff862d0000
>> [40136] vdso moved to: 0x3ff86130000 - 0x3ff86140000
>> [40136] sending SIGUSR1 signal to myself
>> [40136] Caught signal 10 (siginfo=0x3ffe3dce770)
>> [40136] siginfo=[si_signo=10, si_pid=40136, si_uid=1000]
>> Segmentation fault (core dumped)
>> ubuntu at ubuntu:~/criu$
>>
>>> If that crashes, try the following patch:
>>>
>>> http://www.spinics.net/lists/linux-arm-msm/msg18291.html
>>
>> After applying this patch
>>
>> ubuntu at ubuntu:~/criu$ ./a.out
>> [1493] vdso: 0x3ff93270000 - 0x3ff93280000
>> [1493] vdso moved to: 0x3ff930e0000 - 0x3ff930f0000
>> [1493] sending SIGUSR1 signal to myself
>> [1493] Caught signal 10 (siginfo=0x3ffc4e3f0c0)
>> [1493] siginfo=[si_signo=10, si_pid=1493, si_uid=1000]
>> [ 98.900231] pgd = fffffe03cd450000
>> [ 98.906466] [00000500] *pgd=00000003cee60003, *pud=00000003cee60003, *pmd=00000003cee60003, *pte=0000000000000000
>> Segmentation fault (core dumped)
>> ubuntu at ubuntu:~/criu$
>
> Doesn't look like the patch does anything :(. My patch depends on
> Laurent Dufour's work which was merged in 4.2 but it sounds like you
> should already have those dependencies (f2abee, 83d3f0, 4abad2, 2ae416).
>
Sorry, I tested with the wrong kernel. After applying your patch,
the output of the test code is as follows:
[4190] vdso: 0x3ffa37d0000 - 0x3ffa37e0000
[4190] vdso moved to: 0x3ffa3640000 - 0x3ffa3650000
[4190] sending SIGUSR1 signal to myself
[4190] Caught signal 10 (siginfo=0x3ffe713d490)
[4190] siginfo=[si_signo=10, si_pid=4190, si_uid=1000]
[4190] exiting
In fact, I also tried calling move_vdso() twice in the test code, and it works.
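For reference, the "vdso: start - end" lines above can be reproduced by scanning /proc/self/maps for the [vdso] entry. The following is only a minimal sketch of that lookup, not the attached test code: find_vdso() and print_vdso_range() are hypothetical names, and the mremap() move and the signal test are not shown.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: find the [vdso] mapping of the calling process.
 * Returns 0 and fills *start / *end on success, -1 if it is not found.
 */
int find_vdso(unsigned long *start, unsigned long *end)
{
    char line[512];
    FILE *maps;
    int ret = -1;

    assert(start != NULL && end != NULL);

    maps = fopen("/proc/self/maps", "r");
    if (!maps)
        return -1;

    while (fgets(line, sizeof(line), maps)) {
        unsigned long s, e;

        /* Each line starts with "<start>-<end> perms ..." in hex. */
        if (!strstr(line, "[vdso]"))
            continue;
        if (sscanf(line, "%lx-%lx", &s, &e) == 2) {
            *start = s;
            *end = e;
            ret = 0;
        }
        break;
    }

    fclose(maps);
    return ret;
}

/* Print the range in the same style as the test output in this thread. */
int print_vdso_range(void)
{
    unsigned long start, end;

    if (find_vdso(&start, &end))
        return -1;
    printf("[%d] vdso: 0x%lx - 0x%lx\n", (int)getpid(), start, end);
    return 0;
}
```

Calling print_vdso_range() before and after a move would show whether the mapping really changed address.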
>>> While commit 871da9 seemed to fix the issue for cross-compilation using
>>> Linaro compilers, native building under Arch results in the char
>>> *vdso_symbols requiring relocations. We're currently getting by with a
>>> hacky workaround. Unless some compiler option that we've not yet tried
>>> gets GCC to actually generate position independent code, piegen
>>> probably will need to be ported to AArch64 (and perhaps AArch32 as well)
>>> to resolve the load-time relocations.
>>
>> Assuming the vdso mapping is already done in vdso_init()
>> for vdso_rt and vdso_vvar,
>> I tried a dirty workaround: I removed the vdso mapping
>> from the __export_restore_task() function in pie/restorer.c. With this
>> change, both checkpoint and restore worked.
>>
>> #if 0
>>     /*
>>      * Proxify vDSO.
>>      */
>>     for (i = 0; i < args->vmas_n; i++) {
>>         if (vma_entry_is(&args->vmas[i], VMA_AREA_VDSO) ||
>>             vma_entry_is(&args->vmas[i], VMA_AREA_VVAR)) {
>>             pr_info("In %s calling vdso_proxify for i = %d\n", __func__, i);
>>             if (vdso_proxify("dumpee", &args->vdso_sym_rt,
>>                              args->vdso_rt_parked_at,
>>                              i, args->vmas, args->vmas_n))
>>                 goto core_restore_end;
>>             break;
>>         }
>>     }
>> #endif
>
> Our current use case is running trusted tools and workloads with as much
> determinism as possible, so we generally turn off address space layout
> randomization (using the norandmaps kernel parameter) which has a side
> effect of skipping the VDSO proxy step.
>
Do you mean that the VDSO proxy step succeeds if I pass norandmaps as a
kernel parameter?
What I observed is that the failure happens when vdso_symbols[i]
is accessed in the function below.
void dump_vdso_symtable(void)
{
    const char *vdso_symbols[VDSO_SYMBOL_MAX] = {
        ARCH_VDSO_SYMBOLS
    };
    int i;

    for (i = 0; i < VDSO_SYMBOL_MAX; i++)
        pr_info("In %s i=%d max %d sym %s \n", __func__, i,
                VDSO_SYMBOL_MAX, vdso_symbols[i]);
}
The above function is called from two places:
1) From vdso_init(), which calls vdso_fill_self_symtable():
static int vdso_fill_self_symtable(struct vdso_symtable *s)
{
    ....
    if (has_vdso) {
        if (s->vma_start != VDSO_BAD_ADDR) {
            pr_err("Got second vDSO entry\n");
            goto err;
        }
        s->vma_start = start;
        s->vma_end = end;
        pr_info("In %s dumping vdso table \n", __func__);
        dump_vdso_symtable();
        pr_info("In %s calling vdso_fill_symtable \n", __func__);
        ret = vdso_fill_symtable((void *)start, end - start, s);
        if (ret)
            goto err;
    }
    ....
}
The output is as follows; dumping the vdso symbol table succeeds here.
(00.000791) vdso: In vdso_fill_self_symtable dumping vdso table
(00.000812) vdso: In dump_vdso_symtable i=0 max 4 sym __kernel_clock_getres
(00.000822) vdso: In dump_vdso_symtable i=1 max 4 sym __kernel_clock_gettime
(00.000832) vdso: In dump_vdso_symtable i=2 max 4 sym __kernel_gettimeofday
(00.000841) vdso: In dump_vdso_symtable i=3 max 4 sym __kernel_rt_sigreturn
(00.000860) vdso: In vdso_fill_self_symtable calling vdso_fill_symtable
(00.000877) vdso: Parsing at 3ff91300000 3ff91310000
2) From __export_restore_task() in pie/restorer.c
long __export_restore_task(struct task_restore_args *args)
{
    long ret = -1;
    int i;
    VmaEntry *vma_entry;
    unsigned long va;
    struct rt_sigframe *rt_sigframe;
    struct prctl_mm_map prctl_map;
    unsigned long new_sp;
    k_rtsigset_t to_block;
    pid_t my_pid = sys_getpid();
    rt_sigaction_t act;

    pr_info("In %s dump vdso table now \n", __func__);
    dump_vdso_symtable();
    pr_info("In %s dump vdso table after \n", __func__);
    bootstrap_start = args->bootstrap_start;
    ...
}
The log below shows that dump_vdso_symtable() fails here:
task_args->nr_threads: 1
task_args->clone_restore_fn: 0x100b08
task_args->thread_args: 0x110540
pie: In __export_restore_task dump vdso table now
(01.164642) Error (cr-restore.c:1266): 2367 killed by signal 11
(01.223953) Error (cr-restore.c:1266): 2367 killed by signal 9
(01.364162) Error (cr-restore.c:1999): Restoring FAILED.
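This crash is consistent with the relocation problem described earlier in the thread: a table of const char * stores the addresses of the strings, and those addresses may need load-time relocations that nothing applies to the restorer blob. As a sketch only, and not CRIU's actual code, the block below shows an alternative layout in which a fixed-width char array keeps the characters inside the table itself. The names sym_inline, SYM_NAME_LEN and find_symbol() are hypothetical, and whether GCC then emits no relocation for the restorer on AArch64 is an assumption that would need to be checked, for example with readelf -r.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SYM_MAX      4
#define SYM_NAME_LEN 32  /* hypothetical upper bound on a symbol name */

/*
 * Pointer form, as in dump_vdso_symtable():
 *     const char *vdso_symbols[] = { "__kernel_clock_getres", ... };
 * Each slot holds the address of a string literal.
 *
 * Inline form (below): each row holds the characters themselves,
 * so the table contains no addresses.
 */
const char sym_inline[SYM_MAX][SYM_NAME_LEN] = {
    "__kernel_clock_getres",
    "__kernel_clock_gettime",
    "__kernel_gettimeofday",
    "__kernel_rt_sigreturn",
};

/* Return the index of name in the table, or -1 if it is not present. */
int find_symbol(const char *name)
{
    size_t i;

    assert(name != NULL);

    for (i = 0; i < SYM_MAX; i++)
        if (strcmp(sym_inline[i], name) == 0)
            return (int)i;
    return -1;
}
```

The cost of the inline form is some padding per entry, since every row is SYM_NAME_LEN bytes regardless of the name's length.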
> Regards,
> Christopher Covington
>
> --
> Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project