[CRIU] Fwd: Checkpoint failure on arm64 platform

Vijay Kilari vijay.kilari at gmail.com
Tue Dec 22 21:18:48 PST 2015


Hi Christopher,

On Tue, Dec 22, 2015 at 9:52 PM, Christopher Covington
<cov at codeaurora.org> wrote:
> Hi Vijay,
>
> I'm glad to see someone else using the AArch64 port. Sorry it has so
> many cobwebs.
>
> On 12/22/2015 06:48 AM, Vijay Kilari wrote:
>> On Tue, Dec 22, 2015 at 4:13 PM, Pavel Emelyanov <xemul at parallels.com> wrote:
>>> On 12/22/2015 09:03 AM, Vijay Kilari wrote:
>
>>>> After these changes + changing PAGE_SIZE to 64KB, no errors are
>>>> reported during checkpoint.
>>>
>>> Good :) Christopher (in Cc) once started to fix the page-size problem for
>>> arm and aarch64.
>
> Our current hack, which we've grown complacent with, is to remove the check
> in crtools.c that PAGE_SIZE is PAGE_IMAGE_SIZE (4096), and build twice,
> once with PAGE_SIZE at the default of 4096, and again with
> -DPAGE_SIZE=65536. We rename each criu executable so that we essentially
> have "criu_4096" and "criu_65536". We then use a wrapper script named
> "criu" to run criu_$(getconf PAGE_SIZE).
>
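
The name selection done by that wrapper can be sketched in C as well. The real wrapper is a shell script; this is only a C rendering of the same idea, and criu_binary_name() is a hypothetical helper, not something in the CRIU tree.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: build the per-page-size binary name,
 * e.g. "criu_4096" or "criu_65536".
 * Returns 0 on success, -1 if the page size cannot be probed
 * or buf is too small to hold the name.
 */
static int criu_binary_name(char *buf, size_t len)
{
	long psize = sysconf(_SC_PAGESIZE);
	int n;

	if (psize <= 0)
		return -1;

	n = snprintf(buf, len, "criu_%ld", psize);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A wrapper would then pass the resulting name to execvp() together with its own arguments.
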
> Hopefully the foundation has been laid to finally take the 4K build, run
> it on 64K (and/or vice versa), see what crashes, and change each
> breaking, hard-coded PAGE_SIZE usage to
> run-time-detected-on-arches-that-need-it page_size(). When I tried this
> previously, support from within the parasite code was most difficult,
> but over the summer I finally figured out how to pass through such
> information with the very similar task_args->task_size = kdat.task_size
> change (7451fc). Lastly, the image files should include the page size as
> well. Pavel suggested doing this with cpuinfo dump/check images
> (cpuinfo.proto). Proper runtime probing might even preemptively add 16K
> support, in case anyone is interested in that.
>
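
A minimal sketch of the run-time probing described above, assuming sysconf() is usable where the helper is called. That holds for the main criu process only; the parasite code has no libc and would need the value passed in. The body of page_size() and the round_up_to_page() helper are my assumptions, not existing CRIU code.

```c
#include <unistd.h>

/*
 * Page size probed once at run time instead of a compile-time
 * PAGE_SIZE constant. Returns 0 if probing fails.
 */
static unsigned long page_size(void)
{
	static unsigned long cached;

	if (!cached) {
		long ret = sysconf(_SC_PAGESIZE);

		if (ret > 0)
			cached = (unsigned long)ret;
	}
	return cached;
}

/*
 * Hypothetical user of page_size(): round len up to a whole number
 * of pages. Assumes page_size() succeeded and is a power of two.
 */
static unsigned long round_up_to_page(unsigned long len)
{
	unsigned long mask = page_size() - 1;

	return (len + mask) & ~mask;
}
```

Each hard-coded PAGE_SIZE usage that breaks on the other page size would be converted to a call like this one at a time.
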
> 32-bit ARM (AArch32) only has 4K pages, although there was recent work
> on runtime probing for TASK_SIZE, if I recall correctly.
>
>> The clone() system call is failing because arm64 requires the stack
>> pointer to be 16-byte aligned. With the change below in cr-restore.c,
>> clone() succeeds. I will send a patch for this as well. Many places
>> inside test/zdtm also require similar changes.
>>
>> struct cr_clone_arg {
>>         /*
>>          * Reserve some space for clone() to locate arguments
>>          * and retcode in this place
>>          */
>> -        char stack[128] __attribute__((aligned (8)));
>> +        char stack[128] __attribute__((aligned (16)));
>>         char stack_ptr[0];
>>         struct pstree_item *item;
>>         unsigned long clone_flags;
>>         int fd;
>>
>>         CoreEntry *core;
>> };
>
> This is correct. Apologies for not having already sent this patch.
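
For the test/zdtm call sites, the same requirement can be shown with a standalone sketch. It is not the cr-restore.c code: child_stack, child_fn() and run_child() are made-up names, and the stack is a separate 64 KiB buffer rather than the 128-byte member of struct cr_clone_arg.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Child stack; AArch64 requires the stack pointer to be 16-byte aligned. */
static char child_stack[64 * 1024] __attribute__((aligned(16)));

static int child_fn(void *arg)
{
	return *(int *)arg;	/* becomes the child's exit status */
}

/* Clone a child on child_stack and return its exit status, or -1 on error. */
static int run_child(int code)
{
	/* The stack grows down, so clone() gets the top of the buffer. */
	char *top = child_stack + sizeof(child_stack);
	int status;
	pid_t pid;

	/* The buffer size is a multiple of 16, so top stays aligned. */
	if ((uintptr_t)top % 16)
		return -1;

	pid = clone(child_fn, top, SIGCHLD, &code);
	if (pid < 0)
		return -1;
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

With aligned (8) the top of the buffer may or may not land on a 16-byte boundary depending on where the object is placed, which would explain a clone() failure that only shows up in some places.
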
>
>> However, restore now fails with signal 11 while executing
>> vdso_fill_symtable(). Any idea?
>>
>> restore.log:
>> ---------------
>> pie: vdso: Parsing at 0x3ff81110000 0x3ff81120000
>> pie: vdso: PT_LOAD p_vaddr: 0x0
>> pie: vdso: DT_HASH: 0x120
>> pie: vdso: DT_STRTAB: 0x1f8
>> pie: vdso: DT_SYMTAB: 0x150
>> pie: vdso: DT_STRSZ: 0x77
>> pie: vdso: DT_SYMENT: 0x18
>> pie: vdso: nbucket 0x3 nchain 0x7 bucket 0x3ff81110128 chain 0x3ff8111>
>> pie: 0134
>> (01.062166) Error (cr-restore.c:1266): 24818 killed by signal 11
>> (01.103990) Error (cr-restore.c:1266): 24818 killed by signal 9
>> (01.294182) Error (cr-restore.c:1999): Restoring FAILED.
>
> Try running the test code attached to the following email:
>
> https://lists.openvz.org/pipermail/criu/2015-March/019161.html

The output of the test code is as follows:

ubuntu at ubuntu:~/criu$ mv attachment.bin attachment.c
ubuntu at ubuntu:~/criu$ gcc attachment.c
ubuntu at ubuntu:~/criu$ ./a.out
[40136] vdso: 0x3ff862c0000 - 0x3ff862d0000
[40136] vdso moved to: 0x3ff86130000 - 0x3ff86140000
[40136] sending SIGUSR1 signal to myself
[40136] Caught signal 10 (siginfo=0x3ffe3dce770)
[40136]    siginfo=[si_signo=10, si_pid=40136, si_uid=1000]
Segmentation fault (core dumped)
ubuntu at ubuntu:~/criu$
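
For anyone reproducing this without the attachment, the vdso start address can be cross-checked from inside a process with getauxval(). I have not checked whether the attached test obtains it this way; vdso_base() and looks_like_elf() are made-up names for this sketch, and it gives only the start of the mapping, not the end.

```c
#include <elf.h>
#include <string.h>
#include <sys/auxv.h>

/* Return the vDSO load address, or 0 if the kernel did not provide one. */
static unsigned long vdso_base(void)
{
	return getauxval(AT_SYSINFO_EHDR);
}

/* Return 1 if addr points at an ELF header (magic bytes match), else 0. */
static int looks_like_elf(unsigned long addr)
{
	return addr && memcmp((const void *)addr, ELFMAG, SELFMAG) == 0;
}
```

Note that the auxiliary vector is filled in at exec time, so after the vdso has been moved this value still refers to the old address.
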

>
> If that crashes, try the following patch:
>
> http://www.spinics.net/lists/linux-arm-msm/msg18291.html

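After applying this patch, the test still segfaults (the kernel console
messages were interleaved with the program output and are separated here):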

ubuntu at ubuntu:~/criu$ ./a.out
[1493] vdso: 0x3ff93270000 - 0x3ff93280000
[1493] vdso moved to: 0x3ff930e0000 - 0x3ff930f0000
[1493] sending SIGUSR1 signal to myself
[1493] Caught signal 10 (siginfo=0x3ffc4e3f0c0)
[1493]    siginfo=[si_signo=10, si_pid=1493, si_uid=1000]
[   98.900231] pgd = fffffe03cd450000
[   98.906466] [00000500] *pgd=00000003cee60003, *pud=00000003cee60003, *pmd=00000003cee60003, *pte=0000000000000000
Segmentation fault (core dumped)
ubuntu at ubuntu:~/criu$

>
> While commit 871da9 seemed to fix the issue for cross-compilation using
> Linaro compilers, native building under Arch results in the char
> *vdso_symbols requiring relocations. We're currently getting by with a
> hacky workaround. Unless some compiler option that we've not yet tried
> gets GCC to actually generate position-independent code, piegen
> probably will need to be ported to AArch64 (and perhaps AArch32 as well)
> to resolve the load-time relocations.

Assuming the vdso mapping has already been done in vdso_init() for
vdso_rt and vdso_vvar, I tried a dirty workaround: I disabled the vdso
mapping in __export_restore_task() in pie/restorer.c with #if 0. With
this change, both checkpoint and restore worked.

#if 0
        /*
         * Proxify vDSO.
         */
        for (i = 0; i < args->vmas_n; i++) {
                if (vma_entry_is(&args->vmas[i], VMA_AREA_VDSO) ||
                    vma_entry_is(&args->vmas[i], VMA_AREA_VVAR)) {
                        pr_info("In %s calling vdso_proxify for i = %d\n",
                                __func__, i);
                        if (vdso_proxify("dumpee", &args->vdso_sym_rt,
                                         args->vdso_rt_parked_at,
                                         i, args->vmas, args->vmas_n))
                                goto core_restore_end;
                        break;
                }
        }
#endif


>
> Christopher Covington
>
> --
> Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project

