[CRIU] Dealing with VDSO remap

Christopher Covington cov at codeaurora.org
Mon Mar 16 08:37:23 PDT 2015


On 03/09/2015 12:42 PM, Laurent Dufour wrote:
> On 09/03/2015 15:04, Christopher Covington wrote:
>> Hi Laurent,
>>
>> On 03/09/2015 09:34 AM, Laurent Dufour wrote:
>>> Hi Chris,
>>>
>>> On 06/03/2015 15:58, Christopher Covington wrote:
>>>> Hi Laurent,
>>>>
>>>> On 03/06/2015 09:15 AM, Laurent Dufour wrote:
>>>>> Hi,
>>>>>
>>>>> I'm porting CRIU to the PowerPC architecture, and among other issues,
>>>>> I'm facing a major one with the VDSO remapping at restart time.
>>>>>
>>>>> On PowerPC, as on ARM64, the kernel keeps track of the VDSO base address
>>>>> because it uses it to jump back to a sigreturn trampoline at the end
>>>>> of signal handling (see handle_rt_signal64 in
>>>>> arch/powerpc/kernel/signal_64.c, and for ARM64, setup_return in
>>>>> arch/arm64/kernel/signal.c).
>>>>>
>>>>> When remapping the VDSO at restart time, the kernel keeps the reference
>>>>> to the previous VDSO mapping, the one inherited from criu, so
>>>>> handling a signal after the restart leads to unpredictable results; most
>>>>> of the time a SIGSEGV is raised.
>>>>>
>>>>> I didn't find a smart way to update the kernel reference to the vdso
>>>>> mapping once the VDSO is remapped, so no way to work around that today.
>>>>>
>>>>> Furthermore, since this is the same picture on ARM 64, I'm wondering how
>>>>> it could work on this architecture. Am I missing a major thing here ?
>>>>>
>>>>> If not, is there a plan in the CRIU project to deal with that, other
>>>>> than by hacking the kernel to update its reference at restart time ?
>>>>
>>>> It's been a while since I worked on this, and I feel like I never had a really
>>>> solid understanding of all the parts, but hopefully this can help.
>>>>
>>>> I think the ideal solution would be for a remap system call to move the VDSO.
>>>> This may have been implemented for x86, but I think it's a new feature and
>>>> missing on most other architectures. There's a lot of duplication in the VDSO
>>>> code between architectures. If there was less duplication, the x86 additions
>>>> might easily apply to other architectures as well, but I've never gotten
>>>> around to consolidating the VDSO code and I haven't noticed anyone else having
>>>> gotten around to it either.
>>>
>>> I came to the same conclusion: when the VDSO area is remapped, some
>>> architecture-specific code should be triggered in the kernel to update
>>> the VDSO reference.
>>> I'll take a closer look at mremap in the kernel.
>>>
>>>> The workaround is to put trampolines/branches at the location where the
>>>> restored process expects the VDSO, pointing to the location where the
>>>> VDSO is actually mapped at restore time. See vdso_redirect_calls in
>>>> arch/aarch64/vdso-pie.c.
>>>
>>> I put the same code in the new ppc64 branch I created but it is only
>>> dealing with user space's references to the VDSO, not the kernel ones.
>>>
>>> Unfortunately, creating a trampoline at the place the kernel put the
>>> VDSO at restart is not working all the time since this area may conflict
>>> with a checkpointed memory part. Updating the kernel reference to the
>>> VDSO, when it has been moved, looks to be the only way to address that.
>>
>> I see. My "production" runs are known/trusted code running under qemu system
>> emulation with the norandmaps kernel parameter set for run-to-run
>> determinism/reproducibility. I've only done light testing with randomization
>> fully enabled, so I'm afraid my experience here is limited, but my
>> recollection is that (and some quick double checking confirms) the vdso00 and
>> vdso01 test cases pass for me with /proc/sys/kernel/randomize_va_space == 2.
>> Triggering the case you describe requires an ET_DYN binary (CFLAGS="-pie
>> -fPIE"), right? My binaries are currently ET_EXEC. Should we update the vdso
>> test cases to use those flags?
> 
> I did my test with the same randomize_va_space value, and I'm wondering
> if this parameter changes the way the VDSO is mapped on Power.
> 
> This being said, I'm confused by your mention of an "ET_DYN binary".
> My concern is about the reference to the process's VDSO base address that
> the kernel uses to build the return stack of a signal. This is not tied
> to the way the process is built. It could be statically linked, PIE, PIC
> or whatever else, the signal return stack will be the same, a call to
> the system call sigreturn made through the VDSO. Looking at the ARM 64
> kernel code, this looks to be the same whatever the process's binary is
> (setup_return).
> Am I missing something here ?

(I wrote this a while back but neglected to hit send until now.)

Regarding the effect of -pie -fPIE, here is edited /proc/self/maps output from
an ET_EXEC binary on AArch64:

00000400000  /bin/aarch64-linux-gnu/busybox
00000570000  /bin/aarch64-linux-gnu/busybox
3ffad9d0000
3ffad9e0000  /lib/aarch64-linux-gnu/libc-2.19-2014.05.so
3ffadb20000  /lib/aarch64-linux-gnu/libc-2.19-2014.05.so
3ffadb30000  /lib/aarch64-linux-gnu/libm-2.19-2014.05.so
3ffadbd0000  /lib/aarch64-linux-gnu/libm-2.19-2014.05.so
3ffadbe0000  [vvar]
3ffadbf0000  [vdso]
3ffadc00000  /lib/aarch64-linux-gnu/ld-2.19-2014.05.so
3ffadc20000  /lib/aarch64-linux-gnu/ld-2.19-2014.05.so
3ffd4170000  [stack]

If it were ET_DYN, the busybox mmaps would be in a 3ff region and could
possibly overlap the vdso region of a previous process, which I thought was
what you were describing.
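
To make the overlap concern concrete, here is a minimal Python sketch (not
part of CRIU) that parses maps-style text and reports which mappings a given
address range would collide with. The helper names (parse_maps, find_region,
conflicts_with) and the SAMPLE text are made up for illustration: the sample
borrows start addresses from the listing above, but the end addresses,
permissions and device/inode fields are invented.

```python
import os

# Hypothetical maps text. Start addresses follow the listing above; end
# addresses, permissions and device/inode fields are invented.
SAMPLE = """\
00400000-00570000 r-xp 00000000 fe:00 1234 /bin/aarch64-linux-gnu/busybox
3ffadbe0000-3ffadbf0000 r--p 00000000 00:00 0 [vvar]
3ffadbf0000-3ffadc00000 r-xp 00000000 00:00 0 [vdso]
3ffd4170000-3ffd4190000 rw-p 00000000 00:00 0 [stack]
"""


def parse_maps(text):
    """Parse /proc/<pid>/maps text into (start, end, name) tuples."""
    regions = []
    for line in text.splitlines():
        # Fields: range, perms, offset, dev, inode, then an optional name.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start_s, end_s = fields[0].split("-")
        name = fields[5].strip() if len(fields) > 5 else ""
        regions.append((int(start_s, 16), int(end_s, 16), name))
    return regions


def find_region(regions, name):
    """Return the first region with the given name, or None."""
    for region in regions:
        if region[2] == name:
            return region
    return None


def overlaps(a_start, a_end, b_start, b_end):
    """True if the half-open ranges [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def conflicts_with(regions, start, end):
    """List the regions that intersect the half-open range [start, end)."""
    return [r for r in regions if overlaps(r[0], r[1], start, end)]


if __name__ == "__main__":
    # Use the live maps of this process on Linux, else fall back to SAMPLE.
    path = "/proc/self/maps"
    if os.path.exists(path):
        with open(path) as f:
            text = f.read()
    else:
        text = SAMPLE
    print("vdso:", find_region(parse_maps(text), "[vdso]"))
```

Feeding it the checkpointed process's mappings and the range where the kernel
placed the VDSO at restore time would show whether the two collide. This only
illustrates the user-space side; it says nothing about the kernel's own
reference to the VDSO base.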

Chris

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

