[CRIU] Dealing with VDSO remap

Laurent Dufour ldufour at linux.vnet.ibm.com
Wed Mar 18 01:29:34 PDT 2015


On 16/03/2015 16:37, Christopher Covington wrote:
> On 03/09/2015 12:42 PM, Laurent Dufour wrote:
>> On 09/03/2015 15:04, Christopher Covington wrote:
>>> Hi Laurent,
>>>
>>> On 03/09/2015 09:34 AM, Laurent Dufour wrote:
>>>> Hi Chris,
>>>>
>>>> On 06/03/2015 15:58, Christopher Covington wrote:
>>>>> Hi Laurent,
>>>>>
>>>>> On 03/06/2015 09:15 AM, Laurent Dufour wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I'm porting CRIU to the PowerPC architecture, and among other issues,
>>>>>> I'm facing a major one with the VDSO remapping at restart time.
>>>>>>
>>>>>> On PowerPC, as on ARM64, the kernel keeps track of the VDSO base address
>>>>>> because it is using it to jump back to a sigreturn trampoline at the end
>>>>>> of a signal processing (see handle_rt_signal64 in
>>>>>> arch/powerpc/kernel/signal_64.c, and for ARM64, setup_return in
>>>>>> arch/arm64/kernel/signal.c).
>>>>>>
>>>>>> When remapping the VDSO at restart time, the kernel keeps the reference
>>>>>> to the previous VDSO mapping, the one inherited from criu, so
>>>>>> handling a signal after the restart leads to unpredictable results; most
>>>>>> of the time a SIGSEGV is raised.
>>>>>>
>>>>>> I didn't find a smart way to update the kernel reference to the vdso
>>>>>> mapping once the VDSO is remapped, so no way to work around that today.
>>>>>>
>>>>>> Furthermore, since this is the same picture on ARM 64, I'm wondering how
>>>>>> it could work on this architecture. Am I missing a major thing here ?
>>>>>>
>>>>>> If not, is there a plan in the CRIU project to deal with that, other
>>>>>> than by hacking the kernel to update its reference at restart time ?
>>>>>
>>>>> It's been a while since I worked on this, and I feel like I never had a really
>>>>> solid understanding of all the parts, but hopefully this can help.
>>>>>
>>>>> I think the ideal solution would be for a remap system call to move the VDSO.
>>>>> This may have been implemented for x86, but I think it's a new feature and
>>>>> missing on most other architectures. There's a lot of duplication in the VDSO
>>>>> code between architectures. If there was less duplication, the x86 additions
>>>>> might easily apply to other architectures as well, but I've never gotten
>>>>> around to consolidating the VDSO code and I haven't noticed anyone else having
>>>>> gotten around to it either.
>>>>
>>>> I came to the same conclusion: when the VDSO area is remapped, some
>>>> architecture-specific code should be triggered in the kernel to update
>>>> the VDSO reference.
>>>> I'll take a closer look at mremap in the kernel.
>>>>
>>>>> The workaround is to put trampolines/branches at the location where the
>>>>> restored process expects the VDSO, pointing to where the VDSO is actually
>>>>> located at restore time. See vdso_redirect_calls in arch/aarch64/vdso-pie.c.
>>>>
>>>> I put the same code in the new ppc64 branch I created but it is only
>>>> dealing with user space's references to the VDSO, not the kernel ones.
>>>>
>>>> Unfortunately, creating a trampoline at the place the kernel put the
>>>> VDSO at restart is not working all the time since this area may conflict
>>>> with a checkpointed memory part. Updating the kernel reference to the
>>>> VDSO, when it has been moved, looks to be the only way to address that.
>>>
>>> I see. My "production" runs are known/trusted code running under qemu system
>>> emulation with the norandmaps kernel parameter set for run-to-run
>>> determinism/reproducibility. I've only done light testing with randomization
>>> fully enabled, so I'm afraid my experience here is limited, but my
>>> recollection is that (and some quick double checking confirms) the vdso00 and
>>> vdso01 test cases pass for me with /proc/sys/kernel/randomize_va_space == 2.
>>> Triggering the case you describe requires an ET_DYN binary (CFLAGS="-pie
>>> -fPIE"), right? My binaries are currently ET_EXEC. Should we update the vdso
>>> test cases to use those flags?
>>
>> I did my test with the same randomize_va_space value, and I'm wondering
>> if this parameter changes the way the VDSO is mapped on Power.
>>
>> This being said, I'm confused by your mention of an "ET_DYN binary".
>> My concern is about the reference to the process's VDSO base address that
>> the kernel uses to build the return stack of a signal. This is not tied
>> to the way the process is built. It could be statically linked, PIE, PIC
>> or whatever else, the signal return stack will be the same, a call to
>> the system call sigreturn made through the VDSO. Looking at the ARM 64
>> kernel code, this looks to be the same whatever the process's binary is
>> (setup_return).
>> Am I missing something here ?
> 
> (I wrote this a while back but neglected to hit send until now.)
> 
> Regarding the effect of -pie -fPIE, here is edited /proc/self/maps output from
> an ET_EXEC binary on AArch64:
> 
> 00000400000  /bin/aarch64-linux-gnu/busybox
> 00000570000  /bin/aarch64-linux-gnu/busybox
> 3ffad9d0000
> 3ffad9e0000  /lib/aarch64-linux-gnu/libc-2.19-2014.05.so
> 3ffadb20000  /lib/aarch64-linux-gnu/libc-2.19-2014.05.so
> 3ffadb30000  /lib/aarch64-linux-gnu/libm-2.19-2014.05.so
> 3ffadbd0000  /lib/aarch64-linux-gnu/libm-2.19-2014.05.so
> 3ffadbe0000  [vvar]
> 3ffadbf0000  [vdso]
> 3ffadc00000  /lib/aarch64-linux-gnu/ld-2.19-2014.05.so
> 3ffadc20000  /lib/aarch64-linux-gnu/ld-2.19-2014.05.so
> 3ffd4170000  [stack]
> 
> If it were ET_DYN, the busybox mmaps would be in a 3ff region and could
> possibly overlap the vdso region of a previous process, which I thought was
> what you were describing.

No, my concern is about remapping the vDSO without the kernel updating
its reference to the vDSO base.

On my ppc64 system, the attached test case makes the process dump core
when it returns from the signal handler once the vDSO has been
remapped.
I'd appreciate it if you could give it a try on an ARM64 box/guest. I tried
to set up an ARM64 guest on my side, but I failed to make it run :(
My thought is that the process should core dump on ARM64 too, since the
sigreturn stack frame will still point to the vDSO's old base address.
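
For readers without the attachment, here is a minimal sketch of the kind
of test described above. It is not the scrubbed vdso_remap.c, whose
content is not reproduced here; every function name below is made up
for illustration. It finds [vdso] in /proc/self/maps, moves it with
mremap(), then raises a signal so the handler has to return through the
sigreturn trampoline. Only [vdso] is moved; [vvar] is left in place.

```c
#define _GNU_SOURCE            /* for mremap() and MREMAP_* */
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Return 1 and fill start/end if a /proc/<pid>/maps line names [vdso]. */
int parse_vdso_line(const char *line, unsigned long *start,
                    unsigned long *end)
{
    if (strstr(line, "[vdso]") == NULL)
        return 0;
    return sscanf(line, "%lx-%lx", start, end) == 2;
}

/* Locate this process's vDSO mapping; 0 on success, -1 otherwise. */
int find_vdso(unsigned long *start, unsigned long *end)
{
    char line[512];
    int found = 0;
    FILE *maps = fopen("/proc/self/maps", "r");

    if (maps == NULL)
        return -1;
    while (!found && fgets(line, sizeof(line), maps) != NULL)
        found = parse_vdso_line(line, start, end);
    fclose(maps);
    return found ? 0 : -1;
}

/* Move the vDSO to a kernel-chosen free range; NULL on failure. */
void *move_vdso(void)
{
    unsigned long start, end;
    void *target, *moved;

    if (find_vdso(&start, &end) != 0)
        return NULL;
    /* Reserve a free range so MREMAP_FIXED cannot clobber live data. */
    target = mmap(NULL, end - start, PROT_NONE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (target == MAP_FAILED)
        return NULL;
    moved = mremap((void *)start, end - start, end - start,
                   MREMAP_MAYMOVE | MREMAP_FIXED, target);
    return moved == MAP_FAILED ? NULL : moved;
}

static volatile sig_atomic_t handler_ran;

static void on_sigusr1(int sig)
{
    (void)sig;
    handler_ran = 1;
}

/*
 * Returns 1 if the handler ran and returned normally, -1 if setup
 * failed. If the kernel still uses the old vDSO base for the sigreturn
 * trampoline, this function is expected never to return: the process
 * dies when the handler returns.
 */
int signal_after_remap(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0 || move_vdso() == NULL)
        return -1;
    raise(SIGUSR1);
    return handler_ran;
}
```

A driver would simply call signal_after_remap() from main() and report
the result. Where the signal return path does not go through the vDSO,
or where the kernel tracks the move, the call should return 1. The
sketch avoids libc calls that may be backed by the vDSO after the move,
since libc may have cached pointers into the old mapping.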

I'm currently working on a kernel patch to handle the vDSO remapping,
which could solve part of this issue (there is still a window which
can't be addressed).

Thanks,
Laurent.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vdso_remap.c
Type: text/x-csrc
Size: 2515 bytes
Desc: not available
URL: <http://lists.openvz.org/pipermail/criu/attachments/20150318/f02ed9ea/attachment.bin>

