[CRIU] Dealing with VDSO remap

Mon Mar 9 09:42:35 PDT 2015

On 09/03/2015 15:04, Christopher Covington wrote:
> Hi Laurent,
> 
> On 03/09/2015 09:34 AM, Laurent Dufour wrote:
>> Hi Chris,
>>
>> On 06/03/2015 15:58, Christopher Covington wrote:
>>> Hi Laurent,
>>>
>>> On 03/06/2015 09:15 AM, Laurent Dufour wrote:
>>>> Hi,
>>>>
>>>> I'm porting CRIU to the PowerPC architecture, and among other issues,
>>>> I'm facing a major one with the VDSO remapping at restart time.
>>>>
>>>> On PowerPC, as on ARM64, the kernel keeps track of the VDSO base address
>>>> because it uses it to set the signal handler's return address to a
>>>> sigreturn trampoline in the VDSO (see handle_rt_signal64 in
>>>> arch/powerpc/kernel/signal_64.c, and for ARM64, setup_return in
>>>> arch/arm64/kernel/signal.c).
>>>>
>>>> When the VDSO is remapped at restart time, the kernel keeps its reference
>>>> to the previous VDSO mapping, the one inherited from criu, so handling a
>>>> signal after the restart leads to unpredictable results; most of the
>>>> time a SIGSEGV is raised.
>>>>
>>>> I didn't find a clean way to update the kernel's reference to the VDSO
>>>> mapping once the VDSO is remapped, so there is no way to work around
>>>> that today.
>>>>
>>>> Furthermore, since the picture is the same on ARM64, I'm wondering how
>>>> it could work on that architecture. Am I missing something major here?
>>>>
>>>> If not, is there a plan in the CRIU project to deal with that, other
>>>> than hacking the kernel to update its reference at restart time?
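To see where the VDSO currently lives in a process (for example before and after a restore), the [vdso] line of /proc/<pid>/maps can be inspected. The following is a minimal C sketch, assuming a Linux /proc; parse_vdso_line and find_self_vdso are hypothetical helper names, not CRIU functions. Note that it only shows the user-visible mapping, not the stale base address the kernel may still hold internally.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: parse one line in /proc/<pid>/maps format
 * ("start-end perms offset dev inode [pathname]") and report its
 * range if the pathname field is "[vdso]".
 * Returns 0 on a match, -1 otherwise.
 */
static int parse_vdso_line(const char *line, unsigned long *start,
                           unsigned long *end)
{
    char name[64] = "";

    /* Skip perms, offset, dev and inode; read the optional pathname. */
    if (sscanf(line, "%lx-%lx %*s %*s %*s %*s %63s", start, end, name) < 2)
        return -1;
    return strcmp(name, "[vdso]") == 0 ? 0 : -1;
}

/* Hypothetical helper: locate the [vdso] mapping of the calling process. */
static int find_self_vdso(unsigned long *start, unsigned long *end)
{
    char line[512];
    int ret = -1;
    FILE *f = fopen("/proc/self/maps", "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if (parse_vdso_line(line, start, end) == 0) {
            ret = 0;
            break;
        }
    }
    fclose(f);
    return ret;
}
```

Comparing the range reported before checkpoint and after restore shows whether the mapping moved; it says nothing about whether the kernel's own reference followed it.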
>>>
>>> It's been a while since I worked on this, and I feel like I never had a really
>>> solid understanding of all the parts, but hopefully this can help.
>>>
>>> I think the ideal solution would be for a remap system call to move the VDSO.
>>> This may have been implemented for x86, but I think it's a new feature and
>>> missing on most other architectures. There's a lot of duplication in the VDSO
>>> code between architectures. If there was less duplication, the x86 additions
>>> might easily apply to other architectures as well, but I've never gotten
>>> around to consolidating the VDSO code and I haven't noticed anyone else having
>>> gotten around to it either.
>>
>> I came to the same conclusion: when the VDSO area is remapped, some
>> architecture-specific code should be triggered in the kernel to update
>> the VDSO reference.
>> I'll take a closer look at mremap in the kernel.
>>
>>> The workaround is to put trampolines/branches at the location where the
>>> restored process expects the VDSO, redirecting to wherever the VDSO is
>>> actually located at restore time. See vdso_redirect_calls in
>>> arch/aarch64/vdso-pie.c.
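As a rough illustration of that redirection, the sketch below builds a 16-byte AArch64 absolute-jump trampoline (ldr x16 from a PC-relative literal, br x16, then the 64-bit target address) that could be written over an old VDSO entry point. build_aarch64_trampoline is a hypothetical helper, not the code in vdso_redirect_calls, and the sketch assumes a little-endian host.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical AArch64 absolute-jump trampoline, 16 bytes:
 *   ldr x16, #8    ; load the 64-bit literal stored 8 bytes ahead
 *   br  x16        ; jump to it
 *   .quad target   ; address of the entry in the VDSO actually mapped
 */
#define TRAMPOLINE_SIZE 16

static void build_aarch64_trampoline(uint8_t out[TRAMPOLINE_SIZE],
                                     uint64_t target)
{
    const uint32_t ldr_x16_lit8 = 0x58000050; /* LDR X16, [PC, #8] */
    const uint32_t br_x16 = 0xd61f0200;       /* BR X16 */

    /* memcpy keeps host byte order: assumes a little-endian host. */
    memcpy(out, &ldr_x16_lit8, sizeof(ldr_x16_lit8));
    memcpy(out + 4, &br_x16, sizeof(br_x16));
    memcpy(out + 8, &target, sizeof(target));
}
```

After such bytes are written into executable memory, the instruction cache has to be synchronized (for example with GCC's __builtin___clear_cache) before the code runs. The trampoline also has to live at the old VDSO address, which is exactly where it can collide with checkpointed memory.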
>>
>> I put the same code in the new ppc64 branch I created, but it only
>> deals with user space's references to the VDSO, not the kernel's.
>>
>> Unfortunately, creating a trampoline at the place where the kernel put
>> the VDSO at restart does not always work, since this area may conflict
>> with a checkpointed memory area. Updating the kernel's reference to the
>> VDSO when it has been moved looks like the only way to address that.
> 
> I see. My "production" runs are known/trusted code running under qemu system
> emulation with the norandmaps kernel parameter set for run-to-run
> determinism/reproducibility. I've only done light testing with randomization
> fully enabled, so I'm afraid my experience here is limited, but my
> recollection is that (and some quick double checking confirms) the vdso00 and
> vdso01 test cases pass for me with /proc/sys/kernel/randomize_va_space == 2.
> Triggering the case you describe requires an ET_DYN binary (CFLAGS="-pie
> -fPIE"), right? My binaries are currently ET_EXEC. Should we update the vdso
> test cases to use those flags?
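Whether a test binary is ET_EXEC or ET_DYN (as produced by -pie -fPIE) can be checked from the e_type field of its ELF header. The following is a small C sketch using the <elf.h> constants; elf_type and self_elf_type are hypothetical helper names, and reading /proc/self/exe assumes Linux.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return e_type (ET_EXEC, ET_DYN, ...) from the
 * start of an ELF file, or -1 if the buffer is not a valid ELF header.
 * e_type is the 16-bit field right after e_ident in both ELF32 and
 * ELF64, so only the first EI_NIDENT + 2 bytes are needed.
 */
static int elf_type(const unsigned char *buf, size_t len)
{
    if (len < EI_NIDENT + 2 || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return -1;
    if (buf[EI_DATA] == ELFDATA2LSB)
        return buf[EI_NIDENT] | (buf[EI_NIDENT + 1] << 8);
    if (buf[EI_DATA] == ELFDATA2MSB)
        return (buf[EI_NIDENT] << 8) | buf[EI_NIDENT + 1];
    return -1;
}

/* Hypothetical helper: e_type of the running executable. */
static int self_elf_type(void)
{
    unsigned char hdr[EI_NIDENT + 2];
    size_t n;
    FILE *f = fopen("/proc/self/exe", "rb");

    if (!f)
        return -1;
    n = fread(hdr, 1, sizeof(hdr), f);
    fclose(f);
    return elf_type(hdr, n);
}
```

In a vdso test case, this could be combined with getauxval(AT_SYSINFO_EHDR) from <sys/auxv.h>, which reports the VDSO base passed at exec time, to see whether the VDSO moves between runs when randomize_va_space is 2.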

I did my test with the same randomize_va_space value, and I'm wondering
if this parameter changes the way the VDSO is mapped on Power.

This being said, I'm confused by your mention of "ET_DYN binary".
My concern is about a reference to the process's VDSO base address that
the kernel uses when setting up the return path of a signal handler.
This is not tied to the way the process is built. It could be statically
linked, PIE, PIC or anything else, and the signal return path will be the
same: a call to the sigreturn system call made through the VDSO. Looking
at the ARM64 kernel code (setup_return), this seems to be the same
whatever the process's binary is.
Am I missing something here?

Laurent.