[CRIU] [RFD] (v)omitting vDSO PFN check in CRIU

Dmitry Safonov dsafonov at virtuozzo.com
Fri Jun 2 15:53:04 MSK 2017


On 06/01/2017 11:58 PM, Cyrill Gorcunov wrote:
> On 06/01/2017 10:48 PM, Dmitry Safonov wrote:
>> Hi guys, criu-ml, et al.
>>
>> I want to hear your opinions on my proposed solution to a bug.
>>
>> Here is the current situation:
>> 1. During checkpointing, criu searches for the vdso vma in the dumpee.
>>     This is done by looking at /proc/../maps during maps parsing.
>> 2. Pre-v3.16 kernels lose the "[vdso]" name for a vma if the vdso was
>>     mremap()'ed. Furthermore, such kernels may report the "[vdso]"
>>     name for a non-vdso vma if that vma was placed at the position
>>     from which the vdso vma was moved with mremap().
>>     That's because on pre-v3.16 kernels arch_vma_name() returns the
>>     hint "[vdso]" if vma.start_addr <= mm->context.vdso, where
>>     mm->context.vdso is never updated.
>> 3. To distinguish the real vdso vma from other vmas, criu uses its
>>     own pagemap to find its vdso's pfn and compares it with the dumpee's.
>>     If the pfn of a vma that /proc/../maps reported as vdso
>>     differs from criu's vdso pfn,
>>     that vma is considered mishinted and is dumped as
>>     a regular anonymous area.
>>
>> It all worked well until we added uname and time-monotonic
>> virtualization to vzkernel: that virtualization results in unique
>> physical pages for the vdso per CT. That means that two processes in
>> the same CT1 will have the same physical address for the vdso, but
>> process A from CT1 and process B from CT2 will have different
>> physical addresses for it.
>> So, today with @ptikhomirov on Jenkins with vz7 we found that every
>> process in a new UTS-ns gets a mishinted vdso address during dump
>> (as the uts-ns vdso's pfn differs from the init-uts-ns vdso's pfn).
>>
>> Here is what I propose as a solution:
>> o Add a kdat test, which checks whether the "[vdso]" hint stays after
>> mremap() of the vdso area.
>> o Skip the vdso pfn check during dumping if the hint stays.
>>
>> @xemul, I know that you don't welcome checks for non-mainstream
>> features, such as uname and gtod virtualization.
>> But I think this approach will suit *both* ms and vz platforms.
>> The reason it's good for the ms criu version is that we will skip
>> open(), read(), close() on pagemap for each dumpee (on all kernels
>> newer than v3.16). That's a whole bunch of syscalls!
>>
>> So, does reworking the vDSO PFN check into a kdat test for the
>> "[vdso]" hint after mremap() sound sane?
>> Opinions?
> 
> Sounds ok to me as a first iteration. If @xemul doesn't mind I would try
> this way and see how it goes. After all it'll sit in criu-dev for some
> time so we could check if it is worth the effort.

After some thinking, there is another tricky point:
Currently, checking the vdso's pfn ensures that the application hasn't
changed its vdso. A changed vdso was assumed to be an anonymous mapping;
after this change it will be treated as the real vdso mapping.
This change has pros:
- context.vdso is set correctly for the application, no need for a
   second vdso after restore;
And cons:
- we need to re-validate the logic that fills the vdso symtable, to be
   sure that a malicious change in the vdso will not make criu crash
   while parsing elf headers or something.
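
To make the "cons" item concrete, here is a minimal sketch of the kind of
header sanity check I have in mind. It is Python for brevity (criu itself is
C), and the name vdso_header_is_sane() and the exact set of checks are
illustrative assumptions, not what criu does today. It assumes a 64-bit
little-endian ELF image and only verifies that the program header table lies
inside the dumped blob before anything tries to walk it.

```python
import struct

# Elf64_Ehdr layout: e_ident[16], e_type, e_machine, e_version, e_entry,
# e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum,
# e_shentsize, e_shnum, e_shstrndx (64 bytes in total).
ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
ELF64_PHDR_SIZE = 56  # sizeof(Elf64_Phdr)


def vdso_header_is_sane(blob):
    """Return True if blob starts with a plausible ELF64 header.

    Hypothetical helper: rejects images whose program header table
    would lie outside the blob, so a later parser never reads past it.
    """
    if len(blob) < ELF64_EHDR.size:
        return False
    fields = ELF64_EHDR.unpack_from(blob)
    ident, e_phoff, e_phentsize, e_phnum = (
        fields[0], fields[5], fields[9], fields[10])
    if ident[:4] != b"\x7fELF":
        return False
    # ident[4] == 2 is ELFCLASS64, ident[5] == 1 is ELFDATA2LSB.
    if ident[4] != 2 or ident[5] != 1:
        return False
    if e_phnum == 0 or e_phentsize < ELF64_PHDR_SIZE:
        return False
    ph_end = e_phoff + e_phentsize * e_phnum
    return e_phoff >= ELF64_EHDR.size and ph_end <= len(blob)
```

The same bounds checks would have to be repeated for the dynamic section
and the symbol/string tables before trusting any offset read from them.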

Also, at the moment we check the pfn of only the first page of the vdso,
while the vdso has two pages on all supported kernels from v3.10 to present.
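
To illustrate what comparing every page (rather than only the first) could
look like: each /proc/<pid>/pagemap entry is a 64-bit word, with bit 63 set
when the page is present in RAM and bits 0-54 holding the pfn. The sketch
below is Python and all helper names are hypothetical; decoding is kept
separate from file reading so it can be tried on synthetic data. It assumes
a 4096-byte page size, and note that on recent kernels reading real pfns
requires CAP_SYS_ADMIN (otherwise the pfn field reads as zero).

```python
import struct

PM_PRESENT = 1 << 63          # bit 63: page present in RAM
PM_PFN_MASK = (1 << 55) - 1   # bits 0-54: page frame number
PM_ENTRY_SIZE = 8


def decode_pagemap_entry(raw):
    """Return the pfn from one 8-byte pagemap entry, or None if not present."""
    (entry,) = struct.unpack("=Q", raw)
    if not entry & PM_PRESENT:
        return None
    return entry & PM_PFN_MASK


def vma_pfns(pagemap_bytes):
    """Decode a run of pagemap entries covering one vma, page by page."""
    return [decode_pagemap_entry(pagemap_bytes[i:i + PM_ENTRY_SIZE])
            for i in range(0, len(pagemap_bytes), PM_ENTRY_SIZE)]


def same_vdso(own_pfns, dumpee_pfns):
    """True only if every page is present and backed by the same pfn."""
    return (len(own_pfns) == len(dumpee_pfns)
            and None not in own_pfns
            and own_pfns == dumpee_pfns)


def read_pagemap(pid, start, end, page_size=4096):
    """Read the raw pagemap entries for the range [start, end) of a task."""
    with open(f"/proc/{pid}/pagemap", "rb", buffering=0) as f:
        f.seek(start // page_size * PM_ENTRY_SIZE)
        return f.read((end - start) // page_size * PM_ENTRY_SIZE)
```

With this, a COW'ed second page shows up as a pfn mismatch in
same_vdso(), which checking only the first entry would miss.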

From my POV, the one valid case of a self-modified vdso is when someone,
while debugging in a CT, attached gdb to a daemon and placed a breakpoint
in the vdso, then removed the breakpoint and exited gdb after debugging.
In that case the debuggee will have COW'ed vdso page(s).
Other than that, I think changing one's own vdso pages is not valid.

So, in other words: I would say that CRIU won't support (and doesn't
support now) applications which change their vdso. That means that
after C/R the mapped vdso will be the original system's vdso (except in
the jump-trampolines case, where the vdso will be preserved).

> 
>> Other suggestions?
>>
> 
> 


-- 
              Dmitry

