[CRIU] Combining pre-copy and post-copy
Pavel Emelyanov
xemul at virtuozzo.com
Thu May 18 02:01:25 PDT 2017
On 05/18/2017 08:33 AM, Mike Rapoport wrote:
> Hi all,
>
> On Mon, Feb 13, 2017 at 11:02:59AM +0100, Adrian Reber wrote:
>> Hello Mike,
>>
>> I have to come back to an old topic from September:
>>
>> https://lists.openvz.org/pipermail/criu/2016-September/031672.html
>>
>> I am currently trying to restore a process with pre-copy and post-copy
>> and it fails.
>
> It's been a while since the topic was brought up :)
> Anyway, after some off-list exchanges and attempts to debug the issue with
> using pre-copy and post-copy together, I think I have a theory that
> explains what went wrong.
>
> When we do lazy restore after a round of pre-dump, the memory restore can
> be roughly outlined as:
>
> criu restore:
> * Map VMAs at arbitrary address
> * Fill in the pages that reside in the pre-dump
> * Remap VMAs to their original address
> * Register VMAs with uffd
>
> criu-lazy-pages:
> * Populate pages on demand
> * Populate remaining pages
>
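The restore-side steps listed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not CRIU's actual code: the function name restore_vma_sketch is hypothetical, a freshly reserved PROT_NONE range stands in for the "original address", and a 0xAB fill of every other page stands in for the pre-dump contents.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical sketch of the restore-side sequence.
 * len is assumed to be a multiple of the page size.
 * Returns the uffd descriptor (>= 0) on success, -1 if a mapping step
 * failed, -2 if userfaultfd is unavailable or registration failed.
 */
int restore_vma_sketch(size_t len, void **vma_out)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    /* 1. Map the VMA at an arbitrary address. */
    void *tmp = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (tmp == MAP_FAILED)
        return -1;

    /* 2. Fill the pages that "reside in the pre-dump": here every other
     *    page, so that non-present holes remain between them. */
    for (size_t off = 0; off < len; off += 2 * page)
        memset((char *)tmp + off, 0xAB, page);

    /* 3. Remap to the "original" address (a reserved range stands in). */
    void *target = mmap(NULL, len, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (target == MAP_FAILED) {
        munmap(tmp, len);
        return -1;
    }
    void *vma = mremap(tmp, len, len, MREMAP_MAYMOVE | MREMAP_FIXED, target);
    if (vma == MAP_FAILED) {
        munmap(tmp, len);
        munmap(target, len);
        return -1;
    }
    *vma_out = vma;

    /* 4. Register the VMA with uffd so the holes fault to userspace. */
    int uffd = (int)syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (uffd < 0)
        return -2;

    struct uffdio_api api = { .api = UFFD_API };
    if (ioctl(uffd, UFFDIO_API, &api) != 0) {
        close(uffd);
        return -2;
    }

    struct uffdio_register reg = {
        .range = { .start = (unsigned long)vma, .len = len },
        .mode = UFFDIO_REGISTER_MODE_MISSING,
    };
    if (ioctl(uffd, UFFDIO_REGISTER, &reg) != 0) {
        close(uffd);
        return -2;
    }
    return uffd;
}
```

Between steps 2 and 4 the mapping already holds pages but is not yet under uffd, which is the window the rest of this message is about. Once registered, touching one of the holes blocks until something serves the fault on the returned descriptor.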
> Note that when we do lazy restore *without* pre-dump, the memory of
> uffd-monitored VMAs is not populated.
>
> Now, to the problem itself. When we fill the memory contents from the
> pre-dump, mappings are populated with pages which become subject to
> khugepaged collapses. During khugepaged collapse, there is a new huge page
> allocated, the content of the original pages is copied there and the new
> page is mapped into the process address space instead of small pages that
> were there originally. This effectively kills the non-present gaps that
> were in the mapping between the pages with content:
>
>  address | small pages       | huge page
> ---------+-------------------+-----------------
>   0x1000 | page with data    | page with data
>   0x2000 | pages not present |
>      ... |                   |
>  0x1b000 | pages with data   |
>  0x1f000 | pages not present |
>  0xff000 |                   |
> 0x100000 | end of 2M region  | end of the page
>
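One way to observe the "present" versus "not present" distinction from userspace is mincore(). The sketch below is an assumption-laden illustration, not something from the thread: the helper names count_present_pages and sparse_fill_demo are hypothetical, and the demo sets MADV_NOHUGEPAGE so that the holes are not removed by THP while it runs.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count the pages in [addr, addr + len) that mincore() reports as resident.
 * Returns -1 on error. */
long count_present_pages(void *addr, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t n = (len + page - 1) / page;
    unsigned char *vec = malloc(n);
    long present = 0;

    if (!vec)
        return -1;
    if (mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }
    for (size_t i = 0; i < n; i++)
        present += vec[i] & 1;   /* only the low bit is meaningful */
    free(vec);
    return present;
}

/* Map npages anonymous pages, write to every stride-th one, and return
 * how many pages are reported present afterwards. Returns -1 on error. */
long sparse_fill_demo(size_t npages, size_t stride)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;

    if (stride == 0)
        return -1;

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* Best effort: fails harmlessly on kernels built without THP. */
    (void)madvise(p, len, MADV_NOHUGEPAGE);

    for (size_t i = 0; i < npages; i += stride)
        memset(p + i * page, 0x5A, page);

    long present = count_present_pages(p, len);
    munmap(p, len);
    return present;
}
```

Running count_present_pages() over a restored VMA before and after khugepaged has had a chance to run would be one way to check whether the holes survived; after a collapse the whole huge-page range would be reported as present.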
> For the pure lazy restore case, the mappings are empty until they are
> registered with uffd and khugepaged does not attempt to collapse the pages
> in uffd-enabled VMAs.
>
> I could think of three possible ways to resolve this issue:
>
> * Disable THP before memory restore and re-enable it once all the VMAs are
> registered with uffd. The drawback is that it would cause unnecessary huge
> page splits and collapses and a screwed-up TLB.
> * Use madvise(MADV_NOHUGEPAGE) before memory restore and
> madvise(MADV_HUGEPAGE) after VMAs are registered with uffd. It'll work most
> of the time, but for applications that use these madvise() settings we may
> get it wrong in the end.
> * Try to add madvise(MADV_CLR_NOHUGEPAGE) to the kernel. Then we can use
> madvise(MADV_NOHUGEPAGE) right after mmap() and reset that flag after the
> VMA is registered with uffd.
>
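The second option above (bracketing the fill with madvise() calls) might look like the sketch below. It is a minimal illustration, assuming a page-aligned VMA; the function name thp_guarded_fill and its parameters are hypothetical, and the uffd registration is only marked by a comment.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: fill a VMA with pre-dump data while khugepaged is
 * kept away from it. vma must be page-aligned and data_len <= len.
 * Returns 0 on success, -1 on failure.
 */
int thp_guarded_fill(void *vma, size_t len,
                     const unsigned char *data, size_t data_len)
{
    if (data_len > len)
        return -1;

    /* EINVAL is tolerated: kernels built without THP reject this advice,
     * and in that case there is nothing to guard against anyway. */
    if (madvise(vma, len, MADV_NOHUGEPAGE) != 0 && errno != EINVAL)
        return -1;

    /* Fill in the pages that reside in the pre-dump. */
    memcpy(vma, data, data_len);

    /* ... the VMA would be registered with uffd at this point ... */

    /* Caveat from the list above: this sets the "huge page" advice even
     * if the application never asked for it, and it overrides an
     * application's own MADV_NOHUGEPAGE. */
    if (madvise(vma, len, MADV_HUGEPAGE) != 0 && errno != EINVAL)
        return -1;

    return 0;
}
```

The final madvise() call is exactly where this approach can "get it wrong": the original advice state of the VMA is not known here, which is what the proposed MADV_CLR_NOHUGEPAGE is meant to avoid.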
> Suggestions?
I'd make khugepaged ignore VMAs that are under UFFD. Or, at least, prevent
it from merging present pages with non-present (holes) for such VMAs.
-- Pavel
> --
> Sincerely yours,
> Mike.
>
> .
>