[CRIU] Combining pre-copy and post-copy

Pavel Emelyanov xemul at virtuozzo.com
Thu May 18 02:01:25 PDT 2017


On 05/18/2017 08:33 AM, Mike Rapoport wrote:
> Hi all,
> 
> On Mon, Feb 13, 2017 at 11:02:59AM +0100, Adrian Reber wrote:
>> Hello Mike,
>>
>> I have to come back to an old topic from September:
>>
>> https://lists.openvz.org/pipermail/criu/2016-September/031672.html
>>
>> I am currently trying to restore a process with pre-copy and post-copy
>> and it fails.
> 
> It's been a while since the topic was brought up :)
> Anyway, after some off-list exchanges and attempts to debug the issue with
> using pre-copy and post-copy together, I think I have a theory that
> explains what went wrong.
> 
> When we do lazy restore after a round of pre-dump, the memory restore can
> be roughly outlined as:
> 
> criu restore:
> * Map VMAs at arbitrary address
> * Fill in the pages that reside in the pre-dump
> * Remap VMAs to their original address
> * Register VMAs with uffd
> 
> criu-lazy-pages:
> * Populate pages on demand
> * Populate remaining pages
> 
> Note that when we do lazy restore *without* pre-dump, the memory of
> uffd-monitored VMAs is not populated.
>  
> Now, to the problem itself. When we fill the memory contents from the
> pre-dump, mappings are populated with pages which become subject to
> khugepaged collapses. During a khugepaged collapse, a new huge page is
> allocated, the content of the original pages is copied into it, and the new
> page is mapped into the process address space instead of the small pages
> that were there originally. This effectively eliminates the non-present
> gaps that were in the mapping between the pages with content:
> 
> address  | small pages       | huge page
> ---------+-------------------+-----------------
> 0x1000   | page with data    | page with data
> 0x2000   | pages not present |
> ...      |                   |
> 0x1b000  | pages with data   |
> 0x1f000  | pages not present |
> 0x1ff000 |                   |
> 0x200000 | end of 2M region  | end of the page
> 
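
The effect shown in the table can be illustrated with a toy model. This is
only a sketch, not CRIU or kernel code: the page indexes, the helper names
`missing_pages` and `collapse`, and the single 2M range are all assumptions
made for illustration. It counts how many small pages would still fault (and
so reach uffd) before and after a collapse.

```python
PAGE = 0x1000                      # 4 KiB small page
HUGE = 0x200000                    # 2 MiB huge page
PAGES_PER_HUGE = HUGE // PAGE      # 512 small pages per huge page


def missing_pages(present):
    """Return the page indexes that would still fault when touched."""
    return [i for i in range(PAGES_PER_HUGE) if i not in present]


def collapse(present):
    """Toy khugepaged collapse of one 2 MiB range.

    Data from the present pages is copied into the huge page; every former
    hole becomes a present, zero-filled part of that huge page.
    """
    return set(range(PAGES_PER_HUGE))


# Pages filled from the pre-dump (indexes chosen arbitrarily for illustration).
from_predump = {0x1, 0x1b, 0x1c, 0x1d, 0x1e}

before = missing_pages(from_predump)
after = missing_pages(collapse(from_predump))

print(len(before), "pages would fault before the collapse")
print(len(after), "pages would fault after the collapse")
```

In this model no page faults after the collapse, so the lazy-pages daemon
would never be asked for the contents that belong in the former holes.
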
> For the pure lazy restore case, the mappings are empty until they are
> registered with uffd and khugepaged does not attempt to collapse the pages
> in uffd-enabled VMAs.
> 
> I could think of three possible ways to resolve this issue:
> 
> * Disable THP before memory restore and re-enable it once all the VMAs are
> registered with uffd. The drawback is that it would cause unnecessary huge
> page splits and collapses and would disturb the TLB.
> * Use madvise(MADV_NOHUGEPAGE) before memory restore and
> madvise(MADV_HUGEPAGE) after VMAs are registered with uffd. It'll work most
> of the time, but for applications that use these madvise() settings
> themselves we may end up with the wrong setting.
> * Try to add madvise(MADV_CLR_NOHUGEPAGE) to the kernel. Then we can use
> madvise(MADV_NOHUGEPAGE) right after mmap() and reset that flag after the
> VMA is registered with uffd.
> 
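
The second option can be sketched from user space as follows. This is a
minimal illustration under stated assumptions, not what CRIU does: the
helpers `advise` and `restore_region`, the 4 MiB size, and the sample data
are hypothetical, and the uffd registration step is only marked by a comment.
It relies on `mmap.madvise()` and the `MADV_*` constants that Python 3.8+
exposes on Linux, and it tolerates kernels where the hints are unavailable.

```python
import mmap

LENGTH = 4 * 1024 * 1024   # 4 MiB anonymous mapping, arbitrary size


def advise(mapping, name):
    """Apply the madvise() hint called `name`; return False if unavailable."""
    option = getattr(mmap, name, None)
    if option is None or not hasattr(mapping, "madvise"):
        return False        # Python < 3.8 or a platform without this hint
    try:
        mapping.madvise(option)
        return True
    except OSError:
        return False        # e.g. a kernel built without THP support


def restore_region(pages):
    """Fill `pages` ({offset: bytes}) with THP disabled, then re-enable it."""
    # fileno -1 makes this an anonymous mapping; MAP_PRIVATE keeps it private.
    mapping = mmap.mmap(-1, LENGTH, flags=mmap.MAP_PRIVATE)
    advise(mapping, "MADV_NOHUGEPAGE")      # ask khugepaged to skip this VMA
    for offset, data in pages.items():
        mapping[offset:offset + len(data)] = data
    # In the real restore, the VMA would be registered with uffd here.
    advise(mapping, "MADV_HUGEPAGE")        # may not match the app's own setting
    return mapping


region = restore_region({0x1000: b"pre-dump data"})
print(region[0x1000:0x1000 + 13])
```

The last `advise()` call shows the drawback mentioned above: the sketch
always ends with MADV_HUGEPAGE, regardless of what the application had set
for that mapping before the dump.
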
> Suggestions?

I'd make khugepaged ignore VMAs that are under UFFD. Or, at least, prevent
it from merging present pages with non-present (holes) for such VMAs.

-- Pavel

> --
> Sincerely yours,
> Mike.
> 
> .
> 


