[CRIU] Combining pre-copy and post-copy

Mike Rapoport rppt at linux.vnet.ibm.com
Wed May 17 22:33:47 PDT 2017


Hi all,

On Mon, Feb 13, 2017 at 11:02:59AM +0100, Adrian Reber wrote:
> Hello Mike,
> 
> I have to come back to an old topic from September:
> 
> https://lists.openvz.org/pipermail/criu/2016-September/031672.html
> 
> I am currently trying to restore a process with pre-copy and post-copy
> and it fails.

It's been a while since the topic was brought up :)
Anyway, after some off-list exchanges and attempts to debug the failure seen
when pre-copy and post-copy are used together, I think I have a theory that
explains what went wrong.

When we do lazy restore after a round of pre-dump, the memory restore can
be roughly outlined as:

criu restore:
* Map VMAs at arbitrary address
* Fill in the pages that reside in the pre-dump
* Remap VMAs to their original address
* Register VMAs with uffd

criu-lazy-pages:
* Populate pages on demand
* Populate remaining pages
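
For illustration, the restore-side steps above can be sketched in C roughly
as follows. This is a minimal sketch and not CRIU's actual code:
restore_vma() and register_with_uffd() are hypothetical helpers, only a
single page of pre-dump data is filled, and the caller is assumed to have
reserved the target address beforehand.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: map a VMA at an arbitrary address, fill its first
 * page from the pre-dump and move the mapping to its original address.
 * Anything already mapped at 'target' is replaced (MREMAP_FIXED).
 */
void *restore_vma(void *target, size_t len, const void *page, size_t page_size)
{
	void *tmp, *vma;

	assert(page_size <= len);

	/* Map the VMA at an arbitrary address */
	tmp = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (tmp == MAP_FAILED)
		return MAP_FAILED;

	/* Fill in a page that resides in the pre-dump */
	memcpy(tmp, page, page_size);

	/* Remap the VMA to its original address */
	vma = mremap(tmp, len, len, MREMAP_MAYMOVE | MREMAP_FIXED, target);
	if (vma == MAP_FAILED)
		munmap(tmp, len);
	return vma;
}

/*
 * Hypothetical helper: register the VMA for missing-page faults.
 * Returns the userfaultfd descriptor, or -1 if userfaultfd is unavailable
 * or the registration fails.
 */
int register_with_uffd(void *vma, size_t len)
{
	struct uffdio_api api = { .api = UFFD_API };
	struct uffdio_register reg = {
		.range = { .start = (unsigned long)vma, .len = len },
		.mode = UFFDIO_REGISTER_MODE_MISSING,
	};
	int fd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);

	if (fd < 0)
		return -1;
	if (ioctl(fd, UFFDIO_API, &api) || ioctl(fd, UFFDIO_REGISTER, &reg)) {
		close(fd);
		return -1;
	}
	return fd;
}
```

The point to notice is the window between the memcpy() and the uffd
registration: during that time the VMA is an ordinary, partially populated
anonymous mapping.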

Note that when we do lazy restore *without* pre-dump, the memory of
uffd-monitored VMAs is not populated.
 
Now, to the problem itself. When we fill the memory contents from the
pre-dump, mappings are populated with pages which become subject to
khugepaged collapses. During a collapse, khugepaged allocates a new huge
page, copies the content of the original pages into it and maps it into the
process address space in place of the small pages that were there
originally. This effectively kills the non-present gaps that were in the
mapping between the pages with content, so accesses to those gaps no longer
fault and never reach uffd:

address  | small pages       | huge page
---------+-------------------+-----------------
0x1000   | page with data    | page with data
0x2000   | pages not present |
...      |                   |
0x1b000  | pages with data   |
0x1f000  | pages not present |
0xff000  |                   |
0x100000 | end of 2M region  | end of the page
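
A quick way to check whether the non-present gaps are still there is to ask
the kernel which pages of a range are resident, for example with mincore().
The helper below is only a debugging sketch (count_resident_pages() is a
made-up name, not something CRIU provides). Assuming 4K pages, a count of 512
for a sparsely filled 2M region would suggest that the whole range got
backed, for instance by a huge page.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical debugging helper: count the resident pages in the range of
 * 'npages' pages starting at the page-aligned address 'addr'.
 * Handles at most 512 pages per call; returns -1 if mincore() fails.
 */
long count_resident_pages(void *addr, size_t npages)
{
	unsigned char vec[512];
	long page_size = sysconf(_SC_PAGESIZE);
	long resident = 0;
	size_t i;

	assert(npages <= sizeof(vec));

	if (page_size <= 0)
		return -1;
	if (mincore(addr, npages * (size_t)page_size, vec) != 0)
		return -1;

	/* The least significant bit of each byte is the residency flag */
	for (i = 0; i < npages; i++)
		resident += vec[i] & 1;

	return resident;
}
```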

For the pure lazy restore case, the mappings are empty until they are
registered with uffd and khugepaged does not attempt to collapse the pages
in uffd-enabled VMAs.

I could think of three possible ways to resolve this issue:

* Disable THP before memory restore and re-enable it once all the VMAs are
registered with uffd. The drawback is that it would cause unnecessary huge
page splits and collapses and extra TLB churn.
* Use madvise(MADV_NOHUGEPAGE) before memory restore and
madvise(MADV_HUGEPAGE) after VMAs are registered with uffd. It'll work most
of the time, but for applications that use these madvise() settings
themselves we may end up with the wrong setting.
* Try to add madvise(MADV_CLR_NOHUGEPAGE) to the kernel. Then we can use
madvise(MADV_NOHUGEPAGE) right after mmap() and reset that flag after the
VMA is registered with uffd.
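
To make the second option more concrete, here is a rough sketch of the
intended ordering around a single VMA. The helper names are made up for
illustration and the uffd registration itself is left out. Note that the
unconditional MADV_HUGEPAGE at the end is exactly where an application's own
MADV_NOHUGEPAGE setting would get lost.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: mark the VMA as not eligible for huge pages and
 * only then copy the pre-dump data into it, so that khugepaged leaves the
 * partially populated range alone. Returns 0 on success and -1 if the
 * madvise() call fails (e.g. on a kernel built without THP support).
 */
int fill_with_thp_disabled(void *vma, size_t len,
			   const void *data, size_t data_len)
{
	assert(data_len <= len);

	if (madvise(vma, len, MADV_NOHUGEPAGE))
		return -1;

	memcpy(vma, data, data_len);
	return 0;
}

/*
 * Hypothetical helper: to be called once the VMA is registered with uffd.
 * This sets MADV_HUGEPAGE unconditionally; it does not restore whatever
 * setting the application had before the dump.
 */
int reenable_thp(void *vma, size_t len)
{
	return madvise(vma, len, MADV_HUGEPAGE);
}
```

The expected call order would be fill_with_thp_disabled(), then the uffd
registration, then reenable_thp().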

Suggestions?

--
Sincerely yours,
Mike.


