[CRIU] [PATCH v3 2/2] lazy-pages: add support to combine pre-copy and post-copy

Adrian Reber adrian at lisas.de
Thu Sep 22 00:31:41 PDT 2016


On Wed, Sep 21, 2016 at 09:44:49AM +0300, Mike Rapoport wrote:
> Hi Adrian,
> 
> On Tue, Sep 20, 2016 at 06:54:11PM +0200, Adrian Reber wrote:
> > From: Adrian Reber <areber at redhat.com>
>  
> [snip]
>  
> > v2:
> >  - changed parent detection to use pagemap_in_parent()
> > 
> > v3:
> >  - unfortunately this reverts
> >    c11cf95afbe023a2816a3afaecb65cc4fee670d7
> >    "criu: mem: skip lazy pages during restore based on pagemap info"
> >    To be able to split the VMAs into the right chunks for the restorer
> >    it is necessary to make the decision lazy or not on the VmaEntry
> >    level.
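To show what "split the VMAs into the right chunks" means, here is a minimal Python sketch. It assumes pagemap entries can be reduced to (vaddr, nr_pages, flag) tuples. The function split_vma and the string values given to PE_PARENT and PE_LAZY are hypothetical placeholders, not CRIU's actual code or data structures. The sketch cuts one VMA into runs that are either all lazy or all non-lazy.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages

# Hypothetical stand-ins for the pagemap flags mentioned in this thread.
PE_PARENT = "PE_PARENT"
PE_LAZY = "PE_LAZY"


def split_vma(vma_start, vma_end, pagemap):
    """Split [vma_start, vma_end) into (start, end, lazy) chunks.

    pagemap is a vaddr-sorted iterable of (vaddr, nr_pages, flag) tuples.
    Neighbouring entries with the same lazy decision are merged.
    """
    chunks = []
    for vaddr, nr_pages, flag in pagemap:
        start = max(vaddr, vma_start)
        end = min(vaddr + nr_pages * PAGE_SIZE, vma_end)
        if start >= end:
            continue  # entry lies outside this VMA
        lazy = flag == PE_LAZY
        if chunks and chunks[-1][1] == start and chunks[-1][2] == lazy:
            # Extend the previous chunk instead of starting a new one.
            chunks[-1] = (chunks[-1][0], end, lazy)
        else:
            chunks.append((start, end, lazy))
    return chunks
```

For the 16-page VMA in Mike's example (10 parent pages, then 6 lazy pages), this yields two chunks, (0x10000, 0x1a000, False) and (0x1a000, 0x20000, True); each chunk would then be handled separately by the restorer.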
> 
> I've thought a little bit more about it and I'm not sure it is necessary to
> split VMAs at all. The restorer can register the entire VMA with
> userfaultfd, even if the VMA contains pages that are already restored. We
> just need to make sure uffd.c can properly handle the -EEXIST case and it seems
> we are good.
> 
> Consider the following scenario:
> There is a VMA that spans from 0x10000 to 0x20000 (16 pages). Let's say
> that the range from 0x10000 to 0x1a000 is dumped during pre-dump and there
> were no changes in that memory, so during dump the range 0x10000 - 0x1a000
> will be marked with PE_PARENT, and the range 0x1a000 - 0x20000 will be
> marked PE_LAZY.
> During restore, the range marked as PE_PARENT will be filled with the
> content from the disk image and the range marked PE_LAZY will remain
> unpopulated.
> The restorer will register the entire VMA (0x10000 - 0x20000) with userfaultfd
> and the lazy-pages daemon will consider the entire range as lazy.
> However, the pages at 0x10000 - 0x1a000 are already present, therefore
> access to these pages won't cause a page fault.
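The proposal can be modelled in a few lines of Python. This is a toy model, not CRIU's uffd.c: ToyUffd, restore() and lazy_daemon() are hypothetical names, and ToyUffd.copy() only imitates the one kernel property that matters here, namely that the UFFDIO_COPY ioctl fails with EEXIST when the destination page is already populated. The whole VMA is treated as registered, the PE_PARENT pages are filled from the image, and the daemon walks the entire range, treating -EEXIST as "already restored" rather than as an error.

```python
import errno

PAGE_SIZE = 0x1000  # assumption: 4 KiB pages, as in the example addresses


def page_range(start, end):
    """Page-aligned addresses in [start, end)."""
    return range(start, end, PAGE_SIZE)


class ToyUffd:
    """Hypothetical stand-in for a VMA registered with userfaultfd.

    It only records which pages are populated; no real userfaultfd(2)
    calls are made.
    """

    def __init__(self, start, end):
        self.start, self.end = start, end
        self.populated = set()

    def copy(self, addr):
        # Imitates UFFDIO_COPY: fails with EEXIST if the page is already there.
        if addr in self.populated:
            return -errno.EEXIST
        self.populated.add(addr)
        return 0


def restore(vma_start, vma_end, parent_end):
    """Register the entire VMA and fill its PE_PARENT part from the image."""
    uffd = ToyUffd(vma_start, vma_end)
    uffd.populated.update(page_range(vma_start, parent_end))
    return uffd


def lazy_daemon(uffd):
    """Treat the whole registered range as lazy; return (copied, skipped)."""
    copied = skipped = 0
    for addr in page_range(uffd.start, uffd.end):
        ret = uffd.copy(addr)
        if ret == -errno.EEXIST:
            skipped += 1  # already restored from the parent checkpoint
        elif ret == 0:
            copied += 1
        else:
            raise OSError(-ret, "copy failed")
    return copied, skipped
```

With the numbers from the scenario, restore(0x10000, 0x20000, 0x1a000) followed by lazy_daemon() copies the 6 lazy pages and skips the 10 parent pages, so no VMA splitting is needed as long as EEXIST is handled.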

Sorry for all the emails. Why will accessing pages in the range 0x10000
- 0x1a000 not cause a page fault? I see that it works, but I am not sure
why it no longer causes a page fault. Is it because we copied
data to the address before we remap it? I guess I forgot how userfaultfd
works. We prepare the pages for userfaultfd, then we remap the pages to
the final destination. If we have never written data to those pages,
either before or after the remapping, a page fault occurs. If the range
contains data from a parent checkpoint, we have copied data to it before
remapping it, so no page fault occurs.
If we wanted userfaultfd to work on pages with previously copied data, we
would need to run madvise() on those pages. Ah, I guess I understand it
again.
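The reasoning above can be checked on a Linux system with a short Python sketch. It assumes that mincore(2) residency is a fair proxy for "an access would not fault" on private anonymous memory, and it assumes MADV_DONTNEED is the madvise() advice meant above (the mail does not say which). The names simulate, resident, NPAGES and PARENT_PAGES are hypothetical. The sketch writes to the first 10 of 16 pages (standing in for data restored from the parent), checks which pages are mapped, then drops those pages again.

```python
import ctypes
import mmap

PAGE = mmap.PAGESIZE
NPAGES = 16        # size of the VMA in the example
PARENT_PAGES = 10  # pages "restored from the parent checkpoint"

# mincore(2) from the C library (Linux-only sketch).
_libc = ctypes.CDLL(None, use_errno=True)
_libc.mincore.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                          ctypes.POINTER(ctypes.c_ubyte)]
_libc.mincore.restype = ctypes.c_int


def resident(buf, npages):
    """Return one 0/1 flag per page: 1 if the page is currently mapped in."""
    addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    vec = (ctypes.c_ubyte * npages)()
    if _libc.mincore(addr, npages * PAGE, vec) != 0:
        raise OSError(ctypes.get_errno(), "mincore failed")
    return [v & 1 for v in vec]


def simulate():
    """Return (resident before, resident after MADV_DONTNEED, first byte after)."""
    # fd -1 gives anonymous memory; MAP_PRIVATE avoids shmem page-cache effects.
    with mmap.mmap(-1, NPAGES * PAGE, flags=mmap.MAP_PRIVATE) as m:
        try:
            # Keep transparent huge pages out of the picture.
            m.madvise(mmap.MADV_NOHUGEPAGE)
        except (AttributeError, OSError):
            pass  # constant missing or kernel built without THP
        for i in range(PARENT_PAGES):
            m[i * PAGE] = 0xAA  # writing populates the page
        before = resident(m, NPAGES)
        # Drop the populated pages; the next access would fault again.
        m.madvise(mmap.MADV_DONTNEED, 0, PARENT_PAGES * PAGE)
        after = resident(m, NPAGES)
        first_byte = m[0]  # private anonymous memory comes back zero-filled
    return before, after, first_byte
```

On a typical system, simulate() reports the 10 written pages as resident and the other 6 as not; after MADV_DONTNEED none of the 16 pages are resident and the dropped memory reads back as zeros, which is why such pages would be reported to userfaultfd as missing again.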

		Adrian

