[CRIU] criu and userfaultfd

Pavel Emelyanov xemul at parallels.com
Tue Sep 8 08:00:17 PDT 2015


On 09/08/2015 04:50 PM, Adrian Reber wrote:
> On Wed, Sep 02, 2015 at 03:23:47PM +0300, Pavel Emelyanov wrote:
>> On 09/01/2015 06:22 PM, Adrian Reber wrote:
>>> As userfaultfd has been mentioned multiple times at Linux Plumbers
>>> Conference I wanted to ask if anybody already has some code in that
>>> direction?
>>
>> Sanidhya (in Cc) should have played with it for a while. Also I had
>> some kernel work to make userfaultfd support CRIU case, it can be
>> found here [1]. I planned to get back to them once Andrea's work gets
>> merged into the upstream.
> 
> Ah, interesting. I have also looked into userfaultfd and now have a
> simple test case in which two processes exchange FDs. One process
> accesses memory that is registered with userfaultfd and hangs until
> the other process makes that memory area available through
> userfaultfd.

Do you use the "non-cooperative" mode I tried to introduce with my patch set?
Or just the stock uffd code from Andrea (which has recently been merged upstream)?

> To use userfaultfd with CRIU, I would expect CRIU to restore everything
> except the memory pages and to mark all pages belonging to the process
> as handled by userfaultfd. Each access to a page that is not yet present
> would then be served as a userfaultfd request.

Agreed. I started on this some time ago and put my notes on the
http://criu.org/Userfaultfd page.
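
As a rough illustration of that scheme, here is a minimal sketch of one
process faulting on a missing page while another process fills it on demand.
It uses only the stock uffd interface (no non-cooperative mode) and is not
CRIU code: the names uffd_open, serve_one_fault and lazy_fill_demo, the page
contents and the 5 second timeout are all made up for the example. The forked
child stands in for whatever would serve pages from the image.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: open a uffd and do the UFFDIO_API handshake. */
static int uffd_open(void)
{
	struct uffdio_api api = { .api = UFFD_API };
	int fd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);

#ifdef UFFD_USER_MODE_ONLY
	/* Newer kernels may only allow unprivileged handling of user-mode faults. */
	if (fd < 0 && errno == EPERM)
		fd = syscall(__NR_userfaultfd,
			     O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
#endif
	if (fd >= 0 && ioctl(fd, UFFDIO_API, &api) < 0) {
		close(fd);
		fd = -1;
	}
	return fd;
}

/* Child side: wait for one missing-page fault and fill it with UFFDIO_COPY. */
static int serve_one_fault(int uffd, long page)
{
	struct pollfd pfd = { .fd = uffd, .events = POLLIN };
	struct uffdio_copy copy;
	struct uffd_msg msg;
	char *src = mmap(NULL, page, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (src == MAP_FAILED)
		return -1;
	strcpy(src, "filled on demand");	/* stand-in for image data */

	if (poll(&pfd, 1, 5000) != 1)		/* arbitrary timeout */
		return -1;
	if (read(uffd, &msg, sizeof(msg)) != sizeof(msg) ||
	    msg.event != UFFD_EVENT_PAGEFAULT)
		return -1;

	memset(&copy, 0, sizeof(copy));
	copy.dst = msg.arg.pagefault.address & ~((unsigned long long)page - 1);
	copy.src = (unsigned long)src;		/* read from the child's memory */
	copy.len = page;
	return ioctl(uffd, UFFDIO_COPY, &copy);	/* also wakes the faulting task */
}

/* Returns 1 on success, 0 if userfaultfd is not usable here, -1 on error. */
int lazy_fill_demo(char *out, size_t out_len)
{
	long page = sysconf(_SC_PAGESIZE);
	struct uffdio_register reg;
	int status, uffd = uffd_open();
	pid_t pid = -1;
	char *area;

	if (uffd < 0)
		return 0;
	area = mmap(NULL, page, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (area == MAP_FAILED) {
		close(uffd);
		return -1;
	}
	memset(&reg, 0, sizeof(reg));
	reg.range.start = (unsigned long)area;
	reg.range.len = page;
	reg.mode = UFFDIO_REGISTER_MODE_MISSING;
	if (ioctl(uffd, UFFDIO_REGISTER, &reg) < 0 || (pid = fork()) < 0) {
		munmap(area, page);
		close(uffd);
		return -1;
	}
	if (pid == 0)
		_exit(serve_one_fault(uffd, page) == 0 ? 0 : 1);

	/* Drop our reference: if the child dies, the uffd is released and the
	 * fault below is retried as a normal one instead of hanging forever. */
	close(uffd);
	snprintf(out, out_len, "%s", area);	/* first touch, blocks until filled */
	if (waitpid(pid, &status, 0) < 0)
		status = -1;
	munmap(area, page);
	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 1 : -1;
}
```

Calling lazy_fill_demo(buf, sizeof(buf)) from a main() should leave the
child's string in buf when it returns 1. The sketch handles a single page of
a single mapping and ignores everything a real restore would have to track.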

> Are there any other problems which need to be resolved to restore
> processes in combination with userfaultfd? I would like to try it out
> and just want to make sure this is not completely impossible.

Well, what I have found so far is on the wiki page. These are:

- Only MAP_PRIVATE | MAP_ANONYMOUS mappings will be supported in the 1st version
  due to kernel constraints. This can (in theory) be fixed in the kernel, and at
  some point it will have to be, since MAP_FILE | MAP_PRIVATE mappings are quite
  common in apps: that's the way libraries are mapped.

- Userfaultfd is known not to be able to map one page into two places. Thus pages
  that tasks share via COW will get copied (un-shared) right on restore. Not sure
  how to fix this on the kernel side :(

- Andrea (the author) states that UFFDIO_REMAP might be slow compared to
  UFFDIO_COPY. It probably makes sense to copy data into tasks rather than move it.

- Forks, unmaps and mremaps can screw things up: if we don't report these events
  to criu via uffd, then a page fault from a new process or from a remapped area
  can be handled incorrectly by criu. This is what I started writing the
  non-cooperative mode for.
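
To make the first and third points a bit more concrete, here is a small
sketch of UFFDIO_COPY on a MAP_PRIVATE | MAP_ANONYMOUS area. It only assumes
the stock uffd interface. The function name uffd_copy_demo, the staging
buffer and the string are invented for the example, and nothing here is taken
from CRIU. It fills the page up front, before anything touches it, and checks
that the source buffer is left intact, i.e. the data is copied rather than
moved.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Returns 1 on success, 0 if userfaultfd is not usable here, -1 on error. */
int uffd_copy_demo(char *out, size_t out_len)
{
	static const char data[] = "page contents from the image";
	long page = sysconf(_SC_PAGESIZE);
	struct uffdio_api api = { .api = UFFD_API };
	struct uffdio_register reg;
	struct uffdio_copy copy;
	char *area, *src;
	int ret = -1;
	int uffd = syscall(__NR_userfaultfd, O_CLOEXEC);

#ifdef UFFD_USER_MODE_ONLY
	/* Newer kernels may only allow unprivileged handling of user-mode faults. */
	if (uffd < 0 && errno == EPERM)
		uffd = syscall(__NR_userfaultfd, O_CLOEXEC | UFFD_USER_MODE_ONLY);
#endif
	if (uffd < 0)
		return 0;
	if (ioctl(uffd, UFFDIO_API, &api) < 0) {
		close(uffd);
		return 0;
	}

	/* Destination: the only mapping type expected to work for now. */
	area = mmap(NULL, page, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	/* Source: an ordinary staging buffer standing in for image data. */
	src = mmap(NULL, page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (area == MAP_FAILED || src == MAP_FAILED)
		goto out;
	strcpy(src, data);

	/* Ask to be notified about missing pages in the destination area. */
	memset(&reg, 0, sizeof(reg));
	reg.range.start = (unsigned long)area;
	reg.range.len = page;
	reg.mode = UFFDIO_REGISTER_MODE_MISSING;
	if (ioctl(uffd, UFFDIO_REGISTER, &reg) < 0)
		goto out;

	/* Atomically install a copy of src at area. */
	memset(&copy, 0, sizeof(copy));
	copy.dst = (unsigned long)area;
	copy.src = (unsigned long)src;
	copy.len = page;
	if (ioctl(uffd, UFFDIO_COPY, &copy) < 0 || copy.copy != page)
		goto out;

	/* The page is present now, so this read does not fault, and the
	 * staging buffer still holds its data. */
	if (strcmp(src, data) == 0) {
		snprintf(out, out_len, "%s", area);
		ret = 1;
	}
out:
	if (area != MAP_FAILED)
		munmap(area, page);
	if (src != MAP_FAILED)
		munmap(src, page);
	close(uffd);
	return ret;
}
```

Since the destination ends up with its own private copy, filling the same
data into two tasks this way gives two separate pages, which is the COW
problem from the second point.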

-- Pavel


