[CRIU] criu and userfaultfd

Pavel Emelyanov xemul at parallels.com
Thu Sep 17 04:24:33 PDT 2015


On 09/17/2015 12:52 PM, Adrian Reber wrote:
> On Wed, Sep 09, 2015 at 02:51:04PM +0300, Pavel Emelyanov wrote:
>>> Sounds like it should work, for simple cases at least. Which would be a
>>> good point to know that and how it works. So I will continue/start to
>>> work on userfaultfd in combination with CRIU and once I have something I
>>> can update the wiki page.
>>
>> That's great! Thanks a lot!
> 
> It took me a while but I think I have now found the right place for
> userfaultfd to hook into. Right now I am registering a single page to
> be handled by userfaultfd with UFFDIO_REGISTER_MODE_MISSING.

Why do you plan to do it at page granularity? I was thinking of
registering whole VMAs.
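
To illustrate the difference, here is a minimal sketch of registering an
arbitrary range in one call. It assumes the uffd FD is already open and
the UFFDIO_API handshake is done. The helper names are hypothetical and
are not CRIU code. The same call covers one page or a whole VMA; only
the length changes, and the range has to be page-aligned.

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: describe the range [start, start + len) for
 * missing-page tracking. Both values are expected to be page-aligned. */
void uffd_fill_register(struct uffdio_register *reg,
			unsigned long start, unsigned long len)
{
	assert(len > 0);
	memset(reg, 0, sizeof(*reg));
	reg->range.start = start;
	reg->range.len = len;
	reg->mode = UFFDIO_REGISTER_MODE_MISSING;
}

/* Hypothetical helper: register the range with an already opened uffd.
 * Returns 0 on success and -1 (with errno set) on failure. */
int uffd_register_range(int uffd, unsigned long start, unsigned long len)
{
	struct uffdio_register reg;

	uffd_fill_register(&reg, start, len);
	return ioctl(uffd, UFFDIO_REGISTER, &reg);
}
```

With VMA granularity the restorer would call this once per VMA, passing
the VMA start and its length, instead of once per page.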

> I am in the restorer after the memory has been remapped and this seems
> to be the right place to register my memory page as userfaultfd handled
> because earlier it does not exist and the corresponding userfaultfd
> ioctl fails. 

Yes.

> In addition to marking the pages as handled by userfaultfd
> I am also setting madvise to MADV_DONTNEED.

Why do we need to MADV_DONTNEED the area? And who would do it?

> The uffd FD is opened in sigreturn_restore() and then passed as an
> additional parameter of struct task_args into the restorer. I am opening
> the uffd FD before going into the restorer as I am also transmitting the
> open uffd FD to another process via unix sockets, which later uses it to
> react to the uffd copy requests once the process is restored and
> accesses memory handled by uffd.

But shouldn't the uffd FD be passed to the criu user-space daemon? The
daemon would then read pages from somewhere and put them back into the
restored task's address space.

Or do you mean that the uffd FD is needed in the restorer to properly
add pages/VMAs to it?
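
For reference, handing an open FD to another process over a unix socket
is usually done with an SCM_RIGHTS control message. Below is a minimal
sketch of that mechanism. The function names are hypothetical and this
is not the CRIU implementation; it only assumes a connected AF_UNIX
socket between the two sides.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Properly aligned buffer for one control message carrying one int. */
union fd_ctrl {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Hypothetical helper: send fd over a connected unix socket.
 * Returns 0 on success, -1 on failure. */
int pass_fd_send(int sock, int fd)
{
	char byte = 0;	/* at least one byte of real data is sent */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_ctrl ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	assert(fd >= 0);
	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive an fd sent by pass_fd_send().
 * Returns the new descriptor, or -1 on failure. */
int pass_fd_recv(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_ctrl ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The receiving side gets its own descriptor number that refers to the
same open file, so a daemon holding it can keep serving faults after
the restorer has finished.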

> Right now I am only marking a single page in the restored process as
> being handled by uffd. The process hangs after restore when accessing
> this page for the first time and I can now insert via uffd whatever
> content I want.

Yes! But keep in mind that Andrea wrote about mapping a page into the mm
versus copying it -- he claims that copying _should_ be faster than
remapping, as the former doesn't involve a TLB flush.
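
The copy path corresponds to the UFFDIO_COPY ioctl. A minimal sketch of
resolving one fault that way follows. It assumes the faulting address
was already read from the uffd, that the source buffer holds one page of
data, and that the page size is a power of two. The helper names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: build a copy request that fills the page
 * containing fault_addr with page_size bytes taken from src. */
void uffd_fill_copy(struct uffdio_copy *cp, uint64_t fault_addr,
		    const void *src, uint64_t page_size)
{
	/* The mask below is only valid for power-of-two page sizes. */
	assert(page_size && (page_size & (page_size - 1)) == 0);
	memset(cp, 0, sizeof(*cp));
	cp->dst = fault_addr & ~(page_size - 1);	/* page-align down */
	cp->src = (uint64_t)(uintptr_t)src;
	cp->len = page_size;
	cp->mode = 0;	/* default mode: wake the faulting thread */
}

/* Hypothetical helper: resolve the fault by copying the page in.
 * Returns 0 on success and -1 (with errno set) on failure. */
int uffd_copy_page(int uffd, uint64_t fault_addr,
		   const void *src, uint64_t page_size)
{
	struct uffdio_copy cp;

	uffd_fill_copy(&cp, fault_addr, src, page_size);
	return ioctl(uffd, UFFDIO_COPY, &cp);
}
```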

> As I am printing only the content of this page the
> restored process still works but prints out different content.

OK :)

> So far it seems as if criu and userfaultfd can be combined. Before
> continuing further I wanted to know if this is the right way to go.

Yes, this seems to be a good start. Are you using the non-cooperative
uffd patches or just playing with Andrea's existing code?

> I am marking the pages as uffd handled after the madvise() bits are
> restored. Using uffd will mean that I will overwrite the madvise()
> information. Does this need to be handled better? Or is it okay to
> overwrite all pages with MADV_DONTNEED in the case of uffd?

Hm... I don't get the idea of madvise-ing the pages at all, sorry :(
Can you describe it in more detail, please?

> From my point of view the next steps would be to implement a local lazy
> restore. The page server should know how to get the uffd FD from the main
> restore process and then transfer the memory on request to the process being
> lazy restored. This also means that the uffd FD has to be set up in
> sigreturn_restore() and passed to the restorer like I am doing it right
> now.

OK.

> For remote lazy restore a simple version of the page server is necessary
> which also gets the uffd FD but which then requests the pages over the
> network before copying them to the uffd FD.

I agree. But please pay attention to Rodrigo's recent patches -- he's
implementing an image cache and proxy that will help with transferring
the images over the network. This work will obsolete the page-server, so
more changes will probably be required in your patches too.

> Does this sound right?

Yup, seems to be correct :)

-- Pavel


