[CRIU] criu and userfaultfd
Adrian Reber
adrian at lisas.de
Thu Sep 17 07:51:27 PDT 2015
On Thu, Sep 17, 2015 at 02:24:33PM +0300, Pavel Emelyanov wrote:
> On 09/17/2015 12:52 PM, Adrian Reber wrote:
> > On Wed, Sep 09, 2015 at 02:51:04PM +0300, Pavel Emelyanov wrote:
> >>> Sounds like it should work, for simple cases at least. Which would be a
> >>> good point to know that and how it works. So I will continue/start to
> >>> work on userfaultfd in combination with CRIU and once I have something I
> >>> can update the wiki page.
> >>
> >> That's great! Thanks a lot!
> >
> > It took me a while but I think I have now found the right place for
> > userfaultfd to hook into. Right now I am marking a single page that it
> > should be handled by userfaultfd with UFFDIO_REGISTER_MODE_MISSING.
>
> Why do you plan to do it with the page granularity? I thought to go
> with the whole VMA-s.
For testing I wanted to control a single page which I can inspect from
the restored process.
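
A minimal sketch of the single-page registration described above. It is not the actual CRIU restorer code: the helper names (uffd_open, uffd_register_page, register_single_page_demo) are hypothetical, and a fresh anonymous mapping stands in for the remapped page of the restored process.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: open a userfaultfd and do the UFFDIO_API handshake.
 * Returns the descriptor, or -1 if userfaultfd is unavailable or not permitted. */
static int uffd_open(void)
{
    int uffd = (int)syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (uffd < 0)
        return -1;

    struct uffdio_api api = { .api = UFFD_API, .features = 0 };
    if (ioctl(uffd, UFFDIO_API, &api) < 0) {
        close(uffd);
        return -1;
    }
    return uffd;
}

/* Hypothetical helper: register exactly one page for missing-page faults.
 * Returns 0 on success, -1 on failure. */
static int uffd_register_page(int uffd, void *page, size_t page_size)
{
    struct uffdio_register reg = {
        .range = { .start = (uintptr_t)page, .len = page_size },
        .mode = UFFDIO_REGISTER_MODE_MISSING,
    };

    if (ioctl(uffd, UFFDIO_REGISTER, &reg) < 0)
        return -1;

    /* The kernel reports which ioctls are valid on this range;
     * UFFDIO_COPY is needed later to fill the page. */
    return (reg.ioctls & ((__u64)1 << _UFFDIO_COPY)) ? 0 : -1;
}

/* Returns 0 on success, 1 if userfaultfd is unavailable, -1 on error. */
static int register_single_page_demo(void)
{
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    int uffd = uffd_open();
    if (uffd < 0)
        return 1;

    /* The mapping must exist before it can be registered. */
    void *page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) {
        close(uffd);
        return -1;
    }

    int ret = uffd_register_page(uffd, page, page_size);
    munmap(page, page_size);
    close(uffd);
    return ret;
}
```

Registering a whole VMA instead of one page would only change the start and len of the range.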
> > I am in the restorer after the memory has been remapped and this seems
> > to be the right place to register my memory page as userfaultfd handled
> > because earlier it does not exist and the corresponding userfaultfd
> > ioctl fails.
>
> Yes.
>
> > In addition to marking the pages as handled by userfaultfd
> > I am also setting madvise to MADV_DONTNEED.
>
> Why do we need to madv-dontneed the area? And by whom?
Unfortunately, I have no idea. I just saw in the userfaultfd example
code
https://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/plain/tools/testing/selftests/vm/userfaultfd.c?h=userfault
that this was set. As it was not working before, I tried madvise() and it
did help. But I have no idea why.
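
A possible explanation, offered as an assumption since nothing in this thread confirms it: UFFDIO_REGISTER_MODE_MISSING only reports faults on pages that are not present. If the page has already been populated by the time it is registered, accessing it never faults. MADV_DONTNEED on a private anonymous mapping drops the page, so the next access faults again. The sketch below shows only that dropping effect on plain anonymous memory, without userfaultfd; the function name is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns the first byte read back after MADV_DONTNEED, or -1 on error. */
static int dontneed_drops_page(void)
{
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    volatile unsigned char *page = mmap(NULL, page_size,
                                        PROT_READ | PROT_WRITE,
                                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    /* Touch the page so it is present and holds known content. */
    memset((void *)page, 0xAB, page_size);

    /* Drop the page; the mapping itself stays valid. */
    if (madvise((void *)page, page_size, MADV_DONTNEED) < 0) {
        munmap((void *)page, page_size);
        return -1;
    }

    /* This access is a fresh missing-page fault. Without userfaultfd
     * the kernel resolves it with a zero-filled page. */
    int first = page[0];

    munmap((void *)page, page_size);
    return first;
}
```

With the range registered for missing faults, that same access would be reported on the uffd FD instead of being zero-filled by the kernel.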
> > The uffd FD is opened in sigreturn_restore() and then passed as an
> > additional parameter of struct task_args into the restorer. I am opening
> > the uffd FD before going in the restorer as I am also transmitting the
> > open uffd FD to another process via unix sockets which is later used to
> > react on the uffd copy requests once the process is restored and
> > accesses memory handled by uffd.
>
> But shouldn't the uffd FD be passed to criu user-space daemon? The daemon
> is then to read pages from somewhere and put them back into the restored
> task's address space.
>
> Or you're talking that the uffd FD is needed in restorer to properly
> add pages/vmas into it?
Yes, exactly. In pie/restorer.c the pages are remapped and after
remapping I can register them with uffd.
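
Transmitting an open FD to another process via unix sockets, as mentioned in the quoted text above, is normally done with an SCM_RIGHTS control message. The following is a minimal sketch of that mechanism, not CRIU's own code: the demo_* names are hypothetical, and a pipe FD stands in for the uffd FD so the example runs without userfaultfd.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor over a unix socket. Returns 0 or -1. */
static int demo_send_fd(int sock, int fd)
{
    char dummy = 'F';   /* at least one byte of real data must be sent */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    memset(&ctrl, 0, sizeof(ctrl));

    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf),
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor. Returns the new descriptor or -1. */
static int demo_recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    memset(&ctrl, 0, sizeof(ctrl));

    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf),
    };
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;

    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Pass the write end of a pipe through a socketpair and use the copy. */
static int fd_passing_demo(void)
{
    int sv[2] = { -1, -1 }, pfd[2] = { -1, -1 }, got = -1, ret = -1;
    char c = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == 0 &&
        pipe(pfd) == 0 &&
        demo_send_fd(sv[0], pfd[1]) == 0 &&
        (got = demo_recv_fd(sv[1])) >= 0 &&
        write(got, "x", 1) == 1 &&
        read(pfd[0], &c, 1) == 1 &&
        c == 'x')
        ret = 0;

    if (got >= 0) close(got);
    if (pfd[0] >= 0) close(pfd[0]);
    if (pfd[1] >= 0) close(pfd[1]);
    if (sv[0] >= 0) close(sv[0]);
    if (sv[1] >= 0) close(sv[1]);
    return ret;
}
```

The received descriptor refers to the same open file as the sent one, so a process receiving a uffd FD this way could serve faults for the task that registered the memory.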
> > Right now I am only marking a single page in the restored process as
> > being handled by uffd. The process hangs after restore when accessing
> > this page for the first time and I can now insert via uffd whatever
> > content I want.
>
> Yes! But keep in mind, that Andrea wrote about mapping page into mm
> versus copying it -- he claims that copying _should_ be faster than
> remapping as the former doesn't involve tlb flush.
Until now I have only used

    ioctl(uffd, UFFDIO_COPY, &uffdio_copy);

which sounds like copying.
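
A minimal sketch of how such a UFFDIO_COPY call fills a registered page with chosen content. The function name and the 'A' fill pattern are hypothetical. A real handler would first read a struct uffd_msg from the uffd FD when the restored process faults; here the copy is issued before the page is touched, so the sketch needs no second thread or process.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Returns 0 on success, 1 if userfaultfd is unavailable, -1 on error. */
static int uffdio_copy_demo(void)
{
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    int ret = -1;

    int uffd = (int)syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (uffd < 0)
        return 1;
    struct uffdio_api api = { .api = UFFD_API, .features = 0 };
    if (ioctl(uffd, UFFDIO_API, &api) < 0) {
        close(uffd);
        return 1;
    }

    /* dst is never touched before the copy, so its page is missing. */
    char *dst = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    char *src = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (dst == MAP_FAILED || src == MAP_FAILED)
        goto out;
    memset(src, 'A', page_size);    /* content to be inserted */

    struct uffdio_register reg = {
        .range = { .start = (uintptr_t)dst, .len = page_size },
        .mode = UFFDIO_REGISTER_MODE_MISSING,
    };
    if (ioctl(uffd, UFFDIO_REGISTER, &reg) < 0)
        goto out;

    /* Atomically copy one page from src into the missing page at dst. */
    struct uffdio_copy uffdio_copy = {
        .dst = (uintptr_t)dst,
        .src = (uintptr_t)src,
        .len = page_size,
        .mode = 0,
    };
    if (ioctl(uffd, UFFDIO_COPY, &uffdio_copy) < 0 ||
        uffdio_copy.copy != (__s64)page_size)
        goto out;

    /* The page is now present, so these reads do not fault. */
    if (dst[0] == 'A' && dst[page_size - 1] == 'A')
        ret = 0;

out:
    if (dst != MAP_FAILED) munmap(dst, page_size);
    if (src != MAP_FAILED) munmap(src, page_size);
    close(uffd);
    return ret;
}
```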
> > As I am printing only the content of this page the
> > restored process still works but prints out different content.
>
> OK :)
>
> > So far it seems as criu and userfaultfd can be combined. Before
> > continuing further I wanted to know if this is the right way to go.
>
> Yes, seem to be the good start. Are you using the non-cooperative uffd
> patches or just play with existing Andrea's stuff?
Right now I am only using Andrea's stuff. I have seen your patches but
not yet applied.
> > I am marking the pages as uffd handled after the madvise() bits are
> > restored. Using uffd will mean that I will overwrite the madvise()
> > information. Does this need to be handled better? Or is it okay to
> > overwrite all pages with MADV_DONTNEED in the case of uffd?
>
> Hm... I don't get the idea of madvise-ing the pages at all, sorry :(
> Can you describe it in more details please?
Unfortunately not. As I said before, I took this part from Andrea's
example because uffd was not working without it. Without madvise my
restored process just runs normally and uffd is never triggered. Maybe
this is related to the remapping which makes the pages available to the
process. But I do not actually understand this.
> > From my point of view the next steps would be to implement a local lazy
> > restore. The page server should know how to get the uffd FD from the main
> > restore process and then transfer the memory on request to the process being
> > lazy restored. This also means that the uffd FD has to be set up in
> > sigreturn_restore() and passed to the restorer like I am doing it right
> > now.
>
> OK.
>
> > For remote lazy restore a simple version of the page server is necessary
> > which also gets the uffd FD but which then requests the pages over the
> > network before copying them to the uffd FD.
>
> I agree. But please pay attention to the recent Rodrigo's patches -- he's
> implementing image cache and proxy that will help transferring the images
> over the network. This work will obsolete the page-server, so probably more
> changes will be required in your patches too.
I have seen his work and it sounds like it will provide a better
infrastructure for uffd also. I can probably easily change my code to
use the image cache and proxy once it has been committed.
> > Does this sound right?
>
> Yup, seem to be correct :)
Good to hear.
Adrian