[CRIU] [PATCH v2 5/5] UFFD: Support lazy-pages restore between two hosts

Mon Mar 28 06:04:14 PDT 2016

On Mon, Mar 28, 2016 at 01:19:00PM +0300, Pavel Emelyanov wrote:
> On 03/28/2016 10:12 AM, Mike Rapoport wrote:
> > Hi Adrian,
> > 
> > On Thu, Mar 24, 2016 at 03:52:54PM +0000, Adrian Reber wrote:
> >> From: Adrian Reber <areber at redhat.com>
> >>
> >> This enhances lazy-pages mode to work across two different hosts. Instead
> >> of lazily restoring a process on the same host, this makes it possible to
> >> keep the memory pages on the source system and to transfer them from the
> >> source to the destination system only on demand.
> >>
> >> The previous lazy restore, which worked only on a single host, consisted of two processes:
> >>
> >>  criu restore --lazy-pages --address /path/to/unix-domain-socket
> >>
> >> and
> >>
> >>  criu lazy-pages --address /path/to/unix-domain-socket
> >>
> >> The unix domain socket was used to transfer the userfault FD (UFFD) from
> >> the 'criu restore' process to the 'criu lazy-pages' process. The 'criu
> >> lazy-pages' process then listened on the UFFD for userfaultfd messages,
> >> which it used to retrieve each requested memory page from the
> >> checkpoint directory and copy that page into the process being
> >> restored.
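Handing the UFFD from 'criu restore' to 'criu lazy-pages' over a unix domain socket relies on the standard Linux SCM_RIGHTS control message. The following is a minimal sketch of that mechanism, not criu's own code; the helper names send_fd() and recv_fd() are illustrative.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control-message buffer sized and aligned for exactly one fd. */
union fd_cmsg {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Send fd over a connected unix domain socket using SCM_RIGHTS.
 * Returns 0 on success, -1 on failure. */
static int send_fd(int sock, int fd)
{
	char byte = 'F'; /* ancillary data must accompany at least one data byte */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = u.buf, .msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd sent with send_fd(). Returns the new fd, or -1. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = u.buf, .msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg;
	int fd;

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The received descriptor refers to the same open file as the sender's, so the lazy-pages process can read userfaultfd events from it exactly as the restore process could.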
> >>
> >> This commit introduces the ability to keep the pages on the remote host
> >> and to request only the required pages over TCP, on demand. As a result,
> >> criu now needs to be started differently:
> >>
> >> Host1:
> >>
> >>    criu restore --lazy-pages --address /path/to/unix-domain-socket
> >>
> >>   and
> >>
> >>    criu lazy-pages --address /path/to/unix-domain-socket \
> >>    --lazy-client ADDR-Host2 --port 27
> >>
> >> Host2:
> >>
> >>    criu lazy-pages --lazy-server --port 27
> >>
> >> On Host1 the process is now restored (as criu always does), except that
> >> the memory pages are not read from pages.img; instead, the appropriate
> >> pages are registered as userfaultfd-handled. As soon as the restored
> >> process tries to access one of these pages, a UFFD message is received by
> >> the lazy-client (on Host1). This UFFD message is then transferred via TCP
> >> to the lazy-server (on Host2). The lazy-server retrieves the memory page
> >> from the local checkpoint and returns a UFFDIO_COPY answer to the
> >> lazy-client, which then forwards it to the local UFFD, which inserts the
> >> page into the restored process.
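The request/response exchange between lazy-client and lazy-server can be sketched as follows. This is a minimal illustration under stated assumptions, not criu's actual protocol: struct lazy_page_req, request_page() and serve_one() are hypothetical, pages are assumed to be 4 KiB, and an in-memory buffer stands in for pages.img.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define PAGE_SZ 4096 /* assumption: 4 KiB pages */

/* Hypothetical wire format; not criu's actual protocol. */
struct lazy_page_req {
	uint64_t addr; /* page-aligned faulting address taken from the uffd_msg */
	uint32_t pid;  /* restored task whose pages are requested */
	uint32_t pad;  /* keep the struct free of implicit padding */
};

static uint64_t page_start(uint64_t fault_addr)
{
	return fault_addr & ~(uint64_t)(PAGE_SZ - 1);
}

/* Read exactly len bytes, retrying on short reads and EINTR. */
static int xread(int fd, void *buf, size_t len)
{
	uint8_t *p = buf;

	while (len > 0) {
		ssize_t n = read(fd, p, len);
		if (n < 0 && errno == EINTR)
			continue;
		if (n <= 0)
			return -1;
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Write exactly len bytes, retrying on short writes and EINTR. */
static int xwrite(int fd, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);
		if (n < 0 && errno == EINTR)
			continue;
		if (n <= 0)
			return -1;
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/* lazy-client (Host1): fetch the page containing fault_addr into page.
 * The caller would then hand page to ioctl(uffd, UFFDIO_COPY, ...). */
static int request_page(int sock, uint32_t pid, uint64_t fault_addr, void *page)
{
	struct lazy_page_req req = { .addr = page_start(fault_addr), .pid = pid };

	if (xwrite(sock, &req, sizeof(req)) < 0)
		return -1;
	return xread(sock, page, PAGE_SZ);
}

/* lazy-server (Host2): answer one request from an in-memory copy of a dumped
 * region starting at base (stand-in for pages.img). This sketch serves a
 * single image and ignores req.pid. */
static int serve_one(int sock, const uint8_t *image, uint64_t base, size_t image_len)
{
	struct lazy_page_req req;

	if (xread(sock, &req, sizeof(req)) < 0)
		return -1;
	if (req.addr < base || req.addr - base + PAGE_SZ > image_len)
		return -1; /* page not in this image */
	return xwrite(sock, image + (req.addr - base), PAGE_SZ);
}
```

On Host1 the received page would be installed with the UFFDIO_COPY ioctl (struct uffdio_copy carries dst, src, len and mode), which wakes the faulting thread.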
> >>
> >> The remote lazy restore behaves like the local lazy restore: if no
> >> messages arrive for 5 seconds on the socket waiting for UFFD messages,
> >> it switches to 'copy remaining pages' mode, in which all pages that were
> >> not requested via UFFD are transferred into the restored process.
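The 5-second idle switch maps naturally onto poll(2) with a timeout, where a return value of 0 means nothing arrived in time. A minimal sketch under that assumption; enum lazy_mode and next_lazy_mode() are hypothetical names, not taken from criu.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>

#define LAZY_TIMEOUT_MS 5000 /* the 5-second idle period described in the patch */

enum lazy_mode {
	LAZY_ON_DEMAND,      /* keep serving requests as they arrive */
	LAZY_COPY_REMAINING, /* idle too long: push all remaining pages */
};

/* Wait up to timeout_ms for fd to become readable.
 * Returns LAZY_ON_DEMAND if a message is pending, LAZY_COPY_REMAINING
 * on timeout, or -1 on error. A signal restarts the full wait. */
static int next_lazy_mode(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret;

	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0)
		return -1;
	return ret == 0 ? LAZY_COPY_REMAINING : LAZY_ON_DEMAND;
}
```

A daemon loop would call next_lazy_mode(fd, LAZY_TIMEOUT_MS) before each request and leave the on-demand loop the first time it returns LAZY_COPY_REMAINING.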
> >>
> >> TODO:
> >>   * Create, from the checkpoint directory, a checkpoint without the
> >>     UFFD-handled memory pages. This would enable a real UFFD remote
> >>     restore in which the UFFD pages do not need to be transferred to the
> >>     destination host.
> >>
> >> Signed-off-by: Adrian Reber <areber at redhat.com>
> >> ---
> >>  criu/uffd.c | 269 +++++++++++++++++++++++++++++++++++++++++++++++++++++-------
> >>  1 file changed, 240 insertions(+), 29 deletions(-)
> >  
> > I have some concerns regarding the proposed design. It could be that I'm
> > jumping in late, and you've already had discussions about how to use
> > userfaultfd in CRIU, but I'll share my thoughts anyway...
> > 
> > I think that post-copy migration in CRIU may be implemented without
> > creating so many entities handling different sides of userfaultfd.
> > The restore part could create a daemon for userfault handling transparently
> > to the user, and the dump side could be enhanced with the ability to send
> > pages on demand, again without explicitly starting another process.
> > 
> > For instance, 'criu dump --lazy-pages' would do everything required
> > to serve pages both for on-demand requests and for background copying, and
> > 'criu restore --lazy-pages' would be able to handle userfaultfd.
> 
> I don't disagree with that, but this all applies to the API part of the
> feature. In particular, the lazy pages daemon itself would still be required;
> the question is how to start one. The current code starts one explicitly;
> you propose to fork() it as part of the restore action, which is
> also fine.
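Starting the daemon from within the restore action, instead of requiring the user to launch 'criu lazy-pages' separately, amounts to a fork() at the right point. A minimal sketch, assuming a hypothetical daemon body passed in as a function pointer; start_lazy_daemon() and lazy_daemon_fn are illustrative names, not criu code.

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical daemon body, e.g. the uffd event loop; returns an exit code. */
typedef int (*lazy_daemon_fn)(void *arg);

/* Start the lazy-pages daemon as a child of the restore process.
 * Returns the daemon's pid to the caller, or -1 if fork() failed. */
static pid_t start_lazy_daemon(lazy_daemon_fn fn, void *arg)
{
	pid_t pid = fork();

	if (pid == 0)
		_exit(fn(arg)); /* child: never return into the restore code path */
	return pid;         /* parent: continue restoring; -1 on fork failure */
}
```

Using _exit() rather than exit() in the child avoids running the parent's atexit handlers and flushing duplicated stdio buffers a second time.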

It's not just a question of who does the fork(): the user from the shell or
criu from its code. For instance, criu already has a page-server that can
receive pages. Maybe instead of adding another page server to uffd.c it
would be worth teaching the existing page-server to send pages.

> > Regarding the patch itself, I believe that mixing socket and uffd handling
> > in the same functions is not a very good idea. Although their behaviour
> > has lots of similarities, socket and uffd are semantically
> > different and should be handled by different code paths.
> 
> Can you clarify this more? The same daemon should poll on two descriptors:
> the uffd, to handle #PFs (page faults) from the restored tree, and the
> socket, to read pages from the source node.
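A single daemon waiting on both the uffd and the socket can be sketched with one poll(2) call over two descriptors, dispatching each ready descriptor to its own handler so the two code paths stay separate. This is an illustrative sketch; lazy_poll_once() and lazy_handler are hypothetical names, and the handlers stand in for real uffd-event and page-receive logic.

```c
#define _GNU_SOURCE
#include <poll.h>

/* Handler invoked when its descriptor is readable; returns <0 on error. */
typedef int (*lazy_handler)(int fd);

/* One iteration of a daemon loop that watches both the userfaultfd
 * (page faults from the restored tree) and the socket to the source node
 * (page data arriving). Each descriptor has its own handler.
 * Returns the number of descriptors handled, 0 on timeout, -1 on error. */
static int lazy_poll_once(int uffd, lazy_handler on_fault,
			  int sock, lazy_handler on_page, int timeout_ms)
{
	struct pollfd pfds[2] = {
		{ .fd = uffd, .events = POLLIN },
		{ .fd = sock, .events = POLLIN },
	};
	int ret, handled = 0;

	ret = poll(pfds, 2, timeout_ms);
	if (ret <= 0)
		return ret; /* 0: timeout, -1: error (including EINTR) */

	if (pfds[0].revents & POLLIN) {
		if (on_fault(uffd) < 0)
			return -1;
		handled++;
	}
	if (pfds[1].revents & POLLIN) {
		if (on_page(sock) < 0)
			return -1;
		handled++;
	}
	return handled;
}
```

Keeping the dispatch in one loop while giving each descriptor its own handler is one way to share the polling code without merging the uffd and socket logic into the same functions.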

As far as I understood Adrian's patch, the daemon that runs on dst polls
the uffd and the daemon that runs on src polls the socket. In addition,
the daemon that runs on dst may communicate with the daemon that runs on
src to retrieve the pages from there.
IMHO, using the same handle_requests for both daemons makes it more
difficult to follow how each of the three cases is handled. Perhaps a
finer-grained division into smaller functions would help both to avoid
code duplication and to give the code a clearer structure.

> -- Pavel

--
Sincerely yours,
Mike.