[CRIU] Remote lazy-restore design discussion

Adrian Reber adrian at lisas.de
Fri Apr 8 07:51:53 PDT 2016


On Thu, Apr 07, 2016 at 03:37:30PM +0300, Pavel Emelyanov wrote:
> On 04/06/2016 10:38 AM, Adrian Reber wrote:
> > On Tue, Apr 05, 2016 at 07:04:45PM +0300, Pavel Emelyanov wrote:
> >> Well, how about this:
> >>
> >> I. Dump side.
> >>
> >> The criu dump process dumps everything but the lazy pagemaps, lazy pagemaps
> >> are skipped and are queued.
> > 
> > Agreed.
> > 
> >> Then criu dump spawns a daemon that opens a connection to the remote host,
> >> creates page_server_xfer, takes the queue of pagemaps that are to be sent
> >> to it and starts polling the xfer socket.
> > 
> > Not sure I understand this yet. There is now a daemon running which
> > has access to the memory of the dumped process, whose pages are still
> > in their original place.
> 
> Yes.
> 
> > This is something I
> > see as important, to make sure the pages of the dumped process are
> > copied as seldom as possible.
> 
> Absolutely agree.
> 
> > Why is the daemon connecting to the restore process and not the other
> > way around?
> 
> Well, this is how dump --page-server already does -- dump side connects
> to restore side. So I thought that doing symmetrical thing for lazy pages
> would make sense.
> 
> > From what I have done so far, it seems more logical the other way
> > around than you described.
> > 
> >  * First the process is dumped (without the lazy pages).
> 
> And what about lazy pages? Where are they? In the dump-side images
> or in memory?

I would hope (I have not checked whether this is really possible) that
the lazy pages can stay in the process which is currently being dumped.
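If the pages can indeed stay in the dumpee, the dump-side daemon could serve
them straight out of that address space on request. A minimal sketch of the
idea (my own illustration, not CRIU code; read_remote_page and its parameters
are hypothetical), using process_vm_readv() so a lazy page is copied only
once, when the restore side actually asks for it:

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: copy one page out of the (stopped) dumpee's
 * address space directly into a send buffer, with no intermediate
 * image file in between. */
static ssize_t read_remote_page(pid_t pid, void *remote_addr,
                                void *buf, size_t page_size)
{
	struct iovec local  = { .iov_base = buf,         .iov_len = page_size };
	struct iovec remote = { .iov_base = remote_addr, .iov_len = page_size };

	return process_vm_readv(pid, &local, 1, &remote, 1, 0);
}
```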

> >  * Second the dumped information is transferred (scp/rsync) to the
> >    destination.
> 
> Note, that _some_ memory contents will be sent to page server using
> dump-connects-to-restore method.

Not sure I understand this.

> >  * Third, on the destination host, the process is restored in lazy pages
> >    mode and now the uffd page handler connects to the dump daemon on the
> >    source host.
> 
> Hm... OK.
> 
> >  * The process is now restored.
> > 
> > From my current understanding, it makes no sense for the dumping
> > process to connect to the uffd process on the destination system, as
> > it is unknown when it will be available.
> 
> Well, OK, from this perspective it may be useful to make restore-connect-to-dump.
> But this shouldn't affect the described model, since it mostly describes what
> happens once sides are interconnected.

That's true.

> >> When available for read, it gets request for particular pagemap and pops one
> >> up in the queue.
> >>
> >> When available for write it gets the next pagemap from queue and ->writepage
> >> one to the page_xfer.
> >>
> >> II. Restore side
> >>
> >> The uffd daemon is spawned, it opens a port (to which dump will connect, or
> >> uses the opts.ps_socket provided connection, the connect_to_page_server()
> >> knows this), creates a hash with (pid, uffd, pagemaps) structures (called
> >> lazy_data below) and listens.
> >>
> >> Restore prepares processes and mappings (Adrian's code already does this), sending
> >> uffd-s to the uffd daemon (already there).
> >>
> >> The uffd daemon starts polling all uffds it has and the connection from the
> >> dump side.
> >>
> >> When uffd is available for read, it gets the #PF info, then goes to the new
> >> page_read that sends the page_server_iov request for out-of-order page (note,
> >> that in case of lazy restore from images the regular page_read is used).
> >>
> >> Most of this code is already in criu-dev from Adrian and you, but we need to
> >> add multi-uffd polling, the lazy_data thing, and the ability to handle "page
> >> will be available later" responses from the page_read.
> >>
> >> When dump side connection is available for reading it calls the core part of
> >> the page_server_serve() routine that reads from socket and handles PS_IOV_FOO
> >> commands. The page_xfer used in _this_ case is the one that finds the appropriate
> >> lazy_data and calls map + wakeup ioctls.
> >>
> >> This part is not ready and this is what I meant when I was talking about
> >> re-using page-server code with new page_xfer and page_read.
> >>
> >> Does this make sense?
> > 
> > I am confused about which side connects to the other.
> 
> OK :) Let's then try to resolve this issue.
> 
> I don't have strong arguments for the dump->restore connection, since if
> you look at how p.haul works, it doesn't use this criu connect feature,
> it passes a pre-established descriptor instead. On both sides. So the
> question of which side connects to which can be solved either way.

Yes, probably right. Which side connects to which is not really
important. That was just the first point that seemed wrong, given the
order of steps I expected.
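For the multi-uffd polling described above, the core of the uffd daemon is
just a dispatch loop over all the uffds plus the page-server connection. A
rough sketch (the helper and its callback layout are my own, hypothetical
names; the real daemon would plug in the #PF handler for uffds and the
PS_IOV_FOO command handler for the socket):

```c
#include <poll.h>

typedef void (*fd_handler)(int fd, void *arg);

/* Poll every registered fd once and run the matching handler for each
 * readable one.  In the daemon, every uffd would get a handler that
 * reads the #PF info, and the page-server socket would get a handler
 * for the incoming commands. */
static int poll_and_dispatch(struct pollfd *pfds, fd_handler *handlers,
			     void **args, int nfds, int timeout_ms)
{
	int i, handled = 0;

	if (poll(pfds, nfds, timeout_ms) <= 0)
		return 0;

	for (i = 0; i < nfds; i++) {
		if (pfds[i].revents & POLLIN) {
			handlers[i](pfds[i].fd, args[i]);
			handled++;
		}
	}
	return handled;
}
```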

I thought some more about the different designs currently available. I
can use my first implementation, which provides its own protocol for
page exchange (uffd-struct-based), or the one based on Mike's page
server client (page-server-based).

In my first implementation (uffd-struct-based), the logic deciding which
pages should be copied ran on the source system (uffd-remote-server). It
reacted to requests and transferred the unrequested pages at the end. To
know which pages needed to be transferred, it had to parse the
checkpoint directory. The uffd daemon on the destination side just
forwarded pages between uffd and the network socket. It needed to know
how to handle uffd requests, but it did not require any knowledge of the
actual checkpoint or of which pages are available.

In the page-server-based remote restore, the page-server-on-the-dest-host
has to know how to get each page, and the uffd daemon on the destination
side has to parse the checkpoint directory to know which pages are part
of the restored process and which have not yet been transferred.
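Either way, the request the uffd daemon sends upstream for an out-of-order
page is small. A simplified sketch, loosely in the spirit of the
page_server_iov idea (the exact layout and the PS_GET value here are my
assumptions, not CRIU's wire format):

```c
#include <stdint.h>
#include <unistd.h>

/* Hypothetical request layout for "send me these pages". */
struct lazy_page_req {
	uint32_t cmd;      /* which operation, e.g. "get pages" */
	uint32_t nr_pages; /* number of pages starting at vaddr */
	uint64_t vaddr;    /* faulting address reported by uffd */
	uint64_t dst_id;   /* identifies the pagemap/task on the dump side */
};

#define PS_GET 1 /* made-up command value for this sketch */

/* Ask the dump side for nr pages at vaddr over the established socket. */
static int request_lazy_pages(int sk, uint64_t vaddr, uint64_t dst_id,
			      uint32_t nr)
{
	struct lazy_page_req req = {
		.cmd      = PS_GET,
		.nr_pages = nr,
		.vaddr    = vaddr,
		.dst_id   = dst_id,
	};

	if (write(sk, &req, sizeof(req)) != sizeof(req))
		return -1;
	return 0;
}
```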

What I am trying to say is that in my original, non-page-server
implementation the uffd daemon has no need to know the details about the
pages, while in the page-server-based implementation all involved
parts/daemons/page servers need to know which pages have to be
transferred.

I am not sure how important this argument is, but I wanted to mention it
for completeness: because the uffd daemon does not need to
read/access/parse the checkpoint directory, the number of processes
involved in the restore with access to the checkpoint directory drops
from three to two.

		Adrian

