[CRIU] Lazy-restore design discussion - round 2

Adrian Reber adrian at lisas.de
Mon Apr 18 03:47:00 PDT 2016


On Mon, Apr 18, 2016 at 01:10:05PM +0300, Mike Rapoport wrote:
> On Mon, Apr 18, 2016 at 12:31:14PM +0300, Pavel Emelyanov wrote:
> > On 04/18/2016 10:46 AM, Adrian Reber wrote:
> > > It seems we have reached some kind of agreement and therefore
> > > I am trying to summarize, from my point of view, our current discussion
> > > results.
> > 
> > Thanks for keeping track of this :)
> 
> Adrian, you've beat me on this :)
>  
> > >  * The UFFD daemon does not need a checkpoint directory to run; all
> > >    required information (e.g. the PID and the pages) will be
> > >    transferred over the network.
> > 
> > I would still read process tree from images dir.
> 
> +1

Hmm, then I don't understand how the patches "lazy-pages: handle
multiple processes" fit in. How can the uffd daemon handle multiple
restore requests when it needs to know where the checkpoint directory
is? As I understand it, I can either start it without '-D', in which
case it gets the information about the pagemap-s from somewhere else
(the network), or I can start the uffd daemon with '-D'. But why
should it then handle multiple requests, when all the required
information is in a directory specified on the command line, which can
change for every restored process? From my point of view this seems
contradictory.

> > >  * The page-server protocol needs to be extended to transfer the
> > >    lazy-restore pages list from the source system to the UFFD daemon.
> > 
> > You mean the pagemap-s? But we have the images dir on the destination node;
> > uffd can read this information from there as well.

See above, this is still unclear to me. If the uffd daemon is supposed
to handle multiple requests, it cannot rely on a single images
directory given on the command line. We could, however, transfer the
information about the images directory from the restore process the
same way as the PID and the UFFD (over the unix domain socket). Then
the uffd daemon would know where the directory is without needing it
on the command line. It would still require access to the local file
system, which could be avoided by transferring the pagemap-s from
somewhere else.
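
Just to make that idea concrete, here is a minimal sketch of such a
handshake: the restorer passes its PID and the images directory path
as payload and the UFFD itself as SCM_RIGHTS ancillary data. The
message layout, struct lp_hello and send_uffd() are invented for this
example and are not existing CRIU code:

#include <limits.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

struct lp_hello {
	pid_t	pid;			/* task the uffd belongs to */
	char	images_dir[PATH_MAX];	/* optional: where the pagemap-s live */
};

static int send_uffd(int sk, int uffd, struct lp_hello *h)
{
	struct iovec iov = { .iov_base = h, .iov_len = sizeof(*h) };
	char cbuf[CMSG_SPACE(sizeof(int))] = { 0 };
	struct msghdr msg = {
		.msg_iov	= &iov,
		.msg_iovlen	= 1,
		.msg_control	= cbuf,
		.msg_controllen	= sizeof(cbuf),
	};
	struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);

	/* the uffd itself travels as ancillary data (SCM_RIGHTS) */
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type  = SCM_RIGHTS;
	cm->cmsg_len   = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cm), &uffd, sizeof(int));

	return sendmsg(sk, &msg, 0) < 0 ? -1 : 0;
}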

> That means that the dump side should be taught to split pagemap creation
> and the actual page dump, right?

This is part of the tooling work I mentioned below. We probably need
tools to split a checkpoint into a lazy part and a part which needs to
be transferred up front, and the tools should also be able to combine
a previously lazy-restored checkpoint back into a 'normal' checkpoint
directory. A simplified sketch of the split is below.
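
In essence the split could look like the following sketch; the entry
layout and the PE_LAZY flag are invented for illustration (the real
pagemap images are protobuf-based), but the idea is just to walk the
pagemap and route the lazy candidates into a separate list:

#include <stdbool.h>
#include <stdint.h>

#define PE_LAZY	(1 << 0)	/* hypothetical "restore on demand" flag */

struct pagemap_entry {
	uint64_t vaddr;		/* start of the range */
	uint32_t nr_pages;	/* length of the range in pages */
	uint32_t flags;
};

static bool entry_is_lazy(const struct pagemap_entry *pe)
{
	return pe->flags & PE_LAZY;
}

/* hand each entry to the eager or the lazy side of the checkpoint */
static void split_pagemap(const struct pagemap_entry *pes, int nr,
			  void (*eager)(const struct pagemap_entry *),
			  void (*lazy)(const struct pagemap_entry *))
{
	for (int i = 0; i < nr; i++)
		(entry_is_lazy(&pes[i]) ? lazy : eager)(&pes[i]);
}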

> > >  * The UFFD daemon is the instance which decides which pages are pushed
> > >    when via UFFD into the restored process.
> > 
> > Not sure I understand this correctly. The UFFD daemon gets #PF-s from tasks
> > and sends the requests to the source node. The source node sends pages to the
> > destination side (some go in out-of-order mode when a #PF request from UFFD
> > is received). The UFFD daemon injects pages into processes right upon
> > receiving them.

This then also needs to be decided: which part is responsible for the
transferred pages, meaning which part knows which pages have been
requested, which pages have been transferred and which pages are still
missing. Optimizations like sending adjacent pages together also need
to be handled somewhere. This can either be the uffd daemon on the
destination system or the page-server on the source system. To me it
feels more correct for the uffd daemon to do this rather than the
source system. That is also the reason it either needs to read the
image directory or get a list of the pages in the pagemap suitable for
lazy restore. It would enable us to leave any UFFD logic out of the
source node, as it then only needs to be a page-server. If we
implement it on the source node, parts of the page-server code need to
become more UFFD-aware (sending adjacent pages, sending remaining
pages), but on the other hand the uffd daemon could be reduced to a
simple lazy-pages forwarder.
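
For the first variant, the daemon's fault loop would roughly look like
the sketch below; get_remote_page() is only a placeholder for whatever
page-server request mechanism we end up with, not an existing
function:

#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

/* placeholder: ask the source-side page-server for one page */
extern int get_remote_page(uint64_t addr, void *buf);

static int handle_one_fault(int uffd, void *page_buf, uint64_t page_size)
{
	struct uffd_msg msg;
	struct uffdio_copy uc = { 0 };

	/* wait for the next #PF event from the restored task */
	if (read(uffd, &msg, sizeof(msg)) != sizeof(msg))
		return -1;
	if (msg.event != UFFD_EVENT_PAGEFAULT)
		return 0;		/* nothing else expected here */

	/* fetch the faulting page from the source node ... */
	uc.dst = msg.arg.pagefault.address & ~(page_size - 1);
	if (get_remote_page(uc.dst, page_buf) < 0)
		return -1;

	/* ... and inject it into the restored process */
	uc.src = (unsigned long)page_buf;
	uc.len = page_size;	/* adjacent pages could be batched here */
	return ioctl(uffd, UFFDIO_COPY, &uc);
}

Tracking requested/transferred/missing pages would then just be
bookkeeping around this loop, which is why keeping it in the uffd
daemon looks natural to me.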

> > > Do we agree on these points? If yes, I would like to start to implement
> > > it that way. If we get to the point where this works, it will still
> > > require a lot of work on the tooling. For example, how to split out the
> > > lazy-pages from an existing dump, so that only the non-lazy-pages are
> > > actually transferred to the destination system.
> > > 
> > > 		Adrian
> > > 
> > 

		Adrian

