[CRIU] Lazy-restore design discussion - round 2

Mon Apr 18 04:13:47 PDT 2016

On Mon, Apr 18, 2016 at 12:47:00PM +0200, Adrian Reber wrote:
> On Mon, Apr 18, 2016 at 01:10:05PM +0300, Mike Rapoport wrote:
> > On Mon, Apr 18, 2016 at 12:31:14PM +0300, Pavel Emelyanov wrote:
> > > On 04/18/2016 10:46 AM, Adrian Reber wrote:
> > > > It seems we have reached some kind of agreement and therefore
> > > > I am trying to summarize, from my point of view, our current discussion
> > > > results.
> > > 
> > > Thanks for keeping track of this :)
> > 
> > Adrian, you've beaten me to this :)
> >  
> > > >  * The UFFD daemon does not need a checkpoint directory to run, all
> > > >    required information will be transferred over the network.
> > > >    e.g. PID and pages
> > > 
> > > I would still read process tree from images dir.
> > 
> > +1
> 
> Hmm, then I don't get how the patches "lazy-pages: handle multiple
> processes" fit into it. How can the uffd daemon handle multiple restore
> requests when it needs to know where the checkpoint directory is? As I
> understand it I can either start it without '-D' and it gets the
> information about the pagemap-s from somewhere else (the network) or I
> can start the uffd daemon with '-D', but why should it then handle
> multiple requests as all the required information is in a directory
> specified on the command-line, which can change for every restored
> process. From my point of view this seems contradictory.

I think we have a huge misunderstanding here. My view was that the
uffd daemon behaves somewhat similarly to the page-server daemon. It is
launched with -D, waits for restore to pass it the uffds, and then handles
the #PFs that come from the processes being restored. To take care of a
#PF, the uffd daemon either reads pages*.img from the checkpoint directory
or gets the pages from the dump side over the network.

In my understanding, "multiple requests" means that a single uffd daemon
takes care of the entire process tree present in the checkpoint directory...
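
To make this concrete, here is a rough sketch of the flow I have in mind.
It is Python rather than CRIU's C and is purely illustrative: all names
(LazyPagesDaemon, local_pages, fetch_remote) are hypothetical and not
existing CRIU code, the images are simulated with a dict, and the uffd
handles are opaque placeholders.

```python
PAGE_SIZE = 4096


class LazyPagesDaemon:
    """Toy model of one daemon serving faults for a whole process tree."""

    def __init__(self, local_pages, fetch_remote):
        # local_pages: {(pid, page_addr): bytes}, standing in for pages*.img
        # fetch_remote: callable(pid, page_addr) -> bytes, standing in for
        # a request to the dump side over the network (assumption)
        self.local_pages = local_pages
        self.fetch_remote = fetch_remote
        self.uffds = {}  # pid -> uffd handle received from the restore side

    def register(self, pid, uffd):
        """Record the uffd that restore handed over for one process."""
        self.uffds[pid] = uffd

    def handle_fault(self, pid, fault_addr):
        """Resolve one #PF: try the local images first, then the network."""
        if pid not in self.uffds:
            raise KeyError(f"no uffd registered for pid {pid}")
        page_addr = fault_addr & ~(PAGE_SIZE - 1)  # round down to page start
        data = self.local_pages.get((pid, page_addr))
        source = "local"
        if data is None:
            data = self.fetch_remote(pid, page_addr)
            source = "remote"
        return page_addr, source, data


if __name__ == "__main__":
    daemon = LazyPagesDaemon(
        local_pages={(100, 0x1000): b"A" * PAGE_SIZE},
        fetch_remote=lambda pid, addr: b"B" * PAGE_SIZE,
    )
    daemon.register(100, uffd="uffd-100")
    daemon.register(101, uffd="uffd-101")
    print(daemon.handle_fault(100, 0x1234)[:2])  # (4096, 'local')
    print(daemon.handle_fault(101, 0x2010)[:2])  # (8192, 'remote')
```

The point is only that one daemon instance keeps a per-process table of
uffds, so a single -D directory is enough for the whole tree.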

> > > >  * The page-server protocol needs to be extended to transfer the
> > > >    lazy-restore pages list from the source system to the UFFD daemon.
> > > 
> > > You mean the pagemap-s? But we have the images dir on the destination node,
> > > uffd can read this information from there as well.
> 
> See above, this is still unclear to me. If the uffd daemon can
> handle multiple requests, then it cannot read information from an
> images directory. We could also transfer the location of the images
> directory from the restore process in the same way as the PID and UFFD
> (unix domain socket). Then the uffd daemon would know where the
> directory is but does not need it on the command-line. It would still
> require access to the local file system, which could be avoided by
> transferring the pagemap-s from somewhere else.
> 
> > That means that the dump side should be taught to split pagemap creation and
> > the actual page dump, right?
> 
> This is part of the tooling work I mentioned below. We probably need
> tools to split a checkpoint into a lazy-part and a part which needs to
> be transferred and the tools also should be able to combine a previously
> lazy-restore checkpoint back to a 'normal' checkpoint directory.
> 
> > > >  * The UFFD daemon is the instance which decides which pages are pushed
> > > >    when via UFFD into the restored process.
> > > 
> > > Not sure I understand this correctly. The UFFD daemon gets #PF-s from tasks
> > > and sends the requests to source node. The source node sends pages onto
> > > destination side (some go in out-of-order mode when #PF request from UFFD
> > > is received). The UFFD injects pages into processes right upon receiving.
> 
> This then also needs to be decided: which part is responsible for the
> pages transferred, meaning which part knows which pages have been
> requested, which pages have been transferred and which pages are
> missing. Also, improvements such as sending adjacent pages together need
> to be handled somewhere. This can either be the uffd daemon on the
> destination system or the page-server on the source system. To me it
> feels more correct for this to be done by the uffd daemon and not by the
> source system.
> That is also the reason that it either needs to read the image directory
> or get a list of pages in the pagemap suitable for lazy restore. It
> would enable us to leave out any UFFD logic on the source node as it
> only needs to be a page-server. If we implement it on the source node
> parts of the page-server code need to be more UFFD aware (sending
> adjacent pages, sending remaining pages) but on the other hand the uffd
> daemon could be reduced to a simple lazy-pages forwarder.
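
Wherever this bookkeeping ends up living, it could look roughly like the
strawman below. Again this is illustrative Python, not CRIU code: the class
name LazyPageTracker, its methods and the "window" parameter are all made
up for the example, and the set of lazy pages is assumed to be known up
front (from the pagemap-s, however they are obtained).

```python
PAGE_SIZE = 4096


class LazyPageTracker:
    """Tracks which lazy pages are missing, requested or transferred."""

    def __init__(self, lazy_pages, window=4):
        self.missing = set(lazy_pages)  # not yet requested
        self.requested = set()          # requested, not yet received
        self.transferred = set()        # received and injected
        self.window = window            # max pages per request (assumption)

    def on_fault(self, page_addr):
        """Return the page addresses to request for this fault."""
        if page_addr not in self.missing:
            return []  # already in flight, already done, or not a lazy page
        batch = [page_addr]
        nxt = page_addr + PAGE_SIZE
        # Pull in adjacent pages that are still missing, up to the window.
        while len(batch) < self.window and nxt in self.missing:
            batch.append(nxt)
            nxt += PAGE_SIZE
        self.missing.difference_update(batch)
        self.requested.update(batch)
        return batch

    def on_received(self, page_addr):
        """Mark a page as transferred, whether or not it was requested."""
        self.missing.discard(page_addr)
        self.requested.discard(page_addr)
        self.transferred.add(page_addr)

    def remaining(self):
        """Pages nobody has asked for yet, e.g. for a background push."""
        return sorted(self.missing)
```

The same structure would work on either side; the open question is only
which side owns it.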
> 
> > > > Do we agree on these points? If yes, I would like to start to implement
> > > > it that way. If we get to the point where this works it still requires
> > > > a lot of work on the tooling. For example how to split out the lazy-pages
> > > > from an existing dump, so that only the non-lazy-pages are actually
> > > > transferred to the destination system.
> > > > 
> > > > 		Adrian
> > > > .
> > > > 
> > > 
> 
> 		Adrian