[CRIU] Lazy-restore design discussion - round 2

Mon Apr 18 03:10:05 PDT 2016

On Mon, Apr 18, 2016 at 12:31:14PM +0300, Pavel Emelyanov wrote:
> On 04/18/2016 10:46 AM, Adrian Reber wrote:
> > It seems we have reached some kind of agreement and therefore
> > I am trying to summarize, from my point of view, our current discussion
> > results.
> 
> Thanks for keeping track of this :)

Adrian, you've beaten me to this :)

> >  * The UFFD daemon does not need a checkpoint directory to run; all
> >    required information (e.g. PID and pages) will be transferred over
> >    the network.
> 
> I would still read the process tree from the images dir.

+1

> >  * The page-server protocol needs to be extended to transfer the
> >    lazy-restore pages list from the source system to the UFFD daemon.
> 
> You mean the pagemap-s? But we have the images dir on the destination node,
> uffd can read this information from there as well.

This means that the dump side should be taught to split pagemap creation
from the actual page dump, right?
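
To check that we mean the same thing by "split", here is a rough model in
Python. It is not CRIU code: the `lazy` flag, the `PagemapEntry` layout and
the `split_dump` name are made up for illustration, and the 4096-byte page
size is an assumption. The point is only that the dump side would always emit
the full pagemap, while page contents are written for the non-lazy entries
only.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size, for illustration only


@dataclass(frozen=True)
class PagemapEntry:
    """Hypothetical pagemap record; not the real CRIU image layout."""
    vaddr: int      # start address of the region
    nr_pages: int   # number of consecutive pages in the region
    lazy: bool      # hypothetical flag: page data stays on the source node


def split_dump(entries):
    """Return (pagemap, eager_pages).

    pagemap describes every region, lazy or not, so the destination knows
    the full layout. eager_pages lists only the page addresses whose
    contents are dumped and transferred up front.
    """
    pagemap = list(entries)
    eager_pages = [
        entry.vaddr + i * PAGE_SIZE
        for entry in pagemap
        if not entry.lazy
        for i in range(entry.nr_pages)
    ]
    return pagemap, eager_pages
```

With one eager region of two pages and one lazy region of three pages, the
pagemap would still hold both entries, but only the two eager page addresses
would end up in the page dump.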

> >  * The UFFD daemon is the instance which decides which pages are pushed
> >    when via UFFD into the restored process.
> 
> Not sure I understand this correctly. The UFFD daemon gets #PF-s from tasks
> and sends the requests to the source node. The source node sends pages to the
> destination side (some go in out-of-order mode when a #PF request from UFFD
> is received). The UFFD daemon injects pages into processes right upon
> receiving them.
> 
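The flow described above can be modelled in a few lines of Python. This is a
toy simulation, not the page-server protocol: `source_send_order` and the
notion of an "arrival step" are hypothetical, and it assumes the source sends
one page per step. It only shows how a #PF request lets a page jump ahead of
the background order; the daemon side would inject pages in exactly the order
returned.

```python
def source_send_order(lazy_pages, faults):
    """Return the order in which the source node sends lazy pages.

    lazy_pages: page addresses in the default (background) send order.
    faults: list of (arrival_step, addr) pairs, where arrival_step is the
            number of pages already sent when the #PF request reaches the
            source. Both names are hypothetical.
    """
    pending = list(lazy_pages)
    queue = sorted(faults)  # process requests in arrival order
    sent = []
    while pending:
        urgent = None
        # Look at the requests that have arrived by now.
        while queue and queue[0][0] <= len(sent):
            _, addr = queue.pop(0)
            if addr in pending:
                urgent = addr
                break
            # Page was already sent (or is not lazy): nothing to do.
        # A faulted page goes out of order; otherwise keep the default order.
        page = urgent if urgent is not None else pending[0]
        pending.remove(page)
        sent.append(page)
    return sent
```

For example, with four lazy pages and a fault on the last one arriving after
the first page has been sent, the faulted page would be sent second and the
remaining pages would follow in their original order.
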
> > Do we agree on these points? If yes, I would like to start to implement
> > it that way. If we get to the point where this works it still requires
> > a lot of work on the tooling. For example how to split out the lazy-pages
> > from an existing dump, so that only the non-lazy-pages are actually
> > transferred to the destination system.
> > 
> > 		Adrian