[CRIU] Remote lazy-restore design discussion

Adrian Reber adrian at lisas.de
Mon Apr 11 03:29:23 PDT 2016


On Fri, Apr 08, 2016 at 07:41:18PM +0300, Pavel Emelyanov wrote:
[...]
> >>>  * Second the dumped information is transferred (scp/rsync) to the
> >>>    destination.
> >>
> >> Note that _some_ memory contents will be sent to the page server using
> >> the dump-connects-to-restore method.
> > 
> > Not sure I understand this.
> 
> I meant that while doing the final dump some pages' contents will be sent
> to the destination node via the page server and thus, at restore time, will
> be present there as on-disk images (or on-tmpfs images).

Ah, so all pages which cannot be handled by uffd, right?

[...]

> >> I don't have strong arguments for dump->restore connection, since if
> >> you look at how p.haul works, it doesn't use this criu connect feature,
> >> it passes on a pre-established descriptor, on both sides. So the question
> >> of which side connects to which can be solved either way.
> > 
> > Yes, probably right. Which side connects to which is not really
> > important. That was just the first point which seemed wrong from the
> > order of steps I expected.
> > 
> > I thought some more about the different designs currently available.
> > Right now I can use my first implementation, which provides its own
> > protocol for page exchange (uffd-struct-based), and the one based
> > upon Mike's page server client (page-server-based).
> 
> I would unify page-server and uffd protocols at least in terms of
> messages they exchange.

Makes sense.

> > In my first implementation (uffd-struct-based) the logic deciding which
> > pages should be copied was running on the source system
> > (uffd-remote-server). It reacted to requests and transferred unrequested
> > pages at the end. To know which pages need to be transferred it had to
> > parse the checkpoint directory.
> 
> Or keep the information obtained while doing "dump" action. No?

Which is then transferred via the network to the uffd-daemon listening on
the uffd? Is there already a protocol in the page-server to transfer
additional (non-page) data? Or would this mean just an additional
page-server command with its own function handling it? Probably.
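
A rough sketch of what such an additional command could look like, only to
make the idea concrete. This is not CRIU's actual page-server protocol: the
header layout, the command numbers, and all names (CMD_GET_PAGE,
CMD_PAGE_DATA, CMD_PAGE_MAP, pack_page_map) are hypothetical. The point is
that page requests, page data, and the non-page "which pages exist"
information could share one message header.

```python
import struct

# Hypothetical unified header: command (u32), address (u64), payload length (u32).
# Network byte order, no padding, 16 bytes in total.
HEADER = struct.Struct("!IQI")

# Hypothetical command numbers, not taken from CRIU.
CMD_GET_PAGE = 1   # restore side asks for one page
CMD_PAGE_DATA = 2  # dump side answers with the page contents
CMD_PAGE_MAP = 3   # dump side sends non-page metadata: the list of page ranges

# One entry of the page map: start address (u64), number of pages (u32).
RANGE = struct.Struct("!QI")


def pack_msg(cmd, vaddr=0, payload=b""):
    """Build one message: fixed header followed by an optional payload."""
    return HEADER.pack(cmd, vaddr, len(payload)) + payload


def unpack_msg(buf):
    """Split one message into (command, address, payload)."""
    cmd, vaddr, length = HEADER.unpack_from(buf)
    payload = buf[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated message")
    return cmd, vaddr, payload


def pack_page_map(ranges):
    """Encode [(start_addr, nr_pages), ...] as a CMD_PAGE_MAP message."""
    body = b"".join(RANGE.pack(start, nr) for start, nr in ranges)
    return pack_msg(CMD_PAGE_MAP, payload=body)


def unpack_page_map(payload):
    """Decode the payload of a CMD_PAGE_MAP message back into a list of ranges."""
    return [RANGE.unpack_from(payload, off)
            for off in range(0, len(payload), RANGE.size)]
```

With something like this, the receiving side would dispatch on the command
field and hand CMD_PAGE_MAP to its own handler function, while page requests
and page data keep using the same header.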

> > The uffd-daemon on the destination side was just
> > forwarding pages to and from uffd and the network socket. It needed to
> > know how to handle uffd requests but it did not require any knowledge of
> > the actual checkpoint and about which pages are available.
> > 
> > In the page-server-based-remote-restore the page-server-on-the-dest-host
> > has to know how to get each page, and the uffd-daemon on the
> > destination side has to parse the checkpoint directory to know which pages
> > are part of the restored process and which pages have not yet been
> > transferred.
> > 
> > I am trying to say that in my original not-page-server-related
> > implementation the uffd daemon has no need to know the details about the
> > pages and in the page-server-based implementation all included
> > parts/daemons/page-servers need to know which pages need to be
> > transferred.
> 
> Ah, so your question is who should decide which page to push into the socket
> next -- uffd-side (restore-side) or the dump-side?

Yes. I think I like it better if only two of the three processes involved
in a lazy-pages restore need to open the checkpoint directory.
Transferring it over the network (as described above) would also solve it.
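
To illustrate the "dump side decides" variant, here is a minimal sketch of
the ordering logic, assuming the dump side already has the list of page
addresses. The class and method names (DumpSidePageQueue, request,
next_page) are made up for this example and are not existing CRIU code.
Requested pages go out first, and whatever was never requested is pushed at
the end, as in the uffd-struct-based implementation.

```python
from collections import deque


class DumpSidePageQueue:
    """Hypothetical dump-side decision logic: which page goes into the socket next."""

    def __init__(self, page_addrs):
        # Pages not yet transferred; a dict is used as an insertion-ordered set.
        self.pending = dict.fromkeys(page_addrs)
        # Addresses requested by the restore side (page faults seen via uffd).
        self.requested = deque()

    def request(self, addr):
        """Record a request coming from the restore side."""
        if addr in self.pending:
            self.requested.append(addr)

    def next_page(self):
        """Return the next address to send, or None when everything was sent."""
        # Requested pages have priority over background transfer.
        while self.requested:
            addr = self.requested.popleft()
            if addr in self.pending:  # skip duplicate requests already served
                del self.pending[addr]
                return addr
        # Nothing requested: push the remaining pages in their original order.
        if self.pending:
            addr = next(iter(self.pending))
            del self.pending[addr]
            return addr
        return None
```

In this variant the uffd daemon on the destination only forwards requests
and pages; it never needs to look at the checkpoint directory itself.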

		Adrian

