[CRIU] Lazy-restore design discussion - round 2

Adrian Reber adrian at lisas.de
Mon Apr 18 00:46:54 PDT 2016


It seems we have reached some kind of agreement, so I am trying to
summarize, from my point of view, the current results of our
discussion.

 * On the source system there will be a process listening on a network
   socket. In the first implementation it will use a checkpoint
   directory as the basis for the UFFD pages and in a later version
   it will transfer the pages directly from the checkpointed process.

 * The transport protocol between the source system and the UFFD daemon
   on the destination will be page-server based (something like Mike's
   patch).

 * The UFFD daemon will be able to handle multiple restore requests
   (see also Mike's patch "lazy-pages: handle multiple processes").

 * The UFFD daemon does not need a checkpoint directory to run; all
   required information (e.g. PID and pages) will be transferred over
   the network.

 * The page-server protocol needs to be extended to transfer the
   lazy-restore pages list from the source system to the UFFD daemon.

 * The UFFD daemon is the instance which decides which pages are pushed
   into the restored process via UFFD, and when.
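
To make the protocol extension above a bit more concrete, here is a
minimal sketch of how a lazy-restore pages list for one PID could be
serialized on the source system and parsed by the UFFD daemon. The wire
layout, the field sizes and the function names are all assumptions made
up for illustration; this is not the existing page-server format and
not what Mike's patch does.

```python
import struct

# Hypothetical wire layout (NOT the real page-server format):
#   header: pid (u32), number of regions (u32)
#   region: start address (u64), number of pages (u32)
# "!" selects network byte order without padding.
HEADER = struct.Struct("!II")
REGION = struct.Struct("!QI")


def encode_lazy_list(pid, regions):
    """Serialize a list of (start_address, nr_pages) regions for one PID."""
    out = [HEADER.pack(pid, len(regions))]
    for vaddr, nr_pages in regions:
        out.append(REGION.pack(vaddr, nr_pages))
    return b"".join(out)


def decode_lazy_list(buf):
    """Inverse of encode_lazy_list; returns (pid, regions)."""
    pid, count = HEADER.unpack_from(buf, 0)
    offset = HEADER.size
    regions = []
    for _ in range(count):
        vaddr, nr_pages = REGION.unpack_from(buf, offset)
        regions.append((vaddr, nr_pages))
        offset += REGION.size
    return pid, regions
```

With something like this the daemon would learn, per PID, which address
ranges it has to serve later, without needing a checkpoint directory of
its own.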

Do we agree on these points? If yes, I would like to start implementing
it that way. If we get to the point where this works, it will still
require a lot of work on the tooling, for example how to split out the
lazy-pages from an existing dump so that only the non-lazy-pages are
actually transferred to the destination system.
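
As a rough idea of what such a split could look like, the sketch below
partitions the pages of a dump into a part that is transferred up front
and a part that stays on the source system. How a dump is represented
and how a page gets classified as lazy are not decided yet; the dict of
address to page content and the set of lazy addresses are purely
hypothetical stand-ins.

```python
def split_dump(pages, lazy_addrs):
    """Partition pages into (eager, lazy).

    pages:      dict mapping page address -> page content (hypothetical
                in-memory stand-in for the pages of a dump)
    lazy_addrs: set of addresses that should be restored lazily
    """
    eager = {}
    lazy = {}
    for addr, content in pages.items():
        if addr in lazy_addrs:
            lazy[addr] = content   # stays on the source system
        else:
            eager[addr] = content  # transferred to the destination
    return eager, lazy
```

Only the eager part would then have to be copied to the destination
system before the restore starts.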

		Adrian

