[CRIU] Remote lazy-restore design discussion

Adrian Reber adrian at lisas.de
Tue Apr 5 01:52:53 PDT 2016


On Mon, Apr 04, 2016 at 04:06:50PM +0300, Pavel Emelyanov wrote:
> On 03/31/2016 05:25 PM, Adrian Reber wrote:
> > since Mike asked whether there have been any design discussions, and since
> > I am not 100% sure how the page-server fits into the remote restore, it
> > seems to be a good idea to get a common understanding of what the right
> > implementation for remote lazy-restore should look like.
> > 
> > I am using my implementation as a starting point for the discussion.
> > 
> > I think we need three different processes for remote lazy restore,
> > independent of how they are started. 'destination system' is the system
> > the process should be migrated to and 'source system' is the system the
> > original process was running on before the migration.
> > 
> >  1. The actual restore process (destination system):
> >      This is a 'normal' restore with the difference that memory pages
> >      (MAP_ANONYMOUS and MAP_PRIVATE) are not copied to their place
> >      but they are marked as being handled by userfaultfd. Therefore
> >      a userfaultfd FD (UFFD) is opened and passed to a second process.
> > 
> >  2. The local lazy restore UFFD handler (destination system):
> >      This process listens on the UFFD for userfault requests and tries to
> >      handle them, either by reading the required pages from a local
> >      checkpoint (a rather unlikely use case) or by requesting the pages
> >      from a remote system (the source system) via the network.
> > 
> >  3. The remote lazy restore page request handler (source system):
> >      This process opens a network port and listens for page requests
> >      and reads the requested pages from a local checkpoint (or even
> >      better, directly from a stopped process).
> 
> Agreed. And the process #1 would eventually turn into the restored process(es).
> 
> I would also add that process 3 should not only listen for page requests, but
> also send other pages in the background. Probably the ideal process 3 should:
>
> 1. Have a queue of pages to be sent (struct page_server_iov-s)
> 2. Fill it with pages that have not been transferred yet (ANON|PRIVATE)
> 3. Start sending them one by one
> 4. Receive messages from process #2 that can move some items to the
>    front of the queue (i.e. the pages that are needed right now)

I completely agree with this. This has to be the end result.
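
To make the four steps above concrete, here is a minimal sketch of such a
send queue. It is only an illustration in Python, not CRIU code: the names
PageSendQueue, request, next_page and drain are hypothetical, pages are
represented by bare addresses instead of struct page_server_iov entries, and
the network transfer is replaced by appending to a list.

```python
from collections import OrderedDict


class PageSendQueue:
    """Hypothetical send queue for the source-side page request handler."""

    def __init__(self, pending_addrs):
        # Steps 1 and 2: queue every page that has not been transferred yet.
        self._queue = OrderedDict((addr, None) for addr in pending_addrs)

    def request(self, addr):
        """Step 4: a fault on the destination moves the page to the front."""
        if addr in self._queue:
            self._queue.move_to_end(addr, last=False)
            return True
        return False  # already sent, or never queued

    def next_page(self):
        """Step 3: pop the next page to send, or None if nothing is left."""
        if not self._queue:
            return None
        addr, _ = self._queue.popitem(last=False)
        return addr


def drain(queue, requests_by_step):
    """Send all pages in order (sending is simulated by a list append).

    requests_by_step maps a send step to an address that the destination
    asks for just before that step.
    """
    sent = []
    step = 0
    while True:
        if step in requests_by_step:
            queue.request(requests_by_step[step])
        addr = queue.next_page()
        if addr is None:
            return sent
        sent.append(addr)
        step += 1
```

With four queued pages 0x1000, 0x2000, 0x3000 and 0x4000 and a request for
0x4000 arriving before the second send, drain() returns the pages in the
order 0x1000, 0x4000, 0x2000, 0x3000: background sending continues, but the
page that is needed right now jumps ahead of the rest.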

		Adrian
