[CRIU] Remote lazy-restore design discussion
Adrian Reber
adrian at lisas.de
Tue Apr 5 01:52:53 PDT 2016
On Mon, Apr 04, 2016 at 04:06:50PM +0300, Pavel Emelyanov wrote:
> On 03/31/2016 05:25 PM, Adrian Reber wrote:
> > Since Mike asked whether there have been any design discussions, and since
> > I am not 100% sure how the page-server fits into the remote restore, it
> > seems like a good idea to reach a common understanding of what the right
> > implementation for remote lazy-restore should look like.
> >
> > I am using my implementation as a starting point for the discussion.
> >
> > I think we need three different processes for remote lazy restore,
> > independent of how they are started. The 'destination system' is the system
> > the process should be migrated to and the 'source system' is the system the
> > original process was running on before the migration.
> >
> > 1. The actual restore process (destination system):
> > This is a 'normal' restore, with the difference that memory pages
> > (MAP_ANONYMOUS and MAP_PRIVATE) are not copied into place
> > but are instead marked as being handled by userfaultfd. To that end,
> > a userfaultfd FD (UFFD) is opened and passed to a second process.
> >
> > 2. The local lazy restore UFFD handler (destination system):
> > This process listens on the UFFD for userfault requests and tries to
> > handle them, either by reading the required pages from a local
> > checkpoint (a rather unlikely use case) or by requesting the pages
> > from a remote system (the source system) via the network.
> >
> > 3. The remote lazy restore page request handler (source system):
> > This process opens a network port and listens for page requests
> > and reads the requested pages from a local checkpoint (or even
> > better, directly from a stopped process).
>
> Agreed. And the process #1 would eventually turn into the restored process(es).
>
> I would also add that process 3 should not only listen for page requests, but
> also send other pages in the background. Probably the ideal process 3 should:
>
> 1. Have a queue of pages to be sent (struct page_server_iov-s)
> 2. Fill it with pages that were not transferred (ANON|PRIVATE)
> 3. Start sending them one by one
> 4. Receive messages from process #2 that can move some items to the
> top of the queue (i.e. the pages that are needed right now)
I completely agree with this. This has to be the end result.
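
For illustration only, the queue behaviour described in the four steps above
could be sketched roughly as follows. This is not CRIU code: the names
PageSendQueue, request, next_page and drain are hypothetical, each queue
entry is reduced to a bare page address instead of a struct page_server_iov,
and the 4096-byte page size is an assumption.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size, for illustration only


class PageSendQueue:
    """Hypothetical queue of pages that still have to be transferred."""

    def __init__(self, page_addrs):
        # An OrderedDict keeps the background send order and allows a
        # pending page to be moved to the front cheaply.
        self._pending = OrderedDict((addr, None) for addr in page_addrs)

    def request(self, addr):
        """Handle an on-demand request: move the page to the front.

        Returns False if the page is not pending (already sent or unknown).
        """
        if addr not in self._pending:
            return False
        self._pending.move_to_end(addr, last=False)
        return True

    def next_page(self):
        """Return the next page address to send, or None if nothing is left."""
        if not self._pending:
            return None
        addr, _ = self._pending.popitem(last=False)
        return addr


def drain(queue, requests_by_step):
    """Send pages one by one and return the order in which they were sent.

    requests_by_step maps a send step number to a page address that is
    requested just before that step, simulating a page fault on the
    destination system.
    """
    sent = []
    step = 0
    while True:
        if step in requests_by_step:
            queue.request(requests_by_step[step])
        addr = queue.next_page()
        if addr is None:
            return sent
        sent.append(addr)
        step += 1


if __name__ == "__main__":
    pages = [i * PAGE_SIZE for i in range(5)]
    # Page 3 is requested after the first page has already been sent.
    print(drain(PageSendQueue(pages), {1: 3 * PAGE_SIZE}))
```

In this sketch the requested page jumps ahead of the remaining background
pages, and a request for a page that has already been sent is simply ignored.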
Adrian