[CRIU] Remote lazy-restore design discussion
Pavel Emelyanov
xemul at virtuozzo.com
Mon Apr 4 06:06:50 PDT 2016
On 03/31/2016 05:25 PM, Adrian Reber wrote:
> Hello Pavel,
>
> after Mike asked if there have been any design discussions and after I
> am not 100% sure how the page-server fits into the remote restore, it
> seems to be a good idea to get a common understanding what the right
> implementation for remote lazy-restore should look like.
>
> I am using my implementation as a starting point for the discussion.
>
> I think we need three different processes for remote lazy restore,
> independent of how they are started. The 'destination system' is the
> system the process should be migrated to and the 'source system' is the
> system the original process was running on before the migration.
>
> 1. The actual restore process (destination system):
> This is a 'normal' restore with the difference that memory pages
> (MAP_ANONYMOUS and MAP_PRIVATE) are not copied into place but are
> marked as being handled by userfaultfd. Therefore
> a userfaultfd FD (UFFD) is opened and passed to a second process.
>
> 2. The local lazy restore UFFD handler (destination system):
> This process listens on the UFFD for userfault requests and tries to
> handle them, either by reading the required pages from a local
> checkpoint (a rather unlikely use case) or by requesting the pages
> from the remote (source) system via the network.
>
> 3. The remote lazy restore page request handler (source system):
> This process opens a network port and listens for page requests
> and reads the requested pages from a local checkpoint (or even
> better, directly from a stopped process).
Agreed. And the process #1 would eventually turn into the restored process(es).
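For illustration, here is a minimal sketch (not actual CRIU code) of what step 1 amounts to
on the restore side: the anonymous private area is left unpopulated, registered with
userfaultfd, and the descriptor is handed to the handler process. The send_fd_to_handler()
helper is hypothetical and stands for passing the fd over a unix socket with SCM_RIGHTS.

#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* hypothetical: ship the fd over a unix socket (SCM_RIGHTS) */
extern int send_fd_to_handler(int fd);

static int lazy_register(void *addr, unsigned long len)
{
	struct uffdio_api api = { .api = UFFD_API };
	struct uffdio_register reg = {
		.range = { .start = (unsigned long)addr, .len = len },
		.mode  = UFFDIO_REGISTER_MODE_MISSING,
	};
	int uffd;

	/* faults in [addr, addr + len) will be reported on uffd */
	uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
	if (uffd < 0)
		return -1;
	if (ioctl(uffd, UFFDIO_API, &api) || ioctl(uffd, UFFDIO_REGISTER, &reg))
		return -1;

	/* hand the UFFD over to the lazy-pages handler (process #2) */
	return send_fd_to_handler(uffd);
}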
I would also add that process 3 should not only listen for page requests, but
also send other pages in the background. Probably the ideal process 3 should
1. Have a queue of pages to be sent (struct page_server_iov-s)
2. Fill it with pages that were not transferred (ANON|PRIVATE)
3. Start sending them one by one
4. Receive messages from process #2 that can move some items in
the queue to the top (i.e. -- the pages that are needed right now)
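As a rough, non-authoritative sketch of such a queue (the iov layout below is a
simplified stand-in, not the real struct page_server_iov): background ranges sit
on a list in dump order, and a page request from process #2 promotes the covering
range to the head so it is sent next.

#include <stdbool.h>
#include <stdint.h>

struct lazy_iov {
	uint64_t vaddr;          /* start of the range in the target task */
	uint32_t nr_pages;       /* length of the range in pages */
	struct lazy_iov *next;
};

struct send_queue {
	struct lazy_iov *head;   /* next range to push over the network */
};

/* A fault report for vaddr promotes the covering range to the front. */
static bool promote_iov(struct send_queue *q, uint64_t vaddr, uint64_t page_size)
{
	struct lazy_iov **p, *iov;

	for (p = &q->head; (iov = *p); p = &iov->next) {
		if (vaddr >= iov->vaddr &&
		    vaddr < iov->vaddr + (uint64_t)iov->nr_pages * page_size) {
			*p = iov->next;       /* unlink */
			iov->next = q->head;  /* and push to the head */
			q->head = iov;
			return true;
		}
	}
	return false;                 /* page already sent or never queued */
}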
> As this describes the solution I have implemented, it all sounds correct
> to me. In addition to handling requests for pages (processes 2. and 3.)
> both page handlers need to know how to push unrequested pages at some
> point in time to make sure the migration can finish.
>
> Looking at the page-server it is currently not clear to me how it fits
> into this scenario. Currently it listens on a network port (like process
> 3. from above) and writes the received pages to the local disk.
Not exactly. It redirects pages from the socket into a particular page_xfer. Right
now the page server process only uses the local xfer, which results in pages
being written to disk.
Also, the page server includes page_server_xfer, which is used by criu dump
to send the pages, and this thing should be used by process 3.
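To make that concrete, a rough sketch of the idea (the struct below is a simplified
stand-in for CRIU's page_xfer interface, not its actual definition): the page server
pulls a range off the socket and forwards it to whichever backend was selected, be it
the local one writing image files or a remote one pushing pages further on.

#include <stddef.h>
#include <stdint.h>

struct xfer_backend {
	/* record that [vaddr, vaddr + len) belongs to the image */
	int (*write_pagemap)(void *priv, uint64_t vaddr, size_t len);
	/* push the page payload itself (to disk, or back onto a socket) */
	int (*write_pages)(void *priv, const void *buf, size_t len);
	void *priv;
};

static int forward_range(struct xfer_backend *xfer,
			 uint64_t vaddr, const void *buf, size_t len)
{
	if (xfer->write_pagemap(xfer->priv, vaddr, len))
		return -1;
	return xfer->write_pages(xfer->priv, buf, len);
}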
> To serve as the process described above as process 3., it would need to
> learn all the functionality that has currently been implemented.
You mean the page server should be taught to work with uffd? Well, kinda yes.
When I was talking about the uffd daemon using the page server, I meant that the
uffd process (#2 in your classification) should use the page server protocol and
a new page_xfer to transfer pages between hosts. And process #3 should use the
standard page_server_xfer to transfer pages onto the remote host.
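Roughly, the fault-handling path of such a uffd daemon could look like the sketch
below; request_remote_page() is a hypothetical placeholder for a page-server-protocol
request sent to process #3 on the source host.

#include <linux/userfaultfd.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* hypothetical: ask the source host for one page and copy it into buf */
extern int request_remote_page(uint64_t vaddr, void *buf, size_t page_size);

static int handle_one_fault(int uffd, void *page_buf, size_t page_size)
{
	struct uffd_msg msg;
	uint64_t addr;

	if (read(uffd, &msg, sizeof(msg)) != sizeof(msg))
		return -1;
	if (msg.event != UFFD_EVENT_PAGEFAULT)
		return 0;                     /* nothing to do for other events */

	addr = msg.arg.pagefault.address & ~((uint64_t)page_size - 1);

	/* fetch the faulting page from the source side (process #3) */
	if (request_remote_page(addr, page_buf, page_size))
		return -1;

	/* drop the page into the restored task and wake the faulting thread */
	struct uffdio_copy copy = {
		.dst = addr,
		.src = (unsigned long)page_buf,
		.len = page_size,
		.mode = 0,
	};
	return ioctl(uffd, UFFDIO_COPY, &copy);
}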
> Instead of receiving pages and writing them to disk, it needs to
> receive page requests and read the pages from disk and send them over the network.
Why to disk? For post-copy live migration, using the disk for images should
be avoided as much as possible.
> This sounds like the opposite of what it is currently doing and,
> from my point of view, it is either a completely separate process,
> like my implementation, or all the functionality needs to be added.
> Also the logic to handle unrequested pages does not seem like
> something which the page-server can currently do or is designed to do.
>
> So, from my point of view, page-server and remote page request handler
> seem rather different in their functionality (besides being a TCP
> server). I suppose there are some points I am not seeing so I hope to
> understand the situation better from the answers to this email. Thanks.
Probably I was not correct when I used the word "page-server". I meant the
components used by it, but you thought of it as a process itself :)
-- Pavel