[CRIU] Remote lazy-restore design discussion

Tue Apr 5 09:06:30 PDT 2016

On 04/05/2016 11:52 AM, Adrian Reber wrote:
> On Mon, Apr 04, 2016 at 04:06:50PM +0300, Pavel Emelyanov wrote:
>> On 03/31/2016 05:25 PM, Adrian Reber wrote:
>>> since Mike asked whether there have been any design discussions, and since
>>> I am not 100% sure how the page-server fits into the remote restore, it
>>> seems to be a good idea to reach a common understanding of what the right
>>> implementation for remote lazy-restore should look like.
>>>
>>> I am using my implementation as a starting point for the discussion.
>>>
>>> I think we need three different processes for remote lazy restore,
>>> independent of how they are started. 'destination system' is the system
>>> the process should be migrated to and 'source system' is the system the
>>> original process was running on before the migration.
>>>
>>>  1. The actual restore process (destination system):
>>>      This is a 'normal' restore with the difference that memory pages
>>>      (MAP_ANONYMOUS and MAP_PRIVATE) are not copied to their place
>>>      but they are marked as being handled by userfaultfd. Therefore
>>>      a userfaultfd FD (UFFD) is opened and passed to a second process.
>>>
>>>  2. The local lazy restore UFFD handler (destination system):
>>>      This process listens on the UFFD for userfault requests and handles
>>>      them either by reading the required pages from a local checkpoint
>>>      (a rather unlikely use case) or by requesting the pages from a
>>>      remote system (source system) via the network.
>>>
>>>  3. The remote lazy restore page request handler (source system):
>>>      This process opens a network port and listens for page requests
>>>      and reads the requested pages from a local checkpoint (or even
>>>      better, directly from a stopped process).
>>
>> Agreed. And the process #1 would eventually turn into the restored process(es).
>>
>> I would also add that process 3 should not only listen for page requests, but
>> also send other pages in the background. Probably the ideal process 3 should
>>
>> 1. Have a queue of pages to be sent (struct page_server_iov-s)
>> 2. Fill it with pages that were not transferred (ANON|PRIVATE)
>> 3. Start sending them one by one
>> 4. Receive messages from the process #2 that can move some items in
>>    the queue to the top (i.e. -- the pages that are needed right now)
> 
> I completely agree with this. This has to be the end result.

Yup. And I also like your initial proposal to work in "pull mode" for the first
5 seconds. I think it will let us keep the network free from unneeded pages while
the program is re-warming after lazy restore.
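
To make the combination concrete, here is a rough sketch of how the send
queue on the source side could behave: requested pages always go first, and
background pages are held back until the pull-only period is over. This is
only an illustration in Python, not CRIU code. The class and method names
are hypothetical, pages are represented by bare addresses instead of
struct page_server_iov, and the only number taken from this thread is the
5 second window.

```python
from collections import OrderedDict, deque


class PageSendQueue:
    """Hypothetical send queue for process 3 (source system)."""

    def __init__(self, page_addrs, pull_only_seconds=5.0):
        # Pages not transferred yet, kept in background send order.
        self._background = OrderedDict.fromkeys(page_addrs)
        # Pages the UFFD handler (process 2) asked for explicitly.
        self._requested = deque()
        self._pull_only_seconds = pull_only_seconds

    def request(self, addr):
        """Promote a page that is needed right now.

        Pages that were already sent or already requested are ignored.
        """
        if addr in self._background:
            del self._background[addr]
            self._requested.append(addr)

    def next_page(self, elapsed):
        """Return the next address to send, or None if nothing is due.

        `elapsed` is the number of seconds since the lazy restore started.
        """
        if self._requested:
            return self._requested.popleft()
        # Pull-only period: do not push pages nobody asked for.
        if elapsed < self._pull_only_seconds or not self._background:
            return None
        addr, _ = self._background.popitem(last=False)
        return addr


if __name__ == "__main__":
    q = PageSendQueue([0x1000, 0x2000, 0x3000])
    q.request(0x3000)
    print(hex(q.next_page(elapsed=1.0)))  # requested page, sent at once
    print(q.next_page(elapsed=1.0))       # still pull-only, nothing to push
    print(hex(q.next_page(elapsed=6.0)))  # background sending has started
```

Keeping requested pages in a separate FIFO means a burst of faults is
served in arrival order, and the background order is left untouched for
the pages nobody has asked for yet.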

I've sent a more detailed view of how this all should look in another e-mail
in this thread. I think that this 5-second pull-only period would fit there.

-- Pavel