[CRIU] Remote lazy-restore design discussion

Mon Apr 11 05:23:57 PDT 2016

On 04/11/2016 01:29 PM, Adrian Reber wrote:
> On Fri, Apr 08, 2016 at 07:41:18PM +0300, Pavel Emelyanov wrote:
> [...]
>>>>>  * Second, the dumped information is transferred (scp/rsync) to the
>>>>>    destination.
>>>>
>>>> Note that _some_ memory contents will be sent to the page server using
>>>> the dump-connects-to-restore method.
>>>
>>> Not sure I understand this.
>>
>> I meant that while doing the final dump some pages' contents will be sent
>> to the destination node via the page server. And thus, at restore time, will
>> be present there as on-disk images (or on-tmpfs images).
> 
> Ah, so all pages which cannot be handled by uffd, right?

Yup.

> [...]
> 
>>>> I don't have strong arguments for dump->restore connection, since if
>>>> you look at how p.haul works, it doesn't use this criu connect feature,
>>>> it passes in a pre-established descriptor. On both sides. So the question
>>>> of which side connects to which can be solved either way.
>>>
>>> Yes, probably right. Which side connects to which is not really
>>> important. That was just the first point which seemed wrong from the
>>> order of steps I expected.
>>>
>>> I thought some more about the different designs currently available.
>>> Right now I can use my first implementation, which provides its own
>>> protocol for page exchange (uffd-struct-based), and the one based
>>> upon Mike's page server client (page-server-based).
>>
>> I would unify page-server and uffd protocols at least in terms of
>> messages they exchange.
> 
> Makes sense.
> 
>>> In my first implementation (uffd-struct-based) the logic deciding which
>>> pages should be copied was running on the source system
>>> (uffd-remote-server). It reacted to requests and transferred unrequested
>>> pages at the end.
>>> To know which pages need to be transferred it had to parse the
>>> checkpoint directory.
>>
>> Or keep the information obtained while doing "dump" action. No?
> 
> Which is then transferred via the network to the uffd-daemon listening on
> the uffd? Is there already a protocol in the page-server to transfer
> additional (non-pages) data?

Nope.

> Or would this mean just an additional
> page-server command with its own function handling it? Probably.

Yes, I kept in mind one more command. The page_server_iov structure
seems sufficient for this. It has the command id, pagemap (start, len) and
the destination id (pid effectively). So using this you can notify
the dump side about a page fault.
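
A minimal sketch of what such a notification could look like on the wire,
written in Python only for illustration. The field widths (u32 command,
u32 page count, u64 start address, u64 destination id) and the byte order
are assumptions, and PS_IOV_FAULT with its value is a hypothetical command
name, not an existing page-server command.

```python
import struct

# Assumed layout of a page_server_iov-like message:
#   u32 cmd, u32 nr_pages, u64 vaddr, u64 dst_id
# Little-endian with no padding is an assumption made for this sketch.
IOV_FMT = "<IIQQ"
IOV_SIZE = struct.calcsize(IOV_FMT)

# Hypothetical id for the "one more command" discussed above.
PS_IOV_FAULT = 0x100


def pack_fault(pid, vaddr, nr_pages=1):
    """Build a page-fault notification for the dump side."""
    return struct.pack(IOV_FMT, PS_IOV_FAULT, nr_pages, vaddr, pid)


def unpack_iov(buf):
    """Decode one message into its command, pagemap and destination id."""
    cmd, nr_pages, vaddr, dst_id = struct.unpack(IOV_FMT, buf[:IOV_SIZE])
    return {"cmd": cmd, "nr_pages": nr_pages, "vaddr": vaddr, "dst_id": dst_id}


if __name__ == "__main__":
    msg = pack_fault(pid=1234, vaddr=0x7F0000001000)
    print(IOV_SIZE, unpack_iov(msg))
```

The restore side would send such a message when uffd reports a fault, and
the dump side would answer with the page contents for (dst_id, vaddr).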

>>> The uffd-daemon on the destination side was just
>>> forwarding pages to and from uffd and the network socket. It needed to
>>> know how to handle uffd requests but it did not require any knowledge of
>>> the actual checkpoint and about which pages are available.
>>>
>>> In the page-server-based-remote-restore the page-server-on-the-dest-host
>>> has to know how to get each page, and the uffd-daemon on the
>>> destination side has to parse the checkpoint directory to know which pages
>>> are part of the restored process and which pages have not yet been
>>> transferred.
>>>
>>> I am trying to say that in my original not-page-server-related
>>> implementation the uffd daemon has no need to know the details about the
>>> pages and in the page-server-based implementation all included
>>> parts/daemons/page-servers need to know which pages need to be
>>> transferred.
>>
>> Ah, so your question is who should decide which page to push into the socket
>> next -- uffd-side (restore-side) or the dump-side?
> 
> Yes. I think I like it better if only two of the three processes involved
> in a lazy-pages restore need to open the checkpoint directory.
> Transferring it over the network (like described above) would also solve it.

Yup :)
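
A rough sketch of the dump side deciding which page to push next, assuming
it keeps the page list obtained while dumping: faulted pages go first, and
the remaining unrequested pages are sent afterwards. The class and method
names are hypothetical and exist only for this illustration.

```python
from collections import deque


class DumpSidePageQueue:
    """Tracks which pages still have to be sent to the restore side."""

    def __init__(self, pages):
        # pages: iterable of (pid, vaddr) known from the dump.
        # A dict is used as an insertion-ordered set.
        self.pending = dict.fromkeys(pages)
        self.faults = deque()

    def fault(self, pid, vaddr):
        """Record a page-fault notification received from the restore side."""
        key = (pid, vaddr)
        if key in self.pending:
            self.faults.append(key)

    def next_page(self):
        """Return the next (pid, vaddr) to push, or None when all are sent."""
        # Requested pages first, skipping any that were already sent.
        while self.faults:
            key = self.faults.popleft()
            if key in self.pending:
                del self.pending[key]
                return key
        # Then the remaining, unrequested pages in dump order.
        if self.pending:
            key = next(iter(self.pending))
            del self.pending[key]
            return key
        return None


if __name__ == "__main__":
    q = DumpSidePageQueue([(1, 0x1000), (1, 0x2000), (1, 0x3000)])
    q.fault(1, 0x3000)
    while (page := q.next_page()) is not None:
        print(page)
```

With this split, the uffd side only forwards faults and pages and does not
need to open the checkpoint directory.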

-- Pavel