[CRIU] [PATCH v2 5/5] UFFD: Support lazy-pages restore between two hosts

Pavel Emelyanov xemul at virtuozzo.com
Mon Mar 28 03:19:00 PDT 2016


On 03/28/2016 10:12 AM, Mike Rapoport wrote:
> Hi Adrian,
> 
> On Thu, Mar 24, 2016 at 03:52:54PM +0000, Adrian Reber wrote:
>> From: Adrian Reber <areber at redhat.com>
>>
>> This enhances lazy-pages mode to work across two different hosts.
>> Instead of lazily restoring a process on the same host, this makes it
>> possible to keep the memory pages on the source system and only
>> transfer them on demand from the source to the destination system.
>>
>> The previous, single-host lazy restore consisted of two processes:
>>
>>  criu restore --lazy-pages --address /path/to/unix-domain-socket
>>
>> and
>>
>>  criu lazy-pages --address /path/to/unix-domain-socket
>>
>> The unix domain socket was used to transfer the userfault FD (UFFD) from
>> the 'criu restore' process to the 'criu lazy-pages' process. The 'criu
>> lazy-pages' was then listening on the UFFD for userfaultfd messages
>> which were used to retrieve the requested memory page from the
>> checkpoint directory and transfer that page into the process to be
>> restored.
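
[The UFFD hand-off over the unix domain socket described above is the
standard SCM_RIGHTS fd-passing mechanism. A minimal sketch of the sending
side, with names and error handling simplified -- this is illustrative,
not the actual criu/uffd.c code:]

```c
/* Sketch: pass an open file descriptor (e.g. a userfaultfd) over a
 * connected UNIX domain socket using SCM_RIGHTS ancillary data. */
#include <string.h>
#include <sys/socket.h>

static int send_fd(int sock, int fd)
{
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;
	char dummy = '*';	/* at least one byte of real data is required */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {			/* ensures proper cmsg alignment */
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u = { { 0 } };

	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}
```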
>>
>> This commit introduces the ability to keep the pages on the remote host
>> and only request the transfer of the required pages over TCP on demand.
>> Therefore criu needs to be started differently than before.
>>
>> Host1:
>>
>>    criu restore --lazy-pages --address /path/to/unix-domain-socket
>>
>>   and
>>
>>    criu lazy-pages --address /path/to/unix-domain-socket \
>>    --lazy-client ADDR-Host2 --port 27
>>
>> Host2:
>>
>>    criu lazy-pages --lazy-server --port 27
>>
>> On Host1 the process is now restored (as criu always does) except that
>> the memory pages are not read from pages.img and the appropriate pages
>> are marked as userfaultfd-handled. As soon as the restored process
>> tries to access one of these pages, a UFFD MSG is received by the
>> lazy-client (on Host1). This UFFD MSG is then transferred via TCP to
>> the lazy-server (on Host2). The lazy-server retrieves the memory page
>> from the local checkpoint and returns a UFFDIO COPY answer back to the
>> lazy-client, which then forwards this message to the local UFFD, which
>> inserts the page into the restored process.
>>
>> The remote lazy restore behaves like the local one: if no more UFFD
>> MSGs are received on the socket within 5 seconds, it switches to
>> copy-remaining-pages mode, in which all non-UFFD-requested pages are
>> transferred into the restored process.
>>
>> TODO:
>>   * Create from the checkpoint directory a checkpoint without the memory
>>     pages which are UFFD handled. This would enable a real UFFD remote
>>     restore where the UFFD pages do not need to be transferred to the
>>     destination host.
>>
>> Signed-off-by: Adrian Reber <areber at redhat.com>
>> ---
>>  criu/uffd.c | 269 +++++++++++++++++++++++++++++++++++++++++++++++++++++-------
>>  1 file changed, 240 insertions(+), 29 deletions(-)
>  
> I have some concerns regarding the proposed design. It could be that I'm
> jumping late, and you've already had discussions about how to use
> userfaultfd in CRIU, but I'll share my thoughts anyway...
> 
> I think that post-copy migration in CRIU may be implemented without
> creating so many entities taking care of different sides of userfaultfd.
> The restore part may create a daemon for userfault handling transparently
> to the user and dump side may be enhanced with ability to send pages on
> demand, again without starting another process explicitly.
> 
> For instance, using 'criu dump --lazy-pages' would do everything required
> to serve pages for both on demand requests and for background copying and
> 'criu restore --lazy-pages' would be able to handle userfaultfd.

I don't disagree with that, but this all applies to the API part of the
feature. In particular, the lazy-pages daemon itself would still be
required; the question is how to start one. The current code starts one
explicitly, you propose to fork() it as part of the restore action, which
is also fine.

> Regarding the patch itself, I believe that mixing socket and uffd handling
> in the same functions is not a very good idea. Although there are lots of
> similarities between their behaviour, socket and uffd are semantically
> different and should be handled by different code paths.

Can you clarify this more? The same daemon should poll on two descriptors --
the uffd one, to handle #PFs from the restored tree, and the socket, to
read pages from the source node.
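
Such a loop could be sketched with poll() roughly like this (the handler
names are hypothetical; the 5-second timeout matches the point at which
the restore switches to copying the remaining pages):

```c
#include <poll.h>

/* Sketch: one daemon polling the userfaultfd and the page-server socket.
 * Returns 0 when idle long enough to switch to copying remaining pages,
 * -1 on error or when a handler reports failure. */
static int daemon_loop(int uffd, int sock,
		       int (*handle_fault)(int uffd),
		       int (*handle_pages)(int sock))
{
	struct pollfd fds[2] = {
		{ .fd = uffd, .events = POLLIN },
		{ .fd = sock, .events = POLLIN },
	};

	for (;;) {
		/* 5000 ms of silence -> copy-remaining-pages mode */
		int n = poll(fds, 2, 5000);

		if (n < 0)
			return -1;
		if (n == 0)
			return 0;
		if (fds[0].revents & POLLIN && handle_fault(uffd))
			return -1;
		if (fds[1].revents & POLLIN && handle_pages(sock))
			return -1;
	}
}
```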

-- Pavel

