[CRIU] Lazy-restore design discussion - round 3

Tue Apr 19 05:41:30 PDT 2016

On 04/19/2016 02:45 PM, Adrian Reber wrote:
> On Tue, Apr 19, 2016 at 02:21:02PM +0300, Mike Rapoport wrote:
>> On Tue, Apr 19, 2016 at 01:24:09PM +0300, Pavel Emelyanov wrote:
>>> On 04/19/2016 12:39 PM, Adrian Reber wrote:
>>>> The new summary:
>>>>
>>>>  * On the source system there will be a process listening on a network
>>>>    socket. In the first implementation it will use a checkpoint
>>>>    directory as the basis for the UFFD pages and in a later version
>>>>    we will add the possibility to transfer the pages directly from the
>>>>    checkpointed process.
>>>
>>> Yes, and in the latter case the daemon will be started automatically by
>>> criu dump.
>>  
>> Why is an additional process needed on the dump side? Why can't criu dump itself
>> go into "daemon mode" after collecting the pagemaps and inserting the
>> memory pages into the page-pipe?
> 
> I had the same question. But whether it fork()s or uses some other mechanism
> to go into daemon mode sounds like an implementation detail...

Agreed.

>>>>  * The UFFD daemon is the instance which decides which pages are pushed
>>>>    when via UFFD into the restored process.
>>>
>>> No, from my perspective the uffd daemon (restore side) should be passive and
>>> only forward PFs to the dump side and inject into tasks' address spaces
>>> whatever pages arrive from the dump side.
>>
>> This one is tough :)
> 
> Yes, it is. This seems to be the main point of discussion.
> 
>> I'm more biased towards making the receive side the smart one and the dump
>> side the dumb one.
> 
> It seems I am again biased towards the other direction ;-)
> 
>> I'd suggest that we start with teaching uffd to get pages over the network
>> instead of from the checkpoint directory on the destination, and after that works
>> we'll see which side should be the smart one. 
> 
> The current implementation I have (on top of Mike's page-server
> extension patch) does exactly that. But if we want the uffd daemon
> (restore side) to be passive then there is no need to open the
> checkpoint directory.
> 
> Maybe we really should implement it like Mike said. First try to get the
> patches that currently exist locally on my and Mike's systems into shape and
> then we can decide if we want to move the page handling logic to the
> dump side on the destination system.

OK, let's see how it goes.

But I have one concern about having the brains on the restore side. Look, the uffd can
request two kinds of pages -- those that tasks are blocked on in #PF (i.e. explicit
uffd requests) and those that tasks haven't touched yet (i.e. pages requested in
advance). With the former the situation is clear: it's uffd who knows which pages
these are. It can even know something about the latter, e.g. on a #PF it can also
request the adjacent pages, as Adrian proposed. That's clear. But what to do with the
other "in advance" pages? It seems better to request those in an LRU manner, i.e.
request recently used pages before those that were last used long ago. But the
problem I see is that this LRU information can only be obtained on the dump
side -- all the LRU statistics sit _there_. And what would be the way to share
this knowledge with the restore side (as we plan to make it "smart" or "active")?
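
To make the "#PF-ed page plus adjacent pages" case concrete, here is a minimal
sketch in Python. It is not CRIU code: the function name, the fixed 4 KiB page
size, the `missing` set and the `window` parameter are all assumptions made for
illustration. It only shows that this part of the decision needs nothing beyond
what the restore side already knows (the faulting address and which pages have
not arrived yet).

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


def pages_to_request(fault_addr, missing, window=2):
    """Return page addresses to request for one #PF.

    The faulting page comes first (a task is blocked on it), followed by
    up to `window` still-missing neighbours on each side of it.
    """
    base = fault_addr & ~(PAGE_SIZE - 1)  # round down to the page boundary
    wanted = [base]
    for i in range(1, window + 1):
        for addr in (base + i * PAGE_SIZE, base - i * PAGE_SIZE):
            if addr in missing:
                wanted.append(addr)
    return wanted


# Hypothetical set of pages that have not been transferred yet.
missing = {0x1000, 0x2000, 0x3000, 0x5000}
print([hex(a) for a in pages_to_request(0x2ABC, missing)])
```

With the example set above, a fault at 0x2abc asks for 0x2000 first and then
for 0x3000 and 0x1000; 0x5000 is outside the window and stays in the
"in advance" pool.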

Had we the "brain" (or "active part") on the dump side, we could just scan this info and
make the decision. But what to do when we have the "brain" on the restore side and all
the LRU info on the dump side?
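
As a rough illustration of what "scan this info and make the decision" could
look like on the dump side, here is a small Python sketch. Everything in it is
hypothetical: the thread does not say in what form the LRU statistics are
available, so `last_used` (page address mapped to some recency value, larger
meaning more recent) and `already_sent` are assumptions, not CRIU data
structures.

```python
def prefetch_order(last_used, already_sent):
    """Order the remaining pages for pushing "in advance".

    last_used:    hypothetical mapping of page address -> recency value
                  (larger means used more recently).
    already_sent: set of page addresses that were already transferred,
                  e.g. in response to explicit #PF requests.
    Returns the pending addresses, most recently used first.
    """
    pending = [addr for addr in last_used if addr not in already_sent]
    return sorted(pending, key=lambda addr: last_used[addr], reverse=True)


# Hypothetical recency values collected on the dump side.
last_used = {0x1000: 5, 0x2000: 9, 0x3000: 1}
print([hex(a) for a in prefetch_order(last_used, already_sent={0x2000})])
```

If the restore side is the active one instead, the same ordering would have to
be computed on the dump side anyway and then shipped over (for example as an
ordered list of addresses), which is exactly the sharing problem raised above.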

-- Pavel