[CRIU] Lazy-restore design discussion - round 3

Tue Apr 19 22:02:57 PDT 2016

On 04/19/2016 10:51 PM, Adrian Reber wrote:
> On Tue, Apr 19, 2016 at 05:03:57PM +0200, Adrian Reber wrote:
>>>>>> Maybe we really should implement it the way Mike said. First try to get the
>>>>>> patches that currently exist locally on my and on Mike's systems into shape and
>>>>>> then we can decide if we want to move the page handling logic to the
>>>>>> dump side on the destination system.
>>>>>
>>>>> OK, let's see how it goes.
>>>>>
>>>>> But I have one concern about having the brains on the restore side. Look, the uffd can request
>>>>> two kinds (or types) of pages -- those that the task is blocked on in #PF (i.e. --
>>>>> explicit uffd requests) and those that the task hasn't touched yet (i.e. -- requested
>>>>> in advance). With the former pages the situation is clear: it's uffd that knows what
>>>>> these pages are. It can even know something about the latter pages, e.g. along with the #PF-ed
>>>>> pages it can request the adjacent ones, as Adrian proposed. That's clear. But what to do
>>>>> with the other "in advance" pages? It seems better to request those pages in
>>>>> LRU order, i.e. -- request recent pages before those that were used long ago. But
>>>>> the problem I see is that this LRU information can only be obtained on the dump
>>>>> side -- all these LRU statistics sit _there_. And what would be the way to share
>>>>> this knowledge with the restore side (as we plan to make it "smart" or "active")?
>>>>>
>>>>> Had we the "brain" (or "active part") on dump side we could just scan this info and
>>>>> make a decision. But what to do when we have the "brain" on the restore side and all the LRU
>>>>> info on the dump side?
>>>>
>>>> Where do we get the LRU information from? Does CRIU collect it during
>>>> dump? Or can it be queried from the kernel?
>>>
>>> Right now CRIU doesn't collect it; it's only present in the kernel (and can somehow be
>>> collected using the referenced bit via the proc pagemap file, but we don't do that either).
>>> My point is that this information is only present on the dump side.
>>
>> Good to know that this information exists. Your proposal to insert
>> additional pages on an LRU basis makes sense. I need to think about
>> this a bit...
> 
> Since the LRU data only exists on the dump side, either the
> dump side decides which pages are transmitted or we have to transmit
> the LRU data to the uffd daemon on the restore side. Transferring the
> LRU data to the uffd daemon sounds like it will make the whole thing
> unnecessarily complicated. Keeping the logic of which pages are transferred
> when on the dump side means that the uffd daemon on the restore side
> does nothing more than forward pages and userfault requests. This also
> means it doesn't need to know anything about the restored process and
> therefore does not need to access the checkpoint directory.

There's one point that might require it to do so -- the uffd daemon will
need to work in 2 phases. In the first one it accepts uffds from
the restoring processes. In the second one it serves #PFs
and injects arriving pages back. We need to know where stage one ends
and stage two starts, and we need to know when stage two ends too. All this
might require the uffd daemon to know at least the number of processes it
has to deal with and the lazy regions it will have to fill with data.
This knowledge, in turn, sits in the images directory, so the uffd daemon might
still need to read it.

> As I see it, this would lead to a solution similar to the one implemented
> in my first remote lazy-restore patchset, only based on the
> page-server protocol.
> 
> Thinking more about how the protocol between the dump side and the uffd
> daemon needs to be implemented, it also sounds like the uffd daemon, which is
> now only forwarding pages and userfault requests, will/should lose the
> ability to read local checkpoint directories. It will only be used for
> forwarding pages. A lazy restore with source and destination on the same
> machine will then probably also require forwarding the pages through the
> local uffd-page-forwarding-daemon.
> 
> So the above is the result of my thoughts about what it means that the
> LRU data only exists on the dump side. This is not (yet) what I am
> proposing to do; it is just where I think this might lead.

-- Pavel