[CRIU] Lazy-restore design discussion - round 3
Adrian Reber
adrian at lisas.de
Tue Apr 19 07:54:02 PDT 2016
On Tue, Apr 19, 2016 at 03:41:30PM +0300, Pavel Emelyanov wrote:
> On 04/19/2016 02:45 PM, Adrian Reber wrote:
> > On Tue, Apr 19, 2016 at 02:21:02PM +0300, Mike Rapoport wrote:
> >> On Tue, Apr 19, 2016 at 01:24:09PM +0300, Pavel Emelyanov wrote:
> >>> On 04/19/2016 12:39 PM, Adrian Reber wrote:
> >>>> The new summary:
> >>>>
> >>>> * On the source system there will be a process listening on a network
> >>>> socket. In the first implementation it will use a checkpoint
> >>>> directory as the basis for the UFFD pages and in a later version
> >>>> we will add the possibility to transfer the pages directly from the
> >>>> checkpointed process.
> >>>
> >>> Yes, and in the latter case the daemon will be started automatically by
> >>> criu dump.
> >>
> >> Why is an additional process needed on the dump side? Why can't criu dump
> >> itself go into "daemon mode" after collecting the pagemaps and inserting
> >> the memory pages into the page-pipe?
> >
> > I had the same question. But whether it fork()s or uses some other
> > mechanism to go into daemon mode sounds like an implementation detail...
>
> Agreed.
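
Whichever process ends up doing it, the dump-side part as I picture it is
not much more than the loop below (only a sketch; the request layout is made
up and is not Mike's page-server protocol, short reads/writes are ignored):

  #include <stdint.h>
  #include <unistd.h>

  /* Hypothetical request layout, just for illustration. */
  struct page_req {
          uint64_t addr;  /* page-aligned address in the dumped task */
          uint32_t nr;    /* number of consecutive pages wanted */
  };

  /*
   * Answer page requests from the restore side. 'mem' is assumed to hold
   * the task's pages read from the checkpoint directory (later: taken
   * directly from the page-pipe of the dumped task), starting at vm_start.
   */
  static int serve_pages(int sk, char *mem, uint64_t vm_start, long page_size)
  {
          struct page_req req;

          while (read(sk, &req, sizeof(req)) == sizeof(req)) {
                  char *data = mem + (req.addr - vm_start);

                  if (write(sk, data, req.nr * page_size) < 0)
                          return -1;
          }

          return 0;
  }
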
>
> >>>> * The UFFD daemon is the instance which decides which pages are pushed
> >>>> into the restored process via UFFD, and when.
> >>>
> >>> No, from my perspective the uffd daemon (restore side) should be passive
> >>> and only forward #PFs to the dump side and inject into the tasks' address
> >>> spaces whatever pages arrive from the dump side.
> >>
> >> This one is tough :)
> >
> > Yes, it is. This seems to be the main point of discussion.
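
Just to make sure we mean the same thing by "passive": if I understand Pavel
right, the restore-side daemon would then be little more than this (a rough
sketch; the wire format is made up and short reads/writes are ignored):

  #include <stdint.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <linux/userfaultfd.h>

  /*
   * Handle one event from the userfault fd: forward the faulting address
   * to the dump side, wait for the page data and UFFDIO_COPY it into the
   * restored task. No decisions are taken here.
   */
  static int handle_fault(int uffd, int sk, char *buf, long page_size)
  {
          struct uffd_msg msg;
          struct uffdio_copy copy;
          uint64_t addr;

          if (read(uffd, &msg, sizeof(msg)) != sizeof(msg))
                  return -1;
          if (msg.event != UFFD_EVENT_PAGEFAULT)
                  return 0;

          addr = msg.arg.pagefault.address & ~((uint64_t)page_size - 1);

          /* Ask the dump side for the page... */
          if (write(sk, &addr, sizeof(addr)) != sizeof(addr))
                  return -1;
          /* ...and inject whatever comes back. */
          if (read(sk, buf, page_size) != page_size)
                  return -1;

          copy.dst  = addr;
          copy.src  = (uintptr_t)buf;
          copy.len  = page_size;
          copy.mode = 0;
          copy.copy = 0;

          return ioctl(uffd, UFFDIO_COPY, &copy);
  }
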
> >
> >> I'm more biased towards making the receive side the smart one and the dump
> >> side the dumb one.
> >
> > It seems I am again biased in the other direction ;-)
> >
> >> I'd suggest that we start with teaching uffd to get pages over the network
> >> instead of from the checkpoint directory on the destination, and once that
> >> works we'll see which side should be the smart one.
> >
> > The current implementation I have (on top of Mike's page-server
> > extension patch) does exactly that. But if we want the uffd daemon
> > (restore side) to be passive then there is no need to open the
> > checkpoint directory.
> >
> > Maybe we really should implement it like Mike said: first get the patches
> > that currently exist locally on my system and on Mike's into shape, and
> > then we can decide if we want to move the page handling logic from the
> > destination system to the dump side.
>
> OK, let's see how it goes.
>
> But I have one concern about having the brains on the restore side. Look, the uffd
> can request two kinds (or types) of pages -- those that the task is blocked on in a
> #PF (i.e. explicit uffd requests) and those that the task hasn't yet touched (i.e.
> pages requested in advance). With the former the situation is clear: it's uffd that
> knows what these pages are. It can even know something about the latter, e.g.
> request pages adjacent to the #PF-ed ones, as Adrian proposed. That's clear. But
> what to do with the other "in advance" pages? It seems better to request those
> pages in LRU order, i.e. request recently used pages before those that were used
> long ago. But the problem I see is that this LRU information can only be obtained
> from the dump side -- all the LRU statistics sit _there_. And what would be the way
> to share this knowledge with the restore side (as we plan to make it the "smart" or
> "active" one)?
>
> Had we the "brain" (or "active part") on the dump side, we could just scan this
> info and make a decision. But what do we do when the "brain" is on the restore side
> and all the LRU info is on the dump side?
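
Right, without that LRU data the only ordering decision the restore side can
make on its own is the adjacent-pages heuristic, i.e. something like this
(again only a sketch, the numbers and the request layout are made up):

  #include <stdint.h>

  /* How many extra pages to ask for around a fault; made-up number. */
  #define NR_ADJACENT     16

  /* Same made-up request layout as in the earlier sketch. */
  struct page_req {
          uint64_t addr;
          uint32_t nr;
  };

  /*
   * Turn a #PF address into a request for the faulting page plus a few
   * of the pages following it, clamped to the end of its VMA.
   */
  static void fault_to_req(uint64_t fault_addr, uint64_t vma_end,
                           long page_size, struct page_req *req)
  {
          uint64_t left;

          req->addr = fault_addr & ~((uint64_t)page_size - 1);
          left = (vma_end - req->addr) / page_size;
          req->nr = left < NR_ADJACENT + 1 ? left : NR_ADJACENT + 1;
  }

Everything beyond that (which cold pages to push, and in which order) needs
statistics that only exist on the dump side.
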
Where do we get the LRU information from? Does CRIU collect this during
dump? Or can it be queried from the kernel?
Adrian