[CRIU] Lazy-restore design discussion - round 3

Adrian Reber adrian at lisas.de
Tue Apr 19 04:45:08 PDT 2016


On Tue, Apr 19, 2016 at 02:21:02PM +0300, Mike Rapoport wrote:
> On Tue, Apr 19, 2016 at 01:24:09PM +0300, Pavel Emelyanov wrote:
> > On 04/19/2016 12:39 PM, Adrian Reber wrote:
> > > The new summary:
> > > 
> > >  * On the source system there will be a process listening on a network
> > >    socket. In the first implementation it will use a checkpoint
> > >    directory as the basis for the UFFD pages and in a later version
> > >    we will add the possibility to transfer the pages directly from the
> > >    checkpointed process.
> > 
> > Yes, and in the latter case the daemon will be started automatically by
> > criu dump.
>  
> Why is an additional process needed on the dump side? Why can't criu dump
> itself go into "daemon mode" after collecting the pagemaps and inserting
> the memory pages into page-pipe?

I had the same question. But whether it fork()s or uses some other
mechanism to go into daemon mode sounds like an implementation detail...

> > >  * The UFFD daemon is the instance which decides which pages are pushed
> > >    when via UFFD into the restored process.
> > 
> > No, from my perspective uffd daemon (restore side) should be passive and
> > only forward PF-s to dump side and inject into tasks' address spaces
> > whatever pages arrive from the dump side.
> 
> This one is tough :)

Yes, it is. This seems to be the main point of discussion.

> I'm more biased towards making the receive side the smart one and the dump
> side the dumb one.

It seems I am again biased towards the other direction ;-)

> I'd suggest that we start with teaching uffd to get pages over the network
> instead of checkpoint directory on the destination, and after that works
> we'll see which side should be the smart one. 

The current implementation I have (on top of Mike's page-server
extension patch) does exactly that. But if we want the uffd daemon
(restore side) to be passive then there is no need to open the
checkpoint directory.
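
To make the passive variant concrete, here is a rough sketch in Python. It
is not CRIU code: the wire format (one 64-bit page address per request),
the 4096-byte page size and the names dump_side_server and
PassiveUffdDaemon are all made up for illustration, and a dict stands in
for the restored task's memory where a real daemon would use UFFDIO_COPY.
The restore side only forwards the fault and injects whatever comes back,
so it never touches a checkpoint directory.

```python
import socket
import struct
import threading

PAGE_SIZE = 4096                 # assumed page size
REQ = struct.Struct("!Q")        # hypothetical wire format: one page address


def recv_exact(sock, n):
    """Read exactly n bytes or raise ConnectionError if the peer closes."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return bytes(buf)


def dump_side_server(sock, pages):
    """Dump side: owns the page data and answers requests until EOF."""
    while True:
        try:
            raw = recv_exact(sock, REQ.size)
        except ConnectionError:
            return
        (addr,) = REQ.unpack(raw)
        # Unknown pages are answered with zeroes in this sketch.
        sock.sendall(pages.get(addr, bytes(PAGE_SIZE)))


class PassiveUffdDaemon:
    """Restore side: forwards each fault and injects whatever arrives."""

    def __init__(self, sock):
        self.sock = sock
        self.address_space = {}  # stand-in for the restored task's memory

    def handle_fault(self, fault_addr):
        page_addr = fault_addr & ~(PAGE_SIZE - 1)   # align down to the page
        self.sock.sendall(REQ.pack(page_addr))
        data = recv_exact(self.sock, PAGE_SIZE)
        self.address_space[page_addr] = data        # real code: UFFDIO_COPY
        return data


def demo():
    pages = {0x1000: b"A" * PAGE_SIZE, 0x2000: b"B" * PAGE_SIZE}
    restore_end, dump_end = socket.socketpair()
    server = threading.Thread(target=dump_side_server, args=(dump_end, pages))
    server.start()
    daemon = PassiveUffdDaemon(restore_end)
    first = daemon.handle_fault(0x1234)
    second = daemon.handle_fault(0x2FF0)
    restore_end.close()          # EOF lets the dump-side loop exit
    server.join()
    dump_end.close()
    return first[:1], second[:1], sorted(daemon.address_space)
```

In this shape any policy about which pages to send, and when, would have
to live in dump_side_server; making the restore side the smart one would
mean moving that decision into handle_fault instead.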

Maybe we really should implement it like Mike said. First try to get the
patches that currently exist only locally on my system and on Mike's into
shape, and then we can decide if we want to move the page handling logic
to the dump side on the destination system.

		Adrian

