[CRIU] p.haul and lxc

Fri Nov 14 07:47:15 PST 2014

On 11/14/2014 08:04 PM, Tycho Andersen wrote:

>> If p.haul will use LXC's sockets and will use LXC as "checkpoint-restore API"
>> then the workflow would look like this.
>>
>>   src p.haul says to dst one "start page server"
>>   src p.haul says to local "criu api (lxc daemon)" -- start pre-dump
>>
>> After these two steps criu page server on dst and criu pre-dump on the
>> src should be connected. Can LXC daemon provide this?
> 
> Yes, I think we can provide the authenticated socket (or just pass a
> message for criu as a proxy). In fact, the proxy method might be the
> easiest -- p.haul sends stuff to lxd, and then lxd forwards it on to the
> other lxd, which sends it back to the other end's p.haul.

Wait, we seem to be talking about different sockets :) Maybe not, but let me
clarify the whole picture anyway :)

The socket I'm talking about is the one that criu pre-dump will use to
send the memory contents of tasks to the page server, not the one that
the p.haul ends will use to talk to each other.

The in-progress picture should look like this:

src-LXD                                dst-LXD
 `- p.haul --[ channel for commands ]-- `- p.haul-service
 `- criu   --[  channel for memory  ]-- `- criu
                                        `- init <-- will be cloned by criu with CLONE_PARENT
                                            `- ...

There are two network channels and four local ones. The local ones are how
each p.haul talks to its LXD as to a "CRIU API" and how each LXD makes calls
to its criu.

As far as the network channels are concerned:

The 1st channel (for commands) can be implemented "via" LXDs, since it's
nothing but synchronization of the pre-dump/dump/restore stages. But the 2nd
channel (for memory) should be just a socket for data (authenticated and
encrypted, but there's no need for the whole LXD in between, from my POV).

BTW, the same channel is currently used by p.haul to transfer non-memory 
images at the very end, so p.haul-s should "know" about it too.

>> Note, that it will
>> not be nice if for every such iteration the new socket will be created,
>> there can be several iterations.
> 
> I think the socket we give to p.haul would be for use exclusively by
> p.haul, so since it's not necessary now, I don't think it would be.
> 
> 
>> Hmm...
>>
>> I guess this can be solved if during LXC-to-LXC migration handshake they 
>> open two (3 in FS migration case) sockets, one is fed to p.haul-s, the 2nd
>> to criu pre-dump and criu page-server.
>>
>> At the same time fork() + exec() of criu on every iteration doesn't sound
>> nice too (can be long). We have the "swrk" mode of criu -- it's when criu
>> gets a socket and reads RPC command from it instead of parsing command
>> line arguments. The page-server start, pre-dump, dump and restore work
>> nice through this mode. I guess we need to polish one in 1.4, document
>> and use _it_ in the migration case. Does this sound OK to you?
> 
> Ah, that's interesting. I hadn't thought about the multiple forks
> being expensive. 

The fork()-s -- no. The execve()-s will (can) be :)

> So we'd start lxc-checkpoint in some sort of daemon
> mode, which would then read rpc commands over the socket from p.haul
> until the final dump was done? Then on the restore side I guess it
> would just be the same single command thing.

Not single, unfortunately. During iterations the destination LXD will have
to ask CRIU to start page-servers to accept memory pages.
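
To make the ordering concrete, here is a minimal sketch in Python of the
sequence of commands the two sides would step through. The function name
migration_plan and the step labels are hypothetical, chosen only for
illustration. Starting a page server before the final dump is an assumption;
the text above only states it for the pre-dump iterations.

```python
def migration_plan(pre_dump_iterations):
    """Return the ordered (side, command) steps for an iterative migration.

    Hypothetical helper: it only models the ordering, it does not call criu.
    """
    steps = []
    for _ in range(pre_dump_iterations):
        # The destination must be listening before the source sends pages.
        steps.append(("dst", "start page-server"))
        steps.append(("src", "pre-dump"))
    # Assumption: the final dump also ships memory to a page server.
    steps.append(("dst", "start page-server"))
    steps.append(("src", "dump"))
    steps.append(("dst", "restore"))
    return steps


if __name__ == "__main__":
    for side, command in migration_plan(2):
        print(side, command)
```

The point is that the destination side receives one command per iteration,
not one command for the whole migration.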

Can LXD fork criu in swrk mode and just forward to it anything that
comes from p.haul, without decoding or re-encoding the contents?
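
A rough sketch of what such opaque forwarding could look like, in Python.
Everything here is an assumption made for illustration: fake_worker is a
stand-in for criu in swrk mode (it just acknowledges whatever it receives),
the sockets are local stream socketpairs, and relay is a hypothetical helper
rather than anything LXD has. The relay never parses the payload, so the
forwarding side would not need to understand the RPC format.

```python
import socket
import threading


def relay(src, dst, bufsize=65536):
    """Copy bytes from src to dst verbatim until src reaches EOF."""
    forwarded = 0
    while True:
        chunk = src.recv(bufsize)
        if not chunk:
            break
        dst.sendall(chunk)  # forwarded as-is, never decoded
        forwarded += len(chunk)
    try:
        dst.shutdown(socket.SHUT_WR)  # propagate EOF to the other side
    except OSError:
        pass
    return forwarded


def fake_worker(sock):
    """Hypothetical stand-in for criu in swrk mode: acks each request."""
    while True:
        request = sock.recv(65536)
        if not request:
            break
        sock.sendall(b"ack:" + request)
    sock.close()


def demo(payload):
    """Send payload from the p.haul end through the relay; return the reply."""
    haul_end, lxd_front = socket.socketpair()
    lxd_back, worker_end = socket.socketpair()
    threads = [
        threading.Thread(target=fake_worker, args=(worker_end,)),
        threading.Thread(target=relay, args=(lxd_front, lxd_back)),
        threading.Thread(target=relay, args=(lxd_back, lxd_front)),
    ]
    for t in threads:
        t.start()
    haul_end.sendall(payload)
    expected = len(b"ack:") + len(payload)
    reply = b""
    while len(reply) < expected:
        chunk = haul_end.recv(expected - len(reply))
        if not chunk:
            break
        reply += chunk
    haul_end.close()  # EOF travels through the relay to the worker
    for t in threads:
        t.join(timeout=5)
    lxd_front.close()
    lxd_back.close()
    return reply


if __name__ == "__main__":
    print(demo(b"\x08\x01opaque request"))
```

With the worker started once and kept alive like this, each iteration costs a
message on the socket instead of a new execve().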

> The only problem I see with this is that then lxc needs to depend on
> protobuf-c-compiler, which isn't currently in ubuntu's 'main' repo and
> would take some work to get it there.
> 
> Tycho
> .
> 

Thanks,
Pavel