[CRIU] p.haul and lxc

Tycho Andersen tycho.andersen at canonical.com
Fri Nov 14 06:05:15 PST 2014


On Fri, Nov 14, 2014 at 01:09:04PM +0400, Pavel Emelyanov wrote:
> On 11/14/2014 01:06 AM, Tycho Andersen wrote:
> > Hi all,
> > 
> > I've been looking at p.haul a bit and thinking about how we might improve it for
> > use with lxc. Based on my read of the code, I think there are two conceptual
> > changes that would be needed:
> > 
> > 1. We need to add incremental dump support to lxc-checkpoint, so that p.haul
> >    can shell out to lxc-checkpoint, instead of calling criu directly. This
> >    means that p.haul doesn't need to know about lxc internals (e.g. how veths,
> >    ttys are set up and configured) in order to do its thing.
> 
> Makes sense to me. This also means that p.haul should do the final dump via
> lxc-checkpoint too, which in turn means that we should move more stuff to
> the htype-s, not just atomic callbacks. But I'd like to keep p.haul's
> ability to live-migrate just a task, w/o lxc/docker/openvz backends. This
> would require p.haul to still call criu directly.

Great! I'll work on this ASAP, although I think the patches will be
mostly for lxc :)

> > 2. We can get rid of any p.haul specific handling of cgroups for lxc, since
> >    these can be restored via criu and lxc-checkpoint and lxc-checkpoint will
> >    try to do the right thing w.r.t. multiple containers with the same name or
> >    any other cgroup collisions.
> 
> Ah, yes :) Cgroups should have left p.haul a long time ago. They stay there
> simply because nobody had time to rip them out.
> 
> 
> I would add one more thing. Consider the "reattach" problem you were solving
> for plain lxc-restore. The restored tasks should become the lxc daemon's
> children, not criu's. And we introduced --restore-sibling for that.
> 
> The same should be true for p.haul. Right now restored tasks become children
> of the p.haul-service process. This is not good. We should make them children
> of the lxc daemon again. From my perspective this can be implemented in two ways.
> 
> 1. p.haul is not a utility, but a library that lxc-daemon links with and calls.
> In this case criu will become a child of the daemon and with --restore-sibling
> will restore tasks as the daemon's kids.
> 
> 2. p.haul-service is a utility, lxc daemon fork()-s it, but then p.haul-service
> forks criu as lxc-daemon's child again, so that criu, in turn, restores
> tasks with --restore-sibling. This is hard to implement in python (CLONE_PARENT
> doesn't exist there), but possible.

I realized I had overlooked this last night, but you're right that it
is a sticky issue. Fortunately, at least in the lxc case, I think it is
easy; it looks something like this:

* The top level daemon spawns a p.haul server, to receive the
  migrated container.

* The p.haul server receives the deltas and the final checkpoint
  through the normal iteration process.

* The p.haul server uses the lxc python api to do ->restore(), which
  by default execs criu, so the p.haul server process is replaced by
  the criu process, which does CLONE_PARENT because of
  --restore-sibling, and everything is happy.

For our purposes, we'd then use p.haul as an executable (although it
would be fine as a library, we could just write a little wrapper to
invoke it as a process).
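
The key property in the steps above is that exec replaces the server
process in place, so whatever criu does afterwards happens under the
same pid and with the same parent. A minimal sketch of just that
property, assuming nothing about the lxc python api or criu themselves:
`spawn_and_exec` and `REPORT_PID` are hypothetical names, the forked
child stands in for the p.haul server, and a tiny Python one-liner
stands in for criu. CLONE_PARENT / --restore-sibling is not exercised
here.

```python
import os
import sys


def spawn_and_exec(argv):
    """Fork a stand-in 'server' child that execs argv.

    Returns (child_pid, captured_stdout, wait_status).
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: plays the role of the p.haul server.
        try:
            os.close(read_fd)
            os.dup2(write_fd, 1)      # route the exec'd program's stdout to the pipe
            os.execvp(argv[0], argv)  # on success this call never returns
        finally:
            os._exit(127)             # reached only if exec failed
    # Parent: plays the role of the top level daemon.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        output = reader.read()
    _, status = os.waitpid(pid, 0)
    return pid, output.strip(), status


# Stand-in for criu: a program that just reports its own pid.
REPORT_PID = [sys.executable, "-c", "import os; print(os.getpid())"]
```

Calling `spawn_and_exec(REPORT_PID)` should show the exec'd program
reporting the same pid that fork() handed back to the daemon side.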

> Thinking more about p.haul + docker I tend to think that the 1st approach is
> worse, as it would require rewriting most of p.haul in C (as nobody
> links with python) :)
> 
> What do you think?
> 
> > This would require a slight architectural change, since lxc-checkpoint is
> > setuid and execs criu, rather than using the service. However, since the
> > service mechanism is just to get around this problem, I think it should be ok.
> > 
> > Another question is what to do about rewriting the images. Based on our last
> > thread, we decided that in-process (e.g. while criu is restoring) rewriting is
> > most efficient, so we want to pass some --crit-flags to criu to tell it how to
> > rewrite things. Is p.haul the thing that would decide how to rewrite e.g.
> > cpusets, or would that be at some higher level?
> 
> Absolutely. The existing p.haul options handling allows specifying and pushing
> arbitrary flags to arbitrary p.haul sub-modules. In particular, the criu-api one
> can request any crit flags.
> 
> > Finally, a minor conceptual change is that it would be nice to be able to set
> > up the channel that p.haul communicates over (e.g. a TLS socket or so), vs.
> > just having p.haul do a connect() over a raw socket. If nothing else, I think
> > we can just implement some other version of the `rpc_proxy` class to do this.
> 
> Yes, Ruslan is working on it. He implemented the ssh tunnel, but I agree that
> there should be an option to use a pre-established channel.
> 
> Also note one thing. Currently in p.haul there are two channels -- one for RPC
> commands and the other one for memory (pre-)dumps. The latter is the socket
> that is fed to criu pre-dump, dump and page-server actions. With pre-established
> channel we'll have to do something about it. And pushing control commands AND
> memory over the same socket doesn't seem like a good solution to me.

Yes, agreed. I think the best way is to allow users (perhaps via some
plugins to the library, if not just executable arguments) to spawn new
sockets, and then just have p.haul ask that plugin for a socket to the
server, maybe with some ordering: the first socket is the control
socket, and the type of every socket after that is negotiated over the
control socket. We're interested in spawning only authenticated (TLS)
sockets, so a simple connect() won't work for us.
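
To make the ordering idea concrete, here is a rough sketch of what such
a plugin interface could look like. Everything in it is hypothetical:
`SocketPairProvider`, `new_channel`, `open_channels` and the JSON
`next_channel` message are made up for illustration and are not part of
p.haul. The provider hands back local socket pairs so that the example
runs in a single process; a real plugin would return authenticated
(TLS) sockets to the remote server instead.

```python
import json
import socket


class SocketPairProvider:
    """Hypothetical plugin: returns a connected (client_end, server_end) pair."""

    def new_channel(self):
        return socket.socketpair()


def open_channels(provider, kinds):
    """Open a control socket first, then one socket per entry in kinds.

    The purpose of each later socket is announced over the control socket
    before that socket is labelled on the server side.
    """
    ctl_client, ctl_server = provider.new_channel()
    client = {"control": ctl_client}
    server = {"control": ctl_server}
    ctl_reader = ctl_server.makefile("r")
    for kind in kinds:
        data_client, data_server = provider.new_channel()
        # Client side: say what the next socket will be used for.
        message = json.dumps({"next_channel": kind}) + "\n"
        ctl_client.sendall(message.encode())
        # Server side: read the announcement and label the socket with it.
        announced = json.loads(ctl_reader.readline())["next_channel"]
        client[kind] = data_client
        server[announced] = data_server
    ctl_reader.close()  # closes only the file wrapper, not the socket
    return client, server
```

With this shape, `open_channels(SocketPairProvider(), ["memory", "fs"])`
would give each side a control socket plus separately labelled memory
and filesystem sockets, so control commands and page data never share a
socket.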

> And one more thing about channels :) There are cases when we have to copy the file
> system to the remote host. IIRC you used rsync in your demo :) So we will need a
> 3rd channel. Can we make the channel set-up independent from the p.haul
> caller, i.e. p.haul should have the ability to set up channels itself? Somehow.
> 
> > Do these sound reasonable? Does anyone have any thoughts?
> 
> Thanks for joining the p.haul efforts :)

No problem! I'm very excited to be working on p.haul and criu :)

Tycho

