[CRIU] p.haul and lxc

Pavel Emelyanov xemul at parallels.com
Fri Nov 14 01:09:04 PST 2014


On 11/14/2014 01:06 AM, Tycho Andersen wrote:
> Hi all,
> 
> I've been looking at p.haul a bit and thinking about how we might improve it for
> use with lxc. Based on my read of the code, I think there are two conceptual
> changes that would be needed:
> 
> 1. We need to add incremental dump support to lxc-checkpoint, so that p.haul
>    can shell out to lxc-checkpoint, instead of calling criu directly. This
>    means that p.haul doesn't need to know about lxc internals (e.g. how veths,
>    ttys are set up and configured) in order to do its thing.

Makes sense to me. This also means that p.haul should do the final dump via
lxc-checkpoint too, which in turn means that we should move more stuff into
the htype-s, not just atomic callbacks. But I'd like p.haul to keep the ability
to live-migrate just a task, w/o the lxc/docker/openvz backends. That would
still require p.haul to call criu directly.
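Roughly, an lxc htype could just shell out for the (pre-)dumps like in the sketch
below. This is only a sketch -- the --pre-dump/--predump-dir flags don't exist in
lxc-checkpoint yet and are assumed here; only -n/-D/-s are real today.

    # Sketch of an lxc htype that shells out to lxc-checkpoint for dumps
    # instead of calling criu directly.  The --pre-dump and --predump-dir
    # flags are assumptions -- they would have to be added to lxc-checkpoint.
    import subprocess

    class phaul_lxc_htype(object):
        def __init__(self, name):
            self.name = name                       # container name, e.g. "c1"

        def pre_dump(self, img_dir, prev_dir=None):
            cmd = ["lxc-checkpoint", "-n", self.name, "-D", img_dir,
                   "--pre-dump"]                    # assumed flag
            if prev_dir:
                cmd += ["--predump-dir", prev_dir]  # assumed flag
            subprocess.check_call(cmd)

        def final_dump(self, img_dir, prev_dir=None):
            cmd = ["lxc-checkpoint", "-n", self.name, "-D", img_dir, "-s"]
            if prev_dir:
                cmd += ["--predump-dir", prev_dir]  # assumed flag
            subprocess.check_call(cmd)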

> 2. We can get rid of any p.haul specific handling of cgroups for lxc, since
>    these can be restored via criu and lxc-checkpoint and lxc-checkpoint will
>    try to do the right thing w.r.t. multiple containers with the same name or
>    any other cgroup collisions.

Ah, yes :) Cgroups should have left p.haul a long time ago. They stay there
simply because nobody has had time to rip them out.
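Just for reference, letting criu itself handle cgroups on restore is basically
one flag; a minimal sketch (the image path and --restore-detached are only for
illustration):

    # Minimal sketch: let criu manage cgroups on restore instead of p.haul.
    # --manage-cgroups is criu's own option; the rest of this command line
    # is illustrative only.
    import subprocess

    subprocess.check_call(["criu", "restore",
                           "-D", "/path/to/images",
                           "--manage-cgroups",
                           "--restore-detached"])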


I would add one more thing. Consider the "reattach" problem you were solving
for plain lxc-restore. The restored tasks should become the lxc daemon's
children, not criu's, and we introduced --restore-sibling for that.

The same should be true for p.haul. Right now restored tasks become children
of the p.haul-service process. This is not good. We should make them children
of the lxc daemon again. From my perspective this can be implemented in two ways.

1. p.haul is not a utility, but a library that the lxc daemon links with and calls.
In this case criu becomes a child of the daemon, and with --restore-sibling
it restores the tasks as the daemon's kids.

2. p.haul-service is a utility; the lxc daemon fork()-s it, but then p.haul-service
forks criu as the lxc daemon's child again, so that criu, in turn, restores
tasks with --restore-sibling. This is hard to implement in python (CLONE_PARENT
doesn't exist there), but possible -- see the sketch below.
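To show the 2nd way is at least possible, here is a rough and fragile sketch of
CLONE_PARENT from python via the raw clone(2) syscall through ctypes. The syscall
number and argument order are x86_64-only assumptions, and the criu command line
is just an example:

    # Fragile sketch: fork a "sibling" (a child of our own parent, i.e. the
    # lxc daemon) from python via the raw clone(2) syscall, then exec criu.
    # SYS_clone and the argument order are x86_64-specific assumptions.
    import ctypes, os, signal

    CLONE_PARENT = 0x00008000
    SYS_clone = 56                      # x86_64 only

    libc = ctypes.CDLL(None, use_errno=True)

    def fork_sibling():
        # Raw clone with a NULL child stack behaves like fork(), and
        # CLONE_PARENT makes the new task a child of our parent.
        pid = libc.syscall(SYS_clone, CLONE_PARENT | signal.SIGCHLD, 0, 0, 0, 0)
        if pid < 0:
            raise OSError(ctypes.get_errno(), "clone failed")
        return pid

    if fork_sibling() == 0:
        # In the new task: restore the tree so it reattaches to the lxc daemon.
        os.execvp("criu", ["criu", "restore", "--restore-sibling",
                           "-D", "/path/to/images"])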

Thinking more about p.haul + docker, I tend to think the first approach is
worse, as it would require rewriting most of p.haul in C (as nobody links
with python) :)

What do you think?

> This would require a slight architectural change, since lxc-checkpoint is
> setuid and execs criu, rather than using the service. However, since the
> service mechanism is just to get around this problem, I think it should be ok.
> 
> Another question is what to do about rewriting the images. Based on our last
> thread, we decided that in-process (e.g. while criu is restoring) rewriting is
> most efficient, so we want to pass some --crit-flags to criu to tell it how to
> rewrite things. Is p.haul the thing that would decide how to rewrite e.g.
> cpusets, or would that be at some higher level?

Absolutely. The existing p.haul options handling allows specifying and pushing
arbitrary flags to arbitrary p.haul sub-modules. In particular, the criu-api one
can request any crit flags.
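Purely as a hypothetical illustration of such forwarding (the --criu-opt name
below is made up, not p.haul's real option):

    # Hypothetical illustration of pushing caller-supplied flags down to a
    # p.haul sub-module.  The --criu-opt name is invented for this example.
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument("--criu-opt", action="append", default=[],
                        help="extra flag to hand to the criu-facing module")
    args = parser.parse_args(["--criu-opt=--manage-cgroups",
                              "--criu-opt=--tcp-established"])

    # The criu-facing module would then just append these to its request.
    criu_extra_flags = list(args.criu_opt)
    print(criu_extra_flags)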

> Finally, a minor conceptual change is that it would be nice to be able to set
> up the channel that p.haul communicates over (e.g. a TLS socket or so), vs.
> just having p.haul do a connect() over a raw socket. If nothing else, I think
> we can just implement some other version of the `rpc_proxy` class to do this.

Yes, Ruslan is working on it. He implemented the ssh tunnel, but I agree that
there should be an option to use a pre-established channel.
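A pre-established channel variant could be as small as a drop-in transport class
that takes an already-connected (maybe TLS-wrapped) socket from the caller. A
hypothetical sketch, not the existing rpc_proxy code:

    # Hypothetical sketch of an rpc_proxy-style transport that reuses a
    # socket the caller has already set up (e.g. wrapped in TLS), instead
    # of doing connect() on a raw socket itself.
    import socket, ssl

    class preconn_rpc_proxy(object):
        def __init__(self, sk):
            self.sk = sk                        # already-connected socket

        def send(self, data):
            self.sk.sendall(data)

        def recv(self, length):
            return self.sk.recv(length)

    # Caller side: establish and secure the channel, then hand it over.
    raw = socket.create_connection(("dst.example.org", 12345))
    tls = ssl.wrap_socket(raw)                  # context-based wrapping on newer pythons
    rpc = preconn_rpc_proxy(tls)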

Also note one thing. Currently in p.haul there are two channels -- one for RPC
commands and the other for memory (pre-)dumps. The latter is the socket that
is fed to criu's pre-dump, dump and page-server actions. With a pre-established
channel we'll have to do something about that, and pushing control commands AND
memory over the same socket doesn't seem like a good solution to me.
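For reference, the memory channel is just the connection into criu's page-server,
so keeping it apart from RPC means two separate connections, roughly like this
(host, port and pid below are placeholders):

    # Rough sketch of the two separate channels: one socket for p.haul RPC,
    # one feeding criu's page-server.  Host, port and pid are placeholders.
    import socket, subprocess

    DST = "dst.example.org"

    # Channel 1: RPC/control commands.
    rpc_sk = socket.create_connection((DST, 12345))

    # Channel 2: memory pages.  The destination side runs something like
    #   criu page-server --images-dir <dir> --port 9876
    # and the source-side (pre-)dump is pointed at it:
    subprocess.check_call(["criu", "pre-dump", "-t", "1234",
                           "-D", "/path/to/images",
                           "--page-server", "--address", DST, "--port", "9876"])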

And one more thing about channels :) There are cases when we have to copy the
file system to the remote host. IIRC you used rsync in your demo :) So we will
need a 3rd channel. Can we make the channel set-up independent of the p.haul
caller, i.e. p.haul should have the ability to set up channels itself? Somehow.
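If the fs copy stays rsync-based, that 3rd channel is just one more thing the
caller may want to own; a trivial sketch with placeholder paths and host:

    # Trivial sketch of the 3rd, filesystem channel as an rsync run;
    # paths and the destination host are placeholders.
    import subprocess

    subprocess.check_call(["rsync", "-a", "--delete",
                           "/var/lib/lxc/c1/rootfs/",
                           "dst.example.org:/var/lib/lxc/c1/rootfs/"])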

> Do these sound reasonable? Does anyone have any thoughts?

Thanks for joining the p.haul efforts :)



