[CRIU] [PATCH 2/2] restore: correctly restore cgroup mounts inside a container

Thu Mar 24 07:14:00 PDT 2016

On Thu, Mar 24, 2016 at 05:11:31PM +0300, Pavel Emelyanov wrote:
> On 03/24/2016 05:00 PM, Tycho Andersen wrote:
> > On Thu, Mar 24, 2016 at 10:16:30AM +0300, Pavel Emelyanov wrote:
> >> On 03/24/2016 12:41 AM, Tycho Andersen wrote:
> >>> Before the nsroot= mount option, we were just getting lucky because the
> >>> cgroup superblocks "matched" when inspecting them from userspace, so we
> >>> were actually getting a bind mount from the host when migrating from within
> >>> cgroup namespaces.
> >>>
> >>> Instead, let's actually do a new mount (i.e. not a bind mount) for cgroup
> >>> namespaces. For this, we need two things:
> >>>
> >>> 1. to prepare the cgroup namespace (and thus the cgroups) before the mount
> >>>    ns, so when the mount() occurs it is relative to the right cgroup path.
> >>>
> >>> 2. not reject cgroup filesystems with no root. A cgroup ns mount looks
> >>>    like:
> >>>
> >>> 	 223 222 0:22 /lxc/unpriv /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd,nsroot=/lxc/unpriv
> >>>
> >>>    i.e. it has /lxc/unpriv as its root, and thus doesn't look rooted to CRIU.
> >>>    Let's allow cgroup mounts to be unrooted so we can deal with this.
> >>
> >> Am I right that the only problem here is that criu doesn't support mounting
> >> of anything that doesn't have root mount? If so, the correct fix would be
> >> to support _this_ by creating a temporary root mount, then umounting it at
> >> the very end.
> > 
> > Yes. One problem with creating this temporary mount is that any mount
> > of cgroupfs is created relative to the task's current cgroup ns root.
> > So we'd have to create this temporary mount, restore cgroup ns, and
> > then do the real mount restore, which would effectively just be
> > another completely new and unrelated mount. I don't see what the
> > temporary root mount buys us in this case.
> 
> Ah, I see. The mountpoint that is normally non-root mount turns out to
> effectively be such in case we call mount() from inside the cgroup ns, right?

Yes, exactly. I agree that the patch looks a little weird, though, so
I am open to changing this bit; I'm just not sure what the best
alternative is, if there is one.

Tycho
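
For readers following along, here is a minimal sketch (in Python, not
CRIU's actual C code) of why the mountinfo line quoted above "doesn't
look rooted": the fourth field, the root of the mount within its
filesystem, is /lxc/unpriv rather than /. The helper names
parse_mountinfo_line, looks_unrooted and nsroot are hypothetical and
chosen only for illustration; the nsroot= option is taken as it appears
in the quoted line.

```python
# Illustrative only: hypothetical helpers, not CRIU's parser.
# Format: <id> <parent> <maj:min> <root> <mountpoint> <opts> [optional...] - <fstype> <source> <super opts>

def parse_mountinfo_line(line):
    """Split one /proc/<pid>/mountinfo line into its named fields."""
    left, right = line.split(" - ", 1)
    pre = left.split()
    post = right.split()
    return {
        "mnt_id": int(pre[0]),
        "parent_id": int(pre[1]),
        "dev": pre[2],
        "root": pre[3],            # root of this mount within the filesystem
        "mountpoint": pre[4],
        "options": pre[5].split(","),
        "fstype": post[0],
        "source": post[1],
        "super_options": post[2].split(",") if len(post) > 2 else [],
    }

def looks_unrooted(entry):
    """True when the mount's root is not the filesystem root '/'."""
    return entry["root"] != "/"

def nsroot(entry):
    """Return the nsroot= superblock option if present, else None."""
    for opt in entry["super_options"]:
        if opt.startswith("nsroot="):
            return opt[len("nsroot="):]
    return None

# The cgroup ns mount line from the patch description.
LINE = ("223 222 0:22 /lxc/unpriv /sys/fs/cgroup/systemd "
        "rw,nosuid,nodev,noexec,relatime - cgroup cgroup "
        "rw,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,"
        "name=systemd,nsroot=/lxc/unpriv")

entry = parse_mountinfo_line(LINE)
print(entry["fstype"], entry["root"], looks_unrooted(entry), nsroot(entry))
```

For the quoted line, the root field and the nsroot= value are both
/lxc/unpriv, which is the case the patch proposes to accept for cgroup
filesystems instead of rejecting it as a mount with no root.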