[CRIU] [PATCH v2 2/2] restore: correctly restore cgroup mounts inside a container

Fri Mar 25 10:46:55 PDT 2016

On Fri, Mar 25, 2016 at 08:37:48PM +0300, Pavel Emelyanov wrote:
> On 03/24/2016 08:09 PM, Tycho Andersen wrote:
> > Before the nsroot= mount option, we were just getting lucky because the
> > cgroup superblocks "matched" when inspecting them from userspace, so we
> > were actually getting a bind mount from the host when migrating from within
> > cgroup namespaces.
> > 
> > Instead, let's actually do a new mount (i.e. not a bind mount) for cgroup
> > namespaces. For this, we need two things:
> > 
> > 1. to prepare the cgroup namespace (and thus the cgroups) before the mount
> >    ns, so when the mount() occurs it is relative to the right cgroup path.
> > 
> > 2. not reject cgroup filesystems with no root. A cgroup ns mount looks
> >    like:
> > 
> > 	 223 222 0:22 /lxc/unpriv /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd,nsroot=/lxc/unpriv
> > 
> >    i.e. it has /lxc/unpriv as its root, and thus doesn't look rooted to CRIU.
> >    Let's allow cgroup mounts to be unrooted so we can deal with this.
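For reference: in a mountinfo line, the root that makes this mount look unrooted is the fourth field, and the filesystem type is the first field after the " - " separator (see proc(5)). The C sketch below is not CRIU's own mountinfo parser. It extracts both fields and flags a cgroup mount whose root is not "/". The helper names parse_mountinfo_line() and is_unrooted_cgroup_mount() are hypothetical, and the parser ignores the \040-style escaping that mountinfo applies to special characters.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not CRIU code): extract the root (4th field) and the
 * filesystem type (first field after " - ") from one line of
 * /proc/<pid>/mountinfo. Returns 0 on success, -1 on a malformed line.
 * Escaped characters such as \040 are left as-is. */
int parse_mountinfo_line(const char *line,
                         char *root, size_t rootlen,
                         char *fstype, size_t fstypelen)
{
	char fmt[32];
	const char *sep;

	/* Skip mount ID, parent ID and major:minor, then read the root
	 * with a width limit, e.g. "%*d %*d %*u:%*u %255s". */
	snprintf(fmt, sizeof(fmt), "%%*d %%*d %%*u:%%*u %%%zus", rootlen - 1);
	if (sscanf(line, fmt, root) != 1)
		return -1;

	/* The number of optional fields varies; " - " marks their end. */
	sep = strstr(line, " - ");
	if (!sep)
		return -1;
	snprintf(fmt, sizeof(fmt), "%%%zus", fstypelen - 1);
	if (sscanf(sep + 3, fmt, fstype) != 1)
		return -1;
	return 0;
}

/* Hypothetical check: 1 if the line is a cgroup mount whose root is not "/"
 * (i.e. it would not look rooted), 0 otherwise or on a parse error. */
int is_unrooted_cgroup_mount(const char *line)
{
	char root[4096], fstype[64];

	if (parse_mountinfo_line(line, root, sizeof(root),
	                         fstype, sizeof(fstype)))
		return 0;
	return strcmp(fstype, "cgroup") == 0 && strcmp(root, "/") != 0;
}
```

Given the systemd line quoted above, parse_mountinfo_line() yields root "/lxc/unpriv" and fstype "cgroup", so is_unrooted_cgroup_mount() returns 1. A cgroup line whose root field is "/" returns 0.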
> 
> I have a suggestion for how to avoid the hackish checks in validate and can_mount_now().
> 
> 1. Add a ->read_img callback to fstype, called when reading mount point
>    images (collect_mnt_from_image).
> 2. For cgroupfs, check whether root_ns_mask contains CLONE_NEWCGROUP and,
>    if so, cut mi->root to "/", effectively turning the mount point into
>    an fsroot one.
> 
> and leave the hunk that moves tasks into cgroups earlier. Hopefully doing that
> before setting up namespaces will work, all the more so since we configure the
> namespaces at the very end.
> 
> What do you think?
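A minimal C sketch of the ->read_img idea. It assumes heavily simplified stand-ins for CRIU's struct fstype and struct mount_info (the real structures have many more fields) and a global root_ns_mask holding the namespace flags of the dumped root task. cgroup_read_img() and fixup_mount_from_image() are hypothetical names; the real hook point would be wherever collect_mnt_from_image builds each mount_info. CLONE_NEWCGROUP is the kernel's cgroup namespace clone flag, with a fallback definition for older headers.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdlib.h>
#include <string.h>

#ifndef CLONE_NEWCGROUP
#define CLONE_NEWCGROUP 0x02000000 /* kernel value of the cgroup ns flag */
#endif

struct mount_info;

/* Simplified, hypothetical stand-in for CRIU's struct fstype. */
struct fstype {
	const char *name;
	/* Called for each mount point read from the image; may fix it up. */
	int (*read_img)(struct mount_info *mi);
};

/* Simplified, hypothetical stand-in for CRIU's struct mount_info. */
struct mount_info {
	char *root;          /* root of the mount within its filesystem */
	char *mountpoint;
	struct fstype *fstype;
};

/* Stand-in for the mask of namespaces the dumped root task lives in. */
unsigned long root_ns_mask;

/* If the tree was dumped inside a cgroup namespace, treat the cgroup mount
 * as rooted at "/" so it is mounted fresh instead of bind-mounted. */
int cgroup_read_img(struct mount_info *mi)
{
	char *root;

	if (!(root_ns_mask & CLONE_NEWCGROUP))
		return 0; /* no cgroup ns: keep the image root unchanged */

	root = strdup("/");
	if (!root)
		return -1;
	free(mi->root);
	mi->root = root;
	return 0;
}

/* Hypothetical hook, called for each mount_info built from the image. */
int fixup_mount_from_image(struct mount_info *mi)
{
	if (mi->fstype && mi->fstype->read_img)
		return mi->fstype->read_img(mi);
	return 0;
}
```

With CLONE_NEWCGROUP set in root_ns_mask, a cgroup mount whose image root is /lxc/unpriv comes out with mi->root equal to "/", so later code can treat it as an fsroot mount. Without the flag, the image root is left untouched.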

Sounds good to me, I'll rework the patch and resend.

Thanks!

Tycho