[Devel] Re: [PATCH 0/3][V2] remove the ns_cgroup
Serge E. Hallyn
serge.hallyn at canonical.com
Mon Sep 27 13:36:58 PDT 2010
Quoting Andrew Morton (akpm at linux-foundation.org):
> On Mon, 27 Sep 2010 12:14:10 +0200
> Daniel Lezcano <daniel.lezcano at free.fr> wrote:
>
> > The ns_cgroup is a control group interacting with the namespaces.
> > When a new namespace is created, a corresponding cgroup is
> > automatically created too. The cgroup name is the pid of the process
> > that called 'unshare' or of the child created by 'clone'.
> >
> > This cgroup is tied to the namespace because it prevents a
> > process from escaping the control group, and it uses the post_clone
> > callback so that the child cgroup inherits the values of the parent cgroup.
> >
> > Unfortunately, the more we use this cgroup, the more problems we
> > face with it:
> >
> > (1) when a process unshares, the cgroup name may conflict with a previous
> > cgroup with the same pid, so unshare or clone returns -EEXIST
> >
> > (2) cgroup creation is out of control: an application may create
> > several namespaces, and the system will automatically create several
> > cgroups behind its back and leave them on the cgroupfs (e.g. a vrf
> > based on the network namespace).
> >
> > (3) the combination of (1) and (2) forces an administrator to regularly
> > check and clean up these cgroups.
> >
> > This patchset removes the ns_cgroup by adding a new flag to the cgroup
> > and a cgroupfs mount option. The flag enables copying the parent cgroup's
> > values when a child cgroup is created. We can then safely remove the
> > ns_cgroup, as this flag provides compatibility. We now have to manually
> > create the cgroup and add the task to it, which is consistent with the
> > cgroup framework.
>
> So this is a non-backward-compatible userspace-visible change?
Yes, it is.
Patch 1 is needed to let lxc and libvirt both control containers with
the same cgroup setup. Patch 3, however, isn't *necessary* for that. Daniel,
what do you think about holding off on patch 3?
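For reference, the manual setup the cover letter describes (create the
cgroup, then add the task to it) could look roughly like the sketch below.
This is only an illustration, not part of the patchset: the helper name
add_task_to_cgroup is made up, and it assumes the usual cgroup convention
that writing a pid to a cgroup's 'tasks' file moves that task there.

```python
import os


def add_task_to_cgroup(cgroup_root, name, pid):
    """Create cgroup `name` under `cgroup_root` and move `pid` into it.

    Hypothetical helper for illustration. `cgroup_root` is assumed to be
    the mount point of a cgroup hierarchy chosen by the administrator.
    """
    path = os.path.join(cgroup_root, name)
    # On a real cgroupfs, mkdir creates the cgroup and its control files.
    os.makedirs(path, exist_ok=True)
    # Writing a pid to 'tasks' attaches that task to the cgroup.
    with open(os.path.join(path, "tasks"), "w") as f:
        f.write(f"{pid}\n")
    return path
```

Something like add_task_to_cgroup("/cgroup", "container1", os.getpid())
would then be called by the container tool itself, with "/cgroup" standing
in for wherever the hierarchy happens to be mounted.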
> What are the implications of this?
The ns cgroup does two things which no other cgroup does: (1) it
moves tasks into a child cgroup any time they unshare or clone
a namespace, and (2) it prevents them from moving up to a parent
cgroup. The latter in particular makes it the only way, short of
using an LSM, to lock root into a cgroup until user namespaces
are further developed (*).
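To make those two behaviours concrete, here is a toy userspace model. It is
a simplification under stated assumptions, not the kernel code: the class
NsCgroupModel and its methods are invented for illustration, cgroups are
plain path strings, and "moving up" is approximated as any move to a cgroup
that is not the current one or one of its descendants. It also shows the
-EEXIST collision from problem (1) in the cover letter.

```python
import errno


class NsCgroupModel:
    """Toy model of ns_cgroup behaviour (illustrative only)."""

    def __init__(self):
        self.cgroups = {"/": set()}  # cgroup path -> member pids
        self.location = {}           # pid -> cgroup path

    def add_task(self, pid, path="/"):
        self.cgroups[path].add(pid)
        self.location[pid] = path

    def exit(self, pid):
        # The task goes away, but the cgroup it created stays behind.
        self.cgroups[self.location.pop(pid)].discard(pid)

    def _move(self, pid, target):
        self.cgroups[self.location[pid]].discard(pid)
        self.add_task(pid, target)

    def unshare(self, pid):
        # (1) A new namespace implies a new child cgroup named after the pid.
        child = self.location[pid].rstrip("/") + "/" + str(pid)
        if child in self.cgroups:
            raise OSError(errno.EEXIST, "cgroup name already taken", child)
        self.cgroups[child] = set()
        self._move(pid, child)
        return child

    def attach(self, pid, target):
        # (2) Only the current cgroup or one of its descendants is allowed.
        if target not in self.cgroups:
            raise FileNotFoundError(errno.ENOENT, "no such cgroup", target)
        current = self.location[pid]
        if target != current and not target.startswith(current.rstrip("/") + "/"):
            raise PermissionError(errno.EPERM, "cannot leave cgroup", current)
        self._move(pid, target)
```

In this model, a task that unshares from "/" lands in "/<pid>" and a later
attach back to "/" is refused. If that task exits and a new task with the
same pid unshares, the leftover "/<pid>" makes unshare fail with EEXIST.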
-serge
(*) - Maybe something to add to that new kernel todo list