[Devel] Re: [PATCH 0/3][V2] remove the ns_cgroup
Daniel Lezcano
daniel.lezcano at free.fr
Mon Sep 27 13:43:18 PDT 2010
On 09/27/2010 09:57 PM, Andrew Morton wrote:
> On Mon, 27 Sep 2010 12:14:10 +0200
> Daniel Lezcano <daniel.lezcano at free.fr> wrote:
>
>
>> The ns_cgroup is a control group interacting with the namespaces.
>> When a new namespace is created, a corresponding cgroup is
>> automatically created too. The cgroup name is the pid of the process
>> who did 'unshare' or the child of 'clone'.
>>
>> This cgroup is tied to the namespace because it prevents a process
>> from escaping the control group, and it uses the post_clone callback
>> so that the child cgroup inherits the values of the parent cgroup.
>>
>> Unfortunately, the more we use this cgroup, the more problems we face
>> with it:
>>
>> (1) when a process unshares, the cgroup name may conflict with a previous
>> cgroup with the same pid, so unshare or clone returns -EEXIST
>>
>> (2) the cgroup creation is out of control because an application may
>> create several namespaces, and the system will automatically create
>> several cgroups behind its back and leave them on the cgroupfs (e.g. a vrf
>> based on the network namespace).
>>
>> (3) the mix of (1) and (2) forces an administrator to regularly check and
>> clean up these cgroups.
>>
>> This patchset removes the ns_cgroup by adding a new flag to the cgroup
>> and to the cgroupfs mount options. It enables copying the parent cgroup's
>> values when a child cgroup is created. We can then safely remove the
>> ns_cgroup as this flag provides compatibility. We now have to manually
>> create a cgroup and add the task to it, which is consistent with the
>> cgroup framework.
>>
> So this is a non-backward-compatible userspace-visible change?
>
> What are the implications of this?
>
An application will have to create a directory in the cgroup hierarchy
and write the pid to the tasks file, instead of assuming the cgroup is
automatically created by unshare/clone. The cgroupfs should be
mounted with the 'clone_children' option set.
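
As a rough illustration of the manual step described above, here is a
minimal sketch in Python. The helper name join_new_cgroup, the mount
point /cgroup and the cgroup name "container0" are hypothetical, chosen
only for the example; it assumes the cgroupfs is already mounted (with
'clone_children' if the parent's values should be copied).

```python
import os


def join_new_cgroup(cgroup_root, name, pid=None):
    """Create a child cgroup directory and move a task into it.

    cgroup_root: directory where the cgroupfs is mounted (assumption,
                 e.g. a hypothetical /cgroup mount point).
    name:        name chosen by the caller for the new cgroup.
    pid:         task to move; defaults to the calling process.
    """
    if pid is None:
        pid = os.getpid()

    # Creating the directory is what creates the cgroup. The name is
    # picked explicitly by the caller, not derived from a pid.
    path = os.path.join(cgroup_root, name)
    os.makedirs(path, exist_ok=True)

    # Writing the pid to the 'tasks' file attaches the task to the cgroup.
    with open(os.path.join(path, "tasks"), "w") as tasks:
        tasks.write(str(pid))

    return path
```

A caller would then do something like join_new_cgroup("/cgroup",
"container0") before or after the unshare/clone, typically as root.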
AFAIK, I am the only one, with the lxc tools, using the ns_cgroup, and I
will be happy to get rid of it. People are used to changing the default
cgroup mount options to mount all the subsystems except the ns_cgroup
(for example, this is needed for libvirt, if I am not wrong). IMHO, very
few people will be impacted, if anyone.
-- Daniel