[Devel] [PATCH RH7 2/2] proc/cpuset: do not show cpuset in CT

Fri May 20 02:16:03 PDT 2016

On Thu, May 19, 2016 at 06:32:01PM +0300, Pavel Tikhomirov wrote:
> After commit da53619c5d49 ("ve/cpuset: revert changes allowing to
> attach to empty cpusets") one cannot create a non-empty cpuset
> cgroup in a CT. Docker, which tries to create a cgroup for every
> visible controller, creates a cpuset cgroup for docker-ct and fails
> to add processes to it.
> 
> The cpuset.cpus cgroup files are by design not valid to use
> in our CTs, as they pin processes in a cgroup to a defined range of
> processors, and we don't want processes in a container to be able
> to pin themselves to the cpus they want. We have another mechanism to
> restrict a CT's cpu usage: the cpu.nr_cpus cgroup file, which allows
> balancing containers between cpus. So we faked cpuset.cpus in CT so
> that one cannot really pin processes in a CT. But that leaves all cpuset
> cgroups non-initialized, so we also can't attach processes to them. The
> same is valid for cpuset.mems, except we do not have ~nr_mems.
> 
> We can just hide the cpuset cgroup from /proc/self/cgroup and /proc/cgroups
> to protect it from being used in a CT (and also not mount it in
> libvzctl, which seems to happen automatically).

Yeah, libvzctl is smart enough to check /proc/cgroups when adding cgroup
bindmounts, which happens after attaching to ve cgroup, so it should
just work.
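
To illustrate the kind of check I mean, here is a rough userspace sketch,
not libvzctl's actual code. controller_listed() is a hypothetical name, and
the text is assumed to be the contents of /proc/cgroups: a '#' header line
followed by tab-separated rows that start with the subsystem name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Returns true if `name` appears as a subsystem in `text`.
 * `text` is assumed to hold the contents of /proc/cgroups.
 * Hypothetical helper, for illustration only.
 */
bool controller_listed(const char *text, const char *name)
{
	assert(text != NULL && name != NULL);

	size_t len = strlen(name);
	const char *line = text;

	while (line && *line) {
		/* A row matches when it starts with name followed by a tab. */
		if (*line != '#' && strncmp(line, name, len) == 0 &&
		    line[len] == '\t')
			return true;

		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return false;
}
```

With cpuset hidden from /proc/cgroups inside the CT, a check like this
returns false for "cpuset" and the bindmount is simply not added.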

> Docker not seeing
> cpuset will almost silently skip it and work as usual.

I hope nobody but docker intends to use cpuset inside a container.

> 
> *The other option is to make the fake cpuset.{cpus,mems} semi-fake, copying
> values from the root cgroup on write. But that can block us from changing
> these files on the host root cgroup, as they are hierarchical.

Nah, that's bad. I bet we would then end up reintroducing our ugly hacks
that allow attaching to empty cpusets.

BTW, I think we can now revert commit 7c57b078d025b ("sched: Port
diff-fairsched-cpuset-add-fake-cpuset-for-containers"). Please do.

> 
> https://jira.sw.ru/browse/PSBM-47280
> 
> Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
> ---
>  kernel/cgroup.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 5afeb59b..0a284f2 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -4953,6 +4953,11 @@ int proc_cgroup_show(struct seq_file *m, void *v)
>  		struct cgroup *cgrp;
>  		int count = 0;
>  
> +		if (!ve_is_super(get_exec_env()) &&
> +		    (root->subsys_mask & (1UL << cpuset_subsys_id)))
> +			/* do not show cpuset in CT */
> +			continue;
> +
>  		seq_printf(m, "%d:", root->hierarchy_id);
>  		for_each_subsys(root, ss)
>  			seq_printf(m, "%s%s", count++ ? "," : "", ss->name);
> @@ -4997,6 +5002,10 @@ static int proc_cgroupstats_show(struct seq_file *m, void *v)
>  
>  		if (ss == NULL)
>  			continue;
> +		if (!ve_is_super(get_exec_env()) &&
> +		    (ss->root->subsys_mask & (1UL << cpuset_subsys_id)))
> +			/* do not show cpuset in CT */
> +			continue;

This check would be better off in a helper function. It's more flexible
that way: you can extend the black list by patching one place instead of
two.
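
Something along these lines, as a userspace sketch only. The names
hidden_in_ct(), subsys_bit() and the enum are hypothetical stand-ins, and
in_host takes the place of ve_is_super(get_exec_env()); the real helper
would live in kernel/cgroup.c and use the kernel's own ids.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's *_subsys_id constants. */
enum subsys_id { CPUSET_ID, CPU_ID, MEMORY_ID, SUBSYS_COUNT };

/* Bit for one subsystem, as in (1UL << cpuset_subsys_id). */
unsigned long subsys_bit(enum subsys_id id)
{
	assert(id < SUBSYS_COUNT);
	return 1UL << id;
}

/*
 * True when a hierarchy with this subsys_mask should be skipped.
 * in_host stands in for ve_is_super(get_exec_env()).
 */
bool hidden_in_ct(bool in_host, unsigned long subsys_mask)
{
	/* The black list: extend it here, in one place. */
	const unsigned long hidden = subsys_bit(CPUSET_ID);

	if (in_host)
		return false;
	return (subsys_mask & hidden) != 0;
}
```

Both proc_cgroup_show() and proc_cgroupstats_show() would then call the
helper and "continue" when it returns true.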

>  		num = _cg_virtualized(ss->root->number_of_cgroups);
>  		seq_printf(m, "%s\t%d\t%d\t%d\n",
>  			   ss->name, ss->root->hierarchy_id,