[Devel] [PATCH RH7 2/2] proc/cpuset: do not show cpuset in CT
Vladimir Davydov
vdavydov at virtuozzo.com
Fri May 20 02:16:03 PDT 2016
On Thu, May 19, 2016 at 06:32:01PM +0300, Pavel Tikhomirov wrote:
> After commit da53619c5d49 ("ve/cpuset: revert changes allowing to
> attach to empty cpusets") one can not create non-empty cpuset
> cgroup in CT. And docker which tries to create cgroup for every
> visible controller creates cpuset cgroup for docker-ct and fails
> to add processes to it.
>
> The cpuset.cpus cgroup files are by design not valid for use
> in our CTs, as they pin processes in a cgroup to a defined range of
> processors, but we don't want processes in a container to be able
> to pin themselves to the cpus they want. We have another mechanism to
> restrict a CT's cpu usage - the cpu.nr_cpus cgroup file, which allows
> balancing containers between cpus. So we faked cpuset.cpus in CT so that
> one can not really pin processes in CT. But that leaves all cpuset cgroups
> non-initialized, so we also can't attach processes to them. The same
> is valid for cpuset.mems, except we do not have ~nr_mems.
>
> We can just hide the cpuset cgroup from /proc/self/cgroup and /proc/cgroups
> to protect it from being used in CT (and also not mount it in
> libvzctl, which seems to happen automatically).
Yeah, libvzctl is smart enough to check /proc/cgroups when adding cgroup
bindmounts, which happens after attaching to ve cgroup, so it should
just work.
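
For illustration only, a minimal user-space sketch of that kind of check. This is not libvzctl's actual code; the helper name controller_listed() is hypothetical, and it only assumes the usual /proc/cgroups layout (a '#' header line, then one tab-separated line per subsystem):

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if controller `name` appears as the
 * first field of some line in a /proc/cgroups-style listing held in `buf`.
 * Lines starting with '#' (the header) are skipped.
 */
static bool controller_listed(const char *buf, const char *name)
{
	const char *line = buf;

	while (*line) {
		char subsys[64];
		const char *eol = strchr(line, '\n');

		/* Compare the first whitespace-delimited token exactly. */
		if (*line != '#' && sscanf(line, "%63s", subsys) == 1 &&
		    strcmp(subsys, name) == 0)
			return true;
		if (!eol)
			break;
		line = eol + 1;
	}
	return false;
}
```

With cpuset hidden from /proc/cgroups inside the CT, such a check returns false for "cpuset" and the caller skips the bindmount.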
> Docker not seeing
> cpuset will almost silently skip it and work as usual.
I hope nobody but docker has any intention of using cpuset inside a container.
>
> *Another option is to make the fake cpuset.{cpus,mems} semi-fake, copying
> values from the root cgroup on write. But that can prevent us from changing
> these files in the host root cgroup, as they are hierarchical.
Nah, that's bad. I bet we would end up reintroducing our ugly hacks
allowing to attach to empty cpusets then.
BTW, I think we can now revert commit 7c57b078d025b ("sched: Port
diff-fairsched-cpuset-add-fake-cpuset-for-containers"). Please do.
>
> https://jira.sw.ru/browse/PSBM-47280
>
> Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
> ---
> kernel/cgroup.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 5afeb59b..0a284f2 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -4953,6 +4953,11 @@ int proc_cgroup_show(struct seq_file *m, void *v)
>  		struct cgroup *cgrp;
>  		int count = 0;
>
> +		if (!ve_is_super(get_exec_env()) &&
> +		    (root->subsys_mask & (1UL << cpuset_subsys_id)))
> +			/* do not show cpuset in CT */
> +			continue;
> +
>  		seq_printf(m, "%d:", root->hierarchy_id);
>  		for_each_subsys(root, ss)
>  			seq_printf(m, "%s%s", count++ ? "," : "", ss->name);
> @@ -4997,6 +5002,10 @@ static int proc_cgroupstats_show(struct seq_file *m, void *v)
>
>  		if (ss == NULL)
>  			continue;
> +		if (!ve_is_super(get_exec_env()) &&
> +		    (ss->root->subsys_mask & (1UL << cpuset_subsys_id)))
> +			/* do not show cpuset in CT */
> +			continue;
This check would be better placed in a helper function. It is more
flexible that way: you can then extend the black list by patching just
one place instead of two.
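
Something along these lines, as a rough user-space sketch. The helper name cgroup_root_hidden_in_ve(), the mask macro and the subsystem ids below are hypothetical stand-ins; in the kernel the ids would be the real *_subsys_id constants and the caller would pass !ve_is_super(get_exec_env()):

```c
#include <stdbool.h>

/* Stand-in subsystem ids for illustration only; not the kernel's enum. */
enum { CPU_SUBSYS_ID, CPUSET_SUBSYS_ID, MEMORY_SUBSYS_ID };

/* Subsystems hidden inside a CT; the black list is extended here only. */
#define VE_HIDDEN_SUBSYS_MASK (1UL << CPUSET_SUBSYS_ID)

/*
 * Hypothetical helper: returns true if a hierarchy with the given
 * subsys_mask must not be shown to a task running inside a container.
 */
static bool cgroup_root_hidden_in_ve(unsigned long subsys_mask,
				     bool in_container)
{
	return in_container && (subsys_mask & VE_HIDDEN_SUBSYS_MASK);
}
```

Both proc_cgroup_show() and proc_cgroupstats_show() would then reduce to a single "if (hidden) continue;" on the result of that one helper.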
>  		num = _cg_virtualized(ss->root->number_of_cgroups);
>  		seq_printf(m, "%s\t%d\t%d\t%d\n",
>  			   ss->name, ss->root->hierarchy_id,