[Devel] Re: [RFC] cpuset update_cgroup_cpus_allowed
David Rientjes
rientjes at google.com
Mon Oct 15 11:49:02 PDT 2007
On Mon, 15 Oct 2007, Paul Jackson wrote:
> --- 2.6.23-mm1.orig/kernel/cpuset.c 2007-10-14 22:24:56.268309633 -0700
> +++ 2.6.23-mm1/kernel/cpuset.c 2007-10-14 22:34:52.645364388 -0700
> @@ -677,6 +677,64 @@ done:
> }
>
> /*
> + * update_cgroup_cpus_allowed(cont, cpus)
> + *
> + * Keep looping over the tasks in cgroup 'cont', up to 'ntasks'
> + * tasks at a time, setting each task->cpus_allowed to 'cpus',
> + * until all tasks in the cgroup have that cpus_allowed setting.
> + *
> + * The 'set_cpus_allowed()' call cannot be made while holding the
> + * css_set_lock lock embedded in the cgroup_iter_* calls, so we stash
> + * some task pointers, in the tasks[] array on the stack, then drop
> + * that lock (cgroup_iter_end) before looping over the stashed tasks
> + * to update their cpus_allowed fields.
> + *
> + * Making the const 'ntasks' larger would use more stack space (bad),
> + * and reduce the number of cgroup_iter_start/cgroup_iter_end calls
> + * (good). But perhaps more importantly, it could allow any bugs
> + * lurking in the 'need_repeat' looping logic to remain hidden longer.
> + * So keep ntasks rather small, to ensure any bugs in this loop logic
> + * are exposed quickly.
> + */
> +static void update_cgroup_cpus_allowed(struct cgroup *cont, cpumask_t *cpus)
> +{
> + int need_repeat = true;
> +
> + while (need_repeat) {
> + struct cgroup_iter it;
> + const int ntasks = 10;
> + struct task_struct *tasks[ntasks];
> + struct task_struct **p, **q;
> +
> + need_repeat = false;
> + p = tasks;
> +
> + cgroup_iter_start(cont, &it);
> + while (1) {
> + struct task_struct *t;
> +
> + t = cgroup_iter_next(cont, &it);
> + if (!t)
> + break;
> + if (cpus_equal(*cpus, t->cpus_allowed))
> + continue;
By making this cpus_equal() and not cpus_intersects(), you're trying to
make sure that t->cpus_allowed is always equal to *cpus for each task in
the iterator.
> + if (p == tasks + ntasks) {
> + need_repeat = true;
> + break;
> + }
> + get_task_struct(t);
> + *p++ = t;
> + }
> + cgroup_iter_end(cont, &it);
> +
> + for (q = tasks; q < p; q++) {
> + set_cpus_allowed(*q, *cpus);
> + put_task_struct(*q);
> + }
> + }
> +}
Yet by not doing any locking here to prevent a cpu from being
hot-unplugged, you can race and allow the hot-unplug event to happen
before calling set_cpus_allowed(). That makes this entire function a
no-op with set_cpus_allowed() returning -EINVAL for every call, which
isn't caught, and no error is reported to userspace.
Now all the tasks in the cpuset have an inconsistent state with respect to
their p->cpuset->cpus_allowed, because that was already updated in
update_cpumask(). When userspace checks that value via the 'cpus' file,
this is the value returned which is actually not true at all for any of
the tasks in 'tasks'.
David
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
More information about the Devel
mailing list