[Devel] Re: [PATCH v5 3/3] cgroups: make procs file writable
Ben Blum
bblum at andrew.cmu.edu
Fri Dec 24 20:24:33 PST 2010
On Thu, Dec 23, 2010 at 10:33:52PM -0500, Ben Blum wrote:
> On Thu, Dec 16, 2010 at 12:26:03AM -0800, Andrew Morton wrote:
> > Patches have gone a bit stale, sorry. Refactoring in
> > kernel/cgroup_freezer.c necessitates a refresh and retest please.
>
> commit 53feb29767c29c877f9d47dcfe14211b5b0f7ebd changed a bunch of stuff
> in kernel/cpuset.c to allocate nodemasks with NODEMASK_ALLOC (which
> wraps kmalloc) instead of on the stack.
>
> 1. All these functions have 'void' return values, indicating that
> calling them them must not fail. Sure there are bailout cases, but no
> semblance of cross-function error propagation. Most importantly,
> cpuset_attach is a subsystem callback, which MUST not fail given the
> way it's used in cgroups, so relying on kmalloc is not safe.
>
> 2. I'm working on a patch series which needs to hold tasklist_lock
> across ss->attach callbacks (in cpuset_attach's "if (threadgroup)"
> case, this is how safe traversal of tsk->thread_group will be
> ensured), and kmalloc isn't allowed while holding a spin-lock.
>
> Why do we need heap-allocation here at all? In each case their scope is
> exactly the function's scope, and neither the commit nor the surrounding
> patch series give any explanation. I'd like to revert the patch if
> possible.
>
> cc'ing Miao Xie (author) and David Rientjes (acker).
>
> -- Ben
>
By the way, there is still another issue on top of this.
cpuset_attach
-> do_migrate_pages
-> migrate_prep
-> lru_add_drain_all
-> __alloc_percpu, which does GFP_KERNEL, which can sleep.
This will be no good if we call ss->attach with tasklist_lock held. A
friend points out that even without the sleeping/etc you still don't
want to be doing a memory migration while holding a spinlock.
I think the best solution for this would be to add another subsystem
callback, "attach_thread", which would do the cheap once-per-thread
operations and be called under tasklist_lock. It'd look like this:
read_lock(tasklist_lock)
for each thread (c) {
cgroup_task_migrate(c)
for each subsys (ss) {
ss->attach_thread(c)
}
}
read_unlock(tasklist_lock)
for each subsys (ss) {
ss->attach(leader)
}
For cpuset, this will need to keep the nodemask data between the
attach_thread and attach functions, so the nodemasks will need to be
global instead of local.
-- Ben
P.S. when i'm iterating over tsk->thread_group while using tasklist_lock
to protect it instead of rcu_read_lock, should i use list_for_each_entry
or list_for_each_entry_rcu? it has been a while since i wrote that bit.
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
More information about the Devel
mailing list