[Devel] Re: [PATCH v5 3/3] cgroups: make procs file writable
Daisuke Nishimura
nishimura at mxp.nes.nec.co.jp
Sun Dec 26 16:53:53 PST 2010
On Fri, 24 Dec 2010 21:55:08 -0500
Ben Blum <bblum at andrew.cmu.edu> wrote:
> On Thu, Dec 23, 2010 at 10:33:52PM -0500, Ben Blum wrote:
> > On Thu, Dec 16, 2010 at 12:26:03AM -0800, Andrew Morton wrote:
> > > Patches have gone a bit stale, sorry. Refactoring in
> > > kernel/cgroup_freezer.c necessitates a refresh and retest please.
> >
> > commit 53feb29767c29c877f9d47dcfe14211b5b0f7ebd changed a bunch of stuff
> > in kernel/cpuset.c to allocate nodemasks with NODEMASK_ALLOC (which
> > wraps kmalloc) instead of on the stack.
> >
> > 1. All these functions have 'void' return values, indicating that
> > calling them them must not fail. Sure there are bailout cases, but no
> > semblance of cross-function error propagation. Most importantly,
> > cpuset_attach is a subsystem callback, which MUST not fail given the
> > way it's used in cgroups, so relying on kmalloc is not safe.
> >
> > 2. I'm working on a patch series which needs to hold tasklist_lock
> > across ss->attach callbacks (in cpuset_attach's "if (threadgroup)"
> > case, this is how safe traversal of tsk->thread_group will be
> > ensured), and kmalloc isn't allowed while holding a spin-lock.
> >
> > Why do we need heap-allocation here at all? In each case their scope is
> > exactly the function's scope, and neither the commit nor the surrounding
> > patch series give any explanation. I'd like to revert the patch if
> > possible.
> >
> > cc'ing Miao Xie (author) and David Rientjes (acker).
> >
> > -- Ben
>
> Well even with the proposed solution to this there is still another
> problem that I see - that of mmap_sem. cpuset_attach() calls into
> mpol_rebind_mm() and do_migrate_pages(), which take mm->mmap_sem for
> writing and reading respectively. This is going to conflict with
> tasklist_lock... but moreover, the memcontrol subsys also touches the
> task's mm->mmap_sem, holding onto it between mem_cgroup_can_attach() and
> mem_cgroup_move_task() - as of b1dd693e5b9348bd68a80e679e03cf9c0973b01b.
>
> So we have (currently, even without my patches):
>
> cgroup_attach_task
> (1) cpuset_can_attach
> (2) mem_cgroup_can_attach
> - down_read(&mm->mmap_sem);
> (3) cpuset_attach
> - mpol_rebind_mm
> - down_write(&mm->mmap_sem);
> - up_write(&mm->mmap_sem);
> - cpuset_migrate_mm
> - do_migrate_pages
> - down_read(&mm->mmap_sem);
> - up_read(&mm->mmap_sem);
> (4) mem_cgroup_move_task
> - mem_cgroup_clear_mc
> - up_read(...);
>
hmm, nice catch.
> Is there some interdependency I'm missing here that guarantees recursive
> locking/deadlock will be avoided? It all looks like typical-case code.
>
Unfortunately, not.
I couldn't hit this because I mount all subsystems onto different
mount points in my environment.
> I think we should move taking the mmap_sem all the way up into
> cgroup_attach_task and cgroup_attach_proc; it will be held for writing
> the whole time. I don't quite understand the mempolicy stuff but maybe
> there can be ways to use mpol_rebind_mm and do_migrate_pages when the
> lock is already held.
>
I agree.
Another solution(just an idea): avoid enabling both "move_charge" feature of memcg
and "memory_migrate" of cpuset at the same time iff they are mounted
under the same mount point. But, hmm... it's not a good idea to make a subsystem
take account of another subsystem, IMHO.
Thanks,
Daisuke Nishimura.
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
More information about the Devel
mailing list