[Devel] Re: [PATCH v5 3/3] cgroups: make procs file writable

KAMEZAWA Hiroyuki kamezawa.hiroyu at jp.fujitsu.com
Sun Dec 26 23:42:07 PST 2010


On Mon, 27 Dec 2010 02:21:23 -0500
Ben Blum <bblum at andrew.cmu.edu> wrote:
 
> > IIUC, what you request is 'don't call any kind of function may sleep'.
> > in can_attach() and attach() callback. That's impossible.
> > Memory cgroup will go to sleep because of calling memory reclaim.
> > 
> > Is tasklist_lock the only way ? Can you catch "new thread" event in cgroup_fork() ?
> > 
> > For example,
> > 
> > ==
> > void cgroup_fork(struct task_struct *child)
> > {
> > 	if (current->in_moving) {
> > 		add_to_waitqueue somewhere.
> > 	}
> >         task_lock(current);
> >         child->cgroups = current->cgroups;
> >         get_css_set(child->cgroups);
> >         task_unlock(current);
> >         INIT_LIST_HEAD(&child->cg_list);
> > }
> > ==
> > 
> > ==
> >  read_lock(&tasklist_lock);
> >  list_for_each_entry_rcu(tsk, &leader->thread_group, thread_group) {
> > 	tsk->in_moving = true;
> >  }
> >  read_unlock(&tasklist_lock);
> > 
> >  ->pre_attach()
> >  ->attach()
> > 
> >  read_unlock(&tasklist_lock);
> >  list_for_each_.....
> > 	tsk->in_moving_cgroup = false;
> > 
> >  wakeup threads in waitq.
> > ==
> > 
> > Ah yes, this will have some bug. But please avoid to take tasklist_lock() around
> > all pre_attach() and attach().
> > 
> > Thanks,
> > -Kame
> > 
> > 
> > 
> > 
> > 
> > Thanks,
> > -Kame
> 
> You misunderstand slightly: the callbacks are split into a
> once-per-thread function and a thread-independent function, and only the
> former is not allowed to sleep. All of memcg's attachment is thread-
> independent, so sleeping there is fine.
> 
Okay, then, problem is cpuset, which allocates memory.
(But I feel that limitation -never_sleep- is not very good.)

BTW, mem cgroup chesk mm_owner(mm) == p to decide to take mmap_sem().
Your code does moving "thread_group_leader()". IIUC, this isn't guaranteed to
be the same. I wonder we can see problem with some stupid userland.


> Also, the tasklist_lock isn't used to synchronize fork events; that's
> what threadgroup_fork_lock (an rwsem) is for. It's for protecting the
> thread-group list while iterating over it - so that a race with exec()
> (de_thread(), specifically) doesn't change the threadgroup state. It's
> basically never safe to iterate over leader->thread_group unless you're
> in a no-sleep section.
> 

Thank you for explanation.

At first, cpuset should modify settings around 'mm' only when the thread == mm->owner.
....is there any reason to allow all threads can affect the 'mm' ?

About nodemask, I feel what's required is pre_pre_attach() to cache required memory
before attaching as radix_tree_preload(). Then, subsys can prepare for creating
working area.

Hmm...but I wonder de_thread() should take threadgroup_fork_write_unlock().

I may not understand anything important but I feel taking tasklist_lock() is overkill.

Thanks, 
-Kame

_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers




More information about the Devel mailing list