[Devel] [PATCH RHEL7 COMMIT] cgroup: switch from cgroup_mutex to cgroup_root_mutex in proc_cgroup_show
Konstantin Khorenko
khorenko at virtuozzo.com
Mon Jun 6 18:32:05 MSK 2022
The commit is pushed to "branch-rh7-3.10.0-1160.62.1.vz7.187.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1160.62.1.vz7.187.3
------>
commit 129c9c7181a49aeca45f46ae2ccc731ac02fb32e
Author: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
Date: Wed Jun 1 11:29:32 2022 +0300
cgroup: switch from cgroup_mutex to cgroup_root_mutex in proc_cgroup_show
We saw systemd reading /proc/<pid>/cgroup quite a lot, and we also see
that cgroup_mutex is contended enough to make its waiters wait for
several seconds.
Assuming that cgroup_root_mutex should be less contended, as roots
change rarely, let's switch /proc/<pid>/cgroup to it.
Why can we do this?
- roots list is protected by both cgroup_mutex and cgroup_root_mutex
- without cgroup_mutex, ve_hide_cgroups() can be racy when the current
task is moved to another ve cgroup, but that's not too bad
- for a root from the roots list (not the "rootnode" root),
root->subsys_list is also protected by both mutexes
- __task_cgroup_from_root() iterates over task->cgroups->cg_links while
holding css_set_lock, which should be safe, and to protect the found
cgroup from disappearing after we release css_set_lock, we take
rcu_read_lock() (see the sketch right after this list).
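Below is a minimal sketch of that pattern (hypothetical obj/obj_lookup
names, not part of the patch): the entry is found under a spinlock that
plays the role of css_set_lock, and the caller's rcu_read_lock() keeps
it valid once the spinlock is dropped, assuming entries are freed via
call_rcu():

#include <linux/list.h>
#include <linux/rcupdate.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(obj_lock);	/* plays the role of css_set_lock */
static LIST_HEAD(obj_list);

struct obj {
	struct list_head node;		/* removed with list_del_rcu() */
	struct rcu_head rcu_head;	/* freed via call_rcu() */
};

/* Caller must hold rcu_read_lock(), as proc_cgroup_show() does below. */
static struct obj *obj_lookup(void)
{
	struct obj *o, *res = NULL;

	spin_lock(&obj_lock);
	list_for_each_entry(o, &obj_list, node) {
		res = o;		/* pick the matching entry */
		break;
	}
	spin_unlock(&obj_lock);
	/* res cannot be freed before the caller's rcu_read_unlock() */
	return res;
}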
khorenko@ notes:
- __put_css_set() also takes rcu_read_lock() before decrementing cgrp->count
and states this is enough to keep the cgroup alive.
- the comment in mm/memcontrol.c (and Pasha says the callstack is still valid):
* Cgroup destruction stack:
* cgroup_offline_fn()
* offline_css()
* list_del_rcu()
* dput()
* ...
* cgroup_diput()
* call_rcu(&cgrp->rcu_head, cgroup_free_rcu)
So this callstack also proves that holding rcu_read_lock() is enough to
keep the cgroup alive.
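To make that concrete, here is a self-contained sketch (hypothetical
foo names, simplified from the callstack above) of the call_rcu()
freeing pattern it ends in: memory released through call_rcu() is not
reclaimed while any reader is inside an RCU read-side section, so the
reader below may safely use the object even if release runs
concurrently:

#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/string.h>

struct foo {
	struct rcu_head rcu_head;
	char path[64];
};

static struct foo __rcu *foo_ptr;	/* RCU-protected pointer */

static void foo_free_rcu(struct rcu_head *head)
{
	kfree(container_of(head, struct foo, rcu_head));
}

/* Corresponds to cgroup_diput() -> call_rcu(&cgrp->rcu_head, ...):
 * unpublish the object, then free it only after a grace period. */
static void foo_release(void)
{
	struct foo *f = rcu_dereference_protected(foo_ptr, 1);

	rcu_assign_pointer(foo_ptr, NULL);
	if (f)
		call_rcu(&f->rcu_head, foo_free_rcu);
}

/* Reader side: *f stays valid until rcu_read_unlock(), even if
 * foo_release() is called in the meantime. */
static void foo_read_path(char *buf, size_t len)
{
	struct foo *f;

	rcu_read_lock();
	f = rcu_dereference(foo_ptr);
	if (f)
		strlcpy(buf, f->path, len);
	rcu_read_unlock();
}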
https://jira.sw.ru/browse/PSBM-139206
Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
---
kernel/cgroup.c | 25 +++++++++++++++++++------
1 file changed, 19 insertions(+), 6 deletions(-)
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 64f4227b2305..1f5d80130070 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -705,13 +705,14 @@ static inline bool css_has_host_cgroups(struct css_set *css_set)
* Return the cgroup for "task" from the given hierarchy. Must be
* called with cgroup_mutex held.
*/
-static struct cgroup *task_cgroup_from_root(struct task_struct *task,
- struct cgroupfs_root *root)
+static struct cgroup *__task_cgroup_from_root(struct task_struct *task,
+ struct cgroupfs_root *root,
+ int nocgmtx)
{
struct css_set *css;
struct cgroup *res = NULL;
- BUG_ON(!mutex_is_locked(&cgroup_mutex));
+ BUG_ON(!nocgmtx && !mutex_is_locked(&cgroup_mutex));
read_lock(&css_set_lock);
/*
* No need to lock the task - since we hold cgroup_mutex the
@@ -736,6 +737,12 @@ static struct cgroup *task_cgroup_from_root(struct task_struct *task,
return res;
}
+static struct cgroup *task_cgroup_from_root(struct task_struct *task,
+ struct cgroupfs_root *root)
+{
+ return __task_cgroup_from_root(task, root, false);
+}
+
/*
* There is one global cgroup mutex. We also require taking
* task_lock() when dereferencing a task's cgroup subsys pointers.
@@ -5578,7 +5585,8 @@ int proc_cgroup_show(struct seq_file *m, void *v)
retval = 0;
- mutex_lock(&cgroup_mutex);
+ /* without cgroup_mutex we must protect roots list */
+ mutex_lock(&cgroup_root_mutex);
for_each_active_root(root) {
struct cgroup_subsys *ss;
@@ -5594,8 +5602,13 @@ int proc_cgroup_show(struct seq_file *m, void *v)
seq_printf(m, "%sname=%s", count ? "," : "",
root->name);
seq_putc(m, ':');
- cgrp = task_cgroup_from_root(tsk, root);
+
+ /* without cgroup_mutex we take rcu to make cgrp valid */
+ rcu_read_lock();
+ cgrp = __task_cgroup_from_root(tsk, root, true);
retval = cgroup_path_ve(cgrp, buf, PAGE_SIZE);
+ rcu_read_unlock();
+
if (retval < 0)
goto out_unlock;
seq_puts(m, buf);
@@ -5603,7 +5616,7 @@ int proc_cgroup_show(struct seq_file *m, void *v)
}
out_unlock:
- mutex_unlock(&cgroup_mutex);
+ mutex_unlock(&cgroup_root_mutex);
put_task_struct(tsk);
out_free:
kfree(buf);