[Devel] [PATCH RHEL7 COMMIT] cgroup: switch from cgroup_mutex to cgroup_root_mutex in proc_cgroup_show

Konstantin Khorenko khorenko at virtuozzo.com
Mon Jun 6 18:32:05 MSK 2022


The commit is pushed to "branch-rh7-3.10.0-1160.62.1.vz7.187.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1160.62.1.vz7.187.3
------>
commit 129c9c7181a49aeca45f46ae2ccc731ac02fb32e
Author: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
Date:   Wed Jun 1 11:29:32 2022 +0300

    cgroup: switch from cgroup_mutex to cgroup_root_mutex in proc_cgroup_show
    
    We see systemd reading /proc/<pid>/cgroup quite a lot, and we also see
    that cgroup_mutex is contended enough to make its waiters wait for
    several seconds.
    
    Assuming that cgroup_root_mutex is less contended, since hierarchy
    roots change rarely, let's switch /proc/<pid>/cgroup to it.
    
    Why can we do this?
     - roots list is protected by both cgroup_mutex and cgroup_root_mutex
     - without cgroup_mutex, ve_hide_cgroups() can race with the current
       task being moved to another ve cgroup, but that's not too bad
     - for a root from the roots list (i.e. not the "rootnode" root),
       root->subsys_list is also protected by both mutexes
     - __task_cgroup_from_root() iterates over task->cgroups->cg_links while
       holding css_set_lock, which should be safe, and to protect the found
       cgroup from disappearing after we release css_set_lock, we take
       rcu_read_lock() (the walk is sketched just below).
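    
    For reference, the lookup in the last point boils down to the
    following walk, a sketch abridged from the 3.10-era upstream
    kernel/cgroup.c (the vz kernel may differ in details):
    
	read_lock(&css_set_lock);
	css = task->cgroups;
	if (css == &init_css_set) {
		/* task uses the default css_set: it sits in the root
		 * cgroup of every hierarchy */
		res = &root->top_cgroup;
	} else {
		struct cg_cgroup_link *link;
		/* find this css_set's cgroup on the given hierarchy */
		list_for_each_entry(link, &css->cg_links, cg_link_list) {
			struct cgroup *c = link->cgrp;
			if (c->root == root) {
				res = c;
				break;
			}
		}
	}
	read_unlock(&css_set_lock);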
    
    khorenko@ notes:
     - __put_css_set() also takes rcu_read_lock() before decrementing
       cgrp->count, and the comment there states this is enough to keep the
       cgroup alive (an abridged quote follows these notes).
     - a comment in mm/memcontrol.c (and Pasha says this callstack is still
       valid):
             * Cgroup destruction stack:
             *   cgroup_offline_fn()
             *    offline_css()
             *    list_del_rcu()
             *    dput()
             *    ...
             *     cgroup_diput()
             *      call_rcu(&cgrp->rcu_head, cgroup_free_rcu)
    
       So this callstack also proves that holding rcu_read_lock() is enough
       to keep the cgroup alive.
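    
    The relevant part of __put_css_set() mentioned above, a sketch
    abridged from the 3.10-era upstream code (the vz kernel may differ
    in details):
    
	/*
	 * We may not be holding cgroup_mutex, and if cgrp->count is
	 * dropped to 0 the cgroup can be destroyed at any time, hence
	 * rcu_read_lock() is used to keep it alive.
	 */
	rcu_read_lock();
	if (atomic_dec_and_test(&cgrp->count) &&
	    notify_on_release(cgrp)) {
		if (taskexit)
			set_bit(CGRP_RELEASABLE, &cgrp->flags);
		check_for_release(cgrp);
	}
	rcu_read_unlock();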
    
    https://jira.sw.ru/browse/PSBM-139206
    
    Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
---
 kernel/cgroup.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 64f4227b2305..1f5d80130070 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -705,13 +705,14 @@ static inline bool css_has_host_cgroups(struct css_set *css_set)
  * Return the cgroup for "task" from the given hierarchy. Must be
  * called with cgroup_mutex held.
  */
-static struct cgroup *task_cgroup_from_root(struct task_struct *task,
-					    struct cgroupfs_root *root)
+static struct cgroup *__task_cgroup_from_root(struct task_struct *task,
+					      struct cgroupfs_root *root,
+					      int nocgmtx)
 {
 	struct css_set *css;
 	struct cgroup *res = NULL;
 
-	BUG_ON(!mutex_is_locked(&cgroup_mutex));
+	BUG_ON(!nocgmtx && !mutex_is_locked(&cgroup_mutex));
 	read_lock(&css_set_lock);
 	/*
 	 * No need to lock the task - since we hold cgroup_mutex the
@@ -736,6 +737,12 @@ static struct cgroup *task_cgroup_from_root(struct task_struct *task,
 	return res;
 }
 
+static struct cgroup *task_cgroup_from_root(struct task_struct *task,
+					      struct cgroupfs_root *root)
+{
+	return __task_cgroup_from_root(task, root, false);
+}
+
 /*
  * There is one global cgroup mutex. We also require taking
  * task_lock() when dereferencing a task's cgroup subsys pointers.
@@ -5578,7 +5585,8 @@ int proc_cgroup_show(struct seq_file *m, void *v)
 
 	retval = 0;
 
-	mutex_lock(&cgroup_mutex);
+	/* without cgroup_mutex we must protect roots list */
+	mutex_lock(&cgroup_root_mutex);
 
 	for_each_active_root(root) {
 		struct cgroup_subsys *ss;
@@ -5594,8 +5602,13 @@ int proc_cgroup_show(struct seq_file *m, void *v)
 			seq_printf(m, "%sname=%s", count ? "," : "",
 				   root->name);
 		seq_putc(m, ':');
-		cgrp = task_cgroup_from_root(tsk, root);
+
+		/* without cgroup_mutex we take rcu to make cgrp valid */
+		rcu_read_lock();
+		cgrp = __task_cgroup_from_root(tsk, root, true);
 		retval = cgroup_path_ve(cgrp, buf, PAGE_SIZE);
+		rcu_read_unlock();
+
 		if (retval < 0)
 			goto out_unlock;
 		seq_puts(m, buf);
@@ -5603,7 +5616,7 @@ int proc_cgroup_show(struct seq_file *m, void *v)
 	}
 
 out_unlock:
-	mutex_unlock(&cgroup_mutex);
+	mutex_unlock(&cgroup_root_mutex);
 	put_task_struct(tsk);
 out_free:
 	kfree(buf);

