[Devel] [PATCH RHEL7 COMMIT] ms/memcg: remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure

Konstantin Khorenko khorenko at virtuozzo.com
Wed Jul 5 18:37:01 MSK 2023


The commit is pushed to "branch-rh7-3.10.0-1160.90.1.vz7.200.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1160.90.1.vz7.200.4
------>
commit f6ed14e332eb3286cfe15aedcaa7fe333de20f6b
Author: Kirill Tkhai <ktkhai at virtuozzo.com>
Date:   Wed Jul 5 14:39:49 2023 +0800

    ms/memcg: remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure
    
    In case of memcg_online_kmem() failure, memcg_cgroup::id remains hashed
    in mem_cgroup_idr even after memcg memory is freed.  This leads to leak
    of ID in mem_cgroup_idr.
    
    This patch adds removal into mem_cgroup_css_alloc(), which fixes the
    problem.  For better readability, it adds a generic helper which is used
    in mem_cgroup_alloc() and mem_cgroup_id_put_many() as well.
    
    Link: http://lkml.kernel.org/r/152354470916.22460.14397070748001974638.stgit@localhost.localdomain
    Fixes: 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after many small jobs")
    Signed-off-by: Kirill Tkhai <ktkhai at virtuozzo.com>
    
    Acked-by: Johannes Weiner <hannes at cmpxchg.org>
    Acked-by: Vladimir Davydov <vdavydov.dev at gmail.com>
    Cc: Michal Hocko <mhocko at kernel.org>
    Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>
    
    Changes:
    - skip the hunk in mem_cgroup_css_alloc() as there is no such error path
      yet; the patch is not strictly required, but it is worth having as a
      small cleanup
    
    https://jira.vzint.dev/browse/PSBM-147036
    
    (cherry picked from commit 7e97de0b033bcac4fa9a35cef72e0c06e6a22c67)
    Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
    
    =================
    Patchset description:
    memcg: release id when offlining cgroup
    
    We see that a container user can deplete the memory cgroup ids on the
    system (64k) and prevent further memory cgroup creation. In a crash dump
    collected by our customer in such a situation, we see that mem_cgroup_idr
    is full of cgroups from one container with the exact same path (the
    cgroup of the docker service). The cgroups are not released because they
    have kmem charges; the kmem charge is for a tmpfs dentry allocated from
    this cgroup. (On the vz7 kernel it seems that such a dentry is only
    released after unmounting the tmpfs or removing the corresponding file
    from it.)
    
    So there is a valid way to hold a kmem cgroup for a long time. A similar
    issue was mentioned in mainstream, with page cache holding a kmem cgroup
    for a long time. The way proposed there to deal with it is to release
    the cgroup id early so that new cgroups can be allocated immediately.
    
    Reproduce:
    https://git.vzint.dev/users/ptikhomirov/repos/helpers/browse/memcg-related/test-mycg-tmpfs.sh
    
    After this fix the number of memory cgroups shown in /proc/cgroups can
    exceed 64k, as we allow memory cgroups to stay around (hanging) after
    their ids have been released.
    
    Note: Maybe it's a bad idea to allow a container to eat kernel
    memory with such hanging cgroups, but I don't have better ideas yet.
    
    https://jira.vzint.dev/browse/PSBM-147473
    https://jira.vzint.dev/browse/PSBM-147036
---
 mm/memcontrol.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0ff99bf5abdb..5c0a7dc32908 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6671,6 +6671,14 @@ unsigned short mem_cgroup_id(struct mem_cgroup *memcg)
 	return memcg->id.id;
 }
 
+static void mem_cgroup_id_remove(struct mem_cgroup *memcg)
+{
+	if (memcg->id.id > 0) {
+		idr_remove(&mem_cgroup_idr, memcg->id.id);
+		memcg->id.id = 0;
+	}
+}
+
 static void mem_cgroup_id_get_many(struct mem_cgroup *memcg, unsigned int n)
 {
 	atomic_add(n, &memcg->id.ref);
@@ -6697,8 +6705,7 @@ static struct mem_cgroup *mem_cgroup_id_get_online(struct mem_cgroup *memcg)
 static void mem_cgroup_id_put_many(struct mem_cgroup *memcg, unsigned int n)
 {
 	if (atomic_sub_and_test(n, &memcg->id.ref)) {
-		idr_remove(&mem_cgroup_idr, memcg->id.id);
-		memcg->id.id = 0;
+		mem_cgroup_id_remove(memcg);
 
 		/* Memcg ID pins CSS */
 		css_put(&memcg->css);
@@ -6808,8 +6815,7 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
 	for_each_node(node)
 		free_mem_cgroup_per_zone_info(memcg, node);
 
-	if (memcg->id.id > 0)
-		idr_remove(&mem_cgroup_idr, memcg->id.id);
+	mem_cgroup_id_remove(memcg);
 fail:
 	kfree(memcg);
 	return NULL;

