[Devel] [PATCH RHEL7 COMMIT] ms/mm: memcontrol: fix memcg id ref counter on swap charge move

Konstantin Khorenko khorenko at virtuozzo.com
Wed Jul 5 18:37:01 MSK 2023


The commit is pushed to "branch-rh7-3.10.0-1160.90.1.vz7.200.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1160.90.1.vz7.200.4
------>
commit 11870ae0f7b6083816d7532b8b39e86985a0b3de
Author: Vladimir Davydov <vdavydov at virtuozzo.com>
Date:   Wed Jul 5 14:39:47 2023 +0800

    ms/mm: memcontrol: fix memcg id ref counter on swap charge move
    
    Since commit 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure
    after many small jobs") swap entries do not pin memcg->css.refcnt
    directly.  Instead, they pin memcg->id.ref.  So we should adjust the
    reference counters accordingly when moving swap charges between cgroups.
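    
    For illustration, a minimal sketch of what the move path has to do,
    expressed with the bulk id helpers this patch introduces (the
    transfer_swap_id_refs() wrapper is hypothetical; the real code sits
    inline in __mem_cgroup_clear_mc(), see the diff below):
    
        /* Move 'nr' swap-entry references on memcg->id from 'from' to 'to'. */
        static void transfer_swap_id_refs(struct mem_cgroup *from,
                                          struct mem_cgroup *to,
                                          unsigned int nr)
        {
                /* swap entries no longer pin the source cgroup's id... */
                mem_cgroup_id_put_many(from, nr);
                /* ...from now on they pin the destination cgroup's id */
                mem_cgroup_id_get_many(to, nr);
        }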
    
    Fixes: 73f576c04b941 ("mm: memcontrol: fix cgroup creation failure after many small jobs")
    Link: http://lkml.kernel.org/r/9ce297c64954a42dc90b543bc76106c4a94f07e8.1470219853.git.vdavydov@virtuozzo.com
    Signed-off-by: Vladimir Davydov <vdavydov at virtuozzo.com>
    
    Acked-by: Michal Hocko <mhocko at suse.com>
    Acked-by: Johannes Weiner <hannes at cmpxchg.org>
    Cc: <stable at vger.kernel.org>    [3.19+]
    Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>
    
    Changes compared to the original patch:
    the original patch also did:
    -               css_put_many(&mc.from->css, mc.moved_swap);
    +               css_put_many(&mc.to->css, mc.moved_swap);
    but that change looks strange and unrelated to me, so it is skipped
    here.
    
    https://jira.vzint.dev/browse/PSBM-147036
    
    (cherry picked from commit 615d66c37c755c49ce022c9e5ac0875d27d2603d)
    Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
    
    =================
    Patchset description:
    memcg: release id when offlining cgroup
    
    We see that a container user can deplete the memory cgroup ids on the
    system (64k of them) and prevent further memory cgroup creation. In a
    crash dump collected by our customer in such a situation we see that
    mem_cgroup_idr is full of cgroups from one container, all with the
    same exact path (the cgroup of the docker service). These cgroups are
    not released because they still hold kmem charges, and the kmem
    charge is for a tmpfs dentry allocated from the cgroup. (And on the
    vz7 kernel it seems that such a dentry is only released after
    unmounting the tmpfs or removing the corresponding file from it.)
    
    So there is a valid way to hold a kmem cgroup for a long time. A
    similar problem was mentioned in mainstream, where page cache keeps a
    kmem cgroup alive for a long time, and the proposed way to deal with
    it is the same: release the cgroup id early, so that new cgroups can
    be allocated immediately.
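    
    For reference, the early-release approach looks roughly like the
    sketch below: drop the initial id reference at css offline time
    instead of at final release, so the id goes back to mem_cgroup_idr
    even while kmem charges keep the struct mem_cgroup itself alive.
    This is a simplified sketch of the mainline-style code, not the
    exact vz7 hunk:
    
        static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
        {
                struct mem_cgroup *memcg = mem_cgroup_from_css(css);
    
                /* ... existing offline work: reparent charges, etc. ... */
    
                /*
                 * Release the id here, at offline, rather than when the
                 * last reference to the memcg goes away, so that a new
                 * cgroup can reuse the id immediately.
                 */
                mem_cgroup_id_put(memcg);
        }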
    
    Reproduce:
    https://git.vzint.dev/users/ptikhomirov/repos/helpers/browse/memcg-related/test-mycg-tmpfs.sh
    
    After this fix the number of memory cgroups reported in /proc/cgroups
    can exceed 64k, since we now allow memory cgroups to hang around
    after their ids have been released.
    
    Note: it may be a bad idea to let a container consume kernel memory
    via such hanging cgroups, but I don't have a better idea yet.
    
    https://jira.vzint.dev/browse/PSBM-147473
    https://jira.vzint.dev/browse/PSBM-147036
---
 mm/memcontrol.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d106e8f64997..f86c395fe8ee 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6671,9 +6671,9 @@ unsigned short mem_cgroup_id(struct mem_cgroup *memcg)
 	return memcg->id.id;
 }
 
-static void mem_cgroup_id_get(struct mem_cgroup *memcg)
+static void mem_cgroup_id_get_many(struct mem_cgroup *memcg, unsigned int n)
 {
-	atomic_inc(&memcg->id.ref);
+	atomic_add(n, &memcg->id.ref);
 }
 
 static struct mem_cgroup *mem_cgroup_id_get_online(struct mem_cgroup *memcg)
@@ -6694,9 +6694,9 @@ static struct mem_cgroup *mem_cgroup_id_get_online(struct mem_cgroup *memcg)
 	return memcg;
 }
 
-static void mem_cgroup_id_put(struct mem_cgroup *memcg)
+static void mem_cgroup_id_put_many(struct mem_cgroup *memcg, unsigned int n)
 {
-	if (atomic_dec_and_test(&memcg->id.ref)) {
+	if (atomic_sub_and_test(n, &memcg->id.ref)) {
 		idr_remove(&mem_cgroup_idr, memcg->id.id);
 		memcg->id.id = 0;
 
@@ -6705,6 +6705,16 @@ static void mem_cgroup_id_put(struct mem_cgroup *memcg)
 	}
 }
 
+static inline void mem_cgroup_id_get(struct mem_cgroup *memcg)
+{
+	mem_cgroup_id_get_many(memcg, 1);
+}
+
+static inline void mem_cgroup_id_put(struct mem_cgroup *memcg)
+{
+	mem_cgroup_id_put_many(memcg, 1);
+}
+
 /**
  * mem_cgroup_from_id - look up a memcg from a memcg id
  * @id: the memcg id to look up
@@ -7439,6 +7449,8 @@ static void __mem_cgroup_clear_mc(void)
 		if (!mem_cgroup_is_root(mc.from))
 			page_counter_uncharge(&mc.from->memsw, mc.moved_swap);
 
+		mem_cgroup_id_put_many(mc.from, mc.moved_swap);
+
 		for (i = 0; i < mc.moved_swap; i++)
 			css_put(&mc.from->css);
 
@@ -7449,7 +7461,9 @@ static void __mem_cgroup_clear_mc(void)
 			 */
 			page_counter_uncharge(&mc.to->memory, mc.moved_swap);
 		}
-		/* we've already done css_get(mc.to) */
+
+		mem_cgroup_id_get_many(mc.to, mc.moved_swap);
+
 		mc.moved_swap = 0;
 	}
 	if (do_swap_account) {

