[Devel] [PATCH RHEL7 COMMIT] ms/mm/memcg: fix refcount error while moving and swapping

Konstantin Khorenko khorenko at virtuozzo.com
Wed Jul 5 18:37:01 MSK 2023


The commit is pushed to "branch-rh7-3.10.0-1160.90.1.vz7.200.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1160.90.1.vz7.200.4
------>
commit 217d8b335eede58ef7fecb62d811a6bfda5b74c5
Author: Hugh Dickins <hughd at google.com>
Date:   Wed Jul 5 14:39:48 2023 +0800

    ms/mm/memcg: fix refcount error while moving and swapping
    
    It was hard to keep a test running, moving tasks between memcgs with
    move_charge_at_immigrate, while swapping: mem_cgroup_id_get_many()'s
    refcount is discovered to be 0 (supposedly impossible), so it is then
    forced to REFCOUNT_SATURATED, and after thousands of warnings in quick
    succession, the test is at last put out of misery by being OOM killed.
    
    This is because of the way moved_swap accounting was saved up until the
    task move gets completed in __mem_cgroup_clear_mc(), deferred from when
    mem_cgroup_move_swap_account() actually exchanged old and new ids.
    Concurrent activity can free up swap quicker than the task is scanned,
    bringing id refcount down to 0 (which should only be possible when
    offlining).
    
    Just skip that optimization: do that part of the accounting immediately.
    
    Fixes: 615d66c37c75 ("mm: memcontrol: fix memcg id ref counter on swap charge move")
    Signed-off-by: Hugh Dickins <hughd at google.com>
    Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
    Reviewed-by: Alex Shi <alex.shi at linux.alibaba.com>
    Cc: Johannes Weiner <hannes at cmpxchg.org>
    Cc: Alex Shi <alex.shi at linux.alibaba.com>
    Cc: Shakeel Butt <shakeelb at google.com>
    Cc: Michal Hocko <mhocko at suse.com>
    Cc: <stable at vger.kernel.org>
    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2007071431050.4726@eggly.anvils
    Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>
    
    https://jira.vzint.dev/browse/PSBM-147036
    
    (cherry picked from commit 8d22a9351035ef2ff12ef163a1091b8b8cf1e49c)
    Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
    
    =================
    Patchset description:
    memcg: release id when offlining cgroup
    
    We see that a container user can deplete the memory cgroup ids on the
    system (64k) and prevent further memory cgroup creation. In a crash
    dump collected by our customer in such a situation, mem_cgroup_idr is
    full of cgroups from one container, all with the exact same path (the
    cgroup of the docker service). These cgroups are not released because
    they still have kmem charges; the kmem charge is for a tmpfs dentry
    allocated from the cgroup. (On the vz7 kernel it seems that such a
    dentry is only released after unmounting tmpfs or removing the
    corresponding file from tmpfs.)
    
    So there is a valid way to hold a kmem cgroup for a long time. A similar
    problem was discussed in mainline, with page cache holding a kmem cgroup
    for a long time. The approach proposed there is to release the cgroup
    id early, so that new cgroups can be allocated immediately.
    
    Reproduce:
    https://git.vzint.dev/users/ptikhomirov/repos/helpers/browse/memcg-related/test-mycg-tmpfs.sh
    
    After this fix the number of memory cgroups in /proc/cgroups can
    exceed 64k, because memory cgroups are allowed to stay hanging after
    their ids have been released.
    
    Note: It may be a bad idea to allow a container to consume kernel
    memory with such hanging cgroups, but I don't have a better idea yet.
    
    https://jira.vzint.dev/browse/PSBM-147473
    https://jira.vzint.dev/browse/PSBM-147036
---
 mm/memcontrol.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f86c395fe8ee..0ff99bf5abdb 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -7462,7 +7462,6 @@ static void __mem_cgroup_clear_mc(void)
 			page_counter_uncharge(&mc.to->memory, mc.moved_swap);
 		}
 
-		mem_cgroup_id_get_many(mc.to, mc.moved_swap);
 
 		mc.moved_swap = 0;
 	}
@@ -7622,7 +7621,8 @@ put:			/* get_mctgt_type() gets the page */
 			ent = target.ent;
 			if (!mem_cgroup_move_swap_account(ent, mc.from, mc.to)) {
 				mc.precharge--;
-				/* we fixup refcnts and charges later. */
+				mem_cgroup_id_get_many(mc.to, 1);
+				/* we fixup other refcnts and charges later. */
 				mc.moved_swap++;
 			}
 			break;

