[Devel] [PATCH RHEL7 COMMIT] ms/mm: memcontrol: uncharge pages on swapout
Konstantin Khorenko
khorenko at virtuozzo.com
Wed Jul 5 18:37:00 MSK 2023
The commit is pushed to "branch-rh7-3.10.0-1160.90.1.vz7.200.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1160.90.1.vz7.200.4
------>
commit d0f735e5df66207ef30de23668b5f19b7c16a212
Author: Johannes Weiner <hannes at cmpxchg.org>
Date: Wed Jul 5 14:39:44 2023 +0800
ms/mm: memcontrol: uncharge pages on swapout
This series gets rid of the remaining page_cgroup flags, thus cutting the
memcg per-page overhead down to one pointer.
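For context, the per-page metadata in question looks roughly like this at the
start of the series (a sketch of the struct from kernels of that era, not a
verbatim copy of the vz7 source):

	struct page_cgroup {
		unsigned long flags;		/* PCG_* bits; gone by the end of the series */
		struct mem_cgroup *mem_cgroup;	/* the one pointer that remains */
	};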
This patch (of 4):
mem_cgroup_swapout() is called with exclusive access to the page at the
end of the page's lifetime. Instead of clearing the PCG_MEMSW flag and
deferring the uncharge, just do it right away. This allows follow-up
patches to simplify the uncharge code.
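For orientation, after this patch the tail of mem_cgroup_swapout() reads
roughly as follows (a sketch assembled from the diff below, not a verbatim
copy of the vz7 source):

	memcg = pc->mem_cgroup;

	/* transfer the charge record to the swap entry */
	oldid = swap_cgroup_record(entry, mem_cgroup_id(memcg));
	VM_BUG_ON_PAGE(oldid, page);
	mem_cgroup_swap_statistics(memcg, true);
	this_cpu_inc(memcg->stat->events[MEM_CGROUP_EVENTS_PSWPOUT]);

	/* uncharge the page right away instead of deferring it */
	pc->flags = 0;
	if (!mem_cgroup_is_root(memcg))
		page_counter_uncharge(&memcg->memory, 1);

	/* caller holds the IRQ-safe mapping->tree_lock */
	VM_BUG_ON(!irqs_disabled());

	mem_cgroup_charge_statistics(memcg, page, -1);
	memcg_check_events(memcg, page);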
Signed-off-by: Johannes Weiner <hannes at cmpxchg.org>
Cc: Hugh Dickins <hughd at google.com>
Acked-by: Michal Hocko <mhocko at suse.cz>
Acked-by: Vladimir Davydov <vdavydov at parallels.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu at jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>
https://jira.vzint.dev/browse/PSBM-147036
(cherry picked from commit 7bdd143c37e591c254d0991ac398a53f3f9ef1af)
Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
=================
Patchset description:
memcg: release id when offlining cgroup
We have seen a container user deplete the memory cgroup IDs on the system
(64k) and prevent further memory cgroup creation. In a crash dump collected
by our customer in such a situation, mem_cgroup_idr is full of cgroups from
one container, all with the exact same path (the cgroup of the docker
service). These cgroups are not released because they still hold kmem
charges, and each such charge is for a tmpfs dentry allocated from that
cgroup. (On the vz7 kernel such a dentry appears to be released only after
the tmpfs is unmounted or the corresponding file is removed.)
So there is a legitimate way to pin a kmem cgroup for a long time. A similar
problem was reported in mainstream, where the page cache could hold a kmem
cgroup for a long time, and the proposed solution was the same: release the
cgroup ID early, so that new cgroups can be allocated immediately (sketched
below).
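A minimal sketch of that idea, using mainstream-style names
(mem_cgroup_id_put() and css_offline as in mainstream commit 73f576c04b94;
the helper names here are illustrative, and the actual vz7 patch may differ):

	static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
	{
		struct mem_cgroup *memcg = mem_cgroup_from_css(css);

		/*
		 * Release the 16-bit id as soon as the cgroup goes offline,
		 * so that new cgroups can be created immediately, even if
		 * kmem charges keep this memcg itself alive for a long time.
		 */
		mem_cgroup_id_put(memcg);

		/* ... the rest of the offlining work ... */
	}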
Reproduce:
https://git.vzint.dev/users/ptikhomirov/repos/helpers/browse/memcg-related/test-mycg-tmpfs.sh
After this fix, the number of memory cgroups shown in /proc/cgroups can
exceed 64k, since we now allow memory cgroups to hang around after their
IDs have been released.
Note: it may be a bad idea to let a container eat kernel memory with such
hanging cgroups, but I have no better idea yet.
https://jira.vzint.dev/browse/PSBM-147473
https://jira.vzint.dev/browse/PSBM-147036
---
mm/memcontrol.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7135306c6ac0..99e0dbfe8f77 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -7738,6 +7738,7 @@ static void __init enable_swap_cgroup(void)
  */
 void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
 {
+	struct mem_cgroup *memcg;
 	struct page_cgroup *pc;
 	unsigned short oldid;
 
@@ -7754,14 +7755,23 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
 		return;
 
 	VM_BUG_ON_PAGE(!(pc->flags & PCG_MEMSW), page);
+	memcg = pc->mem_cgroup;
 
-	oldid = swap_cgroup_record(entry, mem_cgroup_id(pc->mem_cgroup));
+	oldid = swap_cgroup_record(entry, mem_cgroup_id(memcg));
 	VM_BUG_ON_PAGE(oldid, page);
+	mem_cgroup_swap_statistics(memcg, true);
+	this_cpu_inc(memcg->stat->events[MEM_CGROUP_EVENTS_PSWPOUT]);
 
-	pc->flags &= ~PCG_MEMSW;
-	css_get(&pc->mem_cgroup->css);
-	mem_cgroup_swap_statistics(pc->mem_cgroup, true);
-	this_cpu_inc(pc->mem_cgroup->stat->events[MEM_CGROUP_EVENTS_PSWPOUT]);
+	pc->flags = 0;
+
+	if (!mem_cgroup_is_root(memcg))
+		page_counter_uncharge(&memcg->memory, 1);
+
+	/* XXX: caller holds IRQ-safe mapping->tree_lock */
+	VM_BUG_ON(!irqs_disabled());
+
+	mem_cgroup_charge_statistics(memcg, page, -1);
+	memcg_check_events(memcg, page);
 }
 
 /**