[Devel] [PATCH RHEL COMMIT] ms/memcg: enable accounting for pids in nested pid namespaces

Konstantin Khorenko khorenko at virtuozzo.com
Tue Sep 28 14:16:19 MSK 2021


The commit is pushed to "branch-rh9-5.14.vz9.1.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after ark-5.14
------>
commit f461c2bb368d912849ef81f260c775483bb9f0f1
Author: Vasily Averin <vvs at virtuozzo.com>
Date:   Tue Sep 28 14:16:19 2021 +0300

    ms/memcg: enable accounting for pids in nested pid namespaces
    
    Commit 5d097056c9a0 ("kmemcg: account certain kmem allocations to memcg")
    enabled memcg accounting for pids allocated from init_pid_ns.pid_cachep,
    but forgot to adjust the setting for nested pid namespaces.  As a result,
    pid memory is not accounted exactly where it is really needed, inside
    memcg-limited containers with their own pid namespaces.
    
    Pid was one of the first kernel objects enabled for memcg accounting.
    init_pid_ns.pid_cachep is marked with SLAB_ACCOUNT, so we would expect
    any new pid in the system to be memcg-accounted.
    
    However, I recently noticed that this is not the case.  Nested pid
    namespaces create their own slab caches for pid objects, because nested
    pids are larger: they hold an id for each parent pid namespace as well
    as for their own.  The problem is that these slab caches are _NOT_
    marked with SLAB_ACCOUNT, so pids allocated in nested pid namespaces
    are not memcg-accounted.
    
    A pid struct in a nested pid namespace consumes up to 500 bytes of
    memory, so 100000 such objects add up to ~50 MB of unaccounted memory,
    which allows a container to exceed its assigned memcg limits.
    
    Link: https://lkml.kernel.org/r/8b6de616-fd1a-02c6-cbdb-976ecdcfa604@virtuozzo.com
    Fixes: 5d097056c9a0 ("kmemcg: account certain kmem allocations to memcg")
    Cc: stable at vger.kernel.org
    Signed-off-by: Vasily Averin <vvs at virtuozzo.com>
    
    Reviewed-by: Michal Koutný <mkoutny at suse.com>
    Reviewed-by: Shakeel Butt <shakeelb at google.com>
    Acked-by: Christian Brauner <christian.brauner at ubuntu.com>
    Acked-by: Roman Gushchin <guro at fb.com>
    Cc: Michal Hocko <mhocko at suse.com>
    Cc: Johannes Weiner <hannes at cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>
    (cherry picked from commit fab827dbee8c2e06ca4ba000fa6c48bcf9054aba)
    https://jira.sw.ru/browse/PSBM-133990
    Signed-off-by: Vasily Averin <vvs at virtuozzo.com>
---
 kernel/pid_namespace.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index 51897deed16e..159b577d5123 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -52,7 +52,8 @@ static struct kmem_cache *create_pid_cachep(unsigned int level)
 	mutex_lock(&pid_caches_mutex);
 	/* Name collision forces to do allocation under mutex. */
 	if (!*pkc)
-		*pkc = kmem_cache_create(name, len, 0, SLAB_HWCACHE_ALIGN, 0);
+		*pkc = kmem_cache_create(name, len, 0,
+					 SLAB_HWCACHE_ALIGN | SLAB_ACCOUNT, 0);
 	mutex_unlock(&pid_caches_mutex);
 	/* current can fail, but someone else can succeed. */
 	return READ_ONCE(*pkc);
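
For readers who want to check the numbers in the commit message, here is a
rough userspace sketch of the size arithmetic.  It assumes that the per-level
cache size grows by one (number, namespace) pair per nesting level, which is
how the "len" passed to kmem_cache_create() appears to be computed.  The
struct layouts, the 64-byte header and the helper names (upid_sketch,
pid_sketch, pid_obj_size, unaccounted_bytes) are hypothetical stand-ins, not
the kernel definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for one (pid number, namespace) pair.
 * Illustrative layout only, not the kernel's struct upid. */
struct upid_sketch {
	int nr;
	void *ns;
};

/* Stand-in for a pid object: a fixed header plus the level-0 pair.
 * The 64-byte header is an assumption made for this example. */
struct pid_sketch {
	unsigned char header[64];
	struct upid_sketch numbers[1];
};

/* Object size for a pid created `level` namespaces below the initial
 * one: one extra pair per nesting level (assumed formula). */
static size_t pid_obj_size(unsigned int level)
{
	return sizeof(struct pid_sketch) + level * sizeof(struct upid_sketch);
}

/* Upper bound on memory that escapes memcg accounting when the
 * backing slab cache is created without SLAB_ACCOUNT. */
static uint64_t unaccounted_bytes(uint64_t obj_size, uint64_t nr_pids)
{
	return obj_size * nr_pids;
}
```

With the figures from the commit message, unaccounted_bytes(500, 100000)
gives 50000000 bytes, which is the ~50 MB quoted above.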

