[Devel] [PATCH RHEL COMMIT] memcg: charge kmem allocations accounted to UBC in PCS6 to memcg

Konstantin Khorenko khorenko at virtuozzo.com
Tue Sep 28 14:16:27 MSK 2021


The commit is pushed to "branch-rh9-5.14.vz9.1.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after ark-5.14
------>
commit 0b105df3aace5906acc4a03c66a3f361b30fa6ab
Author: Vasily Averin <vvs at virtuozzo.com>
Date:   Tue Sep 28 14:16:27 2021 +0300

    memcg: charge kmem allocations accounted to UBC in PCS6 to memcg
    
    First patch description:
    ms/kmemcg: account certain kmem allocations to memcg
    
    Mark those kmem allocations that are known to be easily triggered from
    userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
    memcg.  For the list, see below:
    
     - threadinfo
     - task_struct
     - task_delay_info
     - pid
     - cred
     - mm_struct
     - vm_area_struct and vm_region (nommu)
     - anon_vma and anon_vma_chain
     - signal_struct
     - sighand_struct
     - fs_struct
     - files_struct
     - fdtable and fdtable->full_fds_bits
     - dentry and external_name
     - inode for all filesystems. This is the most tedious part, because
       most filesystems overwrite the alloc_inode method.
    
    The list is far from complete, so feel free to add more objects.
    Nevertheless, it should be close to "account everything" approach and
    keep most workloads within bounds.  Malevolent users will be able to
    breach the limit, but this was possible even with the former "account
    everything" approach (simply because it did not account everything in
    fact).
    
    [akpm at linux-foundation.org: coding-style fixes]
    Signed-off-by: Vladimir Davydov <vdavydov at virtuozzo.com>
    
    Acked-by: Johannes Weiner <hannes at cmpxchg.org>
    Acked-by: Michal Hocko <mhocko at suse.com>
    Cc: Tejun Heo <tj at kernel.org>
    Cc: Greg Thelen <gthelen at google.com>
    Cc: Christoph Lameter <cl at linux.com>
    Cc: Pekka Enberg <penberg at kernel.org>
    Cc: David Rientjes <rientjes at google.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim at lge.com>
    Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>
    
    (cherry picked from commit 5d097056c9a017a3b720849efb5432f37acabbac)
    Signed-off-by: Vladimir Davydov <vdavydov at virtuozzo.com>
    
    Conflicts:
            drivers/staging/lustre/lustre/llite/super25.c
            fs/dcache.c
            fs/f2fs/super.c
            fs/file.c
            kernel/fork.c
    
    +++
    memcg: charge kmem allocations accounted to UBC in PCS6 to memcg
    
    Signed-off-by: Vladimir Davydov <vdavydov at virtuozzo.com>
    
    +++
    memcg/tty: charge tty kmem allocations
    
    To be merged into c2b05d7bf7f3c73d8377796f0212d9b65127a016
    ("memcg: charge kmem allocations accounted to UBC in PCS6 to memcg")
    
    https://jira.sw.ru/browse/PSBM-54928
    
    Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>
    
    +++
    tty,vtty: Fix building procedure
    
     - after the rebase __alloc_tty_struct is no longer needed;
       alloc_tty_struct accounts for the memory itself
     - the release and master open routines need to be fixed
    
    https://jira.sw.ru/browse/PSBM-54928
    
    To be merged with f89c673a386b2ab7a23158d748d0e69a2c2c0f21
    ("memcg/tty: charge tty kmem allocations") and further.
    
    Signed-off-by: Cyrill Gorcunov <gorcunov at openvz.org>
    
    Signed-off-by: Andrey Ryabinin <aryabinin at virtuozzo.com>
    
    +++
    memcg: Corrected __GFP_ACCOUNT use in ipv6_add_addr()
    
    __GFP_ACCOUNT is required for the ifa allocation only;
    the f6i allocation that follows uses a slab marked SLAB_ACCOUNT.
    
    https://jira.sw.ru/browse/PSBM-120694
    
    Signed-off-by: Vasily Averin <vvs at virtuozzo.com>
    
    +++
    memcg: Account xt_counters for ip6_tables and arp_tables
    
    Currently xt_counters are accounted for ip_tables only;
    it makes sense to do the same for ip6_tables and arp_tables.
    
    https://jira.sw.ru/browse/PSBM-120694
    
    Signed-off-by: Vasily Averin <vvs at virtuozzo.com>
    
    +++
    memcg: disable accounting in netdev_create_hash()
    
    netdev_create_hash() is called twice per netns and
    allocates only one page. I think this is too small to be worth accounting.
    
    https://jira.sw.ru/browse/PSBM-120694
    to memcg")
    
    Signed-off-by: Vasily Averin <vvs at virtuozzo.com>
    
    +++
    memcg: disable incomplete accounting for af_packet
    
    This patch reverts the af_packet changes in alloc_pg_vec().
    
    https://jira.sw.ru/browse/PSBM-120694
    Signed-off-by: Vasily Averin <vvs at virtuozzo.com>
    
    +++
    memcg: disable memcg accounting for nf_ct hash tables
    
    vz8 does not use per-container conntrack hash tables;
    all nf_ct_alloc_hashtable() calls allocate global hash tables only.
    
    https://jira.sw.ru/browse/PSBM-120694
    Signed-off-by: Vasily Averin <vvs at virtuozzo.com>
    
    +++
    ms/memcg: drop GFP_KERNEL_ACCOUNT use in tty_save_termios()
    
    Jiri Slaby pointed out that termios are not saved for PTYs or for other
    terminals used inside containers. Therefore accounting for saved
    termios has near-zero impact in real-life scenarios.
    
    Signed-off-by: Vasily Averin <vvs at virtuozzo.com>
    
    VvS: rh9 backport. Most of the accounting patches have already been
    pushed upstream; the remaining ones are placed here.
    (cherry picked from commit f426a39020f76e731cca0a359bd4f28e44e7297e)
    https://jira.sw.ru/browse/PSBM-133990
    Signed-off-by: Vasily Averin <vvs at virtuozzo.com>
---
 drivers/tty/tty_io.c            |  2 +-
 net/ipv4/netfilter/arp_tables.c |  2 +-
 net/ipv4/netfilter/ip_tables.c  |  2 +-
 net/ipv6/netfilter/ip6_tables.c |  2 +-
 net/netfilter/ipvs/ip_vs_conn.c |  2 +-
 net/netfilter/x_tables.c        | 11 ++++++++---
 6 files changed, 13 insertions(+), 8 deletions(-)
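
Most hunks below make the same kind of change: an allocation that used
to be invisible to the memory cgroup gets an "account" marker
(GFP_KERNEL_ACCOUNT, SLAB_ACCOUNT, or an *_account allocator variant), so
that its size is charged to the allocating container. The following is
only a userspace toy model of that opt-in behaviour, not kernel code. All
names in it (toy_memcg, toy_zalloc, toy_free, TOY_GFP_*) are hypothetical
and exist purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flag bits; they only mimic GFP_KERNEL vs GFP_KERNEL_ACCOUNT. */
#define TOY_GFP_KERNEL          0x1u
#define TOY_GFP_ACCOUNT         0x2u
#define TOY_GFP_KERNEL_ACCOUNT  (TOY_GFP_KERNEL | TOY_GFP_ACCOUNT)

/* Toy stand-in for a memory cgroup: a usage counter and a hard limit. */
struct toy_memcg {
	size_t usage;
	size_t limit;
};

/*
 * Allocate zeroed memory. Only requests that carry TOY_GFP_ACCOUNT are
 * charged to the group; a charge that would exceed the limit makes the
 * allocation fail. Unaccounted requests never touch the counter.
 */
static void *toy_zalloc(struct toy_memcg *cg, size_t size, unsigned int flags)
{
	void *p;

	assert(cg != NULL && cg->usage <= cg->limit);

	if (flags & TOY_GFP_ACCOUNT) {
		if (size > cg->limit - cg->usage)
			return NULL;
		cg->usage += size;
	}

	p = calloc(1, size);
	if (!p && (flags & TOY_GFP_ACCOUNT))
		cg->usage -= size;	/* roll the charge back on failure */
	return p;
}

/* Free memory and, for accounted allocations, return the charge. */
static void toy_free(struct toy_memcg *cg, void *p, size_t size,
		     unsigned int flags)
{
	if (!p)
		return;
	if (flags & TOY_GFP_ACCOUNT)
		cg->usage -= size;
	free(p);
}
```

In this model a TOY_GFP_KERNEL request leaves the group's usage
unchanged, while the same request with TOY_GFP_KERNEL_ACCOUNT raises it
and can be refused once the limit is reached. That is the effect the
one-line flag changes in the hunks below are meant to have.
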

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 26debec26b4e..a6230b25fbe5 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -3119,7 +3119,7 @@ struct tty_struct *alloc_tty_struct(struct tty_driver *driver, int idx)
 {
 	struct tty_struct *tty;
 
-	tty = kzalloc(sizeof(*tty), GFP_KERNEL);
+	tty = kzalloc(sizeof(*tty), GFP_KERNEL_ACCOUNT);
 	if (!tty)
 		return NULL;
 
diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index c53f14b94356..8bd6c32d62ce 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -656,7 +656,7 @@ static struct xt_counters *alloc_counters(const struct xt_table *table)
 	 * about).
 	 */
 	countersize = sizeof(struct xt_counters) * private->number;
-	counters = vzalloc(countersize);
+	counters = vzalloc_account(countersize);
 
 	if (counters == NULL)
 		return ERR_PTR(-ENOMEM);
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 13acb687c19a..4b38026f429b 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -797,7 +797,7 @@ static struct xt_counters *alloc_counters(const struct xt_table *table)
 	   (other than comefrom, which userspace doesn't care
 	   about). */
 	countersize = sizeof(struct xt_counters) * private->number;
-	counters = vzalloc(countersize);
+	counters = vzalloc_account(countersize);
 
 	if (counters == NULL)
 		return ERR_PTR(-ENOMEM);
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index de2cf3943b91..78cb5723b94c 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -813,7 +813,7 @@ static struct xt_counters *alloc_counters(const struct xt_table *table)
 	   (other than comefrom, which userspace doesn't care
 	   about). */
 	countersize = sizeof(struct xt_counters) * private->number;
-	counters = vzalloc(countersize);
+	counters = vzalloc_account(countersize);
 
 	if (counters == NULL)
 		return ERR_PTR(-ENOMEM);
diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index a159c75bb39c..a77ad6cb5110 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -1482,7 +1482,7 @@ int __init ip_vs_conn_init(void)
 	/* Allocate ip_vs_conn slab cache */
 	ip_vs_conn_cachep = kmem_cache_create("ip_vs_conn",
 					      sizeof(struct ip_vs_conn), 0,
-					      SLAB_HWCACHE_ALIGN, NULL);
+					      SLAB_HWCACHE_ALIGN | SLAB_ACCOUNT, NULL);
 	if (!ip_vs_conn_cachep) {
 		vfree(ip_vs_conn_tab);
 		return -ENOMEM;
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index c507a7f8d2c0..4c8fe717d880 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1094,7 +1094,7 @@ void *xt_copy_counters(sockptr_t arg, unsigned int len,
 	if (size != (u64)len)
 		return ERR_PTR(-EINVAL);
 
-	mem = vmalloc(len);
+	mem = vmalloc_account(len);
 	if (!mem)
 		return ERR_PTR(-ENOMEM);
 
@@ -1175,7 +1175,12 @@ struct xt_table_info *xt_alloc_table_info(unsigned int size)
 	if (sz < sizeof(*info) || sz >= XT_MAX_TABLE_SIZE)
 		return NULL;
 
-	info = kvmalloc(sz, GFP_KERNEL_ACCOUNT);
+	/* __GFP_NORETRY is not fully supported by kvmalloc but it should
+	 * work reasonably well if sz is too large and bail out rather
+	 * than shoot all processes down before realizing there is nothing
+	 * more to reclaim.
+	 */
+	info = kvmalloc(sz, GFP_KERNEL_ACCOUNT | __GFP_NORETRY);
 	if (!info)
 		return NULL;
 
@@ -1367,7 +1372,7 @@ struct xt_counters *xt_counters_alloc(unsigned int counters)
 	if (counters > XT_MAX_TABLE_SIZE)
 		return NULL;
 
-	return vzalloc(counters);
+	return vzalloc_account(counters);
 }
 EXPORT_SYMBOL(xt_counters_alloc);
 
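
The comment added in xt_alloc_table_info() explains why __GFP_NORETRY is
combined with GFP_KERNEL_ACCOUNT: an oversized request should fail
quietly instead of triggering OOM kills. The sketch below is a heavily
simplified userspace model of that intent only. The real allocator and
memcg charge paths are far more involved, and every name here
(toy_group, toy_reclaim, toy_charge, TOY_NORETRY) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NORETRY 0x4u	/* hypothetical; mimics the intent of __GFP_NORETRY */

struct toy_group {
	size_t usage;		/* bytes currently charged */
	size_t limit;		/* hard limit for the group */
	size_t reclaimable;	/* bytes one reclaim pass could free */
	unsigned int oom_kills;	/* how often the toy "OOM killer" ran */
};

/* One reclaim pass: free whatever is reclaimable, report if it helped. */
static bool toy_reclaim(struct toy_group *g)
{
	size_t freed = g->reclaimable;

	assert(freed <= g->usage);
	g->usage -= freed;
	g->reclaimable = 0;
	return freed > 0;
}

/*
 * Try to charge size bytes. If the charge does not fit, reclaim and try
 * again. Once nothing is left to reclaim, a TOY_NORETRY request simply
 * fails, while any other request invokes the toy OOM killer first.
 */
static bool toy_charge(struct toy_group *g, size_t size, unsigned int flags)
{
	for (;;) {
		if (size <= g->limit - g->usage) {
			g->usage += size;
			return true;
		}
		if (toy_reclaim(g))
			continue;	/* something was freed, try again */
		if (flags & TOY_NORETRY)
			return false;	/* bail out without killing anything */
		g->oom_kills++;		/* last resort in this toy model */
		return false;
	}
}
```

With TOY_NORETRY, a request larger than the whole limit returns false
and oom_kills stays untouched. Without it, the same request still fails
in this model, but only after the OOM counter has been bumped, which is
the outcome the patch comment wants to avoid for oversized tables.
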

