[Devel] [PATCH RH9 22/23] memcg: charge kmem allocations accounted to UBC in PCS6 to memcg

Vasily Averin vvs at virtuozzo.com
Sun Sep 26 13:33:35 MSK 2021


First patch description:
ms/kmemcg: account certain kmem allocations to memcg

Mark those kmem allocations that are known to be easily triggered from
userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
memcg.  For the list, see below:

 - threadinfo
 - task_struct
 - task_delay_info
 - pid
 - cred
 - mm_struct
 - vm_area_struct and vm_region (nommu)
 - anon_vma and anon_vma_chain
 - signal_struct
 - sighand_struct
 - fs_struct
 - files_struct
 - fdtable and fdtable->full_fds_bits
 - dentry and external_name
 - inode for all filesystems. This is the most tedious part, because
   most filesystems overwrite the alloc_inode method.

The list is far from complete, so feel free to add more objects.
Nevertheless, it should be close to "account everything" approach and
keep most workloads within bounds.  Malevolent users will be able to
breach the limit, but this was possible even with the former "account
everything" approach (simply because it did not account everything in
fact).
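
As a rough illustration of the opt-in model described above, here is a
minimal userspace sketch, not kernel code. The MODEL_* flag values,
struct memcg, struct model_cache, model_kmalloc() and model_cache_alloc()
are all hypothetical stand-ins; in the kernel the corresponding mechanisms
are the __GFP_ACCOUNT allocation flag and the SLAB_ACCOUNT cache flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flag values for this model; the kernel's differ. */
#define MODEL_GFP_KERNEL	0x1u
#define MODEL_GFP_ACCOUNT	0x2u	/* stands in for __GFP_ACCOUNT */
#define MODEL_SLAB_ACCOUNT	0x4u	/* stands in for SLAB_ACCOUNT */

/* Hypothetical stand-in for a memory cgroup's kmem counter. */
struct memcg {
	size_t kmem_charged;
};

/* Hypothetical stand-in for a slab cache. */
struct model_cache {
	size_t obj_size;
	unsigned int flags;
};

/* Per-call opt-in: charge only when the caller passes the account flag. */
void *model_kmalloc(struct memcg *cg, size_t size, unsigned int gfp)
{
	void *p;

	assert(cg != NULL);
	p = malloc(size);
	if (p && (gfp & MODEL_GFP_ACCOUNT))
		cg->kmem_charged += size;
	return p;
}

/* Per-cache opt-in: every object from a marked cache is charged. */
void *model_cache_alloc(struct memcg *cg, const struct model_cache *c)
{
	unsigned int gfp = MODEL_GFP_KERNEL;

	if (c->flags & MODEL_SLAB_ACCOUNT)
		gfp |= MODEL_GFP_ACCOUNT;
	return model_kmalloc(cg, c->obj_size, gfp);
}
```

In this model an allocation made without either flag leaves the counter
untouched, which is why each object type in the list has to be marked
explicitly.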

[akpm at linux-foundation.org: coding-style fixes]
Signed-off-by: Vladimir Davydov <vdavydov at virtuozzo.com>

Acked-by: Johannes Weiner <hannes at cmpxchg.org>
Acked-by: Michal Hocko <mhocko at suse.com>
Cc: Tejun Heo <tj at kernel.org>
Cc: Greg Thelen <gthelen at google.com>
Cc: Christoph Lameter <cl at linux.com>
Cc: Pekka Enberg <penberg at kernel.org>
Cc: David Rientjes <rientjes at google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim at lge.com>
Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>

(cherry picked from commit 5d097056c9a017a3b720849efb5432f37acabbac)
Signed-off-by: Vladimir Davydov <vdavydov at virtuozzo.com>

Conflicts:
	drivers/staging/lustre/lustre/llite/super25.c
	fs/dcache.c
	fs/f2fs/super.c
	fs/file.c
	kernel/fork.c

+++
memcg: charge kmem allocations accounted to UBC in PCS6 to memcg

Signed-off-by: Vladimir Davydov <vdavydov at virtuozzo.com>

+++
memcg/tty: charge tty kmem allocations

To be merged into c2b05d7bf7f3c73d8377796f0212d9b65127a016
("memcg: charge kmem allocations accounted to UBC in PCS6 to memcg")

https://jira.sw.ru/browse/PSBM-54928

Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>

+++
tty,vtty: Fix building procedure

 - after the rebase __alloc_tty_struct is no longer needed;
   alloc_tty_struct accounts for the memory itself
 - the release and master open routines need to be fixed

https://jira.sw.ru/browse/PSBM-54928

To be merged with f89c673a386b2ab7a23158d748d0e69a2c2c0f21
("memcg/tty: charge tty kmem allocations") and further.

Signed-off-by: Cyrill Gorcunov <gorcunov at openvz.org>
Signed-off-by: Andrey Ryabinin <aryabinin at virtuozzo.com>

+++
memcg: Corrected __GFP_ACCOUNT use in ipv6_add_addr()

__GFP_ACCOUNT is required for the ifa allocation only; the
f6i allocation that follows uses a slab marked SLAB_ACCOUNT.

https://jira.sw.ru/browse/PSBM-120694

Signed-off-by: Vasily Averin <vvs at virtuozzo.com>

+++
memcg: Account xt_counters for ip6_tables and arp_tables

Currently xt_counters are accounted for ip_tables only;
it makes sense to do the same for ip6_tables and arp_tables.
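
For a sense of scale, here is a small userspace sketch of the size that
alloc_counters() computes (sizeof(struct xt_counters) * private->number).
The names xt_counters_model and counters_bytes() are made up for
illustration, and the layout assumes the uapi definition of
struct xt_counters with two 64-bit fields (pcnt and bcnt).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror struct xt_counters: two 64-bit counters per rule. */
struct xt_counters_model {
	uint64_t pcnt;	/* packet counter */
	uint64_t bcnt;	/* byte counter */
};

static_assert(sizeof(struct xt_counters_model) == 16,
	      "model assumes 16 bytes per rule");

/* Same arithmetic as alloc_counters(): one entry per rule in the table. */
size_t counters_bytes(unsigned int number)
{
	return sizeof(struct xt_counters_model) * number;
}
```

Under these assumptions a table with 100000 rules needs 1600000 bytes of
counters per alloc_counters() call, all sized by a value userspace controls.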

https://jira.sw.ru/browse/PSBM-120694

Signed-off-by: Vasily Averin <vvs at virtuozzo.com>

+++
memcg: disable accounting in netdev_create_hash()

netdev_create_hash() is called twice per netns and
allocates only one page. I think that is too small to be worth accounting.

https://jira.sw.ru/browse/PSBM-120694

Signed-off-by: Vasily Averin <vvs at virtuozzo.com>

+++
memcg: disable incomplete accounting for af_packet

This patch reverts the af_packet changes in alloc_pg_vec().

https://jira.sw.ru/browse/PSBM-120694
Signed-off-by: Vasily Averin <vvs at virtuozzo.com>

+++
memcg: disable memcg accounting for nf_ct hash tables

vz8 does not use per-container conntrack hash tables;
all nf_ct_alloc_hashtable() calls allocate global hash tables only.

https://jira.sw.ru/browse/PSBM-120694
Signed-off-by: Vasily Averin <vvs at virtuozzo.com>

+++
ms/memcg: drop GFP_KERNEL_ACCOUNT use in tty_save_termios()

Jiri Slaby pointed out that termios are not saved for PTYs or for other
terminals used inside containers. Therefore accounting for saved
termios has near-zero impact in real-life scenarios.

Signed-off-by: Vasily Averin <vvs at virtuozzo.com>

VvS: rh9 backport. Most of the accounting patches have already been pushed
upstream; the remaining ones are placed here.
(cherry picked from commit f426a39020f76e731cca0a359bd4f28e44e7297e)
https://jira.sw.ru/browse/PSBM-133990
Signed-off-by: Vasily Averin <vvs at virtuozzo.com>
---
 drivers/tty/tty_io.c            |  2 +-
 net/ipv4/netfilter/arp_tables.c |  2 +-
 net/ipv4/netfilter/ip_tables.c  |  2 +-
 net/ipv6/netfilter/ip6_tables.c |  2 +-
 net/netfilter/ipvs/ip_vs_conn.c |  2 +-
 net/netfilter/x_tables.c        | 11 ++++++++---
 6 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 26debec26b4e..a6230b25fbe5 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -3119,7 +3119,7 @@ struct tty_struct *alloc_tty_struct(struct tty_driver *driver, int idx)
 {
 	struct tty_struct *tty;
 
-	tty = kzalloc(sizeof(*tty), GFP_KERNEL);
+	tty = kzalloc(sizeof(*tty), GFP_KERNEL_ACCOUNT);
 	if (!tty)
 		return NULL;
 
diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index c53f14b94356..8bd6c32d62ce 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -656,7 +656,7 @@ static struct xt_counters *alloc_counters(const struct xt_table *table)
 	 * about).
 	 */
 	countersize = sizeof(struct xt_counters) * private->number;
-	counters = vzalloc(countersize);
+	counters = vzalloc_account(countersize);
 
 	if (counters == NULL)
 		return ERR_PTR(-ENOMEM);
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 13acb687c19a..4b38026f429b 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -797,7 +797,7 @@ static struct xt_counters *alloc_counters(const struct xt_table *table)
 	   (other than comefrom, which userspace doesn't care
 	   about). */
 	countersize = sizeof(struct xt_counters) * private->number;
-	counters = vzalloc(countersize);
+	counters = vzalloc_account(countersize);
 
 	if (counters == NULL)
 		return ERR_PTR(-ENOMEM);
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index de2cf3943b91..78cb5723b94c 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -813,7 +813,7 @@ static struct xt_counters *alloc_counters(const struct xt_table *table)
 	   (other than comefrom, which userspace doesn't care
 	   about). */
 	countersize = sizeof(struct xt_counters) * private->number;
-	counters = vzalloc(countersize);
+	counters = vzalloc_account(countersize);
 
 	if (counters == NULL)
 		return ERR_PTR(-ENOMEM);
diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index c100c6b112c8..6b46236b26aa 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -1482,7 +1482,7 @@ int __init ip_vs_conn_init(void)
 	/* Allocate ip_vs_conn slab cache */
 	ip_vs_conn_cachep = kmem_cache_create("ip_vs_conn",
 					      sizeof(struct ip_vs_conn), 0,
-					      SLAB_HWCACHE_ALIGN, NULL);
+					      SLAB_HWCACHE_ALIGN | SLAB_ACCOUNT, NULL);
 	if (!ip_vs_conn_cachep) {
 		vfree(ip_vs_conn_tab);
 		return -ENOMEM;
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 84e58ee501a4..f3be58c9bb6b 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1094,7 +1094,7 @@ void *xt_copy_counters(sockptr_t arg, unsigned int len,
 	if (size != (u64)len)
 		return ERR_PTR(-EINVAL);
 
-	mem = vmalloc(len);
+	mem = vmalloc_account(len);
 	if (!mem)
 		return ERR_PTR(-ENOMEM);
 
@@ -1175,7 +1175,12 @@ struct xt_table_info *xt_alloc_table_info(unsigned int size)
 	if (sz < sizeof(*info) || sz >= XT_MAX_TABLE_SIZE)
 		return NULL;
 
-	info = kvmalloc(sz, GFP_KERNEL_ACCOUNT);
+	/* __GFP_NORETRY is not fully supported by kvmalloc but it should
+	 * work reasonably well if sz is too large and bail out rather
+	 * than shoot all processes down before realizing there is nothing
+	 * more to reclaim.
+	 */
+	info = kvmalloc(sz, GFP_KERNEL_ACCOUNT | __GFP_NORETRY);
 	if (!info)
 		return NULL;
 
@@ -1367,7 +1372,7 @@ struct xt_counters *xt_counters_alloc(unsigned int counters)
 	if (counters > XT_MAX_TABLE_SIZE)
 		return NULL;
 
-	return vzalloc(counters);
+	return vzalloc_account(counters);
 }
 EXPORT_SYMBOL(xt_counters_alloc);
 
-- 
2.25.1


