[Devel] [PATCH RH9 16/16] ve/proc: virtualize /proc/meminfo in a Container

Pavel Tikhomirov ptikhomirov at virtuozzo.com
Tue Sep 28 15:41:06 MSK 2021


From: Konstantin Khorenko <khorenko at virtuozzo.com>

Show virtualized data in /proc/meminfo inside a Container.
Counters are taken from CT root memory cgroup.

Host users see non-virtualized data,
tasks in ve cgroup see virtualized data.
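
A minimal user-space sketch of how the container-visible MemTotal, MemFree,
SwapTotal and SwapFree values can be derived from memory cgroup limits and
usage, modelled on si_meminfo_ve() in the diff of this patch. The names
struct ct_counters, struct ct_view, clamped_sub() and ct_meminfo_view() are
hypothetical and exist only for this illustration. Unlike the patch, the
sketch clamps every subtraction, which is a defensive assumption and not
something the kernel code does for swaptotal and swapused.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space mirror of per-container limits and usage,
 * all in pages. This is not a kernel structure. */
struct ct_counters {
	unsigned long mem_max;     /* models the memory limit */
	unsigned long mem_usage;   /* models current memory usage */
	unsigned long memsw_max;   /* models the memory+swap limit */
	unsigned long memsw_usage; /* models current memory+swap usage */
};

/* Hypothetical subset of the fields reported to the container. */
struct ct_view {
	unsigned long totalram, freeram, totalswap, freeswap;
};

/* Subtract without wrapping below zero. */
static unsigned long clamped_sub(unsigned long a, unsigned long b)
{
	return a > b ? a - b : 0;
}

static struct ct_view ct_meminfo_view(const struct ct_counters *c)
{
	struct ct_view v;
	unsigned long swaptotal, swapused;

	assert(c != NULL);

	v.totalram = c->mem_max;
	v.freeram = clamped_sub(c->mem_max, c->mem_usage);

	/* Swap is what the memory+swap counter allows beyond RAM. */
	swaptotal = clamped_sub(c->memsw_max, c->mem_max);
	swapused = clamped_sub(c->memsw_usage, c->mem_usage);

	v.totalswap = swaptotal;
	/* Usage can exceed the limit difference, so clamp at zero. */
	v.freeswap = clamped_sub(swaptotal, swapused);

	return v;
}
```

With a 1000-page memory limit, 400 pages used, a 1500-page memory+swap limit
and 450 pages of memory+swap used, the sketch yields 600 free pages of RAM,
500 pages of total swap and 450 pages of free swap.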

The patch is a port of the following patches:
- a84b6c9882dd ("ub: Split meminfo_proc_show()")
- 0ae658e29d17 ("ubc: initial patch")
- 031916206423 ("ve/proc: Port diff-ve-proc-add-buffers-field-to-meminfo")

    ve/proc: Port diff-ve-proc-add-buffers-field-to-meminfo

    Author: Dmitry Guryanov
    Email: dguryanov at parallels.com
    Subject: proc: add Buffers field to meminfo
    Date: Mon, 5 Aug 2013 16:22:03 +0400

    A customer has experienced a problem with a reporting tool
    that expects the Buffers: string in meminfo.
    The strings in meminfo in ve0 and in a container were the same on
    2.6.18, but on 2.6.32 they differ. Let's add only the Buffers: string
    for now and check whether it fixes the problem.

    https://jira.sw.ru/browse/PSBM-19448

    Signed-off-by: Dmitry Guryanov <dguryanov at parallels.com>

    Acked-by: Konstantin Khlebnikov <khlebnikov at openvz.org>
    =====================================================================

    Actually, we need to show something meaningful there now, because
    buffers are now accounted to CT RAM via kmemcg; this will be done in
    the scope of PSBM-34444.

    Related to https://jira.sw.ru/browse/PSBM-33650

    Signed-off-by: Vladimir Davydov <vdavydov at parallels.com>

=======================================================
Addition from https://jira.sw.ru/browse/PSBM-34444 :
vdavydov@:

/proc/meminfo:Buffers = size of bdev cache pages. We accounted them in
UB_PHYSPAGES in PCS6 just like we account them to memory.usage in Vz7.
They have nothing to do with kmem accounting, which differs significantly
between PCS6 and Vz7 (not to be confused with buffer heads, which do not
have a separate counter and are accounted in SReclaimable along with dcache).

Nobody has ever complained that we show 0 for meminfo:Buffers in PCS6,
so there is no point in rushing ahead and implementing it in Vz7.

=======================================================
Rebase to RHEL8.2 kernel-4.18.0-193.6.3.el8_2 note:
use accumulate_memcg_tree() now instead of tree_stat().

Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>

+++
ve/memcg: Make virtualization of /proc/meminfo view inside CT recursive

When we read /proc/meminfo inside a container we expect to see not only
the stats for the container root cgroup but aggregated stats for all
container cgroups, so let's make it recursive like in VZ7.

Note: In VZ7 this was done via virtinfo subsystem which is dropped.

https://jira.sw.ru/browse/PSBM-127780

Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>

+++
ve/memcg: Fix /proc/meminfo virtualization (eliminate double recursion)

This patch partially reverts commit 47f1b6c1d8e5 ("ve/memcg: Make
virtualization of /proc/meminfo view inside CT recursive")

In vz8 we have both memcg->vmstats (recursive) and
memcg->vmstats_local (non-recursive), and mem_page_state_recursive()
brought by the reverted commit does double recursion,
so revert that logic, but leave the other stuff.
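
The double counting described above can be reproduced with a toy user-space
model. This is only a sketch: struct toy_memcg and the toy_* helpers are
hypothetical names, with the local field standing in for a non-recursive
counter and the recursive field for one that already includes all
descendants.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CHILDREN 4

/* Hypothetical toy cgroup node, not the kernel's struct mem_cgroup. */
struct toy_memcg {
	unsigned long local;     /* pages charged directly to this group */
	unsigned long recursive; /* local plus all descendants */
	struct toy_memcg *child[TOY_MAX_CHILDREN];
	size_t nr_children;
};

/* Fill ->recursive bottom-up from ->local; returns the subtree total. */
static unsigned long toy_update_recursive(struct toy_memcg *cg)
{
	unsigned long sum;
	size_t i;

	assert(cg != NULL);

	sum = cg->local;
	for (i = 0; i < cg->nr_children; i++)
		sum += toy_update_recursive(cg->child[i]);
	cg->recursive = sum;
	return sum;
}

/* Correct tree walk: sums the non-recursive counters. */
static unsigned long toy_sum_local(const struct toy_memcg *cg)
{
	unsigned long sum = cg->local;
	size_t i;

	for (i = 0; i < cg->nr_children; i++)
		sum += toy_sum_local(cg->child[i]);
	return sum;
}

/* Buggy tree walk: sums counters that are already recursive, so every
 * descendant is counted once per ancestor level. */
static unsigned long toy_sum_recursive(const struct toy_memcg *cg)
{
	unsigned long sum = cg->recursive;
	size_t i;

	for (i = 0; i < cg->nr_children; i++)
		sum += toy_sum_recursive(cg->child[i]);
	return sum;
}
```

For a three-level chain charged with 10, 20 and 30 pages, the root's
recursive counter and toy_sum_local() both give 60, while
toy_sum_recursive() gives 140. Reading the recursive counter of the root
directly, without a second tree walk, avoids the inflation.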

https://jira.sw.ru/browse/PSBM-131992

Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>

+++
ve/memcg: Honor changing per-memcg s[un]reclaimable counters to bytes in per-CT /proc/meminfo

RHEL8.4 has the following mainstream commit backported:
d42f3245c7e2 ("mm: memcg: convert vmstat slab counters to bytes")

So, update the places where we use the per-memcg counters
NR_SLAB_[UN]RECLAIMABLE_B accordingly.
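
A small sketch of the unit handling this implies: the slab counters are now
in bytes, so they are shifted down to pages before being stored, and the
page count is later scaled to kB for display. TOY_PAGE_SHIFT and both helper
names are hypothetical; a 4 KiB page size is assumed, and the kB scaling
follows what show_val_kb() is understood to do in fs/proc/meminfo.c.

```c
#include <assert.h>

#define TOY_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Byte-based slab counter to whole pages (partial pages are dropped). */
static unsigned long slab_bytes_to_pages(unsigned long bytes)
{
	return bytes >> TOY_PAGE_SHIFT;
}

/* Page count to kB, as printed in /proc/meminfo. */
static unsigned long pages_to_kb(unsigned long pages)
{
	assert(TOY_PAGE_SHIFT >= 10);
	return pages << (TOY_PAGE_SHIFT - 10);
}
```

Under these assumptions a counter of 8192 bytes becomes 2 pages and is shown
as 8 kB. Passing the byte value to the kB conversion without the shift would
overstate the result by a factor of the page size.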

https://jira.sw.ru/browse/PSBM-132893

Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>

+++
mm/memcg: Drop unused struct "accumulated_stats"

https://jira.sw.ru/browse/PSBM-131992

Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>

(cherry picked from vz8 commit 4da49440e1fa5ced1ef5583d219c91684a382cd2)

vz9 changes: export memcg_page_state into memcontrol.h (inspired by
mainstream commit 7490a2d24814 ("writeback: memcg: simplify
cgroup_writeback_by_id")).

https://jira.sw.ru/browse/PSBM-133988
---
 fs/proc/meminfo.c          | 105 ++++++++++++++++++++++++++++++++++++-
 include/linux/memcontrol.h |  24 +++++++++
 include/linux/ve.h         |   5 ++
 include/linux/virtinfo.h   |  24 +++++++++
 kernel/ve/ve.c             |   3 ++
 mm/memcontrol.c            |  38 ++++++++++----
 6 files changed, 187 insertions(+), 12 deletions(-)
 create mode 100644 include/linux/virtinfo.h

diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 8f7335f464c7..87001a186504 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -13,6 +13,9 @@
 #include <linux/vmstat.h>
 #include <linux/atomic.h>
 #include <linux/vmalloc.h>
+#include <linux/virtinfo.h>
+#include <linux/memcontrol.h>
+#include <linux/ve.h>
 #ifdef CONFIG_CMA
 #include <linux/cma.h>
 #endif
@@ -35,9 +38,93 @@ extern unsigned long get_nr_tcache_pages(void);
 static inline unsigned long get_nr_tcache_pages(void) { return 0; }
 #endif
 
-static int meminfo_proc_show(struct seq_file *m, void *v)
+static int meminfo_proc_show_mi(struct seq_file *m, struct meminfo *mi)
+{
+	unsigned long *pages;
+
+	pages = mi->pages;
+
+	show_val_kb(m, "MemTotal:       ", mi->si->totalram);
+	show_val_kb(m, "MemFree:        ", mi->si->freeram);
+	show_val_kb(m, "Buffers:        ", 0);
+	show_val_kb(m, "Cached:         ", mi->cached);
+
+	show_val_kb(m, "Active:         ", pages[LRU_ACTIVE_ANON] +
+					   pages[LRU_ACTIVE_FILE]);
+	show_val_kb(m, "Inactive:       ", pages[LRU_INACTIVE_ANON] +
+					   pages[LRU_INACTIVE_FILE]);
+	show_val_kb(m, "Active(anon):   ", pages[LRU_ACTIVE_ANON]);
+	show_val_kb(m, "Inactive(anon): ", pages[LRU_INACTIVE_ANON]);
+	show_val_kb(m, "Active(file):   ", pages[LRU_ACTIVE_FILE]);
+	show_val_kb(m, "Inactive(file): ", pages[LRU_INACTIVE_FILE]);
+	show_val_kb(m, "Unevictable:    ", pages[LRU_UNEVICTABLE]);
+	show_val_kb(m, "Mlocked:        ", 0);
+
+	show_val_kb(m, "SwapTotal:      ", mi->si->totalswap);
+	show_val_kb(m, "SwapFree:       ", mi->si->freeswap);
+	show_val_kb(m, "Dirty:          ", mi->dirty_pages);
+	show_val_kb(m, "Writeback:      ", mi->writeback_pages);
+
+	show_val_kb(m, "AnonPages:      ", pages[LRU_ACTIVE_ANON] +
+					   pages[LRU_INACTIVE_ANON]);
+	show_val_kb(m, "Shmem:          ", mi->si->sharedram);
+	show_val_kb(m, "Slab:           ", mi->slab_reclaimable +
+					   mi->slab_unreclaimable);
+	show_val_kb(m, "SReclaimable:   ", mi->slab_reclaimable);
+	show_val_kb(m, "SUnreclaim:     ", mi->slab_unreclaimable);
+
+	return 0;
+}
+
+void si_meminfo_ve(struct sysinfo *si, struct ve_struct *ve)
+{
+	unsigned long memtotal, memused, swaptotal, swapused;
+	struct mem_cgroup *memcg;
+	struct cgroup_subsys_state *css;
+
+	memset(si, 0, sizeof(*si));
+
+	css = ve_get_init_css(ve, memory_cgrp_id);
+	memcg = mem_cgroup_from_css(css);
+
+	memtotal = READ_ONCE(memcg->memory.max);
+	memused = page_counter_read(&memcg->memory);
+	si->totalram = memtotal;
+	si->freeram = (memtotal > memused ? memtotal - memused : 0);
+
+	si->sharedram = memcg_page_state(memcg, NR_SHMEM);
+
+	swaptotal = READ_ONCE(memcg->memsw.max) - memtotal;
+	swapused = page_counter_read(&memcg->memsw) - memused;
+	si->totalswap = swaptotal;
+	/* Due to global reclaim, memory.memsw.usage can be greater than
+	 * (memory.memsw.max - memory.max). */
+	si->freeswap = (swaptotal > swapused ? swaptotal - swapused : 0);
+
+	si->mem_unit = PAGE_SIZE;
+
+	css_put(css);
+
+	/* bufferram, totalhigh and freehigh left 0 */
+}
+
+static void fill_meminfo_ve(struct meminfo *mi, struct ve_struct *ve)
+{
+	struct cgroup_subsys_state *css;
+
+	si_meminfo_ve(mi->si, ve);
+
+	css = ve_get_init_css(ve, memory_cgrp_id);
+	mem_cgroup_fill_meminfo(mem_cgroup_from_css(css), mi);
+	css_put(css);
+
+}
+
+static int meminfo_proc_show_ve(struct seq_file *m, void *v,
+				struct ve_struct *ve)
 {
 	struct sysinfo i;
+	struct meminfo mi;
 	unsigned long committed;
 	long cached;
 	long available;
@@ -47,6 +134,17 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 
 	si_meminfo(&i);
 	si_swapinfo(&i);
+
+	memset(&mi, 0, sizeof(mi));
+	mi.si = &i;
+	mi.ve = ve;
+
+	if (!ve_is_super(ve) && ve->meminfo_val == VE_MEMINFO_DEFAULT) {
+		fill_meminfo_ve(&mi, ve);
+
+		return meminfo_proc_show_mi(m, &mi);
+	}
+
 	committed = vm_memory_committed();
 
 	cached = global_node_page_state(NR_FILE_PAGES) -
@@ -162,6 +260,11 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 	return 0;
 }
 
+static int meminfo_proc_show(struct seq_file *m, void *v)
+{
+	return meminfo_proc_show_ve(m, v, get_exec_env());
+}
+
 static int __init proc_meminfo_init(void)
 {
 	proc_net_create_single("meminfo", 0, NULL, meminfo_proc_show);
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index b1feb6a36da0..b716a5bc806f 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -21,6 +21,7 @@
 #include <linux/vmstat.h>
 #include <linux/writeback.h>
 #include <linux/page-flags.h>
+#include <linux/virtinfo.h>
 
 struct mem_cgroup;
 struct obj_cgroup;
@@ -1002,6 +1003,17 @@ static inline void mod_memcg_state(struct mem_cgroup *memcg,
 	local_irq_restore(flags);
 }
 
+/* idx can be of type enum memcg_stat_item or node_stat_item. */
+static inline unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx)
+{
+	long x = READ_ONCE(memcg->vmstats.state[idx]);
+#ifdef CONFIG_SMP
+	if (x < 0)
+		x = 0;
+#endif
+	return x;
+}
+
 static inline unsigned long lruvec_page_state(struct lruvec *lruvec,
 					      enum node_stat_item idx)
 {
@@ -1148,6 +1160,8 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 						gfp_t gfp_mask,
 						unsigned long *total_scanned);
 
+void mem_cgroup_fill_meminfo(struct mem_cgroup *memcg, struct meminfo *mi);
+
 #else /* CONFIG_MEMCG */
 
 #define MEM_CGROUP_ID_SHIFT	0
@@ -1459,6 +1473,11 @@ static inline void mod_memcg_state(struct mem_cgroup *memcg,
 {
 }
 
+static inline unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx)
+{
+	return 0;
+}
+
 static inline unsigned long lruvec_page_state(struct lruvec *lruvec,
 					      enum node_stat_item idx)
 {
@@ -1525,6 +1544,11 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 {
 	return 0;
 }
+
+static inline void mem_cgroup_fill_meminfo(struct mem_cgroup *memcg, struct meminfo *mi)
+{
+}
+
 #endif /* CONFIG_MEMCG */
 
 static inline void __inc_lruvec_kmem_state(void *p, enum node_stat_item idx)
diff --git a/include/linux/ve.h b/include/linux/ve.h
index bed0c186ac80..1e2bf3814b14 100644
--- a/include/linux/ve.h
+++ b/include/linux/ve.h
@@ -59,8 +59,13 @@ struct ve_struct {
 	u64			_uevent_seqnum;
 
 	int			_randomize_va_space;
+
+	unsigned long		meminfo_val;
 };
 
+#define VE_MEMINFO_DEFAULT	1	/* default behaviour */
+#define VE_MEMINFO_SYSTEM	0	/* disable meminfo virtualization */
+
 extern int nr_ve;
 
 #define NETNS_MAX_NR_DEFAULT	256	/* number of net-namespaces per-VE */
diff --git a/include/linux/virtinfo.h b/include/linux/virtinfo.h
new file mode 100644
index 000000000000..317d0f4d817b
--- /dev/null
+++ b/include/linux/virtinfo.h
@@ -0,0 +1,24 @@
+/*
+ *  include/linux/virtinfo.h
+ *
+ *  Copyright (c) 2005-2008 SWsoft
+ *  Copyright (c) 2009-2015 Parallels IP Holdings GmbH
+ *  Copyright (c) 2017-2021 Virtuozzo International GmbH. All rights reserved.
+ *
+ */
+
+#ifndef __LINUX_VIRTINFO_H
+#define __LINUX_VIRTINFO_H
+
+struct sysinfo;
+struct ve_struct;
+
+struct meminfo {
+        struct sysinfo *si;
+        struct ve_struct *ve;	/* for debug only */
+        unsigned long pages[NR_LRU_LISTS];
+        unsigned long cached, dirty_pages, writeback_pages, shmem;
+        unsigned long slab_reclaimable, slab_unreclaimable;
+};
+
+#endif /* __LINUX_VIRTINFO_H */
diff --git a/kernel/ve/ve.c b/kernel/ve/ve.c
index ba5c6e240633..b1d2a07b9d56 100644
--- a/kernel/ve/ve.c
+++ b/kernel/ve/ve.c
@@ -51,6 +51,7 @@ struct ve_struct ve0 = {
 #else
 					2,
 #endif
+	.meminfo_val		= VE_MEMINFO_SYSTEM,
 };
 EXPORT_SYMBOL(ve0);
 
@@ -424,6 +425,8 @@ static struct cgroup_subsys_state *ve_create(struct cgroup_subsys_state *parent_
 	ve->features = VE_FEATURES_DEF;
 	ve->_randomize_va_space = ve0._randomize_va_space;
 
+	ve->meminfo_val = VE_MEMINFO_DEFAULT;
+
 	atomic_set(&ve->netns_avail_nr, NETNS_MAX_NR_DEFAULT);
 	ve->netns_max_nr = NETNS_MAX_NR_DEFAULT;
 do_init:
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6d882c660c21..2cd961d92534 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -63,6 +63,7 @@
 #include <linux/tracehook.h>
 #include <linux/psi.h>
 #include <linux/seq_buf.h>
+#include <linux/virtinfo.h>
 #include "internal.h"
 #include <net/sock.h>
 #include <net/ip.h>
@@ -646,17 +647,6 @@ void __mod_memcg_state(struct mem_cgroup *memcg, int idx, int val)
 	cgroup_rstat_updated(memcg->css.cgroup, smp_processor_id());
 }
 
-/* idx can be of type enum memcg_stat_item or node_stat_item. */
-static unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx)
-{
-	long x = READ_ONCE(memcg->vmstats.state[idx]);
-#ifdef CONFIG_SMP
-	if (x < 0)
-		x = 0;
-#endif
-	return x;
-}
-
 /* idx can be of type enum memcg_stat_item or node_stat_item. */
 static unsigned long memcg_page_state_local(struct mem_cgroup *memcg, int idx)
 {
@@ -4004,6 +3994,32 @@ static unsigned long mem_cgroup_nr_lru_pages(struct mem_cgroup *memcg,
 	return nr;
 }
 
+void mem_cgroup_get_nr_pages(struct mem_cgroup *memcg, unsigned long *pages)
+{
+	enum lru_list lru;
+
+	for_each_lru(lru)
+		pages[lru] += mem_cgroup_nr_lru_pages(memcg, BIT(lru), true);
+}
+
+void mem_cgroup_fill_meminfo(struct mem_cgroup *memcg, struct meminfo *mi)
+{
+	memset(&mi->pages, 0, sizeof(mi->pages));
+	mem_cgroup_get_nr_pages(memcg, mi->pages);
+
+	mi->slab_reclaimable = memcg_page_state(memcg, NR_SLAB_RECLAIMABLE_B)
+								>> PAGE_SHIFT;
+	mi->slab_unreclaimable = memcg_page_state(memcg, NR_SLAB_UNRECLAIMABLE_B)
+								>> PAGE_SHIFT;
+	mi->cached = memcg_page_state(memcg, NR_FILE_PAGES);
+	mi->shmem = memcg_page_state(memcg, NR_SHMEM);
+	mi->dirty_pages = memcg_page_state(memcg, NR_FILE_DIRTY);
+	mi->writeback_pages = memcg_page_state(memcg, NR_WRITEBACK);
+
+	/* locked pages are accounted per zone */
+	/* mi->locked = 0; */
+}
+
 static int memcg_numa_stat_show(struct seq_file *m, void *v)
 {
 	struct numa_stat {
-- 
2.31.1


