[Devel] [PATCH RHEL COMMIT] ve/proc: virtualize /proc/meminfo in a Container

Konstantin Khorenko khorenko at virtuozzo.com
Tue Sep 28 19:45:34 MSK 2021


The commit is pushed to "branch-rh9-5.14.vz9.1.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after ark-5.14
------>
commit 4a98c3e4014015330e3a4da8aa2a90c51e261101
Author: Konstantin Khorenko <khorenko at virtuozzo.com>
Date:   Tue Sep 28 19:45:34 2021 +0300

    ve/proc: virtualize /proc/meminfo in a Container
    
    Show virtualized data in /proc/meminfo inside a Container.
    Counters are taken from CT root memory cgroup.
    
    Host users see non-virtualized data,
    tasks in ve cgroup see virtualized data.
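
The host/Container split described above comes down to one predicate in
meminfo_proc_show_ve(). A minimal user-space sketch of that check follows;
`struct ve_sketch` and its `is_super` flag are hypothetical stand-ins for
`struct ve_struct` and ve_is_super(), while the two VE_MEMINFO_* values are
copied from the patch.

```c
#include <stdbool.h>

/* Values as defined in include/linux/ve.h by this patch. */
#define VE_MEMINFO_DEFAULT	1	/* default behaviour */
#define VE_MEMINFO_SYSTEM	0	/* disable meminfo virtualization */

/* Hypothetical stand-in for struct ve_struct, for illustration only. */
struct ve_sketch {
	bool is_super;			/* models ve_is_super(ve): true for ve0 */
	unsigned long meminfo_val;
};

/* True when the reader should get the per-Container (memcg based) view. */
static bool shows_virtualized_meminfo(const struct ve_sketch *ve)
{
	return !ve->is_super && ve->meminfo_val == VE_MEMINFO_DEFAULT;
}
```

In this model ve0 always falls through to the global counters, and so does a
Container whose meminfo_val is VE_MEMINFO_SYSTEM.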
    
    The patch is a port of the following patches:
    - a84b6c9882dd ("ub: Split meminfo_proc_show()")
    - 0ae658e29d17 ("ubc: initial patch")
    - 031916206423 ("ve/proc: Port diff-ve-proc-add-buffers-field-to-meminfo")
    
        ve/proc: Port diff-ve-proc-add-buffers-field-to-meminfo
    
        Author: Dmitry Guryanov
        Email: dguryanov at parallels.com
        Subject: proc: add Buffers field to meminfo
        Date: Mon, 5 Aug 2013 16:22:03 +0400
    
        A customer has experienced a problem with a reporting tool
        that expects a Buffers: line in meminfo.
        The lines in meminfo in ve0 and in a container were the same on
        2.6.18, but on 2.6.32 they differ. Let's add only the Buffers:
        line for now and check whether it fixes the problem.
    
        https://jira.sw.ru/browse/PSBM-19448
    
        Signed-off-by: Dmitry Guryanov <dguryanov at parallels.com>
    
        Acked-by: Konstantin Khlebnikov <khlebnikov at openvz.org>
        =====================================================================
    
        Actually, we need to show something meaningful there now, because
        buffers are now accounted to CT RAM via kmemcg - will be done in the
        scope of PSBM-34444.
    
        Related to https://jira.sw.ru/browse/PSBM-33650
    
        Signed-off-by: Vladimir Davydov <vdavydov at parallels.com>
    
    =======================================================
    Addition from https://jira.sw.ru/browse/PSBM-34444 :
    vdavydov@:
    
    /proc/meminfo:Buffers = size of bdev cache pages. We accounted them in
    UB_PHYSPAGES in PCS6 just like we account them to memory.usage in Vz7.
    They have nothing to do with kmem accounting, which differs significantly
    between PCS6 and Vz7 (not to be confused with buffer heads, which do not
    have a separate counter and are accounted in SReclaimable along with
    dcache).
    
    Nobody has ever complained that we show 0 for meminfo:Buffers in PCS6,
    so there is no point in rushing ahead and implementing it in Vz7.
    
    =======================================================
    Rebase to RHEL8.2 kernel-4.18.0-193.6.3.el8_2 note:
    use accumulate_memcg_tree() now instead of tree_stat().
    
    Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>
    
    +++
    ve/memcg: Make virtualization of /proc/meminfo view inside CT recursive
    
    When we read /proc/meminfo inside a container, we expect to see not only
    the stats of the container root cgroup but the aggregated stats of all
    container cgroups, so let's make it recursive like in VZ7.
    
    Note: In VZ7 this was done via the virtinfo subsystem, which has been dropped.
    
    https://jira.sw.ru/browse/PSBM-127780
    
    Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
    
    +++
    ve/memcg: Fix /proc/meminfo virtualization (eliminate double recursion)
    
    This patch partially reverts commit 47f1b6c1d8e5 ("ve/memcg: Make
    virtualization of /proc/meminfo view inside CT recursive")
    
    In vz8 we have both memcg->vmstats (recursive) and
    memcg->vmstats_local (non-recursive), and mem_page_state_recursive()
    brought by the reverted commit does double recursion,
    so revert that logic, but leave the other stuff.
    
    https://jira.sw.ru/browse/PSBM-131992
    
    Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>
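
The double recursion fixed above can be illustrated with a toy model. This
is not kernel code: `struct node` and both walk helpers are hypothetical,
with `recursive` playing the role of memcg->vmstats (already includes all
descendants) and `local` the role of memcg->vmstats_local.

```c
#include <stddef.h>

/* Toy cgroup node; names are illustrative, not the kernel's. */
struct node {
	unsigned long local;		/* this cgroup only */
	unsigned long recursive;	/* this cgroup plus all descendants */
	struct node *child[2];
};

/* Walking the tree and summing local counters gives the right total. */
static unsigned long walk_sum_local(const struct node *n)
{
	unsigned long sum = n->local;
	size_t i;

	for (i = 0; i < 2; i++)
		if (n->child[i])
			sum += walk_sum_local(n->child[i]);
	return sum;
}

/* Walking the tree and summing recursive counters counts descendants twice or more. */
static unsigned long walk_sum_recursive(const struct node *n)
{
	unsigned long sum = n->recursive;
	size_t i;

	for (i = 0; i < 2; i++)
		if (n->child[i])
			sum += walk_sum_recursive(n->child[i]);
	return sum;
}
```

With a root holding 10 local pages, children holding 20 and 30, and one
grandchild holding 5, the root's recursive counter is already 65, while the
second walk returns 125. Reading the root's recursive counter directly is
therefore enough.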
    
    +++
    ve/memcg: Honor changing per-memcg s[un]reclaimable counters to bytes in per-CT /proc/meminfo
    
    RHEL8.4 has the following mainstream commit backported:
    d42f3245c7e2 ("mm: memcg: convert vmstat slab counters to bytes")
    
    So, update the places where we use the per-memcg counters
    NR_SLAB_[UN]RECLAIMABLE_B accordingly.
    
    https://jira.sw.ru/browse/PSBM-132893
    
    Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>
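
A small sketch of the unit handling this change implies: the slab counters
are in bytes, struct meminfo carries pages, and show_val_kb() prints kB.
The helper names and the 4 KiB page size (SKETCH_PAGE_SHIFT) are
assumptions made for illustration; the kernel uses PAGE_SHIFT.

```c
/* Assumed page size of 4 KiB for this sketch. */
#define SKETCH_PAGE_SHIFT	12

/* Byte counter to whole pages (rounds down), as done with >> PAGE_SHIFT. */
static unsigned long bytes_to_pages(unsigned long bytes)
{
	return bytes >> SKETCH_PAGE_SHIFT;
}

/* Pages to kB, the conversion show_val_kb() applies before printing. */
static unsigned long pages_to_kb(unsigned long pages)
{
	return pages << (SKETCH_PAGE_SHIFT - 10);
}
```

Without the first shift a byte count would be treated as a page count, so
the reported slab figures would be inflated by a factor of the page size.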
    
    +++
    mm/memcg: Drop unused struct "accumulated_stats"
    
    https://jira.sw.ru/browse/PSBM-131992
    
    Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>
    
    (cherry picked from vz8 commit 4da49440e1fa5ced1ef5583d219c91684a382cd2)
    
    vz9 changes: export memcg_page_state into memcontrol.h (inspired by
    mainstream commit 7490a2d24814 ("writeback: memcg: simplify
    cgroup_writeback_by_id")).
    
    https://jira.sw.ru/browse/PSBM-133988
---
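Note on the arithmetic in the si_meminfo_ve() hunk: the sketch below models
it in user space so the clamping is easy to check by hand. The struct and
function names are hypothetical; the four inputs correspond to memory.max,
the memory usage, memsw.max and the memsw usage, all in pages. The two
assert() calls state assumptions of this sketch and are not part of the
patch.

```c
#include <assert.h>

/* Hypothetical stand-in for the four page_counter values the patch reads. */
struct ct_counters {
	unsigned long mem_max;		/* memory.max */
	unsigned long mem_usage;	/* memory usage */
	unsigned long memsw_max;	/* memsw.max (memory + swap limit) */
	unsigned long memsw_usage;	/* memory + swap usage */
};

struct ct_view {
	unsigned long totalram, freeram, totalswap, freeswap;
};

/* a - b, clamped at zero instead of wrapping around. */
static unsigned long sub_clamped(unsigned long a, unsigned long b)
{
	return a > b ? a - b : 0;
}

static struct ct_view ct_meminfo(const struct ct_counters *c)
{
	struct ct_view v;
	unsigned long swapused;

	/* Sketch assumptions: memsw values are never below the memory ones. */
	assert(c->memsw_max >= c->mem_max);
	assert(c->memsw_usage >= c->mem_usage);

	v.totalram = c->mem_max;
	v.freeram = sub_clamped(c->mem_max, c->mem_usage);

	v.totalswap = c->memsw_max - c->mem_max;
	swapused = c->memsw_usage - c->mem_usage;
	/* Swap usage may exceed the swap "limit", so clamp here as well. */
	v.freeswap = sub_clamped(v.totalswap, swapused);

	return v;
}
```

For example, limits of 1000 and 1500 pages with usages of 400 and 1000
pages give 600 pages of free RAM, 500 pages of total swap and 0 pages of
free swap, because the 600 pages of swap in use exceed the 500-page
difference between the limits.
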
 fs/proc/meminfo.c          | 105 ++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/memcontrol.h |  24 +++++++++++
 include/linux/ve.h         |   5 +++
 include/linux/virtinfo.h   |  24 +++++++++++
 kernel/ve/ve.c             |   3 ++
 mm/memcontrol.c            |  38 +++++++++++-----
 6 files changed, 187 insertions(+), 12 deletions(-)

diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 8f7335f464c7..87001a186504 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -13,6 +13,9 @@
 #include <linux/vmstat.h>
 #include <linux/atomic.h>
 #include <linux/vmalloc.h>
+#include <linux/virtinfo.h>
+#include <linux/memcontrol.h>
+#include <linux/ve.h>
 #ifdef CONFIG_CMA
 #include <linux/cma.h>
 #endif
@@ -35,9 +38,93 @@ extern unsigned long get_nr_tcache_pages(void);
 static inline unsigned long get_nr_tcache_pages(void) { return 0; }
 #endif
 
-static int meminfo_proc_show(struct seq_file *m, void *v)
+static int meminfo_proc_show_mi(struct seq_file *m, struct meminfo *mi)
+{
+	unsigned long *pages;
+
+	pages = mi->pages;
+
+	show_val_kb(m, "MemTotal:       ", mi->si->totalram);
+	show_val_kb(m, "MemFree:        ", mi->si->freeram);
+	show_val_kb(m, "Buffers:        ", 0);
+	show_val_kb(m, "Cached:         ", mi->cached);
+
+	show_val_kb(m, "Active:         ", pages[LRU_ACTIVE_ANON] +
+					   pages[LRU_ACTIVE_FILE]);
+	show_val_kb(m, "Inactive:       ", pages[LRU_INACTIVE_ANON] +
+					   pages[LRU_INACTIVE_FILE]);
+	show_val_kb(m, "Active(anon):   ", pages[LRU_ACTIVE_ANON]);
+	show_val_kb(m, "Inactive(anon): ", pages[LRU_INACTIVE_ANON]);
+	show_val_kb(m, "Active(file):   ", pages[LRU_ACTIVE_FILE]);
+	show_val_kb(m, "Inactive(file): ", pages[LRU_INACTIVE_FILE]);
+	show_val_kb(m, "Unevictable:    ", pages[LRU_UNEVICTABLE]);
+	show_val_kb(m, "Mlocked:        ", 0);
+
+	show_val_kb(m, "SwapTotal:      ", mi->si->totalswap);
+	show_val_kb(m, "SwapFree:       ", mi->si->freeswap);
+	show_val_kb(m, "Dirty:          ", mi->dirty_pages);
+	show_val_kb(m, "Writeback:      ", mi->writeback_pages);
+
+	show_val_kb(m, "AnonPages:      ", pages[LRU_ACTIVE_ANON] +
+					   pages[LRU_INACTIVE_ANON]);
+	show_val_kb(m, "Shmem:          ", mi->si->sharedram);
+	show_val_kb(m, "Slab:           ", mi->slab_reclaimable +
+					   mi->slab_unreclaimable);
+	show_val_kb(m, "SReclaimable:   ", mi->slab_reclaimable);
+	show_val_kb(m, "SUnreclaim:     ", mi->slab_unreclaimable);
+
+	return 0;
+}
+
+void si_meminfo_ve(struct sysinfo *si, struct ve_struct *ve)
+{
+	unsigned long memtotal, memused, swaptotal, swapused;
+	struct mem_cgroup *memcg;
+	struct cgroup_subsys_state *css;
+
+	memset(si, 0, sizeof(*si));
+
+	css = ve_get_init_css(ve, memory_cgrp_id);
+	memcg = mem_cgroup_from_css(css);
+
+	memtotal = READ_ONCE(memcg->memory.max);
+	memused = page_counter_read(&memcg->memory);
+	si->totalram = memtotal;
+	si->freeram = (memtotal > memused ? memtotal - memused : 0);
+
+	si->sharedram = memcg_page_state(memcg, NR_SHMEM);
+
+	swaptotal = READ_ONCE(memcg->memsw.max) - memtotal;
+	swapused = page_counter_read(&memcg->memsw) - memused;
+	si->totalswap = swaptotal;
+	/* Due to global reclaim, memory.memsw.usage can be greater than
+	 * (memory.memsw.max - memory.max). */
+	si->freeswap = (swaptotal > swapused ? swaptotal - swapused : 0);
+
+	si->mem_unit = PAGE_SIZE;
+
+	css_put(css);
+
+	/* bufferram, totalhigh and freehigh left 0 */
+}
+
+static void fill_meminfo_ve(struct meminfo *mi, struct ve_struct *ve)
+{
+	struct cgroup_subsys_state *css;
+
+	si_meminfo_ve(mi->si, ve);
+
+	css = ve_get_init_css(ve, memory_cgrp_id);
+	mem_cgroup_fill_meminfo(mem_cgroup_from_css(css), mi);
+	css_put(css);
+
+}
+
+static int meminfo_proc_show_ve(struct seq_file *m, void *v,
+				struct ve_struct *ve)
 {
 	struct sysinfo i;
+	struct meminfo mi;
 	unsigned long committed;
 	long cached;
 	long available;
@@ -47,6 +134,17 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 
 	si_meminfo(&i);
 	si_swapinfo(&i);
+
+	memset(&mi, 0, sizeof(mi));
+	mi.si = &i;
+	mi.ve = ve;
+
+	if (!ve_is_super(ve) && ve->meminfo_val == VE_MEMINFO_DEFAULT) {
+		fill_meminfo_ve(&mi, ve);
+
+		return meminfo_proc_show_mi(m, &mi);
+	}
+
 	committed = vm_memory_committed();
 
 	cached = global_node_page_state(NR_FILE_PAGES) -
@@ -162,6 +260,11 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 	return 0;
 }
 
+static int meminfo_proc_show(struct seq_file *m, void *v)
+{
+	return meminfo_proc_show_ve(m, v, get_exec_env());
+}
+
 static int __init proc_meminfo_init(void)
 {
 	proc_net_create_single("meminfo", 0, NULL, meminfo_proc_show);
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index b1feb6a36da0..b716a5bc806f 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -21,6 +21,7 @@
 #include <linux/vmstat.h>
 #include <linux/writeback.h>
 #include <linux/page-flags.h>
+#include <linux/virtinfo.h>
 
 struct mem_cgroup;
 struct obj_cgroup;
@@ -1002,6 +1003,17 @@ static inline void mod_memcg_state(struct mem_cgroup *memcg,
 	local_irq_restore(flags);
 }
 
+/* idx can be of type enum memcg_stat_item or node_stat_item. */
+static inline unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx)
+{
+	long x = READ_ONCE(memcg->vmstats.state[idx]);
+#ifdef CONFIG_SMP
+	if (x < 0)
+		x = 0;
+#endif
+	return x;
+}
+
 static inline unsigned long lruvec_page_state(struct lruvec *lruvec,
 					      enum node_stat_item idx)
 {
@@ -1148,6 +1160,8 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 						gfp_t gfp_mask,
 						unsigned long *total_scanned);
 
+void mem_cgroup_fill_meminfo(struct mem_cgroup *memcg, struct meminfo *mi);
+
 #else /* CONFIG_MEMCG */
 
 #define MEM_CGROUP_ID_SHIFT	0
@@ -1459,6 +1473,11 @@ static inline void mod_memcg_state(struct mem_cgroup *memcg,
 {
 }
 
+static inline unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx)
+{
+	return 0;
+}
+
 static inline unsigned long lruvec_page_state(struct lruvec *lruvec,
 					      enum node_stat_item idx)
 {
@@ -1525,6 +1544,11 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 {
 	return 0;
 }
+
+static inline void mem_cgroup_fill_meminfo(struct mem_cgroup *memcg, struct meminfo *mi)
+{
+}
+
 #endif /* CONFIG_MEMCG */
 
 static inline void __inc_lruvec_kmem_state(void *p, enum node_stat_item idx)
diff --git a/include/linux/ve.h b/include/linux/ve.h
index 30f4daa402f5..248cdeb0a2e4 100644
--- a/include/linux/ve.h
+++ b/include/linux/ve.h
@@ -61,6 +61,8 @@ struct ve_struct {
 
 	int			_randomize_va_space;
 
+	unsigned long		meminfo_val;
+
 	struct kthread_worker	*kthreadd_worker;
 	struct task_struct	*kthreadd_task;
 
@@ -68,6 +70,9 @@ struct ve_struct {
 	struct task_struct	*umh_task;
 };
 
+#define VE_MEMINFO_DEFAULT	1	/* default behaviour */
+#define VE_MEMINFO_SYSTEM	0	/* disable meminfo virtualization */
+
 extern int nr_ve;
 
 #define NETNS_MAX_NR_DEFAULT	256	/* number of net-namespaces per-VE */
diff --git a/include/linux/virtinfo.h b/include/linux/virtinfo.h
new file mode 100644
index 000000000000..317d0f4d817b
--- /dev/null
+++ b/include/linux/virtinfo.h
@@ -0,0 +1,24 @@
+/*
+ *  include/linux/virtinfo.h
+ *
+ *  Copyright (c) 2005-2008 SWsoft
+ *  Copyright (c) 2009-2015 Parallels IP Holdings GmbH
+ *  Copyright (c) 2017-2021 Virtuozzo International GmbH. All rights reserved.
+ *
+ */
+
+#ifndef __LINUX_VIRTINFO_H
+#define __LINUX_VIRTINFO_H
+
+struct sysinfo;
+struct ve_struct;
+
+struct meminfo {
+	struct sysinfo *si;
+	struct ve_struct *ve;	/* for debug only */
+	unsigned long pages[NR_LRU_LISTS];
+	unsigned long cached, dirty_pages, writeback_pages, shmem;
+	unsigned long slab_reclaimable, slab_unreclaimable;
+};
+
+#endif /* __LINUX_VIRTINFO_H */
diff --git a/kernel/ve/ve.c b/kernel/ve/ve.c
index df82b7577bc9..8f192ee41832 100644
--- a/kernel/ve/ve.c
+++ b/kernel/ve/ve.c
@@ -54,6 +54,7 @@ struct ve_struct ve0 = {
 #else
 					2,
 #endif
+	.meminfo_val		= VE_MEMINFO_SYSTEM,
 };
 EXPORT_SYMBOL(ve0);
 
@@ -585,6 +586,8 @@ static struct cgroup_subsys_state *ve_create(struct cgroup_subsys_state *parent_
 	ve->features = VE_FEATURES_DEF;
 	ve->_randomize_va_space = ve0._randomize_va_space;
 
+	ve->meminfo_val = VE_MEMINFO_DEFAULT;
+
 	atomic_set(&ve->netns_avail_nr, NETNS_MAX_NR_DEFAULT);
 	ve->netns_max_nr = NETNS_MAX_NR_DEFAULT;
 do_init:
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 74a6dba5a023..492e4b4e7574 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -63,6 +63,7 @@
 #include <linux/tracehook.h>
 #include <linux/psi.h>
 #include <linux/seq_buf.h>
+#include <linux/virtinfo.h>
 #include "internal.h"
 #include <net/sock.h>
 #include <net/ip.h>
@@ -646,17 +647,6 @@ void __mod_memcg_state(struct mem_cgroup *memcg, int idx, int val)
 	cgroup_rstat_updated(memcg->css.cgroup, smp_processor_id());
 }
 
-/* idx can be of type enum memcg_stat_item or node_stat_item. */
-static unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx)
-{
-	long x = READ_ONCE(memcg->vmstats.state[idx]);
-#ifdef CONFIG_SMP
-	if (x < 0)
-		x = 0;
-#endif
-	return x;
-}
-
 /* idx can be of type enum memcg_stat_item or node_stat_item. */
 static unsigned long memcg_page_state_local(struct mem_cgroup *memcg, int idx)
 {
@@ -4004,6 +3994,32 @@ static unsigned long mem_cgroup_nr_lru_pages(struct mem_cgroup *memcg,
 	return nr;
 }
 
+void mem_cgroup_get_nr_pages(struct mem_cgroup *memcg, unsigned long *pages)
+{
+	enum lru_list lru;
+
+	for_each_lru(lru)
+		pages[lru] += mem_cgroup_nr_lru_pages(memcg, BIT(lru), true);
+}
+
+void mem_cgroup_fill_meminfo(struct mem_cgroup *memcg, struct meminfo *mi)
+{
+	memset(&mi->pages, 0, sizeof(mi->pages));
+	mem_cgroup_get_nr_pages(memcg, mi->pages);
+
+	mi->slab_reclaimable = memcg_page_state(memcg, NR_SLAB_RECLAIMABLE_B)
+								>> PAGE_SHIFT;
+	mi->slab_unreclaimable = memcg_page_state(memcg, NR_SLAB_UNRECLAIMABLE_B)
+								>> PAGE_SHIFT;
+	mi->cached = memcg_page_state(memcg, NR_FILE_PAGES);
+	mi->shmem = memcg_page_state(memcg, NR_SHMEM);
+	mi->dirty_pages = memcg_page_state(memcg, NR_FILE_DIRTY);
+	mi->writeback_pages = memcg_page_state(memcg, NR_WRITEBACK);
+
+	/* locked pages are accounted per zone */
+	/* mi->locked = 0; */
+}
+
 static int memcg_numa_stat_show(struct seq_file *m, void *v)
 {
 	struct numa_stat {

