[Devel] [PATCH RHEL8 COMMIT] mm/page_alloc: Adjust the number of managed pages for a zone if it is wrong

Konstantin Khorenko khorenko at virtuozzo.com
Thu Jul 15 11:11:57 MSK 2021


The commit is pushed to "branch-rh8-4.18.0-240.1.1.vz8.5.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh8-4.18.0-240.1.1.vz8.5.54
------>
commit 28c689b8581fa86934deb8fa272b5c325beec010
Author: Evgenii Shatokhin <eshatokhin at virtuozzo.com>
Date:   Thu Jul 15 11:11:57 2021 +0300

    mm/page_alloc: Adjust the number of managed pages for a zone if it is wrong
    
    (A temporary hack, to be dropped after the rebase on top of RHEL 8.4.)
    
    In certain cases, the number of managed pages in a memory zone becomes
    less than the number of free pages, leading to a negative or overly
    large 'MemUsed' value (managed_pages - free_pages) shown in
    /sys/devices/system/node/node*/meminfo.
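    
    With unsigned arithmetic the subtraction wraps around to a huge value
    (and shows up as negative if the result is later printed as signed).
    A minimal userspace sketch of the wrap-around, with made-up page
    counts for illustration:
    
        #include <stdio.h>
    
        int main(void)
        {
                /* Hypothetical counts: free exceeds managed, as in the bug. */
                unsigned long managed_pages = 1000;
                unsigned long free_pages = 1200;
    
                /*
                 * Wraps around: prints 18446744073709551416 on a 64-bit
                 * system instead of -200.
                 */
                printf("MemUsed: %lu pages\n", managed_pages - free_pages);
                return 0;
        }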
    
    It is suspected that the number of managed pages is calculated
    incorrectly on NUMA systems, but the root cause is unclear.
    
    The patch detects such conditions, outputs a message to dmesg and
    'corrects' the value of managed_pages used to prepare data for the
    'meminfo' files. It does not change zone->managed_pages, only the
    stats shown to users. So it is not a real fix; it merely hides the
    problem and allows our testing to continue.
    
    The patch was prepared in the scope of
    https://jira.sw.ru/browse/PSBM-129304.
    
    The problem seems to be fixed in the RHEL 8.4 kernel, so this patch
    should be dropped after the rebase on top of that kernel.
    
    Signed-off-by: Evgenii Shatokhin <eshatokhin at virtuozzo.com>
---
 mm/page_alloc.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fca875aa8ab3..874de3912942 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5009,8 +5009,25 @@ void si_meminfo_node(struct sysinfo *val, int nid)
 	unsigned long free_highpages = 0;
 	pg_data_t *pgdat = NODE_DATA(nid);
 
-	for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++)
-		managed_pages += pgdat->node_zones[zone_type].managed_pages;
+	for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++) {
+		struct zone *zone = &pgdat->node_zones[zone_type];
+		unsigned long nr_managed = zone->managed_pages;
+		unsigned long nr_free = zone_page_state(zone, NR_FREE_PAGES);
+
+		/*
+		 * HACK, PSBM-129304
+		 * In certain cases, the number of managed pages becomes less
+		 * than the number of free pages in a zone, leading to negative
+		 * or overly large 'MemUsed' (managed_pages - free_pages).
+		 * 'Correct' the numbers until the root cause is resolved.
+		 */
+		if (nr_managed < nr_free) {
+			pr_notice_once("Node %d, zone %d: managed_pages (%lu) is less than free_pages (%lu)\n",
+				       nid, zone_type, nr_managed, nr_free);
+			nr_managed = nr_free;
+		}
+		managed_pages += nr_managed;
+	}
 	val->totalram = managed_pages;
 	val->sharedram = node_page_state(pgdat, NR_SHMEM);
 	val->freeram = sum_zone_node_page_state(nid, NR_FREE_PAGES);
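
Note: pr_notice_once() prints only on the first occurrence for this call
site, whichever node and zone trigger it first, so a single line in dmesg
is expected even if several zones are affected. Given the format string in
the hunk, the message would look like this (node, zone and page counts are
illustrative):

    Node 0, zone 2: managed_pages (1000) is less than free_pages (1200)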

