[Devel] [PATCH RH8] mm/page_alloc: Adjust the number of managed pages for a zone if it is wrong

Evgenii Shatokhin eshatokhin at virtuozzo.com
Mon Jul 12 22:53:11 MSK 2021


(A temporary hack, to be dropped after the rebase on top of RHEL 8.4.)

In certain cases, the number of managed pages in a memory zone becomes
less than the number of free pages, leading to negative or overly large
'MemUsed' value (managed_pages - free_pages) shown in
/sys/devices/system/node/node*/meminfo.

It is suspected that the number of managed pages is calculated
incorrectly for some reason on NUMA systems. However, the root cause is
unclear.

The patch detects such conditions, reports them in dmesg, and
'corrects' the value of managed_pages used to prepare the data for the
'meminfo' files. It does not change zone->managed_pages itself, only the
statistics shown to users. So, it is not a real fix: it merely hides the
problem and allows our testing to continue.

The patch was prepared in the scope of
https://jira.sw.ru/browse/PSBM-129304.

The problem seems to be fixed in the kernel from RHEL 8.4, so this patch
should be dropped after the rebase on top of that kernel.

Signed-off-by: Evgenii Shatokhin <eshatokhin at virtuozzo.com>
---
 mm/page_alloc.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fca875aa8ab3..874de3912942 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5009,8 +5009,25 @@ void si_meminfo_node(struct sysinfo *val, int nid)
 	unsigned long free_highpages = 0;
 	pg_data_t *pgdat = NODE_DATA(nid);
 
-	for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++)
-		managed_pages += pgdat->node_zones[zone_type].managed_pages;
+	for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++) {
+		struct zone *zone = &pgdat->node_zones[zone_type];
+		unsigned long nr_managed = zone->managed_pages;
+		unsigned long nr_free = zone_page_state(zone, NR_FREE_PAGES);
+
+		/*
+		 * HACK, PSBM-129304
+		 * In certain cases, the number of managed pages becomes less
+		 * than the number of free pages in a zone, leading to negative
+		 * or overly large 'MemUsed' (managed_pages - free_pages).
+		 * 'Correct' the numbers until the root cause is resolved.
+		 */
+		if (nr_managed < nr_free) {
+			pr_notice_once("Node %d, zone %d: managed_pages (%lu) is less than free_pages (%lu)\n",
+				       nid, zone_type, nr_managed, nr_free);
+			nr_managed = nr_free;
+		}
+		managed_pages += nr_managed;
+	}
 	val->totalram = managed_pages;
 	val->sharedram = node_page_state(pgdat, NR_SHMEM);
 	val->freeram = sum_zone_node_page_state(nid, NR_FREE_PAGES);
-- 
2.29.0


