[Devel] [PATCH RHEL7 COMMIT] mm: Change formula of calculation of default min_free_kbytes

Konstantin Khorenko khorenko at virtuozzo.com
Fri Mar 22 23:31:08 MSK 2019


The commit is pushed to "branch-rh7-3.10.0-957.10.1.vz7.85.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-957.10.1.vz7.85.2
------>
commit 642f61f480977033660bfe63855b6e03a5ac89cb
Author: Kirill Tkhai <ktkhai at virtuozzo.com>
Date:   Wed Dec 6 18:13:08 2017 +0300

    mm: Change formula of calculation of default min_free_kbytes
    
    The min_free_kbytes parameter determines the per-zone watermarks.
    It is used to calculate the amount of free memory in a zone below
    which direct reclaim starts and becomes throttled (the calling
    task sleeps).
    
    This patch sets the default min_free_kbytes to 2% of available
    physical memory, but not more than 4GB. This is more than the
    previous formula gave (it was based on a square root). Here is
    why we need that.
    
    We ran into a situation where intense disk writes inside a CT
    on a node with very little free memory may lead to a state in
    which almost all tasks are spinning in direct reclaim. The tasks
    can't reclaim effectively, as the generated dirty pages are
    written and released by ploop threads, so in practice the tasks
    are just busy-looping. The ploop threads can't reclaim
    effectively either, as the processors are occupied by the
    busy-looping tasks and the threads also need free pages to do
    that. So the system keeps looping and becomes very slow and
    unresponsive.
    
    https://jira.sw.ru/browse/PSBM-69296
    
    Signed-off-by: Kirill Tkhai <ktkhai at virtuozzo.com>
    
    khorenko@:
    We apply the patch once again because of
    https://pmc.acronis.com/browse/VSTOR-21390
    
    We faced a situation where (under high memory pressure caused by
    pagecache) skbs were allocated from pfmemalloc reserves and were
    consequently dropped in sk_filter_trim_cap(), which degrades
    network latency a lot.
    We would rather have lower network latency and pay for it by keeping
    more memory free.
    
    Later this sysctl is to be configured by userspace.
---
 mm/page_alloc.c | 27 +++------------------------
 1 file changed, 3 insertions(+), 24 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 134c93a1962b..5dc48331242a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6812,27 +6812,6 @@ void setup_per_zone_wmarks(void)
 
 /*
  * Initialise min_free_kbytes.
- *
- * For small machines we want it small (128k min).  For large machines
- * we want it large (64MB max).  But it is not linear, because network
- * bandwidth does not increase linearly with machine size.  We use
- *
- * 	min_free_kbytes = 4 * sqrt(lowmem_kbytes), for better accuracy:
- *	min_free_kbytes = sqrt(lowmem_kbytes * 16)
- *
- * which yields
- *
- * 16MB:	512k
- * 32MB:	724k
- * 64MB:	1024k
- * 128MB:	1448k
- * 256MB:	2048k
- * 512MB:	2896k
- * 1024MB:	4096k
- * 2048MB:	5792k
- * 4096MB:	8192k
- * 8192MB:	11584k
- * 16384MB:	16384k
  */
 int __meminit init_per_zone_wmark_min(void)
 {
@@ -6840,14 +6819,14 @@ int __meminit init_per_zone_wmark_min(void)
 	int new_min_free_kbytes;
 
 	lowmem_kbytes = nr_free_buffer_pages() * (PAGE_SIZE >> 10);
-	new_min_free_kbytes = int_sqrt(lowmem_kbytes * 16);
+	new_min_free_kbytes = lowmem_kbytes * 2 / 100; /* 2% */
 
 	if (new_min_free_kbytes > user_min_free_kbytes) {
 		min_free_kbytes = new_min_free_kbytes;
 		if (min_free_kbytes < 128)
 			min_free_kbytes = 128;
-		if (min_free_kbytes > 65536)
-			min_free_kbytes = 65536;
+		if (min_free_kbytes > 4194304)
+			min_free_kbytes = 4194304;
 	} else {
 		pr_warn("min_free_kbytes is not updated to %d because user defined value %d is preferred\n",
 				new_min_free_kbytes, user_min_free_kbytes);

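For reference, the two formulas touched by this hunk can be compared with a small userspace sketch. It is not kernel code: isqrt_u64() stands in for the kernel's int_sqrt(), the function names old_default_min_free_kbytes() and new_default_min_free_kbytes() are made up for illustration, and the user_min_free_kbytes check from init_per_zone_wmark_min() is left out on purpose.

```c
#include <stdint.h>

/* Floor of the square root; a stand-in for the kernel's int_sqrt(). */
static uint64_t isqrt_u64(uint64_t x)
{
	uint64_t r = 0;
	uint64_t bit = (uint64_t)1 << 62;

	/* Start from the highest power of four that does not exceed x. */
	while (bit > x)
		bit >>= 2;
	while (bit != 0) {
		if (x >= r + bit) {
			x -= r + bit;
			r = (r >> 1) + bit;
		} else {
			r >>= 1;
		}
		bit >>= 2;
	}
	return r;
}

/* Clamp v to the inclusive range [lo, hi]. */
static uint64_t clamp_kb(uint64_t v, uint64_t lo, uint64_t hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/* Hypothetical name: the default before the patch, sqrt(lowmem_kbytes * 16). */
uint64_t old_default_min_free_kbytes(uint64_t lowmem_kbytes)
{
	return clamp_kb(isqrt_u64(lowmem_kbytes * 16), 128, 65536);
}

/* Hypothetical name: the default after the patch, 2% of lowmem_kbytes. */
uint64_t new_default_min_free_kbytes(uint64_t lowmem_kbytes)
{
	return clamp_kb(lowmem_kbytes * 2 / 100, 128, 4194304);
}
```

With these assumptions, 16384 MB of lowmem (16777216 kB) gives 16384 kB under the old formula, matching the removed comment table, and 335544 kB under the new one. The 4194304 kB cap is reached once lowmem exceeds 200 GB. Under this sketch the 2% rule yields a smaller value than the sqrt rule for very small lowmem (for example 327 kB vs 512 kB at 16 MB).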

