[Devel] [PATCH RHEL7 COMMIT] mm: Change formula of calculation of default min_free_kbytes

Konstantin Khorenko khorenko at virtuozzo.com
Wed Dec 6 18:13:09 MSK 2017


The commit is pushed to "branch-rh7-3.10.0-693.11.1.vz7.39.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-693.11.1.vz7.39.1
------>
commit 1de6694f6a766df4fddebc206b7754cd468eb991
Author: Kirill Tkhai <ktkhai at virtuozzo.com>
Date:   Wed Dec 6 18:13:08 2017 +0300

    mm: Change formula of calculation of default min_free_kbytes
    
    The min_free_kbytes parameter drives the per-zone watermarks. It is
    used to calculate each zone's free memory threshold, below which
    direct reclaim starts and becomes throttled (the calling task sleeps).
    
    This patch sets the default min_free_kbytes to 2% of available
    physical memory, but not more than 4GB. This is more than the
    previous formula gave (it was a sqrt). Why is this needed?
    
    We hit a situation where intense disk writes inside a CT on a node
    with very little free memory may lead to a state where almost all
    tasks are spinning in direct reclaim. The tasks can't reclaim
    effectively, because the dirty pages they generate are written
    and released by ploop threads, so in practice the tasks are just
    busy-looping. The ploop threads can't reclaim effectively either,
    because the processors are occupied by the busy-looping tasks and
    because the threads themselves need free pages to do their work.
    So the system loops and becomes very slow and unresponsive.
    
    https://jira.sw.ru/browse/PSBM-69296
    
    Signed-off-by: Kirill Tkhai <ktkhai at virtuozzo.com>
---
 mm/page_alloc.c | 27 +++------------------------
 1 file changed, 3 insertions(+), 24 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f2b7f49493f8..40700c3bd133 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6399,27 +6399,6 @@ void setup_per_zone_wmarks(void)
 
 /*
  * Initialise min_free_kbytes.
- *
- * For small machines we want it small (128k min).  For large machines
- * we want it large (64MB max).  But it is not linear, because network
- * bandwidth does not increase linearly with machine size.  We use
- *
- * 	min_free_kbytes = 4 * sqrt(lowmem_kbytes), for better accuracy:
- *	min_free_kbytes = sqrt(lowmem_kbytes * 16)
- *
- * which yields
- *
- * 16MB:	512k
- * 32MB:	724k
- * 64MB:	1024k
- * 128MB:	1448k
- * 256MB:	2048k
- * 512MB:	2896k
- * 1024MB:	4096k
- * 2048MB:	5792k
- * 4096MB:	8192k
- * 8192MB:	11584k
- * 16384MB:	16384k
  */
 int __meminit init_per_zone_wmark_min(void)
 {
@@ -6427,14 +6406,14 @@ int __meminit init_per_zone_wmark_min(void)
 	int new_min_free_kbytes;
 
 	lowmem_kbytes = nr_free_buffer_pages() * (PAGE_SIZE >> 10);
-	new_min_free_kbytes = int_sqrt(lowmem_kbytes * 16);
+	new_min_free_kbytes = lowmem_kbytes * 2 / 100; /* 2% */
 
 	if (new_min_free_kbytes > user_min_free_kbytes) {
 		min_free_kbytes = new_min_free_kbytes;
 		if (min_free_kbytes < 128)
 			min_free_kbytes = 128;
-		if (min_free_kbytes > 65536)
-			min_free_kbytes = 65536;
+		if (min_free_kbytes > 4194304)
+			min_free_kbytes = 4194304;
 	} else {
 		pr_warn("min_free_kbytes is not updated to %d because user defined value %d is preferred\n",
 				new_min_free_kbytes, user_min_free_kbytes);

