[Devel] [PATCH RHEL7 COMMIT] fs/mm: writeback: fix per bdi dirty background threshold calculation

Konstantin Khorenko khorenko at virtuozzo.com
Tue Apr 12 03:13:10 PDT 2016


The commit is pushed to "branch-rh7-3.10.0-327.10.1.vz7.12.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-327.10.1.vz7.12.7
------>
commit d179e86ac4293a2d9c7fc25e574b916e23383531
Author: Vladimir Davydov <vdavydov at virtuozzo.com>
Date:   Tue Apr 12 14:13:10 2016 +0400

    fs/mm: writeback: fix per bdi dirty background threshold calculation
    
    After patch [1] introduced upper and lower boundaries for the per-bdi
    dirty threshold (see bdi->min_dirty_pages and max_dirty_pages), it is
    incorrect to use the bdi_dirty_limit() helper to calculate the
    background threshold. E.g. on a 16 GB host, bdi_dirty_limit() would
    return the following values for a FUSE device if the upper boundary
    was unset:
    
      bdi_thresh = (16 GB * 20 / 100) * 20 / 100 = 655 MB
                    ^^^^^   ^^^^^^^^    ^^^^^^^^
                  RAM size     |     bdi->max_ratio
                               |
                        vm.dirty_ratio
    
      bdi_bg_thresh = (16 GB * 10 / 100) * 20 / 100 = 327 MB
                       ^^^^^   ^^^^^^^^    ^^^^^^^^
                     RAM size     |     bdi->max_ratio
                                  |
                       vm.dirty_background_ratio
    
    which looks fine.
    
    However, with the default upper threshold of 256 MB for FUSE devices,
    both values above exceed the cap and get clamped, so the dirty and
    background thresholds both end up equal to 256 MB. As a result, the
    background flusher only wakes up once the writer is already
    throttled, which severely degrades the write rate.
    
    To fix this issue, use the bdi_dirty_limit() helper only for
    calculating the throttle threshold, and compute the background
    threshold as follows:
    
      bdi_bg_thresh = bdi_thresh * global_background_thresh / global_thresh
    
    https://jira.sw.ru/browse/PSBM-45497
    
    Fixes: 2f5b9552e256d ("fuse: improve bdi dirty memory limits for fuse") [1]
    Signed-off-by: Vladimir Davydov <vdavydov at virtuozzo.com>
    Acked-by: Maxim Patlasov <mpatlasov at virtuozzo.com>
---
 fs/fs-writeback.c   | 8 ++++++--
 mm/page-writeback.c | 7 +++++--
 2 files changed, 11 insertions(+), 4 deletions(-)
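
Note (not part of the commit): the arithmetic in the changelog can be
checked with a small userspace model. This is only a sketch. The helper
names model_bdi_dirty_limit() and model_bdi_bg_thresh() are hypothetical,
the clamping is a simplified stand-in for what bdi_dirty_limit() does in
the kernel, and global limits are taken as a plain percentage of RAM, as
in the changelog's example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for bdi_dirty_limit(): take max_ratio percent of
 * the given global limit, then clamp to max_dirty_pages (0 = no cap).
 * Simplified model, not the kernel implementation.
 */
static uint64_t model_bdi_dirty_limit(uint64_t global_limit,
				      unsigned int max_ratio,
				      uint64_t max_dirty_pages)
{
	uint64_t limit = global_limit * max_ratio / 100;

	if (max_dirty_pages && limit > max_dirty_pages)
		limit = max_dirty_pages;
	return limit;
}

/*
 * Background threshold as computed by the patch: scale the (possibly
 * clamped) per-bdi threshold by the global background/dirty ratio.
 */
static uint64_t model_bdi_bg_thresh(uint64_t bdi_thresh,
				    uint64_t global_bg_thresh,
				    uint64_t global_thresh)
{
	assert(global_thresh > 0);
	return bdi_thresh * global_bg_thresh / global_thresh;
}
```

Assuming 4 KiB pages, a 16 GB host has 4194304 pages, so the global
limits are 838860 (dirty) and 419430 (background) pages, and the 256 MB
cap is 65536 pages. Passing either global limit through
model_bdi_dirty_limit() with max_ratio = 20 yields 65536, which is the
collision described in the changelog. model_bdi_bg_thresh(65536, 419430,
838860) yields 32768 pages (128 MB), which keeps the background
threshold below the throttle threshold.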

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index b6f2e3f..55eca54 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -835,6 +835,7 @@ long writeback_inodes_wb(struct bdi_writeback *wb, long nr_pages,
 static bool over_bground_thresh(struct backing_dev_info *bdi)
 {
 	unsigned long background_thresh, dirty_thresh;
+	unsigned long bdi_thresh, bdi_bg_thresh;
 
 	global_dirty_limits(&background_thresh, &dirty_thresh);
 
@@ -842,8 +843,11 @@ static bool over_bground_thresh(struct backing_dev_info *bdi)
 	    global_page_state(NR_UNSTABLE_NFS) > background_thresh)
 		return true;
 
-	if (bdi_stat(bdi, BDI_RECLAIMABLE) >
-				bdi_dirty_limit(bdi, background_thresh))
+	bdi_thresh = bdi_dirty_limit(bdi, dirty_thresh);
+	bdi_bg_thresh = div_u64((u64)bdi_thresh * background_thresh,
+				dirty_thresh);
+
+	if (bdi_stat(bdi, BDI_RECLAIMABLE) > bdi_bg_thresh)
 		return true;
 
 	return false;
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 64a64f3..35e3ba8 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1129,12 +1129,15 @@ static void bdi_update_dirty_ratelimit(struct backing_dev_info *bdi,
 	 * of backing device (see the implementation of bdi_dirty_limit()).
 	 */
 	if (unlikely(bdi->capabilities & BDI_CAP_STRICTLIMIT)) {
+		unsigned long bdi_bg_thresh;
+
+		bdi_bg_thresh = div_u64((u64)bdi_thresh * bg_thresh, thresh);
+
 		dirty = bdi_dirty;
 		if (bdi_dirty < 8)
 			setpoint = bdi_dirty + 1;
 		else
-			setpoint = (bdi_thresh +
-				    bdi_dirty_limit(bdi, bg_thresh)) / 2;
+			setpoint = (bdi_thresh + bdi_bg_thresh) / 2;
 	}
 
 	if (dirty < setpoint) {

