[Devel] [PATCH RHEL7 COMMIT] fs/mm: writeback: fix per bdi dirty background threshold calculation
Konstantin Khorenko
khorenko at virtuozzo.com
Tue Apr 12 03:13:10 PDT 2016
The commit is pushed to "branch-rh7-3.10.0-327.10.1.vz7.12.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-327.10.1.vz7.12.7
------>
commit d179e86ac4293a2d9c7fc25e574b916e23383531
Author: Vladimir Davydov <vdavydov at virtuozzo.com>
Date: Tue Apr 12 14:13:10 2016 +0400
fs/mm: writeback: fix per bdi dirty background threshold calculation
After patch [1] introduced upper and lower boundaries for the per-bdi dirty
threshold (see bdi->min_dirty_pages and max_dirty_pages), it is
incorrect to use the bdi_dirty_limit() helper for calculating the background
threshold. E.g. on a 16 GB host, bdi_dirty_limit() would return the
following values for a FUSE device if the upper boundary was unset:

    bdi_thresh = (16 GB * 20 / 100) * 20 / 100 = 655 MB
                  ^^^^^   ^^^^^^^^    ^^^^^^^^
                RAM size              bdi->max_ratio
                       vm.dirty_ratio

    bdi_bg_thresh = (16 GB * 10 / 100) * 20 / 100 = 327 MB
                     ^^^^^   ^^^^^^^^    ^^^^^^^^
                   RAM size              bdi->max_ratio
                     vm.dirty_background_ratio

which looks fine.
However, with the default upper boundary of 256 MB for FUSE devices,
both the dirty and background thresholds are clamped to 256 MB. As a
result, the background flusher only wakes up once the writer is already
being throttled, so writeback never gets a head start on the writer.
This results in a severe write rate degradation.
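
The collapse of the two thresholds can be reproduced with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: model_bdi_limit() is a hypothetical stand-in for bdi_dirty_limit() that applies just max_ratio and the upper boundary (the real helper also weighs the bdi's share of recent writeout), and all sizes are in 4 KiB pages.

```c
#include <stdint.h>

/* Hypothetical constants for the 16 GB example, in 4 KiB pages. */
#define MODEL_RAM_PAGES	(16ULL * 1024 * 1024 / 4)	/* 16 GB of RAM */
#define MODEL_MAX_DIRTY	(256ULL * 1024 / 4)		/* 256 MB upper boundary */

/*
 * Simplified stand-in for bdi_dirty_limit(): scale the global limit by
 * max_ratio (in percent), then clamp the result to the upper boundary.
 * max_dirty_pages == 0 means the boundary is unset (an assumption of
 * this model).
 */
static uint64_t model_bdi_limit(uint64_t global_limit, unsigned int max_ratio,
				uint64_t max_dirty_pages)
{
	uint64_t limit = global_limit * max_ratio / 100;

	if (max_dirty_pages && limit > max_dirty_pages)
		limit = max_dirty_pages;
	return limit;
}
```

With the boundary unset, feeding the model the global dirty limit (20% of RAM) and the global background limit (10% of RAM) yields two distinct values. With the 256 MB boundary, both calls return MODEL_MAX_DIRTY, which is the situation described above.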
To fix this issue, let's use the bdi_dirty_limit() helper only for
calculating the throttle threshold, and compute the background threshold
as follows:
bdi_bg_thresh = bdi_thresh * global_background_thresh / global_thresh
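
The formula above can be sketched in userspace C as follows. The function name scaled_bg_thresh(), the fixed-width types, and the zero-divisor guard are assumptions made for illustration; the patch itself uses div_u64() on unsigned long values.

```c
#include <stdint.h>

/*
 * Scale the per-bdi throttle threshold by the ratio of the global
 * background threshold to the global throttle threshold. The product is
 * computed in 64 bits, mirroring the (u64) cast in the patch, so it does
 * not overflow when the inputs are 32 bits wide.
 */
static uint32_t scaled_bg_thresh(uint32_t bdi_thresh,
				 uint32_t global_background_thresh,
				 uint32_t global_thresh)
{
	if (global_thresh == 0)		/* guard added for this sketch only */
		return 0;
	return (uint32_t)((uint64_t)bdi_thresh * global_background_thresh /
			  global_thresh);
}
```

For the 16 GB example in 4 KiB pages, a per-bdi throttle threshold capped at 65536 pages (256 MB) with global thresholds of 419430 and 838860 pages gives 32768 pages (128 MB), so the flusher would start at half the throttle threshold. The intermediate product in that case is larger than 2^32, which is why the 64-bit multiplication matters where unsigned long is 32 bits wide.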
https://jira.sw.ru/browse/PSBM-45497
Fixes: 2f5b9552e256d ("fuse: improve bdi dirty memory limits for fuse") [1]
Signed-off-by: Vladimir Davydov <vdavydov at virtuozzo.com>
Acked-by: Maxim Patlasov <mpatlasov at virtuozzo.com>
---
fs/fs-writeback.c | 8 ++++++--
mm/page-writeback.c | 7 +++++--
2 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index b6f2e3f..55eca54 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -835,6 +835,7 @@ long writeback_inodes_wb(struct bdi_writeback *wb, long nr_pages,
 static bool over_bground_thresh(struct backing_dev_info *bdi)
 {
 	unsigned long background_thresh, dirty_thresh;
+	unsigned long bdi_thresh, bdi_bg_thresh;
 
 	global_dirty_limits(&background_thresh, &dirty_thresh);
 
@@ -842,8 +843,11 @@ static bool over_bground_thresh(struct backing_dev_info *bdi)
 	    global_page_state(NR_UNSTABLE_NFS) > background_thresh)
 		return true;
 
-	if (bdi_stat(bdi, BDI_RECLAIMABLE) >
-				bdi_dirty_limit(bdi, background_thresh))
+	bdi_thresh = bdi_dirty_limit(bdi, dirty_thresh);
+	bdi_bg_thresh = div_u64((u64)bdi_thresh * background_thresh,
+				dirty_thresh);
+
+	if (bdi_stat(bdi, BDI_RECLAIMABLE) > bdi_bg_thresh)
 		return true;
 
 	return false;
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 64a64f3..35e3ba8 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1129,12 +1129,15 @@ static void bdi_update_dirty_ratelimit(struct backing_dev_info *bdi,
 	 * of backing device (see the implementation of bdi_dirty_limit()).
 	 */
 	if (unlikely(bdi->capabilities & BDI_CAP_STRICTLIMIT)) {
+		unsigned long bdi_bg_thresh;
+
+		bdi_bg_thresh = div_u64((u64)bdi_thresh * bg_thresh, thresh);
+
 		dirty = bdi_dirty;
 		if (bdi_dirty < 8)
 			setpoint = bdi_dirty + 1;
 		else
-			setpoint = (bdi_thresh +
-				    bdi_dirty_limit(bdi, bg_thresh)) / 2;
+			setpoint = (bdi_thresh + bdi_bg_thresh) / 2;
 	}
 
 	if (dirty < setpoint) {