[Devel] [PATCH RHEL8 COMMIT] mm: allow kmem limit bypassing if reclaimable slabs detected
Konstantin Khorenko
khorenko at virtuozzo.com
Fri Jun 11 15:09:44 MSK 2021
The commit is pushed to "branch-rh8-4.18.0-240.1.1.vz8.5.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh8-4.18.0-240.1.1.vz8.5.40
------>
commit f16035de517f0bfe406c4a4331fe542a6195c5cb
Author: Konstantin Khorenko <khorenko at virtuozzo.com>
Date: Fri Jun 11 15:09:44 2021 +0300
mm: allow kmem limit bypassing if reclaimable slabs detected
If we generate a lot of kmem (dentries and inodes in particular),
we may hit the cgroup kmem limit in GFP_NOFS context (e.g. in
ext4_alloc_inode()) and fail to free reclaimable inodes because of the
NOFS context.

Detect reclaimable kmem on hitting the limit and allow bypassing the
limit - reclaim will happen on the next kmem allocation in GFP_KERNEL
context.

Honor the "vm.vfs_cache_min_ratio" sysctl and don't bypass if the
amount of reclaimable kmem is not high enough.
https://jira.sw.ru/browse/PSBM-91566
Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>
Rebased to vz8:
- As the EINTR logic and the bypass mark are gone from try_charge(),
  just force the allocation
- Use memcg_page_state() instead of the obsolete mem_cgroup_read_stat2_fast()
(cherry-picked from vz7 commit 1bbcb753b7f9 ("mm: allow kmem limit bypassing if
reclaimable slabs detected"))
Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko at virtuozzo.com>
---
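For reference, a minimal userspace sketch (illustration only, not the kernel
code) of the ratio check visible in the fs/super.c hunk below, which the new
bypass honors; the page counts in main() are made-up values and the sketch
assumes a positive vfs_cache_min_ratio:

#include <stdbool.h>
#include <stdio.h>

/*
 * Same comparison as in dcache_is_low(): dcache is "low" when its share
 * of the cgroup's anon + file + dcache pages falls below
 * vfs_cache_min_ratio percent, in which case the limit is not bypassed.
 */
static bool dcache_is_low(unsigned long anon, unsigned long file,
                          unsigned long dcache, int vfs_cache_min_ratio)
{
        return dcache / vfs_cache_min_ratio <
               (anon + file + dcache) / 100;
}

int main(void)
{
        /* vm.vfs_cache_min_ratio = 2 (%), made-up page counts */
        printf("%d\n", dcache_is_low(80000, 20000, 1000, 2)); /* 1: low, no bypass */
        printf("%d\n", dcache_is_low(80000, 20000, 5000, 2)); /* 0: not low, bypass allowed */
        return 0;
}

With vfs_cache_min_ratio <= 0 the patch instead falls back to the raw
NR_SLAB_RECLAIMABLE count against KMEM_RECLAIM_LOW_MARK, see
kmem_reclaim_is_low() in the mm/memcontrol.c hunk.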
fs/super.c | 3 ++-
mm/memcontrol.c | 32 ++++++++++++++++++++++++++++++++
2 files changed, 34 insertions(+), 1 deletion(-)
diff --git a/fs/super.c b/fs/super.c
index ca7863c56079..4dc309ac42f8 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -50,7 +50,7 @@ static char *sb_writers_name[SB_FREEZE_LEVELS] = {
"sb_internal",
};
-static bool dcache_is_low(struct mem_cgroup *memcg)
+bool dcache_is_low(struct mem_cgroup *memcg)
{
unsigned long anon, file, dcache;
int vfs_cache_min_ratio = READ_ONCE(sysctl_vfs_cache_min_ratio);
@@ -68,6 +68,7 @@ static bool dcache_is_low(struct mem_cgroup *memcg)
return dcache / vfs_cache_min_ratio <
(anon + file + dcache) / 100;
}
+EXPORT_SYMBOL(dcache_is_low);
/*
* One thing we have to be careful of with a per-sb shrinker is that we don't
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6e7e0495da7c..18f90e5467bb 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2411,6 +2411,28 @@ void mem_cgroup_handle_over_high(void)
current->memcg_nr_pages_over_high = 0;
}
+extern bool dcache_is_low(struct mem_cgroup *memcg);
+/*
+ * Do we have anything to reclaim in memcg kmem?
+ * Have to honor vfs_cache_min_ratio here because if dcache_is_low()
+ * we won't reclaim dcache at all in do_shrink_slab().
+ */
+static bool kmem_reclaim_is_low(struct mem_cgroup *memcg)
+{
+#define KMEM_RECLAIM_LOW_MARK 32
+
+ unsigned long dcache;
+ int vfs_cache_min_ratio = READ_ONCE(sysctl_vfs_cache_min_ratio);
+
+ if (vfs_cache_min_ratio <= 0) {
+ dcache = memcg_page_state(memcg, NR_SLAB_RECLAIMABLE);
+
+ return dcache < KMEM_RECLAIM_LOW_MARK;
+ }
+
+ return dcache_is_low(memcg);
+}
+
static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, bool kmem_charge,
unsigned int nr_pages, bool cache_charge)
{
@@ -2543,6 +2565,16 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, bool kmem_charge
if (fatal_signal_pending(current))
goto force;
+ /*
+ * We might have [a lot of] reclaimable kmem which we cannot reclaim in
+ * the current context, e.g. a lot of inodes/dentries while trying to
+ * allocate kmem for a new inode with GFP_NOFS.
+ * Thus overcharge kmem now; it will be reclaimed on the next allocation
+ * in the usual GFP_KERNEL context.
+ */
+ if (kmem_limit && !kmem_reclaim_is_low(mem_over_limit))
+ goto force;
+
/*
* keep retrying as long as the memcg oom killer is able to make
* a forward progress or bypass the charge if the oom killer