[Devel] [PATCH RHEL7 COMMIT] mm/mem_cgroup_iter: Drop dead_count related infrastructure
Vasily Averin
vvs at virtuozzo.com
Wed Mar 3 09:26:42 MSK 2021
The commit is pushed to "branch-rh7-3.10.0-1160.15.2.vz7.173.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1160.15.2.vz7.173.1
------>
commit ce1b1505bf0e4c0eec0a1fa5db1e5afce72f1c79
Author: Konstantin Khorenko <khorenko at virtuozzo.com>
Date: Wed Mar 3 09:26:42 2021 +0300
mm/mem_cgroup_iter: Drop dead_count related infrastructure
Patch-set description:
Many thanks to Kirill Tkhai for his bright ideas and review!
Problem description from the user point of view:
* the Node is slow
* the Node has a lot of free RAM
* the Node has a lot of swapin/swapout
* kswapd is always running
Problem in a nutshell from technical point of view:
* kswapd is looping in shrink_zone() inside the loop
do {} while ((memcg = mem_cgroup_iter(root, memcg, &reclaim)));
(and never goes through the outer loop)
* there are quite a number of memory cgroups on the Node (~1000)
* some cgroups are hard to reclaim (reclaim may take ~3 seconds),
this is because of very busy disk due to permanent swapin/swapout
* mem_cgroup_iter() never succeeds in scanning all cgroups
in a row: it restarts from the root cgroup time after
time (after a different number of cgroups scanned each round)
Q: Why does mem_cgroup_iter() restart from the root memcg?
A: Because it is invalidated once some memory cgroup is
destroyed on the Node.
Note: ANY memory cgroup destroy on the Node leads to iter
restart.
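The pre-patch invalidation scheme described above can be sketched as a
small userspace simulation (the field and function names follow the
kernel code being removed by this patch; the surrounding test harness is
hypothetical, and plain ints stand in for atomic_t):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of the old dead_count scheme. */
struct mem_cgroup {
	int dead_count;			/* atomic_t in the kernel */
};

struct reclaim_iter {
	struct mem_cgroup *last_visited;
	int last_dead_count;
};

/* ANY memcg destroy under root bumps the counter, invalidating
 * every cached iterator position below root at once. */
static void iter_invalidate(struct mem_cgroup *root)
{
	root->dead_count++;		/* atomic_inc(&root->dead_count) */
}

/* The cached position is trusted only if no destroy happened since
 * it was saved; otherwise the walk restarts from the root (NULL). */
static struct mem_cgroup *
iter_load(struct reclaim_iter *iter, struct mem_cgroup *root, int *seq)
{
	*seq = root->dead_count;
	if (iter->last_dead_count != *seq)
		return NULL;		/* forced restart */
	return iter->last_visited;
}

static void
iter_update(struct reclaim_iter *iter, struct mem_cgroup *pos, int seq)
{
	iter->last_visited = pos;
	iter->last_dead_count = seq;
}
```

This makes the problem visible: destroying any cgroup under root, even
one the iterator never pointed to, discards the cached position and
forces the restart from the root memcg.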
The patchset solves this problem in the following way:
there is no need to restart the iter until its position
is exactly the memory cgroup being destroyed.
The patchset ensures iter->last_visited is NULL-ified on
invalidation, so the iter restarts only in the unlikely case
when it points to the memcg being destroyed.
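The targeted invalidation can likewise be sketched in userspace (a
hypothetical simulation: in the kernel the clearing is done with cmpxchg
over the per-zone/per-priority iterators under RCU, here a plain compare
and a fixed array stand in for that):

```c
#include <assert.h>
#include <stddef.h>

struct mem_cgroup {
	int id;				/* only identity matters here */
};

struct reclaim_iter {
	struct mem_cgroup *last_visited;	/* __rcu in the kernel */
};

#define NR_ITERS 4

/* Post-patch scheme: clear last_visited only in the iterators that
 * actually point at the dying memcg; everyone else keeps progress. */
static void iter_invalidate(struct reclaim_iter iters[NR_ITERS],
			    struct mem_cgroup *dead_memcg)
{
	int i;

	for (i = 0; i < NR_ITERS; i++)
		if (iters[i].last_visited == dead_memcg)
			iters[i].last_visited = NULL;	/* restart this one */
}
```

With this, a cgroup destroy elsewhere in the hierarchy no longer throws
away the reclaim iterator's position, which removes the constant
restarts from the root that kept kswapd looping.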
Testing: I've tested this patchset using a modified kernel which breaks
the memcg iterator during global reclaim with a probability of 2%.
3 kernels have been tested: "release", KASAN-only, "debug" kernels.
Each worked for 12 hours, no issues, from 12000 to 26000 races were
caught during this period (i.e. dying memcg was found in some iterator
and wiped).
The testing scenario is documented in the jira issue.
https://jira.sw.ru/browse/PSBM-123655
+++ Current patch description:
As we now have a stable and reliable iter->last_visited,
we no longer need to save/compare the number of destroyed cgroups.
https://jira.sw.ru/browse/PSBM-123655
Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>
Reviewed-by: Kirill Tkhai <ktkhai at virtuozzo.com>
---
mm/memcontrol.c | 21 ++++-----------------
1 file changed, 4 insertions(+), 17 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 45ac3fd..8dbd140 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -199,7 +199,6 @@ struct mem_cgroup_reclaim_iter {
* protection scheme.
*/
struct mem_cgroup __rcu *last_visited;
- unsigned long last_dead_count;
/* scan generation, increased every round-trip */
unsigned int generation;
@@ -405,7 +404,6 @@ struct mem_cgroup {
spinlock_t pcp_counter_lock;
atomic_long_t oom;
- atomic_t dead_count;
#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_INET)
struct tcp_memcontrol tcp_mem;
struct udp_memcontrol udp_mem;
@@ -1632,19 +1630,11 @@ static void mem_cgroup_iter_invalidate(struct mem_cgroup *root,
}
}
}
-
- /*
- * When a group in the hierarchy below root is destroyed, the
- * hierarchy iterator can no longer be trusted since it might
- * have pointed to the destroyed group. Invalidate it.
- */
- atomic_inc(&root->dead_count);
}
static struct mem_cgroup *
mem_cgroup_iter_load(struct mem_cgroup_reclaim_iter *iter,
- struct mem_cgroup *root,
- int *sequence)
+ struct mem_cgroup *root)
{
struct mem_cgroup *position = NULL;
/*
@@ -1672,8 +1662,7 @@ mem_cgroup_iter_load(struct mem_cgroup_reclaim_iter *iter,
static void mem_cgroup_iter_update(struct mem_cgroup_reclaim_iter *iter,
struct mem_cgroup *last_visited,
struct mem_cgroup *new_position,
- struct mem_cgroup *root,
- int sequence)
+ struct mem_cgroup *root)
{
/*
* The position saved in 'last_visited' is always valid.
@@ -1789,7 +1778,6 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
rcu_read_lock_sched();
while (!memcg) {
struct mem_cgroup_reclaim_iter *uninitialized_var(iter);
- int uninitialized_var(seq);
if (reclaim) {
int nid = zone_to_nid(reclaim->zone);
@@ -1803,14 +1791,13 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
goto out_unlock;
}
- last_visited = mem_cgroup_iter_load(iter, root, &seq);
+ last_visited = mem_cgroup_iter_load(iter, root);
}
memcg = __mem_cgroup_iter_next(root, last_visited);
if (reclaim) {
- mem_cgroup_iter_update(iter, last_visited, memcg, root,
- seq);
+ mem_cgroup_iter_update(iter, last_visited, memcg, root);
if (!memcg)
iter->generation++;