[Devel] [PATCH RHEL7 COMMIT] ve/radix-tree: do not account radix_tree_nodes to memcg
Konstantin Khorenko
khorenko at virtuozzo.com
Fri Aug 28 07:44:29 PDT 2015
The commit is pushed to "branch-rh7-3.10.0-229.7.2-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-229.7.2.vz7.6.3
------>
commit d4b302e64d3523bddf4e300d0a975a7717ac784b
Author: Vladimir Davydov <vdavydov at parallels.com>
Date: Fri Aug 28 18:44:29 2015 +0400
ve/radix-tree: do not account radix_tree_nodes to memcg
There are two problems if they are accounted.
First, radix_tree_nodes allocated by tcache/tswap for storing their
internal data will be accounted to the container that issued a store,
which is wrong, because they can only get reclaimed on global pressure.
Using __GFP_NOACCOUNT in tcache/tswap wouldn't help because of the per-cpu
radix_tree_node preloads.
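To make the preload problem concrete, here is a minimal userspace model in C. It is a sketch under stated assumptions, not kernel code. model_slab_alloc(), model_preload(), model_node_alloc(), MODEL_GFP_NOACCOUNT, tree_extra_gfp, the pool size and the memcg ids are all hypothetical. The model assumes that a preloaded node is always preferred over a fresh allocation, as radix_tree_node_alloc() does for callers that cannot sleep. It shows that a caller passing a no-account flag can still receive a node another memcg was charged for. It also shows that ORing the flag in inside the allocator at both sites, as this patch does, leaves nothing in the pool charged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for __GFP_NOACCOUNT; not the kernel's value. */
#define MODEL_GFP_NOACCOUNT 0x1u
#define POOL_SIZE 4		/* illustrative; the real per-cpu pool differs */
#define NR_MEMCGS 2
#define NO_MEMCG (-1)

struct model_node {
	int charged_to;		/* memcg id charged for this node, or NO_MEMCG */
};

static struct model_node slab[64];
static size_t slab_used;
static long memcg_usage[NR_MEMCGS];

/* One CPU's preload pool, shared by every caller running on that CPU. */
static struct model_node *pool[POOL_SIZE];
static int pool_nr;

/* Flags the tree code itself ORs in: 0 models the code before the patch,
 * MODEL_GFP_NOACCOUNT models it after the patch. */
static unsigned int tree_extra_gfp;

/* Slab allocation: charges the current memcg unless told not to. */
static struct model_node *model_slab_alloc(int current_memcg, unsigned int gfp)
{
	assert(slab_used < sizeof(slab) / sizeof(slab[0]));
	struct model_node *n = &slab[slab_used++];

	if (gfp & MODEL_GFP_NOACCOUNT) {
		n->charged_to = NO_MEMCG;
	} else {
		n->charged_to = current_memcg;
		memcg_usage[current_memcg]++;
	}
	return n;
}

/* Models __radix_tree_preload(): top up the pool, charging whoever runs it. */
static void model_preload(int current_memcg, unsigned int gfp)
{
	while (pool_nr < POOL_SIZE)
		pool[pool_nr++] = model_slab_alloc(current_memcg,
						   gfp | tree_extra_gfp);
}

/* Models radix_tree_node_alloc() for a non-sleeping caller: a preloaded
 * node is handed out as-is, so the caller's gfp flags do not matter. */
static struct model_node *model_node_alloc(int current_memcg, unsigned int gfp)
{
	if (pool_nr > 0)
		return pool[--pool_nr];
	return model_slab_alloc(current_memcg, gfp | tree_extra_gfp);
}
```

Before the patch, a page-cache insert in memcg 0 can fill the pool, and a later no-account request on behalf of memcg 1 then takes one of those charged nodes. Setting tree_extra_gfp to MODEL_GFP_NOACCOUNT plays the role of the two gfp_mask | __GFP_NOACCOUNT hunks in the diff: no matter who fills or drains the pool, no memcg is charged.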
Second, workingset detection logic (see mm/workingset.c) is still not
memory cgroup aware. In particular, this means that shadow
radix_tree_nodes can only be reclaimed on global memory pressure
although they are accounted to a memory cgroup. As a result, after
reading a huge file, all the container's memory can get filled with
shadow entries, which won't be reclaimed on local memory pressure,
making the container unusable.
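The shadow-entry problem can be illustrated with a toy model in C. The cost units, SLOTS_PER_NODE and all toy_* names are made up for illustration. Interior tree nodes are ignored, and so is the shadow node shrinker in mm/workingset.c, which per the text above only helps under global pressure. When a file is read sequentially, each local reclaim frees a page but leaves its shadow entry, so the leaf node stays allocated and charged. Once the charged nodes alone leave no room for another page, the read fails.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy charge units, not real kernel sizes. */
#define PAGE_COST 8		/* one cached page */
#define NODE_COST 1		/* one radix_tree_node */
#define SLOTS_PER_NODE 8	/* file offsets covered by one leaf node */

struct toy_memcg {
	long limit;		/* memcg limit, in charge units */
	long pages;		/* resident page-cache pages */
	long nodes;		/* leaf radix_tree_nodes charged to the group */
	long next_index;	/* next file offset to read */
};

static long toy_usage(const struct toy_memcg *m)
{
	return m->pages * PAGE_COST + m->nodes * NODE_COST;
}

/* Local reclaim evicts one page; its slot now holds a shadow entry, so the
 * node is neither freed nor uncharged. */
static bool toy_reclaim_page(struct toy_memcg *m)
{
	if (m->pages == 0)
		return false;	/* nothing left that local reclaim can free */
	m->pages--;
	return true;
}

/* Read the next page of a large file; a new leaf node is needed every
 * SLOTS_PER_NODE offsets. */
static bool toy_read_next_page(struct toy_memcg *m)
{
	bool need_node = m->next_index % SLOTS_PER_NODE == 0;
	long cost = PAGE_COST + (need_node ? NODE_COST : 0);

	while (toy_usage(m) + cost > m->limit)
		if (!toy_reclaim_page(m))
			return false;	/* only shadow nodes left: charge fails */
	m->pages++;
	if (need_node)
		m->nodes++;
	m->next_index++;
	return true;
}
```

With a limit of 64 units, reading in a loop stops after 448 successful reads, with no resident pages left and 56 units charged, all of them nodes that hold only shadow entries.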
This is a quick-fix which makes radix_tree_nodes unaccountable. This is
acceptable for now, because we had never accounted radix_tree_nodes
before Vz7 anyway. The true fix would be (a) making radix_tree_node
preloads unaccountable (or per memory cgroup) and (b) making workingset
detection logic memory cgroup aware. This should and will be done
upstream first.
https://jira.sw.ru/browse/PSBM-35205
Signed-off-by: Vladimir Davydov <vdavydov at parallels.com>
---
lib/radix-tree.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index dd3347f..4b362cb 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -228,7 +228,8 @@ radix_tree_node_alloc(struct radix_tree_root *root)
 		}
 	}
 	if (ret == NULL)
-		ret = kmem_cache_alloc(radix_tree_node_cachep, gfp_mask);
+		ret = kmem_cache_alloc(radix_tree_node_cachep,
+				       gfp_mask | __GFP_NOACCOUNT);
 
 	BUG_ON(radix_tree_is_indirect_ptr(ret));
 	return ret;
@@ -279,7 +280,8 @@ static int __radix_tree_preload(gfp_t gfp_mask)
 	rtp = &__get_cpu_var(radix_tree_preloads);
 	while (rtp->nr < ARRAY_SIZE(rtp->nodes)) {
 		preempt_enable();
-		node = kmem_cache_alloc(radix_tree_node_cachep, gfp_mask);
+		node = kmem_cache_alloc(radix_tree_node_cachep,
+					gfp_mask | __GFP_NOACCOUNT);
 		if (node == NULL)
 			goto out;
 		preempt_disable();