[Devel] [PATCH RHEL7 COMMIT] ve/radix-tree: do not account radix_tree_nodes to memcg

Konstantin Khorenko khorenko at virtuozzo.com
Fri Aug 28 07:44:29 PDT 2015


The commit is pushed to "branch-rh7-3.10.0-229.7.2-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-229.7.2.vz7.6.3
------>
commit d4b302e64d3523bddf4e300d0a975a7717ac784b
Author: Vladimir Davydov <vdavydov at parallels.com>
Date:   Fri Aug 28 18:44:29 2015 +0400

    ve/radix-tree: do not account radix_tree_nodes to memcg
    
    Accounting radix_tree_nodes to a memory cgroup causes two problems.
    
    First, radix_tree_nodes allocated by tcache/tswap for storing their
    internal data will be accounted to the container that issued a store,
    which is wrong, because they can only be reclaimed on global memory
    pressure. Using __GFP_NOACCOUNT in tcache/tswap wouldn't help because
    of the per-cpu radix_tree_node preloads.
    
    Second, workingset detection logic (see mm/workingset.c) is still not
    memory cgroup aware. In particular, this means that shadow
    radix_tree_nodes can only be reclaimed on global memory pressure
    although they are accounted to a memory cgroup. As a result, after
    reading a huge file, all the container's memory can get filled with
    shadow entries, which won't be reclaimed on local memory pressure,
    making the container unusable.
    
    This is a quick fix that makes radix_tree_nodes unaccountable. This is
    acceptable for now, because we never accounted radix_tree_nodes
    before Vz7 anyway. The true fix would be (a) making radix_tree_node
    preloads unaccountable (or per memory cgroup) and (b) making workingset
    detection logic memory cgroup aware. This should and will be done
    upstream first.
    
    https://jira.sw.ru/browse/PSBM-35205
    
    Signed-off-by: Vladimir Davydov <vdavydov at parallels.com>
---
 lib/radix-tree.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index dd3347f..4b362cb 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -228,7 +228,8 @@ radix_tree_node_alloc(struct radix_tree_root *root)
 		}
 	}
 	if (ret == NULL)
-		ret = kmem_cache_alloc(radix_tree_node_cachep, gfp_mask);
+		ret = kmem_cache_alloc(radix_tree_node_cachep,
+				       gfp_mask | __GFP_NOACCOUNT);
 
 	BUG_ON(radix_tree_is_indirect_ptr(ret));
 	return ret;
@@ -279,7 +280,8 @@ static int __radix_tree_preload(gfp_t gfp_mask)
 	rtp = &__get_cpu_var(radix_tree_preloads);
 	while (rtp->nr < ARRAY_SIZE(rtp->nodes)) {
 		preempt_enable();
-		node = kmem_cache_alloc(radix_tree_node_cachep, gfp_mask);
+		node = kmem_cache_alloc(radix_tree_node_cachep,
+					gfp_mask | __GFP_NOACCOUNT);
 		if (node == NULL)
 			goto out;
 		preempt_disable();
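
The effect of both hunks can be illustrated with a small userspace mock. This
is only a sketch under stated assumptions, not kernel code: the MOCK_* flag
values, MOCK_NODE_SIZE, mock_charged and the mock_* helpers are all
hypothetical names invented for this illustration. The single point it shows
is that once the no-account bit is ORed in at the allocation site, nothing is
charged, whatever mask the caller passes.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int gfp_t;

/* Hypothetical flag values for this mock; not the kernel's definitions. */
#define MOCK_GFP_KERNEL    0x01u
#define MOCK_GFP_NOACCOUNT 0x80u

/* Arbitrary node size, chosen only for the illustration. */
#define MOCK_NODE_SIZE 576

/* Bytes charged to a single imaginary memory cgroup. */
static size_t mock_charged;

/* Stand-in for kmem_cache_alloc(): charges unless the no-account bit is set. */
static void mock_cache_alloc(gfp_t flags)
{
	assert(flags != 0);
	if (!(flags & MOCK_GFP_NOACCOUNT))
		mock_charged += MOCK_NODE_SIZE;
}

/* Before the patch: the caller's mask is passed through unchanged. */
static void mock_node_alloc_old(gfp_t gfp_mask)
{
	mock_cache_alloc(gfp_mask);
}

/* After the patch: the no-account bit is always ORed into the mask. */
static void mock_node_alloc_new(gfp_t gfp_mask)
{
	mock_cache_alloc(gfp_mask | MOCK_GFP_NOACCOUNT);
}
```

Calling mock_node_alloc_old(MOCK_GFP_KERNEL) raises mock_charged by one node,
while mock_node_alloc_new(MOCK_GFP_KERNEL) leaves it unchanged. Forcing the
bit at the allocation site rather than in each caller mirrors why the patch
touches both radix_tree_node_alloc() and __radix_tree_preload().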


