[Devel] [PATCH rh7 v2 00/10] mm/mem_cgroup_iter: Reduce the number of iterator restarts upon cgroup removals

Kirill Tkhai ktkhai at virtuozzo.com
Wed Feb 24 15:43:25 MSK 2021


On 24.02.2021 15:39, Kirill Tkhai wrote:
> On 24.02.2021 15:36, Kirill Tkhai wrote:
>> On 24.02.2021 13:47, Konstantin Khorenko wrote:
>>> Many thanks to Kirill Tkhai for his bright ideas and review!
>>>
>>> Problem description from the user point of view:
>>>   * the Node is slow
>>>   * the Node has a lot of free RAM
>>>   * the Node has a lot of swapin/swapout
>>>   * kswapd is always running
>>>
>>> Problem in a nutshell, from the technical point of view:
>>>   * kswapd is looping in shrink_zone() inside the loop
>>>       do {} while ((memcg = mem_cgroup_iter(root, memcg, &reclaim)));
>>>     (and never gets through to the outer loop)
>>>   * there are quite a number of memory cgroups on the Node (~1000)
>>>   * some cgroups are hard to reclaim (reclaim may take ~3 seconds),
>>>     because the disk is very busy due to permanent swapin/swapout
>>>   * mem_cgroup_iter() never succeeds in scanning all cgroups in a
>>>     row; it restarts from the root cgroup time after time (after a
>>>     varying number of cgroups scanned)
>>>
>>> Q: Why does mem_cgroup_iter() restart from the root memcg?
>>> A: Because the iterator is invalidated whenever some memory cgroup
>>>    is destroyed on the Node.
>>>    Note: destroying ANY memory cgroup on the Node leads to an iter
>>>    restart.
>>>
>>> The following patchset solves this problem in the following way:
>>> there is no need to restart the iter until the iter's cached
>>> position is exactly the memory cgroup being destroyed.
>>>
>>> The patchset ensures iter->last_visited is NULL-ified on
>>> invalidation, and thus the iter restarts only in the unlikely case
>>> when it points to the memcg being destroyed.
>>>
>>> https://jira.sw.ru/browse/PSBM-123655
>>>
>>> v2 changes:
>>>  - reverted 2 patches in this code which were focused on synchronizing
>>>    updates of iter->last_visited and ->last_dead_count
>>>    (as we are getting rid of iter->last_dead_count entirely)
>>>  - use rcu primitives to access iter->last_visited
>>>
>>> Konstantin Khorenko (10):
>>>   Revert "mm/memcg: fix css_tryget(),css_put() imbalance"
>>>   Revert "mm/memcg: use seqlock to protect reclaim_iter updates"
>>>   mm/mem_cgroup_iter: Make 'iter->last_visited' a bit more stable
>>>   mm/mem_cgroup_iter: Always assign iter->last_visited under rcu
>>>   mm/mem_cgroup_iter: NULL-ify 'last_visited' for invalidated iterators
>>>   mm/mem_cgroup_iter: Provide _iter_invalidate() the dying memcg as an
>>>     argument
>>>   mm/mem_cgroup_iter: Invalidate iterators only if needed
>>>   mm/mem_cgroup_iter: Don't bother checking 'dead_count' anymore
>>>   mm/mem_cgroup_iter: Cleanup mem_cgroup_iter_load()
>>>   mm/mem_cgroup_iter: Drop dead_count related infrastructure
>>>
>>>  mm/memcontrol.c | 133 +++++++++++++++++++++++++++---------------------
>>>  1 file changed, 76 insertions(+), 57 deletions(-)
>>
>> Somewhere we need
>>
>>         struct mem_cgroup __rcu *last_visited;
>>
>> with a comment that it is annotated __rcu because we guarantee the
>> pointed-to memory is not freed until one grace period after we
>> rewrite the pointer.
> 
> Hm, but we don't provide this, do we?

Ok, we do provide that guarantee, in

                call_rcu(&cgrp->rcu_head, cgroup_free_rcu);


