[Devel] [PATCH rh7 v4 0/9] mm/mem_cgroup_iter: Reduce the number of iterator restarts upon cgroup removals
Kirill Tkhai
ktkhai at virtuozzo.com
Fri Feb 26 12:23:51 MSK 2021
On 24.02.2021 21:55, Konstantin Khorenko wrote:
> Many thanks to Kirill Tkhai for his bright ideas and review!
>
> Problem description from the user's point of view:
> * the Node is slow
> * the Node has a lot of free RAM
> * the Node has a lot of swapin/swapout
> * kswapd is always running
>
> Problem in a nutshell from a technical point of view:
> * kswapd is looping in shrink_zone() inside the loop
> do {} while ((memcg = mem_cgroup_iter(root, memcg, &reclaim)));
> (and never goes through the outer loop)
> * there are quite a number of memory cgroups on the Node (~1000)
> * some cgroups are hard to reclaim (reclaim may take ~3 seconds),
> this is because of very busy disk due to permanent swapin/swapout
> * mem_cgroup_iter() never manages to scan all cgroups
> in a row; it keeps restarting from the root cgroup
> (after a different number of cgroups scanned each time)
>
> Q: Why does mem_cgroup_iter() restart from the root memcg?
> A: Because it is invalidated once some memory cgroup is
> destroyed on the Node.
> Note: the destruction of ANY memory cgroup on the Node
> leads to an iter restart.
>
> The following patchset solves this problem in the following way:
> there is no need to restart the iter unless its cached position
> is exactly the memory cgroup being destroyed.
>
> The patchset ensures the iter->last_visited is NULL-ified on
> invalidation and thus restarts only in the unlikely case when
> the iter points to the memcg being destroyed.
>
> https://jira.sw.ru/browse/PSBM-123655
>
> v2 changes:
> - reverted 2 patches in this code which were focused on synchronizing
> updates of iter->last_visited and ->last_dead_count
> (as we are getting rid of iter->last_dead_count altogether)
> - use rcu primitives to access iter->last_visited
>
> v3 changes:
> - more comments explaining the locking scheme
> - use rcu_read_{lock,unlock}_sched() in mem_cgroup_iter()
> for synchronization with the iterator invalidation func
> - drop the rcu_read_{lock,unlock}() wrapper in the iterator
> invalidation func, as it protects nothing
>
> v4 changes:
> - extended the comment on why the iter invalidation function must
> see all pointers to the dying memcg, and why no pointer to it
> can be written afterwards
>
> Konstantin Khorenko (9):
> Revert "mm/memcg: fix css_tryget(),css_put() imbalance"
> Revert "mm/memcg: use seqlock to protect reclaim_iter updates"
> mm/mem_cgroup_iter: Make 'iter->last_visited' a bit more stable
> mm/mem_cgroup_iter: Always assign iter->last_visited under rcu
> mm/mem_cgroup_iter: Provide _iter_invalidate() the dying memcg as an
> argument
> mm/mem_cgroup_iter: NULL-ify 'last_visited' for invalidated iterators
> mm/mem_cgroup_iter: Don't bother checking 'dead_count' anymore
> mm/mem_cgroup_iter: Cleanup mem_cgroup_iter_load()
> mm/mem_cgroup_iter: Drop dead_count related infrastructure
>
> mm/memcontrol.c | 208 ++++++++++++++++++++++++++++++++++--------------
> 1 file changed, 150 insertions(+), 58 deletions(-)
The final result is OK, but [4/9] hunks must be moved.