[Devel] [PATCH RHEL9 COMMIT] ms/mm, oom: pagefault_out_of_memory: don't force global OOM for dying tasks

Konstantin Khorenko khorenko at virtuozzo.com
Fri Nov 12 20:30:29 MSK 2021


The commit is pushed to "branch-rh9-5.14.vz9.1.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh9-5.14.0-4.vz9.10.26
------>
commit 3b18b86fbcdd394f38bd51f5d1d0ed49dfcf78b2
Author: Vasily Averin <vvs at virtuozzo.com>
Date:   Fri Nov 12 20:30:28 2021 +0300

    ms/mm, oom: pagefault_out_of_memory: don't force global OOM for dying tasks
    
    Patch series "memcg: prohibit unconditional exceeding the limit of dying tasks", v3.
    
    Memory cgroup charging allows killed or exiting tasks to exceed the hard
    limit.  This can be misused to trigger a global OOM from inside a
    memcg-limited container.  On the other hand, if a memcg charge fails for
    an allocation made inside the #PF handler, a global OOM is triggered from
    inside pagefault_out_of_memory().
    
    To prevent these problems, this patchset:
     (a) removes execution of out_of_memory() from
         pagefault_out_of_memory(), because nobody can explain why it is
         necessary.
     (b) allows memcg to fail allocations of dying/killed tasks.
    
    This patch (of 3):
    
    Any allocation failure during the #PF path will return with VM_FAULT_OOM,
    which in turn results in a call to pagefault_out_of_memory(), which in
    turn executes out_of_memory() and can kill a random task.
    
    An allocation might fail when the current task is the OOM victim and
    there are no memory reserves left.  The OOM killer is already invoked at
    the page allocator level for the global OOM and at the charging level
    for the memcg one.  Both have much more information about the scope of
    the allocation/charge request.  This means that either the OOM killer
    has been invoked properly and didn't make the allocation succeed, or it
    has been skipped because it couldn't have been invoked.  In both cases
    triggering it from here is pointless and even harmful.
    
    It makes much more sense to let the killed task die rather than to wake
    up an eternally hungry oom-killer and send him to choose a fatter victim
    for breakfast.
    
    Link: https://lkml.kernel.org/r/0828a149-786e-7c06-b70a-52d086818ea3@virtuozzo.com
    Signed-off-by: Vasily Averin <vvs at virtuozzo.com>
    
    Suggested-by: Michal Hocko <mhocko at suse.com>
    Acked-by: Michal Hocko <mhocko at suse.com>
    Cc: Johannes Weiner <hannes at cmpxchg.org>
    Cc: Mel Gorman <mgorman at techsingularity.net>
    Cc: Roman Gushchin <guro at fb.com>
    Cc: Shakeel Butt <shakeelb at google.com>
    Cc: Tetsuo Handa <penguin-kernel at i-love.sakura.ne.jp>
    Cc: Uladzislau Rezki <urezki at gmail.com>
    Cc: Vladimir Davydov <vdavydov.dev at gmail.com>
    Cc: Vlastimil Babka <vbabka at suse.cz>
    Cc: <stable at vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>
    
    https://jira.sw.ru/browse/PSBM-134774
    (cherry picked from commit 0b28179a6138a5edd9d82ad2687c05b3773c387b)
    Signed-off-by: Vasily Averin <vvs at virtuozzo.com>
---
 mm/oom_kill.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index f603a954a646..1eeea2900828 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -1260,6 +1260,9 @@ void pagefault_out_of_memory(void)
 	if (mem_cgroup_oom_synchronize(true))
 		return;
 
+	if (fatal_signal_pending(current))
+		return;
+
 	if (!mutex_trylock(&oom_lock))
 		return;
 	out_of_memory(&oc);

