[Devel] [PATCH RH9 05/33] exit: clear TIF_MEMDIE after exit_task_work

Andrey Zhadchenko andrey.zhadchenko at virtuozzo.com
Thu Sep 23 22:08:08 MSK 2021


From: Vladimir Davydov <vdavydov at virtuozzo.com>

An mm_struct may be pinned by a file. An example is the vhost-net device
created by qemu/kvm (see vhost_net_ioctl -> vhost_net_set_owner ->
vhost_dev_set_owner). If such a process gets OOM-killed, the reference to
its mm_struct will only be released from exit_task_work -> ____fput ->
__fput -> vhost_net_release -> vhost_dev_cleanup, which is called after
exit_mm, where TIF_MEMDIE is cleared. As a result, we can start
selecting the next victim before giving the last one a chance to free
its memory. In practice, this leads to killing several VMs along with
the fattest one.
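The ordering problem can be illustrated with a small user-space C model. This is a hypothetical sketch, not kernel code: model_mm, model_task, model_mmput() and model_do_exit() are made-up names, `users` stands in for mm_users, and `memdie` stands in for TIF_MEMDIE. The function reports how many references to the mm were still held at the moment the victim flag was cleared, either with the old placement (in exit_mm) or with the placement after exit_task_work used by this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of the exit ordering; not kernel code. */
struct model_mm {
	int users;		/* stands in for mm_users */
};

struct model_task {
	struct model_mm *mm;
	bool memdie;		/* stands in for TIF_MEMDIE */
	bool file_pins_mm;	/* e.g. an open vhost-net fd holding an mm reference */
};

/* Drop one reference; the memory is only freed once this reaches zero. */
static void model_mmput(struct model_mm *mm)
{
	assert(mm->users > 0);
	mm->users--;
}

/*
 * Run the modelled exit sequence. Returns the number of mm references still
 * held when the victim flag was cleared (0 means the memory was really gone),
 * or -1 if the task was not an OOM victim.
 */
static int model_do_exit(struct model_task *t, bool clear_after_task_work)
{
	int users_when_cleared = -1;

	/* exit_mm(): drop the task's own reference. */
	model_mmput(t->mm);
	if (!clear_after_task_work && t->memdie) {
		t->memdie = false;	/* old placement, in exit_mm() */
		users_when_cleared = t->mm->users;
	}

	/* exit_task_work(): ____fput -> __fput -> vhost_net_release -> vhost_dev_cleanup */
	if (t->file_pins_mm) {
		model_mmput(t->mm);
		t->file_pins_mm = false;
	}

	if (clear_after_task_work && t->memdie) {
		t->memdie = false;	/* placement after exit_task_work() */
		users_when_cleared = t->mm->users;
	}
	return users_when_cleared;
}
```

With one reference held by the task and one by the pinning file, the old order clears the flag while one reference is still outstanding, so the OOM killer may move on too early; the new order clears it only after the count has dropped to zero.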

https://jira.sw.ru/browse/PSBM-44683

Signed-off-by: Vladimir Davydov <vdavydov at virtuozzo.com>
Reviewed-by: Kirill Tkhai <ktkhai at virtuozzo.com>

khorenko@: Volodya tried to send this upstream, but the fix was not applied:
https://lkml.org/lkml/2016/2/29/537

The patch was rejected because in ms (mainstream) it increases the chance of
a deadlock: someone takes lock A -> tries to allocate memory -> no memory ->
calls OOM -> OOM selects a task -> the task needs lock A in order to die ->
deadlock.

A better solution has not been implemented in ms. We are applying the current
patch because we have a timeout against such a deadlock: if OOM cannot kill a
task within X seconds, the OOM caller drops its locks and tries to allocate
memory once again.
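The deadlock and the timeout fallback described above can be sketched as a deterministic step simulation. This is an assumption-laden model, not the actual vz OOM code (which this patch does not show): ticks stand in for the "X secs" timeout, and model_state, victim_step() and oom_caller_wait() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the lock-A deadlock and the timeout fallback. */
struct model_state {
	bool lock_a_held_by_caller;	/* the OOM caller still holds lock A */
	bool victim_exited;		/* the OOM victim finished dying */
};

/* One scheduling step for the victim: it can only die if lock A is free. */
static void victim_step(struct model_state *s)
{
	if (!s->lock_a_held_by_caller)
		s->victim_exited = true;
}

/*
 * The OOM caller waits for the victim while holding lock A. After
 * timeout_ticks it drops lock A (modelling "drop locks and retry").
 * A negative timeout_ticks means "wait forever". Returns the tick at which
 * the victim exited, or -1 if it did not exit within max_ticks.
 */
static int oom_caller_wait(int timeout_ticks, int max_ticks)
{
	struct model_state s = { .lock_a_held_by_caller = true, .victim_exited = false };

	assert(max_ticks > 0);
	for (int tick = 0; tick < max_ticks; tick++) {
		victim_step(&s);
		if (s.victim_exited)
			return tick;
		if (timeout_ticks >= 0 && tick + 1 >= timeout_ticks)
			s.lock_a_held_by_caller = false;	/* timeout: release lock A */
	}
	return -1;
}
```

Without a timeout the victim never exits (the call returns -1, i.e. the deadlock); with a timeout of 5 ticks the caller releases lock A at the end of tick 4 and the victim exits on tick 5.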

Signed-off-by: Andrey Ryabinin <aryabinin at virtuozzo.com>

(cherry picked from vz8 commit bd5ffae6952cb97fd97d1ffdba6049baab6c9396)
Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko at virtuozzo.com>
---
 kernel/exit.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 9a89e7f..9e07095 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -499,8 +499,6 @@ static void exit_mm(void)
 	mmap_read_unlock(mm);
 	mm_update_next_owner(mm);
 	mmput(mm);
-	if (test_thread_flag(TIF_MEMDIE))
-		exit_oom_victim();
 }
 
 static struct task_struct *find_alive_thread(struct task_struct *p)
@@ -824,6 +822,8 @@ void __noreturn do_exit(long code)
 	exit_task_namespaces(tsk);
 	exit_task_work(tsk);
 	exit_thread(tsk);
+	if (test_thread_flag(TIF_MEMDIE))
+		exit_oom_victim();
 
 	/*
 	 * Flush inherited counters to the parent - before the parent
-- 
1.8.3.1
