[Devel] [PATCH vz9 08/14] ve/umh: create kernel thread for each synchronous UMH request

Nikita Yushchenko nikita.yushchenko at virtuozzo.com
Mon Sep 27 09:29:54 MSK 2021


From: Stanislav Kinsburskiy <skinsbursky at virtuozzo.com>

So, the rationale:
A UMH request is just a work item attached to a queue, which is served by
per-cpu kernel threads (a simplified explanation, but true in general).
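
As a minimal sketch of that pattern (using the mainline kthread_worker
API; the ve_work and umh_worker fields follow the diff below, while the
helper name itself is made up for illustration):

	/* A UMH request carries a kthread_work; queueing it hands the
	 * request over to the container's worker thread. */
	static void queue_umh_request_ve(struct ve_struct *ve,
					 struct subprocess_info *info)
	{
		kthread_init_work(&info->ve_work,
				  call_usermodehelper_exec_work_ve);
		kthread_queue_work(&ve->umh_worker, &info->ve_work);
	}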

We want to execute this work in the container environment because:
1) we need to set the container root to execute the right binary, and
2) we want to account the userspace process's resources in the container.

The solution we had previously and still have is to create per-container
UMH kthreads to serve per-container UMH requests. This is done by creating
and attaching a "kthreadd" kthread first and then forking all the kthreads
from it (including the UMH "khelper" kthreads).
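
Roughly, the per-VE worker setup looks like this (a sketch under the
assumption that the mainline kthread_worker API is used; ve->umh_task is
made up, ve->umh_worker matches the diff below):

	static int ve_start_umh(struct ve_struct *ve)
	{
		struct task_struct *task;

		kthread_init_worker(&ve->umh_worker);
		/* One "khelper" thread per container; it loops in
		 * kthread_worker_fn() processing queued UMH works. */
		task = kthread_run(kthread_worker_fn, &ve->umh_worker,
				   "khelper");
		if (IS_ERR(task))
			return PTR_ERR(task);
		ve->umh_task = task;
		return 0;
	}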

But it turned out that on start NFSd queues a synchronous UMH work, which
executes a userspace binary, which in turn enters the kernel and queues
another work. The problem is that NFSd waits for the first work to finish,
while the first in turn depends on the second. And the second can't be
executed, because the "khelper" thread is occupied by the first work (i.e.
waiting for it to finish).
IOW, it's a deadlock.
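
Spelled out as a timeline (illustrative and simplified):

	NFSd                            per-VE "khelper" worker
	----                            -----------------------
	queue work A (UMH_WAIT_PROC)
	wait for A to complete          run work A:
	                                  spawn the userspace helper and
	                                  wait for it to exit; the helper
	                                  re-enters the kernel, queues
	                                  work B and waits for it
	                                work B never runs: the only
	                                worker is blocked inside work A,
	                                so work A never completes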

The initial idea was to create per-cpu "khelpers" with a shared workqueue,
but it wasn't clear how this would scale when cpuset cgroup parameters
change, plus having idle kthreads is a waste of resources.
Another approach was to use cpu hotplug callbacks, but again it's not clear
how those interact with virtual cpus (and cpuset cgroups) and how to pass a
custom VE pointer into the cpuhp callbacks.

So, this patch solves the issue in a simple and straightforward manner:
instead of executing a synchronous work in place, "khelper" forks another
kernel thread for it and is thus able to process the next work immediately.
This makes UMH works execute asynchronously and thus resolves the deadlock.

Note: we pay the fork overhead (negligible under the assumption that UMH is
not called that often), but we get simplicity and don't waste resources on
additional per-cpu "khelpers".

Suggested-by: Kirill Tkhai <ktkhai at virtuozzo.com>
Signed-off-by: Stanislav Kinsburskiy <skinsbursky at virtuozzo.com>

(cherry-picked from vz8 commit c72dd108711c ("ve/umh: create kernel thread for
each synchronious UMH request"))

Signed-off-by: Nikita Yushchenko <nikita.yushchenko at virtuozzo.com>
---
 kernel/umh.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/kernel/umh.c b/kernel/umh.c
index 15574a075c2a..e4de634358e7 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -421,12 +421,32 @@ static void call_usermodehelper_queue_ve(struct subprocess_info *info)
 	kthread_queue_work(&ve->umh_worker, &info->ve_work);
 }
 
+static int call_usermodehelper_exec_sync_cb(void *data)
+{
+	call_usermodehelper_exec_sync(data);
+	do_exit(0);
+}
+
 static void call_usermodehelper_exec_work_ve(struct kthread_work *work)
 {
 	struct subprocess_info *sub_info =
 		container_of(work, struct subprocess_info, ve_work);
+	pid_t pid;
+	int (*umh_worker)(void *data);
 
-	__call_usermodehelper_exec_work(sub_info);
+	/* Execute the work asynchronously if the caller waits for it,
+	 * to avoid a deadlock (we have only one kworker). Otherwise
+	 * it's OK to execute the binary synchronously. */
+	if (sub_info->wait & UMH_WAIT_PROC)
+		umh_worker = call_usermodehelper_exec_sync_cb;
+	else
+		umh_worker = call_usermodehelper_exec_async;
+
+	pid = kernel_thread(umh_worker, sub_info, CLONE_PARENT | SIGCHLD);
+	if (pid < 0) {
+		sub_info->retval = pid;
+		umh_complete(sub_info);
+	}
 }
 
 int call_usermodehelper_exec_ve(struct ve_struct *ve,
-- 
2.30.2


