[Devel] [PATCH RHEL COMMIT] ve/umh: create kernel thread for each synchronous UMH request

Konstantin Khorenko khorenko at virtuozzo.com
Tue Sep 28 14:42:37 MSK 2021


The commit is pushed to "branch-rh9-5.14.vz9.1.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after ark-5.14
------>
commit e64d20c95b5eb314d35e1127f316dbc4b18914d4
Author: Stanislav Kinsburskiy <skinsbursky at virtuozzo.com>
Date:   Tue Sep 28 14:42:37 2021 +0300

    ve/umh: create kernel thread for each synchronous UMH request
    
    So, the rationale...
    A UMH request is just a work item attached to a queue, which is handled by
    per-cpu kernel threads (this is a simplified explanation, but true in general).
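    
    For reference, a minimal caller-side sketch of the mainline UMH API (the
    helper path and arguments are hypothetical, for illustration only):
    
        static int run_helper(void)
        {
                char *argv[] = { "/sbin/my-helper", NULL };
                char *envp[] = { "HOME=/",
                                 "PATH=/sbin:/usr/sbin:/bin:/usr/bin", NULL };
    
                /* Queue the request and wait for the helper binary to exit. */
                return call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC);
        }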
    
    We want to execute this work in the container environment because:
    1) we need to set the container root to execute the right binary and
    2) we want to account the userspace process resources in the container.
    
    The solution we had previously and have now is to create per-container
    UMH kthreads to serve per-container UMH requests. This is done by creating
    and attaching a "kthreadd" kthread first and then forking all the kthreads
    from it (including the UMH "khelper" kthreads).
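    
    In kthread_worker terms that setup looks roughly like the sketch below
    (assumptions: the ve_umh_start() name and the umh_thread field are
    illustrative, and the per-container "kthreadd" forking detail is omitted;
    only ve->umh_worker is taken from the diff below):
    
        static int ve_umh_start(struct ve_struct *ve)
        {
                struct task_struct *task;
    
                kthread_init_worker(&ve->umh_worker);
                /* One dedicated thread serving this container's UMH works. */
                task = kthread_run(kthread_worker_fn, &ve->umh_worker,
                                   "khelper");
                if (IS_ERR(task))
                        return PTR_ERR(task);
                ve->umh_thread = task;  /* illustrative field, see above */
                return 0;
        }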
    
    But it turned out that on start NFSd queues a synchronous UMH work, which
    executes a userspace binary, which in turn enters the kernel and queues
    another work. The problem is that NFSd waits for the first work to finish,
    while the first depends on the second finishing. And the second can't be
    executed, because the "khelper" thread is occupied by the first work (i.e.
    it is waiting for it to finish).
    IOW, it's a deadlock.
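    
    Reduced to a self-contained sketch (hypothetical works A and B on a
    single-threaded kthread_worker; the real chain goes through the userspace
    helper, but the single-worker self-deadlock is the same):
    
        static struct kthread_worker helper_worker;  /* the only "khelper" */
        static struct kthread_work work_b;           /* init omitted for brevity */
    
        static void work_a_fn(struct kthread_work *work)
        {
                /* B lands in the queue behind the still-running A... */
                kthread_queue_work(&helper_worker, &work_b);
                /* ...so this wait never completes: the only worker thread
                 * is busy running A itself. */
                kthread_flush_work(&work_b);
        }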
    
    The initial idea was to create per-cpu "khelpers" with a shared workqueue,
    but it wasn't clear how this would scale when cpuset cgroup parameters
    change, plus having idle kthreads is a waste of resources.
    Another approach was to use CPU hotplug callbacks, but again it's not clear
    how this interacts with virtual cpus (and cpuset cgroups) and how to
    provide a custom VE pointer to the cpuhp callbacks.
    
    So, this patch solves the issue in a simple and straightforward manner:
    instead of executing a synchronous work itself, "khelper" forks another
    kernel thread for it and is thus able to process the next work immediately.
    This allows UMH works to be executed asynchronously and thus solves the
    issue.
    
    Note: we pay the fork overhead (which is negligible under the assumption
    that UMH is not called that often), but we get simplicity and we don't
    waste resources on additional per-cpu "khelpers".
    
    Suggested-by: Kirill Tkhai <ktkhai at virtuozzo.com>
    Signed-off-by: Stanislav Kinsburskiy <skinsbursky at virtuozzo.com>
    
    (cherry-picked from vz8 commit c72dd108711c ("ve/umh: create kernel thread for
    each synchronious UMH request"))
    
    Signed-off-by: Nikita Yushchenko <nikita.yushchenko at virtuozzo.com>
---
 kernel/umh.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/kernel/umh.c b/kernel/umh.c
index 15574a075c2a..e4de634358e7 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -421,12 +421,32 @@ static void call_usermodehelper_queue_ve(struct subprocess_info *info)
 	kthread_queue_work(&ve->umh_worker, &info->ve_work);
 }
 
+static int call_usermodehelper_exec_sync_cb(void *data)
+{
+	call_usermodehelper_exec_sync(data);
+	do_exit(0);
+}
+
 static void call_usermodehelper_exec_work_ve(struct kthread_work *work)
 {
 	struct subprocess_info *sub_info =
 		container_of(work, struct subprocess_info, ve_work);
+	pid_t pid;
+	int (*umh_worker)(void *data);
 
-	__call_usermodehelper_exec_work(sub_info);
+	/* Execute the work asynchronously if the caller waits for it, to
+	 * avoid a deadlock (we have only one kworker).
+	 * Otherwise it's OK to execute the binary synchronously. */
+	if (sub_info->wait & UMH_WAIT_PROC)
+		umh_worker = call_usermodehelper_exec_sync_cb;
+	else
+		umh_worker = call_usermodehelper_exec_async;
+
+	pid = kernel_thread(umh_worker, sub_info, CLONE_PARENT | SIGCHLD);
+	if (pid < 0) {
+		sub_info->retval = pid;
+		umh_complete(sub_info);
+	}
 }
 
 int call_usermodehelper_exec_ve(struct ve_struct *ve,

