[Devel] [PATCH RH7 v2 2/2] ve/kthread: fix race when work can be added to stopped kthread worker

Kirill Tkhai ktkhai at virtuozzo.com
Wed Mar 21 17:53:18 MSK 2018


On 21.03.2018 15:38, Pavel Tikhomirov wrote:
> Race can be reproduced with steps below:
> 
> 1) Add these test patch to increase the race probability:
> 
> kernel/kmod.c
> @@ -977,6 +977,8 @@ int call_usermodehelper_fns_ve(struct ve_struct *ve,
>         khelper = ve_is_super(ve) ? &khelper_worker : &ve->ve_umh_worker;
> 
>         if (ve_is_super(ve) || (get_exec_env() == ve)) {
> +               while(ve->is_running)
> +                       cond_resched();
>                 err = call_usermodehelper_by(khelper, path, argv, envp, wait, init,
>                                              cleanup, data);
>                 goto out_put;
> 
> 2) Set corepattern to type pipe in CT:
> 
> echo "|/bin/dd of=/vz/coredumps/core-%e-%t.%p" > /proc/sys/kernel/core_pattern
> 
> 3) Produce a segfault inside a container and next try to stop the
> container killing init.
> 
> Coredump creates "dd" work and ads it to ve_umh_worker, which is already
> stopped and will never handle these work, and our work will hang
> forever, and container will never stop:
> 
> [root at kuchy ~]# cat /proc/6064/stack
> [<ffffffff810a86f9>] __call_usermodehelper_exec+0x179/0x1a0
> [<ffffffff810a8eae>] call_usermodehelper_by+0x5e/0xa0
> [<ffffffff810a901c>] call_usermodehelper_fns_ve+0xec/0x170
> [<ffffffff8128600c>] do_coredump+0x4dc/0xaf0
> [<ffffffff810a1cc5>] get_signal_to_deliver+0x1c5/0x5e0
> [<ffffffff8102a387>] do_signal+0x57/0x6b0
> [<ffffffff8102aa3f>] do_notify_resume+0x5f/0xb0
> [<ffffffff816df6ec>] retint_signal+0x48/0x8c
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> Fix these by putting is_running check and usermodhelper queueing in one
> op_sem lock critical section. It works as we already have these lock
> taken in ve_stop_ns.
> 
> commit a08505af8c51 ("ve/kthread: khelper kthread in a container
> introduced") says that we can't hold lock when waiting for userspace so
> drop the lock when waiting.
> 
> v2: split function arguments change into separate preparatory patch
> 
> https://jira.sw.ru/browse/PSBM-82490
> Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
> ---
>  kernel/kmod.c | 50 +++++++++++++++++++++++++++++++++-----------------
>  1 file changed, 33 insertions(+), 17 deletions(-)
> 
> diff --git a/kernel/kmod.c b/kernel/kmod.c
> index 9c37e8b3ab01..2b45f06cbe51 100644
> --- a/kernel/kmod.c
> +++ b/kernel/kmod.c
> @@ -862,6 +862,19 @@ struct subprocess_info *call_usermodehelper_setup(char *path, char **argv,
>  }
>  EXPORT_SYMBOL(call_usermodehelper_setup);
>  
> +static void ve_op_sem_down_read(struct ve_struct *ve)
> +{
> +	if (ve)
> +		down_read(&ve->op_sem);
> +}
> +
> +static void ve_op_sem_up_read(struct ve_struct *ve)
> +{
> +	if (ve)
> +		up_read(&ve->op_sem);
> +}
> +
> +
>  /**
>   * call_usermodehelper_exec - start a usermode application
>   * @sub_info: information about the subprocessa
> @@ -914,7 +927,9 @@ static int __call_usermodehelper_exec(struct subprocess_info *sub_info,
>  		goto unlock;
>  
>  	if (wait & UMH_KILLABLE) {
> +		ve_op_sem_up_read(ve);
>  		retval = wait_for_completion_killable(&done);
> +		ve_op_sem_down_read(ve);

I think it's not good to export ve_struct::op_sem outside ve.c file.
We will never collect this can with worms back again.

Can't we just ignore usermode helper requests, if init_task is dead?
Something like this (not tested):

diff --git a/kernel/kmod.c b/kernel/kmod.c
index 9c37e8b3ab01..88c99c57891e 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -803,6 +803,12 @@ int __usermodehelper_disable(enum umh_disable_depth depth)
 	return -EAGAIN;
 }
 
+void wait_khelpers(void)
+{
+	wait_event(running_helpers_waitq,
+		   atomic_read(&running_helpers) == 0);
+}
+
 static void helper_lock(void)
 {
 	atomic_inc(&running_helpers);
@@ -891,7 +897,9 @@ static int __call_usermodehelper_exec(struct subprocess_info *sub_info,
 	if (sub_info->path[0] == '\0')
 		goto out;
 
-	if (usermodehelper_disabled) {
+	if (usermodehelper_disabled ||
+	    (ve && (ve->init_task->flags & PF_EXITING))) {
+		smp_rmb(); /* Pairs with smp_mb() in do_exit() */
 		retval = -EBUSY;
 		goto out;
 	}
diff --git a/kernel/ve/ve.c b/kernel/ve/ve.c
index 68f84797a840..f562fb9e5cbf 100644
--- a/kernel/ve/ve.c
+++ b/kernel/ve/ve.c
@@ -556,6 +556,8 @@ void ve_stop_ns(struct pid_namespace *pid_ns)
 	if (!ve->ve_ns || ve->ve_ns->pid_ns != pid_ns)
 		return;
 
+	wait_khelpers();
+
 	down_write(&ve->op_sem);
 	/*
 	 * Here the VE changes its state into "not running".


More information about the Devel mailing list