[Devel] [PATCH RHEL7 COMMIT] ms/pid_ns: Fix race between setns'ed fork() and zap_pid_ns_processes()
Konstantin Khorenko
khorenko at virtuozzo.com
Tue May 16 09:23:04 PDT 2017
The commit is pushed to "branch-rh7-3.10.0-514.16.1.vz7.32.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.16.1.vz7.32.3
------>
commit 21c4940330d1b414cae840280b4c6bd8b00d4b89
Author: Kirill Tkhai <ktkhai at virtuozzo.com>
Date: Tue May 16 20:23:04 2017 +0400
ms/pid_ns: Fix race between setns'ed fork() and zap_pid_ns_processes()
This will go to mainstream:
https://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git/commit/?h=for-linus&id=3fd37226216620c1a468afa999739d5016fbc349
Imagine we have a pid namespace and a task from its parent pid_ns
that has done setns() into the pid namespace. The task is doing
fork() while the pid namespace's child reaper is dying. The two
race as follows:
Task from parent pid_ns             Child reaper
copy_process()                      ..
  alloc_pid()                       ..
  ..                                zap_pid_ns_processes()
  ..                                  disable_pid_allocation()
  ..                                  read_lock(&tasklist_lock)
  ..                                  iterate over pids in pid_ns
  ..                                    kill tasks linked to pids
  ..                                  read_unlock(&tasklist_lock)
  write_lock_irq(&tasklist_lock);   ..
  attach_pid(p, PIDTYPE_PID);       ..
  ..                                ..
So the just-created task p won't receive SIGKILL, and the pid
namespace will be left in a contradictory state. Only a manual
kill will help there, but does userspace care about this?
I suppose most users just inject a task into a pid namespace
and wait for a SIGCHLD from it.
The patch fixes the problem. It simply checks for
(pid_ns->nr_hashed & PIDNS_HASH_ADDING) in copy_process().
We do it under the tasklist_lock, so we can't miss
PIDNS_HASH_ADDING being cleared, as noted by Oleg:
"zap_pid_ns_processes() does disable_pid_allocation()
and then takes tasklist_lock to kill the whole namespace.
Given that copy_process() checks PIDNS_HASH_ADDING
under write_lock(tasklist) they can't race;
if copy_process() takes this lock first, the new child will
be killed, otherwise copy_process() can't miss
the change in ->nr_hashed."
If allocation is disabled, we just return -ENOMEM,
as alloc_pid() does in such cases.
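The ordering argument above can be replayed in a small userspace
model. This is only a sketch, not kernel code: struct model_pid_ns,
MODEL_HASH_ADDING and every model_* function are hypothetical names
invented for illustration. It is single-threaded, so the order of
calls stands in for the order in which the two sides take
tasklist_lock.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model: names echo the kernel's, values are illustrative. */
#define MODEL_HASH_ADDING (1U << 31)
#define MODEL_MAX_TASKS   8

struct model_pid_ns {
	unsigned int nr_hashed;        /* ADDING flag in the top bit, count below */
	bool alive[MODEL_MAX_TASKS];   /* tasks attached to the namespace */
	size_t nr_tasks;
};

void model_ns_init(struct model_pid_ns *ns)
{
	memset(ns, 0, sizeof(*ns));
	ns->nr_hashed = MODEL_HASH_ADDING;
}

/* Reaper side: disable allocation, then kill every task attached so far. */
void model_zap(struct model_pid_ns *ns)
{
	ns->nr_hashed &= ~MODEL_HASH_ADDING;
	for (size_t i = 0; i < ns->nr_tasks; i++)
		ns->alive[i] = false;
}

/*
 * Fork side: the step that would run with tasklist_lock held for writing.
 * check_flag=false models the code before the patch.
 */
int model_attach(struct model_pid_ns *ns, bool check_flag)
{
	assert(ns->nr_tasks < MODEL_MAX_TASKS);
	if (check_flag && !(ns->nr_hashed & MODEL_HASH_ADDING))
		return -ENOMEM;
	ns->alive[ns->nr_tasks++] = true;
	ns->nr_hashed++;
	return 0;
}

/* Replay the bad interleaving: the reaper zaps first, the fork attaches second. */
size_t model_survivors_after_race(bool check_flag)
{
	struct model_pid_ns ns;
	size_t live = 0;

	model_ns_init(&ns);
	model_zap(&ns);
	(void)model_attach(&ns, check_flag);
	for (size_t i = 0; i < ns.nr_tasks; i++)
		live += ns.alive[i];
	return live;
}
```

Calling model_survivors_after_race(false) leaves one live task in a
dead namespace, which is the situation described above; with
check_flag set, the attach fails with -ENOMEM and nothing survives.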
v2: Do not move disable_pid_allocation(), do not
introduce a new variable in copy_process() and simplify
the patch as suggested by Oleg Nesterov.
Account for the problem with double irq enabling
found by Eric W. Biederman.
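The diff handles the unlocking by moving both unlock calls to the
shared bad_fork_cancel_cgroup label, so every failing path releases
each lock exactly once. A minimal userspace sketch of that shape
follows. It makes no claim about what exactly Eric reported;
struct model_locks and model_copy_tail() are hypothetical, the locks
are plain depth counters, and -EINTR stands in for the kernel-only
-ERESTARTNOINTR.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for tasklist_lock and siglock: depth counters only. */
struct model_locks {
	int tasklist_depth;
	int siglock_depth;
};

/*
 * Models the tail of copy_process(): take both locks, run the two checks,
 * and release on a single error label instead of inside each branch.
 */
int model_copy_tail(struct model_locks *l, bool signal_pending, bool ns_dying)
{
	int retval;

	l->tasklist_depth++;	/* write_lock_irq(&tasklist_lock) */
	l->siglock_depth++;	/* spin_lock(&current->sighand->siglock) */

	if (signal_pending) {
		retval = -EINTR;	/* stand-in for -ERESTARTNOINTR */
		goto cancel;
	}
	if (ns_dying) {
		retval = -ENOMEM;
		goto cancel;
	}

	/* Success path: the task would be attached here, then unlock. */
	l->siglock_depth--;
	l->tasklist_depth--;
	return 0;

cancel:
	/* The only unlock site for failures: no branch above unlocks itself. */
	l->siglock_depth--;
	l->tasklist_depth--;
	return retval;
}
```

Whatever combination of conditions is passed in, both counters are
back at zero on return, which is the property the shared label is
meant to give.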
Fixes: c876ad768215 ("pidns: Stop pid allocation when init dies")
Signed-off-by: Kirill Tkhai <ktkhai at virtuozzo.com>
CC: Andrew Morton <akpm at linux-foundation.org>
CC: Ingo Molnar <mingo at kernel.org>
CC: Peter Zijlstra <peterz at infradead.org>
CC: Oleg Nesterov <oleg at redhat.com>
CC: Mike Rapoport <rppt at linux.vnet.ibm.com>
CC: Michal Hocko <mhocko at suse.com>
CC: Andy Lutomirski <luto at kernel.org>
CC: "Eric W. Biederman" <ebiederm at xmission.com>
CC: Andrei Vagin <avagin at openvz.org>
CC: Cyrill Gorcunov <gorcunov at openvz.org>
CC: Serge Hallyn <serge at hallyn.com>
Cc: stable at vger.kernel.org
Acked-by: Oleg Nesterov <oleg at redhat.com>
Signed-off-by: Eric W. Biederman <ebiederm at xmission.com>
Signed-off-by: Kirill Tkhai <ktkhai at virtuozzo.com>
---
kernel/fork.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/kernel/fork.c b/kernel/fork.c
index 24e178f..0509a83 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1601,11 +1601,13 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	 */
 	recalc_sigpending();
 	if (signal_pending(current)) {
-		spin_unlock(&current->sighand->siglock);
-		write_unlock_irq(&tasklist_lock);
 		retval = -ERESTARTNOINTR;
 		goto bad_fork_cancel_cgroup;
 	}
+	if (unlikely(!(ns_of_pid(pid)->nr_hashed & PIDNS_HASH_ADDING))) {
+		retval = -ENOMEM;
+		goto bad_fork_cancel_cgroup;
+	}
 
 	if (likely(p->pid)) {
 		ptrace_init_task(p, (clone_flags & CLONE_PTRACE) || trace);
@@ -1655,6 +1657,8 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	return p;
 
 bad_fork_cancel_cgroup:
+	spin_unlock(&current->sighand->siglock);
+	write_unlock_irq(&tasklist_lock);
 	cgroup_cancel_fork(p, cgrp_ss_priv);
 bad_fork_free_pid:
 	if (pid != &init_struct_pid)