[Devel] [PATCH RHEL7 COMMIT] pidns: expose task pid_ns_for_children to userspace
Konstantin Khorenko
khorenko at virtuozzo.com
Thu Jun 11 19:20:16 MSK 2020
The commit is pushed to "branch-rh7-3.10.0-1127.10.1.vz7.162.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1127.10.1.vz7.162.2
------>
commit b385cd4048db5a6dd7c3e859596d4530277937c9
Author: Kirill Tkhai <ktkhai at virtuozzo.com>
Date: Thu Jun 11 19:20:15 2020 +0300
pidns: expose task pid_ns_for_children to userspace
pid_ns_for_children set by a task is known only to the task itself, and
it's impossible to identify it from outside.
It's a big problem for checkpoint/restore software like CRIU, because it
can't correctly handle tasks that do setns(CLONE_NEWPID) in the process of
their work.
This patch solves the problem by exposing pid_ns_for_children in the ns
directory in the standard way, under the name "pid_for_children":
~# ls /proc/5531/ns -l | grep pid
lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid -> pid:[4026531836]
lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid_for_children -> pid:[4026532286]
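As a rough illustration of how a userspace tool might consume the two links shown above, the sketch below reads both symlinks and compares the inode numbers. The function names (parse_pidns_inum, read_pidns_inum, pidns_for_children_differs) are hypothetical and are not taken from CRIU or from this patch; the sketch only assumes the "pid:[<inum>]" link format visible in the listing.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: parse a link target such as "pid:[4026531836]".
 * Returns 0 and stores the inode number on success, -1 otherwise. */
int parse_pidns_inum(const char *target, unsigned long long *inum)
{
	char closing;

	if (sscanf(target, "pid:[%llu%c", inum, &closing) != 2 || closing != ']')
		return -1;
	return 0;
}

/* Hypothetical helper: read /proc/<pid>/ns/<name> and extract the inode
 * number. Returns -1 if the entry is missing, unreadable or malformed. */
int read_pidns_inum(pid_t pid, const char *name, unsigned long long *inum)
{
	char path[64], target[64];
	ssize_t len;

	snprintf(path, sizeof(path), "/proc/%ld/ns/%s", (long)pid, name);
	len = readlink(path, target, sizeof(target) - 1);
	if (len < 0)
		return -1;
	target[len] = '\0';	/* readlink() does not NUL-terminate */
	return parse_pidns_inum(target, inum);
}

/* Returns 1 if children of the task would be created in a pid namespace
 * other than the task's own, 0 if both links match, -1 if unknown
 * (for example on a kernel without the pid_for_children entry). */
int pidns_for_children_differs(pid_t pid)
{
	unsigned long long own, children;

	if (read_pidns_inum(pid, "pid", &own) ||
	    read_pidns_inum(pid, "pid_for_children", &children))
		return -1;
	return own != children;
}
```

For the task in the listing above, pidns_for_children_differs(5531) would be expected to return 1, since the two inode numbers differ.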
Link: http://lkml.kernel.org/r/149201123914.6007.2187327078064239572.stgit@localhost.localdomain
Signed-off-by: Kirill Tkhai <ktkhai at virtuozzo.com>
Cc: Andrei Vagin <avagin at virtuozzo.com>
Cc: Andreas Gruenbacher <agruenba at redhat.com>
Cc: Kees Cook <keescook at chromium.org>
Cc: Michael Kerrisk <mtk.manpages at googlemail.com>
Cc: Al Viro <viro at zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg at redhat.com>
Cc: Paul Moore <paul at paul-moore.com>
Cc: Eric Biederman <ebiederm at xmission.com>
Cc: Andy Lutomirski <luto at amacapital.net>
Cc: Ingo Molnar <mingo at kernel.org>
Cc: Serge Hallyn <serge at hallyn.com>
Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>
(cherry picked from VZ8 commit eaa0d190bfe1ed891b814a52712dcd852554cb08)
https://jira.sw.ru/browse/PSBM-102357
Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
=====================
Patchset description:
port nsfs from vz8
We have problems with /proc/pid/ns/name bind-mounts in CRIU:
1) Currently (without nsfs) such a bind mount has the same superblock as
the /proc mount, but with nested pid namespaces a container can have
multiple different /proc mounts, and an ns-bind-mount must be bound
from the right pidns. So we would need to enter the proper pid namespace to
reopen the ns-file fd from the proper proc, which looks too complex.
If we port nsfs, all ns-bind-mounts will be on the same superblock, which
does not depend on the procfs we opened the ns-file on.
2) A bigger problem will come when we want to migrate ns-bind-mounts
from a non-nsfs kernel to an nsfs (vz8) kernel. This would bring a lot of
crutches: we would need to work around the fact that before migration the
mounts had the same superblock as /proc and after migration they can't.
To overcome these, we can port nsfs to vz7 and implement ns-bind-mount
support in the new world of nsfs, which looks easier.
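The superblock difference described above can be observed from userspace. A minimal sketch, assuming the filesystem magic values from <linux/magic.h> (NSFS_MAGIC 0x6e736673, PROC_SUPER_MAGIC 0x9fa0); the SKETCH_* macros and both function names are hypothetical and exist only for this illustration:

```c
#include <fcntl.h>
#include <sys/vfs.h>
#include <unistd.h>

/* Values mirror NSFS_MAGIC and PROC_SUPER_MAGIC from <linux/magic.h>;
 * they are repeated under hypothetical names so the sketch also builds
 * against older headers that do not define NSFS_MAGIC. */
#define SKETCH_NSFS_MAGIC	0x6e736673L
#define SKETCH_PROC_MAGIC	0x9fa0L

/* Returns 1 for nsfs, 0 for procfs, -1 for any other filesystem type. */
int classify_ns_fs(long f_type)
{
	if (f_type == SKETCH_NSFS_MAGIC)
		return 1;
	if (f_type == SKETCH_PROC_MAGIC)
		return 0;
	return -1;
}

/* Opens an ns file (e.g. "/proc/self/ns/pid") and reports which
 * filesystem backs it. Returns -1 if the file cannot be opened or
 * queried. */
int ns_file_on_nsfs(const char *path)
{
	struct statfs st;
	int fd, ret;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	ret = fstatfs(fd, &st);
	close(fd);
	if (ret < 0)
		return -1;
	return classify_ns_fs((long)st.f_type);
}
```

On a kernel with nsfs, ns_file_on_nsfs("/proc/self/ns/pid") is expected to return 1 no matter which /proc mount the file was opened through; without nsfs the ns file is backed by procfs itself and the expected result is 0.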
First we need to revert all patches which depend on nsfs:
8782a0069f1b proc: add a proc_show_path method to fix mountinfo
b823f8df2fcb ms/tun: Add ioctl() TUNGETDEVNETNS cmd to allow obtaining real net ns of tun device
302889fa2e3d ms/net: add an ioctl to get a socket network namespace
7cb9e7ae7041 ms/tun: Add ioctl() SIOCGSKNS cmd to allow obtaining net ns of tun device
ac08c64138ac nsfs: add ioctl to get a parent namespace
a8e0dd94d5cd nsfs: add ioctl to get an owning user namespace for ns file descriptor
93dca538d184 kernel: add a helper to get an owning user namespace for a namespace
edaecdb8adac ms/pidns: expose task pid_ns_for_children to userspace
2b151c3f8909 ms/ns: allow ns_entries to have custom symlink content
Cherry-pick nsfs from VZ8:
435d5f4bb2cc common object embedded into various struct ....ns
58be28256d98 make mntns ->get()/->put()/->install()/->inum() work with &mnt_ns->ns
ff24870f46d5 netns: switch ->get()/->put()/->install()/->inum() to working with &net->ns
3c0411846118 switch the rest of proc_ns_operations to working with &...->ns
64964528b24e make proc_ns_operations work with struct ns_common * instead of void *
6344c433a452 new helpers: ns_alloc_inum/ns_free_inum
33c429405a2c copy address of proc_ns_ops into ns_common
f77c80142e1a bury struct proc_ns in fs/proc
292662014509 dcache.c: call ->d_prune() regardless of d_unhashed()
e149ed2b805f take the targets of /proc/*/ns/* symlinks to separate fs
Cherry-pick part of reverted patches back from VZ8:
bcac25a58bfc kernel: add a helper to get an owning user namespace for a namespace
6786741dbf99 nsfs: add ioctl to get an owning user namespace for ns file descriptor
a7306ed8d94a nsfs: add ioctl to get a parent namespace
c62cce2caee5 net: add an ioctl to get a socket network namespace
25b14e92af1a ns: allow ns_entries to have custom symlink content
eaa0d190bfe1 pidns: expose task pid_ns_for_children to userspace
Cherry-pick reverted patches back from MS (we also need them in vz8):
75509fd88fbd nsfs: Add a show_path method to fix mountinfo
24dce0800baa net: Export open_related_ns()
d8d211a2a0c3 net: Make extern and export get_net_ns()
f2780d6d7475 tun: Add ioctl() SIOCGSKNS cmd to allow obtaining net ns of tun device
0c3e0e3bb623 tun: Add ioctl() TUNGETDEVNETNS cmd to allow obtaining real net ns of tun device
073c516ff735 nsfs: mark dentry with DCACHE_RCUACCESS
I ran zdtm on this kernel, so the change should not break interfaces.
https://jira.sw.ru/browse/PSBM-102357
Al Viro (10):
ms: common object embedded into various struct ....ns
make mntns ->get()/->put()/->install()/->inum() work with &mnt_ns->ns
netns: switch ->get()/->put()/->install()/->inum() to working with
&net->ns
switch the rest of proc_ns_operations to working with &...->ns
make proc_ns_operations work with struct ns_common * instead of void *
new helpers: ns_alloc_inum/ns_free_inum
copy address of proc_ns_ops into ns_common
bury struct proc_ns in fs/proc
dcache.c: call ->d_prune() regardless of d_unhashed()
take the targets of /proc/*/ns/* symlinks to separate fs
Andrey Vagin (4):
kernel: add a helper to get an owning user namespace for a namespace
nsfs: add ioctl to get an owning user namespace for ns file descriptor
nsfs: add ioctl to get a parent namespace
net: add an ioctl to get a socket network namespace
Cong Wang (1):
nsfs: mark dentry with DCACHE_RCUACCESS
Eric W. Biederman (1):
nsfs: Add a show_path method to fix mountinfo
Kirill Tkhai (6):
ns: allow ns_entries to have custom symlink content
pidns: expose task pid_ns_for_children to userspace
net: Export open_related_ns()
net: Make extern and export get_net_ns()
tun: Add ioctl() SIOCGSKNS cmd to allow obtaining net ns of tun device
tun: Add ioctl() TUNGETDEVNETNS cmd to allow obtaining real net ns of
tun device
Pavel Tikhomirov (10):
Revert "proc: add a proc_show_path method to fix mountinfo"
Revert "ms/tun: Add ioctl() TUNGETDEVNETNS cmd to allow obtaining real
net ns of tun device"
Revert "ms/net: add an ioctl to get a socket network namespace"
Revert "ms/tun: Add ioctl() SIOCGSKNS cmd to allow obtaining net ns of
tun device"
Revert "nsfs: add ioctl to get a parent namespace"
Revert "nsfs: add ioctl to get an owning user namespace for ns file
descriptor"
Revert "kernel: add a helper to get an owning user namespace for a
namespace"
Revert "ms/pidns: expose task pid_ns_for_children to userspace"
Revert "ms/ns: allow ns_entries to have custom symlink content"
userns: move EXPORT_SYMBOL closer to current_in_userns
---
fs/proc/namespaces.c | 1 +
include/linux/proc_ns.h | 1 +
kernel/pid_namespace.c | 34 ++++++++++++++++++++++++++++++++++
3 files changed, 36 insertions(+)
diff --git a/fs/proc/namespaces.c b/fs/proc/namespaces.c
index 49d459f055460..c9bb30b97401c 100644
--- a/fs/proc/namespaces.c
+++ b/fs/proc/namespaces.c
@@ -23,6 +23,7 @@ static const struct proc_ns_operations *ns_entries[] = {
#endif
#ifdef CONFIG_PID_NS
&pidns_operations,
+ &pidns_for_children_operations,
#endif
#ifdef CONFIG_USER_NS
&userns_operations,
diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h
index ea002b8e5ab9b..dc925ebb4ed2c 100644
--- a/include/linux/proc_ns.h
+++ b/include/linux/proc_ns.h
@@ -27,6 +27,7 @@ extern const struct proc_ns_operations netns_operations;
extern const struct proc_ns_operations utsns_operations;
extern const struct proc_ns_operations ipcns_operations;
extern const struct proc_ns_operations pidns_operations;
+extern const struct proc_ns_operations pidns_for_children_operations;
extern const struct proc_ns_operations userns_operations;
extern const struct proc_ns_operations mntns_operations;
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index c5541caf7a090..df3361aaa3a6c 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -361,6 +361,29 @@ static struct ns_common *pidns_get(struct task_struct *task)
return ns ? &ns->ns : NULL;
}
+static struct ns_common *pidns_for_children_get(struct task_struct *task)
+{
+ struct pid_namespace *ns = NULL;
+
+ task_lock(task);
+ if (task->nsproxy) {
+ ns = task->nsproxy->pid_ns;
+ get_pid_ns(ns);
+ }
+ task_unlock(task);
+
+ if (ns) {
+ qread_lock(&tasklist_lock);
+ if (!ns->child_reaper) {
+ put_pid_ns(ns);
+ ns = NULL;
+ }
+ qread_unlock(&tasklist_lock);
+ }
+
+ return ns ? &ns->ns : NULL;
+}
+
static void pidns_put(struct ns_common *ns)
{
put_pid_ns(to_pid_ns(ns));
@@ -430,6 +453,17 @@ const struct proc_ns_operations pidns_operations = {
.get_parent = pidns_get_parent,
};
+const struct proc_ns_operations pidns_for_children_operations = {
+ .name = "pid_for_children",
+ .real_ns_name = "pid",
+ .type = CLONE_NEWPID,
+ .get = pidns_for_children_get,
+ .put = pidns_put,
+ .install = pidns_install,
+ .owner = pidns_owner,
+ .get_parent = pidns_get_parent,
+};
+
static __init int pid_namespaces_init(void)
{
pid_ns_cachep = KMEM_CACHE(pid_namespace, SLAB_PANIC);