[Devel] [PATCH RH7 00/32] port nsfs from vz8

Konstantin Khorenko khorenko at virtuozzo.com
Tue Jun 16 13:17:34 MSK 2020


On 06/08/2020 08:05 PM, Pavel Tikhomirov wrote:
> We have problems with /proc/pid/ns/name bind-mounts in CRIU:
>
> 1) Currently (without nsfs) such a bind mount has the same superblock as
> the /proc mount, but with nested pid-namespaces a container can have
> multiple different /proc mounts, and an ns-bind-mount has to be bound
> from the right pidns. So we would need to enter the proper pid-namespace
> to reopen the ns-file fd from the proper proc, which looks too complex.
>
> If we port nsfs, all ns-bind-mounts will be on the same superblock, which
> does not depend on which procfs we opened the ns-file from.
>
> 2) A bigger problem will come when we want to migrate ns-bind-mounts
> from a non-nsfs kernel to an nsfs (vz8) kernel. This would require a lot
> of crutches: we would need to work around the fact that before migration
> the mounts shared a superblock with /proc and after migration they cannot.
>
> To overcome both problems we can port nsfs to vz7 and implement
> ns-bind-mount support on top of nsfs, which looks easier.
>
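
To illustrate the "same superblock" point above: with nsfs, an ns-bind-mount is expected to show up in /proc/<pid>/mountinfo with filesystem type nsfs, a root such as net:[inum], and one device number common to all such mounts. The following is a minimal sketch of picking those entries out of mountinfo text. The function name nsfs_mounts and the sample lines are made up for illustration and are not taken from this patch set or from CRIU.

```python
from collections import defaultdict

# Made-up mountinfo lines for illustration only; real IDs and inums differ.
SAMPLE = """\
22 1 253:0 / / rw,relatime shared:1 - ext4 /dev/vda1 rw
23 22 0:4 / /proc rw,nosuid shared:2 - proc proc rw
90 22 0:3 net:[4026532008] /run/netns/a rw shared:3 - nsfs nsfs rw
91 22 0:3 pid:[4026532010] /tmp/pidns rw shared:4 - nsfs nsfs rw
"""


def nsfs_mounts(mountinfo_text):
    """Group nsfs mounts by device: {"major:minor": [(root, mount_point), ...]}."""
    by_dev = defaultdict(list)
    for line in mountinfo_text.splitlines():
        fields = line.split()
        try:
            # Six fixed fields come first, then optional fields, then "-".
            sep = fields.index("-", 6)
        except ValueError:
            continue  # not a well-formed mountinfo line
        if sep + 1 >= len(fields):
            continue
        dev, root, mount_point = fields[2], fields[3], fields[4]
        if fields[sep + 1] == "nsfs":
            by_dev[dev].append((root, mount_point))
    return dict(by_dev)


if __name__ == "__main__":
    for dev, mounts in nsfs_mounts(SAMPLE).items():
        print(dev, mounts)
```

On a real system the same function could be fed the contents of /proc/self/mountinfo. A single key in the result would mean that all ns-bind-mounts sit on one superblock.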
> First we need to revert all patches which depend on nsfs:
>
> 8782a0069f1b proc: add a proc_show_path method to fix mountinfo
> b823f8df2fcb ms/tun: Add ioctl() TUNGETDEVNETNS cmd to allow obtaining real net ns of tun device
> 302889fa2e3d ms/net: add an ioctl to get a socket network namespace
> 7cb9e7ae7041 ms/tun: Add ioctl() SIOCGSKNS cmd to allow obtaining net ns of tun device
> ac08c64138ac nsfs: add ioctl to get a parent namespace
> a8e0dd94d5cd nsfs: add ioctl to get an owning user namespace for ns file descriptor
> 93dca538d184 kernel: add a helper to get an owning user namespace for a namespace
> edaecdb8adac ms/pidns: expose task pid_ns_for_children to userspace
> 2b151c3f8909 ms/ns: allow ns_entries to have custom symlink content
>
> Cherry-pick nsfs from VZ8:
>
> 435d5f4bb2cc common object embedded into various struct ....ns
> 58be28256d98 make mntns ->get()/->put()/->install()/->inum() work with &mnt_ns->ns
> ff24870f46d5 netns: switch ->get()/->put()/->install()/->inum() to working with &net->ns
> 3c0411846118 switch the rest of proc_ns_operations to working with &...->ns
> 64964528b24e make proc_ns_operations work with struct ns_common * instead of void *
> 6344c433a452 new helpers: ns_alloc_inum/ns_free_inum
> 33c429405a2c copy address of proc_ns_ops into ns_common
> f77c80142e1a bury struct proc_ns in fs/proc
> 292662014509 dcache.c: call ->d_prune() regardless of d_unhashed()
> e149ed2b805f take the targets of /proc/*/ns/* symlinks to separate fs
>
> Cherry-pick part of reverted patches back from VZ8:
>
> bcac25a58bfc kernel: add a helper to get an owning user namespace for a namespace
> 6786741dbf99 nsfs: add ioctl to get an owning user namespace for ns file descriptor
> a7306ed8d94a nsfs: add ioctl to get a parent namespace
> c62cce2caee5 net: add an ioctl to get a socket network namespace
> 25b14e92af1a ns: allow ns_entries to have custom symlink content
> eaa0d190bfe1 pidns: expose task pid_ns_for_children to userspace
>
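
For reference, the two nsfs ioctls listed above can be exercised from userspace roughly as follows. This is a sketch under assumptions: the request numbers are computed as _IO(0xb7, nr), which holds on architectures where _IOC_NONE is 0 (x86 and arm, for example) but not everywhere, and the helper names ns_ioctl_nr and related_ns_inum are hypothetical.

```python
import fcntl
import os

NSIO = 0xB7  # ioctl "type" byte used by include/uapi/linux/nsfs.h


def ns_ioctl_nr(nr):
    """_IO(NSIO, nr), assuming _IOC_NONE == 0 (true on x86/arm, not on all arches)."""
    return (NSIO << 8) | nr


NS_GET_USERNS = ns_ioctl_nr(1)  # fd of the owning user namespace
NS_GET_PARENT = ns_ioctl_nr(2)  # fd of the parent namespace (hierarchical ns only)


def related_ns_inum(ns_path, request):
    """Open an ns file, ask nsfs for the related namespace, return its inode number."""
    fd = os.open(ns_path, os.O_RDONLY)
    try:
        # On success the ioctl returns a new file descriptor; OSError otherwise.
        rel_fd = fcntl.ioctl(fd, request)
        try:
            return os.fstat(rel_fd).st_ino
        finally:
            os.close(rel_fd)
    finally:
        os.close(fd)
```

A call such as related_ns_inum("/proc/self/ns/pid", NS_GET_USERNS) would return the inode number of the user namespace that owns the caller's pid namespace, provided the kernel has these ioctls and the caller is permitted to see that namespace.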
> Cherry-pick reverted patches back from MS (we also need them in vz8):
>
> 75509fd88fbd nsfs: Add a show_path method to fix mountinfo
> 24dce0800baa net: Export open_related_ns()
> d8d211a2a0c3 net: Make extern and export get_net_ns()

> f2780d6d7475 tun: Add ioctl() SIOCGSKNS cmd to allow obtaining net ns of tun device
> 0c3e0e3bb623 tun: Add ioctl() TUNGETDEVNETNS cmd to allow obtaining real net ns of tun device

Only these two (tun-related) patches were missing from the vz8 kernel, so they were backported from ms.

> 073c516ff735 nsfs: mark dentry with DCACHE_RCUACCESS
>
> I've run zdtm on this kernel, so the change should not break interfaces.
>
> https://jira.sw.ru/browse/PSBM-102357
>
> Al Viro (10):
>   ms: common object embedded into various struct ....ns
>   make mntns ->get()/->put()/->install()/->inum() work with &mnt_ns->ns
>   netns: switch ->get()/->put()/->install()/->inum() to working with
>     &net->ns
>   switch the rest of proc_ns_operations to working with &...->ns
>   make proc_ns_operations work with struct ns_common * instead of void *
>   new helpers: ns_alloc_inum/ns_free_inum
>   copy address of proc_ns_ops into ns_common
>   bury struct proc_ns in fs/proc
>   dcache.c: call ->d_prune() regardless of d_unhashed()
>   take the targets of /proc/*/ns/* symlinks to separate fs
>
> Andrey Vagin (4):
>   kernel: add a helper to get an owning user namespace for a namespace
>   nsfs: add ioctl to get an owning user namespace for ns file descriptor
>   nsfs: add ioctl to get a parent namespace
>   net: add an ioctl to get a socket network namespace
>
> Cong Wang (1):
>   nsfs: mark dentry with DCACHE_RCUACCESS
>
> Eric W. Biederman (1):
>   nsfs: Add a show_path method to fix mountinfo
>
> Kirill Tkhai (6):
>   ns: allow ns_entries to have custom symlink content
>   pidns: expose task pid_ns_for_children to userspace
>   net: Export open_related_ns()
>   net: Make extern and export get_net_ns()
>   tun: Add ioctl() SIOCGSKNS cmd to allow obtaining net ns of tun device
>   tun: Add ioctl() TUNGETDEVNETNS cmd to allow obtaining real net ns of
>     tun device
>
> Pavel Tikhomirov (10):
>   Revert "proc: add a proc_show_path method to fix mountinfo"
>   Revert "ms/tun: Add ioctl() TUNGETDEVNETNS cmd to allow obtaining real
>     net ns of tun device"
>   Revert "ms/net: add an ioctl to get a socket network namespace"
>   Revert "ms/tun: Add ioctl() SIOCGSKNS cmd to allow obtaining net ns of
>     tun device"
>   Revert "nsfs: add ioctl to get a parent namespace"
>   Revert "nsfs: add ioctl to get an owning user namespace for ns file
>     descriptor"
>   Revert "kernel: add a helper to get an owning user namespace for a
>     namespace"
>   Revert "ms/pidns: expose task pid_ns_for_children to userspace"
>   Revert "ms/ns: allow ns_entries to have custom symlink content"
>   userns: move EXPORT_SYMBOL closer to current_in_userns
>
>  drivers/net/tun.c              |  15 +-
>  fs/Makefile                    |   2 +-
>  fs/dcache.c                    |   2 +-
>  fs/internal.h                  |   5 +
>  fs/mount.h                     |   3 +-
>  fs/namespace.c                 |  56 ++++----
>  fs/nfs_common/grace.c          |   2 +-
>  fs/nfsd/nfs4recover.c          |   2 +-
>  fs/nsfs.c                      | 255 +++++++++++++++++++++++++++++++++
>  fs/proc/inode.c                |  24 ----
>  fs/proc/internal.h             |   5 +
>  fs/proc/namespaces.c           | 245 +++----------------------------
>  include/linux/ipc_namespace.h  |   3 +-
>  include/linux/ns_common.h      |  12 ++
>  include/linux/pid_namespace.h  |   3 +-
>  include/linux/proc_fs.h        |   4 +
>  include/linux/proc_ns.h        |  52 ++++---
>  include/linux/socket.h         |   2 +
>  include/linux/user_namespace.h |  10 +-
>  include/linux/utsname.h        |   3 +-
>  include/net/net_namespace.h    |   3 +-
>  include/uapi/linux/magic.h     |   1 +
>  include/uapi/linux/nsfs.h      |  13 ++
>  init/main.c                    |   2 +
>  init/version.c                 |   5 +-
>  ipc/msgutil.c                  |   5 +-
>  ipc/namespace.c                |  37 +++--
>  kernel/nsproxy.c               |  10 +-
>  kernel/pid.c                   |   5 +-
>  kernel/pid_namespace.c         |  55 ++++---
>  kernel/user.c                  |   5 +-
>  kernel/user_namespace.c        |  46 +++---
>  kernel/utsname.c               |  36 +++--
>  net/core/net_namespace.c       |  44 +++---
>  net/socket.c                   |  26 +---
>  35 files changed, 521 insertions(+), 477 deletions(-)
>  create mode 100644 fs/nsfs.c
>  create mode 100644 include/linux/ns_common.h
>  create mode 100644 include/uapi/linux/nsfs.h
>

