[Devel] [PATCH RH9 01/12] ve/fs/namespace: allow submounts in non-init userns

Thu Oct 7 13:20:00 MSK 2021

From: Konstantin Khorenko <khorenko at virtuozzo.com>

A simple NFS mount inside a Container brings us to vfs_submount(), so if
we want to enable NFS inside a Container (read: in the CT root userns), we
have to soften the check for the init userns.

SyS_mount
 do_mount
  vfs_kern_mount
   mount_fs
    nfs_fs_mount
     nfs4_try_mount
      nfs_follow_remote_path
       mount_subtree
        vfs_path_lookup
         do_path_lookup
          filename_lookup
           path_lookupat
            lookup_slow
             follow_managed
              nfs_d_automount
               nfs4_submount
                nfs_do_submount
                 vfs_submount

https://jira.sw.ru/browse/PSBM-86277
Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>

https://jira.sw.ru/browse/PSBM-127234
(cherry picked from vz7 commit bc060d46276144f91a139b7d0acf384dcd0a4dde)

vz7->vz8 port note: in vz7 the check was dropped entirely;
in vz8 we keep the check, but allow submounts only for the root CT userns.

Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>
Reviewed-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>

+++
ve/fs/namespace: fix allowing submounts in non-init userns

When mounting an nfs4 share inside a container with something like:

  mount -t nfs4 $NODEIP:/root/build/criu /mnt

we can see that several submounts are created, because the source path
(/root/build/criu) is several directories long.

Adding perf probes in vfs_submount to print mountpoint->d_sb->s_user_ns
and mountpoint->d_iname, we see:

crash > p &init_user_ns
$2 = (struct user_namespace *) 0xffffffff9644efc0

1) First submount created has mountpoint dentry "root" and ve userns:
mount.nfs4 ...:         probe:vfs_submount: (ffffffff95a970e0)
user_ns=0xffff8b6d6e86a000 dentry="root"

2) Second submount created has mountpoint dentry "build" from first
submount and init userns of host:
mount.nfs4 ...:         probe:vfs_submount: (ffffffff95a970e0)
user_ns=0xffffffff9644efc0 dentry="build"

So on the first step we have the ve userns and on the second the init
userns. Comparing against either the init userns or the ve userns alone
would not work, because both can occur. So the easy solution here is to
disable the check completely, as we do in vz7.

Note: this patch allows nfs4 mounts in containers, so we avoid the
nfs3 rpcbind non-dumpable socket migration problems, as nfs now mounts
in v4 mode by default.

https://jira.sw.ru/browse/PSBM-102629
Fixes: 81a2b734416d ("ve/fs/namespace: allow submounts in non-init userns")
Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
Signed-off-by: Kirill Tkhai <ktkhai at virtuozzo.com>
---
 fs/namespace.c |   25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index c10614908e7e..85a451861e14 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1051,12 +1051,37 @@ struct vfsmount *
 vfs_submount(const struct dentry *mountpoint, struct file_system_type *type,
 	     const char *name, void *data)
 {
+#if 0
 	/* Until it is worked out how to pass the user namespace
 	 * through from the parent mount to the submount don't support
 	 * unprivileged mounts with submounts.
 	 */
+	/* Simple NFS mount inside a Container brings us here, so if we want to
+	 * enable NFS inside a Container (read - in non-init userns), we have
+	 * to omit the check. Below is how it was in VZ8:
+	 *
+	 *  SyS_mount
+	 *   do_mount
+	 *    vfs_kern_mount
+	 *     mount_fs
+	 *      nfs_fs_mount
+	 *       nfs4_try_mount
+	 *        nfs_follow_remote_path
+	 *         mount_subtree
+	 *	    vfs_path_lookup
+	 *	     do_path_lookup
+	 *	      filename_lookup
+	 *	       path_lookupat
+	 *	        lookup_slow
+	 *	         follow_managed
+	 *	          nfs_d_automount
+	 *	           nfs4_submount
+	 *		    nfs_do_submount
+	 *		     vfs_submount
+	 */
 	if (mountpoint->d_sb->s_user_ns != &init_user_ns)
 		return ERR_PTR(-EPERM);
+#endif
 
 	return vfs_kern_mount(type, SB_SUBMOUNT, name, data);
 }