[Devel] [PATCH RHEL8 COMMIT] ve/fs/namespace: fix allowing submounts in non-init userns

Konstantin Khorenko khorenko at virtuozzo.com
Tue Jun 29 15:51:13 MSK 2021


The commit is pushed to "branch-rh8-4.18.0-240.1.1.vz8.5.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh8-4.18.0-240.1.1.vz8.5.50
------>
commit fbd8e56d6e3bb2c1a7e1aa300117c6c98eb7ce80
Author: Konstantin Khorenko <khorenko at virtuozzo.com>
Date:   Tue Jun 29 15:51:13 2021 +0300

    ve/fs/namespace: fix allowing submounts in non-init userns
    
    When mounting an nfs4 share inside a container with something like:
    
      mount -t nfs4 $NODEIP:/root/build/criu /mnt
    
    we can see that, because the source "root" path is several directories
    long, several submounts are created.
    
    Adding perf probes to vfs_submount that list
    mountpoint->d_sb->s_user_ns and mountpoint->d_iname, we see:
    
    crash > p &init_user_ns
    $2 = (struct user_namespace *) 0xffffffff9644efc0
    
    1) The first submount created has mountpoint dentry "root" and ve userns:
    mount.nfs4 ...:         probe:vfs_submount: (ffffffff95a970e0)
    user_ns=0xffff8b6d6e86a000 dentry="root"
    
    2) The second submount created has mountpoint dentry "build" from the
    first submount and the init userns of the host:
    mount.nfs4 ...:         probe:vfs_submount: (ffffffff95a970e0)
    user_ns=0xffffffff9644efc0 dentry="build"
    
    So on the first step we have the ve userns and on the second the init
    userns. Comparing against only the init userns or only the ve userns
    would not work, because both of them show up in the same mount. So the
    easy solution here is to disable the check completely, like we do in vz7.
    
    Note: this patch allows nfs4 mounts in containers, so we overcome the
    nfs3 rpcbind non-dumpable socket migration problems, as nfs now mounts
    in v4 mode by default.
    
    https://jira.sw.ru/browse/PSBM-102629
    Fixes: 81a2b734416d ("ve/fs/namespace: allow submounts in non-init
    userns")
    Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
---
 fs/namespace.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 75aa3ae9585e..321a79198aac 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1017,13 +1017,14 @@ struct vfsmount *
 vfs_submount(const struct dentry *mountpoint, struct file_system_type *type,
 	     const char *name, void *data)
 {
+#if 0
 	/* Until it is worked out how to pass the user namespace
 	 * through from the parent mount to the submount don't support
 	 * unprivileged mounts with submounts.
 	 */
 	/* Simple NFS mount inside a Container brings us here, so if we want to
-	 * enable NFS inside a Container (read - in CT root userns), we have
-	 * to soften the check.
+	 * enable NFS inside a Container (read - in non-init userns), we have
+	 * to omit the check.
 	 *
 	 *  SyS_mount
 	 *   do_mount
@@ -1044,8 +1045,9 @@ vfs_submount(const struct dentry *mountpoint, struct file_system_type *type,
 	 *		    nfs_do_submount
 	 *		     vfs_submount
 	 */
-	if (mountpoint->d_sb->s_user_ns != ve_init_user_ns())
+	if (mountpoint->d_sb->s_user_ns != &init_user_ns)
 		return ERR_PTR(-EPERM);
+#endif
 
 	return vfs_kern_mount(type, SB_SUBMOUNT, name, data);
 }

