[Devel] [PATCH RHEL8 COMMIT] ve/fs/namespace: fix allowing submounts in non-init userns
Konstantin Khorenko
khorenko at virtuozzo.com
Tue Jun 29 15:51:13 MSK 2021
The commit is pushed to "branch-rh8-4.18.0-240.1.1.vz8.5.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh8-4.18.0-240.1.1.vz8.5.50
------>
commit fbd8e56d6e3bb2c1a7e1aa300117c6c98eb7ce80
Author: Konstantin Khorenko <khorenko at virtuozzo.com>
Date: Tue Jun 29 15:51:13 2021 +0300
ve/fs/namespace: fix allowing submounts in non-init userns
When mounting an nfs4 share inside a container with something like:
mount -t nfs4 $NODEIP:/root/build/criu /mnt
several submounts are created, because the source path
("/root/build/criu") is several directories deep.
Adding perf probes in vfs_submount to print
mountpoint->d_sb->s_user_ns and mountpoint->d_iname, we see:
crash > p &init_user_ns
$2 = (struct user_namespace *) 0xffffffff9644efc0
1) First submount created has mountpoint dentry "root" and ve userns:
mount.nfs4 ...: probe:vfs_submount: (ffffffff95a970e0)
user_ns=0xffff8b6d6e86a000 dentry="root"
2) Second submount created has mountpoint dentry "build" from first
submount and init userns of host:
mount.nfs4 ...: probe:vfs_submount: (ffffffff95a970e0)
user_ns=0xffffffff9644efc0 dentry="build"
So on the first step we have the ve userns and on the second the init
userns. Comparing against either the init userns or the ve userns alone
would not work, because both can occur. So the easy solution here is to
disable the check completely, as we do in vz7.
Note: this patch allows nfs4 mounts in containers, so we avoid the
nfs3 rpcbind non-dumpable socket migration problems, since nfs now
mounts in v4 mode by default.
https://jira.sw.ru/browse/PSBM-102629
Fixes: 81a2b734416d ("ve/fs/namespace: allow submounts in non-init
userns")
Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
---
fs/namespace.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/fs/namespace.c b/fs/namespace.c
index 75aa3ae9585e..321a79198aac 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1017,13 +1017,14 @@ struct vfsmount *
vfs_submount(const struct dentry *mountpoint, struct file_system_type *type,
const char *name, void *data)
{
+#if 0
/* Until it is worked out how to pass the user namespace
* through from the parent mount to the submount don't support
* unprivileged mounts with submounts.
*/
/* Simple NFS mount inside a Container brings us here, so if we want to
- * enable NFS inside a Container (read - in CT root userns), we have
- * to soften the check.
+ * enable NFS inside a Container (read - in non-init userns), we have
+ * to omit the check.
*
* SyS_mount
* do_mount
@@ -1044,8 +1045,9 @@ vfs_submount(const struct dentry *mountpoint, struct file_system_type *type,
* nfs_do_submount
* vfs_submount
*/
- if (mountpoint->d_sb->s_user_ns != ve_init_user_ns())
+ if (mountpoint->d_sb->s_user_ns != &init_user_ns)
return ERR_PTR(-EPERM);
+#endif
return vfs_kern_mount(type, SB_SUBMOUNT, name, data);
}