[Devel] [PATCH RH9 01/12] ve/fs/namespace: allow submounts in non-init userns
Kirill Tkhai
ktkhai at virtuozzo.com
Thu Oct 7 13:20:00 MSK 2021
From: Konstantin Khorenko <khorenko at virtuozzo.com>
A simple NFS mount inside a Container leads to vfs_submount(), so if
we want to enable NFS inside a Container (that is, in the CT root userns),
we have to relax the check for init userns.
SyS_mount
do_mount
vfs_kern_mount
mount_fs
nfs_fs_mount
nfs4_try_mount
nfs_follow_remote_path
mount_subtree
vfs_path_lookup
do_path_lookup
filename_lookup
path_lookupat
lookup_slow
follow_managed
nfs_d_automount
nfs4_submount
nfs_do_submount
vfs_submount
https://jira.sw.ru/browse/PSBM-86277
Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>
https://jira.sw.ru/browse/PSBM-127234
(cherry picked from vz7 commit bc060d46276144f91a139b7d0acf384dcd0a4dde)
vz7->vz8 port note: in vz7 the check was dropped entirely;
in vz8 we keep the check, but allow submounts only for the root CT userns.
Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>
Reviewed-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
+++
ve/fs/namespace: fix allowing submounts in non-init userns
When mounting an nfs4 share inside a container with something like:
mount -t nfs4 $NODEIP:/root/build/criu /mnt
we can see that, because the source path is several directories deep,
several submounts are created.
Adding perf probes to list mountpoint->d_sb->s_user_ns and
mountpoint->d_iname from vfs_submount we see:
crash > p &init_user_ns
$2 = (struct user_namespace *) 0xffffffff9644efc0
1) First submount created has mountpoint dentry "root" and ve userns:
mount.nfs4 ...: probe:vfs_submount: (ffffffff95a970e0)
user_ns=0xffff8b6d6e86a000 dentry="root"
2) Second submount created has mountpoint dentry "build" from first
submount and init userns of host:
mount.nfs4 ...: probe:vfs_submount: (ffffffff95a970e0)
user_ns=0xffffffff9644efc0 dentry="build"
So the first step sees the ve userns and the second sees the host's init
userns. Comparing against either init userns or ve userns alone would not
work, because both can occur. The easy solution here is to disable the
check completely, as we do in vz7.
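
The reasoning above can be illustrated with a small userspace model. This is
only a sketch, not kernel code: struct user_namespace, init_user_ns and
ve_user_ns here are simplified stand-ins, and probe_chain, allowed_init_only,
allowed_ve_only, allowed_always and count_allowed are hypothetical names
introduced for the example. The two entries in probe_chain mirror the two
probe hits shown above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel type; not the real definition. */
struct user_namespace { const char *label; };

static struct user_namespace init_user_ns = { "init" };
static struct user_namespace ve_user_ns = { "ve" }; /* assumed CT root userns */

/* One vfs_submount() call as reported by the probe. */
struct submount_call {
	const char *dentry;
	const struct user_namespace *s_user_ns;
};

/* The two probe hits from the nfs4 mount described above. */
static const struct submount_call probe_chain[] = {
	{ "root",  &ve_user_ns },
	{ "build", &init_user_ns },
};

/* Original check: only the init userns may create submounts. */
static bool allowed_init_only(const struct submount_call *c)
{
	return c->s_user_ns == &init_user_ns;
}

/* Hypothetical alternative: only the ve userns may create submounts. */
static bool allowed_ve_only(const struct submount_call *c)
{
	return c->s_user_ns == &ve_user_ns;
}

/* Check disabled, as in this patch. */
static bool allowed_always(const struct submount_call *c)
{
	(void)c;
	return true;
}

/* Count how many calls in the chain pass the given check. */
static int count_allowed(const struct submount_call *calls, size_t n,
			 bool (*check)(const struct submount_call *))
{
	int passed = 0;

	assert(calls != NULL && check != NULL);
	for (size_t i = 0; i < n; i++)
		if (check(&calls[i]))
			passed++;
	return passed;
}
```

In this model each single-namespace comparison lets only one of the two
submounts through, so the mount as a whole fails; only the variant with the
check disabled passes both.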
Note: this patch allows nfs4 mounts in containers, so we overcome the
nfs3 rpcbind non-dumpable socket migration problems, as nfs now mounts
in v4 mode by default.
https://jira.sw.ru/browse/PSBM-102629
mFixes: 81a2b734416d ("ve/fs/namespace: allow submounts in non-init
userns")
Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
Signed-off-by: Kirill Tkhai <ktkhai at virtuozzo.com>
---
fs/namespace.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
diff --git a/fs/namespace.c b/fs/namespace.c
index c10614908e7e..85a451861e14 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1051,12 +1051,37 @@ struct vfsmount *
vfs_submount(const struct dentry *mountpoint, struct file_system_type *type,
const char *name, void *data)
{
+#if 0
/* Until it is worked out how to pass the user namespace
* through from the parent mount to the submount don't support
* unprivileged mounts with submounts.
*/
+ /* Simple NFS mount inside a Container brings us here, so if we want to
+ * enable NFS inside a Container (read - in non-init userns), we have
+ * to omit the check. Below is how it was in VZ8:
+ *
+ * SyS_mount
+ * do_mount
+ * vfs_kern_mount
+ * mount_fs
+ * nfs_fs_mount
+ * nfs4_try_mount
+ * nfs_follow_remote_path
+ * mount_subtree
+ * vfs_path_lookup
+ * do_path_lookup
+ * filename_lookup
+ * path_lookupat
+ * lookup_slow
+ * follow_managed
+ * nfs_d_automount
+ * nfs4_submount
+ * nfs_do_submount
+ * vfs_submount
+ */
if (mountpoint->d_sb->s_user_ns != &init_user_ns)
return ERR_PTR(-EPERM);
+#endif
return vfs_kern_mount(type, SB_SUBMOUNT, name, data);
}